When you write a DataFrame to a Cassandra table, be careful with SaveMode.Overwrite. In spark-cassandra-connector-1.6.0-M2, the connector first issues TRUNCATE $keyspace.$table; see the code in CassandraSourceRelation.scala.
I observed something odd when using the following code to write a DataFrame to a Cassandra 2.1.8 cluster:
df.write
  .format("org.apache.spark.sql.cassandra")
  .mode(SaveMode.Overwrite)
  .options(Map("table" -> table, "keyspace" -> keyspace))
  .save()
After the scheduled Spark job finishes, the table appears empty in CQLSH when I run select * from keyspace.table limit 10. The result is the same if I raise the consistency level to QUORUM, or even ALL. Only after some time does the query start returning rows. If I start the job manually from the command line, however, the query returns the results most of the time.
If you check the CQL documentation for TRUNCATE, setting the consistency level to ALL is required: "The consistency level must be set to ALL prior to performing a TRUNCATE operation. All replicas must remove the data."
I don't think spark-cassandra-connector changes the consistency level before calling TRUNCATE $keyspace.$table; its default consistency level is LOCAL_QUORUM. That might be the root cause.
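A hedged workaround sketch: the connector exposes a spark.cassandra.output.consistency.level setting, which can be raised to ALL. Whether the internal TRUNCATE issued by SaveMode.Overwrite actually honors this setting is an assumption I have not verified against the connector source; if it does not, truncating manually in cqlsh with CONSISTENCY ALL and then writing with SaveMode.Append is the safer route.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val conf = new SparkConf()
  .setAppName("cassandra-overwrite-example")
  .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
  // Assumption: raising the connector's write consistency to ALL also
  // applies to the TRUNCATE issued by SaveMode.Overwrite -- not verified.
  .set("spark.cassandra.output.consistency.level", "ALL")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc) // Spark 1.6-era API, matching connector 1.6.0-M2

// df, table, and keyspace as in the snippet above
df.write
  .format("org.apache.spark.sql.cassandra")
  .mode(SaveMode.Overwrite)
  .options(Map("table" -> table, "keyspace" -> keyspace))
  .save()
```

Note that ALL makes the write (and, if honored, the truncate) fail whenever any replica is down, so this trades availability for the visibility guarantee.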