Monday, November 3, 2014

"NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;" when running spark-cassandra-connector

When I tried to update a Cassandra table using spark-cassandra-connector in a Spark application, I encountered this error. The cause is that multiple versions of com.google.guava:guava end up on the classpath. I'm using CDH 5.1.0 with Spark 1.0.0: Spark uses guava-14.0.1, Hadoop MapReduce uses guava-11.0.2, and spark-cassandra-connector uses guava-15.0.

Similar issues are reported here: https://github.com/datastax/spark-cassandra-connector/issues/292 and https://github.com/datastax/spark-cassandra-connector/issues/326

I tried setting spark.files.userClassPathFirst=true, but that produced other errors. I also tried adding the guava-15.0 jar to SPARK_CLASSPATH; the driver side no longer reported the error, but the job still failed on the worker side.

The solution is actually very simple: in your Spark project, exclude guava from the spark-cassandra-connector dependency.
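  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_${scala.major}</artifactId>
    <exclusions>
      <exclusion>
        <groupId>com.codahale.metrics</groupId>
        <artifactId>metrics-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
      </exclusion>
    </exclusions>
    <version>1.1.0-beta1</version>
  </dependency>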
When you run spark-submit, don't add the guava-15.0 jar to --jars or to the classpath.
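
If you are not sure which Guava jar is actually being picked up, a small diagnostic can help. The following is only a sketch: the class name WhichJar and the helper locate are made up for illustration and are not part of Spark or the connector. It asks the JVM where a class was loaded from, using the standard java.security.CodeSource API. Run it with the same classpath as your driver (or call locate from inside a task) to see which jar provides com.google.common.collect.Sets.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Spark or spark-cassandra-connector.
public class WhichJar {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Guava class named in the error message.
        String name = args.length > 0 ? args[0] : "com.google.common.collect.Sets";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location points at a Guava jar you did not expect, that jar is the one winning on the classpath.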