Tuesday, June 18, 2013

Avro-mapred-1.7.3-hadoop2 for AvroMultipleOutputs

I got the following error message in my MapReduce job when I ran it on a CDH 4.2.0 cluster:

2013-06-18 12:50:11,095 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.avro.mapreduce.AvroMultipleOutputs.getNamedOutputsList(AvroMultipleOutputs.java:218)
 at org.apache.avro.mapreduce.AvroMultipleOutputs.<init>(AvroMultipleOutputs.java:351)

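It turns out that avro-mapred-1.7.3 causes this problem. The default avro-mapred-1.7.3 jar is compiled against Hadoop 1, where org.apache.hadoop.mapreduce.JobContext is a class; in Hadoop 2, which CDH 4.2.0 ships, it is an interface, hence the IncompatibleClassChangeError. My sbt project has a dependency on hive-exec, which depends on avro-mapred-1.7.3. To eliminate this error, exclude avro-mapred from hive-exec and add avro-mapred-1.7.3-hadoop2 instead.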
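
Here is a rough sketch of how that exclusion might look in build.sbt. The hive-exec version is the one I mention in this post; the group IDs are the usual Maven coordinates for these artifacts, and I am assuming the hadoop2 build of avro-mapred is published under the "hadoop2" classifier, as the jar name suggests. You may also need a resolver that hosts the CDH artifacts, which is not shown here.

```scala
// build.sbt (sketch; adjust versions and resolvers to your own project)
libraryDependencies ++= Seq(
  // Keep hive-exec, but drop the avro-mapred it pulls in transitively
  // (that one is built against Hadoop 1).
  ("org.apache.hive" % "hive-exec" % "0.10.0-cdh4.2.0")
    .exclude("org.apache.avro", "avro-mapred"),
  // Add the Hadoop 2 build of avro-mapred explicitly via its classifier.
  "org.apache.avro" % "avro-mapred" % "1.7.3" classifier "hadoop2"
)
```

After reloading the build, checking the resolved classpath for a single avro-mapred jar ending in -hadoop2 is a quick way to confirm the exclusion took effect.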

If you have hive-exec-0.10.0-cdh4.2.0 in your project, you will have trouble viewing the Avro source code, because this jar bundles a copy of all the Avro classes, while hive-exec-0.10.0-cdh4.2.0-sources.jar doesn't include the Avro sources.

AvroMultipleOutputs in 1.7.3 doesn't support different outputs having different output schemas. See AVRO-1266.

1 comment:

  1. Hi, can you please post an example of how to exclude it? I guess we should be doing that in pom.xml.