Wednesday, August 14, 2013

Scala and Java collection interoperability in a MapReduce job

In Scala, you can import scala.collection.JavaConversions._ to make collections interoperable between Scala and Java, for example:
scala.collection.Iterable <=> java.lang.Iterable
Usually I prefer the Scala collection API because it is concise and powerful. But be careful: this may not work in all cases. I ran into this problem when I wrote a MapReduce job in Scala:
// do something with the first element of values
values.drop(1).foreach { v =>
  ...
}
The code tries to handle the first element differently from the rest. It worked perfectly in the combiner but failed in the reducer, even though both use values.drop(1).foreach. The reason, I believe, is that the iterable in the reducer is backed by a file, and the file position cannot move backward. By the time you call drop(1) in Scala, the file position has already advanced to the next element, so two elements are actually dropped.
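
One way to sidestep the problem is to ask for the Java iterator exactly once and walk it by hand, so nothing depends on the iterable being traversable a second time. The sketch below is only an illustration, not the code from my job: SinglePassIterable is a made-up stand-in for the reducer's values (it hands back the same underlying iterator on every call, which is my assumption about how the reducer behaves), and foreachSplit is a hypothetical helper name.

```scala
import java.{lang => jl, util => ju}

object SinglePassDemo {
  // Hypothetical stand-in for the reducer's values: every call to
  // iterator() returns the same underlying iterator, so the data can
  // only be read once and the position never goes back.
  final class SinglePassIterable[A](underlying: ju.Iterator[A]) extends jl.Iterable[A] {
    override def iterator(): ju.Iterator[A] = underlying
  }

  // Handle the first element and the rest differently while calling
  // iterator() exactly once. Does nothing if values is empty.
  def foreachSplit[A](values: jl.Iterable[A])(onFirst: A => Unit)(onRest: A => Unit): Unit = {
    val it = values.iterator()
    if (it.hasNext) {
      onFirst(it.next())
      while (it.hasNext) onRest(it.next())
    }
  }
}
```

In a reducer you would call something like SinglePassDemo.foreachSplit(values)(handleFirst)(handleRest), where handleFirst and handleRest are whatever your job needs. Each element is passed to the callback as soon as it is read, rather than being collected into a new Scala collection first.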
