Friday, August 19, 2011

How to debug Hadoop MapReduce jobs in Eclipse.

It is actually quite easy to debug Hadoop MapReduce jobs in Eclipse, especially when you use Maven:
  1. Create a Maven project using m2eclipse.
  2. Add org.apache.hadoop:hadoop-core as a dependency.
  3. Set breakpoints on whichever lines of your code you want to inspect.
  4. Right-click your driver class and choose Debug As -> Java Application.
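  5. In the Arguments tab of the launch configuration, put "-fs file:/// -jt local -Dmapred.local.dir=c:/temp/hadoop your_input_file c:/temp/hadoop/output" in "Program arguments". Here -fs file:/// selects the local file system and -jt local selects the local job runner. These generic options are only picked up if your driver parses them, for example by running through ToolRunner or GenericOptionsParser.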
  6. If you run on Windows, you need Cygwin because Hadoop calls the external shell command "chmod". In the Environment tab, add the environment variable PATH with the value ${env_var:path};c:\cygwin\bin so that Hadoop can find chmod.
  7. Click Debug. You can now debug your MapReduce code in Eclipse, with Hadoop running in local mode.
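
For step 6, it can help to confirm that chmod is actually reachable before you start the debugger. The following is only a minimal sketch and not part of Hadoop: the class name ChmodPathCheck and the method findOnPath are hypothetical, and it only looks for a file named chmod or chmod.exe in the directories listed in PATH.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Hadoop class): searches a PATH-style string for a command. */
public class ChmodPathCheck {

    /**
     * Returns the first regular file called name or name.exe found in the
     * directories of pathValue, which are split on the given separator.
     */
    static Optional<File> findOnPath(String name, String pathValue, String separator) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(separator))) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            for (String candidate : new String[] {name, name + ".exe"}) {
                File file = new File(dir, candidate);
                if (file.isFile()) {
                    return Optional.of(file);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Reads the PATH seen by this JVM, i.e. the one set in the Environment tab.
        String path = System.getenv("PATH");
        Optional<File> chmod = findOnPath("chmod", path, File.pathSeparator);
        System.out.println(chmod
                .map(file -> "chmod found at " + file.getAbsolutePath())
                .orElse("chmod not found on PATH"));
    }
}
```

Run it as a Java Application with the same Environment tab settings as your driver class. If it reports that chmod was not found, the PATH entry from step 6 is probably not being applied.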



