Wednesday, September 11, 2013

Deploy LZO for YARN in CDH4

The Cloudera page "Using the LZO Parcel" covers only MRv1, not YARN. It took me a while to figure out how to set up LZO correctly for YARN.

If YARN is not configured correctly, you may see one of these error messages:

  • Class com.hadoop.compression.lzo.LzoCodec not found.
  • Class com.hadoop.mapred.DeprecatedLzoTextInputFormat not found.
  • No LZO codec found, cannot run.
  • native-lzo library not available
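In my experience each of these messages points at one missing piece of the setup. The Python sketch below encodes that mapping as a quick triage aid for task logs. The helper `diagnose_lzo_error` and the cause strings are hypothetical, and the mapping is a heuristic based on where the messages come from (missing hadoop-lzo jar, unregistered codec, missing native library), not an official Cloudera diagnostic.

```python
# Hypothetical triage helper: maps the LZO error messages above to the setup
# step that is most likely missing. Heuristic only, not an official diagnostic.

LIKELY_CAUSES = {
    "com.hadoop.compression.lzo.LzoCodec not found":
        "hadoop-lzo jar is not on the task classpath (mapred-site.xml classpath step)",
    "com.hadoop.mapred.DeprecatedLzoTextInputFormat not found":
        "hadoop-lzo jar is not on the task classpath (mapred-site.xml classpath step)",
    "No LZO codec found, cannot run":
        "LzoCodec/LzopCodec are not registered in core-site.xml (codec step)",
    "native-lzo library not available":
        "native LZO library is not on the library path (mapred-site.xml native library step)",
}


def diagnose_lzo_error(log_text: str) -> list[str]:
    """Return the likely cause for every known LZO error message found in log_text."""
    causes = [cause for message, cause in LIKELY_CAUSES.items() if message in log_text]
    return list(dict.fromkeys(causes))  # drop duplicate causes, keep first-seen order


if __name__ == "__main__":
    sample = "java.lang.RuntimeException: native-lzo library not available"
    for cause in diagnose_lzo_error(sample):
        print(cause)
```

Matching on substrings means the same key catches both "Class ... not found" and "Compression codec ... not found" variants of the LzoCodec error.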

Here are the steps to set up LZO correctly:
  • Follow the instructions in "Using the LZO Parcel" to install and activate the parcel.
  • Add LzoCodec and LzopCodec. In Cloudera Manager, find the core-site.xml field: hdfs1 -> Configuration -> Service-Wide -> Advanced -> Cluster-wide Configuration Safety Valve for core-site.xml, and add this property:
  • Add the classpath and the native library path. In Cloudera Manager, find the mapred-site.xml field: yarn1 -> Configuration -> Service-Wide -> Advanced -> YARN Service MapReduce Configuration Safety Valve, then add the following two properties:
  • Restart YARN and push the configuration files to the gateways.
  • Don't forget to run "Deploy Client Configuration".
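The safety valves take ordinary Hadoop `<property>` blocks. As a minimal sketch of what such blocks can look like, the Python below generates them, assuming the standard Hadoop 2 property names `io.compression.codecs`, `mapreduce.application.classpath` and `mapreduce.admin.user.env`, and assuming the parcel is unpacked under `/opt/cloudera/parcels/HADOOP_LZO` with jars in `lib/hadoop/lib` and native libraries in `lib/hadoop/lib/native` (check the actual directories on your hosts). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: install location of the LZO parcel; verify on your cluster.
LZO_PARCEL = "/opt/cloudera/parcels/HADOOP_LZO"
LZO_CODECS = ["com.hadoop.compression.lzo.LzoCodec",
              "com.hadoop.compression.lzo.LzopCodec"]


def property_xml(name: str, value: str) -> str:
    """Render one <property> element in the format used by Hadoop *-site.xml files."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


def core_site_snippet(existing_codecs: list[str]) -> str:
    """Append LzoCodec and LzopCodec to the cluster's current codec list (no duplicates)."""
    codecs = existing_codecs + [c for c in LZO_CODECS if c not in existing_codecs]
    return property_xml("io.compression.codecs", ",".join(codecs))


def mapred_site_snippet(existing_classpath: str, existing_native_path: str) -> str:
    """Add the parcel's jars to the task classpath and its native libs to LD_LIBRARY_PATH."""
    classpath = f"{existing_classpath},{LZO_PARCEL}/lib/hadoop/lib/*"
    env = f"LD_LIBRARY_PATH={existing_native_path}:{LZO_PARCEL}/lib/hadoop/lib/native"
    return (property_xml("mapreduce.application.classpath", classpath)
            + property_xml("mapreduce.admin.user.env", env))


if __name__ == "__main__":
    # Example inputs only; copy the real current values from your cluster.
    print(core_site_snippet(["org.apache.hadoop.io.compress.DefaultCodec",
                             "org.apache.hadoop.io.compress.GzipCodec"]))
    print(mapred_site_snippet("$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*",
                              "$HADOOP_COMMON_HOME/lib/native"))
```

A property set in a safety valve replaces its default value instead of extending it, which is why the sketch takes the existing codec list, classpath and native path as inputs and appends to them; dropping the default MapReduce classpath entries would likely stop jobs from running at all.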

1 comment:

  1. Hi,

    Thanks for your article. It helped me with my cluster. However, I found that Hive needs yet another setting to work properly with LZO. I described it here:

    Hopefully it's helpful to somebody.