The page "Using the LZO Parcel" only covers MRv1, not YARN. It took me a while to figure out how to set up LZO in YARN correctly.
If YARN is not configured correctly, you may see any of these error messages:
- Class com.hadoop.compression.lzo.LzoCodec not found.
- Class com.hadoop.mapred.DeprecatedLzoTextInputFormat not found.
- No LZO codec found, cannot run.
- native-lzo library not available

These are the steps that worked for me:
- Follow the instructions in "Using the LZO Parcel" to install and activate the parcel.
- Add LzoCodec and LzopCodec. In Cloudera Manager, find the field for core-site.xml:
hdfs1 -> Configuration -> Service-Wide -> Advanced -> Cluster-wide Configuration Safety Valve for core-site.xml, and add this property:
  name: io.compression.codecs
  value: org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
- Add the classpath and native library. In Cloudera Manager, find this field for mapred-site.xml:
yarn1 -> Configuration -> Service-Wide -> Advanced -> YARN Service MapReduce Configuration Safety Valve, then add the following two properties:
  name: mapreduce.application.classpath
  value: $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
  name: mapreduce.admin.user.env
  value: LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
- Restart YARN and push the configuration files to the gateways.
- Don't forget to run "Deploy Client Configuration"
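
For reference, here is how the three properties from the steps above look in Hadoop's standard XML property format, which is what the safety valve fields accept as far as I know. Treat this as a sketch: the values are copied from the steps above and assume the parcel lives under /opt/cloudera/parcels/HADOOP_LZO, so adjust the paths if your install differs.

For the core-site.xml safety valve:

```xml
<!-- Registers the LZO codecs alongside the stock Hadoop codecs -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```

For the YARN Service MapReduce Configuration Safety Valve (mapred-site.xml):

```xml
<!-- Puts the hadoop-lzo jars from the parcel on the MapReduce classpath -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*</value>
</property>
<!-- Lets tasks find the native LZO library at runtime -->
<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native</value>
</property>
```

Roughly speaking, the "Class ... not found" errors point at the codec list or the classpath, and "native-lzo library not available" points at LD_LIBRARY_PATH.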