
Wednesday, September 11, 2013

Deploy LZO for YARN in CDH4

Cloudera's page "Using the LZO Parcel" covers only MRv1, not YARN. It took me a while to figure out how to set up LZO in YARN correctly.

If YARN is not configured correctly, you may see any of these error messages:

  • Class com.hadoop.compression.lzo.LzoCodec not found.
  • Class com.hadoop.mapred.DeprecatedLzoTextInputFormat not found.
  • No LZO codec found, cannot run.
  • native-lzo library not available

Here are the steps to set up LZO correctly:
  • Follow the instructions in "Using the LZO Parcel" to install and activate the parcel.
  • Add LzoCodec and LzopCodec. In Cloudera Manager, find the safety valve for core-site.xml at hdfs1 -> Configuration -> Service-Wide -> Advanced -> Cluster-wide Configuration Safety Valve for core-site.xml, and add this property:

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    
  • Add the LZO jars to the classpath and the native library to the library path. In Cloudera Manager, find the safety valve for mapred-site.xml at yarn1 -> Configuration -> Service-Wide -> Advanced -> YARN Service MapReduce Configuration Safety Valve, then add the following two properties:

    <property>
      <name>mapreduce.application.classpath</name>
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*</value>
    </property>
    <property>
      <name>mapreduce.admin.user.env</name>
      <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native</value>
    </property>
    
  • Restart YARN and push the configuration files to the gateways.
  • Don't forget to run "Deploy Client Configuration"
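
After deploying, it can help to confirm that the three settings above actually reached the client configuration. The following is only a rough sketch in Python, not part of Cloudera's tooling: the function names `read_properties`, `missing_lzo_settings`, and `check_conf_dir` are made up for illustration, and the configuration directory you pass in (for example /etc/hadoop/conf on a gateway) is an assumption that may differ on your cluster.

```python
import os
import xml.etree.ElementTree as ET

# Values taken from the steps above.
LZO_CODECS = {
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
}
LZO_PARCEL_LIB = "/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib"


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_lzo_settings(core_site_xml, mapred_site_xml):
    """Return a list of problems; an empty list means all settings were found."""
    problems = []
    core = read_properties(core_site_xml)
    mapred = read_properties(mapred_site_xml)

    # 1. Both LZO codecs must be listed in io.compression.codecs.
    codecs = {c.strip() for c in core.get("io.compression.codecs", "").split(",")}
    for codec in sorted(LZO_CODECS - codecs):
        problems.append("io.compression.codecs is missing " + codec)

    # 2. The parcel's jar directory must be on the MapReduce classpath.
    classpath = mapred.get("mapreduce.application.classpath", "").split(",")
    if not any(entry.strip().startswith(LZO_PARCEL_LIB) for entry in classpath):
        problems.append("mapreduce.application.classpath lacks the LZO parcel jars")

    # 3. The parcel's native directory must be in LD_LIBRARY_PATH for tasks.
    env = mapred.get("mapreduce.admin.user.env", "")
    if LZO_PARCEL_LIB + "/native" not in env:
        problems.append("mapreduce.admin.user.env lacks the LZO native library path")

    return problems


def check_conf_dir(conf_dir):
    """Run the check against core-site.xml and mapred-site.xml in conf_dir."""
    with open(os.path.join(conf_dir, "core-site.xml")) as f:
        core_xml = f.read()
    with open(os.path.join(conf_dir, "mapred-site.xml")) as f:
        mapred_xml = f.read()
    return missing_lzo_settings(core_xml, mapred_xml)
```

Calling `check_conf_dir` with the client configuration directory on a gateway host returns an empty list when everything is in place. Otherwise each entry names the setting that is still missing, which usually means the safety valve was not saved or "Deploy Client Configuration" was not run.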