Friday, March 14, 2014

Run Hadoop Shell Commands Super Fast

If you run Hadoop shell commands at the console or call them from a script, you will hate how slow they are: every command loads and starts a new JVM. A command like "hadoop fs -ls /tmp/abc" usually takes 3 to 4 seconds on my VirtualBox VM running CentOS 6.5 with 8 virtual cores and 12 GB of RAM.

$ time hadoop fs -ls /tmp/abc
Found 2 items
drwxrwxrwx   - bwang supergroup          0 2014-03-10 16:25 /tmp/abc/2014-03-10
drwxr-xr-x   - bwang supergroup          0 2014-03-14 14:57 /tmp/abc/567

real    0m3.632s
user    0m4.146s
sys     0m2.650s

I had been wondering whether Nailgun, which keeps one JVM running and executes Java programs inside it on request, could save time on these commands. I figured it out today, and it turns out to be pretty easy.

  • Install Nailgun: clone it from GitHub and follow the instructions in README.md. I only ran "mvn clean package" (which builds the server jar) and "make" (which builds the ng client).
    $ cd ~/git/nailgun
    $ mvn clean package
    $ make
    $ ls
    Makefile        nailgun-examples  ng       README.md
    nailgun-client  nailgun-server    pom.xml
    $ ls nailgun-server/target/
    apidocs                 nailgun-server-0.9.2-SNAPSHOT.jar
    classes                 nailgun-server-0.9.2-SNAPSHOT-javadoc.jar
    javadoc-bundle-options  nailgun-server-0.9.2-SNAPSHOT-sources.jar
    maven-archiver          surefire
    maven-status
    
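  • Start the Nailgun server: the trick is to put the Hadoop classpath (the output of "hadoop classpath") on the server's classpath, next to the Nailgun server jar. The server stays in the foreground, so run it in its own terminal.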
    $ java -cp `hadoop classpath`:/home/bwang/git/nailgun/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar com.martiansoftware.nailgun.NGServer
    NGServer 0.9.2-SNAPSHOT started on all interfaces, port 2113.
    
  • Set up aliases: with a shell alias for the ng client and a Nailgun alias for FsShell, you can type the same hadoop shell commands as before and have them run through Nailgun.
    $ alias hadoop='$HOME/git/nailgun/ng'
    $ hadoop ng-alias fs org.apache.hadoop.fs.FsShell
    $ hadoop ng-alias
    fs              org.apache.hadoop.fs.FsShell                      
    
    ng-alias        com.martiansoftware.nailgun.builtins.NGAlias      
                    Displays and manages command aliases
    
    ng-cp           com.martiansoftware.nailgun.builtins.NGClasspath  
                    Displays and manages the current system classpath
    
    ng-stats        com.martiansoftware.nailgun.builtins.NGServerStats
                    Displays nail statistics
    
    ng-stop         com.martiansoftware.nailgun.builtins.NGStop       
                    Shuts down the nailgun server
    
    ng-version      com.martiansoftware.nailgun.builtins.NGVersion    
                    Displays the server version number.
    $ time hadoop fs -ls /tmp/abc
    Found 2 items
    drwxrwxrwx   - bwang supergroup          0 2014-03-10 16:25 /tmp/abc/2014-03-10
    drwxr-xr-x   - bwang supergroup          0 2014-03-14 14:57 /tmp/abc/567
    
    real    0m0.046s
    user    0m0.000s
    sys     0m0.008s
    
  • Create a shell script so that you don't have to remember those long commands.
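
Here is a minimal sketch of such a script; I have not battle-tested it. It assumes Nailgun is built in $HOME/git/nailgun as above, the server listens on its default port 2113, and the ng client exits with a non-zero status when it cannot reach the server. The file name ng-hadoop.sh, the log path /tmp/ngserver.log and the function names ng_hadoop_start, ng_hadoop_stop and ng_hadoop_running are my own hypothetical choices. It replaces the hadoop alias with a hadoop function that sends "hadoop fs ..." through Nailgun when the server is up and falls back to the regular hadoop script otherwise.

```shell
#!/usr/bin/env bash
# ng-hadoop.sh (hypothetical name): helpers for running "hadoop fs"
# through a long-lived Nailgun server. Source it, e.g. from ~/.bashrc:
#   . ~/bin/ng-hadoop.sh
# Assumptions: Nailgun was built in $HOME/git/nailgun as described above,
# and the server uses its default port (2113).

NG_HOME="${NG_HOME:-$HOME/git/nailgun}"
NG_CLIENT="${NG_CLIENT:-$NG_HOME/ng}"
NG_SERVER_JAR="$NG_HOME/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar"
NG_LOG="${NG_LOG:-/tmp/ngserver.log}"   # hypothetical log location

# The hadoop function below replaces the shell alias from the previous
# step; drop the alias first so it cannot clash with the definition.
unalias hadoop 2>/dev/null || true

# Succeeds if a Nailgun server answers (assumes ng exits non-zero when
# it cannot connect).
ng_hadoop_running() {
  "$NG_CLIENT" ng-version >/dev/null 2>&1
}

# Start NGServer in the background with the Hadoop classpath, wait until
# it answers, then register the "fs" alias. Aliases added with ng-alias
# live in the running server, so they are re-added after every start.
ng_hadoop_start() {
  if ! ng_hadoop_running; then
    nohup java -cp "$(command hadoop classpath):$NG_SERVER_JAR" \
      com.martiansoftware.nailgun.NGServer >"$NG_LOG" 2>&1 &
    # Poll for up to about 10 seconds (20 tries, 0.5 s apart).
    local i=0
    while [ "$i" -lt 20 ] && ! ng_hadoop_running; do
      sleep 0.5
      i=$((i + 1))
    done
  fi
  if ! ng_hadoop_running; then
    echo "Nailgun server did not come up; see $NG_LOG" >&2
    return 1
  fi
  "$NG_CLIENT" ng-alias fs org.apache.hadoop.fs.FsShell
}

# Shut the server down with the ng-stop built-in.
ng_hadoop_stop() {
  "$NG_CLIENT" ng-stop
}

# Drop-in "hadoop": "hadoop fs ..." goes through Nailgun when the server
# is up; anything else (and fs while the server is down) uses the real
# hadoop script, which "command" finds on PATH instead of this function.
hadoop() {
  if [ "$1" = "fs" ] && ng_hadoop_running; then
    "$NG_CLIENT" "$@"
  else
    command hadoop "$@"
  fi
}
```

Run ng_hadoop_start once per session, then "hadoop fs -ls /tmp/abc" goes through Nailgun, and ng_hadoop_stop shuts the server down. The liveness check costs one extra round trip to the server per command; if that matters, call $HOME/git/nailgun/ng fs ... directly.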
