If you run Hadoop shell commands on the console or use them in scripts, you will quickly come to hate them, because every command loads and starts a new JVM. A command like "hadoop fs -ls /tmp/abc" usually takes 3~4 seconds on my VirtualBox VM running CentOS 6.5 with 8 virtual cores and 12 GB of memory.
$ time hadoop fs -ls /tmp/abc
Found 2 items
drwxrwxrwx   - bwang supergroup          0 2014-03-10 16:25 /tmp/abc/2014-03-10
drwxr-xr-x   - bwang supergroup          0 2014-03-14 14:57 /tmp/abc/567

real    0m3.632s
user    0m4.146s
sys     0m2.650s
I have been curious whether Nailgun could save me time when running those commands. I finally tried it today, and it turned out to be pretty easy.
- Install Nailgun: just clone it from GitHub and follow the instructions in README.md. I only ran "mvn clean package" and "make".
$ cd ~/git/nailgun
$ mvn clean package
$ make
$ ls
Makefile        nailgun-examples  ng       README.md
nailgun-client  nailgun-server    pom.xml
$ ls nailgun-server/target/
apidocs                 nailgun-server-0.9.2-SNAPSHOT.jar
classes                 nailgun-server-0.9.2-SNAPSHOT-javadoc.jar
javadoc-bundle-options  nailgun-server-0.9.2-SNAPSHOT-sources.jar
maven-archiver          surefire
maven-status
- Start the Nailgun server: the trick is that you need to put the Hadoop classpath on the server's classpath.
$ java -cp `hadoop classpath`:/home/bwang/git/nailgun/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar com.martiansoftware.nailgun.NGServer
NGServer 0.9.2-SNAPSHOT started on all interfaces, port 2113.
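The start command above can be wrapped in a couple of shell functions so the long classpath does not have to be typed each time. This is only a sketch: the function names `ng_classpath` and `start_ng_server` and the `NG_HOME`/`NG_JAR` variables are made up for illustration, and the default paths simply mirror the ones used in this post.

```shell
#!/bin/sh
# Sketch only: NG_HOME and NG_JAR are hypothetical variables; the defaults
# mirror the checkout location and jar name shown in this post.
NG_HOME="${NG_HOME:-$HOME/git/nailgun}"
NG_JAR="${NG_JAR:-$NG_HOME/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar}"

# Append the Nailgun server jar to the classpath string passed as $1.
ng_classpath() {
    printf '%s:%s\n' "$1" "$NG_JAR"
}

# Start NGServer in the background with the Hadoop classpath.
# `command` skips any alias or shell function named hadoop, so the
# real hadoop executable on PATH is the one that gets asked.
start_ng_server() {
    hadoop_cp=$(command hadoop classpath) || return 1
    java -cp "$(ng_classpath "$hadoop_cp")" com.martiansoftware.nailgun.NGServer &
}
```

After sourcing the file, running `start_ng_server` should print the same "NGServer ... started" line as the manual command. Quoting the classpath keeps entries that end in `/*` from being expanded by the shell, so java receives them unchanged.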
- Set up aliases: alias "hadoop" to the Nailgun client and register an "fs" alias on the server, so that you can type the same hadoop shell commands as before and have them run through Nailgun.
$ alias hadoop='$HOME/git/nailgun/ng'
$ hadoop ng-alias fs org.apache.hadoop.fs.FsShell
$ hadoop ng-alias fs org.apache.hadoop.fs.FsShell
ng-alias    com.martiansoftware.nailgun.builtins.NGAlias
            Displays and manages command aliases
ng-cp       com.martiansoftware.nailgun.builtins.NGClasspath
            Displays and manages the current system classpath
ng-stats    com.martiansoftware.nailgun.builtins.NGServerStats
            Displays nail statistics
ng-stop     com.martiansoftware.nailgun.builtins.NGStop
            Shuts down the nailgun server
ng-version  com.martiansoftware.nailgun.builtins.NGVersion
            Displays the server version number.
$ time hadoop fs -ls /tmp/abc
Found 2 items
drwxrwxrwx   - bwang supergroup          0 2014-03-10 16:25 /tmp/abc/2014-03-10
drwxr-xr-x   - bwang supergroup          0 2014-03-14 14:57 /tmp/abc/567

real    0m0.046s
user    0m0.000s
sys     0m0.008s
- Create some shell scripts so that you don't have to remember those long commands.
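As one possible shape for such a script, here is a minimal sketch of a wrapper that sends "fs" commands through Nailgun when a server answers and falls back to the regular, slow hadoop command otherwise. The names `hfs`, `ng_server_up` and the `NG` variable are invented for this example. It assumes the "fs" alias has already been registered on the server, and that the ng client exits with a non-zero status when it cannot reach a server.

```shell
#!/bin/sh
# Sketch only: NG is a hypothetical variable pointing at the ng client
# built earlier; the default mirrors the path used in this post.
NG="${NG:-$HOME/git/nailgun/ng}"

# Succeeds when a Nailgun server answers the built-in ng-version alias.
# Assumption: ng returns non-zero when no server is reachable.
ng_server_up() {
    "$NG" ng-version >/dev/null 2>&1
}

# Run a "hadoop fs" command through Nailgun if possible, otherwise
# through the real hadoop executable (bypassing aliases and functions).
hfs() {
    if ng_server_up; then
        "$NG" fs "$@"
    else
        command hadoop fs "$@"
    fi
}
```

With this sourced, `hfs -ls /tmp/abc` behaves like `hadoop fs -ls /tmp/abc` either way; only the startup time differs. Using a separate name such as `hfs` rather than aliasing `hadoop` itself keeps other subcommands like `hadoop classpath` working in the same shell.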