```ruby
#!/usr/bin/env ruby
require 'rubygems'
require 'pty'
require 'expect'
require 'io/console'

hosts_file = ARGV[0] || "hosts"

print "Password:"
password = $stdin.noecho(&:gets)
password.chomp!
puts

$expect_verbose = true

File.open(hosts_file).each do |host|
  host.chomp!
  print "Copying id to #{host} ... "
  begin
    PTY.spawn("ssh-copy-id #{host}") do |cp_out, cp_in, pid|
      begin
        pattern = /#{host}'s password:/
        cp_out.expect(pattern, 10) do |m|
          cp_in.printf("#{password}\n")
        end
        # Drain everything the child writes; without this the spawned
        # process can block and never return.
        cp_out.readlines
      rescue Errno::EIO
        # The child exited and closed its side of the pty.
      ensure
        Process.wait(pid)
      end
    end
  rescue PTY::ChildExited => e
    puts "Exited: #{e.status}"
  end

  status = $?
  if status.success?
    puts "Done!"
  else
    puts "Failed with exit code #{status.exitstatus}!"
  end
end
```
Thursday, December 27, 2012
Have to read all from Ruby PTY output
I wrote a Ruby script that calls ssh-copy-id to distribute my public key to a list of hosts. It took me a while to make it work. The code is simple: read the password, then call ssh-copy-id for each host to copy the public key. Of course, the code doesn't handle every scenario, such as a key that has already been copied or a wrong password. The tricky parts are `cp_out.readlines` and `Process.wait(pid)`. If you don't read all the data from cp_out (try commenting out `cp_out.readlines`), the spawned process never returns.
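The drain-before-wait pattern can be reduced to a minimal sketch. This is not the script above, just an illustration; `echo hello` stands in for ssh-copy-id:

```ruby
require 'pty'

# Minimal sketch: read everything the child writes to the pty before
# calling Process.wait. If the output is never drained, the child can
# block on a full pty buffer and wait never returns.
output = ""
PTY.spawn("echo hello") do |r, _w, pid|
  begin
    r.each_line { |line| output << line }
  rescue Errno::EIO
    # The child exited and closed its side of the pty.
  ensure
    Process.wait(pid)
  end
end
```

The same `rescue Errno::EIO / ensure Process.wait` shape appears in the full script; the only extra step there is answering the password prompt first via `expect`.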
Tuesday, December 11, 2012
Puppet require vs. include vs. class
According to the Puppet reference for include and require:
- Both include and require are functions;
- Both "evaluate one or more classes";
- Neither can handle parameterized classes;
- require is a superset of include, because it also "adds the required class as a dependency";
- require can cause a "nasty dependency cycle";
- require is "largely unnecessary"; see the Puppet language guide:
Puppet also has a require function, which can be used inside class definitions and which does implicitly declare a class, in the same way that the include function does. This function doesn’t play well with parameterized classes. The require function is largely unnecessary, as class-level dependencies can be managed in other ways.
- We can include a class multiple times, but we cannot declare a class multiple times:
```puppet
class inner {
  notice("I'm inner")
  file { "/tmp/abc":
    ensure => directory,
  }
}

class outer_a {
  # include inner
  class { "inner": }
  notice("I'm outer_a")
}

class outer_b {
  # include inner
  class { "inner": }
  notice("I'm outer_b")
}

include outer_a
include outer_b
```
Duplicate declaration: Class[Inner] is already declared in file /home/bewang/temp/puppet/require.pp at line 11; cannot redeclare at /home/bewang/temp/puppet/require.pp:18 on node pmaster.puppet-test.com
- You can safely include a class multiple times; the first two examples below pass. But you cannot declare class inner after outer_a or outer_b has been included, as in the third example:
```puppet
class inner { }

class outer_a {
  include inner
}

class outer_b {
  include inner
}

class { "inner": }
include outer_a
include outer_b
```
```puppet
class { "inner": }
class { "outer_a": }
class { "outer_b": }
```
```puppet
include outer_a
include outer_b
class { "inner": }  # Duplicate declaration error
```
Thursday, December 6, 2012
Hive Metastore Configuration
Recently I wrote a post about the bad performance of the Hive metastore for tables with a large number of partitions. I ran tests in our environment. Here is what I found:
- Don't configure a Hive client to access a remote MySQL database directly, as follows. The performance is really bad, especially when you query a table with a large number of partitions.
```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql_server/hive_meta</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_user</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
```
- On the database server, use the same configuration as above.
- Start the Hive metastore service:
```sh
hive --service metastore

# If using CDH:
yum install hive-metastore
/sbin/service hive-metastore start
```
- On Hive clients, configure only the metastore URI:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://mysql_server:9083</value>
</property>
```
If you set both, Hive complains:

```
ERROR conf.HiveConf: Found both hive.metastore.uris and javax.jdo.option.ConnectionURL
Recommended to have exactly one of those config key in configuration
```

The reason the direct-JDBC configuration is so slow: when Hive does partition pruning, it reads a list of partitions, and the current metastore implementation uses JDO to query the metastore database:
- Get a list of partition names using db.getPartitionNames().
- Then call db.getPartitionsByName(List&lt;String&gt; partNames). If the list is too large, it is loaded in multiple batches, 300 partitions per batch by default. The JDO calls go like this:
- For one MPartition object:
  - Send 1 query to retrieve the MPartition basic fields.
  - Send 1 query to retrieve the MStorageDescriptor.
  - Send 1 query to retrieve data from PART_PARAMS.
  - Send 1 query to retrieve data from PARTITION_KEY_VALS.
  - ...
- In total, about 10 queries for one MPartition. Because each MPartition is converted into a Partition before being sent back, all fields must be populated.
- One query takes about 40 ms in my environment; you can calculate how long that takes for thousands of partitions.
- With a remote Hive metastore service, all those queries happen locally on the metastore host, so each query is much cheaper and performance improves significantly. But there are still a lot of queries.
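To put rough numbers on the cost model above, here is a back-of-the-envelope estimate. The constants (about 10 queries per MPartition, about 40 ms per query over the remote MySQL link) come from my measurements above; the helper `estimated_seconds` is just for illustration:

```ruby
# Rough cost model for JDO partition loading over a remote MySQL link.
# Assumptions (from the measurements above): ~10 queries per MPartition,
# ~40 ms per query.
QUERIES_PER_PARTITION = 10
MS_PER_QUERY = 40

def estimated_seconds(partitions)
  partitions * QUERIES_PER_PARTITION * MS_PER_QUERY / 1000.0
end

[100, 1000, 3000].each do |n|
  puts "#{n} partitions: ~#{estimated_seconds(n)} s"
end
```

For 1000 partitions this predicts roughly 400 seconds, which is in the same ballpark as the ~575 seconds measured below.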
I also wrote an ObjectStore implementation using EclipseLink JPA with @BatchFetch. Here are the test results: it is at least 6 times faster than the remote metastore service, and it could be made faster still.
Times are in milliseconds:

Partitions | JDO Remote MySQL | Remote Service | EclipseLink Remote MySQL
-----------|------------------|----------------|-------------------------
10         | 6,142            | 353            | 569
100        | 57,076           | 3,914          | 940
200        | 116,216          | 5,254          | 1,211
500        | 287,416          | 21,385         | 3,711
1000       | 574,606          | 39,846         | 6,652
3000       | -                | 132,645        | 19,518