Tuesday, October 18, 2011

Puppet logs

It took me hours to figure out how to make Puppet write its log files. I'm using EPEL puppet-2.6.6 on CentOS 5.6 x86_64. The Puppet documentation is misleading: the Puppet Configuration Reference page lists puppetdlog and masterlog, but when I set them to syslog or to a file in /var/log/puppet, neither worked. It turns out that you have to pass the --logdest option on the command line. If you run Puppet as a service, you can set the option in /etc/sysconfig/puppet and /etc/sysconfig/puppetmaster. Here is /etc/sysconfig/puppet:
# Where to log to. Specify syslog to send log messages to the system log.
PUPPET_LOG=/var/log/puppet/agent.log

# Autoflush logs
PUPPET_EXTRA_OPTS=--autoflush
and /etc/sysconfig/puppetmaster:
PUPPETMASTER_LOG=/var/log/puppet/master.log

PUPPETMASTER_EXTRA_OPTS=--autoflush
It is better to add --autoflush. I like to use puppet kick and watch the log. Without --autoflush, Puppet appears not to be working, because the log is not written to disk right away.

Thursday, October 13, 2011

Hadoop Cluster Monitoring

Set zGangliaHost for a lot of servers

I realized that setting zGangliaHost for my Hadoop clusters, 55 servers in total, was going to be a big problem. Fortunately, Zenoss is powerful if you can write a little Python code. Here is my solution. Load the devices using zenbatchload like this:
$ zenbatchload dev-cluster.txt
Here is the dev-cluster.txt:
/Devices/Server/SSH/Linux/Ganglia
    devnode001 comments="My Hadoop DEV cluster, client node", zGangliaHost="devnode001", setGroups='/Hadoop/DEV/ClientNode'
    devnode002 comments="My Hadoop DEV cluster, master node", zGangliaHost="devnode001", setGroups='/Hadoop/DEV/MasterNode'
    devnode003 comments="My Hadoop DEV cluster, data node", zGangliaHost="devnode001", setGroups='/Hadoop/DEV/DataNode'
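A file in this format is easy to generate. Here is a minimal sketch, assuming you keep the node list as (hostname, comment, group) tuples; the function name batchload_lines and the NODES list are my own illustration and not part of Zenoss:

```python
def batchload_lines(device_class, ganglia_host, nodes):
    """Return the lines of a zenbatchload input file.

    device_class goes on the first line; every node is indented below it
    with its comments, zGangliaHost and setGroups settings.
    """
    template = ("    {host} comments=\"{comment}\", "
                "zGangliaHost=\"{gmond}\", setGroups='{group}'")
    lines = [device_class]
    for host, comment, group in nodes:
        lines.append(template.format(host=host, comment=comment,
                                     gmond=ganglia_host, group=group))
    return lines


# Hypothetical node list that mirrors the example file above.
NODES = [
    ("devnode001", "My Hadoop DEV cluster, client node", "/Hadoop/DEV/ClientNode"),
    ("devnode002", "My Hadoop DEV cluster, master node", "/Hadoop/DEV/MasterNode"),
    ("devnode003", "My Hadoop DEV cluster, data node", "/Hadoop/DEV/DataNode"),
]

print("\n".join(batchload_lines("/Devices/Server/SSH/Linux/Ganglia",
                                "devnode001", NODES)))
```

Redirect the output to dev-cluster.txt and pass that file to zenbatchload.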
You can set zGangliaHost and the groups in one file, which is perfect, and you can easily generate this file with a script. The ZENBATCHLOAD HOW TO may be obsolete: my Zenoss version (3.1.0) doesn't use -i.

In my case, my Zenoss administrator had already created the devices for my clusters without zGangliaHost, and I didn't want him to delete those devices and reload them with zenbatchload. Here is my solution:
$ zendmd --script=set_gangliahost.py
Here is set_gangliahost.py:
import re

dev = re.compile('(hdcl001|had002|had01[0-2]).*', re.IGNORECASE)
test = re.compile('(hdcledw002|had001|had01[0-2]).*', re.IGNORECASE)
prod = re.compile('prod.*', re.IGNORECASE)
for item in dmd.Devices.Server.SSH.Linux.Ganglia.devices.objectItems():
    (name, device) = item
    if dev.match(name):
        device.zGangliaHost = 'dev-gmond'
    elif test.match(name):
        device.zGangliaHost = 'test-gmond'
    elif prod.match(name):
        device.zGangliaHost = 'prod-gmond'
commit()
dev-gmond is the name of the server where gmond runs.
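Before committing anything in zendmd, it can help to dry-run the name matching outside Zenoss. This is only a sketch: it reuses the three patterns from the script, while the helper ganglia_host_for and the sample host names are made up for illustration. Note that the first matching pattern wins, so a name that matches both the dev and the test pattern ends up on dev-gmond.

```python
import re

# Same patterns as in set_gangliahost.py, checked in the same order.
RULES = [
    (re.compile(r'(hdcl001|had002|had01[0-2]).*', re.IGNORECASE), 'dev-gmond'),
    (re.compile(r'(hdcledw002|had001|had01[0-2]).*', re.IGNORECASE), 'test-gmond'),
    (re.compile(r'prod.*', re.IGNORECASE), 'prod-gmond'),
]


def ganglia_host_for(name):
    """Return the gmond host for a device name, or None if nothing matches."""
    for pattern, gmond in RULES:
        if pattern.match(name):
            return gmond
    return None


# Hypothetical device names, only to show the mapping.
for name in ['hdcl001', 'HAD001', 'had011', 'prod-node01', 'webserver01']:
    print(name, '->', ganglia_host_for(name))
```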

Zenoss Ganglia ZenPack Fix

My company uses Zenoss to monitor all Linux hosts, and we want to use the Ganglia ZenPack to monitor our Hadoop clusters. The ZenPack from this link doesn't work in my Zenoss, community version 3.1.0. A lot of weird things happened when I loaded the egg using "Advanced -> Settings -> ZenPacks -> Install ZenPack ...". I also tried the source code of the ZenPack on GitHub, with no luck. Finally I fixed the Ganglia ZenPack and it worked perfectly in Zenoss 3.1.0. I created a new ZenPack in Zenoss using "Create ZenPack ..." and used the folders it generated under $ZENHOME/ZenPacks as a skeleton. Then I copied in the files from the GitHub ZenPack, except the skins. I also changed this line, but I don't remember whether that was important.
diff ZenPacks/jschroeder/GangliaMonitor/datasources/GangliaMonitorDataSource.py /workplace/ws-zenpacks/ZenPacks.jschroeder.GangliaMonitor/ZenPacks/jschroeder/GangliaMonitor/datasources/GangliaMonitorDataSource.py 
72c72
<             return self.hostname
---
>             return self.host
I am new to Zenoss myself, so the following may be useful if you are too:
  • Switch to the zenoss user: sudo su - zenoss
  • Run zenpack --link --install=/tmp/ZenPacks.jschroeder.GangliaMonitor
  • If everything is correct, you should see the ZenPack in the web page.
You are better off using the command line tool zenpack; it will save you a lot of time if you are new to Zenoss and Python. You will probably need to run zopectl restart from time to time. In my experience with Zenoss, if your ZenPack works, everything looks perfect. If something is wrong in your ZenPack, you are doomed, because Zenoss doesn't give you much useful information, especially in the web interface. For example, I deleted MANIFEST.in from the folder, and the egg was built without the libexec folder and objects/objects.xml. After I installed this egg, the ZenPack appeared in the web page once I ran "zopectl restart", but it never worked as I expected.
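Since Zenoss stays quiet about an incomplete egg, a quick look inside the egg before installing can save some time. An egg is a zip archive, so a rough sketch with the standard zipfile module could look like this; the function name missing_from_egg and the default list of required pieces are my own assumptions based on the two things that were missing in my case:

```python
import zipfile


def missing_from_egg(egg, required=("libexec/", "objects/objects.xml")):
    """Return the required pieces that no entry in the egg contains.

    egg is a path to the .egg file (or a file-like object).
    """
    with zipfile.ZipFile(egg) as archive:
        names = archive.namelist()
    return [piece for piece in required
            if not any(piece in name for name in names)]
```

An empty list means both pieces were found somewhere in the archive; anything else names what the build left out.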

Thursday, October 6, 2011

Puppet kick

I encountered several problems when I tried puppet kick. I had set up /etc/hosts to resolve pslave1 and could ping the host, but the kick still failed. It turned out that I had to open tcp/8139 on pslave1's firewall. This is the failure I saw before doing so:
$ sudo puppet kick -f --debug --host pslave1.puppet-test.com
Triggering pslave1.puppet-test.com
Host pslave1.puppet-test.com failed: No route to host - connect(2)
pslave1.puppet-test.com finished with exit code 2
Failed: pslave1.puppet-test.com
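A ping only proves that the host is up, not that the port is reachable. A small check like the following would have pointed me at the firewall sooner. It is only a sketch using the standard socket module; the function name port_is_open is my own:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and "No route to host".
        return False
    conn.close()
    return True
```

For the setup above, the call would be port_is_open("pslave1.puppet-test.com", 8139).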
Then I ran into another problem. I added the following to /etc/puppet/auth.conf (THIS IS WRONG):
# this one is not stricly necessary, but it has the merit
# to show the default policy which is deny everything else
path /
auth any

path /run
method save
allow pmaster.puppet-test.com
And I ran this command to create namespaceauth.conf:
sudo touch /etc/puppet/namespaceauth.conf
But it still didn't allow me to kick the agent:
warning: Denying access: Forbidden request: pmaster.puppet-test.com(192.168.56.101) access to /run/pslave1.puppet-test.com [save] authenticated  at line 93
err: Forbidden request: pmaster.puppet-test.com(192.168.56.101) access to /run/pslave1.puppet-test.com [save] authenticated  at line 93
Finally I found out why: I had put "path /run" after "path /". Here is the correct auth.conf:
path /run
auth any
method save
allow pmaster.puppet-test.com

# this one is not stricly necessary, but it has the merit
# to show the default policy which is deny everything else
path /
auth any
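The reason the order matters is that the first rule whose path matches the request decides the outcome. The toy model below illustrates that behaviour. It is a simplified sketch and not Puppet's actual implementation; the rule dictionaries and the is_allowed function are invented for this example.

```python
def is_allowed(rules, path, method, host):
    """First-match-wins check: the first rule that matches decides."""
    for rule in rules:
        if not path.startswith(rule["path"]):
            continue
        if "method" in rule and method not in rule["method"]:
            continue
        # A matching rule without an allow list denies everybody.
        return host in rule.get("allow", [])
    return False


RUN_RULE = {"path": "/run", "method": ["save"],
            "allow": ["pmaster.puppet-test.com"]}
CATCH_ALL = {"path": "/"}

WRONG_ORDER = [CATCH_ALL, RUN_RULE]   # "path /" swallows the /run request
RIGHT_ORDER = [RUN_RULE, CATCH_ALL]   # "path /run" gets to decide first

request = ("/run/pslave1.puppet-test.com", "save", "pmaster.puppet-test.com")
print(is_allowed(WRONG_ORDER, *request))
print(is_allowed(RIGHT_ORDER, *request))
```

In this model the catch-all rule placed first matches every path and has no allow list, so the request is denied before the /run rule is ever consulted.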
You can run puppet agent like this to get the debug information:
sudo puppet agent --listen --debug --no-daemonize --verbose

Puppet master, symlink and SELinux

I created a Puppet module p4 under my home folder and symlinked the module folder into /etc/puppet/modules. I could run sudo puppet apply test.pp successfully on the master, but when I ran
sudo puppet agent --no-daemonize --verbose --onetime
on an agent machine, I got the following error:
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class p4 at /etc/puppet/manifests/nodes.pp:2 on node pslave1.puppet-test.com

This page is helpful: http://groups.google.com/group/puppet-users/browse_thread/thread/66361418d801a97c. But my situation was different: the permissions on the module folders were rwxrwxr-x. I ran this command:
sudo strace -e trace=file -f puppet master --no-daemonize --debug 2>&1 | tee log
It turned out that there WAS a "permission denied" issue:
[pid 15508] stat("/etc/puppet/modules/p4", 0x7fff44cfb630) = -1 EACCES (Permission denied)
After I copied the p4 folder to /usr/share/puppet/modules, everything worked. SELinux is installed on my CentOS machine, so it must have been SELinux that blocked Puppet from accessing the files.