Thursday, January 5, 2012

Debug Python Process

The Python process of our Cloudera Hue installation sometimes uses 100% CPU. I found this method on Stack Overflow. It worked: I still couldn't find out why Hue goes into an infinite loop, but at least I can attach to the process and see what it is doing. Here is the link: http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application
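The linked question is about showing the stack trace of a running Python application, and one common approach is to install a signal handler ahead of time. Here is a minimal sketch of that idea, not the exact code from the answers: the function names are my own, and SIGUSR1 is just an assumed choice of signal.

```python
import os
import signal
import sys
import traceback


def format_stack(frame):
    """Return the Python stack ending at `frame` as a single string."""
    return "Stack trace (pid %d):\n%s" % (
        os.getpid(), "".join(traceback.format_stack(frame)))


def dump_stack(signum, frame):
    """Signal handler: write the interrupted stack to stderr, then carry on."""
    sys.stderr.write(format_stack(frame))
    sys.stderr.flush()


def install_stack_dump_handler(signum=signal.SIGUSR1):
    """Register dump_stack for `signum` (SIGUSR1 is an arbitrary choice).

    Must be called from the main thread, before the process gets stuck.
    """
    signal.signal(signum, dump_stack)
```

After calling install_stack_dump_handler() at startup, running kill -USR1 PID from a shell should make the process print where it currently is. Note that the handler only sees the stack of the main thread.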
However, sometimes I need to debug a process that I didn't have the foresight to install the signal handler in. On Linux, you can attach gdb to the process and get a Python stack trace with some gdb macros. Put http://svn.python.org/projects/python/trunk/Misc/gdbinit in ~/.gdbinit, then:

Attach gdb: gdb -p PID
Get the python stack trace: pystack
It's not totally reliable unfortunately, but it works most of the time.
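If you need to do this often, the two gdb steps above can be wrapped in a small script. This is only a sketch under assumptions: it relies on gdb's -batch and -ex options, expects the pystack macro to be loaded from ~/.gdbinit as described above, and the helper names are hypothetical.

```python
import subprocess


def build_pystack_command(pid):
    """Build the gdb command line that attaches to `pid` and runs pystack."""
    # -batch makes gdb exit after running the commands; -ex runs one command.
    return ["gdb", "-batch", "-ex", "pystack", "-p", str(int(pid))]


def print_pystack(pid):
    """Run gdb against `pid` and return its exit status.

    Assumes gdb is installed and that you are allowed to attach to the process.
    """
    return subprocess.call(build_pystack_command(pid))
```

For example, print_pystack(12345) would attach to process 12345 (a made-up PID), print whatever pystack reports, and detach.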
Here are some better ways to debug a Python process:
  • Use strace -p PID. If the process is running as another user, use sudo -u hue strace -p PID, where hue is the user account for Cloudera Hue.
  • Install python-debuginfo-2.4.3-44.el5 for Python 2.4 or python26-debuginfo-2.6.5-6.el5 for Python 2.6. To get those packages from a yum repository, create /etc/yum.repos.d/debuginfo.repo:
    [debuginfo]
    name=CentOS-$releasever - DebugInfo
    # CentOS-5
    baseurl=http://debuginfo.centos.org/$releasever/$basearch/
    gpgcheck=0
    enabled=1
    # CentOS-5
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
    protect=1
    priority=1