Thursday, January 5, 2012

Debug Python Process

The Python process of our Cloudera Hue sometimes uses 100% CPU. I found this method on Stack Overflow. I still couldn't find out why Hue goes into an infinite loop, but at least it lets me attach to the process and see what it is doing. Here is the link:
However, sometimes I need to debug a process that I didn't have the foresight to install the signal handler in. On Linux, you can attach gdb to the process and get a Python stack trace with some gdb macros. Put the macros in ~/.gdbinit, then:

Attach gdb: gdb -p PID
Get the python stack trace: pystack
It's not totally reliable unfortunately, but it works most of the time.
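The quote refers to a signal handler installed in advance, but its code isn't reproduced here. As a sketch of the general idea only (not the original answer's code), assuming a Unix system and an arbitrarily chosen SIGUSR1, a program can register a handler that prints its current Python stack whenever it receives that signal:

```python
import signal
import sys
import traceback


def dump_stack(signum, frame):
    """Signal handler: print the Python stack that was running when the signal arrived.

    Python runs signal handlers in the main thread, so `frame` is the main
    thread's current frame; other threads are not shown.
    """
    sys.stderr.write("Received signal %d, current stack:\n" % signum)
    traceback.print_stack(frame, file=sys.stderr)
    sys.stderr.flush()


def install_stack_dumper(signum=signal.SIGUSR1):
    """Register dump_stack for `signum` (SIGUSR1 is an arbitrary choice here)."""
    signal.signal(signum, dump_stack)
```

Call install_stack_dumper() early in the program, then run kill -USR1 PID from a shell to get the stack on the process's stderr. On Python 3.3 and later, faulthandler.register(signal.SIGUSR1) from the standard library does the same job and dumps all threads by default.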
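The two gdb steps in the quote can also be run non-interactively, which is handy when you want to grab a stack trace quickly and let the process keep running. A rough sketch, assuming gdb is on the PATH and the pystack macro is already defined in ~/.gdbinit; the function names are hypothetical:

```python
import subprocess


def pystack_command(pid):
    """Return the argv for a one-shot gdb run that attaches to pid and runs pystack.

    -batch makes gdb exit after running the -ex commands; it still reads
    ~/.gdbinit first, which is where the pystack macro is assumed to be defined.
    When gdb exits it detaches from the attached process.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "pystack"]


def dump_python_stack(pid):
    """Run the gdb command and return its combined stdout/stderr as text."""
    proc = subprocess.Popen(
        pystack_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return output
```

For example, print(dump_python_stack(PID)) with the PID of the busy process. As the quote warns, the macros are not totally reliable, so the output may occasionally be incomplete.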
Here are some better ways to debug a Python process:
  • Use strace -p PID. If the process is running as another user, use sudo -u hue strace -p PID, where hue is the user account Cloudera Hue runs under.
  • Install python-debuginfo-2.4.3-44.el5 for Python 2.4 or python26-debuginfo-2.6.5-6.el5 for Python 2.6. To get those packages from yum, create /etc/yum.repos.d/debuginfo.repo with this repository definition:
    name=CentOS-$releasever - DebugInfo
    # CentOS-5