Roll Your Own Java Daemon OutOfMemory Handler

In this post I share one way of getting a Java daemon running on Java 5 back up after it has ground to a halt with the dreaded java.lang.OutOfMemoryError. I won't be hunting for potential memory leaks here. Instead I take a sys admin view, where the only concern is getting a system daemon that has been knocked down back up again.

First, some context. I had the privilege of dedicating a few weeks to sys admin work, and concentrated on writing an extensive shell script to install and configure JBoss 4.2.2.GA on Ubuntu 10.04 LTS. I won't go into the details, but one relevant issue I had to grapple with was choosing suitable JVM memory options for what was in effect a “Hello World” web site, which I configured as the Tomcat root. I ended up configuring the JVM with what I took to be the bare minimum amount of memory and then started experimenting with Apache JMeter. To my horror, one of my first iterations of a very basic HTTP Test Plan resulted in a java.lang.OutOfMemoryError: my simple “Hello World” web site had keeled over because the underlying JVM ran out of memory. In my mind the next step was to take a sys admin view and simply get the service back up once it had died.

The first consideration is how to detect that an OutOfMemoryError has occurred. One option is to write a script that watches your JBoss log file, waiting to pounce once it sees an OutOfMemoryError. Such work is unnecessary, however, thanks to the -XX:+HeapDumpOnOutOfMemoryError command-line option, which was introduced in Java SE 5.0 update 7 (search for HeapDumpOnOutOfMemoryError in the release notes for that update for more details). As detailed in the Sun Java 5 Troubleshooting and Diagnostic Guide, this option tells the VM to generate a heap dump when the first thread throws a java.lang.OutOfMemoryError because the Java heap or the permanent generation is full. Regrettably, no release of the Sun Java 5 JVM offers an option to run a script when an OutOfMemoryError occurs. This was rectified in the Sun Java 6 JVM with the -XX:OnOutOfMemoryError="<cmd args>;<cmd args>" option. However, the -XX:HeapDumpPath option lets one choose the file the heap dump is written to, so a script can watch for that file and react when it appears.

I could not upgrade to the Sun Java 6 JVM, so I chose to write the heap dump to a file. The relevant section of my JBoss 4.2.2.GA run.conf file then looked like this:


#
# Specify options to pass to the Java VM.
#
if [ "x$JAVA_OPTS" = "x" ]; then
JAVA_OPTS="-Xms64m -Xmx64m -XX:PermSize=64m -XX:MaxPermSize=64m \
 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/jboss4/heapdump \
 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 \
 -Djboss.server.log.dir=/var/log/jboss4 -Dcom.sun.management.jmxremote"
fi
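A note on -XX:HeapDumpPath. As I understand HotSpot's behaviour, there are two cases:

- If the value names an existing directory, the dump is written inside it as java_pid<pid>.hprof.
- Otherwise the value is used as the file name, which is what the configuration above relies on.

As far as I know, the JVM also refuses to overwrite an existing dump file. That is one reason the file has to be moved out of the way after each dump. The helper below is hypothetical (expected_dump_file is my own name, not a JVM or JBoss tool). It simply encodes that naming rule, which may be handy when deciding what a watcher script should look for.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original setup: print the file a HotSpot
# JVM would write its heap dump to for a given -XX:HeapDumpPath value.
# Assumption: an existing directory gets java_pid<pid>.hprof inside it;
# any other value is used as the dump file name itself.
expected_dump_file() {
  local dump_path="$1" pid="$2"
  if [ -d "$dump_path" ]; then
    # strip one trailing slash so we don't print "dir//java_pid..."
    printf '%s/java_pid%s.hprof\n' "${dump_path%/}" "$pid"
  else
    printf '%s\n' "$dump_path"
  fi
}

# Usage with the path from run.conf; 12345 is a made-up pid.
expected_dump_file /var/log/jboss4/heapdump 12345
```

With the run.conf above, /var/log/jboss4/heapdump is not a directory, so the helper prints that path unchanged. That path is exactly the one the rebooter script checks for.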

The next step was to write a bash script to handle that event, and to add a crontab entry that runs the script every minute. In effect we are polling for the existence of the heap dump file. There is probably a much better way of doing this, so please suggest one if you know it. Here is jboss_rebooter.sh, which works but is still a work in progress, as you'll see from the comments.

#!/bin/bash

# This script checks for the existence of a Java heap dump file, which would
# be produced as a result of the Java 5 JVM being run with the following options:
#
# -XX:+HeapDumpOnOutOfMemoryError
# -XX:HeapDumpPath=/var/log/jboss4/heapdump
#
# If a heap dump file is found, the script moves it to /var/log/jboss4/heapdumps,
# kill -9s the JVM process running JBoss (identified by /var/run/jboss4.pid)
# and then starts JBoss with /etc/init.d/jboss4 start. It also deletes the
# jboss4.pid file, which we expect to be recreated once JBoss starts again.
#
# This script will be run by cron every minute. There may be a better, say
# event-driven, way to do this.

# Place this script in /root/scripts and make it executable. To run it every
# minute from cron, put this in /etc/crontab:
# */1 * * * * root /root/scripts/jboss_rebooter.sh

if [ -e /var/log/jboss4/heapdump ]; then

 logger -p local1.crit -t HEAPDUMP "heap dump detected as a result of HeapDumpOnOutOfMemoryError JVM flag"

 # make sure the /var/log/jboss4/heapdumps backup directory exists, then move
 # the heap dump into it under a timestamped name, so earlier dumps are not
 # overwritten and /var/log/jboss4/heapdump is free for the JVM's next dump

 if [ ! -d /var/log/jboss4/heapdumps ]; then
  mkdir -p /var/log/jboss4/heapdumps
  chown jboss4:jboss4 /var/log/jboss4/heapdumps
 fi

 BACKUP_FILE=/var/log/jboss4/heapdumps/heapdump.$(date +%Y%m%d%H%M%S)
 mv /var/log/jboss4/heapdump "$BACKUP_FILE"

 logger -p local1.crit -t HEAPDUMP "heap dump file backed up to $BACKUP_FILE"

 # read the pid JBoss recorded at startup (empty if the pid file is missing)
 JBOSS4_PID=$(cat /var/run/jboss4.pid 2>/dev/null)

 # now kill the process with the JVM that had run out of memory
 # TODO double check that the said JVM is in fact in the state we think it is in, i.e. what if
 # this script is run and the heapdump file exists, BUT the issue had somehow been resolved before
 # this script ran? Then we'll potentially kill an operational production JVM, which
 # would be unacceptable
 if [ -n "$JBOSS4_PID" ]; then
  kill -9 "$JBOSS4_PID"
 fi
 rm -f /var/run/jboss4.pid

 logger -p local1.crit -t HEAPDUMP "killed process with pid identified by /var/run/jboss4.pid"

 # now start the JBoss
 /etc/init.d/jboss4 start

 logger -p local1.crit -t HEAPDUMP "command issued to start jboss4"
fi
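The TODO in the script asks for a way to double-check the JVM before killing it. One possible, admittedly partial, approach is two hypothetical checks. The function names dump_is_stable and pid_runs are my own, not part of JBoss or the JVM. The sketch assumes GNU coreutils stat as on Ubuntu, and falls back to procps ps where /proc/<pid>/comm is unavailable. The checks do the following:

- **dump_is_stable** waits to see whether the dump file is still growing. This stops kill -9 from cutting short a dump the JVM is still writing when cron happens to fire.
- **pid_runs** confirms that the pid taken from /var/run/jboss4.pid still belongs to a live process with the expected name, rather than a dead or recycled pid.

```shell
#!/bin/bash
# Hypothetical pre-kill checks for jboss_rebooter.sh (names are my own).

# Succeeds if FILE's size is unchanged after WAIT seconds (default 5),
# i.e. the JVM has most likely finished writing the heap dump.
# Assumes GNU stat, where -c %s prints the size in bytes.
dump_is_stable() {
  local file="$1" wait="${2:-5}" before after
  before=$(stat -c %s "$file") || return 1
  sleep "$wait"
  after=$(stat -c %s "$file") || return 1
  [ "$before" = "$after" ]
}

# Succeeds if PID refers to a live process whose command name is EXPECTED.
# /proc/<pid>/comm needs Linux 2.6.33 or later; older kernels (such as the
# 2.6.32 kernel of Ubuntu 10.04) fall back to ps, which fails for a dead pid.
pid_runs() {
  local pid="$1" expected="$2" comm
  [ -n "$pid" ] || return 1
  if [ -r "/proc/$pid/comm" ]; then
    comm=$(cat "/proc/$pid/comm") || return 1
  else
    comm=$(ps -p "$pid" -o comm=) || return 1
  fi
  comm="${comm//[[:space:]]/}"   # drop any padding or trailing newline
  [ "$comm" = "$expected" ]
}
```

Wired into jboss_rebooter.sh, these checks might guard the kill along the lines of `if dump_is_stable /var/log/jboss4/heapdump 10 && pid_runs "$JBOSS4_PID" java; then kill -9 "$JBOSS4_PID"; fi`, with the stability check run before the dump is moved. The sketch assumes the Sun JVM shows up with the command name java. Neither check proves the JVM is still out of memory, so the TODO is narrowed rather than closed.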

As a final note, the “Hello World” website hosted on Tomcat that I was referring to served nothing more than an image and some text. In my mind, memory leaks are not, or should not be, something one needs to investigate when serving a basic HTML page.
