
    Basic ClassLoader Leak Testing in the AS Testsuite

     

    I have added some basic testing for classloader leaks to the AS testsuite. The tests and related classes can be found in the testsuite module, in the org.jboss.test.classloader.leak package.  The core test class is org.jboss.test.classloader.leak.test.ClassloaderLeakTestBase, with a hierarchy of test classes descending from it. (Shortly after AS 5.1.0.CR1, the hierarchy was changed so that each test method has its own class; see JBAS-6864.)

     

    The test cases basically deploy various EE components, execute a few very basic operations on them, undeploy the components, and then check if their classloaders have been garbage collected.

     

    Caveat: These tests have found several leaks (since fixed) but they cannot find all leaks.  They can only find leaks introduced through the limited number of code paths they exercise.  So, for example, if some obscure code path uses a ThreadLocal and leaks an object to that ThreadLocal and thus to a server thread pool, these tests will only find the leak if they exercise the obscure code path.

     

    How the tests work

     

    The test case deploys an instance of the org.jboss.test.classloader.leak.clstore.ClassLoaderTracker service. This service basically does two things:

     

    1. Instantiates a VM singleton org.jboss.test.classloader.leak.clstore.ClassLoaderStore.  EE components being tested register their classloader with the ClassLoaderStore.

    2. Exposes an MBean interface that the test client can access via the RMIAdaptor in order to check whether the VM is still holding a reference to a particular classloader.

     

    The ClassLoaderStore maintains a Map<String, WeakReference<ClassLoader>>. An EE component registers its classloader under a known String key.  For example, a Servlet might do the following in its doGet() method:

     

    ClassLoaderStore.getInstance().storeClassLoader("WEBAPP", getClass().getClassLoader());
    ClassLoaderStore.getInstance().storeClassLoader("WEBAPP_TCCL", Thread.currentThread().getContextClassLoader());
    

     

    The test case deploys the servlet, executes a GET against it, undeploys the servlet, and checks that the classloaders registered under the keys WEBAPP and WEBAPP_TCCL have been garbage collected.
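The weak-reference bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the testsuite's actual ClassLoaderStore: the class name WeakClassLoaderStore and the getClassLoader() and isStillReachable() methods are hypothetical, and the retry count and sleep interval are arbitrary assumptions.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ClassLoaderStore; not the testsuite's real code. */
public class WeakClassLoaderStore {
    private static final WeakClassLoaderStore INSTANCE = new WeakClassLoaderStore();

    // Weak references let the GC reclaim a classloader once nothing else holds it.
    private final Map<String, WeakReference<ClassLoader>> loaders =
            new ConcurrentHashMap<String, WeakReference<ClassLoader>>();

    public static WeakClassLoaderStore getInstance() {
        return INSTANCE;
    }

    public void storeClassLoader(String key, ClassLoader loader) {
        loaders.put(key, new WeakReference<ClassLoader>(loader));
    }

    /** Returns the stored classloader, or null if none was stored or it has been collected. */
    public ClassLoader getClassLoader(String key) {
        WeakReference<ClassLoader> ref = loaders.get(key);
        return ref == null ? null : ref.get();
    }

    /** Nudges the GC a few times (assumed: 5 tries, 100 ms apart), then reports reachability. */
    public boolean isStillReachable(String key) throws InterruptedException {
        for (int i = 0; i < 5 && getClassLoader(key) != null; i++) {
            System.gc();          // only a hint; the JVM is free to ignore it
            Thread.sleep(100);
        }
        return getClassLoader(key) != null;
    }
}
```

Because System.gc() is only a hint, a check like isStillReachable() can report a classloader as still reachable even when no leak exists. Repeating the request a few times reduces that risk but does not remove it.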

     

    Tracing ClassLoader Leaks

     

    In its basic mode of operation, ClassLoaderTracker can only tell the test client whether a given classloader has been cleared from the ClassLoaderStore.  That is fine for things like CruiseControl runs, but not very helpful in determining why a classloader hasn't been garbage collected.  To get better information about the leak, you need help from JBoss Profiler.

     

    If the server VM is running with the JBoss Profiler JVMTI agent installed, ClassLoaderTracker will detect this and will use the profiler's org.jboss.profiler.jvmti.JVMTIInterface class to take a heap snapshot, find all references in the heap to the leaked classloader, trace those references (up to a limited depth; see "Controlling the Depth of the Heap Analysis" below), and dump a report of the references to the AS's server.log file.  This report can be cut from the log file and analyzed to find the cause of the classloader leak.

     

    Installing the JBoss Profiler JVMTI Agent

     

    1) Download or build the native library with the JBoss Profiler JVMTI agent. This will be either a .dll for Windows or a .so for *nix.

     

    NOTE: ClassLoaderTracker depends on APIs added in JBP release 1.0.0.CR5. Unfortunately, the standard JBP downloads page currently only shows releases up to 1.0.0.CR4.  The 1.0.0.CR5 release of the DLL can be found here. A 1.0.0.CR5 source download that can be used to build a .so can be found here. The JBoss Profiler docs include instructions on compiling the .so from source.

     

    2) Place the jbossAgent.dll or jbossAgent.so in your AS install's bin directory or in some other location on your machine's PATH.

     

    NOTE: If you've used JBP before, beware of duplicate copies on the path.  I wasted a day trying to figure out why the jbossAgent.dll in bin didn't work, only to find there was an earlier version on the path that was taking precedence.

     

    3) Modify run.bat or run.conf to tell Java to use JBP as an agentlib. For example:

     

    rem JVM memory allocation pool parameters. Modify as appropriate.
    set JAVA_OPTS=%JAVA_OPTS% -Xms128m -Xmx512m
    
    rem JBoss Profiler JVMTI support
    set JAVA_OPTS=%JAVA_OPTS% -agentlib:jbossAgent
    

     

    Tips on installing on Linux 64

     

    To get this to work on my Linux 64 machine, I had to download the latest JBoss Profiler source from svn trunk and compile the .so:

     

    cd /root-of-jboss-profiler-src-trunk
    cd jvmti-lib/linux64
    ./compile.sh
    

     

    Then instead of copying the .so to $JBOSS_HOME/bin, I had to set LD_LIBRARY_PATH.

     

    export LD_LIBRARY_PATH=/root-of-jboss-profiler-src-trunk/jvmti-lib/linux64

     

    Finally, in run.conf I also had to include -d64 in $JAVA_OPTS:

     

    JAVA_OPTS="$JAVA_OPTS -agentlib:jbossAgent -d64"

     

    Tips with JBoss 5

     

    I've found that to get the heap analysis to complete with JBoss 5, I needed to increase the -Xmx setting used to start JBoss.  1.5GB seemed to work.

     

    Controlling the Depth of the Heap Analysis

     

    The analysis tool works by following chains of references leading from the leaked classloader. Following such chains as far as they go could eventually bring much of the heap into the report, making it unusable. So, by default the analyzer will follow chains up to 12 levels deep, after which it will include the 12th node in the report followed by MaxLevel.  If you find 12 levels deep is not enough to let you identify the cause of the leak, you can increase the number of levels by setting the jboss.classloader.leak.test.depth system property on the VM running JBoss AS:

     

    ./run.sh -c all -Djboss.classloader.leak.test.depth=16
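For illustration, an analyzer could resolve such a depth setting along these lines. This is only a sketch under stated assumptions: the class LeakReportDepth and its resolveDepth() method are hypothetical and are not taken from the testsuite; only the property name and the default of 12 come from the text above.

```java
/** Hypothetical helper; the real analyzer's code may differ. */
public class LeakReportDepth {
    static final String DEPTH_PROPERTY = "jboss.classloader.leak.test.depth";
    static final int DEFAULT_DEPTH = 12;   // default stated in the section above

    /** Reads the system property, falling back to the default if it is unset, unparsable or not positive. */
    static int resolveDepth() {
        // Integer.getInteger returns the supplied default when the property is missing or not a number.
        Integer configured = Integer.getInteger(DEPTH_PROPERTY, DEFAULT_DEPTH);
        return configured > 0 ? configured : DEFAULT_DEPTH;
    }
}
```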
    

    Analyzing the Report

     

    Search the server.log file for <br> -- that's the start of the report. The report can be cut-and-pasted into a separate file; give the file a .html extension to make it easy to look at in a browser.

     

    The leak report attached to JIRA JBAS-4191 is a good example of a leak report.  Reference chains that have a reasonable chance of being related to the leak are highlighted in bold.  The end of such chains is highlighted in red.

     

    Analyzing these files is a bit of an art, beyond the scope of this page. A few things I've found useful:

     

    1. Ignore chains leading to java.lang.Reference or one of its subclasses; once the real leak is fixed, such a chain will be collected. The leak report will not mark up any such chains in bold, making it easier to ignore them.

    2. Ignore chains that lead to something ending with "was already described before on this report"; once the real leak is fixed, such a chain will be collected. The leak report will not mark up any such chains in bold, making it easier to ignore them.

    3. Look for long reference chains leading into other systems (e.g. the chain leading into the Tomcat processor pool on the attached report).  Even if that chain isn't the cause of the leak, an object earlier on the chain often is. A sample report showing a minor (i.e. short-lived) leak to the Tomcat thread pool is attached. (This sample was created with an earlier version of the test and doesn't have the critical-path highlighting you'll see on newer reports.)

    4. Search for ThreadLocal$ThreadLocalMap$Entry.value. If you find this, it means an object with a reference chain leading to your classloader has been leaked to a ThreadLocal.  Unfortunately the report doesn't identify which ThreadLocal is the culprit (see http://jira.jboss.com/jira/browse/JBPROFILER-49); you have to search the code to find it. The leak report at JBAS-4191 shows a ThreadLocal leak.

    5. I've seen some code where a ThreadLocal was stored as an instance field in an object, with the assumption that when the object is collected, the ThreadLocal's value will be as well.  This is an invalid assumption.  Any Thread on which ThreadLocal.set() was called keeps an entry for that ThreadLocal (a weak reference to the ThreadLocal itself, but a strong reference to its value), and if the ThreadLocal isn't explicitly cleared, when or whether that entry will ever be cleared is indeterminate.
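The usual way to avoid the ThreadLocal leaks described in items 4 and 5 is to clear the value explicitly on the same thread that set it. The following is a minimal sketch of that pattern; the class ThreadLocalCleanup and its methods are hypothetical names used only for illustration and do not come from the AS code base.

```java
/** Hypothetical example of clearing a ThreadLocal so pooled threads do not retain the value. */
public class ThreadLocalCleanup {
    private static final ThreadLocal<Object> CONTEXT = new ThreadLocal<Object>();

    /** Sets the value for the duration of the call and always removes it afterwards. */
    static Object runWithContext(Object value) {
        CONTEXT.set(value);
        try {
            return CONTEXT.get();   // stands in for the real work that reads the context
        } finally {
            // Without this, a pooled server thread would keep a strong reference to
            // 'value' (and to its classloader) long after the deployment is gone.
            CONTEXT.remove();
        }
    }

    /** True if the calling thread currently has no value set. */
    static boolean isCleared() {
        return CONTEXT.get() == null;
    }
}
```

The finally block matters because the thread typically returns to a pool whether or not the work throws. The remove() call has to run on the thread that called set(), since each thread holds its own entry.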

     

    Things Checked by the Test Cases

     

    The test classes currently check the following:

     

    1. GET request to a servlet.

    2. GET request to a jsp.

    3. Invocation to non-clustered EJB2 SLSB and SFSB over JRMP.

    4. GET request to servlet that then invokes on a non-clustered EJB2 SLSB and SFSB.

    5. GET request to jsp that then invokes on a non-clustered EJB2 SLSB and SFSB.

    6. Invocation to non-clustered EJB3 SLSB and SFSB over JRMP.

    7. GET request to servlet that then invokes on a non-clustered EJB3 SLSB and SFSB.

    8. GET request to jsp that then invokes on a non-clustered EJB3 SLSB and SFSB.

     

    The EJB3 tests are run from a separate test class hierarchy from the EJB2 ones.

     

    Eventually I'll probably add MDBs as well.

     

    The tests exercise a number of packaging options:

     

    1. Standalone war

    2. Standalone ejb jar.

    3. EAR with ejb jar.

    4. EAR with war and ejb.

    5. Above EAR combinations, but with a scoped classloader.

     

    These tests were originally created to check for leaks caused by use of Jakarta Commons Logging, so all the EE components also write a log message via the JCL API.  The packaging combos also include various ways of packaging commons-logging.jar (e.g. include it in the deployment or count on getting it from the AS classpath.) The EJB3 tests don't include the commons-logging packaging options.