Chapter 8. TreeCacheMarshaller

Rather than using standard Java serialization to serialize java.lang.reflect.Method objects and their parameters when remote caches replicate data to each other, JBoss Cache uses its own mechanism, the TreeCacheMarshaller, to marshall and unmarshall data.

In addition to providing performance and efficiency enhancements over standard Java serialization, the TreeCacheMarshaller performs one other function. In order to deserialize an object replicated to it from a remote cache, a cache instance needs access to the classloader that defines the object's class. This is simple if the cache's own classloader can access the required classes. For situations where JBoss Cache is used as a service supporting clients that use different classloaders, however, the TreeCacheMarshaller can be configured to use different classloaders on a per-region basis: application code registers a classloader that should be used to handle replication for a portion of the tree.

8.1. Basic Usage

TreeCache exposes the following basic API for controlling the behavior of TreeCacheMarshaller:

/**
 * Sets whether marshalling uses scoped class loaders on a per region basis.
 *
 * This property must be set to true before any call to
 * {@link #registerClassLoader(String, ClassLoader)} or
 * {@link #activateRegion(String)}
 *
 * @param isTrue
 */
void setUseRegionBasedMarshalling(boolean isTrue);

/**
 * Gets whether marshalling uses scoped class loaders on a per region basis.
 */
boolean getUseRegionBasedMarshalling();

/**
 * Registers the given classloader with TreeCacheMarshaller for
 * use in unmarshalling replicated objects for the specified region.
 *
 * @param fqn The fqn region. This fqn and its children will use this classloader for (un)marshalling.
 * @param cl The class loader to use
 *
 * @throws RegionNameConflictException if fqn is a descendant of
 *                                     an FQN that already has a classloader
 *                                     registered.
 * @throws IllegalStateException if useRegionBasedMarshalling is false
 */
void registerClassLoader(String fqn, ClassLoader cl) throws RegionNameConflictException;

/**
 * Instructs the TreeCacheMarshaller to no longer use a special
 * classloader to unmarshal replicated objects for the specified region.
 *
 * @param fqn The fqn of the root node of region.
 *
 * @throws RegionNotFoundException if no classloader has been registered for
 *                                 fqn.
 * @throws IllegalStateException if useRegionBasedMarshalling is false
 */
void unregisterClassLoader(String fqn) throws RegionNotFoundException;
	 		

Property UseRegionBasedMarshalling controls whether classloader-based marshalling should be used. This property should be set as part of normal cache configuration, typically in the cache's XML configuration file:

<attribute name="UseRegionBasedMarshalling">true</attribute>

Any time after UseRegionBasedMarshalling is set to true, application code can call registerClassLoader to associate a classloader with the portion of the cache rooted in a particular FQN. Once registered, the classloader will be used to unmarshal any replication traffic related to the node identified by the FQN or to any of its descendants.

At this time, registerClassLoader only supports String-based FQNs.

Note that it is illegal to register a classloader for an FQN that is a descendant of an FQN for which a classloader has already been registered. For example, if classloader X is registered for FQN /a, a RegionNameConflictException will be thrown if an attempt is made to register classloader Y for FQN /a/b.
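The registration rules above can be modelled in a few lines of plain Java. The following is only a sketch under stated assumptions: RegionRegistrySketch and its methods are hypothetical names, it does not use or reproduce JBoss Cache code, and it signals a conflict with IllegalArgumentException where the real API throws RegionNameConflictException.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-region classloader registration; not JBoss Cache code. */
class RegionRegistrySketch {
    // Region root FQN (e.g. "/a") -> classloader registered for that region.
    private final Map<String, ClassLoader> regions = new TreeMap<>();

    /** True if fqn is a strict descendant of ancestor, e.g. "/a/b" under "/a". */
    static boolean isDescendant(String fqn, String ancestor) {
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        return fqn.length() > prefix.length() && fqn.startsWith(prefix);
    }

    /** Registers a loader, rejecting FQNs that lie below an already registered region. */
    void register(String fqn, ClassLoader cl) {
        for (String registered : regions.keySet()) {
            if (isDescendant(fqn, registered)) {
                // The real API reports this case with RegionNameConflictException.
                throw new IllegalArgumentException(fqn + " lies inside region " + registered);
            }
        }
        regions.put(fqn, cl);
    }

    /** Removes the association so the registry no longer references the loader. */
    void unregister(String fqn) {
        if (!regions.containsKey(fqn)) {
            throw new IllegalArgumentException("no classloader registered for " + fqn);
        }
        regions.remove(fqn);
    }

    /** Returns the loader of the deepest region containing fqn, or null if none does. */
    ClassLoader lookup(String fqn) {
        String best = null;
        for (String root : regions.keySet()) {
            boolean contains = fqn.equals(root) || isDescendant(fqn, root);
            if (contains && (best == null || root.length() > best.length())) {
                best = root;
            }
        }
        return best == null ? null : regions.get(best);
    }

    public static void main(String[] args) {
        RegionRegistrySketch registry = new RegionRegistrySketch();
        registry.register("/a", ClassLoader.getSystemClassLoader());
        System.out.println(registry.lookup("/a/b/c") != null); // node below /a uses X
        try {
            registry.register("/a/b", ClassLoader.getSystemClassLoader());
        } catch (IllegalArgumentException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

The prefix check appends a trailing slash before comparing, so that /ab is not mistaken for a descendant of /a.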

Method unregisterClassLoader is used to remove the association between a classloader and a particular cache region. Be sure to call this method when you are done using the cache with a particular classloader, or a reference to the classloader will be held, causing a memory leak!

8.2. Region Activation/Inactivation

The basic API discussed above is helpful, but in situations where applications with different classloaders share a cache, the lifecycle of those applications will typically differ from that of the cache. As a result, it is difficult or impossible to register all required classloaders before a cache is started. For example, consider the following scenario:

  1. TreeCache on machine A starts.
  2. On A a classloader is registered under FQN /x.
  3. Machine B starts, so TreeCache on B starts.
  4. An object is put in the machine A cache under FQN /x/1.
  5. Replication to B fails, as the required classloader is not yet registered.
  6. On B a classloader is registered under FQN /x, but too late to prevent the replication error.

Furthermore, if any objects had been added to server A before server B was started, the initial transfer of state from A to B would have failed as well, as B would not be able to unmarshal the transferred objects.

To resolve this problem, if region-based marshalling is used, a cache instance can be configured to ignore replication events for a portion of the tree. That portion of the tree is considered "inactive". After the needed classloader has been registered, the portion of the tree can be "activated". Activation causes the following events to occur:

  • Any existing state for that portion of the tree is transferred from another node in the cluster and integrated into the local tree.
  • TreeCacheMarshaller begins normal handling of replication traffic related to the portion of the tree.

In addition to the basic marshalling related API discussed above, TreeCache exposes the following API related to activating and inactivating portions of the cache:

/**
 * Sets whether the entire tree is inactive upon startup, only responding
 * to replication messages after {@link #activateRegion(String)} is
 * called to activate one or more parts of the tree.
 * <p>
 * This property is only relevant if {@link #getUseRegionBasedMarshalling()} is
 * true.
 *
 */
public void setInactiveOnStartup(boolean inactiveOnStartup);

/**
 * Gets whether the entire tree is inactive upon startup, only responding
 * to replication messages after {@link #activateRegion(String)} is
 * called to activate one or more parts of the tree.
 * <p>
 * This property is only relevant if {@link #getUseRegionBasedMarshalling()} is
 * true.
 */
public boolean isInactiveOnStartup();

/**
 * Causes the cache to transfer state for the subtree rooted at
 * subtreeFqn and to begin accepting replication messages
 * for that subtree.
 * <p>
 * <strong>NOTE:</strong> This method will cause the creation of a node
 * in the local tree at subtreeFqn whether or not that
 * node exists anywhere else in the cluster.  If the node does not exist
 * elsewhere, the local node will be empty.  The creation of this node will
 * not be replicated.
 *
 * @param subtreeFqn Fqn string indicating the uppermost node in the
 *                   portion of the tree that should be activated.
 *
 * @throws RegionNotEmptyException if the node subtreeFqn
 *                                 exists and has either data or children
 *
 * @throws IllegalStateException
 *       if {@link #getUseRegionBasedMarshalling() useRegionBasedMarshalling} is false
 */
public void activateRegion(String subtreeFqn)
    throws RegionNotEmptyException, RegionNameConflictException, CacheException;

/**
 * Causes the cache to stop accepting replication events for the subtree
 * rooted at subtreeFqn and evict all nodes in that subtree.
 *
 * @param subtreeFqn Fqn string indicating the uppermost node in the
 *                   portion of the tree that should be inactivated.
 * @throws RegionNameConflictException if subtreeFqn indicates
 *                                     a node that is part of another
 *                                     subtree that is being specially
 *                                     managed (either by activate/inactivateRegion()
 *                                     or by registerClassLoader())
 * @throws CacheException if there is a problem evicting nodes
 *
 * @throws IllegalStateException
 *       if {@link #getUseRegionBasedMarshalling() useRegionBasedMarshalling} is false
 */
public void inactivateRegion(String subtreeFqn) throws RegionNameConflictException, CacheException;
         

Property InactiveOnStartup controls whether the entire cache should be considered inactive when the cache starts. In most use cases where region activation is needed, this property would be set to true. This property should be set as part of normal cache configuration, typically in the cache's XML configuration file:

<attribute name="InactiveOnStartup">true</attribute>

When InactiveOnStartup is set to true, no state transfer will be performed on startup, even if property FetchInMemoryState is true.

When activateRegion() is invoked, each node in the cluster will be queried to see if it has active state for that portion of the tree. If one does, it will return the current state, which will then be integrated into the tree. Once state is transferred from one node, no other nodes will be asked for state. This process is somewhat different from the initial state transfer process that occurs at startup when property FetchInMemoryState is set to true. During initial state transfer, only the oldest member of the cluster is queried for state. This approach is inadequate for region activation, as it is possible that the oldest member of the cluster also has the region inactivated, and thus cannot provide state. So, each node in the cluster is queried until one provides state.
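The difference between the two state-fetch strategies can be sketched in plain Java as follows. This is an illustrative model only: StateQuerySketch, Member and both method names are hypothetical, the cluster is a simple in-memory list ordered oldest first, and none of this is JBoss Cache code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Toy model of the two state-fetch strategies; not JBoss Cache code. */
class StateQuerySketch {
    /** A member answers a region request with state, or empty if the region is inactive there. */
    interface Member extends Function<String, Optional<byte[]>> {}

    /** Region activation: ask members in turn and stop at the first that returns state. */
    static Optional<byte[]> fetchForActivation(List<Member> members, String regionFqn) {
        for (Member m : members) {
            Optional<byte[]> state = m.apply(regionFqn);
            if (state.isPresent()) {
                return state; // no further members are queried
            }
        }
        return Optional.empty();
    }

    /** Startup transfer: only the oldest member (first in the list here) is asked. */
    static Optional<byte[]> fetchAtStartup(List<Member> members, String regionFqn) {
        return members.isEmpty() ? Optional.empty() : members.get(0).apply(regionFqn);
    }

    public static void main(String[] args) {
        Member oldestButInactive = fqn -> Optional.empty();
        Member youngerAndActive = fqn -> Optional.of(new byte[] {1, 2, 3});
        List<Member> cluster = Arrays.asList(oldestButInactive, youngerAndActive);

        System.out.println(fetchAtStartup(cluster, "/example").isPresent());
        System.out.println(fetchForActivation(cluster, "/example").isPresent());
    }
}
```

With the oldest member holding the region inactive, the startup-style query finds nothing (the first line printed is false), while the activation-style query moves on to the next member and obtains state (the second line printed is true).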

Before requesting state from other nodes, activateRegion() will confirm that there is no existing data in the portion of the tree being activated. If there is any, a RegionNotEmptyException will be thrown.

It is important to understand that when a region of the tree is marked as inactive, this only means that replication traffic from other cluster nodes related to that portion of the tree will be ignored. It is still technically possible for objects to be placed in the inactive portion of the tree locally (via a put call), and any such local activity will be replicated to other nodes. TreeCache will not prevent this kind of local activity on an inactive region but, as discussed above, activateRegion() will throw an exception if it discovers data in a region that is being activated.

8.2.1. Example usage of Region Activation/Inactivation

As an example of the usage of region activation and inactivation, let's imagine a scenario where a TreeCache instance is deployed as a shared MBean service by deploying a -service.xml in the JBoss /deploy directory. One of the users of this cache could be a web application, which when started will register its classloader with the TreeCache and activate its own region of the cache.

First, the XML configuration file for the shared cache service would be configured as follows (only relevant portions are shown):

<?xml version="1.0" encoding="UTF-8" ?>
<server>
  <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar" />

  <!--  ====================================================================  -->
  <!--  Defines TreeCache configuration                                       -->
  <!--  ====================================================================  -->
  <mbean code="org.jboss.cache.TreeCache" name="com.xyz.cache:service=SharedCache">

    .......

    <!-- Configure Marshalling -->
    <attribute name="UseRegionBasedMarshalling">true</attribute>
    <attribute name="InactiveOnStartup">true</attribute>

    ........

  </mbean>
</server>
				

For the webapp, registering/unregistering the classloader and activating/inactivating the app's region in the cache are tasks that should be done as part of initialization and destruction of the app. So, using a ServletContextListener to manage these tasks seems logical. Following is an example listener:

package example;

import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.jboss.cache.TreeCacheMBean;
import org.jboss.mx.util.MBeanProxyExt;

public class ActiveInactiveRegionExample implements ServletContextListener
{
   private TreeCacheMBean cache;

   public void contextInitialized(ServletContextEvent arg0) {
      try {
         findCache();

         cache.registerClassLoader("/example", Thread.currentThread().getContextClassLoader());
         cache.activateRegion("/example");
      }
      catch (Exception e) {
         // ... handle exception
      }

   }

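   public void contextDestroyed(ServletContextEvent arg0) {
      try {
         // Inactivate the region before unregistering its classloader
         cache.inactivateRegion("/example");
         cache.unregisterClassLoader("/example");
      }
      catch (Exception e) {
         // ... handle exception
      }
   }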

   private void findCache() throws MalformedObjectNameException {
      // Find the shared cache service in JMX and create a proxy to it
      ObjectName cacheServiceName_ = new ObjectName("com.xyz.cache:service=SharedCache");
      // Create Proxy-Object for this service
      cache = (TreeCacheMBean) MBeanProxyExt.create(TreeCacheMBean.class, cacheServiceName_);
   }
}
				

The listener makes use of the JBoss utility class MBeanProxyExt to find the TreeCache in JMX and create a proxy to it. (See the "Running and using TreeCache inside JBoss" section below for more on accessing a TreeCache). It then registers its classloader with the cache and activates its region. When the webapp is being destroyed, it inactivates its region and unregisters its classloader (thus ensuring that the classloader isn't leaked via a reference to it held by TreeCacheMarshaller).

Note the order of the method calls in the example class -- register a classloader before activating a region, and inactivate the region before unregistering the classloader.

8.3. Region Activation/Inactivation with a CacheLoader

The activateRegion()/inactivateRegion() API can be used in conjunction with a CacheLoader as well, but only if the cache loader implementation implements interface org.jboss.cache.loader.ExtendedCacheLoader. This is a subinterface of the normal CacheLoader interface. It additionally specifies the following methods needed to support the partial state transfer that occurs when a region is activated:

   /**
    * Fetch a portion of the state for this cache from secondary storage
    * (disk, DB) and return it as a byte buffer.
    * This is for activation of a portion of new cache from a remote cache.
    * The new cache would then call {@link #storeState(byte[], Fqn)}.
    *
    * @param subtree Fqn naming the root (i.e. highest level parent) node of
    *                the subtree for which state is requested.
    *
    * @see org.jboss.cache.TreeCache#activateRegion(String)
    */
   byte[] loadState(Fqn subtree) throws Exception;

   /**
    * Store the given portion of the cache tree's state in secondary storage.
    * Overwrite whatever is currently in secondary storage.
    *
    * @param state   the state to store
    * @param subtree Fqn naming the root (i.e. highest level parent) node of
    *                the subtree included in state.
    */
   void storeState(byte[] state, Fqn subtree) throws Exception;

   /**
    * Sets the {@link RegionManager} this object should use to manage
    * marshalling/unmarshalling of different regions using different
    * classloaders.
    * <p>
    * <strong>NOTE:</strong> This method is only intended to be used
    * by the TreeCache instance this cache loader is
    * associated with.
    * </p>
    *
    * @param manager    the region manager to use, or null.
    */
   void setRegionManager(RegionManager manager);
	 		

JBoss Cache currently comes with two implementations of ExtendedCacheLoader: FileExtendedCacheLoader and JDBCExtendedCacheLoader. These classes extend FileCacheLoader and JDBCCacheLoader, respectively, and implement the extra methods in the extended interface.
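To illustrate what subtree-scoped loadState and storeState are expected to do, here is a minimal in-memory sketch. It is not an ExtendedCacheLoader implementation: SubtreeStateSketch is a hypothetical class, FQNs are plain strings rather than Fqn objects, each node holds a single String value, and the byte format is ordinary Java serialization chosen only for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory store showing subtree-scoped state transfer; not a real CacheLoader. */
class SubtreeStateSketch {
    // Node FQN -> node value; a real loader would keep this on disk or in a database.
    private final TreeMap<String, String> store = new TreeMap<>();

    void put(String fqn, String value) { store.put(fqn, value); }

    String get(String fqn) { return store.get(fqn); }

    private static boolean inSubtree(String fqn, String root) {
        String prefix = root.endsWith("/") ? root : root + "/";
        return fqn.equals(root) || fqn.startsWith(prefix);
    }

    /** Serializes only the nodes at or below the subtree root. */
    byte[] loadState(String subtree) throws IOException {
        TreeMap<String, String> part = new TreeMap<>();
        for (Map.Entry<String, String> e : store.entrySet()) {
            if (inSubtree(e.getKey(), subtree)) {
                part.put(e.getKey(), e.getValue());
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(part);
        }
        return bytes.toByteArray();
    }

    /** Replaces whatever is stored under the subtree root with the transferred state. */
    @SuppressWarnings("unchecked")
    void storeState(byte[] state, String subtree) throws IOException, ClassNotFoundException {
        TreeMap<String, String> part;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(state))) {
            part = (TreeMap<String, String>) in.readObject();
        }
        store.keySet().removeIf(fqn -> inSubtree(fqn, subtree)); // overwrite, do not merge
        store.putAll(part);
    }

    public static void main(String[] args) throws Exception {
        SubtreeStateSketch remote = new SubtreeStateSketch();
        remote.put("/x/1", "one");
        remote.put("/y/1", "other region");

        SubtreeStateSketch local = new SubtreeStateSketch();
        local.storeState(remote.loadState("/x"), "/x");
        System.out.println(local.get("/x/1")); // transferred with the /x subtree
        System.out.println(local.get("/y/1")); // outside the subtree, not transferred
    }
}
```

The point of the sketch is the scoping: only nodes at or below the requested subtree are included in the byte buffer, and storing it replaces the previous contents of that subtree without touching the rest of the store.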

8.4. Performance over Java serialization

To achieve these performance and efficiency gains, the TreeCacheMarshaller uses a number of techniques, including method ids for known methods and magic numbers for known internal class types. These drastically reduce the size of calls to remote caches, which improves throughput and avoids much of the overhead of Java serialization.
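The effect of replacing a serialized name with a numeric id can be seen in a small, self-contained comparison. This is a sketch of the general technique only: MethodIdSketch and PUT_ID are hypothetical, and the actual ids and wire format used by the TreeCacheMarshaller are not described in this chapter.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

/** Compares a compact method id with Java serialization of a method name. */
class MethodIdSketch {
    // Hypothetical id for a known method; not a value taken from JBoss Cache.
    static final short PUT_ID = 1;

    /** Encodes a known method as a 2-byte id. */
    static byte[] encodeAsId(short methodId) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(methodId);
        }
        return bytes.toByteArray();
    }

    /** Encodes the method name with standard Java serialization, for comparison. */
    static byte[] encodeWithJavaSerialization(String methodName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(methodName);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("id bytes:         " + encodeAsId(PUT_ID).length);
        System.out.println("serialized bytes: " + encodeWithJavaSerialization("put").length);
    }
}
```

Even for a bare method name the id is smaller, because Java serialization adds a stream header and type information. The saving grows when a full java.lang.reflect.Method description would otherwise have to be sent.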

To make things even faster, the TreeCacheMarshaller uses JBoss Serialization, a highly efficient drop-in replacement for Java serialization, for user-defined classes. JBoss Serialization is enabled by default. It can be disabled, in which case the marshalling of user-defined classes reverts to Java serialization, by passing the system property -Dserialization.jboss=false to your JVM.
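A -D option on the java command line becomes a JVM system property, readable through System.getProperty. The sketch below shows one way such a flag could be read with an "enabled unless set to false" default. SerializationFlagSketch is a hypothetical class, and how JBoss Cache itself parses the value is an assumption not confirmed by this chapter.

```java
/** Reads the serialization.jboss flag the way a -D option exposes it; a sketch only. */
class SerializationFlagSketch {
    /** Absent means enabled, matching the default described above (parsing is an assumption). */
    static boolean jbossSerializationEnabled() {
        return Boolean.parseBoolean(System.getProperty("serialization.jboss", "true"));
    }

    public static void main(String[] args) {
        // Run with: java -Dserialization.jboss=false SerializationFlagSketch
        System.out.println("JBoss Serialization enabled: " + jbossSerializationEnabled());
    }
}
```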

8.5. Backward compatibility

Marshalling in JBoss Cache is now versioned. All communications between caches contain a version short which allows JBoss Cache instances of different versions to communicate with each other. Up until JBoss Cache 1.4.0, all versions were able to communicate with each other anyway since they all used simple serialization of org.jgroups.MethodCall objects, provided they all used the same version of JGroups. This requirement (more a requirement of the JGroups messaging layer than JBoss Cache) still exists, even though with JBoss Cache 1.4.0, we've moved to a much more efficient and sophisticated marshalling mechanism.
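The idea of prefixing each message with a version short can be sketched as follows. The class VersionedStreamSketch, the packing of "1.2.4" into the number 124, and the payload layout are all assumptions made for this illustration. The actual encoding used by JBoss Cache is not specified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Toy model of a version-prefixed message; not the JBoss Cache wire format. */
class VersionedStreamSketch {
    /** Packs "1.2.4" into a short as 124; a hypothetical encoding for this sketch. */
    static short toShort(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        return (short) (major * 100 + minor * 10 + patch);
    }

    /** Writes the version short first, followed by the payload. */
    static byte[] write(String version, String payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(toShort(version));
            out.writeUTF(payload);
        }
        return bytes.toByteArray();
    }

    /** Reads only the leading version, which a receiver would use to pick an unmarshaller. */
    static short readVersion(byte[] message) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            return in.readShort();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] message = write("1.2.4", "replicated data");
        System.out.println("sender version: " + readVersion(message));
    }
}
```

Because the version is the first thing on the stream, a receiver can decide how to interpret the rest of the message before reading any of it.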

JBoss Cache 1.4.0 and future releases of JBoss Cache will always be able to unmarshall data from previous versions of JBoss Cache. For JBoss Cache 1.4.0 and future releases to marshall data in a format that is compatible with older versions, however, you would have to start JBoss Cache with the following configuration attribute:

  
<!-- takes values such as 1.2.3, 1.2.4 and 1.3.0 -->
<attribute name="ReplicationVersion">1.2.4</attribute>