JBoss Cache TreeCache - A Structured, Replicated, Transactional Cache

User Documentation

Bela Ban

Manik Surtani

Brian Stansberry

Daniel Huang

Release 1.4.0 "Jalapeno"

July 2006


Table of Contents

Preface
1. Introduction
1.1. What is a TreeCache?
1.2. TreeCache Basics
2. Architecture
3. Basic API
4. Clustered Caches
4.1. Local Cache
4.2. Clustered Cache - Using Replication
4.2.1. Buddy Replication
4.2.1.1. Selecting Buddies
4.2.1.2. BuddyPools
4.2.1.3. Failover
4.2.1.4. Implementation
4.2.1.5. Configuration
4.3. Clustered Cache - Using Invalidation
5. Transactions and Concurrency
5.1. Concurrent Access
5.1.1. Locks
5.1.2. Pessimistic locking
5.1.2.1. Isolation levels
5.1.3. Optimistic locking
5.1.3.1. Architecture
5.1.3.2. Configuration
5.2. Transactional Support
5.2.1. Example
6. Eviction Policies
6.1. Eviction Policy Plugin
6.2. TreeCache Eviction Policy Configuration
6.3. TreeCache LRU eviction policy implementation
6.4. TreeCache FIFO eviction policy implementation
6.5. TreeCache MRU eviction policy implementation
6.6. TreeCache LFU eviction policy implementation
7. Cache Loaders
7.1. The CacheLoader Interface
7.2. Configuration via XML
7.3. Cache passivation
7.4. CacheLoader use cases
7.4.1. Local cache with store
7.4.2. Replicated caches with all nodes sharing the same store
7.4.3. Replicated caches with only one node having a store
7.4.4. Replicated caches with each node having its own store
7.4.5. Hierarchical caches
7.4.6. TcpDelegatingCacheLoader
7.4.7. RmiDelegatingCacheLoader
7.5. JDBC-based CacheLoader
7.5.1. JDBCCacheLoader configuration
7.5.1.1. Table configuration
7.5.1.2. DataSource
7.5.1.3. JDBC driver
7.5.1.4. Configuration example
8. TreeCacheMarshaller
8.1. Basic Usage
8.2. Region Activation/Inactivation
8.2.1. Example usage of Region Activation/Inactivation
8.3. Region Activation/Inactivation with a CacheLoader
8.4. Performance over Java serialization
8.5. Backward compatibility
9. State Transfer
9.1. Types of State Transfer
9.2. When State Transfer Occurs
10. Version Compatibility and Interoperability
11. Configuration
11.1. Sample XML-Based Configuration
11.2. Definition of XML attributes
11.3. Overriding options
12. Management Information
12.1. JBoss Cache MBeans
12.2. JBoss Cache Statistics
12.3. Receiving Cache Notifications
12.4. Accessing Cache MBeans in a Standalone Environment
13. Running JBoss Cache within JBoss Application Server
13.1. Running as an MBean

Preface

This and its accompanying documents describe JBoss Cache's TreeCache, a tree-structured replicated, transactional cache. JBoss Cache's PojoCache, an "object-oriented" cache that is the AOP-enabled subclass of TreeCache, allowing for Plain Old Java Objects (POJOs) to be inserted and replicated transactionally in a cluster, is described separately in a similar user guide.

The TreeCache is fully configurable. Aspects of the system such as replication mechanisms, transaction isolation levels, eviction policies, and transaction managers are all configurable. The TreeCache can be used in a standalone fashion - independent of JBoss Application Server or any other application server. PojoCache on the other hand requires both TreeCache and the JBossAOP standalone subsystem. PojoCache, documented separately, is the first in the market to provide a POJO cache functionality.

This document is meant to be a user guide to explain the architecture, API, configuration, and examples for JBoss Cache's TreeCache. Good knowledge of the Java programming language, along with a strong appreciation and understanding of transactions and concurrent threads, is presumed. No prior knowledge of JBoss Application Server is expected or required.

If you have questions, use the user forum linked on the JBoss Cache website. We also provide a mechanism for tracking bug reports and feature requests on the JBoss JIRA issue tracker. If you are interested in the development of JBoss Cache or in translating this documentation into other languages, we'd love to hear from you. Please post a message on the user forum or contact us on the developer mailing list.

JBoss Cache is an open-source product based on LGPL. Commercial development support, production support and training for JBoss Cache are available through JBoss Inc. JBoss Cache is a product in JBoss Professional Open Source JEMS (JBoss Enterprise Middleware Suite).

Chapter 1. Introduction

1.1. What is a TreeCache?

A TreeCache is a tree-structured, replicated, transactional cache from JBoss Cache. TreeCache is the backbone for many fundamental JBoss Application Server clustering services, including - in certain versions - clustering JNDI, HTTP and EJB sessions, and clustering JMS.

In addition to this, TreeCache can be used as a standalone transactional and replicated cache or even as an object-oriented data store. It may be embedded in other J2EE-compliant application servers such as BEA WebLogic or IBM WebSphere, in servlet containers such as Tomcat, or even in Java applications that do not run within an application server.

1.2. TreeCache Basics

The structure of a TreeCache is a tree with nodes. Each node has a name and zero or more children. A node can only have 1 parent; there is currently no support for graphs. A node can be reached by navigating from the root recursively through children, until the requested node is found. It can also be accessed by giving a fully qualified name (FQN), which consists of the concatenation of all node names from the root to the node in question.

A TreeCache can have multiple roots, allowing for a number of different trees to be present in a single cache instance. Note that a one-level tree is essentially a HashMap. Each node in the tree has a map of keys and values. For a replicated cache, all keys and values have to be Serializable. Serializability is not a requirement for PojoCache, where reflection and aspect-oriented programming are used to replicate any type.
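As a rough illustration of the structure described above, the following stand-alone Java sketch models a tree in which each node holds named children and a map of keys and values. It does not use the TreeCache classes; SketchNode, SketchTree and their methods are hypothetical names invented for this example, and the real implementation may differ in many respects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache node: named children plus a key/value map.
class SketchNode {
    final Map<String, SketchNode> children = new HashMap<>();
    final Map<Object, Object> data = new HashMap<>();
}

// Hypothetical stand-in for the tree; this is not the TreeCache implementation.
class SketchTree {
    private final SketchNode root = new SketchNode();

    // Walks the path from the root, optionally creating missing nodes.
    private SketchNode find(String fqn, boolean create) {
        SketchNode current = root;
        for (String name : fqn.split("/")) {
            if (name.isEmpty()) continue;          // skip the empty element before the leading slash
            SketchNode child = current.children.get(name);
            if (child == null) {
                if (!create) return null;
                child = new SketchNode();
                current.children.put(name, child);
            }
            current = child;
        }
        return current;
    }

    // Stores a value in the node's map and returns the previous value, if any.
    Object put(String fqn, Object key, Object value) {
        return find(fqn, true).data.put(key, value);
    }

    // Returns the value stored under the key, or null if the node or key is missing.
    Object get(String fqn, Object key) {
        SketchNode node = find(fqn, false);
        return node == null ? null : node.data.get(key);
    }

    boolean exists(String fqn) {
        return find(fqn, false) != null;
    }
}
```

In this sketch, a tree with a single level below the root behaves like a map of maps, which is the sense in which a one-level tree reduces to a HashMap.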

A TreeCache can be either local or replicated. Local trees exist only inside the Java VM in which they are created, whereas replicated trees propagate any changes to all other replicated trees in the same cluster. A cluster may span different hosts on a network or just different JVMs on a single host.

The first version of TreeCache was essentially a single HashMap that replicated. However, the decision was taken to go with a tree structured cache because (a) it is more flexible and efficient and (b) a tree can always be reduced to a map, thereby offering both possibilities. The efficiency argument was driven by concerns over replication overhead, and was that a value itself can be a rather sophisticated object, with aggregation pointing to other objects, or an object containing many fields. A small change in the object would therefore trigger the entire object (possibly the transitive closure over the object graph) to be serialized and propagated to the other nodes in the cluster. With a tree, only the modified nodes in the tree need to be serialized and propagated. This is not necessarily a concern for TreeCache, but is a vital requirement for PojoCache (as we will see in the separate PojoCache documentation).

When a change is made to the TreeCache, and that change is done in the context of a transaction, then we defer the replication of changes until the transaction commits successfully. All modifications are kept in a list associated with the transaction for the caller. When the transaction commits, we replicate the changes. Otherwise, (on a rollback) we simply undo the changes locally and release any locks, resulting in zero replication traffic and overhead. For example, if a caller makes 100 modifications and then rolls back the transaction, we will not replicate anything, resulting in no network traffic.

If a caller has no transaction associated with it (and isolation level is not NONE - more about this later), we will replicate right after each modification, e.g. in the above case we would send 100 messages, plus an additional message for the rollback. In this sense, running without a transaction can be thought of as analogous to running with auto-commit switched on in JDBC terminology, where each operation is committed automatically.
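The difference between the two modes can be sketched in plain Java. The class below is a hypothetical, single-threaded model and not TreeCache code: it keeps a list of pending modifications per transaction and counts the messages that would be sent. Sending a single message on commit is a simplifying assumption of the sketch; it does not model the commit protocol itself or the extra rollback message mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collects modifications and replicates them only on commit.
class DeferredReplicationSketch {
    private final List<String> pending = new ArrayList<>(); // modifications of the current transaction
    private final List<String> sent = new ArrayList<>();    // stands in for messages sent to the cluster
    private boolean inTransaction;

    void begin() { inTransaction = true; }

    // Inside a transaction the change is queued; outside it is sent right away.
    void modify(String change) {
        if (inTransaction) pending.add(change);
        else sent.add(change);
    }

    // On commit the queued changes go out together (one message in this sketch).
    void commit() {
        if (!pending.isEmpty()) sent.add(String.join(";", pending));
        pending.clear();
        inTransaction = false;
    }

    // On rollback the queued changes are dropped and nothing is sent.
    void rollback() {
        pending.clear();
        inTransaction = false;
    }

    int messagesSent() { return sent.size(); }
}
```

With this model, 100 calls to modify() followed by rollback() leave messagesSent() at zero, while the same calls made outside a transaction produce one message each.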

There is an API for plugging in different transaction managers: all it requires is to get the transaction associated with the caller's thread. Several TransactionManagerLookup implementations are provided for popular transaction managers, including a DummyTransactionManager for testing.

Finally, we use pessimistic locking of the cache by default, with optimistic locking as a configurable option. With pessimistic locking, we can configure the local locking policy corresponding to database-style transaction isolation levels, i.e., SERIALIZABLE, REPEATABLE_READ, READ_COMMITTED, READ_UNCOMMITTED and NONE. More on transaction isolation levels will be discussed later. Note that the cluster-wide isolation level is READ_UNCOMMITTED by default, as we don't acquire a cluster-wide lock on touching an object for which we don't yet have a lock (this would result in too high an overhead for messaging).

With optimistic locking, isolation levels are ignored, as each transaction effectively maintains a copy of the data it works on and then attempts to merge it back into the tree structure upon transaction completion. This results in a near-serializable degree of data integrity, applied cluster-wide, in exchange for the minor performance penalty incurred when validating workspace data at commit time and the occasional transaction commit failure due to validation failures at commit time.

Chapter 2. Architecture

Figure 2.1. Schematic TreeCache architecture

The architecture is shown above. The example shows 2 Java VMs, each of which has created an instance of TreeCache. These VMs can be located on the same machine, or on 2 different machines. The setup of the underlying group communication subsystem is done using JGroups.

Any modification (see API below) in one cache will be replicated to the other cache[1] and vice versa. Depending on the transactional settings, this replication will occur either after each modification or at the end of a transaction (at commit time). When a new cache is created, it can optionally acquire the contents from one of the existing caches on startup.



[1] Note that you can have more than 2 caches in a cluster.

Chapter 3. Basic API

Here's some sample code before we dive into the API itself:

TreeCache tree = new TreeCache();
tree.setClusterName("demo-cluster");
tree.setClusterProperties("default.xml"); // uses defaults if not provided
tree.setCacheMode(TreeCache.REPL_SYNC);
tree.createService(); // not necessary, but is same as MBean lifecycle
tree.startService(); // kick start tree cache
tree.put("/a/b/c", "name", "Ben");
tree.put("/a/b/c/d", "uid", new Integer(322649));
Integer tmp = (Integer) tree.get("/a/b/c/d", "uid");
tree.remove("/a/b");
tree.stopService();
tree.destroyService(); // not necessary, but is same as MBean lifecycle

The sample code first creates a TreeCache instance and then configures it. There is another constructor which accepts a number of configuration options. However, the TreeCache can be configured entirely from an XML file (shown later) and we don't recommend manual configuration as shown in the sample.

The cluster name, properties of the underlying JGroups stack, and cache mode (synchronous replication) are configured first (a list of configuration options is shown later). Then we start the TreeCache. If replication is enabled, this will make the TreeCache join the cluster, and acquire initial state from an existing node.

Then we add 2 items into the cache: the first element creates a node "a" with a child node "b" that has a child node "c". (TreeCache by default creates intermediary nodes that don't exist). The key "name" is then inserted into the "/a/b/c" node, with a value of "Ben".

The other element will create just the subnode "d" of "c" because "/a/b/c" already exists. It binds the integer 322649 under key "uid".

The resulting tree looks like this:

Figure 3.1. Sample Tree Nodes

The TreeCache has 4 nodes: "a", "b", "c" and "d". Node "/a/b/c" has the key "name" associated with the value "Ben" in its map, and node "/a/b/c/d" has the key "uid" associated with the value 322649.

Each node can be retrieved by its absolute name (e.g. "/a/b/c") or by navigating from parent to children (e.g. navigate from "a" to "b", then from "b" to "c").

The next method in the example gets the value associated with key="uid" in node "/a/b/c/d", which is the integer 322649.

The remove() method then removes node "/a/b" and all subnodes recursively from the cache. In this case, nodes "/a/b/c/d", "/a/b/c" and "/a/b" will be removed, leaving only "/a".

Finally, the TreeCache is stopped. This will cause it to leave the cluster, and every node in the cluster will be notified. Note that TreeCache can be stopped and started again. When it is stopped, all contents will be deleted. And when it is restarted, if it joins a cache group, the state will be replicated initially. So potentially you can recreate the contents.

In the sample, replication was enabled, which caused the 2 put() and the 1 remove() methods to replicate their changes to all nodes in the cluster. The get() method was executed on the local cache only.

Keys into the cache can be either strings separated by slashes ('/'), e.g. "/a/b/c", or they can be fully qualified names (Fqns). An Fqn is essentially a list of Objects that need to implement hashCode() and equals(). All strings are actually transformed into Fqns internally. Fqns are more efficient than strings, for example:

String n1 = "/300/322649";
Fqn n2 = new Fqn(new Object[]{new Integer(300), new Integer(322649)});

In this example, we want to access a node that has information for employee with id=322649 in department with id=300. The string version needs two map lookups on Strings, whereas the Fqn version needs two map lookups on Integers. In a large hashtable, the hashCode() method for String may have collisions, leading to actual string comparisons. Also, clients of the cache may already have identifiers for their objects in Object form, and don't want to transform between Object and Strings, preventing unnecessary copying.
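The idea that a path is a list of objects compared with equals() and hashCode() can be tried out without the Fqn class itself. The helper below is a hypothetical illustration (FqnSketch is not part of JBoss Cache) that represents a path as a java.util.List, which already defines equals() and hashCode() element by element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the idea behind Fqn: a path is a list of objects
// that are compared with equals() and hashed with hashCode().
class FqnSketch {
    // Splits a slash-separated string into its path elements (all Strings).
    static List<Object> fromString(String path) {
        List<Object> elements = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) elements.add(s);
        }
        return elements;
    }

    // Builds a path directly from existing objects, with no string conversion.
    static List<Object> of(Object... elements) {
        return Arrays.asList(elements);
    }
}
```

In this sketch, FqnSketch.of(300, 322649) and FqnSketch.fromString("/300/322649") are different keys, because the first holds Integers and the second holds Strings; a client that already has Integer identifiers can use them as they are.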

Note that the modification methods are put() and remove(). The only get method is get().

There are 2 put() methods[2]: put(Fqn node, Object key, Object value) and put(Fqn node, Map values). The former takes the node name, creates the node if it doesn't yet exist, and puts the key and value into the node's map, returning the previous value. The latter takes a map of keys and values and adds them to the node's map, overwriting existing keys and values. Content that is not in the new map remains in the node's map.

There are 3 remove() methods: remove(Fqn node, Object key), remove(Fqn node) and removeData(Fqn node). The first removes the given key from the node. The second removes the entire node and all subnodes, and the third removes all elements from the given node's map.

The get methods are get(Fqn node) and get(Fqn node, Object key). The former returns a Node[3] object, allowing for direct navigation; the latter returns the value for the given key in a node.

Also, the TreeCache has a number of getters and setters. Since the API may change at any time, we recommend consulting the Javadoc for up-to-date information.



[2] Plus their equivalent helper methods taking a String as node name.

[3] This is mainly used internally, and we may decide to remove public access to the Node in a future release.

Chapter 4. Clustered Caches

The TreeCache can be configured to be either local (standalone) or clustered. If in a cluster, the cache can be configured to replicate changes, or to invalidate changes. A detailed discussion on this follows.

4.1. Local Cache

Local caches don't join a cluster and don't communicate with other nodes in a cluster. Therefore their elements don't need to be serializable - however, we recommend making them serializable, enabling a user to change the cache mode at any time.

4.2. Clustered Cache - Using Replication

Replicated caches replicate all changes to the other TreeCache instances in the cluster. Replication can either happen after each modification (no transactions), or at the end of a transaction (commit time).

Replication can be synchronous or asynchronous. Which option to use is application dependent. Synchronous replication blocks the caller (e.g. on a put()) until the modifications have been replicated successfully to all nodes in a cluster. Asynchronous replication performs replication in the background (the put() returns immediately). TreeCache also offers a replication queue, where modifications are replicated periodically (i.e. interval-based), or when the queue size exceeds a number of elements, or a combination thereof.
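The combination of interval-based and size-based flushing can be sketched as follows. This is a hypothetical, single-threaded model, not the TreeCache replication queue: the class name, its parameters and the explicit "now" timestamps are assumptions made so that the behaviour is easy to follow and to test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a replication queue that flushes by size or by interval.
class ReplicationQueueSketch {
    private final int maxElements;
    private final long intervalMillis;
    private final List<String> queue = new ArrayList<>();
    private final List<List<String>> batches = new ArrayList<>(); // stands in for network sends
    private long lastFlush;

    ReplicationQueueSketch(int maxElements, long intervalMillis, long now) {
        this.maxElements = maxElements;
        this.intervalMillis = intervalMillis;
        this.lastFlush = now;
    }

    // Queues a modification; flushes once the queue size exceeds maxElements.
    void add(String modification, long now) {
        queue.add(modification);
        if (queue.size() > maxElements) flush(now);
    }

    // Called periodically, for example from a timer thread.
    void tick(long now) {
        if (now - lastFlush >= intervalMillis && !queue.isEmpty()) flush(now);
    }

    // Sends everything queued so far as one batch.
    private void flush(long now) {
        batches.add(new ArrayList<>(queue));
        queue.clear();
        lastFlush = now;
    }

    List<List<String>> batches() { return batches; }
}
```

The caller of add() never waits for the network in this model, which is the trade-off described above: throughput improves, but a modification may sit in the queue for up to one interval before other instances see it.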

Asynchronous replication is faster (no caller blocking), because synchronous replication requires acknowledgments from all nodes in a cluster that they received and applied the modification successfully (round-trip time). However, when a synchronous replication returns successfully, the caller knows for sure that all modifications have been applied at all nodes, whereas this may or may not be the case with asynchronous replication. With asynchronous replication, errors are simply written to a log. Even when using transactions, a transaction may succeed but replication may not succeed on all TreeCache instances.

4.2.1. Buddy Replication

Buddy Replication allows you to suppress replicating your data to all instances in a cluster. Instead, each instance picks one or more 'buddies' in the cluster, and only replicates to these specific buddies. This greatly helps scalability as there is no longer a memory and network traffic impact every time another instance is added to a cluster.

One of the most common use cases of Buddy Replication is when a replicated cache is used by a servlet container to store HTTP session data. One of the pre-requisites to buddy replication working well and being a real benefit is the use of session affinity, also known as sticky sessions in HTTP session replication speak. What this means is that if certain data is frequently accessed, it is desirable that this is always accessed on one instance rather than in a round-robin fashion as this helps the cache cluster optimise how it chooses buddies, where it stores data, and minimises replication traffic.

If this is not possible, Buddy Replication may prove to be more of an overhead than a benefit.

4.2.1.1. Selecting Buddies

Buddy Replication uses an instance of org.jboss.cache.buddyreplication.BuddyLocator, which contains the logic used to select buddies in a network. JBoss Cache currently ships with a single implementation, org.jboss.cache.buddyreplication.NextMemberBuddyLocator, which is used as a default if no implementation is provided. The NextMemberBuddyLocator selects the next member in the cluster, as the name suggests, and guarantees an even spread of buddies for each instance.

The NextMemberBuddyLocator takes in 2 parameters, both optional.

  • numBuddies - specifies how many buddies each instance should pick to back its data onto. This defaults to 1.
  • ignoreColocatedBuddies - means that each instance will try to select a buddy on a different physical host. If not able to do so though, it will fall back to colocated instances. This defaults to true.
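The selection rule described by these two parameters can be sketched in plain Java. The code below is a hypothetical model and not the shipped NextMemberBuddyLocator: the Member class, the method name and the treatment of the member list as a ring are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "next member" buddy selection; not the shipped class.
class NextMemberSketch {
    // A cluster member identified by name and the physical host it runs on.
    static final class Member {
        final String name;
        final String host;
        Member(String name, String host) { this.name = name; this.host = host; }
    }

    // Walks the member list as a ring, starting with the member after 'self'.
    static List<String> selectBuddies(List<Member> members, int self,
                                      int numBuddies, boolean ignoreColocated) {
        List<String> buddies = new ArrayList<>();
        List<String> colocated = new ArrayList<>();
        Member me = members.get(self);
        for (int i = 1; i < members.size() && buddies.size() < numBuddies; i++) {
            Member candidate = members.get((self + i) % members.size());
            if (ignoreColocated && candidate.host.equals(me.host)) {
                colocated.add(candidate.name);   // remembered as a fallback only
            } else {
                buddies.add(candidate.name);
            }
        }
        // Fall back to colocated members if not enough remote ones were found.
        for (String name : colocated) {
            if (buddies.size() >= numBuddies) break;
            buddies.add(name);
        }
        return buddies;
    }
}
```

For members A and B on host h1 and C and D on host h2, member A would pick C when colocated buddies are ignored and B otherwise. If every member runs on the same host, the fallback still yields a buddy.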

4.2.1.2. BuddyPools

Also known as replication groups, a buddy pool is an optional construct where each instance in a cluster may be configured with a buddy pool name. Think of this as an 'exclusive club membership' where when selecting buddies, BuddyLocators would try and select buddies sharing the same buddy pool name. This allows system administrators a degree of flexibility and control over how buddies are selected. For example, a sysadmin may put two instances on two separate physical servers that may be on two separate physical racks in the same buddy pool. So rather than picking an instance on a different host on the same rack, BuddyLocators would rather pick the instance in the same buddy pool, on a separate rack which may add a degree of redundancy.

4.2.1.3. Failover

In the unfortunate event of an instance crashing, it is assumed that the client connecting to the cache (directly or indirectly, via some other service such as HTTP session replication) is able to redirect the request to any other random cache instance in the cluster. This is where a concept of Data Gravitation comes in.

Data Gravitation is a concept where if a request is made on a cache in the cluster and the cache does not contain this information, it then asks other instances in the cluster for the data. If even this fails, it would (optionally) ask other instances to check in the backup data they store for other caches. This means that even if a cache containing your session dies, other instances will still be able to access this data by asking the cluster to search through their backups for this data.

Once located, this data is then transferred to the instance which requested it and is added to this instance's data tree. It is then (optionally) removed from all other instances (and backups) so that if session affinity is used, the affinity should now be to this new cache instance which has just taken ownership of this data.
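The lookup order described above (local data, then other instances' main data, then their backups, followed by a transfer of ownership) can be sketched as follows. This is a hypothetical in-memory model rather than the interceptor used by JBoss Cache; the class name and the two boolean parameters are assumptions chosen to mirror the behaviour described in the text.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of data gravitation between cache instances.
class GravitationSketch {
    final Map<String, String> data = new HashMap<>();     // data this instance owns
    final Map<String, String> backups = new HashMap<>();  // data backed up for other instances

    // Looks locally first, then asks peers; a value that is found moves to this instance.
    String get(String key, List<GravitationSketch> peers,
               boolean searchBackups, boolean removeOnFind) {
        String value = data.get(key);
        if (value != null) return value;

        // First ask the other instances for data they own.
        for (GravitationSketch peer : peers) {
            if (value == null) value = peer.data.get(key);
        }
        // If that fails, optionally ask them to search their backups.
        if (value == null && searchBackups) {
            for (GravitationSketch peer : peers) {
                if (value == null) value = peer.backups.get(key);
            }
        }
        if (value == null) return null;

        data.put(key, value);                 // this instance becomes the owner
        if (removeOnFind) {
            for (GravitationSketch peer : peers) {
                peer.data.remove(key);
                peer.backups.remove(key);
            }
        }
        return value;
    }
}
```

In this model, if the instance that owned a session has crashed and only a backup survives on another instance, a request arriving at any third instance still finds the data, provided backups are searched.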

Data Gravitation is implemented as an interceptor. The following (all optional) configuration properties pertain to data gravitation.

  • dataGravitationRemoveOnFind - forces all remote caches that own the data or hold backups for the data to remove that data, thereby making the requesting cache the new data owner. If set to false an evict is broadcast instead of a remove, so any state persisted in cache loaders will remain. This is useful if you have a shared cache loader configured. Defaults to true.
  • dataGravitationSearchBackupTrees - Asks remote instances to search through their backups as well as main data trees. Defaults to true. The resulting effect is that if this is true then backup nodes can respond to data gravitation requests in addition to data owners.
  • autoDataGravitation - Whether data gravitation occurs for every cache miss. By default this is set to false to prevent unnecessary network calls. In most use cases, callers know when they may need to gravitate data and pass in an Option to enable data gravitation on a per-invocation basis. If autoDataGravitation is true this Option is unnecessary.

4.2.1.4. Implementation

Figure 4.1. Class diagram of the classes involved in buddy replication and how they are related to each other

4.2.1.5. Configuration

    <!-- Buddy Replication config -->
    <attribute name="BuddyReplicationConfig">
        <config>

            <!-- Enables buddy replication. This is the ONLY mandatory configuration element here. -->
            <buddyReplicationEnabled>true</buddyReplicationEnabled>

            <!-- These are the default values anyway -->
            <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>

            <!-- numBuddies is the number of backup nodes each node maintains. ignoreColocatedBuddies means that
                 each node will *try* to select a buddy on a different physical host. If not able to do so though,
                 it will fall back to colocated nodes. -->
            <buddyLocatorProperties>
                numBuddies = 1
                ignoreColocatedBuddies = true
            </buddyLocatorProperties>

            <!-- A way to specify a preferred replication group. If specified, we try to pick a buddy who shares
                 the same pool name (falling back to other buddies if not available). This allows the sysadmin to
                 hint at how backup buddies are picked, so for example, nodes may be hinted to pick buddies on a
                 different physical rack or power supply for added fault tolerance. -->
            <buddyPoolName>myBuddyPoolReplicationGroup</buddyPoolName>

            <!-- Communication timeout for inter-buddy group organisation messages (such as assigning to and
                 removing from groups). Defaults to 1000. -->
            <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>

            <!-- Whether data is removed from old owners when gravitated to a new owner. Defaults to true. -->
            <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>

            <!-- Whether backup nodes can respond to data gravitation requests, or only the data owner is
                 supposed to respond. Defaults to true. -->
            <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>

            <!-- Whether all cache misses result in a data gravitation request. Defaults to false, requiring
                 callers to enable data gravitation on a per-invocation basis using the Options API. -->
            <autoDataGravitation>false</autoDataGravitation>

        </config>
    </attribute>
                    
                    

4.3. Clustered Cache - Using Invalidation

If a cache is configured for invalidation rather than replication, every time data is changed in a cache other caches in the cluster receive a message informing them that their data is now stale and should be evicted from memory. Invalidation, when used with a shared cache loader (see chapter on Cache Loaders) would cause remote caches to refer to the shared cache loader to retrieve modified data. The benefit of this is twofold: network traffic is minimised as invalidation messages are very small compared to replicating updated data, and also that other caches in the cluster look up modified data in a lazy manner, only when needed.

Invalidation messages are sent after each modification (no transactions), or at the end of a transaction, upon successful commit. The latter is usually more efficient, as invalidation messages can be optimised for the transaction as a whole rather than on a per-modification basis.

Invalidation too can be synchronous or asynchronous. Just as in the case of replication, synchronous invalidation blocks until all caches in the cluster have received the invalidation messages and evicted stale data, while asynchronous invalidation works in a 'fire-and-forget' mode, where invalidation messages are broadcast but the caller doesn't block and wait for responses.
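The interplay between invalidation and a shared store can be sketched with a small in-memory model. The class below is hypothetical and not JBoss Cache code: a plain Map stands in for the shared cache loader, and invalidation is modelled as a synchronous removal from each peer's memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of invalidation with a shared store behind the caches.
class InvalidationSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> sharedStore;           // stands in for a shared cache loader
    private final List<InvalidationSketch> peers = new ArrayList<>();
    int storeReads;                                          // counts lookups that went to the store

    InvalidationSketch(Map<String, String> sharedStore) { this.sharedStore = sharedStore; }

    void addPeer(InvalidationSketch peer) { peers.add(peer); }

    // Writes locally and to the store, then tells peers only that the key is stale.
    void put(String key, String value) {
        memory.put(key, value);
        sharedStore.put(key, value);
        for (InvalidationSketch peer : peers) peer.memory.remove(key);
    }

    // Reads from memory, falling back to the shared store when the entry is not in memory.
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            storeReads++;
            value = sharedStore.get(key);
            if (value != null) memory.put(key, value);
        }
        return value;
    }
}
```

Only the key travels to the peers in this model, never the value, and a peer pays the cost of a store lookup only if it actually reads the changed entry again.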

Chapter 5. Transactions and Concurrency

5.1. Concurrent Access

JBoss Cache uses a pessimistic locking scheme by default to prevent concurrent access to the same data. Optimistic locking may alternatively be used, and is discussed later.

5.1.1. Locks

Locking is done internally, on a node-level. For example when we want to access "/a/b/c", a lock will be acquired for nodes "a", "b" and "c". When the same transaction wants to access "/a/b/c/d", since we already hold locks for "a", "b" and "c", we only need to acquire a lock for "d".

Lock owners are either transactions (call is made within the scope of an existing transaction) or threads (no transaction associated with the call). Regardless, a transaction or a thread is internally transformed into an instance of GlobalTransaction, which is used as a globally unique ID for modifications across a cluster. E.g. when we run a two-phase commit protocol (see below) across the cluster, the GlobalTransaction uniquely identifies the unit of work across a cluster.

Locks can be read or write locks. Write locks serialize read and write access, whereas read-only locks only serialize read access. When a write lock is held, no other write or read locks can be acquired. When a read lock is held, others can acquire read locks. However, to acquire write locks, one has to wait until all read locks have been released. When scheduled concurrently, write locks always have precedence over read locks. Note that (if enabled) read locks can be upgraded to write locks.

Using read-write locks helps in the following scenario: consider a tree with entries "/a/b/n1" and "/a/b/n2". With write-locks, when Tx1 accesses "/a/b/n1", Tx2 cannot access "/a/b/n2" until Tx1 has completed and released its locks. However, with read-write locks this is possible, because Tx1 acquires read-locks for "/a/b" and a read-write lock for "/a/b/n1". Tx2 is then able to acquire read-locks for "/a/b" as well, plus a read-write lock for "/a/b/n2". This allows for more concurrency in accessing the cache.
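The scenario can be replayed with a small lock table. The sketch below is hypothetical and deliberately simplified: it does not block, it identifies lock owners by a plain string, it assumes paths start with a slash, and it does not release locks that were acquired before a failed attempt. It is not the locking code used by JBoss Cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, non-blocking sketch of per-node read and write locks.
class NodeLockSketch {
    private final Map<String, Set<String>> readers = new HashMap<>(); // node -> owners holding read locks
    private final Map<String, String> writers = new HashMap<>();      // node -> owner holding the write lock

    // A read lock is granted unless another owner holds the write lock.
    boolean tryRead(String node, String owner) {
        String writer = writers.get(node);
        if (writer != null && !writer.equals(owner)) return false;
        readers.computeIfAbsent(node, n -> new HashSet<String>()).add(owner);
        return true;
    }

    // A write lock is granted only if no other owner holds any lock on the node.
    boolean tryWrite(String node, String owner) {
        String writer = writers.get(node);
        if (writer != null && !writer.equals(owner)) return false;
        Set<String> held = readers.get(node);
        if (held != null) {
            for (String reader : held) {
                if (!reader.equals(owner)) return false;
            }
        }
        writers.put(node, owner);
        return true;
    }

    // Read-locks every ancestor and write-locks the last node of the path.
    boolean lockForWrite(String fqn, String owner) {
        String[] names = fqn.substring(1).split("/");   // assumes a leading slash
        String path = "";
        for (int i = 0; i < names.length; i++) {
            path = path + "/" + names[i];
            boolean last = (i == names.length - 1);
            boolean granted = last ? tryWrite(path, owner) : tryRead(path, owner);
            if (!granted) return false;
        }
        return true;
    }
}
```

With this table, Tx1 can lock "/a/b/n1" for writing and Tx2 can lock "/a/b/n2" for writing at the same time, because both only need read locks on "/a" and "/a/b". An attempt by Tx2 to write-lock "/a/b" itself is refused while Tx1 still holds its read lock there.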

5.1.2. Pessimistic locking

By default, JBoss Cache uses pessimistic locking. Locking is not exposed directly to the user. Instead, a transaction isolation level, which determines the locking behaviour, is configurable.

5.1.2.1. Isolation levels

JBoss Cache supports the following transaction isolation levels, analogous to database ACID isolation levels. A user can configure an instance-wide isolation level of NONE, READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, or SERIALIZABLE. REPEATABLE_READ is the default isolation level used.

  1. NONE. No transaction support is needed. There is no locking at this level, i.e., users will have to manage data integrity themselves. Implementations use no locks.

  2. READ_UNCOMMITTED. Data can be read anytime while write operations are exclusive. Note that this level doesn't prevent the so-called "dirty read" where data modified in Tx1 can be read in Tx2 before Tx1 commits. In other words, if you have the following sequence,

        Tx1   Tx2
         W
               R

    using this isolation level will not prevent the Tx2 read operation. Implementations typically use an exclusive lock for writes while reads don't need to acquire a lock.

  3. READ_COMMITTED. Data can be read any time as long as there is no write. This level prevents the dirty read, but it doesn't prevent the so-called "non-repeatable read", where a thread that reads the same data twice can get different results. For example, if you have the following sequence,

        Tx1   Tx2
         R
               W
         R

    the second read in the Tx1 thread will produce a different result.

    Implementations usually use a read-write lock; reads succeed in acquiring the lock when there are only reads, writes have to wait until there are no more readers holding the lock, and readers are blocked from acquiring the lock until there are no more writers holding the lock. Reads typically release the read-lock when done, so that a subsequent read of the same data has to re-acquire a read-lock; this leads to non-repeatable reads, where 2 reads of the same data might return different values. Note that the write applies regardless of transaction state (whether it has been committed or not).

  4. REPEATABLE_READ. Data can be read while there is no write and vice versa. This level prevents "non-repeatable read" but it does not prevent the so-called "phantom read" where new data can be inserted into the tree from the other transaction. Implementations typically use a read-write lock. This is the default isolation level used.

  5. SERIALIZABLE. Data access is synchronized with exclusive locks. Only 1 writer or reader can have the lock at any given time. Locks are released at the end of the transaction. Regarded as very poor for performance and thread/transaction concurrency.

5.1.3. Optimistic locking

The motivation for optimistic locking is to improve concurrency. When a lot of threads have a lot of contention for access to the data tree, it can be inefficient to lock portions of the tree - for reading or writing - for the entire duration of a transaction as we do in pessimistic locking. Optimistic locking allows for greater concurrency of threads and transactions by using a technique called data versioning, explained below. Note that isolation levels (if configured) are ignored if optimistic locking is enabled.

5.1.3.1. Architecture

Optimistic locking treats all method calls as transactional[4]. Even if you do not invoke a call within the scope of an ongoing transaction, JBoss Cache creates an implicit transaction and commits this transaction when the invocation completes. Each transaction maintains a transaction workspace, which contains a copy of the data used within the transaction.

For example, if a transaction calls get("/a/b/c"), nodes a, b and c are copied from the main data tree into the workspace. The data is versioned and all calls in the transaction work on the copy of the data rather than the actual data. When the transaction commits, its workspace is merged back into the underlying tree by matching versions. If there is a version mismatch - such as when the actual data tree has a higher version than the workspace, perhaps because another transaction accessed the same data, changed it and committed before the first transaction could finish - the transaction throws a RollbackException when committing and the commit fails.

Optimistic locking uses the same locks we speak of above, but the locks are only held for a very short duration - at the start of a transaction to build a workspace, and when the transaction commits and has to merge data back into the tree.

So while optimistic locking may occasionally fail if version validations fail or may run slightly slower than pessimistic locking due to the inevitable overhead and extra processing of maintaining workspaces, versioned data and validating on commit, it does buy you a near-SERIALIZABLE degree of data integrity while maintaining a very high level of concurrency.
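
The workspace-and-version idea described in this section can be sketched in a few lines. This is not the JBoss Cache implementation: OptimisticSketch, Versioned and Workspace are hypothetical names, the tree is flattened to a map keyed by Fqn string, and an IllegalStateException stands in for the RollbackException mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of version-validated commits; not JBoss Cache code. */
final class OptimisticSketch {
    /** A value together with the version it had when it was read or written. */
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    /** The shared "main data tree", flattened to a map from Fqn string to node. */
    private final Map<String, Versioned> tree = new HashMap<>();

    synchronized Versioned read(String fqn) {
        Versioned v = tree.get(fqn);
        return v != null ? v : new Versioned(null, 0L);
    }

    /** Per-transaction workspace holding private copies of the nodes it touched. */
    final class Workspace {
        private final Map<String, Versioned> readVersions = new HashMap<>();
        private final Map<String, String> writes = new HashMap<>();

        String get(String fqn) {
            if (writes.containsKey(fqn)) return writes.get(fqn);
            Versioned copy = readVersions.get(fqn);
            if (copy == null) {
                copy = read(fqn);          // copy the node into the workspace
                readVersions.put(fqn, copy);
            }
            return copy.value;
        }

        void put(String fqn, String value) {
            get(fqn);                      // remember the version seen at first touch
            writes.put(fqn, value);
        }

        /** Validates versions, then merges; throws if another commit got in first. */
        void commit() {
            synchronized (OptimisticSketch.this) {
                for (Map.Entry<String, Versioned> e : readVersions.entrySet()) {
                    if (read(e.getKey()).version != e.getValue().version) {
                        throw new IllegalStateException("version mismatch on " + e.getKey());
                    }
                }
                for (Map.Entry<String, String> e : writes.entrySet()) {
                    long next = readVersions.get(e.getKey()).version + 1;
                    tree.put(e.getKey(), new Versioned(e.getValue(), next));
                }
            }
        }
    }

    Workspace begin() { return new Workspace(); }
}
```

Note that the shared map is only locked briefly, while a node is copied into the workspace and during validation and merge at commit. This mirrors the short lock durations described above.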

5.1.3.2. Configuration

Optimistic locking is enabled by using the NodeLockingScheme XML attribute, and setting it to "OPTIMISTIC":
			...
			<!--
				Node locking scheme:
			    OPTIMISTIC
			    PESSIMISTIC (default)
			-->
			<attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
			...
		

5.2. Transactional Support

JBoss Cache can be configured to use transactions to bundle units of work, which can then be replicated as one unit. Alternatively, if transaction support is disabled, it is equivalent to setting AutoCommit to on where modifications are potentially[5] replicated after every change (if replication is enabled).

What JBoss Cache does on every incoming call (e.g. put()) is:

  1. get the transaction associated with the thread

  2. register (if not already done) with the transaction manager to be notified when a transaction commits or is rolled back.

In order to do this, the cache has to be configured with an instance of a TransactionManagerLookup which returns a javax.transaction.TransactionManager.

JBoss Cache ships with JBossTransactionManagerLookup and GenericTransactionManagerLookup. The JBossTransactionManagerLookup is able to bind to a running JBoss Application Server and retrieve a TransactionManager while the GenericTransactionManagerLookup is able to bind to most popular Java EE application servers and provide the same functionality. A dummy implementation - DummyTransactionManagerLookup - is also provided, which may be used for standalone JBoss Cache applications and unit tests running outside a Java EE Application Server. Being a dummy, however, this is just for demo and testing purposes and is not recommended for production use.

The implementation of the JBossTransactionManagerLookup is as follows:

public class JBossTransactionManagerLookup implements TransactionManagerLookup {

    public JBossTransactionManagerLookup() {}

    public TransactionManager getTransactionManager() throws Exception {
       Object tmp=new InitialContext().lookup("java:/TransactionManager");
       return (TransactionManager)tmp;
    }
}

The implementation looks up the JBoss Transaction Manager from JNDI and returns it.

When a call comes in, the TreeCache gets the current transaction and records the modification under the transaction as key. (If there is no transaction, the modification is applied immediately and possibly replicated.) So over the lifetime of the transaction, all modifications are recorded and associated with the transaction. Also, when the TreeCache first encounters a transaction, it registers with it to be notified when the transaction commits or aborts.

When a transaction rolls back, we undo the changes in the cache and release all locks.
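
As a rough illustration of recording modifications under the transaction as key and undoing them on rollback, consider the sketch below. It assumes changes are applied locally as they arrive and reverted from an undo list; TxLogSketch and Modification are hypothetical names, the transaction is represented by an arbitrary Object, and null values are treated as "absent".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: modifications recorded per transaction, undone on rollback. */
final class TxLogSketch {
    /** One recorded put: which key changed and what it held before. */
    private static final class Modification {
        final String key;
        final String oldValue;
        Modification(String key, String oldValue) { this.key = key; this.oldValue = oldValue; }
    }

    private final Map<String, String> data = new HashMap<>();
    private final Map<Object, List<Modification>> byTx = new HashMap<>();

    /** Applies the put and records it under the transaction; tx == null means no transaction. */
    void put(Object tx, String key, String value) {
        String old = data.put(key, value);
        if (tx != null) {
            byTx.computeIfAbsent(tx, t -> new ArrayList<>()).add(new Modification(key, old));
        }
    }

    String get(String key) { return data.get(key); }

    /** Commit: the changes stay; only the bookkeeping is dropped. */
    void commit(Object tx) { byTx.remove(tx); }

    /** Rollback: undo the recorded changes in reverse order. */
    void rollback(Object tx) {
        List<Modification> mods = byTx.remove(tx);
        if (mods == null) return;
        for (int i = mods.size() - 1; i >= 0; i--) {
            Modification m = mods.get(i);
            if (m.oldValue == null) {
                data.remove(m.key);
            } else {
                data.put(m.key, m.oldValue);
            }
        }
    }
}
```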

When the transaction commits, we initiate a two-phase commit protocol[6] : in the first phase, a PREPARE containing all modifications for the current transaction is sent to all nodes in the cluster. Each node acquires all necessary locks and applies the changes, and then sends back a success message. If a node in a cluster cannot acquire all locks, or fails otherwise, it sends back a failure message.

The coordinator of the two-phase commit protocol waits for all responses (or a timeout, whichever occurs first). If one of the nodes in the cluster responds with FAIL (or we hit the timeout), then a rollback phase is initiated: a ROLLBACK message is sent to all nodes in the cluster. On reception of the ROLLBACK message, every node undoes the changes for the given transaction, and releases all locks held for the transaction.

If all responses are OK, a COMMIT message is sent to all nodes in the cluster. On reception of a COMMIT message, each node applies the changes for the given transaction and releases all locks associated with the transaction.

When we refer to a 'transaction', we actually mean a global representation of a local transaction, which uniquely identifies a transaction across a cluster.
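
The message flow of the protocol can be sketched as a small coordinator loop. This is a simplified, single-threaded illustration under stated assumptions: TwoPhaseCommitSketch, Participant and RecordingNode are hypothetical names, nodes are called in turn rather than in parallel, and timeouts are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a two-phase commit coordinator; names are illustrative. */
final class TwoPhaseCommitSketch {
    /** What each cluster node must be able to do for a given global transaction id. */
    interface Participant {
        boolean prepare(String txId, List<String> modifications); // true = OK, false = FAIL
        void commit(String txId);
        void rollback(String txId);
    }

    /** Returns true if the transaction committed on all nodes, false if it was rolled back. */
    static boolean run(String txId, List<String> modifications, List<Participant> nodes) {
        boolean allOk = true;
        for (Participant node : nodes) {           // phase 1: PREPARE to every node
            boolean ok;
            try {
                ok = node.prepare(txId, modifications);
            } catch (RuntimeException e) {
                ok = false;                         // a failure counts as a FAIL response
            }
            allOk &= ok;
        }
        for (Participant node : nodes) {           // phase 2: COMMIT or ROLLBACK to every node
            if (allOk) {
                node.commit(txId);
            } else {
                node.rollback(txId);
            }
        }
        return allOk;
    }

    /** Trivial in-memory participant used to exercise the coordinator. */
    static final class RecordingNode implements Participant {
        final boolean vote;
        final List<String> log = new ArrayList<>();
        RecordingNode(boolean vote) { this.vote = vote; }
        public boolean prepare(String txId, List<String> mods) { log.add("PREPARE"); return vote; }
        public void commit(String txId) { log.add("COMMIT"); }
        public void rollback(String txId) { log.add("ROLLBACK"); }
    }
}
```

With one node voting OK and another voting FAIL, run(...) returns false and both nodes record PREPARE followed by ROLLBACK.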

5.2.1. Example

Let's look at an example of how to use JBoss Cache in a standalone (i.e. outside an application server) fashion with dummy transactions:

Properties prop = new Properties();
prop.put(Context.INITIAL_CONTEXT_FACTORY, "org.jboss.cache.transaction.DummyContextFactory");
UserTransaction tx = (UserTransaction) new InitialContext(prop).lookup("UserTransaction");
TreeCache tree = new TreeCache();
PropertyConfigurator config = new PropertyConfigurator();
config.configure(tree, "META-INF/replSync-service.xml");
tree.createService(); // not necessary
tree.startService(); // kick start tree cache

try {
   tx.begin();
   tree.put("/classes/cs-101", "description", "the basics");
   tree.put("/classes/cs-101", "teacher", "Ben");
   tx.commit();
}
catch(Throwable ex) {
   try { tx.rollback(); } catch(Throwable t) {}
}

The first lines obtain a user transaction using the 'JEE way' via JNDI. Note that we could also say

UserTransaction tx = new DummyUserTransaction(DummyTransactionManager.getInstance());

Then we create a new TreeCache and configure it using a PropertyConfigurator class and a configuration XML file (see below for a list of all configuration options).

Next we start the cache. Then, we start a transaction (and associate it with the current thread internally). Any methods invoked on the cache will now be collected and only applied when the transaction is committed. In the above case, we create a node "/classes/cs-101" and add 2 elements to its map. Assuming that the cache is configured to use synchronous replication, on transaction commit the modifications are replicated. If there is an exception in the methods (e.g. lock acquisition failed), or in the two-phase commit protocol applying the modifications to all nodes in the cluster, the transaction is rolled back.



[4] Because of this requirement, you must always have a transaction manager configured when using optimistic locking.

[5] Depending on whether interval-based asynchronous replication is used

[6] Only with synchronous replication or invalidation.

Chapter 6. Eviction Policies

Eviction policies specify the behavior of a node residing inside the cache, e.g. its lifetime and the maximum number of nodes allowed. Memory constraints on servers mean caches cannot grow indefinitely, so policies need to be in place to restrict the size of the cache in memory.

6.1. Eviction Policy Plugin

The design of the JBoss Cache eviction policy framework is based on the loosely coupled observable pattern (albeit still synchronous), in which the eviction region manager registers a TreeCacheListener to handle cache events and relay them back to the eviction policies. Whenever a cached node is added, removed, evicted, or visited, the registered eviction TreeCacheListener maintains state statistics and relays the information to each individual eviction Region. Each Region can define a different EvictionPolicy implementation that knows how to correlate cache add, remove, and visit events back to a defined eviction behavior. It is the policy provider's responsibility to decide when to call back the cache "evict" operation.

There is a single eviction thread (timer) that will run at a configured interval. This thread will make calls into each of the policy providers and inform it of any TreeCacheListener aggregated adds, removes and visits (gets) to the cache during the configured interval. The eviction thread is responsible for kicking off the eviction policy processing (a single pass) for each configured eviction cache region.
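
The hand-off between the synchronous listener and the periodic eviction thread might look roughly like the sketch below. EvictionTimerSketch and RegionSketch are hypothetical stand-ins for the real classes, events are plain strings, and the "policy pass" only counts events; regions are assumed to be registered before the timer is started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the listener-to-timer hand-off; not the JBoss Cache classes. */
final class EvictionTimerSketch {
    /** A region buffers events from the cache listener until the timer drains them. */
    static final class RegionSketch {
        final String name;
        final Queue<String> pendingEvents = new ConcurrentLinkedQueue<>();
        int processed; // events handed to the policy so far
        RegionSketch(String name) { this.name = name; }

        /** Called synchronously on the caller's thread for every add, remove or visit. */
        void onEvent(String event) { pendingEvents.add(event); }

        /** One policy pass: drain what accumulated since the last wake-up. */
        void process() {
            while (pendingEvents.poll() != null) {
                processed++;
            }
        }
    }

    private final List<RegionSketch> regions = new ArrayList<>();

    void addRegion(RegionSketch region) { regions.add(region); }

    /** A single pass over every configured region. */
    void runOnce() {
        for (RegionSketch region : regions) {
            region.process();
        }
    }

    /** Starts the single timer thread that calls runOnce() at the configured interval. */
    ScheduledExecutorService start(long wakeUpIntervalSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(this::runOnce, wakeUpIntervalSeconds,
                wakeUpIntervalSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

The cache-facing path (onEvent) only enqueues, so the cost of eviction bookkeeping is paid on the timer thread rather than on the thread that accessed the cache.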

In order to implement an eviction policy, the following interfaces must be implemented: org.jboss.cache.eviction.EvictionPolicy, org.jboss.cache.eviction.EvictionAlgorithm, org.jboss.cache.eviction.EvictionQueue and org.jboss.cache.eviction.EvictionConfiguration. Taken together, these interface implementations define all the underlying mechanics necessary for a complete eviction policy implementation.

Figure 6.1. TreeCache eviction UML Diagram

public interface EvictionPolicy
{
   /**
    * Evict a node from the underlying cache.
    *
    * @param fqn DataNode corresponds to this fqn.
    * @throws Exception
    */
   void evict(Fqn fqn) throws Exception;

   /**
    * Return children names as Objects
    *
    * @param fqn
    * @return Child names under given fqn
    */
   Set getChildrenNames(Fqn fqn);

   /**
    * Is this a leaf node?
    *
    * @param fqn
    * @return true/false if leaf node.
    */
   boolean hasChild(Fqn fqn);

   Object getCacheData(Fqn fqn, Object key);

   /**
    * Method called to configure this implementation.
    */
   void configure(TreeCache cache);

   /**
    * Get the associated EvictionAlgorithm used by the EvictionPolicy.
    * <p/>
    * This relationship should be 1-1.
    *
    * @return An EvictionAlgorithm implementation.
    */
   EvictionAlgorithm getEvictionAlgorithm();

   /**
    * The EvictionConfiguration implementation class used by this EvictionPolicy.
    *
    * @return EvictionConfiguration implementation class.
    */
   Class getEvictionConfigurationClass();

}
public interface EvictionAlgorithm
{
   /**
    * Entry point for eviction algorithm. This is an api called by the EvictionTimerTask
    * to process the node events in waiting and actual pruning, if necessary.
    *
    * @param region Region that this algorithm will operate on.
    */
   void process(Region region) throws EvictionException;

   /**
    * Reset the whole eviction queue. Queue may need to be reset due to corrupted state, for example.
    *
    * @param region Region that this algorithm will operate on.
    */
   void resetEvictionQueue(Region region);

   /**
    * Get the EvictionQueue implementation used by this algorithm.
    *
    * @return the EvictionQueue implementation.
    */
   EvictionQueue getEvictionQueue();

}
      
public interface EvictionQueue
{
   /**
    * Get the first entry in the queue.
    * <p/>
    * If there are no entries in queue, this method will return null.
    * <p/>
    * The first node returned is expected to be the first node to evict.
    *
    * @return first NodeEntry in queue.
    */
   public NodeEntry getFirstNodeEntry();

   /**
    * Retrieve a node entry by Fqn.
    * <p/>
    * This will return null if the entry is not found.
    *
    * @param fqn Fqn of the node entry to retrieve.
    * @return Node Entry object associated with given Fqn param.
    */
   public NodeEntry getNodeEntry(Fqn fqn);

   public NodeEntry getNodeEntry(String fqn);

   /**
    * Check if queue contains the given NodeEntry.
    *
    * @param entry NodeEntry to check for existence in queue.
    * @return true/false if NodeEntry exists in queue.
    */
   public boolean containsNodeEntry(NodeEntry entry);

   /**
    * Remove a NodeEntry from queue.
    * <p/>
    * If the NodeEntry does not exist in the queue, this method will return normally.
    *
    * @param entry The NodeEntry to remove from queue.
    */
   public void removeNodeEntry(NodeEntry entry);

   /**
    * Add a NodeEntry to the queue.
    *
    * @param entry The NodeEntry to add to queue.
    */
   public void addNodeEntry(NodeEntry entry);

   /**
    * Get the size of the queue.
    *
    * @return The number of items in the queue.
    */
   public int size();

   /**
    * Clear the queue.
    */
   public void clear();

}
      
public interface EvictionConfiguration
{
   public static final int WAKEUP_DEFAULT = 5;

   public static final String ATTR = "attribute";
   public static final String NAME = "name";

   public static final String REGION = "region";
   public static final String WAKEUP_INTERVAL_SECONDS = "wakeUpIntervalSeconds";
   public static final String MAX_NODES = "maxNodes";
   public static final String TIME_TO_IDLE_SECONDS = "timeToIdleSeconds";
   public static final String TIME_TO_LIVE_SECONDS = "timeToLiveSeconds";
   public static final String MAX_AGE_SECONDS = "maxAgeSeconds";
   public static final String MIN_NODES = "minNodes";
   public static final String REGION_POLICY_CLASS = "policyClass";

   /**
    * Parse the XML configuration for the given specific eviction region.
    * <p/>
    * The element parameter should contain the entire region block. An example
    * of an entire Element of the region would be:
    * <p/>
    * <region name="abc">
    * <attribute name="maxNodes">10</attribute>
    * </region>
    *
    * @param element DOM element for the region. <region name="abc"></region>
    * @throws ConfigureException
    */
   public void parseXMLConfig(Element element) throws ConfigureException;
}

Note that:

  • The EvictionConfiguration class 'parseXMLConfig(Element)' method expects only the DOM element pertaining to the region the policy is being configured for.

  • The EvictionConfiguration implementation should maintain getter and setter methods for configured properties pertaining to the policy used on a given cache region (e.g. for LRUConfiguration there is an int getMaxNodes() and a setMaxNodes(int)).
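
To make the two points above concrete, here is a minimal sketch of a configuration holder that reads a single property from a region element. MaxNodesConfigSketch is a hypothetical class that mirrors, but does not implement, EvictionConfiguration, so that it compiles with the JDK alone; it throws IllegalArgumentException where a real implementation would throw ConfigureException.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical configuration holder; mirrors, but does not implement, EvictionConfiguration. */
final class MaxNodesConfigSketch {
    private int maxNodes;

    public int getMaxNodes() { return maxNodes; }
    public void setMaxNodes(int maxNodes) { this.maxNodes = maxNodes; }

    /** Reads <attribute name="maxNodes">N</attribute> from a single <region> element. */
    public void parseXMLConfig(Element region) {
        NodeList attributes = region.getElementsByTagName("attribute");
        for (int i = 0; i < attributes.getLength(); i++) {
            Element attribute = (Element) attributes.item(i);
            if ("maxNodes".equals(attribute.getAttribute("name"))) {
                setMaxNodes(Integer.parseInt(attribute.getTextContent().trim()));
                return;
            }
        }
        throw new IllegalArgumentException("region has no maxNodes attribute");
    }

    /** Helper for trying the parser out on an XML string. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid XML", e);
        }
    }
}
```

Only the region element is passed in, never the surrounding config block, which matches the expectation stated in the first point.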

Alternatively, the implementation of a new eviction policy provider can be simplified by extending BaseEvictionPolicy and BaseEvictionAlgorithm. For properly sorted EvictionAlgorithms (sorted in eviction order, see LFUAlgorithm), extending BaseSortedEvictionAlgorithm and implementing SortedEvictionQueue takes care of most of the common functionality needed by a set of eviction policy provider classes.

public abstract class BaseEvictionPolicy implements EvictionPolicy
{
   protected static final Fqn ROOT = new Fqn("/");

   protected TreeCache cache_;

   public BaseEvictionPolicy()
   {
   }

   /** EvictionPolicy interface implementation */

   /**
    * Evict the node under given Fqn from cache.
    *
    * @param fqn The fqn of a node in cache.
    * @throws Exception
    */
   public void evict(Fqn fqn) throws Exception
   {
      cache_.evict(fqn);
   }

   /**
    * Return a set of child names under a given Fqn.
    *
    * @param fqn Get child names for given Fqn in cache.
    * @return Set of children name as Objects
    */
   public Set getChildrenNames(Fqn fqn)
   {
      try
      {
         return cache_.getChildrenNames(fqn);
      }
      catch (CacheException e)
      {
         e.printStackTrace();
      }
      return null;
   }

   public boolean hasChild(Fqn fqn)
   {
      return cache_.hasChild(fqn);
   }

   public Object getCacheData(Fqn fqn, Object key)
   {
      try
      {
         return cache_.get(fqn, key);
      }
      catch (CacheException e)
      {
         e.printStackTrace();
      }
      return null;
   }

   public void configure(TreeCache cache)
   {
      this.cache_ = cache;
   }

}
public abstract class BaseEvictionAlgorithm implements EvictionAlgorithm
{
   private static final Log log = LogFactory.getLog(BaseEvictionAlgorithm.class);

   protected Region region;
   protected BoundedBuffer recycleQueue;
   protected EvictionQueue evictionQueue;

   /**
    * This method will create an EvictionQueue implementation and prepare it for use.
    *
    * @param region Region to setup an eviction queue for.
    * @return The created EvictionQueue to be used as the eviction queue for this algorithm.
    * @throws EvictionException
    * @see EvictionQueue
    */
   protected abstract EvictionQueue setupEvictionQueue(Region region) throws EvictionException;

   /**
    * This method will check whether the given node should be evicted or not.
    *
    * @param ne NodeEntry to test eviction for.
    * @return True if the given node should be evicted. False if the given node should not be evicted.
    */
   protected abstract boolean shouldEvictNode(NodeEntry ne);

   protected BaseEvictionAlgorithm()
   {
      recycleQueue = new BoundedBuffer();
   }

   protected void initialize(Region region) throws EvictionException
   {
      this.region = region;
      evictionQueue = setupEvictionQueue(region);
   }

   /**
    * Process the given region.
    * <p/>
    * Eviction Processing encompasses the following:
    * <p/>
    * - Add/Remove/Visit Nodes
    * - Prune according to Eviction Algorithm
    * - Empty/Retry the recycle queue of previously evicted but locked (during actual cache eviction) nodes.
    *
    * @param region Cache region to process for eviction.
    * @throws EvictionException
    */
   public void process(Region region) throws EvictionException
   {
      if (this.region == null)
      {
         this.initialize(region);
      }

      this.processQueues(region);
      this.emptyRecycleQueue();
      this.prune();
   }

   public void resetEvictionQueue(Region region)
   {
   }

   /**
    * Get the underlying EvictionQueue implementation.
    *
    * @return the EvictionQueue used by this algorithm
    * @see EvictionQueue
    */
   public EvictionQueue getEvictionQueue()
   {
      return this.evictionQueue;
   }

   /**
    * Event processing for Evict/Add/Visiting of nodes.
    * <p/>
    * - On AddEvents a new element is added into the eviction queue
    * - On RemoveEvents, the removed element is removed from the eviction queue.
    * - On VisitEvents, the visited node has its eviction statistics updated (idleTime, numberOfNodeVisits, etc..)
    *
    * @param region Cache region to process for eviction.
    * @throws EvictionException
    */
   protected void processQueues(Region region) throws EvictionException
   {
      EvictedEventNode node;
      int count = 0;
      while ((node = region.takeLastEventNode()) != null)
      {
         int eventType = node.getEvent();
         Fqn fqn = node.getFqn();

         count++;
         switch (eventType)
         {
            case EvictedEventNode.ADD_EVENT:
               this.processAddedNodes(fqn);
               break;
            case EvictedEventNode.REMOVE_EVENT:
               this.processRemovedNodes(fqn);
               break;
            case EvictedEventNode.VISIT_EVENT:
               this.processVisitedNodes(fqn);
               break;
            default:
               throw new RuntimeException("Illegal Eviction Event type " + eventType);
         }
      }

      if (log.isTraceEnabled())
      {
         log.trace("processed " + count + " node events");
      }

   }

   protected void evict(NodeEntry ne)
   {
//      NodeEntry ne = evictionQueue.getNodeEntry(fqn);
      if (ne != null)
      {
         evictionQueue.removeNodeEntry(ne);
         if (!this.evictCacheNode(ne.getFqn()))
         {
            try
            {
               recycleQueue.put(ne.getFqn()); // queue the Fqn: emptyRecycleQueue() casts entries to Fqn
            }
            catch (InterruptedException e)
            {
               e.printStackTrace();
            }
         }
      }
   }

   /**
    * Evict a node from cache.
    *
    * @param fqn node corresponds to this fqn
    * @return True if successful
    */
   protected boolean evictCacheNode(Fqn fqn)
   {
      if (log.isTraceEnabled())
      {
         log.trace("Attempting to evict cache node with fqn of " + fqn);
      }
      EvictionPolicy policy = region.getEvictionPolicy();
      // Do an eviction of this node

      try
      {
         policy.evict(fqn);
      }
      catch (Exception e)
      {
         if (e instanceof TimeoutException)
         {
            log.warn("eviction of " + fqn + " timed out. Will retry later.");
            return false;
         }
         e.printStackTrace();
         return false;
      }

      if (log.isTraceEnabled())
      {
         log.trace("Eviction of cache node with fqn of " + fqn + " successful");
      }

      return true;
   }

   /**
    * Process an Added cache node.
    *
    * @param fqn FQN of the added node.
    * @throws EvictionException
    */
   protected void processAddedNodes(Fqn fqn) throws EvictionException
   {
      if (log.isTraceEnabled())
      {
         log.trace("Adding node " + fqn + " to eviction queue");
      }

      long stamp = System.currentTimeMillis();
      NodeEntry ne = new NodeEntry(fqn);
      ne.setModifiedTimeStamp(stamp);
      ne.setNumberOfNodeVisits(1);
      // add it to the node map and eviction queue
      if (evictionQueue.containsNodeEntry(ne))
      {
         if (log.isTraceEnabled())
         {
            log.trace("Queue already contains " + ne.getFqn() + " processing it as visited");
         }
         this.processVisitedNodes(ne.getFqn());
         return;
      }

      evictionQueue.addNodeEntry(ne);

      if (log.isTraceEnabled())
      {
         log.trace(ne.getFqn() + " added successfully to eviction queue");
      }
   }

   /**
    * Remove a node from cache.
    * <p/>
    * This method will remove the node from the eviction queue as well as
    * evict the node from cache.
    * <p/>
    * If a node cannot be removed from cache, this method will remove it from the eviction queue
    * and place the element into the recycleQueue. Each node in the recycle queue will get retried until
    * proper cache eviction has taken place.
    * <p/>
    * Because EvictionQueues are collections, when iterating them from an iterator, use iterator.remove()
    * to avoid ConcurrentModificationExceptions. Use the boolean parameter to indicate the calling context.
    *
    * @param fqn FQN of the removed node
    * @throws EvictionException
    */
   protected void processRemovedNodes(Fqn fqn) throws EvictionException
   {
      if (log.isTraceEnabled())
      {
         log.trace("Removing node " + fqn + " from eviction queue and attempting eviction");
      }

      NodeEntry ne = evictionQueue.getNodeEntry(fqn);
      if (ne != null)
      {
         evictionQueue.removeNodeEntry(ne);
      }

      if (log.isTraceEnabled())
      {
         log.trace(fqn + " removed from eviction queue");
      }
   }

   /**
    * Visit a node in cache.
    * <p/>
    * This method will update the numVisits and modifiedTimestamp properties of the Node.
    * These properties are used as statistics to determine eviction (LRU, LFU, MRU, etc..)
    * <p/>
    * *Note* that this method updates Node Entries by reference and does not put them back
    * into the queue. For some sorted collections, a remove, and a re-add is required to
    * maintain the sorted order of the elements.
    *
    * @param fqn FQN of the visited node.
    * @throws EvictionException
    */
   protected void processVisitedNodes(Fqn fqn) throws EvictionException
   {
      NodeEntry ne = evictionQueue.getNodeEntry(fqn);
      if (ne == null)
      {
         this.processAddedNodes(fqn);
         return;
      }
      // note this method will visit and modify the node statistics by reference!
      // if a collection is only guaranteed sort order by adding to the collection,
      // this implementation will not guarantee sort order.
      ne.setNumberOfNodeVisits(ne.getNumberOfNodeVisits() + 1);
      ne.setModifiedTimeStamp(System.currentTimeMillis());
   }

   /**
    * Empty the Recycle Queue.
    * <p/>
    * This method will go through the recycle queue and retry to evict the nodes from cache.
    *
    * @throws EvictionException
    */
   protected void emptyRecycleQueue() throws EvictionException
   {
      while (true)
      {
         Fqn fqn;

         try
         {
            fqn = (Fqn) recycleQueue.poll(0);
         }
         catch (InterruptedException e)
         {
            e.printStackTrace();
            break;
         }

         if (fqn == null)
         {
            if (log.isTraceEnabled())
            {
               log.trace("Recycle queue is empty");
            }
            break;
         }

         if (log.isTraceEnabled())
         {
            log.trace("emptying recycle bin. Evict node " + fqn);
         }

         // Still doesn't work
         if (!evictCacheNode(fqn))
         {
            try
            {
               recycleQueue.put(fqn);
            }
            catch (InterruptedException e)
            {
               e.printStackTrace();
            }
            break;
         }
      }
   }

   protected void prune() throws EvictionException
   {
      NodeEntry entry;
      while ((entry = evictionQueue.getFirstNodeEntry()) != null)
      {
         if (this.shouldEvictNode(entry))
         {
            this.evict(entry);
         }
         else
         {
            break;
         }
      }
   }

}

Note that:

  • The BaseEvictionAlgorithm class maintains a processing structure. It first processes the ADD, REMOVE, and VISIT events queued by the Region (the events originate from the EvictionTreeCacheListener). It also maintains a collection of items that were not properly evicted during the last pass because of held locks, and it retries evicting those items. Finally, the EvictionQueue itself is pruned of entries that should be evicted based on the configured eviction rules for the region.
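
The retry-then-prune cycle can be illustrated with a small stand-alone sketch. PruneSketch is hypothetical and deliberately does not extend BaseEvictionAlgorithm: Fqns are plain strings, a "locked" set simulates nodes whose eviction times out, and the eviction rule (shouldEvictNode) is simply "evict while the queue holds more than maxNodes entries".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-alone sketch of the retry-then-prune cycle. */
final class PruneSketch {
    private final Deque<String> evictionQueue = new ArrayDeque<>(); // head = first to evict
    private final Deque<String> recycleQueue = new ArrayDeque<>();  // failed evictions to retry
    private final Set<String> locked = new HashSet<>();
    private final List<String> evicted = new ArrayList<>();
    private final int maxNodes;

    PruneSketch(int maxNodes) { this.maxNodes = maxNodes; }

    void nodeAdded(String fqn) { evictionQueue.addLast(fqn); }
    void lock(String fqn) { locked.add(fqn); }
    void unlock(String fqn) { locked.remove(fqn); }
    List<String> evicted() { return evicted; }

    /** Stand-in for the cache eviction call; fails while the node is locked. */
    private boolean evictCacheNode(String fqn) {
        if (locked.contains(fqn)) return false;
        evicted.add(fqn);
        return true;
    }

    /** The rule a concrete algorithm supplies; here a simple size limit. */
    private boolean shouldEvictNode() { return evictionQueue.size() > maxNodes; }

    void process() {
        // Retry nodes that could not be evicted last time; stop at the first failure.
        while (!recycleQueue.isEmpty()) {
            if (!evictCacheNode(recycleQueue.peekFirst())) break;
            recycleQueue.pollFirst();
        }
        // Prune from the head of the queue while the rule says so.
        while (!evictionQueue.isEmpty() && shouldEvictNode()) {
            String fqn = evictionQueue.pollFirst();
            if (!evictCacheNode(fqn)) {
                recycleQueue.addLast(fqn); // locked: retry on a later pass
            }
        }
    }
}
```

As in the listing above, an entry leaves the eviction queue before the eviction is attempted, so a locked node no longer counts against the size limit while it waits in the recycle queue.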

   public abstract class BaseSortedEvictionAlgorithm extends BaseEvictionAlgorithm implements EvictionAlgorithm
   {
      private static final Log log = LogFactory.getLog(BaseSortedEvictionAlgorithm.class);

      public void process(Region region) throws EvictionException
      {
         super.process(region);
      }

      protected void processQueues(Region region) throws EvictionException
      {
         boolean evictionNodesModified = false;

         EvictedEventNode node;
         int count = 0;
         while ((node = region.takeLastEventNode()) != null)
         {
            int eventType = node.getEvent();
            Fqn fqn = node.getFqn();

            count++;
            switch (eventType)
            {
               case EvictedEventNode.ADD_EVENT:
                  this.processAddedNodes(fqn);
                  evictionNodesModified = true;
                  break;
               case EvictedEventNode.REMOVE_EVENT:
                  this.processRemovedNodes(fqn);
                  break;
               case EvictedEventNode.VISIT_EVENT:
                  this.processVisitedNodes(fqn);
                  evictionNodesModified = true;
                  break;
               default:
                  throw new RuntimeException("Illegal Eviction Event type " + eventType);
            }
         }

         if (log.isTraceEnabled())
         {
            log.trace("Eviction nodes visited or added requires resort of queue " + evictionNodesModified);
         }

         this.resortEvictionQueue(evictionNodesModified);


         if (log.isTraceEnabled())
         {
            log.trace("processed " + count + " node events");
         }

      }

      /**
       * This method is called to resort the queue after add or visit events have occurred.
       * <p/>
       * If the parameter is true, the queue needs to be resorted. If it is false, the queue does not
       * need resorting.
       *
       * @param evictionQueueModified True if the queue was added to or visited during event processing.
       */
      protected void resortEvictionQueue(boolean evictionQueueModified)
      {
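         if (!evictionQueueModified)
         {
            return; // nothing was added or visited, so the existing order is still valid
         }

         long begin = System.currentTimeMillis();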
         ((SortedEvictionQueue) evictionQueue).resortEvictionQueue();
         long end = System.currentTimeMillis();

         if (log.isTraceEnabled())
         {
            long diff = end - begin;
            log.trace("Took " + diff + "ms to sort queue with " + getEvictionQueue().size() + " elements");
         }
      }

   }

Note that:

  • The BaseSortedEvictionAlgorithm class will maintain a boolean through the algorithm processing that will determine if any new nodes were added or visited. This allows the Algorithm to determine whether to resort the eviction queue items (in first to evict order) or to skip the potentially expensive sorting if there have been no changes to the cache in this region.

public interface SortedEvictionQueue extends EvictionQueue
{
   /**
    * Provide contract to resort a sorted queue.
    */
   public void resortEvictionQueue();
}

Note that:

  • The SortedEvictionQueue interface defines the contract that the BaseSortedEvictionAlgorithm abstract class uses to resort the underlying queue. Again, the queue should be sorted in first-to-evict order: the first entry in the queue should be evicted before the last entry, and the last entry in the queue should be the last one that requires eviction.
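
A queue kept in first-to-evict order could be sketched as follows. This is an LFU-style illustration only: SortedQueueSketch and Entry are hypothetical names, entries are ordered by visit count, and the class mirrors the idea of SortedEvictionQueue without implementing the real interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical LFU-style queue; it mirrors the idea of SortedEvictionQueue only. */
final class SortedQueueSketch {
    static final class Entry {
        final String fqn;
        int visits = 1;
        Entry(String fqn) { this.fqn = fqn; }
    }

    private final List<Entry> entries = new ArrayList<>();

    Entry add(String fqn) {
        Entry e = new Entry(fqn);
        entries.add(e);
        return e;
    }

    /** Entries are updated by reference, so the order is stale until the next resort. */
    void visit(Entry e) { e.visits++; }

    /** Re-establish first-to-evict order: fewest visits first. */
    void resortEvictionQueue() {
        entries.sort(Comparator.comparingInt((Entry e) -> e.visits));
    }

    /** The first entry is the next eviction candidate, or null if the queue is empty. */
    Entry getFirstNodeEntry() {
        return entries.isEmpty() ? null : entries.get(0);
    }
}
```

Because visits only change a counter on the entry, the head of the queue can be wrong until resortEvictionQueue() runs, which is why the sorted algorithm resorts after processing add and visit events.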

6.2. TreeCache Eviction Policy Configuration

TreeCache 1.2.x allows a single eviction policy provider class to be configured for use by all regions. As of TreeCache 1.3.x, each cache region can define its own eviction policy provider, or it can use the eviction policy provider class defined at the cache level (1.2.x backwards compatibility).

Here is an example of a legacy 1.2.x EvictionPolicyConfig element that configures TreeCache for use with a single eviction policy provider:

          <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>
          <!-- Specific eviction policy configurations. This is LRU -->
          <attribute name="EvictionPolicyConfig">
             <config>
                <attribute name="wakeUpIntervalSeconds">5</attribute>
                <!-- Cache wide default -->
                <region name="/_default_">
                    <attribute name="maxNodes">5000</attribute>
                    <attribute name="timeToLiveSeconds">1000</attribute>
                </region>
                <region name="/org/jboss/data">
                    <attribute name="maxNodes">5000</attribute>
                    <attribute name="timeToLiveSeconds">1000</attribute>
                </region>
                <region name="/org/jboss/test/data">
                    <attribute name="maxNodes">5</attribute>
                    <attribute name="timeToLiveSeconds">4</attribute>
                </region>
                <region name="/test/">
                    <attribute name="maxNodes">10000</attribute>
                    <attribute name="timeToLiveSeconds">5</attribute>
                </region>
                <region name="/maxAgeTest/">
                   <attribute name="maxNodes">10000</attribute>
                   <attribute name="timeToLiveSeconds">8</attribute>
                   <attribute name="maxAgeSeconds">10</attribute>
                </region>
             </config>
          </attribute>
       

Here is an example of configuring a different eviction policy provider per region:

      <attribute name="EvictionPolicyConfig">
         <config>
            <attribute name="wakeUpIntervalSeconds">5</attribute>
            <!-- Cache wide default -->
            <region name="/_default_" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">5000</attribute>
               <attribute name="timeToLiveSeconds">1000</attribute>
            </region>
            <region name="/org/jboss/data" policyClass="org.jboss.cache.eviction.LFUPolicy">
               <attribute name="maxNodes">5000</attribute>
               <attribute name="minNodes">1000</attribute>
            </region>
            <region name="/org/jboss/test/data" policyClass="org.jboss.cache.eviction.FIFOPolicy">
               <attribute name="maxNodes">5</attribute>
            </region>
            <region name="/test/" policyClass="org.jboss.cache.eviction.MRUPolicy">
               <attribute name="maxNodes">10000</attribute>
            </region>
            <region name="/maxAgeTest/" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">10000</attribute>
               <attribute name="timeToLiveSeconds">8</attribute>
               <attribute name="maxAgeSeconds">10</attribute>
            </region>
         </config>
      </attribute>
       

Lastly, an example of mixed mode. In this scenario the regions that have a specific policy defined will use that policy. Those that do not will default to the policy defined on the entire cache instance.

      <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>
      <!-- Specific eviction policy configurations. This is LRU -->
      <attribute name="EvictionPolicyConfig">
         <config>
            <attribute name="wakeUpIntervalSeconds">5</attribute>
            <!-- Cache wide default -->
            <region name="/_default_">
               <attribute name="maxNodes">5000</attribute>
               <attribute name="timeToLiveSeconds">1000</attribute>
            </region>
            <region name="/org/jboss/data" policyClass="org.jboss.cache.eviction.FIFOPolicy">
               <attribute name="maxNodes">5000</attribute>
            </region>
            <region name="/test/" policyClass="org.jboss.cache.eviction.MRUPolicy">
               <attribute name="maxNodes">10000</attribute>
            </region>
            <region name="/maxAgeTest/">
               <attribute name="maxNodes">10000</attribute>
               <attribute name="timeToLiveSeconds">8</attribute>
               <attribute name="maxAgeSeconds">10</attribute>
            </region>
         </config>
      </attribute>
       

TreeCache now allows eviction policy providers to be reconfigured programmatically at runtime. The following example shows how to reconfigure an LRU region at runtime, setting maxNodes to 12345, timeToLiveSeconds to 500 and maxAgeSeconds to 1000:

         // note this is just to show that a running TreeCache instance must be
         // retrieved somehow. How it is implemented is up to the implementor.
         TreeCache cache = getRunningTreeCacheInstance();

         org.jboss.cache.eviction.RegionManager regionManager = cache.getEvictionRegionManager();
         org.jboss.cache.eviction.Region region = regionManager.getRegion("/myRegionName");
         EvictionConfiguration config = region.getEvictionConfiguration();
         ((LRUConfiguration)config).setMaxNodes(12345);
         ((LRUConfiguration)config).setTimeToLiveSeconds(500);
         ((LRUConfiguration)config).setMaxAgeSeconds(1000);
       

6.3. TreeCache LRU eviction policy implementation

TreeCache implements an LRU eviction policy, org.jboss.cache.eviction.LRUPolicy, that controls both node lifetime and age. This policy guarantees constant-time, O(1), adds, removals and lookups (visits). It has the following configuration parameters:

  • wakeUpIntervalSeconds. This is the interval (in seconds) to process the node events and also to perform sweeping for the size limit and age-out nodes.

  • Region. A region is a group of nodes that share the same eviction policy settings, e.g., the same expiry time. In TreeCache, a region is denoted by an Fqn, e.g., /company/personnel, and it is recursive. The order in which regions are specified is important. For example, if /org/jboss/test is specified before /org/jboss/test/data, then any node under /org/jboss/test/data belongs to the first region rather than the second. Note also that whenever an eviction policy is activated, there should always be a /_default_ region, which covers all nodes that do not fall under a region specified by the user. In addition, the region configuration is not programmable, i.e., all the policies have to be specified via XML configuration.

    • maxNodes. This is the maximum number of nodes allowed in this region. 0 denotes no limit.

    • timeToLiveSeconds. Time to idle (in seconds) before the node is swept away. 0 denotes no limit.

    • maxAgeSeconds. Time an object should exist in TreeCache (in seconds) regardless of idle time before the node is swept away. 0 denotes no limit.

Please see the above section for an example.
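
The interplay of maxNodes, timeToLiveSeconds and maxAgeSeconds can be illustrated with a minimal sketch. It is not the LRUPolicy source: the class LruRegionSketch, its methods and the use of java.util.LinkedHashMap are assumptions made for illustration only, and timestamps are passed in explicitly so that the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified LRU region. This is not the LRUPolicy source. */
class LruRegionSketch {

    /** Per-node bookkeeping, in milliseconds. */
    static final class Stamp {
        final long createdAt;
        long lastVisit;

        Stamp(long now) {
            this.createdAt = now;
            this.lastVisit = now;
        }
    }

    private final int maxNodes;          // 0 denotes no limit
    private final long timeToLiveMillis; // idle limit, 0 denotes no limit
    private final long maxAgeMillis;     // lifetime limit, 0 denotes no limit

    // accessOrder = true keeps the least recently used node first.
    private final LinkedHashMap<String, Stamp> nodes =
            new LinkedHashMap<String, Stamp>(16, 0.75f, true);

    LruRegionSketch(int maxNodes, long timeToLiveSeconds, long maxAgeSeconds) {
        this.maxNodes = maxNodes;
        this.timeToLiveMillis = timeToLiveSeconds * 1000L;
        this.maxAgeMillis = maxAgeSeconds * 1000L;
    }

    /** Records an add or a visit: a hash lookup plus a relink. */
    void visit(String fqn, long now) {
        Stamp stamp = nodes.get(fqn); // get() also moves the node to the most recent end
        if (stamp == null) {
            nodes.put(fqn, new Stamp(now));
        } else {
            stamp.lastVisit = now;
        }
    }

    /** One pass of the eviction timer; returns the Fqns evicted in this pass. */
    List<String> sweep(long now) {
        List<String> evicted = new ArrayList<String>();
        Iterator<Map.Entry<String, Stamp>> it = nodes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Stamp> entry = it.next();
            Stamp stamp = entry.getValue();
            boolean overSize = maxNodes > 0 && nodes.size() > maxNodes;
            boolean idleTooLong = timeToLiveMillis > 0 && now - stamp.lastVisit > timeToLiveMillis;
            boolean tooOld = maxAgeMillis > 0 && now - stamp.createdAt > maxAgeMillis;
            if (overSize || idleTooLong || tooOld) {
                evicted.add(entry.getKey());
                it.remove();
            }
        }
        return evicted;
    }
}
```

Because the map iterates from least to most recently used, the size limit removes the least recently used nodes first, while the idle and age limits are checked for every node regardless of its position.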

6.4. TreeCache FIFO eviction policy implementation

TreeCache implements a FIFO eviction policy, org.jboss.cache.eviction.FIFOPolicy, that evicts nodes in strict first-in, first-out order. This policy guarantees constant-time, O(1), adds, removals and lookups (visits). It has the following configuration parameters:

  • wakeUpIntervalSeconds. This is the interval (in seconds) to process the node events and also to perform sweeping for the size limit and age-out nodes.

  • Region. A region is a group of nodes that share the same eviction policy settings, e.g., the same expiry time. In TreeCache, a region is denoted by an Fqn, e.g., /company/personnel, and it is recursive. The order in which regions are specified is important. For example, if /org/jboss/test is specified before /org/jboss/test/data, then any node under /org/jboss/test/data belongs to the first region rather than the second. Note also that whenever an eviction policy is activated, there should always be a /_default_ region, which covers all nodes that do not fall under a region specified by the user. In addition, the region configuration is not programmable, i.e., all the policies have to be specified via XML configuration.

    • maxNodes. This is the maximum number of nodes allowed in this region. Any value less than or equal to 0 causes an exception to be thrown when the policy provider is configured for use.

Please read the above section for an example.
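
The effect of region ordering described in the Region parameter can be sketched as a first-match lookup over the configured regions. The class RegionOrderSketch and its string-based Fqn handling are hypothetical and only illustrate the rule; they do not reproduce how TreeCache resolves regions internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical first-match region lookup; Fqns are plain strings here. */
class RegionOrderSketch {

    // Regions in the order in which they appear in the configuration.
    private final List<String> regions = new ArrayList<String>();

    void addRegion(String fqn) {
        regions.add(normalize(fqn));
    }

    /** Returns the first configured region that contains the node, else /_default_. */
    String regionFor(String nodeFqn) {
        String node = normalize(nodeFqn);
        for (String region : regions) {
            // Regions are recursive: a node matches its region and every sub-path of it.
            if (node.equals(region) || node.startsWith(region + "/")) {
                return region;
            }
        }
        return "/_default_";
    }

    /** Drops a trailing slash so that "/test/" and "/test" compare equal. */
    private static String normalize(String fqn) {
        if (fqn.length() > 1 && fqn.endsWith("/")) {
            return fqn.substring(0, fqn.length() - 1);
        }
        return fqn;
    }
}
```

With /org/jboss/test registered before /org/jboss/test/data, a node such as /org/jboss/test/data/x resolves to /org/jboss/test; registering the more specific region first reverses the outcome.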

6.5. TreeCache MRU eviction policy implementation

TreeCache implements an MRU eviction policy, org.jboss.cache.eviction.MRUPolicy, that controls eviction based on a most recently used algorithm. The most recently used nodes are the first to be evicted under this policy. This policy guarantees constant-time, O(1), adds, removals and lookups (visits). It has the following configuration parameters:

  • wakeUpIntervalSeconds. This is the interval (in seconds) to process the node events and also to perform sweeping for the size limit and age-out nodes.

  • Region. A region is a group of nodes that share the same eviction policy settings, e.g., the same expiry time. In TreeCache, a region is denoted by an Fqn, e.g., /company/personnel, and it is recursive. The order in which regions are specified is important. For example, if /org/jboss/test is specified before /org/jboss/test/data, then any node under /org/jboss/test/data belongs to the first region rather than the second. Note also that whenever an eviction policy is activated, there should always be a /_default_ region, which covers all nodes that do not fall under a region specified by the user. In addition, the region configuration is not programmable, i.e., all the policies have to be specified via XML configuration.

    • maxNodes. This is the maximum number of nodes allowed in this region. Any value less than or equal to 0 causes an exception to be thrown when the policy provider is configured for use.

Please read the above section for an example.
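
As a rough illustration of the MRU behaviour and of the maxNodes validation, consider the following sketch. MruRegionSketch is a hypothetical class, and the choice of IllegalArgumentException is an assumption; the text above only states that an exception is thrown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified MRU region. This is not the MRUPolicy source. */
class MruRegionSketch {

    private final int maxNodes;
    // Head of the deque = most recently used node.
    private final Deque<String> byRecency = new ArrayDeque<String>();

    MruRegionSketch(int maxNodes) {
        if (maxNodes <= 0) {
            // Assumed exception type, for illustration only.
            throw new IllegalArgumentException("maxNodes must be greater than 0");
        }
        this.maxNodes = maxNodes;
    }

    /** Records an add or a visit by moving the node to the head. */
    void visit(String fqn) {
        // Linear removal keeps the sketch short; a constant-time version
        // would pair a linked list with a hash index.
        byRecency.remove(fqn);
        byRecency.addFirst(fqn);
    }

    /** Evicts from the most recently used end until the region fits maxNodes. */
    List<String> sweep() {
        List<String> evicted = new ArrayList<String>();
        while (byRecency.size() > maxNodes) {
            evicted.add(byRecency.removeFirst());
        }
        return evicted;
    }
}
```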

6.6. TreeCache LFU eviction policy implementation

TreeCache implements an LFU eviction policy, org.jboss.cache.eviction.LFUPolicy, that controls eviction based on a least frequently used algorithm. The least frequently used nodes are the first to be evicted under this policy. Node usage starts at 1 when a node is first added. Each time the node is visited, its usage counter is incremented by 1. This number is used to determine which nodes are least frequently used. LFU is also a sorted eviction algorithm: the underlying EvictionQueue implementation is sorted in ascending order of the node visits counter. This class guarantees constant-time, O(1), adds, removals and searches. However, when any number of nodes are added to or visited in the queue during a given processing pass, a single O(n log n) operation is used to re-sort the queue into proper LFU order. Similarly, if any nodes are removed or evicted, a single O(n) pruning operation is necessary to clean up the EvictionQueue. LFU has the following configuration parameters:

  • wakeUpIntervalSeconds. This is the interval (in seconds) to process the node events and also to perform sweeping for the size limit and age-out nodes.

  • Region. A region is a group of nodes that share the same eviction policy settings, e.g., the same expiry time. In TreeCache, a region is denoted by an Fqn, e.g., /company/personnel, and it is recursive. The order in which regions are specified is important. For example, if /org/jboss/test is specified before /org/jboss/test/data, then any node under /org/jboss/test/data belongs to the first region rather than the second. Note also that whenever an eviction policy is activated, there should always be a /_default_ region, which covers all nodes that do not fall under a region specified by the user. In addition, the region configuration is not programmable, i.e., all the policies have to be specified via XML configuration.

    • maxNodes. This is the maximum number of nodes allowed in this region. A value of 0 for maxNodes means that there is no upper bound for the configured cache region.

    • minNodes. This is the minimum number of nodes allowed in this region. This value determines what the eviction queue should prune down to per pass. For example, if minNodes is 10 and the cache grows to 100 nodes, the cache is pruned down to the 10 most frequently used nodes when the eviction timer makes a pass through the eviction algorithm.

Please read the above section for an example.
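
The counting, re-sorting and pruning steps described for LFU can be sketched as follows. LfuRegionSketch is a hypothetical class that models only the visit counter and the minNodes pruning target; maxNodes is left out, and the tie-break by Fqn is an assumption added to make the order deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified LFU region. This is not the LFUPolicy source. */
class LfuRegionSketch {

    private final int minNodes;
    private final Map<String, Integer> visits = new HashMap<String, Integer>();

    LfuRegionSketch(int minNodes) {
        this.minNodes = minNodes;
    }

    /** Usage starts at 1 on the first add and grows by 1 per visit. */
    void visit(String fqn) {
        visits.merge(fqn, 1, Integer::sum);
    }

    int size() {
        return visits.size();
    }

    /** One eviction pass: sort ascending by visits, then prune down to minNodes. */
    List<String> sweep() {
        List<String> queue = new ArrayList<String>(visits.keySet());
        Comparator<String> byVisits = Comparator.comparingInt(f -> visits.get(f));
        // Single O(n log n) re-sort per pass; ties are broken by Fqn (an assumption).
        queue.sort(byVisits.thenComparing(Comparator.<String>naturalOrder()));

        List<String> evicted = new ArrayList<String>();
        int toEvict = Math.max(0, queue.size() - minNodes);
        for (int i = 0; i < toEvict; i++) {
            String fqn = queue.get(i); // least frequently used nodes come first
            visits.remove(fqn);
            evicted.add(fqn);
        }
        return evicted;
    }
}
```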

Chapter 7. Cache Loaders

JBoss Cache can use a cache loader to back up the in-memory cache to a backend datastore. If JBoss Cache is configured with a cache loader, then the following features are provided:

  • Whenever a cache element is accessed, and that element is not in the cache (e.g. due to eviction or due to server restart), then the cache loader transparently loads the element into the cache if found in the backend store.
  • Whenever an element is modified, added or removed, then that modification is persisted in the backend store via the cache loader. If transactions are used, all modifications created within a transaction are persisted. To this end, the cache loader takes part in the two phase commit protocol run by the transaction manager.

Currently, the cache loader API looks similar to the TreeCache API. In the future, they will both implement the same interface. The goal is to be able to form hierarchical cache topologies, where one cache can delegate to another, which in turn may delegate to yet another cache.

As of JBossCache 1.3.0, you can define several cache loaders in a chain. The cache consults the cache loaders in the order in which they are configured until it finds a valid, non-null element of data. When performing writes, all cache loaders are written to, except those for which the ignoreModifications element has been set to true. See the configuration section below for details.
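
The read and write behaviour of a chain can be sketched with plain maps standing in for the individual cache loaders. LoaderChainSketch and its nested Link class are hypothetical names used for illustration; they are not part of the JBoss Cache API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of chained cache loaders; not the JBoss Cache API. */
class LoaderChainSketch {

    /** Stand-in for one configured cache loader and its backend store. */
    static final class Link {
        final Map<String, Object> store = new HashMap<String, Object>();
        final boolean ignoreModifications;

        Link(boolean ignoreModifications) {
            this.ignoreModifications = ignoreModifications;
        }
    }

    // Loaders in the order in which they were configured.
    private final List<Link> links = new ArrayList<Link>();

    void add(Link link) {
        links.add(link);
    }

    /** Reads consult each loader in order and stop at the first non-null value. */
    Object get(String fqn) {
        for (Link link : links) {
            Object value = link.store.get(fqn);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Writes go to every loader except those that ignore modifications. */
    void put(String fqn, Object value) {
        for (Link link : links) {
            if (!link.ignoreModifications) {
                link.store.put(fqn, value);
            }
        }
    }
}
```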

The cache loader interface is defined in org.jboss.cache.loader.CacheLoader as follows (edited for brevity):

public interface CacheLoader extends Service {

   /**
    * Sets the configuration. Will be called before {@link #create()} and {@link #start()}
    * @param props A set of properties specific to a given CacheLoader
    */
   void setConfig(Properties props);

   void setCache(TreeCache c);


   /**
    * Returns a list of children names, all names are <em>relative</em>. Returns null if the parent node is not found.
    * The returned set must not be modified, e.g. use Collections.unmodifiableSet(s) to return the result
    * @param fqn The FQN of the parent
    * @return Set<String>. A list of children. Returns null if no children nodes are present, or the parent is
    * not present
    */
   Set getChildrenNames(Fqn fqn) throws Exception;


   /**
    * Returns the value for a given key. Returns null if the node doesn't exist, or the value is not bound
    */
   Object get(Fqn name, Object key) throws Exception;


   /**
    * Returns all keys and values from the persistent store, given a fully qualified name.
    * 
    * NOTE that the expected return value of this method has changed from JBossCache 1.2.x 
    * and before!  This will affect cache loaders written prior to JBossCache 1.3.0 and such 
    * implementations should be checked for compliance with the behaviour expected.
    *  
    * @param name
    * @return Map<Object,Object> of keys and values for the given node. Returns null if the node is not
    * found.  If the node is found but has no attributes, this method returns an empty Map.
    * @throws Exception
    */
   Map get(Fqn name) throws Exception;



   /**
    * Checks whether the CacheLoader has a node with Fqn
    * @return True if node exists, false otherwise
    */
   boolean exists(Fqn name) throws Exception;


   /**
    * Inserts key and value into the attributes hashmap of the given node. If the node does not exist, all
    * parent nodes from the root down are created automatically
    */
   void put(Fqn name, Object key, Object value) throws Exception;

   /**
    * Inserts all elements of attributes into the attributes hashmap of the given node, overwriting existing
    * attributes, but not clearing the existing hashmap before insertion (making it a union of existing and
    * new attributes)
    * If the node does not exist, all parent nodes from the root down are created automatically
    * @param name The fully qualified name of the node
    * @param attributes A Map of attributes. Can be null
    */
   void put(Fqn name, Map attributes) throws Exception;

   /**
    * Inserts all modifications to the backend store. Overwrite whatever is already in
    * the datastore.
    * @param modifications A List<Modification> of modifications
    * @throws Exception
    */
   void put(List modifications) throws Exception;

   /** Removes the given key and value from the attributes of the given node. No-op if node doesn't exist */
   void remove(Fqn name, Object key) throws Exception;

   /**
    * Removes the given node. If the node is the root of a subtree, this will recursively remove all subnodes,
    * depth-first
    */
   void remove(Fqn name) throws Exception;

   /** Removes all attributes from a given node, but doesn't delete the node itself */
   void removeData(Fqn name) throws Exception;


   /**
    * Prepare the modifications. For example, for a DB-based CacheLoader:
    * <ol>
    * <li>Create a local (JDBC) transaction
    * <li>Associate the local transaction with <code>tx</code> (tx is the key)
    * <li>Execute the corresponding SQL statements against the DB (statements derived from modifications)
    * </ol>
    * For non-transactional CacheLoader (e.g. file-based), this could be a null operation
    * @param tx            The transaction, just used as a hashmap key
    * @param modifications List<Modification>, a list of all modifications within the given transaction
    * @param one_phase     Persist immediately and (for example) commit the local JDBC transaction as well. When true,
    *                      we won't get a {@link #commit(Object)} or {@link #rollback(Object)} method call later
    */
   void prepare(Object tx, List modifications, boolean one_phase) throws Exception;

   /**
    * Commit the transaction. A DB-based CacheLoader would look up the local JDBC transaction associated
    * with <code>tx</code> and commit that transaction<br/>
    * Non-transactional CacheLoaders could simply write the data that was previously saved transiently under the
    * given <code>tx</code> key, to (for example) a file system (note this only holds if the previous prepare() did
    * not define one_phase=true)
    */
   void commit(Object tx) throws Exception;

   /**
    * Roll the transaction back. A DB-based CacheLoader would look up the local JDBC transaction associated
    * with <code>tx</code> and roll back that transaction
    */
   void rollback(Object tx);

   /**
    * Fetch the entire state for this cache from secondary storage (disk, DB) and return it as a byte buffer.
    * This is for initialization of a new cache from a remote cache. The new cache would then call
    * storeEntireState()
    * todo: define binary format for exchanging state
    */
   byte[] loadEntireState() throws Exception;

   /** Store the given state in secondary storage. Overwrite whatever is currently in storage */
   void storeEntireState(byte[] state) throws Exception;
}

NOTE: the contract defined by the CacheLoader interface has changed from JBoss Cache 1.3.0 onwards, specifically with the get(Fqn fqn) method. Special care must be taken with custom CacheLoader implementations to ensure this new contract is still adhered to. See the javadoc above on this method for details, or visit this wiki page for more discussion on this.

CacheLoader implementations that need to support partial state transfer should also implement the subinterface org.jboss.cache.loader.ExtendedCacheLoader:

public interface ExtendedCacheLoader extends CacheLoader
{
   /**
    * Fetch a portion of the state for this cache from secondary storage 
    * (disk, DB) and return it as a byte buffer.
    * This is for activation of a portion of new cache from a remote cache. 
    * The new cache would then call {@link #storeState(byte[], Fqn)}.
    * 
    * @param subtree Fqn naming the root (i.e. highest level parent) node of
    *                the subtree for which state is requested.
    *                
    * @see org.jboss.cache.TreeCache#activateRegion(String)
    */
   byte[] loadState(Fqn subtree) throws Exception;
   
   /**
    * Store the given portion of the cache tree's state in secondary storage. 
    * Overwrite whatever is currently in secondary storage.  If the transferred 
    * state has Fqns equal to or children of parameter subtree, 
    * then no special behavior is required.  Otherwise, ensure that
    * the state is integrated under the given 'subtree'. Typically
    * in the latter case 'subtree' would be the Fqn of the buddy 
    * backup region for a buddy group; e.g.
    * 
    * If the transferred state had Fqns starting with "/a" and
    * 'subtree' was "/_BUDDY_BACKUP_/192.168.1.2:5555" then the
    * state should be stored in the local persistent store under
    * "/_BUDDY_BACKUP_/192.168.1.2:5555/a"
    * 
    * @param state   the state to store
    * @param subtree Fqn naming the root (i.e. highest level parent) node of
    *                the subtree included in 'state'.  If the Fqns  
    *                of the data included in 'state' are not 
    *                already children of 'subtree', then their
    *                Fqns should be altered to make them children of 
    *                'subtree' before they are persisted.
    */   
   void storeState(byte[] state, Fqn subtree) throws Exception;
   
   /**
    * Sets the {@link RegionManager} this object should use to manage 
    * marshalling/unmarshalling of different regions using different
    * classloaders.
    *
    * NOTE: This method is only intended to be used by the TreeCache instance 
    * this cache loader is associated with.
    * 
    * @param manager    the region manager to use, or null.
    */
   void setRegionManager(RegionManager manager);

}
   

NOTE: If a cache loader is used along with buddy replication, the cache loader must implement ExtendedCacheLoader unless its FetchPersistentState property is set to false.

NOTE: the contract defined by the ExtendedCacheLoader interface has changed from JBoss Cache 1.4.0 onwards, specifically with the requirement that data passed to storeState method be integrated under the given subtree, even if that data didn't originate in that subtree. This behavior is necessary to properly support buddy replication. Special care must be taken with custom ExtendedCacheLoader implementations to ensure this new contract is still adhered to.
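
The integration rule for storeState can be illustrated on Fqn strings alone. The helper below is a hypothetical sketch, not part of the ExtendedCacheLoader interface; a real implementation would apply the same decision to every Fqn it finds while unmarshalling the transferred state.

```java
/** Hypothetical helper showing where a transferred Fqn ends up; Fqns are plain strings. */
class SubtreeIntegrationSketch {

    /**
     * Returns the Fqn under which transferred data should be persisted.
     * Data already at or below 'subtree' keeps its Fqn; anything else is
     * re-rooted so that it becomes a child of 'subtree'.
     */
    static String integrate(String fqn, String subtree) {
        if (subtree.equals("/")) {
            return fqn; // everything is already below the root
        }
        if (fqn.equals(subtree) || fqn.startsWith(subtree + "/")) {
            return fqn;
        }
        return subtree + fqn;
    }
}
```

For the buddy backup case from the javadoc above, integrate("/a", "/_BUDDY_BACKUP_/192.168.1.2:5555") yields "/_BUDDY_BACKUP_/192.168.1.2:5555/a".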

7.1. The CacheLoader Interface

The interaction between JBoss Cache and a CacheLoader implementation is as follows. When CacheLoaderConfiguration (see below) is non-null, an instance of each configured cacheloader is created when the cache is created. Since CacheLoader extends Service,

public interface Service {
   void create() throws Exception;

   void start() throws Exception;

   void stop();

   void destroy();
}

CacheLoader.create() and CacheLoader.start() are called when the cache is started. Correspondingly, stop() and destroy() are called when the cache is stopped.

setConfig() and setCache() are also called on each CacheLoader instance. The former is used to configure this instance of the CacheLoader (as noted in the interface javadoc above, it is called before create() and start()); the latter can be used to store a reference to the cache. For example, a database CacheLoader could establish a connection to the database at this point.

The CacheLoader interface has a set of methods that are called when no transactions are used: get(), put(), remove() and removeData(): they get/set/remove the value immediately. These methods are described as javadoc comments in the above interface.

Then there are three methods that are used with transactions: prepare(), commit() and rollback(). The prepare() method is called when a transaction is to be committed. It takes a transaction object and a list of modifications as arguments. The transaction object can be used as a key into a hashmap of transactions, where the values are the lists of modifications. Each modification list has a number of Modification elements, which represent the changes made to a cache for a given transaction. When prepare() returns successfully, then the CacheLoader must be able to commit (or rollback) the transaction successfully.

Currently, the TreeCache takes care of calling prepare(), commit() and rollback() on the CacheLoaders at the right time. We intend to make both the TreeCache and the CacheLoaders XA resources, so that instead of calling those methods on a loader, the cache will only enlist the loader with the TransactionManager on the same transaction.

The commit() method tells the CacheLoader to commit the transaction, and the rollback() method tells the CacheLoader to discard the changes associated with that transaction.
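
The hashmap-of-transactions idea can be sketched for a non-transactional, in-memory store. TxLoaderSketch and its nested Mod class are hypothetical simplifications; in particular, Mod models only a single put and is not the Modification class, and the IllegalStateException on an unknown transaction is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of prepare/commit/rollback keyed by the transaction object. */
class TxLoaderSketch {

    /** Simplified modification: a put of one value under one Fqn. */
    static final class Mod {
        final String fqn;
        final Object value;

        Mod(String fqn, Object value) {
            this.fqn = fqn;
            this.value = value;
        }
    }

    private final Map<String, Object> store = new HashMap<String, Object>();
    // The transaction object is used only as a key into this map.
    private final Map<Object, List<Mod>> pending = new HashMap<Object, List<Mod>>();

    void prepare(Object tx, List<Mod> modifications, boolean onePhase) {
        if (onePhase) {
            apply(modifications); // persist immediately; no commit or rollback follows
        } else {
            pending.put(tx, new ArrayList<Mod>(modifications));
        }
    }

    void commit(Object tx) {
        List<Mod> modifications = pending.remove(tx);
        if (modifications == null) {
            throw new IllegalStateException("no prepared transaction for " + tx);
        }
        apply(modifications);
    }

    void rollback(Object tx) {
        pending.remove(tx); // discard the changes recorded for this transaction
    }

    Object get(String fqn) {
        return store.get(fqn);
    }

    private void apply(List<Mod> modifications) {
        for (Mod mod : modifications) {
            store.put(mod.fqn, mod.value);
        }
    }
}
```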

The last two methods are loadEntireState() and storeEntireState(). The first method asks the CacheLoader to get the entire state the backend store manages and return it as a byte buffer, and the second tells a CacheLoader to replace its entire state with the byte buffer argument. These methods are used for scenarios where each JBossCache node in a cluster has its own local data store, e.g. a local DB, and - when a new node starts - we have to initialize its backend store with the contents of the backend store of an existing member. See below for details.

The ExtendedCacheLoader methods are also related to state transfer. The loadState(Fqn) method is called when the cache is preparing a partial state transfer -- that is, the transfer of just the portion of the cache loader's state that is rooted in the given Fqn. The storeState(byte[], Fqn) method is then invoked on the cache loader of the node that is receiving the state transfer. Partial state transfers occur when the cache's activateRegion() API is used and during the formation of buddy groups if buddy replication is used.

7.2. Configuration via XML

The CacheLoader is configured as follows in the JBossCache XML file:

    <!-- ==================================================================== -->
    <!-- Defines TreeCache configuration                                      -->
    <!-- ==================================================================== -->

    <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">

        <!-- New 1.3.x cache loader config block -->
        <attribute name="CacheLoaderConfiguration">
            <config>
                <!-- if passivation is true, only the first cache loader is used; the rest are ignored -->
                <passivation>false</passivation>
                <!-- comma delimited FQNs to preload -->
                <preload>/</preload>
                <!-- are the cache loaders shared in a cluster? -->
                <shared>false</shared>

                <!-- we can now have multiple cache loaders, which get chained -->
                <!-- the 'cacheloader' element may be repeated -->
                <cacheloader>
                    <class>org.jboss.cache.loader.JDBCCacheLoader</class>
                    <!-- same as the old CacheLoaderConfig attribute -->
                    <properties>
                        cache.jdbc.driver=com.mysql.jdbc.Driver
                        cache.jdbc.url=jdbc:mysql://localhost:3306/jbossdb
                        cache.jdbc.user=root
                        cache.jdbc.password=
                    </properties>
                    <!-- whether the cache loader writes are asynchronous -->
                    <async>false</async>
                    <!-- only one cache loader in the chain may set fetchPersistentState to true.
                        An exception is thrown if more than one cache loader sets this to true. -->
                    <fetchPersistentState>true</fetchPersistentState>
                    <!-- determines whether this cache loader ignores writes - defaults to false. -->
                    <ignoreModifications>false</ignoreModifications>
                    <!-- if set to true, purges the contents of this cache loader when the cache starts up.
                    Defaults to false.  -->
                    <purgeOnStartup>false</purgeOnStartup>
                </cacheloader>

            </config>
        </attribute>

    </mbean>

Note: In JBossCache releases prior to 1.3.0, the cache loader configuration block used to look like this. Note that this form is DEPRECATED and you will have to replace your cache loader configuration with a block similar to the one above.

    <!-- ==================================================================== -->
    <!-- Defines TreeCache configuration                                      -->
    <!-- ==================================================================== -->

    <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
       <attribute name="CacheLoaderClass">org.jboss.cache.loader.bdbje.BdbjeCacheLoader</attribute>
       <!-- attribute name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute -->
       <attribute name="CacheLoaderConfig" replace="false">
         location=c:\\tmp\\bdbje
       </attribute>
       <attribute name="CacheLoaderShared">true</attribute>
       <attribute name="CacheLoaderPreload">/</attribute>
       <attribute name="CacheLoaderFetchTransientState">false</attribute>
       <attribute name="CacheLoaderFetchPersistentState">true</attribute>
       <attribute name="CacheLoaderAsynchronous">true</attribute>
    </mbean>

The CacheLoaderClass attribute defines the class of the CacheLoader implementation. (Note that, because of a bug in the properties editor in JBoss, backslashes in variables for Windows filenames might not get expanded correctly, so replace="false" may be necessary).

The currently available implementations shipped with JBossCache are:

  • FileCacheLoader, which is a simple filesystem-based implementation. The <cacheloader><properties> element needs to contain a "location" property, which maps to a directory where the file is located (e.g., "location=c:\\tmp").

  • BdbjeCacheLoader, which is a CacheLoader implementation based on the Sleepycat DB Java Edition. The <cacheloader><properties> element needs to contain a "location" property, which maps to a directory where the database file for Sleepycat resides (e.g., "location=c:\\tmp").

  • JDBCCacheLoader, which is a CacheLoader implementation using JDBC to access any relational database. The <cacheloader><properties> element contains a number of properties needed to connect to the database such as username, password, and connection URL. See the section on JDBCCacheLoader for more details.

  • LocalDelegatingCacheLoader, which enables loading from and storing to another local (same VM) TreeCache.

  • TcpDelegatingCacheLoader, which enables loading from and storing to a remote (different VM) TreeCache using TCP as the transport mechanism. This CacheLoader is available in JBossCache version 1.3.0 and above.

  • ClusteredCacheLoader, which allows querying of other caches in the same cluster for in-memory data via the same clustering protocols used to replicate data. Writes are not 'stored' though, as replication would take care of any updates needed. You need to specify a property called "timeout", a long value telling the cache loader how many milliseconds to wait for responses from the cluster before assuming a null value. For example, "timeout = 3000" would use a timeout value of 3 seconds. This CacheLoader is available in JBossCache version 1.3.0 and above.

Note that the Sleepycat implementation is much more efficient than the filesystem-based implementation, and provides transactional guarantees, but requires a commercial license if distributed with an application (see http://www.sleepycat.com/jeforjbosscache for details).

An implementation of CacheLoader has to have an empty constructor due to the way it is instantiated.

The properties element defines a configuration specific to the given implementation. The filesystem-based implementation for example defines the root directory to be used, whereas a database implementation might define the database URL, name and password to establish a database connection. This configuration is passed to the CacheLoader implementation via CacheLoader.setConfig(Properties). Note that backslashes may have to be escaped. Analogous to the CacheLoaderConfig attribute in pre-1.3.0 configurations.

preload allows us to define a list of nodes, or even entire subtrees, that are visited by the cache on startup, in order to preload the data associated with those nodes. The default ("/") loads all of the data available in the backend store into the cache, which is probably not a good idea given that the data in the backend store might be large. As an example, /a, /product/catalogue loads the subtrees /a and /product/catalogue into the cache, but nothing else. Anything else is loaded lazily when accessed. Preloading makes sense when one anticipates using elements under a given subtree frequently. Note that preloading loads all nodes and associated attributes from the given node, recursively up to the root node. Analogous to the CacheLoaderPreload attribute in pre-1.3.0 configurations.
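
How a comma-delimited preload value selects subtrees can be sketched as follows. PreloadSketch is a hypothetical helper that only decides whether a node falls under one of the listed subtrees; it does not model the ancestors that are visited on the way to a preloaded node, and it does not reproduce the parsing code used by JBossCache.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reasoning about a preload value; Fqns are plain strings. */
class PreloadSketch {

    /** Splits a comma-delimited preload value into trimmed Fqn strings. */
    static List<String> parse(String preload) {
        List<String> roots = new ArrayList<String>();
        for (String part : preload.split(",")) {
            String fqn = part.trim();
            if (!fqn.isEmpty()) {
                roots.add(fqn);
            }
        }
        return roots;
    }

    /** True if the node lies at or below one of the preloaded subtrees. */
    static boolean isPreloaded(String nodeFqn, List<String> roots) {
        for (String root : roots) {
            if (root.equals("/") || nodeFqn.equals(root) || nodeFqn.startsWith(root + "/")) {
                return true;
            }
        }
        return false; // such nodes are loaded lazily on first access
    }
}
```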

fetchPersistentState determines whether or not to fetch the persistent state of a cache when joining a cluster. Only one configured cache loader may set this property to true; if more than one cache loader does so, a configuration exception will be thrown when starting your cache service. Analogous to the CacheLoaderFetchPersistentState attribute in pre-1.3.0 configurations.

async determines whether writes to the cache loader block until completed, or are run on a separate thread so that writes return immediately. If this is set to true, an instance of org.jboss.cache.loader.AsyncCacheLoader is constructed with an instance of the actual cache loader to be used. The AsyncCacheLoader then delegates all requests to the underlying cache loader, using a separate thread if necessary. See the Javadocs on org.jboss.cache.loader.AsyncCacheLoader for more details. If unspecified, the async element defaults to false. This element is analogous to the CacheLoaderAsynchronous attribute in pre-1.3.0 configurations.

Note on using the async element: there is always the possibility of dirty reads since all writes are performed asynchronously, and it is thus impossible to guarantee when (and even if) a write succeeds. This needs to be kept in mind when setting the async element to true.
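
The following sketch shows why a read can miss a write that has already returned to the caller. It does not use AsyncCacheLoader; AsyncWriteSketch is a hypothetical class built on a single-threaded executor, and the latch merely simulates a slow backend store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncWriteSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** Queues the write and returns at once; storeReady simulates a slow backend. */
    Future<?> put(String key, String value, CountDownLatch storeReady) {
        return writer.submit(() -> {
            try {
                storeReady.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // the write is lost, and put() has long since returned
            }
            store.put(key, value);
        });
    }

    String get(String key) {
        return store.get(key);
    }

    /** Returns the value read before and after the queued write is allowed to finish. */
    static String[] demo() {
        AsyncWriteSketch loader = new AsyncWriteSketch();
        CountDownLatch storeReady = new CountDownLatch(1);
        try {
            Future<?> pending = loader.put("/a", "v1", storeReady);
            String before = loader.get("/a"); // the write is still queued
            storeReady.countDown();           // let the backend catch up
            pending.get();                    // wait until the write has been applied
            String after = loader.get("/a");
            return new String[] { before, after };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            loader.writer.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("before=" + result[0] + ", after=" + result[1]); // before=null, after=v1
    }
}
```

The caller of put() sees neither the delay nor a failure unless it inspects the returned Future, which is the trade-off to weigh before setting async to true.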

ignoreModifications determines whether write methods are pushed down to the specific cache loader. When it is set to true, writes are not passed to that cache loader. Situations may arise where transient application data should only reside in a file-based cache loader on the same server as the in-memory cache, for example, with a further shared JDBC cache loader used by all servers in the network. This feature allows you to write to the 'local' file cache loader but not to the shared JDBC cache loader. This property defaults to false, so writes are propagated to all configured cache loaders.
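
The scenario above can be sketched with two in-memory stand-ins for the file and JDBC cache loaders. The classes IgnoreModificationsSketch and Loader are hypothetical and only model how a write is propagated; they are not the cache's own chaining code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IgnoreModificationsSketch {

    /** Hypothetical in-memory stand-in for one configured cache loader. */
    static final class Loader {
        final String name;
        final boolean ignoreModifications;
        final Map<String, String> data = new HashMap<>();

        Loader(String name, boolean ignoreModifications) {
            this.name = name;
            this.ignoreModifications = ignoreModifications;
        }
    }

    private final List<Loader> chain = new ArrayList<>();

    void add(Loader loader) {
        chain.add(loader);
    }

    /** Passes the write to every loader that does not ignore modifications. */
    void put(String fqn, String value) {
        for (Loader loader : chain) {
            if (!loader.ignoreModifications) {
                loader.data.put(fqn, value);
            }
        }
    }

    public static void main(String[] args) {
        Loader localFile = new Loader("local-file", false);
        Loader sharedJdbc = new Loader("shared-jdbc", true);

        IgnoreModificationsSketch loaders = new IgnoreModificationsSketch();
        loaders.add(localFile);
        loaders.add(sharedJdbc);
        loaders.put("/session/1", "transient data");

        System.out.println(localFile.name + ": " + localFile.data.containsKey("/session/1"));   // local-file: true
        System.out.println(sharedJdbc.name + ": " + sharedJdbc.data.containsKey("/session/1")); // shared-jdbc: false
    }
}
```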

purgeOnStartup empties the specified cache loader (if ignoreModifications is false) when the cache loader starts up.

7.3. Cache passivation

A CacheLoader can be used to enforce node passivation and activation on eviction in a TreeCache.

Cache Passivation is the process of removing an object from the in-memory cache and writing it to a secondary data store (e.g., file system, database) on eviction. Cache Activation is the process of restoring an object from the data store into the in-memory cache when it is needed. The configured CacheLoader performs both operations: it writes to the data store on passivation and reads from it on activation.

When the eviction policy in effect calls evict() to evict a node from the cache, if passivation is enabled, a notification that the node is being passivated will be emitted to the tree cache listeners and the node and its children will be stored in the cache loader store. When a user calls get() on a node that was evicted earlier, the node is loaded (lazy loaded) from the cache loader store into the in-memory cache. When the node and its children have been loaded, they're removed from the cache loader and a notification is emitted to the tree cache listeners that the node has been activated.
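
The sequence of steps can be sketched with two maps, one for the in-memory cache and one for the cache loader store. PassivationSketch is a hypothetical class, not the TreeCache API: it handles a single node without children, and a list of strings stands in for the listener notifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PassivationSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store = new HashMap<>(); // stands in for the cache loader store
    final List<String> events = new ArrayList<>();             // stands in for listener notifications

    void put(String fqn, String value) {
        memory.put(fqn, value);
    }

    /** Eviction with passivation: notify, write to the store, drop from memory. */
    void evict(String fqn) {
        String value = memory.get(fqn);
        if (value == null) {
            return;
        }
        events.add("passivated " + fqn);
        store.put(fqn, value);
        memory.remove(fqn);
    }

    /** Lazy activation: load from the store, remove the stored copy, then notify. */
    String get(String fqn) {
        String value = memory.get(fqn);
        if (value == null && store.containsKey(fqn)) {
            value = store.remove(fqn);
            memory.put(fqn, value);
            events.add("activated " + fqn);
        }
        return value;
    }

    boolean inMemory(String fqn) {
        return memory.containsKey(fqn);
    }

    boolean inStore(String fqn) {
        return store.containsKey(fqn);
    }

    public static void main(String[] args) {
        PassivationSketch cache = new PassivationSketch();
        cache.put("/a", "v1");
        cache.evict("/a");
        System.out.println(cache.inMemory("/a") + " " + cache.inStore("/a")); // false true
        System.out.println(cache.get("/a"));                                  // v1
        System.out.println(cache.inMemory("/a") + " " + cache.inStore("/a")); // true false
        System.out.println(cache.events); // [passivated /a, activated /a]
    }
}
```

At any time the node lives either in memory or in the store, never in both, which is what distinguishes passivation from ordinary use of a cache loader.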

To enable cache passivation/activation, you can set passivation to true. The default is false. You set it via the XML cache configuration file. The XML above shows the passivation element when configuring a cache loader. When passivation is used, only the first cache loader configured is used. All others are ignored.

7.4. CacheLoader use cases

7.4.1. Local cache with store

This is the simplest case. We have a JBossCache instance, whose mode is LOCAL, therefore no replication is going on. The CacheLoader simply loads non-existing elements from the store and stores modifications back to the store. When the cache is started, depending on the preload element, certain data can be preloaded, so that the cache is partly warmed up.

When using PojoCache, this means that entire POJOs can be stored to a database or a filesystem, and when accessing fields of a POJO, they will be lazily loaded using the CacheLoader to access a backend store. This feature effectively provides simple persistence for any POJO.
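
The load-on-miss and write-back behaviour of this use case can be sketched as follows. LocalStoreSketch is a hypothetical class in which a map stands in for the backend store; it is not the TreeCache or CacheLoader API, and it leaves out preloading and eviction.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalStoreSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store; // stands in for the backend store
    int storeReads = 0;

    LocalStoreSketch(Map<String, String> store) {
        this.store = store;
    }

    /** Returns the cached value, loading it from the store when it is not in memory. */
    String get(String fqn) {
        String value = memory.get(fqn);
        if (value == null) {
            storeReads++;
            value = store.get(fqn);
            if (value != null) {
                memory.put(fqn, value);
            }
        }
        return value;
    }

    /** Applies the modification in memory and stores it back to the store. */
    void put(String fqn, String value) {
        memory.put(fqn, value);
        store.put(fqn, value);
    }

    public static void main(String[] args) {
        Map<String, String> backend = new HashMap<>();
        backend.put("/a", "1");

        LocalStoreSketch cache = new LocalStoreSketch(backend);
        System.out.println(cache.get("/a"));    // 1 (loaded from the store)
        System.out.println(cache.get("/a"));    // 1 (served from memory)
        System.out.println(cache.storeReads);   // 1

        cache.put("/b", "2");
        System.out.println(backend.get("/b"));  // 2
    }
}
```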

7.4.2. Replicated