Chapter 1. Introduction

1.1. What is a TreeCache?

A TreeCache is a tree-structured, replicated, transactional cache from JBoss Cache. TreeCache is the backbone for many fundamental JBoss Application Server clustering services, including - in certain versions - clustering JNDI, HTTP and EJB sessions, and clustering JMS.

In addition to this, TreeCache can be used as a standalone transactional and replicated cache or even an object oriented data store, may be embedded in other J2EE compliant application servers such as BEA WebLogic or IBM WebSphere, servlet containers such as Tomcat, or even in Java applications that do not run from within an application server.

1.2. TreeCache Basics

The structure of a TreeCache is a tree with nodes. Each node has a name and zero or more children. A node can only have 1 parent; there is currently no support for graphs. A node can be reached by navigating from the root recursively through children, until the requested node is found. It can also be accessed by giving a fully qualified name (FQN), which consists of the concatenation of all node names from the root to the node in question.

A TreeCache can have multiple roots, allowing for a number of different trees to be present in a single cache instance. Note that a one level tree is essentially a HashMap. Each node in the tree has a map of keys and values. For a replicated cache, all keys and values have to be Serializable. Serializability is not a requirement for PojoCache, where reflection and aspect-oriented programming is used to replicate any type.

A TreeCache can be either local or replicated. Local trees exist only inside the Java VM in which they are created, whereas replicated trees propagate any changes to all other replicated trees in the same cluster. A cluster may span different hosts on a network or just different JVMs on a single host.

The first version of TreeCache was essentially a single HashMap that replicated. However, the decision was taken to go with a tree structured cache because (a) it is more flexible and efficient and (b) a tree can always be reduced to a map, thereby offering both possibilities. The efficiency argument was driven by concerns over replication overhead, and was that a value itself can be a rather sophisticated object, with aggregation pointing to other objects, or an object containing many fields. A small change in the object would therefore trigger the entire object (possibly the transitive closure over the object graph) to be serialized and propagated to the other nodes in the cluster. With a tree, only the modified nodes in the tree need to be serialized and propagated. This is not necessarily a concern for TreeCache, but is a vital requirement for PojoCache (as we will see in the separate PojoCache documentation).

When a change is made to the TreeCache, and that change is done in the context of a transaction, then we defer the replication of changes until the transaction commits successfully. All modifications are kept in a list associated with the transaction for the caller. When the transaction commits, we replicate the changes. Otherwise, (on a rollback) we simply undo the changes locally and release any locks, resulting in zero replication traffic and overhead. For example, if a caller makes 100 modifications and then rolls back the transaction, we will not replicate anything, resulting in no network traffic.

If a caller has no transaction associated with it (and isolation level is not NONE - more about this later), we will replicate right after each modification, e.g. in the above case we would send 100 messages, plus an additional message for the rollback. In this sense, running without a transaction can be thought of as analogous as running with auto-commit switched on in JDBC terminology, where each operation is committed automatically.

There is an API for plugging in different transaction managers: all it requires is to get the transaction associated with the caller's thread. Several TransactionManagerLookup implementations are provided for popular transaction managers, including a DummyTransactionManager for testing.

Finally, we use pessimistic locking of the cache by default, with optimistic locking as a configurable option. With pessimistic locking, we can configure the local locking policy corresponding to database-style transaction isolation levels, i.e., SERIALIZABLE, REPEATABLE, READ_COMMITTED, READ_UNCOMMITTED and NONE. More on transaction isolation levels will be discussed later. Note that the cluster-wide isolation level is READ-UNCOMMITTED by default as we don’t acquire a cluster-wide lock on touching an object for which we don’t yet have a lock (this would result in too high an overhead for messaging).

With optimistic locking, isolation levels are ignored as each transaction effectively maintains a copy of the data with which it works on and then attempts to merge back into the tree structure upon transaction completion. This results in a near-serializable degree of data integrity, applied cluster-wide, for the minor performance penalty incurred when validating workspace data at commit time, and the occasional transaction commit failure due to validation failures at commit time.