Chapter 37. High Availability and Failover

We define high availability as the ability of the system to continue functioning after the failure of one or more of its servers. Part of high availability is failover, which we define as the ability of client connections to migrate from one server to another in the event of server failure, so that client applications can continue to operate.

JBoss Messaging provides high availability by replicating servers in pairs. It also provides both 100% transparent client failover and application-level client failover.

37.1. Server replication

JBoss Messaging allows pairs of servers to be linked together as live-backup pairs. In this release there is a single backup server for each live server. Backup servers are not operational until failover occurs. In later releases we will most likely support replication onto multiple backup servers.

When a live-backup pair is configured, JBoss Messaging ensures that the live server state is replicated to the backup server. Replicated state includes session state, and also global state such as the set of queues and addresses on the server.

When a client fails over from the live server to the backup server, the backup server will already have the correct global and session state, so the client will be able to resume its session(s) on the backup server as if nothing happened.

Replication between the live and backup servers is performed asynchronously. Data is replicated one way in one stream, and responses confirming that the data has reached the backup are returned in another stream. Pipelining replications and their responses in separate streams allows replication throughput to be much higher than if we replicated data synchronously, waiting for a response in an RPC manner before replicating the next piece of data.
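The following Java sketch illustrates the pipelining idea in isolation. It is not JBoss Messaging's replication code: the two `BlockingQueue`s are hypothetical stand-ins for the replication and response streams, and a thread plays the part of the backup. The sender never waits for a response before sending the next packet; responses are consumed by a separate reader.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of pipelined replication; not JBoss Messaging's implementation.
// The live side streams packets on one queue and reads responses from another,
// so it never waits for a response before sending the next packet.
class PipelinedReplication {

    /** Replicates packets 0..n-1 and returns the id of the last acknowledged packet. */
    static long replicate(int n) throws InterruptedException {
        BlockingQueue<Long> dataStream = new ArrayBlockingQueue<>(64); // live -> backup
        BlockingQueue<Long> ackStream = new ArrayBlockingQueue<>(64);  // backup -> live
        AtomicLong lastAcked = new AtomicLong(-1);

        // Simulated backup: applies each packet in order, then acknowledges it on the return stream.
        Thread backup = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    ackStream.put(dataStream.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Response reader on the live side, running independently of the sender.
        Thread ackReader = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    lastAcked.set(ackStream.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        backup.start();
        ackReader.start();

        // Sender: puts packets on the data stream back to back. put() only blocks when
        // 64 packets are already queued, which applies back-pressure if the backup falls behind.
        for (long id = 0; id < n; id++) {
            dataStream.put(id);
        }

        backup.join();
        ackReader.join();
        return lastAcked.get();
    }
}
```

For example, `PipelinedReplication.replicate(10_000)` returns 9999 once every packet has been acknowledged. In an RPC-style design the sender would instead block on each response, paying a full round trip per packet.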

37.1.1. Configuring live-backup pairs

First, on the live server, configure the live server in jbm-configuration.xml with knowledge of its backup server. This is done by specifying a backup-connector-ref element. This element references a connector, also specified on the live server, which contains the details of how to connect to the backup server. Here is a snippet from jbm-configuration.xml showing a live server configured with a backup server:

<backup-connector-ref connector-name="backup-connector"/>
   
<!-- Connectors -->

<connectors>

   ...
   
   <!-- This connector specifies how to connect to the backup server -->
   <connector name="backup-connector">
     <factory-class>
        org.jboss.messaging.integration.transports.netty.NettyConnectorFactory
     </factory-class>
     <param key="jbm.remoting.netty.port" value="5445" type="Integer"/>
   </connector>

</connectors>

Second, on the backup server, set the backup element to true, again in jbm-configuration.xml:

<backup>true</backup>
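Because the live server finds its backup only through the connector-name reference, a mismatch between backup-connector-ref and the connector's name attribute is an easy mistake to make. The sketch below, which is not part of JBoss Messaging, uses the JDK's DOM parser to check that the referenced connector is defined. It assumes unprefixed element names as shown in the live-server snippet; BackupConnectorCheck is an invented name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical configuration sanity check; not a JBoss Messaging tool.
class BackupConnectorCheck {

    /** True only if a backup-connector-ref is present and names a connector defined in the file. */
    static boolean backupConnectorDefined(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));

        NodeList refs = doc.getElementsByTagName("backup-connector-ref");
        if (refs.getLength() == 0) {
            return false; // no backup configured at all
        }
        String wanted = ((Element) refs.item(0)).getAttribute("connector-name");

        // Look for a <connector name="..."> whose name matches the reference.
        NodeList connectors = doc.getElementsByTagName("connector");
        for (int i = 0; i < connectors.getLength(); i++) {
            if (wanted.equals(((Element) connectors.item(i)).getAttribute("name"))) {
                return true;
            }
        }
        return false;
    }
}
```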

37.1.2. Synchronization of live-backup pairs

For live-backup pairs to operate properly, they must be identical replicas. This means you cannot use a server that has previously been used for other purposes as a backup server, since it will have different data in its persistent storage. If you try to do so, you will receive an exception in the logs and the server will fail to start.

To create a backup server for a live server that has already been used for other purposes, you must copy the data directory from the live server to the backup server. The backup server will then have a persistent store identical to that of the live server.

Similarly, when a client fails over from a live server L to a backup server B, server L becomes invalid since, from that point on, the data on L and B may diverge. After such a failure, at the next available opportunity, server B should be taken down and its data directory copied back to server L. The live and backup servers can then be restarted. In this release of JBoss Messaging we do not provide any automatic facility for re-pairing a backup node with a live node while it is running.
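Copying the data directory can be done with any file-copy tool while both servers are stopped. As an illustration, here is a minimal Java sketch using java.nio.file that recursively copies one server's data directory to another. The directory locations are left as parameters, since they depend on your configuration, and DataDirCopy is an invented name. The sketch refuses to copy into a non-empty target, on the assumption that the old data directory has been moved aside first, so that stale files from the backup's previous use cannot survive the copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical sketch: copy a server's data directory to its partner. Both servers should be stopped.
class DataDirCopy {

    static void copyTree(Path source, Path target) throws IOException {
        // Refuse a non-empty target so stale files from previous use cannot remain.
        if (Files.exists(target)) {
            try (Stream<Path> existing = Files.list(target)) {
                if (existing.findAny().isPresent()) {
                    throw new IllegalStateException("target directory is not empty: " + target);
                }
            }
        }
        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(source)) {
            paths.forEach(src -> {
                Path dest = target.resolve(source.relativize(src));
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(src, dest, StandardCopyOption.COPY_ATTRIBUTES);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

For example, `DataDirCopy.copyTree(Paths.get("/path/to/live/data"), Paths.get("/path/to/backup/data"))`, where both paths are placeholders for your own data directory locations.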

For a backup server to function correctly, it is also important that it has the same set of bridges, predefined queues, cluster connections, broadcast groups and discovery groups as the live node. The easiest way to ensure this is to copy the entire server-side configuration from the live server to the backup and then make the changes described in the previous section.

37.1.3. Queue activation timeout

When a live server fails, client connections fail over from the live node to the backup at a rate determined by each client, and some client connections may never fail over at all.

Different client connections may have different consumers on the same queue(s). The queue on the backup waits for all its consumers to reattach before activating delivery. If not all consumers have reattached within the queue activation timeout, the queue activates regardless.

The timeout is set in jbm-configuration.xml with the queue-activation-timeout setting. Its default value is 30000 milliseconds (30 seconds).
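The activation rule can be modelled with a CountDownLatch: the queue on the backup waits until every consumer that existed on the live queue has reattached, or until queue-activation-timeout elapses, whichever comes first. This is a hypothetical model for illustration, not JBoss Messaging's queue implementation; the class and method names are invented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of queue activation on a backup server; not JBoss Messaging code.
class BackupQueueActivation {
    private final CountDownLatch pendingConsumers;
    private volatile boolean deliveryActive;

    /** @param consumersOnLiveQueue number of consumers the queue had on the live server */
    BackupQueueActivation(int consumersOnLiveQueue) {
        this.pendingConsumers = new CountDownLatch(consumersOnLiveQueue);
    }

    /** Called each time a failed-over client reattaches one of its consumers. */
    void consumerReattached() {
        pendingConsumers.countDown();
    }

    /**
     * Waits for all consumers or for the timeout, then activates delivery either way.
     * Returns true if every consumer reattached in time, false if the timeout forced activation.
     */
    boolean awaitActivation(long queueActivationTimeoutMs) throws InterruptedException {
        boolean allReattached = pendingConsumers.await(queueActivationTimeoutMs, TimeUnit.MILLISECONDS);
        deliveryActive = true;
        return allReattached;
    }

    boolean isDeliveryActive() {
        return deliveryActive;
    }
}
```

With the default setting, a call such as `awaitActivation(30000)` returns after at most about 30 seconds, and delivery is active in both outcomes.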

37.2. Automatic client failover

JBoss Messaging clients can be configured with knowledge of live and backup servers, so that if the connection between the client and the live server fails, the client will detect this and reconnect its sessions to the backup server. Because of server replication, the backup server will already have those sessions in the same state they were left in on the live server, and the client will be able to reconnect and resume them 100% transparently, as if nothing happened.

For automatic failover, JBoss Messaging requires no special failover code on the client or server. This differs from other messaging systems, which intrusively require you to write special failover-handling code. JBoss Messaging automatic failover preserves all your normal JMS or core API semantics and allows your client code to continue 100% uninterrupted in the event of connection failure and failover from a live to a backup server.

A JBoss Messaging client detects connection failure when it has not received packets from the server within the time given by client-failure-check-period, as explained in Chapter 15, Dead Connections and Session Multiplexing. If the client does not receive data within that period, it assumes the connection has failed and attempts failover.
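The detection rule can be sketched as a simple timestamp check. The class below is hypothetical, not the JBoss Messaging client code: it records when the last packet arrived and reports a failure once more than client-failure-check-period milliseconds have passed without one. Time is passed in explicitly so the logic is easy to test.

```java
// Hypothetical sketch of time-based failure detection; not the JBoss Messaging client code.
class ConnectionFailureDetector {
    private final long clientFailureCheckPeriodMs;
    private volatile long lastReceivedMs;

    ConnectionFailureDetector(long clientFailureCheckPeriodMs, long nowMs) {
        this.clientFailureCheckPeriodMs = clientFailureCheckPeriodMs;
        this.lastReceivedMs = nowMs; // treat connection creation as the first sign of life
    }

    /** Called whenever any packet arrives from the server. */
    void packetReceived(long nowMs) {
        lastReceivedMs = nowMs;
    }

    /** True if nothing has arrived within the check period, so failover should be attempted. */
    boolean hasFailed(long nowMs) {
        return nowMs - lastReceivedMs > clientFailureCheckPeriodMs;
    }
}
```

In a running client such a check would be evaluated periodically, for example from a ScheduledExecutorService, with `hasFailed(System.currentTimeMillis())` triggering failover when it returns true.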

JBoss Messaging clients can be configured with the list of live-backup server pairs in a number of different ways. They can be configured explicitly or, probably most commonly, through server discovery, which lets the client discover the list automatically. For full details on how to configure clients, please see Section 36.2, “Server discovery”.

Sometimes you want a client to fail over onto a backup server even if the live server is just cleanly shut down, rather than having crashed or the connection having failed. To configure this, set the property FailoverOnServerShutdown to true: on the JBossConnectionFactory if you're using JMS, or in the jbm-jms.xml file when you define the connection factory, or, if using core, directly on the ClientSessionFactoryImpl instance after creation. The default value for this property is false, which means that by default JBoss Messaging clients will not fail over to a backup server if the live server is simply shut down cleanly.
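The resulting decision can be summarised in a few lines. This is a hypothetical illustration of the behaviour described above, assuming the client has a backup server configured; FailoverPolicy and Cause are invented names, not JBoss Messaging classes.

```java
// Hypothetical illustration of when a client with a configured backup fails over.
class FailoverPolicy {

    enum Cause { CONNECTION_FAILURE, CLEAN_SERVER_SHUTDOWN }

    static boolean shouldFailover(Cause cause, boolean failoverOnServerShutdown) {
        switch (cause) {
            case CONNECTION_FAILURE:
                return true;                     // crashes and lost connections always trigger failover
            case CLEAN_SERVER_SHUTDOWN:
                return failoverOnServerShutdown; // default false: no failover on a clean shutdown
            default:
                throw new IllegalArgumentException("unknown cause: " + cause);
        }
    }
}
```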

For a fully functional example of automatic failover, please see Section 9.1.2, “Automatic (Transparent) Failover”.

37.3. Application-level client failover

In some cases you may not want automatic client failover, and may prefer to handle any connection failure yourself by writing your own reconnection logic in your own failure handler. We define this as application-level failover, since the failover is handled at the user application level.

If all your clients use application-level failover, then you do not need server replication on the server side and should disable it, since replication has some performance overhead. To disable server replication, simply do not specify a backup-connector-ref element for any live server.

To implement application-level failover with JMS, you need to register an ExceptionListener on the JMS connection. JBoss Messaging calls the ExceptionListener when it detects connection failure. In your ExceptionListener you would close your old JMS connections, potentially look up new connection factory instances from JNDI, and create new connections. In this case you may well be using HA-JNDI [link] to ensure that the new connection factory is looked up from a different server.
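The shape of such a handler is sketched below. To keep the example self-contained it does not depend on the JMS API: the Connection interface and the connect function are hypothetical stand-ins for a JMS Connection and for "look up a connection factory from JNDI and create a connection", and trying a fixed list of servers in order is an assumption of this sketch. In a real application, reconnect() would be called from ExceptionListener.onException (or, with the core API, from a FailureListener), and sessions, producers and consumers would then be recreated on the new connection.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of application-level failover. Connection and the connect
// function stand in for JMS objects so the example runs without a JMS provider.
class ApplicationFailover {

    /** Minimal stand-in for a JMS connection. */
    interface Connection extends AutoCloseable {
        @Override
        void close(); // narrowed: throws no checked exception
    }

    private final List<String> servers;                 // assumed: candidate servers, tried in order
    private final Function<String, Connection> connect; // assumed: e.g. JNDI lookup + createConnection()
    private Connection current;

    ApplicationFailover(List<String> servers, Function<String, Connection> connect) {
        this.servers = servers;
        this.connect = connect;
    }

    /** What a failure handler would do: close the old connection, then try each server in turn. */
    synchronized Connection reconnect() {
        if (current != null) {
            try {
                current.close();
            } catch (RuntimeException ignored) {
                // the old connection is already broken, so closing it may fail
            }
            current = null;
        }
        RuntimeException lastError = null;
        for (String server : servers) {
            try {
                current = connect.apply(server);
                return current;
            } catch (RuntimeException e) {
                lastError = e; // server unreachable: try the next one
            }
        }
        throw new IllegalStateException("no server reachable", lastError);
    }
}
```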

For a working example of application-level failover, please see Section 9.1.1, “Application-Layer Failover”.

If you are using the core API, then the procedure is very similar: you would code a FailureListener on your core ClientSession instances.