The failure recovery subsystem of JBossTS ensures that the results of
a transaction are applied consistently to all resources affected by the
transaction, even if any of the application processes or the machine hosting
them crash or lose network connectivity. In the case of a machine (system)
crash or network failure, recovery will not take place until the system
or network is restored, but the original application does not need to
be restarted: recovery responsibility is delegated to the Recovery Manager
process (see below). Recovery after failure requires that information about
the transaction and the resources involved survives the failure and is
accessible afterward. This information is held in the ActionStore, which
is part of the ObjectStore. If the ObjectStore is destroyed or modified,
recovery may not be possible.
Until the recovery procedures are complete, resources affected by a
transaction that was in progress at the time of the failure may be inaccessible.
For database resources, this may be reported as tables or rows held by
"in-doubt transactions".
The Recovery Manager
The Recovery Manager is a daemon process responsible for performing crash
recovery. Only one Recovery Manager runs per node. The Object Store provides
persistent data storage for transactions to log data. During normal transaction
processing each transaction will log persistent data needed for the commit
phase to the Object Store. On successfully committing a transaction this
data is removed; however, if the transaction fails, this data remains
within the Object Store.
The Recovery Manager functions by:
- Periodically scanning the Object Store for transactions that may have failed.
  Failed transactions are indicated by the presence of log data after the period
  of time in which the transaction would normally have been expected to finish.
- Checking with the application process which originated the transaction
  whether the transaction is still in progress or not.
- Recovering the transaction by re-activating the transaction and then replaying
  phase two of the commit protocol.
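The first step above can be illustrated with a minimal sketch. It is not the JBossTS implementation: the class StaleLogScan, the map of transaction identifiers to log creation times, and the expectedDuration parameter are all hypothetical names chosen for illustration. The sketch only shows the idea of treating a log entry as a candidate once it has outlived the time in which the transaction should have finished; the real Recovery Manager additionally checks with the originating process before recovering anything.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the JBossTS API.
final class StaleLogScan {
    /**
     * Returns the ids of log entries older than expectedDuration,
     * sorted by id so that the result is deterministic.
     */
    static List<String> candidates(Map<String, Instant> logCreated,
                                   Instant now,
                                   Duration expectedDuration) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Instant> e : new TreeMap<>(logCreated).entrySet()) {
            Duration age = Duration.between(e.getValue(), now);
            if (age.compareTo(expectedDuration) > 0) {
                result.add(e.getKey()); // log outlived the expected lifetime
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, Instant> logs = new TreeMap<>();
        logs.put("tx-old", now.minus(Duration.ofMinutes(30)));
        logs.put("tx-new", now.minus(Duration.ofSeconds(5)));
        System.out.println(candidates(logs, now, Duration.ofMinutes(5)));
    }
}
```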
To start the Recovery Manager issue the following command:
java com.arjuna.ats.arjuna.recovery.RecoveryManager
If the -test flag is used, the Recovery Manager displays a "Ready" message once it has initialised:
java com.arjuna.ats.arjuna.recovery.RecoveryManager -test
On initialization the Recovery Manager first loads in configuration information
via a properties file. This configuration includes a number of recovery
activators and recovery modules, which are then dynamically loaded.
Each recovery activator, which implements the
com.arjuna.ats.arjuna.recovery.RecoveryActivator
interface, is used to instantiate a recovery class related to the underlying
communication protocol. Since version 3.0 of JBossTS, the Recovery Manager
is not specifically tied to an Object Request Broker (ORB). The
RecoveryActivator interface is therefore provided to identify the recovery
instance that manages a specific transaction protocol, such as the OTS
recovery protocol. For instance, when used with OTS, the RecoveryActivator
has the responsibility to create a RecoveryCoordinator object able to respond
to the replay_completion operation.
All RecoveryActivator instances inherit the same interface. They are loaded via
the following recovery extension property:
<property
name="com.arjuna.ats.arjuna.recovery.recoveryActivator_<number>"
value="RecoveryClass"/>
For instance, the RecoveryActivator provided in the JTS/OTS distribution, which must not be commented out, is configured as follows:
<property
name="com.arjuna.ats.arjuna.recovery.recoveryActivator_1"
value="com.arjuna.ats.internal.jts.orbspecific.recovery.RecoveryEnablement"/>
Each recovery module, which implements the
com.arjuna.ats.arjuna.recovery.RecoveryModule
interface, is used to recover a different type of transaction/resource;
however, each recovery module inherits the same basic behaviour.
Recovery consists of two separate passes/phases separated by two timeout
periods. The first pass examines the object store for potentially failed
transactions; the second pass performs crash recovery on failed transactions.
The timeout between the first and second pass is known as the backoff period.
The timeout between the end of the second pass and the start of the first
pass is the recovery period. The recovery period is larger than the backoff
period.
The Recovery Manager invokes the first pass upon each recovery module,
applies the backoff period timeout, invokes the second pass upon each recovery
module and finally applies the recovery period timeout before restarting
the first pass again.
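The cycle described above can be sketched as follows. This is a simplified illustration, not the Recovery Manager's code: TwoPassModule is a hypothetical stand-in for com.arjuna.ats.arjuna.recovery.RecoveryModule, and its method names, the runCycle helper and the millisecond timeouts are assumptions made so that the example is self-contained and quick to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a recovery module (assumed method names).
interface TwoPassModule {
    void firstPass();   // look for potentially failed transactions
    void secondPass();  // perform crash recovery on those found
}

final class PeriodicRecoverySketch {
    /** One full cycle: first pass, backoff period, second pass, recovery period. */
    static void runCycle(List<TwoPassModule> modules, long backoffMillis, long recoveryMillis) {
        for (TwoPassModule m : modules) {
            m.firstPass();
        }
        pause(backoffMillis);
        for (TwoPassModule m : modules) {
            m.secondPass();
        }
        pause(recoveryMillis);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    /** Module that records its calls, used only to show the invocation order. */
    static TwoPassModule recording(final String name, final List<String> log) {
        return new TwoPassModule() {
            @Override public void firstPass() { log.add(name + ":first"); }
            @Override public void secondPass() { log.add(name + ":second"); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<TwoPassModule> modules = Arrays.asList(recording("A", log), recording("B", log));
        runCycle(modules, 10, 20); // the real periods are configured in seconds
        System.out.println(log);   // prints [A:first, B:first, A:second, B:second]
    }
}
```

Note that every module completes its first pass before any module starts its second pass, which is why a single backoff period separates the two passes for all modules.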
The recovery modules are loaded via the following recovery extension property:
com.arjuna.ats.arjuna.recovery.recoveryExtension<number>=<RecoveryClass>
The default RecoveryExtension settings are:
<property name="com.arjuna.ats.arjuna.recovery.recoveryExtension1"
value="com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule"/>
<property name="com.arjuna.ats.arjuna.recovery.recoveryExtension2"
value="com.arjuna.ats.internal.txoj.recovery.TORecoveryModule"/>
<property name="com.arjuna.ats.arjuna.recovery.recoveryExtension3"
value="com.arjuna.ats.internal.jts.recovery.transactions.TopLevelTransactionRecoveryModule"/>
<property name="com.arjuna.ats.arjuna.recovery.recoveryExtension4"
value="com.arjuna.ats.internal.jts.recovery.transactions.ServerTransactionRecoveryModule"/>
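As an illustration of how a set of numbered recoveryExtension properties could be turned into an ordered list of module class names, here is a minimal sketch. It is not the JBossTS loader: the class RecoveryExtensionLookup and its method are hypothetical, and the sketch assumes that modules are ordered by plain string comparison of the property names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the JBossTS API.
final class RecoveryExtensionLookup {
    static final String PREFIX = "com.arjuna.ats.arjuna.recovery.recoveryExtension";

    /** Class names of all properties starting with PREFIX, ordered by property name. */
    static List<String> moduleClassNames(Properties props) {
        TreeMap<String, String> sorted = new TreeMap<>(); // sorts keys as strings
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                sorted.put(name, props.getProperty(name).trim());
            }
        }
        return new ArrayList<>(sorted.values());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "2",
                "com.arjuna.ats.internal.txoj.recovery.TORecoveryModule");
        props.setProperty(PREFIX + "1",
                "com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule");
        props.setProperty("unrelated.property", "ignored");
        // The AtomicActionRecoveryModule entry is listed first.
        System.out.println(moduleClassNames(props));
    }
}
```

Under this string-ordering assumption, a name ending in 10 sorts before one ending in 2, so zero-padded or single-digit suffixes are the safer choice when the order of modules matters.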
Configuring the Recovery Manager
Periodic Recovery
The backoff period and recovery period are set using the following properties:
com.arjuna.ats.arjuna.recovery.recoveryBackoffPeriod (default 10 secs)
com.arjuna.ats.arjuna.recovery.periodicRecoveryPeriod (default 120 secs)
Expired entry removal
The operation of the recovery subsystem will cause some entries to be made
in the ObjectStore that will not be removed during normal operation. The
RecoveryManager has a facility for scanning for these and removing items
that are very old. Scans and removals are performed by implementations of the
com.arjuna.ats.arjuna.recovery.ExpiryScanner
interface. Implementations of this interface are loaded by giving the class
name as the value of a property whose name begins with
com.arjuna.ats.arjuna.recovery.expiryScanner.
The RecoveryManager calls the scan() method on each loaded ExpiryScanner
implementation at an interval determined by the property
com.arjuna.ats.arjuna.recovery.expiryScanInterval.
This value is given in hours; the default is 12. An EXPIRY_SCAN_INTERVAL value
of zero will suppress any expiry scanning. If the value as supplied is
positive, the first scan is performed when RecoveryManager starts; if the
value is negative, the first scan is delayed until after the first interval
(using the absolute value).
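The sign convention for the scan interval can be made concrete with a short sketch. The class ExpiryScanSchedule and its field names are hypothetical and exist only for this illustration; the sketch simply encodes the three cases described above (zero, positive, negative) and is not taken from the RecoveryManager source.

```java
// Hypothetical illustration of the expiryScanInterval sign convention.
final class ExpiryScanSchedule {
    final boolean enabled;        // false when the property value is zero
    final long initialDelayHours; // delay before the first scan
    final long intervalHours;     // time between subsequent scans

    private ExpiryScanSchedule(boolean enabled, long initialDelayHours, long intervalHours) {
        this.enabled = enabled;
        this.initialDelayHours = initialDelayHours;
        this.intervalHours = intervalHours;
    }

    static ExpiryScanSchedule fromPropertyValue(int hours) {
        if (hours == 0) {
            return new ExpiryScanSchedule(false, 0, 0); // scanning suppressed
        }
        long interval = Math.abs((long) hours);
        // Positive: scan at startup. Negative: wait one full interval first.
        long initialDelay = hours > 0 ? 0 : interval;
        return new ExpiryScanSchedule(true, initialDelay, interval);
    }

    public static void main(String[] args) {
        for (int value : new int[] {12, -12, 0}) {
            ExpiryScanSchedule s = fromPropertyValue(value);
            System.out.println(value + " -> enabled=" + s.enabled
                    + ", firstScanAfter=" + s.initialDelayHours + "h"
                    + ", every=" + s.intervalHours + "h");
        }
    }
}
```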
The default ExpiryScanner is:
<property
name="com.arjuna.ats.arjuna.recovery.expiryScannerTransactionStatusManager"
value="com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner"/>
The following table summarizes the properties used by the Recovery Manager. By
default, these properties are defined in the properties file named RecoveryManager-properties.xml.
Name | Description | Possible Value | Default Value
com.arjuna.ats.arjuna.recovery.periodicRecoveryPeriod | Interval in seconds between initiating the periodic recovery modules | Value in seconds | 120
com.arjuna.ats.arjuna.recovery.recoveryBackoffPeriod | Interval in seconds between first and second pass of periodic recovery | Value in seconds | 10
com.arjuna.ats.arjuna.recovery.recoveryExtensionX | Indicates a periodic recovery module to use. X is the occurrence number of the recovery module among a set of recovery modules. These modules are invoked in sort order of names | The class name of the periodic recovery module | JBossTS provides a set of classes given in the RecoveryManager-properties.xml file
com.arjuna.ats.arjuna.recovery.recoveryActivator_X | Indicates a recovery activator to use. X is the occurrence number of the recovery activator among a set of recovery activators. | The class name of the periodic recovery activator | JBossTS provides one class that manages the recovery protocol specified by the OTS specification
com.arjuna.ats.arjuna.recovery.expiryScannerXXX | Expiry scanners to use (order of invocation is random). Names must begin with "com.arjuna.ats.arjuna.recovery.expiryScanner" | Class name | JBossTS provides one class given in the RecoveryManager-properties.xml file
com.arjuna.ats.arjuna.recovery.expiryScanInterval | Interval, in hours, between running the expiry scanners. This can be quite long. The absolute value determines the interval: if the value is negative, the scan will NOT be run until after one interval has elapsed. If positive, the first scan will be run immediately after startup. Zero will prevent any scanning. | Value in hours | 12
com.arjuna.ats.arjuna.recovery.transactionStatusManagerExpiryTime | Age, in hours, for removal of a transaction status manager item. This should be longer than any ts-using process will remain running. Zero = never removed. | Value in hours | 12
com.arjuna.ats.arjuna.recovery.transactionStatusManagerPort | Use this to fix the port on which the TransactionStatusManager listens | Port number (short) | Uses a free port