arjuna logoarjuna strap line


print this page
email this page

What is a Transaction

Transaction management is one of the most crucial requirements for enterprise application development. Most of the large enterprise applications in the domains of finance, banking and electronic commerce rely on transaction processing for delivering their business functionality.

Enterprise applications often require concurrent access to distributed data shared amongst multiple components, to perform operations on data. Such applications should maintain integrity of data (as defined by the business rules of the application) under the following circumstances:

  • distributed access to a single resource of data, and
  • access to distributed resources from a single application component.

In such cases, it may be required that a group of operations on (distributed) resources be treated as one unit of work. In a unit of work, all the participating operations should either succeed or fail and recover together. This problem is more complicated when

  • a unit of work is implemented across a group of distributed components operating on data from multiple resources, and/or
  • the participating operations are executed sequentially or in parallel threads requiring coordination and/or synchronization.

In either case, it is required that success or failure of a unit of work be maintained by the application. In case of a failure, all the resources should bring back the state of the data to the previous state (i.e., the state prior to the commencement of the unit of work).

From the programmer's perspective a transaction is a scoping mechanism for a collection of actions which must complete as a unit. It provides a simplified model for exception handling since only two outcomes are possible:

  • success - meaning that all actions involved within a transaction are completed
  • failure - no actions complete
Example

To illustrate the reliability expected by the application let’s consider the funds transfer example which is familiar to all of us.
The Money transfer involves two operations: Deposit and Withdrawal
The complexity of implementation doesn't matter; money moves from one place to another. For instance, involved accounts may be either located in a same relational table within a database or located on different databases.

A Simple transfer consists on moving money from savings to checking while a Complex transfer can be performed at the end- of- day according to a reconciliation between international banks

The concept of a transaction, and a transaction manager (or a transaction processing service) simplifies construction of such enterprise level distributed applications while maintaining integrity of data in a unit of work.

A transaction is a unit of work that has the following properties:

  • Atomicity – either the whole transaction completes or nothing completes - partial completion is not permitted.
  • Consistency – a transaction transforms the system from one consistent state to another. In other words, On completion of a successful transaction, the data should be in a consistent state. For example, in the case of relational databases, a consistent transaction should preserve all the integrity constraints defined on the data.
  • Isolation: Each transaction should appear to execute independently of other transactions that may be executing concurrently in the same environment. The effect of executing a set of transactions serially should be the same as that of running them concurrently. This requires two things:
    • During the course of a transaction, intermediate (possibly inconsistent) state of the data should not be exposed to all other transactions.
    • Two concurrent transactions should not be able to operate on the same data. Database management systems usually implement this feature using locking.
  • Durabiliy: The effects of a completed transaction should always be persistent.

These properties, called as ACID properties, guarantee that a transaction is never incomplete, the data is never inconsistent, concurrent transactions are independent, and the effects of a transaction are persistent.

Transactional Concepts

Transaction Components

A collection of actions is said to be transactional if they possess the ACID properties. These properties are assumed to be ensured, in the presence of failures; if actions involved within the transaction are performed by a Transactional System. A transaction system includes a set of components where each of them has a particular role. Main components are described below.

Application Programs

Application Programs are clients for the transactional resources. These are the programs with which the application developer implements business transactions. With the help of the transaction manager, these components create global transactions and operate on the transactional resources with in the scope of these transactions. These components are not responsible for implementing mechanisms for preserving ACID properties of transactions. However, as part of the application logic, these components generally make a decision whether to commit or rollback transactions.

Application responsibilities could be summarized as follow:

  • Create and demarcate transactions
  • Operate on data via resource managers
Resource Managers

A resource manager is in general a component that manages persistent and stable data storage system, and participates in the two phase commit and recovery protocols with the transaction manager.

A resource manager is typically a driver that provides two sets of interfaces: one set for the application components to get connections and operating, and the other set for participating in two phase commit and recovery protocols coordinated by a transaction manager. This component may also, directly or indirectly, register resources with the transaction manager so that the transaction manager can keep track of all the resources participating in a transaction. This process is called as resource enlistment.

Resource Manager responsibilities could be summarized as follow

  • Enlist resources with the transaction manager
  • Participate in two-phase commit and recovery protocol
Transaction Manager

The transaction manager is the core component of a transaction processing environment. Its main responsibilities are to create transactions when requested by application components, allow resource enlistment and delistment, and to manage the two-phase commit or recovery protocol with the resource managers.

A typical transactional application begins a transaction by issuing a request to a transaction manager to initiate a transaction. In response, the transaction manager starts a transaction and associates it with the calling thread. The transaction manager also establishes a transaction context. All application components and/or threads participating in the transaction share the transaction context. The thread that initially issued the request for beginning the transaction, or, if the transaction manager allows, any other thread may eventually terminate the transaction by issuing a commit or rollback request.

Before a transaction is terminated, any number of components and/or threads may perform transactional operations on any number of transactional resources known to the transaction manager. If allowed by the transaction manager, a transaction may be suspended or resumed before finally completing the transaction.

Once the application issues the commit request, the transaction manager prepares all the resources for a commit operation, and based on whether all resources are ready for a commit or not, issues a commit or rollback request to all the resources.

Resource Manager responsibilities could be summarized as follow:

  • Establish and maintain transaction context
  • Maintain association between a transaction and the participating resources.
  • Initiate and conduct two-phase commit and recovery protocol with the resource managers.
  • Make synchronization calls to the application components before beginning and after end of the two-phase commit and recovery process
Local vs. Distributed Transaction

A transaction that involves only one transactional resource, such a database, is considered as local transaction, while a transaction that involves more than one transactional resource that need to be coordinated to reach a consistent state is considered as a distributed transaction.

A transaction can be specified by what is known as transaction demarcation. Transaction demarcation enables work done by distributed components to be bound by a global transaction. It is a way of marking groups of operations to constitute a transaction.

The most common approach to demarcation is to mark the thread executing the operations for transaction processing. This is called as programmatic demarcation. The transaction so established can be suspended by unmarking the thread, and be resumed later by explicitly propagating the transaction context from the point of suspension to the point of resumption.

The transaction demarcation ends after a commit or a rollback request to the transaction manager. The commit request directs all the participating resources managers to record the effects of the operations of the transaction permanently. The rollback request makes the resource managers undo the effects of all operations on the transaction.

Transaction Context and Propagation

Since multiple application components and resources participate in a transaction, it is necessary for the transaction manager to establish and maintain the state of the transaction as it occurs. This is usually done in the form of transaction context.

Transaction context is an association between the transactional operations on the resources, and the components invoking the operations. During the course of a transaction, all the threads participating in the transaction share the transaction context. Thus the transaction context logically envelops all the operations performed on transactional resources during a transaction. The transaction context is usually maintained transparently by the underlying transaction manager.

Resource Enlistment

Resource enlistment is the process by which resource managers inform the transaction manager of their participation in a transaction. This process enables the transaction manager to keep track of all the resources participating in a transaction. The transaction manager uses this information to coordinate transactional work performed by the resource managers and to drive two-phase and recovery protocol. At the end of a transaction (after a commit or rollback) the transaction manager delists the resources.

Two-Phase Commit

This protocol between the transaction manager and all the resources enlisted for a transaction ensures that either all the resource managers commit the transaction or they all abort. In this protocol, when the application requests for committing the transaction, the transaction manager issues a prepare request to all the resource managers involved. Each of these resources may in turn send a reply indicating whether it is ready for commit or not. Only The transaction manager issue a commit request to all the resource managers, only when all the resource managers are ready for a commit. Otherwise, the transaction manager issues a rollback request and the transaction will be rolled back.

Recovery and Logging
Basically, the Recovery is the mechanism which preserves the transaction atomicity in presence of failures. The basic technique for implementing transactions in presence of failures is based on the use of logs. That is, a transaction system has to record enough information to ensure that it can be able to return to a previous state in case of failure or to ensure that changes committed by a transaction are properly stored.

In addition to be able to store appropriate information, all participants within a distributed transaction must log similar information which allow them to take a same decision either to set data in their final state or in their initial state.

Two techniques are in general used to ensure transaction's atomicity. A first technique focuses on manipulated data, such the Do/Undo/Redo protocol (considered as a recovery mechanism in a centralized system), which allow a participant to set its data in their final values or to retrieve them in their initial values. A second technique relies on a distributed protocol named the two phases commit, ensuring that all participants involved within a distributed transaction set their data either in their final values or in their initial values. In other words all participants must commit or all must rollback.

In addition to failures we refer as centralized such system crashes, communication failures due for instance to network outages or message loss have to be considered during the recovery process of a distributed transaction.

In order to provide an efficient and optimized mechanism to deal with failure, modern transactional systems typically adopt a “presume abort” strategy, which simplifies the transaction management.

The presumed abort strategy can be stated as «when in doubt, abort». With this strategy, when the recovery mechanism has no information about the transaction, it presumes that the transaction has been aborted.

A particularity of the presumed-abort assumption allows a coordinator to not log anything before the commit decision and the participants do not to log anything before they prepare. Then, any failure which occurs before the 2pc starts lead to abort the transaction. Furthermore, from a coordinator point of view any communication failure detected by a timeout or exception raised on sending prepare is considered as a negative vote which leads to abort the transaction. So, within a distributed transaction a coordinator or a participant may fail in two ways: either it crashes or it times out for a message it was expecting. When a coordinator or a participant crashes and then restarts, it uses information on stable storage to determine the way to perform the recovery. As we will see it the presumed-abort strategy enable an optimized behavior for the recovery.
Heuristic Decision
In extremely rare cases, a resource manager may choose not to wait for the outcome from the transaction manager. This might occur if the communications path was lost and was not likely to be restored for a very long time. Typically this happens as a result of human intervention and not as an arbitrary action of a resource manager. In order to release locks and make this transactions data available to new transactions, the resource manager makes a heuristic decision, i.e. it guesses the proper transaction outcome. When it does so, it must remember its guess until contact with the transaction manager is ultimately re-established.

Standards

Saying that a distributed transaction can involve several distributed participants, means that these participant must be integrated within a global transaction manager which has the responsibility to ensure that all participants take a common decision to commit or rollback the distributed transaction. The key of such integration is the existence of a common transactional interface which is understood by all participants, transaction manager and resource managers such databases.

The importance of common interfaces between participants, as well as the complexity of their implementation, becomes obvious in an open systems environment. For this aim various distributed transaction processing standards have been developed by international standards organizations. Among these organizations, We list three of them which are mainly considered in the Arjuna Transaction Service product:

Basically these standards have proposed logical models, which divide transaction processing into several functions:
  • those assigned to the application which ties resources together in application- specific operations
  • those assigned to the Resource manager which access physically to data stores
  • functions performed by the Transaction Manager which manages transactions, and finally
  • Communication Resource Managers which allow to exchange information with other transactional domains.

Copyright 2002-2005 Arjuna Technologies All Rights Reserved.
info@arjuna.com +44 191 243 0676