
5th Unit Notes

Distributed transactions are executed across connected data servers hosting related data. A distributed transaction comprises sub-transactions executed by each server to ensure serialization. Concurrency control mechanisms like locking and timestamps coordinate sub-transactions to maintain data integrity during distributed transaction execution.

Uploaded by

Harini

UNIT V DISTRIBUTED TRANSACTIONS

Transaction and Concurrency control - Nested Transactions - Locks - Optimistic concurrency
control - Timestamp ordering distributed transactions - Flat and Nested - Atomic - Two phase
commit protocol - concurrency control.

Distributed transactions are executed in a distributed database framework where a set of
connected data servers host related data. A distributed transaction comprises a set of
sub-transactions, each of which is executed by a specific data server. The sub-transactions
are coordinated so that the overall execution is equivalent to some serial execution.

A Transaction
A transaction consists of a series of read or write operations performed on a database, each
of which performs a specific unit of work. The transaction succeeds only if all of the
operations it contains succeed, so it has two possible outcomes: success or failure. Once a
transaction starts executing, it terminates in one of two ways, Abort or Commit. A
transaction is aborted when it does not reach a successful end, whereas it is committed when
its execution completes successfully and its changes are stored. A database transaction must
be atomic, consistent, isolated and durable. These are often referred to as the ACID
properties.
Properties of a Transaction
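Data integrity must be ensured in a concurrent environment. The ACID properties describe
the major guarantees of a database transaction and help to maintain the integrity of the
database. ACID stands for:
i. Atomicity: either all of the operations of a transaction take effect or none of them do.
ii. Consistency: a transaction takes the database from one consistent state to another.
iii. Isolation: the intermediate effects of a transaction are not visible to other
transactions running concurrently.
iv. Durability: once a transaction has committed, its changes survive subsequent failures.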
Different States of a Transaction
A transaction passes through several states: active, partially committed, committed, failed
and terminated.
i. Active State: A transaction is in the active state as soon as its execution
begins. Read/write operations can be performed at this stage.
ii. Partially Committed State: A transaction is partially committed once its final
operation has executed but its changes have not yet been made permanent.
iii. Committed State: A transaction is in the committed state when its execution has
been successfully completed and its changes have been stored.
iv. Failed State: A transaction that is aborted while still in its active state is said to be
a failed transaction.
v. Terminated State: A transaction is terminated once it has left the system, after
either committing or failing, and can no longer be continued or restarted.
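
The states above can be read as a small state machine. The following is a minimal sketch in Python; the set of allowed transitions is an assumption based on the usual textbook state diagram (the notes do not list the transitions explicitly), and the names `State`, `TRANSITIONS` and `move` are hypothetical.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Allowed transitions (assumed, following the usual textbook diagram).
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.TERMINATED},
    State.TERMINATED: set(),
}


def move(current: State, target: State) -> State:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


# A transaction that runs to completion without failing.
state = State.ACTIVE
for nxt in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    state = move(state, nxt)
print(state.value)
```

Encoding the transitions as a table makes illegal moves, such as going from committed back to active, fail loudly instead of passing silently.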

Distributed Concurrency Control


The task of managing concurrent access to a database in a distributed system is referred to as
distributed concurrency control. It allows users to access a multi-programmed database while
keeping transactions isolated, so that each user appears to be executing alone on a dedicated
system. Concurrency control coordinates concurrent accesses to maintain the consistency and
integrity of the database. The key challenge is to ensure that the updates performed by one
transaction do not interfere with the updates and retrievals of another transaction. When
transactions execute concurrently, the consistency of the database can be ensured with
serializable schedules, because the result obtained is equivalent to that of one of the serial
executions of the transactions. Using only serial schedules would also preserve consistency,
but it removes concurrency and therefore limits performance. Thus, concurrency control
techniques must be applied to ensure that the schedules created by the simultaneous execution
of transactions are serializable.
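
One common way to test whether a schedule is serializable is the precedence-graph test for conflict serializability, a sufficient condition that the notes do not name. The sketch below assumes a schedule is given as a list of `(transaction, operation, item)` tuples; the function name and the sample schedules are hypothetical.

```python
from collections import defaultdict


def is_conflict_serializable(schedule):
    """schedule: list of (transaction, op, item) tuples, with op in {"r", "w"}."""
    edges = defaultdict(set)
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)

    # The schedule is conflict-serializable iff the precedence graph is acyclic.
    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(has_cycle(n) for n in edges[node])
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(t) for t in list(edges))


serial_like = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A")]
interleaved = [("T1", "r", "A"), ("T2", "w", "A"), ("T1", "w", "A")]
print(is_conflict_serializable(serial_like))   # True
print(is_conflict_serializable(interleaved))   # False
```

In the second schedule T1 reads A before T2 writes it, but T2 writes A before T1 does, so neither serial order reproduces the result.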
Challenges Associated with Distributed Transactions and Concurrency Control
The challenges with concurrency control in a distributed system are highlighted below.
i. Data may be accessed by multiple users at a number of distant sites. This may cause
inconsistent retrieval and update problems.
ii. The database is fragmented and/or replicated across multiple sites.
iii. Concurrency control techniques implemented at one location must ensure the
consistency of the database at all other sites.
Benefits of Concurrency Control in Distributed Transactions
The benefits of concurrency control in distributed transactions are as follows:
i. Faster execution and response
ii. Improved performance
iii. It helps ensure serializability
iv. Reliability

Nested Transactions

A transaction that contains other transactions between its initiating point and its end point
is known as a nested transaction. The transactions nested inside it are called
sub-transactions.
The top-level transaction in a nested transaction can open sub-transactions, and each
sub-transaction can open more sub-transactions down to any depth of nesting.
A client’s transaction T opens two sub-transactions, T1 and T2, which access objects on
servers X and Y, as shown in the diagram below.
The sub-transactions T1 and T2 in turn open T1.1, T1.2, T2.1, and T2.2, which access
objects on the servers M, N, and P.
Nested Transaction

In the nested transaction strategy, sub-transactions at the same level can execute
concurrently. In the above diagram, T1 and T2 invoke objects on different servers and hence
can run in parallel.
The four sub-transactions T1.1, T1.2, T2.1, and T2.2 can also run in parallel.

Consider a distributed transaction (T) in which a customer transfers:


• Rs. 105 from account A to account C and
• Subsequently, Rs. 205 from account B to account D.

Transaction T :
Start
    Transfer Rs 105 from A to C :
        Deduct Rs 105 from A (withdraw from A) & Add Rs 105 to C (deposit to C)
    Transfer Rs 205 from B to D :
        Deduct Rs 205 from B (withdraw from B) & Add Rs 205 to D (deposit to D)
End
Assuming:

1. Account A is on server X
2. Account B is on server Y, and
3. Accounts C and D are on server Z.
The transaction T involves four requests: two deposits and two withdrawals. They
can be treated as sub-transactions (T1, T2, T3, T4) of the transaction T.
As shown in the figure below, transaction T is designed as a set of four nested transactions:
T1, T2, T3 and T4.

So, the Transaction T may be divided into sub-transactions as:


// Start the transaction
T = openTransaction
    // T1
    openSubTransaction
        a.withdraw(105);
    // T2
    openSubTransaction
        b.withdraw(205);
    // T3
    openSubTransaction
        c.deposit(105);
    // T4
    openSubTransaction
        d.deposit(205);
// End the transaction
closeTransaction
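
The pseudocode above can be turned into a runnable sketch. The version below is only an in-memory illustration under stated assumptions: the `Account` and `Transaction` classes are hypothetical, the opening balances of 500 are invented for the example, the sub-transactions run one after another rather than in parallel, and no servers or locks are modelled.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


class Transaction:
    """Hypothetical in-memory transaction; buffers updates until top-level commit."""

    def __init__(self, parent=None):
        self.parent = parent
        self.pending = {}  # account -> tentative balance

    def open_subtransaction(self):
        return Transaction(parent=self)

    def _read(self, account):
        # Look for a tentative value here, then in ancestors, then the real balance.
        t = self
        while t is not None:
            if account in t.pending:
                return t.pending[account]
            t = t.parent
        return account.balance

    def withdraw(self, account, amount):
        balance = self._read(account)
        if balance < amount:
            raise ValueError("insufficient funds")
        self.pending[account] = balance - amount

    def deposit(self, account, amount):
        self.pending[account] = self._read(account) + amount

    def commit(self):
        if self.parent is not None:
            # A sub-transaction commits provisionally: its updates pass to the parent.
            self.parent.pending.update(self.pending)
        else:
            # Only the top-level commit makes the updates permanent.
            for account, balance in self.pending.items():
                account.balance = balance
        self.pending = {}

    def abort(self):
        self.pending = {}  # discard tentative updates; the parent is unaffected


a, b, c, d = Account(500), Account(500), Account(0), Account(0)
t = Transaction()
steps = [(a, "withdraw", 105), (b, "withdraw", 205), (c, "deposit", 105), (d, "deposit", 205)]
for account, op, amount in steps:
    sub = t.open_subtransaction()  # T1, T2, T3, T4 in turn
    getattr(sub, op)(account, amount)
    sub.commit()
t.commit()
print(a.balance, b.balance, c.balance, d.balance)  # 395 295 105 205
```

Because a sub-transaction only hands its updates to its parent, aborting one sub-transaction leaves the work of its siblings intact, which is the main advantage over a flat transaction.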

Locking in Distributed Transactions


In a distributed transaction, each server maintains locks for its own data items. The local lock
manager can decide whether to grant a lock or make the requesting transaction wait.
However, it cannot release any locks until it knows that the transaction has been committed
or aborted at all the servers involved in the transaction. When locking is used for concurrency
control, the data items remain locked and are unavailable for other transactions during the
atomic commit protocol, although an aborted transaction releases its locks after phase 1 of
the protocol. As servers set their locks independently of one another, it is possible that
different servers may impose different orderings on transactions.
When a deadlock is detected, a transaction is aborted to resolve the deadlock. In this case,
the coordinator will be informed and will abort the transaction at the workers involved in the
transaction.
In a nested transaction, parent transactions are not allowed to run concurrently with their
child transactions, to prevent potential conflict between levels. Nested transactions inherit
locks from their ancestors. For a nested transaction to acquire a read lock on a data item, all
the holders of write locks on that data item must be its ancestors. Similarly, for a nested
transaction to acquire a write lock on a data item, all the holders of read and write locks on
that data item must be its ancestors. When a nested transaction commits, its locks are
inherited by its parent. When a nested transaction aborts, its locks are removed.
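
The lock rules for nested transactions can be sketched as set comparisons. This is a simplified illustration, not a full lock manager: the classes `Txn` and `ItemLock` are hypothetical, a transaction that already holds a lock is treated as compatible with itself, and waiting and deadlock handling are left out.

```python
class Txn:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def ancestors(self):
        node, result = self.parent, set()
        while node is not None:
            result.add(node)
            node = node.parent
        return result


class ItemLock:
    """Lock state for one data item (hypothetical helper)."""

    def __init__(self):
        self.readers, self.writers = set(), set()

    def can_read(self, txn):
        # Every holder of a write lock must be an ancestor of txn (or txn itself).
        return self.writers <= txn.ancestors() | {txn}

    def can_write(self, txn):
        # Every holder of a read or write lock must be an ancestor of txn (or txn itself).
        return (self.readers | self.writers) <= txn.ancestors() | {txn}

    def on_commit(self, child):
        # child is assumed to be a sub-transaction: its parent inherits its locks.
        for holders in (self.readers, self.writers):
            if child in holders:
                holders.discard(child)
                holders.add(child.parent)

    def on_abort(self, child):
        # An aborted sub-transaction simply loses its locks.
        self.readers.discard(child)
        self.writers.discard(child)


t = Txn("T")
t1, t2 = Txn("T1", t), Txn("T2", t)
lock = ItemLock()
lock.writers.add(t1)
print(lock.can_read(t2))   # False: T1 is not an ancestor of T2
lock.on_commit(t1)         # T inherits T1's write lock
print(lock.can_write(t2))  # True: the only holder, T, is T2's ancestor
```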

Concurrency Control in Distributed Transactions

Concurrency control mechanisms provide concepts and implementations that ensure the
execution of a transaction across any node does not violate the ACID or BASE properties
(depending on the database), which would cause inconsistency and mix-up of data in the
distributed system. Transactions in a distributed system are executed in “sets”, and every set
consists of various sub-transactions. The sub-transactions at every node must be executed in
a serially equivalent order to maintain data integrity, and the concurrency control mechanisms
enforce this ordering.
Types of Concurrency Control Mechanisms

There are two types of concurrency control mechanisms, pessimistic and optimistic, as described below:

Pessimistic Concurrency Control (PCC)

Pessimistic Concurrency Control mechanisms proceed on the assumption that most
transactions will try to access the same resource simultaneously. They prevent concurrent
access to a shared resource by requiring a transaction to acquire a lock on the data item
before performing any operation on it.

Optimistic Concurrency Control (OCC)

The problem with pessimistic concurrency control systems is that, once a transaction acquires
a lock on a resource, no other transaction can access it. This reduces the
concurrency of the overall system.
Optimistic Concurrency Control techniques proceed on the assumption that no
transactions, or very few, will try to access a given resource simultaneously. We can
describe a system as FULLY OPTIMISTIC if it uses no locks at all and checks for conflicts at
commit time. It has the following four phases of operation:
• Read Phase: When a transaction begins, it reads the data and logs the
timestamp at which the data was read, so that conflicts can be detected during the
validation phase.
• Execution Phase: In this phase, the transaction executes all its operations, such as
create, read, update or delete.
• Validation Phase: Before the transaction commits, a validation check is
performed to ensure consistency by comparing the current last_updated timestamp of
the data with the one recorded in the read phase. If the timestamps match, the
transaction is allowed to proceed to the commit phase.
• Commit Phase: During this phase, the transaction is either committed
or aborted, depending on the validation check performed during the previous phase.
If the timestamps match, the transaction is committed; otherwise it is aborted.
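
The four phases can be illustrated with a small in-memory sketch. It makes several assumptions that are not in the notes: a per-item version counter stands in for the last_updated timestamp, the `Store` and `OptimisticTxn` classes are hypothetical, and validation plus commit is assumed to run without interruption (a real system would have to make that step atomic).

```python
class Store:
    """Hypothetical versioned key-value store."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))


class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen in the read phase
        self.writes = {}         # key -> tentative new value

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.read(key)  # record the version this write is based on
        self.writes[key] = value

    def commit(self):
        # Validation phase: every item must still be at the version we read.
        for key, seen in self.read_versions.items():
            if self.store.read(key)[1] != seen:
                return False  # conflict: abort, nothing is applied
        # Commit phase: apply the writes and bump each version.
        for key, value in self.writes.items():
            self.store.data[key] = (value, self.store.read(key)[1] + 1)
        return True


store = Store()
store.data["A"] = (100, 1)
t1, t2 = OptimisticTxn(store), OptimisticTxn(store)
t1.write("A", t1.read("A") - 10)
t2.write("A", t2.read("A") + 50)
print(t2.commit())  # True
print(t1.commit())  # False: T2 changed A after T1 read it
```

No locks are taken at any point; the cost of a conflict is paid only by the transaction that validates second, which has to be retried.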

Pessimistic Concurrency Control Methods

Pessimistic Concurrency Control methods include the following:

1. Isolation Level
The isolation level defines the degree to which data residing in the database must be
isolated from modification by other transactions. If transaction T1 is operating on some
data and another transaction T2 modifies that data while T1 is still using it, unwanted
inconsistency problems result. The isolation levels provided are Read-Uncommitted,
Read-Committed, Repeatable Read and Serializable.
2. Two-Phase Locking Protocol
The two-phase locking protocol is a concurrency control technique used to manage locks on
data items in a database. It consists of two phases:

Growing Phase: The transaction acquires all the locks on the data items that it
requires to execute successfully. No locks are released in this
phase.
Shrinking Phase: The locks acquired in the previous phase are released one by one,
and no new locks are acquired in this phase.
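
The two-phase rule can be enforced with a single flag per transaction. The sketch below is a simplified illustration: it assumes exclusive locks only, a shared dictionary as the lock table, and a caller that decides what to do when a lock is unavailable. The names `TwoPhaseTxn` and `TwoPhaseError` are hypothetical.

```python
class TwoPhaseError(Exception):
    pass


class TwoPhaseTxn:
    """Tracks which 2PL phase a transaction is in (hypothetical helper)."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared dict: item -> name of the holder
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise TwoPhaseError("cannot acquire a lock after releasing one")
        holder = self.lock_table.get(item)
        if holder not in (None, self.name):
            return False  # held by another transaction; caller must wait or abort
        self.lock_table[item] = self.name
        self.held.add(item)
        return True

    def release(self, item):
        self.shrinking = True  # the first release starts the shrinking phase
        self.held.discard(item)
        if self.lock_table.get(item) == self.name:
            del self.lock_table[item]


locks = {}
t1, t2 = TwoPhaseTxn("T1", locks), TwoPhaseTxn("T2", locks)
t1.acquire("A")
t1.acquire("B")            # growing phase
print(t2.acquire("A"))     # False: T1 holds A
t1.release("A")            # shrinking phase begins for T1
print(t2.acquire("A"))     # True
```

After the first release, any further `acquire` call by T1 raises `TwoPhaseError`, which is exactly the constraint that separates the growing phase from the shrinking phase.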

Optimistic Concurrency Control Methods

Timestamp Based (OCC)


Timestamp ordering concurrency control in distributed transactions
In a single server transaction, the server issues a unique timestamp to each transaction when
it starts. Serial equivalence is enforced by committing the versions of data items in the order
of the timestamps of transactions that accessed them. In distributed transactions, we require
that each server is able to issue globally unique timestamps. A globally unique transaction
timestamp is issued to the client by the first server accessed by a transaction. The transaction
timestamp is passed to each server that performs an operation in the transaction. The servers
of distributed transactions are jointly responsible for ensuring that they are performed in a
serially equivalent manner. For example, if the version of a data item accessed by transaction
U commits after the version accessed by T at one server, then if T and U access the same data
item as one another at other servers, they must commit them in the same order. To achieve
the same ordering at all the servers, the servers must agree on the ordering of their
timestamps. A timestamp consists of a pair <local timestamp, server-id>.
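
A pair <local timestamp, server-id> gives a total order if pairs are compared first on the local timestamp and then on the server id. The sketch below assumes integer server ids and a simple per-server counter as the local timestamp; the class names are hypothetical.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    local: int      # local counter or clock value at the issuing server
    server_id: int  # breaks ties between servers


class TimestampServer:
    def __init__(self, server_id):
        self.server_id = server_id
        self.counter = 0

    def next_timestamp(self):
        self.counter += 1
        return Timestamp(self.counter, self.server_id)


x, y = TimestampServer(1), TimestampServer(2)
ts = [y.next_timestamp(), x.next_timestamp(), x.next_timestamp()]
# Tuples compare field by field, so sorting orders by local value, then server id.
ordered = sorted(ts)
print(ordered)
```

Two servers can issue the same local value, but the server id makes every pair globally unique, so all servers sort the same set of timestamps identically.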

Objectives of Timestamp Ordering

• Transaction Ordering − Transactions must be carried out in the order given by their
timestamps, so that the outcome matches the timestamp order.
• Conflict Resolution − If two transactions are in conflict, the timestamp ordering
mechanism must choose between terminating one of the transactions or postponing
it until the other transaction is finished.
• Deadlock Prevention − The timestamp ordering mechanism must avoid deadlocks, which
occur when several transactions are waiting for one another to complete.

How Timestamp Ordering Works?

The timestamp ordering algorithm works by assigning a unique timestamp to each transaction
when it arrives in the system. The timestamp reflects the transaction's start time, and it is
used to order the transactions for execution. The algorithm consists of two phases: the
validation phase and the execution phase.
• Validation Phase − During the validation phase, the timestamp ordering method checks each
transaction's timestamp to make sure the transactions are performed in
the proper sequence. When one transaction's timestamp is lower than another's, the
transaction with the lower timestamp is ordered first.
• Execution Phase − In the execution phase, the timestamp ordering algorithm executes
the transactions in the order determined by the validation phase. If there is a conflict
between transactions, the algorithm uses a conflict resolution strategy to resolve the
conflict. One strategy is to abort the transaction with the lower timestamp, while
another strategy is to delay the transaction with the lower timestamp until the other
transaction completes.
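
As an illustration of the abort strategy, the sketch below implements the standard basic timestamp ordering rules, in which an operation that arrives too late causes its (older) transaction to abort. These exact rules are the usual textbook formulation rather than something spelled out in these notes, and the names `Item`, `Abort`, `read` and `write` are hypothetical.

```python
class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # timestamp of the transaction that last wrote the item


class Abort(Exception):
    pass


def read(item, ts):
    # A transaction may not read a value written by a later transaction.
    if ts < item.write_ts:
        raise Abort(f"read by {ts} arrives after write by {item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A transaction may not overwrite an item that a later transaction has read or written.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort(f"write by {ts} arrives too late")
    item.value, item.write_ts = value, ts


a = Item(10)
write(a, 2, 20)      # the transaction with timestamp 2 writes A
try:
    read(a, 1)       # the older transaction arrives too late
except Abort as e:
    print("aborted:", e)
print(read(a, 3))    # 20: a later transaction reads the value normally
```

Since a transaction is aborted rather than made to wait, no cycle of waiting transactions can form under these rules.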

Benefits of Timestamp Ordering

The benefits of timestamp ordering are as follows −

• Transaction Consistency − The timestamp ordering method ensures transaction consistency:
however the transactions are interleaved, the outcome is the same as if they had been
carried out serially in timestamp order.
• High Concurrency − The timestamp ordering mechanism permits several transactions to run
concurrently.
• Deadlock Prevention − The timestamp ordering method avoids deadlock, the situation in
which two or more transactions wait for one another to complete.

In a timestamp-based concurrency technique, each transaction in the system is assigned a
unique timestamp, which is taken as soon as the transaction begins and is verified again
during the commit phase. If a different transaction has updated the data in the meantime, so
that a newer timestamp exists, the transaction is either restarted or aborted according to a
policy defined by the system. If the timestamp is unchanged, meaning the data was not
modified by any other transaction, the transaction is committed.

Example: Suppose two transactions, T1 and T2, operate on data item A. The
timestamp concurrency technique keeps track of the timestamp at which the data was
first accessed by transaction T1.
When transaction T1 is about to commit, it compares the
initial timestamp with the most recent timestamp. In our case, transaction T1 is not
committed, because transaction T2 has performed a write operation on A in the meantime.

if (Initial_timestamp == Most_recent_timestamp)
    then ‘Commit’
else
    ‘Abort’

In our case, transaction T1 will be aborted because T2 modified the same data item at 12:15 PM.

Flat & Nested Distributed Transactions:

If a client transaction calls actions on multiple servers, it is said to be distributed.


Distributed transactions can be structured in two different ways:
1. Flat transactions
2. Nested transactions

Flat transactions

• A flat transaction has a single initiating point (Begin) and a single end point
(Commit or Abort). Flat transactions are usually very simple and are generally used for short
activities rather than larger ones.
• A client makes requests to multiple servers in a flat transaction. Transaction T, for
example, is a flat transaction that performs operations on objects in servers X, Y,
and Z.
• A flat client transaction completes each request before moving on to the next one.
As a result, each transaction accesses the servers' objects sequentially.
• When servers use locking, a transaction can only wait for one object at a time.

Limitations of a flat Transaction:


• All work is lost in the event of a crash.
• Only one DBMS may be used at a time.
• No partial rollback is possible.

Nested transactions

Refer above

Role of coordinator:

When a distributed transaction commits, the servers involved in executing it
must be able to communicate with one another so that they can coordinate properly.
When a client initiates a transaction, it sends an “openTransaction” request to any
coordinator server. The contacted coordinator carries out the “openTransaction” and
returns the transaction identifier (TID) to the client.
• Distributed transaction identifiers must be unique within the distributed system.
A simple way to generate a TID is to build it from two parts: the “server identifier”
(for example, the IP address) of the server that created it and a number unique to that
server.
• The coordinator that opened the transaction becomes the distributed
transaction’s coordinator and is responsible for either aborting it or
committing it.
• Every server that manages an object accessed by a transaction is a participant in
the transaction and provides an object we call the participant. The participants are
responsible for working together with the coordinator to complete the commit
process.
• The coordinator records each new participant in its participants list.
Each participant knows the coordinator and the coordinator knows all the
participants. This enables them to collect the information that will be needed at
the time of commit and hence to work in coordination.
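
The coordinator's bookkeeping can be sketched as follows. Only the “openTransaction” request and the two-part TID come from the text; the `Coordinator` class, the `join` method, the sample IP address and the in-process calls (standing in for remote requests) are assumptions made for illustration.

```python
import itertools


class Coordinator:
    """Hypothetical coordinator; names follow the text, not a real API."""

    def __init__(self, server_id):
        self.server_id = server_id
        self._counter = itertools.count(1)
        self.participants = {}  # TID -> list of participant names

    def open_transaction(self):
        # TID = (server identifier, number unique to this server)
        tid = (self.server_id, next(self._counter))
        self.participants[tid] = []
        return tid

    def join(self, tid, participant):
        # Record a participant the first time it takes part in the transaction.
        if participant not in self.participants[tid]:
            self.participants[tid].append(participant)


coord = Coordinator("10.0.0.1")
tid = coord.open_transaction()
for server in ("X", "Y", "Z", "X"):
    coord.join(tid, server)
print(tid, coord.participants[tid])  # ('10.0.0.1', 1) ['X', 'Y', 'Z']
```

Keeping the participant list per TID is what lets the coordinator contact exactly the servers involved when the commit protocol starts.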

Atomic Commit Protocol in Distributed System

In distributed systems, transactional consistency is guaranteed by the atomic commit
protocol. It coordinates two phases, voting and decision, to ensure that a transaction is
either fully committed or completely cancelled on all of the nodes involved.
Atomic Commit

The atomic commit procedure should meet the following requirements:


• All participants who make a choice reach the same conclusion.
• If any participant decides to commit, then all other participants must have
voted yes.
• If all participants vote yes and no failure occurs, then all participants decide to
commit.

Distributed One-Phase Commit

In a one-phase commit protocol, the coordinator tells each participating server whether to
commit or abort the transaction, and keeps repeating the request until all of them have
acknowledged that they have carried it out.

One phase Commit


Distributed Two-Phase Commit

There are two phases for the commit procedure to work:

Phase 1: Voting
• The coordinator sends a “prepare” message to each participating worker.
• The coordinator then waits until it has received a response, “ready” or “no”,
from each worker, or until a timeout occurs.
• Workers wait until the coordinator sends the “prepare” message.
• If a worker is ready to commit the transaction, it sends a “ready” message to the
coordinator.
• If a worker is not ready to commit the transaction, it sends a “no” message to the
coordinator, which results in the transaction being aborted.

Phase 2: Completion of the voting result


• In this phase, the coordinator checks the votes. If every
worker sent a “ready” message, it sends a “commit” message to each
worker; otherwise, it sends an “abort” message to each worker.
• The coordinator then waits until it has received an acknowledgment from each worker.
• Workers wait until the coordinator sends a “commit” or “abort”
message, then act according to the message received.
• Finally, workers send an acknowledgment to the coordinator.
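
The two phases above can be simulated in a few lines. This is a minimal single-process sketch: the `Worker` class and `two_phase_commit` function are hypothetical, method calls stand in for network messages, and timeouts, crashes and logging are deliberately left out.

```python
class Worker:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: vote "ready" or "no" in response to the "prepare" message.
        self.state = "ready" if self.can_commit else "aborted"
        return "ready" if self.can_commit else "no"

    def finish(self, decision):
        # Phase 2: act on the coordinator's decision, then acknowledge.
        self.state = "committed" if decision == "commit" else "aborted"
        return "ack"


def two_phase_commit(workers):
    votes = [w.prepare() for w in workers]  # voting phase
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    acks = [w.finish(decision) for w in workers]  # completion phase
    assert all(a == "ack" for a in acks)
    return decision


print(two_phase_commit([Worker("X"), Worker("Y")]))                    # commit
print(two_phase_commit([Worker("X"), Worker("Y", can_commit=False)]))  # abort
```

A single “no” vote is enough to abort the transaction everywhere, which is how the protocol keeps all participants in agreement.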

Two phase Commit
