Unit - 3 DDB
TRANSACTION MANAGEMENT
Transaction:
A transaction is a logical unit of work on the database. For example, transferring 800 from account X to account Y consists of the following operations:
X's Account:
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)
Y's Account:
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Operations of a Transaction:
Following are the main operations of a transaction:
Read(X): The read operation is used to read the value of X from the database and store it in a
buffer in main memory.
Write(X): The write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations:
R(X);
X = X - 500;
W(X);
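As a rough illustration, here is a minimal Python sketch of these operations, where Read(X) copies the stored value into a main-memory buffer and Write(X) copies it back. The dictionaries standing in for the database and the buffer are assumptions made for illustration only.

# Minimal sketch of Read/Write for the debit transaction above.
database = {"X": 2000}      # persistent store (illustrative)
buffer = {}                 # main-memory buffer used by the transaction

def read(item):
    # Read(item): copy the stored value into the main-memory buffer.
    buffer[item] = database[item]

def write(item):
    # Write(item): copy the buffered value back to the database.
    database[item] = buffer[item]

# Debit transaction: R(X); X = X - 500; W(X);
read("X")
buffer["X"] = buffer["X"] - 500
write("X")
print(database["X"])        # 1500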
Transaction property:
A transaction has four properties, which are used to maintain consistency in the database
before and after the transaction.
Property of Transaction
Atomicity
Consistency
Isolation
Durability.
Types of Transactions:
non-distributed transactions,
distributed transactions,
online transactions,
workflow transactions,
flat transactions, and
nested transactions.
Distributed transactions:
A distributed transaction involves operations that span multiple databases or systems,
requiring coordination between them to maintain consistency and integrity.
These transactions typically occur in distributed systems where data is spread across
different servers, databases, or even different geographical locations.
Managing distributed transactions is more complex because it involves ensuring that all
the databases or systems involved either commit or roll back the transaction as a whole.
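Coordinating a global commit or rollback across sites is commonly done with an atomic commitment protocol such as two-phase commit (2PC). The following is a minimal Python sketch of that idea, not the mechanism of any particular DBMS; the Participant class and its method names are assumptions made for illustration.

# Minimal two-phase commit sketch (illustrative only).
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1: vote YES only if the local work can be made durable.
        return self.can_commit

    def commit(self):
        print(f"{self.name}: commit")

    def rollback(self):
        print(f"{self.name}: rollback")

def two_phase_commit(participants):
    # Phase 1 (voting): every site must vote YES for a global commit.
    if all(p.prepare() for p in participants):
        for p in participants:      # Phase 2 (decision): commit everywhere
            p.commit()
        return "committed"
    for p in participants:          # otherwise roll back everywhere
        p.rollback()
    return "aborted"

print(two_phase_commit([Participant("site1"), Participant("site2")]))
print(two_phase_commit([Participant("site1"), Participant("site2", can_commit=False)]))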
ONLINE TRANSACTIONS:
Online transaction processing (OLTP) is a data processing system that allows large
numbers of people to perform a large number of transactions in real time.
Key features of OLTP systems include:
High availability: They must be available at all times for real-time transactions.
Fast response times: To support a large number of users with minimal delay.
Concurrent transaction handling: OLTP systems must support multiple transactions
simultaneously without conflict.
FLAT TRANSACTIONS:
A flat transaction is the simplest and most basic type of transaction model in a
Database Management System (DBMS).
It refers to a single, indivisible sequence of operations that must be executed entirely or
not at all. Flat transactions are particularly useful in environments where the work to be
done is linear, meaning there is no need for nesting or dividing the transaction into sub-
transactions.
NESTED TRANSACTIONS:
Nested transactions are a more complex transaction model used in Database
Management Systems (DBMS) to manage hierarchically structured operations.
Unlike flat transactions, which are indivisible, nested transactions allow a parent
transaction to contain multiple sub-transactions, each of which can be committed or
rolled back independently.
This structure enables greater flexibility, particularly for long-running or complex
processes where certain parts of the transaction may succeed, while others may fail.
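A minimal Python sketch of this idea is shown below: a parent transaction runs several sub-transactions, keeps their results provisional, and lets a failed sub-transaction roll back on its own without undoing the rest. The class and function names are illustrative assumptions, not a real DBMS interface.

# Minimal nested-transaction sketch (illustrative only).
class SubTransaction:
    def __init__(self, name, work):
        self.name = name
        self.work = work            # function producing this sub-transaction's updates

    def run(self):
        try:
            return True, self.work()
        except Exception:
            return False, {}        # the sub-transaction rolls back independently

def run_parent(database, subtransactions):
    provisional = {}
    for sub in subtransactions:
        ok, updates = sub.run()
        if ok:
            provisional.update(updates)      # keep the sub-transaction's result
        else:
            print(f"{sub.name} rolled back") # the parent may retry or continue
    database.update(provisional)             # parent commit makes results durable

def failing_billing():
    raise RuntimeError("billing service unavailable")

db = {"stock": 10}
run_parent(db, [
    SubTransaction("reserve", lambda: {"stock": 9}),
    SubTransaction("bill", failing_billing),
])
print(db)   # {'stock': 9} -- the failed sub-transaction did not undo the parent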
WORKFLOW TRANSACTIONS:
In Database Management Systems (DBMS), workflow transactions refer to a type of
transaction that spans multiple steps or tasks, typically part of a larger business process.
These transactions are often long-running, and they involve coordination between
different systems, tasks, or users.
Workflow transactions are commonly used in business processes that cannot be
completed instantly, such as order processing, insurance claims, or complex approval
workflows.
Key Concepts:
Long-running Nature: Unlike typical database transactions, which are short and quick,
workflow transactions can last for minutes, hours, or even days, as they involve a series
of steps that may require human intervention or coordination across multiple systems.
Chained and Structured: Each step performs a specific task or operation, and the
outcome of one step often determines the next action in the process.
Loose Coupling: The tasks or steps in a workflow transaction are loosely coupled,
meaning that they can run independently but still contribute to the overall transaction.
Compensation and Partial Rollback: Since workflow transactions are long-running and
involve multiple independent tasks, traditional rollback mechanisms are not always
practical. Instead, compensation mechanisms are used. If a failure occurs at any step, a
compensating action is taken to undo the effects of the completed tasks (e.g., issuing a
refund if an order fails to ship).
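The compensation idea can be sketched as follows: each completed step registers a compensating action, and if a later step fails, the compensations are executed in reverse order. This is a simplified, hypothetical sketch; the step names and actions are assumptions made for illustration.

# Minimal compensation sketch for a workflow transaction (illustrative only).
def run_workflow(steps):
    """steps: list of (name, action, compensation) tuples."""
    done = []
    for name, action, compensation in steps:
        try:
            action()
            done.append((name, compensation))
        except Exception:
            print(f"step '{name}' failed; compensating completed steps")
            for prev_name, comp in reversed(done):
                comp()                      # e.g. issue a refund, release stock
            return "aborted"
    return "completed"

def ship_order():
    raise RuntimeError("carrier unavailable")

print(run_workflow([
    ("charge card", lambda: print("charged"), lambda: print("refund issued")),
    ("reserve stock", lambda: print("reserved"), lambda: print("stock released")),
    ("ship order", ship_order, lambda: None),
]))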
SERIALIZABILITY:
Serializability is a property of a system that describes how different processes operate on
shared data.
A system is called serializable if the result of executing the transactions concurrently is the
same as the result of executing them one after another (serially), in some order.
Here, cooperation of the processes means there is no overlapping in the execution on the
data: in a DBMS, while a data item is being written or read, the DBMS can stop all other
processes from accessing it.
Types of Serializability
1. Conflict Serializability:
Conflict serializability concerns conflicting operations, i.e. operations of different
transactions on the same data item where at least one of them is a write. Such operations
must be executed in a particular order so that the consistency of the database is maintained.
In DBMS, each transaction has some unique value (its identifier or timestamp), and the
ordering of the transactions is based on that unique value.
This unique value ensures that no two conflicting operations are executed concurrently.
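Conflict serializability is commonly tested with a precedence graph: draw an edge Ti -> Tj whenever an operation of Ti conflicts with and precedes an operation of Tj; the schedule is conflict serializable iff the graph has no cycle. Below is a minimal Python sketch of this test; the tuple representation of a schedule is an assumption made for illustration.

# Minimal precedence-graph test for conflict serializability (illustrative only).
from itertools import combinations

def conflict_serializable(schedule):
    # Build an edge T1 -> T2 for every pair of conflicting operations
    # (different transactions, same item, at least one write) where T1's
    # operation comes first in the schedule.
    nodes = {t for t, _, _ in schedule}
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    graph = {n: [b for a, b in edges if a == n] for n in nodes}

    visiting, done = set(), set()
    def has_cycle(n):
        if n in done:
            return False
        if n in visiting:               # back edge: the precedence graph has a cycle
            return True
        visiting.add(n)
        if any(has_cycle(m) for m in graph[n]):
            return True
        visiting.discard(n)
        done.add(n)
        return False

    return not any(has_cycle(n) for n in nodes)

# R1(X) W2(X) W1(X): edges T1->T2 and T2->T1 form a cycle, so not conflict serializable.
print(conflict_serializable([("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]))  # False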
2. View Serializability
View serializability requires that each transaction produces the same result as it would in
some proper serial execution of the transactions on the data items.
Unlike conflict serializability, view serializability focuses on preventing inconsistency in
the database by comparing what each transaction reads and finally writes.
In DBMS, view serializability can admit schedules that contain conflicting operations, as
long as they are view equivalent to a serial schedule.
DISTRIBUTED SERIALIZABILITY:
In a DDBMS, two schedules must be considered:
Local Schedule.
Global Schedule. (i.e., the union of local schedules)
Serializability in DDBMS
Extends in a straightforward manner to a DDBMS if data is not replicated.
Requires more care if data is replicated: it is possible that the local schedules are
serializable, but the mutual consistency of the database is not guaranteed.
* Mutual consistency: all the values of all replicated data items are identical.
Consider two sites and a data item x which is replicated at both sites.
T1: Read(x); x = x + 5; Write(x)
T2: Read(x); x = x * 10; Write(x)
Both transactions need to run at both sites.
The following two schedules might have been produced at the two sites (the order is given
implicitly):
Site1: S1={R1(X), W1(X), R2(X), W2(X)}
Site 2: S2={R2(X), W2(X), R1(X), W1(X)}
Both schedules are (trivially) serializable, thus correct in the local context.
But they produce different results, thus violating mutual consistency. For example, if x is
initially 10, Site 1 ends with (10 + 5) * 10 = 150 while Site 2 ends with 10 * 10 + 5 = 105.
Therefore, a serializable global schedule must meet the following conditions:
-- Local schedules are serializable.
-- Two conflicting operations should be in the same relative order in all of the local schedules
in which they appear (a sketch of this check is given below).
Each transaction needs to run at every site that holds a replica of the data items it accesses.
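A minimal sketch of the second condition is given below: it collects the ordered pairs of conflicting transactions from each local schedule and flags the case where two sites order the same conflict differently, as in S1 and S2 above. The tuple representation of a schedule is an assumption made for illustration.

# Minimal check that conflicting operations are ordered consistently across sites.
from itertools import combinations

def order_of_conflicts(schedule):
    """Return the set of ordered conflicting transaction pairs in one local schedule."""
    pairs = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            pairs.add((t1, t2))
    return pairs

def mutually_consistent_order(local_schedules):
    orders = [order_of_conflicts(s) for s in local_schedules]
    merged = set().union(*orders)
    # Inconsistent if some pair of transactions conflicts in opposite orders at different sites.
    return not any((b, a) in merged for (a, b) in merged)

s1 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
s2 = [("T2", "R", "X"), ("T2", "W", "X"), ("T1", "R", "X"), ("T1", "W", "X")]
print(mutually_consistent_order([s1, s2]))   # False: T1/T2 conflict in both orders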
Optimistic concurrency control proceeds through four phases of operation:
Read Phase: When a transaction begins, it reads the data while also logging the time-
stamp at which data is read to verify for conflicts during the validation phase.
Execution Phase: In this phase, the transaction executes all of its operations, such as create,
read, update, or delete.
Validation Phase: Before committing a transaction, a validation check is performed to
ensure consistency by checking the last updated timestamp with the one recorded at read
phase. If the timestamp matches, then the transaction will be allowed to be committed
and hence proceed with the commit phase.
Commit phase: During this phase, the transaction is either committed or aborted, depending
on the validation check performed during the previous phase. If the timestamps match, the
transaction is committed; otherwise it is aborted.
Timestamp Ordering Protocol:
o The Timestamp Ordering Protocol is used to order the transactions based on their
timestamps. The order of the transactions is simply the ascending order of their creation
times.
o The priority of the older transaction is higher, which is why it executes first. To determine
the timestamp of a transaction, this protocol uses the system time or a logical counter.
o The lock-based protocol manages the order between conflicting pairs of transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction
is created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the
system at time 007 and transaction T2 entered the system at time 009. T1 has the higher
priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and
'write' operations on each data item.
Basic Timestamp ordering protocol works as follows:
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to the
maximum of R_TS(X) and TS(Ti).
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If R_TS(X) > TS(Ti) or W_TS(X) > TS(Ti), then the operation is rejected and Ti is
rolled back.
o Otherwise, the operation is executed and W_TS(X) is set to TS(Ti).
o The TS protocol ensures freedom from deadlock, which means no transaction ever waits.
o But the schedule may not be recoverable and may not even be cascade-free.
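The read and write rules above can be sketched in Python as follows; the DataItem class and the way rejection is signalled are illustrative assumptions, not the interface of any real DBMS.

# Minimal Basic Timestamp Ordering sketch (illustrative only).
class DataItem:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0   # timestamp of the last transaction that read the item
        self.w_ts = 0   # timestamp of the last transaction that wrote the item

class RollbackError(Exception):
    """Raised when an operation is rejected and the transaction must roll back."""

def read(item, ts):
    # Reject the read if a younger transaction has already written the item.
    if item.w_ts > ts:
        raise RollbackError(f"read rejected: W_TS={item.w_ts} > TS={ts}")
    item.r_ts = max(item.r_ts, ts)
    return item.value

def write(item, value, ts):
    # Reject the write if a younger transaction has already read or written the item.
    if item.r_ts > ts or item.w_ts > ts:
        raise RollbackError(f"write rejected: R_TS={item.r_ts}, W_TS={item.w_ts}, TS={ts}")
    item.value = value
    item.w_ts = ts

# Example: T1 (TS=7) tries to write X after T2 (TS=9) has already read it, so T1 is rejected.
x = DataItem(100)
print(read(x, 9))          # T2 reads X; R_TS(X) becomes 9
try:
    write(x, 50, 7)        # T1's late write is rejected
except RollbackError as e:
    print(e)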
The validation-based protocol is also known as the optimistic concurrency control technique. In
the validation-based protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It reads the values of the
various data items and stores them in temporary local variables. It performs all the write
operations on these temporary variables, without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the
actual data to see if they violate serializability.
3. Write phase: If the validation of the transaction succeeds, then the temporary results are
written to the database; otherwise the transaction is rolled back.
Here each transaction has the following timestamps, one per phase boundary:
Start(Ti): It contains the time when Ti starts its execution.
Validation(Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
o This protocol determines the timestamp of the transaction for serialization using the
timestamp of the validation phase, as this is the phase that actually determines whether the
transaction will commit or roll back.
o Hence TS(T) = Validation(T).
o The serializability is determined during the validation process; it can't be decided in
advance.
o While executing the transactions, this ensures a greater degree of concurrency and fewer
conflicts.
o Thus it results in transactions that have fewer rollbacks.
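Below is a simplified Python sketch of the validation-based protocol: writes are buffered in a local write set, and at validation time the transaction's read set is checked against the write sets of transactions that committed while it was running. This is a simplified form of the validation test; the class names and bookkeeping are assumptions made for illustration.

# Minimal validation-based (optimistic) concurrency control sketch (illustrative only).
import itertools

_clock = itertools.count(1)
def clock():
    return next(_clock)          # logical clock used for the three timestamps

class Txn:
    def __init__(self):
        self.start = clock()     # Start(Ti): when the read phase begins
        self.validation = None   # Validation(Ti): when validation begins
        self.finish = None       # Finish(Ti): when the write phase ends
        self.read_set = set()
        self.write_set = {}      # local variables holding the provisional writes

def validate(txn, committed):
    # Backward validation: txn must not have read anything written by a
    # transaction that committed after txn started.
    for other in committed:
        if other.finish < txn.start:
            continue                             # finished before txn started
        if txn.read_set & set(other.write_set):
            return False
    return True

def try_commit(txn, committed, database):
    txn.validation = clock()                     # TS(T) = Validation(T)
    if not validate(txn, committed):
        return False                             # roll back: discard local writes
    for item, value in txn.write_set.items():
        database[item] = value                   # write phase
    txn.finish = clock()
    committed.append(txn)
    return True

# Example: T2 commits a write of X while T1 (which read X) is still running.
db, committed = {"X": 1}, []
t1, t2 = Txn(), Txn()
t1.read_set.add("X")
t2.write_set["X"] = 2
print(try_commit(t2, committed, db))   # True
print(try_commit(t1, committed, db))   # False: T1 must be rolled back and restarted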
The Thomas Write Rule preserves the serialization order enforced by the protocol while ignoring
obsolete writes, and thus improves the Basic Timestamp Ordering algorithm. When a transaction
T issues a Write(X) operation, the following rules are applied:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is
rejected.
o If TS(T) < W_TS(X), then don't execute the Write(X) operation but continue processing:
the write is obsolete and is simply ignored.
o If neither condition 1 nor condition 2 holds, then the Write operation of transaction T is
executed and W_TS(X) is set to TS(T).
If we use the Thomas Write Rule, then some serializable schedules can be permitted that are not
conflict serializable, as illustrated by the schedule in the given figure:
In the figure, T1's read of the data item precedes T1's write of the same data item, with T2's
write of that item in between. This schedule is not conflict serializable.
The Thomas Write Rule exploits the fact that T2's write is never read by any transaction. If we
delete the write operation of transaction T2, then a conflict serializable schedule is obtained, as
shown in the figure below.
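Reusing the DataItem and RollbackError sketch from the Basic Timestamp Ordering example above, the modified write rule can be sketched as follows; only the handling of obsolete writes differs.

# Write rule modified by the Thomas Write Rule (illustrative only).
def write_thomas(item, value, ts):
    if item.r_ts > ts:
        # A younger transaction already read the item: abort and roll back.
        raise RollbackError(f"write rejected: R_TS={item.r_ts} > TS={ts}")
    if item.w_ts > ts:
        # Obsolete write: a younger transaction already wrote the item,
        # so this write is simply ignored instead of aborting the transaction.
        return
    item.value = value
    item.w_ts = ts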
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be
locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy
can be represented graphically as a tree.
For example: Consider a tree which has four levels of nodes.
o Database
o Area
o File
o Record
In this example, the highest level shows the entire database. The levels below it are of type
area, file, and record.
Intention-Shared (IS): It contains explicit locking at a lower level of the tree, but with shared
locks only.
Intention-Exclusive (IX): It contains explicit locking at a lower level with exclusive or shared
locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared mode, and some
descendant node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The table below describes the
compatibility matrix for these lock modes (yes = compatible, no = incompatible):

        IS    IX    S     SIX   X
IS      yes   yes   yes   yes   no
IX      yes   yes   no    no    no
S       yes   no    yes   no    no
SIX     yes   no    no    no    no
X       no    no    no    no    no
It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts
to lock a node, then that node must follow these protocols:
o If transaction T1 reads record Ra9 in file Fa, then transaction T1 needs to lock the
database, area A1 and file Fa in IS mode. Finally, it needs to lock Ra9 in S mode.
o If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the
database, area A1 and file Fa in IX mode. Finally, it needs to lock Ra9 in X mode.
o If transaction T3 reads all the records in file Fa, then transaction T3 needs to lock the
database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
o If transaction T4 reads the entire database, then T4 needs to lock the database in S mode.
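A minimal Python sketch of these rules is given below: to lock a node, a transaction first takes intention locks (IS or IX) on all of its ancestors and then the actual S or X lock on the node itself. The sketch omits the compatibility check; the hierarchy path and lock table are illustrative assumptions.

# Minimal multiple-granularity locking sketch (illustrative only).
locks = {}   # node -> set of (transaction, mode) pairs currently held

def lock_path(txn, path, mode):
    """Lock every ancestor in path[:-1] in intention mode, then the target node."""
    intention = "IS" if mode == "S" else "IX"
    for node in path[:-1]:
        locks.setdefault(node, set()).add((txn, intention))
    locks.setdefault(path[-1], set()).add((txn, mode))

# T2 modifies record Ra9 in file Fa: IX on database, area A1 and file Fa, X on Ra9.
lock_path("T2", ["database", "A1", "Fa", "Ra9"], "X")
# T3 reads all records in file Fa: IS on database and area A1, S on Fa.
lock_path("T3", ["database", "A1", "Fa"], "S")
print(locks)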
Deadlock
A deadlock occurs when two or more transactions wait indefinitely for one another to release
the locks they hold.
For example: In the Student table, transaction T1 holds a lock on some rows and needs to update
some rows in the Grade table. Simultaneously, transaction T2 holds locks on some rows in the
Grade table and needs to update the rows in the Student table held by transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and similarly,
transaction T2 is waiting for T1 to release its lock. All activities come to a halt and remain at a
standstill until the DBMS detects the deadlock and aborts one of the transactions.
Below is a list of conditions necessary for a deadlock to occur:
o Circular Waiting: Two or more transactions wait indefinitely for one another to release
the locks they hold.
o Partial Allocation: A transaction acquires some of the required data items but not all of
them, as the rest may be exclusively locked by others.
o Non-Preemptive Scheduling: A data item can be released only voluntarily by the
transaction holding it; it cannot be forcibly taken away.
o Mutual Exclusion: A data item can be locked exclusively by only one transaction at a time.
To avoid a deadlock, at least one of the above-mentioned necessary conditions should not occur.
Deadlock Avoidance
o When a database may get stuck in a deadlock state, it is better to avoid the deadlock than
to abort or restart transactions afterwards, since that is a waste of time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like the "wait-for graph" is used for detecting deadlock situations, but this method
is suitable only for smaller databases. For larger databases, the deadlock prevention
method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a
wait-for graph to detect deadlock cycles in the database.
o This is a suitable method for deadlock detection. In this method, a graph is created
based on the transactions and their locks. If the created graph has a cycle or closed loop,
then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction that is waiting for
some data held by another. The system keeps checking whether there is any cycle in the
graph.
The wait-for graph for the above scenario (T1 waiting for T2 and T2 waiting for T1) is shown below:
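Programmatically, detecting such a cycle can be sketched as follows; the dictionary representation of the wait-for graph is an assumption made for illustration.

# Minimal wait-for graph deadlock detection sketch (illustrative only).
def find_deadlock(wait_for):
    """Return a cycle (list of transactions) if one exists, otherwise None."""
    def dfs(node, path, visited):
        visited.add(node)
        path.append(node)
        for nxt in wait_for.get(node, ()):
            if nxt in path:                     # back edge -> cycle -> deadlock
                return path[path.index(nxt):]
            if nxt not in visited:
                cycle = dfs(nxt, path, visited)
                if cycle:
                    return cycle
        path.pop()
        return None

    visited = set()
    for node in wait_for:
        if node not in visited:
            cycle = dfs(node, [], visited)
            if cycle:
                return cycle
    return None

# T1 waits for T2 and T2 waits for T1, as in the Student/Grade example above.
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2']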
Deadlock Prevention
o The deadlock prevention method is suitable for a large database. If the resources are
allocated in such a way that a deadlock never occurs, then the deadlock can be prevented.
o The database management system analyzes the operations of a transaction to check
whether they can create a deadlock situation or not. If they can, then the DBMS never
allows that transaction to be executed.
Each transaction has a unique identifier called a timestamp. It is usually based on the starting
time of the transaction and is assigned once the transaction starts. For example, if transaction T1
starts before transaction T2, then the timestamp of T1 will be less than the timestamp of T2. The
timestamp decides whether a transaction should wait or abort and roll back. Aborted transactions
retain their timestamp values and hence their seniority.
The following deadlock prevention schemes using timestamps have been proposed.
o Wait-Die scheme
o Wound-Wait scheme
The significant disadvantage of both of these techniques is that some transactions are aborted and
restarted unnecessarily even though those transactions never actually cause a deadlock.
Wait-Die scheme
In this scheme, if a transaction requests a resource that is already held with a conflicting lock by
another transaction, then the DBMS simply checks the timestamps of both transactions. It allows
the older transaction to wait until the resource is available for execution.
Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any
transaction T. If Tj holds a lock on some resource and Ti is requesting that resource, then the
following actions are performed by the DBMS:
1. Check if TS(Ti) < TS(Tj): if Ti is the older transaction and Tj has held the resource, then
Ti is allowed to wait until the data item is available. That means if an older transaction is
waiting for a resource locked by a younger transaction, then the older transaction is
allowed to wait until the resource becomes available.
2. Check if TS(Ti) < TS(Tj): if Ti is the older transaction and holds the resource, and the
younger transaction Tj is requesting it, then Tj is killed (it "dies") and restarted later with
a random delay but with the same timestamp.
Example: Suppose transactions T1, T2 and T3 have the following timestamps:
Transaction   Timestamp
T1            5
T2            7
T3            9
If T1 requests a data item locked by transaction T2, then T1 has to wait until T2 completes and
all locks acquired by it are released, because t(T1) < t(T2). On the other hand, if transaction T3
requests a data item locked by transaction T2, then T3 has to abort and roll back, i.e. it dies,
because t(T3) > t(T2).
Wound-Wait scheme
o In the Wound-Wait scheme, if the older transaction requests a resource that is held by a
younger transaction, then the older transaction forces the younger one to abort (it
"wounds" it) and release the resource. After a small delay, the younger transaction is
restarted but with the same timestamp.
o If the older transaction holds a resource that is requested by the younger transaction, then
the younger transaction is asked to wait until the older one releases it.
o It is based on a preemptive technique.
Example: Using the same timestamps (T1 = 5, T2 = 7, T3 = 9): if T1 requests a data item locked
by transaction T2, then T2 is aborted (wounded) and T1 gets the resource, because t(T1) < t(T2).
On the other hand, if transaction T3 requests a data item locked by transaction T2, then T3 has to
wait, because t(T3) > t(T2).
Following are the differences between the Wait-Die scheme and the Wound-Wait scheme:
o In Wait-Die, an older transaction that requests a lock held by a younger transaction waits;
in Wound-Wait, it wounds (aborts) the younger transaction instead.
o In Wait-Die, a younger transaction that requests a lock held by an older transaction dies
(is rolled back); in Wound-Wait, it is allowed to wait.
o Wait-Die is a non-preemptive technique, whereas Wound-Wait is preemptive.
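The two decisions can be summarised in a short Python sketch; the function names and the convention that a smaller timestamp means an older transaction are assumptions made for illustration.

# Minimal Wait-Die vs Wound-Wait decision sketch (illustrative only).
def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (aborts) the younger holder; younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

# Using the timestamps from the tables above: T1 = 5, T2 = 7, T3 = 9.
print(wait_die(5, 7))      # T1 requests an item held by T2 -> 'wait'
print(wait_die(9, 7))      # T3 requests an item held by T2 -> 'die'
print(wound_wait(5, 7))    # T1 requests an item held by T2 -> 'wound holder'
print(wound_wait(9, 7))    # T3 requests an item held by T2 -> 'wait'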
In the deadlock detection scheme, the deadlock detection algorithm periodically checks the
state of the system to see whether a deadlock has occurred or not; if a deadlock exists, the
system tries to recover from it.
In order to detect a deadlock the system must have the following information:
o The system should have information about the concurrent group of transactions.
o Information about the current allocation of data items for each transaction.
o It must have information about the current set of data items that each transaction
is waiting for.
The system must provide an algorithm that uses this information, i.e. the information about
the current allocation of data items, to examine whether the system has entered a deadlock
state or not. If a deadlock exists, then the system attempts to recover from it.
Recovery from Deadlock
If the wait-for graph used for deadlock detection indicates a deadlock situation, i.e. there
exist cycles in it, then those cycles should be removed to recover from the deadlock. The
most widely used technique for recovering from a deadlock is to roll back one or more
transactions until the system no longer displays a deadlock condition.
Selection of victim: There may be many transactions involved in a deadlock, i.e. deadlocked
transactions. To recover from the deadlock, some of the transactions causing the deadlock
should be rolled back. The transaction that is rolled back is known as the victim transaction,
and the mechanism is known as victim selection.
The transactions to be rolled back are the ones that have just started or have not made many
changes. Avoid selecting transactions that have made many updates and have been running
for a long time.
Rollback: Once the transaction to be rolled back is selected, we should determine how far it
should be rolled back. One of the simplest solutions is a total rollback, i.e. abort the
transaction and restart it. However, it is more effective to roll back the transaction only as
far as necessary to break the deadlock; this requires that additional information about the
state of the currently executing transactions be maintained.
Starvation: To recover from a deadlock, we must ensure that the same transaction is not
selected again and again as the victim to roll back. Such a transaction would never complete
if this situation is not avoided. To avoid starvation, a transaction should be picked as a
victim only a finite number of times.
A widely used solution is to include the number of rollbacks in the cost factor of the
transaction that is selected as the victim.