IV Sem CSE DBMS Module 4 (Transaction Processing)
Module 4
Objectives: Transaction Processing: Introduction to Transaction Processing,
Transaction and System concepts, Desirable properties of Transactions, Characterizing
schedules based on recoverability, Characterizing schedules based on Serializability,
Transaction support in SQL.
References:
1. Fundamentals of Database Systems, Ramez Elmasri and Shamkant B.
Navathe, 7th Edition, 2017, Pearson.
2. Database management systems, Ramakrishnan, and Gehrke, 3rd
Edition, 2014, McGraw Hill.
Chapter-20
Transaction Processing
A transaction is an atomic unit of work that should either be completed in its entirety
or not done at all. For recovery purposes, the system needs to keep track of when each
transaction starts, terminates, and commits, or aborts. Therefore, the recovery manager
of the DBMS needs to keep track of the following operations: BEGIN_TRANSACTION,
READ or WRITE, END_TRANSACTION, COMMIT_TRANSACTION, and ROLLBACK (or ABORT).
Fig: State transition diagram illustrating the states for transaction execution
The above figure shows a state transition diagram that illustrates how a transaction
moves through its execution states.
A transaction goes into an active state immediately after it starts execution, where it
can execute its READ and WRITE operations.
When the transaction ends, it moves to the partially committed state. At this
point, some types of concurrency control protocols may do additional checks
to see if the transaction can be committed or not.
Also, some recovery protocols need to ensure that a system failure will not
result in an inability to record the changes of the transaction permanently.
If these checks are successful, the transaction is said to have reached its
commit point and enters the committed state.
When a transaction is committed, it has concluded its execution successfully
and all its changes must be recorded permanently in the database, even if a
system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the
transaction is aborted during its active state. The transaction may then have to be
rolled back to undo the effect of its WRITE operations on the database. The
terminated state corresponds to the transaction leaving the system. The transaction
information that is maintained in system tables while the transaction has been running
is removed when the transaction terminates. Failed or aborted transactions may be
restarted later—either automatically or after being resubmitted by the user—as brand
new transactions.
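The state transitions described above can be sketched as a small lookup table. This is
only an illustrative sketch: the event names (read_write, end_transaction, commit, abort,
terminate) and the function run are hypothetical labels chosen here, not part of any DBMS.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Allowed transitions, keyed by (current state, event).
# The event names are assumptions made for this sketch.
TRANSITIONS = {
    (State.ACTIVE, "read_write"): State.ACTIVE,
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "abort"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "abort"): State.FAILED,
    (State.COMMITTED, "terminate"): State.TERMINATED,
    (State.FAILED, "terminate"): State.TERMINATED,
}


def run(events, state=State.ACTIVE):
    """Apply the events in order and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {state.value}")
        state = TRANSITIONS[key]
    return state


# A transaction that finishes normally, and one that is aborted while active.
ok = run(["read_write", "read_write", "end_transaction", "commit", "terminate"])
bad = run(["read_write", "abort"])
```

Any event that the diagram does not allow in the current state (for example, a commit
while the transaction is still active) raises an error in this sketch.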
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully and the effect of all the transaction
operations on the database has been recorded in the log.
The transaction then writes a commit record [commit, T] into the log. If a system
failure occurs, we can search back in the log for all transactions T that have written a
[start_transaction, T] record into the log but have not written their [commit, T]
record yet; these transactions may have to be rolled back to undo their effect on the
database during the recovery process.
Transactions that have written their commit record in the log must also have
recorded all their WRITE operations in the log, so their effect on the database can be
redone from the log records.
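The log scan described above can be sketched as follows. The tuple layout of the log
records, including the old and new values in the write_item records, and the function
name undo_redo_lists are assumptions made for this illustration.

```python
def undo_redo_lists(log):
    """Split the transactions in the log into (redo, undo) lists.

    redo: transactions with both a start_transaction and a commit record.
    undo: transactions with a start_transaction record but no commit record.
    """
    started, committed = [], set()
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "start_transaction":
            started.append(txn)
        elif kind == "commit":
            committed.add(txn)
    redo = [t for t in started if t in committed]
    undo = [t for t in started if t not in committed]
    return redo, undo


# Hypothetical log contents at the time of a crash.
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 5, 8),   # assumed layout: item, old value, new value
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 1, 2),
    ("commit", "T1"),
]
redo, undo = undo_redo_lists(log)
```

Here T1 has a commit record, so its writes can be redone from the log, while T2 has
none and may have to be rolled back.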
The log file must be kept on disk. Updating a disk file involves copying the
appropriate block of the file from disk to a buffer in main memory, updating the buffer
in main memory, and copying the buffer to disk.
At the time of a system crash, only the log entries that have been written back to disk
are considered in the recovery process if the contents of main memory are lost. Hence,
before a transaction reaches its commit point, any portion of the log that has not been
written to the disk yet must now be written to the disk. This process is called
force-writing the log buffer to disk before committing a transaction.
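A minimal sketch of force-writing, assuming a plain text file as the log and one record
per line. The class Log and its methods are hypothetical names; os.fsync is used here to
ask the operating system to push the written data to the storage device.

```python
import os
import tempfile


class Log:
    def __init__(self, path):
        self.path = path
        self.buffer = []  # log records that exist only in main memory so far

    def append(self, record):
        self.buffer.append(record)

    def force_write(self):
        """Write every buffered record to the log file and flush it to disk."""
        with open(self.path, "a", encoding="utf-8") as f:
            for record in self.buffer:
                f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.buffer.clear()

    def commit(self, txn):
        # In this sketch the transaction counts as committed only after
        # force_write returns, so nothing it logged can be lost in a crash.
        self.append(f"[commit, {txn}]")
        self.force_write()


path = os.path.join(tempfile.mkdtemp(), "demo.log")
log = Log(path)
log.append("[start_transaction, T1]")
log.append("[write_item, T1, X, 5, 8]")
log.commit("T1")
```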
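When transactions are executing concurrently in an interleaved fashion, the order
of execution of operations from all the various transactions is known as a schedule (or
history).
In a shorthand notation, the symbols r, w, c, and a stand for the operations read_item,
write_item, commit, and abort, respectively, and the transaction number is appended to
each operation. For example, the schedule that we shall call Sa can be written as
follows in this notation: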
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Similarly, the schedule that we call Sb can be written as follows, if we
assume that transaction T1 aborted after its read_item(Y) operation:
Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;
Conflicting Operations in a Schedule:
Two operations in a schedule are said to conflict if they satisfy all three of the
following conditions:
(1) they belong to different transactions;
(2) they access the same item X; and
(3) at least one of the operations is a write_item(X).
Dept. of CSE(Data Science), VCET, [email protected] Page 6
Database Management System (BCS403)
Intuitively, two operations are conflicting if changing their order can result in a
different outcome. For example, if we change the order of the two operations r1(X);
w2(X) to w2(X); r1(X), then the value of X that is read by transaction T1 changes,
because in the second ordering the value of X is read by r1(X) after it is changed by
w2(X), whereas in the first ordering the value is read before it is changed. This is
called a read-write conflict.
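The three conditions for a conflict can be checked mechanically. The sketch below parses
the r/w/c/a notation and lists every conflicting pair in a schedule; the helper names
parse and conflicts are chosen for this illustration.

```python
import re


def parse(schedule):
    """Turn 'r1(X); w2(X); c1;' into [('r', 1, 'X'), ('w', 2, 'X'), ('c', 1, None)]."""
    ops = []
    for kind, txn, item in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule):
        ops.append((kind, int(txn), item or None))
    return ops


def conflicts(schedule):
    """Return all conflicting pairs of operations, in schedule order."""
    ops = [op for op in parse(schedule) if op[0] in ("r", "w")]
    pairs = []
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # (1) different transactions, (2) same item, (3) at least one write
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                pairs.append((f"{k1}{t1}({x1})", f"{k2}{t2}({x2})"))
    return pairs


sa_conflicts = conflicts("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
```

For schedule Sa this yields the pairs r1(X) and w2(X), r2(X) and w1(X), and w1(X) and
w2(X). The two reads r1(X) and r2(X) do not conflict, since neither is a write.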
A schedule S is recoverable if no transaction T in S commits until all transactions T′
that have written some item X that T reads have committed. A recovery algorithm can be
devised for any recoverable schedule. The (partial) schedules Sa and Sb from the
preceding section are both recoverable. Consider the schedule Sa′ given below, which is
the same as schedule Sa except that two commit operations have been added to Sa:
Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
Sa′ is recoverable, even though it suffers from the lost update problem; this problem is
handled by serializability theory.
Sc is not recoverable because T2 reads item X from T1, but T2 commits before T1
commits. The problem occurs if T1 aborts after the c2 operation in Sc; then the value
of X that T2 read is no longer valid and T2 must be aborted after it is committed,
leading to a schedule that is not recoverable. For the schedule to be recoverable, the c2
operation in Sc must be postponed until after T1 commits, as shown in Sd. If T1 aborts
instead of committing, then T2 should also abort as shown in Se, because the value of
X it read is no longer valid. In Se, aborting T2 is acceptable since it has not committed
yet, which is not the case for the non-recoverable schedule Sc.
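The recoverability condition can be sketched as a single pass over a schedule. The
schedules sc, sd and se below are assumptions written to match the description of Sc, Sd
and Se in the text, since the schedules themselves are not reproduced in these notes.
The function names are also chosen only for this illustration.

```python
import re


def parse(schedule):
    """Return (kind, transaction, item) triples; item is '' for c and a."""
    return re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)


def is_recoverable(schedule):
    writers = {}      # item -> transactions that wrote it, oldest first
    reads_from = {}   # transaction -> set of transactions it has read from
    committed = set()
    for kind, txn, item in parse(schedule):
        if kind == "w":
            writers.setdefault(item, []).append(txn)
        elif kind == "r":
            ws = writers.get(item, [])
            if ws and ws[-1] != txn:
                reads_from.setdefault(txn, set()).add(ws[-1])
        elif kind == "c":
            # Every transaction that txn read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
        elif kind == "a":
            # The writes of an aborted transaction are undone.
            for key in writers:
                writers[key] = [t for t in writers[key] if t != txn]
    return True


sa_prime = "r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;"
sc = "r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;"           # assumed form of Sc
sd = "r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;"    # assumed form of Sd
se = "r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;"    # assumed form of Se
results = [is_recoverable(s) for s in (sa_prime, sc, sd, se)]
```

Under these assumptions only sc fails the check, because T2 commits after reading X from
the still uncommitted T1.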
Cascadeless schedule: a schedule is cascadeless (it avoids cascading rollback) if every
transaction in the schedule reads only items that were written by committed transactions.
Strict schedule: a schedule in which transactions can neither read nor write an item X
until the last transaction that wrote X has committed (or aborted). For example, consider
schedule Sf:
Note: Any strict schedule is also cascadeless, and any cascadeless schedule is also
recoverable.
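The two definitions can be checked with one pass over a schedule. This is a simplified
sketch: it does not restore the previous writer of an item when a transaction aborts,
and the three example schedules are hypothetical, not taken from the figures.

```python
import re


def cascadeless_and_strict(schedule):
    """Return (is_cascadeless, is_strict) for a schedule in r/w/c/a notation."""
    last_writer = {}   # item -> transaction that wrote it most recently
    finished = set()   # transactions that have committed or aborted
    cascadeless = strict = True
    for kind, txn, item in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule):
        if kind in ("c", "a"):
            finished.add(txn)
            continue
        writer = last_writer.get(item)
        dirty = writer is not None and writer != txn and writer not in finished
        if dirty:
            strict = False           # read or write of an uncommitted item
            if kind == "r":
                cascadeless = False  # read of an uncommitted item
        if kind == "w":
            last_writer[item] = txn
    return cascadeless, strict


dirty_write = cascadeless_and_strict("w1(X); w2(X); a1;")
dirty_read = cascadeless_and_strict("w1(X); r2(X); c1; c2;")
clean = cascadeless_and_strict("w1(X); c1; r2(X); w2(X); c2;")
```

The first schedule is cascadeless but not strict, because T2 overwrites X before T1 has
finished. The second is neither, and the third is both.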
Schedules that are always considered to be correct when concurrent transactions
are executing are known as serializable schedules.
If no interleaving of operations is permitted there are only two possible arrangements
for executing transactions T1 and T2:
Execute (in sequence) all the operations of transaction T1, followed by all the
operations of transaction T2.
Execute (in sequence) all the operations of transaction T2, followed by all the
operations of transaction T1.
If interleaving of operations is allowed there will be many possible schedules.
The concept of serializability of schedules is used to identify which schedules are
correct.
Fig: Example of serial and non serial schedules involving transactions T1 and T2.
A schedule S is serial, if for every transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule; otherwise the schedule is
called non-serial.
In a serial schedule, only one transaction at a time is active—the commit (or abort) of
the active transaction initiates execution of the next transaction. No interleaving
occurs in a serial schedule.
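The definition of a serial schedule can be tested directly: once the schedule moves on to
another transaction, the earlier one must never appear again. The function name is_serial
and the first example schedule are chosen for this sketch.

```python
import re


def is_serial(schedule):
    """True if the operations of every transaction are consecutive."""
    finished = set()
    current = None
    for txn in re.findall(r"[rwca](\d+)", schedule):
        if txn != current:
            if txn in finished:
                return False  # an earlier transaction resumes: interleaving
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = is_serial("r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;")
interleaved = is_serial("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
```

The second call uses schedule Sa, which interleaves T1 and T2 and is therefore non-serial.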
The drawback of serial schedules is that they limit concurrency by prohibiting
interleaving of operations.
Two schedules are called result equivalent if they produce the same final state of the
database.
Fig: Two schedules that are result equivalent for the initial value X=100, but are
not result equivalent in general.
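The figure is not reproduced in these notes, so the two single-transaction schedules
below are an assumed illustration of the idea: one adds 10 to X and the other increases
X by 10 percent. They agree for X = 100 but not for other initial values.

```python
def schedule_1(x):
    """read_item(X); X := X + 10; write_item(X)"""
    return x + 10


def schedule_2(x):
    """read_item(X); X := X * 1.1; write_item(X), computed as X * 11 / 10"""
    return x * 11 / 10


same_for_100 = schedule_1(100) == schedule_2(100)
same_for_200 = schedule_1(200) == schedule_2(200)
```

Because the agreement depends on the initial value, result equivalence alone is not a
safe way to define the equivalence of schedules.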
The algorithm for testing conflict serializability looks at only the read_item and
write_item operations in a schedule to construct a precedence graph (or serialization
graph), which is a directed graph G = (N, E) that consists of a set of nodes
N = {T1, T2, ..., Tn} and a set of directed edges E = {e1, e2, ..., em}. There is one
node in the graph for each transaction Ti in the schedule. Each edge ei in the graph is
of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei and Tk
is the ending node of ei. Such an edge from node Tj to node Tk is created by the
algorithm if one of the operations in Tj appears in the schedule before some conflicting
operation in Tk.
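Example: consider the following schedule involving transactions T1 to T5:
w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)
Step 1. We start with an empty graph with five vertices labeled T1, T2, T3, T4, T5.
Steps 2 and 3. We go through the operations in order and add an edge for each conflict:
w1(A): A is subsequently read by T2, so add edge T1 → T2
r2(A): no subsequent writes to A, so no new edges
w1(B): B is subsequently read by T4, so add edge T1 → T4
w3(C): C is subsequently read by T2, so add edge T3 → T2
r2(C): no subsequent writes to C, so no new edges
r4(B): no subsequent writes to B, so no new edges
w2(D): D is subsequently read by T5, so add edge T2 → T5
w4(E): E is subsequently written by T5, so add edge T4 → T5
r5(D) and w5(E): no subsequent operations on D or E, so no new edges
Step 5. This graph has no cycles, so the original schedule must be conflict serializable.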
Moreover, since one way to topologically sort the graph is T3–T1–T4–T2–T5, one
serial schedule that is conflict-equivalent is
w3(C), w1(A), w1(B), r4(B), w4(E), r2(A), r2(C), w2(D), r5(D), w5(E)
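The precedence graph test can be sketched in a few lines. The sketch assumes Python 3.9
or later for the standard graphlib module; the function names precedence_edges and
serial_order are chosen here. A topological order is generally not unique, so the order
returned may differ from T3, T1, T4, T2, T5 while still being a valid equivalent serial
order.

```python
import re
from graphlib import CycleError, TopologicalSorter


def parse(schedule):
    """Keep only the read and write operations as (kind, transaction, item)."""
    return [(k, int(t), x) for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def precedence_edges(schedule):
    """Edge (Tj, Tk) if an operation of Tj precedes a conflicting one of Tk."""
    ops = parse(schedule)
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def serial_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    predecessors = {t: set() for _, t, _ in parse(schedule)}
    for tj, tk in precedence_edges(schedule):
        predecessors[tk].add(tj)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


s = "w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)"
edges = precedence_edges(s)
order = serial_order(s)
sa_order = serial_order("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
```

For the example schedule the edges are T1 → T2, T1 → T4, T3 → T2, T2 → T5 and T4 → T5.
Schedule Sa from earlier produces both T1 → T2 and T2 → T1, which is a cycle, so the
sketch returns None for it.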
The approach taken in most commercial DBMSs is to design protocols that, if followed
by every individual transaction or if enforced by a DBMS concurrency control
subsystem, will ensure serializability of all schedules in which the transactions
participate.
Concurrency control protocols:
1. Two-phase locking
2. Timestamp ordering
3. Multiversion protocols
4. Optimistic protocols
Two-phase locking: Data items are locked to prevent concurrent transactions from
interfering with one another, which enforces serializability.
Timestamp ordering: Each transaction is assigned a unique timestamp, and the protocol
ensures that any conflicting operations are executed in the order of the transaction
timestamps.
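Two schedules S and S′ are said to be view equivalent if the following three
conditions hold:
1. The same set of transactions participates in S and S′, and S and S′ include the same
operations of those transactions.
2. For any operation ri(X) of Ti in S, if the value of X read by the operation has
been written by an operation wj(X) of Tj (or if it is the original value of X before
the schedule started), the same condition must hold for the value of X read by
operation ri(X) of Ti in S′.
3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y)
of Tk must also be the last operation to write item Y in S′.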
The definition of view serializability is less restrictive than that of conflict
serializability under the unconstrained write assumption, where the value written by an
operation wi(X) in Ti can be independent of its old value in the database.
Example: Consider the schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X);
and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read
the value of X. The schedule Sg is view serializable, since it is view equivalent to the
serial schedule T1, T2, T3. However, Sg is not conflict serializable, since it is not
conflict equivalent to any serial schedule.
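View equivalence can be checked by comparing three things for the two schedules: the
operations they contain, which write each read operation reads from, and which
transaction writes each item last. The sketch below does this for Sg; the function names
are chosen for this illustration, and None stands for the initial value of an item.

```python
import re
from collections import Counter


def view(schedule):
    """Return (operations, reads-from relation, final writer of each item)."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    last_writer = {}
    reads_from = Counter()
    for kind, txn, item in ops:
        if kind == "r":
            # None means the read sees the value from before the schedule.
            reads_from[(txn, item, last_writer.get(item))] += 1
        else:
            last_writer[item] = txn
    return Counter(ops), reads_from, last_writer


def view_equivalent(s1, s2):
    return view(s1) == view(s2)


sg = "r1(X); w2(X); w1(X); w3(X); c1; c2; c3;"
serial_t1_t2_t3 = "r1(X); w1(X); c1; w2(X); c2; w3(X); c3;"
serial_t2_t1_t3 = "w2(X); c2; r1(X); w1(X); c1; w3(X); c3;"
eq_123 = view_equivalent(sg, serial_t1_t2_t3)
eq_213 = view_equivalent(sg, serial_t2_t1_t3)
```

Sg matches the serial schedule T1, T2, T3 because r1(X) reads the initial value of X in
both and T3 writes X last in both. It does not match T2, T1, T3, where r1(X) would read
the value written by T2.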
Other Types of Equivalence of Schedules
Some applications can produce schedules that are correct by satisfying conditions less
stringent than either conflict serializability or view serializability.