
IV Sem CSE DBMS Module 4 (Transaction Processing)

The document covers the fundamentals of transaction processing in database management systems, focusing on transaction concepts, desirable properties (ACID), and characterizing schedules based on recoverability and serializability. It explains the states of transactions, the importance of logging for recovery, and the definitions of recoverable and non-recoverable schedules. Additionally, it discusses the isolation levels and the significance of serializable schedules in ensuring correct execution of concurrent transactions.

Uploaded by

Pra Nav

Database Management System (BCS403)

Module 4
Objectives: Transaction Processing: Introduction to Transaction Processing,
Transaction and System concepts, Desirable properties of Transactions, Characterizing
schedules based on recoverability, Characterizing schedules based on Serializability,
Transaction support in SQL.

References:
1. Fundamentals of Database Systems, Ramez Elmasri and Shamkant B.
Navathe, 7th Edition, 2017, Pearson.
2. Database management systems, Ramakrishnan, and Gehrke, 3rd
Edition, 2014, McGraw Hill.

Chapter-20
Transaction Processing

Transaction and System Concepts


Transaction states and Additional Operations

A transaction is an atomic unit of work that should either be completed in its entirety
or not done at all. For recovery purposes, the system needs to keep track of when each
transaction starts, terminates, and commits, or aborts. Therefore, the recovery manager
of the DBMS needs to keep track of the following operations:

Dept. of CSE(Data Science), VCET, [email protected] Page 1



BEGIN_TRANSACTION: This marks the beginning of transaction execution.


READ or WRITE: These specify read or write operations on the database items that
are executed as part of a transaction.
END_TRANSACTION: This specifies that READ and WRITE transaction
operations have ended and marks the end of transaction execution. However, at this
point it may be necessary to check whether the changes introduced by the transaction
can be permanently applied to the database (committed) or whether the transaction has
to be aborted because it violates serializability or for some other reason.
COMMIT_TRANSACTION: This signals a successful end of the transaction so that
any changes (updates) executed by the transaction can be safely committed to the
database and will not be undone.
ROLLBACK (or ABORT): This signals that the transaction has ended
unsuccessfully, so that any changes or effects that the transaction may have
applied to the database must be undone.

Fig: State transition diagram illustrating the states for transaction execution
The above figure shows a state transition diagram that illustrates how a transaction
moves through its execution states.

A transaction goes into an active state immediately after it starts execution, where it
can execute its READ and WRITE operations.

• When the transaction ends, it moves to the partially committed state. At this
point, some types of concurrency control protocols may do additional checks
to see if the transaction can be committed or not.
• Also, some recovery protocols need to ensure that a system failure will not
result in an inability to record the changes of the transaction permanently.
• If these checks are successful, the transaction is said to have reached its
commit point and enters the committed state.
• When a transaction is committed, it has concluded its execution successfully
and all its changes must be recorded permanently in the database, even if a
system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the
transaction is aborted during its active state. The transaction may then have to be
rolled back to undo the effect of its WRITE operations on the database. The
terminated state corresponds to the transaction leaving the system. The transaction
information that is maintained in system tables while the transaction has been running
is removed when the transaction terminates. Failed or aborted transactions may be
restarted later—either automatically or after being resubmitted by the user—as brand
new transactions.
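
The state transitions described above can be sketched as a small lookup table. This is only an illustration: the event names (`read_write`, `end_transaction`, `commit`, `abort`, `terminate`) are labels chosen here for the sketch, not part of any DBMS interface.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    TERMINATED = auto()


# Allowed transitions, keyed by (current state, event).
# Event names are hypothetical labels used only in this sketch.
TRANSITIONS = {
    (State.ACTIVE, "read_write"): State.ACTIVE,
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "abort"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "abort"): State.FAILED,
    (State.COMMITTED, "terminate"): State.TERMINATED,
    (State.FAILED, "terminate"): State.TERMINATED,
}


def run(events, state=State.ACTIVE):
    """Apply a sequence of events and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {state.name}")
        state = TRANSITIONS[key]
    return state


# A transaction that reads/writes, ends, passes its checks, and leaves the system.
print(run(["read_write", "end_transaction", "commit", "terminate"]).name)
# A transaction that is aborted while still active.
print(run(["read_write", "abort"]).name)
```

Any event that is not listed for the current state raises an error, which mirrors the fact that, for example, a transaction cannot commit before it has ended.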

The System Log

• To be able to recover from failures that affect transactions, the system
maintains a log to keep track of all transaction operations that affect the values
of database items, as well as other transaction information that may be needed
to permit recovery from failures.
• The log is a sequential, append-only file that is kept on disk, so it is not
affected by any type of failure except for disk or catastrophic failure.
• Typically, one or more main memory buffers, called the log buffers, hold the
last part of the log file, so that log entries are first added to the log main
memory buffer.
• When the log buffer is filled, or when certain other conditions occur, the log
buffer is appended to the end of the log file on disk. In addition, the log file
from disk is periodically backed up to archival storage to guard against
catastrophic failures.
• The following are the types of entries, called log records, that are written to
the log file and the corresponding action for each log record. In these entries, T
refers to a unique transaction-id that is generated automatically by the
system for each transaction and that is used to identify each transaction:

[start_transaction, T]: Indicates that transaction T has started execution.
[write_item, T, X, old_value, new_value]: Indicates that transaction T has changed
the value of database item X from old_value to new_value.
[read_item, T, X]: Indicates that transaction T has read the value of database item X.
[commit, T]: Indicates that transaction T has completed successfully, and affirms that
its effect can be committed (recorded permanently) to the database.
[abort, T]: Indicates that transaction T has been aborted.

Commit Point of a Transaction

A transaction T reaches its commit point when all its operations that access the
database have been executed successfully and the effect of all the transaction
operations on the database has been recorded in the log.
The transaction then writes a commit record [commit, T] into the log. If a system
failure occurs, we can search back in the log for all transactions T that have written a
[start_transaction, T] record into the log but have not written their [commit, T]
record yet; these transactions may have to be rolled back to undo their effect on the
database during the recovery process.
Transactions that have written their commit record in the log must also have
recorded all their WRITE operations in the log, so their effect on the database can be
redone from the log records.

The log file must be kept on disk. Updating a disk file involves copying the
appropriate block of the file from disk to a buffer in main memory, updating the buffer
in main memory, and copying the buffer to disk.
At the time of a system crash, only the log entries that have been written back to disk
are considered in the recovery process if the contents of main memory are lost. Hence,
before a transaction reaches its commit point, any portion of the log that has not been
written to the disk yet must now be written to the disk. This process is called force-
writing the log buffer to disk before committing a transaction.
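
To make the role of the log records concrete, here is a minimal sketch of how a recovery pass could use them. It is a simplification under stated assumptions, not the recovery algorithm of any particular DBMS: log records are modeled as Python tuples, the database as a dictionary, and the function name `recover` and the sample values are made up for the illustration.

```python
def recover(log, db):
    """Return the database state after a simplified undo/redo pass over the log."""
    db = dict(db)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo: scan the log backwards and restore the old values written by
    # transactions that have no [commit, T] record.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _txn, item, old_value, _new_value = rec
            db[item] = old_value

    # Redo: scan the log forwards and reapply the new values written by
    # committed transactions.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _txn, item, _old_value, new_value = rec
            db[item] = new_value
    return db


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 10, 20),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 5, 7),
    ("commit", "T1"),
    # The crash happens here: T2 never wrote its commit record.
]
on_disk = {"X": 10, "Y": 7}  # hypothetical disk contents at the time of the crash
print(recover(log, on_disk))  # {'X': 20, 'Y': 5}
```

T1 has a commit record, so its write is redone from the log; T2 has only a start record, so its write is undone using the old value stored in the write_item record.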

Desirable Properties of Transaction:

Transactions should possess several properties, often called the ACID
properties; they should be enforced by the concurrency control and recovery methods
of the DBMS. The following are the ACID properties:

Atomicity: A transaction is an atomic unit of processing; it should either be performed
in its entirety or not performed at all.
• It requires that we execute a transaction to completion. It is the responsibility
of the transaction recovery subsystem of a DBMS to ensure atomicity.
• If a transaction fails to complete for some reason, such as a system crash in the
midst of transaction execution, the recovery technique must undo any effects
of the transaction on the database.
• On the other hand, write operations of a committed transaction must be
eventually written to disk.
Consistency preservation: A transaction should be consistency preserving, meaning
that if it is completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another.
• It is generally considered to be the responsibility of the programmers who
write the database programs and of the DBMS module that enforces integrity
constraints.
• A database program should be written in a way that guarantees that, if the
database is in a consistent state before executing the transaction, it will be in a
consistent state after the complete execution of the transaction, assuming that
no interference with other transactions occurs.
Isolation: A transaction should appear as though it is being executed in isolation from
other transactions, even though many transactions are executing concurrently. That is,
the execution of a transaction should not be interfered with by any other transactions
executing concurrently.
• It is enforced by the concurrency control subsystem of the DBMS.
• If every transaction does not make its updates (write operations) visible to
other transactions until it is committed, one form of isolation is enforced that
solves the temporary update problem and eliminates cascading rollbacks but
does not eliminate all other problems.
Durability or permanency: The changes applied to the database by a committed
transaction must persist in the database. These changes must not be lost because of
any failure.
• It is the responsibility of the recovery subsystem of the DBMS.
Levels of Isolation: There have been attempts to define the level of isolation of a
transaction.
• A transaction is said to have level 0 (zero) isolation if it does not overwrite the
dirty reads of higher-level transactions.
• Level 1 isolation has no lost updates.
• Level 2 isolation has no lost updates and no dirty reads.
• Level 3 isolation (also called true isolation) has, in addition to the level 2
properties, repeatable reads.
• Another type of isolation is called snapshot isolation, and several practical
concurrency control methods are based on this.

Characterizing Schedules Based on Recoverability

When transactions are executing concurrently in an interleaved fashion, then the order
of execution of operations from all the various transactions is known as a schedule (or
history).

Schedules of Transactions: A schedule (or history) S of n transactions T1, T2, ..., Tn
is an ordering of the operations of the transactions. Operations from different
transactions can be interleaved in the schedule S. However, for each transaction Ti
that participates in S, the operations of Ti in S must appear in the same order in which
they occur in Ti. The order of operations in S is considered to be a total ordering,
meaning that for any two operations in the schedule, one must occur before the
other.

For the purpose of recovery and concurrency control, we are mainly interested in the
read_item and write_item operations of the transactions, as well as the commit and
abort operations. A shorthand notation for describing a schedule uses the
symbols b, r, w, e, c, and a for the
operations begin_transaction, read_item, write_item, end_transaction, commit, and
abort, respectively, and appends as a subscript the transaction id (transaction number)
to each operation in the schedule. In this notation, the database item X that is read or
written follows the r and w operations in parentheses.

For example, the schedule that we shall call Sa can be written as follows in this
notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);

Similarly, the schedule that we call Sb can be written as follows, if we
assume that transaction T1 aborted after its read_item(Y) operation:
Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;

Conflicting Operations in a Schedule:
Two operations in a schedule are said to conflict if they satisfy all three of the
following conditions:
(1) they belong to different transactions;
(2) they access the same item X; and
(3) at least one of the operations is a write_item(X).
Intuitively, two operations are conflicting if changing their order can result in a
different outcome. For example, if we change the order of the two operations r1(X);
w2(X) to w2(X); r1(X), then the value of X that is read by transaction T1 changes,
because in the second ordering the value of X is read by r1(X) after it is changed by
w2(X), whereas in the first ordering the value is read before it is changed. This is
called a read-write conflict.
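
The three conditions for a conflict translate directly into a short check. The sketch below parses the shorthand notation into tuples and lists every conflicting pair of operations; the helper names `parse`, `conflicts`, and `conflicting_pairs` are chosen here for illustration only.

```python
import re


def parse(schedule):
    """Turn 'r1(X); w2(X); c1;' into [('r', 1, 'X'), ('w', 2, 'X'), ('c', 1, None)]."""
    ops = []
    for m in re.finditer(r"([brweca])(\d+)(?:\((\w+)\))?", schedule):
        ops.append((m.group(1), int(m.group(2)), m.group(3)))
    return ops


def conflicts(op1, op2):
    """Check the three conditions: different transactions, same item, at least one write."""
    (k1, t1, x1), (k2, t2, x2) = op1, op2
    both_access_items = k1 in ("r", "w") and k2 in ("r", "w")
    return both_access_items and t1 != t2 and x1 == x2 and "w" in (k1, k2)


def conflicting_pairs(schedule):
    """Return every pair (earlier, later) of conflicting operations in the schedule."""
    ops = parse(schedule)
    return [(a, b) for i, a in enumerate(ops) for b in ops[i + 1:] if conflicts(a, b)]


for earlier, later in conflicting_pairs("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);"):
    print(earlier, "conflicts with", later)
```

For schedule Sa this reports three pairs: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X). The two reads r1(X) and r2(X) do not conflict because neither is a write.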

Characterizing Schedules Based on Recoverability


For some schedules it is easy to recover from transaction and system failures, whereas
for other schedules the recovery process can be quite involved. In some cases, it is not
even possible to recover correctly after a failure. Hence, it is important to characterize
the types of schedules for which recovery is possible, as well as those for which
recovery is relatively simple.

First, we would like to ensure that, once a transaction T is committed, it should never
be necessary to roll back T. The schedules that theoretically meet this criterion are
called recoverable schedules. A schedule where a committed transaction may have to be
rolled back during recovery is called nonrecoverable and hence should not be permitted
by the DBMS.

A recovery algorithm can be devised for any recoverable schedule. The (partial)
schedules Sa and Sb from the preceding section are both recoverable. Consider the
schedule Sa′ given below, which is the same as schedule Sa except that two commit
operations have been added to Sa:
Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

Sa′ is recoverable, even though it suffers from the lost update problem; this problem is
handled by serializability theory.

Consider the (partial) schedules Sc, Sd, and Se that follow:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;


Sc is not recoverable because T2 reads item X from T1, but T2 commits before T1
commits. The problem occurs if T1 aborts after the c2 operation in Sc; then the value
of X that T2 read is no longer valid and T2 must be aborted after it is committed,
leading to a schedule that is not recoverable. For the schedule to be recoverable, the c2
operation in Sc must be postponed until after T1 commits, as shown in Sd. If T1 aborts
instead of committing, then T2 should also abort as shown in Se, because the value of
X it read is no longer valid. In Se, aborting T2 is acceptable since it has not committed
yet, which is not the case for the non-recoverable schedule Sc.

Cascadeless schedules: A schedule is said to be cascadeless (it avoids cascading
rollback) if every transaction in the schedule reads only items that were written by
committed transactions.

Strict schedules: In a strict schedule, transactions can neither read nor write an item X
until the last transaction that wrote X has committed (or aborted). For example, consider
schedule Sf:

Sf: w1(X, 5); w2(X, 8); a1;

Sf is cascadeless but not strict, because T2 writes X before T1, the last transaction that
wrote X, has committed or aborted.

Note: Any strict schedule is also cascadeless, and any cascadeless schedule is also
recoverable.
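
The definitions of recoverable, cascadeless, and strict schedules can be checked mechanically. The following is a minimal sketch under simplifying assumptions: it only looks at the most recent writer of each item, it does not model how an abort restores old values, and the function names `parse` and `classify` are invented for this example. It returns the strongest class that the schedule satisfies.

```python
import re


def parse(schedule):
    """Parse 'r1(X); w1(X, 5); c1;' into (kind, txn, item) tuples."""
    pattern = r"([rwca])(\d+)(?:\((\w+)[^)]*\))?"
    return [(k, int(t), x) for k, t, x in re.findall(pattern, schedule)]


def classify(schedule):
    ops = parse(schedule)
    commit_at = {t: i for i, (k, t, _) in enumerate(ops) if k == "c"}
    end_at = {t: i for i, (k, t, _) in enumerate(ops) if k in ("c", "a")}

    strict = cascadeless = recoverable = True
    last_writer = {}   # item -> transaction that wrote it most recently
    dirty_reads = []   # (reader, writer) pairs where the writer had not ended yet

    for i, (kind, txn, item) in enumerate(ops):
        if kind not in ("r", "w"):
            continue
        writer = last_writer.get(item)
        # Has the last writer of this item neither committed nor aborted yet?
        if writer is not None and writer != txn and end_at.get(writer, len(ops)) > i:
            strict = False
            if kind == "r":
                cascadeless = False
                dirty_reads.append((txn, writer))
        if kind == "w":
            last_writer[item] = txn

    # Recoverable: a committed reader must commit after the writer it read from.
    for reader, writer in dirty_reads:
        if reader in commit_at and commit_at.get(writer, len(ops)) > commit_at[reader]:
            recoverable = False

    if strict:
        return "strict"
    if cascadeless:
        return "cascadeless"
    return "recoverable" if recoverable else "nonrecoverable"


print(classify("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;"))         # Sc
print(classify("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;"))  # Sd
print(classify("w1(X, 5); w2(X, 8); a1;"))                            # Sf
```

Under these assumptions the sketch reports Sc as nonrecoverable, Sd as recoverable (but not cascadeless, since T2 reads X before T1 commits), and Sf as cascadeless (but not strict).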

Characterizing Schedules Based on Serializability

This section characterizes the types of schedules that are always considered to be
correct when concurrent transactions are executing. Such schedules are known as
serializable schedules.
If no interleaving of operations is permitted, there are only two possible arrangements
for executing transactions T1 and T2:
1. Execute (in sequence) all the operations of transaction T1, followed by all the
operations of transaction T2.
2. Execute (in sequence) all the operations of transaction T2, followed by all the
operations of transaction T1.
If interleaving of operations is allowed, there will be many possible schedules.
The concept of serializability of schedules is used to identify which schedules are
correct.

Fig: Example of serial and non serial schedules involving transactions T1 and T2.


Serial schedule A: T1 followed by T2.

Serial schedule B: T2 followed by T1.

Fig: Two non-serial schedules C and D with interleaving of operations.

Serial, Non-serial, and Conflict-Serializable Schedules

A schedule S is serial, if for every transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule; otherwise the schedule is
called non-serial.
In a serial schedule, only one transaction at a time is active—the commit (or abort) of
the active transaction initiates execution of the next transaction. No interleaving
occurs in a serial schedule.
The drawback of serial schedules is that they limit concurrency by prohibiting
interleaving of operations.
Two schedules are called result equivalent if they produce the same final state of the
database.

Fig: Two schedules that are result equivalent for the initial value X=100, but are
not result equivalent in general.

Two definitions of equivalence of schedules are generally used: conflict equivalence
and view equivalence.

Two operations in a schedule are said to conflict if they belong to different
transactions, access the same database item, and either both are write_item operations
or one is a write_item and the other a read_item.
Two schedules are said to be conflict equivalent if the order of any two conflicting
operations is the same in both schedules.
A schedule S of n transactions is serializable if it is equivalent to some serial schedule
of the same n transactions.
Using conflict equivalence, we define a schedule S to be conflict serializable if it is
conflict equivalent to some serial schedule S′. In such a case, we can reorder the
non-conflicting operations in S until we form the equivalent serial schedule S′.
Consider the following schedule for a set of three transactions:
w1(A), w2(A), w2(B), w1(B), w3(B)
We cannot perform any reordering on this schedule: the first two operations are both on
A and at least one is a write; the second and third operations are by the same
transaction; the third and fourth are both on B and at least one is a write; and so are the
fourth and fifth. So this schedule is not conflict equivalent to any other schedule, and
certainly not to any serial schedule.
However, since nobody ever reads the values written by the w1(A), w2(B), and w1(B)
operations, the schedule has the same outcome as the serial schedule:
w1(A), w1(B), w2(A), w2(B), w3(B)

Testing for Conflict Serializability of a Schedule

Using the definition of conflict-serializability to show that a schedule is
conflict-serializable is quite difficult. There is a much more efficient algorithm:

The algorithm looks at only the read_item and write_item operations in a schedule to
construct a precedence graph (or serialization graph), which is a directed graph G
= (N, E) that consists of a set of nodes N = {T1, T2, ..., Tn} and a set of directed edges
E = {e1, e2, ..., em}. There is one node in the graph for each transaction Ti in the
schedule. Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n,
where Tj is the starting node of ei and Tk is the ending node of ei. Such an edge from
node Tj to node Tk is created by the algorithm if one of the operations in Tj appears in
the schedule before some conflicting operation in Tk.

1. For each transaction Ti participating in schedule S, create a node labeled Ti
in the precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a
write_item(X), create an edge (Ti → Tj) in the precedence graph.
3. For each case in S where Tj executes a write_item(X) after Ti executes a
read_item(X), create an edge (Ti → Tj) in the precedence graph.
4. For each case in S where Tj executes a write_item(X) after Ti executes a
write_item(X), create an edge (Ti → Tj) in the precedence graph.
5. The schedule S is serializable if and only if the precedence graph has no
cycles.

As an example, consider the following schedule:

w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)

Step 1. We start with an empty graph with five vertices labeled T1, T2, T3, T4, T5.
Steps 2 and 3. We go through the operations in order and add an edge for each later
conflicting operation:
w1(A): A is subsequently read by T2, so add edge T1 → T2
r2(A): no subsequent writes to A, so no new edges
w1(B): B is subsequently read by T4, so add edge T1 → T4
w3(C): C is subsequently read by T2, so add edge T3 → T2
r2(C): no subsequent writes to C, so no new edges
r4(B): no subsequent writes to B, so no new edges
w2(D): D is subsequently read by T5, so add edge T2 → T5
w4(E): E is subsequently written by T5, so add edge T4 → T5
r5(D): no subsequent writes to D, so no new edges
w5(E): no subsequent operations on E, so no new edges
Step 4. Create the precedence graph. It has the edges T1 → T2, T1 → T4, T3 → T2,
T2 → T5, and T4 → T5.

Step 5. This graph has no cycles, so the original schedule must be serializable.

Moreover, since one way to topologically sort the graph is T3–T1–T4–T2–T5, one
serial schedule that is conflict-equivalent is

w3(C), w1(A), w1(B), r4(B), w4(E), r2(A), r2(C), w2(D), r5(D), w5(E)
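
The precedence-graph test is easy to automate. The sketch below builds the edge set from all conflicting pairs and then uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to either produce one equivalent serial order or detect a cycle. The function names `precedence_edges` and `serial_order` are chosen here for illustration.

```python
import re
from graphlib import CycleError, TopologicalSorter


def precedence_edges(schedule):
    """Return the parsed operations and the set of edges (Ti, Tj) of the precedence graph."""
    ops = [(k, int(t), x) for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return ops, edges


def serial_order(schedule):
    """Return one conflict-equivalent serial order, or None if the graph has a cycle."""
    ops, edges = precedence_edges(schedule)
    predecessors = {t: set() for _, t, _ in ops}
    for before, after in edges:
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


example = "w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)"
print(sorted(precedence_edges(example)[1]))  # [(1, 2), (1, 4), (2, 5), (3, 2), (4, 5)]
print(serial_order(example))                 # one valid topological order
print(serial_order("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);"))  # None (cycle)
```

A graph can have several topological orders, so the order printed for the example may differ from T3, T1, T4, T2, T5 while still being a valid conflict-equivalent serial order.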

How Serializability Is Used for Concurrency Control

The approach taken in most commercial DBMSs is to design protocols that, if followed
by every individual transaction or if enforced by a DBMS concurrency control
subsystem, will ensure serializability of all schedules in which the transactions
participate.

When transactions are submitted continuously to the system, it is difficult to determine
when a schedule begins and when it ends. Serializability theory can be adapted to deal
with this problem by considering only the committed projection of a schedule. The
committed projection C(S) of a schedule S includes only the operations in S that belong
to committed transactions.

Concurrency control protocols:

1. Two-phase locking

2. Timestamp ordering

3. Multiversion protocols

4. Optimistic protocols

Two-phase locking: Locks the data items to prevent concurrent transactions from
interfering with one another, and thereby enforces serializability.

Timestamp ordering: Each transaction is assigned a unique timestamp, and the protocol
ensures that any conflicting operations are executed in the order of the transaction
timestamps.

Multiversion protocols: Maintain multiple versions of data items.

Optimistic protocols: Check for possible serializability violations after the transactions
terminate but before they are permitted to commit.

View Equivalence and View Serializability

Two schedules S and S′ are said to be view equivalent if the following three
conditions hold:

1. The same set of transactions participates in S and S′, and S and S′ include the same
operations of those transactions.

2. For any operation ri(X) of Ti in S, if the value of X read by the operation has
been written by an operation wj(X) of Tj, the same condition must hold for the
value of X read by operation ri(X) of Ti in S′.

3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then
wk(Y) of Tk must also be the last operation to write item Y in S′.

A schedule S is said to be view serializable if it is view equivalent to a serial schedule.

The definitions of conflict serializability and view serializability are similar if a
condition known as the constrained write assumption holds on all transactions in the
schedule.

The definition of view serializability is less restrictive than that of conflict serializability
under the unconstrained write assumption, where the value written by an operation
wi(X) in Ti can be independent of its old value in the database.

Example: Consider the schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X);
and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read
the value of X. The schedule Sg is view serializable, since it is view equivalent to the
serial schedule T1, T2, T3. However, Sg is not conflict serializable, since it is not
conflict equivalent to any serial schedule.
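
For small schedules, view serializability can be checked by brute force: compute the reads-from relation and the final writer of each item, then compare them with every serial ordering of the same transactions. The sketch below does this under the assumption that each transaction writes a given item at most once; the names `parse`, `view`, and `view_serializable` are illustrative, and the approach is exponential in the number of transactions.

```python
import re
from itertools import permutations


def parse(schedule):
    """Keep only the read and write operations as (kind, txn, item) tuples."""
    return [(k, int(t), x) for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def view(ops):
    """Return (reads-from relation, final writer of each item) for a list of operations."""
    last_writer, reads_from, read_count = {}, set(), {}
    for kind, txn, item in ops:
        if kind == "r":
            n = read_count.get((txn, item), 0)  # distinguishes repeated reads
            read_count[(txn, item)] = n + 1
            # A writer of None means the read sees the initial value of the item.
            reads_from.add((txn, item, n, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def view_serializable(schedule):
    """Return a view-equivalent serial order of transactions, or None if there is none."""
    ops = parse(schedule)
    txns = sorted({t for _, t, _ in ops})
    target = view(ops)
    for order in permutations(txns):
        serial = [op for t in order for op in ops if op[1] == t]
        if view(serial) == target:
            return order
    return None


print(view_serializable("r1(X); w2(X); w1(X); w3(X);"))  # (1, 2, 3)
```

For Sg the function finds the serial order T1, T2, T3: in both schedules r1(X) reads the initial value of X and w3(X) is the final write.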
Other Types of Equivalence of Schedules

Some applications can produce schedules that are correct by satisfying conditions less
stringent than either conflict serializability or view serializability.

An example is the type of transactions known as debit-credit transactions, for example
those that apply deposits and withdrawals to a data item whose value is the current
balance of a bank account. They update the value of a data item by adding to it or
subtracting from it.
Consider the following transactions, each of which may be used to transfer an amount
of money between two bank accounts:

T1: r1(X); X := X − 10; w1(X); r1(Y); Y := Y + 10; w1(Y);
T2: r2(Y); Y := Y − 20; w2(Y); r2(X); X := X + 20; w2(X);
Consider the following non-serializable schedule Sh for the two transactions:
Sh: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X);
With the additional knowledge, or semantics, that the operations between each
ri(I) and wi(I) are commutative, we know that the order of executing the sequences
consisting of (read, update, write) is not important as long as each (read, update, write)
sequence by a particular transaction Ti on a particular item I is not interrupted by
conflicting operations. Hence, the schedule Sh is considered to be correct even though
it is not serializable.
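
The claim about Sh can be checked by simply executing the schedules. In the sketch below each step is a tuple (kind, transaction, item, delta), where a write stores the value the transaction last read plus its delta. The starting balances of 100 and the name `execute` are assumptions made for the illustration.

```python
# Each write stores (value read by the same transaction) + delta.
T1 = [("r", 1, "X", 0), ("w", 1, "X", -10), ("r", 1, "Y", 0), ("w", 1, "Y", +10)]
T2 = [("r", 2, "Y", 0), ("w", 2, "Y", -20), ("r", 2, "X", 0), ("w", 2, "X", +20)]

# Sh: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X);
Sh = [T1[0], T1[1], T2[0], T2[1], T1[2], T1[3], T2[2], T2[3]]

# An interleaving in which T2 reads X between r1(X) and w1(X).
interrupted = [T1[0], T2[0], T2[1], T2[2], T1[1], T2[3], T1[2], T1[3]]


def execute(steps, db):
    """Run the steps in order; each transaction keeps local copies of what it read."""
    db = dict(db)
    local = {}
    for kind, txn, item, delta in steps:
        if kind == "r":
            local[txn, item] = db[item]
        else:
            db[item] = local[txn, item] + delta
    return db


start = {"X": 100, "Y": 100}      # assumed initial balances
print(execute(T1 + T2, start))     # {'X': 110, 'Y': 90}
print(execute(Sh, start))          # {'X': 110, 'Y': 90}
print(execute(interrupted, start)) # {'X': 120, 'Y': 90}
```

Sh gives the same balances as the serial schedule because each (read, update, write) sequence on an item runs without interruption. In the last interleaving T2 reads X in the middle of T1's sequence on X, so T1's update to X is lost.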
