DBMS Notes Unit 3
UNIT-III - TRANSACTIONS
4) Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures. These properties are often called the
ACID properties; the acronym is derived from the first letter of each of the four properties.
It is important to know if a change to a data item appears only in main memory or if it has been
written to the database on disk. In a real database system, the write operation does not necessarily result
in the immediate update of the data on the disk; the write operation may be temporarily stored elsewhere
and executed on the disk later. For now, however, we shall assume that the write operation updates the
database immediately.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be
defined as:
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Atomicity:
Suppose that, just before the execution of transaction Ti, the values of accounts A and B are
$1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti , a failure
occurs that prevents Ti from completing its execution successfully. Further, suppose that the failure
happened after the write (A) operation but before the write (B) operation. In this case, the values of
accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of
this failure. In particular, we note that the sum A + B is no longer preserved. Thus, because of the
failure, the state of the system no longer reflects a real state of the world that the database is supposed to
capture. We term such a state an inconsistent state.
We must ensure that such inconsistencies are not visible in a database system. However, the
system must at some point be in an inconsistent state. Even if transaction Ti is executed to completion,
there exists a point at which the value of account A is $950 and the value of account B is $2000, which is
clearly an inconsistent state. This state, however, is eventually replaced by the consistent state where the
value of account A is $950, and the value of account B is $2050. Thus, if the transaction never started or
was guaranteed to complete, such an inconsistent state would not be visible except during the execution
of the transaction. That is the reason for the atomicity requirement: If the atomicity property is present,
all actions of the transaction are reflected in the database, or none are.
The basic idea behind ensuring atomicity is this: The database system keeps track (on disk) of
the old values of any data on which a transaction performs a write. This information is written to a file
called the log. If the transaction does not complete its execution, the database system restores the old
values from the log to make it appear as though the transaction never executed. Ensuring atomicity is
the responsibility of the database system; specifically, it is handled by a component of the database
called the recovery system.
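The idea above can be sketched in a few lines of Python (an illustration, not actual recovery-system code): the old value is appended to a log before every write, and undo() restores the old values if the transaction fails.

```python
# Illustrative undo log: record the old value before each write, so a
# failed transaction can be rolled back as though it never executed.
db = {"A": 1000, "B": 2000}
log = []

def write(item, value):
    log.append((item, db[item]))     # save the old value first
    db[item] = value

def undo():
    for item, old in reversed(log):  # restore old values, newest first
        db[item] = old
    log.clear()

write("A", db["A"] - 50)
# Failure occurs before write(B): restore the old values from the log.
undo()
assert db == {"A": 1000, "B": 2000}
```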
Durability:
Once the execution of the transaction completes successfully, and the user who initiated the
transaction has been notified that the transfer of funds has taken place, it must be the case that no system
failure can result in a loss of data corresponding to this transfer of funds. The durability property
guarantees that, once a transaction completes successfully, all the updates that it carried out on the
database persist, even if there is a system failure after the transaction completes execution.
We assume for now that a failure of the computer system may result in loss of data in main
memory, but data written to disk are never lost. We can guarantee durability by ensuring either that:
1. The updates carried out by the transaction have been written to disk before the transaction completes, or
2. Information about the updates carried out by the transaction and written to disk is sufficient to enable
the database to reconstruct the updates when the database system is restarted after the failure.
Isolation:
Even if the consistency and atomicity properties are ensured for each transaction, if several
transactions are executed concurrently, their operations may interleave in some undesirable way,
resulting in an inconsistent state. For example, as we saw earlier, the database is temporarily
inconsistent while the transaction to transfer funds from A to B is executing, with the deducted total
written to A and the increased total yet to be written to B. If a second concurrently running transaction
reads A and B at this intermediate point and computes A + B, it will observe an inconsistent value.
Furthermore, if this second transaction then performs updates on A and B based on the
inconsistent values that it read, the database may be left in an inconsistent state even after both
transactions have completed.
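The anomaly just described can be reproduced with a small sketch (values follow the transfer example above): a reader that runs between write(A) and write(B) observes the inconsistent sum $2950.

```python
# The transfer of $50 from A to B, split into its two writes.
db = {"A": 1000, "B": 2000}

db["A"] -= 50                    # write(A) has happened
observed = db["A"] + db["B"]     # a concurrent transaction reads here
db["B"] += 50                    # write(B) happens afterwards

assert observed == 2950              # inconsistent intermediate sum
assert db["A"] + db["B"] == 3000     # consistent once the transfer completes
```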
3.3 Schedules
A schedule is a sequence of operations from one or more transactions. It preserves the order
of the operations within each individual transaction.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no
interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.
o In the given figure 3.11, Schedule A shows the serial schedule where T1 is followed by T2.
o In the given figure 3.12, Schedule B shows the serial schedule where T2 is followed by T1.
3.4 Serializability
Serial schedules are serializable, but if steps of multiple transactions are interleaved, it is harder
to determine whether a schedule is serializable.
Since transactions are programs, it is difficult to determine exactly what operations a
transaction performs and how operations of various transactions interact.
For this reason, we shall not consider the various types of operations that a transaction can
perform on a data item, but instead consider only two operations: read and write.
We assume that, between a read(Q) instruction and a write(Q) instruction on a data item Q, a
transaction may perform an arbitrary sequence of operations on the copy of Q that is residing
in the local buffer of the transaction. In this model, the only significant operations of a
transaction, from a scheduling point of view, are its read and write instructions. Commit
operations, though relevant, are not considered. We therefore may show only read and write
instructions in schedules.
Different forms of schedule serializability are:
1) Conflict serializability
2) View serializability
1. Conflict Serializability:
Let us consider a schedule S in which there are two consecutive instructions, I and J, of
transactions Ti and Tj, respectively (i ≠ j). If I and J refer to different data items, then we can swap I
and J without affecting the results of any instruction in the schedule. However, if I and J refer to the
same data item Q, then the order of the two steps may matter. Since we are dealing with only read and
write instructions, there are four cases that we need to consider:
1. I = read(Q), J = read(Q). The order of I and J does not matter, since the same value of Q is read by
Ti and Tj, regardless of the order.
2. I = read(Q), J = write(Q). If I comes before J, then Ti does not read the value of Q that is written
by Tj in instruction J. If J comes before I, then Ti reads the value of Q that is written by Tj. Thus,
the order of I and J matters.
3. I = write(Q), J = read(Q). The order of I and J matters for reasons similar to those of the previous
case.
4. I = write(Q), J = write(Q). Since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q)
instruction of S is affected, since the result of only the latter of the two write instructions is
preserved in the database. If there is no other write(Q) instruction after I and J in S, then the order
of I and J directly affects the final value of Q in the database state that results from schedule S.
Fig. 3.6 Schedule 3 — showing only the read and write instructions
Thus, only in the case where both I and J are read instructions does the relative order of their
execution not matter. We say that I and J conflict if they are operations by different transactions on the
same data item, and at least one of these instructions is a write operation.
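This conflict rule is mechanical enough to express directly; a small sketch follows (the triple representation of instructions is my own):

```python
# An instruction is a (transaction, operation, data_item) triple.
def conflict(i, j):
    """I and J conflict iff they belong to different transactions, refer to
    the same data item, and at least one of them is a write."""
    ti, op_i, item_i = i
    tj, op_j, item_j = j
    return ti != tj and item_i == item_j and "write" in (op_i, op_j)

assert not conflict(("Ti", "read", "Q"), ("Tj", "read", "Q"))   # case 1
assert conflict(("Ti", "read", "Q"), ("Tj", "write", "Q"))      # cases 2 and 3
assert conflict(("Ti", "write", "Q"), ("Tj", "write", "Q"))     # case 4
```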
2. View Serializability:
There is another form of equivalence that is less stringent than conflict equivalence, but that, like
conflict equivalence, is based on only the read and write operations of transactions.
Consider two schedules S and S′, where the same set of transactions participates in both
schedules. The schedules S and S′ are said to be view equivalent if three conditions are met:
(1) For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S′, also read the initial value of Q.
(2) For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was
produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of
transaction Ti must, in schedule S′, also read the value of Q that was produced by the same
write(Q) operation of transaction Tj.
(3) For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S must perform the final write(Q) operation in schedule S′.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and,
therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures
that both schedules result in the same final system state.
The concept of view equivalence leads to the concept of view serializability. We say that a
schedule S is view serializable if it is view equivalent to a serial schedule.
As an illustration, suppose that we augment schedule 4 with transaction T29 and obtain the
following view-serializable schedule (schedule 5):
Indeed, schedule 5 is view equivalent to the serial schedule <T27, T28, T29>, since the one
read(Q) instruction reads the initial value of Q in both schedules and T29 performs the final write of Q
in both schedules.
Every conflict-serializable schedule is also view serializable, but there are view-serializable
schedules that are not conflict serializable. Indeed, schedule 5 is not conflict serializable, since every
pair of consecutive instructions conflicts, and, thus, no swapping of instructions is possible.
Observe that, in schedule 5, transactions T28 and T29 perform write(Q) operations without
having performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in
any view-serializable schedule that is not conflict serializable.
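The three view-equivalence conditions can be checked mechanically. The sketch below (my own representation; it assumes each transaction reads and writes each item at most once) summarizes a schedule by its initial reads, its read-from pairs, and its final writes, then compares the summaries.

```python
def view_profile(schedule):
    """Summarize a schedule given as (transaction, op, item) triples."""
    initial_reads, reads_from, last_writer = set(), {}, {}
    for txn, op, item in schedule:
        if op == "read":
            if item in last_writer:
                reads_from[(txn, item)] = last_writer[item]  # condition 2
            else:
                initial_reads.add((txn, item))               # condition 1
        else:
            last_writer[item] = txn                          # condition 3
    return initial_reads, reads_from, last_writer

def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)

# Schedule 5 versus the serial schedule <T27, T28, T29>:
s5     = [("T27", "read", "Q"), ("T28", "write", "Q"),
          ("T27", "write", "Q"), ("T29", "write", "Q")]
serial = [("T27", "read", "Q"), ("T27", "write", "Q"),
          ("T28", "write", "Q"), ("T29", "write", "Q")]
assert view_equivalent(s5, serial)
```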
(2) ROLLBACK: The ROLLBACK command is used to undo transactions that have not already been
saved to the database.
For example, the following commands delete a record from the Student table and then immediately
undo the deletion:
DELETE FROM Student
WHERE RollNo = 2;
ROLLBACK;
Then the resultant table will be
RollNo Name
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
(3) SAVEPOINT: A SAVEPOINT is a point in a transaction when you can roll the transaction back to a
certain point without rolling back the entire transaction. The SAVEPOINT can be created as
SAVEPOINT savepoint_name;
Then we can ROLLBACK to a SAVEPOINT as
ROLLBACK TO savepoint_name;
For example – Let us consider Student table and consider following commands
SQL> SAVEPOINT S1;
SQL> DELETE FROM Student WHERE RollNo = 2;
SQL> SAVEPOINT S2;
SQL> DELETE FROM Student WHERE RollNo = 3;
SQL> SAVEPOINT S3;
SQL> DELETE FROM Student WHERE RollNo = 4;
SQL> SAVEPOINT S4;
SQL> DELETE FROM Student WHERE RollNo = 5;
If we now execute ROLLBACK TO S3; the effect of deleting the records with RollNo 4 and RollNo 5 is
undone, while the records with RollNo 2 and RollNo 3 remain deleted.
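The same mechanism can be reproduced from Python using the standard sqlite3 module, which also supports SAVEPOINT (table and values follow the Student table in the notes; the rollback target here is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # let us issue transaction commands ourselves
cur = con.cursor()
cur.execute("CREATE TABLE Student (RollNo INTEGER, Name TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(1, "AAA"), (2, "BBB"), (3, "CCC"), (4, "DDD"), (5, "EEE")])

cur.execute("SAVEPOINT S1")
cur.execute("DELETE FROM Student WHERE RollNo = 2")
cur.execute("SAVEPOINT S2")
cur.execute("DELETE FROM Student WHERE RollNo = 3")
cur.execute("ROLLBACK TO S2")   # undoes only the deletion of RollNo 3
rows = cur.execute("SELECT RollNo FROM Student ORDER BY RollNo").fetchall()
assert rows == [(1,), (3,), (4,), (5,)]  # RollNo 2 stays deleted
```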
Two-phase locking is a protocol in which there are two phases:
i) Growing phase (Locking phase): It is a phase in which the transaction may obtain locks but does
not release any lock.
ii) Shrinking phase (Unlocking phase): It is a phase in which the transaction may release the locks but
does not obtain any new lock.
There are three types of two-phase locking protocol:
1. Strict Two-Phase Locking Protocol
2. Rigorous Two-Phase Locking Protocol
3. Conservative Two-Phase Locking Protocol
Example:
Consider following transactions
T1 T2
Lock-X(A) Lock-S(B)
Read(A) Read(B)
A=A-50 Unlock-S(B)
Write(A)
Lock-X(B)
Unlock-X(A)
B=B+100 Lock-S(A)
Write(B) Read(A)
Unlock-X(B) Unlock-S(A)
The important rule for two-phase locking is that all lock operations must precede all unlock
operations. In the transactions above, T1 follows two-phase locking, but T2 does not: in T2, a shared
lock is acquired on data item B, B is read, and the lock is released; then a shared lock is acquired on
data item A, A is read, and that lock is released. This gives a lock-unlock-lock-unlock sequence, which
is clearly not allowed in two-phase locking.
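The rule "all lock operations precede all unlock operations" is easy to check mechanically for a single transaction's operation list; a minimal sketch (operation names follow the table above):

```python
def is_two_phase(ops):
    """True iff no lock is acquired after any unlock, i.e. a growing
    phase strictly precedes a shrinking phase."""
    shrinking = False
    for op in ops:
        if op.startswith("Unlock"):
            shrinking = True
        elif op.startswith("Lock") and shrinking:
            return False            # lock after unlock: not two-phase
    return True

t1 = ["Lock-X(A)", "Read(A)", "Write(A)", "Lock-X(B)",
      "Unlock-X(A)", "Write(B)", "Unlock-X(B)"]
t2 = ["Lock-S(B)", "Read(B)", "Unlock-S(B)",
      "Lock-S(A)", "Read(A)", "Unlock-S(A)"]
assert is_two_phase(t1)
assert not is_two_phase(t2)   # lock-unlock-lock-unlock sequence
```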
3.8 Timestamp
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses
either the system time or a logical counter as a timestamp. Lock-based protocols manage the order
between conflicting pairs of transactions at the time of execution, whereas timestamp-based protocols
start working as soon as a transaction is created. Every transaction has a timestamp associated with it,
and the ordering is determined by the age of the transaction.
A transaction created at clock time 0002 is older than every transaction that enters the system after
it. For example, a transaction 'y' entering the system at 0004 is two seconds younger, and priority is
given to the older one. In addition, every data item carries the latest read-timestamp and write-
timestamp. This lets the system know when the last read and write operations were performed on the
data item.
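The basic timestamp-ordering checks can be written down directly (a sketch; the field names are my own):

```python
# Each data item carries the timestamps of its latest read and write.
def can_read(ts, item):
    # A read is rejected if a younger transaction already wrote the item.
    return ts >= item["write_ts"]

def can_write(ts, item):
    # A write is rejected if a younger transaction read or wrote the item.
    return ts >= item["read_ts"] and ts >= item["write_ts"]

q = {"read_ts": 0, "write_ts": 0}
assert can_read(5, q) and can_write(5, q)
q["write_ts"] = 7                 # a transaction with timestamp 7 wrote Q
assert not can_read(5, q)         # the older transaction must roll back
```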
3.9 Multiversion
The DBMS maintains multiple physical versions of a single logical object in the database.
When a transaction writes to an object, the database creates a new version of that object.
When a transaction reads an object, it reads the newest version that exists when the
transaction started.
In this technique, writers do not block readers, and readers do not block writers.
This scheme makes use of:
i) Locking protocol and
ii) Timestamp protocol.
The multiversion is now used in almost all database management system as a modern
technique of concurrency control.
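The versioned read described above can be sketched as follows (the data layout is illustrative):

```python
# item -> list of (write_timestamp, value) versions
versions = {"Q": [(0, "initial")]}

def write(item, ts, value):
    versions[item].append((ts, value))   # a write creates a new version

def read(item, start_ts):
    # The reader sees the newest version written no later than its start.
    visible = [(ts, v) for ts, v in versions[item] if ts <= start_ts]
    return max(visible)[1]

write("Q", 10, "new")
assert read("Q", 5) == "initial"   # started before the write: old version
assert read("Q", 12) == "new"      # started after the write: new version
```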
With the help of a timestamp in the validation phase, the validation-based protocol determines whether
the transaction will commit or roll back; hence TS(Ti) = Validation(Ti). Serializability is determined at
validation time and cannot be determined in advance. While executing transactions, this protocol
provides a greater degree of concurrency when there are fewer conflicts. That is because the
serializability order is not pre-decided (transactions are validated and then executed) and relatively few
transactions have to be rolled back.
Snapshot Isolation
Snapshot isolation is a multiversion concurrency control technique. In snapshot isolation,
we can imagine that each transaction is given its own version, or snapshot, of the database when it
begins. It reads data from this private version and is thus isolated from the updates made by other
transactions. If the transaction updates the database, that update appears only in its own version, not in
the actual database itself. Information about these updates is saved so that the updates can be applied to
the "real" database if the transaction commits. When a transaction T enters the partially committed state,
it then proceeds to the committed state only if no other concurrent transaction has modified data that T
intends to update. Transactions that, as a result, cannot commit abort instead. Snapshot isolation ensures
that attempts to read data never need to wait.
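The commit-time test described above (often called "first committer wins") can be sketched as a simple write-set intersection check; the names here are illustrative:

```python
def can_commit(write_set, concurrent_committed):
    """T may commit only if no concurrent transaction that has already
    committed wrote an item that T also updated."""
    return all(write_set.isdisjoint(ws) for ws in concurrent_committed)

committed = [{"A"}]                           # a concurrent transaction wrote A
assert not can_commit({"A", "B"}, committed)  # conflict on A: T must abort
assert can_commit({"C"}, committed)           # no overlap: T may commit
```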
Multiple granularity locking is a locking mechanism that provides different levels of locks for
different database objects. It allows for different locks at different levels of granularity. This mechanism
allows multiple transactions to lock different levels of granularity, ensuring that conflicts are minimized,
and concurrency is maximized.
For example:
Consider a tree with four levels of nodes. The first (highest) level represents the entire
database. The second level consists of nodes of type area; the database consists of exactly these areas.
Each area has child nodes called files, and no file is present in more than one area. Finally, each file has
child nodes called records; a file contains exactly those records that are its children, and no record is
present in more than one file. Hence, the levels of the tree, starting from the top, are as follows:
- Database
- Area
- File
- Record
There are three additional lock modes with multiple granularity:
i. Intention-Shared (IS): indicates explicit locking at a lower level of the tree, but only with
shared locks.
ii. Intention-Exclusive (IX): indicates explicit locking at a lower level with exclusive or
shared locks.
iii. Shared & Intention-Exclusive (SIX): the node is locked in shared mode, and some lower-level
node is locked in exclusive mode by the same transaction.
The compatibility matrix for these lock modes is shown below:

      IS    IX    S     SIX   X
IS    YES   YES   YES   YES   NO
IX    YES   YES   NO    NO    NO
S     YES   NO    YES   NO    NO
SIX   YES   NO    NO    NO    NO
X     NO    NO    NO    NO    NO
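The matrix translates directly into a lookup table; a sketch:

```python
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(held, requested):
    """Can `requested` be granted while another transaction holds `held`?"""
    return COMPAT[held][requested]

assert compatible("IS", "SIX")    # intention-shared coexists with SIX
assert not compatible("SIX", "S") # SIX blocks a plain shared lock
```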
Deadlock is a situation in computing where two or more processes are unable to proceed because
each is waiting for the other to release resources.
For example, assume we have two processes P1 and P2. Process P1 is holding resource R1 and
waiting for resource R2. At the same time, process P2 is holding resource R2 and waiting for resource
R1. So P1 is waiting for P2 to release its resource, and P2 is waiting for P1 to release its resource, and
neither releases anything. Both wait for each other indefinitely and no work gets done. This is called
deadlock.
i. Mutual Exclusion:
A resource can be held by only one process at a time. In other words, if a process P1 is using
some resource R at a particular instant of time, then some other process P2 can't hold or use the same
resource R at that particular instant of time. The process P2 can make a request for that resource R but it
can't use that resource simultaneously with process P1.
When a system with concurrent transactions crashes and recovers, it behaves in the following
manner:
- The recovery system reads the logs backwards from the end to the last checkpoint.
- It maintains two lists, an undo-list and a redo-list.
- If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it
puts the transaction in the redo-list.
- If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the
transaction in the undo-list.
- All transactions in the undo-list are then undone and their logs are removed. For all transactions in
the redo-list, their previous logs are removed, the transactions are redone, and their logs are saved.
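The classification step can be sketched as follows (the log is simplified here to (transaction, record-type) pairs; a real system scans backwards from the last checkpoint):

```python
def classify(log):
    """Split transactions in the log into a redo-list and an undo-list."""
    committed = {t for t, rec in log if rec == "Commit"}
    redo, undo = [], []
    for t, rec in log:
        if rec == "Start":
            (redo if t in committed else undo).append(t)
    return redo, undo

log = [("T1", "Start"), ("T1", "Commit"),   # started and committed: redo
       ("T2", "Start")]                     # started, never committed: undo
redo, undo = classify(log)
assert redo == ["T1"] and undo == ["T2"]
```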
Crash Recovery
Though we live in a highly technologically advanced era, where hundreds of satellites monitor the
earth and billions of people are connected every second through information technology, failure is
expected, though not always acceptable.
DBMS is highly complex system with hundreds of transactions being executed every second.
Availability of DBMS depends on its complex architecture and underlying hardware or system software.
If it fails or crashes amid transactions being executed, it is expected that the system would follow some
sort of algorithm or techniques to recover from crashes or failures.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as
follows:
1) Transaction Failure
When a transaction fails to execute, or reaches a point after which it cannot complete
successfully, it has to abort. This is called transaction failure. Only a few transactions or processes are
affected.
2) System Crash
There are problems external to the system that may cause it to stop abruptly and crash, for
example an interruption in the power supply, or a failure of the underlying hardware or software.
3) Disk Failure:
In the early days of technology evolution, it was a common problem that hard disk drives or
storage drives failed frequently. Disk failures include the formation of bad sectors, unreachability of
the disk, disk head crash, or any other failure that destroys all or part of disk storage.
Storage Structure
We have already described storage system here. In brief, the storage structure can be divided in
various categories:
Volatile storage: As the name suggests, this storage does not survive system crashes. It is mostly
placed very close to the CPU, often embedded on the chipset itself; examples are main memory
and cache memory. Volatile storage is fast but can store only a small amount of information.
Nonvolatile storage: These memories are made to survive system crashes. They are huge in data
storage capacity but slower to access. Examples include hard disks, magnetic tapes,
flash memory, and non-volatile (battery-backed) RAM.
A DBMS recovers from failures while preserving atomicity mainly by:
- Maintaining the log of each transaction, and writing it onto stable storage before actually
modifying the database.
- Maintaining shadow paging, where the changes are done on volatile memory and later
the actual database is updated.
Log-Based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction.
It is important that the logs are written prior to actual modification and stored on a stable storage media,
which is failsafe. Log based recovery works as follows:
- The log file is kept on stable storage media
- When a transaction enters the system and starts execution, it writes a log record <Tn, Start>
- When the transaction modifies an item X, it writes a log record <Tn, X, V1, V2>, which records
that Tn has changed the value of X from V1 to V2
- When the transaction finishes, it writes a log record <Tn, Commit>
The directory can be kept in main memory. When a transaction begins executing, the current
directory (whose entries point to the most recent or current database pages on disk) is copied into a
directory called the shadow directory. The shadow directory is then saved on disk while the current
directory is used by the transaction. During the execution of the transaction, the shadow directory is
never modified.
When a write operation is to be performed, a new copy of the modified database page is
created, but the old copy of the page is never overwritten; the newly created page is written somewhere
else. The current directory then points to the newly modified page, while the shadow directory still
points to the old page entries on disk. If a failure occurs, the modified database pages and the current
directory are discarded. The state of the database before the failure is still available through the shadow
directory, and this state can be recovered by reinstating the shadow directory pages. This technique
does not require any UNDO/REDO operation.
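Shadow paging can be illustrated with a small sketch (the page-table layout is my own):

```python
pages = {1: "old-A", 2: "old-B"}   # page-id -> page contents on disk
current = {"A": 1, "B": 2}         # current directory: item -> page-id
shadow = dict(current)             # shadow directory: saved, never modified

def write(item, value):
    new_page = max(pages) + 1      # the old page is never overwritten
    pages[new_page] = value
    current[item] = new_page       # only the current directory is updated

write("A", "new-A")
# Crash before commit: discard the current directory, keep the shadow one.
recovered = {item: pages[pid] for item, pid in shadow.items()}
assert recovered == {"A": "old-A", "B": "old-B"}   # no undo/redo needed
```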
3. Undo Phase
The Undo Phase reverses changes made by aborted transactions, restoring the database to a
consistent state by undoing any modifications from these transactions.
Example: From the log, we need to undo changes made by T2.
Steps:
1. Identify all operations from aborted transactions.
2. Apply undo operations to revert changes made by these transactions.