Concurrency Control Protocol & Recovery
Several problems can occur when concurrent transactions are executed in an uncontrolled
manner. The three main problems in concurrency control are:
1. Lost updates
2. Dirty read
3. Unrepeatable read
The lost update problem occurs when two transactions that access the same database item
interleave their operations in a way that leaves the item with an incorrect value.
If two transactions T1 and T2 both read a record and then update it, the first update
is overwritten by the second.
Example:
Here,
At time t5, Transaction-Y writes A's value on the basis of the value it saw at time
t3.
So at time t5, the update of Transaction-X is lost, because Transaction-Y
overwrites it without looking at its current value.
This is known as the Lost Update Problem, because the update made by one
transaction is lost.
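The interleaving described above can be replayed as a minimal sketch. The starting value of A, the amounts, and the exact time of Transaction-X's write (t4) are assumptions made for illustration; only the read at t3 and the write at t5 come from the description.

```python
# Hypothetical replay of a lost update. `db` stands in for the database;
# the *_local variables are each transaction's private workspace.
db = {"A": 100}                 # assumed starting value

x_local = db["A"]               # Transaction-X reads A
y_local = db["A"]               # t3: Transaction-Y reads the same value
db["A"] = x_local - 50          # t4 (assumed): Transaction-X writes its update
db["A"] = y_local + 100         # t5: Transaction-Y writes using the stale value from t3

final_value = db["A"]           # X's subtraction of 50 has disappeared
serial_value = 100 - 50 + 100   # what running X and Y one after the other would give

print(final_value, serial_value)
```

The two numbers differ because Transaction-Y computed its result from the value it read at t3 rather than from the value Transaction-X had just written.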
2. Dirty Read
A dirty read occurs when one transaction updates a database item and then fails for
some reason, and another transaction accesses the updated item before it is changed
back to its original value.
For example, a transaction T1 updates a record which is then read by T2. If T1 aborts,
T2 holds values which have never formed part of the stable database.
Example:
Here,
At time t4, Transaction-Y rolls back, so it changes A's value back to what it was
prior to t1.
So Transaction-X now holds a value which has never become part of the stable
database.
This is known as the Dirty Read Problem, because one transaction reads a
dirty value which has not been committed.
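A minimal sketch of the same situation follows. The value of A, the amount written, and the times t2 and t3 are assumptions for illustration; the rollback at t4 follows the description above.

```python
# Hypothetical replay of a dirty read.
db = {"A": 300}                  # committed value before t1 (assumed)

before_image = db["A"]           # Transaction-Y remembers the old value
db["A"] = before_image + 100     # t2 (assumed): Y writes A but has not committed
x_seen = db["A"]                 # t3 (assumed): Transaction-X reads the uncommitted value
db["A"] = before_image           # t4: Y rolls back and restores the old value

print(x_seen, db["A"])
```

After the rollback, the value held by Transaction-X no longer matches anything that was ever committed.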
Example:
1. Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it
acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
It is also known as a read-only lock. Under a shared lock, the data item can
only be read by the transaction.
It can be shared between transactions because a transaction holding a shared
lock cannot update the data item.
2. Exclusive lock:
An exclusive lock is placed when we want to both read and write the data. Once
this lock is placed on a data item, no other lock (shared or exclusive) can be
placed on it until the exclusive lock is released.
For example, while one transaction holds an exclusive lock, a request from another
transaction to read the data (S lock) is not allowed, and a request to write the
data (X lock) is not allowed either.
+-----+-------+-------+
|     |   S   |   X   |
+-----+-------+-------+
|  S  | True  | False |
+-----+-------+-------+
|  X  | False | False |
+-----+-------+-------+
How to read this matrix:
Each row is the lock already held and each column is the lock being requested.
The first row says that when an S lock is held, another S lock can be acquired, so it
is marked True, but no exclusive lock can be acquired, so it is marked False.
The second row says that when an X lock is held, neither an S nor an X lock can be
acquired, so both are marked False.
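The matrix can be written down directly as a lookup table. This is only a sketch; the names `COMPATIBLE` and `can_grant` are hypothetical and do not come from any particular DBMS.

```python
# Lock compatibility matrix: key is (lock already held, lock requested).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant the request only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # another reader joining existing readers
print(can_grant("X", ["S"]))       # a writer while a reader holds the item
```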
Growing phase: In the growing phase, the transaction may acquire new locks on data
items, but it may not release any.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may
be released, but no new locks can be acquired.
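The two-phase rule can be sketched as a small guard object. The class name `TwoPhaseTransaction` is hypothetical, and the sketch only tracks which phase a single transaction is in; it does not model lock modes or conflicts with other transactions.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the growing/shrinking rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            # Acquiring after any release would violate two-phase locking.
            raise RuntimeError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")      # still growing
t1.unlock("A")    # shrinking phase starts here
try:
    t1.lock("C")  # rejected: no new locks after the first release
except RuntimeError as err:
    print(err)
```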
In the below example, if lock conversion is allowed then the following phase can
happen:
Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
Transaction T2:
2-PL ensures serializability, but it still has some drawbacks. Let's glance at the
drawbacks:
Take a moment to analyze the schedule. Yes, you're correct: because of the dirty reads by T2
and T3 in lines 8 and 12 respectively, when T1 fails we have to roll back the others as well.
Hence cascading rollbacks are possible in 2-PL. I have taken skeleton schedules as
examples because they are easy to understand when kept simple. Real
transaction problems with many variables become very complex.
Deadlock in 2-PL –
Consider this simple example, which will be easy to understand. Say we have two transactions
T1 and T2.
Schedule: Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)
Drawing the wait-for graph, you can detect the cycle: T1 waits for T2 to release B while T2
waits for T1 to release A. So deadlock is also possible in 2-PL.
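The cycle in this schedule can be found mechanically. The sketch below builds a wait-for graph (an edge Ti -> Tj means Ti is waiting for a lock held by Tj) from exclusive-lock requests and checks it for a cycle. The function names are hypothetical, and the sketch assumes exclusive locks only and no unlocks.

```python
def build_wait_for(requests):
    """requests: (transaction, item) exclusive-lock requests in schedule order."""
    owner = {}       # item -> transaction currently holding it
    waits_for = {}   # transaction -> set of transactions it waits on
    for txn, item in requests:
        holder = owner.get(item)
        if holder is None:
            owner[item] = txn
        elif holder != txn:
            waits_for.setdefault(txn, set()).add(holder)
    return waits_for

def has_cycle(graph):
    """Depth-first search; meeting a node that is still being visited means a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

# Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)
schedule = [("T1", "A"), ("T2", "B"), ("T1", "B"), ("T2", "A")]
graph = build_wait_for(schedule)
print(graph, has_cycle(graph))
```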
Two-phase locking may also limit the amount of concurrency that can occur in a schedule,
because a transaction may not be able to release an item after it has used it. This may be
because of the protocols and other restrictions we put on the schedule to ensure
serializability, deadlock freedom and other factors. This is the price we have to pay to ensure
serializability and other factors; hence it can be considered a trade-off between
concurrency and maintaining the ACID properties.
The above-mentioned type of 2-PL is called Basic 2PL. To sum up, it ensures conflict
serializability but does not prevent cascading rollback or deadlock.
Neither T3 nor T4 can make progress: executing lock-s(B) causes T4 to wait for T3
to release its lock on B, while executing lock-x(A) causes T3 to wait for T4 to
release its lock on A.
Such a situation is called a deadlock. To handle a deadlock, one of T3 and T4 must
be rolled back and its locks released.
1. Check the following condition whenever a transaction Ti issues a Read (X) operation:
Where,
Database Recovery:
It is the method of restoring the database to its correct state in the event of a failure
during a transaction or after the end of a process. Earlier you were introduced to
database recovery as a service that every DBMS should provide to
ensure that the database is dependable and remains in a consistent state in the
presence of failures. In this context, dependability refers to both the resilience of the
DBMS to various kinds of failure and its ability to recover from those failures. In this
chapter, you will gain a brief knowledge of how this service can be provided.
When a transaction is being executed in the system, it may fail for various
reasons. The failure can be caused by a system program, a bug in a program, the user, or
a system crash. These failures can be broadly classified into three categories.
Transaction Failure: This type of failure affects only a few tables or processes. It
is the condition in which a transaction cannot continue executing.
The failure can be caused by the user or by the executing program/transaction.
The user may cancel the transaction while it is executing by
pressing the cancel button, or abort it using DB commands. The transaction
may fail because it violates constraints on the tables. It
can also fail when multiple transactions are processed concurrently and there
are not enough resources for all of them, or when a deadlock occurs. All of these cause the
transaction to stop in the middle of its execution. When a
transaction fails or stops in the middle, it may have partially changed the DB, which
then needs to be rolled back to its previous consistent state. In the ATM withdrawal
example, if the user cancels the transaction after step (i), the system should be
able to stop further processing of the transaction; if the user cancels the
transaction after step (ii), the system should be robust enough to update the
balance in the account. Here the system may also cancel the transaction due to
insufficient balance. The failure can be due to errors in the code (logical
errors) or due to system errors such as deadlock or unavailability of the system
resources needed to execute the transactions.
Logical errors
System errors
1. Logical errors: Where a transaction cannot complete because of an error in its code
or an internal error condition.
2. System errors: Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop due
to a system condition such as deadlock or unavailability of resources.
Disk Failure: These are problems with hard disks, such as the formation of bad sectors,
a disk head crash, or unavailability of the disk. Data can also be lost because of fire,
flood, theft, etc. This mainly affects secondary memory, where the actual
data lies. In these cases, we need alternative ways of storing the DB. We
can create backups of the DB on a regular basis and store them separately from the
storage where the DB resides, or maintain multiple copies of the DB at different
network locations, so that it can be recovered after a failure.
Recovery Mechanism
Every DBMS should offer the following facilities to help out with the recovery
mechanism:
Backup mechanism makes periodic backup copies of the database.
Logging facilities keep track of the current state of transactions and any changes
made to the database.
Checkpoint facility allows in-progress updates to the database
to be made permanent.
Recovery manager allows the database system to restore the database to a
reliable and steady state after a failure occurs.
In this method, a log of each transaction is maintained in stable storage, so that after
a failure the database can be recovered from it. The log records must be stored
before the actual changes are applied to the database.
Every log record contains information such as which transaction is being executed, which
values have been modified and to what value, and the state of the transaction. All log
records are stored in the order of execution.
Suppose there is a transaction to modify the address of a student. Let us see what logs are
written for this transaction.
<Tn, Start>
When the transaction modifies the address from ‘Troy’ to ‘Fraser Town’, another log is
written to the file.
When the transaction is completed, it writes another log to indicate end of the
transaction.
<Tn, Commit>
There are two recovery operations:
1. Undo
An undo operation has to be executed for all the transactions that had not committed
at the time of the failure. All uncommitted transactions have to be rolled back; i.e., if the
log file contains <T1, Start> but no <T1, Commit>, then T1 has to be rolled back
or undone.
2. Redo
All the committed transactions have to be redone during recovery. These
transactions had already executed before the failure, and their values have to be retained in
the DB. That is, if the log file contains both <T1, Start> and <T1, Commit>, then
T1 has to be re-executed.
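The undo/redo rule can be sketched as a single pass over the log. Representing each log record as a `(transaction, action)` tuple is an assumption made for this illustration, as is the function name `classify`.

```python
def classify(log):
    """Split the transactions in the log into an undo list and a redo list."""
    started, committed = [], set()
    for txn, action in log:
        if action == "Start":
            started.append(txn)
        elif action == "Commit":
            committed.add(txn)
    redo = [t for t in started if t in committed]      # Start and Commit both present
    undo = [t for t in started if t not in committed]  # Start without Commit
    return undo, redo

log = [("T1", "Start"), ("T2", "Start"), ("T1", "Commit")]
undo, redo = classify(log)
print("undo:", undo, "redo:", redo)
```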
There are two methods of creating the log files and updating the database.
When updates are not written to the database until after a transaction has reached its
commit point, this is called deferred database update/modification.
In this method, all the log records for the transaction are created and stored in stable
storage first. Once they are stored, the database is updated with the changes. That is, the log is
created and stored in stable storage first; then the actual DB changes are made by reading
the log.
For example, suppose we have transaction T1 which will read the value of X and Y,
and update their values as below :
Let the initial values of X and Y be 25 and 30. Deferred database modification will then log all
the steps into the log file first. It will not update the database as soon as WRITE (X) is
logged. It will wait until all the log records for T1 have been written to the log file. Once
that is done, it will read the log file and update the database. The log file for T1 will look as above.
If there is a crash, the data is recovered by checking these logs. If the log contains
both <Tn, Start> and <Tn, Commit> for a transaction, then that transaction has to be re-
executed. But the crash can happen while updating the log file or while updating
the database itself.
So it is clear that T1's changes are written to the DB when <T1, Commit> is reached, and T2's
changes are written to the DB when <T2, Commit> is reached in the log file.
Case 1:
Suppose the system fails while writing the log. If it fails at step 3, <T1, Y, 30, 20>, then there
is no need to redo T1, because the log file does not yet contain a commit record for T1 and
nothing has been written to the database.
Case 2:
If it fails after logging <T1, Commit>, then T1 has to be re-executed.
Case 3:
If it fails after <T2, Commit>, then both T1 and T2 have to be re-executed.
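Recovery under deferred modification only ever needs redo, which the sketch below illustrates. The tuple layout of the log records and the name `recover_deferred` are assumptions; the record <T1, X, 25, 125> is also assumed, since only <T1, Y, 30, 20> is shown in the cases above.

```python
def recover_deferred(db, log):
    """Redo the writes of committed transactions; ignore everything else."""
    committed = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "Commit"}
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _txn, item, _old, new = rec
            db[item] = new   # writing the logged new value is safe to repeat
    return db

log_case1 = [("T1", "Start"), ("T1", "X", 25, 125), ("T1", "Y", 30, 20)]
log_case2 = log_case1 + [("T1", "Commit")]

# Case 1: crash before <T1, Commit>; the database was never touched.
print(recover_deferred({"X": 25, "Y": 30}, log_case1))
# Case 2: crash after <T1, Commit>; T1's writes are redone from the log.
print(recover_deferred({"X": 25, "Y": 30}, log_case2))
```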
For example, suppose we have a transaction T1 which reads the values of X, Y, and
Z and updates their values as below:
Case 1:
Suppose the transaction fails after logging <T1, Y, 30, 20>. Then T1 has to be rolled back,
undo (T1); i.e., Y should be rolled back to 30 and X to 25.
Case 2:
Suppose the transaction fails as soon as <T1, Commit> is executed. Then T1 has to be
redone, because the log has both start and commit entries. Hence X = 125 and Y = 20.
Redo should not treat the already updated X = 125 as the old value when re-executing: the
old values of X and Y always remain 25 and 30, and the new values are always 125 and 20.
Case 3:
If the failure occurs after executing <T2, Z, 15, 20>, then T2 has to be undone first,
restoring Z = 15, and T1 has to be re-executed; X and Y should have the new values 125
and 20 respectively.
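The three cases can be checked with a small sketch that undoes uncommitted writes (newest first) and then redoes committed writes (oldest first). The tuple layout of the log records, the name `recover_immediate`, and the database contents at the moment of the crash are assumptions for illustration.

```python
def recover_immediate(db, log):
    """Undo writes of uncommitted transactions, then redo writes of committed ones."""
    committed = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "Commit"}
    for rec in reversed(log):                 # undo pass: restore old values
        if len(rec) == 4 and rec[0] not in committed:
            _txn, item, old, _new = rec
            db[item] = old
    for rec in log:                           # redo pass: reapply new values
        if len(rec) == 4 and rec[0] in committed:
            _txn, item, _old, new = rec
            db[item] = new
    return db

# Case 3: T1 committed, T2 wrote Z but did not commit.
log_case3 = [
    ("T1", "Start"), ("T1", "X", 25, 125), ("T1", "Y", 30, 20), ("T1", "Commit"),
    ("T2", "Start"), ("T2", "Z", 15, 20),
]
crashed_db = {"X": 125, "Y": 30, "Z": 20}     # assumed state on disk at the crash
recovered = recover_immediate(dict(crashed_db), log_case3)
print(recovered)
```

Because each record carries both the old and the new value, running the recovery a second time gives the same result, which matters if the system crashes again during recovery.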
Shadow Database: In this method, all the changes made by transactions are applied to a shadow
copy (duplicate copy) of the database. Once the transaction is complete, the DB pointer is
made to point to this shadow database, making it the new copy of the DB. The old copy of the
DB is then deleted.
If there is a failure during the transaction, the pointer still points to the old copy of the
database, and the shadow database is deleted. If the transaction completes, the
pointer is changed to point to the shadow DB, and the old DB is deleted.
As we can see in the above diagram, the DB pointer always points to a consistent and stable
database. This mechanism assumes that there will be no disk failure and that only one
transaction executes at a time, so that the shadow DB can hold the data for that transaction. It
is useful if the DB is comparatively small, because the shadow DB consumes the same amount of
space as the actual DB; hence it is not efficient for huge DBs. In addition, it cannot handle
concurrent execution of transactions. It is suitable for one transaction at a time.
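The pointer switch can be sketched in a few lines. The class `ShadowDatabase` and its `run` method are hypothetical names, and an in-memory dictionary stands in for the database files.

```python
import copy

class ShadowDatabase:
    """One transaction at a time; `current` plays the role of the DB pointer."""

    def __init__(self, data):
        self.current = data

    def run(self, transaction):
        shadow = copy.deepcopy(self.current)   # full duplicate of the DB
        try:
            transaction(shadow)                # all changes go to the shadow copy
        except Exception:
            return False                       # shadow is discarded, pointer unchanged
        self.current = shadow                  # switch the pointer to the shadow copy
        return True

def withdraw_30(data):
    data["balance"] -= 30

def failing_withdrawal(data):
    data["balance"] -= 500
    raise ValueError("insufficient balance")

db = ShadowDatabase({"balance": 100})
print(db.run(withdraw_30), db.current)
print(db.run(failing_withdrawal), db.current)
```

The `deepcopy` call is where the cost noted above shows up: every transaction duplicates the whole database.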
Shadow Paging:
It provides atomicity and durability. A directory with n entries is constructed, where the ith
entry points to the ith database page on disk. When a transaction begins executing, the
current directory is copied into a shadow directory. When a page is to be modified, a shadow
page is allocated in which the changes are made, and when it is ready to become durable, all pages
that refer to the original are updated to refer to the new replacement page.
A page table with n entries is constructed, where the ith page table entry points to the
ith database page on disk.
Write operations: a new copy of the database page is created, and the current page table entry
is modified to point to the new disk page/block.
The state of the database before transaction execution is available through the shadow
page table.
That state is recovered by reinstating the shadow page table as the current page
table once more.
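A minimal copy-on-write sketch of the two page tables follows. The class `ShadowPagedStore` and its method names are hypothetical; a Python list stands in for the disk, and old blocks are never reclaimed.

```python
class ShadowPagedStore:
    """Current and shadow page tables over a list of disk blocks."""

    def __init__(self, pages):
        self.disk = list(pages)                   # disk blocks
        self.current = list(range(len(pages)))    # entry i -> block holding page i
        self.shadow = list(self.current)

    def begin(self):
        self.shadow = list(self.current)          # copy the current table at start

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def write(self, page_no, value):
        if self.current[page_no] == self.shadow[page_no]:
            # First write to this page: allocate a new block, leave the old one intact.
            self.disk.append(value)
            self.current[page_no] = len(self.disk) - 1
        else:
            self.disk[self.current[page_no]] = value

    def commit(self):
        self.shadow = list(self.current)          # the new table becomes the stable one

    def abort(self):
        self.current = list(self.shadow)          # reinstate the shadow page table


store = ShadowPagedStore(["a", "b"])
store.begin()
store.write(0, "A1")
store.abort()
print(store.read(0))    # the original page is still reachable after the abort
```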
Committing a transaction
o The checkpoint is a mechanism by which all the previous logs are removed from the
system and permanently stored on the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as each transaction executes its steps,
log records are created.
o The checkpoint is used to declare a point before which the DBMS was in a consistent
state and all transactions were committed.
o When the checkpoint is reached, the transactions' updates are written to the
database, and the log file up to that point is removed. The
log file is then updated with the new transaction steps until the next checkpoint, and so on.
In the following manner, a recovery system recovers the database from this failure:
Steps:
1. The recovery system maintains two lists, a redo-list and an undo-list, and initializes
both to empty.
2. The recovery system scans the log file backward from the end, stopping when the first
<checkpoint> record is found. E.g. it reads the log records from T4 back to T1.
3. For each record found during the backward scan:
a. If the record is <Ti commit>, add Ti to the redo-list.
b. If the record is <Ti start>, then if Ti is not in the redo-list, add Ti to the undo-list.
At this point, the undo-list consists of incomplete transactions, which must be undone, and the
redo-list consists of finished transactions, which must be redone.
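The backward scan can be sketched as follows, using the T1 to T4 example from this section. The tuple form of the log records, the exact order of records, and the name `build_lists` are assumptions. The sketch also ignores transactions that were active at the checkpoint and never committed, which a full implementation would have to add to the undo-list.

```python
def build_lists(log):
    """Scan the log backward to the most recent checkpoint, filling redo and undo lists."""
    redo, undo = [], []
    for rec in reversed(log):
        if rec == ("checkpoint",):
            break                              # stop at the first checkpoint found
        txn, action = rec
        if action == "commit":
            redo.append(txn)
        elif action == "start" and txn not in redo:
            undo.append(txn)
    return redo, undo

# Assumed log: T1 starts before the checkpoint and commits after it,
# T2 and T3 start and commit after it, T4 starts but never commits.
log = [
    ("T1", "start"), ("checkpoint",),
    ("T2", "start"), ("T1", "commit"), ("T2", "commit"),
    ("T3", "start"), ("T3", "commit"),
    ("T4", "start"),
]
redo_list, undo_list = build_lists(log)
print("redo:", redo_list, "undo:", undo_list)
```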
OR
1. The recovery system maintains two lists, a redo-list and an undo-list, and initializes
both to empty.
2. If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or with just <Tn,
Commit>, the transaction is put into the redo-list. All the transactions in the redo-list
are redone before their logs are saved.
3. For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn,
Commit>. Transaction T1 has only <Tn, Commit> in the log after the checkpoint, because
it started before the checkpoint and committed after the checkpoint was crossed. Hence T1,
T2 and T3 are put into the redo-list.
4. If the recovery system sees a log with <Tn, Start> but no commit or abort record,
the transaction is put into the undo-list. All the transactions in the undo-list are undone,
and their logs are removed.
5. For example: transaction T4 has only <Tn, Start>. So T4 is put into the undo-list, since
this transaction was not yet complete and failed midway.