Concurrency Control Protocol & Recovery


# Concurrency Control
In concurrency control, multiple transactions may execute simultaneously on the same data. Uncontrolled interleaving can produce incorrect results, so it is important to manage the order in which these transactions execute.

Problems of concurrency control

Several problems can occur when concurrent transactions are executed in an uncontrolled manner. The following are the three classic problems of concurrency control:

1. Lost update
2. Dirty read
3. Unrepeatable read

1. Lost update problem

• When two transactions that access the same database item interleave their operations in a way that makes the value of the item incorrect, the lost update problem occurs.
• If two transactions T1 and T2 read a record and then update it, the effect of the first update is overwritten by the second update.

Example:

Here,

• At time t2, Transaction-X reads A's value.
• At time t3, Transaction-Y reads A's value.
• At time t4, Transaction-X writes A's value on the basis of the value it read at time t2.
• At time t5, Transaction-Y writes A's value on the basis of the value it read at time t3.
• So at time t5, the update of Transaction-X is lost, because Transaction-Y overwrites it without looking at its current value.
• This is known as the lost update problem, as the update made by one transaction is lost.
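The timeline above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the text; the initial balance of 100 and the +10/-20 updates are assumed for the example.

```python
# A minimal sketch of the lost update anomaly. The interleaving is
# hard-coded to mirror the timeline above (t2..t5).

balance = {"A": 100}         # shared data item A (hypothetical initial value)

x_seen = balance["A"]        # t2: Transaction-X reads A
y_seen = balance["A"]        # t3: Transaction-Y reads A

balance["A"] = x_seen + 10   # t4: X writes based on the value seen at t2
balance["A"] = y_seen - 20   # t5: Y writes based on the value seen at t3,
                             # silently overwriting X's update

print(balance["A"])          # 80, not the serial result 90: X's update is lost
```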

2. Dirty Read

• A dirty read occurs when one transaction updates a database item and then fails for some reason, while the updated item is accessed by another transaction before it is changed back to its original value.
• A transaction T1 updates a record which is read by T2. If T1 aborts, then T2 holds values which have never formed part of the stable database.

Example:

Here,

• At time t2, Transaction-Y writes A's value.
• At time t3, Transaction-X reads A's value.
• At time t4, Transaction-Y rolls back, changing A's value back to what it was prior to t1.
• Transaction-X is now holding a value which has never become part of the stable database.
• This is known as the dirty read problem, as one transaction reads a dirty value which has not been committed.

3. Inconsistent Retrievals Problem / Unrepeatable Read

• The inconsistent retrievals problem is also known as unrepeatable read. It occurs when a transaction calculates some summary function over a set of data while other transactions are updating that data.
• A transaction T1 reads a record and then does some other processing, during which transaction T2 updates the record. When T1 re-reads the record, the new value will be inconsistent with the previous value.

Example:

Suppose two transactions operate on three accounts.

• Transaction-X computes the sum of all balances while Transaction-Y is transferring an amount of 50 from Account-1 to Account-3.
• Here, Transaction-X produces the result 550, which is incorrect. If this result were written to the database, the database would be in an inconsistent state, because the actual sum is 600.
• Transaction-X has seen an inconsistent state of the database.
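A short Python sketch of this interleaving follows. The per-account balances are assumed (the text only gives the totals of 600 and 550); the transfer is split into its debit and credit steps so the sum can land between them.

```python
# A sketch of the inconsistent retrieval above: Transaction-X sums the
# balances while Transaction-Y's transfer of 50 is only half applied.

accounts = {1: 200, 2: 200, 3: 200}   # assumed balances; actual sum = 600

accounts[1] -= 50                     # Transaction-Y: debit Account-1

# Transaction-X runs its summary query while Y is still in flight
total = sum(accounts.values())

accounts[3] += 50                     # Transaction-Y: credit Account-3

print(total)                          # 550, though the consistent sum is 600
```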

# Concurrency Control Protocols / Implementation Methods of Concurrency Control

Concurrency control protocols ensure the atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be divided into three categories:

1. Lock-based protocols
2. Timestamp-based protocols
3. Validation-based protocols

1. Lock-Based Protocol

In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

• It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
• It can be shared between transactions, because a transaction holding a shared lock cannot update the data item.

2. Exclusive lock:

An exclusive lock is placed when we want to both read and write the data. Once this lock is placed on a data item, no other lock (shared or exclusive) can be placed on it until the exclusive lock is released.

For example, when a transaction wants to update Steve’s account balance, it does so by placing an X lock on it. If a second transaction then wants to read the data (S lock), it is not allowed; if another transaction wants to write the data (X lock), it is not allowed either.

Based on this we can build the following table:

Lock Compatibility Matrix

      |   S   |   X
------+-------+-------
  S   | True  | False
  X   | False | False

How to read this matrix: the first row says that when an S lock is held, another S lock can be acquired (True) but no exclusive lock can (False). The second row says that when an X lock is held, neither an S nor an X lock can be acquired, so both are marked False.
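As a sketch, the matrix can drive a tiny lock manager. This is an illustrative Python fragment, not from the text: it only answers "can this lock be granted right now?" and omits queueing, waiting, and deadlock handling.

```python
# Grant decisions driven by the compatibility matrix above.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        self.held = {}                    # item -> list of (txn, mode)

    def try_lock(self, txn, item, mode):
        for other, held_mode in self.held.get(item, []):
            if other != txn and not COMPATIBLE[(held_mode, mode)]:
                return False              # conflict: requester must wait
        self.held.setdefault(item, []).append((txn, mode))
        return True

    def unlock(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, [])
                           if t != txn]

lm = LockManager()
print(lm.try_lock("T1", "A", "S"))   # True: first S lock on A
print(lm.try_lock("T2", "A", "S"))   # True: S is compatible with S
print(lm.try_lock("T3", "A", "X"))   # False: X conflicts with held S locks
```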

Two-Phase Locking (2PL)

• The two-phase locking protocol divides the execution of a transaction into three parts.
• In the first part, when the transaction starts executing, it seeks permission for the locks it requires.
• In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.
• In the third part, the transaction cannot demand any new locks; it only releases the locks it has acquired.

There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on a data item may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.

If lock conversion is allowed, then the following applies:

1. Upgrading a lock (from S(a) to X(a)) is allowed only in the growing phase.
2. Downgrading a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:

The following shows how locking and unlocking work with 2PL.

Transaction T1:

• Growing phase: steps 1-3
• Shrinking phase: steps 5-7
• Lock point: at step 3

Transaction T2:

• Growing phase: steps 2-6
• Shrinking phase: steps 8-9
• Lock point: at step 6
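The two-phase rule itself is easy to enforce in code. The sketch below is illustrative (class and method names are invented): once a transaction releases any lock it enters its shrinking phase, and any later lock request is rejected.

```python
# A sketch of enforcing the 2PL rule within a single transaction.
# Conflicts between transactions are not modeled here.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False        # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation - cannot "
                               "acquire a lock in the shrinking phase")
        self.locks.add(item)          # growing phase: acquire freely

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True         # the lock point has passed

t1 = TwoPhaseTxn("T1")
t1.lock("A"); t1.lock("B")            # growing phase
t1.unlock("A")                        # shrinking phase begins
try:
    t1.lock("C")                      # violates the two-phase rule
except RuntimeError as e:
    print(e)
```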

Problems with Two-Phase Locking

2PL ensures serializability, but it still has some drawbacks:

• Cascading rollback is possible under 2PL.
• Deadlock and starvation are possible.

Cascading Rollbacks in 2PL

Consider the following schedule:

Take a moment to analyze the schedule. Because of the dirty reads in T2 and T3 (lines 8 and 12 respectively), when T1 fails we have to roll back the others as well. Hence cascading rollbacks are possible in 2PL. Skeleton schedules are used as examples because they are easy to understand when kept simple; explained with real transaction workloads involving many variables, this becomes very complex.

Deadlock in 2PL

Consider this simple example with two transactions T1 and T2.

Schedule: Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)

Drawing the wait-for graph, you can detect the cycle: T1 waits for T2 to release B while T2 waits for T1 to release A. So deadlock is also possible in 2PL.
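This cycle check is mechanical. The sketch below (illustrative, not from the text) builds the wait-for graph for the schedule above and detects the cycle with a depth-first search.

```python
# Deadlock detection via a cycle search over the wait-for graph.

def has_cycle(graph):
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True               # back edge: a cycle exists
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(succ) for succ in graph.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in list(graph))

# Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A) yields these wait edges:
wait_for = {"T1": ["T2"],   # T1 waits for T2 to release B
            "T2": ["T1"]}   # T2 waits for T1 to release A
print(has_cycle(wait_for))  # True: deadlock
```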

Two-phase locking may also limit the amount of concurrency in a schedule, because a transaction may not be able to release an item immediately after using it. This follows from the protocol and the other restrictions we may put on the schedule to ensure serializability, deadlock freedom, and other properties. It is the price we pay for those guarantees; 2PL is thus a trade-off between concurrency and maintaining the ACID properties.

The above type of 2PL is called basic 2PL. To sum up, it ensures conflict serializability but does not prevent cascading rollback or deadlock.

Strict Two-Phase Locking (Strict 2PL)

• The first phase of strict 2PL is the same as in 2PL: after acquiring all its locks, the transaction continues to execute normally.
• The difference is that strict 2PL does not release a lock after using it.
• Strict 2PL waits until the whole transaction commits, and only then releases all the locks at once.
• Strict 2PL therefore has no gradual shrinking phase of lock release.

Unlike basic 2PL, it does not suffer from cascading aborts.

Pitfalls of Lock-Based Protocols

Consider two transactions T3 and T4, where T3 holds a lock on B and T4 holds a lock on A:

• Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
• Such a situation is called a deadlock. To handle a deadlock, one of T3 and T4 must be rolled back and its locks released.
• The possibility of deadlock exists in most locking protocols; deadlocks are a necessary evil.

Starvation is also possible if the concurrency control manager is badly designed:

• A transaction may be waiting for an X lock on an item while a sequence of other transactions request and are granted an S lock on the same item.
• The concurrency control manager can be designed to prevent starvation.

Timestamp Ordering Protocol

• The timestamp ordering (TO) protocol orders transactions based on their timestamps: the order of transactions is simply the ascending order of their creation times.
• An older transaction has higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
• A lock-based protocol manages the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
• Suppose transaction T1 entered the system at time 007 and transaction T2 at time 009. T1 has the higher priority, so it executes first, because it entered the system first.
• The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operation on each data item.

The basic timestamp ordering protocol works as follows:

1. Whenever a transaction Ti issues a Read(X) operation, check the following condition:

• If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
• If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to the maximum of R_TS(X) and TS(Ti).

2. Whenever a transaction Ti issues a Write(X) operation, check the following condition:

• If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
• If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is executed and W_TS(X) is set to TS(Ti).

Where,

TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
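These checks translate directly into code. The following is a minimal sketch under assumed names (DataItem, Rollback); timestamps are plain integers and rollback is signalled with an exception.

```python
# A sketch of the basic TO checks. Each data item carries R_TS and W_TS.

class Rollback(Exception):
    pass

class DataItem:
    def __init__(self, value):
        self.value, self.r_ts, self.w_ts = value, 0, 0

def read(ts_ti, item):
    if item.w_ts > ts_ti:                  # a younger txn already wrote X
        raise Rollback("read rejected: roll back Ti")
    item.r_ts = max(item.r_ts, ts_ti)      # record the latest read
    return item.value

def write(ts_ti, item, value):
    if ts_ti < item.r_ts or ts_ti < item.w_ts:
        raise Rollback("write rejected: roll back Ti")
    item.value, item.w_ts = value, ts_ti

x = DataItem(25)
write(7, x, 100)          # T1 (TS=7) writes X: allowed, W_TS(X)=7
print(read(9, x))         # T2 (TS=9) reads X: allowed, R_TS(X)=9
try:
    write(7, x, 200)      # T1 writes again: TS(T1)=7 < R_TS(X)=9
except Rollback as e:
    print(e)              # rejected, T1 must roll back
```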

Advantages and disadvantages of the TO protocol:

• The TO protocol ensures conflict serializability, since every edge in the precedence graph points from an older transaction to a younger one, so the graph can contain no cycle.
• The TO protocol ensures freedom from deadlock, since no transaction ever waits.
• But the schedule may not be recoverable, and may not even be cascade-free.

Database Recovery:

Database recovery is the process of restoring the database to a correct state after a failure during or after a transaction. Recovery is a service that every DBMS should provide to ensure that the database is dependable and remains in a consistent state in the presence of failures. In this context, dependability refers both to the resilience of the DBMS to various kinds of failure and to its ability to recover from them. This chapter gives a brief overview of how this service can be provided.

Database Failure Classification

When a transaction is being executed in the system, it may fail for various reasons: a system program, a bug in a program, the user, or a system crash. These failures can be broadly classified into three categories.

• Transaction Failure: This type of failure affects only a few tables or processes. It is the condition in which a transaction cannot proceed further. The failure may be caused by the user or by the executing program: the user may cancel the transaction while it is executing by pressing a cancel button or issuing an abort command, or the transaction may fail because it violates constraints on the tables. It can even fail when concurrent transactions compete for insufficient resources or end up in a deadlock. All of these cause the transaction to stop in the middle of its execution, leaving the database partially changed, so it must be rolled back to the previous consistent state. In the ATM withdrawal example, if the user cancels the transaction after step (i), the system should be able to stop further processing of the transaction; if he cancels it after step (ii), the system should be robust enough to restore the balance in his account. The system may also cancel the transaction itself, for example due to insufficient balance. The underlying causes are errors in the code (logical errors) or system errors such as deadlock or unavailability of system resources.

The reasons for transaction failure are:

• Logical errors
• System errors

1. Logical errors: The transaction cannot complete because of a code error or an internal error condition.
2. System errors: The database system itself terminates an active transaction because the DBMS is unable to execute it, or has to stop it because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.

• System Crash: This can be caused by hardware or software failure or by external factors such as a power failure, for example a bug in the software or a failure of the system processor. Such a crash mainly affects the data in primary memory. If only primary memory is affected, the actual data is not really harmed and recovery from this failure is easy: primary memory is temporary storage, and its contents would not yet have updated the actual database, so the system remains in the consistent state it was in before the transaction. But when secondary memory crashes there can be loss of data, and serious action is needed to recover it, because secondary memory contains the actual DB data. Recovering from such a crash is more tedious and requires more effort. The DB recovery system provides strong mechanisms to recover from a crash while maintaining the atomicity of transactions. In most cases the data in secondary memory is not affected by a system crash, because the database keeps many integrity checkpoints to prevent data loss in secondary memory.

• Disk Failure: These are problems with hard disks, such as the formation of bad sectors, a disk head crash, or unavailability of the disk. Data can also be lost through fire, flood, theft, and so on. Disk failure mainly affects the secondary memory where the actual data lies. In these cases we need alternative ways of storing the DB: we can create backups on a regular basis and store them separately from the memory where the DB resides, or maintain multiple copies of the DB at different network locations to recover from failure.

Recovery Mechanism

Every DBMS should offer the following facilities to help with the recovery mechanism:

• A backup mechanism, which makes backup copies of the database at specific intervals.
• Logging facilities, which keep track of the current state of transactions and any changes made to the database.
• A checkpoint facility, which enables in-progress updates to the database to be made permanent.
• A recovery manager, which allows the database system to restore the database to a reliable and steady state after a failure occurs.

Database Recovery Methods/Schemes

1. Log-based recovery
2. Shadow paging

1. Log-Based Recovery

In this method, a log of each transaction is maintained in some stable storage, so that in case of any failure the database can be recovered using it. The logs must be stored before the actual transaction is applied to the database.

Each log record contains information such as which transaction is executing, which value was modified from what to what, and the state of the transaction. All log records are stored in the order of execution.

Suppose there is a transaction that modifies the address of a student. Let us see what log records are written for this transaction.

• As soon as the transaction is initiated, it writes a 'start' log record.

<Tn, Start>

• When the transaction modifies the address from 'Troy' to 'Fraser Town', another record is written to the log.

<Tn, ADDRESS, ‘Troy’, ‘Fraser Town’>

• When the transaction is completed, it writes another record to indicate the end of the transaction.

<Tn, Commit>
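As a sketch, the sequence above can be produced by a tiny write-ahead helper. The function name and the in-memory "stable storage" list are illustrative assumptions; the point is only that the log records are appended before the database value changes.

```python
# A sketch of write-ahead logging for the address update above.

log = []                              # stands in for stable storage

def logged_update(db, txn, item, new_value):
    log.append((txn, "Start"))
    log.append((txn, item, db[item], new_value))   # <Tn, item, old, new>
    db[item] = new_value              # DB touched only after logging
    log.append((txn, "Commit"))

db = {"ADDRESS": "Troy"}
logged_update(db, "Tn", "ADDRESS", "Fraser Town")
for record in log:
    print(record)
# ('Tn', 'Start')
# ('Tn', 'ADDRESS', 'Troy', 'Fraser Town')
# ('Tn', 'Commit')
```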

To recover the DB, two operations have to be performed:

1. Undo 2. Redo

1. Undo

An undo operation has to be executed for all transactions that were not committed at the time of the failure: all uncommitted transactions have to be rolled back. That is, if the log file contains <T1, Start> but no <T1, Commit>, then that transaction has to be rolled back, or undone.

2. Redo

All committed transactions have to be redone during recovery. These transactions had already executed before the failure, and their effects have to be retained in the DB. That is, if the log file contains both <T1, Start> and <T1, Commit>, then that transaction has to be re-executed.
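The classification itself is a simple scan over the log. The sketch below reuses the tuple-shaped records from the earlier fragment; names are illustrative.

```python
# Classify transactions into redo and undo sets by scanning the log.

def classify(log):
    started, committed = set(), set()
    for record in log:
        txn, kind = record[0], record[1]
        if kind == "Start":
            started.add(txn)
        elif kind == "Commit":
            committed.add(txn)
    redo = committed                 # <Ti, Start> ... <Ti, Commit>
    undo = started - committed       # <Ti, Start> with no <Ti, Commit>
    return redo, undo

log = [("T1", "Start"), ("T1", "X", 25, 125), ("T1", "Commit"),
       ("T2", "Start"), ("T2", "Z", 15, 20)]    # T2 never committed
redo, undo = classify(log)
print("redo:", redo)   # {'T1'}
print("undo:", undo)   # {'T2'}
```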

There are two methods of creating these log files and updating the database:

a. Deferred database modification b. Immediate database modification

a. Deferred database modification

When updates are not written to the database until after the transaction has reached its commit point, the scheme is called deferred database modification.

In this method, all the log records for the transaction are created and stored in stable storage first. Only once they are stored is the database updated with the changes. That is, the log is created and stored in stable memory first; the actual DB changes are then made by consulting the log.

For example, suppose we have a transaction T1 which reads the values of X and Y and updates them as below.

Let the initial values of X and Y be 25 and 30. Deferred database modification first logs all the steps into the log file. It does not update the database as soon as WRITE(X) is logged; it waits until all the log records for T1 are written to the log file. Once that is done, it reads the log file and updates the database.

If there is a crash, the data is recovered by checking these logs. If the log contains both <Tn, Start> and <Tn, Commit> for a transaction, then that transaction has to be re-executed. But the crash can happen while the log files are being written, or while the database itself is being updated.

Suppose that in addition to T1 above we have another transaction T2, which updates the value of Z.

It is clear that T1 is applied to the DB when <T1, Commit> is reached in the log file, and T2 is applied when <T2, Commit> is reached.

Case 1:
Suppose the system fails while writing the log records. If it fails at step 3, <T1, Y, 30, 20>, then there is no need to redo T1, because the log file does not yet contain a commit record for T1.

Case 2:
If it fails after logging <T1, Commit>, then T1 has to be re-executed.

Case 3:
If it fails after <T2, Commit>, then both T1 and T2 have to be re-executed.

b. Immediate database modification

In this method, the database is updated as soon as the log record is created.

Consider the same example with X, Y and Z: suppose we have a transaction T1 which reads the values of X and Y and updates them, followed by T2 which updates Z, as below.

Case 1:
Suppose the transaction fails after logging <T1, Y, 30, 20>. Then T1 has to be rolled back, i.e. undo(T1): roll Y back to 30 and X back to 25.

Case 2:
Suppose the transaction fails just after <T1, Commit> is written. Then T1 has to be redone, because the log has both start and commit entries. Hence X = 125 and Y = 20 after the redo. The old values should not be used while re-executing: the old values of X and Y always remain 25 and 30, and the new values are always 125 and 20.

Case 3:
If the transaction fails after executing <T2, Z, 15, 20>, then T2 has to be undone first, restoring Z = 15, and T1 has to be redone; X and Y end up with the new values 125 and 20 respectively.
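A sketch of the undo and redo operations for these cases follows, using the before- and after-images from the log records above. Function names are illustrative.

```python
# Undo restores old values scanning backwards; redo applies new values
# scanning forwards. Update records have the shape (txn, item, old, new).

def undo(db, log, txn):
    for record in reversed(log):
        if record[0] == txn and len(record) == 4:
            _, item, old, _new = record
            db[item] = old               # restore the before-image

def redo(db, log, txn):
    for record in log:
        if record[0] == txn and len(record) == 4:
            _, item, _old, new = record
            db[item] = new               # reapply the after-image

log = [("T1", "Start"), ("T1", "X", 25, 125), ("T1", "Y", 30, 20)]
db = {"X": 125, "Y": 20}                 # state at the point of failure

undo(db, log, "T1")                      # Case 1: no <T1, Commit> in log
print(db)                                # {'X': 25, 'Y': 30}

redo(db, log, "T1")                      # Case 2: log had start and commit
print(db)                                # {'X': 125, 'Y': 20}
```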

2. Shadow Paging

Shadow database: In this approach, all the changes made by a transaction are applied to a shadow (duplicate) copy of the database. Once the transaction is complete, the DB pointer is switched to point to this shadow copy, making it the new current copy of the DB. The old copy is then deleted.

If there is a failure during the transaction, the pointer still points to the old copy of the database, and the shadow copy is simply deleted. If the transaction completes, the pointer is changed to point to the shadow DB and the old DB is deleted.

Either way, the DB pointer always points to a consistent and stable database. This mechanism assumes that there will be no disk failure and that only one transaction executes at a time, so that the shadow DB can hold the data for that transaction. It works well only if the DB is comparatively small, because the shadow DB consumes as much storage as the actual DB; hence it is not efficient for huge DBs. In addition, it cannot handle concurrent execution of transactions; it is suitable for one transaction at a time.

Shadow paging:

Shadow paging provides atomicity and durability. A directory (page table) with n entries is constructed, where the ith entry points to the ith database page on disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a new page is allocated and the changes are made there; when the transaction is ready to become durable, all entries that refer to the original page are updated to refer to the new replacement page.

Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute serially.

Data is not updated in place.

• For recovery purposes, the database is considered to be made up of n fixed-size disk blocks or pages.
• A page table with n entries is constructed, where the ith page table entry points to the ith database page on disk.
• The current page table points to the most recent, current database pages.

When a transaction begins executing:

• The current page table is copied into a shadow page table.
• The shadow page table is then saved to disk.
• The shadow page table is never modified during transaction execution.
• On a write operation, a new copy of the database page is created and the current page table entry is modified to point to the new disk page/block.

To recover from a failure:

• The state of the database before transaction execution is available through the shadow page table.
• Free the modified pages and discard the current page table.
• That state is recovered by reinstating the shadow page table as the current page table once more.

To commit a transaction:

• Discard the previous shadow page table.
• Free the old pages that it references.
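The copy-on-write discipline is easy to see in a small sketch. Everything here (the dicts standing in for disk pages and page tables) is illustrative; the key invariant is that pages reachable from the shadow table are never overwritten.

```python
# A sketch of shadow paging with an in-memory "disk".

pages = {0: "data-0", 1: "data-1"}    # disk page id -> contents
next_free = 2                         # next unused disk page id

shadow_table = {0: 0, 1: 1}           # logical page -> disk page (frozen)
current_table = dict(shadow_table)    # copied when the transaction begins

def txn_write(logical, value):
    """Copy-on-write: never touch a page the shadow table points to."""
    global next_free
    pages[next_free] = value          # write the new value to a fresh page
    current_table[logical] = next_free
    next_free += 1

txn_write(0, "data-0-v2")

# Crash before commit: recovery just reinstates the shadow table.
print(pages[shadow_table[0]])         # data-0  (old, consistent state)

# Commit: atomically adopt the current table as the new shadow table.
shadow_table = current_table
print(pages[shadow_table[0]])         # data-0-v2
```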

# Recovery using Checkpoint


Checkpoint

o The checkpoint is a mechanism by which all the earlier log records are removed from the system and permanently stored on the storage disk.

o A checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked, and as the transactions execute, log records are created for their steps.

o A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

o When a checkpoint is reached, the transactions' updates are written into the database, and the log file up to that point can be removed. The log file is then updated with the steps of new transactions until the next checkpoint, and so on.

A recovery system recovers the database from a failure in the following manner:

Steps:

1. The recovery system maintains two lists: a redo-list and an undo-list. Initialize both lists to empty.

2. The recovery system scans the log file backwards from the end, stopping when the first <checkpoint L> record is found (e.g. it reads the log records from T4 back to T1). For each record found during the backward scan:

a. If the record is <Ti commit>, then add Ti to the redo-list.

b. If the record is <Ti start>, then if Ti is not in the redo-list, add Ti to the undo-list.

3. For every Ti in L (the list of transactions active at the checkpoint), if Ti is not in the redo-list, add Ti to the undo-list.

At this point, the undo-list consists of incomplete transactions which must be undone, and the redo-list consists of finished transactions which must be redone.
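A compact sketch of this backward scan follows. The log layout (tuples, with a ("checkpoint", L) record carrying the active-transaction list L) is an assumption made for illustration.

```python
# Build the redo-list and undo-list by scanning the log backwards
# to the most recent checkpoint record.

def recover(log):
    redo, undo = set(), set()
    for record in reversed(log):
        if record[0] == "checkpoint":
            for ti in record[1]:          # transactions active at checkpoint
                if ti not in redo:
                    undo.add(ti)
            break                         # stop at the first checkpoint
        txn, kind = record[0], record[1]
        if kind == "Commit":
            redo.add(txn)
        elif kind == "Start" and txn not in redo:
            undo.add(txn)
    return redo, undo

log = [("T1", "Start"), ("T2", "Start"),
       ("checkpoint", ["T1", "T2"]),      # T1 and T2 were active here
       ("T1", "Commit"),
       ("T3", "Start"), ("T3", "Commit"),
       ("T4", "Start")]                   # T4 never commits
redo, undo = recover(log)
print("redo:", redo)   # {'T1', 'T3'}
print("undo:", undo)   # {'T2', 'T4'}
```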

Alternatively:

1. The recovery system maintains two lists, a redo-list and an undo-list. Initialize both to empty.

2. If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or with just <Tn, Commit>, the transaction is put into the redo-list. All transactions in the redo-list are then redone using their saved log records.

3. For example: in the log file, transactions T2 and T3 will have both <Tn, Start> and <Tn, Commit>. Transaction T1 will have only <Tn, Commit> in the log file, because it started before the checkpoint and committed after the checkpoint was crossed. Hence T1, T2 and T3 are put into the redo-list.

4. If the recovery system sees a log with <Tn, Start> but no commit or abort record, the transaction is put into the undo-list. All transactions in the undo-list are undone, and their log records are removed.

5. For example: transaction T4 will have only <Tn, Start>, so T4 is put into the undo-list, since it is not yet complete and failed midway.
