Unit - 5 Concurrency Control Techniques
write (B);
unlock (B);
lock-X (A);
read (A);
A := A + 50;
write (A);
unlock (A);
T2 : lock-S (A);
read (A);
unlock (A);
lock-S (B);
read (B);
unlock (B);
5.1.2 The Two-Phase Locking Protocol
A transaction is said to follow the two-phase locking protocol if all locking operations precede the
first unlock operation in the transaction.
This protocol is divided into two phases:
• Growing Phase: A transaction may obtain new locks, but may not release any lock.
• Shrinking Phase: A transaction may release locks, but may not obtain any new lock.
Conditions of the Two-Phase Locking Protocol:
• A transaction starts in the growing phase and obtains locks as needed. Once the
transaction releases a lock, it enters the shrinking phase and can issue no more lock requests.
There are various versions of the two-phase locking protocol.
(1) Dynamic two-phase locking: Here a transaction locks a data item immediately before any
operation is applied to the data item. After finishing all the operations on all data items, it releases all
the locks.
Example:
T : lock-X(A);
read (A);
A := A - 50;
write (A);
lock-X(B);
read (B);
B := B + 50;
write (B);
unlock (A);
unlock (B);
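The two-phase condition can be checked mechanically. The Python sketch below is a minimal illustration (the `(action, item)` encoding and the `is_two_phase` name are assumptions, not part of the protocol): it scans a transaction's operations and reports whether any lock is requested after the first unlock, using transaction T above and T2 from the start of this unit.

```python
# Minimal sketch: test whether one transaction's operation sequence obeys the
# two-phase rule (no lock request after the first unlock). The (action, item)
# tuple encoding is a hypothetical representation chosen for illustration.

LOCK_ACTIONS = {"lock-S", "lock-X"}


def is_two_phase(ops):
    """Return True if every lock operation precedes the first unlock."""
    shrinking = False                 # becomes True at the first unlock
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # growing phase is over
        elif action in LOCK_ACTIONS and shrinking:
            return False              # lock requested in the shrinking phase
    return True


# Transaction T from the dynamic 2PL example: both locks precede any unlock.
t_dynamic = [
    ("lock-X", "A"), ("read", "A"), ("write", "A"),
    ("lock-X", "B"), ("read", "B"), ("write", "B"),
    ("unlock", "A"), ("unlock", "B"),
]

# T2 from the start of this unit: unlock (A) is followed by lock-S (B).
t2 = [
    ("lock-S", "A"), ("read", "A"), ("unlock", "A"),
    ("lock-S", "B"), ("read", "B"), ("unlock", "B"),
]

print(is_two_phase(t_dynamic))  # True
print(is_two_phase(t2))         # False
```

Only the relative order of lock and unlock operations matters here; reads and writes are ignored by the check.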
(2) Static (Conservative) two-phase locking:
In this scheme, all the data items are locked before any operation is performed on them, and the locks are released only
after the last operation has been performed on any data item.
Example:
T : lock-X(B);
lock-X(A);
read (B);
B := B - 50;
182 DATABASE MANAGEMENT SYSTEMS
write (B);
read (A);
A := A + 50;
write (A);
unlock (B);
unlock (A);
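Conservative 2PL can be sketched as an all-or-nothing request: the transaction predeclares its read set and write set, and the lock manager grants either every lock or none. The `LockTable` class below is a hypothetical illustration in Python, not a real lock manager; it has no waiting queue, and a `False` result simply means the transaction must try again later while holding nothing.

```python
# Minimal sketch of conservative (static) 2PL lock acquisition, assuming each
# transaction predeclares the items it will read and write.
# LockTable and its methods are hypothetical names used only for illustration.


class LockTable:
    def __init__(self):
        self.shared = {}      # item -> set of transaction ids holding S locks
        self.exclusive = {}   # item -> transaction id holding the X lock

    def _can_grant(self, tid, item, mode):
        holder = self.exclusive.get(item)
        if holder is not None and holder != tid:
            return False                          # another transaction holds X
        if mode == "X":
            readers = self.shared.get(item, set()) - {tid}
            return not readers                    # X needs no other S holders
        return True

    def acquire_all(self, tid, read_set, write_set):
        """Grant every predeclared lock at once, or none of them."""
        requests = [(i, "X") for i in write_set] + \
                   [(i, "S") for i in read_set if i not in write_set]
        if not all(self._can_grant(tid, i, m) for i, m in requests):
            return False                          # nothing is held while waiting
        for item, mode in requests:
            if mode == "X":
                self.exclusive[item] = tid
            else:
                self.shared.setdefault(item, set()).add(tid)
        return True

    def release_all(self, tid):
        """Release every lock held by tid after its last operation."""
        for item in [i for i, h in self.exclusive.items() if h == tid]:
            del self.exclusive[item]
        for holders in self.shared.values():
            holders.discard(tid)


table = LockTable()
print(table.acquire_all("T", read_set=set(), write_set={"A", "B"}))   # True
print(table.acquire_all("U", read_set={"A"}, write_set=set()))        # False
table.release_all("T")
print(table.acquire_all("U", read_set={"A"}, write_set=set()))        # True
```

Because a transaction never holds some locks while waiting for others, conservative 2PL cannot deadlock; the price is that every data item must be known before the transaction starts.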
(3) Strict two-phase locking: Strict two-phase locking (strict 2PL) is a locking method used
in concurrent systems.
The two rules of strict 2PL are:
(i) If transaction T wants to read/write an object, it must request a shared/exclusive lock on the
object.
(ii) All exclusive locks held by transaction T are released when T commits or aborts (and not
before).
T1 T2
Lock - S (A)
read (A)
unlock (A)
Lock - S (A)
read (A)
Lock - X (B)
read (B)
write (B)
unlock (A)
unlock (B)
Note: Strict 2PL prevents transactions from reading uncommitted data, overwriting uncommitted data,
and making unrepeatable reads. Thus it prevents cascading rollbacks, since exclusive locks must be held until a
transaction commits.
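The following Python sketch illustrates the strict 2PL release rule. `StrictLockManager`, `lock`, `unlock` and `finish` are hypothetical names; `finish` stands for either commit or abort, and a `False` return from `lock` means the caller would have to wait.

```python
# Minimal sketch of the strict 2PL release rule: S locks may be released early
# (the transaction then enters its shrinking phase), but X locks are released
# only at commit or abort. The class and method names are hypothetical.


class StrictLockManager:
    def __init__(self):
        self.table = {}          # item -> (mode, frozenset of holding transactions)
        self.shrinking = set()   # transactions that have already released a lock

    def lock(self, tx, item, mode):
        """Grant an S or X lock, or return False if tx must wait."""
        if tx in self.shrinking:
            raise RuntimeError(f"{tx}: lock after unlock violates 2PL")
        held, holders = self.table.get(item, ("S", frozenset()))
        others = holders - {tx}
        if others and (mode == "X" or held == "X"):
            return False                          # incompatible with another holder
        new_mode = "X" if "X" in (mode, held) else "S"
        self.table[item] = (new_mode, holders | {tx})
        return True

    def unlock(self, tx, item):
        """Early release is allowed only for shared locks."""
        mode, _ = self.table[item]
        if mode == "X":
            raise RuntimeError(f"{tx}: X lock on {item} must be held until commit")
        self._release(tx, item)
        self.shrinking.add(tx)

    def finish(self, tx):
        """Commit or abort: release every lock tx still holds."""
        for item in [i for i, (_, h) in self.table.items() if tx in h]:
            self._release(tx, item)
        self.shrinking.add(tx)

    def _release(self, tx, item):
        mode, holders = self.table[item]
        remaining = holders - {tx}
        if remaining:
            self.table[item] = (mode, remaining)
        else:
            del self.table[item]
```

For example, after T1 obtains lock-X (B), a request by T2 for lock-S (B) returns `False` until `finish("T1")` runs, so T2 can never read T1's uncommitted value of B.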
(4) Rigorous two-phase locking: This variant requires that all locks be held until the transaction
commits. It is easy to verify that, with rigorous two-phase locking, transactions can be serialized in
the order in which they commit.
T1                      T2
lock-X (A)
read (A)
A := A + 50
write (A)
unlock (A)
                        lock-X (A)
                        read (A)
                        temp := A * 0.3
                        A := A + temp
                        write (A)
                        unlock (A)
CONCURRENCY CONTROL TECHNIQUES 183
Fig. Read timestamp (R-TS) and write timestamp (W-TS) of a data item.
else
{
return REJECT;
else
return REJECT;
}
Ex. 1. T1 : read (B);
read (A);
display (A + B);
T2 : read (B);
B := B - 50;
write (B);
read (A);
A := A + 50;
display (A + B);
In the schedules under the timestamp protocol, we shall assume that a transaction is assigned a
timestamp immediately before its first instruction.
T1            T2
read (B)
              read (B)
              B := B - 50
              write (B)
Thus, in this schedule TS(T1) < TS(T2), and the schedule is possible under the timestamp protocol.
Ex. 2. T1 : W(X), T2 : R(X), T3 : W(X)
• T1 starts and is given TS 10. Then T2 starts and is given TS 11.
• A while later, T3 starts and is given TS 12.
Action                Value of R-TS(X)    Value of W-TS(X)
First, T1 writes X    0                   10
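Only the tail of the timestamp-ordering pseudocode survives above. The Python sketch below assumes the standard basic timestamp-ordering rules (a read (X) by T is rejected if TS(T) < W-TS(X); a write (X) by T is rejected if TS(T) < R-TS(X) or TS(T) < W-TS(X)) and replays Ex. 2 with R-TS and W-TS starting at 0. The `Item`, `read` and `write` names are illustrative.

```python
# Minimal sketch of basic timestamp ordering for one data item, assuming the
# standard rules; R-TS and W-TS start at 0. Names are illustrative only.

ACCEPT, REJECT = "ACCEPT", "REJECT"


class Item:
    def __init__(self):
        self.r_ts = 0   # largest TS of any transaction that has read the item
        self.w_ts = 0   # TS of the transaction that last wrote the item


def read(item, ts):
    if ts < item.w_ts:
        return REJECT                  # value already overwritten by a younger tx
    item.r_ts = max(item.r_ts, ts)
    return ACCEPT


def write(item, ts):
    if ts < item.r_ts or ts < item.w_ts:
        return REJECT                  # a younger tx already read or wrote the item
    item.w_ts = ts
    return ACCEPT


# Ex. 2: T1 (TS 10) writes X, T2 (TS 11) reads X, T3 (TS 12) writes X.
x = Item()
for op, ts in [(write, 10), (read, 11), (write, 12)]:
    result = op(x, ts)
    print(op.__name__, ts, result, "R-TS =", x.r_ts, "W-TS =", x.w_ts)
```

All three operations are accepted, leaving R-TS(X) = 11 and W-TS(X) = 12; a later write of X by a transaction with TS 11 would then be rejected because 11 < W-TS(X).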
(2) Validation (T) : The time when transaction T finished its read phase and started its validation
phase.
(3) Finish (T) : The time when transaction T finished its write phase.
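The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj),
one of the following two conditions must hold:
(i) Finish (Ti) < Start (Tj)
Since Ti completes its execution before Tj starts, the serializability order is maintained.
(ii) The set of data items written by Ti does not intersect the set of data items read by Tj, and
Start (Tj) < Finish (Ti) < Validation (Tj)
This condition ensures that the writes of Ti and Tj do not overlap. Since the writes of Ti do not
affect the reads of Tj, and Tj cannot affect the reads of Ti, the serializability order is maintained.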
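The validation test can be written as a small Python function. This is a sketch under assumptions: the `Tx` record and its fields are hypothetical, `finish` is `None` while Ti has not finished its write phase, and condition (ii) is checked in its usual full form, which also requires that the set of items written by Ti does not intersect the set of items read by Tj.

```python
# Minimal sketch of the optimistic validation test for Tj against older Ti.
# Tx and its fields are hypothetical; finish is None until Ti completes its
# write phase.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tx:
    ts: int
    start: int
    validation: int
    finish: Optional[int] = None
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(tj, others):
    """Return True if Tj passes validation against every older Ti."""
    for ti in others:
        if ti.ts >= tj.ts:
            continue                                  # only older transactions matter
        if ti.finish is not None and ti.finish < tj.start:
            continue                                  # condition (i)
        if (ti.finish is not None
                and tj.start < ti.finish < tj.validation
                and not (ti.write_set & tj.read_set)):
            continue                                  # condition (ii)
        return False                                  # neither holds: roll back Tj
    return True


# Read-only T1 finished (time 7) while T2 was running; T1 wrote nothing,
# so T2 passes condition (ii).
t1 = Tx(ts=1, start=1, validation=5, finish=7, read_set={"A", "B"})
t2 = Tx(ts=2, start=2, validation=8, read_set={"A", "B"}, write_set={"A", "B"})
print(validate(t2, [t1]))  # True
```

If the older transaction had instead written B while the younger one read B, the write set and read set would intersect and the younger transaction would be rolled back.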
Example:
read (B)
read (B)
B := B - 50
read (A)
A := A + 50
read (A)
<validate>
display (A + B)
<validate>
write (B)
write (A)
Advantages of Optimistic Methods
The optimistic concurrency control has the following advantages :
• This technique is very efficient when conflicts are rare. Occasional conflicts result only in the
rollback of the conflicting transaction.
• The rollback involves only the local copy of data, the database is not involved and thus there
will not be any cascading rollbacks.
Problems of Optimistic Methods
The optimistic concurrency control suffers from the following problems:
• Conflicts are expensive to deal with, since the conflicting transaction must be rolled back.
• Longer transactions are more likely to have conflicts and may be repeatedly rolled back
because of conflicts with short transactions.
Applications of Optimistic Methods
• Only suitable for environments where there are few conflicts and no long transactions.
• Acceptable for mostly Read or Query database systems that require very few update
transactions.
5.4 MULTIPLE GRANULARITY LOCKING
• In computer science, multiple granularity locking, sometimes called the 'John Rayner' locking
method, is a locking method used in DBMS & RDBMS.
• In multiple granularity locking (MGL), locks are set on objects that contain other objects. MGL exploits the
hierarchical nature of the "contains" relationship.
• A data item could be chosen to be one of the following:
(i) A database record.
(ii) A field value of a database record.
(iii) A whole file.
(iv) The whole database.
(v) A disk block.
The size of data items is often called the data item granularity.
Fine granularity refers to small item sizes, whereas coarse granularity refers to large item sizes.
Example: A database may have files, which contain pages, which further contain records. This
can be thought of as a tree of objects, where each node contains its children. A lock on a node locks the node and
all its descendants.
Fig. Granularity hierarchy.
• Multiple granularity locking is usually used with non-strict two-phase locking to guarantee
serializability. A lock can be requested at any level of the hierarchy.
There are three types of intention locks:
(1) Intention-shared (IS) indicates that a shared lock (S) will be requested on some descendant
node(s).
(2) Intention-exclusive (IX) indicates that an exclusive lock (X) will be requested on some
descendant node(s).
(3) Shared-intention-exclusive (SIX) indicates that the current node is locked in shared mode
but an exclusive lock (X) will be requested on some descendant node(s).
IS IX S SIX X
IS Yes Yes Yes Yes No
IX Yes Yes No No No
S Yes No Yes No No
SIX Yes No No No No
X No No No No No
To lock a node in S (or X) mode, MGL requires the transaction to first lock all of the node's ancestors in IS (or IX)
mode. So if a transaction locks a node in S (or X) mode, no other transaction can lock its ancestors in X (or S or X)
mode.
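The compatibility matrix and the ancestor rule above can be encoded directly. In this Python sketch the node paths (such as `("DB", "F1", "R1")`) and the function names are hypothetical; `lock_plan` lists the locks a transaction requests from the root down, and `conflicts` reports which of them clash with modes already granted to other transactions.

```python
# Sketch of multiple granularity locking: the compatibility matrix above plus
# the rule that ancestors are locked in IS/IX mode before a node is locked in
# S/X mode. Node paths and function names are hypothetical.

COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}


def lock_plan(path, mode):
    """Locks to request, root first, so that path[-1] is locked in S or X mode."""
    intention = "IS" if mode == "S" else "IX"   # S needs IS above, X needs IX above
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


def conflicts(plan, held):
    """held maps node -> list of modes already granted to other transactions."""
    return [(node, mode) for node, mode in plan
            if any(not COMPAT[mode][other] for other in held.get(node, []))]


# T1 holds X on record R1 of file F1, so it also holds IX on F1 and on DB.
held_by_t1 = {"DB": ["IX"], "F1": ["IX"], "R1": ["X"]}
print(lock_plan(("DB", "F1", "R1"), "X"))                   # T1's requests
print(conflicts([("DB", "IS"), ("F1", "S")], held_by_t1))   # T2 wants S on F1
```

Here T2's request for S on the whole file F1 conflicts at F1, because S and IX are incompatible, while its IS lock on DB is compatible with T1's IX lock.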
5.5 MULTI-VERSION SCHEMES
In multiversion concurrency control schemes:
• Each write (X) operation creates a new "version" of the item X.
• A write does not overwrite the old values of a data item X.
• A read operation is never rejected.
Fig. Read history and write history of a data item.
The W-timestamp and R-timestamp of the nth version: In this method, several versions X1, X2, X3,
..., Xn of each data item X are maintained. For each version Xn, the value of the version and the following two timestamps are kept:
(i) Read-TS(Xn) or R-TS(Xn): The read timestamp of Xn is the largest of all the timestamps
of transactions that have successfully read version Xn.
i.e., a read by a transaction with timestamp TS:
    read Vj, where j = max { i | W-TSi < TS };
    add TS to the read history;
(ii) Write-TS(Xn) or W-TS(Xn):
The write timestamp of Xn is the timestamp of the transaction that wrote the value of version Xn.
Whenever a transaction T is allowed to execute a write-item (X) operation, a new version
Xn+1 of item X is created, with read-TS(Xn+1) and write-TS(Xn+1) both set to TS(T).
That is, a write of val by a transaction with timestamp TS:
if (there exists k such that
    TS < R-TSk < W-TSj,
    where j = min { i | TS < W-TSi })
{
    REJECT;
}
else
{
    Add <TS, val> to write history;
}
• When a transaction T is allowed to read the value of version Xn, the value of read-TS(Xn) is
set to the larger of the current read-TS(Xn) and TS(T).
To ensure serializability, the following two rules are used.
(1) If transaction T issues a write (X) operation, and version n of X has the highest write-TS(Xn)
of all versions of X that is also less than or equal to TS(T),
i.e., write-TS(Xn) ≤ TS(T),
and read-TS(Xn) > TS(T), then
abort and roll back transaction T;
otherwise, create a new version Xj of X with
read-TS(Xj) = write-TS(Xj) = TS(T).
(2) If transaction T issues a read (X) operation, find the version n of X that has the highest
write-TS(Xn) of all versions of X that is ≤ TS(T),
i.e., write-TS(Xn) ≤ TS(T). Then return the value of Xn to T, and set the value of
read-TS(Xn) to the larger of TS(T) and the current read-TS(Xn).
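The two multiversion rules can be sketched in Python for a single data item. Assumptions: an initial version with write-TS 0 exists, `Version` and `MVItem` are hypothetical names, and a transaction that writes X a second time overwrites its own version instead of creating a duplicate (the usual refinement of rule (1)).

```python
# Minimal sketch of multiversion timestamp ordering for one data item X,
# following rules (1) and (2). Version and MVItem are hypothetical names;
# an initial version with write-TS 0 is assumed to exist.

from dataclasses import dataclass


@dataclass
class Version:
    value: int
    w_ts: int
    r_ts: int


class MVItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0, 0)]

    def _latest_visible(self, ts):
        # Version with the highest write-TS that is <= TS(T).
        return max((v for v in self.versions if v.w_ts <= ts), key=lambda v: v.w_ts)

    def read(self, ts):
        v = self._latest_visible(ts)
        v.r_ts = max(v.r_ts, ts)       # rule (2): reads are never rejected
        return v.value

    def write(self, ts, value):
        v = self._latest_visible(ts)
        if v.r_ts > ts:
            return False               # rule (1): a younger tx already read Xn; abort T
        if v.w_ts == ts:
            v.value = value            # T rewrites the version it created itself
        else:
            self.versions.append(Version(value, ts, ts))
        return True


x = MVItem(100)
print(x.write(5, 150))   # True: new version with write-TS 5
print(x.read(8))         # 150, and read-TS of that version becomes 8
print(x.write(7, 200))   # False: the version visible to TS 7 was read at TS 8
```

With versions written at TS 0, 5 and 9, a transaction with TS 6 still reads the TS-5 version; older readers never wait or fail, which is the main benefit of keeping several versions.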
5.6 MULTI-VERSION TWO-PHASE LOCKING
The multiversion two-phase locking protocol attempts to combine the advantages of multiversion
concurrency control with the advantages of two-phase locking.
This protocol differentiates between read-only transactions and update transactions.
• Update transactions perform rigorous two-phase locking, that is, they hold all locks up to the
end of the transaction.
Thus they can be serialized according to their commit order. Each version of a data item has a
single timestamp. The timestamp in this case is not a real clock-based timestamp, but rather a counter (the ts-counter) that is incremented during commit processing.
• When a read-only transaction Ti issues a read (Q), the value returned is the contents of the
version whose timestamp is the largest timestamp less than or equal to TS(Ti).
• When an update transaction reads an item, it gets a shared lock on the item, and reads the
latest version of that item.
• When an update transaction wants to write an item, it first gets an exclusive lock on the item,
and then creates a new version of the data item. When the update transaction Ti completes
its actions, it carries out commit processing.
A multiversion two-phase locking scheduler works as follows:
(i) Each read requires a read lock on the item being read.
(ii) Each write requires a write lock on the item being written.
(iii) A write lock prevents:
(a) Reading of the item (but not of its earlier versions).
(b) Creation of another new version of the item.
Slightly more flexible variants of the 2PL scheduler:
• Allow reading versions other than the most recent version of items.
• But this can lead to cascading aborts.
The scheduler uses three types of locks:
• read lock, which conflicts with certify lock (but not with write lock)
• write lock, which conflicts with write and certify locks (but not with read lock)
• certify lock, which conflicts with read, write, and certify locks.
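The three lock types can be captured as a symmetric compatibility table. This Python sketch transcribes the rules above; read-read compatibility is assumed, as is the usual convention that a transaction converts its write locks to certify locks just before commit, which is possible only when no other transaction holds a lock on the item. All names are illustrative.

```python
# Lock compatibility for the multiversion 2PL scheduler with certify locks,
# transcribed from the three rules above (True = the two locks may coexist
# on the same item). Names are illustrative only.

COMPATIBLE = {
    ("read", "read"): True,        # assumed: readers never conflict
    ("read", "write"): True,       # a read lock does not collide with a write lock
    ("read", "certify"): False,
    ("write", "write"): False,
    ("write", "certify"): False,
    ("certify", "certify"): False,
}


def compatible(a, b):
    """Symmetric lookup: can lock modes a and b be held on one item at once?"""
    return COMPATIBLE.get((a, b), COMPATIBLE.get((b, a)))


def can_certify(tx, holders):
    """holders: list of (transaction, mode) pairs on the item.
    A certify lock conflicts with every other lock, so tx may convert its
    write lock only when it is the sole holder."""
    return all(other == tx for other, _mode in holders)


print(compatible("write", "read"))                          # True
print(can_certify("T1", [("T1", "write"), ("T2", "read")])) # False: T2 still reads
```

Allowing reads to coexist with a write lock is what lets readers continue on the committed version while a writer prepares a new one; the conflict is deferred to certification time.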
The computers in a distributed system may vary in size and function, ranging from workstations up to
mainframe systems. The computers in a distributed database system are referred to by a number of
different names, such as sites or nodes.
Properties of Distributed Databases: Distributed database system should make the impact of
data distribution transparent. Distributed database systems should have the following properties.
• Distributed data independence.
• Distributed transaction atomicity.
Distributed Data Independence: The distributed data independence property enables users to ask
queries without specifying where the referenced relations, or copies or fragments of the relations, are
located. This principle is a natural extension of physical and logical data independence. Further,
queries that span multiple sites should be optimised systematically in a cost-based manner, taking into
account communication costs and differences in local computation costs.
Distributed Transaction Atomicity: Distributed transaction atomicity property enables users
to write transactions that access and update data at several sites just as they would write transactions
over purely local data. In particular, the effects of a transaction across sites should continue to be
atomic. That is, all changes persist if the transaction commits, and none persist if it aborts.
5.8.1 Classification of Distributed Database
We can classify distributed databases as:
• Homogeneous
• Heterogeneous
Homogeneous Distributed Database
In a homogeneous distributed database, all sites have identical database management system software and
agree to cooperate in processing users' requests.
In such a system, local sites surrender a portion of their autonomy in terms of their right to
change schemas or DBMS software.
A homogeneous DDBS is the simplest form of a distributed database, where there are several sites,
each running its own applications on the same DBMS software. All sites have identical DBMS
software, all users use identical software, and the sites are aware of one another and agree to co-operate in
processing users' requests. The applications can all see the same schema and run the same transactions.
That is, there is location transparency in a homogeneous DDBS.
Fig. Distributed database sites running Oracle Database, Oracle Database and IMS Database.
(3) Security: Distributed transactions are executed with proper management of the security of
the data.
(4) Distributed query processing: The ability to access remote sites & transmit queries and data
among the various sites via a communication network.
(5) Distributed Database Recovery: The ability of a DDB to recover from individual site crashes.
(6) Replicated Data Management: The ability to decide which copy of a replicated data item to
access to maintain the consistency of the replicated data items.
(7) Keeping track of data: The ability of a DDB to keep track of data fragmentation,
distribution, and replication by expanding the DDBMS catalog.
(8) Improved scalability.
(9) Easier expansion.
(10) Improved performance.
(11) Parallel evaluation.
5.8.4 Disadvantages of Distributed Database
(1) Technical problems of connecting dissimilar machines.
(2) Software cost and Complexity: More complex software is required for a distributed database
environment.
(3) Difficulty in Data Integrity Control : A byproduct of the increased complexity and need for
coordination is the additional exposure to improper updating and other problems of data integrity.
(4) Processing Overhead : The various sites must exchange messages and perform additional
calculation to ensure proper coordination among the sites.
(5) Communication Network failures.
(6) Loss of messages.
(7) Recovery of failure is more complex.
(8) Increased complexity in the system design and implementation.
(9) Security concerns about replicated data in multiple locations and on the network.
(10) Increased transparency leads to a compromise between ease of use and the overhead cost
of providing transparency.
(11) Greater potential for bugs.
5.8.5 Architecture of Distributed Databases
Distributed databases use a client/server architecture to process information requests.
Client/Server Architecture: Client/server architectures are those in which the DBMS-related
workload is split into two logical components, namely client and server, each of which typically executes
on a different system. The client is the user of the resource, whereas the server is the provider of the resource.
The applications and tools are put on one or more client platforms and are connected to the database
management system that resides on the server. The applications and tools act as 'clients' of the DBMS,
making requests for its services. The client/server architecture can be used to implement a DBMS in
which the client is the transaction processor and the server is the data processor.
The client applications issue SQL statements for data access, just as they do in centralised
computing. Client applications connect to the server, send SQL statements, and receive results or an
error return code after the server has processed the SQL statements.
Fig. Client/Server database architecture.
Fig. Database application issuing a transaction that inserts into emp@sales and deletes from dept.
In contrast, an indirect connection occurs when a client connects to a server and then accesses information
contained in a database on a different server.
Example: If you connect to the HQ database and access the dept table on this database, as in the
figure, you can issue the following:
SELECT * FROM dept;
This query is direct because you are not accessing an object on a remote database.
• If you connect to the HQ database but access the emp table on the remote sales database, as
in the figure, you can issue the following:
SELECT * FROM emp@sales;
This query is indirect because the object you are accessing is not on the database to which you
are directly connected.
Advantages of Client/Server Database Architecture
(1) In most cases, a client/server architecture enables the roles and responsibilities of a
computing system to be distributed among several independent computers that are known
to each other only through a network.
This creates an additional advantage of this architecture: greater ease of maintenance.
(2) It functions with multiple different users with different capabilities.
(3) This architecture is relatively simple to implement, due to its clean separation of
functionality and because the server is centralised.
(4) Improved performance with more processing power scattered throughout the organization.
(5) All the data is stored on the servers, which generally have far greater security controls than
most clients. Server can better control access and resources, to guarantee that only those
clients with the appropriate permissions may access and change data.
(6) Since data is stored centrally, updates to that data are far easier to administer than they
would be under a peer-to-peer architecture.
(7) Reduced total cost of ownership.
(8) Increased productivity.
(9) As clients do not play a major role in this model, they require less administration.
(10) Improved security.
Disadvantages
(1) Traffic congestion on the network has been an issue since the inception of the client/server
paradigm. As the number of simultaneous client requests to a given server increases, the server can
become overloaded.
(2) The client/server paradigm lacks the robustness of a good peer-to-peer network. Under
client/server, should a critical server fail, clients' requests cannot be fulfilled.
5.8.6 Distributed Database System Design
The design of a distributed database system is a complex task. Therefore, a careful assessment
of the strategies and objectives is required. Some of the strategies and objectives that are common to
most distributed database system designs are as follows:
• Data Fragmentation: Techniques applied to a relational database system to partition the
relations among network sites.
• Data Allocation: Each fragment is stored at the site that gives the optimal distribution.
• Data Replication: Which increases the availability and improves the performance of the
system.
• Location Transparency: Which enables a user to access data without knowing, or being
concerned with, the site at which the data resides. The location of the data is hidden from the
user.
• Replication Transparency: Meaning that when more than one copy of data exists, one copy
is chosen while retrieving data and all other copies are updated when changes are being made.
• Configuration Independence: Which enables the organization to add or replace hardware
without changing the existing software component of the DBMS. It ensures the expandability
of existing system when its current hardware is saturated.
• Non-homogeneity DBMS: Helps in integrating databases maintained by different
DBMSs at different sites on different computers.
Data fragmentation, data replication and data allocation are the most commonly used techniques
during the process of DDBS design to break up the database into logical units and to store
certain data in more than one site.
5.8.7 Transaction Processing in Distributed Systems
Transaction System: Transaction processing systems are systems with large databases and a large
number of concurrent users executing database transactions.
Transaction Processing: In a distributed DBMS, a given transaction is submitted at one
site, but it can access data at other sites. When a transaction is submitted at some site, the transaction
manager at that site breaks it up into a collection of one or more subtransactions that execute at
different sites.
There are two types of transactions.
• Local Transactions: Local transactions are those that access and update data in only one
local database.
• Global Transactions: Global transactions are those that access and update data in several
local databases.
5.8.7.1 System Structure: Each site has its own local transaction manager, whose function is to
ensure the ACID properties of those transactions that execute at that site.
In a distributed transaction system, each site contains two subsystems:
(i) Transaction Manager: The transaction manager manages the execution of those transactions
that access data stored at the local site.
(ii) Transaction Coordinator: The transaction coordinator coordinates the execution of the
various transactions initiated at that site.
Fig. System structure: each computer (Computer 1, Computer 2) has a transaction coordinator and a transaction manager.
Fragment Student1
Roll No.   Name     Address   City        State
3.         Gopal    lrl>2     Noida       U.P.
4.         Sanjay   M-129     New Delhi   Delhi
(2) Vertical Fragmentation: Vertical fragmentation splits the relation by decomposing the
schema R of the relation r.
Each fragment ri of r is defined by
$$ r_i = \Pi_{R_i}(r) $$
where each Ri is a subset of the attributes of R. The fragmentation should be done in such a way that we can reconstruct
relation r from the fragments by taking the natural join:
$$ r = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n $$
Example: Relation Student
Roll No.   Name      Address   City        State
1.         Vijay     H-327     Gr. Noida   U.P.
2.         Santosh   F-300     New Delhi   Delhi
3.         Gopal     lrl>2     Noida       U.P.
4.         Sanjay    M-129     New Delhi   Delhi
Fragment Student1
Roll No.   Name      Address
1.         Vijay     H-327
2.         Santosh   F-300
3.         Gopal     lrl>2
4.         Sanjay    M-129

Fragment Student2
Roll No.   City        State
1.         Gr. Noida   U.P.
2.         New Delhi   Delhi
3.         Noida       U.P.
4.         New Delhi   Delhi
Mixed Fragmentation: We can intermix the two types of fragmentation (horizontal and vertical
fragmentation), yielding mixed fragmentation.
The mathematical representation of mixed fragmentation is:
$$ r_i = \Pi_{R_i}(\sigma_{P_i}(r)) $$
The following cases arise.
Case 1: If Pi ≠ True and Ri = ATTRS(R),
then we get a horizontal fragment.
Case 2: If Pi = True and Ri ≠ ATTRS(R),
then we get a vertical fragment.
Case 3: If Pi ≠ True and Ri ≠ ATTRS(R),
then we get a mixed fragment.
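The fragmentation formula $r_i = \Pi_{R_i}(\sigma_{P_i}(r))$ can be tried out on three rows (Roll No. 1, 2 and 4) of the Student relation. In this Python sketch the helper names are illustrative and relations are lists of dictionaries: a selection alone (Pi ≠ True, Ri = ATTRS(R)) gives a horizontal fragment, a projection alone (Pi = True, Ri ≠ ATTRS(R)) gives a vertical fragment, and both together give a mixed fragment.

```python
# Sketch of horizontal, vertical and mixed fragmentation using
# r_i = project_{R_i}(select_{P_i}(r)). Relations are lists of dicts;
# all helper names are illustrative.

STUDENT = [
    {"RollNo": 1, "Name": "Vijay",   "Address": "H-327", "City": "Gr. Noida", "State": "U.P."},
    {"RollNo": 2, "Name": "Santosh", "Address": "F-300", "City": "New Delhi", "State": "Delhi"},
    {"RollNo": 4, "Name": "Sanjay",  "Address": "M-129", "City": "New Delhi", "State": "Delhi"},
]
ATTRS = list(STUDENT[0])


def select(r, pred):
    return [row for row in r if pred(row)]


def project(r, attrs):
    out = []
    for row in r:
        t = {a: row[a] for a in attrs}
        if t not in out:               # relations are sets: drop duplicates
            out.append(t)
    return out


def fragment(r, pred, attrs):
    """General fragment: select with pred, then project on attrs."""
    return project(select(r, pred), attrs)


def natural_join(r1, r2):
    common = [a for a in r1[0] if a in r2[0]] if r1 and r2 else []
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]


def always(row):
    return True                        # the predicate P = True


# Horizontal fragment (P != True, R_i = ATTRS): students from Delhi.
delhi = fragment(STUDENT, lambda row: row["State"] == "Delhi", ATTRS)
# Vertical fragments (P = True, R_i != ATTRS); RollNo is kept in both.
v1 = fragment(STUDENT, always, ["RollNo", "Name", "Address"])
v2 = fragment(STUDENT, always, ["RollNo", "City", "State"])
# Mixed fragment (P != True and R_i != ATTRS).
mixed = fragment(STUDENT, lambda row: row["State"] == "Delhi", ["RollNo", "Name"])

assert natural_join(v1, v2) == STUDENT   # lossless reconstruction r = r1 ⋈ r2
```

Keeping Roll No. in both vertical fragments is what lets the natural join rebuild the original relation without spurious tuples.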
5.8.9 Data Replication & Allocation
Data replication means keeping copies of data. Data replication is a technique that permits storage
of certain data in more than one site. The system maintains several identical replicas (copies) of a
relation and stores each copy at a different site. Typically, data replication is introduced to increase the
availability of the system: when a copy is not available due to site failure(s), it should be possible to
access another copy.
Fully Replicated Distributed Database: The replication of the whole database at every site in
the distributed system is called a fully replicated distributed database.
Advantages: Full replication has the following advantages:
(i) This can improve the availability because the system can continue to operate as long as at
least one site is up.
(ii) It also improves performance of retrieval for global queries, because the result of such a
query can be obtained locally from any one site.
Disadvantages: The disadvantages of full replication are:
(i) It can slow down update operations drastically, because a single logical update must be
performed on every copy of the database to keep the copies consistent.
(ii) Full replication makes the concurrency control and recovery techniques more expensive
than they would be if there were no replication.
• Replication Schema : A description of the replication of fragments is called a replication
schema.
• Partial Replication: In partial replication, some fragments of the database may be replicated,
whereas others may not.
5.8.10 Data Allocation
Data allocation describes the process of deciding about locating (or placing) data to several sites.
Following are the data placement strategies that are used in distributed database systems.
• Centralised
• Partitioned or fragmented
• Replicated
In the centralised strategy, the entire database and the DBMS are stored at one site.
However, users are geographically distributed across the network. Locality of reference is lowest as all
sites, except the central site, have to use the network for all data accesses. Thus, the communication
costs are high. Since the entire database resides at one site, there is loss of the entire database system
in case of failure of the central site. Hence, the reliability and availability are low.
In partitioned or fragmented strategy, database is divided into several disjoint parts (fragments)
and stored at several sites. If the data items are located at the site where they are used most frequently,
locality of reference is high. As there is no replication, storage costs are low. The failure of the system at
a particular site will result in the loss of the data at that site only. Hence, the reliability and availability are
higher than centralised strategy.
However, overall reliability and availability are still low. The communication cost is low and
overall performance is good as compared to centralised strategy.
In replication strategy, copies of one or more database fragments are stored at several sites. Thus,
the locality of reference, reliability and availability and performance are maximized. But, the
communication and storage costs are very high.
In data allocation or data distribution, each copy of a fragment must be assigned to a particular
site in the distributed system.
• Non-Redundant Allocation: If each fragment is stored at exactly one site, then all fragments
must be disjoint, except for the repetition of primary keys among vertical or mixed fragments;
this is called non-redundant allocation.
The choice of sites and the degree of replication depend on the performance and availability goals
of the system and on the types and frequencies of transactions submitted at each site.
For Example: If high availability is required and transactions can be submitted at any site and if
most transactions are retrieval only, a fully replicated database is a good choice.
When a transaction wishes to lock data item X, which is not replicated and resides at site Si,
a message is sent to the lock manager at site Si requesting a lock.
If data item X is locked in an incompatible mode, then the request is delayed until it can be
granted.
Once it has determined that the lock request can be granted, the lock manager sends a message
back to the initiator indicating that it has granted the lock request.
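The message exchange above can be simulated with one lock manager per site. In this Python sketch the placement map, the site names and the `lock` helper are hypothetical, and a method call stands in for each network message; delayed requests are kept in a FIFO queue per item.

```python
# Sketch of lock requests for non-replicated data: each item resides at one
# site, and the request goes to that site's lock manager. Method calls stand
# in for messages; site names and the placement map are hypothetical.

from collections import deque


class SiteLockManager:
    def __init__(self, site):
        self.site = site
        self.holders = {}   # item -> (mode, set of transactions holding it)
        self.waiting = {}   # item -> deque of delayed (tx, mode) requests

    def _compatible(self, item, mode):
        held, txs = self.holders.get(item, ("S", set()))
        return not txs or (mode == "S" and held == "S")

    def _grant(self, tx, item, mode):
        held, txs = self.holders.get(item, (mode, set()))
        self.holders[item] = (held if txs else mode, txs | {tx})

    def request(self, tx, item, mode):
        """Grant the lock now, or delay the request until it can be granted."""
        if self._compatible(item, mode):
            self._grant(tx, item, mode)
            return "granted"                  # reply message to the initiator
        self.waiting.setdefault(item, deque()).append((tx, mode))
        return "delayed"

    def release(self, tx, item):
        """Release tx's lock; return the delayed transactions now granted."""
        held, txs = self.holders.pop(item)
        if txs - {tx}:
            self.holders[item] = (held, txs - {tx})
        granted = []
        queue = self.waiting.get(item, deque())
        while queue and self._compatible(item, queue[0][1]):
            waiting_tx, mode = queue.popleft()
            self._grant(waiting_tx, item, mode)
            granted.append(waiting_tx)        # "lock granted" message to initiator
        return granted


PLACEMENT = {"A": "S1", "B": "S2"}            # hypothetical: item -> storing site
MANAGERS = {s: SiteLockManager(s) for s in set(PLACEMENT.values())}


def lock(tx, item, mode):
    site = PLACEMENT[item]                    # send the request to the item's site
    return MANAGERS[site].request(tx, item, mode)
```

For example, `lock("T1", "A", "X")` is answered with "granted" by site S1, a following `lock("T2", "A", "S")` is "delayed", and `MANAGERS["S1"].release("T1", "A")` returns `["T2"]`, the transaction that now receives its grant message.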
Advantages :
• Simple implementation.
• Reduces the degree to which the coordinator is a bottleneck.
5.8.12 Distributed Recovery
The recovery process in a distributed database is quite involved. We give only a very brief idea of
some of the issues here.
In some cases it is quite difficult even to determine whether a site is down without exchanging
numerous messages with other sites.
For Example: Suppose that site X sends a message to site Y and expects a response from Y but
does not receive it.
There are several possible explanations :
• The message was not delivered to Y because of communication failure.
• Site Y is down and could not respond.
• Site Y is running and sent a response, but the response was not delivered.
Another problem with distributed recovery is distributed commit. When a transaction is updating
data at several sites, it cannot commit until it is sure that the effect of the transaction on every site
cannot be lost.
This means that every site must first have recorded the local effects of the transactions
permanently in the local site log on disk. The two phase commit protocol is often used to ensure the
correctness of distributed commit.
5.8.13 Two-Phase Commit Protocol
This protocol assumes that one of the cooperating processes acts as the coordinator, and the other
processes act as cohorts.
• At the beginning of the transaction, the coordinator sends a start transaction message to every
cohort.
Phase I : At the Coordinator:
(1) The coordinator sends a COMMIT-REQUEST message to every cohort, requesting the
cohorts to commit.
(2) The Coordinator waits for replies from all the cohorts.
At Cohorts: On receiving the COMMIT-REQUEST message a cohort takes the following
actions.
• If the transaction executing at the cohort is successful, it writes 'UNDO' and 'REDO' log records to
stable storage and sends an 'AGREED' message to the coordinator.
• Otherwise, it sends an ABORT message to the coordinator.
Phase II : At the Coordinator :
(1) If all the cohorts reply AGREED and the coordinator also agrees, then the coordinator
writes a 'COMMIT' record into the log. Then it sends a COMMIT message to all the cohorts.
Otherwise, the coordinator sends an ABORT message to all the cohorts.
(2) The coordinator then waits for acknowledgments from each cohort.
(3) If an acknowledgment is not received from any cohort within a time out period, the
coordinator resends the COMMIT/ABORT message to that cohort.
(4) If all the acknowledgement are received, the coordinator writes a COMPLETE record to
the log.
At Cohorts:
(1) On receiving a COMMIT message, a cohort releases all the resources and locks held by it for
executing the transaction, and sends an acknowledgment.
(2) On receiving an ABORT message, a cohort undoes the transaction using the UNDO log
record, releases all the resources and locks held by it for performing the transaction, and sends
an acknowledgment.
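The message flow of the two-phase commit protocol can be simulated in a single Python process. This is a sketch only: `Cohort` and `two_phase_commit` are hypothetical names, the logs are in-memory lists rather than stable storage, and timeouts and resending of COMMIT/ABORT messages are omitted.

```python
# Minimal single-process sketch of two-phase commit. Cohort and its in-memory
# "log" are hypothetical; real systems force log records to stable storage
# and exchange network messages, with timeouts and resends.


class Cohort:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.log = []

    def on_commit_request(self):
        """Phase I at the cohort: vote AGREED or ABORT."""
        if self.will_succeed:
            self.log.append("UNDO/REDO")       # logged before voting AGREED
            return "AGREED"
        return "ABORT"

    def on_decision(self, decision):
        """Phase II at the cohort: apply the decision, release locks, ACK."""
        if decision == "ABORT" and "UNDO/REDO" in self.log:
            self.log.append("UNDO applied")    # undo using the UNDO log record
        self.log.append(decision)
        return "ACK"


def two_phase_commit(cohorts, coordinator_agrees=True):
    coord_log = []
    # Phase I: send COMMIT-REQUEST to every cohort and collect the votes.
    votes = [c.on_commit_request() for c in cohorts]
    # Phase II: commit only if every cohort and the coordinator agreed.
    if coordinator_agrees and all(v == "AGREED" for v in votes):
        decision = "COMMIT"
        coord_log.append("COMMIT")             # COMMIT record before the messages
    else:
        decision = "ABORT"
    acks = [c.on_decision(decision) for c in cohorts]
    if all(a == "ACK" for a in acks):
        coord_log.append("COMPLETE")           # all acknowledgments received
    return decision, coord_log


decision, log = two_phase_commit([Cohort("S1"), Cohort("S2", will_succeed=False)])
print(decision, log)   # ABORT ['COMPLETE']
```

A single ABORT vote is enough to make the coordinator abort the whole transaction, which is what preserves atomicity across sites.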
5.8.14 Handling of Failures
The two-phase commit protocol responds in different ways to various types of failures.
(1) Failure of Participating Site: If the coordinator Ci detects that a site has failed, it takes these
actions:
• If the site fails before responding with a READY T message to Ci, the coordinator assumes that
it responded with an ABORT T message.
• If the site fails after the coordinator has received the READY T message from the site, the
coordinator executes the rest of the commit protocol in the normal way, ignoring the failure
of the site.
When a participating site Si recovers from a failure, it must examine its log to determine the fate
of those transactions that were in the midst of execution when the failure occurred.
We consider each of the possible cases.
• The log contains a < COMMIT T > record. In this case, the site executes REDO (T).
• The log contains an < ABORT T > record. In this case, the site executes UNDO (T).
• The log contains a < READY T > record. In this case, the site must consult Ci to determine
the fate of T.
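• The log contains no control records (ABORT, COMMIT, READY) concerning T. This implies that the site failed
before responding to the COMMIT-REQUEST message from Ci, so Ci must have aborted T. Hence, the site executes UNDO (T).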
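The recovery decision of a participating site can be expressed as a lookup on its log. This Python sketch uses a hypothetical log of `(record, transaction)` pairs; for the case with no control record it assumes the standard conclusion that the site failed before voting, so the coordinator must have aborted T and the site executes UNDO (T).

```python
# Sketch of the recovery decision at a participating site, following the four
# log cases above. The (record, transaction) log format is hypothetical.


def recovery_action(log, t):
    """Action a recovering site takes for transaction t.
    log: list of (record, transaction) pairs, e.g. ("READY", "T")."""
    records = {rec for rec, tx in log if tx == t}
    if "COMMIT" in records:
        return "REDO"                # <COMMIT T>: redo T
    if "ABORT" in records:
        return "UNDO"                # <ABORT T>: undo T
    if "READY" in records:
        return "ASK_COORDINATOR"     # <READY T> only: consult Ci for the fate of T
    return "UNDO"                    # no control record: failed before voting, T aborted


print(recovery_action([("READY", "T"), ("COMMIT", "T")], "T"))  # REDO
print(recovery_action([("READY", "T")], "T"))                   # ASK_COORDINATOR
```

The order of the checks matters: a site that logged both READY T and COMMIT T has already learned the decision, so it redoes T without contacting the coordinator.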
(2) Failure of the Coordinator: If the coordinator fails in the midst of execution of the commit
protocol for transaction T, then the participating sites must decide the fate of T.
In certain cases, the participating sites cannot decide whether to COMMIT or ABORT T, &
therefore these sites must wait for the recovery of the failed coordinator.
• If an active site contains a < COMMIT T > record in its log, then T must be committed.
• If an active site contains an < ABORT T > record in its log, then T must be aborted.
• If some active site does not contain a < READY T > record in its log, then the failed
coordinator Ci cannot have decided to commit T, because a site that does not have a <
READY T > record in its log cannot have sent a READY T message to Ci. However, the
coordinator may have decided to ABORT T, but not to COMMIT T. Rather than wait for Ci
to recover, it is preferable to abort T.
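• If none of the preceding cases holds, then all active sites must have a < READY T > record in
their log, but no additional control records such as < ABORT T > or < COMMIT T >. Since the
coordinator has failed, the active sites cannot determine whether a decision was made, or what it
was, until Ci recovers; they must therefore wait for Ci to recover. This is the blocking problem of
two-phase commit.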
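The rules the active sites apply after a coordinator failure can be sketched in the same way. In this illustrative Python function (the name `decide_after_coordinator_failure` and the input format are assumptions), each list entry holds the control records for T found in one active site's log. The result `"WAIT"` corresponds to the last case, in which every active site is READY but none has a decision, so the sites must block until Ci recovers.

```python
from __future__ import annotations


def decide_after_coordinator_failure(active_logs: list[set[str]]) -> str:
    """Return COMMIT, ABORT or WAIT for transaction T.

    Each element of `active_logs` (a hypothetical format) is the set of control
    records for T in one active site's log.
    """
    if any("COMMIT" in log for log in active_logs):
        return "COMMIT"   # some site logged <COMMIT T>
    if any("ABORT" in log for log in active_logs):
        return "ABORT"    # some site logged <ABORT T>
    if any("READY" not in log for log in active_logs):
        return "ABORT"    # that site never voted READY, so Ci cannot have committed T
    return "WAIT"         # all sites READY, no decision logged: block until Ci recovers


print(decide_after_coordinator_failure([{"READY"}, {"READY", "COMMIT"}]))  # COMMIT
print(decide_after_coordinator_failure([{"READY"}, set()]))                # ABORT
print(decide_after_coordinator_failure([{"READY"}, {"READY"}]))            # WAIT
```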
Solved Problems
Q. 1. Write down the method of write-ahead-logging mechanism for data recovery. (UPTU 2006)
Ans. Write-Ahead Logging (WAL) : Write-Ahead Logging (WAL) is a standard approach to
transaction logging.
WAL's central concept is that changes to data files (where tables and indexes reside) must be
written only after those changes have been logged, that is, after the log records describing the
changes have been flushed to permanent storage.
If we follow this procedure, we do not need to flush data pages to disk on every transaction
commit, because we know that in the event of a crash we will be able to recover the database using the log.
• Any changes that have not been applied to the data pages are first redone from the log
records; this is roll-forward recovery, also known as REDO.
• Then the changes made by uncommitted transactions are removed from the data pages; this
is roll-backward recovery, also known as UNDO.
For example, consider the following Write-Ahead Logging (WAL) protocol for a recovery
algorithm that requires both UNDO and REDO:
(1) The before image of an item cannot be overwritten by its after image in the database on
disk until all UNDO-type log records for the updating transaction, up to this point in time,
have been force-written to disk.
(2) The commit operation of a transaction cannot be completed until all the REDO-type &
UNDO-type log records for that transaction have been force-written to disk.
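The two rules can be illustrated with a toy in-memory model. The sketch below is an illustration under stated assumptions, not the design of any particular DBMS: the class `WALDatabase` and its methods are hypothetical, `disk_log` and `disk_data` stand for stable storage that survives a crash, and the two buffers are lost when `crash()` is called. Recovery first redoes every logged update (REDO), then scans the log backward and restores the before images written by transactions that have no COMMIT record (UNDO).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WALDatabase:
    """Toy model: disk_data and disk_log survive a crash; the buffers do not."""
    disk_data: dict[str, int] = field(default_factory=dict)
    disk_log: list[tuple] = field(default_factory=list)
    log_buffer: list[tuple] = field(default_factory=list)
    page_buffer: dict[str, int] = field(default_factory=dict)

    def flush_log(self) -> None:
        self.disk_log.extend(self.log_buffer)  # force-write buffered log records
        self.log_buffer.clear()

    def write(self, txn: str, item: str, value: int) -> None:
        before = self.page_buffer.get(item, self.disk_data.get(item, 0))
        # one record carries the UNDO (before image) and REDO (after image) data
        self.log_buffer.append(("UPDATE", txn, item, before, value))
        self.page_buffer[item] = value  # only the in-memory page changes

    def flush_page(self, item: str) -> None:
        self.flush_log()  # rule (1): log first, then overwrite the before image on disk
        self.disk_data[item] = self.page_buffer[item]

    def commit(self, txn: str) -> None:
        self.log_buffer.append(("COMMIT", txn))
        self.flush_log()  # rule (2): commit completes only after the log is on disk

    def crash(self) -> None:
        self.log_buffer.clear()
        self.page_buffer.clear()

    def recover(self) -> None:
        committed = {rec[1] for rec in self.disk_log if rec[0] == "COMMIT"}
        for rec in self.disk_log:  # REDO: roll forward every logged after image
            if rec[0] == "UPDATE":
                self.disk_data[rec[2]] = rec[4]
        for rec in reversed(self.disk_log):  # UNDO: roll back uncommitted work
            if rec[0] == "UPDATE" and rec[1] not in committed:
                self.disk_data[rec[2]] = rec[3]


db = WALDatabase(disk_data={"A": 100, "B": 200})
db.write("T1", "A", 50)
db.commit("T1")        # A = 50 is in the log but not yet in the data on disk
db.write("T2", "B", 999)
db.flush_page("B")     # an uncommitted value reaches disk, after its log record
db.crash()
before_recovery = dict(db.disk_data)   # {'A': 100, 'B': 999}
db.recover()
print(db.disk_data)    # {'A': 50, 'B': 200}
```

Note that T1's committed update never reached the data pages before the crash; the log alone was enough to restore it, which is why commit only has to flush the log.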
Advantages of WAL :
• Write-Ahead Logging is a technique for providing atomicity and durability in database systems.
• In a system using WAL, all modifications are written to a log before they are applied to the
database. Usually both REDO and UNDO information is stored in the log.
• WAL significantly reduces the number of disk writes, since only the log file needs to be flushed
to disk at the time of transaction commit.
• Another advantage is the consistency of data pages: where required, WAL saves the entire
content of a data page in the log to ensure page consistency during crash recovery.
Q. 2. Show how the backward recovery technique is applied to a DBMS ? (UPTU 2002, 03)
Ans. Backward-Recovery Technique: At restart time, the system goes through the following
procedure in order to identify all transactions of types T2 to T5.
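Figure: Transaction types T1 to T5 plotted against time, with a checkpoint at time tc and a system
failure at time tf. T1 completes before the checkpoint; T2 and T4 complete before the failure; T3
and T5 are still active when the failure occurs.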
(1) Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO list
equal to the list of all transactions given in the most recent checkpoint record; set the REDO
list to empty.
(2) Search forward through the log, starting from the checkpoint record.
(3) If a BEGIN TRANSACTION log entry is found for transaction T, add T to the UNDO list.
(4) If a COMMIT log entry is found for transaction T, move T from UNDO list to the REDO
list.
(5) When the end of the log is reached, the UNDO and REDO lists identify, respectively,
transactions of types T3 and T5 and transactions of types T2 and T4.
The system now works backward through the log, undoing the transactions in the UNDO-list.
Restoring the database to a consistent state by undoing work is called Backward Recovery.
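Steps (1) to (5) and the backward pass can be sketched in Python. The log format is an assumption made for illustration: ("BEGIN", T), ("UPDATE", T, item, before_image), ("COMMIT", T) and ("CHECKPOINT", active_transactions). In this hypothetical sample log, T1 finishes before the checkpoint, T2 and T3 are active at it, T4 and T5 begin after it, and only T2 and T4 commit before the failure.

```python
from __future__ import annotations


def build_lists(log: list[tuple]) -> tuple[list[str], list[str]]:
    """Steps (1)-(5): return (UNDO list, REDO list)."""
    # (1) start from the most recent checkpoint record
    cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo = list(log[cp][1])  # transactions active at the checkpoint
    redo: list[str] = []
    for rec in log[cp + 1:]:  # (2) search forward from the checkpoint
        if rec[0] == "BEGIN":
            undo.append(rec[1])  # (3) new transaction goes on the UNDO list
        elif rec[0] == "COMMIT" and rec[1] in undo:
            undo.remove(rec[1])  # (4) move T from the UNDO list to the REDO list
            redo.append(rec[1])
    return undo, redo  # (5) end of log: the lists are final


def backward_recovery(log: list[tuple], data: dict[str, int]) -> dict[str, int]:
    """Work backward through the log, restoring before images for the UNDO list."""
    undo, _ = build_lists(log)
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] in undo:
            data[rec[2]] = rec[3]
    return data


log = [
    ("BEGIN", "T1"), ("UPDATE", "T1", "P", 1), ("COMMIT", "T1"),
    ("BEGIN", "T2"), ("BEGIN", "T3"), ("UPDATE", "T3", "X", 10),
    ("CHECKPOINT", ["T2", "T3"]),
    ("BEGIN", "T4"), ("COMMIT", "T2"),
    ("BEGIN", "T5"), ("UPDATE", "T5", "Y", 20), ("COMMIT", "T4"),
]
print(build_lists(log))  # (['T3', 'T5'], ['T2', 'T4'])
print(backward_recovery(log, {"P": 2, "X": 11, "Y": 21}))  # {'P': 2, 'X': 10, 'Y': 20}
```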
Q. 3. Consider the following transactions:
T1 : read (A);
read (B);
if A = 0 then B : = B + 1;
write (B);
T2 : read (A);
read (B);
if B = 0 then A : = A + 1;
write (A);
Add lock and unlock instructions to transactions T1 and T2 so that they observe the two-phase locking
protocol. (UPTU 2003, 04)
Ans. Lock-X = Exclusive lock
Lock-S = Shared lock
T1 : lock-S (A);
lock-X (B);
read (A);
read (B);
if A = 0 then B := B + 1;
write (B);
unlock (A);
unlock (B);
T2 : lock-S (B);
lock-X (A);
read (A);
read (B);
if B = 0 then A := A + 1;
write (A);
unlock (A);
unlock (B);
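Whether a transaction obeys the protocol can be checked mechanically: once it has issued its first unlock, it must not request another lock. The Python function below is a hypothetical helper for illustration; each transaction is a list of instruction strings, and lock instructions are recognised simply by their "lock" or "unlock" prefix (an assumption about the notation).

```python
from __future__ import annotations


def follows_two_phase_locking(instructions: list[str]) -> bool:
    """True if no lock request follows the transaction's first unlock."""
    shrinking = False
    for ins in instructions:
        op = ins.lower().replace(" ", "")
        if op.startswith("unlock"):
            shrinking = True          # first unlock: the shrinking phase begins
        elif op.startswith("lock") and shrinking:
            return False              # a new lock in the shrinking phase breaks 2PL
    return True


t1 = ["lock-S(A)", "lock-X(B)", "read(A)", "read(B)",
      "if A = 0 then B := B + 1", "write(B)", "unlock(A)", "unlock(B)"]
t2 = ["lock-S(B)", "lock-X(A)", "read(A)", "read(B)",
      "if B = 0 then A := A + 1", "write(A)", "unlock(A)", "unlock(B)"]
not_2pl = ["lock-S(A)", "read(A)", "unlock(A)", "lock-S(B)", "read(B)", "unlock(B)"]

print(follows_two_phase_locking(t1), follows_two_phase_locking(t2))  # True True
print(follows_two_phase_locking(not_2pl))                            # False
```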
Q.4. State whether the following schedule is conflict serializable or not. Justify your answer.
(UPTU 2003, 04)
T1              T2
Read (A)
Write (A)
                Read (A)
                Write (A)
Read (B)
Write (B)
                Read (B)
                Write (B)
Ans. In this schedule, the Write (A) of T1 conflicts with the Read (A) of T2. However, the Write (A) of
T2 does not conflict with the Read (B) of T1, because these two instructions access different data
items. Swapping these two nonconflicting instructions gives the following schedule:
T1              T2
Read (A)
Write (A)
                Read (A)
Read (B)
                Write (A)
Write (B)
                Read (B)
                Write (B)
We continue to swap nonconflicting instructions:
• Swap Read (B) of T1 with Read (A) of T2.
• Swap Write (B) of T1 with Write (A) of T2.
• Swap Write (B) of T1 with Read (A) of T2.
After these swaps, all instructions of T1 precede all instructions of T2:
T1              T2
Read (A)
Write (A)
Read (B)
Write (B)
                Read (A)
                Write (A)
                Read (B)
                Write (B)
Since the given schedule can be transformed into the serial schedule <T1, T2> by a series of swaps
of nonconflicting instructions, it is conflict equivalent to that serial schedule.
Hence it is conflict serializable.
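The swapping argument can be automated. The Python sketch below (function names are hypothetical) repeatedly swaps adjacent operations whose transactions are in the wrong order for a chosen serial order, refusing any swap of conflicting operations. It returns the resulting serial schedule, or None if a conflicting pair blocks that order.

```python
from __future__ import annotations


def conflicts(p: tuple[str, str, str], q: tuple[str, str, str]) -> bool:
    # different transactions, same data item, and at least one write
    return p[0] != q[0] and p[2] == q[2] and "W" in (p[1], q[1])


def swap_to_serial(schedule, order):
    """Bubble operations into the serial order `order` using only swaps of
    adjacent nonconflicting operations; return None if a conflict blocks it."""
    s = list(schedule)
    rank = {t: i for i, t in enumerate(order)}
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            p, q = s[i], s[i + 1]
            if rank[p[0]] > rank[q[0]]:   # q's transaction must come first
                if conflicts(p, q):
                    return None           # conflicting pair: swap not allowed
                s[i], s[i + 1] = q, p
                changed = True
    return s


# Schedule of Q. 4 as (transaction, action, item)
q4 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]

print(swap_to_serial(q4, ["T1", "T2"]))  # all T1 operations, then all T2 operations
print(swap_to_serial(q4, ["T2", "T1"]))  # None: Write (A) of T1 must precede Read (A) of T2
```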
Q. 5. Which of the following schedules is conflict serializable? For each serializable schedule,
determine the equivalent serial schedules.
(1) r1(x); r3(x); w1(x); r2(x); w3(x);
(2) r1(x); r3(x); w3(x); w1(x); r2(x);
(3) r3(x); r2(x); w3(x); r1(x); w1(x); (UPTU 2004, 05)
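Ans. A schedule is conflict serializable if and only if its precedence graph has no cycle. The graph
has an edge Ti → Tj whenever an operation of Ti and a later operation of Tj access the same data
item and at least one of them is a write. Two reads never conflict, so they can always be swapped.
(1)
T1        T2        T3
r1(x)
                    r3(x)
w1(x)
          r2(x)
                    w3(x)
• r1(x) comes before w3(x), giving the edge T1 → T3.
• r3(x) comes before w1(x), giving the edge T3 → T1.
Swapping r1(x) with r3(x) is allowed, but it does not remove the conflict between r3(x) and w1(x)
or the conflict between w1(x) and w3(x). The precedence graph contains the cycle T1 → T3 → T1,
hence schedule (1) is not conflict serializable.
(2)
T1        T2        T3
r1(x)
                    r3(x)
                    w3(x)
w1(x)
          r2(x)
• r1(x) comes before w3(x), giving the edge T1 → T3.
• w3(x) comes before w1(x), giving the edge T3 → T1.
Swapping r1(x) with r3(x) gives r3(x); r1(x); w3(x); w1(x); r2(x);, but r1(x) still precedes the
conflicting w3(x). The cycle T1 → T3 → T1 remains, hence schedule (2) is not conflict serializable.
(3)
T1        T2        T3
                    r3(x)
          r2(x)
                    w3(x)
r1(x)
w1(x)
• r2(x) comes before w3(x) and w1(x), giving the edges T2 → T3 and T2 → T1.
• r3(x) comes before w1(x), and w3(x) comes before r1(x) and w1(x), giving the edge T3 → T1.
The precedence graph T2 → T3 → T1 (together with T2 → T1) has no cycle. Swapping the two reads,
r3(x) of T3 and r2(x) of T2, gives:
T1        T2        T3
          r2(x)
                    r3(x)
                    w3(x)
r1(x)
w1(x)
This is a serial schedule in which T2, T3 and T1 execute in that order:
r2(x); r3(x); w3(x); r1(x); w1(x);
Hence schedule (3) is conflict serializable, and its equivalent serial schedule is <T2, T3, T1>.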
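The precedence-graph test can be checked with Python's standard `graphlib` module (Python 3.9 or later). In this sketch the helper names `parse`, `precedence_graph` and `serial_order` are illustrative. `serial_order` returns a topological order of the graph, which is an equivalent serial order, or None when `graphlib.TopologicalSorter` raises `CycleError` because the graph has a cycle.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def parse(text: str) -> list[tuple[str, str, str]]:
    """Turn 'r1(x); w3(x);' into [('T1', 'r', 'x'), ('T3', 'w', 'x')]."""
    ops = []
    for tok in text.replace(",", ";").split(";"):
        tok = tok.strip()
        if tok:
            open_paren = tok.index("(")
            ops.append(("T" + tok[1:open_paren], tok[0].lower(), tok[open_paren + 1:-1]))
    return ops


def precedence_graph(schedule):
    """Edge Ti -> Tj for every conflicting pair in which Ti's operation comes first."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges


def serial_order(schedule):
    """Equivalent serial order, or None if the precedence graph has a cycle."""
    preds = {t: set() for t, _, _ in schedule}
    for ti, tj in precedence_graph(schedule):
        preds[tj].add(ti)  # TopologicalSorter takes each node's predecessors
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


for s in ["r1(x); r3(x); w1(x); r2(x); w3(x);",
          "r1(x); r3(x); w3(x); w1(x); r2(x);",
          "r3(x); r2(x); w3(x); r1(x); w1(x);"]:
    print(s, "->", serial_order(parse(s)))
```

For the three schedules of Q. 5 this prints None, None and ['T2', 'T3', 'T1'].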
Q. 6. Consider the precedence graph in the given figure. Is the corresponding schedule conflict serializable?
Explain your answer. (UPTU 2006, 07)
Ans. In a precedence graph, each edge Ti → Tj means that, for some data item Q, one of the following
three conditions holds. For the edge T1 → T2:
• T1 executes write (Q) before T2 executes read (Q).
• T1 executes read (Q) before T2 executes write (Q).
• T1 executes write (Q) before T2 executes write (Q).
An edge T2 → T3
• T2 executes write (Q) before T3 executes read (Q).
• T2 executes read (Q) before T3 executes write (Q).
• T2 executes write (Q) before T3 executes write (Q).
An edge T3 → T5
• T3 executes write (Q) before T5 executes read (Q).
• T3 executes read (Q) before T5 executes write (Q).
• T3 executes write (Q) before T5 executes write (Q).
An edge T1 → T4
• T1 executes write (Q) before T4 executes read (Q).
• T1 executes read (Q) before T4 executes write (Q).
• T1 executes write (Q) before T4 executes write (Q).
An edge T1 → T3
• T1 executes write (Q) before T3 executes read (Q).
• T1 executes read (Q) before T3 executes write (Q).
• T1 executes write (Q) before T3 executes write (Q).
An edge T2 → T4: one of the same three conditions holds for T2 and T4.
Similarly, an edge T4 → T5: one of the same three conditions holds for T4 and T5.
We know that a schedule is conflict serializable if and only if its precedence graph contains no cycle.
From the above description, the precedence graph does not contain a cycle.
Hence the corresponding schedule is conflict serializable. A topological sort of the graph gives an
equivalent serial order: T1, T2, T3, T4, T5.
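An equivalent serial order for Q. 6 can be obtained by a topological sort of the precedence graph. Below is a minimal Python sketch of Kahn's algorithm, using the edges listed in the answer; the function name is illustrative, and ties are broken alphabetically, an arbitrary choice that only makes the output deterministic.

```python
from __future__ import annotations

import heapq


def topological_order(edges: list[tuple[str, str]]) -> list[str] | None:
    """Kahn's algorithm: a serial order, or None if the graph has a cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors: dict[str, list[str]] = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)  # a sorted list is a valid heap
    order = []
    while ready:
        n = heapq.heappop(ready)          # smallest name among the ready transactions
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                heapq.heappush(ready, m)
    # nodes that never became ready mean the graph contains a cycle
    return order if len(order) == len(nodes) else None


q6_edges = [("T1", "T2"), ("T2", "T3"), ("T3", "T5"), ("T1", "T4"),
            ("T1", "T3"), ("T2", "T4"), ("T4", "T5")]
print(topological_order(q6_edges))  # ['T1', 'T2', 'T3', 'T4', 'T5']
```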
Q. 7. Consider the precedence graph in the given figure. Is the corresponding schedule conflict serializable?
Explain your answer. (UPTU 2004, 05)
Ans. Each edge Ti → Tj of the graph means that, for some data item Q, one of the following three
conditions holds:
• Ti executes write (Q) before Tj executes read (Q).
• Ti executes read (Q) before Tj executes write (Q).
• Ti executes write (Q) before Tj executes write (Q).
An edge T1 → T2: one of the above conditions holds for T1 and T2.
An edge T1 → T3: one of the above conditions holds for T1 and T3.
An edge T1 → T4: one of the above conditions holds for T1 and T4.
An edge T2 → T4: one of the above conditions holds for T2 and T4.
An edge T3 → T5: one of the above conditions holds for T3 and T5.
From the above description, the given precedence graph does not contain a cycle.
A schedule is conflict serializable if and only if its precedence graph contains no cycle.
Hence the corresponding schedule is conflict serializable; one equivalent serial order is T1, T2, T3, T4, T5.
Q. 8. Consider the precedence graph in Fig. Is the corresponding schedule conflict serializable?
Explain your answer.
Ans. Each edge Ti → Tj of the graph means that, for some data item Q, one of the following three
conditions holds:
• Ti executes write (Q) before Tj executes read (Q).
• Ti executes read (Q) before Tj executes write (Q).
• Ti executes write (Q) before Tj executes write (Q).
An edge T1 → T2: one of the above conditions holds for T1 and T2.
An edge T1 → T3: one of the above conditions holds for T1 and T3.
An edge T2 → T4: one of the above conditions holds for T2 and T4.
An edge T4 → T1: one of the above conditions holds for T4 and T1.
Since the precedence graph contains the cycle T1 → T2 → T4 → T1, the corresponding
schedule is not conflict serializable.
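For Q. 7 and Q. 8, a depth-first search can report an actual cycle rather than just a yes/no answer. In this sketch (the function name `find_cycle` is illustrative), an edge back to a transaction that is still on the current search path closes a cycle. The edge lists are the ones given in the two answers.

```python
from __future__ import annotations


def find_cycle(edges: list[tuple[str, str]]) -> list[str] | None:
    """Return one cycle as [Ti, ..., Ti], or None if the graph is acyclic."""
    graph: dict[str, list[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {n: "unvisited" for n in graph}
    path: list[str] = []  # transactions on the current DFS path

    def dfs(n: str) -> list[str] | None:
        state[n] = "on_path"
        path.append(n)
        for m in graph[n]:
            if state[m] == "on_path":  # back edge closes a cycle
                return path[path.index(m):] + [m]
            if state[m] == "unvisited":
                cycle = dfs(m)
                if cycle:
                    return cycle
        path.pop()
        state[n] = "done"
        return None

    for n in sorted(graph):
        if state[n] == "unvisited":
            cycle = dfs(n)
            if cycle:
                return cycle
    return None


q7_edges = [("T1", "T2"), ("T1", "T3"), ("T1", "T4"), ("T2", "T4"), ("T3", "T5")]
q8_edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T4", "T1")]
print(find_cycle(q7_edges))  # None
print(find_cycle(q8_edges))  # ['T1', 'T2', 'T4', 'T1']
```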
Q. 9. Explain the 'Blind-Writes' operation with the help of an example.
Ans.
T1          T2          T3
Read (A)
            Write (A)
Write (A)
                        Write (A)
In the above schedule, transactions T2 and T3 perform write (A) operations without having performed a
read (A) operation. Writes of this sort are called blind writes.
Note: Blind writes appear in every view-serializable schedule that is not conflict serializable.
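Blind writes can be found in a single pass over a schedule by remembering which items each transaction has already read. A small Python sketch; the function name and the (transaction, action, item) encoding are illustrative, and the schedule is taken to be T1: read (A); T2: write (A); T1: write (A); T3: write (A).

```python
from __future__ import annotations


def blind_writes(schedule: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """(transaction, item) pairs written without an earlier read by that transaction."""
    has_read: set[tuple[str, str]] = set()
    found = []
    for txn, action, item in schedule:
        if action == "read":
            has_read.add((txn, item))
        elif action == "write" and (txn, item) not in has_read:
            found.append((txn, item))
    return found


schedule = [("T1", "read", "A"), ("T2", "write", "A"),
            ("T1", "write", "A"), ("T3", "write", "A")]
print(blind_writes(schedule))  # [('T2', 'A'), ('T3', 'A')]
```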
= 2000 + 4000
= 6000 blocks
The size of the natural join of R1 and R2 is bounded by
$|R_1 \bowtie R_2| \le |R_1| \times |R_2| = 20000 \times 10000 = 200{,}000{,}000$
Q. 11. Estimation of cost and optimization of tuple transfer for join in distributed database.
(UPTU 2006, 07)
Ans. In a distributed system, several additional factors further complicate query processing. The
first is the cost of transferring data over the network. This data includes intermediate files that are
transferred to other sites for further processing, as well as the final result files that may have to be
transferred to the site where the query result is needed.
If the sites are connected via a high-performance local area network, these transfer costs are
small, but they become quite significant in other types of networks.
Hence, DDBMS query optimization algorithms consider the goal of reducing the amount of data
transfer as an optimization criterion in choosing a distributed query execution strategy.
We illustrate this with a simple example query.
Suppose that the EMPLOYEE and DEPARTMENT relations are distributed as shown in the figure.
SITE 1: EMPLOYEE
10,000 records; each record is 100 bytes long. The SSN field is 9 bytes long, FNAME is 15 bytes
long, LNAME is 15 bytes long and DNO is 4 bytes long.
SITE 2: DEPARTMENT
100 records; each record is 35 bytes long. The DNUMBER field is 4 bytes long, DNAME is 10 bytes
long and MGRSSN is 9 bytes long.
We will assume in this example that neither relation is fragmented. According to the figure, the size
of the EMPLOYEE relation is 100 X 10,000 = 1,000,000 bytes, and the size of the DEPARTMENT relation
is 35 X 100 = 3500 bytes.
Consider the query Q:
"For each employee, retrieve the employee name and the name of the department in which the
employee works."
$\pi_{\text{FNAME, MNAME, LNAME, DNAME}}(\text{EMPLOYEE} \bowtie_{\text{DNO}=\text{DNUMBER}} \text{DEPARTMENT})$
The result of this query will include 10,000 records, assuming that every employee is related to
a department. Suppose that each record in the query result is 40 bytes long.
The query is submitted at a distinct site 3, which is called the result site because the query result
is needed there. Neither the EMPLOYEE nor the DEPARTMENT relation resides at site 3.
There are three simple strategies for executing this distributed query:
1. Transfer both the EMPLOYEE and the DEPARTMENT relation to the result site, and
perform the join at site 3.
In this case total of 1,000,000 + 3500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to
site 3.
The size of the query result is 40 X 10,000 = 400,000 bytes,
so 400,000 + 1,000,000 = 1,400,000 bytes
must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result
to site 3.
In this case 400,000 + 3500 = 403,500 bytes must be transferred.
If minimizing the amount of data transfer is our optimization criterion, we should choose
strategy 3.
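The byte counts for the three strategies follow directly from the sizes given above. A short Python calculation (the variable names are illustrative) reproduces them and picks the strategy with the least data transfer:

```python
EMPLOYEE_BYTES = 10_000 * 100     # EMPLOYEE at site 1
DEPARTMENT_BYTES = 100 * 35       # DEPARTMENT at site 2
RESULT_Q_BYTES = 10_000 * 40      # one 40-byte result record per employee

transfer = {
    1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,  # ship both relations to site 3
    2: EMPLOYEE_BYTES + RESULT_Q_BYTES,    # EMPLOYEE to site 2, result to site 3
    3: DEPARTMENT_BYTES + RESULT_Q_BYTES,  # DEPARTMENT to site 1, result to site 3
}
for strategy, nbytes in transfer.items():
    print(f"strategy {strategy}: {nbytes:,} bytes")
print("cheapest:", min(transfer, key=transfer.get))
```

This prints 1,003,500, 1,400,000 and 403,500 bytes, and selects strategy 3.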
Now consider another query Q1 :
"For each department, retrieve the department name and the name of the department manager"
The query Q1 :
$\pi_{\text{FNAME, MNAME, LNAME, DNAME}}(\text{DEPARTMENT} \bowtie_{\text{MGRSSN}=\text{SSN}} \text{EMPLOYEE})$
If the result site is site 2, then we have two simple strategies:
1. Transfer the EMPLOYEE relation to site 2, execute the query, and present the result to the
user at site 2. Here the same number of bytes, 1,000,000, must be transferred for both Q and
Q1.
2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and send the
result back to site 2.
In this case 400,000 + 3500 = 403,500 bytes must be transferred for Q and
4000 + 3500 = 7500 bytes for Ql.
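The same arithmetic for result site 2 can be written as a short Python calculation comparing Q and Q1. Q1 returns one 40-byte record per department, so its result is 100 X 40 = 4000 bytes; the function name `cost_at_site2` is illustrative.

```python
from __future__ import annotations

EMPLOYEE_BYTES = 10_000 * 100
DEPARTMENT_BYTES = 100 * 35
RESULT_BYTES = {"Q": 10_000 * 40, "Q1": 100 * 40}   # 40-byte result records


def cost_at_site2(query: str) -> dict[int, int]:
    """Bytes transferred by each strategy when the result is needed at site 2."""
    return {
        1: EMPLOYEE_BYTES,                          # EMPLOYEE to site 2, join there
        2: DEPARTMENT_BYTES + RESULT_BYTES[query],  # DEPARTMENT to site 1, result back
    }


for q in ("Q", "Q1"):
    print(q, cost_at_site2(q))
```

For Q1 the second strategy moves only 7500 bytes, because the join result is small.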
Review Questions
1. What do you mean by concurrent execution of transactions? What are locks? Explain the two-phase
locking protocol in brief.
2. What is concurrency control? What are its objectives?
3. What are the different types of locks?
4. What is a timestamp? Discuss the timestamp ordering techniques for concurrency control.
5. List the salient features of two-phase locking protocol. Prove that two-phase locking ensures
serializability.
6. Consider the following transactions:
T1 : Read (A)
Read (B)
If A = 0 then B := B + 1
Write (B)
T2 : Read (B)
Read (A)
If B = 0 then A := A + 1
Write (A)
(a) Add lock and unlock instructions to transactions Tl and T 2, so that they observe the
two-phase locking protocol.
(b) Can the execution of these transactions result in a deadlock?
7. What is multiple-granularity locking? What is the difference between implicit and explicit locking
in multiple-granularity locking?
8. What is difference between exclusive lock and shared lock? Explain with example.
9. Explain the concurrency control scheme based on timestamping protocol.
10. Explain the validation concurrency control techniques.
11. What do you mean by multiversion scheme? Explain, what are W-time stamp and R-time stamp
of Nth version of data object.
12. What is the optimistic method of concurrency control? Discuss the different phases through
which a transaction moves during optimistic control.
13. List the advantages, problems and applications of optimistic method of concurrency control.
14. What is distributed database? Explain with a neat diagram.
15. What are the main advantages and disadvantages of distributed databases?
16. What do you mean by architecture of a distributed database system? What are different types of
architecture? Discuss each of them with neat sketch.
17. What are the various types of distributed databases? Discuss in detail.
18. What are homogeneous DDBS? Explain in detail.
19. What are heterogeneous DDBS? Explain in detail.
20. What is fragmenting a relation? What are the main types of data fragments? Why is
fragmentation a useful concept in distributed database design?
21. What is horizontal data fragmentation? Explain with an example.
22. What is vertical data fragmentation? Explain with an example.
23. What is mixed data fragmentation? Explain with an example.
24. What is data replication? Why is data replication useful in a DDBMS? What typical units of data
are replicated?
25. What is data allocation? Discuss.
26. What do you mean by data replication? What are its advantages and disadvantages?
27. Explain the difference between the following:
(a) Homogeneous DDBMS and heterogeneous DDBMS.
(b) Horizontal fragmentation and vertical fragmentation.
28. Explain the difference among fragmentation transparency, replication transparency and location
transparency.
29. Suppose a failure occurs during two-phase commit for a transaction in a distributed environment.
Explain the two-phase commit protocol.
30. Explain the term transaction system. How is transaction processing done in a distributed
database?
31. Explain the algorithm: Read-before-write protocol.
32. Write a short notes on the following :
(a) Distributed database (b) Data fragmentation
(c) Data allocation (d) Data replication
(e) Timestamping (f) Two-phase commit protocol.