Unit VI: Transaction Processing, Concurrency Control and Recovery Techniques

• ACID Properties
• Transaction States
• Implementation of Atomicity and Durability
• Serializability
• Basic Concepts of Concurrency Control and Recovery
• Locking Protocols
• Timestamp-Based Protocols
Transaction Concept
• A transaction is a unit of program execution that accesses and possibly updates various data items.
• A transaction must see a consistent database.
• During transaction execution the database may be temporarily inconsistent.
• When the transaction is committed, the database must be consistent.
• Two main issues to deal with:
  - Failures of various kinds, such as hardware failures and system crashes
  - Concurrent execution of multiple transactions
• Transaction: a collection of actions that transforms the DB from one consistent state into another consistent state; during the execution the DB might be inconsistent.
ACID Properties
To preserve the integrity of data, the database system must ensure:
• Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database.
• Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
  - That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
  1. read(A)
  2. A := A - 50
  3. write(A)
  4. read(B)
  5. B := B + 50
  6. write(B)
• Consistency requirement: the sum of A and B is unchanged by the execution of the transaction.
• Atomicity requirement: if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database; otherwise an inconsistency will result.
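The atomicity requirement can be illustrated with a small Python sketch. It is not how a DBMS actually implements atomicity; it simply assumes the database is an in-memory dict, records each item's old value before writing it, and restores those values if a simulated failure occurs between write(A) and write(B). The names `transfer`, `TransferFailed` and `fail_after_debit` are hypothetical.

```python
class TransferFailed(Exception):
    """Raised to simulate a failure in the middle of the transfer."""


def transfer(db, src, dst, amount, fail_after_debit=False):
    """Move `amount` from db[src] to db[dst] all-or-nothing (hypothetical helper)."""
    undo_log = []  # (item, old_value) pairs, recorded before each write
    try:
        a = db[src]                      # 1. read(A)
        a = a - amount                   # 2. A := A - 50
        undo_log.append((src, db[src]))  #    remember old value of A
        db[src] = a                      # 3. write(A)
        if fail_after_debit:
            raise TransferFailed("failure between step 3 and step 6")
        b = db[dst]                      # 4. read(B)
        b = b + amount                   # 5. B := B + 50
        undo_log.append((dst, db[dst]))  #    remember old value of B
        db[dst] = b                      # 6. write(B)
    except TransferFailed:
        # Roll back: restore old values in reverse order so that none of the
        # transaction's updates remain in the database.
        for item, old in reversed(undo_log):
            db[item] = old
        return False
    return True


db = {"A": 1000, "B": 2000}
print(transfer(db, "A", "B", 50, fail_after_debit=True), db)  # False {'A': 1000, 'B': 2000}
print(transfer(db, "A", "B", 50), db)                         # True {'A': 950, 'B': 2050}
```

The failed run leaves A + B = 3000 because the debit of A is undone; without the undo log the database would be left with A = 950 and B = 2000.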
Example of Fund Transfer (Cont.)
• Durability requirement: once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.
• Isolation requirement: if between steps 3 and 6 another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
  - Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.
Transaction State
• Active: the initial state; the transaction stays in this state while it is executing.
• Partially committed: after the final statement has been executed.
• Failed: after the discovery that normal execution can no longer proceed.
• Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
  - restart the transaction (only if there was no internal logical error)
  - kill the transaction
• Committed: after successful completion.
Transaction State (Cont.)
[Figure: transaction state diagram]
Implementation of Atomicity and Durability
• The recovery-management component of a database system implements the support for atomicity and durability.
• The shadow-database scheme:
  - assumes that only one transaction is active at a time.
  - a pointer called db_pointer always points to the current consistent copy of the database.
  - all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk.
  - in case the transaction fails, the old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.
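A minimal Python sketch of the shadow-database scheme, assuming one active transaction at a time and a database small enough to hold as an in-memory dict. Flushing pages to disk is not modelled, and `ShadowDatabase` and its `run` method are hypothetical names. Note that the whole database is copied before every transaction, which is the cost that makes the scheme impractical for large databases.

```python
import copy


class ShadowDatabase:
    """Shadow-database scheme sketch: one active transaction at a time,
    database held as an in-memory dict (disk flushing not modelled)."""

    def __init__(self, data):
        self.db_pointer = dict(data)  # always the current consistent copy

    def run(self, transaction):
        shadow = copy.deepcopy(self.db_pointer)  # copy the entire database
        try:
            transaction(shadow)                  # all updates go to the shadow copy
        except Exception:
            # Failure: db_pointer still points to the old consistent copy;
            # the shadow copy is simply discarded.
            return False
        # Partial commit reached: switch db_pointer to the updated copy.
        self.db_pointer = shadow
        return True


def transfer(db):
    db["A"] -= 50
    db["B"] += 50


def crashing_transfer(db):
    db["A"] -= 50
    raise RuntimeError("crash before B is updated")


store = ShadowDatabase({"A": 1000, "B": 2000})
print(store.run(crashing_transfer), store.db_pointer)  # False {'A': 1000, 'B': 2000}
print(store.run(transfer), store.db_pointer)           # True {'A': 950, 'B': 2050}
```

Switching a single pointer is what makes the commit atomic here: until the assignment to `db_pointer`, no reader can observe the shadow copy.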
Implementation of Atomicity and Durability (Cont.)
The shadow-database scheme:
• Assumes that disks do not fail.
• Is useful for text editors, but extremely inefficient for large databases: executing a single transaction requires copying the entire database. We will see better schemes in Chapter 17.
Isolation
• Isolation is the property of transactions which requires each transaction to see a consistent DB at all times.
• If two concurrent transactions access a data item that is being updated by one of them (i.e., one of them performs a write operation), it is not possible to guarantee that the other will read the correct value.
• Consistency across transactions is trivially achieved if transactions are executed serially.
• Therefore, if several transactions are executed concurrently, the result must be the same as if they were executed serially in some order (→ serializability).
• SQL-92 specifies three phenomena that can occur if proper isolation is not maintained:
  - Dirty read
    ∗ T1 modifies x, which is then read by T2 before T1 terminates; if T1 aborts, T2 has read a value that never existed in the DB.
  - Non-repeatable (fuzzy) read
    ∗ T1 reads x; T2 then modifies or deletes x and commits; T1 tries to read x again but reads a different value or cannot find it.
  - Phantom
    ∗ T1 searches the database according to a predicate P while T2 inserts new tuples that satisfy P; if T1 repeats the search, it sees tuples that were not there before.
• Based on the three phenomena, SQL-92 specifies different isolation levels:
  - Read uncommitted
    ∗ For transactions operating at this level, all three phenomena are possible.
  - Read committed
    ∗ Fuzzy reads and phantoms are possible, but dirty reads are not.
  - Repeatable read
    ∗ Only phantoms are possible.
  - Anomaly serializable (the level SQL-92 calls SERIALIZABLE)
    ∗ None of the phenomena are possible.
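The level/phenomenon table above can be encoded directly. The Python sketch below uses the SQL-92 keyword names for the levels; `LEVELS` and `weakest_level_preventing` are hypothetical names, and the function returns the weakest level under which none of the given phenomena can occur.

```python
PHENOMENA = {"dirty read", "fuzzy read", "phantom"}

# Levels from weakest to strongest, each with the phenomena it still permits.
LEVELS = [
    ("READ UNCOMMITTED", {"dirty read", "fuzzy read", "phantom"}),
    ("READ COMMITTED", {"fuzzy read", "phantom"}),
    ("REPEATABLE READ", {"phantom"}),
    ("SERIALIZABLE", set()),
]


def weakest_level_preventing(unwanted):
    """Return the weakest level under which none of `unwanted` can occur."""
    unwanted = set(unwanted)
    if not unwanted <= PHENOMENA:
        raise ValueError(f"unknown phenomena: {unwanted - PHENOMENA}")
    for level, permitted in LEVELS:
        if not permitted & unwanted:
            return level


print(weakest_level_preventing({"dirty read", "fuzzy read"}))  # REPEATABLE READ
```

The loop always returns, because SERIALIZABLE permits no phenomena.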
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system. Advantages are:
  - increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.
  - reduced average response time for transactions: short transactions need not wait behind long ones.
• Concurrency control schemes: mechanisms to achieve isolation, i.e., to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.
Schedules
• Schedules: sequences that indicate the chronological order in which instructions of concurrent transactions are executed.
  - A schedule for a set of transactions must consist of all instructions of those transactions.
  - It must preserve the order in which the instructions appear in each individual transaction.
Example Schedules
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
[Figure: Schedule 1]
Example Schedule (Cont.)
• Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1.
[Figure: Schedule 3]
In both Schedule 1 and Schedule 3, the sum A + B is preserved.
Example Schedules (Cont.)
• The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.
[Figure: Schedule 4]
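To see why the sum is or is not preserved, the two transactions can be simulated in Python as generators that pause after each instruction, with a schedule given as the order in which the transactions take steps. The three interleavings are assumptions modelled on Schedules 1, 3 and 4 of the text (the figures are not reproduced here), as are the initial balances A = 1000 and B = 2000; 10% is computed with integer division.

```python
def t1(db):
    """T1: transfer $50 from A to B, pausing after each instruction."""
    a = db["A"]        # read(A)
    yield
    a = a - 50         # A := A - 50
    yield
    db["A"] = a        # write(A)
    yield
    b = db["B"]        # read(B)
    yield
    b = b + 50         # B := B + 50
    yield
    db["B"] = b        # write(B)
    yield


def t2(db):
    """T2: transfer 10% of A's balance to B, pausing after each instruction."""
    a = db["A"]        # read(A)
    yield
    temp = a // 10     # temp := A * 0.1 (integer dollars assumed)
    yield
    a = a - temp       # A := A - temp
    yield
    db["A"] = a        # write(A)
    yield
    b = db["B"]        # read(B)
    yield
    b = b + temp       # B := B + temp
    yield
    db["B"] = b        # write(B)
    yield


def run(order, a=1000, b=2000):
    """Execute one instruction of the named transaction per entry in `order`."""
    db = {"A": a, "B": b}
    txns = {"T1": t1(db), "T2": t2(db)}
    for name in order:
        next(txns[name])
    return db


SERIAL_1 = ["T1"] * 6 + ["T2"] * 7
LIKE_3 = ["T1"] * 3 + ["T2"] * 4 + ["T1"] * 3 + ["T2"] * 3
LIKE_4 = ["T1"] * 2 + ["T2"] * 5 + ["T1"] * 4 + ["T2"] * 2

for label, order in [("serial", SERIAL_1), ("like 3", LIKE_3), ("like 4", LIKE_4)]:
    db = run(order)
    print(label, db, "A + B =", db["A"] + db["B"])
# serial {'A': 855, 'B': 2145} A + B = 3000
# like 3 {'A': 855, 'B': 2145} A + B = 3000
# like 4 {'A': 950, 'B': 2100} A + B = 3050
```

In the third interleaving, T1 writes A using the value it read before T2's update, so T2's debit of A is lost while its credit to B survives.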
Serializability
• Basic assumption: each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
  1. conflict serializability
  2. view serializability
• We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.
Conflict Serializability
• Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
  1. li = read(Q), lj = read(Q). li and lj don't conflict.
  2. li = read(Q), lj = write(Q). They conflict.
  3. li = write(Q), lj = read(Q). They conflict.
  4. li = write(Q), lj = write(Q). They conflict.
• Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
Conflict Serializability (Cont.)
• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
• Example of a schedule that is not conflict serializable:
    T3            T4
    read(Q)
                  write(Q)
    write(Q)
  We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
Conflict Serializability (Cont.)
• Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
[Figure: Schedule 3]
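Conflict serializability can also be tested without searching for swaps, using the precedence graph that appears later in this unit: add an edge Ti → Tj whenever a step of Ti conflicts with a later step of Tj; the schedule is conflict serializable exactly when this graph has no cycle. The Python sketch below assumes steps are (transaction, action, item) tuples, and the Schedule 3 rendering is an assumed read/write-only version of the figure.

```python
from itertools import combinations


def conflicts(op1, op2):
    """Steps are (txn, action, item). Two steps conflict if they belong to
    different transactions, touch the same item, and at least one writes it."""
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "write" in (a1, a2)


def precedence_graph(schedule):
    """Edge Ti -> Tj whenever a step of Ti conflicts with a LATER step of Tj."""
    edges = set()
    for i, j in combinations(range(len(schedule)), 2):  # always i < j
        if conflicts(schedule[i], schedule[j]):
            edges.add((schedule[i][0], schedule[j][0]))
    return edges


def has_cycle(nodes, edges):
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {n: "new" for n in nodes}  # new -> on_stack -> done (DFS colouring)

    def visit(u):
        state[u] = "on_stack"
        for v in succ[u]:
            if state[v] == "on_stack" or (state[v] == "new" and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in nodes)


def conflict_serializable(schedule):
    nodes = {step[0] for step in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))


# The T3/T4 schedule from the slide: T3 read(Q), T4 write(Q), T3 write(Q).
not_cs = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]

# An assumed read/write-only rendering of a Schedule 3-style interleaving.
sched3 = [("T1", "read", "A"), ("T1", "write", "A"),
          ("T2", "read", "A"), ("T2", "write", "A"),
          ("T1", "read", "B"), ("T1", "write", "B"),
          ("T2", "read", "B"), ("T2", "write", "B")]

print(conflict_serializable(not_cs))  # False: edges T3->T4 and T4->T3
print(conflict_serializable(sched3))  # True: only edge T1->T2
```

When the graph is acyclic, any topological order of its nodes gives an equivalent serial schedule; for `sched3` that order is < T1, T2 >, i.e., Schedule 1.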
View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:
  1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.
  2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.
  3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes.
View Serializability (Cont.)
• A schedule S is view serializable if it is view equivalent to a serial schedule.
• Every conflict serializable schedule is also view serializable.
• Schedule 9 (from the text) is a schedule which is view serializable but not conflict serializable.
[Figure: Schedule 9]
• Every view serializable schedule that is not conflict serializable has blind writes (writes of a data item that the transaction has not read).
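View equivalence can be checked mechanically by comparing, for each schedule, which write every read sees (conditions 1 and 2, with the initial value recorded as writer `None`) and which transaction performs the final write of each item (condition 3). The Python sketch below brute-forces every serial order, which is only feasible for a handful of transactions; testing view serializability is NP-complete in general. Function names are hypothetical, and the example schedule is an assumption of what Schedule 9 shows: T3 reads Q, then T4, T3 and T6 write Q, the writes of T4 and T6 being blind.

```python
from collections import Counter
from itertools import permutations


def view_facts(schedule):
    """Return (reads-from multiset, final writer per item) for a schedule of
    (txn, action, item) steps. Reading the initial value records writer None."""
    last_writer = {}
    reads_from = Counter()
    for txn, action, item in schedule:
        if action == "read":
            reads_from[(txn, item, last_writer.get(item))] += 1  # conditions 1 and 2
        else:
            last_writer[item] = txn
    return reads_from, last_writer  # last_writer covers condition 3


def view_equivalent(s1, s2):
    return view_facts(s1) == view_facts(s2)


def serial(schedule, order):
    """The serial schedule that runs whole transactions in `order`."""
    return [step for txn in order for step in schedule if step[0] == txn]


def view_serializable(schedule):
    txns = list(dict.fromkeys(step[0] for step in schedule))
    return any(view_equivalent(schedule, serial(schedule, order))
               for order in permutations(txns))


schedule_9 = [("T3", "read", "Q"), ("T4", "write", "Q"),
              ("T3", "write", "Q"), ("T6", "write", "Q")]
print(view_serializable(schedule_9))  # True: view equivalent to < T3, T4, T6 >
```

The same schedule is not conflict serializable, since T3 → T4 (read/write) and T4 → T3 (write/write) form a cycle; the blind write by T6 overwrites both, so only the final write matters for view equivalence.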
Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a data item.
• Data items can be locked in two modes:
  1. exclusive (X) mode. The data item can be both read and written. An X-lock is requested using the lock-X instruction.
  2. shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
• Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
Lock-Based Protocols (Cont.)
• Lock-compatibility matrix:
          S       X
    S     true    false
    X     false   false
• A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
• Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
• If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
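The compatibility matrix and the granting rule fit in a few lines of Python. `COMPATIBLE` and `can_grant` are hypothetical names; the helper only decides whether a request is compatible and leaves queuing and waiting to the caller.

```python
# (held mode, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """Grant `requested` only if it is compatible with every lock already
    held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]), can_grant("X", ["S"]))  # True False
```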
Lock-Based Protocols (Cont.)
• Example of a transaction performing locking:
    T2: lock-S(A);
        read(A);
        unlock(A);
        lock-S(B);
        read(B);
        unlock(B);
        display(A+B)
• Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
• A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
Pitfalls of Lock-Based Protocols
• Consider the partial schedule
[Figure: partial schedule of T3 and T4]
• Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
• Such a situation is called a deadlock.
  - To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
Pitfalls of Lock-Based Protocols (Cont.)
• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
• Starvation is also possible if the concurrency control manager is badly designed. For example:
  - A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
  - The same transaction is repeatedly rolled back due to deadlocks.
• The concurrency control manager can be designed to prevent starvation.
The Two-Phase Locking Protocol
• This is a protocol which ensures conflict-serializable schedules.
• Phase 1: Growing Phase
  - transaction may obtain locks
  - transaction may not release locks
• Phase 2: Shrinking Phase
  - transaction may release locks
  - transaction may not obtain locks
• The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock).
• Two-phase locking protocol, in summary:
  - Each transaction is executed in two phases
    ∗ Growing phase: the transaction obtains locks
    ∗ Shrinking phase: the transaction releases locks
  - The lock point is the moment of transition from the growing phase to the shrinking phase.
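A Python sketch of how the two-phase rule could be enforced per transaction: once any lock has been released, further lock requests are rejected. `TwoPhaseTransaction` and `TwoPhaseViolation` are hypothetical names, and granting or waiting on locks is assumed to happen in a separate lock manager. The T2 example from the earlier locking slide violates the rule, since it unlocks A before locking B.

```python
class TwoPhaseViolation(Exception):
    pass


class TwoPhaseTransaction:
    """Tracks one transaction's lock/unlock calls and enforces the two phases.
    Granting and waiting are assumed to be handled by a lock manager elsewhere."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.phase = "growing"

    def lock(self, item):
        if self.phase == "shrinking":
            raise TwoPhaseViolation(
                f"{self.name} requested a lock on {item} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        # The first unlock ends the growing phase: the lock point has passed.
        self.phase = "shrinking"
        self.held.discard(item)


t2 = TwoPhaseTransaction("T2")  # the T2 example from the locking slide
t2.lock("A")
t2.unlock("A")
try:
    t2.lock("B")
except TwoPhaseViolation as err:
    print("rejected:", err)  # rejected: T2 requested a lock on B after releasing a lock
```

Rewriting T2 as lock-S(A), lock-S(B), read both, then unlock(A), unlock(B) would satisfy the protocol and make the displayed sum consistent.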
The Two-Phase Locking Protocol (Cont.)
• Two-phase locking does not ensure freedom from deadlocks.
• Cascading rollback is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
• Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
The Two-Phase Locking Protocol (Cont.)
• There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
• However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
  - Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
Implementation of Locking
• A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.
• The lock manager replies to a lock request by sending a lock-grant message (or a message asking the transaction to roll back, in case of a deadlock).
• The requesting transaction waits until its request is answered.
• The lock manager maintains a data structure called a lock table to record granted locks and pending requests.
• The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked.
Lock Table
[Figure: lock table]
• Black rectangles indicate granted locks, white ones indicate waiting requests.
• The lock table also records the type of lock granted or requested.
• A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks.
• Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted.
• If a transaction aborts, all waiting or granted requests of the transaction are deleted.
  - The lock manager may keep a list of locks held by each transaction, to implement this efficiently.
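The lock-table behaviour described above can be sketched in Python with a dict from item name to a queue of [transaction, mode, granted] entries, plus a per-transaction set of items so that an abort can find its requests quickly. This is a single-threaded illustration under stated assumptions: there is no deadlock detection and no lock upgrade, and `LockManager` and its methods are hypothetical names.

```python
from collections import defaultdict

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}


class LockManager:
    """Lock table sketch: hash table from data-item name to a FIFO queue of
    [txn, mode, granted] entries. No deadlock detection or lock upgrades."""

    def __init__(self):
        self.table = defaultdict(list)    # item -> queue of requests
        self.items_of = defaultdict(set)  # txn -> items it has requests on

    def _grant_eligible(self, item):
        # An entry is granted once it is compatible with every earlier entry.
        earlier = []
        for entry in self.table[item]:
            if not entry[2] and all(COMPATIBLE[(m, entry[1])] for m in earlier):
                entry[2] = True
            earlier.append(entry[1])

    def request(self, txn, item, mode):
        self.table[item].append([txn, mode, False])  # add to end of queue
        self.items_of[txn].add(item)
        self._grant_eligible(item)
        return self.table[item][-1][2]               # granted now, or waiting

    def unlock(self, txn, item):
        self.table[item] = [e for e in self.table[item] if e[0] != txn]
        self.items_of[txn].discard(item)
        self._grant_eligible(item)                   # later requests may now be granted

    def abort(self, txn):
        # Delete every waiting or granted request of the aborted transaction.
        for item in list(self.items_of[txn]):
            self.unlock(txn, item)

    def is_granted(self, txn, item):
        return any(e[0] == txn and e[2] for e in self.table[item])


lm = LockManager()
print(lm.request("T1", "A", "S"), lm.request("T2", "A", "S"))  # True True
print(lm.request("T3", "A", "X"))                              # False: T3 waits
lm.unlock("T1", "A")
lm.abort("T2")
print(lm.is_granted("T3", "A"))                                # True
```

Checking a new request against all earlier entries, waiting ones included, means a later S request cannot overtake a waiting X request, which avoids the starvation example from the pitfalls slide.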
Timestamp-Based Protocols
• Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the timestamps determine the serializability order.
• In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
  - W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully.
  - R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.
Timestamp-Based Protocols (Cont.)
• The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
• Suppose a transaction Ti issues a read(Q):
  1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
  2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
Timestamp-Based Protocols (Cont.)
• Suppose that transaction Ti issues write(Q):
  1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
  2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
  3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
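The read and write rules translate directly into code. In this Python sketch (hypothetical class and exception names), timestamps are positive integers, an item never read or written has R- and W-timestamps of 0, a rejected operation raises `Rollback`, and neither data values nor the undoing of a rolled-back transaction's earlier effects are modelled. The operation sequence in the usage is a small hypothetical one.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation; the issuer is rolled back."""


class TimestampOrdering:
    def __init__(self):
        self.r_ts = {}  # R-timestamp(Q); 0 if Q was never read
        self.w_ts = {}  # W-timestamp(Q); 0 if Q was never written

    def read(self, ts, q):
        if ts < self.w_ts.get(q, 0):
            raise Rollback(f"read({q}) by TS={ts}: value already overwritten")
        self.r_ts[q] = max(self.r_ts.get(q, 0), ts)

    def write(self, ts, q):
        if ts < self.r_ts.get(q, 0):
            raise Rollback(f"write({q}) by TS={ts}: a younger transaction already read {q}")
        if ts < self.w_ts.get(q, 0):
            raise Rollback(f"write({q}) by TS={ts}: obsolete value")
        self.w_ts[q] = ts


to = TimestampOrdering()
steps = [(1, "read", "Y"), (2, "read", "Y"), (3, "write", "Y"),
         (3, "write", "Z"), (5, "read", "Z"), (2, "read", "Z"), (3, "write", "Z")]
for ts, op, q in steps:
    try:
        getattr(to, op)(ts, q)
        print(f"TS={ts} {op}({q}) executed")
    except Rollback as err:
        print("rolled back:", err)
```

The last two steps are rejected: TS=2 would read a Z already overwritten by TS=3, and TS=3 would overwrite a Z that TS=5 has already read.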
Example Use of the Protocol
A partial schedule for several data items for transactions with timestamps 1, 2, 3, 4, 5
T1 T2 T3 T4 T5
read(X)
read(Y)
read(Y)
write(Y)
write(Z)
read(Z)
read(X)
abort
read(X)
write(Z)
abort
write(Y)
write(Z)
Correctness of Timestamp-Ordering Protocol
• The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form:
    transaction with smaller timestamp → transaction with larger timestamp
  Thus, there will be no cycles in the precedence graph.
• The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
• But the schedule may not be cascade-free, and may not even be recoverable.
Recoverability and Cascade Freedom
• Problem with the timestamp-ordering protocol:
  - Suppose Ti aborts, but Tj has read a data item written by Ti.
  - Then Tj must abort; if Tj had been allowed to commit earlier, the schedule is not recoverable.
  - Further, any transaction that has read a data item written by Tj must abort.
  - This can lead to cascading rollback, that is, a chain of rollbacks.
• Solution:
  - A transaction is structured such that its writes are all performed at the end of its processing.
  - All writes of a transaction form an atomic action; no transaction may execute while another transaction's writes are being performed.
  - A transaction that aborts is restarted with a new timestamp.
Thomas' Write Rule
• A modified version of the timestamp-ordering protocol in which obsolete write operations may be ignored under certain circumstances.
• When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, rather than rolling back Ti as the timestamp-ordering protocol would have done, this write operation can be ignored.
• Otherwise this protocol is the same as the timestamp-ordering protocol.
• Thomas' Write Rule allows greater potential concurrency. Unlike the previous protocols, it allows some view-serializable schedules that are not conflict-serializable.
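Thomas' write rule changes only the second write check. The Python sketch below (hypothetical names; integer timestamps, 0 for an untouched item) ignores an obsolete write instead of rolling back. The usage shows the classic case: an older transaction's write arrives after a younger transaction has already written Q, so the older value would be overwritten in any timestamp-ordered serial execution anyway.

```python
class Rollback(Exception):
    pass


class ThomasTimestampOrdering:
    """Timestamp ordering with Thomas' write rule (sketch; hypothetical names)."""

    def __init__(self):
        self.r_ts = {}    # R-timestamp(Q), 0 if never read
        self.w_ts = {}    # W-timestamp(Q), 0 if never written
        self.values = {}  # current value of each item

    def read(self, ts, q):
        if ts < self.w_ts.get(q, 0):
            raise Rollback(f"read({q}) by TS={ts}: value already overwritten")
        self.r_ts[q] = max(self.r_ts.get(q, 0), ts)
        return self.values.get(q)

    def write(self, ts, q, value):
        if ts < self.r_ts.get(q, 0):
            raise Rollback(f"write({q}) by TS={ts}: a younger transaction already read {q}")
        if ts < self.w_ts.get(q, 0):
            return False  # obsolete write: ignored instead of rolling back
        self.w_ts[q] = ts
        self.values[q] = value
        return True


# Classic case: T1 (TS=1) reads Q, T2 (TS=2) writes Q, then T1 writes Q.
s = ThomasTimestampOrdering()
s.read(1, "Q")
s.write(2, "Q", "from T2")
print(s.write(1, "Q", "from T1"), s.values["Q"])  # False from T2
```

Under the basic protocol the last write would roll T1 back; here T1 continues, and the resulting schedule is view serializable but not conflict serializable.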
Deadlock Handling
• Consider the following two transactions:
    T1: write(X)          T2: write(Y)
        write(Y)              write(X)
• Schedule with deadlock:
    T1                          T2
    lock-X on X
    write(X)
                                lock-X on Y
                                write(Y)
                                wait for lock-X on X
    wait for lock-X on Y
Deadlock Handling (Cont.)
• A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
• Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:
  - Require that each transaction locks all its data items before it begins execution (predeclaration).
  - Impose a partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
More Deadlock Prevention Strategies
• The following schemes use transaction timestamps for the sake of deadlock prevention alone.
• wait-die scheme (non-preemptive)
  - An older transaction may wait for a younger one to release a data item. Younger transactions never wait for older ones; they are rolled back instead.
  - A transaction may die several times before acquiring a needed data item.
• wound-wait scheme (preemptive)
  - An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it. Younger transactions may wait for older ones.
  - There may be fewer rollbacks than in the wait-die scheme.
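Both schemes reduce to a decision made when a requester asks for an item held by another transaction, where a smaller timestamp means an older transaction. A minimal Python sketch with hypothetical function names that return the action to take:

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits; a younger one dies (is rolled back)."""
    return "wait" if ts_requester < ts_holder else "roll back requester"


def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the younger holder; a younger requester waits."""
    return "roll back holder" if ts_requester < ts_holder else "wait"


# T5 (TS=5) holds an item; T1 (TS=1, older) and T9 (TS=9, younger) request it.
for scheme in (wait_die, wound_wait):
    print(scheme.__name__, scheme(1, 5), "|", scheme(9, 5))
# wait_die wait | roll back requester
# wound_wait roll back holder | wait
```

In both functions only the older-to-younger or younger-to-older direction ever waits, never both, so no cycle of waits can form.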
Deadlock Prevention (Cont.)
• In both the wait-die and wound-wait schemes, a rolled-back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
• Timeout-based schemes:
  - A transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
  - Thus deadlocks cannot persist: any deadlock is broken when one of the waits times out.
  - Simple to implement, but starvation is possible. It is also difficult to determine a good value for the timeout interval.
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E):
  - V is a set of vertices (all the transactions in the system).
  - E is a set of edges; each element is an ordered pair Ti → Tj.
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
• When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
• The system is in a deadlock state if and only if the wait-for graph has a cycle. The system must invoke a deadlock-detection algorithm periodically to look for cycles.
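Cycle detection on the wait-for graph can be done with a depth-first search. In this Python sketch (hypothetical `find_deadlock`), the graph is a dict mapping each transaction to the set of transactions it waits for, and the function returns the transactions on one cycle, or None if there is no deadlock.

```python
def find_deadlock(waits_for):
    """waits_for maps Ti to the set of Tj such that Ti -> Tj is an edge.
    Returns a list of transactions forming a cycle, or None."""
    visiting, done = set(), set()
    path = []

    def dfs(t):
        visiting.add(t)
        path.append(t)
        for u in waits_for.get(t, ()):
            if u in visiting:              # back edge: cycle found
                return path[path.index(u):]
            if u not in done:
                cycle = dfs(u)
                if cycle:
                    return cycle
        visiting.discard(t)
        path.pop()
        done.add(t)
        return None

    for t in list(waits_for):
        if t not in done:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


waits = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
print(find_deadlock(waits))  # ['T1', 'T2', 'T3']
```

T4 is waiting on a deadlocked transaction but is not part of the cycle, so it is not returned; rolling back any one of T1, T2 or T3 breaks the deadlock.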
Deadlock Detection (Cont.)
[Figure: wait-for graph without a cycle; wait-for graph with a cycle]
Deadlock Recovery
• When a deadlock is detected:
  - Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur minimum cost.
  - Rollback: determine how far to roll back the transaction.
    ∗ Total rollback: abort the transaction and then restart it.
    ∗ It is more effective to roll back the transaction only as far as necessary to break the deadlock.
  - Starvation happens if the same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation.
• Types of media with an increasing degree of reliability:
  - Main memory, magnetic disk, magnetic tape and optical disk.
  - Primary storage and secondary storage.
  - Volatile and non-volatile.
• Failure classification:
  - Transaction failures
    ∗ Logical errors
    ∗ System errors
  - System crash
  - Disk failure
Recovery Concepts
• Log-Based Recovery
  - In a log-based recovery system, a log is maintained in which all the modifications of the database are kept.
  - A log consists of log records.
  - A typical log record must contain the following fields:
    ∗ Transaction identifier
    ∗ Data-item identifier
    ∗ Date and time
    ∗ Old value
    ∗ New value
  - The log must be written on non-volatile (stable) storage.
  - In log-based recovery, the following two operations are required for recovery:
    ∗ Redo: the work of the transactions that completed successfully before the crash is performed again.
    ∗ Undo: all the work done by the transactions that did not complete due to the crash is undone.
  - Log record types:
    [Ti, start]
    [Ti, Aj]
    [Ti, Aj, v1, v2]
    [Ti, Commit]
    [Ti, Aborts]
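A simplified Python sketch of recovery with redo and undo. It assumes update records carry (transaction, item, old value, new value), matching the [Ti, Aj, v1, v2] form with v1 taken as the old value; that a transaction is redone if its commit record is in the log and undone otherwise; and that undo runs backwards over the log before redo runs forwards. Checkpoints and write-ahead rules are not modelled, and `recover` is a hypothetical name.

```python
def recover(log, db):
    """log: list of ("start", T), ("write", T, item, old, new), ("commit", T)
    or ("abort", T) tuples. db: on-disk state found after the crash."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass (backwards): restore old values written by transactions
    # that have no commit record in the log.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old

    # Redo pass (forwards): reapply new values written by committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db


log = [
    ("start", "T1"),
    ("write", "T1", "A", 1000, 950),
    ("write", "T1", "B", 2000, 2050),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "C", 700, 600),  # crash before T2 commits
]
# Assumed disk state at the crash: A's update was never flushed, C's was.
print(recover(log, {"A": 1000, "B": 2050, "C": 600}))  # {'A': 950, 'B': 2050, 'C': 700}
```

Both passes only assign stored values, so running `recover` again after a crash during recovery gives the same result.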
• Caching (Buffering) of Disk Blocks
• Write-Ahead Logging
• Checkpointing
• Transaction Rollback and Cascading Rollback
• Recovery Based on Deferred Update (NO-UNDO/REDO)
• Recovery Based on Immediate Update (UNDO/REDO)
• Shadow Paging
• Database Backup and Recovery from Catastrophic Failures
