3.1 TRANSACTION CONCEPTS:
The term transaction refers to a collection of operations that form a single logical unit of
work.
Example: Transfer of money from one account to another is a transaction.
A transaction is initiated by a user program written in a high-level programming language
with embedded database accesses in JDBC or ODBC.
A transaction is delimited by statements of the form begin transaction and end
transaction.
Transactions access data using two operations:
read(X), which transfers the data item X from the database to a variable, also called X, in a main-memory buffer belonging to the transaction that executed the read.
write(X), which transfers the value in the variable X in the main-memory buffer of the transaction that executed the write to the data item X in the database.
Example:
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can
be defined as:
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
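As a rough illustration, the same transfer can be written in Python using the sqlite3 module. This is only a sketch: the table accounts(id, balance) and the connection setup are assumptions made for the example, not something given in the text above.

import sqlite3

def transfer(conn, from_acct, to_acct, amount):
    # Ti: update A and B inside one transaction; commit makes both updates
    # permanent, rollback undoes a partial transfer (atomicity).
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, from_acct))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, to_acct))
        conn.commit()    # end transaction
    except Exception:
        conn.rollback()  # make it appear the transaction never executed
        raise

With sqlite3, the transaction begins implicitly at the first update and is delimited by commit() or rollback(), mirroring the begin transaction and end transaction statements above.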
Let us now consider each of the ACID properties.
• Consistency: The consistency requirement here is that the sum of A and B be unchanged by the
execution of the transaction. This task may be facilitated by automatic testing of integrity
constraints.
Atomicity: The basic idea behind ensuring atomicity is this: The database system keeps track (on
disk) of the old values of any data on which a transaction performs a write. This information is
written to a file called the log. If the transaction does not complete its execution, the database
system restores the old values from the log to make it appear as though the transaction never
executed.
Suppose that, just before the execution of transaction Ti, the values of accounts A and B are
$1000 and $2000, respectively. Suppose that the failure happened after the write(A) operation but
before the write(B) operation. In this case, the values of accounts A and B reflected in the database
are $950 and $2000.
The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B
is no longer preserved.
Ensuring atomicity is the responsibility of the database system; specifically, it is handled by
a component of the database called the recovery system.
Durability: The durability property guarantees that, once a transaction completes successfully, all
the updates that it carried out on the database persist, even if there is a system failure after the
transaction completes execution.
We can guarantee durability by ensuring that either:
1. The updates carried out by the transaction have been written to disk before the
transaction completes.
2. Information about the updates carried out by the transaction and written to disk is
sufficient to enable the database to reconstruct the updates when the database system
is restarted after the failure.
The recovery system of the database is responsible for ensuring durability.
Isolation: If several transactions are executed concurrently, their operations may interleave in some
undesirable way, resulting in an inconsistent state.
For example, the database is temporarily inconsistent while the transaction to transfer funds
from A to B is executing, with the deducted total written to A and the increased total yet to be
written to B. If a second concurrently running transaction reads A and B at this intermediate point
and computes A+B, it will observe an inconsistent value. Furthermore, if this second transaction
then performs updates on A and B based on the inconsistent values that it read, the database may be
left in an inconsistent
state even after both transactions have completed.
A way to avoid the problem of concurrently executing transactions is to execute transactions
serially—that is, one after the other.
Ensuring the isolation property is the responsibility of a component of the database system
called the concurrency-control system.
States of Transaction
The state diagram corresponding to a transaction is shown in the figure.
A transaction must be in one of the following states:
Active: the initial state, the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored to
its state prior to the start of the transaction.
Committed: after successful completion.
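The state diagram can also be written down as a small transition table. The following is a minimal sketch; the names are chosen only for this illustration.

from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    COMMITTED = auto()

# Legal transitions taken from the state diagram described above.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: set(),
    TxState.COMMITTED: set(),
}

def can_move(current, nxt):
    # True if the transition is allowed by the diagram.
    return nxt in TRANSITIONS[current]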
3.3 SCHEDULES:
Schedule is defined as a sequence of instructions that specify the chronological order in
which instructions of concurrent transactions are executed.
A schedule is serializable if it is equivalent to a serial schedule.
A schedule in which the operations of each transaction are executed consecutively, without any interference from other transactions, is called a serial schedule.
Types of serializability are
1. Conflict Serializability
2. View Serializability
3.4 SERIALIZABILITY:
Conflict Serializability:
Instructions Ii and Ij, of transactions Ti and Tj respectively, conflict if and only if there exists
some item Q accessed by both Ii and Ij, and at least one of these instructions wrote Q.
1. Ii = read(Q), Ij = read(Q). Ii and Ij do not conflict.
2. Ii = read(Q), Ij = write(Q). They conflict.
3. Ii = write(Q), Ij = read(Q). They conflict.
4. Ii = write(Q), Ij = write(Q). They conflict.
If Ii and Ij are consecutive in a schedule and they do not conflict, their results would remain
the same even if they had been interchanged in the schedule.
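The four cases above can be captured in a few lines of Python. This is a sketch only; the tuple representation of an instruction is an assumption of the example.

def conflicts(op1, op2):
    # An instruction is written as (transaction, action, item), e.g. ("T1", "read", "Q").
    # Instructions of different transactions conflict iff they access the same
    # item and at least one of them is a write (cases 2, 3 and 4 above).
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "write" in (a1, a2)

print(conflicts(("T1", "read", "Q"), ("T2", "write", "Q")))   # True  (case 2)
print(conflicts(("T1", "read", "Q"), ("T2", "read", "Q")))    # False (case 1)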
Consider the following schedule 3.
The write(A) of T1 conflicts with the read(A) of T2. However, the write(A) of T2 does not conflict with the read(B) of T1, because the two instructions access different data items.
Since they do not conflict, we can swap the write(A) and read(B) instructions to generate a new schedule 5.
Regardless of the initial system state, schedules 3 and 5 generate the same result.
Swap the write (B) instruction of T1 with write (A) instruction of T2.
Swap the write (B) instruction of T1 with the read (A) instruction of T2 .
View Serializability:
A schedule S is view serializable if it is view equivalent to a serial schedule.
Let S and S′ be two schedules with the same set of transactions. S and S′ are view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S′, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S′ also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S′.
Every conflict serializable schedule is also view serializable.
The following schedule is view serializable but not conflict serializable.
In the above schedule, transactions T4 and T5 perform write(Q) operations without having performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in every view-serializable schedule that is not conflict serializable.
Precedence Graph:
The precedence graph for schedule 1 contains a single edge T1 → T2, since all the instructions of T1 are executed before the first instruction of T2 is executed.
The precedence graph for schedule 2 contains a single edge T2 → T1, since all the instructions of T2 are executed before the first instruction of T1 is executed.
Consider the following schedule 4.
To test conflict serializability, construct a precedence graph for the given schedule. If the graph contains a cycle, the schedule is not conflict serializable. If the graph contains no cycle, the schedule is conflict serializable.
Schedules 1 and 2 are conflict serializable, as the precedence graph for each contains no cycle, while schedule 4 is not conflict serializable, as its precedence graph contains a cycle.
Solution:
The precedence graph for the given schedule is shown in the figure.
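The test just described can be sketched in Python: build the precedence graph from the conflicting instructions and report the schedule as conflict serializable only if the graph is acyclic. The list-of-tuples representation of a schedule is an assumption of this sketch.

def precedence_graph(schedule):
    # schedule: list of (transaction, action, item) in chronological order,
    # e.g. [("T1", "read", "A"), ("T2", "write", "A"), ...].
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))   # Ti's instruction comes first: edge Ti -> Tj
    return edges

def has_cycle(edges):
    # Depth-first search; reaching a node already on the current path is a cycle.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visited, on_path = set(), set()
    def dfs(u):
        visited.add(u)
        on_path.add(u)
        for v in graph[u]:
            if v in on_path or (v not in visited and dfs(v)):
                return True
        on_path.discard(u)
        return False
    return any(dfs(u) for u in graph if u not in visited)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))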
Locks:
The two modes of locks are:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on
item Q, then Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X)
on item Q, then Ti can both read and write Q.
We require that every transaction request a lock in an appropriate mode on data item Q,
depending on the types of operations that it will perform on Q.
The transaction makes the request to the concurrency-control manager.
The transaction can proceed with the operation only after the concurrency-control manager
grants the lock to the transaction.
The use of these two lock modes allows multiple transactions to read a data item but limits
write access to just one transaction at a time.
To state this more generally, given a set of lock modes, we can define a compatibility
function on them as follows:
Let A and B represent arbitrary lock modes. Suppose that a transaction Ti requests a lock of mode A on item Q on which transaction Tj (Ti ≠ Tj) currently holds a lock of mode B.
If transaction Ti can be granted a lock on Q immediately, in spite of the presence of the
mode B lock, then we say mode A is compatible with mode B. Such a function can be represented
conveniently by a matrix.
An element comp(A, B) of the matrix has the value true if and only if mode A is compatible
with mode B.
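For the two modes above the matrix is small: only two shared locks are compatible with each other. A minimal sketch of the compatibility function follows.

# Compatibility matrix for shared (S) and exclusive (X) locks.
COMP = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def comp(a, b):
    # comp(A, B) is True iff a lock of mode A can be granted while another
    # transaction holds a lock of mode B on the same data item.
    return COMP[(a, b)]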
A transaction requests a shared lock on data item Q by executing the lock-S(Q) instruction.
Similarly, a transaction requests an exclusive lock through the lock-X(Q) instruction. A
transaction can unlock a data item Q by the unlock(Q) instruction.
To access a data item, transaction Ti must first lock that item. If the data item is already
locked by another transaction in an incompatible mode, the concurrency control manager will not
grant the lock until all incompatible locks held by other transactions have been released.
Thus, Ti is made to wait until all incompatible locks held by other transactions have been
released.
Transaction Ti may unlock a data item that it had locked at some earlier point.
Example:
Let A and B be two accounts that are accessed by transactions T1 and T2. Transaction T1
transfers $50 from account B to account A
T1: lock-X(B);
read(B);
B := B − 50;
write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A).
Transaction T1.
Transaction T2 displays the total amount of money in accounts A and B—that is, the sum
A+ B
T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A + B).
Transaction T2.
Suppose that the values of accounts A and B are $100 and $200, respectively.
If these two transactions are executed serially, either in the order T1, T2 or the order T2, T1,
then transaction T2 will display the value $300.
If, however, these transactions are executed concurrently, then schedule 1 is possible.
In this case, transaction T2 displays $250, which is incorrect.
The reason for this mistake is that the transaction T1 unlocked data item B too early, as a
result of which T2 saw an inconsistent state.
The schedule shows the actions executed by the transactions, as well as the points at which the concurrency-control manager grants the locks.
The transaction making a lock request cannot execute its next action until the concurrency-control manager grants the lock.
Hence, the lock must be granted in the interval of time between the lock-request operation and the following action of the transaction.
The Two-Phase Locking Protocol:
The two-phase locking protocol requires that each transaction issue lock and unlock requests in two phases: a growing phase, in which a transaction may obtain locks but may not release any lock, and a shrinking phase, in which a transaction may release locks but may not obtain any new locks.
Advantages:
The two-phase locking protocol ensures conflict serializability.
For any transaction, the point in the schedule where the transaction has obtained its final lock is called the lock point of the transaction. The transactions can be ordered according to their lock points; this ordering is a serializability ordering for the transactions.
Disadvantages:
1. It does not ensure freedom from deadlock.
2. Cascading rollbacks may occur under two-phase locking.
Consider schedule 2.
Fig. Partial schedule under two-phase locking
Here, the transactions T5, T6 and T7 are two phase, but failure of T5 after the read (A)
instruction of T7 leads to cascading rollback of T6 and T7.
Types:
1. Strict two-phase locking protocol
This protocol requires that locking be two phase and, in addition, that all exclusive-mode locks taken by a transaction be held until that transaction commits. This requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other transaction from reading the data.
2. Rigorous two-phase locking protocol
This protocol requires that all locks (shared as well as exclusive) be held until the transaction commits.
Lock Conversions:
The two-phase locking protocol allows lock conversions: a shared lock can be upgraded to an exclusive lock, and an exclusive lock can be downgraded to a shared lock. We denote conversion from shared to exclusive mode by upgrade, and from exclusive to shared mode by downgrade. Lock conversion cannot be allowed arbitrarily: upgrading can take place only in the growing phase, and downgrading only in the shrinking phase.
Strict two-phase locking and rigorous two-phase locking (with lock conversions) are used
extensively in commercial database systems.
A simple but widely used scheme automatically generates the appropriate lock and unlock
instructions for a transaction, on the basis of read and write requests from the transaction:
When a transaction Ti issues a read(Q) operation, the system issues a lock-S(Q) instruction followed by the read(Q) instruction.
When Ti issues a write(Q) operation, the system checks to see whether Ti already holds a shared lock on Q. If it does, then the system issues an upgrade(Q) instruction, followed by the write(Q) instruction. Otherwise, the system issues a lock-X(Q) instruction, followed by the write(Q) instruction.
All locks obtained by a transaction are unlocked after that transaction commits or aborts.
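A rough sketch of this automatic scheme is given below. The lock manager object and its method names (lock_s, lock_x, upgrade, unlock) are invented for the illustration and do not refer to any particular system.

class AutoLockingTransaction:
    # Generates lock instructions from read/write requests, as described above.
    def __init__(self, tid, lock_manager, db):
        self.tid = tid
        self.lm = lock_manager
        self.db = db
        self.shared = set()     # items held in S mode
        self.exclusive = set()  # items held in X mode

    def read(self, q):
        if q not in self.shared and q not in self.exclusive:
            self.lm.lock_s(self.tid, q)      # lock-S(Q) before read(Q)
            self.shared.add(q)
        return self.db[q]

    def write(self, q, value):
        if q in self.shared:
            self.lm.upgrade(self.tid, q)     # upgrade(Q) if an S lock is held
            self.shared.discard(q)
            self.exclusive.add(q)
        elif q not in self.exclusive:
            self.lm.lock_x(self.tid, q)      # otherwise lock-X(Q)
            self.exclusive.add(q)
        self.db[q] = value

    def finish(self):
        # All locks are released only when the transaction commits or aborts.
        for q in self.shared | self.exclusive:
            self.lm.unlock(self.tid, q)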
3.8 DEADLOCK:
Deadlock refers to a particular situation where two or more processes are each waiting for
another to release a resource, or more than two processes are waiting for resources in a circular
chain. Deadlock is a common problem in multiprocessing where many processes share a specific
type of mutually exclusive resource.
Some computers, usually those intended for the time-sharing and/or real-time markets, are
often equipped with a hardware lock, or hard lock, which guarantees exclusive access to processes,
forcing serialization.
Deadlocks are particularly disconcerting because there is no general solution to avoid them.
Livelock:
Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that the states of the processes involved constantly change with regard to one another while never progressing.
The general definition only states that a specific process is not progressing.
For example, the system keeps selecting the same transaction for rollback causing the
transaction to never finish executing.
Another livelock situation can come about when the system is deciding which transaction
gets a lock and which waits in a conflict situation.
Deadlock Prevention:
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations that transactions are about to execute.
The DBMS inspects the operations and analyzes whether they can create a deadlock situation. If it finds that a deadlock situation might occur, that transaction is never allowed to execute.
There are deadlock prevention schemes that use the timestamp ordering of transactions in order to pre-decide a deadlock situation.
Wait-Die Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held with a conflicting lock by some other transaction, one of two possibilities may occur:
▪ If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting lock, is older than Tj, then Ti is allowed to wait until the data item is available.
▪ If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held with a conflicting lock by some other transaction, one of two possibilities may occur:
▪ If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting lock, is older than Tj, then Ti forces Tj to be rolled back, that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
▪ If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
In both schemes, the transaction that entered the system later is the one that is aborted. Both decisions are sketched below.
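Both decisions can be written as small functions of the two timestamps. This is a sketch only, assuming that a smaller timestamp means an older transaction.

def wait_die(ts_requester, ts_holder):
    # Wait-die: an older requester waits; a younger requester dies and is
    # restarted later with the same timestamp.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Wound-wait: an older requester wounds (rolls back) the younger holder;
    # a younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

# Example: Ti with timestamp 5 requests an item held by Tj with timestamp 9.
print(wait_die(5, 9))     # 'wait'          - the older transaction waits
print(wound_wait(5, 9))   # 'wound holder'  - the older transaction preempts Tj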
Deadlock Avoidance:
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect any deadlock situation in advance.
Methods like the "wait-for graph" are suitable for systems in which transactions are lightweight and hold fewer instances of each resource. In a bulky system, deadlock prevention techniques may work better.
Wait-for Graph:
This is a simple method available to track if any deadlock situation may arise.
For each transaction entering in the system, a node is created.
When transaction Ti requests a lock on an item, say X, which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data item held by another. The system keeps checking whether there is any cycle in the graph.
Two approaches can be used. The first is not to allow any request for an item that is already locked by some other transaction. This is not always feasible and may cause starvation, where a transaction waits indefinitely for a data item and can never acquire it.
The second option is to roll back one of the transactions. It is not always feasible to roll back the younger transaction, as it may be more important than the older one.
With the help of some relative algorithm, a transaction is chosen to be aborted. This transaction is called the victim, and the process is known as victim selection.
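A sketch of cycle detection on the wait-for graph, together with one possible victim-selection rule, is shown below. The dictionary representation of the graph and the "abort the youngest transaction" policy are assumptions of the sketch.

def find_cycle(wait_for):
    # wait_for maps each transaction to the set of transactions it waits for.
    # Returns one cycle as a list of transactions, or None if there is no cycle.
    def dfs(node, path, on_path):
        for nxt in wait_for.get(node, ()):
            if nxt in on_path:
                return path[path.index(nxt):]
            found = dfs(nxt, path + [nxt], on_path | {nxt})
            if found:
                return found
        return None
    for start in wait_for:
        cycle = dfs(start, [start], {start})
        if cycle:
            return cycle
    return None

def choose_victim(cycle, timestamps):
    # Example policy: abort the youngest transaction in the cycle
    # (the one with the largest timestamp).
    return max(cycle, key=lambda t: timestamps[t])

wf = {"T1": {"T2"}, "T2": {"T1"}}                            # T1 and T2 wait for each other
victim = choose_victim(find_cycle(wf), {"T1": 3, "T2": 7})   # 'T2', the younger one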
Two-Phase Commit Protocol:
Phase 1
(1) Write a begin_commit record to the log file and force-write it to stable storage.
Send a PREPARE message to all participants.
Wait for participants to respond within a timeout period.
Phase 2
(2) If a participant returns an ABORT vote,
Write an abort record to the log file and force-write it to stable storage.
Send a GLOBAL_ABORT message to all participants.
Wait for participants to acknowledge within a timeout period.
(3) If a participant returns a READY_COMMIT vote,
Write a commit record to the log file and force-write it to stable storage.
Send a GLOBAL_COMMIT message to all participants.
Wait for participants to acknowledge within a timeout period.
(4) Once all acknowledgements have been received,
Write an end_transaction message to the log file.
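The coordinator's side of the two phases above can be sketched as follows. The participant objects, their method names, and the log interface are assumptions made for the illustration, and the timeout handling mentioned in the steps is omitted.

def two_phase_commit(participants, log):
    # Phase 1: ask every participant to prepare and collect the votes.
    log.force_write("begin_commit")
    votes = [p.prepare() for p in participants]          # PREPARE messages

    # Phase 2: decide the global outcome and broadcast it.
    if all(v == "READY_COMMIT" for v in votes):
        log.force_write("commit")
        for p in participants:
            p.global_commit()                            # GLOBAL_COMMIT messages
        outcome = "commit"
    else:
        log.force_write("abort")
        for p in participants:
            p.global_abort()                             # GLOBAL_ABORT messages
        outcome = "abort"

    log.write("end_transaction")                         # after all acknowledgements
    return outcome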
Once a SAVEPOINT has been released, you can no longer use the ROLLBACK
command to undo transactions performed since the last SAVEPOINT.
Dirty Reads return different results within a single transaction when an SQL operation reads an uncommitted or modified record created by another transaction. Allowing dirty reads increases concurrency but reduces consistency.
Non-Repeatable Reads return different results within a single transaction when an SQL operation reads the same row in a table twice. Non-repeatable reads can occur when another transaction modifies and commits a change to the row between the transaction's two reads. Preventing non-repeatable reads increases consistency but reduces concurrency.
Phantoms return different results within a single transaction when an SQL operation retrieves a range of data values twice. Phantoms can occur if another transaction inserts a new record and commits the insertion between executions of the range retrieval.
Isolation Levels:
Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by other transactions, thereby allowing dirty reads. At this level, transactions are not isolated from each other.
Read Committed – This isolation level guarantees that any data read is committed at the moment it is read. Thus it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating, or deleting it.
Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update, or delete these rows, non-repeatable reads are avoided.
Serializable – This is the highest isolation level. A serializable execution is defined to be an execution of operations in which concurrently executing transactions appear to be executing serially.
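In standard SQL a level is chosen with the SET TRANSACTION ISOLATION LEVEL statement. The sketch below sends it through an assumed DB-API style connection; the connection object and the accounts table are illustrative, and the exact syntax and supported levels vary by database product.

def read_total(conn, level="SERIALIZABLE"):
    # level may be "READ UNCOMMITTED", "READ COMMITTED",
    # "REPEATABLE READ" or "SERIALIZABLE".
    cur = conn.cursor()
    cur.execute("SET TRANSACTION ISOLATION LEVEL " + level)
    cur.execute("SELECT SUM(balance) FROM accounts")   # illustrative query
    total = cur.fetchone()[0]
    conn.commit()
    return total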
3.12 SQL FACILITIES FOR CONCURRENCY AND RECOVERY:
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every
second. The durability and robustness of a DBMS depends on its complex architecture and its
underlying hardware and system software. If it fails or crashes amid transactions, it is expected that
the system would follow some sort of algorithm or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as
follows −
a) Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it cannot go any further. This is called a transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction because
the DBMS is not able to execute it, or it has to stop because of some system condition. For
example, in case of deadlock or resource unavailability, the system aborts an active
transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and crash. For example, an interruption in the power supply may cause failure of the underlying hardware or software.
Examples may include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently. Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure that destroys all or part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into
two categories −
Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded on the chipset itself. Main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
Non-volatile storage − These memories are made to survive system crashes. They are huge
in data storage capacity, but slower in accessibility. Examples may include hard-disks,
magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Log-based Recovery
A log is a sequence of records that maintains a record of the actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage medium.
When a transaction enters the system and starts execution, it writes a log record about it −
<Tn, Start>
When the transaction modifies an item X, it writes a log record as follows −
<Tn, X, V1, V2>
It records that Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.
Immediate database modification − Each log follows an actual database modification.
That is, the database is modified immediately after every operation.
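A sketch of immediate database modification with the log records named above is given below; the in-memory dictionary standing in for the database and the tuple form of a log record are assumptions of the sketch.

def logged_write(log, db, tid, x, new_value):
    # Write the <Tn, X, V1, V2> record before changing the item, then apply it.
    old_value = db.get(x)
    log.append((tid, x, old_value, new_value))   # <Tn, X, V1, V2>
    db[x] = new_value

def run_transaction(log, db, tid, updates):
    log.append((tid, "start"))                   # <Tn, Start>
    for x, value in updates:
        logged_write(log, db, tid, x, value)
    log.append((tid, "commit"))                  # <Tn, commit>

# Example: T1 transfers 50 from A to B, as in the earlier example.
db, log = {"A": 1000, "B": 2000}, []
run_transaction(log, db, "T1", [("A", 950), ("B", 2050)])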
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
1.Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism by which all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
2.Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log record, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their log records are removed. All the transactions in the redo-list are redone using their log records, and their log records are then saved.
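A sketch of this redo/undo classification, reusing the simple log format of the earlier sketch (<Tn, Start>, <Tn, X, V1, V2>, <Tn, commit>), is given below; checkpoint handling is omitted.

def recover(log, db):
    # Classify transactions exactly as described above.
    started, committed = set(), set()
    for rec in log:
        if rec[1] == "start":
            started.add(rec[0])
        elif rec[1] == "commit":
            committed.add(rec[0])
    redo_list = committed
    undo_list = started - committed

    # Undo: scan the log backwards and restore the old values (V1).
    for tid, *rest in reversed(log):
        if tid in undo_list and len(rest) == 3:
            x, old, new = rest
            db[x] = old
    # Redo: scan the log forwards and reapply the new values (V2).
    for tid, *rest in log:
        if tid in redo_list and len(rest) == 3:
            x, old, new = rest
            db[x] = new
    return redo_list, undo_list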