Transaction Management1

Transaction management involves a collection of operations that form a single logical unit of work, ensuring properties such as atomicity, consistency, isolation, and durability (ACID). These properties guarantee that transactions are executed reliably, maintaining database integrity even in the event of failures. The document also discusses the importance of concurrency control and serializability to prevent inconsistent states during concurrent transaction executions.

Uploaded by

verginjose12
Transaction Management

• The term transaction refers to a collection of operations that
form a single logical unit of work.
• For instance, transfer of money from one account to another is a
transaction consisting of two updates, one to each account.
Atomicity

• It is important that either all actions of a transaction be executed
completely, or, in case of some failure, partial effects of each
incomplete transaction be undone. This property is called
atomicity.
• A transaction is delimited by statements (or function calls) of the
form begin transaction and end transaction.
• The transaction consists of all operations executed between the
begin transaction and end transaction.
• This collection of steps must appear to the user as a single,
indivisible unit.
• Since a transaction is indivisible, it either executes in its entirety
or not at all.
• If a transaction begins to execute but fails for whatever reason,
any changes to the database that the transaction may have
made must be undone.
Consistency

• A transaction must preserve database consistency: if a
transaction is run atomically in isolation starting from a consistent
database, the database must again be consistent at the end of the
transaction.
• This consistency requirement goes beyond the data integrity
constraints we have seen earlier (such as primary-key constraints,
referential integrity, check constraints, and the like).
Isolation

• A single SQL statement involves many separate accesses to the
database, and a transaction may consist of several SQL statements.
• The database system must take special actions to ensure that
transactions operate properly without interference from
concurrently executing database statements.
• This property is referred to as isolation.
Durability

• Once a transaction is successfully executed, its effects must
persist in the database: a system failure should not result in
the database forgetting about a transaction that successfully
completed. This property is called durability.
• Even if the system ensures correct execution of a transaction,
this serves little purpose if the system subsequently crashes
and, as a result, the system “forgets” about the transaction.
• Thus, a transaction’s actions must persist across crashes.
In summary, the ACID properties are:
• Atomicity. Either all operations of the transaction are reflected
properly in the database, or none are.
• Consistency. Execution of a transaction in isolation (that is, with no
other transaction executing concurrently) preserves the consistency
of the database.
• Isolation. Even though multiple transactions may execute
concurrently, the system guarantees that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started or Tj started execution after Ti finished.
Thus, each transaction is unaware of other transactions executing
concurrently in the system.
• Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
Transactions access data using two operations:
• read(X), which transfers the data item X from the database to
a variable, also called X, in a buffer in main memory belonging to
the transaction that executed the read operation.
• write(X), which transfers the value in the variable X in the
main-memory buffer of the transaction that executed the
write to the data item X in the database.
Let Ti be a transaction that transfers $50 from account A to
account B. This transaction can be defined as:

Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
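
The read/write model and the transfer transaction Ti can be sketched in Python. This is only a minimal illustration under stated assumptions: the `database` dictionary, the `Transaction` class, its `buffer` attribute and the starting balances are hypothetical stand-ins for the stored data items and the transaction's main-memory buffer, not part of any real database system.

```python
# Hypothetical in-memory stand-in for the database's data items.
database = {"A": 1000, "B": 2000}


class Transaction:
    """Toy transaction with a private main-memory buffer (illustrative only)."""

    def __init__(self, db):
        self.db = db
        self.buffer = {}  # local copies of the data items this transaction read

    def read(self, item):
        # read(X): copy data item X from the database into the local buffer.
        self.buffer[item] = self.db[item]

    def write(self, item):
        # write(X): copy the buffered value of X back to the database.
        self.db[item] = self.buffer[item]


def transfer_50(db):
    """Ti from the text: move $50 from account A to account B."""
    t = Transaction(db)
    t.read("A")
    t.buffer["A"] -= 50   # A := A - 50 (changes only the local copy)
    t.write("A")
    t.read("B")
    t.buffer["B"] += 50   # B := B + 50 (changes only the local copy)
    t.write("B")


transfer_50(database)
print(database)  # {'A': 950, 'B': 2050}
```

Note that the assignments touch only the buffer; the database changes only when write is called.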

Let us now consider how the ACID properties apply to this transaction.

Atomicity:
• Suppose that, just before the execution of transaction Ti, the
values of accounts A and B are $1000 and $2000, respectively.
• Now suppose that, during the execution of transaction Ti, a
failure occurs that prevents Ti from completing its execution
successfully. Further, suppose that the failure happened after the
write(A) operation but before the write(B) operation.
• In this case, the values of accounts A and B reflected in the
database are $950 and $2000. The system destroyed $50 as a result
of this failure. In particular, we note that the sum A + B is no
longer preserved.
• The basic idea behind ensuring atomicity is this: The database
system keeps track (on disk) of the old values of any data on which
a transaction performs a write.
• This information is written to a file called the log.
• If the transaction does not complete its execution, the database
system restores the old values from the log to make it appear as
though the transaction never executed.
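
The logging idea above can be sketched as follows. This is a simplified illustration, not a real recovery algorithm: the log is an in-memory list rather than a file on disk, the failure is simulated with a hypothetical `SimulatedCrash` exception, and the undo runs in the same process instead of at restart. The function and parameter names are assumptions made for the example.

```python
class SimulatedCrash(Exception):
    """Stands in for a failure in the middle of a transaction."""


def transfer_with_undo_log(db, amount, crash_after_first_write=False):
    """Transfer `amount` from A to B, undoing partial effects on failure."""
    log = []  # (item, old_value) records; a real system keeps this on disk
    try:
        log.append(("A", db["A"]))      # record the old value before writing
        db["A"] = db["A"] - amount      # write(A)
        if crash_after_first_write:
            raise SimulatedCrash("failure after write(A), before write(B)")
        log.append(("B", db["B"]))
        db["B"] = db["B"] + amount      # write(B)
    except SimulatedCrash:
        # Undo: restore the old values, most recent write first.
        for item, old_value in reversed(log):
            db[item] = old_value
        return False
    return True


accounts = {"A": 1000, "B": 2000}
ok = transfer_with_undo_log(accounts, 50, crash_after_first_write=True)
print(ok, accounts)  # False {'A': 1000, 'B': 2000}
```

After the simulated failure the balances are back to $1000 and $2000, so the sum A + B is preserved as if the transaction had never run.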
Consistency:
• The consistency requirement here is that the sum of A and B be
unchanged by the execution of the transaction.
• Without the consistency requirement, money could be created or
destroyed by the transaction.
• It can be verified easily that, if the database is consistent before an
execution of the transaction, the database remains consistent after
the execution of the transaction.
Durability:
• Once the execution of the transaction completes successfully, and
the user who initiated the transaction has been notified that the
transfer of funds has taken place, it must be the case that no system
failure can result in a loss of data corresponding to this transfer of
funds.
• The durability property guarantees that, once a transaction
completes successfully, all the updates that it carried out on the
database persist, even if there is a system failure after the
transaction completes execution.
• We can guarantee durability by ensuring that either:
1. The updates carried out by the transaction have been written to
disk before the transaction completes.
2. Information about the updates carried out by the transaction and
written to disk is sufficient to enable the database to reconstruct the
updates when the database system is restarted after the failure.
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is
executing.
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer
proceed.
• Aborted, after the transaction has been rolled back and the
database has been restored to its state prior to the start of the
transaction.
• Committed, after successful completion.
• The state diagram corresponding to a transaction
appears in Figure 14.1. We say that a transaction has
committed only if it has entered the committed state.
A transaction starts in the active state.

When it finishes its final statement, it enters the partially
committed state.

At this point, the transaction has completed its execution, but it
is still possible that it may have to be aborted, since the actual
output may still be temporarily residing in main memory, and
thus a hardware failure may preclude its successful completion.
Compensating Transaction

• A transaction that completes its execution successfully is said to be
committed. A committed transaction that has performed updates
transforms the database into a new consistent state, which must
persist even if there is a system failure.
• Once a transaction has committed, we cannot undo its effects by
aborting it. The only way to undo the effects of a committed
transaction is to execute a compensating transaction. For
instance, if a transaction added $20 to an account, the
compensating transaction would subtract $20 from the account.
A transaction may not always complete its execution successfully.
Such a transaction is termed aborted. If we are to ensure the
atomicity property, an aborted transaction must have no effect on
the state of the database.

• Thus, any changes that the aborted transaction made to the
database must be undone. Once the changes caused by an aborted
transaction have been undone, we say that the transaction has been
rolled back.
A transaction enters the failed state after the system determines
that the transaction can no longer proceed with its normal
execution (for example, because of hardware or logical errors).
Such a transaction must be rolled back. Then, it enters the aborted
state.
At this point, the system has two options:
• It can restart the transaction, but only if the transaction was
aborted as a result of some hardware or software error that was not
created through the internal logic of the transaction.
• It can kill the transaction. It usually does so because of some
internal logical error that can be corrected only by rewriting the
application program.
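
The states and the moves between them can be summarised as a small state machine. The sketch below is an assumption-laden illustration: the `State` enum, the `TRANSITIONS` table and the `advance` helper are hypothetical names, the transitions are read off the description above rather than copied from Figure 14.1, and restarting or killing an aborted transaction is not modelled.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Allowed moves, as described in the text.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},   # a failed transaction is rolled back
    State.ABORTED: set(),            # terminal in this sketch
    State.COMMITTED: set(),          # terminal: cannot be undone by aborting
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
print(state.value)  # committed
```

Trying to move a committed transaction to the aborted state raises an error, which mirrors the rule that a committed transaction can only be undone by a compensating transaction.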
Transaction Isolation

Transaction-processing systems usually allow multiple
transactions to run concurrently.
Allowing multiple transactions to update data concurrently
causes several complications with consistency of the data.
Concurrent Execution

Despite these complications, there are two good reasons for
allowing concurrency:
• Improved throughput and resource utilization. A transaction
consists of many steps. Some involve I/O activity; others
involve CPU activity. The CPU and the disks in a computer
system can operate in parallel. Therefore, I/O activity can be
done in parallel with processing at the CPU.
This increases the throughput of the system, that is, the
number of transactions executed in a given amount of time.
Correspondingly, the processor and disk utilization also
increase.
• Reduced waiting time. There may be a mix of transactions
running on a system, some short and some long. If transactions
run serially, a short transaction may have to wait for a preceding
long transaction to complete, which can lead to unpredictable
delays in running a transaction.
If the transactions are operating on different parts of the
database, it is better to let them run concurrently, sharing the CPU
cycles and disk accesses among them.
• Consider again the simplified banking system which has
several accounts, and a set of transactions that access and
update those accounts.
• Let T1 and T2 be two transactions that transfer funds from
one account to another. Transaction T1 transfers $50 from
account A to account B. It is defined as:
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
• Transaction T2 transfers 10 percent of the balance from
account A to account B. It is defined as:
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).

• Suppose the current values of accounts A and B are $1000 and
$2000, respectively. Suppose also that the two transactions are
executed one at a time in the order T1 followed by T2. This
execution sequence appears in Figure 14.2.
• The final values of accounts A and B, after the execution in
Figure 14.2 takes place, are $855 and $2145, respectively.
Thus, the total amount of money in accounts A and B (that
is, the sum A + B) is preserved after the execution of both
transactions.
• Similarly, if the transactions are executed one at a time in the
order T2 followed by T1, then the corresponding execution
sequence is that of Figure 14.3. Again, as expected, the sum
A + B is preserved, and the final values of accounts A and B
are $850 and $2150, respectively.
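
The two serial orders can be checked with a short Python sketch. The function names `t1`, `t2` and `run_serial` are hypothetical, and integer division by 10 is used in place of `A * 0.1` to keep the arithmetic exact; this assumes balances are whole dollars, as in the example.

```python
def t1(db):
    """T1: transfer $50 from A to B."""
    db["A"] -= 50
    db["B"] += 50


def t2(db):
    """T2: transfer 10 percent of A's balance from A to B."""
    temp = db["A"] // 10  # assumption: integer dollars, so this equals A * 0.1
    db["A"] -= temp
    db["B"] += temp


def run_serial(order, a=1000, b=2000):
    """Run the given transactions one after another on fresh balances."""
    db = {"A": a, "B": b}
    for txn in order:
        txn(db)
    return db


print(run_serial([t1, t2]))  # {'A': 855, 'B': 2145}
print(run_serial([t2, t1]))  # {'A': 850, 'B': 2150}
```

The two orders give different final balances, but in both cases A + B is still $3000.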
• The execution sequences just described are called schedules.
• They represent the chronological order in which instructions
are executed in the system. Clearly, a schedule for a set of
transactions must consist of all instructions of those
transactions, and must preserve the order in which the
instructions appear in each individual transaction.

• Each serial schedule consists of a sequence of instructions
from various transactions, where the instructions belonging
to one single transaction appear together in that schedule.
• When the database system executes several transactions
concurrently, the corresponding schedule no longer needs to be
serial.
• If two transactions are running concurrently, the operating
system may execute one transaction for a little while, then
perform a context switch, execute the second transaction for
some time, and then switch back to the first transaction for some
time, and so on.
• One possible schedule appears in Figure 14.4. After this
execution takes place, we arrive at the same state as the one in
which the transactions are executed serially in the order T1
followed by T2.
• The sum A + B is indeed preserved.
• Not all concurrent executions result in a correct state. To illustrate,
consider the schedule of Figure 14.5. After the execution of this
schedule, we arrive at a state where the final values of accounts A
and B are $950 and $2100, respectively.
• This final state is an inconsistent state, since we have gained $50 in
the process of the concurrent execution. Indeed, the sum A + B is
not preserved by the execution of the two transactions.
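
The following sketch steps through one interleaving of T1 and T2 that ends in the inconsistent state described above. The figure itself is not reproduced here, so the exact ordering of steps is an assumption chosen to match the stated final values; the variable names (`t1_a`, `t2_b` and so on) are hypothetical local buffers, and integer division stands in for multiplying by 0.1.

```python
def run_interleaved(a=1000, b=2000):
    """Execute T1 and T2 with context switches between their steps."""
    db = {"A": a, "B": b}

    # T1 runs first.
    t1_a = db["A"]        # T1: read(A)
    t1_a -= 50            # T1: A := A - 50 (local copy only)

    # Context switch to T2, which still sees the old value of A.
    t2_a = db["A"]        # T2: read(A)
    temp = t2_a // 10     # T2: temp := A * 0.1 (integer dollars assumed)
    t2_a -= temp          # T2: A := A - temp
    db["A"] = t2_a        # T2: write(A)
    t2_b = db["B"]        # T2: read(B)

    # Back to T1, whose write(A) overwrites T2's update.
    db["A"] = t1_a        # T1: write(A)
    t1_b = db["B"]        # T1: read(B)
    t1_b += 50            # T1: B := B + 50
    db["B"] = t1_b        # T1: write(B)

    # Back to T2, whose write(B) overwrites T1's update.
    t2_b += temp          # T2: B := B + temp
    db["B"] = t2_b        # T2: write(B)
    return db


result = run_interleaved()
print(result, result["A"] + result["B"])  # {'A': 950, 'B': 2100} 3050
```

Each transaction overwrites a value the other wrote without having seen it, which is why the total grows from $3000 to $3050.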
• If control of concurrent execution is left entirely to the
operating system, many possible schedules may leave the
database in an inconsistent state.
• It is the job of the database system to ensure that any
schedule that is executed will leave the database in a
consistent state.
• The concurrency-control component of the database system
carries out this task.
Serializable schedules
• We can ensure consistency of the database under concurrent
execution by making sure that any schedule that is executed has
the same effect as a schedule that could have occurred without
any concurrent execution.
• That is, the schedule should, in some sense, be equivalent to a
serial schedule. Such schedules are called serializable
schedules.
Serializability
• We shall not consider the various types of operations that a
transaction can perform on a data item, but instead consider
only two operations: read and write.
• We assume that, between a read(Q) instruction and a
write(Q) instruction on a data item Q, a transaction may
perform an arbitrary sequence of operations on the copy of Q
that is residing in the local buffer of the transaction.
• In this model, the only significant operations of a transaction,
from a scheduling point of view, are its read and write
instructions.
• Let us consider a schedule S in which there are two
consecutive instructions, I and J, of transactions Ti and Tj,
respectively.
• If I and J refer to different data items, then we can swap I and
J without affecting the results of any instruction in the
schedule. However, if I and J refer to the same data item Q,
then the order of the two steps may matter.
Conflict Serializability
Since we are dealing with only read and write instructions, there
are four cases that we need to consider:
1. I = read(Q), J = read(Q). The order of I and J does not
matter, since the same value of Q is read by Ti and Tj,
regardless of the order.
2. I = read(Q), J = write(Q). If I comes before J, then Ti does
not read the value of Q that is written by Tj in instruction J. If J
comes before I, then Ti reads the value of Q that is written by Tj.
Thus, the order of I and J matters.
3. I = write(Q), J = read(Q). The order of I and J matters for
reasons similar to those of the previous case.
4. I = write(Q), J = write(Q). Since both instructions are write
operations, the order of these instructions does not affect
either Ti or Tj. However, the value obtained by the next
read(Q) instruction of S is affected, since the result of only the
latter of the two write instructions is preserved in the database.

• We say that I and J conflict if they are operations by different
transactions on the same data item, and at least one of these
instructions is a write operation.
• To illustrate the concept of conflicting instructions, we
consider schedule 3 in Figure 14.6. The write(A) instruction
of T1 conflicts with the read(A) instruction of T2.
• However, the write(A) instruction of T2 does not conflict with
the read(B) instruction of T1, because the two instructions
access different data items.
• Let I and J be consecutive instructions of a schedule S. If I and J
are instructions of different transactions and I and J do not
conflict, then we can swap the order of I and J to produce a new
schedule S’.
• S is equivalent to S’, since all instructions appear in the same
order in both schedules except for I and J, whose order does not
matter.
• Since the write(A) instruction of T2 in schedule 3 of Figure 14.6
does not conflict with the read(B) instruction of T1, we can swap
these instructions to generate an equivalent schedule, schedule
5, in Figure 14.7.
• Regardless of the initial system state, schedules 3 and 5 both
produce the same final system state. We continue to swap
nonconflicting instructions:

• Swap the read(B) instruction of T1 with the read(A) instruction of
T2.
• Swap the write(B) instruction of T1 with the write(A) instruction
of T2.
• Swap the write(B) instruction of T1 with the read(A) instruction
of T2.
• The final result of these swaps, schedule 6 of Figure 14.8, is a
serial schedule.
• If a schedule S can be transformed into a schedule S’ by a series
of swaps of nonconflicting instructions, we say that S and S’ are
conflict equivalent.
• A schedule is conflict serializable if it is conflict equivalent to a
serial schedule.
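
The conflict rule and conflict equivalence can be sketched in Python. Instead of performing swaps one at a time, this sketch uses a standard equivalent test: two schedules over the same operations are conflict equivalent when every pair of conflicting operations appears in the same relative order in both. The tuple encoding `(transaction, action, item)`, the function names, and the contents of `schedule_3` are assumptions (the figures are not reproduced here, so `schedule_3` is inferred from the swaps listed above). The sketch also assumes no transaction repeats the same action on the same item.

```python
from itertools import combinations


def conflicts(op1, op2):
    """Different transactions, same data item, at least one write."""
    txn1, action1, item1 = op1
    txn2, action2, item2 = op2
    return txn1 != txn2 and item1 == item2 and "write" in (action1, action2)


def conflict_pairs(schedule):
    """Set of (earlier, later) pairs of conflicting operations."""
    # combinations() yields pairs in schedule order, so x precedes y.
    return {(x, y) for x, y in combinations(schedule, 2) if conflicts(x, y)}


def conflict_equivalent(s1, s2):
    """Same operations, and conflicting pairs ordered identically."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)


# Assumed contents of schedule 3: T2 works on A between T1's A and B steps.
schedule_3 = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
t1_ops = [op for op in schedule_3 if op[0] == "T1"]
t2_ops = [op for op in schedule_3 if op[0] == "T2"]

print(conflict_equivalent(schedule_3, t1_ops + t2_ops))  # True
print(conflict_equivalent(schedule_3, t2_ops + t1_ops))  # False
```

Under these assumptions the interleaved schedule is conflict equivalent to the serial order T1 followed by T2, but not to T2 followed by T1.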
Transaction Atomicity

• If a transaction Ti fails, for whatever reason, we need to undo
the effect of this transaction to ensure the atomicity property
of the transaction.
• In a system that allows concurrent execution, the atomicity
property requires that any transaction Tj that is dependent on
Ti (that is, Tj has read data written by Ti) is also aborted.
Nonrecoverable Schedules
• Consider the partial schedule 9, in which T7 is a transaction
that performs only one instruction: read(A).
• We call this a partial schedule because we have not included
a commit or abort operation for T6. Notice that T7 commits
immediately after executing the read(A) instruction. Thus, T7
commits while T6 is still in the active state.
• Now suppose that T6 fails before it commits. T7 has read the
value of data item A written by T6. Therefore, we say that T7 is
dependent on T6.
• Because of this, we must abort T7 to ensure atomicity.
However, T7 has already committed and cannot be aborted.
Thus, we have a situation where it is impossible to recover
correctly from the failure of T6.
Recoverable schedule

• A recoverable schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item previously
written by Ti, the commit operation of Ti appears before the
commit operation of Tj.
• For the example of schedule 9 to be recoverable, T7 would
have to delay committing until after T6 commits.
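
The definition of a recoverable schedule can be turned into a small check. This is a sketch under assumptions: schedules are encoded as hypothetical `(transaction, action, item)` tuples with `"commit"` as an action, "previously written" is taken to mean the most recent write to the item, aborts are not modelled, and `schedule_9` contains only the operations the text mentions.

```python
def is_recoverable(schedule):
    """True if every reader commits only after the writers it read from."""
    last_writer = {}   # item -> transaction that wrote its current value
    reads_from = {}    # reader -> set of transactions it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # Every transaction this one read from must already be committed.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True


# Assumed shape of schedule 9: T7 reads T6's write and commits first.
schedule_9 = [
    ("T6", "write", "A"),
    ("T7", "read", "A"),
    ("T7", "commit", None),
]
# Same operations, but T7 delays its commit until after T6 commits.
delayed_commit = [
    ("T6", "write", "A"),
    ("T7", "read", "A"),
    ("T6", "commit", None),
    ("T7", "commit", None),
]

print(is_recoverable(schedule_9))      # False
print(is_recoverable(delayed_commit))  # True
```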
Cascading Rollback

• Consider the partial schedule of Figure 14.15.
• Transaction T8 writes a value of A that is read by transaction
T9. Transaction T9 writes a value of A that is read by
transaction T10. Suppose that, at this point, T8 fails. T8 must
be rolled back. Since T9 is dependent on T8, T9 must be
rolled back. Since T10 is dependent on T9, T10 must be
rolled back.
• This phenomenon, in which a single transaction failure leads
to a series of transaction rollbacks, is called cascading
rollback.
Cascadeless schedule

• Formally, a cascadeless schedule is one where, for each
pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears
before the read operation of Tj.
• It is easy to verify that every cascadeless schedule is also
recoverable.
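
A cascadeless schedule can be checked in much the same way: no transaction may read a value whose writer has not yet committed. The sketch below makes the same kind of assumptions as before: a hypothetical `(transaction, action, item)` encoding, "previously written" meaning the most recent write, no aborts, and a `cascading` schedule built only from the operations the text attributes to T8, T9 and T10.

```python
def is_cascadeless(schedule):
    """True if every read sees only data written by committed transactions."""
    last_writer = {}   # item -> transaction that wrote its current value
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False  # read of uncommitted data from another transaction
        elif action == "commit":
            committed.add(txn)
    return True


# Assumed shape of the cascading example: each read sees uncommitted data.
cascading = [
    ("T8", "write", "A"),
    ("T9", "read", "A"),
    ("T9", "write", "A"),
    ("T10", "read", "A"),
]
# Each writer commits before the next transaction reads its value.
cascadeless = [
    ("T8", "write", "A"),
    ("T8", "commit", None),
    ("T9", "read", "A"),
    ("T9", "write", "A"),
    ("T9", "commit", None),
    ("T10", "read", "A"),
]

print(is_cascadeless(cascading))    # False
print(is_cascadeless(cascadeless))  # True
```

Because a writer must commit before anyone reads its value, it necessarily commits before the reader does, which is why a cascadeless schedule is also recoverable.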
Implementation of Isolation

• There are various concurrency-control policies that we can
use to ensure that, even when multiple transactions are
executed concurrently, only acceptable schedules are
generated, regardless of how the operating system time-shares
resources (such as CPU time) among the transactions.
Example
• A transaction acquires a lock on the entire database before it
starts and releases the lock after it has committed.
• While a transaction holds a lock, no other transaction is allowed
to acquire the lock, and all must therefore wait for the lock to be
released.
• As a result of the locking policy, only one transaction can
execute at a time. Therefore, only serial schedules are
generated. These are trivially serializable, and it is easy to
verify that they are recoverable and cascadeless as well.
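
The whole-database lock can be imitated with a single mutex from Python's standard `threading` module. This is a toy sketch: the `database` dictionary, `db_lock`, `run_transaction`, `t1` and `t2` are hypothetical names, "commit" simply means the transaction body has returned, and integer division stands in for multiplying by 0.1.

```python
import threading

db_lock = threading.Lock()              # one lock for the entire toy database
database = {"A": 1000, "B": 2000}


def run_transaction(body):
    """Acquire the database lock, run the transaction, then release it."""
    with db_lock:
        body(database)  # no other transaction can run until this returns


def t1(db):
    db["A"] -= 50
    db["B"] += 50


def t2(db):
    temp = db["A"] // 10  # assumption: integer dollars
    db["A"] -= temp
    db["B"] += temp


threads = [threading.Thread(target=run_transaction, args=(t,)) for t in (t1, t2)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(database["A"] + database["B"])  # 3000
```

Which transaction gets the lock first is up to the thread scheduler, so the final balances match one of the two serial orders, but the sum is preserved either way. The cost is that the two transactions never overlap.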
Locking

• Instead of locking the entire database, a transaction could
lock only those data items that it accesses.
• Under such a policy, the transaction must hold locks long
enough to ensure serializability, but for a period short enough
not to harm performance excessively.
Timestamps

• Another category of techniques for the implementation of
isolation assigns each transaction a timestamp, typically
when it begins. For each data item, the system keeps two
timestamps.
• The read timestamp of a data item holds the largest (that is,
the most recent) timestamp of those transactions that read
the data item.
• The write timestamp of a data item holds the timestamp of
the transaction that wrote the current value of the data item.
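
The bookkeeping for the two timestamps might look like the sketch below. It only shows how the timestamps are maintained; the rules a timestamp-based protocol uses to reject or delay out-of-order operations are not covered in the text and are left out. The `DataItem` class, the `begin`, `read` and `write` helpers, the counter used as a clock, and keeping the largest reader timestamp as the read timestamp are all assumptions of this example.

```python
import itertools

_clock = itertools.count(1)  # hypothetical source of increasing timestamps


class DataItem:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # timestamp of the writer of the current value


def begin():
    """Assign a timestamp to a transaction when it begins."""
    return next(_clock)


def read(item, ts):
    item.read_ts = max(item.read_ts, ts)  # keep the most recent reader
    return item.value


def write(item, ts, value):
    item.value = value
    item.write_ts = ts


x = DataItem(10)
older = begin()   # timestamp 1
newer = begin()   # timestamp 2
read(x, newer)
read(x, older)    # an older reader does not lower the read timestamp
write(x, newer, 20)
print(x.read_ts, x.write_ts, x.value)  # 2 2 20
```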
Multiple Versions and Snapshot Isolation

• In snapshot isolation, we can imagine that each transaction is
given its own version, or snapshot, of the database when it
begins.
• It reads data from this private version and is thus isolated
from the updates made by other transactions.
• If the transaction updates the database, that update appears
only in its own version, not in the actual database itself.
Information about these updates is saved so that the updates
can be applied to the “real” database if the transaction
commits.
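
The mental model of snapshot isolation described above can be sketched as follows. This is deliberately naive: it copies the whole database for each transaction, which real systems avoid, and it omits any check for conflicting updates at commit time, which the text does not describe. The `SnapshotTransaction` class and its method names are hypothetical.

```python
import copy


class SnapshotTransaction:
    """Toy transaction that works on a private copy of the database."""

    def __init__(self, db):
        self._db = db
        self._snapshot = copy.deepcopy(db)  # private version taken at begin
        self._updates = {}                  # saved updates, applied on commit

    def read(self, item):
        return self._snapshot[item]

    def write(self, item, value):
        self._snapshot[item] = value        # visible only to this transaction
        self._updates[item] = value

    def commit(self):
        self._db.update(self._updates)      # apply to the "real" database


db = {"A": 1000}
first = SnapshotTransaction(db)
second = SnapshotTransaction(db)

first.write("A", 950)
print(second.read("A"), db["A"])   # 1000 1000
first.commit()
print(second.read("A"), db["A"])   # 1000 950
```

The second transaction keeps reading from its own snapshot, so it never sees the first transaction's update, even after that update reaches the real database.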
