BCSE302L-Database Systems Module - 5


BCSE302L - Database Systems

Dr.M.Revathi
Assistant Professor (Sr) / SCOPE
VIT Chennai

Module 5: Transaction Processing and Recovery
Introduction to Transaction Processing – Transaction concepts: ACID
Properties of Transactions, Transaction States - Serial and Serializable
Schedules - Schedules based on recoverability – Schedules based on
Serializability - Conflict Serializability – Recovery Concepts: Log Based
Recovery Protocols, Recovery based on deferred update, Recovery
techniques based on immediate update – Shadow Paging Algorithm

Transaction Processing
• Transactions: collections of operations that form a single logical unit
of work
• A database system must ensure proper execution of transactions
despite failures: either the entire transaction executes, or none of it
does.
• Transaction processing systems: systems with large databases and
hundreds of concurrent users executing database transactions

Transaction Processing

Transaction Concept
• A transaction is a unit of program execution that accesses and updates
various data items.
• Initiated by a user program written in a high-level data-manipulation language (typically
SQL), or programming language (for example, C++, or Java), with embedded database
accesses in JDBC or ODBC.
• Delimited by statements (or function calls) of the form begin transaction
and end transaction
• Consists of all operations executed between the begin transaction and end
transaction
• This collection of steps must appear to the user as a single, indivisible unit.
• Since a transaction is indivisible, it either executes in its entirety or not at all.

Transaction Processing

Transaction Concept
• A single application program may contain more than one transaction if it
contains several transaction boundaries.
• Read-only transaction
• Do not update the database but only retrieve data
• Read-write transaction
• Updates the database
• A database is basically represented as a collection of named data items.
• The size of a data item is called its granularity.
• A data item can be a database record, a whole disk block, or even an individual field
(attribute) value of some record in the database.

Transaction Processing
ACID Properties
Atomicity
• Either all operations of the transaction are reflected properly in the database, or none are
Consistency
• Execution of a transaction in isolation (that is, with no other transaction executing
concurrently) preserves the consistency of the database
Isolation
• Each transaction is unaware of other transactions executing concurrently in the system.
Durability
• After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Transaction Processing
A Simple Transaction Model
Transactions access data using two operations:
read(X)
• Transfers the data item X from the database to a variable, also called X, in a buffer
in main memory belonging to the transaction that executed the read operation.
write(X)
• Transfers the value in the variable X in the main-memory buffer of the transaction
that executed the write to the data item X in the database.

Transaction Processing
A Simple Transaction Model
Example:
Consider a simple bank application consisting of several accounts and a set of
transactions that access and update those accounts. Suppose Rs. 50 is transferred
from account A to account B.
Let Ti be a transaction that transfers 50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
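
The read(X)/write(X) model and the six steps of Ti can be sketched in Python. This is only an illustration: the `Transaction` class, the dictionary standing in for the database, and the starting balances of 1000 and 2000 are assumptions made for the example.

```python
class Transaction:
    """Toy transaction with a private main-memory buffer (illustrative only)."""

    def __init__(self, database):
        self.database = database   # shared dict standing in for the database
        self.buffer = {}           # local variables of this transaction

    def read(self, item):
        # read(X): copy X from the database into the local buffer
        self.buffer[item] = self.database[item]

    def write(self, item):
        # write(X): copy the buffered value of X back to the database
        self.database[item] = self.buffer[item]


def transfer(database, source, target, amount):
    t = Transaction(database)
    t.read(source)                 # 1. read(A)
    t.buffer[source] -= amount     # 2. A := A - 50
    t.write(source)                # 3. write(A)
    t.read(target)                 # 4. read(B)
    t.buffer[target] += amount     # 5. B := B + 50
    t.write(target)                # 6. write(B)


accounts = {"A": 1000, "B": 2000}
transfer(accounts, "A", "B", 50)
print(accounts)  # {'A': 950, 'B': 2050}
```

The sum A + B is 3000 both before and after the transfer, which is the consistency requirement discussed next.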

Transaction Processing
A Simple Transaction Model
Consistency:
• Consistency requirement:
• The sum of A and B is unchanged by the execution of the transaction
• Consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign keys
• Implicit integrity constraints
• e.g. sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand

• A transaction must see a consistent database.


• During transaction execution the database may be temporarily inconsistent.
• When the transaction completes successfully the database must be consistent
• Erroneous transaction logic can lead to inconsistency
• Ensuring consistency responsibility of the application programmer 9

Transaction Processing
A Simple Transaction Model
Atomicity:
• Atomicity requirement:
• If the transaction fails after step 3 and before step 6,
money will be “lost” leading to an inconsistent database
state.
• Failure could be due to software or hardware.
• The system should ensure that updates of a partially
executed transaction are not reflected in the database.
• Ensuring atomicity is the responsibility of the database
system; it is handled by the recovery system

Transaction Processing
A Simple Transaction Model
Durability:
• Durability requirement:
• Once the user has been notified that the transaction has been
completed (i.e., the transfer of the 50 has taken place), the
updates to the database by the transaction must persist even if
there are software or hardware failures.
• The recovery system of the database is responsible for ensuring
durability

Transaction Processing
A Simple Transaction Model
Isolation:
• Isolation requirement: if between steps 3 and 6 another transaction T2 is allowed to access
the partially updated database, it will see an inconsistent database (the sum A + B will be less
than it should be).
     T1                    T2
     1. read(A)
     2. A := A – 50
     3. write(A)
                           read(A), read(B), print(A+B)
     4. read(B)
     5. B := B + 50
     6. write(B)
• Isolation can be ensured trivially by running transactions serially, that is, one after the other
• In general, isolation is ensured by the concurrency-control system
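
The interleaving in the table can be replayed step by step. The sketch below assumes starting balances of A = 1000 and B = 2000 and uses a plain dictionary as the database; the variable names are illustrative.

```python
database = {"A": 1000, "B": 2000}

# T1, steps 1-3: read(A); A := A - 50; write(A)
a = database["A"]
a -= 50
database["A"] = a

# T2 runs here, between steps 3 and 4 of T1: read(A), read(B), print(A+B)
observed_sum = database["A"] + database["B"]

# T1, steps 4-6: read(B); B := B + 50; write(B)
b = database["B"]
b += 50
database["B"] = b

final_sum = database["A"] + database["B"]
print(observed_sum, final_sum)  # 2950 3000
```

T2 observes a sum that is 50 too small, even though the database is consistent again once T1 finishes.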

Transaction Processing
Transaction States


Transaction Processing
Transaction States
• A transaction may not always complete its execution successfully; such a transaction
is termed aborted
• Any changes that the aborted transaction made to the database must be
undone.
• Once the changes caused by an aborted transaction have been undone,
the transaction is said to have been rolled back
• A transaction that completes its execution successfully is said to be
committed.
• A committed transaction that has performed updates transforms the database into a
new consistent state, which must persist even if there is a system failure.
• Once a transaction has committed, we cannot undo its effects by aborting it.
• The only way to undo the effects of a committed transaction is to execute a
compensating transaction.

Transaction Processing
Transaction States
• Active: the initial state; the transaction stays in this state while it is
executing.
• Partially committed: after the final statement has been executed.
• Failed: after the discovery that normal execution can no longer
proceed.
• Aborted: after the transaction has been rolled back and the database
has been restored to its state prior to the start of the transaction.
• Committed: after successful completion.
• A transaction is said to have terminated if it has either committed or
aborted.
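
The five states can be modelled as a small state machine. The transition table below follows the usual textbook state diagram (the figure itself is not reproduced here), so treat the `ALLOWED` mapping and the helper names as assumptions of this sketch.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Assumed transitions, taken from the standard state diagram.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),
    State.COMMITTED: set(),
}


def move(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


def is_terminated(state):
    # A transaction has terminated once it has committed or aborted.
    return state in {State.COMMITTED, State.ABORTED}


state = move(State.ACTIVE, State.PARTIALLY_COMMITTED)
state = move(state, State.COMMITTED)
print(state.value, is_terminated(state))  # committed True
```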

Transaction Processing
Transaction States
At the aborted state, the system has two options
• It can restart the transaction, but only if the transaction was aborted as a result
of some hardware or software error that was not created through the
internal logic of the transaction. A restarted transaction is considered
to be a new transaction.
• It can kill the transaction, usually because of some internal logical error that
can be corrected only by rewriting the application program, or
because the input was bad, or because the desired data were not
found in the database.

Transaction Processing
Schedules
• Schedule: a sequence of instructions that specifies the chronological
order in which instructions of concurrent transactions are executed
• A schedule for a set of transactions must consist of all instructions of those
transactions
• It must preserve the order in which the instructions appear in each individual
transaction.
• A transaction that successfully completes its execution will have a
commit instruction as its last statement
• By default, a transaction is assumed to execute a commit instruction as its last step
• A transaction that fails to successfully complete its execution will have
an abort instruction as its last statement

Transaction Processing
Schedules
• A schedule S is serial if, for every transaction T participating in the
schedule, all the operations of T are executed consecutively in the
schedule;
• Otherwise, the schedule is called nonserial.
• In a serial schedule, only one transaction at a time is active
• the commit (or abort) of the active transaction initiates execution of the next
transaction.
• No interleaving occurs in a serial schedule
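
As a rough illustration, a schedule can be represented as a list of (transaction, operation, item) tuples and checked for seriality. The representation and the function name `is_serial` are assumptions of this sketch.

```python
def is_serial(schedule):
    """True if every transaction's operations appear consecutively."""
    finished = set()   # transactions whose block of operations has ended
    current = None     # transaction whose operations are currently running
    for txn, _operation, _item in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn resumes after another one ran
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = [("T1", "read", "A"), ("T1", "write", "A"),
          ("T2", "read", "A"), ("T2", "write", "A")]
nonserial = [("T1", "read", "A"), ("T2", "read", "A"),
             ("T1", "write", "A"), ("T2", "write", "A")]
print(is_serial(serial), is_serial(nonserial))  # True False
```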

Transaction Processing
Schedules
• Let T1 transfer 50 from A to B, and T2 transfer 10% of the balance from A to B.
• A serial schedule in which T1 is followed by T2:

Schedule 1

Transaction Processing
Schedules
• A serial schedule in which T2 is followed by T1:

Schedule 2

Transaction Processing
Schedules
• The following schedule is not a serial
schedule, but it is equivalent to Schedule 1:

Schedule 3

The schedule should, in some sense, be equivalent to a serial
schedule. Such schedules are called serializable schedules.

Transaction Processing
Schedules
• The following concurrent schedule does not
preserve the value of (A + B); it is a concurrent
schedule resulting in an inconsistent state

Schedule 4
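
The effect of different schedules can be checked by executing them. The schedule figures are not reproduced in this text, so the sketch below assumes the interleaving commonly shown for this example (T1 runs two steps, T2 five, T1 four, T2 two), starting balances A = 1000 and B = 2000, and integer arithmetic for the 10% computation. All names are illustrative.

```python
# T1 transfers 50 from A to B; each step works on (database, local variables).
T1 = [
    lambda db, v: v.update(A=db["A"]),             # read(A)
    lambda db, v: v.update(A=v["A"] - 50),         # A := A - 50
    lambda db, v: db.update(A=v["A"]),             # write(A)
    lambda db, v: v.update(B=db["B"]),             # read(B)
    lambda db, v: v.update(B=v["B"] + 50),         # B := B + 50
    lambda db, v: db.update(B=v["B"]),             # write(B)
]
# T2 transfers 10% of the balance of A to B.
T2 = [
    lambda db, v: v.update(A=db["A"]),             # read(A)
    lambda db, v: v.update(temp=v["A"] // 10),     # temp := A * 0.1
    lambda db, v: v.update(A=v["A"] - v["temp"]),  # A := A - temp
    lambda db, v: db.update(A=v["A"]),             # write(A)
    lambda db, v: v.update(B=db["B"]),             # read(B)
    lambda db, v: v.update(B=v["B"] + v["temp"]),  # B := B + temp
    lambda db, v: db.update(B=v["B"]),             # write(B)
]


def run(pattern, initial):
    """Run an interleaving given as (transaction name, number of steps) pairs."""
    db = dict(initial)
    steps = {"T1": iter(T1), "T2": iter(T2)}
    local = {"T1": {}, "T2": {}}
    for name, count in pattern:
        for _ in range(count):
            next(steps[name])(db, local[name])
    return db


initial = {"A": 1000, "B": 2000}
serial = run([("T1", 6), ("T2", 7)], initial)
interleaved = run([("T1", 2), ("T2", 5), ("T1", 4), ("T2", 2)], initial)
print(serial, sum(serial.values()))            # {'A': 855, 'B': 2145} 3000
print(interleaved, sum(interleaved.values()))  # {'A': 950, 'B': 2100} 3050
```

Under these assumptions the serial schedule preserves A + B = 3000, while the interleaved one ends with 3050 because T1 overwrites the value of A written by T2.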

Serializability
• Basic Assumption – Each transaction preserves database
consistency.
• Serial execution of transactions preserves database
consistency.
• A schedule is serializable if it is equivalent to a serial
schedule. Different forms of schedule equivalence:
1. conflict serializability
2. view serializability
• Simplified view of transactions
• Ignore operations other than read and write instructions
• Assume that transactions may perform arbitrary
computations on data in local buffers in between reads
and writes.

Schedule 3, showing only the read and write instructions

Serializability
• Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if
there exists some item Q accessed by both li and lj, and at least one of these
instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict (the order of li and lj matters).
3. li = write(Q), lj = read(Q). They conflict (the order of li and lj matters).
4. li = write(Q), lj = write(Q). They conflict.
• A conflict between li and lj forces a temporal order between them.
• If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the schedule.
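
The conflict rule can be written as a one-line predicate. The tuple layout (transaction, action, item) and the function name `conflicts` are assumptions of this sketch.

```python
def conflicts(op1, op2):
    """Two instructions conflict if they belong to different transactions,
    access the same item, and at least one of them is a write."""
    txn1, action1, item1 = op1
    txn2, action2, item2 = op2
    return txn1 != txn2 and item1 == item2 and "write" in (action1, action2)


print(conflicts(("Ti", "read", "Q"), ("Tj", "read", "Q")))    # False
print(conflicts(("Ti", "read", "Q"), ("Tj", "write", "Q")))   # True
print(conflicts(("Ti", "write", "Q"), ("Tj", "read", "Q")))   # True
print(conflicts(("Ti", "write", "Q"), ("Tj", "write", "Q")))  # True
```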
Serializability
Conflict Serializability
• If a schedule S can be transformed into a schedule S´ by a series of swaps of
non-conflicting instructions, then S and S´ are conflict equivalent.
• A schedule S is conflict serializable if it is conflict equivalent to a serial schedule


Serializability
Conflict Serializability

Regardless of the initial system state, schedules 3 and 5 both
produce the same final system state.

Schedule 3, showing only the read and write instructions
Schedule 5: schedule 3 after swapping of a pair of instructions

Serializability
Conflict Serializability
• Continue to swap nonconflicting instructions
• Schedule 3 can be transformed into Schedule 6, a
serial schedule where T2 follows T1, by a series of
swaps of non-conflicting instructions.
• Schedule 3 is conflict serializable.

Schedule 3
Schedule 6: a serial schedule that is equivalent to schedule 3

Serializability
Conflict Serializability
• Example of a schedule that is not conflict serializable:

Schedule 7

Not conflict serializable, since it is not equivalent to either the serial
schedule <T3,T4> or the serial schedule <T4,T3>

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule
• Consider a schedule S
• Construct a directed graph, called a precedence graph, from S.
• Consists of a pair G = (V, E), where V is a set of vertices and E is a set of
edges.
• The set of vertices consists of all the transactions participating in the
schedule.


Serializability
Conflict Serializability
• Determining conflict serializability of a schedule
• The set of edges consists of all edges Ti → Tj for which one of three
conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
• If an edge Ti → Tj exists in the precedence graph, then, in any serial schedule
S’ equivalent to S, Ti must appear before Tj.
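
The three edge conditions amount to "an earlier operation of Ti conflicts with a later operation of Tj". A minimal sketch follows; the schedule used is the read/write skeleton assumed for schedule 4 (its figure is not reproduced here), and the function name is illustrative.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, action_i, item_i) in enumerate(schedule):
        for tj, action_j, item_j in schedule[i + 1:]:
            # Edge Ti -> Tj: same item, different transactions, at least one write
            if ti != tj and item_i == item_j and "write" in (action_i, action_j):
                edges.add((ti, tj))
    return edges


# Assumed read/write operations of schedule 4, in chronological order.
schedule_4 = [
    ("T1", "read", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"), ("T2", "read", "B"),
    ("T1", "write", "A"), ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "write", "B"),
]
print(sorted(precedence_graph(schedule_4)))  # [('T1', 'T2'), ('T2', 'T1')]
```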

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule

Precedence graph for schedule 1: T1 → T2

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule

Precedence graph for schedule 2: T2 → T1

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule
• If the precedence graph for S has a cycle, then schedule S is not conflict
serializable.
• If the graph contains no cycles, then the schedule S is conflict serializable.
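
The cycle test can be done with a depth-first search. This is one common way to do it, not necessarily the method intended by the slides; the function name and the vertex/edge representation are assumptions.

```python
def has_cycle(vertices, edges):
    """Depth-first search for a cycle in a directed graph."""
    successors = {v: [] for v in vertices}
    for source, target in edges:
        successors[source].append(target)

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, done
    colour = {v: WHITE for v in vertices}

    def visit(v):
        colour[v] = GREY
        for w in successors[v]:
            if colour[w] == GREY:          # back edge: a cycle exists
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in vertices)


def is_conflict_serializable(vertices, edges):
    return not has_cycle(vertices, edges)


print(is_conflict_serializable(["T1", "T2"], {("T1", "T2")}))                # True
print(is_conflict_serializable(["T1", "T2"], {("T1", "T2"), ("T2", "T1")}))  # False
```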


Serializability
Conflict Serializability
• Determining conflict serializability of a schedule

Precedence graph for schedule 4 (vertices T1 and T2):

T1 → T2, because T1 executes read(A) before
T2 executes write(A).
T2 → T1, because T2 executes read(B) before
T1 executes write(B).
The graph contains a cycle, so schedule 4 is not conflict serializable.

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule

[Figure: precedence graph over T1, T2 and T3; edge labels X, Y and Y, Z]

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule
• If the precedence graph is acyclic, the serializability order
can be obtained by a topological sorting of the graph.
• A topological sort is a linear order consistent with the partial order of the
graph.

Topological sorting
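
Python's standard library (3.9 and later) provides `graphlib.TopologicalSorter`, which can produce a serializability order directly. The helper `serial_order` and the example edges below are hypothetical and are not taken from any figure in these slides.

```python
from graphlib import CycleError, TopologicalSorter


def serial_order(vertices, edges):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter({v: set() for v in vertices})
    for before, after in edges:
        sorter.add(after, before)      # 'after' must come later than 'before'
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Hypothetical precedence graph: T1 -> T2 -> T3
print(serial_order(["T1", "T2", "T3"], {("T1", "T2"), ("T2", "T3")}))
# ['T1', 'T2', 'T3']
print(serial_order(["T1", "T2"], {("T1", "T2"), ("T2", "T1")}))  # None
```

When several topological orders exist, any of them is a valid equivalent serial schedule.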

Serializability
Conflict Serializability
Consider the precedence graph of the figure. Is the corresponding schedule conflict
serializable?

The precedence graph is acyclic, so the schedule is conflict serializable.
An equivalent serial schedule is obtained by doing
a topological sort, for example T1, T2, T3, T4, T5.

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule
[Figure: precedence graph over T1, T2 and T3; edge labels X,Y; Y, Z; and Y]

Serializability
Conflict Serializability
• Determining conflict serializability of a schedule

Schedule 8

Serializability
View Serializability
• Let S and S´ be two schedules with the same set of transactions.
• S and S´ are view equivalent if the following three conditions are met, for
each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S’ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was
produced by transaction Tj (if any), then in schedule S’ also transaction Ti
must read the value of Q that was produced by the same write(Q)
operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule S’.

Serializability
View Serializability
• View equivalence, like conflict equivalence, is based purely on reads and writes.
• A schedule S is view serializable if it is view equivalent to a serial schedule.
• Every conflict serializable schedule is also view serializable.
• Every view serializable schedule that is not conflict serializable has blind
writes.
The schedule shown is view equivalent to the serial
schedule <T3, T4, T6>,
since the one read(Q) instruction reads the
initial value of Q in both schedules and T6
performs the final write of Q in both
schedules.

Serializability
View Serializability
Check whether the schedule is view serializable or not.
S : R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B)

T1        T2        T3
          R2(B)
          R2(A)
R1(A)
                    R3(A)
W1(B)
          W2(B)
                    W3(B)

Sol:
With 3 transactions, the total number of serial schedules possible = 6:
<T1 T2 T3>
<T1 T3 T2>
<T2 T3 T1>
<T2 T1 T3>
<T3 T1 T2>
<T3 T2 T1>
Since the final update on B is made by T3, the transaction T3 must execute after
transactions T1 and T2. Removing those schedules in which T3 is not
executing last leaves:
<T1 T2 T3>
<T2 T1 T3>
The transaction T2 reads the initial value of B, which is later updated by T1.
So T2 must execute before T1.
View equivalent serial schedule is:
T2 → T1 → T3
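
The same answer can be cross-checked by brute force: compare the schedule with every serial order using the three view-equivalence conditions. This is a sketch for small examples only (it tries all permutations); the function names and the tuple representation are assumptions.

```python
from itertools import permutations


def view_info(schedule):
    """Return (reads-from facts, final writer per item) for a schedule."""
    last_write = {}     # item -> (transaction, index of that write)
    write_count = {}    # (transaction, item) -> writes seen so far
    read_count = {}     # (transaction, item) -> reads seen so far
    reads_from = set()
    for txn, action, item in schedule:
        key = (txn, item)
        if action == "read":
            k = read_count.get(key, 0)
            read_count[key] = k + 1
            # None as the source means the initial value of the item was read
            reads_from.add((txn, item, k, last_write.get(item)))
        else:
            n = write_count.get(key, 0)
            write_count[key] = n + 1
            last_write[item] = (txn, n)
    final_writer = {item: w[0] for item, w in last_write.items()}
    return reads_from, final_writer


def view_equivalent_serial_orders(schedule):
    transactions = sorted({txn for txn, _, _ in schedule})
    target = view_info(schedule)
    orders = []
    for order in permutations(transactions):
        serial = [op for t in order for op in schedule if op[0] == t]
        if view_info(serial) == target:
            orders.append(order)
    return orders


S = [("T2", "read", "B"), ("T2", "read", "A"), ("T1", "read", "A"),
     ("T3", "read", "A"), ("T1", "write", "B"), ("T2", "write", "B"),
     ("T3", "write", "B")]
print(view_equivalent_serial_orders(S))  # [('T2', 'T1', 'T3')]
```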

Schedules based on recoverability


Schedules are classified into two main classes
Recoverable schedule:
• A schedule S is recoverable if no transaction T in S commits until all
transactions T’ that have written an item that T reads have committed.
Non-recoverable schedule
• A schedule where a committed transaction may have to be rolled back
during recovery.
• Violates Durability from ACID properties
• Non-recoverable schedules should not be allowed.

Schedules based on recoverability
Recoverable schedule:
• If a transaction Tj reads a data item previously written by a transaction
Ti, then the commit operation of Ti appears before the commit
operation of Tj.
• The following schedule is not recoverable if T7 commits immediately
after the read, before T6 commits
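
A recoverability check can be sketched as follows. The schedule figure is not reproduced here, so the T6/T7 operations below are an assumed reconstruction; the sketch also ignores aborted writers, which a fuller version would need to handle.

```python
def is_recoverable(schedule):
    """Each transaction must commit only after every transaction it read from."""
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = {}     # reader -> set of transactions whose writes it read
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            if not reads_from.get(txn, set()) <= committed:
                return False            # txn commits before a writer it depends on
            committed.add(txn)
    return True


# Assumed schedule: T7 reads A written by T6 and commits while T6 is still active.
bad = [("T6", "read", "A"), ("T6", "write", "A"),
       ("T7", "read", "A"), ("T7", "commit", None),
       ("T6", "read", "B")]
good = [("T6", "read", "A"), ("T6", "write", "A"),
        ("T7", "read", "A"), ("T6", "commit", None),
        ("T7", "commit", None)]
print(is_recoverable(bad), is_recoverable(good))  # False True
```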


Schedules based on recoverability


Recoverable schedule:
• Cascading rollback
• A single transaction failure leads to a series of transaction rollbacks.

Schedules based on recoverability
Recoverable schedule:
• Cascading rollback
• Consider the following schedule, where
none of the transactions has yet committed.
If T8 fails:
• T8 must be rolled back.
• Since T9 is dependent on T8, T9 must be rolled
back.
• Since T10 is dependent on T9, T10 must be
rolled back.

Schedules based on recoverability


Recoverable schedule:
• Cascadeless Schedule
• For each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
• Every cascadeless schedule is also recoverable
• Cascading rollback is undesirable, since it leads to the undoing of a
significant amount of work.
• It is desirable to restrict the schedules to those where cascading rollbacks
cannot occur.
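
The cascadeless condition is stricter than recoverability: a read of another transaction's write is allowed only after that writer has committed. A minimal sketch, using an assumed reconstruction of the T8/T9/T10 example (the figure is not reproduced here):

```python
def is_cascadeless(schedule):
    """No transaction may read an item last written by an uncommitted transaction."""
    last_writer = {}    # item -> transaction that wrote it most recently
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False            # dirty read: may cause cascading rollback
        elif action == "commit":
            committed.add(txn)
    return True


# Assumed schedule: T9 reads T8's uncommitted write, T10 reads T9's.
cascading = [("T8", "read", "A"), ("T8", "read", "B"), ("T8", "write", "A"),
             ("T9", "read", "A"), ("T9", "write", "A"),
             ("T10", "read", "A")]
cascadeless = [("T8", "write", "A"), ("T8", "commit", None),
               ("T9", "read", "A"), ("T9", "write", "A"), ("T9", "commit", None),
               ("T10", "read", "A")]
print(is_cascadeless(cascading), is_cascadeless(cascadeless))  # False True
```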
Recovery Concepts
Why is recovery needed?
1. A computer failure (system crash)
2. A transaction or system error
3. Local errors or exception conditions
4. Concurrency control enforcement
5. Disk failure
6. Physical problems and catastrophes


Recovery Concepts
Recovery Algorithms
• Recovery algorithms ensure database consistency and transaction
atomicity and durability despite failures
Recovery algorithms have two parts
• Actions taken during normal transaction processing to ensure that enough information
exists to recover from failures
• Actions taken after a failure to recover the database contents to a state that
ensures atomicity, consistency and durability

Recovery Concepts
Log-Based Recovery
• A log is kept on stable storage
• The log is a sequence of log records, and maintains a record of
update activities on the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start> log record
• Before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write and V2 is the value to be written to X
• When Ti finishes its last statement, the log record <Ti commit> is
written.


Recovery Concepts
Log-Based Recovery
• Assume that log records are written directly to stable storage
• Two approaches using logs
• Deferred database modification
• Immediate database modification

Recovery Concepts
Deferred database modification
• The deferred database modification scheme records all modifications
to the log, but defers all the writes to the database until after commit.
• Assume that transactions execute serially
• A transaction starts by writing a <Ti start> record to the log.
• A write(X) operation results in a log record <Ti, X, V> being written,
where V is the new value for X
• The write is not performed on X at this time, but is deferred.
• When Ti commits, <Ti commit> is written to the log
• The log records are then read and used to execute the previously deferred
writes

Recovery Concepts
Deferred database modification
• During recovery after a crash, a transaction needs to be redone if and
only if both <Ti start> and <Ti commit> are there in the log.
• Redoing a transaction Ti (redo(Ti)) sets the value of all data items
updated by the transaction to the new values.
• Crashes can occur while
• the transaction is executing the original updates, or
• recovery action is being taken
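
Redo-only recovery under deferred modification can be sketched as a forward scan of the log. The tuple format for log records, the function name and the initial value C = 700 are assumptions; the values 950, 2050 and 600 are the ones used in this module's running example.

```python
def recover_deferred(log, database):
    """Redo the writes of every transaction with both start and commit records."""
    started = {record[1] for record in log if record[0] == "start"}
    committed = {record[1] for record in log if record[0] == "commit"}
    to_redo = started & committed
    for record in log:                       # forward scan, in log order
        if record[0] == "write" and record[1] in to_redo:
            _, _txn, item, new_value = record
            database[item] = new_value       # redo: install the new value
    return database


# Crash after T0 committed but before T1 did.
log = [("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050),
       ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 600)]
db = recover_deferred(log, {"A": 1000, "B": 2000, "C": 700})
print(db)  # {'A': 950, 'B': 2050, 'C': 700}
```

Because redo only assigns new values, running it again after a crash during recovery gives the same result.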

Recovery Concepts
Deferred database modification
• Example: Consider transactions T0 and T1 (T0 executes before T1):
T0: read(A)            T1: read(C)
    A := A - 50            C := C - 100
    write(A)               write(C)
    read(B)
    B := B + 50
    write(B)

Recovery Concepts
Deferred database modification

The same log, shown at three different times

Recovery Concepts
Deferred database modification
• After a system crash has occurred, the system consults the log to
determine which transactions need to be redone,
• Transaction Ti needs to be redone if and only if the log contains both the record <Ti
start> and the record <Ti commit>.

Recovery Concepts
Deferred database modification

(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <T1 commit> are present

Recovery Concepts
Immediate Database Modification
• The immediate database modification scheme allows database updates
of an uncommitted transaction to be made as the writes are issued
• Update log records must have both the old value and the new value
• The update log record must be written before the database item is written
• Recovery procedure has two operations instead of one:
• undo(Ti) restores the value of all data items updated by Ti to their old
values, going backwards from the last log record for Ti
• redo(Ti) sets the value of all data items updated by Ti to the new values,
going forward from the first log record for Ti

Recovery Concepts
Immediate Database Modification
• When recovering after failure:
• Transaction Ti needs to be undone if the log contains the record <Ti
start>, but does not contain the record <Ti commit>.
• Transaction Ti needs to be redone if the log contains both the
record <Ti start> and the record <Ti commit>.
• Undo operations are performed first, then redo operations
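
The undo-then-redo procedure can be sketched as a backward pass followed by a forward pass over the log. The record format ("update", Ti, X, old, new) and the function name are assumptions of this sketch; the numbers follow the module's running example.

```python
def recover_immediate(log, database):
    """Undo uncommitted transactions (backwards), then redo committed ones (forwards)."""
    started = {record[1] for record in log if record[0] == "start"}
    committed = {record[1] for record in log if record[0] == "commit"}
    uncommitted = started - committed

    for record in reversed(log):             # undo pass, newest record first
        if record[0] == "update" and record[1] in uncommitted:
            _, _txn, item, old_value, _new_value = record
            database[item] = old_value

    for record in log:                       # redo pass, oldest record first
        if record[0] == "update" and record[1] in committed:
            _, _txn, item, _old_value, new_value = record
            database[item] = new_value
    return database


# Crash after T0 committed while T1 was still active.
log = [("start", "T0"), ("update", "T0", "A", 1000, 950),
       ("update", "T0", "B", 2000, 2050), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "C", 700, 600)]
# State on disk at the crash: some updates had reached disk, others had not.
db = recover_immediate(log, {"A": 950, "B": 2000, "C": 600})
print(db)  # {'A': 950, 'B': 2050, 'C': 700}
```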

Recovery Concepts
Immediate Database Modification

The same log, shown at three different times



Recovery Concepts
Immediate Database Modification

Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are set to
950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively. Then
C is set to 600.

Recovery Concepts
Checkpoints
• Problems in recovery procedure using log:
1. Searching the entire log is time-consuming
2. We might unnecessarily redo transactions which have already output their
updates to the database.
• Streamline recovery procedure by periodically performing
checkpointing
• Write a log record <checkpoint> onto stable storage


Recovery Concepts
Checkpoints

T1 can be ignored (updates already output to disk due to the
checkpoint)
T2 and T3 redone.
T4 undone.
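
The classification above can be sketched as a scan of the log around the most recent checkpoint. The figure is not reproduced here, so the order of records in the example log is an assumption; real checkpoint records usually also carry the list of active transactions, which this simplified version does not model.

```python
def classify(log):
    """Split transactions into ignore / redo / undo sets using the last checkpoint."""
    checkpoints = [i for i, record in enumerate(log) if record[0] == "checkpoint"]
    last = checkpoints[-1] if checkpoints else -1
    committed_before = {r[1] for r in log[:last + 1] if r[0] == "commit"}
    committed_after = {r[1] for r in log[last + 1:] if r[0] == "commit"}
    started = {r[1] for r in log if r[0] == "start"}
    return {
        "ignore": committed_before,                        # already on disk
        "redo": committed_after,                           # committed after the checkpoint
        "undo": started - committed_before - committed_after,  # active at the failure
    }


# Assumed log matching the description: T1 commits before the checkpoint,
# T2 and T3 commit after it, T4 is still active at the failure.
log = [("start", "T1"), ("start", "T2"), ("commit", "T1"),
       ("checkpoint",),
       ("start", "T3"), ("commit", "T2"), ("start", "T4"), ("commit", "T3")]
result = classify(log)
print(sorted(result["ignore"]), sorted(result["redo"]), sorted(result["undo"]))
# ['T1'] ['T2', 'T3'] ['T4']
```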

Recovery Concepts
Shadow Paging Algorithm
• The AFIM does not overwrite its BFIM but is recorded at another place on the
disk.
• The old value of the data item before updating is called the before image (BFIM)
• The new value after updating is called the after image (AFIM)
• At any time a data item has its AFIM and BFIM (shadow copy of the data item) at
two different places on the disk.

Database (contains X, Y, X' and Y')
X and Y: Shadow copies of data items
X' and Y': Current copies of data items

Recovery Concepts
Shadow Paging Algorithm
• To manage access of data items by concurrent transactions two directories
(current and shadow) are used.
• NO-UNDO/NO-REDO technique for recovery
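
A toy model of the two directories is sketched below. It is a simplification under stated assumptions: the class and method names are invented for illustration, the directory switch on commit is treated as atomic, and reclaiming old pages is not modelled.

```python
class ShadowPagedDB:
    """Toy shadow paging: new values go to fresh pages, never over old ones."""

    def __init__(self, items):
        self.disk = list(items.values())                          # page contents
        self.shadow = {name: i for i, name in enumerate(items)}   # committed directory
        self.current = dict(self.shadow)                          # working directory

    def read(self, name):
        return self.disk[self.current[name]]

    def write(self, name, value):
        # The AFIM is written to a new page; the BFIM page stays untouched.
        self.disk.append(value)
        self.current[name] = len(self.disk) - 1

    def commit(self):
        # Stands in for the atomic switch to the current directory.
        self.shadow = dict(self.current)

    def abort(self):
        # Discard the current directory: nothing to undo, nothing to redo.
        self.current = dict(self.shadow)


db = ShadowPagedDB({"X": 10, "Y": 20})
db.write("X", 15)
db.abort()                 # failure before commit: X falls back to its shadow copy
db.write("Y", 25)
db.commit()
print(db.read("X"), db.read("Y"))  # 10 25
```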
