Unit 4 Material

The document explains the concept of transactions in Database Management Systems (DBMS), highlighting their key properties known as ACID (Atomicity, Consistency, Isolation, Durability). It details the transaction lifecycle, the Shadow Database scheme for ensuring atomicity and durability, and the challenges of concurrent execution, including potential problems and the importance of serializability. Additionally, it discusses concurrency control protocols, particularly lock-based protocols and the two-phase locking mechanism.


Transaction Concept in DBMS

A transaction is a single, logical unit of work that consists of one or more related tasks. A transaction is treated as a single, indivisible operation, which means that either all the tasks within the transaction are executed successfully, or none are.
Here are a few key properties of transactions, often referred to by the
acronym ACID:

ACID Properties
1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity: This property states that a transaction must be treated as an atomic unit: either all of its operations are executed or none. There is no such thing as a partial transaction; if a transaction fails, all the changes made to the database so far by that transaction are rolled back, and the database remains unchanged.
Consistency: The database must remain in a consistent state after any
transaction. No transaction should have any adverse effect on the data residing
in the database. If the database was in a consistent state before a transaction,
then after execution of the transaction, the database must be back to its
consistent state.
Isolation: Each transaction is considered independent of other transactions.
That is, the execution of multiple transactions concurrently will have the same
effect as if they had been executed one after the other. But isolation does not
ensure which transaction will execute first.
Durability: The changes made to the database by a successful transaction
persist even after a system failure.

Transaction State in DBMS


Every transaction in a DBMS passes through several states during its lifespan. These states, known as "Transaction States," collectively constitute
the "Transaction Lifecycle." Here are the main states in the lifecycle of a
transaction:

1. Active: This is the initial state of every transaction. In this state, the
transaction is being executed. The transaction remains in this state as
long as it is executing SQL statements.
2. Partially Committed: When a transaction executes its final statement,
it is said to be in a 'partially committed' state. At this point, the
transaction has passed the modification phase but has not yet been
committed to the database. If a failure occurs at this stage, the
transaction will roll back.
3. Failed: If a transaction is in a 'partially committed' state and a problem
occurs that prevents the transaction from committing, it is said to be in
a 'failed' state. When a transaction is in a 'failed' state, it will trigger a
rollback operation.
4. Aborted: If a transaction is rolled back and the database is restored to
its state before the transaction began, the transaction is said to be
'aborted.' After a transaction is aborted, it can be restarted again, but
this depends on the policy of the transaction management component
of the DBMS.
5. Committed: When a transaction is in a 'partially committed' state and
the commit operation is successful, it is said to be 'committed.' At this
point, the transaction has completed its execution and all of its updates
are permanently stored in the database.
6. Terminated: After a transaction reaches the 'committed' or 'aborted'
state, it is said to be 'terminated.' This is the final state of a transaction.
Note: The actual terms and the number of states may vary depending on the
specifics of the DBMS and the transaction processing model it uses. However,
the fundamental principles remain the same.
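
As a rough illustration, the lifecycle above can be modelled as a small state machine. The following Python sketch is purely illustrative (the Transaction class and its transition table are hypothetical, not part of any real DBMS API); it only encodes the states and transitions listed above.

```python
from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    COMMITTED = auto()
    TERMINATED = auto()

# Legal transitions in the lifecycle described above.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: {TxState.TERMINATED},
    TxState.COMMITTED: {TxState.TERMINATED},
    TxState.TERMINATED: set(),
}

class Transaction:
    def __init__(self):
        self.state = TxState.ACTIVE          # every transaction starts as active

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

t = Transaction()
t.move_to(TxState.PARTIALLY_COMMITTED)       # last statement executed
t.move_to(TxState.COMMITTED)                 # commit succeeded
t.move_to(TxState.TERMINATED)
print(t.state)                               # TxState.TERMINATED
```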

Shadow Database scheme


The Shadow Database scheme is a technique used for ensuring atomicity and durability in a database system, two of the ACID properties. The core idea is to keep the current, consistent database untouched as a shadow copy and to apply all changes to a separate copy. Once the transaction is complete and it's time to commit, the system switches to the new copy, effectively making it the current database. This allows for easy recovery and ensures data integrity even in the case of system failures.

Working Principle of Shadow Database

 Initial State: Initially, the database is in a consistent state. Let's call this the "Shadow" database.
 Transaction Execution: When transactions are executed, they are not
applied directly to the shadow database. Instead, they are applied to a
separate copy of the database.
 Commit: After a transaction is completed successfully, the system
makes the separate copy the new shadow database.
 Rollback: If a transaction cannot be completed successfully, the system
discards the separate copy, effectively rolling back all changes.
 System Failure: If a system failure occurs in the middle of a
transaction, the original shadow database remains untouched and in a
consistent state.

Shadow Database scheme Example


Let's consider a simple banking database that has one table
named `Account` with two fields: `AccountID` and `Balance`.

Shadow Database - Version 1

| AccountID | Balance |
|-----------|---------|
| 1 | 1000 |
| 2 | 2000 |

1. Transaction Start: A transaction starts to transfer 200 from AccountID 1 to AccountID 2.
2. Separate Copy: A separate copy of the database is made and the changes are applied to it.
Modified Copy

| AccountID | Balance |
|-----------|---------|
| 1 | 800 |
| 2 | 2200 |

3. Commit: The transaction completes successfully. The modified copy becomes the new shadow database.
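
The same commit-by-switching idea can be sketched in a few lines of code. This is a toy illustration under simplifying assumptions (an in-memory dictionary stands in for the database, and the transfer function is hypothetical); a real shadow-database implementation works at the level of disk pages and page tables.

```python
import copy

# Shadow Database - Version 1: the consistent state we never touch directly.
shadow_db = {"Account": {1: 1000, 2: 2000}}

def transfer(db, src, dst, amount):
    """Run the transaction against a separate copy of the database."""
    working_copy = copy.deepcopy(db)      # changes never touch the shadow copy
    working_copy["Account"][src] -= amount
    working_copy["Account"][dst] += amount
    return working_copy

try:
    new_version = transfer(shadow_db, src=1, dst=2, amount=200)
    shadow_db = new_version               # commit: switch to the new copy
except Exception:
    pass                                  # rollback: simply discard the copy

print(shadow_db["Account"])               # {1: 800, 2: 2200}
```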

Concurrent Executions in DBMS


Concurrent execution refers to the simultaneous execution of more than
one transaction. This is a common scenario in multi-user database
environments where many users or applications might be accessing or
modifying the database at the same time. Concurrent execution is crucial for
achieving high throughput and efficient resource utilization. However, it
introduces the potential for conflicts and data inconsistencies.

Advantages of Concurrent Execution

1. Increased System Throughput: Multiple transactions can be in progress at the same time, but at different stages.
2. Maximized Processor Utilization: If one transaction is waiting for I/O
operations, another transaction can utilize the processor.
3. Decreased Wait Time: Transactions no longer have to wait for other
long transactions to complete.
4. Improved Transaction Response Time: Transactions get processed
faster because they can be executed in parallel.

Potential Problems with Concurrent Execution

1. Lost Update Problem (Write-Write conflict):
One transaction's updates could be overwritten by another.
Examples:

T1 | T2
----------|-----------
Read(A) |
A = A+50 |
| Read(A)
| A = A+100
Write(A) |
| Write(A)

Result: T1's updates are lost.


2. Temporary Inconsistency or Dirty Read Problem (Write-Read conflict):
One transaction might read an inconsistent state of data that's being updated
by another.
Examples:

T1        | T2
----------|-----------
Read(A)   |
A = A+50  |
Write(A)  |
          | Read(A)
          | A = A+100
          | Write(A)
Rollback  |
          | Commit

Result: T2 has read a "dirty" value that was never committed by T1 and, once T1 rolls back, does not actually exist in the database.

3. Unrepeatable Read Problem (Read-Write conflict):
A single transaction reads the same row multiple times and observes different values each time. This occurs because another concurrent transaction has modified the row between the two reads.
Examples:

T1 | T2
----------|----------
Read(A) |
| Read(A)
| A = A+100
| Write(A)
Read(A) |

Result: Within the same transaction, T1 has read two different values for the
same data item. This inconsistency is the unrepeatable read.
To manage concurrent execution and ensure the consistency and reliability of
the database, DBMSs use concurrency control techniques. These typically
include locking mechanisms, timestamps, optimistic concurrency control, and
serializability checks.
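
The lost update schedule from the first example above can be replayed step by step. The snippet below is only a simulation of the interleaving (the local variables stand in for each transaction's private workspace, and the initial value of A is an assumption), not how a DBMS actually executes transactions.

```python
# Replaying the lost-update schedule from the first example above.
A = 100                      # assumed initial value of data item A

# T1: Read(A); A = A + 50      T2: Read(A); A = A + 100
t1_local = A                 # T1 Read(A)
t1_local += 50
t2_local = A                 # T2 Read(A) -- still sees the old value
t2_local += 100
A = t1_local                 # T1 Write(A) -> 150
A = t2_local                 # T2 Write(A) -> 200, overwriting T1's write

print(A)                     # 200, not the 250 a serial execution would give
```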

Serializability in DBMS
A schedule is an order in which the operations of multiple transactions execute in a concurrent environment.

Serial Schedule: The schedule in which the transactions execute one after the other is called a serial schedule. It always leaves the database in a consistent state.
For example, consider the following two transactions T1 and T2.

T1 | T2
----------|----------
Read(A) |
Write(A) |
Read(B) |
Write(B) |
| Read(A)
| Write(A)
| Read(B)
| Write(B)

All of transaction T1's operations on data items A and B execute first, and then all of transaction T2's operations on A and B execute.
Non-Serial Schedule: A schedule in which the operations of different transactions are intermixed. This may lead to conflicts in the result or inconsistency in the resultant data.
For example, consider the following two transactions:

T1 | T2
----------|----------
Read(A) |
Write(A) |
| Read(A)
| Write(B)
Read(A) |
Write(B) |
| Read(B)
| Write(B)

The above schedule is non-serial and may result in inconsistency or conflicts in the data.

What is serializability? How is it tested?


Serializability is the property that ensures that the concurrent execution of a
set of transactions produces the same result as if these transactions were
executed one after the other without any overlapping, i.e., serially.

Why is Serializability Important?


In a database system, for performance optimization, multiple transactions
often run concurrently. While concurrency improves performance, it can
introduce several data inconsistency problems if not managed properly.
Serializability ensures that even when transactions are executed concurrently,
the database remains consistent, producing a result that's equivalent to a
serial execution of these transactions.

Testing for serializability in DBMS


Testing for serializability in a DBMS involves verifying if the interleaved
execution of transactions maintains the consistency of the database. The most
common way to test for serializability is using a precedence graph (also known
as a serializability graph or conflict graph).

Types of Serializability

1. Conflict Serializability
2. View Serializability

Conflict Serializability
Conflict serializability is a form of serializability where the order of non-
conflicting operations is not significant. It determines if the concurrent
execution of several transactions is equivalent to some serial execution of
those transactions.
Two operations are said to be in conflict if:

 They belong to different transactions.
 They access the same data item.
 At least one of them is a write operation.
Examples of non-conflicting operations

T1 | T2
----------|----------
Read(A) | Read(A)
Read(A) | Read(B)
Write(B) | Read(A)
Read(B) | Write(A)
Write(A) | Write(B)

Examples of conflicting operations

T1 | T2
----------|----------
Read(A) | Write(A)
Write(A) | Read(A)
Write(A) | Write(A)
A schedule is conflict serializable if it can be transformed into a serial schedule
(i.e., a schedule with no overlapping transactions) by swapping non-conflicting
operations. If it is not possible to transform a given schedule to any serial
schedule using swaps of non-conflicting operations, then the schedule is not
conflict serializable.

To determine if S is conflict serializable:

Precedence Graph (Serialization Graph): Create a graph where:
Nodes represent transactions.
Draw an edge from Ti to Tj if an operation in Ti precedes and conflicts with an operation in Tj.
For the given example:

T1 | T2
----------|----------
Read(A) |
| Read(A)
Write(A) |
| Read(B)
| Write(B)

R1(A) and W1(A) access the same data item, but they belong to the same transaction, so they do not conflict and no edge is drawn.
R2(A) conflicts with W1(A), so there's an edge from T2 to T1.
There are no other conflicting pairs.
The graph has nodes T1 and T2 with an edge from T2 to T1. There are no
cycles in this graph.
Decision: Since the precedence graph doesn't have any cycles (a cycle is a path that starts at one node and returns to the same node), the schedule S is conflict serializable. The equivalent serial schedule, based on the graph, is T2 followed by T1.
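
The precedence-graph test can be automated. The sketch below builds the graph from a schedule represented as a list of (transaction, operation, data item) tuples — that representation is an assumption for illustration — and reports whether the schedule is conflict serializable by checking for cycles.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, op, item), op is 'R' or 'W', in execution order."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x_i == x_j and 'W' in (op_i, op_j):
                edges.add((ti, tj))          # earlier op's txn -> later op's txn
    return edges

def has_cycle(nodes, edges):
    adj = {n: [b for a, b in edges if a == n] for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in adj[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# The schedule from the example: R1(A), R2(A), W1(A), R2(B), W2(B)
S = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"),
     ("T2", "R", "B"), ("T2", "W", "B")]
edges = precedence_graph(S)
print(edges)                                                   # {('T2', 'T1')}
print("conflict serializable:", not has_cycle({"T1", "T2"}, edges))   # True
```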

View Serializability
View Serializability is one of the types of serializability in DBMS that ensures
the consistency of a database schedule. Unlike conflict serializability, which
cares about the order of conflicting operations, view serializability only cares
about the final outcome. That is, two schedules are view equivalent if they
have:
 Initial Read: The same set of initial reads (i.e., a read by a transaction
with no preceding write by another transaction on the same data item).
 Updated Read: For any other writes on a data item in between, if a transaction Tj reads the result of a write by transaction Ti in one schedule, then Tj should read the result of a write by Ti in the other schedule as well.
 Final Write: The same set of final writes (i.e., a write by a transaction
with no subsequent writes by another transaction on the same data
item).
Let's understand view serializability with an example:
Consider two transactions T1 and T2:
Schedule 1(S1):

| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| Write(A) | |
| | Read(A) |
| | Write(B) |
| Read(B) | |
| Write(B) | |
| Commit | Commit |

Schedule 2(S2):

| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| | Read(A) |
| Write(A) | |
| | Write(A) |
| Read(B) | |
| Write(B) | |
| Commit | Commit |

Here,

1. Both S1 and S2 have the same initial read of A by T2.
2. Both S1 and S2 have the final write of A by T2.
3. For intermediate writes/reads, in S1, T2 reads the value of A
after T1 has written to it. Similarly, in S2, T2 reads A which can be
viewed as if it read the value after T1 (even though in actual
sequence T2 read it before T1 wrote it). The important aspect is the
view or effect is equivalent.
4. B is read and then written by T1 in both schedules.
Considering the above conditions, S1 and S2 are view equivalent. Thus, if S1
is serializable, S2 is also view serializable.
CONCURRENCY CONTROL
Concurrency control is the mechanism required for controlling and managing the concurrent execution of database operations, thereby avoiding inconsistencies in the database. To maintain the concurrency of the database, we have concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database
transactions.
Therefore, these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions because, while a transaction holds a shared lock, it cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data simultaneously.
TWO-PHASE LOCKING (2PL)
o The two-phase locking protocol divides the execution of a transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.
o In the third part, the transaction cannot demand any new locks. It only releases the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, new locks on data items may be acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
In the example below, if lock conversion is allowed, then the following conversions can happen:
1. Upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase.
2. Downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.
Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
Strict Two-Phase Locking (Strict-2PL)
o The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
o The Strict-2PL protocol therefore does not have a gradual shrinking phase of lock release.
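
A minimal sketch of how the 2PL rule can be enforced on the transaction side is shown below. The TwoPhaseTransaction class is hypothetical; a real lock manager also handles lock queuing, conflicts between transactions, deadlock detection, and lock conversion.

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: once any lock is released, no new lock may be acquired."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False          # becomes True after the first unlock

    def lock(self, item, mode):         # mode: 'S' or 'X'
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire locks in shrinking phase")
        self.held.add((item, mode))

    def unlock(self, item, mode):
        self.shrinking = True           # the lock point has passed
        self.held.discard((item, mode))

t1 = TwoPhaseTransaction("T1")
t1.lock("A", "X")        # growing phase
t1.lock("B", "S")        # growing phase; lock point reached here
t1.unlock("A", "X")      # shrinking phase begins
try:
    t1.lock("C", "S")    # violates 2PL
except RuntimeError as e:
    print(e)

# Strict 2PL differs only in *when* unlock is called: all unlocks are deferred
# until the transaction commits, so no other transaction can see uncommitted data.
```
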
TIMESTAMP ORDERING PROTOCOL
o The Timestamp Ordering Protocol is used to order transactions based on their timestamps. The order of the transactions is simply the ascending order of their creation times.
o The older transaction has the higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
o Lock-based protocols manage the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
The basic timestamp ordering protocol works as follows:
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is updated to max(R_TS(X), TS(Ti)).
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If TS(Ti) < R_TS(X), then the operation is rejected.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
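
The read and write checks above translate directly into code. The following is a simplified sketch of basic timestamp ordering (the in-memory R_TS/W_TS tables and the read/write functions are assumptions; in a real system a rejected operation causes the transaction to be rolled back and restarted with a new timestamp).

```python
# Per-item read/write timestamps; 0 means "never read/written".
R_TS = {"X": 0}
W_TS = {"X": 0}

def read(ts_ti, item):
    if W_TS[item] > ts_ti:                        # Ti would read an overwritten value
        return f"Read({item}) by TS={ts_ti} rejected: roll back Ti"
    R_TS[item] = max(R_TS[item], ts_ti)           # record the read
    return f"Read({item}) by TS={ts_ti} executed"

def write(ts_ti, item):
    if ts_ti < R_TS[item] or ts_ti < W_TS[item]:  # a younger txn already used X
        return f"Write({item}) by TS={ts_ti} rejected: roll back Ti"
    W_TS[item] = ts_ti
    return f"Write({item}) by TS={ts_ti} executed"

print(read(10, "X"))    # executed, R_TS(X) becomes 10
print(write(5, "X"))    # rejected: TS 5 < R_TS(X) = 10
print(write(12, "X"))   # executed, W_TS(X) becomes 12
print(read(11, "X"))    # rejected: W_TS(X) = 12 > 11
```
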
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In the validation-based protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T reads the values of the various data items and stores them in temporary local variables. It performs all its write operations on these temporary variables, without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to check whether serializability would be violated.
3. Write phase: If the transaction passes validation, the temporary results are written to the database; otherwise the transaction is rolled back.
Each phase has an associated timestamp:
Start(Ti): The time when Ti started its execution.
Validation(Ti): The time when Ti finished its read phase and started its validation phase.
Finish(Ti): The time when Ti finished its write phase.
o This protocol determines the timestamp used for serialization from the timestamp of the validation phase, as that is the phase which actually determines whether the transaction will commit or roll back.
o Hence TS(T) = Validation(T).
o Serializability is determined during the validation process; it cannot be decided in advance.
o While executing transactions, this approach allows a greater degree of concurrency with fewer conflicts.
o Thus it results in transactions that have fewer rollbacks.
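
A highly simplified sketch of the optimistic scheme is given below. The OptimisticTxn class and its validation test (checking this transaction's read set against the write sets of transactions that finished after it started) are illustrative assumptions — this is only one common way the validation phase is formulated.

```python
class OptimisticTxn:
    committed = []                          # (finish_time, write_set) of committed txns

    def __init__(self, start_time):
        self.start = start_time
        self.read_set = set()
        self.write_set = {}                 # buffered writes: item -> value

    def read(self, db, item):
        # Read phase: read into local variables; prefer our own buffered write.
        self.read_set.add(item)
        return self.write_set.get(item, db[item])

    def write(self, item, value):
        # Read phase continued: writes go to temporary variables only.
        self.write_set[item] = value

    def commit(self, db, now):
        # Validation phase: fail if a transaction that finished after we started
        # wrote anything we read (our reads may then be stale).
        for finish, wset in OptimisticTxn.committed:
            if finish > self.start and wset & self.read_set:
                return False                # roll back: just discard the buffers
        # Write phase: apply the buffered writes to the real database.
        db.update(self.write_set)
        OptimisticTxn.committed.append((now, set(self.write_set)))
        return True

db = {"A": 100}
t = OptimisticTxn(start_time=1)
t.write("A", t.read(db, "A") + 50)
print(t.commit(db, now=2), db)              # True {'A': 150}
```
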
THOMAS WRITE RULE
The Thomas Write Rule provides the guarantee of a serializable order for the protocol. It improves the basic timestamp ordering algorithm.
The Thomas write rules are as follows:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
o If TS(T) < W_TS(X), then do not execute the W(X) operation of the transaction and continue processing (the outdated write is simply ignored).
o If neither condition 1 nor condition 2 occurs, then transaction T is allowed to execute the WRITE operation, and W_TS(X) is set to TS(T).
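
Compared with basic timestamp ordering, only the write check changes: an outdated write is skipped instead of forcing a rollback. A minimal sketch follows, assuming hypothetical in-memory timestamp tables like those in the earlier timestamp-ordering example.

```python
R_TS = {"X": 10}    # assumed current read timestamp of item X
W_TS = {"X": 12}    # assumed current write timestamp of item X

def write_thomas(ts_t, item):
    if ts_t < R_TS[item]:               # a younger transaction already read X: abort T
        return "abort and roll back T"
    if ts_t < W_TS[item]:               # obsolete write: ignore it and keep processing
        return "write skipped (Thomas write rule)"
    W_TS[item] = ts_t                   # otherwise execute the write and stamp X
    return "write executed"

print(write_thomas(11, "X"))            # skipped: 11 < W_TS(X)=12 but >= R_TS(X)=10
print(write_thomas(5, "X"))             # abort: 5 < R_TS(X)=10
print(write_thomas(20, "X"))            # executed, W_TS(X) becomes 20
```
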
MULTIPLE GRANULARITY
Let's start by understanding the meaning of granularity.
Granularity: It is the size of the data item that is allowed to be locked.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be locked.
o The multiple granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be graphically represented as a tree.
o The first (highest) level shows the entire database.
o The second level represents nodes of type area. The database consists of exactly these areas.
o Each area consists of child nodes known as files. No file can be present in more than one area.
o Finally, each file contains child nodes known as records. A file has exactly those records that are its child nodes, and no record is present in more than one file.
o Hence, the levels of the tree, starting from the top, are:
o Database
o Area
o File
o Record
Multiple Granularity in DBMS
In the context of database systems, "granularity" refers to the size or extent
of a data item that can be locked by a transaction. The idea behind multiple
granularity is to provide a hierarchical structure that allows locks at various
levels, ranging from coarse-grained (like an entire table) to fine-grained (like
a single row or even a single field). This hierarchy offers flexibility in achieving
the right balance between concurrency and data integrity.
The concept of multiple granularity can be visualized as a tree. Consider a
database system where:

 The entire database can be locked.
 Within the database, individual tables can be locked.
 Within a table, individual pages or rows can be locked.
 Even within a row, individual fields can be locked.

Lock Modes in multiple granularity


To ensure the consistency and correctness of a system that allows multiple
granularity, it's crucial to introduce the concept of "intention locks." These
locks indicate a transaction's intention to acquire a finer-grained lock in the
future.
There are three main types of intention locks:
1. Intention Shared (IS): When a transaction needs an S lock on a node "K", it must first apply IS locks on all the ancestor nodes of "K", starting from the root node. So, when a node is found locked in IS mode, it indicates that some of its descendant nodes are to be locked in S mode.
Example: Suppose a transaction wants to read a few records from a table but
not the whole table. It might set an IS lock on the table, and then set individual
S locks on the specific rows it reads.
2. Intention Exclusive (IX): When a transaction needs an X lock on a node "K", it must first apply IX locks on all the ancestor nodes of "K", starting from the root node. So, when a node is found locked in IX mode, it indicates that some of its descendant nodes are to be locked in X mode.
Example: If a transaction aims to update certain records within a table, it may
set an IX lock on the table and subsequently set X locks on specific rows it
updates.
3. Shared Intention Exclusive (SIX): When a node is locked in SIX mode, it indicates that the node is explicitly locked in S mode and IX mode. So, the entire tree rooted at that node is locked in S mode, and some nodes within it are locked in X mode. This mode is compatible only with IS mode.
Example: Suppose a transaction wants to read an entire table but also update
certain rows. It would set a SIX lock on the table. This tells other transactions
they can read the table but cannot update it until the SIX lock is released.
Meanwhile, the original transaction can set X locks on specific rows it wishes
to update.

Compatibility Matrix with Lock Modes in multiple granularity

A compatibility matrix defines which types of locks can be held simultaneously on a database object. Here's a simplified matrix:

|     | NL | IS | IX | S | SIX | X |
|:---:|:--:|:--:|:--:|:-:|:---:|:-:|
| NL  | ✓  | ✓  | ✓  | ✓ | ✓   | ✓ |
| IS  | ✓  | ✓  | ✓  | ✓ | ✓   | ✗ |
| IX  | ✓  | ✓  | ✓  | ✗ | ✗   | ✗ |
| S   | ✓  | ✓  | ✗  | ✓ | ✗   | ✗ |
| SIX | ✓  | ✓  | ✗  | ✗ | ✗   | ✗ |
| X   | ✓  | ✗  | ✗  | ✗ | ✗   | ✗ |

(NL = No Lock, S = Shared, X = Exclusive)


The Scheme operates as follows:-

a. A Transaction must first lock the Root Node and it can be locked in any
mode.
b. Locks are granted as per the Compatibility Matrix indicated above.
c. A Transaction can lock a node in S or IS mode if it has already locked all
the predecessor nodes in IS or IX mode.
d. A Transaction can lock a node in X or IX or SIX mode if it has already
locked all the predecessor nodes in SIX or IX mode.
e. A transaction must follow two-phase locking. It can lock a node, only if
it has not previously unlocked a node. Thus, schedules will always be
conflict-serializable.
f. Before it unlocks a node, a Transaction has to first unlock all the children
nodes of that node. Thus, locking will proceed in top-down manner and
unlocking will proceed in bottom-up manner. This will ensure the
resulting schedules to be deadlock-free.
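
The compatibility matrix and granting rule can be checked mechanically. The sketch below simply encodes the matrix shown above and tests a requested mode against the modes already held on a node (illustrative only; a real lock manager would also walk the hierarchy and make incompatible requests wait rather than refusing them).

```python
# Same compatibility matrix as above: COMPAT[held][requested]
COMPAT = {
    "NL":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 1},
    "IS":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 0},
    "IX":  {"NL": 1, "IS": 1, "IX": 1, "S": 0, "SIX": 0, "X": 0},
    "S":   {"NL": 1, "IS": 1, "IX": 0, "S": 1, "SIX": 0, "X": 0},
    "SIX": {"NL": 1, "IS": 1, "IX": 0, "S": 0, "SIX": 0, "X": 0},
    "X":   {"NL": 1, "IS": 0, "IX": 0, "S": 0, "SIX": 0, "X": 0},
}

def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(COMPAT[held][requested] for held in held_modes)

# Another transaction holds IX on the table (it intends to update some rows):
print(can_grant("IS", ["IX"]))   # True  - readers of individual rows may proceed
print(can_grant("S",  ["IX"]))   # False - a whole-table read must wait
print(can_grant("X",  ["IS"]))   # False - an exclusive table lock must wait
```
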
Benefits of using multiple granularity
 Flexibility: Offers flexibility to transactions in deciding the appropriate
level of locking, which can lead to improved concurrency.
 Performance: Reduces contention by allowing transactions to lock only
those parts of the data that they need.

Challenges in multiple granularity


 Complexity: Managing multiple granularity adds complexity to the lock
management system.
 Overhead: The lock manager needs to handle not just individual locks
but also the hierarchy and compatibility of these locks.

Recovery and Atomicity in dbms


 When a system crashes, it may have several transactions being executed
and various files opened for them to modify the data items.
 But according to ACID properties of DBMS, atomicity of transactions as
a whole must be maintained, that is, either all the operations are
executed or none.
 Database recovery means recovering the data when it gets deleted, corrupted, or damaged accidentally.
 Atomicity is a must: whether or not a transaction completes, its effects should either be reflected permanently in the database or not affect the database at all.
When a DBMS recovers from a crash, it should maintain the
following −

 It should check the states of all the transactions, which were being
executed.
 A transaction may be in the middle of some operation; the DBMS must
ensure the atomicity of the transaction in this case.
 It should check whether the transaction can be completed now or it
needs to be rolled back.
 No transactions would be allowed to leave the DBMS in an inconsistent
state.
There are two types of techniques, which can help a DBMS in
recovering as well as maintaining the atomicity of a transaction −

 Maintaining the logs of each transaction, and writing them onto some
stable storage before actually modifying the database.
 Maintaining shadow paging, where the changes are first made in volatile memory, and later the actual database is updated.

Log-Based Recovery
Log-based recovery is a widely used approach in database management
systems to recover from system failures and maintain atomicity and durability
of transactions. The fundamental idea behind log-based recovery is to keep a
log of all changes made to the database, so that after a failure, the system can
use the log to restore the database to a consistent state.

How Log-Based Recovery Works

1. Transaction Logging:
For every transaction that modifies the database, an entry is made in the log.
This entry typically includes:

 Transaction ID: A unique identifier for the transaction.
 Data item identifier: Identifier for the specific item being modified.
 OLD value: The value of the data item before the modification.
 NEW value: The value of the data item after the modification.

We represent an update log record as <Ti, Xj, V1, V2>, indicating that transaction Ti has performed a write on data item Xj. Xj had value V1 before
the write, and has value V2 after the write. Other special log records exist to
record significant events during transaction processing, such as the start of a
transaction and the commit or abort of a transaction. Among the types of log
records are:

 <Ti start>. Transaction Ti has started.
 <Ti commit>. Transaction Ti has committed.
 <Ti abort>. Transaction Ti has aborted.

2. Writing to the Log


Before any change is written to the actual database (on disk), the
corresponding log entry is stored. This is called the Write-Ahead Logging
(WAL) principle. By ensuring that the log is written first, the system can later
recover and apply or undo any changes.
3. Checkpointing
Periodically, the DBMS might decide to take a checkpoint. A checkpoint is a
point of synchronization between the database and its log. At the time of a
checkpoint:

 All the changes in main memory (buffer) up to that point are written to
disk.
 A special entry is made in the log indicating a checkpoint. This helps in
reducing the amount of log that needs to be scanned during recovery.

4. Recovery Process

 Redo: If a transaction is identified (from the log) as having committed but its changes have not been reflected in the database (due to a crash before the changes could be written to disk), then the changes are reapplied using the 'After Image' from the log.
 Undo: If a transaction is identified as not having committed at the time
of the crash, any changes it made are reversed using the 'Before Image'
in the log to ensure atomicity.

5. Commit/Rollback
Once a transaction is fully complete, a commit record is written to the log. If
a transaction is aborted, a rollback record is written, and using the log, the
system undoes any changes made by this transaction.
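
A toy end-to-end illustration of these ideas is shown below. The log record layout follows the <Ti, Xj, V1, V2> notation above, but the in-memory db dictionary and the recover function are simplifying assumptions; real systems work with pages, LSNs, and checkpoints.

```python
db = {"A": 1000, "B": 2000}            # state on disk at the time of the crash

# Log as written before the crash (write-ahead: records precede the data writes).
log = [
    ("start",  "T1"),
    ("update", "T1", "A", 1000, 900),  # <T1, A, old=1000, new=900>
    ("commit", "T1"),
    ("start",  "T2"),
    ("update", "T2", "B", 2000, 2500), # T2 never committed before the crash
]

def recover(db, log):
    committed = {r[1] for r in log if r[0] == "commit"}
    for rec in log:                     # redo pass: reapply committed changes
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    for rec in reversed(log):           # undo pass: roll back uncommitted changes
        if rec[0] == "update" and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    return db

print(recover(db, log))                 # {'A': 900, 'B': 2000}
```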

Benefits of Log-Based Recovery


 Atomicity: Guarantees that even if a system fails in the middle of a
transaction, the transaction can be rolled back using the log.
 Durability: Ensures that once a transaction is committed, its effects are
permanent and can be reconstructed even after a system failure.
 Efficiency: Since logging typically involves sequential writes, it is
generally faster than random access writes to a database.

Shadow paging - Its Working principle


Shadow Paging is an alternative disk recovery technique to the more common
logging mechanisms. It's particularly suitable for database systems. The
fundamental concept behind shadow paging is to maintain two page tables
during the lifetime of a transaction: the current page table and the shadow
page table.
Here's a step-by-step breakdown of the working principle of shadow paging:

Initialization
When the transaction begins, the database system creates a copy of the
current page table. This copy is called the shadow page table.
The actual data pages on disk are not duplicated; only the page table entries
are. This means both the current and shadow page tables point to the same
data pages initially.

During Transaction Execution


When a transaction modifies a page for the first time, a copy of the page is
made. The current page table is updated to point to this new page.
Importantly, the shadow page table remains unaltered and continues pointing
to the original, unmodified page.
Any subsequent changes by the same transaction are made to the copied page,
and the current page table continues to point to this copied page.

On Transaction Commit
Once the transaction reaches a commit point, the shadow page table is
discarded, and the current page table becomes the new "truth" for the
database state.
The old data pages that were modified during the transaction (and which the
shadow page table pointed to) can be reclaimed.

Recovery after a Crash


If a crash occurs before the transaction commits, recovery is straightforward.
Since the original data pages (those referenced by the shadow page table)
were never modified, they still represent a consistent database state.
The system simply discards the changes made during the transaction (i.e.,
discards the current page table) and reverts to the shadow page table.
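
The copy-on-write behaviour of the page tables can be sketched as follows. The dictionaries standing in for disk pages and page tables are assumptions for illustration; a real implementation keeps the page tables themselves on disk and switches them atomically by updating a single root pointer.

```python
# Data pages on "disk", addressed by page number.
pages = {0: "row 1 v1", 1: "row 2 v1"}
next_page = 2

shadow_page_table = {0: 0, 1: 1}               # logical page -> physical page (consistent)
current_page_table = dict(shadow_page_table)   # working copy for the transaction

def tx_write(logical_page, new_value):
    """First modification of a page copies it; the shadow table is never touched."""
    global next_page
    pages[next_page] = new_value               # copy-on-write to a fresh page
    current_page_table[logical_page] = next_page
    next_page += 1

tx_write(0, "row 1 v2")

# Commit: the current page table becomes the new "truth".
shadow_page_table = dict(current_page_table)

# Crash before commit: simply discard current_page_table; the shadow table still
# points at the original, unmodified pages.
print({lp: pages[pp] for lp, pp in shadow_page_table.items()})
```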

The recovery system consists of recovery data, which is the update history of transactions. The recovery utility is software that must run to recover the database when it is in a bad state and normal operation on the database is disallowed.
Basic Concepts in Recovering the System:
Recovery: It is a process to restore the database to a consistent state after it has met with a failure.
Failure: It is a database inconsistency that is visible.
Transaction Recovery: It is a process to restore the last consistent state of the data items modified by failed transactions.
Transaction Log: It maintains the execution history of concurrent transactions in the form of the following record:
(transaction_id, operation, data item, before image, after image)
BFIM and AFIM: The value of a database object before its update is called the before image (BFIM), and the value of that object after its update is called the after image (AFIM).
Transaction directories: During the execution of a transaction, two directories are maintained:
o Current directory: The entries in this directory point to the most recent database pages on disk.
o Shadow directory: It points to the old entries. Its entries are copied from the current directory when the transaction starts.
Recovery log entries: There are two types of recovery log entries
o Undo type log entry: It includes the BFIM of a data item being updated. It is required for the undo operation.
o Redo type log entry: It includes the AFIM of a data item being updated. It is needed for the redo operation.
Recovery approaches: Steal/no-steal approach
o Steal: Updated pages are allowed to be written to disk before the transaction commits. It is a form of immediate update.
o No steal: Updated pages cannot be written to disk before the transaction commits. It is a kind of deferred update.
Recovery approaches: Force/no-force approach
o Force: The transaction immediately writes all updated pages to disk when it commits.
o No force: Pages updated by the transaction are not written immediately to disk.
Recovery management: Recovery management has two components
o Recovery manager: It keeps track of transactions, handles commit and abort
operations. It also takes care of system checkpoint and restart.
o Log manager: It provides log services to the recovery manager and other
components that may need its service.
Causes of Transaction Failure: The causes are as follows.
Logical Errors: Fatal errors within the transaction itself.
DBMS Error: For example, due to deadlock detection and rollback, the system enters a bad state.
System Crash: Power outage, OS failure, or hardware malfunction.
I/O Failure: For example, a disk failure.
4.2 ARIES ALGORITHM
Algorithms for Recovery and Isolation Exploiting Semantics, or ARIES, is a recovery algorithm designed to work with a no-force, steal database approach; it is used by IBM DB2, Microsoft SQL Server and many other database systems.
Three main principles lie behind ARIES:
Write-ahead logging: Any change to an object is first recorded in the log, and the log must be written to stable storage before changes to the object are written to disk.
Repeating history during Redo: On restart after a crash, ARIES retraces the actions of a database before the crash and brings the system back to the exact state that it was in before the crash. Then it undoes the transactions still active at crash time.
Logging changes during Undo: Changes made to the database while undoing transactions are logged to ensure such an action isn't repeated in the event of repeated restarts.
ARIES performs three steps after a crash:
Analysis: Finds all pages that had not been written to disk (dirty pages) and all transactions active at the time of the crash.
Redo: Repeats all the actions in the log (starting from an appropriate point) and restores the database to the state it was in just before the crash occurred.
Undo: Undoes the operations of transactions that did not commit.
Information sources for ARIES recovery:
Log record: Each log record has a log sequence number (LSN), which is monotonically increasing. It indicates the address of the log record on disk. Different logging actions, such as write, commit, abort, undo, and ending a transaction, are recorded in log records.
Transaction table: It contains an entry for each active transaction. It is rebuilt during the recovery process.
Dirty page table: It contains an entry for each dirty page in the buffer. Each entry includes the page ID and the LSN corresponding to the earliest update to that page.
ARIES Compensation Log Record (CLR)
This record is written just before the change recorded in an update log record is undone.
It describes the action taken to undo the actions recorded in the corresponding update record.
It contains the field undoNextLSN, the LSN of the next log record that is to be undone for the transaction that wrote the update record.
It describes an action that will never be undone.
A CLR contains the information needed to reapply (redo) the compensating action, but not to reverse it.
4.3 LOG BASED RECOVERY
A log file is a sequential file that contains a record of the actions taken by an entity. The log is kept on stable storage. Two kinds of log records are used by the log-based recovery technique:
Undo log records: These contain log entries (old values) of all write operations before the update.
Redo log records: These contain log entries (new values) of all write operations after the update.
The algorithm for log-based recovery is as follows:
When transaction Ti starts, it registers itself by writing a <Ti start> log record.
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write, and V2 is the value to be written to X.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage (that is, they are not buffered).
The BFIM is not overwritten by the AFIM until all undo log records for the update are force-written to disk.
The commit operation of a transaction cannot be completed until all of its redo and undo log records are force-written to disk.
Two approaches using logs:
o Deferred database modification: This scheme records all modifications to the log, but defers all the writes until after partial commit.
o Immediate database modification: This scheme allows database updates of an uncommitted transaction to be made as the writes are issued.
4.4 TRANSACTION AND DIRTY PAGE TABLE
The dirty page table is used to represent information about dirty buffer pages during normal processing. It is also used during restart recovery. It is implemented using hashing or via the deferred-writes queue mechanism. Each entry in the table consists of two fields: PageID and RecLSN.
During normal processing, when a non-dirty page is fixed in the buffers with the intention to modify it, the buffer manager records in the buffer pool (BP) dirty-pages table, as RecLSN, the current end-of-log LSN, which will be the LSN of the next log record to be written. The value of RecLSN indicates from what point in the log there may be updates to that page. Whenever pages are written back to nonvolatile storage, the corresponding entries in the BP dirty-pages table are removed. The contents of this table are included in the checkpoint record that is written during normal processing. The restart dirty-pages table is initialized from the latest checkpoint's record and is modified as the other records are examined during the analysis pass. The minimum RecLSN value in the table gives the starting point for the redo pass during restart recovery.
4.5 WRITE AHEAD LOG PROTOCOL
Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.
The Write-Ahead Logging Protocol:
Must force the log record for an update to stable storage before the corresponding data page gets to disk.
Must write all log records for a transaction before it commits.
Guarantees Atomicity.
Guarantees Durability.
WAL allows updates of a database to be done in-place. Another way to implement atomic updates is with shadow paging, which is not in-place. The main advantage of doing updates in-place is that it reduces the need to modify indexes and block lists.
ARIES is a popular algorithm in the WAL family.
4.6 CHECKPOINTS
The checkpoint mechanism copies the state of a process into nonvolatile storage. The restore mechanism copies the last known checkpointed state of the process back into memory and continues processing. This mechanism is especially useful for applications which may run for long periods of time before reaching a solution.
Checkpoint-recovery gives an application or system the ability to save its state and tolerate failures by enabling a failed execution to recover to an earlier safe state.
Key ideas:
Saves execution state.
Provides a recovery mechanism in the presence of a fault.
Can allow tolerance of any non-apocalyptic failure.
Provides a mechanism for process migration in distributed systems, for fault tolerance or load balancing.
During the execution of transactions, checkpointing is performed periodically. This includes:
Output the log buffers to the log.
Force-write the database buffers to the disk.
Output an entry <checkpoint> on the log.
During the recovery process, the following two steps are performed:
Undo all the transactions that have not committed.
Redo all transactions that have committed after the checkpoint.
Demerits of the technique:
It is insufficient in the context of large databases.
It requires transactions to execute serially.
4.7 RECOVERY FROM A SYSTEM CRASH
Sometimes a power failure or some hardware or software failure causes the system to crash. The following actions are taken when recovering from a system crash:
Scan the log forward from the last <checkpoint L> record.
Repeat history by physically redoing all updates of all transactions.
Create an undo-list during the scan as follows:
o The undo-list is initially set to L, the list of transactions that were active at the checkpoint.
o Whenever <Ti start> is found, Ti is added to the undo-list.
o Whenever <Ti commit> or <Ti abort> is found, Ti is deleted from the undo-list.
This brings the database to its state as of the crash, with committed as well as uncommitted transactions having been redone. The undo-list now contains transactions that are incomplete, that is, have neither committed nor been fully rolled back.
Scan the log backwards, performing undo on the log records of transactions found in the undo-list. Transactions are rolled back as described earlier.
When <Ti start> is found for a transaction Ti in the undo-list, write a <Ti abort> log record.
Stop the scan when <Ti start> records have been found for all Ti in the undo-list.
This undoes the effects of incomplete transactions (those with neither commit nor abort log records). Recovery is now complete.
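
A toy version of this two-pass algorithm is sketched below, using a simplified log record format (an assumption for illustration; real logs carry LSNs, checkpoint records with dirty-page tables, and CLRs).

```python
# Log after the last <checkpoint L> record (L = transactions active at the checkpoint).
log = [
    ("checkpoint", ["T1"]),
    ("update", "T1", "A", 100, 150),
    ("start",  "T2"),
    ("update", "T2", "B", 200, 250),
    ("commit", "T1"),
    # crash happens here: T2 is incomplete
]
db = {"A": 100, "B": 200}

# Forward pass: repeat history and build the undo-list.
undo_list = set(log[0][1])                  # initially L from the checkpoint record
for rec in log[1:]:
    if rec[0] == "update":
        _, ti, item, old, new = rec
        db[item] = new                      # redo every update, committed or not
    elif rec[0] == "start":
        undo_list.add(rec[1])
    elif rec[0] in ("commit", "abort"):
        undo_list.discard(rec[1])

# Backward pass: undo the incomplete transactions and log <Ti abort> records.
for rec in reversed(log[1:]):
    if rec[0] == "update" and rec[1] in undo_list:
        _, ti, item, old, new = rec
        db[item] = old
abort_records = [("abort", ti) for ti in undo_list]

print(db)              # {'A': 150, 'B': 200}: T1 redone, T2 rolled back
print(abort_records)   # [('abort', 'T2')]
```
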
4.8 REDO AND UNDO PHASES
Recovery algorithms like ARIES have two phases: a Redo phase and an Undo phase.
(i) Undo(Ti): Restores the values of all data items updated by Ti to their old values. The log is scanned backwards, and the operations of transactions that were active at the time of the crash are undone in reverse order.
(ii) Redo(Ti): Sets the values of the data items updated by transaction Ti to their new values. It reapplies updates from the log to the database. Generally, the Redo operation is applied only to committed transactions. If a transaction was aborted before the crash and its updates were undone, as indicated by CLRs, the actions described in the CLRs are also reapplied.
