DBMS Unit 4 Notes by MultiAtomsPlus
Unit-4
Syllabus
Transaction System
Content Unit-4
ACID properties - (2021-22)
Operations During a Transaction
State diagram - (2022-23)
Schedule & Its Types - (2022-23)
Serializability
Testing of Serializability
Conflict & View Serializable
Recoverability
Recoverable, Cascadeless & Strict
Recovery System
The ACID properties ensure reliability and consistency for all database
transactions.
1. Atomicity:
2. Consistency:
Meaning: A transaction must take the database from one valid state to another,
maintaining all integrity rules.
Example: If a student’s marks are updated, total marks must still reflect correctly.
Why Useful?: Ensures database rules (like constraints or relationships) are always
followed.
3. Isolation:
Meaning: Multiple transactions can occur simultaneously, but they should not interfere
with each other.
Example: Two people trying to book the last flight seat will not both succeed; one will
complete first.
Why Useful?: Prevents data corruption or incorrect results in a concurrent environment.
4. Durability:
Meaning: Once a transaction is committed, the changes are permanent, even if the
system crashes.
Example: After transferring money, the balance update remains saved despite power
failure.
Why Useful?: Provides reliability and trust in the system.
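The money-transfer example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a production pattern: the accounts table, the transfer and balances helpers, and the starting balances are all assumptions made for illustration. It shows the all-or-nothing behaviour: a transfer that fails midway leaves both balances unchanged.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a dict (hypothetical helper)."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

ok = transfer(conn, "A", "B", 300)       # succeeds: A=200, B=400
failed = transfer(conn, "A", "B", 1000)  # fails: the debit is rolled back
print(ok, failed, balances(conn))
```

The second call debits Account A first and only then discovers the negative balance, so the rollback is what restores A.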
Importance of Log:
How it works:
Logs contain:
Why important?
1. If a transaction fails midway, the log ensures the database can roll back changes to
its previous state.
2. After a system crash, logs are used to restore the database to a consistent state.
Operations During a Transaction
1. Read (R)
2. Write (W)
3. Update
4. Commit
A transaction moves through various states during its execution. Here’s a step-by-step explanation
with a diagram:
[Figure: transaction state diagram. Start → Active → Partially Committed → Committed → End; Active or Partially Committed → Failed → Aborted → End.]
Explanation of States:
1. Active State:
The transaction is actively executing its operations (e.g., reading or writing to the
database).
Example: Deducting ₹500 from Account A.
3. Committed State:
The transaction is successfully completed, and its changes are permanently saved.
Example: Account A and Account B balances are updated and stored in the database.
4. Failed State:
Errors or issues (like system failure, invalid input, or insufficient funds) prevent the
transaction from completing.
Example: ₹500 deduction fails due to insufficient balance in Account A.
5. Aborted State:
The database rolls back the transaction, undoing any changes made during execution.
Example: If the money transfer fails, Account A and B are restored to their original
balances.
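The states can also be written down as a small transition table. This is a sketch under the standard textbook transitions (Active to Partially Committed or Failed, Partially Committed to Committed or Failed, Failed to Aborted); the names ALLOWED and is_valid_path are made up for illustration.

```python
# Allowed next states for each transaction state (standard textbook transitions).
ALLOWED = {
    "Active": {"Partially Committed", "Failed"},
    "Partially Committed": {"Committed", "Failed"},
    "Failed": {"Aborted"},
    "Committed": set(),   # terminal state
    "Aborted": set(),     # terminal state
}

def is_valid_path(path):
    """Return True if `path` is a legal sequence of states starting at Active."""
    if not path or path[0] != "Active":
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))

print(is_valid_path(["Active", "Partially Committed", "Committed"]))
print(is_valid_path(["Active", "Failed", "Aborted"]))
print(is_valid_path(["Active", "Committed"]))  # skips Partially Committed
```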
AKTU- 2022-23
A rollback occurs when a transaction fails, undoing all changes made so far to maintain the
database’s consistency.
Types of Schedules
[Figure: classification of schedules. A schedule is either Serial or Non-serial. Non-serial schedules are Serializable (Conflict or View) or Non-Serializable (Recoverable or Non-Recoverable).]
1. Serial Schedule
2. Non-Serial Schedule
Allows interleaving of operations from different transactions.
Provides better concurrency.
Example: Operations from T1 and T2 are mixed together.
Behaves as if the transactions were executed in a serial order.
Includes two types:
2. Non-Serializable
1. Recoverable Schedule
2. Cascading Schedule
3. Cascadeless Schedule
4. Strict Schedule
Concurrency Problems
1. Dirty Read
Definition: A transaction reads uncommitted changes made by another transaction.
Impact: Inconsistent and unreliable data.
Prevention: Use the Read Committed isolation level or higher.
3. Phantom Read
Definition: A transaction re-executes a query and gets a different result set because
another transaction has added or removed rows that match the query criteria.
Impact: The result set changes unexpectedly.
Prevention: Use the Serializable isolation level.
4. Lost Update
Definition: Two transactions read the same data and update it concurrently, but one
update overwrites the other, leading to a loss of data.
Impact: One update is lost, leading to incorrect results.
Prevention: Use locks or the Serializable isolation level.
5. Deadlock
Definition: Two or more transactions wait indefinitely for resources locked by each other,
creating a circular wait.
Impact: Transactions are unable to proceed, leading to a system stall.
Prevention: Deadlock detection and recovery mechanisms (e.g., timeout, resource
ordering).
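The lost update problem (item 4 above) can be reproduced with a tiny simulation. This is only a sketch: the run function, the single data item X, and the step format (transaction, operation, argument) are assumptions for illustration. Each transaction reads X into a private copy, adds to it, and writes it back.

```python
def run(schedule, start):
    """Execute (txn, op, arg) steps on one item X and return its final value."""
    x = start
    local = {}  # private working copy of X held by each transaction
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = x
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            x = local[txn]
    return x

serial = [
    ("T1", "read", None), ("T1", "add", 50), ("T1", "write", None),
    ("T2", "read", None), ("T2", "add", 30), ("T2", "write", None),
]
interleaved = [
    ("T1", "read", None), ("T2", "read", None),   # both read the same old value
    ("T1", "add", 50), ("T1", "write", None),
    ("T2", "add", 30), ("T2", "write", None),     # overwrites T1's update
]
print(run(serial, 100), run(interleaved, 100))
```

In the interleaved run T2 writes a value computed from the old X, so T1's +50 disappears.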
What is Serializability?
Importance of Serializability
1. Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule by
swapping adjacent non-conflicting operations. Two operations conflict when they belong to
different transactions, access the same data item, and at least one of them is a write.
Conflict serializability is determined using a precedence graph.
A schedule is conflict-serializable if its precedence graph (serialization graph) has no
cycles.
2. Draw a directed edge Ti → Tj if a conflicting operation in Ti precedes one in Tj.
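The precedence-graph test can be sketched in a few lines. This is a minimal sketch, assuming a schedule is given as a list of (transaction, operation, item) tuples with operation "R" or "W"; the function names and the two sample schedules are made up for illustration.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) for a schedule of (txn, op, item) steps."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in a directed graph."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on current path, 2 = done

    def visit(n):
        state[n] = 1
        for m in adj[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

def is_conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))

s_ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
s_bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(is_conflict_serializable(s_ok), is_conflict_serializable(s_bad))
```

In s_bad, R1(A) before W2(A) gives the edge T1 → T2, and W2(A) before W1(A) gives T2 → T1, so the graph has a cycle.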
View Serializability
A schedule is view serializable if it is view equivalent to some serial schedule, that
is, every transaction reads the same values and the database ends in the same final
state as in that serial schedule.
Two schedules S1 and S2 are view equivalent if they satisfy the following
conditions:
Steps to Check View Serializability
1. Check Initial Read: Compare the initial reads for each data item in the non-
serial and serial schedules.
2. Check Updated Read: Verify that each transaction reads data updated by the
same transaction in both schedules.
3. Check Final Write: Ensure the final writes for each data item are performed
by the same transaction.
If all three conditions are satisfied, the schedule is view equivalent to the serial
schedule and hence view serializable.
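The three checks can be sketched as follows. This is a brute-force illustration, assuming schedules are lists of (transaction, operation, item) tuples with operation "R" or "W"; the function names and sample schedules are hypothetical. It tries every serial order, so it is only practical for a handful of transactions.

```python
from itertools import permutations

def view_profile(schedule):
    """Summarise who reads from whom and who writes each item last."""
    last_writer = {}   # item -> transaction of the most recent write
    reads_from = []    # (reader, item, writer); writer None means initial read
    for txn, op, item in schedule:
        if op == "R":
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    # The reads cover the initial-read and updated-read checks;
    # last_writer covers the final-write check.
    return sorted(reads_from, key=str), dict(last_writer)

def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)

def is_view_serializable(schedule):
    txns = sorted({t for t, _, _ in schedule})
    for order in permutations(txns):
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_equivalent(schedule, serial):
            return True
    return False

# T2 and T3 write A without reading it (blind writes).
s_view = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
s_not = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(is_view_serializable(s_view), is_view_serializable(s_not))
```

s_view is view equivalent to the serial order T1, T2, T3 even though its precedence graph has a cycle, which is the usual example of a schedule that is view serializable but not conflict serializable.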
Example: schedules S1 and S2
1. Initial Read
The first transaction that reads a data item in S1 and S2 should be the same.
For A:
In S1, T1 performs R(A) first.
In S2, T1 performs R(A) first.
✅ Condition is satisfied for A.
For B:
In S1, T1 performs R(B) first.
In S2, T1 performs R(B) first.
✅ Condition is satisfied for B.
1. Recoverable Schedule:
A schedule is recoverable if a transaction commits only after the transactions it depends on
have committed.
Example: If T2 reads a value written by T1, T2 should not commit until T1 has committed.
Why important? Prevents inconsistency caused by committing a transaction that depends on
another uncommitted transaction.
2. Cascadeless Schedule:
A stricter type of schedule where a transaction is not allowed to read uncommitted data
from another transaction.
Example: T2 cannot read A until T1 has committed its changes to A.
Why important? Prevents cascading rollbacks where one failure causes many transactions to
fail.
Why Better?: T2 waits for T1 to commit before reading its data, avoiding cascading rollbacks.
3. Strict Schedule:
The strictest type where no transaction can read or write a value modified by another
transaction until that transaction has committed or rolled back.
Why important? Makes recovery easier because no uncommitted changes are accessed by
other transactions.
Why Best?: T2 neither reads nor writes A until T1 commits or rolls back, ensuring maximum safety.
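The three definitions can be checked mechanically. This is a simplified sketch, assuming schedules are lists of (transaction, operation, item) tuples where the operation is "R", "W", or "C" for commit, and that no transaction aborts; the classify function and the sample schedules are made up for illustration.

```python
def classify(schedule):
    """Return the strongest property that the schedule satisfies."""
    n = len(schedule)
    commit_at = {t: i for i, (t, op, _) in enumerate(schedule) if op == "C"}
    last_writer = {}
    recoverable = cascadeless = strict = True
    for i, (t, op, x) in enumerate(schedule):
        if op == "C":
            continue
        w = last_writer.get(x)
        # The item is "dirty" if another transaction wrote it and has not committed yet.
        if w is not None and w != t and commit_at.get(w, n) > i:
            strict = False               # read or write of uncommitted data
            if op == "R":
                cascadeless = False      # dirty read
                if commit_at.get(t, n) < commit_at.get(w, n):
                    recoverable = False  # reader commits before the writer
        if op == "W":
            last_writer[x] = t
    if strict:
        return "strict"
    if cascadeless:
        return "cascadeless"
    return "recoverable" if recoverable else "non-recoverable"

s1 = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
s2 = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
s3 = [("T1", "W", "A"), ("T2", "W", "A"), ("T1", "C", None), ("T2", "C", None)]
s4 = [("T1", "W", "A"), ("T1", "C", None), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T2", "C", None)]
for s in (s1, s2, s3, s4):
    print(classify(s))
```

Every strict schedule is also cascadeless, and every cascadeless schedule is also recoverable, which is why the function reports only the strongest label.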
AKTU- 2022-23
Recovery System
The recovery system is responsible for ensuring the database's consistency and integrity
after failures. It restores the database to its last consistent state using recovery techniques.
Types of Failures
Failures in DBMS can occur at various levels and are classified to identify the cause and
restore the database to a consistent state.
1. Transaction Failure
2. System Crash
Occurs due to issues at the system level, like hardware or software failures.
3. Disk Failure
Undo changes:
Restore A to its original value.
Restore B to its original value.
2. Redo Operation
Purpose: Reapplies changes made by committed transactions to ensure durability.
When Used: If a system crash occurs after a transaction commits but before the
changes are written to the database, those changes need to be reapplied.
Transaction T2:
1. Write(A = 100)
2. Write(B = 200)
3. Commit
If a crash occurs after the commit but before changes are fully applied:
Redo changes:
Reapply A = 100.
Reapply B = 200.
Log-Based Recovery (AKTU 2023-24)
A log is a sequence of records stored in stable storage to enable recovery of the database
after a failure. Each database operation is logged before it is applied.
2. Modification Log
<Tn, Account_Balance, 1000, 1200>: Logs the old and new values after a
transaction modifies the Account_Balance.
3. Commit Log
<Tn, Commit>: Indicates that the transaction Tn has been successfully completed.
4. Abort Log
Recovery Process:
If the system crashes after <T0, Commit>, redo T0 as its changes are in the log.
If the system crashes before <T1, Commit>, ignore T1's changes, as they are not applied.
2. Immediate Database Modification
Definition: Changes to the database are applied immediately, even before the transaction
commits.
Log Requirement:
1. Every operation writes to the log before the actual database modification.
2. Both undo and redo operations are needed in case of failure.
Advantages:
Faster updates during transaction execution.
<T0, Start>
<T0, A, 850, 800>
<T0, B, 1000, 1050>
<T0, Commit>
<T1, Start>
<T1, C, 600, 500>
Recovery Process:
If the system crashes after <T0, Commit>, redo T0 since it is committed.
If the system crashes before <T1, Commit>, undo T1 since it is not committed.
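The recovery rule for immediate modification can be sketched with the log above. This is an illustrative sketch only: the tuple encoding of log records, the recover function, and the on-disk values at crash time are assumptions. Update records are read as (transaction, item, old value, new value).

```python
def recover(log, db):
    """Undo uncommitted transactions, then redo committed ones."""
    committed = {rec[0] for rec in log if rec[1:] == ("Commit",)}
    # Undo pass: scan backwards and restore old values of uncommitted transactions.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] not in committed:
            _, item, old, _ = rec
            db[item] = old
    # Redo pass: scan forwards and reapply new values of committed transactions.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, _, new = rec
            db[item] = new
    return db

log = [
    ("T0", "Start"),
    ("T0", "A", 850, 800),
    ("T0", "B", 1000, 1050),
    ("T0", "Commit"),
    ("T1", "Start"),
    ("T1", "C", 600, 500),
]

# Hypothetical disk state at the crash: T0's write to A never reached disk,
# while T1's uncommitted write to C did.
db = {"A": 850, "B": 1050, "C": 500}
print(recover(log, db))
```

After recovery A and B hold T0's committed values and C is back to its value before T1 started.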
Checkpoints
A checkpoint is a mechanism that marks a point in the transaction log where the system was in
a consistent state. It makes recovery more efficient: log records written before the checkpoint
no longer need to be scanned and can be discarded, while new records are appended after it.
Recovery Process:
1. Log Scanning: The recovery system scans the logs in reverse order, starting from the most
recent transaction logs and going back to the checkpoint.
Redo List: Transactions that were committed (i.e., <Tn, Start> and <Tn, Commit>) and need to
be reapplied.
Undo List: Transactions that were incomplete (i.e., <Tn, Start> but no <Tn, Commit> or <Tn,
Abort>) and need to be rolled back.
Example of Recovery:
Let’s assume we have the following log after a checkpoint:
Redo transactions: T2 and T3 (both have <Tn, Start> and <Tn, Commit>).
Undo transactions: T1 (only <T1, Start>) and T4 (only <T4, Start>).
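Building the two lists can be sketched as a reverse scan of the log. The log below is hypothetical (it is not the one from the original example); it is chosen only so that T2 and T3 commit while T1 and T4 do not. The sketch also assumes every transaction of interest starts after the checkpoint.

```python
def build_lists(log_after_checkpoint):
    """Scan the log backwards and return (redo_list, undo_list)."""
    redo, undo, finished = [], [], set()
    for txn, action in reversed(log_after_checkpoint):
        if action == "Commit":
            redo.append(txn)
            finished.add(txn)
        elif action == "Abort":
            finished.add(txn)   # already rolled back, nothing to do
        elif action == "Start" and txn not in finished:
            undo.append(txn)    # started but never committed or aborted
    return sorted(redo), sorted(undo)

# Hypothetical log records written after the checkpoint.
log_after_checkpoint = [
    ("T1", "Start"),
    ("T2", "Start"),
    ("T2", "Commit"),
    ("T3", "Start"),
    ("T4", "Start"),
    ("T3", "Commit"),
]
redo_list, undo_list = build_lists(log_after_checkpoint)
print(redo_list, undo_list)
```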
Deadlock in DBMS
A deadlock occurs in a database when two or more transactions are stuck
because each is waiting for another to release a resource. This
creates a cycle of waiting in which none of the transactions involved can proceed.
Deadlocks are a significant challenge in multi-user environments and can
severely impact the system's performance and reliability.
[Figure: deadlock between T1 and T2 over resources R1 and R2. Each transaction holds one resource and requests the other.]
Characteristics of Deadlock / Necessary Conditions
Deadlock Handling
1. Deadlock Detection
Wait-For Graph:
Transactions are represented as nodes. If a cycle is detected in the graph,
a deadlock exists.
Action: Abort one transaction in the cycle to break the deadlock.
[Figure: wait-for graph for T1 and T2, with an edge labelled "Wait For Lock(R2)".]
2. Deadlock Avoidance
3. Deadlock Prevention
Deadlock prevention ensures that the system allocates resources in a way that
avoids circular waits, which are the main cause of deadlocks. Two common schemes
for prevention are Wait-Die and Wound-Wait.
Wait-Die Scheme
Key Idea: Older transactions have higher priority, and younger ones are rolled back
when conflicts occur.
Example:
Transaction T1 (older) requests a resource held by T2 (younger) → T1 waits.
Transaction T2 (younger) requests a resource held by T1 (older) → T2 is aborted
and restarted.
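The Wait-Die rule in the example comes down to one timestamp comparison. This is a minimal sketch, assuming each transaction carries a start timestamp and that a smaller timestamp means an older transaction; the function name and the timestamps are made up for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Decide what the requesting transaction does under Wait-Die.

    A smaller timestamp means an older transaction.
    """
    if requester_ts < holder_ts:
        return "wait"    # older requester waits for the younger holder
    return "abort"       # younger requester dies and is restarted later

T1_TS, T2_TS = 1, 2      # T1 is older than T2 (hypothetical timestamps)
print(wait_die(T1_TS, T2_TS))  # T1 requests a resource held by T2
print(wait_die(T2_TS, T1_TS))  # T2 requests a resource held by T1
```

Because waiting is only ever allowed from older to younger, a cycle of waits cannot form.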
Wound-Wait Scheme
In multi-user database systems, deadlocks can occur when two or more transactions wait
indefinitely for resources held by each other. To ensure the system operates smoothly, it is
crucial to detect and recover from deadlocks efficiently.
Deadlock Detection
Deadlock detection identifies cycles of waiting transactions that prevent further progress.
The primary approach is the Wait-for Graph.
Definition: A directed graph where each node represents a transaction and an edge T1 ->
T2 indicates that transaction T1 is waiting for a resource held by transaction T2.
Cycle: If the graph contains a cycle, a deadlock is present.
Detection Steps:
1. Monitor Resources: The DBMS tracks transactions and their resource requests.
2. Graph Construction: The system constructs a wait-for graph using transaction states
and resource allocations.
3. Cycle Detection: Algorithms such as depth-first search (DFS) are used to find cycles in
the graph.
4. Deadlock Confirmation: If a cycle exists, the transactions involved are declared
deadlocked.
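The detection steps can be sketched with a depth-first search over the wait-for graph. This is an illustrative sketch, assuming the graph is given as a dict that maps each transaction to the set of transactions it is waiting for; find_deadlock and the sample graphs are hypothetical names and data.

```python
def find_deadlock(wait_for):
    """Return one cycle of transactions as a list, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    nodes = set(wait_for) | {t for targets in wait_for.values() for t in targets}
    color = {n: WHITE for n in nodes}
    path = []

    def dfs(n):
        color[n] = GREY
        path.append(n)
        for m in sorted(wait_for.get(n, ())):
            if color[m] == GREY:            # back edge: m is already on the path
                return path[path.index(m):]
            if color[m] == WHITE:
                cycle = dfs(m)
                if cycle:
                    return cycle
        path.pop()
        color[n] = BLACK
        return None

    for n in sorted(nodes):
        if color[n] == WHITE:
            cycle = dfs(n)
            if cycle:
                return cycle
    return None

deadlocked = {"T1": {"T2"}, "T2": {"T1"}}     # T1 and T2 wait for each other
no_deadlock = {"T1": {"T2"}, "T2": {"T3"}}    # a chain, no cycle
print(find_deadlock(deadlocked), find_deadlock(no_deadlock))
```

The returned list names the transactions in the cycle, which is the set from which a victim would be chosen during recovery.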
Deadlock Recovery
Once a deadlock is detected, the system must resolve it to allow progress. Common recovery
strategies include:
1. Transaction Abortion
Key Idea: Abort one or more transactions involved in the deadlock to break the cycle.
Criteria for Selection:
2. Rollback Transactions
Key Idea: Undo the actions of the aborted transactions.
Steps:
3. Timeout Mechanism
Key Idea: Automatically terminate transactions waiting too long for resources.
Distributed Database
It is a collection of data spread across multiple locations, interconnected via a network. Each site in
a distributed database system functions independently, but together they form a unified database
system.
Key components of distributed databases include Distributed Data Storage, Concurrency Control,
and Directory System, explained below:
1. Distributed Data Storage
Distributed Data Storage refers to the practice of splitting and storing a database across multiple
physical locations, which could be on different servers or geographical regions. This allows for
better performance, scalability, and fault tolerance.
Techniques:
1. Fragmentation:
The database is divided into smaller pieces called fragments. These fragments can be stored
across multiple locations.
Horizontal Fragmentation: Divides a table by rows (e.g., all customer data for a specific region).
Vertical Fragmentation: Divides a table by columns (e.g., only storing certain fields like
customer names or addresses).
2. Replication:
Copies of the same data are stored at multiple sites to ensure availability and faster access.
Full Replication: All data is copied to every site.
Partial Replication: Only some data is copied across sites, based on access patterns or other
criteria.
3. Hybrid Approach:
Combines both fragmentation and replication to ensure that data is divided efficiently and
replicated for fault tolerance.
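Horizontal and vertical fragmentation can be illustrated on a small in-memory table. This is a sketch only: the customers rows, the column names, and the helper functions are invented for the example. Rows are plain dicts, and each vertical fragment keeps the key column so the original table can be rebuilt.

```python
customers = [  # hypothetical sample data
    {"id": 1, "name": "Asha", "region": "North", "address": "Delhi"},
    {"id": 2, "name": "Ravi", "region": "South", "address": "Chennai"},
    {"id": 3, "name": "Meena", "region": "North", "address": "Lucknow"},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (split by rows)."""
    return [r for r in rows if predicate(r)]

def vertical_fragment(rows, key, columns):
    """Keep only the key plus the chosen columns (split by columns)."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]

north = horizontal_fragment(customers, lambda r: r["region"] == "North")
names = vertical_fragment(customers, "id", ["name"])
rest = vertical_fragment(customers, "id", ["region", "address"])

# Rebuild the original table by joining the vertical fragments on the key.
by_id = {r["id"]: dict(r) for r in names}
for r in rest:
    by_id[r["id"]].update(r)
rebuilt = [by_id[k] for k in sorted(by_id)]
print(len(north), rebuilt == customers)
```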
Advantages:
Faster local access to data: Data can be stored closer to users or applications, reducing latency.
Improved reliability and fault tolerance: Even if one site fails, data is still available from other
sites that store replicas.
Challenges:
Data Synchronization across Sites: Keeping data consistent across multiple locations can be
complex, especially in cases of updates or changes.
Increased Storage Requirements: Replicating data across multiple sites requires additional storage
space, which can increase costs.
2. Concurrency Control
Concurrency Control ensures that multiple transactions running simultaneously across different
locations in a distributed database do not cause inconsistencies or violations of data integrity.
Goals:
Maintain Data Integrity: Ensures that concurrent transactions do not interfere with each other.
Prevent Conflicts during Concurrent Updates: Prevents issues like lost updates, temporary
Preserve Transaction Isolation and Consistency: Ensures that transactions are isolated from
one another and the system remains in a consistent state even when multiple transactions are
executed concurrently.
Techniques:
1. Lock-Based Protocols:
Distributed Two-Phase Locking (2PL): A protocol where each transaction acquires locks in a
growing phase and releases them in a shrinking phase; once it has released any lock, it may
not acquire another. Across sites, a transaction must obtain the lock on a data item at the
site holding it before accessing that item, which keeps the resulting schedules serializable.
2. Time-Stamp Ordering:
Each transaction is assigned a unique global timestamp, and conflicts are resolved by the
order of their timestamps. This ensures transactions follow the correct sequence without
interfering with one another.
3. Optimistic (Validation-Based) Concurrency Control:
Transactions execute without locks or restrictions, but before committing, they are
validated to check if any conflicts occurred during their execution. If no conflicts are
found, the transaction is committed; otherwise, it is rolled back and retried.
4. Quorum-Based Protocols:
In this method, a transaction requires approval from a majority (quorum) of the nodes
before it can proceed. This ensures that data is not modified by transactions that are not
fully validated by the majority.
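Of the techniques above, timestamp ordering (technique 2) is the easiest to sketch in code. The version below follows the basic timestamp-ordering rules found in standard textbooks, which these notes do not spell out. It models a single data item at one site and leaves out how global timestamps are generated; the Item class and the read and write functions are hypothetical names.

```python
class Item:
    """A data item that remembers the youngest reader and writer so far."""
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # largest timestamp of any transaction that wrote it

def read(item, ts):
    """Return True if the read is allowed, False if the transaction must roll back."""
    if ts < item.write_ts:          # a younger transaction already overwrote the value
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Return True if the write is allowed, False if the transaction must roll back."""
    if ts < item.read_ts or ts < item.write_ts:   # a younger transaction got there first
        return False
    item.write_ts = ts
    return True

x = Item()
results = [
    read(x, 2),    # transaction with timestamp 2 reads x
    write(x, 1),   # older transaction 1 tries to write after a younger read
    write(x, 3),   # transaction 3 writes x
    read(x, 2),    # transaction 2 reads again, but x was overwritten by 3
]
print(results)
```

A rejected operation means the transaction is rolled back and restarted with a new timestamp, so conflicting operations always take effect in timestamp order.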
3. Directory System
The directory system in a distributed database maintains metadata about the database's structure,
data locations, fragmentation, and replication. It functions like a "map" that tracks where data
resides and how it's organized, enabling efficient data retrieval and management across multiple
sites.
Responsibilities:
Locate Data or Fragments: It helps in finding where specific data or its fragments are stored
Manage Replication: It tracks duplicate copies of data to ensure they are available and up to date.
Transparent Access: The directory system ensures that users and applications can access
data without needing to know the physical location of the data or where it is replicated.
Types:
1. Centralized Directory:
A single directory that holds all the metadata. It’s simple to implement but creates a
single point of failure, which could be problematic for reliability and availability.
2. Distributed Directory:
3. Hierarchical Directory:
This approach combines both centralized and distributed systems, organizing the
metadata in a tree-like structure to balance efficiency and scalability.