Serializability of Scheduling
1. Introduction
In a multi-user database environment, multiple transactions often execute concurrently to
improve performance and resource utilization. However, concurrent execution can lead to
problems such as inconsistency, lost updates, and dirty reads of uncommitted data.

To preserve database correctness, the DBMS uses a concept called serializability, which
ensures that the concurrent execution of transactions is equivalent to some serial
execution of those transactions.

2. Definition
Serializability is the highest level of isolation in concurrency control. A schedule is said to
be serializable if its result is equivalent to some serial schedule — i.e., one where
transactions are executed one after another without overlapping.
In other words, although transactions may interleave during actual execution, the final
outcome (state of the database) must be as if they were executed in some serial order.

3. Types of Schedules
Serial Schedule: Transactions execute one after another with no interleaving.

Concurrent Schedule: Operations from different transactions are interleaved.

4. Types of Serializability
4.1 Conflict Serializability
Based on the idea of conflicting operations. Two operations conflict if they belong to
different transactions, access the same data item, and at least one of them is a write:

Read-Write ( RW )

Write-Read ( WR )

Write-Write ( WW )

A schedule is conflict-serializable if it can be transformed into a serial schedule by
swapping adjacent non-conflicting operations.

Conflict serializability is checked using a Precedence Graph (also known as a
Serialization Graph):
Steps:

1. Create a node for each transaction.

2. For every conflicting operation between Ti and Tj, add an edge from Ti → Tj if Ti's
operation comes first.

3. If the graph has no cycles, the schedule is conflict-serializable.
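
These steps translate directly into code. Below is a minimal Python sketch (illustrative,
not from any particular DBMS); it assumes a schedule is encoded as a list of
(transaction, operation, item) triples with operation 'R' or 'W':

from collections import defaultdict

def build_precedence_graph(schedule):
    # Edge Ti -> Tj for every conflicting pair (same item, different
    # transactions, at least one write) where Ti's operation comes first.
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and 'W' in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    # Depth-first search; reaching a "gray" node again means a back edge.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def visit(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False
    return any(color[n] == WHITE and visit(n) for n in list(edges))

def is_conflict_serializable(schedule):
    return not has_cycle(build_precedence_graph(schedule))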

4.2 View Serializability


A weaker (less restrictive) condition than conflict serializability, and therefore more
general: it admits some schedules that conflict serializability rejects.

Two schedules are view-equivalent if:

Each transaction that reads the initial value of a data item in one schedule also reads
the initial value in the other.

The final write on each data item is performed by the same transaction in both schedules.

Each read operation reads the value produced by the same write in both schedules.

A schedule is view-serializable if it is view-equivalent to a serial schedule.


Note: All conflict-serializable schedules are view-serializable, but not vice versa.

4.3 Commit Order Serializability (CO-Serializability)


Relevant mainly in distributed and replicated databases.

Enforces that transactions must commit in the same order as the serialization order.

5. Importance of Serializability
Ensures consistency and correctness of transactions.

Prevents race conditions and anomalies like:

Dirty Read

Lost Update

Non-repeatable Read

Phantom Read

Basis for designing concurrency control protocols like:

Two-Phase Locking (2PL)

Timestamp Ordering

Optimistic Concurrency Control

6. Serializability vs Recoverability
Concept          Focus                                 Ensures
Serializability  Logical correctness of interleaving   Final state is consistent
Recoverability   Transaction commit/abort behavior     No transaction uses uncommitted data

7. Example
Consider two transactions:

T1: Read(A), Write(A)
T2: Read(A), Write(A)

If interleaved as:

T1: Read(A)
T2: Read(A)
T2: Write(A)
T1: Write(A)

This is not conflict-serializable, because:

T1 reads A before T2 writes A, giving the edge T1 → T2

T2 reads and writes A before T1 writes A, giving the edge T2 → T1

Leads to a cycle in the precedence graph: T1 → T2 → T1 ⇒ Not Serializable
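
With the precedence-graph sketch from Section 4.1 (same illustrative encoding), this
schedule is rejected:

schedule = [('T1', 'R', 'A'), ('T2', 'R', 'A'),
            ('T2', 'W', 'A'), ('T1', 'W', 'A')]
print(is_conflict_serializable(schedule))   # False: the graph contains T1 -> T2 -> T1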

8. Limitations
Testing view serializability is NP-complete in general; conflict serializability is used
more in practice because it can be tested efficiently with a precedence graph.

May reduce performance if the system over-constrains the schedule to preserve
serializability.

Conclusion
Serializability is the cornerstone of concurrency control in DBMS. It ensures that
concurrent execution of transactions results in a database state that could be obtained
under some serial execution, thereby preserving correctness. While conflict serializability
is easier to implement and verify, view serializability offers a broader notion of
correctness. Understanding and enforcing serializability is essential for designing reliable
and consistent transaction processing systems.

Locking and Timestamp-Based Schedulers


Concurrency control in a database ensures isolation and correctness when multiple
transactions execute simultaneously. Schedulers are mechanisms within the DBMS
responsible for ordering the execution of operations from concurrent transactions in such
a way that the final result is serializable and recoverable.
Two primary methods used for concurrency control are:

1. Lock-Based (Pessimistic) Schedulers

2. Timestamp-Based (Optimistic) Schedulers

1. Lock-Based (Pessimistic) Scheduling


1.1 Overview
Lock-based concurrency control uses the idea of locks to regulate concurrent access to
data. When a transaction wants to read or write a data item, it must first acquire an
appropriate lock. If another transaction already holds a conflicting lock, the current
transaction must wait.
This is a pessimistic approach, assuming that conflicts will occur, so it prevents them
ahead of time.

1.2 Types of Locks


1. Shared Lock (S-lock): Allows a transaction to read a data item. Multiple transactions
can hold shared locks on the same item.

2. Exclusive Lock (X-lock): Allows a transaction to read and write a data item. Only one
transaction can hold an exclusive lock on a data item at a time.

1.3 Lock Compatibility Matrix

               Shared (S)   Exclusive (X)
Shared (S)     Yes          No
Exclusive (X)  No           No

1.4 Two-Phase Locking Protocol (2PL)


To ensure conflict serializability, transactions follow the Two-Phase Locking Protocol:

1. Growing Phase: Transaction can acquire locks but cannot release any.

2. Shrinking Phase: Transaction releases locks but cannot acquire new ones.

Strict 2PL: A special case where a transaction holds all its locks until it commits or aborts.
This ensures recoverability and avoids cascading aborts.
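
A minimal sketch of a strict 2PL lock table in Python (illustrative names; a
single-threaded model in which a refused request means the caller must wait and retry):

class LockTable:
    def __init__(self):
        self.locks = {}                       # item -> (mode, set of holders)

    def acquire(self, txn, item, mode):       # mode is 'S' or 'X'
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})  # growing phase: grant freely
            return True
        held_mode, holders = held
        if mode == 'S' and held_mode == 'S':  # S is compatible with S
            holders.add(txn)
            return True
        if holders == {txn}:                  # sole holder: allow upgrade
            self.locks[item] = (max(mode, held_mode, key='SX'.index), holders)
            return True
        return False                          # conflict: caller must wait

    def release_all(self, txn):
        # Strict 2PL: no shrinking phase during execution; everything is
        # released at once when the transaction commits or aborts.
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

Because locks are only ever released in release_all, the two-phase rule is satisfied
trivially, and no transaction can read another's uncommitted writes.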

1.5 Deadlock in Locking Schedulers


Deadlock occurs when two or more transactions wait indefinitely for each other’s locks.
Example:

T1 holds lock on A, waiting for B

T2 holds lock on B, waiting for A

Deadlock Handling Strategies:

Wait-Die / Wound-Wait protocols

Timeouts

Deadlock Detection with Wait-for Graphs
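
Detection with a wait-for graph reduces to the same cycle test used for precedence
graphs in serializability checking; a small sketch (illustrative names):

def find_deadlock(wait_for):
    # wait_for: txn -> set of txns it is waiting on. A cycle means deadlock.
    visited, on_path = set(), set()
    def visit(t):
        on_path.add(t)
        for u in wait_for.get(t, ()):
            if u in on_path or (u not in visited and visit(u)):
                return True
        on_path.discard(t)
        visited.add(t)
        return False
    return any(t not in visited and visit(t) for t in list(wait_for))

print(find_deadlock({'T1': {'T2'}, 'T2': {'T1'}}))   # True: the example above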

2. Timestamp-Based (Optimistic) Scheduling


2.1 Overview
Timestamp-based concurrency control assigns a unique timestamp to each transaction at
its start; the equivalent serial order is simply the timestamp order.
It is a comparatively optimistic approach: it assumes conflicts are rare, never blocks a
transaction, and resolves conflicts as they are detected by rolling back the offending
transaction.

2.2 How It Works


Each transaction Ti is assigned a timestamp TS(Ti) when it begins.
For each data item X , the system maintains:

read_TS(X): Largest timestamp of any transaction that read X

write_TS(X): Largest timestamp of any transaction that wrote to X

When Ti performs a read or write:

Read Rule:

If TS(Ti) < write_TS(X): The read is rejected (too late); rollback.

Else: The read is allowed and read_TS(X) is updated to max(read_TS(X), TS(Ti)).

Write Rule:

If TS(Ti) < read_TS(X): Write causes inconsistency → rollback.

If TS(Ti) < write_TS(X): Obsolete write → reject and rollback (under Thomas' Write
Rule, the obsolete write is simply ignored instead of triggering a rollback).

Else: Allow the write and set write_TS(X) = TS(Ti).

This ensures that all operations appear as if executed in timestamp order.
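
The two rules can be sketched as follows (Python, illustrative names; a Rollback
exception stands in for restarting the transaction with a new timestamp):

class Rollback(Exception):
    pass

class TimestampScheduler:
    def __init__(self):
        self.read_ts = {}     # X -> read_TS(X)
        self.write_ts = {}    # X -> write_TS(X)

    def read(self, ts, x):
        if ts < self.write_ts.get(x, 0):
            raise Rollback(f"T{ts}: read of {x} arrives too late")
        self.read_ts[x] = max(self.read_ts.get(x, 0), ts)

    def write(self, ts, x):
        if ts < self.read_ts.get(x, 0):
            raise Rollback(f"T{ts}: write of {x} would invalidate a later read")
        if ts < self.write_ts.get(x, 0):
            # Basic timestamp ordering rolls back; Thomas' Write Rule would
            # instead silently ignore this obsolete write.
            raise Rollback(f"T{ts}: obsolete write of {x}")
        self.write_ts[x] = ts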

2.3 Advantages
Deadlock-free: Since transactions don’t wait, there are no cycles in wait-for graphs.

No locking overhead: Avoids blocking and lock management.

2.4 Disadvantages
May cause frequent rollbacks (especially under high contention)

Does not guarantee recoverability or prevent cascading aborts by default — this needs
extensions such as strict timestamp ordering (delaying reads of uncommitted data) or
multi-version concurrency control (MVCC); Thomas' Write Rule is a further refinement
that avoids unnecessary rollbacks of obsolete writes.

Comparison of Lock-Based vs Timestamp-Based Schedulers

Feature                     Lock-Based Scheduler              Timestamp-Based Scheduler
Approach                    Pessimistic (prevents conflicts)  Optimistic (resolves conflicts as they occur)
Mechanism                   Locks (S, X)                      Timestamps and version tracking
Serializability Ensured By  Two-phase locking                 Timestamp ordering
Deadlock Possibility        Yes                               No
Rollbacks                   Fewer                             More frequent under contention
Blocking                    Yes (transactions may wait)       No (transactions never wait)
Recovery Support            Easier with strict 2PL            Needs additional mechanisms

Conclusion
Both Lock-Based and Timestamp-Based Schedulers are fundamental techniques in
concurrency control. Lock-based schedulers are more commonly used due to their
simplicity and direct support for recoverability, especially when implemented with strict
2PL. Timestamp-based schedulers are conceptually elegant and avoid deadlocks, but
require careful handling to manage rollbacks and ensure recoverability.
The choice between the two depends on the workload characteristics, system
requirements, and design priorities of the DBMS.

1. Multi-Version Concurrency Control (MVCC)


1.1 Overview
Multi-Version Concurrency Control (MVCC) is a concurrency control method that allows
multiple versions of data items to exist simultaneously. Instead of locking data, MVCC
maintains several historical versions of a data item and allows transactions to access the
appropriate version based on their timestamps or transaction IDs.
MVCC is a non-blocking, read-optimized approach and is widely used in systems like
PostgreSQL, Oracle, and MySQL (InnoDB).

1.2 Motivation
In traditional locking schemes, readers and writers can block each other. MVCC eliminates
this issue by letting readers see a consistent snapshot of the database as of the time they
began, without waiting for writers to finish.

1.3 How MVCC Works


For each data item X , the system maintains:
Multiple versions: X₁, X₂, ..., Xn

Each version has metadata:

Write timestamp (TSw)

Valid time range (start_TS, end_TS)

Sometimes: read timestamps, transaction ID

Operations:
Read(X):

A transaction T with timestamp TS(T) reads the version Xk such that:

write_TS(Xk) ≤ TS(T)

and Xk is the newest such version: no other version has a write timestamp between
write_TS(Xk) and TS(T)

Write(X):

A transaction T creates a new version of X , say Xn+1

Old versions are retained until no longer needed
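
A minimal sketch of a version store in Python (illustrative; it assumes unique write
timestamps and ignores commit status and garbage collection):

class VersionStore:
    def __init__(self):
        self.versions = {}    # X -> list of (write_ts, value)

    def write(self, ts, x, value):
        # Writers never overwrite: they append a new version of X.
        self.versions.setdefault(x, []).append((ts, value))

    def read(self, ts, x):
        # Return the version with the largest write_TS <= TS(T).
        visible = [v for v in self.versions.get(x, []) if v[0] <= ts]
        if not visible:
            return None       # no version of X existed at this timestamp
        return max(visible, key=lambda v: v[0])[1]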

1.4 Advantages of MVCC


Non-blocking reads: Readers don’t wait for writers

Higher concurrency: Allows more transactions to run in parallel

Snapshot isolation: Each transaction sees a consistent view of the database

Reduced deadlocks

1.5 Disadvantages of MVCC


Storage overhead: Multiple versions consume more disk/memory

Version maintenance: Requires garbage collection of obsolete versions

Complex implementation: Requires efficient version tracking and validation

1.6 Example
Assume T1 starts at time TS=10 , and T2 updates a record at TS=12 . T1 will continue to read
the older version (written at or before TS=10 ) even after T2 commits. This guarantees that
T1 reads a consistent snapshot of the database.
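
Using the version-store sketch above, the scenario looks like this:

store = VersionStore()
store.write(5, 'X', 'old')    # committed before T1 started
store.write(12, 'X', 'new')   # T2's update at TS=12
print(store.read(10, 'X'))    # T1 at TS=10 still sees 'old'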

2. Optimistic Concurrency Control (OCC)


2.1 Overview

Optimistic Concurrency Control (OCC) is based on the assumption that conflicts between
transactions are rare. Therefore, it allows transactions to execute without restrictive control
and only validates at the end whether the transaction can commit.

If a conflict is detected during validation, the transaction is rolled back.


This is a "run-first, check-later" strategy — unlike locking, which is "check-first, run-
later."

2.2 Phases of OCC


Each transaction goes through three key phases:

1. Read Phase
The transaction reads values and performs computations using local copies.

No changes are made to the database.

2. Validation Phase
The system checks whether committing this transaction would violate serializability.

It compares read/write sets with other concurrent transactions.

3. Write Phase
If validation passes, changes are written to the database.

If it fails, the transaction is aborted and restarted.

2.3 Validation Rules


Assume T1 commits while T2 is still executing (T1 finishes its write phase before T2
starts its own). To preserve serializability, ensure:

T1's write set does not overlap with T2's read set

If they overlap, T2 must be aborted and retried
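
A minimal sketch of this backward validation in Python (illustrative names; a global
commit counter orders finished transactions):

committed = []                # list of (commit_seq, write_set)
seq = 0                       # global commit sequence counter

class Txn:
    def __init__(self):
        self.start_seq = seq             # commit counter when T began
        self.read_set = set()
        self.write_set = set()

def validate_and_commit(t):
    global seq
    for commit_seq, wset in committed:
        # Someone who committed after t started wrote an item t read:
        # t may have seen a stale value, so it must abort and retry.
        if commit_seq > t.start_seq and wset & t.read_set:
            return False
    seq += 1
    committed.append((seq, set(t.write_set)))
    return True                          # write phase would install t's updates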

2.4 Advantages of OCC


No locking: Eliminates deadlocks

Efficient under low contention: Best when few transactions conflict

Supports high concurrency

2.5 Disadvantages of OCC
High rollback rate in high-contention workloads

Wasted work: Entire transaction may be aborted at the end

Validation overhead: Expensive when read/write sets are large

2.6 Use Cases


Read-intensive systems

Mobile/disconnected environments

Large-scale distributed systems with low write-write conflicts

3. MVCC vs OCC – Conceptual Comparison


Feature             MVCC                                 OCC
Concurrency Model   Multi-version snapshot               Single-version, optimistic validation
Blocking            Non-blocking for reads               Non-blocking for all operations
Conflict Detection  During read/write using timestamps   During validation phase
Rollbacks           Rare (only for write conflicts)      Can be frequent under high contention
Storage Overhead    High (due to versioning)             Low
Deadlocks           Avoided                              Avoided
Best Use Case       Mixed workloads, frequent reads      Low-conflict, high-read, high-latency environments

Conclusion
Both MVCC and OCC aim to improve concurrency by avoiding locks, but they take different
approaches:

MVCC sacrifices storage for read performance, ideal for workloads with frequent reads
and moderate writes.

OCC delays conflict detection until commit time, making it suitable for distributed or
mobile systems with low contention.

Modern systems may combine both strategies or dynamically adapt between them
depending on the workload and contention level.

Database Recovery – A Detailed Explanation


1. Introduction
A database system must ensure that the database remains consistent, correct, and
durable, even in the face of failures such as:

System crashes

Power outages

Software bugs

Disk failures

Transaction aborts

To achieve this, Database Recovery is the process of restoring the database to a correct
state after a failure.

It is based on the ACID properties of transactions — particularly Atomicity (all-or-nothing
execution) and Durability (committed data persists permanently).

2. Types of Failures
1. Transaction Failure

Logical error (e.g., divide by zero)

Explicit abort (e.g., rollback command)

2. System Failure

Crash of the DBMS or operating system

Main memory is lost, but disk remains intact

3. Media Failure

Disk or storage failure (e.g., hard disk crash)

4. Application/Software Failure

Bugs or misbehaving transactions

3. Recovery Objectives
Undo the effects of incomplete or failed transactions

Ensure that committed transactions are durable

Maintain database consistency and integrity

Resume normal operation quickly after failure

4. Key Concepts
4.1 Transaction States
Active: Executing

Partially Committed: Final operation executed, effects not yet permanent

Committed: All effects made permanent

Failed: Error occurred

Aborted: Rolled back

Only committed transactions should affect the final database state.

4.2 Log-Based Recovery


The transaction log (or system log) is a crucial component in recovery. It records all
actions performed by transactions.
Types of log entries:

<T_i, START>

<T_i, X, old_value, new_value> (for writes)

<T_i, COMMIT>

<T_i, ABORT>

Logs are typically written to stable storage before actual data is changed (Write-Ahead
Logging – WAL).
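
In code, the discipline is simply "append the record before touching the data"; a tiny
sketch using the entry types above (stable_log stands in for append-only stable storage;
names are illustrative):

stable_log = []

def log(*record):
    # WAL: this append must reach stable storage before the corresponding
    # data page may be written to disk.
    stable_log.append(record)

log('T1', 'START')
log('T1', 'X', 100, 150)      # <T1, X, old_value, new_value>
log('T1', 'COMMIT')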

5. Recovery Techniques
5.1 Deferred Update (No-Undo/Redo)
Changes are not written to the database until the transaction commits.

On failure: no need to undo uncommitted changes.

Redo may be needed for committed transactions after recovery.

5.2 Immediate Update (Undo/Redo)


Changes can be written before commit.

On failure:

Undo changes of uncommitted transactions.

Redo changes of committed transactions.

5.3 Checkpoints
A checkpoint is a snapshot of the database state written to disk at regular intervals to
reduce recovery time.
Process:

1. Suspend new transactions

2. Write all log and dirty pages to disk

3. Record a <CHECKPOINT> log entry

During recovery, the system can skip scanning the entire log and start from the most recent
checkpoint.

5.4 Recovery with Write-Ahead Logging (WAL)


WAL Protocol ensures:

Before writing a data block to disk, its log entry must be written first

Ensures atomicity and durability

Recovery Actions:

Undo uncommitted transactions

Redo committed transactions
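
A minimal sketch of this undo/redo pass over the log format from Section 4.2
(illustrative; it scans the whole log, where a checkpoint would bound the scan):

def recover(log_records, db):
    committed = {r[0] for r in log_records if r[1:] == ('COMMIT',)}
    for txn, *rest in log_records:             # forward pass: redo winners
        if txn in committed and len(rest) == 3:
            x, old, new = rest
            db[x] = new
    for txn, *rest in reversed(log_records):   # backward pass: undo losers
        if txn not in committed and len(rest) == 3:
            x, old, new = rest
            db[x] = old
    return db

db = {'X': 100, 'Y': 20}
log_records = [('T1', 'START'), ('T1', 'X', 100, 150), ('T1', 'COMMIT'),
               ('T2', 'START'), ('T2', 'Y', 20, 99)]    # T2 never committed
print(recover(log_records, db))    # {'X': 150, 'Y': 20}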

6. Shadow Paging (Non-Log-Based Recovery)


An alternative to log-based recovery.

A shadow copy of the page table preserves the database state as of the start of the
transaction.

Modifications are written to new pages; the shadow pages are never changed in place.

At commit, pointers are updated atomically to the new pages.

If a crash occurs before commit, original pages are intact.

Advantages:

No logs required

Simple to implement

Disadvantages:

High overhead for page copying

Poor performance for high-volume systems

7. ARIES Recovery Algorithm (Advanced)


Widely used in commercial DBMSs.

ARIES = Algorithms for Recovery and Isolation Exploiting Semantics


Key Features:

Uses WAL

Supports fine-grained locking

Allows partial rollbacks

Phases:

1. Analysis: Identify dirty pages and the transactions that were active at the crash

2. Redo: Repeat history by reapplying all logged changes from the appropriate redo point

3. Undo: Roll back the transactions that were incomplete at the crash

8. Recovery in Distributed Systems


Requires coordinated recovery across multiple sites

Often uses Two-Phase Commit (2PC) to ensure atomicity

Prepare phase: All sites vote

Commit phase: All commit only if all vote “yes”
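
A minimal coordinator sketch (illustrative; participants are modeled as objects exposing
prepare/commit/abort, and a crash or timeout counts as a "no" vote):

def two_phase_commit(participants):
    # Phase 1 (prepare): collect votes; each participant force-logs
    # "prepared" before voting yes.
    votes = []
    for p in participants:
        try:
            votes.append(bool(p.prepare()))
        except Exception:
            votes.append(False)           # crash/timeout counts as "no"
    decision = all(votes)
    # Phase 2 (commit/abort): log the decision, then broadcast it.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision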

9. Summary of Recovery Actions


Transaction Status   Action on Recovery
Committed            Redo
Uncommitted          Undo

10. Conclusion
Database recovery is essential for ensuring atomicity and durability in transaction
processing. Whether through log-based approaches, shadow paging, or advanced
techniques like ARIES, the DBMS must be equipped to handle failures gracefully and
restore the database to a consistent state. Recovery mechanisms form the backbone of
reliable data management in mission-critical systems.
