This document discusses various concurrency control techniques for database management systems, including conflicting operations, conflict serializable schedules, dependency graphs, two-phase locking, timestamp ordering, and multi-version concurrency control. Key points covered include how to ensure serializability and avoid issues like deadlocks, cascading aborts, and lost updates through the use of locking and timestamp-based approaches.
This document discusses various concurrency control techniques for database management systems, including conflicting operations, conflict serializable schedules, dependency graphs, two-phase locking, timestamp ordering, and multi-version concurrency control. Key points covered include how to ensure serializability and avoid issues like deadlocks, cascading aborts, and lost updates through the use of locking and timestamp-based approaches.
Jia Zou Arizona State University Review Conflicting Operations • We need a formal notion of equivalence that can be implemented efficiently… • Base it on the notion of “conflicting” operations • Definition: Two operations conflict if: • They are by different transactions • They are on the same object and at least one of them is a write Conflict Serializable Schedules • Two schedules are conflict equivalent iff: • They involve the same actions of the same transactions, and • Every pair of conflicting actions is ordered the same way. • Schedule S is conflict serializable if: • S is conflict equivalent to some serial schedule. Dependency Graphs • One node per txn. • Edge from Ti to Tj if: • An operation Oi of Ti conflicts with an operation Oj of Tj and • Oi appears earlier in the schedule than Oj • Also known as a “precedence graph” Dependency Graphs • Theorem: A schedule is conflict serializable if and only if its dependency graph is acyclic. Example #1 Example #1 Not conflict serializable! Example #2 – Lost Update Example #2 – Lost Update
Not conflict serializable!
Example #3 – Threesome Example #3 – Threesome • Q: Is this equivalent to a serial execution? • A: Yes (T2, T1, T3) • Notice that T3 should go after T2, although it starts before it!
Conflict serializable! Example #4 – Inconsistent Analysis Example #4 – Inconsistent Analysis Example #4 – Inconsistent Analysis Example #4 – Inconsistent Analysis
Not conflict serializable!
Two-Phase Locking • Executing with Locks Two-Phase Locking • Phase 1: Growing • Each txn requests the locks that it needs from the DBMS’s lock manager. • The lock manager grants/denies lock requests. • Phase 2: Shrinking • The txn is allowed to only release locks that it previously acquired. It cannot acquire new locks. Two-Phase Locking • The txn is not allowed to acquire/upgrade locks after the growing phase finishes. Two-Phase Locking • The txn is not allowed to acquire/upgrade locks after the growing phase finishes. Executing with 2PL Executing with 2PL Two-Phase Locking • 2PL on its own is sufficient to guarantee conflict serializability (i.e., schedules whose precedence graph is acyclic), but, it is subject to cascading aborts 2PL – Cascading Aborts Strict Two-Phase Locking • To fix the dirty-reads problem • Intuitively: Release all locks at the end of a txn • A schedule is strict if a value written by a txn is not read or overwritten by other txns until that txn finishes. • Advantages: • Does not incur cascading aborts. • Aborted txns can be undone by just restoring original values of modified tuples. Examples • T1: Move $50 from A’s account to B’s account. • T2: Compute the total amount in all accounts and return it to the application. Non-2PL Example 2PL Example Strict 2PL Example Today’s Agenda • Deadlocks • Concurrency Control based on Time Ordering Optional! • Multi-version Concurrency Control Deadlocks (Optional) • Deadlock: Cycle of transactions waiting for locks to be released by each other. • Two ways of dealing with deadlocks: • Deadlock detection • Deadlock prevention Deadlock Detection • The DBMS creates a waits-for graph: • Nodes are transactions • Edge from Ti to Tj if Ti is waiting for Tj to release a lock • The system periodically check for cycles in waits-for graph. Deadlock Detection Deadlock Detection Deadlock Detection Solutions to Deadlock • Deadlock handling • Select a “victim” transaction, and rollback it back to break the deadlock. • Deadlock prevention • When a txn tries to acquire a lock that is held by another txn, kill one of them to prevent a deadlock. • No waits-for graph or detection algorithm. Summary • Concurrency Control Summary • Conflict Serializability ↔ Correctness • Automatically correct interleavings: • Locks + protocol (2PL, S2PL ...) • Deadlock detection and prevention Concurrency Control • Two-Phase Locking (2PL) • Determine serializability order of conflicting operations at runtime while txns execute. • Timestamp Ordering (T/O) • Determine serializability order of txns before they execute. Concurrency Control • Two-Phase Locking (2PL) • Determine serializability order of conflicting operations at runtime while txns execute. • Timestamp Ordering (T/O) Optiona l • Determine serializability order of txns before they execute. Timestamp Allocation • Each txn Ti is assigned a unique fixed timestamp that is monotonically increasing. • Let TS(Ti ) be the timestamp allocated to txn Ti • Different schemes assign timestamps at different times during the txn. • Multiple implementation strategies: • System Clock. • Logical Counter. • Hybrid. T/O Concurrency Control • Use these timestamps to determine the serializability order. • If TS(Ti ) < TS(Tj ), then the DBMS must ensure that the execution schedule is equivalent to a serial schedule where Ti appears before Tj . Basic T/O • Txns read and write objects without locks. • Every object X is tagged with timestamp of the last txn that successfully did read/write: • W-TS(X) – Write timestamp on X • R-TS(X) – Read timestamp on X • Check timestamps for every operation: • If txn tries to access an object “from the future”, it aborts and restarts. Basic T/O – Reads • If TS(Ti ) < W-TS(X), this violates timestamp order of Ti w.r.t. writer of X. • Abort Ti and restart it with a newer Ts • Else: • Allow Ti to read X. • Update R-TS(X) to max(R-TS(X), TS(Ti )) • Have to make a local copy of X to ensure repeatable reads for Ti . Basic T/O - Writes • If TS(Ti ) < R-TS(X) or TS(Ti ) < W-TS(X) • Abort and restart Ti with a newer Ts. • Else: • Allow Ti to write X and update W-TS(X) • Also have to make a local copy of X to ensure repeatable reads for Ti . Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #1 Basic T/O – Example #2 Basic T/O – Example #2 Basic T/O – Example #2 Basic T/O – Example #2 Basic T/O • Ensures conflict serializability. • No deadlocks because no txn ever waits. • Possibility of starvation for long txns if short txns keep causing conflicts. Multi-Version Concurrency Control • Writes create new versions of objects (global copies) instead of in- place updates: • Each successful write results in the creation of a new version of the data item written. • Use write timestamps to label versions. • Let Xk denote the version of X created by a given txn Ti : W-TS(Xk) = TS(Ti ) MVCC – Reads • Any read operation sees the latest version of an object right before that txn started. • Every read request can be satisfied without blocking the txn. • If TS(Ti ) > R-TS(Xk): • Set R-TS(Xk) = TS(Ti ) MVCC – Writes • If TS(Ti ) < R-TS(Xk): • Abort and restart Ti . • If TS(Ti ) = W-TS(Xk): • Overwrite the contents of Xk. • Else: • Create a new version of Xk+1 and set its write timestamp to TS(Ti ). MVCC • Can still incur cascading aborts because a txn sees uncommitted versions from txns that started before it did. • Old versions of tuples accumulate. • The DBMS needs a way to remove old versions to reclaim storage space. MVCC Implementations • Store versions directly in main tables: • Postgres, Firebird/Interbase • Store versions in separate temp tables: • MSFT SQL Server • Only store a single master version: • Oracle, MySQL MVCC – Performance Issues • High abort overhead cost. • Garbage collection overhead. • Requires stalls to ensure recoverability. • Secondary index updates. Real Systems