Class 3

5. Concurrency Control for Transactions

CSE 593 Transaction Processing
Philip A. Bernstein
Copyright ©2001 Philip A. Bernstein

Outline

1. A Model for Concurrency Control
2. Serializability Theory
3. Synchronization Requirements for Recoverability
4. Two-Phase Locking
5. Implementing Two-Phase Locking
6. Locking Performance
7. Hot Spot Techniques
8. Query-Update Techniques
9. Phantoms
10. B-Trees
11. Tree Locking

1/14/01

5.1 A Model for Concurrency Control

The Problem

• Goal - Ensure serializable (SR) executions
• Implementation technique - Delay operations that would lead to non-SR results (e.g., set locks on shared data)
• For good performance, minimize overhead and delay from synchronization operations
• First, we'll study how to get correct (SR) results
• Then, we'll study performance implications

System Model

• Transactions 1 … N issue Start, SQL Ops, Commit, and Abort to the database system
• The database system is layered as: Query Optimizer → Query Executor → Access Method (record-oriented files) → Page-oriented Files → Database

How to Implement SQL

• Query Optimizer - translates SQL into an ordered expression of relational DB operators (Select, Project, Join)
• Query Executor - executes the ordered expression by running a program for each operator, which in turn accesses records of files
• Access Methods - provide indexed record-at-a-time access to files (OpenScan, GetNext, …)
• Page-oriented Files - Read or Write (page address)

Which Operations Get Synchronized?

• Each layer works at a different granularity:
  – Query Optimizer / Query Executor - SQL operations
  – Access Method - record-oriented operations
  – Page-oriented Files - page-oriented operations
• It's a tradeoff between
  – the amount of concurrency and
  – the overhead and complexity of synchronization
• For now, assume page operations and use the neutral term data manager
  – notation: ri[x], wi[x], where "x" is a page
Assumption - Atomic Operations

• We will synchronize Reads and Writes.
• We must therefore assume they're atomic
  – else we'd have to synchronize the finer-grained operations that implement Read and Write
• Read(x) - returns the current value of x in the DB
• Write(x, val) - overwrites all of x (the whole page)
• This assumption of atomic operations is what allows us to abstract executions as sequences of reads and writes (without loss of information).
  – Otherwise, what would wk[x] ri[x] mean?

Assumption - Txns Communicate Only via Read and Write

• Read and Write are the only operations the system will control to attain serializability.
• So, if transactions communicate via messages, then implement SendMsg as Write and ReceiveMsg as Read.
• Else, you could have the following: w1[x] r2[x] send2[M] receive1[M]
  – the data manager didn't know about send/receive and thought the execution was SR.
• Also watch out for brain transport.

Transactions Can Communicate via Brain Transport

T1: Start
    …
    Display output
    Commit
        (brain transport: user reads output, user enters input)
T2: Start
    Get input from display
    …
    Commit

Brain Transport (cont'd)

• For practical purposes, if the user waits for T1 to commit before starting T2, then the data manager can ignore brain transport.
• This is called a transaction handshake (T1 commits before T2 starts).
• Reason - Locking preserves the order imposed by transaction handshakes
  – e.g., it serializes T1 before T2.
• Stating this precisely and proving it is non-trivial.
• … more later …

5.2 Serializability Theory

• The theory is based on modeling executions as histories, such as
  H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
• First, characterize a concurrency control algorithm by the properties of the histories it allows.
• Then prove that any history having these properties is SR.
• Why bother? It helps you understand why concurrency control algorithms work.

Equivalence of Histories

• Two operations conflict if their execution order affects their return values or the DB state.
  – a read and a write on the same data item conflict
  – two writes on the same data item conflict
  – two reads (on the same data item) do not conflict
• Two histories are equivalent if they have the same operations and their conflicting operations are in the same order in both histories
  – because only the relative order of conflicting operations can affect the result of the histories
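The conflict-based equivalence test is mechanical enough to sketch in code. Below is a minimal Python sketch (a hypothetical representation: a history is a list of (op, txn, item) tuples, with commit operations omitted for brevity):

```python
# A history as a list of (op, txn, item) tuples, e.g. ("r", 1, "x") for
# r1[x]; commit operations are omitted for brevity.

def conflicts(a, b):
    """Same item, different txns, and at least one write."""
    return a[2] == b[2] and a[1] != b[1] and "w" in (a[0], b[0])

def conflict_order(h):
    """The ordered pairs of conflicting operations in h."""
    return {(h[i], h[j])
            for i in range(len(h)) for j in range(i + 1, len(h))
            if conflicts(h[i], h[j])}

def equivalent(h1, h2):
    """Same operations, and conflicting operations in the same order."""
    return sorted(h1) == sorted(h2) and conflict_order(h1) == conflict_order(h2)

# H1, H4, H5 from the slides (lock/commit ops dropped).
H1 = [("r", 1, "x"), ("r", 2, "x"), ("w", 1, "x"), ("w", 2, "y")]
H4 = [("r", 2, "x"), ("w", 2, "y"), ("r", 1, "x"), ("w", 1, "x")]
H5 = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "y")]
```

Running `equivalent` on the slides' examples reproduces the results shown there: H1 is equivalent to H4, but not to H5, because r2[x] and w1[x] appear in opposite orders.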
Examples of Equivalence

• The following histories are equivalent:
  H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
  H2 = r2[x] r1[x] w1[x] c1 w2[y] c2
  H3 = r2[x] r1[x] w2[y] c2 w1[x] c1
  H4 = r2[x] w2[y] c2 r1[x] w1[x] c1
• But none of them is equivalent to
  H5 = r1[x] w1[x] r2[x] c1 w2[y] c2
  because r2[x] and w1[x] conflict, and r2[x] precedes w1[x] in H1-H4, but w1[x] precedes r2[x] in H5.

Serializable Histories

• A history is serializable if it is equivalent to a serial history.
• For example,
  H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
  is equivalent to
  H4 = r2[x] w2[y] c2 r1[x] w1[x] c1
  (r2[x] and w1[x] are in the same order in H1 and H4.)
• Therefore, H1 is serializable.

Another Example

• H6 = r1[x] r2[x] w1[x] r3[x] w2[y] w3[x] c3 w1[y] c1 c2
  is equivalent to a serial execution of T2 T1 T3:
  H7 = r2[x] w2[y] c2 r1[x] w1[x] w1[y] c1 r3[x] w3[x] c3
• Each conflict implies a constraint on any equivalent serial history: the conflicts in H6 give T2→T1, T2→T3, and T1→T3.

Serialization Graphs

• A serialization graph, SG(H), for history H tells the effective execution order of the transactions in H.
• Given history H, SG(H) is a directed graph whose nodes are the committed transactions and whose edges are all Ti → Tk such that at least one of Ti's operations precedes and conflicts with at least one of Tk's operations.
• Example: SG(H6) = T2 → T1 → T3

The Serializability Theorem

A history H is SR if and only if SG(H) is acyclic.

Proof: (if) SG(H) is acyclic. So let Hs be a serial history consistent with SG(H). Each pair of conflicting ops in H induces an edge in SG(H). Since conflicting ops in Hs and H are in the same order, Hs ≡ H, so H is SR.
(only if) H is SR. Let Hs be a serial history equivalent to H. Claim that if Ti → Tk in SG(H), then Ti precedes Tk in Hs (else Hs ≢ H). So if SG(H) had a cycle T1→T2→…→Tn→T1, then T1 would precede T1 in Hs, a contradiction. So SG(H) is acyclic.

How to Use the Serializability Theorem

• Characterize the set of histories that a concurrency control algorithm allows.
• Prove that any such history must have an acyclic serialization graph.
• Therefore, the algorithm guarantees SR executions.
• We'll use this soon to prove that locking produces serializable executions.
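The "how to use" recipe can also be run mechanically on a single history: build SG(H) from the conflicting pairs and test it for cycles with a depth-first search. A minimal Python sketch (histories as lists of (op, txn, item) tuples, commits omitted; illustrative code, not any real system's):

```python
# Build SG(H) and test acyclicity. Assume all transactions shown committed.

def serialization_graph(h):
    """Edges Ti -> Tk for each ordered pair of conflicting operations."""
    edges = set()
    for i in range(len(h)):
        for j in range(i + 1, len(h)):
            a, b = h[i], h[j]
            if a[2] == b[2] and a[1] != b[1] and "w" in (a[0], b[0]):
                edges.add((a[1], b[1]))
    return edges

def is_acyclic(edges):
    nodes = {n for e in edges for n in e}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    color = {n: 0 for n in nodes}   # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(n):
        color[n] = 1
        for v in adj[n]:
            if color[v] == 1 or (color[v] == 0 and not dfs(v)):
                return False        # found a back edge, i.e., a cycle
        color[n] = 2
        return True
    return all(dfs(n) for n in nodes if color[n] == 0)

def serializable(h):
    return is_acyclic(serialization_graph(h))

# H6 from the slides: SG(H6) = T2 -> T1 -> T3, which is acyclic.
H6 = [("r", 1, "x"), ("r", 2, "x"), ("w", 1, "x"), ("r", 3, "x"),
      ("w", 2, "y"), ("w", 3, "x"), ("w", 1, "y")]
# A non-SR interleaving: its SG has both T1 -> T2 and T2 -> T1.
Hbad = [("r", 1, "x"), ("w", 2, "x"), ("w", 2, "y"), ("r", 1, "y")]
```

On H6 this produces the edges T2→T1, T2→T3, and T1→T3, so H6 is SR; Hbad's two-node cycle makes it non-serializable.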
5.3 Synchronization Requirements for Recoverability

• In addition to guaranteeing serializability, synchronization is needed to implement abort easily.
• When a transaction T aborts, the data manager wipes out all of T's effects, including
  – undoing T's writes that were applied to the DB, and
  – aborting transactions that read values written by T (these are called cascading aborts)
• Example - w1[x] r2[x] w2[y]
  – to abort T1, we must undo w1[x] and abort T2 (a cascading abort)

Recoverability

• If Tk reads from Ti and Ti aborts, then Tk must abort
  – Example - w1[x] r2[x] a1 implies T2 must abort
• But what if Tk already committed? We'd be stuck.
  – Example - w1[x] r2[x] c2 a1
  – T2 can't abort after it commits
• Executions must be recoverable: a transaction T's commit operation must follow the commit of every transaction from which T read.
  – Recoverable - w1[x] r2[x] c1 c2
  – Not recoverable - w1[x] r2[x] c2 a1
• Recoverability requires synchronizing operations.

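The recoverability condition can be checked mechanically from the reads-from relation and the commit order. A minimal Python sketch (hypothetical event encoding; for brevity, reads-from ignores intervening writes, which suffices for these examples):

```python
# Events: ("w", txn, item), ("r", txn, item), ("c", txn), ("a", txn).

def recoverable(h):
    last_writer = {}     # item -> txn that last wrote it
    reads_from = set()   # (reader, writer) pairs
    committed = set()
    for ev in h:
        if ev[0] == "w":
            last_writer[ev[2]] = ev[1]
        elif ev[0] == "r":
            w = last_writer.get(ev[2])
            if w is not None and w != ev[1]:
                reads_from.add((ev[1], w))
        elif ev[0] == "c":
            # T may commit only after every txn it read from has committed
            if any(w not in committed for (r, w) in reads_from if r == ev[1]):
                return False
            committed.add(ev[1])
    return True

ok  = [("w", 1, "x"), ("r", 2, "x"), ("c", 1), ("c", 2)]  # w1[x] r2[x] c1 c2
bad = [("w", 1, "x"), ("r", 2, "x"), ("c", 2), ("a", 1)]  # w1[x] r2[x] c2 a1
```

This reproduces the slide's verdicts: `ok` is recoverable, `bad` is not, because T2 commits before T1, the transaction it read from.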
Avoiding Cascading Aborts

• Cascading aborts are worth avoiding to
  – avoid complex bookkeeping, and
  – avoid an uncontrolled number of forced aborts
• To avoid cascading aborts, a data manager should ensure transactions only read committed data
• Example
  – avoids cascading aborts: w1[x] c1 r2[x]
  – allows cascading aborts: w1[x] r2[x] a1
• A system that avoids cascading aborts also guarantees recoverability.

Strictness

• It's convenient to undo a write, w[x], by restoring its before image (= the value of x before w[x] executed)
• Example - w1[x,1] writes the value "1" into x.
  – w1[x,1] w1[y,3] c1 w2[y,1] r2[x] a2
  – abort T2 by restoring the before image of w2[y,1], = 3
• But this isn't always possible.
  – For example, consider w1[x,2] w2[x,3] a1 a2
  – a1 & a2 can't be implemented by restoring before images
  – notice that w1[x,2] w2[x,3] a2 a1 would be OK
• A system is strict if it only reads or overwrites committed data.
Strictness (cont'd)

• More precisely, a system is strict if it only executes ri[x] or wi[x] if all previous transactions that wrote x committed or aborted.
• Examples ("…" marks a non-strict prefix)
  – strict: w1[x] c1 w2[x] a2
  – not strict: w1[x] w2[x] … a1 a2
  – strict: w1[x] w1[y] c1 w2[y] r2[x] a2
  – not strict: w1[x] w1[y] w2[y] a1 r2[x] a2
• "Strict" implies "avoids cascading aborts."

5.4 Two-Phase Locking

• Basic locking - Each transaction sets a lock on each data item before accessing the data
  – the lock is a reservation
  – there are read locks and write locks
  – if one transaction has a write lock on x, then no other transaction can have any lock on x
• Example
  – rli[x], rui[x], wli[x], wui[x] denote lock/unlock operations
  – wl1[x] w1[x] rl2[x] r2[x] is impossible
  – wl1[x] w1[x] wu1[x] rl2[x] r2[x] is OK
Basic Locking Isn't Enough

• Basic locking doesn't guarantee serializability:
  rl1[x] r1[x] ru1[x] wl1[y] w1[y] wu1[y] c1
  rl2[y] r2[y] wl2[x] w2[x] ru2[y] wu2[x] c2
• Eliminating the lock operations, we have r1[x] r2[y] w2[x] c2 w1[y] c1, which isn't SR.
• The problem is that locks aren't being released properly.

Two-Phase Locking (2PL) Protocol

• A transaction is two-phase locked if:
  – before reading x, it sets a read lock on x
  – before writing x, it sets a write lock on x
  – it holds each lock until after it executes the corresponding operation
  – after its first unlock operation, it requests no new locks
• Each transaction sets locks during a growing phase and releases them during a shrinking phase.
• Example - in the history above, T2 is two-phase locked, but T1 is not, since ru1[x] < wl1[y]
  – use "<" for "precedes"

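The two-phase rule itself is easy to check on a locked history. A small Python sketch (hypothetical string encoding of lock/unlock events like those above; plain reads, writes, and commits are ignored):

```python
import re

def two_phase(h, txn):
    """True iff txn requests no lock after its first unlock in history h.
    Lock/unlock events look like "rl1[x]", "wu1[y]"; other events are
    skipped."""
    unlocked = False
    for ev in h:
        m = re.match(r"([rw])([lu])(\d+)\[", ev)
        if not m or int(m.group(3)) != txn:
            continue
        if m.group(2) == "u":
            unlocked = True        # shrinking phase has begun
        elif unlocked:
            return False           # a lock request after an unlock
    return True

# The example from the slides: T2 is two-phase locked, T1 is not,
# since ru1[x] precedes wl1[y].
H = ["rl1[x]", "r1[x]", "ru1[x]", "wl1[y]", "w1[y]", "wu1[y]", "c1",
     "rl2[y]", "r2[y]", "wl2[x]", "w2[x]", "ru2[y]", "wu2[x]", "c2"]
```

Checking H confirms the slide's claim: T1 violates the rule, T2 obeys it.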
2PL Theorem: If all transactions in an execution are two-phase locked, then the execution is SR.

Proof: Define Ti ⇒ Tk if either
  – Ti read x and Tk later wrote x, or
  – Ti wrote x and Tk later read or wrote x
• If Ti ⇒ Tk, then Ti released a lock before Tk obtained some lock.
• If Ti ⇒ Tk ⇒ Tm, then Ti released a lock before Tm obtained some lock (because Tk is two-phase).
• If Ti ⇒ … ⇒ Ti, then Ti released a lock before Ti obtained some lock, breaking the two-phase rule.
• So there cannot be a cycle. By the Serializability Theorem, the execution is SR.

2PL and Recoverability

• 2PL does not guarantee recoverability.
• This non-recoverable execution is two-phase locked:
  wl1[x] w1[x] wu1[x] rl2[x] r2[x] c2 … c1
  – hence, it is not strict and allows cascading aborts
• However, holding write locks until after commit or abort guarantees strictness
  – and hence avoids cascading aborts and is recoverable
  – In the above example, T1 must commit before its first write-unlock (wu1): wl1[x] w1[x] c1 wu1[x] rl2[x] r2[x] c2
Automating Locking

• 2PL can be hidden from the application.
• When a data manager gets a Read or Write operation from a transaction, it sets a read or write lock.
• How does the data manager know it's safe to release locks (and be two-phase)?
• Ordinarily, the data manager holds a transaction's locks until it commits or aborts. A data manager
  – can release read locks after it receives commit
  – releases write locks only after processing commit, to ensure strictness

2PL Preserves Transaction Handshakes

• Recall the definition: Ti commits before Tk starts.
• 2PL serializes txns consistent with all transaction handshakes. I.e., there's an equivalent serial execution that preserves the transaction order of transaction handshakes.
• This isn't true for arbitrary SR executions. E.g.,
  – r1[x] w2[x] c2 r3[y] c3 w1[y] c1
  – T2 commits before T3 starts, but the only equivalent serial execution is T3 T1 T2
  – rl1[x] r1[x] wl1[y] ru1[x] wl2[x] w2[x] wu2[x] c2 (stuck, can't set rl3[y]) r3[y] … so not 2PL
2PL Preserves Transaction Handshakes (cont'd)

• Stating this more formally …
• Theorem: For any 2PL execution H, there is an equivalent serial execution Hs, such that for all Ti, Tk: if Ti committed before Tk started in H, then Ti precedes Tk in Hs.

Brain Transport - One Last Time

• If a user reads committed displayed output of Ti and uses that displayed output as input to transaction Tk, then he/she should wait for Ti to commit before starting Tk.
• The user can then rely on transaction handshake preservation to ensure Ti is serialized before Tk.
5.5 Implementing Two-Phase Locking

• Even if you never implement a DB system, it's valuable to understand locking implementation, because it can have a big effect on performance.
• A data manager implements locking by
  – implementing a lock manager
  – setting a lock for each Read and Write
  – handling deadlocks

Lock Manager

• A lock manager services the operations
  – Lock(trans-id, data-item-id, mode)
  – Unlock(trans-id, data-item-id)
  – Unlock(trans-id)
• It stores locks in a lock table. A Lock op inserts [trans-id, mode] in the table; Unlock deletes it.

  Data Item   List of Locks    Wait List
  x           [T1,r] [T2,r]    [T3,w]
  y           [T4,w]           [T5,w] [T6,r]

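The lock table above can be sketched as a small in-memory structure. This is an illustrative Python sketch, not any real lock manager: read locks share, a write lock excludes everything, and blocked requests queue FIFO on a wait list.

```python
from collections import defaultdict, deque

class LockManager:
    def __init__(self):
        self.granted = defaultdict(list)   # item -> [(txn, mode)]
        self.waiting = defaultdict(deque)  # item -> queue of (txn, mode)

    def _compatible(self, item, mode):
        # a request is compatible only if it and every granted lock are reads
        return all(mode == "r" and m == "r" for (_, m) in self.granted[item])

    def lock(self, txn, item, mode):
        """Return True if granted, False if the caller must wait."""
        if not self.granted[item] or self._compatible(item, mode):
            self.granted[item].append((txn, mode))
            return True
        self.waiting[item].append((txn, mode))
        return False

    def unlock(self, txn, item):
        self.granted[item] = [g for g in self.granted[item] if g[0] != txn]
        # grant queued requests that are now compatible, in FIFO order
        while self.waiting[item]:
            t, m = self.waiting[item][0]
            if self.granted[item] and not self._compatible(item, m):
                break
            self.granted[item].append((t, m))
            self.waiting[item].popleft()

lm = LockManager()
```

For example, two read locks on x share; a later write request queues until both readers unlock. A production lock table would hash on data-item-id and protect itself with a latch, as the next slide describes.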
Lock Manager (cont'd)

• The caller generates data-item-id, e.g., by hashing the data item name.
• The lock table is hashed on data-item-id.
• Lock and Unlock must be atomic, so access to the lock table must be "locked".
• Lock and Unlock are called frequently. They must be very fast: average < 100 instructions.
  – This is hard, in part due to the slow compare-and-swap operations needed for atomic access to the lock table.

Lock Manager (cont'd)

• In MS SQL Server
  – Locks are approx. 32 bytes each.
  – Each lock contains a Database-ID, Object-Id, and other resource-specific lock information such as record id (RID) or key.
  – Each lock is attached to a lock resource block (64 bytes) and a lock owner block (32 bytes).
Deadlocks

• A set of transactions is deadlocked if every transaction in the set is blocked and will remain blocked unless the system intervenes.
  – Example: rl1[x] granted; rl2[y] granted; wl2[x] blocked; wl1[y] blocked and deadlocked
• Deadlock is 2PL's way to avoid non-SR executions
  – rl1[x] r1[x] rl2[y] r2[y] … can't run w2[x] w1[y] and be SR
• To repair a deadlock, you must abort a transaction
  – if you released a transaction's lock without aborting it, you'd break 2PL

Deadlock Prevention

• Never grant a lock that can lead to deadlock.
• Often advocated in operating systems.
• Useless for TP, because it would require running transactions serially.
  – Example: to prevent the previous deadlock, rl1[x] rl2[y] wl2[x] wl1[y], the system can't grant rl2[y]
• Avoiding deadlock by resource ordering is unusable in general, since it overly constrains applications.
  – But it may help for certain high-frequency deadlocks.
• Setting all locks when a txn begins requires too much advance knowledge and reduces concurrency.
Deadlock Detection

• Detection approach - Detect deadlocks automatically, and abort a deadlocked transaction (the victim).
• It's the preferred approach, because it
  – allows higher resource utilization and
  – uses cheaper algorithms
• Timeout-based deadlock detection - If a transaction is blocked for too long, then abort it.
  – Simple and easy to implement
  – But it aborts unnecessarily, and
  – some deadlocks persist for too long

Detection Using Waits-For Graph

• Explicit deadlock detection - Use a waits-for graph
  – Nodes = {transactions}
  – Edges = {Ti → Tk | Ti is waiting for Tk to release a lock}
  – Example (the previous deadlock): T1 → T2 and T2 → T1
• Theorem: If there's a deadlock, then the waits-for graph has a cycle.
Detection Using Waits-For Graph (cont'd)

• So, to find deadlocks
  – when a transaction blocks, add an edge to the graph
  – periodically check for cycles in the waits-for graph
• Don't test for deadlocks too often. (A cycle won't disappear until you detect it and break it.)
• When a deadlock is detected, select a victim from the cycle and abort it.
• Select a victim that hasn't done much work (e.g., has set the fewest locks).

Cyclic Restart

• Transactions can cause each other to abort forever.
  – T1 starts running. Then T2 starts running.
  – They deadlock and T1 (the oldest) is aborted.
  – T1 restarts, bumps into T2 and again deadlocks.
  – T2 (the oldest) is aborted …
• Choosing the youngest in a cycle as victim avoids cyclic restart, since the oldest transaction is never the victim.
• Can combine with other heuristics, e.g., fewest locks.
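The detect-and-pick-a-victim loop can be sketched as follows (illustrative Python, not any real system's; edges are (waiter, holder) pairs, and for brevity the cycle search follows only one outgoing wait edge per transaction):

```python
def find_cycle(edges):
    """Return the set of txns on some waits-for cycle, or an empty set."""
    adj = {}
    for waiter, holder in edges:
        adj.setdefault(waiter, []).append(holder)
    def walk(start):
        path, seen = [], set()
        n = start
        while n in adj and n not in seen:
            seen.add(n)
            path.append(n)
            n = adj[n][0]            # follow one outgoing wait edge
        if n in path:                # we came back around: a cycle
            return set(path[path.index(n):]) | {n}
        return set()
    for n in list(adj):
        cycle = walk(n)
        if cycle:
            return cycle
    return set()

def choose_victim(cycle, age):
    """Abort the youngest transaction to avoid cyclic restart.
    age maps txn -> start timestamp (smaller = older)."""
    return max(cycle, key=lambda t: age[t])

# The deadlock from the slides: T1 waits for T2 (on y), T2 waits for T1 (on x).
edges = [("T2", "T1"), ("T1", "T2")]
age = {"T1": 1, "T2": 2}             # T1 started first, so T1 is older
```

Here the detector finds the cycle {T1, T2} and, under the youngest-victim rule, aborts T2, so the older T1 can finish.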
MS SQL Server

• Aborts the transaction that is "cheapest" to roll back.
  – "Cheapest" is determined by the amount of log generated.
  – This allows transactions that you've invested a lot in to complete.
• SET DEADLOCK_PRIORITY LOW (vs. NORMAL) causes a transaction to sacrifice itself as a victim.

Distributed Locking

• Suppose a transaction can access data at many data managers.
• Each data manager sets locks in the usual way.
• When a transaction commits or aborts, it runs two-phase commit to notify all data managers it accessed.
• The only remaining issue is distributed deadlock.
Distributed Deadlock

• The deadlock spans two nodes. Neither node alone can see it.
  – Node 1: rl1[x]; wl2[x] (blocked)
  – Node 2: rl2[y]; wl1[y] (blocked)
• Timeout-based detection is popular. Its weaknesses are less important in the distributed case:
  – it aborts unnecessarily and some deadlocks persist too long
  – possibly abort a younger unblocked transaction to avoid cyclic restart

Oracle Deadlock Handling

• Uses a waits-for graph for single-server deadlock detection.
• The transaction that detects the deadlock is the victim.
• Uses timeouts to detect distributed deadlocks.
Fancier Distributed Deadlock Detection

• Use waits-for graph cycle detection with a central deadlock detection server
  – more work than timeout-based detection, and no evidence it does better, performance-wise
  – phantom deadlocks? No, because each waits-for edge is an SG edge. So, WFG cycle ⇒ SG cycle (modulo spontaneous aborts).
• Path pushing - Send paths Ti → … → Tk to each node where Tk might be blocked.
  – Detects short cycles quickly
  – Hard to know where to send paths. Possibly too many messages.

Locking Granularity

• Granularity - the size of the data items to lock
  – e.g., files, pages, records, fields
• Coarse granularity implies
  – very few locks, so little locking overhead
  – must lock large chunks of data, so a high chance of conflict, so concurrency may be low
• Fine granularity implies
  – many locks, so high locking overhead
  – a locking conflict occurs only when two transactions try to access the exact same data concurrently
• High-performance TP requires record locking.
Multigranularity Locking (MGL)

• Allow different txns to lock at different granularities
  – big queries should lock coarse-grained data (e.g., tables)
  – short transactions should lock fine-grained data (e.g., rows)
• The lock manager can't detect these conflicts
  – each data item (e.g., table or row) has a different id
• Multigranularity locking "trick"
  – exploit the natural hierarchy of data containment
  – before locking fine-grained data, set intention locks on the coarse-grained data that contains it
  – e.g., before setting a read lock on a row, get an intention-read lock on the table that contains the row

MGL Type and Instance Graphs

• Lock type graph: Database → Area → File → Record
• Lock instance graph (example): DB1 contains areas A1 and A2; A1 contains file F1 and A2 contains files F2 and F3; F1 contains records R1.1 and R1.2, F2 contains records R2.1, R2.2, and R2.3, and so on.
• Before setting a read lock on R2.3, first set an intention-read lock on DB1, then A2, and then F2.
• Set locks root-to-leaf. Release locks leaf-to-root.
MGL Compatibility Matrix

        r   w   ir  iw  riw
  r     y   n   y   n   n
  w     n   n   n   n   n
  ir    y   n   y   y   y
  iw    n   n   y   y   n
  riw   n   n   y   n   n

• riw = read with intent to write, for a scan that updates some of the records it reads
• E.g., ir conflicts with w because ir says there's a fine-grained r-lock that conflicts with a w-lock on the container
• To r-lock an item, you need an r-, ir-, or riw-lock on its parent
• To w-lock an item, you need a w-, iw-, or riw-lock on its parent

MGL Complexities

• Relational DBMSs use MGL to lock SQL queries, short updates, and scans with updates.
• Use lock escalation - start locking at fine grain and escalate to coarse grain after the nth lock is set.
• The lock type graph is a directed acyclic graph, not a tree, to cope with indices: an Area contains Indices and Files, an Index contains Index Entries, and a Record is reachable both via its File and via Index Entries.
• R-lock one path to an item. W-lock all paths to it.

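The matrix and the parent-lock rules combine into a simple root-to-leaf locking routine. An illustrative Python sketch (not any DBMS's implementation; a real lock manager would also undo partial grants on failure):

```python
# Compatibility matrix from the slides: COMPAT[held][requested] -> bool.
COMPAT = {
    "r":   {"r": True,  "w": False, "ir": True,  "iw": False, "riw": False},
    "w":   {"r": False, "w": False, "ir": False, "iw": False, "riw": False},
    "ir":  {"r": True,  "w": False, "ir": True,  "iw": True,  "riw": True},
    "iw":  {"r": False, "w": False, "ir": True,  "iw": True,  "riw": False},
    "riw": {"r": False, "w": False, "ir": True,  "iw": False, "riw": False},
}

def mgl_lock(table, txn, path, mode):
    """Set intention locks root-to-leaf on the ancestors in `path`,
    then `mode` on the item itself (the last element of `path`).
    `table` maps node -> list of (txn, mode). True if all granted."""
    intent = "ir" if mode == "r" else "iw"
    plan = [(node, intent) for node in path[:-1]] + [(path[-1], mode)]
    for node, m in plan:
        held = [hm for (t, hm) in table.get(node, []) if t != txn]
        if any(not COMPAT[hm][m] for hm in held):
            return False           # conflict on an ancestor or the item
        table.setdefault(node, []).append((txn, m))
    return True
```

For example, a row read takes ir on DB1, A2, and F2 before r on R2.3; a concurrent row write in the same file succeeds (iw and ir are compatible), but a whole-file write lock is refused because of the fine-grained locks it would cover.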
MS SQL Server

• MS SQL Server can lock at table, page, and row level.
• Uses intention read ("share") and intention write ("exclusive") locks at the table and page level.
• Tries to avoid escalation by choosing the "appropriate" granularity when the scan is instantiated.
• Its lock type graph: Table at the top, with Index Range and Extent below it, and Page below those.

Outline

✓ 1. A Model for Concurrency Control
✓ 2. Serializability Theory
✓ 3. Synchronization Requirements for Recoverability
✓ 4. Two-Phase Locking
✓ 5. Implementing Two-Phase Locking
6. Locking Performance
7. Hot Spot Techniques
8. Query-Update Techniques
9. Phantoms
10. B-Trees
11. Tree Locking
5.6 Locking Performance

• Deadlocks are rare
  – up to 1%-2% of transactions deadlock
• The one exception to this is lock conversions
  – r-lock a record and later upgrade to a w-lock
  – e.g., Ti = read(x) … write(x)
  – if two txns do this concurrently, they'll deadlock (both get an r-lock on x before either gets a w-lock)
  – To avoid lock conversion deadlocks, get a w-lock first and downgrade to an r-lock if you don't need to write.
  – Use the SQL Update statement or explicit program hints.

Conversions in MS SQL Server

• An update lock prevents lock conversion deadlock.
  – Conflicts with other update and write locks, but not with read locks.
  – Only on pages and rows (not tables).
• You get an update lock by using the UPDLOCK hint in the FROM clause:

  Select Foo.A
  From Foo (UPDLOCK)
  Where Foo.B = 7
Blocking and Lock Thrashing

• The locking performance problem is too much delay due to blocking
  – little delay until locks are saturated
  – then major delay, due to the locking bottleneck
  – thrashing - the point where throughput decreases with increasing load
• (Figure: Throughput vs. number of active txns - throughput rises as load goes from low to high, peaks at the thrashing point, then falls.)

More on Thrashing

• It's purely a blocking problem
  – it happens even when the abort rate is low
• As the number of transactions increases
  – each additional transaction is more likely to block
  – but first, it gathers some locks, increasing the probability that others will block (negative feedback)
Avoiding Thrashing

• If over 30% of active transactions are blocked, then the system is (nearly) thrashing, so reduce the number of active transactions.
• Timeout-based deadlock detection mistakes
  – They happen due to long lock delays
  – So the system is probably close to thrashing
  – So if the deadlock detection rate is too high (over 2%), reduce the number of active transactions.

Interesting Sidelights

• By getting all locks before transaction Start, you can increase throughput at the thrashing point, because blocked transactions hold no locks
  – But it assumes you get exactly the locks you need and retries of get-all-locks are cheap
• Pure restart policy - abort when there's a conflict and restart when the conflict disappears
  – If aborts are cheap and there's low contention for other resources, then this policy produces higher throughput before thrashing than a blocking policy
  – But response time is greater than with a blocking policy
How to Reduce Lock Contention

• If each transaction holds a lock L for t seconds, then the maximum throughput is 1/t txns/second
  (Timeline: Start … Lock L … Commit; t is the time the lock is held.)
• To increase throughput, reduce t (the lock holding time)
  – Set the lock later in the transaction's execution (e.g., defer updates till commit time)
  – Reduce transaction execution time (reduce path length, read from disk before setting locks)
  – Split a transaction into smaller transactions

Reducing Lock Contention (cont'd)

• Reduce the number of conflicts
  – Use finer-grained locks, e.g., by partitioning tables vertically
  – Example: split [Part#, Price, OnHand, PartName, CatalogPage] into [Part#, Price, OnHand] and [Part#, PartName, CatalogPage]
Mathematical Model of Locking

• Parameters: K locks per transaction, N transactions, D lockable data items, T time between lock requests
• N transactions each own K/2 locks on average - KN/2 in total
• Each lock request has probability KN/2D of conflicting with an existing lock.
• Each transaction requests K locks, so its probability of experiencing a conflict is K²N/2D.
• The probability of a deadlock is proportional to K⁴N/D²
  – Prob(deadlock) / Prob(conflict) = K²/D
  – if K = 10 and D = 10⁶, then K²/D = 0.0001

5.7 Hot Spot Techniques

• If each txn holds a lock for t seconds, then the max throughput is 1/t txns/second for that lock.
• Hot spot - a data item that's more popular than others, so a large fraction of active txns need it
  – Summary information (total inventory)
  – End-of-file marker in a data entry application
  – Counter used for assigning serial numbers
• Hot spots often create a convoy of transactions. The hot spot lock serializes transactions.

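The model's formulas are simple enough to compute directly. A Python sketch of the two quantities used on the slide:

```python
# The slides' back-of-envelope model: K locks per transaction,
# N transactions, D lockable data items.

def prob_conflict(K, N, D):
    """A txn requests K locks, each conflicting with probability KN/2D,
    so its probability of experiencing a conflict is about K^2 N / 2D."""
    return K * K * N / (2 * D)

def deadlock_vs_conflict_ratio(K, D):
    """Prob(deadlock) / Prob(conflict) = K^2 / D."""
    return K * K / D
```

Plugging in the slide's numbers, K = 10 and D = 10⁶ give a deadlock-to-conflict ratio of 0.0001, which is why deadlocks are so much rarer than conflicts.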
Hot Spot Techniques (cont'd)

• Special techniques are needed to reduce t
  – Keep the hot data in main memory
  – Delay operations on hot data till commit time
  – Use optimistic methods
  – Batch up operations to hot spot data
  – Partition hot spot data

Delaying Operations Until Commit

• The data manager logs each transaction's updates.
• It only applies the updates (and sets locks) after receiving Commit from the transaction.
• IMS Fast Path uses this for
  – Data Entry DB
  – Main Storage DB
• Works for write, insert, and delete, but not read.
Locking Higher-Level Operations

• A read is often part of a read-write pair, such as Increment(x, n), which adds the constant n to x but doesn't return a value.
• Increment (and Decrement) commute.
• So, introduce increment and decrement locks:

        r   w   inc  dec
  r     y   n   n    n
  w     n   n   n    n
  inc   n   n   y    y
  dec   n   n   y    y

• But if Inc and Dec have a threshold (e.g., a quantity of zero), then they conflict (when the threshold is near).

Solving the Threshold Problem - Another IMS Fast Path Technique

• Use a blind Decrement (no threshold check) and Verify(x, n), which returns true if x ≥ n:

  bEnough = Verify(iQuantity, n);
  If (bEnough) Decrement(iQuantity, n)
  else print ("not enough");

• Re-execute Verify at commit time
  – If it returns a different value than it did during normal execution, then abort
  – It's like checking that the threshold lock you didn't set during Decrement is still valid.

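A single-threaded Python sketch of the blind-Decrement-plus-Verify idea (hypothetical names; in a real system the commit-time re-check is what catches concurrent blind decrements by other transactions):

```python
class HotCounter:
    """Hot-spot counter with a blind decrement and a threshold Verify."""
    def __init__(self, value):
        self.value = value

    def verify(self, n):
        return self.value >= n      # "is there enough?"

    def decrement(self, n):
        self.value -= n             # blind: no threshold check here

def sell(counter, n):
    """One transaction: Verify, blind Decrement, re-check at commit."""
    if not counter.verify(n):
        return "abort"              # not enough at execution time
    counter.decrement(n)
    # ... other work; at commit time, re-check the threshold. Our own
    # decrement is already applied, so "still enough" means value >= 0;
    # concurrent blind decrements could have driven it below zero.
    if counter.value < 0:
        counter.decrement(-n)       # undo our decrement and abort
        return "abort"
    return "commit"

stock = HotCounter(5)
```

Selling 3 of 5 units commits; a second attempt to sell 3 fails the execution-time Verify and aborts, leaving the counter unchanged.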
Optimistic Concurrency Control

• The Verify trick is optimistic concurrency control.
• Main idea - execute operations on shared data without setting locks. At commit time, test whether there were conflicts on the locks (that you didn't set).
• Often used in client/server systems
  – The client does all updates in its cache without shared locks
  – At commit time, it tries to get locks and perform the updates

Batching

• Transactions add updates to a mini-batch and only periodically apply the mini-batch to shared data.
  – Each process has a private data entry file, in addition to a global shared data entry file
  – Each transaction appends to its process' file
  – Periodically append the process file to the shared file
• Tricky failure handling
  – Gathering up private files
  – Avoiding holes in serial number order
Partitioning

• Split up the inventory into partitions.
• Each transaction only accesses one partition.
• Example
  – Each ticket agency has a subset of the tickets
  – If one agency sells out early, it needs a way to get more tickets from the other agencies (partitions)

5.8 Query-Update Techniques

• Queries run for a long time and lock a lot of data — a performance nightmare when trying also to run short update transactions.
• There are several good solutions
  – Use a data warehouse
  – Accept weaker consistency guarantees
  – Use multiversion data
• The solutions trade data quality or timeliness for performance.
Data Warehouse

• A data warehouse contains a snapshot of the DB which is periodically refreshed from the TP DB.
• All queries run on the data warehouse.
• All update transactions run on the TP DB.
• Queries don't get absolutely up-to-date data.
• How to refresh the data warehouse?
  – Stop processing transactions and copy the TP DB to the data warehouse. Possibly run queries while refreshing.
  – Treat the warehouse as a DB replica and use a replication technique.

Degrees of Isolation

• Serializability = Degree 3 Isolation
• Degree 2 Isolation (a.k.a. cursor stability)
  – The data manager holds read-lock(x) only while reading x, but holds write locks till commit (as in 2PL)
  – E.g., when scanning records in a file, each get-next-record releases the lock on the current record and gets a lock on the next one
  – read(x) is not "repeatable" within a transaction, e.g., rl1[x] r1[x] ru1[x] wl2[x] w2[x] wu2[x] rl1[x] r1[x] ru1[x]
  – Degree 2 is commonly used by ISAM file systems
  – Degree 2 is often a DB system's default behavior! And customers seem to accept it!
Degrees of Isolation (cont'd)

• You could run queries at Degree 2 and updaters at Degree 3
  – Updaters are still serializable w.r.t. each other
• Degree 1 - no read locks; hold write locks to commit
• Unfortunately, SQL concurrency control standards have been stated in terms of "repeatable reads" and "cursor stability" instead of serializability, leading to much confusion.

ANSI SQL Isolation Levels

• Uncommitted Read - Degree 1
• Committed Read - Degree 2
• Repeatable Read - uses read locks and write locks, but allows "phantoms"
• Serializable - Degree 3
MS SQL Server

• Lock hints in the SQL FROM clause
  – All the ANSI isolation levels, plus …
  – UPDLOCK - use update locks instead of read locks
  – READPAST - ignore locked rows (if running Read Committed)
  – PAGLOCK - use a page lock when the system would otherwise use a table lock
  – TABLOCK - shared table lock till end of command or transaction
  – TABLOCKX - exclusive table lock till end of command or transaction

Multiversion Data

• Assume record-granularity locking.
• Each write operation creates a new version instead of overwriting the existing value.
• So each logical record has a sequence of versions.
• Tag each version with the transaction id of the transaction that wrote it.

  Tid  Previous  E#  Name   Other fields
  123  null      1   Bill   …
  175  123       1   Bill   …
  134  null      2   Sue    …
  199  134       2   Sue    …
  227  null      27  Steve  …
Multiversion Data (cont’d)

• Execute update transactions using ordinary 2PL
• Execute queries in snapshot mode
  – System keeps a commit list of tids of all committed txns
  – When a query starts executing, it reads the commit list
  – When a query reads x, it reads the latest version of x written
    by a transaction on its commit list
  – Thus, it reads the database state that existed when it started running

Commit List Management

• Maintain and periodically recompute a tid T-Oldest, such that
  – Every active txn’s tid is greater than T-Oldest
  – Every new tid is greater than T-Oldest
  – For every committed transaction with tid ≤ T-Oldest,
    its versions are committed
  – For every aborted transaction with tid ≤ T-Oldest,
    its versions are wiped out
• Queries don’t need to know tids ≤ T-Oldest
  – So only maintain the commit list for tids > T-Oldest

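The snapshot-mode read rule above can be sketched in a few lines. Everything here (class names, the sample version chain) is illustrative, not taken from any real system; the version chain mirrors the tid-tagged records on the Multiversion Data slide.

```python
# Sketch of multiversion snapshot reads using a commit list (illustrative).

class Version:
    def __init__(self, tid, value, previous=None):
        self.tid = tid            # tid of the txn that wrote this version
        self.value = value
        self.previous = previous  # older version, or None

class Snapshot:
    """A query's view: the commit list copied when the query started."""
    def __init__(self, commit_list):
        self.commit_list = frozenset(commit_list)

    def read(self, latest):
        """Return the newest version written by a txn on the commit list."""
        v = latest
        while v is not None:
            if v.tid in self.commit_list:
                return v.value
            v = v.previous
        return None  # no version of this record is visible to the snapshot

# Record x: txn 123 committed a version, then txn 175 wrote a newer one.
x = Version(175, "Bill-new", previous=Version(123, "Bill"))

q = Snapshot({123})          # query started before txn 175 committed
print(q.read(x))             # -> Bill
q2 = Snapshot({123, 175})    # query started after txn 175 committed
print(q2.read(x))            # -> Bill-new
```

A query built on an older commit list keeps seeing the old version even after later transactions commit, which is exactly “the database state that existed when it started running.”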

Multiversion Garbage Collection

• Can delete an old version of x if no query will ever read it
  – There’s a later version of x whose tid ≤ T-Oldest
    (or is on every active query’s commit list)
• Originally used in Prime Computer’s CODASYL DB system
  and Oracle’s Rdb/VMS

Oracle Multiversion Concurrency Control

• Data page contains the latest version of each record, which points
  to older versions in the rollback segment.
• Read-committed query reads data as of its start time.
• Read-only isolation reads data as of transaction start time.
• “Serializable” query reads data as of the txn’s start time.
  – An update checks that the updated record was not modified
    after txn start time.
  – If that check fails, Oracle returns an error.
  – If there isn’t enough history for Oracle to perform the check,
    Oracle returns an error. (You can control the history area’s size.)
  – What if T1 and T2 modify each other’s readset concurrently?

Oracle Concurrency Control (cont’d)

  r1[x] r1[y] r2[x] r2[y] w1[x′] c1 w2[y′] c2

• The result is not serializable!
• In any SR execution, one transaction would have read the other’s output

5.9 Phantoms

• Problems when using 2PL with inserts and deletes

    Accounts                        Assets
    Acct#  Location  Balance        Location  Total
    1      Seattle   400            Seattle   400
    2      Tacoma    200            Tacoma    500
    3      Tacoma    300

  T1: Read Accounts 1, 2, and 3
  T2: Insert Accounts[4, Tacoma, 100]    ← the phantom record
  T2: Read Assets(Tacoma), returns 500
  T2: Write Assets(Tacoma, 600)
  T1: Read Assets(Tacoma), returns 600
  T1: Commit
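A hedged sketch of why the history above slips past the first-updater validation: each write only checks that its own target record wasn’t modified after the transaction’s start time, so neither write is rejected. All names, and the integer clock, are invented for illustration.

```python
# Illustrative model of "reject an update if the record was modified
# after my start time". It admits the non-serializable history
# r1[x] r1[y] r2[x] r2[y] w1[x'] c1 w2[y'] c2 (write skew).

clock = 0
db = {"x": {"value": 0, "modified": 0},
      "y": {"value": 0, "modified": 0}}

class Txn:
    def __init__(self):
        self.start = clock

    def read(self, key):
        return db[key]["value"]   # (a real system would read the snapshot version)

    def write(self, key, value):
        global clock
        if db[key]["modified"] > self.start:
            raise RuntimeError("serialization error")  # the check fails
        clock += 1
        db[key].update(value=value, modified=clock)

t1, t2 = Txn(), Txn()
t1.read("x"); t1.read("y")   # r1[x] r1[y]
t2.read("x"); t2.read("y")   # r2[x] r2[y]
t1.write("x", 1)             # w1[x'] -- x unmodified since t1 started: allowed
t2.write("y", 1)             # w2[y'] -- y unmodified since t2 started: allowed
# Both commit, yet neither read the other's output: not serializable.
```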
The Phantom Phantom Problem

• It looks like T1 should lock record 4, which isn’t there!
• Which of T1’s operations determined that there were only 3 records?
  – Read end-of-file?
  – Read record counter?
  – SQL Select operation?
• This operation conflicts with T2’s Insert Accounts[4, Tacoma, 100]
• Therefore, Insert Accounts[4, Tacoma, 100] shouldn’t run until
  after T1 commits

Avoiding Phantoms - Predicate Locks

• Suppose a query reads all records satisfying predicate P. For example,
  – Select * From Accounts Where Location = “Tacoma”
  – Normally would hash each record id to an integer lock id
  – And lock control structures. Too coarse grained.
• Ideally, set a read lock on P
  – which conflicts with a write lock on Q if some record can
    satisfy (P and Q)
• For arbitrary predicates, this is too slow to check
  – Not within a few hundred instructions, anyway
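Although deciding satisfiability of two arbitrary predicates is too slow, testing whether one concrete record satisfies a predicate is cheap. A hypothetical sketch of that record-vs-predicate test (all names here are invented for illustration):

```python
# Sketch: an insert/delete/update conflicts with a predicate read lock
# iff the written record satisfies the predicate. Illustrative only.

predicate_read_locks = [
    # T1's lock for: Select * From Accounts Where Location = "Tacoma"
    {"owner": "T1", "pred": lambda rec: rec["Location"] == "Tacoma"},
]

def write_conflicts(record):
    """Return owners of predicate read locks that the written record satisfies."""
    return [p["owner"] for p in predicate_read_locks if p["pred"](record)]

# T2's insert of the phantom record conflicts with T1's predicate lock,
# so it must wait until T1 commits.
phantom = {"Acct#": 4, "Location": "Tacoma", "Balance": 100}
print(write_conflicts(phantom))                                    # -> ['T1']
print(write_conflicts({"Acct#": 5, "Location": "Seattle",
                       "Balance": 50}))                            # -> []
```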

Precision Locks

• Suppose update operations are on single records
• Maintain a list of predicate Read-locks
• Insert, Delete, & Update write-lock the record and check for
  conflict with all predicate locks
• Query sets a read lock on the predicate and checks for conflict
  with all record locks
• Cheaper than predicate satisfiability, but still too expensive
  for practical implementation.

5.10 B-Trees

• An index maps field values to record ids.
  – Record id = [page-id, offset-within-page]
  – Most common DB index structures: hashing and B-trees
  – DB index structures are page-oriented
• Hashing uses a function H: V → B, from field values to block numbers.
  – V = social security numbers, B = {1 .. 1000}, H(v) = v mod 1000
  – If a page overflows, then use an extra overflow page
  – At 90% load on pages, 1.2 block accesses per request!
  – BUT, doesn’t help for key range access (10 < v < 75)

B-Tree Structure

• Index node is a sequence of [pointer, key] pairs
• K1 < K2 < … < Kn-2 < Kn-1
• P1 points to a node containing keys < K1
• Pi points to a node containing keys in range [Ki-1, Ki)
• Pn points to a node containing keys ≥ Kn-1
• So, K′1 < K′2 < … < K′n-2 < K′n-1
• Search for value v by following the path from the root

    K1 P1 … Ki Pi Ki+1 … Kn-1 Pn
           │
    K′1 P′1 … K′i P′i K′i+1 … K′n-1 P′n

Example (n = 3)

                    [127 496]
          ┌─────────────┼─────────────┐
      [14 83]       [221 352]     [521 690]
            ┌────────────┼────────────┐
    [127 145 189]  [221 245 320]  [352 353 487]

• Notice that leaves are sorted by key, left-to-right
• If key = 8 bytes, ptr = 2 bytes, page = 4K, then n = 409
• So a 3-level index has up to 409³ ≈ 68M leaves
• At 20 records per leaf, that’s about 1.4B records
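The search rule follows directly from the key-range invariants above. A minimal, illustrative Python model, using the middle subtree of the example:

```python
from bisect import bisect_right

# Minimal B-tree search sketch: in a node with keys [K1..Kn-1],
# child 0 covers keys < K1, child i covers [K(i-1), Ki), and the
# last child covers keys >= Kn-1.

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys              # sorted keys
        self.children = children      # None for a leaf
        self.records = records        # leaf payload, aligned with keys

def search(node, v):
    while node.children is not None:
        # bisect_right picks the child whose range covers v
        node = node.children[bisect_right(node.keys, v)]
    # at the leaf, look for v among the leaf's keys
    i = bisect_right(node.keys, v) - 1
    return node.records[i] if i >= 0 and node.keys[i] == v else None

# The middle subtree of the example: node [221 352] and its three leaves.
leaves = [Node([127, 145, 189], records=["r127", "r145", "r189"]),
          Node([221, 245, 320], records=["r221", "r245", "r320"]),
          Node([352, 353, 487], records=["r352", "r353", "r487"])]
subtree = Node([221, 352], children=leaves)

print(search(subtree, 245))   # -> r245
print(search(subtree, 352))   # -> r352  (rightmost child covers keys >= 352)
print(search(subtree, 400))   # -> None
```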
Insertion

• To insert key v, search for the leaf where v should appear
• If there’s space in the leaf, insert the record
• If not, split the leaf in half, and split the key range in its
  parent to point to the two leaves

  To insert key 15 into the full leaf [12 14 17] under parent [19 --]:
  • split the leaf
  • split the parent’s range [0, 19) to [0, 15) and [15, 19)
  • if the parent was full, you’d split that too (not shown here)
  • this automatically keeps the tree balanced

     [19 --]                [15 19]
        │           ⇒        /    \
    [12 14 17]          [12 14]  [15 17]

B-Tree Observations

• Delete algorithm merges adjacent nodes < 50% full, but merging
  is rarely used in practice
• Root and most level-1 nodes are cached, to reduce disk accesses
• Secondary (non-clustered) index - leaves contain [key, record id] pairs
• Primary (clustered) index - leaves contain records
• Use key prefix for long (string) key values
  – drop prefix and add to suffix as you move down the tree
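The split step above can be sketched for a one-level tree: a parent holding separator keys over a list of leaves. The structure and the 3-key capacity are simplified for illustration only.

```python
from bisect import bisect_right, insort

# Sketch of a leaf split: on overflow, split the leaf in half and
# insert a new separator key into the parent's key range.

MAX_KEYS = 3

def insert(parent_keys, leaves, v):
    """Insert v; on overflow, split the leaf and update the parent."""
    i = bisect_right(parent_keys, v)          # leaf whose range covers v
    insort(leaves[i], v)
    if len(leaves[i]) > MAX_KEYS:             # overflow: split the leaf
        leaf = leaves[i]
        mid = len(leaf) // 2
        left, right = leaf[:mid], leaf[mid:]
        leaves[i:i + 1] = [left, right]
        parent_keys.insert(i, right[0])       # split the parent's key range
        # (if the parent were full, it would split too - not shown here)

parent, leaves = [19], [[12, 14, 17], [21, 25]]
insert(parent, leaves, 15)
print(parent, leaves)   # -> [15, 19] [[12, 14], [15, 17], [21, 25]]
```

This reproduces the slide’s example: inserting 15 splits leaf [12 14 17] into [12 14] and [15 17], and splits the parent’s range [0, 19) into [0, 15) and [15, 19).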

Key Range Locks

• A lock on a B-tree key range is a cheap predicate lock

  Select Dept Where ((Budget > 250) and (Budget < 350))

    [127 496] → [221 352] → [221 245 320]
  • lock the key range [221, 352)
  • only useful when the query is on an indexed field

• Commonly used with multi-granularity locking
  – Insert/delete locks the record and intention-write locks the range
  – The MGL tree defines a fixed set of predicates, and thereby
    avoids predicate satisfiability

5.11 Tree Locking

• Can beat 2PL by exploiting root-to-leaf access in a tree
• If searching for a leaf, after setting a lock on a node,
  release the lock on its parent

        A
      / | \
     B  C  D        wl(A) wl(B) wu(A) wl(E) wu(B)
    / \
   E   F

• The lock order on the root serializes access to other nodes
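The root-to-leaf discipline above (often called lock coupling, or “crabbing”) can be sketched with ordinary mutexes. The tree and lock sequence mirror the slide’s example; the code itself is only illustrative.

```python
import threading

# Lock coupling sketch: lock a node, then lock the needed child,
# and only then release the parent - never hold less than the
# "couple" while descending.

locks = {n: threading.Lock() for n in "ABCDEF"}
trace = []

def lock(n):
    locks[n].acquire()
    trace.append(f"wl({n})")

def unlock(n):
    locks[n].release()
    trace.append(f"wu({n})")

def descend(path):
    """Traverse root-to-leaf, holding at most a parent+child pair."""
    lock(path[0])
    for parent, child in zip(path, path[1:]):
        lock(child)        # couple: lock child while still holding parent
        unlock(parent)     # then release the parent
    unlock(path[-1])       # done with the leaf

descend(["A", "B", "E"])
print(trace)   # -> ['wl(A)', 'wl(B)', 'wu(A)', 'wl(E)', 'wu(B)', 'wu(E)']
```

The trace matches the slide’s sequence wl(A) wl(B) wu(A) wl(E) wu(B), plus the final release of the leaf.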

B-tree Locking

• Root lock on a B-tree is a bottleneck
• Use tree locking to relieve it
• Problem: node splits

     P [19 --]
         │
     C [12 14 17]

  If you unlock P before splitting C, then you have to back up and
  lock P again, which breaks the tree locking protocol.

• So, don’t unlock a node till you’re sure its child won’t split
  (i.e. has space for an insert)
• Implies different locking rules for different ops
  (search vs. insert/update)

B-link Optimization

• B-link tree - each node has a side pointer to the next node
• After searching a node, you can release its lock before locking its child
  – r1[P] r2[P] r2[C] w2[C] w2[C´] w2[P] r1[C] r1[C´]

     P [19 --]              P [15 19]
         │           ⇒          │
     C [12 14 17]          C [12 14] → C´ [15 17]

• Searching has the same behavior as if it locked the child before
  releasing the parent … and ran later (after the insert)