Class 3
1/14/01
… and use the neutral term data manager
Assumption - Atomic Operations
• We will synchronize Reads and Writes.
• We must therefore assume they're atomic
  – else we'd have to synchronize the finer-grained operations that implement Read and Write
• Read(x) - returns the current value of x in the DB
• Write(x, val) - overwrites all of x (the whole page)
• This assumption of atomic operations is what allows us to abstract executions as sequences of reads and writes (without loss of information).
  – Otherwise, what would wk[x] ri[x] mean?

Assumption - Txns Communicate Only via Read and Write
• Read and Write are the only operations the system will control to attain serializability.
• So, if transactions communicate via messages, then implement SendMsg as Write and ReceiveMsg as Read.
• Else, you could have the following:
  w1[x] r2[x] send2[M] receive1[M]
  – the data manager didn't know about send/receive and thought the execution was SR.
• Also watch out for brain transport.
Examples of Equivalence
• The following histories are equivalent:
  H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
  H2 = r2[x] r1[x] w1[x] c1 w2[y] c2
  H3 = r2[x] r1[x] w2[y] c2 w1[x] c1
  H4 = r2[x] w2[y] c2 r1[x] w1[x] c1
• But none of them are equivalent to
  H5 = r1[x] w1[x] r2[x] c1 w2[y] c2
  because r2[x] and w1[x] conflict, and r2[x] precedes w1[x] in H1 - H4, but w1[x] precedes r2[x] in H5.

Serializable Histories
• A history is serializable if it is equivalent to a serial history.
• For example,
  H1 = r1[x] r2[x] w1[x] c1 w2[y] c2
  is equivalent to the serial history
  H4 = r2[x] w2[y] c2 r1[x] w1[x] c1
  (r2[x] and w1[x] are in the same order in H1 and H4.)
• Therefore, H1 is serializable.
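Conflict equivalence is mechanical enough to check in code. Below is a minimal sketch (the parser and helper names are my own, not from the lecture): two histories are equivalent if they contain the same operations and order every conflicting pair — two operations on the same item from different transactions, at least one a write — the same way.

```python
def parse(history):
    # "r1[x] w1[x] c1" -> [("r", "1", "x"), ("w", "1", "x"), ("c", "1", None)]
    ops = []
    for tok in history.split():
        kind, rest = tok[0], tok[1:]
        if "[" in rest:
            txn, item = rest.split("[")
            ops.append((kind, txn, item.rstrip("]")))
        else:
            ops.append((kind, rest, None))
    return ops

def conflict_pairs(ops):
    # ordered pairs of conflicting operations, in history order
    pairs = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if x1 and x1 == x2 and t1 != t2 and "w" in (k1, k2):
                pairs.add(((k1, t1, x1), (k2, t2, x2)))
    return pairs

def equivalent(h1, h2):
    o1, o2 = parse(h1), parse(h2)
    return sorted(o1) == sorted(o2) and conflict_pairs(o1) == conflict_pairs(o2)

H1 = "r1[x] r2[x] w1[x] c1 w2[y] c2"
H4 = "r2[x] w2[y] c2 r1[x] w1[x] c1"
H5 = "r1[x] w1[x] r2[x] c1 w2[y] c2"
print(equivalent(H1, H4))  # True  (r2[x] before w1[x] in both)
print(equivalent(H1, H5))  # False (w1[x] before r2[x] in H5)
```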
5.3 Synchronization Requirements for Recoverability
• In addition to guaranteeing serializability, synchronization is needed to implement abort easily.
• When a transaction T aborts, the data manager wipes out all of T's effects, including
  – undoing T's writes that were applied to the DB, and
  – aborting transactions that read values written by T (these are called cascading aborts)
• Example - w1[x] r2[x] w2[y]
  – to abort T1, we must undo w1[x] and abort T2 (a cascading abort)

Recoverability
• If Tk reads from Ti and Ti aborts, then Tk must abort
  – Example - w1[x] r2[x] a1 implies T2 must abort
• But what if Tk already committed? We'd be stuck.
  – Example - w1[x] r2[x] c2 a1
  – T2 can't abort after it commits
• Executions must be recoverable: a transaction T's commit operation must follow the commit of every transaction from which T read.
  – Recoverable - w1[x] r2[x] c1 c2
  – Not recoverable - w1[x] r2[x] c2 a1
• Recoverability requires synchronizing operations.
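The recoverability rule can be checked mechanically. A minimal sketch using the slides' history notation (the function and variable names are mine): track who each transaction read from, and reject a commit that precedes the commit of a writer it read from.

```python
def is_recoverable(history):
    writers = {}      # item -> last txn that wrote it
    reads_from = {}   # txn -> set of txns it read from
    committed = set()
    for op in history.split():
        kind, rest = op[0], op[1:]
        txn = rest.split("[")[0]
        if kind == "w":
            writers[rest.split("[")[1].rstrip("]")] = txn
        elif kind == "r":
            item = rest.split("[")[1].rstrip("]")
            src = writers.get(item)
            if src and src != txn:
                reads_from.setdefault(txn, set()).add(src)
        elif kind == "c":
            # T may commit only after everyone it read from has committed
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True

print(is_recoverable("w1[x] r2[x] c1 c2"))  # True
print(is_recoverable("w1[x] r2[x] c2 a1"))  # False: c2 precedes T1's commit
```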
Basic Locking Isn't Enough
• Basic locking doesn't guarantee serializability:
  rl1[x] r1[x] ru1[x] wl1[y] w1[y] wu1[y] c1
  rl2[y] r2[y] wl2[x] w2[x] ru2[y] wu2[x] c2
• Eliminating the lock operations, we have
  r1[x] r2[y] w2[x] c2 w1[y] c1, which isn't SR.
• The problem is that locks aren't being released properly.

Two-Phase Locking (2PL) Protocol
• A transaction is two-phase locked if:
  – before reading x, it sets a read lock on x
  – before writing x, it sets a write lock on x
  – it holds each lock until after it executes the corresponding operation
  – after its first unlock operation, it requests no new locks
• Each transaction sets locks during a growing phase and releases them during a shrinking phase.
• Example - on the previous page, T2 is two-phase locked, but not T1, since ru1[x] < wl1[y]
  – use "<" for "precedes"
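The two-phase rule is easy to test against the lock operations in a history. A sketch under the slides' notation — rl/wl set a lock, ru/wu release one — with helper names of my own:

```python
def is_two_phase(history, txn):
    # True iff txn requests no lock after its first unlock
    unlocked = False
    for op in history.split():
        if not op.startswith(("rl", "wl", "ru", "wu")):
            continue  # skip reads, writes, commits
        kind, who = op[:2], op[2:].split("[")[0]
        if who != txn:
            continue
        if kind in ("ru", "wu"):
            unlocked = True       # shrinking phase has begun
        elif unlocked:
            return False          # lock request after first unlock
    return True

H = ("rl1[x] r1[x] ru1[x] wl1[y] w1[y] wu1[y] c1 "
     "rl2[y] r2[y] wl2[x] w2[x] ru2[y] wu2[x] c2")
print(is_two_phase(H, "1"))  # False: ru1[x] precedes wl1[y]
print(is_two_phase(H, "2"))  # True: all of T2's locks precede its unlocks
```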
2PL Preserves Transaction Handshakes (cont'd)
• Stating this more formally …
• Theorem: For any 2PL execution H, there is an equivalent serial execution Hs, such that for all Ti, Tk, if Ti committed before Tk started in H, then Ti precedes Tk in Hs.

Brain Transport One Last Time
• If a user reads committed displayed output of Ti and uses that displayed output as input to transaction Tk, then he/she should wait for Ti to commit before starting Tk.
• The user can then rely on transaction handshake preservation to ensure Ti is serialized before Tk.
Deadlocks
• A set of transactions is deadlocked if every transaction in the set is blocked and will remain blocked unless the system intervenes.
  – Example: rl1[x] granted
             rl2[y] granted
             wl2[x] blocked
             wl1[y] blocked and deadlocked
• Deadlock is 2PL's way to avoid non-SR executions
  – rl1[x] r1[x] rl2[y] r2[y] … can't run w2[x] w1[y] and be SR
• To repair a deadlock, you must abort a transaction
  – if you released a transaction's lock without aborting it, you'd break 2PL

Deadlock Prevention
• Never grant a lock that can lead to deadlock
• Often advocated in operating systems
• Useless for TP, because it would require running transactions serially.
  – Example: to prevent the previous deadlock rl1[x] rl2[y] wl2[x] wl1[y], the system can't grant rl2[y]
• Avoiding deadlock by resource ordering is unusable in general, since it overly constrains applications.
  – But it may help for certain high-frequency deadlocks
• Setting all locks when a txn begins requires too much advance knowledge and reduces concurrency.
Deadlock Detection
• Detection approach - detect deadlocks automatically and abort a deadlocked transaction (the victim).
• It's the preferred approach, because it
  – allows higher resource utilization and
  – uses cheaper algorithms
• Timeout-based deadlock detection - if a transaction is blocked for too long, then abort it.
  – Simple and easy to implement
  – But it aborts unnecessarily, and
  – some deadlocks persist for too long

Detection Using Waits-For Graph
• Explicit deadlock detection - use a Waits-For Graph
  – Nodes = {transactions}
  – Edges = {Ti → Tk | Ti is waiting for Tk to release a lock}
  – Example (previous deadlock): T1 → T2 and T2 → T1
• Theorem: If there's a deadlock, then the waits-for graph has a cycle.
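The theorem suggests the detection algorithm: build the waits-for graph and look for a cycle. A sketch using depth-first search (the graph representation and names are my own):

```python
def has_deadlock(waits_for):
    # waits_for: dict mapping Ti -> list of Tk that Ti is waiting for
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = {t: WHITE for t in waits_for}

    def dfs(t):
        color[t] = GRAY
        for u in waits_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY or (c == WHITE and dfs(u)):
                return True           # back edge to the current path: cycle
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in waits_for)

# The slides' example: T1 waits for T2 (wl1[y]) and T2 waits for T1 (wl2[x]).
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": []}))      # False
```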
MS SQL Server
• Aborts the transaction that is "cheapest" to roll back.
  – "Cheapest" is determined by the amount of log generated.
  – Allows transactions that you've invested a lot in to complete.
• SET DEADLOCK_PRIORITY LOW (vs. NORMAL) causes a transaction to sacrifice itself as a victim.

Distributed Locking
• Suppose a transaction can access data at many data managers.
• Each data manager sets locks in the usual way.
• When a transaction commits or aborts, it runs two-phase commit to notify all data managers it accessed.
• The only remaining issue is distributed deadlock.
Distributed Deadlock
• The deadlock spans two nodes. Neither node alone can see it.
  Node 1: rl1[x]   wl2[x] (blocked)
  Node 2: rl2[y]   wl1[y] (blocked)
• Timeout-based detection is popular. Its weaknesses are less important in the distributed case:
  – it aborts unnecessarily, and some deadlocks persist too long
  – possibly abort the younger unblocked transaction to avoid cyclic restart

Oracle Deadlock Handling
• Uses a waits-for graph for single-server deadlock detection.
• The transaction that detects the deadlock is the victim.
• Uses timeouts to detect distributed deadlocks.
Multigranularity Locking (MGL)
• Allow different txns to lock at different granularity
  – big queries should lock coarse-grained data (e.g. tables)
  – short transactions lock fine-grained data (e.g. rows)
• Lock manager can't detect these conflicts
  – each data item (e.g., table or row) has a different id
• Multigranularity locking "trick"
  – exploit the natural hierarchy of data containment
  – before locking fine-grained data, set intention locks on coarse-grained data that contains it
  – e.g., before setting a read-lock on a row, get an intention-read-lock on the table that contains the row

MGL Type and Instance Graphs
[Figure: Lock Type Graph - Database → Area → File → Record. Lock Instance Graph - DB1 at the root, areas A1 and A2 below it, files F1, F2, F3 below the areas, and records R1.1, R1.2, R2.1, R2.2, R2.3, … at the leaves.]
• Before setting a read lock on R2.3, first set an intention-read lock on DB1, then A2, and then F2.
• Set locks root-to-leaf. Release locks leaf-to-root.
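The root-to-leaf rule can be sketched as follows, using a hypothetical child-to-parent map modeled on the instance graph above (the data structures and names are my own, not the book's API):

```python
hierarchy = {            # child -> parent, one path of the instance graph
    "A2": "DB1", "F2": "A2", "R2.3": "F2",
}

def path_to_root(item):
    # ancestors of item, ordered root first
    path = []
    while item in hierarchy:
        item = hierarchy[item]
        path.append(item)
    return list(reversed(path))

def lock_for_read(item, lock_table):
    # set intention-read (IR) locks root-to-leaf, then the read lock itself
    for ancestor in path_to_root(item):
        lock_table.setdefault(ancestor, set()).add("IR")
    lock_table.setdefault(item, set()).add("R")

locks = {}
lock_for_read("R2.3", locks)
print(locks)  # {'DB1': {'IR'}, 'A2': {'IR'}, 'F2': {'IR'}, 'R2.3': {'R'}}
```

The IR locks on DB1, A2, and F2 are what let the lock manager detect a conflict with, say, a table-granularity read lock, without enumerating every row.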
5.6 Locking Performance
• Deadlocks are rare
  – up to 1% - 2% of transactions deadlock
• The one exception to this is lock conversions
  – r-lock a record and later upgrade to a w-lock
  – e.g., Ti = read(x) … write(x)
  – if two txns do this concurrently, they'll deadlock (both get an r-lock on x before either gets a w-lock)
  – To avoid lock conversion deadlocks, get a w-lock first and down-grade to an r-lock if you don't need to write.
  – Use the SQL Update statement or explicit program hints

Conversions in MS SQL Server
• Update-lock prevents lock conversion deadlock.
  – Conflicts with other update and write locks, but not with read locks.
  – Only on pages and rows (not tables)
• You get an update lock by using the UPDLOCK hint in the FROM clause:

  Select Foo.A
  From Foo (UPDLOCK)
  Where Foo.B = 7
[Figure: throughput vs. # of active txns - throughput grows with the number of active txns, peaks, and then falls off as the system starts thrashing.]
Avoiding Thrashing
• If over 30% of active transactions are blocked, then the system is (nearly) thrashing, so reduce the number of active transactions.
• Timeout-based deadlock detection mistakes
  – They happen due to long lock delays
  – So the system is probably close to thrashing
  – So if the deadlock detection rate is too high (over 2%), reduce the number of active transactions

Interesting Sidelights
• By getting all locks before transaction Start, you can increase throughput at the thrashing point, because blocked transactions hold no locks
  – But it assumes you get exactly the locks you need and retries of get-all-locks are cheap
• Pure restart policy - abort when there's a conflict and restart when the conflict disappears
  – If aborts are cheap and there's low contention for other resources, then this policy produces higher throughput before thrashing than a blocking policy
  – But response time is greater than with a blocking policy
How to Reduce Lock Contention
• If each transaction holds a lock L for t seconds, then the maximum throughput is 1/t txns/second
  [Figure: timeline Start … Lock L … Commit, with L held for the interval t]
• To increase throughput, reduce t (lock holding time)
  – Set the lock later in the transaction's execution (e.g., defer updates till commit time)
  – Reduce transaction execution time (reduce path length, read from disk before setting locks)
  – Split a transaction into smaller transactions

Reducing Lock Contention (cont'd)
• Reduce the number of conflicts
  – Use finer-grained locks, e.g., by partitioning tables vertically:
    [Part# | Price | OnHand | PartName | CatalogPage]
    becomes
    [Part# | Price | OnHand] and [Part# | PartName | CatalogPage]
• Probability of a deadlock is proportional to K⁴N/D²
  – Prob(deadlock) / Prob(conflict) = K²/D
  – if K = 10 and D = 10⁶, then K²/D = .0001

• Hot spots often create a convoy of transactions. The hot spot lock serializes transactions.
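The numbers in the rule of thumb above can be reproduced directly. The formulas are from the slide; the variable names are mine (K = locks per transaction, N = concurrent transactions, D = lockable data items):

```python
K, N, D = 10, 100, 10**6

# proportionality expressions from the slide (constant factors omitted)
prob_conflict_scale = K**2 * N / D       # Prob(conflict) ~ K^2 N / D
prob_deadlock_scale = K**4 * N / D**2    # Prob(deadlock) ~ K^4 N / D^2

# the ratio is independent of N: deadlocks are far rarer than conflicts
ratio = prob_deadlock_scale / prob_conflict_scale
print(ratio)  # K^2 / D = 100 / 10^6 = 0.0001
```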
Locking Higher-Level Operations
• Read is often part of a read-write pair, such as Increment(x, n), which adds constant n to x but doesn't return a value.
• Increment (and Decrement) commute
• So, introduce Increment and Decrement locks

        r    w    inc  dec
  r     y    n    n    n
  w     n    n    n    n
  inc   n    n    y    y
  dec   n    n    y    y

• But if Inc and Dec have a threshold (e.g. a quantity of zero), then they conflict (when the threshold is near)

Solving the Threshold Problem - Another IMS Fast Path Technique
• Use a blind Decrement (no threshold) and Verify(x, n), which returns true if x ≥ n

  bEnough = Verify(iQuantity, n);
  if (bEnough) Decrement(iQuantity, n)
  else print("not enough");

• Re-execute Verify at commit time
  – If it returns a different value than it did during normal execution, then abort
  – It's like checking that the threshold lock you didn't set during Decrement is still valid.
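The compatibility matrix translates directly into a lock-grant check. A sketch with names of my own: a requested lock is granted only if it is compatible with every lock currently held by other transactions.

```python
# True = compatible (lock can be granted alongside the held lock).
# Write conflicts with everything; inc/dec conflict with r and w but
# commute with each other, exactly as in the matrix.
COMPATIBLE = {
    ("r", "r"): True,
    ("inc", "inc"): True, ("inc", "dec"): True,
    ("dec", "inc"): True, ("dec", "dec"): True,
}

def can_grant(requested, held):
    # missing entries default to False (a conflict)
    return all(COMPATIBLE.get((requested, h), False) for h in held)

print(can_grant("inc", {"dec"}))  # True: increments and decrements commute
print(can_grant("inc", {"r"}))    # False: a reader would see a changing value
print(can_grant("r", {"r"}))      # True: shared read locks
```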
Data Warehouse
• A data warehouse contains a snapshot of the DB, which is periodically refreshed from the TP DB
• All queries run on the data warehouse
• All update transactions run on the TP DB
• Queries don't get absolutely up-to-date data
• How to refresh the data warehouse?
  – Stop processing transactions and copy the TP DB to the data warehouse. Possibly run queries while refreshing
  – Treat the warehouse as a DB replica and use a replication technique

Degrees of Isolation
• Serializability = Degree 3 Isolation
• Degree 2 Isolation (a.k.a. cursor stability)
  – Data manager holds read-lock(x) only while reading x, but holds write locks till commit (as in 2PL)
  – E.g. when scanning records in a file, each get-next-record releases the lock on the current record and gets a lock on the next one
  – read(x) is not "repeatable" within a transaction, e.g.,
    rl1[x] r1[x] ru1[x] wl2[x] w2[x] wu2[x] rl1[x] r1[x] ru1[x]
  – Degree 2 is commonly used by ISAM file systems
  – Degree 2 is often a DB system's default behavior! And customers seem to accept it!!!
Multiversion Data (cont'd)
• Execute update transactions using ordinary 2PL
• Execute queries in snapshot mode
  – System keeps a commit list of tids of all committed txns
  – When a query starts executing, it reads the commit list
  – When a query reads x, it reads the latest version of x written by a transaction on its commit list
  – Thus, it reads the database state that existed when it started running

Commit List Management
• Maintain and periodically recompute a tid T-Oldest, such that
  – Every active txn's tid is greater than T-Oldest
  – Every new tid is greater than T-Oldest
  – For every committed transaction with tid ≤ T-Oldest, its versions are committed
  – For every aborted transaction with tid ≤ T-Oldest, its versions are wiped out
• Queries don't need to know tids ≤ T-Oldest
  – So only maintain the commit list for tids > T-Oldest
Oracle Concurrency Control (cont'd)
• r1[x] r1[y] r2[x] r2[y] w1[x′] c1 w2[y′] c2
• The result is not serializable!
• In any SR execution, one transaction would have read the other's output

5.9 Phantoms
• Problems when using 2PL with inserts and deletes

  Accounts                          Assets
  Acct#  Location  Balance          Location  Total
  1      Seattle   400              Seattle   400
  2      Tacoma    200              Tacoma    500
  3      Tacoma    300

  T1: Read Accounts 1, 2, and 3
  T2: Insert Accounts[4, Tacoma, 100]   ← the phantom record
  T2: Read Assets(Tacoma), returns 500
  T2: Write Assets(Tacoma, 600)
  T1: Read Assets(Tacoma), returns 600
  T1: Commit
The Phantom Phantom Problem
• It looks like T1 should lock record 4, which isn't there!
• Which of T1's operations determined that there were only 3 records?
  – Read end-of-file?
  – Read record counter?
  – SQL Select operation?
• This operation conflicts with T2's Insert Accounts[4, Tacoma, 100]
• Therefore, Insert Accounts[4, Tacoma, 100] shouldn't run until after T1 commits

Avoiding Phantoms - Predicate Locks
• Suppose a query reads all records satisfying predicate P. For example,
  – Select * From Accounts Where Location = "Tacoma"
  – Normally it would hash each record id to an integer lock id
  – And lock control structures. Too coarse-grained.
• Ideally, set a read lock on P
  – which conflicts with a write lock Q if some record can satisfy (P and Q)
• For arbitrary predicates, this is too slow to check
  – Not within a few hundred instructions, anyway
Insertion
• To insert key v, search for the leaf where v should appear
• If there's space on the leaf, insert the record
• If not, split the leaf in half, and split the key range in its parent to point to the two leaves

  To insert key 15:
  [Figure: a parent with key range [0, 19) points to the leaf (12 14 17); after the insert, the parent holds ranges [0, 15) and [15, 19), pointing to leaves (12 14) and (15 17)]
  – split the leaf
  – split the parent's range [0, 19) into [0, 15) and [15, 19)
  – if the parent was full, you'd split that too (not shown here)
  – this automatically keeps the tree balanced

B-Tree Observations
• The delete algorithm merges adjacent nodes < 50% full, but it is rarely used in practice
• Root and most level-1 nodes are cached, to reduce disk accesses
• Secondary (non-clustered) index - leaves contain [key, record id] pairs
• Primary (clustered) index - leaves contain records
• Use key prefix for long (string) key values
  – drop prefix and add to suffix as you move down the tree
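The leaf-split step can be sketched as follows — a simplified, keys-only model of my own that assumes the parent has room for the new separator key:

```python
def insert_key(leaf, parent_keys, key, capacity=3):
    # leaf: sorted keys in one leaf node; parent_keys: separator keys that
    # split the parent's key range (e.g. [19] means ranges [0,19) and [19,..))
    if len(leaf) < capacity:
        leaf.append(key)
        leaf.sort()
        return leaf, None
    # no space: split the leaf in half and push the split key into the
    # parent, splitting the parent's key range into two
    keys = sorted(leaf + [key])
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    parent_keys.append(right[0])   # e.g. [0, 19) becomes [0, 15) and [15, 19)
    parent_keys.sort()
    return left, right

parent_keys = [19]
left, right = insert_key([12, 14, 17], parent_keys, 15)
print(left, right)    # [12, 14] [15, 17]
print(parent_keys)    # [15, 19]
```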