Transaction Management: CSE 444: Database Internals
Problem Illustration
Client 1:
START TRANSACTION
INSERT INTO SmallProduct(name, price)
  SELECT pname, price
  FROM Product
  WHERE price <= 0.99
Crash !
DELETE FROM Product
  WHERE price <= 0.99
COMMIT

What do we do now?

Recovery
From which events below can DBMS recover?
Wrong data entry
Disk failure
Fire / earthquake / etc.
System crashes
  Software errors
  Power failures
Magda Balazinska - CSE 444, Spring 2012
Recovery
Type of Crash                     Prevention
Wrong data entry                  Constraints and Data cleaning
Disk crashes                      Redundancy: RAID, backup, replica
Fire or other major disaster      Redundancy: Replica far away
System failures (most frequent)   DATABASE RECOVERY

System Failures
Each transaction has internal state
When system crashes, internal state is lost
  Don't know which parts executed and which didn't
  Need ability to undo and redo
Transactions
Assumption: db composed of elements
  Usually 1 element = 1 block
  Can be smaller (=1 record) or larger (=1 relation)

Buffer Manager
[Figure: buffer pool in RAM and disk, with INPUT and OUTPUT moving blocks between them]
Disk = collection of blocks
Data must be in RAM for DBMS to operate on it!
Buffer pool = table of <frame#, pageid> pairs
DBMSs build their own buffer manager and don't rely on the OS. Why?
Reason 1: Performance
  DBMS may be able to anticipate access patterns
  Hence, may also be able to perform prefetching
  May select better page replacement policy
Reason 2: Correctness
  DBMS needs fine grained control for transactions
  Needs to force pages to disk for recovery purposes
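To make the buffer manager's role concrete, here is a minimal sketch of a buffer pool. It is only an illustration: the class name `BufferPool`, the LRU replacement policy, and the method names are assumptions made here, not something the slides prescribe. The page table maps page ids directly to contents, so the frame# is left implicit.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool (hypothetical names): LRU replacement plus an explicit force."""

    def __init__(self, disk, num_frames):
        self.disk = disk                  # pageid -> page contents on disk
        self.num_frames = num_frames
        self.frames = OrderedDict()       # pageid -> [contents, dirty]; order = LRU

    def get_page(self, pageid):
        if pageid in self.frames:
            self.frames.move_to_end(pageid)        # now most recently used
        else:
            if len(self.frames) >= self.num_frames:
                victim, (data, dirty) = self.frames.popitem(last=False)
                if dirty:
                    self.disk[victim] = data       # write back on eviction
            self.frames[pageid] = [self.disk[pageid], False]
        return self.frames[pageid][0]

    def write_page(self, pageid, data):
        self.get_page(pageid)                      # make sure the page is resident
        self.frames[pageid] = [data, True]         # change stays in RAM for now

    def force(self, pageid):
        # Explicitly push a resident page to disk, as recovery may require.
        data, dirty = self.frames[pageid]
        if dirty:
            self.disk[pageid] = data
            self.frames[pageid][1] = False


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(disk, num_frames=2)
pool.write_page(1, "A")
print(disk[1])   # still the old contents: the update is only in the pool
pool.force(1)
print(disk[1])   # the forced page is now on disk
```

The explicit `force` call is the kind of fine-grained control that Reason 2 refers to: the DBMS, not the OS, decides when a page reaches disk.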
Example
START TRANSACTION
READ(A,t); t := t*2; WRITE(A,t);
READ(B,t); t := t*2; WRITE(B,t);
COMMIT;
READ(X,t): copy element X to transaction local variable t
WRITE(X,t): copy transaction local variable t to element X
INPUT(X): read element X to memory buffer
OUTPUT(X): write element X to disk
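The primitive operations can be simulated in a few lines. The sketch below is a toy model under stated assumptions: the class `Storage` and the function `run_example` are hypothetical names, and elements are plain dictionary entries. It runs the example transaction on A = B = 8 and shows what is left on disk if the system stops before OUTPUT(B).

```python
class Storage:
    """Toy model of disk, buffer pool, and transaction-local variables."""

    def __init__(self, disk):
        self.disk = dict(disk)   # element -> value on disk
        self.mem = {}            # buffer pool: element -> value in memory
        self.local = {}          # transaction-local variables

    def input(self, x):          # INPUT(X): disk -> memory buffer
        self.mem[x] = self.disk[x]

    def read(self, x, t):        # READ(X,t): memory buffer -> local variable t
        if x not in self.mem:
            self.input(x)        # bring X into the buffer pool first
        self.local[t] = self.mem[x]

    def write(self, x, t):       # WRITE(X,t): local variable t -> memory buffer
        self.mem[x] = self.local[t]

    def output(self, x):         # OUTPUT(X): memory buffer -> disk
        self.disk[x] = self.mem[x]


def run_example(crash_before_output_b=False):
    s = Storage({"A": 8, "B": 8})
    for x in ("A", "B"):
        s.read(x, "t")
        s.local["t"] *= 2        # t := t*2
        s.write(x, "t")
    s.output("A")
    if not crash_before_output_b:
        s.output("B")
    return s.disk


print(run_example())                             # {'A': 16, 'B': 16}
print(run_example(crash_before_output_b=True))   # {'A': 16, 'B': 8}
```

The second call leaves the disk in a state where only half of the transaction's effect is visible, which is the situation the recovery manager has to repair.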
READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t);

              Transaction  Buffer pool     Disk
Action        t            Mem A   Mem B   Disk A  Disk B
INPUT(A)                   8               8       8
READ(A,t)     8            8               8       8
t:=t*2        16           8               8       8
WRITE(A,t)    16           16              8       8
INPUT(B)      16           16      8       8       8
READ(B,t)     8            16      8       8       8
t:=t*2        16           16      8       8       8
WRITE(B,t)    16           16      16      8       8
OUTPUT(A)     16           16      16      16      8
OUTPUT(B)     16           16      16      16      16

Crash ! If the system fails partway through this sequence, for example after OUTPUT(A) but before OUTPUT(B), the disk is left with A=16 and B=8.
FORCE or NO-FORCE
Should all updates of a transaction be forced to disk before the transaction commits?
Undo Logging
Log records
<START T>: transaction T has begun
<COMMIT T>: T has committed
<ABORT T>: T has aborted
<T,X,v>: T has updated element X, and its old value was v

Action        t    Mem A  Mem B  Disk A  Disk B  UNDO Log
                                                 <START T>
INPUT(A)           8             8       8
READ(A,t)     8    8             8       8
t:=t*2        16   8             8       8
WRITE(A,t)    16   16            8       8       <T,A,8>
INPUT(B)      16   16     8      8       8
READ(B,t)     8    16     8      8       8
t:=t*2        16   16     8      8       8
WRITE(B,t)    16   16     16     8       8       <T,B,8>
OUTPUT(A)     16   16     16     16      8
OUTPUT(B)     16   16     16     16      16
COMMIT                                           <COMMIT T>
Crash !
WHAT DO WE DO ?
After Crash
In the first example:
  We UNDO both changes: A=8, B=8
  The transaction is atomic, since none of its actions have been executed
Undo-Logging Rules
U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X)
U2: If T commits, then OUTPUT(X) must be written to disk before <COMMIT T>
Hence: OUTPUTs are done early, before the transaction commits
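Recovery with an undo log can be sketched as a backward scan. The code below is a minimal illustration, assuming log records are tuples such as `("UPDATE", txn, element, old_value)`; the function name `undo_recover` and the tuple layout are choices made here for illustration. It also assumes that a transaction with an `ABORT` record was already undone before that record was written.

```python
def undo_recover(log, disk):
    """Scan the log backwards and restore old values of incomplete transactions."""
    finished = set()                       # transactions with COMMIT or ABORT seen
    for rec in reversed(log):
        if rec[0] in ("COMMIT", "ABORT"):
            finished.add(rec[1])
        elif rec[0] == "UPDATE":
            _, txn, elem, old = rec
            if txn not in finished:
                disk[elem] = old           # undo: put the old value back
    return disk


# Crash before <COMMIT T> reached the log: both changes are undone.
log = [("START", "T"), ("UPDATE", "T", "A", 8), ("UPDATE", "T", "B", 8)]
print(undo_recover(log, {"A": 16, "B": 8}))    # {'A': 8, 'B': 8}

# Crash after <COMMIT T>: by rule U2 the data is already on disk, nothing to undo.
log_committed = log + [("COMMIT", "T")]
print(undo_recover(log_committed, {"A": 16, "B": 16}))   # {'A': 16, 'B': 16}
```

Rule U1 is what makes the backward scan safe: any value that reached disk has its old value recorded in the log first.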
Checkpointing
Checkpoint the database periodically
  Stop accepting new transactions
  Wait until all current transactions complete
  Flush log to disk
  Write a <CKPT> log record, flush
  Resume transactions
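One consequence of this procedure is that undo recovery does not need to look past the most recent <CKPT> record: every transaction that started before it had completed, and under undo logging its data was on disk before its commit record. The sketch below illustrates that idea. The function name `undo_recover_after_ckpt` and the tuple layout of log records are assumptions made for this example.

```python
def undo_recover_after_ckpt(log, disk):
    """Undo incomplete transactions, scanning back only to the last <CKPT>."""
    finished = set()
    scanned = 0
    for rec in reversed(log):
        if rec[0] == "CKPT":
            break                          # everything earlier had completed
        scanned += 1
        if rec[0] in ("COMMIT", "ABORT"):
            finished.add(rec[1])
        elif rec[0] == "UPDATE" and rec[1] not in finished:
            disk[rec[2]] = rec[3]          # restore the old value
    return disk, scanned


log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 1), ("COMMIT", "T1"),
    ("CKPT",),
    ("START", "T2"), ("UPDATE", "T2", "A", 5),   # T2 never committed
]
disk, scanned = undo_recover_after_ckpt(log, {"A": 9})
print(disk, scanned)   # {'A': 5} 2
```

Only the two records after the checkpoint are examined; the records of T1 are never read.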
[Figure: undo log with a <CKPT> record, labeled "other transactions" and "transactions T2,T3,T4,T5"]
Nonquiescent Checkpointing
Problem with checkpointing: database freezes during checkpoint
Would like to checkpoint while database is operational
Idea: nonquiescent checkpointing
Nonquiescent Checkpointing
Write a <START CKPT(T1,...,Tk)> where T1,...,Tk are all active transactions. Flush log to disk
Continue normal operation
When all of T1,...,Tk have completed, write <END CKPT>. Flush log to disk
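The bookkeeping behind these steps can be sketched as follows. This is a toy in-memory log, with a hypothetical class name `UndoLog`; flushing to disk is not modeled. It records which transactions were active at <START CKPT ...> and appends <END CKPT> once the last of them has committed, while new transactions keep starting in between.

```python
class UndoLog:
    """Toy log that supports nonquiescent checkpoints (flushes not modeled)."""

    def __init__(self):
        self.records = []
        self.active = set()
        self.pending_ckpt = None    # transactions the open checkpoint waits for

    def start(self, txn):
        self.active.add(txn)
        self.records.append(("START", txn))

    def commit(self, txn):
        self.active.discard(txn)
        self.records.append(("COMMIT", txn))
        self._maybe_end_ckpt(txn)

    def start_ckpt(self):
        self.pending_ckpt = set(self.active)
        self.records.append(("START CKPT", tuple(sorted(self.pending_ckpt))))
        self._maybe_end_ckpt(None)  # ends immediately if nothing was active

    def _maybe_end_ckpt(self, txn):
        if self.pending_ckpt is None:
            return
        self.pending_ckpt.discard(txn)
        if not self.pending_ckpt:
            self.records.append(("END CKPT",))
            self.pending_ckpt = None


ckpt_log = UndoLog()
ckpt_log.start("T1")
ckpt_log.start("T2")
ckpt_log.start_ckpt()       # T1 and T2 are active
ckpt_log.start("T3")        # normal operation continues
ckpt_log.commit("T1")
ckpt_log.commit("T2")       # last of T1,T2 done: END CKPT is written
for rec in ckpt_log.records:
    print(rec)
```

T3 is still active when <END CKPT> is written, which is exactly what distinguishes this scheme from the quiescent one.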
Quiescent = being quiet, still, or at rest; inactive
Non-quiescent = allowing transactions to be active
Implementing ROLLBACK
Recall: a transaction can end in COMMIT or ROLLBACK
Idea: use the undo-log to implement ROLLBACK
How?
  LSN = Log Sequence Number
  Log entries for the same transaction are linked, using the LSNs
  Read log in reverse, using LSN pointers
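A minimal sketch of this idea follows. The names `LogRecord`, `Log`, `prev_lsn`, and `rollback` are illustrative assumptions, not taken from the slides. Each update record stores the LSN of the previous record of the same transaction, so ROLLBACK can walk that chain backwards without reading other transactions' records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    elem: str
    old: int                      # old value, as in an undo log
    prev_lsn: Optional[int]       # previous record of the same transaction


class Log:
    def __init__(self):
        self.records = {}         # lsn -> LogRecord
        self.last_lsn = {}        # txn -> LSN of its most recent record
        self.next_lsn = 0

    def append_update(self, txn, elem, old):
        rec = LogRecord(self.next_lsn, txn, elem, old, self.last_lsn.get(txn))
        self.records[rec.lsn] = rec
        self.last_lsn[txn] = rec.lsn
        self.next_lsn += 1
        return rec.lsn


def rollback(log, txn, db):
    """Follow the transaction's LSN chain in reverse, restoring old values."""
    lsn = log.last_lsn.get(txn)
    while lsn is not None:
        rec = log.records[lsn]
        db[rec.elem] = rec.old
        lsn = rec.prev_lsn
    return db


db = {"A": 8, "B": 8}
log = Log()
log.append_update("T1", "A", db["A"]); db["A"] = 16
log.append_update("T2", "B", db["B"]); db["B"] = 20
log.append_update("T1", "A", db["A"]); db["A"] = 32
print(rollback(log, "T1", db))    # {'A': 8, 'B': 20}
```

The record of T2 sits between the two records of T1 but is skipped entirely, because the chain jumps from LSN 2 directly to LSN 0.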
Redo Logging
Log records
<START T> = transaction T has begun
<COMMIT T> = T has committed
<ABORT T> = T has aborted
<T,X,v> = T has updated element X, and its new value is v
Action        t    Mem A  Mem B  Disk A  Disk B  REDO Log
                                                 <START T>
READ(A,t)     8    8             8       8
t:=t*2        16   8             8       8
WRITE(A,t)    16   16            8       8       <T,A,16>
READ(B,t)     8    16     8      8       8
t:=t*2        16   16     8      8       8
WRITE(B,t)    16   16     16     8       8       <T,B,16>
COMMIT                                           <COMMIT T>
OUTPUT(A)     16   16     16     16      8
OUTPUT(B)     16   16     16     16      16
Redo-Logging Rules
R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X)
Step 2. Read log from the beginning, redo all updates of committed transactions
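Redo recovery can be sketched as two passes over the log. The first pass here, collecting the committed transactions, is an assumption about the step that precedes Step 2; the function name `redo_recover` and the tuple layout `("UPDATE", txn, element, new_value)` are also illustrative.

```python
def redo_recover(log, disk):
    """Replay, from the beginning of the log, the updates of committed transactions."""
    # Assumed first step: find out which transactions committed.
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Step 2: read the log from the beginning and redo their updates.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, elem, new = rec
            disk[elem] = new
    return disk


log = [("START", "T"), ("UPDATE", "T", "A", 16), ("UPDATE", "T", "B", 16)]

# No <COMMIT T> in the log: by rule R1 nothing was output, so disk is left alone.
print(redo_recover(log, {"A": 8, "B": 8}))                        # {'A': 8, 'B': 8}

# <COMMIT T> is in the log but the OUTPUTs may be missing: redo both.
print(redo_recover(log + [("COMMIT", "T")], {"A": 8, "B": 8}))    # {'A': 16, 'B': 16}
```

Redoing an update that already reached disk is harmless, since writing the same new value twice leaves the same result.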
Nonquiescent Checkpointing
Write a <START CKPT(T1,...,Tk)> where T1,...,Tk are all active transactions
Flush to disk all blocks of committed transactions (dirty blocks), while continuing normal operation
When all blocks have been written, write <END CKPT>
[Figure: redo log with checkpoint records; one is marked "Cannot use"]
Step 2: redo from the earliest start of T4, T5, T6, ignoring transactions committed earlier
Comparison Undo/Redo
Undo logging: Steal/Force
  OUTPUT must be done early
  If <COMMIT T> is seen, T definitely has written all its data to disk (hence, don't need to redo)
  Drawback: inefficient
Redo logging: No-Steal/No-Force
  OUTPUT must be done late
  If <COMMIT T> is not seen, T definitely has not written any of its data to disk (hence there is no dirty data on disk, no need to undo)
  Drawback: inflexible
Would like more flexibility on when to OUTPUT: undo/redo logging (next), Steal/No-Force

Undo/Redo Logging
Log records, only one change
<T,X,u,v> = T has updated element X, its old value was u, and its new value is v
Undo/Redo-Logging Rule
UR1: If T modifies X, then <T,X,u,v> must be written to disk before OUTPUT(X)
Note: we are free to OUTPUT early or late relative to <COMMIT T>
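Because a record <T,X,u,v> carries both values, recovery can go in either direction. The sketch below shows one common formulation, stated here as an assumption rather than as the procedure from the slides: redo committed transactions in log order, then undo the others in reverse order. The function name `undo_redo_recover` and the tuple layout `("UPDATE", txn, element, old, new)` are illustrative.

```python
def undo_redo_recover(log, disk):
    """Redo committed transactions forwards, undo uncommitted ones backwards."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                               # redo, earliest first
        if rec[0] == "UPDATE" and rec[1] in committed:
            disk[rec[2]] = rec[4]                 # install the new value v
    for rec in reversed(log):                     # undo, latest first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            disk[rec[2]] = rec[3]                 # restore the old value u
    return disk


log = [("START", "T"), ("UPDATE", "T", "A", 8, 16), ("UPDATE", "T", "B", 8, 16)]

# OUTPUT(A) happened, then a crash before <COMMIT T>: T is undone.
print(undo_redo_recover(log, {"A": 16, "B": 8}))                       # {'A': 8, 'B': 8}

# <COMMIT T> is in the log but OUTPUT(B) never happened: T is redone.
print(undo_redo_recover(log + [("COMMIT", "T")], {"A": 16, "B": 8}))   # {'A': 16, 'B': 16}
```

Both crash states have the same disk contents; only the presence of the commit record decides which way recovery goes.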
Action        t    Mem A  Mem B  Disk A  Disk B  Log
                                                 <START T>
READ(A,t)     8    8             8       8
t:=t*2        16   8             8       8
WRITE(A,t)    16   16            8       8       <T,A,8,16>
READ(B,t)     8    16     8      8       8
t:=t*2        16   16     8      8       8
WRITE(B,t)    16   16     16     8       8       <T,B,8,16>
OUTPUT(A)     16   16     16     16      8
                                                 <COMMIT T>
OUTPUT(B)     16   16     16     16      16