Crash Recovery: CS 186 Fall 2009 R&G - Chapter 18
(Figure: Performance Implications vs. Logging/Recovery Implications)
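Basic Idea: Logging
(Figure: what is stored where, in the LOG, in RAM, and in the DB.)
• LOG: LogRecords, with fields prevLSN, XID, type, pageID, length, offset, before-image, after-image.
• RAM: the Xact Table (lastLSN and status for each Xact), the Dirty Page Table (recLSN for each page), and flushedLSN.
• DB: data pages, each with a pageLSN, plus the master record (LSN of the most recent checkpoint).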
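The state on this slide can be written down as plain Python data structures. This is a minimal sketch that assumes dataclasses and dicts. The class names, the snake_case field names, and `can_write_page` are hypothetical. The method encodes the write-ahead-logging rule that links pageLSN and flushedLSN: a dirty page may be written to the DB only after the log has been flushed at least up to that page's pageLSN.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LogRecord:
    # Fields from the LogRecords box. The page-specific ones stay at their
    # defaults for records such as commit, abort, or end.
    lsn: int
    prev_lsn: Optional[int]      # previous log record of the same Xact
    xid: str
    type: str                    # e.g. "update", "commit", "abort", "CLR", "end"
    page_id: Optional[str] = None
    offset: int = 0
    length: int = 0
    before_image: bytes = b""
    after_image: bytes = b""


@dataclass
class XactEntry:
    last_lsn: int
    status: str                  # hypothetical codes: "r", "c", "a"


@dataclass
class RecoveryState:
    xact_table: Dict[str, XactEntry] = field(default_factory=dict)
    dirty_page_table: Dict[str, int] = field(default_factory=dict)  # pageID -> recLSN
    flushed_lsn: int = -1        # largest LSN already forced to the log on disk
    # pageLSN of each page; a real system keeps this on the page itself.
    page_lsn: Dict[str, int] = field(default_factory=dict)

    def can_write_page(self, page_id: str) -> bool:
        # WAL: the log must be on disk up to the page's pageLSN first.
        return self.page_lsn.get(page_id, -1) <= self.flushed_lsn
```

For example, with flushedLSN = 20, a page whose pageLSN is 10 may be flushed, while a page whose pageLSN is 30 must wait until the log is forced further.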
Crash Recovery: Big Picture
(Figure: a log timeline marking the oldest log rec. of an Xact active at crash, the smallest recLSN in the dirty page table after Analysis, the last chkpt, and the CRASH, with the spans covered by A = Analysis, R = REDO, U = UNDO.)
• Start from a checkpoint (found via master record).
• Three phases. Need to:
  1. Analysis - update structures:
     – Trans Table: which Xacts were active at time of crash.
     – Dirty Page Table: which pages might have been dirty in the buffer pool at time of crash.
  2. REDO all actions (repeat history).
  3. UNDO effects of failed Xacts.
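A small sketch of which stretch of the log each phase touches. It assumes the log is available as (LSN, XID) pairs and that Analysis has already produced the Dirty Page Table and the set of losers (Xacts active at the crash). The function and parameter names are hypothetical. Both the REDO start point and the UNDO stop point can lie before the last checkpoint, which is why the figure places them to its left.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def phase_ranges(log: Iterable[Tuple[int, str]],
                 checkpoint_lsn: int,
                 dirty_page_table: Dict[str, int],
                 losers: Set[str]) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return (from, to) LSN bounds of the log region each phase visits.

    log: (lsn, xid) pairs for the records written before the crash.
    dirty_page_table, losers: results of the Analysis phase.
    """
    records = list(log)
    end = max(lsn for lsn, _ in records)
    # REDO begins at the smallest recLSN in the DPT after Analysis.
    redo_start = min(dirty_page_table.values(), default=None)
    # UNDO walks backwards to the oldest record of any Xact active at crash.
    undo_stop = min((lsn for lsn, xid in records if xid in losers), default=None)
    return {
        "analysis": (checkpoint_lsn, end),  # forward scan
        "redo": (redo_start, end),          # forward scan
        "undo": (end, undo_stop),           # backward scan
    }
```

With the running example's log (LSNs 10 to 60), DPT {P5: 10, P3: 20, P1: 50}, and losers T2 and T3, REDO covers LSNs 10 to 60 and UNDO goes back from 60 to 20.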
Recovery: The Analysis Phase
• Re-establish knowledge of state at checkpoint.
– via transaction table and dirty page table stored in the checkpoint
• Scan log forward from checkpoint.
– End record: Remove Xact from Xact table.
– All other records: Add Xact to Xact table (if not already there), set lastLSN=LSN, change Xact
status on commit.
– Also, for Update records: If page P is not in the Dirty Page Table, add P to the
DPT and set its recLSN=LSN.
• At end of Analysis…
– transaction table says which xacts were active at time of crash.
– DPT says which dirty pages might not have made it to disk
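The Analysis rules above fit in one forward loop. This sketch assumes a simplified record type `Rec` and status codes "r"/"c"/"a" (all hypothetical), and both tables are assumed to start from the checkpoint's copies. It also adds pages touched by CLRs to the DPT. That goes slightly beyond the slide's wording, which names only update records, but CLRs modify pages too.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rec:
    lsn: int
    xid: str
    type: str                   # "update", "CLR", "commit", "abort", or "end"
    page: Optional[str] = None  # set for update and CLR records


def analysis(log: List[Rec], xact_table: Dict[str, dict], dpt: Dict[str, int]) -> None:
    """Scan the records after the checkpoint, updating both tables in place."""
    for rec in log:
        if rec.type == "end":
            xact_table.pop(rec.xid, None)   # End record: remove Xact
            continue
        entry = xact_table.setdefault(rec.xid, {"lastLSN": rec.lsn, "status": "r"})
        entry["lastLSN"] = rec.lsn
        if rec.type == "commit":
            entry["status"] = "c"
        elif rec.type == "abort":
            entry["status"] = "a"           # assumption: track aborting Xacts
        if rec.type in ("update", "CLR") and rec.page not in dpt:
            dpt[rec.page] = rec.lsn         # recLSN: first LSN that dirtied P
```

On the running example (empty tables at the checkpoint), this leaves T2 (lastLSN 60) and T3 (lastLSN 50) in the Xact table and {P5: 10, P3: 20, P1: 50} in the DPT.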
Phase 2: The REDO Phase
• We repeat History to reconstruct state at crash:
– Reapply all updates (even of aborted Xacts!), redo CLRs.
• Scan forward from log rec containing smallest recLSN in DPT. Q:
why start here?
• For each update log record or CLR with a given LSN, REDO the
action unless:
– Affected page is not in the Dirty Page Table, or
– Affected page is in D.P.T., but has recLSN > LSN, or
– pageLSN (in DB) >= LSN. (This last case requires I/O.)
• To REDO an action:
– Reapply logged action.
– Set pageLSN to LSN. No additional logging, no forcing!
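A sketch of the REDO loop with the three skip conditions in the order listed. The first two are decided from the DPT alone, while the third needs the page fetched. It assumes physical redo (each record carries the value it leaves on the page) and models the disk as a dict of hypothetical `Page` objects. `Rec`, `Page`, and `redo` are illustrative names, not from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rec:
    lsn: int
    type: str                    # "update", "CLR", or anything else
    page: Optional[str] = None
    after: Optional[str] = None  # value the logged action leaves on the page


@dataclass
class Page:
    page_lsn: int
    value: Optional[str] = None


def redo(log: List[Rec], dpt: Dict[str, int], disk: Dict[str, Page]) -> List[int]:
    """Repeat history; return the LSNs that were actually reapplied."""
    if not dpt:
        return []
    start = min(dpt.values())           # smallest recLSN in the DPT
    redone: List[int] = []
    for rec in log:
        if rec.lsn < start or rec.type not in ("update", "CLR"):
            continue
        if rec.page not in dpt or dpt[rec.page] > rec.lsn:
            continue                    # change known to be on disk; no I/O
        page = disk[rec.page]           # fetching the page costs an I/O
        if page.page_lsn >= rec.lsn:
            continue                    # page already reflects this action
        page.value = rec.after          # reapply logged action
        page.page_lsn = rec.lsn         # no new log record, no forcing
        redone.append(rec.lsn)
    return redone
```

With the example's DPT and a hypothetical disk state in which P5 was flushed right after LSN 10 (pageLSN 10) while P3 and P1 are stale, this reapplies LSNs 20, 40, 50, and 60 and skips 10.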
Phase 3: The UNDO Phase
(Figure: RAM holds the Xact Table (lastLSN, status), the Dirty Page Table (recLSN), flushedLSN, and ToUndo.)
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5
20   update: T2 writes P3
30   T1 abort
40   CLR: Undo T1 LSN 10, UndoNxt=Null
45   T1 End
50   update: T3 writes P1
60   update: T2 writes P5
     CRASH, RESTART
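The slide names the ToUndo set. In the textbook ARIES undo loop, ToUndo starts as the lastLSN of every loser, and the largest LSN is repeatedly removed. A CLR contributes its UndoNxtLSN. An update is undone, a CLR is written with UndoNxtLSN = its prevLSN, and that prevLSN is added. When the next LSN is null, an End record is written. The sketch below follows that loop but only renders the new records as text. `Rec`, `undo`, and the fixed LSN step of 10 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Rec:
    lsn: int
    xid: str
    type: str                        # "update", "CLR", "abort", "end", ...
    prev_lsn: Optional[int] = None   # PrvL: previous record of the same Xact
    undo_next: Optional[int] = None  # UndoNxtLSN (meaningful for CLRs only)


def _fmt(lsn: Optional[int]) -> str:
    return "null" if lsn is None else str(lsn)


def undo(log: Dict[int, Rec], to_undo: Set[int], next_lsn: int) -> List[str]:
    """Roll back loser Xacts; return the new log records, rendered as text.

    log maps LSN -> record, to_undo holds the lastLSN of every loser, and
    next_lsn is a hypothetical LSN for the first record this pass writes.
    """
    written: List[str] = []
    pending = set(to_undo)
    while pending:
        lsn = max(pending)               # undo the most recent action first
        pending.remove(lsn)
        rec = log[lsn]
        if rec.type == "CLR":
            nxt = rec.undo_next          # skip work that is already undone
        else:
            if rec.type == "update":
                # A real system restores the before-image here, appends the
                # CLR to the log, and updates the Xact's lastLSN.
                written.append(f"{next_lsn} CLR: Undo {rec.xid} LSN {lsn}; "
                               f"UndoNxtLSN={_fmt(rec.prev_lsn)}")
                next_lsn += 10
            nxt = rec.prev_lsn           # follow PrvL (updates, abort records)
        if nxt is None:
            written.append(f"{next_lsn} {rec.xid} end")  # Xact fully undone
            next_lsn += 10
        else:
            pending.add(nxt)
    return written
```

Starting from ToUndo = {50, 60} on this log, it writes CLRs for LSNs 60, 50, and 20 in the same order as the "Undo & Crash During Restart!" example. Only the LSNs from the T3 end record onward differ, because of the fixed step of 10.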
Example (cont.): Analysis & Redo
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5
20   update: T2 writes P3
30   T1 abort
40   CLR: Undo T1 LSN 10, UndoNxt=Null
45   T1 End
50   update: T3 writes P1
60   update: T2 writes P5
     CRASH, RESTART

Xact Table
Trans  lastLSN     Stat
T1     10, 30, 40  r, a  (entry removed by T1 End at LSN 45)
T2     20, 60      r
T3     50          r

Dirty Page Table
PageId  recLSN
P5      10
P3      20
P1      50

Redo starts at LSN 10; in this case, it reads P5, P3, and P1 from disk and redoes ops if pageLSN < LSN.
Ex (cont.): Undo & Crash During Restart!
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5; PrvL=null
20   update: T2 writes P3; PrvL=null
30   T1 abort
40   CLR: Undo T1 LSN 10
45   T1 End
50   update: T3 writes P1; PrvL=null
60   update: T2 writes P5; PrvL=20
     CRASH, RESTART
70   CLR: Undo T2 LSN 60; UndoNxtLSN=20
80   CLR: Undo T3 LSN 50; UndoNxtLSN=null
85   T3 end
     CRASH, RESTART
90   CLR: Undo T2 LSN 20; UndoNxtLSN=null
100  T2 end

ToUndo, first restart:
• After Analysis/Redo: ToUndo: 50 & 60
• After CLR 70: ToUndo: 50 & 20
• After CLR 80 and T3 end: ToUndo: 20
ToUndo, second restart:
• After Analysis/Redo: ToUndo: 70
• After following UndoNxtLSN of CLR 70: ToUndo: 20
• After CLR 90 and T2 end: ToUndo: empty. Finished!
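The ToUndo column on this slide can be reproduced with a short trace function. In this sketch, the log is encoded as a hypothetical mapping LSN -> (type, PrvL, UndoNxtLSN), and only the ToUndo bookkeeping is modeled, not page changes or the records written. Each step removes the largest LSN. A CLR contributes its UndoNxtLSN, and any other record contributes its PrvL.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: LSN -> (type, PrvL, UndoNxtLSN)
Log = Dict[int, Tuple[str, Optional[int], Optional[int]]]


def to_undo_trace(log: Log, to_undo: Set[int]) -> List[List[int]]:
    """Return the ToUndo set (largest LSN first) before each step, then []."""
    trace: List[List[int]] = []
    pending = set(to_undo)
    while pending:
        trace.append(sorted(pending, reverse=True))
        lsn = max(pending)
        pending.remove(lsn)
        kind, prev_lsn, undo_next = log[lsn]
        # CLRs are never undone themselves: jump to their UndoNxtLSN.
        nxt = undo_next if kind == "CLR" else prev_lsn
        if nxt is not None:
            pending.add(nxt)
    trace.append([])                    # ToUndo empty: finished
    return trace
```

Starting from {50, 60}, the trace is 60 & 50, then 50 & 20, then 20 (where the second crash hit). Starting from {70} after the second restart, it is 70, then 20, then empty. So CLR 70 keeps the second restart from undoing LSN 60 a second time.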
Additional Crash Issues
• What happens if system crashes during Analysis? During
REDO?