Database Recovery
Overview
• Many types of failures:
– Transaction failure: bad input, data not found, etc.
– System crash: bugs in OS, DBMS, loss of power, etc.
– Disk failure: disk head crash
• Recovery manager is called after a system crash to
restore the DBMS to the consistent state before the crash.
– Ensures two transaction properties:
• Atomicity: undo all actions of uncommitted transactions.
• Durability: actions of committed transactions survive failures
(redo their update actions if they have not been written to disk).
• ARIES: log-based recovery algorithm.
ARIES Overview
• Assume HW support:
– Log actions on an independent “crash-safe” storage
• What are SW problems?
– Results of uncommitted transactions may have been written to disk → undo them
– Results of committed transactions may not have been written to disk → redo them
– Questions:
• What are the states of transactions at the time of the crash?
• What are the states of pages (dirty or clean) at the time of the crash?
• Where to start undo & redo?
ARIES General Approach
• Before crash:
– Log changes to DB (WAL)
– Checkpoints
• After crash:
– Analysis phase
• Figure out states of transactions [committed vs. uncommitted] &
pages [dirty or clean]
– Redo phase
• Repeat actions from uncommitted & committed transactions [till
the time of the crash]
– Undo phase
• Undo actions of uncommitted transactions
• Do we really need “redo” & “undo”? Under what conditions are they unnecessary?
– It depends on page replacement in the buffer pool.
– E.g., allow only committed transactions to update data on disk.
Three Phases of ARIES
[Figure: the log, with the Analysis phase starting at the most recent checkpoint (C).]
Steal Policy
• ARIES is designed to work with a steal, no-force approach.
– Related to page replacement policies
• Steal property:
– Can the changes made to an object O in the buffer pool by T1 be written to disk before T1 commits?
– If yes, we have a steal (T2 steals a frame from T1).
– Say T2 wants to bring in a new page (Q), and the buffer pool replaces the frame containing O.
[Figure: T1 issues W(O) and T2 issues R(Q); the buffer pool writes O to disk and reads Q from disk.]
Force Policy
• When T1 commits, do we ensure that all changes T1 has made are immediately forced to disk?
• If yes, we have a force approach.
[Figure: T1 issues W(O) and then Commit; the buffer pool writes O to disk.]
Steal, No-Force Write Policies
• ARIES can recover from crashes in a DB that uses a steal & no-force
write policy:
– Modified pages may be written to disk before a transaction commits.
– Modified pages are not necessarily written to disk by the time a transaction commits.
• A “no-steal & force” write policy makes recovery really easy,
but the tradeoff is low DB performance.
– Why?
– It adds constraints to an otherwise optimal buffer replacement policy.
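The link between the two policies and the recovery work they create can be written down as a small sketch. The function name `recovery_needs` is made up for illustration, and the sketch ignores corner cases such as a crash in the middle of forcing pages at commit.

```python
def recovery_needs(steal: bool, force: bool) -> dict:
    """Which recovery actions a buffer policy makes necessary (simplified)."""
    return {
        # Steal lets uncommitted changes reach disk, so they may need undoing.
        "undo": steal,
        # No-force lets committed changes stay only in memory, so they may need redoing.
        "redo": not force,
    }


for steal in (False, True):
    for force in (True, False):
        print(f"steal={steal} force={force} -> {recovery_needs(steal, force)}")
```

Under these assumptions, only the steal, no-force combination that ARIES targets needs both undo and redo.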
ARIES
• ARIES is a recovery algorithm
that can work with a steal,
no-force write policy.
• ARIES is invoked after a crash.
This process is called restart.
• ARIES maintains a history of
actions executed by DBMS in
a log.
– The log is stored on stable
storage and must survive
crashes (e.g., use mirrored RAID-1 disks).
Three Phases in ARIES (1)
• Goal:
– Restore the DB to the state (buffer pool & disk) before the crash
– AND without effects from actions of active (uncommitted) transactions.
LSN LOG
10 Update: T1 writes P5
20 Update: T2 writes P3
Three Phases in ARIES (1)
• Redo Phase:
– Repeat actions (active & committed), starting from a redo point in the log, and restore the database state to what it was at the time of the crash.
– Where is the redo point?
LSN LOG
10 Update: T1 writes P5
20 Update: T2 writes P3
30 T2 commits
40 T2 ends
50 Update: T3 writes P1
60 Update: T3 writes P3
CRASH, RESTART
Three Phases in ARIES (2)
• Undo Phase:
– Undo actions of active transactions in reverse-time order, so that the DB reflects only the actions of committed transactions.
• LSNs 70 -> 60 -> 50 -> 10
• Why in reverse-time order?
– Consider forward-time order. P5 will be restored to a value written by T1 (in LSN #10), rather than the value before it.
LSN LOG
10 Update: T1 writes P5
20 Update: T2 writes P3
30 T2 commits
40 T2 ends
50 Update: T3 writes P1
60 Update: T3 writes P3
70 Update: T3 writes P5
CRASH, RESTART
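A minimal sketch of why the order matters, using only the two writes to P5 (LSN 10 by T1 and LSN 70 by T3). The page values "a", "b", "c" and the helper `undo_in_order` are made up for illustration; the log on the slide does not show before-images.

```python
# (LSN, page, before_image) for the P5 updates of the active transactions T1 and T3.
# Hypothetical values: P5 was "a", T1 wrote "b" (LSN 10), T3 wrote "c" (LSN 70).
loser_updates = [(10, "P5", "a"), (70, "P5", "b")]


def undo_in_order(updates, pages):
    """Restore before-images in the given order and return the resulting pages."""
    pages = dict(pages)
    for _lsn, page, before in updates:
        pages[page] = before
    return pages


at_crash = {"P5": "c"}
reverse = undo_in_order(sorted(loser_updates, reverse=True), at_crash)
forward = undo_in_order(sorted(loser_updates), at_crash)
print(reverse)  # {'P5': 'a'}  value before any uncommitted write
print(forward)  # {'P5': 'b'}  still the value written by T1
```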
Three Principles of ARIES
• Write-Ahead Logging (WAL)
– Update to a DB object is first recorded in the log.
– The log record must be forced to stable storage before the DB object is
written to disk.
• How is WAL different from Force-Write?
– WAL forces the log to disk; force-write forces the data pages to disk.
• Repeating History During Redo
– On restart, redo the actions (recorded in the log) to bring the system back to
the exact state at the time of crash. Then undo the actions of active (not
committed) transactions.
• Logging Changes During Undo
– Since undo may change the DB, log these changes too, so that an undone action is not undone again if recovery itself is restarted.
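The WAL rule can be sketched as a toy buffer manager. The class `WalBuffer` and all of its attribute names are hypothetical, and "stable storage" is simulated by a counter. The sketch only shows the ordering constraint: the log is forced up to a page's latest change before the page itself is written.

```python
class WalBuffer:
    """Toy buffer manager that enforces the WAL rule (illustrative names only)."""

    def __init__(self):
        self.log = []          # all log records, in LSN order
        self.flushed_lsn = -1  # highest LSN assumed to be on stable storage
        self.page_lsn = {}     # page id -> LSN of the last record that changed it
        self.buffer = {}       # page id -> value in the buffer pool
        self.disk = {}         # page id -> value on disk

    def update(self, lsn, page, value):
        self.log.append((lsn, page, value))  # record the change in the log tail first
        self.buffer[page] = value
        self.page_lsn[page] = lsn

    def flush_log(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def write_page(self, page):
        # WAL: force the log up to the page's latest change before writing the page.
        if self.page_lsn[page] > self.flushed_lsn:
            self.flush_log(self.page_lsn[page])
        self.disk[page] = self.buffer[page]


wb = WalBuffer()
wb.update(10, "P5", "new value")
wb.write_page("P5")
print(wb.flushed_lsn)  # 10
```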
How is ARIES explained?
• Describe the needed data structure for recovery
– WAL
– Data page
– Transaction table
– Dirty page table
– Checkpoints
• Describe the algorithm
– no crash during recovery
– crash during recovery
Log Structure
• Log contains history of actions executed by the DBMS.
• A DB action is recorded in a log record.
• Log Tail: most recent portion of the log in main memory.
– It is periodically forced to stable storage.
– Aren’t all records in a log in stable storage? No, records reach stable storage only when the log tail is forced, e.g., before a page is written to disk or when a transaction commits.
• Log Sequence Number (LSN): unique ID for each log record.
LSN LOG
10 Update: T1 writes P5
20 Update: T2 writes P3
30 T2 commits
40 T2 ends
50 Update: T3 writes P1
60 Update: T3 writes P3
Data Page
• PageLSN: the LSN of the most recent log record that made a change to this page.
– Every page in the DB must have a pageLSN.
– What is P3’s pageLSN?
• 60 or 20
– It is used in the Redo phase of the algorithm.
LSN LOG
10 Update: T1 writes P5
20 Update: T2 writes P3
30 T2 commits
40 T2 ends
50 Update: T3 writes P1
60 Update: T3 writes P3
What Actions to Record Log?
• A log record is written for each of the following actions:
– Updating a page: when a transaction writes a DB object, it writes an
update type record. It also updates the pageLSN to this record’s LSN.
– Commit: when a transaction commits, it force-writes a commit type log
record to stable storage.
– Abort: when a transaction is aborted, it writes an abort type log record.
– End: when a transaction has finished aborting or committing, it writes an
end type log record.
– Undoing an update: when a transaction is rolled back (being aborted, or
during crash recovery), it undoes the updates and writes a compensation log
record (CLR) for each one.
Log Record
Fields common to all log records: prevLSN, transID, Type
Additional fields for update log records: PageID, Offset, Length, Before-image, After-image
PrevLSN: LSN of the previous log record in the same transaction. It forms a
singly linked list of log records going back in time. It will be used for
recovery.
Type: update, commit, abort, end, CLR
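The record layout and the prevLSN chain can be sketched as follows. The class names `LogRecord` and `Log`, the field spellings, and the LSN scheme (0, 10, 20, ...) are assumptions chosen to mirror the slides; offset and length are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    prev_lsn: Optional[int]       # previous record of the same transaction
    trans_id: str
    rec_type: str                 # "update", "commit", "abort", "end", "CLR"
    page_id: Optional[str] = None
    before: Optional[str] = None  # before-image (update records only)
    after: Optional[str] = None   # after-image (update records only)


class Log:
    def __init__(self):
        self.records = []
        self.last_lsn = {}  # trans_id -> LSN of that transaction's latest record

    def append(self, trans_id, rec_type, **fields):
        lsn = 10 * len(self.records)  # toy LSNs: 0, 10, 20, ...
        rec = LogRecord(lsn, self.last_lsn.get(trans_id), trans_id, rec_type, **fields)
        self.records.append(rec)
        self.last_lsn[trans_id] = lsn
        return rec

    def history(self, trans_id):
        """Follow prevLSN pointers backwards from the transaction's latest record."""
        lsns, lsn = [], self.last_lsn.get(trans_id)
        while lsn is not None:
            lsns.append(lsn)
            lsn = self.records[lsn // 10].prev_lsn  # valid only for the toy LSN scheme
        return lsns


log = Log()
log.append("T1000", "update", page_id="P500")
log.append("T2000", "update", page_id="P600")
log.append("T2000", "update", page_id="P500")
log.append("T1000", "update", page_id="P505")
print(log.history("T1000"))  # [30, 0]
```

Walking `history` from the latest record is the traversal that undo relies on.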
Compensation Log Record (CLR)
CLR is written when undoing an update (e.g., log record #30 of T1000) after an abort (or during
crash recovery).
CLR records undoNextLSN, the LSN of the next log record that is to be undone
for T1000; this is the prevLSN of log record #30.
undoNextLSN is used for undoing actions in reverse order.
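A minimal sketch of writing a CLR, assuming log records are plain dictionaries and the log is reduced to T1000's two updates. The helper `undo_update`, the dictionary keys, and the LSN given to the CLR are all hypothetical.

```python
def undo_update(log, lsn):
    """Undo the update at `lsn`, append a CLR to `log`, and return the CLR."""
    rec = next(r for r in log if r["lsn"] == lsn)
    clr = {
        "lsn": max(r["lsn"] for r in log) + 10,  # toy LSN assignment
        "trans": rec["trans"],
        "type": "CLR",
        "page": rec["page"],
        # Next record of this transaction that still has to be undone.
        "undoNextLSN": rec["prevLSN"],
    }
    log.append(clr)
    return clr


log = [
    {"lsn": 0, "prevLSN": None, "trans": "T1000", "type": "update", "page": "P500"},
    {"lsn": 30, "prevLSN": 0, "trans": "T1000", "type": "update", "page": "P505"},
]
clr = undo_update(log, 30)
print(clr["undoNextLSN"])  # 0
```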
Other Recovery-Related Structures
• Transaction Table: one entry for each active (uncommitted) transaction. Each entry has a transID and a lastLSN.
LSN TransID Type PageID / Length
00 T1000 update P500 / 3
Transaction table:
transID lastLSN
T1000 30
T2000 20
Dirty page table:
pageID recLSN
P500 00
P600 10
P505 30
Checkpointing
• A checkpoint is a snapshot of DBMS state stored in stable
storage.
• Checkpointing in ARIES has three steps:
(1) write begin_checkpoint record to log
(2) write the state of transaction table and dirty page table +
end_checkpoint record to log
(3) write a special master record containing LSN of begin_checkpoint log
record.
• Why checkpointing?
– The restart process will look for the most recent checkpoint & start
analysis from there.
– Shorten the recovery time -> take frequent checkpoints.
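The three checkpointing steps can be sketched as below. The function `take_checkpoint`, the dictionary-based log records, and the use of the list position as the LSN are assumptions made for illustration.

```python
import copy


def take_checkpoint(log, trans_tab, dirty_page_tab, master):
    """Append checkpoint records to `log` and point the master record at them."""
    begin_lsn = len(log)                        # toy LSN: position in the log
    log.append({"type": "begin_checkpoint"})    # step 1
    log.append({                                # step 2: snapshot of both tables
        "type": "end_checkpoint",
        "trans_tab": copy.deepcopy(trans_tab),
        "dirty_page_tab": copy.deepcopy(dirty_page_tab),
    })
    master["begin_checkpoint_lsn"] = begin_lsn  # step 3
    return begin_lsn


log = [{"type": "update"}]
master = {}
take_checkpoint(log, {"T1000": 30}, {"P500": 0}, master)
print(master)  # {'begin_checkpoint_lsn': 1}
```

On restart, the master record tells the analysis phase where the most recent checkpoint begins.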
Recovering from a System Crash
• Recovery uses the WAL & the most recent checkpoint
– Write-ahead log
• The most recent checkpoint
• Compensation Log Records
– undoNextLSN: the LSN of the next log record that is to be undone
– Transaction table
• active (not committed) transactions
• lastLSN: the LSN of the most recent log record for this transaction
(restored in the analysis phase)
• Used for undo
– Dirty page table
• dirty (not written to disk) pages
• recLSNs: LSN of the first log record that caused this page to become dirty
• Used for redo
Analysis Phase
• Determine three things:
– A point in the log to start REDO.
• Earliest update log record whose change may not have been written to disk.
– Dirty pages in the buffer pool at the time of crash -> restore the dirty
page table to the time of crash.
– Active transactions at time of crash for UNDO -> restore the
transaction table to the time of crash.
Analysis Phase: Algorithm
1. Find the most recent begin_checkpoint log record.
2. Initialize transaction & dirty page tables from the ones
saved in the most recent checkpoint.
3. Scan forward through the records from the begin_checkpoint log record
to the end of the log. For each log record, update
trans_tab and dirty_page_tab as follows:
– If we see an end log record for T, remove T from trans_tab.
– If we see any other log record for T’, add T’ to trans_tab if it is not
already there, and set the lastLSN field of T’ to the record’s LSN.
– If we see an update/CLR log record for page P and P is not in the
dirty page table, add P to the dirty page table and set its recLSN to the record’s LSN.
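The forward scan of step 3 can be sketched as below, run on the five-record log used in the example slides. The function name `analysis` and the tuple layout are assumptions. The `status` field is also an assumption: it follows the common textbook presentation, where a commit record marks the transaction as committed so that it is not treated as a loser.

```python
def analysis(log, trans_tab=None, dirty_page_tab=None):
    """Rebuild both tables by scanning `log` forward from the checkpoint."""
    trans_tab = dict(trans_tab or {})            # transID -> {"lastLSN", "status"}
    dirty_page_tab = dict(dirty_page_tab or {})  # pageID -> recLSN
    for lsn, trans, rec_type, page in log:
        if rec_type == "end":
            trans_tab.pop(trans, None)
            continue
        entry = trans_tab.setdefault(trans, {"lastLSN": lsn, "status": "U"})
        entry["lastLSN"] = lsn
        if rec_type == "commit":
            entry["status"] = "C"  # assumed status field, not shown on the slides
        if rec_type in ("update", "CLR") and page not in dirty_page_tab:
            dirty_page_tab[page] = lsn
    return trans_tab, dirty_page_tab


log = [
    (0, "T1000", "update", "P500"),
    (10, "T2000", "update", "P600"),
    (20, "T2000", "update", "P500"),
    (30, "T1000", "update", "P505"),
    (40, "T2000", "commit", None),
]
trans_tab, dirty_page_tab = analysis(log)  # no checkpoint: start from empty tables
losers = {t: e["lastLSN"] for t, e in trans_tab.items() if e["status"] == "U"}
print(losers)          # {'T1000': 30}
print(dirty_page_tab)  # {'P500': 0, 'P600': 10, 'P505': 30}
```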
Analysis Phase: Example (1)
• After system crash, both tables are lost.
• No previous checkpointing, so initialize both tables to empty.
LSN TransID Type PageID
00 T1000 update P500
10 T2000 update P600
20 T2000 update P500
30 T1000 update P505
40 T2000 commit
System Crash
Tables as the scan proceeds:
Initially:
transID lastLSN: (empty)
pageID recLSN: (empty)
After LSN 00:
transID lastLSN: T1000 00
pageID recLSN: P500 00
After LSN 10:
transID lastLSN: T1000 00, T2000 10
pageID recLSN: P500 00, P600 10
After LSN 20:
transID lastLSN: T1000 00, T2000 20
pageID recLSN: P500 00, P600 10
After LSN 30:
transID lastLSN: T1000 30, T2000 20
pageID recLSN: P500 00, P600 10, P505 30
Redo Phase: Example (1)
• Scan forward from the redo point (LSN 00).
LSN TransID Type PageID
00 T1000 update P500
Transaction table:
transID lastLSN
T1000 30
Dirty page table:
pageID recLSN
P500 00
P600 10
P505 30
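A minimal sketch of the redo scan on the same example log. The skip conditions are the usual textbook ones (page not in the dirty page table, recLSN later than the record, or pageLSN on disk already at least the record's LSN); the slides only state that pageLSN is used during redo. The function `redo` and the on-disk pageLSN values are hypothetical: here P600 is assumed to have been flushed after LSN 10.

```python
def redo(log, dirty_page_tab, page_lsn_on_disk):
    """Return the LSNs whose changes are reapplied, in log order."""
    if not dirty_page_tab:
        return []
    redo_point = min(dirty_page_tab.values())  # smallest recLSN
    page_lsn = dict(page_lsn_on_disk)
    redone = []
    for lsn, _trans, rec_type, page in log:
        if lsn < redo_point or rec_type not in ("update", "CLR"):
            continue
        if page not in dirty_page_tab or dirty_page_tab[page] > lsn:
            continue  # the change is known to be on disk already
        if page_lsn.get(page, -1) >= lsn:
            continue  # the disk copy of the page already reflects this record
        page_lsn[page] = lsn  # reapply the change and stamp the page
        redone.append(lsn)
    return redone


log = [
    (0, "T1000", "update", "P500"),
    (10, "T2000", "update", "P600"),
    (20, "T2000", "update", "P500"),
    (30, "T1000", "update", "P505"),
    (40, "T2000", "commit", None),
]
dirty_page_tab = {"P500": 0, "P600": 10, "P505": 30}
print(redo(log, dirty_page_tab, {"P600": 10}))  # [0, 20, 30]
```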
Undo Phase: Example (1)
• The only loser transaction is T1000.
LSN TransID Type PageID
00 T1000 update P500
Transaction table:
transID lastLSN
T1000 30
Dirty page table:
pageID recLSN
P500 00
P600 10
P505 30
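A minimal sketch of the undo pass for this example, assuming dictionary log records with a prevLSN field. The function `undo`, the record keys, and the LSNs assigned to the CLRs (50 and 60) are hypothetical. The loop always picks the largest remaining LSN, so updates are undone in reverse-time order.

```python
def undo(log, losers):
    """Undo all loser transactions; return the CLRs that would be appended."""
    by_lsn = {rec["lsn"]: rec for rec in log}
    to_undo = set(losers.values())  # lastLSN of every loser transaction
    next_lsn = max(by_lsn) + 10     # toy LSN assignment for new CLRs
    clrs = []
    while to_undo:
        lsn = max(to_undo)          # latest record still to be processed
        to_undo.remove(lsn)
        rec = by_lsn[lsn]
        if rec["type"] == "update":
            clrs.append({"lsn": next_lsn, "trans": rec["trans"], "type": "CLR",
                         "page": rec["page"], "undoNextLSN": rec["prevLSN"]})
            next_lsn += 10
            follow = rec["prevLSN"]
        elif rec["type"] == "CLR":
            follow = rec["undoNextLSN"]  # skip what has already been undone
        else:
            follow = rec["prevLSN"]
        if follow is not None:
            to_undo.add(follow)
    return clrs


log = [
    {"lsn": 0, "prevLSN": None, "trans": "T1000", "type": "update", "page": "P500"},
    {"lsn": 10, "prevLSN": None, "trans": "T2000", "type": "update", "page": "P600"},
    {"lsn": 20, "prevLSN": 10, "trans": "T2000", "type": "update", "page": "P500"},
    {"lsn": 30, "prevLSN": 0, "trans": "T1000", "type": "update", "page": "P505"},
    {"lsn": 40, "prevLSN": 20, "trans": "T2000", "type": "commit", "page": None},
]
clrs = undo(log, {"T1000": 30})
print([(c["page"], c["undoNextLSN"]) for c in clrs])  # [('P505', 0), ('P500', None)]
```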