Database Recovery

The document discusses crash recovery in database management systems, focusing on the ARIES recovery algorithm, which ensures atomicity and durability through log-based recovery. It outlines the types of failures, the recovery process involving analysis, redo, and undo phases, and the importance of write-ahead logging and checkpointing. Additionally, it explains the structures used in recovery, such as transaction and dirty page tables, and the principles guiding the ARIES algorithm.

Uploaded by

kaushal56
Copyright
© © All Rights Reserved

Crash Recovery

Overview
• Many types of failures:
– Transaction failure: bad input, data not found, etc.
– System crash: bugs in OS, DBMS, loss of power, etc.
– Disk failure: disk head crash
• The recovery manager is called after a system crash to restore the DBMS to a consistent state.
– It ensures two transaction properties:
• Atomicity: undo all actions of uncommitted transactions.
• Durability: actions of committed transactions survive failures (redo their updates if they have not been written to disk).
• ARIES: a log-based recovery algorithm.
ARIES Overview
• Assume hardware support:
– Log actions on independent “crash-safe” storage.
• What are the software problems?
– Results of uncommitted transactions may have been written to disk → undo them.
– Results of committed transactions may not have been written to disk → redo them.
– Questions:
• What are the states of the transactions at the time of the crash?
• What are the states of the pages (dirty or clean) at the time of the crash?
• Where should undo and redo start?

ARIES General Approach
• Before crash:
– Log changes to DB (WAL)
– Checkpoints
• After crash:
– Analysis phase
• Figure out the states of transactions [committed vs. uncommitted] and pages [dirty or clean].
– Redo phase
• Repeat the actions of both uncommitted and committed transactions [up to the time of the crash].
– Undo phase
• Undo the actions of uncommitted transactions.
• Do we really need “redo” and “undo”? Under what conditions are they unnecessary?
– It depends on page replacement in the buffer pool.
– E.g., no undo is needed if only committed transactions can update data on disk.

Three Phases of ARIES
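[Figure: the log drawn as a timeline that ends at CRASH (end of log), with three marked points.]
• A: oldest log record of transactions active at crash. Undo works backward from the end of the log to A.
• B: smallest recLSN in the dirty page table at the end of Analysis. Redo works forward from B.
• C: most recent checkpoint. Analysis works forward from C.
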
Steal Policy
• ARIES is designed to work with a steal, no-force approach.
– Related to page replacement policies.
• Steal property:
– Can the changes made to an object O in the buffer pool by T1 be written to disk before T1 commits?
– If yes, we have a steal (T2 steals a frame from T1).
– Say T2 wants to bring in a new page (Q), and the buffer pool replaces the frame containing O.
[Figure: T1 performs W(O) and T2 performs R(Q). To read Q from disk, the buffer pool writes the frame holding O to disk.]

Force Policy
• When T1 commits, do we ensure that all changes T1 has made are immediately forced to disk?
• If yes, we have a force approach.
[Figure: T1 performs W(O) and then commits. The buffer pool writes O to disk at commit.]

Steal, No-Force Write Policies
• ARIES can recover from crashes in a DB that uses a steal, no-force write policy:
– Modified pages may be written to disk before a transaction commits.
– Modified pages may not yet be written to disk when a transaction commits.
• A no-steal, force write policy makes recovery really easy, but the tradeoff is low DB performance.
– Why?
– It adds constraints to an otherwise optimal buffer replacement: no-steal keeps pages of uncommitted transactions in the buffer pool, and force adds page writes at every commit.

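The link between the two buffer policies and the recovery work they make necessary can be written as a small truth table. The following is a minimal Python sketch; the function name `recovery_needs` is an illustrative assumption and not part of ARIES.

```python
def recovery_needs(steal: bool, force: bool) -> dict:
    """Map a buffer policy to the recovery actions it makes necessary.

    steal=True  -> pages of uncommitted transactions can reach disk, so UNDO is needed.
    force=False -> pages of committed transactions may be missing from disk, so REDO is needed.
    """
    return {"undo": steal, "redo": not force}


# Print the four policy combinations.
for steal in (False, True):
    for force in (True, False):
        needs = recovery_needs(steal, force)
        print(f"steal={steal!s:5} force={force!s:5} -> undo={needs['undo']}, redo={needs['redo']}")
```

Steal with no-force, the combination ARIES targets, is the only row that needs both undo and redo.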
ARIES
• ARIES is a recovery algorithm that can work with a steal, no-force write policy.
• ARIES is invoked after a crash. This process is called restart.
• ARIES maintains a history of actions executed by the DBMS in a log.
– The log is stored on stable storage and must survive crashes (e.g., use mirrored RAID-1).

Three Phases in ARIES (1)
• Goal:
– Restore the DB to the state (buffer pool & disk) before the crash,
– AND without effects from actions of active (uncommitted) transactions.
• Analysis Phase:
– Identify the active transactions at the time of the crash:
• T1 and T3.
– Identify dirty pages in the buffer pool:
• P1, P3 and P5.

LSN  LOG
10   Update: T1 writes P5
20   Update: T2 writes P3
30   T2 commits
40   T2 ends
50   Update: T3 writes P1
60   Update: T3 writes P3
     CRASH, RESTART

Three Phases in ARIES (2)
• Redo Phase:
– Repeat actions (of both active and committed transactions), starting from a redo point in the log, and restore the database state to what it was at the time of the crash.
– Where is the redo point?

LSN  LOG
10   Update: T1 writes P5
20   Update: T2 writes P3
30   T2 commits
40   T2 ends
50   Update: T3 writes P1
60   Update: T3 writes P3
     CRASH, RESTART

Three Phases in ARIES (3)
• Undo Phase:
– Undo actions of active transactions in reverse-time order, so that the DB reflects only the actions of committed transactions.
• LSNs 70 -> 60 -> 50 -> 10
• Why in reverse-time order?
– Consider forward-time order. P5 will be restored to a value written by T1 (in LSN 10), rather than the value before it.

LSN  LOG
10   Update: T1 writes P5
20   Update: T2 writes P3
30   T2 commits
40   T2 ends
50   Update: T3 writes P1
60   Update: T3 writes P3
70   Update: T3 writes P5
     CRASH, RESTART

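The argument for reverse-time order can be checked with a few lines of Python. This is only a sketch: the before-images are hypothetical strings, since the slide's log does not list page contents, and the helper `undo` is an illustrative name.

```python
# Update records of the loser transactions T1 and T3 from the log above.
# Before-images are hypothetical; the slide does not show page contents.
LOSER_UPDATES = [
    # (LSN, transaction, page, before_image)
    (10, "T1", "P5", "P5 before T1"),
    (50, "T3", "P1", "P1 before T3"),
    (60, "T3", "P3", "P3 as committed by T2"),
    (70, "T3", "P5", "P5 as written by T1"),
]


def undo(records):
    """Restore before-images in the given order and return the final page values."""
    pages = {}
    for _lsn, _txn, page, before in records:
        pages[page] = before
    return pages


reverse_order = undo(sorted(LOSER_UPDATES, reverse=True))  # LSNs 70, 60, 50, 10
forward_order = undo(sorted(LOSER_UPDATES))                # LSNs 10, 50, 60, 70

print("reverse:", reverse_order["P5"])
print("forward:", forward_order["P5"])
```

In reverse order the last before-image applied to P5 is the one from LSN 10, so P5 ends up with its value from before T1. In forward order the last one applied comes from LSN 70, which leaves T1's uncommitted value on the page.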
Three Principles of ARIES
• Write-Ahead Logging (WAL)
– An update to a DB object is first recorded in the log.
– The log record must be forced to stable storage before the DB object is written to disk.
• How is WAL different from force-write?
– WAL forces the log to disk; force-write forces the data to disk.
• Repeating History During Redo
– On restart, redo the actions (recorded in the log) to bring the system back to the exact state at the time of the crash. Then undo the actions of active (not committed) transactions.
• Logging Changes During Undo
– Since undo may change the DB, log these changes so that they are not undone again if recovery is repeated.

How is ARIES explained?
• Describe the needed data structure for recovery
– WAL
– Data page
– Transaction table
– Dirty page table
– Checkpoints
• Describe the algorithm
– no crash during recovery
– crash during recovery

Log Structure
• The log contains the history of actions executed by the DBMS.
• A DB action is recorded in a log record.
• Log tail: the most recent portion of the log, kept in main memory.
– It is periodically forced to stable storage.
– Aren’t all records in a log in stable storage? No. The log tail reaches stable storage only when it is forced, which happens before a page is written to disk and when a transaction commits.
• Log Sequence Number (LSN): unique ID for each log record.

LSN  LOG
10   Update: T1 writes P5
20   Update: T2 writes P3
30   T2 commits
40   T2 ends
50   Update: T3 writes P1
60   Update: T3 writes P3

Data Page
• PageLSN: the LSN of the most recent log record that made a change to this page.
– Every page in the DB must have a pageLSN.
– What is P3’s pageLSN?
• 60 for the copy in the buffer pool. The copy on disk still shows 20 if it was last written after LSN 20 but before LSN 60.
– It is used in the Redo phase of the algorithm.

LSN  LOG
10   Update: T1 writes P5
20   Update: T2 writes P3
30   T2 commits
40   T2 ends
50   Update: T3 writes P1
60   Update: T3 writes P3

What Actions to Record Log?
• A log record is written for each of the following actions:
– Updating a page: when a transaction writes a DB object, it writes an update type record. It also sets the page’s pageLSN to this record’s LSN.
– Commit: when a transaction commits, it force-writes a commit type log record to stable storage.
– Abort: when a transaction is aborted, it writes an abort type log record.
– End: when a transaction has finished aborting or committing, it writes an end type log record.
– Undoing an update: when a transaction is rolled back (because of an abort or crash recovery), it undoes its updates and writes a compensation log record (CLR) for each one.

Log Record
Fields common to all log records: prevLSN, transID, Type.
Additional fields for update log records: PageID, Length, Offset, Before-image, After-image.

transID  Type    PageID / Length  Offset  Before-image  After-image
T1000    update  P500 / 3         21      ABC           DEF
T2000    update  P600 / 3         41      HIJ           KLM
T2000    update  P500 / 3         20      GDE           QRS
T1000    update  P505 / 3         21      TUV           WXY

(In the original figure, prevLSN is drawn as arrows linking each record to the previous record of the same transaction.)

PrevLSN: LSN of the previous log record in the same transaction. It forms a singly linked list of log records going back in time. It is used during recovery.
Type: update, commit, abort, end, CLR
Compensation Log Record (CLR)
LSN  transID  Type           PageID / Length  Offset  Before-image  After-image
00   T1000    update         P500 / 3         21      ABC           DEF
10   T2000    update         P600 / 3         41      HIJ           KLM
20   T2000    update         P500 / 3         20      GDE           QRS
30   T1000    update         P505 / 3         21      TUV           WXY
40   T1000    abort
50   T1000    CLR / undo 30  P505 / 3         21      TUV

(prevLSN and undoNextLSN are drawn as arrows in the original figure.)

A CLR is written when undoing an update (here T1000’s record 30) after an abort (or during crash recovery).
The CLR records undoNextLSN, the LSN of the next log record that is to be undone for T1000. It equals the prevLSN of log record 30, i.e., LSN 00.
undoNextLSN is used for undoing actions in reverse order.
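The log record layout and the way a CLR takes its undoNextLSN from the undone record's prevLSN can be sketched in Python. The class `LogRecord`, the helper `make_clr`, and all field names are illustrative assumptions modeled on the column headers above; the prevLSN values 0 and 40 are read off the table (T1000's previous records), not stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    prev_lsn: Optional[int]               # previous record of the same transaction
    trans_id: str
    rec_type: str                         # "update", "commit", "abort", "end" or "CLR"
    page_id: Optional[str] = None         # the remaining fields apply to update/CLR records
    length: Optional[int] = None
    offset: Optional[int] = None
    before_image: Optional[str] = None
    after_image: Optional[str] = None
    undo_next_lsn: Optional[int] = None   # only meaningful for CLRs


def make_clr(update: LogRecord, lsn: int, last_lsn: int) -> LogRecord:
    """Build the CLR that records the undo of `update`."""
    return LogRecord(
        lsn=lsn,
        prev_lsn=last_lsn,                  # most recent record of this transaction
        trans_id=update.trans_id,
        rec_type="CLR",
        page_id=update.page_id,
        length=update.length,
        offset=update.offset,
        before_image=update.before_image,   # the value put back on the page
        undo_next_lsn=update.prev_lsn,      # next record of this transaction to undo
    )


# Record 30 from the table; T1000's previous record is LSN 00.
rec30 = LogRecord(30, 0, "T1000", "update", "P505", 3, 21, "TUV", "WXY")
# CLR 50 follows T1000's abort record at LSN 40.
clr50 = make_clr(rec30, lsn=50, last_lsn=40)
print(clr50.undo_next_lsn, clr50.before_image)
```

Because the CLR carries undoNextLSN, a restart that finds CLR 50 in the log can continue undoing T1000 at LSN 00 without touching record 30 again.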
Other Recovery-Related Structures
• Transaction Table: one entry for each active (uncommitted) transaction. Each entry has
– Transaction ID
– lastLSN: the LSN of the most recent log record for this transaction.
– How is it used? (In Undo)

LSN  transID  Type    PageID / Length
00   T1000    update  P500 / 3
10   T2000    update  P600 / 3
20   T2000    update  P500 / 3
30   T1000    update  P505 / 3

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
T2000    20              P600    10
                         P505    30

Other Recovery-Related Structures
• Dirty Page Table: one entry for each dirty page (not written to disk) in the buffer pool.
– recLSN: LSN of the first log record that caused this page to become dirty.
– How is it used? (In Redo)

LSN  transID  Type    PageID / Length
00   T1000    update  P500 / 3
10   T2000    update  P600 / 3
20   T2000    update  P500 / 3
30   T1000    update  P505 / 3

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
T2000    20              P600    10
                         P505    30

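How the two tables follow from the four update records can be sketched in a few lines of Python. The function name `record_update` and the dictionary layout are illustrative assumptions.

```python
def record_update(trans_tab, dirty_page_tab, lsn, trans_id, page_id):
    """Maintain both tables when an update log record is written."""
    trans_tab[trans_id] = lsn                # lastLSN always moves to the newest record
    dirty_page_tab.setdefault(page_id, lsn)  # recLSN keeps the FIRST LSN that dirtied the page


trans_tab, dirty_page_tab = {}, {}
for lsn, trans_id, page_id in [
    (0, "T1000", "P500"),
    (10, "T2000", "P600"),
    (20, "T2000", "P500"),
    (30, "T1000", "P505"),
]:
    record_update(trans_tab, dirty_page_tab, lsn, trans_id, page_id)

print(trans_tab)
print(dirty_page_tab)
```

The asymmetry is the point: lastLSN is overwritten on every record, while recLSN is set once, so P500 keeps recLSN 00 even after the second update at LSN 20.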
Write-Ahead Log (WAL)
• Before writing a page (P) to disk, every update log record that describes a change to P must be forced to stable storage.
• A committing transaction forces its log records (including the commit record) to stable storage.
• (No-force approach + WAL) vs. (force approach) at transaction commit time:
– No-force approach + WAL means log records are written to stable storage, but not data pages.
– Force approach means data pages are written to disk.
– Log records are smaller than data pages!

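The two WAL rules can be sketched with a toy buffer manager in Python. Everything here is an illustrative assumption (the class `WalBuffer`, its method names, LSNs in steps of 10); stable storage is modeled only by the counter `flushed_lsn`.

```python
class WalBuffer:
    """Toy buffer manager that obeys the WAL rule. All names are illustrative."""

    def __init__(self):
        self.log = []            # in-memory log: list of (lsn, description)
        self.flushed_lsn = -1    # highest LSN already on stable storage
        self.page_lsn = {}       # pageID -> LSN of the last update to the buffered page
        self.disk_pages = set()  # pages whose current version is on disk

    def update(self, trans_id, page_id):
        lsn = len(self.log) * 10
        self.log.append((lsn, f"{trans_id} updates {page_id}"))
        self.page_lsn[page_id] = lsn
        self.disk_pages.discard(page_id)   # the buffered copy is now newer than disk
        return lsn

    def force_log(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def write_page(self, page_id):
        # WAL: every log record describing this page goes to stable storage first.
        self.force_log(self.page_lsn[page_id])
        self.disk_pages.add(page_id)

    def commit(self, trans_id):
        lsn = len(self.log) * 10
        self.log.append((lsn, f"{trans_id} commits"))
        # No-force: only the log is forced; data pages stay in the buffer pool.
        self.force_log(lsn)
        return lsn


buf = WalBuffer()
buf.update("T1", "P5")                  # LSN 0
buf.update("T2", "P3")                  # LSN 10
buf.write_page("P5")                    # steal: log is forced up to LSN 0 first
flushed_after_write = buf.flushed_lsn
commit_lsn = buf.commit("T2")           # LSN 20, forces the log but not P3
print(flushed_after_write, commit_lsn, buf.flushed_lsn, sorted(buf.disk_pages))
```

After the commit, the log is on stable storage up to the commit record while P3 is still only in the buffer pool, which is exactly the case the redo phase has to handle.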
Checkpointing
• A checkpoint is a snapshot of DBMS state stored in stable
storage.
• Checkpointing in ARIES has three steps:
(1) write begin_checkpoint record to log
(2) write the state of transaction table and dirty page table +
end_checkpoint record to log
(3) write a special master record containing LSN of begin_checkpoint log
record.
• Why checkpointing?
– The restart process will look for the most recent checkpoint & start
analysis from there.
– Shorten the recovery time -> take frequent checkpoints.
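The three checkpointing steps can be sketched as follows. This is a minimal Python illustration, assuming the log is a list of dictionaries, LSNs are list positions, and the master record is a plain dictionary; `take_checkpoint` is a hypothetical helper name.

```python
import copy


def take_checkpoint(log, trans_tab, dirty_page_tab, master):
    """Append a checkpoint to `log` and point the master record at it."""
    # (1) write the begin_checkpoint record
    begin_lsn = len(log)
    log.append({"lsn": begin_lsn, "type": "begin_checkpoint"})
    # (2) write the current tables together with the end_checkpoint record
    log.append({
        "lsn": begin_lsn + 1,
        "type": "end_checkpoint",
        "trans_tab": copy.deepcopy(trans_tab),
        "dirty_page_tab": copy.deepcopy(dirty_page_tab),
    })
    # (3) store the LSN of begin_checkpoint in the master record
    master["begin_checkpoint_lsn"] = begin_lsn
    return begin_lsn


log = [{"lsn": 0, "type": "update", "trans_id": "T1000", "page_id": "P500"}]
master = {}
begin = take_checkpoint(log, {"T1000": 0}, {"P500": 0}, master)
print(begin, master)
```

On restart, the master record gives the position from which the analysis phase starts scanning.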

Recovering from a System Crash
• Recovery uses the WAL and the most recent checkpoint:
– Write-ahead log
• The most recent checkpoint
• Compensation log records (undoNextLSN: the LSN of the next log record that is to be undone)
– Transaction table
• Active (not committed) transactions
• lastLSN: the LSN of the most recent log record for each transaction (restored during analysis)
• Used for undo
– Dirty page table
• Dirty (not written to disk) pages
• recLSN: the LSN of the first log record that caused each page to become dirty
• Used for redo

Analysis Phase
• Determine three things:
– A point in the log to start REDO.
• The earliest update log record whose change may not have been written to disk.
– Dirty pages in the buffer pool at the time of the crash -> restore the dirty page table to its state at the time of the crash.
– Active transactions at the time of the crash, for UNDO -> restore the transaction table to its state at the time of the crash.

Analysis Phase: Algorithm
1. Find the most recent begin_checkpoint log record.
2. Initialize the transaction and dirty page tables from the ones saved in the most recent checkpoint.
3. Scan forward from the begin_checkpoint log record to the end of the log. For each log record with LSN value LSN, update trans_tab and dirty_page_tab as follows:
– If we see an end log record for T, remove T from trans_tab.
– If we see a commit log record for T, T is not a loser and will not be undone. (The example that follows removes T from trans_tab at this point.)
– If we see any other log record for a transaction T’ that is not in trans_tab, add T’ to trans_tab. In either case, set the lastLSN field of T’ to LSN.
– If we see an update/CLR log record for page P and P is not in the dirty page table, add P to the dirty page table and set its recLSN to LSN.

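The scan in step 3 can be sketched in Python. This is a simplified illustration rather than a full ARIES implementation: log records are dictionaries with assumed key names, and a transaction is dropped from the table at its commit record as well as at its end record, which is the simplification the worked example uses.

```python
def analysis(log, trans_tab=None, dirty_page_tab=None):
    """Rebuild the transaction table and dirty page table by scanning the log forward.

    `log` holds the records from the most recent begin_checkpoint to the end of the log.
    The optional tables are the ones saved in that checkpoint (empty if there is none).
    """
    trans_tab = dict(trans_tab or {})            # transID -> lastLSN
    dirty_page_tab = dict(dirty_page_tab or {})  # pageID -> recLSN
    for rec in log:
        lsn, trans_id, rec_type = rec["lsn"], rec["trans_id"], rec["type"]
        if rec_type in ("end", "commit"):
            # Simplification: drop T at commit as well as at end.
            trans_tab.pop(trans_id, None)
            continue
        trans_tab[trans_id] = lsn                # add T if missing, then set lastLSN
        if rec_type in ("update", "CLR"):
            dirty_page_tab.setdefault(rec["page_id"], lsn)  # keep the first recLSN
    # Redo starts at the smallest recLSN (None if no page is dirty).
    redo_point = min(dirty_page_tab.values(), default=None)
    return trans_tab, dirty_page_tab, redo_point


LOG = [
    {"lsn": 0, "trans_id": "T1000", "type": "update", "page_id": "P500"},
    {"lsn": 10, "trans_id": "T2000", "type": "update", "page_id": "P600"},
    {"lsn": 20, "trans_id": "T2000", "type": "update", "page_id": "P500"},
    {"lsn": 30, "trans_id": "T1000", "type": "update", "page_id": "P505"},
    {"lsn": 40, "trans_id": "T2000", "type": "commit"},
]
trans_tab, dirty_page_tab, redo_point = analysis(LOG)
print(trans_tab, dirty_page_tab, redo_point)
```

Run on the five-record log used in the example slides, the sketch leaves T1000 as the only loser and yields a redo point of LSN 00.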
Analysis Phase: Example (1)
• After a system crash, both tables are lost.
• There is no previous checkpoint, so initialize the tables to empty.

LSN  TransID  Type    PageID
00   T1000    update  P500
10   T2000    update  P600
20   T2000    update  P500
30   T1000    update  P505
40   T2000    commit
     System Crash

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
(empty)                  (empty)

Analysis Phase: Example (2)
• Scanning log record 00:
– Add T1000 to the transaction table.
– Add P500 to the dirty page table.

(Log as in Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    00              P500    00

Analysis Phase: Example (3)
• Scanning log record 10:
– Add T2000 to the transaction table.
– Add P600 to the dirty page table.

(Log as in Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    00              P500    00
T2000    10              P600    10

Analysis Phase: Example (4)
• Scanning log record 20:
– Set T2000’s lastLSN to 20.
– P500 is already in the dirty page table, so its recLSN stays 00.

(Log as in Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    00              P500    00
T2000    20              P600    10

Analysis Phase: Example (5)
• Scanning log record 30:
– Set T1000’s lastLSN to 30.
– Add P505 to the dirty page table.

(Log as in Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
T2000    20              P600    10
                         P505    30

Analysis Phase: Example (6)
• Scanning log record 40:
– Remove T2000 from the transaction table.
– We are done!
• The redo point is LSN 00.
• Why?
– 00 is the smallest recLSN in the dirty page table: the update to P500 at LSN 00 is the earliest change that may not have been written to disk before the crash.
• We have restored the transaction table and the dirty page table.

(Log as in Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
                         P600    10
                         P505    30

Redo Phase: Algorithm
• Scan forward from the redo point (LSN 00 in the example).
• For each update/CLR log record with LSN value LSN, perform redo unless one of the following conditions holds:
– The affected page is not in the dirty page table.
• It is not dirty, so there is no need to redo.
– The affected page is in the dirty page table, but recLSN > LSN.
• The page’s recLSN (the oldest log record that made this page dirty) is after LSN, so this update already reached disk.
– pageLSN >= LSN, where pageLSN is read from the copy of the page on disk.
• This update or a later one on the page has already been written to disk.

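The three skip conditions can be sketched as a Python function. This is a simplified illustration: reapplying an update is reduced to advancing the on-disk pageLSN, `disk_page_lsn` is an assumed dictionary standing in for reading the page from disk, and pages never written are treated as having pageLSN -1.

```python
def redo(log, dirty_page_tab, disk_page_lsn, redo_point):
    """Return the LSNs that must be redone, scanning forward from `redo_point`."""
    redone = []
    for rec in log:
        if rec["lsn"] < redo_point or rec["type"] not in ("update", "CLR"):
            continue
        page, lsn = rec["page_id"], rec["lsn"]
        if page not in dirty_page_tab:
            continue                             # condition 1: page is not dirty
        if dirty_page_tab[page] > lsn:
            continue                             # condition 2: recLSN > LSN
        if disk_page_lsn.get(page, -1) >= lsn:
            continue                             # condition 3: pageLSN >= LSN
        disk_page_lsn[page] = lsn                # reapply the update, set pageLSN
        redone.append(lsn)
    return redone


LOG = [
    {"lsn": 0, "type": "update", "page_id": "P500"},
    {"lsn": 10, "type": "update", "page_id": "P600"},
    {"lsn": 20, "type": "update", "page_id": "P500"},
    {"lsn": 30, "type": "update", "page_id": "P505"},
    {"lsn": 40, "type": "commit"},
]
DIRTY = {"P500": 0, "P600": 10, "P505": 30}

# Case A: only P600 reached disk (with pageLSN 10).
case_a = redo(LOG, DIRTY, {"P600": 10}, redo_point=0)
# Case B: P500 also reached disk after LSN 20.
case_b = redo(LOG, DIRTY, {"P500": 20, "P600": 10}, redo_point=0)
print(case_a, case_b)
```

The two cases show how much the outcome depends on the on-disk pageLSN: the same log and the same dirty page table lead to three redone records in case A and only one in case B.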
Redo Phase: Example (1)
• Scan forward from the redo point (LSN 00).
• Assume that P600 has been written to disk.
– But it can still be in the dirty page table.
• Scanning 00:
– P500 is in the dirty page table.
– 00 (recLSN) = 00 (LSN)
– 20 (pageLSN) > 00 (LSN), assuming the copy of P500 on disk already reflects LSN 20.
– Do not redo 00. (If P500 had not been written to disk since before LSN 00, its on-disk pageLSN would be smaller than 00 and the update would have to be redone.)
• Scanning 10: continued in Example (2).

LSN  TransID  Type    PageID
00   T1000    update  P500
10   T2000    update  P600 (disk)
20   T2000    update  P500
30   T1000    update  P505
40   T2000    commit
     System Crash

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
                         P600    10
                         P505    30

Redo Phase: Example (2)
• Scanning 10:
– 10 (pageLSN) == 10 (LSN)
– Do not redo 10.

(Log as in Redo Phase: Example (1), with P600 already on disk.)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
                         P600    10
                         P505    30

Undo Phase: Algorithm
• It scans backward in time from the end of the log.
• It needs to undo all actions of active (not committed) transactions. They are also called loser transactions.
– Same as aborting them.
• The analysis phase gives the set of loser transactions. Their lastLSN values form the ToUndo set.
• Repeatedly choose the record with the largest LSN value in this set and process it, until ToUndo is empty.
– If it is a CLR and its undoNextLSN value is not null, add the undoNextLSN value to ToUndo. If undoNextLSN is null, this transaction is completely undone.
– If it is an update record, write a CLR and restore the data to its before-image. Add the record’s prevLSN value to ToUndo.

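The ToUndo loop can be sketched in Python. This is a simplified illustration with assumed record keys: restoring the before-image on the page is left out, new LSNs are assumed to advance in steps of 10, and the prevLSN values in the sample log are derived from the order of each transaction's records.

```python
def undo(log, losers):
    """Undo loser transactions, appending CLRs to `log`.

    `losers` maps transID -> lastLSN (from the analysis phase).
    Returns the LSNs of the update records that were undone, in order.
    """
    by_lsn = {rec["lsn"]: rec for rec in log}
    last_lsn = dict(losers)                  # used as prevLSN of each new CLR
    to_undo = set(losers.values())
    next_lsn = max(by_lsn) + 10              # assumption: LSNs advance in steps of 10
    undone = []
    while to_undo:
        lsn = max(to_undo)                   # always process the largest LSN first
        to_undo.remove(lsn)
        rec = by_lsn[lsn]
        if rec["type"] == "CLR":
            follow = rec["undo_next_lsn"]    # skip what has already been undone
        elif rec["type"] == "update":
            clr = {"lsn": next_lsn, "prev_lsn": last_lsn[rec["trans_id"]],
                   "trans_id": rec["trans_id"], "type": "CLR",
                   "page_id": rec["page_id"], "undo_next_lsn": rec["prev_lsn"]}
            log.append(clr)                  # (restoring the before-image is omitted)
            by_lsn[next_lsn] = clr
            last_lsn[rec["trans_id"]] = next_lsn
            next_lsn += 10
            undone.append(lsn)
            follow = rec["prev_lsn"]
        else:
            follow = rec["prev_lsn"]         # e.g. an abort record: keep walking back
        if follow is not None:
            to_undo.add(follow)
    return undone


LOG = [
    {"lsn": 0, "prev_lsn": None, "trans_id": "T1000", "type": "update", "page_id": "P500"},
    {"lsn": 10, "prev_lsn": None, "trans_id": "T2000", "type": "update", "page_id": "P600"},
    {"lsn": 20, "prev_lsn": 10, "trans_id": "T2000", "type": "update", "page_id": "P500"},
    {"lsn": 30, "prev_lsn": 0, "trans_id": "T1000", "type": "update", "page_id": "P505"},
    {"lsn": 40, "prev_lsn": 20, "trans_id": "T2000", "type": "commit"},
]
undone = undo(LOG, {"T1000": 30})
print(undone, [(r["lsn"], r["undo_next_lsn"]) for r in LOG if r["type"] == "CLR"])
```

With T1000 as the only loser, the sketch undoes LSN 30 and then LSN 00 and appends two CLRs at LSNs 50 and 60, the second with a null undoNextLSN.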
Undo Phase: Example (1)
• The only loser transaction is T1000.
• ToUndo set is {T1000:30}

(Log as in Redo Phase: Example (1).)

Transaction Table        Dirty Page Table
transID  lastLSN         pageID  recLSN
T1000    30              P500    00
                         P600    10
                         P505    30

Undo Phase: Example (2)
• The only loser transaction is T1000.
• ToUndo set is {T1000:30}
• Undoing LSN 30:
– Write a CLR:undo log record (LSN 50).
– ToUndo becomes {T1000:00}
• Undoing LSN 00:
– Write a CLR:undo log record (LSN 60).
– ToUndo becomes empty.
– We are done.

LSN  TransID  Type         PageID
00   T1000    update       P500
10   T2000    update       P600 (disk)
20   T2000    update       P500
30   T1000    update       P505
40   T2000    commit
     System Crash
50   T1000    CLR:undo:30  P505   (undoNextLSN = 00)
60   T1000    CLR:undo:00  P500