
20 Logging

The document outlines the schedule and important dates for the Database Systems course taught by Prof. Andy Pavlo, including project deadlines and the final exam date. It discusses multi-version concurrency control (MVCC), crash recovery, and various buffer pool policies, including steal and force policies. Additionally, it introduces shadow paging as a method for managing database consistency during transactions.


Database Systems
Logging
15-445/645 FALL 2024 PROF. ANDY PAVLO

ADMINISTRIVIA
Project #3 is due Sunday Nov 17th @ 11:59pm
→ Saturday Office Hours on Nov 16th @ 3-5pm GHC 5207

Project #4 is due Sunday Dec 8th @ 11:59pm

Final Exam is on Friday Dec 13th @ 8:30am
→ Early exam will not be offered.
→ Do not leave campus before this date.

15-445/645 (Fall 2024)

LAST CLASS
We discussed multi-version concurrency control (MVCC) and how it affects the design of the entire DBMS architecture.

A DBMS's concurrency control protocol gives it Atomicity + Consistency + Isolation.
We now need to ensure Atomicity + Durability…

MOTIVATION
[Figure: animated schedule for a single txn T1 (BEGIN, R(A), W(A), COMMIT). The page on disk holds A=1. T1 reads A into the buffer pool and changes it from A=1 to A=2 there, then commits. In the final frame the buffer pool contents are gone, while the page on disk still holds A=1.]

CRASH RECOVERY
Recovery algorithms are techniques to ensure database consistency, transaction atomicity, and durability despite failures.

Recovery algorithms have two parts:
→ Actions during normal txn processing to ensure that the DBMS can recover from a failure. (Today's topic.)
→ Actions after a failure to recover the database to a state that ensures atomicity, consistency, and durability.

TODAY’S AGENDA
Buffer Pool Policies
Shadow Paging
Write-Ahead Log
Logging Schemes
Checkpoints
DB Flash Talk: Confluent


OBSERVATION
The database’s primary storage location is on non-volatile storage, but this is slower than volatile storage. Use volatile memory for faster access:
→ First copy target record into memory.
→ Perform the writes in memory.
→ Write dirty records back to disk.

The DBMS needs to ensure the following:
→ The changes for any txn are durable once the DBMS has told somebody that it committed.
→ No partial changes are durable if the txn aborted.

UNDO VS. REDO


Undo: The process of removing the effects of an incomplete or aborted txn.
Redo: The process of re-applying the effects of a committed txn for durability.

How the DBMS supports this functionality depends on how it manages the buffer pool …

BUFFER POOL
[Figure: animated schedule with two txns. T1: BEGIN, R(A), W(A), then later ROLLBACK. T2: BEGIN, R(B), W(B), COMMIT. Both objects live on the same page, which starts as A=1, B=9, C=7 in the buffer pool and on disk. T1 changes A to 3 and T2 changes B to 8 in the buffer pool. In the final frames the page on disk reads A=3, B=8, C=7, so it carries T1's uncommitted write to A even though T1 then rolls back.]

STEAL POLICY
Whether the DBMS can evict a dirty object in the
buffer pool modified by an uncommitted txn and
overwrite the most recent committed version of
that object in non-volatile storage.

STEAL: Eviction + overwriting is allowed.


NO-STEAL: Eviction + overwriting is not allowed.


FORCE POLICY
Whether the DBMS requires that all updates made
by a txn are written back to non-volatile storage
before the txn can commit.

FORCE: Write-back is required.


NO-FORCE: Write-back is not required.


NO-STEAL + FORCE
[Figure: animated schedule with two txns. T1: BEGIN, R(A), W(A), then later ROLLBACK. T2: BEGIN, R(B), W(B), COMMIT. The page starts as A=1, B=9, C=7. In the buffer pool, T1 changes A to 3 and T2 changes B to 8.]
→ NO-STEAL means that T1's changes cannot be written to disk yet.
→ FORCE means that T2's changes must be written to disk when T2 commits.
→ The DBMS makes a copy of the page that contains only T2's change (A=1, B=8, C=7) and writes that copy to disk.
→ Now it’s trivial to roll back T1.

NO-STEAL + FORCE
This approach is the easiest to implement:
→ Never have to undo changes of an aborted txn because the
changes were not written to disk.
→ Never have to redo changes of a committed txn because all
the changes are guaranteed to be written to disk at commit
time (assuming atomic hardware writes).

The previous example cannot support write sets that exceed the amount of physical memory available.

SHADOW PAGING
Instead of copying the entire database, the DBMS
copies pages on write to create two versions:
→ Master: Contains only changes from committed txns.
→ Shadow: Temporary database with changes made from
uncommitted txns.
To install updates when a txn commits, overwrite
the root so it points to the shadow, thereby
swapping the master and shadow.

Buffer Pool Policy: NO-STEAL + FORCE


SHADOW PAGING – EXAMPLE


[Figure: animated example. The master pointer on disk references the master page table (entries 1-4), which is also held in memory. When txn T1 starts, the DBMS creates a shadow page table with the same four entries.]
→ The active modifying txn updates shadow pages.
→ Read-only txns (T2) access the current master.
→ When T1 commits, the DBMS updates the master pointer so that it references the shadow page table, which becomes the new master page table. The old master page table and the pages only it referenced then disappear from the figure.
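The copy-on-write scheme and pointer swap described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real system lays out its files: the class name `ShadowPagedDB`, its methods, and the use of dictionaries for the "disk" and page tables are all hypothetical, and it allows only one writer txn at a time.

```python
class ShadowPagedDB:
    """Toy shadow-paging store (hypothetical names, one writer at a time)."""

    def __init__(self, pages):
        # "Disk": page contents keyed by physical location.
        self.disk = dict(enumerate(pages))
        # Master page table: logical page number -> physical location.
        self.master = {i: i for i in self.disk}
        self.shadow = None
        self.next_loc = len(self.disk)

    def begin(self):
        # The shadow page table starts as a copy of the master page table.
        self.shadow = dict(self.master)

    def write(self, page_no, value):
        # Copy-on-write: the first write to a page allocates a new location
        # and repoints only the shadow table; the master version is untouched.
        loc = self.shadow[page_no]
        if loc == self.master[page_no]:
            loc = self.next_loc
            self.next_loc += 1
            self.shadow[page_no] = loc
        self.disk[loc] = value

    def read_committed(self, page_no):
        # Read-only txns go through the current master page table.
        return self.disk[self.master[page_no]]

    def commit(self):
        # Installing the txn is a single swap of the root pointer.
        old = self.master
        self.master, self.shadow = self.shadow, None
        self._collect(old)

    def rollback(self):
        # Undo: remove the shadow pages, leave the master alone.
        dropped, self.shadow = self.shadow, None
        self._collect(dropped)

    def _collect(self, table):
        # Garbage-collect locations no longer reachable from the master.
        live = set(self.master.values())
        for loc in set(table.values()) - live:
            del self.disk[loc]
```

Note how `rollback` never touches `self.master`, which mirrors the "Redo: not needed, Undo: drop the shadow" property, and how `_collect` stands in for the garbage collection that the approach requires.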

SHADOW PAGING – UNDO/REDO


Supporting rollbacks and recovery is easy with shadow paging.

Undo: Remove the shadow pages. Leave the master and the DB root pointer alone.

Redo: Not needed at all.

SHADOW PAGING – DISADVANTAGES


Copying the entire page table is expensive:
→ Use a page table structured like a B+tree (LMDB).
→ No need to copy entire tree, only need to copy paths in the
tree that lead to updated leaf nodes.

Commit overhead is high:


→ Flush every updated page, page table, and root.
→ Data gets fragmented (bad for sequential scans).
→ Need garbage collection.
→ Only supports one writer txn at a time or txns in a batch.


SQLITE (PRE-2010)
When a txn modifies a page, the DBMS copies the original page to a separate journal file before overwriting the master version.
→ Called rollback mode.

After restarting, if a journal file exists, then the DBMS restores it to undo changes from uncommitted txns.

[Figure: animated example. Memory holds Pages 1-3 and the database file on disk holds Pages 1-6. As the txn modifies Page 2 and then Page 3, the original version of each page is copied to the journal file on disk. In later frames the memory contents disappear, and then Pages 2 and 3 reappear in memory next to the journal copies.]
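A minimal sketch of the journaling idea described above, in Python. It is only an illustration: `JournaledDB` and its methods are hypothetical names, pages are plain strings in dictionaries rather than files, and the sketch assumes the journal is discarded at commit so that a leftover journal always means an unfinished txn.

```python
class JournaledDB:
    """Toy rollback-journal store (hypothetical names, in-memory 'files')."""

    def __init__(self, pages):
        self.disk = dict(pages)   # database file: page id -> contents
        self.journal = {}         # journal file: page id -> original contents

    def write_page(self, page_id, data):
        # Save the original version once, before the first overwrite.
        if page_id not in self.journal:
            self.journal[page_id] = self.disk[page_id]
        self.disk[page_id] = data

    def commit(self):
        # Assumption: the journal is discarded when the txn commits.
        self.journal.clear()

    def recover(self):
        # On restart, a leftover journal means a txn did not finish:
        # put every journaled original back, then discard the journal.
        for page_id, original in self.journal.items():
            self.disk[page_id] = original
        self.journal.clear()
```

Calling `recover()` after a clean `commit()` is a no-op, because there is nothing left in the journal to restore.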

OBSERVATION
Shadow paging requires the DBMS to perform writes to random non-contiguous pages on disk.

We need a way for the DBMS to convert random writes into sequential writes…

WRITE-AHEAD LOG (WAL)


Maintain a log file separate from data files that contains the changes that txns make to the database.
→ Assume that the log is on stable storage.
→ Log contains enough information to perform the necessary undo and redo actions to restore the database.

The DBMS must write to disk the log file records that correspond to changes made to a database object before it can flush that object to disk.

Buffer Pool Policy: STEAL + NO-FORCE

BUFFER POOL + WAL


              No-Steal    Steal
No-Force                  Desired
Force         Trivial

No-Force
→ Concern: Crash before a page is flushed to disk. Durability?
→ Solution: Force a summary/log @ commit. Use to REDO.

Force (on every update, flush the updated page to disk)
→ Poor response time, but enforces durability of committed txns.

No-Steal
→ Low throughput, but works for aborted txns.

Steal (flush an unpinned dirty page even if the updating txn is active)
→ Concern: A stolen+flushed page was modified by an uncommitted txn T. If T aborts, how is atomicity enforced?
→ Solution: Remember old value (logs). Use to UNDO.

WAL PROTOCOL
The DBMS stages all of a txn’s log records in volatile storage (usually backed by buffer pool).

All log records pertaining to an updated page are written to non-volatile storage before the page itself is overwritten in non-volatile storage.

A txn is not considered committed until all its log records have been written to stable storage.

WAL PROTOCOL
Write a <BEGIN> record to the log for each txn to mark its starting point.

Append a record every time a txn changes an object:
→ Transaction Id
→ Object Id
→ Before Value (UNDO): not necessary if using append-only MVCC
→ After Value (REDO)

When a txn finishes, the DBMS appends a <COMMIT> record to the log.
→ Make sure that all log records are flushed before it returns an acknowledgement to application.

WAL – EXAMPLE
[Figure: animated example. T1 runs BEGIN, W(A), W(B), COMMIT against a page that starts as A=1, B=5, C=7 in the buffer pool and on disk.]
→ BEGIN appends <T1 BEGIN> to the WAL buffer.
→ W(A) first appends <T1, A, 1, 8> (before value 1, after value 8) to the WAL buffer, then changes A to 8 in the buffer pool.
→ W(B) appends <T1, B, 5, 9> and changes B to 9 in the buffer pool.
→ COMMIT appends <T1 COMMIT>, and the WAL buffer is flushed to disk. Txn result is now safe to return to application.
→ The page on disk still holds A=1, B=5, C=7. In the last frame the buffer pool is empty, but everything we need to restore T1 is in the log!
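The protocol and example above can be sketched as a small Python model. This is a simplified illustration under stated assumptions: `WalDB` and its method names are hypothetical, objects stand in for pages, and `flush_page` conservatively flushes the whole log buffer rather than only the records for that page.

```python
class WalDB:
    """Toy write-ahead log model (hypothetical names, objects act as pages)."""

    def __init__(self, pages):
        self.disk_pages = dict(pages)    # non-volatile data pages
        self.disk_log = []               # non-volatile log
        self.buffer_pool = dict(pages)   # volatile page cache
        self.log_buffer = []             # volatile log tail

    def begin(self, txn):
        self.log_buffer.append((txn, "BEGIN"))

    def write(self, txn, obj, value):
        # Log record first: <txn id, object id, before value, after value>.
        self.log_buffer.append((txn, obj, self.buffer_pool[obj], value))
        self.buffer_pool[obj] = value

    def flush_log(self):
        self.disk_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self, txn):
        self.log_buffer.append((txn, "COMMIT"))
        # The txn counts as committed only once its records are on disk.
        self.flush_log()

    def flush_page(self, obj):
        # Write-ahead rule: log records reach disk before the page does.
        self.flush_log()
        self.disk_pages[obj] = self.buffer_pool[obj]

    def crash(self):
        # Volatile state is lost; only disk_pages and disk_log survive.
        self.buffer_pool = dict(self.disk_pages)
        self.log_buffer = []
```

With the values from the example (`A=1, B=5, C=7`, then `W(A)=8`, `W(B)=9`), `commit` leaves four records in `disk_log` while `disk_pages` is unchanged, which is the NO-FORCE behavior the slides describe.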

WAL – IMPLEMENTATION
Flushing the log buffer to disk every time a txn
commits will become a bottleneck.

The DBMS can use the group commit optimization to batch multiple log flushes together to amortize overhead.
→ When the buffer is full, flush it to disk.
→ Or if there is a timeout (e.g., 5 ms).

WAL – GROUP COMMIT


[Figure: animated example with two txns sharing the WAL buffers. T1 runs BEGIN, W(A), W(B), COMMIT; T2 runs BEGIN, W(C), W(D), COMMIT.]
→ The WAL buffer fills with <T1 BEGIN>, <T1, A, 1, 8>, <T1, B, 5, 9>, <T2 BEGIN>, <T2, C, 1, 2>.
→ Flush the buffer when it is full.
→ <T2, D, 3, 4> is then appended to a WAL buffer, and no further records arrive for a while (⋮).
→ Flush after an elapsed amount of time, even though the buffer is not full.
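The two flush triggers can be sketched as follows. This is a minimal single-buffer model, not the double-buffered design in the figure: `GroupCommitLog`, `capacity`, `timeout_ms`, and `tick` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```python
class GroupCommitLog:
    """Toy group-commit log buffer (hypothetical names, explicit clock)."""

    def __init__(self, capacity, timeout_ms):
        self.capacity = capacity
        self.timeout_ms = timeout_ms
        self.buffer = []        # volatile log buffer
        self.disk = []          # log records already on disk
        self.flushes = 0        # number of disk writes issued
        self.oldest_ms = None   # arrival time of the oldest unflushed record

    def append(self, record, now_ms):
        if not self.buffer:
            self.oldest_ms = now_ms
        self.buffer.append(record)
        # Trigger 1: the buffer is full.
        if len(self.buffer) >= self.capacity:
            self._flush()

    def tick(self, now_ms):
        # Trigger 2: called periodically; flush a partial buffer on timeout.
        if self.buffer and now_ms - self.oldest_ms >= self.timeout_ms:
            self._flush()

    def _flush(self):
        # One sequential write covers every record in the batch.
        self.disk.extend(self.buffer)
        self.buffer.clear()
        self.flushes += 1
```

In a fuller model, a txn would only be acknowledged once the flush containing its COMMIT record has completed, so the timeout bounds how long a committing txn can wait.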

BUFFER POOL POLICIES


Almost every DBMS uses NO-FORCE + STEAL

Runtime Performance
              NO-STEAL    STEAL
NO-FORCE                  Fastest
FORCE         Slowest

Recovery Performance
              NO-STEAL                       STEAL
NO-FORCE                                     Slowest (Undo + Redo)
FORCE         Fastest (No Undo + No Redo)
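One way to read the recovery table is that each policy choice independently decides whether undo or redo work can ever be needed. The helper below is a hypothetical illustration of that reading (the name `recovery_work` is not from the lecture), and it ignores everything else that affects recovery cost.

```python
def recovery_work(steal: bool, force: bool) -> dict:
    """Which kinds of recovery work a buffer pool policy can require."""
    return {
        # STEAL lets uncommitted changes reach disk, so they may need undoing.
        "undo": steal,
        # NO-FORCE lets committed changes stay only in memory,
        # so they may need redoing after a crash.
        "redo": not force,
    }
```

For example, `recovery_work(steal=False, force=True)` reports that neither undo nor redo is needed, matching the NO-STEAL + FORCE corner of the table, while `recovery_work(steal=True, force=False)` reports both.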

LOGGING SCHEMES
Physical Logging
→ Record the byte-level changes made to a specific page.
→ Example: git diff

Logical Logging
→ Record the high-level operations executed by txns.
→ Example: UPDATE, DELETE, and INSERT queries.

Physiological Logging
→ Physical-to-a-page, logical-within-a-page.
→ Hybrid approach with byte-level changes for a single tuple
identified by page id + slot number.
→ Does not specify organization of the page.

LOGGING SCHEMES
UPDATE foo SET val = XYZ WHERE id = 1;

Physical
<T1, Table=X, Page=99, Offset=1024, Before=ABC, After=XYZ>
<T1, Index=X_PKEY, Page=45, Offset=9, Key=(1,Record1)>

Logical
<T1, Query="UPDATE foo SET val=XYZ WHERE id=1">

Physiological
<T1, Table=X, Page=99, Slot=1, Before=ABC, After=XYZ>
<T1, Index=X_PKEY, IndexPage=45, Key=(1,Record1)>
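The difference between the physical and physiological records above shows up in how a redo would apply them. The sketch below is an assumption-laden illustration: the record classes, the `redo_*` functions, and the representation of a page as a `bytearray` or as a list of slots are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhysicalRecord:
    page_id: int
    offset: int     # byte offset inside the page
    before: bytes
    after: bytes


@dataclass
class PhysiologicalRecord:
    page_id: int
    slot: int       # slot number inside the page
    before: bytes
    after: bytes


def redo_physical(page: bytearray, rec: PhysicalRecord) -> None:
    # Overwrite the exact bytes at the recorded offset.
    page[rec.offset:rec.offset + len(rec.after)] = rec.after


def redo_physiological(slots: list, rec: PhysiologicalRecord) -> None:
    # Replace the tuple in the recorded slot, wherever the page
    # happens to store that slot's bytes.
    slots[rec.slot] = rec.after
```

Because the physiological record names a slot instead of a byte offset, it stays valid even if the page's internal layout is reorganized, which is what "does not specify organization of the page" refers to.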

PHYSICAL VS. LOGICAL LOGGING


Logical logging requires less data written in each log record than physical logging.

Difficult to implement recovery with logical logging if you have concurrent txns running at lower isolation levels.
→ Hard to determine which parts of the database may have been modified by a query before crash.
→ Recovery takes longer because DBMS re-executes every query in the log again.

CHANGE DATA CAPTURE (CDC)


Automatically propagate changes to external sources to replicate and synchronize database contents.
→ Extract Transform Load (ETL)
→ Some systems can do this automatically. Others require third-party tools.

Approach #1: WAL
Approach #2: Triggers
Approach #3: Timestamps

[Figure: a WAL containing <T1 BEGIN>, <T1, A, 1, 8>, <T1, B, 5, 9>, <T1 COMMIT>.]

OBSERVATION
The DBMS's WAL will grow forever.
After a crash, the DBMS must replay the entire log, which will take a long time.

The DBMS periodically takes a checkpoint where it flushes all buffers out to disk.
→ This provides a hint on how far back it needs to replay the WAL after a crash.
→ Truncate the WAL up to a certain safe point in time.

CHECKPOINTS
Blocking / Consistent Checkpoint Protocol:
→ Pause all queries.
→ Flush all WAL records in memory to disk.
→ Flush all modified pages in the buffer pool to disk.
→ Write a <CHECKPOINT> entry to WAL and flush to disk.
→ Resume queries.
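The five steps above can be sketched as one function over a toy state dictionary. Everything about the representation is an assumption made for illustration: the `db` dictionary, its keys, and the `dirty` set are hypothetical, and "pausing" is just a flag.

```python
def blocking_checkpoint(db: dict) -> None:
    """Toy blocking checkpoint over a dict-based DBMS state (hypothetical)."""
    db["paused"] = True                        # pause all queries
    db["disk_log"].extend(db["log_buffer"])    # flush WAL records in memory
    db["log_buffer"].clear()
    for obj in db["dirty"]:                    # flush all modified pages
        db["disk_pages"][obj] = db["buffer_pool"][obj]
    db["dirty"].clear()
    db["disk_log"].append("<CHECKPOINT>")      # write + flush checkpoint entry
    db["paused"] = False                       # resume queries
```

The order matters: the WAL records go to disk before the dirty pages, which keeps the write-ahead rule intact during the checkpoint.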


CHECKPOINTS
Use the <CHECKPOINT> record as the starting point for analyzing the WAL.

Any txn that committed before the checkpoint is ignored (T1).

T2 + T3 did not commit before the last checkpoint.
→ Need to redo T2 because it committed after checkpoint.
→ Need to undo T3 because it did not commit before the crash.

WAL
<T1 BEGIN>
<T1, A, 1, 2>
<T1 COMMIT>
<T2 BEGIN>
<T2, A, 2, 3>
<T3 BEGIN>
<CHECKPOINT>
<T2, B, 4, 5>
<T2 COMMIT>
<T3, A, 3, 4>
⋮
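The classification in this example (ignore T1, redo T2, undo T3) can be reproduced with a short scan of the log. This is a sketch only: the function name `classify` and the tuple encoding of log records are hypothetical, and it assumes the blocking checkpoint protocol from the previous slide, where every change logged before the checkpoint is already on disk.

```python
def classify(log):
    """Split txns into (ignore, redo, undo) relative to the last checkpoint.

    Hypothetical record encoding: (txn, "BEGIN"), (txn, "COMMIT"),
    ("CHECKPOINT",), or (txn, obj, before, after).
    """
    # Position of the last checkpoint, or -1 if there is none.
    last_ckpt = max(
        (i for i, rec in enumerate(log) if rec == ("CHECKPOINT",)),
        default=-1,
    )
    begun, commit_pos = [], {}
    for i, rec in enumerate(log):
        if len(rec) == 2 and rec[1] == "BEGIN":
            begun.append(rec[0])
        elif len(rec) == 2 and rec[1] == "COMMIT":
            commit_pos[rec[0]] = i
    # Committed before the checkpoint: already on disk, nothing to do.
    ignore = [t for t in begun if t in commit_pos and commit_pos[t] < last_ckpt]
    # Committed after the checkpoint: changes may be missing from disk.
    redo = [t for t in begun if t in commit_pos and commit_pos[t] > last_ckpt]
    # Never committed: any changes that reached disk must be rolled back.
    undo = [t for t in begun if t not in commit_pos]
    return ignore, redo, undo
```

Running it on the WAL from this slide returns `(["T1"], ["T2"], ["T3"])`. Note that the scan still has to look before the checkpoint to find T2's and T3's BEGIN records, which is the cost the next slide raises.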

CHECKPOINTS – CHALLENGES
In this example, the DBMS must stall txns when it takes a checkpoint to ensure a consistent snapshot.
→ We will see how to get around this problem next class.

Scanning the log to find uncommitted txns can take a long time.
→ Unavoidable but we will add hints to the <CHECKPOINT> record to speed things up next class.

How often the DBMS should take checkpoints depends on many different factors…

CHECKPOINTS – FREQUENCY
Checkpointing too often causes the runtime
performance to degrade.
→ System spends too much time flushing buffers.

But waiting a long time is just as bad:
→ The checkpoint will be large and slow.
→ Makes recovery time much longer.

Tunable option that depends on application recovery time requirements.

CONCLUSION
Write-Ahead Logging is (almost) always the best
approach to handle loss of volatile storage.
Use incremental updates (STEAL + NO-FORCE)
with checkpoints.

On Recovery: undo uncommitted txns + redo committed txns.

NEXT CLASS
Better Checkpoint Protocols.
Recovery with ARIES.
