Lecture 11: Transactions & Logging


Transactions

Exam Grades

Stats (out of 90+2)
○ Median: 76.5, Mean: 73.9, High: 92, Standard Deviation: 10.5
○ Regrade requests open for 1 week (until next Tuesday)

Feedback Themes
- TAs were ultra-responsive during the "24 hours"
- Most liked the 24-hour flex time
- Test was too long; we'll recalibrate for the final
Example: SQL Writes

UPDATE Product
SET Price = Price - 1.99
WHERE pname = 'Gizmo'

INSERT INTO SmallProduct(name, price)
SELECT pname, price    -- reads from another table
FROM Product
WHERE price <= 0.99

DELETE FROM Product
WHERE price <= 0.99
How? Example: Mobile Game

[Figure: Game App → User Events → DB → Real-Time DBMS → Report & Share / Business-Product Analysis]

App designer (recap lectures):
  Q1: 1000 users/sec? Q2: Offline? Q3: Support v1, v1' versions?
Systems designer:
  Q7: How to model/evolve game data? Q8: How to scale to millions of users?
  Q9: When machines crash, how to restore game state gracefully?
Product/Biz designer:
  Q4: Which user cohorts? Q5: Next features to build? Q6: Predict ads demand? Experiments to run?



Today's Lecture
1. Why Transactions?
2. Transactions
3. Properties of Transactions: ACID
4. Logging
Example: Unpack an ATM DB Transaction

Read Balance        vs.   Read Balance
Give money                Update Balance
Update Balance            Give money

Visa does > 60,000 TXNs/sec with users & merchants.

Want your $4 Starbucks transaction to wait for a stranger's $10k bet in Las Vegas?
⇒ Transactions can (1) be quick or take a long time, and (2) be unrelated to you.
Transactions are at the core of
-- payment, stock market, banks, ticketing
-- Gmail, Google Docs (e.g., multiple people editing)
Example: Monthly bank interest transaction 'T-Monthly-423'

[Figure: Money table before, and Money table @4:29 am day+1, after the run]

Monthly interest: 10%
4:28 am: starts run on 100M bank accounts
Takes 24 hours to run

UPDATE Money
SET Balance = Balance * 1.1
Example: Monthly bank interest transaction, performance

Cost to update all data:
  100M bank accounts → 100M seeks? (worst case)
  (@10 msec/seek, that's 1 million secs)

Problem 1: SLOW :(
Example, with crash: 'T-Monthly-423'

[Figure: Money table before, and Money table @10:45 am with unknown (??) tuple values]

Monthly interest: 10%
4:28 am: starts run on 100M bank accounts
Takes 24 hours to run
Network outage at 10:29 am; system access restored at 10:45 am

Did T-Monthly-423 complete? Which tuples are bad?
Case 1: T-Monthly-423 crashed
Case 2: T-Monthly-423 completed, and account 4002 deposited $20 at 10:45 am

Problem 2: WRONG :(
Roadmap: primary data structures/algorithms

LOGS → LOCKS → Big Scale → ?????
Today's Lecture
1. Why Transactions?
2. Properties of Transactions: ACID
3. Logging
Transactions: Basic Definition

A transaction ("TXN") is a sequence of one or more operations (reads or writes)
which reflects a single real-world transition.

In the real world, a TXN either happened completely or not at all
(e.g., you withdrew $100 from the bank. Or you didn't.)

START TRANSACTION
UPDATE Product
SET Price = Price - 1.99
WHERE pname = 'Gizmo'
COMMIT
Transactions in SQL

• In “ad-hoc” SQL, each statement = one transaction

• In a program, multiple statements can be grouped together as a transaction

START TRANSACTION
UPDATE Bank SET amount = amount - 100
WHERE name = 'Bob'
UPDATE Bank SET amount = amount + 100
WHERE name = 'Joe'
COMMIT
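
To make the grouping concrete, here is a minimal sketch in Python using the standard-library sqlite3 module (the Bank table and names match the slide; the database file name is hypothetical):

import sqlite3

conn = sqlite3.connect("bank.db")
try:
    with conn:  # opens a transaction; COMMITs on success, ROLLs BACK on exception
        conn.execute("UPDATE Bank SET amount = amount - 100 WHERE name = 'Bob'")
        conn.execute("UPDATE Bank SET amount = amount + 100 WHERE name = 'Joe'")
    # Reaching here means both updates committed atomically.
except sqlite3.Error:
    # Neither update took effect: the whole TXN rolled back as a unit.
    raise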
Motivation for Transactions
Grouping user actions (reads & writes) into transactions helps with two goals:

1. Recovery & Durability: Keep the data consistent and durable,
   despite system crashes, users canceling TXNs part way, etc.
   This lecture!
   Idea: use LOGS, with support to "commit" or "rollback" TXNs.

2. Concurrency: Get better performance by parallelizing TXNs
   without creating 'bad data', despite slow disk writes and reads.
   Next lecture!
   Idea: use LOCKS to run several user TXNs concurrently.


Example 1: Protection against crashes / aborts

Scenario: Make a CheapProduct table from the Product table

Client 1:
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product          ← Crash / abort!
WHERE price <= 0.99

DELETE FROM Product
WHERE price <= 0.99

What goes wrong?

Client 1:
START TRANSACTION
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product
WHERE price <= 0.99

DELETE FROM Product
WHERE price <= 0.99
COMMIT

Now we'd be fine! We'll see how / why this lecture.


Example 2: Multiple users, single statements

Client 1: [at 10:01 am]         Client 2: [at 10:01 am]

UPDATE Product                  UPDATE Product
SET Price = Price - 1.99        SET Price = Price * 0.5
WHERE pname = 'Gizmo'           WHERE pname = 'Gizmo'

Two managers attempt to discount products at the same time.
What could go wrong?

Client 1:                       Client 2:
START TRANSACTION               START TRANSACTION
UPDATE Product                  UPDATE Product
SET Price = Price - 1.99        SET Price = Price * 0.5
WHERE pname = 'Gizmo'           WHERE pname = 'Gizmo'
COMMIT                          COMMIT

Now it works like a charm; we'll see how / why next lecture…
3. Properties of Transactions

What you will learn about in this section:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
ACID: Atomicity

• TXN is all or nothing


• Commits: all the changes are made

• Aborts: no changes are made


ACID: Consistency

• The tables must always satisfy user-specified integrity constraints


• E.g., Account number is unique, Sum of debits and of credits is 0

• How consistency is achieved:


• Programmer writes a TXN to go from one consistent state to a
consistent state
• System makes sure that the TXN is atomic (e.g., if EXCEPTION, rolls
back)
ACID: Isolation

• A TXN executes concurrently with other TXNs

• Effect of TXNs is the same as TXNs running one after another

Conceptually,
• similar to OS “sandboxes”
• E.g. TXNs can’t observe each other’s “partial updates”
ACID: Durability

• The effect of a TXN must persist after the TXN


• And after the whole program has terminated
• And even if there are power failures, crashes, etc.

• ⇒ Write data to durable IO (e.g., disk)


ACID Summary

• Atomic
• State shows either all the effects of TXN, or none of them
• Consistent
• TXN moves from a state where integrity holds, to another where integrity
holds
• Isolated
• Effect of TXNs is the same as TXNs running one after another
• Durable
• Once a TXN has committed, its effects remain in the database
A Note: ACID is one popular option!

• Many debates over ACID, both historically and currently

• Some "NoSQL" DBMSs relax ACID

• In turn, "NewSQL" now reintroduces ACID compliance to NoSQL-style DBMSs…

⇒ Usually, it depends on what consistency and performance your application needs.

ACID is an extremely important & successful paradigm, but still debated!
4. Atomicity & Durability via Logging

Conceptual Idea: Trip to Europe

1. Make a TODO list; buy tickets
2. Actual visit (much longer than buying tickets)
Recall (on disks)

▹ Sequential reads FASTER than random reads


▹ Sequential writes (aka “appends”) FASTER than random writes

Big Idea: LOGs (aka log files, TODO lists, or ledgers)

▹ Any value that changes? Append to LOG!
  ■ The LOG is a compact "todo" list of data updates
▹ Intuition:
  ■ Data pages: (a) update in RAM (fast), (b) update on disk later (slow)
  ■ LOGs: (c) append a "todo" entry to the LOG, and (d) control when you flush the LOG to disk

Many kinds of LOGs. We'll study a few key ones!


What you will learn about in this section:

1. How to make/use LOGs?

2. How to make it fast? (Mess with memory and disk)
Basic Idea: (Physical) Logging

Idea:
• The log consists of an ordered list of Update Records
• Each log record contains UNDO information for every update:
  <TransactionID, &reference (e.g., key), old value, new value>

What does the DB do?
• Owns the log "service" for all applications/transactions
• Appends to the log; flushes when necessary (force-writes to disk)

This is sufficient to UNDO any transaction!
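
As a concrete illustration, here is a minimal Python sketch of such an update record and the backwards UNDO walk (the UpdateRecord name, fields, and sample values are hypothetical, mirroring the <TransactionID, &reference, old value, new value> format above):

from dataclasses import dataclass

@dataclass
class UpdateRecord:
    txn_id: str        # e.g., 'T-Monthly-423'
    key: str           # &reference: which tuple was changed
    old_value: float   # UNDO information
    new_value: float   # also enough to REDO

# The DB appends one record per tuple update to an ordered log:
log = []
log.append(UpdateRecord("T-Monthly-423", "acct-4002", 200.0, 220.0))

# To UNDO a transaction: walk the log backwards, restoring old values.
data = {"acct-4002": 220.0}
for rec in reversed(log):
    if rec.txn_id == "T-Monthly-423":
        data[rec.key] = rec.old_value
print(data)  # -> {'acct-4002': 200.0}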


Example, full run: monthly bank interest transaction 'T-Monthly-423'

[Figure: Money table before, Money table @4:29 am day+1, and the WA Log @4:29 am day+1,
showing Update Records followed by a Commit Record]

Monthly interest: 10%
4:28 am: starts run on 100M bank accounts
Takes 24 hours to run

START TRANSACTION
UPDATE Money
SET Amt = Amt * 1.10
COMMIT
Example, with crash: TXN 'T-Monthly-423'

[Figure: Money table @10:45 am with unknown (??) tuple values, and the WA Log @10:29 am]

Monthly interest: 10%
4:28 am: starts run on 100M bank accounts
Takes 24 hours to run
Network outage at 10:29 am; system access restored at 10:45 am

Did T-Monthly-423 complete? Which tuples are bad?
Case 1: T-Monthly-423 crashed
Case 2: T-Monthly-423 completed, and account 4002 deposited $20 at 10:45 am

Can you infer the answers from the log records (shown in red in the figure)?
Example, recovery: monthly bank interest transaction

[Figure: Money table @10:45 am, Money table after recovery, and the WA Log @10:29 am]

System recovery (after 10:45 am), as sketched in code below:

1. Roll back uncommitted transactions
   - Restore old values from the WAL (if any)
   - Notify developers about the aborted TXN
2. Redo recent committed transactions (with their new values)
3. Back in business; redo (any pending) transactions
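
A self-contained sketch of the rollback-then-redo idea in Python (the tuple-based record format is hypothetical, chosen to keep the example short; real recovery code also handles checkpoints, notification, pending TXNs, etc.):

# Records: ('UPDATE', txn_id, key, old, new) or ('COMMIT', txn_id).
def recover(log, data):
    committed = {r[1] for r in log if r[0] == "COMMIT"}

    # 1. Roll back uncommitted TXNs: walk backwards, restoring old values.
    for r in reversed(log):
        if r[0] == "UPDATE" and r[1] not in committed:
            _, _, key, old, _ = r
            data[key] = old

    # 2. Redo committed TXNs: walk forwards, re-applying new values
    #    (their data pages may not have reached disk before the crash).
    for r in log:
        if r[0] == "UPDATE" and r[1] in committed:
            _, _, key, _, new = r
            data[key] = new
    return data

# Usage: T1 committed; T2 has no COMMIT record, so it is rolled back.
log = [("UPDATE", "T1", "A", 7, 13), ("COMMIT", "T1"),
       ("UPDATE", "T2", "B", 5, 99)]
print(recover(log, {"A": 7, "B": 99}))  # -> {'A': 13, 'B': 5}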
What you will learn about in this section:

1. How to make/use LOGs?

2. ⇒ How to make it fast? (Mess with memory and disk)
A picture of logging

[Figure: Main Memory holds pages A=7, B=5 and the Log buffer; Data on Disk holds A=7; Log on Disk]

The log is a file (like any data table):
1. Pages are updated in RAM
2. Flushed as DB blocks on disk (sequential I/O)

"Flushing to disk" = writing to disk from main memory
A picture of logging

T: R(A=7), W(A=13)    [T reads A=7, writes A=13]

Update Record: <Tid, &A, 7, 13>

[Figure: Main Memory now holds A=13, B=5 and the Log buffer; Data on Disk still holds A=7]

The operation is recorded in the update log in main memory!
Why do we need logging for atomicity?

• Could we just write TXN updates to disk only once the whole TXN completes?
• Then, if we abort / crash and the TXN is not complete, it has no effect: atomicity!
• With unlimited memory and time, this could work…

• ⇒ We need to log partial results of TXNs because of:
  • Memory constraints (e.g., billions of updates)
  • Time constraints (what if one TXN takes very long?)

We need to write partial results to disk!
…And so we need a LOG to (maybe) undo these partial results!
What is the correct way to LOG to disk?

• We’ll look at the Write-Ahead Logging (WAL) protocol

• We’ll see why it works by looking at other protocols which are incorrect!

Remember: the key idea is to ensure durability while maintaining our ability to "undo"!
Write-Ahead Logging (WAL): TXN Commit Protocol

T: R(A), W(A)    Update Record: <Tid, &A, 7, 13>    COMMIT record

Commit after we've written the log to disk, but before we've written the data to disk…

[Figure: Main Memory holds A=13, B=5 and Log-RAM; Data on Disk holds A=7; Log-Disk]

OK, Commit!
Write-Ahead Logging (WAL) Commit Protocol

Commit after we've written the log to disk, but before we've written the data to disk… this is WAL!

[Figure: Main Memory holds A=13; Data on Disk still holds A=7; Log-Disk now holds <Tid, &A, 7, 13>]

OK, Commit!

If we crash now, is T durable? Yes: USE THE LOG to restore A=13!
Write-Ahead Logging (WAL)

Algorithm: WAL

For each tuple update, write an Update Record into LOG-RAM.

Follow two flush rules for the LOG:
● Rule 1: Flush the Update Record to LOG-Disk before the
  corresponding data page goes to storage        → Durability
● Rule 2: Before a TXN commits,
  - flush all its Update Records to LOG-Disk
  - flush its COMMIT Record to LOG-Disk          → Atomicity

A transaction is committed once its COMMIT record is on stable storage.

Timeline:
Rule 1: for each tuple update, the flush of the Update Record to LOG-Disk
happens before the corresponding data flush.
Rule 2: before a TXN COMMIT, all its records are flushed to LOG-Disk.
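
Here is a minimal Python sketch of the two flush rules (the WALManager name and in-memory lists are hypothetical stand-ins for the log file and data pages; real WALs also handle page buffering, checkpoints, etc.):

class WALManager:
    """Toy write-ahead log enforcing the two flush rules."""
    def __init__(self):
        self.log_ram = []    # Update/COMMIT records buffered in RAM
        self.log_disk = []   # stand-in for the log file on disk
        self.data_disk = {}  # stand-in for data pages on disk

    def update(self, txn_id, key, old, new):
        self.log_ram.append(("UPDATE", txn_id, key, old, new))

    def flush_log(self):
        self.log_disk.extend(self.log_ram)  # sequential append
        self.log_ram.clear()

    def flush_data_page(self, key, value):
        self.flush_log()             # Rule 1: the update record reaches
        self.data_disk[key] = value  # LOG-Disk before the data page does

    def commit(self, txn_id):
        self.log_ram.append(("COMMIT", txn_id))
        self.flush_log()             # Rule 2: all update records, then the
                                     # COMMIT record, hit LOG-Disk before
                                     # we report "committed"

wal = WALManager()
wal.update("T1", "A", 7, 13)
wal.commit("T1")  # T1 is now durable; the data page can follow lazily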
Incorrect Commit Protocol #1

T: R(A), W(A)    A: 7 → 13

Let's try committing before we've written either the data or the LOG to disk…

[Figure: Main Memory holds A=13, B=5 and Log-RAM; Data on Disk holds A=7; Log-Disk is empty]

OK, Commit!

If we crash now, is T durable? No: we lost T's update!
Incorrect Commit Protocol #2

T: R(A), W(A)    A: 7 → 13

Let's try committing after we've written the data but before we've written the LOG to disk…

[Figure: Main Memory holds A=13, B=5 and Log-RAM; Data on Disk holds A=13; Log-Disk is empty]

OK, Commit!

If we crash now, is T durable? Yes! Except… how do we know whether T was committed?
We lose that information without the log!
Example, performance: monthly bank interest transaction

[Figure: Money table before, Money table @4:29 am day+1, and the WAL @4:29 am day+1]

Cost to update all data:
  100M bank accounts → 100M seeks? (worst case)
  (@10 msec/seek, that's 1 million secs)

Cost to append to the log:
  1 seek to get to the 'end of log'
  + write 100M log entries sequentially (fast!!! < 10 sec)
  [Lazily update the data on disk later, when convenient.]

Speedup for TXN commit: 1 million secs vs 10 secs!!!
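
As a back-of-envelope check (the seek time is from the slide; the record size and sequential bandwidth are assumed figures for illustration):

$$10^8 \ \text{seeks} \times 10\ \mathrm{ms/seek} = 10^6\ \mathrm{s} \approx 11.6\ \text{days}$$

versus, assuming roughly 50-byte log records and ~500 MB/s of sequential disk bandwidth:

$$1\ \text{seek} + \frac{10^8 \times 50\ \mathrm{B}}{500\ \mathrm{MB/s}} \approx 10\ \mathrm{s}$$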
Logging Summary

• If the DB says a TXN committed, the TXN's effects remain even after a database crash

• The DB can undo actions, which gives us atomicity

• This is only half the story…
