18 Timestamp Ordering
Optimistic Concurrency Control
15-445/645 FALL 2024 PROF. ANDY PAVLO
LAST CLASS
We discussed concurrency control protocols for
generating conflict serializable schedules without
needing to know what queries a txn will execute.
OBSERVATION
If you assume that conflicts between txns are rare
and that most txns are short-lived, then forcing
txns to acquire locks adds unnecessary overhead.
TIMESTAMP ALLOCATION
Each txn Ti is assigned a unique fixed timestamp
that is monotonically increasing.
→ Let TS(Ti) be the timestamp allocated to txn Ti.
→ Different schemes assign timestamps at different times
during the txn.
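A minimal sketch of one allocation scheme, assuming a single-node DBMS that hands out timestamps from a logical counter (systems may instead use clocks or hybrids). `TimestampAllocator` and `next_ts` are hypothetical names; the class only guarantees uniqueness and monotonicity, while *when* a txn calls it is up to the protocol.

```python
import itertools
import threading


class TimestampAllocator:
    """Hands out unique, monotonically increasing txn timestamps.

    Hypothetical sketch: a single logical counter guarded by a mutex.
    """

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_ts(self) -> int:
        # Holding the lock makes "take current value and advance" atomic,
        # so two concurrent txns can never receive the same timestamp.
        with self._lock:
            return next(self._counter)


if __name__ == "__main__":
    alloc = TimestampAllocator()
    print(alloc.next_ts(), alloc.next_ts())  # prints: 1 2
```

For example, basic T/O would call `next_ts()` when the txn begins, while OCC calls it at the start of the validation phase.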
TODAY’S AGENDA
Optimistic Concurrency Control
Phantom Reads
Isolation Levels
DB Flash Talk: Weaviate
OCC PHASES
Phase #1 – Read
→ Track the read/write sets of txns and store their writes in a
private workspace.
→ DBMS copies every tuple that the txn accesses from the
shared database to its workspace to ensure repeatable reads.
Phase #2 – Validation
→ Assign the txn a unique timestamp (TS) and then check
whether it conflicts with other txns.
Phase #3 – Write
→ If validation succeeds, set the write timestamp (W-TS) on all
modified objects in the private workspace and install them into
the shared database.
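The three phases can be sketched in a few dozen lines. This is a single-threaded sketch under stated assumptions, not the lecture's implementation: it assumes *backward* validation (check the txns that committed after this txn began for writes that overlap its read set) and assumes Validation + Write run as one critical section, one txn at a time. `Database`, `Txn`, `Version`, and `AbortError` are hypothetical names; `math.inf` stands in for the "W-TS = ∞" marker of a value that exists only in a private workspace.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    wts: float  # write timestamp (W-TS); math.inf = not yet installed


class AbortError(Exception):
    """Raised when a txn fails validation and must abort."""


class Database:
    def __init__(self, initial: dict) -> None:
        self.rows = {k: Version(v, 0) for k, v in initial.items()}
        self.ts_counter = itertools.count(1)
        self.commit_log = []  # write sets of committed txns, in commit order

    def begin(self) -> "Txn":
        return Txn(self)


class Txn:
    def __init__(self, db: Database) -> None:
        self.db = db
        self.workspace = {}  # private copies of every tuple the txn touches
        self.read_set = set()
        self.write_set = set()
        self.start = len(db.commit_log)  # commits that happened before BEGIN
        self.ts = None

    # Phase 1: READ. Reads and writes only touch the private workspace.
    def read(self, key):
        if key not in self.workspace:
            v = self.db.rows[key]
            self.workspace[key] = Version(v.value, v.wts)
            self.read_set.add(key)
        return self.workspace[key].value

    def write(self, key, value):
        self.workspace[key] = Version(value, math.inf)
        self.write_set.add(key)

    def commit(self) -> int:
        # Phase 2: VALIDATE. Assign TS, then (backward validation) make sure
        # no txn that committed after our BEGIN wrote something we read.
        self.ts = next(self.db.ts_counter)
        for other_writes in self.db.commit_log[self.start:]:
            overlap = other_writes & self.read_set
            if overlap:
                raise AbortError(f"read-write conflict on {set(overlap)}")
        # Phase 3: WRITE. Stamp W-TS and install into the shared database.
        for key in self.write_set:
            self.db.rows[key] = Version(self.workspace[key].value, self.ts)
        self.db.commit_log.append(frozenset(self.write_set))
        return self.ts


if __name__ == "__main__":
    # Replays the OCC example schedule: T2 validates first, then T1.
    db = Database({"A": 123})
    t1 = db.begin()
    t1.read("A")
    t2 = db.begin()
    t2.read("A")
    t2.commit()
    t1.write("A", 456)
    t1.read("A")
    t1.commit()
    print(db.rows["A"])  # Version(value=456, wts=2)
```

Because the timestamp is drawn at the start of validation, T1 begins first but receives the larger timestamp, exactly as in the example schedule.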
OCC EXAMPLE
Schedule (time flows downward):
T1: BEGIN, READ phase: R(A)
T2: BEGIN, READ phase: R(A), VALIDATE, WRITE, COMMIT
T1: W(A), R(A), VALIDATE, WRITE, COMMIT
Database initially: A = 123, W-TS = 0.
→ T1 R(A): copies A (123, W-TS 0) into T1's workspace.
→ T2 R(A): copies A (123, W-TS 0) into T2's workspace.
→ T2 VALIDATE: TS(T2)=1. T2 wrote nothing, so its WRITE phase installs nothing and it commits.
→ T1 W(A): sets A = 456 in T1's workspace with W-TS = ∞ (no timestamp assigned yet).
→ T1 R(A): reads 456 from its own workspace.
→ T1 VALIDATE: TS(T1)=2. T1 began first but gets the larger timestamp because timestamps are assigned at validation.
→ T1 WRITE: installs A = 456 with W-TS = 2 into the database, then commits.
T1 finishes READ, VALIDATE, WRITE, and COMMIT before T2 does BEGIN, READ, VALIDATE, WRITE, COMMIT, so all of T1's phases happen before T2's.
→ This just means that there is a serial ordering.
Schedule: T1 does R(A) and W(A), then VALIDATEs while T2 is still in its READ phase after its own R(A).
→ T1 R(A): copies A (123, W-TS 0) into T1's workspace.
→ T1 W(A): sets A = 456 in T1's workspace with W-TS = ∞.
→ T2 R(A): copies the committed A (123, W-TS 0) into T2's workspace.
→ T1 VALIDATE: T1's write set {A} overlaps the read set of the still-active T2, so T1 must abort even though T2 did not modify the database.
Schedule: T1 does R(A) and W(A); T2 does R(A), then VALIDATE, WRITE, COMMIT; afterwards T1 does VALIDATE, WRITE, COMMIT.
→ T1's workspace holds A = 456 (W-TS ∞); T2's workspace holds A = 123 (W-TS 0).
→ T2 validates first. It wrote nothing, so its WRITE phase installs nothing and it commits.
→ Safe to commit T1 because T2 commits logically before T1.
Schedule: T1 does R(A), W(A), VALIDATE, WRITE, COMMIT while T2 does R(B); after T1 commits, T2 does R(A), VALIDATE, WRITE, COMMIT.
→ T1's workspace holds A = 456 (W-TS ∞); T2's workspace holds B = XYZ (W-TS 0).
→ T1 VALIDATE: TS(T1)=1. T1's write set {A} does not overlap T2's read set {B}, so T1 installs A = 456 with W-TS = 1 and commits.
→ T2 R(A): copies the newly committed A (456, W-TS 1) into T2's workspace.
→ T2 VALIDATE: TS(T2)=2, then WRITE and COMMIT.
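The three examples above all turn on one check made by the validating txn against txns that are still running: its write set must not overlap their read sets. A minimal sketch of that forward-validation check follows; `TxnSets` and `forward_conflicts` are hypothetical names, and the sketch omits the other checks a full validator needs (for example, write-write overlaps).

```python
from dataclasses import dataclass, field


@dataclass
class TxnSets:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def forward_conflicts(validating: TxnSets, active: list) -> list:
    """Names of still-active txns whose read sets overlap the validating
    txn's write set. A non-empty result means the validating txn aborts."""
    return [t.name for t in active if validating.write_set & t.read_set]


if __name__ == "__main__":
    # T1 VALIDATEs while T2 is still active and has read A -> T1 aborts.
    t1 = TxnSets("T1", read_set={"A"}, write_set={"A"})
    t2 = TxnSets("T2", read_set={"A"})
    print(forward_conflicts(t1, [t2]))  # ['T2']

    # T2 VALIDATEs first: it wrote nothing, so it commits and leaves the
    # active set; T1 then validates against no active txns.
    print(forward_conflicts(t2, [t1]), forward_conflicts(t1, []))  # [] []

    # T2 has read only B when T1 validates -> no overlap with T1's {A}.
    t2b = TxnSets("T2", read_set={"B"})
    print(forward_conflicts(t1, [t2b]))  # []
```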
[Figure: Validation Scope: Txn #1, Txn #2, and Txn #3 each run and COMMIT along a TIME axis]
Parallel Commits:
→ Use fine-grained write latches to support parallel
Validation/Write phases.
→ Txns acquire latches in a sequential key order to avoid
deadlocks.
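A sketch of the latch-ordering rule, assuming one mutex per key and keys that are mutually comparable so they can be sorted. `LatchTable`, `validate_and_write`, and `commit_fn` are hypothetical names; `commit_fn` stands in for the txn's actual Validation/Write work.

```python
import threading
from contextlib import ExitStack


class LatchTable:
    """Hypothetical table of per-key write latches (one mutex per key)."""

    def __init__(self) -> None:
        self._latches = {}
        self._guard = threading.Lock()  # protects the dict itself

    def latch_for(self, key) -> threading.Lock:
        with self._guard:
            return self._latches.setdefault(key, threading.Lock())


def validate_and_write(latches: LatchTable, write_set, commit_fn):
    """Run one txn's Validation/Write phases while holding latches on its
    write set. Latches are taken in sorted key order, so every txn uses the
    same global order and no cycle of waiters (deadlock) can form."""
    with ExitStack() as stack:
        for key in sorted(write_set):
            stack.enter_context(latches.latch_for(key))
        return commit_fn()  # latches are released when the with-block exits


if __name__ == "__main__":
    print(validate_and_write(LatchTable(), {"B", "A"}, lambda: "committed"))
```

Txns with disjoint write sets never touch the same latch, so their Validation/Write phases can proceed in parallel; only overlapping txns serialize.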
OCC: OBSERVATIONS
OCC works well when the # of conflicts is low:
→ All txns are read-only (ideal).
→ Txns access disjoint subsets of data.
OBSERVATION
We have only dealt with transactions that read and
update existing objects in the database.
[Figure: schedule of concurrent txns, each ending in COMMIT, in which T1's count query returns cnt=100]
OOPS?
How did this happen?
→ Because T1 locked only existing records and not ones that
other txns are adding to the database!
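The failure can be reproduced with a toy record-lock table that, like the scheme above, can only lock records that already exist. The table contents (99 'lit' rows) and the names `RecordLocks` and `count_lit` are hypothetical, chosen for illustration.

```python
class RecordLocks:
    """Hypothetical lock table that can only lock records that already exist."""

    def __init__(self) -> None:
        self.owner = {}  # record id -> txn name holding its lock

    def lock(self, txn: str, rid: int) -> None:
        self.owner[rid] = txn

    def may_write(self, txn: str, rid: int) -> bool:
        return self.owner.get(rid, txn) == txn


def count_lit(table):
    return sum(1 for row in table if row["status"] == "lit")


# Hypothetical data: 99 existing people whose status is 'lit'.
people = [{"id": i, "name": f"person{i}", "status": "lit"} for i in range(1, 100)]
locks = RecordLocks()

# T1: lock every existing matching record, then count.
for row in people:
    if row["status"] == "lit":
        locks.lock("T1", row["id"])
first = count_lit(people)

# T2: INSERT (101, 'Andy', 'lit'). Record 101 does not exist yet, so no
# lock covers it and the insert is allowed.
if locks.may_write("T2", 101):
    people.append({"id": 101, "name": "Andy", "status": "lit"})

# T1 repeats the same query and sees a phantom.
second = count_lit(people)
print(first, second)  # 99 100
```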
RE-EXECUTE SCANS
The DBMS tracks the WHERE clause for all queries
that the txn executes.
→ Retain the scan set for every range query in a txn.
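A minimal sketch of re-executing scans at commit time, assuming rows carry a unique `id`, WHERE clauses are modeled as Python predicates, and the txn itself did not modify the scanned table (otherwise its own writes would also change the result). `ScanSetTracker`, `scan`, and `validate` are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ScanSetTracker:
    """Hypothetical per-txn record of every range scan (WHERE clause + result ids)."""

    def __init__(self) -> None:
        self.scans: List[Tuple[Predicate, FrozenSet[object]]] = []

    def scan(self, table: List[Row], where: Predicate) -> List[Row]:
        rows = [r for r in table if where(r)]
        self.scans.append((where, frozenset(r["id"] for r in rows)))
        return rows

    def validate(self, table: List[Row]) -> bool:
        """At commit, re-run each scan. A different result set means another
        txn inserted or deleted a matching row, or changed whether a row
        matches (a phantom), so the txn should abort."""
        for where, seen in self.scans:
            if frozenset(r["id"] for r in table if where(r)) != seen:
                return False
        return True


if __name__ == "__main__":
    table = [{"id": 1, "status": "lit"}, {"id": 2, "status": "dull"}]
    t1 = ScanSetTracker()
    t1.scan(table, lambda r: r["status"] == "lit")
    table.append({"id": 3, "status": "lit"})  # concurrent insert
    print(t1.validate(table))  # False
```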
PREDICATE LOCKING
Proposed locking scheme from System R.
→ Shared lock on the predicate in a WHERE clause of a SELECT
query.
→ Exclusive lock on the predicate in a WHERE clause of any
UPDATE, INSERT, or DELETE query.
PREDICATE LOCKING
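SELECT COUNT(*) AS cnt FROM people WHERE status='lit'
→ Shared predicate lock: status='lit'
INSERT INTO people VALUES (101, 'Andy', 'lit')
→ Exclusive predicate lock: name='Andy' ∧ status='lit'
The inserted record satisfies both predicates, so the two locks conflict.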
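Deciding whether two predicate locks conflict means asking whether some record could satisfy both predicates. A sketch under a simplifying assumption: predicates are conjunctions of `column = constant` terms (general predicates need a full satisfiability test, which is much harder). `may_overlap` and `predicate_locks_conflict` are hypothetical names.

```python
from typing import Dict

# A predicate is modeled as a conjunction of column = constant terms,
# e.g. {"status": "lit"} for WHERE status='lit' (a simplifying assumption).
Predicate = Dict[str, object]


def may_overlap(p: Predicate, q: Predicate) -> bool:
    """True if some record could satisfy both p and q: every column that
    both predicates constrain must be required to have the same value."""
    return all(p[col] == q[col] for col in p.keys() & q.keys())


def predicate_locks_conflict(mode1: str, p1: Predicate,
                             mode2: str, p2: Predicate) -> bool:
    if mode1 == "S" and mode2 == "S":
        return False  # two shared (SELECT) predicate locks never conflict
    return may_overlap(p1, p2)


select_lock = ("S", {"status": "lit"})                  # SELECT ... WHERE status='lit'
insert_lock = ("X", {"name": "Andy", "status": "lit"})  # INSERT (101, 'Andy', 'lit')
print(predicate_locks_conflict(*select_lock, *insert_lock))  # True
```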
KEY-VALUE LOCKS
Locks that cover a single key-value in an index.
Need “virtual keys” for non-existent values.
[Figure: index containing keys 10, 12, 14, 16]
GAP LOCKS
Each txn acquires a key-value lock on the single key
that it wants to access. It then acquires a gap lock on
the gap between that key and the next key.
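A sketch of which locks a point access takes and which gap an insert must check, assuming the index's keys are available as a sorted Python list and using `math.inf` / `-math.inf` as the virtual keys past either end. Lock modes and the lock manager itself are omitted; `locks_for_access` and `gap_for_insert` are hypothetical names.

```python
import bisect
import math


def locks_for_access(keys, k):
    """Locks a txn takes to access existing key k in a sorted index:
    a key-value lock on k plus a gap lock on (k, next key).
    Keys past the last entry use a virtual +inf key."""
    i = bisect.bisect_right(keys, k)
    next_key = keys[i] if i < len(keys) else math.inf
    return [("key", k), ("gap", k, next_key)]


def gap_for_insert(keys, new_key):
    """The gap that inserting new_key (not already present) falls into."""
    i = bisect.bisect_left(keys, new_key)
    lo = keys[i - 1] if i > 0 else -math.inf
    hi = keys[i] if i < len(keys) else math.inf
    return ("gap", lo, hi)


index = [10, 12, 14, 16]
held_by_t1 = locks_for_access(index, 12)
print(held_by_t1)                               # [('key', 12), ('gap', 12, 14)]
print(gap_for_insert(index, 13) in held_by_t1)  # True: T2's insert of 13 must wait
```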
KEY-RANGE LOCKS
Locks that cover a key value and the gap to the next
key value in a single index.
→ Need “virtual keys” for artificial values (infinity)
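A sketch of computing the key-range locks a range scan needs, assuming each lock covers `[k, next_k)` in a single index whose keys are given as a sorted list, with virtual keys `-math.inf` and `math.inf` bounding the outermost ranges. `key_range_locks` is a hypothetical name; lock modes are omitted.

```python
import bisect
import math


def key_range_locks(keys, lo, hi):
    """Key-range locks [k, next_k) a scan of lo <= key <= hi must hold so
    no other txn can insert a matching key. `keys` is the sorted key list
    of a single index; virtual keys -inf / +inf bound the outer ranges."""
    bounds = [-math.inf, *keys, math.inf]
    first = bisect.bisect_right(bounds, lo) - 1
    last = min(bisect.bisect_right(bounds, hi) - 1, len(bounds) - 2)
    return [(bounds[i], bounds[i + 1]) for i in range(first, last + 1)]


index = [10, 12, 14, 16]
print(key_range_locks(index, 12, 15))  # [(12, 14), (14, 16)]
print(key_range_locks(index, 17, 20))  # [(16, inf)]
```

Compared with separate key-value and gap locks, each entry here is one lock that covers a key and the gap after it, so a scan makes fewer lock requests.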
HIERARCHICAL LOCKING
Allow for a txn to hold wider key-range locks with
different locking modes.
→ Reduces the number of visits to lock manager.
[Figure: B+Tree leaf node with keys 10 {Gap} 12 {Gap} 14 {Gap} 16; the txn holds IX on [10, 16), then X on [12, 12] and X on [14, 16)]
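The lock modes in the figure follow the standard multiple-granularity compatibility matrix. A sketch that applies it to key ranges, assuming the lock manager only compares modes held by *other* txns on the same range; the parent-before-child rule (IX on [10, 16) before X inside it) is not enforced here. `COMPATIBLE` and `can_grant` are hypothetical names.

```python
# Standard multiple-granularity compatibility matrix:
# COMPATIBLE[held] = modes another txn may be granted on the same range.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def can_grant(requested: str, held_by_others: list) -> bool:
    """Grant only if the request is compatible with every mode that other
    txns already hold on the same range."""
    return all(requested in COMPATIBLE[h] for h in held_by_others)


# Two txns can both hold IX on the leaf's range [10, 16) ...
print(can_grant("IX", ["IX"]))  # True
# ... but X on [14, 16) blocks another txn's S on that same range.
print(can_grant("S", ["X"]))    # False
```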
ISOLATION LEVELS
Controls the extent that a txn is exposed to the
actions of other concurrent txns.
ISOLATION LEVELS
Isolation (High→Low)
SERIALIZABLE: No phantoms, all reads repeatable,
no dirty reads.
ISOLATION LEVELS
Isolation level   Dirty Read   Unrepeatable Read   Lost Updates   Phantom
SERIALIZABLE      No           No                  No             No
REPEATABLE READ   No           No                  No             Maybe
ISOLATION LEVELS
SERIALIZABLE: Strong Strict 2PL with phantom
protection (e.g., index locks).
BEGIN TRANSACTION ISOLATION LEVEL <isolation-level>;
Not all DBMSs support all isolation levels in all
execution scenarios.
→ Replicated Environments
The default depends on the implementation…
ISOLATION LEVELS
DBMS              Default               Maximum
Actian Ingres     SERIALIZABLE          SERIALIZABLE
IBM DB2           CURSOR STABILITY      SERIALIZABLE
CockroachDB       SERIALIZABLE          SERIALIZABLE
Google Spanner    STRICT SERIALIZABLE   STRICT SERIALIZABLE
MSFT SQL Server   READ COMMITTED        SERIALIZABLE
MySQL             REPEATABLE READ       SERIALIZABLE
Oracle            READ COMMITTED        SNAPSHOT ISOLATION
PostgreSQL        READ COMMITTED        SERIALIZABLE
SAP HANA          READ COMMITTED        SERIALIZABLE
VoltDB            SERIALIZABLE          SERIALIZABLE
YugaByte          SNAPSHOT ISOLATION    SERIALIZABLE
[Figure: bar chart with bars for Read Uncommitted, Read Committed, Cursor Stability, Repeatable Read, Snapshot Isolation, and Serializable]
CONCLUSION
Every concurrency control protocol can be broken
down into the basic concepts that have been
described in the last two lectures.
→ Pessimistic: Locking
→ Optimistic: Timestamps
NEXT CLASS
Multi-Version Concurrency Control