Lecture 2
Outline
- Introduction
- Distributed DBMS Architecture
- Distributed Database Design
- Distributed Query Processing
- Distributed Concurrency Control
  - Transaction Concepts & Models
  - Serializability
  - Distributed Concurrency Control Protocols
Transaction
A transaction is a collection of actions that make consistent
transformations of system states while preserving system consistency.
It provides:
- concurrency transparency
- failure transparency
Figure: the database is in a consistent state before the transaction starts,
may be temporarily in an inconsistent state during execution, and is again
in a consistent state when the transaction ends.
Example Database
Example Transaction
Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL UPDATE FLIGHT
             SET    STSOLD = STSOLD + 1
             WHERE  FNO = flight_no AND DATE = date;
    EXEC SQL INSERT
             INTO   FC(FNO, DATE, CNAME, SPECIAL)
             VALUES (flight_no, date, customer_name, null);
    output("reservation completed")
end. {Reservation}
Termination of Transactions
Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL SELECT STSOLD, CAP
             INTO   temp1, temp2
             FROM   FLIGHT
             WHERE  FNO = flight_no AND DATE = date;
    if temp1 = temp2 then
        output("no free seats");
        Abort
    else
        EXEC SQL UPDATE FLIGHT
                 SET    STSOLD = STSOLD + 1
                 WHERE  FNO = flight_no AND DATE = date;
        EXEC SQL INSERT
                 INTO   FC(FNO, DATE, CNAME, SPECIAL)
                 VALUES (flight_no, date, customer_name, null);
        Commit;
        output("reservation completed")
    endif
end. {Reservation}
Properties of Transactions
ATOMICITY
- all or nothing
CONSISTENCY
- no violation of integrity constraints
ISOLATION
- concurrent changes invisible ⇒ serializable
DURABILITY
- committed updates persist
Transactions Provide…
- Reliability protocols
  - Atomicity & Durability
Transaction Processing Issues
Architecture Revisited
Figure: the distributed execution monitor. The Transaction Manager (TM)
receives Begin_transaction, Read, Write, Commit, and Abort requests and
returns results, communicating with the TMs at other sites; it passes
scheduling/descheduling requests to the Scheduler (SC), which communicates
with the SCs at other sites and forwards operations to the data processor.
Centralized Transaction
Execution
Figure: user applications submit Read, Write, Abort, and EOT operations and
receive results; the Scheduler (SC) orders them and passes the scheduled
operations to the Recovery Manager (RM).
Distributed Transaction
Execution
Figure: the user application's operations are handled by the Schedulers (SC)
at the participating sites, which jointly run the distributed concurrency
control protocol, and by the Recovery Managers (RM) at each site, which run
the local recovery protocol.
Concurrency Control
- The problem of synchronizing concurrent transactions such that the
  consistency of the database is maintained while, at the same time, a
  maximum degree of concurrency is achieved.
- Anomalies:
  - Lost updates
    - The effects of some transactions are not reflected in the database.
  - Inconsistent retrievals
    - A transaction that reads the same data item more than once should
      always read the same value; otherwise its retrievals are inconsistent.
H1 = {W2(x), R1(x), R3(x), W1(x), C1, W2(y), R3(y), R2(z), C2, R3(z), C3}
Serial History
- All the actions of a transaction occur consecutively.
- No interleaving of transaction operations.
- If each transaction is consistent (obeys the integrity rules), then the
  database is guaranteed to be consistent at the end of executing a serial
  history.

Hs = {W2(x), W2(y), R2(z), C2, R1(x), W1(x), C1, R3(x), R3(y), R3(z), C3}
Serializable History
- Transactions execute concurrently, but the net effect of the resulting
  history upon the database is equivalent to that of some serial history.
- Equivalent with respect to what?
  - Conflict equivalence: the relative order of execution of the conflicting
    operations belonging to unaborted transactions is the same in both
    histories.
  - Conflicting operations: two incompatible operations (e.g., a Read and a
    Write) conflict if they both access the same data item.
    - The incompatible operations of each transaction are assumed to
      conflict; their execution order is never changed.
    - If two operations from two different transactions conflict, the
      corresponding transactions are also said to conflict.
  (A sketch of a conflict-serializability check follows.)
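Conflict equivalence suggests a mechanical test: build a precedence graph
with an edge Ti → Tj whenever an operation of Ti precedes and conflicts with
an operation of Tj, and check the graph for cycles. The sketch below is a
minimal illustration in Python (the operation encoding and function names
are assumptions, not part of the lecture); commit operations are omitted and
all transactions are assumed committed.

    from collections import defaultdict

    # Operations are encoded as ("R" | "W", txn_id, item), e.g. ("W", 2, "x").
    def precedence_graph(history):
        edges = defaultdict(set)
        for i, (op1, t1, x1) in enumerate(history):
            for op2, t2, x2 in history[i + 1:]:
                if x1 == x2 and t1 != t2 and "W" in (op1, op2):
                    edges[t1].add(t2)   # t1's operation precedes t2's conflicting one
        return edges

    def has_cycle(edges):
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)
        def visit(n):
            color[n] = GREY
            for m in edges[n]:
                if color[m] == GREY or (color[m] == WHITE and visit(m)):
                    return True
            color[n] = BLACK
            return False
        return any(color[n] == WHITE and visit(n) for n in list(edges))

    # H1 from above: conflict-serializable iff its precedence graph is acyclic.
    H1 = [("W",2,"x"), ("R",1,"x"), ("R",3,"x"), ("W",1,"x"),
          ("W",2,"y"), ("R",3,"y"), ("R",2,"z"), ("R",3,"z")]
    print(not has_cycle(precedence_graph(H1)))   # True

For H1 the only edges are T2 → T1, T2 → T3 and T3 → T1, so the graph is
acyclic and H1 is conflict-serializable.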
Serializability in Distributed
DBMS
- Somewhat more involved. Two histories have to be considered:
  - local histories
  - global history
Global Non-serializability
T1: Read(x)          T2: Read(x)
    x ← x + 5            x ← x ∗ 15
    Write(x)             Write(x)
    Commit               Commit

If x is replicated at two sites and one site executes T1 before T2 while the
other executes T2 before T1, each local history is serializable, yet there
is no single serial order consistent with both, so the global execution is
not serializable.
Concurrency Control
Algorithms
- Pessimistic
  - Two-Phase Locking-based (2PL)
    - Centralized (primary site) 2PL
    - Primary copy 2PL
    - Distributed 2PL
  - Timestamp Ordering (TO)
    - Basic TO
    - Multiversion TO
    - Conservative TO
  - Hybrid
- Optimistic
  - Locking-based
  - Timestamp ordering-based
Locking-Based Algorithms
- Transactions indicate their intentions by requesting locks from the
  scheduler (called the lock manager).
- Locks are either read locks (rl) [also called shared locks] or write
  locks (wl) [also called exclusive locks].
- Read locks and write locks conflict (because Read and Write operations
  are incompatible):

        rl    wl
  rl    yes   no
  wl    no    no

- Locking works nicely to allow concurrent processing of transactions.
Two-Phase Locking (2PL)
A transaction locks an object before using it.
When an object is locked by another transaction, the requesting transaction
must wait.
When a transaction releases a lock, it may not request another lock.

Figure: the number of locks held grows during Phase 1 (obtaining locks) up
to the lock point and then shrinks during Phase 2 (releasing locks), between
BEGIN and END of the transaction.
Strict 2PL
Hold all locks until the end of the transaction.

Figure: locks are obtained during the period of data item use but released
only at the END of the transaction, so they are held for the full
transaction duration. (A minimal lock-manager sketch follows.)
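To make the locking rules concrete, here is a minimal single-site sketch of
a strict-2PL-style lock manager in Python (a hypothetical illustration; the
class and method names are my own, not the lecture's): read locks are
shared, write locks are exclusive, and all locks are released only at commit
or abort.

    from collections import defaultdict

    class LockManager:
        def __init__(self):
            self.readers = defaultdict(set)   # item -> txns holding a read lock
            self.writer = {}                  # item -> txn holding the write lock

        def read_lock(self, txn, item):
            if item in self.writer and self.writer[item] != txn:
                return False                  # conflicting write lock: caller must wait
            self.readers[item].add(txn)
            return True

        def write_lock(self, txn, item):
            if (self.readers[item] - {txn}) or self.writer.get(item, txn) != txn:
                return False                  # a lock is held by another txn: wait
            self.writer[item] = txn
            return True

        def release_all(self, txn):           # called only at commit/abort (strict 2PL)
            for holders in self.readers.values():
                holders.discard(txn)
            for item in [i for i, t in self.writer.items() if t == txn]:
                del self.writer[item]

    lm = LockManager()
    assert lm.read_lock("T1", "x") and not lm.write_lock("T2", "x")
    lm.release_all("T1")
    assert lm.write_lock("T2", "x")

Returning False stands in for blocking the requester; a real scheduler would
queue the request and re-grant it once the conflicting locks are released.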
Centralized 2PL
- There is only one 2PL scheduler (lock manager) in the distributed system.
- Lock requests are issued to the central scheduler.
Figure: centralized 2PL message flow among the data processors at the
participating sites, the coordinating TM, and the central-site LM:
(1) Lock Request from the TM to the LM, (2) Lock Granted, (3) Operation sent
to the data processors, (4) End of Operation, (5) Release Locks.
Distributed 2PL
- 2PL schedulers are placed at each site. Each scheduler handles the lock
  requests for the data at that site.
- A transaction may read any one of the replicated copies of item x by
  obtaining a read lock on one of the copies of x. Writing into x requires
  obtaining write locks on all copies of x. (A sketch of this read-one/
  write-all rule follows.)
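A small sketch of this read-one/write-all rule (the replica map, site names
and function name are invented for illustration): it computes which lock
requests must be sent for a read versus a write on a replicated item.

    # Hypothetical replica map: item -> sites holding a copy.
    replicas = {"x": ["site1", "site2", "site3"], "y": ["site2"]}

    def lock_requests(op, item):
        """Return the (site, lock_type) requests needed for this operation."""
        sites = replicas[item]
        if op == "read":
            return [(sites[0], "rl")]              # a read lock on any single copy
        if op == "write":
            return [(s, "wl") for s in sites]      # write locks on every copy
        raise ValueError(op)

    print(lock_requests("read", "x"))    # [('site1', 'rl')]
    print(lock_requests("write", "x"))   # write locks at site1, site2, site3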
Distributed 2PL Execution
Figure: distributed 2PL message flow among the coordinating TM, the
participating LMs, and the participating DPs: (1) Lock Request,
(2) Operation, (3) End of Operation, (4) Release Locks.
Timestamp Ordering
Transaction Ti is assigned a globally unique timestamp ts(Ti).
The transaction manager attaches the timestamp to all operations issued by
the transaction.
Each data item is assigned a write timestamp (wts) and a read timestamp
(rts):
- rts(x) = largest timestamp of any read on x
- wts(x) = largest timestamp of any write on x
Conflicting operations are resolved by timestamp order.

Basic T/O (a sketch follows):
    for Ri(x): if ts(Ti) < wts(x) then reject Ri(x)
               else accept Ri(x); rts(x) ← max(rts(x), ts(Ti))
    for Wi(x): if ts(Ti) < rts(x) or ts(Ti) < wts(x) then reject Wi(x)
               else accept Wi(x); wts(x) ← ts(Ti)
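As a rough illustration of the Basic T/O rule (a sketch with assumed names,
not the lecture's code), the scheduler below keeps rts/wts per item and
rejects operations that arrive too late; in a real system a rejected
operation would restart its transaction with a new timestamp.

    rts, wts = {}, {}   # read / write timestamps per data item

    def read(ts_i, x):
        if ts_i < wts.get(x, 0):
            return "reject"                          # a younger txn already wrote x
        rts[x] = max(rts.get(x, 0), ts_i)
        return "accept"

    def write(ts_i, x):
        if ts_i < rts.get(x, 0) or ts_i < wts.get(x, 0):
            return "reject"                          # a younger txn already read or wrote x
        wts[x] = ts_i
        return "accept"

    print(write(2, "x"), read(1, "x"), read(3, "x"), write(2, "x"))
    # -> accept reject accept reject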
Multiversion Timestamp
Ordering
- Do not modify the values in the database; create new versions instead.
- A Ri(x) is translated into a read on one version of x:
  - find the version of x (say xv) such that ts(xv) is the largest timestamp
    less than ts(Ti). (A sketch of this version selection follows.)
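A minimal sketch of this version-selection rule (the data layout is assumed
for illustration): among the versions of x, return the one with the largest
write timestamp that is still below the reader's timestamp.

    versions = {"x": [(5, "v5"), (12, "v12"), (20, "v20")]}   # (wts, value) pairs

    def mv_read(ts_i, item):
        older = [(t, v) for t, v in versions[item] if t < ts_i]
        return max(older)[1] if older else None   # largest wts below ts(Ti)

    print(mv_read(15, "x"))   # -> 'v12'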
Optimistic execution
Optimistic Concurrency Control
Algorithms
Figure: each transaction passes through a read (R), validate (V), and write
(W) phase; in the simplest validation case, Tk completes its write phase
before Tij starts its read phase.
Optimistic CC Validation Test
If there is any transaction Tk such that ts(Tk) < ts(Tij) which completes
its write phase while Tij is in its read phase, then validation succeeds if
WS(Tk) ∩ RS(Tij) = Ø.
- The read and write phases overlap, but Tij does not read any data item
  written by Tk.

Figure: Tk's write phase overlaps Tij's read phase. (A sketch of this
validation check follows.)
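The check stated above reduces to a set intersection; a minimal sketch
(assumed names):

    def validate_overlap(ws_tk, rs_tij):
        """True iff WS(Tk) ∩ RS(Tij) is empty, i.e. Tij may commit."""
        return not (set(ws_tk) & set(rs_tij))

    print(validate_overlap({"x", "y"}, {"z"}))        # True  -> validation succeeds
    print(validate_overlap({"x", "y"}, {"y", "z"}))   # False -> Tij must restart

In the standard optimistic scheme there is one further case, for
transactions whose read phases overlap, which additionally compares the
write sets.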
Figure: a further validation case, in which the read phases of Tk and Tij
overlap.
Deadlock
- A transaction is deadlocked if it is blocked and will remain blocked until
  there is intervention.
- Locking-based CC algorithms may cause deadlocks.
- TO-based algorithms that involve waiting may cause deadlocks.
- Wait-for graph (WFG)
  - If transaction Ti waits for another transaction Tj to release a lock on
    an entity, then Ti → Tj is an edge in the WFG. (A cycle-detection sketch
    over the WFG follows.)
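Deadlock detection on a WFG is a cycle search. Below is a small sketch (the
graph representation is an assumption: a dict mapping each transaction to
the transactions it waits for); the same check applies unchanged to a global
WFG obtained by merging local WFGs, as in the centralized scheme described
later.

    def find_cycle(wfg):
        """Return a list of transactions forming a cycle, or None."""
        def dfs(node, path, on_path):
            for nxt in wfg.get(node, []):
                if nxt in on_path:
                    return path[path.index(nxt):]        # cycle found
                cycle = dfs(nxt, path + [nxt], on_path | {nxt})
                if cycle:
                    return cycle
            return None
        for start in wfg:
            cycle = dfs(start, [start], {start})
            if cycle:
                return cycle
        return None

    wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T4"], "T4": ["T1"]}
    print(find_cycle(wfg))   # -> ['T1', 'T2', 'T3', 'T4'] (a deadlock)

Breaking the deadlock then means choosing one transaction on the cycle as
the victim and aborting it.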
Figure: example local WFGs at two sites and the resulting global WFG over
transactions T1, T2, T3, and T4.
Deadlock Management
- Prevention
  - Guaranteeing that deadlocks can never occur in the first place. Check a
    transaction when it is initiated. Requires no run-time support.
- Avoidance
  - Detecting potential deadlocks in advance and taking action to ensure
    that a deadlock will not occur. Requires run-time support.
- Detection and Recovery
  - Allowing deadlocks to form and then finding and breaking them. As in the
    avoidance scheme, this requires run-time support.
Deadlock Prevention
- All resources which may be needed by a transaction must be predeclared.
  - The system must guarantee that none of the resources will be needed by
    an ongoing transaction.
  - Resources must only be reserved, but not necessarily allocated, a priori.
  - Unsuitability of the scheme in a database environment.
  - Suitable for systems that have no provisions for undoing processes.
- Evaluation:
  – Reduced concurrency due to preallocation
  – Evaluating whether an allocation is safe leads to added overhead.
  – Difficult to determine (partial order)
  + No transaction rollback or restart is involved.
Deadlock Avoidance
- Transactions are not required to request resources a priori.
- Transactions are allowed to proceed unless a requested resource is
  unavailable.
- In case of conflict, transactions may be allowed to wait for a fixed time
  interval.
- Order either the data items or the sites and always request locks in that
  order.
- More attractive than prevention in a database environment.
Deadlock Detection
Centralized Deadlock Detection
- One site is designated as the deadlock detector for the system. Each
  scheduler periodically sends its local WFG to the central site, which
  merges them into a global WFG to determine cycles.
- How often to transmit?
  - Too often ⇒ higher communication cost but lower delays due to
    undetected deadlocks
  - Too seldom ⇒ higher delays due to deadlocks, but lower communication
    cost
Figure: a hierarchy of deadlock detectors (DDox, DD11, DD14).
Distributed Deadlock Detection
- Sites cooperate in the detection of deadlocks.
- One example:
  - The local WFGs are formed at each site and passed on to other sites.
    Each local WFG is modified as follows:
    - Since each site receives the potential deadlock cycles from other
      sites, these edges are added to the local WFG.
    - The edges in the local WFG which show that local transactions are
      waiting for transactions at other sites are joined with the edges in
      the local WFG which show that remote transactions are waiting for
      local ones.
  - Each local deadlock detector:
    - looks for a cycle that does not involve the external edge. If one
      exists, there is a local deadlock which can be handled locally.
    - looks for a cycle involving the external edge. If one exists, it
      indicates a potential global deadlock. Pass the information on to the
      next site.
Outline
- Introduction
- Distributed DBMS Architecture
- Distributed Database Design
- Distributed Query Processing
- Distributed Concurrency Control
- Distributed Reliability Protocols
  - Distributed Commit Protocols
  - Distributed Recovery Protocols
Reliability
Problem: how to maintain the atomicity and durability properties of
transactions.
Types of Failures
- Transaction failures
  - Transaction aborts (unilaterally or due to deadlock)
  - On average, roughly 3% of transactions abort abnormally
- System (site) failures
  - Failure of processor, main memory, power supply, …
  - Main memory contents are lost, but secondary storage contents are safe
  - Partial vs. total failure
- Media failures
  - Failure of secondary storage devices such that the stored data is lost
  - Head crash/controller failure
- Communication failures
  - Lost/undeliverable messages
  - Network partitioning
Local Recovery Management –
Architecture
- Volatile storage
  - Consists of the main memory of the computer system (RAM).
- Stable storage
  - Resilient to failures; loses its contents only in the presence of media
    failures (e.g., head crashes on disks).
  - Implemented via a combination of hardware (non-volatile storage) and
    software (stable-write, stable-read, clean-up) components.

Figure: the Local Recovery Manager and the Database Buffer Manager sit in
main memory between the stable database on secondary storage and the
database buffers (the volatile database), exchanging pages via Read/Write
and Fetch/Flush operations.
Update Strategies
- In-place update
  - Each update causes a change in one or more data values on pages in the
    database buffers.
- Out-of-place update
  - Each update causes the new value(s) of the data item(s) to be stored
    separately from the old value(s).
In-Place Update Recovery
Information
Database log: every action of a transaction must not only perform the
action, but must also write a log record to an append-only file.

Figure: an update operation takes the stable database from its old state to
a new state, and the change is recorded in the database log.
Logging Interface
Figure: the Local Recovery Manager writes log records into log buffers in
main memory, which are flushed to the stable log on secondary storage;
database pages move between the stable database and the database buffers
(the volatile database) through the Database Buffer Manager via Read/Write
and Fetch/Flush operations.
REDO Protocol
Figure: REDO takes the stable database from its old state to the new state
by re-applying the changes recorded in the database log.
UNDO Protocol
Figure: UNDO takes the stable database from the new state back to the old
state by rolling back the changes recorded in the database log.
Write–Ahead Log (WAL)
Protocol
- Notice:
  - If the system crashes before a transaction is committed, then all of its
    operations must be undone. Only the before-images are needed (the undo
    portion of the log).
  - Once a transaction is committed, some of its actions might have to be
    redone. The after-images are needed (the redo portion of the log).
- WAL protocol:
  - Before the stable database is updated, the undo portion of the log must
    be written to the stable log.
  - When a transaction commits, the redo portion of the log must be written
    to the stable log prior to the updating of the stable database.
  (A minimal sketch of these two ordering rules follows.)
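A minimal sketch of these two ordering rules (the names and the in-memory
stand-ins for the stable log and stable database are assumptions, for
illustration only):

    stable_log, stable_db = [], {"x": 10}

    def wal_update(txn, item, new_value):
        old_value = stable_db[item]
        stable_log.append(("undo", txn, item, old_value))   # rule 1: undo record first
        stable_db[item] = new_value                          # only then touch the stable DB
        stable_log.append(("redo", txn, item, new_value))

    def wal_commit(txn):
        # rule 2: the redo records are already in the stable log,
        # so the commit record can safely follow them.
        stable_log.append(("commit", txn))

    wal_update("T1", "x", 42)
    wal_commit("T1")
    print(stable_log)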
Two-Phase Commit (2PC)
Phase 1: the coordinator gets the participants ready to write the results
into the database.
Phase 2: everybody writes the results into the database.
- Coordinator: the process at the site where the transaction originates,
  which controls the execution of the protocol.
- Participant: a process at another site that takes part in executing the
  transaction.

Global Commit Rule:
- The coordinator aborts a transaction if and only if at least one
  participant votes to abort it.
- The coordinator commits a transaction if and only if all of the
  participants vote to commit it.
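The global commit rule itself is a one-line decision; a sketch with assumed
names:

    def coordinator_decision(votes):
        """votes: dict mapping participant -> 'VOTE-COMMIT' or 'VOTE-ABORT'."""
        if votes and all(v == "VOTE-COMMIT" for v in votes.values()):
            return "GLOBAL-COMMIT"
        return "GLOBAL-ABORT"

    print(coordinator_decision({"P1": "VOTE-COMMIT", "P2": "VOTE-COMMIT"}))  # GLOBAL-COMMIT
    print(coordinator_decision({"P1": "VOTE-COMMIT", "P2": "VOTE-ABORT"}))   # GLOBAL-ABORT

A missing vote is handled by the timeout rules discussed below; here an
empty vote set simply aborts.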
Centralized 2PC
Figure: in centralized 2PC all communication passes through the coordinator
(C): it contacts the participants (P) in Phase 1, collects their votes, and
distributes the decision in Phase 2.
2PC Protocol Actions
Figure: 2PC protocol actions. The coordinator (in the INITIAL state) writes
begin_commit in its log, sends PREPARE, and moves to the WAIT state. A
participant (in the INITIAL state) that is not ready to commit writes abort
in its log and replies VOTE-ABORT; otherwise it writes ready in its log and
replies VOTE-COMMIT.
Linear 2PC
Figure: in linear 2PC the sites are ordered 1, 2, …, N; Phase 1 messages
flow forward along the chain and Phase 2 messages flow back.
Distributed 2PC
Figure: in distributed 2PC the coordinator sends prepare to all participants
in Phase 1; each participant sends its vote-abort/vote-commit to every other
site, so the global-commit/global-abort decision is made independently at
each site.
Figure: coordinator and participant state transitions in 2PC (the
coordinator waits in WAIT, a participant in READY).
Site Failures - 2PC Termination
COORDINATOR
- Timeout in INITIAL
  - Who cares

Figure: coordinator state transitions (INITIAL, WAIT, ABORT, COMMIT).

PARTICIPANTS
- Timeout in INITIAL
  - The coordinator must have failed in the INITIAL state
  - Unilaterally abort
- Timeout in READY
  - Stay blocked

Figure: participant state transitions (INITIAL, READY, ABORT, COMMIT) on
Prepare/Vote-commit, Prepare/Vote-abort, Global-commit/Ack, and
Global-abort/Ack.
Site Failures - 2PC Recovery
Problem With 2PC
- Blocking
  - READY implies that the participant waits for the coordinator
  - If the coordinator fails, the site is blocked until recovery
  - Blocking reduces availability
Three-Phase Commit
- 3PC is non-blocking.
- A commit protocol is non-blocking iff
  - it is synchronous within one state transition, and
  - its state transition diagram contains
    - no state which is “adjacent” to both a commit and an abort state, and
    - no non-committable state which is “adjacent” to a commit state
- Adjacent: possible to go from one state to another with a single state
  transition
- Committable: all sites have voted to commit the transaction
  - e.g., the COMMIT state
Communication Structure
Figure: 3PC communication structure between the coordinator (C) and the
participants (P) over three rounds: ready? / yes-no, pre-commit (or
pre-abort) / yes-no, and commit (or abort) / ack.
Network Partitioning
- Simple partitioning
  - Only two partitions
- Multiple partitioning
  - More than two partitions
Independent Recovery Protocols for
Network Partitioning
- No general solution is possible; the practical aims are to:
  - allow one group to terminate while the other is blocked
  - improve availability
Quorum Protocols for
Replicated Databases
- Network partitioning is handled by the replica control protocol.
- One implementation:
  - Assign a vote Vi to each copy of a replicated data item such that
    Σi Vi = V.
  - Each operation has to obtain a read quorum (Vr) to read and a write
    quorum (Vw) to write a data item.
  - The following rules have to be obeyed in determining the quorums:
    - Vr + Vw > V  (a data item is not read and written by two transactions
      concurrently)
    - Vw > V/2     (two write operations from two transactions cannot occur
      concurrently on the same data item)
  (A sketch of these quorum rules follows.)
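A small sketch of the two quorum conditions (the vote assignment is invented
for illustration): given the votes of the copies, it checks whether chosen
read and write quorums rule out concurrent read-write and write-write access
across partitions.

    votes = [1, 1, 1, 1, 1]          # hypothetical: five copies, one vote each
    V = sum(votes)                   # total number of votes, V = 5

    def quorums_ok(Vr, Vw, V):
        return Vr + Vw > V and Vw > V / 2

    print(quorums_ok(Vr=3, Vw=3, V=V))   # True:  every read quorum meets every write quorum
    print(quorums_ok(Vr=2, Vw=3, V=V))   # False: Vr + Vw = 5 is not greater than V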