Chapter 13: Distributed Transactions
• Introduction
• Flat and nested distributed transactions
• Atomic commit protocols
• Concurrency control in distributed transactions
• Distributed deadlocks
• Transaction recovery
• Summary
Introduction
• Distributed transaction
– A flat or nested transaction that accesses objects
managed by multiple servers
• Atomicity of transactions
– All or nothing at all involved servers
– Two-phase commit
• Concurrency control
– Serialize locally + serialize globally
– Distributed deadlock
Flat and nested distributed transactions
• Flat transaction
• Nested transaction
• Nested banking transaction
– The four subtransactions run in parallel
The architecture of distributed transactions
• The coordinator
– Accept client request
– Coordinate behaviors on different servers
– Send result to client
– Record a list of references to the participants
• The participant
– One participant per server
– Keep track of the recoverable objects at its server
– Cooperate with the coordinator
– Record a reference to the coordinator
• Example
One-phase atomic commit protocol
• The protocol
– The client requests to end the transaction
– The coordinator communicates the commit or abort request to all participants and keeps repeating the request until all of them have acknowledged that they have carried it out
• The problem
– Some servers may commit while others abort
• How to deal with the situation where some servers decide to abort?
Introduction to two-phase commit protocol
• Allow for any participant to abort
• First phase
– Each participant votes to commit or abort
• The second phase
– All participants reach the same decision
• If any one participant votes to abort, then all abort
• If all participants vote to commit, then all commit
– The challenge
• Work correctly when failures occur
• Failure model
– Servers may crash; messages may be lost
The two-phase commit protocol
• When the client requests to abort
– The coordinator informs all participants to abort
• When the client requests to commit
– First phase
• The coordinator asks all participants whether they are prepared to commit
• If a participant is prepared to commit, it saves all of the objects it has altered in the transaction to permanent storage and replies Yes; otherwise it replies No
– Second phase
• The coordinator tells all participants to commit (or abort)
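The decision rule above can be sketched in code. This is a minimal, in-memory illustration, not the real protocol: `Participant`, its stored vote, and `two_phase_commit` are all illustrative names, and persistence and messaging are elided.

```python
# Minimal in-memory sketch of the two-phase commit decision rule.
# All names are illustrative; a real participant would write its prepared
# updates to permanent storage before voting Yes.

class Participant:
    def __init__(self, name, vote_yes):
        self.name = name
        self.vote_yes = vote_yes     # whether this participant can commit
        self.state = "active"

    def can_commit(self, trans):
        # Phase 1: vote; Yes means the participant is prepared to commit.
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def do_commit(self, trans):
        self.state = "committed"     # Phase 2: commit this part

    def do_abort(self, trans):
        self.state = "aborted"       # Phase 2: abort this part

def two_phase_commit(trans, participants):
    votes = [p.can_commit(trans) for p in participants]   # first phase
    if all(votes):                                        # second phase
        for p in participants:
            p.do_commit(trans)
        return "committed"
    for p in participants:
        if p.state == "prepared":
            p.do_abort(trans)
    return "aborted"
```

A single No vote forces every prepared participant to abort; only a unanimous Yes commits.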
The two-phase commit protocol … continued
• Operations for two-phase commit protocol
• The two-phase commit protocol
– Record updates that are prepared to commit in permanent storage
• When a server crashes, the information can be retrieved by a new process
• If the coordinator decides to commit, all participants will commit eventually
Timeout actions in the two-phase commit protocol
• Communication in two-phase commit protocol
• New processes to mask crash failures
– Crashed coordinator and participant processes are replaced by new processes
• Time out for the participant
– Timeout of waiting for canCommit: abort
– Timeout of waiting for doCommit
• Uncertain status: Keep updates in the permanent storage
• getDecision request to the coordinator
• Time out for the coordinator
– Timeout of waiting for vote result: abort
– Timeout of waiting for haveCommitted: do nothing
• The protocol works correctly without this confirmation
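The timeout rules can be summarized as a small decision function. This is a sketch with made-up state names ("waiting_for_canCommit", "uncertain"), not an API from the chapter; the key point is that a participant that has voted Yes may not abort unilaterally.

```python
# Sketch of a participant's timeout actions; the state names are invented
# labels for the situations described in the slides.

def on_participant_timeout(state, decision=None):
    """Return the action a participant takes when a timeout fires."""
    if state == "waiting_for_canCommit":
        # Not yet voted: the participant is free to abort unilaterally.
        return "abort"
    if state == "uncertain":
        # Voted Yes, waiting for doCommit/doAbort: keep the prepared updates
        # in permanent storage and ask the coordinator via getDecision.
        return decision if decision is not None else "getDecision"
    return "no_action"
```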
Two-phase commit protocol for nested transactions
• Each subtransaction
– If it commits provisionally
• Report its status and that of its descendants to its parent
– If it aborts
• Report the abort to its parent
• Top-level transaction
– Receives a list of the statuses of all subtransactions
– Starts the two-phase commit protocol on all subtransactions that have committed provisionally
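Selecting the subtransactions for the top-level commit can be sketched as follows, assuming (purely for illustration) that subtransaction identifiers encode the hierarchy as dotted strings such as "T.1" and "T.1.2".

```python
# Sketch: pick the subtransactions to include in the two-phase commit,
# excluding any whose ancestor aborted. The dotted-id encoding of the
# hierarchy is an illustrative assumption.

def provisionally_committed(statuses):
    """statuses: dict mapping subtransaction id -> 'provisional' or 'aborted'.
    Returns the ids eligible for the two-phase commit."""
    result = []
    for tid, status in sorted(statuses.items()):
        if status != "provisional":
            continue
        parts = tid.split(".")
        ancestors = [".".join(parts[:i]) for i in range(1, len(parts))]
        if any(statuses.get(a) == "aborted" for a in ancestors):
            continue  # an ancestor aborted: this one cannot commit
        result.append(tid)
    return result
```

A provisionally committed subtransaction whose parent aborted (like T.2.2 below) is excluded even though it did not abort itself.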
Example of a distributed nested transaction
Serial equivalence on all servers
• Objective
– Serial equivalence on all involved servers
• If transaction T is before transaction U in their conflicting access to objects at one of the servers, then they must be in that order at all of the servers whose objects are accessed in a conflicting manner by both T and U
• Approach
– Each server applies concurrency control to its own objects
– All servers coordinate to reach the objective
Lock
• Each participant locks objects locally
– using the strict two-phase locking scheme
• Atomic commit protocol
– A server cannot release any locks until it knows that the transaction has been committed or aborted at all servers
• Distributed deadlock
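A participant's lock manager under these rules can be sketched as below; `LockManager` and its methods are illustrative, and waiting is reduced to a boolean return value.

```python
# Sketch of strict two-phase locking at one participant: locks are taken on
# access and released only once the distributed outcome is known. This is a
# single-threaded simulation, not a real lock manager.

class LockManager:
    def __init__(self):
        self.locks = {}  # object name -> transaction holding the lock

    def acquire(self, trans, obj):
        holder = self.locks.get(obj)
        if holder is None or holder == trans:
            self.locks[obj] = trans
            return True
        return False  # caller must wait: a possible distributed deadlock

    def release_all(self, trans):
        # Called only after the atomic commit protocol has decided.
        self.locks = {o: t for o, t in self.locks.items() if t != trans}
```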
Timestamp ordering concurrency control
• Globally unique transaction timestamp
– Issued to the client by the first coordinator accessed by the transaction
– The transaction timestamp is passed to the coordinator at each server
– Each server accesses shared objects according to the timestamp
• Resolution of a conflict
– Abort the transaction at all servers
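One common way to make such timestamps globally unique is to issue the pair <local timestamp, server-id> and compare pairs lexicographically; the `Coordinator` class below is an illustrative sketch of that idea.

```python
# Sketch: globally unique transaction timestamps as <local time, server-id>
# pairs. Tuple comparison orders first by local time, then by server id, so
# no two servers ever issue an equal timestamp.

class Coordinator:
    def __init__(self, server_id):
        self.server_id = server_id
        self._clock = 0  # local logical clock

    def new_timestamp(self):
        self._clock += 1
        return (self._clock, self.server_id)
```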
Optimistic concurrency control
• The validation
– Takes place during the first phase of the two-phase commit protocol
• Parallel validation
– Suitable for distributed transactions
– Rule 3 must be checked as well as rule 2 for backward validation
– Validation may happen in different orders on different servers
• Measure 1: a global validation checks that the combination of the orderings at the individual servers is serializable
• Measure 2: each server validates according to a globally unique transaction number for each transaction
Distributed deadlocks
• Distributed deadlocks
– A cycle in the global wait-for graph
• An example
• Simple resolution
– A centralized deadlock detector
• collect the latest copy of each server's local wait-for graph
• construct the global wait-for graph
• find cycles in the global wait-for graph
– Drawbacks
• poor availability, lack of fault tolerance, poor scalability
• high cost of collecting the information
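The centralized detector's job, merging local graphs and finding a cycle, can be sketched as below, under the simplifying assumption that each transaction waits for at most one other transaction.

```python
# Sketch of a centralized deadlock detector. Each local wait-for graph is a
# dict mapping a transaction to the transaction it waits for; the merge and
# single-successor assumption are simplifications for illustration.

def find_cycle(local_graphs):
    """Return a cycle in the merged global wait-for graph, or None."""
    global_graph = {}
    for g in local_graphs:
        global_graph.update(g)
    for start in global_graph:
        seen, t = [], start
        while t in global_graph and t not in seen:
            seen.append(t)
            t = global_graph[t]   # follow the wait-for edge
        if t in seen:
            return seen[seen.index(t):]  # the cycle portion of the path
    return None
```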
Phantom deadlocks
• Phantom deadlocks
– a deadlock that is detected but is not really a
deadlock
• may occur when some deadlocked transactions abort or
release locks
• An example
– At server Y: U requests an object held by V
– At server X: U releases the object that T is waiting for
– At the global deadlock detector: if the message from server Y arrives earlier than the message from server X, a phantom deadlock is detected
Edge chasing
• Idea
– Detect deadlock in a distributed manner
– Each server involved in the deadlock forwards its partial knowledge of the wait-for edges, in messages called probes, to other servers to construct the wait-for graph
• Question
– When to send a probe?
Edge chasing algorithm
• Initiation
– When a server finds that a transaction T starts waiting for another transaction U, where U is waiting to access an object at another server, it initiates detection by sending a probe containing the edge <T → U> to the server of the object at which transaction U is blocked
• Detection
– Receive probes
– Detect whether a deadlock has occurred
• Merge the local wait-for knowledge with that of the probe and look for a cycle
– Decide whether to forward the probe
• If the newly appended transaction, say V, is waiting for another object elsewhere, the probe is forwarded
• Resolution
– When a cycle is detected, a transaction in the cycle is aborted
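One step of the algorithm, a server receiving a probe, can be sketched as follows; the probe is represented as a list of transactions, and message passing is replaced by a direct return value (all names illustrative).

```python
# Sketch of one edge-chasing step: extend an incoming probe with the local
# wait-for edge, then detect a cycle or forward the probe.

def receive_probe(probe, local_waits):
    """probe: wait-for path, e.g. ['W', 'U'] for the probe <W -> U>.
    local_waits: dict mapping a transaction to the one it waits for here.
    Returns ('deadlock', cycle), ('forward', longer_probe), or ('stop', None)."""
    last = probe[-1]
    nxt = local_waits.get(last)
    if nxt is None:
        return ("stop", None)          # the last transaction is not waiting here
    if nxt in probe:
        # The new edge closes a cycle: a deadlock is detected.
        return ("deadlock", probe[probe.index(nxt):] + [nxt])
    return ("forward", probe + [nxt])  # forward the extended probe
```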
Edge chasing algorithm - example
[Figure: a probe containing <W → U> is initiated; as it is forwarded between servers it grows to <W → U → V>, and when it reaches the server where V waits for W the cycle <W → U → V → W> is complete and the deadlock is detected]
• Example
– Priorities: U > V > W
– Deadlock will be detected when W begins to
detect U
What is transaction recovery?
• Durability and failure atomicity
• Recovery
– Restoring the server with the latest committed versions
of its objects from permanent storage
• The task of the recovery manager
– Save objects in permanent storage (in a recovery file)
for committed transactions
– Restore the server’s objects after a crash
– Reorganize the recovery file to improve the
performance of recovery
– Reclaim storage space (in the recovery file)
Important components of a recovery file
• Intentions list
– Keeps track of the objects accessed by transactions
– One intentions list per active transaction
– Contains a list of the references and the values of all the objects that are altered by the transaction
– When the transaction is committed
• Replace the committed version of each object by the tentative version
– When the transaction is aborted
• Delete the tentative versions
• Example
– Each transaction status entry contains a pointer to the previous
transaction status entry
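Maintaining intentions lists can be sketched with a small in-memory class, where commit installs the tentative versions and abort discards them; the `Server` class and its methods are illustrative, and permanent storage is elided.

```python
# Sketch of per-transaction intentions lists. Commit replaces the committed
# version of each altered object by its tentative version; abort deletes the
# tentative versions.

class Server:
    def __init__(self, objects):
        self.committed = dict(objects)  # object name -> committed value
        self.intentions = {}            # transaction id -> {object: value}

    def write(self, trans, name, value):
        # Record a tentative version in the transaction's intentions list.
        self.intentions.setdefault(trans, {})[name] = value

    def commit(self, trans):
        self.committed.update(self.intentions.pop(trans, {}))

    def abort(self, trans):
        self.intentions.pop(trans, None)
```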
Recovery of objects
• When a server is replaced after a crash
– The new process sets default initial values for all objects, then hands over to the recovery manager
• The recovery manager's task
– Restore all of the effects of the committed transactions, performed in the correct order, and none of the effects of incomplete or aborted transactions
• Two approaches
– Find the most recent checkpoint, then replay all committed transactions after the checkpoint with the help of the intentions lists and the committed values of objects
– Read the recovery file backwards until all objects have been restored to their most recent committed values
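The second approach, reading the recovery file backwards, can be sketched as below; the log format (a list of entries tagged with whether their transaction committed) is a simplified stand-in for real recovery-file entries.

```python
# Sketch of backward recovery: scan the log newest-first and keep the first
# committed value seen for each object, stopping once all are restored.

def recover(log, objects):
    """log: list of (object_name, value, committed) entries, oldest first.
    Returns the most recent committed value of each object in `objects`."""
    recovered = {}
    for name, value, committed in reversed(log):
        if committed and name in objects and name not in recovered:
            recovered[name] = value
        if len(recovered) == len(objects):
            break  # all objects restored: stop reading the file
    return recovered
```

With a log echoing the slides' example (uncommitted entries from U at the end), A and B come from T's committed updates while C falls back to its older value.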
Recovery of objects - example
• If the server fails at the point reached at P7
• Restore by the second approach
– The entry at P7 is ignored (transaction U is only prepared, not committed)
– P4 records that T committed, so its prepared entry at P3 is found
– A and B are restored using the intentions list at P3
– C is restored from the checkpoint at P0
• Reorganize the recovery file
– Add an aborted transaction status to the recovery file
for transaction U
Reorganize the recovery file
• Checkpoint
– Checkpointing
• The process of writing the current committed values of a server's objects to a new recovery file, together with transaction status entries and intentions lists of transactions that have not yet been fully resolved
– Checkpoint
• The information stored by the checkpointing process
– The purpose of making checkpoints
• Reduce the number of transactions to be dealt with during recovery; reclaim file space
– When to make a checkpoint
• Immediately after recovery, or from time to time
• Recovery from the checkpoint
– Discard the old recovery file
Shadow versions
• Map and Version store
– Map locates versions of the objects in a file called a
version store
– To restore objects, locate the objects in the version
store by the map
• When a transaction is prepared to commit
– The updated objects are appended to the version store
• Shadow versions: these new, as-yet-tentative versions
• When a transaction commits
– A new map is made by copying the old map and entering the positions of the shadow versions
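A sketch of the map plus version store, with commit as an atomic map swap; `ShadowVersionStore` is an illustrative in-memory stand-in for the files described above.

```python
# Sketch of shadow versions: tentative versions are appended to a version
# store, and commit atomically replaces the map with a copy that points at
# the shadow positions.

class ShadowVersionStore:
    def __init__(self, initial):
        self.store = []   # append-only version store
        self.map = {}     # object name -> position of committed version
        for name, value in initial.items():
            self.map[name] = len(self.store)
            self.store.append(value)

    def prepare(self, updates):
        # Append shadow (tentative) versions; the committed map is untouched.
        positions = {}
        for name, value in updates.items():
            positions[name] = len(self.store)
            self.store.append(value)
        return positions

    def commit(self, shadow_positions):
        # Copy the old map and switch to the new one in a single step.
        new_map = dict(self.map)
        new_map.update(shadow_positions)
        self.map = new_map

    def read(self, name):
        return self.store[self.map[name]]
```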
Shadow versions … continued
• Example
• Shadow version vs. logging
– Faster recovery
• The positions of the current committed
objects are recorded in the map
– Slower normal activity
• The switch from the old map to the new map must be performed in a single atomic step, which requires an additional stable-storage write
Summary
• Flat and nested distributed transaction
• Two-phase commit protocol
– May take an unbounded amount of time to complete, but is guaranteed to complete eventually
• Concurrency control
– Lock
– Timestamp ordering
– Optimistic concurrency control
Summary … continued
• Distributed deadlock
– Edge-chasing algorithm
• Recovery
– Logging
– Shadow version
Distributed transactions
[Figure: (a) a flat distributed transaction T from a client invokes objects at servers X, Y and Z; (b) nested distributed transactions: T opens subtransactions T1 and T2, which in turn open sub-subtransactions T11, T12, T21 and T22 at further servers]
Nested banking transaction
T = openTransaction
    openSubTransaction
        a.withdraw(10);
    openSubTransaction
        b.withdraw(20);
    openSubTransaction
        c.deposit(10);
    openSubTransaction
        d.deposit(20);
closeTransaction

[Figure: subtransaction T1 runs a.withdraw(10) on account A at server X, T2 runs b.withdraw(20) on B at server Y, and T3 and T4 run c.deposit(10) on C and d.deposit(20) on D at server Z]
A distributed banking transaction
T = openTransaction
    a.withdraw(4);
    c.deposit(4);
    b.withdraw(3);
    d.deposit(3);
closeTransaction

[Figure: participants at BranchX (account A), BranchY (B) and BranchZ (C and D) join the transaction; c.deposit(4) and d.deposit(3) are applied at BranchZ]
Note: the coordinator is in one of the servers, e.g. BranchX
Operations for two-phase commit protocol
canCommit?(trans)-> Yes / No
Call from coordinator to participant to ask whether it can commit a
transaction. Participant replies with its vote.
doCommit(trans)
Call from coordinator to participant to tell participant to commit its part of a
transaction.
doAbort(trans)
Call from coordinator to participant to tell participant to abort its part of a
transaction.
haveCommitted(trans, participant)
Call from participant to coordinator to confirm that it has committed the
transaction.
getDecision(trans) -> Yes / No
Call from participant to coordinator to ask for the decision on a transaction
after it has voted Yes but has still had no reply after some delay. Used to
recover from server crash or delayed messages.
Communication in the two-phase commit protocol
[Figure: 1. the coordinator sends canCommit? to each participant; 2. the participant votes Yes or No; 3. the coordinator sends doCommit or doAbort; 4. the participant confirms with haveCommitted, after which the coordinator's status is done]
Transaction T decides whether to commit
[Figure: in a nested transaction, T11 aborted (at server M), T2 aborted (at server Y), and T22 committed provisionally (at server P); T22 cannot commit because its parent T2 aborted]

canCommit? for the hierarchic two-phase commit protocol
[Table: canCommit? answers for the example above: T11 — no (aborted); T22 — no (parent aborted)]
[Figure: transactions T and U both write object A at server X and object B at server Y; serial equivalence requires that their conflicting accesses be ordered the same way at both servers]

[Figure: interleavings of transactions U, V and W: U locks D, V locks B, and W locks A and C through deposit operations at servers X, Y and Z; then b.withdraw(30), c.withdraw(20) and a.withdraw(20) each wait for a lock held by another of the three transactions]
Distributed deadlock
[Figure: (a) transactions U, V and W hold objects A, B, C and D at servers X, Y and Z while each waits for an object held by another; (b) the global wait-for graph contains the cycle U → V → W → U]
Local and global wait-for graphs
[Figure: the local wait-for graphs at servers X and Y are each acyclic, but the global wait-for graph combining them contains the cycle T → U → V → T]
Two probes initiated
[Figure: two probes are initiated for the same cycle involving T, U, V and W; each travels around the cycle, so the same deadlock is detected twice]
Probes travel downhill
[Figure: (a) V stores the probe <U → V> in its probe queue when U starts waiting for V; (b) when V starts waiting for W, the queued probe is forwarded, extended to <U → V → W>]
Types of entry in a recovery file
[Table: an entry is an Object (identifier and value), a Transaction status (transaction identifier and status: prepared, committed or aborted), or an Intentions list (transaction identifier and a sequence of <object reference, position> pairs)]

Log for the recovery example (positions P0–P7):
P0 (checkpoint): Object:A = 100, Object:B = 200, Object:C = 300
P1: Object:A = 80
P2: Object:B = 220
P3: Trans:T prepared, intentions <A, P1> <B, P2>, previous status: P0
P4: Trans:T committed, previous status: P3
P5: Object:C = 278
P6: Object:B = 242
P7: Trans:U prepared, intentions <C, P5> <B, P6>, previous status: P4 (end of log)
Shadow versions
[Figure: the version store holds checkpoint values 100, 200 and 300 at positions P0, P0' and P0'', followed by later versions 80, 220, 278 and 242 at positions P1–P4; the map points at the current committed positions]