DistributedTransaction

The document discusses distributed transactions, emphasizing the need for atomicity and coordination among multiple servers to ensure that transactions either fully commit or abort. It outlines various protocols for managing these transactions, including one-phase and two-phase commit protocols, and addresses issues like concurrency control, deadlock detection, and resolution strategies. The document also explores the complexities of maintaining global serializability and the challenges posed by distributed deadlocks.

Distributed transactions

Topics in Distributed Transactions

 In the previous chapter, we discussed transactions that access objects at
a single server. In the general case, a transaction will access
objects located on different computers. A distributed transaction
accesses objects managed by multiple servers.
 The atomicity property requires that either all of the servers
involved in the same transaction commit the transaction or all of
them abort it. Agreement among the servers is therefore necessary.
 Transaction recovery ensures that all objects are recoverable:
the values of the objects reflect all changes made by committed
transactions and none of those made by aborted ones.
Distributed transactions

(a) Flat transaction (b) Nested transactions

[Figure: (a) a flat transaction T from a client accessing servers X, Y and Z;
(b) a nested transaction T with sub-transactions T1 and T2, which spawn
sub-transactions T11, T12, T21 and T22 at servers M, N and P.]

 A flat transaction sends out requests to different
servers, and each request is completed before the client
goes on to the next one. A nested transaction allows
sub-transactions at the same level to execute concurrently.
Nested banking transaction

T = openTransaction
  openSubTransaction        (T1, account A at server X)
    a.withdraw(10);
  openSubTransaction        (T2, account B at server Y)
    b.withdraw(20);
  openSubTransaction        (T3, account C at server Z)
    c.deposit(10);
  openSubTransaction        (T4, account D at server Z)
    d.deposit(20);
closeTransaction
Coordinator of a distributed transaction

 Servers for a distributed transaction need to coordinate their
actions.
 A client starts a transaction by sending an openTransaction
request to a coordinator. The coordinator returns the TID to
the client. The TID must be unique (e.g. server IP address plus a
number unique to that server).
 The coordinator is responsible for committing or aborting the
transaction. Every other server in the transaction is a participant.
Participants are responsible for cooperating with the coordinator in
carrying out the commit protocol, and each keeps track of the
recoverable objects that it manages.
 Each coordinator has a set of references to the participants.
Each participant records a reference to the coordinator.
A distributed banking transaction

T = openTransaction
  a.withdraw(4);
  c.deposit(4);
  b.withdraw(3);
  d.deposit(3);
closeTransaction

[Figure: the client's openTransaction and closeTransaction go to the
coordinator at BranchX, which manages account A (a.withdraw(4)). The
participants at BranchY (account B: b.withdraw(3)) and BranchZ (accounts
C and D: c.deposit(4), d.deposit(3)) send join requests to the
coordinator.]

Note: when the client invokes an operation such as b.withdraw(T, 3),
server B informs the participant at BranchY to join the coordinator.
The coordinator is in one of the servers, e.g. BranchX.

One-phase atomic commit protocol

 A transaction comes to an end when the client requests
that the transaction be committed or aborted.
 A simple way: the coordinator communicates the commit
or abort request to all of the participants in the
transaction and keeps repeating the request until all
of them have acknowledged that they have carried it out.
 This is inadequate because, when the client requests a commit,
it does not allow a server to make a unilateral decision
to abort a transaction. E.g. deadlock avoidance may
force a transaction to abort at a server when locking is
used. So any server may fail or abort, and the client is not
aware of it.
Two-phase commit protocol

 Allows any participant to abort its part of a transaction.
By atomicity, the whole transaction must then also be
aborted.
 In the first phase, each participant votes for the
transaction to be committed or aborted. Once a participant
has voted to commit, it is not allowed to abort. So before
voting to commit, it must ensure that it will eventually be
able to carry out its part of the protocol, even if it fails
and is replaced in the interim.
 A participant is said to be in a prepared state if it will
eventually be able to commit. To reach this state, each
participant saves the altered objects in permanent
storage together with its status (prepared).
Two-phase commit protocol

 In the second phase, every participant in the
transaction carries out the joint decision. If any one
participant votes to abort, the decision must be to
abort. If all the participants vote to commit, then
the decision is to commit the transaction.
 The problem is to ensure that all of the participants
vote and that they all reach the same decision. This is
an example of consensus. It is simple if no error
occurs, but the protocol must also work when servers fail,
messages are lost, or servers are temporarily unable to
communicate with one another.
Two-phase commit protocol

 If the client requests abort, or if the transaction


is aborted by one of the participants, the
coordinator informs the participants
immediately.
 It is when the client asks the coordinator to
commit the transaction that two-phase commit
protocol comes into use.
 In the first phase, the coordinator asks all the
participants if they are prepared to commit; and
in the second, it tells them to commit or abort
the transaction.
Operations for two-phase commit protocol

canCommit?(trans)-> Yes / No
Call from coordinator to participant to ask whether
it can commit a transaction. Participant replies
with its vote.
doCommit(trans)
Call from coordinator to participant to tell
participant to commit its part of a transaction.
doAbort(trans)
Call from coordinator to participant to tell
participant to abort its part of a transaction.
haveCommitted(trans, participant)
Call from participant to coordinator to confirm that
it has committed the transaction.
getDecision(trans) -> Yes / No
Call from participant to coordinator to ask for the
decision on a transaction after it has voted Yes but
has still had no reply after some delay. Used to
recover from server crash or delayed messages.
The two-phase commit protocol

Phase 1 (voting phase):
1. The coordinator sends a canCommit? request to each
of the participants in the transaction.
2. When a participant receives a canCommit? request,
it replies with its vote (Yes or No) to the
coordinator. Before voting Yes, it prepares to
commit by saving objects in permanent storage. If
the vote is No, the participant aborts immediately.

Phase 2 (completion according to outcome of vote):
3. The coordinator collects the votes (including its own).
(a) If there are no failures and all the votes are Yes, the
coordinator decides to commit the transaction and sends a
doCommit request to each of the participants.
(b) Otherwise the coordinator decides to abort the transaction
and sends doAbort requests to all participants that voted Yes.
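The decision rule of the two phases above can be sketched as follows. This is a minimal sketch, not the protocol's messaging layer: the Participant class, its method names, and the branch names are assumptions made for illustration only.

```python
# Sketch of the two-phase commit decision logic. The Participant class
# and its methods are hypothetical stand-ins for the remote operations
# canCommit?/doCommit/doAbort described above.
class Participant:
    def __init__(self, name, vote):
        self.name, self.vote, self.state = name, vote, "prepared"

    def can_commit(self, trans):
        # Phase 1: a real participant would first save its objects to
        # permanent storage; a No vote means it aborts immediately.
        if not self.vote:
            self.state = "aborted"
        return self.vote

    def do_commit(self, trans):       # phase 2: carry out joint decision
        self.state = "committed"

    def do_abort(self, trans):
        self.state = "aborted"

def two_phase_commit(coordinator_vote, participants, trans="T"):
    # Phase 1 (voting): collect the votes, including the coordinator's own.
    votes = [coordinator_vote] + [p.can_commit(trans) for p in participants]
    # Phase 2 (completion): commit only if every single vote is Yes.
    if all(votes):
        for p in participants:
            p.do_commit(trans)
        return "committed"
    for p in participants:
        if p.vote:                    # doAbort goes only to Yes-voters
            p.do_abort(trans)
    return "aborted"

ps = [Participant("BranchY", True), Participant("BranchZ", False)]
print(two_phase_commit(True, ps))   # prints "aborted"
```

A single No vote (BranchZ here) forces the whole transaction to abort, which is exactly the atomicity requirement the protocol exists to enforce.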
Communication in two-phase commit protocol

Coordinator                                Participant
step 1: prepared to commit
        (waiting for votes)
        -- canCommit? -->
                                           step 2: prepared to commit
        <-- Yes --                                 (uncertain)
step 3: committed
        -- doCommit -->
                                           step 4: committed
        <-- haveCommitted --
done
two-phase commit protocol

 Consider a participant that has voted Yes and is waiting for the
coordinator to report on the outcome of the vote by telling it to
commit or abort.
 Such a participant is uncertain and cannot proceed any further. The
objects used by its transaction cannot be released for use by other
transactions.
 The participant makes a getDecision request to the coordinator to
determine the outcome. If the coordinator has failed, the participant
will not get the decision until the coordinator is replaced, resulting
in extensive delay for a participant in the uncertain state.
 Timeouts are used because the exchange of information can fail when
one of the servers crashes or when messages are lost, so a process will
not block forever.
Performance of two-phase commit protocol

 Provided that all servers and communication
channels do not fail, with N participants:
 N canCommit? messages and N replies,
 followed by N doCommit messages.
 The cost in messages is proportional to 3N.
 The cost in time is three rounds of messages.
 The cost of haveCommitted messages is not
counted; the protocol can function correctly without
them. Their role is to enable servers to delete
stale coordinator information.
Failure of Coordinator

 When a participant has voted Yes and is waiting for
the coordinator to report on the outcome of the vote,
the participant is in the uncertain state. If the
coordinator has failed, the participant will not be able
to get the decision until the coordinator is replaced,
which can result in extensive delays for participants in
the uncertain state.
 One alternative strategy is to allow the participants to
obtain the decision from other participants instead of
contacting the coordinator. However, if all participants are
in the uncertain state, they will not get a decision.
Concurrency Control in Distributed Transactions

 Concurrency control for distributed transactions:
each server applies local concurrency control to
its own objects, which ensures serializability of
transactions locally.
 However, the members of a collection of servers
of distributed transactions are jointly responsible
for ensuring that transactions are performed in a serially
equivalent manner. Thus global serializability
is required.
Locks

 The lock manager at each server decides whether to
grant a lock or make the requesting transaction
wait.
 However, it cannot release any locks until it
knows that the transaction has been committed
or aborted at all the servers involved in the
transaction.
 Lock managers in different servers set their
locks independently of one another. It is
therefore possible that different servers may impose
different orderings on transactions.
Timestamp ordering concurrency control

 In a single-server transaction, the coordinator issues a unique
timestamp to each transaction when it starts. Serial equivalence
is enforced by committing the versions of objects in the order of
the timestamps of the transactions that accessed them.
 In distributed transactions, we require that each coordinator
issue globally unique timestamps, and the coordinators must agree
on the ordering of their timestamps. Using pairs of
<local timestamp, server-id>, the agreed ordering of timestamps is
based on a comparison in which the server-id is the less
significant part.
 The timestamp is passed to each server whose objects perform
an operation in the transaction.
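The <local timestamp, server-id> ordering can be sketched directly: lexicographic comparison of (clock, server-id) pairs compares the local timestamps first and only falls back to the server-id on a tie, which is exactly "server-id is less significant". The clock values and server names below are made up for illustration.

```python
# Globally unique timestamps as <local timestamp, server-id> pairs.
# Lexicographic tuple comparison makes the server-id the tie-breaker,
# i.e. the less significant part of the ordering.
def make_timestamp(local_clock, server_id):
    return (local_clock, server_id)

t1 = make_timestamp(10, "X")
t2 = make_timestamp(10, "Y")   # same local clock, different server
t3 = make_timestamp(11, "A")   # later clock wins regardless of server-id

assert t1 < t2 < t3            # ties on the clock broken by server-id
```

Because every server has a distinct id, no two transactions can ever receive equal timestamps, so the ordering is total.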
Timestamp ordering concurrency control

 To achieve the same ordering at all the servers,
the servers of distributed transactions are
jointly responsible for ensuring that transactions are
performed in a serially equivalent manner. E.g.
if T commits after U at server X, T must commit
after U at server Y.
 Conflicts are resolved as each operation is
performed. If the resolution of a conflict requires
a transaction to be aborted, the coordinator will
be informed and it will abort the transaction at all
the participants.
Locking

T                               U
Write(A) at X   locks A
                                Write(B) at Y   locks B
Read(B) at Y    waits for U
                                Read(A) at X    waits for T
******************************************************************
T is ordered before U at server X, and U before T at server Y.
These different orderings can lead to cyclic dependencies
between transactions, and a distributed deadlock situation
arises.
Distributed Deadlock

 Deadlocks can arise within a single server when
locking is used for concurrency control. Servers
must either prevent or detect and resolve
deadlocks.

 Using timeouts to resolve deadlocks is a clumsy
approach: it is hard to choose a suitable timeout interval,
and transactions may be aborted unnecessarily when no
deadlock actually exists. A better way is to detect
deadlocks by finding cycles in a wait-for graph.
Interleavings of transactions U, V and W

U                       V                       W
d.deposit(10) lock D
  (at Z)
                        b.deposit(10) lock B
                          (at Y)
                                                a.deposit(20) lock A
                                                  (at X)
                                                c.deposit(30) lock C
                                                  (at Z)
b.withdraw(30)
  wait at Y
                        c.withdraw(20)
                          wait at Z
                                                a.withdraw(20)
                                                  wait at X

U, V and W: transactions.
Objects A and B are managed by servers X and Y;
objects C and D are managed by server Z.
Figure 14.14
Distributed deadlock

[Figure: (a) the wait-for graph with objects and servers: W waits for
object A, held by U at server X; U waits for object B, held by V at
server Y; V waits for object C, held by W at server Z (U also holds D
at Z). (b) The same graph reduced to transactions only: the cycle
W -> U -> V -> W.]
Local and global wait-for graphs

[Figure: local wait-for graphs at each server: server X holds the edge
W -> U, server Y holds U -> V, and server Z holds V -> W.]

The global wait-for graph is held in part by
each of the several servers involved.
Communication between these servers is
required to find cycles in the graph.
A simple solution: one server takes on the
role of global deadlock detector. From time
to time, each server sends the latest copy
of its local wait-for graph.
Disadvantages: poor availability, lack of
fault tolerance and no ability to scale. The
cost of frequent transmission of the local
wait-for graphs is also high.
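The centralized scheme can be sketched in a few lines: the detector merges the edges shipped by each server and runs a depth-first search for a cycle. The server names and edge lists below follow the example above; the function name and graph encoding are assumptions for illustration.

```python
# Sketch of a centralized global deadlock detector: servers ship their
# local wait-for edges, the detector merges them and searches for a cycle.
def find_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)

    def dfs(node, path, seen):
        if node in path:                      # back edge: cycle found
            return path[path.index(node):]
        if node in seen or node not in graph:
            return None                       # already explored or active
        seen.add(node)
        for nxt in graph[node]:
            cycle = dfs(nxt, path + [node], seen)
            if cycle:
                return cycle
        return None

    seen = set()
    for start in list(graph):
        cycle = dfs(start, [], seen)
        if cycle:
            return cycle
    return None

# Local graphs from the figure: X reports W->U, Y reports U->V, Z reports V->W.
local_graphs = {"X": [("W", "U")], "Y": [("U", "V")], "Z": [("V", "W")]}
edges = [e for g in local_graphs.values() for e in g]
print(find_cycle(edges))   # ['W', 'U', 'V']
```

The detector sees the cycle W -> U -> V -> W even though no single server's local graph contains a cycle, which is why the local graphs alone are not enough.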
Phantom deadlock

 A deadlock that is detected but is not really a
deadlock is called a phantom deadlock.
 Because sending the local wait-for graphs
to one place takes some time, there is a
chance that one of the transactions that holds a
lock will meanwhile have released it, in which
case the deadlock no longer exists.
Figure 14.14
Phantom deadlock

[Figure: local wait-for graphs at server X (T -> U, V -> T) and server
Y (U -> V) are sent to the global deadlock detector, which sees the
cycle T -> U -> V -> T.]

Suppose U releases its object at X and then requests the object held
by V, adding the edge U -> V.
The global detector will then see a deadlock. However, the
edge from T to U no longer exists.

However, if two-phase locking is used, transactions cannot
release locks and then obtain more locks, so phantom
deadlock cycles cannot occur in the way suggested above.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed
Systems: Concepts and Design Edn. 4, © Pearson Education 2005
Edge Chasing / Path Pushing

 Distributed approach for deadlock detection. No


global wait-for graph is constructed, but each of
the servers has knowledge about some of its
edges. The servers attempt to find cycles by
forwarding messages called probes, which
follow the edges of the graph throughout the
distributed system.
 A probe message consists of transaction wait-
for relationships representing a path in the
global wait-for graph.
Figure 14.15
Probes transmitted to detect deadlock

[Figure: object A is held by U at server X, and W waits for it; object
B is held by V at server Y, and U waits for it; object C is held by W
at server Z, and V waits for it. Server X initiates detection with the
probe <W -> U>; server Y extends it to <W -> U -> V>; server Z extends
it to <W -> U -> V -> W> and the deadlock is detected.]
When to send the probe in the initiation step?

Consider a server X that detects a local waiting
relationship
W -> U
If U is not waiting, there is no chance that a cycle can be formed.

However, if U is waiting for another transaction, say V,
there is the potential for a cycle to form:
W -> U -> V
and eventually, if V is (transitively) waiting for W:
V -> ... -> W -> U -> V
Three steps

 Initiation: when a server notes that a transaction T starts waiting
for another transaction U, where U is waiting to access an object at
another server, it initiates detection by sending a probe containing
the edge <T -> U> to the server of the object at which U is blocked.
 Detection consists of receiving probes and deciding whether a
deadlock has occurred and whether to forward the probes. The
server receives the probe and checks whether U is also
waiting. If it is, the transaction it waits for (e.g. V) is added to
the probe, making it <T -> U -> V>, and if the new transaction V is
waiting for another object elsewhere, the probe is forwarded.
 In this way, paths through the global wait-for graph are built one
edge at a time. After a new transaction is added to the probe, the
server checks whether the just-added transaction has caused a cycle.
 Resolution: when a cycle is detected, a transaction in the cycle is
aborted to break the deadlock.
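The initiation/detection loop above can be sketched as follows. For simplicity the wait-for edges of all servers are collected into one dictionary here; in the real protocol each server knows only its own edges and forwards the probe. The function name and the dictionary encoding are assumptions for illustration.

```python
# Sketch of edge chasing: a probe is a path of transaction IDs. Each
# step appends the transaction that the last one is waiting for; if the
# appended transaction is already in the probe, a cycle is detected.
def edge_chase(initiator, wait_info):
    # Initiation: server sends probe <initiator -> waited-for transaction>.
    probe = [initiator, wait_info[initiator]]
    while True:
        last = probe[-1]
        if last not in wait_info:      # last transaction is active:
            return None                # no deadlock along this path
        nxt = wait_info[last]          # detection: extend by one edge
        if nxt in probe:               # cycle closes: deadlock detected
            return probe
        probe.append(nxt)

# The cycle from Figure 14.15: W waits for U, U for V, V for W.
waits = {"W": "U", "U": "V", "V": "W"}
print(edge_chase("W", waits))   # ['W', 'U', 'V'] (W closes the cycle)
```

Each iteration corresponds to one forwarding of the probe to another server, so the path grows one edge at a time exactly as in the detection step.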
Three steps

 Server X initiates detection by sending probe <W -> U> to the
server of B (server Y).
 Server Y receives probe <W -> U>, notes that B is held by V, and
appends V to the probe to produce <W -> U -> V>. It notes that V is
waiting for C at server Z, so the probe is forwarded to server Z.
 Server Z receives probe <W -> U -> V>, notes that C is held by W,
and appends W to the probe to produce <W -> U -> V -> W>: deadlock
detected.
 One of the transactions in the cycle must abort, and the choice can
be made based on priorities.
Coordinator and Participants for a distributed transaction

[Figure repeated: the distributed banking transaction, with the
coordinator at BranchX and participants at BranchY and BranchZ.]
Probe Forwarding between servers is actually through Coordinator

 Lock managers at participants inform the coordinator when a
transaction starts waiting for an object and when it acquires the
object and becomes active again.
 The coordinator is responsible for recording whether the
transaction is active or waiting for an object, and participants can
get this information from the coordinator.

 A server usually sends its probe to the coordinator of the last
transaction in the path to find out whether that transaction is
waiting for another object elsewhere.
 E.g. for probe W -> U -> V, check whether V is waiting; if V is
waiting for another object, V's coordinator will forward the probe to
the server of the object on which V is waiting.
 This shows that when a probe is forwarded, two messages are
required.
Performance Analysis

 In the above example, two probe messages were needed to
detect a cycle involving three transactions.
 When one probe is forwarded, two messages
are required.
 In general, a probe that detects a cycle involving
N transactions will be forwarded by (N-1)
transaction coordinators via (N-1) servers of
objects, requiring a total of 2(N-1) messages.
 Deadlock detection can be initiated by several
transactions in a cycle at the same time.
Figure 14.16
Two probes initiated

[Figure: (a) initial situation: U waits for W and V waits for T.
(b) Detection initiated at the object requested by T builds the probe
<T -> U -> W -> V -> T>. (c) Detection initiated at the object
requested by W builds the probe <W -> V -> T -> U -> W>.]
Multiple Probes Problems

 At about the same time, T starts waiting for U (T -> U) and W starts
waiting for V (W -> V). Two probes are initiated, and the same
deadlock is detected by two different servers.

 We want to ensure that only one transaction is aborted for the same
deadlock, since different servers may choose different transactions to
abort, leading to unnecessary aborts.
 Using priorities to determine which transaction to abort will result
in the same transaction being chosen even if the cycle is detected by
different servers.

 Using priorities can also reduce the number of probes: for example,
we only initiate a probe when a higher-priority transaction starts to
wait for a lower-priority one.
 If the priority order from high to low is T, U, V, W, then only the
probe for T -> U will be sent, and not the probe for W -> V.
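The priority rule above amounts to a one-line predicate. The encoding (smaller number = higher priority) and the function name are assumptions for illustration; the priority order T > U > V > W is the one stated in the text.

```python
# Sketch of the priority rule for probe initiation: a probe is sent only
# when a higher-priority transaction starts waiting for a lower-priority
# one. Smaller number = higher priority (T > U > V > W, as in the text).
priority = {"T": 0, "U": 1, "V": 2, "W": 3}

def should_initiate(waiter, holder):
    return priority[waiter] < priority[holder]

assert should_initiate("T", "U")        # T -> U: probe is sent
assert not should_initiate("W", "V")    # W -> V: probe is suppressed
```

Since every cycle must contain at least one edge from a higher-priority to a lower-priority transaction, suppressing the other edges never prevents a deadlock from being detected; it only halves (roughly) the number of probes in flight.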
Transaction recovery

 The atomicity property of transactions can be
described in two aspects:
 Durability: objects are saved in permanent storage
and will be available indefinitely thereafter.
Acknowledgement of a client's commit request
implies that all the effects of the transaction have
been recorded in permanent storage as well as in the
server's volatile objects.
 Failure atomicity: the effects of transactions are
atomic even when the server crashes.
 Both can be realized by the recovery manager.
Recovery manager

 Tasks of a recovery manager:


 Save objects in permanent storage ( in a recovery
file) for committed transactions;
 To restore the server’s objects after a crash;
 To reorganize the recovery file to improve the
performance of recovery;
 To reclaim storage space in the recovery file.
Figure 14.18
Types of entry in a recovery file

Type of entry        Description of contents of entry

Object               A value of an object.

Transaction status   Transaction identifier, transaction status
                     (prepared, committed, aborted) and other status
                     values used for the two-phase commit protocol.

Intentions list      Transaction identifier and a sequence of
                     intentions, each of which consists of
                     <identifier of object>, <position in recovery
                     file of value of object>.
The server records an intentions list for each of its currently active
transactions. The intentions list of a particular transaction contains
the references and the values of all the objects that it alters. When
the transaction commits, the committed version of each object is
replaced by the tentative version made by that transaction. When a
transaction aborts, the server uses the intentions list to delete all
the tentative versions of objects made by that transaction.
When a participant says it is prepared to commit, its recovery manager
must have saved both its intentions list for that transaction and the
objects in that intentions list in its recovery file, so that it will
be able to carry out the commitment later, even if it crashes in the
interim.
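The commit/abort use of an intentions list can be sketched with plain dictionaries. The in-memory representation below (committed values plus per-transaction tentative versions) and the account values are assumptions for illustration; a real server keeps the list and objects in the recovery file as described above.

```python
# Sketch of intentions-list handling: committed holds the committed
# versions, tentative maps each active transaction to its tentative
# versions of the objects it has altered.
committed = {"A": 100, "B": 200}
tentative = {"T": {"A": 80, "B": 220}}   # T's intentions (altered objects)

def commit(trans):
    # Replace the committed version of each object in the transaction's
    # intentions list by its tentative version.
    for obj, value in tentative.pop(trans).items():
        committed[obj] = value

def abort(trans):
    # Use the intentions list to delete all the tentative versions.
    tentative.pop(trans, None)

commit("T")
print(committed)   # {'A': 80, 'B': 220}
```

Calling abort("T") instead would have discarded the tentative versions and left the committed values at A=100, B=200, which is exactly the failure-atomicity behaviour the list exists to provide.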
Figure 14.19
Log for banking service

[Log entries, left to right:
P0: Object A = 100   P1: Object B = 200   P2: Object C = 300  (checkpoint)
P3: Object A = 80    P4: Object B = 220
P5: Trans T prepared, intentions <A, P3> <B, P4>
P6: Trans T committed
P7: Object C = 278   then Object B = 242
and finally Trans U prepared, intentions <C, P7> <B, ..>  (end of log)]

 The log contains the history of all transactions performed by a
server. When a transaction prepares, commits or aborts, the recovery
manager is called. It appends all the objects in the transaction's
intentions list, followed by the current status. After a crash, any
transaction that does not have a committed status in the log is
treated as aborted.
 Each transaction status entry contains a pointer to the position in
the recovery file of the previous transaction status entry, to enable
the recovery manager to follow the transaction entries in reverse
order. The last pointer in the chain points to the checkpoint.
Recovery of objects

[Log as in Figure 14.19.]

 When a server is replaced after a crash, it first sets default
initial values for its objects and then hands over to its recovery
manager, which is responsible for restoring the server's objects so
that they include all the effects of committed transactions, in the
correct order, and none of the effects of aborted transactions. Two
approaches:
 Starting from the most recent checkpoint, read in the values of each
of the objects; for each committed transaction, replace the values of
its objects.
 Read the recovery file backwards, using transactions with committed
status to restore those objects that have not yet been restored, and
continue until all of the server's objects have been restored. The
advantage is that each object is restored once only.
 In the example: U is treated as aborted, so its entries for C and B
are ignored; A and B are restored as 80 and 220, and then C as 300.
 Reorganizing the recovery file: use a checkpoint to write the current
committed values of all objects to a new recovery file, since all we
need are the committed values.
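The backward-scan approach can be sketched as follows. The tuple encoding of log entries and the tagging of checkpoint values are simplifications assumed for illustration (the real file links object values to transactions via intentions lists and position pointers); the values mirror the Figure 14.19 example.

```python
# Sketch of backward-scan recovery: read the log from the end, restore
# each object at most once, using only entries written by committed
# transactions (checkpoint values count as committed).
log = [
    ("object", "checkpoint", "A", 100),
    ("object", "checkpoint", "B", 200),
    ("object", "checkpoint", "C", 300),
    ("object", "T", "A", 80),
    ("object", "T", "B", 220),
    ("status", "T", "committed"),
    ("object", "U", "C", 278),
    ("object", "U", "B", 242),
    ("status", "U", "prepared"),    # U never committed: entries ignored
]

def recover(log):
    committed = {"checkpoint"}
    restored = {}
    for entry in reversed(log):
        if entry[0] == "status" and entry[2] == "committed":
            committed.add(entry[1])           # seen before its objects
        elif entry[0] == "object":
            _, trans, obj, value = entry
            if trans in committed and obj not in restored:
                restored[obj] = value         # each object restored once
    return restored

print(recover(log))   # {'B': 220, 'A': 80, 'C': 300}
```

Scanning backwards, a transaction's committed status is seen before its object entries, so U's tentative C and B are skipped, A and B come from T (80, 220), and C falls back to the checkpoint value 300, matching the example above.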
Figure 14.21
Log with entries relating to two-phase commit protocol

[Log entries, left to right:
Trans T prepared (intentions list) | Coord'r T (part'pant list) |
Trans T committed | Trans U prepared (intentions list) |
Part'pant U (coord'r: ...) | Trans U uncertain | Trans U committed]
 The coordinator uses committed/aborted to indicate that the outcome
of the vote is Yes/No, and done to indicate that the two-phase commit
protocol is complete; prepared comes before the vote.

 A participant uses prepared to indicate that it has not yet voted
and can abort the transaction, uncertain to indicate that it has voted
Yes but does not yet know the outcome, and committed to indicate that
the transaction has finished.

 In the above example, this server plays the coordinator role for
transaction T and the participant role for transaction U.
Log with entries relating to two-phase commit protocol

 In phase 1, when the coordinator is prepared to commit and has
already added a prepared status entry, its recovery manager adds
a coordinator entry. Before a participant can vote Yes, it must have
already prepared to commit and must have already added a
prepared status entry. When it votes Yes, its recovery manager
records a participant entry and adds an uncertain status. When a
participant votes No, it adds an aborted status to its recovery file.
 In phase 2, the recovery manager of the coordinator adds either a
committed or an aborted status, according to the decision. Recovery
managers of participants add a committed or aborted status to their
recovery files according to the message received from the coordinator.
When the coordinator has received a confirmation from all its
participants, its recovery manager adds a done status.
Log with entries relating to two-phase commit protocol

 When a server is replaced after a crash, the recovery manager has
to deal with the two-phase commit protocol in addition to restoring
the objects.
 For any transaction where the server has played the coordinator
role, it should find a coordinator entry and a set of transaction
status entries. For any transaction where the server has played the
participant role, it should find a participant entry and a set of
transaction status entries. In both cases, the most recent
transaction status entry, that is, the one nearest the end of the log,
determines the status at the time of failure.
 The action of the recovery manager with respect to the two-phase
commit protocol for any transaction depends on whether the server
was the coordinator or a participant and on its status at the time of
failure, as shown in the following table.
Figure 14.22
Recovery of the two-phase commit protocol

Role         Status     Action of recovery manager

Coordinator  prepared   No decision had been reached before the server
                        failed. It sends abortTransaction to all the
                        servers in the participant list and adds the
                        transaction status aborted to its recovery
                        file. Same action for the status aborted. If
                        there is no participant list, the participants
                        will eventually time out and abort the
                        transaction.

Coordinator  committed  A decision to commit had been reached before
                        the server failed. It sends a doCommit to all
                        the participants in its participant list (in
                        case it had not done so before) and resumes
                        the two-phase commit protocol.

Participant  committed  The participant sends a haveCommitted message
                        to the coordinator (in case this was not done
                        before it failed). This will allow the
                        coordinator to discard information about this
                        transaction at the next checkpoint.

Participant  uncertain  The participant failed before it knew the
                        outcome of the transaction. It cannot
                        determine the status of the transaction until
                        the coordinator informs it of the decision. It
                        sends a getDecision to the coordinator to
                        determine the status of the transaction. When
                        it receives the reply, it will commit or abort
                        accordingly.

Participant  prepared   The participant has not yet voted and can
                        abort the transaction.

Coordinator  done       No action is required.
