CS542: Topics in Distributed Systems
[Figure: a distributed banking transaction. A client runs transaction T on accounts A, B, C, and D, which are held at the servers BranchX, BranchY, and BranchZ. Each server joins T as a participant; e.g., the client call B.withdraw(3) reaches account B as B.withdraw(T, 3).]
Client code:
T = openTransaction
A.withdraw(4);
C.deposit(4);
B.withdraw(3);
D.deposit(3);
closeTransaction
Note: the coordinator is in one of the servers, e.g. BranchX
Atomic Commit Problem
The atomicity principle requires that either all the
distributed operations of a transaction complete or
all abort.
At some stage, the client executes closeTransaction().
Now, atomicity requires that either all participants
(remember these are on the server side) and the
coordinator commit, or all abort.
What problem statement is this?
Atomic Commit Protocols
Consensus, but the system is asynchronous!!
So, need to ensure safety property in real-life implementation.
Never have some agreeing to commit, and others agreeing to
abort.
First cut: one-phase commit protocol. The coordinator
unilaterally decides commit or abort and keeps sending that decision
to all participants (servers) until all acknowledge.
Doesn’t work when a participant crashes before receiving this
message (partial transaction results that were in memory are lost).
Does not allow a participant to abort the transaction, e.g., under error
conditions.
Atomic Commit Protocols
Consensus, but it’s impossible in asynchronous networks!
So, need to ensure safety property in real-life implementation.
Never have some committing while others abort. Err on the side
of safety.
Alternative: two-phase commit protocol
First phase: the coordinator collects a vote (commit or abort) from each participant.
• Each participant stores its partial results in permanent storage before voting.
Now the coordinator makes a decision:
• If all participants want to commit and no one has crashed, the coordinator multicasts a “commit” message, and everyone commits. If a participant fails, then on recovery it can get the commit message from the coordinator.
• Else, if any participant has crashed or aborted, the coordinator multicasts an “abort” message to all participants, and everyone aborts.
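The coordinator's decision rule above can be sketched in a few lines. This is a minimal illustration, not a full protocol: the names `Vote` and `decide` are hypothetical, and a missing vote (crash or timeout) is modeled here as `None`.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def decide(votes):
    """Return "commit" only if every participant voted COMMIT.

    `votes` maps a participant name to a Vote, or to None when no vote
    arrived (assumption: a crash or timeout is modeled as None).
    """
    if all(v is Vote.COMMIT for v in votes.values()):
        return "commit"
    # Any abort vote or missing vote forces an abort: err on the side of safety.
    return "abort"


if __name__ == "__main__":
    print(decide({"BranchX": Vote.COMMIT, "BranchY": Vote.COMMIT}))
    print(decide({"BranchX": Vote.COMMIT, "BranchY": None}))
```

Note that the rule is deliberately asymmetric: a single missing or negative vote is enough to abort, while committing needs unanimity.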
RPCs for Two-Phase Commit Protocol
canCommit?(trans)-> Yes / No
Call from coordinator to participant to ask whether it can commit a
transaction. Participant replies with its vote. Phase 1.
doCommit(trans)
Call from coordinator to participant to tell participant to commit its part of a
transaction. Phase 2.
doAbort(trans)
Call from coordinator to participant to tell participant to abort its part of a
transaction. Phase 2.
getDecision(trans) -> Yes / No
Call from participant to coordinator to ask for the decision on a transaction
after it has voted Yes but has still received no reply within the timeout. Also
used to recover from a server crash or delayed messages.
haveCommitted(trans, participant)
Call from participant to coordinator to confirm that it has committed the
transaction. (May not be required if getDecision() is used)
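One way to picture these five calls is as two interfaces, one per direction of the call. The sketch below is an assumption-laden rendering in Python: the class names (`ParticipantRPC`, `CoordinatorRPC`, `LoggingCoordinator`) and the snake_case method names are illustrative, Yes/No is mapped to `True`/`False`, and no real RPC transport is involved.

```python
from abc import ABC, abstractmethod


class ParticipantRPC(ABC):
    """Calls made by the coordinator on a participant."""

    @abstractmethod
    def can_commit(self, trans) -> bool:
        """Phase 1: return the participant's vote (True = Yes)."""

    @abstractmethod
    def do_commit(self, trans) -> None:
        """Phase 2: commit this participant's part of the transaction."""

    @abstractmethod
    def do_abort(self, trans) -> None:
        """Phase 2: abort this participant's part of the transaction."""


class CoordinatorRPC(ABC):
    """Calls made by a participant on the coordinator."""

    @abstractmethod
    def get_decision(self, trans) -> bool:
        """Return the decision (True = commit) for a transaction."""

    @abstractmethod
    def have_committed(self, trans, participant) -> None:
        """Confirm that `participant` has committed `trans`."""


class LoggingCoordinator(CoordinatorRPC):
    """Hypothetical in-memory coordinator, only to show the interface in use."""

    def __init__(self):
        self.decisions = {}     # trans -> True (commit) / False (abort)
        self.confirmed = set()  # (trans, participant) pairs

    def get_decision(self, trans):
        return self.decisions[trans]

    def have_committed(self, trans, participant):
        self.confirmed.add((trans, participant))
```

Splitting the calls by direction makes it visible that only getDecision and haveCommitted flow from participant to coordinator.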
The two-phase commit protocol
Phase 1 (voting phase):
1. The coordinator sends a canCommit? request to each of the participants in
the transaction.
2. When a participant receives a canCommit? request, it replies with its vote
(Yes or No) to the coordinator. Before voting Yes, it “prepares to commit”
by saving objects in permanent storage. (Recall that a server may crash.)
If its vote is No, the participant aborts immediately.
Phase 2 (completion according to outcome of vote):
3. The coordinator collects the votes (including its own), makes a decision,
and logs this on disk.
(a) If there are no failures and all the votes are Yes, the coordinator
decides to commit the transaction and sends a doCommit request
to each of the participants.
(b) Otherwise the coordinator decides to abort the transaction and
sends doAbort requests to all participants that voted Yes. This is
the step erring on the side of safety.
4. Participants that voted Yes are waiting for a doCommit or doAbort request
from the coordinator. When a participant receives one of these messages,
it acts accordingly – when committed, it makes a haveCommitted call.
• If it times out waiting for a doCommit/doAbort, the participant keeps sending getDecision
requests to the coordinator until it learns the decision
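The four steps above can be traced with a small in-memory simulation. This is a sketch under stated assumptions: plain method calls stand in for RPCs, a dictionary stands in for the coordinator's on-disk log, the class and attribute names are hypothetical, the haveCommitted call is omitted, and the retry loop is bounded by `max_tries` so the example terminates (the protocol itself retries until a decision is known).

```python
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "active"

    def can_commit(self, trans):
        if self.will_vote_yes:
            # Stands in for saving objects in permanent storage before voting Yes.
            self.state = "prepared"
            return True
        self.state = "aborted"  # a participant that votes No aborts immediately
        return False

    def do_commit(self, trans):
        self.state = "committed"

    def do_abort(self, trans):
        self.state = "aborted"

    def resolve(self, trans, coordinator, max_tries=3):
        """After a timeout, ask the coordinator for the decision (bounded here)."""
        for _ in range(max_tries):
            decision = coordinator.get_decision(trans)
            if decision is not None:
                if decision:
                    self.do_commit(trans)
                else:
                    self.do_abort(trans)
                return decision
        return None


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = {}  # stands in for the decision log on disk

    def run(self, trans):
        # Phase 1: collect a vote from every participant.
        votes = {p.name: p.can_commit(trans) for p in self.participants}
        # Step 3: decide and log the decision before telling anyone.
        decision = all(votes.values())
        self.log[trans] = decision
        # Phase 2: doCommit to everyone, or doAbort to those that voted Yes.
        for p in self.participants:
            if decision:
                p.do_commit(trans)
            elif votes[p.name]:
                p.do_abort(trans)
        return decision

    def get_decision(self, trans):
        return self.log.get(trans)  # None if no decision has been logged yet


if __name__ == "__main__":
    branches = [Participant("BranchX"), Participant("BranchY"), Participant("BranchZ")]
    print(Coordinator(branches).run("T"), [p.state for p in branches])
```

Logging the decision before sending any Phase 2 message is what lets a participant that missed the message recover it later through `resolve`.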
Communication in Two-Phase Commit
[Figure: state diagram of the coordinator and a participant in two-phase commit]
Coordinator:
• Execute
• Precommit: send request to each participant; wait for replies (time out possible)
• Timeout or a NO: Abort; send ABORT to each participant
• All YES: Commit; send COMMIT to each participant
Participant:
• Execute
• On request, not ready: Abort; send NO to coordinator
• On request, ready: Precommit (Uncertain); send YES to coordinator; wait for decision
• COMMIT decision: Commit; make transaction visible
• ABORT decision: Abort
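The two state diagrams can also be written down as transition tables. The sketch below is only an illustration: the state names follow the diagram, but the event labels (`send_request`, `all_yes`, `request_ready`, and so on) are made up here to name the arrows.

```python
# (current state, event) -> next state; event names are hypothetical labels.
COORDINATOR = {
    ("execute", "send_request"): "precommit",
    ("precommit", "all_yes"): "commit",
    ("precommit", "a_no"): "abort",
    ("precommit", "timeout"): "abort",
}

PARTICIPANT = {
    ("execute", "request_not_ready"): "abort",    # sends NO to coordinator
    ("execute", "request_ready"): "uncertain",    # sends YES, waits for decision
    ("uncertain", "commit_decision"): "commit",   # makes transaction visible
    ("uncertain", "abort_decision"): "abort",
}


def run(machine, events, start="execute"):
    """Feed a sequence of events to a transition table and return the final state."""
    state = start
    for event in events:
        state = machine[(state, event)]  # KeyError means an illegal transition
    return state


if __name__ == "__main__":
    print(run(COORDINATOR, ["send_request", "timeout"]))
    print(run(PARTICIPANT, ["request_ready", "commit_decision"]))
```

The tables make one property easy to check by inspection: a participant has no transition out of "uncertain" other than a decision from the coordinator, which is why it must keep asking if that decision is delayed.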