0% found this document useful (0 votes)
2 views

Distributed Computing Replication Control

The document discusses replication control in distributed computing, focusing on how to manage operations across multiple servers. It highlights the benefits of replication, such as fault tolerance and load balancing, while addressing challenges like maintaining replication transparency and consistency. The document also covers atomic commitment protocols, including one-phase and two-phase commit protocols, and their implications for transaction management in distributed systems.

Uploaded by

sachinjdh31
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Distributed Computing Replication Control

The document discusses replication control in distributed computing, focusing on how to manage operations across multiple servers. It highlights the benefits of replication, such as fault tolerance and load balancing, while addressing challenges like maintaining replication transparency and consistency. The document also covers atomic commitment protocols, including one-phase and two-phase commit protocols, and their implications for transaction management in distributed systems.

Uploaded by

sachinjdh31
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 71

Distributed Computing

Replication Control
A K Singh
Computer Engineering
N I T Kurukshetra
Slides adapted from Indy@UIUC © IG
Server-side Focus

• Concurrency Control = how to coordinate


multiple concurrent clients executing operations
(or transactions) with a server
Next:
• Replication Control = how to handle operations
(or transactions) when the objects are stored at
multiple servers, with or without replication

2
Replication: What and Why

• Replication = An object has identical copies,


each maintained by a separate server
– Copies are called “replicas”
• Why replication?
– Fault-tolerance: With k replicas of each object, can
tolerate failure of any (k-1) servers in the system
– Load balancing: Spread read/write operations out over
the k replicas => load lowered by a factor of k compared
to a single replica
– Replication => Higher Availability

3
Rollback Recovery vs. Rollforward Recovery

Repair interval
message ≡ request

4
Availability

• If each of k servers has an independent


probability p of failing or becoming unreachable
– Server’s failure probability
• With no replication, availability of object =
= Probability that single copy is up
= (1 – p)
• With k replicas, availability of object =
Probability that at least one replicas is up
= 1 – Probability that all replicas are down
= (1 – p k)
5
Nines Availability

• With no replication, availability of object =


= (1 – p)
• With k replicas, availability of object =
= (1 – p k)
Availability Table

p = failure No replication k=3 replicas k=5 replicas


probability
0.1 90% 99.9% 99.999%
0.05 95% 99.9875% 6 Nines
0.01 99% 99.9999% 10 Nines 6
What’s the Catch?

• Challenge is to maintain two properties


1. Replication Transparency
– A client ought not to be aware of multiple copies of
objects existing on the server side
2. Replication Consistency
– All clients see single consistent copy of data, in spite of
replication
– For transactions, guarantee ACID

7
Replication Transparency
Replicas of an
Front ends Replica 1 object O
provide replication
transparency
Client Front End
Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 8
Replication Consistency

• Two ways to forward updates from front-ends State machine replication:


(FEs) to replica group Each replica is modeled as
a state machine =>
– Passive Replication: uses a primary replica (leader or
sequential & deterministic
previously a.k.a. “master”) state changes
– Active Replication: treats all replicas identically •State variables
•Interfaces accessible by the
• Both approaches use the concept of “Replicated clients that operate on state
State Machines” variables
– Each replica’s code runs the same state machine •Deterministic state change
via interface
– Multiple copies of the same State Machine begun in the
Start state, and receiving the same Inputs in the same
order will arrive at the same State having generated the
same Outputs. [Schneider 1990] 9
Passive Replication

• Leader => total ordering of all updates


Replica 1 • On leader failure, run election

Client Front End Leader (elected leader)


Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 10
Active Replication

Multicast
Front ends Replica 1 inside
provide replication Replica group
transparency
Client Front End
Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 11
Active Replication Using Concepts You’ve
Learnt earlier
• Can use any flavor of multicast ordering,
depending on application
– FIFO ordering
– Causal ordering
– Total ordering
– Hybrid ordering
• Total or Hybrid (*-Total) ordering + Replicated
State machines approach
– => all replicas reflect the same sequence of updates to
the object

12
Active Replication Using Concepts You’ve
Learnt earlier (2)
• What about failures?
– Use virtual synchrony (i.e., view synchrony)
• Virtual synchrony with total ordering for
multicasts =>
– All replicas see all failures/joins/leaves and all
multicasts in the same order
– Could also use causal (or even FIFO) ordering if
application can tolerate it

13
Transactions and Replication
• One-copy serializability
– A concurrent execution of transactions in a replicated database
is one-copy-serializable if it is equivalent to a serial execution of
these transactions over a single logical copy of the database
– (Or) The effect of transactions performed by clients on replicated
objects should be the same as if they had been performed one at
a time on a single set of objects (i.e., one replica per object)
• In a non-replicated system, transactions appear to
be performed one at a time in some order
– Correctness means serial equivalence of transactions
• When objects are replicated, transaction systems
for correctness need one-copy serializability
14
Next

• Committing transactions with distributed servers

15
Transactions with Distributed Servers

Server 1
Transaction T Object
write(A,1); A
write(B,2); Object B
… .
write(Y, 25); .
write(Z, 26); .
commit
Server 13
Object Y

Object Z
16
Transactions With Distributed Servers

• Transaction T may touch objects that reside on


different servers
• When T tries to commit, need to ensure
– Either all these servers commit their updates from T =>
T will commit
– Or none of these servers commit => T will abort
• What problem is this?

17
Transactions With Distributed Servers

• Transaction T may touch objects that reside on


different servers
• When T tries to commit
– Need to ensure all these servers commit their updates
from T => T will commit
– Or none of these servers commit => T will abort
• What problem is this?
– Consensus!
– (It’s also called the “Atomic Commit problem”)

18
Atomic Commitment Protocol

• An atomic commitment protocol (ACP)


is an algorithm for the coordinator and
participants such that either the
coordinator and all participants commit
the transaction or they all abort it
• Each process may cast exactly one of
two votes: Yes or No, and can reach
exactly one of two decisions: Commit or
19
Abort
Atomic Commitment Protocol (2)

Two important consequence of


• An ACP is an algorithm for processes to reach decisions such AC3
that: • Each process can unilaterally decide
– AC1: All processes that reach a decision reach the same one (consistent) Abort at any time, if it has not yet
– AC2: A process can’t reverse its decision after it has reached one (irrevocable) voted Yes
– AC3: The Commit decision can only be reached if all processes voted Yes •After voting Yes a process can’t take
– AC4: If there are no failures and all processes voted Yes, then the decision unilateral action
will be to Commit • The period between the moment a
– AC5: Consider any execution containing only failures that the algorithm is process votes Yes and the moment it
designed to tolerate. At any point in this execution, if all existing failures has received sufficient information to
are repaired and no new failures occur for sufficiently long, then all know what the decision will be is
processes will eventually reach a decision called the uncertainty period for that
process
• A process is called uncertain while it
is in its uncertainty period

20
Atomic Commitment Protocol (3)

• Scenario I: A failure disables communication between a


process p and all other processes, while p is uncertain. By the
definition of uncertainty period, p can’t reach a decision until
after the communication failure has been repaired
• Scenario II: p fails while in its uncertainty period. When p
recovers, it can’t reach a decision on its own unless it
communicates with other processes to find out what the
decision was

21
Atomic Commitment Protocol (4)

• Independent recovery: The ability of a recovering process


to reach a decision without communicating with other
processes
• This ability is very attractive, because
– It makes recovery cheaper and simpler
– The lack of independent recovery in conjunction with total failures (when
all processes fail) gives rise to blocking
• Example
– Say p, in Scenario II, is the first process to recover from a total failure
– Since p is uncertain, it must communicate with other processes before it can
reach a decision
– However, it can’t communicate with them, since they all are down
– Thus, p is blocked
22
Atomic Commitment Protocol (5)

• Failures while a process is uncertain → serious problems


• Can we design an ACP that eliminates uncertainty periods?
– Unfortunately, not
• Doing so would essentially require that a process cast its vote
and learn the votes of all other processes all at once
– In general, this is impossible
• Thus, we have the following important observations:
• Proposition 1: If communication failures or total failures are
possible, then every ACP may cause processes to become
blocked
– However, a non-blocking ACP may exist if only site failures occur
• Proposition 2: No ACP can guarantee independent recovery
23
of failed processes
One-phase Commit

Coordinator Server 1 1-phase atomic commit


Transaction T Server Object protocol:
A The coordinator
write(A,1); communicates the commit
. Object B
write(B,2); or abort request to all of the
.
… . .
participants in the
transaction and keep on
write(Y, 25); . repeating the request until
write(Z, 26); . all of them have
commit acknowledged that they
Server 13 have carried it out
• Special server called “Coordinator” Object Y
initiates atomic commit
• Tells other servers to either Object Z
commit or abort 24
One-phase Commit: Issues

Reasons that prevent a server from


being able to commit its part of a
• One-phase atomic commit protocol is inadequate, transaction (i) If locking is in use,
though, because it does not allow a server to the resolution of a deadlock can lead
make a unilateral decision to abort, in case it to the aborting of a transaction
without the client being aware (ii) If
can’t commit, a transaction when the client optimistic concurrency control is in
requests a commit use, the failure of validation at a
server would cause it to decide to
– Server with object has no say in whether transaction abort the transaction (iii) The
commits or aborts coordinator may not know if a server
• If object corrupted, it just can’t commit (while other servers have has crashed and been replaced
committed) during the progress of a distributed
transaction – such a server will need
– Server may crash, before receiving commit message, to abort the transaction
with some updates still in memory 25
Two-phase Commit
In the 2nd phase of the protocol, every
• The two-phase commit protocol is designed to allow any participant in the transaction carries out
the joint decision
participant to abort its part of a transaction •If any one participant votes to abort, then the
– Due to the requirement for atomicity, if one part of a transaction is aborted, decision must be to abort the transaction. If all
then the whole transaction must be aborted the participants vote to commit, then the
• In the 1st phase of the protocol, each participant votes for the decision is to commit the transaction
Problem: To ensure that all of the
transaction to be committed or aborted
participants vote and that they all reach
– Once a participant has voted to commit, it is not allowed to abort
the same decision
– Therefore, before a participant votes to commit a transaction, it must ensure
•Simple if no errors occur, but the protocol
that it will eventually be able to carry out its part of the commit protocol,
must work correctly even when some of the
even if it fails and is replaced in the interim
servers fail, messages are lost or servers are
– Such participant in a transaction is said to be in a prepared state temporarily unable to communicate with one
• To make sure of this, each participant saves, in permanent another
storage, all of the objects that it has altered in the transaction,
together with its status – prepared 26
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

27
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare
• Save updates to disk
• Respond with “Yes” or
“No”
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare
• Save updates to disk
• Respond with “Yes” or
“No”
If any
“No” vote Abort
or timeout
before all
(13) votes
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare
• Save updates to disk
• Respond with “Yes” or
“No”
All (13)
“Yes” Commit
votes
received
within
timeout?
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare
• Save updates to disk
• Respond with “Yes” or
“No”
All (13)
“Yes” Commit
votes • Wait! Can’t commit or abort
received before receiving next message!
within
timeout?
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare
• Save updates to disk
• Respond with “Yes” or
“No”
All (13)
“Yes” Commit
votes • Commit updates from disk
received to store
within
OK
timeout?
Two-phase Commit: An Example

• e-commerce website handling an order


transaction
– When a customer places an order, the e-commerce
system needs to check the inventory (whether items are
in stock), payment processing (funds availability),
and delivery system (delivery partner availability)
– The coordinator (order system) sends a Prepare request
to each component
– If all components confirm they are ready, the
coordinator proceeds with Commit
– If any component indicates an issue (e.g., insufficient
stock), the coordinator aborts the transaction to maintain
33
consistency
Failures in Two-phase Commit

• If server voted Yes, it can’t commit unilaterally before


receiving Commit message
• If server voted No, can abort right away (why?)
• To deal with server crashes
– Each server saves tentative updates into permanent storage, right
before replying Yes/No in 1st phase. Retrievable after crash
recovery

• To deal with coordinator crashes


– Coordinator logs all decisions and received/sent messages on
disk
– After recovery or new election => new coordinator takes over 34
Failures in Two-phase Commit (2)

• To deal with Prepare message loss


– The server may decide to abort unilaterally after a timeout for 1st
phase (server will vote No, and so coordinator will also
eventually abort)

• To deal with Yes/No message loss


– The coordinator aborts the transaction after a timeout
(pessimistic!). It must announce Abort message to all

• To deal with Commit/Abort message loss


– Server can poll coordinator (repeatedly)

35
Failure Model for Commit protocols

• Commit protocols are designed to work in an asynchronous


system in which servers may crash, msgs may be lost, and no
Byzantine faults
• The 2-phase commit is a protocol for reaching consensus
– FLP – consensus can’t be reached in an asynchronous system if processes
sometimes fail
• However, the 2-phase commit protocol does reach consensus
under those conditions
– Because crash failures of processes are masked by replacing a crashed
process with a new process whose state is set from information saved in
permanent storage and information held by other processes

36
Using Paxos in Distributed Servers

Atomic Commit set of multiple operations to be executed as a single operation


•Can instead use Paxos to decide whether to commit
a transaction or not
•But need to ensure that if any server votes No,
everyone aborts
Ordering updates
•Paxos can also be used by replica group (for an
object) to order all updates – iteratively do:
– Server proposes message for next sequence number
– Group reaches consensus (or not)
37
Two-phase Commit: Issues

• If the coordinator has failed,


• 2PC does not handle network failures very well the participant will not be able
to get the decision until the
• If the coordinator fails during the Commit Phase, some
coordinator is replaced, which
participants may not receive the final decision can result in extensive delays
• It can result in blocking for participants in the
– A participant must await the repair of failures before proceeding uncertain state
– Participants wait for an arbitrarily long period of time, especially if the • Further, if all the active
coordinator fails to send a response participants are uncertain,
– Such a participant is uncertain of the outcome (commit/abort) and can’t delays will occur even if a
proceed any further until it gets the outcome of the vote from the cooperative protocol allows
coordinator
participants to make requests
– The participant can’t decide unilaterally what to do next, and meanwhile the
to other participants
transaction may remain unterminated, uselessly consuming resources (e.g.
holding locks), for arbitrarily long at a blocked process’s site • “How 3PC protocol alleviate
– Consequently, the objects used by its transaction can’t be released for use such delays”, we’ll see later
by other transactions 38
In step (3), if the coordinator waiting for
YES or NO messages from all the

2PC: Steps and Timeouts participants timeouts, it can decide Abort


but must send ABORT to every
participant from which it received a YES

In step (2), if a participant waiting for a VOTE-REQ from the coordinator timeouts, it can simply
decide Abort and stop • In step (4), if a participant
1. The coordinator sends a VOTE-REQ (i.e., vote request) message to all voted Yes and waiting for a
participants COMMIT or ABORT from
2. When a participant receives a VOTE-REQ, it responds by sending to the the coordinator timeouts, it is
coordinator a message containing that participant’s vote: YES or NO. If uncertain; now, it can consult
the participant votes No, it decides Abort and stops other processes to find out
3. The coordinator collects the vote messages from all participants, If all of what to decide
them were YES and the coordinator’s vote is also Yes, then the • An insight for termination
– Say, there are two participants p and q
coordinator decides Commit and sends COMMIT messages to all
– The coordinator might send a
participants; otherwise, the coordinator decides Abort and sends ABORT COMMIT or ABORT to q but fail just
messages to all participants that voted Yes (those that voted No already before sending it to p. Thus, even
though p is uncertain, q is not. If p can
decided Abort in step (2)). In either case, the coordinator then stops communicate with q, it can find out
4. Each participant that voted Yes waits for a COMMIT or ABORT the decision from q. It need not block
waiting for the coordinator’s recovery
message from the coordinator. When it receives the message, it decides
accordingly and stops
39
2PC: The Cooperative Termination

• With this protocol, if p can


• A participant p, which times out while in its uncertainty communicate with some q for
which either (1) or (2) holds,
period, sends a DECISION-REQ message to every other
then p can reach a decision
process q, to inquire whether q either knows the decision without blocking
or can unilaterally reach one. In this scenario, p is the • On the other hand, if (3) holds
initiator and q a responder in the termination protocol for all processes with which p
can communicate, then p is
• Three cases:
blocked
1. q has already decided Commit (or Abort): q simply sends a COMMIT
• In summary, even though the
(or ABORT) to p, and p decides accordingly
cooperative termination
2. q has not voted yet: q can unilaterally decide Abort. It then sends an protocol reduces the
ABORT to p, and p therefore decides Abort probability of blocking, it does
3. q has voted Yes but has not yet reached a decision: q is also uncertain not eliminate it
and therefore can’t help p reach a decision
40
2PC: Time Complexity
A round is the maximum time for a message to reach its destination
• This is independent of the
• In the absence of failures, 2PC requires three rounds:
number of failures! The catch
– (1) the coordinator broadcasts VOTE-REQS; (2) the participants reply with
is that some processes may be
their votes; and (3) the coordinator broadcasts the decision
blocked
• If failures happen, then the termination protocol may need • By definition, a blocked
two additional rounds: process may remain blocked
– (4) for a participant, which timed out, to send a DECISION-REQ, and (5) for an unbounded period of
for a process that receives that message and is outside its uncertainty period
time
to reply
• Therefore, to get meaningful
• Several participants may independently invoke the results, we must exclude
termination protocol blocked processes from
– However, the two rounds of different invocations can overlap, so the consideration in measuring
combined effect of all invocations of the termination protocol is only two
time complexity
rounds
– Thus, in the presence of failures it will take up to five rounds for all
processes that aren’t blocked or failed to reach a decision 41
2PC: Message Complexity

• Let n be the number of participants (so the total number of processes is


n + 1)
– In each round of 2PC, n messages are sent
– Thus, in the absence of failures, the protocol uses 3n messages
• The cooperative termination protocol is invoked by all participants
that voted Yes but didn’t receive COMMIT or ABORT from the
coordinator
• Let, there be m such uncertain participants, 0 ≤ m ≤ n
– Thus, m processes will initiate the termination protocol, each sending n
DECISION-REQ messages
– At most (n - m) + 1 processes (the maximum that might not be in their
uncertainty period) will respond to the 1st DECISION-REQ message
• As a result of these responses, one more process may move outside its
uncertainty period and thus respond to the DECISION-REQ message
of another initiator of the termination protocol 42
2PC: Message Complexity (2)

43
2PC: Lessons Learned

• 3PC is a protocol that satisfies


• In 2PC, if all operational processes are uncertain, they are NB
blocked • The idea is simple
• They can’t decide Abort, even if they know that processes • Consider why 2PC violates
they can’t communicate with have failed, because some NB
failed process could have decided Commit before failing • The coordinator sends
COMMITS to the participants
• Suppose we’ve managed to design an ACP with the
while the latter are uncertain
following “non-blocking property”:
• Thus, if participant p receives
• NB: If any operational process is uncertain then no process a COMMIT before participant
(whether operational or failed) can have decided to Commit q, the former will decide
– If the operational sites discover they are all uncertain, they can decide Commit while the latter is still
Abort, safe in their knowledge that the failed processes had not decided uncertain
Commit
• How 3PC avoids this?
– When the failed processes recover they can be told to decide Abort too 44
• This way blocking can be prevented
Three-phase Commit: Key Idea

• After the coordinator has found that all votes were Yes, it sends • If a process votes No, then
PRE-COMMIT messages to the participants 3PC behaves just like 2PC
• When a participant p receives that message, it knows that al1 • The coordinator sends
processes voted Yes and is thereby moved outside its uncertainty ABORT to all processes
period
– However, p does not decide Commit yet
• At this point, p knows that it will decide Commit provided it does not
fail
• Each participant acknowledges the receipt of PRE-COMMIT
• When the coordinator has received all the acknowledgments to
PRE-COMMITs, it knows that no participant is uncertain anymore
• It then sends COMMIT to all participants
• When a participant receives a COMMIT it can decide Commit
• This decision satisfies NB since no process is uncertain any longer 45
Three-phase Commit: Assumptions

• The first version of 3PC is designed to handle only


(non-total) site failures
– Consequently, we assume that communication failures do not happen
• Although, site failures may cause communication failures as
a side effect (Fig.)
• Two major implications of assuming only site failures
• if sites A and C fail, then
1. All operational processes can communicate with each other
sites B and E can’t
2. A process that times out waiting for a message from process q knows that q
is down and therefore that no processing can be taking place there; in
communicate even though
particular, no other process can be communicating with q both are operational
• Neither of these statements is true if communication failures
are possible
46
Three-phase Commit: Steps

1. The coordinator sends a VOTE-REQ to all participants


2. When a participant receives a VOTE-REQ, it responds with a YES or NO message, depending on
its vote. If a participant sends NO, it decides Abort and stops
3. The coordinator collects the vote messages from all participants. If any of them was NO or if the
coordinator voted No, then the coordinator decides Abort, sends ABORT to all participants that
voted Yes, and stops. Otherwise, the coordinator sends PRE-COMMIT messages to all participants
4. A participant that voted Yes waits for a PRE-COMMIT or ABORT message from the coordinator
I. If participant receives an ABORT, it decides Abort and stops
II. If participant receives a PRE-COMMIT, it responds with an ACK (i.e., acknowledgment) message to the coordinator
5. The coordinator collects the ACKs. When they have all been received, it decides Commit, sends
COMMIT to all participants, and stops
6. A participant waits for a COMMIT from the coordinator. When it receives that message, it decides
47
Commit and stops
3PC: Timeout Actions

• Cases (1) & (2) are handled


• On timeout, “what a process should do” depends exactly as in 2PC
on the message it was waiting for • In both cases the process that
times out knows that no
• There are five places in which a process waits for process can have decided
some message in 3PC Commit
• Thus, it can unilaterally decide
1. In Step (2) participants wait for VOTE-REQ Abort
2. In step (3) the coordinator waits for the votes • In case (1) the participant can
3. In step (4) participants wait for a PRE-COMMIT or ABORT simply stop once it has
decided Abort
4. In step (5) the coordinator waits for ACKs
• In case (2) the coordinator
5. In Step (6) participants wait for COMMIT should also send ABORTs to
all participants that had voted
Yes 48
Timeout Actions (during W1 & W2)
Coordinator Participant p
no process can have decided
VOTE-RE W1 Commit, decide Abort
Q
no process can have decided
Commit, decide Abort W2
VOTE

PRE-COM W3
MIT/ABO
RT

W4

ACK

W5
COMMIT

49
3PC: Timeout Actions (2)

• The failed participants, when they • In case (4) the coordinator


• On timeout, “what a process recover,should
are responsible
do” for finding out
depends times out because one or more
that the decision was to Commit participants failed before
on the message it was •waiting With thisfor timeout action, processes sending an ACK
might decide Commit while some – The coordinator does not know
• There are five places in which a process waits for whether these participants failed
failed participant is uncertain before or after receiving a
some message in 3PC • This will happen if some participant PRE-COMMIT

1. In Step (2) participants wait for


thatVOTE-REQ
did not send an ACK had actually • But it does know that these
failed participants had voted Yes and
2. In step (3) the coordinator waits for before even receiving the
the votes
PRE-COMMIT from the coordinator were therefore prepared to
3. In step (4) participants wait for(refer
a PRE-COMMIT
step 4 of 3PC) or ABORT decide Commit
4. In step (5) the coordinator waits
• Thisfordoes
ACKs not violate NB, which • Thus, it ignores the failures
5. requires
In Step (6) participants wait for COMMIT only that while any operational and proceeds to send
process is uncertain (let failed process be COMMIT to the operational
uncertain), no operational or failed participants as if it had
process has decided Commit received all ACKs50
Timeout Action (during W4)

Participant q Coordinator Participant p


VOTE-RE W1
Q
W2
VOTE

PRE-COM W3
MIT/ABO
RT
W3
MIT
q uncertain, but failed PRE-COM W4 p might decide Commit

ACK

W5
COMMIT

51
3PC: Timeout Actions (3)

• In case (3) and (5) processes


• On timeout, “what a process should do” depends can’t act autonomously in
response to the timeout
on the message it was waiting for
• They must communicate with
• There are five places in which a process waits for other processes to reach a
some message in 3PC consistent decision
• It is clear why such
1. In Step (2) participants wait for VOTE-REQ communication is necessary in
2. In step (3) the coordinator waits for the votes case (3)
3. In step (4) participants wait for a PRE-COMMIT or ABORT • p is operational and has not
received any PRE-COMMIT
4. In step (5) the coordinator waits for ACKs
and is therefore uncertain
5. In Step (6) participants wait for COMMIT • The timeout of an operational
uncertain participant can’t be
52
handled unilaterally
Timeout Actions (during W3)
Coordinator Participant p
VOTE-RE W1
Q
W2
VOTE

PRE-COM W3 p uncertain, but operational


MIT/ABO
RT

W4

ACK

W5
COMMIT

53
3PC: Timeout Actions (4)

• But, in case (5), why does a


• On timeout, “what a process should do” depends timed out participant p need to
communicate with others?
on the message it was waiting for
• p is operational and has
• There are five places in which a process waits for already received a
some message in 3PC PRE-COMMIT and is
therefore not uncertain
1. In Step (2) participants wait for VOTE-REQ • After all, p knows that only
2. In step (3) the coordinator waits for the votes COMMIT could possibly
3. In step (4) participants wait for a PRE-COMMIT or ABORT arrive from the coordinator
• Why can’t p ignore the
4. In step (5) the coordinator waits for ACKs
timeout and simply decide
5. In Step (6) participants wait for COMMIT Commit?
• By deciding Commit, p could
violate NB 54
Timeout Action (during W5)
Coordinator Participant p
VOTE-RE W1
Q
W2
VOTE

PRE-COM W3
MIT/ABO
RT

W4

ACK

W5 p not uncertain
COMMIT

55
3PC: Timeout Actions (5)

Example:
• On timeout, “what a process should do”•Say, depends
the coordinator failed after having sent the
on the message it was waiting for PRE-COMMIT to p but before sending it to some
other participant q
• There are five places in which a process waits
•Thus p willfor
time out in case (5) – outside its
some message in 3PC uncertainty period, while q will time out in case (3) –
inside its uncertainty period
1. In Step (2) participants wait for VOTE-REQ
•If p, on timeout, were to decide Commit while q
2. In step (3) the coordinator waits for the votes (which is operational) is still uncertain, it would violate
3. In step (4) participants wait for a PRE-COMMIT NB or ABORT
4. In step (5) the coordinator waits for ACKs •This suggests that before deciding Commit, p should
make sure that all operational participants have
5. In Step (6) participants wait for COMMIT received a PRE-COMMIT, and have therefore moved
outside their uncertainty periods
56
Timeout Actions (during W3 & W5)
Participant q Coordinator Participant p
VOTE-RE W1
Q
W2
VOTE
W3
PRE-COM
q uncertain, but MIT
operational W3
MIT
PRE-COM W4

ACK
p not uncertain, on timeout
W5 might decide Commit
COMMIT

57
3PC: Termination Protocol

• Let us define states of a process relative to the


messages it has sent or received Note:-
• Four possible states: 1.Any process is in precisely one
1. Aborted: the process has not voted, has voted No, or has state at any time
received an ABORT (i.e., it has either decided Abort or can 2.Some pairs of states can’t coexist,
i.e., can’t be occupied at the same
unilaterally decide so)
time by two operational processes
2. Uncertain: the process is in its uncertainty period
3. Committable: the process has received a PRE-COMMIT but
has not yet received a COMMIT
4. Committed: the process has received a COMMIT (i.e., it has
decided Commit)
58
3PC: Termination Protocol (2)

• Fig. shows which states can


coexist and which can’t
• Two operational sites can
occupy two states s and s' at
the same time iff the entry
whose row corresponds to s
and whose column
corresponds to s' contains a
“Y” while an “N” means they
can’t
• e.g., as per NB, if any
operational site is “Uncertain”
then no site (operational or
failed) can be “Committed”
59
3PC: Termination Protocol (3)

• The first step of termination protocol is coordinator


election, which works as follows:
• When a participant times out in case (3) or (5), it
initiates an election protocol
– Election protocol involves all processes that are operational and
results in the “election” of a new coordinator
– The old one must have failed; otherwise, no participant would
have timed out!
• Election Rule: If the present coordinator fails, the
smallest of the operational processes will become the
new coordinator
60
3PC: Termination Protocol (4)

• Once the new coordinator has been elected, the


termination protocol proceeds as follows:
• The coordinator sends a STATE-REQ message to all
processes that participated in the election
– By our assumption about failures, all operational sites will
participate
• A participant in the termination protocol (i.e., any
operational process other than the new coordinator)
responds to this message by sending its present state to
the coordinator
• The coordinator collects these states and proceeds
61
according to the following termination rule:
3PC: Termination Protocol (5)

• TR1: If some process is Aborted, the coordinator decides Abort, sends


ABORT messages to all participants, and stops
• TR2: If some process is Committed, the coordinator decides Commit,
sends COMMIT messages to all participants, and stops
• TR3: If all processes that reported their state are Uncertain, the coordinator
decides Abort, sends ABORT messages to all participants, and stops
• TR4: If some process is Committable but none is Committed, the
coordinator first sends PRE-COMMIT messages to all processes that
reported Uncertain, and waits for acknowledgments from these processes.
After having received these acknowledgments the coordinator decides
Commit, sends COMMIT messages to all processes, and stops
• A participant that receives a COMMIT (or ABORT) message, decides
62
Commit (or Abort), and stops
3PC: Time Complexity

• Resiliency and Blocking: 3PC is resilient to site failures


only. It is non-blocking except for a total site failure. By
Propositions 1 & 2 this is the maximal level of fault tolerance • This may appear deceptively
a non-blocking ACP can attain worse than the five rounds
required, in the worst case, for
• Time Complexity: In the absence of failures, 3PC uses at 2PC (independent of the
most five rounds of messages: number of failures!)
– (1) to distribute VOTE-REQs (2) to deliver votes (3) to distribute • Nevertheless, recall that 2PC
PRE-COMMITs (4) to acknowledge the PRE-COMMITs (5) to distribute
may cause blocking and the
COMMITs
five round bound concerned
– If the decision is Abort, only three rounds are needed
only non-blocked processes
• Each invocation of the termination protocol contributes at
most five more rounds, plus the election protocol, which
requires only one round to send UR-ELECTEDs
63
• Thus, if f processes fail, at most 6f + 5 rounds are needed
3PC: Message Complexity

= 3(f + 1)(2n - 1) - n

64
3PC and Communication Failures

• Up to this point we have assumed that no communication


failures occur
• Let us now remove this assumption. Unfortunately, the
termination protocol for 3PC, we’ve presented, may result in
processes reaching inconsistent decisions
• Example:
– Say, the processes are partitioned into two components A and B
– It is possible that, at the time the partition occurred, all processes in A are
Uncertain while all those in B are Committable (refer table)
– According to the termination protocol, the processes in A will decide Abort
(case TR3), while those in B will decide Commit (case TR4)
• We need new termination protocol that avoids such
inconsistencies and guarantees correctness even in the
65
presence of communication failures
3PC and Communication Failure (2)

• The overall structure of the termination protocol is as


before
– A coordinator is elected that collects the states of participants
and decides how to proceed on the basis of the states it has
received
• The termination protocol, in short:
– After being elected, a coordinator sends STATE-REQ messages
to all processes
– A process that receives this message from its coordinator
responds by sending its present state
– The coordinator waits for a period of time, collecting responses
– It then proceeds by using the Majority Termination Rule (self
study) 66
3PC and Communication Failure (3)

• Therefore, the majority termination rule guarantees


that a decision will be reached • The crucial property of
– Note:- It is not necessary for all failures to be repaired majorities, that is needed here,
is “any two majorities must
• In general, if there is a set of processes A that forms a intersect”
majority and such that every process in A can • A quorum is a generalization
communicate with the smallest process in A, then all of the concept of majority that
processes in A will reach a decision (i.e., will become also satisfies this intersection
property
unblocked)
• Note:- Certain actions can be taken by the coordinator
only if it can communicate with a majority of sites
67
Three-phase Commit: In Short

• Enhanced version of 2PC


• 3PC mitigates blocking issues
– Introduces an additional phase between Prepare and Commit, known as
the Pre-Commit phase, to ensure participants have enough time to respond, by adding a timeout
reducing the risk of indefinite blocking mechanism, allowing
• 1. Prepare Phase participants to autonomously
– Similar to 2PC, the coordinator sends a Prepare request to participants, decide to commit or abort if
asking if they can commit the transaction; participants respond Yes or No they don’t receive further
• 2. Pre-Commit Phase messages from the
– If all participants respond Yes, the coordinator sends coordinator
a Pre-Commit message, indicating the transaction is likely to be committed; • 3PC reduces the likelihood of
participants acknowledge the Pre-Commit, preparing to commit the indefinite blocking,
transaction but not yet completing it particularly helpful in
• 3. Commit Phase situations with unreliable
– If all participants acknowledge the Pre-Commit, the coordinator sends networks
the Commit command; if any participant fails to acknowledge the 68
Pre-Commit, the transaction is Aborted
2PC vs. 3PC: Closing Remarks

• The 1st version of 3PC has the advantage over 2PC that it
completely eliminates blocking (except, unavoidably, in the
event of total failures)
– Useful for systems built to tolerate only site failures
• The 2nd version of 3PC can be used in systems designed to
tolerate both site and communication failures
– It does not completely eliminate blocking but causes blocking less
frequently than 2PC
• For instance, in 2PC processes may be blocked even if just
one process – the coordinator – fails
• In the 2nd version of 3PC no process will be blocked (in the
absence of communication failures), as long as a majority of
the processes are still operational 69
2PC vs. 3PC: Closing Remarks (2)

• In most practical applications, the circumstances under which


2PC causes blocking are sufficiently rare, and thus blocking
is usually not considered a big problem
• Secondly, 3PC has greater message and round complexity
than 2PC
• Consequently, almost all systems we know of that employ
atomic commitment protocols use some version of 2PC
• Even though, generally, 3PC is not used in practice, it is an
interesting protocol both in its own right and also because it
illustrates a number of important techniques used in the
design of fault-tolerant communication protocols
• For these reasons, we feel that its study is a worthwhile
70
endeavor
Summary

• Multiple servers in cloud


– Replication for Fault-tolerance
– Load balancing across objects
• Replication Flavors using concepts we learnt
earlier
– Active replication
– Passive replication
• Transactions and distributed servers
– Two phase commit
– Three phase commit
71

You might also like