Distributed Computing Replication Control
Replication Control
A K Singh
Computer Engineering
N I T Kurukshetra
Slides adapted from Indy@UIUC © IG
Server-side Focus
Replication: What and Why
Rollback Recovery vs. Rollforward Recovery
Repair interval
message ≡ request
Availability
Replication Transparency
[Figure: Clients send requests through front ends to Replica 1, Replica 2 and Replica 3 of an object O; replies flow in the opposite direction.]
• Front ends provide replication transparency
Replication Consistency
[Figure: request flow (replies flow opposite).]
Active Replication
[Figure: Clients send requests through front ends, which multicast them inside the replica group (Replica 1, Replica 2 and Replica 3); replies flow in the opposite direction.]
• Front ends provide replication transparency
Active Replication Using Concepts You’ve Learnt Earlier
• Can use any flavor of multicast ordering,
depending on application
– FIFO ordering
– Causal ordering
– Total ordering
– Hybrid ordering
• Total or Hybrid (*-Total) ordering + Replicated
State machines approach
– => all replicas reflect the same sequence of updates to
the object
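A minimal sketch of the replicated state machine idea, assuming a hypothetical sequencer has already stamped each update with a global sequence number (one possible way to obtain a total order). The names `Replica`, `deliver`, and `updates` are illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica of object O, modelled as a deterministic state machine."""
    state: dict = field(default_factory=dict)
    next_seq: int = 0                             # next sequence number to apply
    pending: dict = field(default_factory=dict)   # updates that arrived early

    def deliver(self, seq, key, value):
        """Receive an update tagged with its global sequence number."""
        self.pending[seq] = (key, value)
        # Apply updates strictly in sequence-number order (total ordering).
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.state[k] = v
            self.next_seq += 1


# Assumption: a sequencer has assigned sequence numbers 0..2 to these updates.
updates = [(0, "x", 1), (1, "x", 2), (2, "y", 5)]

r1, r2 = Replica(), Replica()
for u in updates:             # r1 receives the multicasts in order
    r1.deliver(*u)
for u in reversed(updates):   # r2 receives them in reverse network order
    r2.deliver(*u)
```

Both replicas hold back early arrivals and apply the same sequence, so they end in the same state even though the network delivered the messages in different orders.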
Active Replication Using Concepts You’ve Learnt Earlier (2)
• What about failures?
– Use virtual synchrony (i.e., view synchrony)
• Virtual synchrony with total ordering for
multicasts =>
– All replicas see all failures/joins/leaves and all
multicasts in the same order
– Could also use causal (or even FIFO) ordering if
application can tolerate it
Transactions and Replication
• One-copy serializability
– A concurrent execution of transactions in a replicated database
is one-copy-serializable if it is equivalent to a serial execution of
these transactions over a single logical copy of the database
– (Or) The effect of transactions performed by clients on replicated
objects should be the same as if they had been performed one at
a time on a single set of objects (i.e., one replica per object)
• In a non-replicated system, transactions appear to
be performed one at a time in some order
– Correctness means serial equivalence of transactions
• When objects are replicated, transaction systems need
one-copy serializability for correctness
Transactions with Distributed Servers
Transaction T:
write(A,1);
write(B,2);
…
write(Y, 25);
write(Z, 26);
commit

[Figure: the objects written by T are spread across servers.]
• Server 1 holds Object A and Object B
• …
• Server 13 holds Object Y and Object Z
Atomic Commitment Protocol
Two-phase Commit
[Figure: the Coordinator exchanges messages with Server 1, …, Server 13.]
• Coordinator sends Prepare to every server
• Each server, on Prepare:
– Saves updates to disk
– Responds with “Yes” or “No”
• If any “No” vote, or timeout before all (13) votes: Coordinator sends Abort
• If all (13) “Yes” votes are received within the timeout: Coordinator sends Commit
• A server that has voted “Yes” must wait: it can’t commit or abort before receiving the next message
• Each server, on Commit:
– Commits updates from disk to store
– Replies OK
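The message flow above can be sketched as follows. This is an illustrative, single-process simulation, not a networked implementation: `Server`, `coordinator_decide`, and `run_two_phase_commit` are hypothetical names, "disk" and "store" are plain dictionaries, and a timed-out vote is modelled simply as a missing entry.

```python
from enum import Enum


class Vote(Enum):
    YES = "Yes"
    NO = "No"


class Server:
    """Hypothetical participant; 'disk' and 'store' are plain dicts here."""

    def __init__(self, will_vote_yes=True):
        self.will_vote_yes = will_vote_yes
        self.disk = {}    # tentative updates saved at Prepare time
        self.store = {}   # committed object values

    def prepare(self, updates):
        if not self.will_vote_yes:
            return Vote.NO
        self.disk = dict(updates)   # stand-in for a durable write
        return Vote.YES

    def commit(self):
        self.store.update(self.disk)   # move updates from disk to store
        self.disk = {}
        return "OK"

    def abort(self):
        self.disk = {}


def coordinator_decide(votes, expected):
    """Commit only if every expected server voted Yes before the timeout.

    A server whose vote did not arrive in time is absent from `votes`.
    """
    for server_id in expected:
        if votes.get(server_id) is not Vote.YES:
            return "Abort"
    return "Commit"


def run_two_phase_commit(servers, updates):
    votes = {sid: s.prepare(updates[sid]) for sid, s in servers.items()}
    decision = coordinator_decide(votes, servers.keys())
    for s in servers.values():
        if decision == "Commit":
            s.commit()
        else:
            s.abort()
    return decision


servers = {i: Server() for i in range(1, 14)}        # Server 1 .. Server 13
updates = {i: {f"obj{i}": i} for i in servers}
decision = run_two_phase_commit(servers, updates)
```

Treating a missing vote exactly like a "No" vote keeps the coordinator's rule to a single check per server.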
Two-phase Commit: An Example
Failure Model for Commit protocols
Using Paxos in Distributed Servers
1. The coordinator sends a VOTE-REQ (i.e., vote request) message to all participants
2. When a participant receives a VOTE-REQ, it responds by sending to the coordinator a message containing that participant’s vote: YES or NO. If the participant votes No, it decides Abort and stops
3. The coordinator collects the vote messages from all participants. If all of them were YES and the coordinator’s vote is also Yes, then the coordinator decides Commit and sends COMMIT messages to all participants; otherwise, the coordinator decides Abort and sends ABORT messages to all participants that voted Yes (those that voted No already decided Abort in step (2)). In either case, the coordinator then stops
4. Each participant that voted Yes waits for a COMMIT or ABORT message from the coordinator. When it receives the message, it decides accordingly and stops

Timeout actions:
• In step (2), if a participant waiting for a VOTE-REQ from the coordinator times out, it can simply decide Abort and stop
• In step (4), if a participant that voted Yes and is waiting for a COMMIT or ABORT from the coordinator times out, it is uncertain; it can now consult other processes to find out what to decide
• An insight for termination
– Say, there are two participants p and q
– The coordinator might send a COMMIT or ABORT to q but fail just before sending it to p. Thus, even though p is uncertain, q is not. If p can communicate with q, it can find out the decision from q. It need not block waiting for the coordinator’s recovery
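The termination insight can be sketched as a small decision function for an uncertain participant p. The state names and the function `cooperative_termination` are hypothetical. The "not_voted" case goes beyond the text above and follows the commonly described cooperative termination rule, so treat it as an assumption.

```python
def cooperative_termination(peer_states):
    """Decide for an uncertain participant p from what reachable peers report.

    peer_states: iterable of "committed", "aborted", "uncertain", "not_voted".
    Returns "commit", "abort", or None if p must remain blocked.
    """
    states = list(peer_states)
    if "committed" in states:
        return "commit"     # some peer q already received COMMIT
    if "aborted" in states:
        return "abort"      # some peer q already decided Abort
    if "not_voted" in states:
        # Assumption (not stated in the text above): a peer that has not yet
        # voted can decide Abort unilaterally, so p may adopt Abort too.
        return "abort"
    return None             # every reachable peer is uncertain: p stays blocked


# p is uncertain, q received COMMIT before the coordinator failed.
p_decision = cooperative_termination(["uncertain", "committed"])
```

When every reachable peer is also uncertain the function returns `None`, which models the blocking case that motivates three-phase commit.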
2PC: The Cooperative Termination
2PC: Lessons Learned
• After the coordinator has found that all votes were Yes, it sends PRE-COMMIT messages to the participants
• When a participant p receives that message, it knows that all processes voted Yes and is thereby moved outside its uncertainty period
– However, p does not decide Commit yet
• At this point, p knows that it will decide Commit provided it does not fail
• Each participant acknowledges the receipt of PRE-COMMIT
• When the coordinator has received all the acknowledgments to PRE-COMMITs, it knows that no participant is uncertain anymore
• It then sends COMMIT to all participants
• When a participant receives a COMMIT it can decide Commit
• This decision satisfies NB since no process is uncertain any longer
• If a process votes No, then 3PC behaves just like 2PC
– The coordinator sends ABORT to all processes
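A failure-free run of these steps might look like the sketch below. It is a single-process simulation under stated assumptions: `Participant`, `run_3pc`, and the state names ("uncertain", "committable", and so on) are hypothetical, and timeouts and failures are not modelled at all.

```python
class Participant:
    """Hypothetical 3PC participant tracking only its protocol state."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "initial"

    def on_vote_req(self):
        # A Yes voter enters its uncertainty period; a No voter aborts at once.
        self.state = "uncertain" if self.vote_yes else "aborted"
        return self.vote_yes

    def on_pre_commit(self):
        # Outside the uncertainty period, but Commit is not decided yet.
        self.state = "committable"
        return "ACK"

    def on_commit(self):
        self.state = "committed"

    def on_abort(self):
        self.state = "aborted"


def run_3pc(participants):
    """Coordinator logic for a run without failures or timeouts."""
    votes = [p.on_vote_req() for p in participants]
    if not all(votes):
        for p in participants:      # behaves like 2PC: ABORT to all processes
            p.on_abort()
        return "abort"
    acks = [p.on_pre_commit() for p in participants]
    if all(a == "ACK" for a in acks):
        # No participant is uncertain any more, so COMMIT is safe to send.
        for p in participants:
            p.on_commit()
    return "commit"


group = [Participant("p"), Participant("q")]
outcome = run_3pc(group)
```

The extra "committable" state is what separates this from 2PC: a participant only reaches "committed" after every participant has left "uncertain".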
Three-phase Commit: Assumptions
[Figure: 3PC message timeline showing PRE-COMMIT/ABORT, ACK and COMMIT messages with wait states W3, W4 and W5.]
3PC: Timeout Actions (2)
[Figure: 3PC message timeline showing PRE-COMMIT/ABORT, ACK and COMMIT messages with wait states W3, W4 and W5. Annotations: q is uncertain, but failed; p might decide Commit.]
3PC: Timeout Actions (3)
[Figure: 3PC message timeline showing ACK and COMMIT messages with wait states W4 and W5.]
3PC: Timeout Actions (4)
[Figure: 3PC message timeline showing PRE-COMMIT/ABORT, ACK and COMMIT messages with wait states W3, W4 and W5. Annotation: p not uncertain.]
3PC: Timeout Actions (5)
• On timeout, “what a process should do” depends on the message it was waiting for
• There are five places in which a process waits for some message in 3PC
1. In step (2) participants wait for VOTE-REQ
2. In step (3) the coordinator waits for the votes
3. In step (4) participants wait for a PRE-COMMIT or ABORT
4. In step (5) the coordinator waits for ACKs
5. In step (6) participants wait for COMMIT

Example:
• Say, the coordinator failed after having sent the PRE-COMMIT to p but before sending it to some other participant q
• Thus p will time out in case (5), outside its uncertainty period, while q will time out in case (3), inside its uncertainty period
• If p, on timeout, were to decide Commit while q (which is operational) is still uncertain, it would violate NB
• This suggests that before deciding Commit, p should make sure that all operational participants have received a PRE-COMMIT, and have therefore moved outside their uncertainty periods
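One way to act on this suggestion is for a newly elected coordinator to collect the states of the operational processes and apply a fixed set of rules. The sketch below follows a commonly described formulation of such termination rules, which is an assumption here and not necessarily the exact protocol of these slides. `termination_decision` and the state names are hypothetical.

```python
def termination_decision(states):
    """Pick a decision from the reported states of operational processes.

    states: dict mapping process name to one of
            "aborted", "committed", "uncertain", "committable".
    Returns (decision, names) where names lists the processes that must
    first receive PRE-COMMIT (and acknowledge it) before COMMIT is sent.
    """
    reported = set(states.values())
    if "aborted" in reported:
        return "abort", []
    if "committed" in reported:
        return "commit", []
    if reported <= {"uncertain"}:
        # Nobody is known to be committable, so Abort is safe.
        return "abort", []
    # Some process is committable but none has committed: move the
    # uncertain ones outside their uncertainty period before committing.
    to_pre_commit = sorted(n for n, s in states.items() if s == "uncertain")
    return "commit", to_pre_commit


# The example above: p received PRE-COMMIT, q did not.
decision, notify = termination_decision({"p": "committable", "q": "uncertain"})
```

In the example, p is committable and q is uncertain, so q must receive PRE-COMMIT before anyone decides Commit, which is the condition NB requires.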
Timeout Actions (during W3 & W5)
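[Figure: message timeline between Participant q, the Coordinator and Participant p, showing VOTE-REQ, VOTE, PRE-COMMIT, ACK and COMMIT messages with wait states W1 to W5. The coordinator fails after sending PRE-COMMIT to p only. Annotations: q is uncertain, but operational (waiting in W3); p is not uncertain and, on timeout in W5, might decide Commit.]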
3PC: Termination Protocol
= 3(f + 1)(2n - 1) - n
3PC and Communication Failures
• The 1st version of 3PC has the advantage over 2PC that it
completely eliminates blocking (except, unavoidably, in the
event of total failures)
– Useful for systems built to tolerate only site failures
• The 2nd version of 3PC can be used in systems designed to
tolerate both site and communication failures
– It does not completely eliminate blocking but causes blocking less
frequently than 2PC
• For instance, in 2PC processes may be blocked even if just
one process – the coordinator – fails
• In the 2nd version of 3PC no process will be blocked (in the
absence of communication failures), as long as a majority of
the processes are still operational
2PC vs. 3PC: Closing Remarks (2)