Replication Consistency

This document discusses replication, failures, and consistency in distributed systems. It covers topics like primary-backup protocols, asynchronous and synchronous protocols, and the two-phase commit protocol. Replication is used for fault tolerance and performance, but keeping replicas consistent is challenging, especially for read-write data.

CS 382: Network-Centric Computing

Replication, Failures, Consistency

Zartash Afzal Uzmi


Spring 2023-24

ACK: Slides use some material from Scott Shenker (UC Berkeley) and Jim Kurose (UMass)
Agenda
⚫ Replication (of?)

⚫ Failures and Correctness

⚫ Primary-backup Protocols
⚫ Asynchronous protocols
⚫ Synchronous protocols

⚫ Two-phase Commit (2PC) Protocol


⚫ Example
2
Replication

3
Replication
⚫ When we replicate an object, we create copies of the object and store
them on different servers

⚫ Each copy is called a replica

4
Why Replication?
⚫ Fault tolerance
⚫ If one replica crashes, simply switch to another replica
⚫ With k replicas of each object, the system can tolerate the failure of any (k-1) servers

⚫ Performance
⚫ Helps when scaling for size or geographical area
⚫ Load balancing: divide the workload among multiple servers
⚫ Placing the copy of the object in proximity of the client, e.g., CDNs

5
Nature of Replicated Data
⚫ Read-only data
⚫ Easy to replicate; we just make multiple copies
⚫ Read-write data
⚫ Writes can cause replicas to diverge. Any challenge?
⚫ Replicas must be kept consistent; modifications need to be propagated to all
other copies!
⚫ What do applications need: Read-only or Read-write?
⚫ We want the distributed system, with its multiple replicas, to appear as if there were one copy on a single machine. Want Read-write data replication!
⚫ Challenge:
⚫ When and how to propagate write updates?
6
Depends on Application Requirements
⚫ What do applications require?
⚫ From replicated/distributed systems

⚫ Availability
⚫ The application is operational and instantly processes requests
⚫ Some server failures do not prevent surviving servers from continuing to operate

⚫ Partition Tolerance:
⚫ The application continues to operate despite message loss due to network partition

7
Network Partitions Divide Systems

8
Fundamental Tradeoff
⚫ Replicas appear to be a single [consistent] machine but lose availability
during a network partition

⚫ OR

⚫ All replicas remain available during a network partition but do not appear to be a single machine (inconsistent data!!!)

10
CAP Theorem Preview
⚫ You cannot achieve all three of:
1. Consistency
2. Availability
3. Partition-Tolerance

⚫ Consistency ➙ Replicas Act Like Single Machine
⚫ Availability ➙ All Sides of Partition Continue
⚫ Partition Tolerance ➙ Partitions Can Happen
11
CAP Conjecture [Brewer 00]
⚫ From a keynote lecture by Eric Brewer (2000)
⚫ History: Eric started Inktomi, early Internet search site based around
“commodity” clusters of computers

⚫ Popular interpretation: 2-out-of-3


⚫ Consistency
⚫ Availability
⚫ Partition Tolerance

12
CAP Theorem [Gilbert Lynch 02]
Assume that an algorithm provides all of CAP (to contradict!)
Let us start with a variable x=0 consistently stored at A and B

[Figure: a client at replica A and a client at replica B; x=0 stored at both]
13
CAP Theorem [Gilbert Lynch 02]
Assume that an algorithm provides all of CAP (to contradict!)

[Figure: a network partition is possible between A and B; a client issues w(x=1) at A, and the write eventually returns ok (from A)]
14
CAP Theorem [Gilbert Lynch 02]
Assume to contradict that an algorithm provides all of CAP

[Figure: with the partition still possible, the write w(x=1) eventually returns ok from A; a read r(x) at B, which begins after the write completes, eventually returns x=0]
15
CAP Theorem [Gilbert Lynch 02]
Assume to contradict that an algorithm provides all of CAP
Not consistent (C) => contradiction!

[Figure: same scenario; the read at B returns x=0 even though the write of x=1 completed at A]
Partition Possible
16
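The contradiction above can be replayed in a toy simulation. This is a sketch of my own, not from the lecture; the class and variable names are illustrative assumptions. Both replicas answer every request (availability), but the partition blocks replication, so a read at B that begins after the write at A completes still returns the old value.

```python
# Toy version of the Gilbert-Lynch scenario: replicas A and B each answer
# every request immediately (available), but a partition blocks replication,
# so availability under partition costs consistency.

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {"x": 0}            # x=0 consistently stored at A and B

    def write(self, key, value):
        self.store[key] = value
        return "ok"                      # always answers: available

    def read(self, key):
        return self.store[key]           # always answers: available

A, B = Replica("A"), Replica("B")
partitioned = True                       # the network drops A<->B traffic

assert A.write("x", 1) == "ok"           # w(x=1) eventually returns (from A)
if not partitioned:                      # the update cannot cross the partition
    B.store.update(A.store)

assert B.read("x") == 0                  # r(x) at B is stale: not consistent
```

Making B refuse to answer during the partition would restore consistency at the cost of availability, which is exactly the tradeoff the proof exposes.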
CAP Interpretation Part 1
⚫ Cannot “choose” no partitions
⚫ 2-out-of-3 interpretation doesn’t make sense
⚫ Instead, availability OR consistency?

⚫ i.e., a fundamental tradeoff between availability and consistency


⚫ When designing a system, you must choose one or the other; both are not possible
simultaneously

17
CAP Interpretation Part 2
⚫ It is a theorem, with proof, that you understand!

⚫ Cannot “beat” CAP Theorem

⚫ Can engineer systems to make partitions extremely rare, and then just
take the rare hit to availability (or consistency)

18
Some Real Distributed Systems Relax Consistency Constraints …

Memcache at Facebook

19
Questions?

20
Node Failures and Correctness

21
Failure Model: Fail-Stop
Node Fails!

Nodes fail by crashing


A machine is either working correctly or it is doing nothing

22
Failure Model: Byzantine Failures
Node Fails

Node operates arbitrarily after a failure


(this includes not sending messages at all or sending different and wrong
messages to different servers or lying about a value)

Can be caused by
• Malicious attacks
• Software errors

23
Failures in Distributed Systems (DS)
⚫ What distinguishes DS from single-machine systems:
⚫ Some nodes might still be working correctly while others are experiencing failures
⚫ DS may continue to operate even if part of it is failing

⚫ A design goal for distributed systems:


⚫ “Correctly” operate even when failures occur

24
Correctness for Strong Consistency
⚫ Replicas act like a single machine

⚫ Specifically
⚫ If one node commits an update, no other replica rejects it
⚫ If one replica rejects it, no one commits the update

25
We will assume Fail-Stop model
⚫ Many consistency protocols assume fail-stop model

⚫ Reason: Byzantine failures are very hard to deal with and usually add
substantial performance overheads

⚫ However, Byzantine fault-tolerant solutions are important in certain contexts
⚫ Covered in the advanced course “CS 582: Distributed Systems”
⚫ Also covered in “CS 3812: Intro to Blockchain”

26
Primary-backup protocols

27
Primary-backup protocols
⚫ One special node (primary) orders requests

⚫ Route all updates through the primary node
⚫ The primary assigns an order to the updates
⚫ All replicas commit updates in the assigned order
⚫ Simple to implement

28
Two Types of Primary-backup Schemes
⚫ Asynchronous primary-backup protocol

⚫ Synchronous primary-backup protocol

29
Primary-backup protocol

[Figure, built up over three slides: a client above three nodes (Replica, Primary, Replica); the client sends a Write-request to the Primary]

• The primary sends an ACK to the client once it has performed the write locally, OR
• It waits for all replicas to first perform the update and then sends the ACK

32
Primary-backup protocol [asynchronous version]

[Figure, built up over four slides: Step 1: the client sends a Write-request to the Primary. Step 2: the Primary ACKs "write completed" to the client. Step 3: the Primary tells the replicas to update. Step 4: the replicas send Update ACKs back to the Primary]

36
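The four steps above can be sketched in a few lines of Python. This is a single-process sketch of my own; the class and method names are illustrative assumptions, not from the slides. The key property: the client is ACKed in step 2, before the backups see anything, so updates still pending at the primary are lost if it crashes.

```python
# Asynchronous primary-backup: ACK the client first, propagate later.

class Node:
    def __init__(self):
        self.store = {}

class AsyncPrimary(Node):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
        self.pending = []                   # updates not yet propagated

    def client_write(self, key, value):     # step 1: write-request arrives
        self.store[key] = value             # step 2: apply locally...
        self.pending.append((key, value))
        return "ACK"                        # ...and ACK the client right away

    def propagate(self):
        # steps 3/4: runs later, in the background; if the primary crashes
        # before this runs, the backups never learn about the update
        for key, value in self.pending:
            for b in self.backups:
                b.store[key] = value
        self.pending.clear()

backups = [Node(), Node()]
primary = AsyncPrimary(backups)
assert primary.client_write("x", 1) == "ACK"   # client unblocked fast...
assert backups[0].store == {}                  # ...before any backup is updated
primary.propagate()
assert backups[0].store == {"x": 1}
```

The fast ACK is exactly why this version has good client-perceived latency, and exactly why it cannot guarantee "Read Your Writes" under failures.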
Analyzing the Asynchronous Version
⚫ Performance (w/o failures):
⚫ The client does not need to spend any additional time waiting for the internals of
the system to do their work
⚫ The system is also more tolerant of network latency since fluctuations in internal
latency do not cause additional waiting on the client-side

⚫ What about correctness (with failures)?

37
What could go wrong if there are failures?
⚫ If the primary fails before the updates are sent to the backups,
⚫ Then updates may be lost

⚫ Correctness can get violated under failures


⚫ If one server commits (update), no one rejects it
⚫ If one rejects it, no one commits

⚫ From the client’s perspective


⚫ There are no guarantees that you can read back what you wrote if there are any
failures in the system – no “Read Your Writes” consistency!

38
Primary-backup protocol [synchronous version]

[Figure, built up over four slides: Step 1: the client sends a Write-request to the Primary. Step 2: the Primary tells the replicas to update. Step 3: the replicas apply the update and send Update ACKs back to the Primary. Step 4: the Primary ACKs "write completed" to the client]

42
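The synchronous flow can be sketched the same way (again my own structure and names, not the lecture's code): the primary pushes the update to every backup and only ACKs the client after all backups have it, so the client's extra waiting buys the guarantee that no ACKed write is missing from a replica.

```python
# Synchronous primary-backup: replicas update before the client is ACKed.

class Node:
    def __init__(self):
        self.store = {}
        self.alive = True

class SyncPrimary(Node):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def client_write(self, key, value):       # step 1: write-request arrives
        for b in self.backups:                # step 2: tell replicas to update
            if not b.alive:
                raise TimeoutError("backup unreachable; no ACK to client")
            b.store[key] = value              # step 3: replica applies, ACKs
        self.store[key] = value
        return "ACK write completed"          # step 4: only now ACK the client

backups = [Node(), Node()]
primary = SyncPrimary(backups)
primary.client_write("x", 1)
assert all(b.store == {"x": 1} for b in backups)   # replicas updated first

backups[1].alive = False                  # one replica crashes mid-protocol
try:
    primary.client_write("y", 2)
except TimeoutError:
    pass                                  # client never gets the ACK
assert backups[0].store.get("y") == 2     # but one replica already committed:
assert "y" not in backups[1].store        # replicas disagree, need rollback
```

The last two assertions reproduce the failure scenario discussed next: one replica committed, one did not, and the single-phase protocol has no way to roll back.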
Analyzing the Synchronous Version
⚫ Performance (w/o failures):
⚫ Clients have to wait for additional time – used for synchronizing replicas
⚫ Network delay fluctuations inside the system impact the client

⚫ Can this version guarantee correctness with failures?

43
What could go wrong if there are failures?

44
Failure Scenario

[Figure, built up over five slides: the client sends a Write-request to the Primary, which tells both replicas to update. One replica commits the write update and sends an Update ACK; the other replica crashes before updating. The Primary never ACKs the client, so the client assumes the write update failed]

Replicas do not agree. We need a way to roll back updates.

49
Two Phase Commit (2-PC)
⚫ Consists of two distinct phases

⚫ Used in several distributed systems

50
Two Phase Commit (2-PC)

51
Key idea
⚫ Allow the system to roll back updates on failures
⚫ By using two phases

⚫ This is in contrast to single-phase primary-based protocols
⚫ Where there is no step for rolling back an operation that has failed on some nodes and succeeded on other nodes

52
Terminology
⚫ Primary Node
⚫ Coordinator

⚫ Replicas
⚫ Cohort, worker, participant

⚫ Prepare phase (also voting phase OR commit-request phase)
⚫ Replicas become prepared

⚫ Commit phase
⚫ Transactions are committed across the system
53
2-PC

[Figure, built up over four slides: the Coordinator sends Prepare to Replica 1 … Replica N-1; each replica saves the update to disk and responds with Yes or No. If Yes arrives from all replicas within the timeout, the Coordinator sends Commit; the replicas commit the updates from disk to the store and ACK]

57
2-PC

[Figure: if any replica responds "No", or the timeout expires before all votes arrive, the Coordinator sends Abort to the replicas, which ACK]

58
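A minimal sketch of the two phases above, assuming in-process participants and using a dict as a stand-in for durable storage. All class and function names here are illustrative, not a real 2-PC library. Phase 1 collects votes after each participant saves the tentative update; phase 2 commits only on a unanimous Yes, otherwise it rolls the tentative updates back.

```python
# Two-phase commit: prepare (vote) phase, then commit-or-abort phase.

class Participant:
    def __init__(self, vote_yes=True):
        self.vote_yes = vote_yes
        self.log = {}                  # stand-in for the on-disk saved update
        self.store = {}                # the committed store

    def prepare(self, key, value):
        if not self.vote_yes:
            return "No"
        self.log[key] = value          # save tentative update "to disk"
        return "Yes"                   # now prepared

    def commit(self, key):
        self.store[key] = self.log.pop(key)   # move update from log to store

    def abort(self, key):
        self.log.pop(key, None)        # roll the tentative update back

def two_phase_commit(participants, key, value):
    votes = [p.prepare(key, value) for p in participants]   # phase 1
    if all(v == "Yes" for v in votes):                      # phase 2
        for p in participants:
            p.commit(key)
        return "committed"
    for p in participants:             # any "No": everyone rolls back
        p.abort(key)
    return "aborted"

ok = [Participant(), Participant()]
assert two_phase_commit(ok, "x", 1) == "committed"

mixed = [Participant(), Participant(vote_yes=False)]
assert two_phase_commit(mixed, "x", 1) == "aborted"
assert mixed[0].store == {}            # the Yes-voter rolled back: no disagreement
```

Note what the single-phase protocols lacked: because updates sit in the log until the unanimous vote, a "No" (or a timeout) can undo them everywhere, so no replica commits unless all do.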
Failures in 2-PC: Food for Thought
⚫ If a server votes yes, can it commit unilaterally before receiving commit
message?

⚫ If a server voted No, can it abort right away without waiting for an abort
message?

59
Failures in 2-PC
⚫ To deal with replica crashes
⚫ Each replica saves tentative updates into permanent storage, right before replying
Yes/No in the first phase
⚫ Retrievable after crash recovery

⚫ To deal with coordinator crashes
⚫ The coordinator logs all decisions and received/sent messages on disk
⚫ After recovery or new election ➙ a new coordinator takes over

60
Correctness and Performance of 2-PC
⚫ Correctness: All hosts that decide reach the same decision
⚫ No commit unless everyone says “yes”

⚫ Performance: if there are failures, 2PC might block
⚫ Failure of any process can result in non-progress
⚫ Doesn’t tolerate failures well: must wait for repair

61
2-PC Summary
⚫ Primary-backup schemes (1 Phase protocols)
⚫ Safety may be violated ➙ Can’t roll back updates

⚫ Two-Phase Commit
⚫ Allow for rolling back updates
⚫ Sensitive to coordinator failure ➙ blocking

62
Summing it up …
⚫ Replication is needed for fault tolerance and performance

⚫ But it introduces challenges
⚫ CAP theorem shows we cannot provide all three properties: strong consistency, availability, and partition tolerance

⚫ How to design a consistency protocol for a replicated system?
⚫ Primary-backup protocols: asynchronous and synchronous
⚫ Need more than one phase to provide consistency under failures

63
Advanced Topics on Consistency
⚫ Paxos

⚫ Raft

⚫ PBFT

⚫ Blockchain

64
Questions?
