Consensus and Paxos

The following slides are an edited version of slides by professors Michael Freedman,
Kyle Jamieson, and Wyatt Lloyd taught in Princeton’s COS418 course (CC license).
Outline
1. From primary-backup to viewstamped replication

2. Consensus

3. Paxos Overview

Review: Primary-Backup Replication
[Figure: two server replicas, each with a logging module and a state machine; clients send requests (e.g., shl) to the primary, and each replica's log holds add, jmp, mov, shl.]
• Nominate one replica as primary
– Clients send all requests to the primary
– Primary orders clients’ requests
From Two to Many Replicas
[Figure: the same setup with three server replicas, each with a logging module and state machine, each log holding add, jmp, mov, shl.]

• Primary-backup with many replicas
– Primary waits for acknowledgement from all backups
– Any update to the set of replicas requires updating the shared disk
What else can we do with more replicas?

• Viewstamped Replication:
– State Machine Replication for any number of replicas
– Replica group: Group of 2f + 1 replicas
• Protocol can tolerate f replica crashes

• Differences with primary-backup


– No shared disk (no reliable failure detection)
– Don’t need to wait for all replicas to reply
– Need more replicas to handle f failures (2f+1 vs f+1)
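The 2f + 1 vs. f + 1 sizing follows from quorum intersection; a minimal sketch (function names are ours, not part of the protocol):

```python
def group_size(f: int) -> int:
    """Replicas needed to tolerate f crash failures with a majority quorum."""
    return 2 * f + 1

def quorum_size(f: int) -> int:
    """Smallest majority of a group of 2f + 1 replicas."""
    return f + 1

# Any two quorums overlap: (f + 1) + (f + 1) = 2f + 2 > 2f + 1 replicas,
# so at least one replica is in both and carries the committed state.
for f in (1, 2, 3):
    assert 2 * quorum_size(f) > group_size(f)
```

With f = 1 this gives a group of 3 where any 2 replicas form a quorum.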

With multiple replicas, don’t need to wait for all…

• Viewstamped Replication:
– State Machine Replication for any number of replicas
– Replica group: Group of 2f + 1 replicas
• Protocol can tolerate f replica crashes

• Assumptions
1. Handles crash failures only: Replicas fail only by completely stopping
2. Unreliable network: Messages might be lost, duplicated, delayed, or delivered
out-of-order

Replica State
1. configuration: identities of all 2f + 1 replicas

2. In-memory log with clients’ requests in assigned order

⟨op1, args1⟩ ⟨op2, args2⟩ ⟨op3, args3⟩ ⟨op4, args4⟩
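A minimal sketch of these two state items (class and field names are ours):

```python
from dataclasses import dataclass, field

@dataclass
class ReplicaState:
    """Sketch of per-replica state: membership plus an ordered op log."""
    configuration: list                      # identities of all 2f + 1 replicas
    log: list = field(default_factory=list)  # [(op, args), ...] in assigned order

r = ReplicaState(configuration=["A", "B", "C"])
r.log.append(("op1", "args1"))
r.log.append(("op2", "args2"))
```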

Normal Operation (f = 1)

[Figure: message timeline for replicas A (primary), B, and C. The client sends Request to A; A sends Prepare to B and C; B and C reply PrepareOK; A executes and sends Reply to the client.]

1. Primary adds request to end of its log


2. Replicas add requests to their logs in primary’s log order
3. Primary waits for f PrepareOKs → request is committed
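The three steps above can be sketched for f = 1 (class and message names are ours, simplified from VR's actual protocol):

```python
# Sketch of the primary's commit rule for f = 1.
F = 1

class Primary:
    def __init__(self):
        self.log = []    # requests in the order the primary assigned
        self.acks = {}   # op_number -> set of backups that sent PrepareOK

    def on_request(self, op):
        self.log.append(op)                  # step 1: append to own log
        op_number = len(self.log) - 1
        self.acks[op_number] = set()
        return ("Prepare", op_number, op)    # step 2: sent to all backups

    def on_prepare_ok(self, op_number, backup):
        self.acks[op_number].add(backup)
        # step 3: committed once f PrepareOKs arrive -> in f + 1 = 2 logs
        return len(self.acks[op_number]) >= F

p = Primary()
p.on_request("shl")
assert p.on_prepare_ok(0, "B") is True   # one PrepareOK suffices when f = 1
```

Execution and the client Reply happen only after this commit point, matching the timeline above.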

Normal Operation: Key Points (f = 1)

[Figure: the same Request/Prepare/PrepareOK/Reply timeline as on the previous slide.]

• Protocol provides state machine replication

• On execute, primary knows the request is in f + 1 = 2 nodes’ logs
– Even if f = 1 node then crashes, ≥ 1 node retains the request in its log
Piggybacked Commits (f = 1)

[Figure: same timeline; the Prepare for the current request also carries a Commit for the previous request (“+Commit previous”).]

• Previous Request’s commit piggybacked on current Prepare


• No client Request after a timeout period?
– Primary sends Commit message to all backups

The Need For a View Change
• So far: Works for f failed backup replicas
• But what if the f failures include a failed primary?
– All clients’ requests go to the failed primary
– System halts despite merely f failures

Views
• Let different replicas assume role of primary over time
• System moves through a sequence of views
– View = (view number, primary id, backup id, ...)

[Figure: three successive views; a different replica serves as primary (P) in view #1, view #2, and view #3.]
Correctly Changing Views
• View changes happen locally at each replica
• Old primary executes requests in the old view, new primary
executes requests in the new view
• Want to ensure state machine replication
• So correctness condition: Executed requests
1. Survive in the new view
2. Retain the same order in the new view

How do replicas agree to move to a new view?

How do replicas agree on what was executed
(and in what order) in the old view?
Consensus

• Definition:

1. A general agreement about something


2. An idea or opinion that is shared by all the
people in a group
Consensus Used in Systems
Group of servers want to:

• Make sure all servers in group receive the same updates in the
same order as each other
• Maintain own lists (views) on who is a current member of the group,
and update lists when somebody leaves/fails
• Elect a leader in group, and inform everybody
• Ensure mutually exclusive (one process at a time only) access to a
critical resource like a file
Consensus

Given a set of processors, each with an initial value:

• Termination: All non-faulty processes eventually decide on a value

• Agreement: All processes that decide do so on the same value

• Validity: Value decided must have been proposed by some process


Safety vs. Liveness Properties
• Safety (bad things never happen)

• Liveness (good things eventually happen)


Paxos

• Safety (bad things never happen)
– Agreement: All processes that decide do so on the same value
– Validity: Value decided must have been proposed by some process

• Liveness (good things eventually happen)
– Termination: All non-faulty processes eventually decide on a value
Paxos’s Safety and Liveness
• Paxos is always safe

• Paxos is very often live (but not always, more later)

• Also true for Viewstamped Replication, Raft, and other similar protocols
Roles of a Process in Paxos

• Three conceptual roles


– Proposers propose values
– Acceptors accept values, where value is chosen if majority accept
– Learners learn the outcome (chosen value)

• In reality, a process can play any/all roles

Strawmen

• 3 proposers, 1 acceptor
– Acceptor accepts first value received
– No liveness with single failure

• 3 proposers, 3 acceptors
– Accept first value received, learners choose common value
known by majority
– But no such majority is guaranteed

Paxos
• Each acceptor accepts multiple proposals
– Hopefully one of multiple accepted proposals will have a majority vote
(and we determine that)
– If not, rinse and repeat (more on this)

• How do we select among multiple proposals?


– Ordering: proposal is tuple (proposal #, value) = (n, v)
– Proposal # strictly increasing, globally unique
– Globally unique?
• Trick: set low-order bits to proposer’s ID

Paxos: Well-behaved Run

[Figure: well-behaved run with one proposer and acceptors 1…n. The proposer sends <prepare, 1> to all acceptors; each replies <promise, 1>; the proposer sends <accept, (1, v1)>; acceptors reply <accepted, (1, v1)> and v1 is decided.]
Paxos Protocol Overview
• Proposers:
1. Choose a proposal number n
2. Ask acceptors if any accepted proposals with na < n
3. If existing proposal va returned, propose same value (n, va)
4. Otherwise, propose own value (n, v)
Note altruism: goal is to reach consensus, not “win”

• Acceptors try to accept value with highest proposal n


• Learners are passive and wait for the outcome

Paxos Phase 1
• Proposer:
– Choose proposal n, send <prepare, n> to acceptors

• Acceptors:
– If n > nh
• nh = n ← promise not to accept any new proposals n’ < n
• If no prior proposal accepted: reply < promise, n, Ø >
• Else: reply < promise, n, (na, va) >
– Else
• Reply < prepare-failed >
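The acceptor side of Phase 1 can be sketched as (class and reply shapes are ours):

```python
# Sketch of an acceptor's Phase-1 handler.
class Acceptor:
    def __init__(self):
        self.n_h = -1    # highest proposal number promised so far (nh)
        self.n_a = None  # number of the accepted proposal, if any (na)
        self.v_a = None  # accepted value, if any (va)

    def on_prepare(self, n):
        if n > self.n_h:
            self.n_h = n   # promise: reject any proposal numbered below n
            if self.n_a is None:
                return ("promise", n, None)          # no prior accepted value
            return ("promise", n, (self.n_a, self.v_a))
        return ("prepare-failed",)

acc = Acceptor()
assert acc.on_prepare(5) == ("promise", 5, None)
assert acc.on_prepare(3) == ("prepare-failed",)
```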
Paxos Phase 2
• Proposer:
– If receive promise from majority of acceptors,
• Determine va returned with highest na, if exists
• Send <accept, (n, va || v)> to acceptors

• Acceptors:
– Upon receiving (n, v), if n ≥ nh,
• Accept proposal and notify learner(s)
na = nh = n
va = v
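Phase 2 can be sketched as (names ours; `choose_value` implements the “va with highest na, else own v” rule, and the acceptor mirrors the n ≥ nh check):

```python
# Sketch of Phase 2. The proposer adopts the value from the
# highest-numbered promise, if any; otherwise it proposes its own value.
def choose_value(promises, own_value):
    """promises: a (na, va) pair or None from each of a majority of acceptors."""
    accepted = [p for p in promises if p is not None]
    if accepted:
        return max(accepted)[1]   # va paired with the highest na
    return own_value

class Acceptor:
    """Acceptor's Phase-2 state and check (same variables as Phase 1)."""
    def __init__(self, n_h=-1):
        self.n_h, self.n_a, self.v_a = n_h, None, None

    def on_accept(self, n, v):
        if n >= self.n_h:          # not promised to any higher proposal
            self.n_a = self.n_h = n
            self.v_a = v
            return True            # accepted; would notify learner(s) here
        return False

assert choose_value([None, (3, "x"), (5, "y")], "z") == "y"
assert choose_value([None, None], "z") == "z"
```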
Paxos Phase 3

• Learners need to know which value chosen


• Approach #1
– Each acceptor notifies all learners
– More expensive
• Approach #2
– Elect a “distinguished learner”
– Acceptors notify elected learner, which informs others
– Failure-prone

Paxos is Safe
• Intuition: if proposal with value v chosen, then every higher-
numbered proposal issued by any proposer has value v.

[Figure: once a majority of acceptors accept (n, v), v is chosen; the next prepare request, with proposal n+1, reaches a quorum that reports v.]
Often, but not always, live
Process 0 completes phase 1 with proposal n0
Process 1 starts and completes phase 1 with proposal n1 > n0
Process 0 performs phase 2; acceptors reject (they promised n1)
Process 0 restarts and completes phase 1 with proposal n2 > n1
Process 1 performs phase 2; acceptors reject (they promised n2)
… can go on indefinitely …
Paxos Summary

• Described for a single round of consensus


• Proposer, Acceptors, Learners
– Often implemented with nodes playing all roles
• Always safe: Quorum intersection
• Very often live
• Acceptors accept multiple values
– But only one value is ultimately chosen
• Once a value is accepted by a majority it is chosen
Flavors of Paxos
• Terminology is a mess
• Paxos loosely and confusingly defined…

• We’ll stick with


– Basic Paxos
– Multi-Paxos
Flavors of Paxos: Basic Paxos
• Run the full protocol each time
– e.g., for each slot in the command log

• Takes 2 rounds until a value is chosen


Flavors of Paxos: Multi-Paxos
• Elect a leader and have it run the 2nd phase directly
– e.g., for each slot in the command log
– Leader election uses Basic Paxos

• Takes 1 round until a value is chosen


– Faster than Basic Paxos

• Used extensively in practice!


– Raft is similar to Multi-Paxos
