Consensus and Paxos
The following slides are an edited version of slides by Professors Michael Freedman,
Kyle Jamieson, and Wyatt Lloyd from Princeton’s COS418 course (CC license).
Outline
1. From primary-backup to viewstamped replication
2. Consensus
3. Paxos Overview
Review: Primary-Backup Replication
[Figure: primary-backup replication; labels: Clients, shl]
With multiple replicas, don’t need to wait for all…
• Viewstamped Replication:
– State Machine Replication for any number of replicas
– Replica group: Group of 2f + 1 replicas
• Protocol can tolerate f replica crashes
• Assumptions
1. Handles crash failures only: Replicas fail only by completely stopping
2. Unreliable network: Messages might be lost, duplicated, delayed, or delivered
out-of-order
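The 2f + 1 sizing can be checked with a little arithmetic: in a group of 2f + 1 replicas, any f + 1 replicas form a majority, and any two majorities share at least one replica. The sketch below is only an illustration; the helper names are hypothetical and not part of Viewstamped Replication.

```python
from itertools import combinations

def group_size(f: int) -> int:
    """Replicas needed to tolerate f crash failures."""
    return 2 * f + 1

def quorum_size(f: int) -> int:
    """Smallest majority of a group of 2f + 1 replicas."""
    return f + 1

def quorums_always_intersect(f: int) -> bool:
    """Brute-force check that every pair of majorities shares a replica."""
    replicas = range(group_size(f))
    quorums = [set(q) for q in combinations(replicas, quorum_size(f))]
    return all(bool(a & b) for a in quorums for b in quorums)

for f in (1, 2, 3):
    print(f, group_size(f), quorum_size(f), quorums_always_intersect(f))
```

Because majorities overlap, the group can make progress with f replicas crashed without waiting for all replicas.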
Replica State
1. configuration: identities of all 2f + 1 replicas
Normal Operation (f = 1)
Normal Operation: Key Points (f = 1)
Piggybacked Commits (f = 1)
The Need For a View Change
• So far: Works for f failed backup replicas
• But what if the f failures include a failed primary?
– All clients’ requests go to the failed primary
– System halts despite merely f failures
Views
• Let different replicas assume role of primary over time
• System moves through a sequence of views
– View = (view number, primary id, backup id, ...)
[Figure: the primary (P) shown in View #1, View #2, and View #3]
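A view can be modeled as a small record, and a view change as a function that produces the next record. The sketch below assumes a round-robin policy in which the primary is picked by view number modulo the group size; that policy, the `View` fields, and `next_view` are assumptions made for illustration and are not stated on the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class View:
    number: int      # view number
    primary: int     # id of the primary replica
    backups: tuple   # ids of the backup replicas

def next_view(view: View, replicas: tuple) -> View:
    """Assumed policy: the primary rotates round-robin by view number."""
    number = view.number + 1
    primary = replicas[number % len(replicas)]
    backups = tuple(r for r in replicas if r != primary)
    return View(number, primary, backups)

replicas = (0, 1, 2)
v1 = View(1, 1, (0, 2))
v2 = next_view(v1, replicas)
v3 = next_view(v2, replicas)
print(v1, v2, v3, sep="\n")
```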
Correctly Changing Views
• View changes happen locally at each replica
• Old primary executes requests in the old view, new primary
executes requests in the new view
• Want to ensure state machine replication
• So correctness condition: Executed requests
1. Survive in the new view
2. Retain the same order in the new view
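The two conditions can be phrased as a check on logs. The sketch below makes a simplifying assumption: requests executed in the old view must appear as a prefix of the new view's log, which is a stronger requirement than merely keeping their relative order. The function name is hypothetical.

```python
def preserves_executed(executed_old: list, log_new: list) -> bool:
    """True if every request executed in the old view survives in the
    new view's log, in the same order (checked here as a prefix)."""
    return log_new[:len(executed_old)] == executed_old

executed = ["a", "b"]
print(preserves_executed(executed, ["a", "b", "c"]))  # survives, same order
print(preserves_executed(executed, ["b", "a", "c"]))  # order changed
print(preserves_executed(executed, ["a"]))            # request "b" lost
```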
How do replicas agree to move to a new view?
• Definition:
• Make sure all servers in group receive the same updates in the
same order as each other
• Maintain own lists (views) on who is a current member of the group,
and update lists when somebody leaves/fails
• Elect a leader in group, and inform everybody
• Ensure mutually exclusive (one process at a time only) access to a
critical resource like a file
Consensus
Strawmen
• 3 proposers, 1 acceptor
– Acceptor accepts first value received
– No liveness with a single failure (of the acceptor)
• 3 proposers, 3 acceptors
– Accept first value received, learners choose common value
known by majority
– But no such majority is guaranteed
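The second strawman can be illustrated with a split vote. In the sketch below, each acceptor keeps the first value it receives, and a learner looks for a value held by a majority. The arrival orders are made-up inputs and the function names are hypothetical.

```python
from collections import Counter

def first_value_wins(arrivals):
    """Each acceptor accepts the first value it receives."""
    return [order[0] for order in arrivals]

def chosen_by_majority(accepted):
    """Return the value held by a majority of acceptors, or None."""
    value, votes = Counter(accepted).most_common(1)[0]
    return value if votes > len(accepted) // 2 else None

# One inner list per acceptor: the order in which proposals reach it.
split = [["v1", "v2", "v3"], ["v2", "v3", "v1"], ["v3", "v1", "v2"]]
lucky = [["v1", "v2", "v3"], ["v1", "v3", "v2"], ["v2", "v1", "v3"]]

print(chosen_by_majority(first_value_wins(split)))  # no majority
print(chosen_by_majority(first_value_wins(lucky)))  # a majority exists
```

With the `split` ordering every acceptor holds a different value, and since acceptors never change their minds the system is stuck.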
Paxos
• Each acceptor accepts multiple proposals
– Hopefully one of multiple accepted proposals will have a majority vote
(and we determine that)
– If not, rinse and repeat (more on this)
Paxos: Well-behaved Run
[Figure: well-behaved run among processes 1, 2, …, n. Messages in order:
<prepare, 1>, <promise, 1>, <accept, (1, v1)>, <accepted, (1, v1)>; then decide v1]
Paxos Protocol Overview
• Proposers:
1. Choose a proposal number n
2. Ask acceptors whether they have accepted any proposal with number na < n
3. If an accepted proposal with value va is returned, propose the same value: (n, va)
4. Otherwise, propose own value: (n, v)
Note altruism: goal is to reach consensus, not “win”
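Steps 3 and 4 amount to a small selection rule on the proposer. The sketch below assumes each acceptor's reply is either `None` (nothing accepted yet) or a pair `(na, va)`, and that when several accepted proposals come back the proposer adopts the one with the highest number. The function name and reply encoding are hypothetical.

```python
from typing import List, Optional, Tuple

Accepted = Optional[Tuple[int, str]]  # (na, va), or None if nothing accepted

def value_to_propose(replies: List[Accepted], own_value: str) -> str:
    """Adopt the value of the highest-numbered accepted proposal, if any;
    otherwise the proposer is free to use its own value."""
    prior = [r for r in replies if r is not None]
    if prior:
        return max(prior, key=lambda r: r[0])[1]
    return own_value

print(value_to_propose([None, None], "x"))
print(value_to_propose([None, (1, "a"), (3, "b")], "x"))
```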
Paxos Phase 1
• Proposer:
  – Choose proposal n, send <prepare, n> to acceptors
• Acceptors:
  – If n > nh
    • nh = n ← promise not to accept any new proposals n’ < n
    • If no prior proposal accepted
      • Reply <promise, n, Ø>
    • Else
      • Reply <promise, n, (na, va)>
  – Else
    • Reply <prepare-failed>
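The acceptor side of Phase 1 can be sketched as a single handler. The sketch assumes proposal numbers are positive integers (so nh can start at 0) and encodes messages as tuples; the class and method names are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.n_h = 0          # highest proposal number promised so far
        self.accepted = None  # (na, va) of the accepted proposal, or None

    def on_prepare(self, n):
        """Handle <prepare, n> as described for Phase 1."""
        if n > self.n_h:
            # Promise not to accept any new proposal numbered below n.
            self.n_h = n
            # self.accepted is None when no prior proposal was accepted.
            return ("promise", n, self.accepted)
        return ("prepare-failed",)

a = Acceptor()
print(a.on_prepare(1))  # fresh acceptor promises
print(a.on_prepare(1))  # same number again is not higher, so it fails
```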
Paxos Phase 2
• Proposer:
  – If receive promise from majority of acceptors,
    • Determine va returned with highest na, if exists
    • Send <accept, (n, va)> to acceptors if such a va exists;
      otherwise send <accept, (n, v)> with own value v
• Acceptors:
  – Upon receiving (n, v), if n ≥ nh,
    • Accept proposal and notify learner(s)
      na = nh = n
      va = v
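The acceptor side of Phase 2 is equally short. The sketch below assumes integer proposal numbers and tuple-encoded messages; the `accept-failed` reply is a hypothetical name, since the slides do not name the rejection message.

```python
class Acceptor:
    def __init__(self, n_h=0):
        self.n_h = n_h   # highest proposal number seen or promised
        self.n_a = None  # number of the accepted proposal, if any
        self.v_a = None  # value of the accepted proposal, if any

    def on_accept(self, n, v):
        """Handle <accept, (n, v)> as described for Phase 2."""
        if n >= self.n_h:
            self.n_a = self.n_h = n
            self.v_a = v
            return ("accepted", n, v)   # also what learners are told
        return ("accept-failed",)       # hypothetical rejection message

a = Acceptor(n_h=2)         # already promised proposal 2 in Phase 1
print(a.on_accept(1, "x"))  # stale proposal is rejected
print(a.on_accept(2, "y"))  # the promised proposal is accepted
```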
Paxos Phase 3
Paxos is Safe
• Intuition: if proposal with value v chosen, then every higher-
numbered proposal issued by any proposer has value v.
[Figure: a majority of acceptors accept (n, v), so v is chosen; the next prepare
request arrives with proposal n+1]
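The intuition can be exercised with a tiny single-process simulation: once a value is chosen with proposal 1, a later proposer using proposal 2 and a different value of its own ends up proposing the chosen value. This is a sketch under simplifying assumptions (no message loss, all acceptors reachable, integer proposal numbers); all names are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.n_h = 0          # highest proposal number seen
        self.accepted = None  # (na, va), or None

    def prepare(self, n):
        if n > self.n_h:
            self.n_h = n
            return ("promise", self.accepted)
        return None

    def accept(self, n, v):
        if n >= self.n_h:
            self.n_h = n
            self.accepted = (n, v)
            return True
        return False

def propose(acceptors, n, own_value):
    """Run Phase 1 and Phase 2 once; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [r for r in replies if r is not None]
    if len(promises) < majority:
        return None
    prior = [acc for _, acc in promises if acc is not None]
    # Adopt the value of the highest-numbered accepted proposal, if any.
    value = max(prior, key=lambda p: p[0])[1] if prior else own_value
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None

acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "v")   # "v" is chosen
second = propose(acceptors, 2, "w")  # proposer wanted "w"
print(first, second)
```

The second proposer learns about (1, "v") in its promises and so carries "v" forward instead of "w".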
Often, but not always, live
1. Process 0: Completes phase 1 with proposal n0
2. Process 1: Starts and completes phase 1 with proposal n1 > n0
3. Process 0: Performs phase 2, acceptors reject
4. Process 0: Restarts and completes phase 1 with proposal n2 > n1
5. Process 1: Performs phase 2, acceptors reject
… can go on indefinitely …
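This interleaving can be reproduced in a small simulation. The sketch below assumes two proposers that strictly alternate, integer proposal numbers starting at 1, and no message loss; the acceptor tracks only the highest number it has seen. The names are hypothetical.

```python
class Acceptor:
    """Minimal acceptor: tracks only the highest proposal number seen."""
    def __init__(self):
        self.n_h = 0

    def prepare(self, n):
        if n > self.n_h:
            self.n_h = n
            return True
        return False

    def accept(self, n):
        if n >= self.n_h:
            self.n_h = n
            return True
        return False

def duel(rounds, n_acceptors=3):
    """Alternate two proposers; return (chosen proposal number, rejections)."""
    acceptors = [Acceptor() for _ in range(n_acceptors)]
    majority = n_acceptors // 2 + 1
    pending = None   # proposal that finished phase 1 but not yet phase 2
    rejections = 0
    for n in range(1, rounds + 1):
        # One proposer completes phase 1 with a fresh, higher number n ...
        promises = sum(a.prepare(n) for a in acceptors)
        # ... before its rival gets to run phase 2 with the older number.
        if pending is not None:
            acks = sum(a.accept(pending) for a in acceptors)
            if acks >= majority:
                return pending, rejections
            rejections += 1
        if promises >= majority:
            pending = n
    return None, rejections

print(duel(4))
print(duel(100))
```

Nothing is ever chosen under this schedule, no matter how many rounds are run, which is why Paxos is often but not always live.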
Paxos Summary