University of Pennsylvania

CIS 5050
Software Systems

Linh Thi Xuan Phan

Department of Computer and Information Science


University of Pennsylvania

Lecture 14: State-machine replication


April 2, 2024

©2016-2024 Linh Thi Xuan Phan


Plan for today
• Fault models NEXT

– Crash faults
– Rational faults
– Byzantine faults
• State-machine replication
– Consensus protocols
– FLP impossibility
• Paxos
– Intuition
– Algorithm

Faults and failures
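[Figure: three scenarios after "Set X:=5", each followed by the query "What is X?". Correct: the answer is X=5. Fault (masked): a component is faulty, but the answer is still X=5. Faults causing failure: the answer is X=3.]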

• Terminology:
– Fault: Some component is not working correctly
– Failure: System as a whole is not working correctly
Faults in distributed systems
• What could possibly go wrong?
– Node loses power
– Hard disk fails
– Administrator accidentally erases data
– Administrator configures node incorrectly
– Software bug triggers
– Network overloaded, drops lots of packets
– Hacker breaks into some of the nodes
– Disgruntled employee manipulates node
– Fire breaks out in data center where node resides
– Police confiscates node because of illegal activity
– ...
– ...

Common misconceptions about faults
• "Faults are rare exceptions"
– NO! At scale, faults are occurring all the time
– Stopping the system while handling the fault is NOT an option:
the system needs to continue despite the fault

• "Faulty machines always stop/crash"


– NO! There are many types of faults with different effects
– If your system is designed to handle only crash faults and another
type of fault occurs, things can become very bad

Types of faults
• Crash faults
– Node simply stops
– Examples: OS crash, power loss

• Rational behavior
– Owner manipulates node to increase profit
– Example: Traffic attraction attack (see next slide)

• Byzantine faults
– Arbitrary - faulty node could do anything
(stop, tamper with data, tell lies, attack
other nodes, send spam, spy on user...)
– Example: Node compromised by a hacker,
data corruption, hardware defect...

Goldberg et al., "Rationality and Traffic Attraction: Incentives for honestly announcing paths in BGP", SIGCOMM 2010
Rational fault example
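[Figure: Alice's provider asks "Who knows how to get to YouTube? I need connectivity!" One neighboring network answers "I have a good route", another answers "I have an okay route". The links are annotated with dollar amounts.]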
• Example: Interdomain routing with BGP
– Control over the system is distributed
– Alice's provider can choose between several routes to the same destination
Rational fault example

[Figure: the network whose route was not chosen thinks "I wish my route had been chosen" and announces "I have a GREAT route to YouTube" to Alice's provider.]

• Networks have an incentive to make their routes
appear better than they are
Some examples of Byzantine faults

http://status.aws.amazon.com/s3-20080720.html
Source: http://www.justice.gov/criminal/cybercrime/duronioIndict.htm
Some examples of Byzantine faults
Disgruntled UBS PaineWebber Employee Charged with Allegedly Unleashing “Logic Bomb” on
Company Computers

NEWARK - A disgruntled computer systems administrator for UBS PaineWebber was charged today with
using a “logic bomb” to cause more than $3 million in damage to the company’s computer network, and with
securities fraud for his failed plan to drive down the company’s stock with activation of the logic bomb, U.S.
Attorney Christopher J. Christie announced. Roger Duronio, 60, of Bogota, N.J., was charged today in a
two-count Indictment returned by a federal grand jury, according to Assistant U.S. Attorney William Devaney.
The Indictment alleges that Duronio, who worked at PaineWebber’s offices in Weehawken, N.J., planted the
logic bomb in some 1,000 of PaineWebber’s approximately 1,500 networked computers in branch offices
around the country. Duronio, who repeatedly expressed dissatisfaction with his salary and bonuses at
PaineWebber, resigned from the company on Feb. 22, 2002. The logic bomb Duronio allegedly planted was
activated on March 4, 2002. In anticipation that the stock price of UBS PaineWebber’s parent company,
UBS, A.G., would decline in response to damage caused by the logic bomb, Duronio also purchased more
than $21,000 of “put option” contracts for UBS, A.G.’s stock, according to the charging document. A put
option is a type of security that increases in value when the stock price drops. Market conditions at the time
suggest there was no such impact on the UBS, A.G. stock price. [...] The Indictment alleges that, from about
November 2001 to February, Duronio constructed the logic bomb computer program. On March 4, as
planned, Duronio’s program activated and began deleting files on over 1,000 of UBS PaineWebber’s
computers. It cost PaineWebber more than $3 million to assess and repair the damage, according to the
Indictment. As one of the company’s computer systems administrators, Duronio had responsibility for, and
access to, the entire UBS PaineWebber computer network, according to the Indictment. He also had access
to the network from his home computer via secure Internet access. [...]

Correlated faults
• A single problem can cause many faults
– Example: Overloaded machine crashes, increases load on other
machines → domino effect
– Example: Bug is triggered in a program that is used on lots of
machines
– Example: Hacker manages to break into many computers due to a
shared vulnerability
– Example: Machines may be connected to the same power grid,
cooled by the same A/C, managed by the same admin
– ...

• Why is this problematic?

Recap: Faults and failures
• Faults happen all the time
– Hardware malfunction, software bug, manipulation, hacker break-ins,
misconfiguration, ...
– NOT a rare occurrence at scale - must design system to handle
them

• NOT all faults are independent crash faults


– Faults can be correlated
– Rational and Byzantine faults are real

• Three common fault models:


– Crash fault model: Faulty machines simply stop
– Rational model: Machines manipulated by selfish owners
– Byzantine fault model: Faulty machines could do anything

Plan for today
• Fault models
– Crash faults
– Rational faults
– Byzantine faults
• State-machine replication NEXT

– Consensus protocols
– FLP impossibility
• Paxos
– Intuition
– Algorithm

Why should we care?
• Some services are so important that a failure or
downtime would be a disaster
– Examples: Google's central synchronization service,
Yahoo's ZooKeeper service, ...

• For such a service, even the best individual
machine may not be reliable enough!
– Idea: Multiple machines implement the service collectively
– Result: Service is available as long as a certain fraction of the
machines are working

Goal: Replicated service

• How does this work?


– Client sends its request to each of the machines
– The machines coordinate and each return a result
– Client chooses one of the results, e.g., the one that is returned by
the largest number of machines
– If a small fraction of the machines returns the wrong result, or no
result at all, they are 'outvoted' by the other machines
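
The client-side voting step can be sketched in a few lines of Python. This is only an illustration: the name choose_result, the use of None for a machine that returned nothing, and the strict-majority threshold (stricter than "largest number of machines") are assumptions, not part of the lecture.

```python
from collections import Counter


def choose_result(replies, n_replicas):
    """Return the value reported by a strict majority of replicas, else None.

    replies: one entry per replica; None means that replica did not answer.
    n_replicas: total number of replicas the request was sent to.
    """
    answered = [r for r in replies if r is not None]
    if not answered:
        return None
    value, votes = Counter(answered).most_common(1)[0]
    # Assumption: require more than half of ALL replicas, so that a few
    # wrong or missing answers are outvoted by the rest.
    return value if votes > n_replicas // 2 else None


# One wrong answer and one missing answer are outvoted by three good ones.
result = choose_result([5, None, 5, 3, 5], n_replicas=5)
```

With this threshold, the client gets no answer at all when too many machines are faulty, which seems preferable to silently returning a wrong one.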
Challenges
• Faults must not be correlated
– Otherwise, all machines may fail at the same time
– Challenges: Bugs, power failures, misconfiguration, ...
• Each of the machines must process the requests in
the same way
– Otherwise, their state will diverge → Not obvious what the 'correct'
result should be
– Idea: Machines can implement a deterministic state machine!
• All machines must process the requests in the
same order
– ... even if the network reorders them or delays some of them!
– Otherwise, this can also cause the state to diverge
– Idea: Need consensus on the order in which to process
Deterministic state machine

1,3,7,2 1,3,7,2

State: ABC State: ABC

Foo, bar Foo, bar

• What does this mean?


– IF two instances of the program start in the same state, and
– IF both are given the same sequence of inputs,
– THEN both instances produce the same sequence of outputs
• Is this the case for real software?
– What could be possible sources of nondeterminism?
– Can something be done about this?
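
A toy example may help make the definition concrete. The sketch below assumes a made-up state machine whose state is a single integer and whose requests are ("add", k) or ("mul", k); the names step and run are hypothetical. It shows two replicas agreeing when they see the same input sequence, and diverging when the order differs.

```python
def step(state, request):
    """Pure transition function: (state, request) -> (new_state, output)."""
    op, amount = request
    if op == "add":
        new_state = state + amount
    elif op == "mul":
        new_state = state * amount
    else:
        raise ValueError(f"unknown operation {op!r}")
    # The output of each request is simply the new state.
    return new_state, new_state


def run(initial_state, requests):
    """Feed the requests to the machine in order; return (final_state, outputs)."""
    state, outputs = initial_state, []
    for request in requests:
        state, out = step(state, request)
        outputs.append(out)
    return state, outputs


requests = [("add", 3), ("mul", 2), ("add", 1)]
replica_a = run(0, requests)
replica_b = run(0, requests)
# Same requests, but delivered in a different order.
reordered = run(0, [("mul", 2), ("add", 3), ("add", 1)])
```

Because step depends only on its arguments (no clock, random numbers, or thread scheduling), replica_a and replica_b are identical, while the reordered run ends in a different state.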
Consensus
• Intuition: Each process may propose a value, and
then the processes agree on which value they want
to use

• Formally, a solution must satisfy the following:


– Termination: Every correct process eventually decides
– Validity: If all processes propose the same value v, then every
correct process decides v
– Integrity: Every correct process decides at most one value, and it
can only decide values that have been proposed
– Agreement: If some correct process decides v, then every other
correct process also decides v
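
The four properties can be checked mechanically on a recorded run. The following is a rough sketch under stated assumptions: the function name check_consensus and the trace format are invented for illustration, and "eventually" is approximated by "before the end of the recorded trace".

```python
def check_consensus(proposals, decisions, correct):
    """Check the four consensus properties on a finished run.

    proposals: dict process -> proposed value (all processes)
    decisions: dict process -> list of values that process decided, in order
    correct:   set of processes that never failed
    """
    proposed = set(proposals.values())
    decided = {p: decisions.get(p, []) for p in correct}
    all_values = {v for values in decided.values() for v in values}

    return {
        # Every correct process decided something (by the end of the trace).
        "termination": all(len(values) >= 1 for values in decided.values()),
        # If everyone proposed the same v, nothing else may be decided.
        "validity": len(proposed) != 1 or all_values <= proposed,
        # At most one decision per process, and only proposed values.
        "integrity": all(len(values) <= 1 for values in decided.values())
        and all_values <= proposed,
        # All correct processes that decided chose the same value.
        "agreement": len(all_values) <= 1,
    }


good = check_consensus(
    proposals={"p1": 0, "p2": 1, "p3": 1},
    decisions={"p1": [1], "p2": [1]},
    correct={"p1", "p2"},   # p3 crashed before deciding
)
split = check_consensus(
    proposals={"p1": 0, "p2": 1},
    decisions={"p1": [0], "p2": [1]},
    correct={"p1", "p2"},
)
```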

FLP: Consensus is "impossible"!
• No deterministic asynchronous algorithm for agreeing on a
one-bit value can guarantee that it will terminate if even
a single process may crash
– Even if no crash faults actually occur

• What now?
– Change the problem statement: Randomized algorithms,
approximate agreement, k-set agreement, ...
– Change the assumptions: Assume bounds on message delays, or
that we have a reliable oracle (failure detector) that tells us when a
node crashed
– Paxos: Guarantees safety but not liveness (termination)

M. Fischer, N. Lynch, M. Paterson, "Impossibility of distributed consensus with one faulty process",
Journal of the ACM, April 1985, 32(2):374-382. Nancy Lynch received the ACM Knuth Prize in 2007!
Plan for today
• Fault models
– Crash faults
– Rational faults
– Byzantine faults
• State-machine replication
– Consensus protocols
– FLP impossibility
• Paxos NEXT

– Intuition
– Algorithm

What is Paxos?

Παξοί (Island of Paxos)

An unusual paper...

The Paxos algorithm

...
37. Add 7 to X
38. Read Y
39. (nop)
40. Z:=X+Y
...

• Scenario:
– There is a replicated append-only log ("ledgers" in the paper)
– The instances of this log are kept consistent by the protocol
– We can use this log to record the sequence of operations
→ All nodes will process them in the same order
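
To illustrate why an agreed-upon log gives every node the same order, here is a small sketch that replays entries 37 to 40 from the slide. The Replica class, the tuple encoding of the operations, and the initial values X=1, Y=2, Z=0 are all assumptions made for this example. A replica applies an entry only once every earlier entry is known.

```python
class Replica:
    def __init__(self, first_slot, variables):
        self.log = {}                 # slot number -> operation
        self.next_slot = first_slot   # first slot that has not been applied yet
        self.vars = dict(variables)
        self.outputs = []

    def learn(self, slot, op):
        """Record a chosen log entry, then apply every entry that is now in order."""
        self.log[slot] = op
        while self.next_slot in self.log:
            self._apply(self.log[self.next_slot])
            self.next_slot += 1

    def _apply(self, op):
        kind = op[0]
        if kind == "add":
            _, var, amount = op
            self.vars[var] += amount
        elif kind == "read":
            self.outputs.append(self.vars[op[1]])
        elif kind == "sum":
            _, target, a, b = op
            self.vars[target] = self.vars[a] + self.vars[b]
        elif kind != "nop":
            raise ValueError(f"unknown operation {kind!r}")


entries = {
    37: ("add", "X", 7),        # 37. Add 7 to X
    38: ("read", "Y"),          # 38. Read Y
    39: ("nop",),               # 39. (nop)
    40: ("sum", "Z", "X", "Y"), # 40. Z:=X+Y
}
initial = {"X": 1, "Y": 2, "Z": 0}  # hypothetical starting state

r1, r2 = Replica(37, initial), Replica(37, initial)
for slot in (37, 38, 39, 40):       # entries arrive in order
    r1.learn(slot, entries[slot])
for slot in (40, 38, 37, 39):       # entries arrive out of order
    r2.learn(slot, entries[slot])
```

Both replicas end in the same state even though the network delivered the entries to r2 in a different order, because the slot numbers, not the arrival times, determine the processing order.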
The Paxos algorithm
• Safety
– Only a value that has been proposed may be chosen
– Only a single value is chosen
– A node never learns that a value has been chosen unless
it actually has been chosen

• Liveness
– Some proposed value is eventually chosen
– A node can eventually learn the chosen value

• Paxos guarantees safety but not liveness

The Paxos algorithm
• Simple strawman solution (not Paxos):
– There is a single node (let's call it the "acceptor") that decides which
entries should go into the log
– Other nodes can send messages to the acceptor to propose a new
entry they want to add
– The acceptor accepts proposals in the order in which they are
received
– When the acceptor adds a new entry, it sends a message to the
other nodes, so they can learn what the entry was

• What is wrong with this solution?

Model
• Network:
– May lose messages (messenger leaves forever)
– May duplicate messages
– Asynchronous (messages can be delayed arbitrarily)
– But: No message corruption

• Nodes:
– Can fail by crashing (legislator leaves the Chamber)
– No central clock (hourglass timers)
– But: Have some persistent memory (ledgers)
– But: Strictly follow the protocol - no lying, data corruption...

Phase 1: Prepare
• Suppose a node A wants to propose a value X for an
entry e:
– A chooses a new proposal number n
– A sends PREPARE_e(n) to a majority ("quorum") of the other nodes

• Intuition: PREPARE_e(n) means


– May I make a proposal for entry e with proposal number n?
– If so, can you suggest a value I should use?

– Note: Fairness is not a goal; A is happy if a different value ends up being chosen

• The algorithm is run for a specific entry


– For clarity, we will omit the entry (subscript e) in the messages
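
The slides do not say how A picks a "new" proposal number. One common scheme, shown here purely as an assumption, is to interleave the number space so that node i only ever uses numbers congruent to i modulo the number of nodes. The class name ProposalNumbers and its parameters are hypothetical.

```python
class ProposalNumbers:
    """Generates proposal numbers that are unique per node and always increase."""

    def __init__(self, node_id, num_nodes):
        assert 0 <= node_id < num_nodes
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.round = 0

    def next(self, seen=0):
        """Return a fresh number larger than `seen` and than anything issued before.

        seen: the highest proposal number this node has observed from others.
        """
        self.round = max(self.round, seen // self.num_nodes) + 1
        return self.round * self.num_nodes + self.node_id


a = ProposalNumbers(node_id=0, num_nodes=3)
b = ProposalNumbers(node_id=1, num_nodes=3)
n1 = a.next()          # first proposal by node 0
n2 = b.next()          # first proposal by node 1
n3 = a.next(seen=n2)   # node 0 retries after seeing node 1's number
```

Two different nodes can never produce the same number under this scheme, since the remainder modulo num_nodes identifies the proposer.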

Phase 1: Prepare
• If a node B receives PREPARE(n) from A:
– If B has already acknowledged a PREPARE(n’) with n’>n for this
entry, then it does nothing.
– If B has previously accepted any proposals for this entry, it responds
with ACK(n, n', X'), where n' is the highest proposal number it has
accepted for this entry, and X' is the corresponding value.
– Otherwise, B responds with ACK(n, -, -).

• Intuition: An ACK means


– Yes, go ahead and make your proposal
– If a value X’ is given: you should choose value X’
– Otherwise: any value is fine with me (if X’ is not given)
– I won't accept any further proposals with proposal numbers < n
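
B's handling of PREPARE can be sketched as follows. This is a minimal illustration for a single log entry: the class name Acceptor, the field names, and the tuple format of the ACK are assumptions, and None stands for the "-" placeholders in ACK(n, -, -).

```python
class Acceptor:
    def __init__(self):
        self.promised = None   # highest n for which a PREPARE was acknowledged
        self.accepted = None   # (n', X') of the highest accepted proposal, if any

    def on_prepare(self, n):
        """Return the ACK to send back, or None if B does nothing."""
        if self.promised is not None and self.promised > n:
            return None                        # already acknowledged a higher n'
        self.promised = n                      # promise: nothing below n from now on
        if self.accepted is not None:
            n_acc, value = self.accepted
            return ("ACK", n, n_acc, value)    # "you should choose value X'"
        return ("ACK", n, None, None)          # "any value is fine with me"


b = Acceptor()
first = b.on_prepare(5)     # nothing accepted yet
stale = b.on_prepare(3)     # lower than the promise for 5
b.accepted = (5, "X")       # pretend B has since accepted proposal 5 with value X
later = b.on_prepare(8)     # B reports what it has accepted
```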

Phase 2: Accept
• If A receives ACKs from a majority of the other
nodes, it issues ACCEPT(n, X*)
– X* is the value X' from the ACK with the highest accepted proposal number n', or
the original X if none of the ACKs had a value
• If B receives ACCEPT(n, X*)
– If B has already acknowledged a PREPARE(n') with n' > n, then
B does nothing
– Otherwise, B accepts the proposal and sends ACCEPT(n, X*) to all
the learners
• If a learner L receives ACCEPT(n, X*) from a
majority of the acceptors, it decides X*
– L then sends DECIDE(X*) to all the other learners
– If another learner receives DECIDE(X*), it decides X* for this entry
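
Two pieces of Phase 2 lend themselves to a short sketch: how A picks X* from the ACKs, and how a learner counts ACCEPT messages. The names choose_value and Learner and the message formats are assumptions for illustration. Acceptor ids are kept in a set so that duplicated messages are not counted twice.

```python
def choose_value(acks, own_value):
    """Pick X*: the value from the ACK with the highest accepted number, else own_value.

    acks: list of (n, n_accepted, value); n_accepted is None for ACK(n, -, -).
    """
    with_value = [(n_acc, value) for (_, n_acc, value) in acks if n_acc is not None]
    if not with_value:
        return own_value
    return max(with_value, key=lambda pair: pair[0])[1]


class Learner:
    def __init__(self, num_acceptors):
        self.num_acceptors = num_acceptors
        self.votes = {}        # (n, value) -> set of acceptor ids
        self.decided = None

    def on_accept(self, acceptor_id, n, value):
        """Record ACCEPT(n, value) from an acceptor; decide once a majority agrees."""
        voters = self.votes.setdefault((n, value), set())
        voters.add(acceptor_id)
        if self.decided is None and len(voters) > self.num_acceptors // 2:
            self.decided = value
        return self.decided


picked = choose_value([(8, None, None), (8, 5, "X"), (8, 3, "W")], own_value="Y")
free_choice = choose_value([(8, None, None)], own_value="Y")

learner = Learner(num_acceptors=9)
for acceptor in ["D", "E", "F", "G", "G"]:   # G's message was duplicated
    learner.on_accept(acceptor, 8, "X")
before = learner.decided                      # only 4 distinct acceptors so far
after = learner.on_accept("H", 8, "X")        # the 5th acceptor makes a majority
```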
Let's think about this
• What happens if two nodes concurrently propose
different values?

Example
A's quorum

A B C D E F G H I

A: PREPARE(5)

Example
A's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)

Example
A's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)

Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)

Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)
D,F,G,I: ACK(8,-,-)

Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)
D,F,G,I: ACK(8,-,-)
E: ACK(8,5,X)

Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)
D,F,G,I: ACK(8,-,-)
E: ACK(8,5,X)

B: ACCEPT(8,X)

Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)
D,F,G,I: ACK(8,-,-)
E: ACK(8,5,X)

B: ACCEPT(8,X)

D,..,I: ACCEPT(8,X)

For clarity, we assume only A, B, C are learners. In general, all nodes are learners.
Example
A's quorum B's quorum

A B C D E F G H I

A: PREPARE(5)
C,D,E,G,H: ACK(5,-,-)
A: ACCEPT(5,X)
E: ACCEPT(5,X)
B: PREPARE(8)
D,F,G,I: ACK(8,-,-)
E: ACK(8,5,X)

B: ACCEPT(8,X)

D,..,I: ACCEPT(8,X)
A: DECIDE(X)
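
The message sequence above can be replayed in a few lines of Python. This is a sketch under several assumptions: the Acceptor class and choose_value helper are illustrative, "Y" is a made-up name for the value B originally wanted (the slides do not name it), only E is assumed to receive A's ACCEPT, and raising the promise inside on_accept is a common refinement that the slides do not state.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1     # highest PREPARE number acknowledged so far
        self.accepted = None   # (n', X') of the highest accepted proposal

    def on_prepare(self, n):
        if self.promised > n:
            return None
        self.promised = n
        n_acc, value = self.accepted if self.accepted else (None, None)
        return (n, n_acc, value)

    def on_accept(self, n, value):
        if self.promised > n:
            return False
        self.promised = n      # assumption: accepting n also rules out lower numbers
        self.accepted = (n, value)
        return True


def choose_value(acks, own_value):
    with_value = [(n_acc, v) for (_, n_acc, v) in acks if n_acc is not None]
    return max(with_value, key=lambda p: p[0])[1] if with_value else own_value


nodes = {name: Acceptor() for name in "ABCDEFGHI"}

# A: PREPARE(5); C, D, E, G, H answer ACK(5,-,-)
acks_a = [nodes[x].on_prepare(5) for x in "CDEGH"]
value_a = choose_value(acks_a, own_value="X")

# A: ACCEPT(5,X); in this run only E receives it
nodes["E"].on_accept(5, value_a)

# B: PREPARE(8); D, F, G, I answer ACK(8,-,-), E answers ACK(8,5,X)
acks_b = [nodes[x].on_prepare(8) for x in "DFGIE"]
value_b = choose_value(acks_b, own_value="Y")   # B must adopt E's value

# B: ACCEPT(8, value_b); D..I accept
accepted_by = [x for x in "DEFGHI" if nodes[x].on_accept(8, value_b)]
decided = value_b if len(accepted_by) > len(nodes) // 2 else None

# A delayed copy of A's ACCEPT(5,X) reaching D is now ignored.
late = nodes["D"].on_accept(5, "X")
```

Because A's quorum and B's quorum overlap in E, B learns about X and proposes it instead of its own value, so both proposers end up pushing the same value.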

For clarity, we assume only A, B, C are learners. In general, all nodes are learners.
Recap: Paxos
• Goal: Build a replicated service
– Multiple machines acting 'as if' they were a single machine
– Can mask faults if not too many happen simultaneously
• Paxos implements an important building block: A
consistent append-only log
– Useful to make the replicas agree on the order in which to process
requests → prevent divergence
– More generally, consensus is useful in many other scenarios
• But: Paxos assumes crash faults
– Malicious nodes can easily disrupt the algorithm by telling lies
– Can we build a similar protocol that can tolerate malicious nodes as
well?
– Yes, we can! See next lecture.