
T1 BFTSMR

The document summarizes a tutorial on state machine replication for Byzantine fault tolerance (BFT). It covers the basics of state machine replication, including passive and active replication approaches. It discusses the requirements for BFT state machine replication, including total order multicast. It also outlines some fundamental results in distributed systems, including the impossibility of reliable communication and the equivalence between total order multicast and consensus.


4/10/12

Tutorial on
(BFT) State Machine Replication:
The Hype, The Virtue... and even some Practice

EuroSys 2012

Alysson Neves Bessani


[email protected]
University of Lisbon, Faculty of Sciences
© Alysson Bessani. All rights reserved. 1

Summary
• Part 1: The Basics
o State machine replication
o Potential applications
o 5 fundamental results on distributed systems
o Paxos/Viewstamped replication
o Castro & Liskov's PBFT
• Part 2: BFT Literature Review
o Improving performance
o Improving resource efficiency
o Improving robustness
• Part 3: Applications, Open Problems & Practice
o BFT Applications
o Open problems on BFT
o BFT-SMaRt
o Practice: a BFT in-memory key-value (KV) store


Part  I
The Basics

EuroSys 2012

Replication
• Replication is a technique used for performance
and/or fault tolerance

• Each replica is a state machine:
o A deterministic program that receives an input, changes its state and
produces an output
o State transitions are atomic

• Replication can be passive or active


Passive  Replication
• Also called Primary-Backup (PB) or master-slave
• Clients talk with the primary, which sends the
operations and checkpoints to the backups
o Sometimes backup replicas answer read-only operations
• If the primary crashes, one of the backups takes over
[Figure: clients send op1 and op2 to the PRIMARY, which forwards "checkpoint | op1, op2" to the BACKUPS; primary and backups each keep state + log]

Active  Replication
• Also called State Machine Replication – SMR
(Schneider, ACM CS 1990)
• All servers execute the same set of operations in the
same order (servers are always “synchronized”)
• Clients wait for the first reply (crash faults)

[Figure: clients send op1 and op2 to all SERVERS; every server executes op1, op2 in the same order; each client waits for the first reply]

Byzantine  Fault  Tolerance


• A component that suffers an arbitrary (or Byzantine)
failure can exhibit any behavior
o Stay silent, delay messages, change messages
o It can model intrusions

• PB is hard to use with this fault model


o How to know if the reply/delta checkpoint produced by
the primary is correct?

• SMR is the way to go:


o All replicas execute the operations and send replies
o Clients can vote for the correct reply
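This voting step can be sketched as follows (an illustrative helper, not code from the tutorial): with at most f Byzantine replicas, a value reported by f+1 replicas must come from at least one correct replica.

```python
from collections import Counter

def vote(replies, f):
    """Return a value reported by at least f+1 replicas (so at least one
    correct replica vouches for it), or None to keep waiting for replies."""
    for value, count in Counter(replies).most_common():
        if count >= f + 1:
            return value
    return None

# With f = 1, a single faulty replica cannot outvote two correct ones.
assert vote(["ok", "ok", "bad"], f=1) == "ok"
assert vote(["ok", "bad"], f=1) is None  # not enough matching replies yet
```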


BFT State Machine Replication
• All servers execute the same sequence of operations
• Requires total order multicast

[Figure: clients send op1 and op2 via Total Order Multicast to all SERVERS; every server executes op1, op2 in the same order; each client waits for f+1 equal replies]

SMR  Requirements
• Initial state: all replicas start on the same state (Easy!)
• Coordination: all replicas receive the same sequence of inputs (Total Order Multicast)
• Determinism: all replicas receiving the same input on the same state produce the same output and resulting state (Easy?)


System  Properties
• Safety: all servers execute the same sequence of
requests

• Liveness: all correct clients requests are executed


System  Models:  
BFT  SMR  Assumptions
• Faults:
o How many faulty servers and clients does the system tolerate? Of what type
(e.g., crash, crash-recovery, Byzantine)?

• Time
o Do I need time assumptions (e.g., upper bounds on message and
execution times, synchronized clocks)?

• Connectivity
o Are all processes connected?
o Are the communication links reliable? Authenticated?

• Cryptography
o What cryptography assumptions are needed?

• Architecture
o Homogeneous or heterogeneous?


Five Distributed Computing Fundamental Results
• Impossibility of reliable communication
• Equivalence between total order and consensus
• Impossibility of fault-tolerant consensus
• Minimum synchrony required for FT consensus
• Fault thresholds: f+1, 2f+1, 3f+1 …


Impossibility of Reliable Communication
• How can we implement reliable channels on
unreliable networks?
o We can’t! We need some weak reliability guarantee in
order to build them…
• Fair channels:
o If a message is sent infinitely many times through a channel, it will be received
infinitely often by its receiver
• A practical interpretation:
o a channel can lose messages for some time
o eventually, some of these messages will reach the destination
• Reliability can now be implemented:
o Send a message repeatedly until an ACK is received
o For BFT, an HMAC should be added to each message; messages with an invalid
HMAC are discarded
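These rules can be illustrated with a toy simulation (the classes and the shared key below are invented for the example): the sender retransmits until an ACK arrives, and the receiver drops anything whose HMAC does not verify.

```python
import hashlib
import hmac
import random

KEY = b"shared-secret"  # assumption: a pre-shared pairwise key

def mac(msg: bytes) -> bytes:
    return hmac.new(KEY, msg, hashlib.sha256).digest()

class FairLossLink:
    """Drops each transmission with probability p, but not forever (fairness)."""
    def __init__(self, p=0.5, seed=0):
        self.rng = random.Random(seed)
        self.p = p
        self.delivered = None

    def send(self, msg, tag):
        if self.rng.random() >= self.p:              # transmission got through
            if hmac.compare_digest(tag, mac(msg)):   # discard if MAC is invalid
                self.delivered = msg
                return True                          # receiver ACKs
        return False                                 # lost or rejected: no ACK

def reliable_send(link, msg):
    # Retransmit until an ACK arrives; fairness guarantees termination.
    while not link.send(msg, mac(msg)):
        pass

link = FairLossLink()
reliable_send(link, b"op1")
assert link.delivered == b"op1"
# A forged message with a bad MAC is never delivered:
assert link.send(b"forged", b"0" * 32) is False
```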


Total Order Multicast and Consensus Equivalence
• Total order multicast is equivalent to consensus:
o A consensus protocol can be used to solve atomic broadcast
[Figure: p1-p4 reliably multicast their messages and then run consensus on the received set]
• Why does it work? Every process decides the same set
o An atomic broadcast protocol can be used to solve consensus
[Figure: p1-p4 total-order-multicast their proposals]
• Why does it work? The decision is the first message delivered to every process
• This equivalence holds in most system models
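The first reduction can be sketched as a toy simulation (the `consensus` function below is a deterministic stand-in oracle, not a real protocol): in each round every process proposes its set of undelivered messages, consensus picks one proposal, and everybody delivers that set in a deterministic order.

```python
def consensus(proposals):
    # Stand-in consensus oracle: deterministically decides one of the
    # proposed values (here: the proposal of the lowest process id).
    return proposals[min(proposals)]

def total_order_deliver(received, rounds=3):
    """Reduction sketch: each round, every process proposes its undelivered
    messages; consensus decides one set, delivered in sorted order by all."""
    undelivered = {p: set(msgs) for p, msgs in received.items()}
    log = {p: [] for p in received}
    for _ in range(rounds):
        proposals = {p: sorted(undelivered[p]) for p in received}
        decided = consensus(proposals)
        for p in received:
            for m in decided:          # same set, same order, at every process
                undelivered[p].discard(m)
                log[p].append(m)
    return log

# Processes received the same messages, but with no agreed order:
log = total_order_deliver({1: {"a", "b"}, 2: {"b", "a"}})
assert log[1] == log[2]                # identical delivery sequence everywhere
```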

Impossibility of Fault-Tolerant Consensus
• Result:
Consensus is not solvable in asynchronous systems with reliable channels
(or reliable shared memory) even with one crash fault

• Why?
o We cannot differentiate faulty from slow processes
[Figure: p1 proposes v = 1 and p2 proposes v = 0; neither receives anything
from p3, so p1 and p2 cannot decide between 0 and 1]

Minimum  Synchrony  
required  for  FT  Consensus
• Result:
Fault-tolerant consensus can be solved in the eventually synchronous
system model

• Why?
o The system is asynchronous but has the notion of time
o After some point, the system will become synchronous (bounded but
unknown communication and processing delays)
o If the algorithm keeps trying (always ensuring safety) and increasing the
timeout values, it will be able to solve consensus

[Figure: p1-p4 run round 0 and then round 1, starting a new round on each timeout;
p0 has T seconds to enforce its value in round 0, p1 has 1.5T seconds in round 1]

Fault  Thresholds
• State Machine Replication has two phases
o Ordering è consensus requirements

Crash Byzantine
Synchronous f+1 3f+1/f+1*
Non-­‐‑Synchronous 2f+1 3f+1
* using signatures
o Execution è voting requirements

Crash Byzantine
f+1 2f+1

o The required number of replicas is the maximum required among these


two phases.
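The two tables can be captured in a small helper (illustrative, not from the tutorial):

```python
def replicas_for_ordering(fault_model, synchronous, f, signatures=False):
    """Replicas needed by the ordering (consensus) phase, per the table."""
    if fault_model == "crash":
        return f + 1 if synchronous else 2 * f + 1
    # Byzantine: f+1 suffices only in synchronous systems with signatures
    if synchronous and signatures:
        return f + 1
    return 3 * f + 1

def replicas_for_execution(fault_model, f):
    """Replicas needed by the execution (voting) phase."""
    return f + 1 if fault_model == "crash" else 2 * f + 1

def replicas_required(fault_model, synchronous, f, signatures=False):
    # The system needs the maximum required among the two phases.
    return max(replicas_for_ordering(fault_model, synchronous, f, signatures),
               replicas_for_execution(fault_model, f))

assert replicas_required("byzantine", synchronous=False, f=1) == 4  # 3f+1
assert replicas_required("crash", synchronous=False, f=1) == 3      # 2f+1
```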


(Non-Byzantine) FT State Machine Replication
• Paxos (Lamport, TOCS 1998)
o Agreement framework
o Can be instantiated as a consensus primitive or a SMR algorithm
o Three roles: proposer, acceptor and learner

• Viewstamped Replication (Oki & Liskov, PODC’88)


o Similar to Paxos as a SMR algorithm
o System model (also similar to Paxos):
• Unbounded number of crash-prone clients
• 2f+1 replicas
• Stable storage
• Partially synchronous system
o Safety is always ensured, but Liveness requires synchrony


Viewstamped  Replication
[Figure: client c sends a Request to the leader (replica 0); the leader sends Prepare
to replicas 1 and 2, collects PrepareOk from a majority, and returns the Reply to c]
• Requests are executed only after a majority of the replicas have them in their logs
• This ensures the request will be visible even if the leader fails

Viewstamped  Replication
[Figure: when the old leader (replica 0) is suspected, replicas send DoViewChange
to the new leader (replica 1), which sends StartView to all replicas]
• If a replica suspects the leader, it sends a message to the next leader
• If the next leader receives f+1 messages, it synchronizes the replica logs and
starts a new view

Some  Industrial  Applications  


of  Paxos/VR
• Oracle’ Berkley DB
o At least for leader election

• Google’ Chubby (Burrows, OSDI’06)


• Google Megastore (Baker et al, CIDR’11)
o Uses in a different way…

• Yahoo!/Apache Zookeeper (Hunt et al, USENIX’10)


o Zab is a protocol similar to Paxos

• IBM’ Spinnaker (Rao et al, VLDB’11)


• MS’ Gaios (Bolosky et al, NSDI’11)
• MS’ Windows Azure Storage (Calder et al, SOSP’11)
o Paxos for intra-datacenter replication


Practical Byzantine Fault Tolerance (PBFT)
• The paper (Castro & Liskov, OSDI'99)
sparked interest in BFT replication
o It shows BFT can be fast through the avoidance of public-key
crypto (using HMAC vectors instead)
o Other BFT papers both extend and use the PBFT protocol (and
implementation) as a baseline
o Several versions published: OSDI'99, TOCS'02, Liskov's 2010 book
chapter


PBFT:  System  Model


• Asynchronous distributed system
o Needs partial synchrony for Liveness

• Network can lose, delay, reorder and duplicate


messages; but cannot do that indefinitely
o i.e., they require fair links to implement reliable channels

• Byzantine fault model


o Fault independence (i.e., no common-mode faults)
o N = 3f+1 servers, at most f of them faulty
o An unbounded number of clients, any of which can be faulty

• Cryptography
o PK signatures to simplify the protocol presentation
o MAC (each pair of processes share a key)
o Digests (hashes)


PBFT:  Normal  Operation

• Algorithm outline:
o System evolves in views, numbered sequentially. In each view v, one
server is the primary, the others are the backups: primaryv = v mod N
o Client multicasts a signed request to all servers
o Servers reach agreement about the sequence number of the request
• The primary proposes the sequence number for each request
• The backups confirm that the primary follows the protocol
o If the primary fails, there is a view change
o Client waits for at least f+1 replies with the same result (at least one
correct server executed the operation and produced the result)

PBFT:  Normal  Operation  I


v  is  the  view  number;  n  is  the  sequence  number  of  m

• Pre-prepare phase:
o primary receives a correctly signed request m
o It assigns a sequence number n to the message and sends this number, a
digest of request D(m) and its current view number to all backups (other
replicas) in a PRE-PREPARE message
o backup replicas receive the message and test its validity, i.e., whether n was not
assigned to another request and whether the message is in view v
o If a replica has m and a valid PRE-PREPARE for it, it proceeds to the prepare
phase (m is pre-prepared)
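The backup's validity test can be sketched like this (a hypothetical, simplified replica record; the watermark check on h and h + L is part of PBFT's log management):

```python
def accept_preprepare(replica, v, n, digest):
    """A backup accepts a PRE-PREPARE only if it is in view v, n falls inside
    its log window, and n was not already bound to a different request."""
    if v != replica["view"]:
        return False
    if not (replica["h"] < n <= replica["h"] + replica["L"]):
        return False                      # outside the watermarks
    bound = replica["preprepared"].get((v, n))
    if bound is not None and bound != digest:
        return False                      # n already assigned to another request
    replica["preprepared"][(v, n)] = digest
    return True

r = {"view": 0, "h": 0, "L": 100, "preprepared": {}}
assert accept_preprepare(r, 0, 1, "D(m)")
assert not accept_preprepare(r, 0, 1, "D(m')")  # equivocation on n is rejected
```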

PBFT:  Normal  Operation  II


v  is  the  view  number;  n  is  the  sequence  number  of  m

• Prepare phase:
o replicas store the received PRE-PREPARE message
o each replica sends a PREPARE message to other replicas containing v, n and
the digest D(m) of the message
o all servers that receive 2f PREPARE messages from other replicas with the same
v, n and D(m) proceed to the commit phase
o when a replica finishes the prepare phase for m, we say that m is prepared on
this replica


PBFT:  Normal  Operation  III


v  is  the  view  number;  n  is  the  sequence  number  of  m

• Commit phase:
o each replica multicasts a COMMIT message containing v and n
o the request m to which n was assigned is executed when:
• a replica receives 2f COMMIT messages with the same v and n from
other replicas
• all requests with sequence numbers lower than n have been executed
o when replica i finishes the commit phase, we say that m is committed in i
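The commit rule, including the in-order execution constraint, can be sketched as follows (an illustrative class, not PBFT's actual code):

```python
class Replica:
    """Sketch: execute request n only after 2f matching COMMITs from other
    replicas and after every request with a lower sequence number."""
    def __init__(self, f):
        self.f = f
        self.commits = {}        # n -> set of replica ids that sent COMMIT
        self.committed = set()
        self.executed = []       # execution order
        self.next_to_execute = 1

    def on_commit(self, n, sender):
        self.commits.setdefault(n, set()).add(sender)
        if len(self.commits[n]) >= 2 * self.f:
            self.committed.add(n)
        while self.next_to_execute in self.committed:
            self.executed.append(self.next_to_execute)
            self.next_to_execute += 1

r = Replica(f=1)
for sender in (1, 2):        # 2f = 2 COMMITs arrive for n = 2 first
    r.on_commit(2, sender)
assert r.executed == []      # n = 1 not committed yet: hold back n = 2
for sender in (1, 2):
    r.on_commit(1, sender)
assert r.executed == [1, 2]  # gap filled, execution proceeds in order
```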


PBFT:  Protocol  Invariants


• <m,n,v> is prepared in a correct replica →
2f+1 replicas pre-prepared <m,n,v> →
at least f+1 of them are correct →
(f+1) + (2f+1) > 3f+1 (any 2f+1 quorum of the system will contain at least
one of these correct replicas) →
it is impossible to have <m’,n,v> prepared (m’ ≠ m) on some correct
replica (a correct replica will not pre-prepare two messages with the
same n and v)

• <m,n,v> is committed in a correct replica →


2f+1 replicas prepared <m,n,v> →
at least f+1 of them are correct →
any 2f+1 quorum of this system will contain at least one of these correct
replicas (that can show that <m,n,v> is prepared)


PBFT: Checkpoint and View Change
• Checkpoints
o All prepared and committed messages are logged in memory
o Periodically, replicas exchange messages to save a stable checkpoint
and truncate the log

• View Change Protocol

o If 2f+1 replicas suspect the primary of view v, a new view is started


o The objective of this protocol is to make the correct replicas agree about
a new primary and the state of the log


PBFT:  Checkpoint
• Every protocol message is only accepted (and logged) if the
assigned sequence number falls in a certain interval marked
by two values: h and H = h + L (maximum log size)
• Periodically (every K request executions), the replicas
exchange CHECKPOINT messages to advance h and H by K
• CHECKPOINT messages contain a digest of the system's state
before the checkpoint and the sequence number n of the last
executed request to reach this state (n mod K = 0)
• Replicas store 2f+1 CHECKPOINT messages as a proof that no
other checkpoint for n is possible
o (2f+1) + (2f+1) = 4f+2; even with f Byzantine, 4f+2 - f > 3f+1
• All messages regarding requests with sequence numbers smaller
than n can be discarded from the log
• Late replicas can update themselves by fetching states that can
be proved correct with 2f+1 CHECKPOINT messages
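A minimal sketch of the garbage-collection rule (the data shapes below are hypothetical):

```python
def try_advance_checkpoint(log, checkpoints, n, f, K):
    """Garbage-collect the log once 2f+1 matching CHECKPOINT messages
    prove the state at sequence number n (n mod K == 0) is stable."""
    if n % K != 0 or len(checkpoints.get(n, set())) < 2 * f + 1:
        return log, None
    # messages for requests with sequence number <= n can be discarded
    truncated = {seq: m for seq, m in log.items() if seq > n}
    return truncated, n          # n becomes the new low watermark h

log = {1: "m1", 2: "m2", 3: "m3"}
checkpoints = {2: {"r0", "r1", "r2"}}          # 2f+1 = 3 proofs for n = 2
log, h = try_advance_checkpoint(log, checkpoints, n=2, f=1, K=2)
assert h == 2 and log == {3: "m3"}
```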


PBFT:  View  Change  I

• A backup replica triggers the view change protocol if it stays with some
pending message m for more than a certain time limit (request timeout expires)
• At this point, the replica stops accepting messages for v and sends a VIEW-
CHANGE message containing:
o the next view number v+1
o the sequence number n of the last stable checkpoint
o a set C of 2f+1 signed CHECKPOINT messages that validate n
o a set P of messages prepared in i on views v’ ≤ v
o a set Q of messages pre-prepared in i on views v’ ≤ v


PBFT:  View  Change  II

• VIEW-CHANGE messages are accepted if C validates n and all messages in P and Q


are from views ≤ v
• for each accepted VIEW-CHANGE message, a replica sends a VIEW-CHANGE-ACK
to the primary of the next view (v+1)
• the new primary only accepts a VIEW-CHANGE from a replica if it receives 2f-1 VIEW-
CHANGE-ACKs for it from other replicas
(the OSDI'99 conference paper does not contain this phase; it uses PK signatures on
view changes instead)


PBFT:  View  Change  III

• the new primary uses the information in the accepted VIEW-CHANGE messages
to define the new view's h as the highest sequence number found on a valid
checkpoint
• for each sequence number n such that h < n ≤ h + L
o if there is some message m prepared with n in 2f+1 replicas (possibly committed in some of them),
the sequence number n must be assigned to m
o otherwise, n must be assigned to a null operation (this only fills gaps)
• these assignments must be sent to other replicas in a NEW-VIEW message
together with a digest from each accepted VIEW-CHANGE message used to
define them


PBFT:  View  Change  IV


What happens now?

• each backup replica that receives the NEW-VIEW obtains the VIEW-
CHANGE messages used to build it
o they may have them already or they can fetch them from other replicas
• with these messages, each <message, sequence number> assignment
contained in the NEW-VIEW message can be verified (with the same
procedure the primary used to choose these assignments)
o if some assignment is invalid, a VIEW-CHANGE for v+2 is sent to all replicas
o otherwise, a PREPARE message is sent for each assignment and the protocol resumes its normal
behavior, as if the assignment was a PRE-PREPARE message


Why does PBFT work? (Safety)
• A Byzantine primary cannot "create" its own requests:
o Backup replicas only process authenticated requests from clients
• A Byzantine primary cannot assign the same sequence
number to different messages:
o A correct backup sends a PREPARE message only for the first request it
receives for a certain sequence number n
o A correct backup sends a COMMIT message only if it receives PREPARE
messages from 2f other replicas
o There cannot be two different quorums of 2f+1 out of 3f+1 replicas that
send PREPARE messages for the same n and different requests
• These quorums overlap in at least f+1 replicas
• Thus, some correct replica would have had to send contradictory
messages, which is not possible.
• Consequently, all replicas execute the same sequence of
requests created by clients


Why does PBFT work? (Liveness)
• A Byzantine primary can decide not to send PRE-
PREPARE messages for some requests or to skip
sequence numbers:
o However, when a backup replica receives a request from a client it starts a
timer, which is stopped when the request is executed
o If the timer expires, the backup triggers the view change protocol
o When enough backup replicas trigger a view change, a new primary is
defined and a new view is installed

• For each timer expiration, the timer value is doubled


• Liveness is ensured as long as eventually a timer
value suffices to finish the protocol execution with a
correct primary
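The timeout-doubling retry loop can be sketched as follows (a toy model; `attempt` is a stand-in for one protocol execution under a given timeout):

```python
def run_with_view_changes(attempt, timeout=1.0):
    """Double the request timeout on every expiration; eventually the
    timeout is large enough for a correct primary to finish the protocol."""
    view = 0
    while not attempt(view, timeout):   # False = timer expired: view change
        view += 1
        timeout *= 2
    return view, timeout

# Toy 'network': the protocol only completes once the timeout reaches 4 units,
# so two view changes happen before a primary gets enough time to finish.
view, timeout = run_with_view_changes(lambda v, t: t >= 4.0)
assert (view, timeout) == (2, 4.0)
```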


PBFT:  Optimizations  I
• One of the key contributions of PBFT is its
optimizations

• Rationale for optimizations:


“Faults, concurrency and asynchrony are very rare”


PBFT:  Optimizations  II


• MAC vectors instead of digital signatures
o The use of PK signatures was the main reason for the poor performance
of previous protocols
o MAC vectors are weaker than digital signatures, so the former cannot
always be used to substitute for the latter

• Digest replies
o Instead of all replicas sending the reply of a request, the client can
choose just one replica to send the reply, the others only send a digest of
the reply to allow voting
o If the received reply is wrong, the client can ask for a (full) reply from other
replicas

• Batching
o Instead of running the agreement protocol for every request to be
executed, it can be done for request sets (batches)
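The digest-replies optimization can be sketched client-side as follows (an illustrative helper; SHA-256 stands in for whatever digest function is actually used):

```python
import hashlib

def digest(reply: str) -> str:
    return hashlib.sha256(reply.encode()).hexdigest()

def vote_with_digests(full_reply, digests, f):
    """The client picked one replica to send the full reply; the others sent
    only digests. Accept if f digests match the full reply (f+1 matching
    values in total, counting the replica that sent the full reply)."""
    matching = sum(1 for d in digests if d == digest(full_reply))
    return full_reply if matching >= f else None

f = 1
ds = [digest("ok"), digest("ok"), digest("bad")]
assert vote_with_digests("ok", ds, f) == "ok"
# Wrong full reply: no digest vouches for it, ask other replicas for full replies.
assert vote_with_digests("bad", [digest("ok")] * 3, f) is None
```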


PBFT:  Optimizations  III


• Read-only requests
o Read-only requests generally do not require ordering because they do
not change the system's state
o All replicas can immediately reply to the client, which can finish the read
if there are 2f+1 matching replies (instead of f+1, to ensure linearizability)
o Otherwise (due to faulty replicas or concurrency), the client retries the
request using the normal protocol
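A client-side sketch of this fast path (illustrative helper; the 2f+1 threshold is the point):

```python
from collections import Counter

def read_only(replies, f):
    """Fast path: accept a read if 2f+1 replicas report the same value
    (needed for linearizability); otherwise return None so the client
    falls back to the normal, ordered protocol."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= 2 * f + 1 else None

assert read_only(["v", "v", "v", "old"], f=1) == "v"     # 2f+1 = 3 matches
assert read_only(["v", "v", "old", "old"], f=1) is None  # retry via normal path
```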

[Figure: a read-only request that does not get 2f+1 matching replies and must be retried via the normal protocol]


PBFT:  Optimizations  IV


• Tentative execution
o Replicas can tentatively execute a request when it is prepared and they
have committed all requests with lower sequence number
o This reduces the protocol latency from 5 to 4 communication steps
o The client needs to wait for 2f+1 matching replies from different replicas to
be sure that the execution order will eventually commit
o If the client does not receive these replies before a timer expires, it resends
the request without asking for tentative execution


References
• Schneider. Implementing Fault-Tolerant Services using the
State Machine Approach: a Tutorial. ACM Computing
Surveys 1990.
• Lamport. The Part-time Parliament. ACM TOCS 1998
• Oki & Liskov. Viewstamped Replication: A New Primary
Copy Method to Support Highly-Available Distributed
Systems. PODC’88
• Burrows. The Chubby Lock Service for loosely-coupled
distributed systems. OSDI’06
• Baker et al. Megastore: Providing Scalable, Highly
Available Storage for Interactive Services. CIDR’11
• Hunt et al. ZooKeeper: Wait-free Coordination for
Internet-scale Systems. USENIX’10


References
• Bolosky et al. Paxos Replicated State Machines as the
Basis of a High-Performance Data Store. NSDI’11
• Rao et al. Using Paxos to Build a Scalable, Consistent,
and Highly Available Datastore. VLDB’11
• Calder et al. Windows Azure Storage: A Highly Available
Cloud Storage Service with Strong Consistency. SOSP’11
• Castro & Liskov. Practical Byzantine Fault Tolerance.
OSDI’99
• Castro & Liskov. Practical Byzantine Fault Tolerance and
Proactive Recovery. ACM TOCS 2002
• Liskov. From Viewstamped Replication to Byzantine Fault
Tolerance. Replication: Theory and Practice, 2010


Part  II
BFT Literature Review

EuroSys 2012


Outline
• Improving BFT performance
• Robust BFT protocols
• Architectural hybridization
• Implementation techniques
• Complementary techniques for BFT

Note: there are other papers and other aspects, but this is my
selection given the time constraints we have


Improving  BFT  
Performance
• PBFT performance is competitive with crash fault-
tolerant systems, and in some cases even with non-
replicated systems
• However, in the expected common situation where
o There are no faults
o The system is synchronous
o There is no concurrency

• PBFT still requires 2(n-1)² + (n-1) messages and 5
communication steps (without optimizations)

Can we do better?


Improving  BFT  
Performance
• Since PBFT publication, several works tried to
improve its performance
• Q/U – Query/Update (Abd-El-Malek et al, SOSP’05)
o “Pure” quorum-based protocol that works in asynchronous systems
o Advantages:
• Improves the fault scalability of the system, i.e., the throughput of the
system does not drop dramatically when f increases
• Operations require only two communication steps (best case)
o Drawbacks:
• Sacrifices Liveness (Obstruction-freedom instead of Wait-freedom):
operations only terminate if there is no write contention on the object
• Requires n ≥ 5f +1


Improving  BFT  
Performance
• HQ-Replication (Cowling et al, OSDI’06)
o Combines quorum-based protocols with PBFT
• If there is no concurrency, executes a (f-dissemination BQS) write
protocol to change the system state
• If concurrency is detected, start PBFT to order concurrent requests
o Same advantages as Q/U, with the same Liveness guarantees as PBFT and
using only 3f+1 replicas

[Figure: the HQ write protocol completes in 1 or 2 communication steps]

Zyzzyva: Speculative BFT (Kotla et al, TOCS 2009)
• The “final word” on high-performance BFT protocols
• Main idea: PBFT with speculative execution
o Each replica (speculatively) executes a request just after receiving the
sequence number of this request from the primary
o After executing the request, replicas send a reply to the client
o The consistent state of the replicas only matters to clients, so let them verify
that all replicas are in the same state
o If there is some problem (e.g., the primary sends different operations to
different replicas), a correct client will detect it
o This client will inform the replicas, which must rollback to a safe state and
change the primary

• Improves latency and throughput on the best case


o Zyzzyva requires only 3 communication steps
o Zyzzyva requires only 2n message exchanges
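The client-side decision in Zyzzyva can be sketched as follows (a simplified, hypothetical model of the three outcomes; real Zyzzyva compares ordered histories and certificates, not just reply values):

```python
def zyzzyva_client(spec_responses, n, f):
    """Sketch of the client check: commit on the fast path only with all
    3f+1 speculative replies reflecting the same history; with between
    2f+1 and 3f matching replies the client drives the commit phase."""
    histories = [h for _, h in spec_responses]
    best = max(set(histories), key=histories.count)
    matching = histories.count(best)
    if matching == n:                 # n = 3f+1: every replica agrees
        return "complete"
    if matching >= 2 * f + 1:
        return "send-commit"          # gather a commit certificate
    return "retransmit"               # keep waiting / resend the request

replies = [("reply", "h1")] * 4       # n = 3f+1 = 4 with f = 1
assert zyzzyva_client(replies, n=4, f=1) == "complete"
assert zyzzyva_client(replies[:3], n=4, f=1) == "send-commit"
```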


Zyzzyva:  Speculative  BFT


• Best-case execution (synchronous and fault-free)
[Figure: the client sends a REQUEST to the primary (replica 1); the primary sends
ORDER-REQ to replicas 2-4; all replicas speculatively execute the request in the
order given by the primary and send SPEC-RESPONSEs; the client commits if it
receives 3f+1 matching replies reflecting the same history before its timeout expires]

Zyzzyva: Speculative BFT
• Asynchrony or a faulty replica: the client receives fewer than 3f+1 matching
replies before its timer expires
[Figure: the client collects 2f+1 matching SPEC-RESPONSEs and sends a COMMIT
certificate to the replicas; replicas that see 2f+1 replies matching some history
commit it and answer with LOCAL-COMMIT]

Zyzzyva: Speculative BFT
• Malicious primary: the primary sends different ORDER-REQs to different replicas
[Figure: the client receives non-matching replies and sends a POM (Proof-Of-
Misbehavior) message; correct replicas see that their histories differ and start a
view change to elect a new primary]

Zyzzyva:  Speculative  BFT


• Comparison with other protocols (theory)


Zyzzyva:  Speculative  BFT


• Comparison with other protocols (experimental)
o Setup: n = 4, no faults, 0-byte requests, null operations
o Zyzzyva (batch): 84 Kops/s
o PBFT (batch): 60 Kops/s
o Q/U and HQ: 23 Kops/s (quorum-based protocols cannot batch messages)


Zyzzyva:  Speculative  BFT


• Zyzzyva is the fastest protocol one can devise for
ordering requests under the Byzantine fault model
• However, it is not perfect
o Speculative execution on servers might not be a good idea
• You need to be able to rollback to a committed state if a view
change is triggered
• This makes your server code much more complicated
o If you wait for replies from all replicas, you will always be waiting for the
slowest one
• In non-synchronous networks you will have to calibrate your timeout
value carefully
o Zyzzyva is vulnerable to several attacks, just like PBFT


The Next 700 BFT Protocols
• HQ and Zyzzyva are protocols with fast and slow
paths, being the slow path similar to PBFT
• Guerraoui et al (EuroSys'10) generalized this idea
with the ABSTRACT abstraction
o An ABSTRACT instance is just like state machine replication, but abortable
o ABSTRACT instances are composable, i.e., if one instance aborts, it returns
enough information for clients to start another
o This allows the development of optimistic protocols that can revert to
more conservative approaches if the expected conditions are not met
• Aliph BFT SMR protocol:
[Figure: Aliph composes three ABSTRACT instances: Quorum (aborts if contention
is detected), Chain (aborts if the system is asynchronous) and Backup (PBFT,
which never aborts)]


The Next 700 BFT Protocols
• Characteristics of state-of-the-art BFT protocols (Table 2 of Guerraoui et al,
assuming batches of b requests):

                                            PBFT     Q/U    HQ     Zyzzyva  Aliph
Number of replicas                          3f+1     5f+1   3f+1   3f+1     3f+1
Throughput (MAC ops at bottleneck server)   2+8f/b   2+4f   2+4f   2+3f/b   1+(f+1)/b
Latency (1-way messages in critical path)   4        2      4      3        2

[Figures: communication patterns of Quorum (latency-optimal) and Chain
(throughput-optimal)]

ABSTRACT is a nice idea that really simplifies the design of optimistic state
machine replication.
protocols: Quorum-Chain-Backup- difficulties outside the Quorum “common-case”.
executed req. CAs, along with the inherent throughput ad- request (the head) and send the reply (the
−etc. Initially, Quorum is active. As
vantages of a pipeline pattern, are key to Chain’s dramatic
due to contention), it switches to 4.1.2 Chain 4.1.3 Optimizations
throughput improvements over other BFT protocols. We de-
equests until it aborts (e.g., due to Chain organizes replicas in a pipeline (see Fig. 5). All
scriberepli-
below how CAs are used in Chain. When a Chain instance is executing in
witches to Backup, which executes cas know the fixed ordering of replica IDs (called chain or- generate CAs in order to authenticate the mes-
Processes requests as long as there are no server
When Backup commits k requests, sages
der); the first (resp., last) replica is called the head they send. Each CA contains MACs for a set of pro-
(resp., the Aliph implementation we benchmar
o Quorum, and so on. cesses called successor set. The successor set of clients con-
the tail). Without loss of generality we assume an ascending
28  
we slightly modified the progress proper
first describe Quorum (Sec. 4.1.1) ordering by replica IDs, where the head (resp., tail) sists of the f + 1 first replicas in the chain. The successor
is replica it aborts requests as soon as replicas de
full details and correctness proofs r1 (resp., r3f +1 ). set of replica ri depends on its position i: (a) for the first 2f contention (i.e. there is only one active
2s). Moreover, Chain replicas add an i
hen, we discuss some system-level In Chain, a client invokes a request by sendingreplicas,
it to thethe successor set comprises the next f + 1 replicas
ec. 4.1.3). in the chain, whereas (b) for other replicas (i > 2f ), the suc- abort history to specify that they aborted
head, who assigns sequence numbers to requests. Then, each
cessor set comprises all subsequent replicas in the chain, as

→ of contention. We slightly modified Bac
replica ri forwards the message to its successor ri , where
4/10/12  

Robust  BFT  Protocols


• All distributed protocols can have their
performance hurt by (Distributed) DoS attacks
o There is nothing we can do about that… we need communication and
timing assumptions in order to solve BFT consensus

• However, the quest for optimizing these protocols


for the “expected common case” made them
even more fragile to malicious behavior
o E.g., malicious clients can try to execute operations continuously on
systems like HQ and Q/U to make their operation extremely slow

• However, there are two attacks (≠ (D)DoS) that can


really hurt the performance of systems like PBFT and
Zyzzyva (Amir et al., DSN’08, TDSC 2011)

© Alysson Bessani. All rights reserved. 57

BFT  Under  A\ack


• Attack #1: causing view change with a malicious
client without using DoS
o On PBFT, clients need to send a request “signed” with an authenticator (a
MAC vector)
• Correct authenticator: MAC(c,0) MAC(c,1) MAC(c,2) MAC(c,3)
o A malicious client can send a corrupted authenticator that is valid for all
backup replicas but not for the primary
• Malicious authenticator: ?!#@$ MAC(c,1) MAC(c,2) MAC(c,3)
• The primary will ignore the client’s request
• Other replicas will accept it and, after their timer expires, will relay it to
the primary
o Since the primary will never accept this request, other replicas will start a
view change after a second timeout
o Conclusion: the use of authenticators allow faulty clients to force view
changes as they wish

© Alysson Bessani. All rights reserved. 58

29  
4/10/12  

BFT  Under  A\ack


• How to “patch” attack #1’s vulnerability?
o Make clients sign (not with MAC vectors) their messages
o Digital signatures (like RSA) ensure that if some correct server authenticates
the message, then all correct servers will authenticate the message
o Performance issues:
• Clients generate signatures, servers only verify one signature per
request
• Operation latency increases by a signature (~5 ms on standard
hardware) plus a verification (~0.5 ms)
• Throughput becomes limited by the amount of signatures each server
can verify per second
o The mentioned single core machine cannot do more than 2Kops/s
o But a 4-year-old quad-core machine can do ~40Kops/s

© Alysson Bessani. All rights reserved. 59

BFT  Under  A\ack


• Attack #2: degrading performance with a faulty
primary
o A faulty primary must order a request before the other replicas’ timers for
this request expire
o Assuming Ttimeout = 100 ms, if a faulty primary delays the ordering of each
request by 90 ms, a view-change will not be triggered
o Nonetheless, the performance of the system will drop dramatically
o This attack can be even more devastating if combined with attack #1,
since for each view change Ttimeout is doubled
o Conclusion: a faulty primary can inject a delay of almost Ttimeout ms on
each request processing, making the end-to-end performance of the
system orders of magnitude worse than expected
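A back-of-the-envelope check of the numbers above (using the timeout and delay values assumed in the slide):

```python
T_TIMEOUT_MS = 100.0   # view-change timeout assumed above
DELAY_MS = 90.0        # delay injected by the faulty primary per ordering

# The injected delay stays below the timeout, so no view change is triggered...
assert DELAY_MS < T_TIMEOUT_MS

# ...yet, with ordering serialized through the primary, at most
# 1000 / 90 ≈ 11 ordering instances complete per second.
orderings_per_s = 1000.0 / DELAY_MS
assert 11 < orderings_per_s < 12
```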

© Alysson Bessani. All rights reserved. 60

30  
4/10/12  

BFT  Under  A\ack


• How to mitigate attack #2?
o Solution 1: use decentralized (leader-free) protocols
• The request’s sequence number is not defined by a primary
• Replicas will propose their set of pending requests for ordering in a
decentralized consensus (Moniz et al, TDSC 2011)
• Whether or not this approach works depends on how similar the
proposals are, i.e., if all replicas receive client’s requests on the same
order (which happens very often on HUB-based networks)
o Solution 2: monitor the primary’s performance and start a view change if
it’s too low
• Problem is “how to define the threshold between an unstable network
and a faulty primary”
• A wrong view change can hurt the protocol’s performance
o Solution 3: rotate the primary periodically

© Alysson Bessani. All rights reserved. 61

Protocols  Solving  these  Issues


• Prime (Amir et al, DSN’08, TDSC 2011)
o Identified these problems for the first time in their DSN’08 paper
o The Prime protocol adds a pre-order phase to PBFT
• An operation is first preordered in two rounds (PO-REQUEST and
PO-ACK) and cumulatively acknowledged (PO-SUMMARY); a correct
leader includes at least 2f+1 PO-SUMMARY messages in its next
PRE-PREPARE, and the operation is executed once the PRE-PREPARE
is globally ordered (Fig. 2 of the paper shows fault-free
operation for f = 1)
• The Suspect-Leader subprotocol defends against a malicious leader
that orders too slowly or uses out-of-date preorder summaries
• Aardvark (Clement et al, NSDI’09)
o PBFT made robust
• Spinning (Veronese et al, SRDS’09)
o Rotating-coordinator BFT SMR

© Alysson Bessani. All rights reserved. 62

31  
4/10/12  

Robust  BFT  SMR


• Clement et al. (NSDI’09) proposes a variation of PBFT
that implements robust state machine replication
o The name of the protocol is Aardvark J

• By robust, it means:
o Maintains a stable performance even when under attack by f malicious
replicas and an unbounded number of clients

• Three main differences (when compared with PBFT):


o Clients must sign requests
• to avoid malicious clients provoking view changes
o Resource Isolation
• to resist denial of service attacks against network interfaces
o Regular view changes
• to avoid performance degradation attacks

© Alysson Bessani. All rights reserved. 63

Robust  BFT  SMR:  


Replica  Architecture
• Every replica needs n
NICs (one for each other
replica plus one for
clients)
• This makes the system
resilient to DoS network
attacks from faulty
replicas and clients
• To help resist DoS attacks,
there are specific
algorithms to verify client
requests and process
replica messages

© Alysson Bessani. All rights reserved. 64

32  
4/10/12  

Robust  BFT  SMR  


(client  request  verification)
• Client messages have
both a MAC and a
signature
o Why?
• Each reply is cached to
deal with retransmissions
• Clients that misbehave
are blacklisted
• ”Redundancy” and
”once per view checks”
take care of replay
attacks
o Clients need to sign
different requests to make
them valid

© Alysson Bessani. All rights reserved. 65

Robust  BFT  SMR  


(replica  msg  processing)
• If some replica is
sending 20 times more
messages than the
others, blacklist it
• To avoid resource
exhaustion, messages
are processed in a
round-robin fashion
• Only process catch up
messages if the system
is idle

© Alysson Bessani. All rights reserved. 66

33  
4/10/12  

Spinning
• A protocol built upon PBFT, but with a modification
based on a simple idea:
o PBFT’s problem is that a malicious primary can keep ordering requests very
slowly without triggering view changes
o So, why not change view after each message commit?
o in this way, the sequence number of a message matches exactly the view
number of its delivery

• Potential problem:
o The view change protocol is complex and costly
o But it is not a problem: the view change will deterministically happen after
every committed message, so it is not necessary to have a special
protocol to change primary

• The resulting protocol (Spinning) makes the primary
role rotate among all servers
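The rotation rule is simple enough to state in a few lines (a sketch; n = 4 is just an example value):

```python
N = 4  # number of replicas (example value)

def primary(view: int) -> int:
    # In Spinning the view advances after every committed message, so the
    # message with sequence number i is ordered in view i, whose primary
    # is determined deterministically as i mod N.
    return view % N

# The primary role rotates over all replicas: 0, 1, 2, 3, 0, 1, ...
assert [primary(v) for v in range(6)] == [0, 1, 2, 3, 0, 1]
```

Because the mapping from view to primary is deterministic and known to everyone, no agreement is needed to decide who orders next.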

© Alysson Bessani. All rights reserved. 67

Spinning
• Example execution of Spinning:
o first request is ordered by s1, which is the primary of view v
o second request is ordered by s2, which is the primary of v+1
o …

© Alysson Bessani. All rights reserved. 68

34  
4/10/12  

Spinning:  Performance
• Changing primary improves or degrades
performance in fault-free executions?

No,  the  performance  is  better!

The  primary’s  extra  load  is  evenly
distributed  among  all  replicas

- n = 4
- no faults
- 0 byte requests
- null operations

© Alysson Bessani. All rights reserved. 69

Spinning:  Performance
• What happens when latency is injected by a
faulty primary?
No  delay Spinning  
performance  
degrades  much  
slower  than  PBFT

Malicious  primaries  
can  only  degrade  the  
performance  of  the  
system  in  f  out-­‐‑of  n  
protocol  executions
Amount  of  delay  injected

© Alysson Bessani. All rights reserved. 70

35  
4/10/12  

Spinning:  Issues
• Without the repair procedure of view changes, how do
replicas recover from a malicious primary in some view?
o Merge operation: joins one or more faulty views (i.e., with faulty primaries) with
a correct view (i.e., with correct primary)
o The idea is very similar to PBFT’s view change: the new correct primary will read
the state of the system and proceed ordering requests ensuring the protocol
invariants

• Faulty replicas can force merge operations periodically


o To avoid that, after a merge operation, the primary of the most recent merged
view is put on a blacklist
o Every time a replica is blacklisted, it stays there for a number of turns (n
views) equal to the number of times it was blacklisted
• First time, loses 1 turn; second time, loses 2 turns, and so on…
o Blacklisted replicas cannot be primary on their views, but otherwise can
participate on the protocol as a backup

© Alysson Bessani. All rights reserved. 71

Architectural  
Hybridization
• Motivation: BFT in Homogeneous Systems is Expensive

PBFT

Zyzzyva

• At least 3f+1 replicas


• At least 3 communication steps to establish agreement
(non-speculative normal case operation)

© Alysson Bessani. All rights reserved. 72

36  
4/10/12  

Architectural  
Hybridization
• Is it possible to do better?
1- Less than 3f+1 replicas to tolerate f Byzantine faults?
• Homogeneous non-synchronous systems require 3f+1 replicas

(and, at the same time)


2- Less than 3 communication steps to establish agreement
• It is possible to solve consensus with 2 communication steps if there
are 5f+1 replicas (Martin & Alvisi, TDSC 2007)

• Hybrid distributed systems (Veríssimo, SIGACT News


2006) with local trusted components can do that!

© Alysson Bessani. All rights reserved. 73

History:  Trusted  
Components  and  BFT
• (Correia et al, SRDS’02) BFT Reliable Multicast using TTCB,
a distributed real-time trusted component
• (Correia et al, SRDS’04) BFT SMR with 2f+1 replicas using a
distributed trusted component
• (Chun et al, SOSP’07) PBFT with 2f+1 replicas using a
complex local trusted component (A2M)
• (Levin et al, NSDI’09) A2M reduced to a simple secure
counter (TrInc), which can be built using a TPM chip
• (Veronese et al, DI-FCUL TR 2008, TC 2011) MinBFT shows
that with a trusted counter one can reduce BFT SMR to
viewstamped replication/Paxos
• (Kapitza et al, EuroSys’12) BFT SMR with only f+1 active
replicas using a trusted counter efficiently implemented
in FPGA

© Alysson Bessani. All rights reserved. 74

37  
4/10/12  

MinBFT:  System  Model


§ Eventually synchronous system

§ Authenticated and reliable channels

§ Local Trusted Component (can only crash)

§ Secure hash function

§ n ≥ 2f+1 replicas, at most f can suffer Byzantine faults

© Alysson Bessani. All rights reserved. 75

MinBFT  Trusted  Component  


(USIG  -­‐‑  Unique  Sequential  Identifier  Generator)
• A minimal local trusted component containing
o A cryptosystem for authenticating its outputs
o A monotonic counter

• Storage (on the USIG of process i):


o A private-key PrKi
o An unbounded counter count
• Operations:
o createUI(m)
• Assigns a counter value c_val to a message m
• Increments the counter: count++
• Outputs UI = <c_val, i, H(m), SignPrKi>
o verifyUI(j, PuKj, UI, m)
• Verifies if UI was generated by createUI(m) on the USIG of process j. This
verification uses j’s USIG public key PuKj
• Outputs true or false
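The interface above can be sketched as follows (an HMAC-based variant; the payload encoding and key handling are illustrative assumptions, not the paper's format — note that with a shared symmetric key, verifyUI must also run inside a trusted component):

```python
import hashlib
import hmac

class USIG:
    """Sketch of the USIG trusted component (HMAC variant, formats assumed)."""

    def __init__(self, replica_id: int, key: bytes):
        self.i = replica_id
        self.key = key    # secret kept inside the trusted component
        self.count = 0    # monotonic counter: only ever incremented

    def _tag(self, c_val: int, j: int, h: bytes) -> bytes:
        return hmac.new(self.key, b"%d|%d|" % (c_val, j) + h,
                        hashlib.sha256).digest()

    def createUI(self, m: bytes):
        c_val = self.count            # assign the current counter value to m
        self.count += 1               # increment: no value is ever reused
        h = hashlib.sha256(m).digest()
        return (c_val, self.i, h, self._tag(c_val, self.i, h))

    def verifyUI(self, ui, m: bytes) -> bool:
        c_val, j, h, tag = ui
        if h != hashlib.sha256(m).digest():
            return False              # UI was issued for a different message
        return hmac.compare_digest(tag, self._tag(c_val, j, h))

usig = USIG(0, b"secret-key")
ui1 = usig.createUI(b"PREPARE m1")
ui2 = usig.createUI(b"PREPARE m2")
assert (ui1[0], ui2[0]) == (0, 1)           # distinct, sequential counter values
assert usig.verifyUI(ui1, b"PREPARE m1")
assert not usig.verifyUI(ui1, b"PREPARE m2")  # a UI binds to exactly one message
```

The key property is visible in the asserts: two different messages can never carry the same counter value, so a replica cannot equivocate.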

© Alysson Bessani. All rights reserved. 76

38  
4/10/12  

PBFT  x  MinBFT
pre-­‐‑
request prepare prepare commit reply
Client request prepare commit reply

Replica  0

Replica  1

Replica  2

Replica  3
MinBFT
PBFT

Benefits of MinBFT
• 2f+1 instead of 3f+1 replicas (minimal for general SMR)
• 2 steps instead of 3 on the normal case (minimal for consensus)
• USIG is arguably a minimal trusted component

© Alysson Bessani. All rights reserved. 77

MinBFT:  Normal  Case


§ Primary defines the order
§ The sequence number is the USIG counter value assigned to the message
§ Servers accept a request after f+1 COMMITs
§ Each COMMIT should have a valid UI from its sender’s USIG
§ Execution follows the order of the PREPARE’s UI
§ Client waits for f+1 matching replies

Message pattern: REQUEST → PREPARE → COMMIT → REPLY
(Client, Replica 0, Replica 1, Replica 2)

© Alysson Bessani. All rights reserved. 78

39  
4/10/12  

MinBFT:  View  Change


REQ-VIEW-CHANGE <V+1>    VIEW-CHANGE <Clast, UI>    NEW-VIEW <V, UI>
(Replica 0: primary of v; Replica 1: primary of v+1; Replica 2)

§  When a request is received, a timer Texec is started
§  f+1 REQ-VIEW-CHANGE messages trigger the view change
§  f+1 VIEW-CHANGE messages are needed to build the NEW-VIEW

© Alysson Bessani. All rights reserved. 79

MinBFT:  Why  it  Works?


• Uses 2f+1 replicas with quorums of size f+1
• One replica in the intersection of any two quorums

write read
set set

• What if this replica is faulty?


o It cannot lie because every value is associated with a UI
o Different messages will have different UIs

• Practical effects:
o A primary replica cannot send two PREPARE messages with different
messages and the same sequence number (UI)
o A backup replica cannot lie about the value proposed by the primary
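The quorum arithmetic behind this can be checked directly: two quorums of size f+1 out of n = 2f+1 replicas always overlap in exactly one replica.

```python
def min_overlap(n: int, q: int) -> int:
    # Two quorums of size q cover at most n distinct replicas,
    # so by pigeonhole they share at least 2q - n replicas.
    return 2 * q - n

for f in range(1, 100):
    n, q = 2 * f + 1, f + 1          # MinBFT: n = 2f+1, quorums of size f+1
    assert min_overlap(n, q) == 1    # always exactly one replica in common
```

With only one replica in the intersection, safety rests entirely on that replica being unable to equivocate, which is precisely what the UIs guarantee.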

© Alysson Bessani. All rights reserved. 80

40  
4/10/12  

USIG  Implementation:  VM


State  Machine
Code USIG
Replica BFT  Rep.  Alg. Daemon
Architecture
OS OS
Virtual  Machine  Monitor
Hardware
• The BFT protocol + application runs on an untrusted virtual
machine that has access to the outside network
• The USIG is implemented as a daemon on a trusted
virtual machine (e.g., Xen’s Dom0)
• They communicate by TCP sockets

© Alysson Bessani. All rights reserved. 81

USIG  Implementation:  VM


Public-key USIG (RSA or ESIGN)
• Each replica i keeps its private key PrKi inside the USIG and the public
keys PuKj of the other replicas outside it; messages carry m, S(m)
• Only createUI requires trusted component access

HMAC USIG
• All replicas share a secret key SK, kept inside their USIGs; messages
carry m, HMAC(m)
• Both createUI and verifyUI require trusted component access

© Alysson Bessani. All rights reserved. 82

41  
4/10/12  

USIG  Implementation:  VM


• MinBFT can be implemented with both variants, but
the HMAC version, albeit potentially more efficient,
can be more difficult to manage
o Symmetric keys have a shorter life-cycle than PK keys
o How to refresh them without interrupting the protocol?

• MinZyzzyva requires clients to verify UIs, meaning


that clients need to have a trusted component with
the shared secret key
o This is infeasible in practice
o Conclusion: MinZyzzyva only works with PK USIG

© Alysson Bessani. All rights reserved. 83

USIG  Implementation:  TPM


State  Machine
Code

Replica BFT  Rep.  Alg.


Architecture
OS
TPM
Hardware

• A public-key (2048-bit RSA) implementation of the USIG


service
• The private key and the counter are stored in the TPM
(version 1.2 or higher)
• BFT protocol access a TPM driver to issue commands

© Alysson Bessani. All rights reserved. 84

42  
4/10/12  

USIG  Implementation:  TPM


createUI(m), as called by replica i
• First, calculate hm = H(m)
• Send the following commands to the TPM (details omitted)
1. TPM_EstablishTransport
2. TPM_ExecuteTransport(TPM_IncrementCounter)
3. TPM_ReleaseTransportSigned(hm)
• The last command returns:
o A 2048-bit RSA signature S of <TPM_IncrementCounter; c_val; hm >
• This UI value is <<TPM_IncrementCounter; c_val; hm>, S>
Proves  that  the  counter  was  incremented
Proves  that  it  was  generated
Association  of  a  counter by  TPM  on  replica  i
value  with  the  message

© Alysson Bessani. All rights reserved. 85

USIG  Implementation:  TPM


verifyUI(j, PKj , UI, m)
• Verify the format of the data structure (e.g., there is an
increment on the TPM counter)
• Verify if hm = H(m) is on the UI
• Verify the signature using the public key PKj
• If all these checks are passed, return true, otherwise, return
false

Important Remark: This function doesn’t need to access the TPM

© Alysson Bessani. All rights reserved. 86

43  
4/10/12  

USIG  Performance:  
VM  x  TPM
• TPM USIG
o Signature: 797 ms
o One increment every 3.5 seconds
o 32-bit monotonic counter

• VM USIG

© Alysson Bessani. All rights reserved. 87

Implementation  
Techniques
• BASE (Castro et al, TOCS 2003)
o Define useful abstractions to implement diverse BFT services
• Parallel execution of requests (Kotla & Dahlin, DSN’04)
o Some service requests do not require total order execution (writes on different files of
a file system), and can be executed in parallel
o May improve the throughput of certain services (e.g., distributed FS)

• On-Demand Replica Consistency (Distler & Kapitza, EuroSys’11)


o Breaks the service state into partitions
o Each partition executes a subset of the submitted requests
o Especially useful if executing a request requires a lot of resources

• Separating Agreement from Execution (Yin et al, SOSP’03)


• UpRight (Clement et al, SOSP’09)
• ZZ (Wood et al, EuroSys’11)

© Alysson Bessani. All rights reserved. 88

44  
4/10/12  

Classical  BFT  SMR  


Architecture
• Clients sign requests and send them to the replicas
• Replicas verify client signature and run an agreement protocol
to establish total order
• Replicas execute the request and send the reply to the client

verify agreement
sign

execute

© Alysson Bessani. All rights reserved. 89

Separating  Agreement/
Execution  Architecture
• Separate servers in two layers: agreement and execution
• Clients sign requests, agreement replicas verify it
• 3f+1 replicas to agree on requests sequence number and 2f+1
for executing the requests

agreement
verify
sign ordered  request

execute

reply

© Alysson Bessani. All rights reserved. 90

45  
4/10/12  

Agreement/Execution  
Problem
• In data centers, clients usually are also servers… they have to
be fast (generating signatures is very costly)
o E.g.: web services (BFT clients) access a BFT database (execution)
• These web service hosts need to serve lots of clients (high
throughput) and they are paid by the service provider

BFT  clients agreement


execution
clients Datacenter

Internet

© Alysson Bessani. All rights reserved. 91

UpRight  Architecture
agreement
• A new layer needs to be deployed
to avoid client signatures: request
quorum (RQ)
• Servers on this layer store the
request and generate a matrix 3.  Request  hash
signature to be ordered by the +  sequence
agreement layer 2.  Request  hash  +
 matrix  signature number
• The execution layer fetches the
request from RQ after it is ordered,
executes it and sends a reply verify 4.  Exec.  replicas  fetch
ordered  request
sign

5.  Req.  is  obtained


execute
1.  Requests  are  sent
7.  Reply  is  received
6.  Reply  is  sent

© Alysson Bessani. All rights reserved. 92

46  
4/10/12  

UpRight  Remarks
• Number of faults tolerated:
o Request quorum: nr ≥ 2u + r + 1
o Ordering: no ≥ 2u + r + 1
o Execution: ne ≥ u + r + 1

• Clients only do MACs, not signatures… it is more


aligned with cloud applications (clients are also
servers of application services)
• Speculative execution is not employed in the
service, but only on order assignment (execution
servers are just like clients receiving replies from
Zyzzyva)
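The bounds above can be sanity-checked: setting u = r = f recovers the classical separated architecture (3f+1 ordering replicas and 2f+1 execution replicas).

```python
def upright_sizes(u: int, r: int):
    # Minimum layer sizes from the slide: request quorum, ordering, execution.
    nr = 2 * u + r + 1
    no = 2 * u + r + 1
    ne = u + r + 1
    return nr, no, ne

# With u = r = f the bounds match separating agreement from execution.
for f in range(1, 10):
    nr, no, ne = upright_sizes(f, f)
    assert nr == 3 * f + 1 and no == 3 * f + 1 and ne == 2 * f + 1
```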

© Alysson Bessani. All rights reserved. 93

ZZ  Architecture
• Key observation: In fault free executions, f+1 execution
replicas are enough for the execution layer
• In server consolidation scenarios, these extra f execution
replicas can be dormant VMs

agreement
verify
sign ordered  request

execute
reply

ZZZ
© Alysson Bessani. All rights reserved. 94

47  
4/10/12  

Complementary  Techniques  
for  BFT:  Fault  Recovery
• Problem with tolerating f faults:
o If an intelligent adversary is able to compromise f machines, given
enough time, he/she will compromise f+1 (or more)
o Solution: Proactive Recovery (Castro & Liskov, TOCS 2002)
• Replicas (compromised or not) are cleaned periodically
• PR requires a local trusted real-time component
o Otherwise, it may be vulnerable to certain attacks (Sousa et al, DSN’05)
o Most proactive recovery systems are vulnerable (Sousa et al, HotDep’07)
• To ensure availability you may also need 2k extra
replicas if at most k recover at the same time
Outdated  …

© Alysson Bessani. All rights reserved. 95

Complementary  Techniques  
for  BFT:  Diversity
• f-fault-tolerant replicated systems are useful only
if faults are not correlated
• It usually requires diverse replicas
o Different administrative domains
o N-version programming (effective?)
o Obfuscation, Memory randomization (effective?)
o Use of different components like databases (Gashi et
al, TDSC 2007), file systems (Castro et al, TOCS 2003)
and operating systems (Garcia et al, DSN’11) is
effective!
• What about deploying and managing diversity?

© Alysson Bessani. All rights reserved. 96

48  
4/10/12  

References
• Abd-El-Malek et al. Fault-scalable Byzantine Fault-
tolerant Services. SOSP’05
• Cowling et al. HQ-Replication: a Hybrid Quorum Protocol
for Byzantine Fault Tolerance. OSDI’06
• Kotla et al. Zyzzyva: Speculative Byzantine Fault
Tolerance. ACM TOCS 2009 (prel. SOSP’07)
• Guerraoui et al. The Next 700 BFT Protocols. EuroSys’10
• Amir et al. Byzantine protocols Under Attack. IEEE TDSC
2011 (prel. DSN’08)
• Moniz et al. RITAS: Services for Randomized Intrusion
Tolerance. IEEE TDSC 2011
• Veronese et al. Spin One’s Wheels? Byzantine Fault
Tolerance with a Spinning Primary. SRDS’09

© Alysson Bessani. All rights reserved. 97

References
• Clement et al. Making Byzantine Fault-tolerant Systems
Tolerate Byzantine faults. NSDI’09
• Martin & Alvisi. Fast Byzantine Paxos. IEEE TDSC 2007
• Veríssimo. Travelling through Wormholes: a new look at
Distributed System Models. SIGACT News 2006
• Correia et al. Hybrid Byzantine-resilient Reliable Multicast.
SRDS’02
• Correia et al. How to Tolerate Half less One Byzantine
Faults in Practical Distributed Systems. SRDS’04
• Chun et al. Attested append-only memory: Making
adversaries stick to their word. SOSP’07
• Levin et al. TrInc: Small Trusted Hardware for Large
Distributed Systems. NSDI’09

© Alysson Bessani. All rights reserved. 98

49  
4/10/12  

References
• Veronese et al. Efficient Byzantine Fault Tolerance. IEEE
TC 2011. to appear (prel . DI-FCUL Tech. Report 2008)
• Kapitza et al. CheapBFT: Resource-efficient Byzantine
Fault Tolerance. EuroSys’12
• Castro et al. BASE: Using Abstractions to Improve Fault
Tolerance. ACM TOCS 2003
• Kotla & Dahlin. High-throughput Byzantine Fault
Tolerance. DSN’04
• Distler & Kapitza. Increasing Performance in Byzantine
Fault-Tolerant Systems with On-Demand Replica
Consistency. EuroSys’11
• Yin et al. Separating Agreement from Execution in
Byzantine Fault-tolerant Services. SOSP’03

© Alysson Bessani. All rights reserved. 99

References
• Clement et al. UpRight Cluster Services. SOSP’09
• Wood et al. ZZ and the Art of BFT Execution.
EuroSys’11
• Sousa et al. How resilient are distributed f fault/
intrusion-tolerant systems? DSN’05
• Sousa et al. Hidden Problems of Asynchronous
Proactive Recovery. HotDep’07
• Gashi et al. Fault tolerance via diversity for off-the-
shelf products: a study with SQL database servers.
IEEE TDSC 2007
• Garcia et al. OS Diversity for Intrusion tolerance:
Myth or Reality? DSN’11

© Alysson Bessani. All rights reserved. 100

50  
4/10/12  

Other  Aspects
Wide-area replication
• Wester et al. Tolerating Latency in Replicated State Machines Through
Client Speculation. NSDI’09
• Mao et al. Towards Low Latency State Machine Replication for Uncivil
Wide-area Networks. HotDep’09
• Amir et al. STEWARD: Scaling Byzantine Fault-Tolerant Replication to Wide-
Area Networks. IEEE TDSC 2010
• Veronese et al. EBAWA: Efficient Byzantine Agreement for Wide-Area
Networks. HASE’10
Weak consistency & others
• Li & Mazières. Beyond One-third Faulty Replicas in Byzantine Fault Tolerant
Systems. NSDI’07
• Singh et al. Zeno: Eventually Consistent Byzantine-Fault Tolerance. NSDI’09
• Sen et al. Prophecy: Using History for High-Throughput Fault Tolerance.
NSDI’10
• Bessani et al. Active Quorum Systems. HotDep’10
© Alysson Bessani. All rights reserved. 101

Part  III
Applications, Open Problems & Practice

EuroSys 2012

© Alysson Bessani. All rights reserved. 102

51  
4/10/12  

BFT  Applications
• Distributed File Systems
o BFS (Castro & Liskov, TOCS 2002), BASEFS (Castro et al, TOCS 2003)
o Oceanstore (Kubiatowicz et al, ASPLOS’00), Farsite (Adya et al, OSDI’02)
o UR-HDFS (Clement et al, SOSP’09)

• Database replication
o Commit Barrier Scheduling (Vandiver et al, SOSP’07)
o Byzantium (Garcia et al, EuroSys’11)

• Coordination Service
o DepSpace (Bessani et al, EuroSys’08)
o UR-Zookeeper (Clement et al, SOSP’09)

• Naming Services
o DNS (Cachin & Samar, DSN’04)
o LDAP (FCUL, unpublished)

© Alysson Bessani. All rights reserved. 103

BFT  Real  Applications?


• Tolerating non-malicious Byzantine faults
o Memory and disk corruptions are relatively common at large scale
o These problems are detected and corrected using end-to-end integrity
checks (i.e., crypto hashes stored separately)
o Can we use BFT SMR to tolerate this?
• Where do these faults happen?
• Are there simple techniques?
o What about software (heisen)bugs?

• General fault tolerance


o BFT is a general technique for fault tolerance
o The next step on fault tolerance evolution

• Malicious Byzantine faults


o What if Byzantine faults are the result of successful attacks?
o BFT is not enough, we need Intrusion tolerance

© Alysson Bessani. All rights reserved. 104

52  
4/10/12  

Intrusion  Tolerance  (InTol)


• Coined by Joni Fraga and David Powell
“A Fault- and Intrusion-Tolerant File System”, IFIP SEC,1985

• An intrusion-tolerant system can maintain its


security properties (confidentiality, integrity
and availability) despite some of its
components being compromised

• Appeal: since it’s impossible to prove that a


system has no vulnerabilities, it is safer
to assume that intrusions can happen

© Alysson Bessani. All rights reserved. 105

The  Promise  of  BFT


• From PBFT’ abstract (Castro & Liskov, OSDI’99):
“We believe that Byzantine fault-tolerant
algorithms will be increasingly important in
the future because malicious attacks and
software errors are increasingly common
and can cause faulty nodes to exhibit
arbitrary behavior.”

© Alysson Bessani. All rights reserved. 106

53  
4/10/12  

InTol  vs  BFT


• BFT replication protocols are a key
mechanism for intrusion-tolerant systems
o However, I/T systems assume faults may be
caused by malicious and intelligent adversaries
• Differences and I/T added requirements:
o Unfavorable executions
o Diversity
o Recovery and Self-healing
o Confidentiality

© Alysson Bessani. All rights reserved. 107

Intrusion-tolerant Systems
• Definition
An intrusion-tolerant system is a replicated system
in which a malicious adversary needs to
compromise more than f out-of n components in
less than T time units in order to make it fail.

Comments:
• Similar to BFT with proactive recovery
• T and f make little sense without further assumptions
• Other definitions are possible


Problems  of  Intrusion  


Tolerance
• Originally described in (Bessani, WRAITS’11)

• 3 Solved
• 2 Half-solved
• 5 Open


Solved  Problem:  
Performance
1990s: first implementations with useful
performance appeared (Rampart, SecureRing)
1999: Castro & Liskov's PBFT
2000s: PBFT-like protocols with better
performance under certain favorable conditions

[Timeline: PBFT (1999) → Zyzzyva (2007) → Next 700 BFT (2010), pushing
toward minimal latency and maximal throughput]


Solved  Problem:  
Resource  Efficiency
• Separating agreement from execution
o 3f+1 replicas for ordering requests
o 2f+1 replicas for executing requests
o f+1 exec. replicas may be sufficient with VMs
• Trusted components (e.g., TPM)
o Agreement with 2f+1 replicas (instead of 3f+1)

[Figure: PBFT → MinBFT — minimal number of replicas, communication
steps, and trusted component]
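A tiny helper makes the replica-count arithmetic above explicit (an illustrative sketch of the standard bounds from this slide, not part of any library):

```java
public class ReplicaBounds {

    // Replicas needed to order requests in classic BFT SMR (e.g., PBFT).
    static int agreementReplicas(int f) {
        return 3 * f + 1;
    }

    // Replicas needed to execute requests once agreement is
    // separated from execution.
    static int executionReplicas(int f) {
        return 2 * f + 1;
    }

    // Replicas needed for agreement when a trusted component
    // (e.g., a TPM-backed counter, as in MinBFT) is available.
    static int agreementWithTrustedComponent(int f) {
        return 2 * f + 1;
    }

    public static void main(String[] args) {
        int f = 1; // tolerate a single Byzantine fault
        System.out.println(agreementReplicas(f));             // prints 4
        System.out.println(executionReplicas(f));             // prints 3
        System.out.println(agreementWithTrustedComponent(f)); // prints 3
    }
}
```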

Solved  Problem:  
Recovery
• Problem with tolerating f faults:
o If an intelligent adversary is able to compromise f machines, given
enough time, he/she will compromise f+1 (or more)

• Solution: Periodic (Proactive) Recovery


o Replicas (compromised or not) are cleaned periodically

• Requires a trusted real-time component


Half-solved Problem:
Diversity
• f-fault-tolerant replicated systems are
useful only if faults are not correlated
• It usually requires diverse replicas
o Different administrative domains
o N-version programming (effective?)
o Obfuscation, Memory randomization
(effective?)
o Use of different components like databases,
file systems, and operating systems is effective!
• What about deploying diversity?


Half-solved Problem:
Robust  Performance  of  BFT
• BFT replication is
o very efficient in favorable conditions
o very inefficient in unfavorable conditions
• What about a balance?
o efficient enough in most conditions
• Design principles (Prime, Aardvark, AQS)
o No complex optimizations
o Use public-key crypto if needed
o Exploit application semantics for optimizations


Open  Problems:  
Intrusion  Reaction
• Most BFT protocols only tolerate faults and
don’t take actions against malicious replicas
(other than what is required for correctness)
• In practice, replica behavior needs to be
monitored and recovery actions need to be
executed if intrusions are detected
• Research question: Given the specification
of a protocol, how to automatically detect
misbehaviors and react to them?


Open  Problems:  
Time-bounded State Transfer
• Recall that the window of vulnerability of an
intrusion-tolerant system is bounded by T
o Every T time units all replicas are rejuvenated
o Every replica must take no more than T/n time units to
recover itself, i.e., take the following steps:
• Shutdown
• Choose a clean (and different) OS image
• Boot
• Fetch and validate service state
• Research question: How to bound the last step?


Open  Problems:  
Diversity  Management
• Research question: Assume we have a pool
of diverse configurations for the system
replicas, how to choose the best set?
o The idea is to minimize the number of shared
vulnerabilities/bugs among any two replicas
o This is even more complicated if replicas change
at runtime
• Besides that, diversity means management
of complexity. How to deal with it?


Open  Problems:  
Confidential  Operation

[Figure: clients issue store(k,v) and read(k) requests to a set of servers]

• One intrusion → Data leaked


• Threshold crypto/secret sharing help in some cases,
e.g., storage systems (Bessani et al, EuroSys’08)
• Homomorphic crypto can be a solution
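To see why secret sharing helps, here is a toy n-out-of-n XOR scheme: a single compromised server learns nothing from its share, but combining all shares recovers the value. Real systems use threshold schemes (e.g., Shamir's), where any f+1 of n shares suffice; this simple sketch needs all n:

```java
import java.security.SecureRandom;

public class XorSharing {

    // Split secret into n shares: n-1 random pads, plus the XOR of
    // the secret with all pads. Any subset of n-1 shares is
    // indistinguishable from random data.
    static byte[][] split(byte[] secret, int n) {
        SecureRandom rnd = new SecureRandom();
        byte[][] shares = new byte[n][secret.length];
        byte[] last = secret.clone();
        for (int i = 0; i < n - 1; i++) {
            rnd.nextBytes(shares[i]);
            for (int j = 0; j < secret.length; j++)
                last[j] ^= shares[i][j];
        }
        shares[n - 1] = last;
        return shares;
    }

    // Reconstruct by XORing all shares together.
    static byte[] combine(byte[][] shares) {
        byte[] out = new byte[shares[0].length];
        for (byte[] s : shares)
            for (int j = 0; j < out.length; j++)
                out[j] ^= s[j];
        return out;
    }
}
```

Each server would store one share of the value, so the figure's "one intrusion → data leaked" scenario no longer applies.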


Open  Problems:  
Graceful  Degradation
• Our intrusion tolerance definition is very strict (all-or-
nothing)
• Research question: How to specify degraded
behaviors for intrusion tolerant systems in general?
• Examples: What if …
o … there are more than f faulty replicas?
o … the system is completely asynchronous?


SMR  Programming  Model


• Basic client-server synchronous RPC

Client side:
    reply = invoke(command);

Server side:
    execute(command){
        // change state
        return reply;
    }


SMR  Programming  Model


• What about server-initiated communication?
o Client needs to poll the server for updates

• What about asynchronous RPC?


o Do a synchronous RPC at the client-side on a separate thread

• What about nested calls?


o Requires special support for the API

• What about multithreading?


o Remove it!
o The replication library provides nonces and timestamps for dealing
with other sources of non-determinism
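To make the last bullet concrete, here is a sketch of a deterministic service consuming an agreed-upon timestamp instead of the local clock; the OrderedRequest type is hypothetical, standing in for the per-request context a replication library (e.g., BFT-SMaRt's MessageContext) delivers:

```java
public class DeterministicClock {

    // Hypothetical stand-in for the context delivered with each
    // totally ordered request (cf. MessageContext in BFT-SMaRt).
    static class OrderedRequest {
        final byte[] command;
        final long agreedTimestamp; // same value at every replica
        OrderedRequest(byte[] command, long agreedTimestamp) {
            this.command = command;
            this.agreedTimestamp = agreedTimestamp;
        }
    }

    private long lastSeen = 0;

    // Wrong: reading System.currentTimeMillis() here would yield a
    // different value at each replica, making their states diverge.
    // Right: every replica applies the timestamp agreed during ordering,
    // so all replicas reach the same state.
    public long execute(OrderedRequest req) {
        lastSeen = req.agreedTimestamp;
        return lastSeen;
    }
}
```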


BFT-SMaRt
• Started in 2006, as a Byzantine Paxos
implementation on the Neko simulator
• Later extended to be the replication layer of
DepSpace (Bessani et al, EuroSys’08)
• Currently used/maintained by researchers in
Portugal, Brazil and Germany
• Sponsored by:

FCT Fundação para a Ciência e a Tecnologia


BFT-SMaRt
• BFT-SMaRt design principles:
o Java-based (for security and correctness reasons)
o No optimizations that bring complexity
o Modularity
o Features: Extensible API, State Management, Reconfiguration

• Implements a protocol very similar to PBFT, but modular:
Mod-SMaRt (Sousa & Bessani, EDCC’12)

Dealing with Complexity

[Figure: features vs. SMR complexity (LoCs & module dependencies),
with a "we are here!" marker]

- Java instead of C++
- Avoid overcomplicated optimizations
- Number of lines of code: 8399
  (PBFT: ~20K LoCC; UpRight: ~22K LoJC)
- Number of classes/interfaces: 90

Modularity

© Alysson Bessani. All rights reserved. 125

BFT-SMaRt Replica Architecture

[Figure: replica architecture — messages are received from secure sockets,
signatures are verified on arrival, requests flow through the Protocol Core
in numbered steps (1-8), timers trigger regency changes, and the final step
executes the operation; outgoing messages are sent back through secure sockets]

BFT-SMaRt Software
• It is a library (.jar file) that must be linked with the
client and the servers…
• There is no service/component that must be
deployed or managed besides the BFT client and
server
• Available at http://code.google.com/p/bft-smart/
• Current version: 0.7
o Many disruptive features are being integrated in the code
o API changes will happen
o Bugs remain
o Any help is welcome!


BFT-SMaRt Software

[Class diagram: CounterClient (+main()) uses ServiceProxy (+invoke());
CounterServer (+main()) uses ServiceReplica (+constructor()); both read
their configuration; ServiceProxy and ServiceReplica are provided by
BFT-SMaRt.jar]

Configuration
• A directory containing three things
o The keys directory, with process i's privatekeyi file and the publickeyj file of
every other process j
• In the future these keys will go to keystores/truststores

o hosts.config: IP:port of the n replicas


#id address port (0 to n-1 are replicas)
0 127.0.0.1 11000
1 127.0.0.1 11010
2 127.0.0.1 11020
3 127.0.0.1 11030

o Do not use consecutive ports (each replica uses its port p, plus p+1)


Configuration
• system.config: a Java properties file containing the
system parameters
system.authentication.hmacAlgorithm = HmacSHA1
system.servers.num = 4
system.servers.f = 1
system.totalordermulticast.timeout = 12000000
system.totalordermulticast.highMark = 10000
system.totalordermulticast.maxbatchsize = 400
system.totalordermulticast.verifyTimestamps = false
system.totalordermulticast.state_transfer = true
system.totalordermulticast.checkpoint_period = 50
system.totalordermulticast.revival_highMark = 10
system.communication.useSignatures = 0
system.communication.useMACs = 1
system.initial.view = 0,1,2,3
system.debug = 0


BFT-SMaRt Programming
• Client-side:
o ServiceProxy is the main class to be used
o Requests and replies are byte arrays (to avoid unnecessary overheads)

public class ServiceProxy extends ... {

    public ServiceProxy(int processId) ...
    public ServiceProxy(int processId,
            String configHome,
            Comparator<byte[]> replyComparator,
            Extractor replyExtractor) ...

    public byte[] invokeOrdered(byte[] request) ...
    public byte[] invokeUnordered(byte[] request) ...
    public void invokeAsynchronous(byte[] request,
            ReplyListener listener, int[] targets) ...
}
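Since requests and replies are raw byte arrays, the application does its own (de)serialization before calling invokeOrdered. A minimal sketch of encoding a key-value "put" command — the command format (opcode byte plus UTF strings) is invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CommandCodec {
    static final byte PUT = 0, GET = 1;

    // Encode a PUT(key, value) command into the byte[] that would be
    // handed to ServiceProxy.invokeOrdered(...).
    static byte[] encodePut(String key, String value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(PUT);
        out.writeUTF(key);
        out.writeUTF(value);
        return bos.toByteArray();
    }

    // Decode a command on the server side (e.g., inside executeOrdered).
    static String[] decode(byte[] cmd) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(cmd));
        byte op = in.readByte();
        if (op == PUT) return new String[] { "PUT", in.readUTF(), in.readUTF() };
        return new String[] { "GET", in.readUTF() };
    }
}
```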


BFT-SMaRt Programming
• Server-side:
o ServiceReplica is the main class
o It needs an implementation of Executable and Recoverable to work

public class ServiceReplica extends ... {

    public ServiceReplica(int id,
            Executable executor,
            Recoverable recover) ...
    public ServiceReplica(int id, String configHome,
            boolean isToJoin, Executable executor,
            Recoverable recover) ...

    public void leave() ...
    public ReplicaContext getReplicaContext() ...
}


BFT-SMaRt Programming
• Server-side (cont.):

public interface Executable {
    public byte[] executeUnordered(byte[] command,
            MessageContext msgCtx);
}

public interface SingleExecutable extends Executable {
    public byte[] executeOrdered(byte[] command,
            MessageContext msgCtx);
}

public interface BatchExecutable extends Executable {
    public byte[][] executeBatch(byte[][] command,
            MessageContext[] msgCtx);
}

public interface Recoverable {
    public byte[] getState();
    public void setState(byte[] state);
}


Creating an In-Memory
KV-Store with BFT-SMaRt
• Download BFT-SMaRt 0.7 from
http://code.google.com/p/bft-smart
• (optional) Create a project in your favorite Java IDE
and add dist/BFT-SMaRt.jar and other lib/*.jar to it
• Create a KVMessage class to represent the
messages exchanged between clients and the
replicas
• Create a KVServer class implementing the
SingleExecutable and Recoverable interfaces, and
using ServiceReplica
• Create a KVClient class using ServiceProxy
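A sketch of the server-side logic for such a KVServer, shown here as a standalone class so it compiles without the library; in the real KVServer these methods would back the SingleExecutable.executeOrdered and Recoverable.getState/setState callbacks, with a KVMessage class doing the byte[] (de)serialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class KVLogic {

    private Map<String, String> table = new HashMap<>();

    // Deterministic command execution: replicas applying the same
    // commands in the same order end up with identical tables.
    public String put(String k, String v) { return table.put(k, v); }
    public String get(String k) { return table.get(k); }

    // Serialize the whole table (Recoverable.getState analogue),
    // used for checkpoints and state transfer to recovering replicas.
    public byte[] getState() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(new HashMap<>(table));
        }
        return bos.toByteArray();
    }

    // Install a transferred state (Recoverable.setState analogue).
    @SuppressWarnings("unchecked")
    public void setState(byte[] state) throws Exception {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(state))) {
            table = (Map<String, String>) in.readObject();
        }
    }
}
```

Java serialization is used here only for brevity; a production service would prefer an explicit wire format.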


References
• Kubiatowicz et al. OceanStore: An Architecture for Global-scale Persistent
Storage. ASPLOS’00
• Adya et al. FARSITE: Federated, Available, and Reliable Storage for an
Incompletely Trusted Environment. OSDI’02
• Vandiver et al. Tolerating Byzantine Faults in Database Systems using Commit
Barrier Scheduling. SOSP’07
• Garcia et al. Efficient Middleware for Byzantine Fault-tolerant Database
Replication. EuroSys’11
• Bessani et al. DepSpace: A Byzantine Fault-tolerant Coordination Service.
EuroSys’08
• Christian Cachin and Asad Samar. Secure distributed DNS. DSN’04
• Fraga & Powell. A Fault- and Intrusion-Tolerant File System. IFIP SEC’85
• Bessani. From Byzantine Fault Tolerance to Intrusion Tolerance (A position
paper). WRAITS’11
• Sousa & Bessani. From Byzantine Consensus to BFT State Machine Replication:
A latency-optimal transformation. EDCC’12


http://www.di.fc.ul.pt/~bessani
http://code.google.com/p/bft-smart

EuroSys 2012
