PBFT Algorithm

Uploaded by Pradeep Bammidi

Byzantine Generals Problem

[Figure sequence: a commander sends an order to two lieutenants, Lieutenant-1 and Lieutenant-2.]

• A faulty commander can send "Attack" to one lieutenant and "Retreat" to the other.
• We need a consensus mechanism to decide who is faulty.
• The lieutenants exchange what the commander said ("Commander said: Attack" / "Commander said: Retreat"), but each now holds one "Attack" and one "Retreat" and cannot tell whether the commander or the other lieutenant is lying.
• The same conflicting view arises with a good commander and a faulty lieutenant who relays "Commander said: Retreat" after receiving "Attack".
• Can we reach consensus? No: consensus is NOT POSSIBLE with one commander and two lieutenants, when one is faulty.
Byzantine Generals Problem – Three Lieutenants

[Figure sequence: one commander and three lieutenants.]

• Faulty commander: the commander sends "Attack" to two lieutenants and "Retreat" to the third. The lieutenants then exchange the values they received, and each takes a majority vote.
• Every lieutenant ends up with a tally of Attack: 2, Retreat: 1, so all three decide "Attack" – consensus is reached despite the faulty commander.
• Good commander, faulty lieutenant: the commander sends "Attack" to all three lieutenants; the faulty lieutenant relays "Retreat" to the others. The good lieutenants still tally a majority for "Attack" (Attack: 2, Retreat: 1) – consensus is reached despite the faulty lieutenant.
• F faulty nodes – need 3F + 1 nodes to reach consensus
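The 3F + 1 bound can be checked with a little arithmetic: any two quorums of size 2F + 1 drawn from 3F + 1 replicas intersect in at least F + 1 replicas, so the intersection always contains at least one non-faulty replica. A minimal sketch (illustrative helper name, not from any library):

```python
# Sketch: why 3F + 1 replicas tolerate F Byzantine faults.
# Any two quorums of size 2F + 1 drawn from N = 3F + 1 replicas
# must overlap in at least F + 1 replicas, so their intersection
# always contains at least one non-faulty replica.

def quorum_overlap(n: int, quorum: int) -> int:
    """Minimum possible overlap of two quorums of the given size."""
    return max(0, 2 * quorum - n)

for f in range(1, 6):
    n = 3 * f + 1          # total replicas
    q = 2 * f + 1          # quorum size used by PBFT
    overlap = quorum_overlap(n, q)
    # The overlap exceeds F, so it contains an honest replica.
    assert overlap == f + 1
    print(f"F={f}: N={n}, quorum={q}, min overlap={overlap}")
```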

Asynchronous Byzantine Agreement


BFT Consensus

• Lamport-Shostak-Pease Algorithm*
  • Synchronous environment
  • Reliable communication channel
  • Fully connected network
  • Receivers always know the identity of the senders
  • These are unrealistic assumptions for real networks

• Many different variants of BFT consensus have emerged

• Practical Byzantine Fault Tolerance (PBFT)**
  • Uses cryptographic techniques to relax the unrealistic assumptions

* Lamport, Leslie, Robert Shostak, and Marshall Pease. "The Byzantine Generals Problem." ACM Transactions on Programming Languages and Systems 4.3 (1982): 382-401.
** Castro, Miguel, and Barbara Liskov. "Practical Byzantine Fault Tolerance." USENIX OSDI, 1999.
Practical Byzantine Fault Tolerance

• Why practical?
  • Considers an asynchronous environment (gives priority to safety over liveness)
  • Utilizes digital signatures to validate the identity of the senders
  • Low overhead
• Incorporated in a large number of distributed applications, including blockchain
  • Tendermint
  • Hyperledger Fabric
• Uses cryptographic techniques to make the messages tamper-proof


PBFT Overview

• Based on State Machine Replication
• Considers 3F + 1 replicas, where F is the maximum number of faulty replicas
• The replicas move through a succession of configurations, known as views
• One replica in a view acts as the primary (works like a leader), and the others are backups
• The primary proposes a value (similar to the Proposers in Paxos), and the backups accept the value (similar to the Paxos Acceptors)
• When the primary is detected as faulty, the view is changed – PBFT elects a new primary and a new view is initiated
• Every view is identified by a unique integer v
• Only messages from the current view are accepted
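The primary of a view is chosen deterministically; in the original PBFT paper the primary of view v is the replica with id p = v mod N, so view changes rotate the role. A minimal sketch of that rule:

```python
# Primary selection in PBFT: the primary of view v is the replica
# whose id equals v mod N, where N is the number of replicas.
# Each view change increments v, rotating the primary role.

def primary(view: int, n_replicas: int) -> int:
    return view % n_replicas

N = 4                      # 3F + 1 replicas with F = 1
assert primary(0, N) == 0  # replica 0 leads view 0
assert primary(1, N) == 1  # after one view change, replica 1 leads
assert primary(4, N) == 0  # the role wraps around
```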
• PBFT comprises three sub-protocols: normal operation, view change, and checkpointing.
• The normal operation sub-protocol is the scheme executed when everything is running normally and there are no errors in the system.
• The view change sub-protocol runs when a faulty leader (primary) node is detected in the system.
• The checkpointing sub-protocol is used to discard old data from the system.
• The normal operation protocol comprises three phases, which run in sequence to achieve consensus:
  pre-prepare, prepare, and commit.
The pre-prepare sub-protocol algorithm (run by the primary):
1. Accepts a request from the client.
2. Assigns the next sequence number.
3. Sends the pre-prepare message to all backup replicas.

The prepare sub-protocol algorithm (run by each backup):
1. Accepts the pre-prepare message, provided the backup has not already accepted a pre-prepare message for the same view and sequence number with a different digest.
2. Sends a prepare message to all replicas.

The commit sub-protocol algorithm (run by each replica):
1. Waits for 2F prepare messages with the same view, sequence number, and request.
2. Sends a commit message to all replicas.
3. Waits until 2F + 1 valid commit messages arrive and are accepted.
4. Executes the received request.
5. Sends a reply containing the execution result to the client.
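The prepare and commit counting above can be sketched as a small message-counting state machine. This is a simplified, non-networked sketch (signatures, validation, and timeouts omitted; all names are illustrative):

```python
from collections import defaultdict

F = 1  # maximum number of faulty replicas; N = 3F + 1

class Replica:
    """Minimal sketch of PBFT normal operation for one (view, seq) slot."""
    def __init__(self):
        self.prepares = defaultdict(set)   # (view, seq, digest) -> senders
        self.commits = defaultdict(set)
        self.prepared = set()
        self.executed = []

    def on_prepare(self, view, seq, digest, sender):
        key = (view, seq, digest)
        self.prepares[key].add(sender)
        # Prepared: matching pre-prepare (implicit here) + 2F prepares.
        if len(self.prepares[key]) >= 2 * F:
            self.prepared.add(key)

    def on_commit(self, view, seq, digest, sender):
        key = (view, seq, digest)
        self.commits[key].add(sender)
        # Committed: prepared + 2F + 1 matching commit messages.
        if (key in self.prepared
                and len(self.commits[key]) >= 2 * F + 1
                and key not in self.executed):
            self.executed.append(key)      # execute, then reply to client

r = Replica()
for sender in (1, 2):                      # 2F prepares from other backups
    r.on_prepare(0, 1, "d1", sender)
for sender in (0, 1, 2):                   # 2F + 1 commits
    r.on_commit(0, 1, "d1", sender)
print(r.executed)                          # [(0, 1, 'd1')]
```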
PBFT – Broad Idea

[Figure sequence: a client, a primary, and the backups.]

• The client sends a Request to the primary.
• The primary multicasts the Request to all backups.
• After executing the request, the replicas send their Reply messages back to the client.
• The client waits for F + 1 matching replies, where F is the maximum number of faulty nodes.
PBFT – The Algorithm

Request
• The protocol starts with the client C sending a Request message to the primary P.
• The primary collects all the Request messages from different clients and orders them based on certain pre-defined logic.

Pre-Prepare
• The primary assigns a sequence number n to the Request (or a set of Requests) and multicasts a message <<PRE-PREPARE, v, n, d>β_p, m> to all the backups (R1, R2, R3).
• Pre-prepare works as proof that the Request was assigned sequence number n for view v.

A backup accepts the Pre-prepare message if:
• The signature is correct and d is the digest of the message m
• The backup is in view v
• It has not received a different Pre-Prepare message with sequence number n and view v with a different message digest
• The sequence number is within a threshold (the message is not too old – prevents a replay attack)
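The acceptance conditions can be written as a simple predicate. This is a sketch only: the digest helper stands in for whatever cryptographic hash the deployment uses, and signature verification is reduced to a pre-computed flag:

```python
import hashlib

def digest(m: bytes) -> str:
    # Stand-in message digest (PBFT uses a cryptographic hash of m).
    return hashlib.sha256(m).hexdigest()

def accept_pre_prepare(msg, state, low_mark, high_mark):
    """Return True if a backup should accept a PRE-PREPARE message.

    msg:   dict with keys view, seq, digest, payload, sig_ok
    state: dict with keys current_view, accepted (accepted maps
           (view, seq) -> digest of an already-accepted pre-prepare)
    """
    if not msg["sig_ok"]:                          # signature must verify
        return False
    if msg["digest"] != digest(msg["payload"]):    # d must be digest of m
        return False
    if msg["view"] != state["current_view"]:       # backup must be in view v
        return False
    prior = state["accepted"].get((msg["view"], msg["seq"]))
    if prior is not None and prior != msg["digest"]:
        return False                               # conflicting pre-prepare
    # Sequence number within the watermarks (prevents replay of old messages).
    return low_mark < msg["seq"] <= high_mark

state = {"current_view": 0, "accepted": {}}
m = {"view": 0, "seq": 5, "payload": b"op", "sig_ok": True}
m["digest"] = digest(m["payload"])
print(accept_pre_prepare(m, state, 0, 100))        # True
```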
Prepare
• The correct backups send a Prepare message to all other backups including the primary – this works as proof that the backups agree on the message with sequence number n under view v.
• Message format for backup k: <PREPARE, v, n, d, k>β_k
PBFT – Three Phase Commit

• Pre-prepare and Prepare ensure that non-faulty replicas agree on a total order for the requests within a view.

• Assumptions for Commit:
  • Primary is non-faulty
  • There may be a maximum of f faults in total, including Crash + Network + Byzantine faults

• A message is committed if:
  • 2f Prepare messages from different backups match the corresponding Pre-prepare
  • You then have a total of 2f + 1 votes (one from the primary that you already have!) from the non-faulty replicas
Quorum in PBFT

• You have f faulty nodes – you need at least 3f + 1 replicas to reach consensus.
• But you do not know whether the faults are Crash faults, Network faults, or Byzantine faults.

• Case 1: All f are Crash or Network faulty – you will not receive messages from them!
  • You will receive 2f + 1 Prepare messages from non-faulty nodes.
  • All of these 2f + 1 are non-faulty votes – you can reach agreement.

• Case 2: All f are Byzantine faulty – they do send messages!
  • You may receive at most 3f + 1 Prepare messages (votes) – f of them from Byzantine nodes.
  • It is sufficient to wait for 2f + 1 Prepare messages – even if f of them are faulty, you still have f + 1 non-faulty votes.
  • You cannot wait for only f + 1: the first f might all be faulty.
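Case 2 can be checked numerically: waiting for 2f + 1 votes always guarantees f + 1 honest votes in the worst case, while waiting for only f + 1 guarantees just one. A small sketch (illustrative helper name):

```python
# Why PBFT waits for 2f + 1 matching Prepare votes rather than f + 1.
# Among any `collected` votes, at most f can come from Byzantine nodes,
# so at least `collected - f` are guaranteed honest.

def honest_votes_guaranteed(collected: int, f: int) -> int:
    """Worst-case number of honest votes among `collected` votes."""
    return max(0, collected - f)

for f in range(1, 5):
    assert honest_votes_guaranteed(2 * f + 1, f) == f + 1   # safe quorum
    assert honest_votes_guaranteed(f + 1, f) == 1           # too weak:
    # a single guaranteed honest vote cannot certify agreement.
```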
Commit
• Message format for replica k: <COMMIT, v, n, d, k>β_k
• The protocol is committed for a replica when:
  • It has sent the Commit message
  • It has received 2f Commit messages from other replicas

Reply
• After committing, each replica executes the request and sends a Reply with the execution result to the client.
View Change

• What if the primary is faulty?
  • Non-faulty replicas detect the fault
  • Replicas together start the view change operation
• The view-change protocol provides eventual liveness – it allows the system to make progress when the primary fails.
• If the primary fails, backups will either receive no messages or receive faulty messages from it.
• View changes are triggered by timeouts (a weak synchrony assumption), which prevent backups from waiting indefinitely for requests to execute.
View-change
• A view change occurs when the primary is suspected faulty.
• This phase is required to ensure protocol progress.
• With the view change sub-protocol, a new primary is selected, which then starts normal mode operation again.
• The new primary is selected in a round-robin fashion.
• When a backup replica receives a request, it tries to execute it after validating the message; if for some reason it does not execute the request for a while, the replica times out and initiates the view change sub-protocol.
• In the view change protocol, the replica stops accepting messages related to the current view and updates its state to view-change.
• The only messages it can receive in this state are checkpoint messages, view-change messages, and new-view messages.
• After that, it sends a view-change message with the next view number to all replicas.
• When this message arrives at the new primary, the primary waits for at least 2F view-change messages for the next view. Once at least 2F view-change messages are received, it broadcasts a new-view message to all replicas and progresses toward running normal operation mode once again.
• When the other replicas receive the new-view message, they update their local state accordingly and start normal operation mode.
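The new primary's side of this exchange reduces to counting view-change messages for the proposed view. A minimal, non-networked sketch (illustrative names, no real library):

```python
F = 1  # maximum faulty replicas; N = 3F + 1 = 4

class NewPrimary:
    """Sketch of a new primary collecting VIEW-CHANGE messages."""
    def __init__(self, next_view: int):
        self.next_view = next_view
        self.senders = set()
        self.broadcast_new_view = False

    def on_view_change(self, view: int, sender: int):
        if view != self.next_view:
            return                          # ignore messages for other views
        self.senders.add(sender)
        # 2F view-change messages (plus the new primary itself) suffice.
        if len(self.senders) >= 2 * F and not self.broadcast_new_view:
            self.broadcast_new_view = True  # would multicast NEW-VIEW here

p = NewPrimary(next_view=1)
p.on_view_change(1, sender=2)
assert not p.broadcast_new_view     # only one view-change message so far
p.on_view_change(1, sender=3)
assert p.broadcast_new_view         # 2F = 2 messages reached
```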
The checkpoint sub-protocol

• It is used to discard old messages in the log of all replicas.


• With this, the replicas agree on a stable checkpoint that provides a snapshot of
the global state at a certain point in time.
• This is a periodic process carried out by each replica after executing the request
and marking that as a checkpoint in its log.
• A variable called low watermark (in PBFT terminology) is used to record the
sequence number of the last stable checkpoint.
• This checkpoint is then broadcast to other nodes. As soon as a replica has at
least 2F+1 checkpoint messages, it saves these messages as proof of a stable
checkpoint.
• It discards all previous pre-prepare, prepare, and commit messages from its
logs.
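The stable-checkpoint rule above can be sketched as: once 2F + 1 matching checkpoint messages are collected for sequence number n, the low watermark advances to n and older log entries are discarded. An illustrative sketch (names are not from any real implementation):

```python
from collections import defaultdict

F = 1  # maximum faulty replicas

class CheckpointLog:
    """Sketch of PBFT checkpointing and log garbage collection."""
    def __init__(self):
        self.low_watermark = 0                  # last stable checkpoint
        self.log = {}                           # seq -> logged messages
        self.checkpoint_votes = defaultdict(set)

    def on_checkpoint(self, seq: int, state_digest: str, sender: int):
        self.checkpoint_votes[(seq, state_digest)].add(sender)
        # 2F + 1 matching checkpoint messages form a stable checkpoint.
        if len(self.checkpoint_votes[(seq, state_digest)]) >= 2 * F + 1:
            self.low_watermark = max(self.low_watermark, seq)
            # Discard pre-prepare/prepare/commit messages at or below seq.
            self.log = {s: m for s, m in self.log.items() if s > seq}

cl = CheckpointLog()
cl.log = {1: "batch1", 2: "batch2", 3: "batch3"}
for sender in (0, 1, 2):                        # 2F + 1 = 3 votes
    cl.on_checkpoint(2, "digest-at-2", sender)
print(cl.low_watermark, sorted(cl.log))         # 2 [3]
```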
Istanbul Byzantine Fault Tolerance

In the traditional client-server model, PBFT works well; however, in the case of blockchain, directly implementing PBFT in its original form may not work correctly, because PBFT's original design was not developed for blockchain.

The differences between PBFT and IBFT
Let's first discuss the primary differences between the PBFT and IBFT protocols. They are as follows:

• There is no distinct concept of a client in IBFT. Instead, the proposer can be seen as a client, and in fact all validators can be considered clients.
• IBFT has dynamic validators, in contrast with the original PBFT, where the nodes are static. In IBFT, validators can be voted in and out as required.
• There are two types of participants in an IBFT network: nodes and validators. Nodes are synchronized with the blockchain without participating in the IBFT consensus process, whereas validators are the nodes that participate in the IBFT consensus process.
• IBFT relies on a more straightforward structure of view-change (round change) messages as compared to PBFT.
• In contrast with PBFT, in IBFT there is no concrete concept of checkpoints. However, each block can be considered an indicator of the progress so far (the chain height).
• There is no concept of garbage collection in IBFT.
Consensus states

• New round: In this state, a new round of the consensus mechanism starts, and the selected proposer sends a new block proposal to the other validators. In this state, all other validators wait for the PRE-PREPARE message.

• Pre-prepared: A validator transitions to this state when it has received a PRE-PREPARE message and broadcasts a PREPARE message to other validators. The validator then waits for 2F + 1 PREPARE or COMMIT messages.

• Prepared: This state is achieved by a validator when it has received 2F + 1 PREPARE messages and has broadcast the COMMIT messages. The validator then awaits 2F + 1 COMMIT messages to arrive from other validators.

• Committed: This state indicates that a validator has received 2F + 1 COMMIT messages. The validator at this stage can insert the proposed block into the blockchain.

• Final committed: This state is achieved by a validator when the newly committed block is inserted successfully into the blockchain. At this state, the validator is also ready for the next round of consensus.

• Round change: This state indicates that the validators are waiting for 2F + 1 round change messages to arrive for the newly proposed round number.
1. The protocol starts with a new round. In the new round, the selected proposer
broadcasts a proposal (block) as a pre-prepare message.

2. The nodes that receive this pre-prepare message validate the message and accept it if it
is a valid message. The nodes also then set their state to pre-prepared.

3. At this stage, if a timeout occurs, or a proposal is seen as invalid by the nodes, they will
initiate a round change. The normal process then begins again with a proposer,
proposing a block.

4. Nodes then broadcast the prepare message and wait for 2F+1 prepare messages to be
received from other nodes. If the nodes do not receive 2F+1 messages in time, then they
time out, and the round change process starts. The nodes then set their state to prepared
after receiving 2F+1 messages from other nodes.
5. Finally, the nodes broadcast a commit message and wait for 2F+1 messages to arrive
from other nodes. If they are received, then the state is set to committed; otherwise, a
timeout occurs and the round change process starts.

6. Once committed, block insertion is tried. If it succeeds, the protocol proceeds to the
final committed state and, eventually, a new round starts. If insertion fails for some
reason, the round change process triggers. Again, nodes wait for 2F+1 round change
messages, and if the threshold of the messages is received, then round change occurs.
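The round flow above is essentially a small state machine. A simplified sketch of one validator's view (timeouts and block validation stubbed out; all names are illustrative):

```python
F = 1
QUORUM = 2 * F + 1

class IBFTValidator:
    """Sketch of one IBFT validator's state transitions in a round."""
    def __init__(self):
        self.state = "new_round"
        self.prepares = set()
        self.commits = set()

    def on_pre_prepare(self, block_valid: bool):
        if self.state == "new_round" and block_valid:
            self.state = "pre_prepared"       # broadcast PREPARE here
        elif not block_valid:
            self.state = "round_change"       # invalid proposal: round change

    def on_prepare(self, sender: int):
        self.prepares.add(sender)
        if self.state == "pre_prepared" and len(self.prepares) >= QUORUM:
            self.state = "prepared"           # broadcast COMMIT here

    def on_commit(self, sender: int):
        self.commits.add(sender)
        if self.state == "prepared" and len(self.commits) >= QUORUM:
            self.state = "committed"          # try block insertion here

v = IBFTValidator()
v.on_pre_prepare(block_valid=True)
for s in range(QUORUM):
    v.on_prepare(s)
for s in range(QUORUM):
    v.on_commit(s)
print(v.state)    # committed
```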
Tendermint

Tendermint is another variant of PBFT. It was inspired by both the DLS and PBFT protocols.
Tendermint also makes use of the SMR approach to provide consensus.

In Tendermint, the safety and liveness properties consist of agreement, termination, and validity. These properties are defined as follows:
• Agreement: No two correct processes decide on different values.
• Termination: All correct processes eventually decide on a value.
• Validity: A decided-upon value is valid, that is, it satisfies the predefined predicate denoted valid().
HotStuff
HotStuff is a more recent class of BFT protocols with a number of optimizations. It was introduced by VMware Research in 2018.
There are several changes in HotStuff that make it a different and, in some ways, better protocol than traditional PBFT.

There are three key properties that HotStuff addresses. These properties are listed as follows:

Linear view change


Linear view change results in reduced communication complexity.

Optimistic responsiveness
Optimistic responsiveness ensures that any correct leader after GST is reached only requires the
first N - F responses to ensure progress.

Chain quality
This property ensures fairness and liveness in the system by allowing fast and frequent leader
rotation.
In comparison with traditional PBFT, HotStuff has introduced several changes, which result in
improved performance:

• PBFT-style protocols work using a mesh communication topology, where each message is
required to be broadcast to other nodes on the network.

• In HotStuff, the communication has been changed to the star topology, which means that nodes do
not communicate with each other directly, but all consensus messages are collected by a leader
and then broadcast to other nodes.

• This immediately results in reduced communication complexity.
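The complexity difference can be illustrated with a quick count: an all-to-all (mesh) voting round costs on the order of N² messages, while a leader-based (star) round costs on the order of N. A small sketch (illustrative helper names):

```python
# Messages per voting round: mesh (all-to-all) vs. star (leader-based).
# In a mesh, every node sends its vote to every other node.
# In a star, nodes send votes to the leader, which aggregates them
# (e.g., into a quorum certificate) and broadcasts one message back.

def mesh_messages(n: int) -> int:
    return n * (n - 1)          # every node -> every other node

def star_messages(n: int) -> int:
    return (n - 1) + (n - 1)    # votes to leader + leader's broadcast

for n in (4, 16, 64):
    print(n, mesh_messages(n), star_messages(n))
# For n = 64: 4032 mesh messages vs. 126 star messages.
```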


A question arises here: what happens if the leader somehow is corrupt or
compromised? This issue is solved by the same BFT tolerance rules
where, if a leader proposes a malicious block, it will be rejected by other
honest validators and a new leader will be chosen. This scenario can slow
down the network for a limited time (until a new honest leader is chosen),
but eventually (as long as a majority of the network is honest), an honest
leader will be chosen, which will propose a valid block. Also, for further
protection, usually, the leader role is frequently (usually, with each block)
rotated between validators, which can neutralize any malicious attacks
targeting the network. This property ensures fairness, which helps to
achieve chain quality, introduced previously.
How HotStuff works

HotStuff works in phases, namely the prepare phase, pre-commit phase, commit phase, and decide phase.

Prepare:
• Once a new leader has collected new-view messages from N - F nodes, the protocol for the new leader starts.
• The leader collects and processes these messages to figure out the latest branch in which the highest quorum certificate of prepare messages was formed.

Pre-commit:
• As soon as the leader receives N - F prepare votes, it creates a quorum certificate called the "prepare quorum certificate."
• This "prepare quorum certificate" is broadcast to other nodes as a PRE-COMMIT message.
• When a replica receives the PRE-COMMIT message, it responds with a pre-commit vote.
• The quorum certificate is the indication that the required threshold of nodes has confirmed the message.

Commit:
• When the leader receives N - F pre-commit votes, it creates a pre-commit quorum certificate and broadcasts it to other nodes as the COMMIT message.
• When replicas receive this COMMIT message, they respond with their commit vote.
• At this stage, replicas lock the pre-commit quorum certificate to ensure the safety of the algorithm even if a view change occurs.

Decide:
• When the leader receives N - F commit votes, it creates a COMMIT quorum certificate.
• This COMMIT quorum certificate is broadcast to other nodes in the DECIDE message.
• When replicas receive this DECIDE message, they execute the request, because this message contains an already committed certificate/value.
• Once the state transition occurs as a result of the DECIDE message being processed by a replica, the new view starts.
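Each phase follows the same vote-collect-certify pattern, which can be sketched generically. The names below are illustrative; real HotStuff certificates aggregate cryptographic signatures (e.g., threshold signatures), which is omitted here:

```python
N, F = 4, 1   # N = 3F + 1 replicas

class Leader:
    """Sketch of HotStuff's repeated vote -> quorum-certificate step."""
    def __init__(self):
        self.votes = {}   # phase -> set of voters

    def on_vote(self, phase: str, sender: int):
        self.votes.setdefault(phase, set()).add(sender)

    def quorum_certificate(self, phase: str):
        voters = self.votes.get(phase, set())
        if len(voters) >= N - F:          # N - F matching votes form a QC
            return {"phase": phase, "voters": sorted(voters)}
        return None

leader = Leader()
for phase in ("prepare", "pre-commit", "commit"):
    for sender in range(N - F):           # collect N - F votes per phase
        leader.on_vote(phase, sender)
    qc = leader.quorum_certificate(phase)
    assert qc is not None                 # QC formed; broadcast next phase
print(leader.quorum_certificate("commit"))
# {'phase': 'commit', 'voters': [0, 1, 2]}
```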
