PBFT Algorithm
Byzantine Generals Problem
[Figure: one commander and two lieutenants exchanging Attack/Retreat orders; each lieutenant also relays to the other what the commander said.]
• A faulty commander sends Attack to one lieutenant and Retreat to the other. Each loyal lieutenant then sees "Commander said: Attack" from one source and "Commander said: Retreat" from the other, and cannot tell who is faulty.
• We need a consensus mechanism to decide who is faulty.
• Can we reach a consensus? With a good commander who orders Attack, a faulty lieutenant can relay "Commander said: Retreat," so the loyal lieutenant again faces conflicting reports.
• Consensus is NOT POSSIBLE with one commander and two lieutenants, when one is faulty.
Byzantine Generals Problem – Three Lieutenants
[Figure: one commander and three lieutenants; each lieutenant relays the order it received to the other two and decides by majority vote.]
• A faulty commander sends Attack, Attack, and Retreat to the three lieutenants. After exchanging the received orders, each loyal lieutenant tallies Attack: 2, Retreat: 1 and decides to Attack – all loyal lieutenants agree despite the faulty commander.
• With a good commander who sends Attack to all three, a faulty lieutenant may relay Retreat. The loyal lieutenants still tally a majority for Attack (Attack: 3, Retreat: 0 or Attack: 2, Retreat: 1). Consensus is reached!
• F faulty nodes – need 3F + 1 nodes to reach consensus
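The majority-vote scenario above can be simulated with a short script: a (possibly faulty) commander sends an order to each lieutenant, the lieutenants relay what they received to each other, and every lieutenant decides by majority vote. This is a minimal sketch; the function and parameter names are illustrative, not from the original papers.

```python
from collections import Counter

def byzantine_round(orders_from_commander, faulty_relay=None):
    """orders_from_commander: order sent to each lieutenant (index = lieutenant id).
    faulty_relay: id of a lieutenant that relays the opposite order, or None."""
    flip = {"Attack": "Retreat", "Retreat": "Attack"}
    n = len(orders_from_commander)
    decisions = []
    for i in range(n):
        # Lieutenant i's view: its own order plus what each peer claims it received.
        votes = [orders_from_commander[i]]
        for j in range(n):
            if j == i:
                continue
            relayed = orders_from_commander[j]
            if j == faulty_relay:  # a faulty lieutenant lies to its peers
                relayed = flip[relayed]
            votes.append(relayed)
        decisions.append(Counter(votes).most_common(1)[0][0])
    return decisions

# Faulty commander sends Attack, Attack, Retreat: all lieutenants still decide Attack.
print(byzantine_round(["Attack", "Attack", "Retreat"]))
# Good commander, one faulty lieutenant (id 2) relaying the opposite order.
print(byzantine_round(["Attack", "Attack", "Attack"], faulty_relay=2))
```

In both runs, the loyal lieutenants reach the same decision because the majority of the three votes each one collects is honest.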
BFT Consensus
• Lamport-Shostak-Pease Algorithm*
• Synchronous environment
• Reliable communication channel
• Fully connected network
• Receivers always know the identity of the senders
(The synchronous environment and reliable communication channel are unrealistic assumptions for real networks.)

* Lamport, Leslie, Robert Shostak, and Marshall Pease. "The Byzantine Generals Problem." ACM Transactions on Programming Languages and Systems 4.3 (1982): 382-401.
Practical Byzantine Fault Tolerance
• Why Practical?
• Considers an asynchronous environment (gives priority to Safety over Liveness)
• Utilizes digital signatures to validate the identity of the senders
• Low overhead

** Castro, Miguel, and Barbara Liskov. "Practical Byzantine Fault Tolerance." USENIX OSDI, Vol. 99, 1999.
PBFT Overview
• The replicas move through a succession of configurations, known as views
• One replica in a view is considered as the primary (works like a leader), and others are
considered backups
• The primary proposes a value (similar to the Proposers in Paxos), and the backups
accept the value (similar to the Paxos Acceptors)
• When the primary is detected as faulty, the view is changed – PBFT elects a new
primary and a new view is initiated
• Every view is identified by a unique integer v
• Only the messages from the current view are accepted
• PBFT comprises three sub-protocols called normal operation, view
change, and checkpointing.
• The normal operation sub-protocol is the scheme that is executed when everything is
running normally and there are no errors in the system.
• View change is a sub-protocol that runs when a faulty leader node is
detected in the system.
• Checkpointing is another sub-protocol, which is used to discard the
old data from the system.
• The PBFT protocol comprises three phases or steps.
• These phases run in a sequence to achieve consensus.
• These phases are:
pre-prepare,
prepare,
commit.
The pre-prepare sub-protocol algorithm:
1. Accepts a request from the client.
2. Assigns the next sequence number.
3. Sends the pre-prepare message to all backup
replicas.
The prepare sub-protocol algorithm:
1. Accepts the pre-prepare message. If the backup has not accepted any
pre-prepare messages for the same view or sequence number, then it
accepts the message.
2. Sends the prepare message to all replicas.
The commit sub-protocol algorithm:
1. The replica waits for 2F prepare messages with the same view, sequence, and
request.
2. Sends a commit message to all replicas.
3. Waits until a 2F + 1 valid commit message arrives and is accepted.
4. Executes the received request.
5. Sends a reply containing the execution result to the client.
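The three sub-protocol algorithms above can be sketched as a single replica class that counts matching messages. This is a minimal illustration under stated assumptions, not a faithful implementation: view changes, checkpointing, signatures, and message validation are omitted, and the names (`Replica`, `F`, the message tuples) are illustrative.

```python
F = 1                       # assumed number of faulty replicas
N = 3 * F + 1               # total replicas needed to tolerate F faults

class Replica:
    def __init__(self, is_primary=False):
        self.is_primary = is_primary
        self.view, self.seq = 0, 0
        self.prepares, self.commits = {}, {}
        self.log = []

    def on_request(self, request):
        # Pre-prepare: only the primary assigns the next sequence number
        # and broadcasts PRE-PREPARE to all backup replicas.
        assert self.is_primary
        self.seq += 1
        return ("PRE-PREPARE", self.view, self.seq, request)

    def on_pre_prepare(self, view, seq, request):
        # Prepare: a backup accepts the first pre-prepare for (view, seq)
        # and broadcasts PREPARE to all replicas.
        return ("PREPARE", view, seq, request)

    def on_prepare(self, view, seq, request):
        # Commit: once 2F matching PREPAREs arrive, broadcast COMMIT.
        key = (view, seq, request)
        self.prepares[key] = self.prepares.get(key, 0) + 1
        if self.prepares[key] == 2 * F:
            return ("COMMIT", view, seq, request)

    def on_commit(self, view, seq, request):
        # Execute after 2F + 1 matching COMMITs; reply to the client.
        key = (view, seq, request)
        self.commits[key] = self.commits.get(key, 0) + 1
        if self.commits[key] == 2 * F + 1:
            self.log.append(request)
            return ("REPLY", request)
```

A replica only advances a phase when the required count of matching messages (2F prepares, then 2F + 1 commits) has been reached, which is exactly the waiting described in steps 1 and 3 of the commit sub-protocol.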
PBFT – The Algorithm
[Figure: a client C sends a Request to the primary P; P broadcasts a Pre-Prepare message to the backups R1, R2, R3; the backups then exchange Prepare messages with each other and the primary.]
• The protocol starts by the client sending a Request message to the primary.
• The primary collects all the Request messages from different clients and orders them based on certain pre-defined logic.
• The correct backups send a Prepare message to all other backups including the primary – this works as proof that the backups agree on the message with the sequence number n under view v.
Quorum in PBFT
• You have f faulty nodes – you need at least 3f + 1 replicas to reach consensus.
• But you do not know whether those are crash faults, network faults, or Byzantine faults.
• Case 1: All f are crash or network faulty – you'll not receive messages from them!
• You'll receive 2f + 1 Prepare messages from non-faulty nodes.
• All these 2f + 1 are non-faulty votes – you can reach an agreement.
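The quorum arithmetic can be checked directly: with f faulty replicas out of n = 3f + 1, a quorum of 2f + 1 always contains a majority of honest replicas, and any two quorums intersect in at least f + 1 replicas, so at least one honest replica is in both. A small sketch (function name is illustrative):

```python
def quorum_sizes(f):
    n = 3 * f + 1           # replicas needed to tolerate f Byzantine faults
    q = 2 * f + 1           # quorum size
    overlap = 2 * q - n     # minimum intersection of any two quorums = f + 1
    return n, q, overlap

for f in (1, 2, 3):
    n, q, overlap = quorum_sizes(f)
    # overlap = f + 1 > f, so two quorums always share at least one honest replica
    print(f"f={f}: n={n}, quorum={q}, min overlap={overlap}")
```

Because the overlap of f + 1 exceeds the number of faulty replicas f, two conflicting quorums can never both form, which is what makes agreement safe.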
PBFT – The Algorithm
[Figure: the primary P is faulty; the backups receive no messages, or faulty messages, from it.]
• If the primary fails, backups will not receive any message or will receive faulty messages from the primary.
In the traditional client-server model, PBFT works well; however, in the case of blockchain, directly
implementing PBFT in its original form may not work correctly, because PBFT was not originally
designed for blockchain.
The differences between PBFT and IBFT
Let's first discuss the primary differences between the PBFT and IBFT protocols. They are as
follows:
• There is no distinctive concept of a client in IBFT. Instead, the proposer can be seen as a client, and in
fact, all validators can be considered clients.
• There is a concept of dynamic validators, in contrast with the original PBFT, where the nodes are
static; in IBFT, the validators can be voted in and out as required.
• There are two types of nodes in an IBFT network: regular nodes and validators. Regular nodes are
synchronized with the blockchain without participating in the IBFT consensus process.
In contrast, validators are the nodes that participate in the IBFT consensus process.
• IBFT relies on a more straightforward structure of view-change (round change) messages as compared
to PBFT.
• In contrast with PBFT, in IBFT there is no concrete concept of checkpoints. However, each block can be
considered an indicator of the progress so far (the chain height).
• There is no concept of garbage collection in IBFT.
Consensus states
• New round: In this state, a new round of the consensus mechanism starts, and the selected proposer
sends a new block proposal to other validators. In this state, all other validators wait for the PRE-
PREPARE message.
• Pre-prepared: A validator transitions to this state when it has received a PRE-PREPARE message and
broadcasts a PREPARE message to other validators. The validator then waits for 2F + 1 PREPARE or
COMMIT messages.
• Prepared: This state is achieved by a validator when it has received 2F+1 prepare messages and has
broadcast the commit messages. The validator then awaits 2F+1 commit messages to arrive from other
validators.
• Committed: The state indicates that a validator has received 2F+1 COMMIT messages. The validator at
this stage can insert the proposed block into the blockchain.
• Final committed: This state is achieved by a validator when the newly committed block is inserted
successfully into the blockchain. At this state, the validator is also ready for the next round of
consensus.
• Round change: This state indicates that the validators are waiting for 2F+1 round change messages to
arrive for the newly proposed round number.
1. The protocol starts with a new round. In the new round, the selected proposer
broadcasts a proposal (block) as a pre-prepare message.
2. The nodes that receive this pre-prepare message validate the message and accept it if it
is a valid message. The nodes also then set their state to pre-prepared.
3. At this stage, if a timeout occurs, or a proposal is seen as invalid by the nodes, they will
initiate a round change. The normal process then begins again with a proposer,
proposing a block.
4. Nodes then broadcast the prepare message and wait for 2F+1 prepare messages to be
received from other nodes. If the nodes do not receive 2F+1 messages in time, then they
time out, and the round change process starts. The nodes then set their state to prepared
after receiving 2F+1 messages from other nodes.
5. Finally, the nodes broadcast a commit message and wait for 2F+1 messages to arrive
from other nodes. If they are received, then the state is set to committed, otherwise,
timeout occurs and the round change process starts.
6. Once committed, block insertion is tried. If it succeeds, the protocol proceeds to the
final committed state and, eventually, a new round starts. If insertion fails for some
reason, the round change process triggers. Again, nodes wait for 2F+1 round change
messages, and if the threshold of the messages is received, then round change occurs.
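The six steps above can be condensed into a toy state machine for a single validator. Message broadcasting, timeouts, and the round change sub-protocol are reduced to comments and a single failure path; the class and method names are illustrative assumptions, not an IBFT API.

```python
F = 1
QUORUM = 2 * F + 1          # messages needed to advance a state

class IBFTValidator:
    def __init__(self):
        self.state = "NEW_ROUND"
        self.prepares = 0
        self.commits = 0

    def on_pre_prepare(self, block_valid):
        if not block_valid:
            self.state = "ROUND_CHANGE"   # invalid proposal triggers a round change
        elif self.state == "NEW_ROUND":
            self.state = "PRE_PREPARED"   # broadcast PREPARE here

    def on_prepare(self):
        self.prepares += 1
        if self.state == "PRE_PREPARED" and self.prepares >= QUORUM:
            self.state = "PREPARED"       # broadcast COMMIT here

    def on_commit(self, insert_block):
        self.commits += 1
        if self.state == "PREPARED" and self.commits >= QUORUM:
            self.state = "COMMITTED"
            if insert_block():            # try to insert the block
                self.state = "FINAL_COMMITTED"
            else:
                self.state = "ROUND_CHANGE"

v = IBFTValidator()
v.on_pre_prepare(block_valid=True)
for _ in range(QUORUM):
    v.on_prepare()
for _ in range(QUORUM):
    v.on_commit(insert_block=lambda: True)
print(v.state)
```

Each transition only fires once the 2F+1 threshold is met, mirroring how a real validator waits for a quorum of messages before moving from pre-prepared to prepared to committed.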
Tendermint
Tendermint is another variant of PBFT. It was inspired by both the DLS and PBFT
protocols.
Tendermint also makes use of the state machine replication (SMR) approach to provide consensus.
There are three key properties that HotStuff has addressed. These properties are listed as follows:
Linear view change
Linear view change reduces the communication cost of replacing a faulty leader to a linear number
of messages, in contrast with the quadratic cost of PBFT-style view changes.
Optimistic responsiveness
Optimistic responsiveness ensures that any correct leader, after GST (Global Stabilization Time) is
reached, only requires the first N - F responses to ensure progress.
Chain quality
This property ensures fairness and liveness in the system by allowing fast and frequent leader
rotation.
In comparison with traditional PBFT, HotStuff has introduced several changes, which result in
improved performance:
• PBFT-style protocols work using a mesh communication topology, where each message is
required to be broadcast to other nodes on the network.
• In HotStuff, the communication has been changed to the star topology, which means that nodes do
not communicate with each other directly, but all consensus messages are collected by a leader
and then broadcast to other nodes.
HotStuff works in phases, namely the prepare phase, pre-commit phase, commit phase, and decide
phase.
Prepare:
• Once a new leader has collected new-view messages from N - F nodes, the protocol for the new
leader starts.
• The leader collects and processes these messages to figure out the latest branch in which the highest
quorum certificate of prepare messages was formed.
Pre-commit:
• As soon as a leader receives N - F prepare votes, it creates a quorum certificate called the "prepare
quorum certificate."
• This "prepare quorum certificate" is broadcast to other nodes as a PRE-COMMIT message.
• When a replica receives the PRE-COMMIT message, it responds with a pre-commit vote.
• The quorum certificate is the indication that the required threshold of nodes has confirmed the proposal.
Commit:
• When the leader receives N - F pre-commit votes, it creates a PRE-COMMIT quorum certificate and
broadcasts it to other nodes as the COMMIT message.
• When replicas receive this COMMIT message, they respond with their commit vote.
• At this stage, replicas lock the PRE-COMMIT quorum certificate to ensure the safety of the algorithm
even if a view change occurs.
Decide:
• When the leader receives N - F commit votes, it creates a COMMIT quorum certificate.
• This COMMIT quorum certificate is broadcast to other nodes in the DECIDE message.
• When replicas receive this DECIDE message, they execute the request, because this message
contains an already committed certificate/value.
• Once the state transition occurs as a result of the DECIDE message being processed by a replica, the
new view starts.
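The four phases can be summarized as a loop in which the leader repeatedly collects N - F votes and turns them into a quorum certificate (QC) that drives the next phase. This is only a schematic of the message pattern under assumed names; real HotStuff additionally handles signatures, view changes, and chained pipelining.

```python
N, F = 4, 1
THRESHOLD = N - F           # votes needed to form a quorum certificate

def hotstuff_round(vote_counts):
    """vote_counts: votes the leader received in each voting phase, in order."""
    qcs = []
    for phase, votes in zip(["prepare", "pre-commit", "commit"], vote_counts):
        if votes < THRESHOLD:
            return qcs, f"view change triggered in {phase} phase"
        # Star topology: the leader forms a QC from N - F votes and broadcasts
        # it; replicas never talk to each other directly.
        qcs.append(f"{phase}-QC")
    # Decide: the commit-QC is broadcast in a DECIDE message; replicas execute.
    return qcs, "decide: request executed, new view starts"

print(hotstuff_round([3, 3, 3]))   # all phases reach the threshold
print(hotstuff_round([3, 2, 0]))   # too few pre-commit votes: view change
```

The star topology is visible in the sketch: every phase is a single leader-side collection step followed by one broadcast, which is what keeps the per-view message count linear.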