Unit 4 BCT
Introduction
The Raft protocol was developed by Diego Ongaro and John Ousterhout (Stanford University) and earned Diego his Ph.D. in 2014 (the link to the paper is in the References section at the end of the article). Raft was designed for better understandability of how consensus (we will explain what consensus is in a moment) can be achieved, considering that its predecessor, the Paxos algorithm, developed by Leslie Lamport, is very difficult to understand and implement. Hence the title of Diego's paper, 'In Search of an Understandable Consensus Algorithm'. Before Raft, Paxos was considered the holy grail of achieving consensus.
Let's start.
Consensus
So, to understand Raft, we shall first have a look at the problem the Raft protocol tries to solve, and that is achieving consensus. Consensus means multiple servers agreeing on the same information, something imperative for designing fault-tolerant distributed systems. Let's first define the basic interaction between a client and a server.
Process: The client sends a request to the server and the server responds with a reply.
How consensus is achieved depends on the kind of system the client interacts with:
Single server system: The client interacts with a system having only one server, with no backup. There is no problem in achieving consensus in such a system.
Multiple server system: The client interacts with a system having multiple servers. Such systems can be of two types:
Symmetric: Any of the multiple servers can respond to the client, and all the other servers are supposed to sync up with the server that responded to the client's request, and
Asymmetric: Only the elected leader server can respond to the client. All other servers then sync up with the leader server.
Such a system, in which all the servers replicate (or maintain) the same data (shared state) over time, can for now be referred to as a replicated state machine.
We shall now define some terms used to refer to individual servers in a distributed system.
Leader – Only the server elected as leader can interact with the client. All other servers sync themselves up with the leader. At any point of time, there can be at most one leader (possibly 0, which we shall explain later).
Follower – Follower servers sync up their copy of the data with the leader's at regular time intervals. When the leader server goes down (for any reason), one of the followers can contest an election and become the leader.
Candidate – At the time of contesting an election to choose the leader server, servers can ask other servers for votes. Hence, they are called candidates while they have requested votes. When servers start up, they begin in the Follower state. A minimal sketch of these roles follows below.
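The snippet below is a minimal sketch of the three roles and the transitions between them; the names Role and TRANSITIONS are illustrative assumptions, not part of any particular Raft implementation.

from enum import Enum

class Role(Enum):
    FOLLOWER = "follower"    # default state at startup
    CANDIDATE = "candidate"  # requesting votes in an election
    LEADER = "leader"        # at most one per term

# Legal transitions in Raft:
#   Follower  -> Candidate  (election timeout, no heartbeat from the Leader)
#   Candidate -> Leader     (wins a majority of votes)
#   Candidate -> Follower   (discovers a Leader or a higher term)
#   Leader    -> Follower   (discovers a higher term)
TRANSITIONS = {
    Role.FOLLOWER: {Role.CANDIDATE},
    Role.CANDIDATE: {Role.LEADER, Role.FOLLOWER},
    Role.LEADER: {Role.FOLLOWER},
}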
CAP Theorem
The CAP theorem states that a distributed database system can provide only 2 of the following 3 guarantees:
Consistency – The data is the same across all the server nodes (leader or follower), implying the system has near-instantaneous sync capabilities,
Availability – Every request gets a response (success/failure). This requires the system to be operational 100% of the time to serve requests, and
Partition tolerance – The system continues to respond even when messages between some of the server nodes are lost or delayed (a network partition), or some nodes fail.
Under normal conditions, a node stays in one of the three states described above (Leader, Follower, or Candidate). Only a leader can interact with the client; any request to a follower node is redirected to the leader node. A candidate can ask for votes to become the leader. A follower only responds to candidate(s) or the leader.
To maintain these server states, the Raft algorithm divides time into small terms of arbitrary length. Each term is identified by a monotonically increasing number, called the term number.
Term number
This term number is maintained by every node and is exchanged during communication between nodes. Every term starts with an election to determine the new leader. The candidates ask for votes from other server nodes (followers) in order to gather a majority. If a majority is gathered, the candidate becomes the leader for the current term. If no majority is established, the situation is called a split vote and the term ends with no leader. Hence, a term can have at most one leader.
Purpose of maintaining the term number
The following tasks are executed by observing the term number of each node:
Servers update their term number if it is less than the term numbers of other servers in the cluster. This means that when a new term starts, the term numbers are tallied with the leader or the candidate and updated to match the latest one (the leader's).
A Candidate or Leader demotes itself to the Follower state if its term number is out of date (less than another's). Note that discovering a higher term number does not make a server the Leader; the server simply adopts the newer term and steps down to Follower.
As we said earlier, the term numbers of the servers are also communicated, so if a request arrives with a stale term number, the request is rejected. This basically means that a server node will not accept requests from a server with a lower term number. These rules are sketched below.
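Below is a minimal sketch of these term-number rules; the node object and its fields (current_term, role) are illustrative assumptions rather than a specific library's API.

def on_message(node, sender_term):
    # Rule 1: adopt a newer term, and step down if Leader or Candidate.
    if sender_term > node.current_term:
        node.current_term = sender_term
        node.role = "follower"
        return True                 # then process the request normally
    # Rule 3: reject any request carrying a stale (lower) term number.
    if sender_term < node.current_term:
        return False
    return True                     # same term: process normally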
The Raft algorithm uses two types of Remote Procedure Calls (RPCs) to carry out its functions:
RequestVote RPCs are sent by Candidate nodes to gather votes during an election, and
AppendEntries RPCs are used by the Leader node for replicating log entries and also as a heartbeat mechanism to check whether a server is still up: if the heartbeat is responded to, the server is up; otherwise, the server is down. Note that heartbeats do not contain any log entries. Illustrative shapes of these two messages are sketched below.
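The field names below follow the Raft paper's description of the two RPCs; the Python dataclass form itself is just a sketch, not a wire format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RequestVote:
    term: int            # candidate's term
    candidate_id: int    # candidate requesting the vote
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry

@dataclass
class AppendEntries:
    term: int            # leader's term
    leader_id: int       # so followers can redirect clients
    prev_log_index: int  # index of the entry immediately preceding new ones
    prev_log_term: int   # term of that entry (consistency check)
    entries: List = field(default_factory=list)  # empty for a heartbeat
    leader_commit: int = 0                       # leader's commit index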
Now, let's have a look at the process of leader election.
Leader election
In order to maintain its authority as Leader of the cluster, the Leader node sends heartbeats to the other Follower nodes. A leader election takes place when a Follower node times out while waiting for a heartbeat from the Leader node. At this point, the timed-out node changes its state to the Candidate state, votes for itself, and issues RequestVote RPCs to establish a majority and attempt to become the Leader. The election can go the following three ways:
The Candidate node becomes the Leader by receiving a majority of votes from the cluster nodes. It then updates its status to Leader and starts sending heartbeats to notify the other servers of the new Leader.
The Candidate node fails to receive a majority of votes in the election, and hence the term ends with no Leader. The Candidate node returns to the Follower state.
While waiting for votes, the Candidate node may receive an AppendEntries RPC from another node claiming to be Leader. If that node's term number is at least as large as the Candidate's own, the Candidate recognizes the new Leader and returns to the Follower state; otherwise, the AppendEntries RPC is rejected and the node retains its Candidate status.
[Figure: Raft leader election]
The following excerpt from the Raft paper (linked in the References below) explains a significant aspect of server timeouts:
Raft uses randomized election timeouts to ensure that split votes are rare and that they are
resolved quickly. To prevent split votes in the first place, election timeouts are chosen
randomly from a fixed interval (e.g., 150–300ms). This spreads out the servers so that in most
cases only a single server will time out; it wins the election and sends heartbeats before any
other servers time out. The same mechanism is used to handle split votes. Each candidate
restarts its randomized election timeout at the start of an election, and it waits for that
timeout to elapse before starting the next election; this reduces the likelihood of another split
vote in the new election.
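A minimal sketch of this randomized timeout, using the paper's example interval of 150-300 ms; the constant names are illustrative.

import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300

def new_election_timeout():
    # Each server picks a fresh random timeout, so in most cases a single
    # server times out first, wins the election, and sends heartbeats
    # before any other server times out.
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)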
Log Replication
For the sake of simplicity while explaining to a beginner-level audience, we will restrict our scope to the client making only write requests. Each request made by the client is stored in the log of the Leader. This log is then replicated to the other nodes (Followers). Typically, a log entry contains the following three pieces of information: the command requested by the client, the index identifying the entry's position in the log, and the term number in which the entry was created.
In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own.
This means that conflicting entries in follower logs will be overwritten with entries from the
leader’s log.
The Leader node looks for the last index number at which its log and the Follower's log match, and then overwrites any extra entries beyond that point (index number) with the new entries supplied by the Leader. This matches the Follower's log with the Leader's. The Leader retries the AppendEntries RPC iteratively with reduced index numbers until a match is found; when the match is found, the RPC succeeds. A sketch of this repair loop follows below.
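Here is a minimal sketch of that repair loop, with logs modeled as Python lists of entries; the names (leader_log, follower_log, next_index) are illustrative assumptions.

def replicate(leader_log, follower_log, next_index):
    # Walk next_index backwards until leader and follower agree on the
    # entry just before it (the last matched index).
    while next_index > 0:
        prev = next_index - 1
        if prev < len(follower_log) and follower_log[prev] == leader_log[prev]:
            break              # consistency check passed: match found
        next_index -= 1        # AppendEntries rejected: retry one entry earlier
    # Overwrite the follower's conflicting suffix with the leader's entries,
    # forcing the follower's log to duplicate the leader's.
    del follower_log[next_index:]
    follower_log.extend(leader_log[next_index:])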
Safety
In order to maintain consistency across the set of server nodes, the Raft consensus algorithm ensures that the leader has all the entries committed in previous terms in its own log.
During a leader election, the RequestVote RPC also contains information about the candidate's log (the index and term of its last entry) to figure out which log is the latest. If the candidate requesting the vote has less up-to-date data than the Follower from which it is requesting the vote, the Follower simply doesn't vote for that candidate. The following excerpt from the original Raft paper states this precisely:
Raft determines which of two logs is more up-to-date by comparing the index and term of the
last entries in the logs. If the logs have last entries with different terms, then the log with the
later term is more up-to-date. If the logs end with the same term, then whichever log is longer
is more up-to-date.
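That comparison can be written as a small predicate; a sketch under the assumption that each side is summarized by the term and index of its last log entry:

def candidate_is_up_to_date(cand_last_term, cand_last_index,
                            my_last_term, my_last_index):
    # Different last terms: the log whose last entry has the later term wins.
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    # Same last term: the longer log is more up-to-date.
    return cand_last_index >= my_last_index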
Advantages
The Raft protocol is designed to be easily understandable, considering that the most popular way to achieve consensus on distributed systems until then was the Paxos algorithm, which is very hard to understand and implement. Anyone with basic knowledge and common sense can understand major parts of the protocol and the research paper published by Diego Ongaro and John Ousterhout.
It is comparatively easier to implement than other alternatives, primarily Paxos, because of a more targeted use-case segment and its assumptions about the distributed system. Many open-source implementations of Raft are available on the internet; some are in Go, C++, and Java.
The Raft protocol has been decomposed into smaller subproblems which can be tackled relatively independently, for better understanding, implementation, debugging, and optimizing performance for a more specific use case.
The distributed system following the Raft consensus protocol will remain operational even when a minority of the servers fail. For example, if we have a 5-node cluster and 2 nodes fail, the system can still operate.
The leader election mechanism employed in Raft is designed so that, thanks to randomized election timeouts, one node will almost always gain a majority of votes within a term or two.
Raft employs RPCs (remote procedure calls) to request votes and to sync up the cluster (using AppendEntries), which keeps the communication pattern simple: followers talk only to the leader (or to candidates during elections), never to each other.
Raft was designed relatively recently, so it benefits from design lessons that were not yet understood at the time of the formulation of Paxos and similar protocols.
Any node in the cluster can become the leader, so it has a certain degree of fairness.
Many different open-source implementations for different use cases are already out there on GitHub and related places.
Companies like MongoDB and HashiCorp are using it.
Limitations and Alternatives
Raft is a strictly single-Leader protocol, so too much traffic can choke the system. Some variants of the Paxos algorithm exist that address this bottleneck.
A lot of assumptions are taken to be in force, such as the non-occurrence of Byzantine failures, which somewhat reduces the real-life applicability.
Raft is a more specialized approach towards a subset of the problems which arise in achieving consensus.
Cheap Paxos (a variant of Paxos) can work even when there is only one node functioning in the server cluster. To generalize, K+1 replicated servers can tolerate faults in, or shutdown of, K servers.
Permissioned Blockchain – Raft Consensus
The idea behind the Raft consensus algorithm is that the nodes (i.e., server computers) collectively select a leader, and the remaining nodes become the followers. The leader is responsible for replicating the state-transition log across the followers in a closed distributed environment, assuming that all the nodes are trustworthy and have no malicious intent.
The basic idea of Raft came from the observation that, in a distributed environment, we can come to a consensus based on the Paxos algorithm and elect a leader; and once we have a leader in the system, we can avoid multiple proposers proposing values concurrently altogether.
In the case of Paxos, we don't have any straightforward mechanism to elect a leader. Instead, multiple proposers may propose values simultaneously. Consequently, the protocol becomes complex: the acceptors have to accept one of the proposals from the proposers, the highest proposal number is used as the tie-breaking mechanism, and an algorithm is embedded in Paxos to ensure that every proposal coming from a different proposer is unique. All these internal details make Paxos more complicated.
The system starts up with a set of follower nodes. The follower nodes look for a leader; if a timeout happens, there is no leader, and one needs to be elected. A few candidates stand for leader in the election process, and the remaining nodes vote for a candidate. The candidate who receives the majority of votes becomes the leader. The leader then proposes a proposal, and the followers can vote either for or against that proposal.
[Figure: Raft consensus algorithm]
An example from database replication: we have multiple distributed replicated servers, and we want to build consensus among them. Whenever transactions come in from clients, we want these replicated servers to collectively decide whether to commit those transactions.
The first part of Raft is to elect a leader, and for that there should be some leader candidates. The nodes sense the network, and if there is no leader, then one of the nodes will announce that it wants to be a leader. The leader candidate requests votes. This voting request contains 2 parameters: the term and the index.
These algorithms work in multiple rounds, and the term indicates a new voting round. When the last voting round finishes, the next term will be the old term number + 1. The index indicates the committed transactions available to the candidate; it is just an increasing number used to distinguish between already committed and new transactions.
Once the nodes receive a voting request, their task is to vote for or against the candidate. This is the mechanism used to elect a leader in the Raft consensus algorithm. Each node compares the received term and index with the corresponding currently known values:
The node receives the voting request and compares the already-seen term with the newly received term. If the newly received term is not greater than the already-seen term, it discards the request, because the node considers it an old request.
If the newly received term is greater than the already-seen term, the node checks the newly received index number against the already-seen index number. If the newly received index number is at least as large as the already-seen one, it votes for the candidate; otherwise, it declines. This decision rule is sketched below.
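A minimal sketch of this voting rule; the variable names are illustrative, and the index comparison follows the 'at least as large' form above.

def decide_vote(seen_term, seen_index, recv_term, recv_index):
    if recv_term <= seen_term:
        return False    # stale round: discard as an old request
    if recv_index >= seen_index:
        return True     # candidate's log is at least as complete: vote for it
    return False        # candidate is missing committed entries: decline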
Electing the Leader: Majority Voting
Every node sends its vote, and the candidate who gets the majority of votes becomes the leader and commits the corresponding log entry. In other words, if a certain leader candidate receives the majority of the votes from the nodes, then that particular candidate becomes the leader and the other nodes become followers of that node, as the check sketched below illustrates.
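A sketch of the majority check itself; a strict majority of the cluster (including the candidate's own vote) is required.

def has_majority(votes_received, cluster_size):
    return votes_received > cluster_size // 2

# Example: in a 5-node cluster, 3 votes win; 2 do not.
assert has_majority(3, 5) and not has_majority(2, 5)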
Multiple Leader Candidates: Current Leader Failure
Let us understand a scenario where there is a leader and three followers, the current term is 10, and the commit index value is 100. Suppose the leader node has failed, or the followers didn't receive a heartbeat message within the heartbeat timeout period.
After the timeout, one of the nodes becomes a leader candidate, initiates a leader election, and becomes the new leader with term 11 and commit index value 100. The new leader periodically sends heartbeat messages to everyone to indicate its presence.
In the meantime, the old leader recovers and also receives a heartbeat message from the new leader. The old leader understands that a new term has started, and changes its status from leader to follower. So this is one way of handling a change of leader, by utilizing the term parameter; a sketch follows below.
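A minimal sketch of that step-down, under the assumption that each node tracks current_term and role as plain fields:

def on_heartbeat(node, heartbeat_term):
    # e.g. the recovered old leader (term 10) hears the new leader's term 11
    if heartbeat_term > node.current_term:
        node.current_term = heartbeat_term  # a new term has started
        node.role = "follower"              # the old leader demotes itself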
Now let us understand a scenario where there is a leader and three followers, the current term is 20, and the commit index value is 200. Suppose the leader node has failed, or the followers didn't receive a heartbeat message within the heartbeat timeout period. It may be possible that multiple followers sense the timeout simultaneously, become leader candidates, and initiate the leader election procedure independently. Two nodes send the request messages with term 21 at the same time in the network.
There are two leader candidates, and both are sending voting request messages, at the same time, for round (term) 21. Then they look for the majority vote. In this example, the first candidate receives two votes and the second candidate receives one vote, so based on majority voting, the first candidate is the winner.
The node which gets the majority of votes sends a heartbeat message to everyone. The other leader candidate also receives the heartbeat message from the winner, and falls back from leader candidate to follower.
Committing Entry Log
In the sections above, we have seen the procedure to elect a leader and the related special cases. Now we will understand how transactions are managed in a closed distributed environment. Let us consider that the current term value is 10 and the index value is 100, which means most of the nodes have seen and committed the transaction with index value 100.
The leader proposes a new transaction by adding a log entry with term 10 and the new transaction index value 101. Further, the leader sends a message called AppendEntries to all the followers, and they collectively vote either for or against this transaction.
The leader receives the votes for this transaction with index value 101. The follower nodes vote for or against the transaction, and if the majority say that they are fine with committing this particular log entry, the leader considers the transaction log approved by the followers.
After successful acceptance of the log entry, the leader sends an accept message, based on the majority vote, to all the individual followers to update the committed index to 101. The whole flow is sketched below.
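A minimal sketch of this commit flow; the leader and follower objects, their fields, and follower.append_entries() are illustrative assumptions, not a real API.

def commit_entry(leader, followers, command):
    entry = {"term": leader.current_term,       # e.g. term 10
             "index": leader.commit_index + 1,  # e.g. index 101
             "command": command}
    leader.log.append(entry)
    acks = 1                                    # the leader counts itself
    for follower in followers:
        if follower.append_entries([entry]):    # follower votes "for"
            acks += 1
    if acks > (len(followers) + 1) // 2:        # majority accepted the entry
        leader.commit_index = entry["index"]    # commit index becomes 101
        return True
    return False                                # not enough votes: not committed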
Handling Failure
Multiple kinds of failure exist in the environment; however, the Paxos and Raft consensus algorithms only tolerate crash or network faults. Followers may crash, but the system can tolerate up to floor((N-1)/2) failed nodes, where N is the total number of nodes in the environment (for example, 2 out of 5), because majority voting is unaffected. This requires that a majority of the followers are non-faulty and can send votes, so the leader can take the majority decision on whether to accept or reject a particular transaction.
Byzantine Agreement: Oral Message Algorithm
Pulse-1 is the initial pulse, in which the commander sends the message to all the lieutenants. Broadcast(N, t = 0), where N is the number of processes and t is the algorithm parameter, denotes the individual rounds. The commander decides his own value, and in this case the possible values are {retreat, attack}. In this example, N = 3: there are three different lieutenants trying to reach a consensus.
Base Condition for Lieutenant
Each lieutenant receives the message from the commander and checks whether it is a pulse-1 message or not. If it is a pulse-1 message and the sender is the commander, accept it; otherwise, wait for a pulse-1 message. Once a pulse-1 message is received, broadcast this message to all other processes in the network.
All the lieutenants broadcast their values to the other lieutenants, except the senders. At the end of the rounds, each lieutenant should hold N-1 values, except for values from offline lieutenants. In the end, they apply the majority-voting principle and achieve consensus.
In this agreement protocol, after N rounds each process holds N values; this is because the system is synchronous and has a reliable communication medium. Once they have the N values, they can apply the majority-voting principle and achieve consensus. However, to achieve consensus, the system should satisfy the conditions below:
The system must have a minimum of three lieutenants plus a commander (four processes in total). In general, out of N total processes, at most F may be faulty, where N >= 3*F + 1.
The system should be fully connected, and the receivers always know the identity of the senders.
The system should be synchronous and have a reliable communication medium.
The problem was explained aptly in a 1982 paper by Leslie Lamport, Robert Shostak, and Marshall Pease, then at SRI International:
Imagine that several divisions of the Byzantine army are camped outside an enemy city,
each division commanded by its own general. The generals can communicate with one
another only by messenger. After observing the enemy, they must decide upon a common
plan of action. However, some of the generals may be traitors, trying to prevent the loyal
generals from reaching an agreement. The generals must decide on when to attack the city,
but they need a strong majority of their army to attack at the same time. The generals must
have an algorithm to guarantee that (a) all loyal generals decide upon the same plan of
action, and (b) a small number of traitors cannot cause the loyal generals to adopt a bad
plan. The loyal generals will all do what the algorithm says they should, but the
traitors may do anything they wish. The algorithm must guarantee condition (a) regardless
of what the traitors do. The loyal generals should not only reach agreement, but should
agree upon a reasonable plan.
Byzantine fault tolerance can be achieved if the correctly working nodes in the network reach an agreement on their values. There can be a default vote value given to missing messages, i.e., we can assume that the message from a particular node is 'faulty' if the message is not received within a certain time limit. Furthermore, we can also assign a default response if the majority of nodes respond with a correct value.
Leslie Lamport proved that with 3m+1 processors in total, a consensus (agreement on the same state) can be reached if at most m processors are faulty, which means that strictly more than two-thirds of the total number of processors should be honest.
Types of Byzantine Failures:
There are two categories of failures that are considered. One is fail-stop (in which the node fails and stops operating) and the other is arbitrary-node failure. Some of the arbitrary node failures are given below:
· Failure to return a result.
· Respond with an incorrect result.
· Respond with a deliberately misleading result.
· Respond with a different result to different parts of the system.
Advantages of pBFT:
· Energy efficiency:
pBFT can achieve distributed consensus without carrying out complex mathematical computations (as in PoW). Zilliqa employs pBFT in combination with a PoW-like complex computation round for every 100th block.
· Transaction finality:
Transactions do not require multiple confirmations after they have been finalized and agreed upon (unlike the PoW mechanism in Bitcoin, where every node individually verifies all the transactions before adding the new block to the blockchain, and confirmations can take between 10 and 60 minutes depending upon how many entities confirm the new block).
· Low reward variance:
Every node in the network takes part in responding to the request by the client, and hence every node can be incentivized, leading to low variance in rewarding the nodes that help in decision making.
pBFT tries to provide a practical Byzantine state machine replication that can work even
when malicious nodes are operating in the system.
Nodes in a pBFT-enabled distributed system are sequentially ordered, with one node being the primary (or leader) node and the others referred to as secondary (or backup) nodes. Note here that any eligible node in the system can become the primary by transitioning from secondary to primary (typically, in the case of a primary node failure). The goal is that all honest nodes help in reaching a consensus regarding the state of the system using the majority rule.
A practical Byzantine Fault Tolerant system can function on the condition that the number of malicious nodes is less than one-third of all the nodes in the system. As the number of nodes increases, the system becomes more secure.
pBFT consensus rounds are broken into 4 phases:
· The client sends a request to the primary (leader) node.
· The primary (leader) node broadcasts the request to all the secondary (backup) nodes.
· The nodes (primary and secondaries) perform the service requested and then send back a reply to the client.
· The request is served successfully when the client receives m+1 replies from different nodes in the network with the same result, where m is the maximum number of faulty nodes allowed. This last check is sketched below.
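A minimal sketch of the client-side check in the last phase; the function name and the representation of replies as a plain list are illustrative assumptions.

from collections import Counter

def result_accepted(replies, m):
    # replies: list of results returned by different nodes;
    # m: maximum number of faulty nodes tolerated.
    if not replies:
        return None
    result, votes = Counter(replies).most_common(1)[0]
    # The request is served once m+1 nodes agree on the same result.
    return result if votes >= m + 1 else None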
Limitations of pBFT:
The pBFT consensus model works efficiently only when the number of nodes in the distributed network is small, due to the high communication overhead: every node talks to every other node, so the message complexity grows quadratically with every extra node in the network.
Variations of pBFT:
To enhance the quality and performance of pBFT for specific use cases and conditions, many variations have been proposed and employed. Some of them are:
· RBFT – Redundant BFT
· ABsTRACTs
· Q/U
· HQ – Hybrid Quorum Protocol for BFT
· Adapt
· Zyzzyva – Speculative Byzantine Fault Tolerance
· Aardvark
“several divisions of the Byzantine army are camped outside an enemy city, each division
commanded by its own general. The generals can communicate with one another only by
messenger. After observing the
enemy, they must decide upon a common plan of action.”
“Byzantine Generals” metaphor used in the classical paper by Lamport et al. [Lamport et al.,
1982]
The paper considered a synchronous system, i.e., a system in which there are known delay
bounds for processing and communication.
Byzantine Generals
The problem is given in terms of generals who have surrounded the enemy. The generals wish to organize a plan of action: to attack or to retreat. Each general observes the enemy and communicates his observations to the others.
Unfortunately, there are traitors among the generals, and the traitors want to influence the plan to the enemy's advantage. They may lie about whether they will support a particular plan and about what other generals told them.
The game theory analogy behind the Byzantine Generals Problem is that several generals are
besieging Byzantium. They have surrounded the city, but they must collectively decide when
to attack. If all generals attack at the same time, they will win, but if they attack at different
times, they will lose. The generals have no secure communication channels with one another
because any messages they send or receive may have been intercepted or deceptively sent by
Byzantium’s defenders. How can the generals organize to attack at the same time?
Condition A: All loyal generals decide upon the same plan of action.
Condition B: A small number of traitors cannot cause the loyal generals to adopt a bad plan.
What algorithm for decision making should the generals use to reach a consensus? What percentage of liars can the algorithm tolerate and still correctly determine a consensus?
Assume the plans of action are: attack or retreat. Let:
n be the number of generals,
v(i) be the opinion of general i (attack/retreat).
Each general i communicates the value v(i) by messenger to every other general j.
Each general makes his final decision by a majority vote among the values v(1), …, v(n).
To satisfy condition A:
Every general must apply the majority function to the same values v(1), …, v(n). But a traitor may send different values to different generals, so the generals may receive different values.
To satisfy condition B:
For each i, if the i-th general is loyal, then the value he sends must be used by every loyal general as the value v(i).
Let us reduce the consensus problem to a simpler situation in which we have: 1 commanding general (C) and n-1 lieutenant generals (L1, …, Ln-1).
Consensus: Interactive Consistency conditions.
IC1: All loyal lieutenant generals obey the same command.
IC2: The decision of the loyal lieutenants must agree with the commanding general's order if he is loyal.
The lieutenant generals send messages back and forth among themselves, reporting the command received from the Commanding General.
[Figure: interactive consistency example – (a) the generals announce their troop strengths (in units of 1 thousand soldiers); (b) the vectors that each general assembles based on (a); (c) the vectors that each general receives in step 3.]
The solution to the problem relies on an algorithm that can guarantee the conditions above.
1. Using Oral Messages
The Oral Message algorithm OM(m) is a solution for 3m+1 or more generals with m traitors.
Oral messages have the property that if a majority of the values vi equals v, then majority(v1, …, vn-1) equals v.
Each lieutenant collects an ordered set of values Vi. The algorithm is defined recursively:
Base case, OM(0): The commander sends his value to the lieutenants, and each lieutenant receives and records it, e.g. Vi = {v0 : attack}.
Recursive case, OM(m): Each lieutenant acts as the commander in OM(m-1), sending the value he received to 'his' lieutenants, and this continues recursively; finally, each lieutenant decides by majority over the values he holds. A sketch of this recursion follows below.
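The following is a loose sketch of the OM(m) recursion, under the simplifying assumption that all processes are loyal (traitor behaviour and the message transport are abstracted away); it shows only the recursive structure and the majority step.

def majority(values):
    # Majority over a small set of values such as {"attack", "retreat"};
    # ties fall to a fixed default, as the algorithm permits.
    values = sorted(values)
    return values[len(values) // 2]

def om(m, commander_value, lieutenants):
    # OM(0): every lieutenant simply records the commander's value.
    if m == 0:
        return {lt: commander_value for lt in lieutenants}
    decisions = {}
    for lt in lieutenants:
        # Each lieutenant acts as the commander in OM(m-1) for the others,
        # relaying the value it received.
        others = [x for x in lieutenants if x != lt]
        relayed = om(m - 1, commander_value, others)
        # The lieutenant decides by majority over its own value and the
        # values relayed by the other lieutenants.
        decisions[lt] = majority([commander_value] + list(relayed.values()))
    return decisions

# Example: one traitor tolerated with a commander and three lieutenants.
print(om(1, "attack", ["L1", "L2", "L3"]))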
So it stands to reason that the objective of a Byzantine Fault Tolerant system is to be able
to defend against Byzantine failures.
Therefore, the Byzantine Fault Tolerance model could help in resolving this problem. The
generals would need an algorithm that could guarantee the following conditions.
1. All the loyal generals would act and agree on the same plan of action.
2. The loyal generals of the Byzantine army would not follow a bad plan under the influence of
traitor generals.
3. The loyal generals would follow all the rules specified in the algorithm.
4. All the loyal generals of the Byzantine army must reach a consensus irrespective of the
actions of traitors.
5. Most important of all, the loyal generals should also reach an agreement on a specific and
reasonable plan.
Note that Byzantine faults are the most severe and difficult to deal with. Byzantine fault tolerance has been needed in airplane engine systems, nuclear power plants, and pretty much any system whose actions depend on the results of a large number of sensors.
In pBFT, the primary (leader) node is changed during every view (pBFT consensus round) and can be substituted via a view-change protocol if a predefined quantity of time has passed without the leading node broadcasting a request to the backups (secondaries). If needed, a majority of the honest nodes can vote on the legitimacy of the current leading node and replace it with the next leading node in line.
BFT over Asynchronous systems
What’s “asynchronous” Byzantine fault tolerance (ABFT)?
When a decentralized network is Byzantine fault tolerant, it means that the honest members, or nodes, of the network can be guaranteed to agree on the timing and order (consensus) of a set of transactions, regardless of whether some nodes are maliciously trying to prevent that consensus, even if just under one-third of the nodes try to negatively affect consensus by delaying transactions or otherwise corrupting things. This is the 'fault tolerance' of the network: how many nodes can act maliciously while the network still comes to an honest consensus.
An ABFT network allows for messages to be lost or indefinitely delayed, and assumes only that at some point an honest node's messages will eventually get through. It is much more challenging for an honest node to assess whether another node is following the rules if that node's messages can be indefinitely delayed, but this scenario much better reflects network reliability in the real world.