Received June 4, 2020, accepted June 19, 2020, date of publication June 30, 2020, date of current version July 22, 2020.


Digital Object Identifier 10.1109/ACCESS.2020.3006078

Performance Evaluation of Blockchain Systems: A Systematic Survey
CAIXIANG FAN¹, SARA GHAEMI¹ (Graduate Student Member, IEEE), HAMZEH KHAZAEI² (Member, IEEE), AND PETR MUSILEK¹,³ (Senior Member, IEEE)
¹ Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2R3, Canada
² Department of Electrical Engineering and Computer Science, York University, Toronto, ON M3J 1P3, Canada
³ Department of Applied Cybernetics, University of Hradec Králové, 500 03 Hradec Králové, Czech Republic

Corresponding author: Caixiang Fan ([email protected])


This work was supported by the Future Energy Systems through the Canada First Research Excellence Fund (CFREF) at the University of
Alberta.

ABSTRACT Blockchain has been envisioned to be a disruptive technology with potential for applications
in various industries. As more and more different blockchain platforms have emerged, it is essential to
assess their performance in different use cases and scenarios. In this paper, we conduct a systematic
survey on the blockchain performance evaluation by categorizing all reviewed solutions into two general
categories, namely, empirical analysis and analytical modelling. In the empirical analysis, we comparatively
review the current empirical blockchain evaluation methodologies, including benchmarking, monitoring,
experimental analysis and simulation. In analytical modelling, we investigate the stochastic models applied to
performance evaluation of mainstream blockchain consensus algorithms. Through contrasting, comparison
and grouping different methods together, we extract important criteria that can be used for selecting the most
suitable evaluation technique for optimizing the performance of blockchain systems based on their identified
bottlenecks. Finally, we conclude the survey by presenting a list of possible directions for future research.

INDEX TERMS Blockchain, distributed ledger technology, performance modelling, performance evaluation, systematic survey.

I. INTRODUCTION

Since its first introduction in Bitcoin by Nakamoto [1] in 2008, blockchain has been recognized as a disruptive technology in various industries beyond cryptocurrency, including finance [2], [3], Internet of Things (IoT) [4], [5], health care [6], [7], energy [8]–[10] and logistics [11], [12]. Compared to conventional, centralized solutions, blockchain has some significant advantages such as immutability, enhanced security, fault tolerance and transparency. However, the decentralized nature of blockchain dramatically limits its performance (e.g., throughput and latency). For example, Bitcoin can only achieve a low throughput of 7 transactions per second (TPS), and it takes around 10 minutes for a transaction to get confirmed [13]. In contrast, current centralized payment systems such as VisaNet and MasterCard can reach thousands of TPS and almost real-time payments. By adopting a similar consensus algorithm, proof-of-work (PoW), other blockchain platforms such as Ethereum [14] and Litecoin [15] also inherit the performance flaws of Bitcoin. Without doubt, performance has become the major constraint on blockchain applications in production. This is especially true for systems demanding high performance, such as online transaction processing (OLTP) and real-time payment systems.

To overcome this problem, many blockchain projects have tried to improve their performance, e.g., by modifying the system structure and designing new consensus algorithms. These solutions include, but are not limited to, off-chain [16]–[19], side-chain [20]–[23], concurrent execution (smart contract) [24]–[26], sharding [27]–[31], and directed acyclic graph (DAG) [32]–[39].

Existing and new solutions should be comparatively evaluated in a meaningful manner to show their efficiency and effectiveness. For example, different versions of Hyperledger Fabric (HLF), e.g., HLF v0.6 and HLF v1.0, should be compared in the same evaluation framework to demonstrate the performance advantages/disadvantages of the new release.

The associate editor coordinating the review of this manuscript and approving it for publication was Ahmed Mohamed Ahmed Almradi.
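As a minimal illustration of what comparing two systems in the same evaluation framework involves, the following Python sketch computes throughput and mean latency from per-transaction timestamps. The definitions used here (throughput as committed transactions divided by the observation window, latency as commit time minus submission time), the `TxRecord` type, the `summarize` helper and the two synthetic runs are assumptions made for illustration only; they are not taken from any benchmark discussed in this survey, and the numbers are not measurements of any platform.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TxRecord:
    """One transaction as seen by a hypothetical test harness."""
    submit_s: float  # time the client submitted the transaction (seconds)
    commit_s: float  # time the transaction was confirmed (seconds)


def summarize(records):
    """Return (throughput in TPS, mean latency in seconds) for one test run."""
    if not records:
        raise ValueError("no transactions recorded")
    # Observation window: first submission to last commit.
    start = min(r.submit_s for r in records)
    end = max(r.commit_s for r in records)
    window = end - start
    if window <= 0:
        raise ValueError("observation window must be positive")
    throughput = len(records) / window
    latency = mean(r.commit_s - r.submit_s for r in records)
    return throughput, latency


# Synthetic, hand-written inputs: NOT measurements of any real system.
system_a = [TxRecord(0.0, 2.0), TxRecord(1.0, 3.0), TxRecord(2.0, 4.0)]
system_b = [TxRecord(0.0, 1.0), TxRecord(0.5, 1.5), TxRecord(1.0, 2.0)]

for name, run in (("system_a", system_a), ("system_b", system_b)):
    tps, lat = summarize(run)
    print(f"{name}: throughput={tps:.2f} TPS, mean latency={lat:.2f} s")
```

Because both runs pass through the same `summarize` function, any difference in the reported numbers comes from the recorded timestamps rather than from how the metrics are computed.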

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 126927
C. Fan et al.: Performance Evaluation of Blockchain Systems: A Systematic Survey

In addition, through performance evaluation and analysis, bottlenecks can be identified and used to inspire further optimization ideas. Therefore, performance evaluation plays an important role in the area of blockchain research.

To this end, it would be useful to summarize, classify and survey the existing efforts on blockchain performance evaluation and to identify future directions in this area. However, most existing related surveys focus only on improvement (scalability) solutions or on a specific evaluation topic of blockchain performance. A representative list of existing surveys, shown in Table 1, clearly identifies the need for a systematic survey on blockchain performance evaluation.

TABLE 1. Research scope of existing blockchain performance related surveys.

In this contribution, we present a comprehensive, systematic survey on blockchain performance evaluation. The survey covers existing studies on evaluating the performance of various mainstream blockchains, and compares their advantages and disadvantages. It addresses the following research questions:

RQ1. What are the current mainstream techniques, main evaluation metrics and benchmark workloads for blockchain performance evaluation?
RQ2. How can the performance of two blockchain systems with different consensus mechanisms be comparatively evaluated?
RQ3. What are the significant bottlenecks identified in various blockchain systems?
RQ4. What are the main challenges and opportunities in blockchain performance evaluation?

To answer these questions, the authors have searched and reviewed the latest papers published since 2015. The papers have been retrieved from major scientific databases, including ACM DL, IEEE Xplore, Elsevier, MDPI and SpringerLink. In addition, closely related papers cited by the selected communications have also been taken into consideration. Note that this survey focuses only on blockchain performance evaluation; solutions for improving blockchain performance or scalability are not discussed. Interested readers may refer to the published surveys of performance improvement solutions for blockchain [40]–[45] listed in Table 1.

To the best knowledge of the authors, this is the first survey that systematically reviews the state of the art in blockchain performance evaluation from several different perspectives. The reviewed evaluation approaches can be classified into two high-level groups: empirical evaluation and analytical modelling, as shown in Figure 1. Empirical evaluation includes benchmarking, monitoring, experimental analysis and simulation. Analytical modelling mainly covers three types of stochastic models: Markov chains, queueing models and stochastic Petri nets (SPNs). Through this classification, we aim to depict a big picture of the performance evaluation landscape, identify current challenges in this area, and provide suggestions for future research. The contributions of this survey can be summarized as follows:

• It provides a systematic survey on the blockchain performance evaluation, covering all existing evaluation (empirical and analytical) approaches for evaluating the mainstream blockchain systems.
• It introduces existing popular models for analytical performance evaluation of prominent blockchain platforms, categorizes them and performs a comparative analysis of their advantages and disadvantages.
• It identifies the current challenges in this area, and subsequently provides suggestions for future research.

FIGURE 1. A landscape of DLT performance evaluation approaches and evaluated ledgers.

The remainder of this paper is organized as follows. Section II provides some prerequisite knowledge on distributed ledger technology (DLT), its categorization and architecture. Section III introduces the blockchain performance evaluation solutions from the perspective of empirical analysis, including benchmarking, monitoring, experimental analysis and simulation. Section IV focuses on the technical and mathematical introduction of existing commonly used performance modelling solutions including Markov chains, queueing models and stochastic Petri nets. Section V summarizes the major findings and points out potential opportunities in this area for future research


according to the identified open issues. Finally, the survey is concluded in Section VI.

II. BACKGROUND

Blockchain is a major type of distributed ledger technology (DLT). The relationship of blockchain to DLT is like that of a car to a vehicle [46]. Nevertheless, the terms "blockchain" and "DLT" are used interchangeably throughout this paper. Any ledger that is stored in a distributed fashion and shared among a set of nodes or participants can be referred to as a distributed ledger. For new information to be added to this ledger, all participating nodes must reach a consensus on whether the information is legitimate or not. The algorithm that determines how this decision is reached, called the consensus algorithm, is an important part of the DLT. In this section, we introduce a categorization of DLT and its abstraction layer architecture.

A. CATEGORIZATION OF DLTs

DLTs are widely used in cryptocurrencies such as Bitcoin [1], Ethereum [14], and EOS [47]. However, they can also be used in a variety of applications beyond cryptocurrencies. In 2019, CB Insights identified 55 industries that can be transformed by this technology [48].

A possible taxonomy of distributed ledger technologies is shown in Figure 2. DLTs can be categorized based on their data architecture. Two main categories are blockchain and directed acyclic graph (DAG). In blockchain, transactions are stored in containers called blocks, which are chained together using their hash values. This chain of information, similar to a linked list, is immutable. Examples of this category are Bitcoin, Ethereum, EOS, and Litecoin. In DAG, on the other hand, transactions are connected to one another by a reference relationship, forming a directed graph rather than a linked list. This category includes DLTs such as IOTA, Byteball, and Nano. In addition, there are distributed ledgers that have their own unique data structure and do not fall into either of these categories, such as Radix and Corda.

Based on the permissions of the ledger, DLTs can be classified as permissioned and permissionless, which brings to mind another taxonomy based on ledger accessibility: private and public. In permissioned distributed ledgers, the identity of all the participants is known. By contrast, everyone can participate anonymously in a permissionless DLT network. Public and private DLTs can be distinguished by who can read the data on the ledger and verify its validity. Public ledgers are open: anyone can read the data on the ledger and host a node without the need to be approved. Private ledgers, by contrast, are only accessible by those who are pre-approved.

Therefore, based on the permissions and accessibility of the ledger, DLTs can be divided into four groups, as shown in Figure 2. Public permissionless ledgers, such as Bitcoin, Ethereum, and Litecoin, have no restriction on the participating parties. In public permissioned ledgers, the identity of participants should be known but anyone can read and validate the ledger. EOS, Ripple, and Sovrin are examples of this type. In private permissionless blockchains, the identities of the participants are not known but only pre-approved nodes validate the data. Examples of this category include LTO, Holochain, and Monet. Finally, in private permissioned ledgers, such as Hyperledger and Corda, access is restricted to pre-approved participants and the identities of the participants are known.

FIGURE 2. Categories of distributed ledger technologies.

B. DLT ABSTRACTION LAYERS

Dinh et al. [49] introduced a blockchain design comprising four abstraction layers, namely application, execution engine, data model and consensus. In the Oracle blockchain guidance book [46], the authors defined five layers to display the general architecture of blockchain, including the application and presentation layer, consensus layer, network layer, data layer and hardware/infrastructure layer. To better describe the architecture of DLT for the purpose of performance evaluation, we formulate an abstraction layer architecture following mainly Dinh's model [49], but extend it to the five layers shown in Fig. 3.

1) APPLICATION LAYER

As the top of the DLT technology stack, this layer contains the applications that are mainly used by the end users. To date, the most popular one is still cryptocurrency. As the first published digital currency, Bitcoin has dominated the market and spawned many variants. Ethereum has its own currency called Ether. IOTA also has its currency with the same name as the network, IOTA [37]. Other examples include the wallet to manage cryptocurrency, smart contracts, and all kinds of decentralized applications (DApps). In a DLT system, a smart contract is a piece of code designed to digitally facilitate, verify, or enforce the execution of a contract. For Ethereum, the smart contract is running on a dedicated virtual machine (called EVM); and most contracts


on the system are related to cryptocurrency. HLF's smart contracts, by contrast, run in a container such as Docker. One of the best-known DApps is the decentralized autonomous organization (DAO) in Ethereum, which creates communities to raise funding for exchange and investment.

Because this layer is in charge of presenting the final results executed from the distributed ledger system, it is supported and impacted by all the lower layers. Therefore, the performance evaluation results of the application layer reflect the overall performance of tested DLTs.

FIGURE 3. Abstraction layer model for DLT.

2) EXECUTION LAYER

The execution layer is in charge of executing contract or low-level machine code (bytecode) in a runtime environment installed on DLT network nodes. Ethereum has its own machine language and a virtual machine (EVM) developed to run the smart contracts code. Unlike Java virtual machine (JVM), the EVM reads and executes a low-level representation of smart contracts called the Ethereum bytecode. The smart contracts are programmed in a dedicated high-level language named Solidity, which is first compiled to bytecode by the Solidity compiler. The Ethereum bytecode is an assembly language made up of multiple opcodes. Each opcode performs a certain action on the Ethereum blockchain. In contrast, HLF does not take the semantics of language into consideration. It runs the compiled machine code (from chaincode) inside Docker images. In addition, HLF's smart contract (chaincode) supports multiple general high-level programming languages such as Go, node.js, and Java rather than a dedicated language like Ethereum's Solidity. IOTA does not support smart contracts to date. It adopts Java as the main development language and runs its reference implementation (IRI) in JVM. IOTA also has a version running in a Docker image.

The runtime environment used to execute contracts or transactions needs to be efficient. The execution result should also be deterministic to avoid uncertainty and inconsistency of transactions across nodes. Any transaction abortion caused by inconsistent execution would waste computation resources and further decrease the performance. Additionally, the resource configurations (e.g., CPU and RAM) may impact the execution performance.

3) DATA LAYER

In the data layer, a wide range of data-related topics are defined, including transaction models, data structure, Merkle trees, hash function, encryption algorithms, etc. There are two popular transaction models: unspent transaction output (UTXO) and account. For UTXO, one owner completes value transfers by signing a transaction transferring the ownership of the UTXO to the receiver's public key. It involves an extra step of searching for ownership of the transaction from the sender's side. The account-based model is more efficient as it atomically updates two accounts in one transaction. A smart contract (chaincode for HLF) is a special type of account.

For blockchain, blocks containing transactions and contract execution states are chained together in a linked list by putting the hash of the previous block's content into the header of the current block. Ethereum and HLF employ a two-layer data structure to organize the block's content. All states are stored in a key-value database on disk and indexed in a hash tree. The hash tree root is contained in the block's header. With a similar design, different DLTs have their own storage solutions for each level. For example, Ethereum uses LevelDB, and HLF uses CouchDB to store the states; Ethereum and Parity employ a Patricia-Merkle tree (a key-value store), while HLF implements a Bucket-Merkle tree to store the indices [49]. For IOTA, transactions are directly appended to the DAG structure called tangle in a hashed manner. The IRI uses a RocksDB database to store the snapshot, a pruned ledger that prevents the tangle from growing too large.

Besides the factors mentioned above, there are other design parameters in the data layer, such as hash functions (e.g., SHA 256 vs. SHA 128), encryption algorithms (RSA vs. ECC), and block size. All these factors may influence the performance of a blockchain system.

4) CONSENSUS LAYER

The consensus protocol is the core of a DLT system. It sets the rules and forces all nodes to follow them to reach an agreement (e.g., transaction confirmation) on blockchain content. Generally, there are two basic types of consensus mechanisms: proof-based and vote-based. The most popular proof-based consensus is proof-of-work (PoW), which has been employed by many blockchain systems. PoW is a very computation-intensive consensus. It requires the nodes to solve a difficult puzzle to compete for the right to record the ledger. Only the first node to solve the puzzle (called the winner) can append the block to the ledger and receive the corresponding incentives. Since PoW provides high security, integrity and decentralization in an untrusted environment, it is very popular in public blockchains. However, the classic PoW protocol has a poor efficiency on


processing transactions. To tackle this problem, many variations have been proposed to keep the same safety while achieving a better performance. Examples include greedy heaviest observed subtree (GHOST), proof of authority (PoA), proof of stake (PoS) and proof of elapsed time (PoET).

The vote-based consensuses are communication intensive. Different from PoW, vote-based solutions always give a deterministic execution result and usually achieve a relatively high performance. They rely on frequent message exchanges among different roles in a network to ensure that all nodes reach an agreement on the block order. They are very popular in permissioned blockchains. Raft and Byzantine fault tolerance (BFT)-based algorithms (e.g., PBFT and BFT-SMaRt) are two representatives of this consensus type. Raft has only crash fault tolerance (CFT), while PBFT and BFT-SMaRt can address Byzantine faults.

There are also some hybrid DLTs that combine different types of consensuses. For example, Tendermint combines PoS and PBFT; EOS takes a hybrid design combining PBFT and DPoS. Both aim at improved performance and enhanced security. Interested readers may refer to the published surveys of blockchain consensus. Because of their deterministic properties, BFT-based consensus algorithms have a much lower transaction delay than PoW. However, their expensive communication cost makes them difficult to scale, especially in a large network. Therefore, consensus design, evaluation and optimization in DLTs remain an active research topic.

5) NETWORK LAYER

A peer-to-peer (P2P) network is the foundation of a DLT system. It takes care of peer discovery, transactions, and block propagation. In a public blockchain such as Bitcoin, this network is very large, with thousands of nodes working together to reach consensus. For private blockchain systems, the scale varies from several entities to over a hundred. Either way, a basic requirement for the P2P network is to provide speed and stability. When a new participant wants to join, this layer ensures that nodes can discover each other. Then, all connected nodes communicate, propagate and synchronize with each other to maintain the current state of the blockchain network. Specifically, transaction broadcast, validation and transaction commit are all completed via this layer, as well as the world state propagation. In the P2P network, there are two basic types of nodes: full nodes and light nodes. Full nodes take care of mining, transaction validation and execution of consensus rules, while light nodes only keep the header of the blockchain (keys) and act as clients to issue transactions.

Therefore, the network layer is critical, especially for communication-intensive DLTs. Peer discovery and ledger synchronization among neighbours rely directly on the network, so its speed determines their efficiency. Some detailed metrics, such as the number of transactions per network data, are also related to this layer. Moreover, the packet loss rate and network delay may have an impact on the performance of DLT.

III. EMPIRICAL ANALYSIS IN BLOCKCHAIN PERFORMANCE EVALUATION

In this section, we investigate existing approaches to blockchain performance evaluation from the perspective of empirical analysis. Specifically, different solutions, including benchmarking, monitoring measurements, self-designed experiments and simulation, are reviewed and compared. In practice, these approaches are usually used together to provide more evidence for blockchain performance evaluation.

A. BLOCKCHAIN BENCHMARKING TOOLS

Performance benchmarking has been well studied and documented for cloud (e.g., Hadoop, MapReduce and Spark) and database (e.g., relational and NoSQL) systems. Some proposed benchmark frameworks such as TPC-C [50], YCSB [51] and SmallBank [52] are well-established and have essentially formed the industrial standards. For example, YCSB is widely used for benchmarking NoSQL databases such as Cassandra [53], MongoDB [54] and HBase [55]; and SmallBank is a popular benchmark for OLTP workload. However, these frameworks cannot be directly applied to benchmark distributed ledger systems due to the diversity of consensus mechanisms and APIs. As more and more blockchain systems emerge striving to improve DLT performance, it becomes imperative to devise a solution for comparing different platforms in a meaningful manner.

As of June 2020, there are three popular performance benchmarks dedicated to evaluating blockchain systems, as listed in Table 2.

Blockbench [49] is the first benchmark framework designed for evaluating private blockchains in terms of throughput, latency, scalability and fault tolerance. Presently, it supports measurement on four major private blockchain platforms, namely Ethereum, Parity, HLF and Quorum. However, it claims to support the evaluation of any private blockchain by extending the workload and blockchain adaptors accordingly.

In the design of Blockbench, four abstraction layers in blockchain are identified: consensus, data model, execution engine and application, from the bottom (low level) to the top (high level). The consensus layer is in charge of setting the rule of agreement and getting all network participants to agree on the block content so that it can be appended to the blockchain. The data model defines the data structure, content and operations on the blockchain data. The execution engine contains resources of the runtime environment such as the EVM and Docker, which support the execution of blockchain code. The application layer includes all kinds of blockchain applications such as smart contracts and different types of DApps. It is worth noting that Blockbench adopts and designs various workloads to test the performance of different layers, as shown in Table 3.

Hyperledger Caliper [57] is a performance evaluation framework mainly focusing on benchmarking Hyperledger blockchains such as Hyperledger Fabric, Sawtooth, Iroha,


Burrow and Besu. In the system architecture, there are two main components: Caliper core and Caliper adaptors. The former defines the system workflow, while the latter are used to extend evaluation to other blockchains such as Ethereum and FISCO BCOS. Before running a test, the benchmark workloads and the information needed to interface the adaptor with the system under test (SUT) need to be predefined in configuration files. During the test, a resource monitor runs to collect resource utilization information (e.g., CPU, RAM, network and IO) and all clients publish their transaction rate control information to a performance analyzer. When a test is finished, a detailed test report is generated by a report generator.

DAGbench [56] is a relatively recent framework dedicated to benchmarking DAG distributed ledgers such as IOTA, Nano and Byteball. The currently supported indicators are throughput, latency, scalability, success indicator, resource consumption, transaction data size and transaction fee. From the system design perspective, DAGbench shares the same approach as Blockbench and Caliper, which adopt a modular adaptor-based architecture. Users just need to choose (or develop if not available) associated adaptors for different workloads and blockchain systems under evaluation.

TABLE 2. A comparison of three popular blockchain benchmarks.

TABLE 3. Blockbench workloads for evaluating each layer of blockchain.

Besides the evaluation of general performance metrics, there are also studies focusing on specific metrics for a particular blockchain. For example, OpBench [58] and another benchmark framework [59] are proposed to evaluate whether a miner's reward is proportional to the CPU execution time or consumed computation power for Ethereum smart contracts.

B. BLOCKCHAIN PERFORMANCE MONITORING

Blockchain benchmarking usually requires a standardized environment and a well-documented workload as input. However, for public blockchain systems, it is impossible to have good control over the real workload and consensus participants, which makes the benchmarking more challenging. In terms of evaluating public blockchains, there are two potential solutions.

The first solution is to build a private version of the associated test network and leverage the existing benchmarks mentioned above to evaluate blockchain under artificially designed workloads. This may require new adaptor development for either workload or blockchain network. In addition, this approach should take into consideration the scalability problem of blockchain because the tested private version of


blockchain may encounter scaling issues when implemented publicly. Therefore, the tested result may show better values of performance metrics compared to the real public network.

The second solution is to monitor and evaluate the live public system's performance under realistic workload [60]. Zheng et al. [61] proposed a detailed, real-time performance monitoring framework using a log-based approach. It has lower overhead, more details, and better scalability compared to its counterpart solution via remote procedure call (RPC). The high-level system framework is shown in Fig. 4.

FIGURE 4. Blockchain performance monitoring framework [61].

C. EXPERIMENTAL ANALYSIS OF BLOCKCHAIN SYSTEMS

In this section, we look at DLT performance evaluation from the perspective of empirical analysis based on self-designed experiments. Even though empirical analysis can hardly provide standardized test results like benchmarking, this approach is very flexible in parameterization. It can be used to identify potential bottlenecks and pave the way to further performance improvements.

Experiment-based approaches have been widely employed to evaluate distributed ledger systems such as Hyperledger, Ethereum and DAG-based ledgers. Various private blockchain platforms and different versions of a certain blockchain can be compared on performance by running tests under a well-controlled test environment. In addition, some studies examined detailed performance aspects of the data layer in the blockchain abstraction model, for example, the performance of different encryption and hash algorithms.

1) HYPERLEDGER PERFORMANCE ANALYSIS

Nasir et al. [62] conducted an experimental performance analysis of two versions of HLF (v0.6 and v1.0) on their execution time, latency, throughput and scalability by varying the workloads and node scales. The overall results indicate that HLF v1.0 consistently outperforms HLF v0.6 across all evaluated performance metrics.

Baliga et al. [63] took an experimental approach to study throughput and latency of HLF v1.0. Using Caliper as the benchmarking tool, the authors configured different transac- [...] Fabric's performance characteristics were also studied by varying the number of chaincodes, channels and peers. The results show that the throughput of HLF v1.0 is sensitive to the orderer settings, and that a significant drawback of the committer in this version is that it does not process transactions in parallel and thus cannot take advantage of multiple vCPUs.

Another comprehensive empirical study was conducted by Thakkar et al. [64], who explored the performance bottlenecks of HLF v1.0 under different block sizes, endorsement policies, number of channels, resource allocation and state database choices (GoLevelDB vs. CouchDB). The experimental results indicated that endorsement policy verification, sequential policy validation of transactions in a block, and state validation and commit (with CouchDB) were the three major bottlenecks. Accordingly, the authors suggested three optimization solutions, including parallel VSCC validation, cache for membership service provider (MSP), and bulk read/write for CouchDB. All these optimizations have been implemented in release HLF v1.1.

A study completed at IBM by Androulaki et al. [65] focused on HLF v1.1 to explore the impact of block size, peer CPU, and SSD vs. RAM disk on blockchain latency, throughput and network scalability under different numbers of peers. The results show that HLF v1.1 achieves end-to-end throughput of 3500+ TPS in certain popular deployment configurations, with a latency of a few hundred ms, scaling well to 100+ peers.

Nguyen et al. [66] conducted an experimental study to explore the impact of large network delays on the performance of Fabric by deploying HLF v1.2.1 over an area network between France and Germany. The results reveal that a noticeable network delay (3.5 s) leads to an offset of 134 seconds between the two clouds after the 100th block, which indicates that the tested version of Fabric cannot provide sufficient consistency guarantees. Therefore, HLF v1.2.1 cannot be used in critical environments such as banking or trading. This was the first work that experimentally demonstrated the negative impact of network delays on a PBFT-based blockchain.

Another HLF performance evaluation work focusing on the underlying communication network was conducted by Geyer et al. [67] using Caliper [57] and a dedicated testbed on which network parameters, such as latency or packet loss, can be configured. In the experiments, the influence of transaction rate, chaincode, network properties, local network impairment, and block size was separately examined and quantitatively analyzed. The experiment results identified the validation of the transactions as the major contributor to the transaction latency in HLF.

As the first long-term support release, HLF v1.4 caught the attention of several blockchain researchers. Kuzlu et al. [68] investigated the impact of network workloads on the performance of a blockchain platform in terms of transaction throughput, latency, and scalability (i.e., the number of participants serviceable by the platform). Following network
tion and chaincode parameters to explore how they impact load parameters were varied in the experiment: number of
transaction latency and throughput under micro-workloads. transactions, transaction rate and transaction type.
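As a minimal illustration of the two metrics that recur throughout the studies above, throughput and latency can be derived from per-transaction submission and commit timestamps. The following Python sketch assumes a hypothetical list of (submit, commit) pairs in seconds; the function name and the sample data are illustrative assumptions and are not taken from Caliper or from any of the cited works.

```python
from statistics import mean


def summarize(records):
    """Return (throughput in TPS, mean latency in seconds).

    `records` is a hypothetical list of (submit_time, commit_time) pairs
    in seconds; this format is an assumption made for illustration only.
    """
    if not records:
        raise ValueError("no committed transactions")
    # Latency of a transaction: time from submission to commit.
    latencies = [commit - submit for submit, commit in records]
    # Throughput: committed transactions divided by the observation window.
    first_submit = min(submit for submit, _ in records)
    last_commit = max(commit for _, commit in records)
    duration = last_commit - first_submit
    throughput = len(records) / duration if duration > 0 else float("inf")
    return throughput, mean(latencies)


# Illustrative data: four transactions committed within a 2-second window.
sample = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]
tps, latency = summarize(sample)
print(f"throughput = {tps:.1f} TPS, mean latency = {latency:.2f} s")
```

Real evaluations typically also report latency percentiles and repeat the measurement while varying the send rate, which this sketch omits.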

VOLUME 8, 2020 126933


C. Fan et al.: Performance Evaluation of Blockchain Systems: A Systematic Survey

Although the practical Byzantine fault tolerance (PBFT) algorithm has been adopted as the consensus protocol in HLF since version 0.6, dishonest participants and their attacks, such as intentionally delaying messages, sending inconsistent messages and distributed-denial-of-service (DDoS), never stop. Malicious behaviour may significantly undermine the system in terms of both security and performance. To explore the performance of HLF with malicious behaviour, Wang [69] designed multiple malicious behaviour patterns and experimentally tested the transaction throughput and latency on HLF. The results show that delay attacks, along with keeping some replicas out of service, dramatically decrease the system performance.
Apart from Fabric, Shi et al. [70] empirically studied the performance of Sawtooth, another well-known permissioned blockchain platform from Hyperledger. The examined performance metrics include consistency (i.e., whether the platform’s performance behaves consistently each time with the same workload and cloud VM configuration), stability (i.e., whether the platform’s performance remains stable with the same workload, but different cloud VM configurations) and scalability (i.e., how the platform performance achieves scalability with different workloads and configuration parameters). The adjustable configuration parameters identified for optimizing the performance of Sawtooth are scheduler and maximum batches per block.
From the results of empirical performance analysis summarized above, it is obvious that Hyperledger needs improvement on both geographical scalability (limited by the network latency) [66] and size scalability (the platform fails to scale beyond 16 nodes [49]). The bottleneck is the adopted PBFT consensus, which is a communication-bound mechanism as opposed to the computation-intensive PoW [71] consensus.

2) ETHEREUM PERFORMANCE ANALYSIS
Rouhani and Deters [72] studied the performance of Ethereum on a private blockchain by analyzing the two most popular Ethereum clients: PoW-based Geth and PoA-based Parity. The results indicate that, compared to Geth, Parity is 89.82% faster in terms of transaction processing, on average, under different workloads.
Yasaweerasinghelage et al. [73] introduced an approach to predict the latency of blockchain-based systems using the software architectural modelling tool Palladio workbench [74] and simulation. They leveraged the proposed method to test latency on a private Ethereum (Geth) experimental environment. The results show a low relative error of response time, mostly under 10%.
Bez et al. [75] conducted an initial quantitative analysis on the scalability of Ethereum. The transaction throughput was evaluated under an extensible test environment with synthetic benchmarks. The results indicate that Ethereum follows the scalability trilemma, which claims that a blockchain platform can hardly reach decentralization, scalability and security simultaneously.

3) DAG DLT PERFORMANCE ANALYSIS
In traditional blockchain systems, transactions are stored in blocks that are then organized as a ledger in a single chain data structure. This structure makes it incapable of concurrently generating blocks, and thus limits the transaction throughput. In DAG-based distributed ledgers, transactions or blocks are organized in different vertices of the directed graph, which allows parallel block generation and inclusion. Based on this idea, many distributed ledgers have been proposed with their own consensus mechanisms. For example, IOTA [37] employs a cumulative weight approach for transaction confirmation and a Markov chain Monte Carlo (MCMC) sampling algorithm for random tip selection; Byteball [38] achieves consensus by relying on 12 selected reputable Witnesses; and Nano [39] adopts a balance-weighted vote mechanism to reach agreement on transaction confirmation.
Even though DAG-based ledgers are designed to theoretically have faster transaction speed than blockchain, it is necessary to evaluate the performance of existing DAG distributed ledger implementations and identify their potential bottlenecks. Fan et al. [76] demonstrated the scalability of IOTA under IoT scenarios in a private network with 40 nodes. The experiment results indicated that transaction throughput (TPS) has good linear scalability against the transaction arrival rate. Three representatives of DAG-based distributed ledgers, namely IOTA, Nano and Byteball, were comparatively evaluated using the proposed DAGbench in [56]. From the experimental results, some useful observations, such as the advantages and disadvantages of the three tested DAG implementations, can be obtained.

4) COMPARATIVE ANALYSIS
Before developing a blockchain-enabled application, decision makers should first assess the suitability of blockchain implementation. Then, a comparative performance analysis is often necessary to select a blockchain platform that will perform well in the target application environment.
After developing Blockbench, Dinh et al. [49] used this tool to conduct a comparative performance analysis on three mainstream private blockchains, namely Ethereum (geth v1.4.18), Parity (v1.6.0) and HLF (v0.6.0-preview). Their findings can be summarized as follows: 1) HLF performs consistently better than Ethereum and Parity across all macro (e.g., throughput and latency) and micro (e.g., IOHeavy) benchmarks, but it fails to scale up to more than 16 nodes; 2) The consensus protocols are identified as major bottlenecks for HLF and Ethereum, while transaction signing is a bottleneck for Parity. The authors further compared the performance of two different versions of HLF, v0.6.0 and v1.0.0, against the IOHeavy workload in their more recent work [71].
Because of the lack of interface standards, evaluating different blockchains remains difficult. To overcome this problem, a generic workload performing the same functions on different blockchain interfaces was designed in [77] to


comparatively evaluate three prominent consortium blockchain platforms for IoT. They were HLF v0.6 with the PBFT consensus, HLF v1.0 with the Byzantine fault-tolerant state machine replication (BFT-SMaRt) consensus, and Ripple with the Ripple consensus. Results confirmed that the evaluated blockchains could offer reasonable throughput but with very limited scalability.
Pongnumkul et al. [78] conducted a preliminary performance analysis of two popular private blockchain platforms, HLF (v0.6) and Ethereum (geth 1.5.8, private deployment), under varying workloads. The experimental results demonstrated that HLF outperforms Ethereum in terms of all evaluated metrics (execution time, latency and throughput). However, this study also pointed out that the performance of both platforms is still not competitive with current mainstream database systems, especially under high workloads. This conclusion was tested and confirmed by another, more recent study [79], in which Ethereum and MySQL were compared.
Comparative analysis can also be conducted on consensus algorithms of different blockchains. For example, Hao et al. [80] compared the performance of Hyperledger (PBFT) and private Ethereum (PoW) via their proposed benchmark framework constructed with four modules: workload configuration module, consensus smart contract module, data collector module and the target blockchain platforms. The evaluation results show that HLF consistently outperforms Ethereum in terms of average throughput (TPS) and latency. This study also points out that the consensus mechanism induces a performance bottleneck in private blockchains. Another example is the performance analysis conducted on PoW and the Proof-of-Collatz Conjecture (PCC) [81]. PCC [82] is a recently introduced number-based theoretic PoW using a new metric called Collatz orbits, which are defined in the Collatz Conjecture algorithm. The authors evaluated these two consensus algorithms with respect to the execution time, the deployment time and the latency on a private blockchain network. The experiment results demonstrate that the PCC-based blockchain consistently outperforms the PoW-based blockchain in terms of all tested metrics and even steadily achieves 1000× faster execution speed than PoW.
To provide system designers suggestions on smart contract platform selection, Benahmed et al. [83] conducted a comparative performance analysis of Hyperledger Sawtooth, EOS and Ethereum. Following the workloads used in Blockbench [49], the authors modified and defined three types of workloads, namely CPUHeavy, KVStore (Key-Value Store), and SmallBank, to comparatively test CPU consumption, memory consumption, load scalability and network scalability in distributed ledgers. The results reveal that the third generation platform EOS outperforms the other two in both resource consumption and speed, but with some shortcomings such as centralization. In addition, according to their performance in the test, Sawtooth was suggested for use in the Internet of Things and Ethereum’s PoA implementation for the fast development of web-oriented applications.
To explore whether existing blockchain solutions can scale to large IoT networks, Han et al. [84] comparatively evaluated the performance of five selected prominent distributed ledgers using classic consensus protocols: Ripple, Tendermint, Corda and v0.6 and v1.0 of HLF. A series of exclusive tests were run to evaluate the throughput and latency with different numbers of nodes (ranging from 2 to 32) for each of the ledgers. The results show that even though these systems can sometimes provide thousands of TPS throughput, their networks usually do not scale to tens of devices as the performance drops dramatically when the number of nodes increases. Table 4 lists an overview of various DLTs’ performance extracted from the reviewed experimental analysis studies.

5) ENCRYPTION PERFORMANCE ANALYSIS
In addition to the end-to-end performance metrics, there are also some evaluation works focusing on the detailed performance of a certain step or subprocess such as the efficiency of encryption and hash function. According to Park et al. [85], the transaction processing time equation is

$$T = t_i + t_c = (t_v + t_{pow} + t_n + t_e) + t_c, \qquad (1)$$

where $t_i$ refers to the issuance time, $t_c$ to the confirmation time, $t_v$ to the validation time, $t_{pow}$ to the PoW time, $t_n$ to the network overhead, and $t_e$ to the processing overheads. The processing overheads include encryption/decryption, hashing and authentication. Efficient encryption and hashing algorithms contribute to transaction issuance speed in DLT. Chandel et al. [86] analyzed and compared the performance of the two most commonly used encryption algorithms in blockchain, Rivest-Shamir-Adleman (RSA) and elliptic-curve cryptography (ECC). Their comprehensive analysis results based on the key size, key generation performance and signature verification performance show that the ECC algorithm (adopted by Bitcoin and Ethereum) outperforms RSA in general. This study also points out that ECC satisfies the security needs of blockchain better than RSA.
More recently, Ferreira et al. [87] conducted a study on Blockchain-based IoT (BIoT) [88] to explore the performance of hash functions in blockchains. Particularly, the authors developed a blockchain in an IoT scenario to evaluate the performance of different cryptographic hash functions such as MD5, SHA-1, SHA-224, SHA-384 and SHA-512. The test results show that SHA-224 and SHA-384 are the best hash functions for blockchain due to their lack of collision attacks. In hashing ciphers, a collision attack is the problem that there exist two different messages $m_1$ and $m_2$, such that hash($m_1$) = hash($m_2$). In addition, these two hash functions are more time-efficient than others to process blockchain functions with the advantage of producing a smaller average block size.
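The digest lengths of the hash functions named above, and a rough impression of their relative cost, can be inspected with Python's standard hashlib and timeit modules. The sketch below is illustrative only: the payload and the helper name are assumptions, and the timing loop is a coarse single-machine measurement rather than a reproduction of the experiment in [87].

```python
import hashlib
import timeit

# Hash functions listed in the text, using hashlib's algorithm names.
ALGORITHMS = ["md5", "sha1", "sha224", "sha384", "sha512"]


def digest_sizes(algorithms=ALGORITHMS):
    """Map each algorithm name to its digest length in bytes."""
    return {name: hashlib.new(name).digest_size for name in algorithms}


# Hypothetical payload standing in for a block of IoT transaction data.
message = b"hypothetical IoT sensor reading" * 32

for name, size in digest_sizes().items():
    # Time 10,000 hash computations; absolute numbers vary by machine.
    seconds = timeit.timeit(
        lambda: hashlib.new(name, message).digest(), number=10_000
    )
    print(f"{name:7s} digest = {size:2d} bytes, 10k hashes in {seconds:.3f} s")
```

A shorter digest reduces the bytes stored per hash reference, which is one plausible reading of the block-size advantage mentioned above; the sketch does not measure collision resistance.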


TABLE 4. Overview of different DLT performance (throughput and latency) under various evaluation environments.

D. SIMULATION
All the evaluation solutions mentioned above (i.e., benchmarking, monitoring and experimental analysis) require the availability of the systems, whether private or public blockchains. However, the system under evaluation is not always available. For instance, when a company needs to make a selection between two blockchain platforms under development according to their performance, none of the previously discussed solutions is feasible. Moreover, it is usually costly in both time and resources to construct a real blockchain network for testing. This brings us to explore another evaluation approach, namely, simulation. A blockchain simulator can mimic the behaviours of network nodes in reaching the consensus, providing performance similar to a real system. Besides, a blockchain simulator usually provides a convenient way for users to tune the system parameters to run different settings for the sake of comparison. In this subsection, we will take a look at the role of simulation in the blockchain world.

1) BlockSIM
In 2019, there were three similar simulators with the same name, BlockSim (or BlockSIM), proposed for simulating blockchain systems. Alharby and van Moorsel [89] proposed and implemented a framework called BlockSim to build discrete-event dynamic system models for PoW-based blockchain systems. This framework was organized in three layers: incentive layer, connector layer and system layer. Using the proposed simulation tool, the authors explored the block creation performance under the PoW consensus


TABLE 5. A comparison on different empirical performance evaluation solutions for blockchain system.

algorithm. This simulator helped to understand the details of the block generation process in PoW. The predefined test cases were validated and verified in their extension study, where the simulation outcomes were compared with results of real-life systems such as Bitcoin and Ethereum to show the feasibility of this approach. However, the extensibility of this simulator is still a problem for future research.
To help architects better understand, evaluate and plan for the system performance, Pandey et al. [90] proposed and developed a comprehensive open-source simulation tool called BlockSIM for simulating private blockchain systems. This tool is designed to evaluate system stability and transaction throughput (TPS) for private blockchain networks by running scenarios, and then decide on the optimal system parameters suited for the purposes of architects. The comparison results between BlockSIM and a real-world Ethereum private network running PoA consensus show that BlockSIM can be used effectively.
More recently, Faria and Correia [91] presented a flexible discrete-event simulator (also called BlockSim) to evaluate different blockchain implementations. With a good design of APIs, new blockchains can be easily modelled and simulated by extending the models. Running this simulator for Bitcoin and Ethereum, the authors reached some interesting performance conclusions. For instance, doubling the block size (number of transactions) had a small impact on the block propagation delay (10 ms), while encrypting communication had a higher impact on that delay (more than 25%).

2) DAGsim
Similarly, Zander et al. [92] presented a continuous-time, multi-agent simulation framework called DAGsim, for DAG-based distributed ledgers. Specifically, the performance of IOTA in terms of the transaction attachment probability was analyzed using this tool. The results indicate that agents with low latency and high connection degrees have a higher probability of having their transactions accepted in the network. Another multi-agent tangle simulator [93] built with NetLogo simulates both random uniform and MCMC tip selection in a visualized and interactive way.
In addition to pure simulators, some other studies leverage simulations combined with analytical results to conduct validation or exploration. Park et al. [85] proposed and implemented a general DAG-based cryptocurrency simulator using Python. This simulator was used to validate the proposed analytical performance model, through which they found that by issuing a transaction with a smaller average number of parents n in DAG, the transaction speed (TPS) can be increased. Kusmierz et al. [94] ran IOTA tangle simulations in a continuous-time model to explore how different tip selection algorithms, i.e., uniform random tip selection (URTS) and unbiased random walk (URW), affect the growth of the tangle. Simulations under varying transaction arrival rates were used to analyze the performance of the tangle.

E. COMPARISON OF DIFFERENT EVALUATION SOLUTIONS
In the previous subsections, we introduced four types of empirical evaluation solutions and surveyed existing studies which adopted the associated approaches. In this subsection, we comparatively discuss the advantages and disadvantages of the above-mentioned solutions. This comparison is based on both the general characteristics of the individual approaches and their suitability in evaluating different types of blockchains. The compared items are divided into two categories: solution requirements and solution efficiency, see Table 5. Solution requirements describe the network specifications for evaluating blockchain systems in terms of the node, network and workload. Solution efficiency provides three dimensions, namely parameterization, extensibility and difficulty, to compare the efficiency and effectiveness of different solutions.
Monitoring the performance of a blockchain system requires a realistic deployment of the system in production with real workloads. Even though this approach can also be used to evaluate a private blockchain in an experimental setup, we argue that it is more suitable to evaluate public blockchains when compared with benchmarking. In the context of evaluating a public blockchain, it becomes difficult to change any parameters for multiple tests. The challenge of extensibility lies in the development of an adaptable log parser for various blockchains. However, it is easy to deploy for certain blockchains using the existing solutions [61].
Benchmarking requires a well-controlled evaluation environment with a test network and artificial workloads. Once a benchmark tool is selected, the supported workloads and test metrics are limited, as well as the parameters which can be tuned. For example, Blockbench does not support tuning the network layer parameters such as network delay and, to date, it only supports evaluating four types of blockchain platforms, i.e., Ethereum, Hyperledger Fabric (HLF), Parity and Quorum. However, the well-designed APIs allow


users to develop their own adaptors and extend its feasibility to evaluate any private blockchains. So, the extensibility of benchmarking is relatively higher than that of monitoring. In addition, this solution is easy to deploy since there have been several popular and well-documented benchmark tools, see Table 2.
Experimental analysis refers to the evaluation solution based on self-designed experiments. This is a very general solution that is commonly used. It is very similar to benchmarking but different in two main aspects. First, self-design gives more flexibility in evaluating impact factors, providing a high capability of parameterization. For example, the impact of network delay on HLF performance can be evaluated in a self-designed experiment, which is not supported by benchmarking. Second, the test is usually dedicated to a specific blockchain and is not as standardized as benchmarking, which limits the extensibility. So, the deployment difficulty partly depends on the complexity of the SUT and what to evaluate.
Simulations have a relatively greater difficulty in the stage of simulator design and development. But, once it is completed, the simulator usually provides a number of advantages in comparison to other approaches. The simulation solution is very extensible and can be used to quickly test different configuration parameters at a low cost. As mentioned in subsection III-D, another obvious advantage of simulation is that it does not require the availability of the blockchain. However, as for the evaluation results, there may be a relatively large difference (e.g., 10%) between simulation and experiment, which induces concerns about the accuracy of this solution. Moreover, some metrics cannot be evaluated in simulators such as the transactions per CPU, transactions per memory second, transactions per disk IO, and transactions per network data for a blockchain system.

IV. ANALYTICAL MODELLING IN BLOCKCHAIN PERFORMANCE EVALUATION
Analytical modelling of performance leverages mathematical tools to formalize a blockchain system in an abstract way and to solve the ensuing models with rigor. The model output (e.g., average transaction latency being expressed as a function of network indicators) provides analytical evidence for blockchain performance evaluation. In this section, we survey the performance evaluation solutions of distributed ledger systems based on analytical modelling. We aim to summarize the mainstream techniques, explore how and why these models are employed for certain distributed ledgers, and then identify the current challenges in blockchain performance modelling. In particular, we focus on surveying the stochastic models, which have been used to successfully model many cloud systems.
In Table 6, we classify the existing popular solutions of performance modelling for distributed ledgers into four categories: Markov chains, queueing models, stochastic Petri nets and other models.

A. MARKOV CHAINS FOR MODELLING DLT CONSENSUSES
In probability theory, Markov processes are a type of stochastic process with the Markov property. Also called the memoryless property, it refers to the fact that the future states of the process depend only on the present state, but not on the previous ones. A Markov chain is defined as a Markov process with discrete state space. It is a fundamental mathematical tool to evaluate the performance of distributed ledger systems [111]. In this subsection, we investigate how Markov chains are used to model two different consensus algorithms: Raft and the tangle for IOTA. The specific type of process used for this modelling is called a discrete time Markov chain (DTMC).
DTMC for Modelling Raft: In a Raft [112] cluster, each node is at any given time in one of the three states: follower, candidate and leader. Normally, there is only one leader in a Raft cluster. We call it a network split in the case of two or more leaders being elected simultaneously, which may dramatically impact the performance of the system. After the leader has been elected, it handles all requests from the client and sends them to followers for validation. Followers simply receive requests from and respond to leaders and candidates. Candidate is an intermediate state in the transition from follower to leader. The whole Raft consensus can be divided into consecutively numbered time periods called terms, each of which has two processes: leader election and ledger replication. Each term starts with a leader election, in which all nodes start from the follower state. Then, a node transits from follower to candidate, and from candidate to leader or back to follower, according to the rules depicted in Fig. 5 [112]. Once a leader is elected successfully, the ledger replication process starts with the leader sending heartbeat messages to all other nodes to establish its authority and prevent new elections. Once the leader receives responses confirming that the new transaction entry has been written to the ledger from the majority of the followers, it notifies them and the client that the transaction is committed.

FIGURE 5. Node states transition illustration in Raft consensus.

To explore the impact of network properties on the blockchain performance, Huang et al. [95] have built a simple Markov chain model for the process of a node transferring from follower state to candidate. They consequently present the network split probability as a function of the network size, the packet loss rate, and the election timeout period. Let us define the packet loss probability as a constant value p for a given network, the timeout period for each round of election as $E_t$ uniformly initiated from a range [a,b], and the interval


TABLE 6. A summary of blockchain performance modelling studies.

between two heartbeats as $\tau$. Thus, the maximum number of heartbeats for an election to timeout is $K \in \{K_1, K_2, \ldots, K_r\}$, where $K_1 = \lfloor a/\tau \rfloor$ and $K_r = \lfloor b/\tau \rfloor$. Then, two discrete time stochastic processes at time n can be defined: g(n) as the stage status {1,2,...,r} of a given node, and b(n) as the remaining steps (i.e., number of heartbeats) for the election phase to timeout in a term.
Therefore, the transition process for an observed node from follower to candidate can be modelled as a two-dimensional stochastic process {g(n),b(n)}. It can be further transformed to an absorbing DTMC on the state space $\{(1,K_1), \ldots, (1,0), \ldots, (i,K_1), \ldots, (i,0), \ldots, (r,K_r), \ldots, (r,0)\}$. Using the mathematical derivations proven in [95], the network split probability before the n-th step can be obtained.
DTMC for Modelling IOTA Tangle: IOTA tangle [37] is a DAG-based distributed ledger designed for the microtransactions in the IoT. Its consensus encourages all participants to contribute to maintaining the ledger through referencing (i.e., approving) two unapproved transactions called tips before issuing any new transaction. For the newly arriving transaction, IOTA tangle leverages the MCMC random walk algorithm to select two tips. All transactions directly or indirectly approved by this new transaction then add its weight to their cumulative weights, as shown in Fig. 6. For an approved transaction, its cumulative weight gradually increases to reach a predefined threshold. Finally, the corresponding transaction is considered confirmed and permanently recorded in the ledger.

FIGURE 6. An example of the IOTA tangle.

To explore the impact of various transaction arrival rates on the cumulative weight and confirmation delay of an observed transaction, Cao et al. [96] built a Markov chain model to


analyze the tangle consensus. They classified the network into four different regimes, according to the load situations: high load (HR), low load (LR), high-to-low load (H2LR) and low-to-high load (L2HR). In each regime, the consensus process can be divided into two stages, namely the reveal stage and accumulating stage [37]. Since the first two steady regimes HR and LR have been analyzed in [37], the authors only focus on two unsteady regimes H2LR and L2HR.
The system can be modelled as a two-dimensional stochastic process $S(t) = \{W(t), L(t)\}$ at an arbitrary time t, where W(t) is the cumulative weight of a transaction observed at time t, and L(t) is the total number of tips in the tangle at time t, t = kh, k = 0,1,2,...,∞. Considering that W(t + h) and L(t + h) are only determined by the current states W(t) and L(t), but not related to the earlier status, the consensus process for a new observed transaction from issuance to confirmation can be formulated as a Markov process. Furthermore, this Markov process can be formalized as a DTMC on discrete transaction arrival time intervals. Here, one step transition of the observed transaction is defined as the arrival of an incoming transaction that randomly selects two tips for reference from the L(t) tips. Based on this DTMC model, the expected cumulative weight and confirmation delay at a certain time in both H2LR and L2HR can be obtained.

B. QUEUEING THEORY FOR MODELLING DLT CONSENSUSES
Queueing theory was originally proposed by Agner Krarup Erlang in 1909, to describe the Copenhagen telephone exchange. It was later developed to solve different types of system problems that involve waiting, such as customers waiting for teller service in banks. In recent years, queueing theory has been widely used to model computer networks and systems, cloud computing centers, and blockchain systems. In a blockchain system, transactions issued by clients need to wait for servers (e.g., miner, validator or orderer) to provide service (e.g., mining, validating or ordering), and finally get confirmed.
Using queueing theory, different consensus processes of DLTs can be modelled as different types of queue systems, which are named according to Kendall’s notation [113]. Within a queue system, it is possible to quantitatively answer
we review the queue systems for modelling some popular consensus mechanisms such as PoW and PoA.
Queueing Models for PoW: In a PoW-based blockchain such as Bitcoin [1], the ledger is maintained and updated by the mining process. In the mining process, a group of nodes called miners compete to solve very difficult puzzle-like problems, which consume a lot of computation power. Transactions issued by users are grouped into a container called a block, and the mining competition winner who first finds the algorithmic puzzle answer specialized for the block has the right to add the new block to the blockchain and accordingly gets incentives.
In 2017, Kawase and Kasahara [97] first built a modified $M/G^B/1$ queue with batch service to model the Bitcoin mining process, trying to deal with the transaction-confirmation time. In this model, transaction arrival was assumed to be a Poisson process and the service time interval to follow a general (or arbitrary) distribution. Arriving transactions are served in a batch manner with a maximum batch size b. In a typical $M/G^B/1$ queue system, an idle server starts service immediately if there are one or more customers awaiting service in the system [114]. But in this variant model, newly arriving transactions wait in the queue to be served until the next block-generation time, even when the number of transactions is smaller than b. This is regarded as the service with multiple vacations. This is a very straightforward model description from the Bitcoin block generation and mining process based on Nakamoto’s consensus, in which new transactions are grouped into a block to wait for being mined in the next block-generation time or even later on.
To analyze this queue system, the authors leveraged the joint distribution of the number of transactions in the system and the elapsed service time to derive the mean transaction-confirmation time. Then, by using the method of supplementary variables, a system of differential-difference equations was set up to formalize the problem. However, they have not successfully provided the unique solution of the system of differential-difference equations, leaving analysis of the blockchain queue system as an interesting open problem for future research.
To overcome the difficulties encountered in the original model [97], Li et al. introduced a new blockchain queueing model [98] by decomposing the mining process into
some system performance questions such as what is the
two different exponential service stages: block-generation
expected number of transactions in the system, what is the
and blockchain-building processes. The sum of both stages’
transaction throughput of the system and what is the aver-
times is regarded as the transaction-confirmation time, also
age service time (i.e., transaction time). In this subsec-
called service time. In this model, all Bitcoin transactions
tion, we focus on introducing the typical queueing models
are assumed to be arriving according to a Poisson process,
(e.g., M/M/1, M/G/1 and G/M/1 queues) used for addressing
namely the arrival interval times follow an exponential dis-
these performance questions of some mainstream consensus
tribution with arrival rate λ. Service times in two stages
algorithms for blockchain.
of batch services are also simply assumed to be i.i.d. and
exponentially distributed with rates µ2 and µ1 , respectively.
1) QUEUEING MODELS FOR PROOF-BASED CONSENSUSES First, each transaction enters a queue waiting room and waits
Proof-based consensus is a type of consensus mechanism that for services. Then, in the first service process, a group of
requires the network participants to provide enough proof transactions are mined into a block with rate µ2 and, simul-
to compete for the chance of updating the ledger. Here, taneously, a nonce is appended to the block by the mining

winner. The block has a limited transaction capacity of b, also called the batch size in the model. In practice, the selection of transactions may not follow a first-come-first-served (FCFS) discipline, meaning that some later-arriving transactions in the queue may be selected into the block first. In this model, however, all computations are based on the FCFS discipline for simplicity. Finally, a generated block with all transactions wrapped in it is attached to the blockchain at rate µ_1. The simple blockchain queueing model is illustrated in Fig. 7.

FIGURE 7. Blockchain queueing model with two batch service processes.

To analyze this queueing model, the authors defined two random variables I(t) and J(t) as the numbers of transactions in the block and in the queue at time t, respectively. Thus, the system can be modelled as a two-dimensional continuous-time Markov chain (CTMC) X(t) = {I(t), J(t)} on the state space Ω = {(i, j) : i = 0, 1, . . . , b; j = 0, 1, 2, . . .}. By analyzing the state transition diagram (see [98] for details), the only three possible transitions from an arbitrary state (i, j) are to state (i, j+1), on the same level; to state (0, j), i levels up; or to state (l, i+j−l), l (1 ≤ l ≤ b) levels down. With these characteristics, the corresponding Markov transition matrix (or infinitesimal generator) Q is a lower Hessenberg matrix, which is built from repeating small matrix blocks. Therefore, X(t) is a continuous-time Markov process of GI/M/1-type. This block-structured Markov chain (the other two examples are M/M/1-type and M/G/1-type) can be solved using the matrix-analytic (or matrix-geometric) approach.

However, this model makes very strong assumptions on the transaction arrival and service processes. It is too specific and not suitable for many practical conditions of blockchain systems. To generalize this model, in their more recent work [99], the authors changed the transaction arrivals from Poisson to a Markov arrival process (MAP), the service times from exponential to phase-type (PH), and the service discipline from FCFS to service-in-random-order. Under the new assumptions, the description of the blockchain queueing model stays the same. Note that this is also a structured GI/M/1-type Markov chain. For the solution, matrix-geometric approaches are adopted to analyze and find the stable condition, which is the same as the stationary condition of the previous model. Simple expressions for the average stationary number of transactions in the queue waiting room E(N_1) and the average stationary number of transactions in the block E(N_2) are obtained separately.

Because of the batch service and the service-in-random-order discipline for choosing transactions from the queueing waiting room into a block, the Markov chain structure becomes more complicated. This makes the computation of the transaction-confirmation time very difficult. To overcome the challenge, the authors borrowed a computational technique that uses both the PH distributions and the RG factorizations [99].

Other blockchain queueing models have been proposed for analyzing the performance of PoW consensus. Ricci et al. [100] proposed a framework combining machine learning with queueing theory to study Bitcoin transaction delays. They introduced a simple queueing model for characterizing transaction confirmation that can be considered a variant of the M/G/1 queue. Rather than relying on complicated mathematical derivations, the authors mainly leveraged the operational laws of queueing theory, such as Little's law, to solve the queue system. The most important result, namely the average transaction delay experienced by a user, is given as E(D) = αE(B) + E(B_r), where α is the expected number of blocks that a user needs to wait until a transaction is confirmed, E(B_r) denotes the residual time of the inter-block time, and E(B) stands for the average time between block confirmations. This formalization is inspired by the standard M/G/1 queueing model, where the coefficient of the residual service time equals the system utilization. In this variant of the model, a block is always being mined, making the utilization 100% all the time.

Zhao et al. [101] established a type of non-exhaustive queueing model to study the average transaction confirmation time in a PoW-based blockchain system. In such a system, any block has a size limitation, and the block cannot be confirmed during the mining process. Therefore, a non-exhaustive queue with a limited batch service and a possible zero-transaction service is naturally more suitable for capturing the system features. In this queueing model, the mining process is treated as a vacation, and the block-verification process is regarded as a service. Transaction arrival is assumed to be a Poisson process with rate λ. The time duration V of a mining process and the time duration S of a block-verification process are both i.i.d. variables that follow general distributions with distribution functions V(t) and S(t), respectively. The Laplace-Stieltjes transform (LST) is used in modelling both the mining and block-verification processes to provide integral expressions for E[V] and E[S]. Through a series of mathematical transformations and derivations, the authors eventually obtained the following expression for the average transaction confirmation time: E[C] = E[S] + E[V] (refer to [101] for details).

Queueing Models for PoA: To evaluate the performance of the mining process in Proof-of-X based blockchains, Geissler et al. [102] proposed a generic discrete-time GI/GI^N/1 queueing model. Their goal was to investigate key performance indicators, such as the mean queue size and
mean transaction waiting time, and to identify significant impact factors. To make this model general, the authors abstracted the blockchain network as a single server by neglecting the information propagation delays among network nodes. Then, the model was built around a fixed-point iteration of the queue size distribution by representing the system state with the queue size Q_n.

In this system model, the transaction interarrival time A follows a general distribution a(k), with cumulative distribution A(k) = P(A ≤ k) = Σ_{i=0}^{k} a(i), k ∈ [0, ∞). The service time T is also assumed to follow a general distribution. Every time a new transaction arrives, the queue size Q(k) increases by one, while every block generation decreases the queue size by confirming a batch of transactions from the queue. Thus, the queue size distribution can be defined recursively, with the iteration based on an embedded Markov chain whose embedding times lie right before a block generation event. Furthermore, the distribution of the key performance indicator, the transaction waiting time, can be defined by the recurrence time distribution of the block generation process r_T(x) and the coefficient of weighted probability c(k). The corresponding expressions are obtained through recursive solutions, Little's law of queueing theory and basic probability derivations; see [102] for details. In the evaluation part, the authors obtained a good match between the model data and the experimental measurements, which showed the effectiveness and accuracy of the model. Unfortunately, this general model was only validated using a specific Ethereum implementation based on the Proof-of-Authority (PoA) consensus. This limits the demonstrated versatility of the model, since the more popular PoX consensuses such as PoW and PoS have not been examined.

2) QUEUEING MODELS FOR VOTE-BASED CONSENSUSES

Vote-based consensus is a type of high-performance algorithm that relies on voting to reach agreement on transaction processing among participant nodes in a distributed system. It is the most popular consensus mechanism used in permissioned blockchains. Three widely used representatives of vote-based consensus implementations are PBFT [115], BFT-SMaRt [116], and delegated Byzantine fault tolerance (dBFT).

Queueing Models for PBFT: The classic PBFT algorithm was first proposed in 1999 to handle transmission errors and Byzantine faults in distributed systems [115]. It consists of five steps: request, pre-prepare, prepare, commit and reply. When PBFT is adopted in constructing blockchain systems such as Hyperledger Fabric v0.6 [65], Zilliqa [117], and EOS [47], it has different implementations and/or combinations with other protocols. For example, EOS uses a hybrid consensus that combines PBFT with DPoS to greatly reduce the required consensus time. Zilliqa uses an optimized version of classic PBFT combined with sharded PoW to achieve consensus efficiently, yielding a high throughput for the blockchain system. HLF, as the most well-known permissioned enterprise-level distributed ledger platform, implements the PBFT consensus algorithm among the network peers (i.e., endorser, orderer and committer) mainly through three phases: endorsement, ordering and validation, as illustrated in Fig. 8.

FIGURE 8. Hyperledger Fabric transaction workflow.

Phase 1. Endorsement (also called proposal or execution): (1) The client generates transaction proposals and submits them to endorsers for execution. (2) The endorsers simulate the transactions by executing the operations previously written in the chaincode, and then return responses with signed endorsements to the client. The endorsements contain the values read or written by the chaincode, called the read/write set (or rw-set).
Phase 2. Ordering: The client sends the transaction together with the endorsements to the Solo orderer for ordering service. (3) The orderer collects transactions submitted from different clients, establishes a total order on them for each channel, packages multiple transactions into blocks and generates a hash-chained sequence of blocks. As of HLF v2.0, there are three implementations of ordering peers: Solo, Kafka, and Raft.
Phase 3. Validation (also called validation and commit): (4) The ordered blocks are delivered to committers through gossip protocol broadcasting. All peers are committers by default, including pure committers and committers with additional endorser responsibilities. Subsequently, the peers validate each transaction contained in the received blocks. If all validations pass, the transaction's write set is applied to the peer's world state, and the client is notified of the successful execution of the transaction (5). Otherwise, any failed check marks the transaction as invalid, and its effects are disregarded.

Geyer et al. [67] modelled the Solo ordering process of HLF as an M/M^B/1 queueing system. According to the three phases described above, transactions with endorsements arrive at the orderer at different times and are queued. When the number of queued transactions reaches a threshold B (called the batch size), the orderer immediately provides ordering service and packages them all at once into a block. If the transactions arrive according to a Poisson process with rate λ and the ordering service time is assumed to follow an exponential distribution with rate µ and FCFS discipline, the service
process can be described as an M/M^B/1 queueing system, as shown in Fig. 9.

FIGURE 9. Hyperledger Fabric ordering service illustration and M/M^B/1 queueing model.

To solve this model, the authors borrowed the results from a well-studied general bulk service queueing model, M/M^{a,b}/1 [118]. They simply set the batch size range to a = b = B. Then, the average time spent in the ordering phase E(T) can be expressed in terms of the given parameters, among which the batch size B is shown by the numeric evaluations to have a significant effect on E(T). This model captures the characteristics of the ordering phase in the Solo implementation well. However, its shortcomings are obvious: 1) it is not suitable for Raft or Kafka implementations; and 2) it does not describe the whole transaction delay in the HLF system.

Alaslani et al. [103] focused on evaluating the end-to-end delay of a PBFT blockchain system in IoT. To study the system delay, the authors built a model with two standard queues to capture the features of PBFT consensus at the system level. In this system, there are M IoT devices working as clients to send transaction requests, and K intermediate switches and R consensus replicas working together to process transactions. Since different IoT applications have different latency requirements to guarantee their service level agreement (SLA), network parameters need to be analyzed to meet the requirements. In the first part of the model, an M/G/1 queue is considered in which the maximum number of network hops K* needs to be calculated under the application latency constraints. In the second part of the model, an M/M/1 queue is used to calculate R, the number of consensus replicas (i.e., blockchain consensus participants) needed to maintain the end-to-end requirements. Next, operational laws such as Little's law are used to analyze the network hops and the number of consensus replicas, where the three main phases (i.e., pre-prepare, prepare, and commit) of PBFT and its fault tolerance capability of f out of N = 3f + 1 replicas are taken into consideration.

Fork-Join Queue for Vote-Based Consensus: In vote-based permissioned blockchain systems, transactions are broadcast to all authenticated voting peers of the P2P overlay after being proposed. These voting peers, called miner nodes or validators, are selected and authorized to validate transactions, generate new blocks and record data to the local ledger if a transaction gets enough validation votes, e.g., k out of n. For example, in PBFT, a block is accepted and recorded if 2f + 1 out of n = 3f + 1 peers independently agree on the block of transactions, where f is the maximum number of Byzantine faulty peers the system can tolerate.

The idea of leveraging an (n, k) fork-join queue to model vote-based blockchain is based on the fact that the service process of this queue system matches well with the above-mentioned transaction propagation and validation procedure. In an (n, k) fork-join queue, the incoming jobs are split (forked) on arrival for simultaneous and independent service by numerous servers and joined before departure. Similarly, in a vote-based blockchain system, if we consider the confirmation of a transaction as a big job requiring enough validations from n nodes, this job can be split into n sub-tasks, each corresponding to the transaction being broadcast to one of the n nodes and validated independently at the same time. Once any k out of n sub-tasks are finished, they are joined to finish the service, and the transaction is confirmed and recorded to the local ledger. The remaining n−k sub-tasks keep executing until they finish. This is called a non-purging (n, k) fork-join queue. By contrast, a purging (n, k) fork-join queue removes all remaining sub-tasks of a job from both sub-queues and service nodes once it receives the job's kth answer.

In the literature, this model is highly prevalent for performance modelling (e.g., estimating the sojourn time of jobs in the queues) of parallel and distributed systems. Recently, it has been found effective for studying the delay performance of the synchronization process of vote-based permissioned blockchain systems [104]. A typical non-purging (n, k) fork-join queueing model is illustrated in Fig. 10.

FIGURE 10. A typical fork-join queueing model. All blockchain voting nodes are homogeneous with the common service rate µ.

Even though few analytical results exist for fork-join queues, various approximation solutions are known. An example is the linear transformation approach [119], which can be used to approximately compute the sojourn time t(n, k) of a general non-purging (n, k) fork-join queue for the vote-based blockchain system.

3) FLUID QUEUE FOR IOTA TANGLE

In queueing theory, a fluid queue (also called a fluid model) is a mathematical model used to describe the fluid level in a reservoir, for which the periods of filling and emptying are randomly determined. It can be viewed as a large tank connected to a series of pipes that pour fluid into the tank and a series of pumps that remove fluid from the tank. The capacity of this tank is typically assumed to be infinite. The fluid level X(t) of this tank at time t is a random variable
that can be calculated if the fluid arrival and leaving rates are
given.
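As a minimal sketch of this definition, the Python snippet below simulates the level of an infinite-capacity tank fed by a two-state (on/off) Markov source and drained by a constant-rate pump. The on/off source, the parameter values, and all names (`simulate_fluid_queue`, `rate_in_on`, `on_to_off`, and so on) are illustrative assumptions and are not taken from [105].

```python
import random


def simulate_fluid_queue(rate_in_on, rate_out, on_to_off, off_to_on,
                         horizon, seed=0):
    """Simulate the fluid level X(t) of an infinite-capacity tank.

    Assumed toy setting: a source alternates between "on" (fluid arrives at
    rate_in_on) and "off" (no arrivals); the time spent in each state is
    exponential with rate on_to_off or off_to_on. A pump removes fluid at
    rate_out whenever the tank is non-empty.
    Returns the (time, level) points at the source switching epochs.
    """
    rng = random.Random(seed)
    t, level, source_on = 0.0, 0.0, True
    path = [(t, level)]
    while t < horizon:
        switch_rate = on_to_off if source_on else off_to_on
        t_next = min(t + rng.expovariate(switch_rate), horizon)
        net_rate = (rate_in_on if source_on else 0.0) - rate_out
        # Between switches the level changes linearly at the net rate,
        # and it cannot drop below an empty tank.
        level = max(0.0, level + net_rate * (t_next - t))
        t = t_next
        source_on = not source_on
        path.append((t, level))
    return path


if __name__ == "__main__":
    trajectory = simulate_fluid_queue(rate_in_on=2.0, rate_out=1.0,
                                      on_to_off=1.0, off_to_on=1.0,
                                      horizon=50.0, seed=1)
    print("level at the end of the horizon:", trajectory[-1][1])
```

Because the level is piecewise linear between switching epochs, recording it only at those epochs is enough to reconstruct the whole sample path under these assumptions.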
This model was successfully used to describe the dynamic
behaviour of the IOTA tangle [105]. First, a fluid model
was heuristically built based on some requisite stochastic
models and the assumptions on the transaction arrival rate.
Through solving the proposed delay differential equations
system, the authors analyzed the stability of conflicts, which
impacted the performance in return.
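To make the notion of a delay differential equation concrete, the sketch below integrates a toy linear equation, x'(t) = λ − g·x(t − h), with the forward Euler method. This is not the system derived in [105]; the equation, the function name `integrate_delay_equation`, and the parameters are assumptions chosen only to show how a delayed term can decide whether a fluid level settles or oscillates. For this particular linear equation, a classical result is that the equilibrium x* = λ/g is stable when g·h < π/2.

```python
def integrate_delay_equation(arrival_rate, removal_gain, delay,
                             horizon, step=0.01, initial_level=0.0):
    """Forward-Euler integration of x'(t) = arrival_rate - removal_gain * x(t - delay).

    Toy illustration only (not the model from the cited work). The history
    is assumed constant: x(t) = initial_level for all t <= 0.
    Returns the samples x(0), x(step), ..., x(horizon).
    """
    lag = max(1, round(delay / step))       # delay measured in Euler steps
    n_steps = round(horizon / step)
    levels = [initial_level] * (lag + 1)    # history on [-delay, 0]
    for _ in range(n_steps):
        delayed = levels[-1 - lag]          # value of x at time t - delay
        drift = arrival_rate - removal_gain * delayed
        levels.append(levels[-1] + step * drift)
    return levels[lag:]                     # drop the artificial history


if __name__ == "__main__":
    short_delay = integrate_delay_equation(1.0, 1.0, delay=0.5, horizon=40.0)
    long_delay = integrate_delay_equation(1.0, 1.0, delay=2.0, horizon=200.0)
    print("short delay, final level:", short_delay[-1])
    print("long delay, largest late deviation from 1:",
          max(abs(x - 1.0) for x in long_delay[-2000:]))
```

With the short delay the level settles at the equilibrium, whereas with the long delay the oscillations around it grow, which is the kind of stability question a delay differential model is used to answer.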

C. STOCHASTIC PETRI NETS FOR MODELLING DLT CONSENSUSES

Another commonly used analytical tool for BFT-based consensus performance modelling is the stochastic Petri net (SPN), especially its variants, the generalized stochastic Petri net (GSPN) and the stochastic reward net (SRN). Petri nets (PNs) are a powerful mathematical modelling language used to model and simulate discrete-event distributed systems. They are graphs consisting of two types of nodes: places, which represent variables of the system state and are drawn as circles, and transitions, which represent actions made by the system and are drawn as rectangles. When the firing times of all transitions are exponentially distributed (timed transitions), the model is called an SPN. Built on the SPN, a GSPN allows transitions to have zero firing times (immediate transitions) and inhibitor arcs (an arc from a place to a transition that inhibits the firing of the transition when the input place is not empty). According to the literature, any GSPN model can be converted to an equivalent CTMC, and vice versa. At the net level, an SRN substantially improves the modelling power of the GSPN by adding guard functions, marking-dependent arc multiplicities, general transition priorities, and reward rates.

HLF V1.0+ adopts a highly modular architecture design by decomposing the transaction process into three main stages, as shown in Fig. 8. They can also be refined into five phases, namely HTTP, endorsement, ordering, validation & committing, and response [106]. HLF's modular design makes it possible to build a separate model for each phase and then cascade them to analyze the performance at the net/system level. There have been two studies on HLF performance analysis, using GSPN [106] and SRN [107], respectively. Both follow these general steps: 1) clarify the transaction process steps and the business logic behind them; 2) create the associated transition diagrams of Petri nets according to the corresponding rules under reasonable assumptions; 3) translate them to Markov chains for analytical solutions, or directly leverage mathematical tools for numerical simulation solutions. The second step is critical because it bridges the real system to an analytical model and paves the way to solutions for performance indicators such as transaction throughput, latency, average queue length and utilization. Here, we focus on the Petri nets' transition diagrams of the ordering phase from the two reviewed studies, as shown in Fig. 11.

FIGURE 11. SPN models for ordering service in HLF V1.0+: (a) GSPN [106] (b) SRN [107].

In Fig. 11 (a), the ordering service starts by taking endorsed transactions as inputs, under the assumptions of exponentially distributed request arrivals and a constant size for each transaction. The symbols in the figure are interpreted as follows: T_e is a transition signifying the arrival of an endorsed transaction. P_wait_o is a place signifying that the transaction is queuing; the number of tokens #(P_wait_o) denotes the queue length. N is the batch processing size in number of transactions. P_serve_o is a place signifying that transactions are being ordered. P_idle_o is a place signifying that the server is idle; the number of tokens #(P_idle_o) denotes the number of idle servers. T_in is an immediate transition whose enabling predicate is #(P_wait_o) > 0 & #(P_idle_o) > 0, which means there are idle servers and queuing transactions. P_next is a place signifying the next processing phase.

Similarly, other phases can be modelled by following the same methodology. Consequently, the proposed analytic system model based on GSPN indicates that the HLF system is composed of multiple successive M/M/1 queue networks, and the system throughput is equal to the lowest throughput among all those phases. Using a tool embedded in Matlab named pntool, this system can be numerically solved to determine the latency and throughput.

The second part of Fig. 11 describes a simple SRN model for the HLF ordering service in a network with one client, two endorsers and one peer running the validation logic. After the client receives a response from both endorsing peers, it sends the endorsed transaction to the ordering service (transition T_Tx), specified by a token deposited in place P_OS. When the number of pending transactions reaches the block size (denoted by M) or, more generally, the block timeout expires, a number of transactions are ordered into a block (transition T_OS). The block is delivered to the committing peers (place T_next) for validations, such as VSCC validation and MVCC validation. Finally, all successfully validated transactions in the block are recorded into the local ledger. To solve this SRN model, one can use the Stochastic Petri Net Package (SPNP) [120] to numerically find answers for the following performance metrics.
• throughput: corresponds to the rate of each transition, captured using function rate() in SPNP. E.g., the rate of transition T_Ledger signifies the block throughput of the
system, which can be multiplied by M to obtain the transaction throughput.
• utilization: computed by the probability that the corresponding transition in the SRN is enabled, using function enabled(), or computed using reward functions for transitions with function-dependent marking rate (such as T_VSCC).
• mean queue length: obtained by the number of tokens in the corresponding phase, using function mark(). E.g., the mean number of tokens in place P_OS indicates the mean queue length at the ordering service.

D. OTHER MODELS IN DLT PERFORMANCE MODELLING

Besides the stochastic models described earlier, other analytical models have been proposed for analyzing blockchain performance. For example, a prediction model [108] derived from Ethereum's core structure, called the World State, was proposed to provide companies with a more accurate estimation of performance and required storage. By analyzing the modified Merkle Patricia tree (MPT), which is the implementation of the World State in Ethereum, the expected and the maximum tree height were derived as functions of the total number of transactions n. These results are linked to performance and storage, which makes them meaningful for decision making and early warnings. Another study [109] adopted stochastic network models to analyze the overall block generation rate for the PoW-based Ethereum. Through this model, the blockchain evolution and dynamics can be captured and used to analyze the impact of the block dissemination delay and the hashing power of the member nodes on the block generation rate.

The random graph (also called the Erdős-Rényi model) is a powerful mathematical tool first introduced by Erdős [121] and Bollobás [122] to model and analyze complex networks. It has properties suitable for modelling the peer-to-peer overlay networks used by blockchain systems [110]. There are two main variants of the Erdős-Rényi model. One of them is G_p(N), which is a graph constructed by randomly connecting nodes. Each edge is included in the graph with probability p, independently of every other edge. Shahsavari et al. [110] presented a random graph using G_p(N) to model the Bitcoin blockchain network, where N is the total number of nodes, and p is the independent probability that a link exists between any two observed nodes in the peer-to-peer overlay network. Based on the well-established random graph analysis results, some key performance measures can be derived in terms of block dissemination delay and traffic overhead.

V. FINDINGS AND SUGGESTIONS FOR FUTURE RESEARCH

In this section, we summarize the main findings from the previous evaluation sections. First, we discuss the findings from the empirical and analytical evaluations. We then take a look at the performance bottlenecks identified from all reviewed solutions. Finally, we point out some open issues and provide suggestions for future research.

A. FINDINGS FOR EMPIRICAL ANALYSIS

1) PERFORMANCE METRICS AND WORKLOADS

The evaluated performance metrics can be divided into two categories: macro (or overall) metrics and micro (or detailed) metrics. Macro metrics provide users with an application-level overview of the system's performance, such as transaction throughput, latency, scalability, fault tolerance, transactions per CPU/memory second/disk IO/network data. The first two metrics (transaction throughput and latency) are evaluated most frequently, across all blockchains. Micro metrics show developers the performance of different subprocesses of transactions or of specific layers in the blockchain abstract model, such as peer discovery rate, RPC response rate, transaction propagating rate, contract execution time, state updating time, consensus-cost time, encryption and hash function efficiency. Both macro and micro metrics are evaluated under well-designed workloads.

In blockchain performance benchmarking or monitoring frameworks, these workloads have been designed to evaluate the performance of different layers of blockchain. Macro workloads, such as YCSB, Smallbank, EtherId, Doubler and WavesPresale, are designed to evaluate the application layer in blockchain. Micro workloads, such as DoNothing, Analytics, IOHeavy and CPUHeavy, are designed to evaluate lower layers of blockchain, including the execution, data and consensus layers [49].

In general, there are two popular ways to generate workloads for experiment-based performance evaluation. One is to construct a synthetic application with commonly used functions (e.g., CreateAccount, IssueMoney and TransferMoney), and leverage a client node to send requests of transactions (i.e., implemented functions) to a blockchain system [78]. The other is to leverage an HTTP performance testing toolkit for generating requests, for example, using the loadtest library of Node.js to specify an HTTP request as a JSON-formatted object, and constructing workloads for blockchains as separate JSON objects [77], [84].

2) EVALUATED BLOCKCHAINS

HLF (v0.6 with PBFT and v1.0+ with BFT-SMaRt), private Ethereum (Geth with PoW and Parity with PoA/PoW) and Ripple with XRP consensus are the most frequently compared blockchain platforms [49], [72], [78], [80]. Among them, HLF and Ripple can reach 1,000+ TPS within a small network and outperform the Ethereum platforms in terms of throughput and latency, under both macro and micro benchmark workloads. However, because of the underlying consensus algorithms they use, both HLF and Ripple fail to scale beyond a certain number of nodes in the network (e.g., 16 [49] for HLF v0.6). For HLF, it is well-known that BFT-based consensuses (e.g., PBFT and BFT-SMaRt) rely on a leader for processing transactions, which may act as a bottleneck and cause performance limitations. For Ripple,
a limited and fixed number of validators receive and process numerous transaction requests, and eventually fail to scale when the number of requests goes beyond the capability of the validators. This conclusion is shared by a number of early evaluation studies such as [49], [77], [78], [80]. Between the different versions of HLF, the newer release v1.0+ performs better than v0.6 [62] across all evaluated macro metrics such as execution time, latency, throughput and scalability. In addition, another blockchain proposed for IoT (i.e., Tendermint) outperforms HLF v0.6 and Ripple in both throughput and latency [84].
It is worth noting that we did not encounter any improvement solutions such as off-chain, side-chain, concurrent execution and sharding in the evaluated blockchain systems. In fact, many of the proposed solutions only exist at the conceptual stage at the time of writing this survey. Some of them provide a brief comparative evaluation and analysis under a specific use case for the purpose of proof-of-concept, but lack a systematic evaluation that demonstrates their effectiveness and efficiency.

3) CONSENSUS FINALITY
Consensus finality refers to the deterministic property of a blockchain where a block is considered confirmed once it is appended to the ledger. All BFT-based blockchains provide consensus finality, while PoW-based ones usually do not. This property has a direct impact on the transaction latency. For example, Bitcoin usually requires six successive confirmations before a transaction is considered securely final, i.e., unlikely to end up being pruned and removed from the blockchain, which pushes the latency to an unacceptable level of almost one hour. In contrast, HLF with BFT-based consensus can finalize a transaction within seconds, right after it is appended to the ledger. Therefore, BFT-based blockchains have an obvious advantage over PoW-based blockchain systems in terms of performance.

B. FINDINGS FOR ANALYTICAL MODELLING
Performance Modelling Strategies: Most models neglect information propagation delays in the network and simply collapse the whole network into a single node that provides service to process and confirm transactions. These models are usually queueing systems that provide bulk service, such as M/M^B/1 and M/G^B/1 queues. Only a small portion of models consider the system as separate disjoint nodes and take the network latency among network nodes into consideration. They aim to calculate system end-to-end output (e.g., delays) using queueing networks or by cascading different queues such as M/G/1 and M/M/1 together to model the blockchain network.
An (n, k) fork-join queue combines both modelling strategies. It first regards the system as a single server when the system receives a job request. Then, it splits the job into several sub-tasks that are processed independently and simultaneously on different network nodes. In the join phase, the results are collected from different nodes to finish the original job (e.g., block validation).

C. IDENTIFIED PERFORMANCE BOTTLENECKS
From the perspective of users or managers, performance evaluation results can be used for decision making on blockchain system selection. Developers and system designers, on the other hand, may care more about the identified bottlenecks than about the comparison results. They can analyze these bottlenecks and propose solutions for further performance optimization. All bottlenecks identified in the reviewed papers are listed in Table 7.
As we can see, most bottlenecks are still unresolved. This means that effective solutions to the corresponding performance problems have not yet been found. Another observation is that most bottlenecks are identified by empirical analysis, which can be attributed to two reasons. First, more empirical analyses than performance modelling studies have been conducted. Second, due to the mathematical expressions involved, analytical modelling is much more difficult than experimental solutions for exploring the impact of design parameters. In blockchain performance modelling, even one simple extra parameter can significantly increase the model complexity. Therefore, empirical analysis is more efficient and popular for bottleneck identification than its modelling counterpart.

D. OPEN ISSUES AND FUTURE DIRECTIONS
As a fundamental component of blockchain research, performance evaluation plays an important role in boosting blockchain applications. Although numerous blockchain improvements have been proposed and implemented, only a small number of them have been well evaluated. The evaluation methods also need more analysis and exploration. Here, we identify some open issues and suggest potential directions for future research in this area.
• For empirical analysis, difficulties lie in comparative evaluation among different blockchain platforms, especially for those with very different consensus algorithms and data structures. The main reason is the lack of interface standards for running workloads. For example, when evaluating blockchain platforms for IoT such as HLF 1.0, Ripple and IOTA, it is difficult to design a common interface for uploading workloads. Since smart contracts are not supported by Ripple or IOTA, one solution is to design an equivalent workload such as transferring a unit amount from account A to another account B [77]. However, this approach has limited extensibility, and requires deploying a dedicated workload for each blockchain under evaluation. Therefore, there is great potential for future research to develop more extensible tools for comparative evaluation of blockchain platforms.
• Many methods of experimental analysis rely on RPCs to communicate with blockchain consensus nodes and collect transaction statistics (e.g., the total number


of confirmed transactions over a certain duration). Although the RPC API protocols (e.g., gRPC and JSON-RPC) claim to be efficient, they still induce extra overhead on the consensus peers [61], which is counted as part of the peers' resource consumption and in turn makes the evaluation results inaccurate. Therefore, a more lightweight and low-overhead data collection approach, such as a log-based approach [61], deserves more attention in the future.
• RPC methods are widely used for data collection in empirical performance evaluation of blockchain systems. For micro metrics and micro workload design, it is challenging to decouple the impact of other layers. For example, two queries on transaction values are designed to evaluate the data model performance. For Ethereum, both queries can be easily implemented by invoking JSON-RPC APIs. However, for HLF, a chaincode (VersionKVStore) must be implemented as there are no APIs to query historical states in the system. Inevitably, this involves the execution of a smart contract, which adds extra overhead and makes the evaluation inaccurate. Therefore, for detailed evaluation of performance metrics in specific blockchain abstraction layers, it is an open issue to design a reasonable workload that alleviates the impact of other layers and improves accuracy.
• Besides the classic blockchain systems such as HLF and Ethereum, there is an urgent need for evaluating the performance of their proposed improvements. For example, sharding is claimed to be a promising solution and has been implemented in many blockchains. However, there is no evaluation work comparing different shard-based blockchain systems. Different solutions, such as sharding vs. DAG and off-chain vs. side-chain, also need to be comparatively evaluated. In addition, it would be beneficial to combine empirical and analytical approaches in blockchain performance evaluation in the future.

TABLE 7. Identified performance bottlenecks for different blockchain systems.

VI. CONCLUSION
As blockchain has matured and received more and more attention, its performance problems (e.g., low throughput and high latency) have become critical. To resolve these issues, many improvements have been proposed, from system-level optimization to new efficient consensus protocols. However, such blockchain modifications need to be evaluated in a meaningful manner to demonstrate their performance advantages. In this paper, we present a systematic survey covering existing blockchain performance evaluation approaches. From a high-level perspective, they can be categorized into empirical and analytical evaluation methods.
The empirical analysis can be further divided into four groups: performance benchmarking, monitoring, experimental analysis and simulation. Three popular benchmark frameworks (i.e., Blockbench, DAGbench and Hyperledger Caliper) are introduced and comparatively analyzed. Performance monitoring is recognized as the best solution for performance evaluation of public blockchains.
Analytical modelling approaches are more powerful than empirical solutions, especially for analyzing the consensus layer of a blockchain system. There are three main types of modelling approaches compared in this survey: Markov chains, queueing models and stochastic Petri nets. This comparison can provide directions for selecting a blockchain evaluation approach suitable for a given purpose.
We also summarized the results of surveyed performance evaluation studies and identified the bottlenecks of major blockchain platforms. The survey concludes with identification of open issues and ascertainment of future research directions in this important area.

ACKNOWLEDGMENT
The authors would also like to thank Compute Canada, SAVI, and Cybera for the support of this research through their cloud services.
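
As a rough illustration of the two queueing abstractions summarized in Section V-B, the following minimal sketch evaluates two textbook formulas: the mean sojourn time of a single M/M/1 queue, and the expected completion time of the parallel stage of an (n, k) fork-join step. It is not taken from any of the surveyed models. The function names and numeric values are hypothetical, and the fork-join part assumes that a job finishes once any k of its n sub-tasks finish, that sub-task service times are i.i.d. exponential, and that no queueing occurs at the nodes.

```python
"""Back-of-the-envelope queueing formulas (illustrative only).

All function names and numeric values are hypothetical and are not taken
from any of the surveyed models.
"""


def mm1_mean_sojourn_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time (waiting + service) a transaction spends in an M/M/1 queue.

    Uses the textbook result W = 1 / (mu - lambda), which holds only when
    the queue is stable (lambda < mu).
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def fork_join_mean_completion_time(n: int, k: int, service_rate: float) -> float:
    """Expected time until k of n parallel sub-tasks have finished.

    Assumption: all n sub-tasks start at the same instant on idle nodes and
    have i.i.d. exponential service times with the given rate; queueing at
    the nodes is ignored. The result is the mean of the k-th order statistic
    of n exponentials: (1 / mu) * sum_{i = n-k+1}^{n} 1 / i.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    return sum(1.0 / (i * service_rate) for i in range(n - k + 1, n + 1))


if __name__ == "__main__":
    # Hypothetical single-node view: 8 tx/s arriving, 10 tx/s served.
    print(mm1_mean_sojourn_time(8.0, 10.0))
    # Hypothetical fork-join view: 4 nodes, the job needs 3 of them to finish.
    print(fork_join_mean_completion_time(4, 3, 1.0))
```

The single-node formula corresponds to collapsing the whole network into one server, while the fork-join formula captures only the parallel stage. A full fork-join queue model would also account for waiting at each node, which this sketch deliberately leaves out.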


REFERENCES

[1] S. Nakamoto and A. Bitcoin. (2008). A Peer-to-Peer Electronic Cash System. [Online]. Available: https://bitcoin.org/bitcoin.pdf
[2] Q. K. Nguyen, “Blockchain—A financial technology for future sustainable development,” in Proc. 3rd Int. Conf. Green Technol. Sustain. Develop. (GTSD), Nov. 2016, pp. 51–54.
[3] L. Cocco, A. Pinna, and M. Marchesi, “Banking on blockchain: Costs savings thanks to the blockchain technology,” Future Internet, vol. 9, no. 3, p. 25, Jun. 2017.
[4] M. Hölbl, M. Kompara, A. Kamišalić, and L. N. Zlatolas, “A systematic review of the use of blockchain in healthcare,” Symmetry, vol. 10, no. 10, p. 470, Oct. 2018.
[5] C. C. Agbo, Q. H. Mahmoud, and J. M. Eklund, “Blockchain technology in healthcare: A systematic review,” Healthcare, vol. 7, no. 2, p. 56, Apr. 2019.
[6] L. A. Linn and M. B. Koo, “Blockchain for health data and its potential use in health it and health care related research,” in Proc. ONC/NIST Use Blockchain Healthcare Res. Workshop. Gaithersburg, MD, USA: ONC/NIST, 2016, pp. 1–10.
[7] M. Mettler, “Blockchain technology in healthcare: The revolution starts here,” in Proc. IEEE 18th Int. Conf. e-Health Netw., Appl. Services (Healthcom), Sep. 2016, pp. 1–3.
[8] Z. Li, J. Kang, R. Yu, D. Ye, Q. Deng, and Y. Zhang, “Consortium blockchain for secure energy trading in industrial Internet of Things,” IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3690–3700, Aug. 2018.
[9] E. Mengelkamp, B. Notheisen, C. Beer, D. Dauer, and C. Weinhardt, “A blockchain-based smart grid: Towards sustainable local energy markets,” Comput. Sci.-Res. Develop., vol. 33, nos. 1–2, pp. 207–214, Feb. 2018.
[10] N. Z. Aitzhan and D. Svetinovic, “Security and privacy in decentralized energy trading through multi-signatures, blockchain and anonymous messaging streams,” IEEE Trans. Dependable Secure Comput., vol. 15, no. 5, pp. 840–852, Sep. 2018.
[11] G. Perboli, S. Musso, and M. Rosano, “Blockchain in logistics and supply chain: A lean approach for designing real-world use cases,” IEEE Access, vol. 6, pp. 62018–62028, 2018.
[12] E. Tijan, S. Aksentijević, K. Ivanić, and M. Jardas, “Blockchain technology implementation in logistics,” Sustainability, vol. 11, no. 4, p. 1185, Feb. 2019.
[13] K. Croman, C. Decker, I. Eyal, A. E. Gencer, A. Juels, A. Kosba, A. Miller, P. Saxena, E. Shi, E. G. Sirer, D. Song, and R. Wattenhofer, “On scaling decentralized blockchains,” in Proc. Int. Conf. Financial Cryptogr. Data Secur. Berlin, Germany: Springer, 2016, pp. 106–125.
[14] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, vol. 151, pp. 1–32, Apr. 2014.
[15] J. Reed, Litecoin: An Introduction to Litecoin Cryptocurrency and Litecoin Mining. Scotts Valley, CA, USA: CreateSpace Independent Publishing Platform, 2017.
[16] J. Teutsch and C. Reitwießner, “Truebit: A scalable verification solution for blockchains,” White Papers, 2018.
[17] H. Kalodner, S. Goldfeder, X. Chen, S. M. Weinberg, and E. W. Felten, “Arbitrum: Scalable, private smart contracts,” in Proc. 27th USENIX Secur. Symp. (USENIX Security), 2018, pp. 1353–1370.
[18] J. Poon and T. Dryja, “The bitcoin lightning network: Scalable off-chain instant payments,” White Papers, 2016.
[19] Raiden Network. (2018). Fast, Cheap, Scalable Token Transfers for Ethereum. Accessed: Jul. 7, 2020. [Online]. Available: https://raiden.network
[20] A. Back, M. Corallo, L. Dashjr, M. Friedenbach, G. Maxwell, A. Miller, A. Poelstra, J. Timón, and P. Wuille. (2014). Enabling Blockchain Innovations With Pegged Sidechains. [Online]. Available: http://www.opensciencereview.com/papers/123/enablingblockchain-innovations-with-pegged-sidechains
[21] J. Poon and V. Buterin, “Plasma: Scalable autonomous smart contracts,” White Paper, 2017, pp. 1–47. Accessed: Jul. 7, 2020. [Online]. Available: https://plasma.io/plasma-deprecated.pdf
[22] P. Gazi, A. Kiayias, and D. Zindros, “Proof-of-stake sidechains,” IACR Cryptol. ePrint Arch., White Papers, 2018, p. 1239.
[23] A. Kiayias and D. Zindros, “Proof-of-work sidechains,” in Proc. Int. Conf. Financial Cryptogr. Data Secur. Cham, Switzerland: Springer, 2019, pp. 21–34.
[24] T. Dickerson, P. Gazzillo, M. Herlihy, and E. Koskinen, “Adding concurrency to smart contracts,” Distrib. Comput., vol. 33, pp. 209–225, 2020.
[25] L. Yu, W.-T. Tsai, G. Li, Y. Yao, C. Hu, and E. Deng, “Smart-contract execution with concurrent block building,” in Proc. IEEE Symp. Service-Oriented Syst. Eng. (SOSE), Apr. 2017, pp. 160–167.
[26] P. S. Anjana, S. Kumari, S. Peri, S. Rathor, and A. Somani, “An efficient framework for optimistic concurrent execution of smart contracts,” in Proc. 27th Euromicro Int. Conf. Parallel, Distrib. Netw.-Based Process. (PDP), Feb. 2019, pp. 83–92.
[27] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena, “A secure sharding protocol for open blockchains,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2016, pp. 17–30.
[28] E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, E. Syta, and B. Ford, “OmniLedger: A secure, scale-out, decentralized ledger via sharding,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2018, pp. 583–598.
[29] M. Zamani, M. Movahedi, and M. Raykova, “RapidChain: Scaling blockchain via full sharding,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Jan. 2018, pp. 931–948.
[30] J. Wang and H. Wang, “Monoxide: Scale out blockchains with asynchronous consensus zones,” in Proc. 16th USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2019, pp. 95–112.
[31] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi, “Towards scaling blockchain systems via sharding,” in Proc. Int. Conf. Manage. Data (SIGMOD), 2019, pp. 123–140.
[32] Y. Lewenberg, Y. Sompolinsky, and A. Zohar, “Inclusive block chain protocols,” in Proc. Int. Conf. Financial Cryptogr. Data Secur. Berlin, Germany: Springer, 2015, pp. 528–547.
[33] Y. Sompolinsky, Y. Lewenberg, and A. Zohar, “SPECTRE: A fast and scalable cryptocurrency protocol,” IACR Cryptol. ePrint Arch., White Papers, 2016, p. 1159.
[34] Y. Sompolinsky and A. Zohar, “PHANTOM: A scalable blockdag protocol,” IACR Cryptol. ePrint Arch., White Papers, 2018, p. 104.
[35] C. Li, P. Li, D. Zhou, W. Xu, F. Long, and A. Yao, “Scaling Nakamoto consensus to thousands of transactions per second,” 2018, arXiv:1805.03870. [Online]. Available: http://arxiv.org/abs/1805.03870
[36] S. D. Lerner, “DagCoin: A cryptocurrency without blocks,” White Paper, 2015. Accessed: Jul. 7, 2020. [Online]. Available: https://dagcoin.org/wpcontent/uploads/2019/07/Dagcoin_White_Paper.pdf
[37] S. Popov, “The tangle,” White Papers, 2016, p. 131.
[38] A. Churyumov. (2016). Byteball: A Decentralized System for Storage and Transfer of Value. [Online]. Available: https://byteball.org/Byteball.pdf
[39] C. LeMahieu. (2018). Nano: A Feeless Distributed Cryptocurrency Network. Accessed: Mar. 24, 2018. [Online]. Available: https://nano.org/en/whitepaper
[40] S. Kim, Y. Kwon, and S. Cho, “A survey of scalability solutions on blockchain,” in Proc. Int. Conf. Inf. Commun. Technol. Converg. (ICTC), Oct. 2018, pp. 1204–1207.
[41] S. Rouhani and R. Deters, “Security, performance, and applications of smart contracts: A systematic survey,” IEEE Access, vol. 7, pp. 50759–50779, 2019.
[42] X. Zheng, Y. Zhu, and X. Si, “A survey on challenges and progresses in blockchain technologies: A performance and security perspective,” Appl. Sci., vol. 9, no. 22, p. 4731, 2019.
[43] R. Wang, K. Ye, and C.-Z. Xu, “Performance benchmarking and optimization for blockchain systems: A survey,” in Proc. Int. Conf. Blockchain. Cham, Switzerland: Springer, 2019, pp. 171–185.
[44] Q. Zhou, H. Huang, Z. Zheng, and J. Bian, “Solutions to scalability of blockchain: A survey,” IEEE Access, vol. 8, pp. 16440–16455, 2020.
[45] G. Yu, X. Wang, K. Yu, W. Ni, J. A. Zhang, and R. P. Liu, “Survey: Sharding in blockchains,” IEEE Access, vol. 8, pp. 14155–14181, 2020.
[46] V. Acharya, A. E. Yerrapati, and N. Prakash, Oracle Blockchain Quick Start Guide: A Practical Approach to Implementing Blockchain in Your Enterprise. Birmingham, U.K.: Packt, 2019.
[47] EOS. IO. (2017). EOSIO Technical White Paper. Accessed: Dec. 18, 2017. [Online]. Available: https://github.com/EOSIO/Documentation
[48] CB Insights. (2019). Banking is Only the Beginning: 42 Big Industries Blockchain Could Transform. Accessed: Feb. 24, 2020. [Online]. Available: https://www.cbinsights.com/research/industries-disrupted-blockchain//
[49] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan, “BLOCKBENCH: A framework for analyzing private blockchains,” in Proc. ACM Int. Conf. Manage. Data (SIGMOD), 2017, pp. 1085–1100.
[50] S. T. Leutenegger and D. Dias, “A modeling study of the TPC-C benchmark,” ACM SIGMOD Rec., vol. 22, no. 2, pp. 22–31, Jun. 1993.
[51] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” in Proc. 1st ACM Symp. Cloud Comput. (SoCC), 2010, pp. 143–154.
[52] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux, “OLTP-bench: An extensible testbed for benchmarking relational databases,” Proc. VLDB Endowment, vol. 7, no. 4, pp. 277–288, Dec. 2013.


[53] V. Abramova and J. Bernardino, “NoSQL databases: MongoDB vs cassandra,” in Proc. Int. C* Conf. Comput. Sci. Softw. Eng. (CSE), 2013, pp. 14–22.
[54] Y. Abubakar, T. S. Adeyi, and I. G. Auta, “Performance evaluation of NoSQL systems using YCSB in a resource austere environment,” Int. J. Appl. Inf. Syst., vol. 7, no. 8, pp. 23–27, Sep. 2014.
[55] H. Matallah, G. Belalem, and K. Bouamrane, “Experimental comparative study of NoSQL databases: HBASE versus MongoDB by YCSB,” Int. J. Comput. Syst. Sci. Eng., vol. 32, no. 4, pp. 307–317, 2017.
[56] Z. Dong, E. Zheng, Y. Choon, and A. Y. Zomaya, “DAGBENCH: A performance evaluation framework for DAG distributed ledgers,” in Proc. IEEE 12th Int. Conf. Cloud Comput. (CLOUD), Jul. 2019, pp. 264–271.
[57] “Hyperledger blockchain performance metrics white paper V1.01,” HyperLedger Found., Hyperledger Perform. Scale Working Group, White Paper, 2018. Accessed: Jul. 7, 2020. [Online]. Available: https://www.hyperledger.org/wpcontent/uploads/2018/10/HL_Whitepaper_Metrics_PDFVersion.pdf
[58] A. Aldweesh, M. Alharby, M. Mehrnezhad, and A. Van Moorsel, “OpBench: A CPU performance benchmark for ethereum smart contract operation code,” in Proc. IEEE Int. Conf. Blockchain (Blockchain), Jul. 2019, pp. 274–281.
[59] A. Aldweesh, M. Alharby, E. Solaiman, and A. van Moorsel, “Performance benchmarking of smart contracts to assess miner incentives in ethereum,” in Proc. 14th Eur. Dependable Comput. Conf. (EDCC), Sep. 2018, pp. 144–149.
[60] M. T. Oliveira, G. R. Carrara, N. C. Fernandes, C. V. N. Albuquerque, R. C. Carrano, D. S. V. Medeiros, and D. M. F. Mattos, “Towards a performance evaluation of private blockchain frameworks using a realistic workload,” in Proc. 22nd Conf. Innov. Clouds, Internet Netw. Workshops (ICIN), Feb. 2019, pp. 180–187.
[61] P. Zheng, Z. Zheng, X. Luo, X. Chen, and X. Liu, “A detailed and real-time performance monitoring framework for blockchain systems,” in Proc. 40th Int. Conf. Softw. Eng. Softw. Eng. Pract. (ICSE-SEIP), 2018, pp. 134–143.
[62] Q. Nasir, I. A. Qasse, M. A. Talib, and A. B. Nassif, “Performance analysis of hyperledger fabric platforms,” Secur. Commun. Netw., vol. 2018, pp. 1–14, Sep. 2018.
[63] A. Baliga, N. Solanki, S. Verekar, A. Pednekar, P. Kamat, and S. Chatterjee, “Performance characterization of hyperledger fabric,” in Proc. Crypto Valley Conf. Blockchain Technol. (CVCBT), Jun. 2018, pp. 65–74.
[64] P. Thakkar, S. Nathan, and B. Viswanathan, “Performance benchmarking and optimizing hyperledger fabric blockchain platform,” in Proc. IEEE 26th Int. Symp. Modeling, Anal., Simulation Comput. Telecommun. Syst. (MASCOTS), Sep. 2018, pp. 264–276.
[65] E. Androulaki et al., “Hyperledger fabric: A distributed operating system for permissioned blockchains,” in Proc. 13th EuroSys Conf., Apr. 2018, pp. 1–15.
[66] T. S. L. Nguyen, G. Jourjon, M. Potop-Butucaru, and K. L. Thai, “Impact of network delays on hyperledger fabric,” in Proc. IEEE Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), Apr. 2019, pp. 222–227.
[67] F. Geyer, H. Kinkelin, H. Leppelsack, S. Liebald, D. Scholz, G. Carle, and D. Schupke, “Performance perspective on private distributed ledger technologies for industrial networks,” in Proc. Int. Conf. Netw. Syst. (NetSys), Mar. 2019, pp. 1–8.
[68] M. Kuzlu, M. Pipattanasomporn, L. Gurses, and S. Rahman, “Performance analysis of a hyperledger fabric blockchain framework: Throughput, latency and scalability,” in Proc. IEEE Int. Conf. Blockchain (Blockchain), Jul. 2019, pp. 536–540.
[69] S. Wang, “Performance evaluation of hyperledger fabric with malicious behavior,” in Proc. Int. Conf. Blockchain. Cham, Switzerland: Springer, 2019, pp. 211–219.
[70] Z. Shi, H. Zhou, Y. Hu, S. Jayachander, C. de Laat, and Z. Zhao, “Operating permissioned blockchain in clouds: A performance study of hyperledger sawtooth,” in Proc. 18th Int. Symp. Parallel Distrib. Comput. (ISPDC), Jun. 2019, pp. 50–57.
[71] T. T. A. Dinh, R. Liu, M. Zhang, G. Chen, B. C. Ooi, and J. Wang, “Untangling blockchain: A data processing view of blockchain systems,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 7, pp. 1366–1385, Jul. 2018.
[72] S. Rouhani and R. Deters, “Performance analysis of ethereum transactions in private blockchain,” in Proc. 8th IEEE Int. Conf. Softw. Eng. Service Sci. (ICSESS), Nov. 2017, pp. 70–74.
[73] R. Yasaweerasinghelage, M. Staples, and I. Weber, “Predicting latency of blockchain-based systems using architectural modelling and simulation,” in Proc. IEEE Int. Conf. Softw. Archit. (ICSA), Apr. 2017, pp. 253–256.
[74] C. Rathfelder and B. Klatt, “Palladio workbench: A quality-prediction tool for component-based architectures,” in Proc. 9th Work. IEEE/IFIP Conf. Softw. Archit., Jun. 2011, pp. 347–350.
[75] M. Bez, G. Fornari, and T. Vardanega, “The scalability challenge of ethereum: An initial quantitative analysis,” in Proc. IEEE Int. Conf. Service-Oriented Syst. Eng. (SOSE), Apr. 2019, pp. 167–176.
[76] C. Fan, H. Khazaei, Y. Chen, and P. Musilek, “Towards a scalable DAG-based distributed ledger for smart communities,” in Proc. IEEE 5th World Forum Internet Things (WF-IoT), Apr. 2019, pp. 177–182.
[77] R. Han, V. Gramoli, and X. Xu, “Evaluating blockchains for IoT,” in Proc. 9th IFIP Int. Conf. New Technol., Mobility Secur. (NTMS), Feb. 2018, pp. 1–5.
[78] S. Pongnumkul, C. Siripanpornchana, and S. Thajchayapong, “Performance analysis of private blockchain platforms in varying workloads,” in Proc. 26th Int. Conf. Comput. Commun. Netw. (ICCCN), Jul. 2017, pp. 1–6.
[79] S. Chen, J. Zhang, R. Shi, J. Yan, and Q. Ke, “A comparative testing on performance of blockchain and relational database: Foundation for applying smart technology into current business systems,” in Proc. Int. Conf. Distrib., Ambient, Pervas. Interact. Cham, Switzerland: Springer, 2018, pp. 21–34.
[80] Y. Hao, Y. Li, X. Dong, L. Fang, and P. Chen, “Performance analysis of consensus algorithm in private blockchain,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 280–285.
[81] H. M. A. Aljassas and S. Sasi, “Performance evaluation of proof-of-work and collatz conjecture consensus algorithms,” in Proc. 2nd Int. Conf. Comput. Appl. Inf. Secur. (ICCAIS), May 2019, pp. 1–6.
[82] R. Deloin, “Proof of collatz conjecture,” Asian Res. J. Math., vol. 14, no. 2, pp. 1–18, Jun. 2019.
[83] S. Benahmed, I. Pidikseev, R. Hussain, J. Lee, S. M. A. Kazmi, A. Oracevic, and F. Hussain, “A comparative analysis of distributed ledger technologies for smart contract development,” in Proc. IEEE 30th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2019, pp. 1–6.
[84] R. Han, G. Shapiro, V. Gramoli, and X. Xu, “On the performance of distributed ledgers for Internet of Things,” Internet Things, vol. 10, Jun. 2020, Art. no. 100087.
[85] S. Park, S. Oh, and H. Kim, “Performance analysis of DAG-based cryptocurrency,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), May 2019, pp. 1–6.
[86] S. Chandel, W. Cao, Z. Sun, J. Yang, B. Zhang, and T.-Y. Ni, “A multi-dimensional adversary analysis of RSA and ECC in blockchain encryption,” in Proc. Future Inf. Commun. Conf. Cham, Switzerland: Springer, 2019, pp. 988–1003.
[87] J. Ferreira, M. Antunes, M. Zhygulskyy, and L. Frazão, “Performance of hash functions in blockchain applied to IoT devices,” in Proc. 14th Iberian Conf. Inf. Syst. Technol. (CISTI), Jun. 2019, pp. 1–7.
[88] T. M. Fernández-Caramés and P. Fraga-Lamas, “A review on the use of blockchain for the Internet of Things,” IEEE Access, vol. 6, pp. 32979–33001, 2018.
[89] M. Alharby and A. van Moorsel, “Blocksim: A simulation framework for blockchain systems,” ACM SIGMETRICS Perform. Eval. Rev., vol. 46, no. 3, pp. 135–138, 2019.
[90] S. Pandey, G. Ojha, B. Shrestha, and R. Kumar, “BlockSIM: A practical simulation tool for optimal network design, stability and planning,” in Proc. IEEE Int. Conf. Blockchain Cryptocurrency (ICBC), May 2019, pp. 133–137.
[91] C. Faria and M. Correia, “BlockSim: Blockchain simulator,” in Proc. IEEE Int. Conf. Blockchain (Blockchain), Jul. 2019, pp. 439–446.
[92] M. Zander, T. Waite, and D. Harz, “DAGsim: Simulation of DAG-based distributed ledger protocols,” ACM SIGMETRICS Perform. Eval. Rev., vol. 46, no. 3, pp. 118–121, Jan. 2019.
[93] M. Bottone, F. Raimondi, and G. Primiero, “Multi-agent based simulations of block-free distributed ledgers,” in Proc. 32nd Int. Conf. Adv. Inf. Netw. Appl. Workshops (WAINA), May 2018, pp. 585–590.
[94] B. Kusmierz, W. Sanders, A. Penzkofer, A. Capossele, and A. Gal, “Properties of the tangle for uniform random and random walk tip selection,” in Proc. IEEE Int. Conf. Blockchain (Blockchain), Jul. 2019, pp. 228–236.
[95] D. Huang, X. Ma, and S. Zhang, “Performance analysis of the raft consensus algorithm for private blockchains,” IEEE Trans. Syst., Man, Cybern. Syst., vol. 50, no. 1, pp. 172–181, Jan. 2020.
[96] B. Cao, S. Huang, D. Feng, L. Zhang, S. Zhang, and M. Peng, “Impact of network load on direct acyclic graph based blockchain for Internet of Things,” in Proc. Int. Conf. Cyber-Enabled Distrib. Comput. Knowl. Discovery (CyberC), Oct. 2019, pp. 215–218.


[97] Y. Kawase and S. Kasahara, “Transaction-confirmation time for bitcoin: A queueing analytical approach to blockchain mechanism,” in Proc. Int. Conf. Queueing Theory Netw. Appl. Cham, Switzerland: Springer, 2017, pp. 75–88.
[98] Q.-L. Li, J.-Y. Ma, and Y.-X. Chang, “Blockchain queue theory,” in Proc. Int. Conf. Comput. Social Netw. Cham, Switzerland: Springer, 2018, pp. 25–40.
[99] Q.-L. Li, J.-Y. Ma, Y.-X. Chang, F.-Q. Ma, and H.-B. Yu, “Markov processes in blockchain systems,” Comput. Social Netw., vol. 6, no. 1, pp. 1–28, Dec. 2019.
[100] S. Ricci, E. Ferreira, D. S. Menasche, A. Ziviani, J. E. Souza, and A. B. Vieira, “Learning blockchain delays: A queueing theory approach,” ACM SIGMETRICS Perform. Eval. Rev., vol. 46, no. 3, pp. 122–125, Jan. 2019.
[101] W. Zhao, S. Jin, and W. Yue, “Analysis of the average confirmation time of transactions in a blockchain system,” in Proc. Int. Conf. Queueing Theory Netw. Appl. Cham, Switzerland: Springer, 2019, pp. 379–388.
[102] S. Geissler, T. Prantl, S. Lange, F. Wamser, and T. Hossfeld, “Discrete-time analysis of the blockchain distributed ledger technology,” in Proc. 31st Int. Teletraffic Congr. (ITC), Aug. 2019, pp. 130–137.
[103] M. Alaslani, F. Nawab, and B. Shihada, “Blockchain in IoT systems: End-to-end delay evaluation,” IEEE Internet Things J., vol. 6, no. 5, pp. 8332–8344, Oct. 2019.
[104] U. R. Krieger, M. H. Ziegler, and H. L. Cech, “Performance modeling of the consensus mechanism in a permissioned blockchain,” in Proc. Int. Conf. Comput. Netw. Cham, Switzerland: Springer, 2019, pp. 3–17.
[105] P. Ferraro, C. King, and R. Shorten, “Distributed ledger technology for smart cities, the sharing economy, and social compliance,” IEEE Access, vol. 6, pp. 62728–62746, 2018.
[106] P. Yuan, K. Zheng, X. Xiong, K. Zhang, and L. Lei, “Performance modeling and analysis of a hyperledger-based system using GSPN,” Comput. Commun., vol. 153, pp. 117–124, Mar. 2020.
[107] H. Sukhwani, N. Wang, K. S. Trivedi, and A. Rindos, “Performance modeling of hyperledger fabric (permissioned blockchain network),” in Proc. IEEE 17th Int. Symp. Netw. Comput. Appl. (NCA), Nov. 2018, pp. 1–8.
[108] H. Zhang, C. Jin, and H. Cui, “A method to predict the performance and storage of executing contract for ethereum consortium-blockchain,” in Proc. Int. Conf. Blockchain. Cham, Switzerland: Springer, 2018, pp. 63–74.
[109] N. Papadis, S. Borst, A. Walid, M. Grissa, and L. Tassiulas, “Stochastic models and wide-area network measurements for blockchain design and analysis,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Apr. 2018, pp. 2546–2554.
[110] Y. Shahsavari, K. Zhang, and C. Talhi, “Performance modeling and analysis of the bitcoin inventory protocol,” in Proc. IEEE Int. Conf. Decentralized Appl. Infrastruct. (DAPPCON), Apr. 2019, pp. 79–88.
[111] G. Bolch, S. Greiner, H. De Meer, and K. S. Trivedi, Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. Hoboken, NJ, USA: Wiley, 2006.
[121] P. Erdos, “On random graphs,” Publicationes Mathematicae, vol. 6, pp. 290–297, Nov. 1958.
[122] B. Bollobás and B. Béla, Random Graphs, no. 73. Cambridge, U.K.: Cambridge Univ. Press, 2001.

CAIXIANG FAN received the bachelor’s degree from the University of Electronic Science and Technology of China, in 2012, and the M.Sc. degree from the Electrical and Computer Engineering (ECE) Department, University of Alberta, in 2019, where he is currently pursuing the Ph.D. degree in software engineering and intelligent systems. He worked as an IT Engineer at Huawei Technologies Company Ltd., China. He is currently working as a Research Assistant at the Performant and Available Computing Systems (PACS) Lab under the co-supervision of Dr. H. Khazaei and Dr. P. Musilek. His research interests include blockchain and DAG-based distributed ledger system design, performance evaluation, and modeling.

SARA GHAEMI (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from the Amirkabir University of Technology, Tehran, Iran, in 2018. She is currently pursuing the M.S. degree in software engineering and intelligent systems with University of Alberta, Edmonton, AB, Canada. She is a Research Assistant with the University of Alberta and a Visiting Research Assistant with the Performant and Available Computing Systems (PACS) Lab, York University, Toronto, ON, Canada. Her research interests include cloud computing, distributed ledger technologies, and distributed systems.

HAMZEH KHAZAEI (Member, IEEE) received the Ph.D. degree in computer science from the University of Manitoba. He was an Assistant Professor with the University of Alberta, a Research Associate with the University of Toronto, and a Research Scientist with IBM. He is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, York University. He extended queuing theory and stochastic processes to accurately model the performance and availability of cloud computing systems at the University of Manitoba. His research interests include performance modeling, cloud computing, and engineering distributed systems.
[112] D. Ongaro and J. Ousterhout, ‘‘In search of an understandable consensus
algorithm,’’ in Proc. USENIX Annu. Tech. Conf. (USENIX ATC), 2014,
pp. 305–319. PETR MUSILEK (Senior Member, IEEE) received
[113] E. Gelenbe, G. Pujolle, E. Gelenbe, and G. Pujolle, Introduction to
the Dipl.Ing. degree (Hons.) in electrical engi-
Queueing Networks, vol. 2. New York, NY, USA: Wiley, 1998.
[114] M. L. Chaudhry and J. G. C. Templeton, ‘‘The queuing system M/GB/l neering and the Ph.D. degree in cybernetics from
and its ramifications,’’ Eur. J. Oper. Res., vol. 6, no. 1, pp. 56–60, the Military Academy, Brno, Czech Republic,
Jan. 1981. in 1991 and 1995, respectively. In 1995, he was
[115] M. Castro and B. Liskov, ‘‘Practical Byzantine fault tolerance,’’ in Proc. appointed as the Head of the Computer Appli-
OSDI, vol. 99, 1999, pp. 173–186. cations Group, Institute of Informatics, Military
[116] A. Bessani, J. Sousa, and E. E. P. Alchieri, ‘‘State machine replication for Medical Academy, Hradec Králové, Czech Repub-
the masses with BFT-SMART,’’ in Proc. 44th Annu. IEEE/IFIP Int. Conf. lic. From 1997 to 1999, he was a NATO Sci-
Dependable Syst. Netw., Jun. 2014, pp. 355–362. ence Fellow with the Intelligent Systems Research
[117] ‘‘The ZILLIQA technical whitepaper,’’ ZILLIQA Team, White Paper, Laboratory, University of Saskatchewan, Canada. In 1999, he joined the
Sep. 2017, p. 2019, vol. 16. Accessed: Jul. 7, 2020. [Online]. Available: Department of Electrical and Computer Engineering, University of Alberta,
https://docs.zilliqa.com/whitepaper.pdf
[118] J. Medhi, ‘‘Waiting time distribution in a Poisson queue with a general
Canada, where he is currently a Full Professor. He was the Director of
bulk service rule,’’ Manage. Sci., vol. 21, no. 7, pp. 777–782, Mar. 1975. Computer Engineering Program, from 2016 to 2017. He serves as an Asso-
[119] H. Wang, J. Li, Z. Shen, and Y. Zhou, ‘‘Approximations and bounds ciate Chair (Research and Planning) with the ECE Department. His research
for (n, k) fork-join queues: A linear transformation approach,’’ in Proc. interests include artificial intelligence and energy systems. He has developed
18th IEEE/ACM Int. Symp. Cluster, Cloud Grid Comput. (CCGRID), a number of innovative solutions in the areas of renewable energy systems,
May 2018, pp. 422–431. smart grids, wireless sensor networks, and environmental monitoring and
[120] G. Ciardo, J. K. Muppala, and K. S. Trivedi ‘‘SPNP: Stochastic Petri net modeling.
package,’’ in Proc. PNPM, vol. 89, 1989, pp. 142–151.

