User Identification and Behaviour Patterns On The Ethereum Blockchain: An Exploratory Study
Master’s Thesis
for the Attainment of the Degree
Master of Science
at the TUM School of Management
of the Technical University of Munich
Offering the possibility to create and deploy smart contracts, the Ethereum blockchain
covers various use cases, such as decentralised exchanges, financial applications and
tokenisation of digital assets. The use of these applications and the number of their users
have grown steadily since the inception of the platform. Enabled by the
transparent yet pseudonymous nature of blockchains, existing research on transactional
data has focused on the general quantitative analysis of the on-chain data from a
network perspective. This study, however, contributes to knowledge in this area by
identifying user groups and describing different behaviour patterns on an end-user level.
An exploratory quantitative analysis of the on-chain data has been conducted to shed
light on the different user activity levels, how features, such as smart contracts,
decentralised applications and tokens are used, and how users engage with the mining
process of the blockchain. The study leverages data from different sources, such as
Google Big Query and Etherscan, and examines the transactional data from the
inception of the Ethereum blockchain in July 2015 until February 2021. Regarding the
activity level of the externally owned accounts, it could be shown that more than 90% of
all accounts have sent at most ten transactions. A subset of regular users has been
defined to limit the study to a more representative set of addresses. These users
primarily make transactions targeted at smart contracts. Centralised and decentralised
exchanges are the most commonly used applications. Furthermore, over 10% of the
regular users have engaged in the block finding process by participating in mining
pools. Although these mining pool participants send a substantial amount of Ether to
decentralised exchanges, the majority of each miner’s mining rewards is
distributed across, on average, 45 different low-activity addresses on the blockchain.
These results raise questions about the appropriateness of quantitative analyses on an
address level. They indicate the necessity of a holistic address clustering approach to
account for multi-address usage and achieve a more comprehensive picture of user
behaviour patterns.
Table of Contents
Abstract
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Externally Owned Accounts
2.2 Smart Contracts
2.3 Decentralised Applications
2.4 Tokens
2.5 Consensus Mechanism
3 Related Work
4 Data Collection
4.1 Ethereum Data Structure
4.2 Google Big Query
5 Data Pre-Processing
5.1 On-Chain Data Sources
5.2 Off-Chain Data Enrichment
5.3 Data Aggregation and Transformation
6 Results
6.1 General User Characterisation and Transaction Dynamics
6.1.1 Address Structure
6.1.2 Transaction Dynamics and Different Activity Levels
6.1.3 Smart Contract Interaction
6.1.4 ERC20 Contract Interaction and Token Transfer Dynamics
6.1.5 Dapp Interaction
6.2 Mining Pool Participants
6.2.1 Mining Pool Structure
6.2.2 Miner Identification
6.2.3 Mining Behaviour
6.2.4 Transaction Behaviour
6.2.5 Distribution of Mined Ether
7 Conclusion
7.1 Overall Summary and Key Findings
7.2 Research Limitations
7.3 Directions for Further Research
References
Appendix A
Appendix B
Appendix C
Declaration of Authorship
List of Figures
Figure 1: Data Collection until Data Interpretation
Figure 2: ECDF of Outgoing Transaction Count
Figure 3: ECDF of Active Days
Figure 4: Comparison of Idle Times and Ether Price
Figure 5: Cluster Sizes with Logarithmical Scale
Figure 6: Active Days before and after Head/Tail Breaks Clustering
Figure 7: Smart Contract Ratio
Figure 8: Zero-Value Ratio for all Transactions
Figure 9: Zero-Value Ratio for Transactions to Smart Contracts
Figure 10: Number of ERC20 Contract Transactions
Figure 11: Comparison of Incoming and Outgoing Token Transfers of Regular Users
Figure 12: Incoming Transactions for Miners and Non-Miners
Figure 13: Active Days of Miners and Non-Miners
Figure 14: Average Payout Value in Ether
Figure 15: Smart Contract Ratio Miners vs Non-Miners
Figure 16: Zero-Value Ratio Miners vs Non-Miners
Figure 17: Number of Unique Recipients Miners vs Non-Miners
List of Tables
Table 1: Initial Data Frame
Table 2: Address Structure
Table 3: Activity Level of Top 10 Dapp
Table 4: Activity Level of Exchanges (Etherscan Labels)
Table 5: Top 5 of 72 Mining Pools by Blocks Mined
Table 6: Descriptives about Miners
Table 7: Transaction Behaviour Miner vs Non-Miners (Mean & Median)
Table 8: Comparison of Incoming and Outgoing Ether for Miners
Table 9: Cumulative Distribution of Ether Sent and Received by Top Addresses
Table 10: Ether and Address Distribution Secondary Receivers
Table 11: Top 12 Secondary Receivers by Total Ether Received
1 Introduction
Dubbing the Ethereum blockchain the “Next-Generation Smart Contract and
Decentralised Application Platform” (Buterin, 2021, para. 1), Vitalik Buterin, its founder
and developer, launched this novel, decentrally governed ecosystem in 2015. With
blockchain as the underlying technology, Ethereum introduces functionalities and
applications that go beyond the sole purpose of exchanging cryptocurrency. Although
Ethereum shares the same underlying blockchain technology with other cryptocurrencies,
such as Bitcoin, it includes several distinct features that enable entirely new forms of
user interaction (Buterin, 2021). Ethereum’s distinctive property lies in the unique
architecture of its blockchain, which determines the constraints and conditions under which
the data structure can change. Commands and operations are executed on a stack-based
architecture, the Ethereum Virtual Machine (Wood). This virtual machine executes
low-level bytecode compiled from the high-level programming language Solidity, which is
used to write smart contracts and deploy them on the Ethereum blockchain. Unlike
Bitcoin’s scripting language, Solidity offers Turing completeness and facilitates the
creation of sophisticated applications (Cai et al., 2018). Ethereum’s smart contracts
define, in their code, the prerequisites and sequence of transactions. They can also be
programmed to interact and exchange data with other smart contracts on the blockchain.
Consequently, entire applications of the kind usually run on mobile phones or
conventional desktop computers can be built with smart contracts (Oliva et al., 2020).
These so-called decentralised applications span various domains, such as games,
gambling, marketplaces and even social networks (Wu, 2019).
decentralised game CryptoKitties¹, where ERC721 tokens constitute unique assets that
are traded in the game (Sako et al., 2021).
Although smart contracts and tokens are two significant features of Ethereum, there are
also other noteworthy features or properties that end-users can interact with. Recent
cryptocurrency price trends indicate a growing interest in using Ethereum for investment
and trading purposes (Schaupp & Festa, 2018). Both decentralised and centralised
exchanges facilitate the trade of Ether, the native currency of Ethereum, against other
crypto- or fiat currencies. Another type of entity that is an essential element of the
Ethereum ecosystem is the mining pool. Mining pools are groups of cooperating miners who
combine their computing power to participate in the block finding process (Cong et
al., 2021).
With the growing popularity and number of unique addresses on the Ethereum
blockchain, scholarly interest in the internal blockchain activity has increased during the
last several years (Casino et al., 2019). Although a plethora of blockchain studies have
recognised the necessity for quantitative analyses of blockchain data, most of them have
focused on the general exploratory analysis of the Ethereum blockchain as a whole
(Anoaica & Levard, 2018; Ferretti & D’Angelo, 2020; Guo et al., 2019;
Motamed & Bahrak, 2019; Sun et al., 2019; Zanelatto Gavião Mascarenhas et al., 2020;
Zheng et al., 2020). Thus, the existing literature does not address which distinct
user groups exist on the Ethereum blockchain and what these groups engage in. The
identification of user groups and their behaviour patterns is relevant in many regards.
For example, scholars have identified obstacles and entry barriers that hinder the mass
adoption of decentralised applications (Glomann et al., 2019). Developers of
decentralised applications therefore have an interest in examining the user base of
their applications to assess their popularity and adoption, which, in turn, can
inform future updates. A 2017 study on the end-user adoption of Bitcoin
as a digital currency revealed that technical curiosity is an important motivation, further
underlining the lack of mass adoption by groups other than technology-savvy users
(Presthus & O’Malley, 2017). This study’s insights can serve as a starting point for
tackling the above-mentioned issues by providing a current picture of the user base and
its activities on the Ethereum blockchain.
¹ Decentralised game that is centred around non-fungible in-game items, https://www.cryptokitties.co/
To account for the increasing diversity of applications, I conducted a quantitative
exploratory study on user identification and behaviour patterns on the Ethereum
blockchain. The study characterises users by examining their activity level, their
interaction with smart contracts and decentralised applications, and their participation
in the mining process. To gain insights into users’ transactional behaviour on the
blockchain, on-chain data extracted from Google Big Query is combined with several
labelled lists from State of the Dapps and Etherscan. All externally owned accounts and
their transactions from July 2015 until February 2021 are included in the analysis. By
primarily analysing the outgoing transactions of externally owned accounts, the study
shows that the regularly used functionalities of Ethereum go beyond the simple transfer
of Ether. Users appear to engage with smart contracts to a significant extent and to use
certain types of decentralised apps more than others. By understanding which user
groups can be distinguished and what they use the blockchain for, relevant stakeholders
can follow a more user-centric approach for future developments and monitor the
network’s status more holistically.
This paper is divided into seven sections. The introduction presents the research topic,
the current gap in the literature, and the scope and objectives of this study. The
subsequent section outlines relevant concepts and elements of the Ethereum
blockchain that are subject to the analysis. Then, related research in the field is examined,
including literature on the quantitative analysis of both the Ethereum and the Bitcoin
blockchain. The data collection and data pre-processing sections describe the data
retrieval procedures and the creation of variables derived from the initial data. The main
section is devoted to the presentation and discussion of the results of the exploratory
analysis. It first presents the results on the general user structure and then
examines mining pool participants and their mining and transaction behaviour. Finally,
the findings are summarised, and directions for further research are outlined.
2 Background
This section provides background information on the Ethereum blockchain relevant
to the analysis. More specifically, key concepts, such as the consensus mechanism,
addresses, smart contracts, decentralised applications and tokens, are explained from a
functional perspective to highlight their role in user-blockchain interactions.
2.1 Externally Owned Accounts
Smart contracts, on the other hand, are controlled by their code and cannot initiate
transactions. This implies that every transaction is initiated by an EOA.
However, both EOAs and smart contracts can be recipients. The recipient address field
is left empty in the case of a smart contract creation transaction. Apart from that, a
transaction with an EOA as the recipient serves solely to transfer Ether; optionally,
a limited amount of text can be sent through the data input field as well.
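The three cases described above (contract creation, Ether transfer to an EOA and contract call) can be told apart from the recipient field alone, provided it is known which addresses are smart contracts. The following Python sketch illustrates this rule. It assumes that each transaction is reduced to its recipient address (called `to_address` here) and that a set of known contract addresses is available; the function name, input format and addresses are hypothetical.

```python
from typing import Optional, Set


def classify_transaction(to_address: Optional[str], contract_addresses: Set[str]) -> str:
    """Classify an EOA-initiated transaction by its recipient field.

    `contract_addresses` is an assumed, precomputed set of lower-case
    smart contract addresses; this is a hypothetical helper.
    """
    if not to_address:
        # An empty recipient field marks a contract creation transaction.
        return "contract_creation"
    if to_address.lower() in contract_addresses:
        # The recipient is a smart contract, so a contract function is invoked.
        return "contract_call"
    # Otherwise the recipient is an EOA and the transaction transfers Ether.
    return "eoa_transfer"


# Usage with made-up placeholder addresses:
contracts = {"0xc0ffee"}
print(classify_transaction(None, contracts))        # contract_creation
print(classify_transaction("0xC0FFEE", contracts))  # contract_call
print(classify_transaction("0xa11ce", contracts))   # eoa_transfer
```

In practice, the set of contract addresses would have to be derived from the blockchain data itself, since the recipient field alone does not reveal whether an address holds code.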
The sender of each transaction must provide a defined set of input data for the
transaction to be regarded as valid and executed by the blockchain. This input data
includes the gas limit, i.e. the maximum amount of gas (units of computation) that the
transaction may consume, and the gas price, i.e. the fee per unit of gas that the sender
is willing to pay. Although a simple Ether transfer to other EOAs consumes gas as well,
these specifications play a more critical role in smart contract calls: all computational
steps that a smart contract performs following a smart contract call must be covered by
the gas provided by the corresponding transaction’s initiator.
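The relationship between gas limit, gas price and the fee actually paid can be made concrete with a short calculation. For the period covered by this study (before the EIP-1559 fee change of August 2021), the fee equals the gas consumed multiplied by the gas price, and a plain Ether transfer consumes 21,000 gas. The sketch below is a simplification that ignores gas refunds and the details of intrinsic gas; the 50 gwei price is an arbitrary illustrative value, and the function names are hypothetical.

```python
from typing import Tuple

GWEI = 10**9    # wei per gwei
ETHER = 10**18  # wei per Ether


def charged_gas(gas_required: int, gas_limit: int) -> Tuple[int, bool]:
    """Return (gas charged, success) under simplified pre-EIP-1559 rules.

    If the computation needs more gas than the gas limit allows, execution
    runs out of gas: its state changes are reverted, but the full gas limit
    is still charged. Otherwise only the gas actually used is charged.
    """
    if gas_required > gas_limit:
        return gas_limit, False
    return gas_required, True


def fee_in_ether(gas: int, gas_price_wei: int) -> float:
    """Fee = gas charged x gas price, converted from wei to Ether."""
    return gas * gas_price_wei / ETHER


# Plain Ether transfer: 21,000 gas at an illustrative price of 50 gwei.
gas, ok = charged_gas(21_000, 21_000)
print(ok, fee_in_ether(gas, 50 * GWEI))  # True 0.00105

# Contract call needing 150,000 gas but sent with a 100,000 gas limit.
gas, ok = charged_gas(150_000, 100_000)
print(ok, fee_in_ether(gas, 50 * GWEI))  # False 0.005
```

The second case shows why the gas limit matters more for contract calls: an insufficient limit does not save Ether but causes the call to fail after consuming the gas provided.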
The interaction with other EOAs can also be driven by other motives. Exchanges on the
Ethereum blockchain commonly require users who do not deposit fiat currency to send
their Ether to a deposit address, often an EOA that belongs to the exchange, before they
can use its services (Victor, 2020). Since the number of addresses that can belong to a
single person is not limited, Ether transfers to other externally owned accounts can also
represent movements of Ether between addresses of the same owner (Somin
et al., 2018). These two examples are non-exhaustive and illustrate the difficulty of
assigning different EOAs to an owner without additional information.
As mentioned earlier, smart contracts are the second group of addresses that can act as
the recipient of a transaction. Smart contracts have to be successfully
deployed on the blockchain before they can be interacted with. EOAs can create smart
contracts by sending a transaction that contains the compiled contract code and has an
empty recipient field. If the gas provided is sufficient to cover the computation of the
respective deployment transaction, the smart contract is included in the current block
and becomes available on the blockchain. Smart contracts that contain the corresponding
code can also deploy new smart contracts themselves. Once a smart contract is deployed,
it remains on the blockchain and is immutable; more specifically, no changes to the
contract code can be made retrospectively. This limitation is due to the immutable nature
of the Ethereum blockchain, which ensures a tamper-proof contract for the involved
participants. However, a self-destruct function is available to remove the smart contract
from the blockchain. Developers can draw upon this functionality if a particular smart
contract needs to be replaced or if the contract becomes obsolete for any other reason.
Successfully deployed smart contracts can then be accessed by EOAs or other smart
contracts. Interactions with smart contracts initiated by EOAs take the form of
transactions to these contracts that invoke their functions. The invocation of a function
causes the smart contract to execute the part of its code that corresponds to that
function. How much of this computation can be carried out depends on the gas limit
that the caller or initiator has set.
Depending on the code’s content and the smart contract’s role, a function invocation can
trigger a cascade of further function invocations on other smart contracts. These contract
invocations that stem from smart contracts are referred to as message calls, and they
always originate from a transaction sent by an externally owned account. The gas
consumed by message calls is drawn from the gas provided by the original transaction.
This study does not cover contract creations or contract invocations by smart contracts,
as it focuses on user behaviour patterns.
With the proliferation of more complex decentralised applications and the increasing
popularity of token exchange, the analysis of smart contract interactions can be a
promising source of insights into the transaction behaviour patterns of users of the
Ethereum blockchain (Wu, 2019).
End-users interact with Dapps by sending transactions from their EOA to the
corresponding smart contracts. Therefore, using a Dapp always costs a certain amount
of Ether in the form of gas, which pays for the computations the application performs.
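Because Dapp usage appears on-chain only as transactions to smart contracts, attributing transactions to Dapps requires a mapping from contract addresses to applications, such as the labelled lists from State of the Dapps used in this study. The following sketch assumes such a mapping is available as a Python dictionary (several contracts may map to the same Dapp) and counts, per Dapp, the number of transactions and the number of distinct sending EOAs. The function name, Dapp labels and addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def dapp_activity(
    transactions: Iterable[Tuple[str, str]], dapp_labels: Dict[str, str]
) -> Dict[str, Tuple[int, int]]:
    """Return {dapp_name: (transaction count, distinct users)}.

    `transactions` holds (sender EOA, recipient address) pairs and
    `dapp_labels` maps contract addresses to Dapp names; recipients
    without a label are ignored.
    """
    tx_counts: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in transactions:
        name = dapp_labels.get(recipient)
        if name is None:
            continue  # plain Ether transfer or unlabelled contract
        tx_counts[name] += 1
        users[name].add(sender)
    return {name: (tx_counts[name], len(users[name])) for name in tx_counts}


labels = {"0xdex1": "ExampleDex", "0xdex2": "ExampleDex", "0xgame": "ExampleGame"}
txs = [("0xa", "0xdex1"), ("0xa", "0xdex2"), ("0xb", "0xdex1"),
       ("0xb", "0xgame"), ("0xb", "0xeoa")]
print(dapp_activity(txs, labels))
# {'ExampleDex': (3, 2), 'ExampleGame': (1, 1)}
```

Mapping addresses to names, rather than treating each contract separately, lets a Dapp that consists of several contracts be counted as one application.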
2.4 Tokens
As mentioned earlier, ERC20 is the most common token standard among a variety of
other standards that have been introduced in the last few years. A smart contract can be
programmed to act as a bookkeeping entity for a crypto token on the Ethereum
blockchain. The so-called ERC20 interface, which the token contract must implement,
provides functionalities such as transferring tokens, reading the balances of token
holders and querying the total token supply. To transfer tokens from one EOA to another,
the sender sends a transaction to the corresponding token contract. More specifically,
the sender invokes the token contract’s transfer function, which carries out the token
transfer. The token balances of all token holders are stored in the token contract and
are thus also updated within the token contract.
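The bookkeeping role of a token contract can be illustrated with a toy model. The Python class below is neither Solidity nor a reference implementation of ERC20; it only mirrors three elements of the interface described above (total supply, balance reading and transfer) to show that a token transfer merely changes the balance records held by the contract. Approvals, events and the exact error semantics are omitted, and the class name and addresses are hypothetical.

```python
from typing import Dict


class ToyTokenLedger:
    """Toy model of ERC20-style bookkeeping inside a token contract."""

    def __init__(self, issuer: str, supply: int) -> None:
        self._balances: Dict[str, int] = {issuer: supply}  # issuer holds all tokens
        self._total_supply = supply

    def total_supply(self) -> int:
        return self._total_supply

    def balance_of(self, holder: str) -> int:
        return self._balances.get(holder, 0)

    def transfer(self, sender: str, recipient: str, amount: int) -> bool:
        # ERC20 says transfer SHOULD throw on insufficient balance; this toy
        # model returns False instead to keep the example simple.
        if amount < 0 or self.balance_of(sender) < amount:
            return False
        # Only the contract's internal records change; the total supply is untouched.
        self._balances[sender] = self.balance_of(sender) - amount
        self._balances[recipient] = self.balance_of(recipient) + amount
        return True


ledger = ToyTokenLedger("0xissuer", 1_000)
ledger.transfer("0xissuer", "0xalice", 250)
print(ledger.balance_of("0xissuer"), ledger.balance_of("0xalice"))  # 750 250
```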
Smart contracts that incorporate this framework handle tokens interoperably with other
smart contracts that are ERC20 compatible. Hence, this standardisation enables more
efficient handling of different tokens and facilitates a fast-developing network for trading and
exchanging tokens. Already proposed by Buterin at the inception of Ethereum, the standard
paved the way for the proliferation of cryptocurrency exchanges that allow users to
exchange fiat money for tokens. Furthermore, crypto tokens are often used to represent
secured digital assets comparable to traditional securities. Initial coin offerings represent
a major use case of crypto tokens and make fundraising through the Ethereum blockchain
possible (W. Chen et al., 2020).
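A practical consequence of this standardisation is that every ERC20 transfer call is encoded in the same way, regardless of the token contract: the transaction’s input data starts with the 4-byte function selector `0xa9059cbb` of `transfer(address,uint256)`, followed by the recipient address and the amount, each padded to 32 bytes. The sketch below decodes such input data. It assumes the input is given as a hex string and deliberately accepts only the exact standard length, so calls with extra trailing data are skipped. The function name is hypothetical, and the returned amount is in the token’s smallest unit (the token’s decimals are not applied).

```python
from typing import Optional, Tuple

# First 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = "a9059cbb"


def decode_erc20_transfer(input_data: str) -> Optional[Tuple[str, int]]:
    """Return (recipient, raw amount) for a standard ERC20 transfer call, else None."""
    data = input_data[2:] if input_data.startswith("0x") else input_data
    # Selector (8 hex chars) + two 32-byte ABI words (64 hex chars each).
    if len(data) != 8 + 64 + 64 or data[:8].lower() != TRANSFER_SELECTOR:
        return None
    recipient_word, amount_word = data[8:72], data[72:136]
    recipient = "0x" + recipient_word[-40:].lower()  # address = last 20 bytes
    amount = int(amount_word, 16)  # unsigned 256-bit integer
    return recipient, amount


# Example input data sending 5 * 10**18 base units to a made-up address.
example = "0x" + TRANSFER_SELECTOR + ("ab" * 20).rjust(64, "0") + format(5 * 10**18, "064x")
print(decode_erc20_transfer(example) == ("0x" + "ab" * 20, 5 * 10**18))  # True
```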
their resources to achieve a higher combined computational power. Participants in these
mining pools receive a steady income over time. Since most mining pools pay out
their Ether reward in proportion to the share of work each participant has contributed,
they attract a significant number of users and offer an accessible way to engage in the
consensus mechanism.
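The proportional reward sharing mentioned above can be sketched as follows. This is a simplified proportional scheme under stated assumptions: the reward of a single block is split according to the shares each participant submitted, after deducting an optional operator fee. Real pools use various payout schemes (for example PPS or PPLNS) that differ in detail, and the function name, fee, reward and share counts are hypothetical.

```python
from typing import Dict


def proportional_payout(
    block_reward: float, shares: Dict[str, int], pool_fee: float = 0.0
) -> Dict[str, float]:
    """Split one block reward in proportion to each miner's submitted shares.

    `pool_fee` is an assumed fraction of the reward kept by the pool operator.
    """
    total_shares = sum(shares.values())
    if total_shares == 0:
        return {miner: 0.0 for miner in shares}
    distributable = block_reward * (1 - pool_fee)
    return {miner: distributable * n / total_shares for miner, n in shares.items()}


# Three hypothetical miners sharing an illustrative reward of 2 Ether, no fee.
print(proportional_payout(2.0, {"miner_a": 6, "miner_b": 3, "miner_c": 1}))
# {'miner_a': 1.2, 'miner_b': 0.6, 'miner_c': 0.2}
```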
3 Related Work
A plethora of studies in the field of cryptocurrency from the last decade have recognised
the significance of analysing user activity on blockchain networks. However, the vast
majority of this work has focused on the Bitcoin network. Released in January 2009,
the cryptocurrency Bitcoin surpasses Ethereum in market capitalisation and
operating time at the time of writing. As the applications of Bitcoin and its
non-Turing-complete scripting language are far more limited than those of Ethereum,
user-related research has centred on entity clustering and general quantitative analyses
of the Bitcoin blockchain rather than on actual behaviour patterns. Since using multiple
addresses is a common practice among users seeking to increase their privacy and
anonymity, clustering represents an appropriate method to discover structures on the
blockchain (Ermilov et al., 2017; Victor, 2020). Both above-mentioned areas are closely
related and help us put the results into perspective.
By using newly developed address clustering heuristics and interacting with a set of
predefined Bitcoin blockchain services, such as mining pools, gambling services, vendors
and exchanges, Meiklejohn et al. (2013) were able to elucidate the transactional structure
of the Bitcoin network. More specifically, the role of various relevant services within the
network was put into perspective by analysing their transactional behaviour (outgoing
and incoming transactions, number of different users). Satoshi Dice, a blockchain-based
betting game, was examined in greater detail. Features, such as the number of
outgoing and incoming transactions and the average amount of Bitcoin transferred,
revealed that Satoshi Dice accounted for a significant proportion of micro-valued
transactions (transactions transferring small amounts of Bitcoin). As these services can,
to some extent, be compared to decentralised applications and other concepts on the
Ethereum network, this analysis is possibly relevant for exploring the interactions
between addresses and Dapps on the Ethereum blockchain.
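A common building block of such clustering is the multi-input (common-input-ownership) heuristic: all input addresses of one Bitcoin transaction are assumed to belong to the same owner, because spending them requires control of all corresponding private keys. Meiklejohn et al. (2013) apply this heuristic alongside their own change-address heuristic; the sketch below implements only the multi-input rule with a union-find structure and takes each transaction as a plain list of input addresses (an assumed, simplified input format). The heuristic can over-merge when unrelated users jointly sign a transaction, as in CoinJoin.

```python
from typing import Dict, Iterable, List, Set


class UnionFind:
    """Disjoint-set structure for merging addresses into clusters."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def multi_input_clusters(transactions: Iterable[List[str]]) -> List[Set[str]]:
    """Merge all input addresses of each transaction into one cluster."""
    uf = UnionFind()
    for inputs in transactions:
        for address in inputs:
            uf.find(address)  # register single-input addresses too
        for address in inputs[1:]:
            uf.union(inputs[0], address)
    clusters: Dict[str, Set[str]] = {}
    for address in list(uf.parent):
        clusters.setdefault(uf.find(address), set()).add(address)
    return list(clusters.values())


# Each inner list holds the input addresses of one made-up transaction.
print(multi_input_clusters([["A", "B"], ["B", "C"], ["D"]]))
```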
Employing the heuristics used earlier by Meiklejohn et al. (2013), Ermilov et al. (2017)
leveraged both on-chain Bitcoin data and off-chain data to cluster addresses that belong
to the same owner. The transactional behaviour was examined and taken into account in
that process. More specifically, behaviour patterns, such as sending Bitcoin to multiple
recipients in the same transaction, were then augmented with address tags that the
authors collected from public forums and social networks to assess the resulting clusters.
Also addressing the privacy and anonymity of the Bitcoin network, Monaco
(2015) proposed a method to identify users on the blockchain by
analysing various transactional features. The behaviour pattern that the author exploited
consists of features, such as the time interval between two successive transactions, the
hour of the day of the transaction timestamp, the ratio between outgoing and incoming
Bitcoin value and the ratio between sent and received transactions. By quantifying the
behaviour pattern and observing the transactions over a sufficient amount of time, the
author demonstrates the deterministic character of long-term transactional behaviour for
the analysed set of Bitcoin addresses. His study’s methodology represents an alternative
to common address clustering heuristics to identify entities on the blockchain. The use of
behavioural biometrics in his work indicates the importance of specific behavioural
features on the Bitcoin network that can potentially be leveraged for behaviour analysis
on the Ethereum blockchain as well.
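Features of the kind Monaco (2015) describes can be computed from an address’s transaction history with little effort. The sketch below derives simplified versions of four of them: the median time between successive transactions, the hours of the day at which the address is active, the ratio of outgoing to incoming value, and the ratio of sent to received transactions. The input format (dictionaries with `time`, `sender`, `recipient` and `value` keys) and the exact feature definitions are assumptions for illustration, not Monaco’s original specification.

```python
from datetime import datetime
from statistics import median


def behaviour_features(txs, address):
    """Compute simplified behavioural features for one address."""
    own = sorted(
        (t for t in txs if address in (t["sender"], t["recipient"])),
        key=lambda t: t["time"],
    )
    times = [t["time"] for t in own]
    # Time between successive transactions, in seconds.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    sent = [t for t in own if t["sender"] == address]
    received = [t for t in own if t["recipient"] == address]
    value_out = sum(t["value"] for t in sent)
    value_in = sum(t["value"] for t in received)
    return {
        "median_gap_seconds": median(gaps) if gaps else None,
        "active_hours": sorted({t.hour for t in times}),
        "value_out_in_ratio": value_out / value_in if value_in else None,
        "sent_received_ratio": len(sent) / len(received) if received else None,
    }


sample = [
    {"time": datetime(2020, 1, 1, 10), "sender": "A", "recipient": "B", "value": 2.0},
    {"time": datetime(2020, 1, 1, 12), "sender": "C", "recipient": "A", "value": 4.0},
    {"time": datetime(2020, 1, 1, 16), "sender": "A", "recipient": "D", "value": 1.0},
]
features = behaviour_features(sample, "A")
print(features)
# {'median_gap_seconds': 10800.0, 'active_hours': [10, 12, 16], 'value_out_in_ratio': 0.75, 'sent_received_ratio': 2.0}
```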
Harlev et al. (2018) employed a more model-based method to de-anonymise
users. Using supervised machine learning, the study succeeded in classifying a set of
addresses with predefined labels. Unidentified clusters were classified with an accuracy
of 77% using a gradient boosting classifier. The variables used in the analysis were
chosen at the transaction level and included features such as the transaction timestamp
and the Bitcoin value sent or received.
As the Ethereum blockchain matured over time, research on it has also gained
momentum. Differing substantially from the Bitcoin blockchain in functionality and
technical specifications, Ethereum has given rise to research directions covering
components and concepts specific to Ethereum, such as smart contracts, decentralised
applications and token systems. More holistic approaches are taken by studies on the
quantitative analysis of internal activities. As a network with a high number of users
interacting with each other in complex ways, Ethereum has been the subject of several
graph analysis studies that attempt to shed light on the blockchain’s transactional
structure.
Address clustering for the Ethereum blockchain has been conducted by several studies as
well (Béres et al., 2020; Sun et al., 2019; Victor, 2020). Since Ethereum has an additional
type of address, smart contracts, address clustering has widened its scope to include new
heuristics. Victor (2020) develops new clustering methods for the Ethereum blockchain
by incorporating patterns such as exchange deposit address reuse and airdrop
multi-participation, drawing on token transfer and transaction data. Like Monaco in his
work on Bitcoin user identification, Béres et al. (2020) used similar biometrics or
identifiers to reveal addresses with the same owner. Time-of-day transaction activity, gas
price distribution and transaction graph analysis were incorporated in the study.
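The deposit address pattern can be illustrated with a strongly simplified sketch. It assumes that the main addresses of an exchange are known, treats every address that forwards funds to them as a deposit address, and groups all addresses that sent funds to the same deposit address, on the reasoning that exchanges typically assign one deposit address per customer. This is not Victor’s (2020) exact heuristic; a realistic implementation would need further conditions, for example on forwarded amounts and timing, to limit false positives. The function name and addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def deposit_address_clusters(
    transfers: List[Tuple[str, str]], exchange_addresses: Set[str]
) -> Dict[str, Set[str]]:
    """Group senders that share a (presumed) exchange deposit address.

    Simplified assumption: any address that forwards funds to a known
    exchange address is treated as a deposit address.
    """
    deposit_addresses = {s for s, r in transfers if r in exchange_addresses}
    senders_per_deposit: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in transfers:
        if recipient in deposit_addresses and sender not in exchange_addresses:
            senders_per_deposit[recipient].add(sender)
    # Only deposit addresses used by several senders link distinct addresses.
    return {dep: users for dep, users in senders_per_deposit.items() if len(users) > 1}


exchange = {"0xexchange"}
transfers = [("0xa", "0xdep1"), ("0xb", "0xdep1"), ("0xdep1", "0xexchange"),
             ("0xc", "0xdep2"), ("0xdep2", "0xexchange"), ("0xd", "0xe")]
print(deposit_address_clusters(transfers, exchange))
```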
The first systematic graph analysis of internal Ethereum activity was conducted
by T. Chen et al. (2018). The authors constructed three graphs (a money flow graph, a
smart contract creation graph and a smart contract invocation graph) to perform a range of
quantitative analyses on the on-chain data. Covering the blockchain data up to 2018,
when the paper was published, the study gave insights into addresses’ blockchain usage
preferences. The authors found that addresses preferred sending Ether to other
addresses over engaging with smart contracts. Furthermore, they pointed out that only a
few decentralised applications existed at that time, stressing the predominance of
financial Dapps and exchanges. With the emergence of new decentralised applications
and the ERC-20 token standard in the years following the blockchain’s inception in
2015, behaviour patterns, especially smart contract and token usage, are likely to have
undergone substantial shifts. Relating our results to this development helps to
understand the past growth and the future directions of the network.
A more recent study by Lee et al. (2020) followed a similar approach to examine how
closely different networks on the Ethereum blockchain resemble social networks. Four
different networks were created covering different components of the
blockchain, such as smart contract interactions, token transactions and the complete
EOA-smart contract interactions. Applying graph algorithms, the authors identified
nodes with distinctively high out-degree, such as mining pools and transaction mixers,
and nodes with distinctively high in-degree, such as ICO smart contracts. Highly
connected nodes, such as the exchange Binance, account for most of the connections in
the network.
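The out-degree and in-degree measures behind such findings can be computed directly from a list of transfers. The sketch below treats each (sender, recipient) pair as a directed edge and counts distinct counterparties, so repeated transfers between the same pair count once; counting every transaction instead would be an equally valid choice. It is not the graph methodology of Lee et al. (2020), and the node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(
    edges: Iterable[Tuple[str, str]], top: int = 3
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return the `top` nodes by out-degree and by in-degree (distinct neighbours)."""
    out_nbrs: Dict[str, Set[str]] = defaultdict(set)
    in_nbrs: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in edges:
        out_nbrs[sender].add(recipient)
        in_nbrs[recipient].add(sender)
    by_out = sorted(out_nbrs, key=lambda n: len(out_nbrs[n]), reverse=True)[:top]
    by_in = sorted(in_nbrs, key=lambda n: len(in_nbrs[n]), reverse=True)[:top]
    return ([(n, len(out_nbrs[n])) for n in by_out],
            [(n, len(in_nbrs[n])) for n in by_in])


# A pool paying out to three miners, two of whom send funds to an exchange.
edges = [("pool", "m1"), ("pool", "m2"), ("pool", "m3"),
         ("m1", "ex"), ("m2", "ex"), ("m1", "ex")]
print(degree_ranking(edges, top=1))  # ([('pool', 3)], [('ex', 2)])
```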
To better comprehend the results of the following explorations and analyses regarding
smart contract and Dapp interactions, it is reasonable to review work on smart contract
activity and structure as well. In a recent study, Oliva et al. (2020) followed an exploratory
quantitative approach to elucidate the activity level, categorisation and
source code complexity of smart contracts on the blockchain. The authors cross-linked
blockchain data from different sources: on-chain data from Google Big Query and off-chain
data from platforms such as Etherscan and State of the Dapps. Regarding the
activity level, the smart contract data has been analysed thoroughly along multiple
dimensions and features. The uneven distribution of transactions could be confirmed for
smart contracts as well: fewer than 0.05% of all smart contracts
received 80% of all smart contract transactions, and a majority of verified smart
contracts belong to that 0.05% of high-activity smart contracts. Furthermore, most of the
high-activity smart contracts had been active in the last 100 days, indicating an even
higher concentration of activity on a small number of active smart contracts. Oliva et al.
(2020) also discovered a high proportion of token contracts among the high-activity
contracts, which points to the generally high usage of tokens on the blockchain. The authors
also indicated that Games, Exchanges and Gambling are the most popular Dapp categories
among high-activity contracts.
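The concentration figure above ("fewer than 0.05% of contracts receive 80% of transactions") can be computed from per-contract transaction counts. The following is a minimal sketch under the assumption that such counts are already available as a list; the function name is hypothetical, and this is not the cited authors' code.

```python
import numpy as np


def min_fraction_for_share(tx_counts, target=0.8):
    """Smallest fraction of contracts that together receive `target` of all transactions."""
    counts = np.sort(np.asarray(tx_counts, dtype=float))[::-1]  # most active contracts first
    cumulative_share = np.cumsum(counts) / counts.sum()
    # First position at which the running share of transactions reaches the target.
    k = int(np.searchsorted(cumulative_share, target)) + 1
    k = min(k, counts.size)  # guard against floating-point shortfall when target is 1.0
    return k / counts.size


print(min_fraction_for_share([80, 10, 5, 5]))  # 0.25
```

Sorting in descending order makes the cumulative share monotonically increasing, so a binary search finds the cut-off directly.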
deploying and executing smart contracts of Dapps to understand the Ether consumption
for Dapp usage more thoroughly.
Constituting one of the most important use cases of the Ethereum blockchain, ERC20
token smart contracts have been the subject of several quantitative studies as well. A network
analysis of ERC20 token networks performed by Victor and Lüders (2019) revealed their
activity distribution and usage patterns. They indicated that most token transfers are concentrated
on the distribution of tokens from large emitting addresses and on high in-degree exchanges to which
users send their tokens. Furthermore, EOAs do not seem to send tokens to each other.
Tokens tend to remain in the ownership of the respective addresses once they have been
emitted, showing a low degree of circulation.
4 Data Collection
In the following, I describe which data sources have been used for the analysis and how
the relevant data is organised and structured within these sources. This helps us to
understand the nature of the data and to assess its credibility.
As the analysis of user groups and their behaviour patterns on the Ethereum blockchain
mainly relies on the actual on-chain data, it is essential to understand the data’s
organisational structure and how it is used in the following to generate insights on
an account level. Operating within the framework of a global virtual machine, the
Ethereum blockchain features several transaction execution and data storage mechanisms
that differ significantly from those of other blockchain networks. Interactions among
EOAs and between EOAs and smart contracts result in changes in the virtual machine’s
global state and thus in the properties of the corresponding addresses on the blockchain.
These transactions and their relevant information are organised in a specific data structure,
a Patricia tree that has been modified to incorporate properties of a Merkle tree.
Relevant details of transactions include, but are not limited to, the value, gas price and
transaction nonce. Each block header contains the root of the tree of transactions included
in that block. Consistent with the immutable nature of blockchains, completed transactions
within a block cannot be altered; together, they determine status details, such as the
number of successfully sent transactions and the Ether balance, of every address on the
blockchain. The addresses and their corresponding details are stored in a separate
structure, the state tree, which is also a modified Patricia tree, exists globally and is
updated with each completed transaction.
Furthermore, the specific properties of the modified Patricia tree allow efficient
references between the current state and previous states, rendering it unnecessary to
store the whole blockchain history in every block.
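To make the commitment idea concrete, the following Python sketch computes the root of a plain binary Merkle tree over a block's transactions and shows that altering a single transaction changes the root. It is a deliberate simplification and not Ethereum's modified Merkle Patricia trie: Ethereum hashes with Keccak-256 and keys transactions by their RLP-encoded index, whereas this sketch uses SHA-256 from the standard library and duplicates the last node on odd-sized levels, so its roots will not match the transactions root of a real block header. The helper names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 stands in for Ethereum's Keccak-256 in this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Root of a plain binary Merkle tree over serialised transactions."""
    if not transactions:
        return h(b"")
    level = [h(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


block_txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(block_txs)
tampered_root = merkle_root([b"tx-a", b"tx-X", b"tx-c"])
print(root != tampered_root)  # True: altering one transaction changes the committed root
```

Because the header stores only the root, any change to a stored transaction would be detectable by recomputing the root, which is the property the immutability argument above relies on.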
As the relation between the state tree and the transaction tree suggests, the Ethereum
blockchain uses an account-based ledger in which each unique address refers to a unique
account. This logic is fundamentally different from the transaction-based (UTXO) ledger of
the Bitcoin blockchain, whose state consists of unspent transaction outputs, each assigned
to the owner who is eligible to spend it. The direct assignment between
addresses and their balances on the Ethereum blockchain makes the data processing by
Google Big Query and the resulting analysis more straightforward from a user
perspective.
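The difference between the two ledger models can be illustrated with toy data structures. The sketch below is purely illustrative: the addresses, values and the `Output` type are hypothetical and do not reflect how any client actually stores state. In the account model a balance is a single lookup, while in the UTXO model it has to be assembled from all unspent outputs of an owner.

```python
from dataclasses import dataclass

# Account-based ledger (Ethereum-style): the state maps each address directly to a balance.
account_state = {"0xalice": 5, "0xbob": 2}


def account_balance(state: dict, address: str) -> int:
    return state.get(address, 0)


@dataclass(frozen=True)
class Output:
    owner: str
    value: int
    spent: bool = False


# UTXO ledger (Bitcoin-style): a balance is the sum of the owner's unspent outputs.
all_outputs = [
    Output("alice", 3),
    Output("alice", 2),
    Output("alice", 4, spent=True),  # already consumed by a later transaction
    Output("bob", 2),
]


def utxo_balance(outputs: list, owner: str) -> int:
    return sum(o.value for o in outputs if o.owner == owner and not o.spent)


print(account_balance(account_state, "0xalice"), utxo_balance(all_outputs, "alice"))  # 5 5
```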
Obtaining the complete blockchain data requires running a full node, which provides a
copy of the entire state of the Ethereum blockchain from which all the data can be inferred.
However, since processing the data of a full node into a meaningful, queryable structure
for further analysis is computationally demanding and non-trivial, we resort to using
Google Big Query’s online service2.
To fill the gap left by the lack of a blockchain data extraction tool capable of handling
complex and unconventional data structures, Google Big Query
implemented Python scripts3 for the extraction, transformation and loading of internal
blockchain data. The data includes information about blocks, transactions,
ERC20/ERC721 tokens and their transfers, receipts, logs, smart contracts and internal
transactions. More precisely, Google Big Query extracts the data from nodes in the cloud,
which fetch the data by running Parity, an open-source Ethereum client4. The data is then
stored in Google’s data warehouse BigQuery, which enables scalable analysis of large
amounts of data. Offering Online Analytical Processing capabilities, it provides a
regularly updated Ethereum dataset that can be queried using Standard SQL, BigQuery’s
dialect of the structured query language used to communicate with relational databases.
The Ethereum blockchain dataset is stored, together with various other publicly available
datasets, in the section for public datasets.
2 https://cloud.google.com/bigquery
3 https://github.com/blockchain-etl/ethereum-etl
Google BigQuery will be the primary data source for the following explorations and
analyses. Relevant data is primarily queried within the in-browser Google Cloud Console,
and the corresponding query results are exported as comma-separated values files before
they are processed further. In addition, the service allows query results to be saved
as datasets on Google Cloud. Querying these saved datasets shortens the computation
time of queries that would otherwise require many subqueries and nested
statements. We also use the option of uploading and saving our own datasets to enrich the
on-chain data with labelled data. This data includes labels and categories that we obtain
from off-chain data sources such as Etherscan5, an established block explorer and
blockchain analytics platform, and State of the Dapps6, a curated list of Dapps of different
blockchains. A more detailed explanation of the data handling and the data aggregation
process is provided in the following sections.
5 Data Pre-Processing
To enable more detailed results, the data needs to be obtained, transformed and
aggregated with off-chain data. In this section, the different pre-processing steps are
described. First, I define the data sources, on-chain and off-chain, and the specific data
tables included in the analysis. Then, I describe how new variables were generated by
combining the original variables with the off-chain data. An overview of the different
data processing steps can be found in Figure 1. All steps except for the last one have been
performed online on Google Big Query using SQL queries. A complete list of all relevant
queries in ANSI SQL can be found in Appendix C. Several openly accessible Python software
libraries, such as pandas7, NumPy8 and Matplotlib9, provided the tools for data
manipulation, analysis and visualisation.
4 https://github.com/openethereum/openethereum
5 https://etherscan.io/
6 https://www.stateofthedapps.com/
The blockchain data that Google Big Query obtains is de-normalised before it is stored
in the cloud. Its organisation in different queryable tables enables the exploration and
study of various research questions. As it focuses on users’ transactional behaviour
patterns on the Ethereum blockchain, this study includes only a few specific Google Big
Query tables.
As the data has been aggregated on an address level, tables that include the entirety of
addresses provide the foundation for the analysis. The “crypto_ethereum.balances” table
contains all addresses on the blockchain with their Ether balances. To filter out smart
contracts and, at the same time, examine transactions to smart contracts, I included the
table “crypto_ethereum.contracts”, which contains all smart contracts’ addresses with
additional information, such as the type of token contract and the bytecode. By further
incorporating the table “crypto_ethereum.transactions”, we are able to examine the level
of activity for both outgoing and incoming transactions. To account for token trading, a
central use case of smart contracts, the tables “crypto_ethereum.tokens” and
“crypto_ethereum.token_transfers” have been included to capture the token transfer
activity of all addresses on the blockchain.
7 https://pandas.pydata.org/
8 https://numpy.org/
9 https://matplotlib.org/
As mentioned earlier, Google Big Query’s capability of uploading and
processing custom datasets has been leveraged to enrich the query results with off-chain
labels. Although the Ethereum blockchain’s account-based ledger simplifies analysis on an
address level, the name or type of the entity behind an address (e.g. exchange, mining
pool or Dapp) cannot be inferred from the address itself, which is merely a 20-byte
identifier. Crypto exchanges that run on the Ethereum blockchain generally rely on a
number of interlinked smart contracts to offer specific services, such as
exchanging tokens for Ether and swapping different tokens, to users on the blockchain.
By solely examining the on-chain data, an external observer could not tell
whether a group of smart contracts makes up a decentralised application without
deploying more sophisticated analytical methods, such as network and graph analysis.
The leading online Ethereum block explorer Etherscan, which is openly accessible
through the browser, provides various functions, such as general blockchain data
visualisation and a search engine, to examine transaction properties or address activity in
detail. The platform offers the possibility to submit labels for EOAs and smart contracts.
Submissions are verified and, once approved, added to a label directory. Since
the exploration and analysis focus on identifying different user groups and their
behaviour, name tags for both EOAs and smart contracts reveal insights into users’
preferences regarding transaction recipients. Therefore, the Etherscan
label directory has been scraped to obtain name labels for addresses that belong to mining
pools, centralised exchanges and decentralised exchanges.
With Ether as the native cryptocurrency of the Ethereum blockchain and mining new
blocks as the only way of adding new Ether to the network, it is
reasonable to enrich the addresses extracted from Google Big Query with mining pool
labels. As the difficulty of the cryptographic puzzle of the Proof-of-Work
consensus mechanism has generally increased since the genesis block,
miners with conventional hardware started to combine their computational power by
forming mining pools (Zamyatin et al., 2017). By examining the
transactions between mining pool addresses and other EOAs, the analysis attempts to
shed light on the spending behaviour of mining pool participants, who are the first
spenders of newly mined Ether.
As the miner address of every block up to the present block can be determined
conclusively, the ratio of blocks mined by mining pools to the total
number of mined blocks on the Ethereum blockchain has been examined. Looking at all
miner addresses up to the 11,828,337th block, it can be inferred that a significant number
of blocks has been mined by the set of mining pools that are labelled on Etherscan.
In total, 72 addresses from 72 different mining pools have been extracted.
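The share of blocks mined by labelled pools reduces to a membership test on each block's miner address. A minimal pandas sketch, assuming a blocks table with a `miner` column modelled on the public `crypto_ethereum.blocks` table and a set of labelled pool addresses; the function name and the toy data are hypothetical.

```python
import pandas as pd


def pool_block_share(blocks: pd.DataFrame, pool_addresses: set) -> float:
    """Fraction of blocks whose miner address belongs to a labelled mining pool."""
    pools = {address.lower() for address in pool_addresses}
    # Addresses are compared in lower case because checksummed and lower-case
    # hex spellings of the same address would otherwise not match.
    return float(blocks["miner"].str.lower().isin(pools).mean())


blocks = pd.DataFrame({"number": [1, 2, 3, 4], "miner": ["0xAA", "0xbb", "0xaa", "0xcc"]})
print(pool_block_share(blocks, {"0xaa"}))  # 0.5
```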
Although it is not possible to precisely determine the ratio of labelled centralised and
decentralised exchanges to the entirety of exchanges on the Ethereum blockchain,
relating the number of incoming and outgoing transactions of labelled exchanges to
the total number of transactions that have taken place on the blockchain
suggests that they account for a significant fraction of the traffic. In total, 377 addresses
are labelled as exchanges.
Considering Dapps as one of the key use cases of the Ethereum blockchain, it appears
reasonable to incorporate off-chain data regarding Dapps (Wu, 2019). For this purpose,
State of the Dapps has been included in the off-chain data enrichment process. State of
the Dapps is an online database that offers a curated list of Dapps from different
blockchains. Besides a variety of metadata, such as author, current operation status, web
presence and a short description, State of the Dapps provides a list of associated smart
contract addresses for a limited number of Dapps. Furthermore, category labels are
available for every Dapp and its associated smart contracts, enabling Dapp usage
analyses on a categorical level. However, since the exact number of Dapps or smart
contracts belonging to Dapps cannot be determined from on-chain data, no inference on the
actual proportion of labelled addresses can be made. The scraped data from State of the
Dapps includes 3,813 smart contract addresses from 1,137 different Dapps divided into
18 categories.
All address labels obtained from Etherscan and State of the Dapps have been converted
into comma-separated values files and uploaded to Google Big Query to aggregate
on-chain and off-chain data with appropriate queries.
Different data aggregation and transformation steps have been conducted on both the on-
chain and off-chain data to create new descriptive transactional parameters.
First, features, such as transaction count and token transfer count, have been transformed
to construct normalised variables that describe the transactional activity pattern rather
than the level of activity itself. Subsequently, the on-chain data have been enriched by the
off-chain data from both Etherscan and State of the Dapps.
These constructed variables can be divided into two groups. The first group contains
variables related to each individual address, and the second group holds variables derived
from the address interactions. In the following, the construction of these variables is
described, and an overview of the relevant variables for the analysis is presented.
The initial data frame, which has been obtained by querying the Google Big Query tables
described in Section 5.1 and has not yet been subject to data aggregations and
transformations, is represented in Table 1. The data table “crypto_ethereum.balances”
contains all addresses on the blockchain together with their Ether balance denoted in
Wei10, the base unit of Ether. Since only EOAs are relevant for the analysis, all addresses
which represent smart contracts have been excluded from the data.
“crypto_ethereum.contracts”, which includes all smart contract addresses, has been used
for the matching process to examine transactions to smart contracts. To capture the active
period and the level of activity of all relevant addresses, the two tables
“crypto_ethereum.transactions” and “crypto_ethereum.token_transfers” have been
aggregated with the address list. The two tables contain all performed transactions and
token transfers, respectively. Each transaction’s timestamp represents the point in time
when the block that included that transaction was mined. Thus, the timestamps of the first
and the last outgoing transaction of every address are included to assess the activity
period. The numbers of outgoing and incoming transactions are obtained by counting the
occurrences of the corresponding address as sender for the outgoing count and as receiver
for the incoming count. As mentioned in Section 2.6, it is worth noting that transactions
and token transfers are not at the same level from an architectural perspective. One
transaction can lead to several token transfers (e.g., the transfer of several different
tokens), whereas one token transfer always represents the change of ownership of tokens
of a specific token contract. This is illustrated by comparing the total number of
transactions to smart contracts with the total number of token transfers, which exceeds
the former figure significantly.
10 Smallest denomination of Ether, 1 Ether = 1,000,000,000,000,000,000 Wei (10^18)
Address | Ether Balance | First Tx Sent | Last Tx Sent | Tx Sent | Tx Received | Tokens Sent | Tokens Received
… | … | … | … | … | … | … | …
Source: Own representation
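The per-address columns of Table 1 can be reproduced in pandas once the relevant tables are available as data frames. The sketch below assumes column names modelled on the public BigQuery tables (`address`, `from_address`, `to_address`, `hash`, `block_timestamp`); these names, the snake_case output columns and the toy data are assumptions for illustration, not the thesis's actual queries, which were written in SQL.

```python
import pandas as pd


def build_initial_frame(balances, contracts, transactions, token_transfers):
    """Assemble the Table 1 columns per EOA (column names are assumed, see lead-in)."""
    # Keep only EOAs: drop every address that also appears in the contracts table.
    eoas = balances[~balances["address"].isin(contracts["address"])]
    frame = eoas.set_index("address")

    sent = transactions.groupby("from_address").agg(
        first_tx_sent=("block_timestamp", "min"),
        last_tx_sent=("block_timestamp", "max"),
        tx_sent=("hash", "count"),
    )
    tx_received = transactions.groupby("to_address").size().rename("tx_received")
    tokens_sent = token_transfers.groupby("from_address").size().rename("tokens_sent")
    tokens_received = token_transfers.groupby("to_address").size().rename("tokens_received")

    # Left joins keep every EOA, including those that never appear in a given table.
    for part in (sent, tx_received, tokens_sent, tokens_received):
        frame = frame.join(part)
    counts = ["tx_sent", "tx_received", "tokens_sent", "tokens_received"]
    frame[counts] = frame[counts].fillna(0).astype(int)
    return frame


balances = pd.DataFrame({"address": ["a", "b", "c"], "eth_balance": [10, 0, 5]})
contracts = pd.DataFrame({"address": ["c"]})
transactions = pd.DataFrame({
    "hash": ["h1", "h2", "h3"],
    "from_address": ["a", "a", "b"],
    "to_address": ["b", "c", "a"],
    "block_timestamp": pd.to_datetime(["2021-01-01", "2021-01-05", "2021-01-03"]),
})
token_transfers = pd.DataFrame({"from_address": ["a"], "to_address": ["b"]})
frame = build_initial_frame(balances, contracts, transactions, token_transfers)
print(frame)
```

Address "c" disappears because it is a contract, while a transaction sent to it still counts towards the sender's Tx Sent.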
In addition to the initial variable set extracted directly from the Google Big Query data,
various derived variables were constructed by transforming the original variables. The
performed steps are listed in the following.
1. Out/In Token Ratio measures the proportion of outgoing token transfers (Out
Token) to the sum of outgoing and incoming token transfers (Out Token + In Token).
2. By combining the transactions and the contracts table, the variable SC Tx Sent
is formed, which indicates the number of outgoing transactions to smart
contracts. Correspondingly, SC Ratio specifies the proportion of smart contract
transactions to the total number of outgoing transactions (SC Tx Sent / Tx Sent).
4. As the contracts table contains information about the existence of an ERC20
interface for each contract address, one can infer the frequency of each user’s
token contract interactions. ERC20 Tx Sent represents the number of outgoing
transactions to ERC20 contracts. ERC20 Ratio specifies the proportion of ERC20
transactions to the total number of outgoing transactions (ERC20 Tx Sent / Tx Sent).
5. The Unique Receiver variable holds the number of distinct addresses to which an
address has sent transactions.
6. The Unique Sender variable holds the number of distinct addresses from which
an address has received transactions. These two variables reveal
whether a user interacts with a large or only a limited number of other
addresses.
7. To include temporal features in the dataset, we define Active Days as the time
difference in days between the first and the last outgoing transaction
of each address.
8. To evaluate whether an address has been active recently, the variable Idle Time is
calculated as the difference between the last outgoing transaction of
an address and the data extraction date.
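The derived variables listed above can be sketched in pandas as follows. Column names such as `to_address`, `from_address` and the boolean `is_erc20` flag are modelled on the public BigQuery tables and, like the snake_case feature names and toy data, are assumptions; masking zero denominators to NaN is a design choice of this sketch, not something the thesis specifies.

```python
import pandas as pd

DAY = pd.Timedelta(days=1)


def interaction_counts(transactions, contracts):
    """SC Tx Sent, ERC20 Tx Sent and Unique Receiver per sender; Unique Sender per receiver."""
    sc_addresses = set(contracts["address"])
    erc20_addresses = set(contracts.loc[contracts["is_erc20"], "address"])
    flagged = transactions.assign(
        to_sc=transactions["to_address"].isin(sc_addresses),
        to_erc20=transactions["to_address"].isin(erc20_addresses),
    )
    by_sender = flagged.groupby("from_address")
    per_sender = pd.DataFrame({
        "sc_tx_sent": by_sender["to_sc"].sum(),
        "erc20_tx_sent": by_sender["to_erc20"].sum(),
        "unique_receiver": by_sender["to_address"].nunique(),
    })
    unique_sender = transactions.groupby("to_address")["from_address"].nunique()
    return per_sender, unique_sender.rename("unique_sender")


def ratio_features(frame, extraction_date):
    """Out/In Token Ratio, SC Ratio, ERC20 Ratio, Active Days and Idle Time."""
    out = frame.copy()
    token_total = out["tokens_sent"] + out["tokens_received"]
    # Zero denominators are masked to NaN so the ratio is undefined rather than inf.
    out["out_in_token_ratio"] = out["tokens_sent"] / token_total.where(token_total > 0)
    tx_sent = out["tx_sent"].where(out["tx_sent"] > 0)
    out["sc_ratio"] = out["sc_tx_sent"] / tx_sent
    out["erc20_ratio"] = out["erc20_tx_sent"] / tx_sent
    out["active_days"] = (out["last_tx_sent"] - out["first_tx_sent"]) / DAY
    out["idle_time"] = (extraction_date - out["last_tx_sent"]) / DAY
    return out


transactions = pd.DataFrame({"from_address": ["a", "a", "a", "b"], "to_address": ["c", "d", "b", "a"]})
contracts = pd.DataFrame({"address": ["c", "d"], "is_erc20": [True, False]})
per_sender, unique_sender = interaction_counts(transactions, contracts)

frame = pd.DataFrame({
    "tokens_sent": [1, 0], "tokens_received": [3, 0],
    "tx_sent": [4, 2], "sc_tx_sent": [2, 0], "erc20_tx_sent": [1, 0],
    "first_tx_sent": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "last_tx_sent": pd.to_datetime(["2021-01-03", "2021-02-01"]),
}, index=["a", "b"])
features = ratio_features(frame, pd.Timestamp("2021-02-10"))
```

Dividing a timedelta column by a one-day `Timedelta` yields Active Days and Idle Time as floating-point day counts.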
After introducing the new variables derived from the original variable set, the Google Big
Query data and the off-chain data from Etherscan need to be combined and aggregated to
obtain deeper insight into the specific entities behind the senders and recipients of the
examined transactions.
9. To assess the involvement in the mining process and thus in the block reward
system, the list of mining pool addresses has been joined with the transactional
data of each address. By identifying the transactions with mining pools as senders,
one can make inferences about the degree of mining pool participation of each
address. The number of transactions received from mining pools is represented by
the variable Mining Pool Tx Received, which is equal to the number of block
reward payouts for pools with direct payout schemes. Furthermore, counting the
distinct mining pool addresses yields the number of unique mining pools in
which a user participated (Unique Mining Pools). For proportional payout schemes,
the paid-out block rewards are proportional to the computational power contributed
to the pool. Thus, the level of mining pool contribution is measured by the Total
Amount of Ether from Mining Pools and the Average Received Block Reward
(Zamyatin et al., 2017). The Mining Ratio is the ratio between Mining Pool Tx
Received and Tx Received and represents the share of received transactions that
constitute Ether reward payouts.
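The mining variables of item 9 can be sketched as a filter on pool senders followed by a group-by on the receiving address. The sketch assumes a transactions frame with `from_address`, `to_address` and a `value` column in Wei, modelled on the public BigQuery table; the feature names and toy data are hypothetical. Computing the Average Received Block Reward as total Ether from pools divided by the number of payouts is an assumption of this sketch. Real Wei values can exceed the int64 range, so in practice the conversion to Ether would better happen in SQL or with arbitrary-precision types.

```python
import pandas as pd

WEI_PER_ETHER = 10**18


def mining_features(transactions, pool_addresses):
    """Mining variables per receiving address; `value` is assumed to be denominated in Wei."""
    payouts = transactions[transactions["from_address"].isin(pool_addresses)]
    by_receiver = payouts.groupby("to_address")
    features = pd.DataFrame({
        "mining_pool_tx_received": by_receiver.size(),
        "unique_mining_pools": by_receiver["from_address"].nunique(),
        "total_ether_from_pools": by_receiver["value"].sum() / WEI_PER_ETHER,
    })
    # Assumption: average reward = total Ether received from pools / number of payouts.
    features["avg_received_block_reward"] = (
        features["total_ether_from_pools"] / features["mining_pool_tx_received"]
    )
    tx_received = transactions.groupby("to_address").size()
    features["mining_ratio"] = features["mining_pool_tx_received"] / tx_received.reindex(features.index)
    return features


transactions = pd.DataFrame({
    "from_address": ["pool1", "pool1", "pool2", "x"],
    "to_address": ["m", "m", "m", "m"],
    "value": [1 * WEI_PER_ETHER, 2 * WEI_PER_ETHER, 3 * WEI_PER_ETHER, 5],
})
features = mining_features(transactions, {"pool1", "pool2"})
print(features)
```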
6 Results
This section presents the findings of the analysis on the data whose pre-processing steps
have been described in the previous section. The number of transactions an address
has sent and received serves as the primary variable to classify the relevant user groups
and analyse their usage patterns. First, the overall user structure on the Ethereum
blockchain is characterised along features such as activity level and activity time.
Then, the blockchain’s statistical and structural properties regarding user activity are
studied to arrive at a representative address sample. A subgroup of addresses is defined
by employing a clustering algorithm that clusters the EOAs by their number of outgoing
transactions. The selection of an appropriate subgroup aims at removing outliers, such as
low-activity addresses and highly active non-human addresses (e.g. exchanges and
wallets). Next, the distribution of different parameters, such as SC Ratio and Zero Tx
Ratio, is analysed. The examination of ERC20 token contract and Dapp interactions
completes the first part of the results section.
After examining the overall transaction behaviour of EOAs, the mining pool participants,
or miners, are analysed in Section 6.2. The relevance of mining pools on the Ethereum
blockchain is illustrated by determining their share of the total block rewards mined
up to the data extraction date. By assessing the outgoing transactions from
mining pools, all miners are identified. Subsequently, I investigate different parameters,
such as Average Received Block Reward and Unique Mining Pools, to analyse the mining
behaviour. The same transactional parameters that examine the interaction with other
addresses in Section 6.1.3 (e.g. SC Ratio) are calculated for mining pool participants as
well to pinpoint differences in transactional behaviour patterns. Lastly, I examine all
addresses that receive Ether transfers from miners to analyse how miners distribute their
Ether in the network.
The results enable us to gain more specific insights into the existing user groups and
understand which use cases and features of the Ethereum blockchain are the driving
activities for each user group.
In this first part of the results section, the entirety of EOAs is explored. This part gives a
general overview of the allocation of addresses on the blockchain and serves as a starting
point for further user group identification analyses.
Table 2 summarises the address structure of the Ethereum blockchain on the 10th of
February 2021. The total number of unique addresses queryable on Google Big Query on
that date amounts to more than 152 million, around three-quarters of which are EOAs.
By identifying the EOAs that do not occur as senders in the transaction table, the
number of inactive addresses with no outgoing transactions can be determined. More than
21% of EOAs, around 24 million, are inactive. This high number
of inactive addresses can largely be attributed to an atypical event in the second year after
the inception of Ethereum. No fewer than 19 million empty addresses were created in
autumn 2016 during a Denial-of-Service attack, which exploited a security flaw of the
Ethereum client Geth (Bok Consulting Pty Ltd, 2016). The attack created a large number
of smart contracts, which in turn created numerous new EOAs. The analysis of the attack
pointed out one exemplary smart contract11 that received 4,750 transactions and facilitated
the address creation. As the inactive EOAs created by this contract are still queryable on
Google Big Query, it can be assumed that these empty addresses make up a substantial
proportion of the more than 24 million inactive accounts found.
11 https://etherscan.io/address/0x6a0a0fc761c612c340a0e98d33b37a75e5268472
Table 2: Address Structure
OG Tx = 0 24,356,227 21.1 %
Considering the more than 900 million transactions that had been sent by the 10th of
February 2021, it is crucial to further examine how they are distributed over the remaining
90 million active EOAs. Figure 2 illustrates the empirical cumulative distribution of the
number of outgoing transactions for all EOAs with at least one outgoing transaction. More
than 47% of the active addresses have sent only one transaction, and 93.47% of all active
addresses have sent no more than ten transactions. The heavy skew towards addresses
that made very few transactions suggests either that most users interact with the
blockchain very infrequently or that owners use multiple addresses, each of which they
use only once.
Figure 2: ECDF of Outgoing Transaction Count
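The percentages read off Figure 2 are values of the empirical cumulative distribution at specific counts. A minimal NumPy sketch, assuming the outgoing transaction counts of all active EOAs are available as an array; the helper names and the toy counts are hypothetical.

```python
import numpy as np


def ecdf(values):
    """Points of the empirical CDF: sorted values and cumulative fractions i/n."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def share_at_most(values, k):
    """Fraction of addresses whose outgoing transaction count is at most k."""
    return float(np.mean(np.asarray(values) <= k))


tx_sent = [1, 1, 1, 2, 5, 10, 11, 300]  # toy counts, one per active EOA
print(share_at_most(tx_sent, 1), share_at_most(tx_sent, 10))  # 0.375 0.75
```

For plotting, `x` and `y` can be passed to Matplotlib's `plt.step(x, y, where="post")`, typically with a logarithmic x-axis given the heavy tail.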
To put the distribution of the outgoing transaction count in perspective with the address
lifetime, the cumulative distribution function for the Active Days of addresses with at least
one outgoing transaction has been plotted in Figure 3. As all transactions up to the
11,828,337th block, which was mined on the 10th of February 2021, were considered, the
addresses with the longest lifetime have been active for 2014 days. This is
consistent with the time difference between the genesis block’s timestamp and the
timestamp of the 11,828,337th block (2022 days). A clear majority of 64.62% of
EOAs have been active for one day or less. This group represents addresses that have sent
only one transaction or multiple transactions within less than 24 hours. These addresses
will be called one-day addresses in the following. Furthermore, only 5.51% of all addresses
have been active over a period of 365 days or more.
Figure 3: ECDF of Active Days
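The 2022-day upper bound quoted above can be checked with date arithmetic, and the lifetime shares follow directly from the Active Days values. The sketch assumes 30 July 2015, the mainnet launch date, as the genesis reference; the helper name and the toy values are hypothetical.

```python
from datetime import date

import numpy as np

# Assumption: 30 July 2015 (mainnet launch) is used as the genesis reference date.
launch = date(2015, 7, 30)
extraction = date(2021, 2, 10)  # date of block 11,828,337 according to the text
print((extraction - launch).days)  # 2022


def lifetime_shares(active_days):
    """Share of one-day addresses (Active Days <= 1) and of addresses active for >= 365 days."""
    days = np.asarray(active_days, dtype=float)
    return float(np.mean(days <= 1)), float(np.mean(days >= 365))


print(lifetime_shares([0, 0.5, 1, 2, 400]))  # (0.6, 0.2)
```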
An examination of the Idle Time gives insights into the number of addresses that were last
active within a specific period before the extraction date. As the majority of addresses have
only sent one transaction or have been active for one day or less, the idle time of these
addresses corresponds to the time elapsed since their creation or first occurrence on the
blockchain. To distinguish this user group from the remaining population, I therefore
present the distribution of idle times separately for addresses that have been active for one
day or less and addresses that have been active for two days or more in Figure 4. The
histogram bins the idle time in 30-day intervals from the data extraction date back to the
inception day of Ethereum. When the daily Ether price is plotted against the idle time, a
strong correlation with the idle time of one-day addresses can be observed visually. The
lighter bars, which represent the idle time of one-day addresses, show a sharp increase in
occurrences between the 30th of December 2017 and the 30th of January 2018, the highest
count of any 30-day bin. During this period, almost five million one-day addresses made
their first transaction and have been inactive ever since. This coincidence suggests that
significant surges in popularity may attract short-term Ethereum users (Sovbetov, 2018).
The values for addresses that have been active for two days or more, by contrast, show a
less volatile increase towards the data extraction date throughout the years. This suggests
a steady increase in the number of users that have been active within the same period (bin
size: 30 days), with the highest number for the most recent period before the extraction
date. Around three million addresses that have been active for two or more days have sent
their last transaction within 30 days of the extraction date.
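The binning behind Figure 4 can be sketched with `np.histogram` over fixed 30-day edges, split by the one-day criterion (Active Days ≤ 1). The frame layout with `idle_time` and `active_days` columns in days is an assumption, and the Ether price overlay is omitted because it requires external price data.

```python
import numpy as np
import pandas as pd


def idle_time_histogram(frame, bin_days=30):
    """Counts of addresses per idle-time bin, split into one-day and multi-day addresses."""
    n_bins = int(frame["idle_time"].max() // bin_days) + 1
    edges = np.arange(n_bins + 1) * bin_days  # 0, 30, 60, ... days before extraction
    one_day = frame["active_days"] <= 1
    one_day_counts, _ = np.histogram(frame.loc[one_day, "idle_time"], bins=edges)
    multi_day_counts, _ = np.histogram(frame.loc[~one_day, "idle_time"], bins=edges)
    return pd.DataFrame({
        "bin_start_days": edges[:-1],
        "one_day": one_day_counts,
        "multi_day": multi_day_counts,
    })


frame = pd.DataFrame({
    "idle_time": [5, 10, 40, 45, 70],
    "active_days": [0.5, 3, 0, 10, 1],
})
print(idle_time_histogram(frame))
```

Using explicit edges rather than a bin count keeps every bin exactly 30 days wide, matching the bin size stated in the text.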
After analysing the distribution of the number of outgoing transactions, active days, and
idle times of all addresses on the blockchain, the statistical properties are used to derive
a sample of EOAs that represents regular users. The inclusion of addresses that were last
active a considerable time before the extraction date potentially enables examining a
larger variety of user groups and changes in the behaviour patterns since the genesis
block. However, as different applications and features of Ethereum, such as the ERC20
token standard, decentralised apps and mining pools, were not initially present on the
blockchain and only gained popularity or were introduced separately over the years after
the inception, limiting the addresses to be analysed to a specific timeframe avoids
distortions of transactional behaviour patterns. As can be seen from the idle time plot in
Figure 4, a significant and persistent increase in the Ether price and in the number of
simultaneously active users has dominated the network dynamics since 2020, with the
highest monthly increase from two months to one month before the extraction date. Thus,
we limit the following analysis to addresses that sent their last transaction within 30 days
of the extraction date (Idle Time ≤ 30), excluding inactive addresses. Assuming that the
development of transactional behaviour patterns requires a certain amount of usage time,
one-day addresses (Active Days ≤ 1) are excluded from the sample to be analysed.
After examining the number of outgoing transactions for the remaining addresses, the
large range of 31,098,717 transactions implies the existence of significant outliers that
have sent an anomalously high number of transactions. Querying the five addresses12 with
the highest transaction counts, we observed that all of them belong to a mining pool or an
exchange and thus not to a human user. To arrive at a set of regular users, both accounts
with low activity and accounts with transaction counts at the high end of the distribution
must be removed in a principled way. A clustering algorithm yields breaks by which the
data can be divided. As the transaction count distribution exhibits properties of a
heavy-tailed distribution, with much higher frequencies for low values and lower
frequencies for high values, the head/tail breaks algorithm should be preferred over the
more commonly used Jenks natural breaks classification method13 for clustering
one-dimensional data (Jiang, 2013). As Jiang (2013) describes, the algorithm splits the
data at its arithmetic mean into a head (the values above the mean) and a tail, and keeps
partitioning the head in this way until the newly partitioned values no longer exhibit a
heavy-tailed distribution, thereby clustering the data without requiring external
information about the number of breaks or clusters.
Both the cluster sizes and the intervals preserve the heavy-tailed nature of the data.
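A compact head/tail breaks implementation is sketched below. The thesis does not state its stopping rule, so this sketch assumes the commonly used operationalisation that splitting continues while the head holds less than 40% of the values (`head_limit`); the function names and toy counts are hypothetical. Classes are assigned with `np.digitize`, so class 0 holds values up to the first mean, class 1 the values between the first and second mean, and so on.

```python
import numpy as np


def head_tail_breaks(values, head_limit=0.4):
    """Head/tail breaks: split at the mean and recurse into the head while it stays a minority."""
    data = np.asarray(values, dtype=float)
    breaks = []
    while data.size > 1:
        mean = float(data.mean())
        breaks.append(mean)
        head = data[data > mean]
        # Stop when the head is no longer a clear minority (assumed limit: 40%)
        # or when it is too small to be split again.
        if head.size <= 1 or head.size / data.size >= head_limit:
            break
        data = head
    return breaks


def assign_classes(values, breaks):
    """Class 0: values <= breaks[0]; class i: breaks[i-1] < value <= breaks[i]."""
    return np.digitize(values, breaks, right=True)


tx_sent = [1] * 50 + [2] * 20 + [5] * 10 + [20] * 5 + [100] * 2 + [1000]
breaks = head_tail_breaks(tx_sent)
print(breaks)
print(assign_classes([1, 20, 100, 1000], breaks))  # [0 1 1 2]
```

Under this class numbering, the regular users described in the text would correspond to the addresses assigned to class 1, i.e. the second cluster.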
The algorithm divides the roughly three million addresses that were last active within
30 days before the extraction date and were active for more than one day into nine
clusters. Figure 5 shows the histogram of the nine clusters with a logarithmic y-axis. The
clustering results suggest that the addresses in the first cluster, with
transaction counts between 2 and 123, are low-activity addresses. Forming the overwhelming
majority, they account for more than 94% of the addresses under consideration.
Furthermore, addresses in the third and all higher clusters can be regarded as
abnormally active addresses, such as addresses belonging to an exchange or a
mining pool. Thus, the set of EOAs representing regular users (the second cluster) contains around 166,000
12
Ethermine: https://etherscan.io/address/0xea674fdde714fd979de3edf0f56aa9716b898ec8,
Nanopool: https://etherscan.io/address/0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5,
Bittrex: https://etherscan.io/address/0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98,
F2Pool: https://etherscan.io/address/0x829bd824b016326a401d083b33d092293333a830,
Binance: https://etherscan.io/address/0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be
13
Clustering algorithm that minimises the deviation of the members of each class from the class mean and
maximises the deviation from the means of the other classes.
addresses with outgoing transaction counts between 124 and 1,894. Hereinafter, these
addresses are referred to as regular users.
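The three selection criteria that lead to the regular users can be expressed as a single filter. This minimal sketch assumes that Idle Time and Active Days are measured in days as used above and that the cluster boundaries 124 and 1,894 are inclusive. The `Account` record and its field names are hypothetical stand-ins for the extracted per-address variables.

```python
from dataclasses import dataclass


@dataclass
class Account:
    address: str
    idle_time: int    # days between last outgoing tx and extraction date (assumed definition)
    active_days: int  # days between first and last outgoing tx (assumed definition)
    tx_sent: int      # number of outgoing transactions


def is_regular_user(acc, max_idle=30, low=124, high=1894):
    """Recently active, active for more than one day, and an outgoing
    transaction count inside the second head/tail cluster."""
    return acc.idle_time <= max_idle and acc.active_days > 1 and low <= acc.tx_sent <= high


accounts = [
    Account("0xaaa", idle_time=3, active_days=400, tx_sent=250),    # regular user
    Account("0xbbb", idle_time=3, active_days=1, tx_sent=250),      # one-day address
    Account("0xccc", idle_time=90, active_days=400, tx_sent=250),   # not recently active
    Account("0xddd", idle_time=0, active_days=900, tx_sent=50000),  # pool/exchange-like
]
print([a.address for a in accounts if is_regular_user(a)])  # ['0xaaa']
```

Keeping the bounds as parameters makes it easy to check how sensitive later results are to the exact cluster boundaries.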
To evaluate the clustering process and the assumptions that led to the set of regular users,
the Active Days distribution of regular users is compared with that of all addresses
before clustering. Both distributions are plotted against each other in Figure 6.
Since both datasets contain recently active addresses (Idle Time ≤ 30), subtracting
Active Days from the extraction date yields approximately the creation date. The
distribution for the addresses before clustering shows a clear heavy-tailed
shape, which is consistent with the empirical cumulative distribution function for the
entirety of addresses with at least one outgoing transaction in Figure 3. In contrast,
the distribution for the regular users deviates from that of a
heavy-tailed distribution. It suggests that in periods of high volatility (January 2018 and
January to February 2021), more new users became long-term users, i.e. addresses that
have remained active since then, than in periods of low volatility.
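Reading Active Days as an approximate creation date can be sketched as follows. The extraction date is a hypothetical placeholder (the text only states February 2021), and counting per calendar month is just one possible way to look for the volatility-related spikes mentioned above.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical extraction date: the study only states February 2021
EXTRACTION_DATE = date(2021, 2, 28)


def approx_creation_date(active_days, extraction=EXTRACTION_DATE):
    """For recently active addresses (Idle Time <= 30) the last activity is close to
    the extraction date, so the first activity is roughly extraction - Active Days."""
    return extraction - timedelta(days=active_days)


def creations_per_month(active_days_values):
    """Count approximate creation dates per (year, month)."""
    return Counter(
        (d.year, d.month) for d in map(approx_creation_date, active_days_values)
    )


print(approx_creation_date(28))  # 2021-01-31
print(creations_per_month([28, 40, 1130]))
```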
Figure 6: Active Days before and after Head/Tail Breaks Clustering
Having defined the set of addresses to be studied (regular users), we employ the transaction- and
token-centred variables from Section 5.3 to analyse their general behaviour
patterns. To examine the characteristics of the transactions sent by regular
users and of the recipients they interact with, the SC Ratio, Zero Tx Ratio, ERC20 Tx
Sent and Out/In Token Ratio are calculated. Figure 7 displays the distribution of the SC
Ratio of regular users, where zero means that an address has interacted exclusively with
other EOAs and one means that it has sent all of its transactions to smart contracts. The
number of bins was set to 100 to capture more detail. As the histogram
indicates, most regular users sent more transactions to smart contracts than to other EOAs.
The mean SC Ratio is 0.76, and the median is 0.92. However, at both the low end
and the high end of the distribution, a deviation from the rest of the plot can be seen.
More than 14,000 of the 166,000 regular users never interact with smart contracts or do so in
less than 1% of their outgoing transactions. On the other hand, more than 16,000 users
sent transactions exclusively to smart contracts. These findings are not consistent with
the quantitative analysis of the Ethereum blockchain by Anoaica and Levard (2/26/2018 - 2/28/2018),
published in 2018. Their analysis of the transactions from the genesis block
until August 2017 found a transaction pattern skewed towards user-to-user transactions:
user-to-smart-contract transactions made up only 34.3% of all
transactions in the examined timeframe. Our findings
may indicate a general shift towards more smart-contract-oriented
transaction behaviour. It should be noted that restricting the analysis to regular
users might also affect this figure, because the most active accounts, mining pools
and exchanges, have been removed. These addresses send the majority
of their transactions to EOAs.
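The SC Ratio and the two tail groups discussed above can be computed as follows. This sketch assumes that the SC Ratio is the number of outgoing transactions sent to smart contracts divided by all outgoing transactions, which matches how it is read above (the formal definition is in Section 5.3). Contract detection is abstracted as a set of known contract addresses that would have to be derived from the extracted data; the addresses shown are hypothetical.

```python
from statistics import mean, median


def sc_ratio(recipients, contract_addresses):
    """Share of an address's outgoing transactions whose recipient is a smart contract."""
    if not recipients:
        raise ValueError("address has no outgoing transactions")
    return sum(r in contract_addresses for r in recipients) / len(recipients)


def summarise_sc_ratios(ratios):
    return {
        "mean": mean(ratios),
        "median": median(ratios),
        "below_1_percent": sum(r < 0.01 for r in ratios),  # (almost) never contracts
        "only_contracts": sum(r == 1.0 for r in ratios),   # exclusively contracts
    }


contracts = {"0xc1", "0xc2"}  # hypothetical contract addresses
print(sc_ratio(["0xc1", "0xe1", "0xc2", "0xc1"], contracts))  # 0.75
print(summarise_sc_ratios([0.0, 0.005, 1.0, 1.0]))
```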
Transactions do not necessarily carry an Ether value. Transactions sent to smart
contracts to invoke their functions often do not send any Ether to the smart contract. An
interaction with an ERC20 token contract in which no Ether is transferred is an example of
a zero-value transaction. Invoking a token contract's functions to change the ownership
of tokens or to set an allowance does not require the transfer of
Ether; only the gas fee for processing the transaction must be paid in Ether. Figure
8 shows the distribution of the proportion of zero-value transactions in the total number
of outgoing transactions (Zero Tx Ratio). The visual impression of this distribution is
similar to that of the SC Ratio distribution, with clear outliers at both
the minimum and the maximum x-value of the histogram. Approximately 20,000 of the
166,000 regular users have transferred Ether in all of their transactions, and around 12,500
addresses did not transfer Ether in more than 99% of their transactions. The mean and
the median of the Zero Tx Ratio are 0.63 and 0.70, respectively. To test the hypothesis
of a higher zero-value transaction proportion for smart contract transactions, the Zero Tx
Ratio is plotted for all outgoing transactions to smart contracts in Figure 9. The
distribution differs markedly, with a noticeably higher Zero Tx Ratio for
transactions to smart contracts. The mean and the median are 0.79 and 0.84,
respectively. Approximately 20% of all regular users (around 35,000) with at least one
outgoing transaction to a smart contract never transferred Ether to a smart contract. Both
analyses indicate that most of the transactions sent by regular users do not transfer any
Ether to the recipient and that zero-value transactions predominate among transactions to smart contracts.
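The two Zero Tx Ratio variants compared above (all outgoing transactions versus only those sent to smart contracts) differ only in a filter step. In this minimal sketch the `Tx` record and the contract set are hypothetical, and the ratio is left undefined (`None`) when an address has no matching transactions.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    to: str
    value_wei: int  # Ether value carried by the transaction (gas is paid separately)


def zero_tx_ratio(txs, contract_addresses=None):
    """Share of outgoing transactions that carry no Ether. If contract_addresses is
    given, only transactions sent to those addresses are considered."""
    if contract_addresses is not None:
        txs = [t for t in txs if t.to in contract_addresses]
    if not txs:
        return None  # undefined without any (matching) transactions
    return sum(t.value_wei == 0 for t in txs) / len(txs)


sent = [Tx("0xc1", 0), Tx("0xc1", 0), Tx("0xe1", 10**18), Tx("0xc2", 5 * 10**17)]
print(zero_tx_ratio(sent))                    # 0.5
print(zero_tx_ratio(sent, {"0xc1", "0xc2"}))  # 2 of 3 contract transactions
```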
6.1.4 ERC20 Contract Interaction and Token Transfer Dynamics
To further analyse which smart contracts regular users use, we examine the
number of transactions sent to ERC20 contracts. ERC20 token contracts constitute an essential
subgroup of smart contracts on the Ethereum blockchain, as ERC20 was the most common
token standard at the time of data extraction. As mentioned earlier, for an EOA to invoke
token-related functions, including the transfer function, it must send a transaction
to the corresponding ERC20 contract. The relationship between transactions to ERC20
contracts and actual token transfers is examined in the following. Figure 10 shows the
histogram of the number of transactions sent to ERC20 contracts (ERC20 Tx Sent). The
number of outgoing transactions of regular users ranges from 124 to 1,894 (as defined
during the clustering process in Section 6.1.2), with a mean and median of 365 and 245
transactions, respectively. However, the mean and median for transactions to ERC20
contracts are only 63 and 14. Furthermore, transactions to ERC20 contracts amount to
only 18% of all transactions and 25% of transactions to smart contracts (ERC20 Ratio).
Figure 11 depicts the number of incoming and outgoing token transfers of regular
users. With 248 incoming and 184 outgoing token transfers on average, the studied
address set exhibits a higher in-degree, with approximately 34.8% more incoming
than outgoing transfers. As tokens can also be distributed to arbitrary recipients during an
airdrop event, an address might accumulate tokens over time that its owner did not intend to
receive in the first place (Victor, 2020). These tokens might partly account for the higher
token in-degree (the mean and median of the Out/In Token Ratio are 0.44 and 0.45). However,
the relatively large difference between incoming and outgoing token transfers might also be
due to a high proportion of token investors who prefer holding tokens to selling them.
Moreover, the large difference between the number of ERC20 transactions (ERC20 Tx
Sent) and the actual token transfers (incoming and outgoing) suggests that a significant
proportion of token transfer activity is not caused by direct interactions with ERC20
contracts. Decentralised exchanges, for example, offer the possibility of trading different
tokens and Ether with other users on the blockchain. Tokens can also be purchased with fiat
money from centralised exchanges, such as Coinbase and Binance.
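The 34.8% figure follows directly from the two averages, (248 − 184) / 184 ≈ 0.348. The sketch below also shows, on hypothetical toy numbers, that the mean of per-address Out/In ratios need not equal the ratio of the average transfer counts, which is one reason the reported mean Out/In Token Ratio (0.44) can differ from 184/248 ≈ 0.74. It assumes the ratio is computed per address and then averaged.

```python
from statistics import mean

# Group averages reported above
avg_in, avg_out = 248, 184
excess_in = (avg_in - avg_out) / avg_out  # about 0.348, i.e. ~34.8% more incoming transfers


def out_in_ratio(outgoing, incoming):
    """Per-address Out/In Token Ratio; undefined for addresses without incoming transfers."""
    return outgoing / incoming if incoming else None


# Hypothetical (outgoing, incoming) token transfer counts for two addresses
toy = [(1, 10), (500, 600)]
mean_of_ratios = mean(out_in_ratio(o, i) for o, i in toy)           # ~0.47
ratio_of_means = mean(o for o, _ in toy) / mean(i for _, i in toy)  # ~0.82
print(round(excess_in, 3), round(mean_of_ratios, 2), round(ratio_of_means, 2))  # 0.348 0.47 0.82
```

Large addresses dominate the averages, whereas every address carries equal weight in the mean of per-address ratios.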
Figure 10: Number of ERC20 Contract Transactions
Figure 11: Comparison of Incoming and Outgoing Token Transfers of Regular Users
Having analysed how regular users interact with smart contracts and ERC20
contracts, we now give an overview of Dapp usage. As the Dapp data from State of
the Dapps provide category labels for each Dapp, the activity levels of Dapps and Dapp
categories can be compared with each other. We augment the Dapp data with the Etherscan labels
for centralised and decentralised exchanges. As the Dapp list already contains an Exchanges
category, addresses that appear in both sources are removed from the analysis. However, only
a negligible number of overlapping addresses had to be removed, because the Etherscan
exchange list contains major exchanges, such as Binance and Bittrex, that act as brokers
and offer off-chain access. In contrast, the Exchanges category of the Dapp
list mainly contains services and applications for on-chain users.
Table 3 shows the activity level of the ten most relevant Dapp categories, ranked by the number of
transactions they received from regular users. More than 50% of all regular users
have sent at least one transaction to an exchange Dapp. The second most popular Dapp
category by user coverage is the finance category, as 43% of the regular users have
used at least one finance Dapp. Gaming Dapps have the highest number of transactions
per user: their users send on average more than 40 transactions to such Dapps.
The high transaction count relative to the lower number of users might suggest that
Gaming Dapps require higher engagement and more transactions, or that
users of those apps generally interact more frequently with the blockchain.
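The per-category figures in Table 3 (transaction count, users, User/Regular User Ratio and Transaction/User) can be derived with a simple aggregation. In this sketch the address-to-category mapping stands in for the State of the Dapps labels, and the input format and addresses are assumptions for illustration.

```python
from collections import defaultdict


def category_activity(txs, category_of, n_regular_users):
    """Aggregate Table 3-style figures per Dapp category.

    txs: iterable of (sender, recipient) pairs sent by regular users.
    category_of: mapping from Dapp contract address to category label.
    """
    tx_count = defaultdict(int)
    users = defaultdict(set)
    for sender, recipient in txs:
        category = category_of.get(recipient)
        if category is None:
            continue  # recipient is not a labelled Dapp address
        tx_count[category] += 1
        users[category].add(sender)
    rows = []
    for category, count in tx_count.items():
        n_users = len(users[category])
        rows.append({
            "category": category,
            "transactions": count,
            "users": n_users,
            "user_ratio": n_users / n_regular_users,  # User/Regular User Ratio
            "tx_per_user": count / n_users,           # Transaction/User
        })
    # Rank categories by incoming transaction count, as in Table 3
    return sorted(rows, key=lambda r: r["transactions"], reverse=True)


labels = {"0xd1": "Exchanges", "0xd2": "Games"}  # hypothetical address labels
sent = [("a", "0xd1"), ("b", "0xd1"), ("a", "0xd2"), ("a", "0xd2"), ("a", "0xd2"), ("c", "0xff")]
for row in category_activity(sent, labels, n_regular_users=4):
    print(row)
```

On the toy data the "Games" category ranks first with three transactions from a single user, mirroring the pattern of few but very active gaming users described above.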
CoinGathernator14 is the most widespread Exchange Dapp, used by more than 26%
(44,329) of all regular users. The most widespread Finance Dapp is MakerDAO15,
used by 33% (55,230) of all regular users. The Gaming Dapp with the highest reach, measured
as the proportion of regular users who have sent a transaction to one of its addresses, is
BRAVE FRONTIER HEROES16. More than 7% (11,818) of the regular users have
engaged with this Dapp.
Table 4 presents the activity level of centralised and decentralised exchanges.
Transactions to decentralised exchanges account for almost 20% of regular users’
outgoing transactions (11,894,999 of 60,931,081 outgoing transactions in total). In
comparison, transactions to centralised exchanges amount to only 2.77
million, or 4.5% of all outgoing transactions. 61% of all regular users have
interacted with decentralised exchanges, but only 10% have sent a transaction to a
centralised exchange. Among the most used exchanges of both categories,
Uniswap17 is clearly the most popular decentralised exchange, accounting for more
than 90% of all transactions sent to decentralised exchanges. However, the user
distribution of centralised exchanges differs significantly from that of decentralised ones.
As the most popular centralised exchange, Binance, covers only around 2% of all regular
users, or 21% of all users of centralised exchanges, it is evident that these users are more
14
https://www.stateofthedapps.com/dapps/coingathernator
15
https://www.stateofthedapps.com/dapps/makerdao
16
https://www.stateofthedapps.com/dapps/brave-frontier-heroes
17
https://uniswap.org/
evenly distributed over several different exchanges. The top five centralised exchanges,
Binance, Huobi, ZB.com, Yobit.net and Poloniex, are used by just over 56% of
centralised exchange users. The sum of the distinct users of each centralised exchange is roughly
equal to the number of distinct users of the whole category. This indicates that users rarely use
several exchanges simultaneously or switch between exchanges, as the sum over the individual
exchanges would otherwise be higher than the number of distinct users in the category.
Under the assumption that on-chain interactions largely reflect actual activity, the findings
suggest both a higher degree of competition among centralised exchanges and
a low rate of users moving to other exchanges.
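The argument rests on a counting identity: the sum of per-exchange distinct users minus the number of distinct users in the whole category equals the number of additional exchange memberships of users who used more than one exchange. If both numbers are roughly equal, few addresses used several exchanges. The input format and user identifiers in this sketch are hypothetical.

```python
def exchange_user_overlap(users_by_exchange):
    """users_by_exchange: exchange name -> set of regular-user addresses that sent it
    at least one transaction. Returns (sum of per-exchange distinct users,
    distinct users of the whole category)."""
    per_exchange_sum = sum(len(users) for users in users_by_exchange.values())
    category_users = set().union(*users_by_exchange.values())
    return per_exchange_sum, len(category_users)


# Hypothetical data: u2 uses two exchanges, so the per-exchange sum exceeds the category count
print(exchange_user_overlap({"A": {"u1", "u2"}, "B": {"u2", "u3"}}))  # (4, 3)
```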
18
Binance incoming transactions:
https://etherscan.io/txs?a=0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be&f=3
Table 3: Activity Level of Top 10 Dapp
Top 10 Dapp categories by incoming transaction count from regular users (State of the Dapps labels)
Category | Transaction count | User | User/Regular User Ratio | Transaction/User

Exchanges | Transaction count | User | User/Regular User Ratio | Transaction/User
Having analysed the general transactional dynamics and behaviour patterns of regular
users, a subset of EOAs selected by transaction activity, we now conduct a more in-depth
analysis of a functional subgroup of users. This section provides an
overview of findings on the characterisation and transaction
behaviour of mining pool participants. Furthermore, I investigate patterns in the outgoing
transactions of mining pool participants to uncover their Ether spending behaviour. The results
are then put into context to derive implications for the decentralisation of
the Ethereum blockchain and for the role of mining pools and their participants in the
network.
6.2.1 Mining Pool Structure
Mining pools are entities that aggregate the computational power of different participants
and take part in the consensus mechanism of the Ethereum blockchain with the
combined hashing power. Participation in a mining pool is motivated by the
block reward, which in most cases is shared among all participants whenever the pool
successfully mines a block. This provides a more stable revenue, as the probability of successfully
mining new blocks with the aggregated computational power is much higher than that of finding new
blocks as a single miner (Zamyatin et al., 9/20/2017 - 9/22/2017). The most common
reward payout scheme transfers the Ether from the block reward directly to an EOA
representing a mining pool participant (hereinafter referred to as a miner). This
direct on-chain payout scheme is used by major mining pools, such as Ethermine19,
Nanopool20 and F2Pool21. Hence, miners can be identified by tracking the outgoing
transactions of these mining pools.
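The stability argument can be illustrated with a simplified binomial model, which is an assumption: each block is won independently with probability equal to the hash-rate share, uncles are ignored, and roughly 6,400 blocks per day are assumed. Under proportional payouts, the expected income of a miner is the same solo or in a pool, but its variability differs enormously. The hash-rate shares below are hypothetical.

```python
from math import sqrt

BLOCKS_PER_DAY = 6400  # assumption: roughly one block every 13.5 s on Ethereum PoW


def p_block_per_day(share, blocks=BLOCKS_PER_DAY):
    """Probability of finding at least one block per day with a given hash-rate share."""
    return 1 - (1 - share) ** blocks


def daily_income_cv(share, blocks=BLOCKS_PER_DAY):
    """Coefficient of variation of the number of blocks found per day
    (binomial model: each block is won independently with probability `share`)."""
    return sqrt((1 - share) / (blocks * share))


solo_share, pool_share = 1e-6, 0.2  # hypothetical hash-rate shares
print(p_block_per_day(solo_share))  # ~0.0064: a solo miner rarely finds a block
print(daily_income_cv(solo_share))  # ~12.5: extremely volatile income
# Under proportional payouts a member's income is the pool's income times a constant
# (own share / pool share), so it inherits the pool's much lower coefficient of variation.
print(daily_income_cv(pool_share))  # ~0.025
```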
19
https://ethermine.org/
20
https://nanopool.org/
21
https://www.f2pool.com/
Table 5: Top 5 of 72 Mining Pools by Blocks Mined
Reward Share represents the proportion of Ether received as block or uncle rewards in the total amount of mined Ether (including Ether mined by miners not in the labelled list) at the time of data extraction (42,602,285.34 ETH); Block Share represents the proportion of blocks mined in the total number of blocks mined (including blocks mined by miners not in the labelled list) at the time of data extraction (11,828,337 blocks).
… … … … … … … …
As mentioned earlier, identifying the miners of a mining pool is trivial in the
majority of cases. Four of the five top mining pools use a direct payout scheme,
and miners can be tracked by examining the recipients of outgoing Ether transfers.
However, miners at mining pools that use proxy addresses for payouts cannot be directly
inferred. Spark Pool22, for example, transfers the Ether reward to a proxy address, which
then initiates Ether transfers to the miners. To differentiate between miner
addresses and proxy addresses, the structure of individual payouts is examined on Etherscan
for every mining pool. Mining pools that employ the most common proportional
payout scheme transfer the mined Ether after every successfully found block, with the amount
each miner receives determined by their share of the contributed computational power.
Under these assumptions, the direct proportional payout scheme leads to a
high number of regular payouts for every mined block: the number of payouts per found
block equals the number of participants who contributed to finding that
block. By examining the structure of outgoing transactions of Ethermine23, it is evident
22
https://www.sparkpool.com/
23
https://etherscan.io/txs?a=0xea674fdde714fd979de3edf0f56aa9716b898ec8&f=2
that this mining pool uses a direct payout scheme, as the time difference between most
consecutive payouts is within one second and most payouts transfer less than one Ether.
Mining pools that pay out to proxy addresses, by contrast, should exhibit a different
pattern in the amount of Ether transferred and the number of unique recipient addresses.
Spark Pool24 transfers amounts in the two- to three-digit Ether range to a single
proxy address at intervals of tens of minutes. This proxy
address then forwards the Ether in a structure similar to that of the Ethermine address. It
should be noted that Spark Pool switched from a direct payout pattern to a proxy
pattern at some point in the past.
To identify the proxy addresses among the addresses that received Ether from mining
pools, a list of all Ether payouts was constructed, containing the corresponding mining pool,
the recipient address, the transferred amount and the timestamp of each payout.
In the following, it is assumed that mining pools that follow the proxy pattern use only one
distributing proxy address. A two-step heuristic is then applied to every outgoing
payout sent by mining pool addresses to identify proxy addresses:
It is assumed that miners at pools with direct payout patterns are rarely paid more
than once consecutively, because every contributing miner is paid in each round.
24
https://etherscan.io/address/0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c
However, as proxy addresses receive the mined Ether first, they should be the
predominant first-degree receivers and thus have a high consecutive occurrence
count.
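The two-step heuristic can be sketched in Python, assuming one payout list per pool. "Consecutive occurrence count" is interpreted here as the number of times a recipient is paid directly after a payout to itself. The `min_share` threshold that turns this count into a decision is a hypothetical choice; the study does not state the exact cut-off. Following the assumption above, at most one proxy is returned per pool.

```python
from collections import Counter


def consecutive_counts(recipients):
    """For each recipient, count payouts that directly follow a payout to the same recipient."""
    counts = Counter()
    for previous, current in zip(recipients, recipients[1:]):
        if previous == current:
            counts[current] += 1
    return counts


def detect_proxy(payouts, min_share=0.5):
    """Return the presumed proxy address of ONE mining pool, or None.

    payouts: list of (timestamp, recipient) pairs for the pool's outgoing transfers.
    min_share: hypothetical threshold on the share of consecutive payout pairs
    that must be repeats of the same recipient.
    """
    recipients = [r for _, r in sorted(payouts)]  # chronological order
    if len(recipients) < 2:
        return None
    counts = consecutive_counts(recipients)
    if not counts:
        return None  # nobody is paid twice in a row: direct payout pattern
    candidate, repeats = counts.most_common(1)[0]
    if repeats / (len(recipients) - 1) >= min_share:
        return candidate  # one predominant first-degree receiver: proxy pattern
    return None


direct_pool = [(1, "m1"), (2, "m2"), (3, "m3"), (4, "m1"), (5, "m2")]
proxy_pool = [(1, "p"), (2, "p"), (3, "p"), (4, "p"), (5, "x")]
print(detect_proxy(direct_pool), detect_proxy(proxy_pool))  # None p
```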
According to this heuristic, six of the 72 mining pools use a proxy address. The
proxy addresses are added to the mining pool addresses and removed from the recipients
of payouts to account for the differing payout patterns. The miners are identified by
querying the recipients of Ether transfers initiated by mining pool addresses. The query
returned a total of 1,342,602 unique recipient addresses (miners) that
received Ether from mining pool addresses or their proxy addresses before the data
extraction date. However, to limit the analysis to recently and sufficiently active mining
pool participants, only the miners within the set of regular users (defined in Section 6.1.2)
are examined in the remainder of this section.
Of the 166,000 regular users, 18,600 have received a transaction from a mining
pool or its proxy address and are considered miners. Table 6 presents descriptive
figures on the Ether that miners received and the number of mining pools they
have used. The incoming transactions show that most miners use
their addresses solely to receive mining pool payouts. This is
illustrated by the high mean of the ratio of incoming mining pool payouts to the total
25
Centralised or decentralised exchanges that are defined by the labels obtained from Etherscan
number of incoming transactions (Mining Pool Tx Received/Tx Received). Half of the
miners receive at least 99% of their incoming transactions from mining pools as reward payouts.
Assuming that the owner of a miner address generally also engages with other
functionalities of the Ethereum blockchain besides mining, such as smart contracts and Dapps,
this finding might suggest a tendency to obfuscate transaction
activities by using additional addresses for other purposes. This hypothesis is
examined further in subsections 6.2.4 and 6.2.5, where the structure of miners’ outgoing
transactions is analysed.
Although 72 labelled mining pools have been obtained from Etherscan, more than 50%
of the miners use only two or fewer mining pools (the median of Unique Mining Pools is 2).
The mean and median of Unique Senders are 11 and 4, respectively. This finding seems
to indicate a lack of motivation for miners to switch to other mining pools, which might
be due to entry barriers to joining new mining pools, such as software requirements or
registration. Zamyatin et al. (9/20/2017 - 9/22/2017) describe a Pay-Per-Last-N-Shares
payout scheme, which aims at preventing pool hopping by paying out rewards only
after a defined amount of contribution has been submitted. However, on-chain transaction
data alone does not reveal whether a mining pool uses such a scheme.
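To illustrate the idea behind Pay-Per-Last-N-Shares, the following sketch implements its commonly described form: each block reward is split in proportion to the shares each miner contributed among the last N shares submitted before the block was found. This is a simplified illustration, not the exact variant discussed by Zamyatin et al. or any particular pool's implementation. The share log, N and the reward value are hypothetical.

```python
from collections import Counter


def pplns_payouts(share_log, block_reward, n):
    """Split block_reward among miners in proportion to their shares in the
    last n submitted shares (share_log is chronological, one miner id per share)."""
    window = share_log[-n:]
    if not window:
        return {}
    counts = Counter(window)
    return {miner: block_reward * c / len(window) for miner, c in counts.items()}


# Hypothetical log: loyal miner "a" and a pool hopper "h" who joins just before a block
log = ["a"] * 8 + ["h"]
print(pplns_payouts(log, block_reward=2.0, n=4))  # {'a': 1.5, 'h': 0.5}
```

Each share inside the window earns the same amount, so a miner's payout depends only on how many of its shares fall into the last-N window, not on when in a round it joined.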
The average number of payouts that a miner receives is 361 (Mining Pool Tx Received),
and half of the miners receive 480 (median) or fewer payouts from mining pools.
This indicates a higher in-degree for miners than for non-mining users, as the mean and
median numbers of incoming transactions for non-miners are only 78 and 27,
respectively. Figure 12 shows the distribution of incoming payouts for miners plotted
against the incoming transactions of non-mining users. Unlike the
incoming transactions of non-miners, the incoming payouts of miners do not appear to follow a
heavy-tailed distribution. Although a small head, which might represent short-term
miners, exists up to x = 90, the distribution (yellow plot) seems to have a local maximum
at around x = 180. This suggests that most miners participate to at least a certain
extent. The tail of the distribution indicates even higher activity for a proportion of the
miners.
Since most reward payouts are proportional to the contributed computational power, as
mentioned earlier, miners are encouraged to maximise the time they spend participating
in mining pools (Zamyatin et al., 9/20/2017 - 9/22/2017). This is further underlined by
the acquisition cost of the required mining hardware, whose computational power often
exceeds that of ordinary computers. The amortisation period of each miner’s mining
hardware should vary with the hardware and thus with the miner’s computational
power.
To gain further insights into the internal structure of the payouts, the histogram of the payout
value distribution in Ether is examined (Average Received Block Reward). 75% of all
miners receive an average payout of 0.28 Ether or less. Under the assumption
that Ether rewards are proportional to the corresponding computational power, these
figures suggest that computational power is distributed relatively evenly among
miners in mining pools. Figure 14 shows the histogram of the average payout value per
address from zero to one Ether, divided into 100 bins. Although the distribution
shows that most miners are concentrated around the median of 0.13 Ether, two distinct
spikes can be observed in the range of 0.0 to 0.3 Ether. More than 2,300 miners
receive an average payout between 0.05 and 0.06 Ether. The second peak occurs between
0.20 and 0.21 Ether, with more than 800 miners. These spikes likely represent
accumulations of miners that use similar hardware and thus contribute similar
computational power to the mining pools.
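The histogram behind these peaks can be sketched as follows: average each miner's payouts, bin the averages into 100 equal-width bins over [0, 1) Ether, and read off the most populated bins. The payout lists are hypothetical, and averages of one Ether or more are simply ignored here, which matches the plotted range but is an assumption about how the figure was cut.

```python
from statistics import mean


def average_payouts(payouts_by_miner):
    """Average payout value in Ether per miner address (miners without payouts are skipped)."""
    return {m: mean(v) for m, v in payouts_by_miner.items() if v}


def histogram(values, bins=100, lo=0.0, hi=1.0):
    """Count values in equal-width bins over [lo, hi); values outside are ignored."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def peak_bins(counts, k=2, lo=0.0, hi=1.0):
    """Lower edges (in Ether) of the k most populated bins."""
    width = (hi - lo) / len(counts)
    top = sorted(range(len(counts)), key=counts.__getitem__, reverse=True)[:k]
    return sorted(round(lo + i * width, 2) for i in top)


avg = average_payouts({"m1": [0.05, 0.06], "m2": [0.055], "m3": [0.2, 0.21]})
counts = histogram(avg.values())
print(peak_bins(counts, k=1))  # [0.05]
```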
Standard Deviation 405.73 11.01 1.13 28%
Figure 13: Active Days of Miners and Non-Miners
As the previous analyses of mining pool participants’ mining behaviour
suggest, their activity on the blockchain, judging by the predominance of incoming reward
payouts, is primarily defined by contributing to the block finding process. Furthermore,
the significant role of mining pools has been shown: the majority of all mined Ether was
generated by them. To further investigate miners’ transactional behaviour and, most importantly,
whether and how miners put their mined Ether into circulation, the analyses previously applied
to the variables on smart contract usage, ERC20 token transfers and
transaction value are repeated for mining pool participants. As with
the analysis of regular users, the study of miners is mainly guided by the examination
of histograms to uncover patterns in the outgoing transactions. To highlight
miners’ distinct behaviour, their plots are compared with those of non-miners, the
subset of regular users who do not mine. Table
7 compares the means and medians of both groups and reports the p-values of
Welch’s t-tests for several variables. The means of all variables except one
differ significantly between the groups.
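Welch's t-test compares two means without assuming equal variances, which suits groups of very different size and spread such as miners and non-miners. The following minimal sketch computes the statistic and the Welch-Satterthwaite degrees of freedom with Python's standard library on toy data. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value; which tool the study used is not stated.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    se_a = variance(a) / len(a)  # squared standard error of mean(a)
    se_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # hypothetical samples
print(round(t, 3), round(df, 2))  # -1.897 5.88
```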
Figure 15 shows the histogram of the SC Ratio for miners and non-miners with
overlapping bars. Consistent with the results in Section 6.1, non-miners send their transactions
primarily to smart contracts (mean = 0.83, median = 0.93). Miners, however, exhibit the
opposite behaviour: the majority of miners address their outgoing transactions to other
EOAs (mean = 0.24, median = 0.00). These findings support the earlier
conjecture that miners do not use the same reward collection address for interacting with
other blockchain functionalities but first forward their rewarded Ether to other EOAs. As
a consequence of the low SC Ratio of miners, the vast majority of their transactions
are Ether transfers. Figure 16 illustrates the relatively low ratio of zero-value
transactions (Zero Tx Ratio) of miners (mean = 0.10, median = 0.00) compared with
non-miners (mean = 0.69, median = 0.74).
These results suggest that miner addresses act as proxy addresses
that collect mining rewards from mining pools and forward them to other addresses. To
determine whether the outgoing transaction activity of miners is spread
among many recipients or concentrated on a few major recipients, the distribution of the
number of different recipients (Unique Receiver) is plotted in Figure 17 for the range of one
to 200. While the non-miner plot shows a relatively even distribution, with 50% of
the addresses sending transactions to 46 or fewer different recipients,
the yellow plot for the miner group reveals a markedly different pattern.
More than 50% of all miners send their transactions, and thus their Ether, to
fewer than four different addresses. Despite the apparent difference between the two plots,
a Welch’s t-test shows no significant difference in their means. This
result is due to the similar spread of the number of unique recipients, which in both groups
is limited by the same maximum number of outgoing transactions.
Although the majority of miners send their transactions to very few addresses, a sizeable
proportion (25%) of the miners send transactions to more than 64 different recipients,
indicating more complex behaviour. A manual analysis of the outgoing transactions
of the five miners26 with the highest number of unique recipients reveals a transaction
pattern similar to that of the mining pool proxy addresses described in Section 6.2.2. These
miners seem to distribute or forward the rewarded Ether to a large number of different EOAs.
Furthermore, the amount of Ether transferred per transaction lies in the same range as
observed for the mining pool proxy addresses. These addresses might therefore be proxy
addresses of mining pools that were not recognised by the heuristic applied in
Section 6.2.2.
Our findings on miners’ transaction behaviour indicate that mining pool
participants differ strongly from non-miner addresses in their smart contract usage,
the recipients they transfer Ether to and their token transfer activity. A large majority of
these reward-collecting addresses forward their Ether to other EOAs and thus do not
interact with smart contracts or ERC20 token contracts. This result raises questions about
how diverse the recipients of those Ether transfers are and how the mined Ether is
subsequently used.
Variable | SC Ratio | Zero Tx Ratio | Unique Recipients | Token Transfers Incoming | Out/In Ratio
Welch’s t-test (p-value) | < 0.01 | < 0.01 | > 0.05 | < 0.01 | < 0.01
26
Miner addresses: 0x3598ac8dffa22994f0b18087035647c7a09fc389,
0xe8c8e1291ed1949da844b38ff1860a6a4c96e459,
0x57efa1377ef6c0363327ee6cffe46926e5fb682b,
0x1670f6ceaea120fe4e72c9edba3883ae7941401c,
0x7f01921af6dbcf81086c5cf88aa7c93f665278
Figure 15: Smart Contract Ratio Miners vs Non-Miners
6.2.5 Distribution of Mined Ether
As it has been shown that Ether rewarded by mining pools accounts for more than
three-quarters of the mined Ether supply on the Ethereum blockchain, insights into
the structure of the receiving addresses help to understand the distribution, and hence the
centralisation, of mined Ether across the remaining network. Therefore, all addresses
(hereinafter referred to as secondary receivers) that receive Ether transfers from miners
are identified. Afterwards, the Ether distribution among these secondary receivers is
analysed to reveal specific Ether accumulation patterns.
To further examine the proportion of received Ether rewards that is actually forwarded
to other addresses, the following analysis compares the sum of received Ether with
the sum of sent Ether. By calculating the total value of Ether transferred from all labelled
mining pools and their proxy addresses, the sum of received Ether can be determined for
the 18,617 miners within the set of regular users (Total Amount of Ether from
Mining Pools). A total of 5,169,348 Ether had been transferred to these miners by the extraction date.
In contrast, 12,421,736 Ether had been transferred from these miners to other addresses.
This substantial difference between the incoming and outgoing amounts of Ether
clearly implies that the Ether transferred from the labelled mining pools accounts
for only part of the outgoing Ether. Thus, it can be inferred that miners
also receive Ether from unlabelled mining pools, exchanges or other EOAs. Table 8
shows the deviation of the sum of outgoing Ether from the sum of incoming Ether for
different groups of miners, sorted by the proportion of incoming mining pool transactions
in the total number of incoming transactions. The Deviation row indicates the ratio of the
difference between incoming and outgoing Ether to the incoming Ether. It can be observed that
miners who use their address to receive transactions from entities other than mining pools
send more Ether than they have received as mining rewards. Although this group of miners
represents a relatively small fraction of miners, it accounts for a disproportionately high
amount of outgoing Ether. Addresses used exclusively for participating in mining pools
(Incoming Mining Pool Transactions / All Incoming Transactions = 1.00 in Table 8,
column 4) send approximately as much Ether as they receive, as the deviation is close to
0%.
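The grouping and the deviation measure used in Table 8 can be sketched as follows. The group boundaries (0.88, 0.99 and 1.00) are taken from the Table 8 header; the sign convention of the deviation (positive when more Ether is sent than received from pools) is an assumption, as the text only describes it as a ratio of the difference to the incoming Ether.

```python
def mining_ratio_group(ratio):
    """Table 8 column (1-4) for a miner's Mining Pool Tx Received / Tx Received."""
    if ratio < 0.88:
        return 1
    if ratio < 0.99:
        return 2
    if ratio < 1.0:
        return 3
    return 4


def deviation(incoming_eth, outgoing_eth):
    """Relative deviation of outgoing from incoming Ether.
    Assumed sign convention: positive = more Ether sent than received from pools."""
    return (outgoing_eth - incoming_eth) / incoming_eth


# Aggregate figures for all 18,617 regular-user miners reported above
print(round(deviation(5_169_348, 12_421_736), 3))  # 1.403, i.e. ~140% more sent than received
print([mining_ratio_group(r) for r in (0.5, 0.9, 0.995, 1.0)])  # [1, 2, 3, 4]
```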
Table 8: Comparison of Incoming and Outgoing Ether for Miners
Comparison of different groups of miners divided by the proportion of incoming mining pool transactions (Mining Pool Tx Received) to the total number of received transactions (Tx Received).
Column | 1 | 2 | 3 | 4 | 5
Mining Pool Tx Received / Tx Received | ≥ 0.00 & < 0.88 (~25%) | ≥ 0.88 & < 0.99 (~50%) | ≥ 0.99 & < 1.00 (~75%) | = 1.00 (~100%) | All Miners (0.00 – 1.00)
While it can be assumed that the received Ether rewards are to some extent included in the outgoing transactions of all miners, it cannot be determined whether a particular received Ether reward has been used in a specific outgoing transaction. Therefore, only the secondary receivers that receive Ether from miners with a Mining Ratio of 0.99 or higher are considered (Table 8, columns 3 and 4), as it can be assumed that most of these miners' outgoing Ether originates from incoming mining pool reward payouts. This group of 9,185 miners constitutes 49.3% of all miners considered and received 38.6% of all mining rewards. The total Ether they sent to secondary receivers amounts to 2,063,150. To analyse the structure of secondary receivers, I used a query to extract all outgoing transactions and their recipients for the selected subgroup of miners. The 9,185 miners considered transferred Ether in 2,904,728 transactions to 432,603 different secondary receivers.
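The selection step described above can be sketched in Python as follows, assuming that per-miner Mining Ratios have already been computed and that outgoing Ether transfers are available as simple (sender, receiver, ether) records. The `Transfer` type and the `secondary_receiver_summary` helper are hypothetical names used only for illustration, not part of the original analysis.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    sender: str
    receiver: str
    ether: float


def secondary_receiver_summary(transfers, mining_ratio, threshold=0.99):
    """Summarise outgoing Ether of miners whose Mining Ratio is >= threshold."""
    selected = {addr for addr, ratio in mining_ratio.items() if ratio >= threshold}
    relevant = [t for t in transfers if t.sender in selected]
    return {
        "miners": len(selected),
        "transactions": len(relevant),
        "secondary_receivers": len({t.receiver for t in relevant}),
        "total_ether": sum(t.ether for t in relevant),
    }


# Toy data: two of the three miners pass the 0.99 threshold.
ratios = {"0xm1": 1.0, "0xm2": 0.995, "0xm3": 0.5}
transfers = [
    Transfer("0xm1", "0xcex", 10.0),
    Transfer("0xm1", "0xa", 1.0),
    Transfer("0xm2", "0xcex", 4.0),
    Transfer("0xm3", "0xb", 7.0),
]
print(secondary_receiver_summary(transfers, ratios))
# {'miners': 2, 'transactions': 3, 'secondary_receivers': 2, 'total_ether': 15.0}
```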
Although the miners appear to distribute the rewarded Ether across a large number of secondary receivers, a view of the most prominent addresses by Ether sent (miners) and received (secondary receivers) gives a different impression. Table 9 provides the number of top miners by Ether sent and top secondary receivers by Ether received, together with their share of the total Ether sent by all miners considered. The top 80 miners by total Ether sent account for more than 50% of the 2,063,150 Ether transferred to secondary receivers. In contrast, only the top 12 secondary receivers accumulate 50% of the incoming Ether. Table 11 shows these top secondary receivers that together receive more than 50% of all sent Ether. Six of them are exchanges. In total, more than 35% of the Ether is sent to centralised exchanges. The remaining proportion is primarily distributed to other EOAs (Table 10).
The values in the column Miners represent the number of top miners by Ether sent to secondary receivers that account for the corresponding proportion of the total amount of Ether sent by the miners considered (2,063,150 Ether). The values in the column Secondary Receivers represent the number of top receivers by Ether received from miners that account for the corresponding proportion of the total amount of Ether sent by the miners considered (2,063,150 Ether).
Proportion of Total Ether Sent | Miners | Secondary Receivers
25% | 5 | 5
50% | 80 | 12
75% | 777 | 47
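The counts in the table above follow from ranking addresses by amount and accumulating until a target share of the total is reached. A minimal Python sketch of that calculation, using a hypothetical helper name and toy amounts rather than the thesis data, is:

```python
def top_k_for_share(amounts, share):
    """Smallest number of top addresses whose combined amount reaches `share` of the total."""
    ranked = sorted(amounts, reverse=True)
    total = sum(ranked)
    running = 0
    for k, amount in enumerate(ranked, start=1):
        running += amount
        if running >= share * total:
            return k
    return len(ranked)  # only reached for an empty input


# Toy Ether amounts received by six hypothetical secondary receivers.
toy_received = [5, 10, 50, 5, 20, 10]
print([top_k_for_share(toy_received, s) for s in (0.25, 0.5, 0.75)])  # [1, 1, 3]
```

The same function would be applied once to the Ether sent per miner and once to the Ether received per secondary receiver to fill the two columns.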
Table 11: Top 12 Secondary Receivers by Total Ether Received
7 Conclusion
This section concludes the paper by highlighting the key findings of the data analysis and discussing to what extent the obtained results answer the research questions. It then discusses the limitations of this study and outlines potential directions for future research.
This paper represents one of the first attempts at identifying and describing transactional user behaviour patterns on the Ethereum blockchain. As prior quantitative analyses of on-chain data have mostly focused on general transaction activity at the network level, this study was designed to examine the usage of different functionalities and applications, such as smart contracts, decentralised applications and mining pools, from a user perspective. The analysis was conducted on on-chain data extracted from the cloud service Google Big Query and covered the period between July 2015 and February 2021. Furthermore, labelled data for decentralised applications, mining pools and exchanges were used to augment the on-chain data. After aggregating and transforming the data, new variables were derived to form the data foundation for the subsequent analyses. To characterise the users and their behaviour patterns on the blockchain, I examined the data using summary statistics and graphical representations. These investigations focused on the distribution of different variables over the examined set of users. The users were described along three dimensions: transactional activity level, smart contract and application interaction, and mining process engagement.
In order to analyse the existence of different user groups, a granular approach was taken by examining the activity level of all EOAs on the blockchain. Regarding the number of outgoing transactions, the results show that more than 47% of all EOAs have sent only one transaction, and more than 93% have sent fewer than ten transactions. Furthermore, almost two-thirds of all accounts have been active for less than one day. The activity level follows a heavy-tailed distribution with many low-activity addresses and few high-activity addresses. Additional analyses of the historical price chart of Ether in USD revealed a significant correlation between the price and the number of low-activity addresses created, possibly indicating a relationship between price volatility and the participation of new users. A subset of regular users, whose transaction behaviour is analysed in more detail, was identified by excluding low-activity addresses as well as high-activity addresses belonging to entities such as wallets and exchanges.
To further examine the usage patterns of applications, such as smart contracts and exchanges, the structure of outgoing transactions was considered. More than 90% of all regular users, who have made between 124 and 1,894 outgoing transactions, have sent a transaction to at least one smart contract. Over 50% of all regular users have sent at least 90% of their transactions to smart contracts. This high proportion of smart contract interactions is accompanied by a high ratio of zero-value transactions, i.e. transactions that transfer no Ether, indicating that Ethereum is not used solely for currency transfers. More than 60% of regular users use decentralised exchanges to trade cryptocurrencies. The analysis of Dapp interactions revealed that users engage with various applications, albeit with a tendency towards exchange, finance and gaming Dapps.
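A per-user view of the smart contract share and the zero-value share can be sketched as follows. The input format (tuples of sender, receiver and value in wei) and the set of known contract addresses are assumptions; here the zero-value share is computed over all outgoing transactions, which may differ from how the ratio was scoped in the original queries.

```python
from collections import defaultdict


def usage_ratios(transactions, contract_addresses):
    """Per-sender share of transactions to smart contracts and of zero-value transactions.

    transactions: iterable of (from_address, to_address, value_wei) tuples
    contract_addresses: set of addresses known to be smart contracts
    """
    sent = defaultdict(int)
    to_contract = defaultdict(int)
    zero_value = defaultdict(int)
    for sender, receiver, value_wei in transactions:
        sent[sender] += 1
        if receiver in contract_addresses:
            to_contract[sender] += 1
        if value_wei == 0:
            zero_value[sender] += 1
    return {
        addr: {"sc_ratio": to_contract[addr] / n, "zero_tx_ratio": zero_value[addr] / n}
        for addr, n in sent.items()
    }


txs = [("0xu1", "0xc1", 0), ("0xu1", "0xc1", 0), ("0xu1", "0xe1", 10**18), ("0xu2", "0xe1", 5)]
print(usage_ratios(txs, {"0xc1"})["0xu2"])  # {'sc_ratio': 0.0, 'zero_tx_ratio': 0.0}
```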
Besides using smart contracts and other functionalities of Ethereum, users also engage in the blockchain’s mining process to a significant extent. By identifying the most active block miners, it has been shown that mining pools account for more than three-fourths of the mined blocks and that more than 10% of regular users participate in these pools. These mining pool participants have been active for a significantly longer period than other users, which is also reflected in their higher number of outgoing transactions. Regarding their transactional behaviour, miners primarily send Ether to other EOAs and to centralised exchanges. Of the Ether sent by participants that use their addresses primarily for mining, 75% goes to only 47 secondary receivers, a small fraction of the 9,185 sending miners. Centralised exchanges receive around one-third of this Ether, possibly to exchange it for fiat currencies or other cryptocurrencies. Although most of the forwarded Ether is concentrated at a few exchanges and other EOAs, the remaining Ether is distributed over almost fifty times as many addresses as there are senders (9,185 senders to 410,000 receivers). These secondary receivers exhibit low-activity behaviour. These findings underline the differences in transaction behaviour and address usage between miners and non-miners.
7.2 Research Limitations
Like most exploratory data analysis studies, this study presents results and interpretations whose validity is limited in some respects. Since analysing all user groups since the inception date would require considering the entirety of EOAs on the blockchain and thus more complex analyses, certain thresholds were defined to limit the study to a subset of users. These thresholds were determined by the head/tail breaks clustering algorithm, which accounts for the underlying heavy-tailed distribution. The subsequent exploratory investigations were based on the assumptions that end-users of the blockchain are sufficiently well represented by the defined subset of users and that end-users primarily use a single address to interact with the blockchain. While this might be valid for some users, the analysis of mining pool participants has revealed a more complex address usage pattern. The exclusion of certain groups of EOAs, such as low-activity accounts, can lead to an incomplete picture of the user base and thus to distorted results if users with multiple addresses account for a substantial proportion of users.
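For readers unfamiliar with head/tail breaks (Jiang, 2013), the following Python sketch shows the basic iteration: the data are split at the mean, the values above the mean (the head) are kept, and the process repeats while the head remains a minority. The 40% head limit is a commonly used choice and an assumption here; the thesis does not state which limit produced its thresholds, and the toy sample is not thesis data.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive means used as class breaks for heavy-tailed data."""
    breaks = []
    data = list(values)
    while data:
        mean = sum(data) / len(data)
        breaks.append(mean)
        head = [v for v in data if v > mean]
        # Stop once the head is no longer a minority or has at most one value left.
        if len(head) <= 1 or len(head) / len(data) >= head_limit:
            break
        data = head
    return breaks


# Toy heavy-tailed sample: many low-activity and few high-activity addresses.
print(head_tail_breaks([1] * 90 + [10] * 9 + [100]))  # [2.8, 19.0]
```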
The labelled data for decentralised applications, exchanges and mining pools were obtained from online directories and block explorers that heavily depend on input from third parties. Therefore, some unlabelled yet relevant addresses were not included in the analysis. A Dapp developer might, for example, decide not to list their Dapp on State of the Dapps or to omit specific addresses associated with the Dapp on its page. This would lead to an underestimation of Dapp usage in general or of the usage of specific Dapps. As centralised and decentralised exchanges often use a complex system of intertwined smart contracts and EOAs, the limited set of labelled exchange addresses might not cover all relevant addresses.
The identification of mining pool participants relied on the assumption that mining pools primarily follow a direct round-based pay-per-share payout scheme. Following this reasoning, the analysis identified the miners by examining all outgoing transactions from mining pools. To account for proxy addresses used to pay out miners, a heuristic was introduced. Although several proxy addresses have been identified, their inclusion in the subsequent analyses is not validated against ground truth. Lastly, with regard to mining activity, no distinction has been made between regular users and potential institutional miners with large computational power. This could affect the observed transaction behaviour of miners.
7.3 Directions for Further Research
While this study focused on transaction behaviour patterns of different user groups on the Ethereum blockchain, it covered token transfer activity only to a minimal extent. Since sending and receiving tokens does not necessarily involve a direct interaction of the sender or receiver with the token contract, this activity is not adequately reflected by the ERC20 transaction activity (ERC20 Tx Sent). An investigation of ERC20 token usage at the token transfer level could reveal more distinct user groups as the functionalities of tokens on the Ethereum blockchain become more diverse over time. Possible areas of interest include users who participated in specific initial coin offerings (ICOs) or who hold tokens from a specific airdrop.
This study provides an exploratory analysis of user groups and behaviour patterns on the Ethereum blockchain. The examination of the underlying address structure can serve as a foundation for further quantitative analyses to reveal more distinct user groups. Unsupervised machine learning could be used to first cluster addresses belonging to the same owner and then to cluster distinct users based on the transactional features presented in this study. Beyond features describing smart contract or ERC20 token contract interactions, labelled data could be used to cluster EOAs, for example to differentiate between users of different exchanges or Dapps.
The analysis of mining pool participants and their transaction behaviour exposed an Ether distribution pattern that includes the transfer of mined Ether to a large number of secondary addresses. This finding raises the question of the extent to which these secondary addresses are associated with the initial senders. The development of address clustering heuristics has been discussed in Section 3. Since the number of addresses that can be created and used by a single user or entity is not technically limited, users controlling multiple EOAs distort analysis findings and complicate the analysis of user behaviour patterns. A more holistic approach to possible address clusters can provide a better picture of the entire user base and thus a better basis for assessing network health. This approach may include the analysis of complete transaction graphs that follow Ether from the reward payout along all subsequent transfer paths.
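As an illustration of such path tracing, the hypothetical Python sketch below runs a breadth-first search over an Ether transfer graph, starting from miner addresses and recording how many hops away each reachable address lies. It deliberately ignores transaction order and amounts; a real analysis would at least need to restrict each hop to transfers that occur after the incoming payout.

```python
from collections import defaultdict, deque


def reachable_addresses(transfers, sources, max_hops=2):
    """Map each address reachable from `sources` within `max_hops` transfers to its hop distance.

    transfers: iterable of (sender, receiver) pairs; order and amounts are ignored.
    """
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


edges = [("0xpool", "0xm1"), ("0xm1", "0xa"), ("0xa", "0xb"), ("0xb", "0xc")]
print(reachable_addresses(edges, ["0xm1"]))  # {'0xm1': 0, '0xa': 1, '0xb': 2}
```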
With the Ethereum blockchain’s ongoing transition to Ethereum 2.0, several upgrades that are being rolled out incrementally introduce significant changes to different aspects of the blockchain. New features such as sharding and a Proof-of-Stake consensus mechanism are intended to increase scalability and economic sustainability (ethereum.org, 2021). The shift from Proof-of-Work to Proof-of-Stake makes the use of computational power for the block finding process obsolete. Instead, a set of validators will be responsible for the consensus finding process by locking a certain amount of Ether in a deposit. Ether distribution and centralisation play an essential role in Ethereum 2.0, as validators’ votes depend on the amount of Ether staked (Ethereum Foundation, 2021). Similar to mining pools, staking pools will facilitate individuals’ participation in the consensus finding process. Since a significant proportion of block rewards under Proof-of-Stake can also be expected to be distributed to a small number of receiving addresses, applying the findings on mining pool participants to future staking pool participants may have relevant implications for future developments. Although approaches exist to economically disincentivise centralisation, examining staking pool participants and their Ether distribution patterns can provide insights into network centralisation and how it could be regulated by future mechanisms of Ethereum 2.0.
References
Glomann, L., Schmid, M., & Kitajewa, N. (2019). Improving the blockchain user experience - An approach to address blockchain mass adoption issues from a human-centred perspective. In T. Ahram (Ed.), Advances in Intelligent Systems and Computing. Advances in Artificial Intelligence, Software and Systems Engineering: Proceedings of the AHFE 2019 International Conference on Human Factors in Artificial Intelligence and Social Computing, the AHFE (Vol. 965, pp. 608–616). Springer. https://doi.org/10.1007/978-3-030-20454-9_60
Guo, D., Dong, J., & Wang, K. (2019). Graph structure and statistical properties of Ethereum transaction relationships. Information Sciences, 492, 58–71. https://doi.org/10.1016/j.ins.2019.04.013
Harlev, M. A., Sun Yin, H., Langenheldt, K. C., Mukkamala, R., & Vatrapu, R. (2018). Breaking bad: De-anonymising entity types on the Bitcoin blockchain using supervised machine learning. In T. Bui (Ed.), Proceedings of the Annual Hawaii International Conference on System Sciences, Proceedings of the 51st Hawaii International Conference on System Sciences. Hawaii International Conference on System Sciences. https://doi.org/10.24251/HICSS.2018.443
Jiang, B. (2013). Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3), 482–494. https://doi.org/10.1080/00330124.2012.700499
Lee, X. T., Khan, A., Sen Gupta, S., Ong, Y. H., & Liu, X. (2020). Measurements, analyses, and insights on the entire Ethereum blockchain network. In Y. Huang (Ed.), ACM Digital Library, Proceedings of The Web Conference 2020 (pp. 155–166). Association for Computing Machinery. https://doi.org/10.1145/3366423.3380103
Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M., & Savage, S. (2013). A fistful of Bitcoins. In K. Papagiannaki (Ed.), ACM Digital Library, Proceedings of the 2013 conference on Internet measurement conference (pp. 127–140). ACM. https://doi.org/10.1145/2504730.2504747
Monaco, J. V. (2015). Identifying Bitcoin users by transaction behavior. In I. A. Kakadiaris, A. Kumar, & W. J. Scheirer (Eds.), SPIE Proceedings, Biometric and Surveillance Technology for Human and Activity Identification XII (p. 945704). SPIE. https://doi.org/10.1117/12.2177039
Motamed, A. P., & Bahrak, B. (2019). Quantitative analysis of cryptocurrencies transaction graph. Applied Network Science, 4(1). https://doi.org/10.1007/s41109-019-0249-6
Oliva, G. A., Hassan, A. E., & Jiang, Z. M. (2020). An exploratory study of smart contracts in the Ethereum blockchain platform. Empirical Software Engineering, 25(3), 1864–1904. https://doi.org/10.1007/s10664-019-09796-5
Presthus, W., & O’Malley, N. O. (2017). Motivations and barriers for end-user adoption of Bitcoin as digital currency. Procedia Computer Science, 121, 89–97. https://doi.org/10.1016/j.procs.2017.11.013
Sako, K., Matsuo, S., & Meier, S. (2021, February 7). Fairness in ERC token markets: A case study of CryptoKitties. https://arxiv.org/pdf/2102.03721
Schaupp, L. C., & Festa, M. (2018). Cryptocurrency adoption and the road to regulation. In M. Janssen (Ed.), Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age (pp. 1–9). ACM. https://doi.org/10.1145/3209281.3209336
Somin, S., Gordon, G., & Altshuler, Y. (2018, May 31). Social signals in the Ethereum trading network. https://arxiv.org/pdf/1805.12097
Sovbetov, Y. (2018). Factors influencing cryptocurrency prices: Evidence from Bitcoin, Ethereum, Dash, Litcoin, and Monero. Journal of Economics and Financial Analysis, 2, 1–27.
Sun, H., Ruan, N., & Liu, H. (2019). Ethereum analysis via node clustering. In Birukou & Liu (Eds.), Lecture Notes in Computer Science. Network and System Security (Vol. 11928, pp. 114–129). Springer International Publishing. https://doi.org/10.1007/978-3-030-36938-5_7
Victor, F. (2020). Address clustering heuristics for Ethereum. In J. Bonneau & N. Heninger (Eds.), Lecture Notes in Computer Science. Financial Cryptography and Data Security: 24th International Conference, FC (Vol. 12059, pp. 617–633). Springer Nature. https://doi.org/10.1007/978-3-030-51280-4_33
Victor, F., & Lüders, B. K. (2019). Measuring Ethereum-based ERC20 token networks. In Goldberg & Birukou (Eds.), Lecture Notes in Computer Science. Financial Cryptography and Data Security (1st ed., Vol. 11598, pp. 113–129). Springer International Publishing. https://doi.org/10.1007/978-3-030-32101-7_8
Wood, G. (n.d.). Ethereum: A secure decentralised generalised transaction ledger (EIP-150 revision).
Wu, K. (2019, February 13). An empirical study of blockchain-based decentralized applications. https://arxiv.org/pdf/1902.04969
Wu, K., Ma, Y., Huang, G., & Liu, X. (2019). A first look at blockchain-based decentralized applications. Software: Practice and Experience. Advance online publication. https://doi.org/10.1002/spe.2751
Zamyatin, A., Wolter, K., Werner, S., Harrison, P. G., Mulligan, C. E. A., & Knottenbelt, W. J. (2017). Swimming with fishes and sharks: Beneath the surface of queue-based Ethereum mining pools. In 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (pp. 99–109). IEEE. https://doi.org/10.1109/MASCOTS.2017.22
Zanelatto Gavião Mascarenhas, J., Ziviani, A., Wehmuth, K., & Vieira, A. B. (2020). On the transaction dynamics of the Ethereum-based cryptocurrency. Journal of Complex Networks, 8(4), Article cnaa042. https://doi.org/10.1093/comnet/cnaa042
Zheng, P., Zheng, Z., Wu, J., & Dai, H.‑N. (2020). XBlock-ETH: Extracting and exploring blockchain data from Ethereum. IEEE Open Journal of the Computer Society, 1, 95–106. https://doi.org/10.1109/OJCS.2020.2990458
Appendix A
The following tables include Google Big Query tables that were used for the analysis. Relevant fields are highlighted in blue. Table identifiers have been shortened for conciseness (bigquery-public-data.crypto_ethereum to crypto_ethereum).
crypto_ethereum.balances
crypto_ethereum.blocks
logs_bloom STRING NULLABLE The bloom filter for the logs of the block
transactions_root STRING NULLABLE The root of the transaction trie of the block
state_root STRING NULLABLE The root of the final state trie of the block
receipts_root STRING NULLABLE The root of the receipts trie of the block
crypto_ethereum.contracts
crypto_ethereum.token_transfers
block_hash STRING REQUIRED Hash of the block where this transfer was in
crypto_ethereum.traces
input STRING NULLABLE The data sent along with the message call
block_hash STRING REQUIRED Hash of the block where this trace was in
crypto_ethereum.transactions
input STRING NULLABLE The data sent along with the transaction
Appendix B
The following tables include off-chain data sets from State of the Dapps and Etherscan that were used for the analysis and uploaded to Google Big Query for the data pre-processing.
dex – Decentralised Exchanges from Etherscan
The table contains 95 entries.
Appendix C
SQL Queries
The following SQL queries were used to arrive at the analysis findings in Section 6. Table
identifiers have been modified to be consistent with the identifiers used in Appendix A
and B.
UNION ALL
UNION ALL
UNION ALL
UNION ALL
UNION ALL
UNION ALL
2. Tx Sent, Active Days, Idle Time for all EOAs
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) <= 1
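The per-address metrics in the queries above (Tx Sent, Active Days and Idle Time) can be restated in Python for a small in-memory sample. This is a sketch under the assumption that each outgoing transaction is given as an (address, timezone-aware UTC timestamp) pair; the extraction timestamp is taken from the date literal used in the queries, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Date literal used in the regular-user queries.
EXTRACTION_TIME = datetime(2021, 2, 10, 10, 53, 31, tzinfo=timezone.utc)


def activity_metrics(sent_timestamps):
    """sent_timestamps: iterable of (from_address, block_timestamp) pairs in UTC."""
    by_address = defaultdict(list)
    for address, ts in sent_timestamps:
        by_address[address].append(ts)
    metrics = {}
    for address, stamps in by_address.items():
        first, last = min(stamps), max(stamps)
        metrics[address] = {
            "tx_sent": len(stamps),  # COUNT(1)
            # DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY)
            "active_days": (last.date() - first.date()).days,
            # DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY)
            "idle_time": (EXTRACTION_TIME.date() - last.date()).days,
        }
    return metrics
```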
4. Idle Time for EOAs with Active Days ≥ 2
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) >= 2
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
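The HAVING clause that defines regular users (between 124 and 1,894 sent transactions and at most 30 days of idle time before the extraction date) can be expressed as a small Python predicate. The function name, keyword defaults and sample addresses are illustrative only.

```python
def is_regular_user(tx_sent: int, idle_time_days: int,
                    min_tx: int = 124, max_tx: int = 1894, max_idle: int = 30) -> bool:
    """Mirror of the HAVING clause: 124 <= Tx Sent <= 1894 and Idle Time <= 30 days."""
    return min_tx <= tx_sent <= max_tx and idle_time_days <= max_idle


# Hypothetical per-address metrics: (Tx Sent, Idle Time in days).
metrics = {"0xa": (124, 30), "0xb": (2000, 3), "0xc": (500, 45)}
print([addr for addr, (tx, idle) in metrics.items() if is_regular_user(tx, idle)])  # ['0xa']
```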
6. SC Ratio for regular users with Idle Time ≤ 30
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
8. Zero Tx Ratio for regular users with Idle Time ≤ 30 (transactions to smart
contracts)
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
10. Out Token, In Token, Out/In Token Ratio for regular users with Idle Time ≤ 30
WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
),
out AS (
SELECT DISTINCT from_address AS address,
COUNT(1) AS out_token
FROM `crypto_ethereum.token_transfers`
WHERE block_number <= 11828337 AND from_address IN (SELECT address FROM
regular_users)
GROUP BY from_address
)
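The Out Token, In Token and Out/In Token Ratio measures can be sketched in Python from ERC20 transfer records. The sketch assumes (from_address, to_address) pairs and returns None for the ratio when a user received no tokens, a choice made here for illustration; how the original query treats that case is not shown in the extracted fragment.

```python
from collections import Counter


def token_activity(token_transfers, users):
    """Return {user: (out_token, in_token, out_in_ratio)} for the given set of users.

    token_transfers: iterable of (from_address, to_address) pairs from ERC20 transfer events
    """
    out_count, in_count = Counter(), Counter()
    for sender, receiver in token_transfers:
        if sender in users:
            out_count[sender] += 1
        if receiver in users:
            in_count[receiver] += 1
    result = {}
    for user in users:
        out_n, in_n = out_count[user], in_count[user]
        ratio = out_n / in_n if in_n else None  # undefined when nothing was received
        result[user] = (out_n, in_n, ratio)
    return result
```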
11. Activity level of Dapp categories
WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)
12. Activity level of centralised exchanges
WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)
13. Activity level of decentralised exchanges
WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337
)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894 AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)
14. Activity level of mining pools
WITH a AS (
SELECT DISTINCT to_address,
COUNT(1) AS reward_count,
COUNT(CASE WHEN reward_type = 'block' THEN 1 END) AS blocks,
COUNT(CASE WHEN reward_type = 'uncle' THEN 1 END) AS uncles,
FROM `crypto_ethereum.traces`
WHERE block_number <= 11828337 AND to_address IN (SELECT address FROM
mining_pools) AND trace_type = 'reward'
GROUP BY to_address
)
SELECT b.label,a.*
FROM a
LEFT JOIN mining_pools AS b ON a.to_address = b.address
ORDER BY blocks DESC
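For comparison, the counting logic of query 14 can be mirrored in Python. The sketch assumes that reward traces have already been filtered to trace_type = 'reward' and are given as (to_address, reward_type) pairs, and that the labelled mining pools are available as an address-to-label mapping; like the query, it returns one row per pool ordered by the number of block rewards. The function name is hypothetical.

```python
from collections import Counter


def reward_counts(reward_traces, mining_pools):
    """Count block and uncle rewards per labelled mining pool.

    reward_traces: iterable of (to_address, reward_type) pairs with trace_type = 'reward'
    mining_pools: dict mapping pool address -> label
    Returns (label, address, reward_count, blocks, uncles) rows sorted by blocks, descending.
    """
    total, blocks, uncles = Counter(), Counter(), Counter()
    for address, reward_type in reward_traces:
        if address not in mining_pools:
            continue  # like the IN (SELECT address FROM mining_pools) filter
        total[address] += 1
        if reward_type == "block":
            blocks[address] += 1
        elif reward_type == "uncle":
            uncles[address] += 1
    rows = [(mining_pools[a], a, total[a], blocks[a], uncles[a]) for a in total]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```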
15. Proxy address heuristic: Query calculates the highest consecutive occurrence of
payouts to the same receiver and orders receivers by maximum occurrence in
descending order. The query has to be run for every mining pool.
WITH all_mp_payouts AS (
SELECT from_address AS address,
label,
to_address AS recipient,
block_timestamp,
FROM `crypto_ethereum.transactions` AS t
LEFT JOIN mining_pools AS mp ON mp.address = t.from_address
WHERE block_number <= 11828337
),
a AS (
SELECT recipient,
label,
block_timestamp
FROM all_mp_payouts
WHERE address = <INSERT ADDRESS OF MINING POOL TO EXAMINE>
ORDER BY block_timestamp ASC
),
b AS (
SELECT recipient,
label,
count(*) AS occ
FROM (SELECT a.*,
(ROW_NUMBER() OVER (ORDER BY
block_timestamp) - ROW_NUMBER() OVER
(PARTITION BY recipient ORDER BY
block_timestamp)
) AS grp
FROM a
) a
GROUP BY grp, recipient, label
),
c AS(
SELECT DISTINCT recipient,
label,
MAX(occ) AS maxocc
FROM b
GROUP BY recipient,label
)
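The gaps-and-islands construction in query 15 (the difference of two ROW_NUMBER() sequences) identifies runs of consecutive payouts to the same recipient. In Python the same idea can be sketched with itertools.groupby over the recipients of one pool's payouts in timestamp order; the helper name is hypothetical. Recipients at the top of the resulting ranking would presumably be the candidates inspected as proxy addresses.

```python
from itertools import groupby


def max_consecutive_payouts(recipients_in_time_order):
    """For one mining pool: longest run of consecutive payouts per recipient, sorted descending."""
    longest = {}
    for recipient, run in groupby(recipients_in_time_order):
        length = sum(1 for _ in run)
        longest[recipient] = max(longest.get(recipient, 0), length)
    # sorted() is stable, so recipients with equal runs keep their first-seen order
    return sorted(longest.items(), key=lambda item: item[1], reverse=True)


print(max_consecutive_payouts(["0xa", "0xa", "0xb", "0xa", "0xa", "0xa", "0xc"]))
# [('0xa', 3), ('0xb', 1), ('0xc', 1)]
```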
16. Mining Pool Tx Received, Average Received Block Reward, Unique Mining
Pools and Mining Ratio of mining pool participants
WITH mp_agg AS (
SELECT * FROM mining_pools
UNION ALL
SELECT * FROM mp_proxy
),
total_inc AS (
SELECT DISTINCT to_address AS address,
COUNT(1) AS tx_received
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337 AND
GROUP BY to_address
)
FROM `crypto_ethereum.transactions` AS a
LEFT JOIN mp_agg AS b ON a.from_address = b.address
LEFT JOIN total_inc AS c ON a.to_address = c.address
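The Mining Ratio computed in query 16 is the number of transactions an address received from labelled mining pools or proxy addresses divided by its total number of received transactions. A minimal Python sketch, assuming (from_address, to_address) pairs and a set of payout addresses, is shown below; only addresses that received at least one pool payout are treated as miners, and the function name is hypothetical.

```python
from collections import Counter


def mining_ratios(transactions, payout_addresses):
    """Return {miner: Mining Pool Tx Received / Tx Received}.

    transactions: iterable of (from_address, to_address) pairs for all transactions
    payout_addresses: labelled mining pool addresses plus identified proxy addresses
    """
    total = Counter()
    from_pool = Counter()
    for sender, receiver in transactions:
        total[receiver] += 1
        if sender in payout_addresses:
            from_pool[receiver] += 1
    return {addr: from_pool[addr] / total[addr] for addr in from_pool}
```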
17. Outgoing Ether transfers of mining pool participants (with mining ratio ≥ 0.99)
to secondary receivers
WITH mp_agg AS (
SELECT * FROM mining_pools
UNION ALL
SELECT * FROM mp_proxy
),
total_inc AS (
SELECT DISTINCT to_address AS address,
COUNT(1) AS tx_received
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337 AND
GROUP BY to_address
),
miner AS (
SELECT DISTINCT to_address AS miner,
COUNT(1)/tx_received AS mining_ratio
FROM `crypto_ethereum.transactions` AS a
LEFT JOIN mp_agg AS b ON a.from_address = b.address
LEFT JOIN total_inc AS c ON a.to_address = c.address
18. Secondary receivers by incoming Ether from mining pool participants (with
mining ratio ≥ 0.99)
WITH mp_agg AS (
SELECT * FROM mining_pools
UNION ALL
SELECT * FROM mp_proxy
),
total_inc AS (
SELECT DISTINCT to_address AS address,
COUNT(1) AS tx_received
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337 AND
GROUP BY to_address
),
miner AS (
SELECT DISTINCT to_address AS miner,
COUNT(1)/tx_received AS mining_ratio
FROM `crypto_ethereum.transactions` AS a
LEFT JOIN mp_agg AS b ON a.from_address = b.address
LEFT JOIN total_inc AS c ON a.to_address = c.address
Declaration of Authorship
I hereby declare that the thesis submitted is my own unaided work. All direct or indirect
sources used are acknowledged as references.
I am aware that the thesis in digital form can be examined for the use of unauthorized aid
and in order to determine whether the thesis as a whole or parts incorporated in it may be
deemed as plagiarism. For the comparison of my work with existing sources, I agree that
it shall be entered in a database where it shall also remain after examination to enable
comparison with future theses submitted. Further rights of reproduction and usage,
however, are not granted here.
This paper was not previously presented to another examination board and has not been
published.
___________________________ ___________________________
first and last name city, date and signature