User Identification and Behaviour Patterns on the Ethereum Blockchain:
An Exploratory Study

Master’s Thesis
for the Attainment of the Degree
Master of Science
at the TUM School of Management
of the Technical University of Munich

Examiner: Prof. Dr. Joachim Henkel
Dr. Theo Schöller-Stiftungslehrstuhl für Technologie- und
Innovationsmanagement
Arcisstr. 21
80333 Munich

Submitted by: Dinh Truong Vu
Floriansmühlstr 28, 80939 Munich
Matriculation Number: 03710745

Person in Support: Daniel Obermeier

Co-operation Partner: Center for Digital Technology and Management (CDTM)

Submitted on: 19.04.2021

Abstract

Offering the possibility to create and deploy smart contracts, the Ethereum blockchain
covers various use cases, such as decentralised exchanges, financial applications and
tokenisation of digital assets. The use of these applications and their number of users
have witnessed a steady growth since the inception of the platform. Enabled by the
transparent yet pseudonymous nature of blockchains, existing research on transactional
data has focused on the general quantitative analysis of the on-chain data from a
network perspective. This study, however, contributes to knowledge in this area by
identifying user groups and describing different behaviour patterns on an end-user level.
An exploratory quantitative analysis on the on-chain data has been conducted to shed
light on the different user activity levels, how features, such as smart contracts,
decentralised applications and tokens are used, and how users engage with the mining
process of the blockchain. The study leverages data from different sources, such as
Google Big Query and Etherscan, and examines the transactional data from the
inception of the Ethereum blockchain in July 2015 until February 2021. Regarding the
activity level of the externally owned accounts, it could be shown that more than 90% of
all accounts have sent at most ten transactions. A subset of regular users has been
defined to limit the study to a more representative set of addresses. These users
primarily make transactions targeted at smart contracts. Centralised and decentralised
exchanges are the most commonly used applications. Furthermore, over 10% of the
regular users have engaged in the block finding process by participating in mining
pools. Although these mining pool participants send a substantial amount to
decentralised exchanges, the overall majority of mining rewards of each miner is
distributed over, on average, 45 different low-activity addresses on the blockchain.
These results raise questions about the appropriateness of quantitative analyses on an
address level. They indicate the necessity of a holistic address clustering approach to
account for multi-address usage and achieve a more comprehensive picture of user
behaviour patterns.

Table of Contents

Abstract
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Externally Owned Accounts
2.2 Smart Contracts
2.3 Decentralised Applications
2.4 Tokens
2.5 Consensus Mechanism
3 Related Work
4 Data Collection
4.1 Ethereum Data Structure
4.2 Google Big Query
5 Data Pre-Processing
5.1 On-Chain Data Sources
5.2 Off-Chain Data Enrichment
5.3 Data Aggregation and Transformation
6 Results
6.1 General User Characterisation and Transaction Dynamics
6.1.1 Address Structure
6.1.2 Transaction Dynamics and Different Activity Levels
6.1.3 Smart Contract Interaction
6.1.4 ERC20 Contract Interaction and Token Transfer Dynamics
6.1.5 Dapp Interaction
6.2 Mining Pool Participants
6.2.1 Mining Pool Structure
6.2.2 Miner Identification
6.2.3 Mining Behaviour
6.2.4 Transaction Behaviour
6.2.5 Distribution of Mined Ether
7 Conclusion
7.1 Overall Summary and Key Findings
7.2 Research Limitations
7.3 Directions for Further Research
References
Appendix A
Appendix B
Appendix C
Declaration of Authorship

List of Figures
Figure 1: Data Collection until Data Interpretation
Figure 2: ECDF of Outgoing Transaction Count
Figure 3: ECDF of Active Days
Figure 4: Comparison of Idle Times and Ether Price
Figure 5: Cluster Sizes with Logarithmical Scale
Figure 6: Active Days before and after Head/Tail Breaks Clustering
Figure 7: Smart Contract Ratio
Figure 8: Zero-Value Ratio for all Transactions
Figure 9: Zero-Value Ratio for Transactions to Smart Contracts
Figure 10: Number of ERC20 Contract Transactions
Figure 11: Comparison of Incoming and Outgoing Token Transfers of Regular Users
Figure 12: Incoming Transactions for Miners and Non-Miners
Figure 13: Active Days of Miners and Non-Miners
Figure 14: Average Payout Value in Ether
Figure 15: Smart Contract Ratio Miners vs Non-Miners
Figure 16: Zero-Value Ratio Miners vs Non-Miners
Figure 17: Number of Unique Recipients Miners vs Non-Miners

List of Tables
Table 1: Initial Data Frame
Table 2: Address Structure
Table 3: Activity Level of Top 10 Dapp
Table 4: Activity Level of Exchanges (Etherscan Labels)
Table 5: Top 5 of 72 Mining Pools by Blocks Mined
Table 6: Descriptives about Miners
Table 7: Transaction Behaviour Miner vs Non-Miners (Mean & Median)
Table 8: Comparison of Incoming and Outgoing Ether for Miners
Table 9: Cumulative Distribution of Ether Sent and Received by Top Addresses
Table 10: Ether and Address Distribution Secondary Receivers
Table 11: Top 12 Secondary Receivers by Total Ether Received

1 Introduction
Dubbing the Ethereum blockchain the “Next-Generation Smart Contract and
Decentralised Application Platform” (Buterin, 2021, para. 1), Vitalik Buterin, the founder
and developer, launched his novel decentrally governed ecosystem in 2015. With
blockchain as the underlying technology, Ethereum introduces functionalities and
applications that go beyond the sole purpose of exchanging cryptocurrency. Although
Ethereum shares the same underlying blockchain technology with other cryptocurrencies,
such as Bitcoin, it includes several distinct features, enabling entirely new possibilities of
user interactions (Buterin, 2021). The distinct property of Ethereum lies in the unique
architecture of the blockchain, which determines the constraints and conditions of
changes in the data structure. Commands and operations are executed on a
stack-based architecture, the Ethereum Virtual Machine (Wood). This
virtual machine executes low-level bytecode compiled from high-level
programming languages such as Solidity, which is used to write and deploy smart
contracts on the Ethereum
blockchain. Unlike Bitcoin’s scripting language, Solidity offers Turing completeness and
facilitates the creation of sophisticated applications (Cai et al., 2018). Ethereum’s smart
contracts determine the prerequisites and sequence of transactions according to the smart
contract’s code. They can also be programmed to interact and exchange data with other
smart contracts on the blockchain. Consequently, whole applications usually used on
platforms such as mobile phones and conventional desktop computers can be created with
smart contracts (Oliva et al., 2020). These so-called decentralised applications cover
various use cases, such as games, gambling, marketplaces, and even social networks
(Wu, 2019).

Besides enabling a plethora of different decentralised applications, which can actively be
used on the blockchain, smart contracts also enable the creation of crypto tokens. The
ERC20 token standard, which Vogelsteller and Buterin proposed in late 2015, is the most
widespread token standard on the Ethereum blockchain at the time of writing. ERC20
tokens can be used as digital assets or alternative cryptocurrencies existing next to the
native Ether (Di Angelo & Salzer, 2020). Their respective ERC20 smart contracts handle
the management, transfer, and creation of these tokens. Another notable token standard,
introduced in 2018, is the
ERC721 standard. ERC721 tokens represent digital assets that are unique and
distinguishable by a unique ID. A known application of the ERC721 token standard is the
decentralised game CryptoKitties¹, where ERC721 tokens constitute unique assets that
are traded in the game (Sako et al., 2021).

Although smart contracts and tokens are two significant features of Ethereum, there are
also other noteworthy features or properties that end-users can interact with. Recent
cryptocurrency price trends indicate a growing interest in using Ethereum for investment
and trading purposes (Schaupp & Festa, 2018). Both decentralised and centralised
exchanges facilitate the trade of Ether, the native currency of Ethereum, against other
crypto- or fiat currencies. Mining pools are another essential element of the
Ethereum ecosystem. They are groups of cooperating miners who
consolidate their computing power to participate in the block finding process (Cong et
al., 2021).

With the growing popularity and number of unique addresses on the Ethereum
blockchain, scholarly interest in the internal blockchain activity has increased during the
last several years (Casino et al., 2019). Although a plethora of blockchain studies have
recognised the necessity for quantitative analyses of blockchain data, most of them have
focused on the general exploratory analysis of the Ethereum blockchain as a whole
(Anoaica & Levard, 2018; Ferretti & D’Angelo, 2020; Guo et al., 2019;
Motamed & Bahrak, 2019; Sun et al., 2019; Zanelatto Gavião Mascarenhas et al., 2020;
Zheng et al., 2020). Thus, existing literature does not address the question of what distinct
user groups exist and what these groups engage in on the Ethereum blockchain. The
identification of user groups and their behaviour patterns is relevant in many regards.
For example, scholars have identified obstacles and entry barriers that hinder the mass
adoption of decentralised applications (Glomann et al., 2019). Therefore,
developers of decentralised applications have an interest in examining the user base of
their applications to assess their popularity and adoption behaviour, which, in turn, can
be considered in future updates. A study from 2017 on the end-user adoption of Bitcoin
as a digital currency revealed that technical curiosity is an important motivation, further
underlining the lack of mass adoption by groups other than technology-savvy users
(Presthus & O’Malley, 2017). The insights of the present study can serve as a starting point for
tackling the above-mentioned issues by providing a current picture of the user base and
its activities on the Ethereum blockchain.

¹ Decentralised game that is centred around non-fungible in-game items, https://www.cryptokitties.co/

To account for the increasing diversity of applications, I conducted a quantitative
exploratory study on the user identification and behaviour patterns on the Ethereum
blockchain. The study characterises users by examining their activity level, smart contract
and decentralised applications interaction, and participation in the mining process. To
gain insights into users’ transactional behaviour on the blockchain, on-chain data
extracted from Google Big Query is used and enriched with several labelled lists from
State of the Dapps and Etherscan. All externally owned accounts and their transactions
from July 2015 until February 2021 are included in the analysis. By primarily analysing
the outgoing transactions of externally owned accounts, the study shows that the regularly
used functionalities of Ethereum go beyond the simple transfer of Ether. Users seem to
interact with smart contracts to a significant extent and use certain types of decentralised
apps more than others. By understanding which user groups can be distinguished
and what they use the blockchain for, relevant stakeholders can follow a more user-centric
approach for future developments and monitor the network’s status more holistically.

This paper is subdivided into seven sections, with the introduction encompassing the
research topic, the current gap in the literature, the scope of this study and the objectives.
The subsequent section outlines several relevant concepts and elements of the Ethereum
blockchain that are subject to the analysis. Then, related research in the field is examined,
including literature on the quantitative analysis of both the Ethereum and the Bitcoin
blockchain. The data collection and data pre-processing sections elucidate the data
retrieval procedures and the creation of variables derived from the initial data. The main
section is devoted to the presentation and discussion of the results of the exploratory
analysis. It first presents the analysis results on the general user structure and then
elaborates on mining pool participants and their mining and transaction behaviour. In a
final step, the analysis outcomes are summarised, and directions for further research are
outlined.

2 Background
This section will provide background information on the Ethereum blockchain relevant
to the analysis. More specifically, key concepts, such as consensus mechanism, addresses,
smart contracts, decentralised applications and tokens, will be elucidated from a
functional perspective to highlight their role in user-blockchain interactions.

2.1 Externally Owned Accounts

Analogous to a bank account required to record transactions between a bank and a
customer, an Ethereum address is required to interact with the blockchain. The identifier
for every Ethereum address is a code in hexadecimal format, specifically the last 20 bytes
of the Keccak-256 hash of the public key that controls the address. Using an
account-based ledger, Ethereum
assigns an account balance to each of the addresses. The state of the account balance,
expressed as Ether amount, changes when either an outgoing transaction or an incoming
transaction with an Ether value occurs (that is, the Ether transferred is greater than zero). Ethereum
addresses are made up of two distinct groups, externally owned accounts and smart
contracts. Externally owned accounts (EOAs) represent actual users interacting with
either other externally owned accounts or smart contracts on the blockchain. As user
behaviour patterns are examined in this study, the following explorations and analyses
focus on the transactional pattern of EOAs. The terms user, account and EOA are used
interchangeably throughout the rest of the paper. The term address, however, can also
refer to smart contracts.
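
The account-based bookkeeping described above can be illustrated with a minimal sketch. It is not the data model used in this study: the `Transfer` tuple, the function name and the short addresses are assumptions made for illustration, and gas fees are deliberately ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (sender, recipient, value in wei)
Transfer = Tuple[str, str, int]


def apply_transfers(balances: Dict[str, int], transfers: Iterable[Transfer]) -> Dict[str, int]:
    """Return new balances after applying value transfers (gas fees ignored)."""
    updated = defaultdict(int, balances)  # copy, unknown addresses start at 0
    for sender, recipient, value in transfers:
        if value < 0 or updated[sender] < value:
            raise ValueError(f"invalid transfer from {sender}")
        updated[sender] -= value      # outgoing transaction lowers the balance
        updated[recipient] += value   # incoming transaction raises the balance
    return dict(updated)


# Made-up addresses and amounts
start = {"0xa11ce": 5, "0xb0b": 0}
end = apply_transfers(start, [("0xa11ce", "0xb0b", 3), ("0xb0b", "0xca401", 1)])
print(end)
```

The total amount held across all accounts stays constant in this sketch, since every transfer only moves value from one balance to another.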

Smart contracts, on the other hand, are controlled by their code and cannot initiate
transactions. This implies that every transaction originates from an EOA.
However, both EOAs and smart contracts can be recipients. The recipient address field
is left empty in the case of a smart contract creation transaction. Apart from that, a
transaction with an EOA as the recipient has the sole purpose of transferring Ether.
Optionally, limited text information can be sent through the data input field as well.
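
The recipient rules described above can be sketched as a small classifier. This is an illustration only, not the procedure used in this study: the `Transaction` record, its field names and the set of known contract addresses are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Transaction:
    """Hypothetical, simplified transaction record (field names are assumptions)."""
    sender: str
    recipient: Optional[str]  # None models an empty recipient field
    value_wei: int


def classify(tx: Transaction, contract_addresses: Set[str]) -> str:
    """Label a transaction by the type of its recipient."""
    if tx.recipient is None:
        return "contract_creation"
    if tx.recipient in contract_addresses:
        return "contract_call"
    return "eoa_transfer"


contracts = {"0xc0ffee"}  # made-up address of a deployed contract
txs = [
    Transaction("0xa11ce", None, 0),
    Transaction("0xa11ce", "0xc0ffee", 0),
    Transaction("0xa11ce", "0xb0b", 10**18),
]
labels = [classify(tx, contracts) for tx in txs]
print(labels)
```

In practice, the set of contract addresses has to come from an external source, since an address string alone does not reveal whether it belongs to an EOA or a smart contract.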

The sender of each transaction must provide a defined set of input data for the
transaction to be regarded as valid and executed by the blockchain. This input data
includes the gas limit, which is the maximum amount of gas that the transaction may
consume, and the gas price, which is the fee per unit of gas that the sender is
willing to pay to perform the transaction. Although a simple Ether transfer to another
EOA consumes gas as well, these specifications play a more critical role for
smart contract calls. All computational steps that a smart contract performs
following a smart contract call must be covered by the gas provided by the
corresponding transaction’s initiator.
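
The relation between gas limit, gas price and transaction fee can be made concrete with a short calculation. The sketch assumes the simple fee model described above, where the fee equals the gas used multiplied by the gas price. The function names and all numbers except the 21,000 gas of a plain Ether transfer are made up for illustration.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def max_fee_wei(gas_limit: int, gas_price_wei: int) -> int:
    """Upper bound on the fee the sender commits to."""
    return gas_limit * gas_price_wei


def actual_fee_wei(gas_used: int, gas_limit: int, gas_price_wei: int) -> int:
    """Fee charged for the gas actually used.

    In this sketch an overrun of the gas limit is simply rejected.
    """
    if gas_used > gas_limit:
        raise ValueError("out of gas: gas_used exceeds gas_limit")
    return gas_used * gas_price_wei


# A plain Ether transfer uses 21,000 gas; the gas price of 50 gwei and the
# gas limit of 30,000 are illustrative values.
gas_price = 50 * WEI_PER_GWEI
upper_bound = max_fee_wei(30_000, gas_price)
fee = actual_fee_wei(21_000, 30_000, gas_price)
print(f"fee: {fee} wei, upper bound: {upper_bound} wei")
```

Keeping all amounts as integers in wei avoids rounding errors that would occur with floating-point Ether values.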

The interaction with other EOAs can also be driven by other motives. Exchanges on the
Ethereum blockchain commonly require users to send their respective Ether amounts
to a deposit address, often an EOA that belongs to the exchange, to use their services if users
do not deposit fiat currencies (Victor, 2020). Since the number of different addresses that
can belong to a single person is not limited, Ether transfers to other externally owned
accounts can also represent shifts in Ether to a specific address of the same owner (Somin
et al., 2018). These two examples are non-exhaustive and illustrate the difficulty of
assigning different EOAs to an owner without additional information.

2.2 Smart Contracts

As mentioned earlier, smart contracts are the second group of addresses that can be
the recipient of a transaction. Smart contracts have to be successfully
deployed on the blockchain before they can be interacted with. EOAs can create smart
contracts by performing a transaction that contains the compiled contract code with an
undefined recipient. If the specified gas limit and gas price are sufficient to cover the
transaction fee of the respective deployment transaction, the smart contract gets
incorporated into the current block and is available on the blockchain. In addition,
smart contracts that contain the respective contract code can deploy new smart
contracts themselves. Once a smart contract is deployed, it remains on the blockchain and
is immutable. More specifically, no changes to the contract code can be made
retrospectively. This limitation is due to the immutable nature of the Ethereum
blockchain, ensuring a tamper-proof contract for involved participants. However, a
self-destruct function is available to remove the smart contract from the blockchain.
Developers can draw upon this functionality if a particular smart contract’s replacement
is necessary or if the contract becomes obsolete for any other reason.

Successfully deployed smart contracts can then be accessed by EOAs or other smart
contracts. Interactions with smart contracts initiated by EOAs are represented by
transactions to these contracts to invoke their functions. The invocation of these functions
causes the smart contract to execute its contract code according to the function that has
been invoked. To what extent the computation for performing the function takes place
depends on the predetermined gas limit and gas price that the caller or initiator has set.
Depending on the code’s content and the smart contract’s role, a function invocation can
generate a cascade of other function invocations on other smart contracts. These contract
invocations that stem from smart contracts are referred to as message calls, which always
have a transaction by an externally owned account as their origin. The gas that is consumed
by message calls is forwarded and provided by the original transaction. This study does
not cover contract creations and contract invocations by smart contracts, as it
focuses on user behaviour patterns.

With the proliferation of more complex decentralised applications and the increasing
popularity of token exchange, the analysis of smart contract interactions can constitute a
promising source that provides relevant insights into the transaction behaviour pattern of
users of the Ethereum blockchain (Wu, 2019).

2.3 Decentralised Applications

The Turing completeness of Solidity enables the generation of complex machine
instructions that can perform specific tasks on the blockchain. Smart contracts can form
sophisticated applications depending on the complexity of the code and the
interconnection with other smart contracts. These so-called decentralised applications (in
the following referred to as Dapps) run on the Ethereum blockchain and offer various
functionalities and services for the end-users of the blockchain. Since the creation and
deployment of smart contracts is open to every blockchain user, the number of
decentralised applications on the blockchain has been increasing steadily over the past
years. Dapp authors advertise their applications on online directories to foster their
expansion and usage (Wu, 2019). According to the Dapp directory State of the Dapps,
the applications can be divided into various categories, such as Finance, Exchange and
Gambling, that generally describe a Dapp from a functional perspective.

End-users interact with Dapps by sending transactions from their EOA to the
corresponding smart contracts. Therefore, the usage of Dapps always costs a certain
amount of Ether, paid as gas fees, to carry out the computations of the application.

2.4 Tokens

As mentioned earlier, ERC20 is the most common token standard among a variety of
other standards that have been introduced in the last few years. A smart contract can be
programmed to act as a bookkeeping entity for a crypto token on the Ethereum
blockchain. The so-called ERC20 interface, which the token contract must include,
enables different functionalities, such as transferring tokens, reading the balances of
token holders and querying the total token supply. To transfer tokens from one EOA to
another, the sender performs
a transaction to the corresponding token contract. More specifically, the token contract’s
transfer function has to be invoked, leading to the token transfer. Information on the token
balances of each token holder is held by the token contract and thus also updated within
the token contract.
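
The bookkeeping role of a token contract can be sketched in a few lines. The class below is a toy stand-in written in Python, not Solidity and not an actual ERC20 implementation. The method names mirror the `totalSupply`, `balanceOf` and `transfer` functions of the ERC20 interface, while the class name, the explicit `sender` argument and the addresses are assumptions made for illustration.

```python
class TokenLedger:
    """Toy stand-in for the bookkeeping of an ERC20 token contract."""

    def __init__(self, initial_holder: str, supply: int) -> None:
        # All balances are stored inside the ledger, not with the holders.
        self._balances = {initial_holder: supply}
        self._total_supply = supply

    def total_supply(self) -> int:
        return self._total_supply

    def balance_of(self, holder: str) -> int:
        return self._balances.get(holder, 0)

    def transfer(self, sender: str, recipient: str, amount: int) -> bool:
        """Move tokens between two holders by updating the internal balances."""
        if amount < 0 or self.balance_of(sender) < amount:
            raise ValueError("insufficient balance")
        self._balances[sender] = self.balance_of(sender) - amount
        self._balances[recipient] = self.balance_of(recipient) + amount
        return True


token = TokenLedger("0xa11ce", 1_000)    # made-up holder and supply
token.transfer("0xa11ce", "0xb0b", 250)  # on-chain, the sender is the caller
print(token.balance_of("0xa11ce"), token.balance_of("0xb0b"))
```

A token transfer therefore never touches the Ether balances of the two accounts involved; only the state of the token contract changes.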

Smart contracts that incorporate this framework handle tokens interoperably with other
smart contracts that are ERC20 compatible. Hence, this standardisation enables more
efficient handling of different tokens and facilitates a fast-developing trading and
exchange token network. Already proposed by Buterin at the inception of Ethereum, the
standard paved the way for the proliferation of cryptocurrency exchanges that allow users to
exchange fiat money for tokens. Furthermore, crypto tokens are often used to represent
secured digital assets comparable to traditional securities. Initial coin offerings represent
a major use case of crypto tokens and make fundraising through the Ethereum blockchain
possible (W. Chen et al., 2020).

2.5 Consensus Mechanism

Like other blockchains, Ethereum is a decentralised system that requires a
consensus mechanism to maintain a consistent state across all participating
nodes. The mechanism aims at preventing the inclusion of information broadcast by a
malicious node. To discourage malicious behaviour, validators of new blocks
must carry out costly computations before adding the next block to the chain.
For the Ethereum blockchain, these computations consist of hashing data by
brute force until a valid solution is found. This procedure is referred to as
Proof-of-Work (PoW) and is a crucial element of Ethereum’s consensus protocol. Miners are
rewarded with a certain amount of Ether upon successfully adding the next block to
the chain. Since the average interval between two blocks is kept constant for security
reasons, the mining difficulty adapts continuously to the growing number of miners and
their mining hardware’s computational power. While miners with ordinary personal
computers made up a substantial proportion of the total users on the blockchain in the
first years after the inception, their chances of mining a block are now negligible compared
to those of mining pools. Mining pools are consolidations of individual miners that pool
their resources to achieve a higher combined computational power. Participants in these
mining pools are offered a steady income over time. Since most mining pools pay out
their Ether reward according to the share that each participant has contributed, they attract
a significant number of users and offer an accessible way to engage in the consensus
mechanism.
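
The brute-force search behind Proof-of-Work can be illustrated with a toy example. The sketch uses SHA-256 and a leading-zero-bits target purely for illustration; Ethereum's actual PoW algorithm (Ethash) differs, and the function names, block data and difficulty value are assumptions.

```python
import hashlib


def meets_target(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(block_data + nonce) starts with `difficulty_bits` zero bits."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until one meets the target."""
    nonce = 0
    while not meets_target(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


# Toy difficulty: on average about 2**12 attempts are needed.
nonce = mine(b"example block", difficulty_bits=12)
print(nonce, meets_target(b"example block", nonce, 12))
```

Each additional difficulty bit doubles the expected number of attempts, while checking a proposed nonce still takes a single hash. This asymmetry is what allows the difficulty to be adjusted to the combined computational power of all miners.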

3 Related Work
A plethora of studies in the field of cryptocurrency from the last decade has recognised
the significance of user activity analysis of blockchain networks. However, the vast
majority of scientific work has focused on the Bitcoin network. Released in January 2009,
the cryptocurrency Bitcoin surpasses Ethereum regarding market capitalisation and
operating time at the time of writing. As the applications of Bitcoin and its non-Turing
complete scripting language are far more limited than those of Ethereum, user-related
research has centred around entity clustering and general quantitative analyses of the
Bitcoin blockchain rather than on actual behaviour patterns. Since the use of multiple
addresses by the same user is a common way to increase privacy and anonymity,
clustering represents an appropriate method to discover structures on the
blockchain (Ermilov et al., 2017; Victor, 2020). Both of the above-mentioned areas are closely
related to this study and help to put its results into perspective.

By using newly developed address clustering heuristics and interacting with a set of
predefined Bitcoin blockchain services, such as mining pools, gambling services, vendors
and exchanges, Meiklejohn et al. (2013) were able to elucidate the transactional structure
of the Bitcoin network. More specifically, the role of various relevant services within the
network was put into perspective by analysing their transactional behaviour (outgoing
and incoming transactions, number of different users). Satoshi Dice, a blockchain-based
betting game, has been examined more extensively. Features, such as the number of
outgoing and incoming transactions and average Bitcoin transferred, revealed that Satoshi
Dice accounted for a significant proportion of micro-valued transactions (transactions
with low Bitcoin value transferred). As these services can be, to some extent, compared
to decentralised applications and other concepts on the Ethereum network, this analysis
is possibly relevant for exploring the interactions between addresses and Dapps on the
Ethereum blockchain.

Employing the heuristics used earlier by Meiklejohn et al. (2013), Ermilov et al. (2017)
leveraged both on-chain Bitcoin data and off-chain data to cluster addresses that belong
to the same owner. The transactional behaviour was examined and taken into account in
that process. More specifically, behaviour patterns, such as sending Bitcoin to multiple
recipients in the same transactions, were then augmented with address tags that the
authors collected from public forums and social networks to assess the resulting clusters.

Also addressing the topic of privacy and anonymity of the Bitcoin network, Monaco
(2015) proposed a method to identify users on the blockchain by
analysing various transactional features. The behaviour pattern that the author exploited
consists of features, such as the time interval between two successive transactions, the
hour of the day of the transaction timestamp, the ratio between outgoing and incoming
Bitcoin value and the ratio between sent and received transactions. By quantifying the
behaviour pattern and observing the transactions over a sufficient amount of time, the
author demonstrates the deterministic character of long-term transactional behaviour for
the analysed set of Bitcoin addresses. His study’s methodology represents an alternative
to common address clustering heuristics to identify entities on the blockchain. The use of
behavioural biometrics in his work indicates the importance of specific behavioural
features on the Bitcoin network that can potentially be leveraged for behaviour analysis
on the Ethereum blockchain as well.
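
To make the listed features concrete, the following minimal sketch computes them for a single address from toy data. It is not Monaco's implementation; the record layout, the function name and the feature names are assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical toy records for one address: (timestamp, direction, value)
txs = [
    (datetime(2021, 1, 1, 9, 0), "out", 2.0),
    (datetime(2021, 1, 1, 21, 0), "in", 1.0),
    (datetime(2021, 1, 2, 9, 30), "out", 1.0),
    (datetime(2021, 1, 3, 9, 30), "in", 3.0),
]


def behaviour_features(txs):
    """Summarise timing and value behaviour (assumes both directions occur)."""
    txs = sorted(txs)  # chronological order
    times = [t for t, _, _ in txs]
    # Hours between successive transactions
    gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    out_val = sum(v for _, d, v in txs if d == "out")
    in_val = sum(v for _, d, v in txs if d == "in")
    n_out = sum(1 for _, d, _ in txs if d == "out")
    n_in = len(txs) - n_out
    return {
        "mean_gap_hours": mean(gaps_h),
        "hours_of_day": [t.hour for t in times],
        "value_out_in_ratio": out_val / in_val,
        "count_out_in_ratio": n_out / n_in,
    }


features = behaviour_features(txs)
```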

A more model-based method has been employed by Harlev et al. (2018) to de-anonymise
users. Using supervised machine learning, the study succeeded at classifying a set of
addresses with predefined labels. Unidentified clusters were classified with an accuracy
of 77% with the gradient boosting classifier. Variables that were used in the analysis were
chosen on a transactional level and included features, such as transaction timestamp and
Bitcoin value sent or received.

As the Ethereum blockchain matured over time, the respective blockchain research has
also gained momentum. Differing substantially from the Bitcoin blockchain in
functionality and technical specifications, Ethereum has facilitated different research
directions covering the analysis of different components and concepts specific to
Ethereum, such as smart contracts, decentralised applications and token systems. More
holistic approaches are taken by studies on the quantitative analysis of internal activities.
As a network with a high number of users interacting with each other in a complex
manner, Ethereum has been subject to several graph analysis studies that attempt to shed
light on the blockchain’s transactional structure.

Address clustering has been conducted for the Ethereum blockchain by several studies as
well (Béres et al., 2020; Sun et al., 2019; Victor, 2020). With the existence of another
type of address, smart contracts, address grouping widened its scope to include newer
heuristics. Victor (2020) introduces new clustering methods for the Ethereum blockchain
that incorporate patterns, such as exchange deposit address reuse and airdrop
multi-participation, drawing from token transfer and transaction data. Similar to Monaco's work
on Bitcoin user identification, Béres et al. (2020) used behavioural identifiers
to link addresses that belong to the same owner. Time-of-day transaction activity, gas
price distribution and transaction graph analysis were incorporated in the study.

The first systematic graph analysis of internal Ethereum activity was conducted
by T. Chen et al. (2018). The authors constructed three graphs (a money flow graph, a smart
contract creation graph and a smart contract invocation graph) to perform a range of
quantitative analyses on the on-chain data. Covering the blockchain data up until 2018,
when the paper was published, the study gave insights into addresses’ blockchain usage
preferences. The authors discovered that addresses preferred sending Ether to other
addresses over engaging with smart contracts. Furthermore, they pointed out that only a
few decentralised applications existed at that time, stressing the predominance of
financial Dapps and exchanges. With the emergence of new decentralised applications
and the ERC-20 token standard in the years following the blockchain’s inception in
2015, behaviour patterns, especially smart contract and token usage, are likely to have
undergone substantial shifts. Putting our results in relation to that development helps to
understand the past growth and the future directions of the network.

Another quantitative analysis of the internal activity of Ethereum, conducted by Anoaica
and Levard (2018) in the same year, examined the network regarding
different transaction types, namely user-to-user, user-to-smart contract and smart contract
deployment transactions. The predominance of user-to-user transactions could be
confirmed as well, most probably because the study was conducted close in time to that of T.
Chen et al. (2018). Furthermore, Anoaica and Levard emphasised that more than 97% of
all addresses have been engaged in fewer than ten transactions and that exchanges and
mining pools account for most of the activity on the blockchain.

A more recent study by Lee et al. (2020) followed a similar approach
to examine the similarity of different networks on the Ethereum blockchain to social
networks. Four different networks were created covering different components of the
blockchain, such as smart contract interactions, token transactions and the complete
EOA-smart contract interactions. Applying graph algorithms, the authors discovered
distinctive high-outdegree nodes, such as mining pools and transaction mixers, and
distinctive high-indegree nodes, such as ICO smart contracts. Highly connected nodes
such as the exchange Binance account for most of the connections on the network.

To better comprehend the results of the following explorations and analyses regarding
smart contract and Dapp interactions, it is reasonable to review work on smart contract
activity and structure as well. In a recent study by Oliva et al. (2020), an exploratory
quantitative approach has been followed to elucidate the activity level, categorisation and
source code complexity of smart contracts on the blockchain. The authors cross-linked
blockchain data from different sources, on-chain data from Google Big Query and off-
chain data from platforms, such as Etherscan and State of the Dapps. Regarding the
activity level, the smart contract data has been analysed thoroughly along multiple
dimensions and features. The uneven distribution of transactions could be confirmed for
smart contracts as well. It has been revealed that less than 0.05% of all smart contracts
received 80% of all smart contract transactions and that a majority of verified smart
contracts belong to that 0.05% of high-activity smart contracts. Furthermore, most of the
high-activity smart contracts have been active in the last 100 days, indicating an even
higher activity concentration on a small number of active smart contracts. Oliva et al.
(2020) also discovered a high proportion of token contracts among the high-activity
contracts. This points to the general high usage of tokens on the blockchain. The authors
also indicated that Games, Exchanges and Gambling Dapps are the most popular Dapps
that belong to high-activity contracts.

A more detailed study exclusively on decentralised Ethereum applications has been
conducted by Wu et al. (2019). One of the research questions that the authors attempted
to answer was how the popularity or activity of different Dapps is distributed in the
network. Drawing from on-chain data and labels from State of the Dapps, the study
confirms the same predominance for Games, Exchanges and Gambling Dapps. Besides
analysing the activity level for the different categories, they also explored the cost of
deploying and executing smart contracts of Dapps to understand the Ether consumption
for Dapp usage more thoroughly.

Constituting one of the most important use cases of the Ethereum blockchain, ERC20
token smart contracts have been subject to several quantitative studies as well. A network
analysis on the ERC20 token networks performed by Victor and Lüders (2019) revealed
activity distribution and usage patterns. They indicated that most token transfers focused
on the token distribution from large emitting addresses and high in-degree exchanges that
users send their tokens to. Furthermore, EOAs do not seem to send tokens to each other.
Tokens tend to remain in the ownership of the respective addresses once they have been
emitted, showing a low degree of circulation.

4 Data Collection
In the following, I describe which data sources have been used for the analysis and how
the relevant data is organised and structured within these sources. This helps us to
understand the nature of the data and to assess its credibility.

4.1 Ethereum Data Structure

As the analysis of user groups and their behaviour patterns on the Ethereum blockchain
mainly relies on the actual on-chain data, it is essential to comprehend the data’s
organisational structure and how it is leveraged in the following to generate insights on
an account level. Operating within the framework of a global virtual machine, the
Ethereum blockchain features several transaction execution and data storage mechanisms
that differ significantly from those of other blockchain networks. Interactions among
EOAs and between EOAs and smart contracts result in changes in the virtual machine’s
global state and thus in the properties of the corresponding addresses on the blockchain.
These transactions and their relevant information are organised in a unique data structure,
a Patricia tree, which has been modified to incorporate properties of a Merkle tree.
Relevant details of transactions include but are not limited to the value, gas price and
transaction nonce. Each block includes the root node of the transaction tree of its
corresponding transactions in the block header. Consistent with the immutable nature of
blockchains, completed transactions within a block cannot be altered, thereby
determining status details, such as the number of successfully sent transactions and Ether
balance, of every address on the blockchain. The addresses and their corresponding
details are stored in another separate structure, the state tree, also a modified Patricia tree,
which exists globally and is constantly updated by each completed transaction.
Furthermore, the modified Patricia tree’s specific properties facilitate an efficient
reference between the current state and previous states, rendering the storage of the whole
blockchain history in every block obsolete.

As the relation between the state tree and the transaction tree suggests, the Ethereum
blockchain uses an account-based ledger where each unique address refers to a unique
account. This logic is fundamentally different from the transaction-based ledger of the
Bitcoin blockchain, whose state consists of the assignment of unspent Bitcoins to the
accounts that are eligible to spend these Bitcoins. The distinct assignment between
addresses and their balances on the Ethereum blockchain makes the data processing by
Google Big Query and the resulting analysis more straightforward from a user
perspective.
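
The difference between the two ledger models can be sketched with toy data. This is a simplified illustration under stated assumptions (no fees, signatures or nonces); all names and values are hypothetical.

```python
# Account-based ledger (Ethereum-style): one balance is stored per address.
accounts = {"0xA": 5, "0xB": 0}


def transfer(accounts, sender, receiver, amount):
    """Move funds by updating the two balances directly."""
    if accounts.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    accounts[sender] -= amount
    accounts[receiver] = accounts.get(receiver, 0) + amount


transfer(accounts, "0xA", "0xB", 2)

# Transaction-based ledger (Bitcoin-style): the state is a set of unspent
# outputs, so a balance has to be derived by summing over them.
utxos = [
    {"owner": "A", "amount": 3},
    {"owner": "A", "amount": 2},
    {"owner": "B", "amount": 1},
]


def utxo_balance(utxos, owner):
    """Derive a balance from all unspent outputs assigned to an owner."""
    return sum(u["amount"] for u in utxos if u["owner"] == owner)
```

In the account-based case, the balance of an address is a direct lookup, which is why address-level aggregation is comparatively simple on Ethereum.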

4.2 Google Big Query

Obtaining the complete blockchain data requires running a full node because all the data
can be inferred from the full node. A full node provides a copy of the entire state of the
Ethereum blockchain. However, since processing data from a full node to obtain
blockchain data in a meaningful, queryable structure for further analysis is
computationally complex and non-trivial, we resort to using Google Big Query’s online
service2.

To fill the gap of a missing blockchain data extraction tool capable of handling
complex and unconventional data structures, Google Big Query
implemented Python scripts3 for the extraction, transformation and loading of internal
blockchain data. The data includes information about blocks, transactions,
ERC20/ERC721 tokens and their transfers, receipts, logs, smart contracts and internal
transactions. More precisely, Google Big Query extracts the data from nodes in the cloud,
which fetch the data by running Parity, an open-source Ethereum client4. The data is then
stored in Google’s data warehouse BigQuery, which facilitates scalable data analysis over
a large amount of data. Offering Online Analytical Processing capacities, it provides a
regularly updated Ethereum dataset, which can be queried by using Standard SQL, a
structured query language to communicate with data structures in a relational database.
The Ethereum blockchain dataset is stored, together with various other publicly available
datasets, in the section for public datasets.

2 https://cloud.google.com/bigquery
3 https://github.com/blockchain-etl/ethereum-etl
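
As an illustration of how such a dataset can be queried, the sketch below builds a Standard SQL query that counts outgoing transactions per sender. It is not one of the queries used in this thesis; the table path and the column names `from_address` and `block_timestamp` are assumptions about the public dataset's schema and should be checked before use. Running the query requires the `google-cloud-bigquery` package and valid credentials, so that part is kept in a separate function.

```python
# Assumed public table path; verify against the dataset before use.
TABLE = "bigquery-public-data.crypto_ethereum.transactions"


def outgoing_tx_count_query(cutoff_date: str) -> str:
    """Build a Standard SQL query counting outgoing transactions per sender."""
    return (
        "SELECT from_address AS address, COUNT(*) AS tx_sent\n"
        f"FROM `{TABLE}`\n"
        f"WHERE DATE(block_timestamp) <= '{cutoff_date}'\n"
        "GROUP BY from_address"
    )


def run_query(sql: str):
    """Execute the query on BigQuery (needs the client library and credentials)."""
    from google.cloud import bigquery  # imported lazily: optional dependency

    client = bigquery.Client()
    return list(client.query(sql).result())


sql = outgoing_tx_count_query("2021-02-10")
```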

Google BigQuery will be the primary data source for the following explorations and
analyses. Relevant data is primarily queried within the in-browser Google Cloud Console,
and corresponding query results are exported as a comma-separated values file before
they are further processed. In addition, the service features exporting and saving of query
results as datasets on Google Cloud. Querying own datasets enables shorter computation
time for queries, which would otherwise require a high degree of subqueries and nested
statements. Leveraging the computational power and capacity, we make use of the
features of uploading and saving our own datasets to enrich the on-chain data with
labelled data. This data includes labels and categories that we obtain from off-chain data
sources such as Etherscan5, an established block explorer and blockchain analytics
platform, and State of the Dapps6, a curated list of Dapps of different blockchains. A more
detailed explanation of the data handling and the data aggregation process will be
provided in the following sections of the paper.

5 Data Pre-Processing
To enable more detailed results, the data needs to be obtained, transformed and
aggregated with off-chain data. In this section, the different pre-processing steps are
described. First, I define the data sources, on-chain and off-chain, and the specific data
tables included in the analysis. Then, new variables are generated by aggregating the
original variables with the off-chain data. An overview of the different data processing
steps can be found in Figure 1. All steps except for the last step have been performed
online on Google Big Query by querying with SQL. A complete list of all relevant queries
in ANSI SQL can be found in Appendix C. Several openly accessible Python software
libraries, such as pandas7, NumPy8 and Matplotlib9, provided the tools for the data
manipulation, analysis and interpretation.

4 https://github.com/openethereum/openethereum
5 https://etherscan.io/
6 https://www.stateofthedapps.com/

Figure 1: Data Collection until Data Interpretation

Source: Own representation

5.1 On-Chain Data Sources

The blockchain data, which Google Big Query obtains, is de-normalised before it is stored
in the cloud. The organisation in different, queryable tables enables the exploration and
study of various research questions. Focusing on analysing users’ transactional behaviour
patterns on the Ethereum blockchain, this study includes only several specific Google Big
Query tables.

7 https://pandas.pydata.org/
8 https://numpy.org/
9 https://matplotlib.org/

As the data has been aggregated on an address level, tables that include the entirety of
addresses provide the foundation for the analysis. The “crypto_ethereum.balances” table
contains all addresses on the blockchain with their Ether balances. To filter out smart
contracts and examine transactions to smart contracts at the same time, I included the
table “crypto_ethereum.contracts”, which contains all smart contracts’ addresses with
additional information, such as type of token contract and bytecode. Further incorporating
the table “crypto_ethereum.transactions”, we are able to examine the level of activity for
both outgoing and incoming transactions. To consider token trading, a central use case of
smart contracts, the tables “crypto_ethereum.tokens” and
“crypto_ethereum.token_transfers” have been leveraged to include the token transfer
activity of all addresses on the blockchain.
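
The filtering of smart contracts from the address list can be sketched as an anti-join. The example uses Python's built-in `sqlite3` module as a small stand-in for Google Big Query; the toy tables, their column names and the addresses are assumptions made for illustration and are not the queries from Appendix C.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE balances  (address TEXT, eth_balance INTEGER);
CREATE TABLE contracts (address TEXT, is_erc20 INTEGER);
INSERT INTO balances  VALUES ('0xaaa', 10), ('0xbbb', 0), ('0xccc', 7);
INSERT INTO contracts VALUES ('0xccc', 1);
""")

# Keep only addresses that have no match in the contracts table (the EOAs).
eoas = [row[0] for row in con.execute("""
    SELECT b.address
    FROM balances AS b
    LEFT JOIN contracts AS c ON c.address = b.address
    WHERE c.address IS NULL
    ORDER BY b.address
""")]
```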

5.2 Off-Chain Data Enrichment

As mentioned earlier in the paper, Google Big Query’s capability of uploading and
processing custom datasets has been leveraged to enrich the query results with off-chain
labels. Although the Ethereum blockchain’s account-based ledger simplifies
analysis on an address level, the entity name or type behind an address (e.g. exchanges,
mining pools or Dapps) cannot be inferred from the address itself, which is merely a 20-byte
identifier. Crypto exchanges, which run on the Ethereum blockchain, generally rely on a
certain number of interlinked smart contracts to offer specific services, such as
exchanging tokens for Ether and swapping different tokens, to the user on the blockchain.
By solely examining the on-chain data, it would not be apparent to an external observer
whether a group of smart contracts makes up a decentralised application without
deploying more sophisticated analytical methods, such as network and graph analysis.

The leading online Ethereum block explorer Etherscan, which is openly accessible
through the browser, provides various functions, such as general blockchain data
visualisation and a search engine, to examine transaction properties or address activity in
detail. The platform offers the possibility to submit labels for EOAs and smart contracts.
Submissions are verified and then approved to get accumulated in a label directory. Since
the exploration and analysis are limited to identifying different user groups and their
behaviour, extracting name tags for both EOAs and smart contracts reveals insights about
the transaction behaviour regarding the recipient preference. Therefore, the Etherscan
label directory has been scraped to obtain name labels for addresses that belong to mining
pools, centralised exchanges and decentralised exchanges.

With Ether as the native cryptocurrency of the Ethereum blockchain and mining new
blocks to the blockchain as the only way of adding new Ether to the network, it is
reasonable to enrich the addresses extracted from Google Big Query with mining pool
labels. As the mining difficulty of the cryptographic puzzle of the Proof-of-Work
consensus mechanism has generally been increasing dynamically since the genesis block,
miners with conventional hardware started to accumulate their computational power by
forming mining pools (Zamyatin et al., 2017). Examining the
transactions between mining pool addresses and other EOAs, the analysis attempts to
shed light on the spending behaviour of mining pool participants, which represent the first
spenders of newly mined Ether.

As the entire set of miner addresses, which have mined all blocks up to the present block,
can be conclusively determined, the ratio of mined blocks by mining pools to the total
number of mined blocks on the Ethereum blockchain has been examined. Looking at all
miner addresses up to the 11,828,337th block, it can be inferred that a significant number
of blocks has been mined by the set of mining pools that are labelled on Etherscan.
Seventy-two addresses from 72 different mining pools have been extracted.
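
The share of blocks mined by labelled mining pools can be computed as sketched below. The data is a toy example; in practice the miner of each block would come from the on-chain block data, and the placeholder addresses are hypothetical.

```python
from collections import Counter

# Toy list with the miner address of each block (placeholder addresses).
block_miners = ["0xpool1", "0xpool1", "0xsolo", "0xpool2"]
labelled_pools = {"0xpool1", "0xpool2"}

# Count blocks per miner, then sum the counts of labelled pool addresses.
blocks_per_miner = Counter(block_miners)
pool_blocks = sum(n for miner, n in blocks_per_miner.items() if miner in labelled_pools)
pool_share = pool_blocks / len(block_miners)
```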

Although it is not possible to precisely determine the ratio of labelled centralised and
decentralised exchanges to the entirety of exchanges on the Ethereum blockchain,
assessing the number of incoming and outgoing transactions of labelled exchanges in
relation to the total number of transactions that have taken place on the blockchain
suggests a significant fraction of traffic. A total number of 377 addresses are labelled as
exchanges.

Considering Dapps as one of the key use cases of the Ethereum blockchain, it appears
reasonable to incorporate off-chain data regarding Dapps (Wu et al., 2019). For this purpose,
State of the Dapps has been included in the off-chain data enrichment process. State of
the Dapps is an online database that offers a curated list of Dapps from different
blockchains. Besides a variety of metadata, such as author, current operation status, web
presence and a short description, State of the Dapps provides a list of associated smart
contract addresses for a limited number of Dapps. Furthermore, category labels are
available for every Dapp and their associated smart contracts, enabling Dapp usage
analyses on a categorical level. However, since the exact number of Dapps or smart
contracts belonging to Dapps is not determinable with on-chain data, no inference on the
actual proportion of labelled addresses can be made. The scraped data from State of the
Dapps includes 3813 smart contract addresses from 1137 different Dapps divided into
18 categories.

All address labels obtained from Etherscan and State of the Dapps have been converted
into comma-separated values files and uploaded to Google Big Query to aggregate both
on-chain data and off-chain data with appropriate queries.
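
A minimal sketch of this enrichment step is given below, assuming a label file with the columns `address`, `label` and `category`. The column names, labels and placeholder addresses are hypothetical; the actual aggregation in this thesis is done with SQL on Google Big Query.

```python
import csv
import io

# Hypothetical CSV export of scraped labels.
labels_csv = io.StringIO(
    "address,label,category\n"
    "0xpool1,Example Pool,mining_pool\n"
    "0xdex1,Example DEX,dex\n"
)
labels = {row["address"]: row for row in csv.DictReader(labels_csv)}

# Toy on-chain transactions as (sender, receiver) pairs.
transactions = [("0xpool1", "0xuser"), ("0xuser", "0xdex1"), ("0xuser", "0xother")]


def category(address):
    """Return the off-chain category of an address, or None if unlabelled."""
    entry = labels.get(address)
    return entry["category"] if entry else None


# Attach the sender and receiver categories to every transaction.
enriched = [
    {"from": s, "to": r, "from_cat": category(s), "to_cat": category(r)}
    for s, r in transactions
]
```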

5.3 Data Aggregation and Transformation

Different data aggregation and transformation steps have been conducted on both the on-
chain and off-chain data to create new descriptive transactional parameters.

First, features, such as transaction count and token transfer count, have been transformed
to construct normalised variables that describe the transactional activity pattern rather
than the level of activity itself. Subsequently, the on-chain data have been enriched by the
off-chain data from both Etherscan and State of the Dapps.

These constructed variables can be divided into two groups. The first group contains
variables related to each individual address and the second group holds variables stemming
from the address interactions. In the following, the construction of these variables is
described, and an overview of the relevant variables for the analysis is presented.

The initial data frame that has been obtained by querying the Google Big Query tables in
5.1 and has not yet been subject to data aggregations and transformations is represented
in Table 1. The data table “crypto_ethereum.balances” contains all addresses on the
blockchain together with their Ether balance denoted in Wei10, a base unit of Ether. Since
only EOAs are relevant for the analysis, all addresses which represent smart contracts
have been excluded from the data. “crypto_ethereum.contracts”, which includes all smart
contract addresses, has been used for the matching process to examine transactions to
smart contracts. To extract all relevant addresses’ activity level regarding their active time
and their level of activity, the two tables, “crypto_ethereum.transactions” and
“crypto_ethereum.token_transfers”, have been aggregated with the address list. Both
tables contain all performed transactions and token transfers, respectively. Each
transaction’s timestamp represents the point in time when the respective block that
included that transaction was mined by the miner. Thus, the timestamp of the first and the
last outgoing transaction of every address is included to assess the activity period. The
number of outgoing transactions and incoming transactions is trivially obtained by
counting the occurrences for the corresponding address as sender for the outgoing count
and receiver for the incoming count. As mentioned in Section 2.6, it is worth
noting that transactions and token transfers do not have the same order from an
architectural perspective. One transaction can lead to several token transfers (i.e., the
transfer of several different tokens). One token transfer always represents the change of
ownership of tokens from a specific token contract. This is illustrated by comparing the
total number of transactions to smart contracts with the total number of token transfers,
which exceeds the former figure significantly.

10 Smallest denomination of Ether, 1 Ether = 1,000,000,000,000,000,000 Wei (10^18)
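
The counting described above can be sketched in a few lines. The rows are toy data, and the field names mirror the columns of Table 1 only loosely; both are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Toy rows: (block_timestamp, from_address, to_address)
transactions = [
    (datetime(2019, 10, 21), "0xa", "0xb"),
    (datetime(2020, 5, 1), "0xa", "0xc"),
    (datetime(2021, 1, 20), "0xb", "0xa"),
]

stats = defaultdict(lambda: {"tx_sent": 0, "tx_received": 0,
                             "first_tx_sent": None, "last_tx_sent": None})

for ts, sender, receiver in transactions:
    s = stats[sender]
    s["tx_sent"] += 1
    # Track the earliest and latest outgoing transaction of the sender.
    s["first_tx_sent"] = ts if s["first_tx_sent"] is None else min(s["first_tx_sent"], ts)
    s["last_tx_sent"] = ts if s["last_tx_sent"] is None else max(s["last_tx_sent"], ts)
    stats[receiver]["tx_received"] += 1
```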

Table 1: Initial Data Frame

Address    Ether Balance  First Tx Sent  Last Tx Sent  Tx Sent  Tx Received  Tokens Sent  Tokens Received
0x6c8eff…  88362550…      2019-10-21…    2021-01-20…   998      177          53           1995



Source: Own representation

In addition to the initial variable set extracted directly from the Google Big Query data,
various derived variables were constructed by transforming the original variables. The
performed steps are listed in the following.

1. Out/In Token Ratio measures the proportion of outgoing token transfers (Out
Token) in the total sum of outgoing and incoming token transfers (Out Token + In Token).

2. By combining the transactions and the contracts table, the variable SC Tx Sent
is formed, which indicates the number of outgoing transactions to smart
contracts. Correspondingly, SC Ratio specifies the proportion of smart contract
transactions to the total number of outgoing transactions (SC Tx Sent/ Tx Sent).

3. Zero Tx Sent represents the number of outgoing transactions without an Ether
value. These transactions do not transfer Ether to the recipient. However,
transaction fees still apply. Zero Tx Ratio specifies the proportion of zero value
transactions to the total number of outgoing transactions, respectively (Zero Tx
Sent/ Tx Sent).

4. As the contract table contains information about the existence of an ERC20
interface for each contract address, one can infer the frequency of each user’s
token contract interaction. ERC20 Tx Sent represents the number of outgoing
transactions to ERC20 contracts. ERC20 Ratio specifies the proportion of ERC20
transactions to the total number of outgoing transactions, respectively (ERC20 Tx
Sent/ Tx Sent).

5. The Unique Receiver variable holds the number of distinct addresses to which an
address has sent transactions.

6. The Unique Sender variable holds the number of distinct addresses from which
an address has received transactions, respectively. These two variables reveal
whether a user interacts with a high number or only with a limited number of other
addresses.

7. To include temporal features in the dataset, we define the Active Days as the time
difference between the first outgoing transaction and the last outgoing transaction
in days for each address.

8. To evaluate whether an address has been active recently, the variable Idle Time is
calculated by determining the difference between the last outgoing transaction of
an address and the data extraction date.
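
The derived variables from steps 1 to 8 can be sketched for a single address as follows. This is a simplified illustration on toy data, not the SQL used in this thesis; the function name, the tuple layout and the use of calendar dates instead of block timestamps are assumptions.

```python
from datetime import date


def derived_variables(addr, txs, contracts, erc20, token_out, token_in, extraction_date):
    """Compute the per-address variables from steps 1 to 8.

    txs is a list of (day, sender, receiver, value_wei) tuples.
    """
    sent = [t for t in txs if t[1] == addr]
    received = [t for t in txs if t[2] == addr]
    n_sent = len(sent)
    days = [t[0] for t in sent]

    def ratio(part, total):
        # Undefined ratios (no activity) are reported as None.
        return part / total if total else None

    return {
        "out_in_token_ratio": ratio(token_out, token_out + token_in),
        "sc_ratio": ratio(sum(t[2] in contracts for t in sent), n_sent),
        "zero_tx_ratio": ratio(sum(t[3] == 0 for t in sent), n_sent),
        "erc20_ratio": ratio(sum(t[2] in erc20 for t in sent), n_sent),
        "unique_receiver": len({t[2] for t in sent}),
        "unique_sender": len({t[1] for t in received}),
        "active_days": (max(days) - min(days)).days if days else None,
        "idle_time": (extraction_date - max(days)).days if days else None,
    }


# Toy data with placeholder addresses.
txs = [
    (date(2021, 1, 1), "0xu", "0xtoken", 0),
    (date(2021, 1, 11), "0xu", "0xfriend", 10**18),
    (date(2021, 1, 21), "0xu", "0xdex", 0),
    (date(2021, 1, 25), "0xfriend", "0xu", 5),
]
result = derived_variables(
    "0xu", txs,
    contracts={"0xtoken", "0xdex"}, erc20={"0xtoken"},
    token_out=1, token_in=3, extraction_date=date(2021, 2, 10),
)
```

Addresses without outgoing transactions yield `None` for the ratios, which avoids a division by zero and keeps inactive addresses distinguishable from addresses with a ratio of zero.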

After introducing the new variables derived from the original variable set, the Google Big
Query data and the off-chain data from Etherscan need to be combined and aggregated to
obtain a deeper insight into the specific entities of senders and recipients that are
addressed in the examined transactions.

9. To assess the involvement in the mining process and thus in the block reward
system, the list of mining pool addresses has been joined with the transactional
data of each address. By identifying the transactions with mining pools as senders,
one can make inferences about the degree of mining pool participation for each
address. The number of transactions received from mining pools is represented by
the variable Mining Pool Tx Received, which is equal to the number of block
reward payouts for pools with direct payout schemes. Furthermore, counting the
distinct mining pool addresses results in the unique number of mining pools in
which the users participated (Unique Mining Pools). The contributed
computational power to the pool is proportional to the paid-out block rewards for
proportional payout schemes. Thus, the level of mining pool contribution is
determined by the Total Amount of Ether from Mining Pools and the Average
Received Block Reward (Zamyatin et al., 2017). The Mining
Ratio is the ratio between Mining Pool Tx Received and Tx Received and
represents how much of the received transactions constitutes Ether reward
payouts.
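
The mining-related variables from step 9 can be sketched as follows, again on toy data with placeholder addresses. The function name and the tuple layout are assumptions for illustration.

```python
def mining_variables(addr, txs, mining_pools):
    """Compute the mining variables; txs holds (sender, receiver, value_eth) tuples."""
    received = [t for t in txs if t[1] == addr]
    from_pools = [t for t in received if t[0] in mining_pools]
    total = sum(t[2] for t in from_pools)
    n = len(from_pools)
    return {
        "mining_pool_tx_received": n,
        "unique_mining_pools": len({t[0] for t in from_pools}),
        "total_ether_from_mining_pools": total,
        "average_received_block_reward": total / n if n else None,
        "mining_ratio": n / len(received) if received else None,
    }


txs = [
    ("0xpoolA", "0xm", 0.5),
    ("0xpoolA", "0xm", 1.5),
    ("0xpoolB", "0xm", 1.0),
    ("0xfriend", "0xm", 2.0),
]
miner = mining_variables("0xm", txs, mining_pools={"0xpoolA", "0xpoolB"})
```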

6 Results
In this section, the analysis findings on the data whose pre-processing steps have been
described in the previous section are presented. The number of transactions an address
has sent and received serves as the primary variable to classify the relevant user groups
and analyse their usage patterns. First, the overall user structure on the Ethereum
blockchain is characterised along features, such as activity level and activity time.
Then, the blockchain’s statistical and structural properties regarding user activity are
studied to arrive at a representative address sample. A subgroup of addresses is defined
by employing a clustering algorithm that clusters the EOAs by the number of outgoing
transactions. The selection of an appropriate subgroup aims at removing outliers, such as
low-activity addresses and high-activity non-human user addresses (e.g. exchanges and
wallets). Next, the distribution of different parameters, such as SC Ratio and Zero Tx
Ratio, is analysed. The examination of ERC20 token contract and Dapp interactions
completes the first part of the results section.

After examining the overall transaction behaviour of EOAs, the mining pool participants
or miners are analysed in Section 6.2. The relevance of mining pools on the Ethereum
blockchain is illustrated by determining the share of their mined block rewards to the total
block rewards until the data extraction date. By assessing the outgoing transactions from
mining pools, all miners are identified. Subsequently, I investigate different parameters,
such as Average Received Block Reward and Unique Mining Pools, to analyse the mining
behaviour. The same transactional parameters that examine the interaction with other
addresses in Section 6.1.3 (e.g. SC Ratio) are calculated for mining pool participants as
well to pinpoint differences in transactional behaviour patterns. Lastly, I examine all
addresses that receive Ether transfers from miners to analyse how miners distribute their
Ether in the network.

The results enable us to gain more specific insights into the existing user groups and
understand which use cases and features of the Ethereum blockchain are the driving
activities for each user group.

6.1 General User Characterisation and Transaction Dynamics

In this first part of the results section, the entirety of EOAs is explored. This part gives a
general overview of the allocation of addresses on the blockchain and serves as a starting
point for further user group identification analyses.

6.1.1 Address Structure

Table 2 summarises the address structure of the Ethereum blockchain on the 10th of
February 2021. The total number of unique addresses queryable on Google BigQuery on
that date amounts to more than 152 million, with around three-quarters constituting EOAs.
By identifying the EOAs that do not occur in the transaction table, the number of inactive
addresses with no outgoing transactions can be determined. More than 21% of EOAs, around
24 million, are inactive in this sense. This high number of inactive addresses is due to an
atypical occurrence in the second year after the inception of Ethereum. No fewer than
19 million empty addresses were created in autumn 2016 during a Denial-of-Service attack,
which exploited a security flaw of the Ethereum client Geth (Bok Consulting Pty Ltd, 2016).
The attack created a large number of smart contracts, which in turn created numerous new
EOAs. The author of the attack analysis pointed out one exemplary smart contract11 that
received 4750 transactions and facilitated the address creation. As the EOAs created by
this contract are still queryable on Google BigQuery, it can be assumed that these empty
addresses make up a substantial proportion of the more than 24 million inactive accounts
found.

11 https://etherscan.io/address/0x6a0a0fc761c612c340a0e98d33b37a75e5268472

Table 2: Address Structure

OG Tx: Outgoing transactions. Percentages of sub-types refer to their parent type.

Type of Address          N              Percentage
  ERC20                  199,012        0.5 %
  ERC721                 4,394          0.01 %
  Other Contracts        37,063,390     99.49 %
Smart Contracts          37,266,796     24.4 %
  OG Tx > 0              90,994,522     78.9 %
  OG Tx = 0              24,356,227     21.1 %
EOAs                     115,350,749    75.6 %

Source: Own representation

6.1.2 Transaction Dynamics and Different Activity Levels

Considering the more than 900 million transactions that have been sent until the 10th of
February 2021, it is crucial to examine how they are distributed over the remaining
roughly 91 million active EOAs. Figure 2 illustrates the empirical cumulative distribution
function (ECDF) of the number of outgoing transactions for all EOAs with at least one
outgoing transaction. More than 47% of the active addresses have sent only one
transaction, and 93.47% have sent no more than ten transactions. This heavy skew towards
addresses with very few transactions suggests either that most users interact very
infrequently with the blockchain or that an owner uses multiple addresses, each of which
is used only once.
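
The ECDF values quoted above can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the outgoing transaction count of every active EOA is available as a list; the function name ecdf_at and the sample counts are hypothetical and not taken from the thesis data.

```python
from bisect import bisect_right


def ecdf_at(counts, threshold):
    """Share of addresses whose outgoing transaction count is <= threshold."""
    ordered = sorted(counts)
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical toy sample of outgoing transaction counts per EOA.
sample_counts = [1, 1, 1, 1, 2, 3, 5, 9, 40, 1200]
share_one = ecdf_at(sample_counts, 1)   # share of addresses with exactly one transaction
share_ten = ecdf_at(sample_counts, 10)  # share of addresses with at most ten transactions
```

Evaluating the function at 1 and at 10 on the full data corresponds to the two percentages reported in the text.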

Figure 2: ECDF of Outgoing Transaction Count

Source: Own representation

To put the distribution of the outgoing transaction count in perspective to the address
lifetime, the cumulative distribution function of the Active Days of addresses with at
least one outgoing transaction is plotted in Figure 3. As all transactions until the
11,828,337th block, which was mined on the 10th of February 2021, were considered, the
addresses with the longest lifetime have been active for 2014 days. This is consistent
with the time difference between the genesis block’s timestamp and the timestamp of the
11,828,337th block (2022 days). A majority of 64.62% of the EOAs have been active for one
day or less. This group represents addresses that have sent only one transaction or
multiple transactions within less than 24 hours. These addresses are called one-day
addresses in the following. Further, only 5.51% of all addresses have been active over a
period of 365 days or more.

Figure 3: ECDF of Active Days

Source: Own representation

An examination of the Idle Time gives insights into the number of addresses that were last
active within a specific period before the extraction date. As the majority of addresses
have only sent one transaction or are active for one day or less, the idle time of these
addresses corresponds to the time since their creation or first occurrence on the
blockchain. To distinguish this user group from the remaining population, I therefore
present the distribution of idle times separately for addresses that have been active for
one day or less and for addresses that have been active for two days or more in Figure 4.
The histogram bins the idle time in days in one-month intervals from the data extraction
date back to the inception day of Ethereum. By plotting the daily Ether price against the
idle time, a strong correlation with one-day addresses’ idle time can be visually observed.
The lighter bars, which represent the idle time of one-day addresses, show a sharp increase
in occurrences between the 30th of December 2017 and the 30th of January 2018, the highest
number within any 30-day bin. During this period, almost five million one-day addresses
made their first transaction and have been inactive ever since. This coincidence indicates
that significant surges in popularity may attract short-term Ethereum users (Sovbetov,
2018). For addresses that have been active for two days or more, the counts increase in a
less volatile manner towards the data extraction date throughout the years. This suggests a
steady increase in the number of users that were active during the same period (bin size:
30 days), with the highest number for the most recent period before the extraction date.
Around three million addresses that have been active for two or more days sent their last
transaction within 30 days of the extraction date.
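
The 30-day binning behind Figure 4 can be sketched as follows. The sketch assumes that the idle time of each address is already available in days; the function name and the sample values are illustrative only.

```python
from collections import Counter


def idle_time_histogram(idle_days, bin_size=30):
    """Count addresses per idle-time bin; bin 0 holds idle times in [0, bin_size) days."""
    return Counter(int(days // bin_size) for days in idle_days)


# Hypothetical idle times in days before the extraction date.
histogram = idle_time_histogram([3, 29, 30, 45, 400])
```

Bin 0 then contains the addresses that were last active within 30 days of the extraction date, which is the group the subsequent analysis concentrates on.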

Figure 4: Comparison of Idle Times and Ether Price

Source: Own representation

After analysing the distribution of the number of outgoing transactions, active days, and
idle times of all addresses on the blockchain, the statistical properties are used to derive
a sample of EOAs that represents regular users. Including addresses that were last active a
considerable time before the extraction date would potentially enable examining a larger
variety of user groups and changes in the behaviour patterns since the genesis block.
However, different applications and features of Ethereum, such as the ERC20 token standard,
decentralised apps and mining pools, were not initially present on the blockchain and only
gained popularity or were introduced over the years after the inception. Limiting the
addresses to be analysed to a specific timeframe therefore avoids distortions of
transactional behaviour patterns. As can be seen from the idle time plot in Figure 4, a
significant and persistent increase in the Ether price and in the number of simultaneously
active users has characterised the network dynamics since 2020, with the highest monthly
increase from two months to one month before the extraction date. Thus, we limit the
following analysis to addresses with their last transaction within the last 30 days before
the extraction date (Idle Time ≤ 30), which excludes inactive addresses. Assuming that the
development of a transactional behaviour pattern requires a certain amount of usage time,
one-day addresses (Active Days ≤ 1) are also excluded from the sample to be analysed.
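
The two selection criteria can be expressed as a simple filter. The following sketch assumes that Active Days is the time between the first and the last outgoing transaction of an address and that Idle Time is the time between the last outgoing transaction and the extraction date; this reading of the two variables, the function name in_sample and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime

EXTRACTION_DATE = datetime(2021, 2, 10)  # data extraction date stated in the text


def in_sample(first_tx, last_tx, extraction=EXTRACTION_DATE):
    """Keep addresses with Idle Time <= 30 days and Active Days > 1."""
    active_days = (last_tx - first_tx).total_seconds() / 86400
    idle_days = (extraction - last_tx).total_seconds() / 86400
    return idle_days <= 30 and active_days > 1


# Hypothetical addresses, given as (first outgoing tx, last outgoing tx).
recent_long = in_sample(datetime(2020, 6, 1), datetime(2021, 2, 1))
one_day = in_sample(datetime(2021, 2, 1, 8), datetime(2021, 2, 1, 20))
dormant = in_sample(datetime(2017, 12, 30), datetime(2018, 3, 1))
```

Only the first address passes both criteria; the second is a one-day address and the third has been idle for far longer than 30 days.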

Examining the number of outgoing transactions of the remaining addresses, the large range
of 31,098,717 transactions implies the existence of significant outliers that have sent an
anomalously high number of transactions. Querying the five addresses12 with the highest
transaction count, we observed that all of them belong to a mining pool or an exchange and
thus not to a human user. To arrive at a set of regular users, both accounts with low
activity and accounts with transaction counts at the high end of the distribution must be
removed in a principled way. A clustering algorithm yields breaks by which the data can be
divided. As the transaction count distribution exhibits the properties of a heavy-tailed
distribution, with significantly higher frequencies for low values and lower frequencies
for high values, the head/tail breaks algorithm should be preferred over the more commonly
used Jenks natural breaks classification method13 for clustering one-dimensional data
(Jiang, 2013). Jiang (2013) describes that the algorithm iterates through the data by
repeatedly partitioning the values in the head of the distribution until a heavy-tailed
distribution can no longer be observed for the newly partitioned values, thereby clustering
the data without requiring external input on the number of breaks or clusters. Both the
cluster sizes and the intervals preserve the heavy-tailed nature of the data.
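
The idea of head/tail breaks can be illustrated with a short sketch. It splits the data at the mean and continues with the values above the mean (the head) as long as the head remains a minority. The 40% limit used as the stopping criterion is a commonly used convention and an assumption here, as are the function names and the toy data; the sketch is not the implementation used for the results of this section.

```python
from bisect import bisect_left


def head_tail_breaks(values, head_limit=0.4):
    """Return the break points (means) produced by repeatedly splitting the head."""
    breaks = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        breaks.append(mean)
        head = [v for v in current if v > mean]
        # Stop once the head is no longer a minority, i.e. no heavy tail remains.
        if len(head) / len(current) > head_limit:
            break
        current = head
    return breaks


def cluster_index(value, breaks):
    """Zero-based cluster of a value; values equal to a break stay in the lower cluster."""
    return bisect_left(breaks, value)


# Hypothetical heavy-tailed toy sample of outgoing transaction counts.
toy_counts = [1] * 8 + [2] * 4 + [10, 20, 100]
toy_breaks = head_tail_breaks(toy_counts)
toy_clusters = [cluster_index(v, toy_breaks) for v in (1, 20, 100)]
```

For the toy sample, two breaks and therefore three clusters result. No number of clusters has to be supplied in advance, which is the property the text relies on.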

The algorithm groups the approximately three million addresses that were last active within
30 days before the extraction date and were active for longer than one day into nine
clusters. Figure 5 shows the histogram of the nine clusters with a logarithmic y-axis. The
clustering results suggest that the addresses in the first cluster, with transaction counts
between 2 and 123, are low-activity addresses. Being the overall majority, they account for
more than 94% of the addresses under consideration. Furthermore, addresses in the third
cluster and upwards can be regarded as abnormally active addresses, such as those belonging
to an exchange or a mining pool. Thus, the set of EOAs representing regular users consists
of the second cluster, which contains around 166,000 addresses with outgoing transaction
counts between 124 and 1,894. Hereinafter these addresses are referred to as regular users.

12 Ethermine: https://etherscan.io/address/0xea674fdde714fd979de3edf0f56aa9716b898ec8,
Nanopool: https://etherscan.io/address/0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5,
Bittrex: https://etherscan.io/address/0xfbb1b73c4f0bda4f67dca266ce6ef42f520fbb98,
F2Pool: https://etherscan.io/address/0x829bd824b016326a401d083b33d092293333a830,
Binance: https://etherscan.io/address/0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be
13 Clustering algorithm that minimises the deviation of members from their cluster mean and
maximises the deviation from the means of other clusters.

Figure 5: Cluster Sizes with Logarithmical Scale

Source: Own representation

To evaluate the clustering process and the assumptions that resulted in the set of regular
users, the Active Days distribution of regular users is compared with that of all users
before clustering. Both distributions are plotted against each other in Figure 6. Since
both datasets represent recently active addresses (Idle Time ≤ 30), subtracting the Active
Days from the extraction date roughly yields the creation date. The distribution for the
set of addresses before clustering shows an apparent heavy-tail property, which is
consistent with the empirical cumulative distribution function for the entirety of
addresses with at least one outgoing transaction in Figure 3. In contrast, the distribution
for the regular users does not follow a heavy-tailed pattern. It suggests that in periods of
high volatility (January 2018 and January to February 2021), the number of new users that
become long-term users, i.e. addresses that have remained active since then, is higher than
in periods of low volatility.

Figure 6: Active Days before and after Head/Tail Breaks Clustering

Source: Own representation

6.1.3 Smart Contract Interaction

Having defined the set of addresses to be studied, the regular users, we employ the
transaction- and token-centred variables from Section 5.3 to analyse the general behaviour
pattern. To examine the more specific characteristics of the transactions sent by regular
users as well as the recipients that they interact with, SC Ratio, Zero Tx Ratio, ERC20 Tx
Sent and the Out/In Token Ratio are calculated. Figure 7 displays the distribution of the
SC Ratio of regular users, where zero implies that the address has exclusively interacted
with other EOAs and one implies that it has sent all transactions to smart contracts. The
number of bins was set to 100 to capture more details of the distribution. As the histogram
indicates, most regular users sent more transactions to smart contracts than to other EOAs.
The mean of the SC Ratio is 0.76, and the median is 0.92. However, at both the low end and
the high end of the distribution, a deviation from the rest of the plot can be seen. More
than 14,000 of the 166,000 regular users send fewer than 1% of their outgoing transactions
to smart contracts or none at all. On the other hand, more than 16,000 users exclusively
sent transactions to smart contracts. These findings are not consistent with the
quantitative analysis of the Ethereum blockchain by Anoaica and Levard (2018). Their
analysis of the transactions from the genesis block until a block mined in August 2017
found a transaction pattern skewed towards user-to-user transactions. The proportion of
user-to-smart-contract transactions in all transactions, on the other hand, was only 34.3%
for the examined timeframe. Our findings possibly suggest a significant general transition
to a more smart-contract-oriented transaction behaviour. It should be noted that the
restriction to regular users might have an impact on this figure as well because the most
active accounts, which are mining pools and exchanges, have been removed. These addresses
send the majority of their transactions to EOAs.
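
As an illustration of how the SC Ratio of a single address can be obtained, the following minimal sketch assumes that the recipients of its outgoing transactions and the set of known smart contract addresses are available; the function name and the shortened addresses are hypothetical.

```python
def sc_ratio(recipients, contract_addresses):
    """Share of outgoing transactions whose recipient is a smart contract."""
    to_contracts = sum(1 for recipient in recipients if recipient in contract_addresses)
    return to_contracts / len(recipients)


# Hypothetical data: two contracts, one address with four outgoing transactions.
contracts = {"0xc1", "0xc2"}
recipients = ["0xc1", "0xc1", "0xe1", "0xc2"]
ratio = sc_ratio(recipients, contracts)
```

A value of zero means that the address only sent transactions to other EOAs, while a value of one means that all recipients were smart contracts.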

Figure 7: Smart Contract Ratio

Source: Own representation

Transactions do not necessarily have to carry an Ether value. Transactions sent to smart
contracts to invoke their functions often do not send Ether to the smart contract. The
interaction with an ERC20 token contract where no Ether is transferred is an example of a
zero-value transaction. Invoking the token contract’s functions to change the ownership of
someone’s tokens or to set the desired allowance amount does not require the transfer of
Ether. Only Ether in the form of gas for the transaction processing is mandatory. Figure 8
shows the distribution of the proportion of zero-value transactions in the total number of
outgoing transactions (Zero Tx Ratio). The visual impression of this distribution is
similar to that of the SC Ratio distribution. It clearly shows outlying addresses at both
the minimum and the maximum x-value of the histogram. Approximately 20,000 of the 166,000
regular users have transferred Ether in all of their transactions, and around 12,500
addresses did not transfer Ether in more than 99% of their transactions. The mean and the
median of the Zero Tx Ratio are 0.63 and 0.70, respectively. To test the hypothesis of a
higher zero-value transaction proportion for smart contract transactions, the Zero Tx Ratio
is plotted for all outgoing transactions to smart contracts in Figure 9. The distribution
shows a significantly different pattern as the Zero Tx Ratio is noticeably higher for
transactions to smart contracts. The mean and the median are 0.79 and 0.84, respectively.
Approximately 20% of all regular users (around 35,000) with at least one outgoing
transaction to a smart contract never transferred Ether to a smart contract. Both analyses
indicate that most of the transactions sent by regular users do not transfer any Ether to
the recipient and that zero-value transactions to smart contracts predominate.
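
The two variants of the Zero Tx Ratio, over all outgoing transactions and restricted to transactions to smart contracts, can be sketched as follows. The dictionary keys, the function name and the sample transactions are assumptions made for illustration.

```python
def zero_tx_ratio(transactions, only_to=None):
    """Share of zero-value transactions, optionally restricted to recipients in only_to."""
    selected = [tx for tx in transactions if only_to is None or tx["to"] in only_to]
    if not selected:
        return None  # the ratio is undefined without matching transactions
    return sum(1 for tx in selected if tx["value"] == 0) / len(selected)


# Hypothetical outgoing transactions of one address (value in Ether).
contracts = {"0xc1", "0xc2"}
txs = [
    {"to": "0xc1", "value": 0},
    {"to": "0xc1", "value": 0},
    {"to": "0xc2", "value": 5},
    {"to": "0xe1", "value": 1},
    {"to": "0xe2", "value": 2},
]
ratio_all = zero_tx_ratio(txs)
ratio_contracts = zero_tx_ratio(txs, only_to=contracts)
```

In this toy example the ratio is higher when only transactions to smart contracts are considered, which mirrors the direction of the difference between Figure 8 and Figure 9.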

Figure 8: Zero-Value Ratio for all Transactions

Source: Own representation

Figure 9: Zero-Value Ratio for Transactions to Smart Contracts

Source: Own representation

6.1.4 ERC20 Contract Interaction and Token Transfer Dynamics

To further analyse the structure of smart contracts that regular users use, we examine the
number of transactions sent to ERC20 contracts. ERC20 token contracts constitute an
essential subgroup of smart contracts on the Ethereum blockchain, as ERC20 is the most
common token standard at the time of data extraction. As mentioned earlier, for an EOA to
invoke the different token-related functions, including the transfer function, it must send
a transaction to the corresponding ERC20 contract. The relationship between transactions to
ERC20 contracts and actual token transfers is examined in the following. Figure 10
represents the histogram of the number of transactions sent to ERC20 contracts (ERC20 Tx
Sent). The number of outgoing transactions of regular users ranges between 124 and 1,894
(as defined during the clustering process in Section 6.1.2), and the mean and median are
365 and 245 transactions, respectively. However, the mean and median for transactions to
ERC20 contracts are only 63 and 14. Furthermore, transactions to ERC20 contracts amount to
only 18% of all transactions and 25% of transactions to smart contracts (ERC20 Ratio).

Figure 11 depicts the number of incoming and outgoing token transfers of the regular users.
With 248 incoming and 184 outgoing token transfers on average, the studied address set
exhibits a higher in-degree, with approximately 34.8% more incoming transfers than outgoing
ones. As tokens can also be distributed to arbitrary recipients during an airdrop event, an
address might accumulate tokens over time that its owner did not intend to receive in the
first place (Victor, 2020). These tokens might partly account for the higher token
in-degree (the mean and median of the Out/In Token Ratio are 0.44 and 0.45). However, the
relatively large difference between incoming and outgoing token transfers might also be due
to a high proportion of token investors who hold tokens rather than sell them. Moreover,
the large difference between the number of ERC20 transactions (ERC20 Tx Sent) and the
actual token transfers (incoming and outgoing) suggests that a significant proportion of
the token transfer activity is not caused by direct interactions with ERC20 contracts.
Decentralised exchanges, for example, offer the possibility of trading different tokens and
Ether with other users on the blockchain. Tokens can also be purchased from centralised
exchanges, such as Coinbase and Binance, with fiat money.

Figure 10: Number of ERC20 Contract Transactions

Source: Own representation

Figure 11: Comparison of Incoming and Outgoing Token Transfers of Regular Users

Source: Own representation

6.1.5 Dapp Interaction

After analysing the interaction of regular users with smart contracts in general and ERC20
contracts in particular, an overview of the Dapp usage is given. As the Dapp data from
State of the Dapps provides category labels for each Dapp, the activity levels of the Dapp
categories are compared with each other. We augment the Dapp data by adding the Etherscan
labels for centralised and decentralised exchanges. As the Dapp list already contains an
exchanges category, overlapping addresses are removed from the analysis. However, only a
negligible number of overlapping addresses had to be removed because the Etherscan exchange
list contains major exchanges, such as Binance and Bittrex, that broker exchanges and offer
off-chain access. In contrast, the exchanges category of the Dapps list rather contains
services and applications for on-chain users.

Table 3 shows the activity level of the ten most relevant Dapp categories by the number of
transactions they have received from regular users. More than 50% of all regular users have
sent at least one transaction to an exchange Dapp. The second most popular Dapp category by
user coverage is the finance category, as 43% of the regular users have used at least one
finance Dapp. Gaming Dapps have the highest number of transactions per user: their users
send on average more than 40 transactions to such Dapps. The high transaction count
relative to the lower number of users might suggest that gaming Dapps require a higher
engagement and a higher number of transactions or that users of those Dapps generally
interact more frequently with the blockchain. CoinGathernator14 is the most widespread
exchange Dapp, used by more than 26% (44,329) of all regular users. The most widespread
finance Dapp is MakerDAO15, used by 33% (55,230) of all regular users. The gaming Dapp with
the highest reach, measured as the proportion of regular users who have sent a transaction
to one of the Dapp’s addresses, is BRAVE FRONTIER HEROES16. More than 7% (11,818) of the
regular users have engaged with this Dapp.

14 https://www.stateofthedapps.com/dapps/coingathernator
15 https://www.stateofthedapps.com/dapps/makerdao
16 https://www.stateofthedapps.com/dapps/brave-frontier-heroes
17 https://uniswap.org/

Table 4 presents the activity level of centralised and decentralised exchanges.
Transactions to decentralised exchanges account for almost 20% of regular users’ outgoing
transactions (11,894,999 of 60,931,081 total outgoing transactions). In comparison, the
number of transactions to centralised exchanges only amounts to 2.77 million transactions
or 4.5% of all outgoing transactions. 61% of all regular users have interacted with
decentralised exchanges, but only 10% have sent a transaction to a centralised exchange.
Examining the most used exchanges of both categories, it is apparent that Uniswap17 is the
most popular decentralised exchange. It accounts for more than 90% of all transactions sent
to decentralised exchanges. However, the user distribution of centralised exchanges differs
significantly from that of decentralised ones. As the most popular centralised exchange,
Binance, only covers around 2% of all regular users or 21% of all users who use centralised
exchanges, it is evident that users are more evenly distributed over several different
exchanges. The top five centralised exchanges, Binance, Huobi, ZB.com, Yobit.net and
Poloniex, are together used by only just over 56% of centralised exchange users. The sum of
distinct users over the individual centralised exchanges is roughly equal to the number of
distinct users of the whole category. This indicates that users neither use different
exchanges simultaneously nor switch exchanges, as the sum over the individual exchanges
would otherwise be higher than the category total. Under the assumption that the on-chain
interactions reflect the actual activity to a great extent, the findings suggest both a
higher competition among centralised exchanges and a low rate of users moving to other
exchanges.
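
The overlap argument for centralised exchanges can be made explicit with a small sketch. It compares the sum of distinct users per exchange with the number of distinct users of the whole category; a difference of zero means that no user appears at more than one exchange. The exchange names, user identifiers and the function name are hypothetical.

```python
def overlap_excess(users_by_exchange):
    """Sum of per-exchange distinct users minus distinct users of the whole category."""
    per_exchange_sum = sum(len(users) for users in users_by_exchange.values())
    category_total = len(set().union(*users_by_exchange.values()))
    return per_exchange_sum - category_total


# Hypothetical sender sets per exchange.
no_overlap = overlap_excess({"A": {"u1", "u2"}, "B": {"u3"}, "C": {"u4"}})
with_overlap = overlap_excess({"A": {"u1", "u2"}, "B": {"u3"}, "C": {"u2", "u4"}})
```

An excess close to zero, as reported for the centralised exchanges, is what one would expect if users rarely send transactions to more than one exchange.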

To further understand the functional mechanism of centralised exchange transactions, the
transaction overview18 of the most active exchange, Binance, is examined on Etherscan. All
incoming transactions to the Binance address have been sent by unlabelled EOAs.
Furthermore, every transaction transfers Ether to Binance. These Ether transfers likely
represent Ether deposits to the exchange platform, which are then used for trading against
other tokens or cryptocurrencies. Since centralised exchanges, such as Binance, have their
own internal exchange and trading mechanisms, which are not tracked on the blockchain, it
is impossible to trace trades, such as how much Ether has been exchanged for which assets
by which address. The low incoming transaction count of centralised exchanges compared to
decentralised exchanges might suggest that regular users prefer to deposit fiat money
off-chain to fund trading activities, possibly indicating a proportion of users that cannot
be traced from on-chain transactions to the exchange.

In contrast, the incoming transaction count of decentralised exchanges accurately
represents the entire activity, as these exchanges facilitate a peer-to-peer exchange of
cryptocurrencies without relying on a third party. Therefore, all relevant information
about trades is retrievable from the blockchain.

18 Binance incoming transactions:
https://etherscan.io/txs?a=0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be&f=3

Table 3: Activity Level of Top 10 Dapp

Top 10 Dapp categories by incoming transaction count from regular users (State of the Dapps labels)

Category       Transaction count   Users     Transactions/User   User/Regular User Ratio
Exchange       2,719,736           83,831    32.4                0.51
Games          1,301,068           32,328    40.2                0.19
Finance        1,287,333           72,183    17.8                0.43
Marketplace    616,781             25,416    24.3                0.15
High Risk      547,199             17,089    32.0                0.10
Gambling       323,425             9,412     34.4                0.06
Security       208,472             32,145    6.5                 0.19
Identity       168,903             6,599     25.6                0.04
Wallet         167,815             26,794    6.3                 0.16
Social         117,051             15,852    7.4                 0.10

Source: Own representation

Table 4: Activity Level of Exchanges (Etherscan Labels)

Exchanges                 Transaction count   Users      Transactions/User   User/Regular User Ratio
Centralised Exchanges     2,769,969           16,157     171.44              0.10
Decentralised Exchanges   11,894,999          100,980    117.80              0.61

Source: Own representation

6.2 Mining Pool Participants

After analysing the general transactional dynamics and the behaviour patterns focusing
on regular users, a subset of EOAs selected based on transaction activity, a more in-depth
analysis of a functional subgroup of users will be conducted. This section will provide an
overview of different findings regarding the user characterisation and transaction
behaviour of mining pool participants. Furthermore, I investigate patterns in the outgoing
transactions of mining pool participants to discover Ether spending behaviour. The results
will then be put into context to derive relevant implications about the decentralisation of
the Ethereum blockchain and the role of mining pools and their participants in the whole
network.

6.2.1 Mining Pool Structure

Mining pools are entities that aggregate the computational power of different participants
and participate in the consensus mechanism of the Ethereum blockchain with the combined
hashing power. Participation in these mining pools is motivated by the block reward, which
is, in most cases, shared among all participants when a block is successfully mined. This
provides a more stable revenue as the probability of successfully mining new blocks with
the aggregated computational power is higher than that of finding new blocks as a single
miner (Zamyatin et al., 2017). The most common reward payout scheme transfers the Ether
from the block reward directly to an EOA representing a mining pool participant (in the
following also referred to as miner). This direct on-chain payout scheme is used by major
mining pools, such as Ethermine19, Nanopool20 and F2Pool21. Hence, the miners can be
identified by tracking the outgoing transactions from these mining pools.
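
For pools with a direct payout scheme, the identification of miners reduces to collecting the recipients of Ether transfers sent by labelled pool addresses. The following sketch illustrates this under simplifying assumptions: proxy addresses are not handled, the variable Unique Mining Pools is read as the number of distinct pools that paid an address, and all names and values are hypothetical.

```python
from collections import defaultdict


def miners_by_pool_payouts(transfers, pool_addresses):
    """Map each payout recipient to the set of mining pools that paid it."""
    pools_per_recipient = defaultdict(set)
    for sender, recipient, value_eth in transfers:
        # Only Ether transfers that originate from a labelled pool count as payouts.
        if sender in pool_addresses and value_eth > 0:
            pools_per_recipient[recipient].add(sender)
    return dict(pools_per_recipient)


# Hypothetical transfers, given as (sender, recipient, value in Ether).
pools = {"pool_a", "pool_b"}
transfers = [
    ("pool_a", "miner_1", 0.4),
    ("pool_b", "miner_1", 0.2),
    ("pool_a", "miner_2", 0.7),
    ("user_x", "miner_3", 1.0),  # not a pool payout, therefore ignored
]
miners = miners_by_pool_payouts(transfers, pools)
unique_pools = {address: len(paying_pools) for address, paying_pools in miners.items()}
```

Recipients that are in fact payout proxies would be misclassified as miners by this simple approach, which is why they have to be identified separately.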

As extensively described in Section 5.2, a total number of 72 addresses from 72 different
mining pools have been extracted from Etherscan. The mining pool addresses represent the
EOAs to which the block reward is transferred after successfully adding a new block to the
blockchain or finding an uncle block. Table 5 shows the top five mining pools by the number
of blocks mined until the 11,828,337th block, which corresponds to our data extraction
date. More than 60% of all blocks and 52% of all rewarded Ether until the 11,828,337th
block have been mined by the top five mining pools. The blocks mined by the remaining 67
labelled mining pools account for a further 16.8% of all mined blocks (77.78% for all
labelled pools less 60.97% for the top five). Since our scraped labels account for 77.78%
of all mined blocks and 72.74% of the rewarded Ether, identifying the miners, who
ultimately bring the Ether into circulation, facilitates analyses of which mining pools are
the most popular, how mined Ether is spent and how centralised the distribution of rewarded
Ether is. It should be noted that the around 42.6 million mined Ether do not include the
around 72 million premined Ether, which existed before any Ether was generated through the
mining mechanism. Therefore, the circulation of those premined Ether cannot be tracked by
identifying miners.
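
The cumulative shares of the top five mining pools can be recomputed from the block counts and rewards reported in Table 5. The following sketch only restates that arithmetic; the variable names are illustrative.

```python
TOTAL_BLOCKS = 11_828_337          # blocks mined until the extraction date
TOTAL_REWARD_ETH = 42_602_285.34   # total mined Ether at the extraction date

# (blocks mined, reward in ETH) of the top five pools as listed in Table 5.
top_five = {
    "Ethermine": (2_319_599, 6_946_007.69),
    "Spark Pool": (1_709_956, 3_908_615.72),
    "Nanopool": (1_152_005, 3_948_609.69),
    "F2Pool": (1_090_311, 3_041_903.31),
    "DwarfPool": (940_219, 4_471_246.81),
}

block_share = sum(blocks for blocks, _ in top_five.values()) / TOTAL_BLOCKS
reward_share = sum(reward for _, reward in top_five.values()) / TOTAL_REWARD_ETH
```

Both results agree with the cumulative shares in the table, i.e. roughly 61% of all blocks and 52% of all rewarded Ether.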

19 https://ethermine.org/
20 https://nanopool.org/
21 https://www.f2pool.com/

Table 5: Top 5 of 72 Mining Pools by Blocks Mined

Reward Share represents the proportion of Ether received as a block or uncle reward to the
total number of mined Ether (also mined by other miners not in the labelled list) at the
time of data extraction (42,602,285.34 ETH). Block Share represents the proportion of
blocks mined to the total number of blocks mined (also mined by other miners not in the
labelled list) at the time of data extraction (11,828,337 blocks).

Address      Name         Reward in ETH    Reward   Cum. Rew.   Blocks       Block    Cum. Block
Shortened                                  Share    Share       Mined        Share    Share
0xea674fdd   Ethermine     6,946,007.69    16.30%   16.30%       2,319,599   19.61%   19.61%
0x5a0b54d5   Spark Pool    3,908,615.72     9.17%   25.47%       1,709,956   14.46%   34.07%
0x52bc44d5   Nanopool      3,948,609.69     9.27%   34.74%       1,152,005    9.74%   43.81%
0x829bd824   F2Pool        3,041,903.31     7.14%   41.88%       1,090,311    9.22%   53.02%
0x2a65aca4   DwarfPool     4,471,246.81    10.50%   52.38%         940,219    7.95%   60.97%
…            …             …               …        …            …            …        …
Sum 72 Mining Pools       30,987,574.22    72.74%   -            9,192,332   77.78%   -
Non-Labelled Miners       11,614,711.12    27.26%   -            2,636,005   22.22%   -
Total                     42,602,285.34    100%     -           11,828,337   100%     -

Source: Own representation


Source: Own representation

6.2.2 Miner Identification

As mentioned earlier, the identification of miners in a mining pool is trivial in the
majority of cases. Four of the top five mining pools use a direct payout scheme, so their
miners can be tracked by examining the recipients of outgoing Ether transfers. However,
miners at mining pools that use proxy addresses for payouts cannot be identified directly.
Spark Pool22, for example, transfers the Ether reward to a proxy address, which then
initiates Ether transfers to the miners. In order to differentiate between miner addresses
and proxy addresses, the structure of individual payouts is examined for every mining pool
on Etherscan. Mining pools that employ the most common proportional payout scheme transfer
the mined Ether after every successfully found block. The amount transferred is determined
by the share of computational power that each miner has contributed. Given these
assumptions, the direct proportional payout scheme leads to a high number of regular
payouts for every mined block. The number of payouts per found block is equal to the number
of participants who contributed to finding the corresponding block. The structure of
outgoing transactions of Ethermine23 shows that this mining pool uses a direct payout
scheme, as the time difference between most consecutive payouts is within one second and
the value of most payouts is less than one Ether. However, mining pools that pay out to
proxy addresses should demonstrate a different pattern regarding the amount of Ether
transferred and the number of unique recipient addresses. Spark Pool24 transfers Ether
amounts in the two- to three-digit range to only one specific proxy address at intervals in
the two-digit minute range. This proxy address then forwards the Ether in a structure
similar to that of the Ethermine address. It should be noted that Spark Pool has
transitioned from a direct payout pattern to a proxy pattern in the past.

22 https://www.sparkpool.com/
23 https://etherscan.io/txs?a=0xea674fdde714fd979de3edf0f56aa9716b898ec8&f=2

To identify the proxy addresses among the addresses that received Ether from mining pools,
a list of all Ether payouts was constructed, containing the corresponding mining pool, the
recipient address, the transferred amount and the timestamp.

Under the assumption that miners are homogeneous regarding their contributed computational
power, an examination of the ratio between mined blocks and unique recipient addresses
should reveal proxy addresses. However, that approach does not consider the possibility of
a payout scheme change, which, for example, Spark Pool has gone through.

In the following, it is assumed that only one distributing proxy address is used by mining
pools that follow the proxy pattern. A two-step heuristic is then used on every outgoing
payout sent by mining pool addresses to identify proxy addresses:

1) The highest number of consecutive occurrences for every recipient address is
calculated for each mining pool. Consecutive occurrences for a specific recipient
address are counted when a specific mining pool transfers Ether to the same
recipient address in two or more consecutive payouts. The highest consecutive
occurrence count is then compared to the counts of the five recipient addresses
that rank directly below it. If the recipient address with the highest count has a count that is at
least fifty-fold higher than that of each of the next five addresses, it is considered
a reward-distributing proxy address.

It is assumed that miners at pools with direct payout patterns are rarely paid more
than once consecutively because every contributing miner is paid in each round.

24 https://etherscan.io/address/0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c

However, as proxy addresses receive the mined ether first, they should be the
predominant first-degree receivers and thus have a high consecutive occurrence
count.

2) After identifying the potential proxy addresses, the structure of outgoing
transactions for these addresses is examined on Etherscan. Assuming that proxy
addresses forward the Ether only to the miners, a candidate address is not
considered a proxy address if it has outgoing transactions to any kind of
exchange25. Furthermore, it is not a proxy address if it has received Ether from
other mining pools or sources. Finally, if the last 100 outgoing transactions of a
remaining candidate have a low variance in the Ether value transferred and go to more
than fifty different externally owned accounts, it is regarded as a distributing
proxy address.
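
The first step of the heuristic can be sketched in a few lines of Python. This is a minimal illustration rather than the query used in this study: the input format (time-ordered `(pool, recipient)` pairs), the function names `longest_runs` and `proxy_candidate`, and the treatment of recipients that are never paid consecutively (counted as zero) are assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby


def longest_runs(payouts):
    """Longest run of consecutive payouts per recipient, for each pool.

    payouts: iterable of (pool, recipient) pairs, assumed to be ordered
    by time within each pool. Runs shorter than two payouts count as 0.
    """
    by_pool = defaultdict(list)
    for pool, recipient in payouts:
        by_pool[pool].append(recipient)

    runs = {}
    for pool, recipients in by_pool.items():
        best = defaultdict(int)
        for recipient, group in groupby(recipients):
            length = sum(1 for _ in group)
            counted = length if length >= 2 else 0
            best[recipient] = max(best[recipient], counted)
        runs[pool] = dict(best)
    return runs


def proxy_candidate(run_counts, factor=50, compare_n=5):
    """Return the top recipient if its count is at least `factor` times
    the count of each of the next `compare_n` recipients, else None."""
    ranked = sorted(run_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no recipient was ever paid consecutively
    top_address, top_count = ranked[0]
    runners_up = ranked[1:1 + compare_n]
    if all(top_count >= factor * count for _, count in runners_up):
        return top_address
    return None


# Toy data: pool "P" pays a proxy 100 times in a row, pool "D" pays directly.
payouts = [("P", "proxy")] * 100
payouts += [("P", r) for r in ["a", "b", "a", "a", "c"]]
payouts += [("D", r) for r in ["a", "b", "c", "a", "b", "c"]]
runs = longest_runs(payouts)
print(proxy_candidate(runs["P"]), proxy_candidate(runs["D"]))
```

A candidate returned by this step would still have to pass the manual checks of the second step before being treated as a proxy address.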

According to the described heuristic, six out of 72 mining pools use a proxy address. The
proxy addresses are added to the mining pool addresses but removed from the recipients
of payouts to account for the varying payout patterns. The miners are identified by
querying the recipients of Ether transfers initiated by mining pool addresses. The query
resulted in a total number of 1,342,602 unique recipient addresses (miners) that have
received Ether from mining pool addresses or their proxy addresses until the data
extraction date. However, to limit the analysis to recently and sufficiently active mining
pool participants, only the miners within the set of regular users (defined in Section 6.1.2)
are examined in the remainder of this section.
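
As an illustration of the identification step, the following sketch collects the recipients of payouts from pool and proxy addresses and removes the proxies from the result. The input format (`(sender, recipient)` pairs), the function name `identify_miners` and the toy addresses are hypothetical and do not reproduce the actual query.

```python
def identify_miners(transfers, pool_addresses, proxy_addresses):
    """Return the recipients of payouts sent by pools or their proxies.

    transfers: iterable of (sender, recipient) pairs (assumed format).
    Proxy addresses count as payout senders but are removed from the
    recipients, mirroring the treatment described in the text.
    """
    proxies = set(proxy_addresses)
    payout_senders = set(pool_addresses) | proxies
    recipients = {to for sender, to in transfers if sender in payout_senders}
    return recipients - proxies


# Toy example: one pool pays a proxy and one miner directly.
transfers = [
    ("pool", "proxy"),
    ("proxy", "m1"),
    ("proxy", "m2"),
    ("pool", "m3"),
    ("other", "x"),
]
miners = identify_miners(transfers, ["pool"], ["proxy"])
print(sorted(miners))
```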

6.2.3 Mining Behaviour

Out of the roughly 166,000 regular users, 18,617 have received a transaction from a mining
pool or its proxy address and are considered miners. Table 6 presents general descriptive
figures of miners regarding their received Ether and the number of mining pools they
have used. Looking at the incoming transactions, it is evident that most of the miners use
their addresses solely as a recipient address to receive the mining pool payouts. This is
illustrated by the high mean for the ratio of incoming mining pool payouts to the total

25
Centralised or decentralised exchanges that are defined by the labels obtained from Etherscan

number of incoming transactions (Mining Pool Tx Received/Tx Received). Half of the
miners receive at least 99% of their incoming transactions from mining pools as reward payouts.
Under the assumption that the owner of a miner address generally also engages with
functionalities of the Ethereum blockchain other than mining, such as smart contracts and
Dapps, this finding might suggest a tendency to obfuscate transaction
activities by using additional addresses for other purposes. This hypothesis is further
examined in the following subsections 6.2.4 and 6.2.5, where the structure of miners’ outgoing
transactions is analysed.

Although 72 labelled mining pools have been obtained from Etherscan, at least half
of the miners use two or fewer mining pools (median of Unique Mining Pools is 2).
The mean and median for Unique Senders are 11 and 4, respectively. This finding seems
to indicate a lack of motivation for miners to change to other mining pools. This might
be due to entry barriers, such as software requirements or registration, for participating in
new mining pools. Zamyatin et al. (2017) describe a Pay-Per-Last-N-Shares
payout scheme, which aims at preventing pool hopping by paying the reward only
after a defined amount of contribution has been submitted. However, the sole examination of
on-chain transaction data does not reveal the existence of such a scheme for a mining pool.

A miner receives 361 payouts on average (Mining Pool Tx Received).
Furthermore, half of the miners receive 480 (median) or fewer payouts from mining pools.
This indicates a higher in-degree for miners than for non-mining users, as the mean and
median for the number of incoming transactions for non-miners are only 78 and 27,
respectively. Figure 12 illustrates the distribution of incoming payouts for miners plotted
against the incoming transactions of non-mining users. It appears that, unlike the
incoming transactions for non-miners, the incoming payouts for miners do not follow a
heavy-tailed distribution. Although a small head, which might represent short-term
miners, exists up to x = 90, the distribution (yellow plot) seems to have a local maximum
at around x = 180. This suggests that miners tend to participate at least to some specific
extent. The tail of the distribution indicates an even higher activity for a proportion of the
miners.

Since most reward payouts are proportionate to the contributed computational power, as
mentioned earlier, miners are encouraged to maximise the time they spend participating
in mining pools (Zamyatin et al., 2017). This is further underlined by
the acquisition cost of the required mining hardware, whose computational power often
exceeds that of ordinary computers. The amortisation period for each miner’s mining
hardware should vary according to the hardware and thus the miner’s computational
power.

To validate whether the propensity for receiving a minimum number of rewards as a
miner is also reflected in the period of activity, the distribution of Active Days is
examined and compared to that of non-miners. If the time between payouts is roughly
equal for most miners, a distribution similar to a Gaussian distribution can be expected.
Figure 13 illustrates that most non-miners have been active for 500 days or less,
consistent with the analysis in subsection 6.1.2. On the other hand, most miners seem to
have been active for around 800 to 1300 days. The exact mean and median are 984
and 1037, respectively. Since the Idle Time of both groups is less than 30 days, as defined
earlier, one can conclude that most miners started to participate in mining pools in
the year 2017. In contrast, most non-miners were first active only within the last 300
days before the data extraction date. To test whether the average Active Days differ
significantly between miners and non-miners, a two-sample Welch’s t-test has been
conducted. Welch’s t-test is used as an alternative to Student’s t-test for samples
with unequal sizes and variances. For sample sizes of 18,617 for the miner group and
146,893 for the non-miner group, the test results indicate a statistically significant
difference between the two means (t = -172.44, α = 0.05, p < 0.01). Thus, from a time
perspective, miners are a group of users that differs significantly from non-miners, as they
have been active for a more extended period.
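
For readers unfamiliar with Welch’s t-test, the following sketch shows how the test statistic and the Welch-Satterthwaite degrees of freedom are obtained from two samples. It uses only the Python standard library; the function name `welch_t` and the toy samples are illustrative and are not the data of this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Uses the unbiased sample variances and does not assume equal
    variances or equal sample sizes.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error, group A
    se_b = variance(sample_b) / n_b  # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1)
    )
    return t, df


# Toy samples with unequal sizes and variances.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

The sign of the statistic depends only on the order in which the two groups are passed. A p-value additionally requires the t-distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns it directly.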

To gain further insights into the payouts’ internal structure, the histogram for the payout
value distribution in Ether is examined (Average Received Block Reward). 75% of all
miners receive payouts that transfer on average 0.28 Ether or less. Under the assumption
that Ether rewards are proportional to the corresponding computational power, the
above-mentioned figures suggest a relatively evenly distributed computational power among
miners in mining pools. Figure 14 illustrates the histogram for the average payout value per
address in Ether from zero to one Ether divided into 100 bins. Although the distribution
shows that most miners are concentrated around the median at 0.13 Ether, two distinct
spikes can be observed in the range between 0.0 and 0.3 Ether. More than 2300 miners
receive an average payout between 0.05 and 0.06 Ether. The second peak occurs between
0.20 and 0.21 Ether with more than 800 miners. These spikes likely represent
accumulations of miners that use similar hardware and thus contribute similar
computational power to the mining pools.

Table 6: Descriptives about Miners

                     Payouts      Average Payout   Unique Mining   Ratio Payouts to
                     Received     in ETH           Pools Used      all Inc. Txn

Mean                 361.33       0.65             1.79            85%
Standard Deviation   405.73       11.01            1.13            28%
Min                  1            0.0              1.0             0%
Median               480.0        0.13             2.0             99%
Max                  19,165.0     904.82           16.0            100%

Source: Own representation

Figure 12: Incoming Transactions for Miners and Non-Miners

Source: Own representation

Figure 13: Active Days of Miners and Non-Miners

Source: Own representation

Figure 14: Average Payout Value in Ether

Source: Own representation

6.2.4 Transaction Behaviour

The previous analyses of mining pool participants’ mining behaviour
suggest that their activity on the blockchain, judged by the prevailing incoming reward
payouts, is primarily defined by contributing to the block finding process. Furthermore,
the significant role of mining pools has been shown, underlined by
the fact that the majority of the mined Ether is generated by those mining pools. To
further investigate miners’ transactional behaviour and, most importantly, if and how
miners put their mined Ether into circulation, the previously applied analyses of the
different variables around smart contract usage, ERC20 token transfers and
transaction value will be conducted in the context of mining pool participants. As with
the analysis of regular users, the study of miners will be mainly guided by the examination
of different histograms to uncover patterns in the outgoing transactions. To highlight
miners’ distinct behaviours and thus the differences between miners and non-miners (the
subset of regular users that does not mine), the two groups’ plots will be compared. Table
7 compares the mean and median of both groups and
shows the significance of a Welch’s t-test for several variables. All variables except one
differed significantly regarding their mean.

In Figure 15, the histogram of the SC Ratio for miners and non-miners is shown with
overlapping bars. Consistent with the results in 6.1, non-miners send their transactions
primarily to smart contracts (mean = 0.83, median = 0.93). Miners, however, exhibit the
opposite behaviour. The majority of miners address their outgoing transactions to other
EOAs (SC Ratio: mean = 0.24, median = 0.00). These findings further support the previous
conjecture that miners do not use the same reward collection address for interacting with
other blockchain functionalities but forward their rewarded Ether to other EOAs first. As
a consequence of the low SC Ratio for miners, the vast majority of their transactions
represent Ether transfers. Figure 16 illustrates the relatively low ratio of zero-value
transactions (Zero Tx Ratio) of miners (mean = 0.10, median = 0.00) compared to
non-miners (mean = 0.69, median = 0.74).
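
The two ratios compared above can be computed per address as in the following sketch. It assumes that SC Ratio denotes the share of an address’s outgoing transactions that are sent to smart contracts and that Zero Tx Ratio denotes the share with zero Ether value. The input format, the function name `outgoing_ratios` and the toy addresses are hypothetical.

```python
def outgoing_ratios(transactions, contract_addresses):
    """Compute SC Ratio and Zero Tx Ratio for one sending address.

    transactions: list of (recipient, value_in_wei) tuples sent by the
    address (assumed format). contract_addresses: set of addresses known
    to be smart contracts. Returns None if nothing was sent.
    """
    total = len(transactions)
    if total == 0:
        return None
    to_contracts = sum(1 for to, _ in transactions if to in contract_addresses)
    zero_value = sum(1 for _, value in transactions if value == 0)
    return {
        "sc_ratio": to_contracts / total,
        "zero_tx_ratio": zero_value / total,
    }


# Toy example: a miner-like address forwarding Ether mostly to EOAs.
contracts = {"0xcontract"}
sent = [
    ("0xeoa1", 10**18),
    ("0xeoa1", 2 * 10**18),
    ("0xeoa2", 5 * 10**17),
    ("0xcontract", 0),
]
ratios = outgoing_ratios(sent, contracts)
print(ratios)
```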

These results suggest that the miner addresses act as proxy addresses
that collect mining rewards from mining pools and forward them to other addresses. In
order to determine whether the outgoing transaction activity of miners is well spread
among different recipients or whether there exist major recipients, the distribution of the
number of different recipients (Unique Receiver) is plotted in Figure 17 for the range one
to 200. Although the non-miner plot suggests a relatively even
distribution, with 50% of the addresses sending transactions to 46 different recipients or
fewer, the yellow depiction for the miner group reveals a markedly different pattern.
More than 50% of all miners send their transactions and thus their Ether to
fewer than four different addresses. Despite the apparent visual difference between
both plots, a Welch’s t-test shows no significant difference in their means. This
result is due to the similar spread of the number of unique recipients, which is limited by
the equal maximum number of outgoing transactions for both groups.

Although the majority of the miners send their transactions to very few addresses, a large
proportion, 25%, of the miners send their transactions to more than 64 different recipients,
indicating a more complex behaviour. By manually analysing the outgoing transactions
of the five miners26 with the highest number of unique recipients, a similar transaction
pattern to that of mining pool proxy addresses, defined in 6.2.2, can be observed. They
seem to distribute or forward the rewarded Ether to a high number of different EOAs.
Furthermore, the amount of transferred Ether per transaction lies in the same range
observed for the mining pool proxy addresses. These addresses might represent proxy
addresses of mining pools that have not been recognised by the heuristic applied in
Section 6.2.2.

Our findings regarding miners’ transaction behaviour indicate that mining pool
participants differ markedly from non-miner addresses regarding their smart contract usage,
the recipients they transfer Ether to and their token transfer activity. A large majority of
these reward-collecting addresses forward their Ether to other EOAs and thus do not
interact with smart contracts or ERC20 token contracts. This result leaves questions about
the diversity of the recipients of those Ether transfers and the subsequent
utilisation of mined Ether.

Table 7: Transaction Behaviour Miner vs Non-Miners (Mean & Median)

                 SC Ratio    Zero Tx Ratio   Unique Recipients   Out/In Ratio   Token Transfers
                                                                                Incoming

Miner            0.24/0.00   0.10/0.00       55.85/3.00          0.47/0.48      60.38/3.0
Non-Miner        0.83/0.93   0.69/0.74       66.87/46.00         0.87/0.91      268.62/137.0
Welch’s t-test   < 0.01      < 0.01          > 0.05              < 0.01         < 0.01
(p-value)

Source: Own representation

26
Miner addresses: 0x3598ac8dffa22994f0b18087035647c7a09fc389,
0xe8c8e1291ed1949da844b38ff1860a6a4c96e459,
0x57efa1377ef6c0363327ee6cffe46926e5fb682b,
0x1670f6ceaea120fe4e72c9edba3883ae7941401c,
0x7f01921af6dbcf81086c5cf88aa7c93f665278

Figure 15: Smart Contract Ratio Miners vs Non-Miners

Source: Own representation

Figure 16: Zero-Value Ratio Miners vs Non-Miners

Source: Own representation

Figure 17: Number of Unique Recipients Miners vs Non-Miners

Source: Own representation

6.2.5 Distribution of Mined Ether

As it has been shown that the rewarded Ether from mining pools accounts for more than
three-fourths of the mined Ether supply on the Ethereum blockchain, generating insights
into the structure of receiving addresses helps to understand the distribution, and hence the
centralisation, of the mined Ether over the remaining network. Therefore, all addresses (in
the following referred to as secondary receivers) that receive Ether transfers from miners
will be identified. Afterwards, the Ether distribution among these secondary receivers will
be analysed to reveal specific Ether accumulation patterns.

To further examine the proportion of received Ether rewards that are actually forwarded
to other addresses, the following analysis compares the sum of received Ether to
the sum of sent Ether. By calculating the total value of Ether transferred from all labelled
mining pools and their proxy addresses, the sum of received Ether can be determined for
the 18,617 miners who are within the set of regular users (Total Amount of Ether from
Mining Pools). 5,169,348 Ether has been transferred to miners until the extraction date.
In contrast, 12,421,736 Ether has been transferred from these miners to other addresses.
This substantial difference between the incoming and outgoing amount of Ether
implies that the Ether transferred from the labelled mining pools accounts
for only a proportion of the sum of outgoing Ether. Thus, it can be inferred that miners
also receive Ether from unlabelled mining pools, exchanges or other EOAs. Table 8
shows the deviation of the sum of outgoing Ether from the sum of incoming Ether for
different groups of miners sorted by the proportion of incoming mining pool transactions
to the total number of incoming transactions. The row Deviation indicates the ratio of the
difference between outgoing and incoming Ether to the incoming Ether. It can be observed that
miners who use their address to receive transactions from entities other than mining pools
send more Ether than they have received as mining rewards. Although this group of miners
represents a relatively small fraction of miners, they account for a disproportionately high
amount of outgoing Ether. Addresses exclusively used for participating in mining pools
(Incoming Mining Pool Transactions / All Incoming Transactions = 1.00 in Table 8,
column 4) send approximately as much Ether as they receive, as the deviation is close to
0%.

Table 8: Comparison of Incoming and Outgoing Ether for Miners

Comparison of different groups of miners divided by the proportion of incoming mining pool transactions (Mining Pool Tx Received) to the
total number of received transactions (Tx Received).

Mining Pool Tx Received / Tx Received (Percentile)

                               1                 2                 3                 4           5
                               ≥ 0.00 & < 0.88   ≥ 0.88 & < 0.99   ≥ 0.99 & < 1.00   = 1.00      All Miners
                               (~25%)            (~50%)            (~75%)            (~100%)     (0.00 – 1.00)

Group Size                     4,558             4,874             4,080             5,105       18,617
Inc. Ether from Mining Pools   881,220           2,290,909         1,329,037         668,181     5,169,348
Outg. Ether                    6,803,152         3,555,433         1,394,898         668,251     12,421,736
Deviation                      672.02%           55.20%            4.96%             0.00%       140.30%

Source: Own representation
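
The Deviation row of Table 8 can be reproduced from the two Ether rows, as the following sketch shows. The function name `deviation` and the dictionary layout are illustrative. The inputs are the rounded values printed in the table, so the last digit may differ slightly from a calculation on the unrounded data.

```python
def deviation(incoming_from_pools, outgoing):
    """Relative deviation of outgoing Ether from incoming pool Ether."""
    return (outgoing - incoming_from_pools) / incoming_from_pools


# (incoming Ether from mining pools, outgoing Ether) per column of Table 8.
table_8 = {
    "1": (881_220, 6_803_152),
    "2": (2_290_909, 3_555_433),
    "3": (1_329_037, 1_394_898),
    "4": (668_181, 668_251),
    "All Miners": (5_169_348, 12_421_736),
}

deviations = {
    column: round(100 * deviation(incoming, outgoing), 2)
    for column, (incoming, outgoing) in table_8.items()
}
for column, value in deviations.items():
    print(f"{column}: {value:.2f}%")
```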

While it can be assumed that the received Ether rewards are to some extent included in
the outgoing transactions of all miners, it cannot be determined whether a
received Ether reward has been used in a specific outgoing transaction. Therefore, only
the secondary receivers receiving Ether from miners with a Mining Ratio (Mining Pool Tx
Received / Tx Received) of 0.99 or higher
will be considered (Table 8, columns 3 and 4), as it can be assumed that most of their
outgoing Ether originates from incoming mining pool reward payouts. This group of
9,185 miners constitutes 49.3% of all miners considered and received 38.6% of all mining
rewards. The total Ether they sent to secondary receivers amounts to 2,063,150. To
analyse the structure of secondary receivers, I used a query to extract all outgoing
transactions and their recipients for the selected subgroup of miners. The considered
9,185 miners have transferred Ether in 2,904,728 transactions to 432,603 different
secondary receivers.

Although it seems that the miners distributed the rewarded Ether to a high number of
secondary receivers, a view of the most prominent addresses by Ether sent (miners) and
received (secondary receivers) gives a different impression. Table 9 provides the
number of top miners by Ether sent and top secondary receivers by Ether received
together with their share of the total Ether sent by all miners. The eighty top miners by
total Ether sent have sent more than 50% of the 2,063,150 Ether transferred to secondary
receivers. In contrast, only 12 of the top secondary receivers accumulate 50% of the
incoming Ether. Table 11 shows the top secondary receivers that, together, receive more
than 50% of all the sent Ether. Six of these secondary receivers are exchanges. In total,
more than 35% of all the Ether is sent to centralised exchanges. The remaining
proportion is primarily distributed to other EOAs (Table 10).

Table 9: Cumulative Distribution of Ether Sent and Received by Top Addresses

The values in the column Miners represent the number of top miners by Ether sent to secondary receivers that account for the corresponding
proportion to the total amount of Ether sent by the miners considered (2,063,150 Ether). The values in the column Secondary Receivers
represent the number of top receivers by Ether received from miners that account for the corresponding proportion to the total amount of
Ether sent by the miners considered (2,063,150 Ether).

Percentile Miners (9,185) Secondary Receivers (432,603)

25% 5 5

50% 80 12

75% 777 47

Source: Own representation
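
The counts in Table 9 answer the question of how many of the largest addresses are needed to reach a given share of the total Ether. A minimal sketch of this calculation is given below; the function name `top_n_for_share` and the toy amounts are illustrative and not taken from the data set.

```python
def top_n_for_share(amounts, share):
    """Smallest number of top addresses whose amounts sum to at least
    `share` (between 0 and 1) of the total."""
    total = sum(amounts)
    running = 0
    for n, amount in enumerate(sorted(amounts, reverse=True), start=1):
        running += amount
        if running >= share * total:
            return n
    return len(amounts)


# Toy example: Ether sent by five addresses.
ether_sent = [5, 50, 10, 30, 5]
counts = {p: top_n_for_share(ether_sent, p) for p in (0.25, 0.50, 0.75)}
print(counts)
```

Applied once to the Ether sent per miner and once to the Ether received per secondary receiver, a function of this kind yields the two columns of the table.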

Table 10: Ether and Address Distribution Secondary Receivers

Address Type                Ether       Share    Addresses   Share

Centralised Exchanges       738,348     35.8%    69          0.0%
Decentralised Exchanges     124         0.0%     7           0.0%
Smart Contracts             77,339      3.7%     3,285       0.8%
Externally Owned Accounts   1,247,339   60.5%    429,242     99.2%

Source: Own representation

Table 11: Top 12 Secondary Receivers by Total Ether Received

Address Label ETH Received Share Cum. Share

0x49b21bdfa30333858956342f4028ce72e37eb851 - 152,880 7.4% 7.4%

0x32be343b94f860124dc4fee278fdcbd38c102d88 Poloniex 130,814 6.3% 13.8%

0x209c4784ab1e8183cf58ca33cb740efbf3fc18ef Poloniex 2 (Contract) 118,451 5.7% 19.5%

0xf2cb2371191cdf9672f60b81e6218e22b4934dd0 - 101,469 4.9% 24.4%

0xfa52274dd61e1643d2205169732f29114bc240b3 Kraken (Contract) 98,345 4.8% 29.2%

0xf0d9fcb4fefdbd3e7929374b4632f8ad511bd7e3 - 87,967 4.3% 33.4%

0x4ef5c587e53c66cdfbc6588e29dcb100a5859263 - 81,596 4.0% 37.4%

0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be Binance 71,484 3.5% 40.9%

0x1f973b233f5ebb1e5d7cfe51b9ae4a32415a3a08 - 65,113 3.2% 44.0%

0x0d0707963952f2fba59dd06f2b425ace40b492fe Gate.io 54,322 2.6% 46.6%

0x60d0cc2ae15859f69bf74dadb8ae3bd58434976b ZB.com 50,185 2.4% 49.1%

0xebfaec3c043bb8098fcf01d7c4a35fac6676ebe5 - 44,328 2.1% 51.2%

Source: Own representation

Although a substantial proportion of mined Ether is directly sent to centralised exchanges
by miners, the remaining majority is distributed over almost 430,000 EOAs. To assess
whether these addresses exhibit a similar transactional behaviour to that of regular users
examined in Section 6.1, the number of outgoing transactions and smart contract
transactions were extracted for those EOAs. The analysis showed that 75% of these
almost 430,000 secondary receivers sent two transactions or fewer. Furthermore,
only 31% of these addresses have sent a smart contract transaction (low SC Ratio). These
findings raise questions about the relationship between miners and these low-activity
addresses and the role of these low-activity addresses on the blockchain. Furthermore, the
results suggest that a significant proportion of the low-activity accounts filtered out in Section
6.1.2 receive their Ether from a small number of miners. Therefore, the total number of
EOAs probably does not reflect a realistic number of actual users of the Ethereum
blockchain.

7 Conclusion
In the following, the paper is concluded by highlighting the key findings of the data
analysis. Furthermore, this section discusses to what extent the obtained results have
answered the research questions. Then, the limitations of this study are discussed, and
potential directions for future studies are provided.

7.1 Overall Summary and Key Findings

This paper represents one of the first attempts at identifying and describing transactional
user behaviour patterns on the Ethereum blockchain. As prior research on quantitative
analyses of on-chain data has rather focused on general transaction activity levels on a
network level, this study was designed to examine the usage of different functionalities
and applications, such as smart contracts, decentralised applications and mining pools,
from a user perspective. The analysis has been conducted on on-chain data extracted
from the cloud service Google BigQuery and included data between July 2015 and
February 2021. Furthermore, labelled data for decentralised applications, mining pools
and exchanges were used to augment the existing on-chain data. After aggregating and
transforming the data, new variables derived from the existing data have been created to
form the data foundation for the subsequent analyses. To characterise the users and their
behaviour patterns on the blockchain, I performed investigations on the data with the help
of general statistics and graphical representations. These investigations were focused on
the distribution of different variables over the examined set of users. The users were
described along three dimensions: transactional activity level, smart contract and
application interaction, and mining process engagement.

In order to analyse the existence of different user groups, a granular approach has been
used by examining the activity level of all EOAs on the blockchain. Regarding the number
of outgoing transactions, the results show that more than 47% of all EOAs have only sent
one transaction, and more than 93% have sent fewer than ten transactions. Furthermore,
almost two-thirds of all accounts have been active for less than one day. The activity level
follows a heavy-tailed distribution with a high occurrence of low-activity addresses and
a low occurrence of high-activity addresses. Additional analyses of the historical price
chart of Ether in USD revealed a significant correlation between the price and the number
of low-activity addresses created, possibly indicating a relationship between volatility and

52
participation of new users. A subset of regular users, whose transaction behaviour is
analysed in more detail, has been identified by excluding low-activity addresses and high-
activity addresses belonging to entities like wallets and exchanges.

To further examine the usage pattern of applications, such as smart contracts and
exchanges, the structure of outgoing transactions was considered. More than 90% of all
regular users, who have made 124 to 1894 outgoing transactions, have sent a transaction
to at least one smart contract. Over 50% of all regular users have sent 90% of their
transactions to smart contracts. This high proportion of smart contract interactions comes
along with a high ratio of zero-value transactions (transactions with no Ether value
transferred), indicating that Ethereum is not solely used for the purpose of currency
transfers. More than 60% of regular users use decentralised exchanges to trade
cryptocurrencies. The analysis of Dapp interaction revealed that users engage in various
applications, albeit with a tendency towards exchange, finance and gaming Dapps.

Besides using smart contracts and other functionalities of Ethereum, users also engage in
the blockchain’s mining process to a significant extent. By identifying the most active
block miners, it has been shown that mining pools account for more than three-fourths of
the mined blocks and that more than 10% of regular users participate in those pools. These
mining pool participants have been active for a significantly longer period than other
users. This difference is also reflected in the higher number of outgoing transactions.
Regarding the transactional behaviour pattern, it could be shown that miners primarily
send Ether to other EOAs and centralised exchanges. 75% of the mined Ether rewarded
to participants that use their addresses primarily for mining is sent to a number of
receivers that is only a fraction of
the initial number of miners (9,185 senders to 47 receivers). Centralised exchanges receive
around one-third of the mined Ether, possibly facilitating the exchange with fiat
currencies or other cryptocurrencies. Although most mined Ether gets forwarded and
concentrated at only a few exchanges and other EOAs, the remaining Ether is distributed
over almost fifty times as many addresses (9,185 senders to 410,000 receivers). These
secondary Ether receivers exhibit low-activity behaviour. These findings underpin the
differences in transaction behaviour and address usage between miners and non-miners.

7.2 Research Limitations

Like most exploratory data analysis studies, this study presents results and interpretations
whose validity is limited in some regards. Since the analysis of all different user groups
from the inception date would require considering the entirety of EOAs on the blockchain
and thus more complex analyses, certain thresholds have been defined to limit the study
to a subset of users. These thresholds were determined by the head/tail breaks clustering
algorithm that accounted for the underlying distribution pattern. The further exploratory
investigations were based on the assumption that end-users of the blockchain were
sufficiently well represented by the defined subset of users and that end-users
primarily use only one address to interact with the blockchain. While this might be valid for a
proportion of users, the analysis on mining pool participants has shown a more complex
address usage pattern. The exclusion of certain groups of EOAs, such as low-activity
accounts, can lead to an incomplete picture of the user base and thus distorted analysis
results if users with multiple addresses account for a substantial proportion of users.

The labelled data for decentralised applications, exchanges and mining pools was obtained
from online directories and block explorers that heavily depend on the input of third
parties. Therefore, a proportion of unlabelled yet relevant addresses was not included in
the analysis. A Dapp developer might, for example, decide not to promote their Dapp on
State of the Dapps or to omit specific addresses associated with their Dapp on the Dapp
page. This limitation would lead to an underestimation of Dapp usage in general or for
specific Dapps in particular. As centralised and decentralised exchanges often use a complex system
of intertwined smart contracts and EOAs, the limited set of labelled exchange addresses
might not account for all relevant addresses.

The identification of mining pool participants relied solely on the assumption that mining
pools primarily follow a direct round-based pay-per-share payout scheme. Following this
reasoning, the analysis identified the miners by examining all outgoing transactions from
mining pools. However, to account for any proxy addresses used to pay out to miners, a
heuristic has been introduced. Although several proxy addresses have been identified, the
inclusion in the further analyses is not based on ground truth. Lastly, no differentiation
has been made between regular users and potential institutional miners with high
computational power regarding miners’ mining activity. This could have an impact on the
observed miner transaction behaviour.

7.3 Directions for Further Research

While this study focused on transaction behaviour patterns of different user groups on the
Ethereum blockchain, it has covered the token transfer activity to only a minimal extent.
Since the activity of receiving and sending tokens does not necessarily involve the direct
interaction with the token contract by the sender or receiver, it is not adequately reflected
by the ERC20 transaction activity (ERC20 Tx Sent). An investigation on the ERC20 token
usage behaviour on a token transfer level could reveal more distinct user groups as the
functionalities of tokens on the Ethereum blockchain become more diverse over time.
Areas of interest could include analysing users who participated in specific
initial coin offerings (ICOs) or hold tokens from a specific airdrop.

This study provides an exploratory analysis of user groups and behaviour patterns on the
Ethereum blockchain. The examination of the underlying address structure can serve as a
foundation for further quantitative analyses to reveal more distinct user groups.
Unsupervised machine learning could be used to first cluster addresses of the same owner
and then cluster distinct users based on the transactional features presented in this
study. Such clustering need not be limited to features regarding smart contract or ERC20
token contract interactions. One could also cluster EOAs using labelled data and thus
differentiate between users of different exchanges or Dapps, for example.

The analysis of mining pool participants and their transaction behaviour exposed an Ether
distribution pattern that included the transfer of mined Ether to a high number of
secondary addresses. This finding raises the question of the extent to which these
secondary addresses are associated with the initial senders. The development of address
clustering heuristics has been discussed in Section 3. Since the number of addresses that
can be created and used by a single user or entity is not technically limited, controlling
multiple EOAs distorts analysis findings and complicates the analysis of user behaviour
patterns. A more holistic approach regarding possible address clusters can provide a better
picture of the entire user base and thus a better base for assessing the network health. This
approach may include the analysis of complete transaction graphs that examine Ether
transfers from the reward payout to cover subsequent transfer paths.
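The idea of following Ether from the reward payout along subsequent transfer paths can be illustrated with a breadth-first traversal. The following Python sketch is not part of the original analysis: the function name `trace_transfer_paths`, the hop limit and the toy addresses are hypothetical, and transfers are reduced to plain (sender, receiver) pairs without amounts or timestamps.

```python
from collections import deque


def trace_transfer_paths(transfers, start, max_hops=2):
    """Return {address: hop count} for addresses reachable from `start`
    by following outgoing transfers, at most `max_hops` transfers away."""
    # Build an adjacency list: sender -> set of receivers.
    graph = {}
    for sender, receiver in transfers:
        graph.setdefault(sender, set()).add(receiver)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for receiver in graph.get(current, ()):
            if receiver not in hops:  # first visit yields the shortest hop count
                hops[receiver] = hops[current] + 1
                queue.append(receiver)
    del hops[start]  # report only the downstream addresses
    return hops


# Hypothetical toy data: a pool pays a miner, who forwards to secondary addresses.
transfers = [
    ("pool", "miner"),
    ("miner", "sec_1"),
    ("miner", "sec_2"),
    ("sec_1", "exchange"),
    ("sec_2", "miner"),
]
reachable = trace_transfer_paths(transfers, "pool", max_hops=2)
```

With a hop limit of two, the traversal reaches the miner and both secondary addresses but stops before the exchange. A real analysis would additionally have to weigh amounts and timing before treating reachable addresses as belonging to the same owner.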

With the Ethereum blockchain’s ongoing transition to Ethereum 2.0, several upgrades
that are being rolled out incrementally introduce significant changes to different aspects
of the blockchain. New features such as sharding and the Proof-of-Stake mechanism are
intended to increase general scalability and economic sustainability (ethereum.org,
2021). The shift from Proof-of-Work to Proof-of-Stake renders the use of computational
power for the block finding process obsolete. Instead, a set of validators will be
responsible for the consensus finding process by locking a specific amount of Ether into a
deposit. The distribution and centralisation of Ether play an essential role in Ethereum
2.0, as validators’ votes depend on the amount of Ether staked (Ethereum Foundation, 2021).
Similar to mining pools, staking pools will facilitate participation in the consensus
finding process for individuals. Since it can be expected that a significant proportion of
the block reward under Proof-of-Stake will also be distributed to a low number of receiving
addresses, applying our findings on mining pool participants to future staking pool
participants can yield relevant implications for future developments. Although approaches
exist to disincentivise centralisation economically, examining staking pool participants
and their Ether distribution patterns can give insights into network centralisation and
how it can be regulated by future mechanisms of Ethereum 2.0.
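One simple way to quantify how concentrated a reward distribution is would be the share of the total amount received by the largest receivers. The sketch below is only an illustration under assumed inputs: the function `top_k_share`, the addresses and the amounts are hypothetical and do not come from the thesis data.

```python
def top_k_share(amounts_by_address, k):
    """Fraction of the total amount that went to the k largest receivers."""
    total = sum(amounts_by_address.values())
    if total == 0:
        return 0.0  # nothing was distributed, so there is no concentration
    largest = sorted(amounts_by_address.values(), reverse=True)[:k]
    return sum(largest) / total


# Hypothetical reward amounts per receiving address.
rewards = {"addr_a": 60, "addr_b": 25, "addr_c": 10, "addr_d": 5}
share = top_k_share(rewards, k=2)
```

A value close to 1 for a small `k` would indicate that few addresses receive most of the rewards. Because one user may control many addresses, such a measure is more meaningful after address clustering.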

References

Anoaica, A., & Levard, H. (2018, February 26–28). Quantitative description of internal
activity on the Ethereum public blockchain. In 2018 9th IFIP International
Conference on New Technologies, Mobility and Security (NTMS) (pp. 1–5). IEEE.
https://doi.org/10.1109/NTMS.2018.8328741
Béres, F., Seres, I. A., Benczúr, A. A., & Quintyne-Collins, M. (2020, May 28). Blockchain
is watching you: Profiling and deanonymizing Ethereum users.
https://arxiv.org/pdf/2005.14051
Bok Consulting Pty Ltd. (2016). Ethereum network attacker’s IP address is traceable.
Bok Consulting Pty Ltd.
https://www.bokconsulting.com.au/blog/ethereum-network-attackers-ip-address-is-traceable/#accountbloat
Buterin, V. (2021, February 9). Ethereum whitepaper. ethereum.org.
https://ethereum.org/en/whitepaper/
Cai, W., Wang, Z., Ernst, J. B., Hong, Z., Feng, C., & Leung, V. C. M. (2018).
Decentralized applications: The blockchain-empowered software system. IEEE
Access, 6, 53019–53033. https://doi.org/10.1109/ACCESS.2018.2870644
Casino, F., Dasaklis, T. K., & Patsakis, C. (2019). A systematic literature review of
blockchain-based applications: Current status, classification and open issues.
Telematics and Informatics, 36, 55–81. https://doi.org/10.1016/j.tele.2018.11.006
Chen, T., Zhu, Y., Li, Z., Chen, J., Li, X., Luo, X., Lin, X., & Zhange, X. (2018).
Understanding Ethereum via graph analysis. In IEEE INFOCOM 2018 - IEEE
Conference on Computer Communications (pp. 1484–1492). IEEE.
https://doi.org/10.1109/INFOCOM.2018.8486401
Chen, W., Zhang, T., Chen, Z., Zheng, Z., & Lu, Y. (2020). Traveling the token world:
A graph analysis of Ethereum ERC20 token ecosystem. In Y. Huang (Ed.), ACM
Digital Library, Proceedings of The Web Conference 2020 (pp. 1411–1421).
Association for Computing Machinery. https://doi.org/10.1145/3366423.3380215
Cong, L. W., He, Z., & Li, J. (2021). Decentralized mining in centralized pools. The
Review of Financial Studies, 34(3), 1191–1235. https://doi.org/10.1093/rfs/hhaa040
Di Angelo, M., & Salzer, G. (2020). Tokens, types, and standards: Identification and
utilization in Ethereum. In J. Xu (Ed.), 2020 IEEE International Conference on
Decentralized Applications and Infrastructures: Proceedings : 3-6 August 2020,
Oxford, United Kingdom (pp. 1–10). Conference Publishing Services, IEEE
Computer Society. https://doi.org/10.1109/DAPPS49028.2020.00001
Ermilov, D., Panov, M., & Yanovich, Y. (2017). Automatic Bitcoin address clustering.
In 2017 16th IEEE International Conference on Machine Learning and Applications
(ICMLA 2017): Cancun, Mexico, 18-21 December 2017 (pp. 461–466). IEEE.
https://doi.org/10.1109/ICMLA.2017.0-118
Ethereum Foundation. (2021, April 13). Proof-of-Stake-FAQs. Ethereum Wiki.
https://eth.wiki/en/concepts/proof-of-stake-faqs
ethereum.org. (2021, April 13). The ETH2 upgrades. ethereum.org.
https://ethereum.org/en/eth2/
Ferretti, S., & D’Angelo, G. (2020). On the Ethereum blockchain structure: A complex
networks theory perspective. Concurrency and Computation: Practice and
Experience, 32(12). https://doi.org/10.1002/cpe.5493

Glomann, L., Schmid, M., & Kitajewa, N. (2019). Improving the blockchain user
experience - An approach to address blockchain mass adoption issues from a human-
centred perspective. In T. Ahram (Ed.), Advances in Intelligent Systems and
Computing. Advances in Artificial Intelligence, Software and Systems Engineering:
Proceedings of the AHFE 2019 International Conference on Human Factors in
Artificial Intelligence and Social Computing, the AHFE (Vol. 965, pp. 608–616).
Springer. https://doi.org/10.1007/978-3-030-20454-9_60
Guo, D., Dong, J., & Wang, K. (2019). Graph structure and statistical properties of
Ethereum transaction relationships. Information Sciences, 492, 58–71.
https://doi.org/10.1016/j.ins.2019.04.013
Harlev, M. A., Sun Yin, H., Langenheldt, K. C., Mukkamala, R., & Vatrapu, R. (2018).
Breaking bad: De-anonymising entity types on the Bitcoin blockchain using
supervised machine learning. In T. Bui (Ed.), Proceedings of the Annual Hawaii
International Conference on System Sciences, Proceedings of the 51st Hawaii
International Conference on System Sciences. Hawaii International Conference on
System Sciences. https://doi.org/10.24251/HICSS.2018.443
Jiang, B. (2013). Head/tail breaks: A new classification scheme for data with a heavy-
tailed distribution. The Professional Geographer, 65(3), 482–494.
https://doi.org/10.1080/00330124.2012.700499
Lee, X. T., Khan, A., Sen Gupta, S., Ong, Y. H., & Liu, X [Xuan] (2020).
Measurements, analyses, and insights on the entire Ethereum blockchain network. In
Y. Huang (Ed.), ACM Digital Library, Proceedings of The Web Conference 2020
(pp. 155–166). Association for Computing Machinery.
https://doi.org/10.1145/3366423.3380103
Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M.,
& Savage, S. (2013). A fistful of Bitcoins. In K. Papagiannaki (Ed.), ACM Digital
Library, Proceedings of the 2013 conference on Internet measurement conference
(pp. 127–140). ACM. https://doi.org/10.1145/2504730.2504747
Monaco, J. V. (2015). Identifying Bitcoin users by transaction behavior. In I. A.
Kakadiaris, A. Kumar, & W. J. Scheirer (Eds.), SPIE Proceedings, Biometric and
Surveillance Technology for Human and Activity Identification XII (p. 945704).
SPIE. https://doi.org/10.1117/12.2177039
Motamed, A. P., & Bahrak, B. (2019). Quantitative analysis of cryptocurrencies
transaction graph. Applied Network Science, 4(1).
https://doi.org/10.1007/s41109-019-0249-6
Oliva, G. A., Hassan, A. E., & Jiang, Z. M. (2020). An exploratory study of smart
contracts in the Ethereum blockchain platform. Empirical Software Engineering,
25(3), 1864–1904. https://doi.org/10.1007/s10664-019-09796-5
Presthus, W., & O’Malley, N. O. (2017). Motivations and barriers for end-user adoption
of Bitcoin as digital currency. Procedia Computer Science, 121, 89–97.
https://doi.org/10.1016/j.procs.2017.11.013
Sako, K., Matsuo, S., & Meier, S. (2021, February 7). Fairness in ERC token markets:
A case study of CryptoKitties. https://arxiv.org/pdf/2102.03721
Schaupp, L. C., & Festa, M. (2018). Cryptocurrency adoption and the road to
regulation. In M. Janssen (Ed.), Proceedings of the 19th Annual International
Conference on Digital Government Research: Governance in the Data Age (pp. 1–
9). ACM. https://doi.org/10.1145/3209281.3209336

Somin, S., Gordon, G., & Altshuler, Y. (2018, May 31). Social signals in the Ethereum
trading network. https://arxiv.org/pdf/1805.12097
Sovbetov, Y. (2018). Factors influencing cryptocurrency prices: Evidence from Bitcoin,
Ethereum, Dash, Litcoin, and Monero. Journal of Economics and Financial Analysis,
2, 1–27.
Sun, H., Ruan, N., & Liu, H. (2019). Ethereum analysis via node clustering. In Birukou
& Liu (Eds.), Lecture Notes in Computer Science. Network and System Security
(Vol. 11928, pp. 114–129). Springer International Publishing.
https://doi.org/10.1007/978-3-030-36938-5_7
Victor, F. (2020). Address clustering heuristics for Ethereum. In J. Bonneau & N.
Heninger (Eds.), Lecture Notes in Computer Science. Financial Cryptography and
Data Security: 24th international conference, fc (Vol. 12059, pp. 617–633).
SPRINGER NATURE. https://doi.org/10.1007/978-3-030-51280-4_33
Victor, F., & Lüders, B. K. (2019). Measuring Ethereum-based ERC20 token networks.
In Goldberg & Birukou (Eds.), Lecture Notes in Computer Science. Financial
Cryptography and Data Security (1st ed., Vol. 11598, pp. 113–129). Springer
International Publishing. https://doi.org/10.1007/978-3-030-32101-7_8
Wood, G. Ethereum: A secure decentralised generalised transaction ledger EIP-150
REVISION.
Wu, K. (2019, February 13). An empirical study of blockchain-based decentralized
applications. https://arxiv.org/pdf/1902.04969
Wu, K., Ma, Y., Huang, G., & Liu, X [Xuanzhe] (2019). A first look at blockchain‐
based decentralized applications. Software: Practice and Experience. Advance
online publication. https://doi.org/10.1002/spe.2751
Zamyatin, A., Wolter, K., Werner, S., Harrison, P. G., Mulligan, C. E. A., &
Knottenbelt, W. J. (2017, September 20–22). Swimming with fishes and sharks:
Beneath the surface of queue-based Ethereum mining pools. In 2017 IEEE 25th
International Symposium on Modeling, Analysis, and Simulation of Computer and
Telecommunication Systems (MASCOTS) (pp. 99–109). IEEE.
https://doi.org/10.1109/MASCOTS.2017.22
Zanelatto Gavião Mascarenhas, J., Ziviani, A., Wehmuth, K., & Vieira, A. B. (2020).
On the transaction dynamics of the Ethereum-based cryptocurrency. Journal of
Complex Networks, 8(4), Article cnaa042. https://doi.org/10.1093/comnet/cnaa042
Zheng, P., Zheng, Z., Wu, J., & Dai, H.‑N. (2020). XBlock-ETH: Extracting and
exploring blockchain data from Ethereum. IEEE Open Journal of the Computer
Society, 1, 95–106. https://doi.org/10.1109/OJCS.2020.2990458

Appendix A

Google Big Query Data Tables

The following tables describe the Google Big Query tables that were used for the analysis.
Relevant fields are highlighted in blue. Table identifiers have been shortened for reasons
of conciseness (bigquery-public-data.crypto_ethereum to crypto_ethereum).

crypto_ethereum.balances

Field name Type Mode Description

address STRING REQUIRED Address

eth_balance NUMERIC NULLABLE Ether balance

Source: Google Big Query

crypto_ethereum.blocks

Field name Type Mode Description

timestamp TIMESTAMP REQUIRED The timestamp for when the block was collated

number INTEGER REQUIRED The block number

hash STRING REQUIRED Hash of the block

parent_hash STRING NULLABLE Hash of the parent block

nonce STRING REQUIRED Hash of the generated proof-of-work

sha3_uncles STRING NULLABLE SHA3 of the uncles data in the block

logs_bloom STRING NULLABLE The bloom filter for the logs of the block

transactions_root STRING NULLABLE The root of the transaction trie of the block

state_root STRING NULLABLE The root of the final state trie of the block

receipts_root STRING NULLABLE The root of the receipts trie of the block

miner STRING NULLABLE The address of the beneficiary to whom the mining rewards were given

difficulty NUMERIC NULLABLE Integer of the difficulty for this block

total_difficulty NUMERIC NULLABLE Integer of the total difficulty of the chain until this block

size INTEGER NULLABLE The size of this block in bytes

extra_data STRING NULLABLE The extra data field of this block

gas_limit INTEGER NULLABLE The maximum gas allowed in this block

gas_used INTEGER NULLABLE The total used gas by all transactions in this block

transaction_count INTEGER NULLABLE The number of transactions in the block

Source: Google Big Query

crypto_ethereum.contracts

Field name Type Mode Description

address STRING REQUIRED Address of the contract

bytecode STRING NULLABLE Bytecode of the contract

function_sighashes STRING REPEATED 4-byte function signature hashes

is_erc20 BOOLEAN NULLABLE Whether this contract is an ERC20 contract

is_erc721 BOOLEAN NULLABLE Whether this contract is an ERC721 contract

block_timestamp TIMESTAMP REQUIRED Timestamp of the block where this contract was created

block_number INTEGER REQUIRED Block number where this contract was created

block_hash STRING REQUIRED Hash of the block where this contract was created

Source: Google Big Query

crypto_ethereum.token_transfers

Field name Type Mode Description

token_address STRING REQUIRED ERC20 token address

from_address STRING NULLABLE Address of the sender

to_address STRING NULLABLE Address of the receiver

value STRING NULLABLE Amount of tokens transferred (ERC20) / id of the token transferred (ERC721). Use safe_cast for casting to NUMERIC or FLOAT64

transaction_hash STRING REQUIRED Transaction hash

log_index INTEGER REQUIRED Log index in the transaction receipt

block_timestamp TIMESTAMP REQUIRED Timestamp of the block where this transfer was in

block_number INTEGER REQUIRED Block number where this transfer was in

block_hash STRING REQUIRED Hash of the block where this transfer was in

Source: Google Big Query

crypto_ethereum.traces

Field name Type Mode Description

transaction_hash STRING NULLABLE Transaction hash where this trace was in

transaction_index INTEGER NULLABLE Integer of the transactions index position in the block

from_address STRING NULLABLE Address of the sender, null when trace_type is genesis or reward

to_address STRING NULLABLE Address of the receiver if trace_type is call, address of new contract or null if trace_type is create, beneficiary address if trace_type is suicide, miner address if trace_type is reward, shareholder address if trace_type is genesis, WithdrawDAO address if trace_type is daofork

value NUMERIC NULLABLE Value transferred in Wei

input STRING NULLABLE The data sent along with the message call

output STRING NULLABLE The output of the message call, bytecode of contract when trace_type is create

trace_type STRING REQUIRED One of call, create, suicide, reward, genesis, daofork

call_type STRING NULLABLE One of call, callcode, delegatecall, staticcall

reward_type STRING NULLABLE One of block, uncle

gas INTEGER NULLABLE Gas provided with the message call

gas_used INTEGER NULLABLE Gas used by the message call

subtraces INTEGER NULLABLE Number of subtraces

trace_address STRING NULLABLE Comma separated list of trace address in call tree

error STRING NULLABLE Error if message call failed. This field doesn’t contain top-level trace errors.

status INTEGER NULLABLE Either 1 (success) or 0 (failure, due to any operation that can cause the call itself or any top-level call to revert)

block_timestamp TIMESTAMP REQUIRED Timestamp of the block where this trace was in

block_number INTEGER REQUIRED Block number where this trace was in

block_hash STRING REQUIRED Hash of the block where this trace was in

trace_id STRING NULLABLE Unique string that identifies the trace. For transaction-scoped traces it is {trace_type}_{transaction_hash}_{trace_address}. For block-scoped traces it is {trace_type}_{block_number}_{index_within_block}

Source: Google Big Query

crypto_ethereum.transactions

Field name Type Mode Description

hash STRING REQUIRED Hash of the transaction

nonce INTEGER REQUIRED The number of transactions made by the sender prior to this one

transaction_index INTEGER REQUIRED Integer of the transactions index position in the block

from_address STRING REQUIRED Address of the sender

to_address STRING NULLABLE Address of the receiver. null when it’s a contract creation transaction

value NUMERIC NULLABLE Value transferred in Wei

gas INTEGER NULLABLE Gas provided by the sender

gas_price INTEGER NULLABLE Gas price provided by the sender in Wei

input STRING NULLABLE The data sent along with the transaction

receipt_cumulative_gas_used INTEGER NULLABLE The total amount of gas used when this transaction was executed in the block

receipt_gas_used INTEGER NULLABLE The amount of gas used by this specific transaction alone

receipt_contract_address STRING NULLABLE The contract address created if the transaction was a contract creation, otherwise null

receipt_root STRING NULLABLE 32 bytes of post-transaction stateroot (pre-Byzantium)

receipt_status INTEGER NULLABLE Either 1 (success) or 0 (failure) (post-Byzantium)

block_timestamp TIMESTAMP REQUIRED Timestamp of the block where this transaction was in

block_number INTEGER REQUIRED Block number where this transaction was in

Source: Google Big Query

Appendix B

Off-Chain Data Tables

The following tables include off-chain data sets from State of the Dapps and Etherscan
that were used for the analysis and uploaded to Google Big Query for the data
pre-processing.

dapps - Dapps from State of the Dapps


The table contains 3813 entries from 18 categories.

Field name Type Mode Description

address STRING NULLABLE Address

label STRING NULLABLE Name of the Dapp the address is associated with

cat STRING NULLABLE Category of Dapp address is associated with

Source: Own representation

mining_pools - Mining Pools from Etherscan


The table contains 72 entries.

Field name Type Mode Description

address STRING NULLABLE Address

label STRING NULLABLE Name of the mining pool the address is associated with

Source: Own representation

mp_proxy - Mining Pool Proxy Addresses Identified from Heuristic


The table contains 6 entries.

Field name Type Mode Description

address STRING NULLABLE Address

label STRING NULLABLE Name of the mining pool the proxy address is associated with

Source: Own representation

dex – Decentralised Exchanges from Etherscan
The table contains 95 entries.

Field name Type Mode Description

address STRING NULLABLE Address

label STRING NULLABLE Name of the decentralised exchange the address is associated with

Source: Own representation

exch – Centralised Exchanges from Etherscan


The table contains 282 entries.

Field name Type Mode Description

address STRING NULLABLE Address

label STRING NULLABLE Name of the centralised exchange the address is associated with

Source: Own representation

Appendix C

SQL Queries

The following SQL queries were used to arrive at the analysis findings in Section 6. Table
identifiers have been modified to be consistent with the identifiers used in Appendix A
and B.

1. Address Structure – Number of ERC20 contracts, ERC721 contracts, smart
contracts, EOAs without outgoing transactions, EOAs with outgoing
transactions (queried on 10th of February 2021)

-- All addresses
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`

UNION ALL

-- EOAs (addresses that are not contracts)
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address NOT IN (SELECT address FROM `crypto_ethereum.contracts`)

UNION ALL

-- EOAs with outgoing transactions (the sender column is from_address)
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address IN (SELECT from_address FROM `crypto_ethereum.transactions`)

UNION ALL

-- EOAs without outgoing transactions
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address NOT IN (SELECT from_address FROM `crypto_ethereum.transactions`)
AND address NOT IN (SELECT address FROM `crypto_ethereum.contracts`)

UNION ALL

-- Smart contracts
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address IN (SELECT address FROM `crypto_ethereum.contracts`)

UNION ALL

-- ERC20 contracts
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address IN (SELECT address FROM `crypto_ethereum.contracts`
WHERE is_erc20 IS TRUE)

UNION ALL

-- ERC721 contracts
SELECT COUNT(DISTINCT address) FROM `crypto_ethereum.balances`
WHERE address IN (SELECT address FROM `crypto_ethereum.contracts`
WHERE is_erc721 IS TRUE)

2. Tx Sent, Active Days, Idle Time for all EOAs

SELECT DISTINCT address,


COUNT(1) AS tx_sent,
DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) AS active_days,
DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) AS
idle_time

FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
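
The three metrics computed by Query 2 can be illustrated on a small toy data set. The following Python sketch is not part of the original analysis: the function name `address_metrics`, the sample addresses and the dates are hypothetical, and only the snapshot date (10 February 2021) is taken from the query.

```python
from datetime import date

SNAPSHOT = date(2021, 2, 10)  # snapshot date used in the queries


def address_metrics(transactions, snapshot=SNAPSHOT):
    """transactions: iterable of (from_address, date) pairs.
    Returns {address: {"tx_sent", "active_days", "idle_time"}}."""
    first, last, count = {}, {}, {}
    for address, day in transactions:
        count[address] = count.get(address, 0) + 1
        first[address] = min(first.get(address, day), day)
        last[address] = max(last.get(address, day), day)
    return {
        address: {
            "tx_sent": count[address],
            # days between the first and the last outgoing transaction
            "active_days": (last[address] - first[address]).days,
            # days between the last outgoing transaction and the snapshot
            "idle_time": (snapshot - last[address]).days,
        }
        for address in count
    }


# Hypothetical sample: one address with two transactions, one with a single one.
txs = [
    ("0xaaa", date(2021, 1, 1)),
    ("0xaaa", date(2021, 1, 31)),
    ("0xbbb", date(2020, 2, 10)),
]
metrics = address_metrics(txs)
```

Note that an address with a single outgoing transaction has zero active days under this definition, which is why Queries 3 and 4 split the population at one active day.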

3. Idle Time for EOAs with Active Days ≤ 1

SELECT DISTINCT address,


DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) AS
idle_time

FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) <= 1

4. Idle Time for EOAs with Active Days ≥ 2

SELECT DISTINCT address,


DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) AS
idle_time

FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) >= 2

5. Active Days for regular users with Idle Time ≤ 30

SELECT DISTINCT address,


COUNT(1) AS tx,
DATE_DIFF(DATE(MAX(bt)), DATE(MIN(bt)), DAY) AS active_days

FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30

6. SC Ratio for regular users with Idle Time ≤ 30

SELECT DISTINCT address,


COUNT(1) AS tx_sent,
COUNT(CASE WHEN to_address IN (SELECT address FROM
`crypto_ethereum.contracts`) THEN 1 END) AS sc_tx_sent,
(COUNT(CASE WHEN to_address IN (SELECT address FROM
`crypto_ethereum.contracts`) THEN 1 END))/(COUNT(1)) AS sc_ratio
FROM
(
SELECT
from_address AS address,
to_address AS to_address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30

7. Zero Tx Ratio for regular users with Idle Time ≤ 30

SELECT DISTINCT address,


COUNT(1) AS tx_sent,
COUNT(CASE WHEN value = 0 THEN 1 END) AS zero_tx_sent,
(COUNT(CASE WHEN value = 0 THEN 1 END))/(COUNT(1)) AS zero_tx_ratio
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
value,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30

8. Zero Tx Ratio for regular users with Idle Time ≤ 30 (transactions to smart
contracts)

SELECT DISTINCT address,


COUNT(1) AS tx_sent,
COUNT(CASE WHEN value = 0 THEN 1 END) AS zero_tx_sent,
(COUNT(CASE WHEN value = 0 THEN 1 END))/(COUNT(1)) AS zero_tx_ratio
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
value,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337 AND to_address IN
(SELECT address FROM `crypto_ethereum.contracts`)

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30

9. ERC20 Tx Sent for regular users with Idle Time ≤ 30

SELECT DISTINCT address,


COUNT(CASE WHEN to_address IN (SELECT address FROM
`crypto_ethereum.contracts` WHERE is_erc20 IS TRUE) THEN 1 END) AS
erc20_tx_sent,
FROM
(
SELECT
from_address AS address,
to_address AS to_address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337 AND to_address IN
(SELECT address FROM `crypto_ethereum.contracts`)

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30

10. Out Token, In Token, Out/In Token Ratio for regular users with Idle Time ≤ 30

WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
),

out AS (
SELECT DISTINCT from_address AS address,
COUNT(1) AS out_token
FROM `crypto_ethereum.token_transfers`
WHERE block_number <= 11828337 AND from_address IN (SELECT address FROM
regular_users)
GROUP BY from_address
)

SELECT DISTINCT to_address AS address,


out_token,
COUNT(1) AS in_token,
(out_token)/(count(1) + out_token) AS out_in_token_ratio
FROM `crypto_ethereum.token_transfers` AS incom
LEFT JOIN out ON incom.to_address = out.address
WHERE block_number <= 11828337 AND to_address IN (SELECT address FROM
regular_users)
GROUP BY to_address, out_token
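
The Out/In Token Ratio of Query 10 divides the number of outgoing token transfers by the sum of outgoing and incoming transfers. The Python sketch below mirrors that arithmetic on toy data. It is not part of the original analysis; the function name `out_in_token_ratio` and the sample addresses are hypothetical.

```python
from collections import Counter


def out_in_token_ratio(token_transfers, users):
    """token_transfers: iterable of (from_address, to_address) pairs.
    Returns {address: ratio} for users with at least one incoming transfer."""
    out_count, in_count = Counter(), Counter()
    for sender, receiver in token_transfers:
        if sender in users:
            out_count[sender] += 1
        if receiver in users:
            in_count[receiver] += 1

    ratios = {}
    for address, received in in_count.items():
        sent = out_count.get(address)
        # Mirrors the LEFT JOIN: without outgoing transfers the ratio stays undefined.
        ratios[address] = None if sent is None else sent / (sent + received)
    return ratios


# Hypothetical sample: u1 sends two and receives one transfer, u2 only receives.
transfers = [("u1", "x"), ("u1", "u2"), ("y", "u1"), ("z", "u2")]
ratios = out_in_token_ratio(transfers, users={"u1", "u2"})
```

As in the query, users who only ever sent tokens do not appear in the result, and users who only received tokens get an undefined ratio rather than zero.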

11. Activity level of Dapp categories

WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
where block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)

SELECT DISTINCT cat as category,


COUNT(1) AS transaction_count,
COUNT(DISTINCT from_address) AS user,
COUNT(1)/COUNT(DISTINCT from_address) AS transaction_per_user,
COUNT(DISTINCT from_address)/(SELECT COUNT(*) FROM regular_users) AS
user_regularuser_ratio
FROM
(
SELECT
to_address AS address,
from_address AS from_address,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337 AND to_address IN
(select address FROM dapps) AND
from_address IN (SELECT address FROM regular_users)
) AS f
LEFT JOIN dapps AS dapps ON dapps.address = f.address
GROUP BY category

12. Activity level of centralised exchanges

WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)

SELECT COUNT(1) AS transaction_count,


COUNT(DISTINCT from_address) AS user,
COUNT(1)/COUNT(DISTINCT from_address) AS transaction_per_user,
COUNT(DISTINCT from_address)/(SELECT COUNT(*) FROM regular_users) AS
user_regularuser_ratio
FROM
(
SELECT
to_address AS address,
from_address AS from_address,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337 AND to_address IN
(SELECT address FROM exch) AND from_address IN
(SELECT address FROM regular_users)
)

13. Activity level of decentralised exchanges

WITH regular_users AS (
SELECT DISTINCT address
FROM
(
SELECT
from_address AS address,
block_timestamp AS bt,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337

)
GROUP BY address
HAVING COUNT(1) >= 124 AND COUNT(1) <= 1894
AND DATE_DIFF(DATE('2021-02-10 10:53:31 UTC'), DATE(MAX(bt)), DAY) <= 30
)

SELECT COUNT(1) AS transaction_count,


COUNT(DISTINCT from_address) AS user,
COUNT(1)/COUNT(DISTINCT from_address) AS transaction_per_user,
COUNT(DISTINCT from_address)/(SELECT COUNT(*) FROM regular_users) AS
user_regularuser_ratio
FROM
(
SELECT
to_address AS address,
from_address AS from_address,
FROM `crypto_ethereum.transactions` AS t
WHERE block_number <= 11828337 AND to_address IN
(SELECT address FROM dex) AND from_address IN
(SELECT address FROM regular_users)
)

14. Activity level of mining pools

WITH a AS (
SELECT DISTINCT to_address,
COUNT(1) AS reward_count,
COUNT(CASE WHEN reward_type = 'block' THEN 1 END) AS blocks,
COUNT(CASE WHEN reward_type = 'uncle' THEN 1 END) AS uncles,

SUM(CASE WHEN reward_type = 'block' THEN value END) AS block_reward,


SUM(CASE WHEN reward_type = 'uncle' THEN value END) AS uncle_reward,
SUM(value) AS total_reward,

FROM `crypto_ethereum.traces`
WHERE block_number <= 11828337 AND to_address IN (SELECT address FROM
mining_pools) AND trace_type = 'reward'
GROUP BY to_address
)

SELECT b.label,a.*
FROM a
LEFT JOIN mining_pools AS b ON a.to_address = b.address
ORDER BY blocks DESC

15. Proxy address heuristic: Query calculates the highest consecutive occurrence of
payouts to the same receiver and orders receivers by maximum occurrence in
descending order. The query has to be run for every mining pool.

WITH all_mp_payouts AS (
SELECT from_address AS address,
label,
to_address AS recipient,
block_timestamp,
FROM `crypto_ethereum.transactions` AS t
LEFT JOIN mining_pools AS mp ON mp.address = t.from_address
WHERE block_number <= 11828337
),

a AS (
SELECT recipient,
label,
block_timestamp
FROM all_mp_payouts
WHERE address = <INSERT ADDRESS OF MINING POOL TO EXAMINE>
ORDER BY block_timestamp ASC
),

b AS (
SELECT recipient,
label,
count(*) AS occ
FROM (SELECT a.*,
(ROW_NUMBER() OVER (ORDER BY
block_timestamp) - ROW_NUMBER() OVER
(PARTITION BY recipient ORDER BY
block_timestamp)
) AS grp
FROM a
) a
GROUP BY grp, recipient, label
),

c AS(
SELECT DISTINCT recipient,
label,
MAX(occ) AS maxocc
FROM b
GROUP BY recipient,label
)

SELECT * FROM c ORDER BY maxocc DESC
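
The core of Query 15 is the search for the longest run of consecutive payouts to the same receiver, which the query obtains from the difference of two row numbers. The Python sketch below computes the same quantity with `itertools.groupby` on toy data. It is not part of the original analysis; the function name `max_consecutive_payouts` and the sample payouts are hypothetical.

```python
from itertools import groupby


def max_consecutive_payouts(payouts):
    """payouts: iterable of (timestamp, recipient) pairs for one mining pool.
    Returns [(recipient, longest run)] ordered by run length, descending."""
    # Order the payouts by time and keep only the recipient sequence.
    ordered = [recipient for _, recipient in sorted(payouts)]

    longest = {}
    # groupby yields one group per uninterrupted run of the same recipient.
    for recipient, run in groupby(ordered):
        length = sum(1 for _ in run)
        longest[recipient] = max(longest.get(recipient, 0), length)

    return sorted(longest.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample: "proxy" receives three payouts in a row.
payouts = [
    (1, "a"), (2, "proxy"), (3, "proxy"), (4, "proxy"),
    (5, "a"), (6, "b"), (7, "proxy"),
]
ranking = max_consecutive_payouts(payouts)
```

Receivers at the top of such a ranking are only candidates for proxy addresses. As noted in the limitations, their classification is not based on ground truth.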

16. Mining Pool Tx Received, Average Received Block Reward, Unique Mining
Pools and Mining Ratio of mining pool participants

WITH mp_agg AS (
SELECT * FROM mining_pools
UNION ALL
SELECT * FROM mp_proxy
),

total_inc AS (
SELECT DISTINCT to_address AS address,
COUNT(1) AS tx_received
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337
GROUP BY to_address
)

SELECT DISTINCT to_address AS miner,


COUNT(1) AS mining_pool_tx_received,
COUNT(DISTINCT label) AS unique_mining_pool,
AVG(value) AS avg_received_block_reward,
SUM(value) AS total_amount_ether_from_mining_pool,
COUNT(1)/tx_received AS mining_ratio

FROM `crypto_ethereum.transactions` AS a
LEFT JOIN mp_agg AS b ON a.from_address = b.address
LEFT JOIN total_inc AS c ON a.to_address = c.address

WHERE block_number <= 11828337 AND

from_address IN (SELECT address FROM mp_agg) AND


to_address NOT IN (SELECT address FROM mp_proxy)
GROUP BY miner
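
As a rough illustration of the mining ratio computed in query 16, the Python sketch below divides the number of transactions an address received from mining pools or their proxies by the total number of transactions it received. The function name `mining_ratios`, the dictionary keys `from` and `to`, and all addresses are assumptions made for this example. They do not correspond to the BigQuery schema.

```python
from collections import Counter


def mining_ratios(transactions, pool_addresses, proxy_addresses):
    """Return {address: share of received transactions sent by pools or proxies}."""
    proxies = set(proxy_addresses)
    senders = set(pool_addresses) | proxies

    # Denominator: every transaction received by an address.
    total_received = Counter(tx["to"] for tx in transactions)

    # Numerator: transactions sent by a pool or proxy, excluding
    # transfers whose recipient is itself a proxy address.
    from_pools = Counter(
        tx["to"] for tx in transactions
        if tx["from"] in senders and tx["to"] not in proxies
    )
    return {addr: n / total_received[addr] for addr, n in from_pools.items()}


# Hypothetical sample data.
transactions = [
    {"from": "0xpool", "to": "0xproxy"},     # pool funds its proxy: excluded
    {"from": "0xproxy", "to": "0xminer1"},
    {"from": "0xpool", "to": "0xminer1"},
    {"from": "0xpool", "to": "0xminer2"},
    {"from": "0xfriend", "to": "0xminer2"},  # unrelated incoming transfer
]

ratios = mining_ratios(transactions, ["0xpool"], ["0xproxy"])
print(ratios)
```

An address that only ever receives pool payouts ends up with a ratio of 1, while any other incoming transfer lowers the ratio.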

17. Outgoing Ether transfers of mining pool participants (with mining ratio ≥ 0.99)
to secondary receivers

WITH mp_agg AS (
  SELECT * FROM mining_pools
  UNION ALL
  SELECT * FROM mp_proxy
),

-- Total number of transactions received per address
total_inc AS (
  SELECT to_address AS address,
         COUNT(1) AS tx_received
  FROM `crypto_ethereum.transactions`
  WHERE block_number <= 11828337
  GROUP BY to_address
),

-- Mining ratio of every mining pool participant
miner AS (
  SELECT to_address AS miner,
         COUNT(1) / ANY_VALUE(tx_received) AS mining_ratio
  FROM `crypto_ethereum.transactions` AS a
  LEFT JOIN mp_agg AS b ON a.from_address = b.address
  LEFT JOIN total_inc AS c ON a.to_address = c.address
  WHERE block_number <= 11828337
    AND from_address IN (SELECT address FROM mp_agg)
    AND to_address NOT IN (SELECT address FROM mp_proxy)
  GROUP BY miner
)

-- Outgoing transfers aggregated per participant
SELECT from_address AS miner,
       COUNT(1) AS tx_sent,
       SUM(value) AS total_eth_forwarded
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337
  AND from_address IN (SELECT m.miner FROM miner AS m WHERE m.mining_ratio >= 0.99)
GROUP BY miner

18. Secondary receivers by incoming Ether from mining pool participants (with
mining ratio ≥ 0.99)

WITH mp_agg AS (
  SELECT * FROM mining_pools
  UNION ALL
  SELECT * FROM mp_proxy
),

-- Total number of transactions received per address
total_inc AS (
  SELECT to_address AS address,
         COUNT(1) AS tx_received
  FROM `crypto_ethereum.transactions`
  WHERE block_number <= 11828337
  GROUP BY to_address
),

-- Mining ratio of every mining pool participant
miner AS (
  SELECT to_address AS miner,
         COUNT(1) / ANY_VALUE(tx_received) AS mining_ratio
  FROM `crypto_ethereum.transactions` AS a
  LEFT JOIN mp_agg AS b ON a.from_address = b.address
  LEFT JOIN total_inc AS c ON a.to_address = c.address
  WHERE block_number <= 11828337
    AND from_address IN (SELECT address FROM mp_agg)
    AND to_address NOT IN (SELECT address FROM mp_proxy)
  GROUP BY miner
)

-- Incoming transfers aggregated per secondary receiver
SELECT to_address AS secondary_receiver,
       COUNT(1) AS tx_received,
       SUM(value) AS total_eth_received
FROM `crypto_ethereum.transactions`
WHERE block_number <= 11828337
  AND from_address IN (SELECT m.miner FROM miner AS m WHERE m.mining_ratio >= 0.99)
GROUP BY secondary_receiver
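
The aggregation by secondary receiver can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `secondary_receivers`, the dictionary keys, and the sample values are hypothetical, and the mining ratios are passed in as a ready-made mapping instead of being derived from on-chain data.

```python
from collections import defaultdict

RATIO_THRESHOLD = 0.99  # cut-off for the mining ratio, as in queries 17 and 18


def secondary_receivers(transactions, mining_ratio):
    """Aggregate transfers sent by addresses whose mining ratio meets the threshold."""
    dedicated = {addr for addr, ratio in mining_ratio.items()
                 if ratio >= RATIO_THRESHOLD}

    stats = defaultdict(lambda: {"tx_received": 0, "total_received": 0})
    for tx in transactions:
        if tx["from"] in dedicated:
            entry = stats[tx["to"]]
            entry["tx_received"] += 1
            entry["total_received"] += tx["value"]
    return dict(stats)


# Hypothetical sample data; values are plain integers.
mining_ratio = {"0xminer1": 1.0, "0xminer2": 0.5}
transactions = [
    {"from": "0xminer1", "to": "0xexchange", "value": 5},
    {"from": "0xminer1", "to": "0xexchange", "value": 7},
    {"from": "0xminer1", "to": "0xwallet", "value": 1},
    {"from": "0xminer2", "to": "0xexchange", "value": 100},  # ratio below cut-off
]

receivers = secondary_receivers(transactions, mining_ratio)
print(receivers)
```

Grouping by `tx["from"]` instead of `tx["to"]` would give the per-participant view of outgoing transfers.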

Declaration of Authorship

I hereby declare that the thesis submitted is my own unaided work. All direct or indirect
sources used are acknowledged as references.

I am aware that the thesis in digital form can be examined for the use of unauthorized aid
and in order to determine whether the thesis as a whole or parts incorporated in it may be
deemed as plagiarism. For the comparison of my work with existing sources, I agree that
it shall be entered in a database where it shall also remain after examination to enable
comparison with future theses submitted. Further rights of reproduction and usage,
however, are not granted here.

This paper was not previously presented to another examination board and has not been
published.

Dinh Truong Vu Ho Chi Minh City, 19.04.2020

___________________________ ___________________________
first and last name city, date and signature
