Machine Learning for Blockchain Data Analysis: Progress and Opportunities

Poupak Azad¹, Cuneyt Gurcan Akcora², Arijit Khan³

¹ University of Manitoba, Canada
² University of Central Florida, USA
³ Aalborg University, Denmark
[email protected], [email protected], [email protected]
arXiv:2404.18251v1 [cs.CR] 28 Apr 2024

Abstract

Blockchain technology has rapidly emerged to mainstream attention, while its publicly accessible, heterogeneous, massive-volume, and temporal data are reminiscent of the complex dynamics encountered during the last decade of big data. Unlike any prior data source, blockchain datasets encompass multiple layers of interactions across real-world entities, e.g., human users, autonomous programs, and smart contracts. Furthermore, blockchain’s integration with cryptocurrencies has introduced financial aspects of unprecedented scale and complexity such as decentralized finance, stablecoins, non-fungible tokens, and central bank digital currencies. These unique characteristics present both opportunities and challenges for machine learning on blockchain data.

On one hand, we examine the state-of-the-art solutions, applications, and future directions associated with leveraging machine learning for blockchain data analysis critical for the improvement of blockchain technology such as e-crime detection and trends prediction. On the other hand, we shed light on the pivotal role of blockchain by providing vast datasets and tools that can catalyze the growth of the evolving machine learning ecosystem. This paper serves as a comprehensive resource for researchers, practitioners, and policymakers, offering a roadmap for navigating this dynamic and transformative field.

1 Introduction

Blockchain, originally designed as the underlying technology for cryptocurrencies, e.g., Bitcoin [Nakamoto, 2008], has evolved into a robust framework for recording and verifying transactions. Its inherent features, including decentralization and cryptographic security, make it an ideal candidate for myriad applications beyond finance, such as internet-of-things, healthcare, and smart city. One of the most intriguing aspects of blockchain is its ability to generate vast and publicly accessible datasets, containing records of transactions involving diverse real-life entities and autonomous agents.

Simultaneously, the field of machine learning (ML) is experiencing an exponential surge in its application to data analysis across domains, thanks to deep neural methods and artificial general intelligence. ML and deep learning algorithms, capable of discerning patterns, trends, and anomalies within vast datasets, have proven indispensable for extracting meaningful insights and enabling predictions from complex data in an automated and end-to-end manner.

The importance of Blockchain is increasingly felt as the United Nations, through its Innovation Fund, has committed substantial resources ($35M + 2267ETH + 8BTC) to explore and develop blockchain technologies for creating transparent, efficient systems and rethinking problem-solving approaches in enhancing lives and developing communities [Chapiro et al., 2021]. Our exploration reveals that “Machine Learning for Blockchain Data Analysis” has emerged as a vibrant and influential field since 2018 with more than 1750 publications dedicated to this field in the ACM Digital Library.

We apply rigorous criteria to select and evaluate papers that contribute the most to the “ML for Blockchain Data Analysis” field. They encompass factors such as the relevance of the research, the significance of the problem addressed, the quality of the methodology employed, and the impact of the findings on the broader artificial intelligence community. Our search particularly focused on articles that analyzed and built models for data from a public blockchain such as Bitcoin, Ethereum, Litecoin, Eosio, Ripple, Monero, Zcash, and Dash.

Contributions and Roadmap. Our survey offers several key contributions to the field. First, it provides a comprehensive taxonomy (§2) and overview (§4) of the latest advancements in “ML for Blockchain Data Analysis” since 2018, offering insights into the state of the art. Second, in §5 we discuss how the datasets and tools we have highlighted can significantly facilitate future ML research, benchmarking, and the development of innovative applications in the field. Additionally, we discuss the unique challenges (§3) and opportunities (§6) inherent in this domain, shedding light on areas that require further exploration and innovation. Ultimately, our survey aims to guide researchers, practitioners, and policymakers in harnessing the potential of machine learning within the blockchain ecosystem, promoting user-friendly, explainable, and responsible data analysis practices. To the best of our knowledge, this is the first comprehensive survey that covers all five areas of ML on blockchains (see Table 1).
Table 1: Comparison of survey articles across ML for blockchains.

| Survey | Graph ML | Seq. ML | Code ML | Temp. ML | Text ML |
|---|---|---|---|---|---|
| A Survey on Blockchain Anomaly Detection Using Data Mining Techniques [Li et al., 2020a] | ✓ | × | ✓ | ✓ | × |
| Knowledge Discovery in Cryptocurrency Transactions: A Survey [Liu et al., 2021a] | ✓ | ✓ | ✓ | ✓ | × |
| A Survey on Blockchain Data Analysis [Hou et al., 2021] | ✓ | ✓ | ✓ | × | × |
| Analysis of Cryptocurrency Transactions from a Network Perspective: An Overview [Wu et al., 2021] | ✓ | × | ✓ | ✓ | ✓ |
| Anomaly Detection in Blockchain Networks: A Comprehensive Survey [Hassan et al., 2022] | ✓ | ✓ | ✓ | ✓ | × |
| Graph Analysis of the Ethereum Blockchain Data: A Survey of Datasets Methods and Future Work [Khan, 2022] | ✓ | × | ✓ | ✓ | × |
| A survey on machine learning approaches in cryptocurrency: challenges and opportunities [Mujlid, 2023] | × | ✓ | × | × | × |
| Blockchain Data Mining with Graph Learning: A survey [Qi et al., 2023] | ✓ | ✓ | ✓ | ✓ | × |
| Machine Learning for Blockchain Data Analysis: Progress and Opportunities [ours] | ✓ | ✓ | ✓ | ✓ | ✓ |

2 Taxonomy

We discuss our taxonomy of machine learning methods (§2.1), blockchain components (§2.2), data models (§2.3), and applications of blockchain data analysis (§2.4).

2.1 Machine Learning Methods

The integration of machine learning is unlocking new potential in blockchain data analysis and decision-making [Khan and Akcora, 2022]. ML approaches, including graph-based learning, recurrent neural networks (RNN), and transformers, have become pivotal in extracting insights from blockchain’s complex and varied data structures. These methods enable a nuanced understanding of blockchain components, such as transaction networks and smart contracts, by identifying patterns and anomalies that might otherwise remain obscured.

Graph ML approaches such as unsupervised methods, graph embedding, and graph neural networks, e.g., graph convolutional neural networks (GCNs) and graph attention networks (GATs) [Xia et al., 2021], are essential for analyzing complex network structures. Sequential ML methods, e.g., RNNs and transformers, are adept at processing sequential data [Wen et al., 2023] and are thus crucial for transaction analysis. Code ML techniques for smart contract analysis focus on interpreting code and bytecode [Pierro et al., 2020]. Temporal ML handles time-sensitive data, revealing trends, prices, and patterns over time [Benidis et al., 2023]. Lastly, Text ML, particularly using text and NLP on social media posts, offers insights into public perception and interactions regarding blockchains [Rouhani and Abedin, 2020]. The categories are not mutually exclusive, e.g., temporal graph learning deals with both graph ML and temporal ML; it has been exploited in cryptocurrency e-crime detection [Akcora et al., 2021].

2.2 Blockchain Components

The key blockchain components include the transaction network, which records asset (e.g., cryptocurrency) movements; token networks, managing the distribution and interactions of various tokens; and smart contracts, which are automated agreements encoded directly in the blockchain. Additionally, the peer-to-peer (P2P) network underpins the decentralized nature of blockchains, allowing direct interactions among users. User accounts represent individuals or entities with their transaction histories and balances. A decentralized application (dApp) combines one or more smart contracts to support a certain functionality on a distributed, peer-to-peer network; for example, decentralized finance (DeFi) applications are dApps for financial services. One may also consider external sources, including social media data, online blogs, cryptocurrency prices, Google Trends, etc., to mine public sentiments and trends about blockchains. For a detailed survey on blockchain components, we refer to [Khan, 2022].

2.3 Blockchain Data Models

The data model for blockchain analysis in ML includes i) simple graphs that illustrate basic peer-to-peer connections, ii) temporal graphs that capture changes across time, iii) attributed graphs where nodes and edges carry distinct properties, and iv) weighted graphs with varying importance assigned to connections. Furthermore, directed graphs indicating transaction directions, dynamic graphs reflecting evolving relationships, stream graphs representing continuous data flows, and higher-order graphs offering a multi-dimensional perspective on interactions have been considered [Akcora et al., 2022].

Another aspect of the data model is the analysis of smart contract code, which is essential for understanding the functional mechanics of blockchain systems [Bartoletti et al., 2020]. This includes both the source code, which offers insights into the logic and rules governing the contracts, and the bytecode, which is the executable form deployed on the blockchain. Furthermore, analyzing text data from transaction descriptions, user comments, and other textual inputs provides a unique perspective on user behaviors and social dynamics within the blockchain ecosystem. The integration of these varied data types, including sequential data models, e.g., time series, is indispensable for a comprehensive analysis. This integration not only helps in decoding the current state of the blockchain but also in forecasting future trends. We shall highlight graph, time series, and smart contract code data models, as well as their combinations, in our survey.

2.4 Applications of Blockchain Data Analysis

Blockchain data analysis has diverse applications pivotal to the advancement of blockchain technology. This domain facilitates predictive analytics in financial cryptocurrency markets and anomaly detection within blockchain networks [Li et al., 2020a]. Furthermore, the field is useful in identifying and mitigating financial crimes, including ransomware, money laundering, darknet markets, and Ponzi schemes [Wu et al., 2023]. Additionally, blockchain data analysis is key in address/transaction clustering and scrutinizing code for duplicates or malicious contents, thus enhancing the security and integrity of blockchain systems.
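As a rough illustration of the graph data models listed in §2.3, the sketch below derives simple, weighted, temporal, and attributed (per-asset) views from a single list of transfers. The `Transfer` record, the sample addresses and amounts, and all function names are hypothetical and chosen only for this example; they do not come from any surveyed system, and real blockchain data would first have to be parsed from blocks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One value transfer; a hypothetical record used only in this sketch."""
    src: str        # sending address
    dst: str        # receiving address
    amount: float   # amount moved
    timestamp: int  # arbitrary integer time unit
    asset: str      # edge attribute, e.g. native coin or a token


# Made-up sample data.
TRANSFERS = [
    Transfer("a1", "a2", 1.5, 100, "ETH"),
    Transfer("a1", "a2", 0.5, 160, "ETH"),
    Transfer("a2", "a3", 20.0, 170, "TOKEN_X"),
    Transfer("a3", "a1", 0.25, 400, "ETH"),
]


def simple_graph(transfers):
    """Simple directed graph: the set of (src, dst) pairs that ever interacted."""
    return {(t.src, t.dst) for t in transfers}


def weighted_graph(transfers):
    """Weighted directed graph: total amount per (src, dst) pair.
    Summing across assets is a simplification made for brevity."""
    weights = defaultdict(float)
    for t in transfers:
        weights[(t.src, t.dst)] += t.amount
    return dict(weights)


def temporal_snapshot(transfers, start, end):
    """Temporal view: transfers whose timestamp falls in [start, end)."""
    return [t for t in transfers if start <= t.timestamp < end]


def layers_by_asset(transfers):
    """Attributed view: one edge list per asset type over shared address nodes."""
    layers = defaultdict(list)
    for t in transfers:
        layers[t.asset].append((t.src, t.dst, t.amount))
    return dict(layers)
```

Calling `temporal_snapshot(TRANSFERS, 100, 200)` keeps only the first three transfers, and `layers_by_asset(TRANSFERS)` separates the `ETH` edges from the `TOKEN_X` edges while the address nodes stay shared across layers.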
[Figure 1: The timeline of machine learning for Blockchain research. The original image is a 2018 to 2023 timeline of the surveyed methods, grouped into Graph ML, Code ML, and Temp. ML.]

3 Challenges of Machine Learning for Blockchain Data Analysis

In the realm of blockchain technology, a complex web of challenges emerges from technology, its usage, control mechanisms, the nature of data, and the ML methods employed.

Blockchain Technology. A fundamental aspect of all public blockchains is the anonymous nature of blockchain addresses. The anonymity allows fast and easy access to blockchain for users, but it also presents a significant hurdle when tracking addresses and analyzing transaction patterns. A second technological challenge in blockchain arises from the fact that only the compiled binary of smart contract code is visible on the blockchain. This limited visibility restricts our understanding of the underlying source code, obscuring the logic and potential vulnerabilities of these contracts. This opacity is a significant concern for ensuring the integrity and security of the blockchain network, as it hinders comprehensive auditing and analysis of smart contracts.
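Because only compiled bytecode is visible on-chain, a common first step before any code-level learning is to turn the raw bytes into an opcode sequence. The following is a minimal sketch of such a disassembler, not a method from any surveyed paper. The function names are hypothetical, and the opcode table is deliberately a small subset of the EVM instruction set, so bytes outside it are labelled `UNKNOWN_0x..`.

```python
# Partial opcode table: an illustrative subset of EVM instructions only.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x15: "ISZERO", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Return a list of (offset, mnemonic, operand_hex_or_None) tuples."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    ops = []
    pc = 0
    while pc < len(code):
        byte = code[pc]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 immediate bytes that are data, not code.
            width = byte - 0x5F
            operand = code[pc + 1 : pc + 1 + width]
            ops.append((pc, f"PUSH{width}", operand.hex()))
            pc += 1 + width
        else:
            # Bytes missing from the partial table get a placeholder name.
            ops.append((pc, OPCODES.get(byte, f"UNKNOWN_0x{byte:02x}"), None))
            pc += 1
    return ops


def opcode_sequence(bytecode_hex):
    """Mnemonics only, e.g. as input tokens for a sequence model."""
    return [name for _, name, _ in disassemble(bytecode_hex)]
```

For the byte string `0x6080604052`, `opcode_sequence` yields `PUSH1`, `PUSH1`, `MSTORE`. Skipping the immediate bytes of each `PUSH` matters, since otherwise operand data would be misread as instructions.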
Blockchain Usage. A blockchain is characterized by the dynamic nature of its data. With new transactions arriving in blocks every 15 seconds (as seen on Ethereum [Wood, 2018]) to 10 minutes (as on Bitcoin [Nakamoto, 2008]), the data is in a constant state of evolution. This poses a significant challenge in maintaining updated and relevant analyses in real-time. The sheer volume of this data, compounded by its sparse and graph-like structure, exacerbates computational and analytical difficulties. Additionally, the complexity is further intensified by coin-mixing schemes [Wu et al., 2022a], which deliberately muddle the process of tracking transaction flows, often to obscure the origins of funds for purposes such as coin-laundering [Akcora et al., 2020].

Blockchain Control Mechanisms. The open and decentralized nature of blockchains, while one of its strengths, also invites a range of adversarial behaviors. This includes long-range attacks and manipulations, challenging the system’s integrity and reliability. The lack of a centralized review mechanism for both code and users in the blockchain further heightens these risks, leaving the network vulnerable to malicious smart contracts and abusive users.

Blockchain Data. Data-related challenges in blockchains are multifaceted. When utilizing labeled data in blockchain analysis, the rarity of the positive class (such as instances of ransomware or money laundering) compared to the vast size of the networks results in a significant bias in the methods employed. Such a skewed distribution can lead to misleadingly high accuracy metrics. The scarcity of verified, reliable ground truth data hampers the development and validation of robust analytical models. Furthermore, the challenge of train-test mismatch in blockchain analytics is accentuated by the ever-evolving nature of blockchains, which are frequently impacted by real-world events such as government regulations or bans [Xie, 2019]. These external influences can significantly alter the nature of the data within a given period, leading to a scenario where the blockchain’s state during the training phase may be different from that in the testing phase. This divergence between training and testing data distributions severely compromises the accuracy and generalizability of models, presenting a substantial obstacle to the effectiveness of machine learning applications in blockchain analysis.

ML Models. The challenges extend into the domain of machine learning methods used for blockchain data analysis. The “black-box” neural models, particularly deep learning, raise concerns about explainability and interpretability. These are critical issues in a field that demands transparency and accountability to comply with financial regulations. Inherent biases in ML algorithms pose risks of unfairness, contradicting the ethos of blockchain technology. Furthermore, the high computational demands, including extensive training and inference times and the need for large volumes of labeled training data, present substantial challenges, especially when data is often scarce, dynamic, and unlabeled.

4 Survey: Blockchain Data Models, Machine Learning Methods, and Applications

We primarily investigate three non-exclusive ML approaches: graph machine learning (§4.1), temporal machine learning (§4.2), and machine learning for smart contracts (§4.3). We survey their methods for blockchain data analysis, respective data models, and applications. A schematic diagram connecting various articles in our survey is illustrated in Figure 1.

4.1 Graph Machine Learning on Blockchains

4.1.1 Graph Data Models

UTXO Data Models. Blockchain technology, which started with Bitcoin, utilizes a distinctive data structure known as an “output” that contains an address and an amount. Such blockchains are referred to as the UTXO (Unspent Transaction Output) blockchains. An address is a unique string representation of the holder within the transaction network. A Bitcoin transaction, where a later transaction consumes one or more outputs to generate new outputs, can effectively be modeled as heterogeneous graphs comprising two primary node types: addresses and transactions. However, a significant challenge arises with most graph libraries, e.g., NetworkX [Hagberg et al., 2008], which are designed to handle
graphs with a single node type. This limitation has led researchers to frequently model the Bitcoin transaction network as either an address graph [Spagnuolo et al., 2014] by omitting transactions, or a transaction graph [Ron and Shamir, 2013] by omitting addresses. Specifically, both the address graph and the transaction graph are edge-weighted, directed graphs with nodes representing their respective namesakes, and directed edges record the flow of coins. An edge weight represents the amount of coins transferred.

Account Data Models. The emergence of Ethereum introduced a shift in blockchain data models. Unlike Bitcoin, Ethereum employs an account-based model that eschews the output data structure. Instead, the representation shifts to a graph of address nodes. A key feature of these networks is the variety of edge types, which can represent different forms of value transfer, such as the native cryptocurrency (Ether), tokens, or other user-defined assets. This complexity transforms the network into a multiplex network [Dickison et al., 2016], where address nodes are shared, but the edges differ in their types and meanings. Therefore, these graphs are categorized as directed, edge-weighted multigraphs.

Moreover, the application of hypergraphs [Antelmi et al., 2023] presents a new dimension in modeling blockchain transactions, particularly beneficial in e-crime scenarios where coins flow between seemingly different addresses which are, in reality, owned by the same user. For instance, in coin mixing networks such as Tornado Cash [Wu et al., 2022b], the flow of coins creates a hyper-edge that connects more than two nodes, providing a more nuanced view of asset transfers in such systems.

4.1.2 Graph Machine Learning Methods

We categorize the discussion based on unsupervised and supervised graph ML, as well as techniques to scale graph ML.

Unsupervised Learning. The evolution of blockchain analytics has been significantly influenced by the application of unsupervised learning techniques. Initial research in this domain mainly focused on examining transaction patterns within blockchain networks to understand the flow of digital currencies, identify trends, and detect anomalies [Ron and Shamir, 2013]. This analysis typically included studying aspects such as transaction volumes, frequency, and the interrelationships between different addresses [Lee et al., 2020].

As the research progressed, a shift towards more address and transaction-centric views emerged. Address clustering, aiming to deduce which addresses are controlled by the same user, gained considerable attention [Victor, 2020; Harrigan and Fretter, 2016]. Address clustering employs various heuristics that exploit the characteristics of UTXO transactions. This process is largely unsupervised and focuses on linking entities behind blockchain addresses. Clustering plays a crucial role in identifying and understanding address behaviors and transaction patterns [Spagnuolo et al., 2014]. Similar unsupervised analyses have been performed on reportedly “anonymous” cryptocurrencies, e.g., Monero [Möser et al., 2017], Zcash [Kappos et al., 2018], and a diverse set of cryptocurrency ledgers [Yousaf et al., 2019].

Supervised Learning. The advent of public datasets, e.g., Elliptic [Weber et al., 2019] signified a pivotal moment in the realm of blockchain graph machine learning, providing a rich source of labeled data. This marked a transition towards more supervised learning approaches, broadening the scope and precision of blockchain data analysis. We categorize these supervised methods into three classes: graph features extraction, graph embeddings, and graph neural networks.

Graph Features Extraction. Harlev et al. [Harlev et al., 2018] first use unsupervised clustering on the transaction graph to link bitcoin addresses owned by the same user. Next, supervised machine learning based on cluster features has been employed to de-anonymize entities on the Bitcoin blockchain. This approach relies on known data about entities whose identities were previously exposed to form a training dataset, thereby reducing the level of anonymity inherent in Bitcoin transactions. Supervised learning has also been effectively used in detecting blacklisted addresses in the Ethereum blockchain [Kılıç et al., 2022]. The approach involved using both local and global features extracted from the Ethereum transaction graph to train various machine learning models. This method’s feature extraction process, employing techniques such as random undersampling and SMOTE [Chawla et al., 2002], is designed to address label scarcity.

Graph Embeddings. Graph embeddings map each node in a graph to a low-dimensional vector, e.g., for supervised node classification, which has been pivotal in detecting phishing activities within blockchain networks. Yuan et al. [Yuan et al., 2020] introduce a graph-based classification framework leveraging an improved Graph2Vec algorithm to analyze Ethereum transaction networks for this purpose. The paper’s focus on Ether flow in phishing scams integrates this aspect into the machine learning model, enhancing phishing detection capabilities. Similarly, Wang et al. [Wang et al., 2021] develop the transaction subgraph network model to identify phishing accounts in the Ethereum blockchain, utilizing a directed version of the model that retains transaction flow information crucial for identifying such illicit activities.

Graph Neural Networks. GNNs are deep learning models developed for graph-related tasks in an end-to-end manner. A notable contribution in this domain is the work on detecting Ponzi schemes within the Ethereum blockchain [Yu et al., 2021b]. Here, a model based on a graph convolutional network is developed to classify nodes in the Ethereum transaction network as Ponzi or non-Ponzi. This approach demonstrates the efficacy of supervised learning in identifying fraudulent schemes by examining the topological structure and transactional characteristics of smart contracts. The development of graph attention network models to identify abnormal transactions in dynamically generated data is also a key area where supervised learning has shown great promise. Yu et al. [Yu et al., 2021a] introduce a GAT approach, focusing on exploiting the graph structure of transactions. The method’s dynamic graph handling capability and weight assignment to nodes based on their relevance to abnormal transactions offer advanced capabilities.

Moreover, the concept of anomaly detection in Ethereum’s blockchain network has been explored. Patel et al. [Patel et al., 2020] employ the “one-class” graph neural network capturing complex relationships and interactions between accounts for more effective identification of anomalous patterns. Analogously, the paper by Patel et al. [Patel et al., 2022] develops EvAnGCN, a dynamic GCN for detecting anomalous behaviors in blockchain networks by structuring the data as temporal graphs. This model efficiently learns from the dynamic and evolving structures of blockchain networks, utilizing both temporal and structural features.

Furthermore, the identification of illicit Bitcoin addresses has been enhanced through the integration of structure and temporal information of Bitcoin transactions. Tian et al. [Tian et al., 2021] develop an attention-based graph neural network that refines address embeddings through neighbor embedding and attention mechanisms. An LSTM-based auto-encoder is used to capture hidden temporal features from transaction records, augmenting identification accuracy.

Scaling Graph Machine Learning. Scaling graph machine learning on blockchains is crucial for handling the vast and continuously growing volume of data within transaction networks. For example, Bitcoin has ≈ 700,000 unique addresses daily in 500,000 transactions.¹ Examining the Bitcoin transaction network for even a single day poses a computationally demanding challenge for graph neural networks which are considered state-of-the-art in a multitude of predictive tasks, such as node classification [Yang et al., 2023].

¹ https://www.blockchain.com/charts/n-unique-addresses

In their initial efforts to analyze large graphs, researchers typically focus on extracting information from the local neighborhoods of nodes. Kılıç et al. employ easily calculable features, including neighbor counts and the time difference between the first and last transactions of a given address [Kılıç et al., 2022]. If computing power permits, e.g., using parallel computing, researchers may extend their analysis to higher-hop neighborhoods [Yu et al., 2021a].

One common scaling approach is node sampling. This technique has been widely employed to manage large transaction networks. For instance, Harlev et al. classify entities based on transactional behaviors without necessitating analysis of the entire network [Harlev et al., 2018]. Similarly, Yu et al. identify Ponzi schemes within the Ethereum blockchain by node sampling to create subgraphs for analysis [Yu et al., 2021b]. The authors randomly sample centered contracts to obtain their first-order neighbors, significantly reducing the computational load. Another scaling strategy involves the use of subgraph sampling, where transaction subgraphs are extracted and analyzed. This is evident in the work of Yu et al., where the dynamic graph structures employ a GAT model that relies on the structure of the sampled edges, rather than requiring a complete graph for analysis [Yu et al., 2021a]. This method is particularly effective in processing dynamic graph structures, and adapting to real-time transaction data.

4.1.3 Open Questions and Challenges

Graph machine learning for blockchains faces several critical challenges. Label scarcity is a prominent but well-known issue. An under-reported issue is the undisclosed e-crime transactions (e.g., ransomware payments), which may create false positives in node classification tasks. The scale of blockchain graphs presents a computational hurdle, demanding efficient algorithms and scalable systems. Real-time analysis is crucial as blockchain data evolves rapidly where latency in detecting anomalies can cause billions of dollars in lost value (e.g., in the LunaTerra collapse). Integrating machine learning across multiple blockchains is complex, involving data heterogeneity and interoperability challenges (e.g., in UTXO-account data integration). Detecting data shifts within blockchain graphs is essential for maintaining model accuracy as usage patterns by ordinary users, as well as e-crime operators, change. Tackling these challenges is essential for harnessing machine learning’s potential in blockchain data analysis.

4.2 Temporal Machine Learning on Blockchains

The integration of ML with blockchain’s temporal data offers unique opportunities for enhanced security, predictive analytics, and understanding dynamic market behaviors.

4.2.1 Temporal Data Models

Temporal data on blockchains offer a rich variety, including time series of crypto asset prices; temporal, multilayer graphs of transaction and asset networks; discrete and continuous dynamic graphs; and graphs with temporal node and edge features. The market volumes of native coins have reached billions of dollars. Hence, the most critical temporal data relates to the price of the native coins, such as Ether on the Ethereum network, denominated in fiat currency. The price data also exists for a subset of crypto assets on blockchains, such as tokens on Ethereum due to global trading activities, thereby establishing an external pricing dataset. Transaction and asset trading networks provide temporal transaction data in the form of networks where both node and edge attributes, as well as edge types, may change. When a blockchain has a short block creation interval (e.g., Ethereum’s ≈ 12 sec gap between two blocks), the network can be effectively modeled as an (almost) continuous-time dynamic graph.

4.2.2 Temporal Machine Learning Methods

Time Series Analysis. Early work in time series analysis for cryptocurrencies used abundant transaction network data to extract predictive signals. Abay et al. [Abay et al., 2019] use Bitcoin graph substructures, called chainlets [Akcora et al., 2018], to predict Bitcoin prices. Kwon et al. [Kwon et al., 2019] use the long short-term memory (LSTM) model [Schmidhuber and Hochreiter, 1997] on the historic cryptocurrency price time series data to classify the time series. Livieris et al. use ensemble-averaging, bagging, and stacking with deep learning models for forecasting hourly cryptocurrency prices [Livieris et al., 2020].

Unsupervised Learning. The transaction network provides a dynamic dataset abundant in user behavior, enabling the mining of complex patterns. For instance, Alqassem et al. analyze the Bitcoin transaction graph from its inception [Alqassem et al., 2018]. They observe changes in network diameter, node connectivity, and community structure over time. Their findings include patterns like the densification power law and shrinking diameter. Importantly, they underscore the influence of anonymity-seeking behavior on Bitcoin’s network dynamics. Zhao et al. investigate the evolutionary nature of the Ethereum blockchain network such as
the growth rate, active lifespan of high-degree nodes, detecting anomalies based on temporal changes in global network properties, and forecasting the survival of network communities [Z. et al., 2021]. In the context of blockchain selection, Scheid et al. [Scheid et al., 2022] introduce an ML-based approach to simplify the selection process for non-technical individuals. The authors present a novel metric to quantify the subjective popularity of blockchain platforms, contributing to the feature set used in their ML model. This work emphasizes the temporal flexibility of their ML model, which adapts over time to new parameters and data.

Supervised Learning. Many temporal ML articles study graph ML topics with a temporal view. Alarab et al. divide the popular Elliptic dataset into 49 time-steps, each representing a distinct set of transactions within a three-hour window [Alarab et al., 2020]. This temporal division of data ensures that the model can handle real-time transaction data and be trained on temporally coherent subsets. Temporal information is also useful in profiling blockchain addresses. Harlev et al. focus on de-anonymizing entities on the Bitcoin blockchain by analyzing transactions over time and extracting useful features, such as transaction patterns and time-series data [Harlev et al., 2018]. This temporal dimension enables predicting behaviors based on transaction history.

In e-crime research, temporal transaction patterns exhibited by operators such as ransomware hackers [Akcora et al., 2021] are invaluable. Pocher et al. effectively utilize patterns by first grouping Bitcoin transactions into distinct time steps and then using a chronological analysis of transaction patterns to find characteristics of e-crime activities [Pocher et al., 2023]. In anonymity-seeking behavior, users employ different addresses for each transaction to maintain their anonymity. The anonymous behavior is further strengthened by coin-mixing services where one can launder the coins through a mixing service. Wu et al. propose a feature-based network analysis framework to identify such mixing services on Bitcoin [Wu et al., 2022a]. In their work, temporal motifs are crucial to distinguish normal transactions from those associated with mixing services.

Sequence-based Models. Li et al. focus on identifying illicit Bitcoin addresses by extracting temporal features from the change in the balance of addresses over time [Li et al., 2020b]. They use an auto-encoder with LSTM to generate discriminating temporal features, enhancing the model’s ability to identify illicit addresses based on temporal patterns. This approach highlights the importance of temporal analysis in distinguishing normal transaction behavior from illicit activities. Lahmiri et al. use LSTM neural networks for predicting cryptocurrency prices [Lahmiri and Bekiros, 2019]. Their model memorizes both long-term and short-term temporal information, which is crucial for predicting the volatile and dynamic nature of cryptocurrency markets. One recent contribution in this field is BlockGPT, a dynamic, real-time approach for detecting anomalous blockchain transactions [Gai et al., 2023]. This tool is notable for its ability to generate tracing representations of blockchain activity and train an LLM as a real-time intrusion detection system. Unlike traditional methods, BlockGPT does not rely on predefined rules or patterns, making it significantly more effective in detecting anomalies in Ethereum transactions.

Graph Neural Networks. Zhuang et al. propose a novel method for detecting vulnerabilities in smart contracts using graph neural networks [Zhuang et al., 2021]. They introduce a degree-free graph convolutional neural network and a temporal message propagation network for automatic detection. The temporal aspect is central to their approach, considering the sequence of operations and interactions within smart contracts to detect vulnerabilities over time. Liu et al. introduce a method for detecting vulnerabilities in smart contracts by combining graph neural networks with expert knowledge [Liu et al., 2021b]. They transform smart contract source code into a contract graph, focusing on critical nodes through a node elimination phase. A temporal message propagation network is employed to extract graph features, considering the sequential nature of smart contract execution. This approach is pivotal in detecting vulnerabilities by capturing the temporal dynamics of data and control flows within smart contracts. Other notable works include [Patel et al., 2022; Yu et al., 2021a] for detecting anomalous transactions; due to the non-exclusive nature of our categorization, they have been discussed earlier in graph ML (§4.1.2).

4.2.3 Open Questions and Challenges
Linking temporal data across multiple blockchains (e.g., between Bitcoin and Monero in money laundering) to identify behavior patterns presents a complex challenge. Blockchains operate independently, and cross-chain data analysis requires addressing issues related to data heterogeneity, interoperability, and privacy while uncovering valuable insights into cross-blockchain behaviors. Identifying significant changes or anomalies in temporal blockchain data is critical for understanding and responding to emerging trends or irregularities such as hacked blockchain bridges, seized addresses, and external events [Xie, 2019]. Developing effective change point detection algorithms tailored to blockchain data remains an open question on (sparse) transaction graphs. Another challenge is dealing with data staleness issues. As blockchain data continuously evolves, ensuring that ML models operate on informative and up-to-date information is essential.

4.3 Machine Learning for Smart Contracts

4.3.1 Smart Contract Data Models
We consider four types of smart contract data: transaction, contract state, event log, and source code. Transaction data includes information on each transaction executed on the blockchain, e.g., sender and receiver addresses, and block numbers. Smart contracts have a state, which is essentially the current data stored in the contract. This state includes variables, balances, and other information specific to the contract’s functionality. Events, emitted by contracts, record specific occurrences, such as the completion of a task, or the occurrence of an event-triggering condition. The source code of a smart contract (in bytecode or higher-level languages, e.g., Solidity) is another critical element for ML analysis.

4.3.2 Machine Learning Methods for Smart Contracts
Contract Graph Analysis. Ferreira et al. automate detection and investigation of attacks on Ethereum smart contracts,
utilizing logic-driven and graph-driven analysis of transactions [Ferreira T. et al., 2021]. Zhuang et al. construct a contract graph to represent both syntactic and semantic structures of contract functions [Zhuang et al., 2021]. Liu et al. propose a method that transforms smart contract source code into a contract graph, highlights critical nodes via a node elimination phase, and employs a temporal message propagation network to extract graph features [Liu et al., 2021b]. These features, combined with expert-designed security patterns, contribute to an effective and scalable vulnerability detection system on platforms, e.g., Ethereum and VNT Chain.

Source Code Analysis. Mi et al. propose a metric learning-based deep neural network for vulnerability detection in smart contracts, focusing on analyzing bytecode [Mi et al., 2021]. Fan et al. detect smart Ponzi schemes in blockchain systems by extracting smart contract features from opcodes [Fan et al., 2021]. Qian et al. present a deep learning model, BiLSTM-Attention, for detecting defects in smart contracts, treating contract operation codes as sequential sentences, and utilizing attention mechanisms for accurate detection [Qian et al., 2022]. Tang et al. identify vulnerabilities by analyzing code snippets of functions [Tang et al., 2023].

Community and Transaction Analysis. Huang et al. provide a large-scale analysis of the EOSIO blockchain ecosystem, identifying bot activities at both community-level and account-level [Huang et al., 2020]. SoliAudit combines ML and fuzz testing for vulnerability assessment using Solidity machine code as learning features and incorporating gray-box fuzz testing [Liao et al., 2019]. Chen et al. detect Ponzi schemes in Ethereum by extracting features from user accounts and operation codes of contracts [Chen et al., 2018].

4.3.3 Open Questions and Challenges
One significant challenge in code machine learning for blockchains is the difficulty in finding the high-level code of smart contracts. Smart contracts often have their bytecode uploaded to the blockchain, making it challenging to access their human-readable source code. Lack of access to high-level code hinders comprehensive analysis and interpretation. The decentralized and distributed nature of blockchain networks can introduce vulnerabilities, such as reentrancy attacks, not found in typical software projects. Analyzing the script languages of blockchains for these vulnerabilities requires blockchain domain knowledge as well as a good understanding of how distributed systems work. As a result, coding for blockchains is a challenging software domain.

Additionally, functions and opcodes on blockchains often lack direct equivalents in conventional programming languages, which makes it challenging to apply standard code analysis techniques, as the mapping between blockchain code and traditional code constructs may not be straightforward.

5 Datasets and Tools

Graphs. Blockchain network data have become increasingly valuable in research for financial transactions, network dynamics, and user behavior. The Elliptic dataset [Weber et al., 2019] stands out with its labeled Bitcoin transaction graph, which has been utilized in GNNs. However, the dataset employs anonymized addresses, and descriptions of node features are not shared due to intellectual property rights issues. The BitcoinHeist dataset shares addresses and labels for about 30K addresses linked to ransomware, facilitating more direct transaction pattern analysis [Akcora et al., 2021].

The evolution of blockchain datasets has been notable. Initially, datasets were released in conjunction with academic articles in isolated repositories [Anoaica and Levard, 2018; Liang et al., 2018; Lee et al., 2020]. However, recent trends, particularly highlighted in benchmark tracks of conferences, e.g., NeurIPS, have led to the development of standardized and accessible benchmarks, such as Chartalist [Shamsi et al., 2022] and NFTGraph [Zhang et al., 2023]. These benchmarks provide large-scale, labeled graph data crucial for diverse research areas, from financial fraud detection to network dynamics analysis. The datasets are also used in the analysis of real-life phenomena where datasets are quite difficult to access. For example, Zhang et al. have proposed to use blockchain networks for studying the resilience of power networks [Zhang and Y., 2021].

Code. Smart contract code datasets, such as [Ortner and Eskandari, 2024; di Angelo et al., 2023], include vulnerable smart contract codes, offering valuable insights into security vulnerabilities within blockchain applications. Ibba et al. [Ibba, 2022] provide token and non-fungible token contract code datasets, shedding light on the intricacies of these specialized smart contract types.

Tools. Kushwaha et al. provide a comprehensive overview of tools and methodologies for analyzing Ethereum-based smart contracts [Kushwaha et al., 2022]. Additionally, [Durieux et al., 2020] provides a comprehensive resource for an empirical review of automated analysis tools on a dataset of 47,587 Ethereum smart contracts.

6 Conclusion and Future Direction

The field of machine learning for blockchains has made significant progress in addressing numerous challenges, as highlighted in this survey. However, several promising future directions await further advancement. Firstly, ensuring that ML model decisions are transparent and interpretable is crucial for responsible and trustworthy blockchain data analysis. As blockchain data continues to grow in size and complexity, the development of scalable learning and inference techniques becomes imperative. Efficient algorithms and distributed computing approaches will play a pivotal role in handling the ever-expanding datasets. Furthermore, exploring the application of machine learning to complex blockchain networks, including cross-chain analysis, offers new insights and opportunities for research. Moreover, the dynamic nature of blockchain data requires the development of machine unlearning and continuous learning techniques, enabling models to adapt to evolving data distributions and maintain accuracy over time. Lastly, harnessing the capabilities of large language models for understanding natural language, interacting with data, and generating source code can revolutionize blockchain data and smart contract analysis.
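
As a small illustration of the temporal evaluation setting discussed in §4.2.2 (e.g., the time-step division of the Elliptic dataset), the following Python sketch splits labeled transactions chronologically by time step, so that a model would be trained only on earlier steps and evaluated on later ones. The function name `chronological_split`, the record layout `(tx_id, time_step, label)`, the parameter `n_train_steps`, and the toy records are illustrative assumptions; they are not taken from any of the surveyed works.

```python
from collections import defaultdict


def chronological_split(records, n_train_steps):
    """Split (tx_id, time_step, label) records into train/test by time step.

    Hypothetical helper: the earliest `n_train_steps` distinct time steps
    form the training set; all later time steps form the test set.
    """
    # Group records by their time step.
    by_step = defaultdict(list)
    for tx_id, time_step, label in records:
        by_step[time_step].append((tx_id, time_step, label))

    # Order the distinct time steps and cut them chronologically.
    ordered_steps = sorted(by_step)
    train_steps = ordered_steps[:n_train_steps]
    test_steps = ordered_steps[n_train_steps:]

    train = [rec for step in train_steps for rec in by_step[step]]
    test = [rec for step in test_steps for rec in by_step[step]]
    return train, test


# Toy data (made up for illustration): four time steps, placeholder labels.
records = [
    ("tx_a", 1, "licit"),
    ("tx_b", 2, "illicit"),
    ("tx_c", 1, "licit"),
    ("tx_d", 3, "licit"),
    ("tx_e", 4, "illicit"),
]
train, test = chronological_split(records, n_train_steps=2)
print([r[0] for r in train])  # ['tx_a', 'tx_c', 'tx_b']
print([r[0] for r in test])   # ['tx_d', 'tx_e']
```

A random split would instead mix later transactions into the training set, which can overstate performance on data whose distribution evolves over time.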
References

[Abay et al., 2019] N. C. Abay, C. G. Akcora, Y. R Gel, et al. Chainnet: Learning on blockchain graphs with topological features. In ICDM, 2019.
[Akcora et al., 2018] C. G. Akcora, A. K. Dey, Y. R Gel, and M. Kantarcioglu. Forecasting bitcoin price with graph chainlets. In PAKDD, 2018.
[Akcora et al., 2020] C. G. Akcora, S. Purusotham, et al. How to not get caught when you launder money on blockchain? arXiv:2010.15082, 2020.
[Akcora et al., 2021] C. G. Akcora, Y. Li, Y. R Gel, and M. Kantarcioglu. Bitcoinheist: Topological data analysis for ransomware prediction on the bitcoin blockchain. In IJCAI, 2021.
[Akcora et al., 2022] C. G. Akcora, Y. R. Gel, and M. Kantarcioglu. Blockchain Networks: Data Structures of Bitcoin, Monero, Zcash, Ethereum, Ripple, and Iota. WIREs Data Mining Knowl. Discov., 12(1), 2022.
[Alarab et al., 2020] I. Alarab, S. Prakoonwit, and M. I. Nacer. Competence of graph convolutional networks for anti-money laundering in bitcoin blockchain. In ICMLT, 2020.
[Alqassem et al., 2018] I. Alqassem, I. Rahwan, and D. Svetinovic. The anti-social system properties: Bitcoin network data analysis. IEEE Trans Syst Man Cybern, 50(1):21–31, 2018.
[Anoaica and Levard, 2018] A. Anoaica and H. Levard. Quantitative description of internal activity on the ethereum public blockchain. In NTMS, 2018.
[Antelmi et al., 2023] A. Antelmi, G. Cordasco, et al. A survey on hypergraph representation learning. ACM Comp. Sur., 56(1):1–38, 2023.
[Bartoletti et al., 2020] M. Bartoletti, S. Carta, T. Cimoli, and R. Saia. Dissecting ponzi schemes on ethereum: identification, analysis, and impact. Future Generation Computer Systems, 102:259–277, 2020.
[Benidis. et al., 2023] K. Benidis, Syama S. R., et al. Deep learning for time series forecasting: Tutorial and literature survey. ACM Comput. Surv., 55(6):121:1–121:36, 2023.
[Chapiro et al., 2021] C. Chapiro, M. Hydary, and C. Lomazzo. Linking blockchain to impact, 2021.
[Chawla et al., 2002] N. V Chawla, K. W Bowyer, et al. Smote: Synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
[Chen et al., 2018] W. Chen, Z. Zheng, et al. Detecting ponzi schemes on ethereum: Towards healthier blockchain technology. In WWW, 2018.
[di Angelo et al., 2023] M. di Angelo, T. Durieux, J. F. Ferreira, and G. Salzer. SmartBugs 2.0: An execution framework for weakness detection in Ethereum smart contracts. In ASE, 2023. to appear.
[Dickison et al., 2016] M. E. Dickison, M. Magnani, and L. Rossi. Multilayer social networks. Cambridge University Press, 2016.
[Durieux et al., 2020] T Durieux, J. F. Ferreira, et al. Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In ICSE, pages 530–541. ACM, 2020.
[Fan et al., 2021] S. Fan, S. Fu, H. Xu, and X. Cheng. Al-spsd: Anti-leakage smart ponzi schemes detection in blockchain. IPM, 58(4):102587, 2021.
[Ferreira T. et al., 2021] Christof Ferreira T., A. K. I., A. Gervais, and R. State. The eye of horus: Spotting and analyzing attacks on ethereum smart contracts. In FC, 2021.
[Gai et al., 2023] Y. Gai, L. Zhou, K. Qin, D. Song, and A. Gervais. Blockchain large language models. arXiv preprint arXiv:2304.12749, 2023.
[Hagberg et al., 2008] A. Hagberg, P. Swart, and D. S Chult. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Lab, United States, 2008.
[Harlev et al., 2018] M. A. Harlev, H. Sun Yin, et al. Breaking bad: De-anonymising entity types on the bitcoin blockchain using supervised machine learning. HICSS, 2018.
[Harrigan and Fretter, 2016] M. Harrigan and C. Fretter. The unreasonable effectiveness of address clustering. In UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld, 2016.
[Hassan et al., 2022] Muneeb Ul Hassan, Mubashir Husain Rehmani, and Jinjun Chen. Anomaly detection in blockchain networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 2022.
[Hou et al., 2021] Wenhan Hou, Bo Cui, and Ru Li. A survey on blockchain data analysis. In COMPSAC, 2021.
[Huang et al., 2020] Y. Huang, H. Wang, L. Wu, et al. Understanding (mis) behavior on the eosio blockchain. AMCS, 4(2):1–28, 2020.
[Ibba, 2022] G. Ibba. A smart contracts repository for top trending contracts. In IWETSEB, pages 17–20, 2022.
[Kappos et al., 2018] G. Kappos, H. Yousaf, M. Maller, and S. Meiklejohn. An empirical analysis of anonymity in zcash. In USENIX Security, 2018.
[Khan and Akcora, 2022] Arijit Khan and Cuneyt Gurcan Akcora. Graph-based management and mining of blockchain data. In CIKM, 2022.
[Khan, 2022] Arijit Khan. Graph analysis of the ethereum blockchain data: A survey of datasets, methods, and future work. In Blockchain, 2022.
[Kılıç et al., 2022] B. Kılıç, A. Sen, and C. Özturan. Fraud detection in blockchains using machine learning. In BCCA, 2022.
[Kushwaha et al., 2022] S. S. Kushwaha, S. Joshi, et al. Ethereum smart contract analysis tools: A systematic review. IEEE Access, 10:57037–57062, 2022.
[Kwon et al., 2019] D. Kwon, J. Kim, J. Heo, C. Kim, and Y. Han. Time series classification of cryptocurrency price trend based on a recurrent lstm neural network. Journal of Information Processing Systems, 15(3):694–706, 2019.
[Lahmiri and Bekiros, 2019] S. Lahmiri and S. Bekiros. Cryptocurrency forecasting with deep learning chaotic neural networks. Chaos, Solitons & Fractals, 118:35–40, 2019.
[Lee et al., 2020] X. T. Lee, A. Khan, et al. Measurements, analyses, and insights on the entire ethereum blockchain network. In WebConf, 2020.
[Li et al., 2020a] Ji Li, C. Gu, F. Wei, and Xi Chen. A survey on blockchain anomaly detection using data mining techniques. In BlockSys, pages 491–504. Springer, 2020.
[Li et al., 2020b] Y. Li, Y. Cai, H. Tian, G. Xue, and Z. Zheng. Identifying illicit addresses in bitcoin network. In BlockSys, pages 99–111. Springer, 2020.
[Liang et al., 2018] J Liang, L. Li, and D. Zeng. Evolutionary dynamics of cryptocurrency transaction networks: An empirical study. PLOS ONE, 13(8):1–18, 08 2018.
[Liao et al., 2019] J. Liao, T. Tsai, C. He, and C. Tien. Soliaudit: Smart contract vulnerability assessment based on machine learning and fuzz testing. In IOTSMS, pages 458–465. IEEE, 2019.
[Liu et al., 2021a] X. Liu, X. Jiang, et al. Knowledge discovery in cryptocurrency transactions: A survey. IEEE Access, 9:37229–37254, 2021.
[Liu et al., 2021b] Z. Liu, P. Qian, X. Wang, et al. Combining graph neural networks with expert knowledge for smart contract vulnerability detection. IEEE TKDE, 2021.
[Livieris et al., 2020] I. E Livieris, E. Pintelas, S. Stavroyiannis, and P. Pintelas. Ensemble deep learning models for forecasting cryptocurrency time-series. Algorithms, 13(5):121, 2020.
[Mi et al., 2021] F. Mi, Z. Wang, et al. Vscl: automating vulnerability detection in smart contracts with deep learning. In ICBC, pages 1–9. IEEE, 2021.
[Möser et al., 2017] M. Möser, Kyle Soska, et al. An empirical analysis of traceability in the monero blockchain. arXiv preprint arXiv:1704.04299, 2017.
[Mujlid, 2023] Hana Mujlid. A survey on machine learning approaches in cryptocurrency: Challenges and opportunities. In iCoMET, pages 1–6. IEEE, 2023.
[Nakamoto, 2008] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
[Ortner and Eskandari, 2024] M. Ortner and S. Eskandari. Smart contract sanctuary, 2024.
[Patel et al., 2020] V. Patel, L. Pan, and S. Rajasegarar. Graph deep learning based anomaly detection in ethereum blockchain network. In ICNSS, pages 132–148. Springer, 2020.
[Patel et al., 2022] V. Patel, S Rajasegarar, et al. Evangcn: Evolving graph deep neural network based anomaly detection in blockchain. In ICADMA, pages 444–456. Springer, 2022.
[Pierro et al., 2020] G. A. Pierro, R. Tonelli, and M. Marchesi. An Organized Repository of Ethereum Smart Contracts’ Source Codes and Metrics. Future Internet, 12(11):197, 2020.
[Pocher et al., 2023] N. Pocher, M. Zichichi, et al. Detecting anomalous cryptocurrency transactions: An aml/cft application of machine learning-based forensics. Electronic Markets, 33(1):37, 2023.
[Qi et al., 2023] Y. Qi, J. Wu, H. Xu, and M. Guizani. Blockchain data mining with graph learning: A survey. IEEE Trans. on Patt. An. and Ma. Int., 2023.
[Qian et al., 2022] C. Qian, T. Hu, and B. Li. A bilstm-attention model for detecting smart contract defects more accurately. In QRS, pages 53–62. IEEE, 2022.
[Ron and Shamir, 2013] D. Ron and A. Shamir. Quantitative analysis of the full bitcoin transaction graph. In FC 2013, pages 6–24. Springer, 2013.
[Rouhani and Abedin, 2020] S. Rouhani and E. Abedin. Cryptocurrencies narrated on tweets: a sentiment analysis approach. IJES, 36(1):58–72, 2020.
[Scheid et al., 2022] E. J Scheid, R. Hy, et al. On the employment of machine learning in the blockchain selection process. IEEE Transactions on Network and Service Management, 19(4):3835–3846, 2022.
[Schmidhuber and Hochreiter, 1997] J. Schmidhuber and S. Hochreiter. Long short-term memory. Neural Comput, 9(8):1735–1780, 1997.
[Shamsi et al., 2022] K. Shamsi, F. Victor, et al. Chartalist: Labeled graph datasets for utxo and account-based blockchains. NeurIPS, 35:34926–34939, 2022.
[Spagnuolo et al., 2014] M. Spagnuolo, F. Maggi, and S. Zanero. Bitiodine: Extracting intelligence from the bitcoin network. In FC, 2014.
[Tang et al., 2023] X. Tang, Y. Du, A. Lai, et al. Deep learning-based solution for smart contract vulnerabilities detection. Scientific Reports, 13(1):20106, 2023.
[Tian et al., 2021] H. Tian, Y. Li, Y. Cai, X. Shi, and Z. Zheng. Attention-based graph neural network for identifying illicit bitcoin addresses. In BlockSys, 2021.
[Victor, 2020] F. Victor. Address clustering heuristics for ethereum. In FC, 2020.
[Wang et al., 2021] J. Wang, P. Chen, S. Yu, and Q. Xuan. Tsgn: Transaction subgraph networks for identifying ethereum phishing accounts. In BlockSys, 2021.
[Weber et al., 2019] M. Weber, G. Domeniconi, et al. Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv:1908.02591, 2019.
[Wen et al., 2023] M Wen, R. Lin, et al. Large sequence models for sequential decision-making: a survey. Frontiers Comput. Sci., 17(6):176349, 2023.
[Wood, 2018] G. Wood. Ethereum: A secure decentralised generalised transaction ledger. https://github.com/ethereum/yellowpaper, 2018.
[Wu et al., 2021] J Wu, J. Liu, Y. Zhao, and Z. Zheng. Analysis of cryptocurrency transactions from a network perspective: An overview. Journal of Network and Computer Applications, 190:103139, 2021.
[Wu et al., 2022a] J. Wu, J. Liu, W. Chen, et al. Detecting mixing services via mining bitcoin transaction network with hybrid motifs. IEEE Trans. Syst. Man Cybern. Syst., 52(4):2237–2249, 2022.
[Wu et al., 2022b] M. Wu, W. McTighe, et al. Tutela: An open-source tool for assessing user-privacy on ethereum and tornado cash. arXiv:2201.06811, 2022.
[Wu et al., 2023] J. Wu, K. Lin, Dan Lin, et al. Financial crimes in web3-empowered metaverse: Taxonomy, countermeasures, and opportunities. IEEE Open Journal of the Computer Society, 4:37–49, 2023.
[Xia et al., 2021] F. Xia, K. Sun, et al. Graph learning: A survey. IEEE Trans. Artif. Intell., 2(2):109–127, 2021.
[Xie, 2019] Rain Xie. Why china had to ban cryptocurrency but the us did not: a comparative analysis of regulations on crypto-markets between the us and china. Wash. U. Global Stud. L. Rev., 18:457, 2019.
[Yang et al., 2023] Z. Yang, G. Zhang, J. Wu, et al. A comprehensive survey of graph-level learning. arXiv preprint arXiv:2301.05860, 2023.
[Yousaf et al., 2019] H. Yousaf, G. Kappos, and S. Meiklejohn. Tracing transactions across cryptocurrency ledgers. In USENIX Security, 2019.
[Yu et al., 2021a] L. Yu, N. Zhang, and W. Wen. Abnormal transaction detection based on graph networks. In COMPSAC, 2021.
[Yu et al., 2021b] S. Yu, J. Jin, Y. Xie, J. Shen, and Q. Xuan. Ponzi scheme detection in ethereum transaction network. In BlockSys, 2021.
[Yuan et al., 2020] Z. Yuan, Q. Yuan, and J. Wu. Phishing detection on ethereum via learning representation of transaction subgraphs. In BlockSys, 2020.
[Z. et al., 2021] Lin Z., S. S. Gupta, A. Khan, and R. Luo. Temporal analysis of the entire ethereum blockchain network. In WebConf, 2021.
[Zhang and Y., 2021] X. Zhang and Gel Y. Eager: Collaborative research: Blockchain graphs as testbeds of power grid resilience and functionality metrics, 2021.
[Zhang et al., 2023] Z. Zhang, B. Luo, S. Lu, and B. He. Live graph lab: Towards open, dynamic and real transaction graphs with NFT. CoRR, abs/2310.11709, 2023.
[Zhuang et al., 2021] Y. Zhuang, Z. Liu, P. Qian, Q. Liu, X. Wang, and Q. He. Smart contract vulnerability detection using graph neural networks. In IJCAI, 2021.
