Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics
Charles E. Leiserson
arXiv:1908.02591v1 [cs.SI] 31 Jul 2019
MIT CSAIL
ABSTRACT
Anti-money laundering (AML) regulations play a critical role in safeguarding financial systems, but they bear high costs for institutions and drive financial exclusion for those on the socioeconomic and international margins. The advent of cryptocurrency has introduced an intriguing paradox: pseudonymity allows criminals to hide in plain sight, but open data gives more power to investigators and enables the crowdsourcing of forensic analysis. Meanwhile, advances in learning algorithms show great promise for the AML toolkit. In this workshop tutorial, we motivate the opportunity to reconcile the cause of safety with that of financial inclusion. We contribute the Elliptic Data Set, a time series graph of over 200K Bitcoin transactions (nodes), 234K directed payment flows (edges), and 166 node features, including ones based on non-public data; to our knowledge, this is the largest labelled transaction data set publicly available in any cryptocurrency. We share results from a binary classification task predicting illicit transactions using variations of Logistic Regression (LR), Random Forest (RF), Multilayer Perceptrons (MLP), and Graph Convolutional Networks (GCN), with GCN being of special interest as an emerging method for capturing relational information. The results show the superiority of Random Forest (RF), but also invite algorithmic work to combine the respective powers of RF and graph methods. Lastly, we consider visualization for analysis and explainability, which is difficult given the size and dynamism of real-world transaction graphs, and we offer a simple prototype capable of navigating the graph and observing model performance on illicit activity over time. With this tutorial and data set, we hope to a) invite feedback in support of our ongoing inquiry, and b) inspire others to work on this societally important challenge.

CCS CONCEPTS
• Security and privacy → Database activity monitoring; • Computing methodologies → Machine learning; • Applied computing → Network forensics.

KEYWORDS
Graph Convolutional Networks, Anomaly Detection, Financial Forensics, Cryptocurrency, Anti-Money Laundering, Visualization

ACM Reference Format:
Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. In Proceedings of ACM Conference (KDD ’19 Workshop on Anomaly Detection in Finance). ACM, New York, NY, USA, 7 pages.

∗ Both authors contributed equally to this research.

KDD ’19 Workshop on Anomaly Detection in Finance, August 2019, Anchorage, AK, USA
© 2019 Association for Computing Machinery.

1 TOWARD FINANCIAL INCLUSION
“It’s expensive to be poor.” This is a common credo among advocates for financial inclusion. It speaks to the fact that those on the margins of society suffer from restricted access to the financial system and higher relative costs of participation.

The problem of restricted access (e.g., the ability to sign up for a bank account) is, in part, an unintended consequence of increasingly stringent anti-money laundering (AML) regulations. While essential for safeguarding the financial system, these regulations have a disproportionately negative effect on low-income people, immigrants, and refugees [16]. Approximately 1.7 billion adults are unbanked [7]. The problem of higher relative costs is also, in part, a function of AML policy. That policy imposes high fixed costs of compliance on money service businesses (MSBs) and adds the fear of criminal and monetary penalties for noncompliance, so “low value” customers just aren’t worth the risk. Consider global remittances to low- and middle-income countries, which reached a record high of $529 billion in 2018, far outpacing the global aid contribution of $153 billion. The current average cost of sending $200 is an expensive 7 percent, and some countries suffer rates of over 10 percent. United Nations Sustainable Development Goal 10.7 targets a reduction to 3 percent by 2030 [12].

And yet AML regulations cannot be summarily dismissed as overly burdensome. Multi-billion dollar illicit industries like drug cartels, human trafficking, and terrorist organizations cause intense human
suffering around the world. The recent 1Malaysia Development Berhad (1MDB) money laundering scandal robbed the Malaysian people of over $11 billion in taxpayer funds earmarked for the nation’s development [22]. It also led to mega-fines and criminal indictments for Goldman Sachs, among others implicated in the wrongdoing. The even more recent Danske Bank money laundering scandal centered on the bank’s Estonian branch, which served as a hub for an estimated $200 billion in illicit money flows from Russia and Azerbaijan. That scandal similarly took an incalculable toll on innocent citizens of these countries and cost implicated institutions such as Danske Bank and Deutsche Bank billions of dollars in losses [23].

Money laundering is not a victimless crime, and current methods for the traditional financial system are doing a poor job of stopping it. Without reducing this complex challenge to data analysis alone, we pose the question: with the right tools and open data, can we help reconcile the need for safety with the cause of financial inclusion?

1.1 AML in a cryptocurrency world
The advent of cryptocurrency introduced by Bitcoin [17] ignited an explosion of technological and entrepreneurial interest in payment processing. Around the world, money transfer startups spun up to compete with legacy banks and MSBs like Western Union. They focused on enabling low-cost, peer-to-peer transfers of cash within and across borders, using Bitcoin and other cryptocurrencies as the “rails” (a commonly used term in this space). Many explicitly targeted remittances and championed the cause of financial inclusion. Alongside these entrepreneurs grew a community of academics and policy advocates supporting updated regulatory considerations for cryptocurrency.

Dampening this excitement was Bitcoin’s bad reputation. Many criminals used Bitcoin’s pseudonymity to hide in plain sight, conducting ransomware attacks and operating dark marketplaces for the exchange of illegal goods and services.

In May 2019, the Financial Crimes Enforcement Network (FinCEN) of the United States issued new guidance on how the Bank Secrecy Act (BSA) of 1970 applies to cryptocurrency, or what FinCEN calls convertible virtual currencies (CVC) [18]. Consistent with the BSA, the guidance calls for MSBs to generate individualized risk assessments measuring exposure to money laundering, terrorism finance, and other financial crime. These assessments are based on customer composition, geographies served, and financial products or services offered. The assessments must inform the management of customer relationships, including the implementation of controls commensurate with risk. In other words, MSBs must not only report suspicious accounts, but must also take action against them (e.g., freeze them or shut them down). The guidance defines a “well-developed risk assessment” as “assisting MSBs in identifying and providing a comprehensive analysis of their individual risk profile.” Reinforcing the Know Your Customer (KYC) requirements of the BSA, the guidance requires MSBs to “know enough about their customers to be able to determine the risk level they represent to the institution.”

What it means to “know enough” about one’s customer is the subject of much debate in compliance and policy circles. In practice, one of the most challenging aspects of this is an implicit but effectively enforced requirement to know not only your customer, but also your customer’s customer. In the fragmented data ecosystem of traditional finance, this aspect of compliance is often executed by phone calls between MSBs. But in the open system of Bitcoin, the full transaction graph is publicly available, albeit in pseudonymous and unlabelled form.

To meet the opportunity this public data presents, cryptocurrency intelligence companies have emerged to provide AML solutions tailored to the cryptocurrency domain. Whereas the pseudonymity of Bitcoin is an advantage for criminals, the public availability of data is a key advantage for investigators.

2 THE ELLIPTIC DATA SET
Elliptic is a cryptocurrency intelligence company focused on safeguarding cryptocurrency ecosystems from criminal activity. For this tutorial, we present the Elliptic Data Set, a graph network of Bitcoin transactions with handcrafted features. As a contribution to the research and AML communities, Elliptic has agreed to share this data set publicly. To our knowledge, it constitutes the world’s largest labelled transaction data set publicly available in any cryptocurrency.

2.1 Graph Construction
The Elliptic Data Set maps Bitcoin transactions to real entities belonging to licit categories (exchanges, wallet providers, miners, licit services, etc.) versus illicit ones (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.). From the raw Bitcoin data, a graph is constructed and labelled. Its nodes represent transactions, and its edges represent the flow of Bitcoin currency (BTC) from one transaction to the next. A given transaction is deemed licit (or illicit) if the entity initiating the transaction belongs to a licit (or illicit) category¹. Here the initiating entity is the one controlling the private keys associated with the input addresses of that transaction. Importantly, all features are constructed using only publicly available information.

¹ Note that, for simplicity, this argument ignores mixer transactions, where the inputs are controlled by multiple entities.

2.1.1 Nodes and Edges. There are 203,769 transaction nodes and 234,355 directed edges representing payment flows. For perspective, the full Bitcoin network has approximately 438M nodes and 1.1B edges in the same graph representation, as of this writing. In the Elliptic Data Set, two percent of the nodes (4,545) are labelled class 1 (illicit), and twenty-one percent (42,019) are labelled class 2 (licit). The remaining transactions are not labelled as licit or illicit, but they still have features.

2.1.2 Features. Each node has 166 associated features. The first 94 features represent local information about the transaction. These include the time step, number of inputs/outputs, transaction fee, and output volume. They also include aggregated figures such as the average BTC received (spent) by the inputs/outputs and the average number of incoming (outgoing) transactions associated with the inputs/outputs. The remaining 72 features, called aggregated features, are obtained by aggregating transaction information one hop backward/forward from the center node. They give the maximum, minimum, standard deviation, and correlation coefficients of the neighbour transactions for the same information (number of inputs/outputs, transaction fee, etc.).
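The feature layout and label counts described above can be made concrete with a small NumPy sketch. The helpers `split_features` and `label_shares` are hypothetical. The column order (94 local features first, then 72 aggregated ones) follows 2.1.2. The random matrix is placeholder data, not the Elliptic Data Set itself.

```python
import numpy as np

N_LOCAL = 94                  # local features (2.1.2), including the time step
N_AGG = 72                    # aggregated one-hop neighbourhood features
N_FEATURES = N_LOCAL + N_AGG  # 166 in total

def split_features(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the (LF, AF) inputs for a (n_nodes, 166) feature matrix."""
    if X.shape[1] != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} columns, got {X.shape[1]}")
    lf = X[:, :N_LOCAL]  # LF: local features only
    af = X               # AF: all features
    return lf, af

def label_shares(n_total: int, n_illicit: int, n_licit: int) -> dict[str, float]:
    """Fraction of nodes that are illicit, licit, or unlabelled."""
    n_unlabelled = n_total - n_illicit - n_licit
    return {
        "illicit": n_illicit / n_total,
        "licit": n_licit / n_total,
        "unlabelled": n_unlabelled / n_total,
    }

# Counts reported in 2.1.1.
shares = label_shares(203_769, 4_545, 42_019)
for name, frac in shares.items():
    print(f"{name:>10}: {frac:.1%}")

# Placeholder data with the documented width (10 hypothetical nodes).
X = np.random.default_rng(0).normal(size=(10, N_FEATURES))
lf, af = split_features(X)
print(lf.shape, af.shape)
```

Rounded, the shares come out at about 2% illicit, 21% licit, and 77% unlabelled. This is consistent with the percentages quoted in 2.1.1.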
Figure 1: (Top) Fraction of illicit vs. licit nodes at different time steps in the data set. (Bottom) Number of nodes vs. time step.
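Both panels of Figure 1 are simple per-time-step statistics. The sketch below shows one way to compute them. It rests on two assumptions. First, labels are integer codes: 1 = illicit and 2 = licit, following the class numbering of 2.1.1, plus 0 for unlabelled, which is an encoding chosen here. Second, the "fraction" in the top panel is taken over labelled nodes only. The function name `per_step_stats` is hypothetical.

```python
import numpy as np

def per_step_stats(time_step: np.ndarray, label: np.ndarray) -> list[tuple[int, int, float]]:
    """Per time step: (step, number of nodes, illicit share among labelled nodes).

    Assumed label encoding: 1 = illicit, 2 = licit, 0 = unlabelled.
    The illicit share is NaN for a step that has no labelled nodes.
    """
    rows = []
    for t in np.unique(time_step):
        in_step = time_step == t                    # nodes in this snapshot
        n_illicit = int(np.sum(label[in_step] == 1))
        n_licit = int(np.sum(label[in_step] == 2))
        n_labelled = n_illicit + n_licit
        share = n_illicit / n_labelled if n_labelled else float("nan")
        rows.append((int(t), int(in_step.sum()), share))
    return rows

# Toy input with two time steps.
time_step = np.array([1, 1, 1, 1, 2, 2, 2])
label = np.array([1, 2, 2, 0, 2, 2, 0])
for step, n_nodes, share in per_step_stats(time_step, label):
    print(f"step {step}: {n_nodes} nodes, illicit share {share:.3f}")
```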
2.1.3 Temporal Information. A time stamp is associated with each node, representing an estimate of the time when the transaction is confirmed by the Bitcoin network. There are 49 distinct time steps, evenly spaced with an interval of about two weeks. Each time step contains a single connected component of transactions that appeared on the blockchain within less than three hours of each other. There are no edges connecting different time steps. The nodes in a given time step therefore have time stamps very close to each other, so each time step can effectively be thought of as an instantaneous “snapshot” in time. The number of nodes per time step is reasonably uniform over time, ranging from 1,000 to 8,000 nodes (see Figure 1).

of wallets participating in the selected transactions. To overcome this, Elliptic uses a high-performance, all-in-memory graph engine for the computation of features.

The second challenge arises from the underlying graph structure of the data and the heterogeneity in the number of neighbors a transaction can have. In building the 72 aggregated features, the problem of heterogeneous neighborhoods is addressed by naively constructing statistical aggregates (minimum, maximum, etc.) of the local features of neighboring transactions. In general, this solution is sub-optimal because it incurs a significant loss of information. We address this in our forthcoming discussion of graph deep learning methods, which may better account for the local graph topology.
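A minimal NumPy sketch of this kind of degree-independent aggregation follows. It illustrates the idea under assumptions and is not Elliptic's feature pipeline. The following are all choices made here:

- the function name `neighbor_aggregates`;
- the [min | max | std] output layout;
- treating both edge directions as neighbors;
- zero-filling nodes that have no neighbors.

The correlation coefficients mentioned in 2.1.2 are omitted.

```python
import numpy as np

def neighbor_aggregates(local: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Summarize each node's one-hop neighborhood with fixed-width statistics.

    local: (n_nodes, n_feats) local feature matrix.
    edges: (n_edges, 2) directed (src, dst) pairs; predecessors and successors
           both count as neighbors (one hop backward and forward).
    Returns an (n_nodes, 3 * n_feats) matrix laid out as [min | max | std].
    Nodes with no neighbors get zeros (an arbitrary choice for this sketch).
    """
    n_nodes, n_feats = local.shape
    neighbors = [set() for _ in range(n_nodes)]
    for src, dst in edges:
        neighbors[int(src)].add(int(dst))  # forward neighbor
        neighbors[int(dst)].add(int(src))  # backward neighbor

    out = np.zeros((n_nodes, 3 * n_feats))
    for v, nbrs in enumerate(neighbors):
        if not nbrs:
            continue
        block = local[sorted(nbrs)]        # (degree, n_feats); degree varies per node
        out[v, :n_feats] = block.min(axis=0)
        out[v, n_feats:2 * n_feats] = block.max(axis=0)
        out[v, 2 * n_feats:] = block.std(axis=0)
    return out

# Toy graph: 0 -> 1 -> 2, node 3 has no edges; one local feature per node.
local = np.array([[1.0], [3.0], [5.0], [7.0]])
edges = np.array([[0, 1], [1, 2]])
print(neighbor_aggregates(local, edges))
```

The output width is the same for every node regardless of its degree, which is what makes these features usable by standard classifiers. The price is the information loss noted above: the statistics no longer record which neighbor contributed which value, or anything beyond one hop.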
3.1 Benchmark Methods
Given the features previously described, benchmark machine learning methods use the first 94 features in supervised learning for binary classification. Such techniques include Logistic Regression [1], Multilayer Perceptron (MLP) (ibid.), and Random Forest [2]. In an MLP, each input neuron takes in a data feature, and the output layer is a softmax that produces a probability vector over the classes. Logistic Regression and Random Forest are popular for AML, especially when used in concert for their respective advantages: Random Forest for accuracy and Logistic Regression for explainability. These methods, however, do not leverage any graph information.

In the Elliptic Data Set, the local features are enhanced with a set of 72 features that contain information about the immediate neighbourhood. We will see that using these features improves performance. This approach shows that the graph structure carries useful information for the binary classification problem, and that this information can be exploited with standard machine learning techniques. However, it is challenging to extend the purely feature-based method beyond the immediate neighbourhood. This drawback motivates the use of Graph Convolutional Networks.

3.2 Graph Convolutional Networks (GCN)
Deep learning on graph-structured data is a subject of rapidly increasing interest [3, 6, 8, 9, 14]. The combinatorial complexity inherent to graph structures poses scalability challenges for practical applications, and significant strides have been made in addressing these challenges [5, 11, 24]. Specifically, we consider Graph Convolutional Networks (GCNs). A GCN consists of multiple layers of graph convolution. Each layer is similar to a perceptron layer, but additionally uses a neighborhood aggregation step motivated by spectral convolution.

Consider the Bitcoin transaction graph from the Elliptic Data Set as $G = (N, E)$, where $N$ is the set of node transactions and $E$ is the set of edges representing the flow of BTC. The $l$-th layer of the GCN takes the adjacency matrix $A$ and the node embedding matrix $H^{(l)}$ as input. It uses a weight matrix $W^{(l)}$ to update the node embedding matrix to $H^{(l+1)}$ as output. Mathematically, we write

$$ H^{(l+1)} = \sigma\big(\hat{A}\, H^{(l)} W^{(l)}\big), \qquad (1) $$

where $\hat{A}$ is a normalization of $A$ defined as

$$ \hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \qquad \tilde{A} = A + I, \qquad \tilde{D} = \operatorname{diag}\Big(\sum_j \tilde{A}_{ij}\Big), $$

and $\sigma$ is the activation function (typically ReLU) for all but the output layer. The initial embedding matrix comes from the node features, i.e., $H^{(0)} = X$. Let there be $L$ layers of graph convolutions. For node classification, the output layer uses a softmax, so that $H^{(L)}$ consists of prediction probabilities.

A graph convolution layer is thus similar to a feed-forward layer, except for the multiplication with $\hat{A}$ in front. This matrix is motivated by spectral graph filtering on the graph Laplacian, and it is a linear function of the (normalized) Laplacian. Alternatively, one may interpret the multiplication with $\hat{A}$ as an aggregation of the transformed embeddings of the neighboring nodes. The parameters of the GCN are the weight matrices $W^{(l)}$ for the different layers $l$.

A 2-layer GCN, as often used, can be neatly written as

$$ H^{(2)} = \operatorname{softmax}\Big(\hat{A} \cdot \operatorname{ReLU}\big(\hat{A} X W^{(0)}\big) \cdot W^{(1)}\Big). $$

A “skip” variant, which we find practically useful, inserts a skip connection between the intermediate embedding $H^{(1)} = \operatorname{ReLU}(\hat{A} X W^{(0)})$ and the input node features $X$, resulting in the architecture

$$ \tilde{H}^{(2)} = \operatorname{softmax}\Big(\hat{A} \cdot \operatorname{ReLU}\big(\hat{A} X W^{(0)}\big) \cdot W^{(1)} + X \tilde{W}^{(1)}\Big), $$

where $\tilde{W}^{(1)}$ is a weight matrix for the skip connection. We call this architecture Skip-GCN. When $W^{(0)}$ and $W^{(1)}$ are zero, the graph term vanishes and Skip-GCN reduces to $\operatorname{softmax}(X \tilde{W}^{(1)})$, which is Logistic Regression. Hence, Skip-GCN should be at least as powerful as Logistic Regression.

3.3 Temporal Modeling
Financial data are inherently temporal, as transactions are time stamped. It is reasonable to assume that certain dynamics, albeit hidden, drive the evolution of the system. A prediction model will be more useful if it is designed to capture these dynamics. This way, a model trained on a given time period may generalize better to subsequent time steps. The better the model captures system dynamics, which are themselves evolving, the longer the horizon it can forecast into.

A temporal model that extends GCN is EvolveGCN [19], which computes a separate GCN model for each time step. These GCNs are then connected through a recurrent neural network (RNN) to capture the system dynamics. Hence, the GCN model for a future time step is evolved from those in the past, and this evolution captures the dynamism.

In EvolveGCN, the GCN weights are collectively treated as the system state. At every time step, an RNN (e.g., a GRU) updates this state, driven by an input to the system. The input is the graph information at the current time step. This graph information may be instantiated in many ways; in EvolveGCN, it is represented by the embeddings of the top-k influential nodes in the graph.

4 EXPERIMENTS
Here we show experimental results obtained on the Elliptic Data Set. We performed a 70:30 temporal split of training and test data: the first 34 time steps are used for training the model and the last 15 for testing. We use a temporal split because it reflects the nature of the task. As such, GCN is trained in an inductive setting.

We first tested standard classification models for licit/illicit prediction using three standard approaches:
- Logistic Regression, with default parameters from the scikit-learn Python package [4];
- Random Forest, also from scikit-learn, with 50 estimators and 50 max features;
- Multilayer Perceptron, implemented in PyTorch.

Our MLP had one hidden layer of 50 neurons and was trained for 200 epochs using the Adam optimizer with a learning rate of 0.001.

We evaluated these models using all 166 features (referred to as AF), as well as only the local ones, i.e., the first 94 (referred to as LF). The results are summarized in the top part of Table 1.

The bottom part of Table 1 reports the results achieved when we leveraged the graph structure of the data. We trained the GCN model for 1000 epochs using the Adam optimizer with a learning
rate of 0.001. In our experiments we used a 2-layer GCN and, after hyperparameter tuning, set the size of the node embeddings to 100.

The task is binary classification, and the two classes are imbalanced (see Figure 1). For AML, the minority class (i.e., the illicit class) is the more important one. Hence, we trained the GCN model using a weighted cross-entropy loss to give higher importance to the illicit samples. After hyperparameter tuning, we opted for a 0.3/0.7 ratio for the licit and illicit classes. Table 1 shows the test results in terms of precision, recall, and F1 score for the illicit class. For the sake of completeness, we also show the micro-averaged F1 score.

Table 1: Illicit classification results. The top part of the table shows results without leveraging graph information. Each model is evaluated with different inputs: AF refers to all features, LF to the local features (i.e., the first 94), and NE to the node embeddings computed by GCN. The bottom part of the table shows results with GCN.

                        Illicit                      MicroAVG
Method                  Precision  Recall  F1        F1
Logistic Regr. AF       0.404      0.593   0.481     0.931
Logistic Regr. AF+NE    0.537      0.528   0.533     0.945
Logistic Regr. LF       0.348      0.668   0.457     0.920
Logistic Regr. LF+NE    0.518      0.571   0.543     0.945
Random Forest AF        0.956      0.670   0.788     0.977
Random Forest AF+NE     0.971      0.675   0.796     0.978
Random Forest LF        0.803      0.611   0.694     0.966
Random Forest LF+NE     0.878      0.668   0.759     0.973
MLP AF                  0.694      0.617   0.653     0.962
MLP AF+NE               0.780      0.617   0.689     0.967
MLP LF                  0.637      0.662   0.649     0.958
MLP LF+NE               0.6819     0.5782  0.6258    0.986
------------------------------------------------------------
GCN                     0.812      0.512   0.628     0.961
Skip-GCN                0.812      0.623   0.705     0.966

Note that GCN and the variant Skip-GCN outperform Logistic Regression, indicating the usefulness of a graph-based method compared to one agnostic to graph information. On the other hand, in this case the input features are already quite informative. Using these features alone, Random Forest achieves the best F1 score. The representation power of the input features is also reflected in the gain of Skip-GCN over GCN.

Another insight from Table 1 comes from comparing methods trained on all the features (AF) with those trained on only the 94 local features (LF). For all three evaluated models, the aggregated information led to higher accuracy, indicating the importance of the graph structure in this context. Following this observation, we further evaluated the methods with an enhanced input feature set. The goal of this experiment was to show that graph information is useful for enhancing the representation of a transaction. In this setting, we concatenated the node embeddings obtained from GCN with the original features X. Results show that the enhanced feature set improves the accuracy of the model, for both full features (AF + NE) and local features (LF + NE).

Table 2 compares the prediction performance of the non-temporal GCN and the temporal EvolveGCN. EvolveGCN consistently outperforms GCN, although the improvement is not substantial for this data set. One avenue of further investigation is the use of alternative forms of system input to drive the recurrent update inside the GRU.

Table 2: GCN vs. EvolveGCN
            GCN                          EvolveGCN
            Precision  Recall  F1        Precision  Recall  F1
Illicit     0.812      0.623   0.705     0.850      0.624   0.720
MicroAVG    0.966      0.966   0.966     0.968      0.968   0.968

The Dark Market Shutdown. An important consideration for AML is the robustness of a prediction model with respect to emerging events. One interesting aspect of this data set is the sudden closure of a dark market during the time span of the data (at time step 43). As seen in Figure 2, this event causes all methods to perform poorly after the shutdown. Consider even a Random Forest model that is re-trained after every test time step, assuming ground truth becomes available after each step. Such a model still cannot reliably capture new illicit transactions after the dark market shutdown. The robustness of methods to such events emerges as a major challenge to address.
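A minimal forward-pass sketch in NumPy makes three things from 3.2 concrete: Equation (1), the normalization of A, and the Skip-GCN argument. It is not the authors' implementation. It uses dense matrices, random weights, and a toy symmetric 3-node graph, and it omits training. The text does not state how the directed transaction edges are turned into A (for example, whether they are symmetrized), so the toy graph simply sidesteps that question. All function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Return A_hat = D^{-1/2} (A + I) D^{-1/2}, with D the row sums of A + I."""
    A_tilde = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def relu(Z: np.ndarray) -> np.ndarray:
    return np.maximum(Z, 0.0)

def softmax(Z: np.ndarray) -> np.ndarray:
    Z = Z - Z.max(axis=1, keepdims=True)              # row-wise shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def gcn2(A_hat, X, W0, W1):
    """2-layer GCN: softmax(A_hat . ReLU(A_hat X W0) . W1)."""
    H1 = relu(A_hat @ X @ W0)
    return softmax(A_hat @ H1 @ W1)

def skip_gcn2(A_hat, X, W0, W1, W_skip):
    """Skip-GCN: softmax(A_hat . ReLU(A_hat X W0) . W1 + X W_skip)."""
    H1 = relu(A_hat @ X @ W0)
    return softmax(A_hat @ H1 @ W1 + X @ W_skip)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)                # toy 3-node path graph (symmetric)
X = rng.normal(size=(3, 4))                           # 3 nodes, 4 input features
W0 = rng.normal(size=(4, 5))                          # input -> 5 hidden units
W1 = rng.normal(size=(5, 2))                          # hidden -> 2 classes
W_skip = rng.normal(size=(4, 2))                      # skip-connection weights
A_hat = normalize_adjacency(A)

P = gcn2(A_hat, X, W0, W1)                            # (3, 2) class probabilities

# With W0 = W1 = 0 the graph branch is zero, leaving softmax(X W_skip):
# multinomial logistic regression on the raw node features.
P_lr = skip_gcn2(A_hat, X, np.zeros_like(W0), np.zeros_like(W1), W_skip)
print(np.allclose(P_lr, softmax(X @ W_skip)))         # True
```

The final check reproduces the argument in the text. With the graph branch zeroed out, Skip-GCN's output is a softmax over a linear function of X, which is Logistic Regression.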
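The evaluation protocol of Section 4 can be sketched in plain NumPy. It has four parts:

- a temporal 34/15 split;
- a 0.3/0.7 class-weighted cross-entropy loss;
- illicit-class precision, recall, and F1;
- micro-averaged F1.

The sketch makes these assumptions: time steps are numbered 1 to 49, labels are remapped to 0 = licit and 1 = illicit, and the function names are hypothetical. The loss is normalized by the total weight. This is how PyTorch's `torch.nn.CrossEntropyLoss` with a `weight` argument averages under its default `reduction='mean'`; the text does not say whether the authors' code used that normalization.

```python
import numpy as np

# Assumed numbering 1..49: the first 34 steps train, the last 15 test (70:30).
TRAIN_STEPS = range(1, 35)
TEST_STEPS = range(35, 50)

def temporal_split(time_step: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Boolean train/test masks; no node from a later step is used for training."""
    train = np.isin(time_step, list(TRAIN_STEPS))
    test = np.isin(time_step, list(TEST_STEPS))
    return train, test

def weighted_cross_entropy(probs: np.ndarray, y: np.ndarray,
                           class_weights=(0.3, 0.7)) -> float:
    """Class-weighted negative log-likelihood, averaged by the total weight.

    probs: (n, 2) predicted probabilities; column 0 = licit, column 1 = illicit.
    y:     (n,) integer labels in {0, 1} (remapped from the data set's 2/1 classes).
    """
    w = np.asarray(class_weights)[y]                  # per-sample weight
    nll = -np.log(probs[np.arange(len(y)), y])        # per-sample loss
    return float(np.sum(w * nll) / np.sum(w))

def illicit_prf(y_true: np.ndarray, y_pred: np.ndarray, positive: int = 1):
    """Precision, recall, and F1 for the illicit (positive) class."""
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def micro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Micro-averaged F1; for single-label classification it equals accuracy."""
    return float(np.mean(y_true == y_pred))

# Toy example: six labelled nodes from different time steps.
time_step = np.array([3, 20, 34, 35, 43, 49])
train, test = temporal_split(time_step)
print(train.tolist(), test.tolist())

y_true = np.array([1, 0, 0, 1, 0, 1])   # 1 = illicit, 0 = licit
y_pred = np.array([1, 0, 1, 0, 0, 1])
print(illicit_prf(y_true, y_pred), micro_f1(y_true, y_pred))
```

Micro-averaged precision, recall, and F1 coincide for single-label classification, which matches the identical MicroAVG values in each row of Table 2.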
6 GRAPH VISUALIZATION
Lastly, in support of analysis and explainability, which are important for AML compliance, we have created a visualization prototype called Chronograph. Visualizing a high-dimensional graph adds a layer of complexity to explaining model performance, beyond what plain feature vectors involve. Chronograph aims to address this by supporting the human analyst with an integrated representation of the model.
Figure 4: Chronograph user interface. The user can navigate through time-sliced transaction data and observe transaction patterns and patterns of change. Illicit transactions are colored red. Further statistics are displayed on the left.

shared early experimental results using a variety of methods including Graph Convolutional Networks, and discussed possible next steps for algorithmic advances. We have provided a prototype for visualizing such data and models to augment human analysis and explainability. Most importantly, we hope to have inspired others to work on this societally important challenge of making our financial systems safer and more inclusive.

ACKNOWLEDGMENTS
This work was funded by the MIT-IBM Watson AI Lab (mitibm.mit.edu), a joint research initiative between the Massachusetts Institute of Technology and IBM Research. Data and domain expertise were provided by Elliptic (www.elliptic.co).

REFERENCES
[1] Christopher Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag.
[2] Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5–32.
[3] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral Networks and Locally Connected Networks on Graphs. In ICLR.
[4] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108–122.
[5] Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In ICLR.
[6] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NIPS.
[7] Asli Demirgüç-Kunt, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2017. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution.
[8] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
[9] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS.
[10] Martin Harrigan and Christoph Fretter. 2016. The unreasonable effectiveness of address clustering. In 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld). IEEE, 368–373.
[11] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
[12] KNOMAD and World Bank Group. 2019. Migration and Remittances: Recent Developments and Outlook. Migration and Development Brief 31.
[13] Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. 2015. Deep Neural Decision Forests. In ICCV.
[14] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated Graph Sequence Neural Networks. In ICLR.
[15] Leland McInnes, John Healy, and James Melville. 2018. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426 (2018).
[16] Daniel J. Mitchell. 2012. World Bank Study Shows How Anti-Money Laundering Rules Hurt the Poor. Forbes.
[17] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. (2008).
[18] Financial Crimes Enforcement Network. 2019. Application of FinCEN’s Regulations to Certain Business Models Involving Convertible Virtual Currencies. FIN-2019-G001 (May 2019).
[19] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, and Charles E. Leiserson. 2019. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Preprint arXiv:1902.10191.
[20] Paulo E. Rauber, Samuel G. Fadel, Alexandre X. Falcao, and Alexandru C. Telea. 2016. Visualizing the hidden activity of artificial neural networks. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 101–110.
[21] Mark Weber, Jie Chen, Toyotaro Suzumura, Aldo Pareja, Tengfei Ma, Hiroki Kanezashi, Tim Kaler, Charles E. Leiserson, and Tao B. Schardl. 2018. Scalable Graph Learning for Anti-Money Laundering: A First Look. CoRR abs/1812.00076 (2018). arXiv:1812.00076 http://arxiv.org/abs/1812.00076
[22] Wikipedia. [n. d.]. 1Malaysia Development Berhad scandal.
[23] Wikipedia. [n. d.]. Danske Bank money laundering scandal.
[24] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD.