Machine Learning: A Revolution in Risk Management and Compliance?
Bart van Liebergen – Associate Policy Advisor, Institute of International Finance
Abstract
Machine learning and artificial intelligence are big topics in the financial services sector these days. Financial institutions (FIs) are looking to more powerful analytical approaches in order to manage and mine increasing amounts of regulatory reporting data and unstructured data, for purposes of compliance and risk management (applying machine learning as "RegTech") or in order to compete effectively with other FIs and FinTechs. This article aims to give an introduction to the machine learning field and discusses several application cases within financial institutions, based on discussions with IIF members and technology ventures: credit risk modeling, detection of credit card fraud and money laundering, and surveillance of conduct breaches at FIs.

Two tentative conclusions emerge on the added value of applying machine learning in the financial services sector. First, the ability of machine learning methods to analyze very large amounts of data, while offering a high granularity and depth of predictive analysis, can improve analytical capabilities across risk management and compliance areas in FIs. Examples are the detection of complex illicit transaction patterns on payment systems and more accurate credit risk modeling. Second, the application of machine learning approaches within the financial services sector is highly context-dependent. Ample, high-quality data for training or analysis are not always available in FIs. More importantly, the predictive power and granularity of analysis of several approaches can come at the cost of increased model complexity and a lack of explanatory insight. This is an issue particularly where analytics are applied in a regulatory context, and a supervisor or compliance team will want to audit and understand the applied model.
BACKGROUND TO MACHINE LEARNING

Machine learning comprises a broad range of analytical tools, which can be categorized into "supervised" and "unsupervised" learning tools. Supervised machine learning involves building a statistical model for predicting or estimating an output based on one or more inputs (e.g., predicting GDP growth based on several variables). In unsupervised learning, only input variables are observed, without a corresponding dependent variable.

2 James, G., D. Witten, T. Hastie, and R. Tibshirani, 2013, An introduction to statistical learning: with applications in R, Springer Texts in Statistics. The difference between both methods has also been described as supervised ML being based on "labeled" data to train the algorithm, while unsupervised ML lacks training data with such labels and has to determine correlations by itself. However, this is the same as having a dependent variable or not: labels in the training data are values of the dependent variable.
3 Large datasets are typically divided into several separate samples to estimate a model (training), to choose the model (validation), and to evaluate how well the chosen model performs (testing).
4 Khandani, A. E., A. J. Kim, and A. W. Lo, 2010, "Consumer credit risk models via machine-learning algorithms," Journal of Banking & Finance 34, 2767–2787
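As a minimal illustration of the supervised setting described above, and of the training/validation/testing split mentioned in footnote 3, the following sketch fits a simple supervised model and evaluates it on held-out observations. It assumes the open-source scikit-learn library and uses synthetic data; none of the variables come from the article.

# Sketch: supervised learning with a training/validation/test split.
# Synthetic data; sample sizes and coefficients are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                                    # five input variables
y = X @ np.array([0.5, -1.0, 0.0, 2.0, 0.3]) + rng.normal(scale=0.5, size=1000)

# Split into training (60%), validation (20%), and test (20%) samples.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)       # estimate the model on training data
print("validation R^2:", model.score(X_val, y_val))    # choose/tune the model
print("test R^2:      ", model.score(X_test, y_test))  # evaluate the chosen model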
The choice of approach typically depends on the kind of statistical problem one might want to address. Broadly speaking, machine learning can be applied to three classes of statistical problems: regression, classification, and clustering. Regression and classification problems both can be solved through supervised machine learning; clustering is an unsupervised machine learning approach.

Regression problems involve prediction of a quantitative, continuous dependent variable, such as GDP growth or inflation. Linear learning methods for solving regression problems include partial least squares5 and principal component analysis; non-linear learning methods include penalized regression approaches, such as LASSO and elastic nets.6 In penalized approaches, a factor is typically added to penalize complexity in the model, which should improve its predictive performance.
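To make the penalty idea concrete, here is a small sketch, assuming scikit-learn and synthetic data (nothing here comes from the article): LASSO and elastic net add a complexity penalty to an ordinary least-squares fit, shrinking weak coefficients toward zero and so selecting a sparser model.

# Sketch: penalized regression (LASSO and elastic net) versus plain OLS.
# Synthetic data; penalty strengths (alpha, l1_ratio) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))                      # many candidate predictors...
beta = np.zeros(20)
beta[:3] = [1.5, -2.0, 0.8]                         # ...but only three actually matter
y = X @ beta + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)                  # L1 penalty drops irrelevant variables
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2 penalties

print("non-zero coefficients, OLS:        ", int(np.sum(np.abs(ols.coef_) > 1e-6)))
print("non-zero coefficients, LASSO:      ", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
print("non-zero coefficients, elastic net:", int(np.sum(np.abs(enet.coef_) > 1e-6)))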
Classification problems typically involve prediction of a qualitative (discrete) dependent variable, which takes on values in a class, such as blood type (A/B/AB/O). An example is filtering spam e-mail, where the dependent variable can take on the values SPAM/NO SPAM. Such problems can be solved by a decision tree, "which aims to deliver a structured set of yes/no questions that can quickly sort through a wide set of features, and thus produce an accurate prediction of a particular outcome."7 Support vector machines also classify observations, but by applying and optimizing a margin that separates the different classes more efficiently.8
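A minimal sketch of the spam-filtering example, assuming scikit-learn and made-up message features (the word counts and sender flag are illustrative, not from the article): a decision tree learns a set of yes/no splits that map the features to SPAM or NO SPAM.

# Sketch: a decision tree classifier for the SPAM / NO SPAM example.
# Feature names and the tiny toy dataset are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [number of exclamation marks, contains the word "free" (0/1), known sender (0/1)]
X = [[5, 1, 0], [0, 0, 1], [3, 1, 0], [1, 0, 1], [7, 1, 0], [0, 0, 1]]
y = ["SPAM", "NO SPAM", "SPAM", "NO SPAM", "SPAM", "NO SPAM"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The fitted tree is a readable set of yes/no questions:
print(export_text(tree, feature_names=["exclamations", "contains_free", "known_sender"]))
print(tree.predict([[4, 1, 0]]))   # classify a new message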
In clustering, lastly, only input variables are observed while a corresponding dependent variable is lacking. An example is exploring data to detect fraud without knowing which observations are fraudulent and which not. An anti-money laundering (AML) analysis may nonetheless yield insights from the data by grouping them in clusters according to their observed characteristics. This may allow an analyst to understand which transactions are similar to others. In some instances, unsupervised learning is first applied to explore a dataset; the outputs of this approach are then used as inputs for supervised learning methods.9
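As a sketch of that last pattern (unsupervised exploration whose outputs feed a supervised model), assuming scikit-learn and synthetic transaction features: k-means groups the observations, and the resulting cluster label is added as an extra input to a classifier.

# Sketch: unsupervised clustering first, its output then used as an input
# to a supervised model. Data and the cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))              # e.g., amount, hour, counterparties, country risk
y = (X[:, 0] + rng.normal(size=500) > 1.5).astype(int)   # toy label for the supervised step

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Use the cluster assignment as an additional feature for the supervised model.
X_augmented = np.column_stack([X, clusters])
clf = LogisticRegression(max_iter=1000).fit(X_augmented, y)
print("in-sample accuracy:", round(clf.score(X_augmented, y), 3))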
Table 1 classifies popular machine learning approaches according to their (un)supervised learning character and the types of problems they can be applied to.

Table 1 – Overview of machine learning methods (by problem type)
Regression (supervised):
• linear: principal components; ridge; partial least squares; LASSO
• non-linear: penalized regression (LASSO, LARS, elastic nets); neural networks and deep learning
Classification (supervised):
• linear: support vector machines
• non-linear: decision trees (classification trees, regression trees, random forests); support vector machines; deep learning
Clustering (unsupervised)*:
• clustering methods (k-means, x-means, hierarchical); principal components analysis; deep learning
* Since unsupervised methods do not describe a relation between a dependent and an independent variable, they cannot be labelled linear or non-linear.

The strong predictive power of machine learning methods can come at the cost of inference as well, as statistical methods are typically subject to a trade-off between explanatory and predictive performance. A good predictive model can be very complex, and may thus be very hard to interpret.10 For predictive purposes, a model would need only to give insight in correlations between variables, not in causality. In the case of credit scoring a loan portfolio, a good inferential model would explain why certain borrowers do not repay their loans. Its inferential performance can be assessed through its statistical significance and its goodness-of-fit within the data sample. A good predictive model, on the other hand, will select those indicators that prove to be the strongest predictors of a borrower default. To that end, it does not matter whether an indicator reflects a causal factor of the borrower's ability to repay, or a symptom of it. What matters is that it contains information about the ability to repay.

5 PLS is used to find the fundamental relations between two matrices through linear regression.
6 LASSO stands for least absolute shrinkage and selection operator. LASSO and elastic nets both perform variable selection, yet apply different types of penalties for model complexity.
7 Tiffin, A., 2016, "Seeing in the dark: a machine-learning approach to nowcasting in Lebanon," IMF Working Paper WP/16/56, March
8 Auria, L., and R. Moro, 2008, "Support vector machines as a technique for solvency analysis," Deutsches Institut für Wirtschaftsforschung, discussion papers
Machine learning is thereby not better at identifying causal relations than conventional statistical approaches. It can only be more accurate at inferring those correlations. However, one observer has noted, "[i]f you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down."17

Tackling overfitting: bagging and ensembles
Excessively complex models can also lead to "overfitting," where they describe random error or noise instead of underlying relationships in the dataset. Model complexity can be due to having too many parameters relative to the number of observations.11 In machine learning, overfitting is particularly prevalent in non-parametric, non-linear models, which are also complex by design (and therefore also typically hard to interpret). When a model describes noise in a dataset, it will fit that one data sample very well, but will perform poorly when tested out-of-sample.12
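A small sketch of that in-sample versus out-of-sample gap, assuming scikit-learn and synthetic data: an unconstrained decision tree memorizes the training sample, noise included, and scores markedly worse on held-out observations than a simpler tree.

# Sketch: overfitting shows up as a gap between in-sample and out-of-sample performance.
# Synthetic noisy data; the depth settings are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=400) > 0).astype(int)  # noisy signal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):                  # None = grow the tree until it also fits the noise
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train accuracy {tree.score(X_train, y_train):.2f}, "
          f"test accuracy {tree.score(X_test, y_test):.2f}")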
There are several ways to deal with overfitting and improve the forecast power of machine learning models, including "bootstrapping," "boosting" and "bootstrap aggregation" (also called bagging).13 Boosting concerns the overweighting of scarcer observations in a training dataset to ensure the model will train more intensively on them. For example, one may want to overweight the fraudulent observations due to their relative scarcity when training a model to detect fraudulent transactions in a dataset. In "bagging," a model is run hundreds or thousands of times, each on a different subsample of the dataset, to improve its predictive performance. The final model is then an average of each of the run models. Since this average model has been tested on a lot of different data samples, it should be more resilient to changes in the underlying data. A "random forest" is an example of a model consisting of many different decision tree-based models.
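A minimal sketch of bagging, assuming scikit-learn and synthetic data: many trees are fitted on different bootstrap subsamples and averaged, which is how a random forest is built. The class_weight option is used here as a stand-in for the overweighting of scarce (e.g., fraudulent) observations that the text associates with boosting; it is an assumption for illustration, not the article's recipe.

# Sketch: bagging many decision trees (a random forest) over bootstrap subsamples.
# Synthetic imbalanced data; n_estimators and class_weight are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=2000) > 2.2).astype(int)  # rare "fraud" class

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=500,          # each tree is fitted on a different bootstrap subsample
    class_weight="balanced",   # overweight the scarce class, as described for fraud detection
    random_state=0,
).fit(X_train, y_train)

print("share of rare-class cases:", round(float(y.mean()), 3))
print("out-of-sample accuracy:   ", round(forest.score(X_test, y_test), 3))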
“random forest” is an example of a model consisting of many
different decision tree-based models.
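As a sketch of that stacked-layer idea, assuming scikit-learn (a deliberately small stand-in for the large deep-learning systems the article describes): a multi-layer perceptron passes the inputs through successive hidden layers, each learning intermediate representations that feed the next layer.

# Sketch: a small multi-layer (deep) neural network; each hidden layer learns
# intermediate representations that are fed to the next layer.
# Synthetic data and layer sizes are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(3000, 20))
y = ((X[:, 0] * X[:, 1] > 0) ^ (X[:, 2] > 0)).astype(int)   # deliberately non-linear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64, 32, 16),  # three stacked layers of "neurons"
                    activation="relu", max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("out-of-sample accuracy:", round(net.score(X_test, y_test), 3))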
Deep learning is being applied to a wide range of uses. The ability to crunch large amounts of raw data and to identify complex patterns in it makes it particularly well-placed to analyze "big data," such as the user datasets of tech giants like Google, Microsoft, and Amazon.

Given that it was partly developed by the U.S. National Security Agency, it is perhaps unsurprising that deep learning has proved to be very proficient at face recognition and natural language understanding, including question answering and language translation. Upon "overhearing" a discussion, it is able to classify the topic of discussion and the sentiments of the speakers.21 While some conventional machine-learning approaches can be equipped to solve non-numeric problems as well (for example, x-means clustering has been applied to text mining), deep learning has often proved to be more accurate. However, a typical deep-learning system is extremely complex and requires a dataset with hundreds of millions of labeled observations just to be trained. In many fields, the availability of such extremely large datasets is hardly a given.

Application within financial services
In past years, the amounts of data gathered in financial institutions (FIs) have increased significantly as the details of reporting requirements have mushroomed and digitalization of services is creating a large amount of high-frequency, unstructured consumer data. As a result, FIs have a clear need for more powerful analytical tools to cope with large amounts of data from all kinds of sources and formats, while maintaining or improving granularity of analysis.

After the financial crisis of 2008-09, many new regulations and supervisory measures were introduced that required FIs to report more detailed and more frequent data on more aspects of their business models and balance sheets. Under the new capital regime, banks report large exposures, liquidity measures, collateral, and capital levels. Stress tests are based on all kinds of firm data, including loan-level balance sheet data and qualitative aspects of the business model. The Federal Reserve's CCAR exercise requires FIs to consider the impact of more than 2,000 economic variables on their business. For insurers, Solvency II has dramatically increased reporting requirements.

These processes create large amounts of reporting data that need to be well-defined and structured, aggregated across the group, and delivered to supervisors on time. Regulators have, therefore, introduced numerous initiatives to improve the quality of supervisory data and the ability of financial institutions to deliver these data. The Basel Committee's Principles for Risk Data Aggregation (BCBS 239) set standards for G-SIBs to improve their IT systems and reporting structures. IFRS 9 aims to improve the quality of supervisory data.

Apart from reporting data, FIs are increasingly able to gather large amounts of low-quality, unstructured, high-frequency data. These include outputs from consumer apps and other digital interactions with clients, metadata from payment systems, and external data sources, such as social media feeds, which can be mined to gauge insights on market sentiment. This type of data is typically called "big data."

With practically all aspects of FIs' business models regulated and supervised with detailed risk metrics, running a bank, insurer, or asset manager is increasingly becoming a matter of optimization within hundreds of constraints. To compete effectively, FIs need to find this optimum while also mining consumer data for detailed insights on client preferences and behavior.

The extensive set of machine learning approaches is well situated to deliver this analytical power in different contexts, due to its ability to cope with (or, better said, its need for) extremely large datasets and the granularity of analysis it offers. For the mining of high-quality, structured supervisory data, more conventional machine learning techniques are typically applied. To mine high-frequency, low-quality "big data" sources, Google-like deep learning and neural network techniques are applied, which cope with these data due to their representation-learning abilities.

Below, the state of play in three use cases of machine learning is discussed: the modeling of credit risk, detection of fraud and money laundering, and the detection of conduct risk and abusive behavior within financial institutions.

21 Ibid.
THREE USE CASES

Credit risk and revenue modeling
Since the early 2000s, an extensive academic literature on the use of machine learning methods to model credit risk has developed. To give just a few examples, Angelini et al. (2007) apply a neural network approach to model SME credit risk on a small dataset of Italian SMEs. Auria and Moro (2008) assess company solvency using support vector machines, and find that they produce more accurate out-of-sample predictions than existing techniques. Khandani et al. (2010) apply generalized classification and regression trees (CART) to a large dataset of a commercial bank to build consumer credit risk models. These combine traditional credit factors, such as debt-to-income ratios, with consumer banking transactions, which greatly increases the predictive power of the model.

FIs have traditionally used linear, logit, and probit regressions to model credit risk for capital requirements, stress-testing, and internal risk management procedures.22 Recently, many have started to experiment with the application of machine learning methods to improve financial risk predictions. Unsupervised methods are typically used to explore the data, while regression and classification methods (trees, support vector machines) can predict key credit risk variables such as probability of default or loss-given-default. Banks normally have extensive records of loan-level data to serve as inputs.
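A minimal sketch of such a probability-of-default model, assuming scikit-learn and made-up loan-level columns (the debt-to-income, utilization, and delinquency features are illustrative, not the article's data): a gradient-boosted tree ensemble outputs a default probability per borrower.

# Sketch: estimating probability of default (PD) from loan-level features with a
# tree-based ensemble. Feature names and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n = 5000
debt_to_income = rng.uniform(0, 0.6, n)
utilization = rng.uniform(0, 1.0, n)           # share of the credit limit drawn
delinquencies = rng.poisson(0.3, n)            # past missed payments
X = np.column_stack([debt_to_income, utilization, delinquencies])
default = (rng.random(n) < 0.02 + 0.3 * debt_to_income * utilization
           + 0.05 * delinquencies).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, default, stratify=default, random_state=0)

pd_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
pd_estimates = pd_model.predict_proba(X_test)[:, 1]    # estimated default probability per borrower
print("out-of-sample AUC:", round(roc_auc_score(y_test, pd_estimates), 3))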
Banks have sometimes also experienced that machine learning can be hard to apply, as methods can be complex and models sensitive to overfitting the data. Moreover, the quality of data within banks is not always sufficient for advanced statistical analysis, while banks are not always able to consolidate data from across the financial group, among other reasons due to inconsistent data definitions across jurisdictions and the use of multiple systems. Non-parametric and non-linear approaches (support vector machines, neural networks, and deep learning) and ensembles are so complex that they are practically "black boxes" that are hard, if not impossible, for any human to understand and audit from the outside. That makes these models hardly useful for regulatory purposes, such as the development of internal models in the Basel Internal Ratings-Based approach. Financial supervisors typically require risk models to be clear and simple in order to be understandable, verifiable, and appropriate for validation by them.

That does not, however, rule out the use of machine learning to optimize parameters and models with a regulatory function. Linear and simple non-linear machine learning approaches can be applied and still perform better than similar non-machine learning approaches. Machine learning can also be applied to select variables and optimize parameters in existing, linear regulatory models. Khandani et al. (2010) stress that CART (tree) models produce easily interpretable decision rules whose logic is clearly laid out, despite their non-linear character. Indeed, there have been examples already of banks applying machine learning in a regulatory context. In one public example, Citigroup hired an external vendor to build a revenue forecasting model for the 2015 CCAR exercise.23

Fraud
One area in which machine learning has been applied for more than a decade and with significant success is the detection of credit card fraud. Banks have equipped their credit card payments infrastructures with monitoring systems (so-called workflow engines), which monitor payments for potential fraudulent activity. Fraudulent transactions can then be blocked in real-time. The fraud models used by these engines have been trained on historical payments data.

The high frequency of credit card transactions provides the large datasets required for algorithm training, back-testing, and validation. Furthermore, since banks are able to verify unambiguously which transactions were fraudulent and which were not, they can construct clear historical data with relevant fraud and non-fraud labels to train classification algorithms. The historical transaction datasets showcase a wide variety of pre-determined features of fraud, which distinguish normal card usage from fraudulent card usage, ranging from features of the transaction or the card holder to features of the transaction history.

The detection of money laundering and terrorism financing through payment systems stands in contrast to machine learning's long-standing record in credit card fraud. Many banks are still relying on conventional rules-based systems, which focus on individual transactions or simple transaction patterns. These systems are often unable to detect complex patterns of transactions or obtain a holistic view of transaction behavior on payment infrastructures. Due to their coarse selection methods, the number of false positives created by these systems is substantial. As a result, significant human capacity is required to assess alerts and filter false positives from actual suspicious observations. In addition, impediments to data sharing and data usage, as well as long-established regulatory requirements, have complicated innovation in the AML/CFT area.24

22 In a probit model, the dependent variable is binary (can only take two values); in a logit model, the dependent variable is categorical.
23 Ayasdi, "CCAR stress test," http://bit.ly/2m5n4y2, undated; and "After yesterday, CCAR less stressful for Citigroup," March 6, 2015, http://bit.ly/2mmbfph
24 See the IIF's forthcoming report on the use of "regtech for AML" and submissions to FATF and the BCBS for more information on data sharing issues in AML/CFT on www.iif.com.
Machine-learning systems have the potential to improve detection of money laundering activity significantly, due to their ability to identify complex patterns in the data and to combine transaction information, at network speed, with data from many other sources to obtain a holistic picture of a client's activity. Indeed, these systems have already been shown to bring false positives down significantly.25

However, application so far in the AML space has lagged for several reasons. First, money laundering is hard to define. There is no universally agreed definition of money laundering, and financial institutions do not receive feedback from law enforcement agencies on which of their reported suspicious activities have turned out to be money laundering. It is, therefore, more difficult to train ML detection algorithms using historical data, because an incidence of money laundering typically is not firmly established. As a second-best, FIs are optimizing ML detection algorithms using lower-level suspicious activity reports as a dependent variable for classification – distinguishing between alerts that the bank could dismiss as false alerts and those that went on to be submitted as SARs to law enforcement agencies.
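A sketch of that second-best setup, assuming scikit-learn and made-up alert-level features (none of this comes from the article): historical alerts are labeled by whether they were ultimately filed as SARs, and a classifier is trained to rank new alerts so investigators can focus on the most suspicious ones.

# Sketch: training a classifier on historical AML alerts, using "was filed as a SAR"
# as the dependent variable. Features and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 10000
amount = rng.lognormal(mean=8, sigma=1, size=n)       # transaction size behind the alert
n_counterparties = rng.poisson(3, n)                  # distinct counterparties in the pattern
cross_border = rng.integers(0, 2, n)                  # 1 if a cross-border leg is involved
X = np.column_stack([amount, n_counterparties, cross_border])

# Label: 1 if the alert was escalated and filed as a SAR, 0 if dismissed as a false positive.
filed_as_sar = (rng.random(n) < 0.02 + 0.03 * cross_border + 0.01 * n_counterparties).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, filed_as_sar, stratify=filed_as_sar,
                                                    random_state=0)
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0).fit(X_train, y_train)

# Rank new alerts by estimated SAR likelihood so investigators review the riskiest first.
scores = clf.predict_proba(X_test)[:, 1]
print("top-ranked alert scores:", np.sort(scores)[-5:].round(2))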
Unsupervised learning methods are also applied to AML/CFT, as they "learn" relevant patterns from the data by clustering transactions or client activity. This yields additional insights, since laundering methods take all kinds of forms and develop on a continuous basis.

An example of such unsupervised learning is clustering. Clustering requires large datasets in which it can automatically find patterns without the need for labels. It works by identifying outliers as points without any strong membership in any one cluster group, thus finding anomalies within subsets of the data. In AML, clustering is one of the methods used to group together data; using other analytics, such as topological data analysis and dimensionality reduction, machine learning can reduce the significant number of false positives often associated with alternative methods.
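A sketch of anomaly detection via weak cluster membership, assuming scikit-learn and synthetic activity features: observations far from every cluster centre have no strong membership in any group and are flagged for review.

# Sketch: flagging anomalies as points with weak membership in any cluster,
# measured as the distance to the nearest k-means centroid. Data are synthetic assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
normal_activity = rng.normal(size=(1000, 3))            # bulk of client activity
odd_activity = rng.normal(loc=6.0, size=(10, 3))        # a handful of very unusual patterns
X = np.vstack([normal_activity, odd_activity])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
distance_to_nearest_centroid = kmeans.transform(X).min(axis=1)

# Flag the observations that sit far away from every cluster centre.
threshold = np.quantile(distance_to_nearest_centroid, 0.99)
flagged = np.where(distance_to_nearest_centroid > threshold)[0]
print("flagged observations:", flagged)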
Surveillance of conduct and market abuse in trading
A third area in which machine learning is increasingly being applied within financial institutions is the surveillance of conduct breaches by traders working for the institution. Examples of such breaches include rogue trading, benchmark rigging, and insider trading – trading violations that can lead to significant financial and reputational costs for FIs. In the last couple of years, automated systems have been developed that monitor the behavior of traders in multiple ways and with increasing accuracy.

The capabilities of the first generation of these surveillance systems were limited to monitoring trading behavior, and only through assessing single trades. However, the improved ability of machine learning approaches to identify large, complex patterns in data has allowed a new generation of systems to analyze entire trading portfolios. These systems are also able to link trading information to other behavioral information about a trader, such as e-mail traffic, calendar items, building check-in and check-out times, and even phone calls. Technologies such as natural language processing (typically based on deep learning) and text mining (which can be based on several learning algorithms26) have made those sources machine-readable and suitable for automated analysis. The outputs of the trading behavior and communications of one or multiple traders are then integrated and compared to a profile of "normal" behavior. When a trader's behavior or trading performance deviates from what is deemed normal, the system will send an alert to the FI's compliance team.

There are several challenges to applying machine learning in this space. First, there are typically no labeled data to train algorithms on, as it is legally complex for financial institutions to share sensitive information on past breaches with developers. Supervised learning approaches are, therefore, hard to apply. Second, a surveillance system needs to be auditable for supervisors and for compliance officers, and needs to be able to explain to a compliance officer why certain behavior has set off an alert. For systems that are entirely based on machine learning, that can be difficult due to the "black box" character of learning approaches. In order for an alert to be interpretable and actionable for compliance teams, it should ideally be linked to detection of a specific kind of behavior, rather than be based solely on a statistical correlation in the data.

These issues can be addressed at least partly by founding the learning system on a behavioral science-based model, which incorporates human decisions and behavioral traits. In a way, such a model addresses the lack of explanatory power of machine learning approaches. Any alerts from the system will be based on deviations it has identified from the model. However, the inclusion of machine learning approaches on top of the model creates a feedback loop in the system through which it can adapt to evolving behavior and "get to know" a trader as it ingests more data. That is a crucial difference from previous rules-based systems, which are unable to tailor their surveillance methods to changed probability distributions and correlations. Consequently, these systems are typically based on more conventional types of machine learning, which can be audited and explained more easily than complex types, such as neural nets and deep learning.

25 Adamson, D., 2016, at "Machine learning – the future of compliance?" panel discussion at Sibos conference, September 28
26 Bholat et al., 2015.
A practical barrier to the implementation of automated surveillance systems is the fragmentation and complexity sometimes found in FIs' IT systems. To gain a perspective on a trader's behavior, surveillance systems require information from many sources, which are likely to be found in different systems that can be mutually incompatible or slow to deliver.
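To illustrate the profile-and-deviation idea behind these systems, here is a small sketch, assuming scikit-learn and entirely made-up behavioral features (trade sizes, after-hours e-mails, and badge times are illustrative, not from the article): an isolation forest learns what "normal" combined behavior looks like and scores how strongly a trader's recent activity deviates from it.

# Sketch: building a "normal behavior" profile from several behavioral data sources
# and flagging deviations. Features and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(9)
# Columns: average trade size, after-hours e-mails, badge exit hour, calls to external brokers.
normal_days = np.column_stack([
    rng.normal(1.0, 0.2, 2000),
    rng.poisson(2, 2000),
    rng.normal(18.0, 0.5, 2000),
    rng.poisson(1, 2000),
])

profile = IsolationForest(random_state=0).fit(normal_days)   # learn the "normal" profile

# Score new observations: one ordinary day and one unusual combination of behaviors.
new_days = np.array([
    [1.1, 2, 18.2, 1],     # looks like a typical day
    [4.0, 15, 23.5, 9],    # large trades, many late e-mails, very late exit, many broker calls
])
print("anomaly scores (lower = more anomalous):", profile.decision_function(new_days).round(2))
print("flags (-1 = anomalous):", profile.predict(new_days))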
In conclusion, machine learning methods improve the detail with which data can be analyzed and outcomes predicted. Unsupervised approaches allow for exploration of data without a dependent variable. Running algorithms thousands of times on training data and combining models improves their predictive power while limiting overfitting and maintaining analytical granularity.

Such improved, often automated, analytical capabilities allow FIs to gain better insights into business processes such as lending, risk management, customer interaction, and payments. With ever more data produced in these processes, machine learning can discover richer, more complex patterns and relationships, as in the analysis of transactions or credit risk, or, by connecting different datasets, draw more accurate overarching conclusions, as in the monitoring of conduct breaches.