A Comprehensive Review on Financial Explainable AI
Yeo Wei Jie, van der Heever Wihan, Mao Rui, Cambria Erik, Satapathy Ranjan, and Mengaldo Gianmarco
The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across
various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their
lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare,
where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods
that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection
of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of
adopting explainable AI methods, together with future directions we deem appropriate and important.
CCS Concepts: • Computing methodologies → Artificial intelligence; Machine learning.
Additional Key Words and Phrases: XAI, explainable AI, interpretable AI, finance, FinXAI
ACM Reference Format:
Yeo Wei Jie, van der Heever Wihan, Mao Rui, Cambria Erik, Satapathy Ranjan, and Mengaldo Gianmarco. 2023. A Compre-
hensive Review on Financial Explainable AI. J. ACM 37, 4, Article 111 (August 2023), 36 pages. https://doi.org/XXXXXXX.
XXXXXXX
1 INTRODUCTION
Finance is a constantly evolving sector that is deeply rooted in the development of human civilization. One of the
main tasks of finance is the efficient allocation of resources, with a chief example being the handling of capital
flows between various entities with different needs. These entities can be divided into individuals, companies, and countries, leading to the common categorization of personal, corporate, and government finance. The sector
Authors’ addresses: Yeo Wei Jie, [email protected], Nanyang Technological University (NTU), 50 Nanyang Ave, Singapore, 639798;
van der Heever Wihan, Nanyang Technological University (NTU), 50 Nanyang Ave, Singapore, 639798, [email protected]; Mao Rui,
Nanyang Technological University (NTU), 50 Nanyang Ave, Singapore, 639798, [email protected]; Cambria Erik, Nanyang Technological
University (NTU), 50 Nanyang Ave, Singapore, 639798, [email protected]; Satapathy Ranjan, (corresponding author) Institute of High
Performance Computing (IHPC), Agency for Science, Technology and Research (A∗STAR), Fusionopolis Way, #16-16 Connexis, Singapore,
Republic of Singapore, 138632, [email protected]; Mengaldo Gianmarco, National University of Singapore (NUS); Asian
Institute of Digital Finance at NUS, Singapore; Honorary Research Fellow, Imperial College London, United Kingdom, 9 Engineering Drive 1,
Singapore, 117575, [email protected].
can be traced back to 5000 years ago, in the agrarian societies that had been established and developed for some thousands of years at the time.
thousand of years at the time. Indeed, one of the first examples of banking, a central institution within finance,
can be attributed to the Babylonian empire. Since then, societal development and technological advances have
pushed the field to undergo several changes. In the past two decades, these changes have been particularly
marked, due to the accelerating pace of technological development, especially in the context of AI. The latter has
started spreading across multiple segments of finance, from digital transactions to investment management, risk
management, algorithmic trading, and more [114]. The use of novel AI- and non-AI technologies to automate
and improve financial processes is now known as FinTech (Financial Technology), and its growth in the past two
decades has been remarkable [89]. In this review, we focus on AI-based technologies and machine learning for
financial applications.
Financial researchers and practitioners have been relying on supervised, unsupervised, and semi-supervised
machine learning methods as well as reinforcement learning for tackling many different problems. Some examples
include credit evaluation, fraud detection, algorithmic trading, and wealth management. In supervised machine learning methods, it is common to use, e.g., neural networks to identify complex relationships hidden
in the available labeled data. The labels are usually provided by domain experts. For instance, one can think of
building a stock-picking system, where a domain expert labels periods of positive and negative returns. The
machine is then tasked to build the relationship between (possibly) high-dimensional data, and positive and
negative returns of a given stock (or multiple stocks) and generalize to unseen data to e.g., predict the future
stock’s behavior. In unsupervised machine learning methods, the task is instead to identify data with similar characteristics that can therefore be clustered together, without domain-expert labeling. For example, one can think of grouping all stocks that have similar characteristics into clusters using similarity metrics based on, e.g., valuation, profitability, and risk. Semi-supervised learning is a middle ground between supervised and
unsupervised learning, where only a portion of the data is labeled. Finally, reinforcement learning aims to
maximize, through a set of actions, the cumulative reward specified by the practitioners. Reinforcement learning
is used in finance for, e.g., portfolio construction. Reinforcement learning is closely related to Markov decision processes and substantially differs from both supervised and unsupervised learning.
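To make the supervised setting above concrete, the following minimal sketch pairs expert-style labels (positive versus negative forward returns) with a few illustrative stock features and trains a classifier asked to generalize to unseen periods; the synthetic data, feature names, and model choice are assumptions for illustration, not a method drawn from the reviewed literature.

```python
# Minimal sketch of the supervised stock-picking setup described above.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1500
features = pd.DataFrame({
    "valuation": rng.normal(size=n),      # e.g., price-to-earnings z-score
    "profitability": rng.normal(size=n),  # e.g., return-on-equity z-score
    "risk": rng.normal(size=n),           # e.g., realized-volatility z-score
})

# Stand-in for the domain expert's labeling: 1 if the (synthetic) next-period
# return is positive, 0 otherwise.
forward_return = (0.3 * features["profitability"] - 0.2 * features["risk"]
                  + rng.normal(scale=1.0, size=n))
labels = (forward_return > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Out-of-sample accuracy:", model.score(X_test, y_test))
```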
Among supervised, unsupervised, and reinforcement learning methods, there is vast heterogeneity in terms of
complexity. Some methods are considered easier to understand, hence to interpret by practitioners (also referred
to as white-box methods), while others are considered not interpretable (also referred to as black-box methods).
To this end, neural networks and deep learning strategies, which underpin the majority (albeit not the entirety) of recent machine learning methods for financial applications, are considered black-box methods, i.e., the reasons for a given prediction are not easily accessible, if available at all. This constitutes a critical issue, especially in risky
and highly regulated sectors, such as healthcare and finance, where a wrong decision may lead to catastrophic
loss of life (healthcare) or capital (finance). Hence, it is deemed important to understand the reasons (i.e., the data and patterns) the machine used to make a given decision. This aspect falls under the broad field of AI transparency, which is composed of three pillars: (i) AI awareness, (ii) AI model explainability, and (iii) AI outcome explainability. The first concerns whether AI is involved in a given product. The second is responsible for providing a detailed explanation of the AI model, including its inputs and outputs. The third is responsible for providing a granular explanation of the inputs’ contributions to the AI model’s outcomes. In this last category we find a vast array of post-hoc interpretability methods. In this review, we assume that AI awareness
is achieved, i.e., we know that in a given financial process AI is involved, and focus on AI explainability, also
referred to as eXplainable AI or simply XAI. A further distinction commonly made is between interpretability
and explainability of an AI model. These two terms, frequently used interchangeably, have subtle differences.
Interpretability refers to how and why a model works. Explainability refers to the ability of explaining the results
in human terms.
While deep learning methods are considered black-boxes, many other methods in finance are considered
white-box methods. The trade-off between complexity and interpretability is perhaps one of the most debated
aspects in the field of financial AI. On the one hand, white-box methods are highly interpretable but lack the ability
to grasp complex relationships, frequently failing to meet the desired performance. On the other hand, black-box
methods are not interpretable but usually (although not always) meet the desired performance. Therefore, it is not surprising that significant efforts have been made in recent years to render black-box methods more interpretable, the primary example being the field of deep learning. In this paper, we provide
an extensive review of XAI methods for the financial field that we name FinXAI. Although there have been a
number of surveys on XAI methods [7, 19, 49, 88, 90, 102, 105], these papers are targeted towards general XAI,
and are not specific to finance. Hence, we conduct a review on explainability techniques exclusively related to
financial use cases.
To compile this review, we took into account 69 papers, focusing mainly, though not exclusively on the third
pillar, i.e., the explainability of the inputs’ contributions to the AI model’s outcomes. To this end, we considered
both post-hoc interpretability methods applied to black-box deep learning models, and inherently transparent
models that do not require further post-hoc interpretability. Despite the relatively small number of collected
papers in the field of XAI, it is important to note that our main objective is to focus specifically on XAI techniques
applicable to the financial industry. This targeted approach will provide valuable insights for researchers in
related fields and will ultimately help drive innovation and progress in the financial industry. With the growing
need for transparency and accountability in deep learning, the XAI community has seen a rapid growth in the number of published works; here, we focus only on works concerning financial use cases. Notably, FinXAI is but a small subset of the general field of XAI, and we thus take a holistic approach to assembling existing studies with the goal of keeping up to date with current approaches. The papers were queried from both Google Scholar and Scopus, searching with a set of keywords relating to works that have applied explainable AI techniques to financial use cases; the set of keywords includes “XAI, explainable AI, finance, financial sector, financial field, explainable ML”. We collected a diverse set of papers that covers each category sufficiently well, summarized in Tables 1, 2, and 3. We also noticed that the majority of explanation types were limited to fact-based explanations; hence, we explicitly searched for techniques that explain in the form of counterfactuals. Counterfactual explanations are deemed a desirable form of explanation, as receivers tend to prefer understanding why a certain prediction was made instead of an alternative one.
The main contributions of our work are as follows:
• We provide an extensive study on consolidating XAI methods in the field of finance (FinXAI), for researchers
interested in prioritizing transparency in their solutions.
• We frame the FinXAI process as a sequential flow of decision-making processes (see Fig 4), where we place
importance on aligning the XAI technique with the target audience. The objective of this framework is to
produce explanations that are both goal-oriented and audience-centric.
• We review current FinXAI techniques, analyze their technical contributions to ethical goals, and list a number of key challenges faced in implementing XAI, as well as important directions for future improvement.
The remainder of the review paper is organized as follows: Section 2 provides definitions, motivations, and a brief overview of FinXAI. Subsequently, we explain the methodologies of FinXAI, covering numerical analysis in Section 3, textual analysis in Section 4, and hybrid analysis in Section 5, and ending with transparent models in Section 6. In
Section 7, we analyze how the reviewed FinXAI methods contribute to ethical goals. Section 8 discusses key
challenges of adopting explainable models and future directions for research. Finally, Section 9 offers concluding
remarks.
exists a gap between each organization’s understanding of necessity, more often than not leading to the delay in
approving the deployment of financial services.
As mentioned, what constitutes a good explanation is largely subjective. The amount of required information usually increases in a hierarchical manner, from end-users up to the regulatory authorities, as depicted in Figure 1. Here, scrutiny refers to the amount of information regarded as essential. On the one hand, end-users typically require the least explanation (cause of outcome, data security), since they are usually only interested in resolving their practical concerns. On the other hand, external regulators require explanations about the end-product from head to toe (overall guidelines in the design process, accountable and involved personnel, deployment process, training structure of the organization), including the end-users’ requirements.
Proximity refers to the region of explanation provided by the XAI technique and can be classified as local (reasoning about a particular outcome) or global (a view of the underlying reasoning and mechanics of the AI model). End-users tend to be concerned with how the outcome affecting them is derived (local proximity). For
example, a person whose credit card application was rejected would want to know the underlying reason behind
it. In contrast, the solution providers and regulators tend to focus on the internal operations and design workflow
of the product, for reasons related to performance enhancement, fairness in the model’s sense of judgment, and
identification of biases in the prediction (global proximity).
Fig. 1. Levels of explanation requirements by different audiences, categorized by explanation proximity and ordered by scrutiny level. Local proximity refers to explanations concerned with a specific outcome. Global proximity refers to the underlying reasoning and mechanics of an AI model. End-users are typically satisfied with local-proximity explanations, and the level of scrutiny is low. Developers, domain experts, and regulatory authorities instead require global-proximity explanations, and the level of scrutiny is much higher.
Fig. 2. Ethical goals are classified under three broad audiences: end-users, developers/domain experts, and internal/external
regulatory authorities. Some ethical goals are shared by the three different audiences considered, such as informativeness. [7]
• Accessibility: The main personnel interacting with algorithms are usually restricted to AI developers or domain experts; providing accessibility could allow non-experts to get involved. This can be seen as an important stepping stone for making AI prevalent and well-accepted by society at large. Likewise, complicated algorithms deter financial companies from adopting such solutions, since extensive training is required while fearing potential repercussions in the case of any unintended wrongdoing. If a model is able to relate its mechanisms in easily understandable terms, it can ease the fears of users and encourage more organizations to adopt such practices.
• Privacy Awareness: Not knowing the full limits of data accessibility can result in a breach of privacy. Likewise, such an issue triggers concerns within the overall design workflow. Accountable personnel in the design process should ensure third parties are only allowed restricted access to end-users’ data and prevent any misuse which could disrupt data integrity. Privacy awareness is especially important in the financial sector due to the amount and sensitivity of the information being captured.
• Confidence: The AI model should provide not only an outcome but also the confidence it has in the decision-making process, allowing domain experts to identify uncertainty in both the model’s results and the region of data captured. Stability of the prediction can be used to assess a model’s confidence, while explanations provided by the model should only be trusted if they are consistent across different data inputs.
• Causality: It is usually in the interest of developers or experts to understand causality between data features. However, proving it is a difficult task that requires extensive experimentation. Correlation can be involved in assessing causality, though it is frequently not representative of it. Since AI models only discover correlations in the data they learn from, domain experts are usually required to perform a deeper analysis of causal relationships.
• Transferability: Distilling the knowledge learned by AI models is an extensive area of research; a notable benefit is that it allows the reuse of different models and averts endless hours of re-training. However, the complexity of the algorithms limits experts from deploying trained models in different domains. For example, a model trained to forecast future stock prices can likely be used to predict other financial variables such as bond prices, market volatility, or creditworthiness, if the model’s behavior in these circumstances is known. Delivering an intuition of the inner workings can ease the burden on experts of adapting the knowledge learned, reducing the effort required for fine-tuning. Transferability is arguably one of the essential properties for the improvement of future AI models.
Fig. 3. Different stages where interpretability can be injected into the design workflow. [83]
The review provided in this paper aims to give the readers an overall view of the XAI methodologies developed
thus far in the financial industry. We note that explainability can be injected across different stages of the devel-
opment cycle. These stages include pre-modeling, modeling, and post-modeling [83]. The pre-modeling stage refers to the process chain before the design stage of the AI model; this can include preliminary procedures which focus on identifying salient features by accessing readily available domain knowledge [57]. The modeling phase includes any adjustment to the model’s architecture or optimization objective. As a start, simpler transparent models should be preferred over complex black-box models if the problem at hand is not too complicated. Most of the papers in this review focus on the post-modeling stage, mainly due to the flexibility and ease of designing
explainability techniques. Since the outcome is already available, developers have more information to design an explanation method appropriate to the form of data involved (see Figure 3). Most XAI techniques tend to focus on one stage of the modeling process, though it is possible to operate on two or more.
Table 1. Classification of papers relating to credit evaluation. The papers reviewed are split by task category and subsequently
categorized by entailed properties. Missing options are either not stated or non-applicable.
Properties (columns): Intrinsic, Post-hoc (transparency); Feature relevance, Simplification, By example, Textual, Visual (explanation procedure); Factual, Counterfactual (explanation type); Local, Global (proximity); End-user, Developer, Domain expert, Regulatory (audience); Numerical, Text (data type).
Papers (rows): [38]; [22, 47]; [85, 107]; [16, 17]; [101]; [91]; [139]; [13]; [30]; [33]; [39, 113]; [15]; [42, 116]; [112]; [48]; [3]; [36]; [75]; [136].
The focused regions of finance can be broadly categorized under three sections [10]: credit evaluation (peer-to-peer lending, credit assessment, credit risk management, credit scoring, accounting anomalies), financial prediction (asset allocation, stock index prediction, market condition forecasting, volatility forecasting, algorithmic trading, financial growth rate, economic crisis forecast, bankruptcy prediction, fraud detection, mortgage default), and financial analytics (financial text classification, spending behavior, financial corporate social responsibility
(CSR), customer satisfaction). Following the task classification, we further differentiate the studies based on the underlying characteristics of the XAI technique, as shown in Tables 1, 2, and 3. Specifically, we seek to answer questions such as “What form of explanation is provided?” (explanation procedure), “Who is the explanation intended for?” (audience), and “What kind of explanation is provided?” (proximity, explanation type).
Table 2. Classification of papers relating to financial prediction. The papers reviewed are split by task category and subse-
quently categorized by entailed properties. Missing options are either not stated or non-applicable.
Properties (columns): Intrinsic, Post-hoc (transparency); Feature relevance, Simplification, By example, Textual, Visual (explanation procedure); Factual, Counterfactual (explanation type); Local, Global (proximity); End-user, Developer, Domain expert, Regulatory (audience); Numerical, Text (data type).
Papers (rows): [135]; [129]; [37]; [43]; [28]; [12, 41]; [14]; [92]; [21]; [29]; [131]; [57, 97, 121, 122, 124]; [8]; [69]; [11, 44]; [127]; [20]; [65]; [26]; [24, 64, 110]; [2]; [40]; [96]; [133].
Table 3. Classification of papers relating to financial analytics. The papers reviewed are split by task category and subsequently
categorized by entailed properties. Missing options are either not stated or non-applicable. There were no evaluation metrics
present for these papers.
Properties (columns): Intrinsic, Post-hoc (transparency); Feature relevance, Simplification, By example, Textual, Visual (explanation procedure); Factual, Counterfactual (explanation type); Local, Global (proximity); End-user, Developer, Domain expert, Regulatory (audience); Numerical, Text (data type).
Papers (rows): [99]; [134]; [79]; [80]; [81]; [66]; [46]; [72]; [128]; [58]; [138].
• Transparency: As mentioned in Section 2.3, the interpretability of the model is either derived by interpreting the internal mechanisms of the AI model or through external techniques aimed at delivering some form of visualization or intuition of how the model works. Most of the reviewed papers focus on post-hoc explainability techniques, which we believe are preferred for a number of reasons. Intrinsic models usually under-perform complex networks, and producing explanations for an inaccurate prediction is pointless. We additionally note that the method of conveying explanations for intrinsic models is by definition model-specific, meaning the same method cannot be reused for a different model, whereas post-hoc techniques can be either agnostic or specific to a single model.
• Proximity: The explanations provided by XAI tools can seek to explain either the derivation of an outcome, known as local explanation, or how the model behaves on a global scale, referred to as global explanation. Global explanations tend to provide information on how the model makes decisions globally, based on the learned weights, data features, and structure of the network. Producing an acceptable global explanation tends to be difficult in most cases [88], as opposed to explaining just a region of the input data. On the other hand, local explanations focus on a specific region of the dataset and seek to assist the receiver in understanding how a particular prediction is made. Local explanations are more accurate for unique cases, where the dependency on input features is rarely captured by the AI model and can therefore be ignored by global explanations. End-users tend to prefer local explanations, as their concern lies with the explanation surrounding their own outcome. Regulators and financial experts, on the other hand, prefer global explanations in order to have a complete understanding of the model.
• Explanation Procedure: According to [7], the various forms of post-hoc XAI techniques can be divided into several categories: text explanation (TE), visual explanation (VE), explanation by example (EE), explanation by simplification (ES), and feature relevance (FR). TE provides an explanation via text generation. Natural language tends to be easily understood by non-experts and is a common source of information in human society. VE enables visual understanding of the model’s behavior, which may be preferable for image features [106]; such methods comprise graphical plots for both local and global explainability. EE captures a smaller subset of examples which represents, at a high level, the correlations modeled by the black-box model. ES techniques build a simpler surrogate model that approximates the underlying black-box model with high fidelity while remaining interpretable. FR techniques aim to identify features deemed relevant for the model’s prediction by computing a relevance score for each feature. FR can account for explainability at both local and global levels and constitutes the largest share among the reviewed papers.
• Audience: Since the quality of explanations is subjective, it is very difficult to derive a one-size-fits-all explanation; hence, explanations should be customized to one’s needs. The examples of audiences are referenced from Fig 1, where we further merge internal and external regulators together. We highlight that aligning the objective of the explanation with the audience receiving it is important [115]. Determining whether an explanation is considered meaningful depends on the target goals of each audience. Financial regulators, for example, would not be very concerned with understanding what sort of AI model or ML technique is used, but rather with aspects of data privacy, model biases, or unfair treatment of affected end-users. It is uncommon for a single explanation to be deemed acceptable to audiences holding different positions in a financial company. For example, the explanation produced for the developer tends to require additional customization before being submitted to the immediate superior, and the same applies to subsequent higher-ups and external end-users.
• Data Type: The most commonly used forms of input data among the reviewed papers consist of text, images, and numerical values. In terms of frequency among the forms of available data, numerical features are the most common source of information used in the financial industry. Images represent the least utilized source, as they tend to be storage intensive and contain a large amount of redundant information, or are not applicable to most use cases. We only found a single work using image features: [24] performs classification of eight different candlestick patterns, and the explanation is delivered by monitoring changes in the prediction after applying adversarial attacks. Surprisingly, textual information is not used as frequently as expected, despite being a valuable source of information for deriving market sentiment or understanding consumers’ emotions towards certain aspects of a business product. It is also possible to unify multiple sources of information, otherwise known as multi-modal data. A boost in performance can be achieved, for instance, by combining the patterns learned from time-series features with sentiment from textual features.
• Explanation Type: A single explanation can be conveyed in various forms, including factual, contrastive, and counterfactual explanations [84]. Factual explanations are straightforward and seek to answer the question “Why does X lead to Y”, as opposed to contrastive ones, “Why does X lead to Y instead of Z”. Counterfactual explanations instead reason about how the consequent can be changed with respect to the antecedent, answering the question “How can Z be achieved by changing X”. Humans tend to prefer contrastive rather than factual explanations, since the latter can have multiple answers and, referring to [84], explanations are selective: humans tend to ignore a large portion of the explanations except for the important ones, due to cognitive bias. For example, if Person A’s loan application was rejected, there could be numerous reasons for this, such as “Person A’s income was too low for the past 6 months”, “Person A only has 1 existing credit card”, “Person A had a credit default 3 months ago”, and so on. A contrastive explanation can instead involve comparing against another applicant whose outcome contrasts with the target applicant’s, and an explanation can be made highlighting the most significant factor. As argued by [70], contrastive explanations are easier to deliver, as one does not have to investigate the entire region of causes but rather a subset of it. Counterfactual explanations then seek to provide solutions for the contrastive explanation, commonly by identifying the smallest changes to the input features such that the outcome is altered towards the alternative.
• Explanation Evaluation: Despite the extensive studies carried out to investigate what defines a good explanation, it is difficult to compare interpretations qualitatively. The quality of an explanation is mostly subjective, as a single explanation can be perceived differently by different audiences. Nonetheless, there exist a number of studies that provide a quantitative approach to evaluating explanations. These measurements can be derived from human experts [128], by referencing financial ethical goals [3], or through statistical methods [91]. [57] conducted a comparison between feature importance techniques on time-series data and proposed a multivariate dataset that addresses the limitations of such techniques in identifying salient time-series features. A vast majority of the reviewed papers focus only on evaluating the performance of the prediction model and consider it a proxy for the quality of the explanation. We argue that such an evaluation does not fully represent the quality of the explanation and, even if it did, it may not be suitable for every form of explanation procedure.
Selection Procedure: We design a framework, shown in Figure 4, framing the design of the XAI solution as a sequential decision-making process. The selection categories can be referenced from Tables 1, 2, and 3. The sequential structure of the framework ensures the explanation provided is tailored to the audience’s needs while achieving the goal set out with respect to the target audience. We note that certain properties of the XAI technique have inner dependencies with each other, such as the relationship between explanation proximity and target audience. The quality of the explanation is evaluated and serves as feedback for any necessary adjustment, resulting in an audience-centric explanation.
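As a toy illustration of the sequential flow in Figure 4, the sketch below condenses the tendencies discussed in this section (end-users lean toward local, counterfactual explanations of their own outcome; developers, domain experts, and regulators toward global ones; feature relevance for numerical data, text explanations for non-experts, visual explanations for image features) into a simple selector. The mapping is an assumption meant only to make the framework tangible, not a prescriptive rule.

```python
# Toy selector condensing the audience/data-type tendencies described in this section.
def select_xai_properties(audience: str, data_type: str,
                          outcome_unfavorable: bool = False) -> dict:
    # End-users tend to need local, outcome-centered explanations; developers,
    # domain experts, and regulators lean toward global ones.
    proximity = "local" if audience == "end-user" else "global"

    # End-users receiving an unfavorable decision tend to prefer counterfactuals.
    explanation_type = ("counterfactual"
                        if audience == "end-user" and outcome_unfavorable
                        else "factual")

    # Feature relevance dominates for numerical data; text explanations suit
    # non-experts; visual explanations suit image features.
    procedure = {"numerical": "feature relevance",
                 "text": "text explanation",
                 "image": "visual explanation"}.get(data_type, "feature relevance")

    return {"proximity": proximity, "type": explanation_type, "procedure": procedure}

# Example: a rejected credit-card applicant (end-user) with tabular inputs.
print(select_xai_properties("end-user", "numerical", outcome_unfavorable=True))
```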
Fig. 4. XAI framework depicting a sequential flow of decision-making events. The proximity (local/global) and explanation
type (factual/counterfactual) should be chosen in accordance with the target audience and data type available. The choice
of explanation property is assessed by an iterative evaluation under the appropriate metric for both performance and
explanation conveyed.
explanations, and is applicable to any deep neural networks. The proposed technique outputs a localization map
using gradients corresponding to the target label. The right plot in Figure 5 depicts a global map highlighting
each asset’s importance across the trading period. Interestingly, in the left plot the agent focuses most on the worst-performing stock, GILD, rather than on the high-performing stocks. Here, the agent predicts the stock’s decline, reduces its allocation proportion, and indirectly increases the weights of high-performing stocks, in this case the target stock, NVDA. [2] proposes to use an attention mechanism [119] to compute similarity scores of possibly fraudulent transactions on both feature and temporal levels and, in return, allows for visualization of the top contributing features accounting for the model’s prediction.
Model-agnostic VE techniques can be integrated with any form of model architecture and bear a close resemblance to feature relevance techniques. Both investigate the effects on the model’s output by adjusting the input features. [13, 40, 134, 139] employ Partial Dependence Plots (PDP) to visualize the marginal effects of features relating to corporate distress, credit scoring, and detecting mortgage loan defaults. The generated plots offer a way of inferring whether the underlying input-output relationship is linear or complex. However, PDP has often been criticized for assuming independence between features, evaluating unrealistic inputs, and concealing heterogeneous effects of the input features. Accumulated Local Effects (ALE) [6] address the concerns
of feature correlation by considering the conditional distribution rather than the marginal one. In particular, it
accumulates differences between intervals within the feature set to account for individual feature effects. [30]
employs ALE on top of a tree ensemble model, XGBoost [25], as well as with global Shapley values [109] for better
scrutability. This work deduces that the increase in profit margin and solvency ratio leads to lower debt default
Fig. 5. [left] shows a heatmap denoting the global attentiveness of individual stocks in the overall portfolio. [right] corre-
spondingly presents a heatmap of individual assets. The agent chooses to allocate the most weight of the portfolio to NVDA,
while surprisingly focusing most on a declining stock, GILD. The agent reduces the weightage of GILD and allocates to
NVDA. Adapted from [110].
rates of small enterprises. [134] evaluates an arsenal of XAI techniques, encompassing the aforementioned ones, and also includes Individual Conditional Expectation (ICE) for financial auditing purposes. ICE differs subtly from PDP in that it considers instance-based effects rather than averaging across all instances, making it a local approach (see Figure 6). [136] generates counterfactual explanations on credit loan applications by coupling an unsupervised variational autoencoder (VAE) with a supervised probit regression. The combined model yields a discriminative latent
state, corresponding to class labels of either delinquency or non-delinquency. The counterfactual is subsequently
produced by a stepwise manipulation function towards the opposite class label. The authors evaluate the generated
counterfactuals quantitatively using maximum mean discrepancy (MMD) [137], which measures the number of
successfully flipped class labels as well as minimal feature changes.
Fig. 6. [left] shows a PDP on averaged marginal effects of total assets on the probability of statement restatement and [right]
displays ICE, which considers instance-level relationship. Both show a negative relationship. [134].
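To illustrate the PDP/ICE distinction discussed above, a minimal sketch using scikit-learn is given below; the synthetic credit-default data and feature names are assumptions, and kind="both" overlays the averaged PDP curve on the per-instance ICE curves so that heterogeneous effects hidden by the average become visible.

```python
# Minimal sketch: PDP vs. ICE on a black-box credit-default classifier.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=0.5, size=n),
    "debt_ratio": rng.uniform(0, 1, size=n),
    "credit_history_len": rng.integers(1, 30, size=n),
})
# Hypothetical default label driven mostly by debt ratio and income.
logit = 3 * X["debt_ratio"] - 0.0001 * X["income"] - 0.05 * X["credit_history_len"]
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# kind="both" draws the averaged PDP curve on top of individual ICE curves.
PartialDependenceDisplay.from_estimator(
    model, X, features=["debt_ratio", "income"], kind="both", subsample=50)
plt.show()
```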
Fig. 7. LIME process: Predictions of black-box model are uninterpretable. The local instance in the red box is the target to be
explained. Subsets of nonzero elements of the target instance are uniformly drawn to form a local dataset on which the
surrogate transparent model is trained. The prediction from the linear transparent model can then be interpreted by the
user. [100].
The idea of Explanation by Simplification (ES) techniques is to introduce a surrogate model performing
uncomplicated operations. The purpose is to allow the machine learning developer to formulate a mental model
of the AI model’s behavior. The surrogate model has to be interpretable and, more importantly, capture the behavior of the black-box model with high fidelity. The latter property should be given a higher priority, since there is little use for interpreting a low-fidelity solution. ML techniques which apply linear operations and rule extraction are applicable as surrogate models in place of uninterpretable neural networks. These include decision trees (DT) with limited depth, linear/logistic regression, K-Nearest Neighbors (KNN), and generalized
linear models (GLM).
Local Interpretable Model-agnostic Explanations (LIME) [100]: LIME is perhaps one of the most popular explanation techniques across various use cases, including finance. It is a model-agnostic method used to provide insight as to why a certain prediction was made, and it constitutes an outcome explanation technique.
Since LIME is a local-based technique, it only has to approximate the data points within a defined neighborhood,
achieving a much more realistic goal instead of capturing an interpretable representation of the entire dataset.
On a high level, LIME can be implemented as follows (see Fig 7):
(1) Denote the target instance to be explained as $x \in \mathbb{R}^d$. Uniformly sample $n$ random subsets of the nonzero elements of $x$ to form local training points $z \in \{z_1, z_2, \ldots, z_n\}$, where $z_i \in \mathbb{R}^d$ for $1 \le i \le n$.
(2) Derive labels $f(z_i)$ for each point using the black-box model $f$, yielding the local dataset $Z^n = \{(z_i, f(z_i))\}_{i=1}^{n}$.
(3) Choose a transparent surrogate model $g$ and train it on the dataset $Z^n$ via Equation 1.
(4) Interpret the outputs of the transparent model on the target instance, $g(x)$.
LIME minimizes the following loss function to optimize for both fidelity of the local model as well as minimal
complexity.
$$\text{explanation}(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g) \qquad (1)$$
$L$ represents the loss of the surrogate model $g$ with respect to the black-box labels $f$, weighted by the proximity kernel $\pi_x$, and $\Omega$ represents the complexity (e.g., the number of features) of the surrogate model. $G$ is the family of candidate interpretable models, from which an individual local model is fitted to produce each explanation. The authors additionally propose a sparse selection of features, named Submodular Pick LIME (SP-LIME), to present the observer with a global view, based on an allocated budget of maximal features to focus on. The method delivers a diverse representation by omitting redundancy.
redundancy. [85, 107] use LIME on top of tree ensembles to identify the contributions of individual features
pushing towards predicting a specific borrower as defaulting or successfully paying off the loan. Such explanations
can be useful in preventing social bias by discovering any socially discriminative features on which the model
may be focused, thereby instilling trust in the model’s usability. [127] extends LIME towards financial regulators
requiring commercial banks to adhere to a set of financial factors, where they propose a method named LIMER
(R stands for Regtech). The authors of LIMER argue that high acceptance of financial solutions can be achieved if
such factors are integrated into the explainability design of the AI model. [28] implements model simplification by extracting logical rules from a random forest and selecting the most relevant rules. The decision rules are extracted
from a local dataset, derived similarly to LIME without weighting the proximity of each drawn sample. [81]
trains a recurrent neural network (RNN) to classify customer spending into five categories and an interpretable
linear regression model was subsequently trained to predict the nodes formed by the RNN model. The authors
then perform inverse regression which provides a mapping from output space to state space where the features
responsible for categorizing customer spending can be identified.
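A minimal re-implementation of the LIME procedure in steps (1)-(4) is sketched below for tabular data; it is illustrative only (the reference implementation is the `lime` package), and the black-box predictor, kernel width, and the use of a ridge surrogate are assumptions.

```python
# Minimal sketch of the LIME procedure in steps (1)-(4) above, for tabular data.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(f, x, n_samples=1000, kernel_width=0.75, rng=None):
    """Return per-feature weights of a local linear surrogate g around x.

    f : callable mapping an (n, d) array to predicted probabilities of shape (n,)
    x : (d,) instance to explain
    """
    rng = rng or np.random.default_rng(0)
    d = x.shape[0]

    # (1) Sample perturbations: keep or zero-out random subsets of features.
    mask = rng.integers(0, 2, size=(n_samples, d))   # binary presence vectors
    Z = mask * x                                     # perturbed instances

    # (2) Label the perturbations with the black-box model f.
    y = f(Z)

    # Proximity kernel pi_x: closer perturbations receive larger weight.
    dist = np.linalg.norm(Z - x, axis=1)
    pi = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # (3) Fit a weighted, regularized linear surrogate g (Omega via the L2 penalty).
    g = Ridge(alpha=1.0).fit(mask, y, sample_weight=pi)

    # (4) Interpret g: the coefficients are the feature contributions near x.
    return g.coef_
```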
The final outcome is intuitively derived as an aggregate over all nonzero Shapley values. SHAP’s popularity stems from three attractive properties: guaranteeing a complete approximation of the original model $f(x)$ through additive feature attribution (see Equation 2), ensuring non-contributing features have no impact on the model output, and consistency of the feature values in tracking the outcome contribution. We notice that a large subset of the reviewed papers have utilized SHAP as an explanation approach, likely given its flexibility towards explaining the model at both local and global scales (see Figure 8). [38] incorporates SHAP with additional credit knowledge
for the layperson to assess the logic of XGBoost’s decision in a peer-to-peer lending scenario. [91] introduces
RESHAPE, designed for unsupervised deep learning networks, which provide explanations at the attribute level.
Such explanations can assist auditors in understanding why an accounting statement is flagged as anomalous. The
authors evaluated RESHAPE against other variants of SHAP, based on metrics measuring fidelity, stability, and
robustness. The recent frenzy around cryptocurrency has led to a number of studies attempting to predict movements in the cryptocurrency market; [41] proposes an interactive dashboard providing financial experts with multiple SHAP-based graphical tools. [8] applies SHAP to explain predictions generated by the popular mean-variance Markowitz model [82], an optimization model for establishing the optimal portfolio balance between returns and risk. The generated explanation provides regulators with a means of asserting the compliance of automated algorithmic traders, otherwise known as robo-advisors, with established rules and regulations. [36] incorporates Global Interpretation via Recursive Partitioning (GIRP) with SHAP as a global interpretability technique. GIRP uses the importance values generated by SHAP to further extract meaningful insights from tree models, and the method is compared against a boolean rule technique in a credit scoring use
case. [17] constructs a tree-like visual explanation with TreeSHAP [73], specifically designed for ensemble trees
with an improvement in computational efficiency. The produced structure allows users to visualize clusters of
similar outcomes describing company default risk. [131] compares TreeSHAP against impurity metrics using
information gain, on ensemble tree models for investment quality prediction. [46] identifies relevant features
leading to consumers’ decision on purchasing insurance and further clusters them into least to most likely
groups with Shapley values. [16] similarly implements SHAP to explain XGBoost’s classification of credit risk,
while comparing it against an interpretable logistic regression model. Other studies include discovering the
relationship between corporate social responsibility and financial performance [66], customer satisfaction [99],
GDP growth rates [97], stock trading [12, 65], financial distress [116], market volatility forecast [124] and credit
evaluation [15, 42, 101]. [122] performs K-means clustering on historical S&P 500 stock information to identify
Fig. 8. [left] Values of feature importance at a global level for the ML model’s decision in credit card approval. [right] An example at the instance level: $E(f(X))$ represents the model’s base prediction if no features were considered, and $f(x)$ represents the final prediction after summing the contributing features ($\phi_i$) [101].
dominant sector correlations which describe the state of the market. This work applies Layer-wise Relevance
Propagation (LRP) [9], after transforming the clustering classifier into a neural network since LRP is designed to
work specifically with neural network architectures. [20] prunes unimportant technical indicators using different
configurations of a permutation importance technique, before implementing decision tree techniques for stock
market forecasting. The proposed technique was compared with LIME and demonstrated better reliability. [14]
introduces a set of feature relevance techniques, Quantitative Input Influence (QII) [32] to compute interaction
effects between influential features and additionally for each multi-class label. The authors additionally evaluated
the ability of the XAI technique with five questions relating to each individual audience class. All of the XAI
methods shown thus far are implemented in the post-modeling stage, while [57] is an example pertaining to
pre-modeling where the identification of relevant features takes place before constructing the black-box model.
This work explores the set of features relating to mortgage bankruptcy and performs feature mapping against a
set of widely-used credit concepts. The utility of such an approach is confirmed through empirical evaluations.
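For completeness, the sketch below shows a typical post-hoc SHAP workflow on a tree-based credit model using the `shap` library; the synthetic data and model are assumptions, and the additivity check mirrors the additive attribution property referenced above as Equation 2.

```python
# Minimal sketch (assumed setup, not from a specific reviewed paper): post-hoc
# feature relevance with the `shap` library on a tree-based credit model.
import shap
import xgboost
from sklearn.datasets import make_classification

# Hypothetical tabular credit data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # per-instance, per-feature attributions

# Additivity: base value plus per-feature contributions recovers the model's
# margin output for each instance.
recovered = explainer.expected_value + shap_values.sum(axis=1)

# Local explanation for one applicant, and a global summary across the dataset.
shap.force_plot(explainer.expected_value, shap_values[0], X[0])
shap.summary_plot(shap_values, X)
```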
As pointed out before, contrastive explanations are usually preferred. End-users subjected to an unfavorable AI model’s decision would prefer a solution to the problem rather than a fact-based explanation, which may present multiple possible reasons and be of little use to the explanation receiver. An explanation providing the changes to be made such that the outcome can be reversed towards the favorable one is referred to as a counterfactual explanation. Counterfactuals are derived by iteratively applying small changes to the input features until the outcome flips to the target class. [26] first identifies significant features attributing to bankruptcy through SHAP, and subsequently generates an optimal set of counterfactuals using a Genetic Algorithm (GA). The loss optimized by the GA is composed of objectives describing desirable properties of a good counterfactual outcome, including minimizing the size of the feature changes and maximizing the feasibility of the outcome. [48] additionally provides positive counterfactual explanations, describing the changes to the current inputs that would instead reverse the loan approval to a rejection. Such explanations can provide some form of safety margin for the user to be mindful of. [121] used various techniques from DiCE [34] to generate counterfactuals under five different experimental conditions. The experiment aims to identify and study the effects of the causal variables in the fraud detection of ATM transactions.
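The sketch below illustrates the generic recipe described above (small iterative feature changes until the decision flips) with a greedy single-feature search; it is not the GA-based method of [26] nor DiCE [34], and the step size and standardized features are assumptions.

```python
# Illustrative greedy search for a counterfactual that flips a binary classifier's
# decision with minimal feature changes.
import numpy as np

def greedy_counterfactual(predict_proba, x, step=0.05, max_iter=200, target=1):
    """Nudge one feature at a time toward the target class.

    predict_proba : callable mapping a (1, d) array to P(class = 1)
    x             : (d,) original instance (standardized features assumed)
    """
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        p = predict_proba(x_cf[None, :])[0]
        if (p >= 0.5) == bool(target):
            return x_cf                       # decision flipped: counterfactual found
        # Try a small +/- step on each feature and keep the single best move,
        # i.e., the change with the largest gain toward the target class.
        best_gain, best_move = 0.0, None
        for j in range(x_cf.shape[0]):
            for delta in (step, -step):
                trial = x_cf.copy()
                trial[j] += delta
                gain = predict_proba(trial[None, :])[0] - p
                gain = gain if target == 1 else -gain
                if gain > best_gain:
                    best_gain, best_move = gain, (j, delta)
        if best_move is None:
            break                             # no single-feature move helps
        x_cf[best_move[0]] += best_move[1]
    return None                               # no counterfactual found within budget
```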
Fig. 9. [Top] Transfer of an educating statement to actionable statements advising the applicant on actions to take such that a subsequent loan application can be approved. [Bottom] Transfer of the original statement highlighting actions to an educating statement conveying the reason for the rejected loan application. [112].
sentences with a transformer architecture, trained under contextual decomposition. The explanation technique,
derived from Sampling and Contextual Decomposition (SCD) [60], performs different actions including inserting,
removing, or replacing words that are representative of the context in the statement, based on the target objective.
The high-level idea of counterfactual generation involves identifying the most relevant word and replacing it with an antonym from a reference dictionary, continuing until the outcome is reversed. The proposed transformer outperforms even human experts in classifying financial articles on merger & acquisition event outcomes. [133] generates text explanations using a state-of-the-art natural language generation transformer decoder, GPT-2 [98], while fulfilling soft constraints of including keywords. The proposed technique, soft-constrained dynamic beam allocation (SC-DBA), extracts keywords corresponding to various levels of predicted market volatility using a separate network on harvested news titles. The quantitative evaluation is based on the fluency and utility of the explanation produced.
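A rough sketch of generating a natural-language volatility explanation with an off-the-shelf GPT-2 decoder and plain beam search is shown below; it is a generic illustration rather than the SC-DBA technique of [133], and the prompt and keywords are assumptions.

```python
# Generic illustration: natural-language explanation generation with GPT-2.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

keywords = ["rate hike", "earnings miss"]           # hypothetical extracted keywords
prompt = ("Market volatility is predicted to rise because "
          + " and ".join(keywords) + ", which")

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    num_beams=5,              # SC-DBA instead allocates beams under soft constraints
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```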
to dual-level attention, [69, 75] propose a hierarchical attention model at both the word and sentence level and produce explanations in the form of a heatmap highlighting relevant text. The proposed method, FISHQA, was trained to detect loan arrears from financial text statements, similar to the compared baselines. The uniqueness of the proposed method lies in providing FISHQA with additional user queries. The model was able to highlight regions of the statement corresponding to the set of expert-defined concepts. This form of explanation allows users to verify whether the model is focusing on the correct terms relating to the concept at hand (refer to Figure 10).
Along the lines of hierarchical attention, [69] introduces a quantitative measure to evaluate the precision and
Fig. 10. FISHQA: hierarchical attention model; different colors relate to different financial concepts: grey - company, light brown - executives, red - financing, blue - litigation, teal - personnel. [75].
recall of captured words against various lexicon dictionaries and expert-annotated lists. The approach, analogous to the former study, can be seen as an extrinsic process of ensuring the correctness of concept identification, by
capturing words associated with financial risk. [37] implements knowledge graphs to provide a visual linkage between event entities extracted from stock news articles. The approach offers users a visual understanding of the relationship between features and the corresponding prediction. [58] introduces GINN, an interpretable neural network designed such that each layer represents different entities, such as words and concepts, at the node level. The approach identifies the words contributing to the predicted sentiment labels, as well as the concepts they belong to.
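A toy sketch of attention-based text highlighting, in the spirit of the hierarchical attention models above, is given below; the embeddings and the concept query vector are assumptions, and the point is simply that softmax-normalized relevance scores can be mapped back onto words as a heatmap.

```python
# Toy sketch: word-level attention scores used to highlight the text a model attends to.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def highlight(words, word_embeddings, context_vector, top_k=3):
    """Return the top-k words by attention weight.

    word_embeddings : (n_words, d) matrix of token representations
    context_vector  : (d,) learned query (e.g., a financial-risk concept query)
    """
    scores = word_embeddings @ context_vector   # unnormalized relevance scores
    alpha = softmax(scores)                     # attention weights summing to 1
    top = np.argsort(alpha)[::-1][:top_k]
    return [(words[i], float(alpha[i])) for i in top]
```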
corresponding to the respective outcome. Similarly, [44] implements LIME with an LSTM-CNN and accurately identifies attentive words in consonance with the target sentiment. [72] predicts the possibility of litigation against financial firms by examining 10-K financial reports and numerical indicators concerning the firm’s accounting knowledge. The authors additionally carry out an ablation study on the utility of hybrid information as opposed to individual sources, validating the initial approach. Correspondingly, the explanation served to regulators is framed as the identification of text leading to the suspicion of insider trading, with the help of an attention mechanism.
[138] adopts Shapley values and further integrates external knowledge regarding truth factors, namely Truth Default Theory (TDT) [67], to detect information fraud. The explanation module incorporates both Shapley values and TDT to generate a report highlighting the numerical contributions of features as well as a text explanation. A union of explanation by simplification and feature relevance was proposed by [29, 43]. [43] implements both LIME and SHAP, offering a global and local explanation of market fear prediction in the Indian financial market. [79] uses SHAP, identifies textual information to be more important for classifying financial transactions, and further performs clustering to identify the top contributing keywords. [29] interprets an RL-trained agent’s behavior in algorithmic trading. The resulting explanation enables experts to focus on time-dependent variables alongside consideration of non-linearity effects, which are reduced to a small subset of the initial variables. The learned policy is simplified via policy distillation onto the space of linear regressions, such that an interpretable Lasso regression model can be used as an approximation. Subsequently, $k$-degree polynomial analysis is conducted to select salient features, with $k$ acting as an additional degree of flexibility for the developer to decide. [96] utilizes aspect-based sentiment analysis to study the relationship between
for the developer to decide. [96] utilizes aspect-based sentiment analysis to study the relationship between
stock price movement and top relevant aspects detected in tweets. The polarity of each aspect is derived from a
SenticNet-based graph convolutional network (GCN) [68]. The proposed method can be seen as analogous to the
feature relevance technique, aimed at deriving top contributing aspects with polarity values. The proposed work
focuses on the relationship between financial variables instead of making financial predictions. Such information
can allow for further analysis, leveraging the relationship between the price movement of individual stocks and
individual sentiment of popular terms detected in tweets.
Hybrid information combines the utility of both numerical and textual information, which can lead to better
performance and an increase in the number of compatible explanation techniques. For example, text generation
techniques can be used to generate natural language explanations for non-technical audiences, facilitating ease
of understanding, while feature relevance approaches can be utilized to identify top contributing factors in the
feature domain for technical experts. Models working with both numerical and textual information can also
benefit from a performance point of view if such information can be processed without the risk of overfitting.
However, it may be difficult for models to seamlessly perform with hybrid information, as it ultimately depends
on the task at hand and may require complex feature engineering. For instance, the utility of text information
largely depends on the source and often requires a significant amount of preprocessing before the data can be
useful. The combination of both text and numerical features may increase the complexity of the explanation
and end up being counterproductive. Such issues limit the inclusion of textual information in use cases such as
stock trading or market index predictions. Nonetheless, we note that leveraging hybrid information to provide
explanations can be a promising approach if the aforementioned issues are addressed.
complicated tasks is largely restricted due to their poor predictive strength. Nevertheless, transparent models
still remain an attractive option if sufficient performance can be guaranteed.
Linear/Logistic Regression: The linear regression model is among the earliest ML models used for quantitative analysis. The prediction outcome can be easily derived as a weighted aggregate of the input features. As such, the outcome can naturally be interpreted by inferring from the coefficients $W_i$, which serve as a quantitative measure of feature importance for the outcome. Owing to the linearity assumption, the output $y$ is derived as:
𝑦 = 𝑊0 + 𝑊1𝑥1 + 𝑊2𝑥2 + · · · + 𝑊𝑛𝑥𝑛 + 𝜖 (3)
One can easily interpret the outcome as “by increasing feature 𝑥𝑖 by one unit, the output increases by 𝑊𝑖 ”. The
logistic regression model, on the other hand, is interpreted slightly differently: since the output is bounded
between [0, 1], a logistic function is used, and the interpretation is phrased in terms of the odds 𝑃 (𝑦=1)/𝑃 (𝑦=0)
between the two outputs: “increasing 𝑥𝑖 by one unit multiplies 𝑃 (𝑦=1)/𝑃 (𝑦=0) by 𝑒𝑥𝑝 (𝑊𝑖 )” [88]. [39] addresses
the trade-off between accuracy and interpretability by combining decision trees with logistic regression, which
acts as the main operational backbone. The technique is coined Penalised Logistic Tree Regression (PLTR). PLTR
extracts binary variables from short-depth DTs while establishing sparsity through a penalized lasso operation.
The proposed model is able to account for non-linear effects in a credit-scoring dataset while retaining
interpretability through the top selected rules.
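The following minimal sketch shows this coefficient-based reading in code, on an invented toy credit-scoring setup (the feature names, data, and labels are illustrative assumptions): linear-regression coefficients are read as unit effects, and logistic-regression coefficients are exponentiated to obtain odds multipliers.

```python
# Minimal sketch of reading linear/logistic regression coefficients as
# explanations; the toy credit-scoring features are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))            # e.g., income, debt ratio, age (standardized)
feature_names = ["income", "debt_ratio", "age"]

# Linear regression: W_i is the change in y per unit increase of x_i.
y_reg = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=500)
lin = LinearRegression().fit(X, y_reg)
for name, w in zip(feature_names, lin.coef_):
    print(f"+1 unit of {name} changes the output by {w:+.2f}")

# Logistic regression: exp(W_i) is the multiplicative change in the odds
# P(y=1) / P(y=0) per unit increase of x_i.
y_clf = (y_reg > 0).astype(int)          # toy default / no-default label
logit = LogisticRegression().fit(X, y_clf)
for name, w in zip(feature_names, logit.coef_[0]):
    print(f"+1 unit of {name} multiplies the odds by {np.exp(w):.2f}")
```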
Decision Trees: Decision trees are among the most commonly used techniques in machine learning due to their
simplicity and easily understandable structure. Unlike linear/logistic regression, a DT can approximate nonlinear
relationships and yet remain interpretable via simple if-else logic. However, the transparency of tree models
diminishes with increasing depth, and popular ensemble tree models such as XGBoost or gradient-boosted trees
largely eliminate any form of inherent interpretability. The user can interpret a decision tree by traversing it
from the root node down to a leaf node. The outcome can simply be explained as “if 𝑥1 is >/< 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑1 AND
𝑥2 is >/< 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑2 , · · · , outputs 𝑌 ”. [47] employs a single DT and frames the loan approval task as one that
maximizes profit for the lending firm. [21] builds a lexicon dictionary associated with stock price variation,
extracted from a dataset comprising both news and historical stock prices. The combined effort provides users
with two forms of explanation: sequential decision rules and words correlated with the predicted market direction.
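The if-else reading of a shallow tree can be produced directly from a fitted model, as the sketch below shows on an invented loan-approval toy dataset; the feature names and labels are illustrative assumptions.

```python
# Minimal sketch of extracting if-else rules from a shallow decision tree;
# the toy loan-approval features and labels are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
feature_names = ["income", "credit_score", "loan_amount"]
# Toy approval rule with noise, only used to give the tree something to learn.
y = ((X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + 0.3 * rng.normal(size=1000)) > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)   # shallow => interpretable
print(export_text(tree, feature_names=feature_names))  # "if x <= t ... then class"
```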
Others: [3] constructs an interactive platform, Temenos XAI, that uses fuzzy logic to make financial predictions.
The authors demonstrate its efficacy and explainability in various downstream banking and trading scenarios.
The use of fuzzy logic accounts for uncertainty, which is prevalent in the financial environment, and is especially
useful for modeling imprecise information. The platform allows users to interpret the model at a global scale
as well as at the instance level by observing the top contributing rules. [22] builds on neural additive models
(NAMs) [4] and introduces a generalized form of NAMs, GGNAMs, which focuses on sparse nonlinear interactions.
GGNAMs can be regarded as an intermediate between fully connected networks and logistic/linear regression,
with the intent of retaining linearity and minimizing excessive interactions among features while maximizing
accuracy. The additive components can then be interpreted similarly to linear regression. [92] similarly
implements NAMs and the Explainable Boosting Machine [94] to identify the financial drivers of creditor recovery
rates. [39] proposes a hybrid approach that combines decision trees with logistic regression, capturing nonlinear
effects while retaining the transparency of the model’s behavior. [113] advocates designing inherently transparent
models in the pre-modeling/modeling stages and suggests a qualitative template describing properties of model
interpretability. The intention of the template is to allow researchers to ensure model interpretability while
designing the model architecture. As a proof of concept, the work designs an interpretable ReLU network
conforming to the proposed template and evaluates it on a credit default classification task. The resulting
network can be disentangled into a set of local linear models whose inherent transparency can be examined
directly.
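To illustrate the additive structure that makes the NAM-style models above interpretable, the toy PyTorch sketch below is our own simplified illustration of the NAM idea [4], not the implementation of [22] or [92]: each feature passes through its own small subnetwork, and the per-feature contributions are summed, so each contribution can be inspected much like a regression coefficient.

```python
# Minimal toy sketch of a neural additive model (NAM)-style architecture;
# this is an illustrative simplification, not the implementation of [4] or [22].
import torch
import torch.nn as nn

class ToyNAM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 16):
        super().__init__()
        # One small subnetwork per feature: f_i(x_i).
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Keep per-feature contributions so each f_i can be inspected or plotted.
        contribs = torch.cat(
            [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)], dim=1
        )
        logits = contribs.sum(dim=1) + self.bias   # additive combination
        return logits, contribs

model = ToyNAM(n_features=4)
x = torch.randn(8, 4)
logits, contribs = model(x)
print(contribs.shape)   # (8, 4): one interpretable contribution per feature
```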
Transparent models have the advantage of being interpretable without requiring additional approaches to
explain the model or its outcome. However, there exists a clear trade-off between desired performance and
sufficient interpretability. Certain works have introduced approaches that combine different transparent models
to achieve better performance while retaining as much model transparency as possible. Transparent models
remain a popular choice in the financial domain, as companies must undergo routine audits that require them to
account for the algorithmic services offered to end-users. Nonetheless, given the continued success of deep
learning models, companies seeking to maintain their competitive edge must either improve on existing
transparent models or balance the performance-interpretability trade-off. A recommended approach is to stick
with transparent models if their performance proves sufficient and proceed with less interpretable models
otherwise. One could also break the task down hierarchically, using interpretable models for lower-level tasks
and better-performing models for more complicated tasks.
Table 4. The contributions of current FinXAI techniques to ethical goals. VE, ES, FR, EE, and TE denote visual explanation,
explanation by simplification, feature relevance, explanation by example, and text explanation, respectively.
by discovering important features. [41] improves usability by constructing interactive graphical tools on top of
SHAP, which likewise promotes accessibility. [121] is one of the rare works that studies causal inference based
on generated counterfactuals. Among EE methods, [33] generates counterfactuals, based on representative
instances, to explain the changes required to alter an outcome. The representatives selected by EE methods
[33, 36] can also be used to choose instances that represent a particular cluster in the output space. Similar
instances aligned to such representatives can reassure stakeholders and improve their confidence.
XAI for textual information aims to improve trustworthiness, informativeness, accessibility, and causality.
Among TE methods, [112, 128] similarly improve trustworthiness with counterfactual texts, with the latter
providing alignment according to the user’s prompt. The alignment from educational to actionable information
enhances information flow, especially for individuals not familiar with the service interface. For VE techniques
operating on text, attention weights are widely used for interpretation; such techniques enhance informativeness
and accessibility by using the attention weights to reveal the regions the underlying model focuses on
[69, 75, 129] (a toy illustration is sketched below). In addition, [37, 58] improve the interpretability of graph
neural networks in the financial domain.
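As a toy illustration of the attention-based reading used by these VE techniques, the sketch below computes scaled dot-product attention weights over a handful of invented tokens and ranks them; the token vectors and query are placeholders, not taken from any reviewed system.

```python
# Toy sketch of inspecting attention weights as an explanation over tokens;
# the tiny vectors and tokens are invented for illustration only.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

tokens = ["profit", "warning", "issued", "by", "the", "firm"]
d = 4
rng = np.random.default_rng(3)
keys = rng.normal(size=(len(tokens), d))      # token representations (stand-ins)
query = rng.normal(size=d)                    # e.g., a [CLS]-style query vector

weights = softmax(keys @ query / np.sqrt(d))  # scaled dot-product attention weights
for tok, w in sorted(zip(tokens, weights), key=lambda t: -t[1]):
    print(f"{tok:>8s}  {w:.2f}")              # higher weight ~ stronger model focus
```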
XAI for hybrid information leverages both textual and numerical features to improve informativeness and
accessibility. These works interpret the black-box model’s behavior [11, 29, 43, 44, 72, 79, 138] and provide
textual evidence for predictions [11, 29, 43, 44, 79, 96]. [3, 21, 22, 39, 47, 92, 113] implement transparent models,
removing the need for post-hoc analysis; the simplicity of such models also improves accessibility for non-expert
users. Transparent models may be a suitable choice if their performance is satisfactory and the outcome has to
be readily interpretable by non-technical stakeholders. Decision trees, for instance, can be easily communicated
to audiences without a technical background, given their easily understandable format.
From the above works, we find that most XAI research focuses on either studying the underlying model’s
behavior or identifying important features. Notably, while the informativeness and accessibility goals are richly
covered, as seen in Table 4, no XAI technique achieves all of the desired goals. It is therefore imperative that the
XAI design process is tailored to the desired goals of the target audience (Fig. 2). Likewise, the format in which
explanations are presented is equally important: non-technical audiences tend to prefer user-friendly visuals
over technical plots.
On the other hand, the ethical goal of preserving data privacy has not been well studied in the works reviewed.
Privacy-preserving techniques are a popular research direction, e.g., federated learning [130], a decentralized
learning method that allows parties to collaboratively train a model within a local environment without sharing
their data with each other. Generating synthetic data in place of actual data for training models can be another
such approach. One example is LIME, which generates a local dataset given a target instance, without requiring
access to other data instances. This can help minimize the amount of information accessed outside of the
accountable circle. Understanding how various features lead to a certain output also creates the opportunity to
generate more synthetic data, which can be seen as a form of self-supervised learning aimed at preserving
privacy. However, XAI techniques can also become a double-edged sword, contributing to privacy leakage
instead. Such concerns are especially prevalent in techniques that manipulate decision boundaries, including
SVMs, k-nearest neighbors, and counterfactual explanations [111]. For example, a counterfactual explanation on
reversing a loan application decision might reveal a suite of sensitive information (location, 10-year income,
marital status) to be modified, even though such information is meant to be anonymized. The leaked information
can be accessed by third-party providers who may be part of the product design, or by malicious hackers. A key
challenge is balancing the fidelity of the delivered explanation against the sensitive features altered. Data
leakage goes against the privacy-awareness goal of XAI, and such events are not rare in the financial sector,
where there is a constant supply of computerized bots looking to capitalize on these openings. The consequences
often affect a large group of public stakeholders [35], and the affected firm has to pay large fines and suffers a
loss of trust from its clients. In addition, overly expressive explanations may allow external competitors to
reverse-engineer the models and potentially replicate and improve upon them, thereby compromising the
competitive edge a company holds.
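To make the LIME point concrete, the sketch below builds a LIME-style local surrogate from scratch: it perturbs only the target instance, weights the perturbations by proximity, and fits a simple linear model on the black box’s outputs, so no other real data instances need to be accessed. The black-box scorer and features are invented placeholders, and this is a simplified sketch of the idea rather than the LIME library itself.

```python
# Minimal LIME-style local surrogate built from a single target instance;
# the black-box model and features are illustrative assumptions, and this is
# a simplified sketch of the idea, not the LIME library.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)

def black_box(X):
    # Stand-in for any opaque scorer (e.g., a credit model's probability output).
    return 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2])))

x0 = np.array([0.2, -1.0, 0.5])                    # the instance to explain

# 1. Generate a synthetic local dataset around x0 only.
Z = x0 + 0.5 * rng.normal(size=(2000, x0.size))
preds = black_box(Z)

# 2. Weight samples by proximity to x0 (closer perturbations matter more).
weights = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2)

# 3. Fit a simple, interpretable surrogate on the local neighbourhood.
surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
print(dict(zip(["f0", "f1", "f2"], surrogate.coef_.round(3))))
```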
XAI techniques that improve transferability are another less frequently studied area. In the field of general
AI, transferable knowledge is usually acquired through transfer learning [93], multi-task learning [77], meta-
learning [53], and domain adaptation [126]. However, the main carriers of these learning paradigms at present
are usually deep neural networks, and it is difficult to acquire explainability for a deep neural network via these
learning methods. In addition, knowledge forgetting brings challenges to traditional neural network-based
learning methods [52]: the old knowledge stored in a neural network is likely to be replaced by newly learned
knowledge if it is not explicitly retained alongside the new knowledge. In light of this, how can we leverage
explainability and transferability simultaneously? One possible direction is to utilize neural-symbolic techniques.
Neural-symbolic AI has had a significant impact in natural language processing (NLP), e.g., sentiment analysis
[18] and metaphor processing [78]. It combines the merits of both neural networks and symbolic representations:
neural networks have strong generalization ability in learning feature representations, while symbolic reasoning
enables human-understandable explanations of the system’s decision-making process through transparency
and interpretability. Since symbolic knowledge can be stored permanently in a knowledge base, it avoids the
problem of knowledge forgetting in neural networks. A comprehensive and accurate knowledge base can also
reduce the fitting capacity required of the neural network. As a result, a
more lightweight and transparent neural network can be used in a neural-symbolic system. However, developing
domain-specific knowledge for finance is costly. Moreover, developing symbolic representations for numerical
data is also challenging.
Finally, improving fairness, confidence, and causality is also important from an ethical standpoint, yet FinXAI
research in these areas is very limited. As noted in Table 4, few explanation methods approach these goals, e.g.,
ES for fairness with numerical features, EE for confidence with numerical features, and FR and TE for causality
with numerical and textual features, respectively. However, a one-size-fits-all explanation is difficult to achieve.
Hence, we highlight the importance of audience-centric XAI techniques as a more realistic expectation.
8.1 Over-reliance
A means of interpreting a model can be helpful and can transform how users interact with data. However, it can
also cause users to over-rely on possibly inaccurate explanations. A survey studying how data scientists perceive
explanations provided by different XAI tools found that a large proportion tend to over-trust the explanations
provided [61], especially those from widely used tools. The visual explanations delivered by feature relevance
techniques such as SHAP tend to be absorbed at face value, which can cause researchers not to question their
legitimacy. Concurrently, a data scientist who has spent an enormous amount of time designing an AI model
may already have prior beliefs about the outcome or model and may be more inclined to accept an explanation
that aligns with those beliefs [56]. Such an occurrence is commonly known as confirmation bias. Over-trusting
these explanations can be especially damaging if they are conveyed to laypersons, as it can result in the spread
of misinformation to a wider audience.
9 CONCLUSION
Overall, explainability will continue to be a critical area of focus in FinTech as companies seek to build trust
and confidence with consumers and regulators alike. To conclude, we have provided a comprehensive review of
XAI tools in the financial domain (FinXAI), highlighting the significant progress made in recent years toward
developing explainable AI models for financial applications. This includes both inherently transparent models
and post-hoc explainability techniques, the former of which we advocate improving further. We provided a
framework that frames the selection of appropriate FinXAI tools as a sequential decision-making process,
placing great emphasis on the audience and on the iterative assessment of the produced explanations. The
reviewed works are categorized according to their respective characteristics for ease of access by interested
readers. We also examined the contributions of current FinXAI to several ethical goals, e.g., trustworthiness,
fairness, informativeness, accessibility, privacy, confidence, causality, and transparency.
Though much valuable work has been done thus far, the review also reveals limitations and challenges
associated with FinXAI. These include the lack of appropriate metrics to measure both the faithfulness and
plausibility of explanations, as well as concerns about over-reliance on potentially misleading explanations.
Future research should focus on addressing these challenges, as well as exploring new directions for FinXAI,
including integrating NLP into explanation-generating techniques and a greater focus on inherently transparent
models. Nevertheless, there is great potential for XAI techniques to enhance transparency, trust, and accountability
in the financial domain. This underscores the importance of active research and development in this field.
REFERENCES
[1] 2021. Microsoft Responsible AI Toolbox. Retrieved February 23, 2023 from https://github.com/microsoft/responsible-ai-toolbox
[2] Idan Achituve, Sarit Kraus, and Jacob Goldberger. 2019. Interpretable online banking fraud detection based on hierarchical attention
mechanism. In 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 1–6.
[3] Janet Adams and Hani Hagras. 2020. A type-2 fuzzy logic approach to explainable AI for regulatory compliance, fair customer outcomes
and market stability in the global financial sector. In 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 1–8.
[4] Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, and Geoffrey E Hinton. 2021. Neural
additive models: Interpretable machine learning with neural nets. Advances in Neural Information Processing Systems 34 (2021),
4699–4711.
[5] Elvio Amparore, Alan Perotti, and Paolo Bajardi. 2021. To trust or not to trust an explanation: using LEAF to evaluate local linear XAI
methods. PeerJ Computer Science 7 (2021), e479.
[6] Daniel W Apley and Jingyu Zhu. 2016. Visualizing the effects of predictor variables in black box supervised learning models. arXiv
preprint arXiv:1612.08468 (2016).
[7] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García,
Sergio Gil-López, Daniel Molina, Richard Benjamins, et al. 2020. Explainable Artificial Intelligence (XAI): Concepts, taxonomies,
opportunities and challenges toward responsible AI. Information fusion 58 (2020), 82–115.
[8] Golnoosh Babaei, Paolo Giudici, and Emanuela Raffinetti. 2022. Explainable artificial intelligence for crypto asset allocation. Finance
Research Letters 47 (2022), 102941.
[9] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On
pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one 10, 7 (2015), e0130140.
[10] Arash Bahrammirzaee. 2010. A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert
system and hybrid intelligent systems. Neural Computing and Applications 19, 8 (2010), 1165–1195.
[11] Harit Bandi, Suyash Joshi, Siddhant Bhagat, and Dayanand Ambawade. 2021. Integrated Technical and Sentiment Analysis Tool for
Market Index Movement Prediction, comprehensible using XAI. In 2021 International Conference on Communication information and
Computing Technology (ICCICT). IEEE, 1–8.
[12] Eric Benhamou, Jean-Jacques Ohana, David Saltiel, and Beatrice Guez. 2021. Explainable AI (XAI) models applied to planning in
financial markets. (2021).
[13] Przemysław Biecek, Marcin Chlebus, Janusz Gajda, Alicja Gosiewska, Anna Kozak, Dominik Ogonowski, Jakub Sztachelski, and Piotr
Wojewnik. 2021. Enabling machine learning algorithms for credit scoring–explainable artificial intelligence (XAI) methods for clear
understanding complex predictive models. arXiv preprint arXiv:2104.06735 (2021).
[14] Philippe Bracke, Anupam Datta, Carsten Jung, and Shayak Sen. 2019. Machine learning explainability in finance: an application to
default risk analysis. (2019).
[15] Andreas C Bueff, Mateusz Cytryński, Raffaella Calabrese, Matthew Jones, John Roberts, Jonathon Moore, and Iain Brown. 2022. Machine
learning interpretability for a stress scenario generation in credit scoring based on counterfactuals. Expert Systems with Applications
202 (2022), 117271.
[16] Niklas Bussmann, Paolo Giudici, Dimitri Marinelli, and Jochen Papenbrock. 2020. Explainable AI in fintech risk management. Frontiers
in Artificial Intelligence 3 (2020), 26.
[17] Niklas Bussmann, Paolo Giudici, Dimitri Marinelli, and Jochen Papenbrock. 2021. Explainable machine learning in credit risk
management. Computational Economics 57 (2021), 203–216.
[18] Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing, and Kenneth Kwok. 2022. SenticNet 7: A commonsense-based neurosymbolic
AI framework for explainable sentiment analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference.
3829–3839.
[19] Erik Cambria, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, and Navid Nobani. 2023. A survey on XAI and natural language
explanations. Information Processing & Management 60, 1 (2023), 103111.
[20] Salvatore Carta, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Maria Madalina Stanciu. 2022. Explainable AI for
financial forecasting. In Machine Learning, Optimization, and Data Science: 7th International Conference, LOD 2021, Grasmere, UK,
October 4–8, 2021, Revised Selected Papers, Part II. Springer, 51–69.
[21] Salvatore M Carta, Sergio Consoli, Luca Piras, Alessandro Sebastian Podda, and Diego Reforgiato Recupero. 2021. Explainable machine
learning exploiting news and domain-specific lexicon for stock market forecasting. IEEE Access 9 (2021), 30193–30205.
[22] Dangxing Chen and Weicheng Ye. 2022. Generalized Gloves of Neural Additive Models: Pursuing transparent and accurate machine
learning models in finance. arXiv preprint arXiv:2209.10082 (2022).
[23] Jiahao Chen and Victor Storchan. 2021. Seven challenges for harmonizing explainability requirements. arXiv preprint arXiv:2108.05390
(2021).
[24] Jun-Hao Chen, Samuel Yen-Chi Chen, Yun-Cheng Tsai, and Chih-Shiang Shur. 2020. Explainable deep convolutional candlestick learner.
arXiv preprint arXiv:2001.02767 (2020).
[25] Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international
conference on knowledge discovery and data mining. 785–794.
[26] Soo Hyun Cho and Kyung-shik Shin. 2023. Feature-Weighted Counterfactual-Based Explanation for Bankruptcy Prediction. Expert
Systems with Applications 216 (2023), 119390.
[27] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks
on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[28] Dennis Collaris, Leo M Vink, and Jarke J van Wijk. 2018. Instance-level explanations for fraud detection: A case study. arXiv preprint
arXiv:1806.07129 (2018).
[29] Lin William Cong, Ke Tang, Jingyuan Wang, and Yang Zhang. 2021. AlphaPortfolio: Direct construction through deep reinforcement
learning and interpretable AI. Available at SSRN 3554486 (2021).
[30] Lisa Crosato, Caterina Liberati, and Marco Repetto. 2021. Look Who’s Talking: Interpretable Machine Learning for Assessing Italian
SMEs Credit Default. arXiv preprint arXiv:2108.13914 (2021).
[31] Marina Danilevsky, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, and Prithviraj Sen. 2020. A survey of the state of explainable
AI for natural language processing. arXiv preprint arXiv:2010.00711 (2020).
[32] Anupam Datta, Shayak Sen, and Yair Zick. 2016. Algorithmic transparency via quantitative input influence: Theory and experiments
with learning systems. In 2016 IEEE symposium on security and privacy (SP). IEEE, 598–617.
[33] Randall Davis, Andrew W Lo, Sudhanshu Mishra, Arash Nourian, Manish Singh, Nicholas Wu, and Ruixun Zhang. 2022. Explainable
machine learning models of consumer credit risk. Available at SSRN (2022).
[34] Kelly C De Bruin, Rob B Dellink, and Richard SJ Tol. 2009. AD-DICE: an implementation of adaptation in the DICE model. Climatic
Change 95 (2009), 63–81.
[35] AJ Dellinger. 2019. Understanding The First American Financial Data Leak: How Did It Happen And What Does It Mean? Retrieved
February 22, 2023 from https://www.forbes.com/sites/ajdellinger/2019/05/26/understanding-the-first-american-financial-data-leak-
how-did-it-happen-and-what-does-it-mean/?sh=7716df86567f
[36] Lara Marie Demajo, Vince Vella, and Alexiei Dingli. 2020. Explainable AI for Interpretable Credit Scoring. In CS & IT Conference
Proceedings, Vol. 10. CS & IT Conference Proceedings.
[37] Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction
and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference. 678–685.
[38] Murat Dikmen and Catherine Burns. 2022. The effects of domain knowledge on trust in explainable AI and task performance: A case of
peer-to-peer lending. International Journal of Human-Computer Studies 162 (2022), 102792.
[39] Elena Dumitrescu, Sullivan Hué, Christophe Hurlin, and Sessi Tokpavi. 2022. Machine learning for credit scoring: Improving logistic
regression with non-linear decision-tree effects. European Journal of Operational Research 297, 3 (2022), 1178–1192.
[40] Taghi Farzad. 2019. Determinants of Mortgage Loan Delinquency: Application of Interpretable Machine Learning.
[41] Jacopo Fior, Luca Cagliero, and Paolo Garza. 2022. Leveraging Explainable AI to Support Cryptocurrency Investors. Future Internet 14,
9 (2022), 251.
[42] Sebastian Fritz-Morgenthal, Bernhard Hein, and Jochen Papenbrock. 2022. Financial risk management and explainable, trustworthy,
responsible AI. Frontiers in Artificial Intelligence 5 (2022), 5.
[43] Indranil Ghosh and Manas K Sanyal. 2021. Introspecting predictability of market fear in Indian context during COVID-19 pandemic:
An integrated approach of applied predictive modelling and explainable AI. International Journal of Information Management Data
Insights 1, 2 (2021), 100039.
[44] Shilpa Gite, Hrituja Khatavkar, Ketan Kotecha, Shilpi Srivastava, Priyam Maheshwari, and Neerav Pandey. 2021. Explainable stock
prices prediction from financial news articles using sentiment analysis. PeerJ Computer Science 7 (2021), e340.
[45] Bryce Goodman and Seth Flaxman. 2017. European Union regulations on algorithmic decision-making and a “right to explanation”. AI
magazine 38, 3 (2017), 50–57.
[46] Alex Gramegna and Paolo Giudici. 2020. Why to buy insurance? An explainable artificial intelligence approach. Risks 8, 4 (2020), 137.
[47] Thomas Gramespacher and Jan-Alexander Posth. 2021. Employing explainable AI to optimize the return target function of a loan
portfolio. Frontiers in Artificial Intelligence 4 (2021), 693022.
[48] Rory Mc Grath, Luca Costabello, Chan Le Van, Paul Sweeney, Farbod Kamiab, Zhao Shen, and Freddy Lecue. 2018. Interpretable credit
application predictions with counterfactual explanations. arXiv preprint arXiv:1811.05245 (2018).
[49] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods
for explaining black box models. ACM computing surveys (CSUR) 51, 5 (2018), 1–42.
[50] Karthik S Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi, and Charu Aggarwal. 2019. Efficient data representation by selecting
prototypes with importance weights. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 260–269.
[51] Sooji Han, Rui Mao, and Erik Cambria. 2022. Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by
Metaphor Concept Mappings. In Proceedings of the 29th International Conference on Computational Linguistics (COLING). International
Committee on Computational Linguistics, Gyeongju, Republic of Korea, 94–104.
[52] Kai He, Rui Mao, Tieliang Gong, Erik Cambria, and Chen Li. 2022. JCBIE: A Joint Continual Learning Neural Network for Biomedical
Information Extraction. BMC Bioinformatics 23, 549 (2022). https://doi.org/10.1186/s12859-022-05096-w
[53] Kai He, Rui Mao, Tieliang Gong, Chen Li, and Erik Cambria. 2022. Meta-based Self-training and Re-weighting for Aspect-based
Sentiment Analysis. IEEE Transactions on Affective Computing (2022). https://doi.org/10.1109/TAFFC.2022.3202831
[54] AI HLEG. 2019. Ethics guidelines for trustworthy AI. Retrieved February 8, 2023 from https://digital-strategy.ec.europa.eu/en/library/
ethics-guidelines-trustworthy-ai
[55] Robert R Hoffman, Shane T Mueller, Gary Klein, and Jordan Litman. 2018. Metrics for explainable AI: Challenges and prospects. arXiv
preprint arXiv:1812.04608 (2018).
[56] Fred Hohman, Andrew Head, Rich Caruana, Robert DeLine, and Steven M Drucker. 2019. Gamut: A design probe to understand how
data scientists understand machine learning models. In Proceedings of the 2019 CHI conference on human factors in computing systems.
1–13.
[57] Sheikh Rabiul Islam, William Eberle, Sid Bundy, and Sheikh Khaled Ghafoor. 2019. Infusing domain knowledge in AI-based “black box”
models for better explainability with application in bankruptcy prediction. arXiv preprint arXiv:1905.11474 (2019).
[58] Tomoki Ito, Hiroki Sakaji, Kiyoshi Izumi, Kota Tsubouchi, and Tatsuo Yamashita. 2020. GINN: gradient interpretable neural networks
for visualizing financial texts. International Journal of Data Science and Analytics 9 (2020), 431–445.
[59] Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?
arXiv preprint arXiv:2004.03685 (2020).
[60] Xisen Jin, Zhongyu Wei, Junyi Du, Xiangyang Xue, and Xiang Ren. 2019. Towards hierarchical importance attribution: Explaining
compositional semantics for neural sequence models. arXiv preprint arXiv:1911.06194 (2019).
[61] Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. Interpreting
interpretability: understanding data scientists’ use of interpretability tools for machine learning. In Proceedings of the 2020 CHI conference
on human factors in computing systems. 1–14.
[62] Marko Kolanovic and Rajesh T. Krishnamachari. 2017. Big Data and AI Strategies, Machine Learning and Alternative Data Approach
to Investing.
[63] Ouren Kuiper, Martin van den Berg, Joost van der Burgt, and Stefan Leijnen. 2022. Exploring explainable ai in the financial sector:
Perspectives of banks and supervisory authorities. In Artificial Intelligence and Machine Learning: 33rd Benelux Conference on Artificial
Intelligence, BNAIC/Benelearn 2021, Esch-sur-Alzette, Luxembourg, November 10–12, 2021, Revised Selected Papers 33. Springer, 105–119.
[64] Devinder Kumar, Graham W Taylor, and Alexander Wong. 2017. Opening the black box of financial ai with clear-trade: A class-
enhanced attentive response approach for explaining and visualizing deep learning-driven stock market prediction. arXiv preprint
arXiv:1709.01574 (2017).
[65] Satyam Kumar, Mendhikar Vishal, and Vadlamani Ravi. 2022. Explainable Reinforcement Learning on Financial Stock Trading using
SHAP. arXiv preprint arXiv:2208.08790 (2022).
[66] Julien Lachuer and Sami Ben Jabeur. 2022. Explainable artificial intelligence modeling for corporate social responsibility and financial
performance. Journal of Asset Management 23, 7 (2022), 619–630.
[67] Timothy R Levine. 2014. Truth-default theory (TDT) a theory of human deception and deception detection. Journal of Language and
Social Psychology 33, 4 (2014), 378–392.
[68] Bin Liang, Hang Su, Lin Gui, Erik Cambria, and Ruifeng Xu. 2022. Aspect-based sentiment analysis via affective knowledge enhanced
graph convolutional networks. Knowledge-Based Systems 235 (2022), 107643.
[69] Ting-Wei Lin, Ruei-Yao Sun, Hsuan-Ling Chang, Chuan-Ju Wang, and Ming-Feng Tsai. 2021. XRR: Explainable Risk Ranking for
Financial Reports. In Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference, ECML
PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part IV 21. Springer, 253–268.
[70] Peter Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supplements 27 (1990), 247–266.
[71] Zachary C Lipton. 2018. The mythos of model interpretability: In machine learning, the concept of interpretability is both important
and slippery. Queue 16, 3 (2018), 31–57.
[72] Rong Liu, Feng Mai, Zhe Shan, and Ying Wu. 2020. Predicting shareholder litigation on insider trading from financial text: An
interpretable deep learning approach. Information & Management 57, 8 (2020), 103387.
[73] Scott M Lundberg, Gabriel G Erion, and Su-In Lee. 2018. Consistent individualized feature attribution for tree ensembles. arXiv preprint
arXiv:1802.03888 (2018).
[74] Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing
systems 30 (2017).
[75] Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond Polarity: Interpretable Financial
Sentiment Analysis with Hierarchical Query-driven Attention. In IJCAI. 4244–4250.
[76] Yu Ma, Rui Mao, Qika Lin, Peng Wu, and Erik Cambria. 2023. Multi-source Aggregated Classification for Stock Price Movement
Prediction. Information Fusion 91 (2023), 515–528. https://doi.org/10.1016/j.inffus.2022.10.025
[77] Rui Mao and Xiao Li. 2021. Bridging Towers of Multi-task Learning with a Gating Mechanism for Aspect-based Sentiment Analysis
and Sequential Metaphor Identification. Proceedings of the AAAI Conference on Artificial Intelligence 35, 15 (2021), 13534–13542.
https://doi.org/10.1609/aaai.v35i15.17596
[78] Rui Mao, Xiao Li, Mengshi Ge, and Erik Cambria. 2022. MetaPro: A computational metaphor processing model for text pre-processing.
Information Fusion 86-87 (2022), 30–43. https://doi.org/10.1016/j.inffus.2022.06.002
[79] Charl Maree, Jan Erik Modal, and Christian W Omlin. 2020. Towards responsible AI for financial transactions. In 2020 IEEE Symposium
Series on Computational Intelligence (SSCI). IEEE, 16–21.
[80] Charl Maree and Christian W Omlin. 2022. Can interpretable reinforcement learning manage prosperity your way? AI 3, 2 (2022),
526–537.
[81] Charl Maree and Christian W Omlin. 2022. Understanding Spending Behavior: Recurrent Neural Network Explanation and Interpretation.
In 2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr). IEEE, 1–7.
[82] Harry Markowitz. 1952. Portfolio Selection. The Journal of Finance 7, 1 (1952), 77–91. http://www.jstor.org/stable/2975974
[83] BNY Mellon. 2021. Why Every Financial Institution Should Consider Explainable AI. Retrieved February 8, 2023 from https://www.
bnymellon.com/us/en/insights/all-insights/why-every-financial-institution-should-consider-explainable-ai.html
[84] Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence 267 (2019), 1–38.
[85] Branka Hadji Misheva, Joerg Osterrieder, Ali Hirsa, Onkar Kulkarni, and Stephen Fung Lin. 2021. Explainable AI in credit risk
management. arXiv preprint arXiv:2103.00949 (2021).
[86] Sina Mohseni, Fan Yang, Shiva Pentyala, Mengnan Du, Yi Liu, Nic Lupfer, Xia Hu, Shuiwang Ji, and Eric Ragan. 2021. Machine learning
explanations to prevent overtrust in fake news detection. In Proceedings of the International AAAI Conference on Web and Social Media,
Vol. 15. 421–431.
[87] Sina Mohseni, Niloofar Zarei, and Eric D Ragan. 2021. A multidisciplinary survey and framework for design and evaluation of
explainable AI systems. ACM Transactions on Interactive Intelligent Systems (TiiS) 11, 3-4 (2021), 1–45.
[88] Christoph Molnar. 2020. Interpretable machine learning. Lulu. com.
[89] Agnieszka Mroczkowska. 2020. What is a Fintech Application?, Definition and Insights for Business Owners. Retrieved February 7, 2023
from https://www.thedroidsonroids.com/blog/what-is-a-fintech-application-definition-and-insights-for-business-owners/
[90] Shane T Mueller, Robert R Hoffman, William Clancey, Abigail Emrey, and Gary Klein. 2019. Explanation in human-AI systems: A
literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI. arXiv preprint arXiv:1902.01876
(2019).
[91] Ricardo Müller, Marco Schreyer, Timur Sattarov, and Damian Borth. 2022. RESHAPE: Explaining Accounting Anomalies in Financial
Statement Audits by enhancing SHapley Additive exPlanations. In Proceedings of the Third ACM International Conference on AI in
Finance. 174–182.
[92] Abdolreza Nazemi, Jonas Rauch, and Frank J Fabozzi. 2022. Interpretable Machine Learning for Creditor Recovery Rates. Available at
SSRN 4190345 (2022).
[93] Behnam Neyshabur, Hanie Sedghi, and Chiyuan Zhang. 2020. What is being transferred in transfer learning? Advances in neural
information processing systems 33 (2020), 512–523.
[94] Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. 2019. Interpretml: A unified framework for machine learning interpretability.
arXiv preprint arXiv:1909.09223 (2019).
[95] Monetary Authority of Singapore. 2021. Veritas Initiative Addresses Implementation Challenges in the Responsible Use of Artificial
Intelligence and Data Analytics. Retrieved February 8, 2023 from https://www.mas.gov.sg/news/media-releases/2021/veritas-initiative-
addresses-implementation-challenges
[96] Keane Ong, Wihan van der Heever, Ranjan Satapathy, Gianmarco Mengaldo, and Erik Cambria. 2023. FinXABSA: Explainable Finance
through Aspect-Based Sentiment Analysis. arXiv:2303.02563 [cs.CL]
[97] Sangjin Park and Jae-Suk Yang. 2022. Interpretable deep learning LSTM model for intelligent economic decision-making. Knowledge-
Based Systems 248 (2022), 108907.
[98] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised
multitask learners. OpenAI blog 1, 8 (2019), 9.
[99] Ioannis Rallis, Yannis Markoulidakis, Ioannis Georgoulas, George Kopsiaftis, Maria Kaselimi, Nikolaos Doulamis, and Anastasios
Doulamis. 2022. Interpretation of net promoter score attributes using explainable AI. In Proceedings of the 15th International Conference
on PErvasive Technologies Related to Assistive Environments. 113–117.
[100] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why should i trust you?” Explaining the predictions of any classifier.
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135–1144.
[101] Maryan Rizinski, Hristijan Peshov, Kostadin Mishev, Lubomir T Chitkushev, Irena Vodenska, and Dimitar Trajanov. 2022. Ethically
Responsible Machine Learning in Fintech. IEEE Access 10 (2022), 97531–97554.
[102] Thomas Rojat, Raphaël Puget, David Filliat, Javier Del Ser, Rodolphe Gelin, and Natalia Díaz-Rodríguez. 2021. Explainable artificial
intelligence (xai) on timeseries data: A survey. arXiv preprint arXiv:2104.00950 (2021).
[103] Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models
instead. Nature machine intelligence 1, 5 (2019), 206–215.
[104] Cynthia Rudin and Joanna Radin. 2019. Why are we using black box models in AI when we don’t need to? A lesson from an explainable
AI competition. Harvard Data Science Review 1, 2 (2019), 10–1162.
[105] Maria Sahakyan, Zeyar Aung, and Talal Rahwan. 2021. Explainable artificial intelligence for tabular data: A survey. IEEE Access 9
(2021), 135392–135422.
[106] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam:
Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer
vision. 618–626.
[107] Sefik Ilkin Serengil, Salih Imece, Ugur Gurkan Tosun, Ege Berk Buyukbas, and Bilge Koroglu. 2022. A Comparative Study of Machine
Learning Approaches for Non Performing Loan Prediction with Explainability. International Journal of Machine Learning and Computing
12, 5 (2022).
[108] Sofia Serrano and Noah A Smith. 2019. Is attention interpretable? arXiv preprint arXiv:1906.03731 (2019).
[109] Lloyd S Shapley et al. 1953. A value for n-person games. (1953).
[110] Si Shi, Jianjun Li, Guohui Li, Peng Pan, and Ke Liu. 2021. XPM: An explainable deep reinforcement learning framework for portfolio
management. In Proceedings of the 30th ACM international conference on information & knowledge management. 1661–1670.
[111] Kacper Sokol and Peter A Flach. 2019. Counterfactual Explanations of Machine Learning Predictions: Opportunities and Challenges for
AI Safety. SafeAI@ AAAI (2019).
[112] Ramya Srinivasan, Ajay Chander, and Pouya Pezeshkpour. 2019. Generating user-friendly explanations for loan denials using GANs.
arXiv preprint arXiv:1906.10244 (2019).
[113] Agus Sudjianto and Aijun Zhang. 2021. Designing Inherently Interpretable Machine Learning Models. arXiv preprint arXiv:2111.01743
(2021).
[114] CFI Team. 2022. Finance Overview: Personal,business and government. Retrieved February 7, 2023 from https://corporatefinanceinstitute.
com/resources/wealth-management/finance-industry-overview/
[115] Richard Tomsett, Dave Braines, Dan Harborne, Alun Preece, and Supriyo Chakraborty. 2018. Interpretable to whom? A role-based
model for analyzing interpretable machine learning systems. arXiv preprint arXiv:1806.07552 (2018).
[116] Kim Long Tran, Hoang Anh Le, Thanh Hien Nguyen, and Duc Trung Nguyen. 2022. Explainable Machine Learning for Financial
Distress Prediction: Evidence from Vietnam. Data 7, 11 (2022), 160.
[117] Hugues Turbé, Mina Bjelogrlic, Christian Lovis, and Gianmarco Mengaldo. 2023. Evaluation of post-hoc interpretability methods in
time-series classification. Nature Machine Intelligence (2023). https://doi.org/10.1038/s42256-023-00620-w
[118] Martin van den Berg and Ouren Kuiper. 2020. XAI in the financial sector: a conceptual framework for explainable AI (XAI). https://www.hu.nl/-/media/hu/documenten/onderzoek/projecten/ (2020).
[119] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
Attention is all you need. Advances in neural information processing systems 30 (2017).
[120] JAMES VINCENT. 2018. Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech. Retrieved February 8, 2023
from https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai
[121] Yelleti Vivek, Vadlamani Ravi, Abhay Anand Mane, and Laveti Ramesh Naidu. 2022. Explainable Artificial Intelligence and Causal
Inference based ATM Fraud Detection. arXiv preprint arXiv:2211.10595 (2022).
[122] Tobias Wand, Martin Heßler, and Oliver Kamps. 2022. Identifying Dominant Industrial Sectors in Market States of the S&P 500 Financial
Data. arXiv preprint arXiv:2208.14106 (2022).
[123] Katharina Weitz, Dominik Schiller, Ruben Schlagowski, Tobias Huber, and Elisabeth André. 2019. "Do you trust me?" Increasing
user-trust by integrating virtual agents in explainable AI interaction design. In Proceedings of the 19th ACM international conference on
intelligent virtual agents. 7–9.
[124] Futian Weng, Jianping Zhu, Cai Yang, Wang Gao, and Hongwei Zhang. 2022. Analysis of financial pressure impacts on the health care
industry with an explainable machine learning method: China versus the USA. Expert Systems with Applications 210 (2022), 118482.
[125] James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viégas, and Jimbo Wilson. 2019. The what-if tool:
Interactive probing of machine learning models. IEEE transactions on visualization and computer graphics 26, 1 (2019), 56–65.
[126] Binhui Xie, Longhui Yuan, Shuang Li, Chi Harold Liu, Xinjing Cheng, and Guoren Wang. 2022. Active learning for domain adaptation:
An energy-based approach. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8708–8716.
[127] Han Yan, Sheng Lin, et al. 2019. New Trend in FinTech: Research on Artificial Intelligence Model Interpretability in Financial Fields.
Open Journal of Applied Sciences 9, 10 (2019), 761.
[128] Linyi Yang, Eoin Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth, and Ruihai Dong. 2020. Generating Plausible Counterfactual
Explanations for Deep Transformers in Financial Text Classification. In Proceedings of the 28th International Conference on Computational
Linguistics. 6150–6160.
[129] Linyi Yang, Zheng Zhang, Su Xiong, Lirui Wei, James Ng, Lina Xu, and Ruihai Dong. 2018. Explainable text-driven neural network for
stock prediction. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS). IEEE, 441–445.
[130] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Transactions
on Intelligent Systems and Technology (TIST) 10, 2 (2019), 1–19.
[131] Angeline Yasodhara, Azin Asgarian, Diego Huang, and Parinaz Sobhani. 2021. On the trustworthiness of tree ensemble explainability
methods. In Machine Learning and Knowledge Extraction: 5th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain
Conference, CD-MAKE 2021, Virtual Event, August 17–20, 2021, Proceedings 5. Springer, 293–308.
[132] Tan Wen Rui, Yeong Zee Kin, and Lee Wan Sie. 2023. How Singapore is developing trustworthy AI. Retrieved February 8, 2023 from
https://www.weforum.org/agenda/2023/01/how-singapore-is-demonstrating-trustworthy-ai-davos2023/
[133] Jie Yuan and Zhu Zhang. 2020. Connecting the dots: forecasting and explaining short-term market volatility. In Proceedings of the First
ACM International Conference on AI in Finance. 1–8.
[134] Chanyuan Abigail Zhang, Soohyun Cho, and Miklos Vasarhelyi. 2022. Explainable Artificial Intelligence (XAI) in auditing. International
Journal of Accounting Information Systems 46 (2022), 100572.
[135] Ruoyun Zhang, Chao Yi, and Yixin Chen. 2020. Explainable machine learning for regime-based asset allocation. In 2020 IEEE International
Conference on Big Data (Big Data). IEEE, 5480–5485.
[136] Wei Zhang, Brian Barr, and John Paisley. 2022. An Interpretable Deep Classifier for Counterfactual Generation. In Proceedings of the
Third ACM International Conference on AI in Finance. 36–43.
[137] Wei Zhang, Brian Barr, and John Paisley. 2022. Understanding Counterfactual Generation using Maximum Mean Discrepancy. In
Proceedings of the Third ACM International Conference on AI in Finance. 44–52.
[138] Xiaohui Zhang, Qianzhou Du, and Zhongju Zhang. 2020. An explainable machine learning framework for fake financial news detection.
In 2020 International Conference on Information Systems-Making Digital Inclusive: Blending the Local and the Global, ICIS 2020. Association
for Information Systems.
[139] Zijiao Zhang, Chong Wu, Shiyou Qu, and Xiaofang Chen. 2022. An explainable artificial intelligence approach for financial distress
prediction. Information Processing & Management 59, 4 (2022), 102988.