0% found this document useful (0 votes)
94 views14 pages

Explainable Artificial Intelligence Approaches

The document surveys explainable artificial intelligence (XAI) approaches and discusses their merits, demerits, and challenges. It analyzes popular XAI methods using a case study on credit default prediction and compares their advantages from different perspectives like local and global explanations. The survey also provides insights on quantifying explainability and recommends paths toward responsible AI using XAI.

Uploaded by

ghrab Med
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
94 views14 pages

Explainable Artificial Intelligence Approaches

The document surveys explainable artificial intelligence (XAI) approaches and discusses their merits, demerits, and challenges. It analyzes popular XAI methods using a case study on credit default prediction and compares their advantages from different perspectives like local and global explanations. The survey also provides insights on quantifying explainability and recommends paths toward responsible AI using XAI.

Uploaded by

ghrab Med
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

1

Explainable Artificial Intelligence Approaches: A


Survey
Sheikh Rabiul Islam, University of Hartford, William Eberle, Tennessee Tech University, Sheikh Khaled
Ghafoor, Tennessee Tech University, Mohiuddin Ahmed, Edith Cowan University

Abstract—The lack of explainability of a decision from an Artificial Intelligence (AI) based “black box” system/model, despite its
superiority in many real-world applications, is a key stumbling block for adopting AI in many high stakes applications of different domain
or industry. While many popular Explainable Artificial Intelligence (XAI) methods or approaches are available to facilitate a
human-friendly explanation of the decision, each has its own merits and demerits, with a plethora of open challenges. We demonstrate
popular XAI methods with a mutual case study/task (i.e., credit default prediction), analyze for competitive advantages from multiple
perspectives (e.g., local, global), provide meaningful insight on quantifying explainability, and recommend paths towards responsible or
arXiv:2101.09429v1 [cs.AI] 23 Jan 2021

human-centered AI using XAI as a medium. Practitioners can use this work as a catalog to understand, compare, and correlate
competitive advantages of popular XAI methods. In addition, this survey elicits future research directions towards responsible or
human-centric AI systems, which is crucial to adopt AI in high stakes applications.

Index Terms—Explainable Artificial Intelligence, Explainability Quantification, Human-centered Artificial Intelligence, Interpretability.

1 I NTRODUCTION

A RTIFICIAL Intelligence (AI) has become an integral


part of many real-world applications. Factors fueling
the proliferation of AI-based algorithmic decision making
wide range of areas, as well as prevailing concerns over the
unethical use, lack of transparency, and undesired biases
in the models. Many real-world applications in the Indus-
in many disciplines include: (1) the demand for process- trial Control System (ICS) greatly increase the efficiency of
ing a variety of voluminous data, (2) the availability of industrial production from the automated equipment and
powerful computing resources (e.g., GPU computing, cloud production processes [5]. However, in this setting, the use
computing), and (3) powerful and new algorithms. How- of ’black box’ is still not in a favorable position due to the
ever, most of the successful AI-based models are “black lack of explainability and transparency of the model and
box” in nature, making it a challenge to understand how decisions.
the model or algorithm works and generates decisions. According to [6] and [7], XAI encompasses Machine
In addition, the decisions from AI systems affect human Learning (ML) or AI systems/tools for demystifying black
interests, rights, and lives; consequently, the decision is models internals (e.g., what the models have learned)
crucial for high stakes applications such as credit approval and/or for explaining individual predictions. In general,
in finance, automated machines in defense, intrusion detec- explainability of an AI model’s prediction is the extent of
tion in cybersecurity, etc. Regulators are introducing new transferable qualitative understanding of the relationship
laws such as European Union’s General Data Protection between model input and prediction (i.e., selective/suitable
Regulation (GDPR) 1 [1] aka “right to explanation” [2], US causes of the event) in a recipient friendly manner. The
government’s “Algorithmic Accountability Act of 2019” 2 term “explainability” and “interpretability” are being used
[3], or U.S. Department of Defense’s Ethical Principles for interchangeably throughout the literature. To this end, in
Artificial Intelligence 3 [4] ) to tackle primarily fairness, ac- the case of an intelligent system (i.e., AI-based system), it
countability, and transparency-related risks with automated is evident that explainability is more than interpretability
decision making systems. in terms of importance, completeness, and fidelity of pre-
XAI is a re-emerging research trend, as the need to ad- diction. Based on that, we will use these terms accordingly
vocate these principles/laws, and promote the explainable where appropriate.
decision-making system and research, continues to increase. Due to the increasing number of XAI approaches, it has
Explanation systems were first introduced in the early ’80s become challenging to understand the pros, cons, and com-
to explain the decisions of expert systems. Later, the focus of petitive advantages, associated with the different domains.
the explanation systems shifted towards human-computer In addition, there are lots of variations among different XAI
systems (e.g., intelligent tutoring systems) to provide better methods, such as whether a method is global (i.e., explains
cognitive support to users. The primary reason for the the model’s behavior on the entire data set), local (i.e.,
renewed interest in XAI research has stemmed from recent explains the prediction or decision of a particular instance),
advancements in AI and ML, and their application to a ante-hoc (i.e. involved in the pre training stage), post-hoc
(i.e. works on already trained model), or surrogate (i.e.
1. https://www.eugdpr.org deploys a simple model to emulate the prediction of a
2. https://www.senate.gov “black box” model). However, despite many reviews on XAI
3. https://www.defense.gov methods, there is still a lack of comprehensive analysis of
2

XAI when it comes to these methods and perspectives. interest in XAI research has stemmed from the recent ad-
Some of the popular work/tools on XAI are LIME, vancements in AI, its application to a wide range of areas,
DeepVis Toolbox, TreeInterpreter, Keras-vis, Microsoft Inter- the concerns over unethical use, lack of transparency, and
pretML, MindsDB, SHAP, Tensorboard WhatIf, Tensorflow’s undesired biases in the models. In addition, recent laws
Lucid, Tensorflow’s Cleverhans, etc. However, a few of these by different governments are necessitating more research
work/tools are model specific. For instance, DeepVis, keras- in XAI. According to [6] and [7], XAI encompasses Machine
vis, and Lucid are for a neural network’s explainability, and Learning (ML) or AI systems for demystifying black models
TreeInterpreter is for a tree-based model’s explainability. At internals (e.g., what the models have learned) and/or for
a high level, each of the proposed approaches have similar explaining individual predictions.
concepts, such as feature importance, feature interactions, In 2019, Mueller et al. presents a comprehensive review
shapely values, partial dependence, surrogate models, coun- of the approaches taken by a number of types of “explana-
terfactual, adversarial, prototypes and knowledge infusion. tion systems” and characterizes those into three generations:
However, despite some visible progress in XAI methods, (1) first-generation systems—for instance, expert systems
the quantification or evaluation of explainability is under- from the early 70’s, (2) second generation systems—for in-
focused, and in particular, when it comes to human study- stance, intelligent tutoring systems, and (3) third generation
based evaluations. systems—tools and techniques from the recent renaissance
In this paper, we (1) demonstrate popular meth- starting from 2015 [9]. The first generation systems attempt
ods/approaches towards XAI with a mutual task (i.e., credit to clearly express the internal working process of the system
default prediction) and explain the working mechanism in by embedding expert knowledge in rules often elicited
layman’s terms, (2) compare the pros, cons, and competitive directly from experts (e.g., via transforming rules into natu-
advantages of each approach with their associated chal- ral language expressions). The second generations systems
lenges, and analyze those from multiple perspectives (e.g., can be regarded as the human-computer system designed
global vs local, post-hoc vs ante-hoc, and inherent vs emu- around human knowledge and reasoning capacities to pro-
lated/approximated explainability), (3) provide meaningful vide cognitive support. For instance, arranging the interface
insight on quantifying explainability, and (4) recommend a in such a way that complements the knowledge that the
path towards responsible or human-centered AI using XAI user is lacking. Similar to the first generation systems, the
as a medium. Our survey is only one among the recent ones third generation systems also attempt to clarify the inner
(See Table 1) which includes a mutual test case with useful workings of the systems. But this time, these systems are
insights on popular XAI methods (See Table 4). mostly “black box” (e.g., deep nets, ensemble approaches).
In addition, nowadays, researchers are using advanced
TABLE 1 computer technologies in data visualizations, animation,
Comparison with other Surveys and video, that have a strong potential to drive the XAI
research further. Many new ideas have been proposed for
Survey Reference Mutual test case generating explainable decisions from the need of primarily
Adadi et al., 2018 [8] × accountable, fair, and trust-able systems and decisions.
Mueller et al., 2019 [9] ×
Samek et al., 2017 [6] ×
There has been some previous work [10] that mentions
Molnar et al., 2019 [10] × three notions for quantification of explainability. Two out
Staniak et al., 2018 [11] × of three notions involve experimental studies with humans
Gilpin et al., 2018 [12] × (e.g., domain expert or a layperson, that mainly investigate
Collaris et al., 2018 [13] ×
Ras et al., 2018 [1] × whether a human can predict the outcome of the model)
Dosilovic et al., 2018 [14] × [24], [25], [26], [27], [28]. The third notion (proxy tasks) does
Tjoa et al., 2019 [15] × not involve a human, and instead uses known truths as a
Dosi-Valez et al., 2017 [16] ×
Rudin et al., 2019 [17] ×
metric (e.g., the less the depth of the decision tree, the more
Arrieta et al., 2020 [18] × explainable the model).
Miller et al., 2018 [19] × Some mentionable reviews on XAI are listed in Table
Zhang et al., 2018 [20] × 1. However, while these works provide analysis from one
This Survey X
or more of the mentioned perspectives, a comprehensive
We start with a background of related works (Section review considering all of the mentioned important perspec-
2), followed by a description of the test case in Section tives, using a mutual test case, is still missing. Therefore,
3, and then a review of XAI methods in Section 4. We we attempt to provide an overview using a demonstration
conclude with an overview of quantifying explainability of a mutual test case or task, and then analyze the various
and a discussion addressing open questions and future approaches from multiple perspectives, with some future di-
research directions towards responsible for human-centered rections of research towards responsible or human-centered
AI in Section 5. AI.

2 BACKGROUND 3 T EST C ASE


Research interests in XAI are re-emerging. The earlier works The mutual test case or task that we use in this paper
such as [21], [22], and [23] focused primarily on explain- to demonstrate and evaluate the XAI methods is credit
ing the decision process of knowledge-based systems and default prediction. This mutual test case enables a better
expert systems. The primary reason behind the renewed understanding of the comparative advantages of different
3

XAI approaches. We predict whether a customer is going it works even when the relationship between input and
to default on a mortgage payment (i.e., unable to pay output is non-linear, and even when the features interact
monthly payment) in the near future or not, and explain the with one another (i.e., a correlation among features). In a
decision using different XAI methods in a human-friendly Decision Tree, a path from the root node (i.e., starting node)
way. We use the popular Freddie Mac [29] dataset for the (e.g., credit score in Figure 1) to a leaf node (e.g., default)
experiments. Table 2 lists some important features and their tells how the decision (the leaf node) took place. Usually, the
descriptions. The description of features are taken from the nodes in the upper-level of the tree have higher importance
data set’s [29] user guide. than lower-level nodes. Also, the less the number of levels
We use well-known programming language R’s package (i.e., height) a tree has, the higher the level of explainability
“iml” [30] for producing the results for the XAI methods the tree possesses. In addition, the cutoff point of a node in
described in this review. the Decision Trees provides counterfactual information—for
instance, increasing the value of a feature equal to the cutoff
point will reverse the decision/prediction. In Figure 1, if the
4 E XPLAINABLE A RTIFICIAL I NTELLIGENCE credit score is greater than the cutoff point 748, then the
M ETHODS customer is predicted as non-default. Also, tree-based ex-
This section summarizes different explainability methods planations are contrastive, i.e., a ”what if” analysis provides
with their pros, cons, challenges, and competitive advan- the relevant alternative path to reach a leaf node. According
tages primarily based on two recent comprehensive surveys: to the tree in Figure 1, there are two separate paths (credit
[31] and [16]. We then enhance the previous surveys with a score → delinquency → non-default; and credit score →
multi-perspective analysis, recent research progresses, and non-default) that lead to a non-default classification.
future research directions. [16] broadly categorize methods However, tree-based explanations cannot express the
for explanations into three kinds: Intrinsically Interpretable linear relationship between input features and output. It
Methods, Model Agnostic Methods, and Example-Based also lacks smoothness; slight changes in input can have a big
Explanations. impact on the predicted output. Also, there can be multiple
different trees for the same problem. Usually, the more the
nodes or depth of the tree, the more challenging it is to
4.1 Intrinsically Interpretable Methods
interpret the tree.
The convenient way to achieve explainable results is to
stick with intrinsically interpretable models such as Lin-
ear Regression, Logistic Regression, and Decision Trees by
avoiding the use of “black box” models. However, usually,
this natural explainability comes with a cost in performance.
In a Linear Regression, the predicted target consists
of the weighted sum of input features. So the weight or
coefficient of the linear equation can be used as a medium of
explaining prediction when the number of features is small.
y = b0 + b1 ∗ x1 + ... + bn ∗ xn +  (1)
In Formula 1, y is the target (e.g., chances of credit default),
b0 is a constant value known as the intercept (e.g., .33), bi
is the learned feature’s weight or coefficient (e.g., .33) for
the corresponding feature xi (e.g., credit score), and  is a
constant error term (e.g., .0001). Linear regression comes
with an interpretable linear relationship among features.
However, in cases where there are multiple correlated fea-
Fig. 1. Decision Trees
tures, the distinct feature influence becomes indeterminable
as the individual influences in prediction are not additive to Decision Rules (simple IF-THEN-ELSE conditions) are
the overall prediction anymore. also an inherent explanation model. For instance, ”IF credit
Logistic Regression is an extension of Linear Regression score is less than or equal to 748 AND if the customer is
to the classification problems. It models the probabilities for delinquent on payment for more than zero days (condition),
classification tasks. The interpretation of Logistic Regression THEN the customer will default on payment (prediction)”.
is different from Linear Regression as it gives a probability Although IF-THEN rules are straightforward to interpret,
between 0 and 1, where the weight might not exactly rep- it is mostly limited to classification problems (i.e., does not
resent the linear relationship with the predicted probability. support a regression problem), and inadequate in describing
However, the weight provides an indication of the direction linear relationships. In addition, the RuleFit algorithm [32]
of influence (negative or positive) and a factor of influence has an inherent interpretation to some extent as it learns
between classes, although it is not additive to the overall sparse linear models that can detect the interaction effects
prediction. in the form of decision rules. Decision rules consist of the
Decision Tree-based models split the data multiple combination of split decisions from each of the decision
times based on a cutoff threshold at each node until it paths. However, besides the original features, it also learns
reaches a leaf node. Unlike Logistic and Linear Regression, some new features to capture the interaction effects of
4

TABLE 2
Dataset description

Feature Description
creditScore A number in between 300 and 850 that indicates the creditworthiness of the borrowers.
originalUPB Unpaid principle balance on the note date.
originalInterestRate Original interest rate as indicated by the mortgage note.
currentLoanDelinquencyStatus Indicates the number of days the borrower is delinquent.
numberOfBorrower Number of borrower who are obligated to repay the loan.
currentInterestRate Active interest rate on the note.
originalCombinedLoanToValue Ratio of all mortgage loans and apprised price of mortgaged property on the note date.
currentActualUPB Unpaid principle balance as of latest month of payment.
defaulted Whether the customer was default on payment (1) or not (0.)

original features. Usually, interpretability degrades with an features. In the real world, this is unusual. Furthermore,
increasing number of features. there is a practical limit of only two features that PD plot
Other interpretable models include the extension of lin- can clearly explain at a time. Also, it is a global method, as it
ear models such as Generalized Linear Models (GLMs) plots the average effect (from all instances) of a feature(s) on
and Generalized Additive Models (GAMs); they help to the prediction, and not for all features on a specific instance.
deal with some of the assumptions of linear models (e.g., The PD plot in Figure 2 shows the effect of credit score on
the target outcome y and given features follow a Gaussian prediction. Individual bar lines along the X axis represent
Distribution; and no interaction among features). However, the frequency of samples for different ranges of credit scores.
these extensions make models more complex (i.e., added
interactions) as well as less interpretable. In addition, a
Naı̈ve Bayes Classifier based on Bayes Theorem, where the
probability of classes for each of the features is calculated
independently (assuming strong feature independence), and
K-Nearest Neighbors, which uses nearest neighbors of a
data point for prediction (regression or classification), also
fall under intrinsically interpretable models.

4.2 Model-Agnostic Methods


Model-agnostic methods separate explanation from a ma-
chine learning model, allowing the explanation method to
be compatible with a variety of models. This separation has
some clear advantages such as (1) the interpretation method
can work with multiple ML models, (2) provides different
forms of explainability (e.g., visualization of feature impor-
tance, linear formula) for a particular model, and (3) allows
for a flexible representation—a text classifier uses abstract
word embedding for classification but uses actual words
for explanation. Some of the model-agnostic interpretation
methods include Partial Dependence Plot (PDP), Individual
Fig. 2. Partial Dependence Plot (PDP)
Conditional Expectation (ICE), Accumulation Local Effects
(ALE) Plot, Feature Interaction, Feature Importance, Global
Surrogate, Local Surrogate (LIME), and Shapley Values 4.2.2 Individual Conditional Expectation (ICE)
(SHAP). Unlike PDP, ICE plots one line per instance showing how
a feature influences the changes in prediction (See Figure
4.2.1 Partial Dependence Plot (PDP) 3. The average on all lines of an ICE plot gives a PD plot
The partial Dependence Plot (PDP) or PD plot shows the [34] (i.e., the single line shown in the PD plot in Figure 2).
marginal effect of one or two features (at best three features Figure 4, combines both PDP and ICE together for a better
in 3-D) on the predicted outcome of an ML model [33]. It is interpretation.
a global method, as it shows an overall model behavior, and Although ICE curves are more intuitive to understand
is capable of showing the linear or complex relationships than a PD plot, it can only display one feature meaningfully
between target and feature(s). It provides a function that at a time. In addition, it also suffers from the problem of
depends only on the feature(s) being plotted by marginal- correlated features and overcrowded lines when there are
izing over other features in such a way that includes the many instances.
interactions among them. PDP provides a clear and causal
interpretation by providing the changes in prediction due 4.2.3 Accumulated Local Effects (ALE) Plot
to changes in particular features. However, PDP assumes Similar to PD plots (Figure 2, ALE plots (Figure 5 describe
features under the plot are not correlated with the remaining how features influence the prediction on average. However,
5

Fig. 3. Individual Conditional Expectation (ICE) Fig. 5. Accumulated Local Effects (ALE) Plot

of the partial dependence functions for each feature separately.


Figure 6 shows the interaction strength of each partici-
pating feature. For example, current Actual UPB has the
highest level of interaction with other features, and credit
score has the least interaction with other features. However,
calculating feature interaction is computationally expensive.
Furthermore, using sampling instead of the entire dataset
usually shows variances from run to run. 6,

Fig. 4. PDP and ICE combined together in the same plot

unlike PDP, ALE plot reasonably works well with correlated


features and is comparatively faster. Although ALE plot is
not biased to the correlated features, it is challenging to in-
terpret the changes in prediction when features are strongly
correlated and analyzed in isolation. In that case, only plots
showing changes in both correlated features together make
sense to understand the changes in the prediction.

4.2.4 Feature Interaction


When the features interact with one another, individual
feature effects do not sum up to the total feature effects Fig. 6. Feature interaction
from all features combined. An H-statistic (i.e., Friedman’s
H-statistic) helps to detect different types of interaction,
even with three or more features. The interaction strength 4.2.5 Feature Importance
between two features is the difference between the partial Usually, the feature importance of a feature is the increase
dependence function for those two features together and the sum in the prediction error of the model when we permute
6

the values of the feature to break the true relationship model could be avoided given the surrogate model demon-
between the feature and the true outcome. After shuffling strates a comparable performance. Although a surrogate
the values of the feature, if errors increase, then the feature model comes with interpretation and flexibility (i.e., such
is important. [35] introduced the permutation-based feature as model agnosticism), diverse explanations for the same
importance for Random Forests; later [36] extended the “black box” such as multiple possible decision trees with
work to a model-agnostic version. Feature importance pro- different structures, is a drawback. Besides, some would
vides a compressed and global insight into the ML model’s argue that this is only an illusion of interpretability.
behavior. For example, Figure 7 shows the importance of
each participating feature, current Actual UPB possess the
highest feature importance, and credit score possess the low-
est feature importance. Although feature importance takes
into account both the main feature effect and interaction,
this is a disadvantage as feature interaction is included in
the importance of correlated features. We can see that the
feature current Actual UPB possesses the highest feature
importance (Figure 7), at the same time it also possesses the
highest interaction strength 6. As a result, in the presence
of interaction among features, the feature importance does
not add up to total drop-in of performance. Besides, it is
unclear whether the test set or training set should be used
for feature importance, as it demonstrates variance from run
to run in the shuffled dataset. It is necessary to mention that
feature importance also falls under the global methods.

Fig. 8. Global surrogate

4.2.7 Local Surrogate (LIME)


Unlike global surrogate, local surrogate explains individ-
ual predictions of black-box models. Local Interpretable
Model-Agnostic Explanations (LIME) was proposed by [37].
Lime trains an inherently interpretable model (e.g., Decision
Trees) on a new dataset made from the permutation of
samples and the corresponding prediction of the black box.
Although the learned model can have a good approximation
of local behavior, it does not have a good global approxi-
mation. This trait is also known as local fidelity. Figure 9
is a visualization of the output from LIME. For a random
sample, the black box predicts that a customer will default
on payment with a probability of 1; the local surrogate
model, LIME also predict that the customer will default
on the payment, however, the probability is 0.99, that is
Fig. 7. Feature importance little less than the black box models prediction. LIME also
shows which feature contributes to the decision making
and by how much. Furthermore, LIME allows replacing the
4.2.6 Global Surrogate underlying “black box” model by keeping the same local
A global surrogate model tries to approximate the overall interpretable model for the explanation. In addition, LIME
behavior of a “black box” model using an interpretable ML works for tabular data, text, and images. As LIME is an ap-
model. In other words, surrogate models try to approxi- proximation model, and the local model might not cover the
mate the prediction function of a black-box model using complete attribution due to the generalization (e.g., using
an interpretable model as correctly as possible, given the shorter trees, lasso optimization), it might be unfit for cases
prediction is interpretable. It is also known as a meta-model, where we legally need complete explanations of a decision.
approximate model, response surface model, or emulator. Furthermore, there is no consensus on the boundary of the
We approximate the behavior of a Random Forest using neighborhood for the local model; sometimes, it provides
CART decision trees (Figure 8). The original black box very different explanations for two nearby data points.
7

and remove features iteratively based on their influence


on the overall average predicted response (baseline) [40].
For instance, from the game theory perspective, it starts
with an empty team, then adds feature values one by one
based on their decreasing contribution. In each iteration, the
amount of contribution from each feature depends on the
features values of those are already in the team, which is
considered as a drawback of this approach. However, it is
faster than the Shapley value method due to the greedy
approach, and for models without interactions, the results
are the same [31]. Figure 11 is a visualization of break down
for a random sample, showing contribution (positive or
negative) from each of the participating features towards
the final prediction.

4.3 Example-Based Explanations


Example-Based Explanation methods use particular in-
stances from the dataset to explain the behavior of the model
and the distribution of the data in a model agnostic way.
It can be expressed as “X is similar to Y and Y caused Z,
so the prediction says X will cause Z”. According to [31],
a few explanation methods that fall under Example-Based
Explanations are described as follows:
Fig. 9. Local Interpretable Model-Agnostic Explanations (LIME)
4.3.1 Counterfactual
The counterfactual method indicates the required changes
4.2.8 Shapley Values in the input side that will have significant changes (e.g.,
Shapley is another local explanation method. In 1953, Shap- reverse the prediction) in the prediction/output. Coun-
ley [38] coined the Shapley Value. It is based on coalitional terfactual explanations can explain individual predictions.
game theory that helps to distribute feature importance For instance, it can provide an explanation that describes
among participating features fairly. Here the assumption is causal situations such as “If A had not occurred, B would
that each feature value of the instance is a player in a game, not have occurred”. Although counterfactual explanations
and the prediction is the overall payout that is distributed are human-friendly, it suffers from the “Rashomon effect”,
among players (i.e., features) according to their contribution where each counterfactual explanation tells a different story
to the total payout (i.e., prediction). We use Shapely val- to reach a prediction. In other words, there are multiple
ues (See Figure 10) to analyze the prediction of a random true explanations (counterfactual) for each instance level
forest model for the credit default prediction problem. The prediction, and the challenge is how to choose the best one.
actual prediction for a random sample is 1.00, the average The counterfactual methods do not require access to data
prediction from all samples in the data set is 0.53, and or models and could work with a system that does not use
their difference .47 (1.00 − 0.53) consists of the individual machine learning at all. In addition, this method does not
contributions from the features (e.g., Current Actual UPB work well for categorical variables with many values. For
contributes 0.36). The Shapely Value is the average contri- instance, if the credit score of customer 5 (from Table 3) can
bution in prediction over all possible coalition of features, be increased to 749 (similar to the credit score of customer 6)
which make it computationally expensive when there is a from 748, given other features values remain unchanged, the
large number of features—for example, for k number of customer will not default on a payment. In short, there can
features, there will be 2k number of coalitions. Unlike LIME, be multiple different ways to tune feature values to make
Shapely Value is an explanation method with a solid theory customers move from non-default to default, or vice versa.
that provides full explanations. However, it also suffers from Traditional explanation methods are mostly based on
the problem of correlated features. Furthermore, the Shapely explaining correlation rather than causation. Moraffah et al.
value returns a single value per feature; there is no way [41] focus on the causal interpretable model that explains
to make a statement about the changes in output resulting the possible decision under different situations such as
from the changes in input. One mentionable implementation being trained with different inputs or hyperparameters. This
of the Shapely value is in the work of [39] that they call causal interpretable approach share concept of counterfac-
SHAP. tual analysis as both work on causal inference. Their work
also suggests possible use in fairness criteria evaluation of
4.2.9 Break Down decisions.
The Break Down package provides the local explanation and
is loosely related to the partial dependence algorithm with 4.3.2 Adversarial
an added step-wise procedure known as “Break Down” An adversarial technique is capable of flipping the decision
(proposed by [11]). It uses a greedy strategy to identify using counterfactual examples to fool the machine learner
8

Fig. 10. Shapely values

TABLE 3 tions of adversarial samples on Recurrent Neural Network


Example-Based Explanations (RNNs) based IDS because RNNs are good for sequential
data analysis, and network traffic exhibits some sequential
Customer Delinquency Credit score Defaulted patterns. They find that adversarial the adversarial train-
1 162 680 yes ing procedure can significantly reduce the attack surface.
2 149 691 yes Furthermore, [43] apply an adversarial approach to finding
3 6 728 yes minimum modification of the input features of an intrusion
4 6 744 yes
5 0 748 yes detection system needed to reverse the classification of the
6 0 749 no misclassified instance. Besides satisfactory explanations of
7 0 763 no the reason for misclassification, their approach work pro-
8 0 790 no
9 0 794 no
vide further diagnosis capabilities.
10 0 806 no

4.3.3 Prototypes
(i.e., small intentional perturbations in input to make a false Prototypes consist of a selected set of instances that rep-
prediction). However, adversarial examples could help to resent the data very well. Conversely, the set of instances
discover hidden vulnerabilities as well as to improve the that do not represent data well are called criticisms [44].
model. For instance, an attacker can intentionally design Determining the optimal number of prototypes and crit-
adversarial examples to cause the AI system to make a icisms are challenging. For example, customers 1 and 10
mistake (i.e., fooling the machine), which poses greater from Table 3 can be treated as prototypes as those are strong
threats to cyber-security and autonomous vehicles. As an representatives of the corresponding target. On the other
example, the credit default prediction system can be fooled hand, customers 5 and 6 (from Table 3) can be treated as a
for customer 5, just by increasing the credit score by 1 (see criticism as the distance between the data points is minimal,
Table 3), leading to a reversed prediction. and they might be classified under either class from run to
Hartl et al. [42] emphasize on understanding the implica- run of the same or different models.
9

Fig. 11. Breakdown

4.3.4 Influential Instances


Influential instances are data points from the training set
that are influential for prediction and parameter determina-
tion of the model. While it helps to debug the model and un-
derstand the behavior of the model better, determining the
right cutoff point to separate influential or non-influential
instances is challenging. For example, based on the values
of feature credit score and delinquency, customers 1, 2, 9,
and 10 from Table 3 can be treated as influential instances as
those are strong representatives of the corresponding target.
On the other hand, customers 5 and 6 are not influential in-
stances, as those would be in the margin of the classification
decision boundary.

4.3.5 k-nearest Neighbors Model


The prediction of the k-nearest neighbor model can be
explained with the k-neighbor data points (neighbors those
were averaged to make the prediction). A visualization of
the individual cluster containing similar instances provides
an interpretation of why an instance is a member of a Fig. 12. KNN
particular group or cluster. For example, in Figure 12, the
new sample (black circle) is classified according to the other
three (3-nearest neighbor) nearby samples(one gray, two
white). This visualization gives an interpretation of why a model behavior (i.e., creates an illusion of interpretability)
particular sample is part of a particular class. or finds actual behavior, (B) whether the method alone is
Table 4 summarizes the explainability methods from the inherently interpretable or not, (C) whether the interpre-
perspective of (A) whether the method approximates the tation method is ante-hoc, that is, it incorporates explain-
10

ability into a model from the beginning, or post-hoc, where samples with abnormal samples for better understanding
explainability is incorporated after the regular training of with detailed information.
the actual model (i.e., testing time), (D) whether the method
is model agnostic (i.e., works for any ML model) or specific
to an algorithm, and (E) whether the model is local, provid- 4.5 Knowledge Infusion Techniques
ing instance-level explanations, or global, providing overall
model behavior. [48] propose a concept attribution-based approach (i.e.,
Our analysis says there is a lack of an explainability sensitivity to the concept) that provides an interpretation
method (i.e., a gap in the literature), which is, at the same of the neural network’s internal state in terms of human-
time actual and direct (i.e., does not create an illusion of friendly concepts. Their approach, Testing with CAV (TCAV),
explainability by approximating the model), model agnostic, quantifies the prediction’s sensitivity to a high dimensional
and local, such that it utilizes the full potential of the concept. For example, a user-defined set of examples that
explainability method in different applications. There are defines the concept ’striped’, TCAV can quantify the in-
some recent works that bring external knowledge and infuse fluence of ’striped’ in the prediction of ’zebra’ as a single
that into the model for better interpretation. These XAI number. However, their work is only for image classification
methods have the potential to fill the gap to some extent by and falls under the post-modeling notion (i.e., post-hoc) of
incorporating domain knowledge into the model in a model explanation.
agnostic and transparent way (i.e., not by illusion). [49] propose a knowledge-infused learning that mea-
sures information loss in latent features learned by the
neural networks through Knowledge Graphs (KGs). This
4.4 Other Techniques external knowledge incorporation (via KGs) aids in super-
Chen et al. [45] introduce an instance-wise feature selection vising the learning of features for the model. Although
as a methodology for model interpretation where the model much work remains, they believe that (KGs) will play a
learns a function to extract a subset of most informative crucial role in developing explainable AI systems.
features for a particular instance. The feature selector at- [50] and [51] infuse popular domain principles from the
tempt to maximize the mutual information between selected domain in the model and represent the output in terms of
features and response variables. However, their approach is the domain principle for explainable decisions. In [50], for
mostly limited to posthoc approaches. a bankruptcy prediction problem they use the 5C’s of credit
In a more recent work, [46] study explainable ML us- as the domain principle which is commonly used to analyze
ing information theory where they quantify the effect of key factors: character (reputation of the borrower/firm),
an explanation by the conditional mutual information be- capital (leverage), capacity (volatility of the borrower’s earn-
tween the explanation and prediction considering user back- ings), collateral (pledged asset) and cycle (macroeconomic
ground. Their approach provides personalized explanation conditions) [52], [53]. In [51], for an intrusion detection and
based on the background of the recipient, for instance, a response problem, they incorporate the CIA principles into
different explanation for those who know linear algebra and the model; C stands for confidentiality—concealment of infor-
those who don’t. However, this work is yet to be considered mation or resources, I stands for integrity—trustworthiness
as a comprehensive approach which considers a variety of of data or resources, and A stands for availability—ability
user and their explanation needs. To understand the flow to use the information or resource desired [54]. In both
of information in a Deep Neural Network (DNN), [47] cases, the infusion of domain knowledge leads to better
analyzed different gradient-based attribution methods that explainability of the prediction with negligible compromises
assign an attribution value (i.e., contribution or relevance) to in performance. It also comes with better execution time and
each input feature (i.e., neuron) of a network for each out- a more generalized model that works better with unknown
put neurons. They use a heatmap for better visualizations samples.
where a particular color represents features that contribute Although these works [50], [51] come with unique com-
positively to the activation of target output, and another binations of merits such as model agnosticism, the capability
color for features that suppress the effect on it. of both local and global explanation, and authenticity of
A survey on the visual representation of Convolutional explanation—simulation or emulation free, they are still
Neural Networks (CNNs), by [20], categorizes works based not fully off-the-shelf systems due to some domain-specific
on a) visualization of CNN representations in intermediate configuration requirements. Much work still remains and
network layers, b) diagnosis of CNN representation for needs further attention.
feature space of different feature categories or potential
representation flaws, c) disentanglement of “the mixture of
patterns” encoded in each filter of CNNs, d) interpretable
5 Q UANTIFYING E XPLAINABILITY AND F UTURE
CNNs, and e) semantic disentanglement of CNN represen-
tations. R ESEARCH D IRECTIONS
In the industrial control system, an alarm from the
5.1 Quantifying Explainability
intrusion/anomaly detection system has a very limited role
unless the alarm can be explained with more information. The quantification or evaluation of explainability is an open
[5] design a layer-wise relevance propagation method for challenge. There are two primary directions of research to-
DNN to map the abnormalities between the calculation pro- wards the evaluation of explainability of an AI/ML model:
cess and features. This process helps to compare the normal (1) model complexity-based, and (2) human study-based.
11

TABLE 4
Comparison of different explainability methods from a set of key perspectives (approximation or actual; inherent or not; post-hoc or ante-hoc;
model-agnostic or model specific; and global or local)

Method Approx. Inherent Post/Ante Agnos./Spec. Global/Local


Linear/Logistic Regression No Yes Ante Specific Both
Decision Trees No Yes Ante Specific Both
Decision Rules No Yes Ante Specific Both
k-Nearest Neighbors No Yes Ante Specific Both
Partial Dependence Plot (PDP) Yes No Post Agnostic Global
Individual Conditional Expectation (ICE) Yes No Post Agnostic Both
Accumulated Local Effects (ALE) Plot Yes No Post Agnostic Global
Feature Interaction No Yes Both Agnostic Global
Feature Importance No Yes Both Agnostic Global
Global Surrogate Yes No Post Agnostic Global
Local Surrogate (LIME) Yes No Post Agnostic Local
Shapley Values (SHAP) Yes No Post Agnostic Local
Break Down Yes No Post Agnostic Local
Counterfactual explanations Yes No Post Agnostic Local
Adversarial examples Yes No Post Agnostic Local
Prototypes Yes No Post Agnostic Local
Influential instances Yes No Post Agnostic Local

5.1.1 Model Complexity-based Explainability Evaluation which ultimately makes it hard to understand the causal
relationship between input and output, compared to an
In the literature, model complexity and (lack of) model individual feature influence in the prediction. In fact, from
interpretability are often treated as the same [10]. For in- our study of different explainability tools (e.g., LIME, SHAP,
stance, in [55], [56], model size is often used as a measure of PDP), we have found that the correlation among features is
interpretability (e.g., number of decision rules, depth of the a key stumbling block to represent feature contribution in
tree, number of non-zero coefficients). a model agnostic way. Keeping the issue of feature inter-
[56] propose a scalable Bayesian Rule List (i.e., proba- actions in mind, [10] propose a technique that uses three
bilistic rule list) consisting of a sequence of IF-THEN rules, measures: number of features, interaction strength among
identical to a decision list or one-sided decision tree. Unlike features, and the main effect (excluding the interaction part)
the decision tree that uses greedy splitting and pruning, of features, to measure the complexity of a post-hoc model
their approach produces a highly sparse and accurate rule for explanation.
list with a balance between interpretability, accuracy, and Although, [10] mainly focuses on model complexity for
computation speed. Similarly, the work of [55] is also rule- post-hoc models, their work was a foundation for the ap-
based. They attempt to evaluate the quality of the rules proach by [58] for the quantification of explainability. Their
using a rule learning algorithm by: the observed coverage, approach to quantify explainability is model agnostic and
which is the number of positive examples covered by the is for a model of any notion (e.g., pre-modeling, post-hoc)
rule, which should be maximized to explain the training using proxy tasks that do not involve a human. Instead,
data well; and consistency, which is the number of negative they use known truth as a metric (e.g., the less number of
examples covered by the rule, which should be minimized features, the more explainable the model). Their proposed
to generalize well to unseen data. formula for explainability gives a score in between 0 and 1
According to [57], while the number of features and the for explainability based on the number of cognitive chunks
size of the decision tree are directly related to interpretabil- (i.e., individual pieces of information) used on the input side
ity, the optimization of the tree size or features (i.e., feature and output side, and the extent of interaction among those
selection) is costly as it requires the generation of a large cognitive chunks.
set of models and their elimination in subsequent steps.
However, reducing the tree size (i.e., reducing complexity) 5.1.2 Human Study-based Explainability Evaluation
increases error, as they could not find a way to formulate The following works deal with the application-level and
the relation in a simple functional form. More recently, [10] human-level evaluation of explainability involving human
attempts to quantify the complexity of the arbitrary machine studies.
learning model with a model agnostic measure. In that [26] investigate the suitability of different alternative
work, the author demonstrates that when the feature in- representation formats (e.g., decision tables, (binary) deci-
teraction (i.e., the correlation among features) increases, the sion trees, propositional rules, and oblique rules) for clas-
quality of representations of explainability tools degrades. sification tasks primarily focusing on the explainability of
For instance, the explainability tool ALE Plot (see Figure results rather than accuracy or precision. They discover that
5 starts to show harsh lines (i.e., zigzag lines) as feature decision tables are the best in terms of accuracy, response
interaction increases. In other words, with more interaction time, the confidence of answer, and ease of use.
comes a more combined influence in the prediction, induced [24] argue that interpretability is not an absolute con-
from different correlated subsets of features (at least two), cept; instead, it is relative to the target model, and may or
12

may not be relative to the human. Their finding suggests open challenges such as (A) a lack of formalism of the
that a model is readily interpretable to a human when it uses explanation, (B) a customized explanation for different types
no more than seven pieces of information [59]. Although, of explanation recipients (e.g., layperson, domain expert,
this might vary from task to task and person to person. another machine), (C) a way to quantify the explanation,
For instance, a domain expert might consume a lot more and (D) quantifying the level of comprehensibility with
detailed information depending on their experience. human studies. Therefore, leveraging the knowledge from
The work of [27] is a human-centered approach, focus- multiple domains, a generic framework could be useful
ing on previous work on human trust in a model from considering the mentioned challenges. As a result, mission-
psychology, social science, machine learning, and human- critical applications from different domains will be able to
computer interaction communities. In their experiment with leverage the black-box model with greater confidence and
human subjects, they vary factors (e.g., number of features, regulatory compliance.
whether the model internals are transparent or a black box)
that make a model more or less interpretable and measures 5.2.2 Towards Fair, Accountable, and Transparent AI-
how the variation impacts the prediction of human subjects. based Models
Their results suggest that participants who were shown a Responsible use of AI is crucial for avoiding risks stemming
transparent model with a small number of features were from a lack of fairness, accountability, and transparency in
more successful in simulating the model’s predictions and the model. Remediation of data, algorithmic, and societal
trusted the model’s predictions. biases is vital to promote fairness; the AI system/adopter
[25] investigate interpretability of a model based on two should be held accountable to affected parties for its deci-
of its definitions: simulatability, which is a user’s ability to sion; and finally, an AI system should be analyzable, where
predict the output of a model on a given input; and “what the degree of transparency should be comprehensible to
if” local explainability, which is a user’s ability to predict have trust in the model and its prediction for mission-
changes in prediction in response to changes in input, given critical applications. Interestingly, XAI enhances understat-
the user has the knowledge of a model’s original prediction ing directly, increasing trust as a side-effect. In addition,
for the original input. They introduce a simple metric called the explanation techniques can help in uncovering potential
runtime operation count that measures the interpretability, risks (e.g., what are possible fairness risks). So it is crucial to
that is, the number of operations (e.g., the arithmetic opera- adhere to fairness, accountability, and transparency princi-
tion for regression, the boolean operation for trees) needed ples in the design and development of explainable models.
in a user’s mind to interpret something. Their findings
suggest that interpretability decreases with an increase in 5.2.3 Human-Machine Teaming
the number of operations.
Despite some progress, there are still some open chal- To ensure the responsible use of AI, the design, devel-
lenges surrounding explainability such as an agreement of opment, and deployment of human-centered AI, that col-
what an explanation is and to whom; a formalism for the laborates with the humans in an explainable manner, is
explanation; and quantifying the human comprehensibility essential. Therefore, the explanation from the model needs
of the explanation. Other challenges include addressing to be comprehensible by the user, and there might be some
more comprehensive human studies requirements and in- supplementary questions that need to be answered for a
vestigating the effectiveness among different approaches clear explanation. So, the interaction (e.g., follow-ups after
(e.g., supervised, unsupervised, semi-supervised) for var- the initial explanation) between humans and machines is
ious application areas (e.g., natural language processing, important. The interaction is more crucial for adaptive ex-
image recognition). plainable models that provide context-aware explanations
based on user profiles such as expertise, domain knowledge,
interests, and cultural backgrounds. The social sciences and
5.2 Future Research Directions human behavioral studies have the potential to impact
The long term goal for current AI initiatives is to contribute XAI and human-centered AI research. Unfortunately, the
to the design, development, and deployment of human- Human-Computer Interaction (HCI) community is kind of
centered artificial intelligent systems, where the agents col- isolated. The combination of HCI empirical studies and
laborate with the human in an interpretable and explainable human science theories could be a compelling force for the
manner, with the intent on ensuring fairness, transparency, design of human-centered AI models as well as furthering
and accountability. To accomplish that goal, we propose a XAI research. Therefore, efforts to bring a human into the
set of research plans/directions towards achieving respon- loop, enabling the model to receive input (repeated feed-
sible or human-centered AI using XAI as a medium. back) from the provided visualization/explanations to the
human, and improving itself with the repeated interactions,
5.2.1 A Generic Framework to Formalize Explainable Arti- has the potential to further human-centered AI. Besides
ficial Intelligence adherence to fairness, accountability, and transparency, the
The work in [50] and [51], demonstrates a way to collect effort will also help in developing models that adhere to our
and leverage domain knowledge from two different do- ethics, judgment, and social norms.
mains, finance and cybersecurity, and further infused that
knowledge into black-box models for better explainability. 5.2.4 Collective Intelligence from Multiple Disciplines
In both of these works, competitive performance with en- From the explanation perspective, there is plenty of research
hanced explainability is achieved. However, there are some in philosophy, psychology, and cognitive science on how
13

From the explanation perspective, there is plenty of research in philosophy, psychology, and cognitive science on how people generate, select, evaluate, and represent explanations, and on the cognitive biases and social expectations associated with the explanation process. In addition, from the interaction perspective, human-computer teaming that combines social science, the HCI community, and social-behavioral studies could lead to further breakthroughs. Furthermore, from the application perspective, the collectively learned knowledge from different domains (e.g., healthcare, finance, medicine, security, defense) can contribute to furthering human-centric AI and XAI research. Thus, growing multidisciplinary research is needed to promote human-centric AI as well as XAI in mission-critical applications across different domains.

6 CONCLUSION
We demonstrate and analyze popular XAI methods on a common test case to explain their competitive advantages and to elucidate the open challenges and further research directions. Most of the available work on XAI adopts the post-hoc notion of explainability. However, post-hoc explainability is not purely transparent and can be misleading, as it explains a decision after it has been made. The explanation algorithm can be optimized to placate subjective demands, and because it merely emulates the actual prediction, the resulting explanation can be misleading even when it seems plausible [60], [61]. Thus, many suggest not explaining black-box models with post-hoc methods at all, and instead adhering to simple, intrinsically explainable models for high-stakes decisions [17]. Furthermore, from the literature review, we find that explainability in the pre-modeling stage is a viable option for avoiding transparency-related issues, albeit an under-explored one. In addition, knowledge infusion techniques have great potential to enhance explainability, although they, too, remain under-explored. Therefore, more focus is needed on the explainability of "black box" models using domain knowledge. At the same time, we need to focus on the evaluation and quantification of explainability using both human and non-human studies. We believe this review provides good insight into the current progress on XAI approaches, the evaluation and quantification of explainability, open challenges, and a path towards responsible or human-centered AI using XAI as a medium.

ACKNOWLEDGMENTS
Our sincere thanks to Christoph Molnar for his open e-book on Interpretable Machine Learning and his contribution to the open-source R package "iml". Both were very useful in conducting this survey.

REFERENCES
[1] G. Ras, M. van Gerven, and P. Haselager, "Explanation methods in deep learning: Users, values, concerns and challenges," in Explainable and Interpretable Models in Computer Vision and Machine Learning. Springer, 2018, pp. 19–36.
[2] B. Goodman and S. Flaxman, "EU regulations on algorithmic decision-making and a "right to explanation"," in ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY. http://arxiv.org/abs/1606.08813v1, 2016.
[3] B. Wyden, "Algorithmic accountability," https://www.wyden.senate.gov/imo/media/doc/Algorithmic%20Accountability%20Act%20of%202019%20Bill%20Text.pdf, (Accessed on 11/21/2019).
[4] M. T. Esper, "AI ethical principles," https://www.defense.gov/Newsroom/Releases/Release/Article/2091996/dod-adopts-ethical-principles-for-artificial-intelligence/, February 2020, (Accessed on 03/07/2020).
[5] Z. Wang, Y. Lai, Z. Liu, and J. Liu, "Explaining the attributes of a deep learning based intrusion detection system for industrial control networks," Sensors, vol. 20, no. 14, p. 3817, 2020.
[6] W. Samek, T. Wiegand, and K.-R. Müller, "Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models," arXiv preprint arXiv:1708.08296, 2017.
[7] A. Fernandez, F. Herrera, O. Cordon, M. J. del Jesus, and F. Marcelloni, "Evolutionary fuzzy systems for explainable artificial intelligence: why, when, what for, and where to?" IEEE Computational Intelligence Magazine, vol. 14, no. 1, pp. 69–81, 2019.
[8] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[9] S. T. Mueller, R. R. Hoffman, W. Clancey, A. Emrey, and G. Klein, "Explanation in human-AI systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI," arXiv preprint arXiv:1902.01876, 2019.
[10] C. Molnar, G. Casalicchio, and B. Bischl, "Quantifying model complexity via functional decomposition for better post-hoc interpretability," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2019, pp. 193–204.
[11] M. Staniak and P. Biecek, "Explanations of model predictions with live and breakDown packages," arXiv preprint arXiv:1804.01955, 2018.
[12] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, "Explaining explanations: An overview of interpretability of machine learning," in 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2018, pp. 80–89.
[13] D. Collaris, L. M. Vink, and J. J. van Wijk, "Instance-level explanations for fraud detection: A case study," arXiv preprint arXiv:1806.07129, 2018.
[14] F. K. Došilović, M. Brčić, and N. Hlupić, "Explainable artificial intelligence: A survey," in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, 2018, pp. 0210–0215.
[15] E. Tjoa and C. Guan, "A survey on explainable artificial intelligence (XAI): towards medical XAI," arXiv preprint arXiv:1907.07374, 2019.
[16] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.
[17] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019.
[18] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins et al., "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion, vol. 58, pp. 82–115, 2020.
[19] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artificial Intelligence, 2018.
[20] Q.-S. Zhang and S.-C. Zhu, "Visual interpretability for deep learning: a survey," Frontiers of Information Technology & Electronic Engineering, vol. 19, no. 1, pp. 27–39, 2018.
[21] B. Chandrasekaran, M. C. Tanner, and J. R. Josephson, "Explaining control strategies in problem solving," IEEE Intelligent Systems, no. 1, pp. 9–15, 1989.
[22] W. R. Swartout and J. D. Moore, "Explanation in second generation expert systems," in Second Generation Expert Systems. Springer, 1993, pp. 543–585.
[23] W. R. Swartout, "Rule-based expert systems: The MYCIN experiments of the Stanford Heuristic Programming Project: B. G. Buchanan and E. H. Shortliffe (Addison-Wesley, Reading, MA, 1984); 702 pages," 1985.
[24] A. Dhurandhar, V. Iyengar, R. Luss, and K. Shanmugam, "TIP: Typifying the interpretability of procedures," arXiv preprint arXiv:1706.02952, 2017.
[25] S. A. Friedler, C. D. Roy, C. Scheidegger, and D. Slack, "Assessing the local interpretability of machine learning models," arXiv preprint arXiv:1902.03501, 2019.
[26] J. Huysmans, K. Dejaeger, C. Mues, J. Vanthienen, and B. Baesens, "An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models," Decision Support Systems, vol. 51, no. 1, pp. 141–154, 2011.
[27] F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Vaughan, and H. Wallach, "Manipulating and measuring model interpretability," arXiv preprint arXiv:1802.07810, 2018.
[28] Q. Zhou, F. Liao, C. Mou, and P. Wang, "Measuring interpretability for different types of machine learning models," in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2018, pp. 295–308.
[29] "Single Family Loan-Level Dataset - Freddie Mac." [Online]. Available: http://www.freddiemac.com/research/datasets/sf_loanlevel_dataset.page
[30] "iml - CRAN package." [Online]. Available: https://cran.r-project.org/web/packages/iml/index.html
[31] C. Molnar et al., "Interpretable machine learning: A guide for making black box models explainable," E-book at https://christophm.github.io/interpretable-ml-book/, 2018.
[32] J. H. Friedman, B. E. Popescu et al., "Predictive learning via rule ensembles," The Annals of Applied Statistics, vol. 2, no. 3, pp. 916–954, 2008.
[33] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189–1232, 2001.
[34] A. Goldstein, A. Kapelner, J. Bleich, and M. A. Kapelner, "Package 'ICEbox'," 2017.
[35] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[36] A. Fisher, C. Rudin, and F. Dominici, "Model class reliance: Variable importance measures for any machine learning model class, from the "Rashomon" perspective," arXiv preprint arXiv:1801.01489, 2018.
[37] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you?: Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144.
[38] L. S. Shapley, "A value for n-person games," Contributions to the Theory of Games, vol. 2, no. 28, pp. 307–317, 1953.
[39] S. Lundberg and S.-I. Lee, "An unexpected unity among methods for interpreting model predictions," arXiv preprint arXiv:1611.07478, 2016.
[40] B. Boehmke and B. Greenwell, "Chapter 16 Interpretable machine learning - Hands-on machine learning with R," https://bradleyboehmke.github.io/HOML/iml.html, (Accessed on 11/28/2019).
[41] R. Moraffah, M. Karami, R. Guo, A. Raglin, and H. Liu, "Causal interpretability for machine learning - problems, methods and evaluation," ACM SIGKDD Explorations Newsletter, vol. 22, no. 1, pp. 18–33, 2020.
[42] A. Hartl, M. Bachl, J. Fabini, and T. Zseby, "Explainability and adversarial robustness for RNNs," arXiv preprint arXiv:1912.09855, 2019.
[43] D. L. Marino, C. S. Wickramasinghe, and M. Manic, "An adversarial approach for explainable AI in intrusion detection systems," in IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2018, pp. 3237–3243.
[44] B. Kim, R. Khanna, and O. O. Koyejo, "Examples are not enough, learn to criticize! Criticism for interpretability," in Advances in Neural Information Processing Systems, 2016, pp. 2280–2288.
[45] J. Chen, L. Song, M. J. Wainwright, and M. I. Jordan, "Learning to explain: An information-theoretic perspective on model interpretation," arXiv preprint arXiv:1802.07814, 2018.
[46] A. Jung and P. H. J. Nardelli, "An information-theoretic approach to personalized explainable machine learning," IEEE Signal Processing Letters, 2020.
[47] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, "Towards better understanding of gradient-based attribution methods for deep neural networks," arXiv preprint arXiv:1711.06104, 2017.
[48] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres, "Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)," arXiv preprint arXiv:1711.11279, 2017.
[49] U. Kursuncu, M. Gaur, and A. Sheth, "Knowledge infused learning (K-IL): Towards deep incorporation of knowledge in deep learning," arXiv preprint arXiv:1912.00512, 2019.
[50] S. R. Islam, W. Eberle, S. Bundy, and S. K. Ghafoor, "Infusing domain knowledge in AI-based "black box" models for better explainability with application in bankruptcy prediction," in ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anomaly Detection in Finance Workshop, 2019.
[51] S. R. Islam, W. Eberle, S. K. Ghafoor, A. Siraj, and M. Rogers, "Domain knowledge aided explainable artificial intelligence for intrusion detection and response," arXiv preprint arXiv:1911.09853, 2019.
[52] E. Angelini, G. di Tollo, and A. Roli, "A neural network approach for credit risk evaluation," The Quarterly Review of Economics and Finance, vol. 48, no. 4, pp. 733–755, 2008.
[53] J. Segal, "Five Cs of credit." [Online]. Available: https://www.investopedia.com/terms/f/five-c-credit.asp
[54] M. Bishop, Introduction to Computer Security. Pearson Education India, 2006.
[55] J. Fürnkranz, D. Gamberger, and N. Lavrač, "Rule learning in a nutshell," in Foundations of Rule Learning. Springer, 2012, pp. 19–55.
[56] H. Yang, C. Rudin, and M. Seltzer, "Scalable Bayesian rule lists," in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 3921–3930.
[57] S. Rüping et al., "Learning interpretable models," 2006.
[58] S. R. Islam, W. Eberle, and S. K. Ghafoor, "Towards quantification of explainability in explainable artificial intelligence methods," arXiv preprint arXiv:1911.10104, 2019.
[59] G. A. Miller, "The magical number seven, plus or minus two: Some limits on our capacity for processing information," Psychological Review, vol. 63, no. 2, p. 81, 1956.
[60] Z. C. Lipton, "The mythos of model interpretability," arXiv preprint arXiv:1606.03490, 2016.
[61] P. Gandhi, "Explainable artificial intelligence." [Online]. Available: https://www.kdnuggets.com/2019/01/explainable-ai.html