Explainable Artificial Intelligence Approaches
Abstract—The lack of explainability of a decision from an Artificial Intelligence (AI) based "black box" system/model, despite its
superiority in many real-world applications, is a key stumbling block for adopting AI in many high-stakes applications across different
domains and industries. While many popular Explainable Artificial Intelligence (XAI) methods or approaches are available to facilitate a
human-friendly explanation of the decision, each has its own merits and demerits, with a plethora of open challenges. We demonstrate
popular XAI methods with a mutual case study/task (i.e., credit default prediction), analyze their competitive advantages from multiple
perspectives (e.g., local, global), provide meaningful insight on quantifying explainability, and recommend paths towards responsible or
human-centered AI using XAI as a medium. Practitioners can use this work as a catalog to understand, compare, and correlate the
competitive advantages of popular XAI methods. In addition, this survey elicits future research directions towards responsible or
human-centric AI systems, which is crucial for adopting AI in high-stakes applications.
Index Terms—Explainable Artificial Intelligence, Explainability Quantification, Human-centered Artificial Intelligence, Interpretability.
1 INTRODUCTION
The interest in XAI research has stemmed from the recent advancements in AI, its application to a wide range of areas, the concerns over unethical use, lack of transparency, and undesired biases in the models. In addition, recent laws by different governments are necessitating more research in XAI. According to [6] and [7], XAI encompasses Machine Learning (ML) or AI systems for demystifying "black box" model internals (e.g., what the models have learned) and/or for explaining individual predictions.

In 2019, Mueller et al. presented a comprehensive review of the approaches taken by a number of types of "explanation systems" and characterized those into three generations: (1) first-generation systems, for instance, expert systems from the early 70s; (2) second-generation systems, for instance, intelligent tutoring systems; and (3) third-generation systems, the tools and techniques from the recent renaissance starting in 2015 [9]. The first-generation systems attempt to clearly express the internal working process of the system by embedding expert knowledge in rules, often elicited directly from experts (e.g., via transforming rules into natural language expressions). The second-generation systems can be regarded as human-computer systems designed around human knowledge and reasoning capacities to provide cognitive support, for instance, by arranging the interface in such a way that it complements the knowledge the user is lacking. Similar to the first-generation systems, the third-generation systems also attempt to clarify the inner workings of the systems, but this time the systems are mostly "black box" (e.g., deep nets, ensemble approaches).

In addition, nowadays, researchers are using advanced computer technologies in data visualization, animation, and video that have a strong potential to drive XAI research further. Many new ideas have been proposed for generating explainable decisions from the need for primarily accountable, fair, and trustworthy systems and decisions.

Some of the popular works/tools on XAI are LIME, the DeepVis Toolbox, TreeInterpreter, Keras-vis, Microsoft InterpretML, MindsDB, SHAP, Tensorboard What-If, Tensorflow's Lucid, and Tensorflow's Cleverhans. However, a few of these works/tools are model specific. For instance, DeepVis, Keras-vis, and Lucid are for a neural network's explainability, and TreeInterpreter is for a tree-based model's explainability. At a high level, each of the proposed approaches relies on similar concepts, such as feature importance, feature interactions, Shapley values, partial dependence, surrogate models, counterfactuals, adversarial examples, prototypes, and knowledge infusion. However, despite some visible progress in XAI methods, the quantification or evaluation of explainability is under-focused, in particular when it comes to human study-based evaluations.

There has been some previous work [10] that mentions three notions for the quantification of explainability. Two out of the three notions involve experimental studies with humans (e.g., a domain expert or a layperson) that mainly investigate whether a human can predict the outcome of the model [24], [25], [26], [27], [28]. The third notion (proxy tasks) does not involve a human and instead uses known truths as a metric (e.g., the smaller the depth of the decision tree, the more explainable the model).

Some notable reviews on XAI are listed in Table 1. However, while these works provide analysis from one or more of the mentioned perspectives, a comprehensive review considering all of the mentioned important perspectives, using a mutual test case, is still missing. Therefore, we attempt to provide an overview using a demonstration of a mutual test case or task, and then analyze the various approaches from multiple perspectives, with some future directions of research towards responsible or human-centered AI.

In this paper, we (1) demonstrate popular methods/approaches towards XAI with a mutual task (i.e., credit default prediction) and explain the working mechanisms in layman's terms, (2) compare the pros, cons, and competitive advantages of each approach with their associated challenges, and analyze those from multiple perspectives (e.g., global vs. local, post-hoc vs. ante-hoc, and inherent vs. emulated/approximated explainability), (3) provide meaningful insight on quantifying explainability, and (4) recommend a path towards responsible or human-centered AI using XAI as a medium. Our survey is the only one among the recent ones (see Table 1) that includes a mutual test case with useful insights on popular XAI methods (see Table 4).

TABLE 1
Comparison with other surveys

Survey Reference              | Mutual test case
Adadi et al., 2018 [8]        | ×
Mueller et al., 2019 [9]      | ×
Samek et al., 2017 [6]        | ×
Molnar et al., 2019 [10]      | ×
Staniak et al., 2018 [11]     | ×
Gilpin et al., 2018 [12]      | ×
Collaris et al., 2018 [13]    | ×
Ras et al., 2018 [1]          | ×
Došilović et al., 2018 [14]   | ×
Tjoa et al., 2019 [15]        | ×
Doshi-Velez et al., 2017 [16] | ×
Rudin et al., 2019 [17]       | ×
Arrieta et al., 2020 [18]     | ×
Miller et al., 2018 [19]      | ×
Zhang et al., 2018 [20]       | ×
This Survey                   | ✓

We start with a background of related works (Section 2), followed by a description of the test case in Section 3, and then a review of XAI methods in Section 4. We conclude with an overview of quantifying explainability and a discussion addressing open questions and future research directions towards responsible or human-centered AI in Section 5.
We use a mutual test case to demonstrate the XAI approaches: we predict whether a customer is going to default on a mortgage payment (i.e., is unable to pay the monthly payment) in the near future or not, and explain the decision using different XAI methods in a human-friendly way. We use the popular Freddie Mac [29] dataset for the experiments. Table 2 lists some important features and their descriptions. The descriptions of the features are taken from the dataset's [29] user guide.

We use the well-known programming language R's package "iml" [30] for producing the results for the XAI methods described in this review.
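To make that setup concrete, the following minimal sketch (in R) fits a classifier on a Freddie Mac-style extract and wraps it in an iml Predictor object, which the later sketches in this review reuse. The file name freddie_mac_sample.csv, the preprocessing, and the choice of a random forest are illustrative assumptions on our part, not the exact experimental pipeline behind the figures.

    library(randomForest)
    library(iml)

    # Hypothetical extract of the Freddie Mac single-family loan-level data;
    # feature names follow Table 2, with 'defaulted' as the target.
    credit <- read.csv("freddie_mac_sample.csv")
    credit$defaulted <- as.factor(credit$defaulted)   # 1 = default, 0 = non-default

    # Any classifier would do; a random forest stands in for a "black box" model.
    rf <- randomForest(defaulted ~ ., data = credit, ntree = 500)

    # iml wraps the model and data so that the later methods stay model agnostic.
    X <- credit[, setdiff(names(credit), "defaulted")]
    predictor <- Predictor$new(rf, data = X, y = credit$defaulted, type = "prob")

Because iml only needs a data frame and a predict function, the same wrapper works unchanged if the random forest is swapped for any other model.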
4 EXPLAINABLE ARTIFICIAL INTELLIGENCE METHODS

This section summarizes different explainability methods with their pros, cons, challenges, and competitive advantages, primarily based on two recent comprehensive surveys: [31] and [16]. We then enhance the previous surveys with a multi-perspective analysis, recent research progress, and future research directions. [16] broadly categorizes methods for explanations into three kinds: Intrinsically Interpretable Methods, Model Agnostic Methods, and Example-Based Explanations.

4.1 Intrinsically Interpretable Methods

The convenient way to achieve explainable results is to stick with intrinsically interpretable models such as Linear Regression, Logistic Regression, and Decision Trees by avoiding the use of "black box" models. However, this natural explainability usually comes with a cost in performance.

In a Linear Regression, the predicted target consists of the weighted sum of the input features. So the weights or coefficients of the linear equation can be used as a medium for explaining a prediction when the number of features is small.

    y = b_0 + b_1 * x_1 + ... + b_n * x_n + ε     (1)

In Formula 1, y is the target (e.g., the chance of credit default), b_0 is a constant value known as the intercept (e.g., 0.33), b_i is the learned weight or coefficient (e.g., 0.33) for the corresponding feature x_i (e.g., credit score), and ε is a constant error term (e.g., 0.0001). Linear Regression comes with an interpretable linear relationship between the features and the target. However, in cases where there are multiple correlated features, the distinct feature influence becomes indeterminable, as the individual influences on the prediction are no longer additive to the overall prediction.

Logistic Regression is an extension of Linear Regression to classification problems. It models the probabilities for classification tasks. The interpretation of Logistic Regression is different from Linear Regression as it gives a probability between 0 and 1, where the weight might not exactly represent the linear relationship with the predicted probability. However, the weight provides an indication of the direction of influence (negative or positive) and a factor of influence between classes, although it is not additive to the overall prediction.

Decision Tree-based models split the data multiple times based on a cutoff threshold at each node until a leaf node is reached. Unlike Logistic and Linear Regression, a Decision Tree works even when the relationship between input and output is non-linear, and even when the features interact with one another (i.e., there is correlation among features). In a Decision Tree, a path from the root node (i.e., the starting node, e.g., credit score in Figure 1) to a leaf node (e.g., default) tells how the decision (the leaf node) took place. Usually, the nodes in the upper levels of the tree have higher importance than lower-level nodes. Also, the fewer levels (i.e., the smaller the height) a tree has, the higher the level of explainability the tree possesses. In addition, the cutoff point of a node in a Decision Tree provides counterfactual information; for instance, increasing the value of a feature to the cutoff point will reverse the decision/prediction. In Figure 1, if the credit score is greater than the cutoff point 748, then the customer is predicted as non-default. Also, tree-based explanations are contrastive, i.e., a "what if" analysis provides the relevant alternative path to reach a leaf node. According to the tree in Figure 1, there are two separate paths (credit score → delinquency → non-default; and credit score → non-default) that lead to a non-default classification.

Fig. 1. Decision Trees

However, tree-based explanations cannot express a linear relationship between the input features and the output. They also lack smoothness; slight changes in the input can have a big impact on the predicted output. Also, there can be multiple different trees for the same problem. Usually, the more nodes or the greater the depth of the tree, the more challenging it is to interpret the tree.

Decision Rules (simple IF-THEN-ELSE conditions) are also an inherent explanation model. For instance, "IF the credit score is less than or equal to 748 AND the customer is delinquent on payment for more than zero days (condition), THEN the customer will default on payment (prediction)". Although IF-THEN rules are straightforward to interpret, they are mostly limited to classification problems (i.e., they do not support regression problems) and are inadequate for describing linear relationships. In addition, the RuleFit algorithm [32] has an inherent interpretation to some extent as it learns sparse linear models that can detect interaction effects in the form of decision rules. Decision rules consist of the combination of split decisions from each of the decision paths. However, besides the original features, RuleFit also learns some new features to capture the interaction effects of the original features.
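As a concrete illustration of these intrinsically interpretable baselines, the sketch below fits a logistic regression and a shallow decision tree on the assumed credit data frame from the setup sketch above. The selected features follow Table 2, and the learned cutoffs will of course differ from the 748 shown in Figure 1.

    library(rpart)

    # Logistic regression: the sign of a coefficient gives the direction of
    # influence, and exp(coefficient) is the multiplicative effect on the
    # odds of default.
    logit <- glm(defaulted ~ creditScore + currentLoanDelinquencyStatus +
                   currentActualUPB,
                 data = credit, family = binomial)
    summary(logit)$coefficients
    exp(coef(logit))              # odds ratios, easier to read than raw weights

    # Shallow decision tree: every root-to-leaf path reads as a rule, and the
    # printed cutoff points double as counterfactual thresholds.
    tree <- rpart(defaulted ~ ., data = credit, method = "class",
                  control = rpart.control(maxdepth = 3))
    print(tree)                   # shows splits, cutoff points, and leaf classes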
TABLE 2
Dataset description

Feature                        | Description
creditScore                    | A number between 300 and 850 that indicates the creditworthiness of the borrower.
originalUPB                    | Unpaid principal balance on the note date.
originalInterestRate           | Original interest rate as indicated on the mortgage note.
currentLoanDelinquencyStatus   | Indicates the number of days the borrower is delinquent.
numberOfBorrower               | Number of borrowers who are obligated to repay the loan.
currentInterestRate            | Active interest rate on the note.
originalCombinedLoanToValue    | Ratio of all mortgage loans to the appraised price of the mortgaged property on the note date.
currentActualUPB               | Unpaid principal balance as of the latest month of payment.
defaulted                      | Whether the customer defaulted on payment (1) or not (0).
Usually, interpretability degrades with an increasing number of features.

Other interpretable models include extensions of linear models such as Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs); they help to deal with some of the assumptions of linear models (e.g., that the target outcome y and the given features follow a Gaussian distribution, and that there is no interaction among features). However, these extensions make models more complex (i.e., they add interactions) as well as less interpretable. In addition, a Naïve Bayes classifier, based on Bayes' theorem, where the probability of a class for each of the features is calculated independently (assuming strong feature independence), and K-Nearest Neighbors, which uses the nearest neighbors of a data point for prediction (regression or classification), also fall under intrinsically interpretable models.

4.2 Model Agnostic Methods

A Partial Dependence (PD) plot shows the average effect of one or two features on the predicted outcome and assumes independence among the participating features. In the real world, this is unusual. Furthermore, there is a practical limit of only two features that a PD plot can clearly explain at a time. Also, it is a global method, as it plots the average effect (over all instances) of a feature or features on the prediction, and not the effect of all features on a specific instance. The PD plot in Figure 2 shows the effect of credit score on the prediction. The individual bar lines along the X axis represent the frequency of samples in different ranges of credit scores.

Fig. 3. Individual Conditional Expectation (ICE)
Fig. 5. Accumulated Local Effects (ALE) Plot
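The PD, ICE, and ALE curves referenced around Figures 2, 3, and 5 can all be produced through iml's FeatureEffect class on the predictor object sketched earlier. The snippet below is a minimal illustration of that usage, not the exact plotting code behind the figures.

    # Effect plots for a single feature; "creditScore" follows Table 2.
    pdp <- FeatureEffect$new(predictor, feature = "creditScore", method = "pdp")
    ice <- FeatureEffect$new(predictor, feature = "creditScore", method = "pdp+ice")
    ale <- FeatureEffect$new(predictor, feature = "creditScore", method = "ale")

    plot(pdp)   # average effect of credit score on the predicted default probability
    plot(ice)   # one curve per instance; spread across curves hints at interactions
    plot(ale)   # accumulated local effects; behaves better when features are correlated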
Permutation-based feature importance measures how much the prediction error increases after shuffling the values of a feature to break the true relationship between the feature and the true outcome. After shuffling the values of the feature, if the error increases, then the feature is important. [35] introduced permutation-based feature importance for Random Forests; later, [36] extended the work to a model-agnostic version. Feature importance provides a compressed and global insight into the ML model's behavior. For example, Figure 7 shows the importance of each participating feature: current actual UPB possesses the highest feature importance, and credit score possesses the lowest feature importance. Although feature importance takes into account both the main feature effect and the interaction effect, this can also be a disadvantage, as the feature interaction is included in the importance of correlated features. We can see that the feature current actual UPB possesses the highest feature importance (Figure 7); at the same time, it also possesses the highest interaction strength (Figure 6). As a result, in the presence of interaction among features, the feature importances do not add up to the total drop in performance. Besides, it is unclear whether the test set or the training set should be used for feature importance, as it shows variance from run to run over the shuffled dataset. It is necessary to mention that feature importance also falls under the global methods.

A global surrogate is an interpretable model trained to approximate the predictions of a "black box" model; in principle, the "black box" model could be avoided given the surrogate model demonstrates a comparable performance. Although a surrogate model comes with interpretation and flexibility (e.g., model agnosticism), diverse explanations for the same "black box", such as multiple possible decision trees with different structures, are a drawback. Besides, some would argue that this is only an illusion of interpretability.
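The following is a minimal sketch of the permutation importance, interaction-strength, and global surrogate computations discussed above, again run against the assumed predictor object from the setup sketch; the loss function and surrogate depth are illustrative choices.

    imp <- FeatureImp$new(predictor, loss = "ce")    # permutation importance (cross-entropy loss)
    plot(imp)                                        # counterpart of the ranking in Figure 7

    inter <- Interaction$new(predictor)              # interaction strength per feature (cf. Figure 6)
    plot(inter)

    surrogate <- TreeSurrogate$new(predictor, maxdepth = 2)  # shallow tree mimicking the black box
    plot(surrogate)                                  # inspect how faithful the surrogate is before trusting it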
4.3 Example-Based Explanations

Adversarial examples are inputs with small intentional perturbations crafted to make a model produce a false prediction. However, adversarial examples could help to discover hidden vulnerabilities as well as to improve the model. For instance, an attacker can intentionally design adversarial examples to cause the AI system to make a mistake (i.e., fooling the machine), which poses greater threats to cyber-security and autonomous vehicles. As an example, the credit default prediction system can be fooled for customer 5 just by increasing the credit score by 1 (see Table 3), leading to a reversed prediction. Hartl et al. [42] emphasize understanding the implications of adversarial examples for the explainability and robustness of recurrent neural networks.

4.3.3 Prototypes

Prototypes consist of a selected set of instances that represent the data very well. Conversely, the set of instances that do not represent the data well are called criticisms [44]. Determining the optimal number of prototypes and criticisms is challenging. For example, customers 1 and 10 from Table 3 can be treated as prototypes, as they are strong representatives of the corresponding target. On the other hand, customers 5 and 6 (from Table 3) can be treated as criticisms, as the distance between the data points is minimal, and they might be classified under either class from run to run of the same or different models.
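Returning to the adversarial example above, the claim that a one-point increase in credit score flips the prediction can be probed directly against the fitted model from the setup sketch; the row index stands in for the hypothetical customer 5 of Table 3, which is not reproduced here.

    customer  <- X[5, ]                      # stand-in for "customer 5" of Table 3
    perturbed <- customer
    perturbed$creditScore <- perturbed$creditScore + 1

    predict(rf, newdata = customer,  type = "response")  # original predicted class
    predict(rf, newdata = perturbed, type = "response")  # check whether the class flips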
Table 4 compares the explainability methods from a set of key perspectives: (A) whether the explanation is an approximation or the actual reasoning of the model, (B) whether the explainability is inherent, (C) whether the method is ante-hoc, incorporating explainability into a model from the beginning, or post-hoc, where explainability is incorporated after the regular training of the actual model (i.e., at testing time), (D) whether the method is model agnostic (i.e., works for any ML model) or specific to an algorithm, and (E) whether the method is local, providing instance-level explanations, or global, providing overall model behavior.

Our analysis says there is a lack of an explainability method (i.e., a gap in the literature) which is, at the same time, actual and direct (i.e., does not create an illusion of explainability by approximating the model), model agnostic, and local, such that it utilizes the full potential of the explainability method in different applications. There are some recent works that bring external knowledge and infuse it into the model for better interpretation. These XAI methods have the potential to fill the gap to some extent by incorporating domain knowledge into the model in a model-agnostic and transparent way (i.e., not by illusion).

4.4 Other Techniques

Chen et al. [45] introduce an instance-wise feature selection as a methodology for model interpretation, where the model learns a function to extract a subset of the most informative features for a particular instance. The feature selector attempts to maximize the mutual information between the selected features and the response variables. However, their approach is mostly limited to post-hoc approaches.

In more recent work, [46] study explainable ML using information theory, where they quantify the effect of an explanation by the conditional mutual information between the explanation and the prediction, considering the user's background. Their approach provides a personalized explanation based on the background of the recipient, for instance, a different explanation for those who know linear algebra and those who do not. However, this work is yet to be considered a comprehensive approach that covers a variety of users and their explanation needs. To understand the flow of information in a Deep Neural Network (DNN), [47] analyzed different gradient-based attribution methods that assign an attribution value (i.e., contribution or relevance) to each input feature (i.e., neuron) of a network for each output neuron. They use a heatmap for better visualization, where one color represents features that contribute positively to the activation of the target output, and another color represents features that suppress its effect.

A survey on the visual representation of Convolutional Neural Networks (CNNs), by [20], categorizes works based on (a) visualization of CNN representations in intermediate network layers, (b) diagnosis of CNN representations for the feature space of different feature categories or potential representation flaws, (c) disentanglement of "the mixture of patterns" encoded in each filter of CNNs, (d) interpretable CNNs, and (e) semantic disentanglement of CNN representations.

In an industrial control system, an alarm from the intrusion/anomaly detection system has a very limited role unless the alarm can be explained with more information. [5] design a layer-wise relevance propagation method for DNNs to map the abnormalities between the calculation process and the features. This process helps to compare normal samples with abnormal samples for a better understanding with detailed information.

4.5 Knowledge Infusion Techniques

[48] propose a concept attribution-based approach (i.e., sensitivity to a concept) that provides an interpretation of a neural network's internal state in terms of human-friendly concepts. Their approach, Testing with Concept Activation Vectors (TCAV), quantifies the prediction's sensitivity to a high-dimensional concept. For example, given a user-defined set of examples that defines the concept 'striped', TCAV can quantify the influence of 'striped' on the prediction of 'zebra' as a single number. However, their work is only for image classification and falls under the post-modeling notion (i.e., post-hoc) of explanation.

[49] propose a knowledge-infused learning that measures the information loss in latent features learned by neural networks through Knowledge Graphs (KGs). This external knowledge incorporation (via KGs) aids in supervising the learning of features for the model. Although much work remains, they believe that KGs will play a crucial role in developing explainable AI systems.

[50] and [51] infuse popular domain principles into the model and represent the output in terms of the domain principle for explainable decisions. In [50], for a bankruptcy prediction problem, they use the 5 C's of credit as the domain principle, which is commonly used to analyze key factors: character (reputation of the borrower/firm), capital (leverage), capacity (volatility of the borrower's earnings), collateral (pledged asset), and cycle (macroeconomic conditions) [52], [53]. In [51], for an intrusion detection and response problem, they incorporate the CIA principles into the model: C stands for confidentiality (concealment of information or resources), I stands for integrity (trustworthiness of data or resources), and A stands for availability (the ability to use the information or resource desired) [54]. In both cases, the infusion of domain knowledge leads to better explainability of the prediction with negligible compromises in performance. It also comes with better execution time and a more generalized model that works better on unknown samples.

Although these works [50], [51] come with a unique combination of merits, such as model agnosticism, the capability of both local and global explanation, and authenticity of explanation (simulation or emulation free), they are still not fully off-the-shelf systems due to some domain-specific configuration requirements. Much work still remains and needs further attention.

5 QUANTIFYING EXPLAINABILITY AND FUTURE RESEARCH DIRECTIONS

5.1 Quantifying Explainability

The quantification or evaluation of explainability is an open challenge. There are two primary directions of research towards the evaluation of the explainability of an AI/ML model: (1) model complexity-based and (2) human study-based.
TABLE 4
Comparison of different explainability methods from a set of key perspectives (approximation or actual; inherent or not; post-hoc or ante-hoc;
model-agnostic or model specific; and global or local)
5.1.1 Model Complexity-based Explainability Evaluation

In the literature, model complexity and (lack of) model interpretability are often treated as the same [10]. For instance, in [55], [56], model size is often used as a measure of interpretability (e.g., the number of decision rules, the depth of the tree, the number of non-zero coefficients).

[56] propose a scalable Bayesian Rule List (i.e., a probabilistic rule list) consisting of a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree. Unlike decision trees, which use greedy splitting and pruning, their approach produces a highly sparse and accurate rule list with a balance between interpretability, accuracy, and computation speed. Similarly, the work of [55] is also rule-based. They attempt to evaluate the quality of the rules produced by a rule learning algorithm using the observed coverage, which is the number of positive examples covered by the rule and should be maximized to explain the training data well, and the consistency, which is the number of negative examples covered by the rule and should be minimized to generalize well to unseen data.

According to [57], while the number of features and the size of the decision tree are directly related to interpretability, the optimization of the tree size or the features (i.e., feature selection) is costly, as it requires the generation of a large set of models and their elimination in subsequent steps. However, reducing the tree size (i.e., reducing complexity) increases the error, as they could not find a way to formulate the relation in a simple functional form.

More recently, [10] attempts to quantify the complexity of an arbitrary machine learning model with a model-agnostic measure. In that work, the author demonstrates that when the feature interaction (i.e., the correlation among features) increases, the quality of the representations of explainability tools degrades. For instance, the explainability tool ALE Plot (see Figure 5) starts to show harsh lines (i.e., zigzag lines) as the feature interaction increases. In other words, with more interaction comes a more combined influence on the prediction, induced from different correlated subsets of features (at least two), which ultimately makes it hard to understand the causal relationship between input and output, compared to an individual feature's influence on the prediction. In fact, from our study of different explainability tools (e.g., LIME, SHAP, PDP), we have found that the correlation among features is a key stumbling block to representing feature contributions in a model-agnostic way. Keeping the issue of feature interactions in mind, [10] propose a technique that uses three measures, the number of features, the interaction strength among features, and the main effect (excluding the interaction part) of the features, to measure the complexity of a post-hoc model for explanation.

Although [10] mainly focuses on model complexity for post-hoc models, their work was a foundation for the approach by [58] for the quantification of explainability. Their approach to quantifying explainability is model agnostic and applies to a model of any notion (e.g., pre-modeling, post-hoc) using proxy tasks that do not involve a human. Instead, they use known truths as a metric (e.g., the fewer the features, the more explainable the model). Their proposed formula for explainability gives a score between 0 and 1 based on the number of cognitive chunks (i.e., individual pieces of information) used on the input side and the output side, and the extent of interaction among those cognitive chunks.

5.1.2 Human Study-based Explainability Evaluation

The following works deal with the application-level and human-level evaluation of explainability involving human studies.

[26] investigate the suitability of different alternative representation formats (e.g., decision tables, (binary) decision trees, propositional rules, and oblique rules) for classification tasks, primarily focusing on the explainability of the results rather than accuracy or precision. They discover that decision tables are the best in terms of accuracy, response time, confidence in the answer, and ease of use.

[24] argue that interpretability is not an absolute concept; instead, it is relative to the target model, and may or
may not be relative to the human. Their finding suggests that a model is readily interpretable to a human when it uses no more than seven pieces of information [59], although this might vary from task to task and person to person. For instance, a domain expert might consume a lot more detailed information depending on their experience.

The work of [27] is a human-centered approach, building on previous work on human trust in a model from the psychology, social science, machine learning, and human-computer interaction communities. In their experiment with human subjects, they vary factors (e.g., the number of features, whether the model internals are transparent or a black box) that make a model more or less interpretable, and measure how the variation impacts the predictions of the human subjects. Their results suggest that participants who were shown a transparent model with a small number of features were more successful in simulating the model's predictions and trusted the model's predictions more.

[25] investigate the interpretability of a model based on two of its definitions: simulatability, which is a user's ability to predict the output of a model on a given input; and "what if" local explainability, which is a user's ability to predict changes in the prediction in response to changes in the input, given that the user knows the model's original prediction for the original input. They introduce a simple metric called the runtime operation count that measures interpretability as the number of operations (e.g., arithmetic operations for regression, boolean operations for trees) needed in a user's mind to interpret something. Their findings suggest that interpretability decreases with an increase in the number of operations.

Despite some progress, there are still some open challenges surrounding explainability, such as an agreement on what an explanation is and for whom; a formalism for the explanation; and quantifying the human comprehensibility of the explanation. Other challenges include addressing more comprehensive human-study requirements and investigating the effectiveness of different approaches (e.g., supervised, unsupervised, semi-supervised) for various application areas (e.g., natural language processing, image recognition).

5.2 Future Research Directions

The long-term goal for current AI initiatives is to contribute to the design, development, and deployment of human-centered artificial intelligent systems, where the agents collaborate with humans in an interpretable and explainable manner, with the intent of ensuring fairness, transparency, and accountability. To accomplish that goal, we propose a set of research plans/directions towards achieving responsible or human-centered AI using XAI as a medium.

5.2.1 A Generic Framework to Formalize Explainable Artificial Intelligence

The work in [50] and [51] demonstrates a way to collect and leverage domain knowledge from two different domains, finance and cybersecurity, and further infuse that knowledge into black-box models for better explainability. In both of these works, competitive performance with enhanced explainability is achieved. However, there are some open challenges, such as (A) a lack of formalism of the explanation, (B) a customized explanation for different types of explanation recipients (e.g., a layperson, a domain expert, another machine), (C) a way to quantify the explanation, and (D) quantifying the level of comprehensibility with human studies. Therefore, leveraging the knowledge from multiple domains, a generic framework could be useful considering the mentioned challenges. As a result, mission-critical applications from different domains will be able to leverage black-box models with greater confidence and regulatory compliance.

5.2.2 Towards Fair, Accountable, and Transparent AI-based Models

Responsible use of AI is crucial for avoiding risks stemming from a lack of fairness, accountability, and transparency in the model. Remediation of data, algorithmic, and societal biases is vital to promote fairness; the AI system/adopter should be held accountable to affected parties for its decisions; and finally, an AI system should be analyzable, where the degree of transparency should be comprehensible enough to have trust in the model and its predictions for mission-critical applications. Interestingly, XAI enhances understanding directly, increasing trust as a side effect. In addition, the explanation techniques can help in uncovering potential risks (e.g., possible fairness risks). So it is crucial to adhere to fairness, accountability, and transparency principles in the design and development of explainable models.

5.2.3 Human-Machine Teaming

To ensure the responsible use of AI, the design, development, and deployment of human-centered AI that collaborates with humans in an explainable manner is essential. Therefore, the explanation from the model needs to be comprehensible to the user, and there might be some supplementary questions that need to be answered for a clear explanation. So, the interaction (e.g., follow-ups after the initial explanation) between humans and machines is important. The interaction is even more crucial for adaptive explainable models that provide context-aware explanations based on user profiles such as expertise, domain knowledge, interests, and cultural background. The social sciences and human behavioral studies have the potential to impact XAI and human-centered AI research. Unfortunately, the Human-Computer Interaction (HCI) community is somewhat isolated. The combination of HCI empirical studies and human science theories could be a compelling force for the design of human-centered AI models as well as for furthering XAI research. Therefore, efforts to bring a human into the loop, enabling the model to receive input (repeated feedback) from the human on the provided visualizations/explanations and to improve itself with the repeated interactions, have the potential to further human-centered AI. Besides adherence to fairness, accountability, and transparency, the effort will also help in developing models that adhere to our ethics, judgment, and social norms.

5.2.4 Collective Intelligence from Multiple Disciplines

From the explanation perspective, there is plenty of research in philosophy, psychology, and cognitive science on how
people generate, select, evaluate, and represent explanations, and on the associated cognitive biases and social expectations in the explanation process. In addition, from the interaction perspective, human-computer teaming involving social science, the HCI community, and social-behavioral studies could combine for further breakthroughs. Furthermore, from the application perspective, the collectively learned knowledge from different domains (e.g., health care, finance, medicine, security, defense) can contribute to furthering human-centric AI and XAI research. Thus, there is a need for a growing interest in multidisciplinary research to promote human-centric AI as well as XAI in mission-critical applications from different domains.

6 CONCLUSION

We demonstrate and analyze popular XAI methods using a mutual test case to explain their competitive advantages and elucidate the challenges and further research directions. Most of the available works on XAI are on the post-hoc notion of explainability. However, the post-hoc notion of explainability is not purely transparent and can be misleading, as it explains the decision after it has been made. The explanation algorithm can be optimized to placate subjective demand, primarily stemming from the effort of emulating the actual prediction, and the explanation can be misleading even when it seems plausible [60], [61]. Thus, many suggest not explaining black-box models using post-hoc notions and instead adhering to simple and intrinsically explainable models for high-stakes decisions [17]. Furthermore, from the literature review, we find that explainability in pre-modeling is a viable option to avoid the transparency-related issues, albeit an under-focused one. In addition, knowledge infusion techniques have the potential to greatly enhance explainability, although this is also an under-focused challenge. Therefore, we need more focus on the explainability of "black box" models using domain knowledge. At the same time, we need to focus on the evaluation or quantification of explainability using both human and non-human studies. We believe this review provides a good insight into the current progress on XAI approaches, the evaluation and quantification of explainability, open challenges, and a path towards responsible or human-centered AI using XAI as a medium.

ACKNOWLEDGMENTS

Our sincere thanks to Christoph Molnar for his open e-book on Interpretable Machine Learning and his contribution to the open-source R package "iml". Both were very useful in conducting this survey.

REFERENCES

[1] G. Ras, M. van Gerven, and P. Haselager, "Explanation methods in deep learning: Users, values, concerns and challenges," in Explainable and Interpretable Models in Computer Vision and Machine Learning. Springer, 2018, pp. 19–36.
[2] B. Goodman and S. Flaxman, "EU regulations on algorithmic decision-making and a 'right to explanation'," in ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY. http://arxiv.org/abs/1606.08813v1, 2016.
[3] B. Wyden, "Algorithmic accountability," https://www.wyden.senate.gov/imo/media/doc/Algorithmic%20Accountability%20Act%20of%202019%20Bill%20Text.pdf, (Accessed on 11/21/2019).
[4] M. T. Esper, "AI ethical principles," https://www.defense.gov/Newsroom/Releases/Release/Article/2091996/dod-adopts-ethical-principles-for-artificial-intelligence/, February 2020, (Accessed on 03/07/2020).
[5] Z. Wang, Y. Lai, Z. Liu, and J. Liu, "Explaining the attributes of a deep learning based intrusion detection system for industrial control networks," Sensors, vol. 20, no. 14, p. 3817, 2020.
[6] W. Samek, T. Wiegand, and K.-R. Müller, "Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models," arXiv preprint arXiv:1708.08296, 2017.
[7] A. Fernandez, F. Herrera, O. Cordon, M. J. del Jesus, and F. Marcelloni, "Evolutionary fuzzy systems for explainable artificial intelligence: why, when, what for, and where to?" IEEE Computational Intelligence Magazine, vol. 14, no. 1, pp. 69–81, 2019.
[8] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[9] S. T. Mueller, R. R. Hoffman, W. Clancey, A. Emrey, and G. Klein, "Explanation in human-AI systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI," arXiv preprint arXiv:1902.01876, 2019.
[10] C. Molnar, G. Casalicchio, and B. Bischl, "Quantifying model complexity via functional decomposition for better post-hoc interpretability," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2019, pp. 193–204.
[11] M. Staniak and P. Biecek, "Explanations of model predictions with live and breakDown packages," arXiv preprint arXiv:1804.01955, 2018.
[12] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, "Explaining explanations: An overview of interpretability of machine learning," in 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2018, pp. 80–89.
[13] D. Collaris, L. M. Vink, and J. J. van Wijk, "Instance-level explanations for fraud detection: A case study," arXiv preprint arXiv:1806.07129, 2018.
[14] F. K. Došilović, M. Brčić, and N. Hlupić, "Explainable artificial intelligence: A survey," in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, 2018, pp. 0210–0215.
[15] E. Tjoa and C. Guan, "A survey on explainable artificial intelligence (XAI): towards medical XAI," arXiv preprint arXiv:1907.07374, 2019.
[16] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.
[17] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019.
[18] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins et al., "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion, vol. 58, pp. 82–115, 2020.
[19] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artificial Intelligence, 2018.
[20] Q.-s. Zhang and S.-C. Zhu, "Visual interpretability for deep learning: a survey," Frontiers of Information Technology & Electronic Engineering, vol. 19, no. 1, pp. 27–39, 2018.
[21] B. Chandrasekaran, M. C. Tanner, and J. R. Josephson, "Explaining control strategies in problem solving," IEEE Intelligent Systems, no. 1, pp. 9–15, 1989.
[22] W. R. Swartout and J. D. Moore, "Explanation in second generation expert systems," in Second Generation Expert Systems. Springer, 1993, pp. 543–585.
[23] W. R. Swartout, "Rule-based expert systems: The MYCIN experiments of the Stanford heuristic programming project: B. G. Buchanan and E. H. Shortliffe (Addison-Wesley, Reading, MA, 1984); 702 pages," 1985.
[24] A. Dhurandhar, V. Iyengar, R. Luss, and K. Shanmugam, "TIP: Typifying the interpretability of procedures," arXiv preprint arXiv:1706.02952, 2017.
[25] S. A. Friedler, C. D. Roy, C. Scheidegger, and D. Slack, "Assessing the local interpretability of machine learning models," arXiv preprint arXiv:1902.03501, 2019.
[26] J. Huysmans, K. Dejaeger, C. Mues, J. Vanthienen, and B. Baesens, "An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models," Decision Support Systems, vol. 51, no. 1, pp. 141–154, 2011.
[27] F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Vaughan, and H. Wallach, "Manipulating and measuring model interpretability," arXiv preprint arXiv:1802.07810, 2018.
[28] Q. Zhou, F. Liao, C. Mou, and P. Wang, "Measuring interpretability for different types of machine learning models," in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2018, pp. 295–308.
[29] "Single Family Loan-Level Dataset - Freddie Mac." [Online]. Available: http://www.freddiemac.com/research/datasets/sf loanlevel dataset.page
[30] "iml - CRAN package." [Online]. Available: https://cran.r-project.org/web/packages/iml/index.html
[31] C. Molnar et al., "Interpretable machine learning: A guide for making black box models explainable," e-book at <https://christophm.github.io/interpretable-ml-book/>, version dated, vol. 10, 2018.
[32] J. H. Friedman, B. E. Popescu et al., "Predictive learning via rule ensembles," The Annals of Applied Statistics, vol. 2, no. 3, pp. 916–954, 2008.
[33] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189–1232, 2001.
[34] A. Goldstein, A. Kapelner, J. Bleich, and M. A. Kapelner, "Package 'ICEbox'," 2017.
[35] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[36] A. Fisher, C. Rudin, and F. Dominici, "Model class reliance: Variable importance measures for any machine learning model class, from the 'Rashomon' perspective," arXiv preprint arXiv:1801.01489, 2018.
[37] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you?: Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144.
[38] L. S. Shapley, "A value for n-person games," Contributions to the Theory of Games, vol. 2, no. 28, pp. 307–317, 1953.
[39] S. Lundberg and S.-I. Lee, "An unexpected unity among methods for interpreting model predictions," arXiv preprint arXiv:1611.07478, 2016.
[40] B. Boehmke and B. Greenwell, "Chapter 16 Interpretable machine learning — Hands-on machine learning with R," https://bradleyboehmke.github.io/HOML/iml.html, (Accessed on 11/28/2019).
[41] R. Moraffah, M. Karami, R. Guo, A. Raglin, and H. Liu, "Causal interpretability for machine learning - problems, methods and evaluation," ACM SIGKDD Explorations Newsletter, vol. 22, no. 1, pp. 18–33, 2020.
[42] A. Hartl, M. Bachl, J. Fabini, and T. Zseby, "Explainability and adversarial robustness for RNNs," arXiv preprint arXiv:1912.09855, 2019.
[43] D. L. Marino, C. S. Wickramasinghe, and M. Manic, "An adversarial approach for explainable AI in intrusion detection systems," in IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2018, pp. 3237–3243.
[44] B. Kim, R. Khanna, and O. O. Koyejo, "Examples are not enough, learn to criticize! Criticism for interpretability," in Advances in Neural Information Processing Systems, 2016, pp. 2280–2288.
[45] J. Chen, L. Song, M. J. Wainwright, and M. I. Jordan, "Learning to explain: An information-theoretic perspective on model interpretation," arXiv preprint arXiv:1802.07814, 2018.
[46] A. Jung and P. H. J. Nardelli, "An information-theoretic approach to personalized explainable machine learning," IEEE Signal Processing Letters, 2020.
[47] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, "Towards better understanding of gradient-based attribution methods for deep neural networks," arXiv preprint arXiv:1711.06104, 2017.
[48] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres, "Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)," arXiv preprint arXiv:1711.11279, 2017.
[49] U. Kursuncu, M. Gaur, and A. Sheth, "Knowledge infused learning (K-IL): Towards deep incorporation of knowledge in deep learning," arXiv preprint arXiv:1912.00512, 2019.
[50] S. R. Islam, W. Eberle, S. Bundy, and S. K. Ghafoor, "Infusing domain knowledge in AI-based 'black box' models for better explainability with application in bankruptcy prediction," ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anomaly Detection in Finance Workshop, 2019.
[51] S. R. Islam, W. Eberle, S. K. Ghafoor, A. Siraj, and M. Rogers, "Domain knowledge aided explainable artificial intelligence for intrusion detection and response," arXiv preprint arXiv:1911.09853, 2019.
[52] E. Angelini, G. di Tollo, and A. Roli, "A neural network approach for credit risk evaluation," The Quarterly Review of Economics and Finance, vol. 48, no. 4, pp. 733–755, 2008.
[53] J. Segal, "Five Cs of credit." [Online]. Available: https://www.investopedia.com/terms/f/five-c-credit.asp
[54] B. Matt et al., Introduction to Computer Security. Pearson Education India, 2006.
[55] J. Fürnkranz, D. Gamberger, and N. Lavrač, "Rule learning in a nutshell," in Foundations of Rule Learning. Springer, 2012, pp. 19–55.
[56] H. Yang, C. Rudin, and M. Seltzer, "Scalable Bayesian rule lists," in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 3921–3930.
[57] S. Rüping et al., "Learning interpretable models," 2006.
[58] S. R. Islam, W. Eberle, and S. K. Ghafoor, "Towards quantification of explainability in explainable artificial intelligence methods," arXiv preprint arXiv:1911.10104, 2019.
[59] G. A. Miller, "The magical number seven, plus or minus two: Some limits on our capacity for processing information," Psychological Review, vol. 63, no. 2, p. 81, 1956.
[60] Z. C. Lipton, "The mythos of model interpretability," arXiv preprint arXiv:1606.03490, 2016.
[61] P. Gandhi, "Explainable artificial intelligence." [Online]. Available: https://www.kdnuggets.com/2019/01/explainable-ai.html