An Overview of XAI Algorithms
Abstract—In the field of artificial intelligence (AI) and machine learning, which have already found extensive applications in various domains, the challenges posed by their complexity and opacity cannot be disregarded. This paper delves into Explainable Artificial Intelligence (XAI), a technology aimed at addressing the "black box" problem of AI and enhancing decision transparency. We will discuss two primary approaches to XAI: interpretability within models and post-training interpretability. We will analyze machine learning models inherently possessing interpretability, such as decision trees. Additionally, we will provide a detailed introduction to prominent XAI algorithms such as LIME, LRP, and SHAP. This paper will explore the current state of XAI, its challenges, and its future directions and potential. We hope this research contributes to the comprehension and enhancement of AI interpretability.
I. INTRODUCTION

This introduction outlines the significance of artificial intelligence (AI) and machine learning and their applications in various domains, discusses a significant challenge within this context, namely interpretability, to which Explainable Artificial Intelligence (XAI) emerges as a response by providing explanations and transparency for complex models, and, lastly, gives an overview of the structure and purpose of this paper.

With the rapid advancement of technology, AI and machine learning are profoundly altering our lifestyles and work methods. These technologies have found applications across numerous domains, from autonomous driving to voice assistants and personalized recommendation systems, ushering in new possibilities. Their strength lies in their ability to surpass human capabilities in speed and accuracy, making unprecedented predictions and decisions by learning from vast datasets. This automated approach not only saves costs and enhances productivity but also expands the boundaries of knowledge, enabling human involvement in areas previously deemed inaccessible, such as genetics, where AI has become an essential tool for genetic data analysis, aiding researchers in exploring solutions for diseases.

However, interpretability has become an essential and pressing challenge as AI technology proliferates. Despite AI's remarkable successes in many fields, its "black box" nature has raised concerns. While AI can make highly accurate predictions, its decision-making process often remains opaque. This lack of transparency and interpretability can lead to a loss of trust, making troubleshooting and improvement difficult, especially in high-risk domains like healthcare and finance. Hence, achieving "Explainable AI" (XAI) has become crucial. By enabling people to comprehend AI's decision-making processes, XAI increases confidence in AI usage and facilitates quicker problem-solving.

This paper delves into the concept of explainable artificial intelligence, focusing mainly on the importance and applications of XAI. In the following sections, we first introduce the concept of interpretability and then elucidate the significance of transparent design. Subsequently, we examine XAI algorithms, briefly introducing the methods and techniques currently employed for realizing XAI. Furthermore, we discuss the challenges encountered on the path to XAI and explore its prospects. Finally, we summarize our discourse, emphasizing XAI's significant impact on ensuring the reliability and credibility of AI technology.

In the upcoming chapters, we further explore various aspects of XAI, unveiling its importance and influence within the evolving field of artificial intelligence. Section II explains XAI, Section III covers model-internal interpretability through transparent design, Section IV describes representative XAI algorithms, and Section V summarizes and compares these algorithms and presents the conclusion.
B. Transparent Design
Transparent design refers to specific machine learning
Yen-Wei Chen is pursuing a Master 's. degree in Department of models designed with a degree of transparency, allowing
Management, Information, System, National ChengChi University, Taipei, individuals to comprehend how the model transitions from
Taiwan. (e-mail: [email protected])
Shih-Yi Chien is with the Department of Management, Information,
input data to output predictions. However, there exists a trade-
System, National ChengChi University, Taipei, Taiwan. (e-mail: off between transparent design and predictive accuracy.
[email protected]) Simpler models such as logistic regression, linear regression,
Fang Yu is with the Department of Management, Information, System, and decision trees often possess higher internal interpretability
National ChengChi University, Taipei, Taiwan. (e-mail: [email protected]) since their decision-making processes are relatively
II. OVERVIEW OF EXPLAINABLE ARTIFICIAL INTELLIGENCE

A. Overview of XAI

Explainable Artificial Intelligence (XAI) is a critical technology designed to unveil the inner workings of machine learning models, rendering their decision-making processes no longer a mysterious black box. The significance of XAI lies in dismantling this opacity, establishing trust in artificial intelligence, and permitting tracking and adjustment when issues arise. XAI's objective is to enhance the transparency of a model's decision-making process, enabling people to understand how it arrives at its predictions or judgments, thereby bolstering confidence in AI technology. This chapter introduces two primary XAI approaches: transparent design and post-hoc explanation.

B. Transparent Design

Transparent design refers to machine learning models designed with a degree of transparency, allowing individuals to comprehend how the model transitions from input data to output predictions. However, there exists a trade-off between transparent design and predictive accuracy. Simpler models such as logistic regression, linear regression, and decision trees often possess higher internal interpretability since their decision-making processes are relatively straightforward and comprehensible, but they might experience a decline in predictive accuracy. Overall, transparent design presents a trade-off between advantages and drawbacks, necessitating a balance within specific application scenarios.

C. Post-hoc Explanation

Post-hoc explanation involves using techniques to explain a model's prediction process after its training is complete. These techniques are typically employed for models with powerful performance that are nevertheless challenging to interpret, such as deep neural networks. Post-hoc explanation methods typically focus on generating interpretable feature importance, impact analysis, or local explanations. Standard post-hoc explanation techniques include Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Layer-wise Relevance Propagation (LRP). These methods aim to facilitate the understanding and explanation of a model's prediction process by revealing the critical features that contribute to specific predictions.

This paper delves into these two XAI approaches, examining their mechanisms, advantages, and limitations. This exploration helps clarify how explainable artificial intelligence can be achieved, thereby enhancing the credibility and reliability of AI technology.

III. MACHINE LEARNING MODELS WITH TRANSPARENT DESIGN

Machine learning models with transparent design are often implemented through simple model architectures. However, due to their simplicity, their accuracy might fall short of that of complex neural network models. A notable example is linear regression, whose simple mathematical form can clearly demonstrate the relationship between features and prediction targets: a positive coefficient indicates a positive correlation, a negative coefficient indicates a negative correlation, and the magnitude of a coefficient can signify importance. These relationships can even be further elucidated through visualization.
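As a minimal illustration of this coefficient-level transparency, the short sketch below fits a linear model on made-up housing-style numbers (the feature names and values are hypothetical, not taken from this paper) and reads the explanation directly off the coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical housing-style data: [area_m2, distance_to_center_km, bedrooms]
X = np.array([[50, 10, 1], [80, 5, 2], [120, 2, 3], [65, 8, 2], [150, 1, 4]])
y = np.array([150, 260, 420, 210, 530])  # prices in thousands (made up)

model = LinearRegression().fit(X, y)

# Each coefficient's sign and magnitude is the explanation itself:
# a positive value means the feature pushes the prediction up.
for name, coef in zip(["area_m2", "distance_km", "bedrooms"], model.coef_):
    print(f"{name:12s} coefficient = {coef:+.2f}")
print(f"intercept    = {model.intercept_:+.2f}")
```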
Decision trees [5] are also indispensable in this context. A decision tree generates a tree structure from the training data, using criteria such as information gain or the Gini index to evaluate candidate splits and identify the most suitable rules. The resulting tree structure, in which each branch node represents a binary rule, can itself convey an explanation, and visualization further aids in understanding how the model classifies or makes regression predictions from the features. One can also track the path of a single sample through the tree to explain its prediction, showing how the prediction for that sample is derived from its feature values. In medical diagnosis, decision trees, particularly when set within a framework of Bayesian averaging, have proven to be robust automatic classification systems, and their interpretability makes them more readily accepted by clinicians than "black box" models [1]. However, with the rise of the big data era and growing demands for prediction accuracy, even simple decision trees have begun to grow in complexity; random forests [2] and boosted trees [3] represent this trend, and the vast input space has led to a significant decline in interpretability. Scholars have begun to address this by making tree ensemble classifiers interpretable using prototypes that are central according to a distance function derived from the tree ensemble [4], and by utilizing confidence tests and qualitative feedback to observe the effectiveness of the explanations.
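The path-tracking idea can be sketched with scikit-learn's decision tree utilities; the tiny medical-style dataset and feature names below are invented for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy medical-style dataset: [age, blood_pressure, cholesterol] (made up)
X = np.array([[35, 120, 180], [60, 150, 240], [45, 130, 200],
              [70, 160, 260], [30, 110, 170], [55, 145, 230]])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = low risk, 1 = high risk

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The whole model can be printed as human-readable rules ...
print(export_text(tree, feature_names=["age", "bp", "chol"]))

# ... and the decision path of one sample explains its individual prediction.
sample = X[1:2]
node_indicator = tree.decision_path(sample)
print("nodes visited by sample 1:", node_indicator.indices)
print("predicted class:", tree.predict(sample)[0])
```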
IV. INTRODUCTION TO XAI ALGORITHMS

In addition to model-internal interpretability, a prevailing approach to achieving interpretability involves employing external algorithms to elucidate decision-making processes. In comparison to machine learning models with transparent design, this approach offers several advantages:

● Overcoming Limitations: External interpretability methods are not confined to simple model architectures but can be applied to larger and more complex models, breaking free from the constraints that intrinsically interpretable models face due to their simplicity.

● Maintaining Accuracy and Enhancing Explanatory Power: By utilizing external algorithms, developers can retain the original accuracy while shedding light on large-scale models' "black box" decision-making processes. This augments explanatory power and aids developers in identifying and rectifying potential issues.

● Diversity: These algorithms exhibit remarkable diversity and can be applied to intricate models such as neural networks, enhancing their generality and flexibility.

However, this approach also entails disadvantages:

● Need for Additional Methods: Extracting explanations from complex models might require supplementary methods and tools.

● Potential Lack of Reflectiveness: Explanations derived after a prediction may capture only some details of the underlying model.

● Credibility Concerns: The logic of post-hoc explanations might not comprehensively represent the workings of intricate models, potentially leading to trust issues.

In essence, external interpretability algorithms offer advantages in terms of increased explanatory power, expanded applicability, and retention of original accuracy. Nonetheless, they might introduce complexity and trust-related challenges, so their selection and application require careful consideration of specific requirements and contexts.

The remainder of this section introduces three representative post-hoc explanation algorithms:

● LIME (Local Interpretable Model-agnostic Explanations): LIME primarily provides explanations for individual predictions by training a simple model within a localized region near a specific prediction to comprehend the behavior of a complex model.

● LRP (Layer-wise Relevance Propagation): LRP is tailored to deep learning models, providing a better understanding of the contribution of individual features behind specific predictions by backtracking through the network layers.

● SHAP (SHapley Additive exPlanations): SHAP is rooted in game theory, quantifying the contribution of each feature to a prediction and enabling users to perceive the importance of individual features.

A. LIME (Local Interpretable Model-Agnostic Explanations)

LIME [6] aims to provide interpretability by perturbing features to determine which features' presence significantly impacts the output. Beyond explaining through feature perturbation, LIME also considers human cognitive limitations, ensuring that the algorithm generates comprehensible explanations. Before using LIME, it is necessary to distinguish between the model's features and an interpretable representation of the data. Features serve as the input to the model, but they are not necessarily interpretable on their own. In the context of images, for instance, the input comprises individual pixels, and processing thousands of pixels would be inefficient and would yield suboptimal explanations. To address this, superpixels from image processing can be used as the interpretable representation on which LIME operates.
LIME designs an objective function to acquire explanations:

ξ(x) = argmin_{g ∈ G} L(f, g, π_x) + Ω(g)        (1)
The explanation is represented by g ∈ G, where G is a class of potentially interpretable models, such as linear models or decision trees, that can be presented to users through visuals or text. g indicates whether the parts to be explained are present and is designed to remain sufficiently simple; the complexity of an explanation g ∈ G is measured by Ω(g). f denotes the complex model being explained, while π_x(z) measures local similarity: x is the data point being explained, i.e., the target instance for which we want to understand how the model makes its prediction, and z denotes perturbed instances whose similarity to x helps characterize the model's behavior near x. The term L(f, g, π_x) quantifies how well the simple model g approximates the complex model f within the local scope; in other words, it measures how unfaithful g is to f in the neighborhood defined by π_x. Lower unfaithfulness indicates that the simple model better approximates the behavior of the complex model locally. Lastly, a balance is sought between simplicity (interpretability) and fidelity (consistency with the original model), which is precisely where the challenge of this method lies.
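To make Equation (1) concrete, the following is a from-scratch sketch of the local surrogate idea for tabular data. It is only an illustrative approximation, with Ω(g) handled implicitly by fixing g to a small ridge-regularized linear model and π_x taken as a Gaussian kernel; the black-box function and all numbers are hypothetical, and this is not the reference LIME implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(f, x, num_samples=1000, kernel_width=0.75, rng=None):
    """Approximate a black-box scoring function f near instance x
    with a weighted linear model (the g in Equation (1))."""
    rng = np.random.default_rng(rng)
    # 1) Perturb the instance by adding small Gaussian noise around x.
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.shape[0]))
    # 2) Query the complex model on the perturbed samples.
    y = np.array([f(z) for z in Z])
    # 3) Weight samples by proximity to x (the pi_x term).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 4) Fit a simple, interpretable model g; its coefficients are the explanation.
    g = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return g.coef_

# Hypothetical black box: a nonlinear function of three features.
black_box = lambda z: np.tanh(2 * z[0]) + z[1] ** 2 - 0.5 * z[2]
explanation = local_surrogate(black_box, x=np.array([0.2, 1.0, -0.3]), rng=0)
print("local feature weights:", np.round(explanation, 3))
```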
LIME's impact on image interpretation is particularly significant. To illustrate, when explaining an image classified as a cat, LIME begins by creating perturbed versions of the original image and using the complex model to predict these perturbed instances; this probes how the model responds to slight variations in the input. LIME then trains a simple linear model within a local region around the original instance, which endeavors to approximate the behavior of the complex model. The simple model is easily interpretable and offers insight into how the complex model operates within that local region. Finally, by examining the simple model, LIME provides an intuitive explanation of the complex model's behavior on the specific instance, for example revealing which regions of the image the complex model focuses on when classifying it as a "cat."
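For images, this workflow typically operates on superpixels rather than raw pixels. The sketch below shows how it might look with the open-source lime package, using a stand-in image and a dummy classifier so the snippet is self-contained; the exact API can differ between package versions:

```python
import numpy as np
from lime import lime_image

rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(64, 64, 3)).astype(np.uint8)  # stand-in image

def classifier_fn(images):
    # Stand-in for a real CNN: returns fake two-class probabilities
    # based on mean brightness, just so the pipeline runs end to end.
    scores = np.asarray(images).mean(axis=(1, 2, 3)) / 255.0
    return np.stack([1 - scores, scores], axis=1)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, classifier_fn,
                                         top_labels=1, num_samples=200)

# Highlight the superpixels that most support the top predicted class.
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
print("pixels marked as important:", int(mask.sum()))
```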
B. LRP (Layer-wise Relevance Propagation)

Layer-wise Relevance Propagation (LRP) [7] involves two main steps. First, a conventional forward pass of the trained neural network is performed to compute the prediction while retaining all intermediate activations. Second, the network is traversed in reverse, starting from the output layer and computing relevance scores layer by layer backward until they are propagated onto the input layer of the model. To simplify, consider an image: the first layer is the input layer representing pixel information, and the last layer is the model's classification output, which determines the category. LRP calculates corresponding relevance scores for each feature dimension at each layer; these scores explain the impact of the features in the previous layer on the final prediction, which is precisely what explaining the model's prediction amounts to. By computing the relative importance of each feature, we can better understand the basis for the model's decision-making.

LRP exploits the structure of the trained network to compute relevance scores by performing the calculation in reverse through the model. Assume a node i in one layer is connected to a node j in the following layer. During the forward pass, the activation of i is multiplied by the weight w_ij to contribute to the output of j; when computing relevance scores, LRP reverses this flow and redistributes the relevance of j back to i. The basic rule is as follows:

R_i = Σ_j ( z_ij / Σ_i' z_i'j ) · R_j        (2)

Here R denotes a relevance score, j indexes a node in the current layer, i indexes a node in the preceding layer connected to j, and z_ij represents the contribution of node i to node j (for example, the product of i's activation and the weight w_ij). Through this formula, the importance of each node to the output can be derived; in the context of images, this translates to calculating the importance of each pixel with respect to the output. The rule introduced here is the basic LRP; for different types of layers and activation functions, LRP also has more elaborate propagation rules.

Using the cat example again, LRP can apply the rule above to compute which pixels are crucial for the model's decision. We first classify the image with a conventional machine learning model and then, through LRP, calculate a relevance score for each pixel. After obtaining these scores, a common approach is to render them as a heatmap to identify the key areas: for a cat, the heatmap might pinpoint the cat's face and fur as crucial while assigning lower relevance to the background, since the background is not a discriminative factor. Another prominent example is distinguishing between huskies and wolves: a flawed model might rely on the background for differentiation, and in such cases the problems identified by LRP can guide adjustments to the model.
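A compact NumPy sketch of this basic propagation rule for a small fully connected ReLU network is given below. The weights and input are random placeholders, and the small stabilizer added to the denominator is our own assumption for numerical safety; this is an illustration of Equation (2), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully connected ReLU network with random (placeholder) weights.
W1, W2 = rng.normal(size=(4, 6)), rng.normal(size=(6, 3))
x = rng.normal(size=4)  # stand-in input (e.g. 4 "pixels")

# Forward pass, retaining intermediate activations.
a1 = np.maximum(0, x @ W1)
out = a1 @ W2

def lrp_step(a_prev, W, R_next, eps=1e-9):
    """One application of the basic LRP rule:
    R_i = sum_j z_ij / (sum_i' z_i'j) * R_j, with z_ij = a_i * w_ij."""
    z = a_prev[:, None] * W                          # z_ij
    denom = z.sum(axis=0) + eps * np.sign(z.sum(axis=0) + 1e-12)
    return (z / denom) @ R_next                      # relevance for previous layer

# Start from the relevance of the predicted class and propagate backward.
R_out = np.zeros_like(out)
R_out[np.argmax(out)] = out.max()
R_hidden = lrp_step(a1, W2, R_out)
R_input = lrp_step(x, W1, R_hidden)

print("input relevance scores:", np.round(R_input, 3))
print("conservation check (sums should be close):",
      round(R_input.sum(), 3), round(R_out.sum(), 3))
```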
C. SHAP (SHapley Additive exPlanations)

SHapley Additive exPlanations (SHAP) [8] is a method for assessing feature importance. It quantifies the impact of features on predictions by calculating Shapley values of the conditional expectation function of the original model, providing a unified and comprehensive way of evaluating feature importance while maintaining interpretability. It belongs to the class of additive feature attribution methods, which accumulate the influence of each feature to explain the overall prediction; this benefits complex models because it reduces the model interpretation problem to individual feature contribution problems. Because Shapley values are used to quantify feature importance, in cases of non-linearity or non-independent inputs the contributions are averaged over all possible feature orderings to obtain the explanatory result. This combination of feature importance and feature interactions offers deeper insight into prediction outcomes while maintaining interpretability.

Within the class of additive feature attribution methods, a unique solution satisfies the following three properties. The first, local accuracy, requires that the interpretive model's output on the simplified input matches the original model's prediction for the corresponding input, maintaining both accuracy and interpretability. The second, missingness, implies that a feature missing from the original input should receive no attribution, i.e., its influence on the prediction outcome is considered negligible. The third, consistency, states that if a feature's impact on the prediction increases or stays the same regardless of the other features, the attribution assigned to that feature should not decrease.

Shapley values satisfy these three properties, enhancing the credibility and usability of the model's interpretation and providing more valuable information. For instance, consider a set of housing data that includes features for each house such as area, geographic location, and number of bedrooms, and a machine learning model that predicts the price of each house. SHAP can be used to explain why the model gives a specific price prediction for a particular house: we compute the SHAP value of each feature, and these values reflect each feature's impact on the predicted price. Aggregating the SHAP values of the features and observing their interactions yields an overall explanatory result that explains why the model predicts a certain price for that specific house; the model might, for example, attribute a high price prediction to the house's large area and favorable geographic location.
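The averaging over feature orderings described above can be made concrete with a brute-force sketch. The price model and housing numbers below are invented, and "removing" a feature is approximated by substituting a baseline value, a simplification that practical SHAP implementations handle more carefully:

```python
from itertools import permutations
import numpy as np

feature_names = ["area", "location_score", "bedrooms"]
x = np.array([120.0, 8.0, 3.0])        # the house being explained (made up)
baseline = np.array([70.0, 5.0, 2.0])  # "average" house used when a feature is absent

def price_model(v):
    # Stand-in for a trained house-price model.
    return 2.0 * v[0] + 15.0 * v[1] + 10.0 * v[2] + 0.05 * v[0] * v[1]

def value(subset):
    """Model output when only the features in `subset` are 'present'."""
    v = baseline.copy()
    v[list(subset)] = x[list(subset)]
    return price_model(v)

n = len(x)
phi = np.zeros(n)
orders = list(permutations(range(n)))
for order in orders:                   # average marginal contributions
    present = []
    for i in order:                    # add features one at a time
        before = value(present)
        present.append(i)
        phi[i] += (value(present) - before) / len(orders)

for name, contrib in zip(feature_names, phi):
    print(f"{name:15s} SHAP-style contribution = {contrib:+.2f}")
print("baseline + contributions =", round(value([]) + phi.sum(), 2),
      "| model prediction =", round(price_model(x), 2))
```

The factorial number of orderings makes this exact computation impractical beyond a handful of features, which is what the approximate variants discussed next address.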
Beyond the basic formulation, SHAP offers various model-specific and model-agnostic methods for calculating feature importance depending on the underlying machine learning model. Kernel SHAP [8] employs Monte Carlo-style sampling to estimate SHAP values and can be applied to a wide range of models. Tree SHAP [9] focuses on decision trees and tree ensembles, decomposing feature contributions into the contributions of individual nodes. Deep SHAP [10, 11] is suited to deep learning and connects Shapley values with DeepLIFT-style propagation rules [12]; it uses the model's forward and backward passes to approximate the contribution of each feature by analyzing the neural network's structure and parameters. Linear SHAP decomposes feature contributions into feature weights and is well suited for linear models.
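In practice these variants are exposed by the open-source shap package. A brief sketch of the tree-specific explainer on a synthetic housing-style model follows; the data are made up and the exact API may vary across package versions:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical housing data.
rng = np.random.default_rng(0)
X = pd.DataFrame({"area": rng.uniform(40, 200, 300),
                  "location_score": rng.uniform(1, 10, 300),
                  "bedrooms": rng.integers(1, 6, 300)})
y = 2 * X["area"] + 15 * X["location_score"] + 10 * X["bedrooms"] \
    + rng.normal(0, 10, 300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Tree SHAP: fast attribution tailored to tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:1])   # explain the first house
print("base value:", explainer.expected_value)
print("per-feature contributions:", dict(zip(X.columns, shap_values[0].round(2))))
```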
V. CONCLUSION

Having introduced three distinct eXplainable Artificial Intelligence (XAI) algorithms, we now summarize and compare their differences. First, LIME is a local interpretability method that focuses on explaining individual predictions. It approximates the predictive behavior of the model by generating interpretable data points in the vicinity of an instance and fitting a simple model, such as a linear regression, to explain the model's behavior around that specific input; it concentrates on the impact of local features, helping to capture the model's interpretation in specific contexts. Second, LRP is an interpretability method tailored to neural networks that aims to explain the predictions of deep learning models. LRP redistributes the prediction onto the input features, thereby calculating each feature's influence on the prediction; it emphasizes the importance of each layer in the model and provides insight into the global behavior of neural networks. Lastly, SHAP is a versatile method for evaluating feature importance that is applicable to a wide range of models. Based on the Shapley value concept from cooperative game theory, SHAP explains model predictions by computing the contribution of each feature, offering a comprehensive and consistent approach to feature importance assessment that applies to single predictions or entire datasets while maintaining model interpretability.

In conclusion, these interpretability methods have their own characteristics and suitable scenarios. LIME focuses on local explanations for single predictions, aiding in understanding why a model makes a specific prediction for a particular instance; this is valuable for quickly grasping model decisions, such as when debugging models, validating predictions, or providing explanations in specific cases. LRP centers on a more global perspective, revealing the functioning of the entire neural network: through backward propagation of relevance, LRP interprets the model's output as a combination of contributions from the input features, allowing us to comprehend how the model makes decisions across the network. This is particularly beneficial for deep learning models, especially in domains like autonomous driving, where insight into how the model transitions from perception to action is crucial. SHAP operates on both global and local levels: leveraging cooperative game theory, it computes the contribution of each feature to the model's outputs while considering feature interactions, which lets us explore relative feature importance and how features collaboratively affect predictions, making it suitable for cases with complex feature interactions, such as finance and credit risk assessment.

Regarding evaluation, only limited methodology is available to assess these algorithms, and interpretability is subjective, so human evaluation might introduce discrepancies: the same explanation might be perceived differently from different perspectives, satisfying some users but not others. To assess interpretability algorithms for a specific model more objectively, apart from selecting an appropriate algorithm based on the introduction provided here and the data and model at hand, one can evaluate stability and explanatory consistency. Stability examines whether an algorithm generates similar or consistent explanation results for the same data instances across multiple runs. Explanatory consistency, in contrast, ensures that the generated explanations maintain similarity in their core features across different instances; in other words, the essential features in the generated explanations should remain consistent and not vary merely due to different contexts.
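As one possible way to operationalize the stability check (a sketch only: the surrogate explainer, the stand-in model, and the top-k overlap measure below are our assumptions rather than an established protocol):

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_weights(f, x, seed, num_samples=500):
    """LIME-style local feature weights for instance x (illustrative only)."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.size))
    y = np.array([f(z) for z in Z])
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2)
    return Ridge(alpha=1.0).fit(Z, y, sample_weight=w).coef_

f = lambda z: np.tanh(2 * z[0]) + z[1] ** 2 - 0.5 * z[2]   # stand-in model
x = np.array([0.2, 1.0, -0.3])

# Stability: do repeated runs rank (roughly) the same features as most important?
k = 2
runs = [np.argsort(np.abs(local_weights(f, x, seed)))[-k:] for seed in range(5)]
overlaps = [len(set(runs[0]) & set(r)) / k for r in runs[1:]]
print("top-%d feature overlap with first run:" % k, overlaps)
```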
REFERENCES
[1] V. Schetinin et al., "Confident interpretation of Bayesian decision tree ensembles for clinical applications," IEEE Transactions on Information Technology in Biomedicine, vol. 11, no. 3, pp. 312-319, 2007.
[2] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[3] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," The Annals of Statistics, vol. 29, no. 5, 2001.
[4] S. Tan et al., "Tree space prototypes," in Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference, 2020.
[5] A. J. Myles et al., "An introduction to decision tree modeling," Journal of Chemometrics, vol. 18, no. 6, pp. 275-285, 2004.
[6] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?": Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[7] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLoS ONE, vol. 10, no. 7, e0130140, 2015.
[8] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Advances in Neural Information Processing Systems, 2017, pp. 4765-4774.
[9] S. M. Lundberg et al., "From local explanations to global understanding with explainable AI for trees," Nature Machine Intelligence, vol. 2, pp. 56-67, 2020.
[10] H. Chen, S. M. Lundberg, and S.-I. Lee, "Explaining a series of models by propagating Shapley values," Nature Communications, vol. 13, no. 1, 4512, 2022.
[11] P. Schwab and W. Karlen, "CXPlain: Causal explanations for model interpretation under uncertainty," in Advances in Neural Information Processing Systems, 2019, pp. 10220-10230.
[12] A. Shrikumar, P. Greenside, and A. Kundaje, "Learning important features through propagating activation differences," in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 3145-3153.