widely utilized in XAI [19] for both post-hoc and intrinsic explanations, especially in interpreting neural network models, dating back to 1997 [1] [20].

III. CASE STUDY

In this section, we provide a comprehensive review of six XAI techniques, illustrated through a case study. These techniques are used to explain a random forest model trained on a breast cancer dataset.

A. Dataset

For our case study, we employed a dataset pertaining to breast cancer. This dataset was sourced from the November 2017 update of the SEER (Surveillance, Epidemiology, and End Results) Program of the National Cancer Institute (NCI). Specifically, the dataset centers on female patients diagnosed with infiltrating duct and lobular carcinoma breast cancer, two prevalent types of breast cancer, between 2006 and 2010 [21]. The dataset comprises 4024 entries across 16 columns. A detailed description of these columns is provided in Table I.

TABLE I (continued). Description of the dataset columns
Feature | Description | Type | Values
Race | The racial background of the patient | String | White, Black, Other
T Stage | Indicates the size and extent of the primary tumor | String | T1, T2, T3, T4
N Stage | Indicates whether the cancer has spread to nearby (regional) lymph nodes | String | N1, N2, N3
6th Stage | Refers to the breast cancer staging system as defined by the AJCC (a) in its 6th edition | String | IIA, IIIA, IIIC, IIB, IIIB
A Stage | Indicates the extent of cancer spread | String | Regional, Distant
Differentiate | Refers to how differentiated the tumor cells are | String | Poorly differentiated, Moderately differentiated, Well differentiated, Undifferentiated
Grade | Tumor grade is an indication of how aggressive a tumor is and how it might behave | String | 1, 2, 3, anaplastic; Grade IV
Tumor size | Indicates the size of the primary tumor | Integer |
Estrogen Status | Indicates whether the breast cancer cells have receptors for the hormone estrogen | String | Positive, Negative
Progesterone Status | Indicates whether the breast cancer cells have receptors for the hormone progesterone | String | Positive, Negative
Regional Node Examined | Number of nearby lymph nodes that were removed and examined under a microscope | Integer |
Reginol Node Positive | Represents the number of examined lymph nodes that were found to have cancer | Integer |
Survival Months | Number of months the patient survived post-diagnosis | Integer |
Status | Patient's current status | String | Alive, Dead
(a) The American Joint Committee on Cancer.

We utilized the Random Forest algorithm to classify patients as either "Alive" or "Dead" based on their medical data. The accuracy of our trained model reached 91%. Subsequently, we employed the following XAI techniques to interpret the decisions of our black-box model: Global Surrogate model, PDP, PFI, ICE, LIME, and SHAP. For evaluating local XAI techniques, we considered the patient described in [Fig 1], who is classified as "Alive" by the model. This instance will be called A in the remainder of this paper.

Fig. 1. Description of the instance used for local explanation.
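The paper does not detail the preprocessing pipeline; purely as an illustration of the setup described above, a minimal sketch (the file name, categorical encoding, and train/test split are our assumptions) could look as follows. The names rf, X_train, X_test, and y_test are reused in the sketches later in this section.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the SEER breast cancer extract (file name assumed).
df = pd.read_csv("seer_breast_cancer.csv")

# Integer-encode the categorical columns; the prediction target is "Status".
X = df.drop(columns=["Status"])
X = X.apply(lambda c: c.astype("category").cat.codes if c.dtype == "object" else c)
y = (df["Status"] == "Alive").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Black-box model: a Random Forest classifier.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))  # the paper reports 91%
```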
B. Global Model-Agnostic Methods

1) Permutation Feature Importance

a) Method Overview

Permutation Feature Importance (PFI) [15] is a widely used XAI technique. While it shares similarities with the PDP approach, PFI serves a distinct purpose: it gauges the impact of individual features on the black-box model's performance. This impact is assessed by measuring the change in the model's performance when the values of a feature are randomly shuffled. A significant increase in the model error implies a higher importance of that feature. Consequently, the method yields a list of features ranked by their importance. The PFI algorithm can be summarized in the following steps (a code sketch follows the list):

• Train the model and measure its performance.
• For each feature:
  – Shuffle its values in the test dataset.
  – Measure the model performance on the shuffled dataset.
  – Calculate the feature importance based on the difference in performance (accuracy or R-squared) before and after permutation.
• Rank the features by their computed importance.
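The sketch below mirrors these steps by hand against the held-out test set; the 30 repetitions per feature echo the count mentioned in the discussion section, and scikit-learn also ships sklearn.inspection.permutation_importance for the same purpose.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def permutation_feature_importance(model, X_test, y_test, n_repeats=30, seed=0):
    """Importance of a feature = average drop in accuracy after shuffling its values."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y_test, model.predict(X_test))
    importances = {}
    for col in X_test.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[col] = rng.permutation(X_perm[col].to_numpy())  # shuffle one feature
            drops.append(baseline - accuracy_score(y_test, model.predict(X_perm)))
        importances[col] = float(np.mean(drops))
    # Rank by importance; negative values flag features that behave like noise.
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))

# Example: pfi_ranking = permutation_feature_importance(rf, X_test, y_test)
```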
Nevertheless, PFI can yield biased outcomes when dealing with correlated features. Such scenarios may lead the model to underestimate the significance of certain features, even though they might play a crucial role [17] [22].

b) Method Application

When we first applied PFI to our case study, we obtained the feature importance plot shown in [Fig 2]. We can notice that some features have negative values. These negative values indicate that the respective features not only lack importance but also introduce noise into the model [22]. Indeed, after eliminating these features, the model's accuracy improved. Re-executing PFI on the reduced feature set changed the feature ranking, as depicted in [Fig 3].
Fig. 2. Feature importance plot showing the ranking of features based on their computed importance. Note that some features have negative values.
Fig. 4. PDP applied to each feature, sorted by curve variation and grouped in this figure. The y-axis displays the average prediction. Curve variation reflects the dependence between the prediction and the feature values; almost constant curves indicate a non-significant feature.
R^2 = 1 - \frac{\sum_{i=1}^{n} (y'_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

where n is the number of instances, y'_i is the prediction given by the surrogate model, y_i is the prediction of the black-box model, and \bar{y} is the mean of the black-box model predictions.

The R-squared value should ideally approach 1 to effectively consider the interpretable model as a surrogate for the black-box model. However, it is crucial to remember that such interpretations serve as approximations and may not necessarily capture the exact logic used by the black-box model.

This technique has been widely employed in explaining various complex machine learning models, including neural networks [14] [24] and random forests [18]. Notably, decision trees are often favored, as they provide comprehensible rules that explain the decisions made by the original model.

b) Method Application

In the context of our case study, we employed the decision tree as a surrogate model to provide insights into our Random Forest classifier. The divergence between the two models is 0.55, indicating a less than optimal alignment. It is worth noting that the decision tree was configured with 6 layers; increasing the depth results in a completely dissimilar model. However, despite the divergence between the two models, we utilized the surrogate model to explain the decision related to instance A. After ensuring a match in predictions between the Random Forest and the decision tree for this instance, the rule extracted from the decision tree is:

SurvivalMonths > 47.50
and SurvivalMonths <= 82.50
and ReginolNodePositive <= 8.50
and Grade <= 0.82
and Grade > 0.69
and Age > 30.50
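A sketch of how a surrogate of this kind can be fitted and inspected follows; the depth, the fidelity check, and the rule printout are assumptions based on the description above, not the authors' exact code.

```python
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit the surrogate on the black-box *predictions*, not on the ground-truth labels.
rf_pred_train = rf.predict(X_train)
surrogate = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, rf_pred_train)

# Fidelity check: R-squared between surrogate and Random Forest predictions.
print("Surrogate R^2:", r2_score(rf.predict(X_test), surrogate.predict(X_test)))

# Human-readable rules; the path taken by instance A yields a rule like the one above.
print(export_text(surrogate, feature_names=list(X_train.columns)))
```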
C. Local Model-Agnostic Methods

1) Individual Conditional Expectation

a) Method Overview

The Individual Conditional Expectation (ICE) plot [25] is a variant of the Partial Dependence Plot (PDP) that aims to examine the relationship between specific features and their respective model predictions. Although ICE fundamentally operates on a local level by evaluating individual instances, it offers a global view of a feature's importance. Contrary to PDP, which averages the predictions for each instance, rendering a single point for that instance, ICE retains and visualizes these predictions across the range of a feature's values, rendering a distinctive curve for each instance. This distinction allows for a more granular understanding of how altering a specific attribute for a given feature might affect the model's prediction for individual instances.

Just like PDP, the ICE method can sometimes yield misleading interpretations when the feature under consideration is correlated with others. Moreover, in large datasets, ICE can produce overlapping curves, potentially complicating visual interpretation. Nevertheless, ICE excels in discovering the heterogeneous responses of individual instances to the feature of interest, a detail that might be hidden when focusing on the average provided by PDP.
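A hand-rolled ICE computation for a single feature is sketched below; the grid, sample size, and plotting choices are assumptions, and sklearn.inspection.PartialDependenceDisplay offers an equivalent with kind="individual".

```python
import matplotlib.pyplot as plt
import numpy as np

def ice_curves(model, X, feature, n_instances=100):
    """One prediction curve per instance, obtained by sweeping a single feature."""
    sample = X.sample(n=min(n_instances, len(X)), random_state=0)
    grid = np.sort(X[feature].unique())          # grid of observed feature values
    curves = []
    for value in grid:
        X_mod = sample.copy()
        X_mod[feature] = value                   # fix the feature to the grid value
        curves.append(model.predict_proba(X_mod)[:, 1])  # P("Alive") per instance
    return grid, np.array(curves)                # shape: (n_grid, n_instances)

# ICE curves for "Progesterone Status" (column name as encoded in our pipeline).
grid, curves = ice_curves(rf, X_test, "Progesterone Status")
plt.plot(grid, curves, color="steelblue", alpha=0.3)
plt.xlabel("Progesterone Status")
plt.ylabel("Predicted probability of 'Alive'")
plt.show()
```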
b) Method Application

From the results obtained using the PDP and PFI methods, we observed a discrepancy in the interpretation of the "Progesterone Status" feature. While PFI considered it important, PDP did not. To further analyze the impact of this feature at an individual instance level, we referred to the ICE plot, as illustrated in [Fig 5]. It is evident that, in general, the feature does not significantly influence predictions. However, there are specific instances where its positive or negative impact is notably pronounced. We also observed the issue of overlapping curves, which complicates the interpretation of the plot.

Fig. 5. ICE plot illustrating the individual impact of the "Progesterone Status" feature on model predictions. While the feature generally has limited influence, pronounced effects are observed for specific instances.

2) LIME

a) Method Overview

LIME, standing for Local Interpretable Model-agnostic Explanations [11], aims to approximate a black-box model locally using a more interpretable model, often a linear one. While it shares similarities with the concept of a global surrogate, LIME focuses on emulating the decisions of the complex model at a local level. The process to explain the decision related to a specific instance x involves the following steps (a stripped-down sketch follows the list):

• Perturb the features of x to generate new instances Z around x.
• For each generated instance z, assign a weight π based on its proximity to x. This weight acts as a measure to filter out noise; the closer z is to x, the greater the weight π is.
• Use the original black-box model to make predictions on the newly generated instances in Z, denoted f(z).
• Train an interpretable model g using the perturbed instances, their assigned weights, and the predictions obtained from the black-box model. The goal is for the interpretable model to closely match the predictions provided by f by minimizing the following loss function:
  L(f, g, \pi_x) = \sum_{z \in Z} \pi_x(z) \, (f(z) - g(z))^2
• Explain the decision related to x using the interpretable model. The coefficients of this model (in the case of a linear model) represent the impact of the features on the decision.
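The lime package (lime.lime_tabular.LimeTabularExplainer) implements this procedure for tabular data; the stripped-down sketch below, restricted to numerically encoded features and using an assumed Gaussian perturbation and kernel width, only illustrates the steps listed above.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(model, x, X_ref, n_samples=5000, kernel_width=0.75):
    """Fit a weighted linear surrogate g around a single instance x (1-D NumPy array)."""
    rng = np.random.default_rng(0)
    scale = X_ref.std(axis=0).to_numpy() + 1e-8
    # 1) Perturb x to generate the neighbourhood Z.
    Z = x + rng.normal(size=(n_samples, x.shape[0])) * scale
    # 2) Weight each z by its proximity to x (exponential kernel pi_x).
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3) Query the black-box model f on the perturbed instances.
    f_z = model.predict_proba(Z)[:, 1]
    # 4) Fit the interpretable model g using the proximity weights.
    g = Ridge(alpha=1.0).fit(Z, f_z, sample_weight=weights)
    # 5) The coefficients of g are the local feature contributions.
    return dict(zip(X_ref.columns, g.coef_))

# Example: local explanation of instance A (taken here as the first test row).
# contributions = lime_explain(rf, X_test.iloc[0].to_numpy(), X_test)
```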
b) Method Application

In our case study, the explanation provided by LIME for instance A is depicted in [Fig 6]. It suggests a 91% probability of survival. This high probability is attributed to the positive impact of the following features, in order: Survival Months, Regional Node Positive, Regional Node Examined, Age, Marital Status, and Race. Conversely, the Grade feature has a slight negative impact on the prediction.

Fig. 6. LIME interpretation for instance A. The figure depicts the contributions of individual features to the predicted outcome. Features in orange positively influence the prediction towards 1, signifying "Alive." Conversely, the "Grade" feature, shown in blue, negatively impacts the prediction.

3) SHAP

a) Method Overview

SHAP (SHapley Additive exPlanations), introduced by Lundberg [26], provides local explanations by quantifying the contribution of each feature to a model's decision for a given instance. Inspired by the Shapley value [27] from game theory, SHAP considers each feature as a player and the increase in prediction (i.e., the difference between a given prediction and the average prediction) as the payout. The Shapley value evaluates the impact of a feature by assessing its contribution across all possible subsets of features, known as coalitions. Specifically, it computes how the prediction changes when a feature is known versus when it is unknown (approximated by a random or average value). Here are the main steps of the standard Shapley value (a brute-force sketch follows the list):

Given:
• f, the prediction function of the black-box model;
• N = {a_1, a_2, ..., a_M}, the model's features;
• x, the instance to be explained.

For each feature a_i:
For each coalition S of N not containing a_i:
• Construct x' such that features in S remain unchanged from x, while features not in S and not a_i are marginalized (typically using an average or expected value).
• Calculate the prediction of x', denoted f_x(S) (the prediction of x without a_i).
• Construct x'' such that features in S and a_i remain unchanged from x, while features not in S are marginalized.
• Compute the prediction of x'', denoted f_x(S ∪ i) (the prediction of x with a_i).
• Calculate the difference between f_x(S) and f_x(S ∪ i) (the contribution of a_i when added to S).
• Obtain the Shapley value for a_i by computing the average of all differences (over all coalitions S), weighted as follows:

\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left[ f_x(S \cup i) - f_x(S) \right]
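To make these steps concrete, the brute-force sketch below estimates phi_i by marginalizing the "unknown" features with background averages (a simplification of the exact conditional expectation), following the coalition loop described above.

```python
import math
from itertools import combinations

def shapley_value(model, x, X_background, i):
    """Exact Shapley value of feature index i for instance x (exponential in M)."""
    M = x.shape[0]
    background = X_background.mean(axis=0).to_numpy()   # values used to "remove" features
    others = [j for j in range(M) if j != i]
    phi = 0.0
    for size in range(M):                                # coalitions S of every size
        for S in combinations(others, size):
            S = list(S)
            weight = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
            x_without = background.copy()
            x_without[S] = x[S]                          # keep only the features in S
            x_with = x_without.copy()
            x_with[i] = x[i]                             # ... plus feature a_i
            f_S = model.predict_proba(x_without.reshape(1, -1))[0, 1]
            f_S_i = model.predict_proba(x_with.reshape(1, -1))[0, 1]
            phi += weight * (f_S_i - f_S)                # weighted marginal contribution
    return phi

# Example (slow beyond a dozen features):
# phi_0 = shapley_value(rf, X_test.iloc[0].to_numpy(), X_train, i=0)
```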
The SHAP method includes three specialized variants:

KernelSHAP: a model-agnostic approach optimized by considering a subset of coalitions rather than the exhaustive set.

TreeSHAP: specifically tailored for tree-based models, it offers optimizations that capitalize on the structural properties of trees.

DeepSHAP: designed for deep learning models, it combines the Shapley value concept with the DeepLIFT method [26].

A prominent challenge with the Shapley value is its exponential computation time. For a dataset with M features, the technique necessitates the examination of 2^M coalitions. Consequently, several approximation strategies have been proposed [28]. Furthermore, it's notable that the SHAP technique is not exclusively local; it also provides global explanations by averaging feature importance across the entire dataset [29].
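Using the shap package, the TreeSHAP variant applied in the next subsection can be invoked roughly as follows; the return types vary across shap versions, so treat this as an assumed sketch rather than the authors' code.

```python
import shap

explainer = shap.TreeExplainer(rf)            # TreeSHAP: exploits the tree structure
shap_values = explainer.shap_values(X_test)   # per-class feature contributions
# Depending on the shap version, the result is a list (one array per class) or a
# 3-D array (samples x features x classes); select the "Alive" class either way.
sv_alive = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]

# Local contributions for instance A (here: the first test row).
print(dict(zip(X_test.columns, sv_alive[0])))

# Global view: mean |SHAP value| per feature over the test set.
shap.summary_plot(sv_alive, X_test, plot_type="bar")
```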
b) Method Application

In our case study, we utilized the SHAP method, specifically TreeSHAP, to investigate the contributions of each feature to the model's decision for instance A. The results from this technique are depicted in [Fig 7]. Similar to the LIME technique, SHAP reveals features that contribute positively to the "survival" prediction of instance A. These features include Survival Months, Regional Node Positive, Regional Node Examined, Marital Status, and Age, which offset the negative impact (Status=0) from the Grade feature. TreeSHAP also gives us a global view of the feature importance related to our black-box model, as shown in [Fig 8].

Fig. 7. Feature importance for instance A using TreeSHAP. The plot displays individual feature contributions towards the prediction. Features pushing the prediction towards "survival" are shown in red, while features in blue suggest a contrary effect. The magnitude of each bar represents the strength of the feature's influence.

Fig. 8. Global feature importance using SHAP.
D. Result Discussion

In this study, we employed six XAI methods to interpret our Random Forest model. The techniques' outcomes generally converge. The main conclusions are:

• Across the six techniques, both globally and locally, the most significant features for predicting the status of a breast cancer patient are: Survival Months and Regional Node Positive. Conversely, the least significant are: Estrogen Status and A Stage.
• For a global explanation, based on the PDP, PFI, and SHAP global analyses, the most critical features for predicting the status of a breast cancer patient are: Survival Months, Regional Node Positive, Grade, and Age. Discrepancies are evident for the other features.
• For local explanation, the outcomes of LIME and SHAP largely aligned.
• Though the ICE technique is a local method, it provides a global view of the model.
• The divergence between our model and the surrogate model was measured at 0.55, suggesting that the surrogate might not accurately reflect the original model. On a local level, though, the surrogate model's explanations were generally consistent with those provided by both SHAP and LIME.

In Table II, we summarized the XAI techniques used. We also added the execution time and stability of each technique when repeatedly applied to the same dataset. Notably, random data perturbations lead to minor instabilities in both PFI and LIME, especially for less significant features. Furthermore, PFI's outcomes are sensitive to hyperparameter adjustments. For instance, changing the number of permutations per feature from the default 30 to 40 causes a change in the order of "Age" and "Grade". Although XAI techniques are designed to boost the user's confidence in black-box models, variations in their explanations may introduce confusion about both the model and the explanation technique. This highlights the need for a balance in XAI methods, ensuring both stability and convergence in explanations, and also calls for efficient evaluation metrics for XAI.

TABLE II. XAI TECHNIQUES SUMMARY

IV. CONCLUSION

In this paper, we have conducted a comprehensive review of XAI techniques, with a particular emphasis on model-agnostic approaches. We enriched our review with case study experiments on a healthcare dataset, which was classified using a Random Forest model. Our experimental results highlighted the benefits of the six evaluated techniques in revealing different interpretability facets of the black-box model in question. Furthermore, our findings also indicated potential disparities in the outcomes of these techniques, particularly regarding less impactful features, in both local and global explanation contexts. Such divergence underscores the importance of aligning XAI techniques that target the same explanatory dimension to avoid misunderstandings and confusion among users.

In our future work, we aim to delve deeper into model-agnostic XAI techniques, with a particular focus on rule-based methodologies as well as abductive and fuzzy logic. Additionally, we aspire to collaborate with experts and practitioners in the healthcare domain to develop useful and interpretable models.

REFERENCES

[1] Benítez, J. M., Castro, J. L., and Requena, I. (1997). Are artificial neural networks black boxes? IEEE Transactions on Neural Networks, 8(5), 1156-1164.
[2] Shortliffe, E. H., Davis, R., Axline, S. G., Buchanan, B. G., Green, C. C., and Cohen, S. N. (1975). Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Computers and Biomedical Research, 8(4), 303-320.
[3] Shortliffe, E. H. (1977, October). MYCIN: A knowledge-based computer program applied to infectious diseases. In Proceedings of the Annual Symposium on Computer Application in Medical Care (p. 66). American Medical Informatics Association.
[4] Feigenbaum, E. A., Buchanan, B. G., and Lederberg, J. (1971). On Generality and Problem Solving: A Case Study Using the DENDRAL Program. Machine Intelligence 5.
[5] Feigenbaum, E. A. (1979). Themes and case studies of knowledge engineering. Expert Systems in the Micro-Electronic Age, 3-25.
[6] Confalonieri, R., Coba, L., Wagner, B., and Besold, T. R. (2021). A historical perspective of explainable Artificial Intelligence. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11(1), e1391.
[7] DARPA XAI Literature Review.
[8] Ginsberg, M. L. (1986). Counterfactuals. Artificial Intelligence, 30(1), 35-79.
[9] Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., and Müller, K. R. (Eds.). (2019). Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Vol. 11700). Springer Nature.
[10] Meske, C., Bunde, E., Schneider, J., and Gersch, M. (2022). Explainable artificial intelligence: objectives, stakeholders, and future research opportunities. Information Systems Management, 39(1), 53-63.
[11] Ribeiro, M. T., Singh, S., and Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
[12] Apley, D. W., and Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(4), 1059-1086.
[13] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 1189-1232.
[14] Craven, M., and Shavlik, J. (1995). Extracting tree-structured representations of trained networks. Advances in Neural Information Processing Systems, 8.
[15] Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
[16] Stepin, I., Alonso, J. M., Catala, A., and Pereira-Fariña, M. (2021). A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access, 9, 11974-12001.
[17] Molnar, C. (2023). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Second Edition.
[18] Kim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability. Advances in Neural Information Processing Systems, 29.
[19] Alonso, J. M., Castiello, C., and Mencar, C. (2018, May). A bibliometric analysis of the explainable artificial intelligence research field. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 3-15). Cham: Springer International Publishing.
[20] Castro, J. L., Mantas, C. J., and Benítez, J. M. (2002). Interpretation of artificial neural networks by means of fuzzy rules. IEEE Transactions on Neural Networks, 13(1), 101-116.
[21] https://seer.cancer.gov/data/
[22] https://scikit-learn.org/stable/modules/permutation_importance.html
[23] Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. (2018). A simple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755.
[24] Zhou, Z. H., and Jiang, Y. (2003). Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble. IEEE Transactions on Information Technology in Biomedicine, 7(1), 37-42.
[25] Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1), 44-65.
[26] Lundberg, S. M., and Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
[27] Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28), 307-317.
[28] Štrumbelj, E., and Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41, 647-665.
[29] Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... and Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67.