
A Theoretical Framework for AI Models Explainability with Application in Biomedicine

Matteo Rizzo (Department of Computer Science, Ca' Foscari University, Venice, Italy)
Alberto Veneri (Department of Computer Science, Ca' Foscari University & ISTI-CNR, Venice, Italy)
Andrea Albarelli (Department of Computer Science, Ca' Foscari University, Venice, Italy)
Claudio Lucchese (Department of Computer Science, Ca' Foscari University, Venice, Italy)
Marco Nobile (Department of Computer Science, Ca' Foscari University, Venice, Italy)
Cristina Conati (Department of Computer Science, University of British Columbia, Vancouver, Canada)

arXiv:2212.14447v4 [cs.AI] 14 Jun 2023

Abstract—EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community. It is raising growing interest across methods and domains, especially those involving high-stake decision-making, such as the biomedical sector. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that synthesizes what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation is an accurate description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation seems convincing to the user). Our theoretical framework simplifies how these properties are operationalized, and it provides new insights into common explanation methods that we analyze as case studies. We also discuss the impact that our framework could have in biomedicine, a very sensitive application domain where XAI can have a central role in generating trust.

Index Terms—explainability, machine learning, biomedicine

I. INTRODUCTION

The advent of Deep Learning (DL) allowed for raising the accuracy bar of Machine Learning (ML) models for countless tasks and domains. Riding the wave of enthusiasm around such stunning results, DL models have been deployed even in high-stake decision-making environments, not without criticism [1]. These kinds of environments require not only high predictive accuracy but also an explanation of why that prediction was made. The need for explanations initiated the discussion around the explainability of DL models, which are known to be "black boxes": their inner workings are hard for humans to understand. Who should be accountable for a model-based decision and how a model came to a certain prediction are just some of the questions that drive research on explaining ML models. This is particularly relevant, for instance, in the biomedical field, where human lives are at stake, and understanding the reasoning behind a model's predictions is essential to guarantee safety and avoid costly errors [2]. Furthermore, the ability to explain how a model arrived at a certain conclusion can increase the understanding of the underlying biological mechanisms, enabling more informed support to decision-making for clinicians and researchers. With the first recent attempts of the legislative machinery to make explanations for automatic decisions a user's right [3], the pressure on generating explanations for ML models' behaviors has risen even more. Despite the endeavor of the XAI community to develop both models that are explainable by design [4]–[6] and methods to explain existing black-box models [7]–[9], the way to DL explainability is paved with results that are mostly preliminary and anecdotal in nature (e.g., [10]–[12]). Most notably, it is hard to relate different pieces of research due to a lack of common theoretical grounds capable of supporting and guiding the discussion. In particular, we detect a gap in the literature on foundational issues such as a shared definition of the term "explanation" and the users' role in the design and deployment of explainability for complex ML models. The XAI community suffers from a paucity of common terminology, with only a few attempts at establishing one, focusing more on the distinction among the terms "interpretable", "explainable", and "transparent" than on the inner structure and meaning of an explanation (e.g., [13]–[15]). Similarly, the lack of an outline of the main theoretical components of the discussion around explainability disperses research, while the current literature finds it hard to provide the involved stakeholders with principled analytical tools to operate on black-box models. This trend has been detected in the delicate field of biomedicine and addressed with context-specific guidelines [16]. In this work, we propose a simple, general, and effective theoretical framework that outlines the core components of the explainability machinery and lays out grounds for a more coherent debate on how to explain the decisions of ML models. Such a framework is not meant to be set in stone but rather to be used as a common reference among researchers and iteratively improved to fit more and more sophisticated explainability methods and strategies. We hope to provide shared jargon and formal definitions to inform and standardize the discussion around crucial topics of XAI.

The core of the proposed theoretical framework is a novel definition of explanation that draws from existing literature in sociology and philosophy but, at the same time, is easy to operationalize when analyzing a specific approach to explain the predictions made by a model. We conceive an explanation as the interaction of two decoupled components, namely evidence and its interpretation. Evidence is any sort of information stemming from a ML model, while an interpretation is some semantic meaning that human stakeholders attribute to the evidence to make sense of the model's inner workings. We relate these definitions to crucial properties of explanations, especially faithfulness and plausibility. Jacovi & Goldberg define faithfulness as "the accurate representation of the causal chain of decision-making processes in a model" [17]. We argue that faithfulness relates in different ways to the elements of the proposed theoretical framework because it assures that the interpretation of the evidence is true to how the model actually uses it within its inner reasoning. A property orthogonal to faithfulness is plausibility, namely "the degree to which some explanation is aligned with the user's understanding of the model's decision process" [17]. A follow-up work by Jacovi & Goldberg addresses plausibility as the "property of an explanation of being convincing towards the model prediction, regardless of whether the model was correct or whether the interpretation is faithful" [18]. We relate plausibility to faithfulness and highlight the need for faithfulness to be embedded in explainability methods and strategies, with plausibility as an important (yet not indispensable) property of the same. This is particularly true in the biomedical field as a high-stake environment, where it is crucial for the explanation to portray the decision-making process of the model accurately. As case studies, we zoom in on the evaluation of the faithfulness of some popular DL explanation tools and strategies, such as "attention" [9], [19], Gradient-weighted Class Activation Mapping (Grad-CAM) [20], and SHapley Additive exPlanations (SHAP) [8]. In addition, we look at the faithfulness of models traditionally considered intrinsically interpretable (a notion we distance ourselves from), such as linear regressors and models based on fuzzy logic.

II. DESIGNING EXPLAINABILITY

Research in XAI seizes the problem of explaining models for decision-making from multiple perspectives. First of all, we observe that most of the existing literature uses the terms "interpretable" and "explainable" interchangeably, while some have highlighted the semantic nuance that distinguishes the two words [21]. We argue that the term explainable (and, by extension, explainability) is more suited than the term interpretable (similarly, by extension, interpretability) to describe the property of a model for which effort is made to generate human-understandable clarifications of its decision-making process. The definition of explanation is thus crucial and will be discussed extensively in Section IV. Our claim follows two rationales: (i) the term interpretation is used within our proposed framework with a precise meaning that deviates from the current literature and that we deem more accurate (see Section IV-C); (ii) we argue against grouping models into inherently interpretable and post-hoc explainable. Recently, Molnar has defined "intrinsic interpretability" as a property of ML models that are considered fully understandable due to their simple structure (e.g., short decision trees or sparse linear models), and "post hoc explainability" as the need for some models to apply interpretation methods after training [22]. Although principled, we drop this hard distinction by claiming that all models embed a certain degree of explainability. Even though, to the best of our knowledge, no metrics can quantify explainability yet, we can assert that it depends on multiple factors. In particular, a model is as explainable as the explanations proposed to the user to justify a certain prediction are effective. Thus, bringing the human into the explainability design loop is key to deploying models that are actually explainable. Consequently, there are models for which it is easier to design explanations (i.e., the so-called white-box models, e.g., linear regression, decision trees, rule-based systems, etc.) and models for which the same process is more difficult (i.e., the so-called black-box models, e.g., artificial neural networks). The notion of difficulty here is defined by the inner complexity of the model, which relates to the amount of cognitive load the user can sustain. We highlight that the degree of explainability moves along a gradient from black-box to white-box models, without clear-cut thresholds. Nevertheless, in Section VI, we show that explanations for both white-box and black-box models fit our proposed framework. Thus, they both can be structured homogeneously and more deeply understood by leveraging theoretical tools. Most importantly, we advocate for explainability design as a crucial part of Artificial Intelligence (AI) software development. We endorse Chazette et al. in claiming that explainability should be considered a non-functional requirement in the software design process [23]. Thus, explanations for any ML model (and, especially, for DL models) should be accounted for within the initial design of an AI-powered application. Even the most accurate black-box model should not be deployed without an explanation mechanism backing it up, as we cannot be sure whether it learned to discriminate over meaningful or wrong features. A classic example is a dog image classification model learning to detect huskies because of the snowy setting instead of the features of the animal itself, involuntarily deceiving the users [7]. A design-oriented approach to AI development should involve taking humans into the loop, thus fostering a human-centered AI which is more intelligible by design and is expected to increase trust in the end-users [24].
III. CHARACTERISING THE INFERENCE PROCESS OF A MACHINE LEARNING MODEL

In this section, we provide a formal characterization of the inference process of a general ML model, without any constraint on the task. Such a characterization will be used to introduce the terminology which substantiates the main components of our proposed framework of explainability, whose details are provided in Section IV. To this end, we define a ML model M as an arbitrarily complex function mapping a model input to a model conclusion through a sequential composition of transformation steps. The whole characterization is exemplified in Figure 1.

Fig. 1: Example of transformation functions for two steps si.

A. Elements of the Characterization

Model Input. The model input consists of a set of features, either coming from an observation or synthetically generated.

Model Conclusion. The model conclusion is the final output of the model, which is the outcome of the last link in the chain of transformations over the model input.

Transformation steps. Overall, the decision-making process of M can be represented as a chain of N > 0 transformations of the original model input, which are causally related. This causal chain is enforced by model design (e.g., the sequence of layers in the architecture of a neural network or the depth of a decision tree). We call each stage of this causal chain a "transformation step", and we denote it with si, for i ∈ [1, N]. The transformation steps advance the computation from the model input to the model output through transformation functions.

Transformation functions. Each transformation step si relates to a set of ni "transformation functions" fi,mi, where mi ∈ [1, ni] indicates one of the possible learnable functions at si. Note that, in general, the number of such functions would be infinite, but we discretize it assuming that we are working in a real scenario on some computational machine. The transformation functions are mappings from a feature set xi−1,j to a feature set xi,z, with j ∈ [1, ki−1], z ∈ [1, ki] (i.e., the arrows enclosed in the ellipses in Figure 1). The number ki denotes the cardinality of the set of all possible feature sets generated by all possible learnable transformation functions at step si. These transformation functions are generally opaque to the user in the context of the so-called black-box models. At every step in the chain of transformation steps, the model learns one of the possible transformation functions (i.e., the optimal function according to some learning scheme, highlighted with a solid line in Figure 1). That is, the model learns the function f̂i,mi such that f̂ = f̂N,mN ◦ . . . ◦ f̂i,mi ◦ . . . ◦ f̂1,m1 is the overall approximation of the true mapping from the model input to the model conclusion. According to the notation above, we denote the model input as x0,0 (or simply x) and the model conclusion as ŷN,j, with j ∈ [1, kN] (or simply ŷ).

B. Observations

We asserted that, at each transformation step si, the model picks one function f̂i,mi among ni such that f̂i,mi(xi−1,j) = xi,z. This raises issues that increase model opacity. At step si the chosen function f̂i,mi can map different intermediate transformations xi−1,j of the feature set at the previous transformation step into the same transformation xi,z one step further in the chain. This means that the same outcome in the transformation chain, be it intermediate or conclusive, can be achieved through different rationales, and it could be difficult for a human user to understand which of them is the one the model has actually learned. This can be a result of a high dimensionality of the set of transformation functions, as well as a high complexity of the transformed feature set.

For example, pictures of zebras and salmon can be discriminated on the basis of either their anatomy (i.e., zebras have stripes, salmon have gills) or the environment/habitat (i.e., zebras live in savannas and salmon in rivers). If we consider a relatively complex model such as a Convolutional Neural Network (CNN), where a transformation step coincides with a layer within the network architecture, it is generally difficult to understand which kind of transformation fi,mi this represents, if any that is human-understandable. Thus, how do we make sense of which of the ni possible alternative mappings of xi−1,j led to xi,z? This remains an open question, with major implications for the discussion around faithfulness, which we expand on in the next section.
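To make the notation above concrete, the sketch below (our illustrative Python code, not part of the paper's formalization) models M as an explicit composition of transformation steps, where each step exposes its intermediate feature set; the names TransformationStep and ChainModel are hypothetical.

```python
from typing import Any, Callable, List

class TransformationStep:
    """One step s_i of the causal chain: applies the learned function f̂_{i,m_i}."""
    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn

class ChainModel:
    """A model M seen as the composition f̂ = f̂_N ∘ ... ∘ f̂_1 of N transformation steps."""
    def __init__(self, steps: List[TransformationStep]):
        self.steps = steps

    def predict(self, x):
        """Run the chain and record each intermediate feature set x_{i,z}."""
        trace = [("x_0", x)]          # the model input x_{0,0}
        for step in self.steps:
            x = step.fn(x)            # x_{i,z} = f̂_{i,m_i}(x_{i-1,j})
            trace.append((step.name, x))
        return x, trace               # ŷ plus the causally ordered intermediates

# Toy usage: a two-step "model" (encode, then threshold).
model = ChainModel([
    TransformationStep("s_1: encode", lambda v: [vi * 2 for vi in v]),
    TransformationStep("s_2: decide", lambda v: int(sum(v) > 10)),
])
y_hat, trace = model.predict([1.0, 2.0, 3.0])
```

In this reading, the recorded intermediates in trace are exactly the kind of objective information that the next section calls evidence.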
IV. DEFINING EXPLANATIONS

Recent work on ML interpretability produced multiple definitions for the term "explanation". According to Lipton, "explanation refers to numerous ways of exchanging information about a phenomenon, in this case, the functionality of a model or the rationale and criteria for a decision, to different stakeholders" [25]. Similarly, for Guidotti et al., "an explanation is an 'interface' between humans and a decision-maker that is at the same time both an accurate proxy of the decision-maker and comprehensible to humans" [26]. Murdoch et al. add to how the explanation is delivered to the user, stating that "an explanation is some relevant knowledge extracted from a machine-learning model concerning relationships either contained in data or learned by the model. [...] They can be produced in formats such as visualizations, natural language, or mathematical equations, depending on the context and audience" [27]. On a more general note, Mueller et al. state that "the property of 'being an explanation' is not a property of the text, statements, narratives, diagrams, or other forms of material. It is an interaction of (i) the offered explanation, (ii) the learner's knowledge and beliefs, (iii) the context or situation and its immediate demands, and (iv) the learner's goals or purposes in that context" [28]. Finally, Miller tackles the challenge of defining explanations from a sociological perspective. The author highlights a wide taxonomy of explanations but focuses on those which are an answer to a "why-question" [29].

The definitions mentioned above offer a well-rounded perspective on what constitutes an explanation. However, they fail to highlight its atomic components and to characterize their relationships. We synthesize our proposed definition of explanation based on complementary aspects of the existing definitions. The result is a concise definition that is easy to operationalize for supporting the analysis of multiple approaches to explainability. Our full proposed framework is reported in the scheme in Figure 2, whose components will be discussed in the following sections.

Fig. 2: Overview of the theoretical framework of explainability.

A. Explanation

Given a model M which takes an input x and returns a prediction ŷ, we define an explanation as the output of an interpretation function applied to some evidence, providing the answer to a "why question" posed by the user.

B. Evidence

Evidence (e) is whatever kind of objective information stemming from the model we wish to provide an explanation for and that can reveal insights into its inner workings and rationale for prediction (e.g., attention weights, model parameters, gradients, etc.).

1) Evidence Extractor: An evidence extractor (ξ) is a method fetching some relevant information about either M, x, ŷ, or a combination of the three. Then: e = ξ(x, ŷ, M). Examples of evidence extractors are, e.g., encoder plus attention layers, gradient back-propagation, and random tree approximation, with the corresponding extracted evidence being attention weights, gradient values, and a random tree mimicking the original model. In the peculiar case of a white-box approach, that is, ML models designed to be easily interpretable by the user (e.g., linear regression, fuzzy rule-based systems), the extraction of evidence is straightforward, since all components of the model directly present a piece of semantic information in a human-comprehensible format.

2) Explanatory Potential: We define the explanatory potential (ϵ(e)) of some evidence as the extent to which the evidence influences the causal chain of transformation steps of a model. Intuitively, the explanatory potential indicates "how much" of a model the selected type of evidence can explain. It can be computed either by counting how many transformation steps are impacted by the evidence (i.e., breadth), or how much of each single transformation step is impacted by the evidence (i.e., depth).

C. Interpretation

An interpretation is a function g associating semantic meaning to some evidence and mapping its instances into explanations, either for a given prediction or for the whole model. Then an explanation can be defined as either E = g(e, x, ŷ, M) or E = g(e, M), respectively.

1) Local vs. Global Interpretations: In accordance with the existing literature, we relate "evidence" and "interpretation" to the concepts of locality and globality. Both evidence and interpretations can either be local or global. Local evidence (e.g., attention weights, gradients, etc.) relates relevant model information to a particular model input x and corresponding prediction ŷ. Global evidence (e.g., the full set of model parameters) is generally independent of specific inputs and might explain higher-level functioning (providing deeper or wider information) of the model or some of its sub-components. Similarly, interpretations can provide either a local or a global semantics of the evidence. A local interpretation of attention could be, e.g., "attention weights are descriptive of input components' importance to model output". On the other hand, a global interpretation of the same evidence may aggregate all the attention weights' heatmaps for a whole dataset and highlight specific patterns. For example, in a dog vs. cat classification problem, a global interpretation of attention may be represented by clusters of similar parts of the animal's body (e.g., groups of ears, tails, etc.) highlighted by the attention activations.
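The decomposition into an evidence extractor ξ and an interpretation g can be read directly as code. The following minimal sketch is ours and purely illustrative: the functions extract_gradient_evidence and interpret_saliency are hypothetical names, the evidence is a numerically estimated input gradient, and the interpretation maps it into a local explanation E = g(e, x, ŷ, M).

```python
import numpy as np

def extract_gradient_evidence(model, x, eps=1e-4):
    """Evidence extractor ξ: numerical input gradient of the model score at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (model(x_plus) - model(x_minus)) / (2 * eps)
    return grad  # e = ξ(x, ŷ, M)

def interpret_saliency(evidence, x, y_hat):
    """Interpretation g: hypothesize that large |gradient| marks influential features."""
    ranking = np.argsort(-np.abs(evidence))
    return {
        "prediction": y_hat,
        "feature_ranking": ranking.tolist(),
        "claim": "features with larger absolute gradient influenced this prediction more",
    }

# Toy usage with a transparent scoring function standing in for M.
model = lambda v: 3.0 * v[0] - 0.5 * v[1]
x = [1.0, 2.0]
y_hat = model(np.asarray(x))
e = extract_gradient_evidence(model, x)
E = interpret_saliency(e, x, y_hat)   # local explanation E = g(e, x, ŷ, M)
```

Note that the "claim" string is exactly the part of the explanation that must later be checked for faithfulness: the extractor is objective, the interpretation is a hypothesis.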
2) Generating Interpretations: Given some evidence involved in one or more steps si of M, we guess how this evidence is involved in the opaque input-to-output transformations by formulating an interpretation g of some extent of the decision-making process of the model. At a low level, we generate a candidate g that encapsulates the approximations f̃i,mi ≈ f̂i,mi of the behavior of certain functions learned by M at some steps si. On an abstract level, interpretations can be seen as hypotheses about the role of evidence in the explanation-generation process. Like a good experimental hypothesis, a good interpretation satisfies two core properties: (i) it is testable, and (ii) it clearly defines dependent and independent variables. Interpretations can be formulated using different forms of reasoning (e.g., deductive, inductive, abductive, etc.). In particular, the survey on explanations and social sciences by Miller reports that people usually make assumptions (i.e., in our context, choose an interpretation) via social attribution of intent (to the evidence) [29]. Social attribution is concerned with how people attribute or explain the behavior of others, and not with the real causes of the behavior. Social attribution is generally expressed through folk psychology, which is the attribution of intentional behavior using everyday terms such as beliefs, desires, intentions, emotions, and personality traits. Such concepts may not truly be the cause of the described behavior, but they are indeed those humans leverage to model and predict each other's behaviors. This may generate a misalignment between a hypothesized interpretation of some evidence and its actual role within the inference process of the model. In other words, reasoning on evidence through folk psychology might generate interpretations that are plausible but not necessarily faithful to the inference process of the model (such terms will be further explored in Section V).

D. Explanation Interface

Explanations are meant to be delivered to some target users. We define the eXplanation User Interface (XUI) as the format in which some explanation is presented to the end user. This could be, for example, in the form of text, plots, infographics, etc. We argue that an XUI is characterized by three main properties: (i) human understandability, (ii) informativeness, and (iii) completeness. Human understandability is the degree to which users can understand the answer to their "why" question via the XUI. This property depends on user cognition, bias, expertise, goals, etc., and is influenced by the complexity of the selected interpretation function. The informativeness (i.e., depth) of an explanation is a measure of the effectiveness of an XUI in answering the why question posed by the user, that is, the depth of information for some si of great interest in the XUI. The completeness (i.e., width) of an explanation is the accuracy of an XUI in describing the overall model's workings, and the degree to which it allows for anticipating predictions, that is, the width in terms of the number of si the XUI spans. Note that both informativeness and completeness are bound by the explanatory potential of the evidence (e.g., attention weights do not explain the full model, just some transformation steps, while the full set of model parameters does).

V. CONCERNING FAITHFULNESS AND PLAUSIBILITY

In Section IV-C2 we observed that social attribution is a double-edged sword for the interpretation generation process, as it may promote plausibility without accounting for faithfulness. This issue was highlighted by Jacovi & Goldberg, who introduced a property of explanations called aligned faithfulness [18]. In the words of the authors, an explanation satisfies this property if "it is faithful and aligned to the social attribution of the intent behind the causal chain of decision-making processes". Our proposed framework allows us to go a step forward in the characterization of this property. We note that the property of aligned faithfulness pertains only to interpretations, not evidence. The latter by itself has no inherent meaning: its semantics is defined by some interpretation that may or may not involve social attribution of intent to the causal chain of inference processes.

A. Faithfulness

Given an interpretation function g describing some transformation steps si within a model M's inference process, we want to be able to prove that g is faithful (at least to some extent) to the actual transformations made by M to an input x to get a prediction ŷ. Namely, we define the property of faithfulness of an interpretation, ϕi(g, e), as the extent to which an interpretation g accurately describes the behavior of some transformation function fi,mi that the model learned, mapping an output xi−1,j at si−1 into xi,z at si, making use of some instance of evidence e. Given some evidence e and its interpretation function g, we say that a related explanation is faithful to some transformation steps if the following conditions hold: (i) the evidence e has explanatory potential ϵi > 0, and (ii) the interpretation g has faithfulness ϕi > 0. Then we can define the faithfulness of an explanation (Φ) as a function of the faithfulness of the interpretation of each step involved and the related explanatory potential. For example, we could define Φ = Σ_{i∈I} ϵi ϕi, where I ⊆ [1, N] is the set of indices of the transformation steps si that involve the evidence e. Thus, the faithfulness of an explanation is the sum of the faithfulness scores of its components, i.e., the faithfulness of the interpretations of the evidence involved in the generation of the explanation. Besides, the related explanatory potential weights the faithfulness of each interpretation, following the intuition that evidence with higher ϵ should have a larger impact on the overall faithfulness score of the interpretation.

We can have various measures of faithfulness associated with different explanation types, in the same way as we have different metrics to evaluate the ability of a ML model to complete a task. Thus ϕi is implicitly bounded. When designing a faithful explanatory method, we can opt for two approaches. We can achieve faithfulness "structurally", by enforcing this property on pre-selected interpretations in model design (e.g., imposing constraints on transformation steps that limit the range of learnable functions). This direction has been recently explored by Jain et al. [30] and Jacovi & Goldberg [18]. An alternative, naive, strategy is trial-and-error: formulating interpretations and assessing their faithfulness via formal proofs or requirements-based testing using proxy tasks. While formal proofs are still missing in the current literature, a number of tests for faithfulness have been recently proposed [10]–[12], [31].
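As a purely illustrative reading of the aggregation Φ = Σ ϵi ϕi (ours, not code from the paper), the snippet below combines per-step explanatory potential and interpretation faithfulness into an explanation-level score; the normalization to [0, 1] is an added assumption for readability.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StepAssessment:
    step: int          # index i of the transformation step s_i
    epsilon: float     # explanatory potential ϵ_i of the evidence at s_i, in [0, 1]
    phi: float         # faithfulness ϕ_i of the interpretation at s_i, in [0, 1]

def explanation_faithfulness(assessments: List[StepAssessment]) -> float:
    """Aggregate Φ = Σ_i ϵ_i ϕ_i over the steps that involve the evidence,
    normalized by Σ_i ϵ_i so the score stays in [0, 1] (our convention)."""
    total_potential = sum(a.epsilon for a in assessments)
    if total_potential == 0:
        return 0.0
    return sum(a.epsilon * a.phi for a in assessments) / total_potential

# Toy usage: the evidence touches two of the model's transformation steps.
score = explanation_faithfulness([
    StepAssessment(step=2, epsilon=0.6, phi=0.8),
    StepAssessment(step=3, epsilon=0.2, phi=0.4),
])
```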
B. Plausibility

The combined value of the three above-mentioned properties of XUIs (i.e., human understandability, informativeness, and completeness) drives the plausibility of an explanation. More specifically, we define plausibility as the degree to which an explanation is aligned with the user's understanding of the model's partial or overall inner workings. Plausibility is a user-dependent property and, as such, it is subject to the user's knowledge, bias, etc. Unlike faithfulness, the plausibility of explanations can be assessed via user studies. Note that a plausible explanation is not necessarily faithful, just like a faithful explanation is not necessarily plausible. It is desirable for both properties to be satisfied in the design of some explanation. Interestingly, an unfaithful but plausible explanation may deceive a user into believing that a model behaves according to a rationale when it is actually not the case. This raises ethical concerns around the possibility that poorly designed explanations could spread inaccurate or false knowledge among the end-users. Figure 3 provides a simplified overview of the problem.

Fig. 3: Overview of the outcome on the user of the interaction between faithfulness and plausibility.

VI. FRAMING COMMON EXPLAINABILITY STRATEGIES

A. Attention

The introduction of attention mechanisms has been one of the most notable breakthroughs in DL research in recent years. Originally proposed for empowering neural machine translation tasks [9], attention is currently employed in many state-of-the-art approaches for numerous cognitive tasks. The chain of transformations in the simplest neural model making use of self-attention is a three-step causal process: (i) encoding, (ii) weighting the encodings by attention scores, and (iii) decoding into the model output. Then we can define the function learned by the model as the composition f̂ = f3,m3 ◦ f2,m2 ◦ f1,m1, where each fi for i ∈ [1, 3] corresponds to the respective transformation function in the causal chain.

Evidence. For an input x split into t sequentially related tokens, let f1,m1 be an encoder function such that f1,m1(x) = X̄ is the vector of the encoded model input tokens. Then, f2,m2(X̄) = Σ_{j=1}^{t} αj x̄j, for all model input tokens x̄j ∈ X̄, is the linear combination of the encodings weighted by their corresponding attention scores. Then eatt = {αj}_{j=1}^{t}, i.e.,

eatt = {αj | f2,m2(X̄) = Σ_{j=1}^{t} αj x̄j}    (1)

That is, the evidence eatt related to a model input is the set of weights αj produced by the attention layer. The explanatory potential ϵ(eatt) is the ratio between the number of parameters involved in the analyzed attention layer and the total number of parameters of the model.

Interpretation. The interpretation of the evidence is a function gatt(e(x, ŷ)) that describes function f3,m3, i.e., how the weighted encodings are decoded into the model conclusion.

Faithfulness. Note that we do not know the faithful interpretation function, so we hypothesize its behavior by formulating a candidate interpretation, a process that is usually guided by the researcher's intuition. In the case of attention, an interpretation generally shared among researchers is that "the value of each attention weight describes the importance of the corresponding token in the original input to the model output". Unfortunately, albeit plausible, research in this field disproved such an interpretation of attention weights [10]–[12], leaving the role of attention for explainability (if any) still unclear.
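As an illustration of eatt in Eq. (1), the toy snippet below (ours; a single-query dot-product attention flavor chosen for brevity, not the exact architecture of [9]) computes attention weights over encoded tokens and returns them as the extracted evidence.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    expz = np.exp(z)
    return expz / expz.sum()

def attention_evidence(X_enc: np.ndarray, query: np.ndarray):
    """Toy attention step f_{2,m_2}: score each encoded token against a query,
    normalize the scores into weights α_j, and return both the context vector
    (the weighted sum Σ_j α_j x̄_j) and the evidence e_att = {α_j}."""
    scores = X_enc @ query              # one relevance score per token
    alphas = softmax(scores)            # attention weights α_j, summing to 1
    context = alphas @ X_enc            # Σ_j α_j x̄_j, passed on to the decoder f_3
    return context, alphas              # alphas play the role of e_att

# Toy usage: three encoded tokens of dimension 4.
rng = np.random.default_rng(0)
X_enc = rng.normal(size=(3, 4))
context, e_att = attention_evidence(X_enc, query=rng.normal(size=4))
```

Whether ranking tokens by e_att faithfully describes the decoding step f3,m3 is precisely the interpretation that the studies cited above call into question.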
B. Grad-CAM

A popular explanation method called Grad-CAM [20] explains a prediction made by an image classifier using the information encompassed in the back-propagated gradient of the prediction. In short, Grad-CAM uses the gradient computed at the last convolutional layer of a CNN for a certain input x to assign a feature importance score to each input feature.

Evidence. The Grad-CAM evidence-extraction method ξgrad consists of using the feature activation map of a convolutional layer for a given input x to compute the neurons' importance weights αi. The explanatory potential ϵ(egrad) is related, as for the attention mechanism, to the number of parameters analyzed with respect to the total number of parameters of the model.

Interpretation. The authors of Grad-CAM claim that the computed neuron importance weights αi correspond to the parts of the input features that influence the final prediction the most.

Faithfulness. The authors measure the faithfulness of the explanation using image occlusion. That is, they patch some parts of the input to the model and measure the correlation with the difference in the final output. With this faithfulness metric, a high correlation means high faithfulness of the explanation.

C. SHAP

Lundberg & Lee in 2017 proposed SHAP [8], a method to assign an importance value to each feature used by an opaque model M to explain a single prediction ŷ. SHAP has been presented as a generalization of other well-known explanation methods, such as Local Interpretable Model-agnostic Explanations (LIME) [32], DeepLIFT [33], Layer-wise Relevance Propagation [34], and classic Shapley value estimation [8]. The SHAP values are defined via the additive explanation model:

h(z′) = β0 + Σ_{i=1}^{M} βi z′i    (2)

where z′ ∈ {0, 1}^M is a simplified version of the input x, M is the number of features used in the explanation, and βi ∈ R is a coefficient that represents the effect that the i-th feature has on the output.

Evidence. The only evidence eshap = ξshap(M, x) used by SHAP is the set of predictions made by the classifier in a neighborhood of x. To compute the explanatory potential ϵ(eshap), we can use the ratio of the predictions employed to compute the SHAP values with respect to the total number of possible samples in the countable (and possibly infinite) neighborhood of x. Thus, the greater the number of predictions we have, the higher the explanatory potential of the method.

Interpretation. The interpretation gshap of the evidence proposed by SHAP is that, given eshap, we can locally reproduce the behavior of a complex unknown model with a simple additive model h(·), and by analyzing h(·) we can get a local explanation Eshap of the behavior of the initial model. That is, the proposed interpretation of the evidence results from the optimization problem underlying Equation 2.

Faithfulness. Even though the authors do not directly present a measure of the faithfulness of the explanation, they provide three desirable properties, namely (i) local accuracy, (ii) missingness, and (iii) consistency. The authors showed that their method is the only one that satisfies all these properties, assessing a requirements-based form of faithfulness as described in Section V.
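To make the interpretation gshap concrete, here is a minimal sketch (ours; a simplified LIME/Kernel-SHAP-style surrogate with an unweighted least-squares fit, not the exact SHAP algorithm of [8]) that collects predictions in a neighborhood of x as evidence and fits the additive model of Eq. (2) to them.

```python
import numpy as np
from itertools import product

def local_additive_surrogate(model, x, background, n_features):
    """Fit h(z') = β0 + Σ_i β_i z'_i on the model's predictions over perturbed
    versions of x, where z'_i = 1 keeps feature i and z'_i = 0 replaces it with
    a background value. The predictions are the evidence e_shap; the fitted β_i
    are read as local feature effects."""
    masks = np.array(list(product([0, 1], repeat=n_features)))   # all coalitions (small n only)
    X_masked = np.where(masks == 1, x, background)               # perturbed neighborhood of x
    y = np.array([model(row) for row in X_masked])               # evidence: local predictions
    design = np.hstack([np.ones((len(masks), 1)), masks])        # [1, z'] columns for β0 and β_i
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)            # least-squares fit of h(·)
    return beta[0], beta[1:]                                     # β0 and the per-feature effects

# Toy usage on a transparent model standing in for the opaque M.
model = lambda v: 2.0 * v[0] + 0.5 * v[1] - 1.0 * v[2]
beta0, effects = local_additive_surrogate(model, x=np.array([1.0, 2.0, 3.0]),
                                          background=np.zeros(3), n_features=3)
```

The real SHAP formulation additionally weights the coalitions so that the fitted coefficients coincide with Shapley values; the sketch only conveys the "local additive surrogate" reading of the evidence.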
D. Linear regression models

Linear regression models are not an explanation method, but they are normally considered intrinsically interpretable. Following our proposed framework, we claim that defining them, among other models, as intrinsically interpretable is inaccurate and often misleading. In fact, the notion of what is simple for humans to interpret is not well-defined, and we can enumerate various examples of models that are easy to interpret for a practitioner but are almost black boxes for non-expert users. A linear regressor f̂lin(·) is typically formulated as:

f̂lin(x) = β0 + Σ_{i=1}^{N} βi x′i    (3)

where the βi are the weights of the learned features and N is the feature space dimension.

Evidence. The implicit assumption behind claiming that a linear model is intrinsically interpretable is that the weights βi, 1 ≤ i ≤ N, are a good explanation for the model. Thus elin = {βi}_{i=1}^{N}. With a linear model, we have the maximum explanatory potential ϵ(elin), because with elin we can fully describe the model.

Interpretation. Assuming a normalization of the features, we can say that the higher the value of βi, the higher the contribution of the feature xi to the model prediction.

Faithfulness. There are no doubts about the faithfulness of the interpretation of the predictions given the normalization assumption, and in fact a linear model is normally considered an intrinsically interpretable method. However, in a real scenario, its plausibility to a non-expert user is not guaranteed.
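A small sketch of this reading of elin (ours, using scikit-learn on synthetic data; the normalization step that the interpretation relies on is made explicit through standardization):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: the second feature is constructed to matter most.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 0] + 3.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Standardize features so that comparing coefficient magnitudes is meaningful.
X_std = StandardScaler().fit_transform(X)
reg = LinearRegression().fit(X_std, y)

e_lin = reg.coef_   # the evidence: one weight β_i per (standardized) feature
# Interpretation: larger |β_i| implies a larger contribution of feature i to the prediction.
ranking = np.argsort(-np.abs(e_lin))
```

Without the standardization step, comparing raw coefficients across features with different scales would not support this interpretation, which is exactly the normalization assumption stated above.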
E. Fuzzy models

Fuzzy models, especially in the form of Fuzzy Rule-Based Systems (FRBSs), represent effective tools for the modeling of complex systems by using a human-comprehensible linguistic approach. Thanks to these characteristics, they are generally considered white or gray boxes and are often mentioned as good options for interpretable AI [35]. FRBSs perform their inference (i.e., calculate a conclusion) by exploiting a knowledge base composed of linguistic terms and rules. A fuzzy rule is usually expressed as a sentence in the form:

IF <antecedent> THEN <consequent>    (4)

where the antecedent is a logic formula created by concatenating clauses like "X IS a" with some logical operators, where X is a linguistic variable (associated with one input feature) and a is a linguistic term. Thanks to this representation, the antecedent of each rule gives an intuitive and human-understandable characterization of some class/group.

Evidence. The rules are good evidence for a large part of the model: they characterize the feature space by using a self-explanatory formalism that can be read and validated by human operators.

Interpretation. The fuzzy sets, which are used to create the fuzzy terms and evaluate the satisfaction of the antecedents, have self-explanatory interpretations: they define how much a value belongs to a given set by means of membership functions. The fuzzy rules are also self-explanatory. The only part that requires a proper interpretation is the output calculation function. In the case of Sugeno reasoning, such functions can be seen as linear regression models, hence all considerations discussed in Section VI-D remain valid also in the case of fuzzy models.

Faithfulness. Similarly to the case of linear regression models, there are no doubts about the faithfulness of the interpretation of the predictions given a normalization step. However, in the case of special transformations (e.g., log-transformation), some of the intrinsic interpretability might be lost in favor of a better fit to the training data [35]. Since features in biomedicine (see, e.g., clinical parameters) often follow a log-normal distribution, such transformations are very frequent and delicate.
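For illustration, a minimal sketch (ours, with hypothetical membership functions and illustrative value ranges; a real FRBS would be built with a dedicated library and a full rule base) of how a single rule in the form of Eq. (4) is evaluated:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: degree to which x belongs to a fuzzy set."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Linguistic terms for two clinical-style variables (illustrative ranges, not medical advice).
is_high_glucose = lambda g: trapezoid(g, 100, 126, 300, 400)   # "glucose IS high"
is_low_bmi      = lambda b: trapezoid(b, 10, 12, 18.5, 20)     # "BMI IS low"

def rule_at_risk(glucose, bmi):
    """IF glucose IS high AND bmi IS low THEN risk IS elevated.
    The AND is realized with the min t-norm; the returned firing strength is the
    degree to which the antecedent is satisfied, i.e., readable evidence."""
    return min(is_high_glucose(glucose), is_low_bmi(bmi))

firing_strength = rule_at_risk(glucose=140, bmi=19)
```

The rule text itself is the evidence a human operator can read and validate; only the aggregation of rule outputs into a final conclusion needs a further interpretation, as discussed above.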
VII. ON THE IMPACT ON BIOMEDICINE

AI models have revolutionized the field of biomedicine, enabling advanced analyses, predictions, and decision-making processes. However, the increasing complexity of AI models, such as deep learning neural networks, has raised concerns regarding their lack of explainability. The present paper provides a common ground on theoretical notions of explainability, as a pre-requisite to a principled discussion and evaluation. The final aim is to catalyze a coherent examination of the impact of AI and explainability by highlighting its significance in facilitating trust and regulatory compliance and in accelerating research and development. Such a topic is particularly well-suited to high-stake environments, such as biomedicine.

The use of black-box machine learning models in the biomedical field has been steadily increasing, with many researchers relying on them to make predictions and gain insights from complex datasets. However, the opacity of these models poses a challenge to their explainability, leading to the question of what criteria should be used to evaluate their explanations [36]. Two key criteria that have been proposed are faithfulness and plausibility. Faithfulness is important because it allows researchers to understand how the model arrived at its conclusions, identify potential sources of error or bias, and ultimately increase the trustworthiness of the model. In the biomedical field, this is particularly important: the stakes are high when it comes to making accurate predictions about patient outcomes, and the decision maker wants the most accurate motivation behind the model's suggestions. On the other hand, while a faithful explanation may accurately reflect the model's inner workings, it may not be understandable or useful to stakeholders who lack expertise in the technical details of the model. Plausibility is therefore important so that all the people involved in the decision-making process can gain insights from the model that they can act upon, and can communicate these insights to other stakeholders in a way that is meaningful and actionable. Both faithfulness and plausibility are important criteria for evaluating explanations of black-box machine-learning models in the biomedical field. Researchers should strive to balance these criteria to provide the best explanations to the stakeholders involved; this can be achieved by involving them during the entire development of the AI models and tools.

Beyond the need for faithfulness and plausibility, the concrete impact of explainability on AI for biomedicine is broad. Explainability plays a crucial role in establishing trust between healthcare professionals and AI systems. In critical biomedical applications, such as disease diagnosis, treatment recommendation, and patient monitoring, transparency in decision-making is essential for clinicians to make informed decisions. By providing interpretable explanations, AI models can help healthcare professionals understand the reasoning behind the model's predictions, leading to increased trust and acceptance [37]. Moreover, regulatory bodies of biomedicine, such as the Food and Drug Administration (FDA) in the United States, require transparency and accountability for AI models used in clinical decision-making. XAI techniques provide an opportunity to meet regulatory standards by enabling model auditing and validation. Through explainability, clinicians and regulatory bodies can assess the risk associated with the deployment of AI models, ensuring patient safety and compliance with ethical guidelines [38]. Finally, explainability can significantly enhance the research and development process in biomedicine. By uncovering the underlying factors and features that contribute to an AI model's decision, researchers can gain valuable insights into disease mechanisms, biomarkers, and potential therapeutic targets.

VIII. CONCLUSIONS AND FUTURE WORK

In this work, we propose a novel theoretical framework that brings order and opportunities for a better design of explanations to the XAI community by introducing formal terminology. The framework allows dissecting explanations into evidence (factual data coming from the model) and interpretation (a hypothesized function that describes how the model uses the evidence). The explanation is the product of the application of the interpretation to the evidence and is presented to the target user via some form of explanation interface. These components allow for designing more principled explanations by defining their atomic components and the properties that enable them. There are three core properties: (i) the explanatory potential of the evidence (i.e., how much of the model the evidence can tell about); (ii) the faithfulness of the interpretation (i.e., whether the interpretation is true to the decision-making of the model); (iii) the plausibility of the explanation interface (i.e., how much the explanation makes sense to the user and is intelligible). We show that the theoretical framework can be applied to explanations coming from a variety of methods, which fit the atomic components we propose. The lesson learned from analyzing explanations through the lens of our proposed framework is that humans (both stakeholders and researchers) should be involved in the design of explainability as soon as possible in the AI-powered software design process, especially in sensitive application domains like biomedicine, where a blind application of black-box approaches hampers the right to an explanation. Involving stakeholders allows for a proper filling of each component in the theoretical framework of explainability and informs model design. The top-down approach that is established this way propels the human understanding of how AI (and ML in particular) works, possibly fostering user trust in the system. We believe that high-stake decision-making domains such as biomedicine are those which will benefit the most from a more rigorous definition of the core concepts of explainability, with opportunities to cement a conscious aid of AI-assisted decisions. Therefore, in future work, we want to apply our theoretical study to a real-case scenario in the biomedical sector and analyze its implementation with the help of human feedback, to better focus on the plausibility analysis of our theoretical framework.
REFERENCES

[1] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nat. Mach. Intell., vol. 1, no. 5, pp. 206–215, May 2019.
[2] S. Kundu, "AI in medicine must be explainable," Nature Medicine, vol. 27, no. 8, pp. 1328–1328, Aug. 2021.
[3] B. Goodman and S. Flaxman, "European Union regulations on algorithmic decision-making and a 'right to explanation'," AI Magazine, vol. 38, no. 3, pp. 50–57, Oct. 2017.
[4] C. Chen, O. Li, C. Tao, A. J. Barnett, J. Su, and C. Rudin, "This looks like that: Deep learning for interpretable image recognition," in Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2019.
[5] Q. Zhang, Y. N. Wu, and S.-C. Zhu, "Interpretable convolutional neural networks," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8827–8836.
[6] B.-J. Hou and Z.-H. Zhou, "Learning with interpretable structure from gated RNN," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 7, pp. 2267–2279, Jul. 2020.
[7] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. New York, NY, USA: Association for Computing Machinery, Aug. 2016, pp. 1135–1144.
[8] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS'17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 4768–4777.
[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1409.0473
[10] S. Jain and B. C. Wallace, "Attention is not explanation," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019.
[11] S. Wiegreffe and Y. Pinter, "Attention is not not explanation," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019, pp. 11–20.
[12] S. Serrano and N. A. Smith, "Is attention interpretable?" in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, Jul. 2019, pp. 2931–2951.
[13] M. Graziani, L. Dutkiewicz, D. Calvaresi, J. Amorim, K. Yordanova, M. Vered, R. Nair, P. Henriques Abreu, T. Blanke, V. Pulignano, J. Prior, L. Lauwaert, W. Reijers, A. Depeursinge, V. Andrearczyk, and H. Müller, "A global taxonomy of interpretable AI: unifying the terminology for the technical and social sciences," Artificial Intelligence Review, Sep. 2022.
[14] M.-A. Clinciu and H. Hastie, "A survey of explainable AI terminology," in Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019). Association for Computational Linguistics, 2019, pp. 8–13. [Online]. Available: https://aclanthology.org/W19-8403
[15] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu, "Definitions, methods, and applications in interpretable machine learning," Proceedings of the National Academy of Sciences, vol. 116, no. 44, pp. 22071–22080, Oct. 2019. [Online]. Available: https://www.pnas.org/doi/abs/10.1073/pnas.1900654116
[16] L. Arbelaez Ossa, G. Starke, G. Lorenzini, J. E. Vogt, D. M. Shaw, and B. S. Elger, "Re-focusing explainability in medicine," Digital Health, vol. 8, p. 20552076221074488, Feb. 2022.
[17] A. Jacovi and Y. Goldberg, "Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?" in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, Jul. 2020, pp. 4198–4205. [Online]. Available: https://aclanthology.org/2020.acl-main.386
[18] A. Jacovi and Y. Goldberg, "Aligning faithful interpretations with their social attribution," Trans. Assoc. Comput. Linguist., vol. 9, pp. 294–310, Mar. 2021.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[20] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
[21] B. Mittelstadt, C. Russell, and S. Wachter, "Explaining explanations in AI," in Proceedings of the Conference on Fairness, Accountability, and Transparency, ser. FAT* '19. New York, NY, USA: Association for Computing Machinery, Jan. 2019, pp. 279–288.
[22] C. Molnar, Interpretable Machine Learning, 2nd ed. Independently published, 2022. [Online]. Available: https://christophm.github.io/interpretable-ml-book
[23] L. Chazette and K. Schneider, "Explainability as a non-functional requirement: challenges and recommendations," Requirements Engineering, vol. 25, no. 4, pp. 493–514, Dec. 2020.
[24] B. Li, P. Qi, B. Liu, S. Di, J. Liu, J. Pei, J. Yi, and B. Zhou, "Trustworthy AI: From principles to practices," ACM Comput. Surv., Aug. 2022.
[25] Z. C. Lipton, "The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery," Queue, vol. 16, no. 3, pp. 31–57, Jun. 2018.
[26] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi, "A survey of methods for explaining black box models," ACM Comput. Surv., vol. 51, no. 5, Aug. 2018.
[27] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu, "Definitions, methods, and applications in interpretable machine learning," Proc. Natl. Acad. Sci. U. S. A., vol. 116, no. 44, pp. 22071–22080, Oct. 2019.
[28] S. T. Mueller, R. R. Hoffman, W. J. Clancey, A. Emrey, and G. Klein, "Explanation in human-AI systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI," CoRR, vol. abs/1902.01876, 2019. [Online]. Available: http://arxiv.org/abs/1902.01876
[29] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artif. Intell., vol. 267, pp. 1–38, 2019. [Online]. Available: https://doi.org/10.1016/j.artint.2018.07.007
[30] S. Jain, S. Wiegreffe, Y. Pinter, and B. C. Wallace, "Learning to faithfully rationalize by construction," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, Jul. 2020, pp. 4459–4473. [Online]. Available: https://aclanthology.org/2020.acl-main.409
[31] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, "Sanity checks for saliency maps," in Proceedings of the 32nd International Conference on Neural Information Processing Systems, ser. NIPS'18. Red Hook, NY, USA: Curran Associates Inc., Dec. 2018, pp. 9525–9536.
[32] M. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 97–101. [Online]. Available: https://aclanthology.org/N16-3020
[33] A. Shrikumar, P. Greenside, and A. Kundaje, "Learning important features through propagating activation differences," in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. Sydney, NSW, Australia: PMLR, 2017, pp. 3145–3153.
[34] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLoS One, vol. 10, no. 7, p. e0130140, Jul. 2015.
[35] C. Fuchs, S. Spolaor, U. Kaymak, and M. S. Nobile, "The impact of variable selection and transformation on the interpretability and accuracy of fuzzy models," in 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, 2022, pp. 1–8.
[36] C. Combi, B. Amico, R. Bellazzi, A. Holzinger, J. H. Moore, M. Zitnik, and J. H. Holmes, "A manifesto on explainability for artificial intelligence in medicine," Artif. Intell. Med., vol. 133, p. 102423, Oct. 2022.
[37] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 1721–1730. [Online]. Available: https://doi.org/10.1145/2783258.2788613
[38] H. Suresh and J. Guttag, A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle. New York, NY, USA: Association for Computing Machinery, 2021. [Online]. Available: https://doi.org/10.1145/3465416.3483305
