
CHAPTER 1

Introduction
Jenny Benois-Pineau (a) and Dragutin Petkovic (b)
(a) Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
(b) San Francisco State University, San Francisco, CA, United States

Chapter points
• In this chapter we first give the rationale for the book.
• The content of each chapter is then briefly introduced.

We are witnessing the emergence of an “AI economy and society” where AI technolo-
gies are increasingly impacting many aspects of business as well as everyday life. We read
with great interest about the recent advances in AI medical diagnostic systems, self-
driving cars, the ability of AI technology to automate many aspects of business decisions
like loan approvals, hiring, policing, etc. However, as evidenced by recent experience, AI
systems may produce errors, can exhibit overt or subtle bias, may be sensitive to noise
in the data, and often lack technical and judicial transparency and explainability. These
shortcomings have been documented in the scientific literature, but also, importantly,
in the general press (accidents with self-driving cars, biases in AI-based policing, hir-
ing, and loan systems, biases in face recognition systems for people of color, seemingly
correct medical diagnoses later found to be made due to wrong reasons, etc.). These
shortcomings are raising many ethical and policy concerns not only in technical and
academic communities, but also among policymakers and the general public, and will in-
evitably impede wider adoption of this potentially very beneficial technology. These
broad concerns about AI are often grouped under an umbrella area of “AI Ethics and
Values” or “Trustworthy AI.” The technical community, potential adopters, popular
media, as well as political and legal stakeholders, have recognized the problem and have
begun to seek and ask for solutions. Many very high level political and technical bodies
(e.g., the G20, EU expert groups, and the Association for Computing Machinery in the USA), as
well as major AI companies, have identified Trustworthy AI as a critical issue. The tech-
nical and academic communities are also addressing these issues through dedicated conferences,
workshops, and academic centers. Governing bodies are issuing guidance and
even laws such as the EU GDPR, with “right to know” clauses for AI-driven decisions.
The problems related to Trustworthy AI are complex and broad, and encompass not
only technical issues but also legal, political, and ethical ones. Trustworthy AI concerns
seem to have some common components, such as bias/discrimination, need for human
understanding and control, ethics, as well as judicial and human-centric transparency
and explainability. It is also a common belief that better methods for auditing and even
certification of AI systems for Trustworthiness are going to be critical for its broader
adoption in business and society.
One way to tackle broad problems like Trustworthy AI is to first work on its key
components. We hence devote this book to AI explainability, which we believe is
the critical issue in achieving Trustworthy AI – after all, how can one audit/certify,
trust, and legally defend (that will come, too!) something one does not understand?
More specifically, this book addresses explainability of Deep Learning (DL) AI tech-
nologies, which are very popular and powerful, but at the same time notoriously hard
to explain. At a high level, explainability of AI systems should allow human users to gain
insights into how they make their decisions. The human users of Explainable AI have to
include not only AI experts but also domain experts (often the key adopters), as well as
general public. AI explainability can be model based, where it offers insights into how the
AI system works on a collection of data as a whole (e.g., the whole training database),
and sample based, where it offers insights into how the trained Machine Learning sys-
tem classifies a specific data sample (“why was I rejected for the loan?”). The latter has
been shown to be critical in determining user trust in AI systems. AI explainers can be
agnostic to the AI methods they try to explain, or direct (tied to a specific AI algorithm). AI
systems that are explainable achieve many benefits, e.g., they could increase user trust
and adoption, improve AI quality control and maintenance, make them legally more
transparent, and possibly even offer new insights into the analyzed domain. The need
for explainability does not exclude the usefulness of black-box models since they are
often tried first and may serve, among other things, to point to ultimate achievable ac-
curacy. Explainable systems can also be used for development, verification, and testing
of AI systems, including otherwise hard-to-explain black-box systems.
Challenges in achieving explainability of DL AI systems are very significant. Due
to their structure (multiple layers with a very large number of weighted
connections), they are extremely hard to explain compared to, for example, powerful
ensemble tree-based systems such as Random Forests. The challenges for explainable DL
include algorithms to extract explainable information from trained DL systems, ways
to present and visualize explanation information such that it is understandable to target
human users (experts or not) at the model as well as sample level, and methods for
evaluation of such systems which have to involve target human users, thus making them
time-consuming and expensive.
This book covers the latest and novel contributions to all aspects of explainable
DL, from area overview to specific approaches, including comprehensive evaluations
of usefulness for human users. It comprises 13 contributing chapters which present
methodological aspects, together with application examples.
Chapter 2 introduces the reader to the concepts, methods, and recent developments
in the field of explainable AI. After discussing what it means to “explain” in the context
of machine learning and presenting useful desiderata for explanations, a brief overview
of established explanation techniques is provided, mainly focusing on the so-called attri-
bution methods which explain the model’s decisions by post hoc assigning importance
scores to every input dimension (e.g., pixels in images). Furthermore, recent develop-
ments in XAI are discussed. In addition, the author presents ways to use explanations
to effectively prune, debug, and improve a given model, as well as the concept of neu-
ralization which easily allows one to transfer XAI methods specifically developed for
neural network classifiers to other types of models and tasks. The chapter concludes
with a discussion of the limits of current explanation methods and promising future
research directions.
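To make the idea of attribution concrete, here is a minimal sketch of one common attribution scheme, gradient × input saliency, for an image classifier. It is an illustrative example only, not one of the specific techniques surveyed in Chapter 2; the pretrained ResNet-18 and the random placeholder image are assumptions made to keep the snippet self-contained.

```python
import torch
import torchvision.models as models

# Minimal sketch of one common attribution method (gradient x input saliency).
# Assumption: a pretrained torchvision ResNet-18 stands in for "the model to explain".
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def gradient_x_input(image, target_class):
    """Assign an importance score to every input pixel for `target_class`."""
    image = image.clone().requires_grad_(True)       # track gradients w.r.t. the input
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()                                  # d(score) / d(pixel)
    attribution = (image.grad * image).sum(dim=0)     # collapse RGB channels into one heatmap
    return attribution.detach()

# Usage with a random placeholder image (a real, normalized image would be used in practice).
x = torch.rand(3, 224, 224)
pred = model(x.unsqueeze(0)).argmax(dim=1).item()
heatmap = gradient_x_input(x, pred)                   # (224, 224) map of importance scores
```

The resulting heatmap assigns an importance score to every pixel, which is exactly the notion of attribution discussed above.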
In Chapter 3 the authors make use of information visualization approaches to present
a method to support the interpretability of trained deep neural networks. By depicting
both their architecture and the way they process the different classes of a test input
dataset, the moment (i.e., the layer) at which classes become discriminated is made vis-
ible. Similarly, they show how it is possible to detect degradation in the classification
process, mainly due to the use of an oversized network for a simpler classification
task. The authors conduct their experiments on well-known deep NN architectures,
such as the LeNet5 and VGG16 networks, using the popular MNIST and Fashion-
MNIST datasets. Results show a progressive classification process and demonstrate how
an oversized network applied to a simpler task can be improved.
In Chapter 4 the authors address the family of explanation methods by perturba-
tion. These methods are based on evaluating the effect that perturbations applied to
the input induce on the model’s output. The authors are particularly
interested in image classification. In this chapter, they discuss the limitations of the exist-
ing perturbation-based approaches and contribute with a perturbation-based attribution
method guided by semantic segmentation of images. This method inhibits specific im-
age areas according to their semantic meaning, associated with an assigned semantic label.
In this way, perturbations are linked to this semantic meaning and a complete attri-
bution map is obtained for all image pixels. The potential capabilities of this attribution
scheme are exemplified by automatically arranging image areas into semantic sets ac-
cording to their relevance, irrelevance, and distracting potential for scene recognition of
the semantic label assigned to them.
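As a rough illustration of the general idea of semantically guided perturbation (a simplified sketch, not the authors' exact algorithm), the snippet below masks all pixels carrying a given semantic label and records the resulting change in the classifier's score; the `model`, `image`, and `segmentation` inputs are assumed to be supplied by the user.

```python
import torch

def semantic_region_relevance(model, image, segmentation, target_class, fill_value=0.0):
    """Sketch: for each semantic label in `segmentation` (an H x W tensor of label ids),
    inhibit the corresponding pixels of `image` (a C x H x W tensor) and record how much
    the score of `target_class` drops. Large drops suggest relevant regions; negative
    drops suggest distracting ones. Illustration only, not the method of Chapter 4."""
    model.eval()
    with torch.no_grad():
        base = model(image.unsqueeze(0))[0, target_class].item()
        relevance = {}
        for label in segmentation.unique().tolist():
            mask = segmentation == label                  # pixels carrying this semantic label
            perturbed = image.clone()
            perturbed[:, mask] = fill_value               # inhibit the whole semantic region
            score = model(perturbed.unsqueeze(0))[0, target_class].item()
            relevance[label] = base - score               # score drop caused by the region
    return relevance
```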
Chapter 5 presents a further development of the recently introduced Feature Under-
standing Method (FEM), which explains DNN decisions, again in the context of image
classification tasks. The authors call the method “Modified FEM.” It belongs to the
family of so-called “white-box” methods. It is based on the analysis of the features
in the last convolutional layer of the network, with further backpropagation to identify
the image pixels that contributed most to the decision. The method explains
the decision of a trained network for a specific sample image. The application exam-
ple of the method is the explanation of a network trained for the classification of chest
X-ray images for the recognition of COVID-19 disease. This chapter illustrates the inten-
sive ongoing research in the XAI field, despite the relatively rich set of methods that have
already been developed.
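As a loose sketch of the general idea behind last-convolutional-layer methods of this kind (thresholding unusually strong activations and projecting them back to image resolution), the snippet below is illustrative only and should not be read as the authors' Modified FEM; the feature tensor is assumed to have been extracted from the last convolutional layer beforehand.

```python
import torch
import torch.nn.functional as F

def last_conv_saliency(feature_maps, image_size, k=1.0):
    """Loose sketch, not the authors' Modified FEM: keep only unusually strong activations
    in each channel of the last convolutional layer, aggregate them with channel weights,
    and upsample the result to image resolution as a saliency map.

    feature_maps: (C, H, W) tensor taken from the last convolutional layer.
    image_size:   (height, width) of the input image.
    k:            how many standard deviations above the channel mean counts as "strong".
    """
    mean = feature_maps.mean(dim=(1, 2), keepdim=True)
    std = feature_maps.std(dim=(1, 2), keepdim=True)
    strong = (feature_maps > mean + k * std).float()          # binary maps of strong features
    weights = feature_maps.mean(dim=(1, 2))                   # weight channels by mean activation
    saliency = (strong * weights[:, None, None]).sum(dim=0)   # aggregate over channels
    saliency = saliency / (saliency.max() + 1e-8)             # normalize to [0, 1]
    return F.interpolate(saliency[None, None], size=image_size,
                         mode="bilinear", align_corners=False)[0, 0]
```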
Chapter 6 is an example of the use of XAI methods in a clinical field. The authors
do not propose new explainers. Instead, they use methods previously proposed in the
literature, such as Backpropagation, Guided Backpropagation, and Layerwise Relevance
Propagation. They hypothesize that the agreement across these methods is an indication
of the robustness of the results in the problem of stratification of multiple sclerosis from
brain images. The voxels of the input data most involved in the classification decision
are identified and their association with clinical scores is assessed, potentially bringing
to light brain regions which might reveal disease signatures. Indeed, the presented results
highlight regions such as the Parahippocampal Gyrus, among others, showing both
high stability across the three visualization methods and a significant correlation with
the Expanded Disability Status Scale (EDSS) score, supporting the neuro-
physiological plausibility of these findings.
In Chapter 7, as in Chapter 4, the authors focus on perturbation-based methods, which
apply some modifications to the initial signal (most often in some neighborhood around
it) and measure how the classification result changes. They propose a method that recur-
sively hides image parts to obtain an explanation for a particular decision of a black-box
classifier. The core of this method is the division of the image being classified into sep-
arate rectangular parts, followed by the analysis of their influence on the classification
result. Such divisions are repeated recursively until the explanation of the classification
result is found or the size of the parts becomes too small. As a result, a pair of images with com-
plementary hidden parts is discovered: the first image preserves both the most valuable
parts and the classification result of the initial image; the second shows the result of
hiding these most valuable parts, which leads to a different classification.
Such a representation allows humans to see which parts of the image are significant for
the particular decision and confirms that classification is not successful without these
parts.
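The following simplified sketch conveys the flavor of such a recursive occlusion search (an illustration under assumptions, not the authors' exact algorithm): the current region is split into four rectangles, each is hidden in turn, and the search descends into the rectangle whose occlusion lowers the target-class score the most, until the region becomes too small.

```python
import torch

def recursive_occlusion(model, image, target_class, min_size=16, fill_value=0.0):
    """Simplified illustration (not the exact method of Chapter 7): repeatedly split the
    current region into four rectangles, hide each one in turn, and descend into the
    rectangle whose occlusion lowers the target-class score the most, stopping once the
    sub-parts would be smaller than `min_size` pixels. Returns (top, left, height, width)
    of the most influential small region."""
    model.eval()

    def score(img):
        with torch.no_grad():
            return model(img.unsqueeze(0))[0, target_class].item()

    base = score(image)

    def recurse(region):
        top, left, h, w = region
        if h < 2 * min_size or w < 2 * min_size:
            return region                                   # sub-parts would be too small: stop
        best_drop, best_sub = None, None
        for dt, dl in [(0, 0), (0, w // 2), (h // 2, 0), (h // 2, w // 2)]:
            sub = (top + dt, left + dl, h // 2, w // 2)
            occluded = image.clone()
            occluded[:, sub[0]:sub[0] + sub[2], sub[1]:sub[1] + sub[3]] = fill_value
            drop = base - score(occluded)                   # effect of hiding this part
            if best_drop is None or drop > best_drop:
                best_drop, best_sub = drop, sub
        return recurse(best_sub)                            # zoom into the most influential part

    return recurse((0, 0, image.shape[1], image.shape[2]))
```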
Chapter 8 focuses on the importance of trained convolutional filters in convolu-
tional neural networks. By removing some of them, the authors track classification results
and thus also explain the network. Filter removal can even lead to increased accuracy,
which explains the chapter title “Remove to improve.”
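As a rough illustration of this kind of filter ablation (not the chapter's specific procedure), one can zero out a chosen filter of a convolutional layer, re-evaluate accuracy, and restore the weights; the `model`, `conv_layer`, and `eval_accuracy` pieces below are assumed to be supplied by the user.

```python
def accuracy_without_filter(model, conv_layer, filter_index, eval_accuracy):
    """Rough illustration of filter ablation (not the chapter's exact procedure):
    temporarily zero one filter of `conv_layer` (an nn.Conv2d belonging to `model`),
    re-evaluate accuracy with a user-supplied `eval_accuracy(model)` function, then
    restore the original weights."""
    saved_weight = conv_layer.weight.data[filter_index].clone()
    saved_bias = None
    if conv_layer.bias is not None:
        saved_bias = conv_layer.bias.data[filter_index].clone()
        conv_layer.bias.data[filter_index] = 0.0
    conv_layer.weight.data[filter_index] = 0.0           # "remove" the filter

    accuracy = eval_accuracy(model)                       # accuracy of the pruned network

    conv_layer.weight.data[filter_index] = saved_weight   # put the filter back
    if saved_bias is not None:
        conv_layer.bias.data[filter_index] = saved_bias
    return accuracy

# Filters whose removal raises accuracy are candidates to "remove to improve";
# filters whose removal hurts accuracy most are the ones the network relies on.
```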
Chapter 9 is interesting in that it combines rule-based methods, which are
naturally explainable prediction models, with CNNs. Rule mining techniques such
as SBRL, Rule Regularization, and Gini Regularization are adapted to produce an
interpretable surrogate model, which imitates the behavior of the CNN classifier on
time series data. This makes the decision process more comprehensible for humans. In
the experiments the authors evaluate the three methods on a classification task trained
on a real-world AIS dataset and compare the performance of each method in terms of
metrics such as F1-score, fidelity, and the number of rules in the surrogate list. In addition,
they illustrate the impact of the support and bins (number of discrete intervals) parameters
on the model’s performance, along with the strengths and weaknesses of each method.
Chapter 10 addresses the extremely important and still open research question of the
evaluation of explanation methods. It presents two protocols to compare different XAI
methods. The first protocol is designed to be applied when no end users are available:
an objective, quantitative metric is compared to an objective analysis by data experts.
The second experiment is designed to take end users’ feedback into account: it uses the
quantitative metric applied in the first experiment and compares it to user preferences.
The quantitative metric can then be used to evaluate explanations, allowing multiple
explanation methods to be tried and adjusted without the cost of systematic
end-user evaluation. The protocol can be applied to post hoc explanation approaches
as well as to self-explaining neural networks. The application examples are natural
language processing tasks.
Chapter 11 is application oriented in the field of cybersecurity. As machine learning
approaches are used for malware detection, XAI approaches allow the features most
characteristic of malware to be detected in both static and dynamic analysis frameworks.
From this perspective, the authors focus on such challenges and the
potential uses of explainability techniques in the context of Android ransomware, which
represents a serious threat to mobile platforms. They present an approach that enables
the identification of the most influential features and the analysis of ransomware. Their
results suggest that the proposed methods can help cyber threat intelligence teams in
the early detection of new ransomware families and could be extended to other types
of malware.
The authors of Chapter 12 are interested in explainability in medical image caption-
ing tasks. Image captioning is the task of describing the content of an image using a
textual representation. It has been used in many applications such as semantic tag-
ging, image retrieval, early childhood learning, human-like robot–robot interactions,
visual question answering tasks, and medical diagnosis. In the medical field, automatic
captioning assists medical professionals in diagnosis, disease treatment, and follow-up
recommendations. As a general trend, deep learning techniques have been developed
for the image captioning task. They rely on encoder–decoder models, which are black boxes
composed of two components cooperating to generate new captions for images. Image captioning
builds a bridge between natural language processing and image processing, making it
difficult to understand the correspondence between visual and semantic features. The
authors present an explainable approach that provides a sound interpretation of the
attention-based encoder–decoder model for image captioning. It provides a visual link
between regions of the medical image and the corresponding wording in the generated
sentence. The authors evaluate the performance of the model and provide samples from
the ImageCLEF medical captioning dataset.
Chapter 13 is the second chapter that explicitly tackles the question of the evalua-
tion of explainers. It elaborates on some key ideas and studies designed to
provide post hoc explanations-by-example for the problem of explaining the predictions
of black-box deep learning systems. With a focus on image and time series data, the
authors review several recent explanation strategies – using factual, counterfactual, and
semifactual explanations – for Deep Learners. Several novel evaluations of these methods
are reported, showing how well these methods work, along with representative outputs
that are produced. The chapter also profiles the user studies being carried out on these
methods, discussing the pitfalls that arise and the results found.
Up to now, a wide variety of XAI methods have been proposed. It is not always
obvious to an AI practitioner which of them is the most applicable to the problem at
hand, as the theoretical analysis of these methods and the conditions of their applicabil-
ity remain rare. One of the popular methods in the family of black-box XAI approaches
is Locally Interpretable Model-agnostic Explanation (LIME). It consists in masking pre-
segmented regions in the data (images) and tracking the score changes, thus determining
the regions important for classification. The exact operation of LIME, though, is often
overlooked by practitioners, sometimes at great expense. One example in particular
is the nature of the sampling process, which can lead to misleading explanations if not
adapted to the data at hand. Another point is the role of LIME’s hyperparameters, on
which the provided explanations depend in a complicated manner. In Chapter 14, the
author summarizes some recent theoretical work on LIME with the goal of
providing useful insights into the method.
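For readers unfamiliar with the mechanics, the following is a minimal LIME-style sketch written from scratch (it does not use the official `lime` package and is not the analysis of Chapter 14): pre-segmented regions are randomly switched off, the black-box classifier is queried, and a locally weighted linear model scores each region; the simple distance kernel and parameter names are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_like_explanation(image, segments, classifier_fn, target_class,
                          num_samples=500, fill_value=0.0, kernel_width=0.25):
    """Minimal LIME-style sketch (written from scratch, not the official `lime` package):
    randomly switch pre-segmented regions on or off, query the black-box classifier, and
    fit a locally weighted linear model whose coefficients score each region's importance.

    image:         H x W x C numpy array.
    segments:      H x W integer array of region ids (e.g., superpixels).
    classifier_fn: maps a batch of images (N x H x W x C) to class probabilities (N x K).
    """
    region_ids = np.unique(segments)
    n_regions = len(region_ids)

    # Binary interpretable representation: which regions are kept in each perturbed sample.
    masks = np.random.randint(0, 2, size=(num_samples, n_regions))
    masks[0, :] = 1                                       # keep the unperturbed image as well

    perturbed = []
    for mask in masks:
        img = image.copy()
        for keep, rid in zip(mask, region_ids):
            if not keep:
                img[segments == rid] = fill_value         # switch the region off
        perturbed.append(img)
    probs = classifier_fn(np.stack(perturbed))[:, target_class]

    # Weight samples by proximity to the original image (simple assumed kernel).
    distances = 1.0 - masks.mean(axis=1)                  # fraction of regions removed
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Interpretable local surrogate: one coefficient per region = its importance.
    surrogate = Ridge(alpha=1.0).fit(masks, probs, sample_weight=weights)
    return dict(zip(region_ids.tolist(), surrogate.coef_))
```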
Finally, Chapter 15 concludes the book. We review the content of the book, pro-
viding insights into the main concepts presented in the previous chapters. We also
give two important directions for future work: the evaluation of XAI methods and
the enrichment of explanation methods with semantics in order to foster trust among
nonexperts.
