AI Explainability in Practice
Participant Workbook
Intended for participants to engage with in
preparation for, and during, workshops.
Acknowledgements
This workbook was written by David Leslie, Cami Rincón, Morgan Briggs, Antonella Maia
Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Sabeehah
Mahomed, Janis Wong, Madeleine Waller, and Claudia Fischer.
The creation of this workbook would not have been possible without the support and
efforts of various partners and collaborators. As ever, all members of our brilliant team
of researchers in the Ethics Theme of the Public Policy Programme at The Alan Turing
Institute have been crucial and inimitable supports of this project from its inception several
years ago, as have our Public Policy Programme Co-Directors, Helen Margetts and Cosmina
Dorobantu. We are deeply thankful to Conor Rigby, who led the design of this workbook
and provided extraordinary feedback across its iterations. We also want to acknowledge
Johnny Lighthands, who created various illustrations for this document, and Alex Krook
and John Gilbert, whose input and insights helped get the workbook over the finish line.
Special thanks must be given to the Digital Office for Scottish Local Government, staff
and residents of Camden Council, Justin Green (Northumbria Healthcare NHS Foundation
Trust) and Postgraduate doctors in training (PGDiT) at NHS England Education Northeast
& North Cumbria, and Anna Knack, Ardi Janjeva, Megan Hughes, Rosamunde Powell, and
Samuel Stockwell (The Alan Turing Institute) for helping us test the activities and review
the content included in this workbook.
This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC
Grant EP/W006022/1, particularly the Public Policy Programme theme within that grant &
The Alan Turing Institute; Towards Turing 2.0 under the EPSRC Grant EP/W037211/1 & The
Alan Turing Institute; and the Ecosystem Leadership Award under the EPSRC Grant EP/X03870X/1 & The Alan Turing Institute.
Cite this work as: Leslie, D., Rincón, C., Briggs, M., Perini, A., Jayadeva, S., Borda, A.,
Bennett, SJ., Burr, C., Aitken, M., Mahomed, S., Wong, J., Waller, M., and Fischer, C. (2024).
AI Explainability in Practice. The Alan Turing Institute.
Contents
About the Workbook Series
Responsibility Explanation
Data Explanation
Fairness Explanation
Safety Explanation
Impact Explanation
Part Two: Putting the Principle of Explainability Into Practice
Activities Overview
About the AI Ethics and
Governance in Practice
Workbook Series
Who We Are
The Public Policy Programme at The Alan Turing Institute was set up in May 2018 with the
aim of developing research, tools, and techniques that help governments innovate with
data-intensive technologies and improve the quality of people’s lives. We work alongside
policymakers to explore how data science and artificial intelligence can inform public policy
and improve the provision of public services. We believe that governments can reap the
benefits of these technologies only if they make considerations of ethics and safety a first
priority.
In 2021, the UK’s National AI Strategy recommended as a ‘key action’ the update and
expansion of this original guidance. From 2021 to 2023, with the support of funding from
the Office for AI and the Engineering and Physical Sciences Research Council as well
as with the assistance of several public sector bodies, we undertook this updating and
expansion. The result is the AI Ethics and Governance in Practice Programme, a bespoke
series of eight workbooks and a digital platform designed to equip the public sector with
tools, training, and support for adopting what we call a Process-Based Governance (PBG)
Framework to carry out projects in line with state-of-the-art practices in responsible and
trustworthy AI innovation.
About the Workbooks
The AI Ethics and Governance in Practice Programme curriculum is composed of a series
of eight workbooks. Each of the workbooks in the series covers how to implement a key
component of the PBG Framework. These include Sustainability, Safety, Accountability,
Fairness, Explainability, and Data Stewardship. Each of the workbooks also focuses on a
specific domain, so that case studies can be used to promote ethical reflection and animate
the Key Concepts.
2. AI Sustainability in Practice Part One (AI in Urban Planning)
3. AI Sustainability in Practice Part Two (AI in Urban Planning)
4. AI Fairness in Practice (AI in Healthcare)
6. AI Safety in Practice (AI in Transport)
7. AI Explainability in Practice (AI in Social Care)
8. AI Accountability in Practice (AI in Education)
Explore the full curriculum and additional resources on the AI Ethics and Governance in
Practice Platform at aiethics.turing.ac.uk.
Taken together, the workbooks are intended to provide public sector bodies with the skills
required for putting AI ethics and governance principles into practice through the full
implementation of the guidance. To this end, they contain activities with instructions for
either facilitating or participating in capacity-building workshops.
Please note, these workbooks are living documents that will evolve and improve with input
from users, affected stakeholders, and interested parties. We value your participation.
Please share feedback with us at [email protected].
Programme Roadmap
The graphic below visualises this workbook in context alongside key frameworks, values
and principles discussed within this programme. For more information on how these
elements build upon one another, refer to AI Ethics and Governance in Practice: An
Introduction.
[Graphic: programme roadmap showing the eight workbooks alongside the principles of Sustainability, Fairness, Data Stewardship, Safety, Explainability, and Accountability, and the CARE & ACT framework.]
Intended Audience
The workbooks are primarily aimed at civil servants engaging in the AI Ethics and
Governance in Practice Programme — whether as AI Ethics Champions delivering the
curriculum within their organisations by facilitating peer-learning workshops, or as
participants completing the programmes by attending these workshops. Anyone interested
in learning about AI ethics, however, can make use of the programme curriculum, the
workbooks, and resources provided. These have been designed to serve as stand-alone,
open access resources. Find out more at aiethics.turing.ac.uk.
• Facilitator Workbooks are annotated with additional guidance and resources for
preparing and facilitating training workshops.
• Participant Workbooks (such as this document) are intended for workshop participants
to engage with in preparation for, and during, workshops.
Introduction to This Workbook
The purpose of this workbook is to introduce participants to the principle of AI Explainability.
Understanding how, why, and when explanations of AI-supported or -generated outcomes
need to be provided, and what impacted people’s expectations are about what these
explanations should include, is crucial to fostering responsible and ethical practices within your
AI projects. To guide you through this process, we will address essential questions: What do
we need to explain? And who do we need to explain this to? This workbook offers practical
insights and tools to facilitate your exploration of AI Explainability. By providing actionable
approaches, we aim to equip you and your team with the means to identify when and how to
employ various types of explanations effectively. This workbook is divided into two sections,
Key Concepts and Activities:
Key Concepts Section
This section provides content for workshop participants and facilitators to engage with prior to attending each workshop. It first provides definitions of key terms, introduces the maxims of AI Explainability and considerations for building appropriately explainable AI systems, and gives an overview of the main types of explanations. The section then delves into practical tasks and tools to ensure AI Explainability.
Activities Section
Case studies within the AI Ethics and Governance in Practice workbook series are grounded
in public sector use cases, but do not reference specific AI projects.
Go over the Key Concepts section and review the case study for this workshop.

Information Gathering
Practise gathering relevant information for building explanations of AI systems.

Evaluating Explanations
Practise evaluating the extent to which AI explanations meet their purpose and align with the Maxims of AI Explainability.
Key Concepts
The workbook operationalises concepts from Explaining decisions made with AI, a co-badged
guidance by the Information Commissioner’s Office (ICO) and The Alan Turing Institute. This
guidance aims to give organisations practical advice to help explain the processes, services
and decisions delivered or assisted by AI, to the individuals affected by them.
Sections of the workbook also draw from the Responsible Research and Innovation in Data
Science and AI Skills Track of Turing Commons.[1]
Transparency
A neighbouring concept to AI Explainability is that of AI Transparency. Transparency holds
multiple meanings dependent on the context and discipline it is being used within. The
common dictionary understanding of transparency defines it as either:
1. the quality an object has when one can see clearly through it, or
2. the quality of a situation or process that can be clearly justified and explained because
it is open to inspection and free from secrets.[2]

In the context of AI systems, transparency encompasses two related aspects:

1. Interpretability of an AI system or the ability to know how and why a model performed
the way it did in a specific context and therefore to understand the rationale behind
its decision or behaviour. This sort of transparency is often referred to by way of
the metaphor of ‘opening the black box’ of AI. It involves content clarification and
intelligibility.[3]
2. Transparent AI asks that the designers and developers of AI systems demonstrate that
their processes and decision-making, in addition to system models and outputs, are
sustainable, safe, fair, and driven by responsibly managed data.[4]
Balancing these two aspects is essential for building responsible AI systems. Project teams
need to address considerations around the potential AI Safety risks, how they manage the
information generated about those risks, and how these are shared or protected. They will
also need to integrate transparency considerations into those decisions, and consider the
extent to which explanations about the model, and the processes of the AI project, will be
made available.[8]
There are a host of risks AI applications create for children. Among many challenges,
there is a long-term concern about the potential transformative effects of these technologies
on the holistic development of children into socialised members of their communities.
Furthermore, there are ongoing risks to the privacy of children and their families in
increasingly data extractive, intrusive, and digitally networked social environments.
The UK’s Information Commissioner’s Office (ICO) has developed a statutory code aimed
at protecting children’s online data. This code comprises 15 standards that online services,
deemed ‘likely to be accessed by children’, must adhere to, in line with the Convention on
the Rights of the Child (UNCRC) and the obligations outlined in the General Data Protection
Regulation (GDPR) and the Privacy and Electronic Communications Regulations (PECR).
Among its provisions, the code mandates that online services:
• refrain from sharing children’s data with third parties and disable geolocation services.
• For example, if you are trying to explain the fairness and safety of a particular
AI-assisted decision, one component of your explanation will involve
establishing that you have taken adequate measures across the system’s
production and deployment to ensure that its outcome is fair and safe.
When considering AI systems that are developed using children’s data, in order to explain
that the system is fair and safe, one must first adhere to the UK ICO’s Age Appropriate
Design Code.
However, a process-based explanation in this setting expands well past legal
compliance. How have adequate measures of reporting and oversight been put in place
to ensure that children are not harmed in the process of the design, development, and
deployment of AI technologies that use their data?
Maxims of AI Explainability
The following maxims provide a broad steer on what to think about when explaining
AI/ML-assisted decisions to individuals.
Maxim 1
Be Transparent

Maxim 2
Be Accountable
More details about GDPR can be found in Workbook 5: Responsible Data Stewardship in Practice.
• Identify those within your organisation who manage and oversee the ‘explainability’
requirements of an AI decision-support system and assign ultimate responsibility
for this.
• Ensure you have a designated and capable human point-of-contact for individuals
to clarify or contest a decision.
• Actively consider and make justified choices about how to design and deploy AI/ML
models that are appropriately explainable to individuals.
• Take steps to prove and document that you made these considerations, and that
they are present in the design and deployment of the models themselves.
Accountability is also mentioned in the UNICEF Policy guidance on AI for children under the
requirement of ‘Providing transparency, explainability, and accountability for children’ (pg.
38).[21] It is critical for roles and responsibilities to be established within an organisation to
ensure accountability for decisions made about children’s data. Additionally, UNICEF states
it is imperative that AI systems are developed so that they protect and empower child
users according to legal and policy frameworks, regardless of children’s understanding
of the system. They state, ‘the development of AI systems cannot ignore or exploit
any child’s lack of understanding or vulnerability’ (pg. 39). Accountability should also be
accompanied by AI oversight bodies with a specific focus on child rights and the inclusion of
child rights experts.
Maxim 3
Consider Context[22]
1. Choose appropriate models and explanation. If you plan to use AI/ML to help
make decisions about people, you should consider:
• the setting;
• prioritising delivery of the relevant explanation types (find out more about explanation types on page 28); and
• for auditors:
  - the level and depth of explanation that is fit for the purpose of the relevant review; and
  - the range of people subject to decisions made (to account for the range of knowledge or expertise).
In addition to the nine requirements outlined in the UNICEF Policy guidance on AI for
children, there are overarching recommendations that apply in all contexts. This includes
adapting the AI/ML system to the national or local context while keeping children in mind
from design to deployment. These considerations should be taken into account from the
design stage to avoid algorithmic bias resulting from contextual blindness. Additionally,
the requirement of ‘Ensure inclusion of and for children’ (pg. 33) recommends active
participation of children across all stages of the project lifecycle to ensure children are
considered in the context of the system’s intended use.[23] When considering potential
impacts, specific focus should be given to ‘actively support the most marginalized children’
(pg. 34) including girls, minority groups, children with disabilities, and those in refugee
contexts to ensure that they may benefit from, and not be disadvantaged by AI systems.[24]
Maxim 4
Reflect on Impacts
You should then revisit and reflect on the impacts identified in the initial stages of the AI/
ML project throughout the development and implementation stages.[28] If any new impacts
are identified, you should document them, alongside any implemented mitigation measures
where relevant.[29] This will help you explain to individuals impacted what impacts you have
identified and how you have reduced any potentially harmful effects as best as possible.
1. Ensure individual wellbeing.[30] Think about how to build and implement your AI/ML
system in a way that:
• ensures their abilities to make free and informed decisions about their own lives;
• supports their abilities to flourish, to fully develop themselves, and to pursue their
interests according to their own freely determined life plans;
• encourages all voices to be heard and all opinions to be weighed seriously and
sincerely;
• uses AI technologies as an essential support for the protection of fair and equal
treatment under the law;
• anticipates the wider impacts of the AI/ML technologies you are developing by
thinking about their ramifications for others around the globe, for the biosphere as
a whole, and for future generations.
Reflecting on impacts overlaps with the UNICEF Policy guidance on AI for children
requirements of ‘Prioritise fairness and non-discrimination for children’ (pg. 34) and ‘Protect
children’s data and privacy’ (pg. 35). These requirements call for actively supporting
marginalised children to ensure benefits from AI systems.[31] This entails making certain
that datasets include a diversity of children’s data and implementing responsible data
approaches to ensure that children’s data is handled with care and sensitivity. The Age
Appropriate Design Code contains the principle of ‘Detrimental use of data’ which states
that children’s data should not be used in any way that could negatively affect their well-
being or go against industry standards, regulatory provisions, or government advice.[32]
To determine how best to build an appropriately explainable AI system, you should consider the:
• type of application;
• available data resources and domain knowledge;
• task-appropriate AI/ML techniques; and
• existing technology.
Consideration 1 in Depth
1. Type of application: Start by assessing both the kind of tool you are building and the
environment in which it will apply. Understanding your AI system’s purpose and context
of application will give you a better idea of the stakes involved in its use and hence also a
good starting point to think about the scope of its interpretability needs.
Consideration 2 in Depth
2. Available data resources and domain knowledge: Where data resources lend to
well-structured, meaningful representations and domain expertise can be incorporated into
model design, interpretable techniques may often be more desirable than opaque ones.
Careful data pre-processing and iterative model development can, in these cases, hone
the accuracy of such interpretable systems in ways that may make the advantages gained
by the combination of their performance and transparency outweigh the benefits of more
opaque approaches.
3. Task-appropriate AI/ML techniques: For use cases where AI/ML-based predictive
modelling or classification involves tabular or structured data, interpretable techniques may
hold advantages, but for tasks in areas like computer vision, natural language processing,
and speech recognition, where unstructured and high-dimensional data are required, drawing
on standard interpretable techniques will not be possible.
Consideration 3 in Depth
KEY CONCEPT: Black Box Model
We define a ‘black box’ model as any AI system whose inner workings and rationale are opaque or inaccessible to human understanding. These systems may include:
• neural networks, including recurrent and convolutional neural networks (models consisting of interconnected nodes that make predictions based on correlations from input data);
• ensemble methods (such as the random forest technique that strengthens an overall prediction by combining and aggregating the results of several or many different base models); or
• support vector machines (a classifier that uses a special type of mapping function to build a divider between two sets of features in a high dimensional space).
For examples of model types, see Appendix A.

1. Thoroughly weigh up impacts and risks: As a general policy, you and your team should utilise ‘black box’ models only where their potential impacts and risks have been thoroughly considered in advance, and you have determined that your use case and domain-specific needs support the responsible design and implementation of these systems.

2. Consider supplemental interpretability tools: Consider what sort of explanatory resources the interpretability tool will provide users and implementers in order (1) to enable them to exercise better-informed evidence-based judgments and (2) to assist them in offering plausible, sound, and reasonable accounts of the logic behind algorithmically generated output to affected individuals and concerned parties.

3. Formulate an action plan to optimise explainability: This should include a clear articulation of the explanatory strategies your team intends to use, a detailed plan that indicates the stages in the project workflow when the design of these strategies will need to take place, and a succinct formulation of your explanation priorities and delivery strategy. Further actions are detailed in the Six Tasks for Explainability Assurance Management that are described in Part Two of this guidance.
Consideration 4 in Depth
The six main types of explanation are:

Rationale explanation: Helps people understand the reasons that led to a decision outcome.

Responsibility explanation: Helps people understand who is involved in the development and management of the AI model, and who to contact for a human review of a decision.

Data explanation: Helps people understand what data about them, and what other sources of data, were used in a particular AI decision, as well as the data used to train and test the AI model.

Fairness explanation: Helps people understand the steps taken to ensure AI decisions are generally unbiased and equitable, and whether or not they have been treated equitably themselves.

Safety explanation: Helps people understand the measures that are in place and the steps taken to maximise the performance, reliability, security, and robustness of the AI outcomes, as well as the justification for the chosen type of AI system.

Impact explanation: Helps people understand the considerations taken about the effects that the AI decision-support system may have on an individual and on society.
Rationale Explanation
• How the procedures you have set up help you provide meaningful explanations of the underlying logic of your AI model’s results.

• How these procedures are suitable given the model’s particular domain context and its possible impacts on the affected individuals and wider society.

• The formal and logical rationale of the AI system. How the system is verified against its formal specifications, so you can verify that the AI system will operate reliably and behave in accordance with its intended functionality.

• The technical rationale of the system’s output. How the model’s components (its variables and rules) transform inputs into outputs, so you know what role these components play in producing that output. By understanding the roles and functions of the individual components, it is possible to identify the features and parameters that significantly influence an output.

• How you have set up your system’s design and deployment workflow so that it is appropriately interpretable and explainable, including its data collection and preprocessing, model selection, explanation extraction, and explanation delivery procedures.

• Translation of the system’s workings (its input and output variables, parameters, and so on) into accessible everyday language, so you can clarify, in plain and understandable terms, what role these factors play in reasoning about the real-world problem that the model is trying to address or solve.

• Clarification on how a statistical result is applied to the individual concerned. A decision from the AI system will usually have a probability associated with it corresponding to the confidence of the AI model in that decision. You can specify how that probability influenced the final decision made, the confidence threshold at which decisions were accepted, and the reasoning behind choosing that threshold.
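To make the last point concrete, here is a minimal, illustrative Python sketch of how a model's confidence score and a chosen threshold might be turned into a plain-language statement for a decision recipient. The 0.5 threshold, variable names, and wording are assumptions for demonstration, not taken from the guidance.

```python
# Illustrative sketch: turning a model's probability into a reportable decision.
# The threshold value and the wording are assumptions for demonstration only.

def explain_decision(probability: float, threshold: float = 0.5) -> str:
    """Convert a model confidence score into a plain-language rationale statement."""
    decision = "accepted" if probability >= threshold else "declined"
    return (
        f"The model estimated a probability of {probability:.2f} for this outcome. "
        f"Because decisions are accepted at or above a threshold of {threshold:.2f}, "
        f"the application was {decision}. The threshold was chosen to balance the "
        f"risks of incorrect acceptances against incorrect refusals."
    )

print(explain_decision(0.73))
```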
Rationale explanation begins with explaining how the system operated the way it did.
Children should be able to fully understand how their data was mapped throughout the AI
system to determine a specific output. Rationale explanation often involves more technical
concepts than some of the other explanation types; thus it is critical that these technical
explanations of model choice, the system’s inner workings, and statistical results are
delivered in an age-appropriate manner. One way to assist with rationale explanation is by
having children involved from the design stages of the AI system so that they are informed
upfront of the types of models being used as well as being apprised of model decisions
made along the way.[35]
Responsibility Explanation
• The roles and functions across your organisation that are involved in the various stages of developing and implementing your AI system, including any human involvement in the decision-making. If your system, or parts of it, are procured, you should include information about the providers or developers involved.

• Who is responsible at each step from the design of an AI system through to its implementation to make sure that there is effective accountability throughout.
Because a responsibility explanation largely has to do with the governance of the design
and implementation of AI systems, it is, in a strict sense, entirely process-based. Even so,
there is important information about post-decision procedures that you should be able to
provide:
• Give individuals a way to directly contact the role or team responsible for the review.
You do not need to identify a specific person in your organisation. One person involved
in this should have implemented the decision and used the statistical results of a
decision-support system to come to a determination about an individual.
Data Explanation
• Clarify the input data used for a specific decision, and the sources of that data. This is
outcome-based because it refers to your AI system’s result for a particular stakeholder.
• In some cases, the output data may also require an explanation, particularly where
a user has been placed in a category which may not be clear to them. For example,
in the case of anomaly detection for financial fraud identification, the output might
be a distance measure (i.e. a distance calculated using various statistical or ML
techniques that serves as a classification or scoring mechanism) which places them at
a certain distance away from other people based on their transaction history.[38] Such a
classification may require an explanation.
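As a purely hypothetical illustration of such a distance measure (the feature names, data, and scoring method below are assumptions, not part of the guidance), a transaction history can be standardised and scored by its distance from typical behaviour:

```python
# Hypothetical sketch of a distance-based anomaly score for transaction data.
# Feature names, data, and the scoring method are illustrative assumptions.
import numpy as np

# Rows: customers; columns: monthly spend, transaction count, average amount.
history = np.array([
    [1200.0, 34, 35.3],
    [ 980.0, 29, 33.8],
    [1100.0, 31, 35.5],
    [5400.0, 12, 450.0],   # an unusual pattern
])

# Standardise each feature, then measure Euclidean distance from the centroid.
standardised = (history - history.mean(axis=0)) / history.std(axis=0)
distances = np.linalg.norm(standardised - standardised.mean(axis=0), axis=1)

# A larger distance places the customer further from typical behaviour,
# which is the kind of output that may need explaining to the individual.
for i, d in enumerate(distances):
    print(f"customer {i}: anomaly score {d:.2f}")
```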
When considering data explanation, it is critical that children’s data agency is promoted
and kept at the forefront of all decisions made along the way. The UNICEF Policy guidance
on AI for children recommends that a privacy-by-design approach is taken when designing
and implementing AI systems that use children’s data. As data explanation tends to utilise
more technical language (as seen in the Rationale Explanation section above)
than the other remaining explanation types, it is extremely important to reflect upon how
these technical terms and systems can be explained to children in age-accessible language.
Showing what data was used in a particular decision will assist with contributing towards
transparency and building trust amongst children and organisations using their data to
design, develop, and deploy AI systems. It is imperative that data equity is placed at the
forefront to ensure that a diverse range of children’s data are included and that transparent
reporting was in place to demonstrate that this was achieved. The UK ICO Age Appropriate
Design Code contains principles of data minimisation—collecting only the minimum amount
of personal data necessary to carry out the system—and data sharing—considering the
best interests of the child when contemplating sharing data, both of which should be
implemented accordingly.
A data explanation type seeks to provide information about the outcomes and
processes involved in implementing the principle of Data Stewardship. For a
comprehensive understanding or refresher on what this principle encompasses,
please consult the Responsible Data Stewardship in Practice workbook.
Fairness Explanation
To support a fairness explanation, you should be able to show that you have:
• Attempted to identify any underlying structural biases that may play a role in translating
your objectives into target variables and measurable proxies. When defining the problem
at the start of the AI project, these biases could influence what system designers expect
target variables to measure and what they statistically represent.
• Mitigated bias in the data pre-processing phase by taking into account the sector or
organisational context in which you are operating. When this process is automated or
outsourced, show that you have reviewed what has been done, and maintained oversight.
You should also attach information on the context to your metadata, so that those coming
to the pre-processed data later on have access to the relevant properties when they
undertake bias mitigation.
• Mitigated bias when the feature space was determined (i.e. when relevant features were
selected as input variables for your model). Choices made about grouping or separating
and including or excluding features, as well as more general judgements about the
comprehensiveness or coarseness of the total set of features, may have consequences for
protected groups of people.
• Mitigated bias when tuning parameters and setting metrics at the modelling, testing, and
evaluation stages (i.e. into the trained model). Your AI development team should iterate
the training of the model and peer review it to help ensure that how they choose to adjust
the parameters and metrics of the model are in line with your objectives of mitigating
bias.
• Mitigated bias by watching for hidden proxies for discriminatory features in your trained
model, as these may act as influences on your model’s output. Designers should also look
into whether the significant correlations and inferences determined by the model’s learning
mechanisms are justifiable.
You should also be able to explain:
• that you have been explicit about the formal definition(s) of fairness you have chosen and why. Data scientists can apply different formalised fairness criteria to choose how specific groups in a selected set will receive benefits in comparison to others in the same set, or how the accuracy or precision of the model will be distributed among subgroups; and
• the method you have applied in operationalising your formalised fairness criteria (for example, adjusting for outcome preferences by reweighting model parameters, embedding trade-offs in a classification procedure, or re-tooling algorithmic results).
• Your chosen measures to mitigate risks of bias and discrimination at the data collection, preparation, model design and testing stages.

• How these measures were chosen and how you have managed informational barriers to bias-aware design, such as limited access to data about protected or sensitive traits of concern.

• The results of your initial (and ongoing) fairness testing, self-assessment, and external validation, showing that your chosen fairness measures are deliberately and effectively being integrated into model design. You could do this by showing that different groups of people receive similar outcomes, or that protected characteristics have not played a factor in the results.
• Details about how your formal fairness criteria were implemented in the case of a particular decision or output.

• Presentation of the relevant fairness metrics and performance measurements in the delivery interface of your model. This should be geared to a non-technical audience and done in an easily understandable way.

• Explanations of how others similar to the individual were treated (i.e. whether they received the same decision outcome as the individual). For example, you could use information generated from counterfactual scenarios to show whether or not someone with similar characteristics, but of a different ethnicity or gender, would receive the same decision outcome as the individual.
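The counterfactual idea above can be sketched in code. In the hypothetical example below, the model, feature names, and data are invented placeholders; the check simply flips a single protected attribute on an otherwise identical record and compares the model's outputs:

```python
# Illustrative counterfactual check: does changing only a protected attribute
# change the model's decision? Model, features, and data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: [income, years_at_address, protected_attribute] -> decision.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# An individual's record, and a counterfactual twin differing only in the
# protected attribute (e.g. gender or ethnicity, encoded here as 0/1).
individual = np.array([[0.4, 1.2, 0.0]])
counterfactual = individual.copy()
counterfactual[0, 2] = 1.0

p_actual = model.predict_proba(individual)[0, 1]
p_counter = model.predict_proba(counterfactual)[0, 1]
print(f"Predicted probability (actual):         {p_actual:.3f}")
print(f"Predicted probability (counterfactual): {p_counter:.3f}")
# A material gap between the two probabilities would warrant further fairness review.
```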
Fairness explanation contains facets that overlap with the UNICEF Policy guidance on
AI for children principle of ‘Prioritizing fairness and non-discrimination’. This principle
includes providing active support for the most marginalised children so that they can
receive benefits from AI systems. As stated in the UNICEF guidance, this requires
attention to ‘the differences in cultural, social, and regional contexts of AI-related policies
and activities’, which should include considerations that expand past ensuring access to
these technologies—although this still remains a key barrier to accessing the benefits
that AI systems may provide. Several other key points outlined by UNICEF include
ensuring a diversity of children’s data in new datasets that are being developed and
removing bias against children or certain groups of children. In addition to ensuring data
representativeness and completeness, it is critical that teams consider the trade-off of
various fairness metrics and how these could affect children differently. How will these
decisions be reported in a way that is transparent and accessible for children?
Safety Explanation
A safety explanation covers the following components:

• Accuracy: the proportion of examples for which your model generates a correct output. This component may also include other related performance measures such as precision, sensitivity (true positives), and specificity (true negatives). Individuals may want to understand how accurate, precise, and sensitive the output was in their particular case.

• Reliability: how dependably the AI system does what it was intended to do. If it did not do what it was programmed to carry out, individuals may want to know why, and whether this happened in the process of producing the decision that affected them.

• Security: the system is able to protect its architecture from unauthorised modification or damage of any of its component parts. The system remains continuously functional and accessible to its authorised users and keeps confidential and private information secure, even under hostile or adversarial conditions.

• Robustness: the system functions reliably and accurately in practice. Individuals may want to know how well the system works if things go wrong, how this has been anticipated and tested, and how the system has been immunised from adversarial attacks.
For each of these components, you should also be able to explain:

Accuracy
• Why you chose those measures, and how you went about assuring it.
• What you did at the data collection stage to ensure your training data was up-to-date and reflective of the characteristics of the people to whom the results apply.
• What the overall accuracy rate of the system was at the testing stage.
• What you do to monitor this (e.g. measuring for concept drift over time).

Reliability
• How you measure it and how you went about assuring it.

Security
• How you measure it and how you went about assuring it (e.g. how limitations have been set on who is able to access the system, when, and how).

Robustness
• How you went about assuring it, e.g. how you've stress-tested the system to understand how it responds to adversarial intervention, implementer error, or skewed goal-execution by an automated learner (in reinforcement learning applications).
While you may not be able to guarantee accuracy at an individual level, you should be able
to provide assurance that, at run-time, your AI system operated reliably, securely, and
robustly for a specific decision. In the case of accuracy and the other performance metrics,
however, you should include in your model’s delivery interface the results of your cross-
validation (training/testing splits) and any external validation carried out.
You may also include relevant information related to your system’s confusion matrix (the
table that provides the range of performance metrics) and ROC curve (receiver operating
characteristics)/AUC (area under the curve). Include guidance for users and affected
individuals that makes the meaning of these measurement methods, and specifically
the ones you have chosen to use, easily accessible and understandable. This should also
include a clear representation of the uncertainty of the results (e.g. confidence intervals
and error bars).
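As a hedged illustration of how these figures might be produced for a delivery interface (the dataset, train/test split, and model below are invented for demonstration), scikit-learn can report a confusion matrix and ROC AUC from held-out test data:

```python
# Illustrative sketch: computing a confusion matrix and ROC AUC for reporting.
# The dataset, split, and model are invented purely for demonstration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.8, size=500) > 0).astype(int)

# Hold out test data so the reported figures reflect unseen cases.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"True positives: {tp}, False positives: {fp}")
print(f"True negatives: {tn}, False negatives: {fn}")
print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
```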
The safety of an AI system is particularly important when considering children as they have
unique needs and considerations. The UNICEF Policy guidance on AI for children child-
centric principle of ‘Ensuring safety for children’, draws attention to various considerations
that should be in place. The first of these is a mechanism for continuous monitoring and
assessment of the impact of AI systems on children as well as continuous monitoring
of these impacts throughout the entire lifecycle. UNICEF also calls for the testing of AI
systems using children’s data for safety, security, and robustness.
A safety explanation type seeks to provide information about the outcomes and processes involved in implementing the principle of Safety. For a comprehensive understanding or refresher on what this principle encompasses, please consult the AI Safety in Practice workbook.
Impact Explanation
Demonstrate that you have thought about how your AI system will potentially affect individuals
and wider society. Clearly show affected individuals the process you have gone through to
determine these possible impacts.
• Showing the considerations you gave to your AI system’s potential effects, how you undertook these considerations, and the measures and steps you took to mitigate possible negative impacts on society, and to amplify the positive effects.

• Information about how you plan to monitor and re-assess impacts while your system is deployed.
Although the impact explanation is mainly about demonstrating that you have put
appropriate forethought into the potential ‘big picture’ effects, you should also consider
how to help decision recipients understand the impact of the AI-assisted decisions
that specifically affect them. For instance, you might explain the consequences for the
individual of the different possible decision outcomes and how, in some cases, changes in
their behaviour would have brought about a different outcome with more positive impacts.
This use of counterfactual assessment would help affected individuals make changes that
could lead to a different outcome in the future or allow them to challenge the decision.
It is critical that the potential impacts of an AI system that uses children’s data are fully
considered and weighed. This is especially relevant for systems that are not intended
for children to use but that children may have access to, such as smart devices in the
household. In order to fully understand possible impacts, it is imperative that organisations
engage with children to understand how possible impacts will differ from other audiences
due to children’s specific contexts and needs. Negative impacts of AI systems if not
considered properly before deployment could have long-term effects on children’s mental
health and well-being, future pathways, safety and security, amongst many others. One
way to go about considering all of the potential impacts is to engage in a meaningful way
with children through the entire AI project lifecycle to effectively investigate impacts that
may not have been thought of.
• selecting, extracting and delivering explanations that are differentiated according to the
needs and skills of the different audiences they are directed at.
[Diagram: the stages of the AI project lifecycle, grouped into the Design phase (Project Planning, Problem Formulation, Data Extraction or Procurement, Data Analysis), the Development phase (Preprocessing & Feature Engineering, Model Selection & Training, Model Testing & Validation, Model Reporting), and the Deployment phase (User Training, System Implementation, System Use & Monitoring, Model Updating or Deprovisioning), with Explainability Assurance Management activities positioned around the lifecycle.]
Part Two: Putting the Principle of Explainability Into Practice
Task 1
Select Priority Explanations by Considering the Domain, Use Case, and
Impact on the Individual
At the Project Planning stage, when considering children’s rights or projects that involve
children, additional care needs to be taken where children’s data is to be included as part of
AI systems. In addition to explanations related to the system itself and who is responsible,
increased transparency on the use and processing of data should be provided for parents,
guardians, and children that help explain their participation in simple language that is easy
to understand. More details should also be offered regarding the risks of not abiding by
child-centric requirements as part of the Project Planning stage, where justifications for
children’s involvement should be clearly outlined.
Task 2
Collect and Pre-Process Your Data in an Explanation-Aware Manner
Related AI Lifecycle Stages: Data Extraction or Procurement; Data Analysis

1. The data that you collect and pre-process before inputting it into your system has an important role to play in the ability to derive each explanation type.

5. You should check the data used within your model to ensure it is sufficiently representative of those you are making decisions about. You should also consider whether pre-processing techniques, such as re-weighting, are required. These decisions should be documented in your Bias Self-Assessments (provided in the AI Fairness in Practice workbook), and will help your fairness explanation.
Considerations for Child-Centred AI
When considering data extraction, procurement, and analysis of children’s data, it is important to ensure
that the UNICEF Policy guidance on AI for children and other normative tools regarding the responsible
use of children’s data, such as the GDPR and the UK ICO Age Appropriate Design Code, are applied. We
would suggest that any data related to children is pseudonymised or anonymised to limit potential
harms. Under current data protection regulations, children under the age of 13 are unable to consent
to the use of their personal data. Should personal data be used, the lawful basis for processing it must
therefore be clearly communicated, in consultation with children as well as their parents or guardians,
along with what the potential impact may be.
Task 3
Build Your System to Ensure You Are Able to Extract Relevant Information for
a Range of Explanation Types
Considerations for Child-Centred AI
The Turing-UNICEF Pilot Project on Understanding AI Ethics and Safety for Children
established the difficulties that adults and children face in understanding the inner
workings of complex systems and models. As a result, identifying and mitigating risks is
key to ensuring that the selected AI model is justified in its use. Data risk management
frameworks (tools and methodologies that aim to establish clarity on the benefits and
risks of data and datasets) may be useful for this process. Given that it may be impossible
to transparently communicate black box AI systems to children and their parents or
guardians, the focus should turn to documenting model selection processes and ensuring
that supplemental interpretability tools are used where appropriate.
Interpretable Algorithms
When possible and application-appropriate, draw on standard algorithmic techniques that are as interpretable as possible.
Careful data pre-processing and iterative model development can hone the accuracy of
interpretable systems. As a result, the advantages gained by the combination of their
improved performance and their transparency may outweigh those of less transparent
approaches.
‘Black Box’ AI Systems
When you consider using opaque algorithmic techniques, make sure that the
supplementary interpretability tools that you will use to explain the model are
appropriate to meet the domain-specific risks and explanatory needs that may arise
from deploying it.
For certain data processing activities, it may not be feasible to use straightforwardly
interpretable AI systems. For example, the most effective machine learning
approaches are likely to be opaque when you are using AI applications to classify
images, recognise speech, or detect anomalies in video footage. The feature spaces
of these kinds of AI systems grow exponentially to hundreds of thousands or
even millions of dimensions. At this scale of complexity, conventional methods of
interpretation no longer apply.
You should only use ‘black box’ models if you have thoroughly considered their
potential impacts and risks in advance. The members of your team should also have
determined that your use case and your organisational capacities/resources support
the responsible design and implementation of these systems.
Likewise, you should only use them if supplemental interpretability tools provide your
system with a domain-appropriate level of explainability. This needs to be reasonably
sufficient to mitigate the potential risks of the system and provide decision recipients
with meaningful information about the rationale of any given outcome. A range of
supplementary techniques and tools that assist in providing some access to the
underlying logic of ‘black box’ models is explored below and in Appendix B.
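One example of such a supplementary technique, sketched here under the assumption of a scikit-learn workflow (the model and data are placeholders, and this is not presented as one of the specific tools in Appendix B), is permutation importance, which gives a model-agnostic view of which inputs most influence an opaque model's predictions:

```python
# Illustrative use of permutation importance as a supplementary interpretability
# tool for an opaque model. Model choice and data are placeholder assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 1] + X[:, 3] ** 2 + rng.normal(scale=0.5, size=400) > 1).astype(int)

# A random forest is one of the 'black box' ensemble methods mentioned earlier.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Permutation importance: shuffle one feature at a time and measure the drop
# in performance, indicating which inputs matter most to the model's output.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: mean importance {score:.3f}")
```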
Task 4
Translate the Rationale of Your System’s Results Into Useable and Easily
Understandable Reasons
Considerations for Child Centred AI: In conjunction with the previous tasks, model reporting
on projects related to children’s rights and data should be explained in simple language to
ensure that children and their parents or guardians understand the impact of the model’s
results. While it may not be possible to state statistical outputs in non-technical terms, your
team should endeavour to outline what different generalised results or outcomes from the
model mean and how that translates to informing real-world decision-making processes.
This includes explaining the different inputs into the model and why those specific bits of
information are used.
Task 5
Prepare Implementers to Deploy Your AI System
1. In cases where decisions are not fully automated, implementers need to be meaningfully involved. This means that they need to be appropriately traineded to use the model’s results responsibly and fairly. This training should cover:

• the limitations of AI and automated decision-support technologies;

• the benefits and risks of deploying these systems to assist decision-making, particularly how they help humans come to judgements rather than replacing that judgement; and
Those who are selected to implement an AI system must understand child-centred design[39]
if they are to engage with children’s data as part of the system’s deployment. Having
knowledge of child-centred design will help implementers understand why the management
of children’s data or a system that deals with children’s data may require more stringent
ethical considerations. If implementers are to directly speak to and work with children, they
should go through background checks, such as the Disclosure and Barring Service (DBS),
and training on engaging with children to ensure that they are sensitive to their needs and
perspectives.
Task 6
Consider How to Build and Present Your Explanation
3. How you present your explanation depends on the way you make AI-assisted decisions, and on how people might expect you to deliver explanations you make without using AI.

7. To increase trust and awareness of your use of AI, you can proactively engage with your stakeholders by making information available about how you use AI systems to help you make decisions.
With regard to children’s rights and data, based on the information provided in the previous
tasks, a short summary should be written to explain your AI-assisted decisions. Graphics,
videos, and interactive resources could be made available to support multiple ways of
delivering material for developing an understanding of the project and model. Additionally,
as much effort as possible should be made to ensure that such explanations are accessible.
Clear communication on the project, the model, the information and data, how potential
risks have been mitigated, as well as the benefits of the system to an appropriate audience,
in this case, children and their parents or guardians, can help limit unexpected outcomes.
Where possible, references to children-related policies should be added throughout the AI
explanation process to pinpoint where such considerations have been applied.
The Explainability Assurance Management template will help you and your team accomplish
the six tasks illustrated previously.
Explainability Assurance Management (EAM) Template for Project Name
Date completed: Team members involved:
Task 1
Select Priority Explanations by Considering the Domain, Use
Case, and Impact on the Individual
Related AI Lifecycle Stage: Design Phase
a. Considering the project domain, use case, and potential impacts outlined in the Stakeholder Engagement Process (SEP) report, what other explanation types will you prioritise?

b. Considering the project domain, use case, and potential impacts outlined in the SEP report, what information will explanations require and how comprehensive will this information be?
• The SEP report can be found in the
AI Sustainability in Practice Part
One workbook.
c. What other explanation types will be
considered for this project?
We have considered the setting and sector in which our AI model will be used,
and how this affects the types of explanation we provide.
We have considered the potential impacts of our system, and how these affect
the scope and depth of the explanation we provide.
Task 2
Collect and Preprocess Your Data in an Explanation-Aware
Manner
Related AI Lifecycle Stages: Data Extraction or Procurement, Data Analysis
Drawn from the Data Factsheet Template found in the Responsible Data Stewardship in
Practice workbook.
Drawn from the Governance Workflow Map found in the AI Accountability in Practice
workbook.
Drawn from the Data Factsheet Template found in the Responsible Data Stewardship in
Practice workbook.
a. What is the source of the training data?

c. What are the results of assessments about data integrity, quality, and protection and privacy?
Drawn from the Bias Self-Assessment Template found in the AI Fairness in Practice
workbook.
Drawn from the Safety Self-Assessment Template found in the AI Safety in Practice
workbook.
a. How have you established reasonable safety objectives?

c. How have you ensured that the modelling, testing, and monitoring stages of the system development lead to accurate results?
a. How have you implemented the results of your Stakeholder Impact Assessment?
Our data are representative of those we make decisions about, and are reliable,
relevant and up-to-date.
We have checked with a domain expert to ensure that the data we are using is
appropriate and adequate.
We know where the data has come from, the purpose it was originally collected
for, and how it was collected.
Where we are using synthetic data, we know how it was created and what
properties it has.
We know what the risks are of using the data we have chosen, as well as the risks
to data subjects of having their data included.
We have labelled the data we are using in our AI system with information
including what it is, where it is from, and the reasons why we have included it.
We have ensured as far as possible that the data does not reflect past
discrimination, whether based explicitly on protected characteristics or possible
proxies.
It is clear who within our organisation is responsible for data collection and
pre-processing.
a. What are the explanation needs for this project, considering its domain, use case, and potential impacts?

d. Does the data being used require a more or less explainable system?
b. If selecting a black box model: What supplementary interpretability tools are you using
to help you provide explanations?
• Rationale Explanations?
• Safety Explanations?
• Responsibility Explanations?
• Impact Explanations?
Where we are using a ‘black box’ system, we have considered which supplementary interpretability tools are appropriate for our use case.

Where we are using ‘challenger’ models[43] alongside more interpretable models, we have established that we are using them lawfully and responsibly, and we have justified why we are using them.

We have made it clear how the model has been tested, including which parts of the data have been used to train the model, which have been used to test it, and which have formed the holdout data (i.e. test data that is intentionally excluded from the dataset).

We have a record of each time the model is updated, how each version has changed, and how this affects the model’s outputs.
a. Have implementers been appropriately trained to use the model’s results responsibly
and fairly?
1. Initial Considerations
Rationale
Responsibility
Data
Fairness
Safety
Impact
Activities
We offer a collaborative workshop format for team learning and discussion about the
concepts and activities presented in the workbook. To run this workshop with your team,
you will need to access the resources provided in the link below. This includes a digital
board and printable PDFs with case studies and activities to work through.
Case studies within the Activities sections of the AI Ethics and Governance in Practice
workbook series offer only basic information to guide reflective and deliberative activities.
If activity participants find that they do not have sufficient information to address an issue
that arises during deliberation, they should try to come up with something reasonable that
fits the context of their case study.
Corresponding Sections
→ Part One: Introduction to AI Explainability
(page 10)
Information Gathering
Evaluating Explanations
Practise evaluating the extent to which AI explanations meet their purpose and align
with the Maxims of AI Explainability.
Your team is a local authority social care team, which has a statutory duty to safeguard
and promote the welfare of children in need within your borough by providing care
services. When your team receives a referral and has reasons to be concerned that a child
may be suffering, or likely to suffer, significant harm, you are required to undertake an
investigation into the child’s circumstances.
As far as is reasonably consistent with your safeguarding duties, however, your team has
the duty to promote the upbringing of children by their families by providing an appropriate
range and level of services.
The Children’s Social Care (CSC) system across England, however, has faced an increase
in demand for its services alongside austerity measures, which have limited the resources
available to local authorities. Your team is no exception to this challenge and is
considering the use of an AI system that could aid care workers when conducting
investigations.
75% of councils are overspending for children’s services.[47]

89% of directors of CSC services reported in 2016-2017 that they found it increasingly challenging to fulfil their statutory duties to provide support to children in need due to the limited available resources at their disposal.[48]
Community Concerns
Although this system could offer your organisation evidence-based insights to ensure that
children receive care at the proper time, this high impact context calls for appropriate
attention to potential impacts on affected individuals, families, and communities. The
community impacted by CSC services has already expressed various concerns about the
use of ML in this setting:
• Discriminatory outcomes
Type of model:
Binary logistic regression
It is a supervised ML model that assists with finding a relationship between features and the
probability of a particular outcome. It provides a value between 0 and 1 and converts this
value into a classification.
The logistic regression model used in this system is binary, meaning that it predicts one
of two mutually exclusive classes, in this case, “at risk” (positive class) or “not at risk”
(negative class).
This type of model is considered to be highly interpretable, because it is linear and feature
importance can be easily isolated—though the transformation of input variables makes the
relationship between them and the output a little more challenging to grasp.
However:
• The data available when developing the model was not standardised and contained dispersed missing values, leaving the team with large amounts of unstructured and incomplete data and making a large proportion of it unfit for use.
*PG is parent/guardian: PG1 is the primary caregiver, PG2 is the secondary caregiver (if applicable). **Child at risk is the target variable.
The relative feature importance of the trained model, which indicates how each variable contributes to increasing or decreasing the odds of the model categorising a child as "at risk", is as follows:
[Figure: bar chart of relative feature importance (log-odds contributions, ranging roughly from -1.4 to 1.4) for the trained model. Features shown include the constant, income, num_dependents, age_pg_1, age_pg_2, num_referrals, past_csc_Yes, education and employment variables for PG1 and PG2, past_domestic_abuse_Yes, criminal_record_either_parent_Yes, public_benefits_Yes, and drug_abuse_treatment_Yes.]

[Figure legend: model correctly classifies child as at risk; model correctly classifies child as not at risk; model incorrectly classifies child as at risk; model incorrectly classifies child as not at risk.]
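For readers who want a concrete sense of how this kind of feature-importance view can be produced, the sketch below is a minimal, hypothetical illustration (assuming scikit-learn is available): a binary logistic regression is fitted to invented data with invented feature names, and its coefficients are read as log-odds contributions and odds ratios. It is not the system described in this case study.

```python
# Minimal, hypothetical sketch: reading relative feature importance off a
# binary logistic regression. Data, feature names, and target are invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income_thousands": rng.normal(30, 8, 500),
    "num_dependents": rng.integers(1, 5, 500),
    "num_referrals": rng.integers(0, 6, 500),
    "past_csc": rng.integers(0, 2, 500),
})
y = rng.integers(0, 2, 500)  # hypothetical target: 1 = "at risk", 0 = "not at risk"

model = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient is the change in the log-odds of the "at risk" class for a
# one-unit increase in that feature; exponentiating gives an odds ratio.
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: log-odds {coef:+.3f}, odds ratio {np.exp(coef):.3f}")
```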
Team Instructions
1. While your facilitator presents, feel free to read along with the Key Concepts section of the board, and make note of any questions you have. After the presentation, feel free to ask any questions.

4. After the discussion, reconvene as a team, having one volunteer from each group report back with a summary of their group discussion.
Information Gathering
Objective
Practise gathering relevant information for building explanations of AI
systems.
Team Instructions
1. In this activity, your team will be split into groups, each focused on gathering relevant information for building an explanation for the Smith family.

5. A volunteer note-taker should write team answers under the Supporting Information column; they will also report back to the team in the next activity.
Checklist
Evaluating Explanations
Objective
Practise evaluating the extent to which AI explanations meet their
purpose and align with the Maxims of AI Explainability.
Team Instructions
1. In this activity, volunteer note-takers are to share each group's results from the previous activity.

2. After each volunteer has shared, have a group discussion, considering the following questions:

• Has enough information been gathered to achieve the purpose of this explanation type? Consider the descriptions on the Key Concepts section of the board.

3. Next, revisit the questions from the previous activity:

• What concerns may the AI system in the case study raise for AI Explainability?

• Which explanation types do you think are most important for this AI system?

4. Your workshop co-facilitator will take notes about your discussion on the board.
To help you get a better picture of the spectrum of algorithmic techniques, this Appendix lays out
some of the basic properties, potential uses, and interpretability characteristics of the most widely
used algorithms at present. These techniques are also listed in the table below. We recommend
that you work with a data scientist or related expert in considering or applying these techniques.
The 11 techniques listed in the left column are considered to be largely interpretable, although for
some of them, like the regression-based and tree-based algorithms, this depends on the number
of input features that are being processed. The four techniques in the right column are more or
less considered to be ‘black box’ algorithms.
Linear regression (LR)
Basic description: Makes predictions about a target variable by summing weighted input/predictor variables.
Potential uses: Advantageous in highly regulated sectors like finance (e.g. credit scoring) and healthcare (e.g. predicting disease risk given lifestyle and existing health conditions) because it is simpler to calculate and have oversight over.
Interpretability: High level of interpretability because of linearity and monotonicity. Can become less interpretable with an increased number of features (i.e. high dimensionality).
Logistic regression
Basic description: Extends linear regression to classification problems by using a logistic function to transform outputs to a probability between 0 and 1.
Potential uses: Like linear regression, advantageous in highly regulated and safety-critical sectors, but in use cases that are based in classification problems such as yes/no decisions on risks, credit, or disease.
Interpretability: Good level of interpretability but less so than LR, because features are transformed through a logistic function and related to the probabilistic result logarithmically rather than as sums.
Generalised linear model (GLM)
Basic description: To model relationships between features and target variables that do not follow normal (Gaussian) distributions, a GLM introduces a link function that allows for the extension of LR to non-normal distributions.
Potential uses: This extension of LR is applicable to use cases where target variables have constraints that require the exponential family set of distributions (for instance, if a target variable involves number of people, units of time, or probabilities of outcome, the result has to have a non-negative value).
Interpretability: Good level of interpretability that tracks the advantages of LR while also introducing more flexibility. Because of the link function, determining feature importance may be less straightforward than with the additive character of simple LR, and a degree of transparency may be lost.
Decision tree (DT)
Basic description: A model that uses inductive branching methods to split data into interrelated decision nodes which terminate in classifications or predictions. DT's move from starting 'root' nodes to terminal 'leaf' nodes, following a logical decision path that is determined by Boolean-like 'if-then' operators that are weighted through training.
Potential uses: Because the step-by-step logic that produces DT outcomes is easily understandable to non-technical users (depending on the number of nodes/features), this method may be used in high-stakes and safety-critical decision-support situations that require transparency, as well as many other use cases where the volume of relevant features is reasonably low.
Interpretability: High level of interpretability if the DT is kept manageably small, so that the logic can be followed end-to-end. The advantage of DT's over LR is that the former can accommodate non-linearity and variable interaction while remaining interpretable.
Closely related to DT’s, rule/ Because the step-by-step logic Rule lists and sets have one
decision lists and sets apply that produces DT outcomes is of the highest degrees of
series of if-then statements easily understandable to non- interpretability of all optimally
to input features in order to technical users (depending on performing and non-opaque
generate predictions. Whereas number of nodes/ features), algorithmic techniques. However,
decision lists are ordered and this method may be used in they also share with DT’s the
narrow down the logic behind an high-stakes and safety-critical same possibility that degrees of
output by applying ‘else’ rules, decision-support situations that understandability are lost as the
decision sets keep individual require transparency as well as rule lists get longer or the rule
if-then statements unordered many other use cases where sets get larger.
and largely independent, while volume of relevant features is
weighting them so that rule reasonably low.
voting can occur in generating
predictions.
Supersparse linear integer model (SLIM)
Basic description: SLIM utilises data-driven learning to generate a simple scoring system that only requires users to add, subtract, and multiply a few numbers in order to make a prediction. Because SLIM produces such a sparse and accessible model, it can be implemented quickly and efficiently by non-technical users, who need no special training to deploy the system.
Potential uses: SLIM has been used in medical applications that require quick and streamlined but optimally accurate clinical decision-making. A version called Risk-Calibrated SLIM (RiskSLIM) has been applied to the criminal justice sector to show that its sparse linear methods are as effective for recidivism prediction as some opaque models that are in use.
Interpretability: Because of its sparse and easily understandable character, SLIM offers optimal interpretability for human-centred decision-support. As a manually completed scoring system, it also ensures the active engagement of the interpreter-user, who implements it.
Naïve Bayes
Basic description: Uses Bayes rule to estimate the probability that a feature belongs to a given class, assuming that features are independent of each other. To classify a feature, the Naïve Bayes classifier computes the posterior probability for the class membership of that feature by multiplying the prior probability of the class with the class conditional probability of the feature.
Potential uses: While this technique is called naïve because of the unrealistic assumption of the independence of features, it is known to be very effective. Its quick calculation time and scalability make it good for applications with high dimensional feature spaces. Common applications include spam filtering, recommender systems, and sentiment analysis.
Interpretability: Naïve Bayes classifiers are highly interpretable, because the class membership probability of each feature is computed independently. The assumption that the conditional probabilities of the independent variables are statistically independent, however, is also a weakness, because feature interactions are not considered.
K-nearest neighbour (KNN)
Basic description: Used to group data into clusters for purposes of either classification or prediction, this technique identifies a neighbourhood of nearest neighbours around a data point of concern and either finds the mean outcome of them for prediction or the most common class among them for classification.
Potential uses: KNN is a simple, intuitive, versatile technique that has wide applications but works best with smaller datasets. Because it is non-parametric (makes no assumptions about the underlying data distribution), it is effective for non-linear data without losing interpretability. Common applications include recommender systems, image recognition, and customer rating and sorting.
Interpretability: KNN works off the assumption that classes or outcomes can be predicted by looking at the proximity of the data points upon which they depend to data points that yielded similar classes and outcomes. This intuition about the importance of nearness/proximity is the explanation of all KNN results. Such an explanation is more convincing when the feature space remains small, so that similarity between instances remains accessible.
Support vector machine (SVM)
Basic description: Uses a special type of mapping function to build a divider between two sets of features in a high dimensional feature space. An SVM therefore sorts two classes by maximising the margin of the decision boundary between them.
Potential uses: SVM's are extremely versatile for complex sorting tasks. They can be used to detect the presence of objects in images (face/no face; cat/no cat), to classify text types (sports article/arts article), and to identify genes of interest in bioinformatics.
Interpretability: Low level of interpretability that depends on the dimensionality of the feature space. In context-determined cases, the use of SVM's should be supplemented by secondary explanation tools.
Artificial neural network (ANN)
Basic description: Family of non-linear statistical techniques (including recurrent, convolutional, and deep neural nets) that build complex mapping functions to predict or classify data by employing the feedforward (and sometimes feedback) of input variables through trained networks of interconnected and multi-layered operations.
Potential uses: ANN's are best suited to complete a wide range of classification and prediction tasks for high dimensional feature spaces, i.e. cases where there are very large input vectors. Their uses may range from computer vision, image recognition, sales and weather forecasting, pharmaceutical discovery, and stock prediction to machine translation, disease diagnosis, and fraud detection.
Interpretability: The tendencies towards curviness (extreme non-linearity) and high dimensionality of input variables produce very low levels of interpretability in ANN's. They are considered to be the epitome of 'black box' techniques. Where appropriate, the use of ANN's should be supplemented by secondary explanation tools.
Random forest
Basic description: Builds a predictive model by combining and averaging the results from multiple (sometimes thousands of) decision trees that are trained on random subsets of shared features and training data.
Potential uses: Random forests are often used to effectively boost the performance of individual decision trees, to improve their error rates, and to mitigate overfitting. They are very popular in high-dimensional problem areas like genomic medicine and have also been used extensively in computational linguistics, econometrics, and predictive risk modelling.
Interpretability: Very low levels of interpretability may result from the method of training these ensembles of decision trees on bagged data and randomised features, the number of trees in a given forest, and the possibility that individual trees may have hundreds or even thousands of nodes.
Ensemble methods
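The table above notes that a decision tree's step-by-step logic can be followed end-to-end when the tree is kept small. As a minimal, hypothetical illustration of that point (assuming scikit-learn is available; the data and feature names are invented), the sketch below trains a shallow tree and prints its complete if-then logic.

```python
# Hypothetical sketch: a small decision tree whose full decision logic can be
# printed and read end-to-end. Data and feature names are invented.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders the trained tree as nested if-then rules.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))
```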
The distinction between the explanation of single instances of a model’s results and an
explanation of how it works across all of its outputs is often characterised as the difference
between local explanation and global explanation. Both types of explanation offer
potentially helpful support for providing significant information about the rationale behind
an AI system’s output.
Providing a global explanation entails offering a wide-angled view that captures the
inner-workings and logic of that model’s behaviour as a whole and across predictions or
classifications. This kind of explanation can capture the overall significance of features
and variable interactions for model outputs and significant changes in the relationship of
predictor and response variables across instances. It can also provide insights into dataset-
level and population-level patterns, which are crucial for both big picture and case-focused
decision-making.
Similarly, when this type of internal explanation is applied to a ‘black box model’, it can
shed light on that opaque model’s operation by breaking it down into more understandable,
analysable, and digestible parts. For example, in the case of an artificial neural network
(ANN), it can break it down into interpretable characteristics of its vectors, features,
interactions, layers, parameters etc. This is often referred to as ‘peeking into the black
box’.
• test the sensitivity of the outputs of an opaque model to perturbations in its inputs;
If, after considering domain, impact, and technical factors, you have chosen to use a ‘black
box’ AI system, your next step is to incorporate appropriate supplementary explanation
tools into building your model.
With this in mind, ‘fidelity’ may be a suitable primary goal for your technical ‘black box’
explanation strategy. In order for your supplementary tool to achieve a high level of
fidelity, it should provide a reliable and accurate approximation of the system’s behaviour.
For practical purposes, you should think both locally and globally when choosing the
supplementary explanation tools that will achieve fidelity.
This sort of global understanding may also provide crucial insights into your model’s more
general potential impacts on individuals and wider society, as well as allow your team
to improve the model, so that you can properly address concerns raised by such global
insights.
In the following pages we provide you with a table containing details of some of the more
widely used supplementary explanation strategies and tools, and we highlight some of their
strengths and weaknesses. Keep in mind, though, that this is a rapidly developing field,
so remaining up to date with the latest tools will mean that you and technical members of
your team need to move beyond the basic information we are offering there. The following
pages cover the following supplementary explanation strategies:
• Surrogate models (SM)
• Partial dependence plots (PDP)
• Individual conditional expectations (ICE) plots
• Accumulated local effects (ALE) plots
• Global variable importance
• Global variable interaction
• Sensitivity analysis and layer-wise relevance propagation (LRP)
• Local interpretable model-agnostic explanation (LIME)
• Shapley additive explanations (SHAP)
• Counterfactual explanation
• Self-explaining and attention-based systems
Surrogate models (SMs) build a simpler interpretable model (often a decision tree or rule list) from the
dataset and predictions of an opaque system. The purpose of the SM is to provide an
understandable proxy of the complex model that estimates that model well, while not
having the same degree of opacity. They are good for assisting in processes of model
diagnosis and improvement and can help to expose overfitting and bias. They can also
represent some non-linearities and interactions that exist in the original model.
Limitations
As approximations, SM’s often fail to capture the full extent of non-linear relationships and
high-dimensional interactions among features. There is a seemingly unavoidable trade-
off between the need for the SM to be sufficiently simple so that it is understandable by
humans, and the need for that model to be sufficiently complex so that it can represent the
intricacies of how the mapping function of a ‘black box’ model works as a whole. That said,
the R2 measurement can provide a good quantitative metric of the accuracy of the SM’s
approximation of the original complex model.
Global/local? Internal/post-hoc?
For the most part, SM’s may be used both globally and locally. As simplified proxies, they
are post-hoc.
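As a minimal, hypothetical illustration of this strategy (assuming scikit-learn is available; the models and data are invented), the sketch below trains a shallow decision tree to mimic a random forest's predictions and reports the R² of that mimicry as a rough fidelity measure for the surrogate.

```python
# Hypothetical sketch of a global surrogate model: a shallow decision tree is
# fitted to the *predictions* of an opaque random forest, and R² measures how
# faithfully the surrogate approximates the original model.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=6, noise=10.0, random_state=0)

black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
bb_predictions = black_box.predict(X)

surrogate = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, bb_predictions)

fidelity = r2_score(bb_predictions, surrogate.predict(X))
print(f"Surrogate fidelity (R² against the black-box predictions): {fidelity:.3f}")
```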
A partial dependence plot (PDP) calculates and graphically represents the marginal effect of one or two input
features on the output of an opaque model by probing the dependency relation between
the input variable(s) of interest and the predicted outcome across the dataset, while
averaging out the effect of all the other features in the model. This is a good visualisation
tool, which allows a clear and intuitive representation of the nonlinear behaviour for
complex functions (like random forests and SVM’s). It is helpful, for instance, in showing
that a given model of interest meets monotonicity constraints across the distribution it fits.
Limitations
While PDP’s allow for valuable access to non-linear relationships between predictor and
response variables, and therefore also for comparisons of model behaviour with domain-
informed expectations of reasonable relationships between features and outcomes, they do
not account for interactions between the input variables under consideration. They may, in
this way, be misleading when certain features of interest are strongly correlated with other
model features.
Because PDP’s average out marginal effects, they may also be misleading if features have
uneven effects on the response function across different subsets of the data—ie where they
have different associations with the output at different points. The PDP may flatten out
these heterogeneities to the mean.
Global/local? Internal/post-hoc?
PDP’s are global post-hoc explainers that can also allow deeper causal understandings of
the behaviour of an opaque model through visualisation. These insights are, however, very
partial and incomplete both because PDP’s are unable to represent feature interactions and
heterogenous effects, and because they are unable to graphically represent more than a
couple of features at a time (human spatial thinking is limited to a few dimensions, so only
two variables in 3D space are easily graspable).
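The sketch below is a minimal, hypothetical example of computing and plotting one-feature partial dependence, assuming a reasonably recent version of scikit-learn (1.0 or later), matplotlib, and invented data.

```python
# Hypothetical sketch: partial dependence of a random forest's output on two
# features, averaged over the rest of the dataset.
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# kind="average" draws the classic PDP line for each requested feature.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1], kind="average")
plt.show()
```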
Refining and extending PDP's, individual conditional expectation (ICE) plots graph the functional relationship between a
single feature and the predicted response for an individual instance. Holding all features
constant except the feature of interest, ICE plots represent how, for each observation,
a given prediction changes as the values of that feature vary. Significantly, ICE plots
therefore disaggregate or break down the averaging of partial feature effects generated in
a PDP by showing changes in the feature-output relationship for each specific instance, ie
observation-by-observation. This means that it can both detect interactions and account
for uneven associations of predictor and response variables.
Limitations
When used in combination with PDP’s, ICE plots can provide local information about
feature behaviour that enhances the coarser global explanations offered by PDP’s. Most
importantly, ICE plots are able to detect the interaction effects and heterogeneity in
features that remain hidden from PDP’s in virtue of the way they compute the partial
dependence of outputs on features of interest by averaging out the effect of the other
predictor variables. Still, although ICE plots can identify interactions, they are also liable
to missing significant correlations between features and become misleading in some
instances.
Constructing ICE plots can also become challenging when datasets are very large. In these
cases, time-saving approximation techniques such as sampling observation or binning
variables can be employed (but, depending on adjustments and size of the dataset, with an
unavoidable impact on explanation accuracy).
Global/local? Internal/post-hoc?
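As a minimal, hypothetical illustration (again assuming scikit-learn 1.0 or later, matplotlib, and invented data), the sketch below overlays ICE curves for each observation with the averaged PDP line for the same feature, so that heterogeneous effects become visible.

```python
# Hypothetical sketch: ICE curves (one line per observation) overlaid with the
# averaged PDP line for the same feature.
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# kind="both" draws individual (ICE) curves plus the partial dependence average.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
plt.show()
```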
Accumulated local effects (ALE) plots are also more computationally tractable than PDP's because they are able to use
techniques to compute effects in smaller intervals and chunks of observations.
Limitations
A notable limitation of ALE plots has to do with the way that they carve up the data
distribution into intervals that are largely chosen by the explanation designer. If there
are too many intervals, the prediction differences may become too small and less stably
estimate influences. If the intervals are widened too much, the graph will cease to
sufficiently represent the complexity of the underlying model.
While ALE plots are good for providing global explanations that account for feature
correlations, the strengths of using PDP’s in combination with ICE plots should also
be considered (especially when there are less interaction effects in the model being
explained). All three visualisation techniques shed light on different dimensions of interest
in explaining opaque systems, so the appropriateness of employing them should be
weighed case-by-case.
Global/local? Internal/post-hoc?
ALE plots are a global and post-hoc form of supplementary explanation.
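Dedicated ALE libraries exist, but to make the underlying idea concrete, the sketch below is a simplified, hypothetical first-order ALE estimate for a single feature: prediction differences are computed within quantile intervals of that feature, then accumulated and centred. The helper function ale_1d and the data are invented for this illustration.

```python
# Hypothetical, simplified sketch of a first-order ALE estimate for one feature.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

def ale_1d(predict, X, feature, n_intervals=20):
    x = X[:, feature]
    # Interval edges taken from quantiles so each interval holds similar numbers of points.
    edges = np.quantile(x, np.linspace(0, 1, n_intervals + 1))
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x >= lo) & (x <= hi)
        if not in_bin.any():
            effects.append(0.0)
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = lo
        X_hi[:, feature] = hi
        # Average local effect of moving the feature across this interval only.
        effects.append((predict(X_hi) - predict(X_lo)).mean())
    ale = np.cumsum(effects)          # accumulate the local effects
    return edges, ale - ale.mean()    # simple centring so the mean effect is zero

edges, ale = ale_1d(model.predict, X, feature=0)
print(np.round(ale, 2))
```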
The global variable importance strategy calculates the contribution of each input feature
to model output across the dataset by permuting the feature of interest and measuring
changes in the prediction error: if changing the value of the permuted feature increases
the model error, then that feature is considered to be important. Utilising global variable
importance to understand the relative influence of features on the performance of the
model can provide significant insight into the logic underlying the model’s behaviour. This
method also provides valuable understanding about non-linearities in the complex model
that is being explained.
Limitations
While permuting variables to measure their relative importance, to some extent, accounts
for interaction effects, there is still a high degree of imprecision in the method with regard
to which variables are interacting and how much these interactions are impacting the
performance of the model.
A bigger picture limitation of global variable importance comes from what is known as the
‘Rashomon effect’. This refers to the variety of different models that may fit the same data
distribution equally well. These models may have very different sets of significant features.
Because the permutation-based technique can only provide explanatory insight with regard
to a single model’s performance, it is unable to address this wider problem of the variety of
effective explanation schemes.
Global/local? Internal/post-hoc?
Global variable importance is a form of global and post-hoc explanation.
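As a minimal, hypothetical illustration (assuming scikit-learn is available; the data are invented), the sketch below computes permutation-based variable importance on a held-out set, averaging several shuffles per feature.

```python
# Hypothetical sketch: permutation-based global variable importance. Each
# feature is shuffled in turn and the resulting drop in performance is taken
# as that feature's importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# n_repeats controls how many shuffles are averaged per feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature_{i}: {mean:.4f} (std {std:.4f})")
```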
The global variable interaction strategy computes the importance of variable interactions
across the dataset by measuring the variance in the model’s prediction when potentially
interacting variables are assumed to be independent. This is primarily done by calculating
an ‘H-statistic’ where a no-interaction partial dependence function is subtracted from an
observed partial dependence function in order to compute the variance in the prediction.
This is a versatile explanation strategy, which has been employed to calculate interaction
effects in many types of complex models including ANN’s and Random Forests. It can be
used to calculate interactions between two or more variables and also between variables
and the response function as a whole. It has been effectively used, for example, in
biological research to identify interaction effects among genes.
Limitations
While the basic capacity to identify interaction effects in complex models is a positive
contribution of global variable interaction as a supplementary explanatory strategy, there
are a couple of potential drawbacks to which you may want to pay attention.
First, there is no established metric in this method to determine the quantitative threshold
across which measured interactions become significant. The relative significance of
interactions is useful information as such, but there is no way to know at which point
interactions are strong enough to exercise effects.
Second, the computational burden of this explanation strategy is very high, because
interaction effects are being calculated combinatorially across all the data points. This
means that as the number of data points increase, the number of necessary computations
increase exponentially.
Global/local? Internal/post-hoc?
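There is no single standard implementation of the H-statistic in common toolkits, so the sketch below is a brute-force, hypothetical illustration of the idea: centred partial dependence functions are estimated directly from the data, and the share of variance not explained by the two one-feature effects is attributed to their interaction. The helper functions centred_pd and h_statistic are invented for this example, and the computation scales poorly beyond small samples.

```python
# Hypothetical, brute-force sketch of a two-feature H-statistic.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def centred_pd(predict, X, feature_idx):
    # Partial dependence of the prediction on the features in feature_idx,
    # evaluated at each observation and centred to have zero mean.
    vals = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        X_mod = X.copy()
        X_mod[:, feature_idx] = X[i, feature_idx]
        vals[i] = predict(X_mod).mean()
    return vals - vals.mean()

def h_statistic(predict, X, j, k):
    pd_j = centred_pd(predict, X, [j])
    pd_k = centred_pd(predict, X, [k])
    pd_jk = centred_pd(predict, X, [j, k])
    # Variance left over after subtracting the two one-feature effects,
    # relative to the variance of the joint effect.
    return np.sqrt(((pd_jk - pd_j - pd_k) ** 2).sum() / (pd_jk ** 2).sum())

print(f"H-statistic for features 0 and 1: {h_statistic(model.predict, X, 0, 1):.3f}")
```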
Sensitivity analysis and LRP are supplementary explanation tools used for artificial neural
networks. Sensitivity analysis identifies the most relevant features of an input vector
by calculating local gradients to determine how a data point has to be moved to change
the output label. Here, an output’s sensitivity to such changes in input values identifies
the most relevant features. LRP is another method to identify feature relevance that is
downstream from sensitivity analysis. It uses a strategy of moving backward through
the layers of a neural net graph to map patterns of high activation in the nodes and
ultimately generates interpretable groupings of salient input variables that can be visually
represented in a heat or pixel attribution map.
Limitations
Both sensitivity analysis and LRP identify important variables in the vastly large feature
spaces of neural nets. These explanatory techniques find visually informative patterns
by mathematically piecing together the values of individual nodes in the network. As a
consequence of this piecemeal approach, they offer very little by way of an account of the
reasoning or logic behind the results of an ANNs’ data processing.
Recently, more and more research has focused on attention-based methods of identifying
the higher-order representations that are guiding the mapping functions of these kinds of
models as well as on interpretable CBR methods that are integrated into ANN architectures
and that analyse images by identifying prototypical parts and combining them into a
representational wholes. These newer techniques are showing that some significant
progress is being made in uncovering the underlying logic of some ANN’s.
Global/local? Internal/post-hoc?
Sensitivity analysis and salience mapping are forms of local and post-hoc explanation,
although the recent incorporation of CBR techniques is moving neural net explanations
toward a more internal basis of interpretation.
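The sketch below is a minimal, hypothetical illustration of gradient-based sensitivity analysis (not LRP itself) for a small neural network, assuming PyTorch is available: the gradient of the output with respect to each input indicates how strongly small changes in that input move the output.

```python
# Hypothetical sketch: gradient-based sensitivity of a tiny network's output
# to each input feature. Network weights and the input instance are invented.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(1, 6, requires_grad=True)  # a single, invented input instance
net(x).sum().backward()                    # gradient of the output w.r.t. the input

sensitivity = x.grad.abs().squeeze()
print("Per-feature sensitivity:", [round(v, 4) for v in sensitivity.tolist()])
print("Most relevant feature index:", int(sensitivity.argmax()))
```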
LIME (local interpretable model-agnostic explanations) builds a local approximation of an opaque model by generating a simple linear regression model, weighting the values of
the data points, which were produced by randomly perturbing the opaque model, according
to their proximity to the original prediction or classification. The closest of these values to
the instance being explained are weighted the heaviest, so that the supplemental model
can produce an explanation of feature importance that is locally faithful to that instance.
Note that other interpretive models like decision trees may be used as well.
Limitations
While LIME appears to be a step in the right direction, in its versatility and in the
availability of many iterations in very useable software, a host of issues that present
challenges to the approach remains unresolved.
For instance, the crucial aspect of how to properly define the proximity measure for the
‘neighbourhood’ or ‘local region’ where the explanation applies remains unclear, and small
changes in the scale of the chosen measure can lead to greatly diverging explanations.
Likewise, the explanation produced by the supplemental linear model can quickly become
unreliable, even with small and virtually unnoticeable perturbations of the system it is
attempting to approximate. This challenges the basic assumption that there is always
some simplified interpretable model that successfully approximates the underlying model
reasonably well near any given data point.
LIME’s creators have largely acknowledged these shortcomings and have recently offered a
new explanatory approach that they call ‘anchors’. These ‘high precision rules’ incorporate
into their formal structures ‘reasonable patterns’ that are operating within the underlying
model (such as the implicit linguistic conventions that are at work in a sentiment prediction
model), so that they can establish suitable and faithful boundaries of their explanatory
coverage of its predictions or classifications.
Global/local? Internal/post-hoc?
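To make the mechanism described above concrete, the sketch below is a stripped-down, hypothetical illustration of the LIME idea rather than the LIME library itself: the opaque model is probed with random perturbations around one instance, the perturbed points are weighted by proximity, and a weighted linear model provides the local explanation. The kernel width and perturbation scale are arbitrary choices made for illustration.

```python
# Hypothetical, simplified LIME-style local explanation (not the LIME library).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x0 = X[0]                                            # the instance to explain
perturbed = x0 + rng.normal(scale=0.5, size=(1000, X.shape[1]))
bb_probs = black_box.predict_proba(perturbed)[:, 1]  # opaque model's outputs

# Proximity kernel: closer perturbations get heavier weights.
distances = np.linalg.norm(perturbed - x0, axis=1)
weights = np.exp(-(distances ** 2) / (2 * 0.75 ** 2))

# Weighted linear model fitted around x0 gives the local feature weights.
local_model = LinearRegression().fit(perturbed, bb_probs, sample_weight=weights)
for i, coef in enumerate(local_model.coef_):
    print(f"feature_{i}: local weight {coef:+.3f}")
```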
SHAP (Shapley additive explanations) uses concepts from cooperative game theory to define a ‘Shapley value’ for a
feature of concern that provides a measurement of its influence on the underlying model’s
prediction.
Broadly, this value is calculated by averaging the feature’s marginal contribution to every
possible prediction for the instance under consideration. The way SHAP computes marginal
contributions is by constructing two instances: the first instance includes the feature being
measured, while the second leaves it out by substituting a randomly selected stand-in
variable for it. After calculating the prediction for each of these instances by plugging their
values into the original model, the result of the second is subtracted from that of the first
to determine the marginal contribution of the feature. This procedure is then repeated for
all possible combinations of features so that the weighted average of all of the marginal
contributions of the feature of concern can be computed.
This method then allows SHAP, by extension, to estimate the Shapley values for all input
features in the set to produce the complete distribution of the prediction for the instance.
While computationally intensive, this means that for the calculation of the specific instance,
SHAP can axiomatically guarantee the consistency and accuracy of its reckoning of the
marginal effect of the feature. This computational robustness has made SHAP attractive
as an explainer for a wide variety of complex models, because it can provide a more
comprehensive picture of relative feature influence for a given instance than any other
post-hoc explanation tool.
Limitations
Of the several drawbacks of SHAP, the most practical one is that such a procedure is
computationally burdensome and becomes intractable beyond a certain threshold.
Note, though, some later SHAP versions do offer methods of approximation such as Kernel
SHAP and Shapley Sampling Values to avoid this excessive computational expense. These
methods do, however, affect the overall accuracy of the method.
Another significant limitation of SHAP is that its method of sampling values in order to
measure marginal variable contributions assumes feature independence (ie that values
sampled are not correlated in ways that might significantly affect the output for a particular
calculation). As a consequence, the interaction effects engendered by and between
the stand-in variables that are used as substitutes for left-out features are necessarily
unaccounted for when conditional contributions are approximated. The result is the
introduction of uncertainty into the explanation that is produced, because the complexity
of multivariate interactions in the underlying model may not be sufficiently captured by the
simplicity of this supplemental interpretability technique.
There are currently efforts being made to account for feature dependencies in the SHAP
calculations. The original creators of the technique have introduced Tree SHAP to, at least
partially, include feature interactions. Others have recently introduced extensions of Kernel
SHAP.
Global/local? Internal/post-hoc?
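The sketch below is a brute-force, hypothetical illustration of the Shapley-value computation described above, not the SHAP library: each feature's marginal contribution is averaged over every coalition of the other features, with left-out features filled in from a background sample. It is exact but only tractable for a handful of features; the function shapley_values and the data are invented for this example.

```python
# Hypothetical, brute-force exact Shapley values for one instance.
import itertools
import math
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

def shapley_values(predict, x, background):
    n_features = len(x)
    phi = np.zeros(n_features)

    def coalition_value(coalition):
        # Expected prediction when only the coalition's features take the
        # instance's values and the rest come from the background sample.
        X_syn = background.copy()
        if coalition:
            X_syn[:, list(coalition)] = x[list(coalition)]
        return predict(X_syn).mean()

    for j in range(n_features):
        others = [f for f in range(n_features) if f != j]
        for size in range(len(others) + 1):
            for S in itertools.combinations(others, size):
                # Classic Shapley weight |S|! (M - |S| - 1)! / M!
                weight = (math.factorial(len(S)) * math.factorial(n_features - len(S) - 1)
                          / math.factorial(n_features))
                phi[j] += weight * (coalition_value(S + (j,)) - coalition_value(S))
    return phi

phi = shapley_values(model.predict, X[0], background=X[:100])
print("Shapley values for the first instance:", np.round(phi, 3))
```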
Counterfactual explanations offer information about how specific factors that influenced
an algorithmic decision can be changed so that better alternatives can be realised by the
recipient of a particular decision or outcome.
Limitations
While counterfactual explanation offers a useful way to contrastively explore how feature
importance may influence an outcome, it has limitations that originate in the variety
of possible features that may be included when considering alternative outcomes. In
certain cases, the sheer number of potentially significant features that could be at play
in counterfactual explanations of a given result can make a clear and direct explanation
difficult to obtain and selected sets of possible explanations seem potentially arbitrary.
Moreover, there are as yet limitations on the types of datasets and functions to which these
kinds of explanations are applicable.
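As a minimal, hypothetical illustration of the idea (assuming scikit-learn is available; the data are invented), the sketch below performs a naive counterfactual search: it looks for the smallest single-feature change that flips the model's decision for one instance. Real counterfactual tools use far more principled search strategies and plausibility constraints.

```python
# Hypothetical sketch: brute-force search for a single-feature counterfactual.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]
original_class = model.predict(x0.reshape(1, -1))[0]

best_change = None
for feature in range(X.shape[1]):
    # Candidate values drawn from the observed range of this feature.
    for candidate in np.linspace(X[:, feature].min(), X[:, feature].max(), 50):
        x_cf = x0.copy()
        x_cf[feature] = candidate
        if model.predict(x_cf.reshape(1, -1))[0] != original_class:
            cost = abs(candidate - x0[feature])
            if best_change is None or cost < best_change[2]:
                best_change = (feature, candidate, cost)

if best_change is None:
    print("No single-feature counterfactual found on the candidate grid.")
else:
    feature, candidate, cost = best_change
    print(f"Changing feature_{feature} from {x0[feature]:.2f} to {candidate:.2f} "
          f"(a change of {cost:.2f}) flips the model's decision.")
```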
Global/local? Internal/post-hoc?
Limitations
Global/local? Internal/post-hoc?
Because self-explaining and attention-based systems are secondary tools that can utilise
many different methods of explanation, they may be global or local, internal or post-hoc,
or a combination of any of them.
aiethics.turing.ac.uk
Version 1.2
This work is licensed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, provided the original author and source are credited. The license is available at: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode