Responsible Machine Learning
Actionable Strategies for Mitigating
Risks and Driving Adoption
Preface
world. Since there are so many issues to cover, we don’t dwell on any
subject for very long. We hope that interested readers will engage in
and explore our references to understand the real-world implica‐
tions and the real, human consequences. We also hope that the sheer
volume and diversity of presented material leave an indelible
impression on how readers think about ML.
We break our discussions of risk mitigation and ML adoption
strategies into three major chapters: people, processes, and technol‐
ogies. The people and processes chapters (Chapters 2 and 3)
describe actions people can take, and processes that organizations
can employ to mitigate ML risks and increase ML adoption. These
two chapters are meant to be approachable for non-technical audi‐
ences. While it includes no mathematical formulas, the technology
chapter (Chapter 4) requires some technical background in ML, and
it may be best suited for ML practitioners and their frontline man‐
agers. It’s also important to say that the ML risks we’re addressing
are sophisticated, unsolved, and intersectional. Given the complex‐
ity of ML systems and how they interact with the world, there’s no
silver bullet to derisk an ML system completely.
Moreover, the serial nature of a printed report means that we
address risks and mitigation strategies one by one. In truth, both the
risks and strategies are inherently connected: compliance, discrimi‐
nation, instability, privacy, and security risks are related, and so are
the actions you could take to address them. Since deployment by an
organization is often where risks become real for ML, proper risk
mitigation is a key last-mile problem for ML’s success. Although
imperfect, we hope you’ll find the proposed strategies helpful and
actionable to decrease risk and maximize the long-term value of ML
in your organization.
CHAPTER 1
Introduction to Responsible
Machine Learning
“Success in creating effective AI could be the biggest event in the history
of our civilization. Or the worst.”
—Stephen Hawking
Machine learning (ML) systems can make and save money for
organizations across industries, and they’re a critical aspect of many
organizations’ digital transformation plans. For these reasons (and
others), ML investments were increasing rapidly before the
COVID-19 crisis, and they’re expected to stay healthy as the situa‐
tion unfolds. However, ML systems present risks for operators, con‐
sumers, and the general public. In many ways, this is similar to an
older generation of transformational commercial technologies, like
jetliners and nuclear reactors. Like these technologies, ML can fail
on its own, or adversaries can attack it. Unlike some older transfor‐
mational technologies, and despite growing evidence of ML’s capa‐
bility to do serious harm, ML practitioners don’t seem to consider
risk mitigation to be a primary directive of their work.1
Common ML failure modes include unaccountable black-box
mechanisms, social discrimination, security vulnerabilities, privacy
harms, and the decay of system quality over time. Most ML attacks
involve insider manipulation of training data and model
mechanisms; manipulation of predictions or intellectual property
extraction by external adversaries; or trojans hidden in third-party
data, models, or other artifacts. When failures or attacks spiral out
of control, they become full-blown AI incidents, creating significant
adverse outcomes for the operator or the public. There have been
over 1,000 reports of AI incidents to date.
While AI incidents are receiving more attention in the news and
technology media of late, the hype around ML still seems to focus
mostly on ML successes and not on ML risks. Consequently, some
decision makers and practitioners implement ML without a sober
evaluation of its dangers. This report will cut through the hype to
provide a high-level overview of ML’s emerging risk mitigation prac‐
tices—often called “responsible machine learning.” This first chapter
will give definitions of responsible AI and ML, and Chapters 2, 3,
and 4 discuss viable ML risk mitigation steps for people, processes,
and technologies, respectively. Chapter 5 closes this report with
business-driven perspectives on risk and trust.
“People worry that computers will get too smart and take over the
world, but the real problem is that they’re too stupid and they’ve already
taken over the world.”
—Pedro Domingos
Since the field’s inception, there has been a temptation to give AI and
ML ever more agency. However, this should not be the goal for
organizations deploying ML today. Due to all the AI incidents we’re
seeing, we firmly believe the technology isn’t mature enough.
Instead, the goal should be to make sure humans are in the loop of
ML-based decision making. Human involvement is imperative
because an all too common mistake, as the quote above highlights, is
for firms to assume their responsible ML duties lie solely in techno‐
logical implementation. This chapter presents many of the human
considerations that companies must address when building out their
ML infrastructure. We’ll start with organizational culture then shift
the discussion to how practitioners and consumers can get more
involved with the inner workings of ML systems. The chapter closes
by highlighting some recent examples of employee activism and data
journalism related to the responsible practice of ML.
diversity. We’ll also discuss the arguably stale adage, “move fast and
break things.”
Accountability
A key to the successful mitigation of ML risks is real accountability.
Ask yourself: “Who tracks the way ML is developed and used at my
organization? Who is responsible for auditing our ML systems? Do
we have AI incident response plans?” For many organizations today,
the answers may be “no one” and “no.” If no one’s job is on the line
when an ML system fails or gets attacked, then it’s possible that no
one at that organization really cares about ML risks. This is a pri‐
mary reason that many leading financial institutions now employ
chief model risk officers. Smaller organizations may not be able to
spare an entire full-time employee to monitor ML model risk. Still,
it’s essential to have an individual or group responsible and held
accountable if ML systems misbehave. In our experience, if an orga‐
nization assumes everyone is accountable for ML risk and AI inci‐
dents, the reality is that no one is accountable.
Dogfooding
Dogfooding is a term from software engineering that refers to an
organization using its own software, i.e., “eating your own dog food.”
In the context of responsible ML, dogfooding brings an additional
layer of alpha or prealpha testing that is often neglected in the mad
dash to profit from a perceived ML gold rush. More importantly,
dogfooding can bring legal and risk questions to the forefront. If an
organization has developed an ML system that operates in a manner
that, say, violates their own privacy policies, or is meant to be decep‐
tive or manipulative, employees engaging in dogfooding might find
this objectionable and raise concerns. Dogfooding can bring the
Golden Rule into ML: if you wouldn’t use an ML system on yourself,
you probably should not use it on others. We’ll discuss diversity in
the next section, but it’s worth mentioning here that if your team is
more diverse, dogfooding is more likely to detect a wider variety of
objectionable (or problematic) features.
When you’re ready to move beyond these basic steps, check out the
referenced papers from Google Research and look into resources
from public model risk management forums, e.g., The North Amer‐
ican Chief Risk Officer Council.
Domain Expertise
Many were introduced to the concept of human expertise in the loop
by the Pandora recommendation algorithm or something similar,
which has ultimately evolved into a multibillion-dollar industry of
expert labeling and decision review of ML systems. More generally,
real-world success in ML almost always requires some input from
humans with a deep understanding of the problem domain. Of
course, such experts can help with feature selection and engineering,
and interpretation of ML results. But the experts can also serve as a
sanity check mechanism. For instance, if you’re developing a medi‐
cal ML system, you should consult physicians and other medical
professionals. How will generalist data scientists understand the
subtlety and complexity inherent in medical data and the results of
systems trained on such data? They might not be able to, which can
lead to AI incidents when the system is deployed. The social scien‐
ces deserve a special callout in this regard as well. In what has been
described as “tech’s quiet colonization of the social sciences,” some
organizations are pursuing ill-advised ML projects that either replace
decisions that would be made by trained social scientists or rely on
practices, such as facial recognition for criminal risk assessment,
that have been condemned by actual social scientists.
1 The famous statistician George Box is credited with saying, “all models are wrong, but
some are useful”.
Kill Switches
The title of a recent Forbes article asks, “Will There Be a Kill Switch
For AI?”. If your organization wants to mitigate risks around ML
and AI, we hope the answer for your ML systems will be, “yes.”
ML systems can make decisions very quickly—orders of magnitude
faster than humans. So, if your ML system goes seriously wrong, you
will want to be able to turn it off fast. But how will you even know if
your ML system is misbehaving? ML systems should be monitored
for multiple kinds of problems, including inaccuracy, instability, dis‐
crimination, leakage of private data, and security vulnerabilities.
Once you’ve detected a severe problem, the question then becomes,
can you turn off the ML system? ML system outputs often feed into
downstream business processes, sometimes including other ML sys‐
tems. These systems and business processes can be mission critical,
as in the case of an ML system used for credit underwriting or
Discrimination In, Discrimination Out
We hear about many discriminatory algorithms these days, but dis‐
crimination tends to enter ML systems most often through poor
experimental design or biased, unrepresentative, or mislabeled
training data. This is a crucial process concern because business
goals often define an ML model’s inherent experiment, and training
data is usually collected or purchased as part of some broader
organizational mechanism. When an organization is designing an
ML system or selecting data for an ML project, discrimination can
enter into the system in many ways, including:
Problem framing (e.g., association or label bias)
In ML, we essentially use a dataset to ask the question: is X pre‐
dictive of y? Sometimes simply asking this question can set up a
discriminatory premise. For instance, predicting criminal risk
(y) based on facial characteristics (X), or using individual
healthcare costs (y) as an inherently biased substitute for health‐
care needs. Said another way, just because you have access to
data on two different topics doesn’t mean that ML can link them
without introducing or perpetuating discrimination.
Labeling or annotation (e.g., exclusion, sampling, reporting, label, or
nonresponse bias)
Data is often cleaned or preprocessed before it ever reaches an
ML algorithm. These processes, if done without care, can intro‐
duce discrimination. For example, switching race to a numeric
code, misinterpreting a coded value for a particular demo‐
graphic group, or mislabeling sound or images due to uncon‐
scious human bias are just a few ways discrimination can seep
into data cleaning or preprocessing.
Unrepresentative data (e.g., selection or coverage bias)
ML models require highly representative training data. Con‐
sider training a facial recognition classifier on face images col‐
lected in one country, for example, the US, and then applying it
in another country, like Japan or Kenya. The chances are that
the model will be less accurate for the people it learned less
about during training. This is yet another way ML can be
discriminatory.
Once discriminatory data enters into an ML system, you can bet dis‐
criminatory predictions will quickly follow. A real difference
between ML and human decision making is speed. ML systems can
make decisions about a lot of people, very quickly. Moreover, the
complexity of ML models can make finding discrimination more
difficult than in traditional linear models. All of this can add up to a
disastrous AI incident, like the one described in a recent Science arti‐
cle, where a major US insurer unintentionally used an allegedly dis‐
criminatory algorithm to allocate healthcare resources for perhaps
millions of patients. Such discrimination can cause significant harm
to consumers and regulatory and reputational problems for
Model Governance
The decision to move into the world of ML is not a simple undertak‐
ing, and smart leaders can be left asking, “How can I mitigate the
risks for my organization?” Luckily, there are mature model gover‐
nance practices crafted by government agencies and private compa‐
nies that your organization can use to get started. This section will
highlight some of the governance structures and processes your
organization can employ to ensure fairness, accountability, and
transparency in your ML functions. This discussion is split into
three major sections: model monitoring, model documentation, and
organizational concerns. We’ll wrap up the discussion of model gov‐
ernance with some brief advice for practitioners looking for just the
bare bones needed to get started on basic model governance.
Model Monitoring
Model monitoring is a stage in the ML lifecycle that involves keep‐
ing tabs on your ML system while it is making predictions or deci‐
sions on new, live data. There’s lots to be aware of when monitoring
ML models. First and foremost is model decay. Model decay is a
common failure mode for ML systems. It happens when the charac‐
teristics of live data coming into an ML system drift away from those
of the training data, making the underlying ML model less accurate.
Model drift is most often described in terms of decreasing model
accuracy, but can also affect the fairness or security of ML systems.
Model drift is typically detected by monitoring the statistical proper‐
ties of model inputs and predictions and comparing them to those
recorded at training time. For fairness and security, monitoring
could involve real-time discrimination testing and ongoing red-
teaming or security audits of deployed ML systems, respectively.
Anytime a significant drift is detected, system stakeholders should
be alerted. To address accuracy drift, ML systems are typically
retrained with new data when drift is detected, or at frequent inter‐
vals to avoid drift altogether. Addressing drifts in fairness or security
is a more novel pursuit and standard practices are not yet estab‐
lished. However, the discrimination testing and remediation and
security countermeasures discussed elsewhere in this report could
also be helpful in this regard.
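As a minimal illustration of this kind of drift detection (our own sketch in Python; the specific metric and alert threshold are our choices, not requirements of any tool named in this report), the population stability index compares the distribution of model scores recorded at training time to the distribution seen on live data:

import numpy as np

def psi(train_values, live_values, bins=10):
    """Population stability index between training and live distributions."""
    # Bin edges come from quantiles of the training distribution.
    edges = np.unique(np.quantile(train_values, np.linspace(0, 1, bins + 1)))
    # Clip live values into the training range so every value lands in a bin.
    live_clipped = np.clip(live_values, edges[0], edges[-1])
    train_pct = np.histogram(train_values, bins=edges)[0] / len(train_values)
    live_pct = np.histogram(live_clipped, bins=edges)[0] / len(live_values)
    # Guard against empty bins before taking logs.
    train_pct = np.clip(train_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - train_pct) * np.log(live_pct / train_pct)))

# Hypothetical scores recorded at training time versus scores on live data.
rng = np.random.default_rng(42)
train_scores = rng.beta(2, 5, size=10_000)
live_scores = rng.beta(2, 3, size=1_000)   # the live distribution has shifted

drift = psi(train_scores, live_scores)
if drift > 0.2:  # common rule-of-thumb threshold; tune it for your use case
    print(f"PSI={drift:.3f}: alert stakeholders and consider retraining")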
Another major topic in model monitoring is anomaly detection.
Strange input or output values from an ML system can be indicative
of stability problems or security and privacy vulnerabilities. It’s pos‐
sible to use statistics, ML, and business rules to monitor anomalous
behavior in both inputs and outputs, and across an entire ML sys‐
tem. Just like when model drift is detected, system stakeholders
must be made aware of anomalous ML system inputs and outputs.
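Here is a minimal sketch of input anomaly monitoring, assuming a scikit-learn workflow and synthetic stand-in data; in practice, flagged rows would be logged and routed to the appropriate stakeholders:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5_000, 4))            # stand-in for training inputs
X_live = np.vstack([rng.normal(size=(99, 4)),    # normal-looking live rows
                    [[25.0, -30.0, 0.0, 99.0]]]) # one obviously strange row

# Fit the detector on training inputs, then score incoming live rows.
detector = IsolationForest(random_state=0).fit(X_train)
flags = detector.predict(X_live)                 # -1 marks anomalous rows

for i in np.where(flags == -1)[0]:
    print(f"Row {i} looks anomalous; route it for human review and log the event")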
Two additional and worrisome scenarios for which to monitor are
error propagation and feedback loops. Error propagation refers to
problems in the output of some data or ML system leading to wor‐
sening errors in the consuming ML system or in subsequent
Model Documentation
All organizational predictive models should be inventoried and doc‐
umented. When done correctly, model documentation should pro‐
vide all pertinent technical, business, and personnel information
about a model, enabling detailed human review, maintenance con‐
tinuity, and some degree of incident response. Moreover, in some
industries, model documentation is already a regulatory require‐
ment. The main drawback of model documentation is that it is tedi‐
ous and time consuming, sometimes taking longer to write the
documentation than to train the ML model itself. One answer to this
problem was provided by Google Research in their recent model
cards and datasheets work. Model cards and datasheets provide
quick, summary information about the models and data used in ML
systems, respectively. Another promising answer has started to
emerge in the commercial analytics market: automatic model docu‐
mentation. Purchasing or building ML software that creates model
documents along with your ML-model training can be a great solu‐
tion for ML teams looking to document their models and save time
on resource-intensive model governance activities. Of course, even
if model documentation is generated automatically, humans must
still read the documentation and raise concerns when necessary.
structure for organizations looking to build ML into their opera‐
tional processes. This chart uses the acronym D&A, a common
industry shorthand for data and analytics groups, especially in years
past.
Figure 3-3. Proposed model governance workflow and organizational
responsibility architecture (courtesy of Ben Cox and H2O.ai).
AI Incident Response
Like nearly all of the commercial technologies that came before it,
ML systems fail and can be attacked. To date, there have been over
1,000 public reports of such incidents. Even our most secure, regula‐
ted, and monitored commercial technologies, like airliners and
nuclear reactors, experience attacks and failures. Given that very few
organizations are auditing and monitoring ML systems with the
same rigor, and that regulatory interest in ML systems is on the rise,
we’ll probably hear more about AI incidents in the next few years.
Furthermore, when a technology is important to an organization’s
mission, it’s not uncommon to have built-in redundancy and inci‐
dent response plans. ML systems should be no different. Having a
plan in place for ML system failures or attacks can be the difference
between a glitch in system behavior and a serious AI incident with
negative consequences for both the organization and the public.
The stress and confusion of an active AI incident can make incident
response difficult. Who has the authority to respond? Who has the
budget? What are the commercial consequences of turning off an
ML system? These basic questions and many more are why AI inci‐
dent response requires advanced planning. The idea of being pre‐
pared for problems is not new for computer systems, or even for
predictive modeling. Respected institutions such as SANS and NIST
already publish computer security incident response plans. Model
governance practices typically include inventories of ML systems
with detailed documentation designed, among other goals, to help
respond to ML system failures. While conventional incident
response plans and model governance are great places to start miti‐
gating AI incident risks, neither are a perfect fit for AI incident
response. Many conventional incident response plans do not yet
address specialized ML attacks, and model governance often does
not explicitly address incident response or ML security. To see a
sample AI incident response plan that builds off both traditional
incident response and model governance, and that incorporates the
necessary specifics for ML, check out the free and open Sample AI
Incident Checklist. And don’t wait until it’s too late to make your own
AI incident response plan!
“If builders built houses the way programmers built programs, the first
woodpecker to come along would destroy civilization.”
—Gerald M. Weinberg
Reproducibility
Establishing reproducible benchmarks to gauge improvements (or
degradation) in accuracy, fairness, interpretability, privacy, or
security is crucial for applying the scientific method. Reproducibility
can also be necessary for regulatory compliance in certain cases.
Unfortunately, the complexity of ML workflows makes reproducibil‐
ity a real challenge. This section presents a few pointers for increas‐
ing reproducibility in your organization’s ML systems.
Metadata
Metadata about ML systems allows data scientists to track all model
artifacts that lead to a deployed model (e.g., datasets, preprocessing
steps, model, data and model validation results, human sign offs,
and deployment details). Many of the additional reproducibility
steps presented below are just specific ways to track ML system
metadata. Tracking metadata also allows retracing of what went
wrong, throughout the entire ML life cycle, when an AI incident
occurs. For an open-source example of a nice tool for tracking meta‐
data, check out TensorFlow’s MLMD.
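If a dedicated metadata tool isn’t in place yet, even a hand-rolled record goes a long way. Here is a minimal sketch (our own illustration; the parameter values, metrics, and sign-off address are hypothetical) that captures the data hash, model parameters, validation results, human sign off, and environment behind one trained model:

import hashlib, json, platform, time
import numpy as np
import sklearn

def training_metadata(X, y, model_params, validation_metrics, approver):
    """Capture the artifacts behind a trained model in one reviewable record."""
    data_hash = hashlib.sha256(np.ascontiguousarray(X).tobytes()
                               + np.ascontiguousarray(y).tobytes()).hexdigest()
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "training_data_sha256": data_hash,
        "model_params": model_params,
        "validation_metrics": validation_metrics,
        "human_sign_off": approver,
        "environment": {
            "python": platform.python_version(),
            "scikit_learn": sklearn.__version__,
        },
    }

# Hypothetical training artifacts; in practice these come from your pipeline.
X, y = np.random.rand(1_000, 5), np.random.randint(0, 2, 1_000)
record = training_metadata(
    X, y,
    model_params={"max_depth": 5, "n_estimators": 300, "random_state": 42},
    validation_metrics={"auc": 0.81},
    approver="jane.doe@example.com",
)
with open("model_metadata.json", "w") as f:
    json.dump(record, f, indent=2)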
Random Seeds
ML models are subject to something known as the “multiplicity of
good models,” or the “Rashomon effect.” This means that, unlike with
more traditional linear models, there can be huge numbers of
acceptable ML models for any given dataset. ML models also utilize
randomness, which can cause unexpected results. These factors
conspire to make reproducible outcomes in ML models more diffi‐
cult than in traditional statistics and software engineering. Luckily,
almost all contemporary, high-quality ML software comes with a
“seed” parameter to help improve reproducibility. The seed typically
starts the random number generator inside an algorithm at the same
place every time. The key with seeds is to understand how they work
in different packages and then use them consistently.
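Here is a minimal sketch, assuming a Python and scikit-learn workflow; the same idea applies to any framework that exposes a seed or random_state parameter:

import random
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42                      # one documented seed, reused everywhere
random.seed(SEED)              # Python's built-in RNG
np.random.seed(SEED)           # NumPy's global RNG

X = np.random.rand(500, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Pass the seed explicitly to every step that uses randomness.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=SEED)
model = RandomForestClassifier(n_estimators=100, random_state=SEED).fit(X_tr, y_tr)
print("Holdout accuracy:", model.score(X_te, y_te))  # reproducible across reruns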
Version Control
ML code is often highly intricate and typically relies on many third-
party libraries or packages. Of course, changes in your code and
changes to third-party code can change the outcomes of an ML sys‐
tem. Systematically keeping track of these changes is another good
way to increase reproducibility, transparency, and your sanity. Git
and GitHub are free and ubiquitous resources for software version
control, but there are plenty of other options to explore. Ensuring
correct versions of certain ML libraries is also very important in any
ML application, as different versions of ML libraries can lead to dif‐
ferences in performance and accuracy. So, ensuring that versions of
each library used are documented and controlled will often lead to
better reproducibility. Also, remember that tracking changes to large
datasets and other ML-related artifacts is different than tracking
code changes. In addition to some of the environment tools we dis‐
cuss in the next subsection, check out Pachyderm or DVC for data
versioning.
Environments
ML models are trained, tested, and deployed in an environment that
is determined by software, hardware, and running programs. Ensur‐
ing a consistent environment for your ML model during training,
testing, and deployment is critical. Different environments will most
likely be detrimental to reproducibility (and just a huge pain to han‐
dle manually). Happily, many tools are now available to help data
scientists and ML engineers preserve their computing environ‐
ments. For instance, Python, sometimes called the lingua franca of
ML, now includes virtual environments for preserving coding
environments.
Virtual machines, and more recently, containers, provide a mecha‐
nism to replicate the entire software environment in which an ML
system operates. When it comes to ML, the container framework is
very popular. It can preserve the exact environment a model was
trained in and be run later on different hardware—major pluses for
reproducibility and easing ML system deployment! Moreover, speci‐
alized software has even been developed specifically to address envi‐
ronment reproducibility in data and ML workflows. Check out
Domino Data Lab, Gigantum, KubeFlow Pipelines, and TensorFlow
Extended to see what these specialized offerings look like.
Hardware
Hardware is the collection of different physical components that
enable a computer to run, subsequently allowing ML code to run,
which finally enables the training and deployment of ML systems.
Of course, hardware can have a major impact on ML system repro‐
ducibility. Basic considerations for hardware and ML reproducibility
include ensuring similarity of the hardware used between training
and deployment of ML systems and testing ML systems across dif‐
ferent hardware with an eye toward reproducibility.
By taking stock of these factors, along with the benchmark models
also discussed later in Chapter 4, data scientists, ML and data engi‐
neers, and other IT personnel should be able to enhance your
organization’s ML reproducibility capabilities. This is just a first step
to being more responsible with ML, but should also lead to happier
customers and faster ML product delivery over an ML system’s life‐
span. And once you know your ML system is standing on solid foot‐
ing, then the next big technological step is to start applying
interpretable and explainable ML techniques so you can know
exactly how your system works.
Interpretable Models
For decades, an informal belief in a so-called “accuracy-
interpretability tradeoff” led most researchers and practitioners in
ML to treat their models as supposedly accurate, but inscrutable,
black boxes. In recent years, papers from leading ML scholars and
several empirical studies have begun to cast serious doubt on the
perceived tradeoff.1 There has been a flurry of papers and software
for new ML algorithms that are nonlinear, highly accurate, and
directly interpretable. Moreover, “interpretable” as a term has
become more associated with these kinds of new models.
New interpretable models are often Bayesian or constrained variants
of older ML algorithms, such as the explainable neural network
(XNN) pictured in the online resources that accompany this report.
In the example XNN, the model’s architecture is constrained to
make it more understandable to human operators.
Another key concept with interpretability is that it’s not a binary on-
off switch. And XNNs are probably some of the most complex kinds
of interpretable models. Scalable Bayesian rule lists, like some other
interpretable models, can create model architectures and results that
are perhaps interpretable enough for business decision makers.
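As one concrete example (the package choice is ours, not the report’s), the open source interpret library implements explainable boosting machines, modern additive models that are nonlinear yet directly interpretable because each feature’s learned contribution can be inspected on its own:

import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))                    # stand-in tabular features
y = (X[:, 0] + 0.5 * np.sin(3 * X[:, 1]) > 0).astype(int)

# An EBM is additive: one learned shape function per feature, so each
# feature's contribution can be plotted and reviewed by a human.
ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

global_explanation = ebm.explain_global()          # per-feature shape functions
print("Training accuracy:", (ebm.predict(X) == y).mean())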
Other interesting examples of these interpretable ML models
include:
ML models are deployed as part of public facing, organizational IT
systems.
This is where model debugging comes in. Model debugging is a
practice that’s focused on finding and fixing problems in ML sys‐
tems. In addition to a few novel approaches, the discipline borrows
from model governance, traditional model diagnostics, and software
testing. Model debugging attempts to test ML systems like computer
code because ML models are almost always made from code. And it
uses diagnostic approaches to trace complex ML model response
functions and decision boundaries to hunt down and address accu‐
racy, fairness, security, and other problems in ML systems. This sec‐
tion will discuss two types of model debugging: porting software
quality assurance (QA) techniques to ML, and specialized techni‐
ques needed to find and fix problems in the complex inner workings
of ML systems.
Benchmark Models
Benchmark models are simple, trusted, or transparent models to
which ML systems can be compared. They serve myriad risk mitiga‐
tion purposes in a typical ML workflow, including use in model
debugging and model monitoring.
Model Debugging
First, it’s always a good idea to check that a new complex ML model
outperforms a simpler benchmark model. Once an ML model passes
this baseline test, benchmark models can serve as debugging tools.
Use them to test your ML model by asking questions like, “What did
my ML model get wrong that my benchmark model got right? And
can I see why?” Another important function that benchmark models
can serve is tracking changes in complex ML pipelines. Running a
benchmark model at the beginning of a new training exercise can
help you confirm that you are starting on solid ground. Running
that same benchmark after making changes can help to confirm
whether changes truly improved an ML model or pipeline. More‐
over, automatically running benchmarks as part of a CI/CD process
can be a great way to understand how code changes impact complex
ML systems.
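Here is a minimal sketch of both uses, assuming a scikit-learn workflow with synthetic data, a logistic regression benchmark, and a gradient boosting machine standing in for the more complex ML model:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

benchmark = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)   # simple, trusted
complex_ml = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Baseline test: the complex model should beat the benchmark on holdout data.
print("benchmark:", benchmark.score(X_te, y_te),
      "complex:", complex_ml.score(X_te, y_te))

# Debugging: rows the benchmark gets right but the complex model gets wrong.
bench_right = benchmark.predict(X_te) == y_te
ml_wrong = complex_ml.predict(X_te) != y_te
suspect_rows = np.where(bench_right & ml_wrong)[0]
print(f"{len(suspect_rows)} rows to investigate, e.g., indices {suspect_rows[:5]}")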
Model Monitoring
Comparing simpler benchmark models and ML system predictions
as part of model monitoring can help to catch stability, fairness, or
security anomalies in near real time. Due to its simpler mechanism,
an interpretable benchmark model should be more stable, easier to
confirm as minimally discriminatory, and harder
to hack. So, the idea is to score new data with both a highly
transparent benchmark model and your more complex ML system,
and then compare the ML system’s predictions against the trusted benchmark
model. If the difference between your more complex ML system and
your benchmark model is above some reasonable threshold, then
fall back to issuing the benchmark model’s predictions or send the
row of data for manual processing. Also, record the incident. It
might turn out to be meaningful later. (It should be mentioned that
one concern when comparing an ML model versus a benchmark
model in production is the time it takes to score new data, i.e.,
increased latency.)
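A minimal sketch of this fallback logic, assuming two trained scikit-learn style classifiers (for example, the benchmark and complex models from the earlier sketch) and a disagreement threshold that you would tune on historical scoring data:

import numpy as np

THRESHOLD = 0.25   # example gap; tune on historical scoring data

def score_with_fallback(row, complex_model, benchmark_model, incident_log):
    """Score one row, falling back to the benchmark when the models disagree."""
    p_complex = complex_model.predict_proba(row.reshape(1, -1))[0, 1]
    p_benchmark = benchmark_model.predict_proba(row.reshape(1, -1))[0, 1]
    if abs(p_complex - p_benchmark) > THRESHOLD:
        # Record the disagreement; it may turn out to be meaningful later.
        incident_log.append({"row": row.tolist(),
                             "complex": p_complex, "benchmark": p_benchmark})
        return p_benchmark          # or route the row for manual processing
    return p_complex

# Usage, assuming models like those in the previous sketch:
# log = []
# score_with_fallback(X_te[0], complex_ml, benchmark, log)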
Given the host of benefits that benchmark models can provide, we
hope you’ll consider adding them into your training or deployment
technology stack.
These are discussed in more detail below. While picking the right
tool for discrimination testing and remediation is often difficult and
context sensitive, ML practitioners must make this effort. If you’re
using data about people, it probably encodes historical discrimina‐
tion that will be reflected in your ML system outcomes, unless you
find and fix it. This section will present the very basics of discrimi‐
nation testing and remediation in hopes of helping your organiza‐
tion get a jump start on fighting this nasty problem.
many individuals in your data, training adversary models or using
special training constraints, tracing decision boundaries, and using
post hoc explanation techniques to understand if features in your
models are local proxies for demographic variables. Of course,
doing all this extra work is never a bad idea, as it can help you
understand drivers of discrimination in your ML system, whether
these are group disparities or local disparities. And these extra steps
can be used later in your ML training process to confirm if any
applied remediation measures were truly successful.
Strategy 1
Strategy 1 is the traditional strategy (and safest from a US regulatory
perspective). Make sure to use no demographic features in your
model training, and simply check standard discrimination metrics
(like adverse impact ratio or standardized mean difference) across
an array of candidate ML models. Then select the least discrimina‐
tory model that is accurate enough to meet your business needs.
This is often the strategy used today in highly regulated areas like
lending and insurance.
Figure 4-1 illustrates how simply considering a discrimination
measure, adverse impact ratio (AIR) for African Americans versus
Caucasians in this case, during ML model selection can help find
accurate and less discriminatory models. AIR is usually accompa‐
nied by the four-fifths rule practical significance test, wherein the
ratio of the positive outcome rate for a historically marginalized
demographic to the positive outcome rate for a reference group,
often Whites or males, should be greater than 0.8, or four-fifths.
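A minimal sketch of this kind of discrimination testing during model selection follows; the decisions and performance numbers for the two candidate models are hypothetical, and the demographic marker is used only for testing, never for training:

import numpy as np

rng = np.random.default_rng(0)

def adverse_impact_ratio(decisions, group, protected, reference):
    """Ratio of favorable-outcome rates: protected group versus reference group."""
    rate_protected = decisions[group == protected].mean()
    rate_reference = decisions[group == reference].mean()
    return rate_protected / rate_reference

# Hypothetical approval decisions (1 = favorable) from two candidate models.
group = np.array(["black"] * 500 + ["white"] * 500)
candidates = {
    "model_a": {"auc": 0.79,
                "decisions": np.r_[rng.binomial(1, 0.28, 500),
                                   rng.binomial(1, 0.40, 500)]},
    "model_b": {"auc": 0.78,
                "decisions": np.r_[rng.binomial(1, 0.36, 500),
                                   rng.binomial(1, 0.40, 500)]},
}

for name, c in candidates.items():
    air = adverse_impact_ratio(c["decisions"], group, "black", "white")
    status = "meets four-fifths" if air >= 0.8 else "below four-fifths"
    print(f"{name}: AUC={c['auc']:.2f}, AIR={air:.2f} ({status})")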
Strategy 2
Strategy 2 includes newer methods from the ML, computer science,
and fairness research communities.
Fix your data. Today, in less regulated industrial sectors, you’ll likely
be able to use software packages that can help you resample or
reweight your data so that it brings less discrimination into your ML
model training to begin with. Another key consideration here is
simply collecting representative data; if you plan to use an ML sys‐
tem on a certain population, you should collect data that accurately
represents that population.
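As one concrete illustration of reweighting (our own sketch of a well-known scheme often attributed to Kamiran and Calders, not a specific package referenced in this report), sample weights can be chosen so that group membership and the outcome label look statistically independent in the weighted training data:

import numpy as np

def reweighing_weights(group, y):
    """Weight each row so that group membership and label are independent
    in the weighted data: w(g, label) = P(g) * P(label) / P(g, label)."""
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            mask = (group == g) & (y == label)
            weights[mask] = (np.mean(group == g) * np.mean(y == label)) / mask.mean()
    return weights

# Hypothetical labels that are skewed against one group.
rng = np.random.default_rng(0)
group = np.array(["a"] * 600 + ["b"] * 400)
y = np.r_[rng.binomial(1, 0.2, 600), rng.binomial(1, 0.5, 400)]

w = reweighing_weights(group, y)
# Most scikit-learn estimators accept these via fit(X, y, sample_weight=w).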
Some of these might even be permissible in highly regulated settings
today, but be sure to confer with your compliance or legal depart‐
ment before getting too invested in one of these techniques.
Regularization
The most aggressive, and perhaps riskiest approach from a reg‐
ulatory standpoint, is to leave demographic features in your ML
model training and decision-making processes, but use special‐
ized methods that attempt to regularize, or down weight, their
importance in the model.
Dual optimization
In a dual optimization approach, demographic features are not
typically used in the ML system decision-making process. But,
they are used during the ML model training process to down
weight model mechanisms that could result in more discrimina‐
tory outcomes. If you’re careful, dual optimization approaches
may be acceptable in some US regulated settings since demo‐
graphic information is not technically used in decision making.
Adversarial debiasing
In adversarial debiasing, two models compete against one
another. One ML model will be the model used inside your ML
system for decision making. This model usually does not have
access to any explicit demographic information. The other
model is an adversary model that is discarded after training,
and it does have access to explicit demographic information.
Training proceeds by first fitting the main model, then seeing if
the adversary can accurately predict demographic information
from only the main model’s predictions. If the adversary can,
the main model uses information from the adversary, but not
explicit demographic information, to down weight any hidden
demographic information in its training data. This back-and-
forth continues until the adversary can no longer predict demo‐
graphic information based on the main model’s predictions.
Like the dual optimization approach, adversarial debiasing may be
acceptable in some US regulated settings.
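A lightweight diagnostic inspired by this adversarial setup (a simplified leakage check, not a full adversarial debiasing training loop) is to fit a small adversary on your main model’s predictions and see whether it can recover a demographic marker better than chance; the scores and marker below are synthetic stand-ins:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical artifacts: main-model scores and a demographic marker used only here.
rng = np.random.default_rng(0)
demographic = rng.integers(0, 2, size=5_000)                     # 0/1 group marker
main_scores = 0.4 * demographic + rng.normal(0, 1, size=5_000)   # leaky on purpose

X = main_scores.reshape(-1, 1)
X_tr, X_te, g_tr, g_te = train_test_split(X, demographic, random_state=0)

adversary = LogisticRegression().fit(X_tr, g_tr)
auc = roc_auc_score(g_te, adversary.predict_proba(X_te)[:, 1])
print(f"Adversary AUC = {auc:.2f}; values well above 0.5 suggest demographic leakage")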
General attacks
ML systems are subject to hacks like distributed denial-of-service
(DDoS) attacks and man-in-the-middle attacks.
Insider attacks
Malicious or extorted insiders can change ML training data to
manipulate ML system outcomes. This is known as data poisoning.
They can also alter code used to score new data, including creating
back doors, to impact ML system outputs. (These attacks can also be
performed by unauthorized external adversaries but are often seen
as more realistic attack vectors for insiders.)
External attacks
Several types of external attacks involve hitting ML endpoints with
weird data to change the system’s output. This can be as simple as
using strange input data, known as adversarial examples, to game
the ML system’s results. Or these attacks can be more specific, say
impersonating another person’s data, or using tweaks to your own
data to evade certain ML-based security measures. Another kind of
external ML attack involves using ML prediction endpoints as
designed, meaning simply submitting data to—and receiving pre‐
dictions from—ML endpoints. But instead of using the submitted
data and received predictions for legitimate business purposes, this
information is used to steal ML model logic and to reason about, or
even replicate, sensitive ML training data.
Trojans
ML systems are often dependent on numerous third-party and
open-source software packages, and, more recently, large pretrained
architectures. Any of these can contain malicious payloads.
Illustrations of some ML attacks are provided in the online resour‐
ces that accompany this report. These illustrations are visual sum‐
maries of the discussed insider and external ML attacks. For an
excellent overview of most known attacks, see the Berryville
Machine Learning Institute’s Interactive Machine Learning Risk
Framework.
Countermeasures
Given the variety of attacks for ML systems, you may now be won‐
dering about how to protect your organization’s ML and AI models.
There are several countermeasures you can use and, when paired
with the processes proposed in Chapter 3—bug bounties, security
audits, and red teaming—such measures are more likely to be
The basics
Whenever possible, require consumer authentication to access pre‐
dictions or use ML systems. Also, throttle system response times for
large or anomalous requests. Both of these basic IT security meas‐
ures go a long way in hindering external attacks.
Model debugging
Use sensitivity analysis and adversarial example searches to profile
how your ML system responds to different types of data. If you find
that your model may be subject to manipulation by certain kinds of
input data, either retrain your model with more data, constraints
and regularization, or alert those responsible for model monitoring
to be on the lookout for the discovered vulnerabilities.
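A minimal sketch of a random perturbation search follows, assuming a trained scikit-learn style classifier and one row of data; dedicated adversarial example tools can be substituted where appropriate:

import numpy as np

def random_perturbation_search(model, row, n_trials=10_000, scale=0.25, seed=0):
    """Look for small input changes that flip a model's decision."""
    rng = np.random.default_rng(seed)
    base_pred = model.predict(row.reshape(1, -1))[0]
    flips = []
    for _ in range(n_trials):
        perturbed = row + rng.normal(0.0, scale, size=row.shape)
        if model.predict(perturbed.reshape(1, -1))[0] != base_pred:
            flips.append(perturbed)
    return flips

# Usage, assuming a trained classifier and a row of data from earlier sketches:
# flips = random_perturbation_search(complex_ml, X_te[0])
# if flips:
#     print(f"{len(flips)} nearby inputs flip the decision; alert model monitoring")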
Model monitoring
As discussed elsewhere in the report, models are often monitored
for decaying accuracy. But models should also be monitored for
adversarial attacks. Because an attack could make a model
discriminatory, real-time discrimination testing should be conduc‐
ted if possible. In addition to monitoring for accuracy and discrimi‐
nation, watching for strange inputs such as unrealistic data, random
data, duplicate data, and training data can help to catch external
adversarial attacks as they occur. Finally, a general strategy that has
also been discussed in other sections is the real-time comparison of
the ML system results to simpler benchmark model results.
receive the absolute minimum IT system permissions, is one of the
best ways to guard against insider ML attacks. Other strategies
include careful control and documentation of data and code for ML
systems and residual analysis to find strange predictions for insiders
or their close associates.
Other key points in ML security include privacy-enhancing technol‐
ogies (PETs) to obscure and protect training data and organizational
preparation with AI incident response plans. As touched on in
Chapter 3, incorporating some defensive strategies—and training on
how and when to use them—into your organization’s AI incident
response plans can improve your overall ML security. As for PETs,
the next section will address them.
Federated Learning
Federated learning is an approach to training ML algorithms across
multiple decentralized edge devices or servers holding local data
samples, without exchanging raw data. This approach is different
from traditional centralized ML techniques where all datasets are
uploaded to a single server. The main benefit of federated learning is
that it enables the construction of robust ML models without shar‐
ing data among many parties. Federated learning avoids sharing
data by training local models on local data samples and exchanging
parameters between servers or edge devices to generate a global
model, which is then shared by all servers or edge devices. Assum‐
ing a secure aggregation process is used, federated learning helps
address fundamental data privacy and data security concerns.
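A toy sketch of the federated averaging idea with plain NumPy follows (a deliberately simplified illustration; production federated learning frameworks add secure aggregation, client selection, communication handling, and much more):

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

def local_data(n):
    """Each client holds its own data; raw rows never leave the client."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(0, 0.1, size=n)
    return X, y

clients = [local_data(200) for _ in range(5)]
global_w = np.zeros(3)

for round_ in range(20):                       # communication rounds
    local_weights = []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(10):                    # a few local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        local_weights.append(w)                # only parameters are shared
    global_w = np.mean(local_weights, axis=0)  # server-side federated averaging

print("Recovered weights:", np.round(global_w, 2))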
Differential Privacy
Differential privacy is a system for sharing information about a
dataset by describing patterns about groups in the dataset without
disclosing information about specific individuals. In ML tools, this
Causality
We’ll close our responsible ML technology discussion with causality,
because modeling causal drivers of some phenomenon, instead of
complex correlations, could help address many of the risks we’ve
brought up. Correlation is not causation. And nearly all of today’s
popular ML approaches rely on correlation, or some more localized
variant of the same concept, to learn from data. Yet, data can be both
correlated and misleading. For instance, in the famous asthma
patient example discussed earlier, having asthma is correlated with
greater medical attention, not being at a lower risk of death from
pneumonia. Furthermore, a major concern in discrimination testing
and remediation is ML models learning complex correlations to
demographic features, instead of real relationships. Until ML algo‐
rithms can learn such causal relationships, they will be subject to
these kinds of basic logical flaws and other problems. Fortunately,
techniques like Markov Chain Monte Carlo (MCMC) sampling,
Bayesian networks, and various frameworks for causal inference are
beginning to pop up in commercial and open-source software ML
packages. More innovations are likely on the way, so keep an eye on
this important corner of the data world.
Aside from rigorous causal inference approaches, there are steps you
can take right now to incorporate causal concepts into your ML
projects. For instance, enhanced interpretability and model debug‐
ging can lead to a type of “poor man’s causality” where debugging is
used to find logical flaws in ML models and remediation techniques
such as model assertions, model editing, monotonicity constraints,
or interaction constraints are used to fix the flaw with human
domain knowledge. Root cause analysis is also a great addition to
high-stakes ML workflows. Interpretable ML models and post hoc
explanation techniques can now indicate reasons for ML model
behaviors, which human caseworkers can confirm or deny. These
findings can then be incorporated into the next iteration of the ML
system in hopes of improving multiple system KPIs. Of course, all of
these different suggestions are not a substitute for true causal infer‐
ence approaches, but they can help you make progress toward this
goal.
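Here is a minimal sketch of encoding domain knowledge with a monotonicity constraint, using scikit-learn’s histogram-based gradient boosting and synthetic data (the insurance-flavored feature names are hypothetical):

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5_000
prior_claims = rng.poisson(1.5, n)
vehicle_age = rng.uniform(0, 20, n)
X = np.column_stack([prior_claims, vehicle_age])
# Noisy target; domain knowledge says expected loss rises with prior claims.
y = 100 * prior_claims + 5 * vehicle_age + rng.normal(0, 150, n)

# monotonic_cst: +1 forces a non-decreasing relationship, 0 leaves it unconstrained.
model = HistGradientBoostingRegressor(monotonic_cst=[1, 0], random_state=0).fit(X, y)

# Check: predictions should not decrease as prior_claims increases, holding age fixed.
probe = np.column_stack([np.arange(0, 8), np.full(8, 10.0)])
print(np.round(model.predict(probe), 1))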
“By far, the greatest danger of Artificial Intelligence is that people con‐
clude too early that they understand it.”
—Eliezer Yudkowsky
The more you understand an ML system’s risks, the more you can
trust it. We often find that executives and leaders jump to ask,
“What is the risk?” whereas the data science practitioners are more
focused on, “Can I trust this prediction?” But in the end, they are
asking the same question.
The first and most obvious metrics to be analyzed are those around
the risk that a given ML model may manifest. Below are a few ques‐
tions informed decision makers need to ask regarding ML
deployments:
Acknowledgments
Thanks to our colleagues at H2O.ai, especially Ingrid Burton.
Thanks to Michele Cronin, Michelle Houston, Beth Kelly, Mike
Loukides, and Rebecca Novack at O’Reilly Media. Thanks also to
our colleagues in the broader data science community, Andrew
Burt, Hannes Hapke, Catherine Nelson, and Nicholas Schmidt.