Contents
Introduction
1 Sparse logical models: Decision trees, decision lists, decision sets
2 Scoring systems
3 Generalized additive models
4 Modern case-based reasoning
5 Complete supervised disentanglement of neural networks
arXiv: 2103.11251
∗ Partial support provided by grants DOE DE-SC0021358, NSF DGE-2022040, NSF CCF-
1934964, and NIDA DA054994-01.
† Equal contribution from C. Chen, Z. Chen, H. Huang, L. Semenova, and C. Zhong
Introduction
necessary authority to the black box [253]. There is a clear need for innovative
machine learning models that are inherently interpretable.
There is now a vast and confusing literature on some combination of in-
terpretability and explainability. Much literature on explainability confounds it
with interpretability/comprehensibility, obscuring the arguments (and detracting
from their precision), and failing to convey the relative importance
and use-cases of the two topics in practice. Some of the literature discusses top-
ics in such generality that its lessons have little bearing on any specific problem.
Some of it aims to design taxonomies that miss vast topics within interpretable
ML. Some of it provides definitions that we disagree with. Some of it even
provides guidance that could perpetuate bad practice. Importantly, most of it
assumes that one would explain a black box without consideration of whether
there is an interpretable model of the same accuracy. In what follows, we provide
some simple and general guiding principles of interpretable machine learning.
These are not meant to be exhaustive. Instead they aim to help readers avoid
common but problematic ways of thinking about interpretability in machine
learning.
The major part of this survey outlines a set of important and fundamental
technical grand challenges in interpretable machine learning. These are both
modern and classical challenges, and some are much harder than others. They
are all either hard to solve, or difficult to formulate correctly. While there
are numerous sociotechnical challenges about model deployment (that can be
much more difficult than technical challenges), human-computer interaction
challenges, and how robustness and fairness interact with interpretability, those
topics can be saved for another day. We begin with the most classical and most
canonical problems in interpretable machine learning: how to build sparse mod-
els for tabular data, including decision trees (Challenge #1) and scoring systems
(Challenge #2). We then delve into a challenge involving additive models (Chal-
lenge #3), followed by another in case-based reasoning (Challenge #4), which
is another classic topic in interpretable artificial intelligence. We then move to
more exotic problems, namely supervised and unsupervised disentanglement of
concepts in neural networks (Challenges #5 and #6). Returning to classical problems,
we discuss dimension reduction (Challenge #7), and then how to incorporate
physics or causal constraints (Challenge #8). Challenge #9 involves understand-
ing, exploring, and measuring the Rashomon set of accurate predictive models.
Challenge #10 discusses interpretable reinforcement learning. Table 1 provides
a guideline that may help users to match a dataset to a suitable interpretable
supervised learning technique. We will touch on all of these techniques in the
challenges.
Table 1
Rule of thumb for the types of data that naturally apply to various supervised learning
algorithms. “Clean” means that the data do not have too much noise or systematic bias.
“Tabular” means that the features are categorical or real, and that each feature is a
meaningful predictor of the output on its own. “Raw” data is unprocessed and has a complex
data type, e.g., image data where each pixel is a feature, medical records, or time series data.

decision trees / decision lists (rule lists) / decision sets: somewhat clean tabular data with interactions, including multiclass problems. Particularly useful for categorical data with complex interactions (i.e., more than quadratic). Robust to outliers.
scoring systems: somewhat clean tabular data, typically used in medicine and criminal justice because they are small enough that they can be memorized by humans.
generalized additive models (GAMs): continuous data with at most quadratic interactions, useful for large-scale medical record data.
case-based reasoning: any data type (different methods exist for different data types), including multiclass problems.
disentangled neural networks: data with raw inputs (computer vision, time series, textual data), suitable for multiclass problems.
A typical interpretable supervised learning setup, with data {z_i}_i and models
chosen from a function class F, is:

    min_{f ∈ F}  (1/n) Σ_i Loss(f, z_i) + C · InterpretabilityPenalty(f),   subject to   InterpretabilityConstraint(f),     (*)
where the loss function, as well as soft and hard interpretability constraints, are
chosen to match the domain. (For classification, z_i might be (x_i, y_i), x_i ∈ R^p, y_i ∈
{−1, 1}.) The goal of these constraints is to make the resulting model f or its
predictions more interpretable. While solutions of (*) would not necessarily be
sufficiently interpretable to use in practice, the constraints would generally help
us find models that would be interpretable (if we design them well), and we might
also be willing to consider slightly suboptimal solutions to find a more useful
model. The constant C trades off between accuracy and the interpretability
penalty, and can be tuned, either by cross-validation or by taking into account
the user’s desired tradeoff between the two terms.
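To make the setup in (*) concrete, here is a minimal sketch under illustrative assumptions: the model class F contains only a constant majority-class model (complexity 0) and single-feature threshold rules (complexity 1), the interpretability penalty is that complexity count, and the loss is 0-1 loss. The function and variable names are ours, not from any particular package.

```python
import numpy as np

def fit_interpretable(X, y, C=0.05):
    """Enumerate a tiny interpretable model class F and minimize
    (1/n) * sum_i Loss(f, z_i) + C * InterpretabilityPenalty(f)."""
    def loss(f):
        return np.mean(f(X) != y)          # 0-1 loss averaged over the data

    majority = 1 if (y == 1).mean() >= 0.5 else -1
    candidates = [(lambda x, m=majority: np.full(len(x), m), 0)]      # constant model, penalty 0
    for j in range(X.shape[1]):                                       # threshold rules, penalty 1
        for t in np.unique(X[:, j]):
            candidates.append((lambda x, j=j, t=t: np.where(x[:, j] > t, 1, -1), 1))

    return min(candidates, key=lambda c: loss(c[0]) + C * c[1])

# Toy usage: labels in {-1, +1} determined mostly by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0.2, 1, -1)
best_model, best_penalty = fit_interpretable(X, y)
```

Tuning C up or down in this sketch trades training loss against the complexity term, which is exactly the role it plays in (*).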
Equation (*) can be generalized to unsupervised learning, where the loss term
would simply be replaced by a loss term for the unsupervised problem, whether
it is novelty detection, clustering, dimension reduction, or another task.
Creating interpretable models can sometimes be much more difficult than
creating black box models for many different reasons including: (i) Solving the
optimization problem may be computationally hard, depending on the choice of
constraints and the model class F. (ii) When one does create an interpretable
model, one invariably realizes that the data are problematic and require trou-
bleshooting, which slows down deployment (but leads to a better model). (iii)
1 Obviously, this document does not apply to black box formulas that do not depend on
randomness in the data, i.e., a calculation of a deterministic function, which is not machine learning.
other hand, for self-driving cars, even if they are very reliable, problems would
arise if the car’s vision system malfunctions causing a crash and no reason for
the crash is available. Lack of interpretability would be problematic in this case.
Our second fundamental principle concerns trust:
With black boxes, one needs to make a decision about trust with much less
information; without knowledge about the reasoning process of the model, it
is more difficult to detect whether it might generalize beyond the dataset. As
stated by Afnan et al. [4] with respect to medical decisions, while interpretable
AI is an enhancement of human decision making, black box AI is a replacement
of it.
An important point about interpretable machine learning models is that there
is no scientific evidence for a general tradeoff between accuracy and interpretabil-
ity when one considers the full data science process for turning data into knowl-
edge. (Examples of such pipelines include KDD, CRISP-DM, or the CCC Big
Data Pipelines; see Figure 1, or [95, 51, 7].) In real problems, interpretability
is useful for troubleshooting, which leads to better accuracy, not worse. In that
sense, we have the third principle:
ically, sparsity, but model creators generally would not equate interpretability
with sparsity. Sparsity is often one component of interpretability, and a model
that is sufficiently sparse but has other desirable properties is more typical.
While there is almost always a tradeoff of accuracy with sparsity (particularly
for extremely small models), there is no evidence of a general tradeoff of ac-
curacy with interpretability. Let us consider both (1) development and use of
ML models in practice, and (2) experiments with static datasets; in neither case
have interpretable models proven to be less accurate.
These two data extremes show that in machine learning, the dichotomy be-
tween the accurate black box and the less-accurate interpretable model is false.
The often-discussed hypothetical choice between the accurate machine-learning-
based robotic surgeon and the less-accurate human surgeon is moot once some-
one builds an interpretable robotic surgeon. Given that even the most difficult
computer vision benchmarks can be solved with interpretable models, there is no
reason to believe that an interpretable robotic surgeon would be worse than its
black box counterpart. The question ultimately becomes whether the Rashomon
set should permit such an interpretable robotic surgeon, and all scientific evidence
so far (including a large and growing number of experimental papers on
interpretable deep learning) suggests it would.
Our next principle returns to the data science process.
Principle 4 As part of the full data science process, one should expect both the
performance metric and interpretability metric to be iteratively refined.
Hence, this survey concerns interpretability rather than explainability. This is not a survey on Explainable
AI (XAI, where one attempts to explain a black box using an approximation
model, derivatives, variable importance measures, or other statistics); it is a sur-
vey on Interpretable Machine Learning (creating a predictive model that is not
a black box). Unfortunately, these topics are much too often lumped together
within the misleading term “explainable artificial intelligence” or “XAI” despite
a chasm separating these two concepts [250]. Explainability and interpretability
techniques are not alternative choices for many real problems, as the recent sur-
veys often imply; one of them (XAI) can be dangerous for high-stakes decisions
to a degree that the other is not.
Interpretable ML is not a subset of XAI. The term XAI dates from ∼2016,
and grew out of work on function approximation; i.e., explaining a black box
model by approximating its predictions by a simpler model [e.g., 70, 69], or
explaining a black box using local approximations. Interpretable ML also has a
(separate) long and rich history, dating back to the days of expert systems in
the 1950’s, and the early days of decision trees. While these topics may sound
similar to some readers, they differ in ways that are important in practice.
In particular, there are many serious problems with the practice of explaining black
boxes posthoc, as outlined in several papers that have shown why explaining
black boxes can be misleading and why explanations do not generally serve their
intended purpose [250, 173, 171]. The most compelling such reasons are:
• Explanations for black boxes are often problematic and misleading, poten-
tially creating misplaced trust in black box models. Such issues with ex-
planations have arisen with assessment of fairness and variable importance
[258, 82] as well as uncertainty bands for variable importance [113, 97].
There is an overall difficulty in troubleshooting the combination of a black
box and an explanation model on top of it; if the explanation model is
not always correct, it can be difficult to tell whether the black box model
is wrong, or if it is right and the explanation model is wrong. Ultimately,
posthoc explanations are wrong (or misleading) too often.
One particular type of posthoc explanation, saliency maps (also
called attention maps), has become particularly popular in radiology
and other computer vision domains despite known problems [2, 53, 334].
Saliency maps highlight the pixels of an image that are used for a predic-
tion, but do not explain how the pixels are used. As an analogy, consider
a real estate agent who is pricing a house. A “black box” real estate agent
would provide the price with no explanation. A “saliency” real estate agent
would say that the price is determined from the roof and backyard, but
doesn’t explain how the roof and backyard were used to determine the
price. In contrast, an interpretable agent would explain the calculation in
detail, for instance, using “comps” or comparable properties to explain
how the roof and backyard are comparable between properties, and how
these comparisons were used to determine the price. One can see from this
real estate example how the saliency agent’s explanation is insufficient.
Saliency maps also tend to be unreliable; researchers often report that dif-
ferent saliency methods provide different results, making it unclear which
one (if any) actually represents the network’s true attention.2
• Black boxes are generally unnecessary, given that their accuracy is gener-
ally not better than a well-designed interpretable model. Thus, explana-
tions that seem reasonable can undermine efforts to find an interpretable
model of the same level of accuracy as the black box.
• Explanations for complex models hide the fact that complex models are
difficult to use in practice for many different reasons. Typographical errors
in input data are a prime example of this issue [as in the use of COMPAS
in practice, see 317]. A model with 130 hand-typed inputs is more error-prone
than one involving 5 hand-typed inputs.

2 To avoid possible confusion, techniques such as SHAP and LIME are tools for explaining
black box models, are not needed for inherently interpretable models, and thus do not belong
in this survey. These methods determine how much each variable contributed to a prediction.
Interpretable models do not need SHAP values because they already explicitly show what
variables they are using and how they are using them. For instance, sparse decision trees
and sparse linear models do not need SHAP values because we know exactly what variables
are being used and how they are used. Interpretable supervised deep neural networks that
use case-based reasoning (Challenge #4) do not need SHAP values because they explicitly
reveal what part of the observation they are paying attention to, and in addition, how that
information is being used (e.g., what comparison is being made between part of a test image
and part of a training image). Thus, if one creates an interpretable model, one does not need
LIME or SHAP whatsoever.
In that sense, explainability methods are often used as an excuse to use a black
box model, whether or not one is actually needed. Explainability techniques give
authority to black box models rather than suggesting the possibility of models
that are understandable in the first place [253].
XAI surveys have (thus far) universally failed to acknowledge the important
point that interpretability begets accuracy when considering the full data science
process, and not the other way around. Perhaps this point is missed because of
the more subtle fact that one does generally lose accuracy when approximating
a complicated function with a simpler one, so it would appear that the simpler
approximation is less accurate. (Again the approximations must be imperfect,
otherwise one would throw out the black box and instead use the explanation as
an inherently interpretable model.) But function approximators are not used in
interpretable ML; instead of approximating a known function (a black box ML
model), interpretable ML can choose from a potential myriad of approximately-
equally-good models, which, as we noted earlier, is called “the Rashomon set”
[41, 97, 269]. We will discuss the study of this set in Challenge 9. Thus, when
one explains black boxes, one expects to lose accuracy, whereas when one creates
an inherently interpretable ML model, one does not.
In this survey, we do not aim to provide yet another dull taxonomy of “ex-
plainability” terminology. The ideas of interpretable ML can be stated in just
one sentence: an interpretable model is constrained, following a domain-specific
set of constraints that make reasoning processes understandable. Instead, we
highlight important challenges, each of which can serve as a starting point for
someone wanting to enter into the field of interpretable ML.
The first two challenges involve optimization of sparse models. We discuss both
sparse logical models in Challenge #1 and scoring systems (which are sparse
linear models with integer coefficients) in Challenge #2. Sparsity is often used as
a measure of interpretability for tabular data where the features are meaningful.
Sparsity is useful because humans can handle only 7±2 cognitive entities at
the same time [208], and sparsity makes it easier to troubleshoot, check for
typographical errors, and reason about counterfactuals (e.g., “How would my
prediction change if I changed this specific input?”). Sparsity is rarely the only
consideration for interpretability, but if we can design models to be sparse, we
can often handle additional constraints. Also, if one can optimize for sparsity,
a useful baseline can be established for how sparse a model could be with a
particular level of accuracy.
We remark that more sparsity does not always equate to more interpretabil-
ity. This is because “humans by nature are mentally opposed to too simplistic
representations of complex relations” [93, 100]. For instance, in loan decisions,
we may choose to have several sparse mini-models for length of credit, history of
default, etc., which are then assembled at the end into a larger model composed
of the results of the mini-models [see 54, who attempted this]. On the other hand,
sparsity is necessary for many real applications, particularly in healthcare and
criminal justice where the practitioner needs to memorize the model.
Logical models, which consist of logical statements involving “if-then,” “or,”
and “and” clauses, are among the most popular algorithms for interpretable
machine learning, since their statements provide human-understandable reasons
for each prediction.
When would we use logical models? Logical models are usually an excellent
choice for modeling categorical data with potentially complicated interaction
terms (e.g., “IF (female AND high blood pressure AND congenital heart fail-
ure), OR (male AND high blood pressure AND either prior stroke OR age >
70) THEN predict Condition 1 = true”). Logical models are also excellent for
multiclass problems. Logical models are also known for their robustness to out-
liers and ease of handling missing data. Logical models can be highly nonlinear,
and even classes of sparse nonlinear models can be quite powerful.
Figure 2 visualizes three logical models: a decision tree, a decision list, and
a decision set. Decision trees are tree-structured predictive models where each
branch node tests a condition and each leaf node makes a prediction. Decision
lists, identical to rule lists or one-sided decision trees, are composed of if-then-
else statements. The rules are tried in order, and the first rule that is satisfied
makes the prediction. Sometimes rule lists have multiple conditions in each
split, whereas decision trees typically do not. A decision set, also known as a
“disjunction of conjunctions,” “disjunctive normal form” (DNF), or an “OR of
ANDs” consists of an unordered collection of rules, where each rule is a
conjunction of conditions. A positive prediction is made if at least one of the
rules is satisfied. Even though these logical models seem to have very different
forms, they are closely related: every decision list is a (one-sided) decision tree
and every decision tree can be expressed as an equivalent decision list (by listing
each path to a leaf as a decision rule). The collection of leaves of a decision tree
(or a decision list) also forms a decision set.

Fig 2. Predicting which individuals are arrested within two years of release by a decision
tree (a), a decision list (b), and a decision set (c). The dataset used here is the ProPublica
recidivism dataset [13].
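To make the relationships among these forms concrete, here is a minimal sketch with made-up rules (loosely inspired by the recidivism example of Figure 2, not the actual learned models): a decision list is an ordered if/elif/else, and an equivalent decision set is obtained by listing each rule as a conjunction of conditions.

```python
# Hypothetical rules for illustration only (not the models shown in Figure 2).
record = {"priors": 3, "age": 24}

def decision_list(r):
    if r["priors"] > 3:                      # rule 1
        return "arrested"
    elif r["age"] < 26 and r["priors"] > 0:  # rule 2 (only reached if rule 1 fails)
        return "arrested"
    else:                                    # default rule
        return "not arrested"

def decision_set(r):
    # "OR of ANDs": predict positive if any conjunction fires. In general, a later
    # conjunction must negate earlier conditions when the rules predict different classes.
    conjunctions = [
        r["priors"] > 3,
        r["priors"] <= 3 and r["age"] < 26 and r["priors"] > 0,
    ]
    return "arrested" if any(conjunctions) else "not arrested"

print(decision_list(record), decision_set(record))   # both predict: arrested
```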
Let us provide some background on decision trees. Since Morgan and Son-
quist [212] developed the first decision tree algorithm, many methods have been
proposed to build decision trees and improve their performance. However, learn-
ing decision trees with high performance and sparsity is not easy. Full decision
tree optimization is known to be an NP-complete problem [174], and heuristic
greedy splitting and pruning procedures have been the major type of approach
since the 1980s to grow decision trees [40, 239, 192, 205]. These greedy methods
for building decision trees create trees from the top down and prune them back
afterwards. They do not go back to fix a bad split if one was made. Consequently,
the trees created from these greedy methods tend to be both less accurate and
less interpretable than necessary. That is, greedy induction algorithms are not
designed to optimize any particular performance metric, leaving a gap between
the performance that a decision tree might obtain and the performance that
the algorithm’s decision tree actually attains, with no way to determine how
large the gap is (see Figure 3 for a case where the 1984 CART algorithm did
not obtain an optimal solution, as shown by the better solution from a 2020
algorithm called “GOSDT,” to the right). This gap can cause a problem in
practice because one does not know whether poor performance is due to the
choice of model form (the choice to use a decision tree of a specific size) or poor
optimization (not fully optimizing over the set of decision trees of that size).
When fully optimized, single trees can be as accurate as ensembles of trees, or
neural networks, for many problems. Thus, it is worthwhile to think carefully
about how to optimize them.
Fig 3. (a) 16-leaf decision tree learned by CART [40] and (b) 9-leaf decision tree generated by
GOSDT [185] for the classic Monk 2 dataset [88]. The GOSDT tree is optimal with respect
to a balance between accuracy and sparsity.
A natural formulation of the sparse decision tree problem balances training loss against the number of leaves:

    min_{f ∈ set of trees}  (1/n) Σ_i Loss(f, z_i) + C · (number of leaves of f),     (1.1)
where the user specifies the loss function and the trade-off (regularization) pa-
rameter.
Efforts to fully optimize decision trees, solving problems related to (1.1),
have been made since the 1990s [28, 85, 94, 220, 221, 131, 185]. Many recent
papers directly optimize the performance metric (e.g., accuracy) with soft or
hard sparsity constraints on the tree size, where sparsity is measured by the
number of leaves in the tree. Three major groups of these techniques are (1)
mathematical programming, including mixed integer programming (MIP) [see
the works of 28, 29, 251, 301, 302, 118, 6] and SAT solvers [214, 130] [see also the
review of 47], (2) stochastic search through the space of trees [e.g., 321, 114, 228],
and (3) customized dynamic programming algorithms that incorporate branch-
and-bound techniques for reducing the size of the search space [131, 185, 222, 78].
Decision list and decision set construction lead to the same challenges as
decision tree optimization, and have a parallel development path. Dating back
to the 1980s, decision lists have often been constructed in a top-down greedy fashion.
Associative classification methods assemble decision lists or decision sets from
a set of pre-mined rules, generally either by greedily adding rules to the model
one by one, or simply including all “top-scoring” rules into a decision set, where
each rule is scored separately according to a scoring function [247, 63, 188,
184, 325, 275, 200, 298, 252, 62, 108, 65, 99, 104, 199, 198]. Sometimes decision
lists or decision sets are optimized by sampling [180, 321, 306], providing a
Bayesian interpretation. Some recent works can jointly optimize performance
metrics and sparsity for decision lists [251, 327, 10, 11, 8] and decision sets
[311, 110, 170, 132, 77, 197, 109, 80, 328, 45]. Some works optimize for individual
rules [77, 255].
In recent years, great progress has been made on optimizing the combination
of accuracy and sparsity for logical models, but there are still many challenges
that need to be solved. Some important ones are as follows:
1.1 Can we improve the scalability of optimal sparse decision trees?
A lofty goal for optimal decision tree methods is to fully optimize trees
as fast as CART produces its (non-optimal) trees. Current state-of-the-art
optimal decision tree methods can handle medium-sized datasets (thousands
of samples, tens of binary variables) in a reasonable amount of time (e.g.,
within 10 minutes) when appropriate sparsity constraints are used. But how
to scale up to deal with large datasets or to reduce the running time remains
a challenge.
These methods often scale exponentially in p, the number of dimensions
of the data. Developing algorithms that reduce the number of dimensions
through variable screening theorems or through other means could be ex-
tremely helpful.
For methods that use the mathematical programming solvers, a good for-
mulation is key to reducing training time. For example, MIP solvers use
branch-and-bound methods, which partition the search space recursively
and solve Linear Programming (LP) relaxations for each partition to pro-
duce lower bounds. Small formulations with fewer variables and constraints
can enable the LP relaxations to be solved faster, while stronger LP relax-
ations (which usually involve more variables) can produce high quality lower
bounds to prune the search space faster and reduce the number of LPs to
be solved. How to formulate the problem to leverage the full power of MIP
solvers is an open question. Currently, mathematical programming solvers
are not as efficient as the best customized algorithms.
For customized branch-and-bound search algorithms such as GOSDT and
OSDT [185, 131], there are several mechanisms to improve scalability: (1)
effective lower bounds, which prevent branching into parts of the search
space where we can prove there is no optimal solution, (2) effective schedul-
ing policies, which help us search the space to find close-to-optimal solu-
tions quickly, which in turn improves the bounds and again prevents us
from branching into irrelevant parts of the space, (3) computational reuse,
whereby if a computation involves a sum over (even slightly) expensive
computations, and part of that sum has previously been computed and
stored, we can reuse the previous computation instead of computing the
whole sum over again, (4) efficient data structures to store subproblems
that can be referenced later in the computation should that subproblem
arise again.
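The search pattern described above (lower-bound pruning, a scheduling policy, and reuse of previously solved subproblems) can be sketched generically. The callables root, lower_bound, expand, is_complete, and objective below are assumed to be supplied by a specific tree-search formulation; this is our illustrative skeleton, not GOSDT's or OSDT's actual code.

```python
import heapq, itertools

def branch_and_bound(root, lower_bound, expand, is_complete, objective):
    """Generic sketch: best-first search over subproblems with pruning and memoization.
    Subproblems are assumed hashable so repeated subproblems can be detected."""
    best_obj, best_sol = float("inf"), None
    tie = itertools.count()                              # tie-breaker so the heap never compares nodes
    queue = [(lower_bound(root), next(tie), root)]       # scheduling policy: lowest bound first
    seen = {root}                                        # computational reuse: skip repeated subproblems
    while queue:
        bound, _, node = heapq.heappop(queue)
        if bound >= best_obj:                            # lower-bound pruning
            continue
        if is_complete(node):
            obj = objective(node)
            if obj < best_obj:
                best_obj, best_sol = obj, node           # new incumbent solution
            continue
        for child in expand(node):
            if child in seen:
                continue
            seen.add(child)
            b = lower_bound(child)
            if b < best_obj:                             # only schedule promising subproblems
                heapq.heappush(queue, (b, next(tie), child))
    return best_sol, best_obj
```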
1.2 Can we efficiently handle continuous variables? While decision trees
handle categorical variables and complicated interactions better than other
types of approaches (e.g., linear models), one of the most important chal-
lenges for decision tree algorithms is to optimize over continuous features.
Many current methods use binary variables as input [131, 185, 301, 222, 78],
which assumes that continuous variables have been transformed into indi-
cator variables beforehand (e.g., age> 50). These methods are unable to
jointly optimize the selection of variables to split at each internal tree node,
the splitting threshold of that variable (if it is continuous), and the tree
structure (the overall shape of the tree). Lin et al. [185] preprocess the data
by transforming continuous features into a set of dummy variables, with
many different split points; they take split points between every ordered
pair of unique values present in the training data. Doing this preserves opti-
mality, but creates a huge number of binary features, leading to a dramatic
increase in the size of the search space, and the possibility of hitting either
time or memory limits. Some methods [301, 222] preprocess the data using
an approximation, whereby they consider a much smaller subset of possible
thresholds, potentially sacrificing the optimality of the solution [see 185,
Section 3, which explains this]. One possible technique to help with this
problem is to use similar support bounds, identified by Angelino et al. [11],
but in practice these bounds have been hard to implement because checking
the bounds repeatedly is computationally expensive, to the point where the
bounds have never been used (as far as we know). Future work could go
into improving the determination of when to check these bounds, or prov-
ing that a subset of all possible dummy variables still preserves closeness to
optimality.
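As a concrete illustration of the preprocessing discussed above, the following sketch binarizes one continuous feature by creating an indicator for each split point taken midway between consecutive unique training values. The function name and details are ours; real implementations also deduplicate columns and manage memory carefully.

```python
import numpy as np

def threshold_features(x):
    """Turn one continuous feature into indicator (dummy) variables, one per split
    point, with split points taken midway between consecutive unique values."""
    vals = np.unique(x)                        # sorted unique values
    splits = (vals[:-1] + vals[1:]) / 2.0      # candidate thresholds
    X_bin = np.column_stack([(x > s).astype(int) for s in splits])
    return X_bin, splits

age = np.array([23, 31, 31, 45, 52, 67])
X_bin, splits = threshold_features(age)   # 4 binary columns: age > 27.0, 38.0, 48.5, 59.5
```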
1.3 Can we handle constraints more gracefully? Particularly for greedy
methods that create trees using local decisions, it is difficult to enforce
global constraints on the overall tree. Given that domain-specific constraints
may be essential for interpretability, an important challenge is to deter-
mine how to incorporate such constraints. Optimization approaches (mathe-
matical programming, dynamic programming, branch-and-bound) are more
amenable to global constraints, but the constraints can make the prob-
lem much more difficult. For instance, falling constraints [306, 55] enforce
decreasing probabilities along a rule list, which make the list more inter-
pretable and useful in practice, but make the optimization problem harder,
even though the search space itself becomes smaller.
Example: Suppose a hospital would like to create a decision tree that will
be used to assign medical treatments to patients. A tree is convenient because
it corresponds to a set of questions to ask the patient (one for each internal
node along a path to a leaf). A tree is also convenient in its handling of multiple
medical treatments; each leaf could even represent a different medical treatment.
The tree can also handle complex interactions, where patients can be asked
multiple questions that build on each other to determine the best medication for
the patient. To train this tree, the proper assumptions and data handling were
made to allow us to use machine learning to perform causal analysis (in practice
these are more difficult than we have room to discuss here). The questions
we discussed above arise when the variables are continuous; for instance, if
we split on age somewhere in the tree, what is the optimal age to split at
in order to create a sparse tree? (See Challenge 1.2.) If we have many other
continuous variables (e.g., blood pressure, weight, body mass index), scalability
in choosing how to split them all becomes an issue. Further, if the hospital
has additional preferences, such as “falling probabilities,” where fewer questions
should be asked to determine whether a patient is in the most urgent treatment
categories, again it could affect our ability to find an optimal tree given limited
computational resources (see Challenge 1.3).
2. Scoring systems
Scoring systems are linear classification models that require users to add, sub-
tract, and multiply only a few small numbers in order to make a prediction.
These models are used to assess the risk of numerous serious medical con-
ditions since they allow quick predictions, without using a computer. Such
models are also heavily used in criminal justice. Table 2 shows an example
of a scoring system. A doctor can easily determine whether a patient screens
positive for obstructive sleep apnea by adding points for the patient’s age,
whether they have hypertension, body mass index, and sex. If the score is above
the threshold (1 in this case), the patient screens positive.
Table 2
A scoring system for sleep apnea screening [295]. Patients that screen positive may need to
come to the clinic to be tested.

Patient screens positive for obstructive sleep apnea if Score > 1
1. age ≥ 60                    4 points     ......
2. hypertension                4 points   + ......
3. body mass index ≥ 30        2 points   + ......
4. body mass index ≥ 40        2 points   + ......
5. female                     −6 points   + ......
Add points from rows 1-5       Score     = ......
Risk scores are scoring systems that have a conversion table to probabilities.
For instance, a 1 point total might convert to probability 15%, 2 points to 33%
and so on. Whereas scoring systems with a threshold (like the one in Table 2)
would be measured by false positive and false negative rates, a risk score might
be measured using the area under the ROC curve (AUC) and calibration.
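As a small illustration of such a conversion, a monotone logistic mapping from point totals to probabilities might look like the sketch below; the offset and scale are made up for the example and are not taken from any published risk score.

```python
import numpy as np

def risk_probability(points, offset=-2.3, scale=0.8):
    # Monotone mapping from an integer point total to a predicted risk;
    # offset and scale are illustrative parameters, not from a real model.
    return 1.0 / (1.0 + np.exp(-(offset + scale * points)))

for pts in range(0, 4):
    print(pts, round(float(risk_probability(pts)), 2))   # increasing probabilities with points
```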
The development of scoring systems dates back at least to criminal justice
work in the 1920s [44]. Since then, many scoring systems have been designed
for healthcare [154, 152, 153, 38, 175, 14, 107, 211, 274, 315, 286]. However,
none of the scoring systems mentioned so far was optimized purely using an
algorithm applied to data. Each scoring system was created using a different
method involving different heuristics. Some of them were built using domain
expertise alone without data, and some were created using rounding heuristics
for logistic regression coefficients and other manual feature selection approaches
to obtain integer-valued point scores [see, e.g., 175].
Such scoring systems could be optimized using a combination of the user’s
preferences (and constraints) and data. This optimization should ideally be ac-
complished by a computer, leaving the domain expert only to specify the prob-
lem. However, jointly optimizing for predictive performance, sparsity, and other
user constraints may not be an easy task. Equation (2.1) shows an example of
a generic optimization problem for creating a scoring system:
    min_{f ∈ F}  (1/n) Σ_i Loss(f, z_i) + C · (number of nonzero terms of f),   subject to
        f is a linear model, f(x) = Σ_{j=1}^{p} λ_j x_j,
        with small integer coefficients, λ_j ∈ {−10, −9, ..., 0, ..., 9, 10} for all j,
        and additional user constraints.     (2.1)
Here, the user would specify the loss function (logistic loss, etc.), the tradeoff
parameter C between the number of nonzero coefficients and the training loss,
and possibly some additional constraints, depending on the domain.
The integrality constraint on the coefficients makes the optimization problem
very difficult. The easiest way to satisfy these constraints is to fit real coefficients
(e.g., run logistic regression, perhaps with ℓ1 regularization) and then round
these real coefficients to integers. However, rounding can go against the loss
gradient and ruin predictive performance. Here is an example of a coefficient
vector that illustrates why rounding might not work:
[5.3, 6.1, 0.31, 0.30, 0.25, 0.25, 0.24, ..., 0.05] → (rounding) → [5, 6, 0, 0, ..., 0].
When rounding, we lose all signal coming from all variables except the first
two. The contribution from the eliminated variables may together be significant
even if each individual coefficient is small, in which case, we lose predictive
performance.
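The loss of signal from naive rounding can be seen in a toy computation; the coefficients and the feature vector below are hypothetical.

```python
import numpy as np

w_real = np.array([5.3, 6.1] + [0.3] * 20)   # two large coefficients plus twenty small ones
w_round = np.round(w_real)                   # naive rounding: [5., 6., 0., 0., ..., 0.]

x = np.array([0.0, 0.0] + [1.0] * 20)        # an input where only the small-coefficient features fire

print(w_real @ x)    # ≈ 6.0 -> a strong score contributed by the many small coefficients
print(w_round @ x)   # 0.0  -> that entire contribution disappears after rounding
```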
Compounding the issue with rounding is the fact that ℓ1 regularization in-
troduces a strong bias for very sparse problems. To understand why, consider
that the regularization parameter must be set to a very large number to get a
very sparse solution. In that case, the ℓ1 regularization does more than make
the solution sparse; it also imposes a strong ℓ1 bias. The solutions degrade
in quality as they become sparser, and rounding to integers only makes
the solution worse.
An even bigger problem arises when trying to incorporate additional con-
straints, as we allude to in (2.1). Even simple constraints such as “ensure pre-
cision is at least 20%” when optimizing recall would be very difficult to satisfy
manually with rounding. There are four main types of approaches to building
scoring systems: i) exact solutions using optimization techniques, ii) approxi-
mation algorithms using linear programming, iii) more sophisticated rounding
techniques, iv) computer-aided exploration techniques.
Exact solutions. There are several methods that can solve (2.1) directly [294,
291, 292, 293, 256]. To date, the most promising approaches use mixed-integer
linear programming solvers (MIP solvers), which are generic optimization software
packages that handle linear objectives and constraints, where variables can be
either real-valued or integer-valued. Commercial MIP solvers (currently CPLEX and Gurobi)
are substantially faster than free MIP solvers, and have free academic licenses.
MIP solvers can be used directly when the problem is not too large and when
the loss function is discrete or linear (e.g., classification error is discrete, as it
takes values either 0 or 1). These solvers are flexible and can handle a huge
variety of user-defined constraints easily. However, in the case where the loss
function is nonlinear, like the classical logistic loss Σ_i log(1 + exp(−y_i f(x_i))),
MIP solvers cannot be used directly. In that case, it is possible to use an algo-
rithm called RiskSLIM [293] that uses sophisticated optimization tools: cutting
planes within a branch-and-bound framework, using “callback” functions to a
MIP solver. A major benefit of scoring systems is that they can be used as
decision aids in very high stakes settings; RiskSLIM has been used to create a
model (the 2HELPS2B score) that is used in intensive care units of hospitals to
make treatment decisions about critically ill patients [281].
While exact optimization approaches provide optimal solutions, they strug-
gle with larger problems. For instance, to handle nonlinearities in continuous
covariates, these variables are often discretized to form dummy variables by
splitting on all possible values of the covariate (similar to the way continuous
variables are handled for logical model construction as discussed above, e.g.,
create dummy variables for age<30, age<31, age<32, etc.). Obviously doing
this can turn a small number of continuous variables into a large number of
categorical variables. One way to reduce the size of the problem is to use only
a subset of thresholds (e.g., age<30, age<35, age<40, etc.), but it is possible
to lose accuracy if not enough thresholds are included. Approximation methods
can be valuable in such cases.
down randomly. They also propose a greedy method where the sum of coef-
ficients is fixed and coefficients are rounded one at a time. Sokolovska et al.
[277] propose an algorithm that finds a local minimum by improving the so-
lution at each iteration until no further improvements are possible. Ustun and
Rudin [293] propose a combination of rounding and “polishing.” Their rounding
method is called Sequential Rounding. At each iteration, Sequential Rounding
chooses a coefficient to round and whether to round it up or down. It makes this
choice by evaluating each possible coefficient rounded both up and down, and
chooses the option with the best objective. After Sequential Rounding produces
an integer coefficient vector, a second algorithm, called Discrete Coordinate De-
scent (DCD), is used to “polish” the rounded solution. At each iteration, DCD
chooses a coefficient, and optimizes its value over the set of integers to obtain
a feasible integer solution with a better objective. All of these algorithms are
easy to program and might be easier to deal with than troubleshooting a MIP
or LP solver.
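The following is a simplified sketch of these two steps (logistic loss only, no regularization term, and no speed-ups), written to convey the structure rather than to reproduce the exact algorithms of Ustun and Rudin [293].

```python
import numpy as np

def logistic_loss(w, X, y):                      # y in {-1, +1}
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def sequential_rounding(w_real, X, y):
    """Round one coefficient per iteration, choosing the coefficient and the
    direction (up or down) that give the best objective."""
    w = np.asarray(w_real, dtype=float).copy()
    unrounded = set(range(len(w)))
    while unrounded:
        best = None
        for j in unrounded:
            for v in (np.floor(w[j]), np.ceil(w[j])):
                cand = w.copy()
                cand[j] = v
                obj = logistic_loss(cand, X, y)
                if best is None or obj < best[0]:
                    best = (obj, j, v)
        _, j, v = best
        w[j] = v
        unrounded.remove(j)
    return w

def discrete_coordinate_descent(w_int, X, y, lo=-10, hi=10, sweeps=5):
    """Polish a rounded solution: repeatedly re-optimize one coefficient at a time
    over the allowed set of integers."""
    w = np.asarray(w_int, dtype=float).copy()
    for _ in range(sweeps):
        for j in range(len(w)):
            vals = np.arange(lo, hi + 1)
            objs = []
            for v in vals:
                cand = w.copy()
                cand[j] = v
                objs.append(logistic_loss(cand, X, y))
            w[j] = vals[int(np.argmin(objs))]
    return w
```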
3. Generalized additive models

The standard form of a GAM is

    g(E[y]) = β_0 + f_1(x_·1) + f_2(x_·2) + · · · + f_p(x_·p),

where x_·j indicates the jth feature, g(·) is a link function, and the f_j’s are
univariate component functions that are possibly nonlinear; common choices are
step functions and splines. If the link function g(·) is the identity, the expression
describes an additive model such as a regression model; if the link function
is the logistic function, then the expression describes a generalized additive
model that could be used for classification. The standard form of GAMs is
interpretable because the model is constrained to be a linear combination of
univariate component functions. We can plot each component function
f_j(x_·j) as a function of x_·j to see the contribution of a single feature to the
prediction. The left part of Figure 5 shows all component functions of a GAM
(with no interactions) that predicts whether a patient has diabetes. The right
enlarged figure visualizes the relationship between plasma glucose concentration
after 2 hours into an oral glucose tolerance test and the risk of having diabetes.

Fig 4. Hierarchical relationships between GAMs, additive models, linear models, and scoring
systems.
Fig 5. Left: All component functions of a GAM model trained using the interpret package
[223] on a diabetes dataset [88]; Right: zoom-in of component function for glucose concentra-
tion.
If the features are all binary (or categorical), the GAM becomes a linear
model and the visualizations are just step functions. The visualizations become
more interesting for continuous variables, like the ones shown in Figure 5. If a
GAM has bivariate component functions (that is, if we choose an fj to depend
on two variables, which permits an interaction between these two variables), a
heatmap can be used to visualize the component function on the two dimensional
plane and understand the pairwise interactions [196]. As a comparison point with
decision trees, GAMs typically do not handle more than a few interaction terms,
and all of these would be quadratic (i.e., involve 2 variables); this contrasts
with decision trees, which handle complex interactions of categorical variables.
GAMs, like other linear models, do not handle multiclass problems in a natural
way. GAMs have been particularly successful for dealing with large datasets of
medical records that have many continuous variables because they can elucidate
complex relationships between, for instance, age, disease and mortality. (Of
course, dealing with large raw medical datasets, we would typically encounter
serious issues with missing data, or bias in the labels or variables, which would
be challenging for any method, including GAMs.)
A component function fj can take different forms. For example, it can be a
weighted sum of indicator functions, that is:
    f_j(x_·j) = Σ_{θ_j' ∈ thresholds_j} c_{j,j'} 1[x_·j > θ_j'].     (3.1)
If the weights on the indicator functions are integers, and only a small set of
weights are nonzero, then the GAM becomes a scoring system. If the indicators
are all forced to aim in one direction (e.g., 1[x_·j > θ_j'] for all j', with no
indicators in the other direction, 1[x_·j < θ_j']) and the coefficients c_{j,j'} are all
constrained to be nonnegative, then the function will be monotonic. In the case
that splines are used as component functions, the GAM can be a weighted sum
of the splines’ basis functions, i.e., f_j(x_·j) = Σ_{k=1}^{K_j} β_{jk} b_{jk}(x_·j).
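A small sketch of the step-function form in (3.1), with thresholds and integer weights chosen for illustration (they roughly echo, but are not, the sleep apnea model in Table 2), shows how such a GAM collapses into a scoring system.

```python
def make_component(thresholds, weights):
    # f_j(x) = sum over thresholds of c_{j,j'} * 1[x > theta_{j'}]
    return lambda x: sum(w * (x > t) for t, w in zip(thresholds, weights))

f_age = make_component(thresholds=[59], weights=[4])         # with integer ages, age > 59 means age >= 60: +4 points
f_bmi = make_component(thresholds=[29, 39], weights=[2, 2])  # BMI >= 30: +2 points, BMI >= 40: +2 more

def total_score(age, bmi):
    return f_age(age) + f_bmi(bmi)

print(total_score(age=65, bmi=41))   # 4 + 2 + 2 = 8
```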
There are many different ways to fit GAMs. The traditional way is to use
backfitting, where we iteratively train a component function to best fit the
residuals from the other (already-chosen) components [122]. If the model is fit-
ted using boosting methods [101, 102, 103], we learn a tree on each single feature
in each iteration and then aggregate them together [195]. Among different es-
timations of component functions and fitting procedures, Binder and Tutz [36]
found that boosting performed particularly well in high-dimensional settings,
and Lou et al. [195] found that using a shallow bagged ensemble of trees on
a single feature in each step of stochastic gradient boosting generally achieved
better performance.
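A minimal sketch of backfitting for an additive model with the identity link is given below; it assumes the user supplies a one-dimensional smoother fit_1d, and the scikit-learn regression tree is just one possible choice of smoother (an assumption about available tooling, not part of any specific GAM package).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def backfit_gam(X, y, fit_1d, n_iter=20):
    """Iteratively refit each component function on the residuals of all the others."""
    n, p = X.shape
    intercept = y.mean()
    components = [lambda x: np.zeros_like(x, dtype=float) for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            partial = sum(components[k](X[:, k]) for k in range(p) if k != j)
            residual = y - intercept - partial
            components[j] = fit_1d(X[:, j], residual)
    return intercept, components

def tree_smoother(x, r):
    # One possible 1-D smoother: a shallow regression tree on a single feature.
    model = DecisionTreeRegressor(max_depth=2).fit(x.reshape(-1, 1), r)
    return lambda z: model.predict(np.asarray(z).reshape(-1, 1))

# Toy usage.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * (X[:, 1] > 0) + rng.normal(scale=0.1, size=300)
intercept, comps = backfit_gam(X, y, tree_smoother)
```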
We remark that GAMs have the advantage that they are very powerful,
particularly if they are trained as boosted stumps or trees, which are reliable
out-of-the-box machine learning techniques. The AdaBoost algorithm also has
the advantage that it maximizes convex proxies for both classification error and
area under the ROC curve (AUC) simultaneously [254, 72]. This connection
explains why boosted models tend to have both high AUC and accuracy. How-
ever, boosted models are not naturally sparse, and issues with bias arise under
ℓ1 regularization, as discussed in the scoring systems section.
We present two interesting challenges involving GAMs:
3.1 How to control the simplicity and interpretability for GAMs: The
simplicity of GAMs arises in at least two ways: sparsity in the number of
component functions and smoothness of the component functions. Imposing
monotonicity of the component functions also helps with interpretability
when we have prior knowledge, e.g., that risk increases with age. In the
case when component functions are estimated by splines, many works apply
convex regularizers (e.g., ℓ1) to control both smoothness and sparsity
[186, 245, 206, 324, 234, 194, 262, 121]. For example, they add “roughness”
penalties and lasso-type penalties on the f_j’s in the objective function to control
both the smoothness of the component functions and the sparsity of the model.
Similarly, if the f_j’s are sums of indicators, as in (3.1), we could regularize
to shrink the c_{j,j'} coefficients to induce smoothness. These penalties are
usually convex; therefore, when combined with convex loss functions, the
(regularized) objectives can be minimized by convex optimization algorithms. There
could be some disadvantages to this setup: (1) as we know, ℓ1 regularization
imposes a strong unintended bias on the coefficients when aiming for
very sparse solutions; (2) Lou et al. [195] found that imposing smoothness
may come at the expense of accuracy; (3) imposing smoothness may miss
important naturally-occurring patterns, such as a jump in a component
function; in fact, they found such a jump in mortality as a function of age that
seems to occur around retirement age. In that case, it might be more in-
terpretable to include a smooth increasing function of age plus an indicator
function around retirement age. At the moment, these types of choices are
hand-designed, rather than automated.
As mentioned earlier, boosting can be used to train GAMs to produce ac-
curate models. However, sparsity and smoothness are hard to control with
AdaBoost since it adds a new term to the model at each iteration.
3.2 How to use GAMs to troubleshoot complex datasets? GAMs are
often used on raw medical records or other complex data types, and these
datasets are likely to benefit from troubleshooting. Using a GAM, we might
find counterintuitive patterns; e.g., as shown in [49], asthma patients fared
better than non-asthma patients in a health outcomes study. Caruana et
al. [49] provide a possible reason for this finding, which is that asthma
patients are at higher natural risk, and are thus given better care, leading
to lower observed risk. Medical records are notorious for missing important
information or providing biased information such as billing codes. Could
a GAM help us to identify important missing confounders, such as retire-
ment effects or special treatment for asthma patients? Could GAMs help us
reconcile medical records from multiple data storage environments? These
data quality issues can be really important.
Example: Suppose a medical researcher has a stack of raw medical records and
would like to predict the mortality risk for pneumonia patients. The data are
challenging, including missing measurements (structural missingness as well as
data missing not at random, and unobserved variables), insurance codes that
do not convey exactly what happened to the patient, nor what their state was.
However, the researcher decides that there is enough signal in the data that it
could be useful in prediction, given a powerful machine learning method, such
as a GAM trained with boosted trees. There are also several important con-
tinuous variables, such as age, that could be visualized. A GAM with a small
number of component functions might be appropriate since the doctor can vi-
sualize each component function. If there are too many component functions
(GAM without sparsity control), analyzing contributions from all of them could
be overwhelming (see Challenge 3.1). If the researcher could control the sparsity,
smoothness, and monotonicity of the component functions, she might be able to
design a model that not only predicts well, but also reveals interesting relation-
ships between observed variables and outcomes. This model could also help us
to determine whether important variables were missing, recorded inconsistently
or incorrectly, and could help identify key risk factors (see Challenge 3.2).
From there, the researcher might want to develop even simpler models, such
as decision trees or scoring systems, for use in the clinic (see Challenges 1 and 2).
Fig 6. Case-based reasoning types. Left: Nearest neighbors (just some arrows are shown for
3-nearest neighbors). Right: Prototype-based reasoning, shown with two prototypes.
Fig 7. (a) Prototype-based classification of the network from Li et al. [183]. The network
compares a previously unseen image of “6” with 15 prototypes of handwritten digits, learned
from the training set, and classifies the image as a 6 because it looks like the three proto-
types of handwritten 6’s, which have been visualized by passing them through a decoder from
latent space into image space. (b) Part-based prototype classification of a ProtoPNet [53].
The ProtoPNet compares a previously unseen image of a bird with prototypical parts of a
clay colored sparrow, which are learned from the training set. It classifies the image as a
clay colored sparrow because (the network thinks that) its head looks like a prototypical head
from a clay-colored sparrow, its wing bars look like prototypical wing bars from a clay-colored
sparrow, and so on. Here, the prototypes do not need to be passed through a decoder, they are
images from the training set.
typical parts of encoded training images. The prototypical parts are patches
of convolutional-neural-network-encoded training images, and represent typical
features observed for various image classes. Given an input instance, the net-
work compares an encoded input image with each of the learned prototypical
parts, and generates a prototype activation map that indicates both the location
of the image patch most similar to that prototypical part and the degree of similarity. The
authors applied the network to the benchmark Caltech-UCSD Birds-200-2011
(CUB-200-2011) dataset [305] for bird recognition, and the ProtoPNet was able
to learn prototypical parts of 200 classes of birds, and use these prototypes to
classify birds with an accuracy comparable to non-interpretable black-box mod-
els. Given a test image of a bird, the ProtoPNet was able to find prototypical
parts that are similar to various parts of the test image, and was able to provide
an explanation for its prediction, such as “this bird is a clay-colored sparrow,
because its head looks like that prototypical head from a clay-colored sparrow,
and its wing bars look like those prototypical wing bars from a clay-colored spar-
row” (Figure 7(b)). In their work, Chen et al. [53] also removed the decoder, and
instead introduced prototype projection, which pushes every prototypical part
to the nearest encoded training patch of the same class for visualization. This
improved the visualization quality of the learned prototypes (in comparison to
the approach of Li et al. [183] which used a decoder).
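To illustrate the comparison step described above, here is a small sketch; the shapes, the log-based similarity, and all names are illustrative simplifications of the ProtoPNet idea, not its exact implementation.

```python
import numpy as np

def prototype_activation_map(feature_map, prototype, eps=1e-4):
    """feature_map: (H, W, D) convolutional encoding of an image.
    prototype:   (D,) learned prototypical part.
    Returns a similarity map, the location of the most similar patch, and its score."""
    dists = np.sum((feature_map - prototype) ** 2, axis=-1)   # squared L2 distance at each location
    similarity = np.log((dists + 1.0) / (dists + eps))        # large where the distance is small
    location = np.unravel_index(np.argmax(similarity), similarity.shape)
    return similarity, location, float(similarity.max())

# Toy usage with random data: a 7x7 spatial grid of 128-dimensional patch encodings.
rng = np.random.default_rng(0)
fmap = rng.random((7, 7, 128))
proto = rng.random(128)
act_map, loc, score = prototype_activation_map(fmap, proto)
```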
These two works (i.e., [183, 53]) have been extended in the domain of deep
case-based reasoning and deep prototype learning. In the image recognition
domain, Nauta et al. [215] proposed a method for explaining what visual char-
acteristics a prototype (in a trained ProtoPNet) is looking for; Nauta et al.
[216] proposed a method for learning neural prototype trees based on a proto-
type layer; Rymarczyk et al. [260] proposed data-dependent merge-pruning of
the prototypes in a ProtoPNet, to allow prototypes that activate on similarly
looking parts from various classes to be pruned and shared among those classes.
In the sequence modeling domain (such as natural language processing), Ming
et al. [209] and Hong et al. [129] took the concepts in [183] and [53], and inte-
grated prototype learning into recurrent neural networks for modeling sequential
data. Barnett et al. [21] extended the ideas of Chen et al. [53] and developed an
application to interpretable computer-aided digital mammography.
Despite the recent progress, many challenges still exist in the domain of case-
based reasoning, including:
Example: Suppose that a doctor wants to evaluate the risk of breast cancer
among patients [see 21]. The dataset used to predict malignancy of breast can-
cer usually contains a set of mammograms, and a set of patient features (e.g.,
age). Given a particular patient, how do we characterize prototypical signs of
cancerous growth from a sequence of mammograms taken at various times (this
is similar to a video, see Challenge 4.1)? How can a doctor supervise prototype
learning by telling a prototype-based model what (image/patient) features are
where Signal(c, x) means the signal of concept c that passes through x, which
measures similarity between its two arguments. This constraint means all signal
about concept c in layer l will only pass through neuron neur. In other words,
using these constraints, we could constrain our network to have the kind of
“Grandmother node” that scientists have been searching for in both real and
artificial convolutional neural networks [115].
In this challenge, we consider the possibility of fully disentangling a DNN
so that each neuron in a piece of the network represents a human-interpretable
concept.
The vector space whose axes are formed by activation values on the hidden
layer’s neurons is known as the latent space of a DNN. Figure 8 shows an exam-
ple of what an ideal interpretable latent space might look like. The axes of the
latent space are aligned with individual visual concepts, such as “lamp,” “bed,”
“nightstand,” “curtain.” Note that the “visual concepts” are not restricted to
objects but can also be things such as weather or materials in a scene. We hope
all information about the concept travels through that concept’s corresponding
neuron on the way to the final prediction. For example, the “lamp” neuron will
be activated if and only if the network thinks that the input image contains in-
formation about lamps. This kind of representation makes the reasoning process
of the DNN much easier to understand: the image is classified as “bedroom”
because it contains information about “bed” and “lamp.”

Fig 8. Disentangled latent space. The axes (neurons) of the latent space are aligned with
supervised concepts, e.g. “lamp,” “bed,” “nightstand,” “curtain.” All information about the
concept up to that point in the network travels through that concept’s corresponding neuron.

Such a latent space, made of disjoint data generation factors, is a disentangled
latent space [26, 123]. An easy way to create a disentangled latent space is just
to create a classifier for each concept (e.g., create a lamp classifier), but this
is not a good strategy: it might be that the network only requires the light
of the lamp rather than the actual lamp body, so creating a lamp classifier
could actually reduce performance. Instead, we would want to encourage the
information that is used about a concept (if any is used) to go along one path
through the network.
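As one deliberately simplistic way to express this idea as a training penalty (our illustration, not a published method such as concept whitening [58]), one could reward samples known to contain concept c for concentrating their latent activation on axis c.

```python
import numpy as np

def concept_alignment_penalty(Z, concept_labels, eps=1e-8):
    """Z: (n, d) latent activations; concept_labels: (n,) index in [0, d) of the concept
    known to be present in each sample. The penalty is small when each sample's
    activation is concentrated on its own concept's axis, i.e., when information about
    the concept flows through that concept's neuron."""
    Z = np.abs(Z)
    on_axis = Z[np.arange(len(Z)), concept_labels]   # activation on the labeled concept's axis
    total = Z.sum(axis=1) + eps                      # total activation across all axes
    alignment = on_axis / total                      # 1.0 means perfectly axis-aligned
    return 1.0 - float(alignment.mean())             # add this (weighted) to the training loss
```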
Disentanglement is not guaranteed in standard neural networks. In fact, in-
formation about any concept could be scattered throughout the latent space of
a standard DNN. For example, post hoc analyses on neurons of standard con-
volutional neural networks [336, 337] show that concepts that are completely
unrelated could be activated on the same axis, as shown in Figure 9. Even if we
create a vector in the latent space that is aimed towards a single concept [as is
done in 147, 338], that vector could activate highly on multiple concepts, which
means the signal for the two concepts is not disentangled. In that sense, vectors
in the latent space are “impure” in that they do not naturally represent single
concepts [see 58, for a detailed discussion].
This challenge focuses on supervised disentanglement of neural networks, i.e.,
the researcher specifies which concepts to disentangle in the latent space. (In the
next section we will discuss unsupervised disentanglement.) Earlier work in this
domain disentangles the latent space for specific applications, such as face recog-
nition, where we might aim to separate identity and pose [341]. Recent work
in this area aims to disentangle the latent space with respect to a collection of
predefined concepts [58, 158, 193, 3]. For example, Chen et al. [58] adds con-
straints to the latent space to decorrelate the concepts and align them with axes
in the latent space; this is called “concept whitening.” (One could think of this
Fig 9. Example of an impure neuron in a standard neural network, from [337]. This figure shows
images that highly activate this neuron. Both dining tables (green) and Greek-style buildings
(red) are highly activated on the neuron, even though these two concepts are unrelated.
5.2 How to disentangle all layers of a DNN, not just one? Current methods try to disentangle at most a single layer in the DNN (that is, they
attempt only the problem discussed above). This means one could only in-
terpret neurons in that specific layer, while semantic meanings of neurons
in other layers remain unknown. Ideally, we want to be able to completely
understand and modify the information flowing through all neurons in the
network. This is a challenging task for many obvious reasons, the first being that it is practically very hard to define what all these concepts could
possibly be. We would need a comprehensive set of human-interpretable
concepts, which is hard to locate, create, or even to parameterize. Even if
we had this complete set, we would probably not want to manually specify
exactly what part of the network would be disentangled with respect to
each of these numerous concepts. For instance, if we tried to disentangle
the same set of concepts in all layers, it would be immediately problem-
atic because DNNs are naturally hierarchical: high-level concepts (objects,
weather, etc.) are learned in deeper layers, and the deeper layers leverage
low-level concepts (color, texture, object parts, etc.) learned in lower layers.
Clearly, complex concepts like “weather outside” could not be learned well
in earlier layers of the network, so higher-level concepts might be reserved
for deeper layers. Hence, we also need to know the hierarchy of the concepts
to place them in the correct layer. Defining the concept hierarchy manually
is almost impossible, since there could be thousands of concepts. But how
to automate it is also a challenge.
5.3 How to choose good concepts to learn for disentanglement? In su-
pervised disentanglement, the concepts are chosen manually. To gain useful
insights from the model, we need good concepts. But what are good con-
cepts in specific application domains? For example, in medical applications,
past works mostly use clinical attributes that already exist in the datasets.
However, Chen et al. [58] found that attributes in the ISIC dataset might
be missing the key concept used by the model to classify lesion malignancy.
Active learning approaches could be incredibly helpful in interfacing with
domain experts to create and refine concepts.
Moreover, it is challenging to learn concepts with continuous values. These
concepts might be important in specific applications, e.g., age of the patient
and size of tumors in medical applications. Current methods either define a concept by using a set of representative samples or treat the concept as a binary variable; both representations are discrete. Therefore, for continuous concepts,
a challenge is how to choose good thresholds to transform the continuous
concept into one or multiple binary variables.
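For the thresholding issue just described, here is a minimal sketch of what creating the binary concept variables might look like; the age values and cutoffs below are invented for illustration.

```python
import numpy as np

age = np.array([23, 41, 57, 66, 72, 35])  # continuous concept (years)

# One binary concept variable from a single threshold.
is_older_than_60 = (age > 60).astype(int)

# Or several binary variables from multiple thresholds, which preserves
# more information but multiplies the number of concept labels to supervise.
thresholds = [40, 60]
age_bins = np.stack([(age > t).astype(int) for t in thresholds], axis=1)
print(is_older_than_60, age_bins, sep="\n")
```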
5.4 How to make the mapping from the disentangled layer to the out-
put layer interpretable? The decision process of current disentangled
neural networks contains two parts, x → c mapping the input x to the
disentangled representation (concepts) c, and c → y mapping the disen-
tangled representation c to the output y. [The notations are adopted from
158]. All current methods on neural disentanglement aim at making the
c interpretable, i.e., making the neurons in the latent space aligned with
human understandable concepts, but how these concepts combine to make
the final prediction, i.e., c → y, often remains a black box. This leaves a
gap between the interpretability of the latent space and the interpretability
of the entire model. Current methods either rely on variable importance
methods to explain c → y posthoc [58], or simply make c → y a linear
layer [158]. However, a linear layer might not be expressive enough to learn
c → y. [158] also shows that a linear function c → y is less effective than
nonlinear counterparts when the user wants to intervene in developing the
disentangled representation, e.g., replacing predicted concept values ĉj with
true concept values cj . Neural networks like neural additive models [5] and
neural decision trees [322] could potentially be used to model c → y, since
they are both differentiable, nonlinear, and intrinsically interpretable once
the input features are interpretable.
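As one possible direction, the sketch below shows an additive head over concepts in the spirit of neural additive models [5]: each concept passes through its own small one-dimensional network, and the outputs are summed, so each concept's contribution to the prediction can be plotted. This is an illustrative sketch, not an implementation from any of the cited papers; the dimensions and architecture are arbitrary.

```python
import torch
import torch.nn as nn

class AdditiveConceptHead(nn.Module):
    """Maps concept scores c (batch, k) to a prediction via per-concept
    shape functions f_j, so y = sum_j f_j(c_j); each f_j can be plotted."""
    def __init__(self, n_concepts: int, hidden: int = 16):
        super().__init__()
        self.shape_fns = nn.ModuleList(
            [nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
             for _ in range(n_concepts)]
        )

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        contributions = [f(c[:, j:j + 1]) for j, f in enumerate(self.shape_fns)]
        return torch.stack(contributions, dim=0).sum(dim=0).squeeze(-1)

head = AdditiveConceptHead(n_concepts=5)
c = torch.rand(8, 5)      # predicted concept values from the x -> c part
logits = head(c)          # interpretable c -> y: additive over concepts
```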
Example: Suppose machine learning practitioners and doctors want to build
a supervised disentangled DNN on X-ray data to detect and predict arthritis.
First, they aim to choose a set of relevant concepts that have been assessed by
doctors (Challenge 5.3). They should also choose thresholds to turn continuous
concepts (e.g., age) into binary variables to create the concept datasets (Chal-
lenge 5.3). Using the concept datasets, they can use supervised disentanglement
methods like concept whitening [58] to build a disentangled DNN. However, if
they choose too many concepts to disentangle in the neural network, loading
samples from all of the concept datasets may take a very long time (Challenge
5.1). Moreover, doctors may have chosen different levels of concepts, such as
bone spur (high-level) and shape of joint (low-level), and they would like the
low-level concepts to be disentangled by neurons in earlier layers, and high-level
concepts to be disentangled in deeper layers, since these concepts have a hi-
erarchy according to medical knowledge. However, current methods only allow
placing them in the same layer (Challenge 5.2). Finally, all previous steps can
only make neurons in the DNN latent space aligned with medical concepts, while
the way in which these concepts combine to predict arthritis remains uninter-
pretable (Challenge 5.4).
that exist in natural scenes, labeled datasets for computer vision have a severe
labeling bias: we tend only to label entities in images that are useful for a spe-
cific task (e.g., object detection), thus ignoring much of the information found
in images. If we could effectively perform unsupervised disentanglement, we could rectify problems caused by human bias and potentially make scientific discov-
eries in uncharted domains. For example, an unsupervised disentangled neural
network can be used to discover key patterns in materials and characterize their
relation to the physical properties of the material (e.g., “will the material al-
low light to pass through it?”). Figure 10 shows such a neural network, with
a latent space completely disentangled and aligned with the key patterns dis-
covered without supervision: in the latent space, each neuron corresponds to
a key pattern and all information about the pattern flows through the corre-
sponding neuron. Analyzing these patterns’ contribution to the prediction of a
desired physical property could help material scientists understand what corre-
lates with the physical properties and could provide insight into the design of
new materials.
Fig 10. Neural network with unsupervised disentanglement in the latent space. The input (on
the left) is a unit cell of a metamaterial, made of stiff (yellow) and soft (purple) constituent materials. The target of the neural network is to predict whether the material with the unit cell
on the left supports the formation of forbidden frequency bands of propagation, i.e. existence
of “band gaps.” Key patterns related to the band gap (illustrated in blue) have been discov-
ered without supervision. Neurons in the latent space are aligned with these patterns. Figure
adapted from [59].
drawn (e.g., random natural images in the space of natural images). The statis-
tical independence between the latent features makes it easy to disentangle the
representation [27]: a disentangled representation simply guarantees that know-
ing or changing one latent feature (and its corresponding concept) does not
affect the distribution of any other. Results on simple imagery datasets show
that these generative models can decompose the data generation process into
disjoint factors (for instance, age, pose, identity of a person) and explicitly rep-
resent them in the latent space, without any supervision on these factors; this
happens based purely on statistical independence of these factors in the data.
Recently, the quality of disentanglement in deep generative models has been
improved [57, 124]. These methods achieve full disentanglement without super-
vision by maximizing the mutual information between latent variables and the
observations. However, these methods only work for relatively simple imagery
data, such as faces or single 3D objects (that is, in the image, there is only one
object, not a whole scene). Learning the decomposition of a scene into groups of
objects in the latent space, for example, is not yet achievable by these deep gen-
erative models. One reason for the failure of these approaches might be that the
occurrence of objects may not be statistically independent; for example, some
objects such as “bed” and “lamp” tend to co-occur in the same scene. Also, the
same type of object may occur multiple times in the scene, which cannot be
easily encoded by a single continuous latent feature.
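To make these objectives concrete, the sketch below shows a generic β-VAE-style loss in the spirit of [124]: a reconstruction term plus a weighted KL term that pushes the approximate posterior toward a factorized Gaussian prior, which is what encourages statistically independent latent dimensions. It is an illustration only; the encoder/decoder are not shown, and the tensors and β value are placeholders.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """x_recon, mu, logvar would come from an encoder/decoder pair (not shown).
    The weighted KL term pushes q(z|x) toward an isotropic Gaussian prior,
    which pressures the latent dimensions toward statistical independence."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Toy tensors standing in for a real encoder/decoder pass.
x = torch.rand(16, 64)
x_recon = torch.rand(16, 64)
mu, logvar = torch.zeros(16, 10), torch.zeros(16, 10)
loss = beta_vae_loss(x, x_recon, mu, logvar)
```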
Neural networks that incorporate compositional inductive bias: Another line
of work designs neural networks that directly build compositional structure into
a neural architecture. Compositional structure occurs naturally in computer
vision data, as objects in the natural world are made of parts. In areas that
have been widely studied beyond computer vision, including speech recognition,
researchers have already summarized a series of compositional hypotheses, and
incorporated them into machine learning frameworks. For example, in computer
vision, the “vision as inverse graphics” paradigm [23] tries to treat vision tasks
as the inverse of the computer graphics rendering process. In other words, it tries
to decode images into a combination of features that might control rendering of
a scene, such as object position, orientation, texture and lighting. Many studies
on unsupervised disentanglement have been focused on creating this type of rep-
resentation because it is intrinsically disentangled. Early approaches toward this
goal include DC-IGN [165] and Spatial Transformers [135]. Recently, Capsule
Networks [126, 261] have provided a new way to incorporate compositional as-
sumptions into neural networks. Instead of using neurons as the building blocks,
Capsule Networks combine sets of neurons into larger units called “Capsules,”
and force them to represent information such as pose, color and location of ei-
ther a particular part or a complete object. This method was later combined
with generative models in the Stack Capsule Autoencoder (SCAE) [163]. With
the help of the Set Transformer [179] in combining information between layers,
SCAE discovers constituents of the image and organizes them into a smaller set
of objects. Slot Attention modules [191] further use an iterative attention mech-
anism to control information flow between layers, and achieve better results on
unsupervised object discovery. Nevertheless, similar to the generative models,
these networks perform poorly when aiming to discover concepts on more realis-
tic datasets. For example, SCAE can only discover stroke-like structures that are
uninterpretable to humans on the Street View House Numbers (SVHN) dataset
[218] (see Figure 11). The reason is that the SCAE can only discover visual
structures that appear frequently in the dataset, but in reality the appearance
of objects that belong to the same category can vary a lot. There have been
proposals [e.g., GLOM 125] on how a neural network with a fixed architecture
could potentially parse an image into a part-whole hierarchy. Although the idea
seems to be promising, no working system has been developed yet. There is still
a lot of room for development for this type of method.
Fig 11. Left: Sample images collected from the SVHN dataset [218]. Right: Stroke-like tem-
plates discovered by the SCAE. Although the objects in the dataset are mostly digits, current
capsule networks are unable to discover them as concepts without supervision.
to be activated only on a specific part of the object. These two methods are
capable of learning single objects or parts in the latent space but have not yet
been generalized to handle more comprehensive concepts (such as properties of
scenes, including style of an indoor room – e.g., cozy, modern, etc., weather for
outdoor scenes, etc.), since these concepts are not localized in the images.
Despite multiple branches of related work targeting concept discovery in different ways, many challenges still exist:
Fig 12. Visualization of the MNIST dataset [177] using different kinds of DR methods: PCA
[232], t-SNE [297, 187, 236], UMAP [204], and PaCMAP [313]. The axes are not quantified
because these are projections into an abstract 2D space.
Fig 13. Visualization of 4000 points sampled from 20 isotropic Gaussians using Laplacian
Eigenmap [24, 233], t-SNE [297, 187, 236], ForceAtlas2 [134], UMAP [204], TriMap [9] and
PaCMAP [313]. The 20 Gaussians are equally spaced on an axis in 50-dimensional space,
labelled by the gradient colors. The best results are arguably those of t-SNE and PaCMAP in
this figure, which preserve clusters compactly and their relative placement (yellow on the left,
purple on the right).
DR algorithms shed light on how the loss function affects the rendering of local
structure [37], and provide guidance on how to design good loss functions so
that the local and global structure can both be preserved simultaneously [313].
Nevertheless, several challenges still exist for DR methods:
7.1 How to capture information from the high dimensional space more
accurately?
Most of the recent DR methods capture information in the high dimensional
space mainly from the k-nearest-neighbors and their relative distances, at
the expense of information from points that are more distant, which would
allow the preservation of more global structure. Several works [314, 64] dis-
cuss possible pitfalls in data analysis created by t-SNE and UMAP due to
loss of non-local information. Recent methods mitigate the loss of global in-
formation by using global-aware initialization [204, 155] (that is, initializing
the distances in the low-dimensional space using PCA) and/or selectively
preserving distances between non-neighbor samples [105, 313]. Neverthe-
less, these methods are still designed and optimized under the assumption
that the nearest neighbors, defined by the given metric (usually Euclidean
distance in the high dimensional space), can accurately depict the relation-
ships between samples. This assumption may not hold true for some data;
for instance, Euclidean distance may not be suitable for measuring distances
between weights (or activations) of a neural network [see 50, for a detailed
example of such a failure]. We would like DR methods to better capture
information from the high dimensional space to avoid such pitfalls.
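As a small illustration of global-aware initialization, the sketch below runs t-SNE on a standard benchmark once with random initialization and once with PCA initialization (in the spirit of the recommendation in [155]); the dataset and parameter choices are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Random initialization relies almost entirely on local (kNN) information.
emb_random = TSNE(n_components=2, init="random", random_state=0).fit_transform(X)

# PCA initialization injects some global structure into the starting layout,
# which helps preserve the relative placement of clusters.
emb_pca = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
```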
7.2 How should we select hyperparameters for DR?
Modern DR methods, due to their multi-stage characteristics, involve a large
number of hyperparameters, including the number of high-dimensional near-
est neighbors to be preserved in the low-dimensional space and the learn-
ing rate used to optimize the low-dimensional embedding. There are often
dozens of hyperparameters in any given DR method, and since DR meth-
ods are unsupervised and we do not already know the structure of the
high-dimensional data, it is difficult to tune them. A poor choice of hy-
perparameters may lead to disappointing (or even misleading) DR results.
Fig 14 shows some DR results for the Mammoth dataset [133, 64] using
t-SNE, LargeVis, UMAP, TriMAP and PaCMAP with different sets of rea-
sonable hyperparameters. When their perplexity parameter or the number
of nearest neighbors is not chosen carefully, algorithms can fail to preserve
the global structure of the mammoth (specifically, the overall placement of
the mammoth’s parts), and they create spurious clusters (losing connectiv-
ity between parts of the mammoth) and lose details (such as the toes on the
feet of the mammoth). For more detailed discussions about the effect of dif-
ferent hyperparameters, see [314, 64]. Multiple works [for example 25, 313]
aimed to alleviate this problem for the most influential hyperparameters,
but the problem still exists, and the set of hyperparameters remains data-
dependent. The tuning process, which sometimes involves many runs of a
DR method, is time and power consuming, and requires user expertise in
both the data domain and in DR algorithms. It could be extremely helpful
to achieve better automatic hyperparameter selection for DR algorithms.
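Until better automatic selection exists, a common (if costly) workaround is a small sweep over the most influential hyperparameter followed by visual inspection; a minimal sketch with arbitrary values:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
perplexities = [5, 30, 100]   # arbitrary grid; results can differ substantially

fig, axes = plt.subplots(1, len(perplexities), figsize=(12, 4))
for ax, perp in zip(axes, perplexities):
    emb = TSNE(n_components=2, perplexity=perp, random_state=0).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, s=3)
    ax.set_title(f"perplexity = {perp}")
plt.show()
```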
7.3 Can the DR transformation from high- to low-dimensions be made
more interpretable or explainable? The DR mapping itself – that is,
the transformation from high to low dimensions – typically is complex.
There are some cases in which insight into this mapping can be gained, for
instance, if PCA is used as the DR method, we may be able to determine
which of the original dimensions are dominant in the first few principal
components. It may be useful to design modern approaches to help users
understand how the final two or three dimensions are defined in terms of the
high-dimensional features. This may take the form of explanatory post-hoc
visualizations or constrained DR methods.
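For the PCA case mentioned above, the mapping can be inspected directly, since each low-dimensional axis is a linear combination of the original features; a small sketch on a standard dataset (chosen arbitrarily):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_wine()
X = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X)

# For each of the two components, list the original features with the
# largest absolute loadings -- a crude "explanation" of the DR mapping.
for i, component in enumerate(pca.components_):
    top = np.argsort(np.abs(component))[::-1][:3]
    names = [data.feature_names[j] for j in top]
    print(f"component {i + 1}: dominated by {names}")
```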
Fig 14. Projection from [313] of the Mammoth dataset into 2D using t-SNE [297, 187, 236],
LargeVis [284], UMAP [204], TriMap [9] and PaCMAP [313]. Incorrectly-chosen hyperpa-
rameters will lead to misleading results even in a simple dataset. This issue is particularly
visible for t-SNE (first two columns) and UMAP (fourth column). The original dataset is 3
dimensional and is shown at the top.
out the ground truth afforded to supervised methods), the researchers have no
way to see whether this cluster is present in the high-dimensional data or not
(Challenge 7.1 above). Scientists could waste a lot of time examining each such
spurious cluster. If we were able to solve the problems with DR tuning and
structure preservation discussed above, it would make DR methods more reliable,
leading to potentially increased understanding of many datasets.
There is a growing trend towards developing machine learning models that incor-
porate physics (or other) constraints. These models are not purely data-driven,
in the sense that their training may require little data or no data at all [e.g.,
244]. Instead, these models are trained to observe physical laws, often in the form
of ordinary (ODEs) and partial differential equations (PDEs). These physics-
guided models provide alternatives to traditional numerical methods (e.g., finite
element methods) for solving PDEs, and are of immense interest to physicists,
chemists, and materials scientists. The resulting models are interpretable, in the
sense that they are constrained to follow the laws of physics that were provided
to them. (It might be easier to think conversely: physicists might find that a
standard supervised machine learning model that is trained on data from a
known physical system – but that does not follow the laws of physics – would
be uninterpretable.)
The idea of using machine learning models to approximate ODEs and PDEs
solutions is not new. Lee and Kang [178] developed highly parallel algorithms,
based on neural networks, for solving finite difference equations (which are them-
selves approximations of original differential equations). Psichogios and Ungar
[238] created a hybrid neural network-first principles modeling scheme, in which
neural networks are used to estimate parameters of differential equations. La-
garis et al. [168, 169] explored the idea of using neural networks to solve initial
and boundary value problems. Several additional works [167, 43, 259] used sparse
regression and dynamic mode decomposition to discover the governing equa-
tions of dynamical systems directly from data. More recently, Raissi et al. [242]
extended the earlier works and developed the general framework of a physics-
informed neural network (PINN). In general, a PINN is a neural network that
approximates the solution of a set of PDEs with initial and boundary conditions.
The training of a PINN minimizes the residuals from the PDEs as well as the
residuals from the initial and boundary conditions. In general, physics-guided
models (neural networks) can be trained without supervised training data. Let
us explain how this works. Given a differential equation, say, f′(t) = a f(t) + b t + c, where a, b and c are known constants, we could train a neural network g to approximate f by minimizing (g′(t) − a g(t) − b t − c)² at finitely many points t. Thus, no labeled data in the form (t, f(t)) (what we would need for conventional supervised machine learning) is needed. The derivative g′(t) with respect to the input t (at each of those finitely many points t used for training) is found by leveraging the existing network structure of g using back-propagation. Figure
15 illustrates the training process of a PINN for approximating the solution of
one-dimensional heat equation ∂u/∂t = k ∂²u/∂x², with initial condition u(x, 0) = f(x) and Dirichlet boundary conditions u(0, t) = 0 and u(L, t) = 0. If observed data
are available, a PINN can be optimized with an additional mean-squared-error
term to encourage data fit. Many extensions to PINNs have since been devel-
oped, including fractional PINNs [226] and parareal PINNs [207]. PINNs have
been extended to convolutional [340] and graph neural network [270] backbones.
They have been used in many scientific applications, including fluid mechanics
modeling [243], cardiac activation mapping [263], stochastic systems modeling
[323], and discovery of differential equations [241, 240].
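To make the physics-residual idea concrete, here is a minimal sketch for the toy ODE above, in the spirit of (but much simpler than) the PINN framework of [242]: a small network g(t) is trained so that its derivative, obtained by back-propagation, satisfies g′(t) = a g(t) + b t + c at randomly sampled collocation points, with an additional penalty enforcing an assumed initial condition g(0) = g0. The constants, network size, and training schedule are arbitrary.

```python
import torch
import torch.nn as nn

a, b, c, g0 = -1.0, 0.5, 1.0, 2.0          # assumed ODE constants and g(0)
g = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)

for step in range(5000):
    t = torch.rand(128, 1, requires_grad=True)     # collocation points in [0, 1]
    gt = g(t)
    # dg/dt via back-propagation through the network itself (no labels needed).
    dgdt = torch.autograd.grad(gt.sum(), t, create_graph=True)[0]
    residual = dgdt - (a * gt + b * t + c)          # how badly the ODE is violated
    ic = (g(torch.zeros(1, 1)) - g0) ** 2           # initial-condition penalty
    loss = (residual ** 2).mean() + ic.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```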
In addition to neural networks, Gaussian processes are also popular mod-
els for approximating solutions of differential equations. For example, Archam-
beau et al. [16] developed a variational approximation scheme for estimating
the posterior distribution of a system governed by a general stochastic differen-
tial equation, based on Gaussian processes. Zhao et al. [335] developed a PDE-
constrained Gaussian process model, based on the global Galerkin discretization
of the governing PDEs for the wire saw slicing process. More recently, Pang et
Fig 15. Physics-informed neural network (PINN) framework for solving the one-dimensional heat equation ∂u/∂t = k ∂²u/∂x² with initial condition u(x, 0) = f(x) and Dirichlet boundary conditions u(0, t) = 0 and u(L, t) = 0. Here, u, the output of a neural network (left), obeys the heat equation (upper right) and the initial and boundary conditions (lower right).
The degree to which u does not obey the heat equation or initial or boundary conditions is
reflected in the loss function.
vision exhibit Rashomon sets, because neural networks that perform case-based reasoning or disentanglement still yielded models that were as accurate as their unconstrained counterparts; thus, these interpretable deep neural models are
within the Rashomon set. Rashomon sets present an opportunity for data scien-
tists: if there are many equally-good models, we can choose one that has desired
properties that go beyond minimizing an objective function. In fact, the model
that optimizes the training loss might not be the best to deploy in practice
anyway due to the possibilities of poor generalization, trust issues, or encoded
inductive biases that are undesirable [74]. More careful approaches to problem
formulation and model selection could be taken that include the possibility of
model multiplicity in the first place. Simply put – we need ways to explore the
Rashomon set, particularly if we are interested in model interpretability.
Formally, the Rashomon set is the set of models whose training loss is below a
specific threshold, as shown in Figure 16 (a). Given a loss function and a model
class F, the Rashomon set can be written as
Rset(F, f∗, θ) = {f ∈ F : Loss(f) ≤ Loss(f∗) + θ},
where f∗ can be an empirical risk minimizer, an optimal model, or any other refer-
ence model. We would typically choose F to be complex enough to contain
models that fit the training data well without overfitting. The threshold θ, which is called the Rashomon parameter, can be a hard hyper-parameter set by a machine learning practitioner or a percentage of the loss (i.e., θ becomes ε Loss(f∗)). We would typically choose θ or ε to be small enough that suffering
this additional loss would have little to no practical significance on predictive
performance. For instance, we might choose it to be much smaller than the
(generalization) error between training and test sets. We would conversely want
to choose θ or ε to be as large as permitted so that we have more flexibility to
choose models within a bigger Rashomon set.
It has been shown by Semenova et al. [269] that when the Rashomon set is
large, under weak assumptions, it must contain a simple (perhaps more inter-
pretable) model within it. The argument goes as follows: assume the Rashomon
set is large, so that it contains a ball of functions from a complicated function
class Fcomplicated (think of this as high-dimensional polynomials that are com-
plex enough to fit the data well without overfitting). If a set of simpler functions
Fsimpler could serve as an approximating set for Fcomplicated (think decision trees
of a certain depth approximating the set of polynomials), it means that each
complicated function could be well-approximated by a simpler function (and
indeed, polynomials can be well-approximated by decision trees). By this logic,
the ball of Fcomplicated that is within the Rashomon set must contain at least
one function within Fsimpler , which is the simple function we were looking for.
Semenova et al. [269] also suggested a useful rule of thumb for determining
whether a Rashomon set is large: run many different types of machine learn-
ing algorithms (e.g., boosted decision trees, support vector machines, neural
networks, random forests, logistic regression) and if they generally perform sim-
ilarly, it correlates with the existence of a large Rashomon set (and thus the
9.1 How can we characterize the Rashomon set? As discussed above, the
Fig 16. (a) An illustration of a possible Rashomon set in two-dimensional hypothesis space.
Models from two local minima that are below the red plane belong to the Rashomon set. (b)
An illustration of a possible visualization of the Rashomon set. Models inside the shaded red
regions belong to the Rashomon set. The plot is created as a contour plot of the loss over the
hypothesis space. Green dots represent a few simpler models inside the Rashomon set. These
models are simpler because they are sparse: they depend on one less dimension than other
models in the Rashomon set.
Fig 17. A one-dimensional example of computation of the Rashomon ratio. The hypothesis space consists of decision stumps f(x) = 1[x≤a], a ∈ [0, 1]. (The function is 1 if x ≤ a and 0 otherwise.) The loss function is the zero-one loss: loss(f, x, y) = 1[f(x)≠y], which is 1 if the model makes a mistake and zero otherwise. A model f belongs to the Rashomon set if (1/n) Σi loss(f, xi, yi) ≤ 0.2, that is, the model makes at most 2 mistakes. The Rashomon ratio in (a) is equal to 0.4 and is computed as the ratio of the volume of decision stumps in the Rashomon set (blue shaded region) to the volume of all possible decision stumps in the area where the data reside (blue and orange shaded regions). The pattern Rashomon ratio in (b) is equal to 3/11 and is computed as the ratio of classifiers that belong to the Rashomon set (blue stumps, of which there are 3 unique stumps with respect to the data) to the number of all possible classifiers (orange and blue stumps, of which there are 11). This figure shows that there are multiple ways to compute the size of the Rashomon set, and it is not clear which one to choose.
issues with parameterization, but the pattern Rashomon ratio has its
own problems. Would variable importance space be suitable, where
each axis represents the importance of a raw variable, as in the work
of Dong and Rudin [86]? Or is there an easier space and metric to work
with?
(b) Are there approximation algorithms or other techniques that
will allow us to efficiently compute or approximate the size
of the Rashomon set? At worst, computation over the Rashomon
set requires a brute force calculation over a large discrete hypothesis
space. In most cases, the computation should be much easier. Perhaps
we could use dynamic programming or branch and bound techniques
to reduce the search space so that it encompasses the Rashomon set
but not too much more than that? In some cases, we could actually
compute the size of the Rashomon set analytically. For instance, for
linear regression, a closed-form solution in parameter space for the vol-
ume of the Rashomon set has been derived based on the singular values
of the data matrix [269].
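As a brute-force illustration of these computations, in the spirit of the decision-stump example of Figure 17 (with invented data), one can discretize the hypothesis space, count the stumps whose training loss falls within the threshold of the best loss, and take the ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = (x <= 0.6).astype(int)          # synthetic labels roughly matching a stump
y[:2] = 1 - y[:2]                   # add a little label noise

# Discretized hypothesis space of stumps f(x) = 1[x <= a].
thresholds = np.linspace(0, 1, 101)
losses = np.array([np.mean((x <= a).astype(int) != y) for a in thresholds])

best = losses.min()
eps = 0.1                           # Rashomon parameter (arbitrary)
in_rashomon = losses <= best + eps
print("empirical Rashomon ratio:", in_rashomon.mean())
```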
9.2 What techniques can be used to visualize the Rashomon set? Visu-
alization of the Rashomon set can potentially help us to understand its prop-
erties, issues with the data, biases, or underspecification of the problem. To
give an example of how visualization can be helpful in troubleshooting mod-
els generally, Li et al. [181] visualized the loss landscape of neural networks
and, as a result, found answers to questions about the selection of the batch
size, optimizer, and network architecture. Kissel and Mentch [151] proposed
a tree-based graphical visualization to display outputs of the model path
selection procedure that finds models from the Rashomon set based on a for-
ward selection of variables. The visualization helps us to understand the sta-
bility of the model selection process, as well as the richness of the Rashomon
set, since wider graphical trees imply that there are more models available
for the selection procedure. To characterize the Rashomon set in variable
importance space, variable importance diagrams have been proposed [86],
which are 2-dimensional projections of the variable importance cloud. The
variable importance cloud is created by mapping every variable to its impor-
tance for every good predictive model (every model in the Rashomon set).
Figure 16(b) uses a similar technique of projection in two-dimensional space
and depicts the visualization of the example of the Rashomon set of Figure
16(a). This simple visualization allows us to see the Rashomon set’s layout,
estimate its size or locate sparser models within it. Can more sophisticated
approaches be developed that would allow good visualization? The success
of these techniques most likely will have to depend on whether we can design
a good metric for the model class. After the metric is designed, perhaps we
might be able to utilize techniques from Challenge 7 for the visualization.
9.3 What model to choose from the Rashomon set? When the Rashomon
set is large, it can contain multiple accurate models with different properties.
Choosing between them might be difficult, particularly if we do not know
how to explore the Rashomon set. Interactive methods might rely on dimen-
sion reduction techniques (that allow users to change the location of data
on the plot or to change the axis of visualization), weighted linear models
(that allow the user to compute weights on specific data points), continuous
feedback from the user (that helps to continuously improve model predictions in changing environments, for example, in recommender systems) in
order to interpret or choose a specific model with desired property. Das et
al. [75] design a system called BEAMES that allows users to interactively
select important features, change weights on data points, visualize and se-
lect a specific model or even an ensemble of models. BEAMES searches the
hypothesis space for models that are close to the practitioners’ constraints
and design choices. The main limitation of BEAMES is that it works with
linear regression classifiers only. Can a similar framework that searches the
Rashomon set, instead of the whole hypothesis space, be designed? What
would the interactive specification of constraints look like in practice to help
the user choose the right model? Can collaboration with domain experts in
other ways be useful to explore the Rashomon set?
Example: Suppose that a financial institution would like to make data-driven
loan decisions. The institution must have a model that is as accurate as possible,
and must provide reasons for loan denials. In practice, loan decision prediction
problems have large Rashomon sets, and many machine learning methods per-
form similarly despite their different levels of complexity. To check whether the
Rashomon set is indeed large, the financial institution wishes to measure the
size of the Rashomon set (Challenge 9.1) or visualize the layout of it (Challenge
9.2) to understand how many accurate models exist, and how many of them
are interpretable. If the Rashomon set contains multiple interpretable models,
the financial institution might wish to use an interactive framework (Challenge
9.3) to navigate the Rashomon set and design constraints that would help to
locate the best model for their purposes. For example, the institution could
additionally optimize for fairness or sparsity.
Fig 18. An illustration of an RL system based on the example of Type 1 diabetes manage-
ment. This is theoretically a closed-loop system, where the agent observes the state from the
environment, makes an action, and receives a reward based on the action. For diabetes con-
trol, the state consists of measurements of glucose and other patient data, the actions are
injections of insulin, and the rewards are based on sugar levels.
state might look like on(block1, block2); on(block1, table); free(block2). Typically,
in relational reinforcement learning, policies are represented as relational regres-
sion trees [90, 91, 143, 96, 76] that are more interpretable than neural networks.
A second set of assumptions is based on a natural decomposition of the prob-
lem for multi-task or skills learning. For example, Shu et al. [271] use hierarchical
policies that are learned over multiple tasks, where each task is decomposed into
multiple sub-tasks (skills). The description of the skills is created by a human,
so that the agent learns these understandable skills (for instance, task stack blue
block can be decomposed into the skills find blue block, get blue block, and put
blue block ).
As far as we know, there are currently no general, well-performing interpretable methods for deep reinforcement learning that allow transparency in the agent's actions or intent. Progress has been made instead on explainable deep rein-
forcement learning (posthoc approximations), including tree-based explanations
[66, 189, 81, 22, 73], reward decomposition [138], and attention-based methods
[330, 213]. An interesting approach is that of Verma et al. [300], who define
rule-based policies through a domain-specific, human-readable programming
language that generalizes to unseen environments, but shows slightly worse per-
formance than neural network policies it was designed to explain. The approach
works for deterministic policies and symbolic domains only and will not work
for the domains where the state is represented as a raw data image, unless an
additional logical relation extractor is provided.
Based on experience from supervised learning (discussed above), we know
that post-hoc explanations suffer from multiple problems, in that explanations
are often incorrect or incomplete. Atrey et al. [18] have argued that for deep
reinforcement learning, saliency maps should be used for exploratory, and not
explanatory, purposes. Therefore, developing interpretable reinforcement learn-
ing policies and other interpretable RL models is important. The most crucial challenges for the area of interpretable reinforcement learning include:
Fig 19. (a) An example of a learned decision tree policy for the Cart-Pole balancing problem
adapted from Silva et al. [272]. In the Cart-Pole environment, a pole is attached to a cart
by a joint. The cart moves along the horizontal axis. The goal is to keep the pole balanced
upright, preventing it from falling. The state consists of four features: Cart Position, Cart
Velocity, Pole Angle and Pole Velocity At Tip. Here, angle 0◦ means straight upwards. There
are two possible actions at each state: push the cart to the left or push it to the right.
(b) is a visualization of the policy from Silva et al. [272] based on two features: Pole Angle
and Pole Velocity At Tip. Interestingly, it is easy to see from this figure that the policy is not
left-right symmetric. Note that there are many reasonable policies for this problem, including
many that are asymmetric.
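To give a flavor of what such an interpretable policy looks like in code, below is a hand-written two-split tree policy in the spirit of Figure 19, run in the Gymnasium CartPole-v1 environment; the thresholds are invented and are not those of Silva et al. [272].

```python
import gymnasium as gym

def tree_policy(obs):
    """obs = [cart position, cart velocity, pole angle, pole angular velocity].
    A tiny, human-readable decision tree: 0 = push left, 1 = push right."""
    pole_angle, pole_velocity = obs[2], obs[3]
    if pole_angle > 0.0:
        return 1 if pole_velocity > -0.5 else 0
    else:
        return 0 if pole_velocity < 0.5 else 1

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(tree_policy(obs))
    total_reward += reward
    done = terminated or truncated
print("episode return:", total_reward)
```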
11. Problems that were not in our top 10 but are really important
We covered a lot of ground in the 10 challenges above, but we certainly did not
cover all of the important topics related to interpretable ML. Here are a few
that we left out:
• Can we improve preprocessing to help with both interpretabil-
ity and test accuracy? As we discussed, some interpretable modeling
problems are computationally difficult (or could tend to overfit) without
preprocessing. For supervised problems with tabular data, popular meth-
ods like Principal Component Analysis (PCA) would generally transform
the data in a way that damages interpretability of the model. This is
because each transformed feature is a combination of all of the original
features. Perhaps there are other general preprocessing tools that would
preserve predictive power (or improve it), yet retain interpretability.
• Can we convey uncertainty clearly? Uncertainty quantification is al-
ways important. Tomsett et al. [288] discuss the importance of uncertainty
12. Conclusion
In this survey, we hoped to provide a pathway for readers into important topics
in interpretable machine learning. The literature currently being generated on
interpretable and explainable AI can be downright confusing. The sheer diversity
of individuals weighing in on this field includes not just statisticians and com-
puter scientists but also legal experts, philosophers, and graduate students, many of whom have never built or deployed a machine learning model. It is
easy to underestimate how difficult it is to convince someone to use a machine
learning model in practice, and interpretability is a key factor. Many works over
the last few years have contributed new terminology, mistakenly subsumed the
older field of interpretable machine learning into the new field of “XAI,” and
review papers have universally failed even to truly distinguish between the ba-
sic concepts of explaining a black box and designing an interpretable model.
Because of the misleading terminology, where papers titled “explainability” are
sometimes about “interpretability” and vice versa, it is very difficult to follow
the literature (even for us). At the very least, we hoped to introduce some fun-
damental principles, and cover several important areas of the field and show
how they relate to each other and to real problems. Clearly this is a massive
field that we cannot truly hope to cover, but we hope that the diverse areas we
covered and problems we posed might be useful to those needing an entrance
point into this maze.
Interpretable models are not just important for society; they are also beautiful. One might also find it absolutely magical that simple-yet-accurate models
exist for so many real-world datasets. We hope this document allows you to see
not only the importance of this topic but also the elegance of its mathematics
and the beauty of its models.
Acknowledgments
We thank Leonardo Lucio Custode for pointing out several useful references to
Challenge 10. Thank you to David Page for providing useful references on early
explainable ML. Thank you to the anonymous reviewers who made extremely
helpful comments.
References
[76] Das, S., Natarajan, S., Roy, K., Parr, R. and Kersting, K.
(2020). Fitted Q-Learning for Relational Domains. arXiv e-print
arXiv:2006.05595.
[77] Dash, S., Günlük, O. and Wei, D. (2018). Boolean Decision Rules via
Column Generation. In Proceedings of Conference on Neural Information
Processing Systems (NeurIPS) 31 4655–4665.
[78] Demirović, E., Lukina, A., Hebrard, E., Chan, J., Bailey, J.,
Leckie, C., Ramamohanarao, K. and Stuckey, P. J. (2020). MurTree:
Optimal Classification Trees via Dynamic Programming and Search. arXiv
e-print arXiv:2007.12652.
[79] Desjardins, G., Courville, A. and Bengio, Y. (2012). Disen-
tangling factors of variation via generative entangling. arXiv e-print
arXiv:1210.5474.
[80] Dhamnani, S., Singal, D., Sinha, R., Mohandoss, T. and Dash, M.
(2019). RAPID: Rapid and Precise Interpretable Decision Sets. In 2019
IEEE International Conference on Big Data (Big Data) 1292–1301. IEEE.
[81] Dhebar, Y., Deb, K., Nageshrao, S., Zhu, L. and Filev, D. (2020).
Interpretable-AI policies using evolutionary nonlinear decision trees for dis-
crete action systems. arXiv e-print arXiv:2009.09521.
[82] Dimanov, B., Bhatt, U., Jamnik, M. and Weller, A. (2020). You
Shouldn’t Trust Me: Learning Models Which Conceal Unfairness From
Multiple Explanation Methods. In 24th European Conference on Artificial
Intelligence (ECAI).
[83] Dinh, L., Pascanu, R., Bengio, S. and Bengio, Y. (2017). Sharp min-
ima can generalize for deep nets. In Proceedings of the International Con-
ference on Machine Learning (ICML) 1019–1028.
[84] Do, K. and Tran, T. (2019). Theory and evaluation metrics for learning
disentangled representations. In Proceedings of the International Confer-
ence on Learning Representations (ICLR).
[85] Dobkin, D., Fulton, T., Gunopulos, D., Kasif, S. and Salzberg, S.
(1997). Induction of shallow decision trees. IEEE Transactions on Pattern
Analysis and Machine Intelligence.
[86] Dong, J. and Rudin, C. (2020). Exploring the cloud of variable impor-
tance for the set of all good models. Nature Machine Intelligence 2 810–824.
[87] Donoho, D. L. and Grimes, C. (2003). Hessian eigenmaps: Locally lin-
ear embedding techniques for high-dimensional data. Proceedings of the
National Academy of Sciences 100 5591-5596. MR1981019
[88] Dua, D. and Graff, C. (2017). UCI Machine Learning Repository.
[89] Dudani, S. A. (1976). The Distance-Weighted k-Nearest-Neighbor Rule.
IEEE Transactions on Systems, Man, and Cybernetics SMC-6 325-327.
[90] Džeroski, S., De Raedt, L. and Blockeel, H. (1998). Relational re-
inforcement learning. In International Conference on Inductive Logic Pro-
gramming 11–22. Springer.
[91] Džeroski, S., De Raedt, L. and Driessens, K. (2001). Relational re-
inforcement learning. Machine Learning 43 7–52.
[92] Eastwood, C. and Williams, C. K. (2018). A framework for the quan-
285 2864–2870.
[108] Gaines, B. R. and Compton, P. (1995). Induction of ripple-down rules
applied to modeling large databases. Journal of Intelligent Information
Systems 5 211–228.
[109] Ghosh, B. and Meel, K. S. (2019). IMLI: An incremental framework for
MaxSAT-based learning of interpretable classification rules. In Proceedings
of the AAAI/ACM Conference on AI, Ethics, and Society (AIES) 203–210.
[110] Goh, S. T. and Rudin, C. (2014). Box Drawings for Learning with Imbal-
anced Data. In Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD) 333–342.
[111] Gomez, O., Holter, S., Yuan, J. and Bertini, E. (2020). ViCE: Vi-
sual Counterfactual Explanations for Machine Learning Models. In Pro-
ceedings of the 25th International Conference on Intelligent User Interfaces
(IUI’20).
[112] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-
Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Genera-
tive Adversarial Nets. In Proceedings of Conference on Neural Information
Processing Systems (NeurIPS) 27 2672–2680.
[113] Gosiewska, A. and Biecek, P. (2020). Do Not Trust Additive Expla-
nations. arXiv e-print arXiv:1903.11420.
[114] Gray, J. B. and Fan, G. (2008). Classification tree analysis us-
ing TARGET. Computational Statistics & Data Analysis 52 1362–1372.
MR2422741
[115] Gross, C. C. (2002). Genealogy of the “Grandmother Cell”. The Neu-
roscientist 8 512-518.
[116] Guan, L., Verma, M., Guo, S., Zhang, R. and Kambhampati, S.
(2020). Widening the Pipeline in Human-Guided Reinforcement Learning
with Explanation and Context-Aware Data Augmentation.
[117] Guez, A., Vincent, R. D., Avoli, M. and Pineau, J. (2008). Adaptive
Treatment of Epilepsy via Batch-mode Reinforcement Learning. In Proceed-
ings of AAAI Conference on Artificial Intelligence (AAAI) 1671–1678.
[118] Günlük, O., Kalagnanam, J., Li, M., Menickelly, M. and Schein-
berg, K. (2021). Optimal decision trees for categorical data via integer
programming. Journal of Global Optimization 1–28. MR4299184
[119] Hamamoto, R., Suvarna, K., Yamada, M., Kobayashi, K.,
Shinkai, N., Miyake, M., Takahashi, M., Jinnai, S., Shimoyama, R.,
Sakai, A., Takasawa, K., Bolatkan, A., Shozu, K., Dozen, A.,
Machino, H., Takahashi, S., Asada, K., Komatsu, M., Sese, J. and
Kaneko, S. (2020). Application of Artificial Intelligence Technology in
Oncology: Towards the Establishment of Precision Medicine. Cancers 12.
[120] Hammond, K. (1989). Proceedings of the Second Case-Based Reasoning
Workshop (DARPA). Morgan Kaufmann Publishers, Inc., San Mateo, CA.
[121] Haris, A., Simon, N. and Shojaie, A. (2019). Generalized sparse addi-
tive models. arXiv e-print arXiv:1903.04641.
[122] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models
43. CRC press. MR1082147
[123] Higgins, I., Amos, D., Pfau, D., Racaniere, S., Matthey, L.,
Rezende, D. and Lerchner, A. (2018). Towards a definition of disen-
tangled representations. arXiv e-print arXiv:1812.02230.
[124] Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X.,
Botvinick, M., Mohamed, S. and Lerchner, A. (2017). beta-VAE:
Learning basic visual concepts with a constrained variational framework.
In Proceedings of the International Conference on Learning Representations
(ICLR).
[125] Hinton, G. (2021). How to represent part-whole hierarchies in a neural
network. arXiv e-print arXiv:2102.12627.
[126] Hinton, G. E., Krizhevsky, A. and Wang, S. D. (2011). Transforming
Auto-Encoders. In Proceedings of International Conference on Artificial
Neural Networks 44–51. Springer.
[127] Hochreiter, S. and Schmidhuber, J. (1997). Flat minima. Neural
Computation 9 1–42.
[128] Holte, R. C. (1993). Very simple classification rules perform well on
most commonly used datasets. Machine Learning 11 63-91.
[129] Hong, D., Baek, S. S. and Wang, T. (2020). Interpretable Sequence
Classification Via Prototype Trajectory. arXiv e-print arXiv:2007.01777.
[130] Hu, H., Siala, M., Hébrard, E. and Huguet, M.-J. (2020). Learning
Optimal Decision Trees with MaxSAT and its Integration in AdaBoost.
In IJCAI-PRICAI 2020, 29th International Joint Conference on Artificial
Intelligence and the 17th Pacific Rim International Conference on Artificial
Intelligence.
[131] Hu, X., Rudin, C. and Seltzer, M. (2019). Optimal Sparse Decision
Trees. In Proceedings of Conference on Neural Information Processing Sys-
tems (NeurIPS).
[132] Ignatiev, A., Pereira, F., Narodytska, N. and Marques-Silva, J.
(2018). A SAT-based approach to learn explainable decision sets. In Inter-
national Joint Conference on Automated Reasoning 627–645. MR3836041
[133] The Smithsonian Institution (2020). Mammuthus primigenius
(Blumbach). https://3d.si.edu/object/3d/mammuthus-primigenius-
blumbach:341c96cd-f967-4540-8ed1-d3fc56d31f12.
[134] Jacomy, M., Venturini, T., Heymann, S. and Bastian, M. (2014).
ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network
Visualization Designed for the Gephi Software. PLoS One 9 1-12.
[135] Jaderberg, M., Simonyan, K., Zisserman, A. and
Kavukcuoglu, K. (2015). Spatial transformer networks. In Proceedings
of Conference on Neural Information Processing Systems (NeurIPS)
2017–2025.
[136] Javad, M. O. M., Agboola, S. O., Jethwani, K., Zeid, A. and Ka-
marthi, S. (2019). A reinforcement learning–based method for manage-
ment of type 1 diabetes: exploratory study. JMIR Diabetes 4 e12905.
[137] Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L.,
Lawrence Zitnick, C. and Girshick, R. (2017). CLEVR: A Diagnos-
tic Dataset for Compositional Language and Elementary Visual Reasoning.
sentations (ICLR).
[150] Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sal-
lab, A. A., Yogamani, S. and Pérez, P. (2021). Deep reinforcement
learning for autonomous driving: A survey. IEEE Transactions on Intelli-
gent Transportation Systems 1-18.
[151] Kissel, N. and Mentch, L. (2021). Forward Stability and Model Path
Selection. arXiv e-print arXiv:2103.03462.
[152] Knaus, W. A., Draper, E. A., Wagner, D. P. and Zimmerman, J. E.
(1985). APACHE II: a severity of disease classification system. Critical Care
Medicine 13 818–829.
[153] Knaus, W. A., Wagner, D. P., Draper, E. A., Zimmerman, J. E.,
Bergner, M., Bastos, P. G., Sirio, C. A., Murphy, D. J.,
Lotring, T., Damiano, A. et al. (1991). The APACHE III prognostic
system: risk prediction of hospital mortality for critically ill hospitalized
adults. Chest 100 1619–1636.
[154] Knaus, W. A., Zimmerman, J. E., Wagner, D. P., Draper, E. A.
and Lawrence, D. E. (1981). APACHE-acute physiology and chronic
health evaluation: a physiologically based classification system. Critical
Care Medicine 9 591–597.
[155] Kobak, D. and Berens, P. (2019). The art of using t-SNE for single-cell
transcriptomics. Nature Communication 10 5416.
[156] Kober, J., Bagnell, J. A. and Peters, J. (2013). Reinforcement learn-
ing in robotics: A survey. The International Journal of Robotics Research
32 1238–1274.
[157] Kodratoff, Y. (1994). The comprehensibility manifesto. KDD Nugget
Newsletter 94.
[158] Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E.,
Kim, B. and Liang, P. (2020). Concept bottleneck models. In Proceedings
of International Conference on Machine Learning (ICML) 5338–5348.
[159] Kohonen, T. (1995). Learning Vector Quantization. The Handbook of
Brain Theory and Neural Networks 537–540.
[160] Kolodner, J. L. (1988). Proceedings of the Case-Based Reasoning Work-
shop (DARPA). Morgan Kaufmann Publishers, Inc., San Mateo, CA.
[161] Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distri-
butions and bounding the generalization error of combined classifiers. The
Annals of Statistics 30 1–50. MR1892654
[162] Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. and
Faisal, A. A. (2018). The artificial intelligence clinician learns optimal
treatment strategies for sepsis in intensive care. Nature Medicine 24 1716–
1720.
[163] Kosiorek, A., Sabour, S., Teh, Y. W. and Hinton, G. E. (2019).
Stacked capsule autoencoders. In Proceedings of Conference on Neural In-
formation Processing Systems (NeurIPS) 15512–15522.
[164] Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2017). ImageNet
classification with deep convolutional neural networks. Communications of
the ACM 60.
[179] Lee, J., Lee, Y., Kim, J., Kosiorek, A., Choi, S. and Teh, Y. W.
(2019). Set transformer: A framework for attention-based permutation-
invariant neural networks. In Proceedings of International Conference on
Machine Learning (ICML) 3744–3753.
[180] Letham, B., Rudin, C., McCormick, T. H., Madigan, D. et al.
(2015). Interpretable classifiers using rules and bayesian analysis: Build-
ing a better stroke prediction model. The Annals of Applied Statistics 9
1350–1371. MR3418726
[181] Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T. (2018).
Visualizing the loss landscape of neural nets. In Proceedings of Conference
on Neural Information Processing Systems (NeurIPS).
[182] Li, J., Monroe, W., Ritter, A., Galley, M., Gao, J. and Juraf-
sky, D. (2016). Deep reinforcement learning for dialogue generation. In
Proceedings of Conference on Empirical Methods in Natural Language Pro-
cessing (EMNLP).
[183] Li, O., Liu, H., Chen, C. and Rudin, C. (2018). Deep Learning for Case-
Based Reasoning through Prototypes: A Neural Network that Explains Its
Predictions. In Proceedings of AAAI Conference on Artificial Intelligence
(AAAI).
[184] Li, W., Han, J. and Pei, J. (2001). CMAR: Accurate and efficient classi-
fication based on multiple class-association rules. In Proceedings 2001 IEEE
International Conference on Data Mining (ICDM) 369–376. IEEE.
[185] Lin, J., Zhong, C., Hu, D., Rudin, C. and Seltzer, M. (2020). Gen-
eralized and scalable optimal sparse decision trees. In Proceedings of Inter-
national Conference on Machine Learning (ICML) 6150–6160.
[186] Lin, Y., Zhang, H. H. et al. (2006). Component selection and smoothing
in multivariate nonparametric regression. The Annals of Statistics 34 2272–
2297. MR2291500
[187] Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S.
and Kluger, Y. (2019). Fast interpolation-based t-SNE for improved vi-
sualization of single-cell RNA-seq data. Nature Methods 16 243-245.
[188] Liu, B., Hsu, W., Ma, Y. et al. (1998). Integrating classification and
association rule mining. In Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD) 98 80–86.
[189] Liu, G., Schulte, O., Zhu, W. and Li, Q. (2018). Toward interpretable
deep reinforcement learning with linear model u-trees. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases
414–429. Springer.
[190] Liu, Y., Logan, B., Liu, N., Xu, Z., Tang, J. and Wang, Y. (2017).
Deep reinforcement learning for dynamic treatment regimes on medical
registry data. In 2017 IEEE International Conference on Healthcare Infor-
matics (ICHI) 380–385. IEEE.
[191] Locatello, F., Weissenborn, D., Unterthiner, T., Mahen-
dran, A., Heigold, G., Uszkoreit, J., Dosovitskiy, A. and Kipf, T.
(2020). Object-Centric Learning with Slot Attention. In Proceedings of
Conference on Neural Information Processing Systems (NeurIPS).
[192] Loh, W.-Y. and Shih, Y.-S. (1997). Split selection methods for classifi-
cation trees. Statistica Sinica 7 815–840. MR1488644
[193] Losch, M., Fritz, M. and Schiele, B. (2019). Interpretability be-
yond classification output: Semantic bottleneck networks. arXiv e-print
arXiv:1907.10882.
[194] Lou, Y., Bien, J., Caruana, R. and Gehrke, J. (2016). Sparse par-
tially linear additive models. Journal of Computational and Graphical
Statistics 25 1126–1140. MR3572032
[195] Lou, Y., Caruana, R. and Gehrke, J. (2012). Intelligible models for
classification and regression. In Proceedings of the ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining (KDD) 150–
158.
[196] Lou, Y., Caruana, R., Gehrke, J. and Hooker, G. (2013). Accurate
intelligible models with pairwise interactions. In Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery and Data Min-
ing (KDD) 623–631.
[197] Malioutov, D. and Meel, K. S. (2018). MLIC: A MaxSAT-based
framework for learning interpretable classification rules. In International
Conference on Principles and Practice of Constraint Programming 312–
327. Springer.
[198] Malioutov, D. and Varshney, K. (2013). Exact rule learning via
boolean compressed sensing. In Proceedings of International Conference
on Machine Learning (ICML) 765–773.
[199] Marchand, M. and Shawe-Taylor, J. (2002). The set covering ma-
chine. Journal of Machine Learning Research 3 723–746. MR1983944
[200] Marchand, M. and Sokolova, M. (2005). Learning with decision lists
of data-dependent features. Journal of Machine Learning Research 6 427–
451. MR2249827
[201] Marx, C., Calmon, F. and Ustun, B. (2020). Predictive multiplicity
in classification. In Proceedings of International Conference on Machine
Learning (ICML) 6765–6774.
[202] Matthey, L., Higgins, I., Hassabis, D. and Lerchner, A. (2017).
dSprites: Disentanglement testing Sprites dataset. https://github.com/
deepmind/dsprites-dataset/.
[203] McGough, M. (2018). How bad is Sacramento’s air, exactly? Google
results appear at odds with reality, some say. Sacramento Bee.
[204] McInnes, L., Healy, J. and Melville, J. (2018). UMAP: Uniform
Manifold Approximation and Projection for Dimension Reduction. arXiv
e-print arXiv:1802.03426.
[205] Mehta, M., Agrawal, R. and Rissanen, J. (1996). SLIQ: A fast scal-
able classifier for data mining. In International Conference on Extending
Database Technology 18–32. Springer.
[206] Meier, L., Van de Geer, S., Bühlmann, P. et al. (2009). High-
dimensional additive modeling. The Annals of Statistics 37 3779–3821.
MR2572443
[207] Meng, X., Li, Z., Zhang, D. and Karniadakis, G. E. (2020).
[234] Petersen, A., Witten, D. and Simon, N. (2016). Fused lasso additive
model. Journal of Computational and Graphical Statistics 25 1005–1025.
MR3572026
[235] Lo Piano, S. (2020). Ethical principles in machine learning and artificial
intelligence: cases from the field and possible ways forward. Humanities and
Social Sciences Communications 7.
[236] Poličar, P. G., Stražar, M. and Zupan, B. (2019). openTSNE: a
modular Python library for t-SNE dimensionality reduction and embed-
ding. bioRxiv.
[237] Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M.,
Vaughan, J. W. and Wallach, H. M. (2021). Manipulating and Measur-
ing Model Interpretability. In Proceedings of Conference on Human Factors
in Computing Systems (CHI).
[238] Psichogios, D. C. and Ungar, L. H. (1992). A hybrid neural network-
first principles approach to process modeling. AIChE Journal 38 1499–
1511.
[239] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan
Kaufmann. MR3729316
[240] Raissi, M. (2018). Deep Hidden Physics Models: Deep Learning of Non-
linear Partial Differential Equations. Journal of Machine Learning Research
19 1–24. MR3862432
[241] Raissi, M., Perdikaris, P. and Karniadakis, G. E. (2018). Multi-
step Neural Networks for Data-driven Discovery of Nonlinear Dynamical
Systems. arXiv e-print arXiv:1801.01236.
[242] Raissi, M., Perdikaris, P. and Karniadakis, G. E. (2019). Physics-
informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics 378 686–707. MR3881695
[243] Raissi, M., Yazdani, A. and Karniadakis, G. E. (2020). Hidden fluid
mechanics: Learning velocity and pressure fields from flow visualizations.
Science 367 1026–1030. MR4265157
[244] Rao, C., Sun, H. and Liu, Y. (2021). Physics-Informed Deep Learn-
ing for Computational Elastodynamics without Labeled Data. Journal of
Engineering Mechanics 147 04021043.
[245] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009).
Sparse additive models. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 71 1009–1030. MR2750255
[246] Riesbeck, C. K. and Schank, R. C. (1989). Inside Case-Based Reason-
ing. Lawrence Erlbaum Assoc., Inc., Hillsdale, NJ.
[247] Rivest, R. L. (1987). Learning decision lists. Machine Learning 2 229–
246.
[248] Roth, A. M., Topin, N., Jamshidi, P. and Veloso, M. (2019).
Conservative Q-improvement: Reinforcement learning for an interpretable
decision-tree policy. arXiv e-print arXiv:1907.01180.
[249] Roweis, S. T. and Saul, L. K. (2000). Nonlinear Dimensionality Re-
duction by Locally Linear Embedding. Science 290 2323–2326.
[250] Rudin, C. (2019). Stop Explaining Black Box Machine Learning Models
for High Stakes Decisions and Use Interpretable Models Instead. Nature
Machine Intelligence 1 206–215.
[251] Rudin, C. and Ertekin, S. (2018). Learning Customized and Optimized
Lists of Rules with Mathematical Programming. Mathematical Programming
Computation 10 659–702. MR3863707
[252] Rudin, C., Letham, B. and Madigan, D. (2013). Learning theory anal-
ysis for association rules and sequential event prediction. Journal of Ma-
chine Learning Research 14 3441–3492. MR3144468
[253] Rudin, C. and Radin, J. (2019). Why Are We Using Black Box Mod-
els in AI When We Don’t Need To? A Lesson From An Explainable AI
Competition. Harvard Data Science Review 1.
[254] Rudin, C. and Schapire, R. E. (2009). Margin-Based Ranking and
an Equivalence between AdaBoost and RankBoost. Journal of Machine
Learning Research 10 2193–2232. MR2563980
[255] Rudin, C. and Shaposhnik, Y. (2019). Globally-Consistent Rule-
Based Summary-Explanations for Machine Learning Models: Application
to Credit-Risk Evaluation. May 28, 2019. Available at SSRN: https://
ssrn.com/abstract=3395422.
[256] Rudin, C. and Ustun, B. (2018). Optimized scoring systems: Toward
trust in machine learning for healthcare and criminal justice. Interfaces 48
449–466.
[257] Rudin, C. and Wagstaff, K. L. (2014). Machine Learning for Science
and Society. Machine Learning 95. MR3179975
[258] Rudin, C., Wang, C. and Coker, B. (2020). The Age of Secrecy and
Unfairness in Recidivism Prediction. Harvard Data Science Review 2.
[259] Rudy, S. H., Brunton, S. L., Proctor, J. L. and Kutz, J. N. (2017).
Data-driven discovery of partial differential equations. Science Advances 3
e1602614.
[260] Rymarczyk, D., Struski, Ł., Tabor, J. and Zieliński, B. (2020).
ProtoPShare: Prototype Sharing for Interpretable Image Classification and
Similarity Discovery. arXiv e-print arXiv:2011.14340.
[261] Sabour, S., Frosst, N. and Hinton, G. E. (2017). Dynamic rout-
ing between capsules. In Proceedings of Conference on Neural Information
Processing Systems (NeurIPS) 3856–3866.
[262] Sadhanala, V., Tibshirani, R. J. et al. (2019). Additive models with
trend filtering. The Annals of Statistics 47 3032–3068. MR4025734
[263] Sahli Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D. E. and
Kuhl, E. (2020). Physics-informed neural networks for cardiac activation
mapping. Frontiers in Physics 8 42.
[264] Salakhutdinov, R. and Hinton, G. (2007). Learning a Nonlinear Em-
bedding by Preserving Class Neighbourhood Structure. In Artificial Intel-
ligence and Statistics (AISTATS) 412–419.
[265] Sallab, A. E., Abdou, M., Perot, E. and Yogamani, S. (2017).
Deep reinforcement learning framework for autonomous driving. Electronic
Imaging 2017 70–76.
[266] Schapire, R. E., Freund, Y., Bartlett, P., Lee, W. S. et al. (1998).
Boosting the margin: A new explanation for the effectiveness of voting
methods. The Annals of Statistics 26 1651–1686. MR1673273
[267] Schmidhuber, J. (1992). Learning factorial codes by predictability min-
imization. Neural Computation 4 863–879.
[268] Schramowski, P., Stammer, W., Teso, S., Brugger, A., Her-
bert, F., Shao, X., Luigs, H.-G., Mahlein, A.-K. and Kersting, K.
(2020). Making deep neural networks right for the right scientific reasons
by interacting with their explanations. Nature Machine Intelligence 2 476–
486.
[269] Semenova, L., Rudin, C. and Parr, R. (2019). A study in Rashomon
curves and volumes: A new perspective on generalization and model sim-
plicity in machine learning. arXiv e-print arXiv:1908.01755.
[270] Seo, S., Meng, C. and Liu, Y. (2019). Physics-aware difference graph
networks for sparsely-observed dynamics. In Proceedings of the Interna-
tional Conference on Learning Representations (ICLR).
[271] Shu, T., Xiong, C. and Socher, R. (2018). Hierarchical and inter-
pretable skill acquisition in multi-task reinforcement learning. In Proceed-
ings of the International Conference on Learning Representations (ICLR).
[272] Silva, A., Gombolay, M., Killian, T., Jimenez, I. and Son, S.-H.
(2020). Optimization methods for interpretable differentiable decision trees
applied to reinforcement learning. In International Conference on Artificial
Intelligence and Statistics (AISTATS) 1855–1865.
[273] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van
Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneer-
shelvam, V., Lanctot, M. et al. (2016). Mastering the game of Go with
deep neural networks and tree search. Nature 529 484–489.
[274] Six, A., Backus, B. and Kelder, J. (2008). Chest pain in the emergency
room: value of the HEART score. Netherlands Heart Journal 16 191–196.
[275] Sokolova, M., Marchand, M., Japkowicz, N. and Shawe-
Taylor, J. S. (2003). The decision list machine. In Proceedings of Confer-
ence on Neural Information Processing Systems (NeurIPS) 945–952.
[276] Sokolovska, N., Chevaleyre, Y., Clément, K. and Zucker, J.-D.
(2017). The fused lasso penalty for learning interpretable medical scor-
ing systems. In 2017 International Joint Conference on Neural Networks
(IJCNN) 4504–4511. IEEE.
[277] Sokolovska, N., Chevaleyre, Y. and Zucker, J.-D. (2018). A Prov-
able Algorithm for Learning Interpretable Scoring Systems. In Proceedings
of Machine Learning Research Vol. 84: Artificial Intelligence and Statistics
(AISTATS) 566–574.
[278] Spiegelhalter, D. (2020). Should We Trust Algorithms? Harvard Data
Science Review 2.
[279] Srebro, N., Sridharan, K. and Tewari, A. (2010). Smoothness, low
noise and fast rates. In Proceedings of Conference on Neural Information
Processing Systems (NeurIPS) 2199–2207.
[280] Sreedharan, S., Chakraborti, T. and Kambhampati, S. (2018).
[326] Yu, C., Liu, J. and Nemati, S. (2019). Reinforcement learning in health-
care: A survey. arXiv e-print arXiv:1908.08796.
[327] Yu, J., Ignatiev, A., Le Bodic, P. and Stuckey, P. J. (2020). Optimal
Decision Lists using SAT. arXiv e-print arXiv:2010.09919.
[328] Yu, J., Ignatiev, A., Stuckey, P. J. and Le Bodic, P. (2020). Com-
puting Optimal Decision Sets with SAT. In International Conference on
Principles and Practice of Constraint Programming 952–970. Springer.
[329] Zahavy, T., Ben-Zrihem, N. and Mannor, S. (2016). Graying the
black box: Understanding DQNs. In Proceedings of International Conference
on Machine Learning (ICML) 1899–1908.
[330] Zambaldi, V., Raposo, D., Santoro, A., Bapst, V., Li, Y.,
Babuschkin, I., Tuyls, K., Reichert, D., Lillicrap, T., Lock-
hart, E. et al. (2019). Relational deep reinforcement learning. In Proceed-
ings of the International Conference on Learning Representations (ICLR).
[331] Zech, J. R. et al. (2018). Variable generalization performance of a deep
learning model to detect pneumonia in chest radiographs: A cross-sectional
study. PLoS Medicine 15.
[332] Zhang, K., Wang, Y., Du, J., Chu, B., Celi, L. A., Kindle, R. and
Doshi-Velez, F. (2021). Identifying Decision Points for Safe and Inter-
pretable Reinforcement Learning in Hypotension Treatment. In Proceedings
of the NeurIPS Workshop on Machine Learning for Health.
[333] Zhang, Q., Wu, Y. N. and Zhu, S.-C. (2018). Interpretable convolu-
tional neural networks. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) 8827–8836.
[334] Zhang, Y., Song, K., Sun, Y., Tan, S. and Udell, M. (2019). “Why
Should You Trust My Explanation?” Understanding Uncertainty in LIME
Explanations. In Proceedings of the ICML AI for Social Good Workshop.
[335] Zhao, H., Jin, R., Wu, S. and Shi, J. (2011). PDE-constrained Gaussian
process model on material removal rate of wire saw slicing process. Journal
of Manufacturing Science and Engineering 133.
[336] Zhou, B., Bau, D., Oliva, A. and Torralba, A. (2018). Interpreting
deep visual representations via network dissection. IEEE Transactions on
Pattern Analysis and Machine Intelligence.
[337] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. and Torralba, A.
(2014). Object detectors emerge in deep scene CNNs. In Proceedings of the
International Conference on Learning Representations (ICLR).
[338] Zhou, B., Sun, Y., Bau, D. and Torralba, A. (2018). Interpretable
basis decomposition for visual explanation. In Proceedings of the European
Conference on Computer Vision (ECCV) 119–134.
[339] Zhou, D.-X. (2002). The covering number in learning theory. Journal of
Complexity 18 739–767.
[340] Zhu, Y., Zabaras, N., Koutsourelakis, P.-S. and Perdikaris, P.
(2019). Physics-constrained deep learning for high-dimensional surrogate
modeling and uncertainty quantification without labeled data. Journal of
Computational Physics 394 56–81.
[341] Zhu, Z., Luo, P., Wang, X. and Tang, X. (2014). Multi-view per-
ceptron: a deep model for learning face identity and view representations.
In Proceedings of Conference on Neural Information Processing Systems
(NeurIPS) 217–225.