An Introduction to Machine Learning Interpretability
An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI
An Introduction to Machine Learning
Interpretability
Understanding and trusting models and their results is a hallmark of good sci‐
ence. Scientists, engineers, physicians, researchers, and humans in general need
to understand and trust models and modeling results that affect their
work and their lives. However, the forces of innovation and competition are now
driving analysts and data scientists to try ever more complex predictive modeling
and machine learning algorithms. Such machine learning algorithms include
gradient-boosted ensembles (GBMs), artificial neural networks (ANNs), and random
forests, among many others. Many machine learning algorithms have been labeled
“black box” models because of their inscrutable inner workings. What
makes these models accurate is what makes their predictions difficult to under‐
stand: they are very complex. This is a fundamental trade-off. These algorithms
are typically more accurate for predicting nonlinear, faint, or rare phenomena.
Unfortunately, more accuracy almost always comes at the expense of interpreta‐
bility, and interpretability is crucial for business adoption, model documentation,
regulatory oversight, and human acceptance and trust.
The inherent trade-off between accuracy and interpretability in predictive mod‐
eling can be a particularly vexing catch-22 for analysts and data scientists work‐
ing in regulated industries. Due to strenuous regulatory and documentation
requirements, data science professionals in the regulated verticals of banking,
insurance, healthcare, and other industries often feel locked into using tradi‐
tional, linear modeling techniques to create their predictive models. So, how can
you use machine learning to improve the accuracy of your predictive models and
increase the value they provide to your organization while still retaining some
degree of interpretability?
This report provides some answers to this question by introducing interpretable
machine learning techniques, algorithms, and models. It discusses predictive
modeling and machine learning from an applied perspective and puts forward
social and commercial motivations for interpretability, fairness, accountability,
and transparency in machine learning. It defines interpretability, examines some
of the major theoretical difficulties in the burgeoning field, and provides a taxon‐
omy for classifying and describing interpretable machine learning techniques.
We then discuss a number of credible and practical machine learning interpretability
techniques, consider how to test those techniques themselves, and, finally, present a
set of open source code examples for interpretability techniques.
Figure 1-1. An illustration of the error surface of a traditional linear model. (Figure
courtesy of H2O.ai.)
Because of the convex nature of the error surface for linear models, there is basi‐
cally only one best model, given some relatively stable set of inputs and a predic‐
tion target. The model associated with the error surface displayed in Figure 1-1
would be said to have strong model locality. Moreover, because the weighting of
income versus interest rate is highly stable in the pictured error function and its
associated linear model, explanations about how the function made decisions
about loan defaults based on those two inputs would also be stable. More stable
explanations are often considered more trustworthy explanations.
Figure 1-2 depicts a nonconvex error surface that is representative of the error
function for a machine learning function with two inputs—for example, a cus‐
tomer’s income and a customer’s interest rate—and an output, such as the same
customer’s probability of defaulting on a loan. This nonconvex error surface, with
its many local minima, implies that there are many possible models, each weighting
income and interest rate somewhat differently, that could make accurate predictions
about loan defaults. Such a model would be said to have weak model locality, and
because its explanations can change from one retrained model to the next, those
explanations tend to be less stable and harder to trust.
Figure 1-2. An illustration of the error surface of a machine learning model. (Figure
courtesy of H2O.ai.)
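The practical consequence of these two error surfaces can be checked directly. Below is a minimal sketch, assuming scikit-learn and NumPy, that refits a linear model and a gradient-boosted ensemble on bootstrap resamples of the same simulated income and interest-rate data; the data-generating function and the use of built-in feature importances as a stand-in for explanations are illustrative assumptions.

```python
# A sketch of model locality: how much do "explanations" move when a model is refit?
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))                 # columns: simulated income, interest rate
y = 0.4 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=1000)

linear_coefs, gbm_importances = [], []
for seed in range(10):
    # Refit both models on a bootstrap resample of the same data.
    idx = np.random.RandomState(seed).randint(0, len(X), len(X))
    linear_coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_)
    gbm = GradientBoostingRegressor(random_state=0).fit(X[idx], y[idx])
    gbm_importances.append(gbm.feature_importances_)

# A smaller spread across refits indicates stronger model locality and more
# stable, arguably more trustworthy, explanations of how each input is used.
print("linear coefficient spread:", np.std(linear_coefs, axis=0))
print("GBM importance spread:    ", np.std(gbm_importances, axis=0))
```

In this sketch the linear model's coefficients barely move across refits, while the more flexible learner's importances are free to vary more, which is the stability difference the two error surfaces illustrate.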
Figure 1-3. A linear model, g(x), predicts the average number of purchases, given a
customer’s age. The predictions can be inaccurate but the explanations are straight‐
forward and stable. (Figure courtesy of H2O.ai.)
Defining Interpretability
Let’s take a step back now and offer a definition of interpretability, and also
briefly introduce those groups at the forefront of machine learning interpretabil‐
ity research today. In the context of machine learning models and results, inter‐
pretability has been defined as “the ability to explain or to present in
understandable terms to a human” [13]. This might be the simplest defini‐
tion of machine learning interpretability, but there are several communities with
different and sophisticated notions of what interpretability is today and should be
in the future. Two of the most prominent groups pursuing interpretability
research are a group of academics operating under the acronym FAT* and civil‐
ian and military researchers funded by the Defense Advanced Research Projects
Agency (DARPA). FAT* academics (meaning fairness, accountability, and trans‐
parency in multiple artificial intelligence, machine learning, computer science,
legal, social science, and policy applications) are primarily focused on promoting
and enabling interpretability and fairness in algorithmic decision-making sys‐
tems with social and commercial impact. DARPA-funded researchers seem pri‐
marily interested in increasing interpretability in sophisticated pattern
recognition models needed for security applications. They tend to label their
work explainable AI, or XAI.
A Machine Learning Interpretability Taxonomy for
Applied Practitioners
Technical challenges as well as the needs and perspectives of different user com‐
munities make machine learning interpretability a subjective and complicated
subject. Luckily, a previously defined taxonomy has proven useful for character‐
izing the interpretability of various popular explanatory techniques used in
commercial data mining, analytics, data science, and machine learning applica‐
tions [10]. The taxonomy describes models in terms of their complexity, and cate‐
gorizes interpretability techniques by the global or local scope of explanations
they generate, the family of algorithms to which they can be applied, and their
ability to promote trust and understanding.
Figure 1-5. A correlation graph: a data visualization that can enhance trust and
understanding in machine learning models by displaying important, complex
relationships between the variables in a dataset as the edges and nodes of an
undirected graph. (Figure courtesy of H2O.ai.)
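For readers who want to reproduce this kind of visualization, the following is a minimal sketch assuming pandas, NetworkX, and Matplotlib; the loans.csv file, its numeric columns, and the 0.5 correlation cutoff are hypothetical choices used only for illustration.

```python
# A sketch of a correlation graph: variables become nodes, and strong pairwise
# correlations become edges in an undirected graph.
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

df = pd.read_csv("loans.csv")        # hypothetical dataset of numeric variables
corr = df.corr()                     # pairwise Pearson correlations

G = nx.Graph()
G.add_nodes_from(corr.columns)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        strength = abs(corr.loc[a, b])
        if strength > 0.5:           # keep only the stronger relationships
            G.add_edge(a, b, weight=strength)

nx.draw_networkx(G, node_color="lightblue")
plt.axis("off")
plt.show()
```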
Testing Interpretability
The approximate nature of machine learning explanations can, and often should,
call their trustworthiness into question. Don’t
fret! You can test explanations for accuracy. Originally, researchers proposed test‐
ing machine learning model explanations by their capacity to enable humans to
correctly determine the outcome of a model prediction based on input data val‐
ues [13]. Recent research has highlighted the potential bias of human practi‐
tioners toward simpler explanations, even when those simpler explanations are
inaccurate [14]. Given that human evaluation studies are likely impractical for
most commercial data science or machine learning groups anyway, several more
automated approaches for testing model explanations are proposed here.
Simulated data
You can use simulated data with known characteristics to test explanations.
For instance, models trained on totally random data with no relationship
between a number of input variables and a prediction target should not give
strong weight to any input variable nor generate compelling local explana‐
tions or reason codes. Conversely, you can use simulated data with a known
signal-generating function to test that explanations accurately represent that
known function (see the first sketch following this list).
Explanation stability with increased prediction accuracy
If previously known, accurate explanations or reason codes from a simpler
linear model are available, you can use them as a reference for the accuracy
of explanations from a related, but more complex and hopefully more accu‐
rate, model. You can perform tests to see how accurate a model can become
before its reason codes veer away from those known standards (see the second
sketch following this list).
Explanation stability under data perturbation
Trustworthy explanations likely should not change drastically for minor
changes in input data. You can set and test thresholds for allowable explana‐
tion value changes automatically by perturbing input data (see the third sketch
following this list). Explanations or
reason code values can also be averaged across a number of models to create
more stable explanations.
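The first sketch below illustrates the simulated-data test using scikit-learn; the choice of models, the use of permutation importance as a stand-in for explanations, and the specific signal-generating function are illustrative assumptions.

```python
# Simulated-data test: explanations should find nothing in noise and should
# recover a known signal-generating function.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n, p = 2000, 5

# Case 1: pure noise. On held-out data, no input should earn meaningful importance.
X = rng.normal(size=(n, p))
y_noise = rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_noise, random_state=0)
noise_model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
noise_imp = permutation_importance(noise_model, X_te, y_te, random_state=0)
print("importances on random data (expect values near zero):", noise_imp.importances_mean)

# Case 2: known signal. Explanations should put their weight on feature 0.
y_signal = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)
signal_model = GradientBoostingRegressor(random_state=0).fit(X, y_signal)
signal_imp = permutation_importance(signal_model, X, y_signal, random_state=0)
assert signal_imp.importances_mean.argmax() == 0, "explanation missed the known signal"
```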
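The second sketch, also assuming scikit-learn plus SciPy, treats the absolute coefficients of a linear model fit to standardized inputs as the known reference and tracks how well a progressively larger gradient-boosted model's importances agree with that reference as its accuracy grows; the rank-correlation measure of agreement is an illustrative choice.

```python
# Stability with increased accuracy: watch whether a complex model's reason
# codes drift away from a trusted linear reference as it becomes more accurate.
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=2000)

# Reference ranking of inputs from the simpler, well-understood model.
reference = np.abs(LinearRegression().fit(X, y).coef_)

for n_trees in (10, 50, 200, 1000):
    gbm = GradientBoostingRegressor(n_estimators=n_trees, random_state=0).fit(X, y)
    agreement, _ = spearmanr(reference, gbm.feature_importances_)
    print(f"trees={n_trees:4d}  R^2={gbm.score(X, y):.3f}  agreement with reference={agreement:.2f}")
```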
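Finally, a third sketch of the perturbation test, assuming scikit-learn and the shap package for local reason codes; the perturbation scale and the reporting of an average shift are illustrative choices, and the threshold you would enforce is left as a comment.

```python
# Perturbation test: local reason codes should not change drastically when the
# input data is nudged by a small amount.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
baseline = explainer.shap_values(X)                  # per-row reason codes

# Nudge every input by about 1% of its standard deviation and re-explain.
X_perturbed = X + rng.normal(scale=0.01 * X.std(axis=0), size=X.shape)
perturbed = explainer.shap_values(X_perturbed)

shift = np.abs(perturbed - baseline).mean()
print(f"average reason-code change under perturbation: {shift:.4f}")
# In practice, compare `shift` against a threshold you set for allowable change,
# and consider averaging reason codes across several models for more stability.
```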
References
[1] Hall, Patrick, Wen Phan, and Katie Whitson. The Evolution of Analytics:
Opportunities and Challenges for Machine Learning in Business. Sebastopol, CA:
O’Reilly Media, 2016. http://oreil.ly/2DIBefK
[2] Interpretability. Fast Forward Labs, 2017. https://www.fastforwardlabs.com/research/ff06
[3] Donoho, David. “50 years of Data Science.” Tukey Centennial Workshop,
2015. http://bit.ly/2GQOh1J
[4] Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep neural networks are
easily fooled: High confidence predictions for unrecognizable images.” Proceed‐
ings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
http://www.evolvingai.org/fooling
[5] Angwin, Julia, et al. “Machine Bias: There’s software used across the country to
predict future criminals. And it’s biased against blacks.” ProPublica. 2016.
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[6] Goodman, Bryce, and Seth Flaxman. “EU regulations on algorithmic
decision-making and a ‘right to explanation’.” ICML workshop on human inter‐
pretability in machine learning (WHI 2016). 2016. https://arxiv.org/pdf/1606.08813.pdf
[7] Evtimov, Ivan et al. “Robust Physical-World Attacks on Deep Learning Mod‐
els.” arXiv preprint. 2017. https://iotsecurity.eecs.umich.edu/#roadsigns
[8] Crutchfield, Bob. “Approve More Business Customers.” Equifax Insights Blog.
2017. https://insight.equifax.com/approve-business-customers/
[9] Breiman, Leo. “Statistical modeling: The two cultures (with comments and a
rejoinder by the author).” Statistical Science. 2001. http://bit.ly/2pwz6m5
[10] Hall, Patrick, Wen Phan, and Sri Satish Ambati. “Ideas on interpreting
machine learning.” O’Reilly Ideas. 2017. https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
[11] Hall, Patrick, et al. Machine Learning Interpretability with H2O Driverless AI.
H2O.ai. 2017. http://bit.ly/2FRqKAL
[12] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I
trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Min‐
ing. 2016. http://bit.ly/2FROG6X
[13] Doshi-Velez, Finale and Been Kim. “Towards a rigorous science of interpret‐
able machine learning.” arXiv preprint. 2017. https://arxiv.org/pdf/1702.08608.pdf
[14] Herman, Bernease. “The Promise and Peril of Human Evaluation for Model
Interpretability.” arXiv preprint. 2017. https://arxiv.org/pdf/1711.07414.pdf
[15] Lichman, M. UCI Machine Learning Repository, 2013. http://archive.ics.uci.edu/ml