
CS282BR: Topics in Machine Learning

Interpretability and Explainability

Hima Lakkaraju

Assistant Professor (Starting Jan 2020)


Harvard Business School + Harvard SEAS
Agenda
 Course Overview

 My Research

 Interpretability: Real-world scenarios

 Research Paper: Towards a rigorous science of interpretability

 Break (15 mins)

 Research Paper: The mythos of model interpretability

2
Course Staff & Office Hours

Hima Lakkaraju (Instructor)
Office Hours: Tuesday 3:30pm to 4:30pm
MD 337

Ike Lage (TF)
Office Hours: Thursday 2pm to 3pm
MD 337

Email: [email protected]; [email protected]; [email protected]

Course Webpage: https://canvas.harvard.edu/courses/68154/


3
Goals of this Course
Learn and improve upon the state-of-the-art literature
on ML interpretability

 Understand where, when, and why interpretability is needed

 Read, present, and discuss research papers


 Formulate, optimize, and evaluate algorithms

 Implement state-of-the-art algorithms

 Understand, critique, and redefine literature


 EMERGING FIELD!!
4
Course Overview and Prerequisites

 Interpretable models & explanations


 rules, prototypes, linear models, saliency maps etc.

 Connections with causality, debugging, & fairness

 Focus on applications
 Criminal justice, healthcare

Prerequisites: linear algebra, probability, algorithms, machine learning
(cs181 or equivalent), programming in python, numpy, sklearn
5
Structure of the Course

 Introductory material (2 lectures)


 Learning interpretable models (3 lectures)
 Interpretable explanations of black-box models (4 lectures)
 Interpretability and causality (1 lecture)
 Human-in-the-loop approaches to interpretability (1 lecture)
 Debugging and fairness (1 lecture)

1 lecture ~ 2.5 hours ~ 2 papers; Calendar on course webpage


6
Course Assessment

 3 Homeworks (30%)
 10% each

 Paper presentation and discussions (25%)


 Presentation (15%)
 Class discussion (10%)

 Semester project (45%)


 3 checkpoints (7% each)
 Final presentation (4%)
 Final paper (20%)
7
Course Assessment: Homeworks

 HW1: Machine learning refresher


 SVMs, neural networks, EM algorithm, unsupervised
learning, linear/logistic regression

 HW2 & HW3


 Implementing and critiquing papers discussed in class

 Homeworks are building blocks for semester project


– please take them seriously!

8
Course Assessment:
Paper Presentations and Discussions
 Each student will present a research paper after week 5
(individually or in a team of 2)
 45 minutes of presentation (slides or whiteboard)
 15 minutes of discussion and questions
 Sign up for presentations
(will send announcement in week 3)

 Each student is also expected to prepare & participate in


class discussions for all the papers
 What do you like/dislike about the paper?
 Any questions?
 Post on canvas
9
Course Assessment:
Semester Project and Checkpoints
 3 checkpoints
 Problem direction, context, outline of algorithm and evaluation
 Formulation, algorithm, preliminary results
 Additional theory/methods and results
 Templates will be posted on canvas

 Final presentation
 15 mins presentation + 5 mins questions

 Final report
 Detailed writeup (8 pages)
10
Timeline for Assignments

Assignment              Out      Due
HW1                     09/16    09/30
HW2                     09/30    10/14
HW3                     10/21    11/04
CP1                              09/23
CP2                              10/21
CP3                              11/11
Final Presentation               12/06
Final Report                     12/09
Paper Presentation      Sign up for slots (first-come, first-served)

All Deadlines: Monday 11.59pm ET


Presentations: In class on Friday

11
COURSE REGISTRATION

 Application Form:
https://forms.gle/2cmGx3469zKyJ6DH9
 Due September 7th (Saturday) 11.59pm ET

 Selection decisions out on


 September 9th 11.59pm ET

 If selected, you must register by


 September 10th 11.59pm ET

12
Questions??
My Research

Facilitating Effective and Efficient


Human-Machine Collaboration
to Improve High-Stakes
Decision-Making

14
High-Stakes Decisions

 Healthcare: What treatment to recommend to the


patient?

 Criminal Justice: Should the defendant be released


on bail?

High-Stakes Decisions: Impact on human well-being.

15
Overview of My Research

Computational Methods & Algorithms:
 Interpretable Models for Decision-Making
 Reliable Evaluation of Models for Decision-Making
 Characterizing Biases in Human & Machine Decisions
 Diagnosing Failures of Predictive Models

Application Domains: Law, Healthcare, Education, Business

16
Academic Research

 Interpretable Models for Decision-Making [KDD'16, AISTATS'17, FAT ML'17, AIES'19]
 Reliable Evaluation of Models for Decision-Making [QJE'18, KDD'17, KDD'15]
 Characterizing Biases in Human & Machine Decisions [NIPS'16, SDM'15]
 Diagnosing Failures of Predictive Models [AAAI'17]

17
We are Hiring!!
 Starting a new, vibrant research group
 Focus on ML methods as well as applications
 Collaborations with Law, Policy, and Medical schools

 PhD, Masters, and Undergraduate students


 Background in ML and Programming [preferred]
 Computer Science, Statistics, Data Science
 Business, Law, Policy, Public Health & Medical Schools

 Email me: [email protected];


[email protected]

18
Questions??
Real World Scenario: Bail Decision

 U.S. police make about 12M arrests each year

 We consider the binary decision: Release vs. Detain

 Release vs. Detain is a high-stakes decision

 Pre-trial detention can last up to 9 to 12 months
 Consequential for the jobs and families of defendants, as well as for crime

20
Bail Decision
[Decision diagram]
 Release → Fail to appear / Non-violent crime / Violent crime: Unfavorable
 Release → None of the above: Favorable
 Detain → Defendant spends time in jail

Judge is making a prediction:
Will the defendant commit ‘crime’ if released on bail?

21
Bail Decision-Making as a Prediction Problem

 Build a model that predicts defendant behavior if
released, based on his/her characteristics
 Training examples ⊆ Set of Released Defendants

 Does making the model more understandable/transparent to the
judge improve decision-making performance? If so, how to do it?

[Training data → learning algorithm → predictive model]

Defendant characteristics (Age, Prev. Crimes, Level of Charge, …) → Outcome
  28, 2, Felony, …  → Crime
  14, 1, Misd., …   → No Crime
  63, 0, Misd., …   → No Crime
  …

Test case: 35, 3, Felony, … → Prediction: Crime (0.83)
22
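
A minimal sketch of this prediction setup in sklearn (per the course prerequisites); the toy features and labels simply mirror the example table above, so this is illustrative only:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy training data from the table above: rows are released defendants,
    # columns are [age, num_prior_crimes, charge_is_felony]; label = crime observed.
    X_train = np.array([[28, 2, 1],
                        [14, 1, 0],
                        [63, 0, 0]])
    y_train = np.array([1, 0, 0])  # 1 = Crime, 0 = No Crime

    model = LogisticRegression().fit(X_train, y_train)

    # Test case from the slide: age 35, 3 prior crimes, felony charge.
    x_test = np.array([[35, 3, 1]])
    print(model.predict_proba(x_test)[:, 1])  # predicted probability of crime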
Our Experiment
If Current-Offense = Felony:
If Prior-Felony = Yes and Prior-Arrests ≥ 1, then Crime
If Crime-Status = Active and Owns-House = No and Has-Kids = No, then Crime
If Prior-Convictions = 0 and College = Yes and Owns-House = Yes, then No Crime

If Current-Offense = Misdemeanor and Prior-Arrests > 1:


If Prior-Jail-Incarcerations = Yes, then Crime
If Has-Kids = Yes and Married = Yes and Owns-House = Yes, then No Crime
If Lives-with-Partner = Yes and College = Yes and Pays-Rent = Yes, then No Crime

If Current-Offense = Misdemeanor and Prior-Arrests ≤ 1:


If Has-Kids = No and Owns-House = No and Moved_10times_5years = Yes, then Crime
If Age ≥ 50 and Has-Kids = Yes, then No Crime

Default: No Crime

Judges were able to make decisions 2.8 times faster
and 38% more accurately (compared to receiving only a
prediction, with no explanation)!
23
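
A minimal Python sketch of the rule list above as an executable function; the dictionary keys are simplified stand-ins for the attributes named in the rules:

    def predict(d):
        """Apply the rule list above to a defendant record d (a dict)."""
        if d["current_offense"] == "Felony":
            if d["prior_felony"] and d["prior_arrests"] >= 1:
                return "Crime"
            if d["crime_status_active"] and not d["owns_house"] and not d["has_kids"]:
                return "Crime"
            if d["prior_convictions"] == 0 and d["college"] and d["owns_house"]:
                return "No Crime"
        elif d["current_offense"] == "Misdemeanor" and d["prior_arrests"] > 1:
            if d["prior_jail_incarcerations"]:
                return "Crime"
            if d["has_kids"] and d["married"] and d["owns_house"]:
                return "No Crime"
            if d["lives_with_partner"] and d["college"] and d["pays_rent"]:
                return "No Crime"
        elif d["current_offense"] == "Misdemeanor" and d["prior_arrests"] <= 1:
            if not d["has_kids"] and not d["owns_house"] and d["moved_10_times_in_5_years"]:
                return "Crime"
            if d["age"] >= 50 and d["has_kids"]:
                return "No Crime"
        return "No Crime"  # default rule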
Real World Scenario:
Treatment Recommendation

Patient information:
 Demographics: Age, Gender, …
 Medical History: Has asthma? Other chronic issues? …
 Symptoms: Severe Cough, Wheezing, …
 Test Results: Peak flow: Positive; Spirometry: Negative

What treatment should be given?
Options: quick relief drugs (mild), controller drugs (strong)

24
Treatment Recommendation

[Decision diagram]
 Mild drug → Symptoms relieved in more than a week: Unfavorable
 Mild drug → Symptoms relieved within a week: Favorable
 Strong drug → Symptoms relieved within a week

 User studies showed that doctors were able to make decisions 1.9 times
faster and 26% more accurately when explanations were provided along
with the model!

Doctor is making a prediction:
Will the patient get better with a milder drug?
Use ML to make a similar prediction

25
Questions??
Towards a Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez and Been Kim; 2017
Contributions

 Goal: Rigorously define and evaluate interpretability

 Taxonomy of interpretability evaluation

 Taxonomy of interpretability based on


applications/tasks

 Taxonomy of interpretability based on methods

28
Motivation for Interpretability

 ML systems are being deployed in complex high-


stakes settings

 Accuracy alone is no longer enough

 Auxiliary criteria are important:


 Safety
 Nondiscrimination
 Right to explanation

29
Motivation for Interpretability

 Auxiliary criteria are often hard to quantify


(completely)
 E.g.: Impossible to enumerate all scenarios violating safety
of an autonomous car

 Fallback option: interpretability


 If the system can explain its reasoning, we can verify if
that reasoning is sound w.r.t. auxiliary criteria

30
Prior Work: Defining and Measuring
Interpretability
 Little consensus on what interpretability is and how
to evaluate it

 Interpretability evaluation typically falls into:

 Evaluate in the context of an application

 Evaluate via a quantifiable proxy

31
Prior Work: Defining and Measuring
Interpretability
 Evaluate in the context of an application
 If a system is useful in a practical application or a
simplified version, it must be interpretable

 Evaluate via a quantifiable proxy


 Claim some model class is interpretable and present
algorithms to optimize within that class
 E.g. rule lists

You will know it when you see it!

32
Lack of Rigor?
 Yes and No
 Previous notions are reasonable
Important to formalize these notions!!!

 However,

 Are all models in all “interpretable” model classes equally


interpretable?
 Model sparsity allows for comparison

 How to compare a model sparse in features to a model sparse in


prototypes?

 Do all applications have the same interpretability needs?


33
What is Interpretability?

 Defn: Ability to explain or to present in


understandable terms to a human

 No clear answers in psychology to:


 What constitutes an explanation?
 What makes some explanations better than the others?
 When are explanations sought?

This Work: Data-driven ways to derive operational definitions and


evaluations of explanations and interpretability

34
When and Why Interpretability?

 Not all ML systems require interpretability


 E.g., ad servers, postal code sorting
 No human intervention

 No explanation needed because:


 No consequences for unacceptable results
 Problem is well studied and validated well in real-world
applications → trust the system’s decision

When do we need explanation then?

35
When and Why Interpretability?

 Incompleteness in problem formalization


 Hinders optimization and evaluation

 Incompleteness ≠ Uncertainty
 Uncertainty can be quantified
 E.g., trying to learn from a small dataset (uncertainty)

36
Incompleteness: Illustrative Examples

 Scientific Knowledge
 E.g., understanding the characteristics of a large dataset
 Goal is abstract

 Safety
 End to end system is never completely testable
 Not possible to check all possible inputs

 Ethics
 Guard against certain kinds of discrimination which are too
abstract to be encoded
 No idea about the nature of discrimination beforehand

37
Incompleteness: Illustrative Examples

 Mismatched objectives
 Often we only have access to proxy functions of the ultimate goals

 Multi-objective tradeoffs
 Competing objectives
 E.g., privacy and prediction quality
 Even if the objectives are fully specified, trade-offs are unknown
and decisions have to be made case by case

38
Taxonomy of Interpretability Evaluation

Claim of the research should match the type of the evaluation!

39
Application-grounded evaluation

 Real humans (domain experts), real tasks

 Domain expert experiment with exact application task

 Domain expert experiment with a simpler or partial
task
 Shorten experiment time
 Increases number of potential subjects

 Typical in HCI and visualization communities


40
Human-grounded evaluation

 Real humans, simplified tasks


 Can be completed with lay humans
 Larger pool, less expensive
 More general notions of explainability
 Eg., what kinds of explanations are understood under time
constraints?

 Potential experiments
 Pairwise comparisons
 Simulate the model output
 What changes should be made to input to change the
output?
41
Functionally-grounded evaluation

 No humans, proxy tasks


 Appropriate for a class of models already validated
 Eg., decision trees
 A method is not yet mature
 Human subject experiments are unethical
 What proxies to use?

 Potential experiments
 Complexity (of a decision tree) compared to other
models of the same (or similar) class
 How many levels? How many rules?

42
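
A minimal sketch of such a functionally-grounded proxy (depth and rule count of an sklearn decision tree; the dataset and depth cap are arbitrary placeholders):

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

    # Proxy measures of complexity for a tree-based model:
    print(tree.get_depth())        # how many levels?
    print(tree.get_n_leaves())     # how many leaf rules (root-to-leaf paths)?
    print(tree.tree_.node_count)   # total number of nodes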
Open Problems: Design Issues

 What proxies are best for what real world


applications?

 What factors to consider when designing simpler


tasks in place of real world tasks?

How about a data-driven approach to characterize interpretability?

43
Matrix Factorization: Netflix Problem

44
Data-driven approach to
characterize interpretability

K is the number of latent dimensions

This matrix is very expensive and time-consuming to obtain – it requires
evaluation in real-world applications with domain experts!
So, a data-driven approach to characterizing interpretability is not feasible!

45
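
A minimal numpy sketch of the matrix-factorization analogy, assuming a small hypothetical ratings matrix with missing entries factorized into K latent dimensions:

    import numpy as np

    # Hypothetical ratings matrix: rows = users (or tasks), columns = items
    # (or interpretability methods); NaN marks unobserved entries.
    R = np.array([[5.0, 3.0, np.nan, 1.0],
                  [4.0, np.nan, np.nan, 1.0],
                  [1.0, 1.0, np.nan, 5.0],
                  [np.nan, 1.0, 5.0, 4.0]])

    K = 2                      # number of latent dimensions
    mask = ~np.isnan(R)
    R_filled = np.nan_to_num(R)

    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(R.shape[0], K))  # row (user) factors
    V = rng.normal(scale=0.1, size=(R.shape[1], K))  # column (item) factors

    lr, reg = 0.01, 0.1
    for _ in range(2000):                 # gradient descent on observed entries only
        E = mask * (R_filled - U @ V.T)   # residuals on observed entries
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)

    print(np.round(U @ V.T, 2))           # reconstructed (and imputed) matrix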
Taxonomy based on applications/tasks

 Global vs. Local


 High level patterns vs. specific decisions

 Degree of Incompleteness
 What part of the problem is incomplete? How incomplete
is it?
 Incomplete inputs or constraints or costs?

 Time Constraints
 How much time can the user spend to understand
explanation?
46
Taxonomy based on applications/tasks

 Nature of User Expertise


 How experienced is end user?
 Experience affects how users process information
 E.g., domain experts can handle detailed, complex
explanations compared to opaque, smaller ones

 Note: These taxonomies are constructed based on intuition


and are not data or evidence driven. They must be treated as
hypotheses.

47
Taxonomy based on methods

 Basic units of explanation:


 Raw features? E.g., pixel values
 Semantically meaningful? E.g., objects in an image
 Prototypes?

 Number of basic units of explanation:


 How many does the explanation contain?
 How do various types of basic units interact?
 E.g., prototype vs. feature

48
Taxonomy based on methods

 Level of compositionality:
 Are the basic units organized in a structured way?
 How do the basic units compose to form higher order units?

 Interactions between basic units:


 Combined in linear or non-linear ways?
 Are some combinations easier to understand?

 Uncertainty:
 What kind of uncertainty is captured by the methods?
 How easy is it for humans to process uncertainty?
49
Summary

 Goal: Rigorously define and evaluate interpretability

 Taxonomy of interpretability evaluation

 An attempt at data-driven characterization of


interpretability

 Taxonomy of interpretability based on applications/tasks

 Taxonomy of interpretability based on methods


50
Questions??
Let’s start the critique!
The Mythos of Model Interpretability
Zachary Lipton; 2017
Contributions

 Goal: Refine the discourse on interpretability

 Outline desiderata of interpretability research


 Motivations for interpretability are often diverse and
discordant

 Identifying model properties and techniques thought


to confer interpretability

54
Motivation

 We want models to be not only good w.r.t. predictive


capabilities, but also interpretable

 Interpretation is underspecified
 Lack of a formal technical meaning

 Papers provide diverse and non-overlapping


motivations for interpretability

55
Prior Work: Motivations for Interpretability

Interpretability promotes trust

 But what is trust?

 Is it faith in model performance?

 If so, why are accuracy and other standard


performance evaluation techniques inadequate?

56
When is interpretability needed?
 Simplified optimization objectives fail to capture
complex real life goals.
 Algorithm for hiring decisions – productivity and ethics
 Ethics is hard to formulate

 Training data is not representative of deployment


environment

Interpretability serves those objectives that we deem important


but struggle to model formally!

57
Desiderata

 Understanding motivations for interpretability


through the lens of prior literature

 Trust
 Causality
 Transferability
 Informativeness
 Fair and Ethical Decision Making

58
Desiderata: Trust
 Is trust simply confidence that the model will perform well?
 If so, interpretability serves no purpose

 A person might feel at ease with a well understood model,


even if this understanding has no purpose

 Training and deployment objectives diverge


 Eg., model makes accurate predictions but not validated for racial
biases

 Trust  relinquish control


 For which examples is the model right?
59
Desiderata: Causality

 Researchers hope to infer properties (beyond


correlational associations) from
interpretations/explanations
 Regression reveals strong association between smoking
and lung cancer

 However, task of inferring causal relationships from


observational data is a field in itself
 Don Rubin
 Judea Pearl

60
Desiderata: Transferability

 Humans exhibit richer capacity to generalize, transferring


learned skills to unfamiliar situations
 Model’s generalization error: gap between performance on
training and test data
 We already use ML in non-stationary environments

 Environment might even be adversarial


 Changing pixels in an image tactically could throw off models
but not humans

 Predictive models can often be gamed


 In such cases, predictive power loses meaning
61
Desiderata: Informativeness

 Predictions  Decisions
 Convey additional information to human decision makers

 Example: Which conference should I target?


 A one word answer is not very meaningful

 Interpretation might be meaningful even if it does


not shed light on model’s inner workings
 Similar cases for a doctor in support of a decision

62
Desiderata: Fair & Ethical Decision Making

 ML is being deployed in critical settings


 Eg., Bail and recidivism predictions

 How can we be sure algorithms do not discriminate


on the basis of race?
 AUC is not good enough

 Side note: European Union – Right to explanation

63
Properties of Interpretable Models

 Transparency
 How exactly does the model work?
 Details about its inner workings, parameters etc.

 Post-hoc explanations:
 What else can the model tell me?
 Eg., visualizations of learned model, explaining by
example

64
Transparency: Simulatability
 Can a person contemplate the entire model at once?
 Need a very simple model

 A human should be able to take input data and model


parameters and calculate prediction

 Simulatability: size of the model + computation


required to perform inference
 Decision trees: size of the model may grow faster than
time to perform inference

65
Transparency: Decomposability

 Understanding each input, parameter, calculation


 Eg., decision trees, linear regression

 Inputs must be interpretable


 Models with highly engineered or anonymous features are
not decomposable

66
Algorithmic Transparency

 Learning algorithm itself is transparent


 Eg., linear models (error surface, unique solution)

 Modern deep learning methods lack this kind of


transparency
 We don’t understand how the optimization methods work
 No guarantees of working on new problems

 Note: Humans do not exhibit any of these forms of


transparency
67
Post-hoc: Text Explanations

 Humans often justify decisions verbally (post-hoc)

 Krening et al.:


 One model is a reinforcement learner
 Another model maps the learner’s states onto verbal
explanations
 Explanations are trained to maximize likelihood of ground
truth explanations from human players
 So, explanations do not faithfully describe agent decisions,
but rather human intuition

68
Post-hoc: Visualization

 Visualize high-dimensional data with t-SNE


 2D visualizations in which nearby data points appear close

 Perturb input data to enhance activations of certain


nodes in neural nets (image classification)
 Helps understand which nodes correspond to which
aspects of the image
 Eg., certain nodes might correspond to dog faces

69
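
A minimal sketch of the first idea (t-SNE via sklearn on a stand-in dataset; the digits here play the role of learned high-dimensional representations):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    # Stand-in for learned high-dimensional representations: 64-dimensional
    # digit images instead of neural-network activations.
    X, y = load_digits(return_X_y=True)

    # Embed into 2D so that points that are close in the high-dimensional
    # space tend to appear close in the visualization.
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(emb.shape)  # (1797, 2): one 2D point per example

    # emb[:, 0] vs. emb[:, 1] can then be scatter-plotted, colored by label y.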
Post-hoc: Example Explanations

 Reasoning with examples

 Eg., Patient A has a tumor because he is similar to


these k other data points with tumors

 k neighbors can be computed by using some distance


metric on learned representations
 Eg., word2vec

70
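
A minimal sketch of explaining by example, assuming hypothetical learned representations and Euclidean distance in that space:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Hypothetical learned representations (e.g., embeddings of patients):
    # one row per training example, plus a new case to explain.
    rng = np.random.default_rng(0)
    train_embeddings = rng.normal(size=(1000, 32))
    new_case = rng.normal(size=(1, 32))

    # "Explain by example": retrieve the k most similar training cases
    # under Euclidean distance in the learned representation space.
    k = 5
    nn = NearestNeighbors(n_neighbors=k).fit(train_embeddings)
    distances, indices = nn.kneighbors(new_case)
    print(indices[0])    # indices of the k nearest training examples
    print(distances[0])  # their distances to the new case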
Post-hoc: Local Explanations

 Hard to explain a complex model in its entirety


 How about explaining smaller regions?

LIME (Ribeiro et al.)

 Explains decisions of any model in a local region around a


particular point

 Learns sparse linear model


71
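
Not the LIME library itself, but a minimal sketch of the idea: perturb around a point, query the black box, and fit a proximity-weighted sparse linear model (all names and settings here are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    # Hypothetical black-box model to be explained locally.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    black_box = RandomForestClassifier(random_state=0).fit(X, y)

    def local_surrogate(x, predict_proba, n_samples=2000, scale=0.3, width=1.0):
        """Fit a sparse linear model to the black box's predictions
        on perturbations around the point x."""
        rng = np.random.default_rng(0)
        Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))  # perturb x
        p = predict_proba(Z)[:, 1]                                     # black-box outputs
        w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)         # proximity weights
        surrogate = Lasso(alpha=0.01).fit(Z, p, sample_weight=w)       # sparse linear fit
        return surrogate.coef_                                         # local feature weights

    coefs = local_surrogate(X[0], black_box.predict_proba)
    print(np.round(coefs, 3))  # large-magnitude entries = locally important features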
Claims about interpretability must
be qualified

 If a model satisfies a form of transparency, highlight


that clearly

 For post-hoc interpretability, fix a clear objective and


demonstrate evidence

72
Transparency may be at odds with
broader objectives of AI
 Choosing interpretable models over accurate ones to
convince decision makers

 Short term goal of building trust with doctors might


clash with long term goal of improving health care

73
Post-hoc interpretations can mislead

 Do not blindly embrace post-hoc explanations!

 Post-hoc explanations can seem plausible but be


misleading
 They do not claim to open up the black-box;
 They only provide plausible explanations for its behavior
 Eg., text explanations

74
Are linear models always more
transparent than
deep neural networks?
Read 4.1 [Lipton] and write a paragraph on canvas.

Must cover different perspectives on transparency:


simulatability, decomposability, and algorithmic transparency

Due September 10th (Tuesday) 11.59pm ET


Summary

 Goal: Refine the discourse on interpretability

 Outline desiderata of interpretability research


 Motivations for interpretability are often diverse and
discordant

 Identifying model properties and techniques thought


to confer interpretability

76
Takeaways

 Interpretability is often desired when there is


 Incompleteness
 Mismatch between training and deployment environments

 There is no single definition of interpretability that


caters to all needs

 Build reliable taxonomies

 Build unified terminology

77
Questions??
Let’s start the critique!
Things to do!

 To apply for course enrollment, please fill out
https://forms.gle/2cmGx3469zKyJ6DH9 by September 7th 11.59pm ET

 Readings for next week (empirical studies)


 An Evaluation of the Human-Interpretability of Explanation
 Manipulating and Measuring Model Interpretability
 What do you like/dislike about each paper? Why?
Any questions? – post on canvas before next lecture
 Please be prepared for class discussions!

 Start thinking about project proposals


 Come talk to us during office hours
 Note: Checkpoint 1 due on 09/23

80
Course Participation Credit - Today

Are linear models always more


transparent than
deep neural networks?
[Due September 10th 11.59pm ET]
 Read 4.1 [Lipton] and write a paragraph on canvas.

 Must cover different perspectives on transparency:


simulatability, decomposability, and algorithmic
transparency

81
Upcoming Deadlines

 Checkpoint 1: Project proposals due 09/23

 September 16th -- HW1 released


 Refresher for your ML concepts – SVMs, Neural
Networks, Regression, Unsupervised Learning, EM
Algorithm
 2 weeks to finish

 Please check course webpage regularly!

82
Relevant Conferences to Explore

 ICML
 NeurIPS
 ICLR
 UAI
 AISTATS
 KDD
 AAAI
 FAT*
 AIES

83
Questions??
