
Concept-based explanations
CSEP 590B: Explainable AI
Hugh Chen, Ian Covert & Su-In Lee
University of Washington

©2022 Su-In Lee 1


Course announcements
§ HW1 grades are released
§ We need one more discussion leader for week 8

©2022 Su-In Lee 2


Recall: decomposability
§ Do model components have an intuitive role
(inputs, parameters, calculations)?
§ Examples: splits in decision tree, weights in linear
model, input features

§ Concept explanations consider the role of high-level concepts rather than original inputs
§ Potentially more intuitive, meaningful to humans

Lipton, "The Mythos of model interpretability: In machine learning, the concept of


interpretability is both important and slippery” (2018)

©2022 Su-In Lee 3


Setup
§ Focusing on high-dimensional data
§ Mainly images, possibly genomics or NLP

§ With high-dimensional data, humans may prefer to operate on high-level concepts
§ A processed version of input, possibly with fewer
dimensions (compressed)
§ More intuitive meaning, more direct relationship with
outcome than original features (e.g., pixels)

©2022 Su-In Lee 4


High-level features in DNNs
§ Conventional wisdom about how DNNs process
images:
§ Input layer is pixels
§ First layer detects edges
§ Next layers find parts
§ Highest layers detect objects
§ Last layer makes classification

Zeiler & Fergus, “Visualizing and understanding convolutional networks” (2013)

©2022 Su-In Lee 5


Analogy to human reasoning
§ Seemingly true for DNNs, and interesting to
compare with humans
§ Humans seem to reason in a similar, hierarchical
manner
§ Typically prefer explanations based on high-level,
intuitive concepts

§ Can we incorporate this into an explanation approach?

©2022 Su-In Lee 6


Concept representation

§ Consider concepts as an intermediate representation
§ Examples: color, texture, object parts, shape

§ Properties:
§ Compressed (fewer dimensions)
§ Sacrifices minimal information
§ Intuitive meaning
§ Simpler relationship with output

[Diagram: Inputs → Concepts → Output]

©2022 Su-In Lee 7


Concept explanations

Previous methods operate at the input layer; concept explanations operate at the concept layer.

[Diagram: Inputs → Concepts → Output]

©2022 Su-In Lee 8


Image example
§ Explaining at pixel-level localizes important
information
§ But is importance due to color, texture, shape, or
something else?

©2022 Su-In Lee 9


Image example (cont.)
§ Alternatively, explanations can be based on
high-level concepts
§ Potentially more informative, intuitive for
humans

©2022 Su-In Lee 10


Medical image example
[Figure: input images (both labeled "Benign") and their saliency maps]

Can we go beyond localization?

Provided by Alex DeGrave, MD/PhD student in the AIMS lab

©2022 Su-In Lee 11


Challenges
§ Which concepts should we consider?
§ How do we obtain a concept-based
representation of the input data?

§ Possible approaches:
§ Adjust the model to guarantee that specific concepts
are used
§ Use a standard model, then discover how concepts
are represented within the model

©2022 Su-In Lee 12


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 13


Main idea
§ Force a deep learning model to represent
specific concepts before making a prediction
§ Then, use the intermediate concept representation
to understand the model’s dependencies

Koh et al., “Concept bottleneck models” (2020)

©2022 Su-In Lee 14


Concept bottleneck models
[Figure: concept bottleneck architecture mapping inputs → concepts → predictions]

Koh et al., “Concept bottleneck models” (2020)

©2022 Su-In Lee 15


Learning concept bottleneck
models
§ Training data $\{(x_i, y_i, c_i)\}_{i=1}^{n}$, where $x_i$ is the
input, $y_i$ is the label, and $c_i$ is the concept vector
§ Create an architecture with a bottleneck layer
§ Map from inputs to concepts with $\hat{c} = g(x)$
§ Then map to labels with $f(g(x))$
§ Train the model to accurately predict both
concepts and labels
§ Can train either jointly or sequentially
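As a minimal sketch of the joint setup above (not the authors' implementation), the following Python/PyTorch code trains a hypothetical concept predictor g and label predictor f together; the layer sizes, loss weight, and random placeholder data are assumptions for illustration.

```python
# Minimal sketch of joint concept-bottleneck training (illustrative only;
# layer sizes, loss weight, and data are hypothetical placeholders).
import torch
import torch.nn as nn

n_features, n_concepts, n_classes, lam = 64, 10, 5, 0.5

g = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                  nn.Linear(128, n_concepts))        # inputs -> concept logits
f = nn.Linear(n_concepts, n_classes)                 # concepts -> label logits
opt = torch.optim.Adam(list(g.parameters()) + list(f.parameters()), lr=1e-3)

def joint_step(x, y, c):
    """One joint-training step: predict concepts and labels, sum both losses."""
    c_logits = g(x)                                   # \hat{c} = g(x)
    y_logits = f(torch.sigmoid(c_logits))             # f(g(x))
    loss = nn.functional.cross_entropy(y_logits, y) \
         + lam * nn.functional.binary_cross_entropy_with_logits(c_logits, c)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Random placeholder batch standing in for annotated training data (x, y, c).
x = torch.randn(32, n_features)
y = torch.randint(0, n_classes, (32,))
c = torch.randint(0, 2, (32, n_concepts)).float()
joint_step(x, y, c)
```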

©2022 Su-In Lee 16


Test-time interventions
§ Analyze how the model responds to changes in
the predicted concepts
§ Intervene on samples by replacing incorrectly
predicted concepts with true concept values
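Continuing the hypothetical g and f from the earlier sketch, a test-time intervention could look as follows: selected predicted concepts are overwritten with their true values before the label predictor is re-applied.

```python
# Sketch of a test-time intervention (illustrative; reuses the hypothetical
# g and f defined in the previous sketch).
import torch

def intervene(x, c_true, concept_ids):
    """Overwrite selected concept dimensions with ground-truth values, re-predict."""
    with torch.no_grad():
        c_hat = torch.sigmoid(g(x))                  # predicted concepts in [0, 1]
        c_hat[:, concept_ids] = c_true[:, concept_ids]
        return f(c_hat).argmax(dim=1)                # prediction after intervention

# e.g., correct concepts 2 and 7 on a batch of test inputs:
# preds = intervene(x_test, c_test, concept_ids=[2, 7])
```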

©2022 Su-In Lee 17


Successful test-time
interventions

Intervening on one or more concepts can correct the model prediction

©2022 Su-In Lee 18


Generating explanations
§ Additionally, we can apply explanation approaches
from previous lectures

§ Gradient-based explanations:
§ Is the output sensitive to a concept being slightly more
expressed?

§ Removal-based explanations:
§ Is the output sensitive to removing information from one
or more concepts?
§ E.g., leave-one-out or Shapley values (see the sketch after this list)

§ Counterfactual explanations (next time)
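As a rough illustration of the removal-based idea referenced above, the sketch below scores each concept by how much the class probability drops when that concept is replaced with a baseline value; g, f, and the baseline of 0.5 are assumptions carried over from the earlier sketches.

```python
# Sketch of a leave-one-out explanation at the concept layer (illustrative;
# "removing" a concept is approximated by resetting it to a baseline value).
import torch

def concept_leave_one_out(x, k, baseline=0.5):
    """Drop in class-k probability when each concept is replaced by the baseline."""
    with torch.no_grad():
        c_hat = torch.sigmoid(g(x))                          # shape (1, n_concepts)
        p_full = torch.softmax(f(c_hat), dim=1)[0, k]
        scores = []
        for j in range(c_hat.shape[1]):
            c_masked = c_hat.clone()
            c_masked[0, j] = baseline                        # remove concept j
            p_masked = torch.softmax(f(c_masked), dim=1)[0, k]
            scores.append((p_full - p_masked).item())        # importance of concept j
    return scores
```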

©2022 Su-In Lee 19


Remarks
§ Pros:
§ CBM ensures the model operates on a known set of
concepts (and nothing else)
§ Enables intervention and explanation via concepts

§ Cons:
§ Must use modified architecture
§ Requires comprehensive set of concepts for high
accuracy
§ Requires concept annotations in training data

©2022 Su-In Lee 20


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 21


Main idea
§ Post-hoc approach to identify concepts in a
model’s latent space (internal representation)
§ Alternative to using a concept bottleneck layer
§ After training the model, use concept samples
to find concept activation vectors (CAV)
§ Investigate a prediction’s sensitivity to concepts

Kim et al. "Interpretability beyond feature attribution: Quantitative testing with concept
activation vectors (TCAV)" (2018)

©2022 Su-In Lee 22


Concept activation vector (CAV)

§ Choose a concept, select a hidden layer


§ Find the direction separating samples that
represent the concept
[Figure: activation values from a hidden layer of a deep model, for concept examples (stripes) vs. random examples; the CAV is found with a linear classifier]

©2022 Su-In Lee 23


CAV computation
§ Calculate embeddings for positive and negative
concept examples
§ Train a linear classifier to separate them
§ CAV is vector orthogonal to classification boundary

[Figure: concept examples (stripes) and random examples separated by a linear classifier; the CAV is orthogonal to the decision boundary]
©2022 Su-In Lee 24


Sanity checks
§ Calculate CAV for a given concept
§ Examine images strongly activated along CAV
direction
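One way to run this sanity check, sketched under the same assumptions (a matrix of layer activations and a CAV from the hypothetical compute_cav above), is to rank images by their projection onto the CAV and inspect the top-ranked ones.

```python
# Sketch of the sanity check (illustrative): sort images by how strongly their
# hidden-layer activations point along the CAV direction.
import numpy as np

def top_images_along_cav(activations, cav, top_k=9):
    """activations: (n_images, n_hidden_units); returns indices of the top-k images."""
    scores = activations @ cav            # projection of each image onto the CAV
    return np.argsort(scores)[::-1][:top_k]
```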

©2022 Su-In Lee 25


Conceptual sensitivity
§ Recall, input gradients consider sensitivity to
small changes in pixel intensity

§ Here, conceptual sensitivity is about small changes in a concept’s intensity
§ Calculate the impact of small perturbations in CAV
direction
§ Equivalent to a directional derivative

©2022 Su-In Lee 26


Conceptual sensitivity (cont.)
§ Let $x$ be an input and $k$ the class of interest
§ Let $f_l(x)$ be the intermediate representation at layer $l$,
and $h_{l,k}(f_l(x))$ the prediction for class $k$
§ Let $v_C^l$ be the CAV for concept $C$
§ Conceptual sensitivity $S_{C,k,l}(x) \in \mathbb{R}$ is given by the
directional derivative:

$$S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}\left(f_l(x) + \epsilon v_C^l\right) - h_{l,k}\left(f_l(x)\right)}{\epsilon} = \nabla h_{l,k}\left(f_l(x)\right) \cdot v_C^l$$

which can be obtained as the dot product of the gradient with the CAV
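A sketch of this computation with PyTorch autograd; f_l (the network up to layer l) and h_lk (the remaining head) are hypothetical callables, and the CAV is assumed to come from a routine like the compute_cav sketch above.

```python
# Sketch of conceptual sensitivity S_{C,k,l}(x) (illustrative): gradient of the
# class-k output w.r.t. the layer-l activation, dotted with the CAV.
import torch

def conceptual_sensitivity(x, k, f_l, h_lk, cav):
    """f_l: network up to layer l; h_lk: layer l -> class scores (hypothetical)."""
    a = f_l(x).detach().requires_grad_(True)      # f_l(x), activation at layer l
    out = h_lk(a)[0, k]                           # h_{l,k}(f_l(x)), class-k score
    grad, = torch.autograd.grad(out, a)           # gradient of h_{l,k} w.r.t. a
    return torch.dot(grad.flatten(), cav.flatten()).item()   # directional derivative
```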
©2022 Su-In Lee 27
Conceptual sensitivity (cont.)
§ Conceptual sensitivity: $S_{C,k,l}(x) = \nabla h_{l,k}(f_l(x)) \cdot v_C^l$

[Figure: annotated equation identifying the sample $x$, the output function $h_{l,k}$, the embedding function $f_l$, the directional derivative, and the CAV (e.g., stripes)]

©2022 Su-In Lee 28


Local explanations
§ Consider input 𝑥, class of interest 𝑘
§ How relevant is each concept to this prediction?
§ We can calculate conceptual sensitivity $S_{C,k,l}(x)$
for all concepts $C$

©2022 Su-In Lee 29


Global explanations
§ Consider a class of interest 𝑘, and a concept 𝐶
§ How relevant is the concept to this class?
§ Kim et al. propose the TCAV score to
summarize many local explanations:

$$\mathrm{TCAV}_{C,k,l} = \frac{\left|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\right|}{\left|X_k\right|}$$

where $X_k$ is the set of examples with class $k$; the score is the
portion of examples where concept $C$ is positively related to the prediction
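The TCAV score can then be estimated by aggregating per-example sensitivities, as in this sketch (reusing the hypothetical conceptual_sensitivity function above; X_k is assumed to be an iterable of class-k example tensors).

```python
# Sketch of the TCAV score (illustrative): the fraction of class-k examples whose
# conceptual sensitivity to concept C is positive.
def tcav_score(X_k, k, f_l, h_lk, cav):
    sens = [conceptual_sensitivity(x.unsqueeze(0), k, f_l, h_lk, cav) for x in X_k]
    return sum(s > 0 for s in sens) / len(sens)
```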
©2022 Su-In Lee 30
Example results

©2022 Su-In Lee 31


Example results

©2022 Su-In Lee 32


Remarks
§ Pros:
§ TCAV is post-hoc, no architecture modifications
§ Fewer concept annotations required (but we still
need examples to find CAVs)

§ Cons:
§ Single direction (CAV) may not be able to represent
complex concepts
§ Sensitivity to small changes may not be meaningful
§ Results depend on the layer
©2022 Su-In Lee 33
Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 34


Main idea
§ Train a model that maps samples to
disentangled latent factors (StyleGAN)
§ Then, incorporate a classifier into the GAN
§ Use humans to interpret each dimension of the
StyleSpace as a concept (attribute)
§ Generate attribute-wise counterfactuals, see
how they impact the classifier
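As a rough sketch of the attribute-wise counterfactual idea (not the StylEx training procedure itself), the code below shifts a single StyleSpace coordinate, regenerates the image, and measures the change in the classifier's output; generator, classifier, and the style code are hypothetical stand-ins.

```python
# Sketch of an attribute-wise counterfactual probe (illustrative): shift one
# StyleSpace coordinate, regenerate the image, and measure the effect on the
# classifier. `generator`, `classifier`, and `style` are hypothetical stand-ins.
import torch

def attribute_effect(style, dim, delta, generator, classifier, k):
    """Change in class-k probability when StyleSpace dimension `dim` is shifted by `delta`."""
    with torch.no_grad():
        p_orig = torch.softmax(classifier(generator(style)), dim=1)[0, k]
        style_cf = style.clone()
        style_cf[:, dim] += delta                    # edit a single attribute
        p_cf = torch.softmax(classifier(generator(style_cf)), dim=1)[0, k]
    return (p_cf - p_orig).item()
```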

©2022 Su-In Lee 35


StyleGAN2
§ A GAN architecture for generative image
modeling, state-of-the-art performance in
distribution quality metrics
§ Produces a disentangled latent space
§ Latent dimensions correspond to high-level
attributes (e.g., pose, freckles, hair)
§ Here, single dimensions rather than directions (like in
TCAV)

Karras et al. “Analyzing and improving the image quality of StyleGAN” (2020)

©2022 Su-In Lee 36


StyleGAN2 (cont.)
§ Basically, a GAN with improved architecture and
training

[Diagram: $z \sim N(0, I)$ → Generator → $\hat{x}$ → Discriminator → $P(\mathrm{real})$]

Goodfellow et al. “Generative adversarial networks” (2014)

©2022 Su-In Lee 37


Example results
Fake people produced by StyleGAN2 generator

©2022 Su-In Lee 38


Observation: StyleSpace is
disentangled
§ Wu et al. explored an intermediate layer in
StyleGAN2, called the “StyleSpace”
§ Proposed using concept examples to identify
dimensions that correspond to concepts (e.g.,
hair style, glasses)
§ Then, adjusted these attributes to generate new
images with desired properties

Wu et al., "StyleSpace analysis: Disentangled controls for StyleGAN image generation"


(2021)

©2022 Su-In Lee 39


Observation: StyleSpace is
disentangled

Wu et al., "StyleSpace analysis: Disentangled controls for StyleGAN image generation"


(2021)

©2022 Su-In Lee 40


Latent space can represent
concepts

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 41


Combining classifier with
StyleGAN2
§ StyleGAN can produce attributes that don’t
affect the classifier
§ StylEx proposed a StyleGAN training procedure
that incorporates a classifier
§ Learns a classifier-specific StyleSpace
§ Classification loss ensures that generated image has
same classification as corresponding original image

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 42


Combining classifier with
StyleGAN2
Learned components: E, G, and D

[Figure: StylEx training setup with learning objectives shown in red, including the adversarial loss on the discriminator D]

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 43


Example concepts in gender
classifier

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 44


Example concepts in age
classifier

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 45


Local explanations

[Figure: local explanations for individual images, with attributes selected independently vs. accounting for interactions]

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 46


Remarks
§ Pros:
§ StyleGAN is trained without concept labels; concept
directions are discovered automatically after training

§ Cons:
§ GANs are difficult to train
§ Requires manual inspection to determine if latent space
maps to disentangled factors (not guaranteed, works best
for faces)

§ Note: this can be considered a counterfactual explanation that changes one attribute at a time

©2022 Su-In Lee 47


Conclusion
§ Concepts are not inherently explanations
§ Concept explanations typically require two steps:
§ Learning a latent space of human understandable
concepts
§ Explaining model predictions via that latent space

| Approach           | Concept annotation   | Explanation            | Learning approach   |
|--------------------|----------------------|------------------------|---------------------|
| Concept bottleneck | All training samples | Intervention           | Supervised          |
| TCAV               | Some samples         | Directional derivative | Post-hoc supervised |
| StylEx             | Some samples         | Counterfactuals        | Unsupervised        |

©2022 Su-In Lee 48


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ 10 min break
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 49
