
Concept-based explanations
CSEP 590B: Explainable AI
Hugh Chen, Ian Covert & Su-In Lee
University of Washington

©2022 Su-In Lee 1


Course announcements
§ HW1 grades are released
§ We need one more discussion leader for week 8

©2022 Su-In Lee 2


Recall: decomposability
§ Do model components have an intuitive role
(inputs, parameters, calculations)?
§ Examples: splits in decision tree, weights in linear
model, input features

§ Concept explanations consider the role of high-level concepts rather than original inputs
§ Potentially more intuitive, meaningful to humans

Lipton, "The Mythos of model interpretability: In machine learning, the concept of


interpretability is both important and slippery” (2018)

©2022 Su-In Lee 3


Setup
§ Focusing on high-dimensional data
§ Mainly images, possibly genomics or NLP

§ With high-dimensional data, humans may prefer to operate on high-level concepts
§ A processed version of input, possibly with fewer
dimensions (compressed)
§ More intuitive meaning, more direct relationship with
outcome than original features (e.g., pixels)

©2022 Su-In Lee 4


High-level features in DNNs
§ Conventional wisdom about how DNNs process
images:
§ Input layer is pixels
§ First layer detects edges
§ Next layers find parts
§ Highest layers detect objects
§ Last layer makes classification

Zeiler & Fergus, “Visualizing and understanding convolutional networks” (2013)

©2022 Su-In Lee 5


Analogy to human reasoning
§ Seemingly true for DNNs, and interesting to
compare with humans
§ Humans seem to reason in a similar, hierarchical
manner
§ Typically prefer explanations based on high-level,
intuitive concepts

§ Can we incorporate this into an explanation approach?

©2022 Su-In Lee 6


Concept representation

§ Consider concepts as an intermediate representation
§ Examples: color, texture, object parts, shape

§ Properties:
§ Compressed (fewer dimensions)
§ Sacrifices minimal information
§ Intuitive meaning
§ Simpler relationship with output

[Diagram: Inputs → Concepts → Output]

©2022 Su-In Lee 7


Concept explanations

Previous methods operate at the input layer; concept explanations operate at the concept layer.

[Diagram: Inputs → Concepts → Output]

©2022 Su-In Lee 8


Image example
§ Explaining at pixel-level localizes important
information
§ But is importance due to color, texture, shape, or
something else?

©2022 Su-In Lee 9


Image example (cont.)
§ Alternatively, explanations can be based on
high-level concepts
§ Potentially more informative, intuitive for
humans

©2022 Su-In Lee 10


Medical image example
[Figure: input images (both labeled "Benign") and their saliency maps]

Can we go beyond localization?

Provided by Alex DeGrave, MD/PhD student in the AIMS lab

©2022 Su-In Lee 11


Challenges
§ Which concepts should we consider?
§ How do we obtain a concept-based
representation of the input data?

§ Possible approaches:
§ Adjust the model to guarantee that specific concepts
are used
§ Use a standard model, then discover how concepts
are represented within the model

©2022 Su-In Lee 12


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 13


Main idea
§ Force a deep learning model to represent
specific concepts before making a prediction
§ Then, use the intermediate concept representation
to understand the model’s dependencies

Koh et al., “Concept bottleneck models” (2020)

©2022 Su-In Lee 14


Concept bottleneck models
[Figure: concept bottleneck architecture mapping inputs → concepts → predictions]

Koh et al., “Concept bottleneck models” (2020)

©2022 Su-In Lee 15


Learning concept bottleneck
models
§ Training data $\{(x_i, y_i, c_i)\}_{i=1}^{n}$, where $x_i$ is the
input, $y_i$ is the label, and $c_i$ is the concept vector
§ Create an architecture with a bottleneck layer
§ Map from inputs to concepts with $\hat{c} = g(x)$
§ Then map to labels with $f(g(x))$
§ Train the model to accurately predict both
concepts and labels
§ Can train either jointly or sequentially
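As a minimal sketch of the joint setup above (not the authors' implementation), the following Python/PyTorch code trains a hypothetical concept predictor g and label predictor f together; the layer sizes, loss weight, and random placeholder data are assumptions for illustration.

```python
# Minimal sketch of joint concept-bottleneck training (illustrative only;
# layer sizes, loss weight, and data are hypothetical placeholders).
import torch
import torch.nn as nn

n_features, n_concepts, n_classes, lam = 64, 10, 5, 0.5

g = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                  nn.Linear(128, n_concepts))        # inputs -> concept logits
f = nn.Linear(n_concepts, n_classes)                 # concepts -> label logits
opt = torch.optim.Adam(list(g.parameters()) + list(f.parameters()), lr=1e-3)

def joint_step(x, y, c):
    """One joint-training step: predict concepts and labels, sum both losses."""
    c_logits = g(x)                                   # \hat{c} = g(x)
    y_logits = f(torch.sigmoid(c_logits))             # f(g(x))
    loss = nn.functional.cross_entropy(y_logits, y) \
         + lam * nn.functional.binary_cross_entropy_with_logits(c_logits, c)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Random placeholder batch standing in for annotated training data (x, y, c).
x = torch.randn(32, n_features)
y = torch.randint(0, n_classes, (32,))
c = torch.randint(0, 2, (32, n_concepts)).float()
joint_step(x, y, c)
```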

©2022 Su-In Lee 16


Test-time interventions
§ Analyze how the model responds to changes in
the predicted concepts
§ Intervene on samples by replacing incorrectly
predicted concepts with true concept values
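Continuing the hypothetical g and f from the earlier sketch, a test-time intervention could look as follows: selected predicted concepts are overwritten with their true values before the label predictor is re-applied.

```python
# Sketch of a test-time intervention (illustrative; reuses the hypothetical
# g and f defined in the previous sketch).
import torch

def intervene(x, c_true, concept_ids):
    """Overwrite selected concept dimensions with ground-truth values, re-predict."""
    with torch.no_grad():
        c_hat = torch.sigmoid(g(x))                  # predicted concepts in [0, 1]
        c_hat[:, concept_ids] = c_true[:, concept_ids]
        return f(c_hat).argmax(dim=1)                # prediction after intervention

# e.g., correct concepts 2 and 7 on a batch of test inputs:
# preds = intervene(x_test, c_test, concept_ids=[2, 7])
```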

©2022 Su-In Lee 17


Successful test-time
interventions

Intervening on one or more concepts can correct the model prediction

©2022 Su-In Lee 18


Generating explanations
§ Additionally, we can apply explanation approaches
from previous lectures

§ Gradient-based explanations:
§ Is the output sensitive to a concept being slightly more
expressed?

§ Removal-based explanations:
§ Is the output sensitive to removing information from one
or more concepts?
§ E.g., leave-one-out or Shapley values (see the sketch after this list)

§ Counterfactual explanations (next time)
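As a rough illustration of the removal-based idea referenced above, the sketch below scores each concept by how much the class probability drops when that concept is replaced with a baseline value; g, f, and the baseline of 0.5 are assumptions carried over from the earlier sketches.

```python
# Sketch of a leave-one-out explanation at the concept layer (illustrative;
# "removing" a concept is approximated by resetting it to a baseline value).
import torch

def concept_leave_one_out(x, k, baseline=0.5):
    """Drop in class-k probability when each concept is replaced by the baseline."""
    with torch.no_grad():
        c_hat = torch.sigmoid(g(x))                          # shape (1, n_concepts)
        p_full = torch.softmax(f(c_hat), dim=1)[0, k]
        scores = []
        for j in range(c_hat.shape[1]):
            c_masked = c_hat.clone()
            c_masked[0, j] = baseline                        # remove concept j
            p_masked = torch.softmax(f(c_masked), dim=1)[0, k]
            scores.append((p_full - p_masked).item())        # importance of concept j
    return scores
```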

©2022 Su-In Lee 19


Remarks
§ Pros:
§ CBM ensures the model operates on a known set of
concepts (and nothing else)
§ Enables intervention and explanation via concepts

§ Cons:
§ Must use modified architecture
§ Requires comprehensive set of concepts for high
accuracy
§ Requires concept annotations in training data

©2022 Su-In Lee 20


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 21


Main idea
§ Post-hoc approach to identify concepts in a
model’s latent space (internal representation)
§ Alternative to using a concept bottleneck layer
§ After training the model, use concept samples
to find concept activation vectors (CAV)
§ Investigate a prediction’s sensitivity to concepts

Kim et al. "Interpretability beyond feature attribution: Quantitative testing with concept
activation vectors (TCAV)" (2018)

©2022 Su-In Lee 22


Concept activation vector (CAV)

§ Choose a concept, select a hidden layer


§ Find the direction separating samples that
represent the concept
[Figure: activation values from a hidden layer of a deep model, for concept examples (stripes) vs. random examples; the CAV is found with a linear classifier]

©2022 Su-In Lee 23


CAV computation
§ Calculate embeddings for positive and negative
concept examples
§ Train a linear classifier to separate them
§ CAV is vector orthogonal to classification boundary

[Figure: concept examples (stripes) and random examples separated by a linear classifier; the CAV is orthogonal to the decision boundary]
©2022 Su-In Lee 24


Sanity checks
§ Calculate CAV for a given concept
§ Examine images strongly activated along CAV
direction
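One way to run this sanity check, sketched under the same assumptions (a matrix of layer activations and a CAV from the hypothetical compute_cav above), is to rank images by their projection onto the CAV and inspect the top-ranked ones.

```python
# Sketch of the sanity check (illustrative): sort images by how strongly their
# hidden-layer activations point along the CAV direction.
import numpy as np

def top_images_along_cav(activations, cav, top_k=9):
    """activations: (n_images, n_hidden_units); returns indices of the top-k images."""
    scores = activations @ cav            # projection of each image onto the CAV
    return np.argsort(scores)[::-1][:top_k]
```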

©2022 Su-In Lee 25


Conceptual sensitivity
§ Recall, input gradients consider sensitivity to
small changes in pixel intensity

§ Here, conceptual sensitivity is about small changes in a concept’s intensity
§ Calculate the impact of small perturbations in CAV
direction
§ Equivalent to a directional derivative

©2022 Su-In Lee 26


Conceptual sensitivity (cont.)
§ Let $x$ be an input and $k$ the class of interest
§ Let $f_l(x)$ be the intermediate representation at layer $l$,
and $h_{l,k}(f_l(x))$ the prediction for class $k$
§ Let $v_C^l$ be the CAV for concept $C$
§ Conceptual sensitivity $S_{C,k,l}(x) \in \mathbb{R}$ is given by the
directional derivative:

$$S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}\left(f_l(x) + \epsilon v_C^l\right) - h_{l,k}\left(f_l(x)\right)}{\epsilon} = \nabla h_{l,k}\left(f_l(x)\right) \cdot v_C^l$$

which can be obtained as the dot product of the gradient with the CAV
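A sketch of this computation with PyTorch autograd; f_l (the network up to layer l) and h_lk (the remaining head) are hypothetical callables, and the CAV is assumed to come from a routine like the compute_cav sketch above.

```python
# Sketch of conceptual sensitivity S_{C,k,l}(x) (illustrative): gradient of the
# class-k output w.r.t. the layer-l activation, dotted with the CAV.
import torch

def conceptual_sensitivity(x, k, f_l, h_lk, cav):
    """f_l: network up to layer l; h_lk: layer l -> class scores (hypothetical)."""
    a = f_l(x).detach().requires_grad_(True)      # f_l(x), activation at layer l
    out = h_lk(a)[0, k]                           # h_{l,k}(f_l(x)), class-k score
    grad, = torch.autograd.grad(out, a)           # gradient of h_{l,k} w.r.t. a
    return torch.dot(grad.flatten(), cav.flatten()).item()   # directional derivative
```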
©2022 Su-In Lee 27
Conceptual sensitivity (cont.)
§ Conceptual sensitivity: $S_{C,k,l}(x) = \nabla h_{l,k}(f_l(x)) \cdot v_C^l$

[Figure: annotated equation identifying the sample $x$, the output function $h_{l,k}$, the embedding function $f_l$, the directional derivative, and the CAV (e.g., stripes)]

©2022 Su-In Lee 28


Local explanations
§ Consider input 𝑥, class of interest 𝑘
§ How relevant is each concept to this prediction?
§ We can calculate conceptual sensitivity $S_{C,k,l}(x)$
for all concepts $C$

©2022 Su-In Lee 29


Global explanations
§ Consider a class of interest 𝑘, and a concept 𝐶
§ How relevant is the concept to this class?
§ Kim et al. propose the TCAV score to
summarize many local explanations:

$$\mathrm{TCAV}_{C,k,l} = \frac{\left|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\right|}{\left|X_k\right|}$$

where $X_k$ is the set of examples with class $k$; the score is the
portion of examples where concept $C$ is positively related to the prediction
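The TCAV score can then be estimated by aggregating per-example sensitivities, as in this sketch (reusing the hypothetical conceptual_sensitivity function above; X_k is assumed to be an iterable of class-k example tensors).

```python
# Sketch of the TCAV score (illustrative): the fraction of class-k examples whose
# conceptual sensitivity to concept C is positive.
def tcav_score(X_k, k, f_l, h_lk, cav):
    sens = [conceptual_sensitivity(x.unsqueeze(0), k, f_l, h_lk, cav) for x in X_k]
    return sum(s > 0 for s in sens) / len(sens)
```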
©2022 Su-In Lee 30
Example results

©2022 Su-In Lee 31


Example results

©2022 Su-In Lee 32


Remarks
§ Pros:
§ TCAV is post-hoc, no architecture modifications
§ Fewer concept annotations required (but we still
need examples to find CAVs)

§ Cons:
§ Single direction (CAV) may not be able to represent
complex concepts
§ Sensitivity to small changes may not be meaningful
§ Results depend on the layer
©2022 Su-In Lee 33
Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 34


Main idea
§ Train a model that maps samples to
disentangled latent factors (StyleGAN)
§ Then, incorporate a classifier into the GAN
§ Use humans to interpret each dimension of the
StyleSpace as a concept (attribute)
§ Generate attribute-wise counterfactuals, see
how they impact the classifier
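As a rough sketch of the attribute-wise counterfactual idea (not the StylEx training procedure itself), the code below shifts a single StyleSpace coordinate, regenerates the image, and measures the change in the classifier's output; generator, classifier, and the style code are hypothetical stand-ins.

```python
# Sketch of an attribute-wise counterfactual probe (illustrative): shift one
# StyleSpace coordinate, regenerate the image, and measure the effect on the
# classifier. `generator`, `classifier`, and `style` are hypothetical stand-ins.
import torch

def attribute_effect(style, dim, delta, generator, classifier, k):
    """Change in class-k probability when StyleSpace dimension `dim` is shifted by `delta`."""
    with torch.no_grad():
        p_orig = torch.softmax(classifier(generator(style)), dim=1)[0, k]
        style_cf = style.clone()
        style_cf[:, dim] += delta                    # edit a single attribute
        p_cf = torch.softmax(classifier(generator(style_cf)), dim=1)[0, k]
    return (p_cf - p_orig).item()
```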

©2022 Su-In Lee 35


StyleGAN2
§ A GAN architecture for generative image
modeling, state-of-the-art performance in
distribution quality metrics
§ Produces a disentangled latent space
§ Latent dimensions correspond to high-level
attributes (e.g., pose, freckles, hair)
§ Here, single dimensions rather than directions (like in
TCAV)

Karras et al. “Analyzing and improving the image quality of StyleGAN” (2020)

©2022 Su-In Lee 36


StyleGAN2 (cont.)
§ Basically, a GAN with improved architecture and
training

[Diagram: $z \sim N(0, I)$ → Generator → $\hat{x}$ → Discriminator → $P(\mathrm{real})$]

Goodfellow et al. “Generative adversarial networks” (2014)

©2022 Su-In Lee 37


Example results
Fake people produced by StyleGAN2 generator

©2022 Su-In Lee 38


Observation: StyleSpace is
disentangled
§ Wu et al. explored an intermediate layer in
StyleGAN2, called the “StyleSpace”
§ Proposed using concept examples to identify
dimensions that correspond to concepts (e.g.,
hair style, glasses)
§ Then, adjusted these attributes to generate new
images with desired properties

Wu et al., "StyleSpace analysis: Disentangled controls for StyleGAN image generation"


(2021)

©2022 Su-In Lee 39


Observation: StyleSpace is
disentangled

Wu et al., "StyleSpace analysis: Disentangled controls for StyleGAN image generation"


(2021)

©2022 Su-In Lee 40


Latent space can represent
concepts

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 41


Combining classifier with
StyleGAN2
§ StyleGAN can produce attributes that don’t
affect the classifier
§ StylEx proposed a StyleGAN training procedure
that incorporates a classifier
§ Learns a classifier-specific StyleSpace
§ Classification loss ensures that generated image has
same classification as corresponding original image

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 42


Combining classifier with
StyleGAN2
Learned components: E, G, and D

[Figure: StylEx training setup with learning objectives shown in red, including the adversarial loss on the discriminator D]

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 43


Example concepts in gender
classifier

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 44


Example concepts in age
classifier

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 45


Local explanations

[Figure: local explanations for individual images, with attributes selected independently vs. accounting for interactions]

Lang et al., “Explaining in style: Training a GAN to explain a classifier in StyleSpace” (2021)

©2022 Su-In Lee 46


Remarks
§ Pros:
§ StyleGAN is trained without concept labels; concept
directions are discovered automatically after training

§ Cons:
§ GANs are difficult to train
§ Requires manual inspection to determine if latent space
maps to disentangled factors (not guaranteed, works best
for faces)

§ Note: this can be considered a counterfactual explanation that changes one attribute at a time

©2022 Su-In Lee 47


Conclusion
§ Concepts are not inherently explanations
§ Concept explanations typically require two steps:
§ Learning a latent space of human understandable
concepts
§ Explaining model predictions via that latent space

| Approach           | Concept annotation   | Explanation            | Learning approach   |
|--------------------|----------------------|------------------------|---------------------|
| Concept bottleneck | All training samples | Intervention           | Supervised          |
| TCAV               | Some samples         | Directional derivative | Post-hoc supervised |
| StylEx             | Some samples         | Counterfactuals        | Unsupervised        |

©2022 Su-In Lee 48


Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ 10 min break
§ Section 2
§ Neuron interpretation

©2022 Su-In Lee 49
