Module 3, Lecture 7, Part 1
Concept-based explanations
CSEP 590B: Explainable AI
Hugh Chen, Ian Covert & Su-In Lee
University of Washington
§ Consider concepts as an intermediate representation
§ Examples: color, texture, object parts, shape
§ Properties:
  § Compressed (fewer dimensions)
  § Sacrifices minimal information
  § Intuitive meaning
  § Simpler relationship with output
[Figure: inputs → concepts → output (e.g., predicting "Benign")]
Can we go beyond localization?
§ Possible approaches:
  § Adjust the model to guarantee that specific concepts are used (see the sketch after this list)
  § Use a standard model, then discover how concepts are represented within the model
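A minimal sketch of the first approach (a concept bottleneck model), assuming a PyTorch setup; the class name, layer sizes, and the concept labels `c` are hypothetical. The input is mapped to predicted concepts, and the final prediction is made from those concepts alone.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Sketch of inputs -> concepts -> output (hypothetical dimensions)."""
    def __init__(self, input_dim=2048, n_concepts=10, n_classes=2):
        super().__init__()
        # g: inputs -> concept scores (the bottleneck)
        self.concept_net = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, n_concepts),
        )
        # f: concepts -> output; sees only the concepts
        self.task_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_net(x))
        return self.task_net(concepts), concepts

# Joint training objective: supervise both the concepts and the label
# (this is why concept annotations are needed in the training data).
model = ConceptBottleneckModel()
x = torch.randn(8, 2048)                   # hypothetical input features
y = torch.randint(0, 2, (8,))              # task labels
c = torch.randint(0, 2, (8, 10)).float()   # concept annotations
logits, concept_preds = model(x)
loss = nn.functional.cross_entropy(logits, y) \
       + nn.functional.binary_cross_entropy(concept_preds, c)
```

Because the output depends on the input only through the concept layer, the predicted concepts directly explain the prediction; the cons listed below (modified architecture, concept annotations) follow from this design.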
Inputs → concepts
§ Gradient-based explanations:
  § Is the output sensitive to a concept being slightly more expressed?
§ Removal-based explanations:
  § Is the output sensitive to removing information from one or more concepts?
  § E.g., leave-one-out or Shapley values (see the sketch after this list)
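To illustrate the removal-based option, here is a minimal leave-one-out sketch over the concept layer of a bottleneck model like the one above. Zeroing a concept as the "removal" is an assumption (a mean or learned baseline is also common), and `task_net` is the hypothetical concepts-to-output head.

```python
import torch

@torch.no_grad()
def concept_leave_one_out(task_net, concepts, target_class):
    """Score each concept by how much the target logit drops when it is removed.

    task_net: maps a (1, n_concepts) vector to class logits.
    concepts: a single example's concept vector, shape (1, n_concepts).
    """
    base = task_net(concepts)[0, target_class]
    scores = []
    for j in range(concepts.shape[1]):
        masked = concepts.clone()
        masked[0, j] = 0.0  # "remove" concept j by zeroing it out
        scores.append((base - task_net(masked)[0, target_class]).item())
    return scores  # larger score = removing that concept hurts the prediction more
```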
§ Cons of concept bottleneck models:
  § Must use a modified architecture
  § Requires a comprehensive set of concepts for high accuracy
  § Requires concept annotations in the training data
Kim et al., "Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)" (2018)
[Figure: TCAV — a deep model split into an embedding function and an output function; embeddings of concept examples (stripes) vs. random examples define a concept activation vector, used to measure the conceptual sensitivity of a sample's output]
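A minimal sketch of the TCAV computation, assuming layer activations have already been extracted for the concept examples, the random examples, and the class examples of interest; the function names and the choice of logistic regression are assumptions, but the structure follows the cited paper: the CAV is the weight vector of a linear classifier separating concept from random activations, and the TCAV score is the fraction of class examples whose directional derivative along the CAV is positive.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier in activation space; its (normalized) weights are the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(grads_wrt_layer, cav):
    """Fraction of examples whose class logit increases in the concept direction.

    grads_wrt_layer: gradients of the class logit w.r.t. the layer activations,
    one row per example (obtaining these depends on the framework and is assumed).
    """
    directional_derivs = grads_wrt_layer @ cav
    return float((directional_derivs > 0).mean())
```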
§ Cons of TCAV:
  § A single direction (CAV) may not be able to represent complex concepts
  § Sensitivity to small changes may not be meaningful
  § Results depend on the layer
Today
§ Section 1
§ Concept bottleneck models
§ Concept activation vectors
§ StylEx
§ Section 2
§ Neuron interpretation
Karras et al. “Analyzing and improving the image quality of StyleGAN” (2020)
[Figure: StyleGAN training — generator trained with an adversarial loss against a discriminator D; independent vs. interacting latent factors]
§ Cons of StylEx:
  § GANs are difficult to train
  § Requires manual inspection to determine whether the latent space maps to disentangled factors (not guaranteed; works best for faces)
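To make the StylEx idea concrete, here is a schematic sketch (not the actual StylEx training procedure): given an already-trained generator and classifier, shift one style coordinate and measure how much the classifier's output moves; coordinates with a large effect are candidates for classifier-relevant concepts, and visualizing them is the manual inspection step mentioned in the cons above. The names `generator`, `classifier`, and `coordinate_effect` are hypothetical.

```python
import torch

@torch.no_grad()
def coordinate_effect(generator, classifier, w, dim, delta=2.0, target_class=0):
    """Schematic: how shifting one style coordinate changes the classifier output.

    generator:  maps a style vector w (batch, style_dim) to an image batch.
    classifier: maps an image batch to class logits.
    Both are assumed to already be trained.
    """
    w_shifted = w.clone()
    w_shifted[:, dim] += delta
    p_base = classifier(generator(w)).softmax(-1)[:, target_class]
    p_shift = classifier(generator(w_shifted)).softmax(-1)[:, target_class]
    return (p_shift - p_base).mean().item()  # large |effect| => concept-like coordinate
```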