Post Hoc Explanations: Feature Attributions (3 of 4)

SmoothGrad is a technique for removing noise from sensitivity maps, which are used to explain image classifier decisions. It works by taking the average of sensitivity maps generated from the original image with added Gaussian noise. This smoothing reduces the effect of rapid fluctuations in the gradient, creating less visually noisy sensitivity maps. Experiments show SmoothGrad maps have better visual coherence and discriminativity compared to other gradient-based techniques like vanilla gradients or integrated gradients. However, the results are qualitative and more quantitative evaluation is needed.


SmoothGrad: removing noise by adding noise

Authors: Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg

Presented by: Vignav Ramesh, Zelin (James) Li, Paul Liu


Motivation
▪ We want post-hoc explanations of image classifiers
▪ Solution: Sensitivity maps (a.k.a. saliency maps, pixel attribution maps)
o Visual interpretation of the gradient of the class activation function w.r.t. the input image
o Structured as a grayscale image with the same dimensions as the input image
■ Brightness of a pixel ∝ its importance to the classification decision
Background: Gradients as Sensitivity Maps
▪ N: network that classifies images into one class from a set C
▪ Given an input image x, N typically computes a class activation function S_c for each class c ∈ C
▪ Final classification: class(x) = argmax_{c ∈ C} S_c(x)
▪ Sensitivity map: M_c(x) = ∂S_c(x) / ∂x
▪ Intuition: M_c represents how much difference a tiny change in each pixel of x would make to the classification score for class c (a minimal code sketch follows below)
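As a concrete illustration of M_c (not part of the original slides), here is a minimal PyTorch sketch; the classifier `model`, its logit layout, and the preprocessing of `x` are assumptions:

```python
import torch

def vanilla_gradient(model, x, target_class):
    """Sensitivity map M_c(x) = dS_c(x)/dx for a single image.

    Assumes `model` maps a batch [B, C, H, W] to class logits and that
    `x` is one preprocessed image tensor of shape [C, H, W].
    """
    model.eval()
    x = x.clone().detach().unsqueeze(0).requires_grad_(True)
    score = model(x)[0, target_class]   # class activation S_c(x)
    score.backward()                     # populate dS_c/dx
    return x.grad.squeeze(0)             # map with the same shape as the image
```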
Related Work: Perturbation Methods
▪ Key idea: generate a perturbed dataset to fit an explainable model
o LIME
o KernelSHAP
Related Work: Backpropagation

▪ Key idea: backpropagate importance through the network
o Vanilla gradients
o Layerwise relevance propagation (Bach et al.)
o Integrated gradients (Sundararajan et al.)
o DeepLIFT (Shrikumar et al.)
o Deconvolution (Zeiler & Fergus, 2014)
o Guided Backpropagation (Springenberg et al., 2014)
(Figure: network diagram with importance flowing back from the output; yellow = inputs)
Limitations of Sensitivity Maps
▪ Visually noisy
o Often highlight pixels that, to a human eye, seem randomly selected
o a priori, we cannot know if this noise reflects an underlying truth about
how networks perform classification, or is due to more superficial factors
■ The SmoothGrad paper answers this question - we’ll get to this soon!
Theory Behind SmoothGrad: Noisy Gradients
▪ Key idea behind SmoothGrad: noisy maps are due to noisy gradients

▪ The derivative of S_c may fluctuate sharply at small scales
o The apparent noise one sees in a sensitivity map may be due to essentially meaningless local variations in partial derivatives
Noisy Gradients (cont’d)

▪ Given these rapid fluctuations, the gradient of S_c at any given point will be less meaningful than a local average of gradient values.
SmoothGrad: Intuition
▪ Recall that noisy maps are due to noisy gradients

▪ Simple solution:
o take an image of interest
o sample similar images by adding Gaussian noise to the image
o take the average of the resulting sensitivity maps for each sampled
image
■ This smoothes the gradient
SmoothGrad: Algorithm
1. Take n random samples in a neighborhood of an input x by adding Gaussian noise
2. Average the resulting sensitivity maps:

M̂_c(x) = (1/n) Σ_{i=1}^{n} M_c(x + g_i),  where g_i ~ Ɲ(0, σ²)

Here n is the number of samples, and Ɲ(0, σ²) represents Gaussian noise with standard deviation σ. (A minimal code sketch follows below.)
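A minimal sketch of the averaging step, reusing the hypothetical `vanilla_gradient` helper from the earlier sketch; the sample count and the convention of expressing σ as a fraction of the image's value range are illustrative assumptions, not values prescribed by the slides:

```python
import torch

def smoothgrad(model, x, target_class, n=50, noise_level=0.15):
    """Average n vanilla sensitivity maps of noisy copies of x.

    `noise_level` expresses sigma as a fraction of the image's value range;
    both defaults here are illustrative choices.
    """
    sigma = noise_level * (x.max() - x.min()).item()
    total = torch.zeros_like(x)
    for _ in range(n):
        noisy = x + sigma * torch.randn_like(x)                 # x + N(0, sigma^2)
        total += vanilla_gradient(model, noisy, target_class)   # helper from the earlier sketch
    return total / n
```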
Experimental Setup

▪ Performed SmoothGrad on visualizations of two neural networks:
o An Inception v3 model by Google, trained on the ILSVRC-2013 dataset
o A convolutional MNIST model based on the TensorFlow tutorial
Choosing Hyperparameters (σ: std. dev.)

σ: the standard deviation of the Gaussian noise
Choosing Hyperparameters (n: sample size)
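The slides explore both hyperparameters with image grids that are not reproduced here. As a hedged usage sketch, a sweep could look like the following; the grid values are illustrative, and `model`, `image`, and `label` are assumed to come from the earlier sketches:

```python
# Sweep the noise level (sigma as a fraction of the value range) and the
# sample size n, then compare the resulting maps side by side.
maps = {
    (noise_level, n): smoothgrad(model, image, label, n=n, noise_level=noise_level)
    for noise_level in (0.05, 0.10, 0.20, 0.30)
    for n in (5, 20, 50)
}
```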
Qualitative Results: Visualization Techniques
● Absolute value of gradients
○ Whether to take the absolute value depends on the characteristics of the dataset
● Capping outlying values
○ A few pixels can have much higher gradients than the average
○ Cap values at the 99th percentile
● Multiplying maps with the input image
○ Produces simpler and sharper maps (Shrikumar et al., 2017; Sundararajan et al., 2017)
○ Downside: pixels with a value of 0 will never show up on the sensitivity map
○ Upside: natural when viewing a feature's importance as its contribution to the image
(A sketch of these post-processing steps follows below.)
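A hedged sketch of the three post-processing choices above; the channel aggregation and the exact way the map is combined with the image are assumptions, not steps specified by the slides:

```python
import torch

def visualize(grad_map, image=None, cap_quantile=0.99):
    """Post-process a sensitivity map: |gradients|, cap outliers, optional image product."""
    m = grad_map.abs()
    if m.dim() == 3:                         # [C, H, W] -> [H, W] by summing channels
        m = m.sum(dim=0)
    cap = torch.quantile(m.flatten(), cap_quantile)
    m = m.clamp(max=cap) / (cap + 1e-12)     # cap at the 99th percentile, rescale to [0, 1]
    if image is not None:                    # optionally multiply the map with the image
        m = m * image.abs().sum(dim=0)
    return m
```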
Qualitative Results: Visual Coherence

Definition (Visual Coherence): highlights appear only on the object of interest, not on the background.

Comparison with three gradient-based methods:
o Vanilla gradient
o Integrated Gradients
o Guided BackProp
Qualitative Results: Discriminativity

Definition (Discriminativity): the ability to explain / distinguish separate objects without confusion.

Open Problem

Which properties affect the discriminativity of a given method?

- Why did GBP (Guided BackProp) show the worst performance?
Combining with Other Methods

The same smoothing procedure can be used to augment any gradient-based method.
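A sketch of that idea: a generic wrapper that smooths any attribution function with the same noise-and-average procedure. The wrapper's signature is an assumption for illustration, not an API from the paper:

```python
import torch

def smooth(attribution_fn, x, n=50, noise_level=0.15, **kwargs):
    """Apply SmoothGrad-style averaging to any attribution method.

    `attribution_fn(x, **kwargs)` is assumed to return a map shaped like x
    (e.g. vanilla gradients, Integrated Gradients, Guided BackProp).
    """
    sigma = noise_level * (x.max() - x.min()).item()
    maps = [attribution_fn(x + sigma * torch.randn_like(x), **kwargs) for _ in range(n)]
    return torch.stack(maps).mean(dim=0)
```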
Limitations / Discussion
▪ The results are completely qualitative; can we get quantitative metrics?
▪ Noisy sensitivity maps are due to noisy gradients
o Is this true?
o Future work: look for further evidence and theoretical arguments
▪ Does SmoothGrad generalize to other networks & tasks?
▪ How do we trade off between making the picture pretty and being faithful to the model? Do you think SmoothGrad handled this tradeoff well?
Thank you!
Questions?
