
Visualization and Understanding Convolutional Neural Networks

Matthew Zeiler
NYU Advisor: Rob Fergus
Overview

• Visualization technique based on Deconvolutional Networks
• Applied to convolutional neural networks
  – determine what each layer learns
  – provides insight for architecture selection
Convnets Show Huge Gains

[Bar chart: ImageNet 2012 classification competition results, top-5 error
rate (%) for SuperVision, ISI, Oxford, INRIA, and Amsterdam; y-axis 0 to 35%.]
Convolutional Networks (LeCun et al. ’89)

• Supervised & feed-forward
• Each layer:
  – Convolve input with filters
  – Non-linearity (rectified linear)
  – Pooling (local max)
• Train convolutional filters by back-propagating classification error

[Diagram: Input Image → Convolution (learned) → Non-linearity → Pooling → Feature maps]
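The per-layer operations above (convolve, rectify, max-pool) can be sketched in plain NumPy. The image size, filter values, and single channel here are illustrative assumptions, not values from the talk:

```python
import numpy as np

def conv_layer(image, filt):
    """'Valid' 2-D convolution (cross-correlation) of one channel with one filter."""
    fh, fw = filt.shape
    H, W = image.shape
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * filt)
    return out

def relu(x):
    """Rectified linear non-linearity."""
    return np.maximum(x, 0.0)

def max_pool(x, k=2):
    """Local max pooling over non-overlapping k x k regions."""
    H, W = x.shape
    x = x[:H - H % k, :W - W % k]      # trim to a multiple of k
    x = x.reshape(x.shape[0] // k, k, x.shape[1] // k, k)
    return x.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy single-channel "image"
filt = rng.standard_normal((3, 3))     # one filter (random stand-in for a learned one)
feature_map = max_pool(relu(conv_layer(image, filt)))
print(feature_map.shape)  # (3, 3)
```

A real layer applies a bank of such filters across many input channels; stacking several of these stages and back-propagating the classification error into the filters gives the network the slide describes.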
Overview

• What are the models learning?

• Which part of the model is key to performance?

• Do the features generalize?


Deconvolutional Networks
[Zeiler et al. CVPR’10, ICCV’11]

• Provides a way to map activations at high layers back to the input
• Same operations as a Convnet, but in reverse:
  – Unpool feature maps
  – Convolve unpooled maps (filters copied from the Convnet)
• Used here purely as a probe
  – Originally proposed as an unsupervised learning method
  – No inference, no learning

[Diagram: Feature maps → Unpooling → Non-linearity → Convolution (learned) → Input Image]
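Leaving unpooling aside (it gets its own slide), one deconvnet stage can be sketched as: rectify the feature map, then convolve with a flipped copy of the convnet filter, padded so the reconstruction regains the input's spatial size. This is a toy single-channel sketch under assumed shapes, not the talk's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_valid(x, filt):
    """'Valid' cross-correlation of a single-channel map with one filter."""
    fh, fw = filt.shape
    H, W = x.shape
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * filt)
    return out

def deconv_step(feature_map, filt):
    """One deconvnet stage: rectify, then apply the flipped (transposed)
    convnet filter with full padding, restoring the input's size."""
    fh, fw = filt.shape
    rectified = relu(feature_map)
    padded = np.pad(rectified, ((fh - 1, fh - 1), (fw - 1, fw - 1)))
    return conv_valid(padded, filt[::-1, ::-1])

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
filt = rng.standard_normal((3, 3))
fmap = conv_valid(x, filt)         # forward pass: 6x6 -> 4x4
recon = deconv_step(fmap, filt)    # reverse pass: 4x4 -> 6x6
print(recon.shape)  # (6, 6)
```

Applying the flipped filter is the transpose of the forward convolution, which is what lets a single strong activation be carried back toward pixel space.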
Reversible Max Pooling

[Diagram: Pooling maps a Feature Map to Max Pooled Feature Maps while storing
the argmax locations as “Switches”; Unpooling uses those switches to place
each value back, giving the Reconstructed Feature Map.]
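The switch mechanism can be sketched directly: pooling records where each max came from, and unpooling puts each value back at exactly that location. A minimal NumPy sketch with an illustrative 4x4 input:

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that records the argmax location ("switch") per region."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k))
    switches = np.zeros((H // k, W // k), dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            region = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            idx = np.argmax(region)          # flat index within the region
            switches[i, j] = idx
            pooled[i, j] = region.flat[idx]
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded switch location; zeros elsewhere."""
    H, W = pooled.shape
    out = np.zeros((H * k, W * k))
    for i in range(H):
        for j in range(W):
            di, dj = divmod(switches[i, j], k)
            out[i * k + di, j * k + dj] = pooled[i, j]
    return out

x = np.array([[1., 3., 0., 2.],
              [4., 2., 1., 0.],
              [0., 1., 5., 1.],
              [2., 0., 1., 6.]])
pooled, switches = max_pool_with_switches(x)
recon = unpool(pooled, switches)
```

Note the reconstruction is zero everywhere except at the recorded max locations, so unpooling is only an approximate inverse of pooling.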
Layer 1 Filters
Projecting back from Higher Layers

[Diagram: the Convnet maps the Input Image up through the Layer 1 and Layer 2
feature maps; the Deconvnet zeroes all but one activation in a Layer 2 feature
map and runs the same filters in reverse, through the Layer 2 and Layer 1
reconstructions, down to a visualization in pixel space.]

Visualizations of Higher Layers

• Use ImageNet 2012 validation set
• Push each image through the network
• Take the max activation from the feature map associated with each filter
• Use the Deconvnet to project back to pixel space
• Use the pooling “switches” peculiar to that activation

[Diagram: validation images pass up through the lower layers; the strongest
activation in each feature map is projected back down to the input image.]
[Figure-only slides showing visualizations and corresponding image patches:]
Layer 1: Top-9 Patches
Layer 2: Top-1
Layer 2: Top-9
Layer 2: Top-9 Patches
Layer 3: Top-1
Layer 3: Top-9
Layer 3: Top-9 Patches
Layer 4: Top-1
Layer 4: Top-9
Layer 4: Top-9 Patches
Layer 5: Top-1
Layer 5: Top-9
Layer 5: Top-9 Patches
Occlusion Experiment

• Mask parts of the input with an occluding square
• Monitor the output
• Perhaps the network is using scene context?
[Figures, three examples: for each input image, a map of p(true class) and the
most probable class as the occluder moves, plus the total activation in the
most active 5th-layer feature map and other activations from that same map.]
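The occlusion sweep described above can be sketched as follows; `predict` stands in for a trained convnet's forward pass, and `toy_predict` is a made-up classifier, both assumptions for illustration only:

```python
import numpy as np

def occlusion_map(image, predict, true_class, size=2, stride=1, fill=0.0):
    """Slide a square occluder over the image and record p(true class)
    at each occluder position."""
    H, W = image.shape
    heat = []
    for i in range(0, H - size + 1, stride):
        row = []
        for j in range(0, W - size + 1, stride):
            occluded = image.copy()
            occluded[i:i + size, j:j + size] = fill
            row.append(predict(occluded)[true_class])
        heat.append(row)
    return np.array(heat)

def toy_predict(img):
    # Made-up stand-in classifier: p(class 0) grows with total brightness.
    s = img.sum()
    return np.array([s / (s + 1.0), 1.0 / (s + 1.0)])

image = np.ones((4, 4))
heat = occlusion_map(image, toy_predict, true_class=0)
print(heat.shape)  # (3, 3)
```

Positions where the heat map drops sharply mark the regions the classifier actually depends on; a drop spread across the background would suggest reliance on scene context.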
Lack of Understanding

• What are the models learning?

• Which part of the model is key to performance?

• Do the features generalize?


Visualizations Help – 2% Boost

[Figure, panels (a)–(e): the first-layer filters include dead filters and
overly specific low-level features; the second-layer features show block
artifacts and are too simple at the mid-level. Fixes: constrain the filter
RMS, renormalize, use smaller filters (11x11 down to 7x7), and smaller
strides (4 down to 2).]
ImageNet Classification 2013 Results

[Bar chart: top-5 test error rates (lower is better) for the 2013 entries:
16.4%, 16.1%, 15.2%, 15.2%, 14.8%, 14.3%, 14.2%, 13.6%, 13.5%, 13.0%, 11.7%,
and 11.2%.]

http://www.image-net.org/challenges/LSVRC/2013/results.php
Recent Success

• Using smaller strides:
  – Very Deep Convolutional Networks for Large-Scale Image Recognition,
    Simonyan and Zisserman, arXiv 2014 (ILSVRC 2014: 2nd in classification,
    1st in localization)
  – Some Improvements on Deep Convolutional Neural Networks, Howard,
    arXiv 2013
• Using visualizations for saliency:
  – Deep Inside Convolutional Networks: Visualizing Image Classification
    Models and Saliency Maps, Simonyan, Vedaldi, and Zisserman, arXiv 2014
Overview

• What are the models learning?

• Which part of the model is key to performance?

• Do the features generalize?


Caltech 256

[Plot: classification accuracy (%, 25 to 75) vs. number of training images
per class (0 to 60).]
Caltech 256

[Plot: classification accuracy (%, 25 to 75) vs. number of training images
per class (0 to 60), with the point at 6 training examples highlighted.]
Summary

• Visualization technique based on Deconvolutional Networks
• Applied to convolutional neural networks
  – better understanding of what is learned
  – gives insight into model selection
Thanks!

Clarifai is Hiring!
www.clarifai.com
