This document provides an introduction to deep learning for computer vision. It discusses how deep learning methods can learn features from data rather than relying on hand-designed features. Convolutional neural networks are described as a successful deep learning approach for computer vision tasks like object recognition. The document reviews historical neural network architectures and recent successes of deep learning on large-scale datasets like ImageNet.


High Level Computer Vision

Intro to Deep Learning for Computer Vision

Bernt Schiele - [email protected]


Mario Fritz - [email protected]

https://www.mpi-inf.mpg.de/hlcv

most slides from: Rob Fergus & Marc’Aurelio Ranzato


Deep Learning for Computer Vision

NIPS 2013 Tutorial

Rob Fergus
Dept. of Computer Science
New York University
Overview

• Primarily about object recognition, using supervised ConvNet models
• Focus on natural images
– Rather than digits
– Classification & Detection
• Brief discussion of other vision problems
Motivation
Existing Recognition Approach

Image/Video Pixels → Hand-designed Feature Extraction → Trainable Classifier → Object Class

• Features are not learned
• Trainable classifier is often generic (e.g. SVM)


Motivation
• Features are key to recent progress in recognition
• Multitude of hand-designed features currently in use
– SIFT, HOG, LBP, MSER, Color-SIFT, …
• Where next? Better classifiers? Or keep building more features?

Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010
Yan & Huang (winner of PASCAL 2010 classification competition)
Hand-Crafted Features
• LP-β Multiple Kernel Learning (MKL)
– Gehler and Nowozin, On Feature Combination for Multiclass Object Classification, ICCV’09
• 39 different kernels
– PHOG, SIFT, V1S+, Region Cov., etc.
• MKL only gets a few % gain over averaging features
→ Features are doing the work
What about Learning the Features?

• Perhaps get better performance?
• Deep models: hierarchy of feature extractors
• All the way from pixels → classifier
• One layer extracts features from output of previous layer

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

• Train all layers jointly
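A minimal sketch of this idea in Python: a deep model is essentially function composition, with each (hypothetical, randomly initialized) layer consuming the previous layer's output; during joint training, gradients would flow back through all layers from the classifier's loss.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical stand-ins for learned layers: each is a linear map
# followed by a non-linearity, consuming the previous layer's output.
def make_layer(n_in, n_out, rng):
    W = rng.standard_normal((n_out, n_in)) * 0.01
    return lambda x: relu(W @ x)

rng = np.random.default_rng(0)
layer1 = make_layer(784, 256, rng)   # pixels -> low-level features
layer2 = make_layer(256, 128, rng)   # low-level -> mid-level features
layer3 = make_layer(128, 64, rng)    # mid-level -> high-level features

pixels = rng.random(784)                   # a fake flattened image
features = layer3(layer2(layer1(pixels)))  # hierarchy = composition
# A simple classifier (e.g. softmax) sits on top of `features`;
# joint training back-propagates its loss through all three layers.
```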


Deep Learning

A rough taxonomy (axes: DEEP ↔ SHALLOW, SUPERVISED ↔ UNSUPERVISED):

SUPERVISED
• Deep: Recurrent Neural Net, Convolutional Neural Net, Neural Net
• Shallow: Boosting, Perceptron, SVM

UNSUPERVISED
• Deep: Deep (sparse/denoising) Autoencoder, Deep Belief Net
• Shallow: Autoencoder, Sparse Coding, SP, GMM, Restricted BM, BayesNP

Slide: M. Ranzato
Multistage Hubel & Wiesel Architecture
Slide: Y. LeCun

• [Hubel & Wiesel 1962]
– Simple cells detect local features
– Complex cells “pool” the outputs of simple cells within a retinotopic neighborhood
• Cognitron / Neocognitron [Fukushima 1971-1982]
• Convolutional Networks [LeCun 1988-present]
• Also HMAX [Poggio 2002-2006]

[Reading: Chapter 5.1 - 5.3, Bishop 2006]

Short Intro: “Standard” Neural Networks
Short Intro: Perceptron
Short Intro: Perceptron - Activation Functions
Single Layer Perceptron
Short Intro: Two-Layer Perceptron
Short Intro: Multi-Layer Perceptron (MLP)

(slides taken from David Stutz, Aachen)
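A minimal NumPy sketch of a two-layer perceptron forward pass (sigmoid hidden units; all layer sizes here are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Weights and biases for a two-layer perceptron: 4 inputs -> 8 hidden -> 3 outputs.
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)

def forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden layer activations
    y = W2 @ h + b2            # output layer (pre-softmax scores)
    return y

print(forward(rng.random(4)))
```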
Network Training
Network Training - Error Measures
Network Training - Approaches
Network Training - Parameter Optimization
Parameter Optimization by Gradient Descent
Backpropagation = Parameter Optimization by Gradient Descent

(slides taken from David Stutz, Aachen)
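A minimal NumPy sketch of backpropagation as gradient descent on a squared-error measure, for a tiny two-layer network on made-up data (sizes and learning rate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 4))        # 100 made-up training inputs
T = rng.random((100, 1))        # made-up regression targets

W1, b1 = rng.standard_normal((4, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.5, np.zeros(1)
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(1000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)    # hidden activations
    Y = H @ W2 + b2             # network output
    E = Y - T                   # per-example error

    # Backward pass: chain rule, layer by layer.
    dW2 = H.T @ E / len(X)
    db2 = E.mean(axis=0)
    dH = E @ W2.T * H * (1 - H)   # sigmoid derivative is h * (1 - h)
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)

    # Gradient descent update on all parameters.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final mean squared error:", float((E ** 2).mean()))
```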
Convolutional Neural Networks

• LeCun et al. 1989


• Neural network with specialized connectivity structure
Convnet Successes
• Handwritten text/digits
– MNIST (0.17% error [Ciresan et al. 2011])
– Arabic & Chinese [Ciresan et al. 2012]

• Simpler recognition benchmarks


– CIFAR-10 (9.3% error [Wan et al. 2013])
– Traffic sign recognition
• 0.56% error vs 1.16% for humans [Ciresan et al. 2011]

• But (until recently) less good at more complex datasets
– E.g. Caltech-101/256 (few training examples)
Characteristics of Convnets

• Feed-forward:
– Convolve input (learned convolution filters)
– Non-linearity (rectified linear)
– Pooling (local max / subsampling)
• Supervised
• Train convolutional filters by back-propagating classification error

Layer stack: Input Image → Convolution (learned) → Non-linearity → Pooling (= subsampling) → Feature maps

[LeCun et al. 1989]


Application to ImageNet
[Deng et al. CVPR 2009]

• ~14 million labeled images, 20k classes

• Images gathered from Internet

• Human labels via Amazon Turk

Krizhevsky et al. [NIPS 2012]
• Same model as LeCun’98 but:

- Bigger model (8 layers)
- More data (10^6 vs 10^3 images)
- GPU implementation (50x speedup over CPU)
- Better regularization (DropOut)

• 7 hidden layers, 650,000 neurons, 60,000,000 parameters


• Trained on 2 GPUs for a week
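A rough PyTorch sketch of an AlexNet-style stack: 5 conv layers + 3 fully connected (8 layers total), ReLU throughout, max pooling after conv 1, 2 and 5, and DropOut in the fully connected layers. Layer sizes follow the common torchvision variant, which is an assumption here; the original's local response normalization and two-GPU split are omitted.

```python
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),   # DropOut regularization
    nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                                    # 1000 ImageNet classes
)

scores = alexnet_like(torch.randn(1, 3, 224, 224))  # one fake 224x224 RGB image
print(scores.shape)  # torch.Size([1, 1000])
```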
ImageNet Classification 2012

• Krizhevsky et al. - 16.4% error (top-5)


• Next best (non-convnet) – 26.2% error
[Bar chart: top-5 error rate (%) by team: SuperVision, ISI, Oxford, INRIA, Amsterdam]
Commercial Deployment
• Google & Baidu, Spring 2013 for personal image search
Intuitions Behind Deep Networks
(following slides from Marc Aurelio Ranzato - Google)
Large Convnets for Image Classification
Large Convnets for Image Classification

• Operations in each layer

• Architecture

• Training

• Results
Components of Each Layer

Pixels / Features → Filter with Dictionary (convolutional or tiled) → Non-linearity → Spatial/Feature Pooling (Sum or Max) → Normalization between feature responses [Optional] → Output Features
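A minimal NumPy/SciPy sketch of one such layer, for a single input and output feature map (a real layer applies a whole bank of filters to 3D inputs; the divisive normalization shown is just one simple possibility):

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(x, filt, pool=2, eps=1e-5):
    """One layer: filter -> non-linearity -> spatial max pooling -> normalization."""
    f = correlate2d(x, filt, mode="valid")   # filter with a dictionary entry
    f = np.maximum(f, 0.0)                   # non-linearity (rectified linear)
    h, w = f.shape
    f = f[:h - h % pool, :w - w % pool]      # trim so pooling regions tile exactly
    f = f.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))  # max pool
    return f / (np.linalg.norm(f) + eps)     # [optional] normalization

rng = np.random.default_rng(0)
out = conv_layer(rng.random((32, 32)), rng.standard_normal((5, 5)))
print(out.shape)  # (14, 14): 28x28 after valid filtering, 14x14 after 2x2 pooling
```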
Compare: SIFT Descriptor

Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
Non-Linearity

• Non-linearity
– Per-feature independent
– Tanh
– Sigmoid: 1/(1+exp(-x))
– Rectified linear
• Simplifies backprop
• Makes learning faster
• Avoids saturation issues


→ Preferred option
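The three options as a quick NumPy sketch; the rectified linear unit's gradient is simply 0 or 1, which is part of why it simplifies backprop and avoids the saturation of tanh/sigmoid:

```python
import numpy as np

def tanh(x):      # saturates for large |x|
    return np.tanh(x)

def sigmoid(x):   # 1/(1+exp(-x)), also saturates
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):      # rectified linear: no saturation for x > 0
    return np.maximum(0.0, x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(tanh(x), sigmoid(x), relu(x), sep="\n")
```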
Pooling
• Spatial Pooling
– Non-overlapping / overlapping regions
– Sum or max
– Boureau et al. ICML’10 for theoretical analysis

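A quick NumPy sketch of non-overlapping 2x2 spatial pooling in both variants discussed above:

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Non-overlapping 2x2 spatial pooling of a single feature map."""
    h, w = fmap.shape
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.sum(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
print(pool2x2(fmap, "max"))   # [[ 5.  7.] [13. 15.]]
print(pool2x2(fmap, "sum"))   # [[10. 18.] [42. 50.]]
```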
Architecture

Importance of Depth
Architecture of Krizhevsky et al.

• 8 layers total
• Trained on ImageNet dataset [Deng et al. CVPR’09]
• 18.2% top-5 error
• Our reimplementation: 18.1% top-5 error

Layer stack: Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output
Architecture of Krizhevsky et al.

• Remove top fully connected layer (Layer 7)
• Drop 16 million parameters
• Only 1.1% drop in performance!

Layer stack: Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Softmax Output
Architecture of Krizhevsky et al.

• Remove both fully connected layers (Layers 6 & 7)
• Drop ~50 million parameters
• 5.7% drop in performance

Layer stack: Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Softmax Output
Architecture of Krizhevsky et al.

• Now try removing upper feature extractor layers: Layers 3 & 4
• Drop ~1 million parameters
• 3.0% drop in performance

Layer stack: Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output
Architecture of Krizhevsky et al.

• Now try removing upper feature extractor layers & fully connected: Layers 3, 4, 6 & 7
• Now only 4 layers
• 33.5% drop in performance

→ Depth of network is key

Layer stack: Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 5: Conv + Pool → Softmax Output
Tapping off Features at each Layer
Plug features from each layer into linear SVM or soft-max
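A minimal sketch of this probing setup with scikit-learn; `features_at_layer` is a hypothetical stand-in for running the convnet up to a given layer and flattening its activations:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def features_at_layer(images, layer):
    """Hypothetical stand-in: run the convnet up to `layer`, return flat features."""
    rng = np.random.default_rng(layer)
    return rng.random((len(images), 256))   # fake activations for illustration

images = list(range(200))                   # stand-in for a labeled image set
labels = np.repeat([0, 1], 100)

for layer in range(1, 8):
    X = features_at_layer(images, layer)
    acc = cross_val_score(LinearSVC(), X, labels, cv=3).mean()
    print(f"layer {layer}: linear SVM accuracy = {acc:.3f}")
```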
[Plots: performance of features from Layer 1 through Layer 7 and Output under vertical translation, scale change, and rotation of the input]
Visualizing ConvNets
Visualizing Convnets

• Raw coefficients of learned filters in higher layers are difficult to interpret
• Several approaches look to optimize the input to maximize activity in a high-level feature
– Erhan et al. [Tech Report 2009]
– Le et al. [NIPS 2010]
– Depend on initialization
– Model invariance with Hessian about (locally) optimal stimulus
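A rough PyTorch sketch of the optimize-the-input idea: gradient ascent on a random input image to maximize one unit's activation. `model` and the unit index are placeholders, and the result depends on the random initialization, as noted above.

```python
import torch

def maximize_activation(model, unit, steps=200, lr=0.1):
    """Gradient ascent on the input to maximize one output unit's activation."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)  # random initialization
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        act = model(x)[0, unit]   # activation of the chosen high-level unit
        (-act).backward()         # ascend by minimizing the negative activation
        opt.step()
    return x.detach()

# Usage (placeholders for a trained network and a feature index):
# stimulus = maximize_activation(trained_convnet, unit=42)
```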
Visualization using Deconvolutional Networks
[Zeiler et al. CVPR’10, ICCV’11, arXiv’13]

• Provide way to map activations at high layers back to the input
• Same operations as Convnet, but in reverse:
– Unpool feature maps
– Convolve unpooled maps (filters copied from Convnet)
• Used here purely as a probe
– Originally proposed as unsupervised learning method
– No inference, no learning

Reverse layer stack: Feature maps → Unpooling → Non-linearity → Convolution (learned) → Input Image
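A minimal NumPy sketch of the unpooling step: the forward max pool records which position (the “switch”) won in each 2x2 region, and unpooling places each value back at that position, zeros elsewhere:

```python
import numpy as np

def maxpool_with_switches(fmap):
    """2x2 max pool; also return the argmax 'switches' for later unpooling."""
    h, w = fmap.shape
    blocks = fmap.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    switches = blocks.argmax(axis=1)                  # winning position per region
    pooled = blocks.max(axis=1).reshape(h // 2, w // 2)
    return pooled, switches

def unpool(pooled, switches):
    """Place each pooled value back at its recorded switch position."""
    h2, w2 = pooled.shape
    blocks = np.zeros((h2 * w2, 4))
    blocks[np.arange(h2 * w2), switches] = pooled.ravel()
    return blocks.reshape(h2, w2, 2, 2).transpose(0, 2, 1, 3).reshape(h2 * 2, w2 * 2)

fmap = np.random.default_rng(0).random((4, 4))
pooled, sw = maxpool_with_switches(fmap)
print(unpool(pooled, sw))   # max values restored in place, zeros elsewhere
```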
Deconvnet Projection from Higher Layers
[Zeiler and Fergus. arXiv’13]

[Diagram: one activation in a Layer 2 feature map is kept (all others set to 0) and passed down through the deconvnet, with filters copied from the convnet: Layer 2 reconstruction → Layer 1 reconstruction → visualization in pixel space, alongside the forward convnet path from the input image]

[Figure: Unpooling Operation]
[Figure: Layer 1 Filters]
Visualizations of Higher Layers
[Zeiler and Fergus. arXiv’13]

• Use ImageNet 2012 validation set
• Push each image through network
• Take max activation from feature map associated with each filter
• Use Deconvnet to project back to pixel space
• Use pooling “switches” peculiar to that activation

Layer 1: Top-9 Patches
Layer 2: Top-9

• NOT SAMPLES FROM MODEL


• Just parts of input image that give strong activation of this feature map
• Non-parametric view on invariances learned by model
Layer 2: Top-9 Patches

• Patches from validation images that give maximal activation of a given feature map
Layer 3: Top-1
Layer 3: Top-9
Layer 3: Top-9 Patches
Layer 4: Top-1
Layer 4: Top-9
Layer 4: Top-9 Patches
Layer 5: Top-1
Layer 5: Top-9
Layer 5: Top-9 Patches
ImageNet Classification 2013 Results
• http://www.image-net.org/challenges/LSVRC/2013/results.php

[Bar chart: test error (top-5) by team: Clarifai (extra data), NUS, Andrew Howard, UvA-Euvision, Adobe, CognitiveVision]

• Pre-2012: 26.2% error → 2012: 16.5% error → 2013: 11.2% error


Sample Classification Results
[Krizhevsky et al. NIPS’12]
