Lecture 6 CNN - Detection

The document discusses classification and localization using Overfeat, which applies a convolutional network across an image at multiple locations and scales to simultaneously classify, locate, and detect objects, and trains the network to produce a category distribution and bounding box prediction for each window while accumulating evidence across locations and scales. It also describes how Overfeat performs classification by applying a classifier across feature maps extracted from different regions, and performs localization by adding a regression network to the classifier to predict bounding box coordinates.

Uploaded by

Abdou Abdelali

Lecture 6:

Classification & Localization

boris. [email protected]

1
Agenda

• ILSVRC 2014
• Overfeat: integrated classification, localization, and detection
– Classification with Localization
– Detection

2
ILSVRC-2014

http://www.image-net.org/challenges/LSVRC/2014/

Classification & Localization:

– Assign a label to each image; 5 guesses are allowed.
– A bounding box of the main object must be returned and must overlap the
ground truth by at least 50% (using the PASCAL criterion of intersection
over union). Each returned bounding box must be labeled with the correct
class. As in classification, 5 guesses are allowed per image.
Detection:
– There can be any number of objects in each image (including zero). False
positives are penalized.
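The 50% overlap criterion can be made concrete with a small sketch. It assumes boxes are given as (x1, y1, x2, y2) corner tuples and that an overlap of exactly 0.5 counts as a match; both are conventions chosen here for illustration, not stated on the slide. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width/height are clamped at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_localization(pred_box, pred_label, gt_box, gt_label, threshold=0.5):
    """A guess counts only if the label is right AND the boxes overlap enough."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(is_correct_localization((0, 0, 10, 10), "cat", (1, 1, 11, 11), "cat"))
```

A box shifted by half its width overlaps the original by only one third, so it would not pass the 50% test even though it looks close.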

3
ILSVRC-2014

4
Detection: Examples

5
Detection: PASCAL VOC

• http://pascallin.ecs.soton.ac.uk/challenges/VOC/
• 20 classes:

6
Detection: ILSVRC 2014

• http://image-net.org/challenges/LSVRC/2014/

                         PASCAL 2012   ILSVRC 2013   ILSVRC 2014
# classes                         20           200           200
Training     # images           5717        395909        456567
             # objects         13609        345854        478807
Validation   # images           5823         20121         20121
             # objects         13841         55502         55502
Testing      # images          10991         40152         40152
             # objects

7
Detection paradigms

1. Overfeat
2. Regions with CNN
3. SPP + CNN
4. CNN + Regression

8
OVERFEAT

9
Overfeat: Integrated
classification, localization & detection
http://cilvr.nyu.edu/doku.php?id=software:overfeat:start
Training a convolutional network to simultaneously classify, locate
and detect objects. 3 ideas:
1. Apply a ConvNet at multiple locations in the image, in a sliding
window fashion, and over multiple scales.
2. Train the system to produce
a) a distribution over categories for each window,
b) a prediction of the location and size of the bounding box containing the
object relative to that of the viewing window.
3. Accumulate the evidence for each category at each location
and size.

10
Overfeat: “accurate” net topology
input 3x221x221
1. convo: 7×7 stride 2×2; ReLU; maxpool: 3×3 stride 3×3; output: 96x36x36
2. convo: 7×7 stride 1×1; ReLU; maxpool: 2×2 stride 2×2; output: 256x15x15
3. convo: 3×3 stride 1×1 0-padded; ReLU; output: 512x15x15
4. convo: 3×3 stride 1×1 0-padded; ReLU; output: 512x15x15
5. convo: 3×3 stride 1×1 0-padded; ReLU; output: 1024x15x15
6. convo: 3×3 stride 1×1 0-padded; ReLU; maxpool: 3×3 stride 3×3;
output: 1024x5x5
7. convo: 5×5 stride 1×1; ReLU; output: 4096x1x1
8. full; ReLU; output: 4096x1x1
9. full; output: 1000x1x1
10. softmax; output: 1000x1x1
Feature extraction: 3 x [221x221] → 1024 x [5x5], with a total
down-sampling of (2x3x2x3):1 = 36:1
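The spatial sizes in the layer list can be checked with the usual output-size formula, floor((size + 2·pad − kernel) / stride) + 1. The sketch below is only an arithmetic check, assuming floor rounding in the pooling layers and a padding of 1 for the "0-padded" 3×3 convolutions; the layer table and function names are written for this example.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad) for the feature-extraction part of the
# "accurate" network, in the order the operations are applied.
ACCURATE_FEATURE_LAYERS = [
    ("conv1", 7, 2, 0), ("pool1", 3, 3, 0),
    ("conv2", 7, 1, 0), ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("conv6", 3, 1, 1),
    ("pool6", 3, 3, 0),
]


def trace(size, layers=ACCURATE_FEATURE_LAYERS):
    """Return the size after every operation and the product of all strides."""
    sizes, total_stride = [], 1
    for name, kernel, stride, pad in layers:
        size = out_size(size, kernel, stride, pad)
        total_stride *= stride
        sizes.append((name, size))
    return sizes, total_stride


sizes, total_stride = trace(221)
for name, size in sizes:
    print(f"{name}: {size}x{size}")
print("total down-sampling:", total_stride)
```

Starting from a 221-pixel side, the trace ends at 5 with a stride product of 2·3·2·3 = 36, matching the summary line on the slide.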
11
Overfeat: topology summary

Layers 1-5 are similar to AlexNet (conv. layers with ReLU and max
pooling), but with the following differences:
1. no contrast normalization
2. pooling regions are non-overlapping
3. smaller stride to improve accuracy

12
Overfeat: classification
Let’s take an image and apply a sliding window [221x221]. For each window we
will take the best score. The feature extractor has sub-sampling 36:1. If we slide
the window with step 36, then the output feature will slide with step 1.

Window: 221x221 → features: 5x5

Image: 340x270 → features: 8x6 → best score: 4x2
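The 340x270 example can be reproduced with two lines of arithmetic. This is a sketch under the assumption that the window side is a parameter (221 here, the input size of the "accurate" network) and that windows are placed every 36 pixels with no padding; the helper names are hypothetical.

```python
def num_windows(image_size, window=221, step=36):
    """How many window positions fit along one image dimension."""
    return (image_size - window) // step + 1


def dense_feature_size(n_windows, classifier_input=5):
    """Feature-map side needed so a 5x5 classifier input fits n_windows times."""
    return n_windows + classifier_input - 1


width, height = 340, 270
windows = (num_windows(width), num_windows(height))
features = (dense_feature_size(windows[0]), dense_feature_size(windows[1]))
print("windows:", windows, "features:", features)
```

Each step of 36 pixels in the image moves the 5x5 classifier input by one cell in the feature map, which is why 4x2 windows correspond to an 8x6 feature map.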

13
Overfeat: classification

2 adjacent windows share many computations. Let’s do all
windows in parallel.
Feature extraction:
The filters are convolved across the entire image in one pass. This is far more
efficient than sliding a fixed-size feature extractor over the image and then
aggregating the results from different locations.
Classifier:
The last two fully connected layers can be computed in parallel too, but we must
take care to use the right offsets.
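One way to see why the classifier can run on all windows at once: a fully connected layer applied to every 5x5 window of a feature map gives the same numbers as a single 5x5 "valid" convolution (strictly, a cross-correlation) over the whole map. The NumPy sketch below checks this on random data; the channel counts and function names are arbitrary choices for the example, not Overfeat's.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def classifier_per_window(features, weights):
    """Crop every kh x kw window and apply the layer to it, one at a time."""
    n_out, _, kh, kw = weights.shape
    _, h, w = features.shape
    flat_w = weights.reshape(n_out, -1)
    out = np.empty((n_out, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            out[:, i, j] = flat_w @ features[:, i:i + kh, j:j + kw].reshape(-1)
    return out


def classifier_as_convolution(features, weights):
    """The same layer written as one pass over the whole feature map."""
    kh, kw = weights.shape[2:]
    # windows has shape (channels, H - kh + 1, W - kw + 1, kh, kw)
    windows = sliding_window_view(features, (kh, kw), axis=(1, 2))
    return np.einsum("cijkl,ockl->oij", windows, weights)


rng = np.random.default_rng(0)
features = rng.standard_normal((16, 6, 8))     # channels x height x width
weights = rng.standard_normal((10, 16, 5, 5))  # outputs x channels x 5 x 5
per_window = classifier_per_window(features, weights)
dense = classifier_as_convolution(features, weights)
print(dense.shape, np.allclose(per_window, dense))
```

In a real framework the second form is a single convolution call, so the work shared by neighbouring windows is done only once.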

14
Overfeat: classification

15
Overfeat: classification
Feature Extraction:
We compute the first 5 layers for the whole image. The first 5 layers before pooling
correspond to 12:1 “subsampling”.
Classifier:
The classifier has a fixed-size 5x5 input and is exhaustively applied to the
layer 5 maps. We shift the classifier’s viewing window by 1 pixel
through the pooling layers without subsampling.
In the end we have [MxN] x C scores, where M and N are the sliding
window indices and C is the number of classes.
Quiz: How to choose 5 best options?
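The classifier map sizes listed on this slide can be checked by counting positions. The sketch assumes that the 3x3 pooling is repeated at offsets 0, 1 and 2 along each axis, that each pooled map is then scanned by the 5x5 classifier, and that the results are interleaved, so the total along one axis is the sum over offsets. The function name is hypothetical.

```python
def classifier_map_size(layer5_size, pool=3, classifier_input=5):
    """Number of classifier outputs along one axis after offset pooling."""
    total = 0
    for offset in range(pool):
        # Non-overlapping pooling that starts at pixel `offset`.
        pooled = (layer5_size - offset) // pool
        # Positions where a 5-wide classifier input fits in the pooled map.
        total += max(0, pooled - classifier_input + 1)
    return total


for height, width in [(17, 17), (20, 23)]:
    print((height, width), "->", (classifier_map_size(height), classifier_map_size(width)))
```

For the two unpooled layer 5 sizes on the slide this gives 3x3 and 6x9 classifier maps.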

Input      Layer 5           Layer 5            Classifier map
           (before pooling)  (after pool 3x3)
245x245    17x17             [3x3] x [5x5]      [3x3] x C
281x317    20x23             [6x9] x [5x5]      [6x9] x C

16
Overfeat: scaling and data augmentation
To locate objects of different sizes we can rescale the image to 6
scales:
– The typical ratio from one scale to another is about 1.4 (this number
differs for each scale since dimensions are adjusted to
fit exactly the stride of our network).
Data augmentation: horizontal flipping.
Final post-processing:
• For each class, take the local spatial max over the resulting windows,
• then take the top-1 / top-5.
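The post-processing step can be sketched in NumPy as follows. It assumes one score array of shape (M, N, C) per scale or flip, and it averages the per-class maxima across those arrays before ranking; the averaging step is an assumption added here to combine scales and is not spelled out on the slide. The function name and the toy scores are hypothetical.

```python
import numpy as np


def top_k_classes(score_maps, k=5):
    """score_maps: list of arrays shaped (M, N, C), one per scale or flip."""
    # Spatial max per class: collapse the M x N window grid of each map.
    per_map_max = [scores.max(axis=(0, 1)) for scores in score_maps]
    # Combine scales/flips by averaging the C-dimensional vectors.
    mean_scores = np.mean(per_map_max, axis=0)
    order = np.argsort(mean_scores)[::-1][:k]
    return order.tolist(), mean_scores[order].tolist()


# Toy example with C = 8 classes and two scales of different spatial size.
scale_a = np.zeros((3, 3, 8))
scale_b = np.zeros((6, 9, 8))
scale_a[1, 1, 2] = 0.9
scale_b[4, 7, 2] = 0.7
scale_b[0, 0, 5] = 0.6
classes, scores = top_k_classes([scale_a, scale_b], k=2)
print(classes, scores)
```

Because the spatial max is taken first, maps of different sizes can be combined without any resampling.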

17
Overfeat: boosting

Boosting: train 7 different models with different init
weights, and select the best result

18
Overfeat: ”fast” net topology
Input 3x231x231
1. convo: 11×11 stride 4×4; ReLU; maxpool: 2×2 stride 2×2; output: 96x24x24
2. convo: 5×5 stride 1×1; ReLU; maxpool: 2×2 stride 2×2; output: 256x12x12
3. convo: 3×3 stride 1×1 0-padded; ReLU; output: 512x12x12
4. convo: 3×3 stride 1×1 0-padded; ReLU; output: 1024x12x12
5. convo: 3×3 stride 1×1 0-padded; ReLU; maxpool: 2×2 stride 2×2; output: 1024x6x6
6. convo: 6×6 stride 1×1; ReLU; output: 3072x1x1
7. full; ReLU; output : 4096x1x1
8. full; output: 1000x1x1
9. softmax; output: 1000x1x1

19
Overfeat : training details
1. Data augmentation:
– Each image is down-sampled so that the smallest dimension is 256 pixels.
We then extract 5 random crops (and their horizontal flips) of size
221x221 pixels.
2. Weight initialization:
– random, with (µ, σ) = (0, 1 × 10^-2).
3. Training:
– SGD with learning rate = 5 × 10^-2, decreased by a factor of ½ after (30, 50, 60,
70, 80) epochs,
– momentum = 0.6,
– ℓ2 weight decay = 1 × 10^-5,
– dropout in FC layers.
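The step schedule for the learning rate can be written as a small function. This sketch assumes epochs are counted from 0 and that the rate is halved as soon as the epoch index reaches each milestone; that boundary convention is an assumption, and the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=5e-2, milestones=(30, 50, 60, 70, 80), factor=0.5):
    """Step schedule: multiply by `factor` once for every milestone reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * factor ** drops


for epoch in (0, 29, 30, 55, 85):
    print(epoch, learning_rate(epoch))
```

After all five milestones the rate has been halved five times, that is, divided by 32.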

20
Overfeat: localization

1. Starting from our classification-trained network, fix the feature
extraction layers (1-5) and replace the classifier layers by a
regression network:
– The regression net takes as input the pooled feature maps from layer 5. It has
2 fully-connected hidden layers of size 4096 and 1024 channels,
respectively. The output layer has 4 units for each class, which specify the
coordinates of the bounding box edges.
2. Train the regression net:
– using an ℓ2 loss between the predicted and true bounding box for each
example;
– training uses the same set of scales as in multi-scale classification;
– compare the prediction of the regressor at each spatial location with the
ground-truth bounding box, shifted into the frame of reference of that location.

21
Overfeat: localization
3. Bounding boxes are merged & accumulated
a) Assign to C_s the set of classes in the top-5 for each scale s ∈ 1 . . . 6, by
taking the maximum detection class outputs across spatial locations for
that scale.
b) Assign to B_s the set of bounding boxes predicted by the regressor
network for each class in C_s, across all spatial locations at scale s.
c) Assign B ← ∪_s B_s
d) Repeat merging until done:
i. (b1, b2) = argmin_{b1 ≠ b2 ∈ B} match_score(b1, b2)
ii. If match_score(b1, b2) > t, then stop;
iii. Otherwise, set B ← B \ {b1, b2} ∪ box_merge(b1, b2)
Here match_score is the sum of the distance between the centers of the two
bounding boxes and the intersection area of the boxes;
box_merge computes the average of the bounding boxes’ coordinates.
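The merging loop in step d) can be sketched as below. The score function is passed in as a parameter; the example uses the distance between box centers only, which is a deliberate simplification of the slide's definition (the intersection-area term is left out). Boxes are assumed to be (x1, y1, x2, y2) tuples, and the threshold value is arbitrary.

```python
import itertools
import math


def center_distance(a, b):
    """Simplified match score: Euclidean distance between box centers."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def box_merge(a, b):
    """Average the coordinates of two boxes."""
    return tuple((p + q) / 2 for p, q in zip(a, b))


def merge_boxes(boxes, t, match_score=center_distance):
    """Greedily merge the best-matching pair until the best score exceeds t."""
    boxes = list(boxes)
    while len(boxes) > 1:
        i, j = min(itertools.combinations(range(len(boxes)), 2),
                   key=lambda ij: match_score(boxes[ij[0]], boxes[ij[1]]))
        if match_score(boxes[i], boxes[j]) > t:
            break
        merged = box_merge(boxes[i], boxes[j])
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
    return boxes


result = merge_boxes([(0, 0, 10, 10), (1, 1, 11, 11), (100, 100, 110, 110)], t=5.0)
print(result)
```

The two nearby boxes collapse into their average, while the distant box is left alone because its score against the merged box is above the threshold.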
22
Overfeat: localization pipeline

1. The raw classifier/detector outputs a class and a confidence for
each location:

23
Overfeat: localization pipeline

2. The regression then predicts the location and scale of the object
with respect to each window:

24
Overfeat: localization pipeline

3. Bounding boxes are merged & accumulated

25
Single-class Regression vs
Per-Class Regression
Using a different top layer for each class in the regressor network
(Per-Class Regressor, PCR) surprisingly did not outperform using only a single
network shared among all classes (44.1% vs. 31.3% localization error).

26
Overfeat: Detection

The detection task differs from localization in that there can be any
number of objects in each image (including zero), and that false
positives are penalized by the mean average precision (mAP)
measure.
The main difference from the localization task is the necessity to
predict a background class when no object is present. Traditionally,
negative examples are initially taken at random for training. Then
the most offending negative errors are added to the training set in
bootstrapping passes.

27
REGIONS WITH CNN

28
R-CNN: Regions with CNN features

R. Girshick et al., Berkeley, “Rich feature hierarchies…”

http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf
Source: https://github.com/rbgirshick/rcnn // requires Matlab

Regions with CNN detection approach:

1. generates ~2000 category-independent regions for the input image,
2. extracts a fixed-length feature vector from each region using a CNN,
3. classifies each region with category-specific linear SVMs.

R-CNN outperforms OverFeat on ILSVRC 2013 detection, with mAP = 31.4% vs 24.3%.

29
R-CNN: architecture

1. Region detection → 2000 regions, see
http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-124.pdf
2. Each region is cropped and scaled to [227 x 227] → feature extraction with
an ImageNet-trained CNN: 5 convolutional layers + 2 FC → 4096 features
3. SVM for 200 classes
4. Greedy non-maximum suppression for each class: rejects a region if it has
an intersection-over-union (IoU) overlap with a higher scoring selected
region larger than a learned threshold
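Step 4 can be sketched in a few lines of plain Python. The threshold is learned per class according to the slide; here it is simply a parameter with an arbitrary default of 0.3, and boxes are assumed to be (x1, y1, x2, y2) tuples. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return indices of kept boxes for one class, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Reject a box if it overlaps an already selected, higher scoring box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

The procedure is run separately for each class, so a region rejected for one class can still be kept for another.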

30
R-CNN Training

The principal idea is to train the feature extraction CNN on a large
auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning
on a small dataset (PASCAL):
• Pre-training: train on ImageNet
• Replace the last layer with an FC layer with N+1 outputs (N classes + 1
“background”; VOC N=20, ILSVRC N=200)
• Training:
– For each region: if IoU > ½, positive example; otherwise, negative
(background).
– Batch = 128 = 32 positive + 96 background
– Init weights random
– SGD with learning rate λ = 0.001
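The labeling rule and the 32 + 96 batch composition can be sketched as follows. The sketch assumes each region already has its best IoU with a ground-truth box computed, and it draws the batch from a single pool of regions, which is a simplification made for illustration. The function names and the random IoU values are hypothetical.

```python
import random


def label_regions(ious, threshold=0.5):
    """1 = positive example, 0 = background, using the IoU > 1/2 rule."""
    return [1 if value > threshold else 0 for value in ious]


def sample_minibatch(labels, n_pos=32, n_bg=96, rng=random):
    """Pick region indices for one batch with a fixed positive/background mix."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    background = [i for i, y in enumerate(labels) if y == 0]
    # rng.sample raises ValueError if there are too few regions of either kind.
    batch = rng.sample(positives, n_pos) + rng.sample(background, n_bg)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
ious = [rng.random() for _ in range(2000)]  # stand-in for real overlaps
labels = label_regions(ious)
batch = sample_minibatch(labels, rng=rng)
print(len(batch), sum(labels[i] for i in batch))
```

Fixing the mix at one positive to three background regions keeps the rare positive examples from being swamped by background windows.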

31
R-CNN: PASCAL VOC performance

2012 SIFT, HOG,…

32
R-CNN: PASCAL VOC performance

2014: Regions with CNN

33
R-CNN: ILSVRC 2013 performance

34
R-CNN speed and

• R-CNN detection time/frame

35
R-CNN CODE

https://github.com/rbgirshick/rcnn
Requires Matlab!

36
CNN WITH
SPATIAL PYRAMID POOLING

37
SPP-net = CNN + SPP

Kaiming He et al., “Spatial Pyramid Pooling in Deep Convolutional Networks
for Visual Recognition”
A “classical” conv. NN requires a fixed-size (e.g. 224×224) input
image:
– Need cropping or warping to transform the original image to square shape
– This constraint is related to the Fully-Connected layers ONLY
Idea: let’s use Spatial Pyramid Pooling to transform an image of any shape
into a “fixed-length” feature vector.

38
http://research.microsoft.com/en-us/um/people/kahe/
CNN topology

[Figure: CNN layer stack, top to bottom: Soft Max; Inner Product; ReLU; Inner Product; Pooling [2x2, stride 2]; SPP (5x5 + 7x7 + 13x13); Convolutional layer [5x5]; Pooling [2x2, stride 2]; Convolutional layer [5x5]; Data Layer. Arrows: FORWARD, BACKWARD.]
39
Spatial Pyramid Pooling

Here sizeX is the size of the pooling window. This configuration is
for a network whose conv5 feature map size is 13×13, so the
pool3×3, pool2×2, and pool1×1 layers will have 3x3, 2x2, and 1x1 bins
respectively.
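A minimal NumPy sketch of the pooling described here, assuming max pooling and the window/stride rule window = ceil(a / n), stride = floor(a / n) for a feature map of side a and n bins per side. For a = 13 this gives windows of 13, 7 and 5, which match the SPP sizes in the topology figure. The function names are hypothetical, and the map must have at least n rows and columns.

```python
import math

import numpy as np


def spp_level(fmap, n):
    """Max-pool a (C, H, W) feature map into n x n bins."""
    c, h, w = fmap.shape
    win_h, stride_h = math.ceil(h / n), math.floor(h / n)
    win_w, stride_w = math.ceil(w / n), math.floor(w / n)
    out = np.empty((c, n, n))
    for i in range(n):
        for j in range(n):
            patch = fmap[:, i * stride_h:i * stride_h + win_h,
                         j * stride_w:j * stride_w + win_w]
            out[:, i, j] = patch.max(axis=(1, 2))
    return out


def spatial_pyramid_pool(fmap, levels=(3, 2, 1)):
    """Concatenate all pyramid levels into one fixed-length vector."""
    return np.concatenate([spp_level(fmap, n).reshape(-1) for n in levels])


rng = np.random.default_rng(0)
vec_square = spatial_pyramid_pool(rng.standard_normal((256, 13, 13)))
vec_other = spatial_pyramid_pool(rng.standard_normal((256, 10, 17)))
print(vec_square.shape, vec_other.shape)
```

Both inputs yield a vector of length 256 × (9 + 4 + 1), which is what lets the fully connected layers accept feature maps of any size.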

40
SPP-net training

• Size augmentation:
– ImageNet: 224x224 → 180x180
– Horizontal flipping
– Color altering
• Dropout on the last 2 FC layers
• Learning rate:
– Init lr = 0.01; divide by 10 when the error plateaus

41
SPP-net: Imagenet classification

42
SPP: Imagenet - Detection

1. Find 2000 candidate windows (as in R-CNN).
2. Extract the feature maps from the entire image only once
(possibly at multiple scales) (as in Overfeat).
3. Then apply spatial pyramid pooling to each candidate
window of the feature maps, which maps the window to a fixed-length
representation.
4. Then 2 FC layers.
5. SVM.

~170x faster than R-CNN

43
Exercises & Projects

Exercise:
– Implement Overfeat network; train classifier.

Projects:
– Install R-CNN
– Re-implement R-CNN in pure Python/C++ to eliminate Matlab
dependency

44
BACKUP
CNN - REGRESSION

45
CNN regression

Szegedy et al. (Google), 2013, “Deep Neural Networks for Object
Detection”
• start with AlexNet,
• replace the last soft-max layer with a regression layer which generates a binary
mask “d x d”: 1 if the pixel is inside the box, 0 otherwise;
• train the net by minimizing the L2 error vs the ground truth mask m:
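A minimal sketch of the mask target and a plain L2 error, under stated assumptions: the box is given as (x1, y1, x2, y2) in image pixels, a mask cell is set to 1 when its center falls inside the box, and the error is an unweighted sum of squared differences. These choices and the function names are illustrative and are not taken from the slide.

```python
import numpy as np


def box_to_mask(box, image_w, image_h, d):
    """Build a d x d binary mask: 1 where the cell center lies inside the box."""
    x1, y1, x2, y2 = box
    xs = (np.arange(d) + 0.5) * image_w / d  # cell-center x coordinates
    ys = (np.arange(d) + 0.5) * image_h / d  # cell-center y coordinates
    inside_x = (xs >= x1) & (xs < x2)
    inside_y = (ys >= y1) & (ys < y2)
    return (inside_y[:, None] & inside_x[None, :]).astype(np.float32)


def l2_error(predicted_mask, target_mask):
    """Sum of squared differences between two masks."""
    return float(np.sum((predicted_mask - target_mask) ** 2))


target = box_to_mask((0, 0, 50, 100), image_w=100, image_h=100, d=4)
print(target)
print(l2_error(np.zeros_like(target), target))
```

A network that predicts an all-zero mask pays one unit of error for every cell the box covers, so the target itself defines how much a missed object costs.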

46
CNN regression

Multi-scale

47
CNN regression

Issues:
1. Overlapping masks for multiple touching objects
2. Localization accuracy
3. Recognition of small objects

Issue 1:
– To deal with multiple touching objects, we generate not one but several
masks, each representing either the full object or part of it.
– We use one network to predict the object box mask and four additional
networks to predict four halves of the box: bottom, top, left and right
halves.
