Object Detection

Uploaded by

ha.xn.nguyen
Copyright © All Rights Reserved
Object detection

Presenter
Contents
1. Object Detection
2. Faster R-CNN
3. YOLO
4. SSD
Computer Vision Tasks
Object Detection
[Figure: example image with detected objects labelled "deer" and "cat"]
Object Detection as Classification
[Figure: an image region is passed to a CNN, which answers: deer? cat? background?]
Object Detection as Classification with Sliding Window
[Figure: each sliding-window crop is passed to a CNN, which answers: deer? cat? background?]
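
The sliding-window idea can be sketched as a generator of crop coordinates; each crop would then go to the classifier. This is a minimal sketch, assuming square windows and a fixed stride. The function name `sliding_windows` and its parameters are hypothetical and do not come from the slides.

```python
from typing import Iterator, Tuple


def sliding_windows(
    img_w: int, img_h: int, win: int, stride: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x1, y1, x2, y2) for every window that fits fully inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)


# Example: an 8 x 6 image scanned with a 4 x 4 window and a stride of 2.
windows = list(sliding_windows(8, 6, win=4, stride=2))
```

The number of windows grows quickly as the stride shrinks or more window sizes are added, which is the cost that box proposals try to reduce.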
Object Detection as Classification with Box Proposals
Box proposal method: Selective Search (SS)

Segmentation as Selective Search for Object Recognition. van de Sande et al. ICCV 2011.
R-CNN
Fast R-CNN
Faster R-CNN
YOLO - You Only Look Once

Idea: No bounding box proposals. Predict a class and a box for every location in a grid.

Redmon et al. CVPR 2016. https://arxiv.org/abs/1506.02640


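YOLO - You Only Look Once
• Non-maximum suppression: among overlapping boxes predicted for the same object, keep the box with the highest score and discard the rest.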
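
Non-maximum suppression can be sketched as a greedy loop over boxes sorted by score. This is a minimal sketch, assuming boxes in `(x1, y1, x2, y2)` corner format and an IoU threshold of 0.5. The threshold value and the helper names `iou` and `nms` are assumptions, not taken from the slides.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In practice this is usually run per class, so that a box for one class does not suppress an overlapping box for another.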
YOLO v2

Each cell has 5 anchor boxes. Each anchor includes:

• Bounding box: 4 real numbers in the range [0, 1], the offsets of the anchor box.
• Objectness score.
• Class scores (20 classes).

5 anchor boxes -> each grid cell outputs 5 * (4 + 1 + 20) = 125 real numbers.
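
The per-cell output size above is plain arithmetic and can be checked with a short sketch. The function name `outputs_per_cell` is hypothetical; the values 5, 4, 1 and 20 come from the slide.

```python
def outputs_per_cell(num_anchors: int, num_classes: int) -> int:
    """Each anchor predicts 4 box offsets, 1 objectness score and one score per class."""
    return num_anchors * (4 + 1 + num_classes)


per_cell = outputs_per_cell(num_anchors=5, num_classes=20)

# For an S x S grid, the full output tensor has shape (S, S, per_cell).
grid_size = 7
output_shape = (grid_size, grid_size, per_cell)
```

The same formula with 3 anchors and 20 classes gives 75, which matches the 75-channel 1x1 convolution on the YOLO v3 slide.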

YOLO v2

YOLO v1: Image (448 x 448 x 3) -> CNN -> 7 x 7 x 1024 -> 2 FC (4096) -> linear regression -> 7 x 7 x 30

YOLO v2

YOLO v2: Image (448 x 448 x 3) -> CNN -> 7 x 7 x 1024 -> 2 x (Conv 3x3, 1024) -> 7 x 7 x 1024 -> 1 x (Conv 1x1, 125) -> 7 x 7 x 125

YOLO v2

• Disadvantages of YOLO v1, v2

Predictions are made only from the last feature map => hard to detect small objects

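YOLO v3 - Feature Pyramid

[Figure: feature pyramid. Layers 1-6 of the backbone produce feature maps C1-C6 from the input image. 1x1 convolutions and upsampling merge neighbouring levels (labelled P5 + U6 and P4 + U4 in the diagram). Each merged map passes through 2 x (Conv 3x3, 1024) and 1 x (Conv 1x1, 75) to give the outputs T5 and T4.]

3 anchor boxes for each scale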
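
The merge step in the pyramid (upsample the coarser map, then add it to the finer one) can be sketched on a single channel with plain lists. This is a minimal sketch, assuming nearest-neighbour upsampling by a factor of 2 and element-wise addition, following the "+" in the diagram. The slide does not state the upsampling method, and the names `upsample2x` and `merge` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one channel of a feature map, H rows by W columns


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour upsampling: every value becomes a 2 x 2 block."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(lateral: Grid, coarse: Grid) -> Grid:
    """Add the upsampled coarse map to the finer (lateral) map, element by element."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]


coarse = [[1.0, 2.0]]            # 1 x 2 map from the deeper level
lateral = [[10.0] * 4, [10.0] * 4]  # 2 x 4 map from the shallower level
merged = merge(lateral, coarse)
```

The merged map has the finer resolution, which is what lets the detector predict smaller objects from it.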
SSD: Single Shot Detector

Idea: similar to YOLO, but with a denser grid and grid maps at multiple scales, plus data augmentation, hard negative mining, and other design choices in the network. Liu et al. ECCV 2016.
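
Hard negative mining can be sketched as keeping only the background boxes with the highest loss, so that negatives do not swamp the positives. This is a minimal sketch, assuming a negative-to-positive ratio of 3. The ratio, the function name `hard_negative_indices` and the sample values are assumptions for illustration, not taken from the slides.

```python
from typing import List, Sequence


def hard_negative_indices(
    losses: Sequence[float], is_positive: Sequence[bool], neg_pos_ratio: int = 3
) -> List[int]:
    """Return indices of the highest-loss negatives, at most neg_pos_ratio per positive."""
    num_pos = sum(1 for p in is_positive if p)
    negatives = [i for i, p in enumerate(is_positive) if not p]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    return negatives[: neg_pos_ratio * num_pos]


# Hypothetical per-box confidence losses; only box 0 is matched to an object.
losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9]
is_positive = [True, False, False, False, False, False, False]
selected = hard_negative_indices(losses, is_positive)
```

Only the selected negatives and the positives would then contribute to the confidence loss.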
SSD: Single Shot Detector

• Base network: VGG-16
• Add extra convolutional feature layers on top of the base network
• Multi-scale feature maps for detection


SSD: Single Shot Detector
Input feature map (5 x 5 x 256) -> predictor:
• 3x3 conv -> 5 x 5 x 21 class scores -> softmax -> p(class)
• 3x3 conv -> 5 x 5 x 4 box offsets (x, y, w, h)

Loss:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
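
The combined loss can be sketched in a few lines. This is a minimal sketch, assuming softmax cross-entropy for the confidence term, smooth L1 for the localization term, N equal to the number of matched boxes, and alpha = 1. These choices and all function names are assumptions; the slide only gives the overall formula.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log of the softmax probability of the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 summed over the box offsets (x, y, w, h)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def ssd_loss(
    conf_losses: Sequence[float], loc_losses: Sequence[float],
    num_matched: int, alpha: float = 1.0,
) -> float:
    """L = (L_conf + alpha * L_loc) / N; defined as 0 when no box is matched."""
    if num_matched == 0:
        return 0.0
    return (sum(conf_losses) + alpha * sum(loc_losses)) / num_matched


loc = smooth_l1([0.5, 2.0], [0.0, 0.0])
conf = softmax_cross_entropy([0.0, 0.0], target=0)
total = ssd_loss([1.0, 2.0], [0.5], num_matched=2)
```

Here `alpha` trades off classification against localization; a larger value pushes the network to fit box offsets more tightly.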
