Advanced Deep Learning Based Object Detection Methods

This document discusses several advanced deep learning methods for object detection. It begins by describing improvements to non-maximum suppression methods, such as linear soft-NMS and Gaussian soft-NMS. It then discusses learning non-maximum suppression by including the suppression process in the training. The document also covers multi-scale object detection using FPN and single-stage detection using RetinaNet with focal loss. Finally, it summarizes Mask R-CNN for instance segmentation, pose estimation, and its use of RoiAlign for fine spatial information.

Uploaded by

seul alone
Advanced Deep Learning Based Object Detection Methods
Improving Object Detection With One Line of Code
● Non-Maximum Suppression (NMS) is a greedy process.
○ It worked well enough in 2007 but it doesn’t anymore.
● High-scoring detections can be suppressed just like low-scoring detections.
○ Overlap with a stronger detection is the only criterion.
● Should one detection completely suppress another detection, or simply reduce its confidence?
Improving Object Detection With One Line of Code
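● Notation: $M$ is the current highest-scoring box, $b_i$ is another box with score $s_i$, and $N_t$ is the overlap threshold.
● NMS: keep $s_i$ if $\mathrm{iou}(M, b_i) < N_t$, otherwise set $s_i = 0$.
● Linear Soft-NMS: keep $s_i$ if $\mathrm{iou}(M, b_i) < N_t$, otherwise set $s_i = s_i \, (1 - \mathrm{iou}(M, b_i))$.
● Gaussian Soft-NMS:
○ Linear Soft-NMS is not continuous in terms of overlap: a sudden penalty is applied when the NMS threshold is reached.
○ Instead we can use a continuous function: $s_i = s_i \, e^{-\mathrm{iou}(M, b_i)^2 / \sigma}$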
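The three rescoring rules differ only in the weight applied to boxes that overlap the current best detection, so they can be compared in one small NumPy sketch. The function names, the `[x1, y1, x2, y2]` box layout, and the default values of `nt`, `sigma` and `min_score` are assumptions made for illustration, not values taken from the slides.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, method="gaussian", nt=0.3, sigma=0.5, min_score=0.001):
    """Return (indices in selection order, rescored copy of scores)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        # Greedy step: pick the highest-scoring remaining box.
        best = max(remaining, key=lambda i: scores[i])
        keep.append(best)
        remaining.remove(best)
        if not remaining:
            break
        rest = np.array(remaining)
        overlap = iou(boxes[best], boxes[rest])
        if method == "hard":      # classic NMS: zero out overlapping boxes
            weight = np.where(overlap >= nt, 0.0, 1.0)
        elif method == "linear":  # linear Soft-NMS: decay only above the threshold
            weight = np.where(overlap >= nt, 1.0 - overlap, 1.0)
        else:                     # Gaussian Soft-NMS: continuous decay
            weight = np.exp(-(overlap ** 2) / sigma)
        scores[rest] *= weight
        # Drop boxes whose score has decayed to (almost) nothing.
        remaining = [i for i in remaining if scores[i] > min_score]
    return keep, scores

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
keep_hard, _ = soft_nms(boxes, scores, method="hard")
keep_soft, new_scores = soft_nms(boxes, scores, method="gaussian")
```

With these toy boxes, hard NMS discards the second box entirely, while the Gaussian variant keeps it with a reduced score.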
Learning Non-Maximum Suppression
● Object detectors are mostly trained end-to-end, except for the NMS.
○ NMS is still fully hand-crafted, and forces a trade-off between recall and precision.
● Training loss is not evaluation loss.
○ Training is performed without NMS.
○ During evaluation, multiple detections for the same object count as false positives.
● Instead, train the network to include the suppression process.
○ Only output one bounding box per object.
○ Learn how to handle close objects.
Learning Non-Maximum Suppression
● Additional blocks that:
○ Encode pairwise information.
○ For each detection, pool information from all pairings.
○ Update feature vector.
○ Repeat.
● New loss:
○ Only one positive candidate per object.
○ Instead of the current practice to take all objects with IoU>50%
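The "only one positive candidate per object" rule can be illustrated with a small label-assignment sketch. It assumes a greedy matching in descending score order, similar to how detection benchmarks match detections to ground truth; the function names and the 0.5 threshold are illustrative assumptions, and this is not claimed to be the exact procedure used in the paper.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def one_positive_per_object(det_boxes, det_scores, gt_boxes, iou_thresh=0.5):
    """Label each detection 1 (positive) or 0; each object gets at most one positive."""
    labels = [0] * len(det_boxes)
    gt_taken = [False] * len(gt_boxes)
    # Visit detections from most to least confident.
    order = sorted(range(len(det_boxes)), key=lambda i: -det_scores[i])
    for d in order:
        best_gt, best_iou = -1, iou_thresh
        for g, gt in enumerate(gt_boxes):
            if gt_taken[g]:
                continue  # this object already has its positive detection
            overlap = box_iou(det_boxes[d], gt)
            if overlap > best_iou:
                best_gt, best_iou = g, overlap
        if best_gt >= 0:
            gt_taken[best_gt] = True
            labels[d] = 1
    return labels

dets = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.6, 0.9, 0.8]
gts = [[0, 0, 10, 10]]
labels = one_positive_per_object(dets, scores, gts)
```

Here two detections overlap the same object by more than 50%, but only the more confident one is labelled positive; under the usual IoU>50% rule both would be.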
Multi-Scale Object Detection

● Multi-scale object detection using an image pyramid.
○ Predict different scales by applying the same model at different image resolutions.
● Classic method.
● Also used in OverFeat.
● Slow. Requires multiple evaluations of the same model.
Multi-Scale Object Detection

● Predict objects at multiple scales using a single feature map.
● Same as Faster R-CNN.
● Fast.
● Single model (same in training as in testing).
● Poor feature resolution for small objects.
Multi-Scale Object Detection

● Predict different object sizes at different feature scales.
● Same as SSD.
● Good feature resolution for small objects.
● But features in the shallow, high-resolution layers are much weaker than in deeper layers.
Feature Pyramid Network (FPN)

● Single model (same in training as in testing).
● Good feature resolution for small objects.
● Strong features in all layers.
● Almost no overhead over SSD (= Fast).
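FPN obtains strong features at every resolution by combining a top-down pathway with lateral connections. The NumPy sketch below shows only that merge step, under simplifying assumptions: nearest-neighbour 2x upsampling, 1x1 lateral convolutions with random weights, and no smoothing convolution on the merged maps. All names and shapes are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features, lateral_weights):
    """features: bottom-up maps ordered fine -> coarse, each half the size of the previous.

    Returns merged maps in the same order, all with the same channel count.
    """
    # The coarsest (semantically strongest) map only goes through its lateral conv.
    top = conv1x1(features[-1], lateral_weights[-1])
    outputs = [top]
    # Walk towards finer maps: upsample the running map and add the lateral input.
    for feat, w in zip(reversed(features[:-1]), reversed(lateral_weights[:-1])):
        top = upsample2x(top) + conv1x1(feat, w)
        outputs.append(top)
    return outputs[::-1]

rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 16, 16)),
         rng.standard_normal((16, 8, 8)),
         rng.standard_normal((32, 4, 4))]
weights = [rng.standard_normal((4, c)) for c in (8, 16, 32)]
pyramid = fpn_top_down(feats, weights)
```

Each output level keeps the resolution of its bottom-up input while receiving information from all coarser levels, which is why small objects can be predicted from high-resolution maps that still carry strong features.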
Feature Pyramid Network (FPN)

● How important is top-down enrichment?


● How important are lateral connections?
● How important are pyramid representations?
Focal Loss for Dense Object Detection

● Can we train a single-stage detector to be as accurate as two-stage detectors?
● Contributions:
○ RetinaNet: single-stage object detector based on an FPN backbone.
○ New loss: focal loss.
Focal Loss for Dense Object Detection

● Class imbalance is an important issue for object detection.
● Previous solutions:
○ Random resampling at 1:3 ratio.
○ Hard negative resampling at 1:3 ratio.
● Both solutions mean that, at each step, only a few samples actually matter to the loss function.
● Instead, include all samples but use a different weight for each class.
○ Regular cross entropy: $\mathrm{CE}(p_t) = -\log(p_t)$, where $p_t$ is the predicted probability of the true class.
○ Weighted cross entropy: $\mathrm{CE}(p_t) = -\alpha_t \log(p_t)$, where $\alpha_t$ is the weight of the true class.
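A minimal NumPy sketch of the two losses for binary classification follows. Here `p` is the predicted probability of the foreground class, `y` is the 0/1 label, and `alpha` is the foreground weight (the background gets `1 - alpha`); the function names and the default `alpha` are illustrative assumptions.

```python
import numpy as np

def cross_entropy(p, y):
    """Per-sample binary cross entropy: -log(p_t)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y == 1, p, 1.0 - p)
    return -np.log(p_t)

def weighted_cross_entropy(p, y, alpha=0.25):
    """Per-sample class-weighted cross entropy: -alpha_t * log(p_t)."""
    y = np.asarray(y)
    # alpha_t is alpha for foreground samples and (1 - alpha) for background.
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return alpha_t * cross_entropy(p, y)

p = np.array([0.5, 0.5])
y = np.array([1, 0])
plain = cross_entropy(p, y)
weighted = weighted_cross_entropy(p, y, alpha=0.25)
```

The weight depends only on the class, so every sample of a class is scaled identically regardless of how well it is already classified.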
Focal Loss for Dense Object Detection
● Using weighted CE as baseline:
○ Can we do better?
○ Can we use a different weight for each sample?
● Focal loss: $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$
● Every sample is weighted according to its error.
○ We want to focus on samples which are misclassified.
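The focal loss scales the class-weighted cross entropy by a factor $(1 - p_t)^{\gamma}$, so well-classified samples contribute very little. The sketch below is a minimal NumPy version for the binary case; the function name is an assumption, and the defaults `gamma=2.0` and `alpha=0.25` are used here as illustrative settings.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-sample focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    # Probability assigned to the true class, and the weight of that class.
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma is close to 0 for easy samples and close to 1 for hard ones.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

easy = focal_loss([0.9], [1])[0]            # well-classified foreground sample
hard = focal_loss([0.1], [1])[0]            # badly misclassified foreground sample
easy_ce = focal_loss([0.9], [1], gamma=0.0)[0]  # gamma = 0 gives weighted cross entropy
hard_ce = focal_loss([0.1], [1], gamma=0.0)[0]
```

With `gamma=2`, the easy sample's loss is scaled by 0.01 relative to weighted cross entropy, while the hard sample's loss is scaled by only 0.81.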
Focal Loss for Dense Object Detection
● Different parameters for RetinaNet
● Comparison with online hard negative mining
● Accuracy/speed trade-offs
● Benchmark results
Also Read:
Deformable Convolutional Networks
https://arxiv.org/abs/1703.06211
YouTube Videos

● CS231n
○ Lecture 11 - Detection and segmentation https://youtu.be/nDPWywWRIRo
● Deep Learning for Objects and Scenes (CVPR 2017 Workshop)
○ Lecture 1: Learning Deep Representations for Visual Recognition, by Kaiming He https://youtu.be/jHv37mKAhV4
○ Lecture 2: Deep Learning for Instance-level Object Understanding, by Ross Girshick https://youtu.be/jHv37mKAhV4?t=39m4s
Looking for brilliant researchers

[email protected] /
[email protected]
Computer Vision Tasks

Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf


Mask R-CNN
● Instance segmentation with pose estimation for people.
● Extends Faster R-CNN by adding a new branch for the instance mask task.
● Pose estimation can be added by simply adding an additional branch.
● SOTA accuracy on detection, segmentation and pose estimation at 5 FPS on GPU.
● https://arxiv.org/abs/1703.06870
● Girshick won a young researcher award.
Mask R-CNN
● RoiPool
○ Quantization breaks pixel-to-pixel alignment.
○ Too coarse and not good for the fine spatial information required for the mask.
● RoiAlign
○ Bilinearly sample the proposal region and avoid the quantization.
○ Smoothly normalize features and predictions into a coordinate frame free of scale and aspect ratio.
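The difference comes down to whether region coordinates are rounded before pooling. The sketch below is a minimal single-channel RoiAlign in NumPy, assuming pixel centres at integer coordinates, regularly spaced sample points per bin, and average pooling; the function names and these conventions are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x) location."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align(feat, roi, out_size=2, samples=2):
    """roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats (no rounding)."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            # Regular grid of samples x samples points inside each output bin.
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feat, y, x))
            out[i, j] = np.mean(vals)
    return out

# Feature map whose value equals its column index, so outputs are easy to read.
feat = np.tile(np.arange(8.0), (8, 1))
pooled = roi_align(feat, (1.3, 2.6, 5.3, 6.6))
shifted = roi_align(feat, (1.3, 3.0, 5.3, 7.0))
```

Shifting the region by 0.4 of a pixel shifts the pooled values by the same amount, whereas a pooling scheme that rounds the region to integer coordinates would lose sub-pixel shifts like this.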
Mask R-CNN
● Backbone architecture
○ ResNet
○ ResNeXt
○ FPN
● Mask representation
○ FC vs. Convolutional
○ Multinomial vs. Independent Masks: softmax vs. sigmoid
○ Class-Specific vs. Class-Agnostic Masks: almost same accuracy
● Multi-task learning
○ Mask task improves object detection accuracy.
○ Keypoint task reduces object detection accuracy.
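The "independent masks" option can be sketched as a per-pixel sigmoid with a binary cross-entropy loss computed only on the mask channel of the ground-truth class, so mask channels of other classes do not compete. This is a minimal NumPy illustration; the function names and the `(K, H, W)` logit layout are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def independent_mask_loss(logits, gt_mask, gt_class):
    """logits: (K, H, W) per-class mask logits; gt_mask: (H, W) binary mask.

    Only the channel of the ground-truth class contributes to the loss.
    """
    p = sigmoid(logits[gt_class])
    eps = 1e-12  # avoids log(0)
    bce = -(gt_mask * np.log(p + eps) + (1 - gt_mask) * np.log(1 - p + eps))
    return bce.mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 4, 4))
gt_mask = (rng.random((4, 4)) > 0.5).astype(float)
loss = independent_mask_loss(logits, gt_mask, gt_class=1)

# Changing the logits of another class leaves the loss unchanged.
perturbed = logits.copy()
perturbed[0] += 5.0
loss_perturbed = independent_mask_loss(perturbed, gt_mask, gt_class=1)
```

A per-pixel softmax across classes would couple the channels, so the same perturbation would change the loss.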
Mask R-CNN
● Pose estimation
○ Simply add an additional branch.
○ Model a keypoint’s location as a one-hot mask, and adopt Mask R-CNN to predict K masks.
○ Experiments are mainly to demonstrate the generality of the Mask R-CNN framework.
○ RoiAlign improves this task’s accuracy as well.
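Modelling each keypoint as a one-hot mask turns localization into classification over mask positions. The NumPy sketch below encodes K keypoints as K one-hot masks, computes a cross-entropy loss over a softmax across all positions of each mask, and decodes predictions with an argmax. The function names and the `(row, col)` keypoint layout are assumptions made for illustration.

```python
import numpy as np

def keypoint_targets(keypoints, size):
    """Encode K keypoints given as (row, col) into K one-hot (size, size) masks."""
    masks = np.zeros((len(keypoints), size, size))
    for k, (r, c) in enumerate(keypoints):
        masks[k, r, c] = 1.0
    return masks

def keypoint_loss(logits, targets):
    """Mean cross entropy of a softmax taken over all positions of each mask."""
    k = logits.shape[0]
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    log_prob = flat - np.log(np.exp(flat).sum(axis=1, keepdims=True))
    return -(log_prob * targets.reshape(k, -1)).sum(axis=1).mean()

def decode_keypoints(logits):
    """Return the (row, col) of the highest-scoring position in each mask."""
    k, h, w = logits.shape
    idx = logits.reshape(k, -1).argmax(axis=1)
    return [(int(i // w), int(i % w)) for i in idx]

kps = [(1, 2), (3, 0)]
targets = keypoint_targets(kps, size=4)
uniform_loss = keypoint_loss(np.zeros((2, 4, 4)), targets)
decoded = decode_keypoints(10.0 * targets)
```

Because each mask has exactly one foreground position, accurate spatial alignment of the pooled features matters, which is consistent with RoiAlign helping this task.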