139 Pretrained Networks Object Detection
Deep Learning
■ Limitations of R-CNN
• Training is a multi-stage pipeline that involves preparing and
operating three separate models.
• Training is expensive in space and time: training a deep CNN on
so many region proposals per image is very slow.
• Object detection is slow: making predictions with a deep CNN on
so many region proposals is very slow.
■ Fast R-CNN was proposed as a single model, instead of a pipeline, that
learns and outputs regions and classifications directly.
The model is significantly faster to train and to make predictions with, yet
it still requires a set of candidate regions to be proposed along with each
input image.
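One way to picture why a single model can be faster: the CNN runs once per image, and every candidate region is then pooled from the shared feature map instead of being passed through the CNN separately. The following is a minimal pure-Python sketch of that region pooling step, not the original implementation. It assumes a single-channel feature map and regions already expressed in feature-map cells; the name `roi_max_pool` and the 2x2 output size are illustrative choices that do not come from the slides.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a 2-D feature map to an out_size x out_size grid.

    roi is (row0, col0, row1, col1), end-exclusive, in feature-map cells
    (an assumed convention for this sketch).
    """
    r0, c0, r1, c1 = roi
    height, width = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_size):
        # Split the region into out_size roughly equal row bands.
        rs = r0 + (i * height) // out_size
        re = max(r0 + ((i + 1) * height) // out_size, rs + 1)
        row = []
        for j in range(out_size):
            # Same split along the columns.
            cs = c0 + (j * width) // out_size
            ce = max(c0 + ((j + 1) * width) // out_size, cs + 1)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# One shared feature map (computed once per image in this sketch) ...
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# ... and two hypothetical candidate regions of different sizes.
proposals = [(0, 0, 4, 4), (1, 1, 3, 4)]
pooled = [roi_max_pool(feature_map, roi) for roi in proposals]
```

Both regions end up as fixed-size 2x2 grids, so a single classifier head could consume them regardless of the original region size.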
(DL, Dr. Ashish Gupta) ULC665 : Introduction 16 / 22
Faster R-CNN
■ Most of the DL-based detectors run detection only on the feature maps
of the networks’ top layer.
■ Although the features in deeper layers of a CNN are beneficial for
category recognition, they are not conducive to localizing objects.
■ FPN (Feature Pyramid Network)
• leverages a ConvNet’s pyramidal feature hierarchy, which has
semantics from low to high levels, and
• builds a feature pyramid with high-level semantics throughout.
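The idea of carrying high-level semantics down to the finer levels can be sketched as a top-down pass: the coarsest map is upsampled and added to the next finer map, and so on. This is a simplified pure-Python illustration on single-channel maps, assuming each level is exactly half the size of the previous one. It leaves out the convolutions a real FPN applies on the lateral and output paths, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling of a 2-D map by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_pyramid(bottom_up):
    """Merge bottom-up maps (ordered fine -> coarse) in a top-down pass.

    Returns the pyramid maps in the same fine -> coarse order.
    """
    pyramid = [bottom_up[-1]]  # start from the coarsest, most semantic map
    for lateral in reversed(bottom_up[:-1]):
        top_down = upsample2x(pyramid[0])
        pyramid.insert(0, add_maps(lateral, top_down))
    return pyramid


# Toy bottom-up maps: 4x4 (fine), 2x2, 1x1 (coarse).
c2 = [[1] * 4 for _ in range(4)]
c3 = [[10, 20], [30, 40]]
c4 = [[100]]
p2, p3, p4 = build_pyramid([c2, c3, c4])
```

In this toy run, the value 100 from the coarsest map shows up in every cell of `p3` and `p2`, which is the sense in which each pyramid level carries high-level information.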
Each grid cell predicts a bounding box involving the x, y coordinates,
the width and height, and the confidence.
The class probabilities map and the bounding boxes with
confidences are then combined into a final set of
bounding boxes and class labels.
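The combination step can be sketched in a few lines. This is a minimal illustration rather than the original method. It assumes one box per grid cell, x and y given as offsets inside the cell, width and height given as fractions of the whole image, and a score formed as box confidence times the best class probability. The function name, the threshold value and the sample numbers are hypothetical. A full detector would normally also suppress overlapping boxes, which is omitted here.

```python
def decode_predictions(boxes, class_probs, labels, threshold=0.5):
    """Combine per-cell boxes and class probabilities into detections.

    boxes[i][j]       = (x, y, w, h, confidence) for grid cell (row i, col j)
    class_probs[i][j] = list of class probabilities for that cell
    """
    grid = len(boxes)
    detections = []
    for i in range(grid):
        for j in range(grid):
            x, y, w, h, conf = boxes[i][j]
            probs = class_probs[i][j]
            best = max(range(len(probs)), key=lambda k: probs[k])
            score = conf * probs[best]  # class-specific confidence
            if score < threshold:
                continue
            # Convert the cell-relative centre to image-relative coordinates.
            cx = (j + x) / grid
            cy = (i + y) / grid
            detections.append({
                "label": labels[best],
                "score": score,
                "box": (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),
            })
    return detections


labels = ["dog", "cat"]
low = (0.5, 0.5, 0.2, 0.2, 0.1)  # a low-confidence box
boxes = [
    [(0.5, 0.5, 0.5, 0.5, 0.9), low],
    [low, low],
]
class_probs = [
    [[0.8, 0.2], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
detections = decode_predictions(boxes, class_probs, labels)
```

Only the top-left cell survives the threshold in this toy grid, so the final set holds a single labelled box.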