RetinaNet Model for object detection explanation



Introduction:-
Lately, the RetinaNet model for object detection has been a buzzword in the deep learning community.

And why should it not be? Object detection is a tremendously important field in computer vision. People are using object detection methods for autonomous driving, video surveillance, medical applications, and many other business problems.

Table of Contents:-
1. What is RetinaNet
2. Why RetinaNet was needed
3. Architecture of RetinaNet
   - Backbone Network
   - Subnetwork for object Classification
   - Subnetwork for object Regression
4. Focal Loss
5. Final Notes

In this article, I'll introduce you to the architecture of the RetinaNet model and how it works. Cherry on top? In the next article, we'll build a "Face mask detector" using RetinaNet to help us in this ongoing pandemic.

What is RetinaNet Model:-

The Facebook AI Research (FAIR) team introduced the RetinaNet model with the aim of tackling dense and small object detection. For this reason, it has become a popular object detection model for use with aerial and satellite imagery as well.

RetinaNet makes two key improvements over existing single-stage object detection models:

- Feature Pyramid Networks (FPN)
- Focal Loss

Need of RetinaNet Model:-

Both classic one-stage detection methods, like boosted detectors and DPM, and more recent methods like SSD evaluate roughly 10^4 to 10^5 candidate locations per image, but only a few of these locations contain objects (i.e., foreground) while the rest are just background. This leads to a class imbalance problem.

This class imbalance turns out to be the central cause of the inferior performance of one-stage detectors compared to two-stage detectors.

Hence, the researchers introduced the RetinaNet model with a concept called Focal Loss to compensate for the class imbalance and inconsistencies of single-shot object detectors like YOLO and SSD when dealing with extreme foreground-background imbalance.

Architecture of RetinaNet Model:-

In essence, we can break down the RetinaNet architecture into the following 3 components:

1. Backbone Network (i.e., bottom-up pathway + top-down pathway with lateral connections, e.g., ResNet + FPN)
2. Sub-network for object Classification
3. Sub-network for object Regression


Figure 1: RetinaNet Model Architecture (Source)

For better understanding, let's look at each component of the architecture separately –

The Backbone Network:-

Bottom-up pathway – The bottom-up pathway (e.g., ResNet) is used for feature extraction. It computes feature maps at several scales, irrespective of the input image size.

Top-down pathway with lateral connections – The top-down pathway upsamples the spatially coarser feature maps from higher pyramid levels, and the lateral connections merge the top-down layers with the bottom-up layers of the same spatial size.

Higher-level feature maps have lower resolution but are semantically stronger, and are therefore more suitable for detecting larger objects; conversely, lower-level feature maps have higher resolution and are hence better at detecting smaller objects.

So, by combining the top-down pathway and its lateral connections with the bottom-up pathway, which does not require much extra computation, every level of the resulting feature pyramid can be both semantically and spatially strong.

Hence this architecture is scale-invariant and can provide better performance in terms of both speed and accuracy.
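
To make this concrete, here is a minimal PyTorch sketch of a single top-down merge step; the layer names and channel defaults are my own assumptions for illustration, not the paper's code:

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    """One top-down merge step of an FPN (illustrative sketch)."""

    def __init__(self, bottom_up_channels, fpn_channels=256):
        super().__init__()
        # 1x1 lateral connection: project the bottom-up map to the FPN width
        self.lateral = nn.Conv2d(bottom_up_channels, fpn_channels, kernel_size=1)
        # 3x3 conv to smooth the merged map (reduces upsampling artifacts)
        self.smooth = nn.Conv2d(fpn_channels, fpn_channels, kernel_size=3, padding=1)

    def forward(self, top_down, bottom_up):
        # Upsample the coarser top-down map to the bottom-up spatial size
        upsampled = F.interpolate(top_down, size=bottom_up.shape[-2:], mode="nearest")
        # Merge by element-wise addition with the lateral projection
        merged = upsampled + self.lateral(bottom_up)
        return self.smooth(merged)
```

Applying this step from the top of the pyramid downwards yields feature maps that are both high-resolution and semantically strong.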

Sub-network for object Classification:-

A fully convolutional network (FCN) is attached to each FPN level for object classification. As shown in the diagram above, this subnetwork consists of 3×3 convolutional layers with 256 filters, followed by a final 3×3 convolutional layer with K×A filters. Hence the output feature map is of size W×H×KA, where W and H are proportional to the width and height of the input feature map, and K and A are the number of object classes and anchor boxes respectively.

Finally, the researchers used a sigmoid layer (not softmax) for object classification.

The reason the last convolutional layer has KA filters is that, if there are A anchor box proposals for each position in the feature map, then each anchor box can be classified into any of K classes. So the output feature map has KA channels (filters).
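
As a rough PyTorch sketch of what this description implies (my own illustrative code, not the authors' implementation), the classification subnetwork could look like this:

```python
import torch.nn as nn

def make_class_subnet(in_channels=256, num_classes=80, num_anchors=9):
    """Classification head shared across FPN levels (illustrative sketch).
    Output has K*A channels: one score per class per anchor at each position."""
    layers = []
    for _ in range(4):  # a stack of 3x3 convs with 256 filters each
        layers += [nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_channels = 256
    # Final 3x3 conv with K*A filters; sigmoid (not softmax) gives per-class scores.
    # In training one would usually keep raw logits and fold the sigmoid into the loss.
    layers += [nn.Conv2d(256, num_classes * num_anchors, kernel_size=3, padding=1),
               nn.Sigmoid()]
    return nn.Sequential(*layers)
```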


Sub-network for object Regression:-

The regression subnetwork is attached to each feature map of the FPN in parallel to the classification subnetwork. Its design is identical to that of the classification subnet, except that the last 3×3 convolutional layer has 4A filters, resulting in an output feature map of size W×H×4A.

The last convolutional layer has 4A filters because, in order to localize objects, the regression subnetwork produces 4 numbers for each anchor box, predicting the relative offset (in terms of center coordinates, width, and height) between the anchor box and the ground-truth box. Therefore, the output feature map of the regression subnet has 4A channels.
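
A matching sketch for the regression head (same assumptions as above) differs only in its final layer, which emits 4 offsets per anchor and uses no activation:

```python
import torch.nn as nn

def make_box_subnet(in_channels=256, num_anchors=9):
    """Box regression head (illustrative sketch). Output has 4*A channels:
    (dx, dy, dw, dh) offsets for each anchor at each position."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_channels = 256
    # Final 3x3 conv with 4*A filters; raw (linear) outputs, no activation
    layers.append(nn.Conv2d(256, 4 * num_anchors, kernel_size=3, padding=1))
    return nn.Sequential(*layers)
```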

By now we have a fair idea of the RetinaNet architecture. Now let's look at the most discussed topic of the RetinaNet model for object detection: Focal Loss.

Focal Loss:-
Focal Loss (FL) is an improved version of Cross-Entropy Loss (CE) that tries to handle the class imbalance problem by assigning more weight to hard or easily misclassified examples (e.g., background with noisy texture, a partial object, or the object of our interest) and down-weighting easy examples (e.g., clear background).

So Focal Loss reduces the loss contribution from easy examples and increases the importance of correcting misclassified examples.

Focal loss is an extension of the cross-entropy loss function that down-weights easy examples and focuses training on hard negatives. To achieve this, the researchers added a modulating factor (1 − p_t)^γ to the cross-entropy loss, with a tunable focusing parameter γ ≥ 0.

The RetinaNet object detection method uses an α-balanced variant of the focal loss, where α = 0.25 and γ = 2 work best.


Figure 2: Focal loss vs. probability of the ground-truth class (Source)

So one can define the focal loss as –

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)

The focal loss is visualized for several values of γ ∈ [0, 5]; refer to Figure 2.
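
As a minimal sketch of the formula above (assuming binary targets and sigmoid probabilities; a numerically hardened version would work with logits instead), the α-balanced focal loss can be written as:

```python
import torch

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """alpha-balanced focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    probs: sigmoid probabilities in (0, 1); targets: 0/1 tensor of the same shape."""
    # p_t: the predicted probability of the ground-truth class
    p_t = torch.where(targets == 1, probs, 1 - probs)
    # alpha_t: weight positives by alpha, negatives by (1 - alpha)
    alpha_t = torch.where(targets == 1,
                          torch.full_like(probs, alpha),
                          torch.full_like(probs, 1 - alpha))
    # Modulating factor (1 - p_t)^gamma down-weights easy examples
    loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))
    return loss.mean()
```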

Focal Loss characteristics:-

Note the following properties of the focal loss:

1. When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected.
2. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
3. The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted.
4. As γ is increased, the effect of the modulating factor likewise increases. (After a lot of experiments and trials, the researchers found γ = 2 to work best.)

Note:- When γ = 0, FL is equivalent to CE (shown as the blue curve in Figure 2).
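
A quick numeric check of the modulating factor (1 − p_t)^γ for γ = 2 illustrates properties 1 and 2:

```python
# Modulating factor (1 - p_t)^gamma for gamma = 2 (illustrative)
for p_t in (0.1, 0.5, 0.9, 0.99):
    print(f"p_t = {p_t}: factor = {(1 - p_t) ** 2:.4f}")
# p_t = 0.1  -> 0.8100 (hard example: loss nearly unchanged)
# p_t = 0.99 -> 0.0001 (easy example: loss cut by 10,000x)
```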

You can read about Focal Loss in detail in this article, where I've talked about the evolution of cross-entropy into Focal Loss, the need for Focal Loss, and a comparison of Focal Loss with cross-entropy.


And, cherry on top, I've used a couple of examples to explain why Focal Loss is better than cross-entropy.

End Points:-
RetinaNet is a powerful model that uses a Feature Pyramid Network with ResNet as its backbone.

In general, RetinaNet is a good choice for starting an object detection project, particularly if you need to get good results quickly. In the next article, we'll build a solution using the RetinaNet model.

If you've enjoyed this article, leave a few claps; it will encourage me to explore further machine learning topics 🙂

References:-
https://arxiv.org/abs/1605.06409
https://arxiv.org/pdf/1708.02002.pdf
https://developers.arcgis.com/python/guide/how-retinanet-works/
https://analyticsindiamag.com/what-is-retinanet-ssd-focal-loss/
https://github.com/fizyr/keras-retinanet
https://www.freecodecamp.org/news/object-detection-in-colab-with-fizyr-retinanet-efed36ac4af3/
https://deeplearningcourses.com/
https://blog.zenggyu.com/en/post/2018-12-05/retinanet-explained-and-demystified/

Article Credit:-
Name:- Praveen Kumar
Founder:- TowardsMachineLearning.Org
