RetinaNet Model for Object Detection Explanation
Introduction:-
Lately, the RetinaNet model for object detection has been a buzzword in the deep learning community. And why shouldn't it be? Object detection is a tremendously important field in computer vision. People are using different object detection methods for autonomous driving, video surveillance, medical applications, and to solve many other business problems.
Table of Contents: –
1. What is RetinaNet
2. Why RetinaNet was needed
3. Architecture of RetinaNet
    – Backbone Network
    – Sub-network for object Classification
    – Sub-network for object Regression
4. Focal Loss
5. End Points
In this article, I'll introduce you to the architecture of the RetinaNet model and how it works. Cherry on top? In the next article, we'll build a "Face mask detector" using RetinaNet to help us in this ongoing pandemic.
RetinaNet has also become a popular object detection model for working with aerial and satellite imagery.
Researchers introduced RetinaNet by making two improvements over existing single-stage object detection models: a Feature Pyramid Network (FPN) backbone and a new loss function called Focal Loss.
Single-stage detectors evaluate a huge number of candidate locations per image, most of which are easy background examples. This extreme foreground-background class imbalance turns out to be the central cause of the inferior performance of one-stage detectors compared to two-stage detectors.
Hence, researchers introduced the RetinaNet model with a concept called Focal Loss to make up for the class imbalance that single-shot object detectors like YOLO and SSD suffer from when dealing with extreme foreground-background ratios.
Architecture of RetinaNet: –
The RetinaNet architecture is composed of three components (a rough sketch of how they fit together follows this list):
1. Backbone Network (i.e. bottom-up pathway + top-down pathway with lateral connections, e.g. ResNet + FPN)
2. Sub-network for object Classification
3. Sub-network for object Regression
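As a hedged, framework-agnostic sketch (the helper names backbone_fpn, classification_subnet and regression_subnet are purely illustrative, not part of any particular library), the forward pass combines the three components roughly like this:

```python
# Rough sketch: the same two heads are shared across all pyramid levels
def retinanet_forward(image, backbone_fpn, classification_subnet, regression_subnet):
    pyramid_features = backbone_fpn(image)  # e.g. FPN levels P3 ... P7
    class_outputs, box_outputs = [], []
    for feature_map in pyramid_features:
        class_outputs.append(classification_subnet(feature_map))  # KA channels
        box_outputs.append(regression_subnet(feature_map))        # 4A channels
    return class_outputs, box_outputs
```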
Bottom up pathway – The bottom-up pathway is the normal feed-forward computation of the backbone network (e.g. ResNet), which produces feature maps at several scales, each spatially smaller but semantically stronger than the previous one.
Top down pathway with lateral connections – The top-down pathway upsamples the spatially coarser feature maps from higher pyramid levels, and the lateral connections merge the top-down layers and the bottom-up layers with the same spatial size.
Higher-level feature maps have low resolution but are semantically stronger, and are therefore more suitable for detecting larger objects; on the contrary, grid cells from lower-level feature maps have high resolution and hence are better at detecting smaller objects.
So, with the combination of the top-down pathway and its lateral connections with the bottom-up pathway, which does not require much extra computation, every level of the resulting feature maps can be both semantically and spatially strong.
Hence, this architecture is scale-invariant and can provide better performance in terms of both speed and accuracy.
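To make the merging step concrete, here is a minimal Keras sketch of building one top-down FPN level (the 256-channel size and the 1×1/3×3 convolution pattern follow the FPN design; the function name fpn_merge is purely illustrative):

```python
from tensorflow.keras import layers

def fpn_merge(coarser_top_down, bottom_up, feature_size=256):
    """Build one top-down FPN level from a coarser top-down map and a
    bottom-up backbone map of matching size (after upsampling)."""
    # 1x1 conv brings the backbone map to the common channel size
    lateral = layers.Conv2D(feature_size, kernel_size=1, padding='same')(bottom_up)

    # Upsample the coarser map so spatial sizes match, then merge by addition
    upsampled = layers.UpSampling2D(size=2)(coarser_top_down)
    merged = layers.Add()([upsampled, lateral])

    # 3x3 conv smooths the aliasing introduced by upsampling
    return layers.Conv2D(feature_size, kernel_size=3, padding='same')(merged)
```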
At last, researchers used a sigmoid layer (not softmax) for object classification.
The reason for the last convolution layer to have KA filters is that, if there are A anchor box proposals for each position in the feature map obtained from the last convolution layer, then each anchor box can be classified into any of K classes. So the output feature map would have KA (i.e. K × A) channels or filters.
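As a hedged sketch of such a classification head in Keras (assuming A = 9 anchors per position and the four 256-filter convolution layers used in the original RetinaNet design; the name classification_subnet is illustrative, not an official API):

```python
from tensorflow.keras import layers, Sequential

def classification_subnet(num_classes, num_anchors=9, feature_size=256):
    """Classification head: predicts K sigmoid class scores for each of the
    A anchors at every spatial position of a feature map."""
    return Sequential([
        layers.Conv2D(feature_size, 3, padding='same', activation='relu'),
        layers.Conv2D(feature_size, 3, padding='same', activation='relu'),
        layers.Conv2D(feature_size, 3, padding='same', activation='relu'),
        layers.Conv2D(feature_size, 3, padding='same', activation='relu'),
        # K * A filters: one sigmoid score per (anchor, class) pair
        layers.Conv2D(num_classes * num_anchors, 3, padding='same',
                      activation='sigmoid'),
    ])
```

The same head (with shared weights) is applied to every level of the feature pyramid.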
Similarly, the reason for the last convolution layer of the regression sub-network to have 4A filters is that, in order to localize the class objects, the regression sub-network produces 4 numbers for each anchor box, which predict the relative offset (in terms of center coordinates, width and height) between the anchor box and the ground-truth box. Therefore, the output feature map of the regression sub-net has 4A filters or channels.
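To illustrate what those 4 numbers typically represent, here is a small NumPy sketch of a common anchor-to-ground-truth encoding (center offsets normalized by anchor size, plus log-scale width/height ratios); the exact parameterization may differ between implementations:

```python
import numpy as np

def encode_box(anchor, gt_box):
    """Encode a ground-truth box relative to an anchor.
    Boxes are (x_min, y_min, x_max, y_max); returns (dx, dy, dw, dh)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]

    gx, gy = (gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2
    gw, gh = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]

    # Center offsets are normalized by anchor size; width/height use log ratios
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])

# Example: an anchor and a slightly shifted, larger ground-truth box
print(encode_box((10, 10, 50, 50), (12, 14, 60, 58)))
```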
So by now we have some clarity on the architecture of the RetinaNet object detection model. Now let's understand the most discussed topic of the RetinaNet model, and that is Focal Loss.
Focal Loss : –
Focal Loss (FL) is an improved version of Cross-Entropy Loss (CE) that tries to handle the class imbalance problem by assigning more weight to hard or easily misclassified examples (i.e. background with noisy texture, a partial object, or the object of our interest) and by down-weighting easy examples (i.e. background objects).
So Focal Loss reduces the loss contribution from easy examples and increases the importance of correcting misclassified examples.
Focal Loss is just an extension of the cross-entropy loss function that down-weights easy examples and focuses training on hard negatives. To achieve this, researchers proposed multiplying the cross-entropy loss by a modulating factor (1 − p_t)^γ, with a tunable focusing parameter γ ≥ 0.
The RetinaNet object detection method uses an α-balanced variant of the focal loss, where α = 0.25 and γ = 2 work best.
FL(p_t) = −α_t (1 − p_t)^γ ln(p_t)
The focal loss is visualized for several values of γ ∈ [0, 5]; refer to Figure 1.
1. When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected.
2. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.
3. The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted.
4. As γ is increased, the effect of the modulating factor is likewise increased. (After a lot of experiments and trials, researchers found γ = 2 to work best.)
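Putting the formula into code, here is a minimal NumPy sketch of the α-balanced focal loss for per-class sigmoid outputs, with α = 0.25 and γ = 2 as defaults; it is meant to illustrate the math above rather than to serve as a drop-in training loss:

```python
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * ln(p_t).
    y_true holds binary labels (1 = object, 0 = background),
    y_pred holds the corresponding sigmoid probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)

    # p_t is the probability the model assigns to the true class
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)

    # (1 - p_t)^gamma down-weights easy, well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy example (p_t = 0.95) contributes far less loss than a hard one (p_t = 0.2)
print(focal_loss(np.array([1, 1]), np.array([0.95, 0.2])))
```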
You can read about Focal Loss in detail in this article, where I've talked about the evolution of cross entropy into Focal Loss, the need for Focal Loss, and a comparison of Focal Loss with cross entropy.
And cherry on top, I've used a couple of examples to explain why Focal Loss is better than cross entropy.
End Points: –
RetinaNet is a powerful model that uses a Feature Pyramid Network with ResNet as its backbone. In general, RetinaNet is a good choice to start an object detection project with, in particular if you need to get good results quickly. In the next article, we'll build a solution using the RetinaNet model.
If you've enjoyed this article, leave a few claps; it will encourage me to explore further machine learning opportunities 🙂
References: –
http://arxiv.org/abs/1605.06409
https://arxiv.org/pdf/1708.02002.pdf
https://developers.arcgis.com/python/guide/how-retinanet-works/
https://analyticsindiamag.com/what-is-retinanet-ssd-focal-loss/
https://github.com/fizyr/keras-retinanet
https://www.freecodecamp.org/news/object-detection-in-colab-with-fizyr-retinanet-efed36ac4af3/
https://deeplearningcourses.com/
https://blog.zenggyu.com/en/post/2018-12-05/retinanet-explained-and-demystified/
Article Credit:-
Name:- Praveen Kumar
Founder:- TowardsMachineLearning.Org