Region-Based Object Detection and Classification Using Faster R-CNN
Abstract— With the advent of deep learning, machine learning systems are able to recognize and classify objects of interest in an image. Various advancements have been made in the field of object recognition and classification. Our research work focuses on improving upon the R-CNN, Fast R-CNN, and YOLO architectures. The work uses a Region Proposal Network (RPN) to extract regions of interest in an image. The RPN outputs candidate object regions based on their objectness score. The output regions are then subjected to RoI pooling for classification. Our research work focuses on training Faster R-CNN using a custom data set of images. Our trained network efficiently detects objects in an image consisting of multiple objects. Our network requires a GPU with a minimum compute capability of 3.0.

Keywords—Deep Learning, Faster R-CNN, Region Proposal Network, Convolutional Neural Network.

I. INTRODUCTION

Object detection and classification came into existence with the advent of the Convolutional Neural Network (CNN). The very first advancement in the application of deep learning to object recognition took place at New York University (NYU) in 2013. The model developed by the researchers, known as OverFeat, employs a sliding window approach in a Convolutional Neural Network. However, the approach was not efficient, as it required the window to be placed on different regions of the image.

Soon afterwards, the Region-based Convolutional Neural Network (R-CNN) architecture was proposed by Ross Girshick [1]. The results improved on the existing architecture. The R-CNN architecture works in three phases. Firstly, it extracts the candidate objects in the entire image using a region proposal method, the most common of which is Selective Search. Secondly, it extracts features from each of the candidate objects recognized in the scene [2]. Lastly, it classifies each region using a Support Vector Machine (SVM). While the results were far more accurate, training the network using the different data sets available was a challenge. The approach extracts features from the objects of interest in the scene one by one and applies a Support Vector Machine for object classification.

However, the approach was later improved by Ross Girshick, a Microsoft researcher, who published an approach similar to R-CNN. The approach used a region proposal algorithm, typically Selective Search, to extract objects of interest in the scene. It differs from R-CNN in that it processes the entire image at once instead of selecting one region at a time. For classification and regression it employs a Region of Interest (RoI) pooling layer on the feature map [3]. This approach proved to be faster than R-CNN. However, its biggest drawback was that it still relied on Selective Search for extracting objects from the image.

Some time after the R-CNN architecture came into existence, the You Only Look Once (YOLO) object detection and classification technique was proposed in 2016. The research work was published as a paper by Joseph Redmon [4]. The architecture the researchers proposed is based on a convolutional neural network. The technique achieved efficient results and took relatively less time. This was the first time that real-time object recognition came into the picture [5].

Around the same period, an advancement widely known as Faster R-CNN was proposed by Shaoqing Ren, coauthored by Girshick, who is currently working as a researcher at Facebook. In an attempt to construct a model that can be trained efficiently using training data, Faster R-CNN adds a Region Proposal Network (RPN) to its features [6]. The basic idea of the RPN is that it outputs candidate objects based on their relative objectness score. The objects extracted by the RPN are subsequently used by the RoI pooling and fully connected layers.

This work uses the trainFasterRCNNObjectDetector function of the Computer Vision System Toolbox. A custom data set of approximately 295 images containing vehicles as objects was collected [7]. A Convolutional Neural Network (CNN) is created layer by layer using the functionality provided by the MATLAB Neural Network Toolbox™. At the time of training, image patches are extracted.

The paper is divided into the following sections. Section I gives a general introduction to deep learning techniques. Section II discusses their applications, including the various research performed by researchers in the field of
Authorized licensed use limited to: INSTITUTE OF ENGINEERING & MANAGEMENT TRUST. Downloaded on May 21,2024 at 04:45:02 UTC from IEEE Xplore. Restrictions apply.
International Conference on "Computational Intelligence and Communication Technology" (CICT 2018)
object detection and classification. Section III explains the architecture of Faster R-CNN and the Region Proposal Network. Section IV shows the results of training the network on a custom data set. Lastly, Section V concludes the paper.

II. RELATED WORK

A. Region Based Convolutional Neural Network

The model developed by the researchers, known as OverFeat, employs a sliding window approach in a Convolutional Neural Network. However, the approach was not efficient, as it required the window to be placed on different regions of the image. Soon afterwards, the Region-based Convolutional Neural Network (R-CNN) architecture was proposed by Ross Girshick [8]. The results improved on the existing architecture. The R-CNN architecture works in three phases.

Firstly, it extracts the candidate objects in the entire image using a region proposal method, the most common of which is Selective Search. Secondly, it extracts features from each of the candidate objects recognized in the scene. Lastly, it classifies each region using a Support Vector Machine (SVM). While the results were far more accurate, training the network using the different data sets available was a challenge [9]. The approach extracts features from the objects of interest in the scene one by one and applies a Support Vector Machine for object classification.

B. Fast Region-based Convolutional Neural Network

However, the approach was later improved by Ross Girshick, a Microsoft researcher, who published an approach similar to R-CNN known as Fast R-CNN. The approach used a region proposal algorithm, typically Selective Search, to extract objects of interest in the scene. It differs from R-CNN in that it processes the entire image at once instead of selecting one region at a time. For classification and regression it employs a Region of Interest (RoI) pooling layer on the feature map [10]. This approach proved to be faster than R-CNN. However, its biggest drawback was that it still relied on Selective Search for extracting objects from the image.

C. YOLO Architecture

Some time after the R-CNN architecture, the You Only Look Once (YOLO) object detection and classification technique was proposed in 2016. The research work was published as a paper by Joseph Redmon.

The architecture the researchers proposed is based on a convolutional neural network [11]. The technique achieved efficient results and took relatively less time. This was the first time that real-time object recognition came into the picture.

D. Single Shot Detector (SSD) and Region-based Fully Convolutional Networks (R-FCN)

The SSD architecture followed YOLO. It predicts categories and box offsets using small convolutional filters applied to feature maps, and makes predictions using feature maps of different scales. R-FCN follows the architecture of Faster R-CNN.

III. METHODOLOGY

A. Faster R-CNN

Region of interest (RoI) pooling is a deep learning approach that is gaining much attention in the field of object recognition and classification. An instance could be the detection of objects in an image of a scene containing multiple objects [12]. The objective is to apply max pooling over each region of interest on the feature map computed from the entire image, producing feature maps of a fixed size. The typical architecture of Faster R-CNN is illustrated in Fig. 1.

Fig. 1. Typical architecture of Faster R-CNN

The object detection technique of Faster R-CNN is subdivided into the following stages:

a) Region Proposal Network: The very first task is to search the given input image for the places where an object is likely to be located, so that the position of the object in the image can be determined [13]. Each such region is bounded by a box known as a region of interest (RoI).

b) Classification: This stage classifies the regions of interest identified in the above step into their corresponding classes. The technique deployed here is a Convolutional Neural Network (CNN).

In the proposed approach there is a rigorous process of identifying all locations in the image where an object may lie. However, if no regions are identified in the first stage of the algorithm, there is no need to proceed to the second stage [14].
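The two-stage flow described in Section III-A (selecting regions by objectness score, then RoI max pooling and classification, with the second stage skipped when no region is found) can be illustrated with a minimal sketch. This is not the MATLAB toolbox implementation used in this work. It is a plain-Python illustration under stated assumptions: the feature map is a small 2-D list, boxes are given in feature-map cells, and all names (select_proposals, roi_max_pool, toy_classifier, detect, FEATURE_MAP) and the 0.5 score threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Box in feature-map cells: (x1, y1, x2, y2), with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]
FeatureMap = Sequence[Sequence[float]]


def select_proposals(boxes: Sequence[Box], scores: Sequence[float],
                     threshold: float = 0.5) -> List[Box]:
    """Stage 1 (simplified): keep boxes whose objectness score reaches the
    threshold, ordered from highest to lowest score."""
    kept = [(s, b) for s, b in zip(scores, boxes) if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [b for _, b in kept]


def roi_max_pool(feature_map: FeatureMap, box: Box,
                 out_size: int = 2) -> List[List[float]]:
    """Max-pool one region of the feature map into an out_size x out_size
    grid, so every region yields a fixed-size output."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    if h < 1 or w < 1:
        raise ValueError("box must cover at least one cell")
    pooled = []
    for i in range(out_size):
        # Split the rows of the region into out_size roughly equal bins.
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + ((i + 1) * h) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + ((j + 1) * w) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def toy_classifier(pooled: List[List[float]]) -> str:
    """Hypothetical stand-in for the CNN classifier of stage 2."""
    total = sum(sum(row) for row in pooled)
    return "object" if total > 20 else "background"


def detect(feature_map: FeatureMap, boxes: Sequence[Box],
           scores: Sequence[float],
           classify: Callable[[List[List[float]]], str],
           threshold: float = 0.5,
           out_size: int = 2) -> List[Tuple[Box, str]]:
    """Run stage 1, then stage 2 only if stage 1 produced any region."""
    proposals = select_proposals(boxes, scores, threshold)
    if not proposals:
        return []  # no regions found, so classification is skipped
    return [(box, classify(roi_max_pool(feature_map, box, out_size)))
            for box in proposals]


# Toy 4 x 4 feature map used only for illustration.
FEATURE_MAP = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 1, 7, 8],
]

if __name__ == "__main__":
    boxes = [(0, 0, 4, 4), (2, 2, 4, 4), (0, 0, 1, 1)]
    scores = [0.6, 0.9, 0.2]
    print(detect(FEATURE_MAP, boxes, scores, toy_classifier))
```

In a real Faster R-CNN the scores and boxes come from the RPN and the classifier is a trained network. The sketch only shows why RoI pooling lets regions of different sizes share one fixed-size classifier input.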
A) Loading of data.
B) Convolutional Neural Network designing.
C) Configuring training options.
D) Training of the network.
E) Testing of the trained network.

created by combining the networks obtained from the initial two steps. The convergence rates can be different for each training step; therefore, we have specified options for training in each step using the trainingOptions function from the Neural Network Toolbox [35].
layers can be used to improve the average precision.

[7] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2010.