Tensor Flow

Second International Conference on Smart Systems and Inventive Technology (ICSSIT 2019)
IEEE Xplore Part Number: CFP19P17-ART; ISBN:978-1-7281-2119-2
Object Detection and Count of Objects in Image using Tensor Flow Object Detection API
B N Krishna Sai
Department of Computer Science and Engineering,
Amrita School of Engineering,
Bangalore, India
[email protected]

Sasikala T.
Department of Computer Science and Engineering,
Amrita School of Engineering,
Bangalore, India
[email protected]

Abstract— Object detection is widely used in applications such as vehicle detection, face detection, autonomous vehicles and detecting pedestrians on streets. TensorFlow's Object Detection API is a powerful tool that lets anyone quickly build and deploy image recognition software. Object detection not only classifies and recognizes objects in an image but also localizes those objects and draws bounding boxes around them. This paper focuses on detecting threatening objects. To ease detection of threatening objects, we use the TensorFlow Object Detection API to train the model, and we use the Faster R-CNN algorithm for the implementation. The model is built on two classes of threatening objects and is evaluated on test data for those two classes.

Keywords— Deep Learning, Object Detection API, TensorFlow, Threatening Objects, Faster R-CNN, CNN, Computer Vision

I. INTRODUCTION

Computer Vision (CV) is the science of understanding and manipulating digital videos and images. It plays a vital role in many applications, including face recognition, image retrieval, industrial inspection and augmented reality. With the emergence of deep learning, computer vision has proven useful for a wide range of applications. Deep learning is a collection of methods based on Artificial Neural Networks (ANNs) and is a branch of machine learning. ANNs are modelled on the human brain: nodes are connected to each other and pass data to each other. The uses of deep learning in computer vision fall into several categories: generation, segmentation, detection and classification of both videos and images. Image classification labels the image as a whole. Finding the position of the object in addition to labelling it is called object localization. Typically, the position of the object is defined by rectangular coordinates. Finding multiple objects in the image, each with rectangular coordinates, is called detection. Segmentation finds the exact extent of an object, like placing a transparent mask over the object that follows its exact edges.

An image classification (image recognition) model merely estimates the likelihood that an object is present in an image. Object localization, in comparison, concerns where an item lies in the picture: a localization algorithm outputs the coordinates of an item with respect to the image. Object detection is an important problem in CV. As in image classification tasks, deeper networks have shown better performance in detection. At present the accuracy of these techniques is excellent, hence they are used in many applications. The difference between localization and detection is the number of objects: in detection, the number of objects is variable. This small difference has a large effect on the design of deep learning architectures for localization or detection.

II. PRIOR WORK

Computer vision is growing exponentially as technology advances, and a great deal of work has gone into the continuous improvement of the field. Researchers follow different methods and approaches to a problem and keep searching for improvements. This section reviews previous work and the methodologies followed. We start with different applications of object detection and then turn to the background of the algorithm we implement.

Apoorva Raghunandan et al. [1] worked on different algorithms, such as colour, skin and face detection, simulated and implemented in MATLAB for detecting different objects in video surveillance applications with improved accuracy. The Viola-Jones algorithm is used for face detection. Given an input image with a detected face, the algorithm detects the features of the face such as the eyes and nose. In skin detection, skin pixels and non-skin pixels are detected. Four skin detection cases were considered for a single face, and binary output images were obtained for the different cases. For skin detection with multiple people, the people were seated in various positions, and the skin complexion and different colours of clothing were observed. Colour detection was then performed to obtain accurate object detections; it helps in the classification of different objects.
978-1-7281-2119-2/19/$31.00 ©2019 IEEE 542

In simulation, it detected various shapes and different shades in a colour image. Another part of the work was target detection, in which the foreground is cleaned up and foreground shadows are detected with a threshold of 80; this was done for different cases with a difference of 20. Finally, the various object detection algorithms simulated in MATLAB gave an accuracy of 95%.

Jung Uk Kim et al. [3] worked on object detection in road scenes, a task that has drawn considerable attention. Occlusion occurs very frequently in road scenes, and previous research had the limitation of not detecting such objects properly. They proposed a novel approach to object detection that is robust to occlusion. It consists mainly of two parts: the framework and the object bounding box. The framework part performs classification and bounding box regression; the object detection framework uses a VGG16 network for feature encoding. Results on the KITTI Vision Benchmark Suite dataset showed that the proposed network model outperformed different state-of-the-art methods.

Shantanu Deshmukh et al. [4] addressed object detection for solar panel layout generation. Roof obstacles and edges are marked on a panel layout diagram. Generally, a user draws a boundary manually around each obstacle, which is meticulous and tedious. A framework was built on existing object detection models, leveraging traditional edge detection algorithms fused with cutting-edge machine learning based frameworks. In the proposed solution, an approach termed "novel fusion" is applied: the object detection API is applied first, and edge detection algorithms are applied after each detection. The object detection API gives bounding boxes on the original image. The candidate image from the edge detection output is fused back into the original roof image, and exact edges are mapped to obstacles to create a layout. The proposal is described as having a significant impact and being highly effective with object detection APIs. The resulting framework is capable of detecting objects, and the boundaries in a solar panel layout are generated automatically with a pixel count variation of 25 less than the ground truth.

Waritchana Rakumthong et al. [14] proposed a new method supporting a smart surveillance system that can detect stolen and abandoned objects in public areas. The system is implemented using image processing techniques and immediately alerts responsible people such as guards. It has four major components: acquisition, processing, detection and presentation. An experiment was conducted to assess the usability and correctness of the system. In the experiment, the system takes video from CCTV and then detects objects using image processing techniques and decision-making. The outcome of processing can be viewed through a user interface on a computer screen or TV screen. Object classification achieved approximately 76% and event classification 83%.

Donghoon Kim et al. [15] worked on object detection and tracking for underwater robots using template matching. Because the underwater environment is noisy and has very low light, detection suffers from poor visibility. They proposed vision-based object tracking techniques for underwater robots using artificial objects, together with a novel weighted correlation approach using feature-based performance matching under different illumination conditions. The conventional method required a different threshold for matching different shapes, with thresholds above 0.8; the proposed model uses a threshold of 0.5. Since the underwater environment is noisy, pre-processing is done with camera calibration and Gaussian smoothing to compensate for distortion and noise.

In 2012, Krizhevsky et al. [16] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with AlexNet, after which deep learning dominated various aspects of computer vision. In 2014, Girshick et al. [13] showcased the advantages of convolutional neural networks by designing a new network for object detection, named Region-based Convolutional Neural Network (R-CNN). R-CNN relies on the Selective Search algorithm for region proposals, and training the network requires large annotated datasets, but the detection datasets available at that time were scarce. Girshick et al. [13] pre-train the CNN on the ImageNet 2012 classification dataset, which has only image-level annotations (no bounding boxes), and this solves the scarcity problem. The network is then modified and trained on two different datasets with bounding boxes, PASCAL VOC 201 and ImageNet 2013. To improve the training process of R-CNN, Girshick [11] proposed a faster object detection algorithm, Fast R-CNN, in 2015. In Fast R-CNN, the input image is fed to a single CNN with many convolutional layers, which generates a convolutional feature map. The main advantage of Fast R-CNN is that the entire image is processed by only one CNN, instead of running a CNN separately for every region of the image.

III. IMPLEMENTATION

Fig 1 Work Flow

Fig. 1 above shows the work flow of our work. In this paper, an image dataset was created from threatening object images taken from Google Images. We took nearly 78 object images, which were split into train and test images. We train the model on the train images and evaluate it on the test images. First, we label the images using the LabelImg tool: we draw a rectangular box around each object, which gives the coordinates of where the object lies. The annotation data is initially stored in XML format, one file per image, so the number of XML files equals the number of images. To avoid this complexity, we create a CSV file that holds the data for all the images. Since we are working with an image dataset, the data is large, so using a binary file format to store it has an important impact on the performance of the model. Binary data occupies less space on disk, takes less time to copy and can be read more efficiently from disk. So we convert the data into the TFRecord file format. The main advantage of the TFRecord format is that the data can be optimized in multiple ways. This is an advantage especially for datasets that are too large to hold in memory, as only the required data (e.g. a batch) is loaded from disk as and when required and then processed. After generating the TFRecords, we create a label map, which gives a unique id to each category to be identified. We used a pre-trained Faster R-CNN model. The default parameters are changed to suit our model through the config file. The model config block holds the configuration of the model; config files set the initial parameters and settings for a program. Here, each ModelConfig specifies one model to be served, including its name and path.

After initializing everything, we train our model based on our system configuration. Once training begins, the loss is reported at each step. It starts high and gets lower as training progresses. We trained the Faster R-CNN Inception V2 model. The loss started at about 3 and fell rapidly. We trained for nearly 5000 steps, until the loss was consistently below 0.1. It took almost twelve hours to train the model. The model may train faster on a more powerful CPU and GPU. After completing training, we export an inference graph using the checkpoints created while training the model. A checkpoint does not contain any description of the computation defined by the model. Weights from the checkpoint files are inserted into variable operations.

After training, we access the API from a client script by supplying the test images in which objects are to be detected. These images are sent to the TensorFlow Serving server. TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It makes deploying new algorithms and experiments straightforward while keeping the same server design and APIs. The input images are returned with bounding boxes around the objects. These bounding boxes are imaginary boxes around objects that are being checked for collision, such as pedestrians on or near the road, other signs and vehicles. Here, with threatening objects, the output image has boxes drawn around those threatening objects. Both a 2D coordinate system and a 3D coordinate system are in use. In digital image processing, the bounding box is simply the coordinates of the rectangular border that fully encloses the image when placed over a page, a screen, a canvas or other similar two-dimensional background. The bounding boxes are displayed based on the detection classes and detection scores. Object detection is the task of identifying instances of a particular class, such as humans or animals, in a video or image; the detection class differentiates between objects in an image or video. The detection score is used to interpret the outcome: for each detected object we can examine its score and location. The detection score ranges between 0 and 1 and gives the confidence that the object was genuinely detected. The nearer the score is to 1, the more confident the model is. We can choose a cut-off threshold, and detections below the threshold are discarded. The input image is returned as output based on the detection scores: where the detection score matches a label, the image is returned with the class shown on top of the bounding box.

Fig 2 Test Case 1 as Gun

Fig. 2 above shows one of the test cases: out of four guns, the model detected only one. At that point we had trained for only about 500 steps. When we checked the results after 500 steps, an image with only one gun yielded a greater number of detected objects. To get accurate results we need to train for more steps.

Fig 3 Result after training complete model

Fig. 3 above shows that four guns have been detected and the count of objects is also given as 4. Fig. 3 shows the output after the final model was built with a greater number of steps, which gives the correct detection and object count in the image.
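The thresholding and counting described above can be sketched as follows. This is a minimal illustration and not the code used in this work: the function name `count_detections`, the label map ids and the score values are hypothetical, and the inputs are assumed to be plain lists of per-detection scores and class ids, as could be taken from the `detection_scores` and `detection_classes` outputs of the Object Detection API.

```python
from collections import Counter


def count_detections(scores, classes, label_map, threshold=0.5):
    """Count detections per class label, keeping scores >= threshold."""
    counts = Counter()
    for score, class_id in zip(scores, classes):
        if score >= threshold:
            counts[label_map[class_id]] += 1
    return counts


# Hypothetical label map and detector output for one image
# (all values below are invented for illustration).
label_map = {1: "gun", 2: "knife"}
scores = [0.98, 0.95, 0.91, 0.88, 0.30, 0.12]
classes = [1, 1, 1, 1, 2, 1]

counts = count_detections(scores, classes, label_map, threshold=0.5)
total = sum(counts.values())
print(dict(counts), total)
```

With these invented values, the four detections scoring above 0.5 are kept and the two low-score detections are discarded, so the reported count equals the number of bounding boxes that would be drawn. Raising or lowering `threshold` trades missed objects against spurious boxes.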

Fig 4 Test Case 2 as Knife

Fig. 4 above shows one of the test cases: for an image containing one knife, the model detected two objects when there is actually only one in the image. At that point we had trained for only about 500 steps. When we checked the result after the full model was built with a greater number of steps, only the one knife was detected in the image and the count was reported correctly as well.

Fig 5 Result after training complete model

Fig. 5 above shows that only the knife has been detected. The count is always taken from the number of objects detected in the image, i.e., the number of bounding boxes in the image.

IV. RESULTS

The model is tested with test images as input using Faster R-CNN. As shown in Fig. 6 and Fig. 7, the model we built detected objects using Faster R-CNN. We have already seen some test cases in the implementation section. From those test cases we know that the more the model is trained, the better its performance. In testing we took 11 objects, of which 7 are guns and 4 are knives. Of the 7 guns, 6 are predicted as true positives, meaning the actual and predicted classes are the same, and 1 is wrongly classified. Similarly, for the knife objects, 1 is wrongly classified. This gives the model an accuracy of 81.81%. Now we test the model with some random images.

Fig. 6 below shows results for images with detected objects and the count of objects in an image. In the first figure, the input to the system is an image containing two objects. The output is predicted as knife with very good accuracy, and the count of objects is the same as in the input image.

Fig 6 Detected Objects and Count in Image

As we have two classes, we then give an input of the other class, gun. We had given the same image as input earlier, when we stopped training the model at a smaller number of steps; it then predicted three guns in the image, whereas it now shows only two after the model has been fully trained.

Fig 7 Detected Objects and Count in Image

V. CONCLUSION

In this paper, we built a model using Faster R-CNN for the detection of threatening objects in an image. We built the model using the Object Detection API. We trained the model for nearly 4500 steps to reach a loss under 0.1, which took twelve hours, and when tested on the test images it performs well. Future work includes further improving the efficiency of the model by training on a larger number of images and for a higher number of steps for better results.
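As a check on the accuracy reported in Section IV, the figure can be recomputed from the per-class counts. The sketch below assumes our reading of the text, namely that 6 of the 7 guns and 3 of the 4 knives were classified correctly; the function and variable names are illustrative only.

```python
def accuracy(correct_per_class, total_per_class):
    """Overall accuracy: correctly classified objects over all objects."""
    correct = sum(correct_per_class.values())
    total = sum(total_per_class.values())
    return correct / total


# Counts as read from Section IV (assumed interpretation of the text).
total_per_class = {"gun": 7, "knife": 4}
correct_per_class = {"gun": 6, "knife": 3}

acc = accuracy(correct_per_class, total_per_class)
print(f"{acc * 100:.2f}%")
```

Under this reading, 9 of the 11 objects are correct, i.e. 9/11 ≈ 0.8182, which agrees with the reported 81.81% when the percentage is truncated rather than rounded.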

REFERENCES
[1] Raghunandan, Apoorva, Pakala Raghav, and HV Ravish Aradhya. "Object Detection Algorithms for video surveillance applications." In 2018 International Conference on Communication and Signal Processing (ICCSP), pp. 0563-0568. IEEE, 2018.
[2] Yu, Liyan, Xianqiao Chen, and Sansan Zhou. "Research of Image Main Objects Detection Algorithm Based on Deep Learning." In 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), pp. 70-75. IEEE, 2018.
[3] Kim, Jung Uk, Jungsu Kwon, Hak Gu Kim, Haesung Lee, and Yong Man Ro. "Object Bounding Box-Critic Networks for Occlusion-Robust Object Detection in Road Scene." In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 1313-1317. IEEE, 2018.
[4] Deshmukh, Shantanu, and Teng-Sheng Moh. "Fine object detection in automated solar panel layout generation." In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1402-1407. IEEE, 2018.
[5] Abhilash, M. S. K., Amrita Thakur, Deepa Gupta, and B. Sreevidya. "Time Series Analysis of Air Pollution in Bengaluru Using ARIMA Model." In Ambient Communications and Computer Systems, pp. 413-426. Springer, Singapore, 2018.
[6] Aki, Aravindh, D. Krishna Mohan Reddy, Y. Koushik Reddy, C. R. Kavitha, and T. Sasikala. "Analyzing the real time electricity data using data mining techniques." In 2017 International Conference On Smart Technologies For Smart Nation (SmartTechCon), pp. 545-549. IEEE, 2017.
[7] Venkataraman, D., Nandina Vinay, TV Vamsi Vardhan, Sai Phanindra Boppudi, R. Yogesh Reddy, and P. Balasubramanian. "Yarn price prediction using advanced analytics model." In 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1-8. IEEE, 2016.
[8] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. "Ssd: Single shot multibox detector." In European conference on computer vision, pp. 21-37. Springer, Cham, 2016.
[9] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster r-cnn: Towards real-time object detection with region proposal networks." In Advances in neural information processing systems, pp. 91-99. 2015.
[10] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9. 2015.
[11] Girshick, Ross. "Fast r-cnn." In Proceedings of the IEEE international conference on computer vision, pp. 1440-1448. 2015.
[12] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet large scale visual recognition challenge." International journal of computer vision 115, no. 3 (2015): 211-252.
[13] Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. "Rich feature hierarchies for accurate object detection and semantic segmentation." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587. 2014.
[14] Rakumthong, Waritchana, Natpaphat Phetcharaladakun, Wichuda Wealveerakup, and Nawat Kamnoonwatana. "Unattended and stolen object detection based on relocating of existing object." In 2014 Third ICT International Student Project Conference (ICT-ISPC), pp. 115-118. IEEE, 2014.
[15] Kim, Donghoon, Donghwa Lee, Hyun Myung, and Hyun-Tak Choi. "Object detection and tracking for autonomous underwater robots using weighted template matching." In OCEANS, 2012-Yeosu, pp. 1-5. IEEE, 2012.
[16] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. "Imagenet classification with deep convolutional neural networks." In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.
[17] Koh, Jia Juang, Timothy Tzen Vun Yap, Hu Ng, Vik Tor Goh, Hau Lee Tong, Chiung Ching Ho, and Thiam Yong Kuek. "Autonomous Road Potholes Detection on Video." In Computational Science and Technology, pp. 137-143. Springer, Singapore, 2019.
[18] Oyebode, Kazeem, Shengzhi Du, Barend Jacobus Van Wyk, and Karim Djouani. "A sample-free Bayesian-like model for indoor environment recognition." IEEE Access 7 (2019): 79783-79790.
