Object Detection with TensorFlow
o D. HARI VAMSHI
o V. RAJU
o U. LAXMAN
Agenda
Ø Intro
Ø What is Object Detection
Ø State of Object Detection
Ø TensorFlow Object Detection API
Ø Preparing Data
Ø Training & Evaluating Models
Ø Links
What is Object Detection
Object detection is the task of identifying objects in an image and drawing bounding boxes around them, i.e. localizing them. It's a very important problem in computer vision due to its numerous applications, from self-driving cars to security and tracking.
Object detection = Object Classification + Object Localization
Approaches
▪ Classical approach (Haar features) - the first real-time object detection framework (Viola-Jones).
▪ Deep learning approach - deep learning is a subset of machine learning in artificial intelligence (AI) whose multi-layer neural networks learn features directly from raw, unstructured data such as images. It is also known as deep neural learning or deep neural networks. A few of the approaches used are:
▪ OverFeat
▪ R-CNN
▪ Fast R-CNN
▪ YOLO
▪ Faster R-CNN
▪ SSD and R-FCN
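The Haar features used by the classical Viola-Jones detector are differences between pixel sums over adjacent rectangles, and they are cheap to evaluate because an integral image is computed first, after which any rectangle sum takes four lookups. The plain-Python sketch below shows a horizontal two-rectangle feature on a hypothetical grayscale image stored as a list of rows; the function names are illustrative, and it covers only the feature computation, not the full Viola-Jones detector.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """Return ii where ii[y][x] is the sum of img over rows < y and columns < x.

    The table has an extra leading row and column of zeros, so rectangle
    sums need no boundary checks.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    # Hypothetical 4x4 image with a bright left half and a dark right half.
    img = [[9, 9, 1, 1] for _ in range(4)]
    print(two_rect_feature(integral_image(img), 0, 0, 4, 4))  # 72 - 8 = 64
```

A strong response (64 here) signals a vertical edge; on a uniform patch the same feature is 0.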
Deep learning approach
OverFeat - published in 2013, a multi-scale sliding-window algorithm using Convolutional Neural Networks (CNNs).
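A multi-scale sliding window can be pictured as sliding a fixed-size window over several rescaled copies of the image and mapping each position back to the original image. The sketch below enumerates those windows in plain Python; the window size, stride and scale factors are hypothetical parameters, and OverFeat itself performs the sliding efficiently inside the convolutional network rather than cropping windows one by one.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in original-image pixels


def sliding_windows(img_w: int, img_h: int, win: int, stride: int,
                    scales: List[float]) -> Iterator[Box]:
    """Yield square windows slid over several (conceptual) rescalings of the image."""
    for scale in scales:
        w, h = int(img_w * scale), int(img_h * scale)  # size of the rescaled image
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                # Map the window position back to original-image coordinates.
                yield (round(x / scale), round(y / scale),
                       round((x + win) / scale), round((y + win) / scale))


if __name__ == "__main__":
    boxes = list(sliding_windows(64, 64, win=32, stride=16, scales=[1.0, 0.5]))
    print(len(boxes))  # 9 windows at scale 1.0 + 1 window at scale 0.5 = 10
```

Smaller scales make the same window cover a larger part of the original image, which is how one window size can detect objects of several sizes.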
R-CNN - Regions with CNN features. A three-stage approach:
- Extract possible objects using a region proposal method (the most popular one being Selective Search).
- Extract features from each region using a CNN.
- Classify each region with SVMs.
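The three R-CNN stages can be laid out as a small pipeline. In this sketch the region proposals are passed in directly (standing in for Selective Search), `extract_features` is a hypothetical stand-in for the CNN, and each class has its own linear SVM that scores a region as w · x + b. Keeping only the best-scoring class per region is a simplification for illustration, not the paper's exact post-processing.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Vector = Sequence[float]


def svm_score(weights: Vector, bias: float, features: Vector) -> float:
    """Decision value of a trained linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def rcnn_detect(image, proposals: List[Box],
                extract_features: Callable[[object, Box], Vector],
                svms: Dict[str, Tuple[Vector, float]]) -> List[Tuple[Box, str, float]]:
    """Run the three R-CNN stages on one image.

    Stage 1: `proposals` come from a region proposal method.
    Stage 2: `extract_features` turns each region into a feature vector (the CNN).
    Stage 3: every per-class SVM scores the region.
    """
    detections = []
    for box in proposals:
        feats = extract_features(image, box)  # stage 2
        scores = {cls: svm_score(w, b, feats) for cls, (w, b) in svms.items()}  # stage 3
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # simplification: keep the best class if its SVM fires
            detections.append((box, best, scores[best]))
    return detections


def fake_cnn(image, box: Box) -> Vector:
    """Hypothetical feature extractor: the (width, height) of the region."""
    return (box[2] - box[0], box[3] - box[1])


if __name__ == "__main__":
    svms = {"wide": ((1.0, -1.0), 0.0), "tall": ((-1.0, 1.0), 0.0)}
    proposals = [(0, 0, 40, 10), (0, 0, 10, 40), (0, 0, 20, 20)]
    for det in rcnn_detect(None, proposals, fake_cnn, svms):
        print(det)
```

The point to notice is that the CNN runs once per proposal, which is the cost Fast R-CNN later removes.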
Fast R-CNN - Similar to R-CNN, it uses Selective Search to generate object proposals, but instead of extracting features from each of them independently and classifying them with SVMs, it applies the CNN once to the complete image, then uses Region of Interest (RoI) Pooling on the feature map, followed by a final feed-forward network for classification and regression.
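RoI Pooling is what lets a region of any size on the shared feature map produce a fixed-size input for the final feed-forward network: the region is split into a fixed grid of roughly equal bins and each bin is max-pooled. A single-channel, pure-Python sketch, assuming the RoI is already given in feature-map cells (a real layer also maps image coordinates onto the feature map and processes every channel); the function names are illustrative.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]


def _bin_range(k: int, size: int, bins: int) -> range:
    """Cells covered by bin k when `size` cells are split into `bins` bins.

    Uses floor(k * size / bins) .. ceil((k + 1) * size / bins) in integer
    arithmetic, so neighbouring bins may share a cell but are never empty.
    """
    start = (k * size) // bins
    end = -((-(k + 1) * size) // bins)
    return range(start, end)


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> FeatureMap:
    """Max-pool roi = (x1, y1, x2, y2), inclusive cells, into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_w, roi_h = x2 - x1 + 1, y2 - y1 + 1
    pooled = []
    for i in range(out_h):
        ys = _bin_range(i, roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = _bin_range(j, roi_w, out_w)
            row.append(max(fmap[y1 + y][x1 + x] for y in ys for x in xs))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15
    print(roi_max_pool(fmap, (0, 0, 3, 3), 2, 2))  # [[5, 7], [13, 15]]
```

Whatever the RoI's size, the output is always out_h x out_w, so every proposal can share the same fully connected layers.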
YOLO - You Only Look Once: a simple convolutional neural network approach with both good results and high speed, which made real-time object detection with a deep network practical for the first time.
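In the original YOLO formulation the image is divided into an S × S grid, and the grid cell containing an object's centre is responsible for predicting that object; the box centre is encoded relative to that cell and the width and height relative to the whole image. A sketch of that target encoding, assuming S = 7 (the value used in the first YOLO paper) and boxes given as pixel corners (x1, y1, x2, y2); the function name is hypothetical.

```python
from typing import Tuple


def yolo_cell_target(box: Tuple[float, float, float, float],
                     img_w: float, img_h: float, S: int = 7):
    """Return (row, col, x_off, y_off, w_rel, h_rel) for one ground-truth box.

    row, col: the grid cell containing the box centre (responsible for it).
    x_off, y_off: centre position inside that cell, in [0, 1].
    w_rel, h_rel: box width and height relative to the whole image.
    """
    x1, y1, x2, y2 = box
    gx = (x1 + x2) / 2 * S / img_w  # centre in grid units
    gy = (y1 + y2) / 2 * S / img_h
    col = min(int(gx), S - 1)  # clamp a centre lying exactly on the right edge
    row = min(int(gy), S - 1)  # clamp a centre lying exactly on the bottom edge
    return row, col, gx - col, gy - row, (x2 - x1) / img_w, (y2 - y1) / img_h


if __name__ == "__main__":
    print(yolo_cell_target((100, 100, 300, 200), 448, 448))
```

For a 448 × 448 image, the box (100, 100, 300, 200) has its centre in row 2, column 3, at offset (0.125, 0.34375) inside that cell.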
Faster R-CNN - Faster R-CNN adds what the authors call a Region Proposal Network (RPN), in an attempt to get rid of the Selective Search algorithm and make the model completely trainable end-to-end.
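At each position of the shared feature map, the RPN scores a fixed set of reference boxes called anchors and regresses offsets from them. The Faster R-CNN paper uses 3 scales × 3 aspect ratios, i.e. 9 anchors per position; the sketch below generates them assuming scales of 128, 256 and 512 pixels, height/width ratios of 0.5, 1 and 2, and a feature-map stride of 16 pixels (the values reported for the VGG-16 backbone). The scoring and regression heads are not shown, and the function names are illustrative.

```python
import math
from itertools import product
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def anchors_at(cx: float, cy: float, scales=(128, 256, 512),
               ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Anchors centred at (cx, cy): each has area scale**2 and height/width = ratio."""
    boxes = []
    for scale, ratio in product(scales, ratios):
        w = scale / math.sqrt(ratio)
        h = scale * math.sqrt(ratio)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_w: int, feat_h: int, stride: int = 16) -> List[Box]:
    """Anchors for every feature-map cell, expressed in input-image coordinates."""
    out = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Centre of this feature-map cell in input-image pixels.
            out.extend(anchors_at((fx + 0.5) * stride, (fy + 0.5) * stride))
    return out


if __name__ == "__main__":
    print(len(all_anchors(4, 3)))  # 4 * 3 positions * 9 anchors = 108
```

Because anchors are fixed, the RPN only has to learn an objectness score and small corrections per anchor, which is what makes proposal generation trainable together with the detector.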
SSD and R-FCN
Finally, there are two notable papers: Single Shot Detector (SSD), which takes on YOLO by using multiple-sized convolutional feature maps, achieving better results and speed; and Region-based Fully Convolutional Networks (R-FCN), which take the architecture of Faster R-CNN but use only convolutional networks.
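SSD attaches default boxes of a different scale to each of its feature maps, so that coarse maps handle large objects and fine maps handle small ones. The SSD paper spaces the scales linearly, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) for k = 1 … m, with s_min = 0.2 and s_max = 0.9, and derives each default box's width and height from the scale and an aspect ratio a as (s_k √a, s_k / √a). A sketch of both rules; m = 6 feature maps and the ratio list are assumptions for illustration.

```python
import math
from typing import List, Tuple


def ssd_scales(m: int = 6, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_sizes(scale: float,
                      ratios=(1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)) -> List[Tuple[float, float]]:
    """(width, height) of each default box, as fractions of the input image size."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in ratios]


if __name__ == "__main__":
    for k, s in enumerate(ssd_scales(), start=1):
        print(k, round(s, 2), [(round(w, 2), round(h, 2)) for w, h in default_box_sizes(s)])
```

With the defaults, `ssd_scales()` gives scales of roughly 0.2, 0.34, 0.48, 0.62, 0.76 and 0.9.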
Introduction
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. We train and process the data with the help of the TensorFlow Object Detection API.
Creating a dataset
We can either create our own dataset or use a predefined one and work with it using the TensorFlow package. The dataset considered in this project is CIFAR-10, which consists of 60,000 32×32 colour images, each labelled with one of 10 classes (CIFAR-10 provides class labels, not bounding boxes).
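To train on these classes, the TensorFlow Object Detection API needs a label map: a text-format protobuf file that assigns each class name an integer id, starting at 1 because 0 is reserved for the background. The sketch below produces that text for the ten CIFAR-10 classes in their standard label order; the function name is hypothetical, and saving the result as a `.pbtxt` file is left to the reader.

```python
from typing import List

# CIFAR-10 class names in their standard label order (0..9).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def make_label_map(class_names: List[str]) -> str:
    """Build label-map text; ids start at 1 because 0 is the background class."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


if __name__ == "__main__":
    print(make_label_map(CIFAR10_CLASSES))
```

Note the shift by one: CIFAR-10 label 3 ("cat") becomes id 4 in the label map.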
Dataset
Ø The TensorFlow Object Detection API uses the TFRecord file format.
Ø Scripts are available to convert data in the PASCAL VOC and Oxford Pet formats.
Ø For other formats, an explanation of the format is available in the git repo.
Ø Input data to create a TFRecord: annotated images.
The dataset being considered in this module is CIFAR-10.
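On disk, a TFRecord file is a sequence of length-prefixed binary records, each guarded by a masked CRC-32C checksum; for the Object Detection API each record's payload is a serialized `tf.train.Example`. The dependency-free sketch below writes and reads that framing for arbitrary byte strings to show what the format is; in practice `tf.io.TFRecordWriter` and `tf.data.TFRecordDataset` do this for you, and the function names here are illustrative.

```python
import io
import struct
from typing import BinaryIO, Iterator, List

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _make_table() -> List[int]:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C (Castagnoli)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord's masked CRC: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f: BinaryIO, payload: bytes) -> None:
    header = struct.pack("<Q", len(payload))          # uint64 length, little-endian
    f.write(header)
    f.write(struct.pack("<I", masked_crc(header)))    # uint32 CRC of the length
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))   # uint32 CRC of the data


def read_records(f: BinaryIO) -> Iterator[bytes]:
    while True:
        header = f.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (len_crc,) = struct.unpack("<I", f.read(4))
        if len_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = f.read(length)
        (data_crc,) = struct.unpack("<I", f.read(4))
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt record payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for p in [b"first", b"second"]:
        write_record(buf, p)
    buf.seek(0)
    print(list(read_records(buf)))  # [b'first', b'second']
```

The checksums let a reader detect truncated or corrupted files instead of silently feeding bad examples into training.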
Creating TFRecord
The TensorFlow Object Detection API repo contains a dataset_tools folder with scripts to convert common data structures into TFRecord.
Images can be converted into a TFRecord when the input files are JPEG or PNG images; each image and its annotations are then stored as a record that TensorFlow can read.
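The conversion scripts in dataset_tools store each annotated image as a `tf.train.Example` whose feature keys follow a fixed naming scheme, with box corners normalised to [0, 1] by the image width and height. A dependency-free sketch that gathers those fields into a plain Python dict; the helper name and sample values are hypothetical, and in a real script each list would be wrapped in a `tf.train.Feature` before serialization.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def example_features(filename: str, encoded_jpeg: bytes, width: int, height: int,
                     boxes: List[Box], class_names: List[str],
                     label_map: Dict[str, int]) -> Dict[str, list]:
    """Collect the per-image fields an Object Detection API TFRecord expects."""
    return {
        "image/height": [height],
        "image/width": [width],
        "image/filename": [filename.encode("utf8")],
        "image/source_id": [filename.encode("utf8")],
        "image/encoded": [encoded_jpeg],
        "image/format": [b"jpeg"],
        # Box corners are stored normalised to [0, 1] by the image size.
        "image/object/bbox/xmin": [b[0] / width for b in boxes],
        "image/object/bbox/xmax": [b[2] / width for b in boxes],
        "image/object/bbox/ymin": [b[1] / height for b in boxes],
        "image/object/bbox/ymax": [b[3] / height for b in boxes],
        "image/object/class/text": [n.encode("utf8") for n in class_names],
        "image/object/class/label": [label_map[n] for n in class_names],
    }


if __name__ == "__main__":
    feats = example_features("cat.jpg", b"<jpeg bytes>", 200, 100,
                             [(50, 25, 150, 75)], ["cat"], {"cat": 4})
    print(feats["image/object/bbox/xmin"], feats["image/object/class/label"])
```

Normalised coordinates keep the annotations valid when the model later resizes the image to its own input size.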
Max-norm
The maximum norm, also called max-norm or maxnorm, is a popular weight constraint because it is less aggressive than other norm constraints such as the unit norm: it simply sets an upper bound on the norm of the weights.
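Concretely, a max-norm constraint caps the L2 norm of each unit's incoming weight vector at a constant c: after a weight update, a vector whose norm exceeds c is rescaled back to norm c, and vectors already inside the bound are left unchanged. Keras exposes this as `tf.keras.constraints.MaxNorm`; the dependency-free sketch below applies essentially the same rule to a single weight vector, with c = 3.0 chosen arbitrarily and a hypothetical function name.

```python
import math
from typing import List


def apply_max_norm(weights: List[float], c: float = 3.0) -> List[float]:
    """Rescale `weights` so that its L2 norm is at most c."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= c:
        return list(weights)           # already within the bound: unchanged
    scale = c / norm
    return [w * scale for w in weights]  # projected onto the sphere of radius c


if __name__ == "__main__":
    print(apply_max_norm([3.0, 4.0], c=1.0))  # norm 5 is scaled down to norm 1
```

Unlike a unit-norm constraint, small weight vectors are never scaled up, which is why max-norm is described as less aggressive.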
TRAINING DATA
One model for two tasks?
Target vector: y = [Po, bx1, bx2, by1, by2, c1, c2, c3, …, cn]
- Po: whether an object exists
- bx1, bx2, by1, by2: bounding box coordinates
- c1 … cn: the object's class variables
Object classification - the output is one number, the index of a class.
Object localization - the output is four numbers, the coordinates of the bounding box.
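The single model for two tasks can be trained against one target vector per image, in the order Po, bx1, bx2, by1, by2, c1 … cn. A sketch of building it, assuming one object per image, box coordinates already normalised to [0, 1], and a one-hot encoding of the class; when no object is present Po = 0 and the remaining entries are placeholders (zeros here) that the loss is expected to ignore. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalised to [0, 1]


def make_target(box: Optional[Box], class_index: Optional[int],
                num_classes: int) -> List[float]:
    """Build y = [Po, bx1, bx2, by1, by2, c1, ..., cn] for one training image."""
    if box is None:
        # No object: Po = 0; box and class entries are "don't care" for the loss.
        return [0.0] * (1 + 4 + num_classes)
    x1, y1, x2, y2 = box
    one_hot = [0.0] * num_classes
    one_hot[class_index] = 1.0
    return [1.0, x1, x2, y1, y2, *one_hot]


if __name__ == "__main__":
    print(make_target((0.1, 0.2, 0.5, 0.6), class_index=2, num_classes=3))
```

The classification part of the network is trained on c1 … cn, the localization part on the four box entries, and Po gates whether either term counts.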
Selecting a model
The TensorFlow OD API provides a collection of detection models pre-trained on the COCO dataset, the KITTI dataset, and the Open Images dataset.
- Model name: corresponds to the config file that was used to train the model.
- Speed: running time in ms per 600x600 image.
- mAP: mean average precision, which indicates how well the model performed on the COCO dataset.
- Output types: boxes, and masks if applicable.
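The mAP figure combines two ideas: detections are matched to ground-truth boxes by intersection over union (IoU), and the resulting precision/recall curve for each class is summarised as an average precision (AP), which is then averaged over classes. The sketch below computes IoU and a single-class AP at one IoU threshold (0.5 assumed) using all-point interpolation; the COCO metric reported for the pre-trained models is stricter (it also averages over IoU thresholds from 0.5 to 0.95), so treat this as an illustration rather than the official evaluator. Function names are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[str, float, Box]],
                      ground_truth: Dict[str, List[Box]], iou_thr: float = 0.5) -> float:
    """Single-class AP. detections: (image_id, score, box); ground_truth: image_id -> boxes."""
    n_gt = sum(len(v) for v in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    for img, _, box in sorted(detections, key=lambda d: -d[1]):  # highest score first
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(ground_truth.get(img, [])):
            o = iou(box, g)
            if o >= best_iou and not used[img][j]:  # greedy match to an unmatched box
                best_j, best_iou = j, o
        if best_j >= 0:
            used[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing from right to left,
    # then sum precision times each recall increment.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap


def mean_ap(per_class_ap: Dict[str, float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

With two ground-truth boxes and three ranked detections (a hit, a miss, then a hit), the sketch returns an AP of 5/6.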
Training & Evaluating
# From the tensorflow/models/research directory
python object_detection/train.py \
--logtostderr \
--pipeline_config_path=/tensorflow/models/object_detection/samples/configs/ssd_mobilenet_v1_pets.config \
--train_dir=${PATH_TO_ROOT_TRAIN_FOLDER}
# From the tensorflow/models/research directory
python object_detection/eval.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--checkpoint_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}
Facial Recognition:
A deep learning facial recognition system called “DeepFace” was developed by a group of researchers at Facebook; it identifies human faces in a digital image very effectively. Google uses its own facial recognition system in Google Photos, which automatically groups photos by the person in the image. Facial recognition relies on various facial components, such as the eyes, nose, mouth and eyebrows.
Self-Driving Cars:
Self-driving cars are the future; there's no doubt about that. But how they work is very tricky, as they combine a variety of techniques to perceive their surroundings, including radar, laser light, GPS, odometry, and computer vision.
Advanced control systems interpret this sensory information to identify appropriate navigation paths as well as obstacles, and once the image sensor detects any sign of a living being in the car's path, the car automatically stops. This happens very quickly and is a big step towards driverless cars.
Security: Object detection plays a very important role in security, be it Apple's Face ID or the retina scans seen in sci-fi movies.
It is also used by governments to access security feeds and match them against existing databases to find criminals or to detect a robber's vehicle.
The applications are limitless.
Links
▪ https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
▪ https://www.kdnuggets.com/2017/10/deep-learning-object-detection-comprehensive-review.html
▪ http://www.machinelearninguru.com/deep_learning/tensorflow/basics/tfrecord/tfrecord.html
▪ https://www.coursera.org/learn/convolutional-neural-networks
▪ https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852
▪ https://towardsdatascience.com/evolution-of-object-detection-and-localization-algorithms-e241021d8bad
▪ https://medium.freecodecamp.org/how-to-play-quidditch-using-the-tensorflow-
ANY QUERIES?