Seminar Report On Object Detection and Tracking

Object Detection And Tracking
Seminar Report Submitted in Partial Fulfillment of

the Requirements for the Degree of
Bachelor of Engineering
in
Computer Science And Engineering
Submitted by
Pravin Kumar: (Roll No. 19UCSE4012)
Under the Supervision of

Abhisek Gour
Assisstant Professor
Department of Computer Science and Engineering

MBM University, Jodhpur
Apr, 2022
This page was intentionally left blank.
Department of Computer Science & Engineering
M.B.M. Engineering College, Jai Narain Vyas University

Ratanada, Jodhpur, Rajasthan, India –342011
CERTIFICATE
This is to certify that the work contained in this report entitled “Object Detection And
Tracking” is submitted by the group members Mr. Pravin Kumar (Roll. No:
19UCSE4012) to the Department of Computer Science & Engineering, M.B.M.
Engineering College, Jodhpur, for the partial fulfillment of the requirements for the
degree of Bachelor of Engineering in Information Technology.
They have carried out their work under my supervision. This work has not been
submitted else-where for the award of any other degree or diploma.
The project work in our opinion, has reached the standard fulfilling of the requirements
for the degree of Bachelor of Engineering in Information Technology in accordance
with the regulations of the Institute.
Abhisek Gour
Assistant Professor
(Professor)
Dept. of Computer Science & Engg.
M.B.M. Engineering College, Jodhpur
N.C. Barwar
(Head)
Dept. of Computer Science & Engg.
M.B.M. Engineering College, Jodhpur
ii
iii
DECLARATION
I, Pravin Kumar, hereby declare that this seminar/project titled “Object Detection And
Tracking” is a record of original work done by me under the supervision and guidance
of Prof. Abhisek Gour.
I, further certify that this work has not formed the basis for the award of the
Degree/Diploma/Associateship/Fellowship or similar recognition to any candidate of
any university and no part of this report is reproduced as it is from any other source
without appropriate reference and permission.
SIGNATURE OF STUDENT
(Pravin Kumar)
7th Semester, CSE
Enroll. - < Enroll No>
Roll No. - 19UCSE4012
iv
v
ACKNOWLEDGEMENT
I, Pravin Kumar would sincerely like to thank Dr. Nemi Chand

Barwar, Head of Department, Computer Science & Engineering,
MBM University, Jodhpur for the support and availability of
facilities by the department. I wish to express my deepest sense of
gratitude to Asst. Prof. Abhisek Gour, for his able guidance and
useful suggestions, that helped me in completing the seminar
work, on time. His guidance, encouragement, suggestion, and
constructive criticism have contributed immensely to the
evolution of my ideas on the report.
Finally, yet most importantly, I would like to express my heartfelt

thanks to my family, friends, and peers for their blessings, wishes,
and support for the successful completion of this report.
vi
vii
ABSTRACT
Object detection and tracking is one of the critical areas of research due to
routine change in motion of object and variation in scene size, occlusions,
appearance variations, and object motion and illumination changes.
Specifically, feature selection is the vital role in object tracking. It is related
to many real time applications like vehicle perception, video surveillance
and so on. In order to overcome the issue of detection, tracking related to
object movement and appearance. Most of the algorithm focuses on the
tracking algorithm to smoothing the video sequence. On the other hand,
few methods use the prior available information about object shape, color,
texture and so on.
viii
ix
Contents
1. Introduction to the Topic (6-8 pages) 1
1.1. Object detection and tracking 1
1.1.1. Object detection

1.1.2. Object tracking
2. History, Evolution & Technical Details (4-5 Pages) 9
3. Similar Technologies (4-5 Pages) 21
4. Applications, Pros & Cons (3-4 Pages) 29
5. Summary of Study (1-2 Pages) 38
References…………………………………………………………………….. 40
x
xi
List of Figures
1.1 Object Detection With Label 1 5

2.1 harr like features 2 9
2.3 part based detection 10
2.4 yolo detection 12
3.1 Lidar object detection 21
3.2 Radar object detection 23
3.3 Sonar object detection 25
4.1 character recognition 29
4.2 self driving car 30
4.3 tracking objects with label 30
4.4 face detection 31
4.5 iris code 32
4.6 smile detection 33
4.7 object detection in health care 33
4.8 ball tracking 34
4.9 automated cctv 34
4.9.1 robot 35
xii
xiii
Project Report Template (B.E.)
Chapter 1
INTRODUCTION
Object detection is the process of detecting a target object in an image or a single frame of
the video. Object tracking refers to the ability to estimate or predict the position of a target
object in each consecutive frame in a video once the initial position of the target object is
defined.
Object tracking, using video sensing technique, is one of the major areas of research due
to its increased commercial applications such as surveillance systems, Mobile Robots,
Medical therapy, security systems and driver assistance systems. Object tracking, by
definition, is to track an object (or multiple objects) over a sequence of images.
Tracking is usually performed on higher-level applications that require the location and
object in every frame. The most popular application in this area is vision-based
surveillance, to help understand the movement patterns of people with suspicious
actions. Traffic scene analysis is also a well-known application, to get the tracking
information for keeping the vehicles in lane and preventing the accidents. Thus, object
detection and tracking under dynamic conditions is still a challenge for real-time
performance which requires the computational complexity to be minimum. Various
methods for object detection have been proposed; such as feature-based, template-based
object detection and background subtraction. But selection of the best technique for a
specific application is relative and dependent upon the hardware resources and scope of
the application. Feature-based detection searches for corresponding features in
successive frames, including Harris corner, edges, SIFT, contours or colour pixels.
Background subtraction is a popular method which uses static background and
calculating the difference between the hypotheses background and the current image.
This approach is fast and good for fixed background but it cannot deal with the dynamic
environment, with different illumination and motions of small objects. The goal of
tracking is to establish a correspondence between the detected target objects of images
Chapter 1: Introduction 1
over frames. Tracking using mean shift kernel is also introduced. This method performs
well when there is occlusion, which can be solved using templates. Camshift
(Continously Adaptive Meanshift) can track a single object fast and robust using color
features, but it is ineffective for occlusion. There is also research on appearance-based
object detection. It uses whole 2-D images to perform tracking for navigation in faster
time. However, this kind of approach requires several templates and does not work
when the target object, color or perspective view is changed. The main problem in
object detection and tracking are the temporal variation of objects due to perspective,
occlusion, interaction between objects and appearance or disappearance of objects. That
cause the appearance of a target tends to change during a long tracking. The background
in a long image sequence is also dynamic even if it is taken by a stationary camera.
Detection and tracking of multiple objects at the same time is an important issue for
real-time performance. The comprehensive search in multiple tracking is computational
expensive and incapable of being real-time system. Another issue is when using the
moving camera, instead of using camera with a fixed location, which need the analysis
of camera platform coordinate system.
1.1 Object Detection And Tracking
Video surveillance is an active research topic in computer vision that tries to detect,
recognize and track objects over a sequence of images and it also makes an attempt to
understand and describe object behavior by replacing the aging old traditional method
of monitoring cameras by human operators. Object detection and tracking are important
and challenging tasks in many computer vision applications such as surveillance,
vehicle navigation and autonomous robot navigation. Object detection involves locating
objects in the frame of a video sequence. Every tracking method requires an object
detection mechanism either in every frame or when the object first appears in the video.
Object tracking is the process of locating an object or multiple objects over time using a
camera. The high powered computers, the availability of high quality and inexpensive
video cameras and the increasing need for automated video analysis has generated a
great deal of interest in object tracking algorithms. There are three key steps in video
analysis, detection interesting moving objects, tracking of such objects from each and
every frame to frame, and analysis of object tracks to recognize their behavior.Therefore,
the use of object tracking is pertinent in the tasks of, motion based recognition.
Automatic detection, tracking, and counting of a variable number of objects are crucial
tasks for a wide range of home, business, and industrial applications such as security,
surveillance, management of access points, urban planning, traffic control, etc. However,
these applications were not still playing an important part in consumer electronics. The
main reason is that they need strong requirements to achieve satisfactory working
conditions, specialized and expensive hardware, complex installations and setup
procedures, and supervision of qualified workers. Some works have focused on
developing automatic detection and tracking algorithms that minimizes the necessity of
supervision. They typically use a moving object function that evaluates each
hypothetical object configuration with the set of available detection without to explicitly
compute their data association. Thus, a considerable saving in computational cost is
achieved. In addition, the likelihood function has been designed to account for noisy,
false and missing detections. The field of machine (computer) vision is concerned with
problems that involve interfacing computers with their surrounding environment. One
such problem, surveillance, has an objective to monitor a given environment and report
the information about the observed activity that is of significant interest. In this respect,
video surveillance usually utilizes electro-optical sensors (video cameras) to collect
information from the environment. In a typical surveillance system, these video cameras
are mounted in fixed positions or on pan-tilt devices and transmit video streams to a
certain location, called monitoring room. Then, the received video streams are
monitored on displays and traced by human operators. However, the human operators
might face many issues, while they are monitoring these sensors. One problem is due to
the fact that the operator must navigate through the cameras, as the suspicious object
moves between the limited field of view of cameras and should not miss any other
object while taking it. Thus, monitoring becomes more and more challenging, as the
number of sensors in such a surveillance network increases. Therefore, surveillance
systems must be automated to improve the performance and eliminate such operator
errors. Ideally, an automated surveillance system should only require the objectives of
an application, in which real time interpretation and robustness is needed. Then, the
challenge is to provide robust and real-time performing surveillance systems at an
affordable price. With the decrease in costs of hardware for sensing and computing, and
the increase in the processor speeds, surveillance systems have become commercially
available, and they are now applied to a number of different applications, such as traffic
monitoring, airport and bank security, etc. However, machine vision algorithms
(especially for single camera) are still severely affected by many shortcomings, like
occlusions, shadows, weather conditions, etc. As these costs decrease almost on a daily
basis, multi-camera networks that utilize 3D information are becoming more available.
Although, the use of multiple cameras leads to better handling of these problems,
compared to a single camera, unfortunately, multi-camera surveillance is still not the
ultimate solution yet. There are some challenging problems within the surveillance
algorithms, such as background modeling, feature extraction, tracking, occlusion
handling and event recognition. Moreover, machine vision algorithms are still not
robust enough to handle fully automated systems and many research studies on such
improvements are still being done. This work focuses on developing a framework to
detect moving objects and generate reliable tracks from surveillance video. The problem
is most of the existing algorithms works on the gray scale video. But after converting
the RGB video frames to gray at the time of conversion, information loss occurs.The
main problem comes when background and the foreground both have approximately
same gray values. Then it is difficult for the algorithm to find out which pixel is
foreground pixel and which one background pixel. Sometimes two different colors such
as dark blue and dark violet, color when converted to gray scale, their gray values will
come very near to each other,it can’t be differentiated that which value comes from dark
blue and which comes from dark violet. However, if color images are taken then the
background and foreground color can be easily differentiated. So without losing the
color information this modified background model will work directly on the color
frames of the video.
Fig 1.1: Object detection with label
1.1.1 Object Detection
Every tracking method requires an object detection mechanism either in every frame or
when the object first appears in the video. A common approach for object detection is to
use information in a single frame. However, some object detection methods make use of
the temporal information computed from a sequence of frames to reduce the number of
false detection.
1. Point detectors-Point detectors are used to find interesting points in images which
have an expressive texture in their respective localities. A desirable quality of an interest
point is its in variance to changes in illumination and camera viewpoint. In literature,
commonly used interest point detectors include Moravec’s detector, Harris detector,
KLT detector, SIFT detector.
2. Background Subtraction-Object detection can be achieved by building a

representation of the scene called the background model and then finding deviations
from the model for each incoming frame. Any significant change in an image region
from the background model signifies a moving object. The pixels constituting the
regions undergoing change are marked for further processing. This process is referred to
as the background subtraction. There are various methods of background subtraction
as discussed in the this report are Frame differencing Region-based (or) spatial
information, Hidden Markov models (HMM) and Eigen space decomposition.
3. Segmentation-The aim of image segmentation algorithms is to partition the image

into perceptually similar regions. Every segmentation algorithm addresses two problems,
the criteria for a good partition and the method for achieving efficient partitioning. In
the literature survey it has been discussed various segmentation techniques that are
relevant to object tracking They are, mean shift clustering, and image segmentation
using Graph-Cuts (Normalized cuts) and Active contours. Object detection can be
performed by learning different object views automatically from a set of examples by
means of supervised learning mechanism.
1.1.2 Object Tracking
The aim of an object tracker is to generate the trajectory of an object over time by
locating its position in every frame of the video. But tracking has two definition one is
in literally it is locating a moving object or multiple object over a period of time using a
camera. Another one in technically tracking is the problem of estimating the trajectory
or path of an object in the image plane as it moves around a scene. The tasks of
detecting the object and establishing a correspondence between the object instances
across frames can either be performed separately or jointly. In the first case, possible
object region in every frame is obtained by means of an object detection algorithm, and
then the tracker corresponds objects across frames. In the latter case, the object region
and correspondence is jointly estimated by iteratively updating object location and
region information obtained from previous frames. There are different methods of
Tracking
Point tracking- Tracking can be formulated as the correspondence of detecting objects

represented by points across frames. Point tracking can be divided into two broad
categories, i.e. Deterministic approach and Statistical approach. Objects detected in
consecutive frames are represented by points, and the association of the points is based
on the previous object state which can include object position and motion.
Kernel tracking- Performed by computing the motion of the object, represented by a

primitive object region, from one frame to the next. Object motion is in the form of
parametric motion or the dense flow field computed in subsequent frames. Kernel
tracking methods are divided into two subcategories
Silhouette Tracking- It Provides an accurate shape description of the target objects.

The goal of silhouette tracker is to find the object region in each frame by means of an
object model generated using the previous frames. Silhouette trackers can be divided
into two categories i.e. Shape matching and Contour tracking.
Chapter 2: History And Evolution 8

Chapter 2
History And Evolution
In this story we will review the history of object detection from “traditional object
detection period (before 2014)” to “deep learning based detection period (after 2014)”.
2.1 Traditional Object Detection And Evolution
Viola Jones Detectors:
Developed in 2001 by Paul Viola and Michael Jones,this object recognition framework
allows the detection of human faces in real-time.It uses sliding windows to go through all
possible locations and scales in an image to see if any window contains a human
face.The sliding windows essentially searches for ‘haar-like’ features (named after
Alfred Haar who developed the concept of haar wavelets).
Fig 2.1 harr like features
Thus the haar wavelet is used as the feature representation of an image.To speed up
detection, it uses integral image , which makes the computational complexity of each
sliding window independent of its window size.Another trick to improve detection speed
that was used by the authors is to use Adaboost algorithm for feature selection which
selects a small set of features that are mostly helpful for face detection from a huge set of
random features pools.The algorithm also used Detection Cascades which is a multi-

stage detection paradigm to reduce its computational overhead by spending less

computations on background windows but more on face targets.
HOG Detector :
Fig 2.2 HOG Detection

Originally proposed in 2005 by N. Dalal and B. Triggs,Hog is an improvement of the
scale invariant feature transform and shape contexts of its time.HOG works with
something called blocks(similar to a sliding window) ,a dense pixel grid in which
gradients are constituted from the magnitude and direction of change in the intensities of
pixels within the block. HOGs are widely known for their use in pedestrian detection. To
detect objects of different sizes, the HOG detector rescales the input image for multiple
times while keeping the size of a detection window unchanged.
Deformable Part-based Model (DPM):
Fig 2.3 part based detection

DPM was originally proposed by P. Felzenszwalb in 2008 as an extension of the HOG

detector.Later a variety of improvements have been made by R. Girshick. The problem
of detecting a “car” can be broken down as a ‘divide and conquer’ strategy by detecting
its window, body, and wheels.DPM uses this strategy. Training process involves learning
a proper way of decomposing an object, and inference involves ensembling detections of
different object parts.
DPM detector consists of a root-filter and a number of part-filters. A weakly supervised

learning method is developed in DPM where all configurations(size , location etc.) of
part filters can be automatically learned as latent variables.To improve detection
accuracy, R. Girshick used a special case of Multi-Instance learning for this purpose ,
and some other important techniques such as “hard negative mining”, “bounding box
regression”, and “context priming”.Later he even used a technique that implements a
cascade architecture, which has achieved over 10 times acceleration without sacrificing
accuracy.
Deep Learning era:
Unfortunately object detection has reached a plateau after 2010 as the performance of
hand-crafted features became saturated.However in 2012, the world saw the rebirth of
convolutional neural networks and deep convolutional networks were successful at
learning robust and high-level feature representations of an image.The deadlocks of
object detection was broken in 2014 by the proposal of the Regions with CNN features
(RCNN) for object detection.In this deep learning era, object detection is grouped into
two genres: “two-stage detection” and “one-stage detection”.

It starts with the extraction of a set of object proposals (object candidate boxes) by
selective search.Then each proposal is rescaled to a fixed size image and fed into a pre-
trained CNN model to extract features.Finally, linear SVM classifiers are used to predict
the presence of an object within each region and to recognize object categories.
You Only Look Once (YOLO):
Fig 2.4 you only look detection

All of the previous object detection algorithms use regions to localize the object within
the image. The network does not look at the complete image, instead it looks at parts of
the image which have high probabilities of containing the object.
YOLO trains on full images and directly optimizes detection performance.With YOLO,
a single CNN simultaneously predicts multiple bounding boxes and class probabilities
for those boxes.It also predicts all bounding boxes across all classes for an image
simultaneously.It divides the input image into an S × S grid. If the center of an object
falls into a grid cell, that grid cell is responsible for detecting that object.Each grid cell
predicts B bounding boxes and confidence scores for those boxes. These confidence
scores reflect how confident the model is that the box contains an object and also how
accurate it thinks the box it predicted is.

We have briefly discussed the evolution of Object detection which is a very

challenging,highly complex as well highly evolving domain in computer vision.Every
year, new algorithms keep on outperforming the previous ones.Today, there is a plethora
of pre-trained models for object detection.Also object detection has found its application
in many interesting fields,including Tracking objects(like tracking a ball during a match
in the football world cup),Automated CCTV surveillance,Person Detection(used in
intelligent video surveillance frameworks),Vehicle Detection etc.It has also found its
application in autonomous driving,which is one of the most interesting and highly
anticipated innovations of the modern era.
Object Detection Methods: -
First step in the process of object tracking is to identify objects of interest in the video
sequence and to cluster pixels of these objects. Since moving objects are typically the
primary source of information, most methods focus on the detection of such objects.
Detailed explanation for various methods is given below.
A. Frame differencing
The presence of moving objects is determined by calculating the difference between two
consecutive images. Its calculation is simple and easy to implement. For a variety of
dynamic environments, it has a strong adaptability, but it is generally difficult to obtain
complete outline of moving object, responsible to appear the empty phenomenon, as a
result the detection of moving object is not accurate.
B. Optical Flow
Optical flow method is to calculate the image optical flow field, and do clustering
processing according to the optical flow distribution characteristics of image. This
method can get the complete movement information and detect the moving object from
the background better, however, a large quantity of calculation, sensitivity to noise,
poor antinoise performance, make it not suitable for real-time demanding occasions.
C. Background subtraction
First step for background subtraction is background modelling. It is the core of
background subtraction algorithm. Background Modeling must sensitive enough to

recognize moving objects [10]. Background Modeling is to yield reference model. This
reference model is used in background subtraction in which each video sequence is
compared against the reference model to determine possible Variation. The variations
between current video frames to that of the reference frame in terms of pixels signify
existence of moving objects. Currently, mean filter and median filter are widely used to
realize background modeling. The background subtraction method is to use the
difference method of the current image and background image to detect moving objects,
with simple algorithm, but very sensitive to the changes in the external environment and
has poor anti- interference ability. However, it can provide the most complete object
information in the case background is known. As describe in, background subtraction
has mainly two approaches:
1. Recursive algorithm - Recursive techniques do not maintain a buffer for background
estimation. Instead, they recursively update a single background model based on each
input frame. As a result, input frames from distant past could have an effect on the
current background model. Compared with non-recursive techniques, recursive
techniques require less storage, but any error in the background model can linger for a
much longer period of time. This technique includes various methods such as
approximate median, adaptive background, Gaussian of mixture
2. Non-Recursive Algorithm - A non-recursive technique uses a sliding-window
approach for background estimation. It stores a buffer of the previous L video frames,
and estimates the background image based on the temporal variation of each pixel
within the buffer. Non-recursive techniques are highly adaptive as they do not depend
on the history beyond those frames stored in the buffer. On the other hand, the storage
requirement can be significant if a large buffer is needed to cope with slow-moving
traffic.

Object Tracking Method -
Tracking can be defined as the problem of approximating the path of an object in the
image plane as it moves around a scene. The purpose of an object tracking is to generate
the route for an object above time by finding its position in every single frame of the
video. Object is tracked for object extraction, object recognition and tracking, and
decisions about activities. According to paper, Object tracking can be classified as point
tracking, kernel based tracking and silhouette based tracking. For illustration, the point
trackers involve detection in every frame; while geometric area or kernel based tracking
or contours-based tracking require detection only when the object first appears in the
scene. As described in, tracking methods can be divided into following categories:
A. Point Tracking In an image structure, moving objects are represented by their

feature points during tracking. Point tracking is a complex problem particularly in the
incidence of occlusions, false detections of object. Recognition can be done relatively
simple, by thresholding, at of identification of these points.
1. Kalman Filter

They are based on Optimal Recursive Data Processing Algorithm. The Kalman Filter
performs the restrictive probability density propagation. Kalman filter is a set of
mathematical equations that provides an efficient computational (recursive) means to
estimate the state of a process in several aspects: it supports estimations of past, present,
and even future states, and it can do the same even when the precise nature of the
modelled system is unknown. The Kalman filter estimates a process by using a form of
feedback control. The filter estimates the process state at some time and then obtains
feedback in the form of noisy measurements. The equations for Kalman filters fall in
two groups: time update equations and measurement update equations. The time update
equations are responsible for projecting forward (in time) the current state and error
covariance estimates to obtain the priori estimate for the next time step. The
measurement update equations are responsible for the feedback. Kalman filters always
give optimal solutions.
2. Particle Filtering
The particle filtering generates all the models for one variable before moving to the next
variable. Algorithm has an advantage when variables are generated dynamically and
there can be unboundedly numerous variables. It also allows for new operation of
resampling. One restriction of the Kalman filter is the assumption of state variables are
normally distributed (Gaussian). Thus, the Kalman filter is poor approximations of state
variables which do not Gaussian distribution. This restriction can be overwhelmed by
using particle filtering. This algorithm usually uses contours, color features, or texture
mapping. The particle filter is a Bayesian sequential importance Sample technique,
which recursively approaches the later distribution using a finite set of weighted trials.
It also consists of fundamentally two phases: prediction and update as same as Kalman
Filtering. It was developing area in the field of computer vision communal and applied
to tracking problematic and is also known as the Condensation algorithm
3. Multiple Hypothesis Tracking (MHT):

In MHT algorithm, several frames have been observed for better tracking outcomes
MHT is an iterative algorithm. Iteration begins with a set of existing track hypotheses.
Each hypothesis is a crew of disconnect tracks. For each hypothesis, a prediction of
object’s position in the succeeding frame is made. The predictions are then compared by

calculating a distance measure. MHT is capable of tracking multiple object, handles

occlusions and Calculating of Optimal solutions.
B. Kernel Based Tracking
Kernel tracking is usually performed by computing the moving object, which is
represented by a embryonic object region, from one frame to the next. The object
motion is usually in the form of parametric motion such as translation, conformal, affine,
etc. These algorithms diverge in terms of the presence representation used, the number
of objects tracked, and the method used for approximation the object motion. In real-
time, illustration of object using geometric shape is common. But one of the restrictions
is that parts of the objects may be left outside of the defined shape while portions of the
background may exist inside. This can be detected in rigid and non-rigid objects .They
are large tracking techniques based on representation of object, object
features ,appearance and shape of the object.
1. Simple Template Matching
Template matching [9][4] is a brute force method of examining the Region of Interest in
the video. In template matching, a reference image is verified with the frame that is
separated from the video. Tracking can be done for single object in the video and
overlapping of object is done partially. Template Matching is a technique for processing
digital images to find small parts of an image that matches, or equivalent model with an
image (template) in each frame. The matching procedure contains the image template
for all possible positions in the source image and calculates a numerical index that
specifies how well the model fits the picture that position. It can capable of dealing with
tracking single image and partial occlusion of object.
2. Mean Shift Method
Mean-shift tracking tries to find the area of a video frame that is locally most similar to
a previously initialized model. The image region to be tracked is represented by a
histogram. A gradient ascent procedure is used to move the tracker to the location that
maximizes a similarity score between the model and the current image region. In object
tracking algorithms target representation is mainly rectangular or elliptical region. It
contain target model and target candidate. To characterize the target color histogram is
chosen. Target model is generally represented by its probability density function (pdf).
Target model is regularized by spatial masking with an asymmetric kernel.

3. Support Vector Machine (SVM)

SVM [13] is a broad classification method which gives a set of positive and negative
training values. For SVM, the positive samples contain tracked image object, and the
negative samples consist of all remaining things that are not tracked. It can handle
single image, partial occlusion of object but necessity of a physical initialization and
necessity of training.
4. Layering based tracking
This is another method of kernel based tracking where multiple objects are tracked.
Each layer consists of shape representation (ellipse), motion such as translation and
rotation, and layer appearance, based on intensity. Layering is achieved by first
compensating the background motion such that the object’s motion can be estimated
from the rewarded image by means of 2D parametric motion. Every pixel’s probability
of calculated based on the object’s foregoing motion and shape features. It can capable
of tracking multiple images and fully occlusion of object.
C. Silhouette Based Tracking Approach

Some object will have complex shape such as hand, fingers, shoulders that cannot be
well defined by simple geometric shapes. Silhouette based methods afford an accurate
shape description for the objects. The aim of a silhouette-based object tracking is to find
the object region in every frame by means of an object model generated by the previous
frames. Capable of dealing with variety of object shapes, Occlusion and object split and
merge.
1. Contour Tracking
Contour tracking methods [9], iteratively progress a primary contour in the previous
frame to its new position in the current frame. This contour progress requires that
certain amount of the object in the current frame overlay with the object region in the
previous frame. Contour Tracking can be performed using two different approaches.
The first approach uses state space models to model the contour shape and motion. The
second approach directly evolves the contour by minimizing the contour energy using
direct minimization techniques such as gradient descent. The most significant advantage
of silhouettes tracking is their flexibility to handle a large variety of object shapes.
2. Shape Matching

These approaches examine for the object model in the existing frame. Shape matching
performance is similar to the template based tracking in kernel approach. Another
approach to Shape matching is to find matching silhouettes detected in two successive
frames. Silhouette matching, can be considered similar to point matching. Detection
based on Silhouette is carried out by background subtraction. Models object are in the
form of density functions, silhouette boundary, object edges. Capable of dealing with
single object and Occlusion handling will be performed in with Hough transform
techniques.


Chapter 3
Similar Technologies
3.1 Lidar
Fig 3.1 Lidar objects detection
* Uses laser beams: - LiDAR technology uses light pulses or laser beams to determine
the distance between the sensor and the object. The laser travels to the object and is
reflected back to the source and the time taken for the laser to be reflected back is then
used to calculate the distance.
Chapter 3: Similar Technology 21

* Measures precise distance measurements: - Because of the nature of the laser

pulses, LiDAR is mostly used to measure the exact distances of an object.
The laser pulses travel at the speed of light which increases the accuracy of
the measurements.
* Measures atmospheric densities and atmospheric currents: - LiDAR technology

can be used to measure atmospheric densities of various components such as aerosols
and other atmospheric gases. This is because the pulses are more accurate and have a
shorter wavelength that can be used to acquire accurate data.
* Used in obtaining 3D images with high resolution: - LiDAR technology is

capable of creating high-resolution images of an object at any surface and
this is why it is popularly used in mapping and other topographical uses.
Based on the speed of the laser pulses from LiDAR sensors, the data is
returned fast and with accurate results.
* It is adversely affected by smoke, rain, and fog: - Unlike RADAR technology,

LiDAR pulses are adversely affected by atmospheric weather conditions such as dense
fogs, smoke, and even rain. The light pulses will be distorted during flight and this will
affect the accuracy of the data collected.
* It has a higher measurement accuracy: - Unlike RADAR, LiDAR data has a higher
accuracy of measurement because of its speed and short wavelength. Also, LiDAR
targets specific objects which contributes to the accuracy of the data relayed.

* LiDAR is cheaper when used in different applications: - LiDAR technology is

cheaper when used in large-scale applications. This is because it is fast and saves a lot
of time and it is also not very labor-intensive unlike other methods of data collection.
* Data can be collected quickly: - Because of its speed and accuracy of the laser
pulses from LiDAR sensors, the data can be collected fast and with utmost accuracy.
This is why LiDAR sensors are used in high capacity and data-intensive applications.
* It does not have geometric distortions: - LiDAR sensors are highly accurate and
are therefore not affected by geometric distortions. The data collected will be precise
and accurate and will map the exact location of the object in the image.
* It can be integrated with other data sources: - LiDAR data can easily be integrated
with other data sources such as GPS and used in mapping and calculation of distances.
This can also be applied in forest mapping and other remote sensing technologies.
3.2 Radar

Fig 3.2 object detection using radar system
*Uses Electromagnetic waves: - RADAR technology uses electromagnetic waves or

radio signals to determine the distance and angle of inclination of objects on the surface.
* It can operate in cloudy weather conditions and during the night: - Unlike
LiDAR, RADAR technology is not affected by adverse weather conditions such as
clouds, rainfall, or fogs.
* It has a longer operating distance: - RADAR technology has a longer operating

distance although it takes a longer time to return data regarding the distance of the
object.
* Cannot detect smaller objects: - It does not allow the detection of smaller objects
due to longer wavelengths. This means that data regarding very tiny objects on the
surface may be distorted or insufficient.
* No 3D replica of the object: - It cannot provide an exact 3D image of the object due
to the longer wavelength. This means that the image will be a representation of the
object but not an exact replica of the object’s characteristics.
* Determines distance from objects and their angular positions: - Apart from the
distance from an object, RADAR technology can also provide the angular positions of
objects from the surface, a characteristic that cannot be measured by LiDAR.

* RADAR measures estimated distance measurements: - RADAR technology

does not give the exact accurate measurements of distance and other
characteristics of the object because of the distortions.
* Radar beam can incorporate many targets: - A RADAR beam can have several
targets at the same time and return data on several objects at the same time. However,
this may exclude smaller objects within the target field.
* Radar may not distinguish multiple targets that are close together: - RADAR
technology cannot distinguish multiple targets within a surface that are closely
entangled together. The data may therefore not be accurate.
* RADAR takes more time to lock on an object: - RADAR, unlike LiDAR pulses,
travels at a slower speed which means more time is needed to lock onto an object and
return data regarding the object.
3.3 Sonar

Fig 3.3 detection using sonar wave
* Uses sound waves: - Sonar stands for Sound Navigation and ranging. It transmits
sound waves that are then returned in form of echoes which are used to analyze various
qualities or attributes of the target or object.
* Used to detect underwater objects: - Sonar is mainly used to detect underwater

objects because the sound waves can penetrate the water depths to the bottom of the sea.
* It is affected by variations in sound speed: - Sound travels slowly in freshwater

than in seawater. This means that the variations in the speed of sound may affect the
return echoes which may also have an impact on the data or attribute of the target.
* Mostly used to find actual sea depth: - Because of its unique capabilities of
penetrating seawater, sonar is mainly used to calculate the depth of the sea because it is
fast and accurate.
* Is not affected by surface factors: - The sound waves are not affected by the
calmness or the roughness of the water surface. They can penetrate even tides and still
get the necessary data required.
* It has adverse effects on marine life: - Sound waves from sonar have adverse effects
on marine life such as whales that also depend on sound waves.
* Sonar generates a lot of noise: - The sound waves from the transmitters usually
generate a lot of noise that also have an effect on the marine life that live deep sea.

* Passive sonar does not require a transmitter and a receiver: - Unlike active sonar
that transmits with the help of a transmitter and also relies on a receiver, passive sonar
does not transmit. It listens without transmitting
* Scattering: - Active sonar may lead to scattering from small objects as well as the sea
bottom and surface which may cause interference.
* Causes decompression sickness: - SONAR may cause decompression sickness that

may be fatal.

This page was intentionally left blank

Chapter 4
Applications, Prons & Cons
1. OPTICAL CHARACTER RECOGNITION -
Fig 4.1 character recognition
Optical character recognition or optical character reader, often abbreviated as OCR, is

the mechanical or electronic conversion of images of typed, handwritten or printed text
into machine-encoded text, whether from a scanned document, a photo of a document, a
scene-photo (for example the text on signs and billboards in a landscape photo) or from
subtitle text superimposed on an image, we are extracting characters from the image or
video.Widely used as a form of information entry from printed paper data records –
whether passport documents, invoices, bank statements, computerized receipts, business
cards, mail, printouts of static-data, or any suitable documentation it is a common
method of digitizing printed texts so that they can be electronically edited, searched,
stored more compactly, displayed on-line, and used in machine processes such as
cognitive computing, machine translation, (extracted) text-to-speech.
Chapter 4: Applications Prons & Cons 29

2. SELF DRIVING CARS -
Fig 4.2 self driving car
One of the best examples of why you need object detection is for autonomous driving is
In order for a car to decide what to do in next step whether accelerate, apply brakes or
turn, it needs to know where all the objects are around the car and what those objects are
That requires object detection and we would essentially train the car to detect known set
of objects such as cars, pedestrians, traffic lights, road signs, bicycles, motorcycles, etc
3. TRACKING OBJECTS -
Fig 4.3 tracking objects with label
Object detection system is also used in tracking the objects, for example tracking a ball
during a football match, tracking movement of a cricket bat, tracking a person in a

video.Object tracking has a variety of uses, some of which are surveillance and security,
traffic monitoring, video communication, robot vision and animation.
4. FACE DETECTION AND FACE RECOGNITION -
Face detection and Face Recognition is widely used in computer vision task. We
noticed how facebook detects our face when you upload a photo This is a simple
application of object detection that we see in our daily life.Face detection can be
regarded as a specific case of object-class detection. In object-class detection, the task is
to find the locations and sizes of all objects in an image that belong to a given class.
Examples include upper torsos, pedestrians, and cars. Face detection is a computer
technology being used in a variety of applications that identifies human faces in digital
images. Face recognition describes a biometric technology that goes way beyond
recognizing when a human face is present. It actually attempts to establish whose face it
is. Face-detection algorithms focus on the detection of frontal human faces. It is
analogous to image detection in which the image of a person is matched bit by bit.
Image matches with the image stores in database. Any facial feature changes in the
database will invalidate the matching process.
Fig 4.4 face detection
There are lots of applications of face recognition. Face recognition is already being used
to unlock phones and specific applications. Face recognition is also used for biometric

surveillance, Banks, retail stores, stadiums, airports and other facilities use facial
recognition to reduce crime and prevent violence.
5. IDENTITY VERIFICATION THROUGH IRIS CODE -
Fig 4.5 iris code
Iris recognition is one of the most accurate identity verification systems. Identity
verification and identification is becoming increasingly popular. However, advances in
the field have expanded the options to include biometric such as iris, retina and more.
Among the large set of options it has been shown that the iris is the most accurate
biometric. Hence we need object detection system in iris detection.
6. OBJECT COUNTING -
Object detection system can also be used for counting the number of objects in the
image or real time video.
People Counting: Object detection can be also used for people counting, it is used for
analyzing store performance or crowd statistics during festivals. These tend to be more
difficult as people move out of the frame quickly (also because people are non rigid
objects).
7. SMILE DETECTION -
Facial expression analysis plays a key role in analyzing emotions and human behaviors.
Smile detection is a special task in facial expression analysis with various potential
applications such as photo selection, user experience analysis and patient monitoring

Fig 4.6 smile detection
8. MEDICAL IMAGING
Fig 4.7 object detection in health care

Medical image processing tools are playing an increasingly important role in assisting
the clinicians in diagnosis, therapy planning and image-guided interventions. Accurate,
robust and fast tracking of deformable anatomical objects such as the heart, is a crucial
task in medical image analysis.
9. BALL TRACKING IN SPORTS

Fig 4.8 ball tracking

Increase in the number of sport lovers in games like football, cricket, etc. has created a
need for digging, analyzing and presenting more and more multidimensional
information to them. Different classes of people require different kinds of information
and this expands the space and scale of the required information. Tracking of ball
movement is of utmost importance for extracting any information from the ball based
sports video sequences and we can record the video frame according to the movement
of the ball automatically
10. AUTOMATED CCTV
Fig 4.9 automated cctv

Surveillance is an integral part of security and patrol. Recent advances in computer
vision technology have lead to the development of various automatic surveillance
systems, however their effectiveness is adversely affected by many factors and they are
not completely reliable. This study investigated the potential of automated surveillance
system to reduce the CCTV operator workload in both detection and tracking activities.

Normally CCTV is Running every time, so we need large size of memory system to
store the recorded video. By using object detection system we can automate CCTV in
such a way that if some objects are detected then only recording is going to start. Using
this we can decrease the repeatedly recording same image frames, which increases the
memory efficiency. We can decrease the memory requirement by using this object
detection system.
11. ROBOTICS
Fig 4.9.1 robot

Autonomous assistive robots must be provided with the ability to process visual data in
real time so that they can react adequately for quickly adapting to changes in the
environment. Reliable object detection and recognition is usually a necessary early step
to achieve this goal.

Table 4.1 performance of object detection methods
Methods Accuracy Computational

Time
Background
Subtraction Moderate Moderate
Optical Flow
Moderate
High
Frame Low to
Differencing High Moderate
Table 4.2 advantage and limitations of object tracking methods
Method Advantages Limitations

Kalman Filter Can track point in image State variable are
which are noisy normally distributed
Particle Filter Optimal results when
evolution of the image
takes place at hypothesis
object position
Mean Shift When the background is
Applicable for situations similar to the target,
with dominant colors. tracking problems arise.
Template matching Not suitable for complex
templates.Problem can
arise when object
Relatively easier to temporarily leave the
implement and use. frame or become occluded
KTL Tracker Time efficient, robust Multiple object tracking
occlusion. becomes highly complex.


Chapter 5
Summary
This report gives an idea about object detection and tracking. we see about some
techniques for detection and tracking of objects. Object detection is the process of
detecting a target object in an image or a single frame of the video. First some existing
algorithms for detecting the objects like Frame difference method, optical flow,
background substrction.
Object tracking refers to the ability to estimate or predict the position of a target object
in each consecutive frame in a video once the initial position of the target object is
defined.
In object tracking First we see about the point tracking, in an image structure, moving
objects are represented by their feature points during tracking and In point tracking we
learn about some point tracking methods like Kalman Filter, Particle Filtering , Multiple
Hypothesis Tracking.
The second technique of object tracking is kernel-based tracking in this we learn some
methods like Simple Template Matching, Mean Shift Method, Support Vector Machine
(SVM), Layering based tracking.
The third technique of object tracking is Silhouette Based Tracking Approach - Some
object will have complex shape such as hand, fingers, shoulders that cannot be well
defined by simple geometric shapes. Silhouette based methods afford an accurate shape
description for the objects. The aim of a silhouette-based object tracking is to find the
object region in every frame by means of an object model generated by the previous
frames. Capable of dealing with a variety of object shapes, Occlusion and object split
and merge , have two algorithms Contour Tracking, Shape Matching.
Chapter 5: Summary 38
Chapter 5: Summary 39
References
[1] analytics-vidhya link - https://medium.com/analytics-vidhya/evolution-of-
object-detection-582259d2aa9b
[2] A Survey on Object Detection and Tracking Algorithms by Rupesh Kumar

Raout http://ethesis.nitrkl.ac.in/4836/1/211CS1049.pdf
[3] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey.
Acm Computing Surveys (CSUR), 38(4):13, 2006
[4] Object Detection and Tracking in Images and Point Clouds Daniel J.
Finnegan link- https://ps2fino.github.io/documents/Daniel_J._Finnegan-
Thesis.pdf
[5] Object detection and tracking in video image Rajkamal Kishor Gupta link -
http://ethesis.nitrkl.ac.in/6256/1/E-1.pdf
References 40

Seminar Report On Object Detection and Tracking

Uploaded by

Copyright:

Available Formats

Seminar Report On Object Detection and Tracking

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Seminar Report On Object Detection and Tracking

Uploaded by

Copyright:

Available Formats

Object Detection And Tracking

Seminar Report Submitted in Partial Fulfillment of

Under the Supervision of

Department of Computer Science and Engineering

M.B.M. Engineering College, Jai Narain Vyas University

I, Pravin Kumar would sincerely like to thank Dr. Nemi Chand

Finally, yet most importantly, I would like to express my heartfelt

1. Introduction to the Topic (6-8 pages) 1

1.1. Object detection and tracking 1

1.1.1. Object detection

2. History, Evolution & Technical Details (4-5 Pages) 9

3. Similar Technologies (4-5 Pages) 21

4. Applications, Pros & Cons (3-4 Pages) 29

5. Summary of Study (1-2 Pages) 38

1.1 Object Detection With Label 1 5

1.1 Object Detection And Tracking

Fig 1.1: Object detection with label

1.1.1 Object Detection

2. Background Subtraction-Object detection can be achieved by building a

3. Segmentation-The aim of image segmentation algorithms is to partition the image

1.1.2 Object Tracking

Point tracking- Tracking can be formulated as the correspondence of detecting objects

Kernel tracking- Performed by computing the motion of the object, represented by a

Silhouette Tracking- It Provides an accurate shape description of the target objects.

This page was intentionally left blank.

Chapter 2: History And Evolution 8

2.1 Traditional Object Detection And Evolution

Viola Jones Detectors:

Fig 2.1 harr like features

Chapter 2: History And Evolution 9

stage detection paradigm to reduce its computational overhead by spending less

Fig 2.2 HOG Detection

Deformable Part-based Model (DPM):

Fig 2.3 part based detection

Chapter 2: History And Evolution 10

DPM was originally proposed by P. Felzenszwalb in 2008 as an extension of the HOG

DPM detector consists of a root-filter and a number of part-filters. A weakly supervised

Deep Learning era:

Chapter 2: History And Evolution 11

Fig 2.4 you only look detection

Chapter 2: History And Evolution 12

We have briefly discussed the evolution of Object detection which is a very

Chapter 2: History And Evolution 13

Chapter 2: History And Evolution 14

Object Tracking Method -

A. Point Tracking In an image structure, moving objects are represented by their

Chapter 2: History And Evolution 15

3. Multiple Hypothesis Tracking (MHT):

Chapter 2: History And Evolution 16

calculating a distance measure. MHT is capable of tracking multiple object, handles

Chapter 2: History And Evolution 17

3. Support Vector Machine (SVM)

C. Silhouette Based Tracking Approach

Chapter 2: History And Evolution 18

Chapter 2: History And Evolution 19

This page was intentionally left blank.

Chapter 2: History And Evolution 20

Fig 3.1 Lidar objects detection

Chapter 3: Similar Technology 21