0% found this document useful (0 votes)
263 views3 pages

Project Detecto!: A Real-Time Object Detection Model

The document discusses Project DetectO!, a real-time object detection model. The model aims to incorporate state-of-the-art object detection techniques to achieve high accuracy with real-time performance. It trains a neural network on a challenging publicly available dataset used in an annual object detection challenge. The resulting system is fast and accurate, making it suitable for applications requiring object detection. Template matching is used to detect objects by performing normalized cross-correlations between template images of training objects and new images. Filtering and blob analysis techniques are also discussed as early vision processing steps.

Uploaded by

ANURAG V NAIR
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
263 views3 pages

Project Detecto!: A Real-Time Object Detection Model

The document discusses Project DetectO!, a real-time object detection model. The model aims to incorporate state-of-the-art object detection techniques to achieve high accuracy with real-time performance. It trains a neural network on a challenging publicly available dataset used in an annual object detection challenge. The resulting system is fast and accurate, making it suitable for applications requiring object detection. Template matching is used to detect objects by performing normalized cross-correlations between template images of training objects and new images. Filtering and blob analysis techniques are also discussed as early vision processing steps.

Uploaded by

ANURAG V NAIR
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

1

Project DetectO!: A real-time object detection


model
Afeeza Ali, Anurag, Anudeep Palliyath, Anjana K, and Under the guidance of Prof. Arvind Naik

Abstract—Efficient and accurate object detection has been an Appearance-based object recognition systems are currently
important topic in the advancement of computer vision systems. the most successful approach for dealing with 3D recognition
With the advent of deep learning techniques, the accuracy for of arbitrary objects in the presence of clutter and occlusion.
object detection has increased drastically. The project aims to
incorporate state-of-the-art technique for object detection with For appearance-based models, only the appearance is used,
the goal of achieving high accuracy with a real-time performance. which is usually captured by different two-dimensional views
A major challenge in many of the object detection systems is the of the object-of-interest. Based on the applied features these
dependency on other computer vision techniques for helping the methods can be sub-divided into two main classes, i.e., local
deep learning based approach, which leads to slow and non- and global approaches. A local feature is a property of an
optimal performance. In this project, the network is trained on
the most challenging publicly available dataset, on which a object image (object) located on a single point or small region. It is
detection challenge is conducted annually. The resulting system a single piece of information describing a rather simple, but
is fast and accurate, thus aiding those applications which require ideally distinctive property of the object’s projection to the
object detection. camera (image of the object). Examples for local features of
an object are, e.g., the colour, (mean) gradient or (mean) gray
value of a pixel or small region. For object recognition tasks
I. I NTRODUCTION
the local feature should be invariant to illumination changes,
The Object Detection and Recognition system in images is noise, scale changes and changes in viewing direction, but,
the technique which mainly aims to detect the multiple objects in general, this cannot be reached due to the simpleness of
from various types of images. It also recognizes the images the features itself. Thus, several features of a single point
after performing the detection. Object detection is a computer or distinguished region in various forms are combined and a
technology related to computer vision and image processing more complex description of the image usually referred to as
that deals with detecting instances of semantic objects of a descriptor is obtained. A distinguished region is a connected
certain class (such as humans, buildings, or cars) in digital part of an image showing a significant and interesting image
images and videos. Well-researched domains of object detec- property. It is usually determined by the application of a region
tion include face detection and pedestrian detection. Object of interest detector to the image.
detection has applications in many areas of computer vision, 1.3 Proposed System
including image retrieval and video surveillance. Object recog- Template matching is a technique in digital image pro-
nition is an important task in image processing and computer cessing for finding small parts of an image which match a
vision. It is concerned with determining the identity of an template image. It can be used in manufacturing as a part
object being observed in an image from a set of known tags. of quality control, a way to navigate a mobile robot, or as a
Humans can recognize any object in the real world easily way to detect edges in images. Template matching is a simple
without any efforts; on contrary machines by itself cannot task of performing a normalized cross-correlation between a
recognize objects. template image (object in training set) and a new image. For
1.1 Problem Statement matching the template with the data image, different iterations
Many problems in computer vision were saturating on their of geometrical parameters (such as scale, rotation etc) are
accuracy before a decade. However, with the rise of deep applied and the required image is found. This method is
learning techniques, the accuracy of these problems drastically normally implemented by first picking out a part of the search
improved. One of the major problems was that of image image to use as a template.
classification, which is defined as predicting the class of 1.4 Objective
the image. A slightly complicated problem is that of image A well known application of object detection is face de-
localization, where the image contains a single object and tection, that is used in almost all the mobile cameras. A
the system should predict the class of the location of the more generalized (multi-class) application can be used in
object in the image (a bounding box around the object). The autonomous driving where a variety of objects need to be
more complicated problem (this project), of object detection detected. Also it has a important role to play in surveillance
involves both classification and localization. In this case, the systems. These systems can be integrated with other tasks such
input to the system will be a image, and the output will be a as pose estimation where the first stage in the pipeline is to
bounding box corresponding to all the objects in the image, detect the object, and then the second stage will be to estimate
along with the class of object in each box. pose in the detected region. It can be used for tracking objects
1.2 Existing System and thus can be used in robotics and medical applications.
2

Thus this problem serves a multitude of applications. When an image is acquired by a camera or other imaging
Flow of the model: system, often the vision system for which it is intended is
unable to use it directly. The image may be corrupted by
random variations in intensity, variations in illumination, or
poor contrast that must be dealt with in the early stages of
vision processing.
Filtering is a technique for modifying or enhancing an im-
age. For example, you can filter an image to emphasize certain
features or remove other features. Image processing operations
implemented with filtering include smoothing, sharpening, and
edge enhancement.
Filtering is a neighborhood operation, in which the value of
any given pixel in the output image is determined by applying
some algorithm to the values of the pixels in the neighborhood
of the corresponding input pixel. A pixel’s neighborhood is
some set of pixels, defined by their locations relative to that
pixel. (See Neighborhood or Block Processing: An Overview
for a general discussion of neighborhood operations.) Linear
filtering is filtering in which the value of an output pixel is
a linear combination of the values of the pixels in the input
pixel’s neighborhood.
Blob Analysis:
Blob Analysis is a fundamental technique of machine vision
based on analysis of consistent image regions. As such it
is a tool of choice for applications in which the objects
being inspected are clearly discernible from the background.
Diverse set of Blob Analysis methods allows to create tailored
solutions for a wide range of visual inspection problems.
Main advantages of this technique include high flexi-
bility and excellent performance. Its limitations are: clear
background-foreground relation requirement (see Template
Matching for an alternative) and pixel-precision (see 1D Edge
Detection for an alternative).
Fig. 1. Flow diagram Blob detection methods are aimed at detecting regions in
a digital image that differ in properties, such as brightness or
color, compared to surrounding regions. Informally, a blob is
Capture Video:
a region of an image in which some properties are constant
The real time video is captured using the webcame and
or approximately constant; all the points in a blob can be
the frames are thus extracted. In order to capture the video
considered in some sense to be similar to each other. The
openCV library is used as a feature of Python.
most common method for blob detection is convolution.
Feature extraction starts from an initial set of measured data
and builds derived values (features) intended to be informative
and non-redundant, facilitating the subsequent learning and II. C ONCLUSION
generalization steps, and in some cases leading to better human In this paper we propose a model that can efficiently
interpretations. detect realtime object using a live video recording device.
Feature extraction is a dimensionality reduction process, Using machine learning techniques to image classifications
where an initial set of raw variables is reduced to more man- and counting we detect and classify the categorical object.
ageable groups (features) for processing, while still accurately REFERENCES:
and completely describing the original data set. [1] Dong, C., Loy, C.C., He, K. and Tang, X., 2014,
When the input data to an algorithm is too large to be September. Learning a deep convolu- tional network for image
processed and it is suspected to be redundant then it can super-resolution. In European conference on computer vision
be transformed into a reduced set of features (also named a (pp. 184-199).
feature vector). Determining a subset of the initial features [2] Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K. and
is called feature selection. The selected features are expected Van Gool, L., 2017, October. DSLR-quality photos on mobile
to contain the relevant information from the input data, so devices with deep convolutional networks. In the IEEE Int.
that the desired task can be performed by using this reduced Conf. on Computer Vision (ICCV)
representation instead of the complete initial data. [3] Vinyals, O., Toshev, A., Bengio, S. and Erhan, D.,
Image filtering: 2015. Show and tell: A neural image caption generator. In
3

Proceedings of the IEEE conference on computer vision and


pattern recognition (pp. 3156-3164)
[4]Karpathy, A. and Fei-Fei, L., 2015. Deep visual-semantic
alignments for generating im- age descriptions. In Proceedings
of the IEEE conference on computer vision and pattern recog-
nition (pp. 3128-3137).

You might also like