
Object Detection

This thesis compares two object detection algorithms, YOLO and RCNN, focusing on their configurations, performance, and accuracy. It emphasizes the importance of object detection for the autonomy of devices like smartphones and robots, driven by advancements in machine learning and deep learning. The project aims to evaluate the algorithms' effectiveness in detecting, classifying, and tracking multiple objects in images and videos.

Uploaded by

minsetpaing.11

ABSTRACT

The ability to recognize objects is innate in humans and animals. They recognize objects
without much effort; object recognition is part of their daily lives, and they do not even
notice it. For computers, the ability to locate and classify objects is called object
detection. Object detection is a key ability for many computers, smartphones and robots.
Deep learning algorithms have advanced object detection greatly in many directions. This
thesis compares object detection using two algorithms, YOLO and RCNN. Their configurations,
performance and accuracy are compared and discussed.
CHAPTER 1

INTRODUCTION

1.1 Motivation

As technology has advanced significantly in recent years, people want their devices and
gadgets to be automated, from smartphones and robots to self-driving cars. Scripts, programs
and sensors alone cannot make a device autonomous, because they only behave the way the
programmer programmed them. Devices need intelligence to make decisions or classify
items. Thanks to the contributions of machine learning and deep learning researchers and
practitioners to the field of artificial intelligence, devices can recognize objects in
images, classify music from audio files and predict prices and stock values. Giving
smartphones, machines, computers and robots the intelligence to become more autonomous and
independent of human supervision is a sustained dream of mankind. Many science-fiction
movies have shown robots that do domestic work, provide healthcare, fight on battlefields
and keep humans company.

A robot cannot be intelligent and independent if it cannot see and adapt to its
surrounding environment. Engineers and scientists have implemented image recognition
technologies in intelligent robots. Such a robot must also be able to recognize people’s
faces, determine which object to pick up, drop objects at the required place or give them to
people, avoid obstacles in its path and understand human language. The key ability for a
robot or computer is object detection. Scientists and researchers have contributed several
algorithms to carry out object detection.

1.2 Purpose and Scope

The purpose of the thesis is to compare the algorithms in detecting, classifying and
tracking objects. Given the need for object detection, the goal of the thesis project is to
identify multiple objects in an image or video using two algorithms, YOLO and RCNN. Once
development of the project is finished, the algorithms will be measured and evaluated in
terms of configuration, performance and detection accuracy.

1.3 Development

As mentioned earlier, the detection algorithms will be implemented using Python, an
open-source interpreted programming language. Python has many libraries that support
machine learning and scientific computation, such as NumPy, TensorFlow, PyTorch,
scikit-learn and Matplotlib. When an image is fed as input to the neural network, each
detected object will be shown inside a rectangle in its respective color, with label
text on the rectangle. If the input is a video, the same process will be carried out for
every single frame.
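
The drawing step described above can be sketched as follows. This is only a minimal illustration, assuming the detections are already available as pixel-coordinate boxes. The names `Detection`, `draw_box`, `annotate_video` and the `detect` callback are hypothetical and not taken from the thesis code. The image is modelled as nested Python lists so that the example needs no external library, and label text is left out because rendering it would require an imaging library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str
    color: tuple  # (R, G, B)


def draw_box(image, det):
    """Draw a one-pixel rectangle outline for `det` on `image`, in place.

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    """
    height, width = len(image), len(image[0])
    # Clamp the box to the image bounds.
    x0, x1 = max(0, det.x_min), min(width - 1, det.x_max)
    y0, y1 = max(0, det.y_min), min(height - 1, det.y_max)
    if x0 > x1 or y0 > y1:
        return image  # box lies entirely outside the image
    for x in range(x0, x1 + 1):  # top and bottom edges
        image[y0][x] = det.color
        image[y1][x] = det.color
    for y in range(y0, y1 + 1):  # left and right edges
        image[y][x0] = det.color
        image[y][x1] = det.color
    return image


def annotate_video(frames, detect):
    """Run `detect` (frame -> list of Detection) on every frame and draw the boxes."""
    for frame in frames:
        for det in detect(frame):
            draw_box(frame, det)
    return frames
```

For a video, `annotate_video` simply repeats the single-image procedure on each frame, which matches the frame-by-frame processing described above.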

Cars and pedestrians detected in an image


Chair, monitor and plant detected in an image
Elephants and zebras detected in an image.

CHAPTER 2

THEORY

2.1 Neural Networks

A neural network is inspired by the networks of neurons found inside the brains of
humans and animals. Neural networks can perform signal processing, prediction, regression,
classification and clustering. A neuron is a single processing unit inside the neural
network. Neurons are connected to each other through coefficients called weights, and each
neuron has an additional value called a bias. This mathematical framework is one of the
most widely used in artificial intelligence.

A simple neural network with an input layer, a hidden layer and an output layer.
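
To make the role of weights and bias concrete, the following is a minimal sketch of a single neuron and a layer of neurons. The sigmoid activation is an assumption made for illustration, since the text above does not name an activation function, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1). Assumed activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs feeding a hidden layer of three neurons (illustrative values).
hidden = layer([1.0, 2.0],
               [[0.5, -0.25], [0.1, 0.1], [-1.0, 0.5]],
               [0.0, 0.2, -0.1])
```

Stacking such layers, with the outputs of one layer used as the inputs of the next, gives the input, hidden and output structure shown in the figure.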

2.2 YOLO

Detection algorithms from the last decade repurpose classifiers to perform detection. To
detect an object, they take a classifier for that object and evaluate it at different
locations in an image, computing probabilities and confidence values at each location.

More recent approaches like RCNN use region proposal techniques to first generate candidate
bounding boxes in the image and then run a classifier on those boxes. After classification,
a post-processing step is used to refine the bounding boxes and eliminate nearby duplicate
detections. These pipelines are slow, resource-hungry and difficult to optimize because each
individual component must be trained separately.
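
One common way to eliminate nearby duplicate detections is greedy non-maximum suppression. The sketch below is an illustration of that general technique, not necessarily the post-processing used in this thesis. It measures the overlap of two boxes with intersection over union (IoU), the ratio of their shared area to their combined area. The 0.5 threshold and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box of each overlapping group.

    `detections` is a list of (box, score) pairs. A box is dropped when it
    overlaps an already kept, higher-scoring box by more than the threshold.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

In practice this is usually applied separately per class, so that overlapping objects of different classes do not suppress each other.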

YOLO reframes object detection as a single regression problem, straight from image
pixels to bounding box coordinates and class probabilities. Using YOLO, you only look once at
an image to predict which objects are present and where they are located. YOLO is amazingly
simple: a single convolutional network simultaneously predicts multiple bounding boxes and
class probabilities for those boxes. YOLO trains on full images and directly optimizes
detection performance. This unified model has several benefits over traditional methods of
object detection.

Detecting dog, bike and vehicle with YOLO, each color showing the class of objects
An example of convolutional neural network

The detection system divides the input image into an S × S grid. If the center of an object
falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell
predicts B bounding boxes and confidence scores for those boxes. These confidence scores
reflect how confident the model is that the box contains an object and also how accurate it
thinks the predicted box is. If no object exists in that cell, the confidence scores should be zero.
Otherwise, the confidence score should be equal to the intersection over union (IOU)
between the predicted box and the ground truth. Each bounding box consists of 5 predictions:
x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to
the bounds of the grid cell. The width and height are predicted relative to the whole image.
Finally, the confidence prediction represents the IOU between the predicted box and any
ground truth box.
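
The box encoding above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the grid size S = 7 and the 448 × 448 image used in the usage lines are example values only. The first function finds the grid cell responsible for an object, and the second converts the cell-relative (x, y) and image-relative (w, h) predictions back into pixel corner coordinates.

```python
def responsible_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the grid cell that contains the object center (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right/bottom border
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def decode_box(row, col, x, y, w, h, img_w, img_h, S):
    """Convert one predicted box to pixel corners (x_min, y_min, x_max, y_max).

    x, y are offsets in [0, 1] within cell (row, col);
    w, h are fractions of the whole image width and height.
    """
    cx = (col + x) * img_w / S
    cy = (row + y) * img_h / S
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Illustrative values: 7 x 7 grid on a 448 x 448 image, so each cell is 64 pixels wide.
cell = responsible_cell(100, 200, 448, 448, 7)
box = decode_box(cell[0], cell[1], 0.5, 0.5, 0.25, 0.5, 448, 448, 7)
```

Predicting the center relative to the cell and the size relative to the image keeps all four values between 0 and 1, regardless of the image resolution.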

Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object). These
probabilities are conditioned on the grid cell containing an object. We only predict one set of
class probabilities per grid cell, regardless of the number of boxes B. At test time we multiply
the conditional class probabilities and the individual box confidence predictions, which gives
a class-specific confidence score for each box.
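
Written out, the test-time product takes the form below. The notation is an assumption chosen to match the quantities named in this section: Pr(Object) multiplied by the IOU is the box confidence, and the superscript and subscript on IOU mark that it is measured between the ground truth and the predicted box.

```latex
\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}
  = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}
```

The resulting score encodes both the probability that the class appears in the box and how well the box fits the object. Under this scheme, the predictions for one image form an S × S × (B · 5 + C) tensor.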

YOLO detecting a bird: bounding box (red), grid cells (green) and x, y, w, h values

IoU formula: IoU = (area of overlap) / (area of union)
Ground truth box and predicted box while detecting a stop sign
Accuracy of YOLO depending on IoU
