real object detection system using yolov3 images

The document presents a real object detection system utilizing an improved YOLOv3 model for concurrent real-time detection across multiple cameras, addressing limitations of existing systems such as memory constraints and processing speed. The proposed architecture optimizes memory usage and enhances detection performance, capable of recognizing 80 different objects using a dataset of 2.5 million labeled instances. The system includes various modules for training, analysis, testing, detection, and recognition, with specified hardware and software configurations for implementation.

Uploaded by

nagaranivo2003
Copyright
© All Rights Reserved
REAL OBJECT DETECTION SYSTEM USING YOLOV3 IMAGES

ABSTRACT

Object detection is a challenging task in computer vision. It is gaining a lot of attention in many real-time applications, such as detecting the number plates of suspect cars, identifying trespassers in areas under surveillance, and detecting unmasked faces at security gates during the COVID-19 period. Deep learning approaches include Region-based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLO) based CNNs. In this proposed work, an improved stacked YOLOv3 model is designed for detecting objects with bounding boxes. Hyperparameters are tuned to obtain optimum performance.

The proposed model is evaluated using the COCO dataset, and its performance is better than that of other existing object detection models. Anchor boxes are used for overlapping objects. Non-Maximal Suppression (NMS) is used to keep only the best bounding box: after all predicted bounding boxes with a low detection probability are removed, the bounding box with the highest detection probability is selected, and all bounding boxes whose Intersection over Union (IoU) with it is higher than 0.4 are eliminated. In this experimentation, we tried a range of threshold values and finally obtained better results at a threshold of 0.5.
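The suppression procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, the (x1, y1, x2, y2) box layout, and the reading of 0.5 as the detection-probability cutoff are assumptions made for the example, while the IoU limit of 0.4 is the value quoted in the text.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box],
    scores: List[float],
    score_threshold: float = 0.5,  # assumed meaning of the "threshold 0.5"
    iou_threshold: float = 0.4,    # IoU limit quoted in the text
) -> List[int]:
    """Return the indices of the boxes that survive suppression."""
    # Step 1: drop boxes with a low detection probability.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while candidates:
        best = candidates.pop(0)
        keep.append(best)
        # Step 3: discard boxes that overlap the selected box too much.
        candidates = [
            i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold
        ]
    return keep


example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
example_scores = [0.9, 0.8, 0.7, 0.3]
kept = non_max_suppression(example_boxes, example_scores)
```

In this example the second box overlaps the first with an IoU of 81/119 and is suppressed, the fourth box falls below the probability cutoff, and the first and third boxes are kept. In a multi-class detector this routine would typically be run once per class.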
EXISTING SYSTEM

There have been very few works on concurrent real-time object detection on multiple live streams. In the existing system, for some real-world applications, such as concurrent real-time inference on a GPU server in a commercial system, the available GPU computing resources are limited in terms of memory. In this case, each object detection process receives a live stream from a camera, and all processing is performed on a GPU. The main problem is the demand on system resources such as memory, CPU, and GPU when they are used concurrently for real-time detection scenarios.

DISADVANTAGES OF EXISTING SYSTEM

1. Limited Memory.

2. Difficult to scale up.

3. Time consuming.
PROPOSED SYSTEM
In the proposed system, we provide an optimized architecture to concurrently detect objects on multiple cameras. The proposed system uses YOLOv3 for concurrent real-time object detection on a single GPU server using a multi-thread architecture. YOLOv3 uses the Darknet-53 architecture, which has 53 convolutional layers trained on the ImageNet dataset. For the task of detection, 53 additional layers are stacked onto it, giving a 106-layer fully convolutional architecture. The proposed model is designed to spot even tiny objects in an image and is able to recognize 80 different object classes in a single image. Our purpose is to provide an optimized architecture that significantly decreases memory usage.
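One way such a multi-thread architecture could be organized is sketched below. The document does not give the actual design, so everything here is an assumption for illustration: one reader thread per camera pushes frames onto a bounded shared queue, and a single worker thread owns the only detector instance, so the model would be held in memory once regardless of the number of cameras. The names camera_reader, detection_worker, and run_cameras are hypothetical, and frames are represented by strings instead of images.

```python
import queue
import threading
from typing import Callable, Dict, List

Frame = str       # stand-in for an image array in this sketch
Detection = str   # stand-in for a list of bounding boxes


def camera_reader(camera_id: int, frames: List[Frame], frame_queue: queue.Queue) -> None:
    """Push (camera_id, frame) pairs for one camera onto the shared queue."""
    for frame in frames:
        frame_queue.put((camera_id, frame))  # blocks while the queue is full


def detection_worker(
    detect: Callable[[Frame], Detection],
    frame_queue: queue.Queue,
    results: Dict[int, List[Detection]],
) -> None:
    """Run the single shared detector on frames from every camera."""
    while True:
        item = frame_queue.get()
        if item is None:  # sentinel: all cameras have finished
            break
        camera_id, frame = item
        results[camera_id].append(detect(frame))


def run_cameras(
    streams: Dict[int, List[Frame]],
    detect: Callable[[Frame], Detection],
) -> Dict[int, List[Detection]]:
    # A bounded queue caps how many frames wait in memory at once.
    frame_queue: queue.Queue = queue.Queue(maxsize=8)
    results: Dict[int, List[Detection]] = {cid: [] for cid in streams}
    worker = threading.Thread(target=detection_worker, args=(detect, frame_queue, results))
    worker.start()
    readers = [
        threading.Thread(target=camera_reader, args=(cid, frames, frame_queue))
        for cid, frames in streams.items()
    ]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    frame_queue.put(None)  # tell the worker to stop
    worker.join()
    return results


# Usage with a fake detector that upper-cases the "frame".
demo_results = run_cameras({0: ["a", "b"], 1: ["c"]}, str.upper)
```

The bounded queue is a deliberate choice in this sketch: when the detector falls behind, the readers block instead of accumulating frames, which keeps memory use flat. A real deployment would more likely drop stale frames to stay real-time.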

ADVANTAGES OF PROPOSED SYSTEM


• It has low memory usage.
• It has high detection speed.
Modules Description:
1) Training DataSet
This module collects a large number of images with specified angles and dimensions to train the system to understand objects and their types. The large-scale annotated image dataset ImageNet contains high-resolution images, making it possible to train deep models with large-scale training data. The module also pre-processes the images for learning on each segment and verifies the alignment of every image; only then are the images eligible for the training phase. COCO defines 91 object categories, 82 of which have more than 5,000 labeled instances, but the data used here covers only 80 classes. The dataset has a total of 2,500,000 labeled instances in 328,000 images.
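As an illustration of the pre-processing step, the sketch below keeps only annotations from a chosen class subset and converts COCO bounding boxes, stored as [x_min, y_min, width, height] in pixels, into normalized (class, x_center, y_center, width, height) labels of the kind Darknet-style YOLO training commonly uses. Whether the project stores labels this way is an assumption, and the function names and category ids in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

YoloLabel = Tuple[int, float, float, float, float]


def coco_to_yolo(
    bbox: Sequence[float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO [x_min, y_min, w, h] pixel box to normalized center form."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,  # x_center as a fraction of image width
        (y_min + h / 2) / img_h,  # y_center as a fraction of image height
        w / img_w,
        h / img_h,
    )


def build_labels(
    annotations: List[Dict],
    kept_category_ids: List[int],
    img_w: int,
    img_h: int,
) -> List[YoloLabel]:
    """Keep annotations in the class subset and re-index classes from 0."""
    class_index = {cat_id: i for i, cat_id in enumerate(kept_category_ids)}
    labels: List[YoloLabel] = []
    for ann in annotations:
        cat_id = ann["category_id"]
        if cat_id not in class_index:
            continue  # category outside the training subset
        labels.append((class_index[cat_id], *coco_to_yolo(ann["bbox"], img_w, img_h)))
    return labels


# Hypothetical annotations for a 400 x 200 image.
demo_annotations = [
    {"category_id": 1, "bbox": [100, 50, 200, 100]},
    {"category_id": 12, "bbox": [0, 0, 10, 10]},
]
demo_labels = build_labels(demo_annotations, [1, 2, 3], img_w=400, img_h=200)
```

Here the first annotation becomes class 0 with a box centered in the image and covering half of each dimension, and the second is skipped because its category is not in the subset.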

2) Analyze
This module is used for detecting faces and reducing the noise present in complicated movie scenes. It defines various graph editing operations according to the noise analysis and then designs an edit cost function to improve performance.

3) Testing Dataset
After the training and analysis phases, the testing phase checks whether the detection algorithm, based on the previous training, works properly on a given image. We first highlight the importance of the learning strategy for detection, given the difficulty of training detectors, and then introduce the optimization techniques for both the training and testing stages in detail. Finally, we review some real-world applications based on object detection, including face detection, pedestrian detection, logo detection, and video analysis. During testing, images are resized to different scales and passed through multiple detectors, and the detection results are merged.
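The multi-scale testing step can be sketched as follows. The document does not say how results are merged, so this is only an assumed minimal version: each scale's boxes are mapped back to the coordinates of the original image and pooled into one list ordered by score. The detect_at_scale callback is hypothetical and stands for "resize the image by this factor, run the detector, and return (box, score) pairs in resized-image coordinates".

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) layout
ScoredBox = Tuple[Box, float]


def rescale_box(box: Box, scale: float) -> Box:
    """Map a box from a resized image back to original-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 / scale, y1 / scale, x2 / scale, y2 / scale)


def multi_scale_detect(
    scales: Sequence[float],
    detect_at_scale: Callable[[float], List[ScoredBox]],
) -> List[ScoredBox]:
    """Pool detections from every scale in a common coordinate frame."""
    merged: List[ScoredBox] = []
    for scale in scales:
        for box, score in detect_at_scale(scale):
            merged.append((rescale_box(box, scale), score))
    # Highest-scoring detections first, ready for duplicate removal.
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


# Fake detector: the same object is found at both scales with different scores.
fake_scores = {0.5: 0.6, 2.0: 0.9}


def fake_detector(scale: float) -> List[ScoredBox]:
    return [((10 * scale, 20 * scale, 30 * scale, 40 * scale), fake_scores[scale])]


pooled = multi_scale_detect([0.5, 2.0], fake_detector)
```

Because the same object is usually found at several scales, the pooled list contains near-duplicate boxes; a suppression step such as NMS would normally follow to keep one box per object.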

4) Detection
In this module we detect the faces of the movie characters using the cv2 and facenet libraries. After these libraries are installed, the project reads the images from a specified path, splits them into regions, and compares the regions with the trained data prepared in the pre-processing step. The facenet library is used to load the model from the given directory and to get the RGB colors of the divided regions in order to match them. When this process completes, the module reports the objects present in the given image or video. For precise localization of objects, 1.5 million object instances are annotated, and 200 K pictures are labelled out of 330 K pictures.

5) Recognition and result


In this module we recognize the faces of the movie characters previously stored in the face database and return the real name for each match. We use a deep learning object detection algorithm to carry out this process accurately and in a time- and cost-effective way. In fact, instance segmentation can be viewed as a special setting of object detection where, instead of localizing an object by a bounding box, pixel-level localization is desired.
HARDWARE CONFIGURATION

The following hardware specifications were used on both the server and client machines during development.

Processor : Intel(R) Core(TM) i3
Processor Speed : 3.06 GHz
RAM : 2 GB
Hard Disk Drive : 250 GB
Floppy Disk Drive : Sony
CD-ROM Drive : Sony
Monitor : 17 inches
Keyboard : TVS Gold
Mouse : Logitech

SOFTWARE CONFIGURATION

The following software specifications were used on both the server and client machines during development.

SERVER
Operating System : Windows 7
Technology Used : Python
Database : MySQL
Database Connectivity : Native Connectivity
Web Server : Django
Browser : Chrome

CLIENT
Operating System : Windows 7
Browser : Chrome
