Grp2 Final PPT YOLO Moving Object Classification


Final Year Project

Evaluation (Semester – 8)
PROJ - CS881
Moving Object
Classification
using Deep Learning
YOLO v3

Guided By – Amlan Ray Chaudhuri Sir


Team Members

Kaustav Sarkar – 11500120051
Ahana Basu – 11500120098
Krishnendu Sankar Mandal – 11500120012
Manjit Paul – 11500120107
Contents

1. Objective

2. Motivation

3. Literature Survey

4. Proposed Methodology

5. Result/ Conclusion

6. Application & Future Scope

7. References
Objective

1. Real-time Object Detection & Classification:

To detect and classify objects in video streams in real time using the YOLO (You Only
Look Once) deep learning model. This involves identifying objects in each frame, drawing
bounding boxes, and labeling them with confidence scores.

2. Pre-trained Model Integration:


To utilize a pre-trained YOLO model trained on the COCO (Common Objects in Context)
dataset. This enables leveraging existing deep learning models for accurate object
detection across various classes.

3. Interactive Video Processing:


To process video input from a webcam or specified file, demonstrating practical computer
vision applications for object recognition in real-time.
Motivation

Why YOLO v3?

Single Forward Pass:

Unlike traditional object detectors that apply the model multiple times over regions and
scales of the image, YOLO processes the entire frame in a single forward pass, which
significantly reduces computational cost and increases detection speed.

Open Source:
The availability of pre-trained models and open-source implementations makes it
accessible.

● Here, we use the YOLOv3 model instead of YOLOv8 because, in our tests, YOLOv3 gave
confidence scores between 0.6 (minimum) and 0.9 (maximum), whereas YOLOv8 gave a maximum
confidence of only 0.27.

● Efficient and accurate object detection in live video feeds for applications like
surveillance, traffic monitoring, and robotics.
Literature Survey
Research Paper 1:
(A Lightweight Moving Vehicle Classification System Through Attention-Based Method and
Deep Learning)
https://ieeexplore.ieee.org/abstract/document/8886464/authors#authors

Published Date: 30th October, 2019


Authors: Nasaruddin Nasaruddin | Kahlil Muchtar | Afdhal Afdhal

The research paper discusses using convolutional neural networks (CNNs) for vehicle classification in videos, focusing
on intelligent transportation systems. They propose an attention-based approach to overcome challenges like camera
jitter and bad weather.

Methodology involves two parts: attention-based detection and fine-grained classification using YOLOv3. The authors
customize YOLOv3 with their dataset of 49,652 annotated frames covering four vehicle classes. They address class
imbalance with image augmentation. Results show the proposed method outperforms existing techniques in specificity,
false-positive rate, and precision. They use challenging outdoor scenes from CDNET2014 for evaluation.
Disadvantage: The paper lacks details on computational complexity and scalability.

In summary, the paper introduces a promising approach for vehicle classification in complex scenarios, showing
potential for real-world applications in intelligent transportation systems. However, it could provide more insights into
computational efficiency and scalability for practical implementation.
Research Paper 2:
(Moving Vehicle Detection and Classification Using Gaussian Mixture Model and Ensemble Deep
Learning Technique)
https://www.hindawi.com/journals/wcmc/2021/5590894/

Published Date: 27th May, 2021


Authors: Preetha Jagannathan | Sujatha Rajkumar | Jaroslav Frnda | Prabu Subramani

The research paper focuses on improving automatic vehicle classification in visual traffic surveillance systems,
particularly during lockdowns to curb COVID-19 spread. It proposes a new technique for classifying vehicle types,
addressing issues with imbalanced data and real-time performance.

The methodology involves collecting data from the Beijing Institute of Technology Vehicle Dataset and the
MIOvision Traffic Camera Dataset. Techniques like adaptive histogram equalization and Gaussian mixture model are
used for image enhancement and vehicle detection. Feature extraction is done using the Steerable Pyramid
Transform and Weber Local Descriptor, followed by classification using ensemble deep learning.
The proposed technique achieves high classification accuracy of 99.13% and 99.28% on the MIOvision Traffic
Camera Dataset and the Beijing Institute of Technology Vehicle Dataset, respectively. These results outperform
existing benchmark techniques.

Disadvantage: The paper lacks discussion of the computational efficiency and scalability
of the proposed method, which could be a drawback.
Proposed Methodology
Data Collection & Preparation

● Argument Parsing: "argparse" is used to input the video
file path and set parameters like confidence and threshold.

● Load Class Labels: Read and load the COCO class labels from
a text file to identify the object classes in the dataset.
(A sketch of both steps follows.)
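A minimal Python sketch of these two steps; the flag names and the labels-file path
(yolo-coco/coco.names) are assumptions, not the project's exact code:

import argparse
import os

# Parse command-line arguments: input video, confidence, and NMS threshold.
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", default="0",
                help="path to input video file, or 0 for webcam")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
                help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
                help="IoU threshold for non-maximum suppression")
args = vars(ap.parse_args())

# Load the 80 COCO class labels, one per line (file path is an assumption).
with open(os.path.join("yolo-coco", "coco.names")) as f:
    LABELS = f.read().strip().split("\n")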
Library and Framework Selection
• Import necessary packages: numpy, argparse, imutils, cv2, and os
• Use OpenCV's Deep Neural Network (DNN) module for loading and running the
YOLO model

Loading Model
Specify paths for YOLO weight (yolov3.weights) and configuration (yolov3.cfg) files.

Load the model using cv2.dnn.readNetFromDarknet
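A sketch of the model-loading step with the files named above; the output-layer lookup
is the standard OpenCV DNN call, assumed here:

import cv2

# Load the Darknet model from the configuration and weight files.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

# YOLOv3 has three output layers (one per detection scale); their names
# are needed to collect detections during the forward pass.
output_layers = net.getUnconnectedOutLayersNames()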


Detection & Classification
Initialise Video Stream
• From the command-line arguments: 0 for webcam, otherwise a video file path.
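For example, the source selection could look like this sketch (args comes from the
argparse sketch above):

# "0" on the command line means webcam; anything else is a video file path.
src = args["input"]
cap = cv2.VideoCapture(0 if src == "0" else src)
if not cap.isOpened():
    raise SystemExit("Cannot open video source: " + src)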

Process Each Frame (loop)


• Blob Construction (preprocessing): makes the frame suitable for YOLO model input, so
that it matches the format and scale expected by the neural network for accurate detection

• 1 / 255.0: a scaling factor to normalize pixel values to [0, 1], suitable for the
neural network
• YOLO expects the input image to be of a fixed size (416x416 pixels)
• swapRB=True: converts the image from BGR (OpenCV's default format) to RGB (the format
the network expects)
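Putting these preprocessing parameters together, one iteration of the loop might be
handled as in this sketch (variable names are assumptions):

# Normalize to [0, 1], resize to 416x416, and swap BGR -> RGB.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
layer_outputs = net.forward(output_layers)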
Perform & Process Detections (for better model performance)

● In YOLO, the image is divided into a grid, and bounding boxes are
predicted in each grid cell in a single pass.

● Each detection (output) is a list of 85 items:
- First 5 values: center X, center Y, width, height, and a
confidence score (the probability that an object is present)
- detection[5] to detection[84]: class scores for the 80 classes
in the COCO dataset (a probability distribution over all classes)
Filter out weak predictions
• For each detection, extract the class scores (detection[5:])
• Identify the class with the highest score and its corresponding confidence
• Compare the confidence score with a predefined threshold (e.g., 0.5)

Hence, detections with a lower confidence will not be shown in the output. (A combined
sketch of parsing and filtering detections follows.)
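A combined sketch of parsing the 85-value detection vectors and filtering weak
predictions; following the common OpenCV-DNN YOLO pattern, the highest class score is
used as the confidence (an assumption about the project's exact code):

import numpy as np

boxes, confidences, class_ids = [], [], []
(H, W) = frame.shape[:2]

for output in layer_outputs:
    for detection in output:            # one 85-value vector per detection
        scores = detection[5:]          # class scores for the 80 COCO classes
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > args["confidence"]:
            # detection[0:4] is (centerX, centerY, width, height),
            # normalized to [0, 1]; scale back to the frame size.
            (cx, cy, w, h) = detection[0:4] * np.array([W, H, W, H])
            x, y = int(cx - w / 2), int(cy - h / 2)   # top-left corner
            boxes.append([x, y, int(w), int(h)])
            confidences.append(confidence)
            class_ids.append(class_id)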


Apply Non-Maximum Suppression
• To remove overlapping and redundant bounding boxes predicted for the same
object instance (one object may be detected by multiple grid cells)
• and keep only the most confident and accurate bbox.

• Procedure:
1. Choose the bbox with the maximum confidence score (bbox1)
2. Eliminate the other boxes whose IoU (intersection over union)
with bbox1 exceeds the input threshold

IoU = area of intersection / area of union
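The IoU above can be computed directly for two boxes in (x, y, w, h) form, and in
practice OpenCV's built-in NMS implements the whole procedure; a sketch:

def iou(box_a, box_b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# OpenCV's NMS keeps the indices of the surviving boxes.
idxs = cv2.dnn.NMSBoxes(boxes, confidences,
                        args["confidence"], args["threshold"])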

Lastly, it will draw the bounding boxes and display the class label
and confidence score for each detection.
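The drawing step might look like the following sketch (the box color and window name
are assumptions):

for i in np.array(idxs).flatten():
    (x, y, w, h) = boxes[i]
    # Draw the box, then the label and confidence just above it.
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    text = "{}: {:.2f}".format(LABELS[class_ids[i]], confidences[i])
    cv2.putText(frame, text, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2.imshow("Output", frame)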
Result/Output Frames:

[Sample output frames from Vid 1 and Vid 2, with detected objects boxed and labeled]
Evaluation of Model

Metric used is IoU (Intersection over Union)

● Ground Truth Extraction
- We have extracted the ground truth using the free AI tool
makesense.ai
- Exported the GT coordinates matching the YOLO output format

● Calculated the IoU between each prediction and its ground truth.
● If IoU > 0.5: True Positive ++
(A sketch of this matching step follows.)
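A sketch of the matching step, reusing the iou() helper above; the greedy one-to-one
matching strategy is an assumption about the project's exact procedure:

def count_true_positives(predictions, ground_truths, iou_threshold=0.5):
    # Match each predicted box to its best remaining ground-truth box;
    # count a true positive when the IoU exceeds the threshold.
    true_positives = 0
    unmatched = list(ground_truths)
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) > iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives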
Accuracy of the above algorithm, taking one frame as input along with
its ground truth:

Output:

Figure 3: bounding boxes, confidence and label of each detection

Note:
Accuracy = correct predictions / total predictions

Avg Precision = how correctly the model identifies instances of a
particular class (e.g., among all class-1 predictions, how many are
correct)

Avg Recall = out of all true class-1 samples, how many are detected
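For one frame, the metrics defined in the note could be computed as in this sketch
(per-frame counting is an assumption; the slide's averages are per class):

def frame_metrics(tp, n_predictions, n_ground_truths):
    fp = n_predictions - tp       # predictions with no matching GT box
    fn = n_ground_truths - tp     # GT boxes the model missed
    accuracy = tp / n_predictions if n_predictions else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall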
Comparison graph of the accuracy of four frames taken as input along with their
ground truths:

Average Accuracy of Model:

0.798 ≈ 80%

Figure4: line graph to compare correctness of each image


Comparison graph of the count of ground truth vs. detections in a frame

Figure 5: reference frame

Figure 6: point graph to compare count of ground truth vs predictions


Conclusion

● In this project, we implemented a YOLO-based object detection
system that processes video streams to identify and classify
objects in real time.

● The model achieved an accuracy of 80%, demonstrating effective
performance in detecting and labeling objects from the COCO
dataset.

● This system showcases the potential for real-world applications
in surveillance, autonomous driving, and image analysis.
Application & Future Scope

● Self-driving cars and real-time surveillance systems and analysis
● Traffic Monitoring Systems
● Expression Analysis, etc.

● Future Plans:
- Model upgrade to v4, v5, etc.
- Enhance detection capabilities
- Extended evaluation metrics
- Improve overall accuracy
References

● Research Papers:

[1] Artificial Intelligence Applications in Mobile Virtual Reality
Technology | 6th Sept, 2021 | Wireless Communications and Mobile
Computing | Chunsheng Chen and Din Li

[2] Moving Object Detection for Video Surveillance | 11th Mar, 2015 |
The Scientific World Journal | K. Kalirajan and M. Sudha

[3] Visual Sequence Algorithm for Moving Object Tracking and Detection
in Images | 27th Dec, 2021 | Contrast Media and Molecular Imaging |
Renzheng Xue, Ming Liu and Xiaokun Yu

● Media and Dataset

- COCO from https://docs.ultralytics.com/datasets/detect/coco/
- videos from https://www.kaggle.com/code/shawon10/object-detection-from-a-traffic-video/input
THANK YOU
