Grp2 Final PPT YOLO Moving Object Classification


Final Year Project

Evaluation (Semester – 8)
PROJ - CS881
Moving Object
Classification
using Deep Learning
YOLO v3

Guided By – Amlan Ray Chaudhuri Sir


Team Members

Kaustav Sarkar – 11500120051
Ahana Basu – 11500120098
Krishnendu Sankar Mandal – 11500120012
Manjit Paul – 11500120107
Contents

1. Objective

2. Motivation

3. Literature Survey

4. Proposed Methodology

5. Result/ Conclusion

6. Application & Future Scope

7. References
Objective

1. Real-time Object Detection & Classification:

To detect and classify objects in video streams in real time using the YOLO (You Only
Look Once) deep learning model. This involves identifying objects in each frame, drawing
bounding boxes, and labeling them with confidence scores.

2. Pre-trained Model Integration:


To utilize a pre-trained YOLO model trained on the COCO (Common Objects in Context)
dataset. This enables leveraging existing deep learning models for accurate object
detection across various classes.

3. Interactive Video Processing:


To process video input from a webcam or specified file, demonstrating practical computer
vision applications for object recognition in real-time.
Motivation

Why YOLO v3?

Single Forward Pass:

Unlike traditional object detectors that apply the model multiple times over regions and
scales of the image, YOLO processes the entire frame in a single forward pass, which
significantly reduces computational cost and increases detection speed.

Open Source:
The availability of pre-trained models and open-source implementations makes it
accessible.

● Here, we use the YOLOv3 model instead of YOLOv8 because, in our tests, YOLOv3 gave
confidence scores between 0.6 (minimum) and 0.9 (maximum), whereas YOLOv8 gave a maximum
confidence of only 0.27.

● Efficient and accurate object detection in live video feeds for applications like
surveillance, traffic monitoring, and robotics.
Literature Survey
Research Paper 1:
(A Lightweight Moving Vehicle Classification System Through Attention-Based Method and
Deep Learning)
https://ieeexplore.ieee.org/abstract/document/8886464/authors#authors

Published Date: 30th October, 2019


Authors: Nasaruddin Nasaruddin | Kahlil Muchtar | Afdhal Afdhal

The research paper discusses using convolutional neural networks (CNNs) for vehicle classification in videos, focusing
on intelligent transportation systems. They propose an attention-based approach to overcome challenges like camera
jitter and bad weather.

Methodology involves two parts: attention-based detection and fine-grained classification using YOLOv3. The authors
customize YOLOv3 with their dataset of 49,652 annotated frames covering four vehicle classes. They address class
imbalance with image augmentation. Results show the proposed method outperforms existing techniques in specificity,
false-positive rate, and precision. They use challenging outdoor scenes from CDNET2014 for evaluation.
Disadvantage: The paper lacks details on computational complexity and scalability.

In summary, the paper introduces a promising approach for vehicle classification in complex scenarios, showing
potential for real-world applications in intelligent transportation systems. However, it could provide more insights into
computational efficiency and scalability for practical implementation.
Research Paper 2:
(Moving Vehicle Detection and Classification Using Gaussian Mixture Model and Ensemble Deep
Learning Technique)
https://www.hindawi.com/journals/wcmc/2021/5590894/

Published Date: 27th May, 2021


Authors: Preetha Jagannathan | Sujatha Rajkumar | Jaroslav Frnda | Prabu Subramani

The research paper focuses on improving automatic vehicle classification in visual traffic surveillance systems,
particularly during lockdowns to curb COVID-19 spread. It proposes a new technique for classifying vehicle types,
addressing issues with imbalanced data and real-time performance.

The methodology involves collecting data from the Beijing Institute of Technology Vehicle Dataset and the
MIOvision Traffic Camera Dataset. Techniques like adaptive histogram equalization and Gaussian mixture model are
used for image enhancement and vehicle detection. Feature extraction is done using the Steerable Pyramid
Transform and Weber Local Descriptor, followed by classification using ensemble deep learning.
The proposed technique achieves high classification accuracy of 99.13% and 99.28% on the MIOvision Traffic
Camera Dataset and the Beijing Institute of Technology Vehicle Dataset, respectively. These results outperform
existing benchmark techniques.

Disadvantage: The paper lacks discussion of the computational efficiency and scalability
of the proposed method, which could be a drawback.
Proposed Methodology
Data Collection & Preparation

● Argument Parsing: "argparse" is used to input the video
file path and set parameters like confidence and threshold.

● Load Class Labels: Read and load the COCO class labels from
a text file to identify the object classes in the dataset.
(A sketch of both steps follows.)
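A minimal Python sketch of these two steps; the flag names and the labels-file path
(yolo-coco/coco.names) are assumptions, not the project's exact code:

import argparse
import os

# Parse command-line arguments: input video, confidence, and NMS threshold.
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", default="0",
                help="path to input video file, or 0 for webcam")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
                help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
                help="IoU threshold for non-maximum suppression")
args = vars(ap.parse_args())

# Load the 80 COCO class labels, one per line (file path is an assumption).
with open(os.path.join("yolo-coco", "coco.names")) as f:
    LABELS = f.read().strip().split("\n")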
Library and Framework Selection
• Import necessary packages: numpy, argparse, imutils, cv2, and os
• Use OpenCV's Deep Neural Network (DNN) module for loading and running the
YOLO model

Loading Model
Specify paths for YOLO weight (yolov3.weights) and configuration (yolov3.cfg) files.

Load the model using cv2.dnn.readNetFromDarknet
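A sketch of the model-loading step with the files named above; the output-layer lookup
is the standard OpenCV DNN call, assumed here:

import cv2

# Load the Darknet model from the configuration and weight files.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

# YOLOv3 has three output layers (one per detection scale); their names
# are needed to collect detections during the forward pass.
output_layers = net.getUnconnectedOutLayersNames()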


Detection & Classification
Initialise Video Stream
• From the command-line arguments: 0 for webcam, otherwise a video file path.
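For example, the source selection could look like this sketch (args comes from the
argparse sketch above):

# "0" on the command line means webcam; anything else is a video file path.
src = args["input"]
cap = cv2.VideoCapture(0 if src == "0" else src)
if not cap.isOpened():
    raise SystemExit("Cannot open video source: " + src)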

Process Each Frame (loop)


• Blob Construction (preprocessing): makes the frame suitable for YOLO model input, so
that it matches the format and scale expected by the neural network for accurate detection

• 1 / 255.0: a scaling factor to normalize pixel values to [0, 1], suitable for the
neural network
• YOLO expects the input image to be of a fixed size (416x416 pixels)
• swapRB=True: converts the image from BGR (OpenCV's default format) to RGB (the format
the network expects)
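Putting these preprocessing parameters together, one iteration of the loop might be
handled as in this sketch (variable names are assumptions):

# Normalize to [0, 1], resize to 416x416, and swap BGR -> RGB.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
layer_outputs = net.forward(output_layers)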
Perform & Process Detections (for better model performance)

● In YOLO, the image is divided into a grid, and bounding boxes are
predicted in each grid cell in a single pass.

● Each detection (output) is a list of 85 items:
- First 5 values: center X, center Y, width, height, and a
confidence score (the probability that an object is present)
- detection[5] to detection[84]: class scores for the 80 classes
in the COCO dataset (a probability distribution over all classes)
Filter out weak predictions
• For each detection, extract the class scores (detection[5:])
• Identify the class with the highest score and its corresponding confidence
• Compare the confidence score with a predefined threshold (e.g., 0.5)

Hence, detections with a lower confidence will not be shown in the output. (A combined
sketch of parsing and filtering detections follows.)
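A combined sketch of parsing the 85-value detection vectors and filtering weak
predictions; following the common OpenCV-DNN YOLO pattern, the highest class score is
used as the confidence (an assumption about the project's exact code):

import numpy as np

boxes, confidences, class_ids = [], [], []
(H, W) = frame.shape[:2]

for output in layer_outputs:
    for detection in output:            # one 85-value vector per detection
        scores = detection[5:]          # class scores for the 80 COCO classes
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > args["confidence"]:
            # detection[0:4] is (centerX, centerY, width, height),
            # normalized to [0, 1]; scale back to the frame size.
            (cx, cy, w, h) = detection[0:4] * np.array([W, H, W, H])
            x, y = int(cx - w / 2), int(cy - h / 2)   # top-left corner
            boxes.append([x, y, int(w), int(h)])
            confidences.append(confidence)
            class_ids.append(class_id)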


Apply Non-Maximum Suppression
• To remove overlapping and redundant bounding boxes predicted for the same
object instance (one object may be detected by multiple grid cells)
• and keep only the most confident and accurate bbox.

• Procedure:
1. Choose the bbox with the maximum confidence score (bbox1)
2. Eliminate the other boxes whose IoU (intersection over union)
with bbox1 exceeds the input threshold

IoU = area of intersection / area of union
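The IoU above can be computed directly for two boxes in (x, y, w, h) form, and in
practice OpenCV's built-in NMS implements the whole procedure; a sketch:

def iou(box_a, box_b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# OpenCV's NMS keeps the indices of the surviving boxes.
idxs = cv2.dnn.NMSBoxes(boxes, confidences,
                        args["confidence"], args["threshold"])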

Lastly, it will draw the bounding boxes and display the class label
and confidence score for each detection.
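The drawing step might look like the following sketch (the box color and window name
are assumptions):

for i in np.array(idxs).flatten():
    (x, y, w, h) = boxes[i]
    # Draw the box, then the label and confidence just above it.
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    text = "{}: {:.2f}".format(LABELS[class_ids[i]], confidences[i])
    cv2.putText(frame, text, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2.imshow("Output", frame)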
Result/Output Frames:

[Sample output frames from Vid 1 and Vid 2, with detected objects boxed and labeled]
Evaluation of Model

Metric used is IoU (Intersection over Union)

● Ground Truth Extraction
- We have extracted the ground truth using the free AI tool
makesense.ai
- Exported the GT coordinates matching the YOLO output format

● Calculated the IoU between each prediction and its ground truth.
● If IoU > 0.5: True Positive ++
(A sketch of this matching step follows.)
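A sketch of the matching step, reusing the iou() helper above; the greedy one-to-one
matching strategy is an assumption about the project's exact procedure:

def count_true_positives(predictions, ground_truths, iou_threshold=0.5):
    # Match each predicted box to its best remaining ground-truth box;
    # count a true positive when the IoU exceeds the threshold.
    true_positives = 0
    unmatched = list(ground_truths)
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) > iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives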
Accuracy of the above algorithm, taking one frame as input along with
its ground truth:

Output:

Figure 3: bounding boxes, confidence and label of each detection

Note:
Accuracy = correct predictions / total predictions

Avg Precision = how correctly the model identifies instances of a
particular class (e.g., among all class-1 predictions, how many are
correct)

Avg Recall = out of all true class-1 samples, how many are detected
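For one frame, the metrics defined in the note could be computed as in this sketch
(per-frame counting is an assumption; the slide's averages are per class):

def frame_metrics(tp, n_predictions, n_ground_truths):
    fp = n_predictions - tp       # predictions with no matching GT box
    fn = n_ground_truths - tp     # GT boxes the model missed
    accuracy = tp / n_predictions if n_predictions else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall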
Comparison graph of the accuracy of four frames taken as input along with their
ground truths:

Average Accuracy of Model:

0.798 ≈ 80%

Figure4: line graph to compare correctness of each image


Comparison graph of the count of ground truth vs. detections in a frame

Figure 5: reference frame

Figure 6: point graph to compare count of ground truth vs predictions


Conclusion

● In this project, we implemented a YOLO-based object detection
system that processes video streams to identify and classify
objects in real time.

● The model achieved an accuracy of 80%, demonstrating effective
performance in detecting and labeling objects from the COCO
dataset.

● This system showcases the potential for real-world applications
in surveillance, autonomous driving, and image analysis.
Application & Future Scope

● Self-driving cars and real-time surveillance systems and analysis
● Traffic Monitoring Systems
● Expression Analysis, etc.

● Future Plans:
- Model upgrade to v4, v5, etc.
- Enhance detection capabilities
- Extended evaluation metrics
- Improve overall accuracy
References

● Research Papers:

[1] Artificial Intelligence Applications in Mobile Virtual Reality
Technology | 6th Sept, 2021 | Wireless Communications and Mobile
Computing | Chunsheng Chen and Din Li

[2] Moving Object Detection for Video Surveillance | 11th Mar, 2015 |
The Scientific World Journal | K. Kalirajan and M. Sudha

[3] Visual Sequence Algorithm for Moving Object Tracking and Detection
in Images | 27th Dec, 2021 | Contrast Media and Molecular Imaging |
Renzheng Xue, Ming Liu and Xiaokun Yu

● Media and Dataset

- COCO from https://docs.ultralytics.com/datasets/detect/coco/
- videos from https://www.kaggle.com/code/shawon10/object-detection-from-a-traffic-video/input
THANK YOU
