Group 2 - Moving Object Classification Using the YOLO Algorithm


Mid-Term Project Evaluation (Semester 8)

Title: Moving Object Classification Using Deep Learning
Team Members

Kaustav Sarkar (11500120051)
Ahana Basu (11500120098)
Krishnendu Sankar Mandal (11500120012)
Manjit Paul (11500120107)
Contents
1. Objective

2. Motivation

3. Literature Survey

4. Proposed Methodology

5. Result/Discussion
Objective

1. Real-time Object Detection & Classification: Detect and classify objects in video streams using the YOLO (You Only Look Once) deep learning algorithm. This involves identifying objects in each frame, drawing bounding boxes, and labeling them with confidence scores.

2. Pre-trained Model Integration: Utilize a YOLO model pre-trained on the COCO (Common Objects in Context) dataset, leveraging an existing deep learning model for accurate object detection across a wide range of classes.

3. Interactive Video Processing: Develop an interactive application that processes video input from a webcam or a specified file, demonstrating a practical real-time computer vision application for object recognition.
Motivation

The application performs real-time object detection on a video stream or webcam feed using the YOLOv3 model and OpenCV. It continuously processes frames, detects objects, and displays the results until the user exits the program.

We use the YOLOv3 model instead of YOLOv8 because, in our tests, YOLOv3 produced confidence scores between 0.6 and 0.9, whereas YOLOv8 reached a maximum confidence of only 0.27.

The broader motivation is efficient and accurate object detection in live video feeds for applications such as surveillance, traffic monitoring, and robotics.
Literature Survey
Research Paper 1:
(A Lightweight Moving Vehicle Classification System Through Attention-Based Method and
Deep Learning)
https://ieeexplore.ieee.org/abstract/document/8886464/authors#authors

Published: 30 October 2019
Authors: Nasaruddin Nasaruddin, Kahlil Muchtar, Afdhal Afdhal

The paper discusses using convolutional neural networks (CNNs) for vehicle classification in videos, focusing on intelligent transportation systems. The authors propose an attention-based approach to overcome challenges such as camera jitter and bad weather.

The methodology has two parts: attention-based detection and fine-grained classification using YOLOv3. The authors customize YOLOv3 with their own dataset of 49,652 annotated frames covering four vehicle classes, and address class imbalance with image augmentation. Evaluated on challenging outdoor scenes from CDNET2014, the proposed method outperforms existing techniques in specificity, false-positive rate, and precision.

Disadvantage: the paper lacks details on computational complexity and scalability.

In summary, the paper introduces a promising approach for vehicle classification in complex scenarios, with potential for real-world use in intelligent transportation systems, but it could provide more insight into computational efficiency and scalability for practical implementation.
Research Paper 2:
(Moving Vehicle Detection and Classification Using Gaussian Mixture Model and Ensemble Deep
Learning Technique)
https://www.hindawi.com/journals/wcmc/2021/5590894/

Published: 27 May 2021
Authors: Preetha Jagannathan, Sujatha Rajkumar, Jaroslav Frnda, Prabu Subramani

The research paper focuses on improving automatic vehicle classification in visual traffic surveillance systems,
particularly during lockdowns to curb COVID-19 spread. It proposes a new technique for classifying vehicle types,
addressing issues with imbalanced data and real-time performance.

The methodology involves collecting data from the Beijing Institute of Technology Vehicle Dataset and the
MIOvision Traffic Camera Dataset. Techniques like adaptive histogram equalization and Gaussian mixture model are
used for image enhancement and vehicle detection. Feature extraction is done using the Steerable Pyramid
Transform and Weber Local Descriptor, followed by classification using ensemble deep learning.
The proposed technique achieves high classification accuracy of 99.13% and 99.28% on the MIOvision Traffic
Camera Dataset and the Beijing Institute of Technology Vehicle Dataset, respectively. These results outperform
existing benchmark techniques.

Disadvantage: However, the paper lacks discussion on the computational efficiency and scalability of
the proposed method, which could be a drawback.
Proposed Methodology
1) Importing Necessary Packages:
a) numpy: For numerical operations on arrays.
b) argparse: For parsing command line arguments.
c) imutils: Provides convenience functions to work with OpenCV.
d) time: For timing operations.
e) cv2: OpenCV library for computer vision tasks.
f) os: For interacting with the operating system.
2) Argument Parsing:
a) Using argparse, command-line arguments like input video path, confidence
threshold, and non-maximum suppression threshold are parsed.
3) Loading Model and Labels:
a) Loads the YOLO object detection model from disk using the
cv2.dnn.readNetFromDarknet() function, which reads the model architecture from
a .cfg file and the weights from a .weights file.
b) Loads the class labels from a text file.
4) Initializing Video Stream:
a) Determines whether the input is a video file or a live camera feed.
b) If the input is a video file, it retrieves frame dimensions and total frames.
5) Frame Processing in a Loop:
a) Processes each frame of the video stream.
b) Constructs a blob (resizes the frame, converts it to RGB format, normalizes pixel
values) from the input frame to feed it to the neural network.
c) Performs a forward pass through the network to get bounding boxes, confidences,
and class IDs of detected objects.
d) Filters out weak detections based on confidence threshold.
e) Applies non-maximum suppression to remove redundant bounding boxes.
f) Draws bounding boxes and labels on the frame.
g) Displays the processed frame.
6) Exiting the Program:
a) Listens for the 'q' key press to exit the video.
7) Cleanup:
a) Releases video stream and closes OpenCV windows.
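Steps 1-7 above can be sketched as follows. The file names (yolov3.cfg, yolov3.weights, coco.names) and the parse_detections helper are illustrative assumptions, not the project's exact code.

```python
import argparse
import numpy as np

def parse_detections(layer_outputs, conf_threshold, width, height):
    """Collect boxes, confidences, and class IDs from YOLO output layers (steps 5c-5d).

    Each output row is [cx, cy, w, h, objectness, class scores...], with
    coordinates normalized to [0, 1]; boxes are returned as [x, y, w, h] pixels.
    """
    boxes, confidences, class_ids = [], [], []
    for output in layer_outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:  # step 5d: drop weak detections
                cx, cy, w, h = detection[0:4] * np.array([width, height, width, height])
                boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
                confidences.append(confidence)
                class_ids.append(class_id)
    return boxes, confidences, class_ids

def main():
    import cv2  # imported here so parse_detections stays NumPy-only

    # Step 2: argument parsing
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--input", default="", help="input video path (webcam if omitted)")
    ap.add_argument("-c", "--confidence", type=float, default=0.5)
    ap.add_argument("-t", "--threshold", type=float, default=0.3, help="NMS threshold")
    args = ap.parse_args()

    # Step 3: load class labels and the Darknet model
    labels = open("coco.names").read().strip().split("\n")
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    out_names = net.getUnconnectedOutLayersNames()

    # Step 4: video file or live camera feed
    vs = cv2.VideoCapture(args.input if args.input else 0)

    # Step 5: per-frame processing loop
    while True:
        grabbed, frame = vs.read()
        if not grabbed:
            break
        h, w = frame.shape[:2]
        # 5b: blob construction (resize, scale to [0, 1], BGR -> RGB)
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
        net.setInput(blob)
        layer_outputs = net.forward(out_names)  # 5c: forward pass

        boxes, confidences, class_ids = parse_detections(layer_outputs, args.confidence, w, h)
        # 5e: non-maximum suppression removes redundant boxes
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, args.confidence, args.threshold)
        for i in np.array(idxs).flatten():
            x, y, bw, bh = boxes[i]
            cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)  # 5f
            cv2.putText(frame, f"{labels[class_ids[i]]}: {confidences[i]:.2f}",
                        (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        cv2.imshow("Frame", frame)  # 5g
        if cv2.waitKey(1) & 0xFF == ord("q"):  # step 6: exit on 'q'
            break

    # Step 7: cleanup
    vs.release()
    cv2.destroyAllWindows()

# main() would be run as the script entry point, e.g.:
# python detect.py --input traffic.mp4 --confidence 0.5 --threshold 0.3
```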
Result/Discussion:

[Demonstration videos: Vid 1 and Vid 2]

Thank You
