Grp2 Final PPT YOLO Moving Object Classification
Evaluation (Semester – 8)
PROJ - CS881
Moving Object Classification
using Deep Learning (YOLO v3)
1. Objective
2. Motivation
3. Literature Survey
4. Proposed Methodology
5. Result / Conclusion
6. Future Scope
7. References
Objective
1. Real-time Object Detection & Classification:
Detect and classify objects in video streams using the YOLO (You Only Look Once) deep learning model. This
involves identifying objects in each frame, drawing bounding boxes, and labeling them with confidence scores.
2. Open Source:
The availability of pre-trained models and open-source implementations makes YOLO accessible.
● We use the YOLOv3 model instead of YOLOv8 because, in our tests, YOLOv3 gave confidence scores between
0.6 and 0.9, whereas YOLOv8 gave a maximum confidence of only 0.27.
● Efficient and accurate object detection in live video feeds for applications like
surveillance, traffic monitoring, and robotics.
Literature Survey
Research Paper 1:
(A Lightweight Moving Vehicle Classification System Through Attention-Based Method and
Deep Learning)
https://ieeexplore.ieee.org/abstract/document/8886464/authors#authors
The research paper discusses using convolutional neural networks (CNNs) for vehicle classification in videos, focusing
on intelligent transportation systems. They propose an attention-based approach to overcome challenges like camera
jitter and bad weather.
Methodology involves two parts: attention-based detection and fine-grained classification using YOLOv3. The authors
customize YOLOv3 with their dataset of 49,652 annotated frames covering four vehicle classes. They address class
imbalance with image augmentation. Results show the proposed method outperforms existing techniques in specificity,
false-positive rate, and precision. They use challenging outdoor scenes from CDNET2014 for evaluation.
Disadvantage: the paper lacks details on computational complexity and scalability.
In summary, the paper introduces a promising approach for vehicle classification in complex scenarios, showing
potential for real-world applications in intelligent transportation systems. However, it could provide more insights into
computational efficiency and scalability for practical implementation.
Research Paper 2:
(Moving Vehicle Detection and Classification Using Gaussian Mixture Model and Ensemble Deep
Learning Technique)
https://www.hindawi.com/journals/wcmc/2021/5590894/
The research paper focuses on improving automatic vehicle classification in visual traffic surveillance systems,
particularly during lockdowns to curb COVID-19 spread. It proposes a new technique for classifying vehicle types,
addressing issues with imbalanced data and real-time performance.
The methodology involves collecting data from the Beijing Institute of Technology Vehicle Dataset and the
MIOvision Traffic Camera Dataset. Techniques like adaptive histogram equalization and Gaussian mixture model are
used for image enhancement and vehicle detection. Feature extraction is done using the Steerable Pyramid
Transform and Weber Local Descriptor, followed by classification using ensemble deep learning.
The proposed technique achieves high classification accuracy of 99.13% and 99.28% on the MIOvision Traffic
Camera Dataset and the Beijing Institute of Technology Vehicle Dataset, respectively. These results outperform
existing benchmark techniques.
Disadvantage: the paper does not discuss the computational efficiency or scalability of the proposed method.
Proposed Methodology
● Argument Parsing & Model Loading:
Specify paths for the YOLO weights (yolov3.weights) and configuration (yolov3.cfg) files.
• 1 / 255.0: a scaling factor that normalizes pixel values to [0, 1], suitable for the neural network.
• YOLO expects the input image at a fixed size (416x416 pixels).
• swapRB=True converts the image from BGR (OpenCV's default channel order) to RGB (the order the
network expects).
● In YOLO, the image is divided into a grid, and bounding boxes are predicted in each grid cell.
● Each detection (output row) is a list of 85 values:
• The first 5 values are center_x, center_y, width, height, and the objectness confidence score.
• The remaining 80 values are the class scores for the COCO classes.
• Non-Maximum Suppression procedure:
1. Choose the bounding box with the maximum confidence score (bbox1).
2. Eliminate every other box whose IoU (intersection over union)
with bbox1 exceeds the input threshold.
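The two-step suppression procedure above can be sketched in plain Python. OpenCV ships the same operation as `cv2.dnn.NMSBoxes`; this hand-rolled `nms`/`iou` pair is only an illustration of the logic, with boxes given as (x, y, w, h).

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.4):
    """Return indices of the boxes kept after non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # step 1: highest-confidence box
        keep.append(best)
        order = [i for i in order      # step 2: drop heavy overlaps
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```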
[Demo videos: Vid 1, Vid 2]
Evaluation
● Ground Truth Extraction:
We extracted the ground truth using the free AI tool makesense.ai.
Output:
Note: Accuracy = correct predictions / total predictions
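The accuracy formula above can be computed as follows. This is a sketch that assumes the predicted and ground-truth class labels have already been matched up per object; the `accuracy` helper is a hypothetical name, not the project's code.

```python
def accuracy(predicted_labels, true_labels):
    """Accuracy = correct predictions / total predictions, comparing
    per-object class labels against the makesense.ai ground truth."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels) if true_labels else 0.0
```

For example, if the model labels three objects as ["car", "bus", "car"] and the ground truth is ["car", "car", "car"], the accuracy is 2/3.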
Future Scope
● Future Plans:
- Model upgrade to YOLOv4, YOLOv5, etc.
- Enhance detection capabilities
- Extended evaluation metrics
- Improve overall accuracy
References
● Research Papers:
[1] "Artificial Intelligence Applications in Mobile Virtual Reality Technology" | 6 Sept 2021 | Wireless
Communications and Mobile Computing | Chunsheng Chen and Din Li
[3] "Visual Sequence Algorithm for Moving Object Tracking and Detection in Images" | 27 Dec 2021 | Contrast
Media and Molecular Imaging | Renzheng Xue, Ming Liu and Xiaokun Yu