Object Detection Classification and Tracking of Everyday Common Objects
ISSN No:-2456-2165
Abstract:- This project presents a computer vision system for object detection, classification, and tracking based on the YOLOv4 algorithm. Recent advances in deep learning have led to significant improvements in the accuracy and speed of object detection models. The project focuses on training the YOLOv4 model on large-scale datasets with diverse object categories. Using transfer learning, the model is fine-tuned to adapt to specific target objects of interest, with the aim of high accuracy and good generalization. The performance analysis of the system shows promising results: the model achieves an accuracy of over 95% for most objects, dropping to 75% for a few objects and, rarely, to 50%. The fluctuation arises because the model is not very robust to occlusions. Overall, the model significantly improves on the accuracy of the existing model by detecting targets that are very close to the edges of the frame and focusing on them before they exit the frame. The model also counts the objects and records their position information while tracking. However, ongoing improvement is necessary to address remaining challenges such as real-time multi-object tracking, object association, and occlusion handling.

Keywords:- YOLOv4, detection, classification, tracking, OpenCV.

I. INTRODUCTION

Recent advances in deep learning have led to significant improvements in the accuracy and speed of object detection models. These models are now able to detect a wider range of objects, including everyday common objects such as people, cars, animals, and food. In addition, object tracking algorithms have been developed that can track objects over time more accurately, even when they are partially occluded or moving quickly. One of the most significant recent developments in object detection is the YOLOv4 model. YOLOv4 has been shown to achieve state-of-the-art accuracy on a variety of object detection benchmarks, while also being significantly faster than previous versions of YOLO. When trained on the COCO dataset, it recognizes 80 object categories, making it a powerful tool for detecting everyday common objects.

Another important development is the DeepSORT algorithm. DeepSORT is an object tracking algorithm that can track objects over time more accurately than previous methods. It is able to track objects even when they are partially occluded or moving quickly, making it a valuable tool for applications such as self-driving cars and video surveillance.

As research in this area continues, we can expect to see further improvements in the accuracy, speed, and capabilities of object detection and tracking models. These models will have a wide range of applications, such as self-driving cars, video surveillance, and robotics.

II. LITERATURE SURVEY

This paper[1] proposes a method for object tracking and counting in a zone using YOLOv4, DeepSORT, and TensorFlow. The experimental results are promising and suggest that the proposed method is a viable option for object tracking and counting in a zone. The key points of the paper are that YOLOv4 is a fast and accurate object detection algorithm, that DeepSORT is a tracking algorithm that can track objects over time, that TensorFlow is a machine learning framework that can be used to train and deploy YOLOv4 and DeepSORT, and that the proposed method achieved high accuracy and speed in object tracking and counting.

This paper[2] proposes a new method for multiple object tracking in surveillance videos. It uses a Spatio-Temporal Markov Random Field (ST-MRF) model that tracks objects using the motion vectors (MVs) and block coding modes (BCMs) from a compressed bitstream. The results show that the proposed method outperforms other methods on the MOTChallenge benchmark. The paper also investigates the use of visual features in the tracking phase of a tracking system using DeepSORT. The results show that the use of visual features can improve the performance of the tracking system.

This paper[3] proposes a new object detection framework called YOLOv4-5D, which is based on the YOLOv4 architecture. The framework introduces several new techniques to improve the accuracy and efficiency of object detection for autonomous driving. These techniques include using a new backbone network, replacing the last output layer with deformable convolution, designing a new feature fusion module, and using a new network pruning algorithm. The experimental results show that YOLOv4-5D outperforms the YOLOv4 baseline on the BDD and KITTI datasets. The framework is also able to run in real time, making it suitable for use in autonomous driving applications.

This paper[4] proposes a real-time vehicle detection and tracking system based on the YOLOv4-tiny object detection model. The system uses a pre-trained YOLOv4-tiny model to detect vehicles in real time and the DeepSORT algorithm to track the detected vehicles. The system was evaluated on the
Fig 1 shows the architecture diagram of the application. The details of the application are expanded in this section of the report. The user interacts with the project through the command line terminal. The input to the project is given via webcam or by passing the file location of a video through the terminal. The input video is then preprocessed and sent to the YOLOv4 model (DNN). The model is trained and evaluated using the val2014 and val2017 splits of the COCO dataset. Finally, the video after feature extraction is played back, and the tracking details and the detected objects are displayed on the terminal.
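The choice between webcam and video file input described above could be handled as in the following minimal sketch. The flag names `--video` and `--camera` and the function `parse_source` are hypothetical and are not taken from the project; they only illustrate one way to expose the two input modes on the command line.

```python
import argparse


def parse_source(argv=None):
    """Return the video source selected on the command line.

    Hypothetical interface (an assumption, not the project's actual CLI):
    `--video PATH` selects a video file, otherwise a webcam index is used.
    """
    parser = argparse.ArgumentParser(
        description="Detect, classify and track objects in a video stream."
    )
    parser.add_argument("--video", default=None, help="path to a video file")
    parser.add_argument("--camera", type=int, default=0, help="webcam index")
    args = parser.parse_args(argv)
    # A file path takes priority; otherwise fall back to the webcam index.
    return args.video if args.video is not None else args.camera
```

The returned value can be passed directly to OpenCV's `cv2.VideoCapture`, which accepts either a device index or a file name.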
A. Functional Description of the Modules
There are three major modules in this project: multiple object detection, followed by multiple object classification, and finally multiple object tracking. Fig 1 shows the structure chart for all three modules.

Multiple Object Detection
An image is given as input to the algorithm and transformed using a CNN. These transformations make the input image compatible with the specifications of the algorithm. Following this, a flattening operation is performed: flattening converts the data into a 1-dimensional array for input to the next layer. Most approaches employ a sliding window over the feature map and assign foreground/background scores depending on the features computed in that window. Neighboring windows have similar scores to some extent and are all considered candidate regions. This leads to hundreds of proposals, which in turn calls for a technique that filters proposals based on some criteria, called Non-Maximum Suppression (NMS). An Intersection over Union (IoU) calculation is used to measure the overlap between two proposals.

Multiple Object Classification
Multi-object classification is a computer vision task aimed at simultaneously identifying and categorizing multiple objects within an image. Anchor boxes are placed across the image during training and act as reference frames for the model to predict object locations. The model calculates the offsets between the anchor boxes and the actual bounding boxes enclosing the objects, which helps in precisely localizing each object. The predicted bounding box for each detected object is generated by adjusting the anchor boxes based on the predicted offsets. For each object detected,
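The Non-Maximum Suppression step described under Multiple Object Detection can be illustrated with a minimal pure-Python sketch. It assumes boxes in (x1, y1, x2, y2) corner format and an overlap threshold of 0.5; both are assumptions for illustration, as is the greedy formulation, and none of it is taken from the project's code. Overlap is measured as Intersection over Union (IoU), the area shared by two boxes divided by the area they jointly cover.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the intersection are clamped at zero when disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate proposals and one separate proposal.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # -> [0, 2]
```

Here the second proposal overlaps the first with an IoU of about 0.82 and is discarded, while the third proposal does not overlap either of the others and is kept.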
Fig 3 shows the tracking of the objects detected by our model. The model detects the objects, classifies them across the 80 categories, and shows the probability of a match. Fig 4 shows the tracking position of the detected object in different frames of the videos.

The performance analysis of the system shows promising results. The object detection, classification and tracking model achieves high accuracy in detecting and tracking objects: over 95% for most objects, dropping to 75% for a few objects and, rarely, to 50%. The fluctuation arises because the model is not very robust to occlusions.
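One way to derive per-class object counts and per-frame positions from tracking output is sketched below. It assumes that a tracker has already assigned a stable ID to every detection; the tuple layout `(track_id, label, box)` and the function name `summarize_tracks` are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def summarize_tracks(frames):
    """Count unique tracked objects per class and collect their positions.

    `frames` holds one list of detections per video frame; each detection is
    a (track_id, label, (x1, y1, x2, y2)) tuple. This layout is an assumption.
    """
    ids_per_label = defaultdict(set)
    positions = defaultdict(list)
    for frame_index, detections in enumerate(frames):
        for track_id, label, (x1, y1, x2, y2) in detections:
            ids_per_label[label].add(track_id)
            # The box centre serves as the object's position in this frame.
            centre = ((x1 + x2) / 2, (y1 + y2) / 2)
            positions[track_id].append((frame_index, centre))
    counts = {label: len(ids) for label, ids in ids_per_label.items()}
    return counts, dict(positions)


frames = [
    [(1, "person", (0, 0, 10, 20)), (2, "car", (50, 50, 90, 70))],
    [(1, "person", (2, 0, 12, 20)), (3, "person", (30, 0, 40, 20))],
]
counts, positions = summarize_tracks(frames)
```

Counting unique track IDs rather than raw detections avoids counting the same object once per frame, which is why the sketch relies on the tracker's ID assignment.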
Despite the promising features and capabilities of the project, there are some limitations that need to be acknowledged. These limitations are as follows.
Accuracy: YOLOv4 is a fast and efficient object detection model, but it might not be as accurate as some slower, more complex models. The trade-off between speed and accuracy could impact the detection and tracking performance, especially in challenging scenarios or with small objects.
Training Data: The quality and diversity of the training data can significantly impact the model's performance. If the training data is limited or biased, the model may struggle to generalize to unseen situations or objects.
Computational Resources: YOLOv4 can be resource-intensive, particularly during training and inference, requiring powerful GPUs or specialized hardware. This might limit its usage on devices with limited computational capabilities.
Complex Scenes: The model's performance might degrade in complex scenes with occlusions, clutter, or overlapping objects. These situations can challenge the tracker's ability to maintain accurate object associations over time.
Variability in Object Appearance: Objects with large variations in appearance, such as different scales, rotations, or lighting conditions, might be challenging for YOLOv4 to detect and track consistently.

To address the limitations and further improve the project, several enhancements could be considered. Fine-tuning the YOLOv4 model on domain-specific or more diverse datasets could improve the accuracy and robustness of object detection and tracking in specific scenarios. Applying various data augmentation techniques during training can help the model generalize better to different object appearances and environmental conditions.

Combining the outputs of multiple object detection models or tracking algorithms using ensemble methods could potentially improve overall performance and reliability. Implementing object re-identification techniques can enhance the tracker's ability to handle occlusions and re-establish associations when objects briefly leave the camera's view.

ACKNOWLEDGMENT

We would like to express our gratitude to our advisors and mentors for their guidance and support throughout the course of this research. We acknowledge the contributions of R.V College of Engineering for providing us with the necessary resources and facilities. We also extend our thanks to our guide Dr. Hemavathy R., who generously gave her time and effort to make this study possible. We appreciate the feedback and insights provided by our peers and colleagues, which greatly improved the quality of our work. Lastly, we thank the peer-reviewers for their invaluable suggestions and constructive criticism, which have helped to refine and strengthen our findings.

REFERENCES

[1] "Multiple Object Tracking using STMRF and YOLOv4 Deep SORT in Surveillance Video", International Journal of Science & Engineering Development Research (www.ijrti.org), ISSN:2455-2631, Vol.7, Issue 6, page no.43 - 51, June-2022.
[2] Diwan T, Anirudh G, Tembhurne JV. Object detection using YOLO: challenges, architectural successors, datasets and applications. Multimed Tools Appl. 2023;82(6):9243-9275. doi: 10.1007/s11042-022-13644-y. Epub 2022 Aug 8. PMID: 35968414; PMCID: PMC9358372.
[3] Jiang, Yue & Li, Wenjing & Zhang, Jun & Li, Fang & Wu, Zhongcheng. (2022). YOLOv4-dense: A smaller and faster YOLOv4 for real-time edge-device based object detection in traffic scene. IET Image Processing. 17. n/a-n/a. 10.1049/ipr2.12656.
[4] Li, Fudong & Gao, Dongyang & Yang, Yuequan & Zhu, Junwu. (2022). Small target deep convolution recognition algorithm based on improved YOLOv4. International Journal of Machine Learning and Cybernetics. 14. 1-8. 10.1007/s13042-021-01496-1.
[5] Amrouche, Y. Bentrcia, A. Abed and N. Hezil, "Vehicle Detection and Tracking in Real-time using YOLOv4-tiny," 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, 2022, pp. 1-5, doi: 10.1109/ISPA54004.2022.9786330.
[6] Zhang, F. Kang, and Y. Wang, "An Improved Apple Object Detection Method Based on Lightweight YOLOv4 in Complex Backgrounds," Remote Sensing, vol. 14, no. 17, p. 4150, Aug. 2022, doi: 10.3390/rs14174150.
[7] L. Hou, C. Chen, S. Wang, Y. Wu, and X. Chen, "Multi-Object Detection Method in Construction Machinery Swarm Operations Based on the Improved YOLOv4 Model," Sensors, vol. 22, no. 19, p. 7294, Sep. 2022, doi: 10.3390/s22197294.
[8] U. P. Naik, V. Rajesh, R. K. R and Mohana, "Implementation of YOLOv4 Algorithm for Multiple Object Detection in Image and Video Dataset using Deep Learning and Artificial Intelligence for Urban Traffic Video Surveillance Application," 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT), Erode, India, 2021, pp. 1-6, doi: 10.1109/ICECCT52121.2021.9616625.
[9] Dewi, R. -C. Chen, Y. -T. Liu, X. Jiang and K. D. Hartomo, "Yolo V4 for Advanced Traffic Sign Recognition with Synthetic Training Data Generated by Various GAN," in IEEE Access, vol. 9, pp. 97228-97242, 2021, doi: 10.1109/ACCESS.2021.3094201.
[10] Y. Cai et al., "YOLOv4-5D: An Effective and Efficient Object Detector for Autonomous Driving," in IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1-13, 2021, Art no. 4503613, doi: 10.1109/TIM.2021.3065438.
[11] M. A. Bin Zuraimi and F. H. Kamaru Zaman, "Vehicle Detection and Tracking using YOLO and Deep SORT," 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics