CV Report
YOLOv8
Abhinav Raj (142202026), Sanket D Avaralli (142202024)
December 2, 2023
Introduction
Our project detects, tracks, and counts people in video using computer vision. We combine three
tools: YOLOv8, an object detection model that locates objects (here, people) quickly and accurately;
Supervision, a toolkit that helps manage and process video data; and ByteTracker, which follows
each detected object as it moves through a video. Together, these tools work on both prerecorded
videos and live webcam feeds, so the same system covers offline analysis and real-time monitoring.
The code files and the input and output videos are available in the shared project folder.
• YOLOv8 uses deep convolutional neural networks to analyze each image, detecting objects and
their locations.
• YOLOv8 distinguishes between object classes; this project focuses on human figures.
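The class filtering described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the `Detection` record, the `keep_people` helper, and the 0.5 confidence threshold are hypothetical, and the person class ID of 0 assumes the detector was trained with the COCO label map.

```python
from dataclasses import dataclass
from typing import List, Tuple

PERSON_CLASS_ID = 0  # assumption: COCO label map, where class 0 is "person"


@dataclass
class Detection:
    """One detector output (hypothetical record type for this sketch)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float
    class_id: int


def keep_people(detections: List[Detection], min_confidence: float = 0.5) -> List[Detection]:
    """Return only the confident detections of the person class."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= min_confidence
    ]


raw = [
    Detection((10, 20, 110, 320), 0.91, 0),    # confident person
    Detection((200, 50, 260, 120), 0.88, 16),  # some other class
    Detection((300, 40, 380, 300), 0.32, 0),   # person, below the threshold
]
people = keep_people(raw)
```

Only the first detection survives the filter; the other two are dropped for class and confidence respectively.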
2 Supervision
2.1 Overview
Supervision is a video processing toolkit that simplifies the integration of object detection and
tracking systems. It handles a range of video formats and sources and coordinates the flow of data
between object detection models, such as YOLOv8, and tracking algorithms, such as ByteTracker. It
annotates video frames with detection and tracking data, which makes the output easier to interpret.
Its key strength is keeping the detection and tracking outputs synchronized, so that the video content
can be analyzed coherently. This synchronization is crucial for tasks that require accurate object
tracking over time, such as surveillance and behavioral analysis. In this way, Supervision keeps the
processed video data consistent and usable throughout the pipeline.
3 ByteTracker
3.1 Overview
ByteTracker is a multi-object tracking algorithm suited to dynamic video environments. It works
with object detection models, such as YOLOv8, to track objects across video frames. The algorithm
assigns a unique identifier to each detected object, so the same object keeps the same identity from
frame to frame. It combines Kalman filtering, which predicts each object’s position in the next
frame, with the Hungarian algorithm, which associates new detections with existing tracks based on
features such as bounding-box overlap. This allows ByteTracker to predict the movement of objects,
handle occlusions, and re-identify objects after they have temporarily disappeared from the scene.
Its strengths are robustness in crowded settings and real-time performance, though its effectiveness
depends on the quality of the initial object detections.
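To make the data association step concrete, the following is a simplified sketch rather than ByteTracker's actual implementation. It matches existing tracks to new detections by bounding-box overlap (IoU). Two simplifications are assumptions of this example: greedy matching stands in for the Hungarian algorithm, and the track boxes are taken as already predicted, so the Kalman filter step is omitted. The function names and the 0.3 IoU threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match track IDs to detection indices, best IoU first.

    tracks: dict mapping track_id -> predicted box for the current frame
    detections: list of detected boxes in the current frame
    Returns (matches, unmatched), where matches maps track_id -> detection
    index and unmatched lists the detection indices left without a track.
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or j in used:
            continue  # this track or detection is already matched
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 50, 61, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
matches, unmatched = associate(tracks, detections)
```

Here track 1 is matched to detection 1 and track 2 to detection 0, while detection 2 overlaps no track. A tracker would typically start a new track, with a new ID, from such an unmatched detection.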
4 Video-Based Tracking
4.1 Overview
In this model, we take a prerecorded video as input and draw a line by specifying its coordinates
to suit the scene in the video. The model uses this line to count the number of people who have
crossed it.
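One common way to implement such a counting line is to check, frame by frame, which side of the line each tracked person is on and to count a crossing when the side changes. The sketch below illustrates that idea under stated assumptions; it is not the project's code and does not use Supervision's own line-counting utility. The `LineCounter` class and its method names are hypothetical, each person is reduced to a single anchor point, and the line is treated as infinite.

```python
def side_of_line(point, start, end):
    """Sign of the 2D cross product: positive on one side of the line
    through start and end, negative on the other, zero on the line."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    return (ex - sx) * (py - sy) - (ey - sy) * (px - sx)


class LineCounter:
    """Counts crossings of a line by tracked objects (both directions)."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.last_side = {}  # track_id -> last nonzero side value
        self.count = 0

    def update(self, track_id, point):
        side = side_of_line(point, self.start, self.end)
        if side == 0:
            return  # exactly on the line: decide on a later frame
        previous = self.last_side.get(track_id)
        if previous is not None and (previous > 0) != (side > 0):
            self.count += 1  # the object changed sides since its last update
        self.last_side[track_id] = side


# Vertical line at x = 100; person 7 walks across it, person 8 does not.
counter = LineCounter(start=(100, 0), end=(100, 200))
for x in (80, 95, 105):
    counter.update(track_id=7, point=(x, 50))
for x in (120, 130):
    counter.update(track_id=8, point=(x, 60))
```

After these updates `counter.count` is 1. Because the count is keyed by track ID, it relies on the tracker keeping each person's ID stable around the line.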
5 Live Webcam-Based Tracking
5.1 Overview
In this model, we use live video from a webcam as input. Using YOLOv8 and ByteTracker, we
detect people, track them by assigning a unique ID to each detected person, and count the number
of distinct people in the video. The count is displayed in the live output video.
3. Continuous Tracking: ByteTrack assigns a unique ID to each detected individual, enabling the
model to track people continuously as they move within the camera’s view.
4. Live Counting: The model counts the number of unique individuals detected over time. This
count is updated live, reflecting the total number of distinct people seen by the webcam so far.
5. Live Video Output: The live feed is displayed with tracked individuals marked. A counter shows
the number of unique individuals detected so far, updated in real time.
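The live count in the steps above can be kept as a set of the tracker IDs seen so far. The following is a minimal sketch of that idea, not the project's code; the `UniqueCounter` class is hypothetical, and the per-frame ID lists stand in for the tracker's output.

```python
class UniqueCounter:
    """Running count of distinct tracker IDs seen across frames."""

    def __init__(self):
        self.seen_ids = set()

    def update(self, frame_track_ids):
        """Record the IDs present in one frame and return the running total."""
        self.seen_ids.update(frame_track_ids)
        return len(self.seen_ids)


counter = UniqueCounter()
frames = [[1, 2], [1, 2, 3], [2, 3], [3, 4]]  # tracker IDs present per frame
totals = [counter.update(ids) for ids in frames]
```

The total only grows when a new ID appears, so a person who stays in view is counted once. The count is only as reliable as the tracker: if a person is lost and later assigned a new ID, that person is counted twice.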
Conclusion
This project combines YOLOv8, Supervision, and ByteTracker into a single computer vision system
that detects, tracks, and counts people in real time. It works with both prerecorded videos and live
webcam feeds, which makes it applicable in areas such as security, city planning, and business. The
project shows how current detection and tracking tools can be applied to practical, real-world
situations.