
Based on improved YOLOv8 and Bot SORT

surveillance video traffic statistics


Yiqun Yang
Hubei Normal University
Daneng Pi

Hubei Normal University


Lingyan Wang
Hubei Normal University
Mingliang Bao
Hubei Normal University
Jianfu Ge
Hubei Normal University
Tingchen Yuan
Hubei Normal University
Houshi Yu
Hubei Normal University
Qi Zhou
Hubei Normal University

Research Article

Keywords: YOLOv8, Bot SORT, SPD-Conv, CoTAttention, Traffic flow statistics

Posted Date: March 28th, 2024

DOI: https://doi.org/10.21203/rs.3.rs-4161504/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License

Additional Declarations: No competing interests reported.


Based on improved YOLOv8 and Bot SORT
surveillance video traffic statistics

Yiqun Yang, Daneng Pi* , Lingyan Wang, Mingliang Bao, Jianfu Ge, Tingchen
Yuan, Houshi Yu, and Qi Zhou

School of Electrical Engineering and Automation, Hubei Normal

University, Huangshi 435002, P. R. China

E-mail: [email protected], [email protected].

Keywords: YOLOv8, Bot SORT, SPD-Conv, CoTAttention, Traffic flow statistics

Abstract: Aiming at the problems of missed detections and low detection accuracy in existing deep learning based surveillance video traffic flow detection algorithms, a traffic flow counting system combining improved YOLOv8 detection and Bot SORT tracking is proposed. First, the SPD-Conv convolution module is incorporated into the backbone network to improve the network's ability to detect small targets. Then, the CoTAttention attention mechanism is introduced into the neck network to further improve the model's generalization ability. Finally, the improved YOLOv8 model and the Bot SORT algorithm are combined to design and implement a traffic counting system capable of monitoring video traffic in real time, which is trained and tested on the open-source UA-DETRAC vehicle detection dataset. The experimental results show that the improved YOLOv8 algorithm improves F1, P, mAP50, and mAP50-95 by 0.36, 2.2, 1.8, and 2.1 percentage points, respectively, compared with the original algorithm. Combined with Bot SORT tracking, it achieves more accurate and reliable results in the traffic counting task, providing strong support for vehicle detection and counting in surveillance systems.
1. Introduction
With the continuous progress of urbanization and the popularity of transportation,
traffic flow management on urban roads has become increasingly complex and
important. In this context, the surveillance video traffic counting system becomes one
of the important tools for urban traffic management. By using computer vision
technology, especially target detection algorithms, the traffic flow counting system
can accurately analyze the traffic flow in the surveillance video in real time and
provide powerful data support for urban traffic management[1].
Surveillance video-based traffic counting usually contains two parts: vehicle
detection[2] and vehicle tracking[3]. Traditional target detection algorithms include
frame difference method[4], optical flow method[5], and background difference
method[6]. Among these, the frame difference method is a pixel-level method that
detects a target by comparing the differences between consecutive frames. When a
target undergoes motion, it causes a change in pixel values between neighboring
frames. However, the frame difference method is less effective in dealing with
lighting variations and large scene dynamics, and is susceptible to noise. The optical
flow method captures the motion information of a target by analyzing the temporal
displacement of pixels in an image, and in the presence of occlusion and illumination
variations, the optical flow method is prone to erroneous motion estimation, leading to
inaccurate target detection. The background difference method detects a moving
target by modeling a stationary background and comparing the current frame with the
background. In the presence of lighting changes, camera shaking, or dynamic
backgrounds in the scene, the background difference method is prone to false alarms
or missed alarms, which reduces the accuracy and stability of detection. Although
these traditional methods are still useful in some simple application scenarios, target detection algorithms based on deep learning have made significant progress with the development of computer vision, overcoming the shortcomings of traditional vehicle detection methods and coping better with complex scenarios and diverse target detection tasks[7].
2. Related work
Deep learning based target detection algorithms are mainly categorized into two
types: the One-Stage[8] and Two-Stage[9] structures. The Two-Stage algorithms first generate candidate regions and then classify and precisely locate the targets within these regions; examples include R-CNN[10], Fast R-CNN[11], Faster R-CNN[12], and Mask R-CNN[13]. One-Stage algorithms directly predict the target bounding boxes and categories in the image with a feed-forward neural network, usually accomplishing target localization and classification simultaneously through regression, such as the SSD[14], RetinaNet[15], and YOLO[16] series.
Currently, the YOLO family has evolved to the latest YOLOv8[17], released by Ultralytics. Compared with previous versions, YOLOv8 offers higher detection accuracy and faster detection speed. It employs a deep neural network architecture based on the Darknet framework, dividing the input image into multiple grid cells, each of which is responsible for detecting the targets within it. This grid-based design allows YOLOv8 to detect multiple targets at the same time and gives it an advantage in processing speed.
For vehicle tracking, SORT[18] is a lightweight multi-target tracking algorithm based on the Hungarian algorithm. It achieves real-time target tracking by first predicting the target state with Kalman filtering and then matching detections to predictions with the Hungarian algorithm[19]. However, when a target is partially or completely occluded, Kalman filtering[20] may not be able to estimate the target's position accurately and the tracking performance deteriorates. The SORT algorithm is also susceptible to target ID confusion in multi-target tracking, in which different targets are incorrectly assigned the same ID. To address such problems, Bot SORT[21], an online real-time target tracking algorithm, was proposed. It is an improved version of SORT that handles target re-identification better by introducing appearance features. Even if a target is not visible for a period of time, it can still be matched and tracked through its appearance features, and target matching and ID assignment can be performed more accurately.
To summarize, this paper adopts an improved YOLOv8n-based target detection model as the detector and combines it with the Bot SORT target tracking algorithm for traffic counting. YOLOv8n performs target detection efficiently, while Bot SORT improves multi-target tracking by introducing appearance features to increase the accuracy of target re-identification, and thus adapts better to complex traffic scenarios. This end-to-end solution effectively realizes accurate traffic counting and tracking, providing strong support for traffic management and decision-making.

3. Introduction to the YOLOv8n algorithm


The network structure of YOLOv8n is divided into four main parts: the Input stage, the Backbone feature extraction network, the Neck network, and the Head output, as shown in Figure 1.
The Input stage mainly consists of Mosaic data enhancement, automatic image cropping and splicing, and adaptive anchor boxes[21-23]. Mosaic data enhancement increases the diversity of the training data by splicing multiple images together into a single large input image, so that the model learns target features better at different locations and scales. Automatic image cropping and stitching generates larger input images by automatically cropping and stitching images, which further diversifies the training data and helps handle targets at different scales. Adaptive anchor boxes automatically adjust the size and scale of the anchor boxes according to the distribution of the training data to improve the accuracy of target detection.
The Backbone feature extraction network adopts the CSPDarknet architecture, which mainly consists of CBS (standard convolution) blocks, C2f modules, and SPPF[24-26]. A CBS block is composed of a convolution layer, batch normalization, and an activation function, and is used to extract features from the input image. The C2f module replaces the original C3 module; it draws on the C3 and ELAN modules and obtains richer gradient information by branching more gradient streams in parallel, thereby extracting richer feature information while remaining lightweight. The SPPF pyramid pooling fusion module performs pooling operations on feature maps at different scales and fuses the pooled feature maps, so that the network can capture image information at different scales and adapt better to scale changes.
The Neck network is located between the Backbone and the Head and plays the role of feature fusion and feature enhancement. A PAN-FPN[27] structure is used, which fuses feature maps from different layers through bottom-up and top-down paths and generates a series of feature maps with different resolutions.
The Head output includes components such as bounding box prediction, category prediction, loss function layers, feature map processing, and non-maximum suppression. These outputs determine the location and category of the targets in the image and generate the final detection result.

Figure 1 YOLOv8n network architecture diagram


4. Improvement of YOLOv8n algorithm
4.1 Introduction of the SPD-Conv module
SPD-Conv is composed of a space-to-depth (SPD) layer and a non-strided convolution (Conv) layer[28]. SPD-Conv first performs a space_to_depth operation on the input feature map, converting spatial information into the depth (channel) dimension. An ordinary convolution is then applied to the converted feature map, and a depth_to_space operation can convert the depth dimension back to the spatial dimension. By combining the space_to_depth operation with ordinary convolution, SPD-Conv enlarges the receptive field of the network and extracts richer features without discarding fine-grained information: the space_to_depth operation increases the receptive field of each neuron, while the convolution extracts localized features. SPD-Conv is therefore used in the convolution stage to improve the network's ability to detect small targets.

Figure 2 Structure of SPD-Conv module


Consider an intermediate feature mapping X of arbitrary size S×S×C, as shown
in Figure 2. The SPD-Conv component introduces the input feature map
transformation technique into the backbone network of YOLOv8n. The feature
mapping X is partitioned into a series of sub-feature mappings, which are split as
follows:

$$f_{0,0}=X[0{:}S{:}scale,\,0{:}S{:}scale],\; f_{1,0}=X[1{:}S{:}scale,\,0{:}S{:}scale],\;\ldots,\; f_{scale-1,0}=X[scale-1{:}S{:}scale,\,0{:}S{:}scale];$$

$$f_{0,1}=X[0{:}S{:}scale,\,1{:}S{:}scale],\; f_{1,1}=X[1{:}S{:}scale,\,1{:}S{:}scale],\;\ldots,\; f_{scale-1,1}=X[scale-1{:}S{:}scale,\,1{:}S{:}scale];$$

$$\vdots$$

$$f_{0,scale-1}=X[0{:}S{:}scale,\,scale-1{:}S{:}scale],\;\ldots,\; f_{scale-1,scale-1}=X[scale-1{:}S{:}scale,\,scale-1{:}S{:}scale].$$

In general, for any (original) feature map X, the sub-map $f_{x,y}$ consists of all entries $X(i,j)$ for which $i+x$ and $j+y$ are divisible by scale. Each sub-map therefore downsamples X by a factor of scale. For the sub-feature maps obtained from this slicing operation, when scale = 2 they are denoted by the indices $f_{0,0}, f_{0,1}, f_{1,0}, f_{1,1}$, and each sub-feature map has the shape $(\tfrac{S}{2}, \tfrac{S}{2}, C)$. All sub-feature maps are then concatenated along the channel dimension to form a new feature map $X'$, whose spatial dimensions are reduced by a factor of scale and whose channel dimension is increased by a factor of $scale^2$, so that its size becomes $(\tfrac{S}{scale}, \tfrac{S}{scale}, scale^2 C)$. The SPD step thus realizes downsampling of the original feature map by slicing, concatenating, and scale-transforming the feature map.
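To make the slicing and concatenation concrete, the following is a minimal PyTorch sketch of an SPD-Conv style block under the assumptions above (scale = 2 followed by a 3×3 non-strided convolution); the module and parameter names are illustrative and this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class SPDConv(nn.Module):
    """Space-to-depth slicing followed by a non-strided convolution (sketch)."""

    def __init__(self, in_channels, out_channels, scale=2):
        super().__init__()
        self.scale = scale
        # After slicing, the channel count grows by a factor of scale**2.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels * scale ** 2, out_channels,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, x):
        s = self.scale
        # Collect the scale*scale sub-feature maps f_{x,y} = X[x::s, y::s]
        # and concatenate them along the channel dimension.
        subs = [x[..., i::s, j::s] for i in range(s) for j in range(s)]
        x = torch.cat(subs, dim=1)          # (B, s^2 * C, S/s, S/s)
        return self.conv(x)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)        # an intermediate S x S x C feature map
    out = SPDConv(64, 128)(feat)
    print(out.shape)                          # torch.Size([1, 128, 40, 40])
```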

4.2 Introducing the CoTAttention Attention Mechanism


Traditional Transformer models use Self-Attention mechanisms in the encoder
and decoder to model the contextual relationships of input sequences. However, the
Self-Attention mechanism only takes into account the associations within the input
sequences and not the relationships with other input sequences, which leads to poor
performance of the model when dealing with complex associations.
CoTAttention[29] solves this problem by introducing a contextual attention
mechanism, as shown in Figure 3. It introduces an additional contextual attention
matrix on top of the self-attention mechanism to capture the associations between

input sequences. Assume an input 2D feature map $X \in \mathbb{R}^{H \times W \times C}$. The keys, queries, and values are defined as $K = X$, $Q = X$, and $V = X W_v$. Unlike the typical self-attention mechanism, in which a 1×1 convolution is applied to each key, the CoT block first contextualizes each key representation by spatially aggregating all neighboring keys within a $k \times k$ grid through a $k \times k$ group convolution. The contextualized key $K^1 \in \mathbb{R}^{H \times W \times C}$ naturally reflects the static contextual information between local neighboring keys, and $K^1$ is regarded as the static contextual representation of the input X.

Figure 3 Context converter


Specifically, CoTAttention adjusts the attention distribution in the self-attention
mechanism by calculating the similarity matrix between input sequences and using it
as the weight of the contextual attention matrix. By introducing the contextual
attention mechanism, CoTAttention is able to better capture the semantic associations
between input sequences, which improves the performance of the model in semantic
understanding and generation tasks.
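As a rough illustration of this structure, the following is a simplified PyTorch sketch of a CoT-style block: the static context is produced by a k×k group convolution over the keys, and attention weights computed from the concatenation of the contextual keys and the queries are applied to the values. It omits the local-grid matrix multiplication of the full CoT block, so it sketches the idea rather than reproducing the exact CoTAttention module.

```python
import torch
import torch.nn as nn


class CoTAttentionSketch(nn.Module):
    """Simplified Contextual Transformer (CoT) style attention block."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # k x k group convolution that contextualizes each key (static context K^1).
        self.key_embed = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad,
                      groups=4, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        # Value embedding V = X W_v (1x1 convolution).
        self.value_embed = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Two consecutive 1x1 convolutions produce the attention map from the
        # concatenation of the static context and the queries.
        self.attention = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, x):
        k1 = self.key_embed(x)                        # static contextual keys
        v = self.value_embed(x)                       # values
        attn = self.attention(torch.cat([k1, x], dim=1))
        k2 = torch.sigmoid(attn) * v                  # dynamic context (simplified)
        return k1 + k2                                # fuse static and dynamic context


if __name__ == "__main__":
    y = CoTAttentionSketch(256)(torch.randn(1, 256, 20, 20))
    print(y.shape)                                    # torch.Size([1, 256, 20, 20])
```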

4.3 YOLOv8n_S_C algorithm


The SPD-Conv module is embedded in specific locations (indexes 2, 5, 8, and 11)
of the YOLOv8n backbone network to perform a spatial depth transformation, as
shown in Figure 4. It organizes blocks of pixels in a specific region into deeper
channels, and this operation serves several purposes:

Figure 4 YOLOv8n_S_C network structure


Reduction of spatial resolution: the spatial depth transformation results in a
reduction of the spatial resolution of the feature map. By organizing adjacent pixels
into deeper channels, the model can capture information over a larger contextual
range without significantly increasing the number of parameters.
Increased receptive field: since pixels in a specific region are organized together,
the response of each channel can cover a larger receptive field. This is useful for
detecting contextual information about an object, especially when the object size is
relatively small.
Introduction of non-linear relationships: The spatial depth transform introduces
non-linear relationships between channels. This helps the model to better learn
complex features at specific locations.
In addition, the CoTAttention layer, a context-transforming attention mechanism, is embedded in the model configuration file at specific locations in the neck (indexes 19, 22, and 25); a configuration sketch is given after the list below. CoTAttention works as follows:
Contextual Attention: the CoTAttention enables the model to dynamically attend
to different parts of the input image instead of processing the whole image uniformly.
This is very helpful in improving the model's understanding of the relationships
between objects and the context.
Enhanced Perception: by introducing the attention mechanism, the model can
focus more on processing task-relevant information, thus enhancing its perceptual
capabilities.
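Assuming the custom SPD-Conv and CoTAttention modules have been registered with the Ultralytics model parser, the improved network could then be built from a modified model definition file; the file name below is hypothetical and the layer indexes simply mirror those stated in this section.

```python
from ultralytics import YOLO

# Hypothetical model definition file with SPD-Conv inserted at backbone
# indexes 2, 5, 8, 11 and CoTAttention at neck indexes 19, 22, 25.
model = YOLO("yolov8n_S_C.yaml")
model.info()   # print the layer list, parameter count, and GFLOPs
```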

5. BoT-SORT tracking algorithms


BoT-SORT is an advanced tracking algorithm that combines motion and appearance information, camera motion compensation, and a more accurate Kalman filter state vector, and integrates these improvements into ByteTrack to enhance the robustness of target tracking. Its key features, summarized below and illustrated with a usage sketch after the list, include:
Combining motion and appearance information: BoT-SORT not only utilizes the
target's appearance information for tracking, but also considers the target's motion
information. The accuracy of tracking is improved by joint modeling of motion
trajectories and appearance features.
Camera Motion Compensation: in scenes where camera motion exists, the motion
of the camera can interfere with target tracking.BoT-SORT reduces the effect of
camera motion on target tracking by compensating for the camera motion[30].
More accurate Kalman-filtered state vectors: BoT-SORT uses more accurate
Kalman-filtered state vectors to predict and update the target position. This helps to
reduce the tracking error and improve the target localization accuracy.
Multi-target tracking: BoT-SORT can track multiple targets at the same time,
taking into account the inter-relationships and occlusions between the targets. This
enables accurate target tracking even in complex scenes.
Strong Robustness: Since BoT-SORT utilizes a combination of information for
tracking, it has strong robustness. Accurate target tracking can be achieved even when
the target's appearance changes, occlusion or camera movement occurs.
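As a usage sketch, and assuming the Ultralytics tracking interface (which ships with a BoT-SORT configuration file), tracking with the improved detector could be invoked as follows; the weight and video file names are illustrative.

```python
from ultralytics import YOLO

# Load the improved detector weights (file name is illustrative) and run it with
# the BoT-SORT tracker configuration bundled with Ultralytics; each returned
# frame result carries per-vehicle bounding boxes and track IDs.
model = YOLO("yolov8n_S_C.pt")
results = model.track(source="traffic.mp4", tracker="botsort.yaml", show=True)
```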

6. Experimental results and analysis

6.1 Experimental environment


In this experiment, the programming language was Python 3.9 and deep learning
training was performed using the PyTorch framework without the use of pre-trained
models. The parameter configurations were standardized and the experiments were
conducted on a computer with Windows 10 operating system. The computer hardware
configuration included a 12th Gen Intel Core i5-12500H processor, 32 GB of RAM,
and an NVIDIA GeForce RTX 3070 Ti graphics card (8 GB of video memory). In
order to fully utilize the computational power of the graphics card, CUDA 11.8 was
chosen as the development environment for accelerated computation on the GPU.
During deep learning training, a configuration of 16 samples per batch and a setting
with an initial learning rate of 0.01 were used, and performance was continuously
optimized over 500 training rounds.
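A minimal sketch of the corresponding training call, assuming the Ultralytics training interface, is shown below; the dataset description file and device index are placeholders, while the batch size, initial learning rate, epoch count, and the absence of pre-trained weights follow the settings listed above.

```python
from ultralytics import YOLO

# Build the improved model from its definition file (trained from scratch,
# i.e. without pre-trained weights) and train with the reported settings.
model = YOLO("yolov8n_S_C.yaml")
model.train(
    data="ua_detrac.yaml",   # placeholder dataset description file
    epochs=500,              # 500 training rounds
    batch=16,                # 16 samples per batch
    lr0=0.01,                # initial learning rate
    pretrained=False,        # no pre-trained weights were used
    device=0,                # single GPU with CUDA 11.8
)
```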

6.2 Data set


Vehicle Detection Dataset: The UA-DETRAC dataset[31] was harvested from 24
different locations in Beijing and Tianjin, China. The dataset includes more than
140,000 frames with a total duration of 10 hours, capturing vehicle behavior in urban
traffic scenes. Manually annotated with 8250 vehicles and containing 1.21 million
labeled object bounding boxes, the dataset covers different traffic scenarios and environmental conditions, such as highways, intersections, and curved roads, in daytime, nighttime, cloudy, and rainy conditions. This makes the trained model more adaptable to various complex traffic situations and environmental conditions. In this experiment, the purpose is to count traffic flow rather than to distinguish vehicle types in detail. To simplify the task and increase efficiency, the car, bus, van, and other classes in the original dataset were unified as car. This operation does not change the basic meaning of the information; it simply treats all vehicles as the same category when counting traffic flow, so that the data can be processed more concisely. This simplification helps to reduce the complexity of the model and improve the accuracy and efficiency of the traffic count. A partial dataset was used for the experiments and randomly divided in the ratio 8:1:1 into training, validation, and test sets for a comprehensive experimental evaluation.
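The label unification and the 8:1:1 random split could be performed along the following lines, assuming the annotations have already been converted to YOLO-format text files; all paths and the class index are illustrative.

```python
import random
from pathlib import Path

# Map every UA-DETRAC vehicle class (car, bus, van, others) to a single class id 0
# in YOLO-format label files (illustrative paths).
label_dir = Path("datasets/ua_detrac/labels")
for label_file in label_dir.glob("*.txt"):
    lines = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        if parts:
            parts[0] = "0"                 # treat all vehicles as "car"
            lines.append(" ".join(parts))
    label_file.write_text("\n".join(lines))

# Randomly split the images 8:1:1 into training, validation, and test lists.
images = sorted(Path("datasets/ua_detrac/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)
n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "val": images[int(0.8 * n): int(0.9 * n)],
    "test": images[int(0.9 * n):],
}
for name, files in splits.items():
    Path(f"{name}.txt").write_text("\n".join(str(f) for f in files))
```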

6.3 Evaluation indicators


In this study, the improved YOLOv8n algorithm is evaluated experimentally using several performance metrics, including precision (P), recall (R), average precision (AP), mean average precision (mAP), F1 score, and frame rate (frames per second, FPS); a short computation example follows the definitions.
(1) Precision P: describes how many of all the samples predicted as positive by the model are truly positive. The calculation formula is as follows:
$$P = \frac{TP}{TP + FP} \times 100\%$$
(2) Recall R: measures how many of all true positive samples are successfully predicted as positive by the model. The formula is as follows:
$$R = \frac{TP}{TP + FN} \times 100\%$$
(3) Average Precision (AP): measures the detection performance of the model on each category, obtained by calculating the area under the precision-recall curve of that category. The calculation formula is as follows:
$$AP = \int_0^1 P(R)\,dR$$
(4) Mean Average Precision (mAP): indicates the average detection precision over all categories. The calculation formula is as follows:
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP(i) \times 100\%$$

(5) F1 score: combines precision and recall and is their harmonic mean. The formula is as follows:
$$F1 = \frac{2 \times P \times R}{P + R} \times 100\%$$
(6) Frames per Second (FPS): indicates how many frames the algorithm can process per second.
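For concreteness, the precision, recall, and F1 formulas above can be computed directly from the confusion counts, as in this small sketch with illustrative numbers.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute P, R, and F1 (in percent) from true/false positives and false negatives."""
    p = tp / (tp + fp) * 100
    r = tp / (tp + fn) * 100
    f1 = 2 * p * r / (p + r)
    return p, r, f1


# Illustrative counts: 80 correct detections, 20 false alarms, 30 missed vehicles.
print(precision_recall_f1(80, 20, 30))   # approximately (80.0, 72.7, 76.2)
```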

6.4 Ablation experiment


In order to analyze the impact of different improvement strategies on detection performance, this paper takes YOLOv8n as the baseline, introduces the SPD-Conv module in the backbone network, which helps improve the accuracy of object detection, and introduces the CoTAttention attention mechanism in the neck, which helps reduce the model's sensitivity to redundant information and further improves its generalization ability. Three sets of experiments were designed and trained with the same dataset and training parameters. The experimental results are shown in Table 1.
Table 1 Results of ablation experiments

SPD-Conv  CoTAttention  mAP50/%  mAP50-95/%  Parameters/M  GFLOPs  FPS
-         -             78.6     56.8        3.01          8.2     303
√         -             79.9     58.3        3.27          11.7    384
√         √             80.4     58.9        4.17          13.2    322

In the ablation experiments of Table 1, the first row reflects the benchmark performance of the original YOLOv8n on the dataset. After introducing the SPD-Conv module and the CoTAttention attention mechanism separately, it is observed that the SPD-Conv module improves the detection results more markedly, including mAP50, mAP50-95, and FPS. The analysis suggests that the space-to-depth convolution introduced by SPD-Conv captures the context and features of objects better, which is especially suitable for scenes containing objects of different scales. In contrast, the relatively weaker contribution of the CoTAttention attention mechanism is because contextual information has less impact on detection accuracy, so CoTAttention yields only a small improvement in detection performance. Introducing SPD-Conv and CoTAttention together achieves the best results for the detection network, with mAP50 and mAP50-95 improving by 1.8 and 2.1 percentage points, respectively. This shows that the two modules act synergistically in improving detection performance.

Figure 5 Comparison of mAP during training for ablation experiments


As observed in Figure 5, the blue curve in both plots performs the worst in terms of mAP, while the green curve performs the best. After the SPD-Conv module is introduced into the original YOLOv8n model and training reaches about 15 epochs, the mAP of the orange curve begins to clearly exceed that of the blue curve, showing a significant performance improvement. Subsequently, when the CoTAttention attention mechanism is also introduced, the mAP of the green curve starts to exceed that of the orange curve at about 90 epochs, showing a further performance improvement.
Figure 6 Comparison of visualization results of ablation experiments
In order to analyze the effect of SPD-Conv and CoTAttention on the detection results, the vehicles detected by each model in the ablation experiments are compared in Figure 6. Figure 6(a) shows the detection results of the original YOLOv8n, which achieves high detection accuracy for nearby vehicles but performs poorly on distant vehicles. Figure 6(b) shows the detection results after introducing the SPD-Conv module; compared with (a), it successfully detects a distant vehicle in the same image with a confidence of 0.4. Figure 6(c) shows the detection results after further introducing CoTAttention; compared with (b), the confidence for the distant vehicle rises to 0.7, an improvement of 0.3, while the model maintains high detection accuracy on the remaining vehicles. This indicates that introducing CoTAttention further optimizes detection performance, achieving a significant improvement especially for long-distance targets.
6.5 Experimental results of different models
For a side-by-side comparison, three networks (YOLOv5n, YOLOv6n, and YOLOv8n) were trained and tested on the same dataset, and the experimental results are shown in Table 2.
Table 2 Comparison of results for different detection networks

Model         F1/%   P/%   R/%   mAP50/%  mAP50-95/%  Parameters/M  GFLOPs  FPS
YOLOv5n       73.84  78.0  70.1  78.4     55.8        2.51          7.2     400
YOLOv6n       73.95  82.2  67.2  78.1     56.4        4.23          11.9    303
YOLOv8n       75.23  80.5  70.6  78.6     56.8        3.01          8.2     303
YOLOv8n_S_C   75.59  82.7  69.6  80.4     58.9        4.17          13.2    322
As can be observed in Table 2, the improved YOLOv8n_S_C model achieves the best precision at 82.7%. It also achieves the highest mAP50 of 80.4% and the highest mAP50-95 of 58.9%, indicating excellent overall performance in the target detection task. Its F1 score of 75.59% is likewise the highest among the compared models.

7. Traffic statistics
7.1 Traffic flow system interface
The user interface of the surveillance video traffic system was developed in Python with the PySide6 module. At the top of the interface is the system title, and directly below it are four display boxes showing the total number of categories, the total number of targets, the average frame rate, and the model in use. Between the title and the display boxes, the current traffic flow is displayed. Below the hide button on the left side of the interface are several function buttons, including Local File, Call Camera, Call RTSP Monitor, Traffic Line Chart, Single Target Tracking, and Enable Web Side.

Figure 7 Monitoring video traffic flow system


On the right side of the system interface are several setting options, including the model, the IoU threshold, the confidence threshold, the delay time, and the alarm threshold. The specific settings are as follows: the model is the improved YOLOv8n_S_C of this paper, the IoU threshold is set to 0.36, the confidence threshold is set to 0.51, the delay time is set to 0 ms, and the alarm threshold is set to 7. The whole interface is designed so that the user can monitor and manage the traffic flow system easily. Figure 7 above shows the model detecting traffic flow in a surveillance video.
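A minimal PySide6 sketch of the layout described above (title, current-flow label, and the four display boxes) is given below; it contains none of the detection logic and the widget texts are illustrative.

```python
import sys
from PySide6.QtWidgets import (QApplication, QHBoxLayout, QLabel, QMainWindow,
                               QVBoxLayout, QWidget)


class TrafficFlowWindow(QMainWindow):
    """Skeleton of the surveillance-video traffic flow interface."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Surveillance Video Traffic Flow System")

        title = QLabel("Surveillance Video Traffic Flow System")
        self.flow_label = QLabel("Current traffic flow: 0")   # updated each frame

        # Four display boxes: categories, targets, average FPS, model in use.
        boxes = QHBoxLayout()
        for text in ("Categories: 1", "Targets: 0", "FPS: 0", "Model: YOLOv8n_S_C"):
            boxes.addWidget(QLabel(text))

        layout = QVBoxLayout()
        layout.addWidget(title)
        layout.addWidget(self.flow_label)
        layout.addLayout(boxes)

        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)


if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = TrafficFlowWindow()
    window.show()
    sys.exit(app.exec())
```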

7.2 Multiple target tracking


Multi-target tracking has a wide range of applications in the fields of video
surveillance, autonomous driving, and UAV tracking[31]. Its goal is to track multiple
targets in a video sequence, usually by finding the position of the target in each frame.
In surveillance video traffic counting, multi-target tracking serves to track the
movement trajectory of each vehicle so that the behavior of the vehicle can be
analyzed and counted. Multi-target tracking is shown in Figure 8.

Figure 8 Single target tracking


In the traffic counting scenario based on improved YOLOv8 and Bot SORT, target detection is performed with the improved YOLOv8n_S_C, i.e., the vehicles in each frame are identified. The detected vehicle information is then passed to Bot SORT for tracking. Bot SORT assigns a unique identifier to each vehicle and tracks its motion through the video sequence. The trajectory of each vehicle is recorded by associating the target position in each frame with the position of the same identifier in the previous frame. By tracking individual vehicles, the system can understand the dynamics of the entire traffic flow more accurately, providing useful information for traffic management, security monitoring, and other applications.
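The counting step can be sketched as follows, assuming the Ultralytics tracking interface: each new track identifier that appears is counted once, which approximates the cumulative traffic flow; the file names are illustrative.

```python
from ultralytics import YOLO

model = YOLO("yolov8n_S_C.pt")           # trained detector (illustrative file name)
counted_ids = set()                       # track IDs already counted

for result in model.track(source="traffic.mp4", tracker="botsort.yaml", stream=True):
    if result.boxes.id is None:           # no confirmed tracks in this frame
        continue
    for track_id in result.boxes.id.int().tolist():
        counted_ids.add(track_id)         # a new ID means a newly observed vehicle
    print(f"vehicles in frame: {len(result.boxes.id)}, "
          f"cumulative count: {len(counted_ids)}")
```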

7.3 Line graph of traffic flow


A real-time traffic flow line chart shows how traffic flow changes over time. In the system, generating a line chart of traffic flow helps visualize the traffic situation, analyze the traffic flow at the current moment, and take appropriate measures when needed. As shown in Figure 9, the horizontal axis represents time and the vertical axis represents the number of vehicles. From the changes in the line chart it can be observed that the current traffic flow in the surveillance video is mainly concentrated between 5 and 6 vehicles, a relatively moderate level.

Figure 9 Line graph of traffic flow


Throughout the monitoring period, there were several moments when the traffic volume exceeded the alarm threshold of 7 vehicles. Specifically, at 5 moments the number of vehicles reached 8; at these moments the current traffic volume caption turns red and the system issues an alarm alert, indicating a high volume of traffic. At the same time, there were 5 moments when the number of vehicles was only 3, indicating a low volume of traffic. The change of the line over time reflects the fluctuation of the traffic flow. Real-time updating of the traffic flow line chart enables continuous analysis of changes in traffic flow, allowing decision makers to stay informed of the traffic situation and take immediate action when needed. This helps optimize traffic management and improve the real-time responsiveness of the system.
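As an illustration, such a line chart with the alarm threshold drawn as a horizontal line could be produced with matplotlib as follows; the per-second counts are illustrative values, not measured data.

```python
import matplotlib.pyplot as plt

# Illustrative per-second vehicle counts (not measured data).
seconds = list(range(20))
counts = [5, 6, 8, 3, 6, 5, 8, 3, 6, 5, 8, 3, 6, 5, 8, 3, 6, 8, 3, 5]
threshold = 7                                  # alarm threshold used by the system

plt.plot(seconds, counts, marker="o", label="traffic flow")
plt.axhline(threshold, color="red", linestyle="--", label="alarm threshold")
plt.xlabel("time (s)")
plt.ylabel("number of vehicles")
plt.title("Real-time traffic flow")
plt.legend()
plt.show()
```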

8. Concluding remarks
In the surveillance video traffic counting system based on improved YOLOv8 and Bot SORT, this paper combines a target detection technique (YOLOv8n_S_C) with a target tracking algorithm (Bot SORT) to achieve accurate detection, tracking, and counting of vehicles in surveillance video. The system provides a powerful tool for traffic management and surveillance, capable of accurately capturing vehicle positions, movement trajectories, and changes in traffic flow in real-time video streams. Through simple operation, it not only generates line charts of traffic flow to visualize the trend of traffic over time, but also supports in-depth analysis of changes in traffic flow. This provides traffic managers with more comprehensive information, enabling them to develop more targeted traffic optimization strategies and improve the overall efficiency of the road network. It brings innovation and convenience to urban traffic management and lays a solid foundation for building smarter cities.
9. References
[1] J. E. Park, W. Byun, Y. Kim, H. Ahn, D. K. Shin, Journal of Advanced
Transportation, 2021, 1.
[2] Z. Wang, J. Zhan, C. Duan, X. Guan, P. Lu, K. Yang. IEEE Transactions on
Neural Networks and Learning Systems.2022.
[3] Y. Fang, C. Wang, W. Yao, X. Zhao, H. Zhao, H. Zha. On-road vehicle tracking using part-based particle filter. IEEE Transactions on Intelligent Transportation Systems, 2019, 20, 4538.
[4] J. Ju, J. Xing. Multimedia Tools and Applications, 2019, 78, 29937.
[5] M. P. Dessauer, S. Dua. In Ground/air multi-sensor interoperability, integration,
and networking for persistent ISR. 2010 7694, 366.
[6] B. Hardjono, H. Tjahyadi, M. G. Rhizma, A. E. Widjaja, R. Kondorura, A. M.
Halim. IEEE 9th Annual Information Technology, Electronics and Mobile
Communication Conference. 2018. 556.
[7] A. Anoop, G. Harikrishnan, K. Nair, B. Sangeetha, V. Praseedalekshmi.
International Conference on Innovations in Science and Technology for Sustainable
Development. 2022. 258.
[8] M. Carranza-García, J. Torres-Mateo, P. Lara-Benítez, J. García-Gutiérrez.
Remote Sensing, 2020, 13, 89.
[9] H. Wang, Y. Yu, Y. Cai, X. Chen, L. Chen, Y. Li. IEEE Transactions on
Intelligent Vehicles, 2020, 6, 100.
[10] K. Lenc, A. Vedaldi. arXiv preprint arXiv:1506.06981, 2015.
[11] R. Girshick. In Proceedings of the IEEE international conference on computer
vision. 2015, 1440.
[12] H. Jiang, E. Learned-Miller. IEEE international conference on automatic face
gesture recognition. 2017, 650.
[13] K. He, G. Gkioxari, P. Dollár, R. Girshick. In Proceedings of the IEEE
international conference on computer vision. 2017, 2961.
[14] J. Jeong, H. Park, N. Kwak. arXiv preprint arXiv:1705.09587, 2017.
[15] H. Zhang, H. Chang, B. Ma, S. Shan, X. Chen. arXiv preprint arXiv:1907.06881, 2019.
[16] M. Hussain. Machines. 2023, 11, 677.
[17] P. Kranz, U. Ali, A. Mueller, M. Hornauer, M. Loeser, F. Sukkar, T. Kaupp.
2021.
[18] X. Hou, Y. Wang, L. P. Chau. IEEE International Conference on Advanced
Video and Signal Based Surveillance. 2019, 1.
[19] E. Hamuda, B. Mc Ginley, M. Glavin, E. Jones. Computers and electronics in
agriculture, 2018, 148, 37.
[20] C. K. Chui, G. Chen. Berlin, Germany: Springer International Publishing. 2017,
19.
[21] S. Yan, Y. Fu, W. Zhang, W. Yang, R. Yu, F. Zhang. International Conference on
Electronic Engineering and Informatics. 2023. 506.
[22] Z. Tan, B. Chen, L. Sun, H. Xu, K. Zhang, F. Chen. Information Technology and
Control. 2023, 52, 878.
[23] M. T. Ibrahim, R. Hafiz, M. M. Khan, Y. Cho. Multimedia Systems,2016, 22,
379-392.
[24] Z. Zhang, X. Lu, G. Cao, Y. Yang, L. Jiao, F. Liu. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 2799.
[25] G. Yu, X. Zhou. Mathematics. 2023,11, 2377.
[26] Y. Fan, G. Tohti, M. Geni, G. Zhang, J. Yang. A marigold corolla detection
model based on the improved YOLOv7 lightweight. 2023
[27] G. Wang, Y. Chen, P. An, H. Hong, J. Hu, T. Huang. Sensors, 2023, 23, 16, 7190.
[28] Z. Yang, Q. Wu, F. Zhang, X. Zhang, X. Chen, Y. Gao. Symmetry,2023, 15,
1037.
[29] Q. Geng, H. Liu, T. Gao, R. Liu, C. Chen, Q. Zhu, M. Shu. In Healthcare.2023.
[30] J. Lee, K. C. Lee, S. Jeong, Y. J. Lee, S. H. Sim. Mechanical Systems and Signal
Processing. 2020, 140, 106651.
[31] L. Wen, D. Du, Z. Cai, Z. Lei, M. C. Chang, H. Qi, S. Lyu. UA-DETRAC:
Computer Vision and Image Understanding, 2020, 193, 102907.
[32] P. Dendorfer, A. Osep, A. Milan, K. Schindler, D. Cremers, I. Reid, L.
Leal-Taixé. Motchallenge: International Journal of Computer Vision. 2021, 129, 845.
