Surveillance Video Traffic Counting Based On Improved YOLOv8 and Bot SORT
Research Article
DOI: https://doi.org/10.21203/rs.3.rs-4161504/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Yiqun Yang, Daneng Pi*, Lingyan Wang, Mingliang Bao, Jianfu Ge, Tingchen Yuan, Houshi Yu, and Qi Zhou
f_{0,scale−1} = X[0:S:scale, scale−1:S:scale], f_{1,scale−1} = X[1:S:scale, scale−1:S:scale], ...

In the general case, for any (original) feature map X, the subgraph f_{x,y} consists of all entries X(i, j) for which i + x and j + y are divisible by scale; each subgraph thus downsamples X by a factor of scale.

Assume the same 2D input feature map X ∈ R^{H×W×C}. The keys, queries and values are defined as K = X, Q = X and V = XW_v. Unlike the typical self-attention mechanism, in which a 1×1 convolution is applied to each key, the CoT block first contextualizes each key representation spatially by aggregating all neighboring keys within a k×k grid through a k×k group convolution.
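As a minimal illustration (not the authors' implementation), the subgraph slicing used by SPD-Conv can be sketched in NumPy; the function name `space_to_depth` and the `scale` parameter are naming assumptions for this sketch:

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Split a feature map X of shape (H, W, C) into scale*scale subgraphs
    f_{i,j} = X[i:S:scale, j:S:scale], then concatenate them along the
    channel axis, trading spatial resolution for channel depth."""
    subgraphs = [
        x[i::scale, j::scale, :]   # entries whose row/col indices are i, j mod scale
        for i in range(scale)
        for j in range(scale)
    ]
    return np.concatenate(subgraphs, axis=-1)

# A 4x4 single-channel map becomes a 2x2 map with 4 channels:
x = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
y = space_to_depth(x, scale=2)
print(y.shape)  # (2, 2, 4)
```

Each output position gathers the scale×scale neighborhood of the input, so no information is discarded, unlike strided convolution or pooling.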
R = TP / (TP + FN) × 100%
(3) Average Precision (AP): measures the detection performance of the model on different categories. Average precision is obtained by calculating the area under the precision-recall curve for each category. The calculation formula is as follows:

AP = ∫₀¹ P(R) dR
(4) Mean Average Precision (mAP): indicates the average detection precision over all categories. The calculation formula is as follows:

mAP = (1/n) Σ_{i=1..n} AP(i) × 100%
(5) F1 score: combines the precision rate and the recall rate, and is the harmonic mean of the two. The formula is as follows:

F1 = 2 × P × R / (P + R) × 100%
(6) Frames Per Second (FPS): indicates how many frames the algorithm processes per second.
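The metrics above can be computed directly from counts and a sampled precision-recall curve. The following is a minimal sketch (function names are assumptions, not part of the paper's code); AP is approximated with the trapezoidal rule:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """P = TP/(TP+FP), R = TP/(TP+FN), F1 = harmonic mean of P and R."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

def average_precision(precision, recall):
    """Approximate AP = integral of P(R) dR over [0, 1] with the
    trapezoidal rule; recall samples must be sorted ascending."""
    ap = 0.0
    for k in range(1, len(recall)):
        ap += (recall[k] - recall[k - 1]) * (precision[k] + precision[k - 1]) / 2.0
    return ap

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.8 0.8
```

In practice, detection frameworks sweep the confidence threshold to trace the precision-recall curve per class, then average the per-class AP values to obtain mAP.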
Table 1. Ablation experiments.

SPD-Conv  CoTAttention  mAP50(%)  mAP50-95(%)  Params(M)  GFLOPs  FPS
-         -             78.6      56.8         3.01       8.2     303
√         -             79.9      58.3         3.27       11.7    384
√         √             80.4      58.9         4.17       13.2    322
In the ablation experiments in Table 1, the first row reflects the benchmark performance of the original YOLOv8n on the dataset. When the SPD-Conv module and the CoTAttention mechanism are introduced separately, the SPD-Conv module improves the detection results more significantly, including mAP50, mAP50-95 and FPS. Analysis suggests that the SPD-Conv module's space-to-depth convolution better captures the context and features of the object, which is especially suitable for scenes containing objects of different scales. In contrast, the relatively weak performance of the CoTAttention mechanism stems from contextual information having less impact on detection accuracy here, so CoTAttention yields only a small improvement in detection performance. Introducing SPD-Conv and CoTAttention together achieves the best results for the detection network, with mAP50 and mAP50-95 improving by 1.8 and 2.1 percentage points, respectively. This shows that the two modules act synergistically in improving detection performance.
Table 2. Comparison with other detection models.

Model        F1(%)   P(%)   R(%)   mAP50(%)  mAP50-95(%)  Params(M)  GFLOPs  FPS
YOLOv5n      73.84   78.0   70.1   78.4      55.8         2.51       7.2     400
YOLOv6n      73.95   82.2   67.2   78.1      56.4         4.23       11.9    303
YOLOv8n      75.23   80.5   70.6   78.6      56.8         3.01       8.2     303
YOLOv8n_S_C  75.59   82.7   69.6   80.4      58.9         4.17       13.2    322
As can be observed in Table 2, the improved YOLOv8n_S_C model performs best in precision, at 82.7%. The model also achieves the highest mAP50 of 80.4% and the highest mAP50-95 of 58.9%, indicating excellent comprehensive performance in the target detection task. Its F1 score of 75.59% is likewise the highest among the compared models.
7. Traffic statistics
7.1 Traffic flow system interface
The user interface of the surveillance-video traffic system was developed in Python with the PySide6 module. At the top of the interface is the system title, and directly below it are four display boxes showing the total number of categories, the total number of targets, the average frame rate, and the model in use. Between the title and the display boxes, the current traffic flow is displayed. Below the hidden buttons on the left side of the interface are function buttons, including Local File, Call Camera, Call RTSP Monitor, Traffic Line Chart, Single Target Tracking, and Enable Web Side.
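The traffic-flow count shown in the display boxes is typically derived from tracker output by detecting when a track's center crosses a counting line. The following is a hypothetical sketch of that logic (the `LineCounter` class, `LINE_Y`, and the `(track_id, cx, cy)` tuple format are assumptions, not the paper's implementation):

```python
LINE_Y = 300  # assumed y-coordinate of the horizontal counting line

class LineCounter:
    """Count each track ID once when its center crosses the line."""

    def __init__(self, line_y: int):
        self.line_y = line_y
        self.last_y = {}      # track_id -> previous center y
        self.counted = set()  # track IDs already counted
        self.total = 0

    def update(self, tracks):
        """tracks: iterable of (track_id, cx, cy) for the current frame."""
        for tid, _cx, cy in tracks:
            prev = self.last_y.get(tid)
            if prev is not None and tid not in self.counted:
                # sign change means the center crossed the line (either direction)
                if (prev - self.line_y) * (cy - self.line_y) < 0:
                    self.counted.add(tid)
                    self.total += 1
            self.last_y[tid] = cy
        return self.total

counter = LineCounter(LINE_Y)
counter.update([(1, 100, 280), (2, 200, 350)])
counter.update([(1, 105, 310), (2, 205, 340)])  # track 1 crosses downward
print(counter.total)  # 1
```

Keying the count on the tracker's persistent IDs is what prevents a vehicle that lingers near the line from being counted repeatedly.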
8. Concluding remarks
In the system for surveillance-video traffic counting based on improved YOLOv8 with Bot SORT, this paper combines a target detection technique (YOLOv8n_S_C) with a multi-object tracking algorithm (Bot SORT) to achieve accurate detection, tracking, and counting of vehicles in surveillance video. The system provides a powerful tool for traffic management and surveillance, accurately capturing vehicle locations, movement trajectories, and changes in traffic flow in real-time video streams. Through simple operations, the system not only generates line charts of traffic flow to visualize its trend over time, but also supports in-depth analysis of the factors affecting traffic flow. This provides traffic managers with more comprehensive information, enabling them to develop more targeted traffic optimization strategies and improve the overall efficiency of the road network. It brings innovation and convenience to urban traffic management and lays a solid foundation for building smarter cities.