Multiple Object Tracking in Drone Aerial Videos by a Holistic Transformer and Multiple Feature Trajectory Matching Pattern
Abstract
1. Introduction
2. Related Work
2.1. Multiple Object Tracking
2.2. Object-Feature-Based Multi-Object Tracking Methods
2.3. Joint Detection and Tracking Multi-Object Methods
2.4. Transformer-Based Multi-Object Tracking Methods
3. Methodology
3.1. Holistic Trans-Detector: Object Detection and Feature Extraction
3.1.1. Attention Computation
3.1.2. Detection Head
3.2. GAO Trajectory Matching Pattern
3.2.1. Appear-IOU Distance Matching
3.2.2. Gau-IOU Distance Matching
3.2.3. OSPA-IOU Distance Matching
3.2.4. Visual Gaussian Mixture Probability Hypothesis Density
4. Experiments
4.1. Dataset and Evaluation Metrics
4.2. Training Preprocessing
4.3. Experimental Settings
4.4. Comparative Experiments
4.4.1. Detection Comparison
4.4.2. Tracking Comparison
4.5. Ablation Experiments
4.5.1. Effect of Backbone
4.5.2. Impact of Pre-Processing and Detection Results Classification
4.5.3. Impact of Matching Strategies
4.5.4. Impact of VGM-PHD
5. Discussion
5.1. Performance Analysis
5.2. Strengths
5.3. Limitations
5.4. Future Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| MOT | Multiple object tracking |
| GAO | Gaussian, appearance, and optimal subpattern assignment |
| IOU | Intersection over union |
| OSPA | Optimal subpattern assignment |
| VGM-PHD | Visual Gaussian mixture probability hypothesis density |
| MOTA | Multiple object tracking accuracy |
| MOTP | Multiple object tracking precision |
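For readers unfamiliar with the metrics abbreviated above, the following minimal sketch shows how IOU and MOTA are conventionally computed from detection boxes and per-sequence error counts. The function names and box format (x1, y1, x2, y2) are our illustrative choices, not code from the paper:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mota(fp, fn, idsw, num_gt):
    """MOTA = 1 - (FP + FN + IDSW) / total ground-truth objects."""
    return 1.0 - (fp + fn + idsw) / num_gt
```

MOTA aggregates false positives, false negatives, and identity switches into a single accuracy score, which is why the comparative tables below report FP, FN, and IDSW alongside it.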
References
1. Wu, X.; Li, W.; Hong, D.; Tao, R.; Du, Q. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. IEEE Geosci. Remote Sens. Mag. 2021, 10, 91–124.
2. Li, Y.; Zhang, H.; Yang, Y.; Liu, H.; Yuan, D. RISTrack: Learning Response Interference Suppression Correlation Filters for UAV Tracking. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
3. Dai, M.; Hu, J.; Zhuang, J.; Zheng, E. A transformer-based feature segmentation and region alignment method for UAV-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4376–4389.
4. Yi, S.; Liu, X.; Li, J.; Chen, L. UAVformer: A composite transformer network for urban scene segmentation of UAV images. Pattern Recogn. 2023, 133, 109019.
5. Yongqiang, X.; Zhongbo, L.; Jin, Q.; Zhang, K.; Zhang, B.; Feng, Q. Optimal video communication strategy for intelligent video analysis in unmanned aerial vehicle applications. Chin. J. Aeronaut. 2020, 33, 2921–2929.
6. Bochinski, E.; Eiselein, V.; Sikora, T. High-speed tracking-by-detection without using image information. In Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
7. Chen, G.; Wang, W.; He, Z.; Wang, L.; Yuan, Y.; Zhang, D.; Zhang, J.; Zhu, P.; Van Gool, L.; Han, J.; et al. VisDrone-MOT2021: The Vision Meets Drone Multiple Object Tracking Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 2839–2846.
8. Bisio, I.; Garibotto, C.; Haleem, H.; Lavagetto, F.; Sciarrone, A. Vehicular/Non-Vehicular Multi-Class Multi-Object Tracking in Drone-based Aerial Scenes. IEEE Trans. Veh. Technol. 2023, 73, 4961–4977.
9. Lin, Y.; Wang, M.; Chen, W.; Gao, W.; Li, L.; Liu, Y. Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure. Remote Sens. 2022, 14, 3862.
10. Al-Shakarji, N.; Bunyak, F.; Seetharaman, G.; Palaniappan, K. Multi-object tracking cascade with multi-step data association and occlusion handling. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
11. Yu, H.; Li, G.; Zhang, W.; Yao, H.; Huang, Q. Self-balance motion and appearance model for multi-object tracking in UAV. In Proceedings of the 1st ACM International Conference on Multimedia in Asia, Beijing, China, 15–18 December 2019; pp. 1–6.
12. Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards real-time multi-object tracking. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 107–122.
13. Wu, H.; Nie, J.; He, Z.; Zhu, Z.; Gao, M. One-shot multiple object tracking in UAV videos using task-specific fine-grained features. Remote Sens. 2022, 14, 3853.
14. Shi, L.; Zhang, Q.; Pan, B.; Zhang, J.; Su, Y. Global-Local and Occlusion Awareness Network for Object Tracking in UAVs. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8834–8844.
15. Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking objects as points. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 474–490.
16. Peng, J.; Wang, C.; Wan, F.; Wu, Y.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Fu, Y. Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 145–161.
17. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
18. Tsai, C.; Shen, G.; Nisar, H. Swin-JDE: Joint detection and embedding multi-object tracking in crowded scenes based on swin-transformer. Eng. Appl. Artif. Intel. 2023, 119, 105770.
19. Hu, M.; Zhu, X.; Wang, H.; Cao, S.; Liu, C.; Song, Q. STDFormer: Spatial-Temporal Motion Transformer for Multiple Object Tracking. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6571–6594.
20. Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. MOTR: End-to-end multiple-object tracking with transformer. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 659–675.
21. Cai, J.; Xu, M.; Li, W.; Xiong, Y.; Xia, W.; Tu, Z.; Soatto, S. MeMOT: Multi-object tracking with memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8090–8100.
22. Zhu, P.; Wen, L.; Du, D.; Bian, X.; Hu, Q.; Ling, H. Vision meets drones: Past, present and future. arXiv 2020, arXiv:2001.06303.
23. Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386.
24. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
25. Fang, Y.; Liao, B.; Wang, X.; Fang, J.; Qi, J.; Wu, R.; Niu, J.; Liu, W. You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection. Adv. Neural Inf. Process. Syst. 2021, 34, 26183–26197.
26. Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring plain vision transformer backbones for object detection. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 280–296.
27. Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-Time Object Detection Network in UAV-Vision Based on CNN and Transformer. IEEE Trans. Instrum. Meas. 2023, 72, 2505713.
28. Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.; Zhang, L. DN-DETR: Accelerate DETR training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13619–13627.
29. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
30. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object tracking by associating every detection box. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21.
31. Aharon, N.; Orfaig, R.; Bobrovsky, B. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv 2022, arXiv:2206.14651.
32. Liu, S.; Li, X.; Lu, H.; He, Y. Multi-Object Tracking Meets Moving UAV. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8876–8885.
33. Deng, K.; Zhang, C.; Chen, Z.; Hu, W.; Li, B.; Lu, F. Jointing Recurrent Across-Channel and Spatial Attention for Multi-Object Tracking With Block-Erasing Data Augmentation. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4054–4069.
34. Xiao, C.; Cao, Q.; Zhong, Y.; Lan, L.; Zhang, X.; Cai, H.; Luo, Z. Enhancing Online UAV Multi-Object Tracking with Temporal Context and Spatial Topological Relationships. Drones 2023, 7, 389.
35. Keawboontan, T.; Thammawichai, M. Toward Real-Time UAV Multi-Target Tracking Using Joint Detection and Tracking. IEEE Access 2023, 11, 65238–65254.
36. Li, J.; Ding, Y.; Wei, H.; Zhang, Y.; Lin, W. SimpleTrack: Rethinking and improving the JDE approach for multi-object tracking. Sensors 2022, 22, 5863.
37. Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple object tracking with transformer. arXiv 2020, arXiv:2012.15460.
38. Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. TrackFormer: Multi-object tracking with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8844–8854.
39. Xu, Y.; Ban, Y.; Delorme, G.; Gan, C.; Rus, D.; Alameda-Pineda, X. TransCenter: Transformers with dense representations for multiple-object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 7820–7835.
40. Zhou, X.; Yin, T.; Koltun, V.; Krähenbühl, P. Global Tracking Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8771–8780.
41. Chen, M.; Liao, Y.; Liu, S.; Wang, F.; Hwang, J. TR-MOT: Multi-Object Tracking by Reference. arXiv 2022, arXiv:2203.16621.
42. Wu, H.; He, Z.; Gao, M. GCEVT: Learning Global Context Embedding for Vehicle Tracking in Unmanned Aerial Vehicle Videos. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
43. Xu, X.; Feng, Z.; Cao, C.; Yu, C.; Li, M.; Wu, Z.; Ye, S.; Shang, Y. STN-Track: Multiobject Tracking of Unmanned Aerial Vehicles by Swin Transformer Neck and New Data Association Method. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8734–8743.
| Dataset | Detector | AP | AP@0.5 | AP@0.75 | APs | APm | APl |
|---|---|---|---|---|---|---|---|
| VisDrone | DETR [17] | 34.8 | 63.4 | 32.2 | 12.8 | 38.5 | 55.6 |
| | Deformable DETR [24] | 36.9 | 60.4 | 35.2 | 9.9 | 38.1 | 52.7 |
| | YOLOS [25] | 36.6 | 63.1 | 38.7 | 15.4 | 39.9 | 54.9 |
| | Swin-JDE [18] | 38.2 | 60.5 | 34.8 | 11.1 | 41.4 | 57.6 |
| | VitDet [26] | 38.9 | 64.7 | 38.7 | 19.6 | 40.5 | 57.8 |
| | RTD-Net [27] | 38.1 | 64.6 | 40.2 | 17.6 | 42.8 | 57.6 |
| | DN-DETR [28] | 39.4 | 63.4 | 36.5 | 16.8 | 42.5 | 59.2 |
| | Holistic Trans-Det | 39.6 | 67.9 | 40.8 | 18.6 | 40.3 | 59.4 |
| UAVDT | DETR [17] | 48.8 | 69.3 | 49.3 | 28.0 | 47.5 | 57.1 |
| | Deformable DETR [24] | 47.2 | 69.2 | 50.3 | 29.0 | 53.2 | 59.4 |
| | YOLOS [25] | 49.3 | 71.1 | 51.4 | 32.3 | 50.4 | 58.9 |
| | Swin-JDE [18] | 49.6 | 69.9 | 52.8 | 33.9 | 54.8 | 59.7 |
| | VitDet [26] | 54.6 | 68.9 | 59.5 | 37.5 | 57.9 | 61.0 |
| | RTD-Net [27] | 52.2 | 71.4 | 55.6 | 36.3 | 57.2 | 60.9 |
| | DN-DETR [28] | 56.7 | 68.6 | 60.2 | 38.7 | 59.8 | 62.9 |
| | Holistic Trans-Det | 57.5 | 69.0 | 60.5 | 38.8 | 61.5 | 67.9 |
| | Tracker | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| Motion-based | DeepSORT [29] | 19.4 | 69.8 | 33.1 | 6387 | 38.8 | 52.2 | 15,181 | 44,830 |
| | ByteTrack [30] | 25.1 | 72.6 | 40.8 | 4590 | 42.8 | 50.3 | 10,722 | 24,376 |
| | BoT-SORT [31] | 23.0 | 71.6 | 41.4 | 7014 | 51.9 | 73.6 | 10,701 | 47,922 |
| | UAVMOT [32] | 25.0 | 72.3 | 40.5 | 6644 | 52.6 | 49.6 | 10,134 | 55,630 |
| | DCMOT [33] | 33.5 | 76.1 | 45.5 | 1139 | - | - | 12,594 | 64,856 |
| | TFAM [34] | 30.9 | 74.4 | 42.7 | 3998 | - | - | 27,732 | 126,811 |
| | MTTJDT [35] | 31.2 | 73.2 | 43.6 | 2415 | - | - | 25,976 | 183,381 |
| Transformer-based | TransTrack [37] | 27.3 | 62.1 | 28.3 | 2523 | 33.5 | 59.7 | 15,028 | 51,396 |
| | TrackFormer [38] | 24.0 | 77.3 | 38.0 | 4724 | 39.0 | 46.3 | 11,731 | 32,807 |
| | TransCenter [39] | 29.9 | 66.6 | 46.8 | 3446 | 33.4 | 61.8 | 15,104 | 20,894 |
| | MOTR [20] | 13.1 | 72.4 | 47.1 | 2997 | 52.9 | 72.0 | 12,216 | 42,186 |
| | MeMOT [21] | 29.4 | 73.0 | 48.7 | 3755 | 46.7 | 47.9 | 9963 | 30,062 |
| | GTR [40] | 28.1 | 76.8 | 54.5 | 2000 | 61.3 | 57.6 | 8165 | 10,553 |
| | TR-MOT [41] | 29.9 | 64.3 | 46.0 | 1005 | 42.8 | 59.9 | 7593 | 17,352 |
| | GCEVT [42] | 34.5 | 73.8 | 50.6 | 841 | 52.0 | 61.2 | - | - |
| | STN-Track [43] | 38.6 | - | 73.7 | 668 | 31.4 | 51.2 | 7385 | 76,006 |
| | STDFormer [19] | 35.9 | 74.5 | 59.9 | 1441 | 52.7 | 60.3 | 8527 | 20,558 |
| | GAO-Tracker | 38.8 | 76.3 | 54.3 | 972 | 55.9 | 52.4 | 6883 | 10,204 |
| | Tracker | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| Motion-based | DeepSORT [29] | 35.9 | 71.5 | 58.3 | 698 | 43.4 | 25.7 | 50,513 | 59,733 |
| | ByteTrack [30] | 39.1 | 74.3 | 44.7 | 2341 | 43.8 | 28.1 | 14,468 | 87,485 |
| | BoT-SORT [31] | 37.2 | 72.1 | 53.1 | 1692 | 40.8 | 27.3 | 42,286 | 64,494 |
| | UAVMOT [32] | 43.0 | 73.5 | 61.5 | 641 | 45.3 | 22.7 | 27,832 | 65,467 |
| | SimpleTrack [36] | 45.3 | 73.9 | 57.1 | 1404 | 43.6 | 22.5 | 21,153 | 53,448 |
| | TFAM [34] | 47.0 | 72.9 | 67.8 | 506 | - | - | 68,282 | 111,959 |
| Transformer-based | TransTrack [37] | 33.2 | 72.4 | 67.6 | 1122 | 38.9 | 23.8 | 50,746 | 54,938 |
| | TrackFormer [38] | 53.4 | 74.2 | 46.3 | 2247 | 43.7 | 23.3 | 13,719 | 91,061 |
| | TransCenter [39] | 48.9 | 73.9 | 51.3 | 2287 | 32.6 | 35.1 | 27,995 | 93,013 |
| | MOTR [20] | 35.6 | 72.5 | 56.1 | 1759 | 39.8 | 29.3 | 39,733 | 56,368 |
| | MeMOT [21] | 45.6 | 74.6 | 62.8 | 2118 | 34.9 | 26.5 | 38,933 | 59,156 |
| | GTR [40] | 46.5 | 75.3 | 61.1 | 1482 | 42.7 | 18.6 | 21,676 | 52,617 |
| | TR-MOT [41] | 57.7 | 74.1 | 55.7 | 2461 | 33.9 | 21.3 | 32,217 | 50,838 |
| | GCEVT [42] | 47.6 | 73.4 | 68.6 | 1801 | 61.8 | 36.3 | - | - |
| | STN-Track [43] | 60.6 | - | 73.1 | 1420 | 57.0 | 17.0 | 12,825 | 61,760 |
| | STDFormer [19] | 60.6 | 74.8 | 61.7 | 1642 | 44.6 | 20.3 | 20,258 | 41,895 |
| | GAO-Tracker | 61.7 | 75.2 | 67.9 | 1216 | 45.3 | 24.6 | 24,915 | 59,640 |
| Dataset | Detector Backbone | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | ResNet-50 | 19.6 | 59.9 | 36.7 | 4287 | 35.3 | 31.3 | 9078 | 18,764 |
| | DLA-34 | 34.9 | 68.5 | 50.3 | 2198 | 46.3 | 43.5 | 8818 | 13,070 |
| | ViT | 35.2 | 69.7 | 51.0 | 2019 | 48.9 | 45.9 | 8009 | 12,897 |
| | Swin-L | 35.5 | 70.2 | 52.3 | 1509 | 51.9 | 47.6 | 6832 | 12,223 |
| | Holistic Trans | 38.8 | 76.3 | 54.3 | 972 | 55.9 | 52.4 | 6883 | 10,204 |
| UAVDT | ResNet-50 | 56.2 | 70.3 | 62.1 | 2252 | 40.4 | 22.6 | 32,743 | 72,629 |
| | DLA-34 | 61.9 | 75.1 | 66.4 | 1798 | 42.4 | 23.4 | 28,705 | 65,616 |
| | ViT | 60.1 | 74.0 | 65.9 | 1504 | 42.8 | 23.7 | 26,937 | 62,348 |
| | Swin-L | 59.6 | 74.4 | 66.0 | 1264 | 43.9 | 23.8 | 25,822 | 61,324 |
| | Holistic Trans | 61.7 | 75.2 | 67.9 | 1216 | 45.3 | 24.6 | 24,915 | 59,640 |
| Dataset | Method | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | Baseline | 36.2 | 70.9 | 52.5 | 1344 | 53.1 | 49.3 | 9117 | 11,987 |
| | B+Pre | 37.6 | 71.2 | 52.8 | 1320 | 54.3 | 50.1 | 9135 | 11,499 |
| | B+Grade | 37.3 | 74.2 | 52.7 | 1138 | 54.7 | 51.2 | 9627 | 11,060 |
| | B+Pre+Grade | 38.8 | 76.3 | 54.3 | 972 | 55.9 | 52.4 | 6883 | 10,204 |
| UAVDT | Baseline | 57.8 | 72.0 | 64.0 | 1841 | 42.4 | 23.3 | 29,057 | 67,373 |
| | B+Pre | 59.3 | 74.4 | 65.6 | 1398 | 43.8 | 23.8 | 25,836 | 62,429 |
| | B+Grade | 60.4 | 74.7 | 66.1 | 1221 | 44.5 | 23.9 | 25,418 | 60,828 |
| | B+Pre+Grade | 61.7 | 75.2 | 67.9 | 1216 | 45.3 | 24.6 | 24,915 | 59,640 |
| Dataset | Method | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | Baseline | 36.2 | 70.9 | 52.5 | 1552 | 53.1 | 49.3 | 9117 | 11,987 |
| | B+Appear-IOU | 37.1 | 74.4 | 53.2 | 1334 | 53.2 | 49.2 | 9209 | 11,027 |
| | B+Appear-IOU+Gau-IOU | 38.3 | 75.6 | 53.9 | 1052 | 53.8 | 49.9 | 7343 | 10,946 |
| | B+Appear-IOU+Gau-IOU+OSPA-IOU | 38.8 | 76.3 | 54.3 | 972 | 55.9 | 52.4 | 6883 | 10,204 |
| UAVDT | Baseline | 57.8 | 72.0 | 64.0 | 1841 | 42.4 | 23.3 | 29,057 | 67,373 |
| | B+Appear-IOU | 58.2 | 73.8 | 64.9 | 1536 | 43.0 | 23.7 | 29,133 | 63,781 |
| | B+Appear-IOU+Gau-IOU | 60.9 | 74.9 | 66.3 | 1297 | 45.0 | 24.0 | 25,011 | 60,369 |
| | B+Appear-IOU+Gau-IOU+OSPA-IOU | 61.7 | 75.2 | 67.9 | 1216 | 45.3 | 24.6 | 24,915 | 59,640 |
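The OSPA-IOU stage above builds on the optimal subpattern assignment (OSPA) distance between two finite sets. As an illustration only, and not the paper's exact formulation, the following brute-force sketch computes the OSPA distance for small point sets with cutoff `c` and order `p`:

```python
from itertools import permutations

def ospa(X, Y, c=1.0, p=1):
    """OSPA distance between two finite point sets X and Y.

    Pairwise distances are cut off at c, and unmatched points from a
    cardinality mismatch are penalised at the cutoff. Brute-force over
    assignments, so only suitable for small sets.
    """
    if len(X) > len(Y):
        X, Y = Y, X  # ensure |X| <= |Y|
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0

    def d(a, b):  # cut-off Euclidean distance
        return min(c, sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5)

    # best assignment cost of the smaller set into the larger one
    best = min(
        sum(d(x, Y[j]) ** p for x, j in zip(X, perm))
        for perm in permutations(range(n), m)
    )
    return ((best + c ** p * (n - m)) / n) ** (1 / p)
```

Because the cardinality penalty is explicit, OSPA rewards matchers that neither drop nor hallucinate targets, which is consistent with the FP/FN reductions reported in the rows above.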
| Dataset | Method | MOTA↑ | MOTP↑ | IDF1 (%)↑ | IDSW↓ | MT (%)↑ | ML (%)↑ | FP↓ | FN↓ |
|---|---|---|---|---|---|---|---|---|---|
| VisDrone | No trajectory prediction | 29.9 | 64.4 | 49.3 | 2497 | 42.8 | 42.8 | 8719 | 15,226 |
| | Kalman Filter | 35.3 | 69.9 | 50.6 | 1727 | 51.4 | 47.5 | 8998 | 12,302 |
| | VGM-PHD | 38.8 | 76.3 | 54.3 | 972 | 55.9 | 52.4 | 6883 | 10,204 |
| UAVDT | No trajectory prediction | 43.1 | 61.2 | 46.4 | 4437 | 32.3 | 17.4 | 49,018 | 99,620 |
| | Kalman Filter | 52.8 | 68.0 | 56.1 | 3069 | 37.4 | 21.0 | 38,389 | 82,471 |
| | VGM-PHD | 61.7 | 75.2 | 67.9 | 1216 | 45.3 | 24.6 | 24,915 | 59,640 |
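The Kalman Filter rows refer to the standard constant-velocity motion baseline against which VGM-PHD is compared. As an illustrative stand-in (our sketch, not the paper's implementation), a minimal one-dimensional constant-velocity Kalman filter that could be applied per box coordinate looks like this:

```python
class ScalarCVKalman:
    """1-D constant-velocity Kalman filter: position observed, velocity latent."""

    def __init__(self, x0, q=1e-2, r=1.0):
        self.x = [x0, 0.0]                 # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self, dt=1.0):
        x, v = self.x
        self.x = [x + dt * v, v]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P <- F P F^T + Q with transition F = [[1, dt], [0, 1]]
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        # Observation H = [1, 0]: the innovation is on position only
        y = z - self.x[0]
        s = self.P[0][0] + self.r
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s  # Kalman gain
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p00, p01 = self.P[0]
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [self.P[1][0] - k1 * p00, self.P[1][1] - k1 * p01],
        ]
        return self.x[0]
```

Fed exact linear measurements, the filter converges to the true position and velocity, which is the behavior the Kalman Filter baseline relies on; the table suggests VGM-PHD's multi-target density propagation handles the birth, death, and clutter of drone scenes considerably better.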
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yuan, Y.; Wu, Y.; Zhao, L.; Pang, Y.; Liu, Y. Multiple Object Tracking in Drone Aerial Videos by a Holistic Transformer and Multiple Feature Trajectory Matching Pattern. Drones 2024, 8, 349. https://doi.org/10.3390/drones8080349