Collaborative Joint Perception and Prediction for Autonomous Driving
Abstract
1. Introduction
- It extends static single-frame information sharing to a multi-frame spatial–temporal information-sharing framework, allowing agents to exchange comprehensive spatial–temporal information with minimal communication overhead in a single round of collaboration. This enriches the collaborative message and enables the system to support tasks that require temporal information.
- The system explicitly accounts for the spatial–temporal importance of information to the PnP task, ensuring that the most critical information is retained during sharing and making the collaboration both effective and efficient.
- The system simultaneously outputs perception and prediction results decoded from a common fused spatial–temporal feature, which directly benefits prediction and mitigates the accumulation of cascading errors (a minimal message-flow sketch follows this list).
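The sketch below only illustrates the message flow described by these three points: each agent condenses T past single-frame BEV features into one compact spatial–temporal message, shares it once per collaboration round, and decodes perception and prediction from the same fused feature. All module names, shapes, and the max-based fusion placeholder are our own assumptions, not the authors' implementation.

```python
# Illustrative sketch only (assumed module names and shapes, not the authors' code).
import torch
import torch.nn as nn

class SpatioTemporalCollaborator(nn.Module):
    def __init__(self, c_in=64, c_msg=16, num_frames=3):
        super().__init__()
        # Condense T single-frame BEV features into one compact spatial-temporal message.
        self.refine = nn.Conv2d(c_in * num_frames, c_msg, kernel_size=1)
        # Both heads decode from the same fused feature, so prediction never
        # consumes thresholded detections (no cascading error accumulation).
        self.perception_head = nn.Conv2d(c_msg, 1, kernel_size=1)   # BEV occupancy
        self.prediction_head = nn.Conv2d(c_msg, 2, kernel_size=1)   # future BEV motion

    def build_message(self, past_bev_feats):
        # past_bev_feats: list of T tensors, each (B, C, H, W), already in BEV.
        stacked = torch.cat(past_bev_feats, dim=1)                  # (B, T*C, H, W)
        return self.refine(stacked)                                 # compact message

    def fuse_and_decode(self, ego_msg, received_msgs):
        # Placeholder fusion (element-wise max); pose alignment of neighbor
        # messages into the ego frame is omitted for brevity.
        fused = torch.stack([ego_msg, *received_msgs], dim=0).max(dim=0).values
        return self.perception_head(fused), self.prediction_head(fused)

# Example: three agents, each buffering T = 3 past frames (arbitrary sizes).
model = SpatioTemporalCollaborator()
frames = [torch.randn(1, 64, 96, 96) for _ in range(3)]
messages = [model.build_message(frames) for _ in range(3)]          # one per agent
occupancy, motion = model.fuse_and_decode(messages[0], messages[1:])
```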
- (1) We propose CoPnP, a novel collaborative joint perception and prediction system for autonomous driving, which extends the static single-frame information sharing in previous work to multi-frame spatial–temporal information sharing, expanding the benefits of collaboration among road agents and promoting its application.
- (2) To achieve effective and communication-efficient information sharing, we propose two novel designs in the CoPnP system: a task-oriented spatial–temporal information refinement model to refine the collaborative messages and a spatial–temporal importance-aware feature-fusion model to comprehensively fuse the spatial–temporal features (a rough illustration of this fusion follows the list).
- (3) We generate PnP labels for two public large-scale collaborative perception datasets, OPV2V and V2XSet, and conduct experiments on both to validate the proposed CoPnP. The experimental results show that CoPnP outperforms existing state-of-the-art collaboration methods on the PnP task with a superior performance–communication trade-off, achieving gains of up to 11.51%/10.34% IoU and 12.31%/10.96% VPQ on OPV2V/V2XSet, respectively.
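As a rough illustration of the importance-aware feature fusion named in contribution (2), the snippet below weights each agent's (already aligned) feature map per spatial location and normalizes the weights across agents before summing. The weighting network and its exact form are assumptions made here for illustration; the actual STI-aware fusion design is described in Section 3.2.

```python
# Rough illustration of importance-aware fusion (assumed form, not the authors' design).
import torch
import torch.nn as nn

class ImportanceAwareFusion(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        # Produces one importance score per spatial location: (B, 1, H, W).
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, agent_feats):
        # agent_feats: (N_agents, B, C, H, W), already warped into the ego frame.
        scores = torch.stack([self.score(f) for f in agent_feats], dim=0)
        weights = torch.softmax(scores, dim=0)          # normalize across agents
        return (weights * agent_feats).sum(dim=0)       # fused feature (B, C, H, W)

# Usage with three agents and arbitrary feature sizes.
fusion = ImportanceAwareFusion(channels=16)
feats = torch.randn(3, 1, 16, 96, 96)
fused = fusion(feats)                                   # (1, 16, 96, 96)
```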
2. Related Works
2.1. Collaborative Perception
2.2. Joint Perception and Prediction
3. Methods: Collaborative Joint Perception and Prediction System
3.1. Problem Statement
3.2. System
3.3. Training Loss
4. Evaluation
4.1. Dataset
4.2. Metrics and Implementation
4.3. Quantitative Evaluation
4.4. Qualitative Evaluation
4.5. Ablation Study
4.6. Discussion of the Generalization and Robustness
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, Y.; Ibanez-Guzman, J. Lidar for Autonomous Driving: The Principles, Challenges, and Trends for Automotive Lidar and Perception Systems. IEEE Signal Process. Mag. 2020, 37, 50–61.
- Zhou, T.; Yang, M.; Jiang, K.; Wong, H.; Yang, D. MMW Radar-Based Technologies in Autonomous Driving: A Review. Sensors 2020, 20, 7283.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Ren, S.; Chen, S.; Zhang, W. Collaborative perception for autonomous driving: Current status and future trend. In Proceedings of the 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control; Springer: Singapore, 2022; pp. 682–692.
- Shan, M.; Narula, K.; Wong, Y.F.; Worrall, S.; Khan, M.; Alexander, P.; Nebot, E. Demonstrations of cooperative perception: Safety and robustness in connected and automated vehicle operations. Sensors 2020, 21, 200.
- Schiegg, F.A.; Llatser, I.; Bischoff, D.; Volk, G. Collective perception: A safety perspective. Sensors 2020, 21, 159.
- Wang, T.H.; Manivasagam, S.; Liang, M.; Yang, B.; Zeng, W.; Urtasun, R. V2VNet: Vehicle-to-vehicle communication for joint perception and prediction. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 605–621.
- Li, Y.; Ren, S.; Wu, P.; Chen, S.; Feng, C.; Zhang, W. Learning distilled collaboration graph for multi-agent perception. Adv. Neural Inf. Process. Syst. 2021, 34, 29541–29552.
- Hu, Y.; Fang, S.; Lei, Z.; Yiqi, Z.; Chen, S. Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps. In Proceedings of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022.
- Ren, S.; Lei, Z.; Wang, Z.; Dianati, M.; Wang, Y.; Chen, S.; Zhang, W. Interruption-Aware Cooperative Perception for V2X Communication-Aided Autonomous Driving. IEEE Trans. Intell. Veh. 2024, 9, 4698–4714.
- Lei, Z.; Ren, S.; Hu, Y.; Zhang, W.; Chen, S. Latency-aware collaborative perception. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 316–332.
- Lu, Y.; Li, Q.; Liu, B.; Dianati, M.; Feng, C.; Chen, S.; Wang, Y. Robust Collaborative 3D Object Detection in Presence of Pose Errors. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation, London, UK, 29 May–2 June 2023.
- Xu, R.; Xiang, H.; Xia, X.; Han, X.; Li, J.; Ma, J. OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022.
- Xu, R.; Xiang, H.; Tu, Z.; Xia, X.; Yang, M.H.; Ma, J. V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022.
- Ngo, H.; Fang, H.; Wang, H. Cooperative Perception with V2V Communication for Autonomous Vehicles. IEEE Trans. Veh. Technol. 2023, 72, 11122–11131.
- Chen, Q.; Tang, S.; Yang, Q.; Fu, S. Cooper: Cooperative perception for connected autonomous vehicles based on 3D point clouds. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems, Dallas, TX, USA, 7–10 July 2019; pp. 514–524.
- Chen, Q.; Ma, X.; Tang, S.; Guo, J.; Yang, Q.; Fu, S. F-Cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3D point clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, Washington, DC, USA, 7–9 November 2019; pp. 88–100.
- Arnold, E.; Mozaffari, S.; Dianati, M. Fast and robust registration of partially overlapping point clouds. IEEE Robot. Autom. Lett. 2021, 7, 1502–1509.
- Li, Y.; Ma, D.; An, Z.; Wang, Z.; Zhong, Y.; Chen, S.; Feng, C. V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving. IEEE Robot. Autom. Lett. 2022, 7, 10914–10921.
- Yu, H.; Luo, Y.; Shu, M.; Huo, Y.; Yang, Z.; Shi, Y.; Guo, Z.; Li, H.; Hu, X.; Yuan, J.; et al. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21361–21370.
- Xu, R.; Xia, X.; Li, J.; Li, H.; Zhang, S.; Tu, Z.; Meng, Z.; Xiang, H.; Dong, X.; Song, R.; et al. V2V4Real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13712–13722.
- Wei, S.; Wei, Y.; Hu, Y.; Lu, Y.; Zhong, Y.; Chen, S.; Zhang, Y. Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023.
- Vadivelu, N.; Ren, M.; Tu, J.; Wang, J.; Urtasun, R. Learning to communicate and correct pose errors. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 1195–1210.
- Sun, C.; Zhang, R.; Lu, Y.; Cui, Y.; Deng, Z.; Cao, D.; Khajepour, A. Toward Ensuring Safety for Autonomous Driving Perception: Standardization Progress, Research Advances, and Perspectives. IEEE Trans. Intell. Transp. Syst. 2024, 25, 3286–3304.
- Hell, F.; Hinz, G.; Liu, F.; Goyal, S.; Pei, K.; Lytvynenko, T.; Knoll, A.; Yiqiang, C. Monitoring perception reliability in autonomous driving: Distributional shift detection for estimating the impact of input data on prediction accuracy. In Proceedings of the 5th ACM Computer Science in Cars Symposium, Ingolstadt, Germany, 30 November 2021; pp. 1–9.
- Berk, M.; Schubert, O.; Kroll, H.M.; Buschardt, B.; Straub, D. Exploiting Redundancy for Reliability Analysis of Sensor Perception in Automated Driving Vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 21, 5073–5085.
- Casas, S.; Luo, W.; Urtasun, R. IntentNet: Learning to predict intention from raw sensor data. In Proceedings of the Conference on Robot Learning, Zürich, Switzerland, 29–31 October 2018; pp. 947–956.
- Luo, W.; Yang, B.; Urtasun, R. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Liang, M.; Yang, B.; Zeng, W.; Chen, Y.; Hu, R.; Casas, S.; Urtasun, R. PnPNet: End-to-end perception and prediction with tracking in the loop. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11553–11562.
- Li, L.L.; Yang, B.; Liang, M.; Zeng, W.; Ren, M.; Segal, S.; Urtasun, R. End-to-end contextual perception and prediction with interaction transformer. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5784–5791.
- Wong, K.; Wang, S.; Ren, M.; Liang, M.; Urtasun, R. Identifying Unknown Instances for Autonomous Driving. In Proceedings of the Conference on Robot Learning, Virtual, 16–18 November 2020; Kaelbling, L.P., Kragic, D., Sugiura, K., Eds.; Proceedings of Machine Learning Research; PMLR: London, UK, 2020; Volume 100, pp. 384–393.
- Hu, A.; Murez, Z.; Mohan, N.; Dudas, S.; Hawke, J.; Badrinarayanan, V.; Cipolla, R.; Kendall, A. FIERY: Future instance prediction in bird's-eye view from surround monocular cameras. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 15273–15282.
- Hu, S.; Chen, L.; Wu, P.; Li, H.; Yan, J.; Tao, D. ST-P3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 533–549.
- Zhang, Y.; Zhu, Z.; Zheng, W.; Huang, J.; Huang, G.; Zhou, J.; Lu, J. BEVerse: Unified perception and prediction in bird's-eye-view for vision-centric autonomous driving. arXiv 2022, arXiv:2205.09743.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Xu, R.; Xiang, H.; Han, X.; Xia, X.; Meng, Z.; Chen, C.J.; Correa-Jullian, C.; Ma, J. The OpenCDA Open-Source Ecosystem for Cooperative Driving Automation Research. IEEE Trans. Intell. Veh. 2023, 8, 2698–2711.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Wu, P.; Chen, S.; Metaxas, D.N. MotionNet: Joint perception and motion prediction for autonomous driving based on bird's eye view maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11385–11395.
| Notation | Definition | Abbreviation | Full Name |
|---|---|---|---|
| T | Number of past timesteps as input | PnP | Perception and prediction |
|  | Number of future timesteps to predict | CAV | Connected autonomous vehicle |
|  | Collaboration agent (vehicle or roadside unit) | RSU | Roadside unit |
| N | Number of collaboration agents in the scene | V2X | Vehicle-to-everything |
| t | Timestep | V2V | Vehicle-to-vehicle |
|  | Set of neighbor collaboration agents of an agent | V2I | Vehicle-to-infrastructure |
|  | Raw sensor data observed by an agent at timestep t | BEV | Bird's eye view |
|  | Single-frame features extracted from the raw sensor data | STI | Spatial–temporal importance |
|  | Refined spatial–temporal features of an agent at timestep t | IoU | Intersection over union |
|  | Compressed features of the refined spatial–temporal features | VPQ | Video panoptic quality |
|  | Decompressed version of the compressed features |  |  |
|  | PnP results of an agent at timestep t |  |  |
|  | Ground truth of the PnP results |  |  |
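Both evaluation metrics abbreviated above operate on bird's eye view outputs: IoU measures the overlap between predicted and ground-truth BEV occupancy, while VPQ additionally requires consistent instance identities across the predicted future frames. Below is a minimal sketch of the simpler of the two, binary BEV IoU; the grid sizes and example regions are illustrative assumptions, and the full VPQ computation is omitted.

```python
# Minimal sketch of binary BEV IoU (illustrative only; VPQ additionally
# matches instances across time and is not shown here).
import numpy as np

def bev_iou(pred, gt):
    """pred, gt: boolean BEV occupancy grids of identical shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 1.0

# Example with two 200 x 200 BEV grids containing partially overlapping boxes.
pred = np.zeros((200, 200), dtype=bool); pred[80:120, 90:110] = True
gt = np.zeros((200, 200), dtype=bool);   gt[85:125, 90:110] = True
print(f"IoU = {bev_iou(pred, gt):.3f}")  # 700 / 900 ~= 0.778
```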
| Dataset | Single-Agent Perception | Late Fusion | V2VNet * | F-Cooper | F-Cooper * | DiscoNet | DiscoNet * | CoPnP * |
|---|---|---|---|---|---|---|---|---|
| OPV2V | 61.38 | 73.96 | 72.21 | 75.68 | 78.54 | 71.33 | 76.45 | 79.43 |
| V2XSet | 56.53 | 65.05 | 64.07 | 66.66 | 69.97 | 63.54 | 70.00 | 73.51 |
| Method | STIR | STI-Aware Fusion | OPV2V IoU | OPV2V VPQ | V2XSet IoU | V2XSet VPQ | Communication Volume (log scale) |
|---|---|---|---|---|---|---|---|
| Vanilla collaboration model |  |  | 64.95 | 60.68 | 59.47 | 54.87 | 7.43 |
| STIR+Maxfusion | ✓ |  | 65.04 | 61.20 | 60.16 | 55.45 | 6.95 |
| STIR+Discofusion | ✓ |  | 66.44 | 62.29 | 61.23 | 57.05 | 6.95 |
| STIR+STI-aware fusion | ✓ | ✓ | 67.16 | 63.08 | 62.68 | 59.14 | 6.95 |
| STIR+STI-aware fusion + 64× compression (CoPnP) | ✓ | ✓ | 67.06 | 62.99 | 62.35 | 58.85 | 5.15 |
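The last column of the ablation table reports communication volume on a logarithmic scale. The drop of 6.95 − 5.15 = 1.80 for the 64× compression row matches log10(64) ≈ 1.81, which is consistent with reading the metric as a base-10 logarithm of the transmitted message size. The sketch below illustrates that reading; the feature-map shape and 4-byte element width are assumptions for illustration only, not values taken from the paper.

```python
# Sanity-check sketch for the "communication volume (log scale)" column,
# under the assumption that it is log10 of the message size in bytes.
import math

def log_comm_volume(num_elements, bytes_per_element=4, compression=1):
    """Hypothetical helper: log10 of the transmitted bytes after compression."""
    return math.log10(num_elements * bytes_per_element / compression)

# Arbitrary illustrative BEV feature shape (H x W x C); not the paper's setup.
n = 100 * 352 * 64
print(log_comm_volume(n))                   # uncompressed message
print(log_comm_volume(n, compression=64))   # 64x compression: ~1.81 lower
```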
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).