RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception

Li, Lantao; Yang, Kang; Zhang, Wenqi; Wang, Xiaoxue; Sun, Chen

arXiv:2501.16803 (cs)

[Submitted on 28 Jan 2025 (v1), last revised 1 Apr 2025 (this version, v2)]

Abstract: Cooperative perception offers an optimal solution to overcome the perception limitations of single-agent systems by leveraging Vehicle-to-Everything (V2X) communication for data sharing and fusion across multiple agents. However, most existing approaches focus on single-modality data exchange, limiting the potential of both homogeneous and heterogeneous fusion across agents. This overlooks the opportunity to utilize multi-modality data per agent, restricting the system's performance. In the automotive industry, manufacturers adopt diverse sensor configurations, resulting in heterogeneous combinations of sensor modalities across agents. To harness the potential of every possible data source for optimal performance, we design a robust LiDAR and camera cross-modality fusion module, Radian-Glue-Attention (RG-Attn), applicable to both intra-agent and inter-agent cross-modality fusion scenarios, owing to the convenient coordinate conversion via a transformation matrix and the unified sampling/inversion mechanism. We also propose two different architectures, named Paint-To-Puzzle (PTP) and Co-Sketching-Co-Coloring (CoS-CoCo), for conducting cooperative perception. PTP aims for maximum precision and achieves a smaller data packet size by limiting cross-agent fusion to a single instance, but requires all participants to be equipped with LiDAR. In contrast, CoS-CoCo supports agents with any configuration (LiDAR-only, camera-only, or both LiDAR and camera), offering greater generalization ability. Our approach achieves state-of-the-art (SOTA) performance on both real and simulated cooperative perception datasets. The code is now available at GitHub.

Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
Cite as:	arXiv:2501.16803 [cs.RO]
	(or arXiv:2501.16803v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2501.16803

Submission history

From: Lantao Li
[v1] Tue, 28 Jan 2025 09:08:31 UTC (7,038 KB)
[v2] Tue, 1 Apr 2025 02:05:03 UTC (7,054 KB)
