ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Sadjadpour, Tara; Ambrus, Rares; Bohg, Jeannette

Computer Science > Computer Vision and Pattern Recognition

arXiv:2310.02532 (cs)

[Submitted on 4 Oct 2023]

Title:ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Authors:Tara Sadjadpour, Rares Ambrus, Jeannette Bohg

View PDF

Abstract:3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene. In order to maximize the perception capabilities of the autonomous agent, we aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information. Building on our prior LiDAR-only work, ShaSTA, which models shape and spatio-temporal affinities for 3D MOT, we propose a novel camera-LiDAR fusion approach for learning affinities. At its core, this work proposes a fusion technique that generates a rich sensory signal incorporating information about depth and distant objects to enhance affinity estimation for improved data association, track lifecycle management, false-positive elimination, false-negative propagation, and track confidence score refinement. Our main contributions include a novel fusion approach for combining camera and LiDAR sensory signals to learn affinities, and a first-of-its-kind multimodal sequential track confidence refinement technique that fuses 2D and 3D detections. Additionally, we perform an ablative analysis on each fusion step to demonstrate the added benefits of incorporating the camera sensor, particular for small, distant objects that tend to suffer from the depth-sensing limits and sparsity of LiDAR sensors. In sum, our technique achieves state-of-the-art performance on the nuScenes benchmark amongst multimodal 3D MOT algorithms using CenterPoint detections.

Comments:	8 pages, 1 figure
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.02532 [cs.CV]
	(or arXiv:2310.02532v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.02532

Submission history

From: Tara Sadjadpour [view email]
[v1] Wed, 4 Oct 2023 02:17:59 UTC (1,202 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators