MonoDistill: Learning Spatial Features for Monocular 3D Object Detection

Chong, Zhiyu; Ma, Xinzhu; Zhang, Hong; Yue, Yuxin; Li, Haojie; Wang, Zhihui; Ouyang, Wanli

arXiv:2201.10830 (cs)

[Submitted on 26 Jan 2022]

Abstract:3D object detection is a fundamental and challenging task for 3D scene understanding, and monocular methods can serve as an economical alternative to stereo-based or LiDAR-based methods. However, accurately detecting objects in 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme that introduces spatial information from LiDAR signals into monocular 3D detectors without adding any cost at inference time. In particular, we first project the LiDAR signals onto the image plane and align them with the RGB images. We then use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, this LiDAR Net serves as the teacher that transfers its learned knowledge to the baseline model. Experimental results show that the proposed method significantly boosts the performance of the baseline model and ranks $1^{st}$ among all monocular-based methods on the KITTI benchmark. In addition, extensive ablation studies demonstrate the effectiveness of each part of our design and illustrate what the baseline model has learned from the LiDAR Net. Our code will be released at this https URL.
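
The first step described in the abstract, projecting LiDAR signals onto the image plane, can be illustrated with a minimal sketch. This is not the authors' code: the function name `project_lidar_to_depth_map`, its parameters, and the choice to keep the nearest depth per pixel are all assumptions made for illustration, and the paper's actual preprocessing may differ. It assumes the points are already expressed in camera coordinates and that a 3x4 projection matrix `P` is available.

```python
import numpy as np


def project_lidar_to_depth_map(points_cam, P, height, width):
    """Project 3D points onto the image plane as a sparse depth map.

    Hypothetical helper for illustration only.
    points_cam: (N, 3) array of points in camera coordinates (z forward).
    P:          (3, 4) camera projection matrix.
    Returns an (height, width) float32 map; 0 marks pixels with no point.
    """
    n = points_cam.shape[0]
    homo = np.hstack([points_cam, np.ones((n, 1))])  # (N, 4) homogeneous
    uvw = homo @ P.T                                 # (N, 3) projected

    # Discard points behind (or on) the camera plane.
    in_front = uvw[:, 2] > 1e-6
    uvw = uvw[in_front]

    # Perspective division gives pixel coordinates.
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = uvw[:, 2].astype(np.float32)

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[inside], v[inside], depth[inside]

    # When several points hit the same pixel, keep the nearest one.
    depth_map = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth_map, (v, u), depth)
    depth_map[np.isinf(depth_map)] = 0.0
    return depth_map
```

The resulting map is aligned pixel-for-pixel with the RGB image, which is what would allow a network with the same architecture as the image-based baseline to consume it, as the abstract describes.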

Comments:	Accepted by ICLR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2201.10830 [cs.CV]
	(or arXiv:2201.10830v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2201.10830

Submission history

From: Xinzhu Ma
[v1] Wed, 26 Jan 2022 09:21:41 UTC (7,924 KB)
