Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss

Jörgensen, Eskil; Zach, Christopher; Kahl, Fredrik

arXiv:1906.08070 (cs)

[Submitted on 19 Jun 2019 (v1), last revised 20 Jun 2019 (this version, v2)]

Abstract: Three-dimensional object detection from a single view is a challenging task which, if performed with good accuracy, is an important enabler of low-cost mobile robot perception. Previous approaches to this problem suffer either from an overly complex inference engine or from insufficient detection accuracy. To deal with these issues, we present SS3D, a single-stage monocular 3D object detector. The framework consists of (i) a CNN, which outputs a redundant representation of each relevant object in the image with corresponding uncertainty estimates, and (ii) a 3D bounding box optimizer. We show how modeling heteroscedastic uncertainty improves performance over our baseline and, furthermore, how back-propagation can be done through the optimizer in order to train the pipeline end-to-end for additional accuracy. Our method achieves state-of-the-art accuracy on monocular 3D object detection, while running at 20 fps in a straightforward implementation. We argue that the SS3D architecture provides a solid framework upon which high-performing detection systems can be built, with autonomous driving as the main application in mind.
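
The abstract mentions modeling heteroscedastic uncertainty for the network outputs. As a rough illustration of that general idea, the sketch below shows one common formulation, a Gaussian negative log-likelihood with a predicted per-output log-variance. This is an assumption made for illustration only: the abstract does not state the loss used in SS3D, and the function name and parameters here are hypothetical.

```python
import math


def heteroscedastic_nll(targets, means, log_vars):
    """Mean Gaussian negative log-likelihood with a per-output variance.

    Assumption (not taken from the paper): output i comes with a predicted
    mean and a predicted log-variance s_i = log(sigma_i^2). Constant terms
    of the likelihood are dropped.
    """
    if not targets or not (len(targets) == len(means) == len(log_vars)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu, s in zip(targets, means, log_vars):
        # A large predicted variance down-weights the squared residual,
        # while the + s term penalizes claiming high uncertainty everywhere.
        total += 0.5 * math.exp(-s) * (y - mu) ** 2 + 0.5 * s
    return total / len(targets)


if __name__ == "__main__":
    # Same residual, two different uncertainty predictions.
    confident = heteroscedastic_nll([2.0], [0.0], [0.0])
    uncertain = heteroscedastic_nll([2.0], [0.0], [math.log(4.0)])
    print(confident, uncertain)
```

With a fixed residual, predicting a larger variance lowers the residual term but raises the log-variance term, so the loss rewards uncertainty estimates that match the actual error.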

Comments:	For the associated video file, see this http URL; without supplementary material
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:1906.08070 [cs.CV]
	(or arXiv:1906.08070v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1906.08070

Submission history

From: Eskil Jörgensen
[v1] Wed, 19 Jun 2019 12:39:16 UTC (6,568 KB)
[v2] Thu, 20 Jun 2019 09:26:37 UTC (6,568 KB)
