The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

Watson, Jamie; Mac Aodha, Oisin; Prisacariu, Victor; Brostow, Gabriel; Firman, Michael

arXiv:2104.14540 (cs)

[Submitted on 29 Apr 2021 (v1), last revised 14 Jul 2021 (this version, v2)]

Abstract: Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to improve the predicted depth. Those that do either use computationally expensive test-time refinement techniques or rely on off-the-shelf recurrent networks, which make only indirect use of the geometric information that is inherently available.
We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available. Taking inspiration from multi-view stereo, we propose a deep end-to-end cost-volume-based approach that is trained using self-supervision only. We present a novel consistency loss that encourages the network to ignore the cost volume when it is deemed unreliable, e.g. in the case of moving objects, and an augmentation scheme to cope with static cameras. Our detailed experiments on both KITTI and Cityscapes show that we outperform all published self-supervised baselines, including those that use single or multiple frames at test time.
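
The idea of a consistency loss that steers the network away from an unreliable cost volume can be illustrated with a small sketch. The abstract does not say how unreliability is decided or what the loss looks like, so everything below is an assumption made for illustration: the function name consistency_loss, the use of a separate reference depth estimate (for example from a single-frame network), the ratio test, and the max_ratio threshold are all hypothetical and should not be read as the authors' implementation.

```python
def consistency_loss(pred_depth, cost_volume_depth, reference_depth, max_ratio=2.0):
    """Illustrative consistency penalty over flat lists of positive depths.

    Assumptions (not taken from the abstract):
      * cost_volume_depth is the depth that the cost volume alone favours,
      * reference_depth is an independent per-pixel depth estimate,
      * a pixel is "unreliable" when those two differ by more than a
        factor of max_ratio (hypothetical threshold).
    On unreliable pixels the prediction is pulled towards the reference
    depth; reliable pixels contribute nothing to this term.
    """
    if not (len(pred_depth) == len(cost_volume_depth) == len(reference_depth)):
        raise ValueError("inputs must have the same length")

    total = 0.0
    unreliable = []
    for pred, cv, ref in zip(pred_depth, cost_volume_depth, reference_depth):
        # Symmetric ratio: always >= 1, large when the two depths disagree.
        ratio = max(cv / ref, ref / cv)
        flagged = ratio > max_ratio
        unreliable.append(flagged)
        if flagged:
            total += abs(pred - ref)

    # Average over all pixels so the scale does not depend on image size.
    loss = total / len(pred_depth) if pred_depth else 0.0
    return loss, unreliable


if __name__ == "__main__":
    # Middle pixel mimics a moving object: the cost volume disagrees strongly.
    loss, mask = consistency_loss(
        pred_depth=[10.0, 5.0, 20.0],
        cost_volume_depth=[10.5, 1.0, 19.0],
        reference_depth=[10.0, 6.0, 20.0],
    )
    print(loss, mask)
```

In this toy input only the middle pixel is flagged, so only that pixel is penalised for deviating from the reference depth; elsewhere the prediction remains free to follow the cost volume.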

Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2104.14540 [cs.CV]
(or arXiv:2104.14540v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2104.14540

Submission history

From: Jamie Watson
[v1] Thu, 29 Apr 2021 17:53:42 UTC (5,804 KB)
[v2] Wed, 14 Jul 2021 10:08:51 UTC (33,448 KB)
