Depth Anything V2

Yang, Lihe; Kang, Bingyi; Huang, Zilong; Zhao, Zhen; Xu, Xiaogang; Feng, Jiashi; Zhao, Hengshuang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.09414 (cs)

[Submitted on 13 Jun 2024 (v1), last revised 20 Oct 2024 (this version, v2)]

Title:Depth Anything V2

Authors:Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

View PDF HTML (experimental)

Abstract:This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient (more than 10x faster) and more accurate. We offer models of different scales (ranging from 25M to 1.3B params) to support extensive scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.

Comments:	Accepted by NeurIPS 2024. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.09414 [cs.CV]
	(or arXiv:2406.09414v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.09414

Submission history

From: Lihe Yang [view email]
[v1] Thu, 13 Jun 2024 17:59:56 UTC (46,058 KB)
[v2] Sun, 20 Oct 2024 11:24:09 UTC (46,064 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Depth Anything V2

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Depth Anything V2

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators