Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion

Guizilini, Vitor; Irshad, Muhammad Zubair; Chen, Dian; Shakhnarovich, Greg; Ambrus, Rares

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.18804 (cs)

[Submitted on 30 Jan 2025]

Abstract: Current methods for 3D scene reconstruction from sparse posed images employ intermediate 3D representations, such as neural fields, voxel grids, or 3D Gaussians, to achieve multi-view consistent scene appearance and geometry. In this paper, we introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints, given an arbitrary number of input views. Our method uses raymap conditioning both to augment visual features with spatial information from different viewpoints and to guide the generation of images and depth maps from novel views. A key aspect of our approach is the multi-task generation of images and depth maps, using learnable task embeddings to guide the diffusion process towards specific modalities. We train this model on a collection of more than 60 million multi-view samples from publicly available datasets, and propose techniques to enable efficient and consistent learning in such diverse conditions. We also propose a novel strategy that enables the efficient training of larger models by incrementally fine-tuning smaller ones, with promising scaling behavior. Through extensive experiments, we report state-of-the-art results on multiple novel view synthesis benchmarks, as well as on multi-view stereo and video depth estimation.
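
The abstract does not spell out how a raymap is built, so the following is only a minimal sketch of one common construction, assuming a pinhole camera: each pixel is assigned the origin and unit direction of its viewing ray in world coordinates. The function name `make_raymap`, the half-pixel offset, and the camera-to-world pose convention are illustrative assumptions, not details taken from MVGD.

```python
import numpy as np

def make_raymap(K, cam_to_world, height, width):
    """Return per-pixel ray origins and unit directions, each of shape (H, W, 3).

    K is a 3x3 pinhole intrinsics matrix and cam_to_world a 4x4 pose matrix.
    Both conventions are assumptions made for this sketch.
    """
    # Pixel centers: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3), homogeneous

    # Back-project through the inverse intrinsics to camera-frame directions.
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate into the world frame and normalize to unit length.
    rotation, translation = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Every ray of a pinhole camera starts at the camera center.
    origins = np.broadcast_to(translation, dirs_world.shape).copy()
    return origins, dirs_world
```

Concatenating `origins` and `dirs_world` along the last axis gives a 6-channel map aligned with the image grid, which is one plausible form for the spatial conditioning signal described above; the parameterization actually used by MVGD may differ.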

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2501.18804 [cs.CV]
	(or arXiv:2501.18804v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.18804

Submission history

From: Vitor Guizilini
[v1] Thu, 30 Jan 2025 23:43:06 UTC (21,933 KB)
