M-LRM: Multi-view Large Reconstruction Model

Li, Mengfei; Long, Xiaoxiao; Liang, Yixun; Li, Weiyu; Liu, Yuan; Li, Peng; Chi, Xiaowei; Qi, Xingqun; Xue, Wei; Luo, Wenhan; Liu, Qifeng; Guo, Yike

arXiv:2406.07648 (cs)

[Submitted on 11 Jun 2024]

Abstract: Although the Large Reconstruction Model (LRM) has recently demonstrated impressive results, extending its input from a single image to multiple images exposes inefficiencies, subpar geometric and texture quality, and slower convergence than expected. We attribute this to the fact that LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to efficiently reconstruct high-quality 3D shapes from multiple views in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the tri-plane tokens. Compared to LRM, the proposed M-LRM can produce a tri-plane NeRF with $128 \times 128$ resolution and generate 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence than LRM. Project page: this https URL
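For readers unfamiliar with the tri-plane representation mentioned in the abstract, the following is a minimal sketch of how a 3D point can be looked up in three axis-aligned feature planes at $128 \times 128$ resolution. It is not the authors' implementation: the names `make_plane`, `bilinear`, and `query_triplane` are hypothetical, the planes are filled with placeholder values rather than learned features, and summing the three sampled features is an assumed aggregation rule (concatenation is another common choice).

```python
from typing import Callable, Dict, List, Sequence

# A plane is a res x res grid of feature vectors: plane[row][col][channel].
Plane = List[List[List[float]]]

RES = 128  # tri-plane resolution quoted in the abstract


def make_plane(res: int, channels: int,
               value_fn: Callable[[int, int], float]) -> Plane:
    """Build a plane whose every channel at (row, col) is value_fn(row, col).

    Placeholder data only; a real model would predict these features.
    """
    return [[[value_fn(r, c)] * channels for c in range(res)]
            for r in range(res)]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a plane at (u, v), both in [-1, 1].

    u indexes columns and v indexes rows; coordinates are clamped.
    """
    res = len(plane)
    x = min(max((u + 1.0) * 0.5 * (res - 1), 0.0), float(res - 1))
    y = min(max((v + 1.0) * 0.5 * (res - 1), 0.0), float(res - 1))
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    out = []
    for ch in range(len(plane[0][0])):
        top = plane[y0][x0][ch] * (1 - tx) + plane[y0][x1][ch] * tx
        bot = plane[y1][x0][ch] * (1 - tx) + plane[y1][x1][ch] * tx
        out.append(top * (1 - ty) + bot * ty)
    return out


def query_triplane(planes: Dict[str, Plane],
                   point: Sequence[float]) -> List[float]:
    """Project a point in [-1, 1]^3 onto the XY, XZ and YZ planes,
    sample each one, and sum the features (assumed aggregation)."""
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


# Constant placeholder planes with 2 feature channels each.
planes = {
    "xy": make_plane(RES, 2, lambda r, c: 1.0),
    "xz": make_plane(RES, 2, lambda r, c: 2.0),
    "yz": make_plane(RES, 2, lambda r, c: 3.0),
}
feature = query_triplane(planes, (0.1, -0.3, 0.7))
```

In a tri-plane NeRF, a feature obtained this way would typically be passed to a small MLP that predicts density and color for volume rendering; that step is omitted here.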

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.07648 [cs.CV]
	(or arXiv:2406.07648v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.07648

Submission history

From: Mengfei Li
[v1] Tue, 11 Jun 2024 18:29:13 UTC (3,630 KB)
