Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Guo, Jun; Ma, Xiaojian; Fan, Yue; Liu, Huaping; Li, Qing

arXiv:2403.15624 (cs)

[Submitted on 22 Mar 2024 (v1), last revised 23 Aug 2024 (this version, v2)]

Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Existing methods adopt neural rendering methods as 3D representations and jointly optimize color and semantic features to achieve rendering and scene understanding simultaneously. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is to distill knowledge from 2D pre-trained models to 3D Gaussians. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians; this mapping is based on spatial relationships and needs no additional training. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. The quantitative results on ScanNet segmentation and LERF object localization demonstrate the superior performance of our method. Additionally, we explore several applications of Semantic Gaussians, including object part segmentation, instance segmentation, scene editing, and spatiotemporal segmentation, with better qualitative results than 2D and 3D baselines, highlighting its versatility and effectiveness in supporting diverse downstream tasks.
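
As a rough illustration of the training-free projection idea described in the abstract, the sketch below projects 3D Gaussian centers into posed 2D feature maps with a pinhole camera model, averages the sampled features per Gaussian, and then labels each Gaussian by cosine similarity to text embeddings. This is not the authors' implementation: the function names, the nearest-pixel sampling, the simple averaging across views, and the lack of occlusion handling are all assumptions made for brevity, and the feature maps and text embeddings are taken as given inputs rather than produced by any particular encoder.

```python
import numpy as np

def project_features_to_gaussians(centers, feature_maps, intrinsics, world_to_cams):
    """Average per-pixel 2D features onto 3D Gaussian centers (no training).

    Hypothetical helper, not the paper's API.
    centers:       (N, 3) Gaussian means in world coordinates
    feature_maps:  list of (H, W, C) arrays from some 2D encoder
    intrinsics:    list of (3, 3) pinhole matrices K
    world_to_cams: list of (4, 4) world-to-camera extrinsics
    Returns (N, C) averaged features and (N,) per-Gaussian view counts.
    """
    n = centers.shape[0]
    c = feature_maps[0].shape[2]
    feat_sum = np.zeros((n, c))
    hits = np.zeros(n)
    homog = np.concatenate([centers, np.ones((n, 1))], axis=1)  # (N, 4)
    for fmap, K, T in zip(feature_maps, intrinsics, world_to_cams):
        h, w, _ = fmap.shape
        cam = (T @ homog.T).T[:, :3]           # points in the camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                    # discard points behind the camera
        safe_z = np.where(in_front, z, 1.0)    # avoid division by zero
        uv = (K @ cam.T).T
        u = np.round(uv[:, 0] / safe_z).astype(int)
        v = np.round(uv[:, 1] / safe_z).astype(int)
        # Simplification: only an image-bounds check, no occlusion test.
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        feat_sum[visible] += fmap[v[visible], u[visible]]
        hits[visible] += 1
    feats = feat_sum / np.maximum(hits, 1)[:, None]
    return feats, hits

def classify_gaussians(gaussian_feats, text_embeds):
    """Assign each Gaussian the index of the most similar text embedding."""
    g = gaussian_feats / (np.linalg.norm(gaussian_feats, axis=1, keepdims=True) + 1e-8)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=1, keepdims=True) + 1e-8)
    return np.argmax(g @ t.T, axis=1)

# Toy scene: one 3x3 feature map, identity pose, focal length 1, principal point (1, 1).
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
fmap = np.zeros((3, 3, 2))
fmap[1, 1] = [1.0, 0.0]   # feature at pixel (u=1, v=1)
fmap[1, 2] = [0.0, 1.0]   # feature at pixel (u=2, v=1)
centers = np.array([[0.0, 0.0, 2.0], [2.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
feats, hits = project_features_to_gaussians(centers, [fmap], [K], [np.eye(4)])
labels = classify_gaussians(feats[:2], np.array([[1.0, 0.0], [0.0, 1.0]]))
```

In the toy scene, the third center lies behind the camera and so receives no feature. A practical version would also need a depth or visibility test so that occluded Gaussians do not inherit features from surfaces in front of them.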

Comments:	Project page: see this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.15624 [cs.CV]
	(or arXiv:2403.15624v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.15624

Submission history

From: Jun Guo
[v1] Fri, 22 Mar 2024 21:28:19 UTC (33,606 KB)
[v2] Fri, 23 Aug 2024 06:44:36 UTC (7,259 KB)
