Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Guo, Jun; Ma, Xiaojian; Fan, Yue; Liu, Huaping; Li, Qing

arXiv:2403.15624v1 (cs)

[Submitted on 22 Mar 2024 (this version), latest version 23 Aug 2024 (v2)]

Authors: Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li

Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; and object part segmentation, scene editing, and spatial-temporal segmentation, with better qualitative results than 2D and 3D baselines. These results highlight the versatility of the approach and its effectiveness in supporting diverse downstream tasks.
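To make the idea of distilling 2D semantics into 3D Gaussians more concrete, here is a minimal Python/NumPy sketch. It is not the authors' implementation. It assumes a simplified correspondence in which each Gaussian's center is projected through a pinhole camera and picks up the 2D feature of the pixel it lands in, with features averaged over every view in which the center falls inside the image. It ignores occlusion and each Gaussian's spatial extent, and it omits the 3D semantic network entirely. The function names `project_points`, `distill_features`, and `open_vocab_labels`, and the input layout (an intrinsics matrix `K`, a 4×4 `world_to_cam` matrix, and an H×W×D feature map per view) are hypothetical. The last function shows how an open-vocabulary query could then label Gaussians by cosine similarity against text embeddings, which are assumed to come from a CLIP-style text encoder sharing the feature space of the 2D image features.

```python
import numpy as np

# Hypothetical sketch, NOT the Semantic Gaussians implementation.
# Assumes a pixel lookup at each Gaussian center and no occlusion test.


def project_points(centers, K, world_to_cam):
    """Project (N, 3) world-space points with a pinhole camera.

    Returns (N, 2) pixel coordinates and (N,) camera-space depths.
    """
    ones = np.ones((centers.shape[0], 1))
    homo = np.concatenate([centers, ones], axis=1)   # (N, 4) homogeneous
    cam = (world_to_cam @ homo.T).T[:, :3]           # (N, 3) camera coords
    depth = cam[:, 2]
    # Avoid dividing by ~0 for points on or behind the image plane;
    # those points are rejected later by the depth test anyway.
    safe_z = np.where(depth > 1e-6, depth, 1.0)
    uv = (K @ cam.T).T[:, :2] / safe_z[:, None]
    return uv, depth


def distill_features(centers, views, feat_dim):
    """Average 2D features over every view in which a Gaussian center is visible.

    `views` is a list of (K, world_to_cam, feat_map), feat_map shaped (H, W, D).
    Gaussians never seen in any view keep an all-zero feature.
    """
    n = centers.shape[0]
    acc = np.zeros((n, feat_dim))
    count = np.zeros(n)
    for K, world_to_cam, feat_map in views:
        h, w, _ = feat_map.shape
        uv, depth = project_points(centers, K, world_to_cam)
        u = np.floor(uv[:, 0]).astype(int)  # column of the pixel containing the point
        v = np.floor(uv[:, 1]).astype(int)  # row of the pixel containing the point
        visible = (depth > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        acc[visible] += feat_map[v[visible], u[visible]]
        count[visible] += 1
    feats = np.zeros_like(acc)
    seen = count > 0
    feats[seen] = acc[seen] / count[seen, None]
    return feats


def open_vocab_labels(gauss_feats, text_feats):
    """Label each Gaussian with the index of its most cosine-similar text embedding."""
    def normalize(x):
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-8)
    sims = normalize(gauss_feats) @ normalize(text_feats).T  # (N, C)
    return np.argmax(sims, axis=1)


if __name__ == "__main__":
    # Toy scene: one identity camera and a 2x2 feature map with 2-D features.
    K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
    feat_map = np.zeros((2, 2, 2))
    feat_map[0, 0] = [1.0, 0.0]
    feat_map[1, 1] = [0.0, 1.0]
    centers = np.array([[0.0, 0.0, 1.0], [-0.5, -0.5, 1.0]])
    feats = distill_features(centers, [(K, np.eye(4), feat_map)], feat_dim=2)
    text_feats = np.eye(2)  # stand-ins for two prompt embeddings
    print(open_vocab_labels(feats, text_feats))  # prints [1 0]
```

In this sketch the per-Gaussian features live in the same embedding space as the text prompts, so the list of class prompts can be changed at query time without recomputing the features; only the final similarity and argmax step depends on the vocabulary.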

Comments:	Project page: see this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.15624 [cs.CV]
	(or arXiv:2403.15624v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.15624

Submission history

From: Jun Guo
[v1] Fri, 22 Mar 2024 21:28:19 UTC (33,606 KB)
[v2] Fri, 23 Aug 2024 06:44:36 UTC (7,259 KB)