PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

Kareem, Amrin; Lahoud, Jean; Cholakkal, Hisham

arXiv:2404.03836 (cs)

[Submitted on 4 Apr 2024]

Abstract: Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still rely heavily on explicit human instruction to identify target objects or categories, and they lack the capability to actively reason about and comprehend implicit user intentions. We introduce a novel segmentation task, reasoning part segmentation for 3D objects, which aims to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations, specifically curated for reasoning-based 3D part segmentation. We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and of generating natural language explanations corresponding to 3D object segmentation requests. Experiments show that our method achieves performance competitive with models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge. Our source code, dataset, and trained models are available at this https URL.

Comments:	14 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.03836 [cs.CV]
	(or arXiv:2404.03836v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.03836

Submission history

From: Amrin Kareem
[v1] Thu, 4 Apr 2024 23:38:45 UTC (24,177 KB)
