Controllable Human-Object Interaction Synthesis

Li, Jiaman; Clegg, Alexander; Mottaghi, Roozbeh; Wu, Jiajun; Puig, Xavier; Liu, C. Karen

arXiv:2312.03913 (cs)

[Submitted on 6 Dec 2023 (v1), last revised 14 Jul 2024 (this version, v2)]

Abstract: Synthesizing semantic-aware, long-horizon, human-object interaction is critical for simulating realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language descriptions in 3D scenes. We propose Controllable Human-Object Interaction Synthesis (CHOIS), an approach that generates object motion and human motion simultaneously using a conditional diffusion model given a language description, initial object and human states, and sparse object waypoints. Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene. Naively applying a diffusion model fails to predict object motion aligned with the input waypoints; it also cannot ensure the realism of interactions that require precise hand-object and human-floor contact. To overcome these problems, we introduce an object geometry loss as additional supervision to improve the matching between generated object motion and input object waypoints; we also design guidance terms to enforce contact constraints during the sampling process of the trained diffusion model. We demonstrate that our learned interaction module can synthesize realistic human-object interactions, adhering to provided textual descriptions and sparse waypoint conditions. Additionally, our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
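The abstract mentions guidance terms that enforce contact constraints during sampling but does not give their form. As a rough illustration of the general idea only, the sketch below nudges an intermediate estimate of per-frame foot heights along the negative gradient of a floor-penetration cost. The cost function, the step size, and all function names are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: a gradient-style guidance step for a
# human-floor contact constraint. The quadratic penetration cost and
# every name here are hypothetical, not the CHOIS implementation.

def floor_penetration_cost(foot_heights, floor_height=0.0):
    """Sum of squared depths by which the feet sink below the floor."""
    return sum(max(0.0, floor_height - h) ** 2 for h in foot_heights)


def floor_penetration_grad(foot_heights, floor_height=0.0):
    """Analytic gradient of the cost with respect to each foot height."""
    # d/dh (floor - h)^2 = -2 * (floor - h) when h < floor, else 0.
    return [-2.0 * max(0.0, floor_height - h) for h in foot_heights]


def apply_guidance(foot_heights, step_size=0.25, floor_height=0.0):
    """Take one gradient-descent step on the cost (assumed guidance update)."""
    grad = floor_penetration_grad(foot_heights, floor_height)
    return [h - step_size * g for h, g in zip(foot_heights, grad)]


# Hypothetical per-frame foot heights from one denoising step (metres).
estimate = [-0.2, 0.1, 0.0]
guided = apply_guidance(estimate)
```

In a diffusion sampler, an update of this kind would typically be applied to the model's prediction at each denoising step, so that violations shrink gradually over the course of sampling rather than being corrected in a single post-processing pass. Frames that already satisfy the constraint have zero gradient and are left unchanged.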

Comments:	ECCV 2024, project webpage: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.03913 [cs.CV]
	(or arXiv:2312.03913v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.03913

Submission history

From: Jiaman Li
[v1] Wed, 6 Dec 2023 21:14:20 UTC (6,666 KB)
[v2] Sun, 14 Jul 2024 23:00:54 UTC (13,522 KB)
