Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

Jin, Peng; Li, Hao; Cheng, Zesen; Li, Kehan; Yu, Runyi; Liu, Chang; Ji, Xiangyang; Yuan, Li; Chen, Jie

arXiv:2407.10528 (cs)

[Submitted on 15 Jul 2024]

Abstract: Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals. Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis. During the diffusion process for synthesizing global motion, we calculate the local-action gradient to provide conditional guidance. This local-to-global paradigm reduces the complexity associated with direct global motion generation and promotes motion diversity via sampling diverse actions as conditions. Extensive experiments on two human motion datasets, i.e., HumanML3D and KIT, demonstrate the effectiveness of our method. Furthermore, our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment, accommodating diverse user preferences, which may hold potential significance for the community. The project page is available at this https URL.
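The abstract describes the guidance step only at a high level, so the following is a toy sketch of the general idea of gradient-based conditional guidance, not the paper's implementation. Everything here is an assumption made for illustration: the motion is reduced to one scalar per frame, each local action is a hypothetical `(start_frame, reference_values)` pair, the guiding weights come from a plain softmax over made-up scores (standing in for the graph attention network), and the energy is a weighted squared distance between the motion and each reference action.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (stand-in for learned guiding weights)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_action_gradient(motion, actions, weights):
    """Gradient of E(x) = sum_k w_k * ||x[segment_k] - a_k||^2 with respect to x.

    motion  : list of floats, one toy pose value per frame (assumption)
    actions : list of (start_frame, reference_values) pairs (hypothetical format)
    weights : one guiding weight per action
    """
    grad = [0.0] * len(motion)
    for (start, reference), weight in zip(actions, weights):
        for offset, ref_value in enumerate(reference):
            frame = start + offset
            grad[frame] += 2.0 * weight * (motion[frame] - ref_value)
    return grad


def guided_mean(mean, variance, actions, weights, scale=1.0):
    """Shift a denoising-step mean down the energy gradient, scaled by the step variance."""
    grad = local_action_gradient(mean, actions, weights)
    return [m - scale * variance * g for m, g in zip(mean, grad)]


# Two hypothetical local actions placed at different frames of a 6-frame motion.
actions = [(0, [1.0, 1.0]), (4, [-1.0, -1.0])]
weights = softmax([0.0, 0.0])  # equal scores give equal guiding weights
shifted = guided_mean([0.0] * 6, 0.1, actions, weights, scale=1.0)
```

In this sketch, frames covered by a reference action are pulled toward it in proportion to its weight, while uncovered frames are left unchanged; raising `scale` or a single weight strengthens that pull, which is one simple way to picture the continuous guiding weight adjustment mentioned in the abstract.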

Comments:	Accepted by ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.10528 [cs.CV]
	(or arXiv:2407.10528v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.10528

Submission history

From: Peng Jin
[v1] Mon, 15 Jul 2024 08:35:00 UTC (2,693 KB)
