Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Yu, Cuican; Lu, Guansong; Zeng, Yihan; Sun, Jian; Liang, Xiaodan; Li, Huibin; Xu, Zongben; Xu, Songcen; Zhang, Wei; Xu, Hang

arXiv:2308.16758 (cs)

[Submitted on 31 Aug 2023]

Abstract: Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, because paired text-3D face data are limited, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from text-2D face data alone. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to promote high semantic consistency between generated 3D faces and input texts. In addition, we introduce directional classifier guidance at inference time, which encourages creativity in out-of-domain generation. Compared with existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating the method's advantage in generating realistic and semantically consistent textures.
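The abstract names global contrastive learning as one of the two text-to-face alignment techniques but does not give the form of the loss. As a rough illustration only, the sketch below shows a symmetric InfoNCE loss of the kind commonly used for text-image alignment (as in CLIP). The function names, the temperature value, and the toy embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def global_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (text, image) embeddings.

    Hypothetical helper: pair k of text_emb is assumed to describe pair k of
    image_emb. The temperature of 0.07 is an assumed default, not a value
    from the paper.
    """
    t = [l2_normalize(v) for v in text_emb]
    im = [l2_normalize(v) for v in image_emb]
    n = len(t)
    # logits[a][b]: cosine similarity of text a and image b, scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(t[a], im[b])) / temperature for b in range(n)]
        for a in range(n)
    ]
    # Text-to-image direction: the matching image sits on the diagonal.
    t2i = -sum(log_softmax(logits[a])[a] for a in range(n)) / n
    # Image-to-text direction: same computation over the columns.
    cols = [[logits[a][b] for a in range(n)] for b in range(n)]
    i2t = -sum(log_softmax(cols[b])[b] for b in range(n)) / n
    return 0.5 * (t2i + i2t)


# Toy embeddings: two captions and two rendered-face features (made up).
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
shuffled_images = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = global_contrastive_loss(texts, aligned_images)
shuffled_loss = global_contrastive_loss(texts, shuffled_images)
print(aligned_loss < shuffled_loss)
```

Minimizing a loss of this shape pulls each text embedding toward its paired face embedding and pushes it away from the other faces in the batch, which is the general idea behind global cross-modal alignment.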

Comments:	accepted by ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.16758 [cs.CV]
	(or arXiv:2308.16758v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.16758

Submission history

From: Cuican Yu
[v1] Thu, 31 Aug 2023 14:26:33 UTC (2,804 KB)
