Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

Zeng, Weili; Yan, Yichao; Zhu, Qi; Chen, Zhuo; Chu, Pengzhi; Zhao, Weiming; Yang, Xiaokang

arXiv:2404.14007 (cs)

[Submitted on 22 Apr 2024]

Abstract: Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing methods still face a major challenge: concept overfitting. To tackle this challenge, we first analyze overfitting and categorize it into concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, in which customization is confined to limited modalities, i.e., backgrounds, layouts, and styles. To evaluate the degree of overfitting, we further introduce two metrics, the Latent Fisher divergence and the Wasserstein metric, which measure the distribution changes of non-customized and customized concepts, respectively. Drawing on this analysis, we propose Infusion, a T2I customization method that lets the learning of target concepts avoid being constrained by limited training modalities while preserving non-customized knowledge. Infusion is also highly efficient, requiring only 11 KB of trained parameters. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in both single- and multi-concept customized generation.
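
The abstract does not say how the Wasserstein metric is computed in the paper. As a generic illustration only, and not the authors' procedure, the Wasserstein-1 distance between two equal-size one-dimensional samples can be sketched as below. The function name `wasserstein_1d` and the sample values are hypothetical, chosen purely for the example.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # With equal sample counts, the optimal transport plan pairs sorted values,
    # so the distance is the mean absolute difference between order statistics.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Made-up scalar features "before" and "after" some change to a model
# (illustrative numbers, not data from the paper).
before = [0.0, 1.0, 2.0, 3.0]
after = [0.5, 1.5, 2.5, 3.5]
shift = wasserstein_1d(before, after)
```

A larger value indicates that the second sample has moved further from the first; identical samples, in any order, give zero.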

Comments:	10 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.14007 [cs.CV]
	(or arXiv:2404.14007v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.14007

Submission history

From: Weili Zeng
[v1] Mon, 22 Apr 2024 09:16:25 UTC (35,098 KB)
