SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Nguyen, Viet; Nguyen, Anh; Dao, Trung; Nguyen, Khoi; Pham, Cuong; Tran, Toan; Tran, Anh

arXiv:2412.02687 (cs)

[Submitted on 3 Dec 2024 (v1), last revised 4 Dec 2024 (this version, v2)]

Abstract: Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals that it is unstable across different diffusion model backbones because it uses a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of existing one-step diffusion models is the lack of support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that the proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state of the art for one-step diffusion models.
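The two guidance ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sampling range for the guidance scale, the weight `alpha`, and the specific rule of subtracting negative-prompt attention features from positive-prompt ones are all assumptions made for illustration. Only the general ideas (a randomly drawn classifier-free guidance scale, and negative prompts applied through cross-attention) come from the abstract.

```python
import math
import random


def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sample_guidance_scale(rng, low=2.0, high=8.0):
    """Draw a guidance scale at random per training step.

    The uniform distribution and the [low, high] range are hypothetical.
    """
    return rng.uniform(low, high)


def cross_attention(query, keys, values):
    """Single-query scaled dot-product attention over text tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def negative_away_features(query, pos_kv, neg_kv, alpha=0.5):
    """Assumed steering rule: positive features minus alpha * negative features.

    pos_kv and neg_kv are (keys, values) pairs derived from the positive and
    negative prompt embeddings; alpha is a hypothetical steering weight.
    """
    z_pos = cross_attention(query, *pos_kv)
    z_neg = cross_attention(query, *neg_kv)
    return [p - alpha * n for p, n in zip(z_pos, z_neg)]


if __name__ == "__main__":
    rng = random.Random(0)
    eps_uncond, eps_cond = [0.0, 1.0, -1.0], [1.0, 1.0, 0.0]
    for step in range(3):
        scale = sample_guidance_scale(rng)  # a fresh scale at every step
        print(step, round(scale, 3), cfg_combine(eps_uncond, eps_cond, scale))
```

In this sketch, a fixed-scale baseline would correspond to calling `cfg_combine` with the same `scale` at every step, while the random-scale variant redraws it each time. The steering function needs no training because it only recombines attention outputs at inference.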

Comments:	18 pages, 9 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.02687 [cs.CV]
	(or arXiv:2412.02687v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.02687

Submission history

From: Viet Nguyen
[v1] Tue, 3 Dec 2024 18:56:32 UTC (9,729 KB)
[v2] Wed, 4 Dec 2024 08:01:47 UTC (9,725 KB)