Cascaded Diffusion Models for High Fidelity Image Generation

Ho, Jonathan; Saharia, Chitwan; Chan, William; Fleet, David J.; Norouzi, Mohammad; Salimans, Tim

arXiv:2106.15282 (cs)

[Submitted on 30 May 2021 (v1), last revised 17 Dec 2021 (this version, v3)]

Abstract: We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2.
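The pipeline structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stage functions (`toy_base`, `make_toy_sr`) are hypothetical placeholders standing in for trained diffusion models, and the augmentation is assumed here to be additive Gaussian noise with a hypothetical `noise_std` parameter, since the abstract does not state the exact form. Applying the augmentation at sampling time, rather than only during training, is likewise an assumption of this sketch.

```python
import random

def augment_conditioning(low_res, noise_std, rng):
    """Perturb a low-resolution conditioning image (list of rows of floats).

    Assumption: additive Gaussian noise is one illustrative form of
    augmentation; the abstract does not specify the form used.
    """
    return [[px + noise_std * rng.gauss(0.0, 1.0) for px in row] for row in low_res]

def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def cascade_sample(base_sampler, sr_samplers, class_label, noise_std, rng):
    """Run the base stage, then each super-resolution stage in order.

    base_sampler(class_label, rng) returns the lowest-resolution image.
    Each sr_sampler(conditioning, class_label, rng) returns a larger image.
    Both are hypothetical callables, not a real library API.
    """
    image = base_sampler(class_label, rng)
    for sr_sampler in sr_samplers:
        conditioning = augment_conditioning(image, noise_std, rng)
        image = sr_sampler(conditioning, class_label, rng)
    return image

# Placeholder stages so the sketch runs end to end (no diffusion happens here).
def toy_base(class_label, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]

def make_toy_sr(factor):
    def toy_sr(conditioning, class_label, rng):
        return upsample_nearest(conditioning, factor)
    return toy_sr

rng = random.Random(0)
sample = cascade_sample(toy_base, [make_toy_sr(2), make_toy_sr(2)],
                        class_label=0, noise_std=0.1, rng=rng)
print((len(sample), len(sample[0])))  # (256, 256)
```

The two placeholder super-resolution stages mirror the 64x64 to 128x128 to 256x256 progression reported in the abstract. In a real system each stage would be a class-conditional diffusion sampler that also receives the augmented lower-resolution image as conditioning.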

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2106.15282 [cs.CV]
	(or arXiv:2106.15282v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2106.15282

Submission history

From: Jonathan Ho
[v1] Sun, 30 May 2021 17:14:52 UTC (6,300 KB)
[v2] Wed, 7 Jul 2021 19:43:38 UTC (27,024 KB)
[v3] Fri, 17 Dec 2021 17:21:04 UTC (28,173 KB)