Lec24 Diffusion

The document discusses the fundamentals of diffusion models in deep learning, particularly their generative capabilities and how they relate to variational autoencoders (VAEs). It outlines the forward and reverse processes of denoising diffusion models, emphasizing their training methodologies and the use of stochastic differential equations. Additionally, it highlights the limitations of traditional VAEs and presents diffusion models as an advanced approach to generating high-quality data.

Deep Learning
Diffusion
Hao Chen
Fall 2024
Attendance: @

1
Generative vs. Discriminative
• Generative models learn the data distribution p(x); discriminative models learn the conditional p(y|x)

2
Generative Models
• Learning to generate data
https://cvpr2022-tutorial-diffusion-models.github.io/ 3
Generative Models

4
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Generative Models
Last Lecture

5
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Generative Models

This Lecture

6
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
A Fast Evolving Field

SORA 2024

7
Content
• Denoising Diffusion Model Basics
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models

8
Content
• Diffusion Model Basics
  – Diffusion Models as Stacking VAEs
  – Diffusion Models: Forward, Reverse, Training, Sampling
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models

9
Denoising Diffusion Models
• What we often see about diffusion models
(Figure: forward diffusion process and reverse denoising process)

10
Denoising Diffusion Models
• What we often see about diffusion models
(Figure: forward diffusion process and reverse denoising process)
• This lecture: denoising diffusion is a stack of VAEs

11
Recap: Variational Autoencoders
• VAEs: a likelihood-based generative model
• Encoder: an inference model q(z|x) that approximates the posterior
• Decoder: a generative model p(x|z) that transforms a Gaussian variable z to real data
• Training: maximize the ELBO

12
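The ELBO itself is not shown in the extracted slide; for reference, the standard VAE evidence lower bound being maximized is

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$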
Recap: Variational Autoencoders
• Decoder p(x|z): transforms a Gaussian variable to real data
• Encoder q(z|x): an inference model that approximates the posterior, i.e. a Gaussian
(Figure: VAE with encoder q(z|x), latent z, and decoder p(x|z))

13
VAEs are good, but…
• Blurry results

14
Kingma et al. Auto-Encoding Variational Bayes. 2013.
Limitations of VAEs
• Decoder must transform a standard Gaussian all the way to the target distribution in one step
  – Often too large a gap
  – Blurry results are generated
(Figure: z → D(·; φ) + e → x, with e ~ N(0, C))

15
Limitations of VAEs
• Decoder must transform a standard Gaussian all the way to the target distribution in one step
  – Often too large a gap
  – Blurry results are generated
(Figure: z → D(·; φ) + e → x, with e ~ N(0, C))
• Solution: introduce intermediate latent variables to reduce the gap at each step

16
Hierarchical VAEs
• Hierarchical VAEs: stacking VAEs on top of each other
  – Multiple (T) intermediate latents
  – Joint distribution
  – Posterior
• Better likelihood achieved!
(Figure: x ← p(x|z1) ← z1 ← p(z1|z2) ← z2 ← p(z2|z3) ← z3, with posteriors q(z1|x), q(z2|z1), q(z3|z2))

17
Sønderby et al. Ladder Variational Autoencoders. 2016.
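The joint and posterior factorizations referenced above did not survive extraction; under the usual Markovian assumption of a hierarchical VAE they take the form

$$p(x, z_{1:T}) = p(z_T)\, p(x \mid z_1) \prod_{t=1}^{T-1} p(z_t \mid z_{t+1}), \qquad q(z_{1:T} \mid x) = q(z_1 \mid x) \prod_{t=1}^{T-1} q(z_{t+1} \mid z_t)$$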
Stacking VAEs
• Each step, the decoder removes part of the noise
• Each step starts from a seed closer to the final distribution
(Figure: x3 → D(·; φ3) + e → x2 → D(·; φ2) + e → x1 → D(·; φ1) + e → x0, with e ~ N(0, C))

18
Stacking VAEs
• We can have many, many steps (T in total)…
• Each step incrementally recovers the final distribution
(Figure: x_T → Decoder T → x_{T-1} → Decoder T-1 → x_{T-2} → … → x_2 → Decoder 2 → x_1 → Decoder 1 → x_0)
• Looks familiar?

19
Diffusion Models are Stacking VAEs
• Diffusion models are special cases of stacking VAEs
(Figure: x_T → Decoder T → … → x_t → Decoder t → x_{t-1} → … → x_1 → Decoder 1 → x_0)
• The reverse denoising process is the stack of decoders
• What about encoders?

20
Diffusion Models are Stacking VAEs
• Diffusion models are special cases of stacking VAEs
(Figure: encoders x_0 → x_1 → … → x_T in the forward direction; decoders x_T → … → x_1 → x_0 in the reverse direction)
• In VAEs, encoders are learned with the KL-divergence between the posterior and the prior
  – Suffers from the 'posterior collapse' issue
• Diffusion models use fixed inference encoders

21
Chen et al. Variational Lossy Autoencoder. 2016.
Poll

22
Denoising Diffusion Models
• Diffusion models have two processes
• Forward diffusion process gradually adds noise to the input
• Reverse denoising process learns to generate data by denoising

23
Forward Diffusion Process
• Forward diffusion process is stacking fixed VAE encoders
  – Gradually adding Gaussian noise according to a schedule β_t

24
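The per-step transition equation is not in the extracted text; the standard DDPM forward kernel it refers to is

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t \mathbf{I}\big), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$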
Forward Diffusion Process
• The forward process allows sampling of x_t at an arbitrary timestep t in closed form
• The noise schedule (β_t values) is designed such that x_T approaches a standard Gaussian

25
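For reference, the closed-form marginal in standard DDPM notation, with α_t = 1 − β_t and ᾱ_t the running product of the α's:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\; \sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t)\mathbf{I}\big), \qquad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I})$$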
Reverse Denoising Process
• Generation process
  – Sample x_T from the prior
  – Iteratively sample x_{t-1} given x_t
• The true reverse transition is not directly tractable
• But it can be estimated with a Gaussian distribution at each step, if β_t is small
  – The purpose of our stack of VAE decoders!

26
Reverse Denoising Process
• Reverse diffusion process is stacking learnable VAE decoders
  – Predicting the mean and std of the added Gaussian noise

27
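The decoder's parameterization is not in the extracted text; the standard DDPM form of each learnable reverse step is

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\; \mu_\theta(x_t, t),\; \sigma_t^2 \mathbf{I}\big)$$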
Reverse Denoising Process
• Reverse diffusion process is stacking learnable VAE decoders
  – Predicting the mean and std of the added Gaussian noise

28
Reverse Denoising Process
• Reverse diffusion process is stacking learnable VAE decoders
  – Predicting the mean and std of the added Gaussian noise
  – Trainable network, shared across all timesteps

29
Learning the Denoising Model
• Denoising models are trained with the variational upper bound (negative ELBO), as VAEs are
• which decomposes into per-timestep terms
• with a tractable (closed-form) posterior distribution

30
Ho et al. Denoising Diffusion Probabilistic Models. 2020.
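The decomposition on the slide is the standard DDPM bound; written out, it is

$$L = \mathbb{E}_q\Big[ D_{\mathrm{KL}}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big) + \sum_{t>1} D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \Big]$$

where the forward posterior q(x_{t-1} | x_t, x_0) is Gaussian with closed-form mean and variance.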
Learning the Denoising Model
• Denoising models are trained with the variational upper bound (negative ELBO), as VAEs are
• which decomposes into per-timestep terms (up to a constant term and a scaling factor)
• with a tractable (closed-form) posterior distribution

31
Ho et al. Denoising Diffusion Probabilistic Models. 2020.
Learning the Denoising Model
• Denoising models are trained with the variational upper bound (negative ELBO), as VAEs are
• which decomposes into per-timestep terms
• with a tractable (closed-form) posterior distribution

32
Ho et al. Denoising Diffusion Probabilistic Models. 2020.
Parameterizing the Denoising Model
• The KL divergence between Gaussians has a simple closed form
• Recall the closed-form forward sample of x_t from x_0
• A trainable network predicts the noise, from which the mean is derived
• Final objective

33
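Concretely, in the standard DDPM noise parameterization (not in the extracted text), the network ε_θ predicts the added noise and the mean is recovered as

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\Big)$$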
Simplified Training Objective
• λ_t ensures the correct weighting for maximum likelihood estimation
• In DDPM, this is further simplified by setting the weight to 1

34
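The simplified objective from the DDPM paper, for reference:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\; t\big) \big\|^2\Big]$$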
Summary: Training and Sampling

35
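The slide's algorithm boxes (DDPM Algorithms 1 and 2) did not survive extraction; below is a minimal PyTorch sketch of both. The network `model(x, t)`, the data, and the linear β-schedule endpoints are assumptions, not taken from the slides.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

def training_loss(model, x0):
    """Algorithm 1: sample a timestep, add noise in closed form, predict it."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # closed-form forward sample
    return ((eps - model(x_t, t)) ** 2).mean()

@torch.no_grad()
def sample(model, shape):
    """Algorithm 2: ancestral sampling from pure noise, t = T ... 1."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        eps = model(x, torch.full((shape[0],), t))
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt() \
            + betas[t].sqrt() * z
    return x
```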
Summary: Noise Schedule

36
Strümke et al. Lecture Notes in Probabilistic Diffusion Models. 2020.


Connection with Hierarchical VAEs
• Diffusion models are a special case of hierarchical VAEs
  – Fixed inference models in the forward process
  – Latent variables have the same dimension as the data
  – The ELBO decomposes across timesteps: faster to train
  – The model is trained with some weighting of the ELBO
(Figure: x ↔ z1 ↔ z2 ↔ z3, with decoders p(x|z1), p(z1|z2), p(z2|z3) and encoders q(z1|x), q(z2|z1), q(z3|z2))

37
Ho et al. Denoising Diffusion Probabilistic Models. 2020.
Poll

38
Content
• Diffusion Model Basics
  – Diffusion Models as Stacking VAEs
  – Diffusion Models: Forward, Reverse, Training, Sampling
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Classifier-Free Guidance for Conditional Models
• Applications of Diffusion Models

39
Why SDEs?
• A unified framework for interpreting diffusion models and score-based generative models
  – Covers variants of diffusion-based and flow-based models

40
Stochastic Differential Equations

41
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Stochastic Differential Equations

42
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Score Matching
• General form of a probability density function: an unnormalized energy divided by a normalizing constant Z
• Maximizing the log-likelihood requires us to know Z
  – Often intractable
• Instead, we can model the score function, the gradient of the log-density

43
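In symbols (the standard energy-based formulation; the slide's exact notation is not in the extracted text):

$$p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}, \qquad s_\theta(x) := \nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)$$

The score is independent of Z_θ, which is why modeling it sidesteps the intractable normalizer.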
Forward Diffusion Process as SDEs
• Consider a forward process with many, many small steps (continuous time)
(Derivation on the slide: Taylor expansion of the discrete update)

44
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Forward Diffusion Process as SDEs
• Consider a forward process with many, many small steps
(Derivation on the slide: Taylor expansion; allows a different step size along t)

45
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Forward Diffusion Process as SDEs
• Consider a forward process with many, many small steps
(Derivation on the slide: Taylor expansion)

46
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Forward Diffusion Process as SDEs
• An iterative update that can be viewed as a Stochastic Differential Equation (SDE)

47
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Forward Diffusion Process as SDEs
(Equation annotations: the drift term pulls toward the mode; the diffusion term injects noise)

48
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
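The SDE itself is not in the extracted text; the general form, and the variance-preserving SDE corresponding to DDPM, are

$$dx = \underbrace{f(x, t)\, dt}_{\text{drift}} + \underbrace{g(t)\, dw}_{\text{diffusion}}, \qquad dx = -\tfrac{1}{2}\beta(t)\, x\, dt + \sqrt{\beta(t)}\, dw$$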
49
Figure credit to: https://yang-song.net/blog/2021/score/
Generative Reverse SDEs
• The forward SDE has a reverse form

50
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
51
Figure credit to: https://yang-song.net/blog/2021/score/
Generative Reverse SDEs
• The forward SDE has a reverse form
(Equation annotation: the reverse drift involves the score function; how do we estimate it?)

52
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
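For reference, the reverse-time SDE (Anderson, 1982; used by Song et al. for score-based generation) is

$$dx = \big[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\big]\, dt + g(t)\, d\bar{w}$$

where ∇_x log p_t(x) is the score function the annotation points at, and dw̄ runs backward in time.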
Denoising Score Matching

53
Figure credit to: https://yang-song.net/blog/2021/score/
Denoising Score Matching

54
Figure credit to: https://yang-song.net/blog/2021/score/
Denoising Score Matching
(Looks similar?)

55
Figure credit to: https://yang-song.net/blog/2021/score/
Denoising Score Matching
• Denoising score matching objective
• Re-parametrized sampling of x_t
• Score function of the forward Gaussian
• Denoising network
• Final objective

56
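The equations behind these bullets did not survive extraction; in standard DDPM notation they are

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \nabla_{x_t} \log q(x_t \mid x_0) = -\frac{\epsilon}{\sqrt{1-\bar{\alpha}_t}}, \qquad s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar{\alpha}_t}}$$

so matching the score is equivalent, up to scaling, to the noise-prediction objective $\|\epsilon - \epsilon_\theta(x_t, t)\|^2$.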
Weighted Diffusion Objective
• Denoising score matching objective with loss weighting λ_t
• Loss weights trade off between
  – good perceptual quality (e.g. the simplified uniform weighting)
  – maximum likelihood (the ELBO weighting)
• More complicated model parameterizations and loss weightings lead to different diffusion model variants in the literature!

57
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Poll

58
Content
• Diffusion Model Basics
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models

59
Many Steps in Diffusion
• Slow generation
• In training, we randomly sample one timestep
• But in inference, we must traverse from T down to 0
  – 1000 steps
  – extremely slow for raw images/signals

60
Can we do generation with fewer steps?

61
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
DDPM

62
DDPM
• The sampling update only depends on the previous step
• x_0 is only used during training

63
DDIM
• A Non-Markovian Forward Process

64
Song et al. Denoising Diffusion Implicit Models. 2021.


DDIM
• Backward process

65
Song et al. Denoising Diffusion Implicit Models. 2021.
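The DDIM backward update is not in the extracted text; the form from Song et al. (2021) is

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \underbrace{\frac{x_t - \sqrt{1-\bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}}_{\text{predicted } x_0} + \sqrt{1-\bar{\alpha}_{t-1} - \sigma_t^2}\; \epsilon_\theta(x_t, t) + \sigma_t z_t$$

with σ_t = 0 giving the fully deterministic DDIM sampler.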


DDPM vs DDIM

66
DDIM with Fewer Steps Sampling

67
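The slide's sampler is not in the extracted text; here is a minimal PyTorch sketch of DDIM sampling on a sub-sampled timestep sequence, reusing the `alpha_bars` and `model` assumed in the earlier training/sampling sketch. `n_steps` and `eta` are illustrative parameters.

```python
import torch

@torch.no_grad()
def ddim_sample(model, shape, alpha_bars, n_steps=50, eta=0.0):
    """DDIM sampling over a sub-sequence of timesteps.

    eta = 0 gives the deterministic DDIM sampler; eta = 1 recovers
    DDPM-like stochasticity. Far fewer than T steps are needed.
    """
    T = alpha_bars.shape[0]
    taus = torch.linspace(T - 1, 0, n_steps).long()  # sub-sampled timesteps
    x = torch.randn(shape)
    for i in range(n_steps - 1):
        t, t_prev = taus[i], taus[i + 1]
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
        eps = model(x, torch.full((shape[0],), t.item()))
        x0_pred = (x - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()  # predicted x_0
        sigma = eta * ((1 - ab_prev) / (1 - ab_t)).sqrt() * (1 - ab_t / ab_prev).sqrt()
        dir_xt = (1 - ab_prev - sigma**2).sqrt() * eps         # direction toward x_{t_prev}
        x = ab_prev.sqrt() * x0_pred + dir_xt + sigma * torch.randn_like(x)
    return x
```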
DDIM Results

68
Poll

69
Content
• Diffusion Model Basics
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models

70
Conditional Diffusion Models
• Unconditional vs. conditional generation
• More controllable!

71
Conditional Score Matching
• Score matching with conditional information

72
Classifier Guidance
• Use a discriminative classifier to supply the conditional signal
• γ controls the strength of the condition
• Limitations:
  – Needs a separate classifier
  – Conditioning quality depends on the performance of the classifier

73
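The guidance rule itself is not in the extracted text; the standard classifier-guidance form (Dhariwal & Nichol, 2021) combines the unconditional score with the classifier gradient:

$$\nabla_x \log p_\gamma(x \mid c) = \nabla_x \log p(x) + \gamma\, \nabla_x \log p(c \mid x)$$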
Classifier-Free Guidance
• Score matching with conditional information
• Classifier-free guidance: combine conditional and unconditional predictions from one model

74
Ho et al. Classifier-Free Diffusion Guidance. 2022.
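For reference, the classifier-free guidance rule in noise-prediction form, with guidance weight γ and null condition ∅:

$$\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + \gamma\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big)$$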
Training of Classifier-Free Guidance
• For conditional embeddings
  – Randomly drop the original conditional with probability p, replacing it with an additional unconditional (null) class

75
Ho et al. Classifier-Free Diffusion Guidance. 2022.
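A minimal sketch of both pieces, assuming a conditional noise predictor `model(x, t, c)` over integer class labels and an extra `null_class` index reserved for the unconditional case (both assumptions, not from the slides):

```python
import torch

def drop_condition(c, null_class, p_drop=0.1):
    """Training-time: replace the condition with the null class w.p. p_drop."""
    drop = torch.rand(c.shape[0]) < p_drop
    return torch.where(drop, torch.full_like(c, null_class), c)

def guided_eps(model, x, t, c, null_class, gamma=3.0):
    """Sampling-time: eps_uncond + gamma * (eps_cond - eps_uncond)."""
    eps_cond = model(x, t, c)
    eps_uncond = model(x, t, torch.full_like(c, null_class))
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```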
Content
• Diffusion Model Basics
• Diffusion Models from Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models

76
DDPM
• Training diffusion models on raw images with a U-Net model

77
Ho et al. Denoising Diffusion Probabilistic Models. 2020.


Diffusion Models Beat GANs
• Larger denoising model with a sophisticated design
  – Adaptive group normalization
  – Attention layers in the U-Net

78
Dhariwal et al. Diffusion Models Beat GANs on Image Synthesis. 2021.
Latent Diffusion Models (LDMs)
• Learn diffusion in a VAE's latent space
  – Yet another VAE! Except pre-trained.

79
Rombach et al. High-Resolution Image Synthesis with Latent Diffusion Models. 2022.
Stable Diffusion
• Large-scale text-conditional LDMs
  – With VAEs also trained on larger datasets

80
Stability AI. https://github.com/Stability-AI/stablediffusion


DALL-E

81
Ramesh et al. Hierarchical Text-Conditional Image Generation with CLIP Latents. 2022.
DiT
• A transformer architecture for diffusion models

82
Peebles et al. Scalable Diffusion Models with Transformers. 2022.
MAR
• An autoregressive model with a diffusion loss

83
Li et al. Autoregressive Image Generation without Vector Quantization. 2024.
