Lec24 Diffusion
Learning Diffusion
Hao Chen
Fall 2024
Generative vs. Discriminative
• Generative models learn the data distribution p(x), while discriminative models learn the conditional p(y|x)
Generative Models
• Learning to generate data
https://cvpr2022-tutorial-diffusion-models.github.io/
Generative Models
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Generative Models
Last Lecture
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Generative Models
This Lecture
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
A Fast Evolving Field
SORA 2024
Content
• Denoising Diffusion Model Basics
• Diffusion Models from the Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models
Content
• Diffusion Model Basics
  – Diffusion Models as Stacking VAEs
  – Diffusion Models: Forward, Reverse, Training, Sampling
• Diffusion Models from the Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models
Denoising Diffusion Models
• What we often see about diffusion models
VAE
[Figure: VAE with encoder q(z|x) mapping data x to latent z, and decoder p(x|z) mapping z back to x]
VAEs are good, but…
• Blurry results
Kingma et al. Auto-Encoding Variational Bayes. 2013.
Limitations of VAEs
• The decoder must transform a standard Gaussian all the way to the target distribution in one step
  – Often too large a gap
  – Blurry results are generated
[Figure: one-step decoding, x = D(z; φ) + e, with e ~ N(0, C)]
Stacking VAEs
– Joint distribution: p(x, z1, z2, z3) = p(z3) p(z2|z3) p(z1|z2) p(x|z1)
– Posterior: q(z1|x) q(z2|z1) q(z3|z2)
• Better likelihood achieved!
Sønderby et al. Ladder Variational Autoencoders. 2016.
Stacking VAEs
• At each step, the decoder removes part of the noise
• Each step provides a seed closer to the final distribution
[Figure: x3 → x2 → x1 → x0, each step computing x_{t-1} = D(x_t; φ_t) + e, e ~ N(0, C)]
Stacking VAEs
• We can have many, many steps (T in total)…
• Each step incrementally recovers the final distribution
[Figure: x_T → x_{T-1} → x_{T-2} → … → x_2 → x_1 → x_0 via Decoder T, Decoder T-1, …, Decoder 2, Decoder 1]
• Looks familiar?
Diffusion Models are Stacking VAEs
• Diffusion models are special cases of stacking VAEs
[Figure: x_T → x_{T-1} → … → x_t → x_{t-1} → … → x_1 → x_0 via Decoder T, …, Decoder t, …, Decoder 2, Decoder 1]
• The reverse denoising process is the stack of decoders
• What about encoders?
Diffusion Models are Stacking VAEs
• Diffusion models are special cases of stacking VAEs
[Figure: forward chain x_0 → x_1 → … → x_t → … → x_T via Encoder 1, Encoder 2, …, Encoder t, …, Encoder T; reverse chain back through Decoder T, …, Decoder 1]
Denoising Diffusion Models
• Diffusion models have two processes: a fixed forward (noising) process and a learned reverse (denoising) process
Forward Diffusion Process
• The forward diffusion process is a stack of fixed VAE encoders
  – Gradually adding Gaussian noise according to a schedule β_t
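The forward-process equations on this slide were lost in extraction; in standard DDPM notation they are

\[
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right),
\]
with \(\alpha_t = 1-\beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s\), so x_t can be sampled at any step in closed form.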
Forward Diffusion Process
• Noise is added at each step; removing it is the purpose of our stack of VAE decoders!
Reverse Denoising Process
• The reverse diffusion process is a stack of learnable VAE decoders
  – Predicting the mean and std of the added Gaussian noise
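The parameterization shown on this slide did not survive extraction; the usual form is

\[
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),
\]
where the mean \(\mu_\theta\) (and optionally the covariance \(\Sigma_\theta\)) is predicted by the denoising network.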
Learning the Denoising Model
• Denoising models are trained with the variational upper bound (negative ELBO), as in VAEs
• The bound decomposes into per-step KL terms between a tractable posterior distribution (closed-form) and the learned decoder
• Up to constant terms and per-step scaling, each KL term reduces to a noise-prediction loss (reconstructed below)
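Reconstructing the lost derivation in standard DDPM notation (Ho et al. 2020): the tractable posterior is

\[
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right),
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t,
\]
and minimizing the per-step KL terms against \(p_\theta(x_{t-1} \mid x_t)\) yields the weighted noise-prediction objective

\[
L_t = \mathbb{E}_{x_0,\,\epsilon}\!\left[\lambda_t \left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\rVert^2\right].
\]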
Simplified Training Objective
• Dropping the weighting (setting λ_t = 1 for all t) gives the simplified objective L_simple = E‖ε − ε_θ(x_t, t)‖², which works well in practice
Summary: Training and Sampling
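The training and sampling algorithm boxes on this slide were lost in extraction. Below is a minimal PyTorch-style sketch of DDPM training and ancestral sampling under the notation above; the names `model`, `betas`, `alphas_bar`, and the `model(x, t)` signature are assumptions, not from the slides.

```python
import torch

def train_step(model, x0, alphas_bar, optimizer):
    """One DDPM training step: predict the noise added at a random timestep t."""
    b = x0.shape[0]
    T = alphas_bar.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)        # one random timestep per sample
    eps = torch.randn_like(x0)                             # the noise we will try to predict
    a_bar = alphas_bar[t].view(b, 1, 1, 1)                 # broadcast over image dimensions
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps     # closed-form forward sample q(x_t | x_0)
    loss = ((eps - model(x_t, t)) ** 2).mean()             # simplified objective L_simple
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def ddpm_sample(model, shape, betas, alphas_bar, device="cpu"):
    """Ancestral sampling: start from pure noise and denoise step by step."""
    T = betas.shape[0]
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.full((shape[0],), t, device=device))
        alpha_t, a_bar = 1.0 - betas[t], alphas_bar[t]
        mean = (x - betas[t] / (1 - a_bar).sqrt() * eps_hat) / alpha_t.sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                 # sigma_t^2 = beta_t variance choice
    return x
```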
Summary: Noise Schedule
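The schedule comparison on this slide was lost; for reference, two common choices are the linear schedule of Ho et al. 2020 (β_t increasing linearly from 1e-4 to 0.02 over T = 1000 steps) and the cosine schedule of Nichol & Dhariwal 2021:

\[
\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right).
\]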
Content
• Diffusion Model Basics
  – Diffusion Models as Stacking VAEs
  – Diffusion Models: Forward, Reverse, Training, Sampling
• Diffusion Models from the Stochastic Differential Equations and Score Matching Perspective
• Classifier-Free Guidance for Conditional Models
• Applications of Diffusion Models
Why SDEs?
• A unified framework for interpreting diffusion models and score-based generative models
  – Covers variants of diffusion-based and flow-based models
Stochastic Differential Equations
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
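The SDE itself was lost from this slide; the general Itô form used in this framework (Song et al. 2021) is

\[
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w,
\]
with drift coefficient \(f\), diffusion coefficient \(g\), and standard Wiener process \(w\).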
Stochastic Differential Equations
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Score Matching
• General form of a probability density function
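The formulas this slide presumably showed: an energy-based density and its score,

\[
p_\theta(x) = \frac{e^{-f_\theta(x)}}{Z_\theta},
\qquad
s_\theta(x) = \nabla_x \log p_\theta(x) = -\nabla_x f_\theta(x),
\]
so modeling the score sidesteps the intractable normalizing constant \(Z_\theta\).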
Forward Diffusion Process as SDEs
• Taylor expansion of the discrete forward step
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
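The derivation was lost in extraction; its well-known endpoint is that the discrete DDPM step, with β_t = β(t)Δt and a first-order Taylor expansion of √(1−β_t), converges to the variance-preserving SDE

\[
\mathrm{d}x = -\tfrac{1}{2}\,\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w.
\]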
Forward Diffusion Process as SDEs
• Taylor expansion
• The continuous-time view allows a different step size along t
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
[Figure: perturbing data to noise with an SDE and generating by reversing it; credit: https://yang-song.net/blog/2021/score/]
Generative Reverse SDEs
• The reverse-time SDE depends on the score function ∇_x log p_t(x). How do we estimate it?
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
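The reverse-time SDE (Anderson 1982; Song et al. 2021), whose equation was lost from the slide:

\[
\mathrm{d}x = \left[f(x, t) - g(t)^2\,\nabla_x \log p_t(x)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w},
\]
where \(\bar{w}\) is a reverse-time Wiener process and \(\nabla_x \log p_t(x)\) is the score function to be estimated.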
Denoising Score Matching
Figure credit to: https://yang-song.net/blog/2021/score/
Denoising Score Matching
• Looks similar?
Figure credit to: https://yang-song.net/blog/2021/score/
Denoising Score Matching
• Denoising score matching objective
• Re-parametrized sampling
• Score function
• Denoising network
• Final objective (equations reconstructed below)
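The slide's equations were lost; here is the standard chain they follow, in DDPM notation:

\[
\begin{aligned}
&\text{re-parametrized sampling:} && x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}),\\
&\text{score of the perturbation kernel:} && \nabla_{x_t} \log q(x_t \mid x_0) = -\frac{\epsilon}{\sqrt{1-\bar{\alpha}_t}},\\
&\text{denoising network:} && s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar{\alpha}_t}},\\
&\text{final objective:} && \mathbb{E}_{t,\,x_0,\,\epsilon}\!\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t)\right\rVert^2\right].
\end{aligned}
\]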
Weighted Diffusion Objective
• Denoising score matching objective with loss weighting λ_t
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
Content
• Diffusion Model Basics
• Diffusion Models from the Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models
Many Steps in Diffusion
• Generation is slow: sampling requires T sequential network evaluations (T ≈ 1000 in DDPM)
Can we do generation with fewer steps?
Slide credit to: https://cvpr2022-tutorial-diffusion-models.github.io/
DDPM
DDIM
• A non-Markovian forward process
• Backward process (update equations reconstructed below)
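The DDIM update equations did not survive extraction; the standard form (Song et al. 2020) is

\[
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) + \sigma_t z,
\]
with \(z \sim \mathcal{N}(0,\mathbf{I})\); setting \(\sigma_t = 0\) gives the deterministic DDIM sampler.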
DDIM Sampling with Fewer Steps
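A minimal sketch of the deterministic DDIM sampler over a subsequence of timesteps, matching the notation of the training sketch above (`model`, `alphas_bar`, and the evenly spaced step selection are assumptions):

```python
import torch

@torch.no_grad()
def ddim_sample(model, shape, alphas_bar, num_steps=50, device="cpu"):
    """Deterministic DDIM sampling (sigma_t = 0) over an evenly spaced timestep subsequence."""
    T = alphas_bar.shape[0]
    steps = torch.linspace(T - 1, 0, num_steps).round().long()  # e.g. 50 of 1000 steps
    x = torch.randn(shape, device=device)
    for i, t in enumerate(steps.tolist()):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[steps[i + 1]] if i + 1 < len(steps) else torch.tensor(1.0)
        eps = model(x, torch.full((shape[0],), t, device=device))
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean sample
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # deterministic DDIM update
    return x
```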
DDIM Results
Content
• Diffusion Model Basics
• Diffusion Models from the Stochastic Differential Equations and Score Matching Perspective
• Denoising Diffusion Implicit Model (DDIM)
• Conditional Diffusion Models
• Applications of Diffusion Models
Conditional Diffusion Models
• Unconditional: model p(x)
• Conditional: model p(x | c), much more controllable!
Conditional Score Matching
• Score matching with conditional information: estimate ∇_x log p_t(x | c)
Classifier Guidance
• Use a discriminative classifier p(c | x_t) to supply the conditional term: by Bayes' rule, ∇_x log p(x | c) = ∇_x log p(x) + ∇_x log p(c | x) (guided score below)
• Limitations:
  – Needs a separate classifier, trained on noisy inputs x_t
  – Conditioning quality depends on the performance of that classifier
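The guided score, as in Dhariwal & Nichol (2021), with guidance scale γ (the equation was lost from the slide):

\[
\tilde{s}_\theta(x_t, t, c) = \nabla_{x_t} \log p_t(x_t) + \gamma\,\nabla_{x_t} \log p_\phi(c \mid x_t),
\]
where \(p_\phi\) is the separately trained classifier.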
Classifier-Free Guidance
• Score matching with conditional information, using one network for both the conditional and unconditional score (the condition is randomly dropped during training)
• Classifier-free guidance combines the two scores at sampling time (formula below)
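The standard classifier-free guidance combination (Ho & Salimans 2022), with guidance weight w:

\[
\tilde{\epsilon}_\theta(x_t, t, c) = (1 + w)\,\epsilon_\theta(x_t, t, c) - w\,\epsilon_\theta(x_t, t, \varnothing),
\]
where \(\varnothing\) denotes the dropped (null) condition.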
DDPM
• Training diffusion models on raw images with a U-Net model
Rombach et al. High-Resolution Image Synthesis with Latent Diffusion Models. 2022.
Stable Diffusion
• Large-scale text-conditional latent diffusion models (LDMs)
  – With VAEs also trained on larger datasets