Lecture 4: Diffusion Models - Part I
Outline
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Motivation
High-level overview
• Diffusion models are probabilistic models used for image generation
• They involve reversing the process of gradually degrading the data
• Consist of two processes:
  - The forward process: data is progressively destroyed by adding noise across multiple time steps.
  - The reverse process: a neural network sequentially removes the noise to recover the original data.
[Figure: the forward process maps the data distribution to a standard Gaussian; the reverse process maps the standard Gaussian back to the data distribution.]
High-level overview
• Three categories: denoising diffusion probabilistic models (DDPMs), noise conditioned score networks (NCSNs), and formulations based on stochastic differential equations (SDEs).
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Research directions
Notations
• $p(x)$ - the data distribution
• $\mathcal{N}(x; \mu, \sigma^2 \cdot I)$ - Gaussian distribution over the random variable (image) $x$, with mean vector $\mu$ and covariance matrix $\sigma^2 \cdot I$, where $I$ is the identity matrix
• Reparameterization: $x = \mu + \sigma \cdot z$, with $z \sim \mathcal{N}(0, I)$
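A minimal sketch of the reparameterization above (the function name and the use of PyTorch are my own choices, not from the slides): a sample from $\mathcal{N}(\mu, \sigma^2 I)$ is obtained as $x = \mu + \sigma \cdot z$ with $z \sim \mathcal{N}(0, I)$.

```python
import torch

def sample_gaussian(mu: torch.Tensor, sigma: float) -> torch.Tensor:
    z = torch.randn_like(mu)   # z ~ N(0, I), same shape as mu
    return mu + sigma * z      # x ~ N(mu, sigma^2 * I)
```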
Forward process
$$x_0 \rightarrow x_1 \rightarrow \dots \rightarrow x_{T-1} \rightarrow x_T$$
$$x_0 \sim p(x_0) \text{ (data)}, \qquad x_T \sim \mathcal{N}(0, I) \text{ (pure noise)}$$
Denoising Diffusion Probabilistic Models (DDPMs)
$$x_0 \rightarrow x_1 \rightarrow \dots \rightarrow x_{T-1} \rightarrow x_T$$
In a DDPM, the noisy image at any step $t$ can be sampled directly from $x_0$:
$$x_t \sim p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\hat{\beta}_t} \cdot x_0,\ (1 - \hat{\beta}_t)\, I\right), \qquad \hat{\beta}_t = \prod_{i=1}^{t} \alpha_i, \quad \alpha_i = 1 - \beta_i$$
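A minimal sketch of this closed-form forward sampling (function and variable names are illustrative, and $t$ is 0-indexed here):

```python
import torch

# x_t = sqrt(beta_hat_t) * x_0 + sqrt(1 - beta_hat_t) * z,  z ~ N(0, I),
# where beta_hat_t is the cumulative product of alpha_i = 1 - beta_i.
def diffuse(x0: torch.Tensor, t: int, betas: torch.Tensor) -> torch.Tensor:
    alphas = 1.0 - betas                     # alpha_i = 1 - beta_i
    beta_hat = torch.cumprod(alphas, dim=0)  # beta_hat_t = prod_{i<=t} alpha_i
    z = torch.randn_like(x0)                 # z ~ N(0, I)
    return beta_hat[t].sqrt() * x0 + (1.0 - beta_hat[t]).sqrt() * z
```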
DDPMs. Properties of the forward process
1. $\beta_t \ll 1$, for $t = \overline{1, T}$
$$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$
Because each step changes the image only slightly, the reverse transition from $x_t$ to $x_{t-1}$ is also approximately Gaussian:
$$x_{t-1} \sim p(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu(x_t, t),\ \Sigma(x_t, t)\right)$$
DDPMs. Properties of the forward process
1. $\beta_t \ll 1$, for $t = \overline{1, T}$
$$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$
If $\beta_t$ were large, we would be less certain where $x_{t-1}$ was, because $x_t$ could have been reached from many more regions.
DDPMs. Properties of the forward process
1. $\beta_t \ll 1$, $t = \overline{1, T}$ $\Longrightarrow$ 2. $T$ is large
Since each step adds only a little noise, many steps are needed to destroy the signal: after $T$ iterations, $x_T$ is pure noise.
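A sketch of a commonly used noise schedule illustrating both properties (the concrete values follow Ho et al., 2020, and are an assumption, not taken from the slides):

```python
import torch

# T = 1000 steps with beta_t increasing linearly from 1e-4 to 0.02:
# every beta_t << 1, yet after T steps almost no signal is left.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)    # beta_1, ..., beta_T
alphas = 1.0 - betas
beta_hat = torch.cumprod(alphas, dim=0)  # beta_hat_t = prod_{i<=t} alpha_i

print(betas.max().item())    # 0.02   -> each individual step adds little noise
print(beta_hat[-1].item())   # ~4e-5  -> x_T is essentially pure noise
```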
DDPMs. Training Objective
Remember that the true reverse transition is approximated with a neural network:
$$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
In practice, the covariance is fixed, so only the mean is learned:
$$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$$
Reverse process
The mean $\mu_\theta(x_t, t)$ is predicted by a U-Net that receives the current noisy image $x_t$ and the time step $t$; the next, less noisy image is then sampled as
$$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$$
Slide from: "Denoising Diffusion-based Generative Modeling: Foundations and Applications", Karsten Kreis, Ruiqi Gao, Arash Vahdat.
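The slides use a U-Net as the denoising network. The toy module below is only a stand-in to show the interface (it is an assumption, not the actual architecture): it takes the noisy image $x_t$ and the step $t$ and returns a tensor of the same shape, interpreted as the predicted mean or, after the reparameterization on the following slides, as the predicted noise $z_\theta(x_t, t)$.

```python
import torch
import torch.nn as nn

# A toy stand-in for the denoising U-Net: interface only. A real DDPM uses a
# full U-Net with skip connections and attention.
class TinyDenoiser(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 64, T: int = 1000):
        super().__init__()
        self.time_emb = nn.Embedding(T, hidden)        # learned embedding of t
        self.conv_in = nn.Conv2d(channels, hidden, 3, padding=1)
        self.conv_out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv_in(x_t))
        h = h + self.time_emb(t)[:, :, None, None]     # inject the time step
        return self.conv_out(h)                        # same shape as x_t

# Usage: x_t of shape (batch, 3, H, W), t of shape (batch,) with integer steps.
```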
DDPMs. Training Objective
• The first term of the training objective, $-\log p_\theta(x_0 \mid x_1)$, measures the reconstruction error and can be addressed with an independent decoder.
Notations:
$$p(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \mu(x_t, x_0),\ \tilde{\beta}_t I\right)$$
$$\mu(x_t, x_0) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \hat{\beta}_t}}\, z_t\right), \qquad z_t \sim \mathcal{N}(0, I)$$
$$\hat{\beta}_t = \prod_{i=1}^{t} \alpha_i, \qquad \alpha_t = 1 - \beta_t, \qquad \tilde{\beta}_t = \frac{1 - \hat{\beta}_{t-1}}{1 - \hat{\beta}_t} \cdot \beta_t$$
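A minimal numeric sketch of these notations (the helper name is illustrative and $t$ is 0-indexed here):

```python
import torch

# Posterior parameters from the slide:
#   mu(x_t, x_0)  = (x_t - (1 - alpha_t) / sqrt(1 - beta_hat_t) * z_t) / sqrt(alpha_t)
#   beta_tilde_t  = (1 - beta_hat_{t-1}) / (1 - beta_hat_t) * beta_t
def posterior_params(x_t: torch.Tensor, z_t: torch.Tensor, t: int, betas: torch.Tensor):
    alphas = 1.0 - betas
    beta_hat = torch.cumprod(alphas, dim=0)
    mu = (x_t - (1.0 - alphas[t]) / (1.0 - beta_hat[t]).sqrt() * z_t) / alphas[t].sqrt()
    beta_hat_prev = beta_hat[t - 1] if t > 0 else torch.tensor(1.0)
    beta_tilde = (1.0 - beta_hat_prev) / (1.0 - beta_hat[t]) * betas[t]
    return mu, beta_tilde   # mean and variance of p(x_{t-1} | x_t, x_0)
```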
DDPMs. Training Objective. Simplifications
$$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t=2}^{T} KL\!\left(p(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t)\right)\right]$$
Using the notations above, the model mean is parameterized in the same form as $\mu(x_t, x_0)$, with the true noise replaced by the network prediction $z_\theta(x_t, t)$:
$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \hat{\beta}_t}}\, z_\theta(x_t, t)\right)$$
$$\Longrightarrow\ KL\!\left(p(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t)\right) = \mathbb{E}_{z_t \sim \mathcal{N}(0, I)}\left[\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1 - \hat{\beta}_t)}\left\|z_t - z_\theta(x_t, t)\right\|^2\right]$$
DDPMs. Training Objective. Simplifications
In practice, the weighting coefficient $\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1 - \hat{\beta}_t)}$ in front of the squared error is ignored, which yields the simplified objective used for training.
DDPMs. Training Algorithm
$$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0),\ z_t \sim \mathcal{N}(0, I)}\ \frac{1}{T} \sum_{t=1}^{T} \left\|z_t - z_\theta(x_t, t)\right\|^2$$
Training algorithm:
Repeat
  $x_0 \sim p(x_0)$
  $t \sim \mathcal{U}(\{1, \dots, T\})$
  $z_t \sim \mathcal{N}(0, I)$
  $x_t = \sqrt{\hat{\beta}_t} \cdot x_0 + \sqrt{1 - \hat{\beta}_t}\, z_t$, with $\hat{\beta}_t = \prod_{i=1}^{t} \alpha_i$
  $\theta = \theta - lr \cdot \nabla_\theta \mathcal{L}$
Until convergence
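A minimal sketch of one gradient step of this algorithm (model, optimizer, and tensor shapes are assumptions; `model` plays the role of $z_\theta(x_t, t)$):

```python
import torch

def train_step(model, optimizer, x0, betas):
    T = betas.shape[0]
    alphas = 1.0 - betas
    beta_hat = torch.cumprod(alphas, dim=0)

    t = torch.randint(0, T, (x0.shape[0],))       # t ~ U{1, ..., T} (0-indexed)
    z = torch.randn_like(x0)                      # z_t ~ N(0, I)
    bh = beta_hat[t].view(-1, 1, 1, 1)
    x_t = bh.sqrt() * x0 + (1.0 - bh).sqrt() * z  # forward process sample

    loss = ((z - model(x_t, t)) ** 2).mean()      # simplified objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                              # theta <- theta - lr * grad
    return loss.item()
```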
DDPMs. Sampling
• Pass the current noisy image $x_t$ along with $t$ to the neural network to obtain the predicted noise $z_\theta(x_t, t)$.
• Sample the next, less noisy image:
$$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \hat{\beta}_t}}\, z_\theta(x_t, t)\right),\ \sigma_t^2 I\right)$$
• Starting from pure noise $x_T \sim \mathcal{N}(0, I)$ and repeating this step for $t = T, \dots, 1$ yields a sample $x_0$.
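A minimal sketch of the full sampling loop (assumptions: $\sigma_t^2 = \beta_t$, which is one common choice the slides do not fix, and the same noise-predictor interface as in the training sketch):

```python
import torch

# Ancestral sampling: start from pure noise and apply the reverse step for
# t = T, ..., 1 (0-indexed below).
@torch.no_grad()
def sample(model, shape, betas):
    T = betas.shape[0]
    alphas = 1.0 - betas
    beta_hat = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                         # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        z_pred = model(x, t_batch)                 # z_theta(x_t, t)
        mean = (x - (1.0 - alphas[t]) / (1.0 - beta_hat[t]).sqrt() * z_pred) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise         # x_{t-1}, sigma_t^2 = beta_t (assumed)
    return x                                       # approximate sample x_0
```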
Thank You