Lecture 4: Diffusion Models, Part I


CAP6412

Advanced Computer Vision


Mubarak Shah
HEC-245
Lecture-4: Diffusion Models



Diffusion models in vision: A survey
https://arxiv.org/pdf/2209.04747.pdf

Alin Croitoru (University of Bucharest, Romania), Vlad Hondru (University of Bucharest, Romania), Radu Tudor Ionescu (University of Bucharest, Romania), Mubarak Shah (University of Central Florida, US)
Agenda

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Applications
8. Research directions
Motivation

[Figures: example text-to-image generations for the prompts below.]

• "A hedgehog using a calculator."
• "A corgi wearing a red bowtie and a purple party hat."
• "A transparent sculpture of a duck made out of glass."
• "A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat."
• "Pomeranian king with tiger soldiers."
• "Zebras roaming in the field."
Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Research directions
High-level overview
• Diffusion models are probabilistic models used for image generation.
• They generate images by reversing a process that gradually degrades the data.
• They consist of two processes:
  • The forward process: data is progressively destroyed by adding noise over multiple time steps.
  • The reverse process: a neural network sequentially removes the noise to recover the original data.

[Figure: the forward process maps the data distribution to a standard Gaussian; the reverse process maps it back to the data distribution.]
High-level overview

• Three categories:
  • Denoising Diffusion Probabilistic Models (DDPM)
  • Noise Conditioned Score Networks (NCSN)
  • Stochastic Differential Equations (SDE)


Outline

1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Conditional Generation
6. Stochastic Differential Equations
7. Research directions
Notations
$p(x)$ – the data distribution.

$\mathcal{N}(x;\ \mu,\ \sigma^2 I)$ – a Gaussian distribution over the random variable $x$ (an image), with mean vector $\mu$ and covariance matrix $\sigma^2 I$, where $I$ is the identity matrix.

A sample from this distribution can be drawn as

$x = \mu + \sigma \cdot z, \quad z \sim \mathcal{N}(0, I)$
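As a concrete illustration of the notation above, here is a minimal sketch of drawing a sample via $x = \mu + \sigma \cdot z$, assuming PyTorch; the function name `sample_gaussian` and the tensor shapes are purely illustrative:

```python
# Minimal sketch (assumes PyTorch): draw a sample from N(mu, sigma^2 * I)
# via the reparameterization x = mu + sigma * z, with z ~ N(0, I).
import torch

def sample_gaussian(mu: torch.Tensor, sigma: float) -> torch.Tensor:
    """Return one sample from N(mu, sigma^2 * I), same shape as mu."""
    z = torch.randn_like(mu)      # z ~ N(0, I)
    return mu + sigma * z         # x = mu + sigma * z

# Example: a "mean image" of shape (3, 64, 64) with isotropic noise.
mu = torch.zeros(3, 64, 64)
x = sample_gaussian(mu, sigma=0.5)
```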


Denoising Diffusion Probabilistic Models (DDPMs)

Forward process:

$x_0 \sim p(x_0) \ \rightarrow\ x_1 \ \rightarrow\ \cdots \ \rightarrow\ x_{T-1} \ \rightarrow\ x_T \sim \mathcal{N}(0, I)$
Denoising Diffusion Probabilistic Models (DDPMs)

Reverse process:

$x_T \sim \mathcal{N}(0, I) \ \rightarrow\ x_{T-1} \ \rightarrow\ \cdots \ \rightarrow\ x_1 \ \rightarrow\ x_0 \sim p(x_0)$


Denoising Diffusion Probabilistic Models (DDPMs)

Forward process (iterative). Step by step, the image is gradually replaced with noise:

$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right), \quad \beta_t \ll 1,\ t \in \{1,\dots,T\}$

$x_0 \ \rightarrow\ x_1 \ \rightarrow\ \cdots \ \rightarrow\ x_{T-1} \ \rightarrow\ x_T$
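A minimal sketch of one iterative noising step, assuming PyTorch; the constant $\beta$ used here (instead of a schedule $\beta_t$) and the image shape are illustrative:

```python
# Minimal sketch (assumes PyTorch): one forward step
# x_t ~ N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
import torch

def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """Sample x_t given x_{t-1} for a single diffusion step."""
    z = torch.randn_like(x_prev)                               # z ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * z

# Example: run the chain x_0 -> x_T with a small, constant beta.
x = torch.rand(3, 32, 32)        # stand-in for an image x_0 ~ p(x_0)
T, beta = 1000, 0.01
for t in range(1, T + 1):
    x = forward_step(x, beta)    # after many steps, x is close to N(0, I)
```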
Denoising Diffusion Probabilistic Models (DDPMs)

Forward process, ancestral sampling (one shot). $x_t$ can be sampled directly from $x_0$:

$x_t \sim p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\hat\beta_t}\, x_0,\ (1-\hat\beta_t)\, I\right)$

Notations: $\hat\beta_t = \prod_{i=1}^{t} \alpha_i$, $\quad \alpha_t = 1 - \beta_t$

$x_0 \ \rightarrow\ x_1 \ \rightarrow\ \cdots \ \rightarrow\ x_{T-1} \ \rightarrow\ x_T$
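A minimal sketch of the one-shot formula, assuming PyTorch and a linear $\beta$ schedule (the schedule values are an assumption for illustration, not something specified on the slide):

```python
# Minimal sketch (assumes PyTorch): sample x_t directly from x_0 via
# x_t = sqrt(beta_hat_t) * x_0 + sqrt(1 - beta_hat_t) * z,  z ~ N(0, I),
# where beta_hat_t = prod_{i<=t} alpha_i and alpha_i = 1 - beta_i.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear beta schedule
alphas = 1.0 - betas
beta_hat = torch.cumprod(alphas, dim=0)    # beta_hat[t-1] = prod_{i=1..t} alpha_i

def sample_xt(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ N(sqrt(beta_hat_t) * x_0, (1 - beta_hat_t) * I) in one shot."""
    z = torch.randn_like(x0)
    bh = beta_hat[t - 1]
    return bh.sqrt() * x0 + (1.0 - bh).sqrt() * z

# Example: jump straight to t = 500 without iterating.
x0 = torch.rand(3, 32, 32)
x500 = sample_xt(x0, t=500)
```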
DDPMs. Properties of the forward process

1. $\beta_t \ll 1$, $t \in \{1,\dots,T\}$

$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$

$x_t$ is created from $x_{t-1}$ by a small step, whose size is controlled by $\beta_t$. Because $x_t$ comes from a region close to $x_{t-1}$, the reverse transition can also be modeled with a Gaussian:

$x_{t-1} \sim p(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu(x_t, t),\ \Sigma(x_t, t)\right)$
DDPMs. Properties of the forward process

1. $\beta_t \ll 1$, $t \in \{1,\dots,T\}$

$x_t \sim p(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$

Going backwards, however, we are less certain where $x_{t-1}$ was, because $x_t$ could have been reached from many more regions.
DDPMs. Properties of the forward process

1. $\beta_t \ll 1$, $t \in \{1,\dots,T\}$ $\ \Rightarrow\ $ 2. $T$ must be large

After $T$ iterations of small noising steps, $x_T$ is pure noise: $x_0 \rightarrow \cdots \rightarrow x_T$.
DDPMs. Training objective
Remember that:

Reverse process: $x_0 \ \leftarrow\ x_1 \ \leftarrow\ \cdots \ \leftarrow\ x_{T-1} \ \leftarrow\ x_T$

$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$

The true reverse transition is approximated by a neural network with weights $\theta$.
DDPMs. Training objective
Simplification:

Reverse process: $x_0 \ \leftarrow\ x_1 \ \leftarrow\ \cdots \ \leftarrow\ x_{T-1} \ \leftarrow\ x_T$

$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$

Instead of learning the variance, we fix it to $\sigma_t^2 I$ and let the neural network (with weights $\theta$) predict/learn only the mean.
DDPMs. Training objective
UNet-like neural network

A U-Net takes the noisy image $x_t$ and the time step $t$ as input and predicts $\mu_\theta(x_t, t)$; the next (less noisy) image is then sampled as

$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$
Slide from: Denoising Diffusion-based Generative Modeling: Foundations and Applications (Karsten Kreis, Ruiqi Gao, Arash Vahdat)
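The slide above uses a full U-Net. As a much smaller stand-in, here is a minimal sketch of a time-conditioned convolutional denoiser, assuming PyTorch; the class name `TinyDenoiser`, the layer sizes, and the learned time embedding are illustrative choices, not the architecture used in the referenced work:

```python
# Minimal sketch (assumes PyTorch): a small time-conditioned convolutional
# denoiser standing in for the U-Net on the slide. It maps a noisy image x_t
# and a time step t to an output with the same shape as x_t (the predicted
# noise z_theta(x_t, t), or equivalently a mean).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 64, T: int = 1000):
        super().__init__()
        self.time_embed = nn.Embedding(T + 1, hidden)    # learned embedding of the time step
        self.conv_in = nn.Conv2d(channels, hidden, 3, padding=1)
        self.conv_mid = nn.Conv2d(hidden, hidden, 3, padding=1)
        self.conv_out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv_in(x_t))
        h = h + self.time_embed(t)[:, :, None, None]      # broadcast over spatial dims
        h = torch.relu(self.conv_mid(h))
        return self.conv_out(h)

# Example forward pass.
net = TinyDenoiser()
x_t = torch.randn(8, 3, 32, 32)
t = torch.randint(1, 1001, (8,))
out = net(x_t, t)          # same shape as x_t: (8, 3, 32, 32)
```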
DDPMs. Training Objective

Cross Entropy and KL (Kullback–Leibler) divergence

• Entropy: $E(P) = -\sum_i P(i) \log P(i)$
• Cross Entropy: $C(P, Q) = -\sum_i P(i) \log Q(i)$
• KL divergence: $D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)} = \sum_i P(i)\left[\log P(i) - \log Q(i)\right]$

Slides from Ming Li, University of Waterloo, CS 886 Deep Learning and NLP
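To make the definitions concrete, here is a minimal sketch, assuming PyTorch, that evaluates the three quantities for two small discrete distributions (the distributions themselves are arbitrary examples):

```python
# Minimal sketch (assumes PyTorch): entropy, cross entropy, and KL divergence
# for two discrete distributions P and Q, matching the formulas above.
import torch

P = torch.tensor([0.1, 0.4, 0.5])
Q = torch.tensor([0.2, 0.3, 0.5])

entropy = -(P * P.log()).sum()           # E(P)       = -sum_i P(i) log P(i)
cross_entropy = -(P * Q.log()).sum()     # C(P, Q)    = -sum_i P(i) log Q(i)
kl = (P * (P.log() - Q.log())).sum()     # D_KL(P||Q) = sum_i P(i)[log P(i) - log Q(i)]

# D_KL(P || Q) = C(P, Q) - E(P)
assert torch.allclose(kl, cross_entropy - entropy)
```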
DDPMs. Training Objective

$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + KL\!\left(p(x_T \mid x_0) \,\|\, p(x_T)\right) + \sum_{t>1} KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)\right]$

The middle term can be ignored because $p(x_T)$ is $\mathcal{N}(0, I)$ and does not depend on $\theta$. The last term requires that, at each time step $t$, $p_\theta(x_{t-1} \mid x_t)$ be as close as possible to the true posterior of the forward process conditioned on the original image.
DDPMs. Training Objective. Simplifications
$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t>1} KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)\right]$

• For two Gaussians with the same fixed variance, the KL divergence reduces to a (scaled) L2 distance between their means (see the check below).
• The first term measures the reconstruction error and can be addressed with an independent decoder.
• The DDPM paper introduced two simplifications that lead to a much simpler objective, based on the noise in the image.
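A minimal sketch, assuming PyTorch, checking the first bullet for two Gaussians with the same fixed variance: the KL divergence equals $\|\mu_1 - \mu_2\|^2 / (2\sigma^2)$, i.e. a scaled L2 distance between the means (the dimensions and values are arbitrary):

```python
# Minimal sketch (assumes PyTorch): for p = N(mu1, sigma^2 I) and q = N(mu2, sigma^2 I),
# KL(p || q) = ||mu1 - mu2||^2 / (2 * sigma^2).
import torch

mu1, mu2, sigma = torch.randn(5), torch.randn(5), 0.7

p = torch.distributions.Normal(mu1, sigma)
q = torch.distributions.Normal(mu2, sigma)
kl = torch.distributions.kl_divergence(p, q).sum()       # sum over independent dims

closed_form = ((mu1 - mu2) ** 2).sum() / (2 * sigma ** 2)
assert torch.allclose(kl, closed_form)
```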
DDPMs. Training Objective. Simplifications
$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t>1} KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)\right]$

Tractable posterior:

$p(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde\mu(x_t, x_0),\ \tilde\beta_t I\right)$

$\tilde\mu(x_t, x_0) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat\beta_t}}\, z_t\right), \quad z_t \sim \mathcal{N}(0, I)$

Notations: $\hat\beta_t = \prod_{i=1}^{t}\alpha_i$, $\quad \alpha_t = 1 - \beta_t$, $\quad \tilde\beta_t = \frac{1-\hat\beta_{t-1}}{1-\hat\beta_t}\,\beta_t$
DDPMs. Training Objective. Simplifications
$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t>1} KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)\right]$

The mean of $p_\theta(x_{t-1} \mid x_t)$ is parameterized in the same form as the tractable posterior mean above, with a network $z_\theta$ that predicts the noise:

$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat\beta_t}}\, z_\theta(x_t, t)\right)$

$\Rightarrow\ KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right) = \mathbb{E}_{z \sim \mathcal{N}(0, I)}\left[\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1-\hat\beta_t)}\, \big\|z - z_\theta(x_t, t)\big\|^2\right]$
DDPMs. Training Objective. Simplifications
$\min_\theta\ \mathbb{E}_{x_0 \sim p(x_0)}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t>1} KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)\right]$

$KL\!\left(p(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right) = \mathbb{E}_{z \sim \mathcal{N}(0, I)}\left[\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1-\hat\beta_t)}\, \big\|z - z_\theta(x_t, t)\big\|^2\right]$

The weighting factor $\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t\, (1-\hat\beta_t)}$ is ignored, so each KL term reduces to a plain noise-matching loss $\big\|z - z_\theta(x_t, t)\big\|^2$.
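A minimal sketch, assuming PyTorch, of how the quantities above can be computed for a concrete schedule; the linear $\beta$ schedule and the choice $\sigma_t^2 = \beta_t$ are common conventions assumed here for illustration, not taken from the slide:

```python
# Minimal sketch (assumes PyTorch): compute alpha_t, beta_hat_t, the posterior
# variance beta_tilde_t, and the KL weight beta_t^2 / (2 sigma_t^2 alpha_t (1 - beta_hat_t)).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                      # assumed linear schedule: beta_1..beta_T
alphas = 1.0 - betas                                       # alpha_t = 1 - beta_t
beta_hat = torch.cumprod(alphas, dim=0)                    # beta_hat_t = prod_{i<=t} alpha_i
beta_hat_prev = torch.cat([torch.ones(1), beta_hat[:-1]])  # beta_hat_{t-1}, with beta_hat_0 = 1

beta_tilde = (1.0 - beta_hat_prev) / (1.0 - beta_hat) * betas   # posterior variance
sigma2 = betas                                             # common choice: sigma_t^2 = beta_t
kl_weight = betas ** 2 / (2.0 * sigma2 * alphas * (1.0 - beta_hat))

# The simplified DDPM objective drops kl_weight and trains on ||z - z_theta(x_t, t)||^2 alone.
```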
DDPMs. Training Algorithm

$\min_\theta\ \frac{1}{T}\sum_{t=1}^{T}\ \mathbb{E}_{x_0 \sim p(x_0),\ z \sim \mathcal{N}(0, I)}\,\big\|z - z_\theta(x_t, t)\big\|^2$

Training algorithm:

Repeat
  $x_0 \sim p(x_0)$
  $t \sim \mathcal{U}\{1, \dots, T\}$
  $z \sim \mathcal{N}(0, I)$
  $x_t = \sqrt{\hat\beta_t}\, x_0 + \sqrt{1-\hat\beta_t}\, z$  (with $\hat\beta_t = \prod_{i=1}^{t}\alpha_i$)
  $\theta = \theta - lr \cdot \nabla_\theta\, \big\|z - z_\theta(x_t, t)\big\|^2$
Until convergence
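A minimal sketch of the training loop above, assuming PyTorch; `model` is any network predicting the noise $z_\theta(x_t, t)$ (e.g. the tiny denoiser sketched earlier), and the data loader, schedule, and hyperparameters are illustrative assumptions:

```python
# Minimal sketch (assumes PyTorch) of the DDPM training loop above.
import torch

def train(model, data_loader, T=1000, lr=1e-4, epochs=10, device="cpu"):
    betas = torch.linspace(1e-4, 0.02, T, device=device)      # assumed schedule
    beta_hat = torch.cumprod(1.0 - betas, dim=0)               # beta_hat_t = prod alpha_i
    opt = torch.optim.SGD(model.parameters(), lr=lr)           # plain step: theta <- theta - lr * grad

    for _ in range(epochs):                                    # "Repeat ... Until convergence"
        for x0 in data_loader:                                 # x_0 ~ p(x_0)
            x0 = x0.to(device)
            t = torch.randint(1, T + 1, (x0.shape[0],), device=device)  # t ~ U{1,...,T}
            z = torch.randn_like(x0)                           # z ~ N(0, I)
            bh = beta_hat[t - 1].view(-1, 1, 1, 1)
            x_t = bh.sqrt() * x0 + (1.0 - bh).sqrt() * z       # one-shot forward sample
            loss = ((z - model(x_t, t)) ** 2).mean()           # ||z - z_theta(x_t, t)||^2
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In practice an adaptive optimizer such as Adam is commonly used instead of the plain gradient step written on the slide.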
DDPMs. Sampling

• Pass the current noisy image $x_t$, along with the time step $t$, to the neural network to obtain $z_\theta(x_t, t)$.
• From this output, compute the mean $\mu_\theta(x_t, t)$ of the Gaussian distribution over $x_{t-1}$.

DDPMs. Sampling

• Sample the image for the next iteration:

$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right), \quad \mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat\beta_t}}\, z_\theta(x_t, t)\right)$
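A minimal sketch of the sampling loop, assuming PyTorch; `model` predicts the noise $z_\theta(x_t, t)$, and the schedule, image shape, and the choice $\sigma_t^2 = \beta_t$ are illustrative assumptions:

```python
# Minimal sketch (assumes PyTorch): DDPM sampling, starting from x_T ~ N(0, I)
# and repeatedly drawing x_{t-1} ~ N(mu_theta(x_t, t), sigma_t^2 * I).
import torch

@torch.no_grad()
def sample(model, shape=(1, 3, 32, 32), T=1000, device="cpu"):
    betas = torch.linspace(1e-4, 0.02, T, device=device)      # assumed schedule
    alphas = 1.0 - betas
    beta_hat = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)                      # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        t_batch = torch.full((shape[0],), t, dtype=torch.long, device=device)
        z_pred = model(x, t_batch)                              # z_theta(x_t, t)
        mean = (x - (1.0 - alphas[t - 1]) / (1.0 - beta_hat[t - 1]).sqrt() * z_pred) \
               / alphas[t - 1].sqrt()                           # mu_theta(x_t, t)
        noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)  # no noise at the last step
        x = mean + betas[t - 1].sqrt() * noise                  # sigma_t^2 = beta_t (a common choice)
    return x                                                    # approximate sample from p(x_0)
```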
Thank You
