Diffusion Models
Reminders
• Homework 1: Generative Models of Text
– Out: Mon, Jan 27
– Due: Mon, Feb 10 at 11:59pm
• Quiz 2:
– In-class: Mon, Feb 17
– Lectures 5-8
• Homework 2: Generative Models of Images
– Out: Mon, Feb 10
– Due: Sat, Feb 22 at 11:59pm
UNSUPERVISED LEARNING
Unsupervised Learning (Recall…)
Assumptions:
1. our data comes from some distribution p*(x0)
2. we choose a distribution pθ(x0) for which sampling x0 ~ pθ(x0) is tractable
Goal: learn θ s.t. pθ(x0) ≈ p*(x0)
Unsupervised Learning (Recall…)
Assumptions and goal: as above.
Example: autoregressive LMs
• true p*(x0) is the (human) process that produced text on the web
• choose pθ(x0) to be an autoregressive language model
  – autoregressive structure means that p(xt | x1, …, xt−1) ~ Categorical(·) and ancestral sampling is exact/efficient (see the sketch below)
• learn by finding θ ≈ argmaxθ log pθ(x0) using gradient-based updates on ∇θ log pθ(x0)
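To make the "ancestral sampling is exact/efficient" point concrete, here is a minimal sketch of ancestral sampling from an autoregressive LM; the `model` interface (prefix of token ids in, next-token logits out) and the special token ids are assumptions for illustration, not part of the lecture.

```python
import torch

def ancestral_sample(model, bos_id, eos_id, max_len=100):
    """Draw x ~ p_theta(x) one token at a time; each conditional is Categorical."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = model(torch.tensor([tokens]))[0, -1]      # logits for the next token
        probs = torch.softmax(logits, dim=-1)              # Categorical(.) parameters
        next_token = torch.multinomial(probs, 1).item()    # exact sample, no approximation
        tokens.append(next_token)
        if next_token == eos_id:
            break
    return tokens
```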
Unsupervised Learning (Recall…)
Assumptions and goal: as above.
Example: GANs
• true p*(x0) is the distribution over photos taken and posted to Flickr
• choose pθ(x0) to be an expressive model (e.g. noise fed into an inverted CNN) that can generate images
  – sampling is typically easy: z ~ N(0, I) and x0 = fθ(z)
• learn by finding θ ≈ argmaxθ log pθ(x0)?
  – No! Because we can't even compute log pθ(x0) or its gradient
  – Why not? Because the integral
    pθ(x0) = ∫z pθ(x0 | z) p(z) dz
    is intractable even for a simple 1-hidden-layer neural network with nonlinear activation
  – so we optimize a minimax loss instead
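For reference (not spelled out on the slide), the standard minimax objective from Goodfellow et al. (2014) that replaces maximum likelihood is shown below; the discriminator Dψ is an added symbol here, while fθ is the generator from the bullet above.

```latex
\min_{\theta} \max_{\psi} \;
\mathbb{E}_{x_0 \sim p^*(x_0)}\!\left[\log D_\psi(x_0)\right]
+ \mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[\log\!\big(1 - D_\psi(f_\theta(z))\big)\right]
```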
Unsupervised Learning (Recall…)
Assumptions and goal: as above.
Example: VAEs / Diffusion Models
• true p*(x0) is the distribution over photos taken and posted to Flickr
• choose pθ(x0) to be an expressive model (e.g. noise fed into an inverted CNN) that can generate images
  – sampling will be easy
• learn by finding θ ≈ argmaxθ log pθ(x0)?
  – Sort of! We can't compute the gradient ∇θ log pθ(x0)
  – So we instead optimize a variational lower bound (more on that later)
Figure from Ho et al. (2020)
Latent Variable Models
• For GANs and VAEs, we assume that there are (unknown) latent variables which give rise to our observations
• The vector z contains those latent variables
• After learning a GAN or VAE, we can interpolate between images in latent z space
Figure from Radford et al. (2016)
GAN → VAE → Diffusion (in 15 minutes)
U-NET
Semantic Segmentation
• Given an image, predict a label for every pixel in the image
• Not merely a classification problem, because there are strong correlations between pixel-specific labels
Figure from https://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf
U-Net
Contracting path
• block consists of:
  – 3x3 convolution
  – 3x3 convolution
  – ReLU
  – max-pooling with stride of 2 (downsample)
• repeat the block N times, doubling the number of channels
Expanding path
• block consists of:
  – 2x2 convolution (upsampling)
  – concatenation with contracting path features
  – 3x3 convolution
  – 3x3 convolution
  – ReLU
• repeat the block N times, halving the number of channels
(see the sketch below)
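A minimal PyTorch-style sketch of one contracting block and one expanding block following the layer list above; the padding choice (the original U-Net used unpadded convolutions plus cropping), channel handling, and class names are illustrative assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class ContractBlock(nn.Module):
    """3x3 conv -> 3x3 conv -> ReLU, then 2x downsampling via max-pool."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.convs(x)              # features saved for the skip connection
        return self.pool(skip), skip

class ExpandBlock(nn.Module):
    """2x2 up-convolution, concatenate skip features, then 3x3 conv -> 3x3 conv -> ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),  # 2*out_ch after concat
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)   # concatenation with contracting-path features
        return self.convs(x)
```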
U-Net
• Originally designed for applications to biomedical segmentation
• Key observation is that the output layer has the same dimensions as the input image (possibly with a different number of channels)
DIFFUSION MODELS
Diffusion Model
[Chain over x0, x1, …, xt, xt+1, …, xT−1, xT: the forward transitions qφ(x1 | x0), …, qφ(xt+1 | xt), …, qφ(xT | xT−1) carry a sample x0 ~ q(x0) toward xT]
Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)
Diffusion Model
[Same chain, now also showing the reverse transitions pθ(xT−1 | xT), …, pθ(xt | xt+1), …, pθ(x0 | x1), which start from xT ~ pθ(xT) and run back toward x0]
Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)
Diffusion Model
[Figure: the same chain illustrated on images, progressively noised from x0 to xT by the forward process qφ and denoised back by the reverse process pθ]
Figure from Ho et al. (2020)
Diffusion Model
[Chain as above, with forward transitions qφ and reverse transitions pθ]
Question: Which are the latent variables in a diffusion model?
Answer: x1, …, xT. Only x0 is observed; every noised version of it is latent.
Figure from Ho et al. (2020)
Denoising Diffusion Probabilistic Model (DDPM)
[Chain as above]
Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)
where q(x0) = data distribution
and qφ(xt | xt−1) ~ N(√αt xt−1, (1 − αt)I)
Diffusion Model
[Chain as above. Annotations: the forward process adds noise to the image; the learned reverse process removes noise; if we could sample from the exact reverse process we'd be done. Learning is hard, so why don't we instead just infer the exact reverse process corresponding to the forward process?]

Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)

(Exact) Reverse Process:
qφ(x0:T) = qφ(xT) ∏_{t=1}^{T} qφ(xt−1 | xt)

(Learned) Reverse Process (this is what we aim to learn):
pθ(x0:T) = pθ(xT) ∏_{t=1}^{T} pθ(xt−1 | xt)

The exact reverse process requires inference. And, even though qφ(xt | xt−1) is simple, computing qφ(xt−1 | xt) is intractable! Why? Because q(x0) might be not-so-simple:

qφ(xt−1 | xt) = [∫ qφ(x0:T) dx0:t−2,t+1:T] / [∫ qφ(x0:T) dx0:t−1,t+1:T]
Denoising Diffusion Probabilistic Model (DDPM)
[Chain as above]
Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)
where q(x0) = data distribution
and qφ(xt | xt−1) ~ N(√αt xt−1, (1 − αt)I)
Defining the Forward Process
Forward Process:
qφ(x0:T) = q(x0) ∏_{t=1}^{T} qφ(xt | xt−1)
where q(x0) = data distribution
and qφ(xt | xt−1) ~ N(√αt xt−1, (1 − αt)I)
Noise schedule:
We choose αt to follow a fixed schedule s.t. qφ(xT) ~ N(0, I), just like pθ(xT).
(see the sketch below)
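A minimal sketch of one forward-process step under these definitions. The linear β schedule endpoints are illustrative assumptions (they match common DDPM choices); any fixed schedule that drives qφ(xT) toward N(0, I) works.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)   # fixed noise schedule, beta_t = 1 - alpha_t
alphas = 1.0 - betas                    # alpha_t (0-indexed here)

def forward_step(x_prev, t):
    """Sample x_t ~ q_phi(x_t | x_{t-1}) = N(sqrt(alpha_t) x_{t-1}, (1 - alpha_t) I)."""
    eps = torch.randn_like(x_prev)
    return torch.sqrt(alphas[t]) * x_prev + torch.sqrt(1.0 - alphas[t]) * eps
```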
Gaussian (an aside)
Let X ~ N(µx, σx²) and Y ~ N(µy, σy²)
1. Sum of two (independent) Gaussians is a Gaussian
A: X + Y ~ N(µx + µy, σx² + σy²)
Diffusion Model
[Chain as above]
(Learned) Reverse Process:
pθ(x0:T) = pθ(xT) ∏_{t=1}^{T} pθ(xt−1 | xt)
Q: But if pθ(xt−1 | xt) is Gaussian, how can it learn a θ such that pθ(x0) ≈ q(x0)? Won't pθ(x0) be Gaussian too?
A: No. In fact, a diffusion model of sufficiently long timespan T can capture any smooth target distribution.
Diffusion Model Analogy
Properties of forward and exact reverse processes
Property #1:
q(xt | x0) ~ N(√ᾱt x0, (1 − ᾱt)I), where ᾱt = ∏_{s=1}^{t} αs
(see the sketch below)
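Property #1 means we can jump straight from x0 to any xt without simulating the whole chain. A minimal sketch, using the same illustrative schedule as before (redefined here so the snippet is self-contained):

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)     # illustrative schedule, as above
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)    # alpha_bar_t = prod_{s<=t} alpha_s

def sample_xt_given_x0(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) in one shot."""
    eps = torch.randn_like(x0)
    return torch.sqrt(alphas_bar[t]) * x0 + torch.sqrt(1.0 - alphas_bar[t]) * eps
```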
Parameterizing the learned reverse process
Recall: pθ(xt−1 | xt) ~ N(µθ(xt, t), Σθ(xt, t))
Later we will show that given a training sample x0, we want pθ(xt−1 | xt) to be as close as possible to q(xt−1 | xt, x0). Intuitively, this makes sense: if the learned reverse process is supposed to subtract away the noise, then whenever we're working with a specific x0 it should subtract it away exactly as the exact reverse process would have.
Idea #1: Rather than learn Σθ(xt, t), just use what we know about q(xt−1 | xt, x0) ~ N(µ̃q(xt, x0), σt² I) and set Σθ(xt, t) = σt² I.
Option A: Learn a network that approximates µ̃q(xt, x0) directly from xt and t:
µθ(xt, t) = UNetθ(xt, t)
where t is treated as an extra feature in UNet.
Parameterizing the learned reverse process
Recall: pθ(xt−1 | xt) ~ N(µθ(xt, t), Σθ(xt, t))
Later we will show that given a training sample x0, we want pθ(xt−1 | xt) to be as close as possible to q(xt−1 | xt, x0). Intuitively, this makes sense: if the learned reverse process is supposed to subtract away the noise, then whenever we're working with a specific x0 it should subtract it away exactly as the exact reverse process would have.
Idea #1: Rather than learn Σθ(xt, t), just use what we know about q(xt−1 | xt, x0) ~ N(µ̃q(xt, x0), σt² I) and set Σθ(xt, t) = σt² I.
Option B: Learn a network that approximates the real x0 from only xt and t:
µθ(xt, t) = αt^(0) xθ^(0)(xt, t) + αt^(t) xt
where xθ^(0)(xt, t) = UNetθ(xt, t)
(see the sketch below)
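A minimal sketch of the Option B mean: given a network's estimate x̂0 of x0, combine it with xt using the coefficients αt^(0) and αt^(t) of q(xt−1 | xt, x0) (defined on the next slide). The function name and the convention of passing the relevant αt, ᾱt, ᾱt−1 as scalars are assumptions for illustration.

```python
import torch

def option_b_mean(x0_hat, x_t, alpha_t, alpha_bar_t, alpha_bar_prev):
    """mu_theta(x_t, t) = alpha_t^(0) * x0_hat + alpha_t^(t) * x_t,
    where x0_hat would be UNet_theta(x_t, t)'s prediction of the clean image."""
    coef_x0 = torch.sqrt(alpha_bar_prev) * (1 - alpha_t) / (1 - alpha_bar_t)   # alpha_t^(0)
    coef_xt = torch.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)   # alpha_t^(t)
    return coef_x0 * x0_hat + coef_xt * x_t
```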
Properties of forward and exact reverse processes

Property #1:
q(xt | x0) ~ N(√ᾱt x0, (1 − ᾱt)I), where ᾱt = ∏_{s=1}^{t} αs
⇒ we can sample xt from x0 at any timestep t efficiently in closed form
⇒ xt = √ᾱt x0 + √(1 − ᾱt) ε, where ε ~ N(0, I)

Property #2: Estimating q(xt−1 | xt) is intractable because of its dependence on q(x0). However, conditioning on x0 we can efficiently work with:
q(xt−1 | xt, x0) = N(µ̃q(xt, x0), σt² I)
where µ̃q(xt, x0) = [√ᾱt−1 (1 − αt) / (1 − ᾱt)] x0 + [√αt (1 − ᾱt−1) / (1 − ᾱt)] xt = αt^(0) x0 + αt^(t) xt
and σt² = (1 − ᾱt−1)(1 − αt) / (1 − ᾱt)

Property #3: Combining the two previous properties, we can obtain a different parameterization of µ̃q which has been shown empirically to help in learning pθ.
Rearranging xt = √ᾱt x0 + √(1 − ᾱt) ε, we have that:
x0 = (xt − √(1 − ᾱt) ε) / √ᾱt
Substituting this definition of x0 into Property #2's definition of µ̃q gives:
µ̃q(xt, x0) = αt^(0) x0 + αt^(t) xt
           = αt^(0) (xt − √(1 − ᾱt) ε) / √ᾱt + αt^(t) xt
           = (1/√αt) (xt − ((1 − αt)/√(1 − ᾱt)) ε)
(a numerical check of this identity appears below)
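Since the algebra in Property #3 is easy to get wrong, here is a small numerical check, under the same illustrative schedule as before, that the two forms of µ̃q agree when xt is constructed from x0 and ε via Property #1:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)
alphas, alphas_bar = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)

t = 500
x0, eps = torch.randn(3, 32, 32), torch.randn(3, 32, 32)
x_t = torch.sqrt(alphas_bar[t]) * x0 + torch.sqrt(1 - alphas_bar[t]) * eps   # Property #1

# Property #2 form: mu = alpha_t^(0) * x0 + alpha_t^(t) * x_t
coef_x0 = torch.sqrt(alphas_bar[t - 1]) * (1 - alphas[t]) / (1 - alphas_bar[t])
coef_xt = torch.sqrt(alphas[t]) * (1 - alphas_bar[t - 1]) / (1 - alphas_bar[t])
mu_prop2 = coef_x0 * x0 + coef_xt * x_t

# Property #3 form: mu = (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
mu_prop3 = (x_t - (1 - alphas[t]) / torch.sqrt(1 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])

print(torch.allclose(mu_prop2, mu_prop3, atol=1e-5))   # expected: True
```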
Parameterizing the learned reverse process
Recall: pθ(xt−1 | xt) ~ N(µθ(xt, t), Σθ(xt, t))
Later we will show that given a training sample x0, we want pθ(xt−1 | xt) to be as close as possible to q(xt−1 | xt, x0). Intuitively, this makes sense: if the learned reverse process is supposed to subtract away the noise, then whenever we're working with a specific x0 it should subtract it away exactly as the exact reverse process would have.
Idea #1: Rather than learn Σθ(xt, t), just use what we know about q(xt−1 | xt, x0) ~ N(µ̃q(xt, x0), σt² I) and set Σθ(xt, t) = σt² I.
Option C: Learn a network that approximates the ε that gave rise to xt from x0 in the forward process, from xt and t:
µθ(xt, t) = αt^(0) xθ^(0)(xt, t) + αt^(t) xt
where xθ^(0)(xt, t) = (xt − √(1 − ᾱt) εθ(xt, t)) / √ᾱt
(see the sampling sketch below)
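Under Option C, one reverse (denoising) step can be written directly in terms of the predicted noise. A minimal sketch, assuming `eps_model(x_t, t)` is the trained εθ network and that `alphas`/`alphas_bar` follow the illustrative schedule from above:

```python
import torch

def reverse_step(eps_model, x_t, t, alphas, alphas_bar):
    """Sample x_{t-1} ~ p_theta(x_{t-1} | x_t) = N(mu_theta(x_t, t), sigma_t^2 I)."""
    eps_hat = eps_model(x_t, t)
    # Option C mean: (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    mu = (x_t - (1 - alphas[t]) / torch.sqrt(1 - alphas_bar[t]) * eps_hat) / torch.sqrt(alphas[t])
    if t == 0:
        return mu                                  # no noise added at the final step
    sigma2 = (1 - alphas_bar[t - 1]) * (1 - alphas[t]) / (1 - alphas_bar[t])
    return mu + torch.sqrt(sigma2) * torch.randn_like(x_t)
```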
Learning the Reverse Process
Recall: given a training sample x0, we want pθ(xt−1 | xt) to be as close as possible to q(xt−1 | xt, x0).
Depending on which of the options for parameterization we pick, we get a different training algorithm. Option C is the best.

Algorithm 1: Training (Option C)
1: initialize θ
2: for e ∈ {1, …, E} do
3:   for x0 ∈ D do
4:     t ~ Uniform(1, …, T)
5:     ε ~ N(0, I)
6:     xt ← √ᾱt x0 + √(1 − ᾱt) ε
7:     ℓt(θ) ← ‖ε − εθ(xt, t)‖²
8:     θ ← θ − ∇θ ℓt(θ)

Option C: Learn a network that approximates the ε that gave rise to xt from x0 in the forward process, from xt and t:
µθ(xt, t) = αt^(0) xθ^(0)(xt, t) + αt^(t) xt
where xθ^(0)(xt, t) = (xt − √(1 − ᾱt) εθ(xt, t)) / √ᾱt
(a runnable sketch of Algorithm 1 follows)
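A runnable sketch of Algorithm 1 in PyTorch. The dataloader, the UNet-style `eps_model` interface (noisy image and timestep in, predicted noise out), the optimizer, and the learning rate are assumptions for illustration; the sketch also batches examples and uses Adam rather than the slide's per-example plain gradient step, but the loss is the same ‖ε − εθ(xt, t)‖² objective from line 7.

```python
import torch

def train_ddpm(eps_model, dataloader, T=1000, epochs=10, lr=1e-4, device="cpu"):
    """Option C training: regress the noise eps that produced x_t from x_0."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    opt = torch.optim.Adam(eps_model.parameters(), lr=lr)

    for _ in range(epochs):
        for x0 in dataloader:                                         # x0: (B, C, H, W)
            x0 = x0.to(device)
            t = torch.randint(0, T, (x0.shape[0],), device=device)    # t ~ Uniform
            eps = torch.randn_like(x0)                                # eps ~ N(0, I)
            ab = alphas_bar[t].view(-1, 1, 1, 1)
            x_t = torch.sqrt(ab) * x0 + torch.sqrt(1 - ab) * eps      # closed-form q(x_t | x_0)
            loss = ((eps - eps_model(x_t, t)) ** 2).mean()            # ||eps - eps_theta||^2
            opt.zero_grad()
            loss.backward()
            opt.step()
```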