

Visual Generative
AI Application
Unets and Diffusion Models
Week 4
AY 24/25
SPECIALIST DIPLOMA IN APPLIED GENERATIVE AI (SDGAI)

Objectives
• By the end of this module, learners will be able to:
• Explain diffusion models using a U-Net architecture.
• Develop an intuitive overview of the theory behind denoising diffusion.
• Highlight the design choices related to sampling (generating images when a trained denoiser is available).
• Highlight the design choices made when training that denoiser.

What is a Diffusion Model?

• Diffusion models are an emerging class of generative models.
• A diffusion model learns to reverse a process that gradually degrades the data.
• During training, the model is given training samples with noise added and learns to remove that noise.
• During inference, it generates a new sample using random noise as input.

Diffusion Model Pipeline

• The forward process perturbs the training data by adding noise at each timestep: the data is progressively destroyed by adding noise across multiple time steps.

[Figure: Data → Noise — destroying data by adding noise]
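A minimal sketch of this forward process, assuming the standard DDPM linear beta schedule; the names `T`, `betas`, and `alpha_bars` are illustrative, not the course's exact code:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # noise added per timestep
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # cumulative signal retention

def q_sample(x0, t, noise):
    """Jump straight to timestep t: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*noise."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)    # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

x0 = torch.randn(8, 3, 64, 64)                 # stand-in for a batch of images
t = torch.randint(0, T, (8,))
x_t = q_sample(x0, t, torch.randn_like(x0))    # progressively destroyed data
```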

Diffusion Model Pipeline

• The reverse process optimizes a network to remove the noise perturbation: using a neural network, noise is sequentially removed to recover the original data.

[Figure: Noise → Data — generating samples by removing noise]

UNet Architecture
• U-Net is a convolutional neural network that was developed for image segmentation.
• The U-Net architecture has also been employed in diffusion models for iterative image denoising.
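A minimal U-Net skeleton in the spirit of the slide: down blocks shrink the feature map, an up block grows it back, and a skip connection bridges the two. Layer sizes here are illustrative assumptions, not the course's exact model:

```python
import torch
from torch import nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)   # upsample back
        self.out = nn.Conv2d(64, 3, 3, padding=1)           # 64 = 32 skip + 32 up

    def forward(self, x):
        s = self.down1(x)                           # high-resolution features
        h = self.down2(s)                           # low-resolution bottleneck
        h = self.up(h)                              # back to input resolution
        return self.out(torch.cat([h, s], dim=1))  # skip connection

y = TinyUNet()(torch.randn(1, 3, 64, 64))           # same shape in, same shape out
```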

Downsampling on Down Block

• 2D Convolution (Conv2d)
• ReLU
• BatchNorm2d
• MaxPool2d

The output has a smaller dimension than the input.

Worked example: a 2×2 kernel with every weight 0.25 slides over a 3×3 image with stride 1:

Kernel        Image       Output
.25 .25       1 0 1       .5 .5
.25 .25       0 1 0       .5 .5
              1 0 1

Each output value is the average of the 2×2 image patch under the kernel window. When the stride is equal to 1, we move the convolution window across the image one space at a time.
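A sketch of the down block exactly as listed on the slide (Conv2d → ReLU → BatchNorm2d → MaxPool2d); the channel counts are illustrative assumptions:

```python
import torch
from torch import nn

down_block = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # learnable kernel (cf. the 2x2 of 0.25s above)
    nn.ReLU(),
    nn.BatchNorm2d(64),
    nn.MaxPool2d(2),                              # halves height and width
)

x = torch.randn(1, 32, 16, 16)
print(down_block(x).shape)  # torch.Size([1, 64, 8, 8]) -- smaller than the input
```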

Upsampling on Up Block

• Convolution Transpose (ConvTranspose2d)
• BatchNorm2d
• ReLU
• Conv2d
• BatchNorm2d
• ReLU

The output has a larger dimension than the input.

Worked example: the same 2×2 kernel of 0.25s is applied as a transposed convolution to the 3×3 image
1 0 1
0 1 0
1 0 1

Upsampling on Up Block

• In a transposed convolution, the stride defines how many rows and columns of zeros we insert between the image pixels.
• With a stride of 2, we add 1 row (and column) of zeros in between each image row (and column).
• With a stride of 3, we add 2 rows (and columns) of zeros in between each image row (and column).

For the 3×3 image above:

Stride = 2        Stride = 3
1 0 0 0 1         1 0 0 0 0 0 1
0 0 0 0 0         0 0 0 0 0 0 0
0 0 1 0 0         0 0 0 0 0 0 0
0 0 0 0 0         0 0 0 1 0 0 0
1 0 0 0 1         0 0 0 0 0 0 0
                  0 0 0 0 0 0 0
                  1 0 0 0 0 0 1

Upsampling on Up Block

With stride = 2, sliding the 2×2 kernel of 0.25s across the zero-dilated 5×5 image produces a 4×4 output:

.25  0    0   .25
 0  .25  .25   0
 0  .25  .25   0
.25  0    0   .25

Each output value is the average of the 2×2 patch of the dilated image under the kernel window.
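A sketch of the up block as listed on the slide (ConvTranspose2d → BatchNorm2d → ReLU → Conv2d → BatchNorm2d → ReLU); channel counts are assumptions:

```python
import torch
from torch import nn

up_block = nn.Sequential(
    nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),  # stride 2: zero-insertion upsampling
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
)

x = torch.randn(1, 64, 8, 8)
print(up_block(x).shape)  # torch.Size([1, 32, 16, 16]) -- larger than the input
```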

Putting the Processes Together

• Both processes jointly enable the network to be trained for distribution modelling.
• The sampling procedure draws random noise as the starting point for data generation.

Denoising Diffusion Probabilistic Models

• Denoising Diffusion Probabilistic Models (DDPM) train a sequence of probabilistic models to reverse each step of the noise corruption, using knowledge of the functional form of the reverse distributions.

Denoising Diffusion Probabilistic Models

• The key to the success of DDPM is to train the reverse chain to match the forward chain, i.e. the joint distribution of the traversal in the forward chain closely approximates that of the reverse chain.
• This is equivalent to minimizing the Kullback-Leibler (KL) divergence between the two distributions.
• The DDPM objective can be optimized using Monte Carlo sampling and stochastic optimization.
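A minimal sketch of one such stochastic-optimization step: the network is trained to predict the noise that was added, which (up to weighting) minimizes the KL terms above. The `model(x_t, t)` signature, and the `alpha_bars`/`T` names from the earlier forward-process sketch, are assumptions:

```python
import torch
import torch.nn.functional as F

def train_step(model, x0, alpha_bars, T):
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # sample from the forward chain
    noise_pred = model(x_t, t)                            # reverse-chain denoiser
    return F.mse_loss(noise_pred, noise)                  # Monte Carlo estimate of the loss
```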

Stochastic Differential Equations (SDE)

• DDPMs can be generalized to the case of infinite time steps or noise levels, where the perturbation and denoising processes are solutions to stochastic differential equations.
• The forward SDE diffuses a data point into random noise; it is prescribed in advance, does not depend on the data, and has no trainable parameters.

Stochastic Differential Equations (SDE)


• By reversing this process, random noise is smoothly removed for
sample generation.

Stochastic Differential Equation

• Imagine an RGB image x of shape, say, [3, 64, 64] from the dataset.
• Consider gradually adding noise w_t to the image: the change in the image over a short time step is random white noise, dx = dw_t. This is a stochastic differential equation (SDE).
• Solving this SDE involves simulating a specific random numerical realization of the process.
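A sketch of numerically simulating one realization of this forward SDE: at each short time step, add a small amount of white noise. The step count and step size are illustrative assumptions:

```python
import torch

x = torch.randn(3, 64, 64)               # stand-in for an RGB image from the dataset
dt = 1.0 / 1000
for _ in range(1000):
    dw = torch.randn_like(x) * dt**0.5   # Brownian increment: std sqrt(dt)
    x = x + dw                           # one random realization of the process
# x is now the original image plus accumulated Gaussian noise.
```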

Stochastic Differential Equation

• Study this evolution over many different starting images and random paths.
• On average, they create a distribution whose shape changes over time.
• The complex pattern of data at the left edge gradually mixes and simplifies into a featureless blob at the right edge. This is the ubiquitous normal distribution, or pure white noise.

Denoising Process
• Draw a random image of pure white noise.
• Remove the noise slowly (say, 2% at a time) by repeatedly feeding it to a neural denoiser.
• Gradually, a random clean image emerges from underneath the noise.
• The distribution of generated content is determined by the dataset that the denoiser network was trained with.
• The denoiser outputs the blurry average of all possible clean images that could have been hiding under the noise.
• This loop is an implementation of the theoretical probability flow ordinary differential equation (ODE) solver.
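A minimal sketch of that loop, assuming a trained `denoiser` is passed in; starting from pure noise, each step blends a small fraction (here 2%) toward the denoiser's blurry clean estimate:

```python
import torch

def sample(denoiser, steps=200, frac=0.02):
    x = torch.randn(1, 3, 64, 64)         # draw pure white noise
    for _ in range(steps):
        x_clean = denoiser(x)             # blurry average of plausible clean images
        x = x + frac * (x_clean - x)      # remove ~2% of the remaining noise
    return x                              # approximates a sample from the data distribution
```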

Stochastic Differential Equation

• We aim to find a way to sample novel images from the true hidden data distribution on the left.
• We can easily sample from the pure-noise state on the right, using randn.
• SDE intuition:
• The SDE framework enables reversing the time direction; doing so automatically introduces an extra term, a data-attraction force.
• This force pulls the noisy image towards its mean-square-optimal denoising.

Practical 4
Diffusion Models
• Unets
• Diffusion Models
• Optimizing Diffusion Models

Optimizing Diffusion Models

• In transposed convolution, when the kernel size is larger than the stride (e.g. kernel size 3 with stride 2), some pixel values are used in multiple computations, resulting in a "checkerboard" appearance.
• Even if the kernel and stride do not overlap (e.g. kernel size 2 with stride 2), the kernel can still output a pattern if it has trouble learning.
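A sketch illustrating the overlap issue: with an all-ones kernel, each output value counts how many kernel windows touch that pixel. Kernel size 3 with stride 2 gives uneven counts (checkerboard-prone); kernel size 2 with stride 2 tiles the output evenly:

```python
import torch
from torch import nn

def coverage(kernel_size, stride):
    ct = nn.ConvTranspose2d(1, 1, kernel_size, stride, bias=False)
    nn.init.ones_(ct.weight)                # count window hits, not learned values
    return ct(torch.ones(1, 1, 3, 3))

print(coverage(3, 2))  # uneven values: some pixels receive multiple kernel hits
print(coverage(2, 2))  # uniform values: every pixel receives exactly one hit
```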

Batch and Group Normalization

[Figure: feature tensors of shape (N, C, H, W); batch normalization normalizes each channel across the batch dimension N, group normalization normalizes a group of channels within each sample]

• Batch normalization converts the output of each neuron across a batch into a z-score. With convolutional neural networks, a kernel is equivalent to a neuron, so as the output of each kernel creates a channel, the outputs across that channel are normalized.
• In group normalization, we normalize the output of each sample image across a group of channels. The images in a batch do not influence each other.
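A short sketch of the two layers side by side; the channel and group counts are illustrative assumptions:

```python
import torch
from torch import nn

bn = nn.BatchNorm2d(32)                            # stats over (N, H, W) per channel
gn = nn.GroupNorm(num_groups=8, num_channels=32)   # stats over (C/8, H, W) per sample

x = torch.randn(2, 32, 16, 16)
print(bn(x).shape, gn(x).shape)  # same shapes; different normalization axes
```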

Activation Function: Gaussian Error Linear Unit

[Figure: the GELU activation curve]
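As a quick sketch of how GELU behaves — GELU(x) = x·Φ(x), where Φ is the standard normal CDF, giving a smooth alternative to ReLU:

```python
import torch
from torch import nn

gelu = nn.GELU()                   # x * Phi(x): smooth, differentiable everywhere
x = torch.linspace(-3, 3, 7)
print(gelu(x))                     # small negative inputs pass through slightly, unlike ReLU
```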

Rearranged Pooling

• Max pooling is an effective technique for reducing the size of our feature map, but it also drops a lot of information.
• Einops provides tools to control the rearranging of our feature maps.
• We can cut every other column into strips and stack them along the channel dimension; then cut every other row into strips and stack those along the channel dimension. This halves the spatial size without discarding any values (see the sketch below).

[Figure: 2×2 max pooling on a 4×4 image vs. an einops rearrangement of the 4×4 image 1…16 that stacks 2×2 neighbourhoods along the channel dimension]
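A sketch of that rearrangement with einops (a space-to-depth move): height and width are halved by stacking each 2×2 neighbourhood along the channel dimension:

```python
import torch
from einops import rearrange

x = torch.arange(1.0, 17.0).reshape(1, 1, 4, 4)   # the 1..16 image from the slide
y = rearrange(x, "b c (h p1) (w p2) -> b (c p1 p2) h w", p1=2, p2=2)
print(y.shape)   # torch.Size([1, 4, 2, 2]): 4x the channels, half the spatial size
```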

Sinusoidal Position Embedding

How should the timestep t (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, …) be encoded for the model?

• As the raw value t? The model may have to interpolate between time steps.
• As a one-hot encoding (e.g. t = 7 → 0 0 0 0 0 0 0 1 0 0 …)? We lose important sequence information: neighbouring timesteps look no more related than distant ones.
• As binary? There is a large discontinuity at boundaries. Consider 0001 + 0111 = 1000: a single step flips every bit.

Sinusoidal Position Embedding

[Figure: the unit circle, with points (1, 0), (-1, 0), (0, -1) and (−√2/2, √2/2) marked; each timestep is mapped to an angle θ and embedded as the point (cos θ, sin θ)]

Time as a Sequence: Multiple Clocks

[Figure: two 12-hour clock faces showing t = 3 and t = 15]

A single clock is ambiguous: on a 12-hour clock, t = 3 and t = 15 look identical. Using multiple "clocks" (sine/cosine pairs) at different frequencies gives every timestep a unique, smoothly varying embedding.
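A minimal sketch of sinusoidal timestep embeddings along these lines; the dimension count and the frequency base of 10000 are conventional choices, not necessarily the course's:

```python
import math
import torch

def timestep_embedding(t, dim=32):
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)  # one rate per "clock"
    angles = t.float()[:, None] * freqs[None, :]                       # angle of each clock at time t
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)   # (cos, sin) per clock

emb = timestep_embedding(torch.tensor([3, 15]))   # distinct codes, even on a "12-hour" clock
print(emb.shape)                                  # torch.Size([2, 32])
```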

Increase Model Depth

[Diagram: the Up Block, originally
ConvT → Normalization → Activation Function → Convolution → Normalization → Activation Function,
deepened with additional Convolution → Normalization → Activation Function stages]

In general, adding more depth helps fight the checkerboard pattern.
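A sketch of such a deepened up block: the basic ConvT → Norm → Act → Conv → Norm → Act stack from earlier, extended with extra Conv → Norm → Act stages. Channel widths and the number of extra stages are assumptions:

```python
import torch
from torch import nn

def conv_stage(c):
    return nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.GELU())

deep_up_block = nn.Sequential(
    nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
    nn.BatchNorm2d(32),
    nn.GELU(),
    conv_stage(32),   # extra depth: each stage refines the upsampled map,
    conv_stage(32),   # smoothing out checkerboard artifacts
    conv_stage(32),
)

print(deep_up_block(torch.randn(1, 64, 8, 8)).shape)  # torch.Size([1, 32, 16, 16])
```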
