Week 4 - Diffusion Models
Visual Generative AI Application
U-Nets and Diffusion Models
Week 4
AY 24/25
SPECIALIST DIPLOMA IN APPLIED GENERATIVE AI (SDGAI)
Objectives
• By the end of this module, learners will be able to:
• Explain diffusion models using a U-Net architecture.
• Develop an intuitive overview of the theory behind denoising diffusion.
• Highlight the design choices related to sampling (generating images when a trained denoiser is available).
• Highlight the design choices when training that denoiser.
Data → Noise
Destroying data by adding noise (the forward diffusion process)
Noise → Data
Generating samples by removing noise (the reverse diffusion process)
UNet Architecture
• U-Net is a convolutional neural network that was developed for image segmentation.
• The U-Net architecture has also been employed in diffusion models for iterative image denoising.
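To make the shape of the architecture concrete, here is a minimal sketch of a U-Net-style encoder–decoder in PyTorch. The class name `TinyUNet`, the channel widths, and the layer choices are illustrative assumptions, not the exact network used in the practical: down blocks reduce the spatial resolution, up blocks restore it, and skip connections concatenate matching feature maps from the contracting path.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net sketch: two down steps, a bottleneck, two up steps with skip connections."""
    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up1 = nn.ConvTranspose2d(base * 2, base * 2, 2, stride=2)  # back to down2's resolution
        self.up2 = nn.ConvTranspose2d(base * 4, base, 2, stride=2)      # after concatenating the skip
        self.out = nn.Conv2d(base * 2, in_ch, 3, padding=1)             # predict the denoised image / noise

    def forward(self, x):
        d1 = self.down1(x)                 # (B, base,   H,   W)
        d2 = self.down2(d1)                # (B, 2*base, H/2, W/2)
        b  = self.bottleneck(d2)           # (B, 2*base, H/4, W/4)
        u1 = self.up1(b)                   # (B, 2*base, H/2, W/2)
        u1 = torch.cat([u1, d2], dim=1)    # skip connection from the matching down block
        u2 = self.up2(u1)                  # (B, base,   H,   W)
        u2 = torch.cat([u2, d1], dim=1)
        return self.out(u2)

x = torch.randn(4, 1, 28, 28)              # e.g. a batch of 28x28 grayscale images (assumed size)
print(TinyUNet()(x).shape)                 # torch.Size([4, 1, 28, 28])
```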
[Figure: a 2×2 kernel of 0.25s slides over a 3×3 image of alternating 1s and 0s with stride 1. At each position the element-wise products sum to 0.5, so the window positions produce a 2×2 output in which every entry is 0.5.]
When the stride is equal to 1, we move our convolution window across the image one space at a time.
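As a quick check of the arithmetic above, the same calculation can be reproduced with PyTorch's `F.conv2d`. The 2×2 kernel of 0.25s and the 3×3 image follow the slide; everything else is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

image  = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]]).view(1, 1, 3, 3)   # (batch, channels, H, W)
kernel = torch.full((1, 1, 2, 2), 0.25)                  # 2x2 kernel of 0.25s

# Stride 1: the window moves one pixel at a time -> 2x2 output, every entry 0.5
print(F.conv2d(image, kernel, stride=1))
```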
Upsampling on Up Block
[Figure: a 2×2 kernel of 0.25s, a 3×3 input image of alternating 1s and 0s, and the resulting upsampled output.]
• Convolution Transpose
• BatchNorm2d
• ReLU
• Conv2d
• BatchNorm2d
• ReLU
Upsampling on Up Block
The stride defines how many rows and columns we will add (see the code sketch below).
• With a stride of 2, we'll add 1 row of zeros in between each image row.
• With a stride of 3, we'll add 2 rows of zeros in between each image row.
[Figure: the 3×3 image of alternating 1s and 0s expanded by inserting zero rows and columns: a 5×5 grid for stride = 2 and a 7×7 grid for stride = 3.]
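The same upsampling effect can be produced with `nn.ConvTranspose2d`. This is a small sketch: the 0.25 kernel and the 3×3 input follow the slides, while the fixed (non-learned) weights and the exact output size are assumptions for illustration.

```python
import torch
import torch.nn as nn

image = torch.tensor([[1., 0., 1.],
                      [0., 1., 0.],
                      [1., 0., 1.]]).view(1, 1, 3, 3)

# Transposed convolution with a fixed 2x2 kernel of 0.25s and stride 2:
# conceptually, zeros are inserted between the input pixels before the kernel is applied.
up = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
with torch.no_grad():
    up.weight.fill_(0.25)

print(up(image).shape)   # torch.Size([1, 1, 6, 6]) -- spatial resolution is increased
print(up(image))
```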
Upsampling on Up Block
[Figure: with stride = 2, the 2×2 kernel of 0.25s is applied to the zero-expanded 5×5 image step by step; the per-position products of 0.25 accumulate into the upsampled output.]
Denoising Process
• Draw a random image of pure white noise.
• Remove the noise slowly (say, 2% at a time) by repeatedly feeding it to a neural denoiser (a minimal sampling loop is sketched below).
• Gradually, a random clean image emerges from underneath the noise.
• The distribution of generated content is determined by the dataset that the denoiser network was trained with.
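A minimal sketch of this sampling loop. The `denoiser` model and its `(image, timestep)` signature, the number of steps, and the 2%-style blending factor are all assumptions for illustration; full DDPM sampling also adds scheduled noise back in at each step.

```python
import torch

@torch.no_grad()
def sample(denoiser, steps=50, shape=(1, 1, 28, 28)):
    """Start from pure noise and repeatedly nudge the image toward the denoiser's estimate."""
    x = torch.randn(shape)                      # draw a random image of pure white noise
    for t in reversed(range(steps)):
        t_batch = torch.full((shape[0],), t)    # the denoiser is conditioned on the timestep
        x_clean = denoiser(x, t_batch)          # network's estimate of the clean image
        x = 0.98 * x + 0.02 * x_clean           # remove a small fraction of the noise at a time
    return x
```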
Practical 4
Diffusion Models
• U-Nets
• Diffusion Models
• Optimizing Diffusion Models
• In transposed convolution, when the kernel size is larger than the stride (e.g. kernel size = 3, stride = 2), we end up using a pixel value for multiple computations, resulting in a "checkerboard" appearance.
• Even if the kernel and stride are not overlapping (e.g. kernel size = 2, stride = 2), the kernel can still produce checkerboard artifacts.
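One way to see the uneven overlap is to upsample a constant image with an all-ones kernel and count how often each output pixel is touched. This is a small sketch: the `overlap_map` helper, the kernel sizes, and the input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

def overlap_map(kernel_size, stride, size=4):
    """Count how many kernel placements touch each output pixel when upsampling a constant image."""
    up = nn.ConvTranspose2d(1, 1, kernel_size, stride=stride, bias=False)
    with torch.no_grad():
        up.weight.fill_(1.0)                      # all-ones kernel isolates the overlap pattern
    ones = torch.ones(1, 1, size, size)
    return up(ones)[0, 0]

print(overlap_map(kernel_size=3, stride=2))       # uneven counts -> checkerboard-prone
print(overlap_map(kernel_size=2, stride=2))       # kernel size divisible by stride -> even coverage
```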
[Figure: feature maps of shape (N, C, H, W). Batch normalization (shown with N = 2) computes statistics for each channel across the whole batch; group normalization computes statistics within each sample over a group of channels (group size up to all channels).]
Batch normalization converts the output of each neuron across a batch into a z-score. With convolutional neural networks, a kernel is equivalent to a neuron, so if the output of each neuron creates a channel, the outputs across the channel are normalized.
In group normalization, we normalize the output of each sample image across a group of channels. The images in a batch do not influence each other.
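In PyTorch the two are essentially a one-line swap. A small sketch, where the batch size, channel count, and group count are illustrative assumptions:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 16, 28, 28)                      # (N, C, H, W): batch of 2, 16 channels

bn = nn.BatchNorm2d(16)                             # statistics per channel, computed across the batch
gn = nn.GroupNorm(num_groups=4, num_channels=16)    # statistics per sample, over groups of 4 channels

print(bn(x).shape, gn(x).shape)                     # both keep the shape: torch.Size([2, 16, 28, 28])
```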
Rearranged Pooling
[Figure: max pooling reduces a 4×4 feature map to a 2×2 map by keeping only the maximum of each 2×2 window; rearranged pooling instead slices a 4×4 map (values 1–16) into strips and stacks them along the channel dimension.]
Max Pooling is an effective technique for reducing the size of our feature map, but it also drops a lot of information.
Einops provides tools to control the rearranging of our feature maps. We can cut every other column into strips and stack them along the channel dimension. Then, cut every other row into strips and stack those along the channel dimension (a code sketch follows below).
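A sketch of that rearrangement with einops. `rearrange` is the real einops function; the 4×4 example tensor and the two-step pattern strings are illustrative assumptions.

```python
import torch
from einops import rearrange

x = torch.arange(1, 17, dtype=torch.float32).view(1, 1, 4, 4)   # (B, C, H, W) with values 1..16

# Cut every other column into strips and stack them along the channel dimension,
# then do the same with every other row: (1, 1, 4, 4) -> (1, 4, 2, 2).
y = rearrange(x, 'b c h (w p2) -> b (c p2) h w', p2=2)
y = rearrange(y, 'b c (h p1) w -> b (c p1) h w', p1=2)
print(y.shape)   # torch.Size([1, 4, 2, 2]) -- nothing is discarded, unlike max pooling
```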
As binary? Large discontinuity at the boundary. Consider 0001 + 0111 = 1000: a single step in the timestep flips every bit.
[Figure: timesteps t = 0, 1, 2, … written in binary, one bit per row]
bit 3 (MSB): 0 0 0 0 0 0 0 0 1 1 …
bit 2:       0 0 0 0 1 1 1 1 0 0 …
bit 1:       0 0 1 1 0 0 1 1 0 0 …
bit 0 (LSB): 0 1 0 1 0 1 0 1 0 1 …
[Figure: unit circle. The point at angle θ has coordinates (cos θ, sin θ); the points (1, 0), (-1, 0) and (0, -1) are marked.]
[Figure: sinusoidal time embeddings compared for t = 3 and t = 15.]
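Instead of binary bits, the timestep can be mapped onto points on circles of different frequencies, so nearby timesteps get nearby embeddings. A minimal sketch of such a sinusoidal embedding; the embedding dimension and the frequency base of 10000 follow the common Transformer-style convention and are assumptions that may differ from the practical.

```python
import math
import torch

def time_embedding(t, dim=8):
    """Sinusoidal embedding: each (sin, cos) pair rotates around the unit circle at its own frequency."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)   # slow to fast frequencies
    angles = t.float()[:, None] * freqs[None, :]                        # (batch, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)    # (batch, dim)

print(time_embedding(torch.tensor([3, 15])))   # embeddings for t = 3 and t = 15
```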
Increase Model Depth
[Diagram: Up Block. Original stack: ConvT → Normalization → Activation Function → Convolution → Normalization → Activation Function. Deeper stack: ConvT → Normalization → Activation Function, followed by several more Convolution → Normalization → Activation Function layers.]
In general, adding more depth helps fight the checkerboard problem.
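A sketch of a deeper Up Block along these lines. Only the ConvT → Normalization → Activation → Convolution → Normalization → Activation ordering follows the diagram; the layer sizes, the number of extra convolutions, the GroupNorm group count, and ReLU as the activation are assumptions.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """ConvT -> Norm -> Activation, followed by extra Conv -> Norm -> Activation layers for depth."""
    def __init__(self, in_ch, out_ch, extra_convs=2, groups=4):
        super().__init__()
        layers = [
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(),
        ]
        for _ in range(extra_convs):             # more depth helps fight the checkerboard problem
            layers += [
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.GroupNorm(groups, out_ch),
                nn.ReLU(),
            ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

x = torch.randn(2, 32, 14, 14)
print(UpBlock(32, 16)(x).shape)   # torch.Size([2, 16, 28, 28])
```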