
Denoising Diffusion Probabilistic Models (DDPM)
Umar Jamil

[Figure: the same image progressively noised at T = 0, 100, 250, 500, 750, 1000.]

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0):
https://creativecommons.org/licenses/by-nc/4.0/legalcode
Video: https://youtu.be/I1sPXkm2NH4
Not for commercial use


What is an Autoencoder?

An autoencoder compresses the input X into a low-dimensional code Z with an encoder, then reconstructs an approximation X' of the input from Z with a decoder.

[Diagram: Input X → Encoder → Code Z → Decoder → Reconstructed input X'. Example codes: [1.2, 3.65, …], [1.6, 6.00, …], [10.1, 9.0, …], [2.5, 7.0, …]; the values are random and have no meaning.]
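As a concrete illustration, here is a minimal PyTorch sketch of the idea (not from the slides; the layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=16):
        super().__init__()
        # Encoder: compress the input X into a small code Z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        # Decoder: reconstruct X' from the code Z
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)      # the code: no structure is imposed on its values
        return self.decoder(z)   # the reconstruction X'

# Training minimizes the reconstruction error, e.g. MSE between X' and X:
model = Autoencoder()
x = torch.rand(8, 784)
loss = nn.functional.mse_loss(model(x), x)
```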



What’s the problem with Autoencoders?

The code learned by the model makes no sense: the model can assign any vector to an input, with no pattern in the numbers, so similar inputs need not receive similar codes. The code captures no semantic relationship between the data.

[Diagram: Input X → Encoder → Code → Decoder → Reconstructed input X'.]


Introducing the Variational Autoencoder

Instead of learning an arbitrary code, the variational autoencoder learns a “latent space”: the encoder outputs the parameters of a (multivariate) distribution, and the decoder reconstructs the input from a sample drawn from it (see the sketch below).

[Diagram: Input X → Encoder → Latent Space → Decoder → Reconstructed input X'.]
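A minimal sketch of how the encoder produces distribution parameters, using the reparameterization trick (illustrative PyTorch, not the slide's code; sizes and names are assumptions):

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Maps the input to the parameters (mean, log-variance) of a Gaussian latent."""
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.hidden(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so the sampling step stays differentiable w.r.t. mu and logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar
```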


Why is it called latent space?

[Diagram: the observable variable X (data vectors such as [8.67, 12.8564, 0.44875, 874.22, …], [4.59, 13.2548, 1.14569, 148.25, …], [1.74, 32.3476, 5.18469, 358.14, …]) is modeled as being generated from a latent (hidden) variable Z that we never observe directly.]


Plato’s allegory of the cave

[Figure: prisoners in the cave see only shadows on the wall (the observable variable) cast by objects they never see directly (the latent, hidden variable).]

Cave-ception!

[Figure: the cave allegory nested, suggesting a chain of latent variables behind latent variables.]

[Figure: the original object at t = 0, progressively noised at t = 100 and t = 500, until pure noise at t = T.]

[Diagram: X0 (original image) ↔ Z1 ↔ Z2 ↔ Z3 ↔ … ↔ ZT (pure noise). The forward process (adding noise, left to right) is fixed; the reverse process (removing noise, right to left) is a neural network.]


Let’s have fun with… math!



Just like with a VAE, we want to learn the parameters of the latent space.

[Equations from the paper: the forward process q, the reverse process p, and the Evidence Lower Bound (ELBO); reproduced below.]
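In the paper's notation (where x_1, …, x_T play the role of the latents Z_1, …, Z_T above), the formulas read:

```latex
% Forward process q: fixed Markov chain that gradually adds Gaussian noise
% according to a variance schedule \beta_1, \dots, \beta_T
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)

% Reverse process p: learned Markov chain with Gaussian transitions
p_\theta(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)

% Evidence Lower Bound (ELBO): training maximizes a variational bound on the log likelihood
\mathbb{E}\big[\log p_\theta(x_0)\big] \ \ge\
\mathbb{E}_q\!\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
```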

Ho, J., Jain, A. and Abbeel, P., 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, pp.6840-6851.



How to derive the loss function?
1. We start by writing our objective: we want to maximize the log likelihood of our data, log p(x), marginalizing over all the latent variables.
2. We find a lower bound for the log likelihood, that is, log p(x) ≥ ELBO (derivation below).
3. We maximize the ELBO (or, equivalently, minimize its negation).
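Step 2 follows from Jensen's inequality (log is concave); this is the standard derivation rather than anything slide-specific:

```latex
\log p_\theta(x_0)
  = \log \int p_\theta(x_{0:T})\, dx_{1:T}
  = \log \mathbb{E}_{q(x_{1:T}\mid x_0)}\!\left[\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right]
  \ \ge\ \mathbb{E}_{q(x_{1:T}\mid x_0)}\!\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right]
  = \mathrm{ELBO}
```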



Training:
1. We take a sample x0 from our dataset.
2. We generate a random timestep t between 1 and T.
3. We sample some noise ε ~ N(0, I).
4. We add t steps of noise to our image in one shot (closed form below), and we train the model to predict the amount of noise present in it.
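Adding t steps of noise at once is possible because the forward process admits a closed form (with α_t = 1 − β_t and ᾱ_t = ∏ α_s for s = 1 … t):

```latex
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I\big)
\quad\Longleftrightarrow\quad
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, I)
```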

Sampling:
1. We start from pure noise xT ~ N(0, I).
2. We keep denoising the image progressively for T steps, sampling some fresh noise at every step except the last.


U-Net

[Figure: the U-Net architecture, a convolutional encoder-decoder with skip connections between mirrored resolutions; in DDPM it implements the noise predictor εθ(xt, t).]

Ronneberger, O., Fischer, P. and Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (pp. 234-
241). Springer International Publishing.



Training code
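The slide shows the repository's training loop as a screenshot. A minimal sketch of the same procedure follows (illustrative PyTorch, not the repo's exact code; the `model(x_t, t)` signature is an assumption, while the linear β schedule and T = 1000 are the paper's values):

```python
import torch
import torch.nn.functional as F

# Precompute the noise schedule (linear betas, as in Ho et al. 2020).
T = 1000
beta = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - beta, dim=0)            # cumulative product: \bar{alpha}_t

def training_step(model, x0, optimizer):
    """One DDPM training step: predict the noise added to x0 at a random timestep."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                       # random timestep per image
    eps = torch.randn_like(x0)                          # the noise the model must predict
    ab = alpha_bar[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps        # noise x0 in one shot (closed form)
    loss = F.mse_loss(model(x_t, t), eps)               # the paper's "simple" loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```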



Sampling code
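Likewise, a minimal sketch of ancestral sampling (illustrative; it reuses `T`, `beta`, and `alpha_bar` from the training sketch above, and the image shape is an assumption):

```python
@torch.no_grad()
def sample(model, shape=(16, 3, 32, 32)):
    """DDPM ancestral sampling: start from pure noise and denoise for T steps."""
    x = torch.randn(shape)                                   # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t)                 # same timestep for the batch
        eps_hat = model(x, t_batch)                          # predicted noise
        alpha_t, alpha_bar_t = 1.0 - beta[t], alpha_bar[t]
        # Mean of p(x_{t-1} | x_t): subtract the predicted noise contribution.
        mean = (x - (1.0 - alpha_t) / (1.0 - alpha_bar_t).sqrt() * eps_hat) / alpha_t.sqrt()
        if t > 0:
            x = mean + beta[t].sqrt() * torch.randn_like(x)  # add fresh noise z
        else:
            x = mean                                         # no noise at the final step
    return x
```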



The full code is available on GitHub!
Full code: https://github.com/hkproj/pytorch-ddpm

Special thanks to:

https://github.com/lucidrains/denoising-diffusion-pytorch for the U-Net model
https://github.com/awjuliani/pytorch-diffusion/ for the diffusion model


Thanks for watching!
Don’t forget to subscribe for more amazing content on AI and Machine Learning!

