L20: Generative Models

The document discusses generative models, specifically focusing on Variational Autoencoders (VAEs) and their foundational concepts such as the manifold hypothesis, Bayesian inference, and the evidence lower bound (ELBO). It explains the training process of VAEs, including the reparameterization trick and applications like image segmentation, denoising, and super-resolution. Additionally, it touches on Generative Adversarial Networks (GANs) and their competitive training framework.


Generative Models:

Variational Autoencoders

Foundations of Data Analysis

April 28, 2022


These are not real people

Karras et al., CVPR 2020, and thispersondoesnotexist.com




Manifold Hypothesis
Real data lie near a lower-dimensional manifold M

Deep Generative Models

Input: z ∈ R^d, with z ∼ N(0, I)
Output: x = g(z) ∈ R^D, where g = g_L ◦ g_{L−1} ◦ ··· ◦ g_1 and d ≪ D
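As a concrete illustration of the mapping above, here is a minimal PyTorch sketch of a generator g built as a composition of simple layers; the layer sizes and latent/data dimensions are illustrative assumptions, not values from the slides.

```python
import torch
import torch.nn as nn

d, D = 16, 784  # latent and data dimensions (illustrative assumptions)

# g = g_L ∘ ··· ∘ g_1: a composition of simple differentiable maps from R^d to R^D
g = nn.Sequential(
    nn.Linear(d, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, D),
)

z = torch.randn(8, d)   # a batch of latent samples z ~ N(0, I)
x = g(z)                # generated points in R^D
print(x.shape)          # torch.Size([8, 784])
```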
Generative Models as Immersed Manifolds

g maps the latent space Z ⊂ R^d onto the manifold M ⊂ R^D:

z ∈ R^d  →  x = g(z) ∈ R^D,   g = g_L ◦ g_{L−1} ◦ ··· ◦ g_1

For g(Z) to be an immersed manifold:

1. g should be differentiable
2. Jacobian matrix, Dg, should be full rank

Shao, Kumar, Fletcher, The Riemannian Geometry of Deep Generative Models, DiffCVML 2018.
Talking about this paper:

Diederik Kingma and Max Welling, Auto-Encoding Variational Bayes, In International Conference on Learning Representations (ICLR), 2014.
Autoencoders
Input x ∈ R^D  →  Latent Space z ∈ R^d  →  Output x′ ∈ R^D,   d ≪ D
Autoencoders

- Linear activation functions give you PCA

- Training (a minimal code sketch follows this list):
  1. Given data x, feed forward to the output x′
  2. Compute a loss, e.g., L(x, x′) = ‖x − x′‖²
  3. Backpropagate the loss gradient to update the weights

- Not a generative model!
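A minimal PyTorch sketch of this training procedure; the fully connected encoder/decoder, dimensions, batch, and optimizer are illustrative assumptions rather than details from the slides.

```python
import torch
import torch.nn as nn

D, d = 784, 16  # data and latent dimensions (illustrative)

encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, d))
decoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, D))

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(32, D)            # a batch of data (placeholder for real inputs)

# 1. Feed forward to the reconstruction x'
z = encoder(x)
x_prime = decoder(z)

# 2. Reconstruction loss L(x, x') = ||x - x'||^2, averaged over the batch
loss = ((x - x_prime) ** 2).sum(dim=1).mean()

# 3. Backpropagate the loss gradient and update the weights
optimizer.zero_grad()
loss.backward()
optimizer.step()
```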
Variational Autoencoders
Input x ∈ R^D  →  Latent Space z ∼ N(µ, σ²)  →  Output x′ ∈ R^D
Generative Models

Sample a new x in two steps:

1. Prior: p(z)
2. Generator: pθ(x | z)

Now the analogy to the “encoder” is:

Posterior: p(z | x)
Bayesian Inference

Posterior via Bayes’ Rule:

p(z | x) = pθ(x | z) p(z) / p(x)
         = pθ(x | z) p(z) / ∫ pθ(x | z) p(z) dz

Integral in denominator is (usually) intractable!


Kullback-Leibler Divergence

DKL(q‖p) = −∫ q(z) log( p(z) / q(z) ) dz
         = Eq[ −log(p/q) ]

The expected extra information needed to describe samples from q when using p in place of q
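As a small sanity check (my own illustrative example, not from the slides), the Monte Carlo form Eq[−log(p/q)] can be compared against the closed-form KL divergence between two 1-D Gaussians using PyTorch's distributions module:

```python
import torch
from torch.distributions import Normal, kl_divergence

q = Normal(loc=0.0, scale=1.0)   # q(z)
p = Normal(loc=1.0, scale=2.0)   # p(z)

# Monte Carlo estimate: D_KL(q || p) = E_q[-log(p(z)/q(z))] = E_q[log q(z) - log p(z)]
z = q.sample((100_000,))
kl_mc = (q.log_prob(z) - p.log_prob(z)).mean()

# Closed-form KL divergence for comparison
kl_exact = kl_divergence(q, p)

print(kl_mc.item(), kl_exact.item())  # the two values should agree closely
```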


Variational Inference

Approximate the intractable posterior p(z | x) with a manageable distribution q(z)

Minimize the KL divergence: DKL(q(z)‖p(z | x))


Evidence Lower Bound (ELBO)
DKL(q(z)‖p(z | x))
  = Eq[ −log( p(z | x) / q(z) ) ]
  = Eq[ −log( p(z, x) / (q(z) p(x)) ) ]
  = Eq[ −log p(z, x) + log q(z) + log p(x) ]
  = −Eq[log p(z, x)] + Eq[log q(z)] + log p(x)

Rearranging:

log p(x) = DKL(q(z)‖p(z | x)) + L[q(z)]

ELBO: L[q(z)] = Eq[log p(z, x)] − Eq[log q(z)]

Since DKL ≥ 0, the ELBO L[q(z)] is a lower bound on the log evidence log p(x), and maximizing it over q is equivalent to minimizing DKL(q(z)‖p(z | x)).
Variational Autoencoder

Encoder Network: qφ(z | x)        Decoder Network: pθ(x | z)

Maximize the ELBO:

L(θ, φ, x) = Eqφ[log pθ(x, z) − log qφ(z | x)]


VAE ELBO

L(θ, φ, x) = Eqφ[log pθ(x, z) − log qφ(z | x)]
           = Eqφ[log pθ(z) + log pθ(x | z) − log qφ(z | x)]
           = Eqφ[ log( pθ(z) / qφ(z | x) ) + log pθ(x | z) ]
           = −DKL(qφ(z | x)‖pθ(z)) + Eqφ[log pθ(x | z)]

Problem: the gradient ∇φ Eqφ[log pθ(x | z)] is intractable!

Use a Monte Carlo approximation, sampling z^(s) ∼ qφ(z | x):

∇φ Eqφ[log pθ(x | z)] ≈ (1/S) Σ_{s=1}^{S} log pθ(x | z^(s)) ∇φ log qφ(z^(s) | x)
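A minimal sketch of this score-function (log-derivative) estimator in PyTorch; the toy log_p_x_given_z below is a hypothetical stand-in for the decoder likelihood log pθ(x | z), since the slides do not fix a particular likelihood.

```python
import torch
from torch.distributions import Normal

d = 2
x = torch.tensor([0.5, -1.0])

# Encoder parameters phi (here just mu and log-sigma directly, for illustration)
mu = torch.zeros(d, requires_grad=True)
log_sigma = torch.zeros(d, requires_grad=True)

def log_p_x_given_z(z):
    # Hypothetical stand-in for the decoder log-likelihood log p_theta(x | z)
    return -((x - z) ** 2).sum(dim=-1)

q = Normal(mu, log_sigma.exp())

S = 1000
z = q.sample((S,))                  # z^(s) ~ q_phi(z | x); no gradient through sampling
log_q = q.log_prob(z).sum(dim=-1)   # log q_phi(z^(s) | x)
f = log_p_x_given_z(z).detach()     # treated as a constant weight

# Surrogate whose gradient w.r.t. (mu, log_sigma) equals
# (1/S) * sum_s f(z^(s)) * grad_phi log q_phi(z^(s) | x)
surrogate = (f * log_q).mean()
surrogate.backward()

print(mu.grad, log_sigma.grad)      # Monte Carlo estimate of the gradient of this ELBO term
```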
Reparameterization Trick

What about the other term?

−DKL(qφ(z | x)‖pθ(z))

This term says the encoder, qφ(z | x), should make the code z look like the prior distribution.

Instead of encoding z directly, encode the parameters of a normal distribution, N(µ, σ²).
Reparameterization Trick

qφ(z_j | x^(i)) = N(µ_j^(i), σ_j^2(i))
pθ(z) = N(0, I)

The KL divergence between these two is:

DKL(qφ(z | x^(i))‖pθ(z)) = −(1/2) Σ_{j=1}^{d} ( 1 + log(σ_j^2(i)) − (µ_j^(i))² − σ_j^2(i) )
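Putting the pieces together, here is a minimal VAE training-step sketch in PyTorch that samples z via the reparameterization z = µ + σ ⊙ ε with ε ∼ N(0, I) (so gradients can flow to the encoder) and uses the closed-form KL term above; the architecture, Bernoulli likelihood, and dimensions are illustrative assumptions, not specifics from the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, d = 784, 16  # illustrative data/latent dimensions

encoder   = nn.Sequential(nn.Linear(D, 256), nn.ReLU())
to_mu     = nn.Linear(256, d)   # encoder head for mu
to_logvar = nn.Linear(256, d)   # encoder head for log(sigma^2)
decoder   = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, D))

params = (list(encoder.parameters()) + list(to_mu.parameters())
          + list(to_logvar.parameters()) + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(32, D)            # a batch of inputs in [0, 1] (placeholder data)

# Encoder: q_phi(z | x) = N(mu, sigma^2)
h = encoder(x)
mu, logvar = to_mu(h), to_logvar(h)

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps

# Decoder term E_q[log p_theta(x | z)], approximated with one sample,
# here under a Bernoulli likelihood (binary cross-entropy)
logits = decoder(z)
recon = -F.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(dim=1)

# Closed-form KL term: D_KL(q_phi(z | x) || N(0, I))
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)

# Negative ELBO, averaged over the batch (minimized by the optimizer)
loss = (kl - recon).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```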
Results from Kingma & Welling
Why Do Variational?

Example trained on MNIST:

Panels: Autoencoder (reconstruction loss) | KL divergence only | VAE (KL + recon. loss)

From: this webpage


Applications of Autoencoder / VAE Models
Image-to-Image Networks
Instead of trying to reconstruct the original input:
1. Encode input: z = encode(x)
2. Decode derived output: y = decode(z) (see the sketch below)

Example: Image Segmentation
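A minimal sketch of this encode/decode pattern for a derived output such as a segmentation map; the convolutional layers, class count, and cross-entropy loss are illustrative assumptions, not from the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic encoder/decoder; for segmentation the decoder outputs per-pixel class scores
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
decoder = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(16, 4, 3, padding=1))

x = torch.rand(8, 3, 64, 64)               # input images
target = torch.randint(0, 4, (8, 64, 64))  # per-pixel class labels (placeholder)

z = encoder(x)          # 1. Encode input
y = decoder(z)          # 2. Decode a *derived* output (here, segmentation logits)

# The loss compares y to the derived target, not to the original input x
loss = F.cross_entropy(y, target)
loss.backward()
```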


Image Denoising
Learn mapping from noisy inputs → clean outputs

Hales et al., JMRI 2020


Image Super-resolution

Learn mapping from low-res inputs → hi-res outputs

From: this webpage


Image Colorization

Input x ∈ R^D  →  Latent Space z ∈ R^d  →  Output y ∈ R^D
Generative Adversarial Networks (GANs)
Generative Adversarial Network

Random Noise z ∼ N(0, I)  →  Generator Network G(z)  →  Fake Data
Fake Data + Real Data  →  Discriminator Network D(x)  →  Real / Fake (0 / 1)
GAN Game Theory

GAN training is framed as a competition where:


1. Discriminator is trying to maximize its reward
2. Generator is trying to minimize it

min_G max_D V(D, G)

V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼N(0,I)}[log(1 − D(G(z)))]
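A minimal sketch of one alternating training step under this objective, assuming PyTorch with small fully connected G and D; the architectures, optimizers, and placeholder data are illustrative assumptions.

```python
import torch
import torch.nn as nn

d, D_dim = 16, 784  # latent and data dimensions (illustrative)

G = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, D_dim))
D = nn.Sequential(nn.Linear(D_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

x_real = torch.rand(32, D_dim)   # a batch of real data (placeholder)
z = torch.randn(32, d)           # z ~ N(0, I)
eps = 1e-8                       # numerical safety inside the logs

# Discriminator step: maximize V(D, G), i.e. minimize -V
x_fake = G(z).detach()           # don't backprop into G here
loss_D = -(torch.log(D(x_real) + eps).mean()
           + torch.log(1 - D(x_fake) + eps).mean())
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# Generator step: minimize V(D, G), which reduces to minimizing E_z[log(1 - D(G(z)))]
loss_G = torch.log(1 - D(G(z)) + eps).mean()
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```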


GAN Training Algorithm
Original GAN Faces (2014)

Goodfellow et al., NeurIPS 2014
