
VAE continued

Biplab Banerjee
Auto-encoder re-visited
• It contains two parts (a minimal sketch is given below):
  – Encoder
  – Decoder
• The encoder is used for feature abstraction
• Can this be used as a generative model?
  – Given h, can we generate meaningful data?
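A minimal PyTorch sketch of such an auto-encoder (layer sizes and the 784-dimensional flattened input are illustrative assumptions, not taken from the slides):

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, h_dim=32):
        super().__init__()
        # Encoder: maps the input x to the abstract feature h
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, h_dim))
        # Decoder: maps h back to a reconstruction of x
        self.decoder = nn.Sequential(nn.Linear(h_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)

Training with a reconstruction loss (e.g. MSE between x and the output) gives the usual feature-abstraction behaviour; generating new data would require sampling a sensible h, which is exactly the difficulty discussed next.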
Auto-encoder re-visited

• h is usually high-dimensional
• Unless given, it is very difficult to sample a meaningful h without any prior knowledge

Probabilistic interpretation of AE?
Some cases
Let’s summarize
• Continuous latent space vs sparse latent space
• We need to constrain the encoded space
• However, since the data itself is complex and the encoder network applies non-linear transformations, the distribution of the encoded space is extremely complex!
• Solution – approximate inference!
Goal of VAE

The goal can be realized in terms of neural networks (an equation form is reconstructed below).
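In standard VAE notation (reconstructed here, since the slide shows it only as an image), the goal is to maximize the marginal likelihood

  P(X) = \int P(X \mid z) \, P(z) \, dz

where the likelihood P(X|z) is realized by the decoder network and the approximate posterior Q(z|X) by the encoder network.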
VAE

• The decoder should maximize the likelihood P(X|z)
• The encoder should constrain the z space to be some known continuous distribution

Regularized AE? Like the contractive AE?
VAE
• For each data point, we want to estimate a distribution (or the parameters of a distribution) such that, with high probability, a sample from this distribution will be able to reconstruct the original data point. A sketch of such an encoder is given below.
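A minimal PyTorch sketch of an encoder that outputs the parameters (mean and log-variance) of such a per-data-point Gaussian (dimensions and names are illustrative assumptions):

import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    def __init__(self, in_dim=784, z_dim=20):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)       # mean of Q(z|X)
        self.logvar = nn.Linear(256, z_dim)   # log-variance of Q(z|X)

    def forward(self, x):
        feat = self.backbone(x)
        return self.mu(feat), self.logvar(feat)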
Effect of the loss terms
The variational inference perspective

• X is visible
• Z is latent or unobserved

The goal: for a given X, we want the most likely Z, which offers the best reconstruction of X.

Solutions: either MCMC or variational inference


Variational inference
• Since the posterior is intractable, we approximate P by a known distribution Q
• We assume that Q is Gaussian and that we can use the encoder network to estimate its distribution parameters
• Goal: we need Q to be as close to P as possible (see the derivation below)
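The standard identity behind this (reconstructed, not verbatim from the slides) is

  \log P(X) = \mathbb{E}_{z \sim Q(z|X)} [\log P(X \mid z)] - D_{KL}(Q(z|X) \,\|\, P(z)) + D_{KL}(Q(z|X) \,\|\, P(z \mid X))

Since the last KL term is non-negative, pushing Q(z|X) close to the true posterior P(z|X) tightens the bound given by the first two terms.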
Recall
• We want to maximize the likelihood of X given Z
• We want to minimize the KL divergence in the encoded space
• We need to maximize the variational lower bound (written out below)

• Maximizing the lower bound means maximizing P(X)
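The lower bound (ELBO) referred to above is

  \mathcal{L}(X) = \mathbb{E}_{z \sim Q(z|X)} [\log P(X \mid z)] - D_{KL}(Q(z|X) \,\|\, P(z)) \le \log P(X)

where the expectation term is the decoder's reconstruction objective and the KL term is the constraint on the encoded space.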


Analysis of the loss
We are interested in expanding both of the terms.

In the KL term expanded below, k is the dimensionality of the latent layer.
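For Q(z|X) = N(\mu(X), \mathrm{diag}(\sigma^2(X))) and the prior P(z) = N(0, I), the KL term has the standard closed form (reconstructed here, since the slide equation is an image):

  D_{KL}(Q(z|X) \,\|\, P(z)) = \tfrac{1}{2} \sum_{i=1}^{k} ( \sigma_i^2 + \mu_i^2 - 1 - \log \sigma_i^2 )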


Analysis of the loss

If we assume P(X|z) to be a Gaussian with mean mu(z) and identity covariance I, the log-likelihood term simplifies as shown below.
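Up to an additive constant (reconstructed, since the slide equation is an image):

  \log P(X \mid z) = -\tfrac{1}{2} \, \lVert X - \mu(z) \rVert^2 + \text{const}

so maximizing the likelihood is equivalent to minimizing the squared reconstruction error \lVert X - \mu(z) \rVert^2.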

Total VAE loss
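Combining the two expanded terms, the per-sample loss to minimize (the negative ELBO) takes the standard form, reconstructed here rather than copied from the slide:

  \mathcal{L}_{\text{VAE}}(X) = \tfrac{1}{2} \, \lVert X - \mu(z) \rVert^2 + \tfrac{1}{2} \sum_{i=1}^{k} ( \sigma_i^2 + \mu_i^2 - 1 - \log \sigma_i^2 )

where the first term is the reconstruction loss and the second is the KL regularizer on the encoded space.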


Reparameterization trick

Some noise ε ~ N(0, I) is sampled outside the network, so gradients can flow through mu and sigma while the sampling step stays external. A sketch follows.
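A minimal PyTorch sketch of the trick (function and variable names are illustrative):

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I) sampled independently of the
    # network, so gradients flow through mu and logvar but not the sampling
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps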
• Abstraction part – encoder only
• Generation part – decoder only
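At test time the two parts are used separately: the encoder alone provides the abstraction, while the decoder alone generates new data from prior samples. A hedged sketch (the decoder is assumed to be a trained module mapping z back to data space):

import torch

@torch.no_grad()
def generate(decoder, z_dim=20, n_samples=16):
    # Sample z from the prior N(0, I) and decode into new data points
    z = torch.randn(n_samples, z_dim)
    return decoder(z)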
