Lecture # 6 Latent Variable Models
AI-4009 Generative AI
Presented by Dr. Akhtar Jamil
Figure: a latent variable model. A latent z (a mixture element) is drawn from an "easy" distribution (e.g., Gaussian) and generates the observed x.
Latent variable models in deep learning
Easy choice: just a big fully connected network (linear layers + ReLU)
works well for tiny images (e.g., MNIST) or non-image data
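For instance, such a fully connected network mapping a latent code z to a flattened MNIST-sized image could look like the minimal PyTorch sketch below (layer sizes and names are illustrative assumptions, not from the lecture):

    import torch
    import torch.nn as nn

    latent_dim = 20
    # Fully connected decoder: latent z -> flattened 28x28 image
    decoder = nn.Sequential(
        nn.Linear(latent_dim, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 28 * 28),
    )

    z = torch.randn(16, latent_dim)   # a batch of latent codes
    x = decoder(z)                    # shape: (16, 784)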
Intuition 2: how small is the expected log probability of one distribution under another, minus the entropy?
Why the entropy term? Optimizing the first part alone would reward collapsing all the mass onto the other distribution's most likely points; the entropy term corrects for this, so the quantity is optimal only when the two distributions actually match.
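Written out (with q and p as the two distributions; these symbols are my own labels, not the slide's), this intuition is the KL divergence:

$D_{\mathrm{KL}}(q \,\|\, p) \;=\; \mathbb{E}_{x \sim q}\!\big[\log q(x) - \log p(x)\big] \;=\; -\,\mathbb{E}_{x \sim q}\!\big[\log p(x)\big] - H(q) \;\ge\; 0,$

with equality exactly when $q = p$.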
Figure: reconstructed data.
Some background first: Autoencoders
Loss function: the reconstruction error, e.g. the L2 distance between the input and its reconstruction, ||x − x̂||².
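A minimal PyTorch sketch of such an autoencoder trained with this reconstruction loss (layer sizes and names are illustrative assumptions):

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, feature_dim=32):
            super().__init__()
            # Encoder: input data -> low-dimensional features
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, feature_dim),
            )
            # Decoder: features -> reconstructed data
            self.decoder = nn.Sequential(
                nn.Linear(feature_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim),
            )

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z)

    model = Autoencoder()
    x = torch.rand(64, 784)           # dummy minibatch of flattened 28x28 images
    x_hat = model(x)
    loss = ((x - x_hat) ** 2).mean()  # L2 reconstruction loss
    loss.backward()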
Figure: input data → encoder → features → classifier → predicted label (softmax over classes such as bird, plane, dog, deer, truck).
•The encoder can be used to initialize a supervised model.
•Train the classifier for the final task (sometimes with small data); fine-tune the encoder jointly with the classifier.
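A minimal sketch of this fine-tuning setup in PyTorch (the stand-in encoder, layer sizes, and class count are illustrative assumptions; in practice the encoder weights would come from the trained autoencoder):

    import torch
    import torch.nn as nn

    feature_dim, num_classes = 32, 10
    encoder = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),
        nn.Linear(256, feature_dim),
    )
    # encoder.load_state_dict(...)  # in practice, load weights from the trained autoencoder

    # Small classification head on top of the pretrained features
    classifier = nn.Sequential(encoder, nn.Linear(feature_dim, num_classes))

    x, y = torch.rand(8, 784), torch.randint(0, num_classes, (8,))
    loss = nn.functional.cross_entropy(classifier(x), y)  # fine-tune encoder + head jointly
    loss.backward()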
Some background first: Autoencoders
Autoencoders can reconstruct data, and can learn features to initialize a supervised model.
Variational Autoencoders
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Figure: sample the latent z from the true prior p(z).
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
We want to estimate the true parameters of this generative model.
Figure: sample z from the true prior p(z); sample x from the true conditional p(x|z) via the decoder network.
How should we represent this model?
•Choose the prior p(z) to be simple, e.g. Gaussian.
•The conditional p(x|z) is complex (it generates an image) => represent it with a neural network.
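One common concrete instantiation (the notation $\mu_\theta$ and the fixed $\sigma$ are assumptions for illustration): a standard Gaussian prior, and a Gaussian conditional whose mean is produced by the decoder network,

$p_\theta(z) = \mathcal{N}(z;\, 0, I), \qquad p_\theta(x \mid z) = \mathcal{N}\!\big(x;\, \mu_\theta(z),\, \sigma^2 I\big).$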
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
We want to estimate the true parameters of this generative model.
Figure: sample z from the true prior p(z); sample x from the true conditional p(x|z) via the decoder network.
How to train the model?
Learn the model parameters to maximize the likelihood of the training data.
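In symbols (using $\theta$ for the model parameters, as in the cited paper), the data likelihood marginalizes over the latent variable:

$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz.$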
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
•Data likelihood: $p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$ is intractable, since the integral over every possible z cannot be computed.
•Posterior density is also intractable: $p_\theta(z \mid x) = p_\theta(x \mid z)\, p_\theta(z) / p_\theta(x)$, because it depends on the intractable $p_\theta(x)$.
•Solution: in addition to the decoder network modelling p(x|z), define an encoder network q_φ(z|x) that approximates the true posterior p(z|x).
•Will see that this allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
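As a sketch of where that bound comes from (the standard decomposition; $q_\phi(z|x)$ denotes the encoder's approximate posterior):

$\log p_\theta(x) \;=\; \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) \;+\; D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big).$

The last KL term is $\ge 0$ and involves the intractable posterior, so dropping it leaves a tractable lower bound (the ELBO), which is the quantity maximized on the next slide.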
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Putting it all together: maximize the likelihood lower bound

$\mathbb{E}_{z}\!\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big)$

•First term: maximize the likelihood of the original input being reconstructed (sample x|z from the decoder network).
•Second term (Kullback-Leibler divergence): make the approximate posterior distribution close to the prior (sample z from the encoder network's output q_φ(z|x)).
•For every minibatch of input data: compute this forward pass (input data → encoder network → sample z → decoder network → reconstruction), and then backprop!
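A minimal PyTorch sketch of this objective and forward pass (layer sizes, the Bernoulli/BCE reconstruction term, and all names are illustrative assumptions, not the lecture's reference code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, input_dim=784, hidden=256, latent_dim=20):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, latent_dim)       # mean of q(z|x)
            self.logvar = nn.Linear(hidden, latent_dim)   # log-variance of q(z|x)
            self.dec = nn.Sequential(
                nn.Linear(latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, input_dim), nn.Sigmoid(),
            )

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: differentiable sample z ~ q(z|x)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.dec(z), mu, logvar

    def vae_loss(x, x_hat, mu, logvar):
        # Reconstruction term: likelihood of the original input being reconstructed
        recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
        # KL term: keep the approximate posterior q(z|x) close to the prior N(0, I)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl   # minimizing this = maximizing the ELBO

    # One minibatch: forward pass, then backprop
    model = VAE()
    x = torch.rand(64, 784)
    x_hat, mu, logvar = model(x)
    loss = vae_loss(x, x_hat, mu, logvar)
    loss.backward()

To generate data after training, sample z from the prior N(0, I) and decode it, e.g. model.dec(torch.randn(1, 20)).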
Variational Autoencoders
why?
What does this look like when reconstructing? Blurry "average" images.
What does this look like when sampling? Garbage images.