Variational Autoencoder
Supervised Learning
[Figure: labeled image examples — PLANE, CAR]
Y LeCun
Obstacles to Progress in AI
Predicting any part of the past, present or future percepts from whatever
information is available.
The number of samples required to train a large learning machine (for any
task) depends on the amount of information that we ask it to predict.
The more you ask of the machine, the larger it can be.
“The brain has about 10^14 synapses and we only live for about 10^9
seconds. So we have a lot more parameters than data. This motivates the
idea that we must do a lot of unsupervised learning since the perceptual
input (including proprioception) is the only place we can get 10^5
dimensions of constraint per second.”
Geoffrey Hinton (in his 2014 AMA on Reddit)
(but he has been saying that since the late 1970s)
(Yes, I know, this picture is slightly offensive to RL folks. But I’ll make it up)
The Architecture of an Intelligent System
AI System: Learning Agent + Immutable Objective
[Diagram: the Agent, with an internal State, acts on the Objective, which returns a Cost.]
[Diagram: inside the Agent, a World Simulator receives Percepts from the World and produces Predicted Percepts and an Inferred World State; an Actor (with its own Actor State) turns Action Proposals into Actions/Outputs; a Critic produces a Predicted Cost for the Objective Cost.]
What we need is Model-Based Reinforcement Learning
[Diagram: the Agent unrolls its World Simulator over several time steps, starting from Perception.]
Unsupervised Learning
Energy-Based Unsupervised Learning
[Figure: energy surface over (Y1, Y2)]
Capturing Dependencies Between Variables
with an Energy Function
The energy surface is a “contrast function” that takes low values on the data
manifold, and higher values everywhere else
Special case: energy = negative log density
Example: the samples live on the manifold Y2 = (Y1)^2
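A minimal numeric sketch of this example (the specific energy function below is one natural choice, not taken from the slides): squared deviation from the parabola gives zero energy on the manifold and positive energy everywhere else.

```python
# Energy function whose zero set is exactly the manifold Y2 = (Y1)^2:
# low on the manifold, higher everywhere else.
def energy(y1, y2):
    return (y2 - y1 ** 2) ** 2

on_manifold = energy(1.5, 2.25)   # 2.25 == 1.5**2, so zero energy
off_manifold = energy(1.5, 0.0)   # same Y1, pulled off the parabola
assert on_manifold == 0.0 and off_manifold > 0.0
```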
[Figure: video prediction; implausible futures get high energy, plausible futures get low energy.]
Learning the Energy Function
1. build the machine so that the volume of low energy stuff is constant
PCA, K-means, GMM, square ICA
2. push down the energy of data points, push up everywhere else
Max likelihood (needs tractable partition function)
3. push down the energy of data points, push up on chosen locations
contrastive divergence, Ratio Matching, Noise Contrastive Estimation,
Minimum Probability Flow
4. minimize the gradient and maximize the curvature around data points
score matching
5. train a dynamical system so that the dynamics go to the manifold
denoising auto-encoder
6. use a regularizer that limits the volume of space that has low energy
Sparse coding, sparse auto-encoder, PSD
7. if E(Y) = ||Y - G(Y)||^2, make G(Y) as "constant" as possible.
Contracting auto-encoder, saturating auto-encoder
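As a toy illustration of strategy 5, the sketch below uses a hand-written stand-in for a learned denoiser: gradient descent on the parabola energy E(y) = (y2 - y1^2)^2 is a dynamical system whose flow lands on the manifold y2 = y1^2. (A real denoising auto-encoder would learn such a vector field from corrupted data; this energy and step size are assumptions for illustration.)

```python
import numpy as np

# E(y) = (y2 - y1^2)^2 is zero exactly on the manifold y2 = y1^2.
# Gradient descent on E is a dynamical system whose flow goes to
# the manifold -- a stand-in for what a denoising auto-encoder learns.
def step(y, lr=0.05):
    r = y[1] - y[0] ** 2                         # residual off the manifold
    grad = np.array([-4.0 * y[0] * r, 2.0 * r])  # dE/dy1, dE/dy2
    return y - lr * grad

y = np.array([1.0, 3.0])   # start away from the manifold
for _ in range(500):
    y = step(y)
assert abs(y[1] - y[0] ** 2) < 1e-3   # the dynamics reached the manifold
```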
#1: constant volume of low energy
Energy surface for PCA and K-means
1. build the machine so that the volume of low energy stuff is constant
PCA, K-means, GMM, square ICA...
PCA: E(Y) = ||W^T W Y - Y||^2
K-Means (Z constrained to a 1-of-K code): E(Y) = min_Z sum_i ||Y - W_i Z_i||^2
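These two energies can be checked numerically. The sketch below (toy data and prototypes are assumptions for illustration) fits a 1-D PCA subspace and a set of K-means-style prototypes to points near a line, and verifies that both energies are lower on the data manifold than off it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data living near a 1-D line (the "manifold") inside R^2.
X = rng.normal(size=(500, 1)) @ np.array([[2.0, 1.0]])
X += 0.05 * rng.normal(size=X.shape)
Xc = X - X.mean(axis=0)

# PCA energy: E(Y) = ||W^T W Y - Y||^2 with W the top principal direction.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:1]                                   # 1 x 2, unit-norm row

def pca_energy(Y):
    return float(np.sum((W.T @ (W @ Y) - Y) ** 2))

# K-means energy: E(Y) = min_Z sum_i ||Y - W_i Z_i||^2 with Z a 1-of-K
# code, i.e. squared distance to the nearest prototype.
protos = Xc[rng.choice(len(Xc), 32, replace=False)]

def kmeans_energy(Y):
    return float(np.min(np.sum((protos - Y) ** 2, axis=1)))

on = Xc[0]                                      # a data point (on manifold)
off = on + 3.0 * np.array([-W[0, 1], W[0, 0]])  # shifted orthogonally off it
assert pca_energy(on) < pca_energy(off)
assert kmeans_energy(on) < kmeans_energy(off)
```

Both models give a constant volume of low-energy space (strategy 1): a fixed-dimension subspace for PCA, K points for K-means.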
#6. use a regularizer that limits
the volume of space that has low energy
Probabilistic interpretation:
• The “decoder” of the VAE can be seen as a deep (high
representational power) probabilistic model that can give us explicit
likelihoods
• The “encoder” of the VAE can be seen as a variational distribution
used to help train the decoder
2. From importance sampling to VAEs
• Selected slides from Shakir Mohamed’s talk at the Deep Learning
Summer School 2016
Importance Sampling

Integral problem:   p(x) = ∫ p(x|z) p(z) dz

Proposal q(z):      p(x) = ∫ p(x|z) [p(z) / q(z)] q(z) dz

Importance weight:  p(z) / q(z)

Notation: always think of q(z) as q(z|x).
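The proposal trick can be checked on a toy model where p(x) is known in closed form (the model and proposal below are assumptions for illustration): z ~ N(0,1) and x|z ~ N(z,1), so the marginal is p(x) = N(x; 0, 2).

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(v, mu, var):
    return np.exp(-(v - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Assumed toy model: z ~ N(0, 1), x|z ~ N(z, 1)  =>  p(x) = N(x; 0, 2).
x = 1.5
exact = normal_pdf(x, 0.0, 2.0)

# Proposal q(z) with mass where p(x|z)p(z) is large (deliberately not
# the exact posterior N(x/2, 1/2), which would give zero variance).
z = rng.normal(x / 2, 1.0, size=100_000)
q = normal_pdf(z, x / 2, 1.0)

# p(x) = E_q[ p(x|z) p(z) / q(z) ]  -- average of importance weights.
est = np.mean(normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0) / q)
assert abs(est - exact) / exact < 0.01
```

With a proposal far from the posterior, the weights become heavy-tailed and the estimate degrades; this is exactly why the encoder q(z|x) is worth training.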
• Penalty: Ensures that the explanation of the data q(z|x) doesn’t deviate
too far from your beliefs p(z). A mechanism for realising Ockham’s razor.
Rényi Variational Objective:

F(x, q) = 1/(1-α) · E_q(z)[ log (1/S) Σ_s ( p(x|z) p(z) / q(z) )^(1-α) ]

Other generalised families exist. The optimal solution is the same for all objectives.
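A quick numeric sanity check of the Monte Carlo form, on an assumed toy model where log p(x) is known exactly (z ~ N(0,1), x|z ~ N(z,1)): for α in (0,1) and a reasonable proposal, the estimate should come out below log p(x).

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_pdf(v, mu, var):
    return np.exp(-(v - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Assumed toy model: z ~ N(0,1), x|z ~ N(z,1)  =>  log p(x) = log N(x; 0, 2).
x, alpha, S = 1.5, 0.5, 100_000
log_px = np.log(normal_pdf(x, 0.0, 2.0))

z = rng.normal(x / 2, 1.0, size=S)   # proposal q(z), close to the posterior
w = normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0) / normal_pdf(z, x / 2, 1.0)

# Monte Carlo Renyi objective: 1/(1-a) * log( (1/S) * sum_s w_s^(1-a) )
F = np.log(np.mean(w ** (1 - alpha))) / (1 - alpha)
assert F < log_px            # lower-bounds the log-likelihood...
assert log_px - F < 0.2      # ...fairly tightly for this proposal
```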
“From importance sampling to VAE” takeaways:
• The VAE objective function can be derived in a way that I think is
pretty unobjectionable to Bayesians and frequentists alike.
• Treat the decoder as a likelihood model we wish to train with
maximum likelihood. We want to use importance sampling as p(x|z) is
low for most z.
• The encoder is a trainable importance sampling distribution, and the
VAE objective is a lower bound to the likelihood by Jensen’s
inequality.
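The Jensen bound is easy to verify on a toy model where log p(x) is available in closed form (the model and the "encoder" below are assumptions for illustration): the Monte Carlo ELBO with an imperfect Gaussian q(z|x) stays below log p(x), and the gap equals KL(q(z|x) || p(z|x)).

```python
import numpy as np

rng = np.random.default_rng(2)

def log_normal_pdf(v, mu, var):
    return -(v - mu) ** 2 / (2 * var) - 0.5 * np.log(2 * np.pi * var)

# Assumed toy model: z ~ N(0,1), x|z ~ N(z,1)  =>  log p(x) = log N(x; 0, 2).
x = 1.5
log_px = log_normal_pdf(x, 0.0, 2.0)

# "Encoder" q(z|x): a deliberately imperfect Gaussian (the true posterior
# is N(x/2, 1/2); variance 1 leaves a visible KL gap in the bound).
mu_q, var_q = x / 2, 1.0
z = rng.normal(mu_q, np.sqrt(var_q), size=100_000)

# ELBO = E_q[ log p(x|z) + log p(z) - log q(z|x) ]  <=  log p(x)  (Jensen)
elbo = np.mean(
    log_normal_pdf(x, z, 1.0)
    + log_normal_pdf(z, 0.0, 1.0)
    - log_normal_pdf(z, mu_q, var_q)
)
assert elbo < log_px          # the lower bound holds
assert log_px - elbo < 0.5    # gap = KL(q || p(z|x)), small here
```

Making q(z|x) match the true posterior closes the gap, which is exactly what training the encoder does.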