The lecture discusses autoencoders, a type of neural network designed for dimensionality reduction and feature learning by mapping high-dimensional data to a lower-dimensional space. It explains the concept of linear autoencoders and their relationship to Principal Component Analysis (PCA), as well as the advantages of deep nonlinear autoencoders. Additionally, it touches on layerwise training and the connection between autoencoders and Restricted Boltzmann Machines (RBMs).


CSC321 Lecture 20: Autoencoders

Roger Grosse



Overview

Latent variable models so far:
  - mixture models
  - Boltzmann machines
Both of these involve discrete latent variables. Now let’s talk about continuous ones.
One use of continuous latent variables is dimensionality reduction.



Autoencoders

An autoencoder is a feed-forward neural net whose job it is to take an input x and predict x.
To make this non-trivial, we need to add a bottleneck layer whose dimension is much smaller than the input.
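To make this concrete, here is a minimal sketch (not from the lecture) of such a network in PyTorch; the 784-dimensional input, the 30-dimensional bottleneck code, and the intermediate layer size are arbitrary choices for illustration.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """Feed-forward net trained to reproduce its input through a narrow bottleneck."""
        def __init__(self, input_dim=784, code_dim=30):
            super().__init__()
            # Encoder: maps the input down to a low-dimensional code.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, code_dim),
            )
            # Decoder: maps the code back up to a reconstruction of the input.
            self.decoder = nn.Sequential(
                nn.Linear(code_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder()
    x = torch.randn(8, 784)              # a batch of 8 fake inputs
    loss = ((model(x) - x) ** 2).mean()  # squared-error reconstruction loss
    loss.backward()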



Autoencoders

Why autoencoders?
  - Map high-dimensional data to two dimensions for visualization.
  - Compression (i.e. reducing the file size).
    Note: autoencoders don’t do this for free; it requires other ideas as well.
  - Learn abstract features in an unsupervised way so you can apply them to a supervised task.
    Unlabeled data can be much more plentiful than labeled data.



Principal Component Analysis

The simplest kind of autoencoder has one hidden layer, linear activations, and squared error loss:

    L(x, \tilde{x}) = \| x - \tilde{x} \|^2

This network computes x̃ = UVx, which is a linear function.
If K ≥ D, we can choose U and V such that UV is the identity. This isn’t very interesting.
But suppose K < D:
  - V maps x to a K-dimensional space, so it’s doing dimensionality reduction.
  - The output must lie in a K-dimensional subspace, namely the column space of U.
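As a quick check of this rank argument (my own, not from the slides), in NumPy with arbitrary dimensions D = 5 and K = 2:

    import numpy as np

    D, K = 5, 2                      # input and bottleneck dimensions (arbitrary)
    rng = np.random.default_rng(0)
    U = rng.standard_normal((D, K))  # decoder weights
    V = rng.standard_normal((K, D))  # encoder weights

    x = rng.standard_normal(D)
    x_tilde = U @ (V @ x)            # the linear autoencoder's reconstruction

    # UV has rank at most K, so every reconstruction lies in the
    # K-dimensional column space of U.
    print(np.linalg.matrix_rank(U @ V))   # prints 2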





Principal Component Analysis

We just saw that a linear autoencoder has to map D-dimensional inputs to a K-dimensional subspace S.
Knowing this, what is the best possible mapping it can choose?
By definition, the projection of x onto S is the point in S which minimizes the distance to x.
Fortunately, the linear autoencoder can represent projection onto S: pick U = Q and V = Q^T, where the columns of Q form an orthonormal basis for S.
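A short NumPy sketch (again mine, not the lecture's) showing that with U = Q and V = Q^T the reconstruction QQ^T x is exactly the orthogonal projection of x onto S; the subspace here is just a random one:

    import numpy as np

    rng = np.random.default_rng(1)
    D, K = 5, 2

    # Build an orthonormal basis Q for a random K-dimensional subspace S.
    A = rng.standard_normal((D, K))
    Q, _ = np.linalg.qr(A)           # columns of Q are orthonormal

    x = rng.standard_normal(D)
    x_tilde = Q @ (Q.T @ x)          # linear autoencoder with U = Q, V = Q^T

    # The residual x - x_tilde is orthogonal to S, which characterizes the projection.
    print(np.allclose(Q.T @ (x - x_tilde), 0))   # True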



Principal Component Analysis
The autoencoder should learn to choose the subspace which minimizes the squared distance from the data to the projections.
This is equivalent to the subspace which maximizes the variance of the projections.
By the Pythagorean Theorem,
    \underbrace{\frac{1}{N} \sum_{i=1}^{N} \| \tilde{x}^{(i)} - \mu \|^2}_{\text{projected variance}}
    + \underbrace{\frac{1}{N} \sum_{i=1}^{N} \| x^{(i)} - \tilde{x}^{(i)} \|^2}_{\text{reconstruction error}}
    = \underbrace{\frac{1}{N} \sum_{i=1}^{N} \| x^{(i)} - \mu \|^2}_{\text{constant}}

You wouldn’t actually solve this problem by training a neural net. There’s a closed-form solution, which you learn about in CSC 411.
The algorithm is called principal component analysis (PCA).
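As a numerical sanity check (not part of the slides), the closed-form solution can be obtained from an SVD of the centered data, and the decomposition above verified on toy data:

    import numpy as np

    rng = np.random.default_rng(2)
    N, D, K = 200, 5, 2
    X = rng.standard_normal((N, D)) @ rng.standard_normal((D, D))  # correlated toy data

    mu = X.mean(axis=0)
    Xc = X - mu

    # Closed form: the top-K right singular vectors are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Q = Vt[:K].T                     # D x K orthonormal basis for the PCA subspace

    X_tilde = mu + Xc @ Q @ Q.T      # project each point onto the subspace

    projected_variance   = np.mean(np.sum((X_tilde - mu) ** 2, axis=1))
    reconstruction_error = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
    total_variance       = np.mean(np.sum((X - mu) ** 2, axis=1))

    print(np.isclose(projected_variance + reconstruction_error, total_variance))  # True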
Principal Component Analysis
PCA for faces (“Eigenfaces”)



Principal Component Analysis
PCA for digits



Deep Autoencoders

Deep nonlinear autoencoders learn to project the data, not onto a subspace, but onto a nonlinear manifold.
This manifold is the image of the decoder.
This is a kind of nonlinear dimensionality reduction.



Deep Autoencoders

Nonlinear autoencoders can learn more powerful codes for a given dimensionality, compared with linear autoencoders (PCA).



Layerwise Training

There’s a neat connection between autoencoders and RBMs.

An RBM is like an autoencoder with tied weights, except that the units are sampled stochastically.



Layerwise Training

Suppose we’ve already trained an RBM with weights W(1).
Let’s compute its hidden features on the training set, and feed that in as data to another RBM.
Note that now W(1) is held fixed, but W(2) is being trained using contrastive divergence.
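A rough sketch (mine, with binary units, made-up layer sizes, and biases omitted) of one CD-1 update for the second RBM, using the first RBM's hidden probabilities as its training data:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(3)
    n_vis, n_hid1, n_hid2 = 784, 256, 64                 # layer sizes (assumptions)
    W1 = rng.standard_normal((n_vis, n_hid1)) * 0.01     # already-trained first RBM (frozen)
    W2 = rng.standard_normal((n_hid1, n_hid2)) * 0.01    # second RBM, to be trained

    X = (rng.random((100, n_vis)) < 0.1).astype(float)   # fake binary training data

    # The first RBM's hidden features become the "data" for the second RBM.
    H1 = sigmoid(X @ W1)

    # One CD-1 step for W2; W1 is never updated.
    lr = 0.01
    ph = sigmoid(H1 @ W2)                                 # positive phase
    h_sample = (rng.random(ph.shape) < ph).astype(float)  # stochastic hidden units
    v_recon = sigmoid(h_sample @ W2.T)                    # reconstruct the "visible" layer
    ph_recon = sigmoid(v_recon @ W2)                      # negative phase
    W2 += lr * (H1.T @ ph - v_recon.T @ ph_recon) / len(H1)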



Layerwise Training

A stack of two RBMs can be thought of as an autoencoder with three hidden layers.
This gives a good initialization for the deep autoencoder. You can then fine-tune the autoencoder weights using backprop.
This strategy is known as layerwise pre-training.
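For concreteness, here is a rough sketch (mine, not from the slides) of unrolling a two-RBM stack into a deep autoencoder, initializing the encoder layers with the RBM weights and the decoder layers with their transposes; the layer sizes and the use of sigmoid units are assumptions:

    import torch
    import torch.nn as nn

    # Pretrained RBM weight matrices (random stand-ins here, with assumed shapes).
    W1 = torch.randn(784, 256) * 0.01   # first RBM:  visible (784) <-> hidden (256)
    W2 = torch.randn(256, 64) * 0.01    # second RBM: hidden (256)  <-> hidden (64)

    class UnrolledAutoencoder(nn.Module):
        """Deep autoencoder with three hidden layers, initialized from a two-RBM stack."""
        def __init__(self, W1, W2):
            super().__init__()
            self.enc1 = nn.Linear(784, 256)
            self.enc2 = nn.Linear(256, 64)
            self.dec2 = nn.Linear(64, 256)
            self.dec1 = nn.Linear(256, 784)
            # Encoder layers get W^T, decoder layers get W (tied only at initialization).
            with torch.no_grad():
                self.enc1.weight.copy_(W1.t())
                self.enc2.weight.copy_(W2.t())
                self.dec2.weight.copy_(W2)
                self.dec1.weight.copy_(W1)

        def forward(self, x):
            h = torch.sigmoid(self.enc1(x))
            code = torch.sigmoid(self.enc2(h))   # bottleneck code
            h = torch.sigmoid(self.dec2(code))
            return self.dec1(h)                  # reconstruction

    # Fine-tune all weights jointly with backprop on the reconstruction error.
    model = UnrolledAutoencoder(W1, W2)
    x = torch.rand(8, 784)
    loss = ((model(x) - x) ** 2).mean()
    loss.backward()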
Autoencoders are not a probabilistic model.
However, there is an autoencoder-like probabilistic model called a variational autoencoder (VAE). These are beyond the scope of the course, and require some more advanced math.
Check out David Duvenaud’s excellent course “Differentiable Inference and Generative Models”: https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html



Deep Autoencoders

(Professor Hinton’s slides)

