Unit 5: Autoencoders
Contents
• What is an autoencoder?
1. Undercomplete Autoencoders
2. Regularized Autoencoders
3. Representational Power, Layer Size and Depth
4. Stochastic Encoders and Decoders
5. Denoising Autoencoders
6. Learning Manifolds with Autoencoders
7. Contractive Autoencoders
8. Predictive Sparse Decomposition
9. Applications of Autoencoders
What is an Autoencoder?
Embedding is a point on a manifold
• An embedding is a low-dimensional vector
• With fewer dimensions than the ambient space of which the manifold is a low-dimensional subset
• Embedding Algorithm
• Maps any point in ambient space x to its embedding h
• Embeddings of related inputs form a manifold
A manifold in ambient space
• Embedding: map x to lower-dimensional h
• [Figure: a 1-D manifold in 2-D space; an embedding derived from the 28x28=784-dimensional space]
• Example: Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE)
• Github: https://github.com/ZZUTK/Face-Aging-CAAE
General structure of an autoencoder
• Maps an input x to an output r (called reconstruction)
through an internal representation code h
• It has a hidden layer h that describes a code used to represent
the input
• The network has two parts
• The encoder function h = f(x)
• A decoder that produces a reconstruction r = g(h) (a minimal sketch follows)
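A minimal sketch of this two-part structure, assuming PyTorch; the layer sizes (784 → 32) and activations are illustrative choices, not specified in the slides:

```python
import torch
import torch.nn as nn

# Minimal sketch: encoder f maps input x to code h, decoder g maps h back to a
# reconstruction r = g(f(x)). Sizes and activations are illustrative assumptions.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())     # f
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())  # g

    def forward(self, x):
        h = self.encoder(x)      # code h = f(x)
        return self.decoder(h)   # reconstruction r = g(h)
```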
Autoencoders differ from General Data Compression
• Autoencoders are data-specific
• i.e., only able to compress data similar to what they have been
trained on
• This is different from, say, MP3 or JPEG compression
algorithm
• Which make general assumptions about "sounds/images", but not about specific types of sounds/images
• Autoencoder for pictures of cats would do poorly in
compressing pictures of trees
• Because features it would learn would be cat-specific
• Autoencoders are lossy
• which means that the decompressed outputs will be degraded compared to the original inputs (similar to MP3 or JPEG compression)
• This differs from lossless arithmetic compression
• Autoencoders are learnt
What does an Autoencoder Learn?
• Learning g(f(x)) = x everywhere is not useful
• Autoencoders are designed to be unable to copy
perfectly
• Restricted to copy only approximately
• Autoencoders learn useful properties of the data
• Being forced to prioritize which aspects of input should be
copied
• Can learn stochastic mappings
• Go beyond deterministic functions to mappings p_encoder(h|x) and p_decoder(x|h)
Autoencoder History
An autoencoder architecture
• Weights W are learnt using: 1. training samples, and 2. a loss function (as discussed next)
• [Figure: encoder f maps input x to code h; decoder g maps h to reconstruction r]
Two Autoencoder Training Methods
1. Undercomplete Autoencoder
Autoencoder with linear decoder + MSE is PCA
• Learning process is that of minimizing a loss function L(x, g(f(x)))
• where L is a loss function penalizing g(f(x)) for being dissimilar from x
• such as the L2 norm of the difference: mean squared error
• When the decoder g is linear and L is the mean squared error, an undercomplete autoencoder learns to span the same subspace as PCA (a minimal sketch comparing the two follows)
• In this case the autoencoder trained to perform the copying task has learned the principal subspace of the training data as a side-effect
• Autoencoders with nonlinear f and g can learn more
powerful nonlinear generalizations of PCA
• But high capacity is not desirable as seen next
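A hedged sketch of this PCA connection: a purely linear undercomplete autoencoder trained with MSE, whose decoder columns end up spanning (approximately) the top principal subspace. The synthetic data, dimensions and optimizer settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Linear encoder f and linear decoder g trained with mean squared error.
torch.manual_seed(0)
X = torch.randn(1000, 20) @ torch.randn(20, 50)   # synthetic data in R^50
X = X - X.mean(dim=0)                             # centre the data, as PCA does

code_dim = 5
encoder = nn.Linear(50, code_dim, bias=False)     # f
decoder = nn.Linear(code_dim, 50, bias=False)     # g (linear decoder)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = ((decoder(encoder(X)) - X) ** 2).mean()   # L(x, g(f(x))) = MSE
    loss.backward()
    opt.step()

# Compare the learned subspace with the principal subspace from an SVD of the data.
with torch.no_grad():
    _, _, Vt = torch.linalg.svd(X, full_matrices=False)
    pca_basis = Vt[:code_dim]                      # top principal directions
    ae_basis, _ = torch.linalg.qr(decoder.weight)  # orthonormal basis of decoder columns
    # Singular values near 1 indicate the two subspaces (nearly) coincide.
    print(torch.linalg.svdvals(pca_basis @ ae_basis))
```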
Autoencoder training using a loss function
• Encoder f and decoder g:
f : X → h
g : h → X
• Training chooses f and g to minimize the reconstruction error:
f, g = argmin_{f,g} ||X − g(f(X))||²
• [Figure: autoencoder with 3 fully connected hidden layers]
Cases when Autoencoder Learning Fails
Right Autoencoder Design: Use regularization
2. Regularized Autoencoder Properties
• Regularized AEs have properties beyond copying
input to output:
• Sparsity of representation
• Smallness of the derivative of the representation
• Robustness to noise
• Robustness to missing inputs
• Regularized autoencoder can be nonlinear and
overcomplete
• But still learn something useful about the data distribution, even if model capacity is great enough to learn a trivial identity function
Generative Models Viewed as Autoencoders
• [Figure] Source: https://www.jeremyjordan.me/variational-autoencoders/
Variational Autoencoder
• VAE is a generative model
• able to generate samples that look like samples from training
data
• With MNIST, these fake samples would be synthetic images of digits (a minimal VAE sketch follows)
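A hedged sketch of a VAE in PyTorch: the encoder outputs a mean and log-variance for p_encoder(h|x), sampling uses the reparameterization trick, and the loss combines reconstruction error with a KL term. The sizes (784 → 20) and the Bernoulli/BCE reconstruction term (which assumes inputs scaled to [0, 1], e.g. MNIST pixels) are illustrative choices:

```python
import torch
import torch.nn as nn

# Hedged VAE sketch: encoder outputs [mu, log_var]; a code h is sampled with the
# reparameterization trick; the decoder reconstructs x; loss = reconstruction + KL.
class VAE(nn.Module):
    def __init__(self, input_dim=784, code_dim=20):
        super().__init__()
        self.enc = nn.Linear(input_dim, 2 * code_dim)   # -> [mu, log_var]
        self.dec = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        h = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterization
        return self.dec(h), mu, log_var

def vae_loss(x, recon, mu, log_var):
    # Reconstruction term assumes inputs in [0, 1]; KL term regularizes the code.
    recon_err = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_err + kl

# To generate new (fake) samples, decode random codes:
# samples = model.dec(torch.randn(16, 20))
```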
Sparse Autoencoder
Only a few nodes are encouraged to activate when a
single sample is fed into the network
Fewer nodes activating while still keeping performance guarantees that the autoencoder is actually learning latent representations instead of redundant information in the input data (see the sketch below)
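A hedged sketch of this idea, assuming PyTorch and an L1 penalty on the code to encourage only a few units to activate; the architecture, penalty weight and stand-in data are illustrative:

```python
import torch
import torch.nn as nn

# Sparse autoencoder sketch: reconstruction loss plus an L1 penalty on the code h,
# so only a few code units activate for each input. All sizes are illustrative.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
sparsity_weight = 1e-3

x = torch.rand(64, 784)   # stand-in batch; replace with real data scaled to [0, 1]
for step in range(100):
    opt.zero_grad()
    h = encoder(x)
    recon = decoder(h)
    loss = ((recon - x) ** 2).mean() + sparsity_weight * h.abs().mean()   # MSE + sparsity
    loss.backward()
    opt.step()
```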
Sparse Autoencoder Loss Function
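A standard form of this loss, stated here as an assumption following the usual textbook presentation (reconstruction loss plus a sparsity penalty Ω on the code h = f(x), with an L1 penalty as one common choice):

```latex
% Sparse autoencoder objective: reconstruction loss plus a sparsity penalty on the code.
% The L1 form of \Omega is one common choice; \lambda is a penalty weight.
L\bigl(x, g(f(x))\bigr) + \Omega(h), \qquad \Omega(h) = \lambda \sum_i |h_i|
```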
Generative Model view of Sparse
Autoencoder
4. Stochastic Encoders and Decoders
• General strategy for designing the output units
and loss function of a feedforward network is to
• Define the output distribution p(y|x)
• Minimize the negative log-likelihood –log p(y|x)
• In this setting y is a vector of targets such as class labels
• In an autoencoder x is the target as well as the input
• Yet we can apply the same machinery as before, as we see next (a minimal sketch follows)
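A hedged sketch of this machinery, assuming a Bernoulli output distribution p_decoder(x|h) for binary-valued inputs, so the negative log-likelihood is the binary cross-entropy; the network sizes and stand-in data are illustrative:

```python
import torch
import torch.nn as nn

# The decoder outputs parameters (here, per-pixel Bernoulli logits) of p_decoder(x|h),
# and training minimizes -log p_decoder(x|h) with x as both input and target.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)                        # logits of p_decoder(x|h)

x = (torch.rand(32, 784) > 0.5).float()             # stand-in binary inputs
h = encoder(x)
logits = decoder(h)
nll = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="sum")
```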
Loss function for Stochastic Decoder
Structure of stochastic autoencoder
Relationship to joint distribution
Sampling p_model(h|x)
• [Figure: stochastic encoder p_encoder(h|x) and stochastic decoder p_decoder(x|h)]
Ex: Sampling p(x|h): Deepstyle
• Boil down to a representation which relates to style
• By iterating a neural network through a set of images, learn efficient representations
• Choosing a random numerical description in encoded space will generate new images of styles not seen before
• Using one input image and changing values along different dimensions of feature space, you can see how the generated image changes (patterning, color, texture) in style space (sketched below)
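A hedged sketch of this kind of latent-space sampling and traversal, assuming a trained decoder g; the decoder, code size and traversal range below are illustrative stand-ins:

```python
import torch
import torch.nn as nn

# Sample a random point in the encoded (style) space and decode it; then vary one
# dimension of the code to see how the generated output changes. `decoder` is a
# stand-in for a trained decoder g; all sizes are illustrative assumptions.
code_dim = 16
decoder = nn.Sequential(nn.Linear(code_dim, 784), nn.Sigmoid())

# 1) A random numerical description in encoded space -> a new, unseen "style".
h = torch.randn(1, code_dim)
new_image = decoder(h)

# 2) Traverse one dimension of the code to see how the generated image changes.
for value in torch.linspace(-3.0, 3.0, steps=7):
    h_varied = h.clone()
    h_varied[0, 0] = value                # vary the first style dimension
    varied_image = decoder(h_varied)
```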