Autoencoders
Sargur Srihari
[email protected]


Topics in Autoencoders
• What is an autoencoder?
1. Undercomplete Autoencoders
2. Regularized Autoencoders
3. Representational Power, Layer Size and Depth
4. Stochastic Encoders and Decoders
5. Denoising Autoencoders
6. Learning Manifolds and Autoencoders
7. Contractive Autoencoders
8. Predictive Sparse Decomposition
9. Applications of Autoencoders

What is an Autoencoder?

• A neural network trained using unsupervised learning


• Trained to copy its input to its output
• Learns an embedding


Embedding is a point on a manifold


• An embedding is a low-dimensional vector
• With fewer dimensions than the ambient space of which the manifold is a low-dimensional subset
• Embedding Algorithm
• Maps any point in ambient space x to its embedding h
• Embeddings of related inputs form a manifold


A manifold in ambient space


Embedding: map x to a lower-dimensional h

[Figure: a 1-D manifold in 2-D space, derived from the 28x28=784-dimensional space]
Example: Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE)
GitHub: https://github.com/ZZUTK/Face-Aging-CAAE


General structure of an autoencoder


• Maps an input x to an output r (called reconstruction) through
an internal representation code h
• It has a hidden layer h that describes a code used to represent the input
• The network has two parts
• The encoder function h=f(x)
• A decoder that produces a reconstruction r=g(h)


Autoencoders differ from General Data Compression


• Autoencoders are data-specific
• i.e., only able to compress data similar to what they have been trained on
• This is different from, say, MP3 or JPEG compression algorithms
• Which make general assumptions about sound or images, but not about specific types of sounds/images
• Autoencoder for pictures of cats would do poorly in compressing pictures
of trees
• Because features it would learn would be cat-specific
• Autoencoders are lossy
• which means that the decompressed outputs will be degraded compared
to the original inputs (similar to MP3 or JPEG compression).
• This differs from lossless arithmetic compression
• Autoencoders are learnt


What does an Autoencoder Learn?


• Learning g (f (x))=x everywhere is not useful
• Autoencoders are designed to be unable to copy perfectly
• Restricted to copy only approximately
• Autoencoders learn useful properties of the data
• Being forced to prioritize which aspects of input should be copied
• Can learn stochastic mappings
• Go beyond deterministic functions to mappings p_encoder(h|x) and p_decoder(x|h)


Autoencoder History

• Part of neural network landscape for decades


• Used for dimensionality reduction and feature learning
• Theoretical connections to latent variable models have brought them to the forefront of generative models
• Variational Autoencoders are one example


An autoencoder architecture

[Figure: network diagram showing encoder f and decoder g]

Weights W are learnt using:
1. Training samples, and
2. a loss function,
as discussed next


Two Autoencoder Training Methods

1. Autoencoder is a feed-forward non-recurrent neural net


• With an input layer, an output layer and one or more hidden layers
• Can be trained using the same techniques
• Compute gradients using back-propagation
• Followed by minibatch gradient descent
2. Unlike feedforward networks, can also be trained using
Recirculation
• Compare activations on the input to activations of the reconstructed input
• More biologically plausible than back-prop but rarely used in ML


1. Undercomplete Autoencoder

• Copying input to output sounds useless


• But we have no interest in decoder output
• We hope h takes on useful properties
• Undercomplete autoencoder
• Constrain h to have lower dimension than x
• Force it to capture most salient features of training data
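
As an illustration (not part of the original slides), a minimal sketch of an undercomplete autoencoder, assuming PyTorch; the 784 → 32 sizes are hypothetical and simply enforce that h has lower dimension than x.

```python
import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    """Code dimension (32) is smaller than the input dimension (784),
    forcing h to capture the most salient features of the data."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())    # h = f(x)
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid()) # r = g(h)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)
```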


Autoencoder with linear decoder +MSE is PCA


• Learning process is that of minimizing a loss function
L(x, g ( f (x)))
• where L is a loss function penalizing g( f (x)) for being dissimilar from x
• such as L2 norm of difference: mean squared error
• When the decoder g is linear and L is the mean squared error, an
undercomplete autoencoder learns to span the same subspace as PCA
• In this case the autoencoder trained to perform the copying task has learned
the principal subspace of the training data as a side-effect
• Autoencoders with nonlinear f and g can learn more powerful
nonlinear generalizations of PCA
• But high capacity is not desirable as seen next
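
A hedged sketch of this claim, assuming PyTorch and scikit-learn are available; the toy data, layer sizes, and step counts are arbitrary. A fully linear undercomplete autoencoder trained with MSE should, at convergence, reconstruct the data onto (approximately) the same principal subspace that PCA finds.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# Toy centered data: 500 points in R^10 (illustrative sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)
Xt = torch.tensor(X, dtype=torch.float32)

# Fully linear undercomplete autoencoder with a 3-dimensional code.
ae = nn.Sequential(nn.Linear(10, 3, bias=False), nn.Linear(3, 10, bias=False))
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    loss = ((ae(Xt) - Xt) ** 2).mean()     # mean squared error
    loss.backward()
    opt.step()

# PCA reconstruction with the same number of components.
pca = PCA(n_components=3).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# The gap should shrink toward zero as training converges: the linear AE
# spans the same principal subspace as PCA (though not with the same basis).
print(np.abs(ae(Xt).detach().numpy() - X_pca).max())
```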


Autoencoder training using a loss function


[Figure: autoencoder with 3 fully connected hidden layers; encoder f and decoder g]
• Encoder f and decoder g:
  f : X → h
  g : h → X
  arg min_{f,g} ||X − (g ∘ f)(X)||^2
• One hidden layer
  • Non-linear encoder
  • Takes input x ∈ R^d and maps it into code h ∈ R^p

  h = σ1(Wx + b)
  x' = σ2(W'h + b')

  where σ1, σ2 are element-wise activation functions such as sigmoid or ReLU

Trained to minimize reconstruction error (such as the sum of squared errors)

  L(x, x') = ||x − x'||^2 = ||x − σ2(W'(σ1(Wx + b)) + b')||^2

Provides a compressed representation of the input x
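
A sketch of the one-hidden-layer formulation above written out with explicit parameters W, b, W', b', assuming PyTorch; the dimensions, placeholder data, and learning rate are illustrative.

```python
import torch

d, p = 784, 64                                     # x in R^d, h in R^p (illustrative sizes)
W  = torch.nn.Parameter(0.01 * torch.randn(p, d))  # encoder weights
b  = torch.nn.Parameter(torch.zeros(p))
W2 = torch.nn.Parameter(0.01 * torch.randn(d, p))  # decoder weights (W')
b2 = torch.nn.Parameter(torch.zeros(d))
opt = torch.optim.SGD([W, b, W2, b2], lr=0.1)      # minibatch gradient descent

x = torch.rand(128, d)                             # a minibatch of inputs (placeholder data)
for _ in range(100):
    opt.zero_grad()
    h  = torch.sigmoid(x @ W.T + b)                # h  = sigma1(W x + b)
    xr = torch.sigmoid(h @ W2.T + b2)              # x' = sigma2(W' h + b')
    loss = ((x - xr) ** 2).sum(dim=1).mean()       # L(x, x') = ||x - x'||^2
    loss.backward()                                # gradients via back-propagation
    opt.step()
```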



Encoder/Decoder Capacity

• If encoder f and decoder g are allowed too much capacity


• autoencoder can learn to perform the copying task without learning any
useful information about distribution of data
• An autoencoder with a one-dimensional code and a very powerful nonlinear encoder can learn to map each training example x^(i) to the code i.
• The decoder can learn to map these integer indices back to the values of
specific training examples
• Autoencoder trained for copying task fails to learn anything
useful if f/g capacity is too great


Cases when Autoencoder Learning Fails

• Where autoencoders fail to learn anything useful:


1. Capacity of encoder/decoder f/g is too high
• Capacity controlled by depth
2. Hidden code h has dimension equal to input x
3. Overcomplete case: where hidden code h has dimension
greater than input x
• Even a linear encoder/decoder can learn to copy input to output
without learning anything useful about data distribution


Right Autoencoder Design: Use regularization

• Ideally, choose code size (dimension of h) small and capacity of


encoder f and decoder g based on complexity of distribution
modeled
• Regularized autoencoders provide the ability to do so
• Rather than limiting model capacity by keeping encoder/decoder shallow
and code size small
• They use a loss function that encourages the model to have properties other than copying its input to its output


2. Regularized Autoencoder Properties


• Regularized AEs have properties beyond copying input to
output:
• Sparsity of representation
• Smallness of the derivative of the representation
• Robustness to noise
• Robustness to missing inputs
• Regularized autoencoder can be nonlinear and overcomplete
• But still learn something useful about the data distribution even if model
capacity is great enough to learn trivial identity function


Generative Models Viewed as Autoencoders

• Beyond regularized autoencoders


• Generative models with latent variables and an inference
procedure (for computing latent representations given input)
can be viewed as a particular form of autoencoder
• Generative modeling approaches which emphasize connection
with autoencoders are descendants of Helmholtz machine:
1. Variational autoencoder
2. Generative stochastic networks


Latent variables treated as distributions

Source: https://www.jeremyjordan.me/variational-autoencoders/
Variational Autoencoder

• VAE is a generative model


• able to generate samples that look like samples from training data
• With MNIST, these fake samples would be synthetic images of digits

• Due to the random variable between input and output, it cannot be trained using backprop directly
• Instead, backprop proceeds through the parameters of the latent distribution
• Called the reparameterization trick: a sample from N(μ, Σ) is written as
  μ + Σ^(1/2) ε, with ε ~ N(0, I), where Σ is diagonal
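
A minimal sketch of the reparameterization trick, assuming PyTorch; the function name and tensor shapes are illustrative, and the encoder producing μ and log σ² is omitted.

```python
import torch

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, diag(sigma^2)) as mu + sigma * eps with eps ~ N(0, I),
    so gradients flow through mu and sigma (the parameters of the latent
    distribution) rather than through the random sample itself."""
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

# Example: a batch of 8 latent Gaussians of dimension 2 (illustrative sizes).
mu, log_var = torch.zeros(8, 2), torch.zeros(8, 2)
z = reparameterize(mu, log_var)
```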


Sparse Autoencoder
Only a few nodes are encouraged to activate when a single
sample is fed into the network

If the autoencoder keeps its reconstruction performance while only a few nodes activate, it must be learning meaningful latent representations rather than redundant information in the input data

Sparse Autoencoder Loss Function

• A sparse autoencoder is an autoencoder whose


• Training criterion includes a sparsity penalty Ω(h) on the code layer h in
addition to the reconstruction error:
L(x, g ( f (x))) + Ω(h)
• where g (h) is the decoder output and typically we have h = f (x)
• Sparse encoders are typically used to learn features for another
task such as classification
• An autoencoder that has been trained to be sparse must
respond to unique statistical features of the dataset rather than
simply perform the copying task
• Thus sparsity penalty can yield a model that has learned useful features
as a byproduct
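
A minimal sketch of one common choice of sparsity penalty, Ω(h) = λ Σ_i |h_i| (an L1 penalty on the code), assuming PyTorch; the layer sizes, placeholder data, and value of λ are illustrative.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())     # f (illustrative sizes)
decoder = nn.Sequential(nn.Linear(256, 784), nn.Sigmoid())  # g
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
lam = 1e-3                                    # sparsity weight (hyperparameter)

x = torch.rand(32, 784)                       # placeholder minibatch
opt.zero_grad()
h = encoder(x)                                # code layer h = f(x)
recon_loss = ((decoder(h) - x) ** 2).mean()   # L(x, g(f(x)))
sparsity = lam * h.abs().sum(dim=1).mean()    # Omega(h) = lambda * sum_i |h_i|
loss = recon_loss + sparsity                  # training criterion with sparsity penalty
loss.backward()
opt.step()
```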

Sparse Encoder doesn’t have Bayesian Interpretation


• Penalty term Ω(h) is a regularizer term added to a feedforward
network whose
• Primary task: copy input to output (with Unsupervised learning objective)
• Also perform some supervised task (with Supervised learning objective)
that depends on the sparse features
• In supervised learning regularization term corresponds to prior
probabilities over model parameters
• Regularized MLE corresponds to maximizing p(θ|x), which is equivalent
to maximizing log p(x|θ)+log p(θ)
• First term is data log-likelihood and second term is log-prior over parameters
• Regularizer depends on data and thus is not a prior
• Instead, regularization terms express a preference over functions


Generative Model view of Sparse Autoencoder

• Rather than thinking of sparsity penalty as a regularizer for


copying task, think of sparse autoencoder as approximating
ML training of a generative model that has latent variables
• Suppose model has visible/latent variables x and h
• Explicit joint distribution is p_model(x, h) = p_model(h) p_model(x|h)
• where p_model(h) is the model's prior distribution over latent variables
• Different from p(θ), which is a distribution over parameters
• The log-likelihood can be decomposed as
  log p_model(x) = log Σ_h p_model(h, x)

• The autoencoder approximates the sum with a point estimate for just one highly likely value of h, the output of a parametric encoder
• With a chosen h we are maximizing log p_model(x, h) = log p_model(h) + log p_model(x|h)

Sparsity-inducing Priors
• The log p_model(h) term can be sparsity-inducing. For example, the Laplace prior
    p_model(h_i) = (λ/2) e^(−λ|h_i|)
• corresponds to an absolute value sparsity penalty
• Expressing the log-prior as an absolute value penalty:
    −log p_model(h) = Σ_i ( λ|h_i| − log(λ/2) ) = Ω(h) + const,  where Ω(h) = λ Σ_i |h_i|

• where the constant term depends only on λ and not on h


• We treat λ as a hyperparameter and discard the constant term,
since it does not affect parameter learning


Denoising Autoencoders (DAE)

• Rather than adding a penalty Ω to the cost function, we can


obtain an autoencoder that learns something useful
• By changing the reconstruction error term of the cost function
• Traditional autoencoders minimize L(x, g(f(x)))
  • where L is a loss function penalizing g(f(x)) for being dissimilar from x, such as the L2 norm of the difference: mean squared error
• A DAE minimizes L(x, g(f(x̃)))
  • where x̃ is a copy of x that has been corrupted by some form of noise
• The autoencoder must undo this corruption rather than simply copying its input
• Denoising training forces f and g to implicitly learn the structure of p_data(x)
• Another example of how useful properties can emerge as a by-
product of minimizing reconstruction error
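
A minimal sketch of one denoising setup, assuming PyTorch and Gaussian corruption (other noise types, such as masking, are also common); the architecture, placeholder data, and noise level are illustrative.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                   nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

x = torch.rand(32, 784)                  # clean minibatch (placeholder data)
x_tilde = x + 0.3 * torch.randn_like(x)  # corrupted copy of x (Gaussian noise)
loss = ((ae(x_tilde) - x) ** 2).mean()   # L(x, g(f(x_tilde))): reconstruct the *clean* x
opt.zero_grad()
loss.backward()
opt.step()
```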

Regularizing by Penalizing Derivatives


• Another strategy for regularizing an autoencoder
• Use penalty as in sparse autoencoders
L(x, g ( f (x))) + Ω(h,x)
• But with a different form of Ω
  Ω(h, x) = λ Σ_i ||∇_x h_i||^2

• Forces the model to learn a function that does not change


much when x changes slightly
• Called a Contractive Auto Encoder (CAE)
• This model has theoretical connections to
• Denoising autoencoders
• Manifold learning
• Probabilistic modeling
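
A sketch of computing the contractive penalty Ω(h, x) = λ Σ_i ||∇_x h_i||² (the squared Frobenius norm of the encoder's Jacobian) via autograd, assuming PyTorch; the encoder, sizes, and λ are placeholders.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 5), nn.Sigmoid())  # f (illustrative sizes)
lam = 1e-2

x = torch.rand(16, 20, requires_grad=True)
h = encoder(x)

# Omega(h, x) = lambda * sum_i ||grad_x h_i||^2, accumulated one code unit
# at a time; create_graph=True lets the penalty be added to a training loss.
penalty = 0.0
for i in range(h.shape[1]):
    grad_i, = torch.autograd.grad(h[:, i].sum(), x, create_graph=True)
    penalty = penalty + (grad_i ** 2).sum(dim=1)
penalty = lam * penalty.mean()
```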

3. Representational Power, Layer Size and Depth

• Autoencoders are often trained with a single-layer encoder and a single-layer decoder


• However, using a deep encoder offers many advantages
• Recall: although the universal approximation theorem states that a single layer is sufficient, there are disadvantages:
  1. the number of units needed may be too large
  2. it may not generalize well
• Common strategy: greedily pretrain a stack of shallow
autoencoders
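
A sketch of the greedy strategy, assuming PyTorch: train a shallow autoencoder on the data, then a second one on its codes, then stack the encoders; all sizes, placeholder data, and step counts are illustrative.

```python
import torch
import torch.nn as nn

def train_shallow_ae(data, in_dim, code_dim, steps=200):
    """Train a single-hidden-layer autoencoder on `data` and return its encoder."""
    enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
    dec = nn.Linear(code_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((dec(enc(data)) - data) ** 2).mean()
        loss.backward()
        opt.step()
    return enc

x = torch.rand(256, 784)                 # placeholder data
enc1 = train_shallow_ae(x, 784, 128)     # first shallow autoencoder, trained on x
with torch.no_grad():
    h1 = enc1(x)                         # codes produced by the first encoder
enc2 = train_shallow_ae(h1, 128, 32)     # second shallow autoencoder, trained on h1

deep_encoder = nn.Sequential(enc1, enc2) # stacked (deep) encoder, ready for fine-tuning
```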


4. Stochastic Encoders and Decoders


• General strategy for designing the output units and loss
function of a feedforward network is to
• Define the output distribution p(y|x)
• Minimize the negative log-likelihood –log p(y|x)
• In this setting y is a vector of targets such as class labels
• In an autoencoder x is the target as well as the input
• Yet we can apply the same machinery as before, as we see next


Loss function for Stochastic Decoder

• Given a hidden code h, we may think of the decoder as providing a conditional distribution p_decoder(x|h)
• We train the autoencoder by minimizing −log p_decoder(x|h)
• The exact form of this loss function will change depending on the form of p_decoder(x|h)
• As with feedforward networks we use linear output units to
parameterize the mean of the Gaussian distribution if x is real
• In this case negative log-likelihood is the mean-squared error
• Binary x values correspond to a Bernoulli distribution with parameters given by a sigmoid
• Discrete x values correspond to a softmax
• The output variables are treated as being conditionally independent given h
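
A minimal sketch for the binary case, assuming PyTorch: with a Bernoulli p_decoder(x|h) parameterized by a sigmoid, −log p_decoder(x|h) is the binary cross-entropy; sizes and data are placeholders.

```python
import torch
import torch.nn as nn

decoder = nn.Linear(32, 784)               # outputs logits of Bernoulli parameters (illustrative sizes)
h = torch.randn(16, 32)                    # a batch of codes
x = (torch.rand(16, 784) > 0.5).float()    # binary targets (placeholder data)

logits = decoder(h)
# -log p_decoder(x|h) for a Bernoulli with parameters sigmoid(logits),
# treating the output variables as conditionally independent given h:
nll = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction='sum') / x.shape[0]
```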

Stochastic encoder

• We can also generalize the notion of an encoding function f(x) to an encoding distribution p_encoder(h|x)


Structure of stochastic autoencoder

• The encoder and decoder are no longer simple functions; each involves a distribution
• The output is sampled from a distribution p_encoder(h|x) for the encoder and p_decoder(x|h) for the decoder


Relationship to joint distribution

• Any latent variable model p_model(h, x) defines a stochastic encoder p_encoder(h|x) = p_model(h|x)
• And a stochastic decoder p_decoder(x|h) = p_model(x|h)
• In general the encoder and decoder distributions are not conditional distributions compatible with a unique joint distribution p_model(x, h)
• Training the autoencoder as a denoising autoencoder will tend
to make them compatible asymptotically
• With enough capacity and examples


Sampling p_model(h|x)

[Figure: sampling chain through the stochastic encoder p_encoder(h|x) and decoder p_decoder(x|h)]


Ex: Sampling p(x|h): Deepstyle


• Boils an image down to a representation that relates to style
• By iterating a neural network through a set of images, it learns efficient representations
• Choosing a random numerical description in the encoded space will generate new images of styles not seen before
• Using one input image and changing values along different dimensions of the feature space, you can see how the generated image changes (patterning, color, texture) in style space
