Unit 5: Autoencoders

Autoencoders are neural networks designed for unsupervised learning that aim to copy their input to their output while learning a low-dimensional embedding of the data. They can be categorized into various types, including undercomplete, regularized, and denoising autoencoders, each with unique properties and applications. The document discusses the architecture, training methods, and challenges associated with autoencoders, as well as their connections to generative models and practical uses in feature learning and data compression.

Autoencoders

1
Contents
• What is an autoencoder?
1. Undercomplete Autoencoders
2. Regularized Autoencoders
3. Representational Power, Layer Size and Depth
4. Stochastic Encoders and Decoders
5. Denoising Autoencoders
6. Learning Manifolds and Autoencoders
7. Contractive Autoencoders
8. Predictive Sparse Decomposition
9. Applications of Autoencoders

3
What is an Autoencoder?

• A neural network trained using unsupervised learning


• Trained to copy its input to its output
• Learns an embedding

4
Embedding is a point on a manifold
• An embedding is a low-dimensional vector
  • With fewer dimensions than the ambient space of which the manifold is a low-dimensional subset
• Embedding algorithm
  • Maps any point in ambient space x to its embedding h
  • Embeddings of related inputs form a manifold

5
A manifold in ambient space
• Embedding: map x to a lower-dimensional h
[Figure: a 1-D manifold in 2-D space, derived from the 28x28 = 784-dimensional space. Example: Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE). GitHub: https://github.com/ZZUTK/Face-Aging-CAAE]
6
General structure of an autoencoder
• Maps an input x to an output r (called the reconstruction) through an internal representation code h
• It has a hidden layer h that describes a code used to represent the input
• The network has two parts
  • The encoder function h = f(x)
  • A decoder that produces a reconstruction r = g(h)

7
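To make the two-part structure concrete, here is a minimal sketch in PyTorch (not from the slides; the layer sizes and activations are illustrative assumptions): the encoder f produces the code h and the decoder g produces the reconstruction r = g(f(x)).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: r = g(f(x)) with code h = f(x)."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder f: maps input x to code h
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        # Decoder g: maps code h back to a reconstruction r
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)      # h = f(x)
        r = self.decoder(h)      # r = g(h)
        return r, h

# Example: reconstruct a batch of flattened 28x28 "images" (random stand-in data)
model = Autoencoder()
x = torch.rand(16, 784)
r, h = model(x)
print(h.shape, r.shape)          # torch.Size([16, 32]) torch.Size([16, 784])
```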
Autoencoders differ from General Data Compression
• Autoencoders are data-specific
  • i.e., only able to compress data similar to what they have been trained on
  • This is different from, say, an MP3 or JPEG compression algorithm, which makes general assumptions about "sound/images", but not about specific types of sounds/images
  • An autoencoder trained on pictures of cats would do poorly at compressing pictures of trees, because the features it learns would be cat-specific
• Autoencoders are lossy
  • The decompressed outputs will be degraded compared to the original inputs (similar to MP3 or JPEG compression)
  • This differs from lossless arithmetic compression
• Autoencoders are learned from data rather than hand-designed
8

9
What does an Autoencoder Learn?
• Learning g(f(x)) = x everywhere is not useful
• Autoencoders are designed to be unable to copy
perfectly
• Restricted to copy only approximately
• Autoencoders learn useful properties of the data
• Being forced to prioritize which aspects of input should be
copied
• Can learn stochastic mappings
• Go beyond deterministic functions to mappings pencoder(h|x) and
pdecoder(x|h)

10
Autoencoder History

• Part of neural network landscape for decades


• Used for dimensionality reduction and feature learning
• Theoretical connections to latent variable models have brought them to the forefront of generative models
  • e.g., Variational Autoencoders

11
An autoencoder architecture
[Figure: encoder f maps input x to code h; decoder g maps h to the reconstruction. Weights W are learned using: (1) training samples, and (2) a loss function, as discussed next.]
12
Two Autoencoder Training Methods

1. An autoencoder is a feed-forward, non-recurrent neural net
   • With an input layer, an output layer, and one or more hidden layers
   • It can be trained using the same techniques as other feedforward networks:
     • Compute gradients using back-propagation
     • Followed by minibatch gradient descent
2. Unlike feedforward networks, autoencoders can also be trained using recirculation
• Compare activations on the input to activations of the
reconstructed input
• More biologically plausible than back-prop but rarely used in ML

13
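A hedged sketch of method 1, back-propagation followed by minibatch gradient descent, using PyTorch and a random stand-in dataset (the architecture and hyperparameters are illustrative assumptions, not the slides' own); recirculation is omitted since it is rarely used in ML.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: replace with real flattened images in practice
data = torch.rand(1024, 784)
loader = DataLoader(TensorDataset(data), batch_size=64, shuffle=True)

# A small undercomplete autoencoder (assumed sizes)
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for (x,) in loader:
        r = model(x)
        loss = loss_fn(r, x)     # reconstruction error L(x, g(f(x)))
        opt.zero_grad()
        loss.backward()          # gradients via back-propagation
        opt.step()               # minibatch gradient descent update
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```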
1. Undercomplete Autoencoder

• Copying input to output sounds useless


• But we have no interest in decoder output
• We hope h takes on useful properties
• Undercomplete autoencoder
• Constrain h to have lower dimension than x
• Force it to capture most salient features of training
data

14
Autoencoder with linear decoder + MSE is PCA
• Learning process is that of minimizing a loss function L(x, g(f(x)))
  • where L is a loss function penalizing g(f(x)) for being dissimilar from x
  • such as the squared L2 norm of the difference: mean squared error
• When the decoder g is linear and L is the mean squared error,
an undercomplete autoencoder learns to span the same
subspace as PCA
• In this case the autoencoder trained to perform the copying task
has learned the principal subspace of the training data as a side-
effect
• Autoencoders with nonlinear f and g can learn more
powerful nonlinear generalizations of PCA
• But high capacity is not desirable as seen next

15
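The claim can be checked numerically. The sketch below (my own illustration on an assumed synthetic dataset) trains a linear autoencoder with MSE and compares the subspace spanned by its decoder weights against the top principal components; cosines of the principal angles near 1 indicate the two subspaces coincide, even though the individual weight vectors need not equal the principal directions.

```python
import torch

torch.manual_seed(0)
# Synthetic data with a dominant 2-D principal subspace inside R^10
X = torch.randn(500, 2) @ torch.randn(2, 10) + 0.05 * torch.randn(500, 10)
X = X - X.mean(0)

# Linear autoencoder: 2-D code, no nonlinearity, MSE loss
enc = torch.nn.Linear(10, 2, bias=False)
dec = torch.nn.Linear(2, 10, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
for _ in range(2000):
    loss = ((dec(enc(X)) - X) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Principal subspace from PCA (top-2 right singular vectors of X)
U, S, Vt = torch.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:2]                                    # shape (2, 10)

# Orthonormal basis of the decoder's column space, shape (10, 2)
dec_basis = torch.linalg.qr(dec.weight.detach()).Q

# Cosines of principal angles between the two subspaces
overlap = torch.linalg.svdvals(pca_basis @ dec_basis)
print(overlap)   # both values should be close to 1 => same subspace as PCA
```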
Autoencoder training using a loss function
• Encoder f and decoder g:
  f : X → h
  g : h → X
  arg min over f, g of ||X − g(f(X))||²
[Figure: an autoencoder with 3 fully connected hidden layers; encoder f, code h, decoder g]
• One hidden layer
• Non-linear encoder
  • Takes input x ∈ R^d
  • Maps it into output h ∈ R^p
    h = σ1(Wx + b)
    x' = σ2(W'h + b')
  • σ is an element-wise activation function such as a sigmoid or ReLU
16
Trained to minimize reconstruction error (such as the sum of squared errors)

17
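The equations above can be written out directly. The following NumPy sketch (weights random and untrained, purely for illustration) computes h = σ1(Wx + b), the reconstruction x' = σ2(W'h + b'), and the sum-of-squared-errors reconstruction loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, p = 784, 64                     # input dimension d, code dimension p (p < d)
rng = np.random.default_rng(0)
W,  b  = rng.normal(0, 0.01, (p, d)), np.zeros(p)    # encoder parameters
W2, b2 = rng.normal(0, 0.01, (d, p)), np.zeros(d)    # decoder parameters

x = rng.random(d)                  # a single input in R^d
h = sigmoid(W @ x + b)             # h  = sigma_1(W x + b)
x_rec = sigmoid(W2 @ h + b2)       # x' = sigma_2(W' h + b')

loss = np.sum((x - x_rec) ** 2)    # reconstruction error: sum of squared errors
print(h.shape, x_rec.shape, loss)
```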

• If encoder f and decoder g are allowed too much capacity
  • the autoencoder can learn to perform the copying task without learning any useful information about the distribution of the data
• An autoencoder with a one-dimensional code and a very powerful nonlinear encoder can learn to map each training example x^(i) to the code i
  • The decoder can learn to map these integer indices back to the values of specific training examples
• Autoencoder trained for copying task fails to learn
anything useful if f/g capacity is too great

18
Cases when Autoencoder Learning Fails

• Cases where autoencoders fail to learn anything useful:
1. Capacity of encoder/decoder f/g is too high
• Capacity controlled by depth
2. Hidden code h has dimension equal to input x
3. Overcomplete case: where hidden code h has
dimension greater than input x
• Even a linear encoder/decoder can learn to copy input
to output without learning anything useful about data
distribution

19
Right Autoencoder Design: Use regularization

• Ideally, choose the code size (dimension of h) to be small and the capacity of encoder f and decoder g based on the complexity of the distribution being modeled
• Regularized autoencoders provide the ability to do so
• Rather than limiting model capacity by keeping
encoder/decoder shallow and code size small
• They use a loss function that encourages the model to have properties other than copying its input to its output

20
2. Regularized Autoencoder Properties
• Regularized AEs have properties beyond copying
input to output:
• Sparsity of representation
• Smallness of the derivative of the representation
• Robustness to noise
• Robustness to missing inputs
• Regularized autoencoder can be nonlinear and
overcomplete
• But still learn something useful about the data distribution
even if model capacity is great enough to learn trivial identity
function

21
Generative Models Viewed as Autoencoders

• Beyond regularized autoencoders


• Generative models with latent variables and an
inference procedure (for computing latent
representations given input) can be viewed as a
particular form of autoencoder
• Generative modeling approaches which emphasize
connection with autoencoders are descendants of
Helmholtz machine:
1. Variational autoencoder
2. Generative stochastic networks

22
[Figure: latent variables treated as distributions. Source: https://www.jeremyjordan.me/variational-autoencoders/]
23
Variational Autoencoder
• VAE is a generative model
• able to generate samples that look like samples from training
data
• With MNIST, these fake samples would be synthetic images
of digits

• Due to the random variable between input and output, it cannot be trained directly using backprop
  • Instead, backprop proceeds through the parameters of the latent distribution
  • This is called the reparameterization trick: a sample from N(μ, Σ) is written as μ + Σ^(1/2) ε with ε ~ N(0, I), where Σ is diagonal
24
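A minimal sketch of the reparameterization trick, assuming a diagonal covariance and the common convention (not stated on the slide) that the encoder outputs μ and log σ²: the sample is a deterministic function of μ and σ plus independent noise, so gradients can flow back through μ and σ.

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(sigma^2)) as z = mu + sigma * eps, eps ~ N(0, I)."""
    sigma = torch.exp(0.5 * logvar)       # sigma = sqrt of the diagonal of Sigma
    eps = torch.randn_like(sigma)         # eps ~ N(0, I); no gradient flows into eps
    return mu + sigma * eps               # differentiable w.r.t. mu and sigma

# Example: a batch of 8 latent codes of dimension 4
mu = torch.zeros(8, 4, requires_grad=True)
logvar = torch.zeros(8, 4, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()                        # gradients reach mu and logvar
print(mu.grad.shape, logvar.grad.shape)
```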

25
Sparse Autoencoder
Only a few nodes are encouraged to activate when a single sample is fed into the network.
Fewer nodes activating while still keeping its performance would guarantee that the autoencoder is actually learning latent representations instead of redundant information in our input data.
26

27
Sparse Autoencoder Loss Function

• A sparse autoencoder is an autoencoder whose training criterion includes a sparsity penalty Ω(h) on the code layer h, in addition to the reconstruction error:
  L(x, g(f(x))) + Ω(h)
• where g (h) is the decoder output and typically we have h = f (x)
• Sparse autoencoders are typically used to learn features for another task, such as classification
• An autoencoder that has been trained to be sparse
must respond to unique statistical features of the
dataset rather than simply perform the copying task
• Thus sparsity penalty can yield a model that has learned
useful features as a byproduct
28
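As an illustration, one common choice of Ω (an assumption here; the slide does not fix its form) is an L1 penalty on the code, Ω(h) = λ Σi |hi|. A hedged PyTorch sketch of the resulting training objective:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
lam = 1e-3                                       # sparsity weight (illustrative value)

x = torch.rand(64, 784)                          # stand-in minibatch
for _ in range(100):
    h = encoder(x)
    r = decoder(h)
    recon = ((r - x) ** 2).mean()                # L(x, g(f(x)))
    sparsity = lam * h.abs().sum(dim=1).mean()   # Omega(h) = lambda * sum_i |h_i|
    loss = recon + sparsity
    opt.zero_grad(); loss.backward(); opt.step()

print(f"fraction of near-zero code units: {(h.abs() < 1e-3).float().mean():.2f}")
```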
Sparse Encoders don't have a Bayesian Interpretation
• Penalty term Ω(h) is a regularizer term added to a
feedforward network whose
• Primary task: copy input to output (with Unsupervised learning
objective)
• Also perform some supervised task (with Supervised learning
objective) that depends on the sparse features
• In supervised learning regularization term
corresponds to prior probabilities over model
parameters
• Regularized MLE corresponds to maximizing p(θ|x), which is
equivalent to maximizing log p(x|θ)+log p(θ)
• First term is data log-likelihood and second term is log-prior over
parameters
29
• Regularizer depends on data and thus is not a prior
• Instead, regularization terms express a preference over functions

30
Generative Model view of Sparse Autoencoder

• Rather than thinking of the sparsity penalty as a regularizer for the copying task, think of a sparse autoencoder as approximating maximum likelihood training of a generative model that has latent variables
• Suppose model has visible/latent variables x and h
• Explicit joint distribution is pmodel(x,h) = pmodel(h) pmodel(x|h)
• where pmodel(h) is model’s prior distribution over latent variables
• Different from p(θ) being distribution of parameters
• The log-likelihood can be decomposed as log pmodel(x) = log Σh pmodel(h, x)

• The autoencoder approximates the sum with a point estimate for just one highly likely value of h, the output of a parametric encoder
• With that chosen h, we are maximizing log pmodel(x, h) = log pmodel(h) + log pmodel(x|h)
Denoising Autoencoders (DAE)

• Rather than adding a penalty Ω to the cost function, we can obtain an autoencoder that learns something useful by changing the reconstruction error term of the cost function
• Traditional autoencoders minimize L(x, g ( f (x)))
• where L is a loss function penalizing g( f (x)) for being
dissimilar from x, such as L2 norm of difference: mean
squared error
• A DAE minimizes L(x, g(f(x̃)))
  • where x̃ is a copy of x that has been corrupted by some form of noise
• The autoencoder must undo this corruption rather than simply copy its input
• Denoising training forces f and g to implicitly learn the
structure of pdata(x)
• Another example of how useful properties can emerge as a by-product of minimizing reconstruction error
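A sketch of denoising training, assuming additive Gaussian corruption (the slide leaves the noise form open) and an illustrative architecture: the loss compares the reconstruction of the corrupted copy x̃ against the clean x.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                       # clean stand-in minibatch
for _ in range(100):
    x_tilde = x + 0.3 * torch.randn_like(x)   # corrupted copy of x (Gaussian noise)
    r = model(x_tilde)                        # reconstruct from the corrupted input
    loss = ((r - x) ** 2).mean()              # L(x, g(f(x_tilde))): target is the CLEAN x
    opt.zero_grad(); loss.backward(); opt.step()
```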
Regularizing by Penalizing Derivatives
• Another strategy for regularizing an autoencoder
• Use penalty as in sparse autoencoders
L(x, g ( f (x))) + Ω(h,x)
• But with a different form of Ω:
  Ω(h, x) = λ Σi ||∇x hi||²
• Forces the model to learn a function that does not
change much when x changes slightly
• Called a Contractive Auto Encoder (CAE)
• This model has theoretical connections to
• Denoising autoencoders
• Manifold learning
• Probabilistic modeling
28
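For a one-layer sigmoid encoder (an assumed form chosen so the Jacobian has a cheap closed form), ∇x hi = hi(1 − hi) Wi, so the penalty Σi ||∇x hi||² can be computed without automatic differentiation. A sketch:

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 64)                   # one-layer encoder with sigmoid units (assumed form)
dec = nn.Linear(64, 784)
lam = 1e-4                                 # illustrative penalty weight

def contractive_loss(x):
    h = torch.sigmoid(enc(x))                          # h = sigma(Wx + b)
    r = dec(h)
    recon = ((r - x) ** 2).mean()
    # For sigmoid units, d h_i / d x = h_i (1 - h_i) * W_i (row i of the weight matrix),
    # so sum_i ||grad_x h_i||^2 = sum_i (h_i(1-h_i))^2 * ||W_i||^2 (averaged over the batch).
    dh = (h * (1 - h)) ** 2                            # shape (batch, 64)
    w_norms = (enc.weight ** 2).sum(dim=1)             # ||W_i||^2, shape (64,)
    penalty = (dh * w_norms).sum(dim=1).mean()
    return recon + lam * penalty

x = torch.rand(32, 784)
print(contractive_loss(x))
```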
3. Representational Power, Layer Size and Depth

• Autoencoders are often trained with a single-layer encoder and a single-layer decoder
• However, using a deep encoder offers many advantages
• Recall: although the universal approximation theorem states that a single hidden layer is sufficient, there are disadvantages:
  1. The number of units needed may be too large
  2. It may not generalize well
• Common strategy: greedily pretrain a stack of shallow autoencoders (sketched below)

29
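A sketch of the greedy pretraining strategy mentioned above, with assumed layer sizes and stand-in data: each shallow autoencoder is trained on the codes produced by the previous one, and the trained encoders are then stacked into a deep encoder.

```python
import torch
import torch.nn as nn

def train_shallow_ae(data, code_dim, steps=200):
    """Train one shallow autoencoder on `data`; return its encoder and the resulting codes."""
    in_dim = data.shape[1]
    enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
    dec = nn.Linear(code_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = ((dec(enc(data)) - data) ** 2).mean()   # reconstruction error
        opt.zero_grad(); loss.backward(); opt.step()
    return enc, enc(data).detach()

x = torch.rand(256, 784)                  # stand-in data
enc1, h1 = train_shallow_ae(x, 256)       # first shallow autoencoder on the raw input
enc2, h2 = train_shallow_ae(h1, 64)       # second one trained on the first one's codes
deep_encoder = nn.Sequential(enc1, enc2)  # stacked (pretrained) deep encoder
print(deep_encoder(x).shape)              # torch.Size([256, 64])
```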
4. Stochastic Encoders and Decoders
• General strategy for designing the output units
and loss function of a feedforward network is to
• Define the output distribution p(y|x)
• Minimize the negative log-likelihood –log p(y|x)
• In this setting y is a vector of targets such as class labels
• In an autoencoder x is the target as well as the input
• Yet we can apply the same machinery as before, as we see next

30
Loss function for Stochastic Decoder

• Given a hidden code h, we may think of the decoder as providing a conditional distribution pdecoder(x|h)
• We train the autoencoder by minimizing –log pdecoder(x|h)
• The exact form of this loss function will change
depending on the form of pdecoder(x|h)
• As with feedforward networks we use linear output
units to parameterize the mean of the Gaussian
distribution if x is real
• In this case negative log-likelihood is the mean-squared error
• Binary values of x correspond to a Bernoulli distribution with parameters given by a sigmoid output unit
• Discrete x values correspond to a softmax distribution
• The output variables are treated as being conditionally independent given h
31
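The correspondence between pdecoder(x|h) and the loss can be made concrete. The sketch below (illustrative shapes and modules, not the slides' own code) shows the Gaussian case, whose negative log-likelihood reduces to mean squared error, and the Bernoulli case, whose negative log-likelihood is the cross-entropy with sigmoid-parameterized probabilities.

```python
import torch
import torch.nn.functional as F

h = torch.randn(16, 32)                         # a batch of codes
decoder = torch.nn.Linear(32, 784)              # linear output units

# Real-valued x, Gaussian p_decoder(x|h) with unit variance:
# -log p(x|h) reduces to mean squared error (up to an additive constant).
x_real = torch.randn(16, 784)
mu = decoder(h)                                 # mean of the Gaussian
nll_gaussian = F.mse_loss(mu, x_real)

# Binary x, Bernoulli p_decoder(x|h) with sigmoid-parameterized probabilities:
# -log p(x|h) is the binary cross-entropy.
x_bin = torch.randint(0, 2, (16, 784)).float()
logits = decoder(h)
nll_bernoulli = F.binary_cross_entropy_with_logits(logits, x_bin)

print(nll_gaussian.item(), nll_bernoulli.item())
```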
Stochastic encoder

• We can also generalize the notion of an encoding function f(x) to an encoding distribution pencoder(h|x)

32
Structure of stochastic autoencoder

• The encoder and decoder are not simple functions; each involves a distribution
• The output is sampled from a distribution: pencoder(h|x) for the encoder and pdecoder(x|h) for the decoder

33
Relationship to joint distribution

• Any latent variable model pmodel(h, x) defines a stochastic encoder pencoder(h|x) = pmodel(h|x)
• And a stochastic decoder pdecoder(x|h) = pmodel(x|h)
• In general the encoder and decoder distributions
are not conditional distributions compatible with
a unique joint distribution pmodel(x,h)
• Training the autoencoder as a denoising autoencoder
will tend to make them compatible asymptotically
• With enough capacity and examples

34
Sampling pmodel(h|x)
[Figure: sampling via the stochastic encoder pencoder(h|x) and stochastic decoder pdecoder(x|h)]
35
Ex: Sampling p(x|h): Deepstyle
• Boils images down to a representation that relates to style
  • By iterating a neural network over a set of images, it learns efficient representations
• Choosing a random numerical description in the encoded space will generate new images of styles not seen
• Using one input image and changing values along different dimensions of the feature space, you can see how the generated image changes (patterning, color, texture) in style space
36
