Deep Learning Tutorial: Autoencoders
Jen-Tzung Chien
Department of Electrical and Computer Engineering
Department of Computer Science
National Chiao Tung University, Hsinchu
September 26, 2016
Outline
• Introduction
• Undercomplete Autoencoders
• Representational Power, Layer Size and Depth
• Stochastic Encoders and Decoders
• Denoising Autoencoders
• Learning Manifolds with Autoencoders
• Predictive Sparse Decomposition
• Applications of Autoencoders
Introduction
• An autoencoder is a neural network that is trained to attempt to copy its input
to its output
• Internally, it has a hidden layer h that describes a code used to represent the
input
• The network may be viewed as consisting of two parts: an encoder function
h = f(x) and a decoder that produces a reconstruction r = g(h)
• If an autoencoder succeeds in simply learning to set g(f(x)) = x everywhere,
then it is not especially useful
• Instead, autoencoders are designed to be unable to learn to copy perfectly
Autoencoder Graphical Model
[Figure: graphical model of an autoencoder, with an encoder f mapping the input x to the code h and a decoder g mapping h to the reconstruction r]
Introduction
• Modern autoencoders have generalized the idea of an encoder and a decoder
beyond deterministic functions to stochastic mappings pencoder(h|x) and
pdecoder(x|h)
• Traditionally, autoencoders were used for dimensionality reduction or feature
learning
• Recently, theoretical connections between autoencoders and latent variable
models have brought autoencoders to the forefront of generative modeling
• Unlike general feedforward networks, autoencoders may also be trained using
recirculation (Hinton and McClelland, 1988), a learning algorithm based on
comparing the activations of the network on the original input to the activations
on the reconstructed input
Undercomplete Autoencoders
• We hope that training the autoencoder to perform the input copying task will
result in h taking on useful properties
• One way to obtain useful features from the autoencoder is to constrain h to
have smaller dimension than x
• An autoencoder whose code dimension is less than the input dimension is
called undercomplete
• Learning an undercomplete representation forces the autoencoder to capture
the most salient features of the training data
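As a concrete illustration (my own sketch, not taken from the slides), an undercomplete autoencoder can be written in PyTorch as an encoder f and decoder g in which the code layer is smaller than the input; the layer sizes and activation below are arbitrary choices for the example.

import torch
import torch.nn as nn

class UndercompleteAutoencoder(nn.Module):
    """Autoencoder whose code dimension is smaller than the input dimension."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())  # h = f(x)
        self.decoder = nn.Linear(code_dim, input_dim)                            # r = g(h)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)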
Learning Process
• The learning process is described simply as minimizing a loss function
L(x, g(f(x)))
where L is a loss function penalizing g(f(x)) for being dissimilar from x, such
as the mean squared error
• When the decoder is linear and L is the mean squared error, an undercomplete
autoencoder learns to span the same subspace as PCA
• In this case, an autoencoder trained to perform the copying task has learned
the principal subspace of the training data as a side-effect
• If the encoder and decoder are allowed too much capacity, the autoencoder
can learn to perform the copying task without extracting useful information
about the distribution of the data
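A minimal training loop for this copying objective might look as follows (a sketch with placeholder data and arbitrary sizes, not a definitive implementation); the sequential model plays the role of g(f(x)) with a 32-unit code.

import torch
import torch.nn as nn

# f followed by g: the 32-unit code layer is smaller than the 784-dim input
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                    # L(x, g(f(x))) as mean squared error

data = torch.rand(256, 784)               # placeholder standing in for real training data
for epoch in range(10):
    recon = model(data)
    loss = loss_fn(recon, data)           # penalize g(f(x)) for being dissimilar from x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()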
Regularized Autoencoders
• A similar problem occurs if the hidden code is allowed to have dimension
equal to the input, and in the overcomplete case in which the hidden code has
dimension greater than the input
• In these cases, even a linear encoder and linear decoder can learn to copy
the input to the output without learning anything useful about the data
distribution
• Rather than limiting the model capacity by keeping the encoder and decoder
shallow and the code size small, regularized autoencoders use a loss function
that encourages the model to have other properties besides the ability to copy
its input to its output
• A regularized autoencoder can be nonlinear and overcomplete but still learn
something useful about the data distribution even if the model capacity is
great enough to learn a trivial identity function
Sparse Autoencoders
• A sparse autoencoder is simply an autoencoder whose training criterion involves
a sparsity penalty Ω(h) on the code layer h, in addition to the reconstruction
error:
L(x, g(f(x))) + Ω(h)
where g(h) is the decoder output and typically we have h = f(x), the encoder
output
• Sparse autoencoders are typically used to learn features for another task such
as classification
• An autoencoder that has been regularized to be sparse must respond to unique
statistical features of the dataset it has been trained on, rather than simply
acting as an identity function
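A hedged sketch of the penalized criterion L(x, g(f(x))) + Ω(h) with an absolute value (L1) penalty on the code; the weight λ and layer sizes are illustrative only.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())    # f
decoder = nn.Linear(128, 784)                              # g
lam = 1e-3                                                 # sparsity weight λ

def sparse_ae_loss(x):
    h = encoder(x)                                         # code h = f(x)
    recon = decoder(h)                                     # g(h)
    reconstruction_error = ((recon - x) ** 2).mean()       # L(x, g(f(x)))
    sparsity_penalty = lam * h.abs().sum(dim=1).mean()     # Ω(h) = λ Σ_i |h_i|
    return reconstruction_error + sparsity_penalty

x = torch.rand(64, 784)
print(sparse_ae_loss(x))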
Sparse Autoencoders
• We can think of the penalty Ω(h) simply as a regularizer term added
to a feedforward network whose primary task is to copy the input to
the output (unsupervised learning objective) and possibly also perform some
supervised task (with a supervised learning objective) that depends on these
sparse features
• Training with weight decay and other regularization penalties can be interpreted
as a MAP approximation to Bayesian inference, with the added regularizing
penalty corresponding to a prior probability distribution over the model
parameters
• Regularized autoencoders defy such an interpretation because the regularizer
depends on the data and is therefore by definition not a prior in the formal
sense of the word
Sparse Autoencoders
• Rather than thinking of the sparsity penalty as a regularizer for the copying
task, we can think of the entire sparse autoencoder framework as approximating
maximum likelihood training of a generative model that has latent variables
• Suppose we have a model with visible variables x and latent variables h, with
an explicit joint distribution pmodel(x, h) = pmodel(h)pmodel(x|h)
• We refer to pmodel(h) as the model’s prior distribution over the latent variables,
representing the model’s beliefs prior to seeing x
log pmodel(x, h) = log pmodel(h) + log pmodel(x|h)
Sparse Autoencoders
• The log pmodel(h) term can be sparsity-inducing. For example, the Laplace
prior
pmodel(hi) = (λ/2) e^(−λ|hi|)
corresponds to an absolute value sparsity penalty
−log pmodel(h) = Σ_i ( λ|hi| − log(λ/2) ) = Ω(h) + const
Ω(h) = λ Σ_i |hi|
where the constant term depends only on λ and not on h
• We typically treat λ as a hyperparameter and discard the constant term since
it does not affect the parameter learning
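As a quick numerical check of the algebra above (my own verification, not part of the slides), the negative log of a factorized Laplace prior does split into λ Σ_i |hi| plus a term that depends only on λ.

import math
import torch

lam = 0.5
h = torch.tensor([-1.2, 0.0, 3.4])                     # an arbitrary code vector

# log pmodel(h) for a factorized Laplace prior: Σ_i [log(λ/2) − λ|h_i|]
log_prior = len(h) * math.log(lam / 2) - lam * h.abs().sum()
omega = lam * h.abs().sum()                            # Ω(h) = λ Σ_i |h_i|
const = -len(h) * math.log(lam / 2)                    # depends only on λ, not on h
print(torch.isclose(-log_prior, omega + const))        # tensor(True)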
Denoising Autoencoders
• A denoising autoencoder or DAE instead minimizes
L(x, g(f(x̃)))
where x̃ is a copy of x that has been corrupted by some form of noise
• Denoising training forces f and g to implicitly learn the structure of pdata(x),
as shown by Alain and Bengio (2014) and Bengio et al. (2013)
Regularizing by Penalizing Derivatives
• Another strategy for regularizing an autoencoder is to use a penalty Ω as in
sparse autoencoders
L(x, g(f(x))) + Ω(x, h)
but with a different form of Ω:
Ω(x, h) = λ Σ_i ||∇x hi||²
• This forces the model to learn a function that does not change much when x
changes slightly
• An autoencoder regularized in this way is called a contractive autoencoder or
CAE
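One way to compute the derivative penalty Ω(x, h) = λ Σ_i ||∇x hi||² in practice is with automatic differentiation; the sketch below (illustrative sizes and an untrained stand-in encoder, not the authors' code) differentiates each code unit with respect to the input and sums the squared gradients over a batch.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(10, 4), nn.Sigmoid())   # hypothetical f(x)

def derivative_penalty(x, lam=0.1):
    # Ω(x, h) = λ Σ_i ||∇_x h_i||², summed over the batch
    x = x.clone().requires_grad_(True)
    h = encoder(x)
    penalty = torch.zeros(())
    for i in range(h.shape[-1]):
        grad_i, = torch.autograd.grad(h[..., i].sum(), x, create_graph=True)
        penalty = penalty + (grad_i ** 2).sum()
    return lam * penalty

x = torch.randn(8, 10)
print(derivative_penalty(x))   # can be added to the reconstruction loss and backpropagated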
Representational Power, Layer Size and Depth
• An autoencoder with a single hidden layer is able to represent the identity
function along the domain of the data arbitrarily well
• A deep autoencoder, with at least one additional hidden layer inside the encoder
itself, can approximate any mapping from input to code arbitrarily well, given
enough hidden units
• A common strategy for training a deep autoencoder is to greedily pretrain the
deep architecture by training a stack of shallow autoencoders
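A rough sketch of this greedy pretraining strategy (my example, with placeholder data and sizes): each shallow autoencoder is trained to reconstruct the codes produced by the previous one, and the trained encoders are then stacked into a deep encoder that can be fine-tuned with another criterion.

import torch
import torch.nn as nn

def train_shallow_ae(data, code_dim, epochs=20):
    # Train one shallow autoencoder on `data` and return its encoder.
    in_dim = data.shape[1]
    enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
    dec = nn.Linear(code_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        recon = dec(enc(data))
        loss = ((recon - data) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc

data = torch.rand(256, 784)                  # placeholder training data
enc1 = train_shallow_ae(data, 128)           # first shallow autoencoder on the raw input
codes1 = enc1(data).detach()
enc2 = train_shallow_ae(codes1, 32)          # second autoencoder trained on the codes
deep_encoder = nn.Sequential(enc1, enc2)     # stacked deep encoder, ready for fine-tuning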
Stochastic Encoders and Decoders
• Any latent variable model pmodel(h, x) defines a stochastic encoder
pencoder(h|x) = pmodel(h|x)
• and a stochastic decoder
pdecoder(x|h) = pmodel(x|h)
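As an illustration (my example, not from the slides): if the stochastic decoder pdecoder(x|h) is chosen to be a Gaussian with mean g(h) and identity covariance, its negative log-likelihood is squared reconstruction error plus a constant, which links the stochastic view back to mean squared error training. The decoder below is an untrained stand-in.

import math
import torch
import torch.nn as nn

decoder_mean = nn.Linear(32, 784)             # g(h), a hypothetical decoder

def decoder_nll(x, h):
    # -log pdecoder(x|h) for pdecoder(x|h) = N(x; g(h), I)
    mean = decoder_mean(h)
    d = x.shape[-1]
    return 0.5 * ((x - mean) ** 2).sum(dim=-1) + 0.5 * d * math.log(2 * math.pi)

x = torch.rand(4, 784)
h = torch.randn(4, 32)
print(decoder_nll(x, h))                      # per-example negative log-likelihood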
Denoising Autoencoders
• The denoising autoencoder (DAE) is an autoencoder that receives a corrupted
data point as input and is trained to predict the original, uncorrupted data
point as its output
• We introduce a corruption process C(x̂|x) which represents a conditional
distribution over corrupted samples x̂, given a data sample x
DAE Training Procedure
Algorithm 1 DAE training procedure
1: Sample a training example x from the training data
2: Sample a corrupted version x̂ from C(x̂|x)
3: Use (x̂, x) as a training example for estimating the autoencoder reconstruction
distribution preconstruct(x|x̂) = pdecoder(x|h), where h is the output of the
encoder f(x̂) and pdecoder is typically defined by a decoder g(h)
• We can therefore view the DAE as performing stochastic gradient descent on
the following expectation
−E_{x∼p̂data(x)} E_{x̂∼C(x̂|x)} [log pdecoder(x | h = f(x̂))]
where p̂data(x) is the training distribution
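A minimal sketch of one DAE training step, with additive Gaussian noise standing in for the corruption process C(x̂|x) and mean squared error standing in for −log pdecoder(x|h); the noise level and layer sizes are arbitrary choices for the example.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())     # f
decoder = nn.Linear(128, 784)                               # g
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                                     # clean training examples
x_corrupted = x + 0.3 * torch.randn_like(x)                 # sample from C(x̂ | x)
recon = decoder(encoder(x_corrupted))                       # g(f(x̂))
loss = ((recon - x) ** 2).mean()                            # reconstruct the clean x, not x̂
optimizer.zero_grad()
loss.backward()
optimizer.step()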
Vector Field g(f(x)) − x
[Figure: vector field of g(f(x)) − x learned by a denoising autoencoder; the reconstruction direction points from each input x back toward the data manifold]
Historical Perspective
• The name "denoising autoencoder" refers to a model that is intended not
merely to learn to denoise its input but to learn a good internal representation
as a side effect of learning to denoise (Vincent et al., 2008; Vincent et al.,
2010)
• Prior to the introduction of the modern DAE, Inayoshi and Kurita (2005)
explored some of the same goals with some of the same methods
• Their approach minimizes reconstruction error in addition to a supervised
objective while injecting noise into the hidden layer of a supervised MLP, with
the aim of improving generalization through the added reconstruction error and
injected noise
Learning Manifolds with Autoencoders
• Like many other machine learning algorithms, autoencoders exploit the idea
that data concentrates around a low-dimensional manifold or a small set of
such manifolds
• At a point x on a d-dimensional manifold, the tangent plane is given by d basis
vectors that span the local directions of variation allowed on the manifold
• These local directions specify how one can change x infinitesimally while
staying on the manifold
Tangent Planes
[Figure: a tangent plane at a point x on a low-dimensional manifold, spanned by the local directions of variation that keep x on the manifold]
Learning Manifolds with Autoencoders
• All autoencoder training procedures involve a compromise between two forces:
– Learning a representation h of a training example x such that x can be
approximately recovered from h through a decoder
– Satisfying the constraint or regularization penalty
• The two forces together are useful because they force the hidden representation
to capture information about the structure of the data generating distribution
One-Dimensional Example
Figure 1: If the autoencoder learns a reconstruction function that is invariant to
small perturbations near the data points, it captures the manifold structure of
the data
Non-parametric Manifold Learning
Figure 2: Non-parametric manifold learning procedures build a nearest neighbor
graph in which nodes represent training examples and directed edges indicate
nearest neighbor relationships
Global Coordinate System
Figure 3: Each local patch can be thought of as a local Euclidean coordinate
system or as a locally flat Gaussian, or "pancake", with a very small variance in
the directions orthogonal to the pancake and a very large variance in the directions
defining the coordinate system on the pancake
Contractive Autoencoders
• The contractive autoencoder (Rifai et al., 2011a; Rifai et al., 2011b)
introduces an explicit regularizer on the code h = f(x), encouraging the
derivatives of f to be as small as possible:
Ω(h) = λ ||∂f(x)/∂x||²_F
• Denoising autoencoders make the reconstruction function resist small but
finite-sized perturbations of the input, while contractive autoencoders make
the feature extraction function resist infinitesimal perturbations of the input
Tangent Vectors
• The goal of the CAE is to learn the manifold structure of the data
• Directions x with large Jx, where J = ∂f(x)/∂x is the Jacobian of the encoder
at a point x, rapidly change h, so these are likely to be directions that
approximate the tangent planes of the manifold
• The directions corresponding to the largest singular values are interpreted as
the tangent directions that the contractive autoencoder has learned
• Ideally, these tangent directions should correspond to real variations in the
data
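A sketch of extracting candidate tangent directions from an encoder (illustrative only; the encoder here is an untrained stand-in for a trained CAE): compute the Jacobian J = ∂f(x)/∂x at a point and take the right-singular vectors with the largest singular values.

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

encoder = nn.Sequential(nn.Linear(10, 4), nn.Tanh())   # stand-in for a trained f

x0 = torch.randn(10)                                   # a single data point
J = jacobian(encoder, x0)                              # Jacobian of h = f(x) at x0, shape (4, 10)
U, S, Vh = torch.linalg.svd(J)
tangent_directions = Vh[:2]                            # input directions with the largest effect on h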
Tangent Vectors of the Manifold
Figure 4: Although both local PCA and the CAE can capture local tangents, the
CAE is able to form more accurate estimates from limited training data because
it exploits parameter sharing across different locations that share a subset of
active hidden units. The CAE tangent directions typically correspond to moving
or changing parts of the object (such as the head or legs)
Predictive Sparse Decomposition
• Predictive sparse decomposition (PSD) is a model that is a hybrid of
sparse coding and parametric autoencoders (Kavukcuoglu et al., 2010)
• Training proceeds by minimizing
||x − g(h)||² + λ|h|₁ + ||h − f(x)||²
• Predictive sparse decomposition is an example of learned approximate inference
• PSD models may be stacked and used to initialize a deep network to be trained
with another criterion
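A minimal sketch of the PSD objective above (illustrative sizes and untrained modules, not the authors' code); here h is treated as a free code variable, as in sparse coding, that would be optimized alongside the parameters of f and g during training.

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(10, 4), nn.Tanh())   # parametric encoder f(x)
g = nn.Linear(4, 10)                             # decoder g(h)
lam = 0.1

def psd_objective(x, h):
    # ||x - g(h)||^2 + λ|h|_1 + ||h - f(x)||^2
    return (((x - g(h)) ** 2).sum()
            + lam * h.abs().sum()
            + ((h - f(x)) ** 2).sum())

x = torch.rand(8, 10)
h = torch.zeros(8, 4, requires_grad=True)        # code treated as a free variable
print(psd_objective(x, h))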
Applications of Autoencoders
• Autoencoders have been successfully applied to dimensionality reduction and
information retrieval tasks
• Hinton and Salakhutdinov (2006) trained a stack of RBMs and then used their
weights to initialize a deep autoencoder
• The resulting code yielded lower reconstruction error than reducing the data
to 30 dimensions with PCA
• One task that benefits even more than usual from dimensionality reduction is
information retrieval, the task of finding entries in a database that resemble a
query entry
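One way such retrieval can work (a generic sketch, not the specific method from the slides): encode every database entry with a trained encoder and return the entries whose low-dimensional codes are closest to the code of the query. The encoder below is an untrained stand-in.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())   # stand-in for a trained encoder

database = torch.rand(1000, 784)                         # entries to index
query = torch.rand(1, 784)

with torch.no_grad():
    db_codes = encoder(database)
    q_code = encoder(query)
    distances = torch.cdist(q_code, db_codes)            # (1, 1000) pairwise distances in code space
    nearest = distances.argsort(dim=1)[0, :5]            # indices of the 5 closest entries
print(nearest)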
References
[1] G. E. Hinton and J. L. McClelland, “Learning representations by recirculation,” in Neural information processing
systems. New York: American Institute of Physics, 1988, pp. 358–366.
[2] G. Alain and Y. Bengio, “What regularized auto-encoders learn from the data-generating distribution.” Journal
of Machine Learning Research, vol. 15, no. 1, pp. 3563–3593, 2014.
[3] Y. Bengio, L. Yao, G. Alain, and P. Vincent, “Generalized denoising auto-encoders as generative models,” in
Advances in Neural Information Processing Systems, 2013, pp. 899–907.
[4] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with
denoising autoencoders,” in Proceedings of the 25th international conference on Machine learning. ACM,
2008, pp. 1096–1103.
[5] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning
useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research,
vol. 11, no. Dec, pp. 3371–3408, 2010.
[6] H. Inayoshi and T. Kurita, “Improved generalization by adding both auto-association and hidden-layer-noise to
neural-network-based-classifiers,” in 2005 IEEE Workshop on Machine Learning for Signal Processing. IEEE,
2005, pp. 141–146.
[7] S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio, Y. Dauphin, and X. Glorot, “Higher order contractive
auto-encoder,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases.
Springer, 2011a, pp. 645–660.
[8] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: Explicit invariance
during feature extraction,” in Proceedings of the 28th international conference on machine learning (ICML-11),
2011b, pp. 833–840.
[9] K. Kavukcuoglu, M. Ranzato, and Y. LeCun, “Fast inference in sparse coding algorithms with applications to
object recognition,” arXiv preprint arXiv:1010.3467, 2010.
[10] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science,
vol. 313, no. 5786, pp. 504–507, 2006.