
DEEP LEARNING

UNIT - III

By,
Dr. Himani Deshpande (TSEC, Mumbai)
UNIT – III AUTOENCODERS

3.1 Introduction, Linear Autoencoder, Undercomplete Autoencoder, Overcomplete Autoencoders, Regularization in Autoencoders

3.2 Denoising Autoencoders, Sparse Autoencoders, Contractive Autoencoders

3.3 Application of Autoencoders: Image Compression
AUTOENCODERS

• An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning).


• Autoencoders are designed to reproduce their input, especially for images.
• The key point is to reproduce the input from a learned encoding.


AUTOENCODER ARCHITECTURE


HIGHLIGHT NOTES

• Just as we highlight and learn the important points for exams instead of learning the whole book chapter, an autoencoder focuses on reproducing the significant information, with some loss.



AUTOENCODERS

• Why can't we just copy the input directly? Because then the latent layers would not learn anything.

PCA AND AUTOENCODERS


APPLICATION: SELF-DRIVING CARS


PROPERTIES OF AUTOENCODERS

• Data-specific
• Unsupervised
• Lossy

• Autoencoders are data-specific, which means that they will only be able to compress data similar to what they have been trained on. For example, an autoencoder trained on pictures of faces would do a rather poor job of compressing pictures of trees, because the features it would learn would be face-specific.
• Autoencoders are lossy, which means that the decompressed outputs will be degraded compared to the original inputs.
• Autoencoders are learned automatically from data examples, which is a useful property: it means that it is easy to train specialized instances of the algorithm that will perform well on a specific type of input. It doesn't require any new engineering, just appropriate training data.


PARTS OF AUTOENCODERS

• Encoder: this part of the network encodes or compresses the input data into a latent-space representation. The compressed data typically looks garbled, nothing like the original data.
• Decoder: this part of the network decodes or reconstructs the encoded data (the latent-space representation) back to the original dimension. The decoded data is a lossy reconstruction of the original data.


AUTO-ENCODER

• An NN encoder compresses the input object (e.g. a 28 × 28 = 784-pixel image) into a code: a compact representation of the input object, usually with fewer than 784 dimensions.
• An NN decoder can reconstruct the original object from the code.
• The encoder and decoder are learned together.


AUTOENCODERS

• Minimize ‖x − x̂‖², i.e. make the output x̂ as close as possible to the input x.
• The input layer x is encoded (weights W) into a hidden bottleneck layer c (linear), which is then decoded (weights W′) into the output layer x̂.
• The output of the hidden layer is the code.
TRAINING AUTOENCODERS

• Code size
• Number of layers
• Number of nodes per layer
• Loss function

• If we are working with image data, the most popular loss functions for reconstruction are MSE loss and L1 loss.
• In case the inputs and outputs are within the range [0,1], as in MNIST, we can also make use of binary cross-entropy as the reconstruction loss.
TRAINING AUTOENCODERS
You need to set four hyperparameters before training an autoencoder:

1. Code size: the code size, or the size of the bottleneck, is the most important hyperparameter used to tune the autoencoder. The bottleneck size decides how much the data has to be compressed. This can also act as a regularisation term.

2. Number of layers: like all neural networks, an important hyperparameter to tune autoencoders is the depth of the encoder and the decoder. While a higher depth increases model complexity, a lower depth is faster to process.

3. Number of nodes per layer: the number of nodes per layer defines the weights we use per layer. Typically, the number of nodes decreases with each subsequent layer in the autoencoder as the input to each of these layers becomes smaller across the layers.

4. Reconstruction loss: the loss function we use to train the autoencoder is highly dependent on the type of input and output we want the autoencoder to adapt to. If we are working with image data, the most popular loss functions for reconstruction are MSE loss and L1 loss. In case the inputs and outputs are within the range [0,1], as in MNIST, we can also make use of binary cross-entropy as the reconstruction loss.
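As an illustration, here is a minimal Keras sketch that makes these four choices explicit. The layer sizes, depth, and loss below are illustrative assumptions, not values from the slides (with TensorFlow 2 the same classes are available under tensorflow.keras):

from keras.layers import Input, Dense
from keras.models import Model

code_size = 32                                   # 1. code size (bottleneck)
input_img = Input(shape=(784,))

# 2./3. number of layers and nodes per layer: 784 -> 128 -> 32 -> 128 -> 784
h = Dense(128, activation='relu')(input_img)
code = Dense(code_size, activation='relu')(h)
h = Dense(128, activation='relu')(code)
output_img = Dense(784, activation='sigmoid')(h)

autoencoder = Model(input_img, output_img)
# 4. reconstruction loss: binary cross-entropy suits inputs scaled to [0, 1] (e.g. MNIST);
#    'mse' would be the usual alternative for general image data
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')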
AUTOENCODER

• Autoencoders are structurally similar to multilayer perceptron (MLP) neural networks: like multilayer perceptrons, autoencoders have an input layer, some hidden layers, and an output layer.
• The key difference between a multilayer perceptron network and an autoencoder is that the output layer of an autoencoder has the same number of neurons as the input layer.


AUTOENCODER

• h = g(W xᵢ + b)
• x̂ᵢ = f(W* h + c)
• The model is trained to minimize a loss function which will ensure that x̂ᵢ is close to xᵢ.
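As a concrete illustration, a direct NumPy rendering of these two equations follows. This is a minimal sketch: the sigmoid choices for g and f, the layer sizes, and the random initialization are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 784, 32
W = rng.normal(scale=0.01, size=(n_hidden, n_in))        # encoder weights W
W_star = rng.normal(scale=0.01, size=(n_in, n_hidden))   # decoder weights W*
b = np.zeros(n_hidden)                                   # encoder bias
c = np.zeros(n_in)                                       # decoder bias

x_i = rng.random(n_in)             # a single (random) input vector
h = sigmoid(W @ x_i + b)           # h = g(W x_i + b)
x_hat = sigmoid(W_star @ h + c)    # x̂_i = f(W* h + c)
loss = np.sum((x_hat - x_i) ** 2)  # squared reconstruction error to be minimized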
TYPES OF AE

• Linear autoencoder
• Overcomplete autoencoder


LINEAR AUTOENCODERS

• A linear autoencoder is a type of autoencoder that uses only linear transformations, such as matrix multiplication and addition, to compress and reconstruct the data.
• A linear autoencoder consists of two parts: an encoder and a decoder. The encoder takes the input data and maps it to a lower-dimensional space. The decoder then takes the compressed representation and reconstructs the original data.
• The goal of training a linear autoencoder is to minimize the reconstruction error between the input and output: L(x, x') = ||x - x'||^2


• In other words, the encoder and decoder are composed of only linear layers. The advantage of using a linear autoencoder is that it is computationally efficient and can be trained on large datasets.
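A minimal Keras sketch of a linear autoencoder with the squared-error loss above; the 784-dimensional input and 32-dimensional code are illustrative assumptions matching the MNIST-style examples used later in these notes:

from keras.layers import Input, Dense
from keras.models import Model

x_in = Input(shape=(784,))
code = Dense(32, activation='linear')(x_in)     # linear encoder: matrix multiply + bias
x_out = Dense(784, activation='linear')(code)   # linear decoder
linear_autoencoder = Model(x_in, x_out)
linear_autoencoder.compile(optimizer='adam', loss='mse')   # minimizes ||x - x'||^2

With linear layers and MSE loss, the learned code spans essentially the same subspace as the leading principal components, which is why linear autoencoders are often compared with PCA.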



UNDERCOMPLETE AUTOENCODER

• Let us consider the case where dim(h) < dim(xᵢ).
• If we are still able to reconstruct x̂ᵢ perfectly from h, then what does it say about h?
• h is a loss-free encoding of xᵢ: it captures all the important characteristics of xᵢ.
• An autoencoder where dim(h) < dim(xᵢ) is called an undercomplete autoencoder.


UNDERCOMPLETE AUTOENCODERS

• An undercomplete autoencoder is one of the simplest types of autoencoders.
• The way it works is very straightforward: an undercomplete autoencoder takes in an image and tries to predict the same image as output, reconstructing the image from the compressed bottleneck region.
• Undercomplete autoencoders are truly unsupervised as they do not take any form of label; the target is the same as the input.


APPLICATIONS OF UNDERCOMPLETE AUTOENCODERS

• Applications of undercomplete autoencoders include compression, recommendation systems, as well as outlier detection.



OVERCOMPLETE AUTOENCODER

• An autoencoder where dim(h) ≥ dim(xᵢ) is called an overcomplete autoencoder.
• In such a case the autoencoder could learn a trivial encoding by simply copying xᵢ into h and then copying h into x̂ᵢ.
• Such an identity encoding is useless in practice, as it does not really tell us anything about the important characteristics of the data.
APPLICATIONS OF OVERCOMPLETE AUTOENCODERS

• Applications of overcomplete autoencoders are very rare and mostly hypothetical scenarios.
• Example: BMI → height and weight. Sometimes, in spite of knowing the BMI, we might need our network to gain knowledge about the height or weight.
• The sparse autoencoder, which is based on regularization, is one example (discussed later).


UNDERCOMPLETE AND OVERCOMPLETE AUTOENCODERS: ARCHITECTURE

• The only difference between the two is the size of the encoding output.


UNDERCOMPLETE AND OVERCOMPLETE AUTOENCODERS

• We can make our latent-space representation learn useful features by giving it smaller dimensions than the input data. In this case the autoencoder is undercomplete. By training an undercomplete representation, we force the autoencoder to learn the most salient features of the training data. If we give the autoencoder too much capacity (for example, if the latent space has almost the same dimensions as the input data), it will just learn the copying task without extracting useful features or information from the data.
• If the dimensions of the latent space are equal to or greater than those of the input data, the autoencoder is overcomplete. In such a case even a linear encoder and linear decoder can learn to copy the input to the output without learning anything useful about the data distribution.


STACKED AUTOENCODER

• A stacked autoencoder is a neural network consisting of several layers of sparse autoencoders, where the output of each hidden layer is connected to the input of the successive hidden layer.
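A minimal Keras sketch of the stacked idea (layer sizes are illustrative assumptions, and the per-layer sparsity penalties are omitted here for brevity): the output of each hidden layer feeds the input of the next.

from keras.layers import Input, Dense
from keras.models import Model

x_in = Input(shape=(784,))
h1 = Dense(256, activation='relu')(x_in)     # first hidden layer
h2 = Dense(64, activation='relu')(h1)        # its output feeds the next hidden layer
code = Dense(32, activation='relu')(h2)      # innermost code
d2 = Dense(64, activation='relu')(code)
d1 = Dense(256, activation='relu')(d2)
x_out = Dense(784, activation='sigmoid')(d1)

stacked_autoencoder = Model(x_in, x_out)
stacked_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

In practice such stacks are often pre-trained greedily, one autoencoder at a time, before fine-tuning the whole network.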


REGULARIZATION IN AUTOENCODERS

• It is fine to lose some data in this case.


REGULARIZATION

• A reliable autoencoder must make a tradeoff between two important properties:
  • being sensitive enough to the inputs so that it can accurately reconstruct the input data, and
  • being able to generalize well even when evaluated on unseen data.
• As a result, the loss function of the autoencoder is composed of two different parts.
• The first part is the reconstruction loss (e.g. mean squared error) calculating the difference between the input data and the output data.
• The second term acts as a regularization term which prevents the autoencoder from overfitting.


REGULARIZATION

• Regularization helps with the effects of out-of-control parameters by using different methods to minimize parameter size over time.
• The regularization coefficients L1 and L2 help fight overfitting by making certain weights smaller. Smaller-valued weights lead to simpler hypotheses, which are the most generalizable.
• Unregularized weights with several higher-order polynomials in the feature sets tend to overfit the training set.


REGULARIZATION

• While poor generalization could happen even in undercomplete autoencoders, it is an even more serious problem for overcomplete autoencoders.
• Here, the model can simply learn to copy xᵢ to h and then h to x̂ᵢ.
• To avoid poor generalization, we need to introduce regularization.


REGULARIZATION

• The simplest solution is to add an L2-regularization term to the objective function.
• Here, m is the number of rows and n is the number of columns in the X matrix of the image.
• Θ is the combination of all the weights and biases: Θ = [w1, w2, w3, …].
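The regularization term itself appeared only as an image on the original slide; a standard form of such an L2-regularized reconstruction objective, consistent with the notation above (λ is an assumed regularization-strength hyperparameter), is:

\min_{\Theta} \; \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{x}_{ij} - x_{ij}\right)^{2} \;+\; \lambda\,\lVert\Theta\rVert_{2}^{2}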
REGULARIZATION

• Regularized autoencoders use a loss function that helps the model to have other properties besides copying the input to the output.
• We can generally find two types of regularized autoencoder:
  • the denoising autoencoder, and
  • the sparse autoencoder.


DENOISING AUTOENCODERS

• Autoencoders are neural networks which are commonly used for feature selection and extraction. However, when there are more nodes in the hidden layer than there are inputs, the network risks learning the so-called "identity function", also called the "null function", meaning that the output equals the input, making the autoencoder useless.
• Denoising autoencoders solve this problem by corrupting the data on purpose, randomly setting some of the input values to zero. In general, the percentage of input nodes which are set to zero is about 50%. Other sources suggest a lower count, such as 30%. It depends on the amount of data and the number of input nodes you have.
• When calculating the loss function, it is important to compare the output values with the original input, not with the corrupted input. That way, the risk of learning the identity function instead of extracting features is eliminated.


DENOISING AUTOENCODERS

• Denoising autoencoders are a robust variant of the standard autoencoder.
• They have the same structure as a standard autoencoder but are trained using samples to which some amount of noise has been added.
• Thus, we map these noisy samples to their clean versions.
• This ensures that the network doesn't learn an identity mapping, which would be pointless.
• To summarise, denoising autoencoders are used where you want to learn a more robust latent representation for a particular set of input data.
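A minimal Keras sketch of this idea; the architecture and the 30% masking fraction are illustrative assumptions. The inputs are corrupted by randomly zeroing values, while the loss is computed against the original, clean inputs.

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def mask_noise(x, drop_fraction=0.3, seed=0):
    """Randomly set a fraction of the input values to zero (masking noise)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) > drop_fraction
    return x * mask

x_in = Input(shape=(784,))
h = Dense(128, activation='relu')(x_in)
code = Dense(32, activation='relu')(h)
h = Dense(128, activation='relu')(code)
x_out = Dense(784, activation='sigmoid')(h)
denoising_ae = Model(x_in, x_out)
denoising_ae.compile(optimizer='adam', loss='binary_crossentropy')

# x_train: clean training data scaled to [0, 1].
# Train on corrupted inputs, but reconstruct the *clean* originals:
# denoising_ae.fit(mask_noise(x_train), x_train, epochs=50, batch_size=256)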




SPARSE AUTOENCODER

• A sparse autoencoder is simply an autoencoder whose training criterion involves a sparsity penalty. In most cases, we would construct our loss function by penalizing activations of hidden layers so that only a few nodes are encouraged to activate when a single sample is fed into the network.
• There are two different ways to construct our sparsity penalty:
  • L1 regularization, and
  • KL-divergence.




SPARSE AUTOENCODER

• A sparse autoencoder has a sparsity enforcer that directs a single-layer network to learn a code dictionary which minimizes the error in reproducing the input while constraining the number of code words used for reconstruction.
• The sparse autoencoder consists of a single hidden layer, which is connected to the input vector by a weight matrix forming the encoding step. The hidden layer outputs to a reconstruction vector, using a tied weight matrix to form the decoder.




CONTRACTIVE AUTOENCODER

• A contractive autoencoder is an autoencoder that adds a penalty term to the classical reconstruction cost function.
• This penalty term corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.
• The Frobenius norm of a matrix is defined as the square root of the sum of the squares of the elements of the matrix: find the sum of the squares of the elements and then take the square root of the calculated value.


• The contractive autoencoder was proposed in 2011 by Rifai et al. in the paper "Contractive auto-encoders: Explicit invariance during feature extraction". The idea is to make autoencoders robust to small changes in the training dataset.
• To deal with this challenge in basic autoencoders, the authors proposed adding another penalty term to the loss function of the autoencoder.
• The loss function: the contractive autoencoder adds an extra term to the autoencoder loss, namely the Frobenius norm of the Jacobian of the encoder; the Frobenius norm is just a generalization of the Euclidean norm to matrices.
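The extra term was shown only as an image on the original slide; written out, the contractive loss from the cited paper is the reconstruction error plus the squared Frobenius norm of the Jacobian of the encoder h = f(x), weighted by a hyperparameter λ:

L(x, \hat{x}) \;=\; \lVert x - \hat{x} \rVert^{2} \;+\; \lambda\,\lVert J_{f}(x) \rVert_{F}^{2},
\qquad
\lVert J_{f}(x) \rVert_{F}^{2} \;=\; \sum_{i,j}\left(\frac{\partial h_{j}(x)}{\partial x_{i}}\right)^{2}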


DENOISING AND CONTRACTIVE AUTOENCODER

• There is a connection between the denoising autoencoder and the contractive autoencoder:
• the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that maps x to r = g(f(x)).
• In other words, denoising autoencoders make the reconstruction function resist small but finite-sized perturbations of the input, whereas contractive autoencoders make the feature-extraction function resist infinitesimal perturbations of the input.


DEEP AUTO-ENCODER

• Of course, the auto-encoder can be deep: the input x passes through several encoder layers to a bottleneck code layer, and then through several decoder layers to the output x̂, which should be as close as possible to x.
• The weights (e.g. W₁, W₂ and W₂′, W₁′) can be initialized layer-by-layer by RBMs.
• A symmetric encoder/decoder is not necessary.


APPLICATIONS

• Watermark removal
• Denoising
• Dimensionality reduction
• Image compression
• Image colorization
• Feature variation
USE OF AUTOENCODERS

• Data denoising and dimensionality reduction for data visualization are considered the two main practical applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques.
• Autoencoders can also be used for image reconstruction, basic image colorization, data compression, converting grey-scale images to colored images, generating higher-resolution images, etc.


APPLICATIONS

• Autoencoders present an efficient way to learn a representation of your data that focuses on the signal, not the noise. You can use them for a variety of tasks such as:
  • Image compression
  • Dimensionality reduction
  • Feature extraction
  • Denoising of data/images
  • Imputing missing data
IMAGE COMPRESSION

• Autoencoders are a deep-learning model for transforming data from a high-dimensional space to a lower-dimensional space. They work by encoding the data, whatever its size, to a 1-D vector. This vector can then be decoded to reconstruct the original data (in this case, an image).
• An autoencoder consists of two parts: an encoder network and a decoder network. The encoder network compresses the input data, while the decoder network reconstructs the compressed data back into its original form. The compressed data, also known as the bottleneck layer, is typically much smaller than the input data.
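As a hypothetical sketch of how compression and decompression could be wired up in Keras (all names and sizes are illustrative assumptions): a single-bottleneck autoencoder is split into a separate encoder (compressor) and decoder (decompressor) that share the trained layers.

from keras.layers import Input, Dense
from keras.models import Model

x_in = Input(shape=(784,))
code = Dense(32, activation='relu')(x_in)          # bottleneck: the compressed image
x_out = Dense(784, activation='sigmoid')(code)
autoencoder = Model(x_in, x_out)

# Encoder and decoder reusing the autoencoder's layers
encoder = Model(x_in, code)
code_in = Input(shape=(32,))
decoder = Model(code_in, autoencoder.layers[-1](code_in))

# After training the autoencoder:
# compressed = encoder.predict(images)           # 784 values -> 32 values per image
# reconstructed = decoder.predict(compressed)    # lossy reconstruction of the originals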


AUTOENCODERS: APPLICATIONS

• Denoising: input a clean image plus noise and train to reproduce the clean image.
• Image colorization: input black-and-white images and train to produce color images.
• Watermark removal.
• Feature variation.
• Dimensionality reduction.


PROPERTIES OF AUTOENCODERS

• Data-specific: autoencoders are only able to compress data similar to what they have been trained on.
• Lossy: the decompressed outputs will be degraded compared to the original inputs.
• Learned automatically from examples: it is easy to train specialized instances of the algorithm that will perform well on a specific type of input.


https://www.edureka.co/blog/autoencoders-tutorial/
CAPACITY

• As with other NNs, overfitting is a problem when capacity is too large for the data.
• Autoencoders address this through some combination of:
  • a bottleneck layer (fewer degrees of freedom than in the possible outputs),
  • training to denoise,
  • sparsity through regularization,
  • a contractive penalty.


BOTTLENECK LAYER (UNDERCOMPLETE)

• Suppose the input images are n×n and the latent space has dimension m < n×n.
• Then the latent space is not sufficient to reproduce all images.
• The autoencoder needs to learn an encoding that captures the important features in the training data, sufficient for approximate reconstruction.


SIMPLE BOTTLENECK LAYER IN KERAS

from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                               # flattened 28x28 image
encoding_dim = 32                                             # size of the bottleneck code
encoded = Dense(encoding_dim, activation='relu')(input_img)   # encoder
decoded = Dense(784, activation='sigmoid')(encoded)           # decoder
autoencoder = Model(input_img, decoded)
• Maps 28×28 images into a 32-dimensional vector.
• Can also use more layers and/or convolutions.
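A training sketch for this model, assuming MNIST is loaded through keras.datasets; the epoch and batch-size values are illustrative choices:

from keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0   # flatten and scale to [0, 1]
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # BCE suits [0, 1] pixels
autoencoder.fit(x_train, x_train,                 # input and target are the same images
                epochs=50, batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))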



https://blog.keras.io/building-autoencoders-in-keras.html
DENOISING AUTOENCODERS

• A basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)).
• Denoising autoencoders train to minimize the loss between x and g(f(x + w)), where w is random noise.
• Same possible architectures, different training data.
• Kaggle has a dataset on damaged documents.


https://blog.keras.io/building-autoencoders-in-keras.html
DENOISING AUTOENCODERS

• Denoising autoencoders can't simply memorize the input-output relationship.
• Intuitively, a denoising autoencoder learns a projection from a neighborhood of our training data back onto the training data.


SPARSE AUTOENCODERS

• Construct a loss function to penalize activations within a layer.
• Usually we regularize the weights of a network, not the activations.
• Which individual nodes of a trained model activate is data-dependent: different inputs will result in activations of different nodes through the network.
• This selectively activates regions of the network depending on the input data.


https://www.jeremyjordan.me/autoencoders/
SPARSE AUTOENCODERS

• Construct a loss function to penalize activations in the network.
• L1 regularization: penalize the absolute value of the vector of activations a in layer h for observation i.
• KL divergence: use the cross-entropy between the average activation and the desired activation.


https://www.jeremyjordan.me/autoencoders/
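A minimal Keras sketch of the L1 version (the 1e-5 penalty weight and the layer sizes are illustrative assumptions): an L1 activity regularizer on the code layer penalizes the absolute values of its activations, encouraging only a few nodes to fire per sample.

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

x_in = Input(shape=(784,))
# Penalize the absolute value of the code-layer activations (sparsity via L1)
code = Dense(32, activation='relu',
             activity_regularizer=regularizers.l1(1e-5))(x_in)
x_out = Dense(784, activation='sigmoid')(code)

sparse_ae = Model(x_in, x_out)
sparse_ae.compile(optimizer='adam', loss='binary_crossentropy')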
CONTRACTIVE AUTOENCODERS

• Arrange for similar inputs to have similar activations.
• I.e., the derivatives of the hidden-layer activations are small with respect to the input.
• Denoising autoencoders make the reconstruction function (encoder + decoder) resist small perturbations of the input.
• Contractive autoencoders make the feature-extraction function (i.e. the encoder) resist infinitesimal perturbations of the input.


https://www.jeremyjordan.me/autoencoders/


https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
DENOISING VS. CONTRACTIVE AUTOENCODERS

• Both the denoising and the contractive autoencoder can perform well.
• Advantage of the denoising autoencoder: it is simpler to implement; it requires adding only one or two lines of code to a regular autoencoder; there is no need to compute the Jacobian of the hidden layer.
• Advantage of the contractive autoencoder: the gradient is deterministic; second-order optimizers (conjugate gradient, L-BFGS, etc.) can be used; it might be more stable than the denoising autoencoder, which uses a sampled gradient.
• To learn more about contractive autoencoders, see: Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, 2011.


https://ift6266h17.files.wordpress.com/2017/03/14_autoencoders.pdf
