DL Unit 3 Autoencoder
UNIT - III
By
Dr. Himani Deshpande (TSEC, Mumbai)
UNIT – III AUTOENCODERS
3.1 Introduction, Linear Autoencoder, Undercomplete Autoencoder, Overcomplete Autoencoders, Regularization in Autoencoders
3.2 Denoising Autoencoders, Sparse Autoencoders, Contractive Autoencoders
3.3 Application of Autoencoders: Image Compression
PROPERTIES OF AUTOENCODERS
Data-specific, unsupervised, lossy
• Autoencoders are data-specific, which means that they will only be able to compress
data similar to what they have been trained on.
• For example, an autoencoder trained on pictures of faces would do a rather poor job of
compressing pictures of trees, because the features it would learn would be face-specific.
• Autoencoders are lossy, which means that the decompressed outputs will be
degraded compared to the original inputs.
• Autoencoders are learned automatically from data examples, which is a useful
property: it means that it is easy to train specialized instances of the algorithm that will
perform well on a specific type of input. It doesn’t require any new engineering, just
appropriate training data.
¡ Encoder: This part of the network encodes or compresses the input data into a
latent-space representation. The compressed data typically looks garbled, nothing
like the original data.
¡ Decoder: This part of the network decodes or reconstructs the encoded data (latent-space
representation) back to the original dimension. The decoded data is a lossy
reconstruction of the original data.
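A minimal sketch of this encoder/decoder split in Keras (the layer sizes 784 → 32 → 784 and the names below are illustrative assumptions, not values prescribed here; the full example appears later in the unit):

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))                        # flattened input
code = Dense(32, activation='relu')(inp)         # encoder: compress to the latent code
out = Dense(784, activation='sigmoid')(code)     # decoder: reconstruct the original dimension

autoencoder = Model(inp, out)                    # encoder + decoder, trained together
encoder = Model(inp, code)                       # encoder alone: maps an input to its code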
[Figure: a 28 × 28 = 784-pixel input is passed to an NN encoder that produces a compact code (usually < 784 dimensions) representing the input object; an NN decoder, learned together with the encoder, can reconstruct the original object from the code.]
[Figure: a single-hidden-layer autoencoder. The input layer x is encoded by weights W into the code c at the bottleneck (hidden, linear) layer, and decoded by weights Wᵀ into the output x̂; training minimizes ‖x − x̂‖² so that the output is as close as possible to the input. The output of the hidden layer is the code.]
TRAINING AUTOENCODERS
Four hyperparameters need to be set before training an autoencoder: code size, number of layers, number of nodes per layer, and the loss function.
1. Code size: The code size, or the size of the bottleneck, is the most important hyperparameter used to tune the
autoencoder. The bottleneck size decides how much the data has to be compressed. It can also act as a
regularization term.
2. Number of layers: As with all neural networks, an important hyperparameter for tuning autoencoders is the depth
of the encoder and the decoder. A higher depth increases model complexity, while a lower depth is faster to
process.
3. Number of nodes per layer: The number of nodes per layer defines the weights we use per layer. Typically,
the number of nodes decreases with each subsequent encoder layer (and increases again in the decoder), as the
input to each of these layers becomes smaller.
4. Reconstruction loss: The loss function used to train the autoencoder depends strongly on the type of
input and output we want the autoencoder to adapt to. For image data, the most popular
reconstruction losses are MSE loss and L1 loss. If the inputs and outputs are within the range
[0, 1], as in MNIST, we can also use binary cross-entropy as the reconstruction loss.
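A hedged sketch of these loss choices in Keras, assuming an already-built autoencoder Model such as the one shown later in this unit:

autoencoder.compile(optimizer='adam', loss='mse')                    # MSE reconstruction loss
# autoencoder.compile(optimizer='adam', loss='mae')                  # L1 (mean absolute error) reconstruction loss
# autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # for inputs/outputs in [0, 1], e.g. MNIST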
AUTOENCODER ARCHITECTURE
LINEAR AUTOENCODER
¡ We can make our latent-space representation learn useful features by giving it smaller
dimensions than the input data. In this case the autoencoder is undercomplete. By training an
undercomplete representation, we force the autoencoder to learn the most salient features
of the training data. If we give the autoencoder too much capacity (e.g., if the latent space has
almost the same dimensions as the input data), it will just learn the copying task without
extracting useful features or information from the data.
¡ If the dimensions of the latent space are equal to or greater than those of the input data, the
autoencoder is overcomplete. In that case even a linear encoder and linear decoder can learn
to copy the input to the output without learning anything useful about the data distribution.
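A purely illustrative sketch of the two regimes in Keras terms (the layer widths are arbitrary examples, not values from these slides):

from keras.layers import Dense

input_dim = 784
undercomplete_code = Dense(32, activation='relu')    # 32 < 784: bottleneck forces salient features
overcomplete_code  = Dense(1024, activation='relu')  # 1024 >= 784: risks learning the copy/identity map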
A stacked autoencoder is a neural network consisting of several layers of sparse autoencoders, in which the output of each hidden layer is connected to the input of the successive hidden layer.
Autoencoders are lossy: it is fine to lose some data in this case.
¡ Regularization helps with the effects of out-of-control parameters by using different methods to
minimize parameter size over time.
¡ Regularization coefficients L1 and L2 help fight overfitting by making certain weights smaller (see the
sketch after this list). Smaller-valued weights lead to simpler hypotheses, which are the most generalizable.
¡ Unregularized weights with several higher-order polynomials in the feature sets tend to overfit the
training set.
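A hedged sketch of such an L1/L2 weight penalty on an autoencoder layer in Keras (the 1e-5 coefficient is an arbitrary illustrative value):

from keras import regularizers
from keras.layers import Input, Dense

input_img = Input(shape=(784,))
# L2 (weight-decay) penalty on the encoder weights; regularizers.l1(...) would penalize absolute weight values instead
encoded = Dense(32, activation='relu',
                kernel_regularizer=regularizers.l2(1e-5))(input_img)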
Θ = combination of all the weights and biases, i.e. Θ = [w1, w2, w3, …]
REGULARIZATION
¡ The regularized autoencoders use a loss function that helps the model to have other
properties besides copying input to the output.
¡ We can generally find two types of regularized autoencoder:
¡ the denoising autoencoder and
¡ the sparse autoencoder.
¡ Autoencoders are neural networks which are commonly used for feature selection and extraction.
However, when there are more nodes in the hidden layer than there are inputs, the network risks
learning the so-called "Identity Function", also called the "Null Function", meaning that the output equals
the input, making the autoencoder useless.
¡ Denoising Autoencoders solve this problem by corrupting the data on purpose by randomly turning
some of the input values to zero. In general, the percentage of input nodes which are being set to zero
is about 50%. Other sources suggest a lower count, such as 30%. It depends on the amount of data
and input nodes you have.
¡ When calculating the Loss function, it is important to compare the output values with the original
input, not with the corrupted input. That way, the risk of learning the identity function instead of
extracting features is eliminated.
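A hedged sketch of this corruption step (it assumes a NumPy training array x_train and a compiled autoencoder model such as the one built later in this unit; the 50% drop fraction follows the figure quoted above):

import numpy as np

def mask_noise(x, drop_fraction=0.5):
    # randomly zero out a fraction of the input values
    mask = np.random.binomial(1, 1.0 - drop_fraction, size=x.shape)
    return x * mask

x_train_noisy = mask_noise(x_train, drop_fraction=0.5)
autoencoder.fit(x_train_noisy, x_train,      # loss is computed against the ORIGINAL clean inputs
                epochs=50, batch_size=256)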
¡ Sparse autoencoders use a sparsity enforcer that directs a single-layer network to learn a
code dictionary which minimizes the error in reproducing the input while constraining the
number of code words used for reconstruction.
¡ The sparse autoencoder consists of a single hidden layer, which is connected to the input
vector by a weight matrix forming the encoding step. The hidden layer outputs to a
reconstruction vector, using a tied weight matrix to form the decoder.
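One common sparsity enforcer is an L1 penalty on the hidden activations; a hedged Keras sketch follows (the layer width 64 and coefficient 1e-5 are illustrative, and unlike the description above this sketch does not tie the decoder weights to the encoder):

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

input_vec = Input(shape=(784,))
hidden = Dense(64, activation='relu',
               activity_regularizer=regularizers.l1(1e-5))(input_vec)   # sparsity penalty on activations
output_vec = Dense(784, activation='sigmoid')(hidden)                   # reconstruction
sparse_autoencoder = Model(input_vec, output_vec)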
¡ A Contractive Autoencoder is an autoencoder that adds a penalty term to the classical reconstruction cost
function.
¡ This penalty term corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with
respect to the input.
¡ The contractive autoencoder was proposed in 2011 by Rifai et al. at the Université de Montréal in the
paper "Contractive Auto-Encoders: Explicit Invariance During Feature Extraction". The idea behind it is
to make the autoencoder robust to small changes in the training dataset.
¡ To deal with the above challenge that is posed in basic autoencoders, the authors proposed to add
another penalty term to the loss function of autoencoders.
¡ The loss function: the contractive autoencoder adds an extra term to the usual autoencoder loss, given as
L(x, x̂) = ‖x − x̂‖² + λ ‖∂f(x)/∂x‖²_F
where the penalty term is the squared Frobenius norm of the Jacobian of the encoder f with respect to the
input; the Frobenius norm is just a generalization of the Euclidean norm to matrices.
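A hedged sketch of this penalty (it assumes TensorFlow 2; encoder, decoder and the weight lam are hypothetical names, not part of the slides):

import tensorflow as tf

def contractive_loss(encoder, decoder, x, lam=1e-4):
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)                               # code h = f(x)
    x_hat = decoder(h)                               # reconstruction g(f(x))
    recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_hat), axis=1))
    jac = tape.batch_jacobian(h, x)                  # Jacobian of f(x) w.r.t. x, shape (batch, code_dim, input_dim)
    penalty = tf.reduce_mean(tf.reduce_sum(tf.square(jac), axis=[1, 2]))  # squared Frobenius norm
    return recon + lam * penalty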
¡ There is a connection between the denoising autoencoder and the contractive autoencoder:
¡ the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that
maps x to r = g(f(x)).
¡ In other words, denoising autoencoders make the reconstruction function resist small but finite sized
perturbations of the input, whereas contractive autoencoders make the feature extraction function resist
infinitesimal perturbations of the input.
[Figure: a deep autoencoder with several encoder layers, a bottleneck code layer, and several decoder layers; the output x̂ should be as close as possible to the input x. The layers can be initialized layer-by-layer using RBMs.]
USE OF AUTOENCODERS
[Figure: applications include image colorization and feature variation, among others.]
¡ Data denoising and dimensionality reduction for data visualization are considered two of the main
practical applications of autoencoders (see the visualization sketch after this list). With appropriate
dimensionality and sparsity constraints, autoencoders can learn data projections that
are more interesting than PCA or other basic techniques.
¡ Autoencoders also can be used for Image Reconstruction, Basic Image colorization,
data compression, gray-scale images to colored images, generating higher resolution
images etc.
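A hedged sketch of the dimensionality-reduction-for-visualization use (purely illustrative: it assumes a trained encoder model with a 2-unit bottleneck and MNIST-style arrays x_test, y_test):

import matplotlib.pyplot as plt

codes = encoder.predict(x_test)                      # 2-D code for each test image
plt.scatter(codes[:, 0], codes[:, 1], c=y_test, s=2, cmap='tab10')
plt.colorbar()
plt.show()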
• Image compression
• Dimensionality reduction
• Feature extraction
• Denoising of data/images
• Imputing missing data
IMAGE COMPRESSION
¡ Autoencoders are a deep learning model for transforming data from a high-
dimensional space to a lower-dimensional space. They work by encoding the data,
whatever its size, to a 1-D vector. This vector can then be decoded to reconstruct
the original data (in this case, an image).
¡ Denoising: input clean image + noise and train to reproduce the clean image.
¡ Image colorization: input black-and-white images and train to produce color images.
¡ Watermark removal
¡ Data-specific: Autoencoders are only able to compress data similar to what they have been trained on.
¡ Lossy: The decompressed outputs will be degraded compared to the original inputs.
¡ Learned automatically from examples: It is easy to train specialized instances of the algorithm that will
perform well on a specific type of input.
¡ As with other NNs, overfitting is a problem when capacity is too large for the data.
¡ Suppose input images are n×n and the latent space has dimension m < n×n.
¡ Then the latent space is not sufficient to reproduce all images.
¡ The autoencoder needs to learn an encoding that captures the important features in the training data, sufficient for
approximate reconstruction.
¡ A basic Keras autoencoder (with the imports it needs):

from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                      # flattened 28x28 image
encoding_dim = 32                                    # size of the bottleneck code
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)

¡ Maps 28×28 images into a 32-dimensional vector.
¡ Can also use more layers and/or convolutions.
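A possible way to train this model on MNIST (a hedged sketch; the optimizer, epoch count and batch size are illustrative choices, not values given in these slides):

from keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32').reshape(-1, 784) / 255.0   # scale pixels to [0, 1] and flatten
x_test = x_test.astype('float32').reshape(-1, 784) / 255.0

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,                               # input and target are the same images
                epochs=50, batch_size=256,
                validation_data=(x_test, x_test))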
¡ A basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)).
¡ Denoising autoencoders train to minimize the loss between x and g(f(x + w)), where w is random noise (see the sketch after this list).
¡ Same possible architectures, different training data.
¡ Kaggle has a dataset on damaged documents.
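A hedged sketch of the x + w corruption (assuming the x_train array and autoencoder model from the previous example; the noise level 0.3 is an arbitrary illustrative value):

import numpy as np

noise = 0.3 * np.random.normal(size=x_train.shape)        # w: Gaussian noise
x_train_noisy = np.clip(x_train + noise, 0.0, 1.0)        # keep pixel values in [0, 1]
autoencoder.fit(x_train_noisy, x_train,                   # targets are the clean images x
                epochs=50, batch_size=256)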
¡ Contractive autoencoders make the feature extraction function (i.e. the encoder) resist infinitesimal perturbations of
the input.