Autoencoders in Deep Learning

An autoencoder is a type of artificial neural network used to learn data encodings in an unsupervised manner.

The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.

Autoencoders are a specific type of feedforward neural network in which the target output is the same as the input. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact “summary” or “compression” of the input, also called the latent-space representation.

An autoencoder consists of 3 components: encoder, code and decoder. The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.

To build an autoencoder we need 3 things: an encoding method, a decoding method, and a loss function to compare the output with the target. We will explore these in the next section.

Autoencoders are mainly a dimensionality reduction (or compression) algorithm with a couple of
important properties:

 Data-specific: Autoencoders are only able to meaningfully compress data similar to what they have been trained on. Since they learn features specific to the given training data, they are different from a standard data compression algorithm like gzip. So we can’t expect an autoencoder trained on handwritten digits to compress landscape photos.

 Lossy: The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation. If you want lossless compression, autoencoders are not the way to go.

 Unsupervised: To train an autoencoder we don’t need to do anything fancy, just throw the raw input data at it. Autoencoders are considered an unsupervised learning technique since they don’t need explicit labels to train on. But to be more precise they are self-supervised, because they generate their own labels from the training data.

2. Architecture

Let’s explore the details of the encoder, code and decoder. Both the encoder and decoder are
fully-connected feedforward neural networks, essentially the ANNs we covered in Part 1. The code is
a single layer of an ANN with the dimensionality of our choice. The number of nodes in the code
layer (code size) is a hyperparameter that we set before training the autoencoder.

This is a more detailed visualization of an autoencoder. First the input passes through the encoder,
which is a fully-connected ANN, to produce the code. The decoder, which has a similar ANN
structure, then produces the output using only the code. The goal is to get an output identical to
the input. Note that the decoder architecture is the mirror image of the encoder. This is not a
requirement, but it’s typically the case. The only requirement is that the dimensionality of the input
and output be the same. Anything in the middle can be played with.

There are 4 hyperparameters that we need to set before training an autoencoder:

 Code size: number of nodes in the middle layer. Smaller size results in more compression.

 Number of layers: the autoencoder can be as deep as we like. In the figure above we have 2 layers in both the encoder and decoder, without considering the input and output.

 Number of nodes per layer: the autoencoder architecture we’re working on is called a stacked autoencoder since the layers are stacked one after another. Usually stacked autoencoders look like a “sandwich”: the number of nodes per layer decreases with each subsequent layer of the encoder, and increases back in the decoder. The decoder is also symmetric to the encoder in terms of layer structure. As noted above this is not necessary and we have total control over these parameters.

 Loss function: we either use mean squared error (mse) or binary crossentropy. If the input values are in the range [0, 1] then we typically use crossentropy, otherwise we use mean squared error.

Autoencoders are trained the same way as ANNs, via backpropagation.

We have total control over the architecture of the autoencoder. We can make it very powerful by
increasing the number of layers, nodes per layer and, most importantly, the code size. Increasing
these hyperparameters will let the autoencoder learn more complex codings. But we should be
careful not to make it too powerful. Otherwise the autoencoder will simply learn to copy its inputs
to the output, without learning any meaningful representation. It will just mimic the identity
function. The autoencoder will reconstruct the training data perfectly, but it will
be overfitting, unable to generalize to new instances, which is not what we want.

This is why we prefer a “sandwich” architecture, and deliberately keep the code size small. Since
the coding layer has a lower dimensionality than the input data, the autoencoder is said to
be undercomplete. It won’t be able to directly copy its inputs to the output, and will be forced to
learn intelligent features. If the input data has a pattern, for example the digit “1” usually contains
a somewhat straight line and the digit “0” is circular, it will learn this fact and encode it in a more
compact form. If the input data were completely random, without any internal correlation or
dependency, then an undercomplete autoencoder wouldn’t be able to recover it perfectly. But luckily
in the real world there is a lot of dependency.
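Putting the pieces above together, here is a minimal sketch of an undercomplete autoencoder in Keras (the library implied by the fit calls later in this text). The layer sizes, the 784-dimensional flattened input (e.g. MNIST scaled to [0, 1]) and the choice of optimizer are illustrative assumptions, not fixed requirements:

import tensorflow as tf
from tensorflow.keras import layers, models

input_size = 784   # dimensionality of the flattened input (assumed)
code_size = 32     # code size: the main hyperparameter controlling compression

inputs = layers.Input(shape=(input_size,))
# Encoder: fully-connected layers shrinking toward the code
h = layers.Dense(128, activation="relu")(inputs)
code = layers.Dense(code_size, activation="relu")(h)
# Decoder: mirror of the encoder, expanding back to the input size
h = layers.Dense(128, activation="relu")(code)
outputs = layers.Dense(input_size, activation="sigmoid")(h)

autoencoder = models.Model(inputs, outputs)
# Binary crossentropy is a common choice when inputs are in [0, 1]
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

Training then uses the input as its own target, exactly as shown in the fit calls below.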

Denoising Autoencoders

A denoising autoencoder adds random noise to the input and forces the autoencoder to recover the original
data after removing the noise.
The autoencoder is trained in such a way that it identifies the noise, removes it, and learns only
the required features of the original data.

The loss function still measures the difference between the clean input data and the output data. This
discourages overfitting and ensures that the autoencoder removes the noise and learns the important
features of the input data.

Keeping the code layer small forced our autoencoder to learn an intelligent representation of the
data. There is another way to force the autoencoder to learn useful features, which is adding
random noise to its inputs and making it recover the original noise-free data. This way the
autoencoder can’t simply copy the input to its output because the input also contains random
noise. We are asking it to subtract the noise and produce the underlying meaningful data. This is
called a denoising autoencoder.

The top row contains the original images. We add random Gaussian noise to them and the noisy
data becomes the input to the autoencoder. The autoencoder doesn’t see the original image at all.
But then we expect the autoencoder to regenerate the noise-free original image.

There is only one small difference between the implementation of a denoising autoencoder and a
regular one. The architecture doesn’t change at all, only the fit call. We trained the regular
autoencoder as follows:
autoencoder.fit(x_train, x_train)

A denoising autoencoder is trained as:

autoencoder.fit(x_train_noisy, x_train)

Simple as that, everything else is exactly the same. The input to the autoencoder is the noisy
image, and the expected target is the original noise-free one.
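For completeness, here is one possible way to create x_train_noisy, assuming the same [0, 1]-scaled image data as before; the noise level of 0.3 is an arbitrary illustrative choice:

import numpy as np

noise_factor = 0.3
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
# Keep pixel values inside the valid [0, 1] range after adding noise
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)

# Same architecture as before; only the fit call changes:
# noisy images as input, clean images as target.
autoencoder.fit(x_train_noisy, x_train, epochs=20, batch_size=256)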

Visualization

Now let’s visualize whether we are able to recover the noise-free images.

Looks pretty good. The bottom row is the autoencoder output. We can do better by using a more
complex autoencoder architecture, such as a convolutional autoencoder.

Denoising in Overcomplete Autoencoders


It is not always necessary to have fewer hidden nodes in the code layer than the number of input nodes.
When there are as many or more hidden nodes than input nodes, the autoencoder is called an
overcomplete autoencoder.

We expect the autoencoder to learn new features. But what might happen is that the values in the
input nodes are simply copied to the hidden nodes without learning any useful information. That is,
the input data is stored without any modification in the hidden nodes and subsequently transferred
to the output, so the network has merely learnt the identity function.

Just as an undercomplete autoencoder is used to compress the input data by extracting useful
features, an overcomplete autoencoder can be used to separate the jumbled features in the input
data. Identity encoding can be avoided by using denoising autoencoders.
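As a rough sketch of this idea, an overcomplete autoencoder (code layer wider than the input) can be trained as a denoiser so that it cannot get away with learning the identity mapping. The layer sizes and noise level below are illustrative assumptions:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

input_size = 784
inputs = layers.Input(shape=(input_size,))
# Overcomplete: more hidden nodes in the code layer than input nodes
code = layers.Dense(1024, activation="relu")(inputs)
outputs = layers.Dense(input_size, activation="sigmoid")(code)

overcomplete_ae = models.Model(inputs, outputs)
overcomplete_ae.compile(optimizer="adam", loss="binary_crossentropy")

# Denoising setup: noisy inputs, clean targets, so copying the input is useless
# x_train_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)
# overcomplete_ae.fit(x_train_noisy, x_train, epochs=20, batch_size=256)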

Sparse Autoencoders
Sparse autoencoders are one of the most valuable types of autoencoders. The idea behind sparse
autoencoders is that we can achieve an information bottleneck (the same information with fewer
active neurons) without reducing the number of neurons in the hidden layers. The number of neurons
in the hidden layer can even be greater than the number in the input layer.
We achieve this by imposing a sparsity constraint on the learning. According to the sparsity
constraint, only some percentage of nodes can be active in a hidden layer. Neurons with
output close to 1 are active, whereas neurons with output close to 0 are inactive.
More specifically, we penalize the loss function such that only a few neurons are active in a
layer. We force the autoencoder to represent the input information with fewer active neurons by
penalizing activations rather than by reducing the number of neurons. As a result, we can even
increase the code size, because only a few neurons in a layer are active at a time.
Sparse Autoencoders are a type of artificial neural network that are used for unsupervised
learning of efficient codings. The primary goal of a sparse autoencoder is to learn a
representation (encoding) for a set of data, typically for the purpose of dimensionality
reduction or feature extraction.

What are Sparse Autoencoders?


Sparse Autoencoders are a variant of autoencoders, which are neural networks trained to
reconstruct their input data. However, unlike traditional autoencoders, sparse autoencoders are
designed to be sensitive to specific types of high-level features in the data, while being
insensitive to most other features. This is achieved by imposing a sparsity constraint on the
hidden units during training, which forces the autoencoder to respond to unique statistical
features of the dataset it is trained on.

How do Sparse Autoencoders work?


Sparse Autoencoders consist of an encoder, a decoder, and a loss function. The encoder is used
to compress the input into a latent-space representation, and the decoder is used to reconstruct
the input from this representation. The sparsity constraint is typically enforced by adding a
penalty term to the loss function that encourages the activations of the hidden units to be sparse.
The sparsity constraint can be implemented in various ways, such as by using a sparsity penalty,
a sparsity regularizer, or a sparsity proportion. The sparsity penalty is a term added to the loss
function that penalizes the network for having non-sparse activations. The sparsity regularizer is
a function that encourages the network to have sparse activations. The sparsity proportion is a
hyperparameter that determines the desired level of sparsity in the activations.
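A common and simple way to impose such a constraint in Keras is an L1 activity regularizer on the code layer, which penalizes large (non-sparse) activations; this is only one of the possible sparsity penalties mentioned above, and the layer sizes and regularization weight are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

input_size = 784
inputs = layers.Input(shape=(input_size,))
# The code layer may be as wide as (or wider than) the input; the activity
# penalty, not the layer size, creates the information bottleneck.
code = layers.Dense(1024, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(input_size, activation="sigmoid")(code)

sparse_autoencoder = models.Model(inputs, outputs)
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# sparse_autoencoder.fit(x_train, x_train, epochs=20, batch_size=256)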

Why are Sparse Autoencoders important?


Sparse Autoencoders are important because they can learn useful features from unlabeled data,
which can be used for tasks such as anomaly detection, denoising, and dimensionality reduction.
They are particularly useful when the dimensionality of the input data is high, as they can learn a
lower-dimensional representation that captures the most important features of the data.
Furthermore, Sparse Autoencoders can be used to pretrain deep neural networks. Pretraining a
deep neural network with a sparse autoencoder can help the network learn a good initial set of
weights, which can improve the performance of the network on a subsequent supervised
learning task.
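As a sketch of this pretraining idea, the trained encoder layer of the sparse autoencoder above can be reused as the first layer of a supervised classifier; the names (sparse_autoencoder, y_train), the 10-class output and the layer index are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

# Reuse the trained code layer (layers[0] is the Input layer in the sketch above)
encoder_layer = sparse_autoencoder.layers[1]

clf_inputs = layers.Input(shape=(784,))
features = encoder_layer(clf_inputs)                            # pretrained features
predictions = layers.Dense(10, activation="softmax")(features)  # supervised head

classifier = models.Model(clf_inputs, predictions)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(x_train, y_train, epochs=5, batch_size=256)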

Applications of Sparse Autoencoders


Sparse Autoencoders have been used in a variety of applications, including:

 Anomaly detection: Sparse autoencoders can be used to learn a normal representation of the data, and then detect anomalies as data points that have a high reconstruction error.
 Denoising: Sparse autoencoders can be used to learn a clean representation of the data,
and then reconstruct the clean data from a noisy input.
 Dimensionality reduction: Sparse autoencoders can be used to learn a lower-dimensional
representation of the data, which can be used for visualization or to reduce the
computational complexity of subsequent tasks.
 Pretraining deep neural networks: Sparse autoencoders can be used to pretrain the
weights of a deep neural network, which can improve the performance of the network on
a subsequent supervised learning task.

Sparse Autoencoders are a powerful tool for unsupervised learning, capable of learning
useful features from high-dimensional data and improving the performance of deep
neural networks.

A Contractive Autoencoder (CAE) is a type of neural network used in unsupervised machine learning.
It is a powerful technique for learning compressed and robust representations of high-dimensional data.

Introduction

In the realm of machine learning and neural networks, the evolution of autoencoders has been
pivotal in advancing unsupervised learning. Among the various types of autoencoders, the
Contractive Autoencoder (CAE) stands out due to its unique approach to feature learning. This
essay delves into the concept, working mechanism, and applications of Contractive Autoencoders,
highlighting their significance in the field of deep learning.

Understanding Contractive Autoencoders

Contractive Autoencoders are a variant of traditional autoencoders that introduce a regularization
term to the loss function. This term penalizes the model not only for reconstruction errors but also
for the sensitivity of the learned representations to the input data. The primary objective of CAEs
is to learn a representation that is robust to slight variations or noise in the input data.

The Architecture

Like a basic autoencoder, a CAE consists of two main components: an encoder and a decoder.
The encoder compresses the input data into a lower-dimensional latent space, while the decoder
reconstructs the data from this compressed form. The distinction lies in the loss function, where
the CAE incorporates a contractive penalty.

The Contractive Loss Function

The contractive loss in CAEs is typically a Frobenius norm of the Jacobian matrix of the
encoder’s outputs with respect to its inputs. This penalty forces the model to learn a representation
where slight changes in input do not significantly alter the output. Essentially, it encourages the
network to learn a manifold where the data points are densely packed together, leading to a more
robust feature representation.
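A sketch of this penalty in Keras, assuming a single sigmoid encoder layer and an illustrative weight lam for the contractive term; computing the Jacobian with a nested GradientTape is one possible implementation, not the only one:

import tensorflow as tf
from tensorflow.keras import layers, models

input_size, code_size, lam = 784, 32, 1e-4

inputs = layers.Input(shape=(input_size,))
code = layers.Dense(code_size, activation="sigmoid")(inputs)
outputs = layers.Dense(input_size, activation="sigmoid")(code)
cae = models.Model(inputs, outputs)
encoder = models.Model(inputs, code)   # shares layers with cae

optimizer = tf.keras.optimizers.Adam()
mse = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(x):
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(x)
            h = encoder(x)
        # Jacobian of the code w.r.t. the input, one matrix per example
        jac = inner.batch_jacobian(h, x)
        # Squared Frobenius norm of the Jacobian, averaged over the batch
        contractive = tf.reduce_mean(tf.reduce_sum(tf.square(jac), axis=[1, 2]))
        loss = mse(x, cae(x)) + lam * contractive
    grads = outer.gradient(loss, cae.trainable_variables)
    optimizer.apply_gradients(zip(grads, cae.trainable_variables))
    return loss

# Typical usage (x_batch is a float32 tensor of shape (batch, 784)):
# for x_batch in dataset:
#     loss = train_step(x_batch)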

Advantages of Contractive Autoencoders

1. Enhanced Feature Learning: By penalizing sensitivity to input variations, CAEs learn more
robust and stable features compared to basic autoencoders. This robustness is particularly
beneficial in noisy environments or in scenarios where data augmentation is required.

2. Improved Generalization: The regularization term in CAEs helps in preventing overfitting,
leading to better generalization capabilities. This makes CAEs suitable for a variety of
applications, including image recognition, where they can generalize well from training to real-
world data.

3. Applications in Denoising: Given their inherent resistance to slight variations in input, CAEs
are adept at denoising tasks. They can effectively learn to ignore the “noise” in the data,
focusing instead on the underlying patterns.

Use Cases in Various Domains

1. Image Processing: In image processing, CAEs have shown great promise in tasks like image
denoising, compression, and reconstruction. Their ability to learn stable features makes them
valuable for image classification and recognition tasks.
2. Anomaly Detection: CAEs are adept at learning the normal variations in data, making them
effective for anomaly detection. In scenarios like fraud detection or system fault identification,
CAEs can efficiently differentiate between normal and anomalous patterns.

3. Data Compression: The robust feature learning capability of CAEs also makes them suitable for
data compression tasks. By learning compact representations, they can compress data
effectively without significant loss of information.

Variational Autoencoders


Autoencoders have emerged as an architecture for data representation and generation. Among
them, Variational Autoencoders (VAEs) stand out, introducing probabilistic encoding and
opening new avenues for diverse applications.
Autoencoders are neural network architectures intended for the compression and
reconstruction of data. An autoencoder consists of an encoder and a decoder; together these networks
learn a simple representation of the input data. A reconstruction loss ensures a close match of the
output with the input, which is the basis for understanding more advanced architectures such as VAEs.
The encoder aims to learn an efficient encoding of the data and pass it into a bottleneck layer. The
other part of the autoencoder is the decoder, which uses the latent space in the bottleneck layer to
regenerate images similar to the dataset. The reconstruction error is backpropagated through the
network in the form of the loss function.
What is a Variational Autoencoder?
The variational autoencoder was proposed in 2013 by Diederik P. Kingma and Max Welling. A
variational autoencoder (VAE) provides a probabilistic manner for describing an observation in
latent space. Thus, rather than building an encoder that outputs a single value to describe each
latent state attribute, we’ll formulate our encoder to describe a probability distribution for each
latent attribute. It has many applications, such as data compression, synthetic data creation, etc.

A variational autoencoder differs from a standard autoencoder in that it provides a statistical
manner for describing the samples of the dataset in latent space. Therefore, in the variational
autoencoder, the encoder outputs a probability distribution in the bottleneck layer instead of a
single output value.
Architecture of Variational Autoencoder
 The encoder-decoder architecture lies at the heart of Variational Autoencoders (VAEs),
distinguishing them from traditional autoencoders. The encoder network takes raw input data
and transforms it into a probability distribution within the latent space.
 The latent code generated by the encoder is a probabilistic encoding, allowing the VAE to
express not just a single point in the latent space but a distribution of potential representations.
 The decoder network, in turn, takes a sampled point from the latent distribution and
reconstructs it back into data space. During training, the model refines both the encoder and
decoder parameters to minimize the reconstruction loss – the disparity between the input data
and the decoded output. The goal is not just to achieve accurate reconstruction but also to
regularize the latent space, ensuring that it conforms to a specified distribution.
 The process involves a delicate balance between two essential components: the reconstruction
loss and the regularization term, often represented by the Kullback-Leibler divergence. The
reconstruction loss compels the model to accurately reconstruct the input, while the
regularization term encourages the latent space to adhere to the chosen distribution, preventing
overfitting and promoting generalization.
 By iteratively adjusting these parameters during training, the VAE learns to encode input data
into a meaningful latent space representation. This optimized latent code encapsulates the
underlying features and structures of the data, facilitating precise reconstruction. The
probabilistic nature of the latent space also enables the generation of novel samples by drawing
random points from the learned distribution.

Mathematics behind Variational Autoencoder
The variational autoencoder uses KL-divergence as part of its loss function; the goal is to minimize
the difference between the supposed (approximate) distribution and the original distribution of the dataset.

Suppose we have a latent variable z and we want to generate the observation x from it. In other
words, we want to calculate the posterior

p(z|x)

We can do this in the following way, using Bayes' rule:

p(z|x) = p(x|z) p(z) / p(x)

But the calculation of p(x) can be quite difficult:

p(x) = ∫ p(x|z) p(z) dz

This usually makes it an intractable distribution. Hence, we need to approximate p(z|x) by q(z|x)
to obtain a tractable distribution. To make q(z|x) a good approximation of p(z|x), we minimize the
KL-divergence loss, which measures how different two distributions are:

min KL( q(z|x) || p(z|x) )

By simplifying, the above minimization problem is equivalent to the following maximization
problem:

max E_q(z|x)[ log p(x|z) ] - KL( q(z|x) || p(z) )

The first term represents the reconstruction likelihood and the second term ensures that our
learned distribution q is similar to the true prior distribution p.
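The pieces above translate fairly directly into code. Below is a compact sketch of a VAE in Keras following this description: the encoder outputs a mean and log-variance, z is sampled with the reparameterization trick, and the loss combines the reconstruction term with the KL term. The network sizes and the 784-dimensional input are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

input_size, latent_dim = 784, 2

# Encoder: maps x to the parameters of q(z|x)
enc_in = layers.Input(shape=(input_size,))
h = layers.Dense(256, activation="relu")(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
encoder = models.Model(enc_in, [z_mean, z_log_var])

# Decoder: maps a sampled z back to data space
dec_in = layers.Input(shape=(latent_dim,))
dec_out = layers.Dense(input_size, activation="sigmoid")(
    layers.Dense(256, activation="relu")(dec_in))
decoder = models.Model(dec_in, dec_out)

class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder, self.decoder = encoder, decoder

    def train_step(self, x):
        with tf.GradientTape() as tape:
            z_mean, z_log_var = self.encoder(x)
            # Reparameterization trick: z = mu + sigma * epsilon
            eps = tf.random.normal(shape=tf.shape(z_mean))
            z = z_mean + tf.exp(0.5 * z_log_var) * eps
            x_hat = self.decoder(z)
            # Reconstruction term: binary crossentropy summed over pixels
            recon = input_size * tf.reduce_mean(
                tf.keras.losses.binary_crossentropy(x, x_hat))
            # Closed-form KL( q(z|x) || N(0, I) )
            kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
            loss = recon + kl
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

vae = VAE(encoder, decoder)
vae.compile(optimizer="adam")
# vae.fit(x_train, epochs=30, batch_size=128)   # no labels are needed

New samples can then be generated by drawing z from the standard normal prior and passing it through the decoder.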

PCA vs Autoencoder
 Although PCA is fundamentally a linear transformation, autoencoders may describe complicated non-linear processes.
 Because PCA features are projections onto the orthogonal basis, they are completely linearly uncorrelated. However, since autoencoded features are only trained for correct reconstruction, they may have correlations.
 PCA is quicker and less expensive to compute than autoencoders.
 PCA is quite similar to a single-layered autoencoder with a linear activation function (see the sketch after this list).
 Because of the large number of parameters, the autoencoder is prone to overfitting.
 Flexibility: Autoencoders offer greater flexibility in capturing complex, non-linear relationships in data compared to the linear combinations of PCA.
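As referenced above, the similarity between PCA and a single-layered linear autoencoder can be sketched as follows: a single hidden layer with linear activations trained with MSE learns the same optimal linear reconstruction subspace that PCA finds directly (in general up to a rotation of the principal components). The sizes below are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.decomposition import PCA

input_size, n_components = 784, 32

inputs = layers.Input(shape=(input_size,))
code = layers.Dense(n_components, activation="linear")(inputs)   # linear "encoder"
outputs = layers.Dense(input_size, activation="linear")(code)    # linear "decoder"
linear_ae = models.Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")
# linear_ae.fit(x_train, x_train, epochs=20, batch_size=256)

# PCA reaches the equivalent optimal linear reconstruction directly:
# pca = PCA(n_components=n_components).fit(x_train)
# x_reconstructed = pca.inverse_transform(pca.transform(x_train))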
