
UNIT VI

Auto-Encoders

P Jyothi, Asst. Prof., CSE Dept.


Auto-Encoders Introduction

 Encoder:
• The encoder is a neural network with one or more hidden layers.
• It receives noisy input data instead of the original input and generates an encoding in a low-dimensional space.
• There are several ways to generate a corrupted input; the most common are adding Gaussian noise or randomly masking some of the inputs.
 Decoder:
• Like the encoder, the decoder is implemented as a neural network with one or more hidden layers.
• It takes the encoding generated by the encoder as input and reconstructs the original data.
• When calculating the loss function, the output values are compared with the original input, not with the corrupted input.
Auto-Encoders Introduction conti..

 An Autoencoder has the following parts (a code sketch follows the list):

1. Encoder: the part of the network that takes in the input and produces a lower-dimensional encoding.
2. Bottleneck: the lower-dimensional hidden layer where the encoding is produced. The bottleneck layer has a smaller number of nodes, and the number of nodes in the bottleneck layer gives the dimension of the encoding of the input.
3. Decoder: takes in the encoding and reconstructs the input.
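To make the three parts concrete, here is a minimal sketch of a fully connected autoencoder in PyTorch. The framework choice and the layer sizes (784 inputs, a 32-unit bottleneck) are illustrative assumptions, not something prescribed by the slides.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder: maps the input to a lower-dimensional encoding.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, bottleneck_dim),   # bottleneck layer
        )
        # Decoder: reconstructs the input from the encoding.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),                     # assumes inputs scaled to [0, 1]
        )

    def forward(self, x):
        code = self.encoder(x)                # encoding produced at the bottleneck
        return self.decoder(code)             # reconstruction of the input
```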
Auto-Encoders Introduction conti..

[Figure: autoencoder architecture showing the encoder (parameters φ), the bottleneck code, and the decoder (parameters θ)]


Auto-Encoders Introduction conti..

 The bottleneck layer is the lower-dimensional layer. In the diagram, we have the encoder and decoder neural networks; φ (phi) and θ (theta) are the parameters of the encoder and decoder, respectively.
 The goal of this model is for the input to be equivalent to the reconstructed output. To achieve this we minimize a loss function called the Reconstruction Loss. The reconstruction loss is the error between the input and the reconstructed output, and is usually given by the mean squared error or the binary cross-entropy between the input and the reconstructed output. Binary cross-entropy is used if the data is binary.
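A minimal training step for this objective, continuing the assumed PyTorch sketch from above, might look as follows. `data_loader` is a hypothetical iterator over input batches; mean squared error is used here, and for binary data `F.binary_cross_entropy` would replace it.

```python
import torch
import torch.nn.functional as F

model = AutoEncoder()                         # model sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in data_loader:                         # hypothetical batch iterator
    x = x.view(x.size(0), -1)                 # flatten each sample to a vector
    x_hat = model(x)                          # reconstructed output

    # Reconstruction loss: error between the input and the reconstructed output.
    loss = F.mse_loss(x_hat, x)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```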
Auto-Encoders Introduction conti..

Autoencoders:
 Autoencoders present an efficient way to learn a
representation of your data, which helps with tasks such as
dimensionality reduction or feature extraction. You can even
train an autoencoder to identify and remove noise from
your data.



Regularization in auto-encoders

 Regularization counteracts the effect of out-of-control parameters by using different methods to minimize parameter size over time.
 In mathematical notation, regularization is represented by the coefficient lambda (λ), which controls the trade-off between finding a good fit and keeping the values of certain feature weights low as the exponents on the features increase.
 L1 and L2 regularization help fight overfitting by making certain weights smaller. Smaller-valued weights lead to simpler hypotheses, which are the most generalizable. Unregularized weights with several higher-order polynomials in the feature set tend to overfit the training set.
 As the training set grows, the effect of regularization decreases and the parameters tend to increase in magnitude. This is appropriate, because an excess of features relative to training examples is what leads to overfitting in the first place. Bigger data is the ultimate regularizer.
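As a rough sketch of how the coefficient λ (lambda) enters the objective, the helper below adds an L2 penalty on the weights to the reconstruction loss. The penalty strength `lam` is an illustrative value; in PyTorch the same effect is usually obtained through the optimizer's `weight_decay` argument.

```python
import torch
import torch.nn.functional as F

def l2_regularized_loss(model, x, lam=1e-4):
    """Reconstruction loss plus an L2 penalty on the weights, scaled by lambda."""
    x_hat = model(x)
    recon_loss = F.mse_loss(x_hat, x)
    # L2 regularization: penalize the squared magnitude of every weight.
    l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
    return recon_loss + lam * l2_penalty
```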
Regularization in auto-encoders conti..

 There are other ways to constrain the reconstruction of an autoencoder than to impose a hidden layer of smaller dimension than the input. Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output. In practice, we usually find two types of regularized autoencoder: the sparse autoencoder and the denoising autoencoder.




Regularization in auto-encoders conti..

 Sparse autoencoder : Sparse autoencoders are typically


used to learn features for another task such as
classification. An autoencoder that has been regularized to
be sparse must respond to unique statistical features of the
dataset it has been trained on, rather than simply acting as
an identity function. In this way, training to perform the
copying task with a sparsity penalty can yield a model that
has learned useful features as a byproduct.



Regularization in auto-encoders conti..

 Another way to constrain the reconstruction of an autoencoder is to impose a constraint on its loss. We could, for example, add a regularization term to the loss function. Doing this makes the autoencoder learn a sparse representation of the data.
 Denoising autoencoder: Rather than adding a penalty to the loss function, we can obtain an autoencoder that learns something useful by changing the reconstruction error term of the loss function. This can be done by adding some noise to the input image and making the autoencoder learn to remove it. By this means, the encoder will extract the most important features and learn a more robust representation of the data.


Regularization in auto-encoders conti..

 There are two different ways to construct the sparsity penalty: L1 regularization and KL-divergence. Here we will only talk about L1 regularization.
 L1 regularization and L2 regularization are widely used in machine learning and deep learning. L1 regularization adds the "absolute value of magnitude" of the coefficients as the penalty term, while L2 regularization adds the "squared magnitude" of the coefficients as the penalty term.
 Although L1 and L2 can both be used as regularization terms, the key difference between them is that L1 regularization tends to shrink coefficients all the way to zero, while L2 regularization moves coefficients towards zero but never makes them exactly zero. Thus L1 regularization is often used as a method of feature selection. But why does L1 regularization lead to sparsity?
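Written out, the two penalized objectives differ only in the penalty term (J is the unregularized loss, w_i the weights, λ the regularization coefficient). The kink of |w| at zero is what lets the L1 penalty push weights exactly to zero, whereas the smooth quadratic of L2 only shrinks them towards zero.

```latex
J_{L1} = J + \lambda \sum_i |w_i|
\qquad\qquad
J_{L2} = J + \lambda \sum_i w_i^{2}
```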


Denoising Autoencoders

 Autoencoders are neural networks which are commonly used for feature selection and extraction. However, when there are more nodes in the hidden layer than there are inputs, the network risks learning the so-called "identity function", also called the "null function", meaning that the output simply equals the input, rendering the autoencoder useless.
 Denoising Autoencoders (DAE) solve this problem by corrupting the data on purpose, randomly turning some of the input values to zero. In general, the percentage of input nodes being set to zero is about 50%; other sources suggest a lower fraction, such as 30%. It depends on the amount of data and the number of input nodes you have.
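The two common corruption schemes mentioned above can be sketched as small helper functions (the 30% masking fraction and the noise scale are illustrative assumptions):

```python
import torch

def mask_noise(x, drop_prob=0.3):
    """Masking corruption: randomly set a fraction of input values to zero."""
    mask = (torch.rand_like(x) > drop_prob).float()
    return x * mask

def gaussian_noise(x, std=0.1):
    """Additive corruption: add zero-mean Gaussian noise to the input."""
    return x + std * torch.randn_like(x)
```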


Denoising Autoencoders Conti..

• If DAEs are trained with partially corrupted inputs (e.g., with masked values), they learn to impute or fill in missing information during the reconstruction process. This makes them useful for tasks involving incomplete datasets.
• If DAEs are trained with partially noisy inputs (Gaussian noise), they tend to generalize well to unseen, real-world data with different levels of noise or corruption, because they learn to extract robust features. This is beneficial in applications where data quality is compromised, such as image denoising or signal processing.
Denoising Autoencoders Conti..

Objective Function of DAE

 The objective of a DAE is to minimize the difference between the original input (the clean input, without the noise) and the reconstructed output. This is quantified using a reconstruction loss function. Two types of loss function are generally used, depending on the type of input data.
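In symbols, with x the clean input, x̃ its corrupted version, f the encoder and g the decoder, the two commonly used reconstruction losses can be written as follows (a standard formulation, not spelled out on the slide): mean squared error for continuous data and binary cross-entropy for binary data.

```latex
L_{\mathrm{MSE}} = \lVert x - g(f(\tilde{x})) \rVert^{2}
\qquad
L_{\mathrm{BCE}} = -\sum_i \left[ x_i \log \hat{x}_i + (1 - x_i)\log(1 - \hat{x}_i) \right],
\quad \hat{x} = g(f(\tilde{x}))
```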


Denoising Autoencoders Conti..

 A denoising autoencoder is a modification of the original autoencoder in which, instead of giving the original input, we give a corrupted or noisy version of the input to the encoder, while the loss is calculated with respect to the original input only. This results in more efficient learning, and the risk of the autoencoder becoming an identity function is significantly reduced.


Denoising Autoencoders Conti..

 Applications of DAE
• Image Denoising: DAEs are widely employed for cleaning and
enhancing images by removing noise.
• Audio Denoising: DAEs can be applied to denoise audio signals, making
them valuable in speech-enhancement tasks.
• Sensor Data Processing: DAEs are valuable in processing sensor data,
removing noise, and extracting relevant information from sensor
readings.
• Data Compression: Autoencoders, including DAEs, can be utilized for
data compression by learning compact representations of input data.
• Feature Learning: DAEs are effective in unsupervised feature learning,
capturing relevant features in the data without explicit labels.



Denoising Autoencoders Conti..

 Advantages
1. This type of autoencoder can extract important features and reduce the noise or the useless features.
2. Denoising autoencoders can be used as a form of data augmentation: the restored images can be used as augmented data, thus generating additional training samples.
 Disadvantages
1. Selecting the right type and level of noise to introduce can be challenging and may require domain knowledge.
2. The denoising process can result in the loss of some information that is needed from the original input. This loss can impact the accuracy of the output.


Denoising Autoencoders Conti..

Architecture

[Figure: denoising autoencoder architecture (corrupted input -> encoder -> code -> decoder -> reconstruction compared against the clean input)]


Denoising Autoencoders Conti..

 When calculating the Loss function, it is important to


compare the output values with the original input, not with
the corrupted input. That way, the risk of learning the
identity function instead of extracting features is eliminated.



Sparse Autoencoder

 A sparse autoencoder is simply an autoencoder whose training criterion involves a sparsity penalty. In most cases, we construct the loss function by penalizing activations of the hidden layers so that only a few nodes are encouraged to activate when a single sample is fed into the network.
 This type of autoencoder typically contains more hidden units than the input, but only a few are allowed to be active at once. This property is called the sparsity of the network. The sparsity of the network can be controlled by manually zeroing the required hidden units, tuning the activation functions, or adding a loss term to the cost function.
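A sketch of the loss-term approach, continuing the assumed PyTorch setting: the L1 penalty is applied to the bottleneck activations rather than to the weights, so only a few hidden units stay active for any given input. The penalty weight is an illustrative value, and the model is assumed to expose `encoder` and `decoder` parts as in the earlier sketch.

```python
import torch
import torch.nn.functional as F

def sparse_ae_loss(model, x, sparsity_weight=1e-3):
    """Reconstruction loss plus an L1 penalty on the hidden activations."""
    code = model.encoder(x)                   # bottleneck activations
    x_hat = model.decoder(code)
    recon_loss = F.mse_loss(x_hat, x)
    # L1 penalty on activations: encourages most hidden units to stay near zero.
    sparsity_penalty = code.abs().mean()
    return recon_loss + sparsity_weight * sparsity_penalty
```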


Sparse Autoencoder Conti..

 The intuition behind this method is that, for example, if a


man claims to be an expert in mathematics, computer
science, psychology, and classical music, he might be just
learning some quite shallow knowledge in these subjects.
However, if he only claims to be devoted to mathematics,
we would like to anticipate some useful insights from him.
And it’s the same for autoencoders we’re training — fewer
nodes activating while still keeping its performance would
guarantee that the autoencoder is actually learning latent
representations instead of redundant information in our
input data.


Sparse Autoencoder Conti..

 Advantages
1. The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant features during the encoding process.
2. These autoencoders often learn important and meaningful features due to their emphasis on sparse activations.
 Disadvantages
1. The choice of hyperparameters plays a significant role in the performance of this autoencoder. Different inputs should result in the activation of different nodes of the network.
2. The application of the sparsity constraint increases computational complexity.


Contractive auto-encoders

 A Contractive Autoencoder (CAE) is a specific type of


autoencoder used in unsupervised machine learning.
Autoencoders are neural networks designed to learn
efficient representations of the input data, called encodings,
by training the network to ignore insignificant data (“noise”).
These encodings can then be used for tasks such as
dimensionality reduction, feature learning, and more.



Contractive auto-encoders conti..

 A Contractive Autoencoder consists of two main components: an


encoder and a decoder. The encoder compresses the input into a
lower-dimensional representation, and the decoder reconstructs the
input from this representation. The goal is for the reconstructed output
to be as close as possible to the original input.
 The training process involves minimizing a loss function that has two
terms. The first term is the reconstruction loss, which measures the
difference between the original input and the reconstructed output. The
second term is the regularization term, which measures the sensitivity
of the encoded representations to the input. By penalizing the
sensitivity, the CAE learns to produce encodings that do not change
much when the input is perturbed slightly, leading to more robust
features.
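For a single sigmoid encoding layer h = σ(Wx + b), the Frobenius norm of the encoder's Jacobian has a simple closed form, so the two-term loss described above can be sketched as follows. The layer sizes and the penalty weight λ are illustrative assumptions, and this is only one common way to implement the contractive penalty.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContractiveAE(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.enc = nn.Linear(input_dim, code_dim)
        self.dec = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))        # encoding
        return torch.sigmoid(self.dec(h)), h

def contractive_loss(model, x, lam=1e-4):
    x_hat, h = model(x)
    recon = F.mse_loss(x_hat, x)              # reconstruction term
    # Contractive term: squared Frobenius norm of the Jacobian dh/dx.
    # For a sigmoid layer, J = diag(h * (1 - h)) @ W, so
    # ||J||_F^2 = sum_i (h_i (1 - h_i))^2 * sum_j W_ij^2.
    W = model.enc.weight                      # shape (code_dim, input_dim)
    dh = (h * (1 - h)).pow(2)                 # shape (batch, code_dim)
    w_sq = W.pow(2).sum(dim=1)                # shape (code_dim,)
    jacobian_penalty = (dh * w_sq).sum(dim=1).mean()
    return recon + lam * jacobian_penalty
```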
Contractive auto-encoders conti..

 The principle that contractive autoencoders are based on is quite similar to that of denoising autoencoders: the encodings produced for similar inputs should be similar. In other words, if we change or slightly perturb the inputs, the encodings should change very little. They are used for feature extraction.
 Autoencoders can be implemented using any kind of neural network; for image data we can use convolutional neural nets, and for time-series data we can use recurrent neural nets.
Contractive auto-encoders conti..

Applications of Contractive Autoencoders


 Contractive Autoencoders have several applications in the field of machine learning
and artificial intelligence:
• Feature Learning: CAEs can learn to capture the most salient features in the data,
which can then be used for various downstream tasks such as classification or
clustering.
• Dimensionality Reduction: Like other autoencoders, CAEs can reduce the dimensionality of data, which is useful for visualization or as a preprocessing step for other algorithms that perform poorly with high-dimensional data.
• Denoising: Due to their contractive property, CAEs can be used to remove noise
from data, as they learn to ignore small variations in the input.
• Data Generation: While not their primary application, autoencoders can generate
new data points by decoding samples from the learned encoding space.



Contractive auto-encoders conti..

Advantages of Contractive Autoencoders


 Contractive Autoencoders offer several advantages:
• Robustness to Noise: By design, CAEs are robust to small perturbations or
noise in the input data.
• Improved Generalization: The contractive penalty encourages the model to
learn more general features that do not depend on the specific noise or
variations present in the training data.
• Stability: The regularization term helps to stabilize the training process by
preventing the model from learning trivial or overfitted representations.



Contractive auto-encoders conti..

 Challenges with Contractive Autoencoders


 Despite their advantages, CAEs also present some challenges:
• Computational Complexity: Calculating the Jacobian matrix for the
contractive penalty can be computationally expensive, especially for large
neural networks.
• Hyperparameter Tuning: The strength of the contractive penalty is controlled by a hyperparameter that needs to be carefully tuned to balance the reconstruction loss and the regularization term.
• Choice of Regularization: The effectiveness of the CAE can depend on the
choice of regularization term, and different problems may require different forms
of the contractive penalty.



Structured Probabilistic Models for Deep
Learning
 A structured probabilistic model is a way of describing a probability distribution, using a graph (consisting of nodes and edges) to describe which random variables in the probability distribution interact with each other directly.
 Structured probabilistic models are often also referred to as graphical models.


Structured Probabilistic Models for Deep
Learning Conti..
 They are a way of describing probability distributions using a graph to describe which variables interact with each other directly.
 "Graph" is used here in the sense of graph theory: vertices connected to one another by edges.


Structured Probabilistic Models for Deep
Learning Conti..
 Structured probabilistic models use graphs to represent
interactions between random variables. Each node
represents a random variable. Each edge represents a
direct interaction. These direct interactions imply other,
indirect interactions, but only the direct interactions need
to be explicitly modeled. In the following sections we
describe two categories of graphical models: models
based on directed acyclic graphs, and models based on
undirected graphs.



Structured Probabilistic Models for Deep
Learning Conti..
1. Directed Models:

 One kind of structured probabilistic model is the directed graphical model, otherwise known as the belief network or Bayesian network. Its edges are directed; that is, they point from one vertex to another. Drawing an arrow from a to b means that the distribution over b depends on the value of a.


Structured Probabilistic Models for Deep
Learning Conti..
 As an example, consider a relay race in which Alice runs the first leg, Bob the second, and Carol the third, and suppose we name Alice's, Bob's and Carol's finishing times t0, t1 and t2 respectively. Our estimate of t1 depends on t0, while our estimate of t2 depends directly on t1 but only indirectly on t0. We can draw this relationship as a directed graphical model with arrows t0 -> t1 -> t2 (figure 1).
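For this three-variable chain, the general factorization given on the next slide specializes to a product of one marginal and two conditionals:

```latex
p(t_0, t_1, t_2) = p(t_0)\, p(t_1 \mid t_0)\, p(t_2 \mid t_1)
```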


Structured Probabilistic Models for Deep
Learning Conti..
 Formally, a directed graphical model defined on variables x is defined by a directed acyclic graph G whose vertices are the random variables in the model, and a set of local conditional probability distributions p(x_i | Pa_G(x_i)), where Pa_G(x_i) gives the set of parents of x_i in G. The probability distribution over x is given by

p(x) = Π_i p(x_i | Pa_G(x_i))




Structured Probabilistic Models for Deep
Learning Conti..
2. Undirected Models:

 Another popular language is that of undirected models, otherwise known as Markov random fields (MRFs) or Markov networks.
 Not all situations we might want to model have such a clear direction to their interactions. When the interactions seem to have no intrinsic direction, or to operate in both directions, it may be more appropriate to use an undirected model.


Structured Probabilistic Models for Deep
Learning Conti..
 As an example of such a situation, suppose we want to model a distribution over three binary variables: whether or not you are sick, whether or not your coworker is sick, and whether or not your roommate is sick, represented by h_y, h_c and h_r. Assuming that your coworker and your roommate do not know each other, it is very unlikely that one of them will give the other an infection directly. However, it is reasonably likely that either of them could give you a cold, and that you could pass it on to the other.
 We can model the indirect transmission of a cold from your coworker to your roommate by modeling the transmission of the cold from your coworker to you and the transmission of the cold from you to your roommate. See figure 2 for the drawing representing this scenario. Unlike in directed models, an edge in an undirected model has no arrow and is not associated with a conditional probability distribution.
Structured Probabilistic Models for Deep
Learning Conti..

[Figure 2: undirected graph with edges h_c -- h_y and h_y -- h_r; there is no direct edge between h_c and h_r]


Structured Probabilistic Models for Deep
Learning Conti..
 Formally, an undirected graphical model is a structured probabilistic model defined on an undirected graph G. For each clique C in the graph, a factor φ(C) (also called a clique potential) measures the affinity of the variables in that clique for being in each of their possible joint states. The factors are constrained to be non-negative. Together they define an unnormalized probability distribution

p̃(x) = Π_{C ∈ G} φ(C)


Structured Probabilistic Models for Deep
Learning Conti..
 To complete the model, we would also need to define a similar factor for the clique containing h_y and h_r.
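Putting the two clique factors together, the full distribution for the cold example can be written in the standard normalized form, where Z is the partition function that sums the unnormalized probability over all joint states:

```latex
p(h_y, h_c, h_r) = \frac{1}{Z}\, \phi_1(h_y, h_c)\, \phi_2(h_y, h_r),
\qquad
Z = \sum_{h_y, h_c, h_r} \phi_1(h_y, h_c)\, \phi_2(h_y, h_r)
```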


Structured Probabilistic Models for Deep
Learning Conti..
Advantages of Structured Modeling:
 The primary advantage of using structured probabilistic models
is that they allow us to dramatically reduce the cost of
representing probability distributions as well as learning and
inference. Sampling is also accelerated in the case of directed
models, while the situation can be complicated with undirected
models. A less quantifiable benefit of using structured
probabilistic models is that they allow us to explicitly separate
representation of knowledge from learning of knowledge or
inference given existing knowledge. This makes our models
easier to develop and debug.

