
UNIT-V

Auto Encoders and Deep Generative Models

Undercomplete Auto Encoders

Autoencoders are a type of deep learning algorithm designed to receive an input and transform it into a different representation. They play an important part in image reconstruction.

An autoencoder neural network is an unsupervised machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. Autoencoders are used to reduce the size of the inputs into a smaller representation; if the original data is needed, it can be reconstructed from the compressed representation.

Autoencoders are preferred over Principal Component Analysis (PCA), which is also used for dimensionality reduction, for the following reasons:

 An autoencoder can learn non-linear transformations using non-linear activation functions and multiple layers.
 It does not have to use dense layers; it can use convolutional layers, which are better suited to video, image, and sequential data.
 It is more efficient to learn several layers with an autoencoder than to learn one huge transformation with PCA.
 An autoencoder provides a representation of each layer as the output.
 It can make use of pre-trained layers from another model to apply transfer learning, enhancing the encoder/decoder.

Architecture of Autoencoders

An autoencoder consists of three layers:

1. Encoder
2. Code
3. Decoder

Encoder: This part of the network compresses the input into a latent space representation. The encoder layer encodes the input image as a compressed representation in a reduced dimension; the compressed image is a distorted version of the original image.

Code: This part of the network represents the compressed input that is fed to the decoder.

Decoder: This layer decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image, reconstructed from the latent space representation.


The layer between the encoder and the decoder, i.e. the code, is also known as the bottleneck. It is a well-designed way of deciding which aspects of the observed data are relevant information and which aspects can be discarded.

An autoencoder consists of two parts: an encoder network and a decoder network. The encoder network compresses the input data, while the decoder network reconstructs the compressed data back into its original form.

 The encoder network takes the input data and maps it to a lower-dimensional representation; this lower-dimensional representation is the compressed data. The decoder network takes the compressed data and maps it back to the original input data, so the decoder is essentially the inverse of the encoder.
 The bottleneck layer is the layer in the middle of the autoencoder that contains the compressed data. The size of the bottleneck layer determines how much compression can be achieved.
 Undercomplete autoencoders intentionally constrain the size of the hidden layer to be smaller than the input layer. This constraint forces the model to learn a compressed representation of the input data, capturing only its most essential features.
 Training an undercomplete autoencoder involves optimizing the model parameters to minimize the reconstruction error between the input and the output.
 The objective is to learn a compressed representation of the input data that captures its essential features while discarding redundant information. This is typically achieved through backpropagation and gradient descent, where the weights of the network layers are adjusted iteratively to minimize the reconstruction error, as in the sketch below.
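A minimal sketch of an undercomplete autoencoder and one training step in PyTorch, assuming 784-dimensional inputs (e.g. flattened 28x28 images) and a 32-dimensional bottleneck; the layer sizes, optimizer settings, and dummy batch are illustrative choices, not prescribed by these notes.

```python
import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        # Encoder maps the input down to a smaller (undercomplete) code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder maps the code back to the original dimension.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = UndercompleteAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()               # reconstruction error

x = torch.rand(64, 784)              # dummy batch standing in for real data
optimizer.zero_grad()
recon = model(x)
loss = loss_fn(recon, x)             # target values equal the inputs
loss.backward()
optimizer.step()
```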

Applications of Undercomplete Autoencoders:

Undercomplete autoencoders find applications in various domains where dimensionality reduction is desirable.

 Anomaly Detection: The compressed representation learned by undercomplete autoencoders can highlight anomalies or outliers in the data by capturing deviations from the normal patterns.
 Feature Learning: By focusing on the most important features of the input data, undercomplete autoencoders aid in feature learning, facilitating subsequent analysis and classification tasks.
 Data Compression: The compressed representation generated by undercomplete autoencoders serves as an efficient encoding of the input data, enabling data compression and storage optimization.

Regularized Autoencoders

RAE stands for "Regularized Autoencoder" and refers to a type of autoencoder that incorporates regularization techniques to prevent overfitting and improve generalization. Overfitting occurs when the model learns to fit the noise in the training data rather than the underlying patterns, resulting in poor performance on new, unseen data. Regularization is a method for constraining the model in order to prevent overfitting and improve its ability to generalize to new data.

Regularization allows the model to have:
 Sparsity of representation (sparse autoencoder)
 Robustness to noise (denoising autoencoder)

Applications of RAE

RAE can be useful in a variety of applications, including:

 Dimensionality Reduction: RAE can be used to learn a compressed representation of high-dimensional data, making it easier to work with and visualize.
 Feature Extraction: RAE can be used to extract meaningful features from the input data, which can then be used as inputs to other machine learning models.
 Generative Modeling: RAE can be used to generate new data points that are similar to the training data, which can be useful in image and text generation tasks.

Sparse Autoencoders

Sparse autoencoders (SAEs) impose a sparsity constraint: rather than creating an information bottleneck by reducing the number of nodes in each hidden layer, SAEs create a bottleneck by reducing the number of nodes that can be activated at the same time.

Whereas a standard undercomplete autoencoder uses the entire neural network for each observation, autoencoders with a sparsity function are penalized for each neuron that is activated beyond a certain threshold. This allows the encoder and decoder to have a higher capacity without a corresponding risk of overfitting to the training data (because not all neurons will be activated). It also allows hidden layers to contain nodes dedicated to discovering specific features: the sparsity function ensures that it is only "worth the penalty" to activate those nodes when those features are present.

Though the calculation of reconstruction error and the subsequent optimization of parameter weights through backpropagation occur separately, the optimization is regularized by this sparsity function. The autoencoder is thus forced to learn the most effective latent space representation within the given sparsity constraints. A sketch of such a penalty is shown below.
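A minimal sketch of one training step with a sparsity penalty, assuming an L1 penalty on the hidden activations as the sparsity function (other choices, such as a KL-divergence penalty toward a target activation rate, are also common); sparsity_weight and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
decoder = nn.Sequential(nn.Linear(256, 784), nn.Sigmoid())
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

sparsity_weight = 1e-4               # illustrative strength of the penalty
x = torch.rand(64, 784)              # dummy batch standing in for real data

optimizer.zero_grad()
h = encoder(x)                       # hidden activations (higher-capacity layer)
recon = decoder(h)

recon_loss = nn.functional.mse_loss(recon, x)
sparsity_penalty = h.abs().mean()    # L1 penalty: few units active at once
loss = recon_loss + sparsity_weight * sparsity_penalty

loss.backward()
optimizer.step()
```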

Denoising Autoencoders

Denoising autoencoders are given partially corrupted input data and trained to restore the original input by removing useless information through dimensionality reduction.

Unlike most autoencoders, denoising autoencoders do not receive the ground-truth data as their input. Instead, Gaussian noise is added to the original data (for example, random static added to an image), and the denoising autoencoder (DAE) learns to filter it out. During model training, the reconstruction error of the denoised output is measured not against the corrupted input, but against the original image (see the sketch below).

In addition to preventing overfitting, this training technique also makes denoising autoencoders very useful for cleaning up noisy or corrupted image and audio files. Denoising autoencoders have also served as foundational training paradigms for state-of-the-art image generation models such as Stable Diffusion.
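A minimal sketch of the denoising training step, assuming additive Gaussian noise as the corruption and a simple fully connected encoder/decoder; note that the loss compares the reconstruction against the clean input, not the corrupted one.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x_clean = torch.rand(64, 784)                            # dummy clean batch
x_noisy = x_clean + 0.2 * torch.randn_like(x_clean)      # corrupt with Gaussian noise

optimizer.zero_grad()
recon = decoder(encoder(x_noisy))                        # denoise the corrupted input
loss = nn.functional.mse_loss(recon, x_clean)            # error against the clean data
loss.backward()
optimizer.step()
```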
Contractive Autoencoder
A Contractive Autoencoder (CAE) is a specific type of autoencoder used in unsupervised machine learning. Autoencoders are neural networks designed to learn efficient representations of the input data, called encodings, by training the network to ignore insignificant data ("noise"). These encodings can then be used for tasks such as dimensionality reduction, feature learning, and more.

The "contractive" aspect of CAEs comes from the fact that they are regularized to be insensitive to slight variations in the input data. This is achieved by adding a penalty to the loss function during training, which forces the model to learn a representation that is robust to small changes or noise in the input. The penalty is typically the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input, and it encourages the learned representations to contract around the training data.

A Contractive Autoencoder consists of two main components: an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation, and the decoder reconstructs the input from this representation. The goal is for the reconstructed output to be as close as possible to the original input.

The training process involves minimizing a loss function that has two terms. The first term is the reconstruction loss, which measures the difference between the original input and the reconstructed output. The second term is the regularization term, which measures the sensitivity of the encoded representations to the input. By penalizing this sensitivity, the CAE learns to produce encodings that do not change much when the input is perturbed slightly, leading to more robust features. A sketch of this loss is shown below.
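A minimal sketch of the contractive loss, assuming a single sigmoid encoder layer so that the squared Frobenius norm of the encoder Jacobian can be computed in closed form from the activations and the encoder weights; the regularization strength lam and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 64)                      # single-layer sigmoid encoder
dec = nn.Linear(64, 784)
params = list(enc.parameters()) + list(dec.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

lam = 1e-4                                    # illustrative contractive strength
x = torch.rand(32, 784)                       # dummy batch

optimizer.zero_grad()
h = torch.sigmoid(enc(x))                     # encoder activations
recon = torch.sigmoid(dec(h))
recon_loss = nn.functional.mse_loss(recon, x)

# Squared Frobenius norm of the encoder Jacobian dh/dx.
# For a sigmoid layer, J_ij = h_i * (1 - h_i) * W_ij.
dh_sq = (h * (1 - h)) ** 2                    # shape (batch, hidden)
w_sq = (enc.weight ** 2).sum(dim=1)           # shape (hidden,)
contractive_penalty = (dh_sq * w_sq).sum(dim=1).mean()

loss = recon_loss + lam * contractive_penalty
loss.backward()
optimizer.step()
```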

Contractive Autoencoders offer several advantages:

 Robustness to Noise: By design, CAEs are robust to small perturbations or noise in the input data.
 Improved Generalization: The contractive penalty encourages the model to learn more general features that do not depend on the specific noise or variations present in the training data.
 Stability: The regularization term helps to stabilize the training process by preventing the model from learning trivial or overfitted representations.

Stochastic Autoencoders

 Instead of a deterministic mapping between the input and the latent space, variational autoencoders (VAEs) learn a probability distribution over the latent space.
 This means that for the same input, the encoder can produce different latent representations, leading to diverse outputs during decoding.
 The encoder learns a mapping from the input data to a latent space, and the decoder learns to reconstruct the input data from the latent space.
 The stochastic encoder learns a probability distribution over the latent space, allowing for the generation of new, diverse samples (see the sketch after the key concepts below).

Key Concepts:

 Latent Space: The compressed representation learned by the autoencoder.
 Encoder: The part of the network that maps the input to the latent space.
 Decoder: The part of the network that reconstructs the input from the latent space.
 Stochastic Mapping: The encoder learns a probability distribution over the latent space, rather than a fixed mapping.
 Generative Model: A VAE can be viewed as a generative model, as it can learn to generate new data samples similar to the training data.
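A minimal sketch of a VAE-style stochastic encoder and decoder with the reparameterization trick, assuming a Gaussian latent distribution; the layer sizes and the unweighted KL term are illustrative choices.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 128)
to_mu, to_logvar = nn.Linear(128, 16), nn.Linear(128, 16)
dec = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
params = (list(enc.parameters()) + list(to_mu.parameters())
          + list(to_logvar.parameters()) + list(dec.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(64, 784)                        # dummy batch
optimizer.zero_grad()
h = torch.relu(enc(x))
mu, logvar = to_mu(h), to_logvar(h)            # parameters of the latent Gaussian

# Reparameterization: sample z stochastically while keeping gradients flowing.
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
recon = dec(z)

recon_loss = nn.functional.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl                         # ELBO-style objective

loss.backward()
optimizer.step()
```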

Benefits of Stochastic Autoencoders:

 Generative Capabilities: They can be used to generate new data samples that are similar to the training data.
 Diverse Outputs: The stochastic nature allows a variety of outputs to be generated for the same input.
 Probabilistic Representations: They learn a probability distribution over the latent space, capturing the uncertainty in the data.
 Denoising and Feature Extraction: Like regular autoencoders, they can be used for denoising and feature extraction.

Boltzmann Machine

 A Boltzmann Machine is a generative unsupervised model that involves learning a probability distribution from an original dataset and using it to make inferences about never-before-seen data.
 A Boltzmann Machine has an input layer (also referred to as the visible layer) and one or several hidden layers.
 A Boltzmann Machine uses neurons that are connected not only to neurons in other layers but also to neurons within the same layer.
 Everything is connected to everything: connections are bidirectional, visible neurons are connected to each other, and hidden neurons are also connected to each other.
 A Boltzmann Machine does not merely expect input data; it generates data. Neurons generate information regardless of whether they are hidden or visible.
 For a Boltzmann Machine all neurons are the same; it does not discriminate between hidden and visible neurons. The whole set of neurons is treated as one system, and the machine generates states of that system.

Example: a nuclear power plant.

 The machine learns from the input what the possible connections between all the plant's parameters are and how they influence each other, and it therefore becomes a machine that represents the system.
 A Boltzmann Machine learns how the system works in its normal states through good examples.

Boltzmann Machines are primarily divided into two categories:

 Energy-based Models (EBMs)
 Restricted Boltzmann Machines (RBMs)

When RBMs are stacked on top of each other, they are known as Deep Belief Networks (DBNs).

Restricted Boltzmann Machine

 What makes RBMs different from Boltzmann machines is that the visible nodes are not connected to each other, and the hidden nodes are not connected to each other. Other than that, RBMs are exactly the same as Boltzmann machines.
 An RBM is a neural network that belongs to the family of energy-based models.
 It is a probabilistic, unsupervised, generative deep machine learning algorithm.
 The RBM's objective is to find the joint probability distribution that maximizes the log-likelihood function (the energy function defining this distribution is noted below).
 An RBM is undirected and has only two layers: an input (visible) layer and a hidden layer.
 All visible nodes are connected to all the hidden nodes. Because the RBM has only these two layers, the visible (input) layer and the hidden layer, it is also described as a symmetrical bipartite graph.
 No intralayer connections exist between the visible nodes, and there are also no intralayer connections between the hidden nodes; there are connections only between input and hidden nodes.
 The original Boltzmann machine had connections between all the nodes. Since the RBM restricts the intralayer connections, it is called a Restricted Boltzmann Machine.
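For reference, the energy function that is standard in the RBM literature (not written out explicitly in these notes) for visible units $v$, hidden units $h$, biases $a$, $b$ and weights $W$ is

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,$$

with joint distribution $P(v, h) = \frac{1}{Z} e^{-E(v, h)}$, where $Z$ is the partition function. Maximizing the log-likelihood of the training data under this distribution is the objective mentioned above.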

Working of RBM

 In an RBM we have a symmetric bipartite graph in which no two units within the same group are connected.
 An RBM is a stochastic neural network, which means that each neuron has some random behavior when activated.
 There are two additional layers of bias units (a hidden bias and a visible bias) in an RBM. This is what makes RBMs different from autoencoders.
 The hidden bias helps the RBM produce the activations on the forward pass, and the visible bias helps the RBM reconstruct the input during the backward pass.
 The reconstructed input is always different from the actual input, as there are no connections among the visible units and therefore no way of transferring information among them.
 In the first step of training an RBM with multiple inputs, the inputs are multiplied by the weights and then added to the bias.
 The result is then passed through a sigmoid activation function, and the output determines whether the hidden state gets activated or not.
 The weights form a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns.
 The first hidden node receives the vector multiplication of the inputs with the first column of weights before the corresponding bias term is added to it.

The reverse, or reconstruction, phase is similar to the forward pass but in the opposite direction: the hidden activations are multiplied by the same weights and added to the visible biases to reconstruct the input, as in the sketch below.
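A minimal sketch of one forward and reconstruction pass, assuming binary units and randomly initialized parameters; the final lines hint at a contrastive-divergence-style weight update but do not develop the full training procedure.

```python
import torch

n_visible, n_hidden = 784, 128
W = torch.randn(n_visible, n_hidden) * 0.01    # rows = visible nodes, columns = hidden nodes
b_hidden = torch.zeros(n_hidden)               # hidden bias
b_visible = torch.zeros(n_visible)             # visible bias

v = (torch.rand(64, n_visible) > 0.5).float()  # dummy binary input batch

# Forward pass: weighted inputs plus bias, squashed by a sigmoid,
# determine the probability that each hidden unit activates.
p_h = torch.sigmoid(v @ W + b_hidden)
h = torch.bernoulli(p_h)                       # stochastic activation

# Reconstruction (reverse) phase: same weights, opposite direction, visible bias added.
p_v = torch.sigmoid(h @ W.t() + b_visible)
v_recon = torch.bernoulli(p_v)

# One contrastive-divergence-style update step (sketch only):
p_h_recon = torch.sigmoid(v_recon @ W + b_hidden)
lr = 0.01
W += lr * (v.t() @ p_h - v_recon.t() @ p_h_recon) / v.shape[0]
```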

DEEP BELIEF NETWORKS (DBN)

 Deep Belief Networks are machine learning algorithms that resemble deep neural networks but are not the same.
 They are feedforward neural networks with a deep architecture, i.e., having many hidden layers.
 Unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders make up a DBN, with the hidden layer of each sub-network serving as the visible layer for the next.
 Deep Belief Networks (DBNs) were created to address issues that classic neural networks face in deeply layered networks, such as slow learning, becoming stuck in local minima owing to poor parameter selection, and requiring very large training datasets.
 A DBN is made of several layers of stochastic latent variables. These binary latent variables are often known as feature detectors or hidden units.
 A DBN is a hybrid generative graphical model: the top two layers are undirected, while the layers below have directed links pointing toward the lower layers.
 DBN is an algorithm for unsupervised probabilistic deep learning.

Restricted Boltzmann Machines

A Restricted Boltzmann Machine (RBM) is a type of generative stochastic artificial neural network that can learn a probability distribution from its inputs. Deep learning networks can also use RBMs. Deep belief networks, in particular, can be created by "stacking" RBMs and fine-tuning the resulting deep network via gradient descent and backpropagation.

The Architecture of DBN

 A series of restricted Boltzmann machines connected in a specific order makes up a Deep Belief Network.
 Each RBM is trained until convergence, and the same procedure is applied layer by layer until the whole network is complete.

Working of DBN

 A greedy learning algorithm is used to pre-train a DBN. The greedy learning method employs a layer-by-layer approach for learning the top-down generative weights. These generative weights determine the relationship between the variables in one layer and the variables in the layer above (a stacking sketch follows below).
 On the top two hidden layers, we run numerous steps of Gibbs sampling. The top two hidden layers define an RBM, so this stage effectively draws a sample from it.
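A minimal sketch of greedy layer-wise pre-training, assuming a hypothetical train_rbm helper built from the single-step RBM update shown earlier; the helper name, layer sizes, and hyperparameters are illustrative, not taken from the notes.

```python
import torch

def train_rbm(data, n_hidden, epochs=5, lr=0.01):
    """Hypothetical helper: trains one RBM on `data`, returns (W, b_hidden)."""
    n_visible = data.shape[1]
    W = torch.randn(n_visible, n_hidden) * 0.01
    b_h, b_v = torch.zeros(n_hidden), torch.zeros(n_visible)
    for _ in range(epochs):
        p_h = torch.sigmoid(data @ W + b_h)
        h = torch.bernoulli(p_h)
        v_recon = torch.sigmoid(h @ W.t() + b_v)
        p_h_recon = torch.sigmoid(v_recon @ W + b_h)
        W += lr * (data.t() @ p_h - v_recon.t() @ p_h_recon) / data.shape[0]
    return W, b_h

# Greedy layer-wise pre-training: each RBM's hidden activations become the
# visible data for the next RBM in the stack.
layer_sizes = [784, 256, 64]
data = (torch.rand(128, 784) > 0.5).float()    # dummy training batch
stack = []
for n_hidden in layer_sizes[1:]:
    W, b_h = train_rbm(data, n_hidden)
    stack.append((W, b_h))
    data = torch.sigmoid(data @ W + b_h)       # pass activations up to the next layer
```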

Deep Boltzmann Machine

 A Deep Boltzmann Machine (DBM) is a three-layer generative model. It is similar to a Deep Belief Network, but it also allows bidirectional connections in the bottom layers.
 A Deep Boltzmann Machine is a model with more hidden layers and undirected connections between the nodes.
 A DBM learns features hierarchically from the raw data, and the features extracted in one layer are applied as hidden variables that serve as input to the subsequent layer.
 A DBM incorporates a Markov random field for layer-wise pre-training on large amounts of unlabeled data and then provides feedback from the upper layers to the lower layers. The training algorithm is then fine-tuned by applying backpropagation.
 The training process in a DBM needs to be adapted to define the training information, the weight initialization, and the adjustment parameters.
 A DBM uses stacked layers of restricted Boltzmann machines with a graphical, unsupervised, generative, and probabilistic representation.
 The latent features of the data are detected using the DBM, and its connections are undirected.
 The optimal parameters are detected using unsupervised representation learning.

Working of DBM

 Deep Boltzmann Machines work by first learning about the data in an unsupervised way, which means they look for patterns without being told what to look for.
 They do this using a process that involves adjusting the connections between units based on the data they see.
 This process is similar to tuning a radio to get a clear signal; the DBM 'tunes' itself to resonate with the structure of the data.
 When a DBM is given a set of data, it uses a stochastic, or random, process to decide whether a hidden unit should be turned on or off. This decision is based on the input data and the current state of the other units in the network. By doing this repeatedly, the DBM learns the probability distribution of the data; basically, it gets an understanding of which patterns are likely and which are not.
 After the learning phase, you can use a DBM to generate new data.
 When generating new data, the DBM starts with a random pattern and refines it step by step, each time updating the pattern to be more like the patterns it learned during training.

Key concepts in DBM

 Energy-Based Models: DBMs are energy-based models, which means they assign an 'energy' level to each possible state of the network. States that are more likely have lower energy. The network learns by finding states that minimize this energy.
 Stochastic Neurons: Neurons in a DBM are stochastic. Unlike in other types of neural networks, where neurons output a deterministic value based on their input, DBM neurons make random decisions about whether to activate.
 Unsupervised Learning: DBMs learn without labels. They look at the data and try to understand the underlying structure without any guidance on what features are important.
 Pre-training: DBMs often go through a pre-training phase where they learn one layer at a time. This step-by-step learning helps in stabilizing the learning process before fine-tuning the entire network together.
 Fine-Tuning: After pre-training, DBMs are fine-tuned, which means they adjust all their parameters at once to better model the data.

Generative Adversarial Networks (GAN)

 A generative adversarial network (GAN) is a deep learning architecture. It trains two neural networks to compete against each other in order to generate more authentic new data from a given training dataset.
 GANs are a class of neural networks that autonomously learn patterns in the input data in order to generate new examples resembling the original dataset.

A GAN's architecture consists of two neural networks:

1. Generator: creates synthetic data from random noise, aiming to produce data so realistic that the discriminator cannot distinguish it from real data.
2. Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.

 They use adversarial training to produce artificial data that is indistinguishable from actual data.
 The generator improves its ability to create realistic data, while the discriminator becomes better at detecting fakes.
 Over time, this adversarial process leads to the generation of highly realistic and high-quality data.

The training proceeds roughly as follows:

1. The generator neural network analyzes the training set and identifies data attributes.
2. The discriminator neural network also analyzes the initial training data and distinguishes between the attributes independently.
3. The generator modifies some data attributes by adding noise (or random changes) to certain attributes.
4. The generator passes the modified data to the discriminator.
5. The discriminator calculates the probability that the generated output belongs to the original dataset.
6. The discriminator provides feedback to the generator to reduce the noise vector randomization in the next cycle.

 The generator attempts to maximize the probability of a mistake by the discriminator, while the discriminator attempts to minimize the probability of error.
 Over the training iterations, both the generator and the discriminator evolve and confront each other continuously until they reach an equilibrium state.
 In the equilibrium state, the discriminator can no longer recognize synthesized data, and at that point the training process is over. A minimal training-loop sketch follows below.
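A minimal sketch of the adversarial training loop, assuming fully connected generator and discriminator networks and the standard binary cross-entropy objective; the architecture, noise dimension, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 784)                     # dummy batch standing in for real data
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# Discriminator step: label real data as real and generated data as fake.
noise = torch.randn(32, 64)
fake = G(noise).detach()                       # detach so only D is updated here
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator label generated data as real.
noise = torch.randn(32, 64)
g_loss = bce(D(G(noise)), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```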
