Unit 5
An autoencoder is a type of artificial neural network used to learn data encodings in an unsupervised
manner. The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of
higher-dimensional data, typically for dimensionality reduction, by training the network to capture
the most important parts of the input image.
Properties of Autoencoders
An autoencoder learns two functions:
1. An encoding function that transforms the input data, and a decoding function that
recreates the input data from the encoded representation. The autoencoder learns an
efficient representation (encoding) for a set of data, typically for dimensionality
reduction.
2. Data-specific: An autoencoder can only meaningfully compress data similar to what it was trained on, unlike a standard compression algorithm.
3. Lossy: The output of an autoencoder will not be exactly the same as the input; it will be a close but degraded representation.
4. Unsupervised: No labels are needed, since the autoencoder is trained on the input data itself.
1. Encoder: A module that compresses the input data into an encoded representation that is typically
several orders of magnitude smaller than the input data.
2. Bottleneck: A module that contains the compressed knowledge representations and is therefore
the most important part of the network.
3. Decoder: A module that helps the network "decompress" the knowledge representation and
reconstructs the data back from its encoded form. The output is then compared with the ground truth.
The architecture as a whole consists of these three components: encoder, bottleneck, and decoder.
Encoder
The encoder is a set of convolutional blocks followed by pooling modules that compress the input to
the model into a compact section called the bottleneck. The bottleneck is followed by the decoder, which
consists of a series of upsampling modules that bring the compressed features back into the form of an
image. In the case of simple autoencoders, the output is expected to be the same as the input data with
reduced noise. For variational autoencoders, however, the output is a completely new image, formed from the
information the model has been provided as input.
Bottleneck
The most important part of the neural network, and ironically the smallest one, is the bottleneck. The
bottleneck exists to restrict the flow of information to the decoder from the encoder, thus allowing
only the most vital information to pass through. Since the bottleneck is designed in such a way that
the maximum information possessed by an image is captured in it, we can say that the bottleneck
helps us form a knowledge representation of the input.
Thus, the encoder-decoder structure helps us extract the most from an image in the form of data and
establish useful correlations between various inputs within the network. A bottleneck, as a compressed
representation of the input, further prevents the neural network from memorising the input and
overfitting on the data. However, very small bottlenecks restrict the amount of information storable,
which increases the chances of important information slipping out through the pooling layers of the
encoder.
Decoder
Finally, the decoder is a set of upsampling and convolutional blocks that reconstructs the bottleneck's
output.
Since the input to the decoder is a compressed knowledge representation, the decoder serves as a
“decompressor” and builds back the image from its latent attributes.
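To make the encoder-bottleneck-decoder structure concrete, here is a minimal sketch of a convolutional autoencoder in PyTorch. The 1x28x28 input size, the layer widths, and the bottleneck dimension of 32 are illustrative assumptions, not a prescribed architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal convolutional autoencoder: encoder -> bottleneck -> decoder."""
    def __init__(self, bottleneck_dim=32):
        super().__init__()
        # Encoder: convolutions that progressively compress a 1x28x28 image
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # -> 16x14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> 32x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, bottleneck_dim),                  # bottleneck
        )
        # Decoder: upsampling (transposed convolutions) back to 1x28x28
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),  # -> 16x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # -> 1x28x28
            nn.Sigmoid(),  # outputs in [0, 1], matching normalised image inputs
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed knowledge representation
        return self.decoder(z)   # reconstruction of the input

# quick shape check with random data
model = ConvAutoencoder()
x = torch.randn(8, 1, 28, 28)
print(model(x).shape)  # torch.Size([8, 1, 28, 28])
```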
Before training an autoencoder, there are a few hyperparameters to set:
1. Code size: The code size or the size of the bottleneck is the most important hyperparameter
used to tune the autoencoder. The bottleneck size decides how much the data has to be
compressed. This can also act as a regularisation term.
2. Number of layers: The depth of the encoder and the decoder is another hyperparameter; deeper
networks can model more complex encodings but are harder to train.
3. Number of nodes per layer: The number of nodes per layer defines the weights we use per
layer. Typically, the number of nodes decreases with each subsequent layer in the
autoencoder as the input to each of these layers becomes smaller across the layers.
4. Reconstruction loss: The loss function we use to train the autoencoder is highly dependent
on the type of input and output we want the autoencoder to adapt to. If we are working with
image data, the most popular loss functions for reconstruction are MSE loss and L1 loss. In
case the inputs and outputs are within the range [0, 1], as in MNIST, we can also make use
of binary cross-entropy as the reconstruction loss. A sketch of how the reconstruction loss is
used during training is given below.
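The sketch below shows how the reconstruction loss drives training; the tiny fully connected autoencoder, the layer sizes, and the random stand-in batch are assumptions for illustration. With inputs in [0, 1], nn.MSELoss could be swapped for nn.L1Loss or nn.BCELoss.

```python
import torch
import torch.nn as nn

# Tiny fully connected autoencoder on flattened 28x28 images (illustrative sizes).
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),                 # bottleneck (code size)
    nn.ReLU(),
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid()   # outputs in [0, 1]
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # swap for nn.L1Loss() or nn.BCELoss() with [0, 1] data

batch = torch.rand(64, 784)  # stand-in for a real data loader
for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(batch), batch)  # the target is the input itself
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: reconstruction loss = {loss.item():.4f}")
```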
The first applications date back to the 1980s. Initially used for dimensionality reduction and feature
learning, the autoencoder concept has evolved over the years and is now widely used for learning
generative models of data.
1. Undercomplete autoencoders
2. Sparse autoencoders
3. Contractive autoencoders
4. Denoising autoencoders
5. Variational autoencoders
1. Undercomplete autoencoders
An undercomplete autoencoder takes in an image and tries to predict the same image as output, thus
reconstructing the image from the compressed bottleneck region. Undercomplete autoencoders are
truly unsupervised as they do not take any form of label; the target is the same as the input. The
primary use of such autoencoders is the generation of the latent space, or bottleneck, which
forms a compressed substitute of the input data and can be easily decompressed back with the help
of the network when needed. This form of compression of the data can be modeled as a form
of dimensionality reduction.
When we think of dimensionality reduction, we tend to think of methods like PCA (Principal
Component Analysis) that form a lower-dimensional hyperplane to represent data that lives in a higher-
dimensional space without losing too much information. However, PCA can only build linear relationships.
As a result, it is put at a disadvantage compared with methods like undercomplete autoencoders, which
can learn non-linear relationships and, therefore, perform better in dimensionality reduction. This form
of nonlinear dimensionality reduction, where the autoencoder learns a non-linear manifold, is also termed
manifold learning. Effectively, if we remove all non-linear activations from an undercomplete
autoencoder and use only linear layers, we reduce the undercomplete autoencoder to something
that works on an equal footing with PCA. The loss function used to train an undercomplete
autoencoder is called the reconstruction loss, as it is a check of how well the image has been
reconstructed from the input data.
Although the reconstruction loss can be anything depending on the input and output, we will use an
L1 loss (also called the norm loss) to depict the term:

L(x, x̂) = |x − x̂|

where x̂ represents the predicted output and x represents the ground truth.
As the loss function has no explicit regularisation term, the only method to ensure that the model is
not memorising the input data is by regulating the size of the bottleneck and the number of hidden
layers within this part of the network—the architecture.
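To illustrate the relationship with PCA, the sketch below compares PCA with a purely linear undercomplete autoencoder on synthetic data; the data, dimensions, and training settings are assumptions for illustration. Adding non-linear activations is what lets the autoencoder go beyond PCA.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# Synthetic 20-dimensional data lying near a 5-dimensional subspace (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(1000, 20))
X = X.astype(np.float32)

# PCA baseline: linear projection to 5 components and back.
pca = PCA(n_components=5).fit(X)
pca_err = np.mean((X - pca.inverse_transform(pca.transform(X))) ** 2)

# Linear undercomplete autoencoder: no activations, so it can at best match PCA.
ae = nn.Sequential(nn.Linear(20, 5), nn.Linear(5, 20))
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
Xt = torch.from_numpy(X)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(Xt), Xt)
    loss.backward()
    opt.step()

print(f"PCA reconstruction MSE:                {pca_err:.5f}")
print(f"Linear autoencoder reconstruction MSE: {loss.item():.5f}")
# Inserting non-linear activations (e.g. nn.ReLU between the layers) lets the
# autoencoder model curved manifolds, which PCA cannot.
```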
2. Sparse autoencoders
Sparse autoencoders are similar to the undercomplete autoencoders in that they use the same image
as input and ground truth. However—
While undercomplete autoencoders are regulated and fine-tuned by adjusting the size of the
bottleneck, the sparse autoencoder is regulated by constraining the number of active nodes at each hidden layer.
Since it is not possible to design a neural network that has a flexible number of nodes at its hidden
layers, sparse autoencoders work by penalizing the activation of some neurons in hidden layers.
In other words, the loss function has a term that calculates the number of neurons that have been
activated and provides a penalty that is directly proportional to that.
This penalty, called the sparsity function, prevents the neural network from activating more neurons
and serves as a regularizer.
While typical regularizers work by creating a penalty on the size of the weights at the nodes, the sparsity
regularizer works by creating a penalty on the number of nodes activated. This form of regularization
allows the network to have nodes in hidden layers dedicated to finding specific features in images during
training, and treats the regularization problem as separate from the latent space problem.
We can thus set the latent space dimensionality at the bottleneck without worrying about regularization.
There are two primary ways in which the sparsity regularizer term can be incorporated into the loss
function.
L1 Loss: Here, we add the magnitude of the sparsity regularizer as we do for general regularizers:

L(x, x̂) + λ Σ_i |a_i^(h)|

where h represents the hidden layer, i represents the image in the minibatch, and a represents the
activation.
KL-Divergence: In this case, we consider the activations over a collection of samples at once rather
than summing them as in the L1 Loss method. We constrain the average activation of each neuron
over this collection.
Considering the ideal distribution as a Bernoulli distribution, we include KL divergence within the
loss to reduce the difference between the current distribution of the activations and the ideal
(Bernoulli) distribution:

L(x, x̂) + Σ_j KL(ρ || ρ̂_j),   where ρ̂_j = (1/m) Σ_i [a_i^(h)(x)]

Here ρ is the desired average activation, the subscript j denotes the specific neuron in layer h, and the
average is taken over a collection of m samples, each denoted as x.
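A sketch of both sparsity penalties added to a reconstruction loss is shown below; the layer sizes, the target sparsity ρ, and the penalty coefficients are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a sparsity penalty added to the reconstruction loss.
# Encoder/decoder sizes and the coefficients are illustrative assumptions.
encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())  # hidden activations in (0, 1)
decoder = nn.Linear(128, 784)

def sparse_loss(x, rho=0.05, l1_coeff=1e-4, kl_coeff=1e-3):
    a = encoder(x)                       # hidden-layer activations a^(h)
    x_hat = decoder(a)
    recon = F.mse_loss(x_hat, x)

    # L1 penalty: lambda * sum_i |a_i|, averaged over the batch
    l1_penalty = a.abs().sum(dim=1).mean()

    # KL penalty: compare the average activation rho_hat_j of each neuron
    # (over the batch of m samples) with the target sparsity rho.
    rho_hat = a.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    kl = rho * torch.log(rho / rho_hat) + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))
    kl_penalty = kl.sum()

    return recon + l1_coeff * l1_penalty + kl_coeff * kl_penalty

x = torch.rand(64, 784)
print(sparse_loss(x))
```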
3. Contractive autoencoders
A contractive autoencoder learns a representation that is robust to small changes in the input by adding
a penalty on how sensitive the hidden layer is to the input. Mathematically, the loss combines the
reconstruction loss with the Frobenius norm of the Jacobian of the hidden layer with respect to the input:

L(x, x̂) + λ ||∂h(x)/∂x||_F^2
An important thing to note in the loss function (formed from the norm of the derivatives and the
reconstruction loss) is that the two terms contradict each other. While the reconstruction loss wants
the model to tell differences between two inputs and observe variations in the data, the Frobenius
norm of the derivatives says that the model should be able to ignore variations in the input data.
Putting these two contradictory conditions into one loss function enables us to train a network where
the hidden layers now capture only the most essential information. This information is necessary to
separate images and ignore information that is non-discriminatory in nature, and therefore, not
important.
Here h is the hidden layer for which the gradient is calculated with respect to the input x, i.e. ∂h(x)/∂x.
The gradient is summed over all training samples, and a Frobenius norm of the same is taken.
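Below is a sketch of the contractive penalty for a single sigmoid encoder layer, for which the Frobenius norm of the Jacobian has a simple closed form; the layer sizes and the coefficient λ are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# For h = sigmoid(W x + b), the Jacobian is dh/dx = diag(h * (1 - h)) W, so
# ||J||_F^2 = sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2  (layer sizes are illustrative).
W = nn.Parameter(torch.randn(128, 784) * 0.01)
b = nn.Parameter(torch.zeros(128))
decoder = nn.Linear(128, 784)

def contractive_loss(x, lam=1e-4):
    h = torch.sigmoid(x @ W.t() + b)        # hidden representation
    x_hat = decoder(h)
    recon = F.mse_loss(x_hat, x)
    dh = (h * (1 - h)) ** 2                 # squared derivative of the sigmoid
    w_sq = (W ** 2).sum(dim=1)              # sum_i W_ji^2 for each hidden unit j
    frob = (dh * w_sq).sum(dim=1).mean()    # Frobenius norm term, averaged over the batch
    return recon + lam * frob

x = torch.rand(32, 784)
print(contractive_loss(x))
```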
Applications of autoencoders
Now that you understand various types of autoencoders, let’s summarize some of their most common
use cases.
1. Dimensionality reduction
Undercomplete autoencoders are the ones typically used for dimensionality reduction.
They can be used as a pre-processing step for dimensionality reduction as they can perform fast and
accurate dimensionality reduction without losing much information. Furthermore, while
dimensionality reduction procedures like PCA can only perform linear dimensionality reduction,
undercomplete autoencoders can perform large-scale non-linear dimensionality reduction.
2. Image denoising
Autoencoders like the denoising autoencoder can be used for performing efficient and highly accurate
image denoising.
Unlike traditional methods of denoising, autoencoders do not search for noise; they extract the image
from the noisy data fed to them by learning a representation of it. The representation is
then decompressed to form a noise-free image. Denoising autoencoders can thus denoise complex
images that cannot be denoised via traditional methods.
3. Generation of image and time series data
Variational autoencoders can be used to generate both image and time series data.
The parameterized distribution at the bottleneck of the autoencoder can be randomly sampled to
generate discrete values for latent attributes, which can then be forwarded to the decoder, leading to
the generation of image data. VAEs can also be used to model time series data like music.
4. Anomaly detection
For example, consider an autoencoder that has been trained on a specific dataset P. For any image
sampled from the training dataset, the autoencoder is bound to give a low reconstruction loss and is
supposed to reconstruct the image as it is. For any image which is not present in the training dataset,
however, the autoencoder cannot perform the reconstruction, as the latent attributes are not adapted
for a specific image that has never been seen by the network. As a result, the outlier image gives off
a very high reconstruction loss and can easily be identified as an anomaly with the help of a proper
threshold.
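A minimal sketch of this thresholding idea is shown below; the untrained stand-in autoencoder, the random data, and the mean-plus-three-standard-deviations threshold are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Flag inputs whose reconstruction error exceeds a threshold chosen from the
# training data. `model` would be a trained autoencoder; an untrained stand-in
# is used here just to show the mechanics.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())

def reconstruction_error(x):
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)       # per-sample MSE

train_data = torch.rand(1000, 784)              # stand-in for dataset P
errors = reconstruction_error(train_data)
threshold = errors.mean() + 3 * errors.std()    # e.g. mean + 3 standard deviations

new_samples = torch.rand(10, 784)
is_anomaly = reconstruction_error(new_samples) > threshold
print(is_anomaly)
```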
Autoencoders can be used for image denoising, image compression, and, in some cases, even
generation of image data.
While autoencoders might seem easy at first glance (as they have a very simple theoretical
background), making them learn a representation of the input that is meaningful is quite
difficult.
Autoencoders like the undercomplete autoencoder and the sparse autoencoder do not have
large-scale applications in computer vision compared to VAEs and DAEs, which are still widely
used (VAEs were proposed in 2013 by Kingma et al.).
4. Denoising autoencoders
Denoising autoencoders, as the name suggests, are autoencoders that remove noise from an image.
As opposed to autoencoders we’ve already covered, this is the first of its kind that does not have the
input image as its ground truth.
In denoising autoencoders, we feed a noisy version of the image, where noise has been added via
digital alterations. The noisy image is fed to the encoder-decoder architecture, and the output is
compared with the ground truth image.
The denoising autoencoder gets rid of noise by learning a representation of the input from which the noise
can be filtered out easily. While removing noise directly from the image seems difficult, the
autoencoder performs this by mapping the input data into a lower-dimensional manifold (as in
undercomplete autoencoders), where filtering of noise becomes much easier. Essentially, denoising
autoencoders work with the help of non-linear dimensionality reduction. The loss function generally
used in these types of networks is the L2 or L1 loss.
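A sketch of this training setup is shown below: the model receives the noisy image, but the loss is computed against the clean image. The architecture, noise level, and random stand-in data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Denoising setup: the encoder-decoder sees a noisy image but is trained
# against the clean image (architecture and noise level are illustrative).
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(64, 784)                                   # stand-in for clean images
noisy = (clean + 0.2 * torch.randn_like(clean)).clamp(0, 1)   # digitally added noise

for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)  # ground truth is the clean image
    loss.backward()
    optimizer.step()
print(loss.item())
```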
5. Variational autoencoders
Standard autoencoders learn to represent the input just in a compressed form called
the latent space or the bottleneck. Therefore, the latent space formed after training the model is not
necessarily continuous and, in effect, might not be easy to interpolate. A standard autoencoder
would learn a fixed set of latent attribute values for a given input.
While these attributes explain the image and can be used in reconstructing the image from the
compressed latent space, they do not allow the latent attributes to be expressed in a probabilistic
fashion. Variational autoencoders deal with this specific topic and express their latent attributes as a
probability distribution, leading to the formation of a continuous latent space that can be easily
sampled and interpolated. When fed the same input, a variational autoencoder would instead construct
each latent attribute as a probability distribution with a mean and a variance.
The latent attributes are then sampled from the latent distribution thus formed and fed to the decoder,
reconstructing the input. The motivation behind expressing the latent attributes as a probability
distribution can be very easily understood via statistical expressions. We aim at identifying the
characteristics of the latent vector z that reconstructs the output given a particular input. Effectively,
we want to study the characteristics of the latent vector given a certain output x, i.e. p(z|x).
While estimating this distribution directly is mathematically intractable, a much simpler and easier
option is to build a parameterized model that can estimate the distribution for us. It does so by
minimizing the KL divergence between the original distribution and our parameterized one.
Expressing the parameterized distribution as q, we can infer the possible latent attributes used in the
image reconstruction. Assuming the prior over z to be a multivariate Gaussian, we can build a
parameterized distribution containing two parameters, the mean and the variance. The
corresponding distribution is then sampled and fed to the decoder, which then proceeds to reconstruct
the input from the sample points. While this seems easy in theory, it becomes impossible to implement
because backpropagation cannot be defined for a random sampling process performed before feeding
the data to the decoder. To get past this hurdle, we use the reparameterization trick, a cleverly defined
way to bypass the sampling process in the neural network. In the reparameterization trick, we
randomly sample a value ε from a unit Gaussian, scale it by the latent distribution's standard deviation σ,
and shift it by the mean μ of the same. The sampling is now done outside the backpropagation pipeline,
and the sampled value ε acts just like another input to the model that is fed at the bottleneck.
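A minimal sketch of the reparameterization step is shown below; the 784-dimensional input and 16-dimensional latent space are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Reparameterization trick at the bottleneck of a VAE: the encoder outputs a mean
# and a log-variance per latent dimension; epsilon is sampled outside the
# backpropagated graph. Sizes are illustrative.
encoder = nn.Linear(784, 2 * 16)   # 16 latent dims: first half mu, second half log sigma^2

def reparameterize(x):
    stats = encoder(x)
    mu, log_var = stats.chunk(2, dim=1)
    eps = torch.randn_like(mu)               # epsilon ~ N(0, I), treated as an extra input
    z = mu + eps * torch.exp(0.5 * log_var)  # scale by sigma, shift by mu
    return z, mu, log_var

x = torch.rand(8, 784)
z, mu, log_var = reparameterize(x)
print(z.shape)   # torch.Size([8, 16])
```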
Variational Autoencoder
I am assuming that the reader is already familiar with the working of a vanilla autoencoder. We
know that we can use an autoencoder to encode an input image to a much smaller dimensional
representation which can store latent information about the input data distribution. But in a
vanilla autoencoder, the encoded vector can only be mapped to the corresponding input using a
decoder. It certainly can't be used to generate similar images with some variability. To achieve
this, the model needs to learn the probability distribution of the training data. The VAE is one of the
most popular approaches to learning complicated data distributions, such as images, using neural
networks in an unsupervised fashion. It is a probabilistic graphical model rooted in Bayesian
inference i.e., the model aims to learn the underlying probability distribution of the training data
so that it could easily sample new data from that learned distribution. The idea is to learn a low-
dimensional latent representation of the training data called latent variables (variables which are
not directly observed but are rather inferred through a mathematical model) which we assume
to have generated our actual training data. These latent variables can store useful information
about the type of output the model needs to generate. The probability distribution of latent
variables z is denoted by P(z). A Gaussian distribution is selected as a prior to learn the
distribution P(z) so as to easily sample new data points during inference time.
Now the primary objective is to model the data with some parameters that maximize the
likelihood of the training data X. In short, we are assuming that a low-dimensional latent vector has
generated our data x (x ∈ X), and we can map this latent vector to data x using a deterministic
function f(z;θ) parameterized by theta, which we need to evaluate (see fig. 1[1]). Under this
generative process, our aim is to maximize the probability of each data point in X, which is given as

P(X) = ∫ P(X|z; θ) P(z) dz        (1)
The intuition behind this maximum likelihood estimation is that if the model can generate
training samples from these latent variables then it can also generate similar samples with some
variations. In other words, if we sample a large number of latent variables from P(z) and generate
x from these variables then the generated x should match the data distribution Pdata(x). Now
we have two questions which we need to answer. How to capture the distribution of latent
variables and how to integrate Equation 1 over all the dimensions of z?
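In practice, the integral in Equation 1 is intractable, so VAEs maximize a tractable lower bound on log P(X) (the ELBO), consisting of a reconstruction term and a KL term between q(z|x) and the Gaussian prior P(z). Below is a minimal sketch of that objective; the 16-dimensional latent space and the small fully connected encoder/decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# VAE objective sketch: reconstruction term plus the closed-form KL divergence
# between q(z|x) = N(mu, sigma^2) and the prior P(z) = N(0, I).
encoder = nn.Linear(784, 2 * 16)                          # outputs mu and log sigma^2
decoder = nn.Sequential(nn.Linear(16, 784), nn.Sigmoid())

def vae_loss(x):
    mu, log_var = encoder(x).chunk(2, dim=1)
    eps = torch.randn_like(mu)
    z = mu + eps * torch.exp(0.5 * log_var)                          # reparameterization
    x_hat = decoder(z)
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")        # -log P(x|z) for [0,1] data
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())   # KL(q(z|x) || N(0, I))
    return recon + kl

x = torch.rand(8, 784)
print(vae_loss(x))
```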
Generative Adversarial Networks
Adversarial training has been called the coolest thing since sliced bread, and seeing the popularity of
Generative Adversarial Networks and the quality of the results they produce, most of us
would agree. Adversarial training has completely changed the way we teach neural
networks to do a specific task. Generative Adversarial Networks don't work with any explicit
density estimation like Variational Autoencoders. Instead, they are based on a game-theoretic approach
with the objective of finding a Nash equilibrium between two networks, the Generator and the
Discriminator. The idea is to sample from a simple distribution like a Gaussian and then learn to
transform this noise to the data distribution using universal function approximators such as neural
networks. This is achieved by adversarial training of these two networks. A generator model G
learns to capture the data distribution and a discriminator model D estimates the probability that
a sample came from the data distribution rather than model distribution. Basically the task of
the Generator is to generate natural looking images and the task of the Discriminator is to decide
whether the image is fake or real. This can be thought of as a mini-max two player game where
the performance of both networks improves over time. In this game, the generator tries to
fool the discriminator by generating images that look as real as possible, and the discriminator tries not
to get fooled by the generator by improving its discriminative capability. The image below shows
the basic architecture of a GAN.
Fig.3. Building block of Generative Adversarial Network
We define a prior on the input noise variables P(z), and the generator then maps this to the data
distribution using a complex differentiable function with parameters θg. In addition to this, we
have another network called the Discriminator, which takes an input x and, using another
differentiable function with parameters θd, outputs a single scalar value denoting the probability
that x comes from the true data distribution Pdata(x). The objective function of the GAN is
defined as

min_G max_D V(D, G) = E_(x∼Pdata(x))[log D(x)] + E_(z∼P(z))[log(1 − D(G(z)))]
In the above equation, if the input to the Discriminator comes from the true data distribution, then
D(x) should output 1 to maximize the above objective function w.r.t. D, whereas if the image has
been generated by the Generator, the Generator wants D(G(z)) to output 1 so as to minimize the objective
function w.r.t. G. The latter basically implies that G should generate images realistic enough to
fool D. We maximize the above function w.r.t. the parameters of the Discriminator using gradient
ascent and minimize the same w.r.t. the parameters of the Generator using gradient descent. But there
is a problem in optimizing the generator objective. At the start of the game, when the generator
hasn't learned anything, the gradient is usually very small, and when it is doing very well, the
gradients are very high (see Fig. 4). But we want the opposite behaviour. We therefore maximize
E[log D(G(z))] rather than minimizing E[log(1 − D(G(z)))].
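Below is a minimal sketch of one such training step in PyTorch; the tiny fully connected generator and discriminator and the random stand-in data are assumptions for illustration. The discriminator step ascends the objective, while the generator step maximizes E[log D(G(z))] by minimizing binary cross-entropy against the "real" label.

```python
import torch
import torch.nn as nn

# One GAN training step with the non-saturating generator loss.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 784) * 2 - 1        # stand-in for real images in [-1, 1]
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# Discriminator step: push D(real) -> 1 and D(G(z)) -> 0 (ascent on the objective).
z = torch.randn(32, 64)
opt_D.zero_grad()
d_loss = bce(D(real), ones) + bce(D(G(z).detach()), zeros)
d_loss.backward()
opt_D.step()

# Generator step: maximize log D(G(z)) by minimizing BCE against the "real" label.
z = torch.randn(32, 64)
opt_G.zero_grad()
g_loss = bce(D(G(z)), ones)               # equivalent to maximizing E[log D(G(z))]
g_loss.backward()
opt_G.step()
print(d_loss.item(), g_loss.item())
```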
Fig.4. Cost for the Generator as a function of Discriminator response on the generated
image
One of the cool things about GANs is that they can be trained even with relatively little training data.
Indeed, the results of GANs are promising, but the training procedure is not trivial, especially
setting up the hyperparameters of the network. Moreover, GANs are difficult to optimize as they
don't converge easily. Of course, there are some tips and tricks to hack GANs, but they may not
always help. Also, we don't have any criteria for the
quantitative evaluation of the results except to check whether the generated images are
perceptually realistic or not.
The generator works from random noise, while the discriminator differentiates between
fake and real samples. As more samples are classified, the generator uses the
feedback from the discriminator to improve its fake samples until real and fake
samples can no longer be easily distinguished. To read more about GANs, you can refer to the
article by Taru Jain from OpenGenus, Beginner's Guide to Generative Adversarial Networks,
which includes a demo.
There are multiple types of GANs that serve different applications, but in this article we are
only going to discuss some of the important ones:
Vanilla GAN
Conditional GAN (CGAN)
Deep Convolutional GAN (DCGAN)
CycleGAN
Generative Adversarial Text to Image Synthesis
StyleGAN
1. Vanilla GAN - The Vanilla GAN is the simplest type of GAN, made up of a generator and a
discriminator, where image generation and classification are done by the generator and
discriminator internally using multilayer perceptrons. The generator captures the data
distribution, while the discriminator tries to find the probability of the input belonging to
a certain class. Finally, the feedback is sent to both the generator and discriminator after
calculating the loss function, and the effort to minimize this loss drives the training.
2. Conditional GAN (CGAN) - In this GAN, the generator and discriminator are both provided
with additional information, which could be a class label or data from another modality. As the name
suggests, the additional information helps the discriminator in finding the conditional probability instead
of the joint probability.
The loss function of the conditional GAN is as below:

min_G max_D V(D, G) = E_(x∼Pdata(x))[log D(x|y)] + E_(z∼P(z))[log(1 − D(G(z|y)))]
3. Deep Convolutional GAN (DCGAN) - This is the first GAN where the generator used a deep
convolutional network, hence generating high-resolution, high-quality images.
ReLU activation is used in all generator layers except the last one, where Tanh
activation is used, while all discriminator layers use the Leaky ReLU activation
function. The Adam optimizer is used with a learning rate of 0.0002.
The generator of this GAN produces images of 64 × 64 resolution. A sketch of such a generator
is given below.
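The sketch below follows the description above: transposed convolutions with ReLU in all generator layers except the last, which uses Tanh, producing 64 × 64 images. The channel counts are illustrative assumptions, while the activations, batch normalization, and Adam settings (learning rate 0.0002, β1 = 0.5) follow the DCGAN paper.

```python
import torch
import torch.nn as nn

# DCGAN-style generator: upsample a latent vector to a 3x64x64 image.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # 1x1  -> 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4  -> 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8  -> 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1),  nn.BatchNorm2d(64),  nn.ReLU(),  # 16x16 -> 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1),    nn.Tanh(),                       # 32x32 -> 64x64
)
optimizer = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

z = torch.randn(16, 100, 1, 1)    # latent noise vectors
print(generator(z).shape)         # torch.Size([16, 3, 64, 64])
```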
4. CycleGAN - This GAN is made for image-to-image translation, meaning one image
is mapped onto another image. For example, if summer and winter images are made to
undergo the process of image-to-image translation, we find a mapping function that could
convert summer images into winter images and vice versa by adding or removing
features according to the mapping function, such that the predicted output and actual
output have minimal loss.
5. Generative Adversarial Text to Image Synthesis - Here the GAN is capable of finding
an image from the dataset that is closest to a text description and generating similar images.
The GAN architecture is given below:
As you can see, the generator network tries to generate images based on the description, and the
discrimination is done by the discriminator based on the features mentioned in the text description.
6. StyleGAN - While other GANs focused on improving the discriminator, in this case we improve
the generator. This GAN generates images by taking a reference (style) picture.
As you can see in the figure below, the StyleGAN architecture consists of a mapping network that
maps the input to an intermediate latent space; the intermediate representation is then processed using
AdaIN after each layer, and there are approximately 18 convolutional layers.
StyleGAN uses AdaIN, or Adaptive Instance Normalization, which is defined as

AdaIN(x_i, y) = y_(s,i) · (x_i − μ(x_i)) / σ(x_i) + y_(b,i)
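A minimal sketch of the AdaIN operation is shown below; the tensor shapes are illustrative assumptions.

```python
import torch

# Adaptive Instance Normalization (AdaIN): each feature map x_i is normalised by
# its own mean and standard deviation, then re-scaled and shifted by a
# style-derived scale y_s and bias y_b.
def adain(x, y_scale, y_bias, eps=1e-5):
    # x: (batch, channels, H, W); y_scale, y_bias: (batch, channels)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    normalised = (x - mu) / (sigma + eps)
    return y_scale[:, :, None, None] * normalised + y_bias[:, :, None, None]

x = torch.randn(2, 8, 16, 16)
y_s, y_b = torch.rand(2, 8), torch.rand(2, 8)
print(adain(x, y_s, y_b).shape)   # torch.Size([2, 8, 16, 16])
```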
Applications:
1. Firstly, GANs can be used as a data augmentation technique, where the generator
takes the training dataset and produces multiple new images by
applying some changes.
2. GANs are also used to improve the resolution of any input image.
3. Just like filters used in Snapchat, a filter could be applied to see what a place might
look like in summer, winter, spring or autumn; many more conditions could be
applied, and that's where deep ConvNets play their role.
4. GANs are used to convert semantics into images and better understand the
visualizations done by the machine.
Generative models are forms of Artificial Intelligence (AI) and Machine Learning (ML) that
use deep neural networks to learn the distribution of complex training data sets. This
knowledge facilitates the generation of new data by modeling the probability of the next item
in a sequence. Applications include natural language processing, speech processing, and
computer vision.
To create more authentic output from your generative model, you can use Generative
Adversarial Networks (GANs), in which a generator creates synthetic training data that trains a
second, competing neural network. The generated instances become negative training
examples for the discriminator. By learning to distinguish the generator's fake data from actual
data, generating more plausible and original new data becomes possible.
Different algorithms are applicable depending on the application of a deep generative model.
These include the following.
Variational Autoencoders
Variational autoencoders can learn to reconstruct and generate new samples from a provided
dataset. By utilizing a latent space, variational autoencoders can represent data continuously
and smoothly. This enables the generation of variations of the input data with smooth
transitions.
Autoregressive Models
An autoregressive model is a statistical model used to understand and predict future values in
a time series based on past values.
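As a minimal illustration, the sketch below fits an AR(2) model to a synthetic series by least squares; the data and coefficients are made up for the example.

```python
import numpy as np

# Autoregressive model AR(2): each value is predicted as a linear combination
# of the two previous values, with coefficients fitted by least squares.
rng = np.random.default_rng(0)
series = np.zeros(500)
for t in range(2, 500):
    series[t] = 0.6 * series[t - 1] - 0.2 * series[t - 2] + rng.normal(scale=0.1)

# Lagged design matrix: predict series[t] from series[t-1] and series[t-2].
X = np.column_stack([series[1:-1], series[:-2]])
y = series[2:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated AR coefficients:", coeffs)          # close to [0.6, -0.2]

next_value = coeffs @ np.array([series[-1], series[-2]])
print("one-step-ahead prediction:", next_value)
```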
Energy-Based Models
An energy-based model is a generative model with roots in statistical physics. After learning
the data distribution of a training data set, the generative model can produce other datasets
matching the data distribution.
Score-Based Models
Score-based generative models estimate the scores from the training data, allowing the model
to navigate the data space according to the learned distribution and generate similar new data.
Below are some use cases for deep generative models being applied in the real world today:
Autonomous vehicle systems use inputs from visual and Lidar sensors fed to a neural network
that predicts future behavior to make proactive course corrections thousands of times a second.
Fraud detection compares historical behavior to current transactions to detect anomalies and
act accordingly.
Virtual assistants learn a person's taste in music, their schedule, purchasing history and any
other information they have access to in order to make recommendations. For example, they can
provide travel times to home or places of work.
Entertainment systems can recommend movies based on past viewing of similar content.
A smartwatch can warn of potential medical conditions, over-exertion, and lack of sleep to
oversee the owner’s well-being.
Images taken with a digital camera or scanned images can be enhanced by increasing
sharpness, balancing colors, and suggesting crops.
Captions can be auto-generated for movies or meeting videos to enhance playback.
Handwriting style can be learned, and new text can be generated in the same style.
Captioned videos can have captions generated in multiple languages.
Photo libraries can be tagged with descriptions to make finding similar ones or duplicates
easier.