Unit 5 Deep Unsupervised Learning
Deep learning has become one of the most popular and visible areas of machine learning, owing to its success in a variety of applications such as computer vision, natural language processing, and reinforcement learning.
Deep learning can be used for supervised, unsupervised, and reinforcement machine learning, and it uses different architectures and training procedures for each.
Supervised Machine Learning: Supervised machine learning is the technique in which a neural network learns to make predictions or classify data based on labeled datasets. Here we provide both the input features and the target variables. The network learns from the cost or error computed from the difference between the predicted and the actual targets; the process of propagating this error back through the network to update its weights is known as backpropagation. Deep learning architectures such as convolutional neural networks and recurrent neural networks are used for many supervised tasks like image classification and recognition, sentiment analysis, and language translation.
Unsupervised Machine Learning: Unsupervised machine learning is the technique in which a neural network learns to discover patterns in, or to cluster, unlabeled datasets. Here there are no target variables; the machine has to determine the hidden patterns or relationships within the data on its own. Deep learning models such as autoencoders and generative models are used for unsupervised tasks like clustering, dimensionality reduction, and anomaly detection.
Reinforcement Machine Learning: Reinforcement machine learning is the technique in which an agent learns to make decisions in an environment so as to maximize a reward signal. The agent interacts with the environment by taking actions and observing the resulting rewards. Deep learning can be used to learn policies, i.e., mappings from states to actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms such as Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are used for tasks like robotics and game playing.
Unsupervised learning
The task of unsupervised learning is to uncover structure in data, without using
human-provided supervision as a guide to what is salient or interesting about particular observations.
When doing unsupervised learning we seek to explain or analyze our data, or to provide useful inputs
for further applications. In practice there exists a varied body of unsupervised methods that each aim to
characterize different kinds of structure in the data. For instance, cluster analysis is an unsupervised
learning method where the goal is to identify groups, or clusters, of statistically similar observations
(Jain et al., 1999). Collaborative filtering seeks to “complete” a partial array of data, by leveraging
correlations between data variables (Su and Khoshgoftaar, 2009). Dimensionality reduction posits that
many datasets exhibit substantial redundancy across variables, and aims to reduce the data to its
essential directions of variability (van der Maaten et al., 2009).
Figure 2.1: Unsupervised learning methods include (a) clustering, where data is separated into statistically similar groups, (b) dimensionality reduction, where the aim is to capture the low-dimensional structure of the data, and (c) density estimation, where the aim is to approximate the true data distribution.
Dimensionality reduction is closely related to representation learning in which the aim is to learn
transformations of data that serve as useful representations for down-stream tasks (Bengio et al., 2013).
Many unsupervised learning approaches can be understood from a probabilistic perspective, where the
goal is to find a model pθ that closely matches the observed data. When dealing with continuous data
this task is often referred to as density estimation. However, one issue with framing unsupervised
learning in this way is that successful density estimation does not always incentivize the learning of
useful structure in the data. For instance, consider a perfect black-box model that outputs calibrated
probability densities for any input x. Such a model has perfectly characterized the statistical
dependencies between data variables, but it may not be useful to us if we are interested in cluster-
structure, or low-dimensional representations of data for use in alternative tasks. To reconcile the goals
of unsupervised learning, with the generic probabilistic objective of density estimation we can impose
structure on the parametric model pθ. In doing so we obtain probabilistic analogs to a number of
classical unsupervised objectives. For instance, if we assume the data are generated using unobserved
latent variables z, and that the latent variables are of lower dimensionality than the observed variables,
then by performing inference we also do dimensionality reduction. Under certain modeling
assumptions that are discussed in the following section, this reduces to a probabilistic variant of the
classic dimensionality reduction method PCA (Tipping and Bishop, 1999). Alternatively if the latent
variables are categorical cluster indicators, as in a mixture model, then we naturally recover a
clustering method through the EM-algorithm.
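As a small illustration (not from the text), the following sketch uses scikit-learn's GaussianMixture, which fits a mixture model with the EM algorithm; the inferred categorical latent variables act as cluster labels.
Python3
# Illustrative sketch: clustering as latent-variable inference with a
# Gaussian mixture model fitted by the EM algorithm (scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters in 2-D
data = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
labels = gmm.predict(data)           # hard cluster assignments (latent z)
resp = gmm.predict_proba(data[:5])   # posterior p(z | x) for a few points
print(labels[:10], resp.round(2))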
Figure 2.2: The data generating process for images from the CLEVR dataset (Johnson et al., 2017). To
generate an image, latent scene variables (left) including object properties and lighting conditions are
chosen from a prior model, and then transformed through the rendering process to a photo-realistic
image (right).
Latent variable models assume that data variables x are generated via interactions with unobserved, or latent, variables,
typically denoted z. Intuitively, it is reasonable to believe that in a particular dataset we will not have
observed all the relevant variables, and that correlations between variables may be caused by some
unobserved source. For example, if we collect data about umbrella sales and car accidents over time,
we will probably observe positive correlations in the variables. However, these variables are really
independent, given the knowledge that it is raining, or not raining, which in this case is a latent variable.
Alternatively, for perceptual data like images of faces, we know that there exist some underlying
factors that explain most of the variability across images: skin tone, face shape, camera pose, facial
expression, etc. We expect the dimensionality of these latent factors to be smaller than the number of
pixels in an image. Latent variable models formalize these assumptions by describing a data-generating
process where latent variables are first sampled, and then data variables are
generated conditioned on these latent variables. For instance, to generate an image of a person we first
generate the latent features z such as skin tone or camera pose, and then transform these features to
pixels x via a digital image-formation process. Figure 2.2 depicts a similar generative process for
images in the CLEVR dataset, with scene attributes such as object types, colours and materials being
fed into a renderer to produce photo-realistic images (Johnson et al., 2017). Latent variable models are
useful ways to describe natural observations, and we can obtain the model distribution over the
observed variables by marginalizing out the latent variables:
p(x; θ) = ∫ p(x|z; θ) p(z; θ) dz. (2.1)
This means that to evaluate the probability of a given observation, we need to consider all possible settings of the latent variables, weight them by their prior probabilities, and evaluate the probability of the observation assuming those particular latent variables.
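As an illustrative sketch (assuming a toy Gaussian model, not from the text), the marginal likelihood of equation (2.1) can be approximated by sampling latent variables from the prior and averaging the conditional densities:
Python3
# Monte Carlo estimate of p(x) = ∫ p(x|z) p(z) dz for a toy model where
# z ~ N(0, 1) and x | z ~ N(z, 0.5^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.3                                        # a single observed data point
z_samples = rng.normal(0.0, 1.0, size=100_000)     # z ~ p(z)
p_x = norm.pdf(x_obs, loc=z_samples, scale=0.5).mean()   # average of p(x|z)
# Exact value for this Gaussian toy model: x ~ N(0, 1 + 0.5^2)
print(p_x, norm.pdf(x_obs, loc=0.0, scale=np.sqrt(1.25)))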
We will present a simple scenario to clarify the concept of latent variables. An IT company wants to hire an employee for one of its open positions. The candidates have the following features: i) high school grade, ii) university grade, iii) IQ score, iv) phone interview score. The company wants to bring some candidates in for an onsite interview.
Fig 1: Dataset of features of a hiring candidate
They cannot bring all of them because of the expense involved, so they decide to predict an onsite interview score and use it to decide whether or not to bring a candidate in for an onsite interview. They have historical data on the performance of previous onsite candidates. This looks like a trivial, standard regression problem where one simply predicts the score, but it is not, for basically two reasons:
i) Missing values in the dataset: we can fill them with all the tricks we have learned from machine learning foundations, but they will still induce uncertainty that hampers any probabilistic model built from the data.
ii) Quantifying the uncertainty in predictions: consider two candidates with the same predicted score of 50/100, one with some missing data and one with none. For the candidate with no missing data we are fairly confident the low score reflects likely poor performance, but we are much less sure about the candidate with missing values, who may be good after all.
Thus, for the reasons above, we opt for a probabilistic model of the data. We introduce random variables and model the connections between them. In our case, all the random variables might be connected to each other: a high IQ score might affect the high school grade, the GPA might affect the phone interview score, and so on.
To deal with this situation, the concept of a latent variable comes into play: we introduce an unobserved variable that explains the correlations between the observed features and quantifies the uncertainty. In our case it might be Intelligence. The latent variable is modeled as a direct cause of all the observed features. Our model is now much simpler to work with, and we get the same performance without reducing the flexibility of the model:
i) Fewer parameters
ii) Simpler model
iii) Easier to interpret.
Drawback: harder to train.
Auto Encoders
An autoencoder is an unsupervised artificial neural network that learns to encode data by compressing it into a lower-dimensional bottleneck layer (or code) and then decoding it to reconstruct the original input. The bottleneck layer (or code) holds the compressed representation of the input data. In an autoencoder the number of output units must equal the number of input units, since we are attempting to reconstruct the input data. Autoencoders usually consist of an encoder and a decoder. The encoder compresses the provided data down to the size of the bottleneck layer, and the decoder decodes the compressed data back into its original form. The number of neurons in the encoder layers decreases layer by layer, whereas the number of neurons in the decoder layers increases layer by layer. In the following example, three layers are used in both the encoder and the decoder: the encoder contains 32, 16, and 7 units in its layers respectively, and the decoder contains 7, 16, and 32 units respectively. The code size (the number of neurons in the bottleneck) must be less than the number of features in the data. Before feeding the data into the autoencoder, it should be scaled between 0 and 1, for example with a Min-Max scaler, since we use a sigmoid activation function in the output layer, which outputs values between 0 and 1. When we use autoencoders for dimensionality reduction, we extract the bottleneck layer and use its activations as the reduced representation; this process can be viewed as feature extraction. The type of autoencoder used here is a deep autoencoder in which the encoder and the decoder are symmetrical, although autoencoders do not have to be symmetrical. A sketch of such a model follows below.
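A minimal sketch of this architecture, written here in PyTorch (the 64-feature input size is a placeholder, and the layer sizes follow one reasonable reading of the description above):
Python3
# Deep autoencoder sketch: encoder 32-16-7 units, symmetric decoder 7-16-32,
# sigmoid output for inputs scaled to [0, 1].
import torch
import torch.nn as nn

n_features = 64  # hypothetical number of input features

encoder = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 7), nn.ReLU(),              # bottleneck / code
)
decoder = nn.Sequential(
    nn.Linear(7, 16), nn.ReLU(),
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, n_features), nn.Sigmoid(),  # reconstruct inputs in [0, 1]
)
autoencoder = nn.Sequential(encoder, decoder)

x = torch.rand(8, n_features)                 # dummy mini-batch
loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction loss
loss.backward()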
Architecture of Autoencoder in Deep Learning
The general architecture of an autoencoder includes an encoder, decoder, and bottleneck layer.
Encoder
1. The input layer takes the raw input data.
2. The hidden layers progressively reduce the dimensionality of the input, capturing important features and patterns. These layers compose the encoder.
3. The bottleneck layer (latent space) is the final hidden layer, where the dimensionality is significantly reduced. This layer represents the compressed encoding of the input data.
Decoder
1. The decoder takes the encoded representation from the bottleneck layer and expands it back to the dimensionality of the original input.
2. The hidden layers progressively increase the dimensionality and aim to reconstruct the original
input.
3. The output layer produces the reconstructed output, which ideally should be as close as possible
to the input data.
4. The loss function used during training is typically a reconstruction loss, measuring the
difference between the input and the reconstructed output. Common choices include mean
squared error (MSE) for continuous data or binary cross-entropy for binary data.
5. During training, the autoencoder learns to minimize the reconstruction loss, forcing the network
to capture the most important features of the input data in the bottleneck layer.
After the training process, only the encoder part of the autoencoder is retained to encode a similar
type of data used in the training process. The different ways to constrain the network are: –
Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible,
then the network will be forced to pick up only the representative features of the data thus
encoding the data.
Regularization: In this method, a loss term is added to the cost function which encourages
the network to train in ways other than copying the input.
Denoising: Another way of constraining the network is to add noise to the input and teach
the network how to remove the noise from the data.
Tuning the Activation Functions: This method involves changing the activation functions
of various nodes so that a majority of the nodes are dormant thus, effectively reducing the size
of the hidden layers.
Types of Autoencoders
There are several types of autoencoders; below we describe them and analyze the advantages and disadvantages associated with each variation.
1. Denoising Autoencoder
A denoising autoencoder works on a partially corrupted input and is trained to recover the original, undistorted input. As mentioned above, this is an effective way to prevent the network from simply copying its input, forcing it to learn the underlying structure and important features of the data.
During the training phase, the denoising autoencoder (DAE) is presented with a collection of clean input examples along with their respective noisy counterparts. The objective is to learn a function that maps a noisy input to a relatively clean output using an encoder-decoder architecture. To achieve this, a reconstruction loss function is typically employed to evaluate the disparity between the clean input and the reconstructed output. A DAE is trained by minimizing this loss through backpropagation, which updates the weights of both the encoder and decoder components; a code sketch of this training step is given at the end of this subsection. Applications of denoising autoencoders (DAEs) span a variety of domains, including computer vision, speech processing, and natural language processing.
Examples
Image Denoising: DAEs are effective in removing noise from images, such as Gaussian
noise or salt-and-pepper noise.
Fraud Detection: DAEs can contribute to identifying fraudulent transactions by learning
to reconstruct common transactions from their noisy counterparts.
Data Imputation: To reconstruct missing values from available data by learning, DAEs
can facilitate data imputation in datasets with incomplete information.
Data Compression: DAEs can compress data by obtaining a concise representation of
the data in the encoding space.
Anomaly Detection: Using DAEs, anomalies in a dataset can be detected by training a model to reconstruct normal data and then flagging inputs that it reconstructs poorly as potentially abnormal.
Advantages
1. This type of autoencoder can extract important features and reduce the noise or the useless
features.
2. Denoising autoencoders can be used as a form of data augmentation, the restored images
can be used as augmented data thus generating additional training samples.
Disadvantages
1. Selecting the right type and level of noise to introduce can be challenging and may require
domain knowledge.
2. The denoising process can result in the loss of some information that is needed from the original input, which can impact the accuracy of the output.
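As a sketch of the training procedure described above (the Gaussian noise and MSE loss are assumed details; `autoencoder` can be any encoder-decoder network, e.g. the one sketched earlier):
Python3
# One denoising-autoencoder training step: corrupt the inputs with noise and
# train the model to reconstruct the clean originals.
import torch
import torch.nn as nn

def dae_training_step(autoencoder, optimizer, clean_batch, noise_std=0.1):
    noisy_batch = clean_batch + noise_std * torch.randn_like(clean_batch)
    noisy_batch = noisy_batch.clamp(0.0, 1.0)        # keep inputs in [0, 1]
    optimizer.zero_grad()
    reconstruction = autoencoder(noisy_batch)
    loss = nn.functional.mse_loss(reconstruction, clean_batch)  # compare to clean input
    loss.backward()
    optimizer.step()
    return loss.item()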
2.Sparse Autoencoder
This type of autoencoder typically contains more hidden units than the input but only a few are
allowed to be active at once. This property is called the sparsity of the network. The sparsity of the
network can be controlled by either manually zeroing the required hidden units, tuning the activation
functions or by adding a loss term to the cost function.
Advantages
1. The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant
features during the encoding process.
2. These autoencoders often learn important and meaningful features due to their emphasis on
sparse activations.
Disadvantages
1. The choice of hyperparameters plays a significant role in the performance of this autoencoder. Different inputs should result in the activation of different nodes of the network.
2. The application of sparsity constraint increases computational complexity.
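A minimal sketch (not from the text) of the loss-term approach to sparsity, adding an L1 penalty on the bottleneck activations to the reconstruction loss; `encoder` and `decoder` are assumed to be defined as in the earlier sketches:
Python3
# Sparse autoencoder loss: reconstruction error plus an L1 activity penalty,
# so only a few hidden units stay active for any given input.
import torch
import torch.nn as nn

def sparse_ae_loss(encoder, decoder, x, sparsity_weight=1e-3):
    code = encoder(x)                            # bottleneck activations
    reconstruction = decoder(code)
    recon_loss = nn.functional.mse_loss(reconstruction, x)
    sparsity_loss = code.abs().mean()            # L1 penalty on activations
    return recon_loss + sparsity_weight * sparsity_loss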
3. Convolutional Autoencoder
Convolutional autoencoders are a type of autoencoder that use convolutional neural networks (CNNs) as their building blocks. The encoder consists of multiple layers that take an image or a grid as input and pass it through convolution layers, forming a compressed representation of the input. The decoder is the mirror image of the encoder: it deconvolves the compressed representation and tries to reconstruct the original image.
Advantages
1. Convolutional autoencoder can compress high-dimensional image data into a lower-
dimensional data. This improves storage efficiency and transmission of image data.
2. Convolutional autoencoder can reconstruct missing parts of an image. It can also handle
images with slight variations in object position or orientation.
Disadvantages
1. These autoencoders are prone to overfitting. Proper regularization techniques should be used to tackle this issue.
2. Compression of data can cause data loss which can result in reconstruction of a lower
quality image.
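A minimal sketch (with assumed details such as 1-channel 28x28 inputs) of a convolutional autoencoder, with strided convolutions in the encoder mirrored by transposed convolutions in the decoder:
Python3
# Convolutional autoencoder: the encoder halves the spatial size twice,
# and the decoder mirrors it to reconstruct the image.
import torch
import torch.nn as nn

conv_encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
conv_decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                       padding=1, output_padding=1),        # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                       padding=1, output_padding=1),        # 14x14 -> 28x28
    nn.Sigmoid(),
)
conv_autoencoder = nn.Sequential(conv_encoder, conv_decoder)

x = torch.rand(4, 1, 28, 28)              # dummy batch
print(conv_autoencoder(x).shape)          # torch.Size([4, 1, 28, 28])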
Autoencoders have emerged as an architecture for data representation and generation. Among them,
Variational Autoencoders (VAEs) stand out, introducing probabilistic encoding and opening new
avenues for diverse applications. In this section, we explore the architecture and foundational concepts of variational autoencoders (VAEs).
Autoencoders are neural network architectures intended for the compression and reconstruction of data. An autoencoder consists of an encoder and a decoder; together these networks learn a compact representation of the input data. A reconstruction loss ensures a close match between output and input, which is the basis for understanding more advanced architectures such as VAEs. The encoder learns an efficient encoding of the data and passes it into a bottleneck layer. The other part of the autoencoder, the decoder, uses the latent representation in the bottleneck layer to regenerate data similar to the dataset. The reconstruction error is backpropagated through the network as the loss function.
Variational Autoencoder
The variational autoencoder was proposed in 2013 by Diederik P. Kingma and Max Welling. A variational autoencoder (VAE) provides a probabilistic manner of describing an observation in latent space. Thus, rather than building an encoder that outputs a single value to describe each latent state attribute, we formulate our encoder to describe a probability distribution for each latent attribute. VAEs have many applications, such as data compression and synthetic data creation.
Variational autoencoder is different from an autoencoder in a way that it provides a statistical
manner for describing the samples of the dataset in latent space. Therefore, in the variational
autoencoder, the encoder outputs a probability distribution in the bottleneck layer instead of a single
output value.
The variational autoencoder uses KL-divergence as part of its loss function; the goal is to minimize the difference between the assumed (approximate) distribution and the original distribution of the dataset.
Suppose we have a latent variable z and we want to generate an observation x from it. In other words, we want to infer the posterior
p(z|x) = p(x|z) p(z) / p(x),
where the evidence p(x) = ∫ p(x|z) p(z) dz is usually intractable. Hence, we approximate p(z|x) with a simpler distribution q(z|x). To make q(z|x) close to p(z|x), we minimize the KL-divergence between them, which measures how similar two distributions are; rearranging this objective gives the quantity to be maximized:
E_q(z|x)[ log p(x|z) ] − KL( q(z|x) || p(z) ).
The first term represents the reconstruction likelihood and the other term ensures that our learned distribution q is similar to the true prior distribution p.
Thus our total loss consists of two terms, one is the reconstruction error and the other is the KL-divergence loss:
Loss = reconstruction error + KL( q(z|x) || p(z) ).
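A minimal sketch (assumed shapes and a standard-normal prior; not the text's code) of this two-term loss, using the reparameterization trick to sample from q(z|x):
Python3
# VAE objective: the encoder outputs a mean and log-variance per latent
# dimension, a sample z is drawn via the reparameterization trick, and the
# loss is the reconstruction error plus the KL term against N(0, 1).
import torch
import torch.nn as nn

def vae_loss(decoder, x, mu, logvar):
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)            # reparameterization trick
    x_hat = decoder(z)
    recon = nn.functional.mse_loss(x_hat, x, reduction='sum')
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl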
Advantages
1. Variational Autoencoders are used to generate new data points that resemble the original
training data. These samples are learned from the latent space.
2. Variational Autoencoder is probabilistic framework that is used to learn a compressed
representation of the data that captures its underlying structure and variations, so it is useful in
detecting anomalies and data exploration.
Disadvantages
1. Variational Autoencoder use approximations to estimate the true distribution of the latent
variables. This approximation introduces some level of error, which can affect the quality of
generated samples.
2. The generated samples may only cover a limited subset of the true data distribution. This
can result in a lack of diversity in generated samples.
A GAN consists of two competing networks:
1. Generator: it is trained to generate a new dataset; for example, in computer vision it generates new images that resemble existing real-world images.
2. Discriminator: it compares those images with real-world examples and classifies them as real or fake.
Example:
The Generator generates some random images (e.g. tables) and then the Discriminator compares those images with some real-world table images and sends the feedback to itself and to the Generator. Look at the GAN structure in fig. 1.
1. First, some random noise signal is sent to the Generator, which creates some useless images containing noise.
2. Two inputs are given to the Discriminator: the first is the sample output images generated by the Generator, and the second is the set of real-world sample images.
3. Thereafter, the Discriminator outputs some values (probabilities) after comparing both sets of images, as shown in fig. 2. Say it calculates 0.8, 0.3 and 0.5 for the generator output images and 0.1, 0.9 and 0.2 for the real-world images.
4. Now, an error is calculated by comparing the probabilities of the generated images with 0 (zero) and the probabilities of the real-world images with 1 (one) (e.g. 0 vs. 0.8, 0 vs. 0.3, 0 vs. 0.5 and 1 vs. 0.1, 1 vs. 0.9, 1 vs. 0.2).
5. After calculating the individual errors, the cumulative error (loss) is computed and backpropagated, and the weights of the Discriminator are adjusted. This is how a Discriminator is trained. A small numeric sketch of this computation is given after these steps.
After a few iterations, you will see that the Generator starts generating images close to real-world images.
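A small numeric sketch of steps 4 and 5 (the probabilities follow the example above; using binary cross-entropy for the error is an assumption):
Python3
# Discriminator error: generated-image probabilities are compared against the
# target 0, real-image probabilities against the target 1.
import torch
import torch.nn as nn

fake_probs = torch.tensor([0.8, 0.3, 0.5])   # discriminator outputs for generated images
real_probs = torch.tensor([0.1, 0.9, 0.2])   # discriminator outputs for real images

bce = nn.BCELoss()
d_loss = bce(fake_probs, torch.zeros(3)) + bce(real_probs, torch.ones(3))
print(d_loss.item())   # cumulative discriminator loss to be backpropagated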
Applications of GAN:
1. Generating Images
2. Super Resolution
3. Image Modification
4. Photo realistic images
5. Face Ageing
A GAN (Generative Adversarial Network) represents a cutting-edge approach to generative modeling within deep learning, often leveraging architectures like convolutional neural networks. The goal of generative modeling is to autonomously identify patterns in input data, enabling the model to produce new examples that plausibly resemble the original dataset.
What is a Generative Adversarial Network?
Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. GANs are made up of two neural networks, a discriminator and a generator. They use adversarial training to produce artificial data that closely resembles real data.
The Generator, starting from random noise samples, attempts to fool the Discriminator, which is tasked with accurately distinguishing between produced and genuine data.
Realistic, high-quality samples are produced as a result of this competitive interaction, which drives both networks toward improvement.
GANs are proving to be highly versatile artificial intelligence tools, as evidenced by their
extensive use in image synthesis, style transfer, and text-to-image synthesis.
They have also revolutionized generative modeling.
Through adversarial training, these models engage in a competitive interplay until the generator
becomes adept at creating realistic samples, fooling the discriminator approximately half the time.
Generative Adversarial Networks (GANs) can be broken down into three parts:
Generative: To learn a generative model, which describes how data is generated in terms of
a probabilistic model.
Adversarial: The word adversarial refers to setting one thing up against another. This
means that, in the context of GANs, the generative result is compared with the actual images in
the data set. A mechanism known as a discriminator is used to apply a model that attempts to
distinguish between real and fake images.
Networks: Use deep neural networks as artificial intelligence (AI) algorithms for training
purposes.
Types of GANs
1. Vanilla GAN: This is the simplest type of GAN. Here, the Generator and the Discriminator are simple, basic multi-layer perceptrons. In a vanilla GAN, the algorithm is straightforward: it tries to optimize the minimax objective using stochastic gradient descent.
2. Conditional GAN (CGAN): CGAN can be described as a deep learning method in
which some conditional parameters are put into place.
In CGAN, an additional parameter ‘y’ is added to the Generator for generating the
corresponding data.
Labels are also put into the input to the Discriminator in order for the
Discriminator to help distinguish the real data from the fake generated data.
3. Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular and also the
most successful implementations of GAN. It is composed of ConvNets in place of multi-layer
perceptrons.
The ConvNets are implemented without max pooling, which is in fact replaced by
convolutional stride.
Also, the layers are not fully connected.
4. Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear invertible image
representation consisting of a set of band-pass images, spaced an octave apart, plus a low-
frequency residual.
This approach uses multiple numbers of Generator and Discriminator
networks and different levels of the Laplacian Pyramid.
This approach is mainly used because it produces very high-quality images. The
image is down-sampled at first at each layer of the pyramid and then it is again up-scaled at each
layer in a backward pass where the image acquires some noise from the Conditional GAN at
these layers until it reaches its original size.
5. Super Resolution GAN (SRGAN): SRGAN as the name suggests is a way of designing a
GAN in which a deep neural network is used along with an adversarial network in order to
produce higher-resolution images. This type of GAN is particularly useful in optimally up-scaling native low-resolution images to enhance their details while minimizing errors.
Architecture of GANs
A Generative Adversarial Network (GAN) is composed of two primary parts, which are the
Generator and the Discriminator.
Generator Model
A key element responsible for creating fresh, accurate data in a Generative Adversarial Network (GAN) is the generator model. The generator takes random noise as input and converts it into complex data samples, such as text or images. It is commonly implemented as a deep neural network.
The training data’s underlying distribution is captured by layers of learnable parameters in its design
through training. The generator adjusts its output to produce samples that closely mimic real data as
it is being trained by using backpropagation to fine-tune its parameters.
The generator’s ability to generate high-quality, varied samples that can fool the discriminator is
what makes it successful.
Generator Loss
The objective of the generator in a GAN is to produce synthetic samples that are realistic enough to fool the discriminator. The generator achieves this by minimizing its loss function J_G. The loss is minimized when the log probability is maximized, i.e., when the discriminator is highly likely to classify the generated samples as real. The loss is given by:
J_G = -(1/m) Σ_{i=1}^{m} log D(G(z_i))
Where,
J_G measures how well the generator is fooling the discriminator.
log D(G(z_i)) represents the log probability of the discriminator classifying a generated sample as real.
The generator aims to minimize this loss, encouraging the production of samples that the discriminator classifies as real (D(G(z_i)) close to 1).
Discriminator Model
An artificial neural network called a discriminator model is used in Generative Adversarial Networks
(GANs) to differentiate between generated and actual input. By evaluating input samples and
allocating probability of authenticity, the discriminator functions as a binary classifier.
Over time, the discriminator learns to differentiate between genuine data from the dataset and
artificial samples created by the generator. This allows it to progressively hone its parameters and
increase its level of proficiency.
Convolutional layers or pertinent structures for other modalities are usually used in its architecture
when dealing with picture data. Maximizing the discriminator’s capacity to accurately identify
generated samples as fraudulent and real samples as authentic is the aim of the adversarial training
procedure. The discriminator grows increasingly discriminating as a result of the generator and
discriminator’s interaction, which helps the GAN produce extremely realistic-looking synthetic data
overall.
Discriminator Loss
The discriminator reduces the negative log likelihood of correctly classifying both produced and real samples. This loss incentivizes the discriminator to accurately categorize generated samples as fake and real samples as real:
J_D = -(1/m) Σ_{i=1}^{m} [ log D(x_i) + log(1 − D(G(z_i))) ]
Where,
J_D assesses the discriminator's ability to discern between produced and actual samples.
log D(x_i) represents the log likelihood that the discriminator will accurately categorize real data as real.
log(1 − D(G(z_i))) represents the log likelihood that the discriminator will correctly categorize generated samples as fake.
The discriminator aims to reduce this loss by accurately identifying artificial and real samples.
MinMax Loss
In a Generative Adversarial Network (GAN), the minimax objective is:
min_G max_D V(D, G) = E_{x∼p_data(x)}[ log D(x) ] + E_{z∼p_z(z)}[ log(1 − D(G(z))) ]
Where,
G is the generator network and D is the discriminator network.
Actual data samples obtained from the true data distribution p_data(x) are represented by x.
Random noise sampled from a prior distribution p_z(z) (usually a normal or uniform distribution) is represented by z.
D(x) represents the discriminator's estimated probability that real data x is real.
D(G(z)) is the discriminator's estimated probability that data produced by the generator is real.
How does a GAN work?
The steps involved in how a GAN works:
1. Initialization: Two neural networks are created: a Generator (G) and a Discriminator (D).
G is tasked with creating new data, like images or text, that closely resembles real data.
D acts as a critic, trying to distinguish between real data (from a training dataset) and the data
generated by G.
2. Generator’s First Move:
G takes a random noise vector as input. This noise vector contains random values and acts as the
starting point for G’s creation process. Using its internal layers and learned patterns, G transforms
the noise vector into a new data sample, like a generated image.
3. Discriminator’s Turn: D receives two kinds of inputs:
Real data samples from the training dataset.
The data samples generated by G in the previous step. D’s job is to analyze each input and
determine whether it’s real data or something G cooked up. It outputs a probability score
between 0 and 1. A score of 1 indicates the data is likely real, and 0 suggests it’s fake.
4. The Learning Process: Now, the adversarial part comes in:
If D correctly identifies real data as real (score close to 1) and generated data as fake (score close to
0), both G and D are rewarded to a small degree. This is because they’re both doing their jobs well.
However, the key is to continuously improve. If D consistently identifies everything correctly, it
won’t learn much. So, the goal is for G to eventually trick D.
5. Generator’s Improvement:
When D mistakenly labels G’s creation as real (score close to 1), it’s a sign that G is on the right
track. In this case, G receives a significant positive update, while D receives a penalty for being
fooled.
This feedback helps G improve its generation process to create more realistic data.
6. Discriminator’s Adaptation:
Conversely, if D correctly identifies G’s fake data (score close to 0), G receives no reward and D is further strengthened in its discrimination abilities.
This ongoing duel between G and D refines both networks over time.
As training progresses, G gets better at generating realistic data, making it harder for D to tell the
difference. Ideally, G becomes so adept that D can’t reliably distinguish real from fake data. At this
point, G is considered well-trained and can be used to generate new, realistic data samples.
Implementation of Generative Adversarial Network (GAN)
We will follow and understand the steps to understand how GAN is implemented:
Step1 : Importing the required libraries
Python3
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
For training on the CIFAR-10 image dataset, this PyTorch module creates a Generative Adversarial
Network (GAN), switching between generator and discriminator training. Visualization of the
generated images occurs every tenth epoch, and the development of the GAN is tracked.
Step 2: Defining a Transform
The code uses PyTorch’s transforms.Compose to define a simple image transform that converts images to tensors and normalizes them, and then loads the CIFAR-10 training set with that transform. (The transform definition is not shown in the original excerpt; a standard choice consistent with the generator's Tanh output is used here.)
Python3
# Convert images to tensors and normalize to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_dataset = datasets.CIFAR10(root='./data', train=True,
                                 download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_dataset,
                                         batch_size=32, shuffle=True)
Python3
# Hyperparameters
latent_dim = 100
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
num_epochs = 10
Python3
# Generator: maps a latent noise vector to a 3x32x32 image
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128, momentum=0.78),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64, momentum=0.78),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)
Python3
# Discriminator: classifies a 3x32x32 image as real (1) or fake (0)
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ZeroPad2d((0, 1, 0, 1)),
            nn.BatchNorm2d(64, momentum=0.82),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128, momentum=0.82),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256, momentum=0.8),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(256 * 5 * 5, 1),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img)
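The excerpt does not show the step that instantiates the networks, the loss, and the optimizers used by the training loop below; a minimal sketch consistent with that loop (which refers to generator, discriminator, adversarial_loss, optimizer_G, and optimizer_D) is:
Python3
# Build the networks, the loss, and the optimizers used by the training loop
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)
adversarial_loss = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
optimizer_D = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))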
Python3
# Training loop
for epoch in range(num_epochs):
    for i, batch in enumerate(dataloader):
        # Convert list to tensor
        real_images = batch[0].to(device)

        # Adversarial ground truths
        valid = torch.ones(real_images.size(0), 1, device=device)
        fake = torch.zeros(real_images.size(0), 1, device=device)

        # ---------------------
        #  Train Discriminator
        # ---------------------
        optimizer_D.zero_grad()

        # Sample noise as generator input
        z = torch.randn(real_images.size(0), latent_dim, device=device)

        # Generate a batch of images
        fake_images = generator(z)

        # Discriminator loss on real and generated images
        real_loss = adversarial_loss(discriminator(real_images), valid)
        fake_loss = adversarial_loss(discriminator(fake_images.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2

        # Backward pass and optimize
        d_loss.backward()
        optimizer_D.step()

        # -----------------
        #  Train Generator
        # -----------------
        optimizer_G.zero_grad()

        # Generate a batch of images
        gen_images = generator(z)

        # Adversarial loss
        g_loss = adversarial_loss(discriminator(gen_images), valid)

        # Backward pass and optimize
        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Progress Monitoring
        # ---------------------
        if (i + 1) % 100 == 0:
            print(
                f"Epoch [{epoch+1}/{num_epochs}] "
                f"Batch {i+1}/{len(dataloader)} "
                f"Discriminator Loss: {d_loss.item():.4f} "
                f"Generator Loss: {g_loss.item():.4f}"
            )

    # Save generated images every 10th epoch
    if (epoch + 1) % 10 == 0:
        with torch.no_grad():
            z = torch.randn(16, latent_dim, device=device)
            generated = generator(z).detach().cpu()
            grid = torchvision.utils.make_grid(generated, nrow=4, normalize=True)
            plt.imshow(np.transpose(grid, (1, 2, 0)))
            plt.axis("off")
            plt.show()
Output:
Epoch [10/10] Batch 1300/1563 Discriminator Loss: 0.4473 Generator Loss: 0.9555
Epoch [10/10] Batch 1400/1563 Discriminator Loss: 0.6643 Generator Loss: 1.0215
Epoch [10/10] Batch 1500/1563 Discriminator Loss: 0.4720 Generator Loss: 2.5027
GAN Output
Application Of Generative Adversarial Networks (GANs)
GANs, or Generative Adversarial Networks, have many uses in many different fields. Here are some
of the widely recognized uses of GANs:
1. Image Synthesis and Generation : GANs are often used for image synthesis and generation tasks. They can create fresh, lifelike images that mimic the training data by learning the distribution that explains the dataset. These generative networks have facilitated the development of lifelike avatars, high-resolution photographs, and fresh artwork.
2. Image-to-Image Translation : GANs may be used for problems involving image-to-image
translation, where the objective is to convert an input picture from one domain to another while
maintaining its key features. GANs may be used, for instance, to change pictures from day to
night, transform drawings into realistic images, or change the creative style of an image.
3. Text-to-Image Synthesis : GANs have been used to create visuals from textual descriptions. Given a text input, such as a phrase or a caption, GANs can produce images that correspond to the description. This application has an impact on how realistic visual material is produced from text-based instructions.
4. Data Augmentation : GANs can augment present data and increase the robustness and
generalizability of machine-learning models by creating synthetic data samples.
5. Super-Resolution : GANs can enhance the resolution and quality of low-resolution images. By training on pairs of low-resolution and high-resolution images, GANs can generate high-resolution images from low-resolution inputs, enabling improved image quality in applications such as medical imaging, satellite imaging, and video enhancement.
Advantages of GAN
1. Synthetic data generation: GANs can generate new, synthetic data that resembles some
known data distribution, which can be useful for data augmentation, anomaly detection, or
creative applications.
2. High-quality results: GANs can produce high-quality, photorealistic results in image
synthesis, video synthesis, music synthesis, and other tasks.
3. Unsupervised learning: GANs can be trained without labeled data, making them suitable
for unsupervised learning tasks, where labeled data is scarce or difficult to obtain.
4. Versatility: GANs can be applied to a wide range of tasks, including image synthesis, text-
to-image synthesis, image-to-image translation, anomaly detection, data augmentation, and
others.
Disadvantages of GAN
The disadvantages of the GANs are as follows:
1. Training Instability: GANs can be difficult to train, with the risk of instability, mode
collapse, or failure to converge.
2. Computational Cost: GANs can require a lot of computational resources and can be slow
to train, especially for high-resolution images or large datasets.
3. Overfitting: GANs can overfit the training data, producing synthetic data that is too similar
to the training data and lacking diversity.
4. Bias and Fairness: GANs can reflect the biases and unfairness present in the training data,
leading to discriminatory or biased synthetic data.
5. Interpretability and Accountability: GANs can be opaque and difficult to interpret or
explain, making it challenging to ensure accountability, transparency, or fairness in their
applications.
Use cases of Generative Adversarial Networks (GANs):
1. Text-to-Image synthesis: Generating images from text descriptions, such as scene descriptions, object descriptions, or attributes.
2. Image-to-Image translation: Translating images from one domain to another, such as
converting grayscale images to color, changing the season of a scene, or transforming sketches into
photorealistic images.
3. Anomaly detection: Identifying anomalies or outliers in data, such as detecting fraud in
financial transactions, detecting network intrusions, or identifying medical conditions in medical
imaging.
4. Data augmentation: Increasing the size and diversity of a dataset for training deep learning
models, such as in computer vision, speech recognition, or natural language processing.
5. Video synthesis: Generating new, realistic video sequences from a given data distribution,
such as human action sequences, animal behaviors, or animated sequences.
6. Music synthesis: Generating new, original music from a given data distribution, such as
musical genres, styles, or instrumentations.
7. 3D model synthesis: Generating new, realistic 3D models from a given data distribution, such as objects, scenes, or shapes.
Generative Adversarial Networks (GANs) are most popular for generating images from a given dataset of images, but apart from that, GANs are now being used for a variety of applications. They are a class of neural networks with a discriminator block and a generator block that work together and are able to produce new samples, rather than just classifying or predicting the class of a sample.
Some of the newly discovered use cases of GANs are:
Security: Artificial intelligence has proved to be a boon to many industries, but it is also surrounded by the problem of cyber threats. GANs have proved to be a great help in handling adversarial attacks. Adversarial attacks use a variety of techniques to fool deep learning architectures; by creating fake examples and training the model to identify them, we counter these attacks.
Generating Data using GANs: Data is the most important ingredient for any deep learning algorithm. In general, the more data, the better the performance of a deep learning algorithm. But in many cases, such as health diagnostics, the amount of data is restricted; in such cases there is a need to generate good-quality data, for which GANs are being used.
Privacy-Preserving: There are many cases when our data needs to be kept confidential, which is especially important in defense and military applications. We have many data encryption schemes, but each has its own limitations; in such cases GANs can be useful. In 2016, Google opened a new research path on using a GAN-style competitive framework for encryption problems, where two networks had to compete in creating a code and cracking it.
Data Manipulation:
We can use GANs for pseudo style transfer, i.e. modifying a part of the subject without complete style transfer. For example, in many applications we want to add a smile to an image, or modify just the eyes. This can also be extended to other domains such as natural language processing and speech processing; for example, we can rework some selected words of a paragraph without modifying the whole paragraph.
Advantages of Generative Adversarial Network (GAN) use cases:
1. Image synthesis: GANs can generate high-quality, photorealistic images, which can be used in
a variety of applications, such as entertainment, art, or marketing.
2. Text-to-Image synthesis: GANs can generate images from text descriptions, which can be
useful for generating illustrations, animations, or virtual environments.
3. Image-to-Image translation: GANs can translate images from one domain to another, which
can be used for colorization, style transfer, or data augmentation.
4. Anomaly detection: GANs can identify anomalies or outliers in data, which can be useful for
detecting fraud, network intrusions, or medical conditions.
5. Data augmentation: GANs can increase the size and diversity of a dataset for training deep
learning models, which can improve their performance, robustness, or generalization.
6. Video synthesis: GANs can generate high-quality, realistic video sequences, which can be
used in animation, film, or video games.
7. Music synthesis: GANs can generate new, original music, which can be used in music
composition, performance, or entertainment.
8. 3D model synthesis: GANs can generate high-quality, realistic 3D models, which can be used
in architecture, design, or engineering.
Disadvantages of Generative Adversarial Network (GAN) use cases:
1. Training difficulty: GANs can be difficult to train and require a lot of computational resources,
which can be a barrier for some applications.
2. Overfitting: GANs can overfit to the training data, producing synthetic data that is too similar
to the training data and lacking diversity.
3. Bias and fairness: GANs can reflect the biases and unfairness present in the training data,
leading to discriminatory or biased synthetic data.
4. Interpretability and accountability: GANs can be opaque and difficult to interpret or explain,
making it challenging to ensure accountability, transparency, or fairness in their applications.
5. Quality control: GANs can generate unrealistic or irrelevant synthetic data if the generator and
discriminator are not properly trained, which can affect the quality of the results.
Artificial Neural Network (ANN)
An artificial neural network is built from the following layers:
1. Input Layer: It’s the layer in which we give input to our model. The number of neurons in this layer is equal to the total number of features in our data (the number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many hidden layers, depending on our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the output of the previous layer with the learnable weights of that layer, followed by the addition of learnable biases and an activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like
sigmoid or softmax which converts the output of each class into the probability score of each class.
The data is fed into the model and the output of each layer is computed as above; this forward computation is called the feedforward pass. We then calculate the error using an error function; some common error functions are cross-entropy, squared error, etc. The error function measures how well the network is performing. After that, we propagate derivatives back through the model; this step is called backpropagation and is used to minimize the loss.
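A minimal sketch (not from the text) of this feedforward-and-backpropagation cycle, using a small multi-layer perceptron on dummy data:
Python3
# One training step: feedforward, error computation, backpropagation, update.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),    # hidden layer
    nn.Linear(64, 3),                # output layer (3 classes, logits)
)
loss_fn = nn.CrossEntropyLoss()      # softmax + cross-entropy error function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 20)              # dummy batch of 32 samples, 20 features
y = torch.randint(0, 3, (32,))       # dummy class labels

logits = model(x)                    # feedforward pass
loss = loss_fn(logits, y)            # measure the error
loss.backward()                      # backpropagation: compute derivatives
optimizer.step()                     # update weights to reduce the loss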
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like data. Typical examples are visual datasets such as images or videos, where spatial patterns play an extensive role.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer,
Pooling layer, and fully connected layers.
Simple CNN architecture
The Convolutional layer applies filters to the input image to extract features, the Pooling layer
downsamples the image to reduce computation, and the fully connected layer makes the final
prediction. The network learns the optimal filters through backpropagation and gradient descent.
How Convolutional Layers works
Convolutional neural networks, or convnets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a height (the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, with say K outputs, and representing them vertically. Now slide that neural network across the whole image; as a result, we get another image with a different width, height, and depth. Instead of just the R, G, and B channels, we now have more channels but smaller width and height. This operation is called convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
Convolution layers consist of a set of learnable filters (or kernels) having small widths and
heights and the same depth as that of input volume (3 if the input layer is image input).
For example, if we have to run a convolution on an image with dimensions 34x34x3, the possible size of the filters can be a x a x 3, where 'a' can be 3, 5, or 7, but small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input volume step by step where
each step is called stride (which can have a value of 2, 3, or even 4 for high-dimensional images)
and compute the dot product between the kernel weights and patch from input volume.
As we slide our filters we’ll get a 2-D output for each filter and we’ll stack them together as a
result, we’ll get output volume having a depth equal to the number of filters. The network will
learn all the filters.
Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.
Output Layer: The output from the fully connected layers is then fed into a logistic function for
classification tasks like sigmoid or softmax which converts the output of each class into the probability
score of each class.
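A minimal sketch (not from the text; the CIFAR-10-sized 3x32x32 input and 10 classes are assumptions) of the convolution, pooling, flattening, and fully connected stages described above:
Python3
# Small CNN: convolution + pooling extract features, Flatten reshapes them,
# and a fully connected layer produces class scores.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: 16x16 -> 8x8
    nn.Flatten(),                                 # flatten feature maps
    nn.Linear(32 * 8 * 8, 10),                    # fully connected output layer
)

x = torch.randn(4, 3, 32, 32)                     # dummy batch of 4 images
logits = cnn(x)
probs = torch.softmax(logits, dim=1)              # class probability scores
print(probs.shape)                                # torch.Size([4, 10])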
Transfer Learning
We humans are very good at transferring knowledge between tasks. This means that whenever we encounter a new problem or task, we recognize it and apply relevant knowledge from our previous learning experiences. This makes our work easy and fast to finish. For instance, if you know how to ride a bicycle and you are asked to ride a motorbike, which you have never done before, your experience with the bicycle will come into play and handle tasks like balancing and steering. This makes things easier compared to being a complete beginner. Such transfers are very useful in real life, as they make us more effective and allow us to gain more experience. Following the same idea, the term transfer learning was introduced in the field of machine learning. This approach involves using knowledge learned on one task to solve a problem in a related target task. While most machine learning is designed to address a single task, the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine-learning community.
What is Transfer Learning?
Transfer learning is a technique in machine learning where a model trained on one task is used as the
starting point for a model on a second task. This can be useful when the second task is similar to the
first task, or when there is limited data available for the second task. By using the learned features
from the first task as a starting point, the model can learn more quickly and effectively on the second
task. This can also help to prevent overfitting, as the model will have already learned general
features that are likely to be useful in the second task.
Many deep neural networks trained on images have a curious phenomenon in common: in the early layers of the network, the model learns low-level features, like edges, colours, and variations of intensity. Such features appear not to be specific to a particular dataset or task, because no matter what type of image we are processing, whether to detect a lion or a car, we have to detect these low-level features. All these features occur regardless of the exact cost function or image dataset. Thus, features learned in one task, such as detecting lions, can be reused in other tasks, such as detecting humans.
How does Transfer Learning work?
Transfer Learning
Low-level features learned for task A should be beneficial for learning a model for task B.
This is what transfer learning is. Nowadays, it is rare to see people training whole convolutional neural networks from scratch; it is common instead to use a model pre-trained on a large image dataset for a similar task, e.g. models trained on ImageNet (1.2 million images with 1000 categories), and to use its features to solve a new task. When dealing with transfer learning, we come across the notion of freezing layers. A layer, whether a CNN layer, a hidden layer, a block of layers, or any subset of the layers, is said to be frozen when it is no longer trained; the weights of frozen layers are not updated during training, while layers that are not frozen follow the regular training procedure. When we use transfer learning to solve a problem, we select a pre-trained model as our base model. There are then two possible approaches to using knowledge from the pre-trained model. The first is to freeze a few layers of the pre-trained model and train the remaining layers on our new dataset for the new task. The second is to build a new model, taking some features from layers of the pre-trained model and using them in the newly created model. In both cases, we keep some of the learned features and train the rest of the model. This ensures that the features likely to be shared by both tasks are taken from the pre-trained model, while the rest of the model is adapted to the new dataset by training. A sketch of the first approach is given below.
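A minimal sketch of the first approach (assumptions: torchvision's ResNet-18 as the pre-trained base and a hypothetical 5-class target task): all pre-trained layers are frozen and only a newly attached classification head is trained.
Python3
# Transfer learning by freezing a pre-trained backbone (requires a recent
# torchvision with the weights API) and training a new output head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():       # freeze all pre-trained layers
    param.requires_grad = False

num_classes = 5                        # hypothetical new task
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

# Only the new head's parameters are passed to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)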
Disadvantages of Transfer Learning:
Domain mismatch: The pre-trained model may not be well-suited to the second task if the
two tasks are vastly different or the data distribution between the two tasks is very different.
Overfitting: Transfer learning can lead to overfitting if the model is fine-tuned too much on
the second task, as it may learn task-specific features that do not generalize well to new data.
Complexity: The pre-trained model and the fine-tuning process can be computationally
expensive and may require specialized hardware.