Deep Learning PPT Full Notes
Topics to be Covered
Introduction: Deep Learning
Deep and Shallow Neural Network
Machine Learning vs Deep Learning
Deep Learning Models
Logistic Regression
Gradient Descent and Types
Regularization
Deep learning works technically in the same way as machine learning does, but with different
capabilities and approaches.
Deep learning models can learn to focus on the relevant features themselves, requiring only a
little guidance from the programmer.
Deep learning is implemented with the help of neural networks, which are inspired by
biological neurons, that is, brain cells.
Architectures
Shallow neural network:
A shallow neural network has only one hidden layer between the input and the output.
Many people treat machine learning, deep learning, and artificial intelligence as
interchangeable buzzwords. In reality, these terms are different but related to
each other.
Autoencoders
Logistic Regression
Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value such as Yes or No, 0 or 1, or True or
False. Instead of giving the exact value 0 or 1, however, it gives a probability
that lies between 0 and 1.
Logistic regression is very similar to linear regression; the two differ in how
they are used. Linear regression is used for solving regression problems, whereas
logistic regression is used for solving classification problems.
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose
output is bounded by two limiting values, 0 and 1.
It maps any real value to a value in the range 0 to 1.
Because the output of logistic regression must stay between 0 and 1 and cannot go beyond these
limits, it forms an "S"-shaped curve. This curve is called the sigmoid function
or the logistic function.
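As a minimal sketch of the mapping described above, the sigmoid can be written in a few lines of Python. The weights, bias, and the 0.5 decision threshold are made-up values chosen only for illustration; they do not come from these notes.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes any real z into the range 0 to 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of class 1 for one example, given (hypothetical) fitted parameters."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Illustrative, made-up parameters.
p = predict_proba([2.0, 1.0], weights=[0.5, -0.25], bias=0.0)
label = 1 if p >= 0.5 else 0  # assumed threshold of 0.5
print(round(p, 3), label)
```

The linear part `z` can be any real number; only after passing through the sigmoid does it become a probability that can be thresholded into a class.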
• Binomial: In binomial logistic regression, the dependent variable can take only two possible
values, such as 0 or 1, Pass or Fail, etc.
• Ordinal: In ordinal logistic regression, the dependent variable can take 3 or more possible
ordered values, such as "Low", "Medium", or "High".
Gradient Descent
Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable
function. Gradient descent is simply used to find the values of a function's parameters
(coefficients) that minimize a cost function as far as possible.
Most machine learning and deep learning algorithms involve some sort of optimization.
Optimization refers to the process of either minimizing or maximizing some function by
altering its parameters.
With gradient descent, you start with a cost function (also known as a loss or error function)
based on a set of parameters. The goal is to find the parameter values that minimize
the cost function.
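A minimal sketch of this idea, assuming a one-parameter cost function (w - 3)^2 chosen purely for illustration. The learning rate and step count are also arbitrary assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

def cost(w):
    # Toy cost function with its minimum at w = 3.
    return (w - 3.0) ** 2

def cost_gradient(w):
    # Derivative of the toy cost function.
    return 2.0 * (w - 3.0)

w_min = gradient_descent(cost_gradient, w0=0.0)
print(w_min, cost(w_min))
```

Each step moves the parameter a little in the direction that lowers the cost, so `w_min` ends up very close to 3.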
How can we avoid local minima and reach weights that correspond to the global minimum?
In batch gradient descent, the entire dataset is used to compute each gradient, so convergence is slow.
If the dataset is huge and contains millions or billions of data points, this is both
memory-intensive and computationally intensive.
In stochastic gradient descent, we first shuffle the dataset so that it is completely randomized.
Because the dataset is randomized and the weights are updated after every single example, the
weight updates and the cost function are noisy and jump around.
Mini-batch gradient descent is widely used; it converges faster and is more stable.
Because each batch contains several different samples, the noise (the variance of the weight
updates) is reduced, which makes convergence more stable and faster.
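The three variants differ only in how many examples feed each weight update, which the sketch below expresses as a single `batch_size` parameter. The toy dataset (noiseless y = 2x + 1), the learning rate, and the epoch count are assumptions made for this example.

```python
import random

def train(data, batch_size, epochs=200, learning_rate=0.1, seed=0):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # randomize the order before every pass
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [(i - 5) / 5 for i in range(11)]   # 11 points from -1 to 1
data = [(x, 2 * x + 1) for x in xs]     # noiseless toy data

print(train(data, batch_size=len(data)))  # batch: one update per epoch
print(train(data, batch_size=1))          # stochastic: one update per example
print(train(data, batch_size=4))          # mini-batch: one update per 4 examples
```

On this noiseless toy problem all three settings recover w close to 2 and b close to 1; the difference shows up in how many updates are made per pass and how noisy each update is.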
Regularization
Regularization is one of the most important concepts of machine learning. It is a technique to
prevent the model from overfitting by adding extra information to it.
Sometimes a machine learning model performs well on the training data but does not
perform well on the test data.
This means the model has fitted the noise in the training data and cannot predict the
output correctly for unseen data; such a model is called overfitted.
This technique allows us to keep all variables or features in the model while reducing the
magnitude of their coefficients. Hence, it maintains both the accuracy and the
generalization of the model.
It mainly regularizes, or shrinks, the coefficients of the features toward zero. In simple words,
regularization reduces the magnitude of the feature coefficients while keeping the same
number of features.
Types of Regularization
Ridge Regression
Ridge regression is one of the types of linear regression in which a small amount of bias is
introduced so that we can get better long-term predictions.
Ridge regression is a regularization technique used to reduce the complexity of the
model. It is also called L2 regularization.
Lasso Regression
Lasso regression is another regularization technique for reducing the complexity of the model. It
stands for Least Absolute Shrinkage and Selection Operator.
It is similar to ridge regression except that the penalty term contains the
absolute values of the weights instead of their squares.
It is also called L1 regularization.
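The difference between the two penalties can be seen by computing both cost functions side by side. This is a sketch only: the weights, the single data point, and the penalty strength `lam` are hypothetical values picked so the arithmetic is easy to follow.

```python
def mse(weights, bias, data):
    """Mean squared error of a linear model on (features, target) pairs."""
    total = 0.0
    for features, target in data:
        prediction = sum(w * x for w, x in zip(weights, features)) + bias
        total += (prediction - target) ** 2
    return total / len(data)

def ridge_cost(weights, bias, data, lam):
    # L2 penalty: lam times the sum of squared weights.
    return mse(weights, bias, data) + lam * sum(w * w for w in weights)

def lasso_cost(weights, bias, data, lam):
    # L1 penalty: lam times the sum of absolute weights.
    return mse(weights, bias, data) + lam * sum(abs(w) for w in weights)

weights, bias = [3.0, -0.5], 0.0
data = [([1.0, 2.0], 2.0)]  # the model fits this point exactly, so the MSE is 0
print(ridge_cost(weights, bias, data, lam=0.1))  # penalty 0.1 * (9 + 0.25)
print(lasso_cost(weights, bias, data, lam=0.1))  # penalty 0.1 * (3 + 0.5)
```

The squared penalty punishes the large weight (3.0) much more heavily than the small one, while the absolute-value penalty grows in proportion to each weight.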
THANK YOU
Hit Academic Booster on YouTube for
GATE & Interview Preparation
Topics to be Covered
Introduction: CNN
The LeNet Architecture
Operations of CNN
Convolution
Introducing Non Linearity
Pooling
Fully Connected Layer
What is CNN?
Convolutional Neural Networks (ConvNets or CNNs) are a category of neural
networks that have proven very effective in areas such as image recognition and
classification.
ConvNets have been successful in identifying faces, objects and traffic signs, and in
powering vision in robots and self-driving cars.
ConvNets, therefore, are an important tool for most machine learning practitioners today.
Several new architectures have been proposed in recent years that improve on LeNet,
but they all use its main concepts and are relatively easy to understand once
LeNet itself is clearly understood.
Operations of CNN
There are four main operations in the ConvNet:
Convolution
Non Linearity (ReLU)
Pooling or Sub Sampling
Classification (Fully Connected Layer)
These operations are the basic building blocks of every Convolutional Neural Network, so
understanding how these work is an important step to developing a sound understanding of
ConvNets.
Image is a Matrix
An image from a standard digital camera will have three channels – red, green and blue – you
can imagine those as three 2d-matrices stacked over each other (one for each color), each
having pixel values in the range 0 to 255.
A grayscale image, on the other hand, has just one channel. In these notes we
will only consider grayscale images, so a single 2D matrix represents an
image. The value of each pixel in the matrix ranges from 0 to 255, with zero indicating
black and 255 indicating white.
It is important to note that filters act as feature detectors on the original input image.
Different values of the filter matrix produce different feature maps for the same input
image.
The more filters we have, the more image features get extracted and the better our
network becomes at recognizing patterns in unseen images.
Depth: Depth corresponds to the number of filters we use for the convolution operation.
Stride: Stride is the number of pixels by which we slide our filter matrix over the input matrix. When
the stride is 1 then we move the filters one pixel at a time. When the stride is 2, then the filters jump 2
pixels at a time as we slide them around. Having a larger stride will produce smaller feature maps.
Zero-padding: Sometimes, it is convenient to pad the input matrix with zeros around
the border, so that we can apply the filter to bordering elements of our input image
matrix. A nice feature of zero padding is that it allows us to control the size of the
feature maps. Adding zero-padding is also called wide convolution, and not using
zero-padding would be a narrow convolution.
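The effect of the filter, stride, and zero-padding can be sketched with a small pure-Python convolution. The 5x5 binary image and 3x3 filter below are illustrative values, and the function assumes a square filter. As is usual in CNNs, the filter is slid over the image without being flipped.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image (lists of lists); return the feature map."""
    k = len(kernel)
    width = len(image[0]) + 2 * padding
    # Zero-pad the border of the input.
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]
    out_h = (len(padded) - k) // stride + 1
    out_w = (width - k) // stride + 1
    return [
        [
            # Element-wise multiply the window by the kernel and add up the results.
            sum(padded[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(conv2d(image, kernel))             # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(conv2d(image, kernel, stride=2))   # [[4, 4], [2, 4]]
print(len(conv2d(image, kernel, padding=1)))  # 5 rows: padding keeps the 5x5 size
```

A stride of 2 shrinks the 3x3 feature map to 2x2, while one ring of zero-padding keeps the output the same size as the input.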
ReLU
ReLU is an element-wise operation (applied per pixel) that replaces all negative pixel values in the
feature map with zero. The purpose of ReLU is to introduce non-linearity into our ConvNet, since
most of the real-world data we want our ConvNet to learn is non-linear.
Convolution is a linear operation (element-wise multiplication and addition), so we
account for non-linearity by introducing a non-linear function such as ReLU.
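In code, ReLU amounts to a single element-wise maximum. The small feature map below is a made-up example.

```python
def relu(feature_map):
    """Replace every negative value in a 2D feature map with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

feature_map = [
    [-3, 2],
    [0, -1],
]
print(relu(feature_map))  # [[0, 2], [0, 0]]
```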
In the case of Max Pooling, we define a spatial neighborhood (for example, a 2×2 window) and
take the largest element from the rectified feature map within that window. Instead of taking
the largest element, we could also take the average (Average Pooling) or the sum of all elements in
that window. In practice, Max Pooling has been shown to work better.
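A small sketch of pooling, where the reducing function (max, average, or sum) is passed in as a parameter. The 4x4 feature map, the 2x2 window, and the stride of 2 are assumptions for the example.

```python
def pool2d(feature_map, size=2, stride=2, reduce=max):
    """Apply `reduce` to each size x size window of a 2D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            reduce([feature_map[r * stride + i][c * stride + j]
                    for i in range(size) for j in range(size)])
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def average(values):
    return sum(values) / len(values)

rectified = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(pool2d(rectified))                  # max pooling:     [[6, 8], [3, 4]]
print(pool2d(rectified, reduce=average))  # average pooling: [[3.25, 5.25], [2.0, 2.0]]
print(pool2d(rectified, reduce=sum))      # sum pooling:     [[13, 21], [8, 8]]
```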
The output from the convolutional and pooling layers represents high-level features of the
input image. The purpose of the Fully Connected layer is to use these features for
classifying the input image into various classes based on the training dataset.
Topics to be Covered
Generative Adversarial Networks
Working of GANs
Semi Supervised Learning
Dimensionality Reduction
PCA and LDA
Auto Encoders
CNN Architectures
AlexNet, VGGNet, Inception, ResNet
What is GAN?
Generative Adversarial Networks, or GANs for short, are an approach to generative modeling
using deep learning methods, such as convolutional neural networks.
GANs were proposed in 2014 by Ian Goodfellow and several other researchers, including
Yoshua Bengio.
In a GAN, a Generator is pitted against an adversarial network called the
Discriminator, hence the name Generative Adversarial Network.
The Generator's objective is to model or generate data that is very similar to the training data.
It needs to generate data that is indistinguishable from the real data, so that the
Discriminator is tricked into identifying it as real.
• The Discriminator's objective is to identify whether the data is real or fake. It gets two sets of
input: one comes from the training dataset and the other is the modelled data generated by the
Generator.
• The Generator can be thought of as a team of counterfeiters making fake currency that looks exactly
like real currency. The Discriminator can be considered a team of cops trying to detect the
counterfeit currency. Counterfeiters and cops are both trying to beat each other at their game.
Discriminator
The Discriminator gets two inputs: real data from the training dataset and fake data
from the Generator. Its goal is to identify which input is real and which is fake.
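The two competing goals can be expressed as loss functions over the Discriminator's outputs. The sketch below uses the commonly used binary cross-entropy form and assumes the Discriminator outputs a probability that a sample is real. The probability values are made up, and no actual networks are trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the Discriminator scores generated samples as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical Discriminator outputs (probability that each sample is real).
sharp_cop = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused_cop = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
caught = generator_loss(d_fake=[0.1, 0.2])
fooled = generator_loss(d_fake=[0.9, 0.8])
print(sharp_cop, confused_cop)  # the sharper Discriminator has the lower loss
print(caught, fooled)           # the Generator's loss drops when its fakes pass as real
```

In the counterfeiter analogy, `fooled` is the case where the fake currency passes inspection, and `sharp_cop` is the case where the cops tell real from fake reliably.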
Usage of GAN
Generating a high resolution image from a low resolution image
In semi-supervised learning, the basic procedure is to first cluster similar data using an
unsupervised learning algorithm and then use the existing labeled data to label the rest of the
unlabeled data.
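The cluster-then-label procedure can be sketched as follows. This is an illustration under simplifying assumptions: the points are one-dimensional, the clustering is a bare-bones k-means with hand-picked starting centers, and each cluster takes the majority label of its labeled members.

```python
from collections import Counter

def kmeans_1d(points, initial_centers, iterations=10):
    """Bare-bones k-means on 1D points; returns the cluster index of each point."""
    centers = list(initial_centers)
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment

def cluster_then_label(points, labels, initial_centers):
    """labels[i] is a class name, or None when point i is unlabeled."""
    assignment = kmeans_1d(points, initial_centers)
    votes = {}
    for cluster, label in zip(assignment, labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    result = []
    for cluster, label in zip(assignment, labels):
        if label is None and cluster in votes:
            # Unlabeled points inherit the majority label of their cluster.
            label = votes[cluster].most_common(1)[0][0]
        result.append(label)
    return result

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", None, None, "high", None, None]
print(cluster_then_label(points, labels, initial_centers=[0.0, 6.0]))
```

Only two of the six points carry labels, yet every point ends up labeled because each one shares a cluster with a labeled neighbor.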
• Cluster Assumption: The data can be divided into discrete clusters and points in the same cluster
are more likely to share an output label.
• Manifold Assumption: The data lie approximately on a manifold of much lower dimension than
the input space.
• Protein Sequence Classification: Since DNA strands are typically very large, semi-supervised
learning has become prominent in this field.
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of random variables under
consideration by obtaining a set of principal variables. It can be divided into feature selection and
feature extraction.
Feature extraction: This reduces data in a high-dimensional space to a lower-dimensional
space, i.e. a space with fewer dimensions.
Linear Discriminant Analysis (LDA) is used for modeling differences between groups, i.e. separating
two or more classes. It projects the features from a higher-dimensional space into a lower-dimensional space.
For example, suppose we have two classes that we need to separate efficiently. Classes can have
multiple features. Using only a single feature to classify them may result in some overlap.
Applications of LDA
Face Recognition: Linear discriminant analysis (LDA) is used here to reduce the number of
features to a more manageable number before the process of classification.
Medical: In this field, Linear discriminant analysis (LDA) is used to classify a patient's
disease state as mild, moderate or severe based on the patient's various parameters and the
medical treatment they are undergoing. This helps doctors to intensify or reduce the pace
of the treatment.
Customer Identification: Linear discriminant analysis helps us to identify and select the
features that describe the characteristics of the group of customers who are most likely
to buy a particular product in a shopping mall.
PCA tends to find linear correlations between variables, which is sometimes undesirable.
PCA fails in cases where mean and covariance are not enough to define datasets.
We may not know how many principal components to keep; in practice, some rules of thumb are
applied.
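One common rule of thumb is to keep enough components to explain a chosen share of the total variance. The sketch below computes the explained-variance ratio for 2D data using the closed-form eigenvalues of a 2x2 covariance matrix. The data points and the 95% threshold are assumptions for illustration.

```python
import math

def explained_variance_ratio_2d(points):
    """Share of total variance captured by each principal component of 2D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

# Made-up points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
ratios = explained_variance_ratio_2d(points)
keep = 1 if ratios[0] >= 0.95 else 2  # assumed 95% threshold
print(ratios, keep)
```

Because the points are almost collinear, the first component explains nearly all of the variance and the second can be dropped with little loss.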
Auto-Encoder
An autoencoder is an unsupervised artificial neural network that learns how to efficiently compress
and encode data, and then learns how to reconstruct the data from the reduced encoded
representation into a representation that is as close to the original input as possible.
Auto-Encoder : Components
Autoencoders consist of 4 main parts:
Encoder: The part in which the model learns how to reduce the input dimensions and compress the
input data into an encoded representation.
Bottleneck: The layer that contains the compressed representation of the input data. This
is the lowest-dimensional representation of the input data.
Decoder: The part in which the model learns how to reconstruct the data from the encoded
representation so that it is as close to the original input as possible.
Reconstruction Loss: The method that measures how well the decoder is performing
and how close the output is to the original input.
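The four parts can be laid out in a few lines of code. This sketch shows the structure only: the weights are set by hand rather than learned, the encoder and decoder are single linear maps, and the loss is assumed to be mean squared error.

```python
def encoder(x, weights):
    """Compress a 2D input into a single number (the bottleneck code)."""
    return sum(w * value for w, value in zip(weights, x))

def decoder(code, weights):
    """Expand the 1D code back into a 2D reconstruction."""
    return [w * code for w in weights]

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hand-set weights (hypothetical): the unit direction (0.6, 0.8), shared by both parts.
weights = [0.6, 0.8]

for x in ([3.0, 4.0], [4.0, -3.0]):
    code = encoder(x, weights)        # bottleneck: one number instead of two
    x_hat = decoder(code, weights)
    print(x, code, x_hat, reconstruction_loss(x, x_hat))
```

The first input lies along the direction the bottleneck keeps, so it is reconstructed almost perfectly. The second lies across it, so nearly all of its information is lost and the reconstruction loss is large. Training an autoencoder means adjusting the weights so that typical inputs fall into the first case.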
CNN Architectures
LeNet-5 (1998)
LeNet-5, a pioneering 7-level convolutional network by LeCun et al. in 1998 that classifies digits,
was applied by several banks to recognise hand-written numbers on checks (cheques) digitized as
32x32 pixel greyscale input images. Processing higher-resolution images requires larger
and more numerous convolutional layers, so this technique is constrained by the availability of
computing resources.
AlexNet (2012)
In 2012, AlexNet significantly outperformed all prior competitors and won the ILSVRC challenge by
reducing the top-5 error from 26% to 15.3%. The second-place entry, which was not a
CNN variation, had a top-5 error rate of around 26.2%.
GoogLeNet/Inception (2014)
GoogLeNet achieved a top-5 error rate of 6.67%. This was very close to human-level performance,
which the organisers of the challenge were now forced to evaluate. As it turns out, this was actually
rather hard to do and required some human training in order to beat GoogLeNet's accuracy.
VGGNet (2014)
VGGNet consists of 16 weight layers (13 convolutional and 3 fully connected) and is very appealing
because of its very uniform architecture. It is similar to AlexNet but uses only 3x3 convolutions,
with many filters. It was trained on 4 GPUs for 2–3 weeks. It is a widely preferred choice in the
community for extracting features from images.
ResNet (2015)
At ILSVRC 2015, the Residual Neural Network (ResNet) by Kaiming He et al. introduced a novel
architecture with “skip connections” and heavy use of batch normalization.
Such skip connections resemble the gated units (such as gated recurrent units) that have
recently been applied successfully in RNNs.
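A skip connection simply adds a block's input back onto its output. In the sketch below, `toy_layer` is a hypothetical stand-in for the real convolution, batch normalization, and activation layers of a residual block.

```python
def toy_layer(x):
    # Hypothetical stand-in for the learned layers inside a residual block.
    return [max(0.0, 0.5 * value - 1.0) for value in x]

def residual_block(x, layer):
    """Return layer(x) + x: the skip connection carries the input around the layer."""
    return [fx + value for fx, value in zip(layer(x), x)]

x = [4.0, 1.0, -2.0]
print(toy_layer(x))                   # [1.0, 0.0, 0.0]
print(residual_block(x, toy_layer))   # [5.0, 1.0, -2.0]
```

Even where the layer outputs zero, the input still passes through unchanged, which is what lets very deep stacks of such blocks remain trainable.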
Topics to be Covered
Introduction: RNN
How RNN Works
Problems in RNN
What is LSTM
Advantages of RNN
Disadvantages of RNN
Use Case and Application of Deep Learning
In traditional neural networks, all the inputs and outputs are independent of each other. But in
cases such as predicting the next word of a sentence, the previous words are
required, and hence there is a need to remember them.
RNNs were introduced to solve this issue with the help of a hidden layer.
An RNN has a “memory” that retains information about what has been calculated so far.
It uses the same parameters for each input, as it performs the same task on all the inputs or
hidden layers to produce the output.
Consider a network with three hidden layers. Like other neural networks, each hidden layer has
its own set of weights and biases: say (w1, b1) for the first hidden layer, (w2, b2) for the second
hidden layer and (w3, b3) for the third hidden layer.
This means that these layers are independent of each other, i.e. they do not memorize
the previous outputs.
An RNN instead gives all the hidden layers the same weights and biases, so the three layers
can be joined together into a single recurrent layer.
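A minimal sketch of such a recurrent layer, assuming scalar inputs, a scalar hidden state, and a tanh activation. The parameter values are arbitrary. The same `w_x`, `w_h`, and `b` are reused at every time step.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_prev + b), with shared parameters."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # the previous state feeds the next one
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are non-zero as well.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

The second and third hidden states are non-zero only because the first input is carried forward through the hidden state, which is the "memory" described above.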
Problems in RNN
Although the basic Recurrent Neural Network is fairly effective, it can suffer from a significant
problem. For deep networks, the back-propagation process can lead to the following issues:
• Vanishing Gradients: This occurs when the gradients become very small and tend towards
zero.
• Exploding Gradients: This occurs when the gradients become too large during
back-propagation.
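Both issues come from multiplying many per-step factors together during back-propagation through time. The sketch below simplifies this to one constant scalar factor per step, which is an assumption; in a real network the factors vary from step to step.

```python
def gradient_after(steps, per_step_factor):
    """Gradient magnitude after being multiplied by the same factor at every step."""
    gradient = 1.0
    for _ in range(steps):
        gradient *= per_step_factor
    return gradient

print(gradient_after(50, 0.5))  # factor below 1: the gradient vanishes
print(gradient_after(50, 1.5))  # factor above 1: the gradient explodes
```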
Solutions of Problem
The problem of exploding gradients may be mitigated with a workaround: putting a threshold
on the gradients being passed back in time. This is not regarded as a true solution to the
problem, however, and may also reduce the efficiency of the network.
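One common way to apply such a threshold is to rescale the gradient vector whenever its norm exceeds a chosen limit. The sketch below assumes clipping by the L2 norm; the gradient values and the limit of 5 are illustrative.

```python
import math

def clip_by_norm(gradients, max_norm):
    """Rescale the gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        return list(gradients)
    scale = max_norm / norm
    return [g * scale for g in gradients]

print(clip_by_norm([30.0, 40.0], max_norm=5.0))  # norm 50 is scaled down to norm 5
print(clip_by_norm([0.3, 0.4], max_norm=5.0))    # already small: left unchanged
```

Rescaling keeps the direction of the update and limits only its size.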
To deal with such problems, two main variants of Recurrent Neural Networks were developed:
Long Short Term Memory Networks (LSTM)
Gated Recurrent Unit Networks (GRU)
In concept, an LSTM recurrent unit tries to “remember” all the past knowledge that the
network has seen so far and to “forget” irrelevant data.
This is done by introducing different activation function layers called “gates” for different
purposes.
A Long Short Term Memory network consists of four different gates, listed below:
Forget Gate (f)
Input Gate (i)
Input Modulation Gate (g)
Output Gate (o)
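The way the four gates combine in a single time step can be sketched with scalar values. This follows the commonly used LSTM update; the parameter dictionary and its values are hypothetical, and a real LSTM uses weight matrices and vectors instead of single numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step. params maps a gate name to (w_x, w_h, b)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old cell state to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much new information to admit
    g = math.tanh(pre_activation("g"))  # input modulation gate: candidate new values
    o = sigmoid(pre_activation("o"))    # output gate: how much cell state to expose
    c = f * c_prev + i * g              # updated cell state
    h = o * math.tanh(c)                # updated hidden state
    return h, c

# Hypothetical parameters: a large forget-gate bias keeps most of the old cell state.
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "g": (1.0, 0.0, 0.0),
    "o": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=params)
print(h, c)
```

With these values the forget gate is almost fully open and the input gate almost closed, so the cell state stays close to its previous value of 2.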
Advantages of RNN
An RNN remembers information through time. It is useful in time series
prediction precisely because of its ability to remember previous inputs. The variant designed
to retain such information over long spans is called Long Short Term Memory.
Recurrent neural networks are even used with convolutional layers to extend the effective pixel
neighborhood.
Disadvantages of RNN
Gradient vanishing and exploding problems.
It cannot process very long sequences when tanh or ReLU is used as the activation function.
You can think of it as the way a child learns through constant experience and replication. These
new services could provide unexpected business models for companies.
Google Now, the voice-activated assistant for Android, was launched less than a year after
Siri. The newest of the voice-activated intelligent assistants is Microsoft Cortana.
Automatic machine translation has been around for a long time, but deep learning is achieving
top results in two specific areas:
Automatic Translation of Text
Automatic Translation of Images
The model is capable of learning how to spell, punctuate, form sentences and even capture the
style of the text in the corpus. Large recurrent neural networks are used to learn the
relationship between items in the sequences of input strings and then generate text.
Image recognition is already being used in several sectors like gaming, social media, retail,
tourism, etc.