
Unit III

Advanced Deep Learning


Dr. E. Poongothai
Assistant Professor
Department of Computational Intelligence
SRM Institute of Science and Technology
• Convolutional Networks,
• Convolutional operation,
• Pooling,
• Normalization,
• Applications in Computer Vision,
• Sequence Modelling,
• Recurrent Neural Networks,
• Difficulty in Training RNN,
• LSTM,
• GRU,
• Encoder-Decoder architectures,
• Applications: spam classification, sentiment analysis
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly used in Computer Vision.
Computer vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data.
• When it comes to Machine Learning, Artificial Neural Networks perform very well, and they are used on various kinds of data such as images, audio, and text.
• Different types of Neural Networks are used for different purposes: for predicting a sequence of words we use Recurrent Neural Networks, more precisely an LSTM; similarly, for image classification we use Convolutional Neural Networks.
Layers of Neural Network

1.Input Layer: This is the layer through which we give input to our model. The number of neurons in this layer equals the total number of features in our data (the number of pixels in the case of an image).
2.Hidden Layers: The input from the input layer is then fed into the hidden layers. There can be many hidden layers depending on our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the previous layer's output with that layer's learnable weights, followed by the addition of learnable biases and an activation function, which makes the network nonlinear (see the sketch after this list).
3.Output Layer: The output from the hidden layers is then fed into a logistic function like sigmoid or softmax, which converts the output for each class into that class's probability score.
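A minimal NumPy sketch of the hidden-layer computation just described; the shapes here are illustrative, not taken from the slides:

```python
import numpy as np

# One hidden-layer forward pass: matrix multiplication with learnable
# weights, addition of a learnable bias, then a nonlinear activation.
rng = np.random.default_rng(0)

x = rng.normal(size=(1, 784))            # e.g. a flattened 28x28 image
W = rng.normal(size=(784, 128)) * 0.01   # learnable weights of the layer
b = np.zeros((1, 128))                   # learnable biases

z = x @ W + b                            # affine transform
h = np.maximum(0.0, z)                   # ReLU activation adds nonlinearity
print(h.shape)                           # (1, 128)
```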
• The data is fed into the model, and obtaining the output of each layer as described above is called feedforward. We then calculate the error using an error function; some common error functions are cross-entropy, squared loss error, etc.
• The error function measures how well the network is performing. After that, we backpropagate through the model by calculating the derivatives. This step, called Backpropagation, is used to minimize the loss.
• Cross-entropy measures the difference between the probability distribution predicted by a classification model and the true values.
• The cross-entropy loss function is used to find the optimal solution by adjusting the weights of a machine learning model during training. The objective is to minimize the error between the actual and predicted outcomes. A lower cross-entropy value indicates better performance.
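A minimal sketch of how cross-entropy compares predicted class probabilities (e.g. from a softmax output layer) with the true label; the numbers are made up for illustration:

```python
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])       # one-hot label: class 1 is correct
y_pred = np.array([0.1, 0.7, 0.2])       # predicted probabilities

loss = -np.sum(y_true * np.log(y_pred))  # cross-entropy loss
print(loss)                              # ~0.357; lower is better
```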
Convolutional Neural Network
• A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN), used predominantly to extract features from grid-like matrix datasets, for example visual datasets like images or videos, where spatial data patterns play an extensive role.
CNN architecture
• A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.
How Convolutional Layers work

• Convolutional Neural Networks, or covnets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having length and width (the dimensions of the image) and height (the channels, as images generally have red, green, and blue channels).
• Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, with, say, K outputs, represented vertically.
• Now slide that neural network across the whole image; as a result, we get another image with different width, height, and depth. Instead of just the R, G, and B channels we now have more channels but smaller width and height. This operation is called Convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
• Now let’s talk about the mathematics involved in the whole convolution process.
• Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the same depth as the input volume (3 if the input layer is an image input).
• For example, suppose we have to run convolution on an image of dimensions 34x34x3. The possible filter sizes are a×a×3, where ‘a’ can be 3, 5, or 7, but smaller than the image dimensions.
• During the forward pass, we slide each filter across the whole input volume step by step, where each step is called a stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute the dot product between the kernel weights and the patch from the input volume.
• As we slide our filters we get a 2-D output for each filter; stacking them together, we get an output volume with depth equal to the number of filters, as in the sketch below. The network will learn all the filters.
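A minimal sketch of these shapes in PyTorch (a framework assumption; the slides name none). A 34x34x3 image convolved with 10 filters of size 3×3×3 at stride 1 yields ten stacked 32x32 feature maps:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 34, 34)                # a batch of one RGB image
conv = nn.Conv2d(in_channels=3, out_channels=10,
                 kernel_size=3, stride=1)    # 10 learnable 3x3x3 filters
out = conv(x)                                # dot products as the filters slide
print(out.shape)                             # torch.Size([1, 10, 32, 32])
```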
Layers used to build ConvNets

• A complete Convolutional Neural Network architecture is also known as a covnet. A covnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let’s take an example by running a covnet on an image of dimensions 32 x 32 x 3.
• Input Layer: This is the layer through which we give input to our model. In a CNN, the input will generally be an image or a sequence of images. This layer holds the raw input of the image with width 32, height 32, and depth 3.
• Convolutional Layers: This is the layer used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are smaller matrices, usually of 2×2, 3×3, or 5×5 shape. Each slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we’ll get an output volume of dimensions 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. An element-wise activation function is applied to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling Layer: This layer is periodically inserted in the covnet; its main function is to reduce the size of the volume, which makes the computation fast, reduces memory, and also prevents overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimensions 16x16x12.
• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression task.
• Output Layer: The output from the fully connected layers is fed into a logistic function for classification tasks, such as sigmoid or softmax, which converts the output for each class into that class's probability score. The whole pipeline is sketched below.
• Advantages of Convolutional Neural Networks (CNNs):
1.Good at detecting patterns and features in images, videos, and audio signals.
2.Robust to translation, with some tolerance to rotation and scaling.
3.End-to-end training; no need for manual feature extraction.
4.Can handle large amounts of data and achieve high accuracy.
• Disadvantages of Convolutional Neural Networks (CNNs):
1.Computationally expensive to train, and they require a lot of memory.
2.Can be prone to overfitting if there is not enough data or proper regularization is not used.
3.Require large amounts of labeled data.
4.Interpretability is limited; it's hard to understand what the network has learned.
Normalization
Need for Batch Normalization in CNN models
• Batch Normalization in CNNs addresses several challenges encountered during training. The following reasons highlight the need for batch normalization in CNNs:
1.Addressing Internal Covariate Shift: Internal covariate shift occurs when the
distribution of network activations changes as parameters are updated during training.
Batch normalization addresses this by normalizing the activations in each layer,
maintaining consistent mean and variance across inputs throughout training. This
stabilizes training and speeds up convergence.
2.Improving Gradient Flow: Batch normalization contributes to stabilizing the gradient
flow during backpropagation by reducing the reliance of gradients on parameter scales.
As a result, training becomes faster and more stable, enabling effective training of
deeper networks without facing issues like vanishing or exploding gradients.
3.Regularization Effect: During training, batch normalization introduces noise to the
network activations, serving as a regularization technique. This noise aids in averting
overfitting by injecting randomness and decreasing the network’s sensitivity to minor
fluctuations in the input data.
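A minimal NumPy sketch of what batch normalization computes for each feature during training, assuming the standard formulation (normalize over the batch, then apply the learnable scale gamma and shift beta):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta               # learnable scale and shift

acts = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
print(batch_norm(acts).mean(axis=0).round(6))  # ~0 per feature after norm
```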
Max pooling
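The original slide illustrates max pooling with a figure; here is a minimal NumPy sketch of 2×2 max pooling with stride 2 on a single-channel feature map:

```python
import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [3, 4, 1, 8]], dtype=float)

# Group the 4x4 map into 2x2 blocks and take the max of each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6. 4.]
                #  [7. 9.]]
```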
Applications
• Decoding Facial Recognition
• Facial recognition is broken down by a convolutional neural network into the following major components:
• Identifying every face in the picture
• Focusing on each face despite external factors, such as
light, angle, pose, etc.
• Identifying unique features
• Comparing all the collected data with already existing
data in the database to match a face with a name.
• A similar process is followed for scene labeling as well.
Analyzing Documents
• Convolutional neural networks can also be used for document analysis. This is not just useful for handwriting analysis, but also has a major stake in recognizers. For a machine to be able to scan an individual's writing and then compare it to the wide database it has, it must execute almost a million commands a minute. It is said that with the use of CNNs and newer models and algorithms, the error rate has been brought down to a minimum of 0.4% at the character level, though its complete testing is yet to be widely seen.
Collecting Historic and
Environmental Elements
• CNNs are also used for more complex purposes such as
natural history collections. These collections act as key
players in documenting major parts of history such as
biodiversity, evolution, habitat loss, biological invasion,
and climate change.
Understanding Climate
• CNNs can be used to play a major role in the fight
against climate change, especially in understanding the
reasons why we see such drastic changes and how we
could experiment in curbing the effect. It is said that the
data in such natural history collections can also provide
greater social and scientific insights, but this would
require skilled human resources such as researchers
who can physically visit these types of repositories.
There is a need for more manpower to carry out deeper
experiments in this field.
Understanding Gray Areas
• Introduction of the gray area into CNNs is poised to provide a much more realistic picture of the real world. Currently, CNNs largely function exactly like a machine, seeing a true or false value for every question. However, as humans, we understand that the real world plays out in a thousand shades of gray. Allowing the machine to understand and process fuzzier logic will help it understand the gray area we humans live in and strive to work within. This will help CNNs get a more holistic view of what humans see.
Example for CNN
Recurrent Neural Network (RNN)
• An RNN is a type of Neural Network where the output from the previous step is fed as input to the current step.
• In traditional neural networks, all the inputs and outputs are independent of each other.
• However, in cases where it is required to predict the next word of a sentence, the previous words are needed, and hence there is a need to remember them.
• Thus the RNN came into existence, which solved this issue with the help of a hidden layer. The main and most important feature of an RNN is its Hidden State, which remembers some information about a sequence. This state is also referred to as the Memory State, since it remembers previous inputs to the network.
• It uses the same parameters for each input, as it performs the same task on all the inputs or hidden layers to produce the output. This reduces the number of parameters, unlike other neural networks.
• Types Of RNN
• There are four types of RNNs based on the number of
inputs and outputs in the network.
1.One to One
2.One to Many
3.Many to One
4.Many to Many
• One to One
• This type of RNN behaves the same as any simple neural network; it is also known as a Vanilla Neural Network. In this network, there is only one input and one output.
How does an RNN work?

• The Recurrent Neural Network consists of multiple fixed activation function units, one for each time step. Each unit has an internal state called the hidden state of the unit. This hidden state signifies the past knowledge that the network currently holds at a given time step. It is updated at every time step to signify the change in the network's knowledge about the past. The hidden state is updated using the recurrence relation h_t = f(W_hh · h_(t-1) + W_xh · x_t + b), where f is commonly tanh, as in the sketch below.
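A minimal NumPy sketch of that recurrence, assuming the standard tanh formulation (the slide's own figure is not reproduced here). Note that the same weights are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                     # initial hidden (memory) state
for x_t in rng.normal(size=(5, input_size)):  # a sequence of 5 inputs
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
print(h.shape)                                # (16,)
```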
Issues or difficulties of Standard RNNs
1.Vanishing Gradient: Text generation, machine translation, and stock market prediction are just a few examples of the time-dependent and sequential data problems that can be modelled with recurrent neural networks. You will discover, though, that the vanishing gradient problem makes training RNNs difficult: as gradients are propagated back through many time steps they shrink toward zero, so early time steps receive almost no learning signal.
2.Exploding Gradient: An exploding gradient occurs when a neural network is being trained and the slope tends to grow exponentially rather than decay. Large error gradients that build up during training lead to very large updates to the neural network model weights, which is the source of this issue.
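Exploding gradients are commonly handled with gradient clipping, a standard remedy not described on the slide. A minimal PyTorch sketch, with a dummy loss just to produce gradients:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
opt = torch.optim.SGD(rnn.parameters(), lr=0.01)

x = torch.randn(4, 20, 8)          # a batch of 4 sequences of length 20
out, _ = rnn(x)
loss = out.pow(2).mean()           # placeholder loss for the sketch
loss.backward()
# Rescale the gradient norm so each weight update stays bounded.
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
opt.step()
```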
• Advantages and Disadvantages of Recurrent Neural Networks
• Advantages
1.An RNN remembers information through time. It is useful in time-series prediction precisely because of this ability to remember previous inputs; the Long Short-Term Memory (LSTM) architecture extends this capability.
2.Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighborhood.
• Disadvantages
1.Gradient vanishing and exploding problems.
2.Training an RNN is a very difficult task.
3.It cannot process very long sequences when using tanh or ReLU as the activation function.
• Applications of Recurrent Neural Network
1.Language Modelling and Generating Text
2.Speech Recognition
3.Machine Translation
4.Image Recognition, Face detection
5.Time series Forecasting
Variations of the Recurrent Neural Network (RNN)

• To overcome problems like vanishing and exploding gradients, several new advanced versions of RNNs were formed; some of these are:
1.Bidirectional Neural Network (BiNN)
2.Long Short-Term Memory (LSTM)
Bidirectional Neural Network (BiNN)
• A BiNN is a variation of a Recurrent Neural Network in which the input information flows in both directions and the outputs of both directions are combined to produce the final output. A BiNN is useful in situations where the context of the input is important, such as NLP tasks and time-series analysis problems.
Long Short-Term Memory (LSTM)
• Long Short-Term Memory works on a read-write-and-forget principle: given the input information, the network reads and writes the most useful information from the data and forgets the information that is not important for predicting the output. To do this, three new gates are introduced into the RNN. In this way, only the selected information is passed through the network.
Long Short Term Memory Networks Explanation

Prerequisites: Recurrent Neural Networks

To solve the problem of vanishing and exploding gradients in a deep Recurrent Neural Network, many variations were developed. One of the most famous of them is the Long Short Term Memory Network (LSTM).
In concept, an LSTM recurrent unit tries to “remember” all the past knowledge that the network has seen so far and to “forget” irrelevant data.
This is done by introducing different activation function layers called “gates” for different purposes.
Each LSTM recurrent unit also maintains a vector called the Internal Cell State, which conceptually describes the information that was chosen to be retained by the previous LSTM recurrent unit.
Purposes of the four different gates
1. Forget Gate (f): At the forget gate the input is combined with the previous output to generate a fraction between 0 and 1 that determines how much of the previous state needs to be preserved (or, in other words, how much of the state should be forgotten). This output is then multiplied with the previous state. Note: an activation output of 1.0 means “remember everything” and an activation output of 0.0 means “forget everything.” From a different perspective, a better name for the forget gate might be the “remember gate”.
2. Input Gate (i): The input gate operates on the same signals as the forget gate, but here the objective is to decide which new information is going to enter the state of the LSTM. The output of the input gate (again a fraction between 0 and 1) is multiplied with the output of the tanh block that produces the new values that must be added to the previous state. This gated vector is then added to the previous state to generate the current state.
3. Input Modulation Gate (g): This is often considered a sub-part of the input gate, and much of the literature on LSTMs does not even mention it, assuming it is inside the input gate. It is used to modulate the information that the input gate will write onto the Internal Cell State by adding nonlinearity to the information and making the information zero-mean. This is done to reduce the learning time, as zero-mean input has faster convergence. Although this gate's actions are less important than the others and are often treated as a finesse-providing concept, it is good practice to include this gate in the structure of the LSTM unit.
4. Output Gate (o): At the output gate, the input and previous state are gated as before to generate another scaling fraction, which is combined with the output of a tanh block applied to the current state. This output is then given out. The output and state are fed back into the LSTM block.
Working of an LSTM recurrent
unit:
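The original slide shows this as a figure; below is a minimal NumPy sketch of one LSTM step with the four gates described above, assuming the standard formulation. W* and U* are learnable weights; x is the input, h the previous output, and c the previous internal cell state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])   # input gate
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])   # input modulation gate
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])   # output gate
    c_new = f * c + i * g                           # forget old, add new state
    h_new = o * np.tanh(c_new)                      # gated output
    return h_new, c_new

# Illustrative sizes and randomly initialized parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "figo"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)   # (8,) (8,)
```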
Gated Recurrent Unit Networks

• The Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) that was introduced by Cho et al. in 2014 as a simpler alternative to
Long Short-Term Memory (LSTM) networks. Like LSTM, GRU can process
sequential data such as text, speech, and time-series data.
• The basic idea behind GRU is to use gating mechanisms to selectively
update the hidden state of the network at each time step. The gating
mechanisms are used to control the flow of information in and out of
the network. The GRU has two gating mechanisms, called the reset
gate and the update gate.
• The reset gate determines how much of the previous hidden state should be forgotten, while the update gate determines how much of the new input should be used to update the hidden state. The output of the GRU is calculated based on the updated hidden state, as in the sketch below.
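A minimal NumPy sketch of one GRU step with the reset gate (r) and update gate (z) described above, assuming the standard Cho et al. formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W, U, b):
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])             # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])             # reset gate
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])  # candidate state
    return (1.0 - z) * h + z * h_cand                         # blended new state

# Illustrative sizes and randomly initialized parameters.
rng = np.random.default_rng(1)
n_in, n_hid = 4, 8
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "zrh"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W, U, b)
print(h.shape)   # (8,)
```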
How do Gated Recurrent Units solve the problem of vanishing gradients?
• In Deep Learning, many complex problems can be solved by constructing better neural network architectures. The RNN (Recurrent Neural Network) and its variants are very useful in sequence-to-sequence learning, and the RNN variant LSTM (Long Short-Term Memory) is the most used cell in seq-to-seq learning tasks. In the GRU, the update gate blends the previous hidden state into the new one additively, so gradients can flow back through many time steps without repeatedly shrinking, which mitigates the vanishing gradient problem.
Encoder-Decoder Model

There are three main blocks in the encoder-decoder model:
•Encoder
•Hidden Vector
•Decoder
The Encoder converts the input sequence into a single fixed-length vector (the hidden vector). The Decoder converts the hidden vector into the output sequence. Encoder-Decoder models are jointly trained to maximize the conditional probability of the target sequence given the input sequence.
Encoder

• Multiple RNN cells can be stacked together to form the encoder. The RNN reads each input sequentially.
• For every timestep (each input) t, the hidden state (hidden vector) h is updated according to the input at that timestep, X[t].
• After all the inputs are read by the encoder model, the final hidden state represents the context/summary of the whole input sequence.
• Example: Consider the input sequence “I am a Student” to be encoded. There will be a total of 4 timesteps (4 tokens) for the encoder model. At each time step, the hidden state h is updated using the previous hidden state and the current input, as in the sketch below.
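A minimal PyTorch sketch of such an encoder loop (a framework assumption; the token ids for “I am a Student” are hypothetical). The final hidden state is the context vector handed to the decoder:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=100, embedding_dim=16)
cell = nn.RNNCell(input_size=16, hidden_size=32)

tokens = torch.tensor([11, 42, 7, 63])   # hypothetical ids, 4 timesteps
h = torch.zeros(1, 32)                   # initial hidden state
for t in tokens:
    h = cell(embed(t).unsqueeze(0), h)   # update h from prev h and input X[t]
context = h                              # summary of the whole input sequence
print(context.shape)                     # torch.Size([1, 32])
```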
Case study: spam classification
• https://medium.com/@azimkhan8018/email-spam-detection-with-machine-learning-a-comprehensive-guide-b65c6936678b
What is Sentiment Analysis?

• Sentiment analysis is the process of classifying whether a block of text is positive, negative, or neutral. The goal of sentiment mining is to analyze people's opinions in a way that can help businesses expand. It focuses not only on polarity (positive, negative & neutral) but also on emotions (happy, sad, angry, etc.). It uses various Natural Language Processing approaches, such as rule-based, automatic, and hybrid.
• Let's consider a scenario: if we want to analyze whether a product is satisfying customer requirements, or whether there is a need for this product in the market, we can use sentiment analysis to monitor that product's reviews. Sentiment analysis is also efficient to use when there is a large set of unstructured data that we want to classify by automatically tagging it. Net Promoter Score (NPS) surveys are used extensively to gain knowledge of how a customer perceives a product or service. Sentiment analysis also gained popularity due to its ability to process large volumes of NPS responses and obtain consistent results quickly.
Why is Sentiment Analysis
Important?
• Sentiment analysis extracts the contextual meaning of words to indicate the social sentiment around a brand, and it helps a business determine whether the product it is manufacturing is going to be in demand in the market or not.
• According to surveys, 80% of the world's data is unstructured. This data, whether in the form of emails, texts, documents, articles, and many more, needs to be analyzed and put into a structured form.
1. Sentiment analysis is required as it stores data in an efficient, cost-friendly manner.
2. Sentiment analysis solves real-time issues and can help you handle real-time scenarios.
• Here are some key reasons why sentiment analysis is important for business:
• Customer Feedback Analysis: Businesses can analyze customer reviews, comments, and feedback to understand the sentiment
behind them helping in identifying areas for improvement and addressing customer concerns, ultimately enhancing customer
satisfaction.
• Brand Reputation Management: Sentiment analysis allows businesses to monitor their brand reputation in real-time.
By tracking mentions and sentiments on social media, review platforms, and other online channels, companies can respond
promptly to both positive and negative sentiments, mitigating potential damage to their brand.
• Product Development and Innovation: Understanding customer sentiment helps identify features and aspects of their products
or services that are well-received or need improvement. This information is invaluable for product development and innovation,
enabling companies to align their offerings with customer preferences.
• Competitor Analysis: Sentiment analysis can be used to compare the sentiment around a company's products or services with those of competitors. This helps businesses identify their strengths and weaknesses relative to competitors, allowing for strategic decision-making.
• Marketing Campaign Effectiveness: Businesses can evaluate the success of their marketing campaigns by analyzing the sentiment of online discussions and social media mentions. Positive sentiment indicates that the campaign is resonating with the target audience, while negative sentiment may signal the need for adjustments.
What are the Types of Sentiment Analysis?

• Fine-Grained Sentiment Analysis
• This depends on the polarity basis. The categories can be designed as very positive, positive, neutral, negative, or very negative, with the rating done on a scale of 1 to 5: a rating of 5 is very positive, 2 is negative, and 3 is neutral.
• Emotion detection
• The sentiments happy, sad, angry, upset, jolly, pleasant, and so on come under emotion detection. It is also known as a lexicon method of
sentiment analysis.
• Aspect-Based Sentiment Analysis
• This focuses on a particular aspect; for instance, if a person wants to check a feature of a cell phone, such as the battery, screen, or camera quality, then aspect-based analysis is used.
• Multilingual Sentiment Analysis
• Multilingual consists of different languages where the classification needs to be done as positive, negative, and neutral. This is highly
challenging and comparatively difficult.
• How does Sentiment Analysis work?
• Sentiment Analysis in NLP is used to determine the sentiment expressed in a piece of text, such as a review, comment, or social media post.
• The goal is to identify whether the expressed sentiment is positive, negative, or neutral. Let's understand the overall process, starting with the first general step:
• Preprocessing
• Start by collecting the text data that needs to be analysed for sentiment, such as customer reviews, social media posts, news articles, or any other form of textual content. The collected text is pre-processed to clean and standardize the data with various tasks (sketched below):
• Removing irrelevant information (e.g., HTML tags, special characters).
• Tokenization: Breaking the text into individual words or tokens.
• Removing stop words (common words like “and,” “the,” etc. that don't contribute much to sentiment).
• Stemming or Lemmatization: Reducing words to their root form.
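A minimal sketch of these preprocessing steps using NLTK (a toolkit assumption; any NLP library works). The corpora need a one-time download via nltk.download("punkt"), nltk.download("stopwords"), and nltk.download("wordnet"):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def preprocess(text):
    text = re.sub(r"<[^>]+>|[^a-zA-Z\s]", " ", text)   # strip tags/special chars
    tokens = nltk.word_tokenize(text.lower())          # tokenization
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stops]     # remove stop words
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]   # reduce to root forms

print(preprocess("The battery <b>was</b> amazing and lasted for days!"))
```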
What are the Approaches to Sentiment Analysis?

• There are three main approaches used:

• Rule-based
• Here, the lexicon method, tokenization, and parsing come under the rule-based approach. The approach counts the number of positive and negative words in the given dataset. If the number of positive words is greater than the number of negative words then the sentiment is positive, and vice versa (a sketch follows this list).
• Machine Learning
• This approach works with machine learning techniques. First, the datasets are trained and predictive analysis is done. Next, words are extracted from the text. This text extraction can be done using different techniques such as Naive Bayes, Support Vector Machines, hidden Markov models, and conditional random fields.
• Neural Network
• In the last few years, neural networks have evolved at a very fast rate. This approach involves using artificial neural networks, which are inspired by the structure of the human brain, to classify text into positive, negative, or neutral sentiments. It uses Recurrent Neural Networks, Long Short-Term Memory, Gated Recurrent Units, etc. to process sequential data like text.
• Hybrid Approach
• This is the combination of two or more approaches, i.e. the rule-based and machine learning approaches. The benefit is that the accuracy is higher compared to either approach alone.
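A minimal sketch of the rule-based word-counting approach described above; the tiny positive and negative lexicons are made up for illustration:

```python
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "terrible", "sad", "hate"}

def rule_based_sentiment(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)   # count lexicon hits
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(rule_based_sentiment("great phone but terrible battery and bad camera"))
# negative (1 positive word vs 2 negative words)
```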
Self Study for students
• Case study- Sentiment Analysis using Neural Network
Thank You