
Unit III

Advanced Deep Learning


Dr. E. Poongothai
Assistant Professor
Department of Computational Intelligence
SRM Institute of Science and Technology
• Convolutional Networks,
• Convolutional operation,
• Pooling,
• Normalization,
• Applications in Computer Vision,
• Sequence Modelling,
• Recurrent Neural Networks,
• Difficulty in Training RNN,
• LSTM,
• GRU,
• Encoder-Decoder architectures,
• Application,
• Spam classification, sentiment analysis
Convolutional Neural Network (CNN)
is a type of Deep Learning neural network architecture commonly
used in Computer Vision.
Computer vision is a field of Artificial Intelligence that enables a
computer to understand and interpret images and other visual data.
• When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural networks are used on various kinds of data, such as images, audio, and text.
• Different types of neural networks are used for different purposes: for example, for predicting a sequence of words we use Recurrent Neural Networks, more precisely an LSTM;
• similarly, for image classification we use Convolutional Neural Networks.
Layers of Neural Network

1.Input Layers:
It’s the layer in which we give input to our model. The number of neurons in this
layer is equal to the total number of features in our data (number of pixels in the
case of an image).
2.Hidden Layer: The input from the Input layer is then fed into the hidden layer. There can be many hidden layers depending on our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the previous layer's output with the learnable weights of that layer, addition of learnable biases, and then an activation function, which makes the network nonlinear (a minimal sketch of this computation follows this list).
3.Output Layer: The output from the hidden layer is then fed into a logistic
function like sigmoid or softmax which converts the output of each class into the
probability score of each class.
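A minimal NumPy sketch of the forward pass described in this list, with illustrative sizes (4 input features, 8 hidden neurons, 3 output classes):

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                    # input layer: one neuron per feature

W1 = rng.standard_normal((8, 4))     # learnable weights of the hidden layer
b1 = np.zeros(8)                     # learnable biases
h = np.maximum(0, W1 @ x + b1)       # matrix multiplication + bias, then ReLU nonlinearity

W2 = rng.standard_normal((3, 8))     # weights of the output layer
b2 = np.zeros(3)
logits = W2 @ h + b2

probs = np.exp(logits - logits.max())   # softmax converts outputs into
probs /= probs.sum()                    # probability scores per class
print(probs, probs.sum())               # probabilities that sum to 1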
• The data is fed into the model and an output is obtained from each layer; this pass is called the feedforward pass. We then calculate the error using an error function; some common error functions are cross-entropy, squared loss error, etc.
• The error function measures how well the network is performing. After that, we backpropagate through the model by calculating the derivatives of the error with respect to the weights. This step is called Backpropagation, and it is used to minimize the loss.
• Cross-entropy measures the difference between the probability distribution predicted by a classification model and the true distribution of the labels.
• The cross-entropy loss function is used to find the optimal solution by adjusting the weights of a machine learning model during training. The objective is to minimize the error between the actual and predicted outcomes. A lower cross-entropy value indicates better performance.
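A short worked example, assuming a 3-class problem with a one-hot true label (all values are illustrative):

import numpy as np

y_true = np.array([0.0, 1.0, 0.0])   # one-hot label: the true class is class 1
y_pred = np.array([0.1, 0.8, 0.1])   # predicted probability distribution

# Cross-entropy: -sum of true_prob * log(predicted_prob) over the classes.
print(-np.sum(y_true * np.log(y_pred)))   # ~0.22: confident, correct -> low loss

worse = np.array([0.4, 0.3, 0.3])         # a less confident prediction...
print(-np.sum(y_true * np.log(worse)))    # ~1.20: ...gives a higher loss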
Convolution Neural Network
• Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN), predominantly used to extract features from grid-like matrix data, for example visual datasets like images or videos where data patterns play an extensive role.
CNN architecture
• Convolutional Neural Network consists of multiple layers like
the input layer, Convolutional layer, Pooling layer, and fully
connected layers.
How Convolutional Layers works

• Convolutional Neural Networks, or ConvNets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a height (i.e. the channels, as images generally have red, green, and blue channels).
• Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, with say, K outputs, and representing them vertically.
• Now slide that neural network across the whole image; as a result, we will get another image with a different width, height, and depth. Instead of just R, G, and B channels we now have more channels, but smaller width and height. This operation is called Convolution. If the patch size is the same as that of the image, it will be a regular neural network. Because of this small patch, we have fewer weights.
• Now let’s talk about a bit of mathematics that is involved in the whole
convolution process.
• Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the same depth as that of the input volume (3 if the input is an RGB image).
• For example, if we have to run convolution on an image with dimensions 34x34x3, the possible size of filters can be a x a x 3, where 'a' can be anything like 3, 5, or 7, but smaller than the image dimension.
• During the forward pass, we slide each filter across the whole input volume step
by step where each step is called stride (which can have a value of 2, 3, or even
4 for high-dimensional images) and compute the dot product between the
kernel weights and patch from input volume.
• As we slide our filters we’ll get a 2-D output for each filter, and we’ll stack them together; as a result, we’ll get an output volume having a depth equal to the number of filters. The network will learn all the filters.
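A minimal PyTorch sketch of this sliding dot product, reusing the 34x34x3 input from above (the choice of 6 filters of size 5x5x3 with stride 1 is an assumption made for the example):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 34, 34)    # one 34x34 RGB image (channels-first layout)

# 6 learnable 5x5 filters, each with depth 3 to match the input volume.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, stride=1)
out = conv(x)

# Spatial output size follows (W - F)/S + 1 = (34 - 5)/1 + 1 = 30,
# and the output depth equals the number of filters.
print(out.shape)                 # torch.Size([1, 6, 30, 30])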
Layers used to build ConvNets
• A complete Convolutional Neural Network architecture is also known as a ConvNet. A ConvNet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let’s take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
• Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input will be an image or a sequence of
images. This layer holds the raw input of the image with width 32, height 32, and depth 3.
• Convolutional Layers: This is the layer used to extract features from the input dataset. It applies a set of learnable filters known as kernels to the input images. The filters/kernels are smaller matrices, usually of 2×2, 3×3, or 5×5 shape. Each filter slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we’ll get an output volume of dimension 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. It applies an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling layer: This layer is periodically inserted in the ConvNet, and its main function is to reduce the size of the volume, which makes the computation fast, reduces memory, and also helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.

• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: It takes the input from the previous
layer and computes the final classification or regression task.
• Output Layer: The output from the fully connected layers is then fed into a logistic function such as sigmoid or softmax, which converts the output of each class into the probability score of each class.
• Advantages of Convolutional Neural Networks (CNNs):
1.Good at detecting patterns and features in images, videos, and audio signals.
2.Robust to translation and, to a lesser degree, rotation and scaling.
3.End-to-end training, no need for manual feature extraction.
4.Can handle large amounts of data and achieve high accuracy.
• Disadvantages of Convolutional Neural Networks (CNNs):
1.Computationally expensive to train and require a lot of memory.
2.Can be prone to overfitting if not enough data or proper regularization is used.
3.Requires large amounts of labeled data.
4.Interpretability is limited; it is hard to understand what the network has learned.
Normalization
Need for Batch Normalization in CNN
• Batch normalization in CNNs addresses several challenges encountered during training. The following reasons highlight the need for batch normalization in CNNs:
1.Addressing Internal Covariate Shift: Internal covariate shift occurs when the
distribution of network activations changes as parameters are updated during training.
Batch normalization addresses this by normalizing the activations in each layer,
maintaining consistent mean and variance across inputs throughout training. This
stabilizes training and speeds up convergence.
2.Improving Gradient Flow: Batch normalization contributes to stabilizing the gradient
flow during backpropagation by reducing the reliance of gradients on parameter scales.
As a result, training becomes faster and more stable, enabling effective training of
deeper networks without facing issues like vanishing or exploding gradients.
3.Regularization Effect: During training, batch normalization introduces noise to the
network activations, serving as a regularization technique. This noise aids in averting
overfitting by injecting randomness and decreasing the network’s sensitivity to minor
fluctuations in the input data.
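A minimal PyTorch sketch of batch normalization applied after a convolutional layer (batch and layer sizes are illustrative):

import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)             # a batch of 8 RGB images

conv = nn.Conv2d(3, 12, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(12)                   # one mean/variance estimate per channel

out = bn(conv(x))

# Activations are normalized per channel to roughly zero mean and unit
# variance, then rescaled by learnable gamma/beta parameters.
print(round(out.mean().item(), 3), round(out.var().item(), 3))   # ~0.0 and ~1.0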
Max pooling
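A tiny PyTorch sketch of max pooling with a 2x2 window and stride 2 (the 4x4 input values are invented):

import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 4.],
                  [5., 6., 1., 2.],
                  [7., 2., 9., 1.],
                  [3., 4., 6., 8.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).reshape(2, 2))
# tensor([[6., 4.],
#         [7., 9.]])  -- each 2x2 window is reduced to its maximum value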
Applications
• Decoding Facial Recognition
• Facial recognition is broken down by a convolutional neural network
into the following major components -
• Identifying every face in the picture
• Focusing on each face despite external factors, such as light, angle,
pose, etc.
• Identifying unique features
• Comparing all the collected data with already existing data in the
database to match a face with a name.
• A similar process is followed for scene labeling as well.
Analyzing Documents
• Convolutional neural networks can also be used for document
analysis. This is not just useful for handwriting analysis, but also has a
major stake in recognizers. For a machine to be able to scan an
individual's writing, and then compare that to the wide database it
has, it must execute almost a million commands a minute. It is said that with the use of CNNs and newer models and algorithms, the error rate has been brought down to a minimum of 0.4% at the character level, though its complete testing is yet to be widely seen.
Collecting Historic and Environmental
Elements
• CNNs are also used for more complex purposes such as natural
history collections. These collections act as key players in
documenting major parts of history such as biodiversity, evolution,
habitat loss, biological invasion, and climate change.
Understanding Climate
• CNNs can be used to play a major role in the fight against climate
change, especially in understanding the reasons why we see such
drastic changes and how we could experiment in curbing the effect. It
is said that the data in such natural history collections can also
provide greater social and scientific insights, but this would require
skilled human resources such as researchers who can physically visit
these types of repositories. There is a need for more manpower to
carry out deeper experiments in this field.
Understanding Gray Areas
• Introducing gray areas into CNNs is poised to provide a much more realistic picture of the real world. Currently, CNNs largely function exactly like a machine, seeing a true or false value for every question. However, as humans, we understand that the real world plays out in a thousand shades of gray. Allowing the machine to understand and process fuzzier logic will help it understand the gray area we humans live in. This will help CNNs get a more holistic view of what humans see.
Example for CNN
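A minimal PyTorch sketch of the 32x32x3 pipeline walked through in the layer descriptions above (12 filters, ReLU, 2x2 max pooling, flattening, and a fully connected layer; the 10-class output is an assumption made for illustration):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # 32x32x3  -> 32x32x12
    nn.ReLU(),                                   # volume unchanged
    nn.MaxPool2d(2, 2),                          # 32x32x12 -> 16x16x12
    nn.Flatten(),                                # 16*16*12 = 3072 features
    nn.Linear(16 * 16 * 12, 10),                 # fully connected layer
    nn.Softmax(dim=1),                           # probability score per class
)

x = torch.randn(1, 3, 32, 32)                    # one 32x32 RGB image
print(model(x).shape)                            # torch.Size([1, 10])

(In practice the softmax is usually folded into the loss function during training; it is kept here to mirror the layer list above.)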
Recurrent Neural Network(RNN)
• is a type of Neural Network where the output from the previous step is fed as input to the current
step.
• In traditional neural networks, all the inputs and outputs are independent of each other.
• Still, in cases when it is required to predict the next word of a sentence, the previous words are
required and hence there is a need to remember the previous words.
• Thus RNN came into existence, which solved this issue with the help of a Hidden Layer. The main
and most important feature of RNN is its Hidden state, which remembers some information about
a sequence. The state is also referred to as Memory State since it remembers the previous input to
the network.
• It uses the same parameters for each input as it performs the same task on all the inputs or hidden
layers to produce the output. This reduces the complexity of parameters, unlike other neural
networks.
• Types Of RNN
• There are four types of RNNs based on the number of inputs
and outputs in the network.
1.One to One
2.One to Many
3.Many to One
4.Many to Many
• One to One
• This type of RNN behaves the same as a simple neural network; it is also known as a Vanilla Neural Network. In this network, there is only one input and one output.
How does RNN work?

• The Recurrent Neural Network consists of multiple fixed activation function units, one for each time step. Each unit has an internal state, called the hidden state of the unit. This hidden state signifies the past knowledge that the network currently holds at a given time step. The hidden state is updated at every time step to signify the change in the network's knowledge about the past. It is updated using the following recurrence relation:
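In its standard form (a generic formulation, not specific to these slides) the relation is:

h_t = f(h_{t-1}, x_t), most commonly h_t = tanh(W_hh · h_{t-1} + W_xh · x_t + b_h)

where h_t is the new hidden state, h_{t-1} the previous hidden state, x_t the input at the current time step, and W_hh, W_xh, b_h the learnable parameters shared across all time steps. A NumPy sketch of this update over a toy sequence (all sizes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 5, 3
W_hh = rng.standard_normal((hidden, hidden))   # hidden-to-hidden weights
W_xh = rng.standard_normal((hidden, inputs))   # input-to-hidden weights
b_h = np.zeros(hidden)

h = np.zeros(hidden)                           # initial hidden state
for x_t in rng.random((4, inputs)):            # a toy sequence of 4 inputs
    h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)   # same parameters reused at every step
print(h)                                       # final hidden state summarising the sequence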
Issues or difficulties of Standard RNNs

1.Vanishing Gradient: Text generation, machine translation, and stock market prediction are just a few examples of the time-dependent and sequential data problems that can be modelled with recurrent neural networks. You will discover, though, that the vanishing gradient problem makes training RNNs difficult: as the error gradient is propagated back through many time steps it shrinks exponentially, so the network struggles to learn long-range dependencies.
2.Exploding Gradient: An Exploding Gradient occurs when a
neural network is being trained and the slope tends to grow
exponentially rather than decay. Large error gradients that
build up during training lead to very large updates to the
neural network model weights, which is the source of this
issue.
• Advantages and Disadvantages of Recurrent Neural Network
• Advantages
1.An RNN remembers each and every piece of information through time. This ability to remember previous inputs is what makes it useful in time-series prediction; the variant built specifically to retain information over long spans is called Long Short Term Memory (LSTM).
2.Recurrent neural networks are even used with convolutional layers to
extend the effective pixel neighborhood.
• Disadvantages
1.Gradient vanishing and exploding problems.
2.Training an RNN is a very difficult task.
3.It cannot process very long sequences if using tanh or relu as an
activation function.
• Applications of Recurrent Neural Network
1.Language Modelling and Generating Text
2.Speech Recognition
3.Machine Translation
4.Image Recognition, Face detection
5.Time series Forecasting
Variation Of Recurrent Neural Network
(RNN)
• To overcome problems like vanishing and exploding gradients, several new advanced versions of RNNs have been formed; some of these are:
1.Bidirectional Neural Network (BiNN)
2.Long Short-Term Memory (LSTM)
Bidirectional Neural Network (BiNN)
• A BiNN is a variation of a Recurrent Neural Network in which the input information flows in both directions, and the outputs of both directions are combined to produce the final output. BiNN is useful in situations where the context of the input is more important, such as NLP tasks and time-series analysis problems.
Long Short-Term Memory (LSTM)
• Long Short-Term Memory works on the read-write-and-forget principle
where given the input information network reads and writes the most
useful information from the data and it forgets about the information
which is not important in predicting the output. For doing this three new
gates are introduced in the RNN. In this way, only the selected
information is passed through the network.
Long Short Term Memory Networks Explanation

Prerequisites: Recurrent Neural Networks

Long Short Term Memory Network (LSTM)
To solve the problem of Vanishing and Exploding Gradients in a Deep
Recurrent Neural Network, many variations were developed.
One of the most famous of them is the Long Short Term Memory
Network(LSTM).
In concept, an LSTM recurrent unit tries to “remember” all the past knowledge that the network has seen so far and to “forget” irrelevant data.
This is done by introducing different activation function layers called
“gates” for different purposes.
Each LSTM recurrent unit also maintains a vector called the Internal
Cell State which conceptually describes the information that was
chosen to be retained by the previous LSTM recurrent unit.
purposes for four different gates
1. Forget Gate (f): At the forget gate, the input is combined with the previous output to generate a fraction between 0 and 1 that determines how much of the previous state needs to be preserved (or, in other words, how much of the state should be forgotten). This output is then multiplied with the previous state. Note: an activation output of 1.0 means “remember everything” and an activation output of 0.0 means “forget everything.” From a different perspective, a better name for the forget gate might be the “remember gate”.
2. Input Gate (i): The input gate operates on the same signals as the forget gate, but here the objective is to decide which new information is going to enter the state of the LSTM. The output of the input gate (again a fraction between 0 and 1) is multiplied with the output of the tanh block that produces the new values to be added to the previous state. This gated vector is then added to the previous state to generate the current state.
3. Input Modulation Gate (g): It is often considered a sub-part of the input gate; much literature on LSTMs does not even mention it and assumes it is inside the input gate. It is used to modulate the information that the input gate will write onto the internal cell state by adding nonlinearity to the information and making the information zero-mean. This is done to reduce the learning time, as zero-mean input has faster convergence. Although this gate's actions are less important than the others and it is often treated as a finesse-providing concept, it is good practice to include this gate in the structure of the LSTM unit.
4. Output Gate (o): At the output gate, the input and previous state are gated as before to generate another scaling fraction that is combined with the output of the tanh block applied to the current state. This output is then given out. The output and state are fed back into the LSTM block.
Working of an LSTM recurrent unit:
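In equation form (a standard formulation; sigmoid denotes the logistic function, * element-wise multiplication, [h_{t-1}, x_t] the concatenation of the previous output and the current input, c the internal cell state):

f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)   -- forget gate
i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)   -- input gate
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)      -- input modulation gate
c_t = f_t * c_{t-1} + i_t * g_t             -- new internal cell state
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)   -- output gate
h_t = o_t * tanh(c_t)                       -- new output/hidden state

A quick PyTorch shape check (all sizes are illustrative):

import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=3, hidden_size=5)
x = torch.randn(1, 3)          # one input at one time step
h, c = cell(x)                 # hidden and cell state (initial states default to zeros)
print(h.shape, c.shape)        # torch.Size([1, 5]) torch.Size([1, 5])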
Gated Recurrent Unit Networks

• Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) that was introduced by Cho et al. in 2014 as a simpler alternative to Long Short-Term Memory (LSTM) networks. Like LSTM, GRU can process sequential data such as text, speech, and time-series data.
• The basic idea behind GRU is to use gating mechanisms to
selectively update the hidden state of the network at each time
step. The gating mechanisms are used to control the flow of
information in and out of the network. The GRU has two gating
mechanisms, called the reset gate and the update gate.
• The reset gate determines how much of the previous hidden state
should be forgotten, while the update gate determines how much of
the new input should be used to update the hidden state. The
output of the GRU is calculated based on the updated hidden state.
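In equation form (a standard formulation; sigmoid is the logistic function, * element-wise multiplication, z the update gate, r the reset gate, h'_t the candidate hidden state):

z_t = sigmoid(W_z · [h_{t-1}, x_t])         -- update gate
r_t = sigmoid(W_r · [h_{t-1}, x_t])         -- reset gate
h'_t = tanh(W · [r_t * h_{t-1}, x_t])       -- candidate hidden state
h_t = (1 - z_t) * h_{t-1} + z_t * h'_t      -- new hidden state

Because h_t is a gated blend of the old state and the candidate, the hidden state can be copied forward nearly unchanged when z_t is close to 0, which matters for the gradient question below.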
How do Gated Recurrent Units solve the problem of vanishing gradients?
• In Deep Learning, many complex problems can be solved by constructing better neural network architectures. The RNN (Recurrent Neural Network) and its variants are very useful in sequence-to-sequence learning, and the RNN variant LSTM (Long Short-Term Memory) is the most used cell in seq-to-seq learning tasks. GRUs address vanishing gradients in the same spirit: the update gate lets the hidden state be carried forward largely unchanged, creating an additive path through time along which gradients can flow without shrinking exponentially.
Encoder-Decoder Model

There are three main blocks in the encoder-decoder model:
• Encoder
• Hidden Vector
• Decoder
The Encoder will convert the input sequence into a single fixed-length vector (the hidden vector). The Decoder will convert the hidden vector into the output sequence.
Encoder-Decoder models are jointly trained to maximize the conditional
probabilities of the target sequence given the input sequence.
Encoder

• Multiple RNN cells can be stacked together to form the encoder. The RNN reads each input sequentially.
• For every timestep (each input) t, the hidden state (hidden vector) h is updated according to the input at that timestep, X[t].
• After all the inputs are read by the encoder model, the final hidden state of the model represents the context/summary of the whole input sequence.
• Example: Consider the input sequence “I am a Student” to be encoded. There will be a total of 4 timesteps (4 tokens) for the Encoder model. At each time step, the hidden state h will be updated using the previous hidden state and the current input.
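A minimal PyTorch sketch of such an encoder, assuming a toy vocabulary in which "I am a Student" maps to the token ids 1-4 (the vocabulary and all sizes are illustrative):

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10, 8, 16
embedding = nn.Embedding(vocab_size, embed_dim)
encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

tokens = torch.tensor([[1, 2, 3, 4]])    # "I am a Student" -> 4 timesteps
outputs, h_final = encoder(embedding(tokens))

# h_final is the context/summary of the whole input sequence; a decoder
# would consume it as its initial hidden state.
print(outputs.shape, h_final.shape)      # torch.Size([1, 4, 16]) torch.Size([1, 1, 16])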
Case study- spam classification
• https://medium.com/@azimkhan8018/email-spam-detection-with-machine-learning-a-comprehensive-guide-b65c6936678b
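A condensed sketch of the usual pipeline in such guides, using bag-of-words features with a Naive Bayes classifier (the toy messages and labels below are invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented dataset; a real case study loads a labeled corpus.
texts = ["win a free prize now", "claim your free reward",
         "meeting at 10am tomorrow", "please review the attached report"]
labels = [1, 1, 0, 0]                 # 1 = spam, 0 = ham

vectorizer = CountVectorizer()        # bag-of-words features
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
test = vectorizer.transform(["free prize waiting for you"])
print(model.predict(test))            # [1] -> classified as spam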
What is Sentiment Analysis?

• Sentiment analysis is the process of classifying whether a block of text is positive, negative, or neutral. The goal of sentiment mining is to analyse people's opinions in a way that can help businesses expand. It focuses not only on polarity (positive, negative & neutral) but also on emotions (happy, sad, angry, etc.). It uses various Natural Language Processing approaches such as rule-based, automatic, and hybrid.
• Let's consider a scenario: if we want to analyze whether a product is satisfying customer requirements, or whether there is a need for this product in the market, we can use sentiment analysis to monitor that product's reviews. Sentiment analysis is also efficient to use when there is a large set of unstructured data that we want to classify by automatically tagging it. Net Promoter Score (NPS) surveys are used extensively to gain knowledge of how a customer perceives a product or service. Sentiment analysis has also gained popularity due to its ability to process large volumes of NPS responses and obtain consistent results quickly.
Why is Sentiment Analysis Important?
• Sentiment analysis reveals the contextual meaning of words, indicating the social sentiment around a brand, and also helps a business determine whether the product it is manufacturing is going to be in demand in the market or not.
• According to surveys, 80% of the world's data is unstructured. This data needs to be analyzed and put into a structured form, whether it comes as emails, texts, documents, articles, and many more.
1. Sentiment analysis is required as it stores data in an efficient, cost-friendly manner.
2. Sentiment analysis solves real-time issues and can help you handle real-time scenarios.
• Here are some key reasons why sentiment analysis is important for business:
• Customer Feedback Analysis: Businesses can analyze customer reviews, comments, and feedback to understand the sentiment behind them, helping to identify areas for improvement and address customer concerns, ultimately enhancing customer satisfaction.
• Brand Reputation Management: Sentiment analysis allows businesses to monitor their brand reputation in real-time.
By tracking mentions and sentiments on social media, review platforms, and other online channels, companies can respond promptly to both positive and
negative sentiments, mitigating potential damage to their brand.
• Product Development and Innovation: Understanding customer sentiment helps identify features and aspects of their products or services that are well-
received or need improvement. This information is invaluable for product development and innovation, enabling companies to align their offerings with customer
preferences.
• Competitor Analysis: Sentiment analysis can be used to compare the sentiment around a company’s products or services with those of competitors. This helps businesses identify their strengths and weaknesses relative to competitors, allowing for strategic decision-making.
• Marketing Campaign Effectiveness: Businesses can evaluate the success of their marketing campaigns by analyzing the sentiment of online discussions and social media mentions. Positive sentiment indicates that the campaign is resonating with the target audience, while negative sentiment may signal the need for adjustments.
What are the Types of Sentiment Analysis?

• Fine-Grained Sentiment Analysis
• This depends on the polarity of the text. This category can be graded as very positive, positive, neutral, negative, or very negative. The rating is done on a scale of 1 to 5: if the rating is 5 then it is very positive, 2 then negative, and 3 then neutral.
• Emotion detection
• The sentiments happy, sad, angry, upset, jolly, pleasant, and so on come under emotion detection. It is also known as a lexicon method of sentiment analysis.
• Aspect-Based Sentiment Analysis
• It focuses on a particular aspect for instance if a person wants to check the feature of the cell phone then it checks the aspect such as the battery, screen, and camera quality then aspect based is
used.
• Multilingual Sentiment Analysis
• Multilingual consists of different languages where the classification needs to be done as positive, negative, and neutral. This is highly challenging and comparatively difficult.
• How does Sentiment Analysis work?
• Sentiment Analysis in NLP, is used to determine the sentiment expressed in a piece of text, such as a review, comment, or social media post.
• The goal is to identify whether the expressed sentiment is positive, negative, or neutral. Let’s understand the overall process in two general steps:
• Preprocessing
• We start by collecting the text data that needs to be analysed for sentiment, like customer reviews, social media posts, news articles, or any other form of textual content. The collected text is pre-processed to clean and standardize the data with various tasks (a small sketch of these steps follows this list):
• Removing irrelevant information (e.g., HTML tags, special characters).
• Tokenization: Breaking the text into individual words or tokens.
• Removing stop words (common words like “and,” “the,” etc. that don’t contribute much to sentiment).
• Stemming or Lemmatization: Reducing words to their root form.
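A pure-Python sketch of these four steps on an invented sample sentence (the stop-word list and suffix rules are deliberately tiny; real code would use a library such as NLTK):

import re

text = "The battery life is great, but the camera is <b>terrible</b>!"

# 1. Remove irrelevant information (HTML tags, special characters).
text = re.sub(r"<[^>]+>", " ", text)
text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()

# 2. Tokenization: break the text into individual words.
tokens = text.split()

# 3. Remove stop words (a tiny illustrative stop-word list).
stop_words = {"the", "is", "but", "a", "and"}
tokens = [t for t in tokens if t not in stop_words]

# 4. Crude stemming: strip common suffixes (a real stemmer or
#    lemmatizer would be used instead).
tokens = [re.sub(r"(ing|ly|s)$", "", t) for t in tokens]
print(tokens)   # ['battery', 'life', 'great', 'camera', 'terrible']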
What are the Approaches to Sentiment
Analysis?
• There are three main approaches used:
• Rule-based
• Here, the lexicon method, tokenization, and parsing come under the rule-based approach. The approach counts the number of positive and negative words in the given text: if the number of positive words is greater than the number of negative words then the sentiment is positive, else vice-versa (a toy sketch of this counting approach appears after this list).
• Machine Learning
• This approach works on machine learning techniques. Firstly, the datasets are trained and predictive analysis is done. Next, words are extracted from the text. This text extraction can be done using different techniques such as Naive Bayes, Support Vector Machines, hidden Markov models, and conditional random fields.
• Neural Network
• In the last few years neural networks have evolved at a very fast rate. This approach involves using artificial neural networks, which are inspired by the structure of the human brain, to classify text into positive, negative, or neutral sentiments. It uses recurrent neural networks, long short-term memory, gated recurrent units, etc. to process sequential data like text.
• Hybrid Approach
• It is the combination of two or more approaches, i.e. the rule-based and machine learning approaches. The advantage is that the accuracy is higher compared to the other two approaches.
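A toy sketch of the rule-based word-counting approach described above (the two lexicons are tiny illustrative samples; real systems use large curated ones):

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def rule_based_sentiment(text: str) -> str:
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(rule_based_sentiment("I love this phone, the camera is great"))  # positive
print(rule_based_sentiment("terrible battery and poor screen"))        # negative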
Self Study for students
• Case study- Sentiment Analysis using Neural Network
Thank You
