Deep Learning RNN
B.Tech (B)
Ms. Punam R. Patil
Vision of the Department:
To provide prominent computer engineering education with socio-moral values.
Deep Learning (PECO7031T)
Prerequisite: Artificial Intelligence, Machine Learning.
Course Objectives:
• To understand hyperparameter tuning.
• To explore deep learning techniques with different learning strategies.
• To design deep learning models for real-time applications.
Course Outcomes
Unit IV: Recurrent Neural Networks (10 Hrs)
• Introduction to Sequence Models and RNNs, Recurrent Neural Network Model, Backpropagation Through Time
• Different Types of RNNs: Unfolded RNNs, Seq2Seq RNNs, Long Short-Term Memory (LSTM), Bidirectional RNN, Vanishing Gradients with RNNs, Gated Recurrent Unit (GRU)
• RNN applications.
Sequence Models
• Sequence models are machine learning models whose inputs or outputs are sequences of data.
• Sequential data includes text streams, audio clips, video clips, time-series data, and so on.
• Recurrent Neural Networks (RNNs) are a popular architecture for sequence models.
Applications of Sequence Models
1. Speech recognition: In speech recognition, an audio clip is given as input and the model has to generate its text transcript. Here both the input and the output are sequences of data.
https://www.codingninjas.com/studio/library/sequence-models
2. Sentiment Classification: In sentiment classification, the opinions expressed in a piece of text are categorized. Here the input is a sequence of words. Since the order of the words changes a sentence's meaning, sequence models are used so that word order is taken into account during classification.
https://www.codingninjas.com/studio/library/sequence-models
3. Video Activity Recognition: In video activity recognition, the model needs to identify the activity in a video clip. A video clip is a sequence of frames, so here too the input is a sequence of data.
https://www.codingninjas.com/studio/library/sequence-models
Recurrent Neural Networks (RNNs)
• A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step.
• In traditional neural networks, all inputs and outputs are independent of each other. But for tasks such as predicting the next word of a sentence, the previous words are required, so the network needs to remember them.
• RNNs solve this with a hidden state that carries information forward from step to step.
• The main and most important feature of an RNN is this hidden state, which remembers information about the sequence seen so far.
• The state is also referred to as the memory state, since it remembers previous inputs to the network.
• An RNN uses the same parameters at every time step, because it performs the same task on each input (and hidden state) to produce the output.
• This parameter sharing keeps the number of parameters small compared with other neural networks.
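To make the recurrence concrete, here is a minimal sketch (not from the slides) of a vanilla RNN forward pass in NumPy; the weight names Wxh, Whh, Why and biases bh, by are illustrative assumptions. It follows the standard update h_t = tanh(Wxh · x_t + Whh · h_t-1 + bh).

import numpy as np

def rnn_forward(xs, Wxh, Whh, Why, bh, by):
    # Run a vanilla RNN over a list of input column vectors xs.
    # The same weights are reused at every step (parameter sharing).
    h = np.zeros((Whh.shape[0], 1))           # initial hidden (memory) state
    hs, ys = [], []
    for x in xs:                              # one step per sequence element
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # update the hidden state
        y = Why @ h + by                      # per-step output (logits)
        hs.append(h)
        ys.append(y)
    return hs, ys

# Toy usage: five random 3-dimensional inputs, hidden size 4, two classes.
rng = np.random.default_rng(0)
xs = [rng.standard_normal((3, 1)) for _ in range(5)]
Wxh = 0.1 * rng.standard_normal((4, 3))
Whh = 0.1 * rng.standard_normal((4, 4))
Why = 0.1 * rng.standard_normal((2, 4))
bh, by = np.zeros((4, 1)), np.zeros((2, 1))
hs, ys = rnn_forward(xs, Wxh, Whh, Why, bh, by)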
Recurrent Neural Networks (RNNs)
• An RNN is a special neural network suited to sequential (or recurrent) data.
• Examples of sequential data include:
1. Sentences (sequences of words).
2. Time series (sequences of stock prices, for instance).
3. Videos (sequences of frames).
• RNNs are widely used in Natural Language Processing (NLP). Because an RNN maintains an internal memory, it is very effective for machine learning problems that involve sequential data. RNNs are also used for time-series prediction.
• The main advantage of RNNs over standard neural networks is that weights are shared across time: a standard network does not share features across positions and cannot remember previous inputs, whereas an RNN reuses the same weights at every step and carries historical information forward in its computation.
Recurrent Neural Networks (RNNs)
• In an RNN, the loss function is defined over the whole sequence: typically, the total loss is the sum of the losses at each time step.
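A minimal sketch of this idea, assuming a cross-entropy loss per step (the function pairs with the rnn_forward sketch above; the names are illustrative):

import numpy as np

def sequence_loss(targets, ys):
    # Total sequence loss = sum of per-step cross-entropy losses.
    # targets: true class index per step; ys: per-step logits.
    total = 0.0
    for t, y in zip(targets, ys):
        p = np.exp(y - y.max())       # softmax over this step's logits
        p /= p.sum()
        total += -np.log(p[t, 0])     # cross-entropy for the true class
    return total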
Backpropagation Through Time (BPTT)
• The general algorithm of backpropagation is as follows:
1. Feed the training input through the network (a forward pass) to get an output.
2. Compare the predicted output to the expected result and calculate the error.
3. Calculate the derivatives of the error with respect to the network weights.
4. Use these derivatives to adjust the weights so as to reduce the error.
5. Repeat the process until the error is minimized.
• In BPTT, the network is first unrolled over the time steps of the sequence, and the gradients from every step are accumulated into the shared weights; a sketch follows this list.
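Continuing the NumPy example from earlier (all names illustrative), here is a minimal BPTT sketch: a forward pass that caches the hidden states, then a backward walk from the last time step to the first that accumulates gradients into the shared weights.

import numpy as np

def rnn_bptt(xs, targets, Wxh, Whh, Why, bh, by):
    # Forward pass, caching hidden states and softmax probabilities.
    h = np.zeros((Whh.shape[0], 1))
    hs, ps, loss = [h], [], 0.0
    for x, t in zip(xs, targets):
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        y = Why @ h + by
        p = np.exp(y - y.max()); p /= p.sum()
        hs.append(h); ps.append(p)
        loss += -np.log(p[t, 0])
    # Backward pass over the unrolled network, last step to first.
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(hs[0])            # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dy = ps[t].copy(); dy[targets[t], 0] -= 1.0   # dloss/dlogits
        dWhy += dy @ hs[t + 1].T; dby += dy
        dh = Why.T @ dy + dh_next                     # into the hidden state
        dz = (1.0 - hs[t + 1] ** 2) * dh              # back through tanh
        dWxh += dz @ xs[t].T; dWhh += dz @ hs[t].T; dbh += dz
        dh_next = Whh.T @ dz                          # pass gradient to step t-1
    return loss, (dWxh, dWhh, dWhy, dbh, dby)

The returned gradients would then be used in step 4 above, e.g. W -= learning_rate * dW for each weight.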
Long Short-Term Memory (LSTM)
• Long Short-Term Memory (LSTM) is a kind of recurrent neural network. In an RNN, the output from the last step is fed as input to the current step.
• LSTM was designed by Hochreiter & Schmidhuber.
• It tackles the long-term dependency problem of RNNs: a plain RNN cannot recall information stored far back in the sequence, and only gives accurate predictions from recent information.
• As the gap between the relevant information and the point where it is needed grows, an RNN's performance degrades. An LSTM, by design, can retain information for a long period of time. It is used for processing, predicting, and classifying on the basis of time-series data.
LSTM: Long Short-Term Memory
LSTM stands for long short-term memory networks, used in the field of deep learning.
It is a variety of recurrent neural network (RNN) capable of learning long-term dependencies, especially in sequence prediction problems.
LSTM has feedback connections, i.e., it is capable of processing entire sequences of data, not just single data points such as images.
LSTM is a special kind of RNN that shows outstanding performance on a large variety of problems.
The central role in an LSTM model is played by a memory cell known as the 'cell state', which maintains its state over time.
LSTM
The cell state is the horizontal line that runs through the top of the standard LSTM diagram.
It can be visualized as a conveyor belt along which information simply flows, unchanged.
Information can be added to or removed from the cell state, and this is regulated by gates.
These gates optionally let information flow into and out of the cell.
Each gate consists of a sigmoid neural net layer and a point-wise multiplication operation.
The sigmoid layer outputs numbers between zero and one, where zero means 'let nothing through' and one means 'let everything through.'
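A minimal sketch of one LSTM step in NumPy, assuming the standard gate equations (forget gate f, input gate i, output gate o, candidate g); the weight names are illustrative, not from the slides. Note how the cell state c is only ever scaled and added to, exactly the 'conveyor belt' described above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # One LSTM time step. W maps the concatenation [h_prev; x]
    # to the four stacked gate pre-activations.
    H = h_prev.shape[0]
    z = W @ np.vstack([h_prev, x]) + b
    f = sigmoid(z[0:H])                  # forget gate: what to erase
    i = sigmoid(z[H:2*H])                # input gate: what to write
    o = sigmoid(z[2*H:3*H])              # output gate: what to expose
    g = np.tanh(z[3*H:4*H])              # candidate values for the cell
    c = f * c_prev + i * g               # conveyor belt: scale and add only
    h = o * np.tanh(c)                   # new hidden state
    return h, c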
[Figure: LSTM cell diagram, highlighting the cell state and the output.]
Advantages of LSTM
Disadvantages of LSTM
LSTM Applications
1. Sentiment analysis
2. Language modeling
3. Speech recognition
4. Machine translation
5. Video analysis
Applications of LSTM include:
• Long Short-Term Memory (LSTM) is a powerful type of Recurrent
Neural Network (RNN) that has been used in a wide range of
applications. Here are a few famous applications of LSTM:
• Language Modeling: LSTMs have been used for natural language
processing tasks such as language modeling, machine translation, and
text summarization. They can be trained to generate coherent and
grammatically correct sentences by learning the dependencies
between words in a sentence.
• Speech Recognition: LSTMs have been used for speech recognition
tasks such as transcribing speech to text and recognizing spoken
commands. They can be trained to recognize patterns in speech and
match them to the corresponding text.
Applications of LSTM include:
• Time Series Forecasting: LSTMs have been used for time series forecasting tasks such
as predicting stock prices, weather, and energy consumption. They can learn patterns
in time series data and use them to make predictions about future events.
• Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in
data that deviate from the norm and flag them as potential anomalies.
• Recommender Systems: LSTMs have been used for recommendation tasks such as
recommending movies, music, and books. They can learn patterns in user behavior
and use them to make personalized recommendations.
• Video Analysis: LSTMs have been used for video analysis tasks such as object
detection, activity recognition, and action classification. They can be used in
combination with other neural network architectures, such as Convolutional Neural
Networks (CNNs), to analyze video data and extract useful information.
Bidirectional RNN
• BRNNs process input sequences in both the forward and backward directions. This is the main distinction between BRNNs and conventional recurrent neural networks.
• A BRNN has two distinct recurrent hidden layers, one of which processes the input sequence forward and the other of which processes it backward. The outputs of these hidden layers are then combined and fed into a final prediction-making layer. Any recurrent neural network cell, such as Long Short-Term Memory (LSTM) or the Gated Recurrent Unit (GRU), can be used to build the recurrent hidden layers.
• In the forward direction, the BRNN functions like a conventional recurrent neural network, updating the hidden state at each time step from the current input and the previous hidden state. The backward hidden layer processes the input sequence in the opposite direction, updating its hidden state from the current input and the hidden state of the next time step.
Bidirectional RNN
• Compared to conventional unidirectional recurrent neural networks, a BRNN is more accurate because it processes information in both directions and accounts for both past and future context. Using two distinct hidden layers also offers a form of model regularisation, since the two layers complement one another and give the final prediction layer more information.
• BRNNs are typically trained with backpropagation through time, computing gradients for both the forward and backward passes in order to update the model parameters. At inference time, the BRNN processes the input sequence in a single forward pass and makes predictions from the combined outputs of the two hidden layers.
[Figure: Bidirectional RNN architecture.]
Working of Bidirectional Recurrent Neural Network
• Inputting a sequence: A sequence of data points, each represented as a vector of the same dimensionality, is fed into the BRNN. Sequences may vary in length.
• Dual processing: The data is processed in both the forward and backward directions. In the forward direction, the hidden state at time step t is computed from the input at that step and the hidden state at step t-1. In the backward direction, the hidden state at step t is computed from the input at step t and the hidden state at step t+1.
• Computing the hidden state: The hidden state at each step is computed by applying a non-linear activation function to a weighted sum of the input and the previous hidden state. This creates a memory mechanism that lets the network retain information from earlier steps.
• Determining the output: The output at each step is computed by applying a non-linear activation function to a weighted sum of the hidden state and a set of output weights. This output can either be the final output or the input to another layer of the network.
• Training: The network is trained in a supervised fashion, with the goal of minimizing the discrepancy between the predicted output and the actual output. During training, the network adjusts the weights of its input-to-hidden and hidden-to-output connections through backpropagation.
Bidirectional RNN
• To calculate the outputs of the two recurrent layers, we use the following formulas:
H_t(forward) = A(X_t · W_XH(forward) + H_t-1(forward) · W_HH(forward) + b_H(forward))
H_t(backward) = A(X_t · W_XH(backward) + H_t+1(backward) · W_HH(backward) + b_H(backward))
where
A = activation function,
W = weight matrix,
b = bias.
• The hidden state at time t is a combination of H_t(forward) and H_t(backward). The output at any hidden state is:
Y_t = H_t · W_HY + b_Y
• The training of a BRNN is similar to backpropagation through time. The BPTT algorithm works as follows:
1. Unroll the network and calculate the errors at each time step.
2. Update the weights and roll the network back up.
However, because the forward and backward passes in a BRNN occur simultaneously, updating the weights for the two processes could happen at the same time and produce inaccurate results. A BRNN is therefore trained so that the forward and backward passes are handled individually. A sketch of the bidirectional forward pass follows.
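Here is a minimal NumPy sketch of the bidirectional forward pass described above; all weight names (Wf, Uf, Wb, Ub, Wy and the biases) are illustrative assumptions. The forward layer scans left to right, the backward layer right to left, and the two hidden states at each step are concatenated before the output projection.

import numpy as np

def brnn_forward(xs, Wf, Uf, bf, Wb, Ub, bb, Wy, by):
    # Bidirectional RNN forward pass over a list of input column vectors.
    H = Uf.shape[0]
    # Forward layer: left to right, h_t depends on h_t-1.
    hf, h = [], np.zeros((H, 1))
    for x in xs:
        h = np.tanh(Wf @ x + Uf @ h + bf)
        hf.append(h)
    # Backward layer: right to left, h_t depends on h_t+1.
    hb, h = [None] * len(xs), np.zeros((H, 1))
    for t in reversed(range(len(xs))):
        h = np.tanh(Wb @ xs[t] + Ub @ h + bb)
        hb[t] = h
    # Combine both directions at each step for the prediction layer.
    return [Wy @ np.vstack([hf[t], hb[t]]) + by for t in range(len(xs))]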
Advantages of Bidirectional RNN
• Context from both past and future: With the ability to process sequential input both
forward and backward, BRNNs provide a thorough grasp of the full context of a sequence.
Because of this, BRNNs are effective at tasks like sentiment analysis and speech
recognition.
• Enhanced accuracy: BRNNs frequently yield more precise answers since they take both
historical and upcoming data into account.
• Efficient handling of variable-length sequences: When compared to conventional RNNs,
which require padding to have a constant length, BRNNs are better equipped to handle
variable-length sequences.
• Resilience to noise and irrelevant information: BRNNs can be resistant to noise and irrelevant information in the data, because the forward and backward paths each contribute useful information that supports the network's predictions.
• Ability to handle sequential dependencies: BRNNs can capture long-term dependencies between sequence elements, making them extremely adept at handling complicated sequential structure.
Disadvantages of Bidirectional RNN
• Computational complexity: Given that they analyze data both forward and
backward, BRNNs can be computationally expensive due to the increased
amount of calculations needed.
• Long training time: BRNNs can also take a while to train because there are
many parameters to optimize, especially when using huge datasets.
• Difficulty in parallelization: Due to the requirement for sequential
processing in both the forward and backward directions, BRNNs can be
challenging to parallelize.
• Overfitting: BRNNs are prone to overfitting, since their many parameters can produce overly complicated models, especially when trained on small datasets.
• Interpretability: Because they process data in both the forward and backward directions, BRNNs can be tricky to interpret: it can be difficult to understand what the model is doing and how it arrives at its predictions.
Advantages of Recurrent Neural Network
• Recurrent Neural Networks (RNNs) have several advantages over
other types of neural networks, including:
1. Ability To Handle Variable-Length Sequences
• RNNs are designed to handle input sequences of variable length,
which makes them well-suited for tasks such as speech
recognition, natural language processing, and time series analysis.
2. Memory of Past Inputs
• RNNs have a memory of past inputs, which allows them to capture
information about the context of the input sequence. This makes
them useful for tasks such as language modeling, where the
meaning of a word depends on the context in which it appears.
https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
Gated Recurrent Unit (GRU)
• The Gated Recurrent Unit (GRU) is a gated variant of the RNN that, unlike the LSTM, uses only two gates (a reset gate and an update gate) and has no separate memory cell state.
How GRU Works?
1. The reset gate r and update gate z are computed from the current input x_t and the previous hidden state h_t-1:
r_t = sigmoid(W_r · [h_t-1, x_t])
z_t = sigmoid(W_z · [h_t-1, x_t])
where W_r and W_z are weight matrices learned during training.
2. The candidate activation vector h_t~ is computed from the current input and a version of the previous hidden state that has been "reset" by the reset gate:
h_t~ = tanh(W_h · [r_t * h_t-1, x_t])
where W_h is another weight matrix.
3. The new hidden state h_t combines the candidate activation vector with the previous hidden state, weighted by the update gate:
h_t = (1 - z_t) * h_t-1 + z_t * h_t~
Understanding Gated Recurrent Unit (GRU) in Deep Learning | by Anishnama | Medium
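A minimal NumPy sketch of one GRU step, implementing exactly the three equations above. The weight names follow the slide's notation; the shapes in the toy usage are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    # One GRU time step; each W acts on the concatenation [h_t-1, x_t].
    hx = np.vstack([h_prev, x_t])                # [h_t-1, x_t]
    r_t = sigmoid(W_r @ hx)                      # reset gate
    z_t = sigmoid(W_z @ hx)                      # update gate
    rhx = np.vstack([r_t * h_prev, x_t])         # [r_t * h_t-1, x_t]
    h_cand = np.tanh(W_h @ rhx)                  # candidate activation h_t~
    return (1.0 - z_t) * h_prev + z_t * h_cand   # new hidden state h_t

# Toy usage: hidden size 4, input size 3 (so each W is 4 x 7).
rng = np.random.default_rng(0)
h = np.zeros((4, 1))
x = rng.standard_normal((3, 1))
W_r, W_z, W_h = (0.1 * rng.standard_normal((4, 7)) for _ in range(3))
h = gru_step(x, h, W_r, W_z, W_h)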
How GRU Works?
• Overall, the reset gate determines how much of the previous
hidden state to remember or forget, while the update gate
determines how much of the candidate activation vector to
incorporate into the new hidden state. The result is a compact
architecture that is able to selectively update its hidden state
based on the input and previous hidden state, without the need for
a separate memory cell state like in LSTM.
One to One RNN
• This type of neural network has a single input and a single output, as in image classification. It is the plain feed-forward setup, with no real sequence involved.
One to Many RNN
• This type of neural network has a single input and multiple outputs. An example is image captioning, where a single image is the input and a sequence of words is the output.
Many to One RNN
• This type takes a sequence of inputs and produces a single output. An example is sentiment classification, where a sequence of words is the input and a single sentiment label is the output.
Many to Many RNN
• This type takes a sequence of inputs and produces a sequence of outputs. An example is machine translation, where a sentence in one language is the input and its translation is the output. A shape-level sketch of the four patterns follows.
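As a summary of the four patterns, here is a hedged, shape-level sketch (illustrative names only; step would be any recurrent update, such as the rnn_forward, lstm_step, or gru_step sketches above).

# Shape-level view of the four RNN input/output patterns:
#   one-to-one:   1 input step,  1 output step   (plain classification)
#   one-to-many:  1 input step,  T output steps  (image captioning)
#   many-to-one:  T input steps, 1 output step   (sentiment classification)
#   many-to-many: T input steps, T output steps  (translation, tagging)

def many_to_one(xs, step, readout, h0):
    # Consume the whole input sequence, emit one output at the end.
    h = h0
    for x in xs:
        h = step(x, h)
    return readout(h)

def one_to_many(x, step, readout, h0, T_out):
    # Feed one input, then unroll the recurrence to emit T_out outputs.
    # (Feeding the same x at every step is a simplification; in practice
    # the previous output is often fed back in instead.)
    h, ys = h0, []
    for _ in range(T_out):
        h = step(x, h)
        ys.append(readout(h))
    return ys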
Advantages of Recurrent Neural Network
3. Parameter Sharing
• RNNs share the same set of parameters across all time steps,
which reduces the number of parameters that need to be learned
and can lead to better generalization.
4. Non-Linear Mapping
• RNNs use non-linear activation functions, which allows them to
learn complex, non-linear mappings between inputs and outputs.
5. Sequential Processing
• RNNs process input sequences step by step, which makes them a natural fit for data where order matters. (Note, though, that this sequential nature limits parallelism, as discussed under the disadvantages.)
https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
Advantages of Recurrent Neural Network
6. Flexibility
• RNNs can be adapted to a wide range of tasks and input types, including
text, speech, and image sequences.
7. Improved Accuracy
• RNNs have been shown to achieve state-of-the-art performance on a
variety of sequence modeling tasks, including language modeling, speech
recognition, and machine translation.
https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
Disadvantages of Recurrent Neural Network
3. Difficulty In Capturing Long-Term Dependencies
• Although RNNs are designed to capture information about past
inputs, they can struggle to capture long-term dependencies in the
input sequence. This is because the gradients can become very small
as they propagate through time, which can cause the network to
forget important information.
4. Lack Of Parallelism
• RNNs are inherently sequential, which makes it difficult to parallelize
the computation. This can limit the speed and scalability of the
network.
https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
Disadvantages of Recurrent Neural Network
5. Difficulty In Choosing The Right Architecture
• There are many different variants of RNNs, each with its own advantages
and disadvantages. Choosing the right architecture for a given task can be
challenging, and may require extensive experimentation and tuning.
6. Difficulty In Interpreting The Output
• The output of an RNN can be difficult to interpret, especially when
dealing with complex inputs such as natural language or audio. This can
make it difficult to understand how the network is making its
predictions.