Deep Learning Unit-III
Unit-III
Recurrent Neural Network
Dr. Rajesh Thumma
Associate Professor
Anurag University
Recurrent Neural Network (RNN)
• A recurrent neural network (RNN) is a type of artificial neural network that works with sequential data or time series data.
• RNNs are used for speech recognition, time series prediction, natural language processing, and image captioning; they are incorporated into popular applications such as Siri, voice search, and Google Translate.
• Like ANNs and CNNs, RNNs utilize training data to learn. They are distinguished by their “memory”: they take information from prior inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of an RNN depends on the prior elements within the sequence.
Recurrent Neural Network
• The most important feature of an RNN is its hidden state, which remembers some information about a sequence. This state is also referred to as the memory state, since it remembers the previous input to the network.
The nodes in the different layers of the neural network are compressed to form a single layer of the recurrent neural network. A, B, and C are the parameters of the network.
Recurrent Neural Network
Fig: Fully connected Recurrent Neural Network
Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the current state is computed from the input x(t) together with the state carried over from the previous time step, and the output at each time step is fed back into the network to improve subsequent outputs.
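Using the labels in the figure, and assuming A weights the input, B the recurrent (hidden-to-hidden) connection, and C the output, the computation at each time step can be sketched as:

    h(t) = g(A · x(t) + B · h(t-1))
    y(t) = C · h(t)

where g is a non-linear activation such as tanh; this is the standard recurrence rather than a formula given on the slides.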
Recurrent Neural Network
Why Recurrent Neural Networks?
RNNs were created because feed-forward neural networks have a few limitations: they cannot handle sequential data, they consider only the current input, and they cannot memorize previous inputs.
The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs, and it can memorize previous inputs thanks to its internal memory.
The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer. The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation functions, weights, and biases. The recurrent neural network standardizes these activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required, as sketched below.
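A minimal NumPy sketch of this "one layer looped over time" idea follows; the sizes, the tanh activation, and the weight names Wx, Wh, Wy are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Illustrative sizes (assumptions): 4 input features, 8 hidden units, 3 outputs
input_size, hidden_size, output_size = 4, 8, 3

# One set of parameters, shared across every time step
Wx = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights (the "loop")
Wy = np.random.randn(output_size, hidden_size) * 0.1  # hidden-to-output weights
bh = np.zeros(hidden_size)

def rnn_forward(xs):
    """Run the single recurrent layer over a sequence xs of shape (T, input_size)."""
    h = np.zeros(hidden_size)                 # initial hidden (memory) state
    outputs = []
    for x_t in xs:                            # the same layer is looped over the T time steps
        h = np.tanh(Wx @ x_t + Wh @ h + bh)   # new state from current input and previous state
        outputs.append(Wy @ h)                # per-step output
    return np.array(outputs), h

sequence = np.random.randn(5, input_size)     # a toy sequence of 5 time steps
ys, h_final = rnn_forward(sequence)
print(ys.shape)                               # (5, 3): one output for every time step
```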
Training through RNN
1. The network takes a single time step of the input.
2. The current state is calculated from the current input and the previous state.
3. The current state ht then becomes ht-1 for the next time step.
4. One can go through as many time steps as the problem requires and join the information from all the previous states.
5. Once all the time steps are completed, the final current state is used to calculate the output.
6. The error is then computed as the difference between the actual output and the predicted output.
7. The error is backpropagated to the network to adjust the weights and produce a better outcome.
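To make the steps concrete, here is a hedged PyTorch sketch; the layer sizes, the MSE loss, and the toy data are assumptions made only for illustration. Steps 1-5 correspond to the forward pass through nn.RNN, and steps 6-7 to the loss computation and loss.backward().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)  # weights shared across time
head = nn.Linear(8, 1)                                       # maps the final state to an output
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(2, 5, 4)        # toy batch: 2 sequences, 5 time steps, 4 features
target = torch.randn(2, 1)      # toy targets

outputs, h_n = rnn(x)           # steps 1-4: states computed time step by time step
pred = head(h_n[-1])            # step 5: output calculated from the final state
loss = loss_fn(pred, target)    # step 6: error between predicted and actual output
loss.backward()                 # step 7: error backpropagated (through time) to the weights
optimizer.step()
optimizer.zero_grad()
```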
Applications of Recurrent Neural Networks
Typical applications include speech recognition, time series prediction, natural language processing, and image captioning, as in assistants such as Siri, voice search, and Google Translate.
Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
Types of Recurrent Neural Networks
One to One RNN: This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.
Types of Recurrent Neural Networks
One to Many RNN: This type of neural network has a single input and multiple outputs. An example of this is image captioning.
Types of Recurrent Neural Networks
Many to One RNN: This RNN takes a sequence of inputs and generates a single
output. Sentiment analysis is a good example of this kind of network where a
given sentence can be classified as expressing positive or negative sentiments.
Types of Recurrent Neural Networks
Many to Many RNN: This RNN takes a sequence of inputs and generates a
sequence of outputs. Machine translation is one of the examples.
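The practical difference between these types is mainly which inputs are fed in and which outputs are kept. A hedged PyTorch sketch of the Many to One and Many to Many cases (all sizes and output heads are illustrative assumptions):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
x = torch.randn(1, 7, 10)                     # one sequence of 7 time steps

outputs, h_n = rnn(x)                         # outputs: one hidden state per time step

# Many to One (e.g. sentiment analysis): use only the final state
sentiment_head = nn.Linear(16, 2)             # positive / negative
sentiment_logits = sentiment_head(h_n[-1])    # shape (1, 2)

# Many to Many (e.g. tagging or translation-style outputs): use every time step
tag_head = nn.Linear(16, 5)                   # 5 hypothetical output classes per step
tag_logits = tag_head(outputs)                # shape (1, 7, 5)

# One to Many (e.g. image captioning) would instead feed a single input
# and decode an output sequence step by step.
```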
Advantages of Recurrent Neural Network
• Ability To Handle Variable-Length Sequences: RNNs are designed to handle input
sequences of variable length, which makes them well-suited for tasks such as speech
recognition, natural language processing, and time series analysis.
• Memory of Past Inputs: RNNs have a memory of past inputs, which allows them to
capture information about the context of the input sequence. This makes them useful
for tasks such as language modeling, where the meaning of a word depends on the
context in which it appears.
• Parameter Sharing: RNNs share the same set of parameters across all time steps,
which reduces the number of parameters that need to be learned and can lead to better
generalization.
Advantages of Recurrent Neural Network
• Non-Linear Mapping: RNNs use non-linear activation functions, which allows
them to learn complex, non-linear mappings between inputs and outputs.
• Sequential Processing: RNNs process input sequences step by step, which lets them model order and timing naturally (although, as noted under the disadvantages, this sequential nature limits parallelism).
• Flexibility: RNNs can be adapted to a wide range of tasks and input types,
including text, speech, and image sequences.
• Improved Accuracy: RNNs have been shown to achieve state-of-the-art
performance on a variety of sequence modeling tasks, including language
modeling, speech recognition, and machine translation.
Disadvantages of Recurrent Neural Network
• Vanishing And Exploding Gradients: RNNs can suffer from the problem of
vanishing or exploding gradients, which can make it difficult to train the
network effectively. This occurs when the gradients of the loss function with
respect to the parameters become very small or very large as they propagate
through time.
• Computational Complexity: RNNs can be computationally expensive to train,
especially when dealing with long sequences. This is because the network has to
process each input in sequence, which can be slow.
• Difficulty In Capturing Long-Term Dependencies: Although RNNs are
designed to capture information about past inputs, they can struggle to capture
long-term dependencies in the input sequence. This is because the gradients can
become very small as they propagate through time, which can cause the
network to forget important information.
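A tiny numeric illustration of the vanishing-gradient effect; the 0.5 factor is an arbitrary stand-in for a per-step gradient scaling smaller than 1:

```python
grad = 1.0
for _ in range(50):      # backpropagating the error through 50 time steps
    grad *= 0.5          # each step shrinks the gradient a little
print(grad)              # about 8.9e-16: the earliest time steps receive almost no learning signal
```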
Disadvantages of Recurrent Neural Network
• Lack Of Parallelism: RNNs are inherently sequential, which makes it difficult to parallelize the computation. This can limit the speed and scalability of the network.
• Difficulty In Choosing The Right Architecture: There are many different variants of RNNs, each with its own advantages and disadvantages. Choosing the right architecture for a given task can be challenging and may require extensive experimentation and tuning.
• Difficulty In Interpreting The Output: The output of an RNN can be difficult to interpret, especially when dealing with complex inputs such as natural language or audio. This can make it difficult to understand how the network is making its predictions.
Backpropagation Through Time (BPTT)
When we apply the backpropagation algorithm to an RNN whose input is time series data, we call it backpropagation through time.
In a normal forward pass, a single input is sent into the network at a time and a single output is obtained. In backpropagation through time, however, the error at a given time step depends not only on the current input but also on all of the prior inputs in the sequence, so the network is unrolled over its time steps and the error is propagated backward through every one of them.
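Because the same weights are reused at every time step, BPTT sums the gradient contributions from each step of the unrolled network. In standard notation (not taken from the slides):

    ∂L/∂W = Σ (t = 1 … T) ∂L(t)/∂W

where each term ∂L(t)/∂W is itself propagated backward through the states h(t), h(t-1), …, h(1). In practice the unrolling is often truncated to a fixed number of steps to keep training tractable.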
Variant RNN Architectures
1. Long Short-Term Memory (LSTM) Networks
2. Gated Recurrent Unit (GRU) Networks
3. Bidirectional RNNs
4. Encoder-Decoder RNNs
Long Short-Term Memory (LSTM) Networks
LSTM is a type of RNN that is designed to handle the vanishing gradient problem.
LSTM networks are a modified version of RNNs, which makes it easier to remember
past data in memory.
It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). Examples: connected handwriting recognition and speech recognition.
A general LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
LSTM is well-suited to classifying, processing, and predicting time series when there are time lags of unknown duration.
Long Short-Term Memory (LSTM) Networks
1. Input gate: It discovers which values from the input should be used to modify the memory. A sigmoid function decides which values to let through (0 or 1), and a tanh function gives weightage to the values that are passed, deciding their level of importance on a scale from -1 to 1.
2. Forget gate: It discovers which details should be discarded from the block. A sigmoid function decides this: it looks at the previous state (ht-1) and the current input (Xt) and outputs a number between 0 (omit this) and 1 (keep this) for each number in the cell state Ct-1.
3. Output gate: The input and the memory of the block are used to decide the output. A sigmoid function decides which values to let through (0 or 1), and a tanh function gives weightage to the values that are passed, deciding their level of importance from -1 to 1; the result is multiplied by the output of the sigmoid.
Workings of LSTMs in RNN
LSTMs work in a 3-step process.
Working of LSTM
Step 1: Decide How Much Past Data It Should Remember
The first step in the LSTM is to decide which information should be omitted from the cell state in that particular time step. The forget gate’s sigmoid function determines this: it looks at the previous state (ht-1) along with the current input (xt) and, for each value in the cell state, outputs a number between 0 (forget) and 1 (keep).
Working of LSTM
Step 2: Decide How Much This Unit Adds to the Current State
This step has two parts: a sigmoid function and a tanh function. The sigmoid function decides which values to let through (0 or 1), and the tanh function gives weightage to the values that are passed, deciding their level of importance (-1 to 1).
With the current input at x(t), the input gate analyzes the important information: John plays football, and the fact that he was the captain of his college team is important.
“He told me yesterday over the phone” is less important; hence it is forgotten. This process of adding some new information is done via the input gate.
Working of LSTM
Step 3: Decide What Part of the Current Cell State Makes It to the Output
The third step is to decide what the output will be. First, we run a sigmoid layer, which decides
what parts of the cell state make it to the output. Then, we put the cell state through tanh to push
the values to be between -1 and 1 and multiply it by the output of the sigmoid gate.
Let’s consider this example to predict the next word in the sentence: “John played tremendously
well against the opponent and won for his team. For his contributions, brave ____ was awarded
player of the match.” There could be many choices for the empty space. The current input brave
is an adjective, and adjectives describe a noun. So, “John” could be the best output after brave.
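A hedged PyTorch sketch of this kind of next-word prediction setup; the vocabulary size, embedding size, and toy token ids are assumptions for illustration only.

```python
import torch
import torch.nn as nn

vocab_size, embed_size, hidden_size = 1000, 32, 64    # illustrative sizes

embed = nn.Embedding(vocab_size, embed_size)
lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
next_word = nn.Linear(hidden_size, vocab_size)

# Toy token ids standing in for "John played tremendously well ... brave"
tokens = torch.randint(0, vocab_size, (1, 12))

outputs, (h_n, c_n) = lstm(embed(tokens))   # the cell state c_n carries the long-term memory
logits = next_word(h_n[-1])                 # scores for the word that should follow "brave"
predicted_id = int(logits.argmax(dim=-1))   # after training, ideally the id of "John"
```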
Variant RNN Architectures
Gated Recurrent Unit (GRU) Networks:
GRU is another type of RNN that is designed to address the vanishing gradient
problem.
It has two gates: the reset gate and the update gate.
The reset gate determines how much of the previous state should be forgotten.
The update gate determines how much of the new state should be remembered.
This allows the GRU network to selectively update its internal state based on the
input sequence.
Working of GRU
• A GRU uses a reset gate and an update gate to address the vanishing gradient problem. These gates decide what information is sent to the output, and they can retain information from far back in the sequence without it diminishing as training continues.
Working of GRU
Reset gate: The reset gate determines how much of the past information needs to be forgotten. It uses the same form of equation as the update gate (see the equations below).
Working of GRU
Update gate: It is responsible for long-term memory. It determines how much of the information from the previous steps must be passed further. In the update gate, we multiply Xt, the current input, by its weight and ht-1 by a separate weight, add the two, and apply a sigmoid.
The candidate state is built in a similar way, except that the Hadamard product, i.e., the element-wise product, is first taken between the output of the reset gate (rt) and the weighted previous state Uht-1; the result is added to the weighted input WXt and passed through the tanh activation function.
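Putting the pieces together in standard notation (a sketch with assumed symbols; conventions for which gate scales the old state versus the candidate vary between references):

    z(t) = σ(Wz · x(t) + Uz · h(t-1))              update gate
    r(t) = σ(Wr · x(t) + Ur · h(t-1))              reset gate (same form)
    ĥ(t) = tanh(W · x(t) + r(t) ⊙ (U · h(t-1)))    candidate state described above
    h(t) = z(t) ⊙ h(t-1) + (1 - z(t)) ⊙ ĥ(t)       final state: a blend of old and new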
This is how the GRU addresses the vanishing gradient problem: it keeps the relevant information and passes it down to the next step. It can perform excellently if trained correctly.
Variant RNN Architectures
Bidirectional RNNs: Bidirectional RNNs are designed to process input
sequences in both forward and backward directions. This allows the network to
capture both past and future context, which can be useful for speech recognition
and natural language processing tasks.
Encoder-Decoder RNNs: Encoder-decoder RNNs consist of two RNNs: an
encoder network that processes the input sequence and produces a fixed-length
vector representation of the input and a decoder network that generates the output
sequence based on the encoder's representation. This architecture is commonly
used for sequence-to-sequence tasks such as machine translation.
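A hedged PyTorch sketch of the encoder-decoder idea using two GRUs; all sizes, the start-of-sentence id, and the greedy decoding loop are illustrative assumptions, not a complete translation system.

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, embed_size, hidden_size = 1000, 1200, 32, 64  # assumed sizes

src_embed = nn.Embedding(src_vocab, embed_size)
tgt_embed = nn.Embedding(tgt_vocab, embed_size)
encoder = nn.GRU(embed_size, hidden_size, batch_first=True)
decoder = nn.GRU(embed_size, hidden_size, batch_first=True)
generator = nn.Linear(hidden_size, tgt_vocab)

# Encoder: compress the whole source sentence into a fixed-length vector (its final state)
src_tokens = torch.randint(0, src_vocab, (1, 9))
_, context = encoder(src_embed(src_tokens))          # context: (1, 1, hidden_size)

# Decoder: generate the output sequence one token at a time from that context
token = torch.zeros(1, 1, dtype=torch.long)          # assumed start-of-sentence id 0
state = context
generated = []
for _ in range(6):                                    # decode at most 6 tokens (toy limit)
    out, state = decoder(tgt_embed(token), state)
    token = generator(out[:, -1]).argmax(dim=-1, keepdim=True)  # greedy choice of next word
    generated.append(token.item())
print(generated)                                      # ids of the generated output sequence
```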