Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are designed to process sequential data and are particularly effective for tasks like time series analysis and natural language processing. They utilize feedback loops to retain information, but face challenges such as the vanishing and exploding gradient problems, which hinder learning long-term dependencies. Variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) have been developed to address these issues, improving performance in various applications.

Recurrent Neural Networks

• Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequences of data.
They are especially well suited to sequential tasks such as time series analysis, speech recognition, and natural
language processing.
• An RNN works on the principle of saving the output of a layer and feeding it back to the input, so that the
prediction at each step can take the previous steps into account.
• Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural Network:

Fig: Simple Recurrent Neural Network

Recurrent Neural Networks (Cont…)
The nodes in the different layers of the neural network are compressed to form a single recurrent layer. A, B, and C
are the parameters of the network.

Recurrent Neural Networks (Cont…)
Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network
parameters that are learned to improve the model's output. At any given time t, the current input combines x(t) with
information carried over from the previous step, such as x(t-1), so the output depends on the sequence seen so far.
The output at each time step is fed back into the network to improve the next output (see the equations below the
figure).

Fig: Fully connected Recurrent Neural Network
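One common way to write this recurrence, assuming A connects the input to the hidden layer, B connects the hidden
layer to itself, C connects the hidden layer to the output, and tanh is the activation (the slides do not specify one):

h(t) = tanh(A·x(t) + B·h(t-1))
y(t) = C·h(t)

The same A, B, and C are reused at every time step, which is what makes the network recurrent.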


How Recurrent Neural Networks Work
• In a Recurrent Neural Network, information cycles through a loop via the middle (hidden) layer.
• The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer.
• The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation functions, weights, and
biases. In a plain feed-forward network, the parameters of the different hidden layers are independent of one
another, i.e. the network has no memory of earlier inputs.
• A Recurrent Neural Network standardizes the activation functions, weights, and biases so that each hidden layer
has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many
times as required (see the sketch below).

Fig: Working of Recurrent Neural Network
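A minimal NumPy sketch of this loop, reusing one set of weights at every time step (the function name, weight names,
and shapes are illustrative assumptions, not taken from the slides):

import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    # xs: list of input vectors, one per time step
    h = np.zeros(W_hh.shape[0])                  # initial hidden state
    outputs = []
    for x in xs:                                 # same weights reused at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # update the hidden state
        outputs.append(W_hy @ h + b_y)           # output for this time step
    return outputs, h

# Example: 5 time steps of 3-dimensional input, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(5)]
outs, _ = rnn_forward(xs, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)),
                      rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2))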


Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
• One to One
• One to Many
• Many to One
• Many to Many
One to One RNN
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning
problems that have a single input and a single output.

Types of Recurrent Neural Networks (Cont…)
One to Many RNN
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

Many to One RNN


This RNN takes a sequence of inputs and generates a single output.
Sentiment analysis is a good example of this kind of network where a given
sentence can be classified as expressing positive or negative sentiments.
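A hedged PyTorch sketch of the many-to-one case: a toy sentiment classifier that reads a whole token sequence and
produces a single label. The class name and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)   # positive vs. negative

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)
        _, h_last = self.rnn(embedded)               # keep only the final hidden state
        return self.classifier(h_last[-1])           # one prediction per sequence

model = SentimentRNN()
logits = model(torch.randint(0, 10_000, (4, 25)))    # 4 sentences of 25 tokens
print(logits.shape)                                   # torch.Size([4, 2])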

Types of Recurrent Neural Networks (Cont…)
Many to Many RNN
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.

Two Issues of Standard RNNs
1. Vanishing Gradient Problem
2. Exploding Gradient Problem

Vanishing Gradient Problem


Recurrent Neural Networks let you model time-dependent and sequential data problems, such as stock market
prediction, machine translation, and text generation. You will find, however, that RNNs are hard to train because of
the gradient problem.
RNNs suffer from the problem of vanishing gradients. The gradients carry the information used to update the RNN's
parameters, and when the gradient becomes too small, the parameter updates become insignificant. This makes
learning long data sequences difficult.
1. As the RNN trains, the gradients (used to adjust the model’s weights) become very small.
2. This makes it hard for the model to learn or remember information from earlier parts of the sequence.
3. The model repeatedly multiplies small numbers during backpropagation, making the gradients shrink to almost
zero.

Impact:

The RNN forgets long-term context and only focuses on recent inputs.
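A toy numeric illustration of the repeated multiplication described above (the 0.9 factor and 100 steps are arbitrary
choices, only meant to show the effect):

grad = 1.0
for step in range(100):   # backpropagating through 100 time steps
    grad *= 0.9           # each step multiplies by a factor smaller than 1
print(grad)               # roughly 2.7e-05: the gradient has almost vanished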

Two Issues of Standard RNNs (Cont…)
1. Vanishing Gradient Problem
2. Exploding Gradient Problem
Exploding Gradient Problem
While training a neural network, if the gradient tends to grow exponentially instead of decaying, this is called an
Exploding Gradient. This problem arises when large error gradients accumulate, resulting in very large updates to the
neural network model weights during training.
Long training times, poor performance, and low accuracy are the major symptoms of gradient problems.
• Sometimes, the gradients become extremely large during training.
• This makes the model’s weight updates go out of control, producing erratic outputs.
• The model repeatedly multiplies large numbers, making the gradients grow larger and larger.

Impact:
Training becomes unstable, and the model fails to learn properly.
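Gradient clipping (listed as a fix in the summary table below) caps the gradient norm before each weight update. A
minimal PyTorch sketch, with arbitrary sizes and a placeholder loss just to make it runnable:

import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 8)          # (batch, seq_len, features)
output, _ = model(x)
loss = output.pow(2).mean()        # placeholder loss, for illustration only

loss.backward()
# Cap the total gradient norm so one bad batch cannot blow up the weights.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()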

Issue              | Cause                                | Impact                              | Solution
Vanishing Gradient | Weights < 1 (during backpropagation) | Cannot learn long-term dependencies | LSTMs, GRUs, ReLU, Clipping
Exploding Gradient | Weights > 1 (during backpropagation) | Training instability                | Gradient Clipping, Regularization
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) designed to better handle
the vanishing gradient problem and learn long-term dependencies in sequential data. LSTMs are particularly useful
for tasks like language modeling, text generation, machine translation, and time-series forecasting.

Why LSTMs?
Standard RNNs struggle to learn long-term dependencies because their gradients can either vanish (become too small)
or explode (become too large) during backpropagation. This makes them ineffective for tasks where context over long
sequences is important. LSTMs overcome this limitation through their unique architecture that allows them to
remember information for longer periods.

Long Short-Term Memory (LSTM)
Structure of LSTM
Cell State (Ct):
The cell state acts as the memory of the LSTM. It carries information across time steps and can be modified by
different gates. This is what allows LSTMs to maintain long-term dependencies.
Hidden State (ht):
The hidden state is used for the output at each time step and is influenced by the cell state.
Gates:
Gates are neural network layers that control the flow of information through the cell state.
They use sigmoid, σ(x) = 1 / (1 + e^(-x)), or tanh, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), activation functions.
The gates include:
Forget Gate: Decides what information from the cell state should be discarded.
Input Gate: Decides what new information should be added to the cell state.
Output Gate: Decides what part of the cell state should be output as the hidden state.
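In equation form, the standard LSTM update looks like the following (the weight names W_f, W_i, W_C, W_o and the
concatenation [h(t-1), x(t)] follow the usual convention; they are not notation from these slides):

f(t) = σ(W_f·[h(t-1), x(t)] + b_f)        (forget gate)
i(t) = σ(W_i·[h(t-1), x(t)] + b_i)        (input gate)
C̃(t) = tanh(W_C·[h(t-1), x(t)] + b_C)     (candidate cell state)
C(t) = f(t) * C(t-1) + i(t) * C̃(t)        (new cell state)
o(t) = σ(W_o·[h(t-1), x(t)] + b_o)        (output gate)
h(t) = o(t) * tanh(C(t))                  (new hidden state)

Here * denotes element-wise multiplication.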

Gated Recurrent Unit (GRU) Networks
• GRU is another type of RNN that is designed to address the vanishing gradient problem.

• It has two gates: the reset gate and the update gate.

• The reset gate determines how much of the previous state should be forgotten, while the update gate determines
how much of the new state should be remembered.

• This allows the GRU network to selectively update its internal state based on the input sequence.

How GRUs Work in Simple Terms


Think of GRUs as having a mechanism to decide what to remember and what to forget at each step:
• Update Gate: Controls how much of the past should be kept and how much should be replaced with new
information.
• Reset Gate: Helps decide how much of the past should be ignored when generating the new hidden state.
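In equation form, the standard GRU update is (again using conventional weight names W_z, W_r, W rather than
notation from these slides):

z(t) = σ(W_z·[h(t-1), x(t)])               (update gate)
r(t) = σ(W_r·[h(t-1), x(t)])               (reset gate)
h̃(t) = tanh(W·[r(t) * h(t-1), x(t)])       (candidate hidden state)
h(t) = (1 - z(t)) * h(t-1) + z(t) * h̃(t)   (new hidden state)

Conventions differ on whether z or (1 - z) multiplies the old state, but the idea is the same: the update gate
interpolates between keeping the previous hidden state and adopting the new candidate.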

Comparison of GRU and LSTM
Here is a comparison of Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks:

Structure
  GRU: simpler structure with two gates (update and reset).
  LSTM: more complex structure with three gates (input, forget, and output).
Parameters
  GRU: fewer parameters (3 weight matrices: update gate, reset gate, and candidate hidden state).
  LSTM: more parameters (4 weight matrices: candidate cell state, input gate, forget gate, and output gate).
Training
  GRU: faster to train.
  LSTM: slower to train.
Space complexity
  GRU: tends to use less memory thanks to its simpler structure and fewer parameters, so it is better suited to large
  datasets or long sequences.
  LSTM: has a more complex structure and more parameters, so it may need more memory and can be less practical
  for large datasets or long sequences.
Performance
  GRU: generally performs similarly to LSTM on many tasks; in some cases GRU outperforms LSTM and vice versa,
  so it is best to try both and see which works better for your dataset and task.
  LSTM: generally performs well on many tasks but is more computationally expensive and needs more memory;
  it tends to have an edge in natural language understanding and machine translation tasks.
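To see the parameter difference concretely, a small PyTorch check (the layer sizes are arbitrary):

import torch.nn as nn

gru = nn.GRU(input_size=128, hidden_size=256)
lstm = nn.LSTM(input_size=128, hidden_size=256)

count = lambda m: sum(p.numel() for p in m.parameters())
print("GRU parameters: ", count(gru))    # 3 sets of weights (update, reset, candidate)
print("LSTM parameters:", count(lstm))   # 4 sets of weights, roughly 4/3 as many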
Thank You
