Recurrent Neural Networks
• Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequences of data. They work especially well for sequential tasks such as time-series forecasting, speech, natural language, and other sequence-based applications.
• An RNN works on the principle of saving the output of a particular layer and feeding it back to the input in order to predict that layer's output at the next time step.
• Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural Network:
Recurrent Neural Networks (Cont…)
The nodes in the different layers of the neural network are compressed to form a single layer of the recurrent neural network. A, B, and C are the parameters of the network.
Recurrent Neural Networks (Cont…)
Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state is computed from the current input x(t) and the previous hidden state h(t-1), so the network's state at each step is fed back in to improve the next output, as sketched below.
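A minimal NumPy sketch of this forward pass, assuming the common formulation h(t) = tanh(A·x(t) + B·h(t-1)) and y(t) = C·h(t); the function name and the sizes used here are illustrative, not part of the slides:

```python
import numpy as np

def rnn_forward(xs, A, B, C):
    """Minimal vanilla RNN forward pass.
    xs : list of input vectors x(1..T)
    A  : input-to-hidden weights, B : hidden-to-hidden weights, C : hidden-to-output weights
    """
    h = np.zeros(B.shape[0])              # initial hidden state h(0)
    outputs = []
    for x in xs:
        h = np.tanh(A @ x + B @ h)        # h(t) depends on x(t) and h(t-1)
        outputs.append(C @ h)             # y(t) is read off the current hidden state
    return outputs, h

# Illustrative sizes: 4-dimensional inputs, 8 hidden units, 3 output units.
rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((8, 4)), rng.standard_normal((8, 8)), rng.standard_normal((3, 8))
ys, h_final = rnn_forward([rng.standard_normal(4) for _ in range(5)], A, B, C)
```

The key design point is that the same weights A, B, and C are reused at every time step; only the hidden state h changes as the sequence is consumed.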
Types of Recurrent Neural Networks (Cont…)
One to Many RNN
This type of neural network has a single input and multiple outputs. An example of this is image captioning, where one image produces a sequence of words.
Types of Recurrent Neural Networks (Cont…)
Many to Many RNN
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
Two Issues of Standard RNNs
1. Vanishing Gradient Problem
2. Exploding Gradient Problem
Vanishing Gradient Problem
While training a neural network, if the gradient shrinks exponentially as it is propagated back through the time steps, this is called a Vanishing Gradient. The weight updates for earlier time steps become negligibly small, so the network learns almost nothing from distant context.
Impact:
The RNN forgets long-term context and only focuses on recent inputs.
Two Issues of Standard RNNs (Cont…)
1. Vanishing Gradient Problem
2. Exploding Gradient Problem
Exploding Gradient Problem
While training a neural network, if the gradient tends to grow exponentially instead of decaying, this is called an Exploding Gradient. This problem arises when large error gradients accumulate, resulting in very large updates to the model weights during training.
Long training times, poor performance, and low accuracy are the major consequences of gradient problems.
• Sometimes, the gradients become extremely large during training.
• This makes the model’s weight updates go out of control, causing erroneous outputs.
• The model repeatedly multiplies large numbers, making the gradients grow bigger and bigger.
Impact:
Training becomes unstable, and the model fails to learn properly.
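A quick numerical sketch (illustrative, not from the slides) of why both problems arise: during backpropagation through time, the gradient is repeatedly multiplied by the recurrent weight matrix, so its norm either shrinks toward zero (vanishing) or blows up (exploding) depending on how large that matrix is.

```python
import numpy as np

def gradient_norm_over_time(scale, steps=50, size=8, seed=0):
    """Repeatedly multiply a gradient vector by a recurrent weight matrix
    (a rough stand-in for backpropagation through time) and track its norm."""
    rng = np.random.default_rng(seed)
    W = scale * rng.standard_normal((size, size)) / np.sqrt(size)
    grad = rng.standard_normal(size)
    norms = []
    for _ in range(steps):
        grad = W.T @ grad              # one step of backprop through the recurrence
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = gradient_norm_over_time(scale=0.5)   # norms shrink toward 0
exploding = gradient_norm_over_time(scale=1.5)   # norms grow without bound
print(f"after 50 steps: vanishing ~ {vanishing[-1]:.2e}, exploding ~ {exploding[-1]:.2e}")
```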
Why LSTMs?
Standard RNNs struggle to learn long-term dependencies because their gradients can either vanish (become too small)
or explode (become too large) during backpropagation. This makes them ineffective for tasks where context over long
sequences is important. LSTMs overcome this limitation through their unique architecture that allows them to
remember information for longer periods.
Long Short-Term Memory (LSTM)
Structure of LSTM
Cell State (Ct):
The cell state acts as the memory of the LSTM. It carries information across time steps and can be modified by
different gates. This is what allows LSTMs to maintain long-term dependencies.
Hidden State (ht):
The hidden state is used for the output at each time step and is influenced by the cell state.
Gates:
Gates are neural network layers that control the flow of information through the cell state.
They use the sigmoid activation, σ(x) = 1 / (1 + e^(-x)), or the tanh activation, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
The gates include:
Forget Gate: Decides what information from the cell state should be discarded.
Input Gate: Decides what new information should be added to the cell state.
Output Gate: Decides what part of the cell state should be output as the hidden state.
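The gate equations above can be made concrete with a minimal NumPy sketch of a single LSTM step, assuming the standard formulation; the weight names and sizes here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    """One LSTM time step. Each W* maps the concatenated [h_prev, x] to the hidden size."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)          # forget gate: what to discard from the cell state
    i = sigmoid(Wi @ z + bi)          # input gate: what new information to add
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell state
    c = f * c_prev + i * c_tilde      # updated cell state Ct (the long-term memory)
    o = sigmoid(Wo @ z + bo)          # output gate: what part of the cell state to expose
    h = o * np.tanh(c)                # hidden state ht used for the output
    return h, c

# Illustrative sizes: 4-dimensional input, 8 hidden units.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 12)) for _ in range(4)]
bs = [np.zeros(8) for _ in range(4)]
h, c = lstm_step(rng.standard_normal(4), np.zeros(8), np.zeros(8), *Ws, *bs)
```

Note how the cell state c is updated only through elementwise gating, which is what lets information flow across many time steps without vanishing.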
Gated Recurrent Unit (GRU) Networks
• GRU is another type of RNN that is designed to address the vanishing gradient problem.
• It has two gates: the reset gate and the update gate.
• The reset gate determines how much of the previous state should be forgotten, while the update gate determines
how much of the new state should be remembered.
• This allows the GRU network to selectively update its internal state based on the input sequence.
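The reset and update gates described above can be sketched the same way for a single GRU step, again assuming the standard formulation with illustrative weight names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU time step. Each W* maps the concatenated [h_prev, x] to the hidden size."""
    z_in = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ z_in + bz)        # update gate: how much of the new state to keep
    r = sigmoid(Wr @ z_in + br)        # reset gate: how much of the past state to forget
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x]) + bh)  # candidate state
    h = (1 - z) * h_prev + z * h_tilde # blend previous state and candidate state
    return h
```

Unlike the LSTM, there is no separate cell state: the hidden state itself carries the memory, which is why the GRU needs one gate fewer.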
Compare GRU vs LSTM
Here is a comparison of Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks:

Structure
• GRU: Simpler structure with two gates (update and reset gate).
• LSTM: More complex structure with three gates (input, forget, and output gate).

Parameters
• GRU: Fewer parameters (3 weight matrices: update gate, reset gate, and candidate hidden state).
• LSTM: More parameters (4 weight matrices: input gate, forget gate, output gate, and candidate cell state).

Space Complexity
• GRU: In most cases, GRU tends to use fewer memory resources due to its simpler structure and fewer parameters, so it is better suited for large datasets or sequences.
• LSTM: LSTM has a more complex structure and a larger number of parameters, so it might require more memory resources and could be less effective for large datasets or sequences.

Performance
• GRU: Generally performs similarly to LSTM on many tasks, but in some cases GRU has been shown to outperform LSTM and vice versa. It is better to try both and see which works better for your dataset and task.
• LSTM: Generally performs well on many tasks but is more computationally expensive and requires more memory resources. LSTM has advantages over GRU in natural language understanding and machine translation tasks.
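As a rough illustration of the parameter difference, here is a small Python sketch using the standard counts (each gate or candidate has one weight matrix over the concatenated [hidden, input] vector plus a bias; exact counts vary by implementation):

```python
def rnn_gate_params(input_size, hidden_size, num_gates):
    # Each gate/candidate has a weight matrix over [h, x] plus a bias vector.
    return num_gates * (hidden_size * (hidden_size + input_size) + hidden_size)

input_size, hidden_size = 128, 256
print("GRU  parameters:", rnn_gate_params(input_size, hidden_size, num_gates=3))  # update, reset, candidate
print("LSTM parameters:", rnn_gate_params(input_size, hidden_size, num_gates=4))  # input, forget, output, candidate
```

With these illustrative sizes, the LSTM has roughly a third more parameters than the GRU for the same hidden width.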
Thank You