LSTM PPT
●
Short-term dependency: "The clouds are in the ?"
●
Long-term dependency: "I grew up in Kerala. ... I speak fluent ?"
LSTM - Overview
●
Special type of RNN capable of
learning long-term dependencies.
●
Extension of the basic Vanilla
RNN to overcome the
exploding/vanishing gradient
problem.
●
Uses two paths to predict values – a
long-term and a short-term memory path.
●
However, the computation is
complex when compared to
Vanilla RNN.
The repeating modules – RNN vs LSTM
●
Repeating module in RNN (figure)
●
Repeating module in LSTM (figure)
RNN – Components
●
Input layer
●
Hidden Layer
●
Activation Function
●
Output layer
●
Recurrent Connection
RNN - Architecture
●
Diagram: Input Layer → Hidden Layer (with recurrent connection) → Output Layer
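To make these components concrete, here is a minimal sketch of one vanilla RNN step in NumPy. All names, shapes, and weight values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One vanilla RNN step: the new hidden state mixes the current
    # input with the previous hidden state (the recurrent connection).
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4)) * 0.1  # input-to-hidden weights (4-dim input)
W_hh = rng.normal(size=(3, 3)) * 0.1  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(3)

h = np.zeros(3)                       # initial hidden state
for x in rng.normal(size=(5, 4)):     # unroll over a 5-step toy sequence
    h = rnn_step(x, h, W_xh, W_hh, b_h)
print(h)
```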
LSTM - Overview
●
Uses the sigmoid activation function (outputs between 0 and 1).
●
Uses the tanh (hyperbolic tangent) activation function (outputs between -1 and 1).
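A quick numerical check of the two squashing functions (toy values only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # into (0, 1):  [~0.00005, 0.269, 0.5, 0.731, ~0.99995]
print(np.tanh(z))  # into (-1, 1): [~-1.0, -0.762, 0.0, 0.762, ~1.0]
```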
Controlling the Gradients
●
When the gradients used to update the weights during
backpropagation shrink toward zero, it becomes hard
for the network to learn because the weights hardly change,
which slows down or stops training altogether.
●
LSTM leverages gating mechanisms to control the flow of
information and gradients.
●
This helps prevent the vanishing gradient problem and allows
the network to learn and retain information over longer
sequences.
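A toy calculation (not the full backpropagation derivation) of why a multiplicative factor near 1 matters over long sequences:

```python
# Backpropagation through T steps multiplies the gradient by a
# per-step factor. Well below 1, the product vanishes; a gate that
# learns to stay near 1 keeps a usable gradient signal alive.
T = 100
print(0.5 ** T)   # ~7.9e-31: gradient effectively gone
print(0.99 ** T)  # ~0.37: still usable after 100 steps
```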
LSTM Cell
●
Cell State – No weights or biases.
●
Information is added or removed
from the cell state using gates.
●
Gates – Composed of a
sigmoid function followed by
a pointwise multiplication.
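The gate pattern itself, sketched with toy numbers (a sigmoid output multiplied pointwise into a signal):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

signal = np.array([2.0, -3.0, 0.5])
gate = sigmoid(np.array([10.0, 0.0, -10.0]))  # ~[1.0, 0.5, 0.0]
print(gate * signal)                          # ~[2.0, -1.5, 0.0]: pass, halve, block
```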
●
In LSTMs, the cell state acts as a long-term memory, carrying information through the
sequence, while the hidden state is the output of the LSTM cell at a given time step and is
passed to the next cell.
●
Cell State (C_t):
– Represents the long-term memory of the model.
– Allows information to flow unchanged across the cell, providing a direct path for gradients during
backpropagation.
– Stores information about past inputs, enabling the model to learn long-term dependencies.
●
Hidden State (h_t):
– The output of the LSTM cell at a given time step.
– Contributes to the final output and is passed to the next cell in the sequence.
– Is a representation of the previous inputs, retaining information from one time step to another.
– Can be thought of as the "working memory" that carries information from immediately previous events.
LSTM Cell
●
The sigmoid layer outputs numbers
between zero and one, describing how
much of each component should be let
through.
●
A value of zero means “let nothing
through,” while a value of one means “let
everything through!”
LSTM Gates - The Forget Gate
●
Decides what information we are going to
throw away from the cell state.
●
Looks at h_{t-1} and x_t and outputs a number
between 0 and 1 for each entry C_{t-1} of
the cell state.
●
A value of zero means “completely forget this,”
while a value of one means “completely
keep this.”
Forget Gate – Computation
●
f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where W_f is the weight
matrix applied to h_{t-1} and x_t, and b_f is the bias.
●
Suppose we are trying to predict the next word based on all the
previous ones. In such a problem, the cell state might include
the gender of the present subject, so that the correct pronouns
can be used. When we see a new subject, we want to forget
the gender of the old subject.
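A minimal sketch of the forget gate in NumPy, assuming the concatenated [h_{t-1}, x_t] formulation above (dimensions and weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): one value in (0, 1)
    # per cell-state entry (0 = forget completely, 1 = keep completely).
    return sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

rng = np.random.default_rng(0)
hidden, inputs = 3, 4
W_f = rng.normal(size=(hidden, hidden + inputs)) * 0.1
b_f = np.zeros(hidden)

f_t = forget_gate(np.zeros(hidden), rng.normal(size=inputs), W_f, b_f)
C_prev = np.array([2.0, -1.0, 0.5])
print(f_t * C_prev)  # cell state entries scaled by how much we keep
```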
Determining the information to store in the cell state
●
Has two parts.
●
First, a sigmoid layer called the “input gate layer” decides
which values we’ll update.
●
Next, a tanh layer creates a vector of new candidate
values, C̃_t, that could be added to the state. In the next
step, we’ll combine these two to create an update to the
state (sketched below).
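The two parts as code, continuing the same sketch (W_i, b_i, W_c, b_c are illustrative names for the input-gate and candidate-layer parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate_and_candidate(h_prev, x_t, W_i, b_i, W_c, b_c):
    hx = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ hx + b_i)      # input gate: which entries to update
    c_tilde = np.tanh(W_c @ hx + b_c)  # candidate values in (-1, 1)
    return i_t, c_tilde
```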
Determining the information to store in the cell state –
The input gate
●
This could be adding the gender of the new subject to the
cell state, to replace the old one we are forgetting.
Updating the cell state
●
Update using the forget gate and the input gate:
C_t = f_t * C_{t-1} + i_t * C̃_t (element-wise).
●
For the language model, this is where the old subject’s
gender is actually dropped and the new information added.
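In code, the update is one line (f_t, i_t, c_tilde as in the sketches above):

```python
def update_cell_state(C_prev, f_t, i_t, c_tilde):
    # Forget part of the old state, then add the candidate values,
    # scaled by the input gate (all products element-wise).
    return f_t * C_prev + i_t * c_tilde
```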
Deciding what to output – The output gate
●
Determines the next hidden state (short-term memory).
●
The output h_t (next hidden state) is based on the updated cell state (long-term
memory) and the previous hidden state (short-term memory).
●
The tanh layer squashes the cell state to between -1 and 1, and the sigmoid
gate selects which portions of it reach the hidden state.
Deciding what to output – The output gate
●
Sigmoid function determines how much of the information
represented by the cell state should be outputted.
●
For the language model example, since it just saw a subject, it might
want to output information relevant to a verb, in case that’s what is
coming next.
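The output gate as code, completing the per-gate sketches (W_o, b_o are illustrative parameter names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_gate(h_prev, x_t, C_t, W_o, b_o):
    o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)
    return o_t * np.tanh(C_t)  # h_t: a gated, squashed view of the cell state
```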
LSTM - Details
●
Cell state that represents Long-Term memory.
●
Can be modified by a multiplication and an addition.
●
Does not contain any weights or biases.
●
Information can flow across unrolled units without causing the
gradient to vanish or explode.
LSTM Details
●
Hidden state that represents short-term memories.
●
They are connected to weights and hence can be modified
through these weights.
●
Long and short-term memories interact to make predictions.
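Putting the four pieces together, here is a minimal full LSTM step in NumPy; all dimensions, parameter names, and random weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # One full LSTM step; W and b hold one matrix/vector per gate.
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ hx + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ hx + b["c"])  # candidate values
    o_t = sigmoid(W["o"] @ hx + b["o"])      # output gate
    C_t = f_t * C_prev + i_t * c_tilde       # long-term memory update
    h_t = o_t * np.tanh(C_t)                 # short-term memory / output
    return h_t, C_t

rng = np.random.default_rng(0)
hidden, inputs = 3, 4
W = {k: rng.normal(size=(hidden, hidden + inputs)) * 0.1 for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}

h, C = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):  # unroll over a 5-step toy sequence
    h, C = lstm_step(x, h, C, W, b)
print(h, C)
```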
Interaction between Long and Short Term Memory
●
Assume that the long-term
memory is 2 and the short-term
memory is 1.
●
The input reduces the effect
of the Long-term memory by
a small factor.
Interaction between Long and Short Term Memory
●
Assume that the long-term
memory is 2 and the short-term
memory is -10.
●
The input reduces the effect of the
Long-term memory to zero.
●
Since the sigmoid function outputs
a number between 0 and 1, its output
determines what percentage of the
long-term memory is
remembered.
●
This is the first stage in LSTM.
●
This part is called the forget gate.
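The arithmetic behind both examples, assuming for simplicity that the short-term memory feeds the sigmoid directly (weights and biases omitted):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(2 * sigmoid(1.0))    # ~1.46: long-term memory reduced by a small factor
print(2 * sigmoid(-10.0))  # ~0.00009: long-term memory effectively erased
```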