
LSTM Networks

(Long Short Term Memory)


Long and short term dependency


● Short-term dependency: The clouds are in the ?

● Long-term dependency: I grew up in Kerala. .............................. I speak fluent ?
LSTM - Overview


● Special type of RNN capable of learning long-term dependencies.

● Extension of the basic Vanilla RNN to overcome the exploding/vanishing gradient problem.

● Uses two paths to predict values – a long-term and a short-term memory path.

● However, the computation is more complex than in a Vanilla RNN.
The repeating modules – RNN vs LSTM


[Figure: the repeating module in a standard RNN vs. the repeating module in an LSTM]
RNN – Components


● Input layer

● Hidden layer

● Activation function

● Output layer

● Recurrent connection
RNN - Architecture
[Figure: RNN architecture – input layer, hidden layer with a recurrent connection, output layer]
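To make these components concrete, here is a minimal sketch of a single Vanilla RNN step in Python with NumPy. The layer sizes, weight names, and toy data are illustrative assumptions, not taken from the slides.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One Vanilla RNN step: the new hidden state mixes the current input
    # with the previous hidden state through a tanh activation.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(4, 3))   # input-to-hidden weights (4-d input, 3-d hidden)
W_hh = 0.1 * rng.normal(size=(3, 3))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(3)

h = np.zeros(3)                        # initial hidden state
for x_t in rng.normal(size=(5, 4)):    # toy sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                               # final hidden state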
LSTM - Overview


● Uses the sigmoid activation function (outputs between 0 and 1).

● And the tanh, or hyperbolic tangent, activation function (outputs between -1 and 1).
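A quick NumPy check of the two squashing functions (illustrative sketch; the printed values are approximate):

import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1); used by the LSTM gates.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))   # ~[0.0067, 0.5, 0.9933]
print(np.tanh(z))   # ~[-0.9999, 0.0, 0.9999]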
Controlling the Gradients


● When the gradients used to update the weights during backpropagation become very small and vanish, it becomes hard for the network to learn: the weights hardly change, which slows down or stops training altogether.

● LSTM leverages gating mechanisms to control the flow of information and gradients.

● This helps prevent the vanishing gradient problem and allows the network to learn and retain information over longer sequences.
LSTM Cell


● Cell state – has no weights or biases of its own.

● Information is added to or removed from the cell state using gates.

● Gates – each composed of a sigmoid layer followed by a pointwise multiplication.
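A gate in isolation can be sketched in a few lines of NumPy (the numbers below are illustrative assumptions): the sigmoid produces a per-element "keep fraction", which then multiplies the signal pointwise.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

cell_state = np.array([2.0, -1.5, 0.8])       # values being gated
gate = sigmoid(np.array([5.0, 0.0, -5.0]))    # ~[0.99, 0.50, 0.01]: keep, halve, block
print(gate * cell_state)                      # pointwise multiplication applies the gate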

● In LSTMs, the cell state acts as a long-term memory, carrying information through the sequence, while the hidden state is the output of the LSTM cell at a given time step and is passed to the next cell.

● Cell State (Ct):
– Represents the long-term memory of the model.
– Allows information to flow unchanged across the cell, providing a direct path for gradients during backpropagation.
– Stores information about past inputs, enabling the model to learn long-term dependencies.

● Hidden State (ht):
– The output of the LSTM cell at a given time step.
– Contributes to the final output and is passed to the next cell in the sequence.
– Is a representation of the previous inputs, retaining information from one time step to another.
– Can be thought of as the "working memory" that carries information from immediately previous events.
LSTM Cell


● The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through.

● A value of zero means “let nothing through,” while a value of one means “let everything through!”
LSTM Gates - The Forget Gate

● Decides what information we are going to throw away from the cell state.

● Looks at ht-1 and xt and outputs a number between 0 and 1 for each number in the cell state Ct-1.

● A value of zero means “completely omit,” while a value of one means “completely keep.”
Forget Gate – Computation


● Wf is the weight matrix, that is, the weights applied to ht-1 and xt.

● Suppose we are trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
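For reference, in standard LSTM notation the forget-gate computation described above is:

● ft = σ(Wf · [ht-1, xt] + bf)

where σ is the sigmoid function, [ht-1, xt] is the concatenation of the previous hidden state and the current input, and bf is the forget-gate bias.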
Determining the information to store in the cell state


● Has two parts.

● First, a sigmoid layer called the “input gate layer” decides which values we’ll update.

● Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.
Determining the information to store in the cell state – The input gate


● This could be adding the gender of the new subject to the cell state, to replace the old one we are forgetting.
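For reference, in standard LSTM notation the two parts of this step are:

● it = σ(Wi · [ht-1, xt] + bi)

● C̃t = tanh(WC · [ht-1, xt] + bC)

where bi and bC are the biases of the input-gate layer and the candidate (tanh) layer.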
Updating the cell state


● Update the cell state using the forget gate and the input gate, as shown in the formula below.

● In the language example, this is the step where the old subject's gender is actually dropped from the cell state and the new subject's gender is added.
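In standard LSTM notation, the update combines both gates elementwise:

● Ct = ft * Ct-1 + it * C̃t

The forget gate ft scales down (or keeps) the old cell state, and the input gate it decides how much of the new candidate values C̃t to add.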
Deciding what to output – The output gate


● Determines the next hidden state (short-term memory).

● The output ht (the next hidden state) is based on the long-term memory (the cell state) and the short-term memory (the previous hidden state).

● The tanh layer ensures that only the relevant portions of the cell state are used to update the hidden state.
Deciding what to output – The output gate


● The sigmoid function determines how much of the information represented by the cell state should be output.

● For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next.
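Putting the forget, input, and output gates together, here is a minimal sketch of one LSTM step in Python with NumPy. The weight shapes, names, and toy sizes are illustrative assumptions, not taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # W and b hold one weight matrix / bias vector per gate,
    # each acting on the concatenation [h_prev, x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate: how much of C_prev to keep
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate: how much new information to write
    C_tilde = np.tanh(W["C"] @ z + b["C"])    # candidate values
    C_t = f_t * C_prev + i_t * C_tilde        # new cell state (long-term memory)
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate: how much of the state to expose
    h_t = o_t * np.tanh(C_t)                  # new hidden state (short-term memory)
    return h_t, C_t

# Toy sizes: 4-dimensional input, 3-dimensional hidden/cell state (7 = 3 + 4).
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.normal(size=(3, 7)) for k in "fiCo"}
b = {k: np.zeros(3) for k in "fiCo"}

h, C = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):          # a toy sequence of 5 time steps
    h, C = lstm_step(x_t, h, C, W, b)
print(h, C)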
LSTM - Details


● Cell state that represents long-term memory.

● Can be modified by a multiplication and an addition.

● Does not contain any weights or biases.

● Information can flow across unrolled units without causing the gradient to vanish or explode.
LSTM - Details


● Hidden state that represents short-term memory.

● It is connected to weights and hence can be modified by these weights.

● Long- and short-term memories interact to make predictions.
Interaction between Long and Short Term Memory


● Assume that the long-term memory is 2 and the short-term memory is 1.

● The input reduces the effect of the long-term memory by a small factor.
Interaction between Long and Short Term Memory


● Assume that the long-term memory is 2 and the short-term memory is -10.

● The input reduces the effect of the long-term memory to zero.

● Since the sigmoid function outputs a number between 0 and 1, its output determines what percentage of the long-term memory is remembered.

● This is the first stage in the LSTM.

● This part is called the forget gate.
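A quick numerical check of this behavior. The forget-gate weight and bias below are illustrative assumptions chosen only to mirror the slide's two scenarios; a real forget gate also sees the current input, which is omitted here for simplicity.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

long_term = 2.0          # the long-term (cell state) value from the slide
w, b = 1.0, 0.0          # illustrative forget-gate weight and bias

for short_term in (1.0, -10.0):
    keep_fraction = sigmoid(w * short_term + b)
    print(short_term, keep_fraction, long_term * keep_fraction)
# short_term =   1.0 -> keep ~0.73, remembered long-term memory ~1.46 (reduced by a small factor)
# short_term = -10.0 -> keep ~0.00005, remembered long-term memory ~0.0001 (effectively zero)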

You might also like