Long Short-Term Memory (LSTM)
Sir Asif Ahsan
LSTMs
• Central idea: a memory cell (also called a memory block) that can maintain its
state over time, consisting of an explicit memory (the cell state vector)
and gating units that regulate the flow of information into and out of the
memory.
Cell State (conveyor belt)
• Represents the memory of the LSTM
• Undergoes changes via forgetting of old memory (forget
gate) and addition of new memory (input gate)
GATES
• A gate is a sigmoid neural-net layer followed by a pointwise
multiplication operator.
• Gates control the flow of information to/from the
memory.
• Gates are controlled by a concatenation of the output
from the previous time step and the current input and
optionally the cell state vector.
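A minimal NumPy sketch of this gating idea; the sizes (4 hidden units, 3 input features) and weights are illustrative assumptions, not values from the slides. The gate is a sigmoid layer over the concatenation [h_{t−1}, x_t], and its output multiplies a value pointwise:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 4, 3                                    # illustrative sizes
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((hidden, hidden + inputs))  # gate weights
b_gate = np.zeros(hidden)                                # gate bias

h_prev = rng.standard_normal(hidden)   # output from the previous time step
x_t = rng.standard_normal(inputs)      # current input

# Sigmoid layer over the concatenation [h_{t-1}, x_t] ...
gate = sigmoid(W_gate @ np.concatenate([h_prev, x_t]) + b_gate)

# ... followed by pointwise multiplication with whatever the gate regulates.
value = rng.standard_normal(hidden)
regulated = gate * value               # each component scaled by a factor in (0, 1)
```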
Forget gate
• Controls what information to throw away from memory
Input gate
• The input gate decides how much of the information in the current input
should be written into the cell state.
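In the standard formulation (as in the Colah post listed under Reading), the input gate and the candidate memory it scales are:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

so i_t ⊙ C̃_t (pointwise product) is the new information actually written into the cell state.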
Forget gate
• The forget gate uses a sigmoid activation function to produce
an output between 0 and 1 for each piece of information in the
cell state:
• A value of 0 indicates "completely forget this information."
• A value of 1 indicates "completely retain this information."
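In the same notation, the forget gate is:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

with each component of f_t between 0 and 1, applied pointwise to the previous cell state C_{t−1}.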
Memory Update
• The memory update in an LSTM (Long Short-Term Memory) network
involves modifying the cell state based on a combination of new information
and retained past information. This process is a key aspect of how LSTMs
manage long-term dependencies in sequential data.
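Concretely, the update combines the two gates (standard formulation):

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t

The forget gate scales how much of the old memory is kept, and the input gate scales how much of the new candidate information is added.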
LSTM complete
“Asif eats biryani almost every
day, it shouldn’t be hard to
guess that his favorite cuisine is
Pakistani."
"Asif eats biryani almost every
day, it shouldn’t be hard to guess
that his favorite cuisine is
Pakistani. His brother Ahmed,
however, is a lover of pasta and
cheese, which means Ahmed’s
favorite cuisine is Italian."

Cell state: biryani
"Asif eats biryani almost every
day, it shouldn’t be hard to guess
that his favorite cuisine is
Pakistani. His brother Ahmed,
however, is a lover of pasta and
cheese, which means Ahmed’s
favorite cuisine is Italian."

Cell state: (the forget gate drops "biryani" as the subject switches to Ahmed)
"Asif eats biryani almost every
day, it shouldn’t be hard to guess
that his favorite cuisine is
Pakistani. His brother Ahmed,
however, is a lover of pasta and
cheese, which means Ahmed’s
favorite cuisine is Italian."

Cell state: pasta

[Slide diagram: the old memory of "biryani" is forgotten (forget gate), a new memory of "pasta" is added (input gate), and the output points to Italian cuisine.]
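Putting the pieces together, here is a minimal NumPy sketch of one complete LSTM step (forget gate, input gate and candidate, memory update, output gate). The weight layout and sizes are illustrative assumptions, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_{t-1}, x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)               # forget gate: what to throw away from memory
    i = sigmoid(i)               # input gate: how much new information to write
    o = sigmoid(o)               # output gate: what part of the memory to expose
    g = np.tanh(g)               # candidate values for the new memory
    c_t = f * c_prev + i * g     # memory update: forget old info, add new info
    h_t = o * np.tanh(c_t)       # hidden state / output at this time step
    return h_t, c_t

# Illustrative usage: 3 input features, 4 hidden units, a toy sequence of 5 steps.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_hid + n_in)) * 0.1
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
```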
LSTMs
• It’s still possible for LSTMs to suffer from
vanishing/exploding gradients, but it’s far less likely
than with vanilla RNNs:
• If an RNN wishes to preserve information over long contexts, it must
delicately find a recurrent weight matrix 𝑊 that isn’t too large
or small.
• LSTMs, however, have three separate mechanisms that adjust the
flow of information (e.g., if the forget gate is saturated at 1, so that
forgetting is switched off, all of the old memory is preserved).
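A one-line sketch of why the cell-state path helps: from C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t, the derivative of C_t with respect to C_{t−1} along the memory path is just f_t (elementwise). The backward signal is therefore scaled by the forget gate at each step rather than repeatedly multiplied by the same recurrent matrix 𝑊, and when the forget gate sits near 1 the gradient passes through almost unchanged.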
LSTM
• Advantages:
• Handles long-term dependencies
• Mitigates vanishing gradients
• Excels in sequential tasks like NLP and time-series.
• Issues:
• Computationally expensive.
• Prone to overfitting.
• Outperformed by newer models like Transformers in some
cases.
Reading:
• https://colah.github.io/posts/2015-08-Understanding-LSTMs/

• In addition to the original authors, a lot of people contributed to the
modern LSTM. A non-comprehensive list is: Felix Gers, Fred Cummins,
Santiago Fernandez, Justin Bayer, Daan Wierstra, Julian Togelius,
Faustino Gomez, Matteo Gagliolo, and Alex Graves.
