Long Short Term Memory (LSTM)
Sir. Asif Ahsan
LSTMs
• Central Idea: A memory cell (interchangeably called a block) which can
maintain its state over time, consisting of an explicit memory (aka the cell
state vector) and gating units which regulate the flow of information into
and out of the memory (the standard components are summarized below).
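As a point of reference, the standard LSTM formulation (e.g., the Colah post linked under Reading) uses two recurrent states and three gates; the symbols below are the common notation, introduced here for the later formulas rather than taken from the slides:
• C_t — cell state (the memory, or "conveyor belt")
• h_t — hidden state (the output at each time step)
• f_t, i_t, o_t — forget, input, and output gates, each a sigmoid layer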
Cell State (conveyor belt)
• Represents the memory of the LSTM
• Undergoes changes via forgetting of old memory (forget
gate) and addition of new memory (input gate)
GATES
• Gate: a sigmoid neural net layer followed by a pointwise
multiplication operator.
• Gates control the flow of information into and out of the
memory.
• Gates are driven by the concatenation of the output from the
previous time step and the current input, and optionally the
cell state vector (see the generic form below).
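In the standard notation (without the optional cell-state "peephole" connections), every gate has the same generic form: a sigmoid layer applied to the concatenation of the previous output and the current input, with per-gate weights W_g and bias b_g:

g_t = \sigma(W_g \cdot [h_{t-1}, x_t] + b_g)

Its output lies in (0, 1) and multiplies, elementwise, the signal it guards.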
Forget gate
• Controls what information to throw away from memory
Input gate
• The input gate decides how much of the current time step's input
information should be written into the cell state (see the formulas
below).
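In the standard formulation, the input gate i_t works together with a tanh layer that proposes candidate values to add to the memory (notation as introduced above, not from the slides):

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)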
Forget gate
• The forget gate uses a sigmoid activation function to produce
an output between 0 and 1 for each piece of information in the
cell state (see the formula below):
• A value of 0 indicates
• "completely forget this information."
• A value of 1 indicates
• "completely retain this information."
Memory Update
• The memory update in an LSTM (Long Short-Term Memory) network
modifies the cell state based on a combination of new information
and retained past information, as captured by the formula below.
This process is a key part of how LSTMs manage long-term
dependencies in sequential data.
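Combining the gates above, the standard cell-state update (with \odot denoting elementwise multiplication) is:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

i.e., forget part of the old memory and add part of the new candidate memory.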
LSTM complete
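A minimal single-time-step sketch of the complete LSTM cell in NumPy, following the standard formulation from the Colah post linked under Reading; the parameter names (W_f, W_i, W_c, W_o and their biases) are illustrative choices, not taken from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # Gates read the concatenation of the previous output and the current input
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate: what to erase from memory
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # input gate: how much new info to write
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])    # candidate new memory

    c_t = f_t * c_prev + i_t * c_tilde            # memory update: forget old, add new

    o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate: what part of memory to expose
    h_t = o_t * np.tanh(c_t)                      # new hidden state / output
    return h_t, c_t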
“Asif eats biryani almost every
day, it shouldn’t be hard to
guess that his favorite cuisine is
Pakistani."
"Asif eats biryani almost every
day, it shouldn’t be hard to guess
that his favorite cuisine is
Pakistani. His brother Ahmed,
however, is a lover of pasta and
cheese, which means Ahmed’s
favorite cuisine is Italian."
Cell state: carries the relevant context forward through the sequence.
When the subject switches from Asif to Ahmed, the forget gate can drop
the information about Asif's cuisine and the input gate can write in
Ahmed's preference for pasta and cheese, so the prediction "Italian"
uses the right context.
• In a vanilla RNN, a single recurrent weight matrix governs the flow of
information across time steps, so the influence of earlier inputs (and
the gradients) can become very large or small.
• LSTMs, however, have three separate mechanisms that adjust the
flow of information (e.g., the forget gate, if turned off, will
preserve all info), as the expression below suggests.
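One way to see why this helps (a standard simplification, ignoring the gates' own dependence on the earlier cell state): the cell-state recurrence is additive, so the gradient flowing back along the memory path is scaled by the forget gate rather than by repeated multiplication with a fixed weight matrix:

\frac{\partial C_t}{\partial C_{t-1}} \approx \mathrm{diag}(f_t)

When f_t is close to 1, information and gradients can pass through many time steps largely unchanged.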
LSTM
• Advantages:
• Handles long-term dependencies
• Mitigates the vanishing gradient problem
• Excels in sequential tasks like NLP and time-series.
• Issues:
• Computationally expensive.
• Prone to overfitting.
• Outperformed by newer models like Transformers in some
cases.
Reading:
• https://colah.github.io/posts/2015-08-Understanding-LSTMs/