LSTM

Long Short-Term Memory (LSTM) is an advanced version of Recurrent Neural Networks (RNN) that effectively captures long-term dependencies in sequential data, making it suitable for tasks such as language translation and speech recognition. LSTMs address the vanishing and exploding gradient problems faced by traditional RNNs by incorporating a memory cell controlled by three gates: input, forget, and output. This architecture allows LSTMs to selectively manage information flow, enabling them to learn long-term patterns more effectively.

What is LSTM – Long Short-Term Memory?

Long Short-Term Memory (LSTM) is an enhanced version of the Recurrent Neural Network (RNN) designed by Hochreiter & Schmidhuber. LSTMs can capture long-term dependencies in sequential data, making them well suited for tasks like language translation, speech recognition and time series forecasting. Unlike traditional RNNs, which pass a single hidden state through time, LSTMs introduce a memory cell that holds information over extended periods, addressing the challenge of learning long-term dependencies.
Problem with Long-Term Dependencies in RNNs
Recurrent Neural Networks (RNNs) are designed to handle sequential data by maintaining a hidden state that captures information from previous time steps. However, they often struggle to learn long-term dependencies, where information from distant time steps is crucial for making an accurate prediction at the current step. This difficulty shows up as the vanishing gradient or exploding gradient problem; a small numerical sketch of both effects follows the list below.
• Vanishing Gradient: As gradients are propagated back through many time steps during training, they can shrink toward zero. This makes it hard for the model to learn long-term patterns, since the contribution of earlier inputs becomes almost irrelevant.

• Exploding Gradient: Conversely, gradients can grow too large, causing instability. The resulting weight updates become erratic and unpredictable, making it difficult for the model to learn properly.
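The Python sketch below illustrates both effects with assumed numbers (a single scalar recurrent factor and 50 time steps); it is not taken from any real network. Backpropagation through time multiplies the gradient by one such factor per step, so a factor slightly below 1 vanishes and a factor slightly above 1 explodes.

steps = 50
for w in (0.9, 1.1):           # assumed recurrent factors: one just below 1, one just above
    grad = 1.0
    for _ in range(steps):
        grad *= w              # one multiplicative factor per time step
    print(f"factor {w}: gradient after {steps} steps is about {grad:.6f}")

With these values the gradient shrinks to roughly 0.005 for the 0.9 factor (vanishing) and grows to roughly 117 for the 1.1 factor (exploding), even though both factors are close to 1.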
LSTM Architecture
The LSTM architecture is built around a memory cell controlled by three gates: the input gate, the forget gate and the output gate. These gates decide what information is added to, removed from and output from the memory cell.
•Input gate: Controls what information is added to the memory cell.

•Forget gate: Determines what information is removed from the memory cell.

•Output gate: Controls what information is output from the memory cell.

This lets LSTM networks selectively retain or discard information as it flows through the network, which is what allows them to learn long-term dependencies. The network also has a hidden state, which acts as its short-term memory; this memory is updated using the current input, the previous hidden state and the current state of the memory cell. A minimal sketch of one forward step is given below.
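The following NumPy sketch shows one forward step of an LSTM cell under the standard gate formulation described above. The function and parameter names (lstm_step, W_f, U_f, b_f and so on) are chosen for this example and do not come from any particular library.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    # One forward step of an LSTM cell; p holds illustrative weight matrices and biases.
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])   # candidate cell values
    c_t = f * c_prev + i * g        # forget part of the old cell state, add new information
    h_t = o * np.tanh(c_t)          # hidden state: a gated view of the memory cell
    return h_t, c_t

# Tiny usage example with random weights (input size 3, hidden size 4).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for gate in ("f", "i", "o", "g"):
    p[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p[f"U_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p[f"b_{gate}"] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a short sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, p)
print("final hidden state:", h)

Note how the forget gate scales the previous cell state while the input gate scales the new candidate values; the output gate then decides how much of the memory cell is exposed as the hidden state.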
