LSTM - Introduction to Long Short-Term Memory
LSTM, an advanced form of Recurrent Neural Network (RNN), is crucial in Deep Learning
for processing time series and other sequential data.
Designed by Hochreiter and Schmidhuber,
LSTM effectively addresses the limitations of RNNs,
particularly the vanishing gradient problem,
making it far better at remembering long-term dependencies.
The network is built from gated cells that let it
selectively retain, update, and discard information,
which is pivotal for applications like video
processing and reading comprehension.
Need for LSTM
LSTM was introduced to tackle the
problems and challenges in Recurrent Neural
Networks.
An RNN is a type of neural network that carries its
previous output forward to help improve its future
predictions.
A vanilla RNN, however, has only a “short-term” memory:
an input at the beginning of the sequence stops
influencing the network’s output after a while,
perhaps after only 3 or 4 further inputs.
This is called the long-term dependency problem.
Example:
Let’s take this sentence.
The Sun rises in the ______.
An RNN could easily return the correct output
that the sun rises in the East as all the necessary
information is nearby.
Let’s take another example.
I was born in Japan, ……… and I speak
fluent ______.
In this sentence, the RNN would struggle to return
the correct output because it has to remember the
word Japan over a long span of the sequence. Since
an RNN has only a “short-term” memory, it does not
work well here. LSTM solves this problem by enabling
the network to remember long-term dependencies.
The other RNN problems are the vanishing
gradient and the exploding gradient.
Both arise during backpropagation through the
network. For example, suppose the gradient
contributed by each layer lies between 0
and 1. As these values are multiplied together layer
after layer, the product gets smaller and smaller,
ultimately becoming a value very close to 0. This is the
vanishing gradient problem. Conversely, when the
per-layer values are greater than 1, the exploding
gradient problem occurs: the product grows very large,
disrupting the training of the network. Again,
these problems are tackled in LSTMs.
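A rough numerical illustration of both effects (plain Python; the per-step factors 0.9 and 1.1 and the 50 steps are arbitrary toy values chosen only to show the trend, not numbers from the text above):

    # Toy illustration of repeated gradient multiplication across layers/time steps.
    steps = 50
    vanishing = 1.0
    exploding = 1.0
    for _ in range(steps):
        vanishing *= 0.9   # each factor < 1: the product shrinks toward 0
        exploding *= 1.1   # each factor > 1: the product blows up
    print(round(vanishing, 5))  # about 0.00515 after 50 steps
    print(round(exploding, 1))  # about 117.4 after 50 steps

Intuitively, the LSTM's additive cell-state update (described in the Cell State section below) is what helps the signal avoid this repeated shrinking or blow-up.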
Structure of LSTM
An LSTM cell consists of three gates:
a forget gate, an input gate, and an output gate.
The gates decide which information is
important and which information can be
forgotten.
The cell also maintains two states: the cell
state and the hidden state.
They are continuously updated and carry the
information from the previous to the current
time steps.
The cell state is the “long-term” memory,
while the hidden state is the “short-term”
memory. Now let’s look at each gate in detail.
Forget Gate:
Forget gate is responsible for deciding what
information should be removed from the cell
state.
It takes in the hidden state of the previous
time step and the current input and passes them
through a sigmoid activation function, which outputs a
value between 0 and 1, where 0 means forget
and 1 means keep.
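A minimal NumPy sketch of this gate (the weight matrix W_f, bias b_f, and the toy sizes are illustrative assumptions, not values given in the text):

    import numpy as np

    # Toy sizes and random parameters, for illustration only.
    hidden_size, input_size = 4, 3
    W_f = np.random.randn(hidden_size, hidden_size + input_size)  # forget-gate weights
    b_f = np.zeros(hidden_size)                                   # forget-gate bias

    h_prev = np.random.randn(hidden_size)   # previous hidden state
    x_t = np.random.randn(input_size)       # current input

    z = np.concatenate([h_prev, x_t])                 # combine the two inputs
    f_t = 1.0 / (1.0 + np.exp(-(W_f @ z + b_f)))      # sigmoid -> values in (0, 1)
    # f_t later multiplies the cell state element-wise: 0 means forget, 1 means keep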
Input Gate:
The Input Gate considers the current input
and the hidden state of the previous time step.
The input gate is used to update the cell
state value.
It has two parts.
The first part uses a sigmoid
activation function.
Its purpose is to decide what fraction of the
information is required.
The second part passes the same two values through
a tanh activation function,
which maps the data to the range -1 to 1.
To keep only the relevant information from the
tanh output, we multiply it by the
output of the sigmoid part.
This product is the output of the input gate, which
updates the cell state.
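A matching NumPy sketch of the input gate (again with assumed toy shapes and random weights W_i, W_c and biases b_i, b_c):

    import numpy as np

    # Toy sizes and random parameters, for illustration only.
    hidden_size, input_size = 4, 3
    W_i = np.random.randn(hidden_size, hidden_size + input_size)  # sigmoid-part weights
    W_c = np.random.randn(hidden_size, hidden_size + input_size)  # tanh-part weights
    b_i, b_c = np.zeros(hidden_size), np.zeros(hidden_size)

    h_prev = np.random.randn(hidden_size)
    x_t = np.random.randn(input_size)
    z = np.concatenate([h_prev, x_t])

    i_t = 1.0 / (1.0 + np.exp(-(W_i @ z + b_i)))  # what fraction to let in, in (0, 1)
    c_tilde = np.tanh(W_c @ z + b_c)              # candidate values, in (-1, 1)
    input_update = i_t * c_tilde                  # contribution added to the cell state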
Output Gate:
The output gate produces the hidden state for
the next time step.
The output gate has two parts.
The first part is a sigmoid function which,
as in the other two gates, decides what
fraction of the information to keep.
Next, the newly updated cell state is passed
through a tanh function and multiplied by
the output of the sigmoid function. The result
is the new hidden state.
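A NumPy sketch of the output gate (W_o, b_o and the stand-in cell state below are assumed toy values):

    import numpy as np

    # Toy sizes and random parameters, for illustration only.
    hidden_size, input_size = 4, 3
    W_o = np.random.randn(hidden_size, hidden_size + input_size)
    b_o = np.zeros(hidden_size)

    h_prev = np.random.randn(hidden_size)
    x_t = np.random.randn(input_size)
    c_t = np.random.randn(hidden_size)   # stand-in for the freshly updated cell state

    z = np.concatenate([h_prev, x_t])
    o_t = 1.0 / (1.0 + np.exp(-(W_o @ z + b_o)))  # how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)                      # the new hidden state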
Cell State:
The forget gate and input gate update the
cell state.
The cell state from the previous time step is
multiplied element-wise by the output of the forget gate.
The result is then summed with
the output of the input gate.
This updated cell state is then used to calculate the
hidden state in the output gate.
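Putting the three gates and the cell-state update together, one full time step might look like the simplified, single-example sketch below (parameter shapes and the toy usage at the end are assumptions for illustration, not a production implementation):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
        """One LSTM time step for a single example (no batching)."""
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(W_f @ z + b_f)          # forget gate
        i_t = sigmoid(W_i @ z + b_i)          # input gate (sigmoid part)
        c_tilde = np.tanh(W_c @ z + b_c)      # input gate (tanh part): candidate values
        c_t = f_t * c_prev + i_t * c_tilde    # cell state: forget old info, add new info
        o_t = sigmoid(W_o @ z + b_o)          # output gate
        h_t = o_t * np.tanh(c_t)              # new hidden state
        return h_t, c_t

    # Toy usage with random parameters (illustrative shapes only).
    H, D = 4, 3
    weights = [np.random.randn(H, H + D) for _ in range(4)]
    biases = [np.zeros(H) for _ in range(4)]
    h, c = np.zeros(H), np.zeros(H)
    h, c = lstm_step(np.random.randn(D), h, c, *weights, *biases)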
Example:
He went to ______
The model can only fill in the blank correctly
with the help of the next sentence:
And he got up early the next morning.
With this later sentence available, we can predict that
the blank is that he went to sleep. A BiLSTM model
can make this prediction because it also processes the
sequence backward, so information appearing later can
inform an earlier prediction. This is why BiLSTM often
performs better on sequential data.
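As a sketch (assuming PyTorch; the layer sizes and input shapes are arbitrary), a bidirectional LSTM simply runs one pass forward and one pass backward over the sequence and concatenates the two hidden states:

    import torch
    import torch.nn as nn

    # Arbitrary toy shapes: a batch of 2 sequences, length 5, 8 features per step.
    x = torch.randn(2, 5, 8)

    bilstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
    output, (h_n, c_n) = bilstm(x)

    print(output.shape)  # torch.Size([2, 5, 32]): forward and backward states concatenated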
Sequence to Sequence LSTMs or RNN Encoder-Decoders
Seq2Seq is essentially the many-to-many
architecture seen in RNNs. In a many-to-many
architecture, an input of arbitrary length is
given and an output of arbitrary length is
returned. This architecture is useful in
applications where the input and output
lengths vary. For example, one such
application is language translation, where a
sentence in one language rarely translates to a
sentence of the same length in another language.
In these situations, Seq2Seq LSTMs are used.
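A minimal encoder-decoder sketch in PyTorch (toy vocabulary sizes and dimensions assumed; teacher forcing, attention, padding, and other training details are omitted):

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        """Minimal LSTM encoder-decoder: the encoder's final states start the decoder."""
        def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, (h, c) = self.encoder(self.src_emb(src_ids))       # summarise the source
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
            return self.out(dec_out)                              # scores over target words

    # Toy usage: source length 7, target length 5 (the lengths need not match).
    model = Seq2Seq(src_vocab=100, tgt_vocab=120)
    src = torch.randint(0, 100, (2, 7))
    tgt = torch.randint(0, 120, (2, 5))
    print(model(src, tgt).shape)  # torch.Size([2, 5, 120])

The encoder compresses the source sequence into its final hidden and cell states, and the decoder unrolls from those states for as many steps as the target needs, which is what lets the input and output lengths differ.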