
Introduction to Long Short Term Memory (LSTM)


Objective

- LSTM is a special kind of recurrent neural network capable of handling long-term dependencies.
- Understand the architecture and working of an LSTM network.

Introduction

Long Short Term Memory (LSTM) is an advanced recurrent neural network (RNN), a sequential network that allows information to persist. It is capable of handling the vanishing gradient problem faced by a standard RNN.

Let’s say while watching a video you remember the previous scene, or while reading a book you know what happened in an earlier chapter. RNNs work similarly: they remember previous information and use it to process the current input. The shortcoming of RNNs is that they cannot remember long-term dependencies due to the vanishing gradient. LSTMs are explicitly designed to avoid this long-term dependency problem.


LSTM Architecture

At a high level, an LSTM works very much like an RNN cell. Here is the internal functioning of the LSTM network. The LSTM consists of three parts, and each part performs an individual function.

The first part chooses whether the information coming from the previous timestamp should be remembered or is irrelevant and can be forgotten. In the second part, the cell tries to learn new information from the input to this cell. Finally, in the third part, the cell passes the updated information from the current timestamp to the next timestamp.

These three parts of an LSTM cell are known as gates. The first part is called the Forget gate, the second part is known as the Input gate, and the last one is the Output gate.

Just like a simple RNN, an LSTM also has a hidden state, where H(t-1) represents the hidden state of the previous timestamp and Ht is the hidden state of the current timestamp. In addition to that, an LSTM also has a cell state, represented by C(t-1) and C(t) for the previous and current timestamps respectively.

Here the hidden state is known as Short term memory and the cell state is known as Long term memory.

It is interesting to note that the cell state carries information across all the timestamps.

Let’s take an example to understand how an LSTM works. Here we have two sentences separated by a full stop. The first sentence is “Bob is a nice person” and the second sentence is “Dan, on the other hand, is evil”. It is very clear that in the first sentence we are talking about Bob, and as soon as we encounter the full stop (.) we start talking about Dan.

As we move from the first sentence to the second, our network should realize that we are no longer talking about Bob; our subject is now Dan. Here, the Forget gate of the network allows it to forget about Bob. Let’s understand the roles played by these gates in the LSTM architecture.

Forget Gate

In a cell of the LSTM network, the first step is to decide whether we should keep the information from the previous timestamp or forget it. Here is the equation for the forget gate.
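In standard LSTM notation, and using the symbols defined just below, the forget gate is computed as:

ft = σ(Xt · Uf + Ht-1 · Wf)

(a bias term is usually included as well, but it is omitted here for simplicity).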

Let’s try to understand the equation. Here:

Xt: input to the current timestamp
Uf: weight matrix associated with the input
Ht-1: the hidden state of the previous timestamp
Wf: the weight matrix associated with the hidden state

Later, a sigmoid function is applied over it. That makes ft a number between 0 and 1. This ft is then multiplied with the cell state of the previous timestamp, as shown below.
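Elementwise, this product is ft ⊙ Ct-1: each component of the previous cell state is scaled by the corresponding gate value.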
If ft is 0, the network will forget everything, and if the value of ft is 1, it will forget nothing. Let’s get back to our example: the first sentence was talking about Bob, and after the full stop the network will encounter Dan; in an ideal case, the network should forget about Bob.

Input Gate

Let’s take another example:

“Bob knows swimming. He told me over the phone that he had served in the navy for four long years.”

So, in both these sentences, we are talking about Bob. However, they give different kinds of information about him. The first sentence tells us that he knows swimming, whereas the second tells us that he uses the phone and served in the navy for four years.

Now just think about it: based on the context given in the first sentence, which information in the second sentence is critical? That he used the phone to tell us, or that he served in the navy? In this context, it doesn’t matter whether he used the phone or any other medium of communication to pass on the information. The fact that he was in the navy is the important information, and this is something we want our model to remember. This is the task of the Input gate.

The Input gate is used to quantify the importance of the new information carried by the input. Here is the equation of the input gate.
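In the same form as the forget gate:

it = σ(Xt · Ui + Ht-1 · Wi)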

Here:

Xt: input at the current timestamp t
Ui: weight matrix associated with the input
Ht-1: the hidden state at the previous timestamp
Wi: the weight matrix of the input associated with the hidden state

Again, a sigmoid function is applied over it. As a result, the value of it (the input gate activation) at timestamp t will be between 0 and 1.

New Information

Now, the new information that needs to be passed to the cell state is a function of the hidden state at the previous timestamp t-1 and the input x at timestamp t. The activation function here is tanh. Due to the tanh function, the value of the new information will be between -1 and 1. If the value of Nt is negative, the information is subtracted from the cell state, and if the value is positive, the information is added to the cell state at the current timestamp.
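In the same notation, the candidate (new) information is usually written as:

Nt = tanh(Xt · Uc + Ht-1 · Wc)

where Uc and Wc are the input and hidden-state weight matrices for the candidate; these names are chosen here by analogy with the gate weights above.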

However, Nt won’t be added directly to the cell state. Here comes the updated equation:
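Ct = ft ⊙ Ct-1 + it ⊙ Nt

(where ⊙ denotes elementwise multiplication).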

Here, Ct-1 is the cell state at the previous timestamp and the other terms are the values we have calculated previously.

Output Gate

Now consider this sentence

“Bob single-handedly fought the enemy and died for his country. For his contributions, brave________ .”

During this task, we have to complete the second sentence. Now, the minute we see the word brave, we know that we are talking about a person. In the sentence, only Bob is brave; we cannot say the enemy is brave or the country is brave. So, based on the current expectation, we have to give a relevant word to fill in the blank. That word is our output, and this is the function of our Output gate.

Here is the equation of the Output gate, which is pretty similar to the two previous gates.
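Keeping the same notation as the previous gates:

Ot = σ(Xt · Uo + Ht-1 · Wo)

where Uo and Wo are the output gate’s input and hidden-state weight matrices, again named by analogy with the gates above.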

Its value will also lie between 0 and 1 because of the sigmoid function. Now, to calculate the current hidden state, we will use Ot and the tanh of the updated cell state, as shown below.
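Ht = Ot ⊙ tanh(Ct)

The tanh squashes the updated cell state to the range -1 to 1, and the output gate controls how much of it is exposed as the new hidden state.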

It turns out that the hidden state is a function of the long-term memory (Ct) and the current output. If you need to take the output of the current timestamp, just apply the softmax activation on the hidden state Ht.

Here the token with the maximum score in the output is the prediction.
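To tie all of the gates together, here is a minimal NumPy sketch of a single LSTM cell step that follows the equations above. It is an illustrative sketch only: the weight names mirror the notation used in this article, biases are omitted to match the equations as written, and in practice you would use the optimized LSTM layers provided by frameworks such as Keras or PyTorch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM cell step following the gate equations above.

    x_t    : input vector Xt at the current timestamp, shape (input_dim,)
    h_prev : hidden state H(t-1), shape (hidden_dim,)
    c_prev : cell state C(t-1), shape (hidden_dim,)
    params : dict of input weights U and hidden-state weights W per gate.
    """
    U, W = params["U"], params["W"]

    f_t = sigmoid(x_t @ U["f"] + h_prev @ W["f"])   # forget gate
    i_t = sigmoid(x_t @ U["i"] + h_prev @ W["i"])   # input gate
    n_t = np.tanh(x_t @ U["c"] + h_prev @ W["c"])   # new (candidate) information
    o_t = sigmoid(x_t @ U["o"] + h_prev @ W["o"])   # output gate

    c_t = f_t * c_prev + i_t * n_t                  # updated cell state (long-term memory)
    h_t = o_t * np.tanh(c_t)                        # updated hidden state (short-term memory)
    return h_t, c_t

# Example usage with small random weights (biases omitted for simplicity).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {
    "U": {g: rng.normal(size=(input_dim, hidden_dim)) * 0.1 for g in "fico"},
    "W": {g: rng.normal(size=(hidden_dim, hidden_dim)) * 0.1 for g in "fico"},
}

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a toy sequence of 5 timestamps
    h, c = lstm_cell_step(x, h, c, params)

print("final hidden state:", h)
print("final cell state:", c)
```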

For a more intuitive diagram of the LSTM network, I urge you all to go through this interesting blog post:

Understanding LSTM Networks

End Notes

To summarize, in this article we saw the architecture of the LSTM, a sequential model, and how it works in detail.

If you are looking to kick start your Data Science Journey and want every topic under one roof, your search
stops here. Check out Analytics Vidhya’s Certified AI & ML BlackBelt Plus Program

If you have any questions, let me know in the comments section!

Article Url - https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-memory-lstm/

Shipra Saxena
Shipra is a Data Science enthusiast, exploring Machine Learning and Deep Learning algorithms. She is also interested in Big Data technologies. She believes learning is a continuous process, so keep moving.
