
3/8/25, 10:21 AM What is LSTM - Long Short Term Memory? - GeeksforGeeks

What is LSTM – Long Short Term Memory?


Last Updated : 27 Feb, 2025

Long Short-Term Memory (LSTM) is an enhanced version of the Recurrent Neural Network (RNN) designed by Hochreiter & Schmidhuber. LSTMs can capture long-term dependencies in sequential data, making them ideal for tasks like language translation, speech recognition and time series forecasting.

Unlike traditional RNNs, which use a single hidden state passed through time, LSTMs introduce a memory cell that holds information over extended periods, addressing the challenge of learning long-term dependencies.

Problem with Long-Term Dependencies in RNN


Recurrent Neural Networks (RNNs) are designed to handle sequential data by maintaining a hidden state that captures information from previous time steps. However, they often face challenges in learning long-term dependencies, where information from distant time steps becomes crucial for making accurate predictions about the current state. This is known as the vanishing gradient or exploding gradient problem.

Vanishing Gradient: When training a model over time, the gradients (which help the model learn) can shrink as they pass through many steps. This makes it hard for the model to learn long-term patterns, since earlier information becomes almost irrelevant.

Exploding Gradient: Sometimes, gradients can grow too large, causing instability. This makes it difficult for the model to learn properly, as the updates to the model become erratic and unpredictable.

Both of these issues make it challenging for standard RNNs to effectively capture long-term dependencies in sequential data.
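To make the gradient problem concrete, here is a tiny, self-contained sketch (not from the article): backpropagation through time multiplies the gradient by a recurrent factor once per step, so a factor below 1 shrinks it toward zero while a factor above 1 blows it up.

```python
def backprop_factor(weight, steps):
    """Scale of a gradient after flowing back through `steps` time steps,
    assuming each step multiplies it by the same recurrent weight."""
    grad = 1.0
    for _ in range(steps):
        grad *= weight
    return grad

vanishing = backprop_factor(0.5, 50)   # shrinks toward 0
exploding = backprop_factor(1.5, 50)   # grows without bound
```

After only 50 steps the 0.5-weighted gradient is below 1e-15, which is why early inputs barely influence learning in a plain RNN.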

LSTM Architecture
The LSTM architecture involves a memory cell which is controlled by three gates: the input gate, the forget gate and the output gate. These gates decide what information to add to, remove from and output from the memory cell.

Input gate: Controls what information is added to the memory cell.
Forget gate: Determines what information is removed from the memory cell.
Output gate: Controls what information is output from the memory cell.

This allows LSTM networks to selectively retain or discard information as it flows through the network, enabling them to learn long-term dependencies. The network also has a hidden state, which acts as its short-term memory. The hidden state is updated using the current input, the previous hidden state and the current state of the memory cell.

Working of LSTM
LSTM architecture has a chain structure that contains four neural
networks and different memory blocks called cells.


Information is retained by the cells and the memory manipulations are done by the gates. There are three gates:

Forget Gate

The forget gate removes information that is no longer useful from the cell state. Two inputs, x_t (the input at the current time step) and h_t-1 (the previous hidden state), are fed to the gate and multiplied with weight matrices, followed by the addition of a bias. The result is passed through a sigmoid activation function, which gives an output between 0 and 1. If the output for a particular cell state is close to 0, that piece of information is forgotten; an output close to 1 means the information is retained for future use.

The equation for the forget gate is:

f_t = σ(W_f · [h_t-1, x_t] + b_f)

where:

W_f represents the weight matrix associated with the forget gate.
[h_t-1, x_t] denotes the concatenation of the previous hidden state and the current input.
b_f is the bias term of the forget gate.
σ is the sigmoid activation function.
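The formula above can be sketched in plain Python (the tiny dimensions and weight values here are illustrative, not from the article): concatenate h_t-1 and x_t, apply the weight matrix and bias, then squash with a sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_t-1, x_t] + b_f), one value per memory cell."""
    concat = h_prev + x_t  # [h_t-1, x_t]: list concatenation
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_f, b_f)]

# One memory cell, hidden size 1, input size 1 (illustrative values):
f_t = forget_gate([0.2], [1.0], W_f=[[0.5, -0.5]], b_f=[0.0])
```

Each entry of the result lies strictly between 0 and 1, acting as a soft "keep fraction" for the corresponding cell-state entry.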


Input gate

The input gate adds useful information to the cell state. First, a sigmoid function regulates which values should be remembered, using the inputs h_t-1 and x_t, just like the forget gate. Then, a tanh function creates a vector of candidate values in the range -1 to +1 from h_t-1 and x_t. Finally, the candidate vector and the regulated values are multiplied element-wise to obtain the useful information to add. The equations for the input gate are:

i_t = σ(W_i · [h_t-1, x_t] + b_i)

Ĉ_t = tanh(W_c · [h_t-1, x_t] + b_c)

We multiply the previous cell state by f_t, discarding the information we previously chose to forget. We then add i_t ⊙ Ĉ_t, the candidate values scaled by how much we decided to update each state value:

C_t = f_t ⊙ C_t-1 + i_t ⊙ Ĉ_t

where

⊙ denotes element-wise multiplication.
tanh is the tanh activation function.
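The update rule above is just an element-wise blend of old memory and new candidates. A minimal sketch (the variable names and example values are mine, not the article's):

```python
def update_cell_state(f_t, i_t, c_prev, c_hat):
    """C_t = f_t (x) C_t-1 + i_t (x) Ĉ_t, all products element-wise."""
    return [f * c + i * ch for f, i, c, ch in zip(f_t, i_t, c_prev, c_hat)]

# Forget gate fully open, input gate closed: memory is kept unchanged.
kept = update_cell_state([1.0], [0.0], c_prev=[2.0], c_hat=[5.0])        # [2.0]
# Forget gate closed, input gate open: memory is overwritten.
overwritten = update_cell_state([0.0], [1.0], c_prev=[2.0], c_hat=[5.0])  # [5.0]
```

Because the cell state is carried forward additively rather than through repeated matrix multiplications, gradients along it do not shrink the way they do in a plain RNN.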


Output gate

The output gate extracts useful information from the current cell state to present as output. First, a tanh function is applied to the cell state to generate a vector. Then, a sigmoid function regulates which values should be output, using the inputs h_t-1 and x_t. Finally, the vector and the regulated values are multiplied element-wise and sent as the output, which also serves as the hidden state for the next cell. The equation for the output gate is:

o_t = σ(W_o · [h_t-1, x_t] + b_o)

The new hidden state is then h_t = o_t ⊙ tanh(C_t).
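Putting the three gates together, one full LSTM step for a single unit can be sketched as follows (the scalar hidden size and the all-zero parameters in the usage example are illustrative assumptions, not values from the article):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step for a single unit.

    W maps each gate name to [w_h, w_x]; b maps it to a bias scalar.
    Gates: 'f' forget, 'i' input, 'c' candidate, 'o' output.
    """
    def linear(g):
        return W[g][0] * h_prev + W[g][1] * x_t + b[g]

    f = sigmoid(linear('f'))            # forget gate
    i = sigmoid(linear('i'))            # input gate
    c_hat = math.tanh(linear('c'))      # candidate values
    o = sigmoid(linear('o'))            # output gate
    c_t = f * c_prev + i * c_hat        # C_t = f ⊙ C_t-1 + i ⊙ Ĉ_t
    h_t = o * math.tanh(c_t)            # h_t = o ⊙ tanh(C_t)
    return h_t, c_t

# With all-zero parameters every gate outputs 0.5 and the candidate is 0,
# so the cell state is simply halved at each step:
W = {g: [0.0, 0.0] for g in 'fico'}
b = {g: 0.0 for g in 'fico'}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, W=W, b=b)  # c_t == 1.0
```

A real layer applies the same five equations with vectors and matrices instead of scalars, but the data flow is identical.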

Bidirectional LSTM Model


Bidirectional LSTM (BiLSTM) is a variation of the standard LSTM that processes sequential data in both forward and backward directions. This allows BiLSTMs to capture longer-range dependencies in sequential data than traditional LSTMs, which can only process sequential data in one direction.

BiLSTMs are made up of two LSTM networks: one that processes the input sequence in the forward direction and one that processes it in the backward direction. The outputs of the two LSTM networks are then combined to produce the final output.
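The two-network idea can be sketched independently of the cell internals. Here `step_fn` stands in for any recurrent update, and the toy running-sum step in the usage example is my own illustration, not the article's:

```python
def bidirectional_pass(seq, step_fn, h0=0.0):
    """Run a recurrent step over seq forward and backward, then pair up
    the per-position outputs, as a BiLSTM concatenates its two halves."""
    fwd, h = [], h0
    for x in seq:                    # left-to-right pass
        h = step_fn(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(seq):          # right-to-left pass
        h = step_fn(x, h)
        bwd.append(h)
    bwd.reverse()                    # align backward outputs with positions
    return list(zip(fwd, bwd))

# Toy recurrent step: running sum. Each position now "sees" both the
# prefix sum (forward pass) and the suffix sum (backward pass).
out = bidirectional_pass([1, 2, 3], lambda x, h: x + h)
```

Pairing the two directions is why a BiLSTM can use context from both before and after a position when producing its output.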

LSTM models, including BiLSTMs, have demonstrated state-of-the-art performance across various tasks such as machine translation, speech recognition and text summarization.

LSTM networks can be stacked to form deeper models, allowing them to learn more complex patterns in data. Each layer in the stack captures different levels of information and time-based relationships in the input.

Applications of LSTM
Some well-known applications of LSTM include:

Language Modeling: Used in tasks like language modeling, machine translation and text summarization. These networks learn the dependencies between words in a sentence to generate coherent and grammatically correct sentences.
Speech Recognition: Used in transcribing speech to text and recognizing spoken commands. By learning speech patterns, they can match spoken words to the corresponding text.
Time Series Forecasting: Used for predicting stock prices, weather and energy consumption. They learn patterns in time series data to predict future events.
Anomaly Detection: Used for detecting fraud or network intrusions. These networks can identify patterns in data that deviate drastically from the norm and flag them as potential anomalies.
Recommender Systems: Used in recommendation tasks like suggesting movies, music and books. They learn user behavior patterns to provide personalized suggestions.
Video Analysis: Applied in tasks such as object detection, activity recognition and action classification. When combined with Convolutional Neural Networks (CNNs), they help analyze video data and extract useful information.

LSTM vs RNN


Feature                       | LSTM (Long Short-Term Memory)                                                                                          | RNN (Recurrent Neural Network)
------------------------------|------------------------------------------------------------------------------------------------------------------------|-------------------------------
Memory                        | Has a special memory unit that allows it to learn long-term dependencies in sequential data                              | Does not have a memory unit
Directionality                | Can be trained to process sequential data in both forward and backward directions                                        | Can only be trained to process sequential data in one direction
Training                      | More difficult to train than RNN due to the complexity of the gates and memory unit                                      | Easier to train than LSTM
Long-term dependency learning | Yes                                                                                                                      | Limited
Ability to learn sequential data | Yes                                                                                                                   | Yes
Applications                  | Machine translation, speech recognition, text summarization, natural language processing, time series forecasting        | Natural language processing, machine translation, speech recognition, image processing, video processing

Frequently Asked Questions (FAQs) on LSTM


What is LSTM and why it is used?

LSTM, or Long Short-Term Memory, is a type of recurrent neural network designed for sequence tasks, excelling in capturing and utilizing long-term dependencies in data.

How does LSTM work?

LSTMs use a cell state to store information about past inputs. This
cell state is updated at each step of the network, and the network
uses it to make predictions about the current input. The cell state
is updated using a series of gates that control how much
information is allowed to flow into and out of the cell.

What are LSTM examples?

LSTM (Long Short-Term Memory) examples include speech recognition, machine translation and time series prediction, leveraging its ability to capture long-term dependencies in sequential data.

What is the difference between LSTM and Gated Recurrent Unit (GRU)?

LSTM has a separate cell state and a gating mechanism with three gates that control information flow, whereas GRU has a simpler mechanism with only update and reset gates. LSTM is more powerful but slower to train, while GRU is simpler and faster.

What is the difference between LSTM and RNN?

RNNs have a simple recurrent structure with unidirectional information flow.
LSTMs have a gating mechanism that controls information flow and a cell state for long-term memory.
LSTMs generally outperform RNNs in tasks that require learning long-term dependencies.

Is LSTM faster than CNN?

No, LSTMs and CNNs serve different purposes. LSTMs are for
sequential data; CNNs are for spatial data.

Is LSTM faster than GRU?

Generally, no. GRUs have fewer parameters, which can lead to faster training compared to LSTMs.


@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/