Understanding LSTM Networks - Colah's Blog

The document discusses recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. RNNs can remember previous inputs, allowing them to use information from the past to inform predictions. However, standard RNNs struggle with long-term dependencies where relevant information is far from the prediction point. LSTMs address this issue through their unique structure that allows information to persist indefinitely in the cell state without fading away. The cell state acts as a conveyor belt that can add or remove information through specialized gates that learn to open and close.

27/09/2023, 17:23 Understanding LSTM Networks -- colah's blog

Understanding LSTM Networks

Posted on August 27, 2015

Recurrent Neural Networks

Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based
on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your
thoughts have persistence.

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to
classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could
use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

Recurrent Neural Networks have loops.

In the above diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows
information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out
that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as
multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the
loop:

An unrolled recurrent neural network.
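
To make the unrolling concrete, here is a minimal NumPy sketch. The weight names (W_x, W_h, b), the sizes, and the choice of a single tanh layer for the module A are assumptions for illustration; the point is only that the same module is applied at every step and hands its output to its successor.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One copy of the repeating module A (assumed here to be a single tanh layer)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def run_rnn(xs, h0, W_x, W_h, b):
    """Unroll the loop: apply the same module to each input in order."""
    h = h0
    hs = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)  # the message passed to the successor
        hs.append(h)
    return hs

# Hypothetical sizes and random weights, just to run the sketch.
rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 5
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
xs = rng.normal(size=(steps, input_size))

hs = run_rnn(xs, np.zeros(hidden_size), W_x, W_h, b)
print(len(hs), hs[-1].shape)  # prints: 5 (4,)
```

Note that the weights are shared across all steps; only the hidden state h changes as the chain is traversed.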

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the
natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of
problems: speech recognition, language modeling, translation, image captioning… The list goes on. I’ll leave discussion
of the amazing feats one can achieve with RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable
Effectiveness of Recurrent Neural Networks (http://karpathy.github.io/2015/05/21/rnn-effectiveness/). But they really
are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for
many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural
networks are achieved with them. It’s these LSTMs that this essay will explore.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task,
such as using previous video frames to inform the understanding of the present frame. If RNNs could do this, they’d
be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language
model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the
clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such
cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use
the past information.

But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in
France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language,
but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible
for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick
parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them.
The problem was explored in depth by Hochreiter (1991) [German]
(http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf) and Bengio, et al. (1994)
(http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf), who found some pretty fundamental reasons why it might
be difficult.
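
One commonly cited reason is that gradients shrink as they are propagated back through many steps. The sketch below is a simplified numerical illustration of that effect, not a reproduction of the analyses cited above. It assumes a plain tanh RNN with hypothetical recurrent weights W_h scaled so that their largest singular value is 0.9, and tracks how strongly the state after t steps still depends on the initial state.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8

# Hypothetical recurrent weights, rescaled so the spectral norm is 0.9 (< 1).
W_h = rng.normal(size=(hidden_size, hidden_size))
W_h *= 0.9 / np.linalg.norm(W_h, 2)

h = np.zeros(hidden_size)
jacobian = np.eye(hidden_size)  # d h_t / d h_0, accumulated step by step
norms = []
for t in range(50):
    x_t = rng.normal(size=hidden_size)  # stand-in for the input contribution
    h = np.tanh(W_h @ h + x_t)
    # For a tanh layer, d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_h
    step_jacobian = np.diag(1.0 - h**2) @ W_h
    jacobian = step_jacobian @ jacobian
    norms.append(np.linalg.norm(jacobian, 2))

print(norms[0], norms[-1])
```

Each step multiplies the accumulated Jacobian by a matrix whose norm is at most 0.9, so after 50 steps its norm is bounded by 0.9^50, which is roughly 0.005. A learning signal from an input 50 steps back is therefore scaled down by at least that factor under these assumptions.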

Thankfully, LSTMs don’t have this problem!

LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning
long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997)
(http://www.bioinf.jku.at/publications/older/2604.pdf), and were refined and popularized by many people in following
work.[1] They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods
of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this
repeating module will have a very simple structure, such as a single tanh layer.

The repeating module in a standard RNN contains a single layer.

LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single
neural network layer, there are four, interacting in a very special way.

The repeating module in an LSTM contains four interacting layers.

Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step by step later. For now,
let’s just try to get comfortable with the notation we’ll be using.

In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink
circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.
Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to
different locations.

The Core Idea Behind LSTMs

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.

The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear
interactions. It’s very easy for information to just flow along it unchanged.

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called
gates.

Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a
pointwise multiplication operation.

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let
through. A value of zero means “let nothing through,” while a value of one means “let everything through!”

An LSTM has three of these gates, to protect and control the cell state.
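
As a minimal sketch of a single gate, the snippet below combines a sigmoid layer with a pointwise multiplication. The function name gate, the weights W and b, and the hand-picked numbers are assumptions chosen so that the three components are almost fully passed, almost fully blocked, and halved.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(signal, gate_input, W, b):
    """Scale each component of `signal` by a value between 0 and 1."""
    g = sigmoid(W @ gate_input + b)  # sigmoid layer: one number per component
    return g * signal                # pointwise multiplication

# Hypothetical values: the weights drive the gate towards 1, 0 and 0.5.
signal = np.array([2.0, -3.0, 0.5])
gate_input = np.array([1.0])
W = np.array([[10.0], [-10.0], [0.0]])
b = np.zeros(3)

out = gate(signal, gate_input, W, b)
print(out)  # first component nearly unchanged, second nearly zero, third halved
```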

Step-by-Step LSTM Walk Through

The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is
made by a sigmoid layer called the “forget gate layer.” It looks at h_{t-1} and x_t, and outputs a number between 0 and 1
for each number in the cell state C_{t-1}. A 1 represents “completely keep this” while a 0 represents “completely get rid
of this.”

Let’s go back to our example of a language model trying to predict the next word based on all the previous ones. In
such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used.
When we see a new subject, we want to forget the gender of the old subject.
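
A minimal sketch of the forget gate layer follows. It assumes the previous output and the current input are concatenated before the sigmoid layer (the notation section says merging lines denote concatenation); the names W_f and b_f, the sizes, and the random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), one value per cell state entry."""
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)

# Hypothetical sizes and random parameters.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_f = rng.normal(size=(hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)
h_prev = rng.normal(size=hidden_size)
x_t = rng.normal(size=input_size)

f_t = forget_gate(h_prev, x_t, W_f, b_f)
print(f_t)  # four numbers between 0 and 1
```

Entries of f_t near 1 keep the corresponding cell state entry, and entries near 0 erase it.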

The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new
candidate values, C̃_t, that could be added to the state. In the next step, we’ll combine these two to create an update to
the state.

In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the
old one we’re forgetting.
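
The two parts of this step can be sketched as follows, again assuming the layers act on the concatenation of the previous output and the current input. The parameter names (W_i, b_i, W_C, b_C) and the random values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate_and_candidates(h_prev, x_t, W_i, b_i, W_C, b_C):
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)      # which values to update, between 0 and 1
    C_tilde = np.tanh(W_C @ concat + b_C)  # candidate values, between -1 and 1
    return i_t, C_tilde

# Hypothetical sizes and random parameters.
rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
shape = (hidden_size, hidden_size + input_size)
W_i, W_C = rng.normal(size=shape), rng.normal(size=shape)
b_i, b_C = np.zeros(hidden_size), np.zeros(hidden_size)
h_prev = rng.normal(size=hidden_size)
x_t = rng.normal(size=input_size)

i_t, C_tilde = input_gate_and_candidates(h_prev, x_t, W_i, b_i, W_C, b_C)
print(i_t, C_tilde)
```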

It’s now time to update the old cell state, C_{t-1}, into the new cell state C_t. The previous steps already decided what
to do; we just need to actually do it.

We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C̃_t. These are the
new candidate values, scaled by how much we decided to update each state value.

In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and
add the new information, as we decided in the previous steps.
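
A small worked example of the update C_t = f_t * C_{t-1} + i_t * C̃_t, with hand-picked (hypothetical) gate values so the effect on each entry is easy to follow:

```python
import numpy as np

# Hypothetical values for one time step.
C_prev = np.array([2.0, -1.0, 4.0])   # old cell state C_{t-1}
f_t = np.array([1.0, 0.0, 0.5])       # keep, forget, half-keep
i_t = np.array([0.0, 1.0, 0.5])       # ignore, fully write, half-write
C_tilde = np.array([0.9, 0.7, -1.0])  # candidate values

# Pointwise: scale the old state, then add the scaled candidates.
C_t = f_t * C_prev + i_t * C_tilde
print(C_t)  # the values are 2.0, 0.7 and 1.5
```

The first entry is carried over unchanged, the second is replaced by its candidate, and the third is a blend of the two (0.5 * 4.0 + 0.5 * -1.0 = 1.5).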

Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered
version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the
cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate,
so that we only output the parts we decided to.

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in
case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we
know what form a verb should be conjugated into if that’s what follows next.
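
Putting the four layers together, one full step might be sketched as below. The parameter names, the small random initialization, and the helper init_params are assumptions for illustration; the structure follows the walk through above (forget gate, input gate, candidates, state update, output gate).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One step of the LSTM described above; returns (h_t, C_t)."""
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)      # what to throw away
    i_t = sigmoid(W_i @ concat + b_i)      # which values to update
    C_tilde = np.tanh(W_C @ concat + b_C)  # candidate values
    C_t = f_t * C_prev + i_t * C_tilde     # new cell state
    o_t = sigmoid(W_o @ concat + b_o)      # which parts to output
    h_t = o_t * np.tanh(C_t)               # filtered version of the cell state
    return h_t, C_t

def init_params(hidden_size, input_size, rng):
    """Hypothetical initialization: four (weight, bias) pairs."""
    params = []
    for _ in range(4):
        params.append(rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)))
        params.append(np.zeros(hidden_size))
    return params

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
params = init_params(hidden_size, input_size, rng)

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):  # a sequence of six inputs
    h, C = lstm_step(x_t, h, C, params)
print(h.shape, C.shape)  # prints: (4,) (4,)
```

Both h and C are carried from step to step: C is the conveyor belt, and h is the filtered view of it that the next step (and any downstream layer) sees.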

Variants on Long Short Term Memory

What I’ve described so far is a pretty normal LSTM. But not all LSTMs are the same as the above. In fact, it seems
like almost every paper involving LSTMs uses a slightly different version. The differences are minor, but it’s worth
mentioning some of them.

One popular LSTM variant, introduced by Gers & Schmidhuber (2000)
(ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf), is adding “peephole connections.” This means that we let the
gate layers look at the cell state.

The above diagram adds peepholes to all the gates, but many papers will give some peepholes and not others.
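
One way to sketch peepholes on all three gates is to add the cell state to what each gate layer sees. Feeding the cell state through a full weight matrix over the concatenation is an assumption made here for brevity, as are all parameter names; the forget and input gates look at the old state C_{t-1}, while the output gate looks at the freshly updated C_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x_t, h_prev, C_prev, params):
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    peek_prev = np.concatenate([C_prev, h_prev, x_t])  # gates also see C_{t-1}
    f_t = sigmoid(W_f @ peek_prev + b_f)
    i_t = sigmoid(W_i @ peek_prev + b_i)
    C_tilde = np.tanh(W_C @ np.concatenate([h_prev, x_t]) + b_C)
    C_t = f_t * C_prev + i_t * C_tilde
    peek_new = np.concatenate([C_t, h_prev, x_t])      # output gate sees C_t
    o_t = sigmoid(W_o @ peek_new + b_o)
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

# Hypothetical sizes and random parameters.
rng = np.random.default_rng(0)
H, I = 4, 3
gate_shape, cand_shape = (H, 2 * H + I), (H, H + I)
params = [
    rng.normal(scale=0.1, size=gate_shape), np.zeros(H),  # forget gate
    rng.normal(scale=0.1, size=gate_shape), np.zeros(H),  # input gate
    rng.normal(scale=0.1, size=cand_shape), np.zeros(H),  # candidates
    rng.normal(scale=0.1, size=gate_shape), np.zeros(H),  # output gate
]
h_t, C_t = peephole_lstm_step(rng.normal(size=I), np.zeros(H), np.ones(H), params)
print(h_t.shape, C_t.shape)  # prints: (4,) (4,)
```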

Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we
should add new information to, we make those decisions together. We only forget when we’re going to input something
in its place. We only input new values to the state when we forget something older.
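
With coupled gates, the input gate is replaced by 1 - f_t, so the update becomes C_t = f_t * C_{t-1} + (1 - f_t) * C̃_t. A small worked example with hand-picked (hypothetical) values:

```python
import numpy as np

# Hypothetical values for one time step.
C_prev = np.array([2.0, -1.0, 4.0])   # old cell state C_{t-1}
f_t = np.array([1.0, 0.0, 0.25])      # forget gate
C_tilde = np.array([0.9, 0.7, -1.0])  # candidate values

# Whatever fraction is forgotten is exactly the fraction that is written.
C_t = f_t * C_prev + (1.0 - f_t) * C_tilde
print(C_t)  # the values are 2.0, 0.7 and 0.25
```

Each entry of the new state is a weighted average of the old value and the candidate, so nothing is written without forgetting, and nothing is forgotten without writing.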

A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al.
(2014) (http://arxiv.org/pdf/1406.1078v3.pdf). It combines the forget and input gates into a single “update gate.” It
also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than
standard LSTM models, and has been growing increasingly popular.
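
A minimal sketch of one GRU step is shown below. The update gate z_t plays the combined forget/input role, and there is a single state h. The reset gate r_t is one of the “other changes” and is taken from the usual formulation of the GRU rather than from the text above; biases are omitted and the weight names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)  # update gate
    r_t = sigmoid(W_r @ concat)  # reset gate
    # Candidate state, computed from a reset-scaled view of the old state.
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # Interpolate between the old state and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Hypothetical sizes and random parameters.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h.shape)  # prints: (4,)
```

The interpolation in the last line of gru_step has the same shape as the coupled forget and input gates described earlier, applied directly to the hidden state.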

These are only a few of the most notable LSTM variants. There are lots of others, like Depth Gated RNNs by Yao, et
al. (2015) (http://arxiv.org/pdf/1508.03790v2.pdf). There are also some completely different approaches to tackling
long-term dependencies, like Clockwork RNNs by Koutnik, et al. (2014) (http://arxiv.org/pdf/1402.3511v1.pdf).

Which of these variants is best? Do the differences matter? Greff, et al. (2015) (http://arxiv.org/pdf/1503.04069.pdf)
do a nice comparison of popular variants, finding that they’re all about the same. Jozefowicz, et al. (2015)
(http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf) tested more than ten thousand RNN architectures, finding
some that worked better than LSTMs on certain tasks.

Conclusion
Earlier, I mentioned the remarkable results people are achieving with RNNs. Essentially all of these are achieved using
LSTMs. They really work a lot better for most tasks!

Written down as a set of equations, LSTMs look pretty intimidating. Hopefully, walking through them step by step in
this essay has made them a bit more approachable.

LSTMs were a big step in what we can accomplish with RNNs. It’s natural to wonder: is there another big step? A
common opinion among researchers is: “Yes! There is a next step and it’s attention!” The idea is to let every step of an
RNN pick information to look at from some larger collection of information. For example, if you are using an RNN to
create a caption describing an image, it might pick a part of the image to look at for every word it outputs. In fact, Xu,
et al. (2015) (http://arxiv.org/pdf/1502.03044v2.pdf) do exactly this – it might be a fun starting point if you want to
explore attention! There have been a number of really exciting results using attention, and it seems like a lot more are
around the corner…
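
As a rough sketch of the idea, the snippet below lets a query (standing in for the RNN’s current state) pick from a collection of items by computing a softmax-weighted average. Dot-product scoring is an assumption chosen for brevity and is not claimed to be the scoring used in the paper cited above; the names attend, items and query are hypothetical.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def attend(query, items):
    """Return a weighted average of `items`, weighted by relevance to `query`."""
    scores = items @ query     # one relevance score per item (dot product)
    weights = softmax(scores)  # non-negative, sums to 1
    context = weights @ items  # what the step "looks at"
    return context, weights

# Hypothetical data: 5 items (e.g. image regions) with 4 features each.
rng = np.random.default_rng(0)
items = rng.normal(size=(5, 4))
query = rng.normal(size=4)

context, weights = attend(query, items)
print(weights.sum(), context.shape)
```

At each output step the query changes, so the weights shift and a different part of the collection dominates the context vector.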

Attention isn’t the only exciting thread in RNN research. For example, Grid LSTMs by Kalchbrenner, et al. (2015)
(http://arxiv.org/pdf/1507.01526v1.pdf) seem extremely promising. Work using RNNs in generative models – such as
Gregor, et al. (2015) (http://arxiv.org/pdf/1502.04623.pdf), Chung, et al. (2015)
(http://arxiv.org/pdf/1506.02216v3.pdf), or Bayer & Osendorfer (2015) (http://arxiv.org/pdf/1411.7610v3.pdf) – also
seems very interesting. The last few years have been an exciting time for recurrent neural networks, and the coming
ones promise to only be more so!

Acknowledgments
I’m grateful to a number of people for helping me better understand LSTMs, commenting on the visualizations, and
providing feedback on this post.

I’m very grateful to my colleagues at Google for their helpful feedback, especially Oriol Vinyals
(http://research.google.com/pubs/OriolVinyals.html), Greg Corrado
(http://research.google.com/pubs/GregCorrado.html), Jon Shlens
(http://research.google.com/pubs/JonathonShlens.html), Luke Vilnis (http://people.cs.umass.edu/~luke/), and Ilya
Sutskever (http://www.cs.toronto.edu/~ilya/). I’m also thankful to many other friends and colleagues for taking the
time to help me, including Dario Amodei (https://www.linkedin.com/pub/dario-amodei/4/493/393) and Jacob
Steinhardt (http://cs.stanford.edu/~jsteinhardt/). I’m especially thankful to Kyunghyun Cho
(http://www.kyunghyuncho.me/) for extremely thoughtful correspondence about my diagrams.

Before this post, I practiced explaining LSTMs during two seminar series I taught on neural networks. Thanks to
everyone who participated in those for their patience with me, and for their feedback.

1. In addition to the original authors, a lot of people contributed to the modern LSTM. A non-comprehensive list is:
Felix Gers, Fred Cummins, Santiago Fernandez, Justin Bayer, Daan Wierstra, Julian Togelius, Faustino Gomez,
Matteo Gagliolo, and Alex Graves (https://scholar.google.com/citations?user=DaFHynwAAAAJ&hl=en).

