
LSTM Material 1


Traditional Neural Networks
• Information does not persist.

Recurrent Neural Network (RNN)
• It is a network with loops in it.
• Information can persist.
• Can be thought of as multiple copies of the same network, each passing a message to its successor.

An unrolled recurrent neural network

A = a chunk of neural network: it looks at an input xt and outputs a value ht.
• A loop allows information to be passed from one step of the network to the next.
• RNN use cases: speech recognition, language modeling, translation, and image captioning.
• One of the appeals of RNNs: they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame.
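To make the unrolled picture concrete, here is a minimal sketch of one RNN step in NumPy. It is not from the material: the parameter names (W_xh, W_hh, b_h) and sizes are illustrative, and the chunk A is assumed to be a single tanh layer, as described under Structure below.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The chunk A: looks at the input x_t and the previous output h_{t-1},
    # and returns the new output h_t.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unrolling the loop: the same weights are reused at every step, and h is the
# "message" each copy of the network passes to its successor.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))
W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                          # initial state
for x_t in rng.standard_normal((5, input_size)):   # a toy sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```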
Structure:
• The repeating module in a standard RNN has a very simple structure, such as a single tanh layer.

Issues with RNN
• Where the gap between the relevant information and the place that it's needed is small, RNNs can learn to use the past information.
• But there are cases where we need more context. Example: "I grew up in France…(100 words)… I speak fluent French." Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It is entirely possible for the gap between the relevant information and the point where it is needed to become very large.
• Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
• Conclusion: RNNs cannot handle long-term dependencies. They can connect information when there is little intervening content, but as the number of words increases it becomes difficult for them to connect the information.

Long Short-Term Memory Networks (LSTM)
• Introduced by Hochreiter & Schmidhuber (1997).
• A special kind of RNN.
• Works much better than the standard version.
• Capable of learning long-term dependencies.
• Remembering information for long periods of time is practically their default behaviour.

Structure:

• In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.
• LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

The Core Idea Behind LSTMs


• The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
• It runs straight down the entire chain, with only some minor linear interactions.
• It is very easy for information to just flow along it unchanged.
• The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
• Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation.
• The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means "let nothing through," while a value of one means "let everything through!"

An LSTM has three of these gates, to protect and control the cell state.
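As a rough sketch of the gate idea (not code from the material), a gate is just a sigmoid layer whose output multiplies the state pointwise; the parameter names (W_gate, b_gate) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(h_prev, x_t, W_gate, b_gate):
    # Sigmoid layer: one number between 0 and 1 per component of the state,
    # describing how much of that component should be let through.
    return sigmoid(W_gate @ np.concatenate([h_prev, x_t]) + b_gate)

# Pointwise multiplication applies the gate: 0 lets nothing through,
# 1 lets everything through.
# filtered = gate(h_prev, x_t, W_gate, b_gate) * cell_state
```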
STEP-BY-STEP LSTM WALK THROUGH

First step:
• Decide what information we're going to throw away from the cell state.
• This decision is made by a sigmoid layer called the "forget gate layer." It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1.
• 1 represents "completely keep this" while a 0 represents "completely get rid of this."
• Example: The cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
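A minimal sketch of the forget gate layer just described, assuming NumPy; W_f and b_f are conventional (illustrative) parameter names, not defined in the material.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # f_t: looks at h_{t-1} and x_t and outputs a number between 0 and 1 for
    # each number in the cell state C_{t-1} (1 = completely keep, 0 = forget).
    return sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
```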
Second step:
• Decide what new information to store in the cell state. This has two parts.
  o First, a sigmoid layer called the "input gate layer" decides which values we'll update.
  o Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state.
  o In the next step, we'll combine these two to create an update to the state.
• Example: We'd want to add the gender of the new subject to the cell state, to replace the old one we're forgetting.
Third step:
• It's now time to update the old cell state, Ct−1, into the new cell state Ct.
• The previous steps already decided what to do; we just need to apply the changes.
• We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t. This is the new candidate values, scaled by how much we decided to update each state value.
• Example: this is where we'd drop the information about the old subject's gender and add the new information, as we decided in the previous steps.
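The update itself is just two pointwise operations; a minimal sketch, assuming ft, it and C̃t were computed as in the sketches above:

```python
def update_cell_state(C_prev, f_t, i_t, C_tilde):
    # Forget what we decided to forget, then add the new candidate values,
    # scaled by how much we decided to update each state value.
    return f_t * C_prev + i_t * C_tilde
```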
Final step:
• Decide what we're going to output.
• This output will be based on our cell state, but will be a filtered version.
  o First, we run a sigmoid layer which decides what parts of the cell state we're going to output.
  o Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
• Example: Since it just saw a subject, it might want to output information relevant to a verb, in case that's what is coming next. It might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that's what follows.
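Putting the four steps together, here is a sketch of one full LSTM cell step (illustrative NumPy; the parameter names are the conventional ones and are not defined in the material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)        # first step: forget gate
    i_t = sigmoid(W_i @ concat + b_i)        # second step: input gate
    C_tilde = np.tanh(W_C @ concat + b_C)    # second step: candidate values
    C_t = f_t * C_prev + i_t * C_tilde       # third step: update the cell state
    o_t = sigmoid(W_o @ concat + b_o)        # final step: decide what to output
    h_t = o_t * np.tanh(C_t)                 # output a filtered cell state
    return h_t, C_t
```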

Variants on Long Short-Term Memory


• One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding "peephole connections." This means that we let the gate layers look at the cell state.
• Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we should add new information to, we make those decisions together. We only forget when we're going to input something in its place. We only input new values to the state when we forget something older.
• A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al. (2014). It combines the forget and input gates into a single "update gate." It also merges the cell state and hidden state and makes some other changes. The resulting model is simpler than standard LSTM models and has been growing increasingly popular.
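For comparison, a sketch of one GRU step (illustrative NumPy, conventional parameter names not taken from the material); note that there is no separate cell state, only the hidden state h:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_h, b_h):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat + b_z)    # update gate (merged forget + input)
    r_t = sigmoid(W_r @ concat + b_r)    # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    return (1.0 - z_t) * h_prev + z_t * h_tilde   # new hidden state h_t
```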
