
LSTM Material 1


Traditional Neural Networks
• Information does not persist.

Recurrent Neural Network (RNN)
• It is a network with loops in it.
• Information can persist.
• Can be thought of as multiple copies of the same network, each passing a message to its successor.

An unrolled recurrent neural network

A = a chunk of neural network: it looks at an input xt and outputs a value ht.
• A loop allows information to be passed from one step of the network to the next.
• RNN use cases: speech recognition, language modeling, translation, and image captioning.
• One of the appeals of RNNs: they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame.
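To make the unrolled picture concrete, here is a minimal sketch of one RNN step in NumPy. It is not from the material: the parameter names (W_xh, W_hh, b_h) and sizes are illustrative, and the chunk A is assumed to be a single tanh layer, as described under Structure below.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The chunk A: looks at the input x_t and the previous output h_{t-1},
    # and returns the new output h_t.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unrolling the loop: the same weights are reused at every step, and h is the
# "message" each copy of the network passes to its successor.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))
W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                          # initial state
for x_t in rng.standard_normal((5, input_size)):   # a toy sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```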
Structure:
• The repeating module in a standard RNN has a very simple structure, such as a single tanh layer.

Issues with RNN
• Where the gap between the relevant information and the place that it's needed is small, RNNs can learn to use the past information.
• But there are cases where we need more context. Example: "I grew up in France…(100 words)… I speak fluent French." Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It is entirely possible for the gap between the relevant information and the point where it is needed to become very large.
• Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
• Conclusion: RNNs cannot handle long-term dependencies. They can connect information when there is little intervening content, but as the number of words increases it becomes difficult for them to connect the information.

Long Short-Term Memory Networks (LSTM)
• Introduced by Hochreiter & Schmidhuber (1997).
• A special kind of RNN.
• Works much better than the standard version.
• Capable of learning long-term dependencies.
• Remembering information for long periods of time is practically their default behaviour.

Structure:

• In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.
• LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

The Core Idea Behind LSTMs


• The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
• It runs straight down the entire chain, with only some minor linear interactions.
• It is very easy for information to just flow along it unchanged.
• The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
• Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation.
• The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means "let nothing through," while a value of one means "let everything through!"

An LSTM has three of these gates, to protect and control the cell state.
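As a rough sketch of the gate idea (not code from the material), a gate is just a sigmoid layer whose output multiplies the state pointwise; the parameter names (W_gate, b_gate) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(h_prev, x_t, W_gate, b_gate):
    # Sigmoid layer: one number between 0 and 1 per component of the state,
    # describing how much of that component should be let through.
    return sigmoid(W_gate @ np.concatenate([h_prev, x_t]) + b_gate)

# Pointwise multiplication applies the gate: 0 lets nothing through,
# 1 lets everything through.
# filtered = gate(h_prev, x_t, W_gate, b_gate) * cell_state
```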
STEP-BY-STEP LSTM WALK THROUGH

First step:
• Decide what information we're going to throw away from the cell state.
• This decision is made by a sigmoid layer called the "forget gate layer." It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1.
• 1 represents "completely keep this" while a 0 represents "completely get rid of this."
• Example: The cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
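A minimal sketch of the forget gate layer just described, assuming NumPy; W_f and b_f are conventional (illustrative) parameter names, not defined in the material.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # f_t: looks at h_{t-1} and x_t and outputs a number between 0 and 1 for
    # each number in the cell state C_{t-1} (1 = completely keep, 0 = forget).
    return sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
```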
Second step:
• Decide what new information to store in the cell state. This has two parts.
  o First, a sigmoid layer called the "input gate layer" decides which values we'll update.
  o Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state.
  o In the next step, we'll combine these two to create an update to the state.
• Example: We'd want to add the gender of the new subject to the cell state, to replace the old one we're forgetting.
Third step:
• It's now time to update the old cell state, Ct−1, into the new cell state Ct.
• The previous steps already decided what to do; we just need to apply the changes.
• We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t. This is the new candidate values, scaled by how much we decided to update each state value.
• Example: this is where we'd drop the information about the old subject's gender and add the new information, as we decided in the previous steps.
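The update itself is just two pointwise operations; a minimal sketch, assuming ft, it and C̃t were computed as in the sketches above:

```python
def update_cell_state(C_prev, f_t, i_t, C_tilde):
    # Forget what we decided to forget, then add the new candidate values,
    # scaled by how much we decided to update each state value.
    return f_t * C_prev + i_t * C_tilde
```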
Final step:
• Decide what we're going to output.
• This output will be based on our cell state, but will be a filtered version.
  o First, we run a sigmoid layer which decides what parts of the cell state we're going to output.
  o Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
• Example: Since it just saw a subject, it might want to output information relevant to a verb, in case that's what is coming next. It might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that's what follows.
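Putting the four steps together, here is a sketch of one full LSTM cell step (illustrative NumPy; the parameter names are the conventional ones and are not defined in the material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)        # first step: forget gate
    i_t = sigmoid(W_i @ concat + b_i)        # second step: input gate
    C_tilde = np.tanh(W_C @ concat + b_C)    # second step: candidate values
    C_t = f_t * C_prev + i_t * C_tilde       # third step: update the cell state
    o_t = sigmoid(W_o @ concat + b_o)        # final step: decide what to output
    h_t = o_t * np.tanh(C_t)                 # output a filtered cell state
    return h_t, C_t
```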

Variants on Long Short-Term Memory


• One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding "peephole connections." This means that we let the gate layers look at the cell state.
• Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we should add new information to, we make those decisions together. We only forget when we're going to input something in its place. We only input new values to the state when we forget something older.
• A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al. (2014). It combines the forget and input gates into a single "update gate." It also merges the cell state and hidden state and makes some other changes. The resulting model is simpler than standard LSTM models and has been growing increasingly popular.
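For comparison, a sketch of one GRU step (illustrative NumPy, conventional parameter names not taken from the material); note that there is no separate cell state, only the hidden state h:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_h, b_h):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat + b_z)    # update gate (merged forget + input)
    r_t = sigmoid(W_r @ concat + b_r)    # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    return (1.0 - z_t) * h_prev + z_t * h_tilde   # new hidden state h_t
```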
