Long Short Term Memory PDF
[Diagram: inputs (day of the week, month of the year, late meeting) feed a network whose outputs are pizza, sushi, and waffles]
What’s for dinner?
[Diagram: inputs (pizza yesterday, sushi yesterday, waffles yesterday) feed a network whose outputs are pizza, sushi, and waffles]
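[Diagram: yesterday's predictions (predicted pizza for yesterday, predicted sushi for yesterday) and yesterday's dinner (waffles yesterday) feed a network whose output includes pizza]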
A vector is a list of values
Weather vector
“High is 67 F. Low is 43 F. Wind is 13 mph. .25 inches of rain. Relative humidity is 83%.” =
High temperature 67
Low temperature 43
Wind speed 13
Precipitation .25
Humidity .83
Day of week vector
“It’s Tuesday” =
Sunday 0
Monday 0
Tuesday 1
Wednesday 0
Thursday 0
Friday 0
Saturday 0
Dinner prediction vector
“Tonight I think we’re going to have sushi.” =
Pizza 0
Sushi 1
Waffles 0
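The day-of-week and dinner vectors above each have a single 1 in the position of the chosen item. A minimal sketch of that encoding follows. The helper name `one_hot` and the constant names are illustrative assumptions, not taken from the slides.

```python
# Vocabularies in the same order as the slides list them.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
DINNERS = ["Pizza", "Sushi", "Waffles"]


def one_hot(item, vocabulary):
    """Return a list with 1 at the position of `item` and 0 everywhere else."""
    if item not in vocabulary:
        raise ValueError(f"{item!r} is not in the vocabulary")
    return [1 if entry == item else 0 for entry in vocabulary]


day_vector = one_hot("Tuesday", DAYS)      # "It's Tuesday"
dinner_vector = one_hot("Sushi", DINNERS)  # "Tonight I think we're going to have sushi."

print(day_vector)     # [0, 0, 1, 0, 0, 0, 0]
print(dinner_vector)  # [0, 1, 0]
```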
[Diagram: the prediction for today is computed from the predictions for yesterday and dinner yesterday. In general, each prediction is computed from new information and the previous prediction.]
Unrolled predictions
[Diagram: the network unrolled over time (..., two days ago, yesterday), each step producing predictions for pizza, sushi, and waffles]
Write a children’s book
Doug saw Jane.
...
[Diagram, repeated over four slides: a vector over the words Jane, Doug, Spot, saw, and "." is used for both the new information and the prediction]
[Diagram: recurrent neural network. Labels: new information, prediction.]
Hyperbolic tangent (tanh) squashing function
[Plot: the tanh curve. Horizontal axis ticks at -2.0, -1.5, -1.0, and -.5; vertical axis ticks at -1.0, -.5, .5, and 1.0. The squashed version comes out on the vertical axis.]
No matter what you start with, the answer stays between -1 and 1.
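A quick numerical check of this claim, using the standard library's `math.tanh`. The sample inputs are arbitrary values chosen for illustration.

```python
import math

# Arbitrary sample inputs, including some very large magnitudes.
inputs = [-1000.0, -2.0, -0.5, 0.0, 0.5, 2.0, 1000.0]
squashed = [math.tanh(value) for value in inputs]

for value, result in zip(inputs, squashed):
    print(f"{value:>8} -> {result:+.4f}")

# Every squashed value lies between -1 and 1, however large the input.
print(all(-1.0 <= result <= 1.0 for result in squashed))
```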
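One way to sketch a single step of the recurrent network in the diagram: the new information and the previous prediction are joined into one vector, multiplied by a weight matrix, and squashed with tanh. The weights here are random and untrained, and the names `rnn_step` and `weights` are assumptions made for illustration, so the numbers it prints carry no meaning beyond showing the data flow.

```python
import numpy as np

# Word order follows the children's book vocabulary in the slides.
WORDS = ["Jane", "Doug", "Spot", "saw", "."]


def rnn_step(new_information, previous_prediction, weights):
    """One recurrent step: join both inputs, apply weights, squash with tanh."""
    combined = np.concatenate([new_information, previous_prediction])
    return np.tanh(weights @ combined)


rng = np.random.default_rng(0)
# Hypothetical, untrained weights: one row per output word.
weights = rng.normal(size=(len(WORDS), 2 * len(WORDS)))

prediction = np.zeros(len(WORDS))
for word in ["Doug", "saw", "Jane", "."]:
    new_information = np.array([1.0 if w == word else 0.0 for w in WORDS])
    # The prediction from this step is fed back in on the next step.
    prediction = rnn_step(new_information, prediction, weights)

print(prediction)
```

Feeding each prediction back in as an input is what makes the network recurrent.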
Mistakes an RNN can make
Doug saw Doug.
[Diagram: the recurrent network with memory added. Labels: new information, prediction, memory, forgetting, prediction + memories.]
Plus junction: element-by-element addition
[3, 4, 5] + [6, 7, 8] = [3+6, 4+7, 5+8] = [9, 11, 13]
Times junction: element-by-element multiplication
[3, 4, 5] x [6, 7, 8] = [3x6, 4x7, 5x8] = [18, 28, 40]
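The plus and times junctions can be reproduced in a few lines, using the vectors 3, 4, 5 and 6, 7, 8 from the slides. The variable names are illustrative.

```python
first = [3, 4, 5]
second = [6, 7, 8]

# Plus junction: add matching elements.
plus_junction = [a + b for a, b in zip(first, second)]

# Times junction: multiply matching elements.
times_junction = [a * b for a, b in zip(first, second)]

print(plus_junction)   # [9, 11, 13]
print(times_junction)  # [18, 28, 40]
```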
Gating
Signal: [0.8, 0.8, 0.8]
On / Off gating: [1.0, 0.5, 0.0]
Signal x gating = [0.8 x 1.0, 0.8 x 0.5, 0.8 x 0.0] = [0.8, 0.4, 0.0]
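Gating is the times junction applied to a signal and a vector of gate values between 0 and 1. The sketch below uses the slide's numbers. A gate value of 1.0 passes the signal unchanged, 0.5 halves it, and 0.0 blocks it.

```python
signal = [0.8, 0.8, 0.8]
gate = [1.0, 0.5, 0.0]  # 1.0 = fully on, 0.0 = fully off

# Element-by-element multiplication of the signal by the gate.
gated = [s * g for s, g in zip(signal, gate)]

print(gated)  # [0.8, 0.4, 0.0]
```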
Logistic (sigmoid) squashing function
[Plot: the logistic curve, with vertical axis ticks at .5 and 1.0]
No matter what you start with, the answer stays between 0 and 1.
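A small check of the 0-to-1 range. The function name `logistic` is an illustrative choice, and the two-branch form is one common way to avoid floating-point overflow for large negative inputs.

```python
import math


def logistic(value):
    """Logistic (sigmoid) function: 1 / (1 + e^(-value)), computed stably."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    # For negative inputs, use the equivalent form e^value / (1 + e^value).
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


# Arbitrary sample inputs, including very large magnitudes.
samples = [-1000.0, -2.0, 0.0, 2.0, 1000.0]
outputs = [logistic(value) for value in samples]

print(outputs)
print(all(0.0 <= result <= 1.0 for result in outputs))
```

Because the output always lies between 0 and 1, it can serve directly as a gate value.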
[Diagram: selection added. Labels: new information, possibilities, memory, forgetting, collected possibilities, selection, prediction.]
[Diagram: long short-term memory. Labels: new information, possibilities, ignoring, filtered possibilities, memory, forgetting, collected possibilities, selection, prediction.]
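The pieces named in the diagram can be wired together as a single step. This is a sketch under stated assumptions rather than the slides' exact network: it follows the standard LSTM cell layout, omits bias terms, uses random untrained weights, and the names `lstm_step` and `weights` are hypothetical. The ignoring, forgetting, and selection gates use the logistic function. The possibilities and the collected possibilities are squashed with tanh.

```python
import numpy as np


def logistic(x):
    """Logistic (sigmoid) squashing function, output between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(new_information, previous_prediction, memory, weights):
    """One step: returns the new prediction and the updated memory."""
    combined = np.concatenate([new_information, previous_prediction])

    possibilities = np.tanh(weights["possibilities"] @ combined)
    ignoring = logistic(weights["ignoring"] @ combined)      # gate
    forgetting = logistic(weights["forgetting"] @ combined)  # gate
    selection = logistic(weights["selection"] @ combined)    # gate

    # Times junction: the ignoring gate filters the possibilities.
    filtered_possibilities = possibilities * ignoring
    # Times junction on memory, then plus junction to collect possibilities.
    collected_possibilities = memory * forgetting + filtered_possibilities
    # Squash, then let the selection gate choose what is released.
    prediction = np.tanh(collected_possibilities) * selection
    return prediction, collected_possibilities


size = 5  # one slot per word: Jane, Doug, Spot, saw, "."
rng = np.random.default_rng(0)
# Hypothetical, untrained weights for the four small networks.
weights = {
    name: rng.normal(size=(size, 2 * size))
    for name in ("possibilities", "ignoring", "forgetting", "selection")
}

prediction = np.zeros(size)
memory = np.zeros(size)
new_information = np.zeros(size)
new_information[1] = 1.0  # one-hot vector for "Doug"

prediction, memory = lstm_step(new_information, prediction, memory, weights)
print(prediction)
print(memory)
```

Here the collected possibilities are carried forward as the memory for the next step, which is how the standard cell keeps information across many steps.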
[Diagram sequence: the long short-term memory network walks through the children's book example. Panel labels: prediction, selection, memory, collected possibilities, forgetting, filtered possibilities, ignoring, new information, possibilities.]
First pass: "Doug, Jane, Spot" is shown on the input side. The filtered possibilities are "saw, Doug", and these become the collected possibilities.
Second pass: "saw" and "saw, Doug" are shown on the input side. The memory holds "Doug". The filtered possibilities are "Doug, Jane, Spot", and the collected possibilities are "Jane, Spot".
Speech
Audio
Video
Physical processes
Andrej Karpathy’s blog post, RNN code, and Stanford CS231n lecture
Sushi image
Waffles image