
How Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNNs) Work
Brandon Rohrer
What’s for dinner?

[Diagram: inputs such as the day of the week, the month of the year, and whether there is a late meeting feed a prediction of tonight’s dinner: pizza, sushi, or waffles.]
What’s for dinner?

[Diagram: a simpler approach that looks only at yesterday. Dinner cycles through pizza, sushi, and waffles, so knowing what we had yesterday tells us what to expect tonight. When yesterday’s actual dinner is unknown, the prediction we made for yesterday is used in its place.]
A vector is a list of values

“High is 67 F. Low is 43 F. Wind is 13 mph. .25 inches of rain. Relative humidity is 83%.”

Weather vector:
High temperature   67
Low temperature    43
Wind speed         13
Precipitation      .25
Humidity           .83

= [67, 43, 13, .25, .83]
A vector is a list of values
Sunday 0 Day of week vector

Monday 0 0

0
Tuesday 1
1
“It’s Tuesday”
= Wednesday

Thursday
0

0
= 0

0
Friday 0 0

Saturday 0
A vector is a list of values

“Tonight I think we’re going to have sushi.”

Dinner prediction vector:
Pizza    0
Sushi    1
Waffles  0

= [0, 1, 0]
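As a concrete companion to these slides (not part of the original deck), here is a small Python sketch that builds the same kinds of vectors; the helper name and category lists are assumptions made for the illustration.

# Toy one-hot encoding, mirroring the vectors on the slides.
def one_hot(category, categories):
    """Return a list with 1.0 at the category's position and 0.0 elsewhere."""
    return [1.0 if c == category else 0.0 for c in categories]

days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
dinners = ["pizza", "sushi", "waffles"]

print(one_hot("Tuesday", days))   # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(one_hot("sushi", dinners))  # [0.0, 1.0, 0.0]

# The weather vector is just a plain list of measured values.
weather = [67, 43, 13, 0.25, 0.83]  # high, low, wind, precipitation, humidity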
[Diagram: today’s prediction is built from two inputs: the dinner we actually had yesterday (the new information) and the prediction we made for yesterday. The prediction loops back as an input for the next day.]

Unrolled predictions

[Diagram: unrolling the loop through time. The prediction from two days ago feeds yesterday’s prediction, which feeds today’s, with pizza, sushi, and waffles tracked at every step.]
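To make the feedback loop concrete, here is a toy Python sketch added for illustration (the cycle order and function names are assumptions, not taken from the slides): when yesterday’s actual dinner is missing, the predictor reuses its own prediction for yesterday.

# Toy dinner predictor over the cycle pizza -> sushi -> waffles -> pizza.
DINNERS = ["pizza", "sushi", "waffles"]

def predict_today(yesterday):
    """Given yesterday's dinner as a one-hot list, predict today's as a one-hot list."""
    i = yesterday.index(1.0)
    prediction = [0.0, 0.0, 0.0]
    prediction[(i + 1) % 3] = 1.0   # move one step around the cycle
    return prediction

# Unrolled over several days: feed each prediction back in as "yesterday"
# whenever the actual dinner is unknown.
state = [1.0, 0.0, 0.0]               # we know we had pizza two days ago
for day in range(3):
    state = predict_today(state)
    print(DINNERS[state.index(1.0)])  # sushi, waffles, pizza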
Write a children’s book
Doug saw Jane.

Jane saw Spot.

Spot saw Doug.

...

Your dictionary is small: {Doug, Jane, Spot, saw, .}


[Diagram, repeated over several slides: both the prediction and the new information are vectors with one element per dictionary word (Doug, Jane, Spot, saw, and the period). At each step the newest word and the previous prediction feed the network, and the network produces a new prediction over the same dictionary.]
recurrent neural network

[Diagram: new information and the previous prediction are fed into the network; the prediction comes out and loops back as an input for the next step.]
Hyperbolic tangent (tanh) squashing function

[Plot: the tanh curve over roughly -2 to 2. Your number goes in on the horizontal axis; the squashed version comes out on the vertical axis.]

No matter what you start with, the answer stays between -1 and 1.
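A minimal sketch of one recurrent step, added here for illustration (the weight shapes, values, and names are assumptions, not taken from the slides): the new word and the previous prediction are combined and squashed with tanh.

# One step of a plain RNN over the five-word dictionary, using NumPy.
import numpy as np

vocab = ["Doug", "Jane", "Spot", "saw", "."]
n = len(vocab)

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.5, size=(n, n))    # weights for the new information
W_prev = rng.normal(scale=0.5, size=(n, n))  # weights for the previous prediction

def rnn_step(new_info, prev_prediction):
    """Combine the new word with the previous prediction and squash with tanh."""
    return np.tanh(W_in @ new_info + W_prev @ prev_prediction)

doug = np.eye(n)[0]                 # one-hot vector for "Doug"
prev = np.zeros(n)                  # no prediction yet
prediction = rnn_step(doug, prev)   # every element lies between -1 and 1
print(prediction)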
[The recurrent neural network diagram is shown again, now with the squashing function in place.]
Mistakes an RNN can make

Doug saw Doug.
Jane saw Spot saw Doug saw …
Spot. Doug. Jane.

These happen because the network’s memory only reaches back one step, so it can repeat a name it just used, string “saw” clauses together without ever finishing, or end a sentence right after it begins.

[The plain recurrent neural network diagram is shown again as the starting point for adding memory.]
[Diagram: memory is added to the network. Predictions are combined with remembered predictions through an element-by-element addition, and a forgetting path decides which memories to keep. New information and the prediction still flow through the loop as before.]
Plus junction: element-by-element addition

[3, 4, 5] + [6, 7, 8] = [3+6, 4+7, 5+8] = [9, 11, 13]
Times junction: element-by-element multiplication

[3, 4, 5] × [6, 7, 8] = [3×6, 4×7, 5×8] = [18, 28, 40]
Gating

Signal [0.8, 0.8, 0.8] × gate [1.0, 0.5, 0.0] = [0.8×1.0, 0.8×0.5, 0.8×0.0] = [0.8, 0.4, 0.0]

A gate value of 1.0 passes the signal through untouched (on), 0.0 blocks it entirely (off), and values in between let part of it through.
Logistic (sigmoid) squashing function

[Plot: the sigmoid curve over roughly -2 to 2, rising from 0 toward 1.]

No matter what you start with, the answer stays between 0 and 1.
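These element-by-element operations are easy to see in code. The following NumPy sketch (added for illustration; the variable names are assumptions) reproduces the plus junction, the times junction, and gating with sigmoid-squashed gate values.

# Element-by-element junctions and gating, using NumPy.
import numpy as np

a = np.array([3.0, 4.0, 5.0])
b = np.array([6.0, 7.0, 8.0])

print(a + b)   # plus junction:  [ 9. 11. 13.]
print(a * b)   # times junction: [18. 28. 40.]

signal = np.array([0.8, 0.8, 0.8])
gate = np.array([1.0, 0.5, 0.0])     # 1.0 = fully on, 0.0 = fully off
print(signal * gate)                 # gating: [0.8 0.4 0. ]

def sigmoid(x):
    """Logistic squashing: every output lands between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

# Gate values come from squashing raw numbers with the sigmoid, so they always
# fall in the 0-to-1 range that gating needs.
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # roughly [0.12 0.5 0.88]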
[Diagrams, built up over several slides: first, memory with forgetting is added so that predictions can be accumulated and selectively kept. Next, a selection step chooses which of the collected possibilities actually become the prediction. Finally, an ignoring step filters the new possibilities before they are collected. Together these pieces form long short-term memory: new information and the previous prediction come in, possibilities are filtered, collected into memory, and selected to produce the prediction.]
Example: the story so far is “Jane saw Spot. Doug …”

[Diagrams, stepped through over several slides: the recent predictions were the names Doug, Jane, and Spot, and “Doug” was the word just written. With “Doug” as the new information, the network raises “saw” and “Doug” as possibilities. These pass the ignoring step, are collected into memory, and the selection step picks “saw” as the prediction, since a name was just used.]
[The next step: “saw” is now the most recent word, and the memory still holds that “saw” and “Doug” came up a moment ago. The new possibilities are the names Doug, Jane, and Spot. Because Doug appeared so recently, the ignoring and forgetting steps use that memory to hold Doug back, so the filtered possibilities, and the selected prediction, are Jane and Spot. The network avoids writing “Doug saw Doug.”]
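As a rough companion to this walkthrough (an addition, not part of the slides), here is a compact NumPy sketch of one LSTM-style step with forgetting, ignoring, and selection gates; the weight shapes, initialization, and names are all assumptions made for the example.

# A minimal LSTM-style step in NumPy, following the forgetting / ignoring /
# selection picture from the slides. All names and weight values are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 5  # dictionary size: Doug, Jane, Spot, saw, .

rng = np.random.default_rng(0)
def weights():
    # Each gate mixes the new information with the previous prediction.
    return rng.normal(scale=0.5, size=(n, 2 * n))

W_forget, W_ignore, W_select, W_possib = (weights() for _ in range(4))

def lstm_step(new_info, prev_prediction, memory):
    """One step: filter new possibilities, update memory, select a prediction."""
    x = np.concatenate([new_info, prev_prediction])

    forget_gate = sigmoid(W_forget @ x)     # which memories to keep (0..1)
    ignore_gate = sigmoid(W_ignore @ x)     # which new possibilities to admit
    select_gate = sigmoid(W_select @ x)     # which parts of memory to predict
    possibilities = np.tanh(W_possib @ x)   # candidate predictions (-1..1)

    memory = forget_gate * memory + ignore_gate * possibilities  # times/plus junctions
    prediction = select_gate * np.tanh(memory)
    return prediction, memory

# Feed in "Doug" with an empty memory and no previous prediction.
doug = np.eye(n)[0]
prediction, memory = lstm_step(doug, np.zeros(n), np.zeros(n))
print(prediction)  # untrained, so the numbers are meaningless, but the plumbing works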
Sequential patterns
Text

Speech

Audio

Video

Physical processes

Anything embedded in time (almost everything)


Resources

Chris Olah’s tutorial
Andrej Karpathy’s blog post and RNN code
Stanford CS231n lecture
The Deeplearning4j tutorial has some helpful discussion and a longer list of good resources.
How Neural Networks Work [video]
Credits (all images CC0)
Pizza image

Sushi image

Waffles image
