MIT 6.S191: Introduction to Deep Learning, Lecture 2
Ava Soleimany
January 27, 2020
Given an image of a ball, can you predict where it will go next?

From a single snapshot alone, the answer is ambiguous. Given the ball's previous positions, however, the prediction becomes much easier: this is the essence of sequence modeling, using the past to predict the future.
Sequences in the Wild

Audio: an audio waveform can be split into a sequence of sound waves.
Text: language can be split into a sequence of characters or a sequence of words.
A Sequence Modeling Problem: Predict the Next Word

"This morning I took my cat for a walk."

Given these words, predict the next word. (Example: H. Suresh, 6.S191 2018.)
Idea #1: Use a Fixed Window

"This morning I took my cat for a walk."

Given the previous two words ("for a"), predict the next word.

A one-hot feature encoding tells us what each word is:
[1 0 0 0 0 0 1 0 0 0]  ("for a")  →  prediction
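To make the fixed-window idea concrete, here is a minimal sketch (not from the original slides) that one-hot encodes a two-word window; the toy vocabulary and its ordering are assumptions chosen so that "for a" reproduces the vector above.

import numpy as np

# Hypothetical toy vocabulary (an assumption for illustration)
vocab = ["for", "a", "cat", "walk", "morning"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # One-hot vector identifying a single word
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

def encode_window(words):
    # Concatenate the one-hot vectors of a fixed-size window of words
    return np.concatenate([one_hot(w) for w in words])

print(encode_window(["for", "a"]))  # [1. 0. 0. 0. 0. 0. 1. 0. 0. 0.]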
Problem #1: Can't Model Long-Term Dependencies

"France is where I grew up, but I now live in Boston. I speak fluent ___."  (J'aime 6.S191!, that is, "I love 6.S191!")

We need information from the distant past to accurately predict the correct word, and a small fixed window cannot capture it.
Idea #2: Use Entire Sequence as Set of Counts

"Bag of words": represent the whole sequence as a vector of word counts.
[0 1 0 0 1 0 0 … 0 0 1 1 0 0 0 1]  →  prediction

Problem #2: Counts Don't Preserve Order

"The food was good, not bad at all." vs. "The food was bad, not good at all."
Both sentences have exactly the same word counts, but opposite meanings.

Idea #3: Use a Really Big Fixed Window

"This morning I took my cat for a walk."
Given all of these words, predict the next word:
[1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 … ]  →  prediction
(one-hot blocks for "this", "morning", "took", "the", "cat", …)
Problem #3: No Parameter Sharing

[0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 … ]
(one-hot blocks for "this", "morning", "took", "the", "cat", …)

Each of these inputs has a separate parameter, so things learned about the sequence at one position do not transfer to other positions.
Sequence Modeling: Design Criteria

To model sequences, we need to:
1. Handle variable-length sequences
2. Track long-term dependencies
3. Maintain information about order
4. Share parameters across the sequence

Recurrent neural networks (RNNs) meet these design criteria.
#"
1 9 1
6 . S
!
One to One
M I T
“Vanilla” neural network
#"
1 9 1
6 . S
!
One to One
M I T
“Vanilla” neural network
Many to One
Sentiment Classification
#"
1 9 1
6 . S
!
One to One
M I T
“Vanilla” neural network
Many to One
Sentiment Classification
Many to Many
Music Generation
6.S191 Lab!
#"
1 9 1
6 . S … and many other
architectures and
applications
!
One to One
M I T
“Vanilla” neural network
Many to One
Sentiment Classification
Many to Many
Music Generation
6.S191 Lab!
Recurrent Neural Networks (RNNs)

A recurrent cell takes an input vector x_t, maintains an internal hidden state h_t, and produces an output vector ŷ_t at each time step.

Apply a recurrence relation at every time step to process a sequence:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the cell state, f_W is a function parameterized by weights W, h_{t-1} is the old state, and x_t is the input vector at time step t.

Note: the same function and set of parameters are used at every time step.
RNN Intuition (pseudocode):

my_rnn = RNN()
hidden_state = [0, 0, 0, 0]

sentence = ["I", "love", "recurrent", "neural"]

for word in sentence:
    prediction, hidden_state = my_rnn(word, hidden_state)

next_word_prediction = prediction
# >>> "networks!"
RNN State Update and Output

Input vector: x_t

Update hidden state:
    h_t = tanh(W_hh^T h_{t-1} + W_xh^T x_t)

Output vector:
    ŷ_t = W_hy^T h_t
RNNs: Computational Graph Across Time

Represent the RNN as a computational graph unrolled across time: the same recurrent cell is applied to the inputs x_1, x_2, x_3, …, x_t, producing outputs ŷ_1, ŷ_2, ŷ_3, …, ŷ_t. The weight matrices W_xh (input to hidden), W_hh (hidden to hidden), and W_hy (hidden to output) are re-used at every time step. A loss L_1, L_2, L_3, …, L_t can be computed at each time step, and the total loss L is their sum.
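As an illustrative sketch (not from the slides), the unrolled forward pass and the summed per-step loss can be written as a simple loop; the dimensions and the mean-squared-error loss below are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 3, 4, 2, 5

# The same weight matrices are re-used at every time step
W_xh = 0.1 * rng.normal(size=(hidden_dim, input_dim))
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
W_hy = 0.1 * rng.normal(size=(output_dim, hidden_dim))

xs = rng.normal(size=(T, input_dim))    # input sequence x_1 ... x_T
ys = rng.normal(size=(T, output_dim))   # target sequence y_1 ... y_T

h = np.zeros(hidden_dim)
total_loss = 0.0
for t in range(T):
    h = np.tanh(W_hh @ h + W_xh @ xs[t])            # update hidden state
    y_hat = W_hy @ h                                # output at time step t
    total_loss += np.mean((y_hat - ys[t]) ** 2)     # per-step loss L_t

print(total_loss)  # total loss L is the sum of the per-step losses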
Implementing RNNs from Scratch in TensorFlow

class MyRNNCell(tf.keras.layers.Layer):
    def __init__(self, rnn_units, input_dim, output_dim):
        super(MyRNNCell, self).__init__()
        # Initialize weight matrices
        self.W_xh = self.add_weight(shape=[rnn_units, input_dim])
        self.W_hh = self.add_weight(shape=[rnn_units, rnn_units])
        self.W_hy = self.add_weight(shape=[output_dim, rnn_units])
        # Initialize hidden state to zeros
        self.h = tf.zeros([rnn_units, 1])

    def call(self, x):
        # Update the hidden state
        self.h = tf.math.tanh(tf.matmul(self.W_hh, self.h) + tf.matmul(self.W_xh, x))
        # Compute the output vector
        output = tf.matmul(self.W_hy, self.h)
        # Return the current output and hidden state
        return output, self.h

TensorFlow also provides this as a built-in layer:

tf.keras.layers.SimpleRNN(rnn_units)
"
1 9 1
Backpropagation algorithm:
!
M I T minimize loss
1 9
(' 1 ()
$#" $#%
*,-
6
*,-. S $#&
*,-
$#' …
*,-
$#"
RNN
M I=T *+,
*,,
*+,
*,,
*+,
*,,
*+,
(% (&
1 9
(' ()
$#" $#%
*,-
6
*,-. S $#&
*,-
$#' …
*,-
$#"
RNN
M I=T *+,
*,,
*+,
*,,
*+,
*,,
*+,
…9
1 1 '))
ℎ&
'()
#" #$
6 . S #% #&
M I T
6.S191 Introduction to Deep Learning
1/27/20
introtodeeplearning.com @MITDeepLearning
Standard RNN Gradient Flow: Exploding and Vanishing Gradients

Computing the gradient with respect to the initial state h_0 involves many repeated factors of W_hh (and repeated gradient computation).

Many values > 1: exploding gradients. Remedy: gradient clipping to scale big gradients.

Many values < 1: vanishing gradients. Remedies:
1. Activation function
2. Weight initialization
3. Network architecture
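As a sketch of how gradient clipping might look in practice (not from the slides; the model, data shapes, and clip norm are illustrative assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, input_shape=(None, 8)),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

x_batch = tf.random.normal([16, 20, 8])   # 16 sequences, 20 steps, 8 features (dummy data)
y_batch = tf.random.normal([16, 1])

with tf.GradientTape() as tape:
    loss = loss_fn(y_batch, model(x_batch))

grads = tape.gradient(loss, model.trainable_variables)
# Rescale gradients so their global norm is at most 1.0, guarding against exploding gradients
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
optimizer.apply_gradients(zip(clipped, model.trainable_variables))

# Keras optimizers can also clip automatically, e.g. tf.keras.optimizers.Adam(clipnorm=1.0)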
The Problem of Long-Term Dependencies

Why are vanishing gradients a problem? Errors due to further-back time steps have smaller and smaller gradients, which biases the network's parameters to capture only short-term dependencies. As the gap grows between an output ŷ_t and the earlier inputs it depends on, a standard RNN becomes unable to connect the two.
Trick #1: Activation Functions. Using ReLU (whose derivative is 1 for x > 0) instead of tanh or sigmoid helps prevent the activation's derivative from shrinking the gradients.

Solution: Gated Cells

Idea: use a more complex recurrent unit with gates to control what information is passed through (a gated cell such as an LSTM or GRU).

Long Short Term Memory (LSTM) networks rely on a gated cell to track information throughout many time steps.
Long Short Term Memory (LSTMs)

In a standard RNN, the repeating module contains a single computation node (a tanh layer). In an LSTM, the repeating module contains interacting sigmoid and tanh layers that control the flow of information.

tf.keras.layers.LSTM(num_units)

Information is added to or removed from the cell state through structures called gates. Gates optionally let information through, for example via a sigmoid neural net layer and pointwise multiplication.
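A minimal sketch (illustrative assumptions, not the lecture's code) of using the built-in LSTM layer, here stacked two deep:

import tensorflow as tf

model = tf.keras.Sequential([
    # return_sequences=True passes the full sequence of hidden states to the next LSTM
    tf.keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 10)),
    # The second LSTM returns only its final hidden state
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")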
Long Short Term Memory (LSTMs): 1) Forget 2) Store 3) Update 4) Output

1) Forget: LSTMs forget irrelevant parts of the previous state.
2) Store: LSTMs store relevant new information into the cell state.
3) Update: LSTMs selectively update their cell state values.
4) Output: the output gate controls what information is sent to the next time step.

(Hochreiter & Schmidhuber, Neural Computation 1997; Olah, "Understanding LSTMs".)
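For completeness, one standard formulation of these four steps, following the notation of Olah's "Understanding LSTMs" (the equations are not spelled out on the slides; ⊙ denotes elementwise multiplication):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)      (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)      (store/input gate)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)   (candidate cell values)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t          (update the cell state)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)      (output gate)
h_t = o_t ⊙ tanh(c_t)                    (filtered output / new hidden state)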
LSTM Gradient Flow

The cell state provides a path for uninterrupted gradient flow: backpropagating from c_t to c_{t-1} requires only elementwise operations, with no repeated multiplication by W_hh.
LSTMs: Key Concepts

1. Maintain a separate cell state from what is outputted
2. Use gates to control the flow of information
   • Forget gate gets rid of irrelevant information
   • Store relevant information from current input
   • Selectively update cell state
   • Output gate returns a filtered version of the cell state
3. Backpropagation through time with uninterrupted gradient flow
Example Task: Music Generation

Input: sheet music (e.g., E F# G C). Output: the next character in the sheet music (e.g., F# G C A). Explored hands-on in the 6.S191 Lab!
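A minimal sketch of the kind of character-level model used for this task (the vocabulary size, embedding size, and layer width are illustrative assumptions; the lab builds the full version):

import tensorflow as tf

vocab_size = 83   # assumed number of unique characters in the sheet-music text
model = tf.keras.Sequential([
    # Map each character index to a learned embedding vector
    tf.keras.layers.Embedding(vocab_size, 256),
    # Return a hidden state at every time step (many-to-many)
    tf.keras.layers.LSTM(512, return_sequences=True),
    # Predict a distribution over the next character at each position
    tf.keras.layers.Dense(vocab_size),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)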
Example Task: Sentiment Classification

Input: sequence of words. Output: probability of positive sentiment.
Example: "I love this class!" → sentiment: <positive>
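A minimal many-to-one sketch for this task (vocabulary size, embedding size, and units are illustrative assumptions, not the lecture's code):

import tensorflow as tf

vocab_size = 10000   # assumed vocabulary size
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),
    # Only the final hidden state is used (many-to-one)
    tf.keras.layers.LSTM(64),
    # Probability that the sentiment is positive
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])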
Example Task: Machine Translation

An encoder RNN reads the source sentence ("the dog eats") and a decoder RNN generates the translation token by token, starting from a <start> token ("le chien …"). The entire input sentence must be compressed into a single encoded state, creating an encoding bottleneck.
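A minimal encoder-decoder sketch illustrating the bottleneck (all names, sizes, and the teacher-forcing setup are assumptions, not the lecture's code):

import tensorflow as tf

src_vocab, tgt_vocab, units = 8000, 8000, 256   # illustrative sizes

# Encoder: read the source sentence and compress it into a final state
enc_inputs = tf.keras.Input(shape=(None,))
enc_emb = tf.keras.layers.Embedding(src_vocab, units)(enc_inputs)
_, enc_h, enc_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: generate the target sentence, initialized from the encoder's final state
# (the entire source must fit into that single state: the encoding bottleneck)
dec_inputs = tf.keras.Input(shape=(None,))
dec_emb = tf.keras.layers.Embedding(tgt_vocab, units)(dec_inputs)
dec_out = tf.keras.layers.LSTM(units, return_sequences=True)(
    dec_emb, initial_state=[enc_h, enc_c])
logits = tf.keras.layers.Dense(tgt_vocab)(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)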
Example Task: Trajectory Prediction for Self-Driving Cars (Waymo)

Environmental Modeling

Predicting environmental quantities such as particulates, SO2, winds, and humidity (visualization: earth.nullschool.net).
Deep Learning for Sequence Modeling: Summary

1. RNNs are well suited for sequence modeling tasks
2. Model sequences via a recurrence relation
3. Training RNNs with backpropagation through time
4. Gated cells like LSTMs let us model long-term dependencies
5. Models for music generation, classification, machine translation, and more
6.S191: Introduction to Deep Learning
Lab 1: Introduction to TensorFlow and Music Generation with RNNs

1. Open the lab in Google Colab