cs231n 2019 Lecture 10
Administrative: Midterm
Administrative
Last Time: CNN Architectures
AlexNet, GoogLeNet, ResNet, SENet
Comparing complexity...
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Reproduced with permission.
Efficient networks...
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
[Howard et al. 2017]
Meta-learning: Learning to learn network architectures...
Neural Architecture Search with Reinforcement Learning (NAS)
[Zoph et al. 2016]
Meta-learning: Learning to learn network architectures...
Learning Transferable Architectures for Scalable Image Recognition
[Zoph et al. 2017]
- Applying neural architecture search (NAS) to a large dataset like ImageNet is expensive
- Design a search space of building blocks (“cells”) that can be flexibly stacked
- NASNet: Use NAS to find the best cell structure on the smaller CIFAR-10 dataset, then transfer the architecture to ImageNet
- Many follow-up works in this space, e.g. AmoebaNet (Real et al. 2019) and ENAS (Pham, Guan et al. 2018)
Today: Recurrent Neural Networks
“Vanilla” Neural Network
Recurrent Neural Networks: Process Sequences
(one to one, one to many, many to one, many to many)
Sequential Processing of Non-Sequence Data
Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015
Generate images one piece at a time!
Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.
Recurrent Neural Network
[Diagram: input x → RNN → output y]
Key idea: RNNs have an “internal state” that is updated as a sequence is processed.
Recurrent Neural Network
We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.
Notice: the same function and the same set of parameters are used at every time step.
(Simple) Recurrent Neural Network
The state consists of a single “hidden” vector h:

    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy h_t

Sometimes called a “Vanilla RNN” or an “Elman RNN” after Prof. Jeffrey Elman.
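As a concrete (unofficial) numpy sketch of this recurrence; the weight names Wxh, Whh, Why and the sizes are illustrative assumptions, not the lecture's code:

    import numpy as np

    # Hypothetical sizes for illustration.
    input_size, hidden_size, output_size = 4, 8, 4

    rng = np.random.default_rng(0)
    Wxh = rng.standard_normal((hidden_size, input_size)) * 0.01   # input -> hidden
    Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
    Why = rng.standard_normal((output_size, hidden_size)) * 0.01  # hidden -> output

    def rnn_step(h_prev, x):
        """One vanilla (Elman) RNN step: h_t = tanh(Whh h_{t-1} + Wxh x_t), y_t = Why h_t."""
        h = np.tanh(Whh @ h_prev + Wxh @ x)
        y = Why @ h
        return h, y

    h = np.zeros(hidden_size)
    for x in rng.standard_normal((5, input_size)):  # a toy sequence of 5 input vectors
        h, y = rnn_step(h, x)

Note that the same weights are applied at every step; only the hidden state h changes as the sequence is consumed.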
RNN: Computational Graph
Unrolling the recurrence over time gives a computational graph in which the same weights W are re-used at every time step:

    h0 → fW → h1 → fW → h2 → fW → h3 → … → hT
    (with inputs x1, x2, x3, … fed into the corresponding fW nodes)
RNN: Computational Graph: Many to Many
Each hidden state ht additionally produces an output yt, which is compared against a target to give a per-step loss Lt; the total loss L is the sum of the per-step losses:

    h0 → fW → h1 → fW → h2 → fW → h3 → … → hT, with inputs x1, x2, x3, …
    outputs y1, y2, y3, …, yT and losses L1, L2, L3, …, LT; L = Σt Lt
RNN: Computational Graph: Many to One
The whole input sequence x1, …, xT is processed, and only the final hidden state hT is used to produce an output.
RNN: Computational Graph: One to Many
A single input x initializes the recurrence, and an output is produced at every time step: y1, y2, y3, …, yT.
Sequence to Sequence: Many-to-one + one-to-many
Many to one: encode the input sequence x1, …, xT into a single vector (the final encoder hidden state), using encoder weights W1.
One to many: produce the output sequence y1, y2, … from that single vector, unrolling a decoder with its own weights W2.
Sutskever et al, “Sequence to Sequence Learning with Neural Networks”, NIPS 2014
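A minimal sketch of this encode-then-decode pattern in numpy; the rnn_step helper, the W1/W2 weight names, and the use of a zero vector as the first decoder input are illustrative assumptions:

    import numpy as np

    def rnn_step(h_prev, x, Wxh, Whh):
        # One recurrence step h_t = f_W(h_{t-1}, x_t), here a tanh-affine f_W.
        return np.tanh(Whh @ h_prev + Wxh @ x)

    rng = np.random.default_rng(0)
    D_in, H, D_out = 3, 6, 3
    # Encoder weights (W1) and decoder weights (W2) are separate parameter sets.
    Wxh_enc, Whh_enc = rng.standard_normal((H, D_in)) * 0.1, rng.standard_normal((H, H)) * 0.1
    Wxh_dec, Whh_dec = rng.standard_normal((H, D_out)) * 0.1, rng.standard_normal((H, H)) * 0.1
    Why_dec = rng.standard_normal((D_out, H)) * 0.1

    # Many to one: encode the whole input sequence into a single vector (final hidden state).
    inputs = rng.standard_normal((4, D_in))
    h = np.zeros(H)
    for x in inputs:
        h = rnn_step(h, x, Wxh_enc, Whh_enc)
    encoded = h

    # One to many: unroll the decoder from the encoded vector, feeding each output back in.
    y = np.zeros(D_out)              # stand-in for a <START> input
    outputs = []
    h = encoded
    for _ in range(3):
        h = rnn_step(h, y, Wxh_dec, Whh_dec)
        y = Why_dec @ h
        outputs.append(y)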
Example: Character-level Language Model
Vocabulary: [h, e, l, o]
Example training sequence: “hello”
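A minimal sketch (not the lecture's code) of one training-time forward pass over “hello”, assuming one-hot character inputs and a softmax loss on the next character at every step; weights and sizes are illustrative:

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {c: i for i, c in enumerate(vocab)}

    def one_hot(i, n=4):
        v = np.zeros(n)
        v[i] = 1.0
        return v

    rng = np.random.default_rng(0)
    H = 8
    Wxh = rng.standard_normal((H, 4)) * 0.01
    Whh = rng.standard_normal((H, H)) * 0.01
    Why = rng.standard_normal((4, H)) * 0.01

    # Training sequence "hello": inputs are h,e,l,l and targets are the next chars e,l,l,o.
    inputs  = [char_to_ix[c] for c in 'hell']
    targets = [char_to_ix[c] for c in 'ello']

    h = np.zeros(H)
    loss = 0.0
    for ix, target in zip(inputs, targets):
        h = np.tanh(Whh @ h + Wxh @ one_hot(ix))      # update hidden state
        scores = Why @ h                               # unnormalized scores over [h, e, l, o]
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[target])                 # softmax (cross-entropy) loss at this step
    print('total loss over "hello":', loss)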
Example: Character-level Language Model Sampling
Vocabulary: [h, e, l, o]
At test time, sample characters one at a time and feed each sampled character back into the model as the next input.
[Figure: starting from “h”, the softmax over the vocabulary at each step gives a distribution from which “e”, “l”, “l”, “o” are sampled in turn.]
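A possible test-time sampling loop in numpy, reusing the (hypothetical) Wxh, Whh, Why weight names from the training sketch above:

    import numpy as np

    def sample_chars(h, seed_ix, n, Wxh, Whh, Why, vocab):
        """Sample n characters, feeding each sampled character back in as the next input."""
        x = np.zeros(len(vocab)); x[seed_ix] = 1.0
        out = []
        for _ in range(n):
            h = np.tanh(Whh @ h + Wxh @ x)
            scores = Why @ h
            probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax over the vocabulary
            ix = np.random.choice(len(vocab), p=probs)        # sample one character
            x = np.zeros(len(vocab)); x[ix] = 1.0             # feed it back as the next input
            out.append(vocab[ix])
        return ''.join(out)

With a trained model, greedy decoding (argmax instead of sampling) is another option; the lecture's example samples from the distribution.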
Backpropagation through time
Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
Truncated Backpropagation through time
Run forward and backward through chunks of the sequence instead of the whole sequence: carry hidden states forward in time forever, but only backpropagate through some smaller number of steps.
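A runnable sketch of this chunked training loop in numpy, loosely in the spirit of min-char-rnn.py (biases omitted; the data, chunk length, and learning rate are illustrative). Backprop runs only inside each chunk, while the hidden state h is carried across chunks:

    import numpy as np

    data = 'hello world hello world '
    vocab = sorted(set(data)); V = len(vocab)
    cix = {c: i for i, c in enumerate(vocab)}
    H, seq_length, lr = 16, 5, 1e-1
    rng = np.random.default_rng(0)
    Wxh = rng.standard_normal((H, V)) * 0.01
    Whh = rng.standard_normal((H, H)) * 0.01
    Why = rng.standard_normal((V, H)) * 0.01

    def chunk_loss_and_grads(inputs, targets, hprev):
        xs, hs, ps = {}, {-1: hprev}, {}
        loss = 0.0
        for t, (ix, tgt) in enumerate(zip(inputs, targets)):     # forward through the chunk
            xs[t] = np.zeros(V); xs[t][ix] = 1
            hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1])
            y = Why @ hs[t]
            ps[t] = np.exp(y) / np.sum(np.exp(y))
            loss += -np.log(ps[t][tgt])
        dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
        dhnext = np.zeros(H)
        for t in reversed(range(len(inputs))):                    # backward only within the chunk
            dy = ps[t].copy(); dy[targets[t]] -= 1                # softmax gradient
            dWhy += np.outer(dy, hs[t])
            dh = Why.T @ dy + dhnext
            draw = (1 - hs[t] ** 2) * dh                          # backprop through tanh
            dWxh += np.outer(draw, xs[t]); dWhh += np.outer(draw, hs[t - 1])
            dhnext = Whh.T @ draw
        return loss, (dWxh, dWhh, dWhy), hs[len(inputs) - 1]

    h = np.zeros(H)                                               # carried forward forever
    for start in range(0, len(data) - seq_length - 1, seq_length):
        inputs  = [cix[c] for c in data[start : start + seq_length]]
        targets = [cix[c] for c in data[start + 1 : start + seq_length + 1]]
        loss, grads, h = chunk_loss_and_grads(inputs, targets, h)
        for W, dW in zip((Wxh, Whh, Why), grads):
            W -= lr * dW                                          # plain SGD update per chunk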
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
[Example: train a character-level RNN language model on a text corpus and sample from it]
[Figure: samples from the model “at first”, then after training more and more (three later stages)]
The Stacks Project: open source algebraic geometry textbook
[Figures: samples generated after training the RNN on the textbook source]
Generated C code
Searching for interpretable cells
if statement cell
quote/comment cell
Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016
Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
Image Captioning
[Figure: a convolutional network encodes the image, and a Recurrent Neural Network generates the caption]
Image Captioning: test-time walkthrough
- Run the test image through a CNN to get an image feature vector v.
- Feed a <START> token as the first input x0. The recurrence is modified to condition on the image:
  before: h = tanh(Wxh * x + Whh * h)
  now:    h = tanh(Wxh * x + Whh * h + Wih * v)
- At each step, the output y gives a distribution over words; sample a word (here “straw”, then “hat”) and feed the sampled word back in as the next input.
- Sampling the <END> token => finish.
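A hedged numpy sketch of this conditioned recurrence and the sample-until-<END> loop; the tiny vocabulary, the weight names, and the random stand-in for the CNN feature v are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ['<START>', 'straw', 'hat', '<END>']
    V, H, D_img = len(vocab), 8, 16

    Wxh = rng.standard_normal((H, V)) * 0.01
    Whh = rng.standard_normal((H, H)) * 0.01
    Wih = rng.standard_normal((H, D_img)) * 0.01   # new term: projects the image feature into the hidden state
    Why = rng.standard_normal((V, H)) * 0.01

    v = rng.standard_normal(D_img)                 # stand-in for the CNN feature of the test image

    h = np.zeros(H)
    word = '<START>'
    caption = []
    for _ in range(10):                            # cap the caption length
        x = np.zeros(V); x[vocab.index(word)] = 1.0
        h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)   # now: condition every step on the image feature v
        scores = Why @ h
        probs = np.exp(scores) / np.sum(np.exp(scores))
        word = vocab[rng.choice(V, p=probs)]       # sample the next word and feed it back
        if word == '<END>':
            break
        caption.append(word)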
Image Captioning: Example Results
- A cat sitting on a suitcase on the floor
- A cat is sitting on a tree branch
- A dog is running in the grass with a frisbee
- A white teddy bear sitting in the grass
- Two people walking on the beach with surfboards
- A tennis player in action on the court
- Two giraffes standing in a grassy field
- A man riding a dirt bike on a dirt track
Image Captioning: Failure Cases
Captions generated using neuraltalk2. All images are CC0 Public domain.
- A bird is perched on a tree branch
- A man in a baseball uniform throwing a ball
- A woman standing on a beach holding a surfboard
- A person holding a computer mouse on a desk
Image Captioning with Attention
RNN focuses its attention at a different spatial location when generating each word
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
Image Captioning with Attention
- The CNN now produces a grid of features rather than a single vector: the image (H x W x 3) becomes features of shape L x D (one D-dimensional vector for each of L spatial locations).
- From the hidden state h0, compute a distribution a1 over the L locations.
- Use a1 to form a weighted combination of the features: a single context vector z1 of dimension D.
- Feed z1 together with the first word y1 into the RNN to get h1, which produces both a distribution d1 over the vocabulary and a new attention distribution a2 over the L locations.
- Repeat: each step outputs a word distribution and the attention distribution (a3, d2, …) used to compute the next context vector z2, and so on.
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Image Captioning with Attention
Soft attention vs. hard attention
[Figure: attention maps and the corresponding generated words for example images]
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
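A small numpy sketch of the soft-attention step described above (a weighted combination over L locations); the particular scoring function W_att is an assumption, since the slides do not spell it out:

    import numpy as np

    rng = np.random.default_rng(0)
    L, D, H = 49, 512, 256                 # e.g. a 7x7 grid of D-dimensional CNN features
    features = rng.standard_normal((L, D)) # "Features: L x D" from the CNN
    h = rng.standard_normal(H)             # current RNN hidden state
    W_att = rng.standard_normal((D, H)) * 0.01  # assumed learned projection used to score each location

    def softmax(s):
        s = s - s.max()
        e = np.exp(s)
        return e / e.sum()

    scores = features @ (W_att @ h)        # one score per location
    a = softmax(scores)                    # distribution over the L locations (soft attention)
    z = a @ features                       # weighted combination of features: one D-dim context vector

    # Hard attention would instead pick a single location, e.g. by sampling:
    idx = rng.choice(L, p=a)
    z_hard = features[idx]

Soft attention is differentiable and can be trained with backpropagation; hard attention requires sampling a single location and is typically trained differently.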
Visual Question Answering
Visual Question Answering: RNNs with Attention
Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016
Figures from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
Multilayer RNNs
Stack RNN (e.g. LSTM) layers on top of each other: each layer unrolls along the time axis, and the hidden-state sequence of one layer is the input sequence of the next layer along the depth axis.
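A sketch of this stacking using plain tanh cells for brevity (the lecture's figure uses LSTM layers); the sizes and weight names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    T, D, H, num_layers = 6, 4, 8, 3
    xs = rng.standard_normal((T, D))                      # input sequence (time axis)

    # One (Wxh, Whh) pair per layer (depth axis); layer 0 reads D-dim inputs, deeper layers read H-dim ones.
    params = [(rng.standard_normal((H, D if l == 0 else H)) * 0.1,
               rng.standard_normal((H, H)) * 0.1) for l in range(num_layers)]

    hs = [np.zeros(H) for _ in range(num_layers)]         # one hidden state per layer
    top_outputs = []
    for t in range(T):                                    # unroll in time
        inp = xs[t]
        for l, (Wxh, Whh) in enumerate(params):           # go up in depth
            hs[l] = np.tanh(Wxh @ inp + Whh @ hs[l])
            inp = hs[l]                                   # this layer's state is the next layer's input
        top_outputs.append(hs[-1])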
Vanilla RNN Gradient Flow
Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994
Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

    h_t = tanh(W_hh h_{t-1} + W_xh x_t)

Backpropagation from h_t to h_{t-1} multiplies by W (actually W_hh^T).
Unrolling several steps (h0 → h1 → h2 → h3 → h4 with inputs x1, x2, x3, x4), computing the gradient of h0 involves many repeated factors of W (and repeated tanh).
- If these repeated factors keep amplifying the gradient: exploding gradients, controlled with gradient clipping (scale the gradient if its norm is too big).
- If they keep shrinking it: vanishing gradients, addressed by changing the RNN architecture (e.g. LSTM).
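A small helper illustrating gradient clipping by global norm; the max_norm value is an arbitrary illustrative choice (min-char-rnn.py instead clips each gradient elementwise to [-5, 5]):

    import numpy as np

    def clip_gradient(grads, max_norm=5.0):
        """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
        total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        if total_norm > max_norm:
            scale = max_norm / total_norm
            grads = [g * scale for g in grads]
        return grads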
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
From the vector from below (x) and the vector from before (h), a weight matrix W of shape 4h x 2h produces four h-dimensional gate vectors (4*h values total):
- i: Input gate, whether to write to the cell (sigmoid)
- f: Forget gate, whether to erase the cell (sigmoid)
- o: Output gate, how much to reveal the cell (sigmoid)
- g: Gate gate (?), how much to write to the cell (tanh)
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]

    c_t = f ☉ c_{t-1} + i ☉ g
    h_t = o ☉ tanh(c_t)

[Diagram: h_{t-1} and x_t are stacked and multiplied by W to produce i, f, o, g; f gates the old cell c_{t-1}, i gates g, their sum is the new cell c_t, and o gates tanh(c_t) to give h_t.]
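One LSTM step written out in numpy as a sketch; it follows the stacked-weight layout above (W of shape 4h x 2h acting on [h; x]), with biases omitted and sizes chosen for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W):
        """One LSTM step. W has shape (4h, 2h) and acts on the stacked vector [h_prev; x]."""
        hdim = h_prev.shape[0]
        gates = W @ np.concatenate([h_prev, x])     # shape (4h,)
        i = sigmoid(gates[0:hdim])                  # input gate: whether to write to the cell
        f = sigmoid(gates[hdim:2*hdim])             # forget gate: whether to erase the cell
        o = sigmoid(gates[2*hdim:3*hdim])           # output gate: how much to reveal the cell
        g = np.tanh(gates[3*hdim:4*hdim])           # gate gate: how much to write to the cell
        c = f * c_prev + i * g                      # c_t = f ☉ c_{t-1} + i ☉ g
        h = o * np.tanh(c)                          # h_t = o ☉ tanh(c_t)
        return h, c

    rng = np.random.default_rng(0)
    hdim = 8
    W = rng.standard_normal((4 * hdim, 2 * hdim)) * 0.1
    h, c = np.zeros(hdim), np.zeros(hdim)
    for x in rng.standard_normal((5, hdim)):        # here x happens to have the same dimension as h
        h, c = lstm_step(x, h, c, W)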
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
Backpropagation from c_t to c_{t-1} involves only an elementwise multiplication by f, with no matrix multiply by W.
Chaining LSTM cells over time therefore gives an uninterrupted, largely additive gradient path along the cell state.
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
This uninterrupted additive gradient path is similar to the identity shortcut connections in ResNet.
In between: Highway Networks, which use a learned gate to interpolate between a transformed input and the identity path.
Srivastava et al, “Highway Networks”, ICML DL Workshop 2015
Other RNN Variants
GRU: Cho et al, “Learning phrase representations using RNN encoder-decoder for statistical machine translation”, 2014
Jozefowicz et al, “An Empirical Exploration of Recurrent Network Architectures”, 2015
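A sketch of one GRU step (update gate z, reset gate r) under a common formulation; the weight names and the exact gate orientation are illustrative choices, not taken from the slides:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
        """One GRU step (Cho et al. 2014): update gate z, reset gate r, candidate state h_tilde."""
        z = sigmoid(Wz @ x + Uz @ h_prev)            # update gate: how much of the new candidate to use
        r = sigmoid(Wr @ x + Ur @ h_prev)            # reset gate: how much of the old state to expose
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))
        return (1 - z) * h_prev + z * h_tilde        # gated, additive-style interpolation

    rng = np.random.default_rng(0)
    D, H = 4, 8
    Wz, Wr, Wh = (rng.standard_normal((H, D)) * 0.1 for _ in range(3))
    Uz, Ur, Uh = (rng.standard_normal((H, H)) * 0.1 for _ in range(3))
    h = np.zeros(H)
    for x in rng.standard_normal((6, D)):
        h = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)

Like the LSTM cell update, the interpolation between the old state and the candidate gives the gradient a more direct path through time.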
Recently in Natural Language Processing…
New paradigms for reasoning over sequences
[“Attention is all you need”, Vaswani et al., 2017]
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don’t work very well
- It is common to use LSTM or GRU: their additive interactions improve gradient flow
- The backward flow of gradients in an RNN can explode or vanish. Exploding gradients are controlled with gradient clipping; vanishing gradients are controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research, as well as new paradigms for reasoning over sequences
- Better understanding (both theoretical and empirical) is needed
Next time: Midterm!