Lab RNN Intro

This document introduces recurrent neural networks (RNNs) and their applications. It gives examples of RNN tasks such as image captioning, sentiment classification, translation, and video classification. It then explains the vanilla RNN model, how RNNs are unfolded in time, and backpropagation through time, followed by truncated backpropagation, teacher forcing, and warm-starting. Finally, it introduces long short-term memory (LSTM) cells, their components and equations, and their practical use in large sequence-to-sequence models.


Machine Learning

- Intro to Recurrent Neural Networks -


RNN Tasks

RNN Tasks

Vanilla RNNs

Source: CS231n Lecture 10

RNN Tasks

e.g. Image Captioning


Image → sequence of words
Source: CS231n Lecture 10

RNN Tasks

e.g. Sentiment Classification


Sequence of words → sentiment
Source: CS231n Lecture 10

RNN Tasks

e.g. Translation
Sequence of words → sequence of words
Source: CS231n Lecture 10

RNN Tasks

e.g. Video classification on frame level

Source: CS231n Lecture 10
RNN Model

Vanilla RNN Model

[Figure: RNN cell with input x^(t), hidden state h^(t), output y^(t), and weights w_ih, w_hh, w_ho]

Current state depends on current inputs and previous state

RNNs can yield outputs at each time step:

h^(t) = f_{w_hh}( h^(t-1), f_{w_ih}( x^(t) ) )

y^(t) = f_{w_ho}( h^(t) ),  ∀ t ∈ {1 ... τ}

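As a concrete reading of the two equations above, here is a minimal NumPy sketch of a vanilla RNN forward pass, assuming tanh nonlinearities and a linear readout (one common instantiation); the names rnn_forward, W_ih, W_hh, W_ho are illustrative, not from the slides.

import numpy as np

def rnn_forward(xs, h0, W_ih, W_hh, W_ho, b_h, b_y):
    """xs: list of input vectors x^(t); h0: initial hidden state h^(0)."""
    h, ys = h0, []
    for x in xs:
        # h^(t) = f_{w_hh}(h^(t-1), f_{w_ih}(x^(t))), here with tanh as the nonlinearity
        h = np.tanh(W_hh @ h + W_ih @ x + b_h)
        # y^(t) = f_{w_ho}(h^(t)), here a linear readout
        ys.append(W_ho @ h + b_y)
    return ys, h
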
Unfolding RNN in time

[Figure: the recurrent cell unfolded across time steps]
Source: NN Lectures, Tudor Berariu, 2016
Backpropagation through time

Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.

[Figure: loss accumulated over the fully unrolled sequence]
Source: CS231n Lecture 10


Truncated Backpropagation through time

Run forward and backward through chunks of the sequence instead of the whole sequence.

Source: CS231n Lecture 10


Truncated Backpropagation through time

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.

Source: CS231n Lecture 10




Truncated BPTT


Used in practice.

Summary of the algorithm (see the sketch below):
– Present a sequence of k1 time steps of input/output pairs to the network.
– Unroll the network, then calculate and accumulate errors across k2 time steps.
– Roll up the network and update the weights.
– Repeat.
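
The sketch below shows this loop in PyTorch for the common special case k1 = k2 = k. The names model, criterion, optimizer and the inputs/targets tensors are assumptions used only for illustration; model is any recurrent module with the (output, hidden) interface of torch.nn.RNN.

import torch

def train_truncated_bptt(model, criterion, optimizer, inputs, targets, k=35):
    # inputs, targets: tensors of shape (seq_len, batch, features)
    h = None                         # hidden state carried forward across chunks
    for start in range(0, inputs.size(0), k):
        x = inputs[start:start + k]
        y = targets[start:start + k]
        if h is not None:
            h = h.detach()           # keep the value, but cut the backprop graph here
        out, h = model(x, h)         # unroll the network over this chunk only
        loss = criterion(out, y)     # accumulate errors across the k time steps
        optimizer.zero_grad()
        loss.backward()              # backpropagate through the chunk, not the whole sequence
        optimizer.step()
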
Teacher Forcing and Warm-start


When training an RNN to generate a sequence, the predictions (outputs y^(t)) of the RNN cell are often used as the input of the cell at the next time step.

Teacher Forcing: at training time, use the targets of the sequence, instead of the RNN's own predictions, as inputs to the next step.

Warm-start: when using an RNN to predict the next value conditioned on previous predictions, it is sometimes necessary to give the RNN some context (known ground-truth elements) before letting it predict on its own. Both are sketched below.
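
A minimal sketch of the two strategies for a single-step recurrent cell. cell (returning an output and a new hidden state) and to_input (mapping a target or prediction to the next input, e.g. an embedding lookup) are hypothetical helpers, named here only for illustration.

def unroll_teacher_forcing(cell, to_input, x0, h, targets):
    """At every step, feed the ground-truth target as the next input."""
    x, outputs = x0, []
    for y_true in targets:
        y_pred, h = cell(x, h)
        outputs.append(y_pred)
        x = to_input(y_true)          # teacher forcing: next input comes from the target
    return outputs, h

def unroll_with_warm_start(cell, to_input, x0, h, context, n_steps):
    """Condition on known ground truth first, then predict on the model's own outputs."""
    x = x0
    for y_true in context:            # warm-start: feed known context elements
        _, h = cell(x, h)
        x = to_input(y_true)
    outputs = []
    for _ in range(n_steps):          # free running: condition on own predictions
        y_pred, h = cell(x, h)
        outputs.append(y_pred)
        x = to_input(y_pred)
    return outputs, h
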
LSTM

LSTM Cell

Img source: https://medium.com/@kangeugine/


Input Gate (i ∈ (0, 1), sigmoid) – scales the input to the cell (write)

Output Gate (o ∈ (0, 1), sigmoid) – scales the output from the cell (read)

Forget Gate (f ∈ (0, 1), sigmoid) – scales the old cell values (reset memory)
LSTM Cell - Equations

i_t = σ( θ_xi x^(t) + θ_hi h^(t-1) + b_i )

f_t = σ( θ_xf x^(t) + θ_hf h^(t-1) + b_f )

o_t = σ( θ_xo x^(t) + θ_ho h^(t-1) + b_o )

g_t = tanh( θ_xg x^(t) + θ_hg h^(t-1) + b_g )

c_t = f_t ⊙ c^(t-1) + i_t ⊙ g_t

h_t = o_t ⊙ tanh(c_t), where ⊙ is elementwise multiplication

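A minimal NumPy sketch of one LSTM step that follows the equations above; the parameters are packed into a plain dict p whose keys mirror the θ and b symbols (the packing and the name lstm_step are illustrative).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step: (x^(t), h^(t-1), c^(t-1)) -> (h_t, c_t)."""
    i = sigmoid(p["theta_xi"] @ x_t + p["theta_hi"] @ h_prev + p["b_i"])   # input gate (write)
    f = sigmoid(p["theta_xf"] @ x_t + p["theta_hf"] @ h_prev + p["b_f"])   # forget gate (reset memory)
    o = sigmoid(p["theta_xo"] @ x_t + p["theta_ho"] @ h_prev + p["b_o"])   # output gate (read)
    g = np.tanh(p["theta_xg"] @ x_t + p["theta_hg"] @ h_prev + p["b_g"])   # cell candidate
    c = f * c_prev + i * g            # c_t = f_t ⊙ c^(t-1) + i_t ⊙ g_t (elementwise)
    h = o * np.tanh(c)                # h_t = o_t ⊙ tanh(c_t)
    return h, c
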
LSTMs in practice


Sutskever et al., Sequence to Sequence Learning with Neural Networks, NIPS 2014
– Models are huge :-)
– 4 layers, 1000 LSTM cells per layer
– Input vocabulary of 160k
– Output vocabulary of 80k
– 1000-dimensional word embeddings
