11.1. Deep Learning (RNN)

This document discusses deep learning sequence models, focusing on Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs). It explains the architecture and functioning of RNNs, including the concept of hidden states and the challenges of training RNNs, particularly the vanishing and exploding gradient problems. The document also outlines the computational graph for RNNs and the implications of weight parameters on gradient behavior during training.

Deep Learning: Sequence Models
Part 1

Dr. Oybek Eraliev
Department of Computer Engineering, Inha University in Tashkent
Email: [email protected]


Content

• Recurrent Neural Network (RNN)
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit Network (GRUs)
• Implementation of RNN, GRU and LSTM


Recurrent Neural Network
“Vanilla” neural network
• A vanilla neural network (NN) is a type of artificial intelligence (AI) model made up of an input layer, an output layer, and multiple hidden layers. The hidden layers are fully connected, meaning that each neuron in one layer is connected to every neuron in the next layer. The output of each layer is then fed through a nonlinear activation function.

[Figure: a fully connected network with weight matrices W(1) and W(2) between the inputs, hidden layer, and outputs]




Recurrent Neural Network
Recurrent Neural Network: Process Sequence
[Figure: sequence-processing patterns: one to one, one to many, many to one, many to many, many to many]


Recurrent Neural Network

Key idea: RNNs have an “internal state” that is updated as a sequence is processed.

[Figure: a single RNN cell with input x, the recurrent internal state, and output y]


Recurrent Neural Network

[Figure: the RNN unrolled over time, with inputs x1, x2, x3, …, xt, one RNN cell per step, and outputs y1, y2, y3, …, yt]


Recurrent Neural Network

RNN hidden state update


We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

Here h_t is the new state, h_{t-1} is the old state, x_t is the input vector at the current time step, and f_W is some function with parameters W.


Recurrent Neural Network

RNN output generation


The output at each time step is then computed from the new state by another function with parameters W_hy:

    y_t = f_{W_hy}(h_t)
Recurrent Neural Network

[Figure: the unrolled RNN, now showing the hidden states h1, h2, h3, …, ht carried between cells along with the inputs x1…xt and outputs y1…yt]


Recurrent Neural Network

We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

Notice: the same function and the same set of parameters are used at every time step.
Recurrent Neural Network
Recurrent Neural Network (Vanilla)
The state consists of a single “hidden” vector h:

    h_t = f_W(h_{t-1}, x_t)
    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = f(W_hy h_t)
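These equations translate almost directly into code. The following is a minimal NumPy sketch of a single vanilla RNN step, not the lecture's own implementation; the matrix names W_xh, W_hh, W_hy and the toy dimensions are illustrative, and the output activation f is left as the identity.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One vanilla RNN step: update the hidden state and produce an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                            # y_t = f(W_hy h_t), with f = identity here
    return h_t, y_t

# Toy example: 3-dimensional input, 4-dimensional hidden state, 2-dimensional output.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
W_hy = rng.normal(size=(2, 4))
h1, y1 = rnn_step(rng.normal(size=3), np.zeros(4), W_xh, W_hh, W_hy)
```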


Recurrent Neural Network
RNN: Computational Graph
Re-use the same weight matrix at every time-step

[Computational graph: h0 → f_W → h1 → f_W → h2 → f_W → h3 → … → hT, with inputs x1, x2, x3, … feeding each f_W and a single shared weight matrix W]


Recurrent Neural Network
RNN: Computational Graph: Many to Many

[Computational graph (many to many): each hidden state h1 … hT now also produces an output y1 … yT]


Recurrent Neural Network
RNN: Computational Graph: Many to Many

[Computational graph (many to many), now with a per-step loss L1 … LT attached to each output y1 … yT and a total loss L combining them]


Recurrent Neural Network
RNN: Computational Graph: Many to One

[Computational graph (many to one): the sequence x1, x2, x3, … is consumed and only the final hidden state hT is used to produce the output]


Recurrent Neural Network
RNN: Computational Graph: One to Many

[Computational graph (one to many): a single input x initializes the recurrence, which then produces outputs y1 … yT]


Recurrent Neural Network
RNN: Computational Graph: One to Many

In the one-to-many setting, only the first step receives the input x, so what do we feed into the later cells? One option is to feed in zeros at every remaining step; a common alternative is to feed each output back in as the next input, so the cells receive x, then y1, y2, …, y_{T-1}.


Recurrent Neural Network

One big problem is that the more we unroll a recurrent neural network, the harder it is to train.

[Figure: RNN unrolled over four steps, with inputs x1 … x4 and outputs y1 … y4]


Recurrent Neural Network

This problem is called the Vanishing/Exploding Gradient Problem.


Recurrent Neural Network

In our example, the Vanishing/Exploding Gradient problem has to do with the recurrent weight W_hh that we copy each time we unroll the network.

[Figure: unrolled RNN with the same recurrent weight W_hh connecting each cell to the next]


Recurrent Neural Network

Note: To make it easier to understand the Vanishing/Exploding Gradient problem, we’re going to ignore the other weights and biases in this network and just focus on W_hh.

[Figure: unrolled RNN with only the recurrent weight W_hh highlighted]


Recurrent Neural Network

Also, just to remind you: when we optimize neural networks with backpropagation, we first find the derivatives, or gradients, for each parameter, and then update each parameter with gradient descent:

    W := W − α · ∂J(W)/∂W

[Figure: the loss J(W) plotted against W, with gradient-descent steps moving toward the minimum]
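As a concrete illustration (not taken from the slides), the update rule above can be run on a one-parameter toy problem; the quadratic loss J(W) = (W − 3)^2 and the learning rate below are made up for the example.

```python
# Toy gradient descent implementing W := W - alpha * dJ/dW.
# The loss J(W) = (W - 3)**2 and its gradient are invented for this illustration.
def grad_J(W):
    return 2 * (W - 3)          # dJ/dW for J(W) = (W - 3)^2

W, alpha = 0.0, 0.1             # initial parameter and learning rate
for step in range(50):
    W -= alpha * grad_J(W)      # one gradient-descent update per step
print(W)                        # approaches the minimum at W = 3
```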


Recurrent Neural Network

How can a gradient explode? In our example, the gradient will explode when we set W_hh to any value larger than 1.

[Figure: unrolled RNN with the recurrent weight W_hh at every step]


Recurrent Neural Network

Let’s set W_hh = 2. Now the first input value, x1, gets multiplied by W_hh in each copy of the network: x1 × 2 after one step, x1 × 2 × 2 after two steps, and x1 × 2 × 2 × 2 after three steps. In other words, we multiply the input value by W_hh, which is 2, raised to the number of times we unrolled, which is 3:

    x1 × 2 × 2 × 2 = x1 × 2^3 = x1 × W_hh^(number of unrolls)

And that means the first input value is amplified 8 times before it gets to the final copy of the network.

[Figure: unrolled RNN with W_hh = 2 at every step, showing x1 being multiplied by 2 once per step]
Recurrent Neural Network

Now, if we had 50 sequential RNN cells, we would get

    x1 × 2^50 = x1 × (a huge number)


Recurrent Neural Network

This huge number is why they call this an Exploding Gradient Problem.


Recurrent Neural Network

If we tried to train this RNN model with backpropagation, this huge number would find its way into some of the gradients. For example,

    ∂J(W)/∂W_xh = ∂J(W)/∂ŷ · ∂ŷ/∂h1 · ∂h1/∂W_xh + ⋯

and, ignoring the activations, the term ∂ŷ/∂h1 multiplies in W_hh once per unrolled step, so the gradient contains the factor x1 × W_hh^(number of unrolls) = x1 × 2^50, a huge number.


Recurrent Neural Network

… and that would make it hard to take small steps to find the optimal weights and biases.


Recurrent Neural Network

However, when the gradient contains a huge number, we end up taking relatively large steps:

    W := W − α · ∂J(W)/∂W

And instead of finding the optimal parameter, we just bounce around a lot.

[Figure: the loss J(W) with large gradient-descent steps overshooting the minimum]


Recurrent Neural Network

One way to prevent the Exploding Gradient Problem would be to limit W_hh to values less than 1. However, this results in the Vanishing Gradient Problem.


Recurrent Neural Network

Now, if we had 50 sequential RNN cells with, say, W_hh = 0.5, we would get

    x1 × 0.5^50 = x1 × (a very small number)


Recurrent Neural Network

This number is super close to 0, and this is why it is called the Vanishing Gradient Problem.
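A quick numerical illustration (not from the slides) of both effects: repeatedly multiplying by a recurrent weight above or below 1 over 50 steps either blows the value up or shrinks it toward zero.

```python
# Toy numbers showing why gradients explode for |W_hh| > 1 and vanish for |W_hh| < 1.
x1 = 1.0
steps = 50
print(x1 * 2.0 ** steps)   # about 1.1e15: the exploding case from the slides
print(x1 * 0.5 ** steps)   # about 8.9e-16: the vanishing case
```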


Recurrent Neural Network

Now, when optimizing a parameter, instead of taking steps that are too large, we end up taking steps that are too small:

    W := W − α · ∂J(W)/∂W

[Figure: the loss J(W) with tiny gradient-descent steps that barely move toward the minimum]


Recurrent Neural Network
Summary
    h_t = f_W(h_{t-1}, x_t)
    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = f(W_hy h_t)

[Figure: a single RNN cell, in which the previous state h_{t-1} and the input x_t pass through a tanh to produce h_t, which yields the output y_t]
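Putting the summary together, here is a minimal NumPy sketch of a full forward pass that unrolls this recurrence over a sequence, reusing the same weights at every step. It is an illustrative sketch, not the lecture's code; the names and toy sizes are assumptions, and the output activation f is omitted.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy):
    """Unroll the vanilla RNN over a whole sequence with shared weights."""
    h, ys = h0, []
    for x_t in xs:                              # one recurrence step per sequence element
        h = np.tanh(W_hh @ h + W_xh @ x_t)      # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        ys.append(W_hy @ h)                     # y_t = W_hy h_t (output activation omitted)
    return h, ys

# Toy run: 5 random 3-dimensional inputs, hidden size 4, output size 2.
rng = np.random.default_rng(1)
W_xh = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
W_hy = rng.normal(size=(2, 4))
h_T, outputs = rnn_forward(rng.normal(size=(5, 3)), np.zeros(4), W_xh, W_hh, W_hy)
```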


Recurrent Neural Network
RNN tradeoffs
RNN Advantages:
• Can process inputs of any length
• Computation for step t can (in theory) use information from many steps back
• Model size does not increase for longer inputs
• The same weights are applied at every timestep, so there is symmetry in how inputs are processed

RNN Disadvantages:
• Recurrent computation is slow
• In practice, it is difficult to access information from many steps back


Content

• Recurrent Neural Network (RNN)
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit Network (GRUs)
• Implementation of RNN, GRU and LSTM


Long Short-Term Memory (LSTM)

• Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem commonly encountered by traditional RNNs.
• Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods.
• It aims to provide a short-term memory for the RNN that can last thousands of timesteps (thus “long short-term memory”).
• The name is made in analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century.




Long Short-Term Memory (LSTM)

An LSTM unit is typically composed of a cell and three gates: an input gate, an output gate, and a forget gate.

The cell remembers values over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell.


Long Short-Term Memory (LSTM)

Forget gates decide what information to discard from the previous state, by mapping the previous state and the current input to a value between 0 and 1.

A (rounded) value of 1 signifies retention of the information, and a value of 0 represents discarding it.


Long Short-Term Memory (LSTM)

Input gates decide which pieces of new information to store in the current cell state, using the same system as forget gates.


Long Short-Term Memory (LSTM)

Output gates control which pieces of information in the current cell state to output, by assigning a value from 0 to 1 to the information, considering the previous and current states.

Selectively outputting relevant information from the current state allows the LSTM network to maintain useful, long-term dependencies for making predictions, both at the current and at future time-steps.
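To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM step in the standard formulation. It is not the lecture's own implementation; the parameter layout (dicts W, U, b keyed per gate) and the toy sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: forget, input, and output gates plus the cell update.
    W, U, b are dicts of per-gate parameters keyed by 'f', 'i', 'o', 'g'."""
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate: what to discard from c_prev
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate: which new information to store
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate: what to expose from the cell
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate cell content
    c_t = f * c_prev + i * g                              # new cell state
    h_t = o * np.tanh(c_t)                                # new hidden state / output
    return h_t, c_t

# Toy sizes: input dimension 3, hidden dimension 4.
rng = np.random.default_rng(2)
W = {k: rng.normal(size=(4, 3)) for k in 'fiog'}
U = {k: rng.normal(size=(4, 4)) for k in 'fiog'}
b = {k: np.zeros(4) for k in 'fiog'}
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
```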


Long Short-Term Memory (LSTM)

• The intuition behind the LSTM architecture is to create an additional module in a neural network that learns when to remember and when to forget pertinent information.
• In other words, the network effectively learns which information might be needed later on in a sequence and when that information is no longer needed.
• For instance, in the context of natural language processing, the network can learn grammatical dependencies.


Content

• Recurrent Neural Network (RNN)
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit Network (GRUs)
• Implementation of RNN, GRU and LSTM


Gated Recurrent Unit Network (GRUs)

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014.

The GRU is like a long short-term memory (LSTM) with a gating mechanism to input or forget certain features, but it lacks a context vector and an output gate, resulting in fewer parameters than the LSTM.

The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling, and natural language processing was found to be similar to that of the LSTM.
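For comparison with the LSTM sketch above, here is a minimal NumPy sketch of a single GRU step in the standard fully gated form. Again it is illustrative rather than the lecture's code; the parameter names and toy sizes are assumptions, and note that some texts swap the roles of z and (1 − z) in the final interpolation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state, interpolation.
    W, U, b are dicts of per-gate parameters keyed by 'z', 'r', 'h'."""
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])             # update gate
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])             # reset gate
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])  # candidate hidden state
    h_t = (1 - z) * h_prev + z * h_cand                              # blend old state and candidate
    return h_t

# Toy sizes: input dimension 3, hidden dimension 4.
rng = np.random.default_rng(3)
W = {k: rng.normal(size=(4, 3)) for k in 'zrh'}
U = {k: rng.normal(size=(4, 4)) for k in 'zrh'}
b = {k: np.zeros(4) for k in 'zrh'}
h = gru_step(rng.normal(size=3), np.zeros(4), W, U, b)
```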


Gated Recurrent Unit Network (GRUs)

There are several variations on the full gated unit, with gating done using the previous hidden state, as well as a light gated recurrent unit and a simplified form called the minimal gated unit.




Content

• Recurrent Neural Network (RNN)
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit Network (GRUs)
• Implementation of RNN, GRU and LSTM


Implementation of RNN, GRU and LSTM

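The code shown on the original implementation slides is not included in this text version. As a substitute sketch (not the lecture's own code), the three layer types can be instantiated directly in PyTorch; the sizes, tensor shapes, and variable names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 10-dimensional inputs, 20-dimensional hidden state, 2 stacked layers.
input_size, hidden_size, num_layers = 10, 20, 2

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

x = torch.randn(3, 7, input_size)             # a batch of 3 sequences, each 7 steps long

out_rnn, h_rnn = rnn(x)                       # out: (3, 7, 20), h: (2, 3, 20)
out_gru, h_gru = gru(x)                       # same shapes as the vanilla RNN
out_lstm, (h_lstm, c_lstm) = lstm(x)          # the LSTM also returns its cell state c
```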

