DEEP LEARNING
Module 4 Part I
RECURRENT NEURAL NETWORKS
In a text setting, the vector x_t will contain the one-hot encoded word at the t-th time stamp.
In one-hot encoding, we have a vector of length equal to the lexicon size, and the component for the relevant word has a value of 1. All other components are 0.
Successive words are dependent on one another, so the model needs to capture these sequential dependencies.
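As a small illustration (not from the original slides), here is a minimal NumPy sketch of one-hot encoding against a toy lexicon; the lexicon and sentence below are made-up examples.

# Minimal one-hot encoding sketch (illustrative only; toy lexicon assumed).
import numpy as np

lexicon = ["the", "cat", "sat", "on", "mat"]          # lexicon of size 5
word_to_index = {w: i for i, w in enumerate(lexicon)}

def one_hot(word):
    """Return a vector of lexicon size with a 1 at the word's component."""
    v = np.zeros(len(lexicon))
    v[word_to_index[word]] = 1.0
    return v

sentence = ["the", "cat", "sat"]
X = np.stack([one_hot(w) for w in sentence])   # x_t for t = 1..3, shape (3, 5)
print(X)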
The key point is that there is an input x_t at each time stamp, and a hidden state h_t that changes at each time stamp as new data points arrive. Each time stamp also has an output value y_t.
When used in the text setting of predicting the next word, this approach is referred to as language modeling.
The hidden state at time t is given by a function of the input vector at time t and the hidden vector at time (t − 1):

  h_t = f(h_{t−1}, x_t),    y_t = g(h_t)
Note that the functions f(·) and g(·) are the same at each time
stamp.
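A hedged sketch of these two relations follows. The slides do not fix the exact forms of f(·) and g(·), so a tanh update and a softmax output, which are common choices, are assumed here, along with toy dimensions and random weights.

# Sketch of h_t = f(h_{t-1}, x_t) and y_t = g(h_t), assuming tanh / softmax.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 5, 8, 5                  # toy dimensions (assumed)
Wxh = rng.normal(0, 0.1, (d_hid, d_in))       # input-to-hidden weights
Whh = rng.normal(0, 0.1, (d_hid, d_hid))      # hidden-to-hidden weights
Why = rng.normal(0, 0.1, (d_out, d_hid))      # hidden-to-output weights

def f(h_prev, x):
    # hidden state update: the same weights at every time stamp
    return np.tanh(Whh @ h_prev + Wxh @ x)

def g(h):
    # output: softmax over the lexicon
    z = Why @ h
    e = np.exp(z - z.max())
    return e / e.sum()

h_prev = np.zeros(d_hid)                      # h_0 fixed to a constant vector
x_t = np.eye(d_in)[2]                         # a one-hot input at time t
h_t = f(h_prev, x_t)
y_t = g(h_t)                                  # probability over the next word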
A key point here is the presence of the self-loop in Figure 1.17(a), which will cause the hidden state of the neural network to change after the input of each x_t.
In practice, one only works with sequences of finite length, and it makes sense to unfurl the loop into a "time-layered" network that looks more like a feed-forward network. This network is shown in Figure 1.17(b).
The weight matrices of the connections are shared by multiple connections in the time-layered network to ensure that the same function is used at each time stamp. This sharing is the key to the domain-specific insights that are learned by the network.
The backpropagation algorithm takes the sharing and temporal length into account when updating the weights during the learning process. This special type of backpropagation algorithm is referred to as backpropagation through time (BPTT).
Because of this recursive nature, the recurrent network has the ability to compute a function of variable-length inputs.
For example, starting at h_0, which is typically fixed to some constant vector (such as the zero vector), we have h_1 = f(h_0, x_1) and h_2 = f(f(h_0, x_1), x_2). In general, h_t is a function of x_1 . . . x_t, so one can write

  h_t = F_t(x_1, x_2, . . . , x_t)

Note that the function F_t(·) varies with the value of t. Such an approach is particularly useful for variable-length inputs like text sentences.
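The sketch below illustrates this composition: the same update function f (a tanh cell is assumed here, with arbitrary toy weights) is applied repeatedly, so sequences of any length map to a hidden state of the same size.

# h_t = F_t(x_1, ..., x_t) obtained by repeatedly applying the same f.
import numpy as np

rng = np.random.default_rng(1)
Wxh = rng.normal(0, 0.1, (4, 3))
Whh = rng.normal(0, 0.1, (4, 4))

def f(h_prev, x):
    return np.tanh(Whh @ h_prev + Wxh @ x)

def F(xs, h0=np.zeros(4)):
    """Compute h_t for a variable-length input sequence xs = [x_1, ..., x_t]."""
    h = h0
    for x in xs:                 # the same f (same weights) at every step
        h = f(h, x)
    return h

short_seq = [rng.normal(size=3) for _ in range(2)]
long_seq  = [rng.normal(size=3) for _ in range(7)]
print(F(short_seq).shape, F(long_seq).shape)   # same state size for any length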
An interesting theoretical property of recurrent neural networks is that they are Turing complete. What this means is that, given enough data and computational resources, a recurrent neural network can simulate any algorithm.
COMPUTATIONAL GRAPHS
A recurrent neural network is a neural network that is specialized for processing a sequence of values x(1), . . . , x(τ).
A computational graph is a way to formalize the structure of a set of computations, such as those involved in mapping inputs and parameters to outputs and loss.
We explain the idea of unfolding a recursive or recurrent computation into a computational graph that has a repetitive structure.
• For example, consider the classical form of a dynamical system:

  s(t) = f(s(t−1); θ)          (10.1)

  where s(t) is the state of the system. Equation 10.1 is recurrent because the definition of s at time t refers back to the same definition at time t − 1.
• For a finite number of time steps τ, the graph can be unfolded by applying the definition τ − 1 times. For example, if we unfold equation 10.1 for τ = 3 time steps, we obtain

  s(3) = f(s(2); θ) = f(f(s(1); θ); θ)

Such an expression can now be represented by a traditional directed acyclic computational graph.
Figure 10.1: The classical dynamical system described by equation 10.1,
illustrated as an unfolded computational graph. Each node represents the
state at some time t and the function f maps the state at t to the state at t
+ 1. The same parameters (the same value of θ used to parametrize f) are
used for all time steps.
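A tiny sketch of equation 10.1 and its unfolding for τ = 3; the particular f and θ below are arbitrary choices made only for illustration.

# Unfolding s(t) = f(s(t-1); theta) for tau = 3 time steps.
def f(s, theta):
    return theta * s + 1.0              # an arbitrary example of f

theta, s1 = 0.5, 3.0
s3_unrolled = f(f(s1, theta), theta)    # s(3) = f(f(s(1); theta); theta)

s = s1
for _ in range(2):                      # applying the definition tau - 1 times
    s = f(s, theta)
assert s == s3_unrolled                 # the unrolled graph computes the same value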
As another example, let us consider a dynamical system driven by an external signal x(t):

  s(t) = f(s(t−1), x(t); θ)          (10.4)

where the state now contains information about the whole past sequence.
We now rewrite equation 10.4 using the variable h to represent the state:

  h(t) = f(h(t−1), x(t); θ)          (10.5)
One way to draw the RNN is with a diagram containing one node for every component that might exist in a physical implementation of the model, such as a biological neural network.
In this view, the network defines a circuit that operates in real time, with physical parts whose current state can influence their future state, as in the left of figure 10.2.
The other way to draw the RNN is as an unfolded computational graph, in which each component is represented by many different variables, with one variable per time step, representing the state of the component at that point in time.
Each variable for each time step is drawn as a separate node of the computational graph, as in the right of figure 10.2.
The unfolded graph now has a size that depends on the sequence length.
We can represent the unfolded recurrence after t steps with a function g(t):

  h(t) = g(t)(x(t), x(t−1), x(t−2), . . . , x(2), x(1)) = f(h(t−1), x(t); θ)

The function g(t) takes the whole past sequence (x(t), x(t−1), x(t−2), . . . , x(2), x(1)) as input and produces the current state, but the unfolded recurrent structure allows us to factorize g(t) into repeated application of a function f.
The unfolding process thus introduces two major advantages:
1. Regardless of the sequence length, the learned model always has
the same input size, because it is specified in terms of transition
from one state to another state, rather than specified in terms of
a variable-length history of states.
2. It is possible to use the same transition function f with the
same parameters at every time step.
The recurrent graph and the unrolled graph have their uses.
The recurrent graph is succinct.
The unfolded graph provides an explicit description of which
computations to perform.
The unfolded graph also helps to illustrate the idea of
information flow forward in time (computing outputs and
losses) and backward in time (computing gradients) by
explicitly showing the path along which this information
flows.
RNN Design
Figure 10.3: The computational graph to compute the training loss of a recurrent network that maps an input sequence of x values to a corresponding sequence of output o values. A loss L measures how far each o is from the corresponding training target y. When using softmax outputs, we assume o is the unnormalized log probabilities. The loss L internally computes ŷ = softmax(o) and compares this to the target y. The RNN has input-to-hidden connections parametrized by a weight matrix U, hidden-to-hidden recurrent connections parametrized by a weight matrix W, and hidden-to-output connections parametrized by a weight matrix V.
An RNN whose only recurrence is from the output at one time step to the hidden units at the next time step (figure 10.4) is strictly less powerful (can express a smaller set of functions) than those in the family represented by figure 10.3.
The RNN in this figure is trained to put a specific output value into o, and o is the only information it is allowed to send to the future.
There are no direct connections from h going forward. The previous h is connected to the present only indirectly, via the predictions it was used to produce.
Unless o is very high-dimensional and rich, it will usually lack important information from the past.
This makes the RNN in this figure less powerful, but it may be easier to train because each time step can be trained in isolation from the others.
Returning to the RNN of figure 10.3: forward propagation begins with a specification of the initial state h(0). Then, for each time step from t = 1 to t = τ, we apply the following update equations:

  a(t) = b + W h(t−1) + U x(t)
  h(t) = tanh(a(t))
  o(t) = c + V h(t)
  ŷ(t) = softmax(o(t))

where the parameters are the bias vectors b and c along with the weight matrices U, V and W, respectively for input-to-hidden, hidden-to-output and hidden-to-hidden connections.
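The following NumPy sketch runs these update equations over one toy sequence; the dimensions, random parameters and one-hot inputs are assumptions made for the example, not part of the original material.

# Forward pass of the RNN of figure 10.3:
#   a(t) = b + W h(t-1) + U x(t);  h(t) = tanh(a(t));
#   o(t) = c + V h(t);             yhat(t) = softmax(o(t)).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, tau = 5, 16, 5, 4          # toy sizes (assumed)
U = rng.normal(0, 0.1, (n_hid, n_in))          # input-to-hidden
W = rng.normal(0, 0.1, (n_hid, n_hid))         # hidden-to-hidden
V = rng.normal(0, 0.1, (n_out, n_hid))         # hidden-to-output
b = np.zeros(n_hid)
c = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

xs = [np.eye(n_in)[rng.integers(n_in)] for _ in range(tau)]   # one-hot inputs
h = np.zeros(n_hid)                                           # h(0)
for x in xs:                                                  # t = 1 .. tau
    a = b + W @ h + U @ x
    h = np.tanh(a)
    o = c + V @ h
    yhat = softmax(o)         # predicted distribution over outputs at time t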
This is an example of a recurrent network that maps an input
sequence to an output sequence of the same length.
The total loss for a given sequence of x values paired with a sequence of y values would then be just the sum of the losses over all the time steps. For example, if L(t) is the negative log-likelihood of y(t) given x(1), . . . , x(t), then

  L( {x(1), . . . , x(τ)}, {y(1), . . . , y(τ)} ) = Σ_t L(t) = − Σ_t log p_model( y(t) | x(1), . . . , x(t) )
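A minimal sketch of this total loss, assuming the predicted distributions ŷ(t) and the target indices y(t) are already available (the numbers below are made up):

# Total loss = sum over time of the negative log-likelihood of y(t).
import numpy as np

yhat = [np.array([0.7, 0.2, 0.1]),      # yhat(1): model's distribution at t = 1
        np.array([0.1, 0.8, 0.1]),      # yhat(2)
        np.array([0.3, 0.3, 0.4])]      # yhat(3)
y = [0, 1, 2]                           # target indices y(t) (made-up example)

L = sum(-np.log(p[t]) for p, t in zip(yhat, y))   # L = sum_t L(t)
print(L)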
Computing the gradient of this loss function with respect to the parameters is an expensive operation. It involves a forward propagation pass moving left to right through the unrolled graph, followed by a backward propagation pass moving right to left through the graph. The runtime is O(τ) and cannot be reduced by parallelization because the forward propagation graph is inherently sequential; each time step may only be computed after the previous one.
States computed in the forward pass must be stored until they are reused during the backward pass, so the memory cost is also O(τ). The back-propagation algorithm applied to the unrolled graph with O(τ) cost is called back-propagation through time or BPTT.
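For illustration, here is a compact, hand-derived NumPy sketch of BPTT for the tanh/softmax update equations given earlier; it stores every h(t) during the forward pass and sweeps backward through the unrolled graph. All sizes and data are toy values.

# Back-propagation through time: store h(t) in the forward pass (O(tau) memory),
# then sweep backward from t = tau to t = 1 accumulating gradients.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, tau = 4, 8, 4, 5
U = rng.normal(0, 0.1, (n_hid, n_in)); W = rng.normal(0, 0.1, (n_hid, n_hid))
V = rng.normal(0, 0.1, (n_out, n_hid)); b = np.zeros(n_hid); c = np.zeros(n_out)
xs = [np.eye(n_in)[rng.integers(n_in)] for _ in range(tau)]   # toy inputs
ys = [rng.integers(n_out) for _ in range(tau)]                # toy target indices

# Forward pass: keep every h(t) and yhat(t) for reuse in the backward pass.
hs, yhats = [np.zeros(n_hid)], []
for x in xs:
    h = np.tanh(b + W @ hs[-1] + U @ x)
    o = c + V @ h
    e = np.exp(o - o.max()); yhats.append(e / e.sum())
    hs.append(h)

# Backward pass (right to left through the unrolled graph).
dU = np.zeros_like(U); dW = np.zeros_like(W); dV = np.zeros_like(V)
db = np.zeros_like(b); dc = np.zeros_like(c)
dh_next = np.zeros(n_hid)                      # gradient flowing back from t + 1
for t in reversed(range(tau)):
    do = yhats[t].copy(); do[ys[t]] -= 1.0     # dL/do for softmax + NLL
    dV += np.outer(do, hs[t + 1]); dc += do
    dh = V.T @ do + dh_next                    # from the output and from the future
    da = (1.0 - hs[t + 1] ** 2) * dh           # back through tanh
    dW += np.outer(da, hs[t]); dU += np.outer(da, xs[t]); db += da
    dh_next = W.T @ da                         # pass gradient on to step t - 1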
TEACHER FORCING
Teacher forcing is a procedure that emerges from the maximum likelihood criterion, in which during training the model receives the ground truth output y(t) as input at time t + 1.
• We see that at time t = 2, the model is trained to maximize the conditional probability of y(2) given both the x sequence so far and the previous y value from the training set.
• Maximum likelihood thus specifies that during training, rather
than feeding the model’s own output back into itself, these
connections should be fed with the target values specifying what
the correct output should be.
We originally motivated teacher forcing as allowing us to avoid back-propagation through time in models that lack hidden-to-hidden connections. However, as soon as the hidden units become a function of earlier time steps, the BPTT algorithm is necessary.
Some models may thus be trained with both teacher forcing and
BPTT.
The disadvantage of strict teacher forcing arises if the network is going to be later used in an open-loop mode, with the network outputs (or samples from the output distribution) fed back as input.
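To make the contrast concrete, the sketch below compares teacher forcing with open-loop decoding; the decode step is a stand-in RNN cell with assumed weights and a made-up start token, not a specific published model.

# Teacher forcing vs. open-loop decoding (illustrative stand-in for an RNN step).
import numpy as np

rng = np.random.default_rng(0)
vocab, n_hid = 6, 8
Wyh = rng.normal(0, 0.1, (n_hid, vocab))   # previous-output-to-hidden weights (assumed)
Whh = rng.normal(0, 0.1, (n_hid, n_hid))
Who = rng.normal(0, 0.1, (vocab, n_hid))

def step(h, prev_token):
    """One decode step: the previous output token (one-hot) feeds the hidden state."""
    h = np.tanh(Whh @ h + Wyh @ np.eye(vocab)[prev_token])
    logits = Who @ h
    return h, int(np.argmax(logits))

targets = [1, 4, 2, 5]                     # ground-truth outputs y(t) (toy data)

# Training with teacher forcing: feed the ground truth y(t) as input at t + 1.
h, prev = np.zeros(n_hid), 0               # 0 used as an assumed start token
for y_true in targets:
    h, y_pred = step(h, prev)
    prev = y_true                          # teacher forcing

# Open-loop (test time): feed the model's own prediction back as input.
h, prev = np.zeros(n_hid), 0
generated = []
for _ in range(len(targets)):
    h, y_pred = step(h, prev)
    generated.append(y_pred)
    prev = y_pred                          # model output fed back as input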
ENCODER-DECODER SEQUENCE TO SEQUENCE ARCHITECTURES
Here we discuss how an RNN can be trained to map an input sequence to an output sequence which is not necessarily of the same length.
This comes up in many applications, such as speech recognition, machine translation or question answering, where the input and output sequences in the training set are generally not of the same length.
We often call the input to the RNN the "context." We want to produce a representation of this context, C. The context C might be a vector or a sequence of vectors that summarize the input sequence.
The architecture is composed of:
(1) an encoder or reader or input RNN that processes the input sequence. The encoder emits the context C, usually as a simple function of its final hidden state.
(2) a decoder or writer or output RNN that is conditioned on that fixed-length vector to generate the output sequence.
In a sequence-to-sequence architecture, the two RNNs are trained jointly to maximize the average of log P(y(1), . . . , y(ny) | x(1), . . . , x(nx)) over all the pairs of x and y sequences in the training set.
One limitation of this architecture arises when the context C output by the encoder RNN has a dimension that is too small to properly summarize a long sequence. This phenomenon was observed by Bahdanau et al. (2015) in the context of machine translation. They proposed to make C a variable-length sequence rather than a fixed-size vector.
They introduced an attention mechanism that learns to associate elements of the sequence C to elements of the output sequence.
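A brief sketch of the core attention computation; the alignment scores are simplified here to dot products, whereas Bahdanau et al. (2015) compute them with a small learned network, and all values below are toy data.

# Attention: a variable-length context built as a weighted sum of encoder states.
import numpy as np

rng = np.random.default_rng(0)
n_hid, src_len = 8, 5
encoder_states = rng.normal(size=(src_len, n_hid))   # h_1 .. h_src_len (toy values)
decoder_state = rng.normal(size=n_hid)               # current decoder hidden state

scores = encoder_states @ decoder_state              # one alignment score per input element
weights = np.exp(scores - scores.max())
weights /= weights.sum()                             # softmax over the input positions

context = weights @ encoder_states                   # context used at this output step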
Encoder
• A stack of several recurrent units where each accepts a
single element of the input sequence, collects information for
that element and propagates it forward.
• In a question-answering problem, the input sequence is a collection of all words from the question. Each word is represented as x_i, where i is the order of that word.
• The hidden states h_i are computed using the formula:

  h_t = f(W^(hh) h_{t−1} + W^(hx) x_t)

Encoder Vector
• This is the final hidden state produced from the encoder part
of the model. It is calculated using the formula above.
• This vector aims to encapsulate the information for all input
elements in order to help the decoder make accurate
predictions.
• It acts as the initial hidden state of the decoder part of the
model.
Decoder
• A stack of several recurrent units where each predicts an output
y_t at a time step t.
• Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state.
• In the question-answering problem, the output sequence is a collection of all words from the answer. Each word is represented as y_i, where i is the order of that word.
• Any hidden state h_i is computed using the formula:

  h_t = f(W^(hh) h_{t−1})

  That is, we are just using the previous hidden state to compute the next one.
• The output y_t at time step t is computed using the formula:

  y_t = softmax(W^S h_t)

  where W^S is an output weight matrix, and softmax produces a probability vector over the possible outputs (e.g. the words of the answer in the question-answering problem).
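Putting the pieces together, a hedged end-to-end sketch of the encoder, the encoder vector and the decoder using the formulas above; the dimensions, the greedy argmax decoding, the fixed answer length and the sharing of the recurrent weight matrix between encoder and decoder are all simplifying assumptions for illustration.

# Minimal encoder-decoder sketch: encoder hidden states, encoder vector, decoder.
import numpy as np

rng = np.random.default_rng(0)
in_vocab, out_vocab, n_hid = 6, 7, 8
W_hx = rng.normal(0, 0.1, (n_hid, in_vocab))   # encoder input weights W^(hx)
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))      # recurrent weights W^(hh) (shared here)
W_s  = rng.normal(0, 0.1, (out_vocab, n_hid))  # output weights W^S

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Encoder: h_t = f(W_hh h_{t-1} + W_hx x_t) over the question words.
question = [0, 3, 2, 5]                        # toy word indices
h = np.zeros(n_hid)
for w in question:
    h = np.tanh(W_hh @ h + W_hx @ np.eye(in_vocab)[w])
encoder_vector = h                             # final hidden state summarizing the input

# Decoder: h_t = f(W_hh h_{t-1}), y_t = softmax(W_s h_t), starting from the encoder vector.
h = encoder_vector
answer = []
for _ in range(3):                             # generate a fixed number of words (toy choice)
    h = np.tanh(W_hh @ h)
    y = softmax(W_s @ h)
    answer.append(int(np.argmax(y)))           # greedy choice of the next answer word
print(answer)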
Applications
The sequence-to-sequence architecture has many applications, such as:
• Google's Machine Translation
• Question-answering chatbots
• Speech recognition
• Time-series applications