Module 4-1
• Unlike traditional feedforward networks, RNNs have loops that allow information to
persist, making them well-suited for tasks where the order of data matters.
• E.g.: time series forecasting, speech recognition, and natural language processing (NLP)
• By applying the same update rule at each step of the sequence, the RNN ensures that every output is generated using the same set of weights, as shown in the sketch below.
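A minimal NumPy sketch of this weight sharing (the names rnn_step, W_xh and W_hh are illustrative, not taken from any particular library):

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input-to-hidden weights, shared across all steps
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weights, shared across all steps
b = np.zeros(4)

def rnn_step(h_prev, x_t):
    # The same W_xh, W_hh and b are reused at every time step.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 three-dimensional inputs
    h = rnn_step(h, x_t)             # one update rule applied repeatedly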
(b) Time-Layered Representation (Unrolled RNN)
• The RNN is expanded across time steps to show how information flows through the
sequence.
• Each word in a sentence (e.g., "the cat chased the mouse") is processed in a step-
by-step manner.
• The hidden state (ht) is passed to the next time step, allowing the network to retain
memory of previous words.
• The final output represents the predicted words.
Real-Life Applications of RNNs
Computational graphs
• A computational graph represents the sequence of operations in a neural
network, mapping inputs and parameters to outputs and loss.
• This enables backpropagation through time (BPTT) while maintaining shared parameters across time steps.
The basic formula for an RNN is:
h(t) = f(h(t-1), x(t); θ)
where θ denotes the parameters of the function f.
• Unfolding maps the left-hand graph to the right-hand graph in the figure below (both are computational graphs of an RNN without the output o).
• The black square indicates an interaction with a delay of one time step, from the state at time t to the state at time t + 1.
• Unfolding with parameter sharing is better than using different parameters per position: there are fewer parameters to estimate, and the model generalizes to sequences of various lengths.
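As a concrete illustration of unfolding, the recurrence above can be expanded for a few steps, starting from an initial state h(0):

h(1) = f(h(0), x(1); θ)
h(2) = f(h(1), x(2); θ) = f(f(h(0), x(1); θ), x(2); θ)
h(3) = f(h(2), x(3); θ) = f(f(f(h(0), x(1); θ), x(2); θ), x(3); θ)

Every application of f uses the same θ, which is exactly the parameter sharing that lets the unrolled graph handle sequences of any length.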
RNN design
1. Variation 1 of RNN (basic form): hidden2hidden connections,
sequence output
• The computational graph to compute the training loss of a recurrent network that
maps an input sequence of x values to a corresponding sequence of output o
values.
• The loss L evaluates the difference between the output o and the target y.
• With a softmax output, o holds the unnormalized log probabilities, and ŷ = softmax(o) is compared with y.
• The RNN is structured with three weight matrices:
• 𝑈: Connects input to hidden state
• 𝑊: Recurrent hidden-to-hidden connections
• 𝑉: Connects hidden state to output
This setup enables learning from sequential data.
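A minimal NumPy sketch of this variation, assuming tanh hidden units and a softmax output layer (the sizes and variable names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 4
U = rng.normal(size=(n_hid, n_in))     # U: input -> hidden
W = rng.normal(size=(n_hid, n_hid))    # W: recurrent hidden -> hidden
V = rng.normal(size=(n_out, n_hid))    # V: hidden -> output
b, c = np.zeros(n_hid), np.zeros(n_out)

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

x_seq = rng.normal(size=(6, n_in))       # input sequence of 6 steps
y_seq = rng.integers(0, n_out, size=6)   # target class at each step

h, loss = np.zeros(n_hid), 0.0
for x_t, y_t in zip(x_seq, y_seq):
    h = np.tanh(U @ x_t + W @ h + b)     # hidden2hidden recurrence
    o = V @ h + c                        # unnormalized log probabilities
    y_hat = softmax(o)
    loss += -np.log(y_hat[y_t])          # loss L accumulated over time steps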
2. Variation 2 of RNN: output2hidden connections, sequence output
• It produces an output at each time step and has recurrent connections only from the output at one time step to the hidden units at the next time step.
• Teacher forcing can be used to train RNNs as in Fig. 10.4, where only output2hidden connections exist.
• i.e., hidden2hidden connections are absent.
• In teacher forcing, instead of using the model’s predicted output, we provide the
actual correct output from the training data for the previous time step.
• This helps the model learn faster and more accurately because it doesn't have to
rely on its own potentially incorrect predictions.
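A schematic sketch of teacher forcing for this output2hidden variation (the function names and sizes are placeholders; only the idea of feeding back the ground-truth previous output comes from the slide):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 4
W_oh = rng.normal(size=(n_hid, n_out))   # previous *output* -> hidden (output2hidden)
W_xh = rng.normal(size=(n_hid, n_in))    # current input -> hidden
V = rng.normal(size=(n_out, n_hid))      # hidden -> output logits

def step(y_prev_onehot, x_t):
    # No hidden2hidden link: the only recurrence is through the previous output.
    h = np.tanh(W_oh @ y_prev_onehot + W_xh @ x_t)
    return V @ h

x_seq = rng.normal(size=(5, n_in))
y_true = rng.integers(0, n_out, size=5)  # correct outputs from the training data

y_prev = np.zeros(n_out)                 # start-of-sequence placeholder
for x_t, y_t in zip(x_seq, y_true):
    logits = step(y_prev, x_t)           # model's prediction for this step
    # Teacher forcing: condition the next step on the true y_t,
    # not on the model's own (possibly incorrect) prediction.
    y_prev = np.eye(n_out)[y_t]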
Encoder-decoder sequence-to-sequence architectures
• The Encoder-Decoder sequence-to-sequence architecture is a neural network framework designed for tasks where the input and output sequences have different lengths.
• It consists of an encoder and decoder.
An encoder or reader or input RNN processes the input sequence X = (x(1), . . . , x(nx)). The encoder emits the context C, usually as a simple function of its final hidden state.
A decoder or writer or output RNN is conditioned on that fixed-length vector to generate the output sequence Y = (y(1), . . . , y(ny)).
Encoder vector (also called the context vector) is the fixed-length representation
of the input sequence that the decoder uses to generate the output sequence.
• Commonly used in speech recognition, machine translation, and question answering.
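A compact sketch of the encoder-decoder idea, assuming plain tanh RNN cells (all names and sizes here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 3, 6, 4
U_e, W_e = rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_hid, d_hid))   # encoder
W_d, C_d = rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, d_hid))  # decoder
V_d = rng.normal(size=(d_out, d_hid))                                        # decoder output

def encode(x_seq):
    h = np.zeros(d_hid)
    for x_t in x_seq:
        h = np.tanh(U_e @ x_t + W_e @ h)
    return h                                    # context C = final hidden state

def decode(context, n_steps):
    s, outputs = np.zeros(d_hid), []
    for _ in range(n_steps):
        s = np.tanh(W_d @ s + C_d @ context)    # conditioned on the fixed-length C
        outputs.append(V_d @ s)                 # logits for one output token
    return outputs

x_seq = rng.normal(size=(4, d_in))              # input length nx = 4
y_logits = decode(encode(x_seq), n_steps=6)     # output length ny = 6 (nx and ny may differ)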
• Training: two RNNs (the input RNN and the output RNN) are trained jointly to maximize the average of log P(y(1), …, y(ny) | x(1), …, x(nx)) over all pairs of x and y sequences in the training set.
• If the context C is a vector, then the decoder RNN is simply a vector-to-sequence RNN.
• One clear limitation of this architecture is when the context C output by the
encoder RNN has a dimension that is too small to properly summarize a long
sequence.
• Bahdanau et al. (2015) proposed making C a variable-length sequence instead of a
fixed-size vector.
• They introduced an attention mechanism, allowing the decoder to focus on different parts of the input dynamically.
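A rough sketch of the attention idea (dot-product scoring is used here purely for illustration; Bahdanau et al. actually use a small learned network to compute the scores):

import numpy as np

def attention_context(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state   # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over input positions
    # A different weighted mix of encoder states at every decoder step,
    # instead of a single fixed-size context C.
    return weights @ encoder_states

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(7, 6))   # 7 encoder hidden states of size 6
dec_state = rng.normal(size=6)         # current decoder hidden state
c_t = attention_context(dec_state, enc_states)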
How does the Sequence-to-Sequence Model work?
Example
Consider the input sequence "I am a Student" to be encoded. There will be 4 time steps in total (4 tokens) for the encoder model. At each time step, the hidden state h is updated using the previous hidden state and the current input.
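Sketched step by step in the earlier notation, where f is the encoder update and embed(.) is a hypothetical embedding lookup:

h(1) = f(h(0), embed("I"))
h(2) = f(h(1), embed("am"))
h(3) = f(h(2), embed("a"))
h(4) = f(h(3), embed("Student"))

The final state h(4) becomes the context C that is handed to the decoder.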
Deep recurrent networks
• The computation in most RNNs can be decomposed into three blocks of parameters and associated transformations:
1. from the input to the hidden state, x(t) → h(t)
2. from the previous hidden state to the next hidden state, h(t-1) → h(t)
3. from the hidden state to the output, h(t) → o(t)
• However, we can use multiple layers for each of the above transformations, which results in deep recurrent networks (see the sketch below).
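A minimal sketch of one such deep (stacked) recurrent network, where the hidden state of the lower layer becomes the input of the upper layer at the same time step (names and sizes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
d_in, d1, d2 = 3, 5, 5
U1, W1 = rng.normal(size=(d1, d_in)), rng.normal(size=(d1, d1))   # lower layer
U2, W2 = rng.normal(size=(d2, d1)), rng.normal(size=(d2, d2))     # upper layer

h1, h2 = np.zeros(d1), np.zeros(d2)
for x_t in rng.normal(size=(6, d_in)):
    h1 = np.tanh(U1 @ x_t + W1 @ h1)   # block 1: raw input -> lower hidden state
    h2 = np.tanh(U2 @ h1 + W2 @ h2)    # block 2: hidden -> hidden, now one layer deeper
    # block 3 (hidden -> output) would read from h2 here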
• The previous figure shows a significant benefit of decomposing the state of an RNN into multiple layers.
• The lower layers in the hierarchy can be seen as transforming the raw input into a representation that is more appropriate for the higher levels of the hidden state.
• However, shallower architectures are easier to optimize, and the extra depth makes the shortest path from a variable in time step t to a variable in time step t + 1 longer.
Recursive neural networks
• Recursive neural networks represent yet another generalization of recurrent networks, with a different kind of computational graph.
• The typical computational graph for a recursive network is illustrated in Fig. 10.14
• Recursive networks have been successfully applied to processing data structures as input to neural nets, in natural language processing as well as in computer vision.
• One clear advantage of recursive networks over recurrent nets is that, for a sequence of length τ, the depth can be drastically reduced from τ to O(log τ), which might help deal with long-term dependencies.
• In some application domains, external methods can suggest the appropriate tree
structure.
• For example, when processing natural language sentences, the tree structure for
the recursive network can be fixed to the structure of the parse tree of the
sentence provided by a natural language parser.
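A small sketch of a recursive network over a fixed binary tree (the tree is hand-built here for illustration; in practice it would come from a parser as described above):

import numpy as np

rng = np.random.default_rng(0)
d = 4
W_left, W_right = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def compose(node):
    # A leaf is a vector; an internal node is a (left, right) pair.
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    # The same W_left / W_right are shared at every internal node.
    return np.tanh(W_left @ compose(left) + W_right @ compose(right))

leaves = [rng.normal(size=d) for _ in range(4)]          # e.g. word vectors
tree = ((leaves[0], leaves[1]), (leaves[2], leaves[3]))  # balanced tree of depth O(log n)
root = compose(tree)                                     # one vector for the whole sequence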
Modern RNNs
• Modern RNNs include LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units).
• They address the vanishing and exploding gradient problems in traditional RNNs.
• They use gates to control information flow, allowing them to retain long-term
dependencies and adapt their weights at each time step.
• GRUs simplify the LSTM's gating with just reset and update gates, making them more computationally efficient.
• These models are widely used in NLP, speech recognition, and time-series
forecasting due to their ability to reduce information loss.
LSTM (Long Short-Term Memory)
• LSTM is a type of RNN designed to handle sequential data and capture long-term
dependencies.
• It was introduced to solve the problem of vanishing and exploding gradients that
traditional RNNs suffer from when learning long-term dependencies.
• Memory Cells: Unlike regular RNNs, LSTMs have special units called memory cells
that help them remember information over long sequences.
• Gating Mechanism: LSTMs use three gates (input, forget, and output) to control
the flow of information.
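A condensed NumPy sketch of one LSTM step with the three gates (this follows the standard formulation; the weight names and sizes are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_hid = 3, 5
# One weight matrix per gate plus one for the candidate cell contents.
Wf, Wi, Wo, Wc = (rng.normal(size=(d_hid, d_in + d_hid)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(Wf @ z)             # forget gate: what to erase from the memory cell
    i = sigmoid(Wi @ z)             # input gate: what new information to store
    o = sigmoid(Wo @ z)             # output gate: what to expose as h_t
    c_tilde = np.tanh(Wc @ z)       # candidate cell contents
    c_t = f * c_prev + i * c_tilde  # memory cell: largely additive update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(6, d_in)):
    h, c = lstm_step(x_t, h, c)

The largely additive cell update is what lets gradients flow across many time steps without vanishing as quickly as in a plain RNN.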
GRU (Gated Recurrent Unit)
• GRU is a type of RNN architecture designed to solve problems like vanishing
gradients and inefficient long-term dependency learning in traditional RNNs.
• It is similar to LSTM, but is more computationally efficient due to having fewer
parameters.
GRU Architecture
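A matching NumPy sketch of one GRU step with its two gates (standard formulation; names and sizes are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_in, d_hid = 3, 5
Wr, Wz, Wh = (rng.normal(size=(d_hid, d_in + d_hid)) for _ in range(3))

def gru_step(x_t, h_prev):
    z_in = np.concatenate([x_t, h_prev])
    r = sigmoid(Wr @ z_in)                                     # reset gate
    z = sigmoid(Wz @ z_in)                                     # update gate
    h_tilde = np.tanh(Wh @ np.concatenate([x_t, r * h_prev]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # no separate memory cell

h = np.zeros(d_hid)
for x_t in rng.normal(size=(6, d_in)):
    h = gru_step(x_t, h)

With three weight matrices instead of the LSTM's four, the GRU has fewer parameters, which is where its computational efficiency comes from.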
Difference between RNN and Modern RNN
Previous Year Questions
• What is a computational graph and how is it used in the context of RNN?
• Compare LSTM and RNN.
• Suppose you were given a task of predicting long-term dependencies in data. Which architecture would you prefer: LSTM or RNN? Justify your answer and explain its architecture.
• List and explain the applications of deep recurrent neural networks.
• With neat diagram explain GRU architecture.
• Explain the concept of ‘Unrolling through time’ in Recurrent Neural Networks.
• How does a recursive neural network work?
• Draw and explain the architecture of LSTM.
• How does encoder-decoder RNN work?
• Draw and explain the architecture of Recurrent Neural Networks.
• Describe how an LSTM takes care of the vanishing gradient problem.
• Sketch diagrams of different Recurrent Neural Network patterns and explain them in
detail.
• Discuss different ways to make a Recurrent Neural Network (RNN) a deep RNN with the help of diagrams.