
Module 4

Recurrent Neural Network


Recurrent neural networks – computational graphs, RNN design,
encoder–decoder sequence-to-sequence architectures, deep
recurrent networks, recursive neural networks, modern RNNs: LSTM
and GRU.

Reena Thomas, Asst. Prof., CSE dept., CEMP


Recurrent neural networks (RNN)
• An RNN is a type of artificial neural network (ANN) designed for sequential data processing.

• Unlike traditional feedforward networks, RNNs have loops that allow information to
persist, making them well-suited for tasks where the order of data matters.

• In RNNs, the input and output sizes are more flexible.

• E.g., time-series forecasting, speech recognition, and natural language processing (NLP).

How does an RNN share parameters?

• By applying the same update rule to each step in the sequence, ensuring that every
output is generated using the same set of weights.
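
A minimal sketch of this parameter sharing in Python/NumPy (the function and variable names are illustrative, not from the slides): the same W, U and b are applied at every time step.

import numpy as np

def rnn_forward(x_seq, h0, W, U, b):
    """Apply the same update rule (same W, U, b) at every time step."""
    h = h0
    hidden_states = []
    for x_t in x_seq:                      # one step per element of the sequence
        h = np.tanh(W @ h + U @ x_t + b)   # identical parameters reused at every step
        hidden_states.append(h)
    return hidden_states

# Toy usage: a sequence of 5 random input vectors of dimension 3, hidden size 4.
rng = np.random.default_rng(0)
x_seq = [rng.normal(size=3) for _ in range(5)]
W, U, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
states = rnn_forward(x_seq, np.zeros(4), W, U, b)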

Time-Layered Representation (Unrolled RNN)
• The RNN is expanded across time steps to show how information flows through the
sequence.
• Each word in a sentence (e.g., "the cat chased the mouse") is processed step by step.
• The hidden state (ht) is passed to the next time step, allowing the network to retain
memory of previous words.
• The final outputs represent the predicted words.
Real-Life Applications of RNNs

Computational graphs
• A computational graph represents the sequence of operations in a neural
network, mapping inputs and parameters to outputs and loss.

• It helps in visualizing data flow and computing gradients for training.

Unfolding a recurrent computational graph

• Unfolding an RNN expands its recursive structure into a sequential computational
graph, making hidden state transitions explicit.

• This enables backpropagation through time (BPTT) while maintaining shared parameters across time steps.

The basic formula for an RNN is:

h(t) = f(h(t-1), x(t); θ)

where θ denotes the parameters of the transition function f.
• Unfolding maps the compact recurrent graph on the left to the unfolded graph on the right in
the figure below (both are computational graphs of an RNN without output o).
• The black square indicates that an interaction takes place with a delay of one time
step, from the state at time t to the state at time t + 1.
• Unfolding with parameter sharing is better than using different parameters per
position: there are fewer parameters to estimate, and the model generalizes to sequences of varying length.
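
As a small worked illustration, applying the same formula repeatedly (unfolding for three steps) gives:

h(3) = f(h(2), x(3); θ)
     = f(f(h(1), x(2); θ), x(3); θ)
     = f(f(f(h(0), x(1); θ), x(2); θ), x(3); θ)

The unfolded expression contains no recurrence, so it forms an ordinary acyclic computational graph in which the same θ appears at every step; gradients can then be computed with standard backpropagation (BPTT).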
RNN design

1. Variation 1 of RNN (basic form): hidden2hidden connections,
sequence output

• (Left) The RNN and its loss drawn with recurrent connections.
• (Right) The same network seen as a time-unfolded computational graph.
• The basic equations that define the above RNN are shown below (see the formulation given after the weight definitions).

• The computational graph computes the training loss of a recurrent network that
maps an input sequence of x values to a corresponding sequence of output o values.
• Loss L evaluates the difference between o and the target y.
• With a softmax output layer, o represents unnormalized log probabilities, and ŷ = softmax(o) is compared with y.
• The RNN is structured with three weight matrices:
• 𝑈: Connects input to hidden state
• 𝑊: Recurrent hidden-to-hidden connections
• 𝑉: Connects hidden state to output
This setup enables learning from sequential data.
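
A standard formulation consistent with this description (the slide shows the equations only as a figure; the bias vectors b, c and the tanh nonlinearity are conventional choices assumed here):

a(t) = b + W h(t-1) + U x(t)          (hidden pre-activation)
h(t) = tanh(a(t))                     (hidden state)
o(t) = c + V h(t)                     (output scores)
ŷ(t) = softmax(o(t))                  (predicted distribution)
L = Σ_t L(t),  L(t) = −log ŷ(t)[y(t)]   (total loss = sum of per-step negative log-likelihoods)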
2. Variation 2 of RNN: output2hidden connections, sequence output
• It produces an output at each time step and has recurrent connections only from the
output at one time step to the hidden units at the next time step.

• Teacher forcing can be used to train an RNN as in Fig. 10.4, where only
output2hidden connections exist,
• i.e., hidden2hidden connections are absent.
• In teacher forcing, instead of using the model's predicted output, we provide the
actual correct output from the training data for the previous time step.
• This helps the model learn faster and more accurately because it does not have to
rely on its own potentially incorrect predictions (see the sketch below).
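
A minimal, illustrative sketch of the idea (the setup and names are invented for this example, not taken from the slides): during training, the ground-truth previous output y[t-1] is fed back instead of the model's own prediction.

import numpy as np

def step(y_prev, x_t, params):
    """One step of the output2hidden variant: the hidden state is computed
    from the current input and the PREVIOUS OUTPUT (no hidden2hidden link)."""
    U, R, V, b, c = params
    h_t = np.tanh(U @ x_t + R @ y_prev + b)   # previous output feeds the hidden units
    return V @ h_t + c

def train_step_teacher_forcing(x_seq, y_seq, params):
    """Feed the ground-truth previous output y_seq[t-1] at each step (teacher forcing)."""
    loss = 0.0
    y_prev = np.zeros_like(y_seq[0])          # start token / initial output
    for x_t, y_t in zip(x_seq, y_seq):
        o_t = step(y_prev, x_t, params)
        loss += np.sum((o_t - y_t) ** 2)      # squared error as a stand-in loss
        y_prev = y_t                          # ground truth, NOT the prediction o_t
    return loss

# Toy usage with random parameters (input dim 3, hidden dim 4, output dim 2).
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 3)), rng.normal(size=(4, 2)),
          rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2))
x_seq = [rng.normal(size=3) for _ in range(5)]
y_seq = [rng.normal(size=2) for _ in range(5)]
print(train_step_teacher_forcing(x_seq, y_seq, params))

At test time the true previous output is unavailable, so the model's own prediction o_t is fed back instead.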

Encoder–decoder sequence-to-sequence architectures

• The encoder-decoder sequence-to-sequence architecture is a neural network
framework designed for tasks where the input and output sequences have different
lengths.
• It consists of an encoder and a decoder.
 An encoder (also called the reader or input RNN) processes the input sequence X = (x(1), …, x(nx)).
The encoder emits the context C, usually as a simple function of its final hidden state.
 A decoder (also called the writer or output RNN) is conditioned on that fixed-length vector to
generate the output sequence Y = (y(1), …, y(ny)).
 The encoder vector (also called the context vector) is the fixed-length representation
of the input sequence that the decoder uses to generate the output sequence.
• Commonly used in speech recognition, machine translation, and question answering (a minimal sketch of the forward pass follows).
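
A minimal, illustrative sketch (names and shapes are assumptions, not from the slides): the encoder compresses the input sequence into a context C, and the decoder generates an output sequence of a possibly different length, conditioned on C.

import numpy as np

def encode(x_seq, enc):
    """Input RNN: consume the whole input sequence; context C = final hidden state."""
    W, U, b = enc
    h = np.zeros(W.shape[0])
    for x_t in x_seq:
        h = np.tanh(W @ h + U @ x_t + b)
    return h                                  # context vector C

def decode(C, dec, n_steps):
    """Output RNN: conditioned on the fixed-length context C at every step."""
    W, Cmat, V, b, c = dec
    s = np.tanh(C)                            # initialize decoder state from C
    outputs = []
    for _ in range(n_steps):
        s = np.tanh(W @ s + Cmat @ C + b)     # C conditions every decoder step
        outputs.append(V @ s + c)
    return outputs

# Toy usage: input dim 3, hidden dim 4, output dim 2; nx = 4 input steps, ny = 6 output steps.
rng = np.random.default_rng(0)
enc = (rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4))
dec = (rng.normal(size=(4, 4)), rng.normal(size=(4, 4)),
       rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2))
x_seq = [rng.normal(size=3) for _ in range(4)]
y_hat = decode(encode(x_seq, enc), dec, n_steps=6)   # output length differs from input length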

• Training: the two RNNs (the encoder/input RNN and the decoder/output RNN) are trained jointly to maximize the
average of log P(y(1), …, y(ny) | x(1), …, x(nx)) over all pairs of x and y sequences in
the training set.
• If the context C is a vector, then the decoder RNN is simply a vector-to-sequence
RNN.
• One clear limitation of this architecture is when the context C output by the
encoder RNN has a dimension that is too small to properly summarize a long
sequence.
• Bahdanau et al. (2015) proposed making C a variable-length sequence instead of a
fixed-size vector.
• They introduced an attention mechanism, allowing the decoder to focus on different
parts of the input dynamically.

How does the sequence-to-sequence model work?

Example
Consider the input sequence “I am a Student” to be encoded. There are 4 time steps in total
(4 tokens) for the encoder. At each time step, the hidden state h is updated using the
previous hidden state and the current input.

• At time step t1, the initial hidden state h0 (zero or random) and the input X[1] are used
to compute h1.
• The RNN cell outputs both h1 and an intermediate output, but only h1 is carried
forward to the next step.
• At time step t2, the updated h1 and the input X[2] are used to compute h2.
• The RNN again produces an output, but only h2 is propagated, and this process
continues for the remaining time steps (a short trace follows).
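
A short illustrative trace of this example (the embeddings and weights are random placeholders invented here, not values from the slides):

import numpy as np

rng = np.random.default_rng(0)
tokens = ["I", "am", "a", "Student"]                     # 4 time steps
embed = {w: rng.normal(size=3) for w in tokens}          # toy word embeddings
W, U, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)

h = np.zeros(4)                                          # h0: initial hidden state
for t, w in enumerate(tokens, start=1):
    h = np.tanh(W @ h + U @ embed[w] + b)                # only h is carried forward
    print(f"h{t} after reading '{w}':", np.round(h, 3))
# The final hidden state h4 serves as the context vector C for the decoder.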
• In tasks like question answering, the decoder
generates words sequentially, forming the
output sequence.

Deep recurrent networks
• The computation in most RNNs can be decomposed into three blocks of
parameters and associated transformations:

1. from the input to the hidden state, x(t) → h(t)

2. from the previous hidden state to the next hidden state, h(t-1) → h(t)

3. from the hidden state to the output, h(t) → o(t)

• In the previously discussed models, each of these transformations is a shallow
transformation, i.e., it would correspond to a single layer within a deep MLP.

• However, we can use multiple layers for each of the above transformations,
which results in deep recurrent networks (see the sketch below).
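
A minimal sketch of one common way to deepen the recurrence (hidden-to-hidden depth via stacked recurrent layers; the structure and names are illustrative):

import numpy as np

def deep_rnn_step(h_layers, x_t, params):
    """One time step of a stacked (deep) RNN: the input feeds layer 0,
    and each higher layer receives the hidden state of the layer below."""
    new_h = []
    inp = x_t
    for (W, U, b), h_prev in zip(params, h_layers):
        h = np.tanh(W @ h_prev + U @ inp + b)   # recurrence within each layer
        new_h.append(h)
        inp = h                                 # this layer's state feeds the layer above
    return new_h

# Toy usage: 2 layers, input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), np.zeros(4))]
h_layers = [np.zeros(4), np.zeros(4)]
for x_t in [rng.normal(size=3) for _ in range(5)]:
    h_layers = deep_rnn_step(h_layers, x_t, params)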

• The previous figure shows a significant benefit of decomposing the state of an RNN into
multiple layers.
• The lower layers in the hierarchy can be thought of as transforming the raw input into a
representation that is more appropriate for the higher levels of the hidden state.
• However, shallower architectures are easier to optimize, and adding the extra depth shown
in the figure makes the shortest path from a variable in time step t to a variable in time step
t + 1 longer.

Recursive neural networks
• Recursive neural networks represent yet another generalization of recurrent networks, with a different
kind of computational graph.

• The computational graph is structured as a deep tree, rather than the chain-like structure of RNNs.

• The typical computational graph for a recursive network is illustrated in Fig. 10.14.

• Recursive networks have been successfully applied to processing data
structures as input to neural nets, in natural language processing as well as in
computer vision.
• One clear advantage of recursive networks over recurrent nets is that, for a
sequence of length τ, the depth can be drastically reduced from τ to O(log τ),
which might help deal with long-term dependencies.
• In some application domains, external methods can suggest the appropriate tree
structure.
• For example, when processing natural language sentences, the tree structure for
the recursive network can be fixed to the structure of the parse tree of the
sentence provided by a natural language parser.
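
A minimal sketch of the idea (the binary parse tree is written as nested Python tuples; all names and values are illustrative): the same composition weights are applied recursively at every internal node of the tree.

import numpy as np

rng = np.random.default_rng(0)
D = 4                                        # representation size
W_left, W_right, b = rng.normal(size=(D, D)), rng.normal(size=(D, D)), np.zeros(D)
embed = {w: rng.normal(size=D) for w in ["the", "cat", "chased", "mouse"]}

def encode_tree(node):
    """Leaves are words; internal nodes are (left, right) pairs.
    The same W_left, W_right, b are shared across all nodes of the tree."""
    if isinstance(node, str):
        return embed[node]
    left, right = node
    return np.tanh(W_left @ encode_tree(left) + W_right @ encode_tree(right) + b)

# Parse tree for "the cat chased the mouse".
tree = (("the", "cat"), ("chased", ("the", "mouse")))
sentence_vec = encode_tree(tree)             # depth grows like log of the length for balanced trees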

Modern RNNs
• Modern RNNs, such as LSTMs (Long Short-Term Memory) and GRUs (Gated
Recurrent Units), address the vanishing and exploding gradient problems of traditional RNNs.

• They use gates to control information flow, allowing them to retain long-term
dependencies and adapt their weights at each time step.

• LSTMs use input, forget, and output gates.

• GRUs simplify this with reset and update gates, making them more computationally
efficient.

• These models are widely used in NLP, speech recognition, and time-series
forecasting due to their ability to reduce information loss.

LSTM (Long Short-Term Memory)

• LSTM is a type of RNN designed to handle sequential data and capture long-term
dependencies.

• It was introduced to solve the problem of vanishing and exploding gradients that
traditional RNNs suffer from when learning long-term dependencies.

Key Features of LSTMs

• Memory Cells: Unlike regular RNNs, LSTMs have special units called memory cells
that help them remember information over long sequences.

• Gating Mechanism: LSTMs use three gates (input, forget, and output) to control
the flow of information.

• Long-Term Dependency Handling: They can retain important past information
while forgetting irrelevant details.
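
The gate equations appear only as figures in the slides; for reference, a standard formulation of an LSTM cell is (σ is the sigmoid function, ⊙ is element-wise multiplication, and the W, U, b with subscripts are the weights and biases of each gate):

f(t) = σ(W_f x(t) + U_f h(t-1) + b_f)         (forget gate)
i(t) = σ(W_i x(t) + U_i h(t-1) + b_i)         (input gate)
o(t) = σ(W_o x(t) + U_o h(t-1) + b_o)         (output gate)
c̃(t) = tanh(W_c x(t) + U_c h(t-1) + b_c)      (candidate cell state)
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ c̃(t)            (memory cell update)
h(t) = o(t) ⊙ tanh(c(t))                       (hidden state)

Because the cell state c(t) is updated additively and the forget gate can stay close to 1, gradients can flow across many time steps, which is how the LSTM mitigates the vanishing gradient problem.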

GRU (Gated Recurrent Unit)
• GRU is a type of RNN architecture designed to solve problems like vanishing
gradients and inefficient long-term dependency learning in traditional RNNs.
• It is similar to LSTM, but is more computationally efficient due to having fewer
parameters.

GRU Architecture
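
The architecture itself is shown as figures in the slides; for reference, a standard formulation of a GRU cell is (σ is the sigmoid function, ⊙ is element-wise multiplication):

z(t) = σ(W_z x(t) + U_z h(t-1) + b_z)                  (update gate)
r(t) = σ(W_r x(t) + U_r h(t-1) + b_r)                  (reset gate)
h̃(t) = tanh(W_h x(t) + U_h (r(t) ⊙ h(t-1)) + b_h)     (candidate state)
h(t) = (1 − z(t)) ⊙ h(t-1) + z(t) ⊙ h̃(t)              (new hidden state)

The GRU merges the memory cell and hidden state into a single vector and uses two gates instead of three, which is why it has fewer parameters than an LSTM.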

Difference between RNN and Modern RNN

Previous Year Questions

• What is a computational graph and how is it used in the context of RNN?
• Compare LSTM and RNN.
• Suppose you were given the task of predicting long-term dependencies in data. Which
architecture would you prefer: LSTM or RNN? Justify your answer and explain its
architecture.
• List and explain the applications of deep recurrent neural networks.
• With neat diagram explain GRU architecture.
• Explain the concept of ‘Unrolling through time’ in Recurrent Neural Networks.
• How does a recursive neural network work?
• Draw and explain the architecture of LSTM.
• How does encoder-decoder RNN work?

• Draw and explain the architecture of Recurrent Neural Networks.
• Describe how an LSTM takes care of the vanishing gradient problem.
• Sketch diagrams of different Recurrent Neural Network patterns and explain them in
detail.
• Discuss different ways to make a Recurrent Neural Network (RNN) deep, with the
help of diagrams.

