Sequence Modeling: Recurrent and

Recursive Nets
By
Rashmi A.R.
Recurrent Neural Networks (RNNs)
• RNNs are a family of neural networks for processing sequential data.
• Much as a convolutional network is a neural network specialized for
processing a grid of values X, such as an image, a recurrent neural network is
a neural network specialized for processing a sequence of values x(1), . . . , x(τ).
• RNN is a type of sequential model specifically designed to work on
sequential data.
• RNNs are widely used in the NLP domain.
• RNNs are well suited to sequence data.
Why use RNN?
• Plain ANNs (feed-forward networks) cannot be used directly on sequential data.
• When the inputs are text, a neural network cannot understand raw text, so we
need to vectorize it.

Problems:
1. Textual data may be of different sizes.
2. Zero padding ⟵ unnecessary computation (see the padding sketch below).
3. Prediction will fail when the input size is big.
4. The sequence information is totally disregarded [semantic meaning is not
maintained/retained].
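A minimal sketch of what vectorizing and zero-padding variable-length sentences looks like, and why it wastes computation. The vocabulary and sentences are toy assumptions made up for illustration:

```python
# Toy example: vectorize sentences of different lengths for a fixed-size ANN.
vocab = {"<pad>": 0, "nice": 1, "to": 2, "meet": 3, "you": 4, "hello": 5}

sentences = [
    ["hello"],                      # length 1
    ["nice", "to", "meet", "you"],  # length 4
]

max_len = max(len(s) for s in sentences)

# Zero-pad every sentence to the same fixed length.
padded = [
    [vocab[w] for w in s] + [vocab["<pad>"]] * (max_len - len(s))
    for s in sentences
]

print(padded)
# [[5, 0, 0, 0], [1, 2, 3, 4]]
# Most positions in the first row are padding: the ANN still multiplies
# weights against these zeros (wasted computation), and a longer test
# sentence would not fit the fixed input size at all.
```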

Applications:
• Sentiment analysis
• Sentence completion
• Image captioning
Unfolding computational graphs
Unfolding computational graphs is a key concept in Recurrent Neural Networks
(RNNs) that helps in understanding how the network processes sequences over time.

What is a Computational Graph?


A computational graph is a structure used to formalize the computations in a neural
network, mapping inputs, parameters, and operations to produce outputs and compute
losses. It shows how data flows through the network in both the forward and
backward passes.
In Recurrent Neural Networks (RNNs), unfolding (or unrolling) refers to the process of breaking down the RNN over
multiple time steps, essentially expanding the network to represent each time step individually.

This unrolling helps visualize and understand the way the RNN processes sequential data by passing information from
one time step to the next.
Benefits of Unfolding:
• Parameter Sharing: The same parameters are used at each time step,
allowing the model to generalize across different sequence lengths.
• Backpropagation Through Time (BPTT): Once the graph is unfolded,
standard backpropagation can be applied to compute gradients across time
steps. This process is called Backpropagation Through Time (BPTT),
where gradients flow both forward in time (during the forward pass) and
backward in time (during the backward pass).
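A minimal NumPy sketch of unfolding, with toy sizes of my own choosing and parameter names following the U, W, V convention used later in these notes. Each iteration of the loop is one copy of the network in the unrolled graph, and the same parameters are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameters, reused at every time step (parameter sharing).
input_dim, hidden_dim, output_dim = 3, 5, 2
U = rng.normal(size=(hidden_dim, input_dim))   # input  -> hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(size=(output_dim, hidden_dim))  # hidden -> output

# A toy input sequence x(1), ..., x(tau) with tau = 4 time steps.
xs = [rng.normal(size=input_dim) for _ in range(4)]

# "Unfolding" the recurrence: each loop iteration is one time step
# in the unrolled computational graph.
h = np.zeros(hidden_dim)
hs, ys = [], []
for x in xs:
    h = np.tanh(U @ x + W @ h)   # hidden state carries information forward
    y = V @ h                    # per-time-step output
    hs.append(h)
    ys.append(y)

# Once unfolded like this, applying standard backpropagation to the chain
# of hidden states is exactly Backpropagation Through Time (BPTT).
```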
Visual Representation:
Imagine an RNN as a looped structure, where the hidden state h_t feeds into the
next step. Unfolding breaks this loop and stretches it into a linear chain of
computations.

Why Unfold?
Unfolding is essential because it allows the recurrent model, which operates on
a sequence, to be trained using standard neural network training techniques.
Without unfolding, the recurrence would make it difficult to compute gradients
and update parameters effectively.
Recurrent Neural Network (RNN)
• RNNs are special class of Neural Network which has memory like features
in it.
• Past inputs are remembered, that is why they work great on sequential data.

Types of RNNs
Popular recurrent neural network architecture variants include:
• Standard RNNs
• Bidirectional recurrent neural networks (BRNNs)
• Long short-term memory (LSTM)
• Gated recurrent units (GRUs)
• Encoder-decoder RNN
Figure: FeedForward Neural Network (FNN)
Figure: Recurrent Neural Network (RNN)
Recurrent Neural Network (RNN) (Cont’d)

Figure: RNN Unfolded


Recurrent Neural Network (RNN) (Cont’d)
• The RNN takes an input vector X and the network generates an output
vector y by scanning the data sequentially from left to right, with each time
step updating the hidden state and producing an output.
• It shares the same parameters across all time steps. This means that the
same set of parameters, represented by U, V, and W, is used consistently
throughout the network.
• U represents the weight matrix governing the connection from the input
layer X to the hidden layer h, W represents the weight associated with the
connection between hidden layers (across time steps), and V represents the
connection from the hidden layer h to the output layer y.
• This sharing of parameters allows the RNN to effectively capture temporal
dependencies and process sequential data more efficiently by retaining the
information from previous input in its current hidden state.
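Written out, the description above corresponds to the standard RNN update equations (a common formulation; the exact activation functions vary by implementation):

```latex
\begin{aligned}
h^{(t)} &= \tanh\!\left(U x^{(t)} + W h^{(t-1)} + b\right) \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(V h^{(t)} + c\right)
\end{aligned}
```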
Why RNN?
• Other neural networks are unable to process sequential data.
• They only have the current input; they have no idea about the history of the
inputs.
• There is no memory in an ANN.
Challenges in RNNs

1. Vanishing and Exploding Gradients:


Vanishing Gradients: During backpropagation, the gradients of the loss
function with respect to the weights can become very small, especially when
long sequences are involved. This can lead to the model not learning
effectively because the updates to the weights become negligible.
Exploding Gradients: Conversely, gradients can become very large, causing
instability in training and leading to divergent weight updates.

2. Long-Term Dependencies:
Difficulty in Capturing Long-Term Dependencies: Standard RNNs
struggle to capture long-range dependencies due to the vanishing gradient
problem, making it challenging to learn from data where dependencies span
many time steps.
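One standard remedy for exploding gradients (a general technique, not specific to these notes) is gradient norm clipping; a minimal sketch:

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their global L2 norm
    does not exceed max_norm (a common fix for exploding gradients)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = [g * scale for g in grads]
    return grads

# Vanishing gradients, by contrast, are usually addressed architecturally,
# e.g. with LSTM or GRU cells, rather than by clipping.
```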
Bi-directional Recurrent Neural Network (BRNN)
• A bidirectional recurrent neural network (RNN) is a type of recurrent
neural network (RNN) that processes input sequences in both forward and
backward directions.
• This allows the RNN to capture information from the input sequence that
may be relevant to the output prediction but would be lost in a
traditional RNN that only processes the input sequence in one direction.
• This allows the network to consider information from the past and future
when making predictions rather than just relying on the input data at the
current time step.
• This can be useful for tasks such as language processing, where
understanding the context of a word or phrase can be important for making
accurate predictions.
• In general, bidirectional RNNs can help improve a model's performance on
various sequence-based tasks.
• This means that the network has two separate RNNs:
• One that processes the input sequence from left to right
• Another one that processes the input sequence from right to left.
• Both RNNs are applied, and at every time step each RNN produces an
output.
• Finally, we concatenate the two outputs.
Equations:
We have two RNNs, one running forward and one running backward (a sketch of the
standard equations follows below).
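A sketch of the standard bidirectional RNN equations, using the U, W, V notation from above (an assumed but common formulation): one RNN runs forward over the sequence, another runs backward, and the output at each time step combines both hidden states:

```latex
\begin{aligned}
\overrightarrow{h}^{(t)} &= \tanh\!\left(\overrightarrow{U} x^{(t)} + \overrightarrow{W}\,\overrightarrow{h}^{(t-1)} + \overrightarrow{b}\right)
  && \text{(forward RNN, } t = 1, \dots, \tau\text{)} \\
\overleftarrow{h}^{(t)} &= \tanh\!\left(\overleftarrow{U} x^{(t)} + \overleftarrow{W}\,\overleftarrow{h}^{(t+1)} + \overleftarrow{b}\right)
  && \text{(backward RNN, } t = \tau, \dots, 1\text{)} \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(V\left[\overrightarrow{h}^{(t)};\, \overleftarrow{h}^{(t)}\right] + c\right)
  && \text{(output from the concatenated hidden states)}
\end{aligned}
```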
Bi-directional Recurrent Neural Network (BRNN)
Need for Bidirectional Recurrent Neural Networks

• Bidirectional Recurrent Neural Networks are used when the output at a
particular time step depends not only on the input at that time step and
the inputs that come before it, but also on the inputs that come after it. A
standard RNN can only capture dependencies on past inputs; in such
cases, Bidirectional RNNs are used to capture the dependencies in
both directions.
• The main need for Bidirectional RNNs arises in sequential data processing
tasks where the context of the data is important. For instance, in natural
language processing, the meaning of a word in a sentence may depend on
the words that come before and after it. Similarly, in speech recognition, the
current sound may depend on the previous and upcoming sounds.
• The need for Bidirectional RNNs arises in tasks where the context of the
data is important, and the output at a particular time step depends on both
past and future inputs. By processing the input sequence in both
directions, Bidirectional RNNs help to capture these dependencies and
improve the accuracy of predictions.
BRNNs improve upon traditional RNNs

Bidirectional Recurrent Neural Networks (BRNNs) enhance traditional RNNs
by processing input sequences in both forward and backward directions,
effectively capturing dependencies from both past and future contexts. Here’s
how they improve upon traditional RNNs and where they are particularly
useful:
• Contextual Awareness:
• Bi-RNNs process data in both forward and backward directions,
allowing the model to consider both past and future context when
making predictions.
• This is particularly useful for tasks like speech recognition, language
modeling, and sequence labeling, where understanding the entire
sequence enhances performance.
• Improved Accuracy:
• By leveraging information from both directions, Bi-RNNs often
outperform unidirectional RNNs in tasks that involve complex
dependencies across time steps.
• This makes them ideal for tasks like machine translation and sentiment
analysis.
BRNNs improve upon traditional RNNs (Cont’d):

• Better Handling of Long-term Dependencies:


• Bi-RNNs help capture relationships between distant time steps more
effectively since the backward pass can directly access future time
steps, mitigating the issue of long-term dependencies in sequences.
• Versatility in Sequence Tasks:
• Bi-RNNs are well-suited for various sequence-to-sequence tasks, such
as named entity recognition, part-of-speech tagging, and video frame
classification, where context from both past and future frames is
important for better prediction.
Some common tasks of Bi-RNNs include:

1. Natural Language Processing (NLP):


• Machine Translation: Understanding the entire sentence (both previous
and next words) improves translation accuracy.
• Named Entity Recognition (NER): Identifying entities like names,
locations, and organizations benefits from knowing both preceding and
following words.
• Part-of-Speech Tagging: Determining the grammatical structure of a
sentence requires context from both sides of a word.
2. Speech Recognition: Bi-RNNs are effective for converting speech into
text, as they take into account future speech frames to understand the current
spoken word more accurately.
3. Time Series Analysis: In applications like stock price prediction, weather
forecasting, or anomaly detection, using data from both previous and future
points in the time series improves prediction accuracy.
4. Sentiment Analysis: Understanding the sentiment of a sentence often
requires knowing both the words before and after key terms (e.g., "not good"
vs. "good").
5. Video Frame Analysis: For tasks like action recognition or scene
understanding in video sequences, Bi-RNNs process both previous and future
frames to provide more accurate predictions.
6. Speech Synthesis and Text-to-Speech (TTS): By considering future text
or phonemes, Bi-RNNs can generate more natural and coherent speech.
7. Handwriting Recognition: For sequential input like handwriting, both
past and future strokes influence the interpretation of a character or word.
Encoder-Decoder Sequence to Sequence Architecture
Sequence-to-sequence data: an input sequence is mapped to an output sequence.

E.g., "Nice to meet you"
3 challenges:
1. Input: a sentence in some language (English) -> variable length
2. Output: a sentence in some language (Hindi) -> variable length
3. No guarantee that 4 English words (e.g., "Nice to meet you") will be translated
into exactly 4 Hindi words.

LSTM and GRU cells can handle variable-length sequences, but only on the
input side, not on the output side; producing a variable-length output is
exactly what the sequence-to-sequence network needs to solve.
Encoder-Decoder Sequence to Sequence Architecture

• Encoder: where we give the input sequence; the English sentence is fed as
input.
• The input sequence is fed on a token-by-token (word-by-word) basis.
• The encoder tries to understand the complete sentence.
• It tries to summarize the sentence; once it has summarized it, it gives an
output.
• The output is a vector called the context vector.
• The context vector is given to the decoder.
• The decoder interprets the context vector and generates the output word by
word, converting the sentence into the other language (language translation).
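A minimal PyTorch sketch of this workflow. The class names, vocabulary sizes, and SOS/EOS token ids are assumptions of my own; it only illustrates the idea of an encoder compressing the source sentence into a context vector and a decoder emitting the translation word by word (greedy decoding, no attention):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; vocabularies and special tokens are placeholders.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128
SOS, EOS = 1, 2  # assumed start/end-of-sentence token ids

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):                 # src: (batch, src_len)
        _, h = self.rnn(self.embed(src))    # h: (1, batch, HID)
        return h                            # the "context vector"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, token, h):            # one target token at a time
        out, h = self.rnn(self.embed(token), h)
        return self.out(out), h             # logits over the target vocabulary

def translate(encoder, decoder, src, max_len=20):
    """Greedy decoding sketch: summarize the source into a context vector,
    then emit target tokens one by one until EOS."""
    h = encoder(src)
    token = torch.full((src.size(0), 1), SOS, dtype=torch.long)
    result = []
    for _ in range(max_len):
        logits, h = decoder(token, h)
        token = logits.argmax(dim=-1)       # (batch, 1)
        result.append(token)
        if (token == EOS).all():
            break
    return torch.cat(result, dim=1)
```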
Figure: Encoder-Decoder Sequence to Sequence Architecture (Encoder → context vector → Decoder)
Recursive Neural Network
• Recursive Neural Networks (RvNNs) are a class of deep neural networks
that can learn detailed and structured information.
• With RvNN, you can get a structured prediction by recursively applying
the same set of weights on structured inputs.
• The word recursive indicates that the neural network is applied to its
output.
• Due to their deep tree-like structure, Recursive Neural Networks can
handle hierarchical data. The tree structure means combining child nodes
and producing parent nodes.
• Each child-parent bond has a weight matrix, and similar children have the
same weights. The number of children for every node in the tree is fixed
to enable it to perform recursive operations and use the same weights.
RvNNs are used when there's a need to parse an entire sentence.
• The efficiency of a recursive network is higher than that of a feed-forward
network.
• Recurrent networks are recurrent over time (a chain structure); recursive
networks are a generalization of recurrent networks to tree structures.
Recursive Neural Network
• A subset of deep neural networks called recursive neural networks
(RvNNs) are capable of learning organized and detailed data. By
repeatedly using the same set of weights on structured inputs, RvNN
enables you to obtain a structured prediction. Recursive refers to the
neural network's application to its output.
• Recursive neural networks are capable of handling hierarchical data
because of their deep tree-like structure. In a tree structure, parent nodes
are created by joining child nodes. There is a weight matrix for every
child-parent bond, and comparable children have the same weights. To
allow for recursive operations and the use of the same weights, the number
of children for each node in the tree is fixed. When it's necessary to parse a
whole sentence, RvNNs are employed.
Recursive Neural Network
• A Recursive Neural Network (RvNN) is a generalization of Recurrent
Neural Networks (RNNs), where the computational graph is structured as
a deep tree, rather than a simple chain-like structure as in traditional
RNNs.
• This tree structure allows RvNNs to process data that has hierarchical
structures, such as parse trees in natural language processing (NLP) or
hierarchical structures in images.
Structure and Workflow:
• The typical computational graph of a recursive network maps a variable-
length sequence of input data to a fixed-size output representation.
• RvNNs use weight-sharing in a manner similar to RNNs, but the key
difference is in the shape of the computational graph.
• RvNNs propagate information over a tree structure rather than over a
linear sequence, which allows them to capture hierarchical relationships
between inputs.
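A minimal NumPy sketch of this idea (the parse tree, word vectors, and dimension are toy assumptions): the same weight matrix is applied recursively to combine child vectors into a parent vector, producing one fixed-size representation for the whole tree:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # dimension of every node representation (hypothetical)

# One shared weight matrix for every child -> parent combination
# (a binary tree is assumed, so each parent has exactly two children).
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

def compose(node, word_vecs):
    """Recursively compute the vector for a parse-tree node.
    A leaf is a word (string); an internal node is a (left, right) pair.
    The same W and b are reused at every level of the tree."""
    if isinstance(node, str):
        return word_vecs[node]
    left, right = node
    children = np.concatenate([compose(left, word_vecs),
                               compose(right, word_vecs)])
    return np.tanh(W @ children + b)

# Toy parse tree for "not very good": (not (very good))
word_vecs = {w: rng.normal(size=DIM) for w in ["not", "very", "good"]}
tree = ("not", ("very", "good"))
sentence_vec = compose(tree, word_vecs)  # fixed-size vector for the sentence
```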
Applications:
• Natural Language Processing (NLP): RvNNs are well-suited for parsing
and processing tree structures like sentence parse trees, which represent
the syntactic structure of sentences. They can be used for tasks such as
sentiment analysis and syntactic parsing, where the hierarchical nature of
language is crucial.
• Computer Vision: In vision tasks, RvNNs can be applied to capture
relationships between objects in a scene or for scene parsing, where
understanding spatial hierarchies is important.
• Learning to Reason: RvNNs have been suggested as a tool for learning
reasoning tasks, where data is structured in a hierarchical or nested way,
making them useful in tasks that require compositionality and hierarchical
reasoning.

In summary, Recursive Neural Networks extend traditional RNNs by operating
on hierarchical data structures rather than linear sequences. They excel in
applications like natural language processing and tasks requiring hierarchical
reasoning, making them particularly useful for tasks involving structured data.
