Sequence Modeling: Recurrent and

Recursive Nets
By
Rashmi A.R.
Recurrent Neural Networks (RNNs)
• RNNs are a family of neural networks for processing sequential data.
• Much as a convolutional network is a neural network specialized for
processing a grid of values X, such as an image, a recurrent neural network is
a neural network specialized for processing a sequence of values x(1), . . . , x(τ).
• RNN is a type of sequential model specifically designed to work on
sequential data.
• RNNs are widely used in the NLP domain.
• RNNs are well suited to sequence data.
Why use RNN?
• Plain ANNs (feed-forward networks) cannot be used directly on sequential data.
• When the inputs are text, a neural network cannot understand raw text, so we
need to vectorize it.

Problems:
1. Textual data may be of different sizes.
2. Zero padding ⟵ unnecessary computation (see the padding sketch below).
3. Prediction will fail when the input size is big.
4. The sequence information is totally disregarded [semantic meaning is not
maintained/retained].
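A minimal sketch of what vectorizing and zero-padding variable-length sentences looks like, and why it wastes computation. The vocabulary and sentences are toy assumptions made up for illustration:

```python
# Toy example: vectorize sentences of different lengths for a fixed-size ANN.
vocab = {"<pad>": 0, "nice": 1, "to": 2, "meet": 3, "you": 4, "hello": 5}

sentences = [
    ["hello"],                      # length 1
    ["nice", "to", "meet", "you"],  # length 4
]

max_len = max(len(s) for s in sentences)

# Zero-pad every sentence to the same fixed length.
padded = [
    [vocab[w] for w in s] + [vocab["<pad>"]] * (max_len - len(s))
    for s in sentences
]

print(padded)
# [[5, 0, 0, 0], [1, 2, 3, 4]]
# Most positions in the first row are padding: the ANN still multiplies
# weights against these zeros (wasted computation), and a longer test
# sentence would not fit the fixed input size at all.
```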

Applications:
• Sentiment analysis
• Sentence completion
• Image captioning
Unfolding computational graphs
Unfolding computational graphs is a key concept in Recurrent Neural Networks
(RNNs) that helps in understanding how the network processes sequences over time.

What is a Computational Graph?


A computational graph is a structure used to formalize the computations in a neural
network, mapping inputs, parameters, and operations to produce outputs and compute
losses. It shows how data flows through the network in both the forward and
backward passes.
In Recurrent Neural Networks (RNNs), unfolding (or unrolling) refers to the process of breaking down the RNN over
multiple time steps, essentially expanding the network to represent each time step individually.

This unrolling helps visualize and understand the way the RNN processes sequential data by passing information from
one time step to the next.
Benefits of Unfolding:
• Parameter Sharing: The same parameters are used at each time step,
allowing the model to generalize across different sequence lengths.
• Backpropagation Through Time (BPTT): Once the graph is unfolded,
standard backpropagation can be applied to compute gradients across time
steps. This process is called Backpropagation Through Time (BPTT),
where gradients flow both forward in time (during the forward pass) and
backward in time (during the backward pass).
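A minimal NumPy sketch of unfolding, with toy sizes of my own choosing and parameter names following the U, W, V convention used later in these notes. Each iteration of the loop is one copy of the network in the unrolled graph, and the same parameters are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameters, reused at every time step (parameter sharing).
input_dim, hidden_dim, output_dim = 3, 5, 2
U = rng.normal(size=(hidden_dim, input_dim))   # input  -> hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(size=(output_dim, hidden_dim))  # hidden -> output

# A toy input sequence x(1), ..., x(tau) with tau = 4 time steps.
xs = [rng.normal(size=input_dim) for _ in range(4)]

# "Unfolding" the recurrence: each loop iteration is one time step
# in the unrolled computational graph.
h = np.zeros(hidden_dim)
hs, ys = [], []
for x in xs:
    h = np.tanh(U @ x + W @ h)   # hidden state carries information forward
    y = V @ h                    # per-time-step output
    hs.append(h)
    ys.append(y)

# Once unfolded like this, applying standard backpropagation to the chain
# of hidden states is exactly Backpropagation Through Time (BPTT).
```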
Visual Representation:
Imagine an RNN as a looped structure, where the hidden state h_t feeds into the
next step. Unfolding breaks this loop and stretches it into a linear chain of
computations.

Why Unfold?
Unfolding is essential because it allows the recurrent model, which operates on
a sequence, to be trained using standard neural network training techniques.
Without unfolding, the recurrence would make it difficult to compute gradients
and update parameters effectively.
Recurrent Neural Network (RNN)
• RNNs are special class of Neural Network which has memory like features
in it.
• Past inputs are remembered, that is why they work great on sequential data.

Types of RNNs
Popular recurrent neural network architecture variants include:
• Standard RNNs
• Bidirectional recurrent neural networks (BRNNs)
• Long short-term memory (LSTM)
• Gated recurrent units (GRUs)
• Encoder-decoder RNN
Figure: FeedForward Neural Network (FNN)
Figure: Recurrent Neural Network (RNN)
Recurrent Neural Network (RNN) (Cont’d)

Figure: RNN Unfolded


Recurrent Neural Network (RNN) (Cont’d)
• The RNN takes an input vector X and the network generates an output
vector y by scanning the data sequentially from left to right, with each time
step updating the hidden state and producing an output.
• It shares the same parameters across all time steps. This means that the
same set of parameters, represented by U, V, and W, is used consistently
throughout the network.
• U represents the weight matrix governing the connection from the input
layer X to the hidden layer h, W represents the weight associated with the
connection between hidden layers (across time steps), and V represents the
connection from the hidden layer h to the output layer y.
• This sharing of parameters allows the RNN to effectively capture temporal
dependencies and process sequential data more efficiently by retaining the
information from previous input in its current hidden state.
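Written out, the description above corresponds to the standard RNN update equations (a common formulation; the exact activation functions vary by implementation):

```latex
\begin{aligned}
h^{(t)} &= \tanh\!\left(U x^{(t)} + W h^{(t-1)} + b\right) \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(V h^{(t)} + c\right)
\end{aligned}
```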
Why RNN?
• Other neural networks are unable to process sequential data.
• They only have the current input; they have no idea about the history of the
inputs.
• There is no memory in an ANN.
Challenges in RNNs

1. Vanishing and Exploding Gradients:


Vanishing Gradients: During backpropagation, the gradients of the loss
function with respect to the weights can become very small, especially when
long sequences are involved. This can lead to the model not learning
effectively because the updates to the weights become negligible.
Exploding Gradients: Conversely, gradients can become very large, causing
instability in training and leading to divergent weight updates.

2. Long-Term Dependencies:
Difficulty in Capturing Long-Term Dependencies: Standard RNNs
struggle to capture long-range dependencies due to the vanishing gradient
problem, making it challenging to learn from data where dependencies span
many time steps.
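One standard remedy for exploding gradients (a general technique, not specific to these notes) is gradient norm clipping; a minimal sketch:

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their global L2 norm
    does not exceed max_norm (a common fix for exploding gradients)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = [g * scale for g in grads]
    return grads

# Vanishing gradients, by contrast, are usually addressed architecturally,
# e.g. with LSTM or GRU cells, rather than by clipping.
```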
Bi-directional Recurrent Neural Network (BRNN)
• A bidirectional recurrent neural network (RNN) is a type of recurrent
neural network (RNN) that processes input sequences in both forward and
backward directions.
• This allows the RNN to capture information from the input sequence that
may be relevant to the output prediction but would be lost in a
traditional RNN that only processes the input sequence in one direction.
• This allows the network to consider information from the past and future
when making predictions rather than just relying on the input data at the
current time step.
• This can be useful for tasks such as language processing, where
understanding the context of a word or phrase can be important for making
accurate predictions.
• In general, bidirectional RNNs can help improve a model's performance on
various sequence-based tasks.
• This means that the network has two separate RNNs:
• One that processes the input sequence from left to right
• Another one that processes the input sequence from right to left.
• Both RNNs are applied, and at every time step each RNN produces an
output.
• Finally, we concatenate the two outputs.
Equations:
We have two RNNs, one running forward and one running backward (a sketch of the
standard equations follows below).
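A sketch of the standard bidirectional RNN equations, using the U, W, V notation from above (an assumed but common formulation): one RNN runs forward over the sequence, another runs backward, and the output at each time step combines both hidden states:

```latex
\begin{aligned}
\overrightarrow{h}^{(t)} &= \tanh\!\left(\overrightarrow{U} x^{(t)} + \overrightarrow{W}\,\overrightarrow{h}^{(t-1)} + \overrightarrow{b}\right)
  && \text{(forward RNN, } t = 1, \dots, \tau\text{)} \\
\overleftarrow{h}^{(t)} &= \tanh\!\left(\overleftarrow{U} x^{(t)} + \overleftarrow{W}\,\overleftarrow{h}^{(t+1)} + \overleftarrow{b}\right)
  && \text{(backward RNN, } t = \tau, \dots, 1\text{)} \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(V\left[\overrightarrow{h}^{(t)};\, \overleftarrow{h}^{(t)}\right] + c\right)
  && \text{(output from the concatenated hidden states)}
\end{aligned}
```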
Bi-directional Recurrent Neural Network (BRNN)
Need for Bidirectional Recurrent Neural Networks

• Bidirectional Recurrent Neural Networks are used when the output at a
particular time step depends not only on the input at that time step and
the inputs that come before it, but also on the inputs that come after it. A
standard RNN can only capture dependencies on past inputs; in such
cases, Bidirectional RNNs are used to capture the dependencies in
both directions.
• The main need for Bidirectional RNNs arises in sequential data processing
tasks where the context of the data is important. For instance, in natural
language processing, the meaning of a word in a sentence may depend on
the words that come before and after it. Similarly, in speech recognition, the
current sound may depend on the previous and upcoming sounds.
• The need for Bidirectional RNNs arises in tasks where the context of the
data is important, and the output at a particular time step depends on both
past and future inputs. By processing the input sequence in both
directions, Bidirectional RNNs help to capture these dependencies and
improve the accuracy of predictions.
BRNNs improve upon traditional RNNs

Bidirectional Recurrent Neural Networks (BRNNs) enhance traditional RNNs
by processing input sequences in both forward and backward directions,
effectively capturing dependencies from both past and future contexts. Here’s
how they improve upon traditional RNNs and where they are particularly
useful:
• Contextual Awareness:
• Bi-RNNs process data in both forward and backward directions,
allowing the model to consider both past and future context when
making predictions.
• This is particularly useful for tasks like speech recognition, language
modeling, and sequence labeling, where understanding the entire
sequence enhances performance.
• Improved Accuracy:
• By leveraging information from both directions, Bi-RNNs often
outperform unidirectional RNNs in tasks that involve complex
dependencies across time steps.
• This makes them ideal for tasks like machine translation and sentiment
analysis.
BRNNs improve upon traditional RNNs (Cont’d):

• Better Handling of Long-term Dependencies:


• Bi-RNNs help capture relationships between distant time steps more
effectively since the backward pass can directly access future time
steps, mitigating the issue of long-term dependencies in sequences.
• Versatility in Sequence Tasks:
• Bi-RNNs are well-suited for various sequence-to-sequence tasks, such
as named entity recognition, part-of-speech tagging, and video frame
classification, where context from both past and future frames is
important for better prediction.
Some common tasks of Bi-RNNs include:

1. Natural Language Processing (NLP):


• Machine Translation: Understanding the entire sentence (both previous
and next words) improves translation accuracy.
• Named Entity Recognition (NER): Identifying entities like names,
locations, and organizations benefits from knowing both preceding and
following words.
• Part-of-Speech Tagging: Determining the grammatical structure of a
sentence requires context from both sides of a word.
2. Speech Recognition: Bi-RNNs are effective for converting speech into
text, as they take into account future speech frames to understand the current
spoken word more accurately.
3. Time Series Analysis: In applications like stock price prediction, weather
forecasting, or anomaly detection, using data from both previous and future
points in the time series improves prediction accuracy.
4. Sentiment Analysis: Understanding the sentiment of a sentence often
requires knowing both the words before and after key terms (e.g., "not good"
vs. "good").
5. Video Frame Analysis: For tasks like action recognition or scene
understanding in video sequences, Bi-RNNs process both previous and future
frames to provide more accurate predictions.
6. Speech Synthesis and Text-to-Speech (TTS): By considering future text
or phonemes, Bi-RNNs can generate more natural and coherent speech.
7. Handwriting Recognition: For sequential input like handwriting, both
past and future strokes influence the interpretation of a character or word.
Encoder-Decoder Sequence to Sequence Architecture
Sequence-to-sequence data: an input sequence is mapped to an output sequence.

E.g., "Nice to meet you"
3 challenges:
1. Input: a sentence in some language (English) -> variable length
2. Output: a sentence in some language (Hindi) -> variable length
3. No guarantee that 4 English words (e.g., "Nice to meet you") will be translated
into exactly 4 Hindi words.

LSTM and GRU cells can handle variable-length sequences, but only on the
input side, not on the output side; producing a variable-length output is
exactly what the sequence-to-sequence network needs to solve.
Encoder-Decoder Sequence to Sequence Architecture

• Encoder: where we give the input sequence; the English sentence is fed as
input.
• The input sequence is fed on a token-by-token (word-by-word) basis.
• The encoder tries to understand the complete sentence.
• It tries to summarize the sentence; once it has summarized it, it gives an
output.
• The output is a vector called the context vector.
• The context vector is given to the decoder.
• The decoder interprets the context vector and generates the output word by
word, converting the sentence into the other language (language translation).
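A minimal PyTorch sketch of this workflow. The class names, vocabulary sizes, and SOS/EOS token ids are assumptions of my own; it only illustrates the idea of an encoder compressing the source sentence into a context vector and a decoder emitting the translation word by word (greedy decoding, no attention):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; vocabularies and special tokens are placeholders.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128
SOS, EOS = 1, 2  # assumed start/end-of-sentence token ids

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):                 # src: (batch, src_len)
        _, h = self.rnn(self.embed(src))    # h: (1, batch, HID)
        return h                            # the "context vector"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, token, h):            # one target token at a time
        out, h = self.rnn(self.embed(token), h)
        return self.out(out), h             # logits over the target vocabulary

def translate(encoder, decoder, src, max_len=20):
    """Greedy decoding sketch: summarize the source into a context vector,
    then emit target tokens one by one until EOS."""
    h = encoder(src)
    token = torch.full((src.size(0), 1), SOS, dtype=torch.long)
    result = []
    for _ in range(max_len):
        logits, h = decoder(token, h)
        token = logits.argmax(dim=-1)       # (batch, 1)
        result.append(token)
        if (token == EOS).all():
            break
    return torch.cat(result, dim=1)
```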
Figure: Encoder-Decoder Sequence to Sequence Architecture (Encoder → context vector → Decoder)
Recursive Neural Network
• Recursive Neural Networks (RvNNs) are a class of deep neural networks
that can learn detailed and structured information.
• With RvNN, you can get a structured prediction by recursively applying
the same set of weights on structured inputs.
• The word recursive indicates that the neural network is applied to its
output.
• Due to their deep tree-like structure, Recursive Neural Networks can
handle hierarchical data. The tree structure means combining child nodes
and producing parent nodes.
• Each child-parent bond has a weight matrix, and similar children have the
same weights. The number of children for every node in the tree is fixed
to enable it to perform recursive operations and use the same weights.
RvNNs are used when there's a need to parse an entire sentence.
• The efficiency of a recursive network is higher than that of a feed-forward
network.
• Recurrent networks are recurrent over time (a chain structure); recursive
networks are a generalization of recurrent networks to tree structures.
Recursive Neural Network
• A subset of deep neural networks called recursive neural networks
(RvNNs) are capable of learning organized and detailed data. By
repeatedly using the same set of weights on structured inputs, RvNN
enables you to obtain a structured prediction. Recursive refers to the
neural network's application to its output.
• Recursive neural networks are capable of handling hierarchical data
because of their deep tree-like structure. In a tree structure, parent nodes
are created by joining child nodes. There is a weight matrix for every
child-parent bond, and comparable children have the same weights. To
allow for recursive operations and the use of the same weights, the number
of children for each node in the tree is fixed. When it's necessary to parse a
whole sentence, RvNNs are employed.
Recursive Neural Network
• A Recursive Neural Network (RvNN) is a generalization of Recurrent
Neural Networks (RNNs), where the computational graph is structured as
a deep tree, rather than a simple chain-like structure as in traditional
RNNs.
• This tree structure allows RvNNs to process data that has hierarchical
structures, such as parse trees in natural language processing (NLP) or
hierarchical structures in images.
Structure and Workflow:
• The typical computational graph of a recursive network maps a variable-
length sequence of input data to a fixed-size output representation.
• RvNNs use weight-sharing in a manner similar to RNNs, but the key
difference is in the shape of the computational graph.
• RvNNs propagate information over a tree structure rather than over a
linear sequence, which allows them to capture hierarchical relationships
between inputs.
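A minimal NumPy sketch of this idea (the parse tree, word vectors, and dimension are toy assumptions): the same weight matrix is applied recursively to combine child vectors into a parent vector, producing one fixed-size representation for the whole tree:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # dimension of every node representation (hypothetical)

# One shared weight matrix for every child -> parent combination
# (a binary tree is assumed, so each parent has exactly two children).
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

def compose(node, word_vecs):
    """Recursively compute the vector for a parse-tree node.
    A leaf is a word (string); an internal node is a (left, right) pair.
    The same W and b are reused at every level of the tree."""
    if isinstance(node, str):
        return word_vecs[node]
    left, right = node
    children = np.concatenate([compose(left, word_vecs),
                               compose(right, word_vecs)])
    return np.tanh(W @ children + b)

# Toy parse tree for "not very good": (not (very good))
word_vecs = {w: rng.normal(size=DIM) for w in ["not", "very", "good"]}
tree = ("not", ("very", "good"))
sentence_vec = compose(tree, word_vecs)  # fixed-size vector for the sentence
```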
Applications:
• Natural Language Processing (NLP): RvNNs are well-suited for parsing
and processing tree structures like sentence parse trees, which represent
the syntactic structure of sentences. They can be used for tasks such as
sentiment analysis and syntactic parsing, where the hierarchical nature of
language is crucial.
• Computer Vision: In vision tasks, RvNNs can be applied to capture
relationships between objects in a scene or for scene parsing, where
understanding spatial hierarchies is important.
• Learning to Reason: RvNNs have been suggested as a tool for learning
reasoning tasks, where data is structured in a hierarchical or nested way,
making them useful in tasks that require compositionality and hierarchical
reasoning.

In summary, Recursive Neural Networks extend traditional RNNs by operating
on hierarchical data structures rather than linear sequences. They excel in
applications like natural language processing and tasks requiring hierarchical
reasoning, making them particularly useful for tasks involving structured data.
