
Unit 3

Recurrent Neural Network


CO403.3: Compare Feed-Forward Neural Networks and Recurrent Neural Networks, and learn to model the time dimension using RNN and LSTM.
Recurrent Neural Network
• A recurrent neural network (RNN) is a neural network that is specialized for processing a sequence of data x(1), . . . , x(τ), with the time-step index t ranging from 1 to τ. For tasks that involve sequential inputs, such as speech and language, it is often better to use an RNN.
Recurrent Neural Network
• What is sequential data?
• If there is a particular order in which related things follow each other, we call it a sequence.
• Consider “i am a good boy” and “am i a good boy”.
• Do you think both sentences mean the same? No! This means the position of the words is very important: they form a sequence of words.
• Think of a video playing. You can easily predict the next scene if you have already watched the previous ones. But suppose you are sleepy and do not remember the order of the frames (all the frames are jumbled in your mind). Can you predict the next scene then? Of course not!
Recurrent Neural Network
RNNs are like us: they can remember sequences. If you ask an RNN to predict the next scene, it can tell you.
While a feed-forward neural network makes decisions based only on the current input, an RNN makes decisions based on the current and previous inputs.
Recurrent Neural Network

The green box represents a neural network. The arrows indicate memory, i.e., feedback to the next input.
The first figure shows the RNN; the second figure shows the same RNN unrolled in time. Consider the sequence [i am a good boy]. The sequence is arranged in time: at t=0, X0 = “i” is given as the input, and at t=1, X1 = “am” is given as the input. The state from the first time step is remembered and given as input during the second time step, along with the current input at that time step. In a feed-forward neural network, the network is forward-propagated only once per sample; in an RNN, the network is forward-propagated once for every time step of the sample.
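A minimal sketch of this unrolling in NumPy (the dimensions and random weights below are purely illustrative assumptions, not taken from the slides): the same weight matrices are applied at every time step, and the hidden state is carried forward.

    import numpy as np

    # Illustrative sizes: 8-dimensional inputs, 16-dimensional hidden state, 5 time steps
    input_size, hidden_size, seq_len = 8, 16, 5

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(hidden_size, input_size))   # input -> hidden weights
    W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden weights (the feedback loop)
    b_h = np.zeros(hidden_size)

    x = rng.normal(size=(seq_len, input_size))          # one sequence, e.g. 5 word vectors
    h = np.zeros(hidden_size)                           # state before the first time step

    for t in range(seq_len):
        # the network is forward-propagated once per time step, reusing the same weights
        h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)

    print(h.shape)  # (16,) -- the state after reading the whole sequence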
Recurrent Neural Network (RNN)

An RNN works on the principle of saving the output of a particular layer and feeding it back to the input in order to predict the output of that layer. This is how a feed-forward neural network can be converted into a recurrent neural network:
The nodes in the different layers of the neural network are compressed to form a single layer of the recurrent neural network. A, B, and C are the parameters of the network.
Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state is computed from the current input x(t) together with the state carried over from the previous time step. The output at any given time step is fed back into the network to improve the output.
How Does Recurrent Neural Networks Work?

The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer. The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation functions, weights, and biases. In an ordinary feed-forward network these hidden layers are independent of one another, i.e., the network has no memory. A recurrent neural network standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required, as sketched below.
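In a deep learning framework this loop is provided by a single recurrent layer. For example, a minimal PyTorch illustration (the framework and the sizes are assumptions, not prescribed by the slides):

    import torch
    import torch.nn as nn

    # one recurrent layer with a single shared set of weights
    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

    x = torch.randn(2, 5, 8)          # batch of 2 sequences, 5 time steps, 8 features each
    outputs, h_n = rnn(x)             # the same layer is looped over all 5 time steps
    print(outputs.shape, h_n.shape)   # torch.Size([2, 5, 16]) torch.Size([1, 2, 16])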
Different Problems solved with RNN
• Generating Text
• Given a sequence of words we want to predict the probability of each word given the previous words.
• Machine Translation
• Machine Translation is similar to language modeling in that our input is a sequence of words in our
source language (e.g. German). We want to output a sequence of words in our target language (e.g.
English).
• Speech Recognition
• Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of
phonetic segments together with their probabilities.
• Generating Image Descriptions
• Together with convolutional Neural Networks, RNNs have been used as part of a model to
generate descriptions for unlabeled images.
• Chatbots
• Chatbots can reply to your queries: when a sequence of words is given as the input, a sequence of words is generated at the output.
Types of Recurrent Neural Networks

There are four types of Recurrent Neural Networks:


1. One to One
2. One to Many
3. Many to One
4. Many to Many
One to One RNN: This type of neural network is known as the vanilla neural network. It is used for general machine learning problems that have a single input and a single output.
One to Many RNN: This type of neural network has a single input and multiple outputs. An example of this is image captioning.
Many to One RNN: This type takes a sequence of inputs and generates a single output. Sentiment analysis is a good example, where a sequence of words is classified as expressing a positive or negative sentiment (see the sketch after this list).
Many to Many RNN: This type takes a sequence of inputs and generates a sequence of outputs, as in machine translation.
https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn
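A minimal many-to-one sketch in PyTorch (the vocabulary size, dimensions, and the two-class output are illustrative assumptions):

    import torch
    import torch.nn as nn

    class SentimentRNN(nn.Module):
        def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, token_ids):            # token_ids: (batch, seq_len)
            embedded = self.embed(token_ids)
            _, (h_n, _) = self.lstm(embedded)     # keep only the final hidden state
            return self.fc(h_n[-1])               # many inputs -> one output per sequence

    model = SentimentRNN()
    logits = model(torch.randint(0, 5000, (4, 20)))  # 4 sentences of 20 tokens each
    print(logits.shape)                               # torch.Size([4, 2])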
Feed-Forward Neural Networks vs Recurrent
Neural Networks

A feed-forward neural network allows information to flow only in the forward direction, from the input nodes, through the hidden
layers, and to the output nodes. There are no cycles or loops in the network.

Below is what a simplified presentation of a feed-forward neural network looks like:

In a feed-forward neural network, decisions are based only on the current input. It does not memorize past data, and there is no notion of future inputs. Feed-forward neural networks are used for general regression and classification problems.
Long Short-Term Memory Networks (LSTMs)

Now, let’s discuss the most popular and efficient way to deal with gradient problems, i.e., Long Short-Term Memory Networks (LSTMs).

First, let’s understand Long-Term Dependencies.

Suppose you want to predict the last word in the text: “The clouds are in the ______.”

The most obvious answer to this is the “sky.” We do not need any further context to predict the last word in the
above sentence.

Consider this sentence: “I have been staying in Spain for the last 10 years…I can speak fluent ______.”

The word you predict will depend on the previous few words in context. Here, you need the context of Spain to
predict the last word in the text, and the most suitable answer to this sentence is “Spanish.” The gap between the
relevant information and the point where it's needed may have become very large. LSTMs help you solve this
problem.
Long Short-Term Memory Networks

LSTMs are a special kind of RNN, capable of learning long-term dependencies; remembering information for long periods is their default behavior.

All RNNs have the form of a chain of repeating modules of a neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.

LSTMs also have a chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four interacting layers that communicate with each other.
Workings of LSTMs in RNN
Step 1: Decide How Much Past Data It Should
Remember

The first step in the LSTM is to decide which information should be omitted from the cell state at that particular time step. The sigmoid function determines this: it looks at the previous state h(t-1) along with the current input x(t) and computes the forget gate.
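In the standard LSTM notation this forget gate is

\[ f_t = \sigma \left( W_f \cdot [h_{t-1}, x_t] + b_f \right) \]

where [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input, and the sigmoid σ pushes each component of f_t between 0 (completely forget) and 1 (completely keep).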
Workings of LSTMs in RNN

Consider the following two sentences:

Let the output of h(t-1) be “Alice is good in Physics. John, on the other hand, is good at Chemistry.”

Let the current input at x(t) be “John plays football well. He told me yesterday over the phone that he
had served as the captain of his college football team.”

The forget gate realizes there might be a change in context after encountering the first full stop. It
compares with the current input sentence at x(t). The next sentence talks about John, so the
information on Alice is deleted. The position of the subject is vacated and assigned to John.
Workings of LSTMs in RNN

Step 2: Decide How Much This Unit Adds to the Current State
In the second layer, there are two parts: a sigmoid function and a tanh function. The sigmoid function decides which values to let through (0 or 1), and the tanh function gives weight to the values that are passed, deciding their level of importance (-1 to 1).

With the current input at x(t), the input gate analyzes the important information: John plays football, and the fact that he was the captain of his college team is important. “He told me yesterday over the phone” is less important; hence it is forgotten. This process of adding new information is done via the input gate.
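In the same notation, the input gate and the candidate values are

\[ i_t = \sigma \left( W_i \cdot [h_{t-1}, x_t] + b_i \right), \qquad \tilde{C}_t = \tanh \left( W_C \cdot [h_{t-1}, x_t] + b_C \right) \]

and the cell state is updated by forgetting part of the old state and adding the new candidate information:

\[ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \]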
Workings of LSTMs in RNN

Step 3: Decide What Part of the Current Cell State Makes It to the Output
The third step is to decide what the output will be. First, we run a sigmoid layer, which decides what
parts of the cell state make it to the output. Then, we put the cell state through tanh to push the values to
be between -1 and 1 and multiply it by the output of the sigmoid gate.

Let’s consider this example to predict the next word in the sentence: “John played tremendously well against
the opponent and won for his team. For his contributions, brave ____ was awarded player of the match.”
There could be many choices for the empty space. The current input brave is an adjective, and adjectives
describe a noun. So, “John” could be the best output after brave.
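The output gate and the new hidden state follow the same pattern:

\[ o_t = \sigma \left( W_o \cdot [h_{t-1}, x_t] + b_o \right), \qquad h_t = o_t * \tanh(C_t) \]

Putting the three steps together, a single LSTM time step can be sketched in NumPy as follows (the dimensions and random weights are purely illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
        z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
        f_t = sigmoid(W_f @ z + b_f)           # Step 1: forget gate
        i_t = sigmoid(W_i @ z + b_i)           # Step 2: input gate
        c_tilde = np.tanh(W_c @ z + b_c)       #         candidate values
        c_t = f_t * c_prev + i_t * c_tilde     #         updated cell state
        o_t = sigmoid(W_o @ z + b_o)           # Step 3: output gate
        h_t = o_t * np.tanh(c_t)               #         new hidden state
        return h_t, c_t

    # Illustrative sizes: 4-dimensional input, 8-dimensional hidden/cell state
    rng = np.random.default_rng(0)
    n_in, n_hid = 4, 8
    Ws = [rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4)]
    bs = [np.zeros(n_hid) for _ in range(4)]
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    h, c = lstm_step(rng.normal(size=n_in), h, c, *Ws, *bs)
    print(h.shape, c.shape)  # (8,) (8,)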
Encoder Decoder architectures
The encoder-decoder model is a way of using recurrent neural networks for sequence-to-sequence prediction problems. It was initially developed for machine translation.
The approach involves two recurrent neural networks: one to encode the input sequence, called the encoder, and a second to decode the encoded input sequence into the target sequence, called the decoder.
The architecture of Encoder-Decoder
The overall structure of the commonly used sequence-to-sequence (encoder-decoder) model is shown below. It consists of three parts: encoder, intermediate vector, and decoder.
Encoder: It accepts a single element of the input sequence at each time step, processes it, collects information for that element, and propagates it forward.
Intermediate vector: This is the final internal state produced by the encoder part of the model. It contains information about the entire input sequence, which helps the decoder make accurate predictions.
Decoder: Given this encoding of the entire sentence, it predicts an output at each time step.
Encoder part of the model-

● The encoder is basically an LSTM/GRU cell.
● The encoder takes the input sequence and encapsulates the information as internal state vectors.
● The outputs of the encoder are discarded; only the internal states are used.

Let’s understand how the encoder part of the model works:

● The LSTM reads only one element at a time, so if the input sequence is of length m, the LSTM takes m time steps to read the entire sequence.
● Xt is the input at time step t.
● ht and ct are the internal states of the LSTM at time step t; a GRU has only one internal state, ht.
● Yt is the output at time step t.
Encoder part of the model-
Inputs of the encoder (Xt):

Consider the English sentence “India is beautiful country”. This sequence can be thought of as a sentence containing 4 words (India, is, beautiful, country). So here

X1 = ‘India’
X2 = ‘is’
X3 = ‘beautiful’
X4 = ‘country’.

Therefore the LSTM will read this sequence word by word in 4 time steps, as follows:
Encoder part of the model-
Here each Xt (each word) is represented as a vector using a word embedding, which converts each word into a vector of fixed length.

Now coming to the internal states (ht, ct):

● They capture what the LSTM has read up to time step t. For example, when t=2, they remember that the LSTM has read ‘India is’.
● The initial states h0, c0 (both vectors) are initialized randomly or with zeroes.
● Remember that the dimension of h0, c0 is the same as the number of units in the LSTM cell.
● The final states h4, c4 contain the crux of the entire input sequence “India is beautiful country”.

The output of the encoder (Yt): Yt is the prediction of the LSTM at each time step. In machine translation problems, we generate the outputs only once the entire input sequence has been read, so Yt at each time step in the encoder is of no use and we discard it.

So, summarizing the encoder part of the model: the encoder reads the English sentence word by word and stores the final internal states of the LSTM generated after the last time step (known as the intermediate vector); since the output is generated only once the entire sequence has been read, the outputs (Yt) of the encoder at each time step are discarded. A minimal code sketch follows below.
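A minimal PyTorch sketch of such an encoder (the framework, the vocabulary size, and the dimensions are illustrative assumptions, not prescribed by the slides):

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, src_vocab=10000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(src_vocab, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        def forward(self, src_ids):                       # src_ids: (batch, src_len)
            outputs, (h_n, c_n) = self.lstm(self.embed(src_ids))
            # the per-time-step outputs (Yt) are discarded; only the final
            # internal states, i.e. the intermediate vector, are returned
            return h_n, c_n

    encoder = Encoder()
    h, c = encoder(torch.randint(0, 10000, (1, 4)))   # e.g. the 4-word input sentence
    print(h.shape, c.shape)                            # torch.Size([1, 1, 256]) each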
Decoder part of the model in the Training Phase
The working of the decoder is different during the training and testing phases, unlike the encoder part of the model, which works in the same fashion in both phases. For the decoder to recognize the start and end of the sequence, we add START_ at the beginning of the output sequence and _END at the end of the output sequence. So our output sentence becomes START_ भारत खूबसूरत देश है _END (the Hindi translation of the input sentence).
Decoder part of the model in the Training Phase

Let’s understand the working visually:

● The initial states (h0, c0) of the decoder are set to the final states of the encoder. This can be thought of as the decoder being trained to generate the output based on the information gathered by the encoder.
● First, we input START_ so that the decoder starts generating the next word, and after the last word in the Hindi sentence, we make the decoder learn to predict _END.
● Here we use the teacher forcing technique, where the input at each time step is the actual output and not the predicted output from the previous time step (see the sketch below).
● Finally, the loss is calculated on the predicted outputs from each time step, and the errors are backpropagated through time to update the parameters of the model.
● The final states of the decoder are discarded, as we already have the output and they are of no further use.
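A sketch of the decoder in the training phase with teacher forcing (again a PyTorch illustration with assumed sizes; the placeholder states stand in for the encoder's final states from the sketch above):

    import torch
    import torch.nn as nn

    # placeholders with the same shape as the encoder's final states (h, c)
    h = c = torch.zeros(1, 1, 256)

    class Decoder(nn.Module):
        def __init__(self, tgt_vocab=12000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(tgt_vocab, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, tgt_ids, state):
            # teacher forcing: the actual target tokens (START_ + sentence) are fed in,
            # not the decoder's own predictions from the previous time step
            outputs, state = self.lstm(self.embed(tgt_ids), state)
            return self.fc(outputs), state                 # logits for every time step

    decoder = Decoder()
    decoder_input = torch.randint(0, 12000, (1, 6))        # START_ + target sentence
    decoder_target = torch.randint(0, 12000, (1, 6))       # target sentence + _END
    logits, _ = decoder(decoder_input, (h, c))             # initial state = encoder final state
    loss = nn.CrossEntropyLoss()(logits.view(-1, 12000), decoder_target.view(-1))
    loss.backward()                                         # errors backpropagated through time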
Decoder part of the model in Test Phase-

Process of the decoder in the test phase:

● The initial states of the decoder are set to the final states of the encoder.
● The LSTM in the decoder processes a single word at every time step.
● The input to the decoder always starts with START_.
● The internal states generated after every time step are fed as the initial states of the next time step. For example, at t=1 the internal states produced after inputting START_ are fed as the initial states at t=2.
● The output produced at each time step is fed as the input at the next time step.
● We know the sequence has ended when the decoder predicts _END (see the sketch below).

https://medium.com/analytics-vidhya/machine-translation-encoder-decoder-model-7e4867377161
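Continuing the illustration above (using the Decoder and the placeholder encoder states h, c defined in the previous sketch; the token ids for START_ and _END are assumptions), a greedy decoding loop for the test phase could look like this:

    START_ID, END_ID, MAX_LEN = 1, 2, 20          # assumed token ids and length limit

    state = (h, c)                                # initial states = encoder final states
    token = torch.tensor([[START_ID]])            # input always starts with START_
    generated = []
    with torch.no_grad():
        for _ in range(MAX_LEN):
            logits, state = decoder(token, state) # states flow on to the next time step
            token = logits.argmax(dim=-1)         # predicted word becomes the next input
            if token.item() == END_ID:            # stop once the decoder predicts _END
                break
            generated.append(token.item())
    print(generated)                              # predicted target token ids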
Recursive Neural Networks
Recursive Neural Networks (RvNNs) are a class of deep neural networks that can learn detailed and
structured information. With RvNN, you can get a structured prediction by recursively applying the same
set of weights on structured inputs. The word recursive indicates that the neural network is applied to its
output.
Due to their deep tree-like structure, Recursive Neural Networks can handle hierarchical data. The tree
structure means combining child nodes and producing parent nodes. Each child-parent bond has a weight
matrix, and similar children have the same weights. The number of children for every node in the tree is
fixed to enable it to perform recursive operations and use the same weights. RvNNs are used when there's
a need to parse an entire sentence.

To calculate the parent node's representation, we add the products of the weight matrices (W_i) and the
children's representations (C_i) and apply the transformation f:

\[ h = f \left( \sum_{i=1}^{c} W_i C_i \right) \], where c is the number of children.

https://www.simplilearn.com/recursive-neural-network-in-deep-learning-article
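A minimal NumPy sketch of this composition rule for a node with two children (the dimensionality, the random weights, and the choice of tanh for f are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 6                                               # dimensionality of every node representation
    W = [rng.normal(size=(d, d)) for _ in range(2)]     # one weight matrix per child position

    def compose(children):
        # h = f( sum_i W_i C_i ), with f = tanh
        return np.tanh(sum(W_i @ C_i for W_i, C_i in zip(W, children)))

    # child representations (e.g. from word embeddings or lower tree nodes)
    left, right = rng.normal(size=d), rng.normal(size=d)
    parent = compose([left, right])                     # the same weights are reused at every node
    print(parent.shape)                                 # (6,)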
Recurrent Neural Network vs. Recursive Neural Networks

● Recurrent Neural Networks (RNNs) are another well-known class of neural networks used for
processing sequential data. They are closely related to the Recursive Neural Network.
● Recurrent Neural Networks represent temporal sequences, which is why they find application in Natural Language Processing (NLP), since language-related data like sentences and paragraphs are sequential in nature. Recurrent networks are usually chain structures. The weights are shared across the chain length, keeping the dimensionality constant.
● On the other hand, Recursive Neural Networks operate on hierarchical data models due to
their tree structure. There are a fixed number of children for each node in the tree so that it can
execute recursive operations and use the same weights for each step. Child representations are
combined into parent representations.
● The efficiency of a recursive network is higher than that of a feed-forward network.
● Recurrent networks are recurrent over time, i.e., over a chain structure; since a chain is a special case of a tree, recursive networks can be seen as a generalization of recurrent networks.
Recursive Neural Network Implementation

A Recursive Neural Network can be used for sentiment analysis of natural language sentences. This is one of the most important tasks of Natural Language Processing (NLP): identifying the writing tone and the sentiment of the writer in a particular sentence. If a writer expresses any sentiment, basic labels describing the writing tone are recognized. We want to identify the smaller components, like noun or verb phrases, and order them in a syntactic hierarchy. For example, the network identifies whether the sentence showcases a constructive form of writing or negative word choices.

A variable called 'score' is calculated at each traversal of the nodes, telling us which pair of phrases and words we must combine to form the perfect syntactic tree for a given sentence.
Recursive Neural Network Implementation

Let us consider the representation of the phrase “a lot of fun” in the following sentence:

Programming is a lot of fun.

An RNN representation of this phrase would not be suitable, because an RNN considers only sequential relations: each state varies with the representation of the preceding words, so a subsequence that does not occur at the beginning of the sentence cannot be represented on its own. With an RNN, when processing the word ‘fun’, the hidden state represents the whole sentence.

However, with a Recursive Neural Network (RvNN), the hierarchical architecture can store the representation of the exact phrase: it lies in the hidden state of the node R_{a lot of fun}. Thus, syntactic parsing can be implemented with the help of Recursive Neural Networks.
Benefits of RvNNs for Natural Language Processing
● The two significant advantages of Recursive Neural Networks for Natural Language
Processing are their structure and reduction in network depth.
● As already explained, the tree structure of Recursive Neural Networks can manage
hierarchical data like in parsing problems.
● Another benefit of RvNNs is that the trees can have a logarithmic height: when there are O(n) input words, a Recursive Neural Network can represent them in a binary tree of height O(log n). This lessens the distance between the first and last input elements, so long-term dependencies become shorter and easier to capture.
Disadvantages of RvNNs for Natural Language Processing

● The main disadvantage of recursive neural networks is the tree structure itself. Using a tree structure introduces a particular inductive bias into the model: the assumption that the data follow a tree hierarchy. When that assumption does not hold, the network may not be able to learn the existing patterns.
● Another disadvantage of the Recursive Neural Network is that sentence parsing can be slow and ambiguous; interestingly, there can be many parse trees for a single sentence.
● Also, it is more time-consuming and labor-intensive to label the training data for recursive neural networks than for recurrent neural networks: manually parsing a sentence into short components is more time-consuming and tedious than assigning a single label to a sentence.
THANK YOU
