RNN Basics
A Recurrent Neural Network (RNN) is a type of neural network in which the output from the
previous step is fed as input to the current step.
• In tasks such as predicting the next word of a sentence, the previous words are required,
so the network needs to remember them. RNNs address this need with the help of a
hidden layer.
• The main and most important feature of an RNN is its hidden state, which remembers
some information about a sequence. This state is also referred to as the memory
state, since it remembers the previous inputs to the network.
• An RNN uses the same weights for each element of the sequence.
RNNs are a class of neural networks that are powerful for modeling sequence data such as
time series or natural language. The main idea behind this architecture is to use
sequential information.
So, do you see the importance of RNNs in our daily life?
Why make things more complex by introducing a new network (the RNN) when we already
have the feed-forward neural network (ANN)?
In an ANN, information flows only in the forward direction, from the input nodes, through the
hidden layers, to the output nodes. There are no cycles or loops in the network.
ANN architecture
RNN architecture
Recurrent Neural Networks (RNNs) are widely used for data with some kind of sequential
structure. For instance, time series data has an intrinsic ordering based on time. Sentences
are also sequential: “I love dogs” has a different meaning than “Dogs I love.” Simply put, if
the semantics of your data are altered by random permutation, you have a sequential
dataset and RNNs may be used for your problem! To help solidify the types of problems
RNNs can solve, here is a list of common applications¹:
• Speech Recognition
• Sentiment Classification
• Machine Translation
Great! We know the types of problems that we can apply RNNs to, now…
RNNs are different from classical multi-layer perceptron (MLP) networks for two main
reasons: 1) they take into account what happened previously, and 2) they share
parameters/weights.
Don’t worry if this doesn’t make sense, we’re going to break down all the variables and go
through a forward propagation and backpropagation in a little bit! Just focus on the flow of
variables at first glance.
The green blocks are called hidden states. The blue circles, defined by the vector a within
each block, are called hidden nodes or hidden units where the number of nodes is
decided by the hyper-parameter d. Similar to activations in MLPs, think of each green block
as an activation function that acts on each blue node. We’ll talk about the calculations
within the hidden states in the forward propagation section of this article.
Vector h — is the output of the hidden state after the activation function has been applied
to the hidden nodes. As you can see at time t, the architecture takes into account what
happened at t-1 by including the h from the previous hidden state as well as the input x at
time t. This allows the network to account for information from previous inputs that are
sequentially behind the current input. It’s important to note that the zeroth h vector will
always start as a vector of 0’s because the algorithm has no information preceding the first
element in the sequence.
The hidden state at t=2 takes as input the output from t-1 and x at t.
Matrices Wx, Wy, Wh — are the weights of the RNN architecture which
are shared throughout the entire network. The model weights of Wx at t=1 are the exact
same as the weights of Wx at t=2 and every other time step.
Vector xᵢ — is the input to each hidden state where i=1, 2,…, n for each element in the input
sequence. Recall that text must be encoded into numerical values. For example, every
letter in the word “dogs” would be a one-hot encoded vector with
dimension (4x1). Similarly, x can also be word embedding or other numerical
representations.
One-Hot Encoding of the word “dogs”
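For instance, assuming the letters are indexed in the order d, o, g, s (the ordering here is just an illustrative choice), the four one-hot vectors would be:

d = [1, 0, 0, 0]ᵀ, o = [0, 1, 0, 0]ᵀ, g = [0, 0, 1, 0]ᵀ, s = [0, 0, 0, 1]ᵀ

each of dimension (4x1).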
RNN Equations
Now that we know what all the variables are, here are all the equations that we’re going to
need in order to go through an RNN calculation:
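In the notation of this article (a standard formulation, where a_t denotes the hidden nodes, h_t the output of the hidden state, and ŷ_t the prediction at time step t), they can be written as:

a_t = Wh · h_(t-1) + Wx · x_t
h_t = tanh(a_t)
ŷ_t = softmax(Wy · h_t)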
These are the only three equations that we need, pretty sweet! The hidden nodes are a
combination of the previous state's output weighted by the weight matrix Wh and the
input x weighted by the weight matrix Wx. The tanh function is the activation function that
we mentioned earlier, symbolized by the green block. The output of the hidden state is the
activation function applied to the hidden nodes. To make a prediction, we take the output
from the current hidden state and weight it by the weight matrix Wy, followed by a softmax
activation.
It's also important to understand the dimensions of all the variables floating around. In
general, for predicting a sequence where the input (and output) vectors have dimension k
and there are d hidden nodes: x and ŷ are (k x 1), a and h are (d x 1), Wx is (d x k),
Wh is (d x d), and Wy is (k x d).
A trivial example
Take the word “dogs,” where we want to train an RNN to predict the letter “s” given the
letters “d”-“o”-“g”. The architecture above would look like the following:
RNN architecture predicting the letter “s” in “dogs”
To keep this example simple, we'll use 3 hidden nodes in our RNN (d=3), and k=4 because
our input x is a 4-dimensional one-hot vector for the letters in "dogs." The dimensions of
our variables are therefore: x and ŷ are (4x1), a and h are (3x1), Wx is (3x4), Wh is (3x3),
and Wy is (4x3).
Forward Propagation
Let’s see how a forward propagation would work at time t=1. First, we have to calculate the
hidden nodes a, then apply the activation function to get h, and finally calculate
the prediction. Easy!
At t=1
To make the example concrete, I’ve initialized random weights for the matrices Wx,
Wy, and Wh to provide an example with numbers.
At t=1, our RNN would predict the letter “d” given the input “d”. This doesn’t make sense,
but that’s ok because we’ve used untrained random weights. This was just to show the
workflow of a forward pass in an RNN.
At t=2 and t=3, the workflow would be analogous except that the vector h from t-1 would no
longer be a vector of 0’s, but a vector of non-zeros based on the inputs before time t. (As a
reminder, the weight matrices Wx, Wh, and Wy remain the same for t=1,2, and 3. )
It’s important to note that while the RNN can output a prediction at every single time step, it
isn’t necessary. If we were just interested in the letter after the input “dog” we could just
take the output at t=3 and ignore the others.
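To make this workflow concrete in code, here is a minimal NumPy sketch of the forward pass over "d"-"o"-"g". The weight values are random placeholders rather than the numbers used in the figures above, and the letter-to-index ordering in the one-hot encoding is an illustrative assumption.

```python
import numpy as np

# Minimal sketch of the forward pass described above:
# d = 3 hidden nodes, k = 4 for the one-hot letters of "dogs".
rng = np.random.default_rng(0)
d, k = 3, 4
Wx = rng.normal(size=(d, k))   # input-to-hidden weights, shared across time steps
Wh = rng.normal(size=(d, d))   # hidden-to-hidden weights, shared across time steps
Wy = rng.normal(size=(k, d))   # hidden-to-output weights, shared across time steps

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

letters = ["d", "o", "g", "s"]  # letter-to-index order is an illustrative assumption

def one_hot(ch):
    v = np.zeros((k, 1))
    v[letters.index(ch)] = 1.0
    return v

h = np.zeros((d, 1))            # h_0 starts as a vector of zeros
for t, ch in enumerate("dog", start=1):
    x = one_hot(ch)
    a = Wh @ h + Wx @ x         # hidden nodes: weighted previous output plus weighted input
    h = np.tanh(a)              # activation gives the hidden state's output
    y_hat = softmax(Wy @ h)     # prediction over the four letters
    print(f"t={t}: predicted next letter = {letters[int(y_hat.argmax())]}")
# If we only care about the letter after "dog", we keep just the prediction at t=3.
```

Because the weights are untrained and random, the printed predictions are meaningless; the point is only to show how h carries information forward from one step to the next.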
Now that we understand how to make predictions with RNNs, let’s explore how RNNs learn
to make correct predictions.
Like their classical counterparts (MLPs), RNNs use the backpropagation methodology to
learn from sequential training data. Backpropagation with RNNs is a little more challenging
due to the recursive nature of the weights and their effect on the loss which spans over
time. We’ll see what that means in a bit.
To get a concrete understanding of how backpropagation works, let's lay out the general
workflow: run a forward pass to get predictions, compute the loss, compute the gradients of
the loss with respect to the weights, and update the weights.
Note that the output h from the hidden unit is not learned; it is merely the information
gained by combining the learned weights with the previous output h and the current input x.
Because this example is a classification problem where we’re trying to predict four possible
letters (“d-o-g-s”), it makes sense to use the multi-class cross entropy loss function:
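In standard form, writing y_t for the one-hot target and ŷ_t for the prediction at time t, the per-step loss is:

L_t = − Σ_i y_(t,i) · log(ŷ_(t,i))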
Taking into account all time steps, the overall loss is:
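Summing the per-step losses over the whole sequence:

L = Σ_t L_t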
Now here comes the tricky part, calculating the gradient for Wx, Wy, and Wh. We’ll start by
calculating the gradient for Wy because it’s the easiest. As stated before, the effect of the
weights on loss spans over time. The weight gradient for Wy is the following:
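Sketching it in the same notation, applying the chain rule through the prediction ŷ_t and summing the loss over time:

∂L/∂Wy = Σ_t ∂L_t/∂Wy = Σ_t (∂L_t/∂ŷ_t) · (∂ŷ_t/∂Wy)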
That's the gradient calculation for Wy. Hopefully it's pretty straightforward; the main idea is
the chain rule and accounting for the loss at each time step.
The weight matrices Wx and Wh are analogous to each other, so we'll just look at the
gradient for Wx and leave Wh to you. One of the trickiest parts about calculating the
gradient for Wx is the recursive dependency on the previous state: we need to account for
the derivatives of the current error with respect to each of the previous states, and we
again need to account for the loss at each time step.
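Sketched in the same notation (a standard backpropagation-through-time derivation, where h_k denotes the hidden-state output at an earlier step k ≤ t):

∂L/∂Wx = Σ_t Σ_(k=1..t) (∂L_t/∂ŷ_t) · (∂ŷ_t/∂h_t) · (∂h_t/∂h_k) · (∂h_k/∂Wx)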
And that's backpropagation! Once we have the gradients for Wx, Wh, and Wy, we update
them as usual and continue on with the backpropagation workflow. Now you know how
RNNs learn and make predictions.
A problem that RNNs face, which is also common in other deep neural nets, is
the vanishing gradient problem. Vanishing gradients make it difficult for the model to
learn long-term dependencies. For example, suppose an RNN were given a long sentence
describing a brown and black dog and had to predict the last two words "german" and
"shepherd." The RNN would need to
take into account the inputs “brown”, “black”, and “dog,” which are the nouns and
adjectives that describe a german shepherd. However, the word “brown” is quite far from
the word “shepherd.” From the gradient calculation of Wx that we saw earlier, we can break
down the backpropagation error of the word “shepherd” back to “brown” and see what it
looks like:
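In the notation of the Wx gradient above, that single term is (with t the time step of "shepherd" and k the time step of "brown"):

(∂L_t/∂ŷ_t) · (∂ŷ_t/∂h_t) · (∂h_t/∂h_k) · (∂h_k/∂Wx)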
The partial derivative of the state corresponding to the input "shepherd" with respect to the
state corresponding to "brown" is actually a chain rule in itself, resulting in:
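In the same notation:

∂h_t/∂h_k = Π_(j=k+1..t) ∂h_j/∂h_(j-1)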
That's a lot of chain rule! These chains of gradients are troublesome because, if each term
is less than 1, their product can cause the gradient of the loss from the word "shepherd"
with respect to the word "brown" to approach 0, thereby vanishing. This makes it difficult
for the weights to take into account words that occur at the start of a long sequence. So
during a forward propagation, the word "brown" may have little to no effect on the
prediction of "shepherd," because the weights weren't updated properly due to the
vanishing gradient. This is one of the major disadvantages of RNNs.
However, there have been advancements in RNNs, such as gated recurrent units (GRUs)
and long short-term memory networks (LSTMs), that are able to deal with the problem of
vanishing gradients.
Based on how many inputs map to how many outputs, RNNs come in four types.
One to One
This type of RNN behaves like a simple neural network and is also known as a vanilla
neural network. In this network, there is only one input and one output.
One To Many
In this type of RNN, there is one input and many outputs associated with it. One of the most
common examples of this network is image captioning, where, given an image, we predict a
sentence of multiple words.
Many to One
In this type of network, many inputs are fed to the network at several states of the network,
generating only one output. This type of network is used in problems like sentiment
analysis, where we give multiple words as input and predict only the sentiment of the
sentence as output.
Many to Many
In this type of neural network, there are multiple inputs and multiple outputs corresponding
to a problem. One example of this is language translation, where we provide multiple words
from one language as input and predict multiple words in the second language as output.
Advantages
1. An RNN remembers information through time via its hidden (memory) state. This ability
to remember previous inputs is what makes it useful for time series prediction.
2. Recurrent neural networks can even be combined with convolutional layers to extend the
effective pixel neighborhood.
Disadvantages
1. RNNs suffer from the vanishing gradient problem, which makes it difficult to learn
long-term dependencies across long sequences.
2. Training can also be affected by exploding gradients.
To overcome problems like vanishing and exploding gradients, several new advanced
versions of RNNs have been developed. Some of these are:
A bidirectional neural network (BiNN) is a variation of a recurrent neural network in which
the input information flows in both directions, and the outputs of both directions are
combined to produce the final output. A BiNN is useful in situations where the context of
the input is important, such as NLP tasks and time-series analysis problems.
Long Short-Term Memory (LSTM) works on a read-write-and-forget principle: given the
input information, the network reads and writes the most useful information from the data
and forgets the information that is not important for predicting the output. To do this, three
new gates are introduced into the RNN, so that only the selected information is passed
through the network.
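For reference, the three gates are the forget, input, and output gates. One standard formulation (with σ the sigmoid function, ⊙ element-wise multiplication, c_t the cell state, and [h_(t-1), x_t] the concatenation of the previous output and the current input) is:

f_t = σ(Wf · [h_(t-1), x_t] + bf)
i_t = σ(Wi · [h_(t-1), x_t] + bi)
o_t = σ(Wo · [h_(t-1), x_t] + bo)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(Wc · [h_(t-1), x_t] + bc)
h_t = o_t ⊙ tanh(c_t)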