
Artificial Intelligence
Recurrent Neural Networks


Agenda

01  Issues with Feed Forward Network
02  Understanding Recurrent Neural Networks
03  Types of RNN
04  Issues with RNN
05  Vanishing Gradient Problem
06  Long Short-Term Memory Networks
07  Demo on LSTM with Keras


Issues with Feed Forward Network

▪ Outputs are independent of each other: there is no relation between the output at ‘t’ and the output at ‘t+1’

▪ Cannot handle sequential data

▪ Cannot memorize previous inputs

(Figure: a feed forward network)


Issues with Feed Forward Network

Would this feed forward network be able to predict the next word?

Input: “Recurrent Neural …………………….”  →  FFN  →  Output: ?

This feed forward network wouldn’t be able to predict the next word, because it cannot memorize the previous inputs


Solution with Recurrent Neural Network

“I only cook these three items, and in the same sequence”

(Figure: the same sequence repeated on Day 1, Day 2, and Day 3)


Solution with Recurrent Neural Network

▪ Outputs are dependent on each other

▪ Can handle sequential data

▪ Can memorize previous inputs

(Figure: a recurrent neural network unrolled over Day 1, Day 2, and Day 3)


Understanding Recurrent Neural Networks

▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations

▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far

(Figure: an RNN unrolled over the inputs at ‘t-1’, ‘t’, and ‘t+1’)


Understanding Recurrent Neural Networks

• Xt is the input at time step ‘t’

• St is the hidden state at time step ‘t’. It’s the memory of the network. St is calculated based on the previous hidden state and the input at the current step: St = f(U·Xt + W·St-1). The function f is usually a non-linearity such as tanh or ReLU.

• Ot is the output at step ‘t’: Ot = softmax(V·St)
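As a rough illustration (a minimal NumPy sketch with made-up dimensions, not code from this module), one forward step of this recurrence could look like:

import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    # St = tanh(U·Xt + W·St-1): the hidden state, i.e. the network's "memory"
    s_t = np.tanh(U @ x_t + W @ s_prev)
    # Ot = softmax(V·St): the output at step 't'
    logits = V @ s_t
    o_t = np.exp(logits - logits.max())
    o_t /= o_t.sum()
    return s_t, o_t

# hypothetical sizes: 10-dim input, 16-dim hidden state, 5 output classes
U = np.random.randn(16, 10) * 0.1
W = np.random.randn(16, 16) * 0.1
V = np.random.randn(5, 16) * 0.1
s = np.zeros(16)
for x in np.random.randn(3, 10):      # a toy sequence of 3 inputs
    s, o = rnn_step(x, s, U, W, V)    # the same weights are reused at every step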



Back-Propagation through Time

▪ Backpropagation Through Time (BPTT) is used to update the weights in the recurrent neural network

▪ An RNN typically predicts one output per time step. Conceptually, Backpropagation Through Time works by unrolling the network to get each of these individual time steps

▪ Then, it calculates the error at each time step and adds up all of the individual errors to get the final accumulated error

▪ Finally, the network is rolled back up and the weights are updated
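As a rough sketch (reusing the toy rnn_step function and weights from the snippet above; the inputs and targets here are made up), the accumulated error over the unrolled steps might be computed like this:

inputs = np.random.randn(4, 10)        # a toy sequence of 4 inputs
targets = np.array([0, 2, 1, 3])       # a toy class label for each time step
total_loss = 0.0
s = np.zeros(16)
for x_t, y_t in zip(inputs, targets):  # the unrolled time steps
    s, o_t = rnn_step(x_t, s, U, W, V)
    total_loss += -np.log(o_t[y_t])    # add up the error at each step
# total_loss is then backpropagated through every unrolled step
# and the shared weights U, W, V are updated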



Types of RNN

▪ Single inputs (images, words, ...) classified into a single class (binary classification), e.g. “is this a bird or not”

▪ Single inputs (images, words, ...) classified into multiple classes


Types of RNN

▪ A sequence of inputs (images, words, ...) classified into a single class (binary classification of a sequence)

▪ A sequence of inputs (images, words, ...) classified into multiple classes


Issues with RNN

Suppose we try to predict the last word in this text:

Input: “Recurrent Neural ……”  →  RNN  →  Output: “Network”

Here, the RNN does not need any further context. It can easily predict that the last word would be ‘Network’


Issues with RNN

Now, let’s predict the last word in this text:

Input: “I’ve been staying in Spain for the last 10 years. I can speak fluent …………..”  →  RNN  →  Output: ?

Regular RNNs have difficulty in learning such long-range dependencies


Issues with RNN

I’ve been staying in Spain for the last 10 years. I can speak fluent …………..

• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”

• The gap between the word which we want to predict and the relevant information is very large; this is known as a long-term dependency

∂E/∂W = ∂E/∂y3 · ∂y3/∂h3 · ∂h3/∂h2 · ∂h2/∂h1 · …

• A long chain of dependencies arises while backpropagating the error, as in the chain rule above


Vanishing Gradient Problem

▪ Now, if there is a really long dependency, there’s a good probability that one of the gradients might approach zero, and this would lead to all of the gradients rushing to zero exponentially fast due to the multiplication:

∂E/∂W ≈ 0

▪ Such a state no longer helps the network to learn anything. This is known as the vanishing gradient problem
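A quick numeric illustration (a toy calculation, not the exact gradients of any particular network): each extra time step multiplies the backpropagated gradient by another factor smaller than 1, so over a long dependency the product collapses towards zero.

grad = 1.0
for step in range(50):
    # each unrolled step contributes a factor such as a tanh derivative (< 1)
    # times a small recurrent weight; 0.5 is just an illustrative value
    grad *= 0.5
print(grad)   # about 8.9e-16: effectively zero after 50 steps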



Long Short-Term Memory Networks

Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem

Standard RNN

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer


Long Short-Term Memory Networks

Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem

LSTM

LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way


Core Idea behind LSTMs

The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged


The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates



Core Idea behind LSTMs

Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”


Working of LSTMs

Step 1

The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”
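In the standard LSTM notation (not spelled out on the slide, but consistent with the steps below), this forget gate is usually written as ft = σ(Wf · [ht-1, xt] + bf), where the sigmoid σ outputs a number between 0 (“completely forget”) and 1 (“completely keep”) for each element of the old cell state Ct-1.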



Working of LSTMs

Step 2

The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values that could be added to the state
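In the same notation, these two parts are commonly written as it = σ(Wi · [ht-1, xt] + bi) for the input gate and C~t = tanh(WC · [ht-1, xt] + bC) for the candidate values.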



Working of LSTMs

Step 3

Then we have to update the old cell state, Ct-1, into the new cell state Ct. We multiply the old state Ct-1 by ft, forgetting the things we decided to forget earlier. Then we add it * C~t, the new candidate values scaled by how much we decided to update each state value
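Written out, this update is Ct = ft * Ct-1 + it * C~t, where * denotes element-wise multiplication.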



Working of LSTMs

Step 4

Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to
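In the usual notation this is ot = σ(Wo · [ht-1, xt] + bo) and ht = ot * tanh(Ct). Putting the four steps together, a minimal NumPy sketch of one LSTM cell (with made-up sizes, and not the Keras implementation used in the demo) could look like:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])          # [ht-1, xt]
    f = sigmoid(W['f'] @ z + b['f'])           # step 1: forget gate
    i = sigmoid(W['i'] @ z + b['i'])           # step 2: input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])     #         candidate values
    c = f * c_prev + i * c_tilde               # step 3: new cell state
    o = sigmoid(W['o'] @ z + b['o'])           # step 4: output gate
    h = o * np.tanh(c)                         #         new hidden state / output
    return h, c

# hypothetical sizes: 4-dim input, 8-dim hidden and cell state
n_in, n_hid = 4, 8
W = {k: np.random.randn(n_hid, n_hid + n_in) * 0.1 for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in np.random.randn(5, n_in):             # a toy sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, b)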



Implementing a Simple RNN
Loading the required packages:

Preparing the input data: creating 100 vectors with 5 consecutive numbers
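The code itself is only shown as screenshots in the original slides, so the snippets below are a plausible reconstruction rather than the exact demo code; variable names, layer sizes, and split ratios are assumptions. The imports and the input data might look like:

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential     # or tensorflow.keras
from keras.layers import LSTM, Dense

# 100 vectors, each holding 5 consecutive numbers: [0,1,2,3,4], [1,2,3,4,5], ...
data = [[i + j for j in range(5)] for i in range(100)]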



Implementing a Simple RNN

Preparing the output data:

Converting the data & target into numpy arrays:

Having a glance at the shape:
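Continuing the reconstruction, and assuming the target for each vector is the number that follows it (an assumption; the slide only says “output data”), these steps might be:

# target: the number that follows each 5-number window
target = [i + 5 for i in range(100)]

data = np.array(data, dtype=float)
target = np.array(target, dtype=float)

# LSTM layers expect 3-D input: (samples, time steps, features)
data = data.reshape((100, 5, 1))
print(data.shape, target.shape)   # (100, 5, 1) (100,)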



Implementing a Simple RNN

Dividing the data into train & test sets:

Creating a sequential model:

Adding the LSTM layer with the output and input shape:
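A plausible version of these steps (the 80/20 split, the random seed, and the 50 LSTM units are assumptions):

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=4)

model = Sequential()
model.add(LSTM(50, input_shape=(5, 1)))   # LSTM layer on the (5, 1) input shape
model.add(Dense(1))                       # single numeric output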



Implementing a Simple RNN

Compiling the model with ‘Adam’ optimizer:

Having a glance at the model summary:
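For example (mean squared error is an assumed loss for this regression-style demo):

model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()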



Implementing a Simple RNN

Fitting a model on the train set:
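Something along these lines (the epoch count for this first run is an assumption):

history = model.fit(x_train, y_train, epochs=100,
                    validation_data=(x_test, y_test))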



Implementing a Simple RNN

Predicting the values on the test set:


Making a scatter plot for actual values and predicted values:
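For example:

y_pred = model.predict(x_test)

plt.scatter(range(len(y_test)), y_test, label='actual')
plt.scatter(range(len(y_test)), y_pred, label='predicted')
plt.legend()
plt.show()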



We see that the model
fails miserably and none
of the predictions are
correct



We’d have to normalize the data before we build the model

(Raw Data → Normalizing → Normalized Data)


Implementing a Simple RNN

Normalizing the input data:


Normalizing the output data:


Fitting the model with normalized values and number of epochs to be 500:
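A plausible reconstruction of these three steps (simple min-max scaling is an assumption; the slides only say “normalizing”):

# min-max scale the inputs and the targets to the 0-1 range
x_min, x_max = data.min(), data.max()
y_min, y_max = target.min(), target.max()
data_norm = (data - x_min) / (x_max - x_min)
target_norm = (target - y_min) / (y_max - y_min)

x_train, x_test, y_train, y_test = train_test_split(
    data_norm, target_norm, test_size=0.2, random_state=4)

history = model.fit(x_train, y_train, epochs=500,
                    validation_data=(x_test, y_test))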



Implementing a Simple RNN

Predicting the values on test set:


Making a scatter plot for actual values & predicted values:



We see that the loss has
reduced after
normalizing the data
and increasing the
epochs



Quiz



Quiz 1

Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.

A True

B False


Answer 1

Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.

A True  ✓

B False


Quiz 2

How many types of RNN exist?

A 4

B 2

C 3

D None of these



Answer 2

How many types of RNN exist?

A 4

B 2

C 3

D None of these



Quiz 3

How many gates are there in LSTM?

A 1

B 2

C 3

D 4



Answer 3

How many gates are there in LSTM?

A 1

B 2

C 3  ✓

D 4


India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

[email protected]

24/7 Chat with Our Course Advisor

