Module 6
Recurrent Neural Networks
Recurrent Neural Network
▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations
▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far
▪ An RNN typically predicts one output at each time step. Conceptually, the network is unrolled across the time steps of the sequence
▪ Then, it calculates the error at each time step and adds up all of the individual errors
▪ Following this, the network is rolled back up and the weights are updated (backpropagation through time)
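A minimal NumPy sketch of these three points, with assumed toy sizes and randomly initialized weights (the names Wx, Wh, Wy are illustrative, not from the slides):

import numpy as np

# Assumed toy dimensions: 3 time steps, 4 input features, 5 hidden units, 1 output
T, n_in, n_hid, n_out = 3, 4, 5, 1
rng = np.random.default_rng(0)
Wx = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights (shared across time)
Wh = rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden weights (shared across time)
Wy = rng.normal(size=(n_out, n_hid))  # hidden-to-output weights (shared across time)

x = rng.normal(size=(T, n_in))        # one input per time step
y = rng.normal(size=(T, n_out))       # one target per time step
h = np.zeros(n_hid)

total_error = 0.0
for t in range(T):
    # The same computation at every step; h carries the "memory" of earlier steps
    h = np.tanh(Wx @ x[t] + Wh @ h)
    y_pred = Wy @ h
    total_error += 0.5 * np.sum((y_pred - y[t]) ** 2)  # error summed over time steps

# Backpropagation through time differentiates total_error w.r.t. the shared weights
print(total_error)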
Suppose we try to predict the last word in this text (input → output):
I’ve been staying in Spain for the last 10 years. I can speak fluent …………..
• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”
• The gap between the word which we want to predict and the relevant information is very large, and this is known as a long-term dependency
▪ Now, if there is a really long dependency, there is a good probability that one of the gradients will approach zero, and this would lead to all of the gradients vanishing, i.e. ∂E/∂W = 0
▪ Such states no longer help the network to learn anything. This is known as the vanishing gradient problem
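A rough numerical illustration of why long dependencies cause this (the per-step factor is an assumed toy value, not from the slides): the gradient reaching an early time step is a product of one factor per step, so if each factor is below 1 the product collapses toward zero.

# Assume the per-step backprop factor (roughly tanh'(z) * w) is about 0.5
per_step_factor = 0.5
for steps in (5, 20, 50):
    print(steps, per_step_factor ** steps)  # 0.03125, ~9.5e-07, ~8.9e-16 -> effectively ∂E/∂W = 0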
Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
Standard RNN
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer
LSTM
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way
The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates
Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and
a pointwise multiplication operation
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be
let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
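A small sketch of a gate with assumed toy values, showing the sigmoid layer and the pointwise multiplication:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

information = np.array([2.0, -3.0, 0.5])      # values trying to pass through the gate
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))  # ~0 "let nothing through", 0.5, ~1 "let everything through"
print(gate * information)                     # approximately [0.0, -1.5, 0.5]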
Step 1
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”
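In the standard LSTM notation (these symbols are not shown on the slide, but match the Ct-1, ft, and it used in Step 3), the forget gate looks at ht-1 and xt and outputs a number between 0 and 1 for every entry of the cell state:
ft = sigmoid(Wf · [ht-1, xt] + bf)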
Step 2
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of
new candidate values, C~t, that could be added to the state
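In the same standard notation, the two parts of this step are:
it = sigmoid(Wi · [ht-1, xt] + bi)
C~t = tanh(WC · [ht-1, xt] + bC)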
Step 3
Then we have to update the old cell state, Ct-1, into the new cell state Ct. So, we multiply the old state (Ct-1) by ft,
forgetting the things we decided to forget earlier. Then we add (it * C~t). These are the new candidate values, scaled
by how much we decided to update each state value
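Written out in the same notation, the update is:
Ct = ft * Ct-1 + it * C~t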
Step 4
Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to
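In the same notation:
ot = sigmoid(Wo · [ht-1, xt] + bo)
ht = ot * tanh(Ct)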
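Putting Steps 1–4 together, here is a minimal NumPy sketch of a single LSTM time step; the weight names and toy sizes are assumptions for illustration, not values from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    # One LSTM time step following Steps 1-4 above (weight names are illustrative)
    z = np.concatenate([h_prev, x_t])    # [ht-1, xt]
    f_t = sigmoid(W_f @ z + b_f)         # Step 1: forget gate
    i_t = sigmoid(W_i @ z + b_i)         # Step 2: input gate
    C_tilde = np.tanh(W_C @ z + b_C)     # Step 2: candidate values
    C_t = f_t * C_prev + i_t * C_tilde   # Step 3: update the cell state
    o_t = sigmoid(W_o @ z + b_o)         # Step 4: output gate
    h_t = o_t * np.tanh(C_t)             # Step 4: new hidden state
    return h_t, C_t

# Toy usage with assumed sizes: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = lambda: rng.normal(size=(n_hid, n_hid + n_in))
b = lambda: np.zeros(n_hid)
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.normal(size=n_in), h, C, W(), b(), W(), b(), W(), b(), W(), b())
print(h, C)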
Adding the LSTM layer with the output and input shape:
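The code listing on this slide did not survive extraction, so the snippet below is only a sketch of what this step typically looks like in Keras; the layer size (50 units) and input shape (60 time steps, 1 feature) are assumed values, not the originals.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=50, input_shape=(60, 1)))  # LSTM layer with the input shape
model.add(Dense(units=1))                       # output layer producing a single value
model.compile(optimizer='adam', loss='mean_squared_error')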
[Diagram: Raw Data → Normalizing → Normalized Data]
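A common way to perform this normalization step (a sketch assuming scikit-learn's MinMaxScaler and toy raw values; the original code is not recoverable from the slide):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw_data = np.array([[10.0], [25.0], [40.0], [55.0]])  # assumed toy raw values
scaler = MinMaxScaler(feature_range=(0, 1))
normalized_data = scaler.fit_transform(raw_data)       # values scaled into [0, 1]
print(normalized_data)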
Fitting the model with the normalized values and the number of epochs set to 500:
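Again, the original listing is missing, so this is only a sketch of what fitting on the normalized values for 500 epochs typically looks like; it assumes the model built above and training arrays shaped (samples, time steps, features).

import numpy as np

X_train = np.random.rand(100, 60, 1)  # assumed normalized training inputs
y_train = np.random.rand(100, 1)      # assumed normalized targets

history = model.fit(X_train, y_train, epochs=500, batch_size=32, verbose=1)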