Recurrent Neural Networks

This document provides an overview of recurrent neural networks (RNNs). It discusses how RNNs can handle sequential data by using their internal state to process inputs of varying lengths. It also explains how RNNs differ from feedforward neural networks in having cycles and using previous outputs and internal states as inputs. The document then summarizes various types of RNNs including vanilla RNNs, bidirectional RNNs, multilayer RNNs, and long short-term memory (LSTM) networks, and discusses challenges like vanishing gradients that RNNs aim to address.

NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 6 Machine Learning based on Artificial Neural Networks

Video 6.5 Recurrent Neural Networks


Structure of Lectures in week 6

L1 Fundamentals of Neural Networks - McCulloch and Pitts
L2 Perceptrons - supervised learning: linear classification, regression
L3 and L4 Feed forward multiple layer networks and Backpropagation
L5 Recurrent Neural Networks (RNN) - sequence and temporal data (we are here now)
L6 Hebbian Learning and Associative Memory
L7 Hopfield Networks and Boltzmann Machines
L8 Convolutional Neural Networks (CNN)
L9 Deep Learning and recent developments - development of the ANN field

(The overview diagram also carries the theme labels reinforcement learning, unsupervised learning and perception.)
Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is a class of artificial neural networks
which are able to handle sequences in space and time. This allows RNNs to
exhibit temporal dynamic behaviors.

Unlike feedforward neural networks, RNNs have a memory (persistent state)
that affects forthcoming computations. However, it should be observed that
many RNNs implement memory indirectly by unfolding of the network into
time steps using hidden units (stateless fashion). Only some RNNs have explicit
cell states (stateful fashion).

RNNs can use their internal state (memory) to process temporal or sequential
structures and inputs and outputs of varying length. This makes them
applicable to tasks such as handwriting recognition or speech recognition.

An RNN performs the same task for every element of a sequence (that is
what recurrent stands for).

RNNs have cycles. An RNN both takes the output of the network from the
previous time step as input and uses the internal state from the previous time
step as a starting point for the current time step.

[Figure: Single Layer RNN]
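
As an illustration of one such recurrent step, here is a minimal sketch in Python/NumPy (the weight names Wxh, Whh, Why, the biases and the tanh activation are illustrative assumptions, not taken from the lecture):

    import numpy as np

    def rnn_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
        # One time step of a single-layer RNN: the new internal state mixes
        # the current input with the state carried over from the previous step.
        h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
        # The output at this time step is read off the new internal state.
        y_t = Why @ h_t + by
        return h_t, y_t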
Recurrent Neural Network (RNN)

ANN example:    Instance       mapped onto  Class
RNN example 1:  Image          mapped onto  Seq. of words
RNN example 2:  Seq. of words  mapped onto  Sentiment
RNN example 3:  Seq. of words  mapped onto  Seq. of words
RNN example 4:  Video frames   mapped onto  Classifications
Different forms of Recurrent Neural Network (RNN)
RNNs have received substantial interest in the ANN world and can also boast many success stories.
As a consequence, the area has many alternative lines of development that are not totally trivial to follow.

We will start by discussing what we call a Vanilla RNN, a single-layer RNN that can be unrolled and replaced with a
strictly feedforward acyclic neural network. A requirement for the Vanilla RNN is that time can be discretized,
which in turn requires that the duration and effect of single neuron activities are finite (a finite impulse recurrent
network). A network that lacks this property is called an infinite impulse recurrent network and cannot be
unrolled.

The two natural extensions of the Vanilla RNN are to allow dependencies not only backward but also forward in
time (Bi-directional RNN) and/or to stack vanilla RNNs (Multilayer RNN). Obviously the layers can be
individually unfolded as for the single layer case.

The Vanilla RNN does not solve problems like the vanishing gradient or exploding gradient, but rather makes them
even worse. The Vanilla RNN also has a problem in practice with handling very long sequences. The most well-known
attempt to handle these problems is Long Short-Term Memory (LSTM), which introduces a much more complex
machinery in each recurrent neuron, essentially gating mechanisms for stronger signal control.

Finally certain architectures that are termed RNN are not able to dynamically handle input sequences. However
they have cycles and internal memory. An example is an associative memory architecture like a Hopfield
network.
Unfolding of a Vanilla RNN into a Feedforward ANN without cycles
Unfolding (Unrolling) a Recurrent Neural Network
Transformation into a Feed Forward network
Consider the case where we have multiple time steps of input (X(t), X(t+1), …), multiple
time steps of internal state (u(t), u(t+1), …), and multiple time steps of outputs (y(t), y(t+1),
…).

We can unfold the above network schematic into a graph without any cycles. We can see that
the output (y(t)) and internal state (u(t)) from the previous time step are passed on to the
network as inputs for processing the next time step. Bias weights are omitted for clarity.

Key in this conceptualization is that the network (RNN) does not change between the
unfolded time steps. Specifically, the same weights are used for each time step and it is only
the outputs and the internal states that differ.

Further, each copy of the network may be thought of as an additional layer of the same feed
forward neural network. RNNs, once unfolded in time, can be seen as very deep feedforward
networks in which all the layers share the same weights.

By unfolding an RNN N times, very deep feedforward networks are generated, where a new
layer is created for each time step of an input sequence processed by the network.
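
As a hedged sketch of this unfolding in Python/NumPy (weight names and the tanh activation are assumptions carried over from the step sketch above), the same weights are reused at every time step; only the inputs and internal states change:

    import numpy as np

    def run_unrolled(xs, h0, Wxh, Whh, Why, bh, by):
        # Unfold the RNN over a whole input sequence: one "layer" per time step,
        # all layers sharing the same weight matrices.
        h = h0
        outputs = []
        for x_t in xs:
            h = np.tanh(Wxh @ x_t + Whh @ h + bh)   # internal state passed on to the next step
            outputs.append(Why @ h + by)            # output at this time step
        return outputs, h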
Unfolding (Unrolling) for three time steps

[Diagram: the network unrolled for three time steps, with labels for the hidden state at time step t, the output state at time step t, the activation function, and the input at time step t.]
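
The labels above correspond to the standard vanilla RNN update; written out as a hedged reconstruction (the exact symbols on the slide are not preserved in this text), with activation function f, input x_t, hidden state h_t and output y_t:

    h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
    y_t = W_{hy} h_t + b_y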
Consequence of Unfolding for the Learning Process

Backpropagation through time


As an unfolded RNN is a straightforward Feed Forward Network, it also inherits the
potential for learning through Backpropagation.

Importantly, the backpropagation of error for a given time step depends on the activation of the
network at the prior time step.

Error is propagated back to the first input time step of the sequence so that the error gradient can
be calculated and the weights of the network can be updated.

As with standard backpropagation, backpropagation through time consists of a repeated
application of the chain rule.

The subtlety is that, for recurrent networks, the loss function depends on the activation of the
hidden layer not only through its influence on the output layer, but also through its influence on
the hidden layer at the next time-step.
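
Written out as a hedged sketch (the notation is assumed, not taken from the lecture): with total loss L = \sum_t L_t and shared recurrent weights W_{hh}, each gradient contribution contains a product of factors linking later hidden states to earlier ones:

    \frac{\partial L}{\partial W_{hh}}
      = \sum_{t} \sum_{k \le t}
        \frac{\partial L_t}{\partial h_t}
        \left( \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right)
        \frac{\partial h_k}{\partial W_{hh}}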
Multi Layer or Stacked Recurrent
Neural Network (RNN)

It is a natural extension to stack vanilla RNNs (Multilayer RNN).

Obviously each layer in the stack can be individually unfolded


as for the single layer case.

The main reason for having many layers is that the layers correspond
to different levels of abstraction or aggregation for the sequential and
temporal data-items.

As an example, in word processing the first layer can model sequences
of characters, the second level sequences of words, the third level
sequences of sentences, etc.

The risk of increasing the number of layers is that it makes problems like
the vanishing gradient even worse.
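
A minimal sketch of such stacking in Python/NumPy (the layer count, parameter layout and tanh activation are illustrative assumptions): the hidden state of each layer at time t is the input of the layer above it at the same time step.

    import numpy as np

    def stacked_rnn_step(x_t, h_prev, params):
        # params: list of (Wx, Wh, b) per layer; h_prev: list of per-layer states.
        h_new = []
        layer_input = x_t
        for (Wx, Wh, b), h_l in zip(params, h_prev):
            h_l = np.tanh(Wx @ layer_input + Wh @ h_l + b)
            h_new.append(h_l)
            layer_input = h_l   # this layer's state becomes the input of the layer above
        return h_new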
Bidirectional Recurrent Neural Networks (BRNN)
By introducing what is termed Bidirectional RNN,
the output layer can get information from past
(backwards) and future (forward) states
simultaneously.

The principle of BRNN is to split the neurons of a
regular RNN into two directions, one for the positive
time direction (forward states), and another for the
negative time direction (backward states).

One can view this as a second step of unfolding.

Bidirectional Recurrent Neural Networks connect
two hidden layers of opposite directions to
the same output.

The outputs from states in one direction are not
connected to the inputs of states in the opposite
direction.

Obviously, bidirectional RNNs can also be stacked in
several levels.
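
A hedged sketch of this arrangement in Python/NumPy (the step functions and state sizes are illustrative assumptions): one pass runs forward over the sequence, a second pass runs backward, and the per-time-step states of the two directions are paired for the output layer without being connected to each other.

    import numpy as np

    def bidirectional_rnn(xs, h0_fwd, h0_bwd, step_fwd, step_bwd):
        # Forward states: positive time direction.
        hs_fwd, h = [], h0_fwd
        for x_t in xs:
            h = step_fwd(x_t, h)
            hs_fwd.append(h)
        # Backward states: negative time direction (sequence reversed).
        hs_bwd, h = [], h0_bwd
        for x_t in reversed(xs):
            h = step_bwd(x_t, h)
            hs_bwd.append(h)
        hs_bwd.reverse()
        # The output layer sees both directions; the two chains stay unconnected.
        return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]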
Challenges for RNN
The vanishing gradient problem
The vanishing gradient problem is a difficulty for networks trained with gradient-
based learning methods and backpropagation, both ordinary ANNs and unfolded RNNs. The
problem was first highlighted by Hochreiter in 1991.

In such methods, each of the neural network's weights receives an update
proportional to the partial derivative of the error function with respect to the
current weight in each iteration of training. As many activation functions have
gradients in the range (0, 1) and these are combined using the chain rule, the
final gradient can be vanishingly small, effectively preventing the weight
from changing its value.
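
As an informal sketch of why this happens in an unfolded RNN (the bound \gamma is an assumption for illustration): the chain rule multiplies one factor per time step, and if each factor has norm at most \gamma < 1 the product shrinks exponentially; factors consistently larger than one give the mirror-image exploding gradient.

    \left\| \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right\|
      \le \gamma^{\,t-k} \;\to\; 0
      \quad \text{as } t-k \to \infty, \text{ when } \gamma < 1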
The exploding gradient problem

Exploding gradients are a problem when large error gradients accumulate and
result in very large updates to neural network model weights during training. A
very large gradient causes an unstable network.
Very long sequences and temporal dependencies
Even if an RNN can theoretically handle very long sequences and temporal
dependencies, in practice the very deep networks generated are increasingly
harmed by performance problems.
Long Short Term Memory (LSTM)
Long Short-Term Memory (LSTM) networks are extensions of recurrent neural
networks which basically extend their memory function. The core of the approach is
to elaborate the interior of a vanilla RNN unit with the purpose of increasing control of
signal flows.
LSTM was explicitly designed to combat the vanishing gradient and long-term dependency
problems. LSTM was introduced by Hochreiter and Schmidhuber in 1997. The units
of an LSTM are used as building units for the layers of an RNN, which is then often
called an LSTM network.
LSTMs enable RNNs to remember their inputs over a long period of time. This is
because LSTMs contain their information in a memory that is much like the
memory of a computer, in that the LSTM can read, write and delete information
from its memory.
LSTMs are widely recognized academically as well as commercially. They are
extensively used for speech and language processing by companies like Google,
Apple, Microsoft and Amazon.

Difference between vanilla RNN and LSTM

[Diagram comparing the internal unit of a vanilla RNN with that of an LSTM]
Comments on the detailed functionality of an LSTM unit
The LSTM memory can be seen as a gated cell, where gated means that the
cell decides whether or not to store or delete information (typically by
opening the gates or not), based on the importance it assigns to the
information.

The assigning of importance happens through weights, which are also
learned by the algorithm. This simply means that it learns over time which
information is important and which is not.

Specific to LSTMs is the cell state manifested by the horizontal line running
through the top of the diagram (Ct-1…Ct , C for cell state).

In an LSTM you have three gates. These gates determine whether or not to
• let new input in (input gate)
• delete the information because it isn’t important (forget gate) or to
• let it impact the output at the current time step (output gate).
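
A hedged Python/NumPy sketch of one LSTM cell step following this gate description (the weight layout and the sigmoid/tanh choices follow the common textbook formulation, not the slide itself):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b: dictionaries of weight matrices and biases, keyed by gate.
        f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate: keep or delete old information
        i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate: let new input in
        o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate: expose the cell to the output
        g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate new cell content
        c_t = f * c_prev + i * g        # cell state update: C_{t-1} -> C_t (the horizontal line)
        h_t = o * np.tanh(c_t)          # output / hidden state at the current time step
        return h_t, c_t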
Structural aspects of LSTM
In principle, LSTMs can, as for the Vanilla RNN:
• be unfolded
• be stacked in a multilayer structure
• be arranged in a bi-directional fashion

If this is done in a stateless fashion (no cell state is used: Ct-1 = Ct), the resulting network is a
normal feed forward network, potentially trained with backpropagation (with minor modifications).

If the LSTM cells use explicit cell states (stateful fashion), there is no guarantee for the
above.
List of RNN systems/approaches
Vanilla (Full or Standard) RNN that can be unfolded
Multiple layer (stacked) vanilla RNNs
Bidirectional RNN - one layer or multiple layers

LSTM

-----------------------------------------------------------------------------------------------------------

Gated Recurrent Unit - a simpler version of LSTM


Elman networks and Jordan networks - early RNNs with simple structures
Hopfield networks - to be described in the context of Associative Memory
Echo state networks - an RNN with a very sparsely connected hidden layer
Independent RNN (IndRNN) - restricts connectivity to fight vanishing gradients

Neural history compressor, Neural Turing machines


Continuous-time RNN, Multiple timescales model
Recurrent multilayer perceptron network
Differentiable neural computer and Neural network pushdown automata
.........
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next lecture 6.6 will be on the topic:

Hebbian Learning and Associative Memory
