Recurrent Neural Networks

This document provides an overview of recurrent neural networks (RNNs). It discusses how RNNs can handle sequential data by using their internal state to process inputs of varying lengths. It also explains how RNNs differ from feedforward neural networks in having cycles and using previous outputs and internal states as inputs. The document then summarizes various types of RNNs including vanilla RNNs, bidirectional RNNs, multilayer RNNs, and long short-term memory (LSTM) networks, and discusses challenges like vanishing gradients that RNNs aim to address.

NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 6 Machine Learning based on Artificial Neural Networks

Video 6.5 Recurrent Neural Networks


Structure of Lectures in week 6

L1 Fundamentals of Neural Networks - McCulloch and Pitts
L2 Perceptrons - supervised learning: linear classification, regression
L3 and L4 Feed forward multiple layer networks and Backpropagation
L5 Recurrent Neural Networks (RNN) - sequence and temporal data (we are here now)
L6 Hebbian Learning and Associative Memory
L7 Hopfield Networks and Boltzmann Machines
L8 Convolutional Neural Networks (CNN)
L9 Deep Learning and recent developments - development of the ANN field

(The overview diagram also carries the theme labels reinforcement learning, unsupervised learning and perception.)
Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is a class of artificial neural networks
which are able to handle sequences in space and time. This allows RNNs to
exhibit temporal dynamic behaviors.

Unlike feedforward neural networks, RNNs have a memory (persistent state)
that affects forthcoming computations. However, it should be observed that
many RNNs implement memory indirectly by unfolding of the network into
time steps using hidden units (stateless fashion). Only some RNNs have explicit
cell states (stateful fashion).

RNNs can use their internal state (memory) to process temporal or sequential
structures and inputs and outputs of varying length. This makes them
applicable to tasks such as handwriting recognition or speech recognition.

An RNN performs the same task for every element of a sequence (that is
what recurrent stands for).

RNNs have cycles. An RNN both takes the output of the network from the
previous time step as input and uses the internal state from the previous time
step as a starting point for the current time step.

[Figure: Single Layer RNN]
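
As an illustration of one such recurrent step, here is a minimal sketch in Python/NumPy (the weight names Wxh, Whh, Why, the biases and the tanh activation are illustrative assumptions, not taken from the lecture):

    import numpy as np

    def rnn_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
        # One time step of a single-layer RNN: the new internal state mixes
        # the current input with the state carried over from the previous step.
        h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
        # The output at this time step is read off the new internal state.
        y_t = Why @ h_t + by
        return h_t, y_t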
Recurrent Neural Network (RNN)

ANN example:    Instance       mapped onto  Class
RNN example 1:  Image          mapped onto  Seq. of words
RNN example 2:  Seq. of words  mapped onto  Sentiment
RNN example 3:  Seq. of words  mapped onto  Seq. of words
RNN example 4:  Video frames   mapped onto  Classifications
Different forms of Recurrent Neural Network (RNN)
RNNs have received substantial interest in the ANN world and can also boast many success stories.
As a consequence, the area has many alternative lines of development that are not totally trivial to follow.

We will start by discussing what we call a Vanilla RNN, a single-layer RNN that can be unrolled and replaced with a
strictly feedforward acyclic neural network. A requirement for the Vanilla RNN is that time can be discretized,
which in turn requires that the duration and effect of single neuron activities are finite (a finite impulse recurrent
network). A network that lacks this property is called an infinite impulse recurrent network and cannot be
unrolled.

The two natural extensions of the Vanilla RNN are to allow dependencies not only backward but also forward in
time (Bi-directional RNN) and/or to stack vanilla RNNs (Multilayer RNN). Obviously the layers can be
individually unfolded as for the single layer case.

The Vanilla RNN does not solve problems like the vanishing gradient or exploding gradient, but rather makes them
even worse. The Vanilla RNN also has a problem in practice with handling very long sequences. The most well-known
attempt to handle these problems is Long Short-Term Memory (LSTM), which introduces a much more complex
machinery in each recurrent neuron, essentially gating mechanisms for stronger signal control.

Finally certain architectures that are termed RNN are not able to dynamically handle input sequences. However
they have cycles and internal memory. An example is an associative memory architecture like a Hopfield
network.
Unfolding of a Vanilla RNN into a Feedforward ANN without cycles
Unfolding (Unrolling) a Recurrent Neural Network
Transformation into a Feed Forward network
Consider the case where we have multiple time steps of input (X(t), X(t+1), …), multiple
time steps of internal state (u(t), u(t+1), …), and multiple time steps of outputs (y(t), y(t+1),
…).

We can unfold the above network schematic into a graph without any cycles. We can see that
the output (y(t)) and internal state (u(t)) from the previous time step are passed on to the
network as inputs for processing the next time step. Bias weights are omitted for clarity.

Key in this conceptualization is that the network (RNN) does not change between the
unfolded time steps. Specifically, the same weights are used for each time step and it is only
the outputs and the internal states that differ.

Further, each copy of the network may be thought of as an additional layer of the same feed
forward neural network. RNNs, once unfolded in time, can be seen as very deep feedforward
networks in which all the layers share the same weights.

By unfolding an RNN N times, very deep feedforward networks are generated, where a new
layer is created for each time step of an input sequence processed by the network.
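
As a hedged sketch of this unfolding in Python/NumPy (weight names and the tanh activation are assumptions carried over from the step sketch above), the same weights are reused at every time step; only the inputs and internal states change:

    import numpy as np

    def run_unrolled(xs, h0, Wxh, Whh, Why, bh, by):
        # Unfold the RNN over a whole input sequence: one "layer" per time step,
        # all layers sharing the same weight matrices.
        h = h0
        outputs = []
        for x_t in xs:
            h = np.tanh(Wxh @ x_t + Whh @ h + bh)   # internal state passed on to the next step
            outputs.append(Why @ h + by)            # output at this time step
        return outputs, h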
Unfolding (Unrolling) for three time steps

[Diagram: the network unrolled for three time steps, with labels for the hidden state at time step t, the output state at time step t, the activation function, and the input at time step t.]
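
The labels above correspond to the standard vanilla RNN update; written out as a hedged reconstruction (the exact symbols on the slide are not preserved in this text), with activation function f, input x_t, hidden state h_t and output y_t:

    h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
    y_t = W_{hy} h_t + b_y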
Consequence of Unfolding for the Learning Process

Backpropagation through time


As an unfolded RNN is a straightforward Feed Forward Network, it also inherits the
potential for learning through Backpropagation.

Importantly, the backpropagation of error for a given time step depends on the activation of the
network at the prior time step.

Error is propagated back to the first input time step of the sequence so that the error gradient can
be calculated and the weights of the network can be updated.

As with standard backpropagation, backpropagation through time consists of a repeated
application of the chain rule.

The subtlety is that, for recurrent networks, the loss function depends on the activation of the
hidden layer not only through its influence on the output layer, but also through its influence on
the hidden layer at the next time-step.
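
Written out as a hedged sketch (the notation is assumed, not taken from the lecture): with total loss L = \sum_t L_t and shared recurrent weights W_{hh}, each gradient contribution contains a product of factors linking later hidden states to earlier ones:

    \frac{\partial L}{\partial W_{hh}}
      = \sum_{t} \sum_{k \le t}
        \frac{\partial L_t}{\partial h_t}
        \left( \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right)
        \frac{\partial h_k}{\partial W_{hh}}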
Multi Layer or Stacked Recurrent
Neural Network (RNN)

It is a natural extension to stack vanilla RNNs (Multilayer RNN).

Obviously each layer in the stack can be individually unfolded


as for the single layer case.

The main reason for having many layers is that the layers correspond
to different levels of abstraction or aggregation for the sequential and
temporal data-items.

As an example, in word processing the first layer can model sequences
of characters, the second level sequences of words, the third level
sequences of sentences, etc.

The risk of increasing the number of layers is that it makes problems like
the vanishing gradient even worse.
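
A minimal sketch of such stacking in Python/NumPy (the layer count, parameter layout and tanh activation are illustrative assumptions): the hidden state of each layer at time t is the input of the layer above it at the same time step.

    import numpy as np

    def stacked_rnn_step(x_t, h_prev, params):
        # params: list of (Wx, Wh, b) per layer; h_prev: list of per-layer states.
        h_new = []
        layer_input = x_t
        for (Wx, Wh, b), h_l in zip(params, h_prev):
            h_l = np.tanh(Wx @ layer_input + Wh @ h_l + b)
            h_new.append(h_l)
            layer_input = h_l   # this layer's state becomes the input of the layer above
        return h_new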
Bidirectional Recurrent Neural Networks (BRNN)
By introducing what is termed Bidirectional RNN,
the output layer can get information from past
(backwards) and future (forward) states
simultaneously.

The principle of BRNN is to split the neurons of a
regular RNN into two directions, one for the positive
time direction (forward states), and another for the
negative time direction (backward states).

One can view this as a second step of unfolding.

Bidirectional Recurrent Neural Networks connect
two hidden layers of opposite directions to
the same output.

The outputs from states in one direction are not
connected to the inputs of states in the opposite
direction.

Obviously, bidirectional RNNs can also be stacked in
several levels.
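
A hedged sketch of this arrangement in Python/NumPy (the step functions and state sizes are illustrative assumptions): one pass runs forward over the sequence, a second pass runs backward, and the per-time-step states of the two directions are paired for the output layer without being connected to each other.

    import numpy as np

    def bidirectional_rnn(xs, h0_fwd, h0_bwd, step_fwd, step_bwd):
        # Forward states: positive time direction.
        hs_fwd, h = [], h0_fwd
        for x_t in xs:
            h = step_fwd(x_t, h)
            hs_fwd.append(h)
        # Backward states: negative time direction (sequence reversed).
        hs_bwd, h = [], h0_bwd
        for x_t in reversed(xs):
            h = step_bwd(x_t, h)
            hs_bwd.append(h)
        hs_bwd.reverse()
        # The output layer sees both directions; the two chains stay unconnected.
        return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]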
Challenges for RNN
The vanishing gradient problem
The vanishing gradient problem is a difficulty for networks trained with gradient-
based learning methods and backpropagation, both ordinary ANNs and unfolded RNNs. The
problem was first highlighted by Hochreiter in 1991.

In such methods, each of the neural network's weights receives an update
proportional to the partial derivative of the error function with respect to the
current weight in each iteration of training. As many activation functions have
gradients in the range (0, 1) and these are combined using the chain rule, the
final gradient can be vanishingly small, effectively preventing the weight
from changing its value.
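
As an informal sketch of why this happens in an unfolded RNN (the bound \gamma is an assumption for illustration): the chain rule multiplies one factor per time step, and if each factor has norm at most \gamma < 1 the product shrinks exponentially; factors consistently larger than one give the mirror-image exploding gradient.

    \left\| \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right\|
      \le \gamma^{\,t-k} \;\to\; 0
      \quad \text{as } t-k \to \infty, \text{ when } \gamma < 1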
The exploding gradient problem

Exploding gradients are a problem when large error gradients accumulate and
result in very large updates to neural network model weights during training. A
very large gradient causes an unstable network.
Very long sequences and temporal dependencies
Even if an RNN can theoretically handle very long sequences and temporal
dependencies, in practice the very deep networks generated are increasingly
harmed by performance problems.
Long Short Term Memory (LSTM)
Long Short-Term Memory (LSTM) networks are extensions of recurrent neural
networks which basically extend their memory function. The core of the approach is
to elaborate the interior of a vanilla RNN unit with the purpose of increasing control of
signal flows.
LSTM was explicitly designed to combat the vanishing gradient and long-term dependency
problems. LSTM was introduced by Hochreiter and Schmidhuber in 1997. The units
of an LSTM are used as building units for the layers of an RNN, which is then often
called an LSTM network.
LSTMs enable RNNs to remember their inputs over a long period of time. This is
because LSTMs contain their information in a memory that is much like the
memory of a computer, in that the LSTM can read, write and delete information
from its memory.
LSTMs are widely recognized academically as well as commercially. They are
extensively used for speech and language processing by companies like Google,
Apple, Microsoft and Amazon.

Difference between vanilla RNN and LSTM

[Diagram comparing the internal unit of a vanilla RNN with that of an LSTM]
Comments on the detailed functionality of an LSTM unit
The LSTM memory can be seen as a gated cell, where gated means that the
cell decides whether or not to store or delete information (typically by
opening the gates or not), based on the importance it assigns to the
information.

The assigning of importance happens through weights, which are also
learned by the algorithm. This simply means that it learns over time which
information is important and which is not.

Specific to LSTMs is the cell state manifested by the horizontal line running
through the top of the diagram (Ct-1…Ct , C for cell state).

In an LSTM you have three gates. These gates determine whether or not to
• let new input in (input gate)
• delete the information because it isn’t important (forget gate) or to
• let it impact the output at the current time step (output gate).
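
A hedged Python/NumPy sketch of one LSTM cell step following this gate description (the weight layout and the sigmoid/tanh choices follow the common textbook formulation, not the slide itself):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b: dictionaries of weight matrices and biases, keyed by gate.
        f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate: keep or delete old information
        i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate: let new input in
        o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate: expose the cell to the output
        g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate new cell content
        c_t = f * c_prev + i * g        # cell state update: C_{t-1} -> C_t (the horizontal line)
        h_t = o * np.tanh(c_t)          # output / hidden state at the current time step
        return h_t, c_t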
Structural aspects of LSTM
In principle, LSTMs can, as for the Vanilla RNN:
• be unfolded
• be stacked in a multilayer structure
• be arranged in a bi-directional fashion

If this is done in a stateless fashion (no cell state is used: Ct-1 = Ct), the resulting network is a
normal feed forward network, potentially trained with backpropagation (with minor modifications).

If the LSTM cells use explicit cell states (stateful fashion), there is no guarantee for the
above.
List of RNN systems/approaches
Vanilla (Full or Standard) RNN that can be unfolded
Multiple layer (stacked) vanilla RNNs
Bidirectional RNN - one layer or multiple layers

LSTM

-----------------------------------------------------------------------------------------------------------

Gated Recurrent Unit - a simpler version of LSTM


Elman networks and Jordan networks - early RNNs with simple structures
Hopfield networks - to be described in the context of Associative Memory
Echo state networks - an RNN with a very sparsely connected hidden layer
Independent RNN (IndRNN) - restricts connectivity to fight vanishing gradients

Neural history compressor, Neural Turing machines


Continuous-time RNN, Multiple timescales model
Recurrent multilayer perceptron network
Differentiable neural computer and Neural network pushdown automata
.........
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next lecture 6.6 will be on the topic:

Hebbian Learning and Associative Memory
