
Module 06: Recurrent Neural Networks

Contents: Introduction to Sequence Models and RNNs, RNN Model, Backpropagation through Time (BPTT), Different Types of RNNs: Unfolded RNNs, Seq2Seq RNNs, Long Short-Term Memory (LSTM), Bidirectional RNN, Vanishing Gradients with RNNs, Gated Recurrent Unit (GRU), RNN Applications
Introduction to Sequence Models: Sequence models are used for tasks where the input or output (or both) are ordered sequences. Examples include:

• Text data (e.g., sentences)

• Speech recognition

• Time series forecasting

• Music generation

Sequence models capture dependencies between elements of a sequence, which traditional feedforward neural networks cannot do.

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequences of data. They are especially well suited to sequence tasks such as time series data, speech, and natural language. An RNN works on the principle of saving the output (hidden state) of a layer at each step and feeding it back as an input at the next step, so that each prediction can depend on what came before.

Below is how a feedforward neural network can be converted into a recurrent neural network:

The nodes in the different layers of the feedforward network are compressed to form a single layer of the recurrent network. A, B, and C are the parameters of the network. The input layer X processes the initial input and passes it to the middle layer A. The middle layer consists of multiple hidden layers, each with its own activation functions, weights, and biases. These parameters are shared across the hidden layer, so instead of creating multiple separate hidden layers, the network creates one layer and loops over it at every time step.
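
A minimal sketch of this looping behaviour, assuming Python with NumPy (the array sizes, weight names, and random data here are illustrative, not taken from the module):

import numpy as np

# Illustrative sizes (assumptions for this sketch)
input_size, hidden_size, output_size = 4, 8, 3

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the loop)
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_step(x_t, h_prev):
    """One time step: the previous hidden state is fed back in with the new input."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# The same cell, with the same shared parameters, is applied at every time step
xs = rng.normal(size=(5, input_size))   # a dummy sequence of 5 time steps
h = np.zeros(hidden_size)
for x_t in xs:
    h, y = rnn_step(x_t, h)
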
Instead of using traditional backpropagation, recurrent neural networks use the backpropagation through time (BPTT) algorithm to determine the gradients. In ordinary backpropagation, the model adjusts its parameters by propagating errors from the output layer back to the input layer. BPTT additionally sums the error at each time step, because the RNN shares the same parameters across all time steps.
Backpropagation Through Time (BPTT)

BPTT is a version of backpropagation used to train RNNs. The idea is to unroll the network
over time and apply standard backpropagation.
Steps:

1. Forward pass through time

2. Compute the loss at each time step

3. Backward pass through time to compute gradients

4. Update the parameters

Challenges:

• Vanishing gradients: Small gradients vanish over long sequences


• Exploding gradients: Large gradients can make training unstable

Solutions:

• Gradient clipping (see the sketch after this list)

• LSTM and GRU architectures
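
A short sketch of gradient clipping inside a BPTT training loop, assuming PyTorch (the model sizes, dummy data, and hyperparameters are made up for illustration):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(16, 50, 8)   # (batch, time steps, features) -- dummy sequences
y = torch.randn(16, 1)       # dummy targets

for step in range(100):
    optimizer.zero_grad()
    out, h_n = rnn(x)                 # forward pass through time
    pred = head(out[:, -1, :])        # use the last hidden state for the prediction
    loss = loss_fn(pred, y)
    loss.backward()                   # backward pass through time (BPTT)
    # Clip the global gradient norm to keep exploding gradients in check
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()                  # parameter update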

Types of Recurrent Neural Networks

Feedforward networks have a single fixed-size input and output, while recurrent neural networks are flexible because the lengths of the inputs and outputs can vary. This flexibility allows RNNs to be used for tasks such as music generation, sentiment classification, and machine translation.

There are four types of RNNs, based on the different lengths of inputs and outputs.

• One-to-one is a simple neural network. It is commonly used for machine learning problems that have a single input and output.

• One-to-many has a single input and multiple outputs. This is used for generating image
captions.

• Many-to-one takes a sequence of multiple inputs and predicts a single output. It is popular in sentiment classification, where the input is text and the output is a category (see the sketch after this list).

• Many-to-many takes multiple inputs and outputs. The most common application is
machine translation.
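
For example, a many-to-one setup can be sketched as follows, assuming PyTorch (the vocabulary size, dimensions, and class count are illustrative): a whole token sequence goes in, a single sentiment class comes out.

import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Many-to-one: a sequence of token ids in, one class score vector out."""
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_size=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)       # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(embedded)            # h_n: (1, batch, hidden_size)
        return self.classifier(h_n[-1])        # one prediction per sequence

model = SentimentRNN()
dummy_batch = torch.randint(0, 10_000, (8, 20))   # 8 sequences of 20 tokens
logits = model(dummy_batch)                        # shape (8, 2)
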
Key Differences Between CNN and RNN

• CNNs are suited to spatially structured data such as images; RNNs are suited to time series and other sequential data.

• During training, a CNN uses standard backpropagation, while an RNN uses backpropagation through time to compute the gradients of the loss.

• An RNN places no restriction on the lengths of its inputs and outputs, whereas a CNN works with fixed-size (finite) inputs and outputs.

• A CNN is a feedforward network, while an RNN uses loops (recurrent connections) to handle sequential data.

• CNNs are also used for video and image processing; RNNs are primarily used for speech and text analysis.

RNN Advanced Architectures

1. Unfolded RNN

An unfolded (unrolled) RNN displays the time steps of the RNN as separate layers. This view helps in understanding dependencies across time and the training flow.

2. Sequence-to-Sequence (Seq2Seq) RNN

• Encoder-decoder architecture: an encoder RNN compresses the input sequence into a context vector, and a decoder RNN generates the output sequence from it (see the sketch after this list)

• Used for tasks like machine translation and summarization
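
A bare-bones encoder-decoder sketch, assuming PyTorch and using GRU layers for brevity (the sizes and the greedy decoding loop are illustrative; real Seq2Seq models add embeddings, attention, and teacher forcing):

import torch
import torch.nn as nn

hidden_size, in_dim, out_dim = 32, 8, 8

encoder = nn.GRU(in_dim, hidden_size, batch_first=True)
decoder = nn.GRU(out_dim, hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, out_dim)

src = torch.randn(4, 10, in_dim)        # source sequence: (batch, src_len, features)
_, context = encoder(src)               # final hidden state summarizes the input

# Decode step by step, feeding each output back in as the next input
dec_input = torch.zeros(4, 1, out_dim)  # stand-in for a "start" token
hidden = context
outputs = []
for _ in range(12):                     # target length chosen for illustration
    dec_out, hidden = decoder(dec_input, hidden)
    step = readout(dec_out)             # (batch, 1, out_dim)
    outputs.append(step)
    dec_input = step                    # feed the prediction back in
result = torch.cat(outputs, dim=1)      # (batch, 12, out_dim)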

3. Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) is an advanced type of RNN designed to prevent both vanishing (decaying) and exploding gradient problems. Like a standard RNN, an LSTM is built as a chain of repeating modules, but the structure of each module is different. In a standard RNN the repeating module has a very simple structure, such as a single tanh layer; in an LSTM the repeating module contains four interacting layers that communicate with each other. This four-layered structure helps the LSTM retain long-term memory, and it is used in many sequential problems, including machine translation, speech synthesis, speech recognition, and handwriting recognition.
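
In practice an LSTM layer is used as a drop-in replacement for a plain RNN layer; a sketch assuming PyTorch (sizes illustrative), showing the extra cell state alongside the hidden state:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
x = torch.randn(4, 50, 8)              # (batch, time steps, features)

output, (h_n, c_n) = lstm(x)
# output: hidden state at every time step -> (4, 50, 32)
# h_n:    final hidden (short-term) state -> (1, 4, 32)
# c_n:    final cell (long-term) state    -> (1, 4, 32)
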
4. Bidirectional RNN (BiRNN)

• Processes the sequence in both forward and backward directions

• Final output: concatenation of the forward and backward hidden states (see the sketch after this list)

• Better at capturing the full context of a sequence (e.g., in NLP)
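
A sketch of the forward/backward concatenation, assuming PyTorch (sizes illustrative): setting bidirectional=True doubles the feature dimension of the output because the two directions are concatenated.

import torch
import torch.nn as nn

birnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True, bidirectional=True)
x = torch.randn(4, 50, 8)              # (batch, time steps, features)

output, h_n = birnn(x)
# output: (4, 50, 64) -- forward and backward hidden states concatenated (2 * 32)
# h_n:    (2, 4, 32)  -- one final hidden state per direction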

5. Vanishing Gradients with RNNs

Problem:

• During backpropagation, the gradients become very small and eventually vanish

• As a result, earlier time steps fail to learn effectively

Cause:

• The chain rule of derivatives causes an exponential decrease of the gradients over many layers/time steps (illustrated numerically below, after the Solutions list)

Solution:

• Use LSTM or GRU

• Gradient clipping

• Use ReLU or other non-saturating activation functions
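
A tiny numerical illustration of the cause, assuming Python (the recurrent weight value and sequence length are arbitrary): repeatedly multiplying by the same factor during the backward pass shrinks the gradient exponentially.

w_recurrent = 0.5          # |w| < 1: gradients shrink; try 1.5 to see them explode instead
grad = 1.0
for t in range(1, 21):
    grad *= w_recurrent    # one factor of the chain rule per time step
    if t % 5 == 0:
        print(f"after {t:2d} steps: gradient contribution ~ {grad:.2e}")
# after 20 steps the contribution is ~ 9.5e-07, so the earliest time steps barely learn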

6. Gated Recurrent Unit (GRU)


The gated recurrent unit (GRU) is a variation of the LSTM; the two have similar designs and, in some cases, produce similar results. A GRU uses an update gate and a reset gate to address the vanishing gradient problem. These gates decide which information is important and pass it on to the output; they can be trained to keep information from long ago without it vanishing over time, and to discard irrelevant information.
Unlike the LSTM, the GRU does not have a separate cell state Ct. It only has a hidden state ht, and thanks to this simpler architecture a GRU typically trains faster than an LSTM. The GRU is easy to follow: it takes the input xt and the hidden state from the previous time step ht-1, and outputs the new hidden state ht.
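
A sketch of this difference in code, assuming PyTorch (sizes illustrative): the GRU returns only a hidden state, whereas the LSTM sketch above returned a (hidden, cell) pair.

import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
x = torch.randn(4, 50, 8)              # (batch, time steps, features)

output, h_n = gru(x)                   # no cell state c_n, unlike nn.LSTM
# output: (4, 50, 32) -- hidden state ht at every time step
# h_n:    (1, 4, 32)  -- final hidden state
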
Applications of RNNs

• NLP: text generation, translation, chatbots

• Speech: voice recognition, speech-to-text

• Healthcare: ECG analysis, patient monitoring

• Finance: stock prediction, anomaly detection

• Robotics: motion control, sequence prediction
