Module 06
Common applications of RNNs include:
• Speech recognition
• Music generation
Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural
Network:
The nodes in the different layers of the feedforward network are compressed to form a single layer of the recurrent neural network. A, B, and C are the parameters of the network. The input layer X processes the initial input and passes it to the middle layer A. The middle layer consists of multiple hidden layers, each with its own activation functions, weights, and biases. Because these parameters are shared across time steps, the network does not create a separate hidden layer for every step; it creates one layer and loops over it.
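To make this concrete, here is a minimal sketch of a vanilla RNN in NumPy. The sizes and the randomly initialized weights are illustrative assumptions; the point is that the same input-to-hidden, hidden-to-hidden, and hidden-to-output weights are reused at every time step, i.e. one hidden layer looped over time.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 8, 16, 4

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrent)
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_forward(xs):
    """xs: list of input vectors, one per time step."""
    h = np.zeros(hidden_size)          # initial hidden state
    outputs = []
    for x in xs:                       # the single recurrent layer, unrolled over time
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return outputs, h

seq = [rng.normal(size=input_size) for _ in range(5)]
ys, h_final = rnn_forward(seq)
print(len(ys), ys[0].shape)            # 5 outputs, each of shape (4,)
```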
Instead of using traditional backpropagation, recurrent neural networks use the backpropagation through time (BPTT) algorithm to determine the gradients. In backpropagation, the model adjusts its parameters by propagating errors from the output layer back to the input layer. BPTT sums the error at each time step, because the RNN shares the same parameters across every time step.
Backpropagation Through Time (BPTT)
BPTT is a version of backpropagation used to train RNNs. The idea is to unroll the network
over time and apply standard backpropagation.
Steps (sketched in the code example below):
1. Unroll the network across all time steps
2. Run the forward pass over the whole sequence and sum the loss at each step
3. Backpropagate the error from the last time step back to the first
4. Update parameters
Challenges:
• Vanishing and exploding gradients over long sequences
Solutions:
• Gradient clipping
• Gated architectures such as LSTM
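The following toy sketch shows these steps with PyTorch, whose autograd performs the unroll-and-backpropagate part automatically when the loss summed over all time steps is differentiated. The layer sizes, dummy data, and the `head` output layer are illustrative assumptions, not part of the original notes.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 4)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(2, 10, 8)              # (batch, time steps, features) - dummy data
targets = torch.randn(2, 10, 4)

# 1. Unroll the network over the 10 time steps (forward pass)
outputs, _ = rnn(x)
preds = head(outputs)

# 2-3. Sum the error over every time step, then backpropagate through time
loss = nn.functional.mse_loss(preds, targets)
loss.backward()

# Mitigate exploding gradients: clip the gradient norm before the update
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)

# 4. Update the shared parameters
optimizer.step()
optimizer.zero_grad()
```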
Feedforward networks have a single input and a single output, while recurrent neural networks are flexible because the lengths of the inputs and outputs can vary. This flexibility allows RNNs to be used for music generation, sentiment classification, and machine translation.
There are four types of RNN based on the different lengths of inputs and outputs (a code sketch of these configurations follows the list).
• One-to-one has a single input and a single output, like a standard feedforward network.
• One-to-many has a single input and multiple outputs. This is used for generating image captions.
• Many-to-one takes multiple inputs and produces a single output, as in sentiment classification.
• Many-to-many takes multiple inputs and multiple outputs. The most common application is machine translation.
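The sketch below illustrates how the same recurrent layer supports different input/output lengths. The sizes and the `sentiment_head` and `tag_head` layers are hypothetical, chosen only to show the shapes.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 12, 8)                       # one sequence of 12 input steps

# Many-to-one (e.g. sentiment classification): read the whole sequence,
# keep only the final hidden state and map it to a single label.
sentiment_head = nn.Linear(16, 2)
_, h_final = rnn(x)
sentiment_logits = sentiment_head(h_final[-1])  # one output for the whole sequence

# Many-to-many (e.g. tagging or translation with aligned lengths):
# emit an output at every time step.
tag_head = nn.Linear(16, 5)
outputs, _ = rnn(x)
tags = tag_head(outputs)                        # 12 outputs, one per input step

# One-to-many (e.g. image captioning) would start from a single input vector
# and feed each generated step back in as the next input.
print(sentiment_logits.shape, tags.shape)       # torch.Size([1, 2]) torch.Size([1, 12, 5])
```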
Key Differences Between CNN and RNN
• CNNs are suited to spatial data such as images, while RNNs are suited to time series and sequential data.
• During training, CNNs use standard backpropagation, whereas RNNs use backpropagation through time to compute the gradients.
• RNNs place no restriction on the length of inputs and outputs, but CNNs work with fixed-size inputs and outputs.
• CNNs are feedforward networks, while RNNs use recurrent loops to handle sequential data.
• CNNs are mainly used for image and video processing, while RNNs are primarily used for speech and text analysis.
1. Unfolded RNN
An unfolded RNN displays the time steps of the RNN as separate layers, which helps in understanding temporal dependencies and the flow of training.
2. Encoder-Decoder Architecture
An encoder RNN compresses the input sequence into a context vector, and a decoder RNN generates the output sequence from that vector. This architecture is widely used in machine translation.
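A minimal encoder-decoder sketch follows; it is an assumed toy setup, not a full translation model. The encoder summarizes the source sequence into its final hidden state, and the decoder is initialized from that state to produce the target sequence. The sizes, the `out_head` layer, and the vocabulary size of 20 are hypothetical.

```python
import torch
import torch.nn as nn

encoder = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
decoder = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
out_head = nn.Linear(16, 20)            # hypothetical target vocabulary of size 20

src = torch.randn(1, 7, 8)              # source sequence: 7 steps
tgt_in = torch.randn(1, 9, 8)           # decoder inputs: 9 steps (teacher forcing)

_, context = encoder(src)               # context vector = final encoder hidden state
dec_out, _ = decoder(tgt_in, context)   # decoder starts from the encoder's context
logits = out_head(dec_out)              # one prediction per target step
print(logits.shape)                     # torch.Size([1, 9, 20])
```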
The Long Short-Term Memory (LSTM) network is an advanced type of RNN designed to prevent both vanishing (decaying) and exploding gradient problems. Like an RNN, an LSTM is built as a chain of repeating modules, but the structure of each module is different. In a standard RNN, the repeating module has a very simple structure, such as a single tanh layer. In an LSTM, the repeating module instead contains four interacting layers that communicate with each other. This four-layered structure helps the LSTM retain long-term memory, and it is used in several sequential problems including machine translation, speech synthesis, speech recognition, and handwriting recognition.
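The sketch below shows one LSTM cell step under the standard gate equations. The four interacting layers are the forget gate, input gate, candidate (tanh) layer, and output gate; the weight shapes and random data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step. W maps the concatenated [h_prev, x] to all four gates."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)              # forget gate: what to erase from the cell state
    i = sigmoid(i)              # input gate: what new information to write
    g = np.tanh(g)              # candidate values to write
    o = sigmoid(o)              # output gate: what part of the cell to expose
    c = f * c_prev + i * g      # cell state (long-term memory)
    h = o * np.tanh(c)          # hidden state (short-term output)
    return h, c

hidden, inputs = 16, 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [rng.normal(size=inputs) for _ in range(5)]:
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)         # (16,) (16,)
```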
3. Bidirectional RNN (BiRNN)
A bidirectional RNN processes the sequence in both the forward and backward directions, so the output at each time step can use both past and future context.
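A brief sketch, assuming illustrative sizes: setting `bidirectional=True` in PyTorch runs one RNN forward and one backward over the sequence and concatenates their hidden states at each step.

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(1, 10, 8)       # one sequence of 10 steps
outputs, h_n = birnn(x)
print(outputs.shape)            # torch.Size([1, 10, 32]) - forward + backward states per step
print(h_n.shape)                # torch.Size([2, 1, 16]) - final state of each direction
```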
Problem: Exploding gradients during training.
Cause: Gradients are multiplied through many time steps, so values greater than 1 grow exponentially as the error is propagated back.
Solution:
• Gradient clipping (sketched below)
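A minimal sketch of norm-based gradient clipping, assuming the common rule: if the total gradient norm exceeds a threshold, rescale every gradient so the norm equals the threshold. The helper name and the example values are hypothetical.

```python
import numpy as np

def clip_gradients(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their total L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: an "exploded" gradient of norm 50 is scaled back to norm 1.0
grads = [np.array([30.0, 40.0])]
clipped = clip_gradients(grads, max_norm=1.0)
print(np.linalg.norm(clipped[0]))   # 1.0
```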
Domain Applications