
Unit 5

Recurrent Neural Networks: Introduction – Recursive Neural Networks – Bidirectional RNNs – Deep Recurrent Networks – Applications: Image Generation, Image Compression, Natural Language Processing. Complete Autoencoder, Regularized Autoencoder, Stochastic Encoders and Decoders, Contractive Encoders.

Recurrent Neural Networks: Introduction

• Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to effectively deal with sequential data, where the order of elements matters.
• Unlike feedforward neural networks, where the flow of data is strictly
forward, RNNs have connections that form directed cycles, allowing them
to exhibit dynamic temporal behavior.
• This makes RNNs particularly suitable for tasks such as time series
prediction, natural language processing (NLP), speech recognition, and
more.
• If we have data in a sequence such that one data point depends upon the previous data point, the neural network must be modified to incorporate these dependencies, which is exactly what RNNs do.
• RNNs have the concept of “memory” that helps them store the states or
information of previous inputs to generate the next output of the
sequence.

A simple RNN has a feedback loop, as shown in the first diagram of the figure above.
The feedback loop shown in the gray rectangle can be unrolled for three time steps to produce the second network of the figure. Of course, the architecture can be varied so that the network unrolls for k time steps. Hence, in the feedforward pass of an RNN, the network computes the values of the hidden units and the output after k time steps. The weights associated with the network are shared temporally, i.e., the same weights are applied at every time step.
Each recurrent layer has two sets of weights:
• One for the input
• One for the hidden unit
The last feedforward layer, which computes the final output for the k-th time step, is just like an ordinary layer of a traditional feedforward network.
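The unrolled computation can be written in a few lines of code. The sketch below (NumPy; the dimensions and the tanh activation are illustrative choices, not taken from the text) applies the same input and hidden weight matrices at every time step and produces the output after k steps:

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b_h, W_y, b_y):
    """Unrolled forward pass of a simple RNN.

    xs: list of k input vectors, one per time step.
    W_x, W_h, b_h: shared weights/bias of the recurrent layer (same at every step).
    W_y, b_y: weights of the final feedforward (output) layer.
    """
    h = np.zeros(W_h.shape[0])          # initial hidden state
    for x in xs:                        # unroll over k time steps
        h = np.tanh(W_x @ x + W_h @ h + b_h)
    return W_y @ h + b_y                # output for the k-th time step

# Example: k = 3 time steps, 4-dimensional inputs, 5 hidden units, 2 outputs
rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(3)]
W_x, W_h, b_h = rng.normal(size=(5, 4)), rng.normal(size=(5, 5)), np.zeros(5)
W_y, b_y = rng.normal(size=(2, 5)), np.zeros(2)
print(rnn_forward(xs, W_x, W_h, b_h, W_y, b_y))
```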

Why Recurrent Neural Networks?

Recurrent Neural Networks have unique capabilities compared to other kinds of neural networks, which opens up a wide range of possibilities for their users while also bringing some challenges with them. Here is a rundown of the main benefits:
• RNNs have memory: they retain information about previous inputs while processing the current one.
• They can map flexible input/output configurations. Unlike other algorithms that deliver one output for one input, an RNN can map many-to-many, one-to-many, and many-to-one relationships between inputs and outputs.
Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
• One to One
This type of neural network is also known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.

• One to Many
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

• Many to One
This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is an example of this sort of network, where a given sentence can be classified as expressing positive or negative sentiment.
• Many to Many
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one of the examples.

Two Issues of Standard RNNs


1. Vanishing Gradient Problem
• Recurrent Neural Networks enable you to model time-dependent and sequential data problems, like stock market prediction, machine translation, and text generation. You will find, however, that RNNs are hard to train because of the gradient problem.
• RNNs suffer from the problem of vanishing gradients. The gradients carry information used in the RNN parameter updates, and when the gradient becomes too small, the parameter updates become insignificant. This makes the training of long data sequences difficult.

2. Exploding Gradient Problem

• While training a neural network, if the gradient tends to grow exponentially rather than decaying, this is called an exploding gradient.

• This problem arises when large error gradients accumulate, leading to very large updates to the neural network model weights during the training process.
Now, let's discuss the most popular and efficient way to cope with gradient problems, i.e., Long Short-Term Memory networks (LSTMs).

First, let’s understand Long-Term Dependencies.

Suppose you wish to predict the last word in the text: “The clouds are in the ______.”
The most obvious answer is “sky.” We do not need any further context to predict the last word in the above sentence.
Now consider this sentence: “I have been staying in Spain for the last 10 years… I can speak fluent ______.”
The word you predict will depend on the previous few words in context. Here, you need the context of Spain to predict the last word, and the most fitting answer to this sentence is “Spanish.” The gap between the relevant information and the point where it is needed can become very large. LSTMs help solve this problem.
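As a rough illustration of how an LSTM is used for this kind of next-word prediction, here is a minimal sketch (PyTorch; the vocabulary size, embedding size, and hidden size are arbitrary placeholder choices, not values from the text) that encodes a token sequence and predicts a distribution over the next word:

```python
import torch
import torch.nn as nn

class NextWordLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len) of word ids
        h, _ = self.lstm(self.embed(tokens))   # h: (batch, seq_len, hidden_dim)
        return self.out(h[:, -1, :])           # logits over the next word

model = NextWordLSTM()
context = torch.randint(0, 1000, (1, 12))      # e.g. "I have been staying in Spain ..."
next_word_logits = model(context)              # shape: (1, vocab_size)
print(next_word_logits.argmax(dim=-1))         # index of the most likely next word
```

The LSTM's gated cell state is what lets it carry the "Spain" context across the intervening words; an untrained model like this only shows the mechanics, not the prediction quality.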
*****************************************************************

Recursive Neural Networks (ReNNs)


Recursive Neural Networks (ReNNs) are a type of neural network
architecture designed to process structured data, such as hierarchical
data structures or recursive structures. Unlike traditional feedforward or
recurrent neural networks, which operate on fixed-sized input vectors or
sequences, ReNNs operate on tree-like or graph-like structures, allowing
them to model relationships between elements in a hierarchical manner.

Due to their deep tree-like structure, Recursive Neural Networks


can handle hierarchical data. The tree structure means combining child
nodes and producing parent nodes. Each child-parent bond has a weight
matrix, and similar children have the same weights. The number of
children for every node in the tree is fixed to enable it to perform
recursive operations and use the same weights. RvNNs are used when
there's a need to parse an entire sentence.
Difference between Recurrent neural network and recursive
neural networks

Aspect: Architecture
  RNNs: sequential architecture; nodes are connected to previous time steps.
  ReNNs: hierarchical or recursive architecture; nodes are connected in a tree-like or graph-like structure.
Aspect: Data Structure
  RNNs: operate on sequential data where order matters.
  ReNNs: handle structured data with hierarchical or recursive relationships.
Aspect: Training
  RNNs: typically trained using backpropagation through time (BPTT).
  ReNNs: may involve specialized algorithms for handling the recursive structure (e.g., backpropagation through structure, BPTS).
Aspect: Applications
  RNNs: language modeling, machine translation, sentiment analysis, time series prediction.
  ReNNs: parsing syntactic or semantic structures in NLP, analyzing hierarchical structures in images or videos, processing hierarchical data in bioinformatics.

A Recursive Neural Network is more like a hierarchical network where there is really no time aspect to the input sequence, but the input has to be processed hierarchically in a tree fashion. For example, such a network can learn the parse tree of a sentence by recursively taking the output of the operation performed on smaller chunks of the text.
• The children of each parent node are simply nodes of the same kind as that node. RvNNs comprise a class of architectures that can work with structured input. The network looks at a series of inputs x1, x2, … and produces a result for each of them.
• This means that the output depends on the number of neurons in each layer of the network and the number of connections between them. The simplest form of an RvNN, like a vanilla RNN, resembles a regular neural network. Each layer contains a loop that allows the model to pass on the results of previous neurons from another layer.
• Schematically, a recurrent layer uses a loop to iterate through a timestep sequence while maintaining an internal state that encodes all the information about the timesteps it has seen so far.

Features of Recursive Neural Networks


• A recursive neural network is created in such a way that it applies the same set of weights recursively over a graph-like (tree) structure.
• The nodes are traversed in topological order.
• This type of network is trained by the reverse mode of automatic differentiation.
• A special case of the recursive neural network is used in natural language processing: the recursive neural tensor network, which includes various composition functional nodes in the tree.
Challenges:
While Recursive Neural Networks offer advantages for modelling
structured data, they also come with challenges:
• Computational Complexity: Processing recursive structures can be
computationally expensive, especially for deep trees or graphs with many
nodes.
• Data Representation: Representing complex structures in a fixed-
dimensional vector space can be challenging, especially for structures
with varying sizes or irregularities.
• Training Difficulty: Training ReNNs may require specialized algorithms
and techniques to handle the recursive nature of the network and
mitigate issues such as vanishing gradients.
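To make the idea of shared child-to-parent weights concrete, here is a minimal sketch (NumPy; a binary parse tree is assumed and the vector dimensionality is an arbitrary choice) that recursively combines two child vectors into a parent vector using the same weight matrix at every node:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # dimensionality of every node vector
W = rng.normal(scale=0.1, size=(d, 2 * d))   # composition weights shared by all nodes
b = np.zeros(d)

def compose(left, right):
    """Combine two child vectors into a parent vector (same W, b at every node)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Leaf vectors for the words of "the cat sat" (here just random embeddings)
the, cat, sat = (rng.normal(size=d) for _ in range(3))

# Parse tree ((the cat) sat): combine bottom-up in topological order
noun_phrase = compose(the, cat)
sentence = compose(noun_phrase, sat)
print(sentence.shape)                        # (8,) -- a vector for the whole sentence
```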

Bidirectional Recurrent Neural Networks (Bi-RNNs)

Bidirectional Recurrent Neural Networks (Bi-RNNs) are an extension of


traditional Recurrent Neural Networks (RNNs) that can capture both past and
future information at each time step. In standard RNNs, the prediction at a given
time step depends only on the past history of the sequence. However, in many
applications, it's beneficial to consider both past and future context to make
better predictions.
The architecture of a Bidirectional RNN involves two separate recurrent layers:

1. One processing the input sequence in the forward direction


2. Another processing the sequence in the backward direction.

Each layer computes hidden states at each time step, considering


information from both past and future context. The final output at each time
step is typically a concatenation of the forward and backward hidden states.
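A minimal sketch of this two-direction architecture, using PyTorch's built-in bidirectional GRU (the sizes are illustrative), shows that the output at each time step is the concatenation of the forward and backward hidden states:

```python
import torch
import torch.nn as nn

hidden_dim = 16
birnn = nn.GRU(input_size=8, hidden_size=hidden_dim,
               batch_first=True, bidirectional=True)

x = torch.randn(1, 10, 8)          # one sequence of 10 steps, 8 features each
outputs, _ = birnn(x)

# At every time step the forward and backward hidden states are concatenated,
# so the feature dimension is 2 * hidden_dim.
print(outputs.shape)               # torch.Size([1, 10, 32])
```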

Working of Bidirectional Recurrent Neural Network

Inputting a sequence:
A sequence of data points, each represented as a vector with the same dimensionality, is fed into the BRNN. The sequences might have different lengths.
Dual Processing:

The data is processed in both the forward and backward directions. In the forward direction, the hidden state at time step t is computed from the input at step t and the hidden state at step t-1. In the backward direction, the hidden state at step t is computed from the input at step t and the hidden state at step t+1.

Computing the hidden state:

A non-linear activation function on the weighted sum of the input and previous
hidden state is used to calculate the hidden state at each step. This creates a
memory mechanism that enables the network to remember data from earlier
steps in the process.

Determining the output:

A non-linear activation function is used to determine the output at each step


from the weighted sum of the hidden state and a number of output weights. This
output has two options: it can be the final output or input for another layer in
the network.

Training:

The network is trained through a supervised learning approach where the goal
is to minimize the discrepancy between the predicted output and the actual
output. The network adjusts its weights in the input-to-hidden and hidden-to-
output connections during training through backpropagation.
To calculate the hidden state and the output of an RNN unit, we use the following formulas:

h_t = A(W_x · x_t + W_h · h_(t-1) + b)
y_t = A(W_y · h_t + b_y)

where,
A = activation function, W = weight matrix, b = bias

The training of a BRNN is similar to the backpropagation through time (BPTT) algorithm. The BPTT algorithm works as follows:

• Unroll the network and calculate errors at each time step.

• Update the weights and roll up the network.

However, because forward and backward passes in a BRNN occur simultaneously, updating the weights for the two processes could happen at the same time, which produces inaccurate outcomes. Thus, the forward and backward passes are handled individually when training a BRNN.

Advantages of Bidirectional RNN

• Context from both past and future:


With the ability to process sequential input both forward and backward,
BRNNs provide a thorough grasp of the full context of a sequence.
Because of this, BRNNs are effective at tasks like sentiment analysis and
speech recognition.

• Enhanced accuracy:
BRNNs frequently yield more precise answers since they take both
historical and upcoming data into account.

• Efficient handling of variable-length sequences:


When compared to conventional RNNs, which require padding to have a
constant length, BRNNs are better equipped to handle variable-length
sequences.

• Resilience to noise and irrelevant information:


BRNNs may be resistant to noise and irrelevant data that are present in
the data. This is so because both the forward and backward paths offer
useful information that supports the predictions made by the network.

• Ability to handle sequential dependencies:


BRNNs can capture long-term links between sequence pieces, making
them extremely adept at handling complicated sequential dependencies.

Applications of Bidirectional Recurrent Neural Network

Bi-RNNs have been applied to various natural language processing (NLP) tasks,
including:

• Sentiment Analysis:
By taking into account both the prior and subsequent context, BRNNs can
be utilized to categorize the sentiment of a particular sentence.

• Named Entity Recognition:


By considering the context both before and after a mention, BRNNs can be utilized to identify the named entities in a sentence.

• Part-of-Speech Tagging:
The classification of words in a phrase into their corresponding parts of
speech, such as nouns, verbs, adjectives, etc., can be done using BRNNs.
• Machine Translation:
BRNNs can be used in encoder-decoder models for machine translation,
where the decoder creates the target sentence and the encoder analyses
the source sentence in both directions to capture its context.

• Speech Recognition:
When the input voice signal is processed in both directions to capture the
contextual information, BRNNs can be used in automatic speech
recognition systems.

Disadvantages of Bidirectional RNN

• Computational complexity:
Given that they analyze data both forward and backward, BRNNs can be
computationally expensive due to the increased amount of calculations
needed.

• Long training time:


BRNNs can also take a while to train because there are many parameters
to optimize, especially when using huge datasets.

• Difficulty in parallelization:
Due to the requirement for sequential processing in both the forward and
backward directions, BRNNs can be challenging to parallelize.

• Overfitting:
BRNNs are prone to overfitting since they include many parameters that
might result in too complicated models, especially when trained on short
datasets.

• Interpretability:
Due to the processing of data in both forward and backward directions,
BRNNs can be tricky to interpret since it can be difficult to comprehend
what the model is doing and how it is producing predictions.

Deep recurrent networks (DRNs)

• Deep recurrent networks (DRNs) are a class of neural networks that


combine the concepts of deep learning and recurrent neural networks
(RNNs).

• RNNs are a type of neural network designed to work with sequential data,
where the output of each step is dependent on the previous steps.
• This makes them particularly suitable for tasks like natural language
processing (NLP), time series prediction, and speech recognition.

• Deep recurrent networks extend the capabilities of traditional RNNs by


stacking multiple layers of recurrent units, allowing for the creation of
deeper architectures.

• Each layer in a DRN passes its output as input to the next layer, enabling
the network to learn hierarchical representations of sequential data.

• Deep recurrent networks have been successfully applied to various tasks,


including sequence prediction, language modeling, machine translation,
and speech recognition.

• They have demonstrated superior performance compared to shallow


recurrent networks in many cases, especially when dealing with complex
sequential data with long-range dependencies.

There are several types of recurrent units that can be used in deep recurrent
networks, such as:

• Vanilla RNNs:
These are the simplest form of recurrent units, where the output is
computed based on the current input and the previous hidden state.
• Long Short-Term Memory (LSTM):
LSTMs are a type of recurrent unit that introduces gating mechanisms to
control the flow of information within the network, allowing it to learn
long-range dependencies more effectively and mitigate the vanishing
gradient problem.
• Gated Recurrent Units (GRUs):
GRUs are like LSTMs but have a simpler structure with fewer parameters,
making them computationally more efficient.
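The sketch below (PyTorch, with arbitrary sizes) shows how such recurrent units are stacked into a deep recurrent network simply by feeding each layer's output sequence to the next layer; here the stacking is done with the num_layers argument of nn.LSTM:

```python
import torch
import torch.nn as nn

# Three stacked LSTM layers: each layer passes its output sequence to the next,
# so the network can learn increasingly abstract representations of the sequence.
deep_rnn = nn.LSTM(input_size=32, hidden_size=64,
                   num_layers=3, batch_first=True)

x = torch.randn(4, 20, 32)             # batch of 4 sequences, 20 steps, 32 features
outputs, (h_n, c_n) = deep_rnn(x)

print(outputs.shape)                   # (4, 20, 64): top-layer output at every step
print(h_n.shape)                       # (3, 4, 64): final hidden state of each layer
```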
Steps to develop a deep RNN application
Developing an end-to-end deep RNN application involves several steps,
including data preparation, model architecture design, training the model, and
deploying it. Here is an example of an end-to-end deep RNN application for
sentiment analysis.
Data preparation:
The first step is to gather and preprocess the data. In this case, we’ll need a
dataset of text reviews labelled with positive or negative sentiment. The text
data needs to be cleaned, tokenized, and converted to a numerical format.
This can be done using libraries like NLTK or spaCy in Python.

Model architecture design:


The next step is to design the deep RNN architecture. We’ll need to decide on
the number of layers, number of hidden units, and type of recurrent unit (e.g.
LSTM or GRU). We’ll also need to decide how to handle the input and output
sequences, such as using padding or truncation.

Training the model:


Once the architecture is designed, we’ll need to train the model using the
preprocessed data. We’ll split the data into training and validation sets and train
the model using an optimization algorithm like stochastic gradient descent.
We’ll also need to set hyperparameters like learning rate and batch size.

Evaluating the model:


After training, we’ll evaluate the model’s performance on a separate test set.
We’ll use metrics like accuracy, precision, recall, and F1 score to assess the
model’s performance.

Deploying the model:


Finally, we’ll deploy the trained model to a production environment, where it
can be used to classify sentiment in real-time. This could involve integrating the
model into a web application or API.
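As a compact illustration of the pipeline above, here is a sketch of a deep-RNN sentiment classifier and one training step (PyTorch; the vocabulary size, layer sizes, and dummy data are placeholders, and a real application would add tokenization with NLTK/spaCy, padding or truncation, and a train/validation split as described above):

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim,
                           num_layers=num_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)       # positive vs. negative

    def forward(self, tokens):                           # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.classifier(h[:, -1, :]).squeeze(-1)  # one logit per review

model = SentimentRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# Dummy batch: 8 tokenized, padded reviews of length 40 with 0/1 sentiment labels
reviews = torch.randint(1, 5000, (8, 40))
labels = torch.randint(0, 2, (8,)).float()

optimizer.zero_grad()
loss = loss_fn(model(reviews), labels)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```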

Processing Diagram of Deep Recurrent Networks


Input Sequence → Embedding Layer → Recurrent Layers (multiple layers stacked together) → Output Layer (e.g., Softmax) → Output (Predictions)

This block diagram provides a high-level overview of the architecture of a


deep recurrent network.
• Input Sequence:
This is the sequential data fed into the network. It could be text, time-
series data, audio, etc.

• Embedding Layer:
Converts the input sequence into a dense representation suitable for
processing by the recurrent layers. It typically involves mapping each
element of the sequence (e.g., word or data point) to a high-dimensional
vector space.

• Recurrent Layers:
Consist of multiple recurrent units stacked together. Each layer processes
the input sequence sequentially, capturing temporal dependencies.
Common types of recurrent units include vanilla RNNs, LSTMs, and GRUs.

• Output Layer:
Takes the output from the recurrent layers and produces the final
prediction or output. The structure of this layer depends on the specific
task, such as classification (e.g., softmax activation) or regression (e.g.,
linear activation).

• Output (Prediction):
The final output of the network, which could be a sequence of predictions
for each time step or a single prediction for the entire sequence,
depending on the task.

Deep recurrent networks (DRNs) offer several advantages:

• Hierarchical Representation Learning:


With multiple layers of recurrent units, DRNs can learn hierarchical
representations of sequential data. Each layer can capture different levels
of abstraction, allowing the network to extract complex features from the
input sequence.
• Modeling Long-term Dependencies:
Deep architectures enable DRNs to capture long-range dependencies in
sequential data more effectively. By stacking recurrent layers, the
network can maintain and propagate information over longer sequences,
which is crucial for tasks involving context or memory over extended
periods.
• Improved Expressiveness:
Deeper architectures provide more expressive power, allowing DRNs to
learn complex patterns and relationships within sequential data. This
increased expressiveness can lead to better performance on tasks that
require modeling intricate dependencies or understanding subtle
variations in the data.
• Better Feature Abstraction:
Each layer in a DRN learns to abstract features from the input sequence,
leading to a hierarchy of representations. This hierarchical feature
extraction can facilitate learning informative and discriminative features,
which are essential for tasks like sequence classification, language
modeling, and machine translation.

• Transfer Learning:
Pre-training deep recurrent networks on large-scale datasets for related
tasks (e.g., language modeling) and fine-tuning them for specific tasks
often leads to improved performance. The hierarchical representations
learned during pre-training capture generic features of the data, which
can be beneficial for downstream tasks with limited labeled data.

Disadvantages of Deep recurrent networks (DRNs)

• Vanishing/Exploding Gradient Problem:


Training deep recurrent networks can be challenging due to the
vanishing or exploding gradient problem. As gradients are
backpropagated through multiple layers during training, they can
become either extremely small (vanishing) or extremely large
(exploding), which hinders learning and stability. Techniques like
gradient clipping and careful initialization of weights are often necessary
to mitigate this issue.

• Computational Complexity:
Deep recurrent networks with multiple layers can be computationally
expensive to train and deploy, especially when dealing with large-scale
datasets or complex architectures. The computational complexity
increases with the number of layers, making it challenging to train deep
models on resource-constrained devices or in real-time applications.

• Long Training Time:


Training deep recurrent networks requires significant computational
resources and time, especially when dealing with large datasets and
complex architectures. The training process often involves multiple
iterations over the entire dataset, which can take hours, days, or even
weeks depending on the size of the data and the complexity of the model.

• Overfitting:
Deep recurrent networks are prone to overfitting, especially when
dealing with small datasets or overly complex models. With a large
number of parameters, deep models have a high capacity to memorize
noise or irrelevant patterns in the training data, leading to poor
generalization performance on unseen data. Regularization techniques
such as dropout and weight decay are commonly used to prevent
overfitting.
• Difficulty in Interpretability:
Understanding the internal workings of deep recurrent networks and
interpreting their decisions can be challenging. With multiple layers of
non-linear transformations, it can be difficult to interpret the learned
representations and understand how the network arrives at a particular
prediction. This lack of interpretability can be a significant drawback in
applications where transparency and interpretability are essential.

Application: Image Generation

• Generating images using recurrent neural networks (RNNs) is an exciting


application that leverages the sequential nature of RNNs to produce
images pixel by pixel.
• While RNNs are not commonly used for image generation due to their
sequential processing nature and the high dimensionality of image data,
they can still be applied for certain types of image generation tasks.
• RNN-based approaches can still be useful in scenarios where sequential
processing or conditioning on external information is desirable.

Architecture diagram which can generate images from text descriptions:

▪ Semantic information from the textual description is used as input to the generator model, which converts this feature information into pixels and generates the images.
▪ The generated image is then used as input to the discriminator along with real/wrong textual descriptions and real sample images from the dataset.
▪ A sequence of distinct (picture, text) pairings is then provided as input to the model to meet the goals of the discriminator: pairs of real images and real textual descriptions, wrong images and mismatched textual descriptions, and generated images and real textual descriptions.
▪ The real photo and real text combinations are provided so that the model can determine whether a particular image and text combination align. An incorrect picture with a real text description indicates that the image does not match the caption.
▪ The discriminator is trained to identify real and generated images. At the start of training, the discriminator is good at classifying real and wrong images. The loss is calculated to improve the weights and to provide training feedback to the generator and discriminator models.
▪ As training proceeds, the generator produces more realistic images and begins to fool the discriminator when it distinguishes between real and generated images.

Here's how it can be done:


• Text-to-Image Generation:
One common approach to image generation using RNNs is to generate
images conditioned on textual descriptions. In this setup, an RNN, such
as a Long Short-Term Memory (LSTM) network, is used to process the
input text, encoding the semantic information into a fixed-length vector
representation. This vector is then used as a conditioning input to
another network, typically a Generative Adversarial Network (GAN) or a
Variational Autoencoder (VAE), which generates the corresponding
image.
• Sequence-to-Sequence Generation:
Another approach is to directly generate images pixel by pixel using autoregressive models. In this setup, an RNN is trained to predict the next pixel in the image sequence given the previous pixels. This process is repeated iteratively until the entire image is generated. Variants of RNNs, such as PixelRNN and PixelCNN, have been proposed for this task, where the model predicts the color value of each pixel conditioned on the previously generated pixels (a minimal sketch of this idea follows this list).
• Conditional Image Generation:
RNNs can also be used for conditional image generation, where the
generation process is conditioned on some input information. For
example, the input could be a low-resolution image, a sketch, or a set of
object labels. The RNN processes this input and generates the
corresponding high-resolution image or completes the missing parts of
the input image.
• Data Augmentation:
RNNs can be used to generate synthetic images for data augmentation
purposes. By training an RNN to generate realistic images similar to the
training data distribution, additional training samples can be generated
to increase the diversity of the dataset and improve the generalization
performance of image classification or object detection models.
• Artistic Style Transfer:
RNNs can be used for artistic style transfer, where the style of one image
is transferred to the content of another image. In this setup, the RNN is
trained to generate an image that matches the content of one image while
incorporating the style features learned from another image. This
process typically involves optimizing a loss function that balances
content preservation and style transfer.
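The following sketch illustrates the sequence-to-sequence (autoregressive) generation idea from the list above: an LSTM predicts the next pixel intensity from the previously generated pixels, and at generation time pixels are sampled one at a time. This is a toy illustration (PyTorch; the 8×8 grayscale resolution, 256 intensity levels, and layer sizes are assumptions, not PixelRNN's exact architecture), and the untrained model here only produces noise:

```python
import torch
import torch.nn as nn

class PixelLSTM(nn.Module):
    """Predicts a distribution over the next pixel value given previous pixels."""
    def __init__(self, levels=256, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(levels + 1, embed_dim)   # +1 for a start token
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, levels)

    def forward(self, pixels, state=None):
        h, state = self.lstm(self.embed(pixels), state)
        return self.out(h), state

@torch.no_grad()
def generate(model, height=8, width=8, levels=256):
    """Generate a small grayscale image pixel by pixel."""
    pixels = [levels]                            # start token
    state = None
    for _ in range(height * width):
        inp = torch.tensor([[pixels[-1]]])
        logits, state = model(inp, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        pixels.append(torch.multinomial(probs, 1).item())
    return torch.tensor(pixels[1:]).reshape(height, width)

model = PixelLSTM()                              # untrained: output is random noise
print(generate(model))
```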

Application: Image Compression

▪ Image compression is a method to remove spatial redundancy between


adjacent pixels and reconstruct a high-quality image.
▪ In the past few years, deep learning has gained huge attention from the
research community and produced promising image reconstruction
results.
▪ Therefore, recent methods have focused on developing deeper and more complex networks, which significantly increases network complexity.
▪ Using recurrent neural networks (RNNs) for image compression is an
innovative application that leverages the sequential processing capability
of RNNs to effectively encode and compress image data.

Architecture Diagram of image compression framework based on


Recurrent Neural Network (RNN)

In the above diagram, there are three modules and two additional novel blocks in the end-to-end framework: an encoder network, an analysis block, a binarizer, a decoder network, and a synthesis block. Image patches are given directly to the analysis block as input, which generates latent features using the proposed analysis encoder block. The entire framework is presented in the architecture diagram.
In a single iteration of the end-to-end framework, an image patch passes through the analysis/encoder stage, the binarizer, and the decoder/synthesis stage to produce a reconstruction.
The training of the image compression network is optimized by computing a weighted loss at each iteration from the actual and predicted (reconstructed) values.
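The sketch below gives a minimal, illustrative version of one iteration of such an encoder–binarizer–decoder pipeline (PyTorch; the patch size, code length, GRU-based encoder/decoder, and the straight-through sign binarizer are assumptions for illustration, not the exact blocks of the framework above):

```python
import torch
import torch.nn as nn

class PatchCompressor(nn.Module):
    """One iteration: encode an image patch, binarize the code, reconstruct the patch."""
    def __init__(self, patch=32, hidden=256, code_bits=128):
        super().__init__()
        self.encoder = nn.GRU(patch, hidden, batch_first=True)   # patch rows as time steps
        self.to_code = nn.Linear(hidden, code_bits)
        self.from_code = nn.Linear(code_bits, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_pixels = nn.Linear(hidden, patch)

    def forward(self, patch_rows):                  # (batch, rows, patch) in [0, 1]
        _, h = self.encoder(patch_rows)             # final hidden state summarizes the patch
        code = torch.tanh(self.to_code(h[-1]))
        # Straight-through sign binarizer: binary values in the forward pass,
        # identity gradient in the backward pass.
        binary = code + (torch.sign(code) - code).detach()
        rows = patch_rows.size(1)
        dec_in = self.from_code(binary).unsqueeze(1).repeat(1, rows, 1)
        out, _ = self.decoder(dec_in)
        return torch.sigmoid(self.to_pixels(out)), binary

model = PatchCompressor()
patch = torch.rand(1, 32, 32)                       # one 32x32 grayscale patch
reconstruction, bits = model(patch)
loss = nn.functional.mse_loss(reconstruction, patch)   # per-iteration reconstruction loss
print(reconstruction.shape, bits.shape, loss.item())
```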

Here's how RNNs can be applied for image compression:


▪ Sequence-to-Sequence Compression:
In this approach, the input image is divided into a sequence of patches or
blocks. Each block is then sequentially processed by an RNN, such as a
Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit
(GRU). The RNN compresses the information in each block into a fixed-
length vector representation, capturing the essential features of the
image content.
▪ Hierarchical Compression:
Another approach involves using a hierarchical RNN architecture for
compression. In this setup, multiple layers of RNNs are stacked together,
with each layer processing increasingly abstract representations of the
image. The lower layers capture fine-grained details, while the higher
layers capture more global structures and patterns. This hierarchical
representation enables efficient compression of images with varying
levels of detail.
▪ Conditional Compression:
RNNs can be conditioned on contextual information to improve
compression performance. For example, the compression process can be
conditioned on the image content, image resolution, or specific
compression requirements (e.g., target compression ratio). By
incorporating additional information into the compression model, RNNs
can adapt their encoding strategy to better preserve important features
of the input image.
▪ Lossy Compression:
RNN-based compression models can be trained to perform lossy
compression, where some information in the input image is discarded to
achieve higher compression ratios. The RNN learns to prioritize
important features while discarding less critical information, resulting in
compact representations of the input images. Techniques such as
quantization and entropy coding can be combined with RNN-based
compression to further improve compression efficiency.

▪ Learned Compression Algorithms:


Instead of handcrafting compression algorithms, RNNs can be trained to
learn effective compression strategies directly from data. By optimizing
compression performance using techniques such as autoencoders or
reinforcement learning, RNN-based compression models can adapt to the
statistical properties of different types of images and achieve better
compression ratios.

Application: Natural Language Processing

▪ Natural Language Processing (NLP) using recurrent neural networks


(RNNs) is a prominent area of research and application.
▪ RNNs, with their ability to model sequential data, are well-suited for
various NLP tasks that involve understanding and generating natural
language.
▪ RNNs play a vital role in various NLP tasks by effectively modeling the
sequential nature of natural language and capturing the contextual
dependencies in text data.
▪ Their versatility and ability to handle sequential data make them a
powerful tool for understanding, generating, and processing natural
language in a wide range of applications.
▪ RNNs are effective for sequential data processing. In an RNN, the computation is recursively applied to each instance of the input sequence using previously computed results. The recurrent unit is sequentially fed the sequence, represented as fixed-size vectors of tokens.
An RNN-based framework for NLP is shown in the figure below.

The advantage of an RNN is that it can memorize the results of previous computation and utilize that information in the current computation.
So, it is possible to model context dependencies over inputs of arbitrary length with an RNN and to create a proper composition of the input.
Mainly RNNs are used in different NLP tasks like,
▪ Natural language generation (e.g. image captioning, machine translation,
visual question answering)
▪ Word - level classification (e.g. Named Entity recognition (NER))
▪ Language modelling
▪ Semantic matching
▪ Sentence-level classification (e.g., sentiment polarity)
Here are some key applications of RNNs in NLP:
✓ Sequence Modelling:
RNNs excel at sequence modelling tasks, such as language modelling and
text generation. They can be trained to predict the next word in a
sentence given the previous words, capturing the sequential
dependencies in the language. Language models based on RNNs have
been used for tasks like speech recognition, machine translation, and
autocomplete suggestions.
✓ Machine Translation:
RNNs, particularly the sequence-to-sequence (seq2seq) architecture, have been widely used for machine translation tasks. In this setup, an RNN encoder processes the input sentence in the source language, and another RNN decoder generates the corresponding translation in the target language. This approach has been extended with attention mechanisms to handle longer sentences and improve translation quality (a minimal encoder–decoder sketch is given at the end of this list).
✓ Sentiment Analysis:
RNNs are effective for sentiment analysis tasks, where the goal is to
determine the sentiment or opinion expressed in a piece of text. By
processing the text sequentially and capturing the contextual
information, RNNs can classify text into different sentiment categories
(e.g., positive, negative, neutral). They have been used for sentiment
analysis in social media posts, customer reviews, and news articles.
✓ Named Entity Recognition (NER):
RNNs have been applied to named entity recognition tasks, where the
goal is to identify and classify entities (e.g., persons, organizations,
locations) mentioned in text. By modelling the sequential context of the
text, RNNs can learn to recognize and classify entities based on their
surrounding words and phrases. This is useful in applications like
information extraction and text summarization.
✓ Part-of-Speech Tagging:
RNNs can be used for part-of-speech (POS) tagging, where each word in
a sentence is assigned a grammatical category (e.g., noun, verb,
adjective). By considering the sequential context of the words, RNNs can
learn to predict the POS tags more accurately, even for ambiguous cases.
POS tagging is an essential component in many NLP pipelines and
applications.
✓ Text Classification:
RNNs are commonly used for text classification tasks, such as document
categorization, topic modelling, and spam detection. By processing the
text sequentially and capturing the semantic information, RNNs can learn
to classify documents or sentences into different categories based on
their content. They have been used in various domains, including news
categorization, customer support, and email filtering.
✓ Dialogue Systems:
RNNs have been employed in dialogue systems, also known as chatbots
or conversational agents, to generate responses in natural language. By
modelling the sequential interaction between users and the system,
RNNs can generate contextually relevant and coherent responses to user
queries or prompts. Dialogue systems based on RNNs have been used in
virtual assistants, customer service bots, and language learning
applications.
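The following is a minimal sketch of the encoder–decoder (seq2seq) setup referred to in the machine translation item above (PyTorch, greedy decoding, no attention; the vocabulary sizes and the BOS/EOS token ids are placeholders):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=4000, tgt_vocab=4000, embed=64, hidden=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed)
        self.encoder = nn.GRU(embed, hidden, batch_first=True)
        self.decoder = nn.GRU(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def translate(self, src_tokens, bos_id=1, eos_id=2, max_len=20):
        # The encoder reads the source sentence; its final state initializes the decoder.
        _, state = self.encoder(self.src_embed(src_tokens))
        token = torch.full((src_tokens.size(0), 1), bos_id, dtype=torch.long)
        result = []
        for _ in range(max_len):                     # greedy decoding, one word at a time
            h, state = self.decoder(self.tgt_embed(token), state)
            token = self.out(h[:, -1]).argmax(dim=-1, keepdim=True)
            result.append(token)
            if (token == eos_id).all():
                break
        return torch.cat(result, dim=1)

model = Seq2Seq()                                    # untrained: output ids are arbitrary
source = torch.randint(3, 4000, (1, 7))              # a tokenized source sentence
print(model.translate(source))
```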

Complete Auto Encoder

✓ An autoencoder is a type of artificial neural network used for


unsupervised learning of efficient data representations.
✓ Autoencoders emerge as a fascinating subset of neural networks, offering
a unique approach to unsupervised learning.
✓ Autoencoders are an adaptable and strong class of architectures for the
dynamic field of deep learning, where neural networks develop
constantly to identify complicated patterns and representations.
✓ With their ability to learn effective representations of data, these
unsupervised learning models have received considerable attention and
are useful in a wide variety of areas, from image processing to anomaly
detection.
It consists of two main components:
✓ An encoder: The encoder compresses the input data into a latent
representation.
✓ A decoder: The decoder reconstructs the original input from the latent
representation.
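A minimal sketch of this encoder–decoder pair (PyTorch; the 784-dimensional input, e.g. a flattened 28×28 image, and the 32-dimensional latent code are arbitrary illustrative sizes):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compresses the input into a latent representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        # Decoder: reconstructs the original input from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = Autoencoder()
x = torch.rand(16, 784)                            # a batch of flattened images in [0, 1]
reconstruction, code = model(x)
loss = nn.functional.mse_loss(reconstruction, x)   # reconstruction error
print(code.shape, loss.item())                     # torch.Size([16, 32]) ...
```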
Architecture of Complete Auto Encoder

• Basically, autoencoders are approximators for the identity operation;


therefore learning these weights might seem trivial; but by constraining
the parameters (such as number of nodes or number of connections),
interesting representations can be uncovered in the data.
• Most real datasets are structured i.e. they have a high degree of local
correlations; usually, the autoencoder can exploit these correlations and
yield compressed representations. However, autoencoders are not
usually used for compression, rather they are used for learning the
representations which are later used for classification i.e. for feature
learning.
• Autoencoders can come in various architectures, each serving different
purposes and having different properties.
Here are some types of complete autoencoders:
• Vanilla Autoencoder:
A vanilla autoencoder consists of an encoder and a decoder where both
are fully connected neural networks. It aims to learn a compressed
representation of the input data without any specific constraints on the
learned representations.
• Sparse Autoencoder:
In a sparse autoencoder, additional constraints are imposed on the
learned representations to encourage sparsity. This can be achieved by
adding a sparsity penalty term to the loss function, such as L1
regularization or the Kullback-Leibler (KL) divergence.
• Denoising Autoencoder:
Denoising autoencoders are trained to reconstruct clean data from
corrupted inputs. During training, noise is added to the input data, and
the model is trained to reconstruct the original, noise-free data. This
helps the model learn more robust and informative representations.
• Variational Autoencoder (VAE):
VAEs are probabilistic autoencoders that learn a latent variable model of
the data. They aim to capture the underlying probability distribution of
the input data in the latent space and generate new samples by sampling
from this distribution. VAEs consist of an encoder that outputs the
parameters of a probability distribution (e.g., mean and variance) and a
decoder that samples from this distribution to generate reconstructions.
• Contractive Autoencoder:
Contractive autoencoders are trained to learn representations that are
robust to small perturbations in the input data. They achieve this by
adding a penalty term to the loss function that penalizes the Frobenius
norm of the Jacobian matrix of the encoder with respect to the input data.
• Adversarial Autoencoder (AAE):
AAEs combine autoencoders with adversarial training techniques. They
consist of an encoder-decoder pair trained to reconstruct the input data,
along with a discriminator network that tries to distinguish between the
latent representations learned by the encoder and samples from a prior
distribution.
• Convolutional Autoencoder:
Convolutional autoencoders use convolutional layers instead of fully
connected layers in both the encoder and decoder. They are particularly
well-suited for image data and can capture spatial dependencies more
effectively compared to vanilla autoencoders.
• Recurrent Autoencoder:
Recurrent autoencoders utilize recurrent neural networks (RNNs) in
either the encoder, decoder, or both. They are useful for sequential data,
such as time series or natural language sequences, and can capture
temporal dependencies in the input data.
Regularized autoencoders

• Regularized autoencoders are a type of autoencoder that incorporates


regularization techniques to improve the quality of learned
representations and prevent overfitting.
• These techniques impose additional constraints on the autoencoder's
training process, encouraging it to learn more robust and generalizable
representations of the input data.
• Regularization helps prevent the autoencoder from memorizing the
training data and capturing noise, resulting in better performance on
unseen data.
• Regularized autoencoders are widely used in various applications,
including dimensionality reduction, feature learning, data denoising, and
anomaly detection.
• By incorporating regularization techniques into the training process,
regularized autoencoders can learn more informative and generalizable
representations of the input data, leading to better performance on
downstream tasks.

Structure of Regularized Autoencoders


Let’s dive into the structural nuances that differentiate regularized
autoencoders from their traditional counterparts.
Neuronal Arrangement:
The arrangement remains similar to that of traditional autoencoders, with an encoder and a decoder. The deviation lies in the incorporation of regularization methods within the layers.
Activation Functions:
Regularized autoencoders may employ specific activation functions tailored for
regularization, contributing to a more balanced learning process.
Incorporating Regularization Methods:
Regularization methods, such as dropout or L1/L2 regularization, are
integrated into the architecture to curb overfitting.

Some common regularization techniques used in regularized


autoencoders include:

• L1 and L2 Regularization:
L1 and L2 regularization penalize the magnitude of the weights in the
autoencoder's neural network. By adding a regularization term to the
loss function proportional to either the L1 or L2 norm of the weights,
these techniques encourage sparsity (in the case of L1 regularization) or
small weights (in the case of L2 regularization), helping prevent
overfitting.

• Dropout:
Dropout is a regularization technique that randomly sets a fraction of the
input units to zero during each training iteration. This helps prevent the
autoencoder's neural network from relying too heavily on any individual
input features, forcing it to learn more robust representations.

• Batch Normalization:
Batch normalization normalizes the activations of each layer in the
autoencoder's neural network, helping stabilize and accelerate the
training process. By reducing internal covariate shift, batch
normalization acts as a regularizer, making the autoencoder more
resistant to overfitting.

• Noise Injection:
Noise injection involves adding noise to the input data or the activations
of the autoencoder's hidden layers during training. This helps prevent the
autoencoder from memorizing the training data and encourages it to
learn more generalizable representations.

• Contractive Regularization:
Contractive regularization penalizes the Frobenius norm of the Jacobian
matrix of the encoder with respect to the input data. This encourages the
encoder to learn representations that are invariant to small changes in
the input data, making the autoencoder more robust to variations in the
input.
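The sketch below combines two of the techniques listed above, dropout inside the encoder and an L1 sparsity penalty on the latent code, plus an L2 weight penalty via the optimizer (PyTorch; the layer sizes and penalty weights are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

class RegularizedAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32, dropout=0.2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Dropout(dropout),                      # dropout regularization
            nn.Linear(128, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = RegularizedAutoencoder()
# weight_decay adds an L2 penalty on the weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

x = torch.rand(16, 784)
reconstruction, z = model(x)
sparsity_penalty = 1e-4 * z.abs().mean()              # L1 penalty -> sparse latent codes
loss = nn.functional.mse_loss(reconstruction, x) + sparsity_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```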
Stochastic Encoders and Decoders

• Stochastic encoders and decoders are components of probabilistic


autoencoder models, such as Variational Autoencoders (VAEs).
• These components introduce stochasticity into the encoding and
decoding process, enabling the model to learn a probabilistic
representation of the input data distribution.
• Stochastic encoders and decoders in VAEs enable various applications,
including generative modelling, data synthesis, and unsupervised
representation learning.
• They provide a principled framework for learning complex data
distributions and generating new samples from these distributions.

Stochastic Encoder:
In a VAE, the encoder network outputs the parameters of a probability
distribution instead of a deterministic encoding. Instead of directly
outputting the latent representation of the input data, the encoder
outputs the mean and variance (or other parameters) of a Gaussian
distribution that represents the distribution of possible latent variables
given the input. The latent variable is then sampled from this distribution
to generate a stochastic representation.

Stochastic Decoder:
Similarly, the decoder network in a VAE accepts a sampled latent variable
as input instead of a deterministic encoding. This sampled latent variable
is generated by sampling from the distribution outputted by the encoder.
The decoder then generates the reconstructed output based on this
sampled latent variable.
Cost Function Calculation

The cost function of VAE is based on log likelihood maximization.


The cost function consists of reconstruction and regularization error
terms:
Cost = Reconstruction Error + Regularization Error
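Concretely, the reconstruction error is typically a negative log-likelihood term (e.g. binary cross-entropy or squared error), and the regularization error is the KL divergence between the encoder's Gaussian q(z|x) and the standard normal prior p(z). A minimal sketch (PyTorch; the layer sizes are illustrative) of the stochastic encoder, the reparameterized sampling, the stochastic decoder, and this cost:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden=256, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)        # stochastic encoder outputs
        self.logvar = nn.Linear(hidden, latent_dim)    # distribution parameters
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: sample z ~ N(mu, sigma^2) in a differentiable way
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar                 # stochastic decoder uses the sampled z

model = VAE()
x = torch.rand(16, 784)
recon, mu, logvar = model(x)

reconstruction_error = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
# KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I)
regularization_error = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
cost = reconstruction_error + regularization_error
print(cost.item())
```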

Contractive autoencoders

• Contractive autoencoders are a variant of autoencoders that incorporate


a regularization term known as contractive regularization.
• The goal of contractive regularization is to encourage the autoencoder's
encoder network to learn a more robust and stable representation of the
input data by penalizing variations in the input space.
• In a contractive autoencoder, the contractive regularization term is added
to the loss function during training. This regularization term penalizes
the Frobenius norm of the Jacobian matrix of the encoder's output with
respect to the input data.
• Intuitively, this penalizes variations in the input space by encouraging the
encoder to learn representations that are insensitive to small changes in
the input data.
• A contractive autoencoder simply aims to learn representations that are invariant to unimportant transformations of the given data.
• A CAE can surpass the results obtained by regularizing an autoencoder with weight decay or by denoising; it is often a better choice than a denoising autoencoder for learning useful features.

• During training, the contractive autoencoder is optimized to minimize


the reconstruction error (e.g., mean squared error) while simultaneously
minimizing the contractive regularization term.
• This encourages the encoder to learn representations that capture the
underlying structure of the data while being robust to small
perturbations in the input space.
• Contractive autoencoders have been applied in various domains,
including dimensionality reduction, feature learning, and data denoising.
• They are particularly useful in scenarios where the input data is noisy or
contains small variations, as they encourage the autoencoder to learn
stable and invariant representations of the data.
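For a one-hidden-layer encoder with a sigmoid activation, the Frobenius norm of the encoder's Jacobian has a closed form, which makes the penalty cheap to compute. A minimal sketch (PyTorch; the sizes and the penalty weight lambda are illustrative assumptions):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, lam = 784, 64, 1e-3

encoder = nn.Linear(input_dim, hidden_dim)        # h = sigmoid(W x + b)
decoder = nn.Linear(hidden_dim, input_dim)

x = torch.rand(16, input_dim)
h = torch.sigmoid(encoder(x))
reconstruction = torch.sigmoid(decoder(h))

# Jacobian of the encoder: dh_j/dx_i = h_j (1 - h_j) W_ji, so the squared
# Frobenius norm is sum_j [h_j (1 - h_j)]^2 * sum_i W_ji^2.
W = encoder.weight                                # shape: (hidden_dim, input_dim)
jacobian_sq = ((h * (1 - h)) ** 2) @ (W ** 2).sum(dim=1)   # one value per sample
contractive_penalty = jacobian_sq.mean()

loss = nn.functional.mse_loss(reconstruction, x) + lam * contractive_penalty
print(loss.item())
```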
The benefits and applications of contractive autoencoders include:
• Robustness to Noise: Contractive regularization encourages the
encoder to learn representations that are robust to small variations and
noise in the input data. This makes contractive autoencoders suitable for
tasks involving noisy or corrupted data, such as denoising autoencoding
• Improved Generalization: By penalizing variations in the input space,
contractive regularization helps prevent overfitting and improves the
generalization performance of the autoencoder. This allows the model to
learn more generalizable representations of the data that can be applied
to unseen examples.
• Feature Learning: Contractive autoencoders can learn informative and
discriminative features from the input data by capturing the underlying
structure of the data distribution. These learned features can be used for
downstream tasks such as classification, clustering, or anomaly
detection.
• Dimensionality Reduction: The compact and stable representations
learned by contractive autoencoders can be used for dimensionality
reduction tasks. By projecting high-dimensional data into a lower-
dimensional space while preserving important information, contractive
autoencoders facilitate visualization, data compression, and efficient
storage.
• Unsupervised Learning: Contractive autoencoders belong to the class
of unsupervised learning algorithms, as they do not require labelled data
during training. This makes them suitable for tasks where labelled data
is scarce or expensive to obtain, allowing for the extraction of useful
information from large amounts of unlabelled data.
