DL Unit 5
IMAGE SEGMENTATION
Most deep learning approaches to image segmentation follow an encoder-decoder design. The encoder produces a latent-space representation of the input, which the decoder decodes into segment maps, in other words maps outlining each object's location in the image.
Types of Image Segmentation tasks
Image segmentation tasks can be classified into three groups based on the amount and type of
information they convey.
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Semantic segmentation
Semantic segmentation refers to the classification of pixels in an image into semantic classes.
Pixels belonging to a particular class are simply classified to that class with no other
information or context taken into consideration.
As might be expected, this becomes an ill-defined problem when there are multiple closely
grouped instances of the same class in the image. For an image of a crowd in a street, a
semantic segmentation model would predict the entire crowd region as belonging to the
“pedestrian” class, providing very little in-depth detail or information about the image.
Instance segmentation
Instance segmentation models classify pixels into categories on the basis of “instances” rather
than classes.
An instance segmentation algorithm has no idea of the class a classified region belongs to but
can segregate overlapping or very similar object regions on the basis of their boundaries.
If the same image of a crowd we talked about before is fed to an instance segmentation
model, the model would be able to segregate each person from the crowd as well as the
surrounding objects (ideally), but would not be able to predict what each region/object is an
instance of.
Panoptic segmentation
Panoptic segmentation, the most recently developed segmentation task, can be expressed as
the combination of semantic segmentation and instance segmentation where each instance of
an object in the image is segregated and the object’s identity is predicted.
Panoptic segmentation algorithms find large-scale applicability in popular tasks like self-
driving cars where a huge amount of information about the immediate surroundings must be
captured with the help of a stream of images.
Encoder-decoder architectures for semantic segmentation became popular with works like
SegNet (by Badrinarayanan et al.) in 2015.
SegNet proposes the use of a combination of convolutional and downsampling blocks to
squeeze information into a bottleneck and form a representation of the input. The decoder
then reconstructs input information to form a segment map highlighting regions on the input
and grouping them under their classes.
Finally, the decoder has a sigmoid activation at the end that squeezes the output in the range
(0,1).
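A minimal sketch of this encoder-decoder idea in Keras (the layer sizes, input resolution, and single-channel sigmoid output are illustrative assumptions, not SegNet's actual configuration):

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(128, 128, 3))
# Encoder: convolution + downsampling squeezes information into a bottleneck
x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
x = layers.MaxPooling2D()(x)                       # 64 x 64
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.MaxPooling2D()(x)                       # 32 x 32 bottleneck
# Decoder: upsampling + convolution reconstructs a full-resolution segment map
x = layers.UpSampling2D()(x)                       # 64 x 64
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.UpSampling2D()(x)                       # 128 x 128
x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)   # per-pixel score in (0, 1)
model = Model(inputs, outputs)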
U-NET:
SegNet was accompanied by the release of another independent segmentation work at the
same time, U-Net (by Ronneberger et al.), which introduced skip connections as a solution for
the loss of information observed in the downsampling layers of typical encoder-decoder
networks.
Skip connections are connections that go from the encoder directly to the decoder without
passing through the bottleneck.
In other words, feature maps at various levels of encoded representations are captured and
concatenated to feature maps in the decoder. This helps to reduce data loss by aggressive
pooling and downsampling as done in the encoder blocks of an encoder-decoder architecture.
Skip Connections were a big hit, specifically in the domain of medical imaging, with U-Net
providing state-of-the-art results in cell segmentation for the diagnosis of diseases.
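A minimal Keras sketch of a skip connection: an encoder feature map bypasses the bottleneck and is concatenated with the corresponding decoder feature map (shapes and layer sizes are illustrative):

from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(128, 128, 3))
enc = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)   # encoder features
down = layers.MaxPooling2D()(enc)
bottleneck = layers.Conv2D(64, 3, padding='same', activation='relu')(down)
up = layers.UpSampling2D()(bottleneck)
# Skip connection: encoder features are concatenated with the decoder features
merged = layers.Concatenate()([up, enc])
dec = layers.Conv2D(32, 3, padding='same', activation='relu')(merged)
outputs = layers.Conv2D(1, 1, activation='sigmoid')(dec)
model = Model(inputs, outputs)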
DEEP LAB
DeepLab made use of atrous convolutions replacing simple pooling operations and preventing
significant information loss while downsampling. They further introduced multi-scale feature
extraction with the help of Atrous Spatial Pyramid Pooling to help the network segment objects
regardless of their sizes.
To recover boundary information, one of the most important parts of semantic as well as instance
segmentation, they made use of fully connected Conditional Random Fields (CRFs).
Coupling the fine-grained localization accuracy of CRFs with the recognition capacity of CNNs led
DeepLab to provide highly accurate segment maps, beating methods like FCNs and SegNet by a wide
margin.
Papers like SegNet, U-Net, and DeepLab laid the groundwork for future work like Mask R-CNN (from
Facebook AI Research), the later versions of the DeepLab series (from Google), and works like PSPNet and GSCNN.
Image segmentation is an important step in artificial vision. Machines need to divide visual
data into segments for segment-specific processing to take place.
Image segmentation thus finds its way in prominent fields like Robotics, Medical Imaging,
Autonomous Vehicles, and Intelligent Video Analytics.
Image segmentation aids machine perception and locomotion by pointing out objects in their path of
motion, enabling them to change paths effectively and understand the context of their environment.
Apart from locomotion, segmentation of images helps machines segregate the objects they are
working with and enables them to interact with real-world objects using only vision as a reference.
This allows the machine to be useful almost anywhere without much constraint.
Medical Imaging is an important domain of computer vision that focuses on the diagnosis of diseases
from visual data, both in the form of simple visual data and biomedical scans.
Segmentation forms an important role in medical imaging as it helps doctors identify possible
malignant features in images in a fast and accurate manner.
Using image segmentation, diagnosis of diseases can not only be speeded up but can also be made
cheaper, thereby benefiting thousands across the globe.
X-Ray segmentation
Smart Cities
Smart Cities often have CCTV cameras for real-time monitoring of pedestrians, traffic, and crime.
This monitoring can be easily automated with the help of image segmentation.
With AI-based monitoring, crimes can be reported faster, road accidents can be followed up with
immediate ambulances, and speeding cars can be easily caught and penalized.
The use of image segmentation and AI-based monitoring can thus improve the lifestyle of people.
Pedestrian detection
Traffic analytics
Video Surveillance
Self Driving cars are one of the biggest applications of image segmentation with the planning of
routes and movement depending heavily on it.
Semantic and instance segmentation helps these vehicles to identify road patterns and other vehicles,
thereby enabling a hassle-free and smooth ride.
Drivable surface semantic segmentation
Object detection is a supervised machine learning problem, which means you must
train your models on labeled examples.
It becomes as simple as feeding input visuals and receiving a fully marked-up output
visual.
A key component is the object detection bounding box which identifies the edges of the
object tagged with a clear-cut quadrilateral — typically either a square or rectangle.
They are accompanied by a label of the object, whether it is a person, a car, or a dog to
describe the target object.
Bounding boxes can overlap to showcase multiple objects in a given shot as long as the
model has prior knowledge of items it is tagging.
Object detection is a subset of object recognition, where the object is not only identified
but also located in an image. This allows for multiple objects to be identified and
located within the same image
Let’s break down the other computer vision tasks individually for a greater understanding of
each one:
What sets object detection with deep learning apart from alternative approaches is the employment
of convolutional neural networks (CNNs).
Neural networks loosely mimic the complex neural architecture of the human brain. They
primarily consist of an input layer, hidden inner layers, and an output layer.
The learning for these neural networks can be supervised, semi-supervised, and unsupervised,
referring to how much of the training data is annotated, if at all (unsupervised).
Deep neural networks for object detection yield by far the quickest and most accurate results
for single and multiple object detection since CNNs are capable of automated learning with less
manual engineering involved.
There is a world to unpack regarding deep learning and CNNs, but today we will only focus on
key points that regard object detection algorithms and models.
Object detection is not possible without models designed especially for handling that task.
These object detection models are trained with hundreds of thousands of visual content to
optimize the detection accuracy on an automatic basis later on.
Training and refining models are made efficient through the help of readily available datasets
like COCO (Common Objects in Context) to help give you a head start in scaling your
annotation pipeline.
R-CNN
The first largely successful family of methods was R-CNN (Region-Based Convolutional
Neural Network), which was proposed in 2014. It surpassed its predecessors by extracting
merely 2,000 regions from the image, which were referred to as region proposals, instead of an
exceedingly large number of regions prior to this.
The input image is selected, of which 2,000 region proposals are extracted.
Next, the features would be extracted from each individual region, which would then go
on to be classified as one of the known classes.
The primary shortcoming of R-CNN lies in the fact that although it extracted 2,000
region proposals, it was nonetheless a lengthy process. That is what paved the way to
the new and improved Fast R-CNN.
Problems with R-CNN
It still takes a huge amount of time to train the network, as you have to classify 2,000
region proposals per image.
It cannot be implemented in real time, as it takes around 47 seconds for each test image.
Fast R-CNN
To overcome the drawbacks of R-CNN, a faster object detection algorithm called Fast R-CNN was
built. The approach is similar to the R-CNN algorithm. But, instead of feeding the region
proposals to the CNN, we feed the input image to the CNN to generate a convolutional feature
map. From the convolutional feature map, we identify the region of proposals and warp them
into squares and by using a RoI pooling layer we reshape them into a fixed size so that it can be
fed into a fully connected layer. From the RoI feature vector, we use a softmax layer to predict
the class of the proposed region and also the offset values for the bounding box.
The reason “Fast R-CNN” is faster than R-CNN is because you don’t have to feed 2000 region
proposals to the convolutional neural network every time. Instead, the convolution operation is
done only once per image and a feature map is generated from it.
Faster R-CNN
Both of the above algorithms (R-CNN & Fast R-CNN) use selective search to find the
region proposals. Selective search is a slow and time-consuming process affecting the
performance of the network. Therefore, Shaoqing Ren et al. came up with an object detection
algorithm that eliminates the selective search algorithm and lets the network learn the region
proposals.
Similar to Fast R-CNN, the image is provided as an input to a convolutional network which
provides a convolutional feature map. Instead of using selective search algorithm on the feature
map to identify the region proposals, a separate network is used to predict the region proposals.
The predicted region proposals are then reshaped using a RoI pooling layer which is then used to
classify the image within the proposed region and predict the offset values for the bounding
boxes.
YOLO (You Only Look Once)
All of the previous object detection algorithms use regions to localize the object within the
image. The network does not look at the complete image; instead, it looks at the parts of the
image which have high probabilities of containing the object. YOLO, or You Only Look Once, is an
object detection algorithm much different from the region-based algorithms seen above. In YOLO, a
single convolutional network predicts the bounding boxes and the class probabilities for these boxes.
How YOLO works is that we take an image and split it into an SxS grid, and within each grid cell
we take m bounding boxes. For each bounding box, the network outputs a class probability and
offset values for the box. The bounding boxes with a class probability above a threshold value
are selected and used to locate the object within the image.
YOLO is orders of magnitude faster (45 frames per second) than other object detection
algorithms. The limitation of the YOLO algorithm is that it struggles with small objects within the
image; for example, it might have difficulty detecting a flock of birds. This is due to the
spatial constraints of the algorithm.
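A toy numpy sketch of the selection step described above (the grid size, number of boxes per cell, the 5-value box encoding, and the threshold are illustrative assumptions; a real YOLO head also predicts per-class scores and applies non-maximum suppression):

import numpy as np

S, m = 7, 2                      # 7x7 grid, 2 boxes per cell (illustrative)
# Assumed network output per cell and box: [confidence, x, y, w, h]
preds = np.random.rand(S, S, m, 5)

threshold = 0.6
selected = []
for row in range(S):
    for col in range(S):
        for b in range(m):
            conf = preds[row, col, b, 0]
            if conf >= threshold:        # keep boxes whose score clears the threshold
                selected.append((row, col, preds[row, col, b, 1:]))
print(f"{len(selected)} boxes kept out of {S * S * m}")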
Applications
Image Captioning refers to the process of generating textual description from an image –
based on the objects and actions in the image.
Image Captioning is the process of generating a textual description for given images. It has been
a very important and fundamental task in the Deep Learning domain. Image captioning has a
huge number of applications. NVIDIA is using image captioning technologies to create an
application to help people who have low or no eyesight.
If we are shown an image and told to describe it, we might say: “A puppy on a blue towel” or “A
brown dog playing with a green ball”. So, how are we doing this? While forming the
description, we are seeing the image, but at the same time, we are looking to create a meaningful
sequence of words. The first part is handled by CNNs and the second is handled by RNNs.
Now, there is one issue we might have overlooked here. We have seen that we can describe the
same image in several ways. So, how do we evaluate our model? For sequence-to-sequence problems,
like summarization, language translation, or captioning, we use a metric called the BLEU
score.
BLEU stands for Bilingual Evaluation Understudy. It is a metric for evaluating a generated
sentence against a reference sentence. A perfect match scores 1.0 and a perfect mismatch scores 0.0.
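A quick illustration using nltk's sentence_bleu (the candidate and reference sentences here are made up):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['a', 'brown', 'dog', 'playing', 'with', 'a', 'green', 'ball']]
candidate = ['a', 'dog', 'playing', 'with', 'a', 'ball']
# Smoothing avoids zero scores when some higher-order n-grams have no match
score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print(round(score, 3))   # closer to 1.0 means closer to the reference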
We have seen that we need to create a multimodal neural network that uses feature vectors
obtained from both an RNN and a CNN, so consequently we will have two inputs. One is the image
we need to describe, which is fed to the CNN, and the second is the sequence of words in the
caption produced so far, fed as input to the RNN.
We are dealing with two types of information, a language one and an image one. So the
question arises: how, or in what order, should we introduce these pieces of information into our
model? More precisely, we need a language RNN model since we want to generate a
word sequence, so when should we introduce the image data vectors into the language
model? A paper by Marc Tanti and Albert Gatt, Institute of Linguistics and Language
Technology, University of Malta, covered a comparison study of all the approaches. Let’s look
into the approaches.
Types of Architectures
The first architecture is called the Injecting Architecture and the second one is called the Merging Architecture.
In the Injecting Architecture, the image data is introduced along with the language data, and the
image and language data mixture is represented together. The RNN trains on the mixture. So, at
every step of training, the RNN uses the mixture of both pieces of information to predict the next
word, and consequently, the RNN fine-tunes image information as well during training.
In the Merging Architecture, the image data is not introduced inside the RNN network. The
image and the language information are encoded separately and then introduced together into a feed-
forward network, creating a multimodal layer architecture.
What is CNN?
CNN is a powerful algorithm for image processing. These algorithms are currently the best
algorithms we have for the automated processing of images. Many companies use these
algorithms to do things like identifying the objects in an image.
Images contain data as a combination of red, green, and blue (RGB) values. Matplotlib can be used to import an image into
memory from a file. The computer does not see an image; all it sees is an array of
numbers. Color images are stored in 3-dimensional arrays. The first two dimensions correspond
to the height and width of the image (the number of pixels). The last dimension corresponds to
the red, green, and blue colors present in each pixel.
Convolutional Neural Networks are specialized for applications in image and video recognition.
CNNs are mainly used in image analysis tasks like image recognition, object detection, and
segmentation.
1) Convolutional Layer: In a typical neural network, each input neuron is connected to the next
hidden layer. In a CNN, only a small region of the input layer neurons connects to each neuron
in the hidden layer.
2) Pooling Layer: The pooling layer is used to reduce the dimensionality of the feature map.
There will be multiple activation & pooling layers inside the hidden layer of the CNN.
3) Fully-Connected Layer: Fully connected layers form the last few layers in the network.
The input to the fully connected layer is the output from the final Pooling or
Convolutional Layer, which is flattened and then fed into the fully connected layer. A minimal
example combining these three layer types is sketched below.
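A minimal Keras sketch combining the three layer types above (the input size and the number of output classes are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),            # input image (size assumed)
    layers.Conv2D(32, 3, activation='relu'),   # convolutional layer: local connectivity
    layers.MaxPooling2D(),                     # pooling layer: reduces feature-map size
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),                          # flatten before the fully connected layers
    layers.Dense(64, activation='relu'),       # fully connected layer
    layers.Dense(10, activation='softmax')     # 10 output classes (assumed)
])
model.summary()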
How would the LSTM, or any other sequence prediction model, understand the input image? We
cannot directly input the RGB image tensor, as such models are ill-equipped to work with raw image inputs.
Instead, we use a deep CNN to extract features from the image, which are then fed into the LSTM.
This is called the CNN-LSTM model, specifically designed for sequence prediction problems
with spatial inputs, like images or videos. This architecture involves using Convolutional Neural
Network (CNN) layers for feature extraction on the input data, combined with LSTMs to perform
sequence prediction on the extracted features.
Neural network models for captioning involve two main elements:
1. Feature Extraction.
2. Language Model.
Feature extraction
The feature extraction model is a neural network that given an image is able to extract the salient
features, often in the form of a fixed-length vector. A deep convolutional neural network, or
CNN, is used as the feature extraction submodel. This network can be trained directly on the
images in your dataset. Alternatively, you can use a pre-trained convolutional model .
Language Model
For image captioning, we are creating an LSTM based model that is used to predict the
sequences of words, called the caption, from the feature vectors obtained from the VGG
network.
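A minimal sketch of such a captioning model in Keras, following the merge-style design discussed earlier (the vocabulary size, the 4096-dimensional VGG-style feature vector, and the maximum caption length are illustrative assumptions):

from tensorflow.keras import layers, Model

vocab_size, max_len, feat_dim = 5000, 30, 4096     # assumed sizes

# Image branch: pre-extracted CNN (e.g. VGG-style) feature vector
img_in = layers.Input(shape=(feat_dim,))
img_vec = layers.Dense(256, activation='relu')(img_in)

# Language branch: the partial caption generated so far
txt_in = layers.Input(shape=(max_len,))
txt_emb = layers.Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt_vec = layers.LSTM(256)(txt_emb)

# Merge the two modalities and predict the next word of the caption
merged = layers.add([img_vec, txt_vec])
out = layers.Dense(vocab_size, activation='softmax')(merged)
model = Model([img_in, txt_in], out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')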
Model Evaluation:
BLEU stands for Bilingual Evaluation Understudy. It is a metric for evaluating a generated
sentence against a reference sentence. A perfect match scores 1.0 and a perfect mismatch scores 0.0.
GANs are a powerful class of neural networks that are used for unsupervised learning.
Given a training set, a GAN learns to generate new data with the same statistics as the
training set. It does this with an algorithmic architecture that uses two neural networks to
generate new, synthetic instances of data that closely resemble the real data.
Generative Adversarial Networks are deep learning machines that combine two separate
models into one architecture. The two components are:
Generator Model
Discriminator Model
Zero-sum game: a situation in which one person or group can win something only by causing
another person or group to lose it.
The generator model tries to generate new data samples similar to those in the problem
domain.
The goal of the generator is to fool the discriminator, so the generative neural network is
trained to maximise the final classification error (between true and generated data)
Generative models: they create new data instances that resemble your training data. For
example, GANs can create images that look like photographs of human faces, even though the
faces don't belong to any real person.
The generator takes a random input vector of fixed length and tries to generate a sample in the
same domain as the training data. The input is drawn randomly from a Gaussian distribution.
Once the model is trained, it can be used to generate new samples that resemble the training data.
The discriminator tries to identify whether the example presented is fake (comes from a
generator) or real (comes from the actual data domain).
The goal of the discriminator is to detect fake generated data, so the discriminative neural
network is trained to minimise the final classification error.
The training of GANs typically begins with the discriminator. The discriminator may even
first be preliminarily trained to recognize the samples from the dataset using the softmax layer.
Once the discriminator is in place, we start feeding it the samples generated by the thus far
untrained generator.
The discriminator produces a classification error when predicting whether an image came from the
dataset or from the generator.
As the learning proceeds, the generator network learns to produce samples that are closer and
closer to the original data up to a point where the generation is indistinguishable (at least for
the discriminator network) from the data.
For producing better images, we may use transposed convolutions, an extension made in
deep convolutional GANs.
The competition between the generator and the discriminator makes them adversaries, which
gives the name to GANs.
At each iteration of the training process, the weights of the generative network are updated in
order to increase the classification error (error gradient ascent over the generator’s
parameters), whereas the weights of the discriminative network are updated so as to decrease
this error (error gradient descent over the discriminator’s parameters).
Here, the generative model captures the distribution of data and is trained in such a manner that
it tries to maximize the probability of the Discriminator in making a mistake.
The Discriminator, on the other hand, is based on a model that estimates the probability that the
sample that it got is received from the training data and not from the Generator. The GANs are
formulated as a minimax game, where the Discriminator tries to maximize its reward V(D, G)
and the Generator tries to minimize the Discriminator’s reward, or in other words, maximize the
Discriminator’s error. It can be mathematically described by the formula below:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]
where,
G = Generator
D = Discriminator
Pdata(x) = distribution of real data
P(z) = distribution of the generator’s input noise z
x = sample from Pdata(x)
z = sample from P(z)
D(x) = Discriminator network
G(z) = Generator network
Different types of GANs:
GANs are now a very active topic of research and there have been many different types of
GAN implementation. Some of the important ones that are actively being used currently are
described below:
Vanilla GAN: This is the simplest type of GAN. Here, the Generator and the Discriminator are
simple multi-layer perceptrons. In a vanilla GAN, the algorithm is really simple: it tries to
optimize the mathematical equation above using stochastic gradient descent.
Conditional GAN (CGAN): CGAN can be described as a deep learning method in which
some conditional parameters are put into place. In CGAN, an additional parameter ‘y’ is
added to the Generator for generating the corresponding data. Labels are also put into the
input to the Discriminator in order for the Discriminator to help distinguish the real data
from the fake generated data.
Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular and also the most
successful implementations of GAN. It is composed of ConvNets in place of multi-layer
perceptrons. The ConvNets are implemented without max pooling, which is in fact replaced
by convolutional stride. Also, the layers are not fully connected.
Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear invertible image
representation consisting of a set of band-pass images, spaced an octave apart, plus a low-
frequency residual. This approach uses multiple numbers of Generator and Discriminator
networks and different levels of the Laplacian Pyramid. This approach is mainly used because
it produces very high-quality images. The image is first down-sampled at each layer of the
pyramid and then up-scaled again at each layer in a backward pass, where the image
acquires some noise from the Conditional GAN at these layers until it reaches its original size.
Super Resolution GAN (SRGAN): SRGAN as the name suggests is a way of designing a
GAN in which a deep neural network is used along with an adversarial network in order to
produce higher-resolution images. This type of GAN is particularly useful in optimally up-
scaling native low-resolution images to enhance their details while minimizing errors in doing so.
Application of GANs
Generate new data from available data – generating new samples that resemble, but are not
identical to, the existing real samples.
Generate realistic pictures of people that have never existed.
GANs are not limited to images; they can generate text, articles, songs, poems, etc.
Generate music or clone voices.
Text to Image Generation (Object GAN and Object Driven GAN)
Creation of anime characters in Game Development and animation production.
Image to Image Translation – We can translate one Image to another without
changing the background of the source image. For example, Gans can replace a dog
with a cat.
Low resolution to High resolution – If you pass a low-resolution Image or video,
GAN can produce a high-resolution Image version of the same.
Prediction of the next frame in a video – by training a neural network on small frames
of video, GANs are capable of generating or predicting the next frame of the video.
Interactive Image Generation – GANs are capable of generating images
and video footage in an art form if they are trained on the right real dataset.
Steps to Implement Basic GAN
1. Importing all libraries
2. Getting the Dataset
3. Data Preparation – It includes various steps to accomplish like preprocessing data,
scaling, flattening, and reshaping the data.
4. Define the function Generator and Discriminator.
5. Create a Random Noise and then create an Image with Random Noise.
6. Setting Parameters like defining epoch, batch size, and Sample size.
7. Define the function of generating Sample Images.
8. Train the Discriminator, then train the Generator, and it will create images.
9. See what quality of images is created by the Generator. (A minimal code sketch of these steps follows.)
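A minimal sketch of steps 4–8 for 28x28 grayscale images in Keras (the noise dimension, layer sizes, the random placeholder batch, and the single training step shown are illustrative assumptions, not a tuned implementation):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

noise_dim = 100

# Step 4: define the Generator and Discriminator
generator = keras.Sequential([
    keras.Input(shape=(noise_dim,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(28 * 28, activation='sigmoid'),
    layers.Reshape((28, 28))
])
discriminator = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')      # real (1) vs fake (0)
])
discriminator.compile(loss='binary_crossentropy', optimizer='adam')

# Combined model used to train the generator; the discriminator is frozen inside it
# (the standard Keras GAN pattern)
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(loss='binary_crossentropy', optimizer='adam')

# Steps 5-8: one illustrative training step on a random placeholder "real" batch
batch = 32
real_images = np.random.rand(batch, 28, 28)          # placeholder for real data
noise = np.random.normal(size=(batch, noise_dim))    # Step 5: random noise
fake_images = generator.predict(noise, verbose=0)

discriminator.train_on_batch(real_images, np.ones((batch, 1)))   # train D on real
discriminator.train_on_batch(fake_images, np.zeros((batch, 1)))  # train D on fake
gan.train_on_batch(noise, np.ones((batch, 1)))                   # train G to fool D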
Video to text with LSTM Models
LSTM:
Long Short-Term Memory (LSTM) is a kind of recurrent neural network. In an RNN, the output from the
last step is fed as input to the current step. LSTM was designed by Hochreiter & Schmidhuber. It
tackled the problem of long-term dependencies in RNNs: a plain RNN cannot predict words that
depend on information stored far back in the sequence, though it can give accurate predictions
from recent information.
As the gap length increases, an RNN does not give efficient performance. An LSTM can, by
design, retain information for a long period of time. It is used for processing, predicting,
and classifying on the basis of time-series data.
Long Short- Term Memory (LSTM) networks are a modified version of recurrent
neural networks, which makes it easier to remember past data in memory.
1. Input gate – It discovers which values from the input should be used to modify the memory.
A sigmoid function decides which values to let through (0 or 1), and a tanh function gives
weightage to the values that are passed, deciding their level of importance on a scale from -1
to 1.
2. Forget gate – It discovers which details should be discarded from the block. A sigmoid
function decides this: it looks at the previous state (h_{t-1}) and the current input (x_t) and outputs a
number between 0 (omit this) and 1 (keep this) for each number in the cell state C_{t-1}.
3. Output gate – The input and the memory of the block are used to decide the output.
A sigmoid function decides which values to let through (0 or 1), and a tanh function gives
weightage to the values that are passed, deciding their level of importance from -1 to 1; this is
multiplied with the output of the sigmoid. The standard gate equations are summarized below.
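Putting the three gates together, a standard formulation of the LSTM cell (using h_{t-1}, x_t and C_t as above) is:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)            (forget gate)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)            (input gate)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)     (candidate memory)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t         (cell state update)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)            (output gate)
h_t = o_t \odot \tanh(C_t)                              (hidden state)

where \sigma is the sigmoid function and \odot denotes element-wise multiplication.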
It represents a full RNN cell that takes the current input of the sequence xi, and outputs the
current hidden state, hi, passing this to the next RNN cell for our input sequence. The inside
of an LSTM cell is a lot more complicated than a traditional RNN cell, while the
conventional RNN cell has a single "internal layer" acting on the current state (ht-1) and
input (xt).
Video to text with LSTM Models:
In video to text the methods for generating open-domain video descriptions should be
sensitive to temporal structure and allow both input (sequence of frames) and output
(sequence of words) of variable length.
Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of
video frames to a sequence of words in order to generate a description of the event in the
video clip. Our model naturally is able to learn the temporal structure of the sequence of
frames as well as the sequence model of the generated sentences, i.e. a language model. We
evaluate several variants of our model that exploit different visual features on a standard
set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
Our S2VT approach performs video description using a sequence to sequence model. It
incorporates a stacked LSTM which first reads the sequence of frames and then generates a
sequence of words. The input visual sequence to the model is comprised of RGB and/or
optical flow CNN outputs.
We propose a sequence to sequence model for video description, where the input is the
sequence of video frames (x_1, . . . , x_n) and the output is the sequence of words (y_1, . . . , y_m).
Naturally, both the input and output are of variable, potentially different, lengths. In our
case, there are typically many more frames than words. In our model, we estimate the
conditional probability of an output sequence given an input sequence, i.e. p(y_1, . . . , y_m | x_1, . . . , x_n).
The main idea to handle variable-length input and output is to first encode the input
sequence of frames, one at a time, representing the video using a latent vector
representation, and then decode from that representation to a sentence, one word at a time.
For an input x_t at time step t, the LSTM computes a hidden/control state h_t and a memory cell
state c_t, which is an encoding of everything the cell has observed until time t:

(h_t, c_t) = \mathrm{LSTM}(x_t, h_{t-1}, c_{t-1})    (2)
Thus, in the encoding phase, given an input sequence X = (x_1, . . . , x_n), the LSTM computes a
sequence of hidden states (h_1, . . . , h_n). During decoding, it defines a distribution over the
output sequence Y = (y_1, . . . , y_m) given the input sequence X as:

p(Y \mid X) = \prod_{t=1}^{m} p(y_t \mid h_{n+t-1}, y_{t-1})    (3)
where the distribution p(y_t | h_{n+t}) is given by a softmax over all of the words in the
vocabulary (see Equation 5). Note that h_{n+t} is obtained from h_{n+t-1} and y_{t-1} based on the
recursion in Equation 2.
Sequence to sequence video to text: Our approach, S2VT, is depicted in Figure 2. While earlier
models first encode the input sequence to a fixed-length vector using one LSTM and then use another
LSTM to map the vector to a sequence of outputs, we rely on a single LSTM for both the
encoding and decoding stages. This allows parameter sharing between the encoding and
decoding stages.
Our model uses a stack of two LSTMs with 1000 hidden units each. Figure 2 shows the
LSTM stack unrolled over time. When two LSTMs are stacked together, as in our case, the
hidden representation (h_t) from the first LSTM layer (colored red) is provided
as the input (x_t) to the second LSTM (colored green). The top LSTM layer in our architecture
is used to model the visual frame sequence, and the next layer is used to model the output
word sequence.
Training and Inference
In the first several time steps, the top LSTM layer (colored red in Figure 2) receives a
sequence of frames and encodes them while the second LSTM layer receives the hidden
representation (ht) and concatenates it with null padded input words (zeros), which it
then encodes. There is no loss during this stage when the LSTMs are encoding.
After all the frames in the video clip are exhausted, the second LSTM layer is fed the
beginning-of-sentence (<BOS>) tag, which prompts it to start decoding its current hidden
representation into a sequence of words.
While training, in the decoding stage the model maximizes the log-likelihood of the
predicted output sentence given the hidden representation of the visual frame sequence and
the previous words it has seen. From Equation 3, for a model with parameters θ and output
sequence Y = (y_1, . . . , y_m), this is formulated as:

\theta^{*} = \arg\max_{\theta} \sum_{t=1}^{m} \log p(y_t \mid h_{n+t-1}, y_{t-1}; \theta)    (4)
This log-likelihood is optimized over the entire training dataset using stochastic gradient
descent. The loss is computed only when the LSTM is learning to decode. Since this loss is
propagated back in time, the LSTM learns to generate an appropriate hidden state
representation (h_n) of the input sequence. The output (z_t) of the second LSTM layer is used
to obtain the emitted word (y). We apply a softmax function to get the probability distribution
over the words y' in the vocabulary V:
p(y \mid z_t) = \frac{\exp(W_y z_t)}{\sum_{y' \in V} \exp(W_{y'} z_t)}    (5)
We note that, during the decoding phase, the visual frame representation for the first LSTM
layer is simply a vector of zeros that acts as padding input. We require an explicit end-of-
sentence tag (<EOS>) to terminate each sentence, since this enables the model to define a
distribution over sequences of varying lengths. At test time, during each decoding step we
choose the word y_t with the maximum probability after the softmax (from Equation 5) until
the model emits the <EOS> token.
We propose a stack of two LSTMs that learn a representation of a sequence of frames in
order to decode it into a sentence that describes the event in the video. The top LSTM layer
(colored red) models visual feature inputs. The second LSTM layer (colored green) models
language given the text input and the hidden representation of the video sequence. We use <BOS>
to indicate begin-of-sentence and <EOS> for the end-of-sentence tag. Zeros are used as a <pad>
when there is no input at the time step.
Attention Models in Computer vision
Since the introduction of Transformer in the work “Attention is all you need”, there has been a
transition in the field of NLP towards replacing Recurrent Neural Networks (RNN) with
attention-based networks.
The aim of attention models is to reduce larger, more complicated tasks into smaller, more
manageable areas of attention to understand and process sequentially.
The models work within neural networks, which are a type of network model with a similar
structure and processing methods as the human brain for simplifying and processing
information. This allows for efficient and sequential data processing, especially when the
network needs to categorize entire datasets.
Self-Attention:
Self-attention refers to the mechanism of relating different positions of a single sequence to compute
a representation of that same sequence. Self-attention works by comparing every word in the
sentence to every other word and reweighting the word embeddings of each word to
include contextual relevance.
Multi-Head Attention:
Multi-Head Attention is a module for attention mechanism that runs an attention module
several times in parallel. Hence, to understand its logic it is first needed to understand the
Attention module itself. The two most commonly used attention functions are Additive
Attention and Dot-Product Attention.
The basic structure of the Attention module is that there are two lists of vectors, x1 and x2, one
which is attended to and the other which attends. The vector x2 generates a ‘query’ while the
vector x1 creates a ‘key’ and a ‘value’. The idea behind the attention function is to map the
query and the set of key-value pairs to an output. “The output is computed as a weighted sum of
the values, where the weight assigned to each value is computed by a compatibility
function of the query with the corresponding key”. The output is computed as follows:
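Assuming the scaled dot-product form used in the Transformer paper, this is:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V

where d_k is the dimensionality of the keys; dividing by \sqrt{d_k} keeps the dot products at a scale where the softmax still has useful gradients.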
The key/value/query concepts come from retrieval systems. For
example, when typing a query on YouTube to search for some video, the search engine will map
your query against a set of keys (video title, description, etc.) linked with candidate videos in
the database. Then, it will present you with the best-matched videos (the values).
Convolutional Block Attention Module (CBAM):
Compared to multi-head attention, this type of attention was designed specifically for feed-
forward convolutional neural networks and can be applied at every convolutional block in
deep networks. CBAM contains two sequential sub-modules called the Channel Attention
Module (CAM) and the Spatial Attention Module (SAM). While channel refers to the number
of features or channels for each pixel, spatial refers to the feature maps of dimension (h x w).
Spatial Attention Module (SAM):
This module is comprised of a three-fold sequential operation. The first part of it is called the
Channel Pool; it applies Max Pooling and Average Pooling across the channels of the input
(c × h × w) to generate an output with shape (2 × h × w). This is the input to a
convolution layer that outputs a 1-channel feature map (1 × h × w). After passing this output
through a Batch-Norm and an optional ReLU, the data goes to a Sigmoid activation layer.
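A minimal sketch of this spatial-attention computation in TensorFlow/Keras (channels-last layout, i.e. (h, w, c) rather than (c, h, w), and the 7x7 kernel size are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

def spatial_attention(x, kernel_size=7):
    # Channel Pool: max- and average-pool across the channel axis -> (batch, h, w, 2)
    max_pool = tf.reduce_max(x, axis=-1, keepdims=True)
    avg_pool = tf.reduce_mean(x, axis=-1, keepdims=True)
    pooled = tf.concat([max_pool, avg_pool], axis=-1)
    # 1-channel convolution + batch norm + sigmoid gives a (batch, h, w, 1) spatial mask
    mask = layers.Conv2D(1, kernel_size, padding='same')(pooled)
    mask = layers.BatchNormalization()(mask)
    mask = tf.sigmoid(mask)
    return x * mask   # re-weight the input feature map spatially

features = tf.random.normal((1, 32, 32, 64))   # a dummy feature map
print(spatial_attention(features).shape)       # (1, 32, 32, 64)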
Channel Attention Module (CAM):
This module first decomposes the input tensor into two vectors of dimensionality (c ×
1 × 1), generated by Global Average Pooling (GAP) and Global Max Pooling (GMP) respectively.
Thereafter, the output goes through a fully connected layer followed by a ReLU activation
layer.
While this is a powerful technique for improving computer vision, the most work so far with
attention mechanisms has focused on Neural Machine Translation (NMT).
Using attention mechanisms in NMT is a much simpler approach. Here, the meaning of a
sentence is mapped into a fixed-length vector, which then generates a translation based on
that vector as a whole. The goal isn’t to translate the sentence word for word, but rather pay
attention to the general, “high level” overall sentiment. Besides greatly improving
accuracy, this attention-driven learning approach is much easier to construct and faster
to train.
By applying bottom-up visual attention computing models to specific computer vision tasks,
the generalization performance of the network can be improved. In the
process of development, attention mechanisms in computer vision have evolved into different
categories and different models.
The global attention model, which is also similar to the soft attention model, collects inputs
from all encoder and decoder states before evaluating the current state to determine the
output. This model uses each encoder step and each decoder preview step to calculate the
attention weights or align weights. This allows the model to find the decoder output.
It has been used in many fields of computer vision, such as classification, detection,
segmentation, model generation, video processing, etc.
Mechanisms of soft attention can be categorized into spatial attention, channel attention, mixed
attention, and self-attention.
Spatial attention:
Spatial attention assigns a weight to each spatial position of the feature map, so that the
network focuses on the most informative regions of the image rather than treating all
locations equally.
Channel attention:
Weights are added to each channel to show the relevance between the channel and the key
information: a greater weight means higher relevancy, and more attention should be
paid to the corresponding channel. Channel attention models the importance of each feature
channel and then enhances or suppresses it for different tasks.
Mixed attention:
Mixed attention combines spatial and channel attention while adding only a few more parameters.
Self-attention:
In a convolutional neural network, the convolution kernel is confined by its size, which can
only use local information to calculate the target pixel, so it may lead to deviations due to the
ignorance of global information.
If each pixel in the feature map is regarded as a random variable and the pairwise covariances
are calculated, the value of each predicted pixel can be enhanced or weakened based on its
similarity to other pixels in the image. The mechanism of employing similar pixels in
training and prediction while ignoring dissimilar pixels is called the self-attention
mechanism.
The goal is to achieve a global reference for each pixel-level prediction. By using the self-attention
mechanism, this global reference can be realized during the training and prediction of models.
The resulting model achieves a better bias-variance trade-off, making its predictions more reasonable.
The local attention model is similar to the global attention model, but it only uses a few encoder
positions to determine the align weights. The model calculates the align weights and context
vector by using the first single-aligned position and a selection of words from the encoder
source.
The local attention model also allows for monotonic alignment and predictive alignment.
Monotonic alignment assumes only select information matters, whereas predictive
alignment allows the model itself to predict the final alignment position. The local attention
model is similar to the hard attention model.
Hard attention:
The mechanism of soft attention has been successfully applied in the field of computer vision.
As the mechanism of hard attention can select important features from input information, it is
observed as a more efficient and direct approach.
Hard attention pays most attention to the elements related to the task and temporarily ignores the
other signals. It segments the input image into several blocks and then applies a self-attention-style
architecture: relevant blocks are selected at each time step, and once determined, the Attention
Agent makes decisions only according to these blocks while ignoring the other blocks.
As usual, back-propagation is utilized to optimize the neural networks.
The upper row: input transforming - the sliding window splits the input image into
smaller blocks and then “flattens” them for future processing.
The middle row: block election - the modified self-attention modules vote between blocks
to generate a vector of block importance.
The lower row: action generation – Attention Agent selects the most important blocks,
extracts corresponding features, and makes decisions on their basis. It has been proven that
Attention Agent has successfully learned to pay attention to different regions in the input
image.
Tips for using attention models:
Explore different models-Consider the different types of models available for attention
mechanism. Think about which may best meet your needs and provide the most accurate
results.
Provide training- It's important to provide consistent back propagation training and
reinforcement to ensure your attention models are accurate and effective. This helps
identify potential errors within your models, helping you find ways to refine and improve them.
Use them for translation- Implement attention models to support language translations.
Using them frequently may help improve the accuracy of your translations.
CASE STUDY:
NAMED ENTITY RECOGNITION:
NER – Definition
• Named entity recognition (NER) ‒ also called entity identification or entity
extraction ‒ is a natural language processing (NLP) technique that automatically
identifies named entities in a text and classifies them into predefined categories.
• An entity is basically the thing that is consistently talked about or referred to in the text.
• Entities can be names of people, organizations, locations, times, quantities, monetary
values, percentages, and more.
• Named entity recognition (NER) is one of the most popular data preprocessing
tasks. It involves the identification of key information in the text and its classification into
a set of predefined categories.
• Named entity recognition (NER) helps to easily identify the key elements in a text,
like names of people, places, brands, monetary values, and more.
• With named entity recognition, key information can be extracted to understand what a
text is about, or merely use it to collect important information to store in a database.
NER – Concept
It detects named entities like person, organization, place, date, etc.
It predicts the entities based on a model which was trained using labelled data.
It is a supervised learning task.
It involves the identification of key information in the text and its classification into a set
of predefined categories.
It is one of the most popular data preprocessing tasks.
NER – Process
• Two Processes that are involved:
Detecting the entities from the text
Classifying them into different categories
• First find the entities mentioned in a given text and
• Assign them to a particular class in our list of predefined entities.
Example (NER tags used: Person, Location):
<START:Person>Nelson<END> lives in <START:Location>India<END>
NER – Steps
• Steps to build the custom NER model for detecting the job role in job postings in
spaCy 3.0:
• Annotate the data to train the model.
• Convert the annotated data into the spaCy bin object.
• Generate the config file from the spaCy website.
• Train the model in the command line.
• Load and test the saved model
• Keyword extraction uses machine learning artificial intelligence (AI) with natural
language processing (NLP) to break down human language so that it can be
understood and analyzed by machines. It’s used to find keywords from all manner of
text: regular documents and business reports, social media comments, online forums
and reviews, news reports, and more.
• Imagine that thousands of online reviews about a product have to be analysed. Keyword
extraction helps to sift through the whole set of data and obtain the words that best
describe each review in just seconds. That way, what customers mention most
often can be seen easily and automatically, saving teams hours upon hours of manual
processing.
• Ambiguity in NE
• Charles has told Nokia employees to come back to their respective offices at least 40
hrs a week or leave the company.
• Charles – Person
• Nokia – Organization
• NER tells what entity each mention is, i.e. it extracts the entities from amidst a lot of text.
Approaches:
• Content Recommendation
• If a user watches a lot of comedies on Netflix, they get more recommendations that have been
classified under the entity Comedy.
Methods of NER
• One way is to train the model for multi-class classification using different machine
learning algorithms, but it requires a lot of labelling. In addition to labelling the
model also requires a deep understanding of context to deal with the ambiguity of
the sentences. This makes it a challenging task for a simple machine learning
algorithm.
• Another way is the Conditional Random Field (CRF), which is implemented by both NLP
speech taggers and NLTK. It is a probabilistic model that can be used to model
sequential data such as words.
Deep Learning Based NER: Deep learning NER is much more accurate than the
previous methods, as it is capable of assembling words. This is due to the fact that it uses
a method called word embedding, which is capable of understanding the semantic and
syntactic relationship between various words. It is also able to learn to analyze topic-
specific as well as high-level words automatically. This makes deep learning NER
applicable to performing multiple tasks. Deep learning can do most of the repetitive
work itself, so researchers, for example, can use their time more efficiently.
• Implementation (Python 3)
import spacy

nlp = spacy.load('en_core_web_sm')
text = "Charles has told Nokia employees to come back to their offices."   # sample text
doc = nlp(text)

# sentence segmentation
sentences = list(doc.sents)
print(sentences)

# tokenization
for token in doc:
    print(token.text)

# print named entities with their labels
for ent in doc.ents:
    print(ent.text, ent.label_)
Applications
• gene identification,
• identification of drug names and disease names. These experiments use CRFs
with features engineered for their domain data.
• Spam filters.
• Sentence translation.
Emotions or attitudes towards a topic can be positive, negative, or neutral. This makes
sentiment analysis a text classification task.
“I am not sure if I liked the movie.” – Neutral
“It was the most worst movie I have ever seen.” – Negative
A recurrent neural network (RNN) is a type of artificial neural network which uses
sequential data or time series data.
It is a type of neural network where the output from the previous step is fed as input
to the current step.
While traditional deep neural networks assume that inputs and outputs are
independent of each other, the output of a recurrent neural network depends on the prior
elements within the sequence.
Why Recurrent Neural Networks?
RNNs were created because there were a few issues with the feed-forward neural
network:
o Cannot handle sequential data
o Considers only the current input
o Cannot memorize previous inputs
The solution to these issues is the RNN. An RNN can handle sequential data,
accepting the current input data, and previously received inputs.
RNNs can memorize previous inputs due to their internal memory (for example, to predict
the next word of a sentence).
How RNN Works:
The input layer ‘x’ takes in the input to the neural network and processes it and passes
it onto the middle layer.
The middle layer ‘h’ can consist of multiple hidden layers, each with its own
activation functions, weights, and biases.
The Recurrent Neural Network will standardize the different activation
functions, weights, and biases so that each hidden layer has the same parameters.
Then, instead of creating multiple hidden layers, it will create one and loop over it as
many times as required.
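In equation form, this looping over a single hidden layer is the standard RNN recurrence:

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y

where the same weight matrices W_{xh}, W_{hh}, W_{hy} are reused at every time step t.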
Document summarization
Problem Statement:
Imagine the task of determining whether a product’s review is positive or negative;
you could do it yourself just by reading it, right? But what happens when the company
you work for sells 2k products every single day? Are you going to read all the
reviews and manually classify them? Let’s be honest, your job would be the worst
ever. That is where Sentiment Analysis comes in and makes your life and job easier.
Solution:
The common and most basic steps in sentiment analysis are:
Remove URLs and email addresses from every single sample — because they won’t
add meaningful value.
Remove punctuation signs — otherwise your model won’t understand that “good!”
and “good” actually mean the same thing.
Lowercase all text — because you want to make the input text as generic as possible.
For example, a “Good” at the beginning of a phrase would otherwise be treated differently
from a “good” in another sample.
Remove stop-words — because they only add noise and won’t make the data more
meaningful. Stop words are very common words like 'if', 'but', 'we', 'he', 'she', and 'they'.
Stemming/Lemmatizing: Lemmatizing generally returns valid words (words that exist), while
stemming techniques return (most of the time) shortened words; that is why
lemmatizing is used more in real-world implementations. This is how lemmatizers vs.
stemmers work: suppose you want to find the root word of ‘caring’:
‘Caring’ -> Lemmatization -> ‘Care’
‘Caring’ -> Stemming -> ‘Car’
Preparing IMDB reviews for Sentiment Analysis:
IMDB movie review dataset is a collection of 50K movie reviews tagged with
corresponding true sentiment value. Out of which 25K reviews belong to the ‘positive‘
category and the rest, 25K belong to the ‘negative‘ sentiment category.
import re
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import keras
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import math
import nltk
We load the dataset into a pandas dataframe with the help of the following code :
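A minimal version of that loading step, assuming the dataset is available locally as a CSV file named 'IMDB Dataset.csv' with 'review' and 'sentiment' columns (the filename is an assumption):

data = pd.read_csv('IMDB Dataset.csv')   # columns assumed: 'review', 'sentiment'
print(data.shape)
data.head()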
Data Preprocessing
First, we need to remove HTML tags, URLs, and non-alphanumeric characters from the
reviews. We do that with the help of the remove_tags function, using regex substitutions:
def remove_tags(string):
    removelist = ""
    result = re.sub(r'<[^>]+>', '', string)               # remove HTML tags
    result = re.sub(r'https?://\S+', '', result)          # remove URLs
    result = re.sub(r'[^\w'+removelist+']', ' ', result)  # remove non-alphanumeric characters
    result = result.lower()
    return result

data['review'] = data['review'].apply(lambda cw: remove_tags(cw))
We also need to remove stopwords from the corpus. Stopwords are commonly used words
like ‘and’, ‘the’, ‘at’ that do not add any special meaning or significance to a sentence. A list
of stopwords are available with nltk, and they can be removed from the corpus using the
following code :
nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
data['review'] = data['review'].apply(lambda x: ' '.join([word for word in x.split() if word not in stop_words]))
Next, we lemmatize the text to obtain the root form of words, known as lemmas. For example, the lemma of the words
reading, reads, read is read. This helps save unnecessary computational overhead in trying to
decipher entire words, as the meanings of most words are well expressed by their separate
lemmas. We perform lemmatization using the WordNetLemmatizer() from nltk. The text is
first broken into individual words using the WhitespaceTokenizer() from nltk. We write a
small helper function for this:

nltk.download('wordnet')   # needed by the WordNet lemmatizer
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = nltk.stem.WordNetLemmatizer()

def lemmatize_text(text):
    st = ""
    for w in w_tokenizer.tokenize(text):
        st = st + lemmatizer.lemmatize(w) + " "
    return st
data['review'] = data.review.apply(lemmatize_text)
data
Next, we print some basic statistics about the dataset and check whether it is balanced (an
equal number of each label). Ideally, the dataset should be balanced, because a severely
imbalanced dataset can bias the classifier towards the majority class.

s = 0.0
for i in data['review']:
    word_list = i.split()
    s = s + len(word_list)
print("Average length of each review : ", s / data.shape[0])
pos = 0
for i in range(data.shape[0]):
    if data.iloc[i]['sentiment'] == 'positive':
        pos = pos + 1
neg = data.shape[0] - pos
print("Percentage of reviews with positive sentiment is " + str(pos / data.shape[0] * 100) + "%")
print("Percentage of reviews with negative sentiment is " + str(neg / data.shape[0] * 100) + "%")
>>Average length of each review : 119.57112
>>Percentage of reviews with positive sentiment is 50.0%
>>Percentage of reviews with negative sentiment is 50.0%
Encoding Labels and Making Train-Test Splits
reviews = data['review'].values
labels = data['sentiment'].values
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)
Finally, we split the dataset into train and test parts using train_test_split from
sklearn.model_selection. We use 80% of the dataset for training and 20% for testing.
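A sketch of that split (the variable names train_sentences and test_sentences are chosen to match the tokenization code below; the stratify option and random_state are assumptions):

train_sentences, test_sentences, train_labels, test_labels = train_test_split(
    reviews, encoded_labels, test_size=0.2, stratify=encoded_labels, random_state=42)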
Before being fed into the LSTM model, the data needs to be tokenized and padded:
Tokenizing: Keras’ inbuilt tokenizer API is fit on the dataset; it splits the sentences into
words and creates a dictionary of all unique words found, each with a uniquely assigned
integer. Each sentence is then converted into an array of integers representing the individual
words present in it.
Sequence Padding: The array representing each sentence is padded with zeroes so that all
sequences are brought to the same length (max_length).
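The tokenizer itself has to be created and fit before these conversions; a minimal sketch (the vocab_size, oov_token, and max_length values are illustrative assumptions):

vocab_size = 3000
max_length = 200
tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')
tokenizer.fit_on_texts(train_sentences)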
# convert train dataset to sequence and pad sequences
train_sequences = tokenizer.texts_to_sequences(train_sentences)
train_padded = pad_sequences(train_sequences, padding='post', maxlen=max_length)
# convert Test dataset to sequence and pad sequences
test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, padding='post', maxlen=max_length)
Building the Model
An embedding layer of dimension 100 converts each word in the sentence into a fixed-length
dense vector of size 100. The input dimension is set as the vocabulary size, and the output
dimension is 100. Each word in the input will hence get represented by a vector of size 100.
A bidirectional LSTM layer of 64 units.
A dense (fully connected) layer of 24 units with relu activation.
A dense layer of 1 unit and sigmoid activation outputs the probability that the review is
positive, i.e. that the label is 1.
# model initialization
embedding_dim = 100   # each word is represented by a vector of size 100 (as described above)
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    keras.layers.Bidirectional(keras.layers.LSTM(64)),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])
# compile model
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# model summary
model.summary()
The model is compiled with binary cross-entropy loss and the Adam optimizer. Since we have a
binary classification problem, binary cross-entropy loss is used; it compares each of the
predicted probabilities to the actual class label (0 or 1). The Adam optimizer is a variant of
stochastic gradient descent used to train deep learning models. Accuracy is used as the primary
evaluation metric.
num_epochs = 5
history = model.fit(train_padded, train_labels,
                    epochs=num_epochs, verbose=1,
                    validation_split=0.1)

prediction = model.predict(test_padded)
# Get labels based on probability: 1 if p >= 0.5 else 0
pred_labels = []
for i in prediction:
    if i >= 0.5:
        pred_labels.append(1)
    else:
        pred_labels.append(0)
print("Accuracy of prediction on test set : ", accuracy_score(test_labels,pred_labels))
The accuracy of prediction on the test set comes out to be 87.27%! You can improve the
accuracy further by playing around with the model hyperparameters, further tuning the model
architecture, or changing the train-test split ratio. You could also train the model for a larger
number of epochs; we stopped at five epochs because of the computational time. Ideally, the
model should be trained until the training and validation losses converge.
We can use our trained model to determine the sentiment of new, unseen movie reviews not
present in the dataset. Each new text must be tokenized and padded before being fed as input
to the model. The model.predict() function returns the probability of a positive review. If
the probability is greater than 0.5, we consider the review to be positive, else
negative.
# Example inference on new, unseen reviews (the sample sentences below are made up)
sentence = ["The movie was a masterpiece, I loved every minute of it.",
            "A complete waste of time, the plot made no sense."]
# New text must be tokenized and padded exactly like the training data
sequences = tokenizer.texts_to_sequences(sentence)
padded = pad_sequences(sequences, padding='post', maxlen=max_length)
prediction = model.predict(padded)
pred_labels = []
for i in prediction:
    if i >= 0.5:
        pred_labels.append(1)
    else:
        pred_labels.append(0)
for i in range(len(sentence)):
    print(sentence[i])
    if pred_labels[i] == 1:
        s = 'Positive'
    else:
        s = 'Negative'
    print("Predicted sentiment : ", s)
Conclusion
Sentiment analysis, like any other classification task, can be performed with many different
machine learning and deep learning models, such as Naive Bayes, KNN, SVM or CNN,
ANN, etc. Now that you know its basics go ahead and explore other models to perform
sentiment analysis.
Long Short-Term Memory is an advanced version of recurrent neural network
(RNN) architecture that was designed to model chronological sequences and their
long-range dependencies more precisely than conventional RNNs.
The major highlights include the interior design of a basic LSTM cell, the variations
brought into the LSTM architecture, and a few applications of LSTMs that are highly
in demand.
It also makes a comparison between LSTMs and GRUs. The article concludes with
a list of disadvantages of the LSTM network and a brief introduction of the
upcoming attention-based models that are swiftly replacing LSTMs in the real
world.
Introduction:
The basic difference between the architectures of RNNs and LSTMs is that the
hidden layer of LSTM is a gated unit or gated cell.
It consists of four layers that interact with one another in a way to produce the
output of that cell along with the cell state. These two things are then passed onto
the next hidden layer.
Unlike RNNs, which have only a single neural net layer of tanh, an LSTM
comprises three logistic sigmoid gates and one tanh layer.
Gates have been introduced in order to limit the information that is passed through the cell.
They determine which part of the information will be needed by the next cell and which
part is to be discarded
The output is usually in the range of 0-1 where ‘0’ means ‘reject all’ and ‘1’ means
‘include all’.
Conventional LSTM:
Variations:
With the increasing popularity of LSTMs, various alterations have been tried on the
conventional LSTM architecture to simplify the internal design of cells to make
them work in a more efficient way and to reduce the computational complexity.
Gers and Schmidhuber introduced peephole connections which allowed gate layers
to have knowledge about the cell state at every instant. Some LSTMs also made use
of a coupled input and forget gate instead of two separate gates that helped in
making both the decisions simultaneously.
Another variation was the use of the Gated Recurrent Unit (GRU), which reduced
the design complexity by reducing the number of gates. It uses a combination of the
cell state and hidden state, and also an update gate which has the forget and input
gates merged into it.
1. Figure-A represents what a basic LSTM network looks like. Only one layer of LSTM
between an input and output layer has been shown here.
2. Figure-B represents Deep LSTM which includes a number of LSTM layers in between
the input and output. The advantage is that the input values fed to the network not only
go through several LSTM layers but also propagate through time within one LSTM cell.
Hence, parameters are well distributed within multiple layers. This results in thorough
processing of the inputs at each time step.
3. Figure-C represents LSTM with the Recurrent Projection layer where the recurrent
connections are taken from the projection layer to the LSTM layer input. This
architecture was designed to reduce the high learning computational complexity (O(N) per
time step) of the standard LSTM RNN.
4. Figure-D represents Deep LSTM with a Recurrent Projection Layer consisting of
multiple LSTM layers where each layer has its own projection layer. The increased
depth is quite useful in the case where the memory size is too large. Having increased
depth prevents overfitting in models as the inputs to the network need to go through
many nonlinear functions.
Applications:
1. Language modelling or text generation, which involves predicting the next words when a
sequence of words is fed as input. Language models can operate at the character
level, n-gram level, sentence level, or even paragraph level.
2. Image processing, which involves analyzing a picture and summarizing its content in a
sentence, as well as speech and handwriting recognition.
3. Music generation which is quite similar to that of text generation where LSTMs predict
musical notes instead of text by analyzing a combination of given notes fed as input.
4. Language Translation involves mapping a sequence in one language to a sequence in
another language.
Disadvantages
1. The cell has become quite complex with the additional features (such as forget
gates) being brought into the picture.
2. Hardware-wise, LSTMs become quite inefficient.
3. LSTMs are affected by different random weight initializations and hence can behave quite
similarly to a feed-forward neural net. They prefer small weight initializations
instead.
4. LSTMs are prone to overfitting and it is difficult to apply the dropout algorithm to curb
this issue.