
Natural language processing-unit-III

Deep Learning for NLP


Introduction to Deep Learning :

Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind driverless
cars, enabling them to recognize a stop sign.
It is a type of machine learning based on artificial neural networks in which multiple layers of
processing are used to extract progressively higher-level features from the data.

Deep learning is a subfield of machine learning that is inspired by the functioning of the brain. Just as
neurons are interconnected in the brain, neural networks work in a similar way. Each neuron
takes an input, performs some kind of manipulation within the neuron, and produces an output that is
closer to the expected output.

Deep learning process

• Understand the problem
• Identify data
• Select a deep learning algorithm
• Train the model
• Test the model

Deep learning applications are

• Self-driving cars
• Voice-controlled assistants
• Automatic image caption generation
• Automatic machine translation

Deep learning is computer software that imitates the network of neurons in a brain. It is a subset
of machine learning based on artificial neural networks with representation learning. It is called deep
learning because it makes use of deep neural networks. This learning can be supervised, semi-
supervised, or unsupervised.

The network has 3 components (a minimal code sketch of such a network follows the list):

• Input layer

• Hidden layer/layers

• Output layer
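
As an illustration, here is a minimal sketch (in NumPy, with made-up layer sizes) of a forward pass through a network with an input layer, one hidden layer, and an output layer:

import numpy as np

# Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # input -> hidden weights
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)    # hidden -> output weights

def forward(x):
    hidden = np.tanh(x @ W1 + b1)                # hidden layer with tanh activation
    logits = hidden @ W2 + b2                    # output layer (pre-softmax scores)
    return logits

x = rng.normal(size=(1, 4))                      # one example with 4 features
print(forward(x))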

The functions can be of different types based on the problem or the data. These are also called
activation functions.

The function types are listed below (a short code sketch of each follows the list):

• Linear Activation functions: A linear neuron takes a linear combination of the weighted inputs, and
the output can take any value between -infinity and infinity.

• Nonlinear Activation function: These are the most used ones, and they make the output restricted
between some range.

Informally, something non-linear does not progress or develop smoothly and proportionally from one
stage to the next; it can change suddenly or develop in different directions at the same time. Nonlinear
activation functions give the network this kind of flexibility.

• Sigmoid or Logit Activation Function: It squashes the output into the range 0 to 1
by applying the logistic (sigmoid) function, which makes classification problems easier.

• Softmax function: Softmax is similar to the sigmoid, but it computes the probabilities of
the event over ‘n’ different classes, which is useful for determining the target in multiclass
classification problems.

• Tanh Function: The range of the tanh function is from (-1 to 1), and the rest remains the
same as sigmoid.

• Rectified Linear Unit Activation function: ReLU converts anything that is less than zero to
zero. So, the range becomes 0 to infinity.
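
The following is a minimal sketch of these activation functions in NumPy (illustrative only; deep learning frameworks provide their own implementations):

import numpy as np

def linear(x):
    return x                                   # unbounded: (-inf, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # squashes values into (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))                  # shift for numerical stability
    return e / e.sum()                         # probabilities over n classes, sums to 1

def tanh(x):
    return np.tanh(x)                          # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)                    # negatives become 0, range [0, inf)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")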

Convolutional Neural Networks :

Convolutional Neural Networks (CNNs) are similar to ordinary neural networks but have multiple
hidden layers and a filter called the convolution layer. CNNs are successful at identifying faces, objects,
and traffic signs and are also used in self-driving cars.

Deep learning algorithms basically work on numerical data. Images and text are unstructured data, and they
need to be converted into numerical values before any processing can start.

• Image: A computer takes an image as an array of pixel values. Depending on the resolution and size
of the image, it will see an X x Y x Z array of numbers.

For example, suppose there is a color image whose size is 480 x 480 pixels. The representation of the array
will be 480 x 480 x 3, where 3 corresponds to the RGB channels of the color. Each of these numbers varies from 0 to
255 and describes the pixel intensity at that point. The idea is that, given this array of numbers, the
computer outputs the probability of the image belonging to a certain class
in a classification problem; a short sketch of this array representation is given below.
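
As a quick sketch (using NumPy and an assumed 480 x 480 color image named "photo.jpg", loaded with the Pillow imaging library), the image-to-array conversion looks like this:

import numpy as np
from PIL import Image  # Pillow, assumed to be installed

# "photo.jpg" is a hypothetical 480 x 480 color image.
img = Image.open("photo.jpg").convert("RGB")
pixels = np.asarray(img)

print(pixels.shape)        # (480, 480, 3): height x width x RGB channels
print(pixels.dtype)        # uint8: each value is an intensity from 0 to 255
print(pixels[0, 0])        # the RGB triple of the top-left pixel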

For text, RNN and LSTM techniques are used to convert the text into features.



Architecture :

A CNN is a special case of a neural network with an input layer, output layer, and multiple hidden
layers. The hidden layers carry out 4 different procedures to complete the network: convolution, nonlinearity (ReLU), pooling, and classification (flatten, fully connected, and softmax layers).

The convolution layer is the heart of a Convolutional Neural Network and does most of the
computational work. The name comes from the “convolution” operator, which extracts features
from the input image using small matrices called filters. The matrix formed by sliding a filter over the
full image and calculating the dot product between the filter and each image patch is called the ‘Convolved
Feature’, ‘Activation Map’, or ‘Feature Map’.

During training, the CNN learns the values inside the filters and uses
them on the test data. The greater the number of filters, the more image features get
extracted, and the better the network becomes at recognizing patterns in unseen images.
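
A minimal sketch of the convolution operation (pure NumPy, with a made-up 3 x 3 vertical-edge-style filter and no padding or stride options) is shown below:

import numpy as np

def convolve2d(image, kernel):
    """Slide the filter over the image and take the dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)   # dot product of filter and patch
    return feature_map

image = np.random.rand(6, 6)                  # toy single-channel "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])               # hypothetical vertical-edge filter
print(convolve2d(image, kernel).shape)        # (4, 4) feature map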

Nonlinearity (ReLU) :

ReLU (Rectified Linear Unit) is a nonlinear function that is applied after a convolution layer in a CNN
architecture. It replaces all negative values in the feature map with zero. The purpose of ReLU is to introduce
nonlinearity into the CNN so that it performs better.

Pooling :

Pooling or subsampling is used to decrease the dimensionality of the features without losing
important information. It is done to reduce the huge number of inputs to the fully connected layer and
the computation required to process the model. It also helps to reduce overfitting. Max pooling typically
uses a 2 x 2 window that slides over the feature map and takes the maximum value in each region. This is
how it reduces dimensionality.
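
A minimal sketch of 2 x 2 max pooling (NumPy, assuming non-overlapping windows):

import numpy as np

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2 x 2 region."""
    h, w = feature_map.shape
    trimmed = feature_map[:h // 2 * 2, :w // 2 * 2]          # trim to even dimensions
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)           # group into 2 x 2 blocks
    return blocks.max(axis=(1, 3))                           # max inside each block

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]], dtype=float)
print(max_pool_2x2(fm))
# [[6. 4.]
#  [8. 9.]]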

Flatten, Fully Connected, and Softmax Layers :

Flattening is converting the data into a 1-dimensional array for input to the next layer.
The last layer is a dense layer that needs a feature vector as input, but the output from the pooling
layer is not a 1D feature vector. This process of converting the output of convolution and pooling into a feature
vector is called flattening.

The fully connected layer takes its input from the flatten layer and gives out an N-dimensional
vector, where N is the number of classes.

The function of the fully connected layer is to use these features for classifying the input image into
various classes based on the loss function on the training dataset.

The Softmax function is used at the very end to convert these N-dimensional vectors into a
probability for each class, which will eventually classify the image into a particular class.
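
A minimal sketch of the flatten, fully connected, and softmax steps (NumPy, with made-up sizes: a 4 x 4 x 8 pooled output and N = 3 classes):

import numpy as np

rng = np.random.default_rng(1)
pooled = rng.random((4, 4, 8))                 # hypothetical output of the pooling layer

flat = pooled.reshape(-1)                      # flatten: 4*4*8 = 128-dimensional vector

W = rng.normal(size=(128, 3))                  # fully connected weights for N = 3 classes
b = np.zeros(3)
logits = flat @ W + b                          # N-dimensional output vector

probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax: one probability per class
print(probs, probs.sum())                      # probabilities sum to 1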

Recurrent Neural Networks :

CNNs are mainly used for computer vision problems but are not suited to sequence models. Sequence
models are those in which the order of the elements matters; for example, in text, the
order of the words matters for creating meaningful sentences. This is where RNNs come into the
picture. They are useful with sequential data because each neuron can use its memory to remember
information about the previous step.

To see how an RNN works, note that the recurrent neural network takes the output of the hidden layer
and feeds it back into the same layer, together with the next input, before giving the prediction.
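
A minimal sketch of that recurrence (NumPy, with made-up sizes: 5 input features, 8 hidden units), showing the hidden state being fed back at every time step:

import numpy as np

rng = np.random.default_rng(2)
Wx = rng.normal(size=(5, 8))          # input -> hidden weights
Wh = rng.normal(size=(8, 8))          # hidden -> hidden (recurrent) weights
b = np.zeros(8)

def rnn_forward(inputs):
    """inputs: a sequence of time steps, each a vector of 5 features."""
    h = np.zeros(8)                                   # initial hidden state
    for x_t in inputs:
        h = np.tanh(x_t @ Wx + h @ Wh + b)            # hidden state fed back each step
    return h                                          # final state summarizes the sequence

sequence = rng.normal(size=(10, 5))                   # toy sequence of 10 time steps
print(rnn_forward(sequence).shape)                    # (8,)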

Training RNN – Backpropagation Through Time (BPTT) :

The following describes how training is done in the case of an RNN.

Considering just the hidden layer: at each time step, it takes not only the current input but also the
hidden state from the previous time step as an additional input. Backpropagation then happens as in
any other network, except that the error is backpropagated from the last timestamp to the first by
unrolling the hidden layers through time. This allows the error to be calculated for each timestamp and the
weights to be updated. Recurrent networks with recurrent connections between hidden units read an entire
sequence and then produce the required output.

When the gradient values become too small, the model takes far too long to learn (or stops learning
altogether); this is called the vanishing gradient problem. This problem is largely solved by LSTMs.
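
A toy illustration of why gradients vanish during BPTT (the per-step factor of 0.5 is an assumed value, not taken from any real network): the gradient reaching the first timestamp is scaled by a product of per-step factors, so if each factor is below 1 it shrinks exponentially with sequence length.

# Toy illustration of the vanishing gradient problem in BPTT.
per_step_factor = 0.5            # assumed |recurrent weight * activation derivative| per step
for seq_len in (5, 20, 100):
    grad_scale = per_step_factor ** seq_len
    print(f"sequence length {seq_len:3d}: gradient scaled by {grad_scale:.3e}")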

Long Short-Term Memory (LSTM) :

LSTMs are a kind of RNN with improvements in their equations and backpropagation, which makes them
perform better. LSTMs work in much the same way as RNNs, but these units can learn dependencies across very long
time gaps, and they can store information much like computers.

The algorithm learns the importance of a word or character through a weighting methodology and
decides whether to store it or not. For this, it uses regulated structures called gates, which have the
ability to remove or add information to the cell state. These gates contain a sigmoid layer that decides how
much information should be passed. An LSTM has three gates, namely the “input,” “forget,” and “output” gates, to
carry out this process.
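
A minimal sketch of a single LSTM cell step (NumPy, one set of hypothetical weights per gate), showing how the sigmoid-activated input, forget, and output gates regulate the cell state:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n_in, n_hid = 5, 8
# One weight matrix per gate plus one for the candidate cell values (biases set to zero).
Wf, Wi, Wo, Wc = (rng.normal(size=(n_in + n_hid, n_hid)) for _ in range(4))
bf = bi = bo = bc = np.zeros(n_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])            # current input plus previous hidden state
    f = sigmoid(z @ Wf + bf)                     # forget gate: how much old memory to keep
    i = sigmoid(z @ Wi + bi)                     # input gate: how much new info to store
    o = sigmoid(z @ Wo + bo)                     # output gate: how much memory to expose
    c_tilde = np.tanh(z @ Wc + bc)               # candidate new cell values
    c = f * c_prev + i * c_tilde                 # update the cell state
    h = o * np.tanh(c)                           # new hidden state
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(10, n_in)):          # toy sequence of 10 time steps
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)                          # (8,) (8,)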
