
Natural Language Processing

Natural Language Processing (NLP) is one of the branches of AI that helps machines
understand and process human languages such as English or Hindi, analysing them to
derive their meaning.
NLP takes as input the data from spoken words, verbal commands or speech-recognition
software that humans use in their daily lives, and operates on it.

Applications of Natural Language processing


Some of the applications of Natural Language Processing used in real life are
described below.
Automatic Summarization

1. Automatic summarization is useful for condensing the meaning of documents
and information, and also for understanding the emotional meaning within
the information (such as when collecting data from social media).
2. For example: newsletters, social media marketing, video scripting, etc.

Sentiment Analysis

1. Identify sentiment among several posts or even in the same post where
emotion is not always explicitly expressed.
2. Companies use it to identify opinions and sentiments to understand what
customers think about their products and services.
Text classification

1. Text classification makes it possible to assign predefined categories to a


document and organize it to help you find the information you need or
simplify some activities.
2. For example, an application of text categorization is spam filtering in email.
Virtual Assistants

1. Nowadays Google Assistant, Cortana, Siri, Alexa, etc have become an integral
part of our lives. Not only can we talk to them but they also have the ability
to make our lives easier.
2. By accessing our data, they can help us in keeping notes of our tasks, making
calls for us, sending messages, and a lot more.
3. With the help of speech recognition, these assistants can not only detect our
speech but can also make sense of it.
4. According to recent research, a lot more advancements are expected in this
field in the near future.
Chatbots

• A chatbot is a software application used to conduct an online chat
conversation via text or text-to-speech, in lieu of providing direct contact with
a live human agent. Some of the popular chatbots are: Mitsuku Bot, Clever
Bot, Jabberwacky, Haptik, Rose and ChatBot.
There are two types of chatbots:
1. Script bot
2. Smart bot
Differentiate between a script-bot and a smart-bot.
• A scripted chatbot doesn't carry even a glimpse of AI, whereas smart bots
are built on NLP and ML.
• Script bots are easy to make; smart bots are comparatively difficult to make.
• A script bot's functioning is very limited as it is less powerful; smart bots
are flexible and powerful.
• Script bots work around a script which is programmed into them; smart bots
work on bigger databases and other resources directly.
• Script bots need no or little language-processing skill; smart bots require
NLP and machine-learning skills.
• Script bots have limited functionality; smart bots have wide functionality.
• Examples of script bots: the bots deployed in the customer-care sections of
various companies. Examples of smart bots: Google Assistant, Alexa, Cortana,
Siri, etc.

Analogy with programming

• Different syntax, same semantics: 2+3 = 3+2
o Here the way the two statements are written is different, but their
meaning is the same, that is 5.
• Different semantics, same syntax: watch = watch
o Here the two statements have the same syntax (the same word, "watch"),
but their meanings are different: one may refer to a wristwatch, the
other to the act of watching.

Multiple meanings of a word

Sentence 1: “His face turned red after he found out that he took the wrong
bag”
Possible meanings:
• Is he feeling ashamed because he took another person’s bag instead of his?
• Is he feeling angry because he did not manage to steal the bag that he has
been targeting?

Sentence 2: “The red car zoomed past his nose”

Possible meanings: Probably talking about the colour of the car.

Sentence 3: “His face turned red after consuming the medicine”


Possible meanings: Is he having an allergic reaction?
Or is he not able to bear the taste of that medicine?
Perfect Syntax, no meaning

“Chickens feed extravagantly while the moon drinks tea.”

• This statement is correct in syntax but does this make any sense?
• In human language, a perfect balance of syntax and semantics is important
for better understanding.

Text Normalisation process

In Text Normalisation, we carry out several steps to normalise the text to a
lower level. We work on text from multiple documents, and the term used for
the whole textual data from all the documents together is corpus.

1. Sentence Segmentation

Under sentence segmentation, the whole corpus is divided into sentences. Each
sentence is taken as a different data so now the whole corpus gets reduced to
sentences.

Example:

Before Sentence Segmentation


“You want to see the dreams with close eyes and achieve them? They’ll remain
dreams, look for AIMs and your eyes have to stay open for a change to be seen.”

After Sentence Segmentation


1. You want to see the dreams with close eyes and achieve them?
2. They’ll remain dreams, look for AIMs and your eyes have to stay open for a
change to be seen.
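The segmentation step above can be sketched in Python. This is a minimal sketch using the standard `re` module (the chapter does not prescribe a tool; real systems often use libraries such as NLTK or spaCy instead):

```python
import re

corpus = ("You want to see the dreams with close eyes and achieve them? "
          "They'll remain dreams, look for AIMs and your eyes have to stay "
          "open for a change to be seen.")

# Split after sentence-ending punctuation (., ?, !) followed by whitespace.
sentences = re.split(r'(?<=[.?!])\s+', corpus)

for i, sentence in enumerate(sentences, start=1):
    print(f"{i}. {sentence}")
```

The lookbehind `(?<=[.?!])` keeps the punctuation attached to its sentence instead of discarding it at the split point.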
2. Tokenisation
After segmenting the sentences, each sentence is then further divided into tokens.
A “Token” is a term used for any word or number or special character occurring in
a sentence.

1. You want to see the dreams with close eyes and achieve them?

Tokens: You | want | to | see | the | dreams | with | close | eyes | and |
achieve | them | ?

Under Tokenisation, every word, number, and special character is considered


separately and each of them is now a separate token.
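Tokenisation can likewise be sketched with a regular expression. This is only an illustration; the pattern below treats each run of letters/digits as one token and each punctuation mark as its own token, which matches the example sentence above:

```python
import re

sentence = "You want to see the dreams with close eyes and achieve them?"

# \w+ matches a run of letters/digits (a word or number);
# [^\w\s] matches a single special character such as '?'.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
```

Note how the final `?` comes out as a separate token, exactly as in the token list above.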
3. Removal of Stopwords

In this step, the tokens which are not necessary are removed from the token list.
These could be words, numbers or special characters. Removing them makes it
easier for the computer to focus on the meaningful terms.

Stopwords: Stopwords are the words that occur very frequently in the corpus but
do not add any value to it.

Examples: a, an, and, are, as, for, it, is, into, in, if, on, or, such, the, there, to.

Example

1. You want to see the dreams with close eyes and achieve them?
The removed words would be:
o to, the, and, ?
2. The outcome would be:
o You want see dreams with close eyes achieve them
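Stopword removal is a simple filter over the token list. A minimal sketch, using the stopword list given above (the `?` is added to the set here so the punctuation token is also dropped, as in the example):

```python
# Stopword list from the chapter, plus '?' so punctuation is removed too.
stopwords = {"a", "an", "and", "are", "as", "for", "it", "is", "into",
             "in", "if", "on", "or", "such", "the", "there", "to", "?"}

tokens = ["You", "want", "to", "see", "the", "dreams", "with",
          "close", "eyes", "and", "achieve", "them", "?"]

# Keep only the tokens whose lowercase form is not a stopword.
filtered = [t for t in tokens if t.lower() not in stopwords]
print(filtered)
```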

4. Converting text to a common case

We convert the whole text into the same case, preferably lower case. This ensures
that the machine's case sensitivity does not treat the same words as different
just because of different cases.
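A one-line demonstration of why this step matters: without it, the same word in different cases counts as several distinct tokens.

```python
words = ["Dreams", "dreams", "DREAMS"]

# Without normalisation the machine sees three different strings.
print(len(set(words)))

# After lowercasing, all three collapse into one token.
normalised = [w.lower() for w in words]
print(len(set(normalised)))
```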

5. Stemming: Stemming is a technique used to extract the base form of the words
by removing affixes from them. It is just like cutting down the branches of a
tree to its stems.

Words Affixes Stem

healing ing heal

dreams s dream

caring ing car
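The table above can be reproduced with a deliberately naive suffix-stripping stemmer. This is only a sketch of the idea (real stemmers such as the Porter stemmer use many more rules), but it shows the key property that stemming can produce non-words like "car" from "caring":

```python
def naive_stem(word, affixes=("ing", "s", "es", "ed")):
    """Strip the first matching affix from the end of the word.

    Like real stemmers, this rule-based approach can produce
    non-words (e.g. 'caring' -> 'car')."""
    for affix in affixes:
        # Require some letters to remain after stripping the affix.
        if word.endswith(affix) and len(word) > len(affix) + 1:
            return word[:-len(affix)]
    return word

for w in ["healing", "dreams", "caring"]:
    print(w, "->", naive_stem(w))
```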


6. Lemmatization: Lemmatization is an organised, step-by-step procedure of
obtaining the root form (lemma) of a word. Unlike a stem, a lemma is always a
meaningful word.

Words Affixes Lemma

healing ing heal

dreams s dream

caring ing care
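Unlike the rule-based stemmer, a lemmatizer consults a dictionary of known word forms (NLTK's WordNetLemmatizer, for instance, looks words up in WordNet). A toy lookup-table sketch of the same idea, using the three words from the table:

```python
# Toy lemma dictionary covering only the example words; a real
# lemmatizer consults a full vocabulary such as WordNet.
lemmas = {"healing": "heal", "dreams": "dream", "caring": "care"}

def lemmatize(word):
    # Fall back to the word itself when no lemma is known.
    return lemmas.get(word, word)

for w in ["healing", "dreams", "caring"]:
    print(w, "->", lemmatize(w))
```

Note that "caring" becomes "care", a valid word, where the stemmer above produced "car".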

Bag of Words

Bag of Words is a Natural Language Processing model that helps extract features
from text which can then be used in machine learning algorithms. In bag
of words, we count the occurrences of each word and construct the vocabulary for
the corpus.

The following steps should be followed to implement the bag of words:

1. Text Normalisation: Collect data and pre-process it


2. Create Dictionary: Make a list of all the unique words occurring in the corpus.
(Vocabulary)
3. Create document vectors: For each document in the corpus, find out how
many times the word from the unique list of words has occurred.
4. Create document vectors for all the documents.

Let us go through all the steps with an example


Step 1: Pre process the documents.
Document 1: Aman and Anil are stressed
Document 2: Aman went to a therapist
Document 3: Anil went to download a health chatbot
Here are three documents having one sentence each. After text normalisation, the
text becomes:
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, download, a, health, chatbot]
Note that no tokens have been removed in the stop words removal step. It is
because we have very little data and since the frequency of all the words is
almost the same, no word can be said to have lesser value than the other.
Step 2: Create Dictionary: Go through all the documents and create a dictionary,
i.e., list down all the unique words occurring in the three documents:
aman, and, anil, are, stressed, went, to, a, therapist, download, health, chatbot
Note that dictionary in NLP means a list of all the unique words occurring in
the corpus. Even though some words are repeated in different documents,
they are all written just once, since while creating the dictionary we create
a list of unique words.

Step 3: Create document vector In this step, the vocabulary is written in the top
row. Now, for each word in the document, if it matches with the vocabulary, put a
1 under it. If the same word appears again, increment the previous value by 1.
And if the word does not occur in that document, put a 0 under it.

Step 4: Repeat the same for all the documents.

Finally, the words have been converted to numbers. These numbers form the
vector for each document. Here, we can see that since we have very little data,
words like ‘are’ and ‘and’ also have a high value.
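The four steps of the worked example can be put together in a short Python sketch. The documents are the normalised token lists from Step 1; the dictionary is built in order of first appearance, and each document vector counts how often each vocabulary word occurs in that document:

```python
# Step 1: the three documents after text normalisation.
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]

# Step 2: dictionary of unique words, in order of first appearance.
vocabulary = []
for doc in documents:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3-4: one count vector per document.
vectors = [[doc.count(word) for word in vocabulary] for doc in documents]

print(vocabulary)
for vec in vectors:
    print(vec)
```

Running this gives a 12-word vocabulary and one vector of twelve counts per document, with 1s under the words each document contains and 0s elsewhere.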
