
Natural Language Processing

Natural Language Processing (NLP) is one of the branches of AI that helps machines
understand and process human languages such as English or Hindi, analysing them to
derive their meaning.
NLP takes as input the data from spoken words, verbal commands or speech-recognition
software that humans use in their daily lives, and operates on it.

Applications of Natural Language processing


Some of the applications of Natural Language Processing used in real life are
described below.
Automatic Summarization

1. Automatic summarization is useful for condensing the meaning of documents
and information, and also for understanding the emotional meaning within
the information (such as when collecting data from social media).
2. For example: newsletters, social media marketing, video scripting, etc.

Sentiment Analysis

1. Identify sentiment among several posts or even in the same post where
emotion is not always explicitly expressed.
2. Companies use it to identify opinions and sentiments to understand what
customers think about their products and services.
Text classification

1. Text classification makes it possible to assign predefined categories to a


document and organize it to help you find the information you need or
simplify some activities.
2. For example, an application of text categorization is spam filtering in email.
Virtual Assistants

1. Nowadays Google Assistant, Cortana, Siri, Alexa, etc have become an integral
part of our lives. Not only can we talk to them but they also have the ability
to make our lives easier.
2. By accessing our data, they can help us in keeping notes of our tasks, making
calls for us, sending messages, and a lot more.
3. With the help of speech recognition, these assistants can not only detect our
speech but can also make sense of it.
4. According to recent research, a lot more advancements are expected in this
field in the near future.
Chatbots

• A chatbot is a software application used to conduct an online chat
conversation via text or text-to-speech, in lieu of providing direct contact with
a live human agent. Some of the popular chatbots are: Mitsuku Bot, Clever
Bot, Jabberwacky, Haptik, Rose and ChatBot.
There are two types of chatbots:
1. Script bot
2. Smart bot
Differentiate between a script-bot and a smart-bot.
• A scripted chatbot doesn't carry even a glimpse of AI, whereas smart bots
are built on NLP and ML.
• Script bots are easy to make; smart bots are comparatively difficult to make.
• A script bot's functioning is very limited as it is less powerful; smart bots
are flexible and powerful.
• Script bots work around a script which is programmed into them; smart bots
work on bigger databases and other resources directly.
• Script bots need no or little language-processing skill; smart bots require
NLP and machine-learning skills.
• Script bots have limited functionality; smart bots have wide functionality.
• Examples of script bots: the bots deployed in the customer-care sections of
various companies. Examples of smart bots: Google Assistant, Alexa, Cortana,
Siri, etc.

Analogy with programming

• Different syntax, same semantics: 2+3 = 3+2
o Here the way the two statements are written is different, but their
meaning is the same, that is 5.
• Different semantics, same syntax: watch = watch
o Here the two statements have the same syntax (the same word, "watch"),
but their meanings are different: one may refer to a wristwatch, the
other to the act of watching.

Multiple meanings of a word

Sentence 1: “His face turned red after he found out that he took the wrong
bag”
Possible meanings:
• Is he feeling ashamed because he took another person’s bag instead of his?
• Is he feeling angry because he did not manage to steal the bag that he has
been targeting?

Sentence 2: “The red car zoomed past his nose”

Possible meanings: Probably talking about the colour of the car.

Sentence 3: “His face turned red after consuming the medicine”


Possible meanings: Is he having an allergic reaction?
Or is he not able to bear the taste of that medicine?
Perfect Syntax, no meaning

“Chickens feed extravagantly while the moon drinks tea.”

• This statement is correct in syntax but does this make any sense?
• In human language, a perfect balance of syntax and semantics is important
for better understanding.

Text Normalisation process

In Text Normalisation, we carry out several steps to normalise the text to a
lower level. We work on text from multiple documents, and the term used for
the whole textual data from all the documents together is corpus.

1. Sentence Segmentation

Under sentence segmentation, the whole corpus is divided into sentences. Each
sentence is taken as a different data so now the whole corpus gets reduced to
sentences.

Example:

Before Sentence Segmentation


“You want to see the dreams with close eyes and achieve them? They’ll remain
dreams, look for AIMs and your eyes have to stay open for a change to be seen.”

After Sentence Segmentation


1. You want to see the dreams with close eyes and achieve them?
2. They’ll remain dreams, look for AIMs and your eyes have to stay open for a
change to be seen.
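The segmentation step above can be sketched in Python. This is a minimal sketch using the standard `re` module (the chapter does not prescribe a tool; real systems often use libraries such as NLTK or spaCy instead):

```python
import re

corpus = ("You want to see the dreams with close eyes and achieve them? "
          "They'll remain dreams, look for AIMs and your eyes have to stay "
          "open for a change to be seen.")

# Split after sentence-ending punctuation (., ?, !) followed by whitespace.
sentences = re.split(r'(?<=[.?!])\s+', corpus)

for i, sentence in enumerate(sentences, start=1):
    print(f"{i}. {sentence}")
```

The lookbehind `(?<=[.?!])` keeps the punctuation attached to its sentence instead of discarding it at the split point.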
2. Tokenisation
After segmenting the sentences, each sentence is then further divided into tokens.
A “Token” is a term used for any word or number or special character occurring in
a sentence.

1. You want to see the dreams with close eyes and achieve them?

Tokens: You | want | to | see | the | dreams | with | close | eyes | and |
achieve | them | ?

Under Tokenisation, every word, number, and special character is considered


separately and each of them is now a separate token.
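Tokenisation can likewise be sketched with a regular expression. This is only an illustration; the pattern below treats each run of letters/digits as one token and each punctuation mark as its own token, which matches the example sentence above:

```python
import re

sentence = "You want to see the dreams with close eyes and achieve them?"

# \w+ matches a run of letters/digits (a word or number);
# [^\w\s] matches a single special character such as '?'.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
```

Note how the final `?` comes out as a separate token, exactly as in the token list above.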
3. Removal of Stopwords

In this step, the tokens which are not necessary are removed from the token list.
These could be words, numbers or special characters. Removing them makes it
easier for the computer to focus on the meaningful terms.

Stopwords: Stopwords are the words that occur very frequently in the corpus but
do not add any value to it.

Examples: a, an, and, are, as, for, it, is, into, in, if, on, or, such, the, there, to.

Example

1. You want to see the dreams with close eyes and achieve them?
The removed words would be:
o to, the, and, ?
2. The outcome would be:
o You want see dreams with close eyes achieve them
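Stopword removal is a simple filter over the token list. A minimal sketch, using the stopword list given above (the `?` is added to the set here so the punctuation token is also dropped, as in the example):

```python
# Stopword list from the chapter, plus '?' so punctuation is removed too.
stopwords = {"a", "an", "and", "are", "as", "for", "it", "is", "into",
             "in", "if", "on", "or", "such", "the", "there", "to", "?"}

tokens = ["You", "want", "to", "see", "the", "dreams", "with",
          "close", "eyes", "and", "achieve", "them", "?"]

# Keep only the tokens whose lowercase form is not a stopword.
filtered = [t for t in tokens if t.lower() not in stopwords]
print(filtered)
```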

4. Converting text to a common case

We convert the whole text into the same case, preferably lower case. This ensures
that the machine's case sensitivity does not treat the same words as different
just because of different cases.
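A one-line demonstration of why this step matters: without it, the same word in different cases counts as several distinct tokens.

```python
words = ["Dreams", "dreams", "DREAMS"]

# Without normalisation the machine sees three different strings.
print(len(set(words)))

# After lowercasing, all three collapse into one token.
normalised = [w.lower() for w in words]
print(len(set(normalised)))
```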

5. Stemming: Stemming is a technique used to extract the base form of the words
by removing affixes from them. It is just like cutting down the branches of a
tree to its stems.

Words Affixes Stem

healing ing heal

dreams s dream

caring ing car
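The table above can be reproduced with a deliberately naive suffix-stripping stemmer. This is only a sketch of the idea (real stemmers such as the Porter stemmer use many more rules), but it shows the key property that stemming can produce non-words like "car" from "caring":

```python
def naive_stem(word, affixes=("ing", "s", "es", "ed")):
    """Strip the first matching affix from the end of the word.

    Like real stemmers, this rule-based approach can produce
    non-words (e.g. 'caring' -> 'car')."""
    for affix in affixes:
        # Require some letters to remain after stripping the affix.
        if word.endswith(affix) and len(word) > len(affix) + 1:
            return word[:-len(affix)]
    return word

for w in ["healing", "dreams", "caring"]:
    print(w, "->", naive_stem(w))
```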


6. Lemmatization: Lemmatization is an organised, step-by-step procedure of
obtaining the root form (lemma) of a word. Unlike a stem, a lemma is always a
meaningful word.

Words Affixes Lemma

healing ing heal

dreams s dream

caring ing care
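Unlike the rule-based stemmer, a lemmatizer consults a dictionary of known word forms (NLTK's WordNetLemmatizer, for instance, looks words up in WordNet). A toy lookup-table sketch of the same idea, using the three words from the table:

```python
# Toy lemma dictionary covering only the example words; a real
# lemmatizer consults a full vocabulary such as WordNet.
lemmas = {"healing": "heal", "dreams": "dream", "caring": "care"}

def lemmatize(word):
    # Fall back to the word itself when no lemma is known.
    return lemmas.get(word, word)

for w in ["healing", "dreams", "caring"]:
    print(w, "->", lemmatize(w))
```

Note that "caring" becomes "care", a valid word, where the stemmer above produced "car".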

Bag of Words

Bag of Words is a Natural Language Processing model that helps extract features
from text which can then be used in machine learning algorithms. In bag
of words, we count the occurrences of each word and construct the vocabulary for
the corpus.

The following steps should be followed to implement the bag of words:

1. Text Normalisation: Collect data and pre-process it


2. Create Dictionary: Make a list of all the unique words occurring in the corpus.
(Vocabulary)
3. Create document vectors: For each document in the corpus, find out how
many times the word from the unique list of words has occurred.
4. Create document vectors for all the documents.

Let us go through all the steps with an example


Step 1: Pre process the documents.
Document 1: Aman and Anil are stressed
Document 2: Aman went to a therapist
Document 3: Anil went to download a health chatbot
Here are three documents having one sentence each. After text normalisation, the
text becomes:
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, download, a, health, chatbot]
Note that no tokens have been removed in the stop words removal step. It is
because we have very little data and since the frequency of all the words is
almost the same, no word can be said to have lesser value than the other.
Step 2: Create Dictionary: Go through all the documents and create a dictionary,
i.e., list down all the unique words occurring in the three documents:
aman, and, anil, are, stressed, went, to, a, therapist, download, health, chatbot
Note that dictionary in NLP means a list of all the unique words occurring in
the corpus. Even though some words are repeated in different documents,
they are all written just once, since while creating the dictionary we create
a list of unique words.

Step 3: Create document vector In this step, the vocabulary is written in the top
row. Now, for each word in the document, if it matches with the vocabulary, put a
1 under it. If the same word appears again, increment the previous value by 1.
And if the word does not occur in that document, put a 0 under it.

Step 4: Repeat the same for all the documents.

Finally, the words have been converted to numbers. These numbers form the
vector for each document. Here, we can see that since we have very little data,
words like ‘are’ and ‘and’ also have a high value.
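The four steps of the worked example can be put together in a short Python sketch. The documents are the normalised token lists from Step 1; the dictionary is built in order of first appearance, and each document vector counts how often each vocabulary word occurs in that document:

```python
# Step 1: the three documents after text normalisation.
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]

# Step 2: dictionary of unique words, in order of first appearance.
vocabulary = []
for doc in documents:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3-4: one count vector per document.
vectors = [[doc.count(word) for word in vocabulary] for doc in documents]

print(vocabulary)
for vec in vectors:
    print(vec)
```

Running this gives a 12-word vocabulary and one vector of twelve counts per document, with 1s under the words each document contains and 0s elsewhere.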
