Unit 6 - NLP Notes
1. Define NLP.
NLP is a subfield of Linguistics, Computer Science, Information Engineering,
and Artificial Intelligence concerned with the interactions between computers
and human (natural) languages, in particular how to program computers to
process and analyse large amounts of natural language data.
2. Explain the Applications of Natural Language Processing.
I. Automatic Summarization
Automatic summarization is relevant not only for summarizing the meaning of
documents and information, but also for understanding the emotional meaning
within the information, such as when collecting data from social media.
E.g. an overview of a news item or blog post
II. Sentiment Analysis:
The goal of sentiment analysis is to identify sentiment among several
posts or even in the same post where emotion is not always explicitly
expressed.
Companies use Natural Language Processing applications, such as
sentiment analysis, to identify opinions and sentiment online to help
them understand what customers think about their products and
services.
Steps of Text Normalisation:
1. Sentence Segmentation -
Under sentence segmentation, the whole corpus is divided into sentences.
Each sentence is treated as a separate piece of data, so the whole corpus
is reduced to a list of sentences.
2. Tokenisation - After segmenting the sentences, each sentence is then
further divided into tokens.
Token is a term used for any word, number or special character occurring
in a sentence.
Under tokenisation, every word, number and special character is considered
separately, and each of them becomes a separate token.
3. Removing Stop words, Special Characters and Numbers - In this
step, the tokens which are not necessary are removed from the token list.
Stop words are the words which occur very frequently in the corpus but
do not add any value to it.
4. Converting text to a common case - After stop word removal, we convert
the whole text into the same case, preferably lower case. This ensures that
the machine does not treat the same word as two different words merely
because of a difference in case.
5. Stemming - In this step, the remaining words are reduced to their root
words. In other words, stemming is the process in which the affixes of
words are removed and the words are converted to their base form. The
stemmed word may not always be a meaningful word.
6. Lemmatization - In lemmatization, the word we get after affix removal
(also known as the lemma) is a meaningful word. With this, we have
normalised our text into tokens, which are the simplest forms of the words
present in the corpus. A short code sketch of this whole pipeline is given
below.
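The pipeline above can be tried out in Python with the NLTK library. The
snippet below is only a minimal sketch: it assumes nltk is installed and that
the punkt (or punkt_tab in newer NLTK versions), stopwords and wordnet
resources can be downloaded, and it uses a small made-up corpus.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)      # sentence/word tokenizer models
nltk.download("stopwords", quiet=True)  # list of English stop words
nltk.download("wordnet", quiet=True)    # dictionary used by the lemmatizer

corpus = "My brother loves math and science. He likes to read books on science."

# Step 1: Sentence segmentation - divide the corpus into sentences.
sentences = nltk.sent_tokenize(corpus)

# Step 2: Tokenisation - divide each sentence into tokens.
tokens = [tok for sent in sentences for tok in nltk.word_tokenize(sent)]

# Step 3: Remove stop words, special characters and numbers.
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]

# Step 4: Convert all tokens to a common (lower) case.
tokens = [t.lower() for t in tokens]

# Steps 5/6: Reduce each word to its root form (lemmatization shown here).
lemmatizer = WordNetLemmatizer()
tokens = [lemmatizer.lemmatize(t) for t in tokens]

print(tokens)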
Now it is time to convert the tokens into numbers. For this, we would use
the Bag of Words algorithm.
9. Rahul has been given the task of text normalisation. Help him
normalise the text in the segmented sentences given below:
Document 1: My brother loves math and science.
Document 2: My brother likes to read books on science and listen to rock
music.
Step 1: Tokenisation
Document 1: [My, brother, loves, math, and, science]
Document 2: [My, brother, likes, to, read, books, on, science, and, listen,
to, rock, music]
Step 2: Removing stop words (and, to, on are removed)
Document 1: [My, brother, loves, math, science]
Document 2: [My, brother, likes, read, books, science, listen, rock, music]
Step 3: Converting text to common case
Document 1: [my, brother, loves, math, science]
Document 2: [my, brother, likes, read, books, science, listen, rock, music]
Step 4: Stemming/Lemmatization
Document 1: [my, brother, love, math, science]
Document 2: [my, brother, like, read, book, science, listen, rock, music]
(The difference between stemming and lemmatization in this last step is
illustrated in the short code sketch below.)
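The choice between stemming and lemmatization in Step 4 can be seen by
running both on the same tokens. This is a small illustrative sketch,
assuming nltk and its wordnet resource are available.

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["loves", "likes", "books", "studies"]:
    # The stem may not be a meaningful word; the lemma always is
    # (e.g. "studies" stems to "studi" but lemmatizes to "study").
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word))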
10. Define Bag of Words.
Bag of Words is a Natural Language Processing model which helps in
extracting features out of text that can then be used in machine learning
algorithms. In Bag of Words, we record the occurrences of each word and
construct the vocabulary for the corpus.
11. Describe the steps to implement bag of words algorithm.
Step-by-step approach to implement bag of words algorithm:
1. Text Normalisation: Collect data and pre-process it
2. Create Dictionary: Make a list of all the unique words occurring in the
corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how
many times each word from the unique list of words (the vocabulary) occurs
in it.
4. Create document vectors for all the documents. (A plain-Python sketch of
these four steps is given below.)
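The four steps can be written directly in plain Python. The snippet below is
only a sketch: the three short documents are made up for illustration, and
text normalisation is reduced to lower-casing and splitting on spaces to keep
it brief.

# Step 1: Text normalisation (kept very simple here).
documents = [
    "we love reading books",
    "we love science",
    "science books are fun",
]
normalised = [doc.lower().split() for doc in documents]

# Step 2: Create the dictionary - the list of unique words in the corpus.
vocabulary = sorted({word for doc in normalised for word in doc})

# Steps 3 and 4: Create a document vector for every document by counting
# how many times each word of the dictionary occurs in that document.
vectors = [[doc.count(word) for word in vocabulary] for doc in normalised]

print(vocabulary)
for vector in vectors:
    print(vector)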
12. Create a document vector table from the following documents by
implementing all the four steps of Bag of words model.
Document 1: Aman and Anil are stressed
Document 2: Aman went to a therapist
Document 3: Anil went to download a health chatbot
Solution:
Step 1: Collecting data and pre-processing it.
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, download, a, health, chatbot]
Step 2: Create Dictionary (the list of all unique words in the corpus):
[aman, and, anil, are, stressed, went, to, a, therapist, download, health,
chatbot]
Steps 3 and 4: Create document vectors by counting, for each document, how
many times every word of the dictionary occurs in it:

Word        Doc 1  Doc 2  Doc 3
aman          1      1      0
and           1      0      0
anil          1      0      1
are           1      0      0
stressed      1      0      0
went          0      1      1
to            0      1      1
a             0      1      1
therapist     0      1      0
download      0      0      1
health        0      0      1
chatbot       0      0      1

(The same vectors can be cross-checked with the short scikit-learn sketch
below.)
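For comparison, the same document vectors can be produced with scikit-learn's
CountVectorizer. This is an illustrative sketch and assumes the scikit-learn
package is installed; note that CountVectorizer lists the vocabulary in
alphabetical order, so the columns appear in a different order than in the
table above.

from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]

# Widen the default token pattern so one-letter words such as "a" are kept.
vectorizer = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
matrix = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())  # the dictionary (alphabetical)
print(matrix.toarray())                    # one document vector per row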
In the graphs being compared here, the red dashed line is the model's output
while the blue crosses are the actual data samples.
1. In the first case, the model's output does not match the true function at
all. Hence the model is said to be underfitting and its accuracy is lower.
2. In the second case, the model tries to cover all the data samples, even
those that are out of alignment with the true function. This model is said
to be overfitting, and it too has lower accuracy.
3. In the third case, the model's output matches the true function well,
which means the model has optimum accuracy; such a model is called a
perfect fit. (A small numerical illustration of these three cases is given
below.)
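These three situations can be reproduced with a small experiment: fitting
polynomials of different degrees to noisy samples of a known function. The
sketch below uses NumPy and is purely illustrative; the true function, noise
level and polynomial degrees are assumptions, not taken from the notes.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y_true = np.sin(2 * np.pi * x)                    # the "true function"
y = y_true + rng.normal(0, 0.2, size=x.shape)     # noisy data samples

for degree in (1, 3, 12):
    coeffs = np.polyfit(x, y, degree)             # fit a polynomial model
    y_fit = np.polyval(coeffs, x)                 # the model's output
    train_error = np.mean((y_fit - y) ** 2)       # error on the noisy samples
    true_error = np.mean((y_fit - y_true) ** 2)   # error vs the true function
    # Degree 1 underfits (high error everywhere); degree 12 has the lowest
    # training error because it also fits the noise (overfitting); degree 3
    # usually follows the true function most closely (a good fit).
    print(f"degree {degree:2d}: train error {train_error:.3f}, "
          f"error vs true function {true_error:.3f}")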
16. Calculate Term frequency, Document frequency and inverse document
frequency for the given corpus and mention the word(s) having highest
value.
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
Solution:
(Tokens are converted to lower case and the full stops are removed.)

Term Frequency (number of times each word occurs in each document):

Word      Doc 1  Doc 2  Doc 3  Doc 4
we          1      0      1      0
are         1      0      1      0
going       1      0      1      0
to          1      0      1      0
mumbai      1      1      0      1
is          0      1      0      0
a           0      1      1      0
famous      0      1      1      1
place       0      1      1      0
i           0      0      0      1
am          0      0      0      1
in          0      0      0      1

Document Frequency (number of documents in which the word occurs):

we: 2, are: 2, going: 2, to: 2, mumbai: 3, is: 1, a: 2, famous: 3, place: 2,
i: 1, am: 1, in: 1

Inverse Document Frequency (IDF = log(total number of documents / document
frequency) = log(4/DF)):

we: log(4/2), are: log(4/2), going: log(4/2), to: log(4/2), mumbai: log(4/3),
is: log(4/1), a: log(4/2), famous: log(4/3), place: log(4/2), i: log(4/1),
am: log(4/1), in: log(4/1)

The words "mumbai" and "famous" have the highest document frequency (each
occurs in 3 of the 4 documents), and therefore the lowest inverse document
frequency, log(4/3).
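The same three quantities can be computed with a short plain-Python sketch
(log base 10 is assumed here; the notes do not specify the base).

import math

documents = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Normalise: lower-case, strip full stops and split into words.
docs = [d.lower().replace(".", "").split() for d in documents]
vocabulary = sorted({word for doc in docs for word in doc})

# Term frequency: occurrences of each word in each document.
tf = {w: [doc.count(w) for doc in docs] for w in vocabulary}

# Document frequency: number of documents in which the word occurs.
df = {w: sum(1 for doc in docs if w in doc) for w in vocabulary}

# Inverse document frequency: log10(total documents / document frequency).
idf = {w: math.log10(len(docs) / df[w]) for w in vocabulary}

for w in vocabulary:
    print(f"{w:<8} TF={tf[w]}  DF={df[w]}  IDF={idf[w]:.3f}")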