Sample Paper Questions - NLP (Part 1)

Uploaded by

luvikasingh20

Sample Paper Questions - Natural Language Processing (Part 1)

Q1 What do you mean by corpus?


In Text Normalization, we go through several steps to reduce the text to a simpler form. We work on text from
multiple documents, and the whole textual data from all the documents taken together is known as the corpus.

Q2 Explain the concept of Bag of Words.


• Bag of Words is a Natural Language Processing model that extracts features from text so that they can be
used by machine learning algorithms.
• In Bag of Words, we count the occurrences of each word and construct the vocabulary for the corpus.
• Bag of Words produces a set of vectors, one per document, containing the count of each word's
occurrences in that document (for example, a review).
• Bag of Words vectors are easy to interpret.

Q3 What is meant by a dictionary in NLP?


Dictionary in NLP means a list of all the unique words occurring in the corpus. If some words are repeated in
different documents, they are written just once while creating the dictionary.
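
The idea can be sketched in Python as below. This is only an illustration: the helper names `tokenize` and `build_dictionary` are made up for this example, and it assumes that words are lowercased and punctuation is ignored.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words, ignoring punctuation (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_dictionary(corpus):
    """Return the unique words of the corpus in order of first appearance."""
    dictionary = []
    seen = set()
    for document in corpus:
        for word in tokenize(document):
            if word not in seen:  # a repeated word is written just once
                seen.add(word)
                dictionary.append(word)
    return dictionary

corpus = [
    "We are going to Chennai.",
    "Chennai is a famous place.",
]
print(build_dictionary(corpus))
# ['we', 'are', 'going', 'to', 'chennai', 'is', 'a', 'famous', 'place']
```

Note that "Chennai" occurs in both documents but appears only once in the result.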

Q4 What do you mean by document vectors?


• A Document Vector contains the frequency of each word of the vocabulary in a particular document.
• In a document vector, the vocabulary is written in the top row.
• For each word in the document, if it matches a word in the vocabulary, put a 1 under it. If the same
word appears again, increment the previous value by 1. If a vocabulary word does not occur in that
document, put a 0 under it.
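
The procedure described above can be sketched in Python as follows. The function name `document_vector` is a hypothetical helper, and the document is assumed to be already split into lowercase words.

```python
def document_vector(document_words, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    vector = [0] * len(vocabulary)  # start with 0 under every word
    position = {word: i for i, word in enumerate(vocabulary)}
    for word in document_words:
        if word in position:
            vector[position[word]] += 1  # first match gives 1, repeats increment it
    return vector

vocabulary = ["johny", "yes", "papa", "eating", "sugar", "no"]
words = ["johny", "johny", "yes", "papa"]
print(document_vector(words, vocabulary))  # [2, 1, 1, 0, 0, 0]
```

The value 2 under "johny" shows why a document vector stores frequencies and not just presence or absence.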

Q5 What is a document vector table?


• A Document Vector Table is used while implementing the Bag of Words algorithm.
• In a document vector table, the header row contains the vocabulary of the corpus and the other rows
correspond to the different documents.
• Each cell holds the number of times the word occurs in that document (1 if it occurs once); a word that is
absent from the document is represented by 0.
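
As a rough illustration, a table of this kind can be printed from a vocabulary and a list of document vectors. The function `format_table` and the "Document N" row labels are assumptions made for this sketch only.

```python
def format_table(vocabulary, vectors):
    """Return the document vector table as aligned plain text."""
    widths = [len(word) for word in vocabulary]
    label_width = len(f"Document {len(vectors)}")
    # Header row: the vocabulary of the corpus
    header = " " * label_width + "  " + "  ".join(vocabulary)
    lines = [header]
    # One row per document
    for number, vector in enumerate(vectors, start=1):
        cells = "  ".join(str(count).ljust(width) for count, width in zip(vector, widths))
        lines.append(f"Document {number}".ljust(label_width) + "  " + cells.rstrip())
    return "\n".join(lines)

vocabulary = ["we", "are", "going", "to", "chennai", "is", "a", "famous", "place"]
vectors = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0],  # We are going to Chennai.
    [0, 0, 0, 0, 1, 1, 1, 1, 1],  # Chennai is a famous place.
]
print(format_table(vocabulary, vectors))
```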

Q6 Mention the steps involved in Bag of Words Algorithm


Here is the step-by-step approach to implement the Bag of Words algorithm:
1. Text Normalisation: Collect the data and pre-process it.
2. Create Dictionary: Make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create a document vector: For one document in the corpus, find out how many times each word from
the unique list of words has occurred.
4. Repeat step 3 to create document vectors for all the documents.

Note: The number of document vectors equals the number of documents given. For example, if 3 documents
are given, there will be 3 document vectors.
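
The four steps can be put together in a short Python sketch. It is a simplified illustration, not a full implementation: `tokenize` and `bag_of_words` are hypothetical names, and normalisation here is assumed to mean only lowercasing and dropping punctuation.

```python
import re
from collections import Counter

def tokenize(text):
    """Step 1 (simplified): lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(corpus):
    tokenized = [tokenize(document) for document in corpus]
    # Step 2: dictionary of unique words, in order of first appearance
    dictionary = list(dict.fromkeys(word for words in tokenized for word in words))
    # Steps 3 and 4: one document vector per document
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[word] for word in dictionary])
    return dictionary, vectors

corpus = [
    "Sahil likes to play cricket.",
    "Sajal likes cricket too.",
    "Sajal also likes to play basketball.",
]
dictionary, vectors = bag_of_words(corpus)
print(dictionary)
for vector in vectors:
    print(vector)
# [1, 1, 1, 1, 1, 0, 0, 0, 0]
# [0, 1, 0, 0, 1, 1, 1, 0, 0]
# [0, 1, 1, 1, 0, 1, 0, 1, 1]
```

Three documents go in and three document vectors come out, as the note says.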
Bag of words Algorithm

Create bag of words for the given corpus:


Document 1: We are going to Chennai.
Document 2: Chennai is a famous place.
Document 3: We are going to a famous place.
Document 4: We are going to sea shore in Chennai.

Step by step method


Ans. Step 1 – Text Normalization

Start with Document 1 and list its words. Then go through the remaining documents in order: if a word has
already appeared in an earlier document, it is not taken again. So Document 2 keeps only the words not found
in Document 1, Document 3 keeps only the words not found in Documents 1 and 2, and so on.

Document 1: We are going to Chennai.


Document 2: Chennai is a famous place.
Document 3: We are going to a famous place.
Document 4: We are going to sea shore in Chennai.

Output
Document 1: We, are, going, to, Chennai.
Document 2: is, a, famous, place.
Document 3: (no new words)
Document 4: sea, shore, in
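
The same output can be reproduced with a small Python sketch. The function name `new_words_per_document` is made up for this example, and words are compared without regard to upper or lower case (an assumption).

```python
import re

def new_words_per_document(corpus):
    """For each document, list only the words not seen in earlier documents."""
    seen = set()
    result = []
    for document in corpus:
        fresh = []
        for word in re.findall(r"[A-Za-z]+", document):
            key = word.lower()  # compare case-insensitively
            if key not in seen:
                seen.add(key)
                fresh.append(word)
        result.append(fresh)
    return result

corpus = [
    "We are going to Chennai.",
    "Chennai is a famous place.",
    "We are going to a famous place.",
    "We are going to sea shore in Chennai.",
]
for number, words in enumerate(new_words_per_document(corpus), start=1):
    print(f"Document {number}: {', '.join(words)}")
```

Document 3 gives an empty list because every one of its words already occurs in Document 1 or Document 2.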

Step 2 – Create Dictionary: write the above words in the form of dictionary

We are going to Chennai is a famous place sea shore in

Step 3 – Create Document vector

Take the original Document 1 and, under each word of the dictionary, write the number of times that word occurs in the document (here 1), or 0 if it does not occur.

            We  are  going  to  Chennai  is  a  famous  place  sea  shore  in
Document 1  1   1    1      1   1        0   0  0       0      0    0      0

Step 4 – Repeat the above steps for all documents to create the document vector table for the given
corpus:

Write each original document as a row of counts in the table.

            We  are  going  to  Chennai  is  a  famous  place  sea  shore  in
Document 1  1   1    1      1   1        0   0  0       0      0    0      0
Document 2  0   0    0      0   1        1   1  1       1      0    0      0
Document 3  1   1    1      1   0        0   1  1       1      0    0      0
Document 4  1   1    1      1   1        0   0  0       0      1    1      1
Q7 Create dictionary for the given corpus:
Document 1: We are going to Chennai.
Document 2: Chennai is a famous place.
Document 3: We are going to a famous place.
Document 4: We are going to sea shore in Chennai.

Step 1 – Text Normalization

Document 1: We, are, going, to, Chennai
Document 2: is, a, famous, place
Document 3: (no new words)
Document 4: sea, shore, in

Step 2 – Create Dictionary

We are going to Chennai is a famous place sea shore in

Q8 Create a document vector table for the given corpus:


Document 1: We are going to Chennai.
Document 2: Chennai is a famous place.
Document 3: We are going to a famous place.
Document 4: We are going to sea shore in Chennai.

Ans. Document vector table for the given corpus:

            We  are  going  to  Chennai  is  a  famous  place  sea  shore  in
Document 1  1   1    1      1   1        0   0  0       0      0    0      0
Document 2  0   0    0      0   1        1   1  1       1      0    0      0
Document 3  1   1    1      1   0        0   1  1       1      0    0      0
Document 4  1   1    1      1   1        0   0  0       0      1    1      1

Q9 Create bag of words for the given corpus:

Document 1: We are going to Singapore


Document 2: Singapore is a famous tourist place.
Document 3: We are going to a famous tourist place.
Document 4: We are going to Chinatown in Singapore.
Step 1 – Text Normalization

Document 1: [We, are, going, to, Singapore]
Document 2: [is, a, famous, tourist, place]
Document 3: (no new words)
Document 4: [Chinatown, in]
Step 2 – Create Dictionary
We are going to Singapore is a famous tourist place Chinatown in

Step 3 – Create Document vector

            We  are  going  to  Singapore  is  a  famous  tourist  place  Chinatown  in
Document 1  1   1    1      1   1          0   0  0       0        0      0          0

Step 4 – Repeat all the above steps for all documents to create Document vector table

            We  are  going  to  Singapore  is  a  famous  tourist  place  Chinatown  in
Document 1  1   1    1      1   1          0   0  0       0        0      0          0
Document 2  0   0    0      0   1          1   1  1       1        1      0          0
Document 3  1   1    1      1   0          0   1  1       1        1      0          0
Document 4  1   1    1      1   1          0   0  0       0        0      1          1

Q10 Create bag of words for the given corpus:


Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.

Step 1 – Text Normalization

Document 1: [We, are, going, to, Mumbai]
Document 2: [is, a, famous, place]
Document 3: (no new words)
Document 4: [I, am, in]

Step 2 – Create Dictionary


We are going to Mumbai is a famous place I am in

Step 3 – Create Document vector


            We  are  going  to  Mumbai  is  a  famous  place  I  am  in
Document 1  1   1    1      1   1       0   0  0       0      0  0   0

Step 4 – Repeat all the above steps for all documents to create Document vector table

            We  are  going  to  Mumbai  is  a  famous  place  I  am  in
Document 1  1   1    1      1   1       0   0  0       0      0  0   0
Document 2  0   0    0      0   1       1   1  1       1      0  0   0
Document 3  1   1    1      1   0       0   1  1       1      0  0   0
Document 4  0   0    0      0   1       0   0  1       0      1  1   1
Q11 Create bag of words for the given corpus:
Document 1: Johny Johny, Yes Papa,
Document 2: Eating sugar? No Papa
Document 3: Telling lies? No Papa
Document 4: Open your mouth, Ha! Ha! Ha!

Step 1 – Text Normalization

Document 1: [Johny, Yes, Papa]
Document 2: [Eating, Sugar, No]
Document 3: [Telling, lies]
Document 4: [Open, your, mouth, Ha]

Step 2 – Create Dictionary


Johny Yes Papa Eating Sugar No Telling Lies Open your Mouth Ha

Step 3 – Create Document vector

            Johny  Yes  Papa  Eating  Sugar  No  Telling  Lies  Open  your  Mouth  Ha
Document 1  2      1    1     0       0      0   0        0     0     0     0      0

Step 4 – Repeat all the above steps for all documents to create Document vector table

            Johny  Yes  Papa  Eating  Sugar  No  Telling  Lies  Open  your  Mouth  Ha
Document 1  2      1    1     0       0      0   0        0     0     0     0      0
Document 2  0      0    1     1       1      1   0        0     0     0     0      0
Document 3  0      0    1     0       0      1   1        1     0     0     0      0
Document 4  0      0    0     0       0      0   0        0     1     1     1      3

Q12 Create bag of words for the given corpus:


Document 1: Sahil likes to play cricket.
Document 2: Sajal likes cricket too.
Document 3: Sajal also likes to play basketball.

Step 1 – Text Normalization

Document 1: [Sahil, likes, to, play, cricket]
Document 2: [Sajal, too]
Document 3: [also, basketball]

Step 2 – Create Dictionary


Sahil likes to play cricket Sajal too also basketball
Step 3 – Create Document vector

            Sahil  likes  to  play  cricket  Sajal  too  also  basketball
Document 1  1      1      1   1     1        0      0    0     0

Step 4 – Repeat all the above steps for all documents to create Document vector table

            Sahil  likes  to  play  cricket  Sajal  too  also  basketball
Document 1  1      1      1   1     1        0      0    0     0
Document 2  0      1      0   0     1        1      1    0     0
Document 3  0      1      1   1     0        1      0    1     1

Q13 Create bag of words for the given corpus:


Document 1: We can use health chatbots for treating stress.
Document 2: we can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health chatbots cannot replace human counsellors now.

Step 1 – Text Normalization

Document 1: [We, can, use, health, chatbots, for, treating, stress]


Document 2: [NLP, to, create, and, will, be, making, now]
Document 3: [cannot, replace, human, counsellors]

Step 2 – Create Dictionary


We can use health chatbots for treating stress NLP to create and will be making now cannot replace human counsellors

Step 3 – Create Document vector

            We  can  use  health  chatbots  for  treating  stress  NLP  to  create  and  will  be  making  now  cannot  replace  human  counsellors
Document 1  1   1    1    1       1         1    1         1       0    0   0       0    0     0   0       0    0       0        0      0

Step 4 – Repeat all the above steps for all documents to create Document vector table

            We  can  use  health  chatbots  for  treating  stress  NLP  to  create  and  will  be  making  now  cannot  replace  human  counsellors
Document 1  1   1    1    1       1         1    1         1       0    0   0       0    0     0   0       0    0       0        0      0
Document 2  2   1    1    1       2         0    0         0       1    1   1       1    1     1   1       1    0       0        0      0
Document 3  0   0    0    1       1         0    0         0       0    0   0       0    0     0   0       1    1       1        1      1
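
The answer above treats "We" and "we", and "Health" and "health", as the same word. A small Python sketch shows what changes if case is not ignored. The function `dictionary_of` and its `lowercase` flag are hypothetical, used only for this comparison.

```python
import re

corpus = [
    "We can use health chatbots for treating stress.",
    "we can use NLP to create chatbots and we will be making health chatbots now!",
    "Health chatbots cannot replace human counsellors now.",
]

def dictionary_of(corpus, lowercase):
    """Build the dictionary, optionally lowercasing every document first."""
    words = []
    for document in corpus:
        text = document.lower() if lowercase else document
        for word in re.findall(r"[A-Za-z]+", text):
            if word not in words:
                words.append(word)
    return words

print(len(dictionary_of(corpus, lowercase=True)))   # 20
print(len(dictionary_of(corpus, lowercase=False)))  # 22
```

With lowercasing the dictionary has the 20 words used in the table; without it, "we" and "Health" would be added as two extra entries.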

Q14 Create dictionary for the given corpus:

Document 1- Sameera and Sanya are classmates


Document 2 – Sameera likes dancing but sanya loves to study mathematics
Step 1 – Text Normalization

Document 1 – [Sameera, and, Sanya, are, classmates]
Document 2 – [likes, dancing, but, loves, to, study, mathematics]

Step 2 – Create Dictionary

Sameera and Sanya are classmates likes dancing but loves to study mathematics

Q15 Create dictionary for the given corpus:


Document 1: Amit and Amita are twins
Document 2: Amit lives with his grandparents in Shimla
Document 3: Amita lives with her parents in Delhi

Step 1 – Text Normalization

Document 1 – [Amit, and, Amita, are, twins]
Document 2 – [lives, with, his, grandparents, in, Shimla]
Document 3 – [her, parents, Delhi]

Step 2 – Create Dictionary

Amit and Amita are twins lives with his grandparents in Shimla her parents Delhi

Q16 Create Dictionary for the given corpus:


Document 1: I will study to become a doctor.
Document 2: Doctors are the best professionals.
Document 3: I know many doctors.

Step 1 – Text Normalization


Document 1: [I, will, study, to, become, a, doctor]
Document 2: [Doctors, are, the, best, professionals]
Document 3: [know, many]

Step 2 – Create Dictionary


‘I’, ‘will’, ‘study’, ‘to’, ‘become’, ‘a’, ‘doctor’, ‘doctors’, ‘are’, ‘the’, ‘best’, ‘professionals’, ‘know’, ‘many’.
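
In this dictionary "doctor" and "doctors" are kept as two separate words, because the words are only compared as written and are not reduced to a root form. A minimal Python sketch of this, assuming lowercasing and no other processing:

```python
import re

corpus = [
    "I will study to become a doctor.",
    "Doctors are the best professionals.",
    "I know many doctors.",
]

dictionary = []
for document in corpus:
    for word in re.findall(r"[a-z]+", document.lower()):
        if word not in dictionary:  # keep each unique word once
            dictionary.append(word)

print(dictionary)
print(len(dictionary))  # 14
print("doctor" in dictionary, "doctors" in dictionary)  # True True
```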

Q17 On the basis of the given corpus, build the document vector table.

Document 1: How are you my friend?


Document 2: I am a person who cares about friends.
Document 3: Friends are who inspire me.

            How  are  you  my  friend  I  am  a  person  who  cares  about  friends  inspire  me
Document 1  1    1    1    1   1       0  0   0  0       0    0      0      0        0        0
Document 2  0    0    0    0   0       1  1   1  1       1    1      1      1        0        0
Document 3  0    1    0    0   0       0  0   0  0       1    0      0      1        1        1
