Sample Paper Questions - NLP (Part 1)
Note: The number of document vectors equals the number of documents in the corpus. For example, if 3 documents
are given, there will be 3 document vectors.
Bag of words Algorithm
Start with document 1 and list its words. Then go through document 2 and take only the words that have not
already appeared in document 1. Do the same for document 3, skipping any word already taken from documents 1
and 2. The result is the normalized text, in which each word is listed only once.
Output
Document 1: We, are, going, to, Chennai.
Document 2: is, a, famous, place.
Document 3:
Document 4: sea, shore, in
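The pass described above can be sketched in Python. This is a minimal sketch, not the method prescribed by the paper: the three-sentence corpus is hypothetical (it is not the corpus from the question), the helper names tokenize and new_words_per_document are invented for illustration, and tokenization is assumed to be simple lowercasing with punctuation removed.

```python
import string

def tokenize(text):
    """Lowercase the text, strip punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def new_words_per_document(corpus):
    """For each document, keep only the words not seen in earlier documents."""
    seen = set()
    result = []
    for text in corpus:
        fresh = []
        for word in tokenize(text):
            if word not in seen:
                seen.add(word)
                fresh.append(word)
        result.append(fresh)
    return result

# Hypothetical corpus, for illustration only.
corpus = [
    "Aman and Anil are stressed.",
    "Aman went to a therapist.",
    "Anil went to a therapist.",
]

new_words = new_words_per_document(corpus)

# The dictionary is simply all the new words, in order of first appearance.
dictionary = [word for words in new_words for word in words]

for number, words in enumerate(new_words, start=1):
    print(f"Document {number}: {', '.join(words)}")
```

In this sketch the third document contributes no new words, which is the same situation as the empty "Document 3:" line in the output above.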
Step 2 – Create Dictionary: write the above words in the form of a dictionary.
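Step 3 – Create Document Vector: place the original document 1 in the table. For each word in the dictionary, write the number of times it occurs in the document (1 if it occurs once), otherwise write 0.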
Step 4 – Repeat all the above steps for all documents to create Document vector table for the given
corpus:
Johny  Yes  Papa  Eating  Sugar  No  Telling  Lies  Open  your  Mouth  Ha
2      1    1     0       0      0   0        0     0     0     0      0
0      0    1     1       1      1   0        0     0     0     0      0
0      0    1     0       0      1   1        1     0     0     0      0
0      0    0     0       0      0   0        0     1     1     1      3
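Steps 2 to 4 can be checked with a short Python sketch. The four tokenized documents below are an assumption: they are reconstructed from the counts in the table above (for example, "Johny" counted twice in the first row), since the original sentences are not reproduced here. The function names build_dictionary and document_vectors are invented for this example.

```python
from collections import Counter

def build_dictionary(docs):
    """List every distinct word once, in order of first appearance."""
    dictionary = []
    for doc in docs:
        for word in doc:
            if word not in dictionary:
                dictionary.append(word)
    return dictionary

def document_vectors(docs, dictionary):
    """Count how often each dictionary word occurs in each document."""
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # a missing word counts as 0
        vectors.append([counts[word] for word in dictionary])
    return vectors

# Assumed documents, reconstructed from the table (already tokenized).
docs = [
    ["Johny", "Johny", "Yes", "Papa"],
    ["Eating", "Sugar", "No", "Papa"],
    ["Telling", "Lies", "No", "Papa"],
    ["Open", "your", "Mouth", "Ha", "Ha", "Ha"],
]

dictionary = build_dictionary(docs)
vectors = document_vectors(docs, dictionary)

print(" ".join(dictionary))
for vector in vectors:
    print(" ".join(str(count) for count in vector))
```

Four documents give four document vectors, and each vector has one entry per dictionary word.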
Step 4 – Repeat all the above steps for all documents to create Document vector table
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 2 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
Sameera and Sanya are classmates likes dancing but loves to study mathematics
Amit and Amita are twins lives with his grandparents in Shimla her parents Delhi
Q17. On the basis of the given corpus, guess the document vector table.
How  are  you  my  friend  I  am  a  Person  who  Cares  About  friends  inspire  me
1    1    1    1   1       0  0   0  0       0    0      0      0        0        0
0    0    0    0   0       1  1   1  1       1    1      1      1        0        0
0    1    0    0   0       0  0   0  0       1    0      0      1        1        1
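A document vector can also be read backwards to see which dictionary words a document contains. The Python sketch below does this for the Q17 table, using only the header words and rows shown above. The helper name words_in is hypothetical. Bag of words does not keep word order, so the output lists words in dictionary order, not in sentence order.

```python
# Dictionary (table header) and document vectors copied from the Q17 table.
dictionary = [
    "How", "are", "you", "my", "friend", "I", "am", "a",
    "Person", "who", "Cares", "About", "friends", "inspire", "me",
]
table = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1],
]

def words_in(vector, dictionary):
    """Return the dictionary words whose count in the vector is non-zero."""
    return [word for word, count in zip(dictionary, vector) if count > 0]

for number, vector in enumerate(table, start=1):
    print(f"Document {number}: {', '.join(words_in(vector, dictionary))}")
```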