VIT-AP
UNIVERSITY
“Apply Knowledge. Improve Life!”
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015 Course Title: Natural Language Processing
Set number: 3 Date of Exam: [illegible]-02-2023 (FN)
Duration: 1 hour 30 minutes Total Marks: 50
Q1. a. Consider the following sentence of the English language. (5M)
“Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer
science, and artificial intelligence concerned with the interactions between computers and human
language, in particular how to program computers to process and analyse large amounts of natural
language data.”
Tokenize the above content using the necessary operations in the NLTK library. From the resulting
tokens, remove the stop words using the wordnet dictionary and explain each step involved.
b. Consider the following sentences of the English language. (5M)
“With the growth of the web, increasing amounts of raw language data has become available since
the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised
learning algorithms.”
Find the root word of each token using stemming and lemmatization from the NLTK library.
Compare the results of those operations and justify the performance of each method.
Q2. (10M)
a. Stem the following tokens using appropriate regular expression patterns from the NLTK
library and explain the problems in the stemmed words. (6M)
[Advisable, Drinking, Eating, Swimming, Comparable, Computers, Bats, Considered, Informed, Asked]
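A minimal sketch of one way to approach the regular-expression stemming task, assuming NLTK's `RegexpStemmer`. The suffix pattern and the `min=4` length threshold are illustrative choices, not part of the question.

```python
from nltk.stem import RegexpStemmer

tokens = ["Advisable", "Drinking", "Eating", "Swimming", "Comparable",
          "Computers", "Bats", "Considered", "Informed", "Asked"]

# Assumed suffix pattern: delete one of these endings when it closes the word.
# min=4 leaves words shorter than four characters untouched.
stemmer = RegexpStemmer(r"ing$|able$|ed$|s$", min=4)

stems = [stemmer.stem(token) for token in tokens]
for token, stem in zip(tokens, stems):
    print(f"{token:12} -> {stem}")
```

With this pattern, "Swimming" becomes "Swimm" and "Advisable" becomes "Advis": a plain suffix rule knows nothing about doubled consonants or dropped final letters, which is the kind of problem the question asks to be explained.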
b. Correct the following words using the edit distance and Jaccard distance methods from the NLTK
library. Compare the results and explain each step involved. (4M)
[asending,caating,flwing,expreesion,amaaing]
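The two distance measures can be compared with a short sketch like the one below. It assumes a tiny hand-written candidate vocabulary (hypothetical; a full word list such as `nltk.corpus.words` would need a separate download) and uses character bigrams for the Jaccard comparison, which is one common choice rather than the only one.

```python
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.util import ngrams

misspelled = ["asending", "caating", "flwing", "expreesion", "amaaing"]

# Hypothetical candidate vocabulary, chosen only for illustration.
vocabulary = ["ascending", "casting", "flowing", "expression", "amazing"]


def char_bigrams(word):
    """Return the set of character bigrams of a word."""
    return set(ngrams(word, 2))


def correct_by_edit_distance(word):
    """Pick the candidate needing the fewest insertions, deletions or substitutions."""
    return min(vocabulary, key=lambda cand: edit_distance(word, cand))


def correct_by_jaccard(word):
    """Pick the candidate whose bigram set is closest to the word's bigram set."""
    return min(
        vocabulary,
        key=lambda cand: jaccard_distance(char_bigrams(word), char_bigrams(cand)),
    )


edit_fixes = [correct_by_edit_distance(w) for w in misspelled]
jaccard_fixes = [correct_by_jaccard(w) for w in misspelled]

for word, by_edit, by_jaccard in zip(misspelled, edit_fixes, jaccard_fixes):
    print(f"{word:12} edit: {by_edit:12} jaccard: {by_jaccard}")
```

On this small vocabulary both methods agree. With a larger word list they can diverge, because edit distance counts character operations while Jaccard distance only compares which bigrams are shared.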
Q3. a. Tag the following sentence using the proper POS tags from the Penn Treebank tag set and
group those tags into open and closed categories. (eM)
“Tendulkar took up cricket at the age of eleven, made his Test match debut on 15 November 1989
against Pakistan in Karachi at the age of sixteen, and went on to represent Mumbai domestically
and India internationally for close to twenty-four years.”
b. Tag the following testing sentences using a unigram tagger based on the appropriate POS tags from
the universal tag set. Generate the lookup table from the given training data set by training the
unigram model. Note: use the DefaultTagger as the backoff tagger.
Training sentences
“Sachin/NOUN is/VERB the/DET best/ADJ righthand/ADJ player/NOUN in/PRT the/DET
cricket/NOUN world/NOUN. Ganguly/NOUN is/VERB the/DET best/ADJ lefthand/ADJ
player/NOUN in/PRT the/DET cricket/NOUN world/NOUN. GlennMcGrath/NOUN is/VERB
the/DET best/ADJ fast/ADJ bowler/NOUN in/PRT the/DET cricket/NOUN world/NOUN.
ShaneWarne/NOUN is/VERB the/DET best/ADJ spinner/NOUN in/PRT the/DET cricket/NOUN
world/NOUN.”
Testing sentences
Shewag is the good cricket player in the cricket world. He is the better spin bowler too. He is
right hand batter and bowler.
(5M)
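A hedged sketch of the unigram tagger with a `DefaultTagger` backoff, trained on the four training sentences above. The cleaned-up spellings, the dropped sentence-final periods and the choice of `NOUN` as the default tag are assumptions made for illustration.

```python
from nltk.tag import DefaultTagger, UnigramTagger


def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


# Training sentences from the question (spellings cleaned, periods dropped).
train_sents = [parse_tagged(s) for s in [
    "Sachin/NOUN is/VERB the/DET best/ADJ righthand/ADJ player/NOUN "
    "in/PRT the/DET cricket/NOUN world/NOUN",
    "Ganguly/NOUN is/VERB the/DET best/ADJ lefthand/ADJ player/NOUN "
    "in/PRT the/DET cricket/NOUN world/NOUN",
    "GlennMcGrath/NOUN is/VERB the/DET best/ADJ fast/ADJ bowler/NOUN "
    "in/PRT the/DET cricket/NOUN world/NOUN",
    "ShaneWarne/NOUN is/VERB the/DET best/ADJ spinner/NOUN "
    "in/PRT the/DET cricket/NOUN world/NOUN",
]]

# Words never seen in training fall back to NOUN (an assumed default tag).
backoff = DefaultTagger("NOUN")
tagger = UnigramTagger(train_sents, backoff=backoff)

test_sent = "Shewag is the good cricket player in the cricket world".split()
tagged = tagger.tag(test_sent)
print(tagged)
```

Here "Shewag" and "good" never occur in the training data, so they receive the backoff tag. Every other word gets the tag it carried most often in training.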
Q4. a. Design a regular expression tagger to tag the following tokens using the necessary POS
rules. Use the POS tags from the Penn Treebank tag set to define the rules.
[Bating, Would, Should, Borrowed, India, Pakistan, Eats, advisable, best, the, 1988]
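One possible rule set for these tokens, sketched with NLTK's `RegexpTagger`. The patterns and their order are assumptions tuned to this token list, not a general-purpose tagger; the first matching pattern decides the tag.

```python
from nltk.tag import RegexpTagger

tokens = ["Bating", "Would", "Should", "Borrowed", "India", "Pakistan",
          "Eats", "advisable", "best", "the", "1988"]

# Assumed rules, tried top to bottom; the first match wins.
patterns = [
    (r"^\d+$", "CD"),                                  # cardinal numbers
    (r"^(?:[Ww]ould|[Ss]hould|[Cc]ould|[Ww]ill|[Cc]an|[Mm]ay|[Mm]ust)$", "MD"),  # modals
    (r"^(?:[Tt]he|[Aa]n?)$", "DT"),                    # determiners
    (r".*est$", "JJS"),                                # superlative adjectives
    (r".*able$", "JJ"),                                # adjectives ending in -able
    (r".*ing$", "VBG"),                                # gerund / present participle
    (r".*ed$", "VBD"),                                 # past tense
    (r".*s$", "VBZ"),                                  # 3rd person singular present
    (r"^[A-Z].*$", "NNP"),                             # remaining capitalised words
    (r".*", "NN"),                                     # default: common noun
]

tagger = RegexpTagger(patterns)
tagged = tagger.tag(tokens)
for token, tag in tagged:
    print(f"{token:10} {tag}")
```

The ordering matters: "Eats" is capitalised, so it is only tagged `VBZ` because the `-s` rule is tried before the proper-noun rule. The same ordering would mis-tag a proper noun that ends in "s".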
b. Find the output of the following Python code. (5M)
import nltk
sentence = ("Khan was born on 2 November 1965 into a Muslim family in New Delhi. He spent the first "
    "five years of his life in Mangalore, where his maternal grandfather, Iftichar Ahmed, served as the chief "
    "engineer of the port in the 1960s. Khan's father, Mir Taj Mohammed Khan, was an Indian "
    "independence activist from Peshawar who campaigned alongside the Khudai Khidmatgar, a nonviolent "
    "resistance movement led by Abdul Ghaffar Khan that sought a united and independent India.")
for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            print(chunk.label(), ' '.join(c[0] for c in chunk))
Q5. Design the HMM tagger using the required transition and emission probabilities from the given
set of training sentences, and find the POS tagging for the sentence “I learn R with Artificial
intelligence”. (10M)
Training sentences
R/noun is/verb easier/adverb. AI/noun with/adposition R/noun is/verb great/adverb.
I/noun love/verb AI/noun. So/conjunction, I/noun learn/verb R/noun.
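The transition and emission tables for this question can be estimated by simple counting. The sketch below uses plain Python with unsmoothed maximum-likelihood estimates; the `<s>` and `</s>` boundary markers, the dropped punctuation and the reading of the garbled tokens as `AI/noun` and `I/noun` are assumptions.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Training sentences from the question (cleaned; punctuation dropped).
corpus = [
    "R/noun is/verb easier/adverb",
    "AI/noun with/adposition R/noun is/verb great/adverb",
    "I/noun love/verb AI/noun",
    "So/conjunction I/noun learn/verb R/noun",
]

transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
emissions = defaultdict(Counter)    # tag -> counts of the words it emits

for sentence in corpus:
    prev = "<s>"  # assumed start-of-sentence marker
    for item in sentence.split():
        word, tag = item.rsplit("/", 1)
        transitions[prev][tag] += 1
        emissions[tag][word] += 1
        prev = tag
    transitions[prev]["</s>"] += 1  # assumed end-of-sentence marker


def probabilities(counts):
    """Normalise each row of counts into maximum-likelihood probabilities."""
    return {
        row: {key: Fraction(n, sum(counter.values())) for key, n in counter.items()}
        for row, counter in counts.items()
    }


trans_p = probabilities(transitions)
emit_p = probabilities(emissions)

print("P(noun | <s>)  =", trans_p["<s>"]["noun"])
print("P(verb | noun) =", trans_p["noun"]["verb"])
print("P(R | noun)    =", emit_p["noun"]["R"])
print("P(learn | verb) =", emit_p["verb"]["learn"])
```

Neither "Artificial" nor "intelligence" occurs in the training sentences, so their unsmoothed emission probability is zero under every tag. An answer therefore has to state how unseen words are handled, for example by smoothing or by treating the phrase as "AI".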
QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | 1,2 | 1,2 | 1 | 10
Q2 | 1,2 | 1,3 | 10
Q3 | 2,2 | 1,3 | 10
Q4 | 1,3 | 10
Q5 | 1,2,3 | 1,3 | 10
VIT-AP
UNIVERSITY
“Apply Knowledge. Improve Life!”
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015 Course Title: Natural Language Processing
Set number: 2 Date of Exam: 20-02-2022 (AN)
Duration: 90 minutes Total Marks: 50
Instructions:
1. Assume data wherever necessary.
2. Any assumptions made should be clearly stated.
Q1. a) Explain briefly stemming and its types. Provide proper explanation and reasons to support
the assertions. Given a raw string, apply different types of text stemming techniques based on
the Natural Language Tool Kit.
b) Design a program to remove rare words from the given string “Britannia and Parle are rivals of
each other and better than Oreo” using the Natural Language Tool Kit platform. Also, print the
tokens before and after the removal of rare words. (5+5)M
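A minimal sketch of rare-word removal for this string. It assumes that a "rare word" is one that occurs exactly once in the text (a hapax) and uses `wordpunct_tokenize`, which needs no downloaded model; both choices are illustrative.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

text = "Britannia and Parle are rivals of each other and better than Oreo"

tokens_before = wordpunct_tokenize(text)

# Assumption: a word is "rare" when it occurs only once in the text.
freq = FreqDist(token.lower() for token in tokens_before)
rare_words = set(freq.hapaxes())

tokens_after = [t for t in tokens_before if t.lower() not in rare_words]

print("Before:", tokens_before)
print("Rare:  ", sorted(rare_words))
print("After: ", tokens_after)
```

On a single short sentence almost every word is a hapax, so only the repeated "and" survives. In practice the frequencies would come from a larger corpus, or the rule would be "drop the N least frequent words".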
Q2. Under what circumstances does lemmatization prove to be better than stemming? Provide proper
theoretical and practical explanation to support the assertions. Based on Lancaster stemming, find
the result of the tokens listed as: (5+5)M
{uing, saging, aiming, ely, being}
Q3. a) Compare the working procedure of the Unigram Tagger, Bigram Tagger, and Trigram Tagger to
tag any given string token. Provide proper explanation, snippet representation and reasons to
support the explanation.
b) Design a program to illustrate the pos_tag() method provided by the Natural Language Tool Kit.
Display the tokens and their output for the string: (5+5)M
“and my friend are going for a drive in India”
Q4. A university wants computational linguists to design a tagger based on a new rule of
tagging. As an expert in Natural Language Processing, suggest what type of tagger should be used
and provide some regular expressions to ease the work. (10)M
Q5. a) Illustrate the working procedure of Hidden Markov Model based tagging that can solve the
inadequacies of automatic corpus tagging. Provide proper explanation, data and reasons
to support the assertions.
b) Solve the tagging problem using the Hidden Markov approach. Evaluate the probability of the
tag sequence for the string “spray martin write can” that best matches the corpus tagging shown
below, where N: Noun, M: Modal, and V: Verb. (5+5)M
spray/N will/M write/V martin/N
will/M lee/N spray/V walter/N
walter/N lee/N can/M write/V will/N
walter/N will/M slide/V spray/N
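For a four-word string the joint probability of every possible tag sequence can simply be enumerated. The sketch below does that with unsmoothed counts from the four tagged sentences above; the lowercase spelling of "walter" and the `<s>`/`</s>` boundary markers are assumptions.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product

corpus = [
    "spray/N will/M write/V martin/N",
    "will/M lee/N spray/V walter/N",
    "walter/N lee/N can/M write/V will/N",
    "walter/N will/M slide/V spray/N",
]

trans = defaultdict(Counter)  # previous tag -> next-tag counts
emit = defaultdict(Counter)   # tag -> word counts
for line in corpus:
    prev = "<s>"
    for item in line.split():
        word, tag = item.split("/")
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag
    trans[prev]["</s>"] += 1


def prob(counter, key):
    """Unsmoothed maximum-likelihood estimate; unseen events get probability 0."""
    return Fraction(counter[key], sum(counter.values()))


def joint_probability(words, tags):
    """P(words, tags): product of transition and emission probabilities."""
    p, prev = Fraction(1), "<s>"
    for word, tag in zip(words, tags):
        p *= prob(trans[prev], tag) * prob(emit[tag], word)
        prev = tag
    return p * prob(trans[prev], "</s>")


words = "spray martin write can".split()
scores = {tags: joint_probability(words, tags)
          for tags in product("NMV", repeat=len(words))}
nonzero = {tags: p for tags, p in scores.items() if p > 0}

print("P(M | V) =", prob(trans["V"], "M"))
print("Taggings with non-zero probability:", nonzero)
```

Under these assumptions no tagging of "spray martin write can" gets a non-zero probability: "write" is only ever seen as V, "can" only as M, and M never follows V in the corpus. A complete answer would either report that zero or apply smoothing and say so.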
QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | 1 | 1,2 | 1,2 | 1 | 10
Q2 | 1 | 1,2 | 1,2 | 1,3 | 10
Q3 | 2 | 2,2 | 1,2 | 1,3 | 10
Q4 | 2 | 3 | 1,2,3 | 1,3 | 10
Q5 | 2 | 1,2,3 | 1,2,3 | 1,3 | 10
VIT-AP
UNIVERSITY
“Apply Knowledge. Improve Life!”
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015 Course Title: Natural Language Processing
Set number: 5 Date of Exam: [illegible]-2-2023 (FN)
Duration: 90 minutes Total Marks: 50
Instructions:
1. Assume data wherever necessary.
2. Any assumptions made should be clearly stated.
Q1. a) Explain briefly the concept of text wrangling and its types. Provide proper explanation and
reasons to support the assertions. Given a raw string, apply different types of text wrangling
pre-processing techniques based on the Natural Language Tool Kit.
b) Design a program to remove stopwords from the given string “India is my Country and we are
proud of it” using the Natural Language Tool Kit platform. Furthermore, print the tokens before
and after the removal of stopwords. Also, print the stopwords in the string. (5+5)M
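A short sketch of stopword removal, assuming the string reads "India is my Country and we are proud of it". It uses NLTK's English stopword list when that corpus is installed and otherwise falls back to a small hand-written list, which is purely an illustrative assumption.

```python
from nltk.corpus import stopwords
from nltk.tokenize import wordpunct_tokenize

text = "India is my Country and we are proud of it"

try:
    stop_words = set(stopwords.words("english"))
except LookupError:
    # The stopwords corpus is not installed (see nltk.download("stopwords"));
    # fall back to a small hand-picked list, chosen only for this example.
    stop_words = {"is", "my", "and", "we", "are", "of", "it", "in", "the", "a"}

tokens_before = wordpunct_tokenize(text)
found_stopwords = [t for t in tokens_before if t.lower() in stop_words]
tokens_after = [t for t in tokens_before if t.lower() not in stop_words]

print("Before:   ", tokens_before)
print("Stopwords:", found_stopwords)
print("After:    ", tokens_after)
```

Comparing on the lowercased token keeps capitalised stopwords from slipping through while the printed tokens keep their original case.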
Q2. a) Differentiate between stemming and lemmatization. Provide proper theoretical and practical
explanation to support the assertions. Based on Lancaster stemming, find the result of the
tokens listed as:
{elng, rhythmic, caring, aings, spying}
b) Describe any two strict conditions/rules to prevent the phenomenon of over-stemming of
various short-rooted words in Lancaster stemming. Also, explain the different techniques that
can reduce the stemming errors encountered while using prefix- and suffix-based stemming.
(5+5)M
Q3. a) Explain the working procedure of the Brill tagger and illustrate any two rules that are
employed to tag a token. Provide proper explanation, snippet representation and reasons
to support the explanation.
b) Design a program to illustrate the ne_chunk() method used for Named Entity
Recognition in the Natural Language Tool Kit. Display the output for the string: (5+5)M
“David is playing at Vellore University in India”
Q4. Differentiate between Regex and N-gram taggers. Design a program for both the Regex and
N-gram taggers and explain the reason for similarity or difference in the output. (10)M
Q5. a) Explain why annotating modern multi-billion-word corpora manually is unrealistic. Provide
proper explanation, data and reasons to support the assertions. Illustrate the working
procedure of an automatic and stochastic tagging-based approach that can solve the
inadequacies of the enormous text corpora.
b) Solve the tagging problem using a stochastic tagging-based approach. Evaluate the probability
of the tag sequence for the string “can daniel write spray” that best matches the corpus tagging shown
below, where N: Noun, M: Modal, and V: Verb. (5+5)M
daniel/N will/M slide/V spray/N
spray/N will/M write/V daniel/N
will/M brown/N spray/V daniel/N
daniel/N brown/N can/M write/V will/N
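The most probable tagging can be found with the Viterbi algorithm. The sketch below is one plain-Python way to do it on the four tagged sentences above, using unsmoothed counts and assumed `<s>`/`</s>` boundary markers; exact fractions keep the arithmetic easy to check by hand.

```python
from collections import Counter, defaultdict
from fractions import Fraction

corpus = [
    "daniel/N will/M slide/V spray/N",
    "spray/N will/M write/V daniel/N",
    "will/M brown/N spray/V daniel/N",
    "daniel/N brown/N can/M write/V will/N",
]
TAGS = ("N", "M", "V")

trans = defaultdict(Counter)  # previous tag -> next-tag counts
emit = defaultdict(Counter)   # tag -> word counts
for line in corpus:
    prev = "<s>"
    for item in line.split():
        word, tag = item.split("/")
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag
    trans[prev]["</s>"] += 1


def prob(counter, key):
    """Unsmoothed maximum-likelihood estimate; unseen events get probability 0."""
    return Fraction(counter[key], sum(counter.values()))


def viterbi(words):
    """Return (probability, tags) of the most probable tag sequence."""
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {t: (prob(trans["<s>"], t) * prob(emit[t], words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        best = {
            t: max(
                ((p * prob(trans[prev], t) * prob(emit[t], word), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda cand: cand[0],
            )
            for t in TAGS
        }
    # Close the sentence with the transition into the end marker.
    return max(
        ((p * prob(trans[t], "</s>"), path) for t, (p, path) in best.items()),
        key=lambda cand: cand[0],
    )


probability, tags = viterbi("can daniel write spray".split())
print(tags, probability)
```

With these assumptions the only tagging with non-zero probability is M N V N. Dropping the end-of-sentence transition changes both the N transition probabilities and the final value, so an answer should state which convention it uses.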
QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | 1 | 1,2 | 1,2 | 1 | 10
Q2 | 1 | 1,2 | 1,2 | 1,3 | 10
Q3 | 2 | 2,2 | 1,2 | 1,3 | 10
Q4 | 2 | 3 | 1,2,3 | 1,3 | 10
Q5 | 2 | 1,2,3 | 1,2,3 | 1,3 | 10