Cse3015 NLP QP

NLP QP, VIT-AP

Uploaded by Chicken chief
Copyright © All Rights Reserved
VIT-AP UNIVERSITY
"Apply Knowledge. Improve Life!"
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015
Course Title: Natural Language Processing
Set number: 3
Date of Exam: 16-02-2023 (FN)
Duration: 1 hour 30 minutes
Total Marks: 50

Q1. a. Consider the following sentence of the English language. (5M)
"Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyse large amounts of natural language data."
Tokenize the above content using the necessary operations in the NLTK library. From the resulting tokens, remove the stop words using the wordnet dictionary and explain each step involved.
b. Consider the following sentences of the English language. (5M)
"With the growth of the web, increasing amounts of raw language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms."
Find the root word of each token using stemming and lemmatization with the NLTK library. Compare the results of those operations and justify the performance of each method.

Q2. a. Stem the following tokens using appropriate regular expression patterns from the NLTK library and explain the problems in the stemmed words. (5M)
[Advisable, Drinking, Eating, Swimming, Comparable, Computers, Bats, Considered, Informed, Asked]
b. Correct the following words using the edit distance and Jaccard distance methods from the NLTK library. Compare the results and explain each step involved. (5M)
[asending, caating, flwing, expreesion, amaaing]

Q3. a. Tag the following sentence using proper POS tags from the Penn Treebank tag set and group those tags into open and closed categories. (5M)
"Tendulkar took up cricket at the age of eleven, made his Test match debut on 15 November 1989 against Pakistan in Karachi at the age of sixteen, and went on to represent Mumbai domestically and India internationally for close to twenty-four years."
b. Tag the following testing sentences with a unigram tagger using appropriate POS tags from the universal tag set. Generate the lookup table from the given training data set by training the unigram model. Note: use the DefaultTagger as the backoff tagger. (5M)
Training sentences:
"Sachin/NOUN is/VERB the/DET best/ADJ righthand/ADJ player/NOUN in/PRT the/DET cricket/NOUN world/NOUN. Ganguly/NOUN is/VERB the/DET best/ADJ lefthand/ADJ player/NOUN of/PRT the/DET cricket/NOUN world/NOUN. GlennMcGrath/NOUN is/VERB the/DET best/ADJ fast/ADJ bowler/NOUN in/PRT the/DET cricket/NOUN world/NOUN. ShaneWarne/NOUN is/VERB the/DET best/ADJ spinner/NOUN in/PRT the/DET cricket/NOUN world/NOUN."
Testing sentences:
"Shewag is the good cricket player in the cricket world. He is the better spin bowler too. He is a right hand batter and bowler."

Q4. a. Design a regular expression tagger to tag the following tokens using the necessary POS rules. Use the POS tags from the Penn Treebank tag set to define the rules. (5M)
[Eating, Would, Should, Borrowed, India, Pakistan, Eats, advisable, best, the, 1988]
b. Find the output of the following Python code. (5M)

    import nltk

    sentence = "Khan was born on 2 November 1965 into a Muslim family in New Delhi. He spent the first five years of his life in Mangalore, where his maternal grandfather, Iftikhar Ahmed, served as the chief engineer of the port in the 1960s. Khan's father, Mir Taj Mohammed Khan, was an Indian independence activist from Peshawar who campaigned alongside the Khudai Khidmatgar, a nonviolent resistance movement led by Abdul Ghaffar Khan that sought a united and independent India."

    # Split into sentences, then tokenize, POS-tag and NE-chunk each sentence
    for sent in nltk.sent_tokenize(sentence):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            if hasattr(chunk, 'label'):
                print(chunk.label(), ' '.join(c[0] for c in chunk))

Q5. Design the HMM tagger using the required transition and emission probabilities from the given set of training sentences, and find the POS tagging for the sentence "I learn R with Artificial intelligence". (10M)
Training sentences:
R/noun is/verb easier/adverb.
AI/noun with/adposition R/noun is/verb great/adverb.
I/noun love/verb AI/noun, So/conjunction, I/noun learn/verb R/noun.

QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | ? | ? | ? | ? | ? | 10
Q2 | ? | ? | ? | ? | ? | 10
Q3 | ? | ? | ? | ? | ? | 10
Q4 | ? | ? | ? | ? | ? | 10
Q5 | ? | ? | ? | ? | ? | 10

VIT-AP UNIVERSITY
"Apply Knowledge. Improve Life!"
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015
Course Title: Natural Language Processing
Set number: 2
Date of Exam: 20-02-2022 (AN)
Duration: 90 minutes
Total Marks: 50
Instructions:
1. Assume data wherever necessary.
2. Any assumptions made should be clearly stated.

Q1. a) Explain briefly stemming and its types. Provide a proper explanation and reasons to support the assertions. Given a raw string, apply the different types of text stemming techniques available in the Natural Language Toolkit.
b) Design a program to remove rare words from the given string "Britannia and Parle are rivals of each other and better than Oreo" using the Natural Language Toolkit platform. Also, print the tokens before and after the removal of rare words. (5+5)M
Q2. Under what circumstances does lemmatization prove to be better than stemming? Provide a proper theoretical and practical explanation to support the assertions.
Based on Lancaster stemming, find the result for the tokens listed: {uing, saging, aiming, ely, being} (5+5)M
Q3. a) Compare the working procedure of the Unigram Tagger, Bigram Tagger, and Trigram Tagger in tagging the tokens of any given string. Provide a proper explanation, snippet representation and reasons to support the explanation.
b) Design a program to illustrate the pos_tag() method provided by the Natural Language Toolkit. Display the tokens and their output for the string: "and my friend are going for a drive in India" (5+5)M
Q4. A university wants computational linguists to design a tagger based on new tagging rules. As an expert in Natural Language Processing, suggest what type of tagger should be used and provide some regular expressions to ease the work. (10)M
Q5. a) Illustrate the working procedure of Hidden Markov Model based tagging that can solve the inadequacies of automatic corpus tagging. Provide a proper explanation, data and reasons to support the assertions.
b) Solve the tagging problem using the Hidden Markov approach. Evaluate the probability of the sequence of strings "spray martin write can" that best matches the corpus tagging shown below, where N: Noun, M: Modal, and V: Verb. (5+5)M
spray/N will/M write/V martin/N will/M lee/N spray/V Walter/N walter/N lee/N can/M write/V will/N walter/N will/M slide/V spray/N

QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | 1 | 1,2 | ? | ? | ? | 10
Q2 | 1 | 1,2 | ? | ? | ? | 10
Q3 | 2 | ? | ? | ? | ? | 10
Q4 | 2 | 3 | ? | ? | ? | 10
Q5 | 2 | 1,2,3 | ? | ? | ? | 10

VIT-AP UNIVERSITY
"Apply Knowledge. Improve Life!"
QUESTION PAPER
Name of the Examination: WINTER 2022-2023 - CAT-1
Course Code: CSE3015
Course Title: Natural Language Processing
Set number: 5
Date of Exam: ?-02-2023 (FN)
Duration: 90 minutes
Total Marks: 50
Instructions:
1. Assume data wherever necessary.
2. Any assumptions made should be clearly stated.

Q1. a) Explain briefly the concept of text wrangling and its types. Provide a proper explanation and reasons to support the assertions. Given a raw string, apply the different types of text-wrangling pre-processing techniques available in the Natural Language Toolkit.
b) Design a program to remove stop words from the given string "India is my country and we are proud of it" using the Natural Language Toolkit platform. Furthermore, print the tokens before and after the removal of stop words. Also, print the stop words in the string. (5+5)M
Q2. a) Differentiate between stemming and lemmatization. Provide a proper theoretical and practical explanation to support the assertions. Based on Lancaster stemming, find the result for the tokens listed: {elng, rhythmic, caring, aings, spying}
b) Describe any two strict conditions/rules to prevent over-stemming of various short-rooted words in Lancaster stemming. Also, explain the different techniques that can reduce the stemming errors encountered in prefix- and suffix-based stemming. (5+5)M
Q3. a) Explain the working procedure of the Brill tagger and illustrate any two rules that are employed to tag a token. Provide a proper explanation, snippet representation and reasons to support the explanation.
b) Design a program to illustrate the ne_chunk() method for Named Entity Recognition in the Natural Language Toolkit. Display the output for the string: "David is playing at Vellore University in India" (5+5)M
Q4. Differentiate between Regex and N-gram taggers. Design a program for both the Regex and N-gram taggers and explain the reason for the similarity or difference in the output. (10)M
Q5. a) Explain why annotating modern multi-billion-word corpora manually is unrealistic. Provide a proper explanation, data and reasons to support the assertions. Illustrate the working procedure of an automatic, stochastic tagging-based approach that can overcome this limitation for such enormous text corpora.
b) Solve the tagging problem using a stochastic tagging-based approach. Evaluate the probability of the sequence of strings "can daniel write spray" that best matches the corpus tagging shown below, where N: Noun, M: Modal, and V: Verb. (5+5)M
daniel/N will/M slide/V spray/N spray/N will/M write/V daniel/N will/M brown/N spray/V daniel/N daniel/N brown/N can/M write/V will/N

QP MAPPING
Q.No. | Module Number | CO Mapped | PO Mapped | PEO Mapped | PSO Mapped | Marks
Q1 | 1 | 1,2 | ? | ? | ? | 10
Q2 | ? | ? | ? | ? | ? | 10
Q3 | 2 | ? | ? | ? | ? | 10
Q4 | 2 | 3 | ? | ? | ? | 10
Q5 | 2 | 1,2,3 | ? | ? | ? | 10
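For the Set 3 question on stemming the ten tokens with regular-expression patterns, the following is a minimal sketch using NLTK's `RegexpStemmer`. The suffix pattern `ing$|s$|ed$|able$` and the minimum word length `min=4` are assumptions chosen for illustration, not the pattern the paper expects. The output exposes the usual weaknesses of blind suffix stripping.

```python
from nltk.stem import RegexpStemmer

# Assumed suffix pattern: strip a final "ing", "s", "ed" or "able".
# Words shorter than 4 characters are returned unchanged (min=4).
stemmer = RegexpStemmer(r"ing$|s$|ed$|able$", min=4)

tokens = ["Advisable", "Drinking", "Eating", "Swimming", "Comparable",
          "Computers", "Bats", "Considered", "Informed", "Asked"]

# Map each token to its stem and print them side by side
stems = {token: stemmer.stem(token) for token in tokens}
for token, stem in stems.items():
    print(f"{token:<12} -> {stem}")
```

With this pattern, "Advisable" becomes "Advis" and "Comparable" becomes "Compar", neither of which is a word. "Swimming" keeps its doubled consonant as "Swimm", and "Computers" only loses its final "s". A pattern list cannot encode the spelling rules that a dictionary-backed lemmatizer applies.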
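For the Set 3 question on correcting misspelt words, the sketch below ranks candidates by NLTK's `edit_distance` (Levenshtein) and by `jaccard_distance` over character-bigram sets. The small `VOCAB` list is a hypothetical stand-in for a real word list such as NLTK's `words` corpus, which must be downloaded separately. The bigram granularity is also an assumption.

```python
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.util import ngrams

# Hypothetical candidate vocabulary (a full solution would use a large word list)
VOCAB = ["ascending", "coating", "flowing", "flying",
         "expression", "amazing", "amusing"]

MISSPELT = ["asending", "caating", "flwing", "expreesion", "amaaing"]


def bigrams(word):
    """Set of character bigrams, e.g. "flwing" -> {("f", "l"), ("l", "w"), ...}."""
    return set(ngrams(word, 2))


def by_edit_distance(word):
    # Fewest insertions/deletions/substitutions; min() keeps the first candidate on ties
    return min(VOCAB, key=lambda cand: edit_distance(word, cand))


def by_jaccard(word):
    # 1 - |A & B| / |A | B| computed over the two bigram sets
    return min(VOCAB, key=lambda cand: jaccard_distance(bigrams(word), bigrams(cand)))


for word in MISSPELT:
    print(f"{word:<11} edit: {by_edit_distance(word):<11} jaccard: {by_jaccard(word)}")
```

Both methods agree on this vocabulary. For "flwing", however, edit distance ties "flowing" and "flying" at 1, and `min` simply keeps the earlier entry. The bigram Jaccard distance separates the two candidates: 3/7 for "flowing" versus 4/7 for "flying".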
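For the Set 3 unigram-tagger question, a sketch follows in the style of the NLTK book's lookup tagger. It builds the lookup table (word -> most frequent tag) from the training sentences with `ConditionalFreqDist`, then passes it to `UnigramTagger` with a `DefaultTagger` backoff. Choosing `NOUN` as the default tag and using a simple regex tokenizer (to avoid downloading the punkt model) are assumptions.

```python
import re

from nltk.probability import ConditionalFreqDist
from nltk.tag import DefaultTagger, UnigramTagger, str2tuple

# Training sentences from the question, one per line, sentence-final periods dropped
TRAIN = """
Sachin/NOUN is/VERB the/DET best/ADJ righthand/ADJ player/NOUN in/PRT the/DET cricket/NOUN world/NOUN
Ganguly/NOUN is/VERB the/DET best/ADJ lefthand/ADJ player/NOUN of/PRT the/DET cricket/NOUN world/NOUN
GlennMcGrath/NOUN is/VERB the/DET best/ADJ fast/ADJ bowler/NOUN in/PRT the/DET cricket/NOUN world/NOUN
ShaneWarne/NOUN is/VERB the/DET best/ADJ spinner/NOUN in/PRT the/DET cricket/NOUN world/NOUN
"""
TEST = ("Shewag is the good cricket player in the cricket world. "
        "He is the better spin bowler too. He is a right hand batter and bowler.")

# "word/TAG" strings -> (word, TAG) tuples
train_sents = [[str2tuple(tok) for tok in line.split()]
               for line in TRAIN.strip().splitlines()]

# Lookup table: every training word mapped to its most frequent tag
cfd = ConditionalFreqDist((w, t) for sent in train_sents for (w, t) in sent)
lookup = {word: cfd[word].max() for word in cfd.conditions()}

# Words missing from the table fall back to the (assumed) default tag NOUN
tagger = UnigramTagger(model=lookup, backoff=DefaultTagger("NOUN"))

tokens = re.findall(r"\w+|[^\w\s]", TEST)
tagged = tagger.tag(tokens)
print(lookup)
print(tagged)
```

Words never seen in training, such as "good", "better" and "He", all receive the default `NOUN`. This is why a unigram lookup tagger is only as good as its training vocabulary and its backoff.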
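For the Set 3 regular-expression tagger question (and the rule-based tagger asked for in Set 2, Q4), one possible rule set for NLTK's `RegexpTagger` is sketched below using Penn Treebank tags. The patterns themselves are illustrative assumptions. Their order matters because the tagger returns the tag of the first pattern that matches.

```python
from nltk.tag import RegexpTagger

# Hypothetical rules, tried top to bottom; the first match wins
patterns = [
    (r"^\d+$", "CD"),                                         # cardinal numbers
    (r"^(the|a|an)$", "DT"),                                  # determiners
    (r"(?i)^(would|should|could|can|will|may|must)$", "MD"),  # modals
    (r".*ing$", "VBG"),                                       # gerunds / present participles
    (r".*ed$", "VBD"),                                        # past tense
    (r".*able$", "JJ"),                                       # adjectives
    (r".*est$", "JJS"),                                       # superlatives
    (r".*s$", "VBZ"),                                         # 3rd-person singular (also catches plurals)
    (r"^[A-Z][a-z]+$", "NNP"),                                # capitalised words as proper nouns
    (r".*", "NN"),                                            # everything else: common noun
]
tagger = RegexpTagger(patterns)

tokens = ["Eating", "Would", "Should", "Borrowed", "India", "Pakistan",
          "Eats", "advisable", "best", "the", "1988"]
tagged = tagger.tag(tokens)
print(tagged)
```

Because the rules look only at spelling, "Borrowed" is always VBD even where VBN is correct, and any plural noun ending in "s" would be tagged VBZ. These are the kinds of errors a statistical N-gram tagger can fix when it has context.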
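For the Set 3 HMM question, here is a pure-Python sketch of a bigram HMM tagger. Transition probabilities P(tag | previous tag) and emission probabilities P(word | tag) are maximum-likelihood estimates from the three training sentences, with punctuation dropped. Viterbi decoding then selects the best tag path. "Artificial" and "intelligence" never occur in training, so the sketch assumes a hypothetical constant `UNK_PROB` as the emission probability of any unseen word under every tag. This leaves the decision for those words to the transitions. It is one simple choice, not the method the paper prescribes.

```python
from collections import Counter, defaultdict

TRAIN = [
    "R/noun is/verb easier/adverb",
    "AI/noun with/adposition R/noun is/verb great/adverb",
    "I/noun love/verb AI/noun So/conjunction I/noun learn/verb R/noun",
]
UNK_PROB = 1e-3  # hypothetical emission probability for unseen words, same for every tag


def train_hmm(lines):
    """Return MLE transition (A) and emission (B) probability tables."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] counts
    emit = defaultdict(Counter)   # emit[tag][word] counts
    for line in lines:
        prev = "<s>"  # sentence-start pseudo-tag
        for item in line.split():
            word, tag = item.rsplit("/", 1)
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    A = {p: {t: c / sum(cs.values()) for t, c in cs.items()} for p, cs in trans.items()}
    B = {t: {w: c / sum(cs.values()) for w, c in cs.items()} for t, cs in emit.items()}
    return A, B


def viterbi(words, A, B):
    """Return (probability, tag path) of the most likely tag sequence."""
    vocab = {w for dist in B.values() for w in dist}

    def emission(tag, word):
        return B[tag].get(word, 0.0) if word in vocab else UNK_PROB

    # best[tag] = (probability, path) of the best path ending in `tag`
    best = {t: (A["<s>"].get(t, 0.0) * emission(t, words[0]), [t]) for t in B}
    for word in words[1:]:
        best = {
            t: max(
                ((p * A.get(prev, {}).get(t, 0.0) * emission(t, word), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda cand: cand[0],
            )
            for t in B
        }
    return max(best.values(), key=lambda cand: cand[0])


A, B = train_hmm(TRAIN)
sentence = "I learn R with Artificial intelligence".split()
prob, tags = viterbi(sentence, A, B)
print("Transitions:", A)
print("Emissions:", B)
print(list(zip(sentence, tags)), prob)
```

With these estimates, the best path is I/noun learn/verb R/noun with/adposition Artificial/noun intelligence/verb. "intelligence" ends up as a verb only because noun -> verb (4/6) is the most frequent transition in this tiny training set. The result shows how sparse training data limits an HMM tagger on unseen words.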
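For the Set 2 rare-word question, a minimal sketch with NLTK's `FreqDist`. The paper does not define "rare", so the sketch assumes a rare word is one that occurs exactly once (`FreqDist.hapaxes()`). It also uses whitespace splitting instead of `word_tokenize` to avoid a model download.

```python
from nltk.probability import FreqDist

text = "Britannia and Parle are rivals of each other and better than Oreo"
tokens = text.split()  # whitespace tokenisation keeps the sketch download-free

freq = FreqDist(tokens)
# Assumption: "rare" means the word occurs exactly once (a hapax legomenon)
rare_words = set(freq.hapaxes())
filtered = [tok for tok in tokens if tok not in rare_words]

print("Before:", tokens)
print("Rare words:", sorted(rare_words))
print("After: ", filtered)
```

In a single short sentence, every word except "and" occurs once, so this threshold removes almost everything. On a real corpus, the threshold (or a fixed number of least-frequent words) has to be chosen with the corpus size in mind.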
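For the Set 2 question comparing unigram, bigram and trigram taggers (and the N-gram part of Set 5, Q4), the sketch below reuses the Set 3 cricket training sentences as assumed training data. A `BigramTagger` on its own looks up the context (previous tag, current word). Once the first token is unseen, it returns `None`, and every later context then contains a `None` tag and fails as well. Chaining trigram -> bigram -> unigram -> default through `backoff` avoids this problem.

```python
import re

from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger, str2tuple

# Assumed training data: the cricket sentences from Set 3, Q3(b)
TRAIN = """
Sachin/NOUN is/VERB the/DET best/ADJ righthand/ADJ player/NOUN in/PRT the/DET cricket/NOUN world/NOUN
Ganguly/NOUN is/VERB the/DET best/ADJ lefthand/ADJ player/NOUN of/PRT the/DET cricket/NOUN world/NOUN
GlennMcGrath/NOUN is/VERB the/DET best/ADJ fast/ADJ bowler/NOUN in/PRT the/DET cricket/NOUN world/NOUN
ShaneWarne/NOUN is/VERB the/DET best/ADJ spinner/NOUN in/PRT the/DET cricket/NOUN world/NOUN
"""
train_sents = [[str2tuple(tok) for tok in line.split()]
               for line in TRAIN.strip().splitlines()]
tokens = re.findall(r"\w+|[^\w\s]", "Shewag is the good cricket player in the cricket world.")

# A bigram tagger with no backoff: context = (previous tag, current word)
bigram_only = BigramTagger(train_sents)

# Backoff chain: each tagger defers to the next when its context is unseen
default = DefaultTagger("NOUN")
unigram = UnigramTagger(train_sents, backoff=default)
bigram = BigramTagger(train_sents, backoff=unigram)
trigram = TrigramTagger(train_sents, backoff=bigram)

print("bigram only:", bigram_only.tag(tokens))
print("chained:    ", trigram.tag(tokens))
```

Here the chained tagger produces the same tags as the unigram tagger. Every training word has only one tag, so the bigram and trigram models learn no context that improves on their backoff. Larger context windows pay off only when the training data contains ambiguous words.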
