NLP QB

CSE (AI & ML)
Course Code: MR20-1CS0249(R20)   Course Name: NATURAL LANGUAGE PROCESSING
QUESTION BANK
Qno Question Marks Section
1 What is word tokenization? What are the major challenges in tokenizing the
words of a sentence? 12 Section-I

2 What is the minimum edit distance between two words? Calculate the minimum
edit distance between the words “small” and “smell” using the dynamic
programming algorithm. Assume the costs for insertion, deletion, and
substitution are 1, 1, and 2 respectively. 12 Section-I
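
A minimal dynamic-programming sketch of this calculation (Python; the costs
match those stated in the question, and substituting 'a' with 'e' at cost 2
is the cheapest edit, so the distance is 2):

    def min_edit_distance(src, tgt, ins=1, dele=1, sub=2):
        # D[i][j] = cheapest cost of converting src[:i] into tgt[:j]
        m, n = len(src), len(tgt)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = i * dele
        for j in range(1, n + 1):
            D[0][j] = j * ins
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if src[i - 1] == tgt[j - 1] else sub
                D[i][j] = min(D[i - 1][j] + dele,      # deletion
                              D[i][j - 1] + ins,       # insertion
                              D[i - 1][j - 1] + cost)  # match / substitution
        return D[m][n]

    print(min_edit_distance("small", "smell"))  # 2: substitute 'a' -> 'e'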

3 What is the difference between non-word and real-word spelling correction?
What is perplexity? Estimate the perplexity of the corpus “the man is a thief
but the man is a good man” under a unigram language model. 12 Section-I
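
A minimal sketch of this estimate: take unigram MLE probabilities from the
corpus itself, then compute perplexity = exp(-(1/N) Σ log P(wi)):

    import math
    from collections import Counter

    corpus = "the man is a thief but the man is a good man".split()
    counts = Counter(corpus)   # the:2, man:3, is:2, a:2, thief:1, but:1, good:1
    N = len(corpus)            # 12 tokens

    # Unigram MLE: P(w) = count(w) / N
    log_prob = sum(math.log(counts[w] / N) for w in corpus)

    # Perplexity = exp(-(1/N) * sum_i log P(w_i))
    print(math.exp(-log_prob / N))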

4 What is the Maximum Likelihood Estimate? How is it used in a language
model? 12 Section-I

Given the corpus, calculate the following:

a. Find all the possible bigrams in the given corpus.
b. Find the frequencies of all the bigrams.
c. Find the frequencies of all the unigrams.
d. Calculate the Maximum Likelihood Estimate for all bigrams.
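
The corpus itself is not reproduced above; the following sketch shows steps
(a)-(d) on an assumed two-sentence toy corpus with <s>/</s> boundary markers:

    from collections import Counter

    # Assumed toy corpus; substitute the corpus given in the question.
    sentences = [["<s>", "i", "am", "sam", "</s>"],
                 ["<s>", "sam", "i", "am", "</s>"]]

    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((s[i], s[i + 1]) for s in sentences
                      for i in range(len(s) - 1))

    # MLE: P(w2 | w1) = count(w1, w2) / count(w1)
    for (w1, w2), c in bigrams.items():
        print(f"P({w2}|{w1}) = {c}/{unigrams[w1]} = {c / unigrams[w1]:.2f}")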

5 What is morphology in NLP? What are morphemes? What are bound and free
morphemes? Explain with examples. What is stemming, and how is it different
from lemmatization? 12 Section-I
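
A brief sketch contrasting the two, assuming NLTK is installed and the
WordNet data has been downloaded via nltk.download('wordnet'):

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running", "better"]:
        # Stemming crudely strips suffixes ("studies" -> "studi"), while
        # lemmatization maps to a dictionary form ("studies" -> "study").
        print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))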

6 Write about Evaluation of Language Models and Basic Smoothing 12 Section-I
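
As a small illustration of basic smoothing, an add-one (Laplace) estimate for
a bigram probability, where V is the vocabulary size (a minimal sketch):

    def laplace_bigram_prob(bigram_count, unigram_count, vocab_size):
        # P(w2|w1) = (count(w1, w2) + 1) / (count(w1) + V)
        return (bigram_count + 1) / (unigram_count + vocab_size)

    # An unseen bigram still receives some probability mass:
    print(laplace_bigram_prob(0, 5, 1000))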

7 Explain the noisy channel model for spelling correction and N-gram language
models. 12 Section-I

8 Develop a comprehensive text processing pipeline that includes tokenization,
stemming, normalization, and spelling correction. 12 Section-I
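
A minimal end-to-end sketch under stated assumptions: a toy lexicon, a toy
suffix-stripping stemmer (illustrative only, not Porter's algorithm), and
difflib-based correction standing in for a real edit-distance corrector:

    import difflib
    import re

    VOCAB = ["the", "cat", "be", "run"]  # assumed toy lexicon

    def normalize(text):
        # Lowercase and strip punctuation -- minimal normalization step.
        return re.sub(r"[^\w\s]", "", text.lower())

    def stem(token):
        # Toy suffix stripper, for illustration only.
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def correct(token):
        # Spelling correction: nearest in-vocabulary word, else keep token.
        matches = difflib.get_close_matches(token, VOCAB, n=1)
        return matches[0] if matches else token

    tokens = normalize("The catts were RUNNING!").split()  # tokenization
    print([correct(stem(t)) for t in tokens])  # ['the', 'cat', 'were', 'run']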

9 Apply different smoothing techniques to a language model and analyze their
impact on performance. 12 Section-I

10 Design an algorithm to correct spelling errors in a given text document. 12 Section-I
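
One well-known design for such an algorithm, in the style of Norvig's
spelling corrector: generate every candidate one edit away and pick the most
frequent known word. A minimal sketch (the frequency dictionary is a
placeholder):

    import string

    def edits1(word):
        # All strings one edit away: deletes, transposes, replaces, inserts.
        letters = string.ascii_lowercase
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def correct(word, counts):
        # Pick the most frequent known candidate; counts maps word -> frequency.
        candidates = [w for w in edits1(word) if w in counts] or [word]
        return max(candidates, key=lambda w: counts.get(w, 0))

    print(correct("speling", {"spelling": 10, "spewing": 2}))  # spelling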


11 What is morphology in NLP? What are morphemes? What are bound and free
morphemes? Explain with examples. What is stemming, and how is it different
from lemmatization? 12 Section-II

12 What is the difference between inflectional and derivational morphology?
Explain with examples. What is morphological analysis? Explain with an
example. 12 Section-II

13 What is POS tagging? Find the POS tags for the phrase “the light book”
using the Viterbi algorithm in a hidden Markov tagging model with the
following information. 12 Section-II
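
The transition and emission probabilities referenced in the question are not
reproduced above; the sketch below uses assumed placeholder values for the
tags D, N, V:

    def viterbi(words, tags, start_p, trans_p, emit_p):
        # V[i][t] = (best probability of tagging words[:i+1] ending in t, path)
        V = [{t: (start_p[t] * emit_p[t].get(words[0], 0), [t]) for t in tags}]
        for w in words[1:]:
            row = {}
            for t in tags:
                prob, path = max(
                    (V[-1][p][0] * trans_p[p][t] * emit_p[t].get(w, 0),
                     V[-1][p][1] + [t])
                    for p in tags)
                row[t] = (prob, path)
            V.append(row)
        return max(V[-1].values())  # (probability, best tag sequence)

    # Assumed placeholder parameters, not the table from the question.
    tags = ["D", "N", "V"]
    start_p = {"D": 0.6, "N": 0.3, "V": 0.1}
    trans_p = {"D": {"D": 0.1, "N": 0.8, "V": 0.1},
               "N": {"D": 0.1, "N": 0.3, "V": 0.6},
               "V": {"D": 0.4, "N": 0.5, "V": 0.1}}
    emit_p = {"D": {"the": 0.7},
              "N": {"light": 0.4, "book": 0.5},
              "V": {"light": 0.2, "book": 0.1}}
    print(viterbi("the light book".split(), tags, start_p, trans_p, emit_p))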

14 What is the difference between real words and non-words? What is an FSA,
and how can inflections in words be represented using an FSA? Explain with an
example. 12 Section-II

15 What are the problems of the hidden Markov model in predicting the POS
tags for a given sentence or phrase? Explain how the Baum-Welch algorithm
learns the parameters: the transition matrix, the observation matrix, and the
initial state distribution. 12 Section-II

16 What is smoothing in a language model? What are the advantages of
smoothing? Find the Good-Turing smoothed counts for the following sentence:
“he is he is good man” 12 Section-II
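
A minimal sketch of the calculation on exactly this sentence: the counts are
he:2, is:2, good:1, man:1, so N1 = 2 and N2 = 2, and the adjusted count for
c = 1 is c* = (c + 1) * N2/N1 = 2:

    from collections import Counter

    tokens = "he is he is good man".split()
    counts = Counter(tokens)        # he:2, is:2, good:1, man:1
    N = len(tokens)                 # 6
    Nc = Counter(counts.values())   # N1 = 2, N2 = 2

    # Good-Turing adjusted count: c* = (c + 1) * N_{c+1} / N_c
    for c in sorted(Nc):
        if Nc.get(c + 1):
            print(f"c={c}: c* = {(c + 1) * Nc[c + 1] / Nc[c]:.2f}")

    # Probability mass reserved for unseen events: N1 / N
    print("P(unseen) =", Nc[1] / N)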

17 Explain the different categories of affixes in morphology with examples.
What are the differences between content and functional morphemes? What is
the difference between the regular and irregular forms of verbs and nouns
respectively? 12 Section-II

18 Establish why the maximum entropy model is better than the hidden Markov
model. How is POS tagging achieved in the maximum entropy model? Explain beam
search in detail. 12 Section-II

19 How is uniformity maintained in the maximum entropy model? Write down the
principles of the maximum entropy model. 12 Section-II

20 Consider the maximum entropy model for POS tagging, where you want to
estimate P(tag|word). In a hypothetical setting, assume that tag can take the
values D, N and V (short forms for Determiner, Noun and Verb). The variable
word could be any member of a set V of possible words, where V contains the
words a, man, sleeps, as well as additional words. The distribution should
give the following probabilities: 12 Section-II

P(D|a) = 0.9
P(N|man) = 0.9
P(V|sleeps) = 0.9
P(D|word) = 0.6 for any word other than a, man or sleeps
P(N|word) = 0.3 for any word other than a, man or sleeps
P(V|word) = 0.1 for any word other than a, man or sleeps

It is assumed that all other probabilities not defined above can take any
values such that ΣP(tag|word) = 1 is satisfied for any word in V.

a. Define the features of your maximum entropy model that can model this
   distribution. Mark your features as f1, f2 and so on. Each feature should
   have the same format as explained in class.
b. For each feature fi, assume a weight λi. Now write expressions for the
   following probabilities in terms of your model parameters: P(D|cat),
   P(N|laughs), P(D|man).
c. What values do the parameters in your model take to give the distribution
   described above (i.e. P(D|a) = 0.9, and so on)?
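
A minimal sketch of how such a log-linear model turns feature weights into
P(tag|word); the two features and the weight values below are illustrative
placeholders, not the solution to parts (a)-(c):

    import math

    def maxent_prob(tag, word, tags, features, weights):
        # P(tag|word) = exp(sum_i lambda_i * f_i(tag, word)) / Z(word)
        def score(t):
            return math.exp(sum(lam * f(t, word)
                                for f, lam in zip(features, weights)))
        return score(tag) / sum(score(t) for t in tags)

    # Illustrative features: f1 fires only on (D, a); f2 fires on D with any word.
    features = [lambda t, w: 1 if (t, w) == ("D", "a") else 0,
                lambda t, w: 1 if t == "D" else 0]
    weights = [1.5, 0.5]  # placeholder weights, not the values asked in part (c)

    print(maxent_prob("D", "a", ["D", "N", "V"], features, weights))  # ~0.79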

21 What is syntax? What is parsing? What is the difference between a
derivation and a parse tree? What is constituency? Write down the different
forms of constituency with examples. What is the significance of the “head”
of a constituent? Explain. 12 Section-III

22 What is the difference between top-down and bottom-up parsing? Apply the
CYK algorithm to parse the sentence “a pilot likes flying planes” with the
given grammar. 12 Section-III
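
The grammar referenced in questions 22-23 is not reproduced above; a minimal
CYK recognizer sketch, assuming a small grammar in Chomsky Normal Form that
covers the sentence:

    def cyk(words, grammar):
        # grammar: (lhs, rhs) rules in CNF; rhs is a 1-tuple (terminal)
        # or a 2-tuple (nonterminals). table[i][j] = nonterminals spanning
        # words[i:j].
        n = len(words)
        table = [[set() for _ in range(n + 1)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i + 1] = {lhs for lhs, rhs in grammar if rhs == (w,)}
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for lhs, rhs in grammar:
                        if (len(rhs) == 2 and rhs[0] in table[i][k]
                                and rhs[1] in table[k][j]):
                            table[i][j].add(lhs)
        return "S" in table[0][n]

    # Assumed CNF fragment, not the grammar from the question.
    grammar = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("Adj", "N")),
               ("VP", ("V", "NP")), ("Det", ("a",)), ("N", ("pilot",)),
               ("N", ("planes",)), ("V", ("likes",)), ("Adj", ("flying",))]
    print(cyk("a pilot likes flying planes".split(), grammar))  # True
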
23 What is the inside-outside probability? Apply the CYK algorithm to parse
the sentence “a pilot likes flying planes” with the given probabilistic
context-free grammar and find the most probable parse tree. 12 Section-III

24 What is dependency parsing? What is the difference between classical and
dependency parsing? Explain the dependency structure in dependency parsing
with a suitable example. What are the head and the dependent, and what
criteria are set for them? 12 Section-III

25 What is a dependency graph? What are the main characteristics of a
dependency graph? What is a configuration in transition-based dependency
parsing, and what is its initial value? Parse the following sentence with the
arc-eager algorithm. 12 Section-III
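
The sentence to be parsed is not reproduced above. As a hedged sketch, the
four arc-eager transitions over a configuration (stack, buffer, arcs), with
initial configuration ([ROOT], all input words, empty arc set) and an assumed
example sentence:

    def shift(stack, buffer, arcs):
        stack.append(buffer.pop(0))    # push front of buffer onto the stack

    def left_arc(stack, buffer, arcs, label):
        # head = front of buffer, dependent = top of stack (popped)
        arcs.append((buffer[0], label, stack.pop()))

    def right_arc(stack, buffer, arcs, label):
        # head = top of stack, dependent = front of buffer (then pushed)
        arcs.append((stack[-1], label, buffer[0]))
        stack.append(buffer.pop(0))

    def reduce_(stack, buffer, arcs):
        stack.pop()    # legal only once the top of stack already has a head

    # Initial configuration for an assumed example sentence:
    stack, buffer, arcs = ["ROOT"], "economic news had little effect".split(), []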

26 For the given grammar, find the inside probabilities of each word for the
following sentence: “Astronomers saw stars with ears” 12 Section-III

27 Evaluate the effectiveness of the CKY algorithm in various syntax parsing
tasks. 12 Section-III

28 Describe the inside-outside algorithm for calculating probabilities over
parse trees. 12 Section-III

29 Explain how PCFGs assign probabilities to different parse trees for a
given sentence. 12 Section-III

30 Discuss the evaluation of transition-based parsers using different
metrics. 12 Section-III
31 What do you mean by distributional semantics? What is contextual
representation, and how can we learn new words from contextual cues? Explain
with examples. What do you mean by Distributional Semantic Models (DSMs)?
12 Section-IV

32 What is a word space? Write down the steps to create a word space, explain
it with an example, and show how it can be used to capture word similarities.
12 Section-IV

33 How can weights be measured based on context? Derive the formulation for
weight measurement. What is the difference between attributional and
relational similarity? 12 Section-IV

34 What is one-hot encoding? How can words be represented using one-hot
encoding? Explain with an example. What are the limitations of one-hot
encoding? Explain with an example. 12 Section-IV
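
A minimal sketch over a toy three-word vocabulary; note that any two distinct
one-hot vectors are orthogonal, which is exactly the limitation the question
asks about (no notion of word similarity):

    vocab = ["king", "queen", "apple"]
    index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # A vector of zeros with a single 1 at the word's vocabulary index.
        vec = [0] * len(vocab)
        vec[index[word]] = 1
        return vec

    print(one_hot("queen"))  # [0, 1, 0]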

35 What is CBOW? How is CBOW used to embed words? Explain with an example.
What is the difference between skip-gram and CBOW? 12 Section-IV
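
A hedged sketch using gensim (assumed installed, version 4 or later); sg=0
selects CBOW, while sg=1 would select skip-gram. The two-sentence corpus is a
toy example, so the learned similarities are illustrative only:

    from gensim.models import Word2Vec

    sentences = [["the", "man", "reads", "a", "book"],
                 ["the", "woman", "reads", "a", "paper"]]

    # CBOW: predict the centre word from its context window.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
    print(model.wv.most_similar("man", topn=2))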

36 Discuss the advantages and limitations of distributional semantic models
compared to other approaches. 12 Section-IV

37 Discuss the application of distributional semantic models in sentiment
analysis and topic modeling. 12 Section-IV

38 Discuss the different types of word embedding techniques, including
word2vec, GloVe, and fastText. 12 Section-IV

39 Describe the application of word embeddings in various NLP tasks,
including machine translation, sentiment analysis, and question answering.
12 Section-IV

40 Explain how WordNet is used for word sense disambiguation and lexical
relation extraction. 12 Section-IV
41 What is a summary? What is text summarization? What are the applications
of text summarization? Give examples. 12 Section-V

42 What are the main stages of text summarization? How can salient words be
defined? How can sentences be weighted? 12 Section-V

43 How can sentences be simplified? Explain with an example. How can
summarization systems be evaluated? What is ROUGE and how is it used for
system evaluation? 12 Section-V
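
A minimal sketch of ROUGE-1 recall, the unigram-overlap variant: count the
overlapping unigrams and divide by the number of unigrams in the reference
summary:

    from collections import Counter

    def rouge_1(candidate, reference):
        # ROUGE-1 recall = overlapping unigrams / unigrams in the reference.
        cand, ref = Counter(candidate.split()), Counter(reference.split())
        overlap = sum((cand & ref).values())
        return overlap / sum(ref.values())

    print(rouge_1("the cat sat", "the cat sat on the mat"))  # 0.5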

44 What is text classification? What kinds of problems can be solved using
text classification? How can text classification problems be solved?
12 Section-V
45 Discuss the different types of text classification tasks, including
binary, multi-class, and hierarchical classification. 12 Section-V

46 Discuss the evaluation of text classifiers using metrics like accuracy,
precision, recall, and F1-score. 12 Section-V
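
A brief sketch with scikit-learn (assumed installed); on the toy predictions
below, precision is 1.0 (no false positives) while recall is 0.75 (one missed
positive), illustrating why both metrics are reported:

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]

    print(accuracy_score(y_true, y_pred),    # 5/6 correct
          precision_score(y_true, y_pred),   # TP / (TP + FP) = 1.0
          recall_score(y_true, y_pred),      # TP / (TP + FN) = 0.75
          f1_score(y_true, y_pred))          # harmonic mean of the two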

47 Describe the application of sentiment analysis in social media analysis,
product reviews, and customer feedback. 12 Section-V

48 Discuss the challenges of sentiment analysis, including handling sarcasm,
irony, and ambiguity. 12 Section-V

49 Describe the application of machine learning algorithms like Naive Bayes,
support vector machines (SVMs), and random forests in text classification.
12 Section-V
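
A hedged sketch of a Naive Bayes text classifier with scikit-learn (assumed
installed); the four training examples are illustrative only:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["great movie", "terrible film", "loved it", "waste of time"]
    labels = ["pos", "neg", "pos", "neg"]

    # Bag-of-words counts feeding a multinomial Naive Bayes classifier.
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(texts, labels)
    print(clf.predict(["great film"]))  # ['pos']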

50 Describe the application of optimization algorithms like integer linear
programming (ILP) and genetic algorithms in text summarization. 12 Section-V
51 What is sentiment analysis? Explain with an example. Give examples of
where sentiment analysis can be used. Write down the challenges faced by
sentiment analysis. 12 Section-V

