NLP Assignment

Uploaded by birenmalik99

Paper Code: PEC-CSE702A

Paper Name: Speech & Natural Language Processing


Assignment List

To be solved for better understanding of the subject.


Very Short Questions
What is the purpose of a grammar-based language model?
What do you understand by statistical language modelling?
Define a regular grammar.
Define a finite-state automaton.
State the difference between deterministic finite automata and non-deterministic finite automata.
What is done in morphological parsing?
What are stems?
What are affixes?
What is a lexicon?
What do you understand by morphotactics?
What is the minimum edit distance?
What do you understand by similarity key techniques?
What do you understand by n-gram based techniques?
What do you understand by rule-based techniques?
What is the part-of-speech tagging process?
Write a regular expression to represent the set of all strings over {a, b} of even length.
Write a regular expression to represent the set of all strings over {a, b} of length 4 starting with a.
Write a regular expression to represent the set of all strings over {a, b} containing at least one a.
Write a regular expression to represent the set of strings over {a, b} having exactly three b's.
Write a regular expression to represent the set of all strings over {a, b} having abab as a substring.
Write a regular expression to represent the set of all strings containing exactly two a's.
Write a regular expression to represent the set of all strings containing at least two a's.
Write a regular expression to represent the set of all strings containing at most two a's.
Write a regular expression to represent the set of all strings containing the substring aa.
Write a regular expression to represent all strings over {a, b} starting with any number of a's, followed by one or more b's, followed by one or more a's, followed by a single b, followed by any number of a's, followed by b and ending in any string of a's and b's.
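One way to sanity-check answers to the regular-expression questions above is to compare each candidate pattern with a brute-force description of the set on every short string. The sketch below assumes the alphabet {a, b} throughout (including the questions that do not state it) and uses Python `re` syntax, where `|` plays the role of the textbook `+` (union) and `+` means "one or more". The patterns are one possible set of answers, not the only correct ones.

```python
import itertools
import re

# Candidate answers (Python re syntax) paired with a brute-force membership test.
# Assumption: the alphabet is {a, b} for every question.
ANSWERS = {
    "even length": (r"((a|b)(a|b))*", lambda s: len(s) % 2 == 0),
    "length 4, starting with a": (r"a(a|b){3}", lambda s: len(s) == 4 and s[:1] == "a"),
    "at least one a": (r"(a|b)*a(a|b)*", lambda s: "a" in s),
    "exactly three b's": (r"a*ba*ba*ba*", lambda s: s.count("b") == 3),
    "abab as a substring": (r"(a|b)*abab(a|b)*", lambda s: "abab" in s),
    "exactly two a's": (r"b*ab*ab*", lambda s: s.count("a") == 2),
    "at least two a's": (r"(a|b)*a(a|b)*a(a|b)*", lambda s: s.count("a") >= 2),
    "at most two a's": (r"b*a?b*a?b*", lambda s: s.count("a") <= 2),
    "substring aa": (r"(a|b)*aa(a|b)*", lambda s: "aa" in s),
}

def all_strings(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            yield "".join(chars)

def check_answers(max_len=8):
    """Map each question to True if its pattern agrees with its predicate on all short strings."""
    return {
        name: all((re.fullmatch(pattern, s) is not None) == predicate(s)
                  for s in all_strings(max_len))
        for name, (pattern, predicate) in ANSWERS.items()
    }

for name, ok in check_answers().items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The longer description in the last question can be read off piece by piece, for example as `a*b+a+ba*b(a|b)*` in Python syntax; a brute-force check only covers strings up to `max_len`, so it supports an answer rather than proving it.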
What is pragmatic ambiguity?
What do you understand by the term “word sense disambiguation”?
What is corpus?
What do you understand by supervised learning?
What do you understand by unsupervised learning?
How does KNN work?
What is special about the naïve Bayes classifier?
What is the purpose of rule-based taggers?
What is the purpose of stochastic taggers?
What is the purpose of hybrid taggers?
Define context-free grammar.
What is the purpose of a parse tree?
When is a grammar called ambiguous?
State the difference between top-down parsing and bottom-up parsing.
What is lexical ambiguity?
What is syntactic ambiguity?
What is semantic ambiguity?
Find the regular expression representing the set of all strings of the form a^m b^n c^p where m, n, p >= 1.
Find the regular expression representing the set of all strings of the form a^m b^(2n) c^(3p) where m, n, p >= 1.
Find the regular expression representing the set of all strings of the form a^n b a^(2m) b^2 where m >= 0, n >= 1.

What are the main challenges in NLP?

Mention some areas of NLP application.


What is machine translation?
What is morphological segmentation?
What is tokenization?
Which type of ambiguity does “I saw bats” contain?
Which type of ambiguity does “Rina loves her mother and Tina does too” contain?
How many ambiguities exist in the sentence “I know little Italian.”?
At which phase of language processing is a parse tree required?
At which phase is the meaning of a word checked?
What types of words are called “stop words”?
What is the use of TF-IDF?
What is the difference between formal and natural languages?
What do you understand by information extraction?

What are the various models of information extraction?

How can you differentiate between Artificial Intelligence, Machine Learning, and Natural Language Processing?
What is noise removal in NLP?
What are the stages in the lifecycle of a natural language processing (NLP) project?

What is meant by data augmentation?


How can data be obtained for NLP projects?
What do you mean by Text Extraction and Clean-up?

What is the meaning of Text Normalization in NLP?


What are the steps to follow when building a text classification system?
What do you mean by a Bag of Words?

Which step is required as a pre-processing step in language processing?


What is a unigram? Explain with an example.
What is a bigram? Explain with an example.
How do unigrams and bigrams differ?
How do n-grams differ from unigrams and bigrams?
Describe the F1 score.
How do regular grammars and context-free grammars differ?
What is term frequency?
What is inverse document frequency?
Why is term weighting important?
What is the purpose of Jaccard’s coefficient?
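Term frequency, inverse document frequency, term weighting and Jaccard's coefficient fit together in a few lines. The sketch below uses raw counts for TF and log10(N / df) for IDF, which is one common variant among several (others use log-scaled TF or smoothed IDF); the three-document corpus is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight = raw term count * log10(N / df); one common TF-IDF variant among several."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency
        weights.append({term: tf[term] * math.log10(n_docs / df[term]) for term in tf})
    return weights

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    # Convention: two empty sets are treated as identical.
    return len(a & b) / len(a | b) if (a | b) else 1.0

docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus
print(tf_idf(docs)[0])  # 'the' gets weight 0.0 because it occurs in every document
print(jaccard(docs[0].split(), docs[1].split()))  # 0.5
```

The zero weight for "the" is the point of term weighting: a term that appears everywhere does not help distinguish documents.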
Short Questions:
Briefly discuss the meaning components of a language.
Differentiate between the rationalist and empiricist approaches to natural language processing.
List the motivations behind the development of computational models of languages.
Give the representation of a sentence in d-structure and s-structure in GB.
Compare GB and PG. Why is PG called a syntactico-semantic theory?
What are the problems associated with n-gram model? How are these problems handled?
What are lexical rules? Give the complete entry for a verb in the lexicon, to be used in LFG.
Define a finite automaton that accepts the following language: (aa)*(bb)*.
Compute the minimum edit distance between paecful and peaceful.
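The minimum edit distance can be computed with the standard dynamic-programming table (the Wagner-Fischer algorithm). The sketch below is plain Levenshtein distance with unit-cost insertions and deletions; `sub_cost` is an added parameter so that the variant charging 2 per substitution can be tried as well. For paecful and peaceful both settings give 3; with unit substitution cost, one optimal path inserts e after p and then substitutes the two letters of ec.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Levenshtein distance: insert/delete cost 1, substitution costs `sub_cost`."""
    n, m = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitution
            )
    return d[n][m]

print(min_edit_distance("paecful", "peaceful"))              # 3
print(min_edit_distance("paecful", "peaceful", sub_cost=2))  # 3
```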
Construct a finite automaton for the regular expression (a+b)*abb.
Construct a nondeterministic finite automaton accepting {ab, ba}, and use it to find a deterministic automaton accepting the same set.
Construct a nondeterministic finite automaton accepting the set of all strings over {a, b} ending in aba. Use it to construct a DFA accepting the same set of strings.
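The NFA-to-DFA step in the two questions above is the subset construction: each DFA state is the set of NFA states the machine could be in. The sketch below encodes one possible NFA for "strings over {a, b} ending in aba" with hypothetical state names q0 to q3 (q0 loops on both symbols and "guesses" where the final aba begins), builds only the reachable subsets, and cross-checks the result against the definition.

```python
from itertools import product

# Hypothetical NFA for strings over {a, b} ending in "aba".
# q0 loops on both symbols; q0 -a-> q1 -b-> q2 -a-> q3 reads the final "aba".
NFA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q3"},
}
NFA_START, NFA_FINAL, ALPHABET = "q0", {"q3"}, "ab"

def subset_construction(nfa, start, alphabet):
    """Return (start state, states, transitions) of a DFA whose states are sets of NFA states."""
    start_state = frozenset({start})
    states, todo, delta = {start_state}, [start_state], {}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # Union of the NFA moves from every state in the current subset.
            nxt = frozenset(t for q in current for t in nfa.get((q, symbol), ()))
            delta[(current, symbol)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return start_state, states, delta

def dfa_accepts(start_state, delta, final, word):
    state = start_state
    for symbol in word:
        state = delta[(state, symbol)]
    return bool(state & final)  # accept if the subset contains an NFA final state

start, states, delta = subset_construction(NFA, NFA_START, ALPHABET)

# Cross-check the DFA against the definition on every string of length <= 8.
agree = all(
    dfa_accepts(start, delta, NFA_FINAL, "".join(w)) == "".join(w).endswith("aba")
    for n in range(9)
    for w in product(ALPHABET, repeat=n)
)
print(len(states), "DFA states; agrees with definition:", agree)
```

The same functions work for the {ab, ba} question with a different transition dictionary; an NFA with Λ-moves would additionally need a Λ-closure step before and after each move.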
Construct a transition system which can accept strings over the alphabet a, b, … containing either cat or rat.
Find all strings of length 5 or less in the regular set represented by the following regular expressions:
(a) (ab + a)*(aa + b)
(b) (a*b + b*a)*a
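Short lists like these can be checked by brute force: generate every string over {a, b} up to length 5 and keep those the expression matches. The sketch assumes the alphabet {a, b} and translates the textbook union `+` into Python's `|`.

```python
import itertools
import re

def strings_in(pattern, max_len=5, alphabet="ab"):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in itertools.product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(chars))
    ]

# Textbook "+" (union) is written "|" in Python regex syntax.
print(strings_in(r"(ab|a)*(aa|b)"))   # (a) (ab + a)*(aa + b)
print(strings_in(r"(a*b|b*a)*a"))     # (b) (a*b + b*a)*a
```

The list for (b) is exactly the 31 non-empty strings of length at most 5 ending in a, because (a*b + b*a)* already generates every string over {a, b} (it contains both b and a as alternatives).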
Describe, in the English language, the sets represented by the following regular expressions:
(a) a(a+b)*ab
(b) (aa+b)*(bb+a)*
Construct a finite automaton accepting all strings over {0, 1} ending in 010 or 0010.
Find a derivation tree a*b + a*b of a grammar L(G), where G is given by S S + S|S*S,
S a|b.
A context-free grammar G has the following productions: S → 0S0 | 1S1 | A, A → 2B3, B → 2B3 | 3. Describe the language generated by these productions.
Consider the following productions:
S → aB | bA
A → aS | bAA | a
B → bS | aBB | b
For the string aaabbabbba, find
(a) the leftmost derivation,
(b) the rightmost derivation, and
(c) the parse tree.
Show that the grammar S → a | abSb | aAb, A → bS | aAAb is ambiguous.
Show that the grammar S → aBb, A → aAB | a, B → ABb | b is ambiguous.
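A leftmost derivation can be found by always rewriting the leftmost nonterminal and backtracking when the terminal prefix stops matching. The sketch below assumes single-character symbols, with the grammar dictionary's keys as nonterminals, and prunes on sentential-form length, which is safe only because these grammars have no Λ-productions and no unit productions. Finding two different leftmost derivations of one string is one way to show ambiguity; the second call uses the grammar S → a | abSb | aAb, A → bS | aAAb and the string abab.

```python
def leftmost_derivations(grammar, start, target, limit=10):
    """Return up to `limit` leftmost derivations (lists of sentential forms) of `target`.

    Assumes no Lambda-productions and no unit productions, so forms never shrink.
    """
    found = []

    def expand(form, steps):
        if len(found) >= limit:
            return
        # Position of the leftmost nonterminal, or None if `form` is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            if form == target:
                found.append(steps)
            return
        # Prune: the terminal prefix must agree with the target, and forms never shrink.
        if form[:idx] != target[:idx] or len(form) > len(target):
            return
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            expand(new_form, steps + [new_form])

    expand(start, [start])
    return found

# Productions from the aaabbabbba question.
G1 = {"S": ["aB", "bA"], "A": ["aS", "bAA", "a"], "B": ["bS", "aBB", "b"]}
for form in leftmost_derivations(G1, "S", "aaabbabbba")[0]:
    print(form)

# Two distinct leftmost derivations of the same string show the grammar is ambiguous.
G2 = {"S": ["a", "abSb", "aAb"], "A": ["bS", "aAAb"]}
print(len(leftmost_derivations(G2, "S", "abab")))  # 2
```

The rightmost derivation follows the same idea with the rightmost nonterminal, and the parse tree can be drawn directly from either derivation.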
Give two possible parse trees for the sentence “Stolen painting found by tree.”
What are the advantages and disadvantages of using semantic grammar?
Discuss lexical ambiguity.
Discuss syntactic ambiguity.
Discuss semantic ambiguity.
Discuss pragmatic ambiguity.
Discuss knowledge-based word sense disambiguation approaches.
Discuss corpus-based word sense disambiguation approaches.
Discuss Lesk’s algorithm.
Discuss Walker’s algorithm.
Discuss the concept of Bayesian classification.
Discuss the concept of Naïve Bayesian classification.
Discuss the k-nearest neighbour algorithm with an example.
Use systemic grammar to analyse the following sentences:
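A minimal k-nearest-neighbour sketch: the training points and labels are hypothetical two-dimensional feature vectors (in a text setting they might be TF-IDF vectors), distance is Euclidean, and the predicted class is the majority label among the k closest training points.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature vector, class label).
train = [((1.0, 1.0), "sports"), ((1.2, 0.8), "sports"), ((0.9, 1.1), "sports"),
         ((4.0, 4.2), "politics"), ((4.1, 3.9), "politics"), ((3.8, 4.0), "politics")]

print(knn_predict(train, (1.1, 1.0)))  # sports
print(knn_predict(train, (3.9, 4.1)))  # politics
```

There is no training phase: all the work happens at prediction time, which is why KNN is called a lazy learner. An odd k avoids ties between two classes.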
(a) Savitha will sing a song.
(b) Savitha sings a song.
Use the FUF grammar to build a fully unified FD for the following sentences:
(a) Savitha will sing a song.
(b) Savitha sings a song.
Discuss the concepts of precision and recall and their use.
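Precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved, and F1 is their harmonic mean. The sketch computes all three from sets of document IDs; the IDs in the example are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and their harmonic mean (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example: 4 documents retrieved, 3 of them among the 6 relevant ones.
p, r, f = precision_recall_f1({"d1", "d2", "d3", "d9"}, {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r, f)  # 0.75 0.5 0.6
```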
Discuss the basic information retrieval process.
Discuss Zipf’s law.
Discuss term weighting.
Long Answer type questions:
What is the role of transformational rules in transformational grammar? Explain with the help of examples.
Write regular expressions for the following languages.
1. the set of all alphabetic strings;
2. the set of all lower-case alphabetic strings ending in a b;
3. the set of all strings from the alphabet a, b such that each a is immediately preceded by and immediately followed by a b;
Write regular expressions for the following languages. By “word”, we mean an alphabetic string separated from other words by whitespace, any relevant punctuation, line breaks, and so forth.
1. the set of all strings with two consecutive repeated words (e.g., “Humbert Humbert” and “the the” but not “the bug” or “the big bug”);
2. all strings that start at the beginning of the line with an integer and that end at the end of the line with a word;
3. all strings that have both the word grotto and the word raven in them (but not, e.g., words like grottos that merely contain the word grotto);
4. write a pattern that places the first word of an English sentence in a register. Deal with punctuation.
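The patterns below are one possible set of answers in Python `re` syntax; other equivalent patterns exist. The first three are meant to match a whole string (`fullmatch`), the rest are searched inside running text (`search`). "Register" is taken here to mean a capture group, and the a/b pattern assumes strings over {a, b}.

```python
import re

# One possible answer per item; other equivalent patterns exist.
PATTERNS = {
    # Whole-string patterns (use fullmatch).
    "alphabetic": re.compile(r"[A-Za-z]*"),
    "lower_case_ending_in_b": re.compile(r"[a-z]*b"),
    "each_a_between_bs": re.compile(r"(b+(ab+)*)?"),  # strings over {a, b}
    # Patterns searched inside running text (use search).
    "repeated_word": re.compile(r"\b([A-Za-z]+)\s+\1\b"),  # \1 repeats the captured word
    "int_start_word_end": re.compile(r"^\d+\b.*\b[A-Za-z]+$", re.MULTILINE),
    "grotto_and_raven": re.compile(r"^(?=.*\bgrotto\b)(?=.*\braven\b)", re.DOTALL),
    "first_word": re.compile(r"^\W*([A-Za-z]+)"),  # group 1 holds the first word
}

match = PATTERNS["first_word"].search('"Well, then," she said.')
print(match.group(1))  # Well
```

The word boundaries `\b` are what reject "the bug" in the repeated-word pattern and grottos in the grotto/raven pattern; the leading `\W*` in the last pattern skips opening quotes and other punctuation.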
What are f-structure and c-structure in LFG? Consider the following sentence and explain both structures: “She saw stars in the sky.”
What are Karaka relations? Explain the difference between Karta and Agent.
Calculate the bigram probability for each of the words of the sentence “I want Chinese food.”
We are given the following corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
Using a bigram language model with add-one smoothing, what is P(Sam | am)? Include <s> and </s> in your counts just like any other token.
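With add-one (Laplace) smoothing, P(w | prev) = (C(prev w) + 1) / (C(prev) + V), where V is the number of word types. The sketch below assumes the name is the same token Sam in all four sentences (tokens are case-sensitive, so a lower-case sam would count as a separate type) and counts <s> and </s> as ordinary tokens, giving V = 11, C(am Sam) = 2 and C(am) = 3.

```python
from collections import Counter
from fractions import Fraction

CORPUS = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I am Sam </s>",
    "<s> I do not like green eggs and Sam </s>",
]

def counts(corpus):
    """Unigram and bigram counts; <s> and </s> are ordinary tokens."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def p_add_one(word, prev, corpus):
    """Laplace-smoothed P(word | prev) = (C(prev word) + 1) / (C(prev) + V)."""
    unigrams, bigrams = counts(corpus)
    vocab_size = len(unigrams)  # 11 word types, including <s> and </s>
    # C(prev) is its unigram count; fine here because "am" is never the last token.
    return Fraction(bigrams[(prev, word)] + 1, unigrams[prev] + vocab_size)

print(p_add_one("Sam", "am", CORPUS))  # 3/14
```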
We are given the following corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
If we use linear interpolation smoothing between a maximum-likelihood bigram model and a maximum-likelihood unigram model with λ1 = 1/2 and λ2 = 1/2, what is P(Sam | am)? Include <s> and </s> in your counts just like any other token.
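Linear interpolation mixes the two maximum-likelihood estimates: P(Sam | am) = λ1 · P_ML(Sam | am) + λ2 · P_ML(Sam). The sketch assumes λ1 weights the bigram model and λ2 the unigram model (both are 1/2, so the order does not matter here), treats Sam as the same token in all four sentences, and counts <s> and </s> as tokens, giving P_ML(Sam | am) = 2/3 and P_ML(Sam) = 4/25 over 25 tokens.

```python
from collections import Counter
from fractions import Fraction

CORPUS = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I am Sam </s>",
    "<s> I do not like green eggs and Sam </s>",
]

sentences = [s.split() for s in CORPUS]  # <s> and </s> count as ordinary tokens
tokens = [t for sent in sentences for t in sent]
unigrams = Counter(tokens)
bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))

def p_interpolated(word, prev, lambda_bigram=Fraction(1, 2), lambda_unigram=Fraction(1, 2)):
    """lambda1 * P_ML(word | prev) + lambda2 * P_ML(word), both maximum-likelihood estimates."""
    p_bigram = Fraction(bigrams[(prev, word)], unigrams[prev])  # 2/3 for (am, Sam)
    p_unigram = Fraction(unigrams[word], len(tokens))           # 4/25 for Sam
    return lambda_bigram * p_bigram + lambda_unigram * p_unigram

print(p_interpolated("Sam", "am"))  # 31/75
```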
How can words be handled in the tagging process?
Comment on the validity of the following statements:
a) Rule-based taggers are non-deterministic
b) Stochastic taggers are language independent
c) Brill’s tagger is a rule-based tagger
Construct a transition system corresponding to the regular expressions (i) (ab + c*)*b (ii) a + bb + bab*a
Construct a finite automaton recognizing L(G), where G is the grammar S → aS | bA | b and A → aA | bS | a.
Construct a deterministic finite automaton equivalent to the grammar S → aS | bS | aA, A → bB, B → aC, C → Λ
Find a reduced grammar equivalent to the grammar G whose productions are S → AB | CA, B → BC | AB, A → a, C → aB | b
Construct a reduced grammar equivalent to the grammar S → aAa, A → Sb | bCC | DaA, C → abb | DD, E → aC, D → aDA
Let G be S → AB, A → a, B → C | b, C → D, D → E and E → a. Eliminate unit productions and get an equivalent grammar.
Reduce the following grammar G to CNF. G is S → aAD, A → aB | bAB, B → b and D → d.
Construct a grammar in Greibach normal form equivalent to the grammar S → AA | a, A → SS | b.
Convert the grammar S → AB, A → BS | b, B → SA | a into GNF.
Find a grammar in GNF equivalent to the grammar E → E + T | T, T → T * F | F, F → (E) | a.
Discuss the disadvantages of the basic top-down parser with the help of an appropriate example.
Tabulate the sequence of states created by the CYK algorithm while parsing “The sun rises in the east”.
Use the following grammar:
S → NP VP
S → VP
NP → Det Noun
NP → Noun
NP → NP PP
VP → VP NP
VP → Verb
VP → VP PP
PP → Preposition NP
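Basic CYK needs binary rules, but this grammar also has unit productions (S → VP, NP → Noun, VP → Verb). Rather than converting to CNF first, the sketch below closes every table cell under those unit rules, a common extension of the basic algorithm. The lexicon (the/Det, sun/Noun, rises/Verb, in/Preposition, east/Noun) is an assumption, since the question does not give one; printing the non-empty cells by span length gives the table the question asks for.

```python
from itertools import product

# Binary rules from the question, indexed by right-hand side.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "Noun"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("Preposition", "NP"): {"PP"},
}
# Unit rules from the question, stored as child -> parents (S -> VP, NP -> Noun, VP -> Verb).
UNARY = {"VP": {"S"}, "Noun": {"NP"}, "Verb": {"VP"}}
# Hypothetical lexicon: the question does not supply one.
LEXICON = {"the": {"Det"}, "sun": {"Noun"}, "rises": {"Verb"},
           "in": {"Preposition"}, "east": {"Noun"}}

def unary_closure(labels):
    """Add every category reachable from `labels` through unit rules."""
    closed, stack = set(labels), list(labels)
    while stack:
        for parent in UNARY.get(stack.pop(), ()):
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed

def cyk(words):
    """Fill table[(i, j)] with the categories that can span words[i:j]."""
    n = len(words)
    table = {(i, i + 1): unary_closure(LEXICON[w.lower()]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):  # every split point
                for left, right in product(table[(i, k)], table[(k, j)]):
                    cell |= BINARY.get((left, right), set())
            table[(i, j)] = unary_closure(cell)
    return table

words = "The sun rises in the east".split()
table = cyk(words)
for (i, j), labels in sorted(table.items(), key=lambda item: (item[0][1] - item[0][0], item[0][0])):
    if labels:
        print(f"{' '.join(words[i:j])!r}: {sorted(labels)}")
```

The sentence is accepted when S appears in the cell covering all six words.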
Give two possible parse trees for the sentence “Pluck the flower with stick.”
Introduce lexicon rules for the words appearing in the sentence. Using these parse trees, obtain maximum likelihood estimates for the grammar rules used in the trees. Calculate the probability of any one parse tree using these estimates.
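The maximum likelihood estimate of a rule is P(A → β) = Count(A → β) / Count(A), and a tree's probability is the product of the probabilities of the rules it uses. The sketch assumes the grammar of the CYK question, the lexical rules Verb → pluck, Det → the, Noun → flower, Preposition → with, Noun → stick, and the two readings in which the PP "with stick" attaches either to the VP or to the NP "the flower".

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Rules used in the two assumed parse trees, lexical rules included.
LEXICAL = [("Verb", "pluck"), ("Det", "the"), ("Noun", "flower"),
           ("Preposition", "with"), ("Noun", "stick")]
TREE_VP_ATTACH = [("S", "VP"), ("VP", "VP PP"), ("VP", "VP NP"), ("VP", "Verb"),
                  ("NP", "Det Noun"), ("PP", "Preposition NP"), ("NP", "Noun")] + LEXICAL
TREE_NP_ATTACH = [("S", "VP"), ("VP", "VP NP"), ("VP", "Verb"), ("NP", "NP PP"),
                  ("NP", "Det Noun"), ("PP", "Preposition NP"), ("NP", "Noun")] + LEXICAL

def mle_estimates(trees):
    """P(A -> beta) = Count(A -> beta) / Count(A), counted over all trees."""
    rule_counts = Counter(rule for tree in trees for rule in tree)
    lhs_counts = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {rule: Fraction(c, lhs_counts[rule[0]]) for rule, c in rule_counts.items()}

def tree_probability(tree, estimates):
    """Product of the estimated probabilities of every rule used in the tree."""
    return prod(estimates[rule] for rule in tree)

est = mle_estimates([TREE_VP_ATTACH, TREE_NP_ATTACH])
print(est[("VP", "VP NP")], est[("NP", "Noun")])  # 2/5 2/5
print(tree_probability(TREE_VP_ATTACH, est))      # 4/3125
```

Under these assumptions both trees get 4/3125, because each uses one rule estimated at 1/5 (VP → VP PP or NP → NP PP) and otherwise the same estimates.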
For each of the following verbs, give all possible selectional restrictions: think, eat, bark, fly.
Find out the number of senses of ‘still’, ‘bat’ and ‘cricket’ in WordNet.
Given the sentence “The shopkeeper showed a Dell laptop to Suha and she liked it very much”, find possible referents for the pronouns ‘she’ and ‘it’, and give the scores of these referents for the following antecedent indicators: definiteness, referential distance, indicating verbs, term preference, section heading, non-prepositional noun phrase, collocation.
Identify the coherence relation between the following sentences:
(a) There is a train on Platform 6.
(b) Its destination is New Delhi.
(c) There is another train on Platform 7.
(d) Its destination is Varanasi.
Given the following document:
The oldest Chinese language we know about is on oracle bones. Priests scratched questions
on animal bones and then held the bones in a fire so that they cracked. The places where the
cracks crossed the pictograms were thought to give the answers from the god.
Assume that raw term frequency is used and the stop words are ‘the’, ‘we’, ‘is’, ‘on’, ‘and’, ‘then’, ‘in’, ‘a’, ‘so’, ‘that’, ‘they’, ‘were’, ‘to’, ‘where’, ‘but’, ‘only’, ‘out’.
Find the vector representation of the above document. Use the Porter stemmer for stemming.
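The pipeline the question describes (lower-case, tokenise, drop stop words, stem, count) can be sketched as below. To keep the sketch runnable without extra packages, the `stem` argument defaults to the identity function, so the counts shown are before stemming. With NLTK installed, passing `stem=PorterStemmer().stem` (from `nltk.stem`) would apply the Porter stemmer the question asks for and would, for example, merge cracked and cracks.

```python
import re
from collections import Counter

DOCUMENT = (
    "The oldest Chinese language we know about is on oracle bones. Priests scratched "
    "questions on animal bones and then held the bones in a fire so that they cracked. "
    "The places where the cracks crossed the pictograms were thought to give the "
    "answers from the god."
)
STOP_WORDS = {"the", "we", "is", "on", "and", "then", "in", "a", "so", "that",
              "they", "were", "to", "where", "but", "only", "out"}

def term_vector(text, stop_words, stem=lambda w: w):
    """Raw term-frequency vector after lower-casing, stop-word removal and (optional) stemming."""
    words = re.findall(r"[a-z]+", text.lower())  # alphabetic tokens only; punctuation dropped
    return Counter(stem(w) for w in words if w not in stop_words)

vector = term_vector(DOCUMENT, STOP_WORDS)
print(vector["bones"], len(vector))
```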
Give evidence where the use of WordNet relations for query expansion might have an adverse effect on the performance of an IR system.
Construct the conceptual graph for the following tagged sentence.
A/DT methodology/NNP for/IN calculating/VBG square/JJ root/NNP.
