Paper Code: PEC-CSE702A
Paper Name: Speech & Natural Language Processing
Assignment List
To be solved for better understanding of the subject.
Very Short Questions
What is the purpose of a grammar-based language model?
What do you understand by statistical language modelling?
Define regular grammar.
Define finite state automaton.
State the difference between deterministic finite automata and non-deterministic finite automata.
What can be done in morphological parsing?
What are stems?
What are affixes?
What is a lexicon?
What do you understand by morphotactics?
What is the minimum edit distance?
What do you understand by similarity key techniques?
What do you understand by n-gram based techniques?
What do you understand by rule-based techniques?
What is the part-of-speech tagging process?
Write a regular expression to represent the set of all strings over {a, b} of even length.
Write a regular expression to represent the set of all strings over {a, b} of length 4 starting with an a.
Write a regular expression to represent the set of all strings over {a, b} containing at least one a.
Write a regular expression to represent the set of all strings over {a, b} having exactly 3 b's.
Write a regular expression to represent the set of all strings over {a, b} having abab as a substring.
Write a regular expression to represent the set of all strings containing exactly 2 a's.
Write a regular expression to represent the set of all strings containing at least 2 a's.
Write a regular expression to represent the set of all strings containing at most 2 a's.
Write a regular expression to represent the set of all strings containing the substring aa.
Write a regular expression to represent all strings over {a, b} starting with any number of a's, followed by one or more b's, followed by one or more a's, followed by a single b, followed by any number of a's, followed by b, and ending in any string of a's and b's.
What is pragmatic ambiguity?
What do you understand by the term “word sense disambiguation”?
What is a corpus?
What do you understand by supervised learning?
What do you understand by unsupervised learning?
How does KNN work?
What is the speciality of the naïve Bayes classifier?
What is the purpose of rule-based taggers?
What is the purpose of stochastic taggers?
What is the purpose of hybrid taggers?
Define context-free grammar.
What is the purpose of a parse tree?
When is a grammar called ambiguous?
State the difference between top-down parsing and bottom-up parsing.
What is lexical ambiguity?
What is syntactic ambiguity?
What is semantic ambiguity?
Find the regular expression representing the set of all strings of the form a^m b^n c^p where m, n, p >= 1.
Find the regular expression representing the set of all strings of the form a^m b^(2n) c^(3p) where m, n, p >= 1.
Find the regular expression representing the set of all strings of the form a^n b a^(2m) b^2 where m >= 0, n >= 1.
What are the main challenges in NLP?
Mention some areas of NLP application.
What is machine translation?
What is morphological segmentation?
What is tokenization?
Which type of ambiguity does “I saw bats” contain?
Which type of ambiguity does “Rina loves her mother and Tina does too” contain?
How many ambiguities exist in the sentence “I know little Italian.”?
At which phase of language processing is a parse tree required?
At which phase is the meaning of a word checked?
What type of words are called “stop words”?
What is the use of TF-IDF?
What is the difference between formal and natural languages?
What do you understand by information extraction?
What are the various models of information extraction?
How can you differentiate Artificial Intelligence, Machine Learning, and Natural Language Processing?
What is noise removal in NLP?
What are the stages in the lifecycle of a natural language processing (NLP) project?
What is meant by data augmentation?
How can data be obtained for NLP projects?
What do you mean by Text Extraction and Clean-up?
What is the meaning of Text Normalization in NLP?
What are the steps to follow when building a text classification system?
What do you mean by a Bag of Words?
Which step is required as a pre-processing step in language processing?
What is a unigram? Explain with an example.
What is a bigram? Explain with an example.
How do unigrams and bigrams differ?
How does an n-gram differ from a unigram and a bigram?
Describe the F1 score.
How do regular grammar and context-free grammar differ?
What is term frequency?
What is inverse document frequency?
Why is term weighting important?
What is the purpose of Jaccard’s coefficient?

Short Questions:
Briefly discuss the meaning components of a language.
Differentiate between the rationalist and empiricist approaches to natural language processing.
List the motivation behind the development of computational models of languages.
Give the representation of a sentence in d-structure and s-structure in GB.
Compare GB and PG.
Why is PG called a syntactico-semantic theory?
What are the problems associated with the n-gram model? How are these problems handled?
What are lexical rules? Give the complete entry for a verb in the lexicon, to be used in LFG.
Define a finite automaton that accepts the following language: (aa)*(bb)*.
Compute the minimum edit distance between paecful and peaceful.
Construct a finite automaton for the regular expression (a+b)*abb.
Construct a nondeterministic finite automaton accepting {ab, ba}, and use it to find a deterministic automaton accepting the same set.
Construct a nondeterministic finite automaton accepting the set of all strings over {a, b} ending in aba. Use it to construct a DFA accepting the same set of strings.
Construct a transition system which can accept strings over the alphabet a, b, … containing either cat or rat.
Find all strings of length 5 or less in the regular set represented by the following regular expressions: (a) (ab + a)*(aa + b) (b) (a*b + b*a)*a
Describe, in the English language, the sets represented by the following regular expressions: (a) a(a+b)*ab (b) (aa+b)*(bb+a)*
Construct a finite automaton accepting all strings over {0, 1} ending in 010 or 0010.
Find a derivation tree of a*b + a*b for the grammar G, where G is given by S → S + S | S * S, S → a | b.
A context-free grammar G has the following productions: S → 0S0 | 1S1 | A, A → 2B3, B → 2B3 | 3. Describe the language generated by these productions.
Consider the following productions: S → aB | bA, A → aS | bAA | a, B → bS | aBB | b. For the string aaabbabbba, find (a) the leftmost derivation, (b) the rightmost derivation, and (c) the parse tree.
Show that the grammar S → a | abSb | aAb, A → bS | aAAb is ambiguous.
Show that the grammar S → aBb, A → aAB | a, B → ABb | b is ambiguous.
Give two possible parse trees for the sentence “Stolen painting found by tree.”
What are the advantages and disadvantages of using semantic grammar?
Discuss lexical ambiguity.
Discuss syntactic ambiguity.
Discuss semantic ambiguity.
Discuss pragmatic ambiguity.
Discuss knowledge-based word sense disambiguation approaches.
Discuss corpus-based word sense disambiguation approaches.
Discuss Lesk’s algorithm.
Discuss Walker’s algorithm.
Discuss the concept of Bayesian classification.
Discuss the concept of Naïve Bayesian classification.
Discuss the k-nearest neighbour algorithm with an example.
Use systemic grammar to analyse the following sentences: (a) Savitha will sing a song. (b) Savitha sings a song.
Use the FUF grammar to build a fully unified FD for the following sentences: (a) Savitha will sing a song. (b) Savitha sings a song.
Discuss the concept of precision and recall and their use.
Discuss the basic information retrieval process.
Discuss Zipf’s law.
Discuss term weighting.

Long Answer type questions:
What is the role of transformational rules in transformational grammar? Explain with the help of examples.
Write regular expressions for the following languages. 1. the set of all alphabetic strings; 2. the set of all lower-case alphabetic strings ending in a b; 3. the set of all strings from the alphabet a, b such that each a is immediately preceded by and immediately followed by a b.
Write regular expressions for the following languages.
By “word”, we mean an alphabetic string separated from other words by whitespace, any relevant punctuation, line breaks, and so forth. 1. the set of all strings with two consecutive repeated words (e.g., “Humbert Humbert” and “the the” but not “the bug” or “the big bug”); 2. all strings that start at the beginning of the line with an integer and that end at the end of the line with a word; 3. all strings that have both the word grotto and the word raven in them (but not, e.g., words like grottos that merely contain the word grotto); 4. write a pattern that places the first word of an English sentence in a register. Deal with punctuation.
What are f-structure and c-structure in LFG? Consider the following sentence and explain both structures: “She saw stars in the sky.”
What are Karaka relations? Explain the difference between Karta and Agent.
Calculate the bigram probability for each of the words of the sentence “I want Chinese food.”
We are given the following corpus: <s> I am sam </s> <s> Sam I am </s> <s> I am Sam </s> <s> I do not like green eggs and Sam </s> Using a bigram language model with add-one smoothing, what is P(Sam | am)? Include <s> and </s> in your counts just like any other token.
We are given the following corpus: <s> I am sam </s> <s> Sam I am </s> <s> I am Sam </s> <s> I do not like green eggs and Sam </s> If we use linear interpolation smoothing between a maximum-likelihood bigram model and a maximum-likelihood unigram model with λ1 = 1/2 and λ2 = 1/2, what is P(Sam | am)? Include <s> and </s> in your counts just like any other token.
How can words be handled in the tagging process?
Comment on the validity of the following statements: (a) Rule-based taggers are non-deterministic. (b) Stochastic taggers are language independent. (c) Brill’s tagger is a rule-based tagger.
Construct a transition system corresponding to the regular expressions (i) (ab + c*)*b (ii) a + bb + bab*a
Construct a finite automaton recognizing L(G), where G is the grammar S → aS | bA | b and A → aA | bS | a.
Construct a deterministic finite automaton equivalent to the grammar S → aS | bS | aA, A → bB, B → aC, C → Λ.
Find a reduced grammar equivalent to the grammar G whose productions are S → AB | CA, B → BC | AB, A → a, C → aB | b.
Construct a reduced grammar equivalent to the grammar S → aAa, A → Sb | bCC | DaA, C → abb | DD, E → aC, D → aDA.
Let G be S → AB, A → a, B → C | b, C → D, D → E and E → a. Eliminate unit productions and get an equivalent grammar.
Reduce the following grammar G to CNF. G is S → aAD, A → aB | bAB, B → b and D → d.
Construct a grammar in Greibach normal form equivalent to the grammar S → AA | a, A → SS | b.
Convert the grammar S → AB, A → BS | b, B → SA | a into GNF.
Find a grammar in GNF equivalent to the grammar E → E + T | T, T → T * F | F, F → (E) | a.
Discuss the disadvantages of the basic top-down parser with the help of an appropriate example.
Tabulate the sequence of states created by the CYK algorithm while parsing “The sun rises in the east”.
Use the following grammar: S → NP VP, S → VP, NP → Det Noun, NP → Noun, NP → NP PP, VP → VP NP, VP → Verb, VP → VP PP, PP → Preposition NP. Give two possible parse trees of the sentence “Pluck the flower with stick.” Introduce lexicon rules for the words appearing in the sentence. Using these parse trees, obtain maximum likelihood estimates for the grammar rules used in the trees. Calculate the probability of any one parse tree using these estimates.
For each of the following verbs give all the selectional restrictions possible: think, eat, bark, fly.
Find out the number of senses of ‘still’, ‘bat’ and ‘cricket’ in WordNet.
Given the sentence “The shopkeeper showed a Dell laptop to Suha and she liked it very much”, find possible referents for the pronouns ‘she’ and ‘it’, and give the score of these referents for the following antecedent indicators: definiteness, referential distance, indicating verbs, term preference, section heading, non-prepositional noun phrase, collocation.
Identify the coherence relation between the following sentences: (a) There is a train on Platform 6. (b) Its destination is New Delhi. (c) There is another train on Platform 7. (d) Its destination is Varanasi.
Given the following document: “The oldest Chinese language we know about is on oracle bones. Priests scratched questions on animal bones and then held the bones in a fire so that they cracked. The places where the cracks crossed the pictograms were thought to give the answers from the god.” Assume that raw term frequency is used and the stop words are ‘the’, ‘we’, ‘is’, ‘on’, ‘and’, ‘then’, ‘in’, ‘a’, ‘so’, ‘that’, ‘they’, ‘were’, ‘to’, ‘where’, ‘but’, ‘only’, ‘out’. Find the vector representation of the above document. Use the Porter stemmer for stemming.
Give evidence where the use of WordNet relations for query expansion might have an adverse effect on the performance of an IR system.
Construct the conceptual graph for the following tagged sentence: A DT methodology NNP for IN calculating VBG square JJ root NNP.
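
Worked sketch for the minimum edit distance question (paecful and peaceful): the computation can be carried out with the standard dynamic-programming table. The Python below is a minimal sketch, not a prescribed solution. The function name min_edit_distance and the sub_cost parameter are illustrative assumptions, not part of the assignment. Insertion and deletion are assumed to cost 1; the substitution cost defaults to 1 (plain Levenshtein distance) and can be set to 2 for the convention in which a substitution counts as a deletion plus an insertion.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Return the minimum edit distance from source to target.

    Assumption: insertion and deletion cost 1 each; substitution costs
    sub_cost (1 for Levenshtein, 2 for the delete-plus-insert convention).
    """
    n, m = len(source), len(target)

    # dist[i][j] holds the cheapest way to turn source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # Turning a prefix into the empty string takes one deletion per character.
    for i in range(1, n + 1):
        dist[i][0] = i
    # Building a prefix from the empty string takes one insertion per character.
    for j in range(1, m + 1):
        dist[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                               # delete from source
                dist[i][j - 1] + 1,                               # insert into source
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitute
            )
    return dist[n][m]


if __name__ == "__main__":
    print(min_edit_distance("paecful", "peaceful"))
    print(min_edit_distance("paecful", "peaceful", sub_cost=2))
```

Under either substitution cost the table gives a distance of 3 for this pair, for example by inserting e after p, deleting the original e, and inserting e after c. Allowing transpositions as a single operation is a different measure and is not covered by this sketch.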