
Practice Questions

Q. Convert the following Context Free Grammar (CFG) into Chomsky Normal Form (CNF)

S -> a B B B | b A A A
A -> a | A s | b B B
B -> b | b S | A a a
where S is the start symbol, A and B are non-terminal symbols, and a, b, and s are terminal symbols.
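A hedged sketch of the standard first steps (the pre-terminals Ta, Tb, Ts and the symbols X1, X2, X3 are fresh names introduced here for illustration): first replace terminals that appear in rules of length two or more with pre-terminals (TERM), then binarize rules longer than two symbols (BIN). Applied to the first rule:

Ta -> a
Tb -> b
S -> Ta X1     (from S -> a B B B)
X1 -> B X2
X2 -> B B

The remaining rules follow the same pattern, e.g. A -> A s becomes A -> A Ts with Ts -> s, and B -> A a a becomes B -> A X3 with X3 -> Ta Ta.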

Q. Apply the CYK parsing algorithm to generate the parsing table for the input sentence “the pilot flew the plane to Delhi” using the given grammar in CNF.
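The CNF grammar for this question is not reproduced above, so the following is only a minimal CYK recognizer sketch in Python; the toy rules and lexicon are assumptions, not the question's grammar, and the triple loop is the table-filling the question asks you to carry out by hand.

from itertools import product

# Hypothetical CNF grammar: binary rules map a pair of child categories
# to the set of parent categories; lexical rules map words to categories.
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "pilot": {"N"}, "plane": {"N"}, "flew": {"V"}}

def cyk(words):
    n = len(words)
    # table[i][j] holds every non-terminal that derives words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, set()))
    for span in range(2, n + 1):          # width of the cell being filled
        for i in range(n - span + 1):     # left edge of the span
            j = i + span - 1              # right edge of the span
            for k in range(i, j):         # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return "S" in table[0][n - 1]

print(cyk("the pilot flew the plane".split()))  # True under this toy grammar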

Q. Consider the sentence “Sing loudly to Dance joyfully” that needs to be tagged using a Hidden Markov Model
(HMM). Two stochastic probability matrices are given: matrix A for state transitions and matrix B for emission
probabilities, with two states being Adjective (Adj) and Adverb (Adv).

The state transition matrix A and the emission probability matrix B are provided as follows:
Matrix A (State Transition Probabilities):
From \ To    Adj    Adv
Adj          0.6    0.4
Adv          0.3    0.7
Matrix B (Emission Probabilities):
State \ Obs.  Adj    Adv
Adj           0.5    0.5
Adv           0.4    0.6

1. Draw the HMM transition model with transition probabilities Aij and emission probabilities Bij for the
two ambiguous words of the sentence W1=Sing and W2=Dance as states of the model.

2. Determine the state (Adj or Adv) if the observed output is “Adverb”, and calculate the corresponding probability for W1=Sing and W2=Dance.

3. If the sequence of observations or output states is “Adv-Adj-Adv”, calculate the probability of each possible case for the word sequence and write the most likely word sequence (a worked sketch follows this question).
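For parts 2 and 3, the standard tool is the Viterbi algorithm (brute-force enumeration of all state sequences gives the same numbers on a problem this small). Below is a minimal sketch using the matrices A and B above; the uniform initial distribution pi is an assumption, since the question does not specify one.

import numpy as np

states = ["Adj", "Adv"]
A = np.array([[0.6, 0.4],   # row i, column j: P(next state j | current state i)
              [0.3, 0.7]])
B = np.array([[0.5, 0.5],   # row i, column o: P(observation o | state i)
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])   # assumed uniform initial distribution

def viterbi(obs):
    # obs is a list of column indices into B (0 = first symbol, 1 = second)
    v = pi * B[:, obs[0]]                    # best score ending in each state
    back = []
    for o in obs[1:]:
        scores = v[:, None] * A * B[:, o][None, :]
        back.append(scores.argmax(axis=0))   # best predecessor for each state
        v = scores.max(axis=0)
    path = [int(v.argmax())]                 # trace the best path backwards
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    path.reverse()
    return [states[i] for i in path], float(v.max())

print(viterbi([1, 0, 1]))   # observation sequence Adv-Adj-Adv for part 3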

Q. Write notes on the following:

• Morphological ambiguity
• Labeled Bracketing
• Bag of Words
• Transition-based Discourse Analysis
Q. Explain the difference between Tokenization and Segmentation in the context of Natural Language
Processing. Provide a suitable example to illustrate these concepts.
Q. What are the key differences between Syntax and Semantics in language processing? Give an example that
highlights these differences.
Q. Compare and contrast Part-of-Speech Tagging and Named Entity Recognition in NLP. Use examples
to demonstrate how these processes differ in handling text data.
Q. What is the distinction between a Corpus and a Lexicon in linguistic studies? Illustrate your answer with an
example of how each would be used in language analysis.

Q. The following shows a simple context-free grammar (CFG) for a fragment of English.

Show the parse tree for the sentence “the dog is angry at the cat”.

Q. Show all possible parse trees for the sentence "bronze pots clatter" and calculate the probability
of each tree using PCFG.
S -> Noun VP [0.5]
S -> NP Verb [0.5]
VP -> Verb Noun [1.0]
NP -> Adj Noun [1.0]
Adj -> "bronze" [1.0]
Noun -> "bronze" [0.4]
Noun -> "pots" [0.3]
Noun -> "clatter" [0.3]
Verb -> "bronze" [0.3]
Verb -> "pots" [0.5]
Verb -> "clatter" [0.2]
Answer
Parse Tree 1:
S -> Noun VP
Noun -> "bronze"
VP -> Verb Noun
Verb -> "pots"
Noun -> "clatter"

Probability Calculation:

P(T1) = P(S→Noun VP) × P(Noun→"bronze") × P(VP→Verb Noun) × P(Verb→"pots") × P(Noun→"clatter")

P(T1) = 0.5 × 0.4 × 1.0 × 0.5 × 0.3 = 0.03

Parse Tree 2:
S -> NP Verb
NP -> Adj Noun
Adj -> "bronze"
Noun -> "pots"
Verb -> "clatter"
Probability Calculation:
P(T2)=P(S→NP Verb)×P(NP→Adj Noun)×P(Adj→"bronze")×P(Noun→"pots")×P(Verb→"clatter")
P(T2)=0.5×1.0×1.0×0.3×0.2
P(T2)=0.03
Both parse trees T1 and T2 are possible for the sentence "bronze pots clatter" with the provided
PCFG, and both have the same probability of 0.03. This illustrates how natural language sentences
can have multiple valid parses, and how PCFGs can be used to calculate the probability of each
parse, which can be useful in choosing the most likely parse in natural language processing
applications.
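The calculation above can be checked mechanically by multiplying rule probabilities; the dictionary below is just one convenient encoding of the question's grammar.

rules = {
    ("S", "Noun VP"): 0.5, ("S", "NP Verb"): 0.5,
    ("VP", "Verb Noun"): 1.0, ("NP", "Adj Noun"): 1.0,
    ("Adj", "bronze"): 1.0, ("Noun", "bronze"): 0.4,
    ("Noun", "pots"): 0.3, ("Noun", "clatter"): 0.3,
    ("Verb", "pots"): 0.5, ("Verb", "clatter"): 0.2,
}

def tree_prob(derivation):
    # The probability of a tree is the product of its rule probabilities.
    p = 1.0
    for step in derivation:
        p *= rules[step]
    return p

t1 = [("S", "Noun VP"), ("Noun", "bronze"), ("VP", "Verb Noun"),
      ("Verb", "pots"), ("Noun", "clatter")]
t2 = [("S", "NP Verb"), ("NP", "Adj Noun"), ("Adj", "bronze"),
      ("Noun", "pots"), ("Verb", "clatter")]
print(tree_prob(t1), tree_prob(t2))   # 0.03 0.03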

Q. Given the following PCFG rules, show all possible parse trees for the sentence "red birds sing"
and calculate the probability of each tree.
S -> Noun VP [0.6]
S -> NP Verb [0.4]
VP -> Verb Noun [0.9]
VP -> Verb Adj [0.1]
NP -> Adj Noun [1.0]
Adj -> "red" [1.0]
Noun -> "red" [0.2]
Noun -> "birds" [0.5]
Noun -> "sing" [0.3]
Verb -> "red" [0.1]
Verb -> "birds" [0.2]
Verb -> "sing" [0.7]

Q. Using the PCFG provided, construct all possible parse trees for the sentence "green fish swim"
and compute their respective probabilities.

S -> NP VP [0.7]
S -> Noun VP [0.3]
VP -> Verb NP [0.8]
VP -> Verb Noun [0.2]
NP -> Adj Noun [1.0]
Adj -> "green" [1.0]
Noun -> "green" [0.3]
Noun -> "fish" [0.4]
Noun -> "swim" [0.3]
Verb -> "green" [0.2]
Verb -> "fish" [0.3]
Verb -> "swim" [0.5]

Q. Create a unigram and bigram word model for the following corpus:
Have fun on the school trip.
Ask the teacher if you have any problem.
I have fun just looking around.
(i) Predict the probability of occurrence of the next word ‘any’ after the given word ‘have’.
(ii) Predict the probability of occurrence of the next word ‘fun’ after the given word ‘have’.
(iii) Predict the probability of sentence ‘Ask the teacher if you have any problem’ considering Bigram.
(iv) Predict the probability of sentence ‘Ask the teacher if you have any problem’ considering Trigram.
Answer

Unigram model
The unigram model considers each word independently.

To calculate the probability of a word, we use:


P(word) = Count(word) / Total number of words

Total number of words in the corpus = 20
Count("have") = 2 (counting the lowercase form only; "Have" at the start of the first sentence is treated as a distinct token)
Count("any") = 1
Count("fun") = 2

P("have") = 2/20 = 1/10
P("any") = 1/20
P("fun") = 2/20 = 1/10

Bigram Model
The bigram model considers a pair of words. To calculate the probability of a word given the previous
word, we use:
P(word2 | word1) = Count(word1 word2) / Count(word1)

Count("have any") = 1
Count("have fun") = 1 (again case-sensitive: only the third sentence contains "have fun"; "Have fun" in the first sentence is a distinct bigram)

P("any" | "have") = 1/2
P("fun" | "have") = 1/2

(i) The probability of occurrence of the next word ‘any’ after the given word ‘have’:
P("any" | "have") = 0.5
(ii) The probability of occurrence of the next word ‘fun’ after the given word ‘have’:
P("fun" | "have") = 0.5
(iii) Predict the probability of the sentence ‘Ask the teacher if you have any problem’ considering Bigram:
P(Sentence | Bigram) = P(Ask | START) × P(the | Ask) × P(teacher | the) × P(if | teacher) × P(you | if) × P(have | you) × P(any | have) × P(problem | any) × P(END | problem)
P(Ask | START) = Count(START Ask) / Count(START) = 1/3 (since "Ask" is the starting word in one of the three sentences)
P(the | Ask) = Count(Ask the) / Count(Ask) = 1/1
P(teacher | the) = Count(the teacher) / Count(the) = 1/2
P(if | teacher) = Count(teacher if) / Count(teacher) = 1/1
P(you | if) = Count(if you) / Count(if) = 1/1
P(have | you) = Count(you have) / Count(you) = 1/1
P(any | have) = Count(have any) / Count(have) = 1/2
P(problem | any) = Count(any problem) / Count(any) = 1/1
P(END | problem) = Count(problem END) / Count(problem) = 1/1 (assuming every sentence ending with "problem" is followed by an END token)

P(Sentence | Bigram) = (1/3) × 1 × (1/2) × 1 × 1 × 1 × (1/2) × 1 × 1
P(Sentence | Bigram) = 1/12
So, the probability of the sentence "Ask the teacher if you have any problem" under the bigram
model and given the provided corpus is 1/12 or approximately 0.0833.

(iv) To calculate the probability of this sentence using the trigram model, we need the probability of each word given the two previous words, i.e., P(word_n | word_{n-2}, word_{n-1}). Every trigram in this sentence occurs exactly once in the corpus, and each two-word history also occurs exactly once, so each conditional probability is 1 (ignoring sentence-boundary tokens; with the START convention used above, the first factor would instead be P(Ask | START, START) = 1/3):

P(Sentence | Trigram) = P(teacher | Ask the) × P(if | the teacher) × P(you | teacher if) × P(have | if you) × P(any | you have) × P(problem | have any) = 1 × 1 × 1 × 1 × 1 × 1
P(Sentence | Trigram) = 1
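The bigram numbers above can be reproduced with a few lines of Python. The <s>/</s> markers play the role of START/END, and tokens are kept case-sensitive to match the counting convention used in this answer (that convention is an assumption of this sketch).

from collections import Counter

corpus = [
    "Have fun on the school trip",
    "Ask the teacher if you have any problem",
    "I have fun just looking around",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(w2, w1):
    # P(w2 | w1) = Count(w1 w2) / Count(w1)
    return bigrams[(w1, w2)] / unigrams[w1]

print(p("any", "have"), p("fun", "have"))   # 0.5 0.5

sent = ["<s>"] + "Ask the teacher if you have any problem".split() + ["</s>"]
prob = 1.0
for w1, w2 in zip(sent, sent[1:]):
    prob *= p(w2, w1)
print(prob)   # 0.0833... = 1/12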

Q. Questions on N-Gram Model

Corpus:
Time flies like an arrow.
Fruit flies like a banana.
She likes to have fruit for breakfast.

Question 1:
Create a unigram model for the above corpus and use it to:
(i) Predict the probability of occurrence of the word 'like'.
(ii) Predict the probability of occurrence of the word 'flies'.
Question 2:
Create a bigram model for the same corpus and use it to:
(i) Predict the probability of the next word being 'flies' given the current word is 'Time'.
(ii) Predict the probability of the next word being 'an' given the current word is 'like'.
Question 3:
Using the bigram model, predict the probability of the sentence 'Fruit flies like a banana'
occurring in the corpus.
Question 4:
Create a trigram model for the corpus and use it to:
(i) Predict the probability of the next word being 'arrow' given the previous two words are
'Time flies'.
(ii) Predict the probability of the next word being 'for' given the previous two words are
'to have'.
Question 5:
Using the trigram model, predict the probability of the sentence 'She likes to have fruit for
breakfast' occurring in the corpus.

Instructions for Solving Q1-Q5:

• Unigram Model: count the frequency of each word and divide by the total number of words in the corpus.
• Bigram Model: count the frequency of each pair of consecutive words (bigram) and divide by the frequency of the first word in the pair.
• Trigram Model: similar to the bigram model, but for sequences of three words (a short sketch follows this list).
• Sentence Probability Calculation:
  For bigrams: multiply the probabilities of each bigram in the sentence.
  For trigrams: multiply the probabilities of each trigram in the sentence.
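As a starting sketch for the trigram parts (no START/END padding is used here, which is a simplifying assumption; the probe below checks a trigram that is not one of the exercise's own queries):

from collections import Counter

corpus = [
    "Time flies like an arrow",
    "Fruit flies like a banana",
    "She likes to have fruit for breakfast",
]

bigrams, trigrams = Counter(), Counter()
for sentence in corpus:
    t = sentence.split()
    bigrams.update(zip(t, t[1:]))
    trigrams.update(zip(t, t[1:], t[2:]))

def p(w3, w1, w2):
    # P(w3 | w1 w2) = Count(w1 w2 w3) / Count(w1 w2)
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(p("like", "Time", "flies"))   # 1.0: "Time flies" is always followed by "like"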
