
Bahir Dar University

Bahir Dar Institute of Technology


Faculty of Computing
Department of Computer Science
Natural Language Processing (CoSc5262)

“Assignment Four: N-gram Language Modeling”

Name: Molalegn Tamiru ID: BDU1300608

Submitted To: Dr. Milion M. (PhD)

June 01, 2021

Addis Ababa, Ethiopia


Consider the following toy example:

Training data:

I am Sam

Sam I am

Sam I like

Sam I do like

do I like Sam

Assume that we use a bi-gram language model based on the above training data. What is the
most probable next word predicted by the model for the following word sequences? Show.

(1) Sam . . .

(2) Sam I do . . .

(3) Sam I am Sam . . .

(4) do I like . . .

Solution:

Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing out a conversation is time-consuming [1]. N-grams are Markov models that estimate words from a fixed window of previous words. N-gram probabilities can be estimated by counting occurrences in a corpus and normalizing (the maximum likelihood estimate).

The bigram probability of a word W following the previous word Wi-1 is estimated by the maximum likelihood estimate [2]:

P(W | Wi-1) = count(Wi-1, W) / count(Wi-1)
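
To make the estimate concrete, the short Python sketch below (illustrative only, not part of the original hand calculation) builds unigram and bigram counts from the toy training data and computes the MLE bigram probability; the helper name bigram_prob is just a label chosen here.

from collections import Counter

# Toy training data from the assignment, one sentence per string.
training_sentences = [
    "I am Sam",
    "Sam I am",
    "Sam I like",
    "Sam I do like",
    "do I like Sam",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in training_sentences:
    words = sentence.split()
    unigram_counts.update(words)                  # count(Wi-1)
    bigram_counts.update(zip(words, words[1:]))   # count(Wi-1, W)

def bigram_prob(w, w_prev):
    """MLE estimate of P(w | w_prev); 0 if w_prev never occurs in the data."""
    if unigram_counts[w_prev] == 0:
        return 0.0
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

print(bigram_prob("I", "Sam"))   # 3/5 = 0.6
print(bigram_prob("am", "Sam"))  # 0/5 = 0.0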

(1) Sam . . .

Check the probability of each word following "Sam":

P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

Therefore, the most probable word after "Sam" is "I".

(2) Sam I do . . .

Check the probability of each word following "do":

P(I|do) = count(do, I)/count(do) = 1/2 = 0.5

P(am|do) = count(do, am)/count(do) = 0/2 = 0

P(like|do) = count(do, like)/count(do) = 1/2 = 0.5

P(Sam|do) = count(do, Sam)/count(do) = 0/2 = 0

P(do|do) = count(do, do)/count(do) = 0/2 = 0

Therefore, the next words "I" and "like" are equally probable after "do" (0.5 each).

(3) Sam I am Sam . . .

Under the bigram assumption only the last word matters, so check the probability of each word following "Sam", exactly as in (1):

P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

Therefore, the most probable word after "Sam" is again "I".

(4) do I like . . .

Check the probability of each word following "like":

P(I|like) = count(like, I)/count(like) = 0/3 = 0

P(do|like) = count(like, do)/count(like) = 0/3 = 0

P(Sam|like) = count(like, Sam)/count(like) = 1/3 ≈ 0.33

P(am|like) = count(like, am)/count(like) = 0/3 = 0

P(like|like) = count(like, like)/count(like) = 0/3 = 0

Therefore, the most probable word after "like" is "Sam".
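
As a quick check (again an illustrative sketch rather than part of the original submission), the Python snippet below ranks every vocabulary word after the last word of each query context and reproduces the four answers above, including the tie in (2); the helper name next_word_candidates is chosen here for illustration.

from collections import Counter

training_sentences = [
    "I am Sam", "Sam I am", "Sam I like", "Sam I do like", "do I like Sam",
]
unigram_counts, bigram_counts = Counter(), Counter()
for sentence in training_sentences:
    words = sentence.split()
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

def next_word_candidates(context):
    """Return the most probable next word(s) given the last word of the context."""
    w_prev = context.split()[-1]
    probs = {w: bigram_counts[(w_prev, w)] / unigram_counts[w_prev]
             for w in unigram_counts}
    best = max(probs.values())
    return [w for w, p in probs.items() if p == best], best

for context in ["Sam", "Sam I do", "Sam I am Sam", "do I like"]:
    words, p = next_word_candidates(context)
    print(f"{context} ... -> {words} (p = {p:.2f})")
# Expected: ['I'] (0.60), ['I', 'like'] (0.50), ['I'] (0.60), ['Sam'] (0.33)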

[1] R. Nagata, H. Takamura, and G. Neubig, “Adaptive Spelling Error Correction Models for Learner
English,” Procedia Comput. Sci., vol. 112, pp. 474–483, 2017, doi: 10.1016/j.procs.2017.08.065.

[2] J. Lin, “N-Gram Language Models,” 2009.
