Lecture 4


Language Model

Content
• Word prediction task
• Language modeling (N-grams)
– N-gram introduction
– The chain rule
– Model evaluation
– Smoothing

2
Word Prediction

• Guess the next word...


– ... I notice three guys standing on the ???
• There are many sources of knowledge that
can be used to inform this task, including
arbitrary world knowledge.
• But it turns out that you can do pretty well
by simply looking at the preceding words
and keeping track of some fairly simple
counts.
3
Word Prediction

• We can formalize this task using what are called N-gram models.
• N-grams are token sequences of length N.
• Our earlier example contains the following 2-
grams (bigrams)
– (I notice), (notice three), (three guys), (guys
standing), (standing on), (on the)
• Given knowledge of counts of N-grams such as
these, we can guess likely next words in a
sequence.
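
A minimal sketch (not from the slides) of pulling these bigrams out of the example sentence, assuming simple whitespace tokenization:

```python
# Sketch: extract bigrams from the example sentence.
# Whitespace tokenization is assumed for simplicity.
sentence = "I notice three guys standing on the"
tokens = sentence.split()

bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)
# [('I', 'notice'), ('notice', 'three'), ('three', 'guys'),
#  ('guys', 'standing'), ('standing', 'on'), ('on', 'the')]
```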
4
N-Gram Models
• More formally, we can use knowledge of
the counts of N-grams to assess the
conditional probability of candidate words
as the next word in a sequence.
• Or, we can use them to assess the
probability of an entire sequence of words.
– Pretty much the same thing as we’ll see...

5
Applications
• It turns out that being able to predict the next word
(or any linguistic unit) in a sequence is an
extremely useful thing to be able to do.
• As we’ll see, it lies at the core of the following
applications
– Automatic speech recognition
– Handwriting and character recognition
– Spelling correction
– Machine translation
– And many more.

6
Counting
• Simple counting lies at the core of any
probabilistic approach. So let’s first take a
look at what we’re counting.
– He stepped out into the hall, was delighted to
encounter a water brother.
• 13 tokens, 15 if we include “,” and “.” as separate
tokens.
• Assuming we include the comma and period, how
many bigrams are there?

7
Counting
• Not always that simple
– I do uh main- mainly business data processing
• Spoken language poses various challenges.
– Should we count “uh” and other fillers as tokens?
– What about the repetition of “mainly”? Should such do-overs
count twice or just once?
– The answers depend on the application.
• If we’re focusing on something like ASR to support indexing for
search, then “uh” isn’t helpful (it’s not likely to occur as a query).
• But filled pauses are very useful in dialog management, so we might
want them there.

8
Counting: Types and Tokens
• How about
– They picnicked by the pool, then lay back on
the grass and looked at the stars.
• 18 tokens (again counting punctuation)
• But we might also note that “the” is used 3
times, so there are only 16 unique types (as
opposed to tokens).
• Going forward, we’ll have occasion to
focus on counting both types and tokens of
both words and N-grams.
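
A small sketch of the type/token distinction on this example, assuming punctuation marks are split off as their own tokens:

```python
# Sketch: tokens vs. types for the example sentence.
# Assumes punctuation marks are split off as separate tokens.
import re

sentence = ("They picnicked by the pool, then lay back on "
            "the grass and looked at the stars.")
tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(len(tokens))       # 18 tokens (counting ',' and '.')
print(len(set(tokens)))  # 16 types ('the' appears 3 times)
```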
9
Counting: Wordforms
• Should “cats” and “cat” count as the same
when we’re counting?
• How about “geese” and “goose”?
• Some terminology:
– Lemma: a set of lexical forms having the same
stem, major part of speech, and rough word
sense
– Wordform: fully inflected surface form
• Again, we’ll have occasion to count both
lemmas and wordforms
10
Counting: Corpora
• So what happens when we look at large bodies of text
instead of single utterances?
• Brown et al. (1992): a large corpus of English text
– 583 million wordform tokens
– 293,181 wordform types
• Google
– Crawl of 1,024,908,267,229 English tokens
– 13,588,391 wordform types
• That seems like a lot of types... After all, even large dictionaries of English have
only around 500k types. Why so many here?

• Numbers
• Misspellings
• Names
• Acronyms
• etc
11
Language Modeling
• Back to word prediction
• We can model the word prediction task as
the ability to assess the conditional
probability of a word given the previous
words in the sequence
– P(wn|w1,w2…wn-1)
• We’ll call a statistical model that can assess
this a Language Model

12
Language Modeling
• How might we go about calculating such a
conditional probability?
– One way is to use the definition of conditional
probabilities and look for counts. So to get
– P(the | its water is so transparent that)
• By definition that’s
P(its water is so transparent that the) / P(its water is so transparent that)
We can get each of those from counts in a large
corpus.
13
Very Easy Estimate
• How to estimate?
– P(the | its water is so transparent that)

P(the | its water is so transparent that) =
Count(its water is so transparent that the) / Count(its water is so transparent that)

14
Very Easy Estimate
• According to Google, those counts are 5 and 9, giving 5/9.
– Unfortunately... 2 of those hits were to these
slides... So maybe it’s really
– 3/7
– In any case, that’s not terribly convincing due
to the small numbers involved.

15
Language Modeling
• Unfortunately, for most sequences and for
most text collections we won’t get good
estimates from this method.
– What we’re likely to get is 0. Or worse 0/0.
• Clearly, we’ll have to be a little more clever.
– Let’s use the chain rule of probability
– And a particularly useful independence
assumption.

16
The Chain Rule

• Recall the definition of conditional probabilities


– P(A | B) = P(A ^ B) / P(B)
• Rewriting:
– P(A ^ B) = P(A | B) P(B)
• For sequences...
– P(A,B,C,D) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)
• In general
– P(x1,x2,x3,…xn) = P(x1)P(x2|x1)P(x3|x1,x2)…P(xn|x1…xn-1)

17
The Chain Rule

P(its water was so transparent)=


P(its)*
P(water|its)*
P(was|its water)*
P(so|its water was)*
P(transparent|its water was so)
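
As a sketch, the same decomposition in code; `cond_prob` here is a hypothetical stand-in for whatever conditional estimates we have available:

```python
# Sketch of the chain-rule decomposition P(w1..wn) = prod_i P(wi | w1..wi-1).
# cond_prob(w, history) is a hypothetical function returning P(w | history).
def sequence_probability(words, cond_prob):
    p = 1.0
    for i, w in enumerate(words):
        p *= cond_prob(w, tuple(words[:i]))  # P(wi | w1 ... wi-1)
    return p
```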

18
Unfortunately

• There are still a lot of possible sentences


• In general, we’ll never be able to get enough
data to compute the statistics for those longer
prefixes
– Same problem we had for the strings themselves

19
Independence Assumption
• Make the simplifying assumption
– P(lizard|
the,other,day,I,was,walking,along,and,saw,a) =
P(lizard|a)
• Or maybe
– P(lizard|
the,other,day,I,was,walking,along,and,saw,a) =
P(lizard|saw,a)
• That is, we assume the probability in question
depends only on the most recent words, not the full
earlier history.

20
Independence Assumption

• This particular kind of independence assumption is
called a Markov assumption after the Russian
mathematician Andrei Markov.

21
Markov Assumption

So, for each component in the product, we substitute the
approximation (assuming a prefix of N):

P(wn | w1 … wn-1) ≈ P(wn | wn-N+1 … wn-1)

Bigram version

P(wn | w1 … wn-1) ≈ P(wn | wn-1)

22
Estimating Bigram Probabilities

• The Maximum Likelihood Estimate (MLE)

count(w i 1,w i )
P(w i | w i 1) 
count(w i 1 )

23
An Example
• <s> I am Sam </s>
• <s> Sam I am </s>
• <s> I do not like green eggs and ham </s>
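
A minimal sketch of computing the MLE bigram estimates for this toy corpus:

```python
# Sketch: MLE bigram estimates from the toy corpus above.
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(w, prev):
    """P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("I", "<s>"))   # 2/3
print(p_mle("Sam", "am"))  # 1/2
print(p_mle("do", "I"))    # 1/3
```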

24
Maximum Likelihood Estimates
• The maximum likelihood estimate of some parameter of a
model M from a training set T
– Is the estimate that maximizes the likelihood of the training set T given
the model M
• Suppose the word Chinese occurs 400 times in a corpus of a
million words (Brown corpus)
• What is the probability that a random word from some other
text from the same distribution will be “Chinese”
• MLE estimate is 400/1000000 = .0004
– This may be a bad estimate for some other corpus
• But it is the estimate that makes it most likely that “Chinese”
will occur 400 times in a million word corpus.

25
Berkeley Restaurant Project Sentences

• can you tell me about any good cantonese restaurants
close by
• mid priced thai food is what i’m looking for
• tell me about chez panisse
• can you give me a listing of the kinds of food that are
available
• i’m looking for a good place to eat breakfast
• when is caffe venezia open during the day

26
Bigram Counts
• Out of 9222 sentences
– E.g., “I want” occurred 827 times

27
Bigram Probabilities
• Divide bigram counts by prefix unigram
counts to get probabilities.

28
Bigram Estimates of Sentence Probabilities

• P(<s> I want english food </s>) =


P(i|<s>)*
P(want|I)*
P(english|want)*
P(food|english)*
P(</s>|food)
=.000031
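
A sketch of the same product, computed in log space to avoid underflow. The factor values below are the standard Berkeley Restaurant Project estimates quoted in Jurafsky & Martin (only P(i|<s>) and P(english|want) appear on the next slide), so treat them as illustrative:

```python
# Sketch: multiply bigram estimates for the sentence, in log space.
# The individual probabilities are the standard textbook values; illustrative only.
import math

factors = {
    "P(i|<s>)":        0.25,
    "P(want|i)":       0.33,
    "P(english|want)": 0.0011,
    "P(food|english)": 0.5,
    "P(</s>|food)":    0.68,
}

log_p = sum(math.log(p) for p in factors.values())
print(math.exp(log_p))   # ~0.000031
```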

29
Kinds of Knowledge
• As crude as they are, N-gram probabilities capture
a range of interesting facts about language.
• P(english|want) = .0011
• P(chinese|want) = .0065
• P(to|want) = .66
• P(eat | to) = .28
• P(food | to) = 0
• P(want | spend) = 0
• P (i | <s>) = .25

30
Evaluation
• How do we know if our models are any
good?
– And in particular, how do we know if one
model is better than another?
• Well Shannon’s game gives us an intuition.
– The generated texts from the higher order
models sure look better. That is, they sound
more like the text the model was obtained from.
– But what does that mean? Can we make that
notion operational?

31
Evaluation

• Standard method
– Train parameters of our model on a training set.
– Look at the model’s performance on some new data
• This is exactly what happens in the real world; we want to know
how our model performs on data we haven’t seen
– So use a test set: a dataset that is different from our
training set, but is drawn from the same source
– Then we need an evaluation metric to tell us how well
our model is doing on the test set.
• One such metric is perplexity
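
Perplexity is the inverse probability of the test set, normalized by the number of words: PP(W) = P(w1 … wN)^(-1/N). A minimal sketch for a bigram model, assuming a hypothetical log_prob(w, prev) function:

```python
# Sketch: perplexity of a test set under a bigram model.
# log_prob(w, prev) is a hypothetical function returning log P(w | prev).
import math

def perplexity(test_tokens, log_prob):
    """PP(W) = P(w1..wN)^(-1/N), computed in log space."""
    log_p = 0.0
    for prev, w in zip(test_tokens, test_tokens[1:]):
        log_p += log_prob(w, prev)
    n = len(test_tokens) - 1          # number of predicted words
    return math.exp(-log_p / n)
```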

32
Unknown Words
• But once we start looking at test data, we’ll
run into words that we haven’t seen before
(pretty much regardless of how much
training data we have).
• With an Open Vocabulary task
– Create an unknown word token <UNK>
– Training of <UNK> probabilities
• Create a fixed lexicon L, of size V
– From a dictionary or
– A subset of terms from the training set
• At the text normalization phase, any training word not in L is changed to
<UNK>
• Now we count that like a normal word
– At test time
• Use UNK counts for any word not in training
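
A minimal sketch of this normalization step, assuming L is built from the most frequent training words (one common choice):

```python
# Sketch: map out-of-lexicon words to <UNK> before counting.
# Building L from the V most frequent training words is one common choice.
from collections import Counter

def normalize(train_tokens, vocab_size):
    counts = Counter(train_tokens)
    lexicon = {w for w, _ in counts.most_common(vocab_size)}
    return [w if w in lexicon else "<UNK>" for w in train_tokens], lexicon

# At test time, apply the same mapping with the training lexicon:
# test_tokens = [w if w in lexicon else "<UNK>" for w in test_tokens]
```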
33
Zero Counts
• Back to Shakespeare
– Recall that Shakespeare produced 300,000 bigram
types out of V² = 844 million possible bigrams...
– So, 99.96% of the possible bigrams were never seen
(have zero entries in the table)
– Does that mean that any sentence that contains one of
those bigrams should have a probability of 0?

34
Laplace-Smoothed Bigram Counts

35
Laplace-Smoothed Bigram Probabilities
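
With add-one (Laplace) smoothing, every bigram count is incremented by one, so the estimate becomes P(wi | wi-1) = (count(wi-1, wi) + 1) / (count(wi-1) + V), where V is the vocabulary size. A minimal sketch, reusing the unigram and bigram Counters from the earlier toy-corpus example:

```python
# Sketch: add-one (Laplace) smoothed bigram estimate.
# Assumes `unigrams` and `bigrams` are Counters as built earlier; V = vocabulary size.
def p_laplace(w, prev, unigrams, bigrams, V):
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)
```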

36
Backoff and Interpolation
• Another really useful source of knowledge
• If we are estimating:
– trigram p(z|x,y)
– but count(xyz) is zero
• Use info from:
– Bigram p(z|y)
• Or even:
– Unigram p(z)
• How to combine this trigram, bigram,
unigram info in a valid fashion?
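
One standard way to combine them is simple linear interpolation; a minimal sketch, with placeholder weights (in practice the lambdas are tuned on held-out data and must sum to 1):

```python
# Sketch: linear interpolation of unigram, bigram, and trigram estimates.
# p_uni, p_bi, p_tri are hypothetical estimators; the lambda weights are
# placeholders and should be tuned on held-out data (they must sum to 1).
def p_interp(z, y, x, p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    l1, l2, l3 = lambdas
    return l1 * p_uni(z) + l2 * p_bi(z, y) + l3 * p_tri(z, y, x)
```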
37
