Lecture 4: N-Grams
Introduction
- word prediction
- statistical model
- probability of occurrence
- language models (LMs)
Applications of N-grams
- part-of-speech tagging
- natural language generation
- word similarity
- authorship identification
- sentiment extraction
Counting Words in Corpora
- text corpus or speech
Simple (Unsmoothed) N-grams
- transparent rules
- bigram model, trigram model
Training and Test Sets
N-gram Sensitivity to the Training Corpus
Unknown Words: Open versus Closed Vocabulary Tasks
- out of vocabulary (OOV)
- estimate the probabilities
Evaluating N-grams: Perplexity
- intrinsic/extrinsic evaluation
Smoothing
- Laplace Smoothing, Interpolation
- Backoff
- Kneser-Ney Smoothing
1. Introduction to N-Grams
“Predicting words seems somewhat less fraught”
We formalize this idea of word prediction with probabilistic models called
N-gram models, which predict the next word from the previous N −1 words.
For example:
=> Finding the probability that the word "been" comes next after the words
"I have ...."
=> Please turn your homework ….
- Hopefully, most of you concluded that a very likely word is “in” or possibly
“over” but probably not “the”.
1. Introduction to N-Grams (Cont…)
=> Please turn your homework ….
An N-gram is an N-token sequence of words:
- a 2-gram (more commonly called a bigram) is a two-word sequence of
words like “please turn”, “turn your” or “your homework”.
- a 3-gram (more commonly called a trigram) is a three-word sequence of
words like “please turn your”, or “turn your homework”.
1. Introduction to N-Grams (Cont…)
Such statistical models of word sequences are also called language models
or LMs.
- computing the probability of the next word turns out to be closely related
to computing the probability of a sequence of words.
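As a reference point (the standard chain-rule decomposition, not shown in this text), the probability of a whole sequence w_1 ... w_n can be written as a product of next-word probabilities:

P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2)\cdots P(w_n \mid w_1^{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})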
1. Introduction to N-Grams (Applications)
Estimators like N-grams that assign a conditional probability to possible
next words can be used to assign a joint probability to an entire sentence.
N-grams are essential in any task in which we have to identify words in
noisy, ambiguous input.
In speech recognition, for example, the input speech sounds are very
confusable and many words sound extremely similar; e.g., in the movie
"Take the Money and Run",
- "I have a gun" is far more probable than
- the non-word "I have a gub" or even "I have a gull".
NOTE: Since errors like these produce real words, we can't find them by just flagging
words that aren't in the dictionary.
Instead, we rely on the fact that "in about fifteen minuets" is a much less probable
sequence than "in about fifteen minutes".
1. Introduction to N-Grams (Applications) (Cont…)
Applications of N-grams
N-grams are also crucial in NLP tasks like;
Part-of-speech tagging:
- Determining the role of a word (noun, verb, etc.) in a sentence, given the previous words.
e.g., - This is a screw (noun).
- Please screw (verb) this nut.
Natural language generation:
- Predicting the next word in a sentence.
e.g., - This has to…..
Word similarity:
- Finding similarity between words.
e.g., - minutes and minuet.
as well as in applications from authorship identification and sentiment
extraction to predictive text input systems for cell phones.
2. Counting Words in Corpora
Probabilities are based on counting things.
2. Counting Words in Corpora (Cont…)
Brown corpus: it has 61,805 wordform types,
- tagged with just an 87-tag tagset.
3. Simple (Unsmoothed) N-grams
N-gram Models
3. Simple (Unsmoothed) N-grams (Cont…)
The bigram model, for example,
- approximates the probability of a word given all the previous words by the
conditional probability given only the preceding word.
- In other words, instead of computing the full history-based probability, we use
the approximation shown below.
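In standard N-gram notation (reconstructed here, since the slide's equation is not preserved in this text), the bigram approximation is:

P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})

and, applied to a whole sequence,

P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})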
3. Simple (Unsmoothed) N-grams (Cont…)
Finally, the simplest and most intuitive way to estimate probabilities is
called
- Maximum Likelihood Estimation, or MLE.
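For bigrams, the MLE estimate is simply the relative frequency of the bigram count over the count of the preceding word (standard formula, added here since the slide's equation is not in the text):

P_{\text{MLE}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{\sum_{w} C(w_{n-1}\,w)} = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})}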
3. Simple (Unsmoothed) N-grams (Cont…)
For example: applying Maximum Likelihood Estimation, or MLE (worked examples follow in Example-1 and Example-2).
3. Simple (Unsmoothed) N-grams (Example-1)
Let’s work through an example using a mini-corpus of three sentences.
- We’ll first need to augment each sentence with a special symbol <s> at the
beginning of the sentence, to give us the bigram context of the first word. We’ll
also need a special end-symbol </s>.
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
Here are the calculations for some of the bigram probabilities from this
corpus.
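The slide's worked calculations are not preserved in this text, so here is a minimal Python sketch (my own illustration, not from the lecture) that computes the unsmoothed bigram MLE estimates for this mini-corpus; the values in the comments are what these three sentences yield:

from collections import Counter

# Mini-corpus from the slide, with <s> and </s> already added.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def p_bigram(prev, word):
    # Unsmoothed MLE estimate: P(word | prev) = C(prev word) / C(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p_bigram("<s>", "I"))     # 2/3
print(p_bigram("I", "am"))      # 2/3
print(p_bigram("am", "Sam"))    # 1/2
print(p_bigram("Sam", "</s>"))  # 1/2
print(p_bigram("I", "do"))      # 1/3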
3. Simple (Unsmoothed) N-grams (Example-2)
Suppose the word Chinese occurs 400 times in a corpus of a million words
like the Brown corpus.
What is the probability that a random word selected from some other text of,
say, a million words will be the word Chinese?
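Using the MLE (relative-frequency) estimate, the answer works out as:

P(\text{Chinese}) = \frac{C(\text{Chinese})}{N} = \frac{400}{1{,}000{,}000} = 0.0004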
3. Simple (Unsmoothed) N-grams (Class Participation)
Calculate the simple N-gram probabilities;
(a) with white-space, (b) with punctuation
3. Simple (Unsmoothed) N-grams (Class Participation)
Calculate the simple N-gram probabilities;
3. Simple (Unsmoothed) N-grams (Class Participation)
Calculate the bigram probabilities using a mini-corpus built from the following two sets of sentences:

Phone conversation:
<s> Hello, are you fine </s>
<s> Hello, I am fine </s>
<s> are your fine </s>
<s> I am fine too </s>

Flight timing conversation:
<s> My flight time is 2:00pm </s>
<s> Good, my flight time is also 2:00pm </s>
<s> let's go same time 2:00pm </s>
<s> I will come 2:00pm sharply </s>
4. Training and Test Sets
The probabilities of an N-gram model come from the corpus it is trained
on.
The parameters of a statistical model are trained on some set of data (the
training corpus), and then
- we apply the model to some new data in some task (the test corpus), such
as speech recognition, and see how well it works.
There is a useful metric for how well a given statistical model matches a
test corpus, called perplexity.
- Perplexity is based on computing the probability of each sentence in
the test set.
- The better model is the one that assigns a higher probability to the test set (and therefore has lower perplexity).
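For reference, the standard definition (not spelled out on the slide): for a test set W = w_1 w_2 ... w_N, perplexity is the inverse probability of the test set, normalized by the number of words, so higher probability means lower perplexity:

PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \dots w_N)}}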
4.1 N-Gram Sensitivity to the Training Corpus
N-grams do a better and better job of modeling the training corpus;
- as we increase the value of N (i.e., unigram, bigram, trigram, ...).
We especially wouldn't choose training and test sets from different genres of
text, like:
- newspaper text, early English fiction, telephone conversations, and web
pages.
For example:
- To build N-grams for text prediction in SMS (Short Message Service), we
need a training corpus of SMS data.
- To build N-grams for business meetings, we would need corpora of
transcribed business meetings.
4.2 Unknown Words: Open versus closed vocabulary tasks
Closed Vocabulary is the assumption that we have a fixed lexicon known in advance, and
- the test set can only contain words from this lexicon.
- The closed vocabulary task thus assumes there are no unknown words.
Words the system has never seen in training (unseen events) are called
unknown words, or out of vocabulary (OOV) words.
- The percentage of OOV words that appear in the test set is called the
OOV rate.
An Open Vocabulary system is one where we model these potential unknown
words in the test set by adding a pseudo-word called <UNK>.
• We can train the probabilities of the unknown word model <UNK>
in the following way (see the steps on the next slide).
Example of an open vocabulary: subject-specific wording, e.g., "i.e." used instead of "For example".
4.2 Unknown Words: Open versus closed vocabulary tasks
(Estimate OOV probability)
1. Choose a vocabulary (word list) which is fixed in advance.
2. In the training set, convert any word that is not in this list (any
OOV word) to the unknown word token <UNK> in a text
normalization step.
3. Estimate the probabilities for <UNK> from its counts just like any
other regular word in the training set.
For example, the fixed vocabulary might be:
- a list of fruits, i.e., apple, grapes, …
- the contents of a course, i.e., regular expressions, parsing, …
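A minimal Python sketch of steps 1-3 (my own illustration; the vocabulary below is a hypothetical fixed word list, not one from the lecture):

# Step 1: a vocabulary (word list) fixed in advance.
vocabulary = {"<s>", "</s>", "apple", "grapes", "i", "like"}

def replace_oov(tokens, vocab):
    # Step 2: map every out-of-vocabulary token to the pseudo-word <UNK>.
    return [tok if tok in vocab else "<UNK>" for tok in tokens]

sentence = "<s> i like mango </s>".split()
print(replace_oov(sentence, vocabulary))
# ['<s>', 'i', 'like', '<UNK>', '</s>']
# Step 3: count <UNK> like any other word when estimating N-gram probabilities.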
5. Evaluating N-grams: Perplexity
The correct way to evaluate the performance of a language model is
- to embed it in an application and measure the total performance of the
application (this end-to-end approach is called extrinsic evaluation);
- perplexity, by contrast, is an intrinsic evaluation metric, computed on a test
corpus without reference to any application.
6. Smoothing
Any corpus is limited, so some perfectly acceptable English word sequences
are bound to be missing from it:
- these unseen sequences are "zero probability N-grams".
What do we do with words that are in our vocabulary (they are not unknown words)
- but appear in the test set in an unseen context (for example, after a word
they never appeared after in training)?
Smoothing addresses this by shaving a little probability mass from seen events
and reassigning it to unseen ones.
6.1 Laplace Smoothing (add-1/ add-k smoothing)
One simple way to do smoothing might be just to take our matrix of
bigram counts,
- before we normalize them into probabilities, and add one to all the
counts.
This algorithm is called Laplace smoothing, Laplace's Law, or add-one
smoothing.
Laplace smoothing merely adds one to each count (hence its alternate
name, add-one smoothing).
For example, the probability of a previously unseen word "fax" will no longer be zero:
P(fax) = (0 + 1) / (N + V), where N is the total number of word tokens and V is the vocabulary size (number of word types).
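For bigram counts, the corresponding add-one estimate (standard formulation, added here for reference) is:

P_{\text{Laplace}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n) + 1}{C(w_{n-1}) + V}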
6.2 Advanced Smoothing Methods: Kneser-Ney Smoothing
A brief introduction to the most commonly used modern N-gram smoothing
method, the interpolated Kneser-Ney algorithm.
Kneser-Ney has its roots in a discounting method called Absolute discounting.
Absolute discounting is a much better method of computing a revised count c∗
than the Good-Turing discount.
Kneser-Ney discounting augments absolute discounting with a more
sophisticated way to handle the backoff distribution.
The Kneser-Ney intuition is to base our estimate on the number of different
contexts word w has appeared in.
For example, consider the word "store" in the sentence "I went to the store":
=> in a typical corpus, "store" has appeared after many different preceding words
("the store", "a store", "grocery store", ...), i.e., in many different contexts.
Words that have appeared in more contexts are more likely to appear in some
new context as well.
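One standard way to formalize this intuition, added here for reference (it is the continuation probability used in interpolated Kneser-Ney, not a formula from the slide), counts the distinct bigram types a word completes:

P_{\text{CONTINUATION}}(w) = \frac{\lvert \{\, w' : C(w'\,w) > 0 \,\} \rvert}{\lvert \{\, (w', w'') : C(w'\,w'') > 0 \,\} \rvert}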
@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)