
Other Statistical Methods/Models
Unit 2b
Language Models
• Models that assign probabilities to sequences of words are called language models.
• Equivalently, they are models that assign a probability to each possible next word, or to an entire sentence.
• Why would you want to predict upcoming words, or assign probabilities to sentences?
• To identify words in noisy, ambiguous input, as in speech recognition.
• In writing tools such as spelling correction or grammatical error correction.
• Language models are also essential in machine translation.
N-gram Language Models
• The n-gram is the simplest model that assigns probabilities to sentences and sequences of words.
• It is based on computing P(w|h), the probability of a word w given some history h.

The intuition of the n-gram model is that instead of computing the probability of a
word given its entire history, we can approximate the history by just the last few
words.
bigram
• The bigram model, for example, approximates the probability of a word given all the previous words, P(wn|w1:n-1), by using only the conditional probability given the preceding word, P(wn|wn-1). In other words, instead of computing the probability P(the|Walden Pond’s water is so transparent that), we approximate it with the probability P(the|that).
• The assumption that the probability of a word depends only on the
previous word is called a Markov assumption.
n-gram
• The general n-gram approximation is P(wn|w1:n-1) ≈ P(wn|wn-N+1:n-1), where N is the order of the gram: bigram N=2, trigram N=3, …
• So if N=4, then P(wn|wn-4+1:n-1) = P(wn|wn-3:n-1) = P(wn|wn-3 wn-2 wn-1)
• For example, with N=4: P(the|Walden Pond’s water is so transparent that) ≈ P(the|so transparent that)
Markov models
• Markov models are the class of probabilistic models that assume we
can predict the probability of some future unit without looking too far
into the past.
• We can generalize the bigram (which looks one word into the past) to
the trigram (which looks two words into the past) and thus to the n-
gram (which looks n-1 words into the past).
maximum likelihood estimation
• How do we estimate these bigram or n-gram probabilities?
• We get the MLE estimate for the parameters of an n-gram model by getting counts from a corpus and normalizing the counts so that they lie between 0 and 1.
• For a bigram this gives P(wn|wn-1) = C(wn-1 wn) / C(wn-1): the count of the bigram divided by the count of the preceding word.
[Table: bigram counts for word pairs in the corpus, together with the unigram count of each word. Exercise: find the bigram probability for each cell.]
Example: P(lunch|eat) = C(eat lunch) / C(eat) = 42 / 746 ≈ 0.056
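A minimal sketch of this MLE bigram estimation in Python; the toy corpus, the <s>/</s> boundary markers, and the function name bigram_mle are illustrative assumptions, not the restaurant corpus behind the counts above:

```python
from collections import Counter

# Toy training corpus; each sentence is padded with <s> and </s> markers.
corpus = [
    "<s> i want english food </s>",
    "<s> i want chinese food </s>",
    "<s> i eat lunch </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_mle(w_prev, w):
    """MLE estimate: P(w | w_prev) = C(w_prev w) / C(w_prev)."""
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

print(bigram_mle("i", "want"))  # 2/3 on this toy corpus
```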
If we have the following bigram probabilities…
• Now we can compute the probability of sentences like “I want English food” or “I want Chinese food” by simply multiplying the appropriate bigram probabilities together, for example:
P(<s> i want english food </s>) = P(i|<s>) P(want|i) P(english|want) P(food|english) P(</s>|food)
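A sketch of the same computation in code, chaining bigram probabilities in log space to avoid numerical underflow; it reuses the hypothetical bigram_mle helper from the sketch above:

```python
import math

def sentence_logprob(sentence, bigram_prob):
    """Sum log P(w_n | w_n-1) over the sentence padded with <s> and </s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(bigram_prob(prev, w))
               for prev, w in zip(tokens, tokens[1:]))

# Probability of "i want english food" under the toy bigram model above.
logp = sentence_logprob("i want english food", bigram_mle)
print(math.exp(logp))  # product of the individual bigram probabilities
```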
Evaluating Language Models
• Extrinsic evaluation: embed the language model in an application and measure how much the application improves.
• Intrinsic evaluation: measure the quality of the model independently of any application, e.g. with perplexity.
• This requires training, development, and test sets.
• In practice, we often just divide our data into 80% training, 10% development, and 10% test (see the split sketch below).
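A minimal sketch of such a split, assuming the data is simply a list of sentences and that shuffling before splitting is acceptable:

```python
import random

def train_dev_test_split(sentences, seed=0):
    """Shuffle and split into 80% training, 10% development, 10% test."""
    data = list(sentences)
    random.Random(seed).shuffle(data)
    n = len(data)
    return data[:int(0.8 * n)], data[int(0.8 * n):int(0.9 * n)], data[int(0.9 * n):]
```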
Perplexity
In general, perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.
• The perplexity (sometimes called PP for short) of a language model on a test set is the inverse probability of the test set, normalized by the number of words.
• For a test set W = w1 w2 … wN: PP(W) = P(w1 w2 … wN)^(-1/N)
• Minimizing perplexity is equivalent to maximizing the test set probability according to the language model.
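A sketch of perplexity computed for a bigram model in log space; bigram_prob is assumed to be any smoothed probability function that never returns zero on the test data:

```python
import math

def perplexity(test_sentences, bigram_prob):
    """PP(W) = P(w_1 ... w_N)^(-1/N), computed via log probabilities."""
    total_logprob = 0.0
    n_words = 0
    for sentence in test_sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            total_logprob += math.log(bigram_prob(prev, w))
            n_words += 1
    return math.exp(-total_logprob / n_words)
```

Lower perplexity means the model assigns higher probability to the test set.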
Perplexity
The table below shows the perplexity of a 1.5 million word WSJ test set according to each of these grammars.
[Table: perplexity of the unigram, bigram, and trigram models on the WSJ test set.]
Smoothing
• Some words that are in our vocabulary appear in the test set in an unseen context:
• for example, they appear after a word they never appeared after in training.
• To prevent a language model from assigning zero probability to these unseen events, we apply smoothing. Common methods include:
• Laplace (add-one) smoothing,
• add-k smoothing,
• stupid backoff, and
• Kneser-Ney smoothing
Laplace Smoothing
• The simplest way to do smoothing is to add one to all the n-gram counts before we normalize them into probabilities.
• It does not perform well enough to be used in modern n-gram models.
• It is, however, still a practical smoothing algorithm for other tasks like text classification.
• The unsmoothed unigram probability of a word wi is its count ci normalized by the total number of word tokens N: P(wi) = ci / N
• After Laplace smoothing this becomes PLaplace(wi) = (ci + 1) / (N + V), where N is the number of tokens and V is the number of words in the vocabulary.


Add-k smoothing
• Instead of adding 1 to each count, we add a fractional count k (0.5? 0.05? 0.01?). This algorithm is therefore called add-k smoothing.
• For bigrams: Padd-k(wn|wn-1) = (C(wn-1 wn) + k) / (C(wn-1) + kV), as in the sketch below.
• It is not that useful in language modelling.
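A sketch of add-k bigram estimation under the toy unigram_counts and bigram_counts from the earlier sketch (illustrative assumptions, not the slide's counts); setting k = 1 gives Laplace (add-one) smoothing:

```python
def add_k_bigram(w_prev, w, k=1.0):
    """P(w | w_prev) = (C(w_prev w) + k) / (C(w_prev) + k * V)."""
    V = len(unigram_counts)  # vocabulary size: number of distinct word types
    return (bigram_counts[(w_prev, w)] + k) / (unigram_counts[w_prev] + k * V)

# An unseen bigram now gets a small nonzero probability instead of zero.
print(add_k_bigram("eat", "english"))          # Laplace (k = 1)
print(add_k_bigram("eat", "english", k=0.05))  # add-k with a fractional k
```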


Backoff and Interpolation
• If we are trying to compute P(wn|wn-2wn-1) but we have no examples of a
particular trigram wn-2wn-1wn, we can instead estimate its probability by
using the bigram probability P(wn|wn-1). Similarly, if we don’t have counts
to compute P(wn|wn-1), we can look to the unigram P(wn)
• In backoff, we use the trigram if the evidence is sufficient, otherwise we
use the bigram, otherwise the unigram.
• By contrast, in interpolation, we always mix the probability estimates from all the n-gram estimators, weighting and combining the trigram, bigram, and unigram counts (see the sketch below):
• P̂(wn|wn-2 wn-1) = λ1 P(wn) + λ2 P(wn|wn-1) + λ3 P(wn|wn-2 wn-1), where the λs sum to 1.
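A sketch of simple linear interpolation; the λ weights are illustrative values that would normally be tuned on a held-out development set, and unigram_p, bigram_p, trigram_p are assumed to be existing probability functions:

```python
def interpolated_trigram(w2, w1, w, unigram_p, bigram_p, trigram_p,
                         lambdas=(0.1, 0.3, 0.6)):
    """P_hat(w | w2 w1) = l1*P(w) + l2*P(w | w1) + l3*P(w | w2 w1), l1+l2+l3 = 1."""
    l1, l2, l3 = lambdas
    return l1 * unigram_p(w) + l2 * bigram_p(w1, w) + l3 * trigram_p(w2, w1, w)
```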
Kneser-Ney Smoothing
• Absolute discounting formalizes this intuition by subtracting a fixed (absolute) discount d from each count of a seen n-gram:
• PAbsoluteDiscounting(wi|wi-1) = (C(wi-1 wi) - d) / Σv C(wi-1 v) + λ(wi-1) P(wi)
• The first term is the discounted bigram, and the second term is the unigram with an interpolation weight λ. We could just set all the d values to 0.75, or we could keep a separate discount value of 0.5 for the bigrams with counts of 1 (see the sketch below).
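A sketch of interpolated absolute discounting for bigrams, again reusing the hypothetical unigram_counts and bigram_counts from the earlier sketches; to keep the example simple it backs off to the plain unigram distribution rather than the Kneser-Ney continuation probability:

```python
def absolute_discount_bigram(w_prev, w, d=0.75):
    """max(C(w_prev w) - d, 0) / C(w_prev)  +  lambda(w_prev) * P(w)."""
    c_prev = unigram_counts[w_prev]
    # Number of distinct word types seen after w_prev in training.
    n_continuations = sum(1 for (a, _b) in bigram_counts if a == w_prev)
    discounted = max(bigram_counts[(w_prev, w)] - d, 0) / c_prev
    lam = (d / c_prev) * n_continuations  # probability mass freed by discounting
    p_unigram = unigram_counts[w] / sum(unigram_counts.values())
    return discounted + lam * p_unigram
```

Kneser-Ney smoothing keeps this same structure but replaces the unigram term with a continuation probability based on how many different contexts a word appears in.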
