
Language Modeling: Evaluation and Perplexity
Dan Jurafsky

Evaluation: How good is our model?


• Does our language model prefer good sentences to bad ones?
  • It should assign higher probability to “real” or “frequently observed” sentences
    than to “ungrammatical” or “rarely observed” sentences.
• We train the parameters of our model on a training set.
• We test the model’s performance on data we haven’t seen.
  • A test set is an unseen dataset, distinct from the training set and totally
    unused during training.
• An evaluation metric tells us how well our model does on the test set.

Extrinsic evaluation of N-gram models


• The best evaluation for comparing models A and B:
  • Put each model in a task
    • spelling corrector, speech recognizer, MT system
  • Run the task, get an accuracy for A and for B
    • How many misspelled words were corrected properly?
    • How many words were translated correctly?
  • Compare the accuracies for A and B

Difficulty of extrinsic (in-vivo) evaluation of N-gram models

• Extrinsic evaluation
  • Time-consuming; can take days or weeks
• So
  • Sometimes use intrinsic evaluation: perplexity
  • Bad approximation
    • unless the test data looks just like the training data
  • So generally only useful in pilot experiments
  • But it is helpful to think about.

Intuition of Perplexity

• The Shannon Game: how well can we predict the next word?
    I always order pizza with cheese and ____
    The 33rd President of the US was ____
    I saw a ____
• A model’s candidate distribution for the first blank:
    mushrooms   0.1
    pepperoni   0.1
    anchovies   0.01
    …
    fried rice  0.0001
    …
    and         1e-100
• Unigrams are terrible at this game. (Why?)
• A better model of a text is one which assigns a higher probability to the
  word that actually occurs (see the sketch below).
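
To make that concrete, here is a minimal Python sketch. Model A’s probabilities echo the slide; model B’s are invented for contrast, so the numbers are illustrative, not from the lecture:

    # Hypothetical next-word distributions for "I always order pizza
    # with cheese and ____". Model A's values echo the slide; model B's
    # are made up for comparison.
    model_a = {"mushrooms": 0.1, "pepperoni": 0.1, "anchovies": 0.01, "and": 1e-100}
    model_b = {"mushrooms": 0.01, "pepperoni": 0.02, "anchovies": 0.02, "and": 1e-100}

    actual_next_word = "mushrooms"  # the word that actually occurred
    for name, model in [("A", model_a), ("B", model_b)]:
        print(f"model {name}: P({actual_next_word}) = {model[actual_next_word]}")
    # Model A is the better predictor here: it assigns the observed word
    # ten times the probability that model B does.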

Perplexity
The best language model is one that best predicts an unseen test set
• Gives the highest P(sentence)
Perplexity is the inverse probability of the test set, normalized by the
number of words:

    PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N}
          = \sqrt[N]{1 / P(w_1 w_2 \ldots w_N)}

Chain rule:

    PP(W) = \sqrt[N]{\prod_{i=1}^{N} 1 / P(w_i \mid w_1 \ldots w_{i-1})}

For bigrams:

    PP(W) = \sqrt[N]{\prod_{i=1}^{N} 1 / P(w_i \mid w_{i-1})}

Minimizing perplexity is the same as maximizing probability.
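
As a concrete sketch of the bigram formula above (not from the slides: the bigram probabilities are made-up illustrative values, and real code would also need smoothing for unseen bigrams):

    import math

    # Toy bigram table: probabilities invented for illustration,
    # not estimated from any corpus.
    bigram_p = {
        ("<s>", "i"): 0.25,
        ("i", "want"): 0.33,
        ("want", "chinese"): 0.0065,
        ("chinese", "food"): 0.52,
        ("food", "</s>"): 0.68,
    }

    def bigram_perplexity(sentence, probs):
        # PP(W) = (prod_i 1 / P(w_i | w_{i-1}))^(1/N); summing logs
        # avoids numeric underflow on long test sets.
        words = ["<s>"] + sentence.split() + ["</s>"]
        n = len(words) - 1  # number of predicted tokens
        log_p = sum(math.log(probs[(prev, cur)])
                    for prev, cur in zip(words, words[1:]))
        return math.exp(-log_p / n)

    print(bigram_perplexity("i want chinese food", bigram_p))  # ≈ 5.55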



The Shannon Game intuition for perplexity


• From Josh Goodman
• How hard is the task of recognizing digits ‘0,1,2,3,4,5,6,7,8,9’?
  • Perplexity = 10
• How hard is recognizing 30,000 names (e.g., at Microsoft)?
  • Perplexity = 30,000
• If a system has to recognize
  • Operator (1 in 4)
  • Sales (1 in 4)
  • Technical Support (1 in 4)
  • 30,000 names (1 in 120,000 each)
  • Perplexity is 53 (a quick arithmetic check follows this list)
• Perplexity is the weighted equivalent branching factor
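
One way to sanity-check the 53 (a quick sketch, not from the slide) is to compute the entropy H of a single decision and take 2^H, which matches the perplexity when each decision is an independent draw from this distribution:

    import math

    # Weighted branching factor for the Josh Goodman example above:
    # three commands at probability 1/4 each, plus 30,000 names at
    # 1/120,000 each (the probabilities sum to 1).
    probs = [1 / 4] * 3 + [1 / 120_000] * 30_000

    entropy = -sum(p * math.log2(p) for p in probs)  # per-decision entropy, in bits
    print(2 ** entropy)                              # ≈ 52.6, i.e. perplexity ≈ 53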

Perplexity as branching factor


• Suppose a sentence consists of random digits.
• What is the perplexity of this sentence according to a model that assigns
  P = 1/10 to each digit? (Worked answer below.)
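
A worked answer (not on the transcribed slide, but it follows directly from the definition above): each of the N digits has probability 1/10, so

    PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N}
          = ((1/10)^N)^{-1/N}
          = (1/10)^{-1}
          = 10

The perplexity equals the branching factor: at every position the model is effectively choosing uniformly among 10 digits.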

Lower perplexity = better model

• Trained on 38 million words and tested on 1.5 million words of WSJ text

N-gram order    Unigram   Bigram   Trigram
Perplexity          962      170       109
