
How to Use Probabilities

The Crash Course


Jason Eisner

Goals of this lecture

• Probability notation like p(X | Y):


– What does this expression mean?
– How can I manipulate it?
– How can I estimate its value in practice?
• Probability models:
– What is one?
– Can we build one for language ID?
– How do I know if my model is any good?



3 Kinds of Statistics

• descriptive: mean Hopkins SAT (or median)

• confirmatory: statistically significant?

• predictive: wanna bet?   ← this course – why?


Notation for Greenhorns

[diagram: a “probability model” maps the query about “Paul Revere” to the number 0.9]

p(Paul Revere wins | weather’s clear) = 0.9



What does that really mean?
p(Paul Revere wins | weather’s clear) = 0.9

• Past performance?
– Revere’s won 90% of races with clear weather
• Hypothetical performance?
– If he ran the race in many parallel universes …
• Subjective strength of belief?
– Would pay up to 90 cents for chance to win $1
• Output of some computable formula?
– Ok, but then which formulas should we trust?  p(X | Y) versus q(X | Y)
p is a function on event sets
p(win | clear) ≡ p(win, clear) / p(clear)

[Venn diagram: All Events (races), with an oval for “weather’s clear” overlapping an oval for “Paul Revere wins”]
p is a function on event sets
p(win | clear) ≡ p(win, clear) / p(clear)
– the left-hand side is syntactic sugar for the ratio on the right
– the comma denotes logical conjunction of predicates
– “clear” is a predicate selecting the races where the weather’s clear

p measures the total probability of a set of events.
[Venn diagram as above: “weather’s clear” and “Paul Revere wins” within All Events (races)]
Required Properties of p (axioms)

• p(∅) = 0        p(all events) = 1
• p(X) ≤ p(Y) for any X ⊆ Y
• p(X) + p(Y) = p(X ∪ Y) provided X ∩ Y = ∅
  e.g., p(win & clear) + p(win & ¬clear) = p(win)

[Venn diagram as above: p measures the total probability of a set of events]
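To make “p measures the total probability of a set of events” concrete, here is a minimal Python sketch; the ten races and their outcomes are made up purely for illustration, not taken from the slides.

```python
from fractions import Fraction

# Hypothetical toy universe of 10 equally likely past races (invented data).
races = [
    {"clear": True,  "win": True},  {"clear": True,  "win": True},
    {"clear": True,  "win": True},  {"clear": True,  "win": False},
    {"clear": False, "win": False}, {"clear": False, "win": True},
    {"clear": False, "win": False}, {"clear": True,  "win": True},
    {"clear": True,  "win": False}, {"clear": False, "win": False},
]

def p(predicate):
    """p of a predicate = total probability of the set of events satisfying it."""
    return Fraction(sum(1 for e in races if predicate(e)), len(races))

clear = lambda e: e["clear"]                       # predicate selecting races with clear weather
win_and_clear = lambda e: e["win"] and e["clear"]  # the comma = logical conjunction

# Axioms: p(empty set) = 0 and p(all events) = 1
assert p(lambda e: False) == 0 and p(lambda e: True) == 1

# Conditional probability as syntactic sugar: p(win | clear) = p(win, clear) / p(clear)
print(p(win_and_clear) / p(clear))   # 2/3 on this toy data
```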
Commas denote conjunction
p(Paul Revere wins, Valentine places, Epitaph
shows | weather’s clear)
what happens as we add conjuncts to left of bar ?
• probability can only decrease
• numerator of historical estimate likely to go to zero:
(# times Revere wins AND Val places … AND weather’s clear) / (# times weather’s clear)


Commas denote conjunction
p(Paul Revere wins, Valentine places, Epitaph
shows | weather’s clear)
p(Paul Revere wins | weather’s clear, ground is
dry, jockey getting over sprain, Epitaph also in race, Epitaph
was recently bought by Gonzalez, race is on May 17, … )
what happens as we add conjuncts to right of bar ?
• probability could increase or decrease
• probability gets more relevant to our case (less bias)
• probability estimate gets less reliable (more variance)
(# times Revere wins AND weather clear AND … it’s May 17) / (# times weather clear AND … it’s May 17)
Simplifying Right Side: Backing Off

p(Paul Revere wins | weather’s clear, ground is dry, jockey getting over sprain, Epitaph also in race, Epitaph was recently bought by Gonzalez, race is on May 17, … )
Back off by dropping most of the conditions to the right of the bar, e.g. down to p(Paul Revere wins | weather’s clear): not exactly what we want, but at least we can get a reasonable estimate of it! (i.e., more bias but less variance)
Try to keep the conditions that we suspect will have the most influence on whether Paul Revere wins.
Simplifying Left Side: Backing Off
p(Paul Revere wins, Valentine places, Epitaph shows | weather’s clear)
Crossing off conjuncts to the left of the bar is NOT ALLOWED!
But we can do something similar to help …


Factoring Left Side: The Chain Rule
p(Revere, Valentine, Epitaph | weather’s clear)          RVEW/W
  = p(Revere | Valentine, Epitaph, weather’s clear)      = RVEW/VEW
  * p(Valentine | Epitaph, weather’s clear)              * VEW/EW
  * p(Epitaph | weather’s clear)                         * EW/W
True because numerators cancel against denominators
Makes perfect sense when read from bottom to top
Moves material to right of bar so it can be ignored

If this prob is unchanged by backoff, we say Revere was CONDITIONALLY INDEPENDENT of Valentine and Epitaph (conditioned on the weather’s being clear). Often we just ASSUME conditional independence to get the nice product above.
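To spell out “numerators cancel against denominators”: reading RVEW as the number of races in which Revere wins, Valentine places, Epitaph shows, and the weather’s clear (and similarly for the shorter abbreviations), the product telescopes:

$$
\frac{RVEW}{VEW}\cdot\frac{VEW}{EW}\cdot\frac{EW}{W} \;=\; \frac{RVEW}{W},
$$

which is exactly the historical estimate of p(Revere, Valentine, Epitaph | weather’s clear).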
Remember Language ID?
• “Horses and Lukasiewicz are on the curriculum.”

• Is this English or Polish or what?


• We had some notion of using n-gram models …

• Is it “good” (= likely) English?


• Is it “good” (= likely) Polish?

• Space of events will not be races but character sequences (x1, x2, x3, …) where xn = EOS


Remember Language ID?

• Let p(X) = probability of text X in English


• Let q(X) = probability of text X in Polish
• Which probability is higher?
– (we’d also like bias toward English since it’s
more likely a priori – ignore that for now)

“Horses and Lukasiewicz are on the curriculum.”

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)



Apply the Chain Rule

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  = p(x1=h)                                  4470/52108
  * p(x2=o | x1=h)                           395/4470
  * p(x3=r | x1=h, x2=o)                     5/395
  * p(x4=s | x1=h, x2=o, x3=r)               3/5
  * p(x5=e | x1=h, x2=o, x3=r, x4=s)         3/3
  * p(x6=s | x1=h, x2=o, x3=r, x4=s, x5=e)   0/3
  * …  = 0
  (counts from Brown corpus)
Back Off On Right Side

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ p(x1=h)                  4470/52108
  * p(x2=o | x1=h)           395/4470
  * p(x3=r | x1=h, x2=o)     5/395
  * p(x4=s | x2=o, x3=r)     12/919
  * p(x5=e | x3=r, x4=s)     12/126
  * p(x6=s | x4=s, x5=e)     3/485
  * …  = 7.3e-10 * …
  (counts from Brown corpus)
Change the Notation

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ p(x1=h)                            4470/52108
  * p(x2=o | x1=h)                     395/4470
  * p(xi=r | xi-2=h, xi-1=o, i=3)      5/395
  * p(xi=s | xi-2=o, xi-1=r, i=4)      12/919
  * p(xi=e | xi-2=r, xi-1=s, i=5)      12/126
  * p(xi=s | xi-2=s, xi-1=e, i=6)      3/485
  * …  = 7.3e-10 * …
  (counts from Brown corpus)
Another Independence Assumption

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ p(x1=h)                       4470/52108
  * p(x2=o | x1=h)                395/4470
  * p(xi=r | xi-2=h, xi-1=o)      1417/14765
  * p(xi=s | xi-2=o, xi-1=r)      1573/26412
  * p(xi=e | xi-2=r, xi-1=s)      1610/12253
  * p(xi=s | xi-2=s, xi-1=e)      2044/21250
  * …  = 5.4e-7 * …
  (counts from Brown corpus)
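To double-check the arithmetic, a small Python sketch that just multiplies out the count ratios quoted above: first with the backed-off trigram conditioning that still fixes the position i, then after the extra independence assumption that the position doesn’t matter.

```python
from math import prod

# Brown-corpus count ratios quoted on the slides above.
with_position = [(4470, 52108), (395, 4470), (5, 395),
                 (12, 919), (12, 126), (3, 485)]            # conditions still mention i
without_position = [(4470, 52108), (395, 4470), (1417, 14765),
                    (1573, 26412), (1610, 12253), (2044, 21250)]

for name, ratios in [("with position i", with_position),
                     ("ignoring position i", without_position)]:
    print(name, "%.2e" % prod(n / d for n, d in ratios))
# prints 7.38e-10 and 5.48e-07 – the 7.3e-10 and 5.4e-7 quoted above, up to rounding
```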
Simplify the Notation

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ p(x1=h)           4470/52108
  * p(x2=o | x1=h)    395/4470
  * p(r | h, o)       1417/14765
  * p(s | o, r)       1573/26412
  * p(e | r, s)       1610/12253
  * p(s | s, e)       2044/21250
  * …
  (counts from Brown corpus)
Simplify the Notation

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ p(h | BOS, BOS)   4470/52108
  * p(o | BOS, h)     395/4470
  * p(r | h, o)       1417/14765
  * p(s | o, r)       1573/26412
  * p(e | r, s)       1610/12253
  * p(s | s, e)       2044/21250
  * …
  (counts from Brown corpus)
These are the parameters of our old trigram generator – the same assumptions about language. The numbers are the values of those parameters, as naively estimated from the Brown corpus. These basic probabilities are used to define p(horses).
Simplify the Notation

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ t BOS,BOS,h   4470/52108
  * t BOS,h,o     395/4470
  * t h,o,r       1417/14765
  * t o,r,s       1573/26412
  * t r,s,e       1610/12253
  * t s,e,s       2044/21250
  * …
  (counts from Brown corpus)
These are the parameters of our old trigram generator, with the same assumptions about language; the numbers are the values of those parameters, as naively estimated from the Brown corpus.
This notation emphasizes that they’re just real variables whose value must be estimated.
Definition: Probability Model

[diagram: parameter values plugged into the Trigram Model definition of p (defined in terms of parameters like t h,o,r and t o,r,s); the resulting model can generate random text and find event probabilities]
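A minimal Python sketch of what that diagram describes: an object holding parameter values (a table t of trigram parameters) plus the model definition, which can then find event probabilities and generate random text. The parameter table below is a made-up placeholder, not real Brown-corpus estimates, and it is assumed to cover every context it is asked about.

```python
import random

class TrigramModel:
    """A probability model: parameter values + a definition of p in terms of them."""

    def __init__(self, t):
        # t maps (x_{i-2}, x_{i-1}, x_i) to t_{x_{i-2}, x_{i-1}, x_i} = p(x_i | x_{i-2}, x_{i-1})
        self.t = t

    def prob(self, text):
        """Find the probability of an event: a character sequence padded with BOS/EOS."""
        chars = ["BOS", "BOS"] + list(text) + ["EOS"]
        result = 1.0
        for i in range(2, len(chars)):
            result *= self.t.get((chars[i - 2], chars[i - 1], chars[i]), 0.0)
        return result

    def generate(self):
        """Generate random text, sampling each character given the previous two."""
        prev2, prev1, out = "BOS", "BOS", []
        while True:
            nxt = {c: v for (a, b, c), v in self.t.items() if (a, b) == (prev2, prev1)}
            c = random.choices(list(nxt), weights=list(nxt.values()))[0]
            if c == "EOS":
                return "".join(out)
            out.append(c)
            prev2, prev1 = prev1, c

# Toy parameter values (placeholders): this model can only say "hi".
t = {("BOS", "BOS", "h"): 1.0, ("BOS", "h", "i"): 1.0, ("h", "i", "EOS"): 1.0}
model = TrigramModel(t)
print(model.prob("hi"), model.generate())   # 1.0 hi
```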
English vs. Polish

[diagram: the same Trigram Model definition (in terms of parameters like t h,o,r and t o,r,s), combined with English parameter values, gives the definition of p, used to compute p(X); combined with Polish parameter values, it gives the definition of q, used to compute q(X)]
What is “X” in p(X)?
• Element of some implicit “event space”
  • e.g., race
  • e.g., sentence
• What if event is a whole text?
  • p(text)
    = p(sentence 1, sentence 2, …)
    = p(sentence 1)
    * p(sentence 2 | sentence 1)
    * …
What is “X” in “p(X)”?
• Element of some implicit “event space”
• e.g., race, sentence, text …
• Suppose an event is a sequence of letters:
p(horses)

• But we rewrote p(horses) as
  p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  p(x1=h) * p(x2=o | x1=h) * …
• What does this variable=value notation mean?


Random Variables:
What is “variable” in “p(variable=value)”?
Answer: variable is really a function of Event
• p(x1=h) * p(x2=o | x1=h) * …
• Event is a sequence of letters
• x2 is the second letter in the sequence
• p(number of heads=2) or just p(H=2)
• Event is a sequence of 3 coin flips
• H is the number of heads
• p(weather’s clear=true) or just p(weather’s clear)
  • Event is a race
  • weather’s clear is true or false
Random Variables:
What is “variable” in “p(variable=value)”?
Answer: variable is really a function of Event
• p(x1=h) * p(x2=o | x1=h) * …
• Event is a sequence of letters
• x2(Event) is the second letter in the sequence
• p(number of heads=2) or just p(H=2)
• Event is a sequence of 3 coin flips
• H(Event) is the number of heads
• p(weather’s clear=true) or just p(weather’s clear)
  • Event is a race
  • weather’s clear (Event) is true or false
Random Variables:
What is “variable” in “p(variable=value)”?

• p(number of heads=2) or just p(H=2)


• Event is a sequence of 3 coin flips
• H is the number of heads in the event
• So p(H=2)
= p(H(Event)=2) picks out the set of events with 2 heads
= p({HHT,HTH,THH})
= p(HHT)+p(HTH)+p(THH)
(All Events: TTT, TTH, THT, THH, HTT, HTH, HHT, HHH)
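The same computation in a few lines of Python, treating each equally likely 3-flip sequence as an event and H as a function of the event:

```python
from itertools import product

events = ["".join(flips) for flips in product("HT", repeat=3)]  # the 8 events, equally likely
H = lambda event: event.count("H")                              # H is a function of the event

# p(H=2) picks out the set of events with exactly 2 heads: {HHT, HTH, THH}
print(sum(1 for e in events if H(e) == 2) / len(events))        # 3/8 = 0.375
```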


Random Variables:
What is “variable” in “p(variable=value)”?

• p(weather’s clear)
• Event is a race
• weather’s clear is true or false of the event

• So p(weather’s clear)
= p(weather’s clear(Event)=true)
picks out the set of events with clear weather
[Venn diagram as before: “weather’s clear” and “Paul Revere wins” within All Events (races)]
p(win | clear) ≡ p(win, clear) / p(clear)
Random Variables:
What is “variable” in “p(variable=value)”?

• p(x1=h) * p(x2=o | x1=h) * …


• Event is a sequence of letters
• x2 is the second letter in the sequence
• So p(x2=o)
= p(x2(Event)=o) picks out the set of events with …
= Σ p(Event) over all events whose second letter …
= p(horses) + p(boffo) + p(xoyzkklp) + …



Back to trigram model of p(horses)

p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)
  ≈ t BOS,BOS,h   4470/52108
  * t BOS,h,o     395/4470
  * t h,o,r       1417/14765
  * t o,r,s       1573/26412
  * t r,s,e       1610/12253
  * t s,e,s       2044/21250
  * …
  (counts from Brown corpus)
These are the parameters of our old trigram generator, with the same assumptions about language; the numbers are the values of those parameters, as naively estimated from the Brown corpus.
This notation emphasizes that they’re just real variables whose value must be estimated.
A Different Model
• Exploit fact that horses is a common word

p(W1 = horses)
where word vector W is a function of the event (the sentence) just as
character vector X is.
= p(Wi = horses | i=1)
≈ p(Wi = horses) = 7.2e-5
independence assumption says that sentence-initial words w1 are just like
all other words wi (gives us more data to use)

Much larger than previous estimate of 5.4e-7 – why?


Advantages, disadvantages?



Improving the New Model:
Weaken the Indep. Assumption
• Don’t totally cross off i=1 since it’s not irrelevant:
– Yes, horses is common, but less so at start of sentence since most
sentences start with determiners.
p(W1 = horses) = Σt p(W1=horses, T1=t)
  = Σt p(W1=horses | T1=t) * p(T1=t)
  = Σt p(Wi=horses | Ti=t, i=1) * p(T1=t)
  ≈ Σt p(Wi=horses | Ti=t) * p(T1=t)
  = p(Wi=horses | Ti=PlNoun) * p(T1=PlNoun)
    (if first factor is 0 for any other part of speech)
  ≈ (72 / 55912) * (977 / 52108)
  = 2.4e-5
Which Model is Better?

• Model 1 – predict each letter Xi from previous 2 letters Xi-2, Xi-1
• Model 2 – predict each word Wi by its part
of speech Ti, having predicted Ti from i

• Models make different independence assumptions that reflect different intuitions
• Which intuition is better???
Measure Performance!
• Which model does better on language ID?
– Administer test where you know the right answers
– Seal up test data until the test happens
• Simulates real-world conditions where new data comes along that
you didn’t have access to when choosing or training model
– In practice, split off a test set as soon as you obtain the
data, and never look at it
– Need enough test data to get statistical significance
• For a different task (e.g., speech transcription instead
of language ID), use that task to evaluate the models

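A minimal sketch of the “seal up the test data” discipline; the helper below is hypothetical, shown only to illustrate splitting once (with a fixed seed) as soon as the data arrive and touching the held-out part only at evaluation time.

```python
import random

def split_off_test_set(examples, test_fraction=0.1, seed=0):
    """Split once, up front; train/tune on the first part, evaluate on the second at the very end."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training data, sealed test data)
```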


Bayes’ Theorem
• p(A | B) = p(B | A) * p(A) / p(B)

• Easy to check by removing syntactic sugar


• Use 1: Converts p(B | A) to p(A | B)
• Use 2: Updates p(A) to p(A | B)

• Stare at it so you’ll recognize it later

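The check by removing the syntactic sugar, i.e. using the definition p(A | B) ≡ p(A, B) / p(B):

$$
p(A \mid B) \;=\; \frac{p(A,B)}{p(B)} \;=\; \frac{p(A,B)}{p(A)}\cdot\frac{p(A)}{p(B)} \;=\; \frac{p(B \mid A)\,p(A)}{p(B)} .
$$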


Language ID
• Given a sentence x, I suggested comparing its prob in
different languages:
– p(SENT=x | LANG=english)   (i.e., p_english(SENT=x))
– p(SENT=x | LANG=polish)    (i.e., p_polish(SENT=x))
– p(SENT=x | LANG=xhosa)     (i.e., p_xhosa(SENT=x))

• But surely for language ID we should compare


– p(LANG=english | SENT=x)
– p(LANG=polish | SENT=x)
– p(LANG=xhosa | SENT=x)



Language ID
• For language ID we should compare
– p(LANG=english | SENT=x)
– p(LANG=polish | SENT=x) a posteriori
– p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare
– p(LANG=english, SENT=x)
– p(LANG=polish, SENT=x)
– p(LANG=xhosa, SENT=x)
(the sum of these is a way to find p(SENT=x); we can divide back by that to get the posterior probs)
• Must know prior probabilities; then rewrite as
– p(LANG=english) * p(SENT=x | LANG=english)
– p(LANG=polish) * p(SENT=x | LANG=polish)
– p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
(a priori) × (likelihood – what we had before)
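A hedged Python sketch of this comparison; the priors and likelihoods below are invented stand-ins, whereas in the lecture’s setup the likelihoods p(SENT=x | LANG) would come from per-language trigram models such as p and q above.

```python
def language_id(sentence, likelihoods, priors):
    """Compare p(LANG) * p(SENT=x | LANG) across languages; return the winner and posteriors."""
    joint = {lang: priors[lang] * likelihoods[lang](sentence) for lang in likelihoods}
    p_sentence = sum(joint.values())                      # summing the joints gives p(SENT=x)
    posterior = {lang: j / p_sentence for lang, j in joint.items()}
    return max(posterior, key=posterior.get), posterior

# Invented numbers, for illustration only:
likelihoods = {"english": lambda x: 5.4e-7, "polish": lambda x: 1.0e-9}
priors = {"english": 0.7, "polish": 0.3}
print(language_id("Horses and Lukasiewicz are on the curriculum.", likelihoods, priors))
# ('english', {'english': 0.999..., 'polish': 0.000...})
```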
General Case (“noisy channel”)
“noisy channel”:  a → [mess up a into b] → b,  with p(A=a) and p(B=b | A=a)
  examples:  language → text,  text → speech,  spelled → misspelled,  English → French
“decoder”: most likely reconstruction of a:
  maximize p(A=a | B=b) = p(A=a) p(B=b | A=a) / p(B=b)
                        = p(A=a) p(B=b | A=a) / Σa' p(A=a') p(B=b | A=a')
Language ID
• For language ID we should compare
– p(LANG=english | SENT=x)
– p(LANG=polish | SENT=x) a posteriori
– p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare
– p(LANG=english, SENT=x)
– p(LANG=polish, SENT=x)
– p(LANG=xhosa, SENT=x)
• Must know prior probabilities; then rewrite as
– p(LANG=english) * p(SENT=x | LANG=english)
– p(LANG=polish) * p(SENT=x | LANG=polish)
– p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
(a priori) × (likelihood)
General Case (“noisy channel”)
• Want most likely A to have generated evidence B
– p(A = a1 | B = b)
– p(A = a2 | B = b) a posteriori
– p(A = a3 | B = b)
• For ease, multiply by p(B=b) and compare
– p(A = a1, B = b)
– p(A = a2, B = b)
– p(A = a3, B = b)
• Must know prior probabilities; then rewrite as
– p(A = a1) * p(B = b | A = a1)
– p(A = a2) * p(B = b | A = a2)
– p(A = a3) * p(B = b | A = a3)
(a priori) × (likelihood)
Speech Recognition
• For baby speech recognition we should compare
– p(MEANING=gimme | SOUND=uhh)
– p(MEANING=changeme | SOUND=uhh) a posteriori
– p(MEANING=loveme | SOUND=uhh)
• For ease, multiply by p(SOUND=uhh) & compare
– p(MEANING=gimme, SOUND=uhh)
– p(MEANING=changeme, SOUND=uhh)
– p(MEANING=loveme, SOUND=uhh)
• Must know prior probabilities; then rewrite as
– p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
– p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
– p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)
(a priori) × (likelihood)
Life or Death!
• p(diseased) = 0.001, so p(¬diseased) = 0.999
• p(positive test | ¬diseased) = 0.05   “false pos”
• p(negative test | diseased) = x ≈ 0   “false neg”
  so p(positive test | diseased) = 1−x ≈ 1

• What is p(diseased | positive test)?


– don’t panic – still very small:  at most about 1/51, whatever x is

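Working the last slide out with Bayes’ theorem, in the worst case x = 0 (so p(positive | diseased) = 1):

$$
p(\text{diseased}\mid\text{positive})
= \frac{p(\text{positive}\mid\text{diseased})\,p(\text{diseased})}
       {p(\text{positive}\mid\text{diseased})\,p(\text{diseased}) + p(\text{positive}\mid\neg\text{diseased})\,p(\neg\text{diseased})}
= \frac{1\cdot 0.001}{1\cdot 0.001 + 0.05\cdot 0.999} \approx 0.0196,
$$

about 1 in 51: the positive test raises the probability of disease from 0.1% to roughly 2%, which is why there is no need to panic.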
