Maximum Likelihood Learning of Gaussians For Data Mining

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]
412-268-7599

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.
Why we should care
• Maximum Likelihood Estimation is a very very very very fundamental part of data analysis.
• "MLE for Gaussians" is training wheels for our future techniques.
• Learning Gaussians is more useful than you might guess…
Learning Gaussians from Data
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, σ²)
• But you don't know µ (you do know σ²)

Despite this, we'll spend 95% of our time on MLE. Why? Wait and see…
MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, σ²)
• But you don't know µ (you do know σ²)
• MLE: For which µ is x1, x2, … xR most likely?

Algebra Euphoria

$$\mu^{\text{mle}} = \arg\max_{\mu}\, p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)$$
$$= \;?\quad \text{(by i.i.d.)}$$
$$= \;?\quad \text{(monotonicity of log)}$$
$$= \;?\quad \text{(plug in formula for Gaussian)}$$
$$= \;?\quad \text{(after simplification)}$$
Algebra Euphoria

$$\mu^{\text{mle}} = \arg\max_{\mu}\, p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)$$
$$= \arg\max_{\mu} \prod_{i=1}^{R} p(x_i \mid \mu, \sigma^2) \quad \text{(by i.i.d.)}$$
$$= \arg\max_{\mu} \sum_{i=1}^{R} \log p(x_i \mid \mu, \sigma^2) \quad \text{(monotonicity of log)}$$
$$= \arg\max_{\mu} \sum_{i=1}^{R} \left( \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(x_i - \mu)^2}{2\sigma^2} \right) \quad \text{(plug in formula for Gaussian)}$$
$$= \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2 \quad \text{(after simplification)}$$
The MLE µ

$$\mu^{\text{mle}} = \arg\max_{\mu}\, p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2) = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2$$
$$= \text{the } \mu \text{ such that } 0 = \frac{\partial LL}{\partial \mu} = \;?\quad \text{(what?)}$$
The MLE µ

$$\mu^{\text{mle}} = \arg\max_{\mu}\, p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2) = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2$$
$$= \text{the } \mu \text{ such that } 0 = \frac{\partial LL}{\partial \mu} = \frac{\partial}{\partial \mu} \sum_{i=1}^{R} (x_i - \mu)^2 = -\sum_{i=1}^{R} 2(x_i - \mu)$$

Thus
$$\mu = \frac{1}{R} \sum_{i=1}^{R} x_i$$
Lawks-a-lawdy!

$$\mu^{\text{mle}} = \frac{1}{R} \sum_{i=1}^{R} x_i$$
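As a quick numerical check of the result above (not part of the original slides): a minimal sketch, assuming numpy is available and using synthetic data, that compares the closed-form sample mean with a brute-force grid search over the log-likelihood.

```python
# Minimal check: the sample mean should maximize the Gaussian log-likelihood
# over mu when sigma^2 is known (toy data; numpy assumed available).
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma2, R = 3.0, 2.0, 1000
x = rng.normal(mu_true, np.sqrt(sigma2), size=R)

def log_likelihood(mu):
    """Gaussian log-likelihood of the sample x at candidate mean mu (sigma^2 known)."""
    return -0.5 * x.size * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

mu_mle = x.mean()                                      # closed-form MLE: the sample mean
grid = np.linspace(mu_true - 1.0, mu_true + 1.0, 2001)
mu_grid = grid[np.argmax([log_likelihood(m) for m in grid])]

print(mu_mle, mu_grid)                                 # agree to within the grid spacing
```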
A General MLE strategy

Suppose θ = (θ1, θ2, …, θn)^T is a vector of parameters.
Task: Find the MLE θ assuming a known form for p(Data | θ, stuff).

1. Write LL = log P(Data | θ, stuff)
2. Work out ∂LL/∂θ using high-school calculus
3. Solve the set of simultaneous equations
$$\frac{\partial LL}{\partial \theta_1} = 0, \quad \frac{\partial LL}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial LL}{\partial \theta_n} = 0$$
4. Check that you're at a maximum
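A minimal sketch (not from the slides) of the recipe followed mechanically, assuming sympy is available and using a tiny symbolic sample: the computer does the "high-school calculus" for the Gaussian mean with σ² known.

```python
# Follow the recipe symbolically: write LL, differentiate, solve dLL/dmu = 0.
import sympy as sp

mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)
xs = sp.symbols('x1:5')        # a tiny symbolic sample x1, x2, x3, x4 (R = 4)

# Step 1: LL = log P(Data | mu, sigma^2) for i.i.d. Gaussian observations
LL = sum(sp.log(1 / (sp.sqrt(2 * sp.pi) * sigma)) - (x - mu)**2 / (2 * sigma**2) for x in xs)

# Steps 2-3: differentiate and solve the single equation dLL/dmu = 0
mu_mle = sp.solve(sp.diff(LL, mu), mu)[0]
print(sp.simplify(mu_mle))     # (x1 + x2 + x3 + x4)/4 -- the sample mean
```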
A General MLE strategy

If you can't solve the simultaneous equations ∂LL/∂θi = 0 in closed form, what should you do?
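The slide leaves that question hanging; one common answer (not given on this slide) is to maximize LL numerically. A minimal sketch, assuming numpy and scipy are available, using toy data and optimizing the negative log-likelihood:

```python
# If the stationarity equations have no closed form, maximize LL numerically.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.random.default_rng(4).normal(5.0, 2.0, size=500)   # toy data

def neg_log_likelihood(theta):
    mu, log_sigma = theta                     # optimize log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                      # close to the sample mean and MLE std dev
```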
For the univariate Gaussian with θ = (µ, σ²), step 2 of the recipe gives:

$$\frac{\partial LL}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{R} (x_i - \mu)$$
$$\frac{\partial LL}{\partial \sigma^2} = -\frac{R}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{R} (x_i - \mu)^2$$
MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, σ²)
• But you don't know µ or σ²
• MLE: For which θ = (µ, σ²) is x1, x2, … xR most likely?

$$\log p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2) = -\frac{R}{2}\log(2\pi) - \frac{R}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{R}(x_i - \mu)^2$$

Setting both partial derivatives to zero:

$$0 = \frac{1}{\sigma^2}\sum_{i=1}^{R}(x_i - \mu) \;\Rightarrow\; \mu = \frac{1}{R}\sum_{i=1}^{R} x_i$$
$$0 = -\frac{R}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{R}(x_i - \mu)^2 \;\Rightarrow\; \text{what?}$$
MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, σ²)
• But you don't know µ or σ²
• MLE: For which θ = (µ, σ²) is x1, x2, … xR most likely?

$$\mu^{\text{mle}} = \frac{1}{R}\sum_{i=1}^{R} x_i$$
$$\sigma^2_{\text{mle}} = \frac{1}{R}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2$$
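A minimal sketch (not from the slides) of the two closed-form MLEs above, assuming numpy and a small hypothetical sample; note that numpy's default variance (ddof=0) is exactly the MLE form.

```python
# Compute the univariate Gaussian MLEs from data.
import numpy as np

x = np.array([2.1, 3.4, 1.9, 4.0, 2.7])    # hypothetical sample x1..xR

mu_mle = x.mean()                           # (1/R) * sum(x_i)
sigma2_mle = np.mean((x - mu_mle) ** 2)     # (1/R) * sum((x_i - mu_mle)^2)

print(mu_mle, sigma2_mle, np.var(x))        # np.var(x) (ddof=0) matches sigma2_mle
```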
Unbiased Estimators
• An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter.
• If x1, x2, … xR ~ (i.i.d.) N(µ, σ²) then
$$E[\mu^{\text{mle}}] = E\!\left[\frac{1}{R}\sum_{i=1}^{R} x_i\right] = \mu$$
so µ^mle is unbiased.
Biased Estimators
• An estimator of a parameter is biased if the expected value of the estimate differs from the true value of the parameter.
• If x1, x2, … xR ~ (i.i.d.) N(µ, σ²) then
$$E[\sigma^2_{\text{mle}}] = E\!\left[\frac{1}{R}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2\right] = E\!\left[\frac{1}{R}\sum_{i=1}^{R}\Bigl(x_i - \frac{1}{R}\sum_{j=1}^{R} x_j\Bigr)^{\!2}\right] = \Bigl(1 - \frac{1}{R}\Bigr)\sigma^2 \neq \sigma^2$$
so σ²_mle is biased.
Unbiased estimate of Variance
• If x1, x2, … xR ~ (i.i.d.) N(µ, σ²) then
$$E[\sigma^2_{\text{mle}}] = \Bigl(1 - \frac{1}{R}\Bigr)\sigma^2 \neq \sigma^2$$
So define
$$\sigma^2_{\text{unbiased}} = \frac{\sigma^2_{\text{mle}}}{1 - \frac{1}{R}} = \frac{1}{R-1}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2$$
so that
$$E[\sigma^2_{\text{unbiased}}] = \sigma^2$$
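A minimal simulation sketch (not from the slides), assuming numpy and using synthetic samples, that illustrates the bias factor (1 − 1/R) empirically by averaging both estimators over many trials.

```python
# Average the MLE and unbiased variance estimators over many synthetic samples.
import numpy as np

rng = np.random.default_rng(1)
R, sigma2_true, trials = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, R))
sigma2_mle = samples.var(axis=1, ddof=0)     # biased MLE: divide by R
sigma2_unb = samples.var(axis=1, ddof=1)     # unbiased estimator: divide by R - 1

print(sigma2_mle.mean())    # ~ (1 - 1/R) * sigma2_true = 3.2
print(sigma2_unb.mean())    # ~ sigma2_true = 4.0
```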
Unbiaseditude discussion
• Which is best?

$$\sigma^2_{\text{mle}} = \frac{1}{R}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2$$
$$\sigma^2_{\text{unbiased}} = \frac{1}{R-1}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2$$

Answer:
• It depends on the task
• And it doesn't make much difference once R gets large

$$\mu^{\text{suboptimal}} = \frac{1}{R+7}\sum_{i=1}^{R} x_i$$
MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• But you don't know µ or Σ
• MLE: For which θ = (µ, Σ) is x1, x2, … xR most likely?

$$\mu^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k$$
$$\Sigma^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (\mathbf{x}_k - \mu^{\text{mle}})(\mathbf{x}_k - \mu^{\text{mle}})^T$$

Componentwise, for 1 ≤ i ≤ m:
$$\mu_i^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} x_{ki}$$
where x_ki is the value of the ith component of x_k (the ith attribute of the kth record) and µ_i^mle is the ith component of µ^mle.
MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• But you don't know µ or Σ
• MLE: For which θ = (µ, Σ) is x1, x2, … xR most likely?

Elementwise, for 1 ≤ i ≤ m and 1 ≤ j ≤ m (x_ki is the value of the ith attribute of the kth record):
$$\sigma_{ij}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (x_{ki} - \mu_i^{\text{mle}})(x_{kj} - \mu_j^{\text{mle}})$$

And, as in the univariate case, there is an unbiased version:
$$\Sigma^{\text{unbiased}} = \frac{\Sigma^{\text{mle}}}{1 - \frac{1}{R}} = \frac{1}{R-1}\sum_{k=1}^{R} (\mathbf{x}_k - \mu^{\text{mle}})(\mathbf{x}_k - \mu^{\text{mle}})^T$$
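A minimal sketch (not from the slides) of the multivariate MLEs above, assuming numpy and toy data; numpy's cov with bias=True divides by R (the MLE), bias=False by R−1 (the unbiased version).

```python
# Compute the m-dimensional Gaussian MLEs from a data matrix of R records.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(392, 3))               # toy data: R=392 records, m=3 attributes

mu_mle = X.mean(axis=0)                     # (1/R) * sum_k x_k
centered = X - mu_mle
Sigma_mle = centered.T @ centered / len(X)  # (1/R) * sum_k (x_k - mu)(x_k - mu)^T

print(np.allclose(Sigma_mle, np.cov(X, rowvar=False, bias=True)))   # True
```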
Confidence intervals
We need to talk.

Structural error
Actually, we need to talk about something else too.
What if we do all this analysis when the true distribution is in fact not Gaussian?
• How can we tell? *
• How can we survive? *
* Will be discussed in future Andrew lectures… just before we need this technology.
Gaussian MLE in action
Using R = 392 cars from the "MPG" UCI dataset supplied by Ross Quinlan.

Bivariate MLE in action

Multivariate MLE
Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• But you don't know µ or Σ
• MAP: Which (µ, Σ) maximizes p(µ, Σ | x1, x2, … xR)?

Step 1: Put a prior on (µ, Σ)
Step 1a: Put a prior on Σ:
$$(\nu_0 - m - 1)\,\Sigma \sim IW(\nu_0,\; (\nu_0 - m - 1)\,\Sigma_0)$$
This thing is called the Inverse-Wishart distribution: a PDF over SPD (symmetric positive-definite) matrices!

Σ0: (roughly) my best guess of Σ, so that E[Σ] = Σ0
ν0 small: "I am not sure about my guess of Σ0"
ν0 large: "I'm pretty sure about my guess of Σ0"
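A minimal sketch (not from the slides) of an Inverse-Wishart prior over covariance matrices, assuming scipy is available. Scipy's parameterization differs from the slide's; here the scale is chosen so that the prior mean of Σ equals Σ0, mirroring E[Σ] = Σ0 above.

```python
# Draw covariance matrices from an Inverse-Wishart prior and check its mean.
import numpy as np
from scipy.stats import invwishart

m = 2
nu0 = 10.0                              # larger nu0 -> more confidence in the guess Sigma0
Sigma0 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])

prior = invwishart(df=nu0, scale=(nu0 - m - 1) * Sigma0)
draws = prior.rvs(size=50_000)          # each sample is an SPD matrix
print(draws.mean(axis=0))               # approximately Sigma0
```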
Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• But you don't know µ or Σ
• MAP: Which (µ, Σ) maximizes p(µ, Σ | x1, x2, … xR)?

Step 1b: Put a prior on µ given Σ:  µ | Σ ~ N(µ0, Σ / κ0)
κ0 small: "I am not sure about my guess of µ0"
κ0 large: "I'm pretty sure about my guess of µ0"
Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• But you don't know µ or Σ
• MAP: Which (µ, Σ) maximizes p(µ, Σ | x1, x2, … xR)?

Why do we use this form of prior? Actually, we don't have to.

Step 1: Put a prior on (µ, Σ) (Steps 1a and 1b above).
Step 2:
$$(\nu_R + m - 1)\,\Sigma_R = (\nu_0 + m - 1)\,\Sigma_0 + \sum_{k=1}^{R} (\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})^T + \frac{(\bar{\mathbf{x}} - \mu_0)(\bar{\mathbf{x}} - \mu_0)^T}{1/\kappa_0 + 1/R}$$
Step 3: Posterior:
$$(\nu_R + m - 1)\,\Sigma \sim IW(\nu_R,\; (\nu_R + m - 1)\,\Sigma_R), \qquad \mu \mid \Sigma \sim N(\mu_R,\; \Sigma / \kappa_R)$$
Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(µ, Σ)
• MAP: Which (µ, Σ) maximizes p(µ, Σ | x1, x2, … xR)?

Look carefully at what these formulae are doing. It's all very sensible.
• Conjugate priors: prior form and posterior form are the same, and are characterized by "sufficient statistics" of the data.
• The marginal distribution on µ is a student-t.
• One point of view: it's pretty academic if R > 30.

Step 1: Prior:
$$(\nu_0 - m - 1)\,\Sigma \sim IW(\nu_0,\; (\nu_0 - m - 1)\,\Sigma_0), \qquad \mu \mid \Sigma \sim N(\mu_0,\; \Sigma / \kappa_0)$$

Step 2:
$$\bar{\mathbf{x}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k, \qquad \mu_R = \frac{\kappa_0 \mu_0 + R\,\bar{\mathbf{x}}}{\kappa_0 + R}, \qquad \nu_R = \nu_0 + R, \qquad \kappa_R = \kappa_0 + R$$
$$(\nu_R + m - 1)\,\Sigma_R = (\nu_0 + m - 1)\,\Sigma_0 + \sum_{k=1}^{R} (\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})^T + \frac{(\bar{\mathbf{x}} - \mu_0)(\bar{\mathbf{x}} - \mu_0)^T}{1/\kappa_0 + 1/R}$$

Step 3: Posterior:
$$(\nu_R + m - 1)\,\Sigma \sim IW(\nu_R,\; (\nu_R + m - 1)\,\Sigma_R), \qquad \mu \mid \Sigma \sim N(\mu_R,\; \Sigma / \kappa_R)$$
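A minimal sketch (not from the slides) that turns the Step 2 update formulas above into code, assuming numpy; the helper name `niw_posterior` and the toy hyperparameters are hypothetical choices for illustration.

```python
# Apply the Step 2 posterior-update formulas from the slide to a data matrix X.
import numpy as np

def niw_posterior(X, mu0, kappa0, nu0, Sigma0):
    """Update the prior (mu0, kappa0, nu0, Sigma0) with data X of shape (R, m)."""
    R, m = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)                     # scatter about the sample mean
    mu_R = (kappa0 * mu0 + R * xbar) / (kappa0 + R)
    kappa_R = kappa0 + R
    nu_R = nu0 + R
    d = (xbar - mu0).reshape(-1, 1)
    Sigma_R = ((nu0 + m - 1) * Sigma0 + S + (d @ d.T) / (1/kappa0 + 1/R)) / (nu_R + m - 1)
    return mu_R, kappa_R, nu_R, Sigma_R

# Toy usage with a weak prior centered at zero mean and identity covariance
X = np.random.default_rng(3).normal(size=(50, 2))
print(niw_posterior(X, mu0=np.zeros(2), kappa0=1.0, nu0=4.0, Sigma0=np.eye(2)))
```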
Where we're at
• Classifier (Inputs → predict category): Joint BC, Naïve BC, Dec Tree
• Density Estimator (Inputs → probability): Joint DE, Naïve DE, Gauss DE
• Regressor (Inputs → predict real number)
What you should know
• The Recipe for MLE
• Why do we sometimes prefer MLE to MAP?
• Understand MLE estimation of Gaussian parameters
• Understand "biased estimator" versus "unbiased estimator"
• Appreciate the outline behind Bayesian estimation of Gaussian parameters

Useful exercise
• We've already done some MLE in this class without even telling you!
• Suppose categorical arity-n inputs x1, x2, … xR ~ (i.i.d.) from a multinomial M(p1, p2, … pn), where P(xk = j | p) = pj
• What is the MLE p = (p1, p2, … pn)?