Unit 04 - Maximum Likelihood Estimation - 1 Per Page
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
MLE: an improved estimation approach
• In Unit 3, we learned one way to construct
estimators: Method of Moments
• This is not necessarily the best way to find
estimators. In fact, it is rarely used in practice, but it
is so simple that it makes a good teaching tool for
introducing estimation methods.
• A more efficient and widely-used approach:
Maximum Likelihood Estimation (MLE).
• This approach essentially chooses the value for the
parameter, θ, that maximizes the likelihood of seeing
the sample data that is collected.
Likelihood Function
• Again, what is inference?
• It is making statements about parameter(s), θ, given a
sample of data, X1, X2, …, Xn.
• It sure would be nice to have a function of θ given
the sample data in order to help us make these
inferential statements.
• That is exactly what the likelihood function is doing:
lik(\theta) = f(X_1, X_2, \ldots, X_n \mid \theta)
\hat{\theta} = \arg\max_{\theta}\; l(\theta)
[Figure: plot of a likelihood curve with its maximum, \hat{\theta}, marked]
http://www.youtube.com/watch?v=JGS90HEbP5U
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
MLE Example 1: Poisson distribution
• Suppose we have i.i.d. Xi ~ Pois(λ).
• What is the likelihood function?
lik(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}
• What is the log-likelihood function?
l(\lambda) = \log \prod_{i=1}^{n} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}
= \sum_{i=1}^{n}\left[ X_i \log(\lambda) - \log(X_i!) - \lambda \right]
= \log(\lambda) \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \log(X_i!) - n\lambda
MLE Example 1: Poisson dist. (cont.)
• What is the maximum likelihood estimator for λ?
• First, differentiate the log-likelihood to get l'(λ):
l'(\lambda) = \frac{d}{d\lambda}\left[ \log(\lambda) \sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\log(X_i!) - n\lambda \right]
= \frac{1}{\lambda}\sum_{i=1}^{n} X_i - n
• Then set it to zero, lʹ(λ) = 0, and solve for λ.
l'(\lambda) = \frac{1}{\lambda}\sum_{i=1}^{n} X_i - n = 0
\quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}
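• As a quick numerical check, a minimal R sketch that maximizes the Poisson log-likelihood directly and compares the result with the closed-form answer \bar{X}. The simulated data and the function name pois.loglik are illustrative assumptions, not part of the lecture code.
# Sketch: maximize the Poisson log-likelihood numerically and compare
# with the closed-form MLE, lambda-hat = mean(x).
set.seed(1)
x = rpois(200, lambda=3)                 # simulated data (assumption)
pois.loglik = function(lambda, x){
  sum(x)*log(lambda) - sum(lgamma(x+1)) - length(x)*lambda
}
opt = optimize(pois.loglik, interval=c(0.01, 20), x=x, maximum=TRUE)
opt$maximum   # numerical maximizer
mean(x)       # closed-form MLE; the two should agree closely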
MLE Example 2: Normal distribution
• Suppose Xi ~ N(μ, σ2) and i.i.d.
• What is the likelihood function?
lik(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(X_i - \mu)^2}{2\sigma^2} \right)
• What is the log-likelihood function?
l(\mu, \sigma^2) = \sum_{i=1}^{n} \log\!\left[ \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(X_i - \mu)^2}{2\sigma^2} \right) \right]
= -\sum_{i=1}^{n} \log(\sigma) - \sum_{i=1}^{n} \log\sqrt{2\pi} - \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{2\sigma^2}
= -n\log(\sigma) - n\log\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (X_i - \mu)^2
MLE Example 2: Normal dist. (cont.)
• Now there are two unknown parameters so we will
need to find the separate partial derivatives:
\frac{\partial l(\mu,\sigma^2)}{\partial \mu}
= \frac{\partial}{\partial \mu}\left[ -n\log(\sigma) - n\log\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 \right]
= \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)

\frac{\partial l(\mu,\sigma^2)}{\partial \sigma^2}
= \frac{\partial}{\partial \sigma^2}\left[ -\frac{n}{2}\log(\sigma^2) - n\log\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 \right]
= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(X_i-\mu)^2
MLE Example 2: Normal dist. (cont.)
• Set the separate partial derivatives to zero and solve for
the specific parameter:
\frac{\partial l(\mu,\sigma^2)}{\partial \mu} = \frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}(X_i - \hat{\mu}) = 0
\quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}

\frac{\partial l(\mu,\sigma^2)}{\partial \sigma^2} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = 0
\quad\Longrightarrow\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})^2
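• As a side check, a small R sketch comparing these closed-form MLEs with R's var(), which divides by n − 1 rather than n. The simulated data are an illustrative assumption.
# Sketch: Normal MLEs on simulated data.
set.seed(2)
x = rnorm(100, mean=5, sd=2)
mu.hat     = mean(x)                      # MLE of mu
sigma2.hat = mean((x - mu.hat)^2)         # MLE of sigma^2 (divides by n)
mu.hat
sigma2.hat
var(x)*(length(x)-1)/length(x)            # rescaled var() matches sigma2.hat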
MLE Example 3: Gamma distribution
• Suppose Xi ~ Gamma(a, λ) and i.i.d.
• What is the likelihood function?
lik(a, \lambda) = \prod_{i=1}^{n} \frac{\lambda^{a}}{\Gamma(a)}\, X_i^{\,a-1} e^{-\lambda X_i}
• What is the log-likelihood function?
l(a, \lambda) = \log \prod_{i=1}^{n} \frac{\lambda^{a}}{\Gamma(a)}\, X_i^{\,a-1} e^{-\lambda X_i}
= \sum_{i=1}^{n} a\log(\lambda) - \sum_{i=1}^{n} \log\Gamma(a) + (a-1)\sum_{i=1}^{n}\log X_i - \lambda\sum_{i=1}^{n} X_i
= na\log(\lambda) - n\log\Gamma(a) + (a-1)\sum_{i=1}^{n}\log X_i - \lambda\sum_{i=1}^{n} X_i
MLE Example 3: Gamma dist. (cont.)
• Two unknown parameters, θ = {a, λ}, so take the two partial derivatives separately:
\frac{\partial l(a,\lambda)}{\partial a} = \frac{\partial}{\partial a}\left[ na\log(\lambda) - n\log\Gamma(a) + (a-1)\sum_{i=1}^{n}\log X_i - \lambda\sum_{i=1}^{n} X_i \right]
= n\log(\lambda) - n\,\frac{\Gamma'(a)}{\Gamma(a)} + \sum_{i=1}^{n}\log X_i
\frac{\partial l(a,\lambda)}{\partial \lambda} = \frac{\partial}{\partial \lambda}\left[ na\log(\lambda) - n\log\Gamma(a) + (a-1)\sum_{i=1}^{n}\log X_i - \lambda\sum_{i=1}^{n} X_i \right]
= \frac{na}{\lambda} - \sum_{i=1}^{n} X_i
MLE Example 3: Gamma dist. (cont.)
• And set to zero (solve the λ-partial first):
\frac{\partial l(a,\lambda)}{\partial \lambda} = \frac{n\hat{a}}{\hat{\lambda}} - \sum_{i=1}^{n} X_i = 0
\quad\Longrightarrow\quad \hat{\lambda} = \frac{n\hat{a}}{\sum_{i=1}^{n} X_i} = \frac{\hat{a}}{\bar{X}}
Which Newton is That?
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
Functions in R
• We’d like to calculate the log-likelihood (or likelihood) for a
model given a set of parameter(s), θ, and the data, x.
• The best way to do this is to write a user-defined function in
R so that this calculation can be done over and over again (so
we can draw the function, determine its maximum, etc.).
• A user-defined function in R looks like this:
my.function = function(arg1,arg2,...){
result = ... # do some work
return(result)
}
• my.function has several parts: the function name, arguments,
body, and results to be passed back to the user in the regular
R environment.
• The work (like the result variable above) is done internally in the
function, and cannot be accessed outside the function unless
it is explicitly returned to the user with the return expression.
• An example would be helpful…
R-Function: Gamma log-lik
• Recall that the log-likelihood of i.i.d. Xi ~ Gamma(a, λ):
l(a, \lambda) = na\log(\lambda) - n\log\Gamma(a) + (a-1)\sum_{i=1}^{n}\log X_i - \lambda\sum_{i=1}^{n} X_i
> hist(precip,col="gray",main="")
[Figure: histogram of the precip data; y-axis is Frequency, x-axis is precip]
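• The code on the following slides calls a user-defined function gamma.loglik; a minimal sketch consistent with the log-likelihood above is given below. The interface theta = c(a, lambda) and x = data is an assumption based on how the function is called later.
# Sketch of gamma.loglik (assumed interface: theta = c(a, lambda), x = data),
# computing l(a, lambda) = n*a*log(lambda) - n*log(Gamma(a))
#                          + (a-1)*sum(log(x)) - lambda*sum(x)
gamma.loglik = function(theta, x){
  a      = theta[1]
  lambda = theta[2]
  n      = length(x)
  n*a*log(lambda) - n*lgamma(a) + (a-1)*sum(log(x)) - lambda*sum(x)
}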
Plotting the l(a, λ): a double for loop
• We would like to plot the log-likelihood for the Boston storm
dataset for various values of a and λ.
• This poses a difficulty since there are two unknown
parameters we need to search over
• We need to use a “double” for loop. See code below:
a=1:100/100                  # grid of candidate a values: 0.01, 0.02, ..., 1
lambda=1:200/50              # grid of candidate lambda values: 0.02, 0.04, ..., 4
loglik=matrix(NA,nrow=length(a),ncol=length(lambda))
dim(loglik)
for(i in 1:length(a)){
  for(j in 1:length(lambda)){
    loglik[i,j]=gamma.loglik(theta=c(a[i],lambda[j]),x=precip)
  }
}
Plotting l(a, λ) and Finding MLEs
• Let’s plot it (we need a 3D plot, so we will use an R package):
require(scatterplot3d)
persp(x=a,y=lambda,z=loglik,shade=0.5,axes=T,
col=c("darkred"),phi=20, theta=-60,
ticktype="detailed")
• And with some careful indexing, we can find the correct values
for a and λ that maximize our calculated log-likelihoods:
index=which(loglik==max(loglik))
a[index%%length(a)]
lambda[ceiling(index/length(a))]

> a[index%%length(a)]
[1] 0.59
> lambda[ceiling(index/length(a))]
[1] 2
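• A side note: base R can recover the row and column indices directly with which(..., arr.ind=TRUE), which avoids the modular arithmetic above (and its edge case when the linear index is an exact multiple of length(a)). A small sketch:
# Equivalent lookup: arr.ind=TRUE returns (row, column) = (a index, lambda index)
idx = which(loglik==max(loglik), arr.ind=TRUE)
a[idx[1,"row"]]
lambda[idx[1,"col"]]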
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
Newton’s Method
• Newton’s Method (sometimes called the Newton-Raphson
Method) is a numerical way to solve for roots of a
function.
• It is an iterative algorithm based on the following equation:
x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}
\qquad\text{and in general}\qquad
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
• You iteratively update a potential root x_k until |x_k − x_{k−1}| < ε for some small tolerance ε.
• Some issues can arise, but the key is that your starting point, x_0,
needs to be reasonably close to the root.
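• A minimal R sketch of this update rule; the example function, its derivative, and the starting point are illustrative assumptions.
# Generic Newton's method: iterate x <- x - f(x)/f'(x) until the step is tiny.
newton = function(f, fprime, x0, tol=1e-8, max.iter=100){
  x = x0
  for(k in 1:max.iter){
    x.new = x - f(x)/fprime(x)
    if(abs(x.new - x) < tol) return(x.new)
    x = x.new
  }
  warning("did not converge")
  return(x)
}
# Example: root of f(x) = x^2 - 2 starting near 1 (should return sqrt(2))
newton(f=function(x) x^2 - 2, fprime=function(x) 2*x, x0=1)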
Newton’s Method (cont.)
• Newton’s Method is quite applicable to maximum
likelihood estimation! The situation often arises that
the equation l’(θ) = 0 does not have a closed-form
solution.
• So for MLE, the f(x) from the previous slide is actually
l’(θ). Thus the formulas become:
\theta_1 = \theta_0 - \frac{l'(\theta_0)}{l''(\theta_0)}
\qquad\qquad
\theta_{k+1} = \theta_k - \frac{l'(\theta_k)}{l''(\theta_k)}
Using Newton’s Method: Gamma dist.
• Recall, the a-partial equation for l’(a) for a gamma
distribution was:
\frac{\partial l(a,\lambda)}{\partial a} = \log(\hat{a}) - \frac{\Gamma'(\hat{a})}{\Gamma(\hat{a})} - \log(\bar{X}) + \frac{1}{n}\sum_{i=1}^{n}\log X_i = 0
• So we need to do 2 steps in R to solve this equation:
1) Create a user-defined function to calculate the result of
this function (l’(a)) given a value of the parameter a
(and given the data X1, X2, …, Xn).
2) Use the function uniroot to find the appropriate root
for this equation
*And don’t forget there is another parameter to estimate
afterwards: \hat{\lambda} = \hat{a}/\bar{X}
A Function in R to Calculate l’(a)
• Step #1: create a user-defined function (let’s call it a.partial)
to calculate the results of the function of the a-partial
derivative (given parameter, a, and data, x):
\frac{\partial l(a,\lambda)}{\partial a} = \log(\hat{a}) - \frac{\Gamma'(\hat{a})}{\Gamma(\hat{a})} - \log(\bar{X}) + \frac{1}{n}\sum_{i=1}^{n}\log X_i = 0
a.partial = function(a,x){
  # digamma(a) = Gamma'(a)/Gamma(a); mean(log(x)) = (1/n)*sum(log(x))
  f = log(a)-digamma(a)-log(mean(x))+mean(log(x))
  return(f)
}
uniroot in R to solve l’(a) = 0
• Step #2: Use the function uniroot to find the appropriate
root for this equation:
?uniroot
uniroot(f=a.partial,interval=c(min(precip),max(precip)),x=precip)
result1=uniroot(f=a.partial,interval=c(min(precip),max(precip)),x=precip)
a.mle=result1$root
lambda.mle=a.mle/mean(precip)
a.mle
lambda.mle

> result1
$root
[1] 0.5920035
$f.root
[1] -4.113931e-06
$iter
[1] 10
$estim.prec
[1] 6.103516e-05

> a.mle
[1] 0.5920035
> lambda.mle
[1] 1.997605
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
Fisher’s Information
• Finally, back to real Statistics (and not computation)
• Recall, l(θ) is measuring the likelihood of the potential
values of θ given the data, X1, X2, …, Xn.
• We’d like a measure for the uncertainty of an MLE
(like the variance of \hat{\theta}_{MLE}).
• The Fisher information, In(θ), does exactly that:
I_n(\theta) = E\!\left[ \left( \frac{\partial}{\partial\theta}\, l(\theta) \right)^{2} \right]
• Which is mathematically equivalent to (for ease of calculation):
I_n(\theta) = -\,E\!\left[ \frac{\partial^2}{\partial\theta^2}\, l(\theta) \right]
*See DeGroot p.515-516 for proof.
Fisher’s Information
• Theorem: for MLEs, we can show that under “mild
conditions”:
\hat{\theta}_{MLE} \;\overset{\text{approx.}}{\sim}\; N\!\left( \theta_0,\; \frac{1}{I_n(\theta_0)} \right)
where \theta_0 is the true unknown parameter(s).
• What distribution does \sqrt{I_n(\theta_0)}\,\left(\hat{\theta}_{MLE} - \theta_0\right) have?
• What does this say about the bias of MLEs? What
about the consistency of MLEs?
• What is Fisher’s information, In(θ0) , measuring? Do
you want it to be large or small?
Derivation of Theorem
• By definition of the MLE, l'(\hat{\theta}_{MLE}) = 0. Applying a Taylor
series expansion around \theta_0:
l'(\hat{\theta}_{MLE}) \approx l'(\theta_0) + (\hat{\theta}_{MLE} - \theta_0)\, l''(\theta_0)
\Longrightarrow\quad \hat{\theta}_{MLE} - \theta_0 \approx -\frac{l'(\theta_0)}{l''(\theta_0)}
\Longrightarrow\quad \sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{\tfrac{1}{\sqrt{n}}\, l'(\theta_0)}{-\tfrac{1}{n}\, l''(\theta_0)}
• Expand the numerator:
\frac{1}{\sqrt{n}}\, l'(\theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f(X_i \mid \theta_0)
• What is this a sum of?
• i.i.d. random variables. So if CLT holds…
Derivation of Theorem (cont.)
• By CLT, we know:
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\log f(X_i \mid \theta_0) \;\approx\; N\!\left( E\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right],\; \mathrm{Var}\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right] \right)
• What are E\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right] and \mathrm{Var}\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right]?
E\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right] = \int \frac{\partial}{\partial\theta}\left[\log f(x \mid \theta_0)\right] f(x \mid \theta_0)\, dx
= \int \frac{\partial f(x \mid \theta_0)/\partial\theta}{f(x \mid \theta_0)}\, f(x \mid \theta_0)\, dx
= \int \frac{\partial}{\partial\theta} f(x \mid \theta_0)\, dx = \frac{\partial}{\partial\theta}\int f(x \mid \theta_0)\, dx = 0
Bring it home!
\mathrm{Var}\!\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right] = E\!\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta_0)\right)^{2}\right] - 0^2 = I(\theta_0)
• Hooray! So (if n is large enough, by the CLT):
\frac{1}{\sqrt{n}}\, l'(\theta_0) \;\sim\; N\!\left(0,\; I(\theta_0)\right)
• What about the denominator?
\frac{1}{n}\, l''(\theta_0) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta^2}\log f(X_i \mid \theta_0) \;\longrightarrow\; E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta_0)\right] = -I(\theta_0)
• Thus:
\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{\tfrac{1}{\sqrt{n}}\, l'(\theta_0)}{-\tfrac{1}{n}\, l''(\theta_0)} \;\sim\; \frac{1}{I(\theta_0)}\, N\!\left(0,\; I(\theta_0)\right) = N\!\left(0,\; \frac{1}{I(\theta_0)}\right)
so that \hat{\theta}_{MLE} \approx N\!\left(\theta_0,\; \frac{1}{n\, I(\theta_0)}\right) = N\!\left(\theta_0,\; \frac{1}{I_n(\theta_0)}\right).
Proof is over!
• So what did we just show?
• That the sampling distribution of any MLE will be
approximately Normally distributed, given that:
• n is large enough,
• you don’t have too extreme outliers in l’(θ0)
• and your observations are i.i.d.
• So what? Now we have an easy way to construct
confidence intervals and conduct hypothesis tests
• Note this also holds in the multi-dimensional
parameter case, but what are the dimensions of I_n(\theta_0)?
• So it needs to be written as:
\hat{\theta}_{MLE} \;\approx\; N\!\left( \theta_0,\; I_n^{-1}(\theta_0) \right)
Happy National Battery Day!
• Batteries were invented by this guy in 1800:
Alessandro Volta
P\!\left( \hat{\theta}_{MLE} - \frac{z^*_{1-\alpha/2}}{\sqrt{I_n(\hat{\theta}_{MLE})}} \;\le\; \theta_0 \;\le\; \hat{\theta}_{MLE} + \frac{z^*_{1-\alpha/2}}{\sqrt{I_n(\hat{\theta}_{MLE})}} \right) \approx 1 - \alpha
• By using I_n(\hat{\theta}) instead of I_n(\theta_0), we will technically
alter the asymptotic normal distribution (just like
using s in place of σ altered the normal distribution
when constructing intervals for µ).
• But it’s close enough to a normal distribution when n is
large (just like a t-distribution looks like a normal
when df is large).
Example: MLE-based C.I. for a Poisson
• Let i.i.d Xi ~ Pois(λ).
• What is \hat{\lambda}_{MLE}? What is Var(\hat{\lambda}_{MLE})?
\hat{\lambda}_{MLE} = \bar{X} \qquad\qquad \mathrm{Var}(\bar{X}) = \lambda/n
• What is Fisher’s Information, In(θ)?
I_n(\lambda) = -E\!\left[ \frac{\partial^2}{\partial\lambda^2}\, l(\lambda) \right]
= -E\!\left[ \frac{\partial^2}{\partial\lambda^2}\left( \log(\lambda)\sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\log(X_i!) - n\lambda \right) \right]
= -E\!\left[ -\frac{1}{\lambda^2}\sum_{i=1}^{n} X_i \right]
= \frac{1}{\lambda^2}\, E\!\left[\sum_{i=1}^{n} X_i\right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}
• Thus the estimated information is I_n(\hat{\lambda}) = n/\hat{\lambda}.
Example: MLE-based C.I. for a Poisson
(cont.)
• Construct an asymptotic 95% C.I. for λ.
\left( \hat{\lambda}_{MLE} - \frac{z^*_{1-\alpha/2}}{\sqrt{I_n(\hat{\lambda}_{MLE})}},\;\; \hat{\lambda}_{MLE} + \frac{z^*_{1-\alpha/2}}{\sqrt{I_n(\hat{\lambda}_{MLE})}} \right)
= \left( \hat{\lambda} - z^*_{1-\alpha/2}\sqrt{\frac{\hat{\lambda}}{n}},\;\; \hat{\lambda} + z^*_{1-\alpha/2}\sqrt{\frac{\hat{\lambda}}{n}} \right)
= \left( \hat{\lambda} - 1.96\sqrt{\frac{\hat{\lambda}}{n}},\;\; \hat{\lambda} + 1.96\sqrt{\frac{\hat{\lambda}}{n}} \right)
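• A minimal R sketch of this interval on simulated Poisson data; the data are an illustrative assumption.
# Asymptotic 95% C.I. for lambda using the MLE and Fisher information n/lambda-hat.
set.seed(3)
x = rpois(150, lambda=4)
lambda.hat = mean(x)                        # MLE of lambda
se.hat     = sqrt(lambda.hat/length(x))     # 1/sqrt(I_n(lambda-hat))
lambda.hat + c(-1, 1)*qnorm(0.975)*se.hat   # lower and upper C.I. limits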
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
Mean Square Error
• We have talked about a few properties of estimators so
far: bias, consistency, and variance.
• Another way to measure how good an estimator is: the
mean squared error (MSE):
\mathrm{MSE}(\hat{\theta}) = E\!\left[ (\hat{\theta} - \theta_0)^2 \right] = \mathrm{Var}(\hat{\theta}) + \left[ E(\hat{\theta}) - \theta_0 \right]^2
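• The decomposition follows by adding and subtracting E(\hat{\theta}) inside the square (a short check worked out below):
\begin{aligned}
E\big[(\hat{\theta}-\theta_0)^2\big]
  &= E\big[\big(\hat{\theta}-E(\hat{\theta})+E(\hat{\theta})-\theta_0\big)^2\big] \\
  &= E\big[(\hat{\theta}-E(\hat{\theta}))^2\big]
     + 2\big(E(\hat{\theta})-\theta_0\big)\,E\big[\hat{\theta}-E(\hat{\theta})\big]
     + \big(E(\hat{\theta})-\theta_0\big)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \big[E(\hat{\theta})-\theta_0\big]^2 ,
\end{aligned}
since E[\hat{\theta}-E(\hat{\theta})] = 0 kills the cross term.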
• What is \hat{\theta}_{MLE}?    \hat{\theta}_{MLE} = \bar{X}
• What is Var(\hat{\theta}_{MLE})?    \mathrm{Var}(\bar{X}) = \lambda/n
Asymptotic Efficiency of MLEs
• What is the asymptotic bias of any maximum likelihood
estimator, \hat{\theta}_{MLE}? Meaning: what is the bias as n → ∞?
• In the last lecture we saw that E(\hat{\theta}_{MLE}) → θ as n → ∞.
• What is the asymptotic variance of \hat{\theta}_{MLE}?
\mathrm{Var}(\hat{\theta}_{MLE}) \;\longrightarrow\; \frac{1}{I_n(\theta)}
• So MLEs achieve the Cramer-Rao lower bound
asymptotically.
• This means MLEs are asymptotically efficient! It’s the
best we can do (asymptotically).
Unit 4 Outline
• The Likelihood Function and Maximum Likelihood
Estimation (MLE)
• MLE Examples
• Functions in R
• Newton’s Method to find roots
• Fisher’s Information and the Asymptotic Normality of
MLEs
• Confidence Intervals for MLEs
• Mean Square Error, Efficiency, and the Cramer-Rao Lower
Bound
• Optimization in R
Optimization in R
• In the last lecture, we numerically calculated the
MLE by solving the derivative of the log-likelihood
function, l’(θ) = 0, using the uniroot function
in R.
• This works great if we can analytically write down
l’(θ). Sometimes this is not easy.
• That’s OK, there’s another way to numerically solve
for MLEs based on the log-likelihood function
directly: using R’s optim function.
• This will allow us to maximize a function that has
multiple parameters at once. But the key is: you
need a good starting spot!
optim in R to minimize or
maximize functions
?optim
# par: starting values, here the method-of-moments estimates a.mom and lambda.mom
#      (assumed to have been computed earlier).
# control=list(fnscale=-1): makes optim maximize the log-likelihood instead of minimizing it.
theta=optim(par=c(a.mom,lambda.mom), fn=gamma.loglik,
            control=list(fnscale=-1), x=precip)
theta
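• A side note: optim can also return the Hessian at the optimum via hessian=TRUE, which connects back to Fisher's information. In the sketch below the negative log-likelihood is minimized so that the returned Hessian directly estimates the observed information; gamma.loglik is assumed to be defined as earlier, and the starting values and box constraints are illustrative assumptions.
# Minimize the negative log-likelihood; the Hessian then estimates the observed
# information, and its inverse approximates the variance-covariance of the MLEs.
neg.gamma.loglik = function(theta, x) -gamma.loglik(theta, x)
fit = optim(par=c(0.5, 2), fn=neg.gamma.loglik, x=precip,
            method="L-BFGS-B", lower=c(1e-6, 1e-6), hessian=TRUE)
fit$par                          # MLEs of (a, lambda)
solve(fit$hessian)               # approximate variance-covariance matrix
sqrt(diag(solve(fit$hessian)))   # approximate standard errors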
So which approach should we take?
Hooray for MLEs!!
• What’s your favorite MLE?
Managed Learning Environment? Major League Eating?