
5. Maximum Likelihood Estimation

Ismaïla Ba
[email protected]
STAT 3100 - Winter 2024


Contents

1 Preamble

2 The Likelihood Function

3 Maximum Likelihood Estimation

Preamble

MLE: intuition

Example 1
Suppose we toss an unfair coin. Let p be the probability of getting heads,
that is, P(heads) = p, and consider the observed sample x = (0, 1, 1, 1)
(0 = tails, 1 = heads). For what value of p is the observed sample most
likely to have occurred?

1 Let X1, X2, X3, X4 be iid Bernoulli(p) random variables (the probability
  distribution for a coin toss).
2 The probability of the event {X1 = x1, X2 = x2, X3 = x3, X4 = x4} is

      f(x1, x2, x3, x4; p) = (1 − p)p³

3 The natural idea of maximum likelihood: find the value of p which
  maximizes the probability of observing the sample x = (0, 1, 1, 1).


What do you think?


[Figure: the likelihood function L(p) = (1 − p)p³ and its derivative with
respect to p, both plotted over p ∈ [0, 1].]
⇒ maximum likelihood estimate: p̂(x) = 3/4!
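To check this conclusion numerically, here is a minimal Python sketch (the grid search and the name `likelihood` are ours, not from the slides) that evaluates L(p) = (1 − p)p³ and locates its maximizer:

```python
import numpy as np

# Likelihood of the observed sample x = (0, 1, 1, 1) for a Bernoulli(p) coin:
# the single tail contributes (1 - p), each of the three heads contributes p.
def likelihood(p):
    return (1.0 - p) * p**3

# Evaluate on a fine grid over [0, 1] and locate the maximizer.
p_grid = np.linspace(0.0, 1.0, 100_001)
p_hat = p_grid[np.argmax(likelihood(p_grid))]
print(p_hat)  # 0.75, matching the analytic answer 3/4
```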


The Likelihood Function

Definition 1
Let X1, . . . , Xn have joint pmf or pdf f_{X1,...,Xn}(·; θ), where the parameters
θ = (θ1, . . . , θm) have unknown values. Given that
X1 = x1, X2 = x2, . . . , Xn = xn is observed, the function of θ defined by

    L(θ) = L(θ; x) = f(x1, x2, . . . , xn; θ)

is the likelihood function. If X = (X1, . . . , Xn) is a random sample from a
distribution with density f(x; θ), the likelihood function is

    L(θ; x) = ∏_{i=1}^n f(xi; θ)

where f(xi; θ) is the common density of the Xi.

The likelihood function is not a probability density function.
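As a concrete illustration of the product form, here is a short Python sketch (the helper names are ours, not the slides') that evaluates L(θ; x) for an arbitrary common density:

```python
import numpy as np

def likelihood(theta, x, density):
    # Product of the common density evaluated at each observation.
    return np.prod([density(xi, theta) for xi in x])

# Bernoulli(p) pmf f(x; p) = p^x (1 - p)^(1 - x) for x in {0, 1}:
bern = lambda x, p: p**x * (1.0 - p)**(1 - x)
print(likelihood(0.75, [0, 1, 1, 1], bern))  # (1 - 0.75) * 0.75**3 ≈ 0.1055
```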

Maximum Likelihood Estimation

Methodology

The maximum likelihood principle for estimation is to choose the value of θ
that maximizes the likelihood function L(θ; x) for the observed x.

Definition: maximum likelihood
Let x = (x1, . . . , xn) be a realization of X = (X1, . . . , Xn) with likelihood
function L(θ; x), where θ ∈ Θ and Θ is the parameter space of θ. Then a
maximum likelihood estimator (MLE) of θ is any θ̂ that satisfies

    L(θ̂; x) = max_{θ∈Θ} f(x1, . . . , xn; θ).

We could also write

    θ̂(X) = argmax_{θ∈Θ} L(θ; X) = argmax_{θ∈Θ} f(X1, . . . , Xn; θ).


Log-likelihood
It is often more convenient to work with the log-likelihood function, that is,

    ln L(θ; x) = ∑_{i=1}^n ln f(xi; θ),

and since x ↦ ln x is increasing,

    θ̂(X) = argmax_{θ∈Θ} ln L(θ; X).

General methodology: For X ∼ f(·; θ) with θ ∈ Θ ⊂ ℝ (a numerical sketch
follows the list):
1 Compute the log-likelihood function;
2 If it is differentiable with respect to θ, compute (ln L(θ; x))′ and let
  θ̂(x) be the value of θ where the derivative vanishes (otherwise find
  another argument);
3 Verify that it is indeed a maximum by checking that
  (ln L(θ; x))′′ |_{θ=θ̂(x)} < 0.
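When step 2 has no convenient closed form, the maximization can be done numerically. A minimal Python sketch of the recipe, assuming SciPy is available (the function name `neg_log_lik` and the Bernoulli example from the Preamble are our choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(p, x):
    # Negative log-likelihood for iid Bernoulli(p) observations x.
    p = np.clip(p, 1e-12, 1 - 1e-12)  # keep the logs finite at the bounds
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

x = np.array([0, 1, 1, 1])
res = minimize_scalar(neg_log_lik, bounds=(0.0, 1.0), args=(x,), method="bounded")
print(res.x)  # ≈ 0.75, the sample proportion of heads, as in Example 1
```

Minimizing the negative log-likelihood is the standard numerical convention; it is equivalent to maximizing ln L.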

Examples - Exponential distribution

Example 2
Let X1, . . . , Xn be a random sample from an Exp(β) distribution, so that

    L(β; x) = f(x1, . . . , xn; β) = (1/βⁿ) exp(−(1/β) ∑_{i=1}^n xi) 1{xi > 0, ∀i = 1, . . . , n}.

Now,

    ln L(β; x) = −n ln β − (1/β) ∑_{i=1}^n xi.

To maximize this with respect to β, we differentiate to obtain

    (d/dβ) ln L(β; x) = (d/dβ)[−n ln β − (1/β) ∑_{i=1}^n xi] = −n/β + (1/β²) ∑_{i=1}^n xi.



Example 2 continued
Setting this equal to 0 and solving for β yields

    β = (1/n) ∑_{i=1}^n xi = x̄n.

We should always check that this is a maximum. The second derivative is

    (d²/dβ²) ln L(β; x) = (d/dβ)[−n/β + (1/β²) ∑_{i=1}^n xi] = n/β² − (2/β³) ∑_{i=1}^n xi,

which is negative when evaluated at β = x̄n. Thus, β̂ = X̄n is the
maximum likelihood estimator (MLE) of β.

Remark: This is the same as the method of moments estimator, which we
have seen is unbiased and consistent for β.
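A quick numerical check of β̂ = x̄n, reusing the bounded-optimization recipe above (simulated data; a sketch, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)  # true β = 2

# Negative log-likelihood of Exp(β): n ln(β) + (1/β) Σ xi
nll = lambda b: len(x) * np.log(b) + x.sum() / b
res = minimize_scalar(nll, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())  # the numeric maximizer agrees with the sample mean
```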

Examples - Poisson distribution


Example 3
Let X1, . . . , Xn be a random sample from a Poisson distribution with
parameter λ unknown. The likelihood function is

    L(λ; x) = ∏_{i=1}^n λ^{xi} e^{−λ} / xi! = λ^{∑_{i=1}^n xi} e^{−nλ} / ∏_{i=1}^n xi!.

The log-likelihood function is

    ln L(λ; x) = (∑_{i=1}^n xi) ln λ − nλ − ln(∏_{i=1}^n xi!).

Differentiating with respect to λ yields

    (d/dλ) ln L(λ; x) = (1/λ) ∑_{i=1}^n xi − n.

Example 3 continued
Setting this equal to 0 and solving for λ yields λ = x̄n. The second
derivative is

    (d²/dλ²) ln L(λ; x) = −(1/λ²) ∑_{i=1}^n xi,

which is negative when evaluated at λ = x̄n. The MLE of λ is hence
λ̂ = X̄n. Since X̄n is unbiased and V(X̄n) = λ/n, we also have that λ̂ is
consistent for λ.
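The same numeric check for the Poisson model; here ln(xi!) is computed with scipy.special.gammaln (a sketch with simulated data):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.5, size=500)  # true λ = 3.5

# Negative log-likelihood: nλ − (Σ xi) ln(λ) + Σ ln(xi!), with ln(xi!) = gammaln(xi + 1)
nll = lambda lam: len(x) * lam - x.sum() * np.log(lam) + gammaln(x + 1).sum()
res = minimize_scalar(nll, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())  # the numeric maximizer agrees with the sample mean
```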


Examples - Gamma distribution

Example 4
Let X1, . . . , Xn be a random sample from a Gamma(α, β) distribution with
parameters α and β unknown. The likelihood function can be written as

    L(α, β; x) = ∏_{i=1}^n xi^{α−1} e^{−xi/β} / (β^α Γ(α))
               = (1/(β^α Γ(α)))ⁿ (∏_{i=1}^n xi)^{α−1} exp(−(1/β) ∑_{i=1}^n xi).

The log-likelihood function becomes

    ln L(α, β; x) = −nα ln(β) − n ln Γ(α) + (α − 1) ln(∏_{i=1}^n xi) − (1/β) ∑_{i=1}^n xi.



Example 4 continued
The partial derivatives are

    (∂/∂α) ln L(α, β; x) = −n ln(β) − n Γ′(α)/Γ(α) + ln(∏_{i=1}^n xi);

    (∂/∂β) ln L(α, β; x) = −nα/β + (1/β²) ∑_{i=1}^n xi.

Now, define

    ψ(α) := (d/dα) ln Γ(α) = Γ′(α)/Γ(α)   and   x̃n = (∏_{i=1}^n xi)^{1/n},

where ψ(·) is called the digamma function and x̃n is the geometric
mean of x1, . . . , xn.

Example 4 continued
Setting the partial derivatives to 0, we obtain the maximum likelihood
equations

    β = x̄n / α,
    ln(α) − ψ(α) − ln(x̄n / x̃n) = 0.

There is no closed-form solution to these equations, but they can be
solved numerically.
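A sketch of the numerical solution, assuming SciPy (brentq root-finds ln α − ψ(α) − ln(x̄n/x̃n) = 0, then β̂ = x̄n/α̂; the data are simulated and all names are ours):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=1000)  # true α = 2.5, β = 1.5

xbar = x.mean()                   # arithmetic mean
xtil = np.exp(np.log(x).mean())   # geometric mean
c = np.log(xbar / xtil)           # > 0 by the AM-GM inequality

# ln(α) − ψ(α) decreases from +∞ to 0 on (0, ∞), so the root is bracketed:
alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - c, 1e-6, 1e6)
beta_hat = xbar / alpha_hat
print(alpha_hat, beta_hat)
```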


Order Statistics

Definition: order statistics
The order statistics from a random sample X1, . . . , Xn are the random
variables X1:n, X2:n, . . . , Xn:n given by

    X1:n = the smallest among X1, . . . , Xn
    X2:n = the second smallest among X1, . . . , Xn
    ...
    Xn:n = the largest among X1, . . . , Xn

so that, with probability 1, −∞ < X1:n < X2:n < · · · < Xn:n < ∞.

Remark
X1:n = min{X1, . . . , Xn}.
Xn:n = max{X1, . . . , Xn}.
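In code, the observed order statistics are just the sorted sample; a small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=10)

x_ord = np.sort(x)           # x_ord[0] is x_{1:10}, ..., x_ord[-1] is x_{10:10}
print(x_ord[0] == x.min())   # X_{1:n} is the minimum
print(x_ord[-1] == x.max())  # X_{n:n} is the maximum
```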

Order Statistics (2)

Proposition: joint density and marginal distribution
Let g_{X1:n,...,Xn:n}(x1:n, . . . , xn:n; θ) denote the joint density of the order
statistics X1:n, . . . , Xn:n resulting from a random sample of Xi's from a
density fX(x). Then

    g_{X1:n,...,Xn:n}(x1:n, . . . , xn:n; θ) = n! ∏_{i=1}^n fX(xi:n) 1{x1:n < x2:n < · · · < xn:n}.

The marginal distribution of the first r order statistics is given by

    g_{X1:n,...,Xr:n}(x1:n, . . . , xr:n; θ) = (n!/(n − r)!) (∏_{i=1}^r fX(xi:n)) × [1 − FX(xr:n)]^{n−r}

when x1:n < x2:n < · · · < xr:n, and 0 otherwise.


Order Statistics (3)

More generally, consider the joint (marginal) distribution of X3:10 and X7:10.
Draw a picture to see what is going on.
The (marginal) joint density of (X3:10, X7:10) can be written as

    g(x3:10, x7:10) = (n!/(2! 1! 3! 1! 3!)) [FX(x3:10)]² fX(x3:10) [FX(x7:10) − FX(x3:10)]³
                      × fX(x7:10) [1 − FX(x7:10)]³

when x3:10 < x7:10, and 0 otherwise.

The multinomial coefficient represents the number of ways to arrange the
10 observations into groups of sizes 2, 1, 3, 1, 3.
Not a formal proof, but it works!


Order Statistics (4)

Suppose we have a random sample X1, . . . , Xn from a continuous
distribution with CDF F and pdf f, with n ≥ 3, and, for some fixed i, j, k
with i < j < k, we want the joint likelihood (or density) of Xi:n, Xj:n and
Xk:n. Using the above trick, we can easily write this down:

    g(xi:n, xj:n, xk:n) = n! / ((i − 1)!(j − i − 1)!(k − j − 1)!(n − k)!)
                          × [F(xi:n)]^{i−1} f(xi:n) [F(xj:n) − F(xi:n)]^{j−i−1} f(xj:n)
                          × [F(xk:n) − F(xj:n)]^{k−j−1} f(xk:n) [1 − F(xk:n)]^{n−k}

when xi:n < xj:n < xk:n, and 0 otherwise.

Remark: Adding up the exponents should yield n − ν, where ν is the
number of arguments in the joint density g. Here, they add up to n − 3.


Exercise on joint density for order statistics

Exercise 1
Consider a random sample X1, . . . , X20 from a continuous distribution with
CDF F and density f. What is the joint density of X2:20, X5:20, and X13:20?


Example 5
Suppose that the lifetime of a particular component has an Exp(β)
distribution and that n of these are randomly chosen (independently) and
placed into service. We observe the times of the first r failures (i.e. we
observe x1:n, . . . , xr:n). From the above slides on order statistics, the joint
pdf of x1:n, . . . , xr:n is given by

    L(β; x) = g(x1:n, . . . , xr:n; β)
            = (n!/(n − r)!) (∏_{i=1}^r f(xi:n; β)) [1 − F(xr:n; β)]^{n−r}
            = (n!/(n − r)!) (1/β^r) exp(−(1/β) ∑_{i=1}^r xi:n) exp(−(n − r)xr:n/β)
            = (n!/(n − r)!) (1/β^r) exp(−(1/β)[∑_{i=1}^r xi:n + (n − r)xr:n]).


Example 5 continued
Notice that t = T(x1, . . . , xn) := ∑_{i=1}^r xi:n + (n − r)xr:n represents the
observed total time in service across the n items when the experiment is
terminated (at the time of the r-th failure). The log-likelihood (in terms of
t) is

    ln L(β; x) = −r ln(β) − t/β + const

and its derivative with respect to β is

    (d/dβ) ln L(β; x) = −r/β + t/β².

Setting this equal to 0 and solving for β yields β = t/r. The second
derivative is

    (d²/dβ²) ln L(β; x) = (1/β²)(r − 2t/β),

which is negative when evaluated at β = t/r.

Example 5 continued
Thus, the MLE becomes

    β̂ = T/r

with T = T(X1, . . . , Xn) := ∑_{i=1}^r Xi:n + (n − r)Xr:n.

Remark: If r = n, then t = ∑_{i=1}^n xi:n = ∑_{i=1}^n xi since, either way, we are
adding up the entire sample, and the MLE becomes β̂ = X̄n as before.
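A minimal sketch of this censored-sample MLE in Python (simulated lifetimes; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 50, 20
lifetimes = rng.exponential(scale=2.0, size=n)  # true β = 2

first_r = np.sort(lifetimes)[:r]  # the first r failure times x_{1:n}, ..., x_{r:n}

# Total time in service when the experiment stops at the r-th failure:
T = first_r.sum() + (n - r) * first_r[-1]
beta_hat = T / r
print(beta_hat)
```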

Exercise 2
Suppose that X1, . . . , Xn are iid N(µ, σ²) random variables with both µ and
σ² unknown. Find the MLE's of µ and σ².


Example 6
Let X1, . . . , Xn be a random sample from a two-parameter exponential
distribution Exp(1, η) with density

    f(x; η) = e^{−(x−η)} 1_{(η,∞)}(x).

The likelihood function for η can be written as

    L(η; x) = exp(−∑_{i=1}^n (xi − η)) 1{xi ≥ η, ∀i} = exp(−∑_{i=1}^n (xi − η)) 1{η ≤ x1:n}.

This likelihood function is monotonically increasing in η up to x1:n and is
then 0 for all η > x1:n. The derivative with respect to η won't help us, since
the maximum occurs on the boundary and L(η) is not continuous at this
point. Nevertheless, the MLE for η is η̂ = X1:n, the minimum of the Xi.
This is quite different from the method of moments estimator.
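A grid sketch of this boundary maximum (simulated data; illustrative only, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
eta_true = 1.0
x = eta_true + rng.exponential(scale=1.0, size=30)  # Exp(1, η) sample

def lik(eta):
    # exp(−Σ(xi − η)) on {η ≤ x_{1:n}}, and 0 otherwise
    return np.exp(-(x - eta).sum()) if eta <= x.min() else 0.0

grid = np.linspace(0.0, x.min() + 0.5, 10_001)
print(grid[np.argmax([lik(e) for e in grid])], x.min())  # maximizer ≈ x_{1:n}
```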


The following exercise examines the properties of the MLE found in
Example 6, that is, of η̂ = X1:n.

Exercise 3
Let X1, . . . , Xn be a random sample from a two-parameter exponential
distribution Exp(1, η) with density

    f(x; η) = e^{−(x−η)} 1_{(η,∞)}(x).

From the above, the MLE of η is η̂ = X1:n.
(a) Show that F_{X1:n}(t) = P(X1:n ≤ t) = [1 − e^{−n(t−η)}] 1_{(η,∞)}(t).
(b) Show that f_{X1:n}(t) = n e^{−n(t−η)} 1_{(η,∞)}(t).
(c) Use (b) to show that E(η̂) = E(X1:n) = η + 1/n and V(η̂) = V(X1:n) = 1/n².
(d) Use (c) to conclude that η̂ is asymptotically unbiased and consistent
    for η.


Exercise 4
Let X1, . . . , Xn be a random sample from a two-parameter exponential
distribution Exp(β, η) with density

    f(x; η, β) = (1/β) e^{−(x−η)/β} 1_{(η,∞)}(x).

Find the MLE's of β and η, and compare with the method of moments
result from Chapter 4.

Example 7
Let X1, . . . , Xn be a random sample from a two-parameter Pareto
distribution Pareto(α, κ) with density

    f(x; α, κ) = (ακ^α / x^{α+1}) 1_{(κ,∞)}(x).


Examples - Two parameter Pareto distribution

Example 7 continued
The log-likelihood function is

    ln L(α, κ; x) = [n ln(α) + nα ln(κ) + ln(∏_{i=1}^n 1/xi^{α+1})] 1{κ ≤ xi ∀i}
                  = [n ln(α) + nα ln(κ) − (α + 1) ∑_{i=1}^n ln(xi)] 1{κ ≤ xi ∀i}.

Differentiating ln L(α, κ; x) with respect to κ and setting this equal to 0
yields nα/κ = 0, which would require κ = ∞; this is impossible because
κ ≤ x1:n. As a function of κ, L(α, κ; x) is monotonically increasing in κ
until κ > x1:n, at which point L(α, κ; x) becomes 0. Therefore, the MLE
for κ is κ̂ = X1:n.


Example 7 continued
Differentiating ln L(α, κ; x) with respect to α yields

    (d/dα) ln L(α, κ; x) = n/α + n ln(κ) − ∑_{i=1}^n ln(xi).

Setting this equal to 0 and solving for α yields

    α = n / ∑_{i=1}^n ln(xi/κ).

Thus, the MLE for α is

    α̂ = n / ∑_{i=1}^n ln(Xi/X1:n).
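A short Python sketch of both Pareto MLEs (data simulated by inverse-CDF sampling; a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha_true, kappa_true = 3.0, 2.0
# Inverse-CDF sampling: if U ~ U(0, 1), then κ U^(−1/α) ~ Pareto(α, κ)
x = kappa_true * rng.uniform(size=1000) ** (-1.0 / alpha_true)

kappa_hat = x.min()                               # κ̂ = X_{1:n}
alpha_hat = len(x) / np.log(x / kappa_hat).sum()  # α̂ = n / Σ ln(Xi / X_{1:n})
print(kappa_hat, alpha_hat)
```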


Exercises

Exercise 5
Find the MLE's when X1, . . . , Xn is a random sample from the following
distributions:
(a) U(0, θ) where θ > 0 is unknown.
(b) Weibull(1/2, γ) where γ > 0 is unknown.
(c) Binomial(20, p) where p ∈ [0, 1] is unknown.
(d) Geometric(p) where p ∈ (0, 1) is unknown.
(e) Laplace(λ) where λ > 0 is unknown.


Exercises - Two parameter Laplace distribution

Exercise 6
Let X1, . . . , Xn be a random sample from a distribution with density

    f(x; η, β) = (1/(2β)) e^{−|x−η|/β} 1_{(−∞,∞)}(x).

Find the MLE's for β and η. Hint: the value of a that minimizes
∑_{i=1}^n |xi − a| is a = median(x1, . . . , xn). What are the method of
moments estimators?


Definition: Invariance Property
If θ̂ is the MLE of θ and u(θ) is a function of θ, then u(θ̂) is an MLE for
u(θ).

Example 8
Let X1, . . . , Xn be a random sample from an Exp(β) distribution. What is
the MLE for estimating p(β) = P(X ≥ 1) = e^{−1/β}? Since X̄n is the MLE
for β, we have p̂(β) = p(β̂) = e^{−1/X̄n}.

Example 9
Let X1, . . . , Xn be a random sample from a Poisson(λ) distribution. What is
the MLE for estimating p(λ) = P(X = 0) = e^{−λ}? Since X̄n is the MLE for
λ, we have p̂(λ) = p(λ̂) = e^{−X̄n}.
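A one-look numeric illustration of the invariance property for Example 8 (simulated data; a sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=2.0, size=1000)  # true β = 2, so P(X ≥ 1) = e^{-1/2}

beta_hat = x.mean()              # MLE of β
p_hat = np.exp(-1.0 / beta_hat)  # by invariance, the MLE of e^{-1/β}
print(p_hat, (x >= 1).mean())    # compare with the empirical frequency
```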
