
Supplementary Note About The MLE

Theorem (Invariance Property of MLE). If the MLE of a parameter θ is θ̂, then the MLE of g(θ) is g(θ̂).

For example, if the MLE of θ is θ̂, then the MLE of the parameter θ² + sin θ is θ̂² + sin(θ̂). So in particular, if an observed data set gives the estimate θ̂ = 0.7226 for θ, then the estimate of θ² + sin θ is (0.7226)² + sin(0.7226) = 1.1835.
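As a quick arithmetic check (a minimal sketch in Python; nothing here is specific to the MLE machinery):

```python
import math

theta_hat = 0.7226
print(theta_hat**2 + math.sin(theta_hat))   # ≈ 1.1835
```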

How to calculate the Fisher Information

Let l(θ) be the log-likelihood. Here is how the (Fisher) information is calculated.

Case 1. One parameter only

I(θ) = −E[l′′(θ)] = E[(l′(θ))²]

Case 2. Several parameters (θ1 , ... , θr )

In this case the information matrix is an r × r matrix whose (i, j)-th entry is

−E[∂²l / (∂θᵢ ∂θⱼ)] = E[(∂l/∂θᵢ)(∂l/∂θⱼ)]

For example, for two parameters θ₁ and θ₂, the information matrix is:

  − ( E[∂²l/∂θ₁²]     E[∂²l/∂θ₁∂θ₂] )  =  −E ( ∂²l/∂θ₁²     ∂²l/∂θ₁∂θ₂ )
    ( E[∂²l/∂θ₁∂θ₂]   E[∂²l/∂θ₂²]   )        ( ∂²l/∂θ₁∂θ₂   ∂²l/∂θ₂²   )

The asymptotic variance of the MLE is then equal to

I(θ)⁻¹

In the case of several parameters, the information matrix needs to be inverted. In the professional exams only 2 × 2 matrices will be given. The inverse of any invertible 2 × 2 matrix can be calculated from the following formula:

 −1  
a b 1 d −b
  =  
c d ad − bc −c a
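This formula is easy to check against a general-purpose inverse; a minimal sketch (the helper name invert_2x2 is my own, not from the text):

```python
import numpy as np

def invert_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the adjugate formula above."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

M = np.array([[1.0, 2.0], [3.0, 4.0]])
print(invert_2x2(1, 2, 3, 4))   # [[-2.   1. ] [ 1.5 -0.5]]
print(np.linalg.inv(M))         # agrees
```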

Note. The matrix

  ( ∂²l/∂θ₁²     ∂²l/∂θ₁∂θ₂ )
  ( ∂²l/∂θ₁∂θ₂   ∂²l/∂θ₂²   )

is called the Hessian matrix.

Note. If n observations have been collected, then the amount of information in these n observations is equal to n times the amount of information from one single observation.

Theorem. The asymptotic variance of the MLE is equal to

I(θ)⁻¹

Example (question 13.66 of the textbook) ∗. A distribution has two parameters, α and β. A sample of size 10 produced the following log-likelihood function:

l(α, β) = −2.5α² − 3αβ − β² + 50α + 2β + k

where k is a constant. Estimate the covariance matrix of the MLE (α̂, β̂).

Solution.

∂l/∂α = −5α − 3β + 50

∂l/∂β = −3α − 2β + 2

∂²l/∂α² = −5,   ∂²l/∂β² = −2,   ∂²l/∂α∂β = −3

(information matrix)  I = −E[Hessian matrix] = ( 5  3 )
                                               ( 3  2 )

covariance matrix = I⁻¹ = (1/((5)(2) − (3)(3))) (  2  −3 )  =  (  2  −3 )
                                                ( −3   5 )     ( −3   5 )
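A quick numerical check of this example; a minimal sketch with numpy (not part of the exam solution):

```python
import numpy as np

# The Hessian of l(α, β) is constant here, so I = -E[Hessian] = -Hessian
hessian = np.array([[-5.0, -3.0],
                    [-3.0, -2.0]])
I = -hessian

print(np.linalg.inv(I))   # [[ 2. -3.] [-3.  5.]], the covariance matrix
```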

Example. A single observation, x, is taken from a normal distribution with mean µ = 0 and variance σ² = θ. The normal distribution has its probability density function given by

f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))

Let θ̂ be the maximum likelihood estimator of θ. Which of the following is the variance of θ̂?

(A) 1/θ    (B) 1/θ²    (C) 1/(2θ)    (D) 2θ²    (E) θ²

Solution.

µ = 0, σ² = θ  ⇒  L(θ) = (1/√(2πθ)) e^(−x²/(2θ)) = (2πθ)^(−1/2) e^(−x²/(2θ))

l(θ) = C − (1/2) ln(θ) − x²/(2θ)   with some constant C

l′(θ) = −1/(2θ) + x²/(2θ²)  ⇒  l′′(θ) = 1/(2θ²) − x²/θ³

But:

E(X²) = Var(X) + E(X)² = σ² + µ² = θ

Therefore:

E[l′′(θ)] = 1/(2θ²) − θ/θ³ = −1/(2θ²)  ⇒  I(θ) = −E[l′′(θ)] = 1/(2θ²)

asymptotic variance = 1/I(θ) = 2θ²
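The differentiation can be verified symbolically; a minimal sketch with sympy, substituting E(X²) = θ by hand just as the solution does:

```python
import sympy as sp

x, theta = sp.symbols('x theta', positive=True)

# Log-likelihood of a single N(0, theta) observation, constants dropped
l = -sp.log(theta) / 2 - x**2 / (2 * theta)

l2 = sp.diff(l, theta, 2)        # l''(θ) = 1/(2θ²) − x²/θ³
El2 = l2.subs(x**2, theta)       # use E(X²) = θ
info = sp.simplify(-El2)         # Fisher information
print(info, sp.simplify(1 / info))   # 1/(2*theta**2)  2*theta**2
```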

Distribution              Asymptotic variance of the MLE
Exponential               θ²/n
Uniform(0, θ)             nθ²/((n + 1)²(n + 2))
Lognormal                 Var(µ̂) = σ²/n,  Var(σ̂) = σ²/(2n),  Cov(µ̂, σ̂) = 0
Pareto with fixed θ       Var(α̂) = α²/n
Pareto with fixed α       Var(θ̂) = (α + 2)θ²/(nα)
Weibull with fixed τ      Var(θ̂) = θ²/(nτ²)

Example. Verify the formula for the lognormal.

Solution.
If X is lognormal, then ln X is normal with mean µ and standard deviation σ, so

f(x) = (1/(xσ√(2π))) e^(−(ln x − µ)²/(2σ²))  ⇒  L(µ, σ) = f(x₁) ⋯ f(xₙ) = C′ σ⁻ⁿ exp(−Σ(ln xᵢ − µ)²/(2σ²))

l(µ, σ) = C − n ln(σ) − Σ(ln xᵢ − µ)²/(2σ²)   where the constants C′ and C do not involve µ or σ

∂l/∂µ = Σ(ln xᵢ − µ)/σ² = (Σ ln xᵢ − nµ)/σ²

∂²l/∂µ² = −n/σ²  ⇒  E[−∂²l/∂µ²] = n/σ²

∂²l/∂µ∂σ = −2Σ(ln xᵢ − µ)/σ³  ⇒  E[−∂²l/∂µ∂σ] = 0

∂l/∂σ = Σ(ln xᵢ − µ)²/σ³ − n/σ

∂²l/∂σ² = −3Σ(ln xᵢ − µ)²/σ⁴ + n/σ²  ⇒  E[−∂²l/∂σ²] = 3nσ²/σ⁴ − n/σ² = 2n/σ²

I(µ, σ) = ( n/σ²   0     )  ⇒  variance-covariance matrix  Σ = I⁻¹ = ( σ²/n   0       )
          ( 0      2n/σ² )                                           ( 0      σ²/(2n) )
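These entries can be sanity-checked by simulation; a minimal sketch (the parameters µ = 1, σ = 0.5, n = 200 are arbitrary choices of mine, and ln X is simulated directly as N(µ, σ²)):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 0.5, 200, 20000

logs = rng.normal(mu, sigma, size=(reps, n))   # ln X ~ N(mu, sigma^2)
mu_hat = logs.mean(axis=1)                     # MLE of mu
sigma_hat = np.sqrt(((logs - mu_hat[:, None]) ** 2).mean(axis=1))  # MLE of sigma

print(mu_hat.var(), sigma**2 / n)              # both ≈ 0.00125
print(sigma_hat.var(), sigma**2 / (2 * n))     # both ≈ 0.000625
print(np.cov(mu_hat, sigma_hat)[0, 1])         # ≈ 0
```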

Delta Method

Theorem. If θ̂ ≈ N(θ, σ²/n), then g(θ̂) ≈ N(g(θ), g′(θ)² σ²/n).

In the professional exams, here is how it is used (we don't really pay careful attention to whether the distribution is normal or not):

g(X) = g(α) + g′(α)(X − α) + (1/2)g′′(α)(X − α)² + ⋯  ⇒  g(X) ≈ g(α) + g′(α)(X − α)

Var(g(X)) ≈ g′(α)² Var(X)

and in the case where two variables are involved:

Var(g(X, Y)) ≈ ( ∂g/∂x  ∂g/∂y ) ( Var(X)     Cov(X, Y) ) ( ∂g/∂x )
                                ( Cov(X, Y)  Var(Y)    ) ( ∂g/∂y )

Example. Claim size X follows a single-parameter Pareto distribution with known parameter θ = 50. We estimate α to be 4, with variance of the estimator equal to 0.3. Calculate the variance of the estimate of Pr(X < 100).

Solution.

For a single-parameter Pareto distribution we have:

P(X < x) = 1 − (θ/x)^α  ⇒  P(X < 100) = 1 − (50/100)^α = 1 − (0.5)^α

If we denote the estimator of α by Y, then the estimator of P(X < 100) is g(Y) = 1 − (0.5)^Y. Then, using the delta method:

g′(Y) = −(0.5)^Y ln(0.5)  ⇒  g′(4) = −(0.5)⁴ ln(0.5) = 0.0433

Var(g(Y)) = g′(4)² Var(Y) = (0.0433)²(0.3) = 0.00056 ✓
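Reproducing the arithmetic (a sketch; the variable names are mine):

```python
import math

alpha_hat, var_alpha_hat = 4.0, 0.3

g_prime = -(0.5 ** alpha_hat) * math.log(0.5)   # ≈ 0.0433
var_g = g_prime ** 2 * var_alpha_hat

print(g_prime, var_g)   # 0.04332..., 0.000563...
```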

Example (questions 13.56 and 13.74 of the textbook) ∗.

(i) The random variable X has the pdf

f(x) = αλ^α (λ + x)^(−α−1),   x, α, λ > 0

It is known that λ = 1000. You are given the following observations:

43, 145, 233, 396, 775

Determine the MLE of α.

(ii) Estimate the variance of the MLE and use it to construct a 95% confidence interval for E(X ∧ 500).

Solution to part (i).


L = α⁵ · 1000^(5α) · ∏_{j=1}^{5} (1000 + x_j)^(−α−1)

(log-likelihood)  l = 5 ln(α) + 5α ln(1000) − (α + 1) Σ_{j=1}^{5} ln(1000 + x_j)

l′(α) = 5/α + 5 ln(1000) − Σ_{j=1}^{5} ln(1000 + x_j) = 5/α + 34.5388 − 35.8331
j=1

l′ (α) = 0 ⇒ α̂ = 3.8629
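Part (i) can be reproduced in a few lines (a sketch; names are mine):

```python
import math

xs = [43, 145, 233, 396, 775]
s = sum(math.log(1000 + x) for x in xs)     # ≈ 35.8331

# l'(α) = 5/α + 5 ln(1000) − s = 0  ⇒  α̂ = 5/(s − 5 ln 1000)
alpha_hat = 5 / (s - 5 * math.log(1000))
print(alpha_hat)                            # ≈ 3.8629
```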

Solution to part (ii).

ln f(x) = ln(α) + α ln(λ) − (α + 1) ln(λ + x)

∂² ln f(x)/∂α² = −1/α²

(based on n = 5 observations)  I(α) = −nE[∂² ln f(x)/∂α²] = n/α²

Invert it to get:

Var(α̂) = α²/n,  estimated by α̂²/n = (3.8629)²/5 = 2.9844

g(α) = E(X ∧ 500) = ∫_0^500 x · α · 1000^α (1000 + x)^(−α−1) dx + 500 ∫_500^∞ α · 1000^α (1000 + x)^(−α−1) dx

     = (using the Pareto integrals) = 1000/(α − 1) − (2/3)^α · 1500/(α − 1)

g′(α) = −1000/(α − 1)² + (2/3)^α · 1500/(α − 1)² − (2/3)^α · (1500/(α − 1)) · ln(2/3)

α̂ = 3.8629  ⇒  g(α̂) = 239.88,  g′(α̂) = −39.428

Var(g(α̂)) ≈ (g′(α̂))² Var(α̂) = (−39.428)²(2.9844) = 4639.45

confidence interval = g(α̂) ± √Var(g(α̂)) · z_{0.025} = 239.88 ± (√4639.45)(1.96) = 239.88 ± 133.50 = (106.38, 373.38)
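And part (ii) end to end (a sketch; 1.96 is z_{0.025}, and all names are mine):

```python
import math

alpha = 3.8629
var_alpha = alpha ** 2 / 5                   # ≈ 2.9844

r = (2 / 3) ** alpha
g = 1000 / (alpha - 1) - 1500 * r / (alpha - 1)              # ≈ 239.88
g_prime = (-1000 / (alpha - 1) ** 2
           + 1500 * r / (alpha - 1) ** 2
           - 1500 * r * math.log(2 / 3) / (alpha - 1))       # ≈ -39.43

var_g = g_prime ** 2 * var_alpha             # ≈ 4639
half = 1.96 * math.sqrt(var_g)               # ≈ 133.5
print(g - half, g + half)                    # ≈ (106.4, 373.4)
```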

Example ∗. At this point, examples 62.5 and 62.6 of Finan's study guide were solved in class.

3. Calculations for qₓ

If T denotes the time of death, then the conditional distribution (T | x < T ≤ x + 1) is assumed to be uniform on (x, x + 1). This is the same assumption we use for large data sets. Under this assumption, for 0 < t < 1 we have:



ₜqₓ = t · qₓ

₁₋ₜqₓ₊ₜ = (1 − t)qₓ / (1 − t·qₓ)

For example, here is how the first equality is proved:

ₜqₓ = P(x < T ≤ x + t | x < T) = P(x < T ≤ x + t) / P(x < T)

= [P(x < T ≤ x + t) / P(x < T ≤ x + 1)] · [P(x < T ≤ x + 1) / P(x < T)]

= P(x < T ≤ x + t | x < T ≤ x + 1) · P(x < T ≤ x + 1 | x < T) = t · qₓ

where the last step uses the uniform assumption, under which the first factor equals t.
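Both identities are easy to confirm numerically: under the uniform assumption, the conditional survival is P(T > x + t | T > x) = 1 − t·qₓ for 0 ≤ t ≤ 1. A minimal sketch (the values qx = 0.1 and t = 0.4 are arbitrary choices of mine):

```python
qx, t = 0.1, 0.4

# First identity: probability of death in (x, x+t], given alive at x
print(t * qx)                        # 0.04 = t * qx

# Second identity: death before x+1, given alive at x+t
surv_t = 1 - t * qx                  # P(T > x+t | T > x)
lhs = (qx - t * qx) / surv_t         # P(x+t < T <= x+1 | T > x+t)
rhs = (1 - t) * qx / (1 - t * qx)
print(lhs, rhs)                      # both 0.0625
```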

One problem of interest is the calculation of the MLE of the probability qₓ = P(x < T ≤ x + 1 | x < T).

Example. A cohort of 500 individuals of age x is observed. The study ends at age x + 1. Five deaths were observed, and 350 individuals left the study at age x + 0.7. Assuming a uniform distribution of death times within the year, find the MLE of qₓ.

Solution. The probability of death by time x + 0.7 is 0.7qₓ, so the probability of surviving to age x + 0.7 is 1 − 0.7qₓ. The likelihood function is therefore:

L(qₓ) = qₓ⁵ (1 − 0.7qₓ)³⁵⁰ (1 − qₓ)¹⁴⁵

The log-likelihood:

l(qₓ) = 5 ln(qₓ) + 350 ln(1 − 0.7qₓ) + 145 ln(1 − qₓ)

dl/dqₓ = 5/qₓ − (350)(0.7)/(1 − 0.7qₓ) − 145/(1 − qₓ) = 5/qₓ − 245/(1 − 0.7qₓ) − 145/(1 − qₓ)

Setting the derivative equal to zero gives us:

5(1 − 0.7qₓ)(1 − qₓ) − 145qₓ(1 − 0.7qₓ) − 245qₓ(1 − qₓ) = 0

700qₓ² − 797qₓ + 10 = 0  ⇒  qₓ = 0.0127
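The root can be extracted with the quadratic formula; a minimal sketch (solve_qx is my own helper; for both examples in this section the smaller root is the one lying in (0, 1)):

```python
import math

def solve_qx(a, b, c):
    """Smaller root of a*q^2 + b*q + c = 0."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a)

print(solve_qx(700, -797, 10))   # ≈ 0.0127
```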

Example ∗. In a one-year mortality study on ten lives of age x, three withdrawals occur at time 0.4 and one death is observed. Mortality is assumed to have a uniform distribution. Determine the maximum likelihood estimate of qₓ.

Solution.

L(qₓ) = qₓ (1 − 0.4qₓ)³ (1 − qₓ)⁶

The log-likelihood:

l(qₓ) = ln(qₓ) + 3 ln(1 − 0.4qₓ) + 6 ln(1 − qₓ)

dl/dqₓ = 1/qₓ − 1.2/(1 − 0.4qₓ) − 6/(1 − qₓ)

Setting the derivative equal to zero gives us:

(1 − 0.4qₓ)(1 − qₓ) − 6qₓ(1 − 0.4qₓ) − 1.2qₓ(1 − qₓ) = 0

4qₓ² − 8.6qₓ + 1 = 0  ⇒  qₓ = 0.1234
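With the same helper as in the previous example:

```python
print(solve_qx(4, -8.6, 1))   # ≈ 0.1234
```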
