
ST2132

Topic 06: Large Sample Theory for MLE


part a: Asymptotic Normality, Confidence Intervals

Semester 1 20/21

1 / 22
Introduction

I The fact that many MLEs are consistent and asymptotically normal is
of great importance. In particular, large-sample confidence intervals
are feasible.

I This can be viewed as the parametric version of the fact in survey
sampling that X̄ is asymptotically normal. There, the population is
non-parametric, i.e., not described by a simple density.

I Not surprisingly, the common underlying tool is the Central Limit
Theorem. We will explore that in the heuristic proof in part b.

2 / 22
Theorem: Asymptotic normality of MLE

I Let X1 , . . . , Xn be IID with density f (·|θ), where θ is an unknown
constant in the parameter space Θ ⊂ R. Let θ̂ be the MLE of θ. As
n → ∞,

      √(nI(θ)) (θ̂ − θ) → N(0, 1) in distribution

I Consequently, for large n, approximately

      θ̂ ∼ N(θ, I(θ)⁻¹/n)

In particular, as n → ∞, θ̂ → θ: the MLE is consistent.
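A minimal Monte Carlo sketch of this result (my own illustration, not from the slides; it assumes numpy is available and uses the Exponential(rate λ) model, for which the MLE is 1/X̄ and I(λ) = 1/λ²):

```python
import numpy as np

# Monte Carlo sketch (illustration only): for Exponential(rate lam) data,
# the MLE is 1/xbar and I(lam) = 1/lam^2, so sqrt(n I(lam)) (mle - lam)
# should be approximately N(0, 1) when n is large.
rng = np.random.default_rng(0)
lam, n, reps = 2.0, 500, 10_000

x = rng.exponential(scale=1 / lam, size=(reps, n))   # reps samples of size n
mle = 1 / x.mean(axis=1)                             # MLE of the rate in each sample
z = np.sqrt(n / lam**2) * (mle - lam)                # sqrt(n I(lam)) (theta_hat - theta)

print(z.mean(), z.std())                             # should be close to 0 and 1
```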

3 / 22
Asymptotic normality of MLE: the vector version

I Let X1 , . . . , Xn be IID with density f (·|θ), where θ is an unknown
vector in the parameter space Θ ⊂ Rp . Let θ̂ be the MLE of θ. As
n → ∞,

      √(nI(θ)) (θ̂ − θ) → N(0, Ip ) in distribution

I For large n, approximately

      θ̂ ∼ N(θ, I(θ)⁻¹/n)

4 / 22
Interpretation

I Recall that nI(θ) is the amount of information in n IID samples with
density f (·|θ).

I The asymptotic variance of the MLE is inversely proportional to the
sample size n. The notation I(θ)⁻¹ emphasises this point: it is
similar to σ² in sample survey (slide 16, 3 sur sam a.pdf).

5 / 22
The Poisson

I X1 , . . . , Xn IID Poisson(λ). θ = λ. θ̂ = X̄ . I(θ) = 1/λ. According to
the theorem, if n is large, approximately

      X̄ ∼ N(λ, λ/n)
I The theorem confirms what we already know (slides 11 and 13,
4 par est a.pdf).
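A quick simulation sketch of this approximation (my own check, assuming numpy; λ and n are arbitrary choices):

```python
import numpy as np

# Sketch: for Poisson(lam) data, the theorem says X-bar ~ N(lam, lam/n) approximately.
rng = np.random.default_rng(1)
lam, n, reps = 3.0, 200, 10_000

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
print(xbar.mean(), xbar.var())   # approximately lam and lam/n
print(lam, lam / n)              # theoretical values
```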

6 / 22
The normal case (a)

I X1 , . . . , Xn IID N(µ, σ²). θ = (µ, σ). θ̂ = (X̄ , σ̂).

      I(θ) = [ 1/σ²  0 ; 0  2/σ² ]

I The vector-version of the theorem implies that if n is large,
approximately

      (X̄ , σ̂) ∼ N( (µ, σ), [ σ²/n  0 ; 0  σ²/(2n) ] )

This is a bivariate normal distribution.

I We already know X̄ ∼ N(µ, σ²/n) exactly and X̄ ⊥ σ̂ . The
approximate normality of σ̂ is new.
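A simulation sketch of the new claim about σ̂ (my own check, assuming numpy; µ, σ and n are arbitrary):

```python
import numpy as np

# Sketch: sigma_hat (the MLE, dividing by n) should be approximately
# N(sigma, sigma^2/(2n)) for large n.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 3.0, 400, 10_000

x = rng.normal(mu, sigma, size=(reps, n))
sigma_hat = x.std(axis=1, ddof=0)                # MLE of sigma in each sample

print(sigma_hat.mean(), sigma_hat.std())         # approx sigma and sigma/sqrt(2n)
print(sigma, sigma / np.sqrt(2 * n))             # theoretical values
```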
7 / 22
The normal case (b)

I X1 , . . . , Xn IID N(µ, ν = σ²). θ = (µ, ν). θ̂ = (X̄ , σ̂²).

      I(θ) = [ 1/σ²  0 ; 0  1/(2σ⁴) ]

I If n is large,

      (X̄ , σ̂²) ∼ N( (µ, σ²), [ σ²/n  0 ; 0  2σ⁴/n ] )

I Again, the approximate normality of σ̂² is new.

8 / 22
The HWE (Rice page 283)
I Let W1 , . . . , Wn be IID Multinomial(1, p), where
p = ((1 − θ)², 2θ(1 − θ), θ²). Wi takes values (1,0,0), (0,1,0) and
(0,0,1) with these probabilities. W1 + · · · + Wn = X ∼ Multinomial(n, p).

I The random loglikelihood is

      L(θ) = Σ_{i=1}^n (Wi,1 log p1 + Wi,2 log p2 + Wi,3 log p3 )
           = (2X1 + X2 ) log(1 − θ) + (X2 + 2X3 ) log θ + X2 log 2

The MLE based on the W’s is the same as that based on X:

      θ̂ = (X2 + 2X3 )/(2n)
9 / 22
The HWE (continued)
I To avoid confusion with the Fisher information based on X, let the
Fisher information based on W be

      I∗(θ) = 2 / (θ(1 − θ))

I We apply the theorem to the n IID W’s. For large n, approximately

      θ̂ ∼ N(θ, θ(1 − θ)/(2n))

I Let I(θ) = nI∗(θ) be the Fisher information based on X. Then

      θ̂ ∼ N(θ, I(θ)⁻¹)

It is hard to apply the theorem directly on X, since the sample size is 1.
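A small numerical sketch of the HWE estimate and its plug-in SE (the genotype counts below are made up for illustration, not Rice's data; assumes numpy):

```python
import numpy as np

# Sketch: MLE and plug-in SE for theta under HWE, from hypothetical genotype
# counts (x1, x2, x3) with probabilities ((1-theta)^2, 2 theta(1-theta), theta^2).
x1, x2, x3 = 360, 480, 160
n = x1 + x2 + x3

theta_hat = (x2 + 2 * x3) / (2 * n)                   # MLE from the slide
se = np.sqrt(theta_hat * (1 - theta_hat) / (2 * n))   # from theta_hat ~ N(theta, theta(1-theta)/(2n))
print(theta_hat, se)
```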
10 / 22
The general trinomial distribution
I Let W1 , . . . , Wn be IID Multinomial(1, p), where θ = (p1 , p2 ). As in
HWE, the MLE based on the W’s is the same as that based on
X = W1 + · · · + Wn : θ̂ = (X1 /n, X2 /n).

I The Fisher information based on W is

      I∗(θ) = [ 1/p1 + 1/p3  1/p3 ; 1/p3  1/p2 + 1/p3 ]

I Applying the vector version of the theorem to the W’s, for large n,
approximately

      θ̂ ∼ N( (p1 , p2 ), [ p1 (1 − p1 )/n  −p1 p2 /n ; −p1 p2 /n  p2 (1 − p2 )/n ] )

We already know the expectation and variance are exact, but the
approximate normality is new.
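A quick numerical check (my own, assuming numpy) that inverting the per-observation information I∗(θ) gives the covariance matrix shown above:

```python
import numpy as np

# Check numerically that the inverse of I*(theta) equals
# [[p1(1-p1), -p1 p2], [-p1 p2, p2(1-p2)]]; p1 and p2 are arbitrary.
p1, p2 = 0.2, 0.5
p3 = 1 - p1 - p2

I_star = np.array([[1/p1 + 1/p3, 1/p3],
                   [1/p3, 1/p2 + 1/p3]])
print(np.linalg.inv(I_star))
print(np.array([[p1*(1 - p1), -p1*p2],
                [-p1*p2, p2*(1 - p2)]]))
```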
11 / 22
The SE

I Recall that the SE of an estimate of θ is defined as the SD of the
corresponding estimator θ̂. For a maximum likelihood estimate, the
theorem implies that

      SE = SD(θ̂) ≈ √(I(θ)⁻¹/n)

I Since θ is unknown, we use the bootstrap: calculate the Fisher
information at the estimate instead. If we switch notation, denoting
the estimate as θ̂, then

      SE ≈ √(I(θ̂)⁻¹/n)

12 / 22
Sufficient conditions for theorem

Suppose there is δ > 0 such that for each x ∈ R,

I The derivatives ∂ᵏ/∂θᵏ log f (x|θ), k = 1, 2, 3,
exist on (θ − δ, θ + δ) and are continuous.

I |∂³/∂θ³ log f (x|θ)| < M(x)
on (θ − δ, θ + δ), with Eθ (M) < K , a constant.

13 / 22
Sufficient conditions for theorem (continued)

I These conditions are satisfied in all our examples, and in practically
all applications.

I The first condition allows interchanging of differentiation and
integration.

I For the vector version, similar conditions are required, and I(θ) is
assumed to be invertible.

14 / 22
Random interval

This prepares the construction of large-sample CI for θ. Let θ̂ be the ML
estimator of θ. For large n,

      1 − α ≈ Pr( −zα/2 ≤ (θ̂ − θ)/√(I(θ)⁻¹/n) ≤ zα/2 )

so

      1 − α ≈ Pr( θ̂ − zα/2 √(I(θ)⁻¹/n) ≤ θ ≤ θ̂ + zα/2 √(I(θ)⁻¹/n) )

Unlike slide 3 of 3 sur sam b.pdf, in general SD(θ̂) is not exactly
√(I(θ)⁻¹/n).
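A simulation sketch of the coverage statement (my own illustration using the Poisson model, assuming numpy and scipy; as on this slide, the interval uses I(θ)⁻¹ at the true θ):

```python
import numpy as np
from scipy.stats import norm

# Sketch: empirical coverage of the random interval for a Poisson rate, with
# I(lam)^{-1} = lam evaluated at the true lam.
rng = np.random.default_rng(3)
lam, n, reps, alpha = 3.0, 200, 5_000, 0.05
z = norm.ppf(1 - alpha / 2)

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)   # ML estimator in each replicate
half = z * np.sqrt(lam / n)                            # z_{alpha/2} sqrt(I(lam)^{-1}/n)
covered = (xbar - half <= lam) & (lam <= xbar + half)
print(covered.mean())                                  # close to 1 - alpha
```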

15 / 22
Confidence interval

I Let θ̂ be the ML estimate of θ. For large n, an approximate
(1 − α)-CI for θ is

      ( θ̂ − zα/2 √(I(θ̂)⁻¹/n) , θ̂ + zα/2 √(I(θ̂)⁻¹/n) )

I The approximate SE √(I(θ)⁻¹/n) is estimated by √(I(θ̂)⁻¹/n) (the
bootstrap).

I The CI is a realisation of a random interval, so it is fixed. θ̂ is a
realisation of the ML estimator. θ is either in the confidence interval
or not, and we will not know which is the case.
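A generic sketch of this interval (the function name and arguments are my own, not from the slides; assumes numpy and scipy):

```python
import numpy as np
from scipy.stats import norm

def wald_ci(theta_hat, info_at_estimate, n, alpha=0.05):
    """Large-sample CI: theta_hat -/+ z_{alpha/2} * sqrt(I(theta_hat)^{-1} / n)."""
    z = norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(1 / (info_at_estimate * n))
    return theta_hat - half, theta_hat + half

# Hypothetical usage: estimate 0.43 with per-observation information 5.4 from n = 150.
print(wald_ci(0.43, 5.4, 150))
```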

16 / 22
CI for Poisson rate λ

I θ = λ, estimated by θ̂ = x̄. I(θ)⁻¹ = λ, estimated by I(θ̂)⁻¹ = x̄.

I For large n, an approximate (1 − α)-CI for λ is

      ( x̄ − zα/2 √(x̄/n) , x̄ + zα/2 √(x̄/n) )

This is the same as slide 13 of 4 par est a.pdf.
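A worked sketch on simulated data (the rate and sample size are made up; assumes numpy and scipy):

```python
import numpy as np
from scipy.stats import norm

# Sketch: approximate 95% CI for a Poisson rate, ( xbar - z sqrt(xbar/n), xbar + z sqrt(xbar/n) ).
rng = np.random.default_rng(4)
x = rng.poisson(lam=4.0, size=200)       # simulated data, true rate 4

xbar, n = x.mean(), x.size
z = norm.ppf(0.975)
half = z * np.sqrt(xbar / n)
print(xbar - half, xbar + half)
```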

17 / 22
CI for µ and σ from N(µ, σ 2 )

I θ = (µ, σ), estimated by θ̂ = (x̄, σ̂). I(θ)⁻¹ is estimated by

      I(θ̂)⁻¹ = [ σ̂²  0 ; 0  σ̂²/2 ]

I For large n, an approximate (1 − α)-CI for µ is

      ( x̄ − zα/2 σ̂/√n , x̄ + zα/2 σ̂/√n )

an approximate (1 − α)-CI for σ is

      ( σ̂ − zα/2 σ̂/√(2n) , σ̂ + zα/2 σ̂/√(2n) )
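A worked sketch on simulated data (the numbers are made up; assumes numpy and scipy), using the MLE σ̂ with divisor n:

```python
import numpy as np
from scipy.stats import norm

# Sketch: approximate 95% CIs for mu and sigma from N(mu, sigma^2) data.
rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=500)      # simulated data

n = x.size
xbar = x.mean()
sigma_hat = x.std(ddof=0)                # MLE of sigma (divisor n, not n - 1)
z = norm.ppf(0.975)

print(xbar - z * sigma_hat / np.sqrt(n),
      xbar + z * sigma_hat / np.sqrt(n))          # CI for mu
print(sigma_hat - z * sigma_hat / np.sqrt(2 * n),
      sigma_hat + z * sigma_hat / np.sqrt(2 * n)) # CI for sigma
```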

18 / 22
CI for µ and σ 2 from N(µ, σ 2 )

I θ = (µ, σ²), estimated by θ̂ = (x̄, σ̂²). I(θ)⁻¹ is estimated by

      I(θ̂)⁻¹ = [ σ̂²  0 ; 0  2σ̂⁴ ]

I For large n, an approximate (1 − α)-CI for σ² is

      ( σ̂² − zα/2 σ̂² √(2/n) , σ̂² + zα/2 σ̂² √(2/n) )

For µ, the CI is the same as in the previous slide.

19 / 22
The bivariate normal distribution
I The density on Rice page 81 can be written as

      f (x) = 1/(2π |Σ|^(1/2)) exp( −(1/2)(x − µ)′ Σ⁻¹ (x − µ) )

      x = (x1 , x2 )′,  µ = (µ1 , µ2 )′,  Σ = [ σ1²  ρσ1 σ2 ; ρσ1 σ2  σ2² ]

We write X ∼ N(µ, Σ).

I It can be shown that any bivariate normal X can be written as
X = AZ + b ∼ N(b, AA′), where Z is 2 × 1 with IID N(0,1) components,
and A (2 × 2) and b (2 × 1) are constants.

I The multivariate normal density (x is p × 1) looks the same, except
that the power of 2π in the denominator is p/2.
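A simulation sketch of the representation X = AZ + b (my own, assuming numpy; A is taken to be the Cholesky factor of Σ, one valid choice since then AA′ = Σ):

```python
import numpy as np

# Sketch: generate bivariate normal draws as X = A Z + mu and check the
# sample mean and covariance against mu and Sigma.
rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 1.5, 0.6
Sigma = np.array([[sigma1**2,         rho*sigma1*sigma2],
                  [rho*sigma1*sigma2, sigma2**2]])

A = np.linalg.cholesky(Sigma)                 # A A' = Sigma
Z = rng.standard_normal(size=(2, 10_000))     # IID N(0,1) components
X = A @ Z + mu[:, None]                       # each column ~ N(mu, Sigma)

print(X.mean(axis=1))                         # approx mu
print(np.cov(X))                              # approx Sigma
```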
20 / 22
Examples

I Let Y1 , . . . , Yn be IID N(µ, σ²). What is the distribution of
(Y1 , . . . , Yn )?

I Let Y1 , . . . , Yn be independent, with Yi ∼ N(µi , σ²). What is the
distribution of (Y1 , . . . , Yn )?

21 / 22
Linear regression

I Let Y1 , . . . , Yn be random variables with

      Yi = β1 xi1 + · · · + βp xip + εi

where
      X is a fixed known n × p matrix.
      p × 1 β is fixed unknown.
      n × 1 ε ∼ N(0, σ² In ), with σ² fixed unknown.
What is the joint distribution of the n × 1 Y? More compactly, we
can write

      Y = Xβ + ε

I Given realisation y of Y, how can we get ML estimates of β and σ²?
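A sketch of the standard answer to the last question (least squares for β, and the residual sum of squares divided by n for σ²); the simulated design and coefficients below are made up, and numpy is assumed:

```python
import numpy as np

# Sketch: ML estimates in the Gaussian linear model Y = X beta + eps,
# eps ~ N(0, sigma^2 I_n): beta_hat by least squares, sigma2_hat = RSS / n.
rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))                        # made-up design matrix
beta = np.array([1.0, -0.5, 2.0])                  # made-up true coefficients
y = X @ beta + rng.normal(scale=0.8, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # ML estimate of beta
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n                     # ML estimate of sigma^2
print(beta_hat, sigma2_hat)
```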

22 / 22
