Lecture 15: Fisher Information
Introducing the digamma function ψ(α) = Γ′(α)/Γ(α), the MLE α̂ is obtained by (numerically) solving

0 = l′(α) = −nψ(α) + ∑_{i=1}^n log Xi.
What is the sampling distribution of α̂? We compute
∂²/∂α² log f(x|α) = −ψ′(α).

As this does not depend on x, the Fisher information is I(α) = −Eα[−ψ′(α)] = ψ′(α). Then for large n, α̂ is distributed approximately as N(α, 1/(nψ′(α))).
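To make this concrete, here is a minimal Python sketch (assuming numpy/scipy are available; the true value α = 3, the sample size n = 500, and the simulated data are illustrative choices, not part of the notes) that solves the score equation 0 = −nψ(α) + ∑ log Xi numerically and reports the large-sample standard error 1/√(nψ′(α̂)).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, polygamma
from scipy.stats import gamma

# Illustrative data from Gamma(alpha, 1) with a made-up true shape alpha = 3.
rng = np.random.default_rng(0)
alpha_true, n = 3.0, 500
x = gamma.rvs(alpha_true, size=n, random_state=rng)

# Score equation for the Gamma(alpha, 1) model, scaled by 1/n:
# 0 = mean(log X_i) - psi(alpha).  digamma is increasing, so brentq can bracket the root.
mean_log = np.mean(np.log(x))
alpha_hat = brentq(lambda a: mean_log - digamma(a), 1e-6, 1e6)

# Approximate standard error from I(alpha) = psi'(alpha): alpha_hat ~ N(alpha, 1/(n psi'(alpha))).
se = 1.0 / np.sqrt(n * polygamma(1, alpha_hat))
print(f"alpha_hat = {alpha_hat:.4f}, approximate SE = {se:.4f}")
```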
Asymptotic normality of the MLE extends naturally to the setting of multiple parameters:
Theorem 15.2. Let {f(x|θ) : θ ∈ Ω} be a parametric model, where θ ∈ ℝᵏ has k parameters. Let X1, . . . , Xn be IID from f(x|θ) for θ ∈ Ω, and let θ̂n be the MLE based on X1, . . . , Xn. Define the Fisher information matrix I(θ) ∈ ℝᵏˣᵏ as the matrix whose (i, j) entry is given by the equivalent expressions

I(θ)ij = Covθ[ ∂/∂θi log f(X|θ), ∂/∂θj log f(X|θ) ] = −Eθ[ ∂²/(∂θi ∂θj) log f(X|θ) ].   (15.1)
Then under the same conditions as Theorem 14.1,
√n (θ̂n − θ) → N(0, I(θ)⁻¹),

where I(θ)⁻¹ is the k × k matrix inverse of I(θ) (and the distribution on the right is the multivariate normal distribution having this covariance).

(For k = 1, this definition of I(θ) is exactly the same as our previous definition, and I(θ)⁻¹ is just 1/I(θ). The proof of the above result is analogous to the k = 1 case from last lecture, employing a multivariate Taylor expansion of the equation 0 = ∇l(θ̂) around θ̂ = θ0.)
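As an illustrative check of Theorem 15.2 (not part of the notes), the following Python sketch estimates I(θ) for the two-parameter Gamma(α, β) model directly from the first expression in (15.1), using the partial derivatives of its log-density (derived in Example 15.3 below), and compares I(θ)⁻¹ with the simulated covariance of √n(θ̂n − θ), where the MLE is computed with scipy.stats.gamma.fit. The values α = 3, β = 2, n = 400 are made up.

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import gamma

alpha, beta, n, reps = 3.0, 2.0, 400, 1000   # illustrative settings
rng = np.random.default_rng(0)

# Estimate I(theta) from the score covariance (first expression in (15.1)).
x = gamma.rvs(alpha, scale=1.0 / beta, size=500_000, random_state=rng)
score = np.column_stack([np.log(beta) - digamma(alpha) + np.log(x),  # d/d alpha of log f(x|alpha,beta)
                         alpha / beta - x])                          # d/d beta  of log f(x|alpha,beta)
I_mc = np.cov(score, rowvar=False)

# Simulate the sampling distribution of sqrt(n) * (theta_hat - theta).
devs = []
for _ in range(reps):
    xs = gamma.rvs(alpha, scale=1.0 / beta, size=n, random_state=rng)
    a_hat, _, scale_hat = gamma.fit(xs, floc=0)          # MLE with the location parameter fixed at 0
    devs.append(np.sqrt(n) * np.array([a_hat - alpha, 1.0 / scale_hat - beta]))

print("empirical covariance of sqrt(n)(theta_hat - theta):\n", np.cov(np.array(devs), rowvar=False))
print("I(theta)^{-1} from the Monte Carlo Fisher information:\n", np.linalg.inv(I_mc))
```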
Example 15.3. Consider now the full Gamma model, X1, . . . , Xn IID from Gamma(α, β). Numerical computation of the MLEs α̂ and β̂ in this model was discussed in Lecture 13. To
approximate their sampling distributions, note
log f(x|α, β) = log[ (β^α / Γ(α)) x^{α−1} e^{−βx} ] = α log β − log Γ(α) + (α − 1) log x − βx,
so
∂²/∂α² log f(x|α, β) = −ψ′(α),   ∂²/(∂α ∂β) log f(x|α, β) = 1/β,   ∂²/∂β² log f(x|α, β) = −α/β².
These partial derivatives again do not depend on x, so the Fisher information matrix is
I(α, β) = [ ψ′(α)   −1/β ]
          [ −1/β    α/β² ],

so for large n, (α̂, β̂) is distributed approximately as N((α, β), (1/n) I(α, β)⁻¹).
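Here is a short sketch (with illustrative values α = 3, β = 2 and a hypothetical sample size n = 500, not from the notes) that builds this matrix with scipy.special.polygamma and forms the approximate covariance I(α, β)⁻¹/n of (α̂, β̂), whose diagonal gives approximate variances for the two estimates.

```python
import numpy as np
from scipy.special import polygamma

alpha, beta, n = 3.0, 2.0, 500                 # illustrative values

# Closed-form Fisher information matrix for Gamma(alpha, beta); polygamma(1, .) is psi'.
I = np.array([[polygamma(1, alpha), -1.0 / beta],
              [-1.0 / beta,          alpha / beta**2]])

# Approximate covariance of (alpha_hat, beta_hat) based on n observations.
cov_approx = np.linalg.inv(I) / n
se_alpha, se_beta = np.sqrt(np.diag(cov_approx))
print("approximate covariance:\n", cov_approx)
print(f"approx SE(alpha_hat) = {se_alpha:.4f}, approx SE(beta_hat) = {se_beta:.4f}")
```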
More generally, for any 2 × 2 Fisher information matrix

I = [ a   b ]
    [ b   c ],

the first definition of equation (15.1) implies that a, c ≥ 0. The upper-left element of I⁻¹ is 1/(a − b²/c), which (since b²/c ≥ 0) is always at least 1/a. This implies, for any model with a single parameter
θ1 that is contained inside a larger model with parameters (θ1 , θ2 ), that the variability of
the MLE for θ1 in the larger model is always at least that of the MLE for θ1 in the smaller
model; they are equal when the off-diagonal entry b is equal to 0. The same observation is
true for any number of parameters k ≥ 2 in the larger model.
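Continuing the Gamma example with the same illustrative α = 3, β = 2, the sketch below compares the asymptotic variance of √n(α̂ − α) in the two-parameter model, the upper-left entry of I(α, β)⁻¹, with the variance 1/ψ′(α) obtained when β is treated as known, illustrating the inflation described above.

```python
import numpy as np
from scipy.special import polygamma

alpha, beta = 3.0, 2.0                        # illustrative values
a, b, c = polygamma(1, alpha), -1.0 / beta, alpha / beta**2

I = np.array([[a, b], [b, c]])
var_full    = np.linalg.inv(I)[0, 0]          # asymptotic variance of sqrt(n)(alpha_hat - alpha), beta unknown
var_reduced = 1.0 / a                         # same quantity when beta is treated as known

print(var_full, 1.0 / (a - b**2 / c))         # equal: the upper-left entry of I^{-1} is 1/(a - b^2/c)
print("inflation factor:", var_full / var_reduced)
```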
This is a simple example of a trade-off between model complexity and accuracy of esti-
mation, which is fundamental to many areas of statistics and machine learning: a complex
model with more parameters might better capture the true distribution of data, but these
parameters will also be more difficult to estimate than those in a simpler model.
I(θ0 ) measures the expected curvature of the log-likelihood function l(θ) around the true
parameter θ = θ0 . If l(θ) is sharply curved around θ0 —in other words, I(θ0 ) is large—then a
small change in θ can lead to a large decrease in the log-likelihood l(θ), and hence the data
provides a lot of “information” that the true value of θ is close to θ0 . Conversely, if I(θ0 )
is small, then a small change in θ does not affect l(θ) by much, and the data provides less
information about θ. In this (heuristic) sense, I(θ0 ) quantifies the amount of information
that each observation Xi contains about the unknown parameter.
The Fisher information I(θ) is an intrinsic property of the model {f (x|θ) : θ ∈ Ω}, not
of any specific estimator. (We’ve shown that it is related to the variance of the MLE, but
its definition does not involve the MLE.) There are various information-theoretic results
stating that I(θ) describes a fundamental limit to how accurate any estimator of θ based on
X1, . . . , Xn can be. We'll prove one such result, called the Cramér-Rao lower bound:

Theorem (Cramér-Rao lower bound). Let T = T(X1, . . . , Xn) be any unbiased estimator of θ based on X1, . . . , Xn IID from f(x|θ). Then, under appropriate regularity conditions,

Varθ[T] ≥ 1/(nI(θ)).
Proof. Recall the score function

z(x, θ) = ∂/∂θ log f(x|θ) = (∂/∂θ f(x|θ)) / f(x|θ),
and let Z := Z(X1, . . . , Xn, θ) = ∑_{i=1}^n z(Xi, θ). By the definition of correlation and the fact that the correlation of two random variables is always between −1 and 1,

Covθ[T, Z]² ≤ Varθ[T] · Varθ[Z],   i.e.,   Varθ[T] ≥ Covθ[T, Z]² / Varθ[Z].

The random variables z(X1, θ), . . . , z(Xn, θ) are IID, and by Lemma 14.1, they have mean 0 and variance I(θ). Then

Varθ[Z] = ∑_{i=1}^n Varθ[z(Xi, θ)] = nI(θ).
Since T is unbiased,
θ = Eθ[T] = ∫_{ℝⁿ} T(x1, . . . , xn) f(x1|θ) × · · · × f(xn|θ) dx1 · · · dxn.
Differentiating both sides with respect to θ and applying the product rule of differentiation,
1 = ∫_{ℝⁿ} T(x1, . . . , xn) [ (∂/∂θ f(x1|θ)) × f(x2|θ) × · · · × f(xn|θ)
                             + f(x1|θ) × (∂/∂θ f(x2|θ)) × · · · × f(xn|θ) + · · ·
                             + f(x1|θ) × f(x2|θ) × · · · × (∂/∂θ f(xn|θ)) ] dx1 · · · dxn

  = ∫_{ℝⁿ} T(x1, . . . , xn) Z(x1, . . . , xn, θ) f(x1|θ) × · · · × f(xn|θ) dx1 · · · dxn

  = Eθ[T Z].
Since Eθ[Z] = 0, this implies Covθ[T, Z] = Eθ[T Z] = 1, so Varθ[T] ≥ 1/(nI(θ)), as desired.
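As an illustrative sanity check (not part of the notes; α = 3, n = 200, and 2000 replications are arbitrary choices), the following simulation estimates the variance of the MLE α̂ in the Gamma(α, 1) model over repeated samples and compares it with 1/(nψ′(α)); the two are close for large n, consistent with the MLE being asymptotically efficient.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, polygamma
from scipy.stats import gamma

alpha_true, n, reps = 3.0, 200, 2000      # illustrative settings
rng = np.random.default_rng(1)

def mle_shape(x):
    """MLE of alpha in the Gamma(alpha, 1) model: solve mean(log X_i) = psi(alpha)."""
    m = np.mean(np.log(x))
    return brentq(lambda a: m - digamma(a), 1e-6, 1e6)

alpha_hats = np.array([mle_shape(gamma.rvs(alpha_true, size=n, random_state=rng))
                       for _ in range(reps)])

print("simulated Var(alpha_hat):       ", alpha_hats.var())
print("Cramer-Rao bound 1/(n psi'(a)): ", 1.0 / (n * polygamma(1, alpha_true)))
```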
For two unbiased estimators of θ, the ratio of their variances is called their relative efficiency. An unbiased estimator is efficient if its variance equals the lower bound 1/(nI(θ)). Since the MLE achieves this lower bound asymptotically, we say it is asymptotically efficient.
The Cramér-Rao bound ensures that no unbiased estimator can achieve asymptotically lower variance than the MLE. Stronger results, which we will not prove in this class, in fact show that no estimator, biased or unbiased, can asymptotically achieve lower mean-squared-error than 1/(nI(θ)), except possibly on a small set of special values θ ∈ Ω.¹ In particular,
when the method-of-moments estimator differs from the MLE, we expect it to have higher
mean-squared-error than the MLE for large n, which explains why the MLE is usually the
preferred estimator in simple parametric models.
¹ For example, the constant estimator θ̂ = c for fixed c ∈ Ω achieves 0 mean-squared-error if the true parameter happened to be the special value c, but at all other parameter values is worse than the MLE for sufficiently large n.
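To illustrate this last point, here is a simulation sketch comparing the mean-squared-error of the method-of-moments estimator of α with that of the MLE in the Gamma(α, β) model (the MLE is computed with scipy.stats.gamma.fit with the location fixed at 0; the values α = 3, β = 2, n = 100 are illustrative, not from the notes).

```python
import numpy as np
from scipy.stats import gamma

alpha_true, beta_true, n, reps = 3.0, 2.0, 100, 1000   # illustrative settings
rng = np.random.default_rng(2)

mom_sq_err, mle_sq_err = [], []
for _ in range(reps):
    x = gamma.rvs(alpha_true, scale=1.0 / beta_true, size=n, random_state=rng)
    # Method of moments: match mean = alpha/beta and variance = alpha/beta^2.
    m, v = x.mean(), x.var()
    alpha_mom = m**2 / v
    # MLE via scipy's numerical fit, with the location parameter fixed at 0.
    alpha_mle, _, _ = gamma.fit(x, floc=0)
    mom_sq_err.append((alpha_mom - alpha_true) ** 2)
    mle_sq_err.append((alpha_mle - alpha_true) ** 2)

print("method-of-moments MSE for alpha:", np.mean(mom_sq_err))
print("MLE MSE for alpha:              ", np.mean(mle_sq_err))
```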