ECE531 Screencast 2.2: Fisher Information for Estimating a Scalar Parameter

Summary:
1) Fisher information measures how sensitive the likelihood function is to changes in the parameter being estimated. It averages the squared relative steepness of the likelihood function over all possible observations.
2) Fisher information is additive for independent observations: the Fisher information of multiple independent observations is the sum of the Fisher information of each individual observation.
3) For i.i.d. observations, the Fisher information of n observations is n times the Fisher information of a single observation.


ECE531 Screencast 2.2: Fisher Information for Estimating a Scalar Parameter

D. Richard Brown III
Worcester Polytechnic Institute



A Definition of “Sensitivity” (Scalar Parameter – part 1)

◮ We require the likelihood function pY(y; θ) to be differentiable with
respect to θ for each y ∈ Y.
◮ Holding y fixed, the relative steepness of the likelihood function
pY(y; θ) (as a function of θ) can be expressed as

$$\psi(y;\theta) := \frac{\frac{\partial}{\partial\theta}\,p_Y(y;\theta)}{p_Y(y;\theta)} = \frac{\partial}{\partial\theta}\ln p_Y(y;\theta)$$

◮ Two problems:
1. We don't care whether the relative steepness is positive or negative, so
we should square this result to give a non-negative measure of squared
relative steepness.
2. This "relative steepness" ψ(y; θ) or "squared relative steepness"
ψ²(y; θ) is only for a particular observation Y = y. We need to
average this result over Y (holding θ fixed). A small numerical sketch
of ψ follows this list.
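
As a minimal sketch (not from the slides), the relative steepness ψ(y; θ) can be computed numerically and compared against its analytic value. The Gaussian model Y = θ + W with W ∼ N(0, σ²), used in the example later in this deck, is assumed here; the values of σ, y, and θ are arbitrary illustrative choices:

```python
# Sketch: numerically evaluate psi(y; theta) = d/dtheta ln pY(y; theta) via a
# central finite difference, assuming the Gaussian model Y = theta + W with
# W ~ N(0, sigma^2). All numeric values are illustrative.
from scipy.stats import norm

sigma = 2.0                        # illustrative noise standard deviation

def log_likelihood(y, theta):
    return norm.logpdf(y, loc=theta, scale=sigma)

def psi(y, theta, h=1e-6):
    # Central finite difference of ln pY(y; theta) with respect to theta.
    return (log_likelihood(y, theta + h) - log_likelihood(y, theta - h)) / (2 * h)

y, theta = 1.3, 0.5                # arbitrary observation and parameter values
print(psi(y, theta))               # ~0.2
print((y - theta) / sigma**2)      # analytic score for this model: 0.2
```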




A Definition of “Sensitivity” (Scalar Parameter – part 2)


◮ Averaging the squared relative steepness: we compute the mean
squared value of ψ as

$$
\begin{aligned}
I(\theta) := E[\psi^2(Y;\theta)] &= E\left[\left(\frac{\partial}{\partial\theta}\ln p_Y(Y;\theta)\right)^2\right] \\
&= \int_{\mathcal{Y}}\left(\frac{\partial}{\partial\theta}\ln p_Y(y;\theta)\right)^2 p_Y(y;\theta)\,dy \\
&= \int_{\mathcal{Y}}\frac{\left(\frac{\partial}{\partial\theta}\,p_Y(y;\theta)\right)^2}{p_Y(y;\theta)}\,dy
\end{aligned}
$$

◮ Terminology: I(θ) is called the "Fisher information" that the random
observation Y can tell us, on average, about the parameter θ.
◮ The Fisher information I(θ) is only a function of θ (and other known
quantities). It is not a function of y.
◮ Fisher information ≠ mutual information (information theory).
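
The integral form of I(θ) can be checked by numerical quadrature. This is a sketch, assuming the Gaussian likelihood N(θ, σ²) from the example on the next slide, with illustrative values of θ and σ:

```python
# Sketch: evaluate the integral definition of I(theta) by quadrature, assuming
# the Gaussian likelihood pY(y; theta) = N(theta, sigma^2). Values illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma, theta = 2.0, 0.5

def integrand(y):
    score = (y - theta) / sigma**2             # d/dtheta ln pY(y; theta)
    return score**2 * norm.pdf(y, loc=theta, scale=sigma)

I_numeric, _ = quad(integrand, -np.inf, np.inf)
print(I_numeric, 1 / sigma**2)                 # both ~0.25
```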

Example: Single Sample of Unknown Parameter in Noise


Suppose we get one sample of an unknown parameter θ ∈ ℝ corrupted by
zero-mean additive Gaussian noise, i.e., Y = θ + W where W ∼ N(0, σ²).
The likelihood function is then

$$p_Y(y;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(\frac{-(y-\theta)^2}{2\sigma^2}\right)$$

The relative slope of pY(y; θ) with respect to θ can be easily computed:

$$\psi(y;\theta) := \frac{\frac{\partial}{\partial\theta}\,p_Y(y;\theta)}{p_Y(y;\theta)} = \frac{y-\theta}{\sigma^2}$$

The Fisher information is then

$$
\begin{aligned}
I(\theta) = E[\psi^2(Y;\theta)] &= \int_{-\infty}^{\infty}\left(\frac{y-\theta}{\sigma^2}\right)^2\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(\frac{-(y-\theta)^2}{2\sigma^2}\right)dy \\
&= \frac{1}{\sqrt{2\pi}\,\sigma^2}\int_{-\infty}^{\infty} t^2\exp\left(\frac{-t^2}{2}\right)dt = \frac{1}{\sigma^2}
\end{aligned}
$$

where the second line uses the substitution t = (y − θ)/σ.
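
A quick Monte Carlo sanity check of this result (a sketch; the θ, σ, and sample-size values are arbitrary): draw many realizations of Y, average the squared score, and compare with the closed form 1/σ².

```python
# Sketch: Monte Carlo estimate of I(theta) = E[psi^2(Y; theta)] for the
# Gaussian example, compared against the closed form 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 0.5, 2.0, 1_000_000       # illustrative values

Y = theta + sigma * rng.standard_normal(n)  # Y = theta + W, W ~ N(0, sigma^2)
psi = (Y - theta) / sigma**2                # score of each observation
print(np.mean(psi**2), 1 / sigma**2)        # ~0.25 vs 0.25
```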

Fisher Information: Alternative Derivation


If ∂²/∂θ² pY(y; θ) exists for all θ ∈ Λ and y ∈ Y and

$$\int_{\mathcal{Y}}\frac{\partial^2}{\partial\theta^2}\,p_Y(y;\theta)\,dy = \frac{\partial^2}{\partial\theta^2}\int_{\mathcal{Y}} p_Y(y;\theta)\,dy = 0$$

then we can derive an alternative (equivalent) expression for the Fisher
information as follows:

$$
\begin{aligned}
E\left[\frac{\partial^2}{\partial\theta^2}\ln p_Y(Y;\theta)\right] &= E\left[\frac{\frac{\partial^2}{\partial\theta^2}\,p_Y(Y;\theta)}{p_Y(Y;\theta)} - \left(\frac{\frac{\partial}{\partial\theta}\,p_Y(Y;\theta)}{p_Y(Y;\theta)}\right)^2\right] \\
&= E\left[\frac{\frac{\partial^2}{\partial\theta^2}\,p_Y(Y;\theta)}{p_Y(Y;\theta)}\right] - I(\theta) \\
&= \int_{y\in\mathcal{Y}}\frac{\frac{\partial^2}{\partial\theta^2}\,p_Y(y;\theta)}{p_Y(y;\theta)}\,p_Y(y;\theta)\,dy - I(\theta) \\
&= \int_{y\in\mathcal{Y}}\frac{\partial^2}{\partial\theta^2}\,p_Y(y;\theta)\,dy - I(\theta) = 0 - I(\theta)
\end{aligned}
$$

Hence $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln p_Y(Y;\theta)\right]$.
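
This second-derivative form can be checked numerically for the Gaussian example. A sketch (the parameter values and finite-difference step are illustrative assumptions):

```python
# Sketch: verify I(theta) = -E[d^2/dtheta^2 ln pY(Y; theta)] for the Gaussian
# example, using a central finite-difference second derivative in theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta, sigma, n, h = 0.5, 2.0, 200_000, 1e-4   # illustrative values

Y = theta + sigma * rng.standard_normal(n)

def ll(t):
    return norm.logpdf(Y, loc=t, scale=sigma)  # ln pY(Y; t), per sample

second = (ll(theta + h) - 2 * ll(theta) + ll(theta - h)) / h**2
print(-np.mean(second), 1 / sigma**2)          # both ~0.25
```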

Additive Information from Independent Observations

Lemma
If Y1 and Y2 are independent random variables with densities pY1(y; θ)
and pY2(y; θ) parameterized by θ, then

$$I(\theta) = I_{Y_1}(\theta) + I_{Y_2}(\theta)$$

where IY1(θ), IY2(θ), and I(θ) are the information about θ contained in
Y1, Y2, and {Y1, Y2}, respectively.

Corollary
If Y0, . . . , Yn−1 are i.i.d., and each has information I(θ) about θ, then the
information in {Y0, . . . , Yn−1} about θ is nI(θ).
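
The lemma holds because the joint log-likelihood of independent observations is a sum, so the joint score is the sum of the individual scores. A Monte Carlo sketch for two i.i.d. observations from the earlier Gaussian example (illustrative values; each observation carries 1/σ², so the pair should carry 2/σ²):

```python
# Sketch: additivity of Fisher information for independent observations.
# For i.i.d. Y1, Y2 from the Gaussian example, IY1(theta) = IY2(theta)
# = 1/sigma^2, so the joint information should be 2/sigma^2.
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n = 0.5, 2.0, 1_000_000      # illustrative values

Y1 = theta + sigma * rng.standard_normal(n)
Y2 = theta + sigma * rng.standard_normal(n)

# ln p(y1, y2; theta) = ln p(y1; theta) + ln p(y2; theta), so scores add.
joint_score = (Y1 - theta) / sigma**2 + (Y2 - theta) / sigma**2
print(np.mean(joint_score**2), 2 / sigma**2)   # ~0.5 vs 0.5
```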




Appendix: Useful Calculus Results

These results were used in the derivations:

$$\frac{\partial}{\partial\theta}\ln f(\theta) = \frac{\frac{\partial}{\partial\theta} f(\theta)}{f(\theta)}$$

$$
\begin{aligned}
\frac{\partial^2}{\partial\theta^2}\ln f(\theta) &= \frac{\partial}{\partial\theta}\left[\frac{\frac{\partial}{\partial\theta} f(\theta)}{f(\theta)}\right] \\
&= \frac{\frac{\partial^2}{\partial\theta^2} f(\theta)\,f(\theta) - \left(\frac{\partial}{\partial\theta} f(\theta)\right)^2}{f^2(\theta)} \\
&= \frac{\frac{\partial^2}{\partial\theta^2} f(\theta)}{f(\theta)} - \left(\frac{\partial}{\partial\theta}\ln f(\theta)\right)^2
\end{aligned}
$$
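
As a quick sketch, the second identity can also be confirmed symbolically with sympy and a generic function f(θ):

```python
# Sketch: symbolic check of
#   d^2/dtheta^2 ln f = (d^2/dtheta^2 f)/f - (d/dtheta ln f)^2
import sympy as sp

theta = sp.symbols('theta')
f = sp.Function('f')(theta)

lhs = sp.diff(sp.log(f), theta, 2)
rhs = sp.diff(f, theta, 2) / f - sp.diff(sp.log(f), theta)**2
print(sp.simplify(lhs - rhs))    # 0
```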

