
Chapter 3. Statistical Inference – Point Estimation


Problem in statistics:
A random variable X has a p.d.f. of the form f(x, θ), where the function f is known but the parameter θ is unknown. We want to gain knowledge about θ.
What we have for inference:
There is a random sample X1, . . . , Xn from f(x, θ).

Statistical inferences:
  Estimation
    Point estimation: θ̂ = θ̂(X1, . . . , Xn)
    Interval estimation: find statistics T1 = t1(X1, . . . , Xn), T2 = t2(X1, . . . , Xn) such that 1 − α = P(T1 ≤ θ ≤ T2)
  Hypothesis testing: H0 : θ = θ0 or H0 : θ ≥ θ0.
    We want to find a rule to decide whether we accept or reject H0.

Def. We call a statistic θ̂ = θ̂(X1, . . . , Xn) an estimator of the parameter θ if it is used to estimate θ. If X1 = x1, . . . , Xn = xn are observed, then θ̂ = θ̂(x1, . . . , xn) is called an estimate of θ.

Two problems arise in the estimation of θ:

(a) How can we evaluate an estimator θ̂ for its use in estimating θ? We need a criterion for this evaluation.

(b) Are there general rules for deriving estimators? We will introduce two methods for deriving estimators of θ.

Def. We call an estimator θ̂ unbiased for θ if it satisfies

Eθ(θ̂(X1, . . . , Xn)) = θ, ∀θ.

Eθ(θ̂(X1, . . . , Xn)) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} θ̂(x1, . . . , xn) f(x1, . . . , xn, θ) dx1 · · · dxn
= ∫_{−∞}^{∞} θ* fθ̂(θ*) dθ*, where θ̂ = θ̂(X1, . . . , Xn) is a r.v. with p.d.f. fθ̂(θ*).

Def. If Eθ(θ̂(X1, . . . , Xn)) ≠ θ for some θ, we say that θ̂ is a biased estimator.

Example: X1, . . . , Xn iid ∼ N(µ, σ²). Suppose that our interest is µ.
X1: Eµ(X1) = µ, so X1 is unbiased for µ.
(X1 + X2)/2: E((X1 + X2)/2) = µ, so it is unbiased for µ.
X̄: Eµ(X̄) = µ, so X̄ is unbiased for µ.

► an → a as n → ∞ if, for every ε > 0, there exists N > 0 such that |an − a| < ε whenever n ≥ N.
{Xn } is a sequence of r.v.’s. How can we define Xn −→ X as n −→ ∞?

Def. We say that Xn converges to X, a r.v. or a constant, in probability if, for every ε > 0,
P(|Xn − X| > ε) −→ 0, as n −→ ∞.
In this case, we denote Xn →ᴾ X.

Thm. If E(Xn) = a or E(Xn) −→ a, and Var(Xn) −→ 0, then Xn →ᴾ a.

Proof.

E[(Xn − a)²] = E[(Xn − E(Xn) + E(Xn) − a)²]
= E[(Xn − E(Xn))²] + (E(Xn) − a)² + 2E[(Xn − E(Xn))(E(Xn) − a)]
= Var(Xn) + (E(Xn) − a)²   (the cross term vanishes since E[Xn − E(Xn)] = 0)

Chebyshev's Inequality:
P(|Xn − X| ≥ ε) ≤ E(Xn − X)²/ε²   or   P(|X − µ| ≥ kσ) ≤ 1/k²

For ε > 0,
0 ≤ P(|Xn − a| > ε) = P((Xn − a)² > ε²) ≤ E(Xn − a)²/ε² = [Var(Xn) + (E(Xn) − a)²]/ε² −→ 0 as n −→ ∞.
⇒ P(|Xn − a| > ε) −→ 0 as n −→ ∞ ⇒ Xn →ᴾ a.

Thm. Weak Law of Large Numbers (WLLN)
If X1, . . . , Xn is a random sample with mean µ and finite variance σ², then X̄ →ᴾ µ.

Proof. E(X̄) = µ and Var(X̄) = σ²/n −→ 0 as n −→ ∞, so X̄ →ᴾ µ.
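The WLLN can also be illustrated numerically. The following Python sketch is not part of the original notes; the exponential distribution with mean µ = 2 and the random seed are assumptions chosen only for illustration. The deviation |X̄ − µ| should shrink as n grows.

```python
# A minimal simulation sketch of the WLLN: sample means concentrate around µ as n grows.
# The exponential model with µ = 2 is an illustrative assumption, not part of the notes.
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0
for n in [10, 100, 1000, 10000]:
    xbar = rng.exponential(scale=mu, size=n).mean()
    print(f"n = {n:6d}   sample mean = {xbar:.4f}   |mean - mu| = {abs(xbar - mu):.4f}")
```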

Def. We say that θ̂ is a consistent estimator of θ if θ̂ →ᴾ θ.
Example: X1, . . . , Xn is a random sample with mean µ and finite variance σ². Is X1 a consistent estimator of µ?
E(X1 )=µ, X1 is unbiased for µ.
Let ε > 0.
P(|X1 − µ| > ε) = 1 − P(|X1 − µ| ≤ ε) = 1 − P(µ − ε ≤ X1 ≤ µ + ε)
= 1 − ∫_{µ−ε}^{µ+ε} fX(x) dx > 0, which does not converge to 0 as n −→ ∞.
⇒ X1 is not a consistent estimator of µ.


E(X̄) = µ and Var(X̄) = σ²/n −→ 0 as n −→ ∞
⇒ X̄ →ᴾ µ
⇒ X̄ is a consistent estimator of µ.

► Unbiasedness and consistency are two basic requirements for a good estimator.

Moments:
Let X be a random variable having p.d.f. f(x, θ). The population kth moment is defined by
Eθ(X^k) = Σ_{all x} x^k f(x, θ) in the discrete case, or ∫_{−∞}^{∞} x^k f(x, θ) dx in the continuous case.
The sample kth moment is defined by (1/n) Σ_{i=1}^{n} Xi^k.
Note:
E((1/n) Σ_{i=1}^{n} Xi^k) = (1/n) Σ_{i=1}^{n} E(Xi^k) = (1/n) Σ_{i=1}^{n} Eθ(X^k) = Eθ(X^k)
⇒ The sample kth moment is unbiased for the population kth moment.
Var((1/n) Σ_{i=1}^{n} Xi^k) = (1/n²) Var(Σ_{i=1}^{n} Xi^k) = (1/n²) Σ_{i=1}^{n} Var(Xi^k) = (1/n) Var(X^k) −→ 0 as n −→ ∞.
⇒ (1/n) Σ_{i=1}^{n} Xi^k →ᴾ Eθ(X^k)
⇒ (1/n) Σ_{i=1}^{n} Xi^k is a consistent estimator of Eθ(X^k).

Let X1, . . . , Xn be a random sample with mean µ and variance σ². The sample variance is defined by S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)². We want to show that S² is unbiased for σ².

Var(X) = E[(X − µ)2 ] = E[X 2 − 2µX + µ2 ] = E(X 2 ) − µ2

⇒ E(X 2 ) = Var(X) + µ2 = Var(X) + (E(X))2


E(X̄) = µ, Var(X̄) = σ²/n, so E(X̄²) = Var(X̄) + (E(X̄))² = σ²/n + µ².
E(S²) = E[(1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)²] = (1/(n−1)) E[Σ_{i=1}^{n} Xi² − 2X̄ Σ_{i=1}^{n} Xi + nX̄²]
= (1/(n−1)) E[Σ_{i=1}^{n} Xi² − nX̄²] = (1/(n−1)) [Σ_{i=1}^{n} E(Xi²) − nE(X̄²)]
= (1/(n−1)) [nσ² + nµ² − n(σ²/n + µ²)] = (1/(n−1)) (n − 1)σ² = σ²
⇒ S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² is unbiased for σ².

n n
S² = (1/(n−1)) [Σ_{i=1}^{n} Xi² − nX̄²] = (n/(n−1)) [(1/n) Σ_{i=1}^{n} Xi² − X̄²] →ᴾ E(X²) − µ² = σ² + µ² − µ² = σ²
(X1, . . . , Xn are iid with mean µ and variance σ², so X1², . . . , Xn² are iid r.v.'s with mean E(X²) = µ² + σ²; by the WLLN, (1/n) Σ_{i=1}^{n} Xi² →ᴾ E(X²) = µ² + σ².)
⇒ S² →ᴾ σ², i.e., S² is a consistent estimator of σ².
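As a quick numerical check of the result above, the following Monte Carlo sketch (not part of the original notes; the normal model with σ² = 4, n = 10, and the seed are illustrative assumptions) estimates E(S²), which is close to σ², and also shows the divide-by-n variant, which appears later as the estimator σ̂² and has mean ((n−1)/n)σ².

```python
# A minimal Monte Carlo sketch comparing the unbiased sample variance S² (divide by n-1)
# with the divide-by-n version.  Normal data with σ² = 4 is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))

s2      = x.var(axis=1, ddof=1)   # S²  = (1/(n-1)) Σ (X_i - X̄)²
sig2hat = x.var(axis=1, ddof=0)   # σ̂² = (1/n)     Σ (X_i - X̄)²

print("mean of S²        ≈", s2.mean())       # ≈ σ² = 4          (unbiased)
print("mean of σ̂² (÷ n)  ≈", sig2hat.mean())  # ≈ (n-1)/n σ² = 3.6 (biased)
```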

Def. Let X1, . . . , Xn be a random sample from a distribution with p.d.f. f(x, θ).

(a) If θ is univariate, the method of moments estimator θ̂ solves X̄ = Eθ(X) for θ.

(b) If θ = (θ1, θ2) is bivariate, the method of moments estimator (θ̂1, θ̂2) solves for (θ1, θ2) the equations
X̄ = Eθ1,θ2(X),  (1/n) Σ_{i=1}^{n} Xi² = Eθ1,θ2(X²)

(c) If θ = (θ1, . . . , θk) is k-variate, the method of moments estimator (θ̂1, . . . , θ̂k) solves for (θ1, . . . , θk) the equations
(1/n) Σ_{i=1}^{n} Xi^j = Eθ1,...,θk(X^j),  j = 1, . . . , k

Example:

(a) X1, . . . , Xn iid ∼ Bernoulli(p).
Set X̄ = Ep(X) = p
⇒ The method of moments estimator of p is p̂ = X̄.
By the WLLN, p̂ = X̄ →ᴾ Ep(X) = p ⇒ p̂ is consistent for p.
E(p̂) = E(X̄) = E(X) = p ⇒ p̂ is unbiased for p.

(b) Let X1, . . . , Xn be a random sample from Poisson(λ).
Set X̄ = Eλ(X) = λ
⇒ The method of moments estimator of λ is λ̂ = X̄.
E(λ̂) = E(X̄) = λ ⇒ λ̂ is unbiased for λ.
λ̂ = X̄ →ᴾ E(X) = λ ⇒ λ̂ is consistent for λ.

(c) Let X1, . . . , Xn be a random sample with mean µ and variance σ², θ = (µ, σ²).
Set X̄ = Eµ,σ²(X) = µ and (1/n) Σ_{i=1}^{n} Xi² = Eµ,σ²(X²) = σ² + µ²
⇒ The method of moments estimators are µ̂ = X̄ and
σ̂² = (1/n) Σ_{i=1}^{n} Xi² − X̄² = (1/n) Σ_{i=1}^{n} (Xi − X̄)².
X̄ is an unbiased and consistent estimator of µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) E((1/(n−1)) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ²
⇒ σ̂² is not unbiased for σ².
σ̂² = (1/n) Σ_{i=1}^{n} Xi² − X̄² →ᴾ E(X²) − µ² = σ²
⇒ σ̂² is consistent for σ².
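The two moment equations in example (c) translate directly into a computation. The Python sketch below is illustrative only (normal data with µ = 5, σ² = 9 and the seed are assumptions): it matches the first two sample moments to E(X) and E(X²) and solves for µ̂ and σ̂².

```python
# A minimal sketch of the method of moments for θ = (µ, σ²):
# match the first two sample moments to E(X) and E(X²).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=5000)   # assumed data: µ = 5, σ² = 9

m1 = x.mean()              # (1/n) Σ X_i   = E(X)  = µ
m2 = (x**2).mean()         # (1/n) Σ X_i²  = E(X²) = σ² + µ²

mu_hat     = m1            # solve the first equation for µ
sigma2_hat = m2 - m1**2    # solve the second equation for σ²

print("mu_hat     =", mu_hat)       # close to 5
print("sigma2_hat =", sigma2_hat)   # close to 9 (biased, but consistent)
```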

Maximum Likelihood Estimator:

Let X1, . . . , Xn be a random sample with p.d.f. f(x, θ).
The joint p.d.f. of X1, . . . , Xn is
f(x1, . . . , xn, θ) = ∏_{i=1}^{n} f(xi, θ),  xi ∈ ℝ, i = 1, . . . , n

Let Θ be the space of possible values of θ. We call Θ the parameter space.

Def. The likelihood function of a random sample is its joint p.d.f.,
L(θ) = L(θ, x1, . . . , xn) = f(x1, . . . , xn, θ),  θ ∈ Θ,
regarded as a function of θ.
For (x1, . . . , xn) fixed, the value L(θ, x1, . . . , xn) is called the likelihood at θ.

Given observations x1, . . . , xn, the likelihood L(θ, x1, . . . , xn) is interpreted as the probability that X1 = x1, . . . , Xn = xn occurs when θ is true.

Def. Let θ̂ = θ̂(x1 , . . . , xn ) be any value of θ that maximizes L(θ, x1 , . . . , xn ).


Then we call θ̂ = θ̂(x1 , . . . , xn ) the maximum likelihood estimator (m.l.e)
of θ. When X1 = x1 , . . . , Xn = xn is observed, we call θ̂ = θ̂(x1 , . . . , xn ) the
maximum likelihood estimate of θ.

Note :

(a) Why the m.l.e.?
When L(θ1, x1, . . . , xn) ≥ L(θ2, x1, . . . , xn), we have more reason to believe θ = θ1 than θ = θ2.

(b) How to derive the m.l.e.?
∂ ln x/∂x = 1/x > 0 ⇒ ln x is increasing in x
⇒ If L(θ1) ≥ L(θ2), then ln L(θ1) ≥ ln L(θ2).
If θ̂ is the m.l.e., then L(θ̂, x1, . . . , xn) = max_{θ∈Θ} L(θ, x1, . . . , xn) and ln L(θ̂, x1, . . . , xn) = max_{θ∈Θ} ln L(θ, x1, . . . , xn).
Two cases for solving for the m.l.e.:
(b.1) Solve ∂ ln L(θ)/∂θ = 0.
(b.2) L(θ) is monotone: solve max_{θ∈Θ} L(θ, x1, . . . , xn) using the monotonicity.
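Case (b.1) can also be carried out numerically when a closed form is inconvenient. The sketch below is not from the notes; the Poisson data with λ = 3, the crude grid search, and the seed are assumptions used only for illustration. It maximizes ln L(λ) over a grid and compares the maximizer with the closed-form solution λ̂ = X̄ of ∂ ln L/∂λ = 0 for the Poisson model.

```python
# A minimal numerical sketch of maximizing the log-likelihood (case (b.1)).
# Poisson data with λ = 3 is an illustrative assumption.
import numpy as np
from math import lgamma

rng = np.random.default_rng(3)
x = rng.poisson(lam=3.0, size=500)

def log_lik(lam):
    # ln L(λ) = Σ [x_i ln λ - λ - ln(x_i!)]
    return np.sum(x * np.log(lam) - lam - np.array([lgamma(xi + 1) for xi in x]))

grid = np.linspace(0.1, 10.0, 2000)                 # crude grid search over λ
lam_numeric = grid[np.argmax([log_lik(l) for l in grid])]

print("numerical maximizer ≈", lam_numeric)
print("closed form  X̄      =", x.mean())
```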

Order statistics:
Let (X1, . . . , Xn) be a random sample with d.f. F and p.d.f. f.
Let (Y1, . . . , Yn) be a permutation of (X1, . . . , Xn) such that Y1 ≤ Y2 ≤ · · · ≤ Yn.
Then we call (Y1, . . . , Yn) the order statistic of (X1, . . . , Xn), where Y1 is the first (smallest) order statistic, Y2 is the second order statistic, . . . , and Yn is the largest order statistic.

If (X1, . . . , Xn) are independent, then
P(X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An) = ∫_{An} · · · ∫_{A1} f(x1, . . . , xn) dx1 · · · dxn
= ∫_{An} fn(xn) dxn · · · ∫_{A1} f1(x1) dx1
= P(Xn ∈ An) · · · P(X1 ∈ A1)

Thm. Let (X1, . . . , Xn) be a random sample from a continuous distribution with p.d.f. f(x) and d.f. F(x). Then the p.d.f. of Yn = max{X1, . . . , Xn} is

gn(y) = n(F(y))^{n−1} f(y)

and the p.d.f. of Y1 = min{X1, . . . , Xn} is

g1(y) = n(1 − F(y))^{n−1} f(y)

Proof. This is an ℝⁿ → ℝ transformation. The distribution function of Yn is

Gn(y) = P(Yn ≤ y) = P(max{X1, . . . , Xn} ≤ y) = P(X1 ≤ y, . . . , Xn ≤ y)
= P(X1 ≤ y)P(X2 ≤ y) · · · P(Xn ≤ y) = (F(y))^n

⇒ the p.d.f. of Yn is gn(y) = D_y (F(y))^n = n(F(y))^{n−1} f(y)
The distribution function of Y1 is

G1(y) = P(Y1 ≤ y) = P(min{X1, . . . , Xn} ≤ y) = 1 − P(min{X1, . . . , Xn} > y)
= 1 − P(X1 > y, X2 > y, . . . , Xn > y) = 1 − P(X1 > y)P(X2 > y) · · · P(Xn > y)
= 1 − (1 − F(y))^n

⇒ the p.d.f. of Y1 is g1(y) = D_y (1 − (1 − F(y))^n) = n(1 − F(y))^{n−1} f(y)
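The theorem is easy to check by simulation. The sketch below is illustrative only (Exp(1) data, n = 5, and the seed are assumptions): it compares the empirical d.f. of Y1 = min{X1, . . . , Xn} with 1 − (1 − F(y))^n at a few points.

```python
# A minimal simulation sketch checking G₁(y) = 1 - (1 - F(y))ⁿ for the smallest
# order statistic.  The Exp(1) model with n = 5 is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 200_000
y1 = rng.exponential(scale=1.0, size=(reps, n)).min(axis=1)   # simulated Y₁

F = lambda y: 1.0 - np.exp(-y)                                # d.f. of Exp(1)
for y in [0.1, 0.3, 0.5]:
    empirical   = (y1 <= y).mean()
    theoretical = 1.0 - (1.0 - F(y))**n
    print(f"y = {y}:  empirical G1(y) ≈ {empirical:.4f},  1-(1-F(y))^n = {theoretical:.4f}")
```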

Example : Let (X1 , . . . , Xn ) be a random sample from U (0, θ).


Find m.l.e. of θ. Is it unbiased and consistent ?
sol: The p.d.f. of X is
f(x, θ) = 1/θ if 0 ≤ x ≤ θ, and 0 elsewhere.

Consider the indicator function
I_[a,b](x) = 1 if a ≤ x ≤ b, and 0 elsewhere.
Then f(x, θ) = (1/θ) I_[0,θ](x).


The likelihood function is
L(θ) = ∏_{i=1}^{n} f(xi, θ) = ∏_{i=1}^{n} (1/θ) I_[0,θ](xi) = (1/θ^n) ∏_{i=1}^{n} I_[0,θ](xi)
Let Yn = max{X1, . . . , Xn}. Then ∏_{i=1}^{n} I_[0,θ](xi) = 1 ⇔ 0 ≤ xi ≤ θ for all i = 1, . . . , n ⇔ 0 ≤ yn ≤ θ.
We then have
L(θ) = (1/θ^n) I_[0,θ](yn) = (1/θ^n) I_[yn,∞)(θ) = 1/θ^n if θ ≥ yn, and 0 if θ < yn.
L(θ) is maximized at θ = yn, so the m.l.e. of θ is θ̂ = Yn.


The d.f. of X is
F(x) = P(X ≤ x) = ∫_0^x (1/θ) dt = x/θ,  0 ≤ x ≤ θ

The p.d.f. of Yn is
gn(y) = n(y/θ)^{n−1} (1/θ) = n y^{n−1}/θ^n,  0 ≤ y ≤ θ
E(Yn) = ∫_0^θ y · n y^{n−1}/θ^n dy = (n/(n+1)) θ ≠ θ ⇒ the m.l.e. θ̂ = Yn is not unbiased.
However, E(Yn) = (n/(n+1)) θ → θ as n → ∞, so the m.l.e. θ̂ is asymptotically unbiased.
E(Yn²) = ∫_0^θ y² · n y^{n−1}/θ^n dy = (n/(n+2)) θ²
Var(Yn) = E(Yn²) − (E(Yn))² = (n/(n+2)) θ² − (n/(n+1))² θ² −→ θ² − θ² = 0 as n −→ ∞.
⇒ Yn →ᴾ θ ⇒ the m.l.e. θ̂ = Yn is consistent for θ.
Is there an unbiased estimator of θ?
E(((n+1)/n) Yn) = ((n+1)/n) E(Yn) = ((n+1)/n)(n/(n+1)) θ = θ
⇒ ((n+1)/n) Yn is unbiased for θ.
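The bias of the m.l.e. Yn and the unbiasedness of ((n+1)/n)Yn are easy to see by simulation. In the sketch below (not part of the notes), θ = 2, n = 20, and the seed are assumptions used only for illustration.

```python
# A minimal simulation sketch for the U(0, θ) example: Yₙ = max Xᵢ is biased
# (mean ≈ nθ/(n+1)), while (n+1)/n · Yₙ is unbiased for θ.
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 2.0, 20, 100_000
yn = rng.uniform(low=0.0, high=theta, size=(reps, n)).max(axis=1)

print("mean of Yn            ≈", yn.mean())                  # ≈ nθ/(n+1) = 40/21 ≈ 1.905
print("mean of (n+1)/n * Yn  ≈", ((n + 1) / n * yn).mean())  # ≈ θ = 2
print("variance of Yn        ≈", yn.var())                   # small, and → 0 as n grows
```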
Example:
(a) Y ∼ b(n, p)
The likelihood function is
L(p) = fY(y, p) = (n choose y) p^y (1 − p)^{n−y}
ln L(p) = ln (n choose y) + y ln p + (n − y) ln(1 − p)
∂ ln L(p)/∂p = y/p − (n − y)/(1 − p) = 0 ⇔ y/p = (n − y)/(1 − p) ⇔ y(1 − p) = p(n − y) ⇔ y = np
⇒ the m.l.e. is p̂ = Y/n.
E(p̂) = (1/n) E(Y) = p ⇒ the m.l.e. p̂ = Y/n is unbiased.
Var(p̂) = (1/n²) Var(Y) = (1/n) p(1 − p) −→ 0 as n −→ ∞
⇒ the m.l.e. p̂ = Y/n is consistent for p.

(b) X1, . . . , Xn are a random sample from N(µ, σ²). We want the m.l.e.'s of µ and σ².
The likelihood function is
L(µ, σ²) = ∏_{i=1}^{n} [1/(√(2π) (σ²)^{1/2})] exp(−(xi − µ)²/(2σ²)) = (2π)^{−n/2} (σ²)^{−n/2} exp(−Σ_{i=1}^{n} (xi − µ)²/(2σ²))

ln L(µ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²
∂ ln L(µ, σ²)/∂µ = (1/σ²) Σ_{i=1}^{n} (xi − µ) = 0 ⇒ µ̂ = x̄
∂ ln L(µ̂, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − x̄)² = 0 ⇒ σ̂² = (1/n) Σ_{i=1}^{n} (xi − x̄)²
E(µ̂) = E(X̄) = µ (unbiased), Var(µ̂) = Var(X̄) = σ²/n −→ 0 as n −→ ∞
⇒ the m.l.e. µ̂ is consistent for µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ² (biased).
E(σ̂²) = ((n−1)/n) σ² −→ σ² as n −→ ∞ ⇒ σ̂² is asymptotically unbiased.
Var(σ̂²) = Var((1/n) Σ_{i=1}^{n} (Xi − X̄)²) = (1/n²) Var(σ² · Σ_{i=1}^{n} (Xi − X̄)²/σ²)
= (σ⁴/n²) Var(Σ_{i=1}^{n} (Xi − X̄)²/σ²) = (σ⁴/n²) · 2(n − 1) = 2(n − 1)σ⁴/n² −→ 0 as n −→ ∞
(since Σ_{i=1}^{n} (Xi − X̄)²/σ² ∼ χ²_{n−1}, whose variance is 2(n − 1)).
⇒ the m.l.e. σ̂² is consistent for σ².
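The formula Var(σ̂²) = 2(n−1)σ⁴/n² can be verified by simulation. The sketch below is illustrative only (normal data with σ² = 4, n = 10, and the seed are assumptions): it compares the Monte Carlo variance of σ̂² with the formula.

```python
# A minimal Monte Carlo sketch checking Var(σ̂²) = 2(n-1)σ⁴/n² for the m.l.e.
# σ̂² = (1/n) Σ (Xᵢ - X̄)².  Normal data with σ² = 4, n = 10 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
sig2_hat = x.var(axis=1, ddof=0)                 # σ̂² for each replication

print("simulated Var(σ̂²)   ≈", sig2_hat.var())
print("formula 2(n-1)σ⁴/n² =", 2 * (n - 1) * sigma2**2 / n**2)   # = 2·9·16/100 = 2.88
```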

Suppose that we have the m.l.e. θ̂ = θ̂(x1, . . . , xn) for the parameter θ and our interest is a new parameter τ(θ), a function of θ. What is the m.l.e. of τ(θ)?
The space of τ(θ) is T = {τ : ∃θ ∈ Θ s.t. τ = τ(θ)}.

Thm. If θ̂ = θ̂(x1, . . . , xn) is the m.l.e. of θ and τ(θ) is a 1-1 function of θ, then the m.l.e. of τ(θ) is τ(θ̂).

Proof. The likelihood function for θ is L(θ, x1, . . . , xn). Then the likelihood function for τ(θ) can be derived as follows:

L(θ, x1, . . . , xn) = L(τ⁻¹(τ(θ)), x1, . . . , xn)
= M(τ(θ), x1, . . . , xn)
= M(τ, x1, . . . , xn),  τ ∈ T

M(τ(θ̂), x1, . . . , xn) = L(τ⁻¹(τ(θ̂)), x1, . . . , xn)
= L(θ̂, x1, . . . , xn)
≥ L(θ, x1, . . . , xn),  ∀θ ∈ Θ
= L(τ⁻¹(τ(θ)), x1, . . . , xn)
= M(τ(θ), x1, . . . , xn),  ∀θ ∈ Θ
= M(τ, x1, . . . , xn),  τ ∈ T
⇒ τ(θ̂) is the m.l.e. of τ(θ).
This is the invariance property of m.l.e.
Example:
(1) If Y ∼ b(n, p), the m.l.e. of p is p̂ = Y/n.

    τ(p)        m.l.e. of τ(p)
    p²          p̂² = (Y/n)²
    √p          √p̂ = √(Y/n)
    p(1 − p)    p(1 − p) is not a 1-1 function of p
    e^p         e^p̂ = e^{Y/n}
    e^{−p}      e^{−p̂} = e^{−Y/n}
(2) X1, . . . , Xn iid ∼ N(µ, σ²). The m.l.e. of (µ, σ²) is (X̄, (1/n) Σ (Xi − X̄)²).
The m.l.e. of (µ, σ) is (X̄, √((1/n) Σ (Xi − X̄)²)) (∵ σ ∈ (0, ∞), ∴ σ² −→ σ is 1-1).
You can also solve
∂ ln L(µ, σ², x1, . . . , xn)/∂µ = 0 and ∂ ln L(µ, σ², x1, . . . , xn)/∂σ = 0 for µ, σ.
(µ², σ) is not a 1-1 function of (µ, σ²) (∵ µ ∈ (−∞, ∞), ∴ µ −→ µ² isn't 1-1).
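By the invariance property, the m.l.e. of a 1-1 function of p is obtained by plugging in p̂. A tiny sketch is given below; the observed value y = 37 out of n = 100 is an assumption chosen only for illustration.

```python
# A minimal sketch of the invariance property: plug p̂ = y/n into τ to get the m.l.e. of τ(p).
# The observation y = 37 out of n = 100 is an assumed value.
import numpy as np

n, y = 100, 37
p_hat = y / n                      # m.l.e. of p

print("m.l.e. of p^2    :", p_hat**2)
print("m.l.e. of sqrt(p):", np.sqrt(p_hat))
print("m.l.e. of e^(-p) :", np.exp(-p_hat))
```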

Best estimator:

Def. An unbiased estimator θ̂ = θ̂(X1, . . . , Xn) is called a uniformly minimum variance unbiased estimator (UMVUE), or best estimator, if for any unbiased estimator θ̂*, we have
Varθ(θ̂) ≤ Varθ(θ̂*), for θ ∈ Θ
(θ̂ is uniformly better than θ̂* in variance.)

There are several ways of deriving the UMVUE of θ.
Cramér–Rao lower bound for the variance of an unbiased estimator:
Regularity conditions:
(a) The parameter space Θ is an open interval, e.g. (a, ∞), (a, b), (b, ∞), where a, b are constants not depending on θ.

(b) The set {x : f(x, θ) = 0} is independent of θ.

(c) ∫ ∂f(x, θ)/∂θ dx = ∂/∂θ ∫ f(x, θ) dx = 0

(d) If T = t(x1, . . . , xn) is an unbiased estimator, then
∫ t ∂f(x, θ)/∂θ dx = ∂/∂θ ∫ t f(x, θ) dx
Thm. Cramér–Rao (C-R)
Suppose that the regularity conditions hold. If τ̂(θ) = t(X1, . . . , Xn) is unbiased for τ(θ), then
Varθ(τ̂(θ)) ≥ (τ′(θ))² / (n Eθ[(∂ ln f(x, θ)/∂θ)²]) = (τ′(θ))² / (−n Eθ[∂² ln f(x, θ)/∂θ²]) for θ ∈ Θ

Proof. Consider only the continuous distribution.
E[∂ ln f(x, θ)/∂θ] = ∫_{−∞}^{∞} (∂ ln f(x, θ)/∂θ) f(x, θ) dx = ∫_{−∞}^{∞} ∂f(x, θ)/∂θ dx = ∂/∂θ ∫_{−∞}^{∞} f(x, θ) dx = 0
τ(θ) = Eθ(τ̂(θ)) = Eθ(t(X1, . . . , Xn)) = ∫ · · · ∫ t(x1, . . . , xn) ∏_{i=1}^{n} f(xi, θ) ∏_{i=1}^{n} dxi
Taking derivatives on both sides (the second term below is 0, since ∫ · · · ∫ ∏_{i=1}^{n} f(xi, θ) ∏ dxi = 1),
τ′(θ) = ∫ · · · ∫ t(x1, . . . , xn) ∂/∂θ [∏_{i=1}^{n} f(xi, θ)] ∏_{i=1}^{n} dxi − τ(θ) ∫ · · · ∫ ∂/∂θ [∏_{i=1}^{n} f(xi, θ)] ∏_{i=1}^{n} dxi
= ∫ · · · ∫ t(x1, . . . , xn) ∂/∂θ [∏ f(xi, θ)] ∏ dxi − ∫ · · · ∫ τ(θ) ∂/∂θ [∏ f(xi, θ)] ∏ dxi
= ∫ · · · ∫ (t(x1, . . . , xn) − τ(θ)) ∂/∂θ [∏_{i=1}^{n} f(xi, θ)] ∏_{i=1}^{n} dxi

Now,
∂/∂θ ∏_{i=1}^{n} f(xi, θ) = ∂/∂θ [f(x1, θ) f(x2, θ) · · · f(xn, θ)]
= (∂f(x1, θ)/∂θ) ∏_{i≠1} f(xi, θ) + · · · + (∂f(xn, θ)/∂θ) ∏_{i≠n} f(xi, θ)
= Σ_{j=1}^{n} (∂f(xj, θ)/∂θ) ∏_{i≠j} f(xi, θ)
= Σ_{j=1}^{n} (∂ ln f(xj, θ)/∂θ) f(xj, θ) ∏_{i≠j} f(xi, θ)
= Σ_{j=1}^{n} (∂ ln f(xj, θ)/∂θ) ∏_{i=1}^{n} f(xi, θ)

Cauchy–Schwarz inequality: [E(XY)]² ≤ E(X²) E(Y²)
Then
τ′(θ) = ∫ · · · ∫ (t(x1, . . . , xn) − τ(θ)) (Σ_{j=1}^{n} ∂ ln f(xj, θ)/∂θ) ∏_{i=1}^{n} f(xi, θ) ∏_{i=1}^{n} dxi
= E[(t(X1, . . . , Xn) − τ(θ)) Σ_{j=1}^{n} ∂ ln f(Xj, θ)/∂θ]
(τ′(θ))² ≤ E[(t(X1, . . . , Xn) − τ(θ))²] E[(Σ_{j=1}^{n} ∂ ln f(Xj, θ)/∂θ)²]
⇒ Var(τ̂(θ)) ≥ (τ′(θ))² / E[(Σ_{j=1}^{n} ∂ ln f(Xj, θ)/∂θ)²]

Since
E[(Σ_{j=1}^{n} ∂ ln f(Xj, θ)/∂θ)²] = Σ_{j=1}^{n} E[(∂ ln f(Xj, θ)/∂θ)²] + Σ_{i≠j} E[(∂ ln f(Xj, θ)/∂θ)(∂ ln f(Xi, θ)/∂θ)]
= Σ_{j=1}^{n} E[(∂ ln f(Xj, θ)/∂θ)²]   (the cross terms vanish by independence and E[∂ ln f(Xi, θ)/∂θ] = 0)
= n E[(∂ ln f(X, θ)/∂θ)²]

Then, we have
Varθ(τ̂(θ)) ≥ (τ′(θ))² / (n Eθ[(∂ ln f(x, θ)/∂θ)²])
You may further check that
Eθ(∂² ln f(x, θ)/∂θ²) = −Eθ[(∂ ln f(x, θ)/∂θ)²]

Thm. If there is an unbiased estimator τ̂(θ) whose variance achieves the Cramér–Rao lower bound (τ′(θ))² / (−n Eθ[∂² ln f(x, θ)/∂θ²]), then τ̂(θ) is a UMVUE of τ(θ).

Note:
If τ(θ) = θ, then τ′(θ) = 1, and any unbiased estimator θ̂ satisfies
Varθ(θ̂) ≥ 1 / (−n Eθ[∂² ln f(x, θ)/∂θ²])

Example:
(a) X1, . . . , Xn iid ∼ Poisson(λ), E(X) = λ, Var(X) = λ.
MLE λ̂ = X̄, E(λ̂) = λ, Var(λ̂) = λ/n.
p.d.f. f(x, λ) = λ^x e^{−λ}/x!,  x = 0, 1, . . .
⇒ ln f(x, λ) = x ln λ − λ − ln x!
⇒ ∂ ln f(x, λ)/∂λ = x/λ − 1
⇒ ∂² ln f(x, λ)/∂λ² = −x/λ²
E(∂² ln f(X, λ)/∂λ²) = E(−X/λ²) = −E(X)/λ² = −1/λ
The Cramér–Rao lower bound is
1 / (−n(−1/λ)) = λ/n = Var(λ̂)
⇒ the MLE λ̂ = X̄ is the UMVUE of λ.
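The Poisson calculation can be checked numerically. The sketch below is illustrative only (λ = 3, n = 50, and the seed are assumptions): it estimates Var(λ̂) and E[(∂ ln f/∂λ)²] by simulation and compares them with the Cramér–Rao quantities λ/n and 1/λ.

```python
# A minimal simulation sketch for the Poisson example: Var(λ̂ = X̄) matches the
# C-R lower bound λ/n, and E[(∂ ln f/∂λ)²] = 1/λ.  λ = 3, n = 50 are assumptions.
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 3.0, 50, 100_000
x = rng.poisson(lam=lam, size=(reps, n))

lam_hat = x.mean(axis=1)          # λ̂ = X̄ for each replication
score   = x / lam - 1.0           # ∂ ln f(x, λ)/∂λ evaluated at the true λ

print("Var(lam_hat) ≈", lam_hat.var(), "   CRLB λ/n =", lam / n)
print("E[score²]    ≈", (score**2).mean(), "   1/λ =", 1 / lam)
```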

(b) X1, . . . , Xn iid ∼ Bernoulli(p), E(X) = p, Var(X) = p(1 − p). We want the UMVUE of p.
p.d.f. f(x, p) = p^x (1 − p)^{1−x}
⇒ ln f(x, p) = x ln p + (1 − x) ln(1 − p)
∂ ln f(x, p)/∂p = x/p − (1 − x)/(1 − p)
∂² ln f(x, p)/∂p² = −x/p² − (1 − x)/(1 − p)²
E(∂² ln f(X, p)/∂p²) = E(−X/p² − (1 − X)/(1 − p)²) = −1/p − 1/(1 − p) = −1/(p(1 − p))
The C-R lower bound for p is
1 / (−n(−1/(p(1 − p)))) = p(1 − p)/n
The m.l.e. of p is p̂ = X̄.
E(p̂) = E(X̄) = p, Var(p̂) = Var(X̄) = p(1 − p)/n = C-R lower bound.
⇒ The MLE p̂ is the UMVUE of p.
