Chapter 3 - Statistical Inference (Point Estimation)
(b) Are there general rules for deriving estimators? We will introduce two
methods for deriving estimators of θ.
Def. An estimator θ̂(X1, . . . , Xn) is unbiased for θ if
Eθ(θ̂(X1, . . . , Xn)) = θ, ∀θ ∈ Θ.
Eθ(θ̂(X1, . . . , Xn)) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} θ̂(x1, . . . , xn) f(x1, . . . , xn, θ) dx1 ··· dxn
= ∫_{−∞}^{∞} θ* f_{θ̂}(θ*) dθ*, where θ̂ = θ̂(X1, . . . , Xn) is a r.v. with p.d.f. f_{θ̂}(θ*).
Example: X1, . . . , Xn iid ∼ N(µ, σ²). Suppose that our interest is µ.
X1: Eµ(X1) = µ, so X1 is unbiased for µ.
(X1 + X2)/2: Eµ((X1 + X2)/2) = µ, so (X1 + X2)/2 is unbiased for µ.
X̄: Eµ(X̄) = µ, so X̄ is unbiased for µ.
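A quick Monte Carlo check of these three unbiased estimators (a sketch in Python with numpy; the values of µ, σ, n and the replication count are illustrative choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 100_000  # illustrative values

# reps independent samples of size n from N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, n))

estimators = {
    "X1":        X[:, 0],                  # first observation
    "(X1+X2)/2": (X[:, 0] + X[:, 1]) / 2,  # average of first two
    "X-bar":     X.mean(axis=1),           # sample mean
}
for name, est in estimators.items():
    # each Monte Carlo average should be close to mu = 5
    print(f"{name:10s} mean = {est.mean():.3f}  variance = {est.var():.3f}")
```

All three averages hover near µ, but their variances differ greatly, so unbiasedness alone does not pick out a best estimator.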
• an → a as n → ∞ if, for every ε > 0, there exists N > 0 such that |an − a| < ε whenever n ≥ N.
{Xn} is a sequence of r.v.'s. How can we define Xn → X as n → ∞?
Def. Xn →P X (Xn converges to X in probability) if, for every ε > 0, P(|Xn − X| ≥ ε) → 0 as n → ∞.
Thm. If E(Xn) = a (or E(Xn) → a) and Var(Xn) → 0, then Xn →P a.
Proof. Chebyshev's Inequality:
P(|Xn − X| ≥ ε) ≤ E(Xn − X)²/ε², or P(|X − µ| ≥ kσ) ≤ 1/k².
For ε > 0,
P(|Xn − a| ≥ ε) ≤ E(Xn − a)²/ε² = [Var(Xn) + (E(Xn) − a)²]/ε² → 0 as n → ∞,
so Xn →P a.
WLLN (Weak Law of Large Numbers): If X1, . . . , Xn are iid with mean µ and variance σ², then X̄ →P µ.
Proof. E(X̄) = µ, Var(X̄) = σ²/n → 0 as n → ∞ ⇒ X̄ →P µ, by the theorem above.
Def. We say that θ̂ is a consistent estimator of θ if θ̂ →P θ.
Example: X1, . . . , Xn is a random sample with mean µ and finite variance σ². Is X1 a consistent estimator of µ?
E(X1) = µ, so X1 is unbiased for µ.
Let ε > 0. P(|X1 − µ| ≥ ε) does not depend on n, so it cannot tend to 0 (unless X1 is degenerate); hence X1 is not consistent for µ.
• Unbiasedness and consistency are two basic requirements for a good estimator.
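The difference between the two conditions can be seen by estimating P(|θ̂ − µ| ≥ ε) by simulation; a minimal sketch (standard normal data, with ε and the sample sizes chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 0.0, 0.2, 20_000  # illustrative values

for n in [10, 100, 1000]:
    X = rng.normal(mu, 1.0, size=(reps, n))
    p_mean = np.mean(np.abs(X.mean(axis=1) - mu) >= eps)  # P(|X-bar - mu| >= eps)
    p_x1   = np.mean(np.abs(X[:, 0]        - mu) >= eps)  # P(|X1 - mu| >= eps)
    print(f"n = {n:5d}   X-bar: {p_mean:.3f}   X1: {p_x1:.3f}")
```

The X̄ column shrinks toward 0 as n grows (consistency), while the X1 column stays flat, matching the argument above.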
Moments:
Let X be a random variable having p.d.f. f(x, θ). The population kth moment is defined by
Eθ(X^k) = Σ_{all x} x^k f(x, θ) (discrete case), or
Eθ(X^k) = ∫_{−∞}^{∞} x^k f(x, θ) dx (continuous case).
The sample kth moment is defined by (1/n) Σ_{i=1}^n Xi^k.
Note:
E((1/n) Σ_{i=1}^n Xi^k) = (1/n) Σ_{i=1}^n E(Xi^k) = (1/n) Σ_{i=1}^n Eθ(X^k) = Eθ(X^k)
⇒ The sample kth moment is unbiased for the population kth moment.
Var((1/n) Σ_{i=1}^n Xi^k) = (1/n²) Var(Σ_{i=1}^n Xi^k) = (1/n²) Σ_{i=1}^n Var(Xi^k) = (1/n) Var(X^k) → 0 as n → ∞.
⇒ (1/n) Σ_{i=1}^n Xi^k →P Eθ(X^k)
⇒ (1/n) Σ_{i=1}^n Xi^k is a consistent estimator of Eθ(X^k).
S² = (1/(n−1)) [Σ_{i=1}^n Xi² − nX̄²] = (n/(n−1)) [(1/n) Σ_{i=1}^n Xi² − X̄²] →P E(X²) − µ² = σ² + µ² − µ² = σ²
since X1, . . . , Xn are iid with mean µ and variance σ², so
X1², . . . , Xn² are iid r.v.'s with mean E(X²) = µ² + σ², and
by the WLLN, (1/n) Σ_{i=1}^n Xi² →P E(X²) = µ² + σ².
⇒ S² →P σ².
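The convergence S² →P σ² can also be watched numerically; a small sketch (normal data with illustrative µ and σ²):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 4.0  # illustrative values

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    s2 = x.var(ddof=1)  # S^2, the 1/(n-1) version
    print(f"n = {n:8d}   S^2 = {s2:.4f}   (sigma^2 = {sigma2})")
```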
Def. Let X1, . . . , Xn be a random sample from a distribution with p.d.f. f(x, θ), θ = (θ1, . . . , θr). The method of moments estimator θ̂ is the solution of the equations
(1/n) Σ_{i=1}^n Xi^k = Eθ(X^k), k = 1, . . . , r.
Example:
(a) X1, . . . , Xn iid ∼ Bernoulli(p).
Setting X̄ = Ep(X) = p
⇒ the method of moments estimator of p is p̂ = X̄.
By the WLLN, p̂ = X̄ →P Ep(X) = p ⇒ p̂ is consistent for p.
E(p̂) = E(X̄) = E(X) = p ⇒ p̂ is unbiased for p.
(b) X1, . . . , Xn iid with mean µ and variance σ². Matching the first two moments gives µ̂ = X̄ and
σ̂² = (1/n) Σ_{i=1}^n Xi² − X̄² = (1/n) Σ_{i=1}^n (Xi − X̄)².
X̄ is an unbiased and consistent estimator for µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) E((1/(n−1)) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ²
⇒ σ̂² is not unbiased for σ².
σ̂² = (1/n) Σ_{i=1}^n Xi² − X̄² →P E(X²) − µ² = σ²
⇒ σ̂² is consistent for σ².
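The two moment equations above translate directly into code; a sketch, where `moment_estimates` is a hypothetical helper name (any two-parameter family matched through its first two moments works the same way):

```python
import numpy as np

def moment_estimates(x):
    """Method of moments for (mu, sigma^2): match the first two moments.

    Solves  (1/n) sum x_i   = mu
            (1/n) sum x_i^2 = sigma^2 + mu^2.
    """
    x = np.asarray(x, dtype=float)
    m1 = x.mean()             # sample 1st moment
    m2 = (x ** 2).mean()      # sample 2nd moment
    return m1, m2 - m1 ** 2   # (mu-hat, sigma^2-hat)

rng = np.random.default_rng(3)
x = rng.normal(2.0, 3.0, size=5000)  # illustrative N(2, 9) data
print(moment_estimates(x))           # approximately (2.0, 9.0)
```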
Def. The likelihood function of a random sample is its joint p.d.f.
L(θ) = L(θ, x1, . . . , xn) = f(x1, . . . , xn, θ), θ ∈ Θ,
considered as a function of θ.
For fixed (x1, . . . , xn), the value L(θ, x1, . . . , xn) is called the likelihood at θ.
Note:
(a) The maximum likelihood estimator (m.l.e.) θ̂ = θ̂(x1, . . . , xn) is a value of θ that maximizes L(θ, x1, . . . , xn) over θ ∈ Θ.
(b) How to derive the m.l.e.?
∂ ln x/∂x = 1/x > 0 ⇒ ln x is increasing in x
⇒ if L(θ1) ≥ L(θ2), then ln L(θ1) ≥ ln L(θ2).
If θ̂ is the m.l.e., then L(θ̂, x1, . . . , xn) = max_{θ∈Θ} L(θ, x1, . . . , xn) and
ln L(θ̂, x1, . . . , xn) = max_{θ∈Θ} ln L(θ, x1, . . . , xn).
Two cases for solving for the m.l.e.:
(b.1) Solve ∂ ln L(θ)/∂θ = 0.
(b.2) L(θ) is monotone: solve max_{θ∈Θ} L(θ, x1, . . . , xn) from the monotonicity directly.
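When the equation in (b.1) has no closed-form solution, ln L can be maximized numerically. A sketch using scipy's bounded scalar minimizer on Poisson data, where the closed form λ̂ = X̄ is available to confirm the answer (sample size and true λ are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(4)
x = rng.poisson(3.5, size=200)  # illustrative Poisson(3.5) sample

def neg_log_lik(lam):
    # -ln L(lambda) for an iid Poisson sample: -sum[x ln(lam) - lam - ln(x!)]
    return -np.sum(x * np.log(lam) - lam - gammaln(x + 1))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())  # the numerical m.l.e. matches the closed form X-bar
```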
Order statistics:
Let (X1 , . . . , Xn ) be a random sample with d.f. F and p.d.f. f.
Let (Y1, . . . , Yn) be a permutation of (X1, . . . , Xn) such that Y1 ≤ Y2 ≤ · · · ≤ Yn.
Then we call (Y1 , . . . , Yn ) the order statistic of (X1 , . . . , Xn ) where Y1 is
the first (smallest) order statistic, Y2 is the second order statistic,. . . , Yn is
the largest order statistic.
The distribution function of Yn is Gn(y) = P(Yn ≤ y) = P(X1 ≤ y, . . . , Xn ≤ y) = (F(y))^n
⇒ p.d.f. of Yn is gn(y) = D_y (F(y))^n = n(F(y))^{n−1} f(y).
Distribution function of Y1 is G1(y) = 1 − P(Y1 > y) = 1 − (1 − F(y))^n, with p.d.f. g1(y) = n(1 − F(y))^{n−1} f(y).
Example: X1, . . . , Xn iid ∼ U(0, θ) with f(x, θ) = (1/θ) I[0,θ](x). Find the m.l.e. of θ.
Let Yn = max{X1, . . . , Xn}.
Then Π_{i=1}^n I[0,θ](xi) = 1 ⇔ 0 ≤ xi ≤ θ for all i = 1, . . . , n ⇔ 0 ≤ yn ≤ θ.
We then have
L(θ) = (1/θ^n) I[0,θ](yn) = (1/θ^n) I[yn,∞)(θ) = { 1/θ^n if θ ≥ yn; 0 if θ < yn },
which is decreasing in θ on [yn, ∞) ⇒ the m.l.e. is θ̂ = Yn.
The p.d.f. of Yn is
gn(y) = n (y/θ)^{n−1} (1/θ) = n y^{n−1}/θ^n, 0 ≤ y ≤ θ.
E(Yn) = ∫_0^θ y · n y^{n−1}/θ^n dy = (n/(n+1)) θ ≠ θ ⇒ the m.l.e. θ̂ = Yn is not unbiased.
However, E(Yn) = (n/(n+1)) θ → θ as n → ∞, so the m.l.e. θ̂ is asymptotically unbiased.
E(Yn²) = ∫_0^θ y² · n y^{n−1}/θ^n dy = (n/(n+2)) θ²
Var(Yn) = E(Yn²) − (E(Yn))² = (n/(n+2)) θ² − (n/(n+1))² θ² → θ² − θ² = 0 as n → ∞.
⇒ Yn →P θ ⇒ the m.l.e. θ̂ = Yn is consistent for θ.
Is there an unbiased estimator of θ?
E(((n+1)/n) Yn) = ((n+1)/n) E(Yn) = ((n+1)/n) · (n/(n+1)) θ = θ
⇒ ((n+1)/n) Yn is unbiased for θ.
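A simulation makes the bias of Yn and the effect of the (n+1)/n correction visible (θ, n, and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 10.0, 20, 100_000  # illustrative values

X = rng.uniform(0, theta, size=(reps, n))
Yn = X.max(axis=1)                 # m.l.e. theta-hat = Y_n

print(Yn.mean())                   # about (n/(n+1)) * theta = 9.52, biased low
print(((n + 1) / n * Yn).mean())   # about theta = 10, bias removed
```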
Example:
(a) Y ∼ b(n, p).
The likelihood function is
L(p) = fY(y, p) = (n choose y) p^y (1 − p)^{n−y}
ln L(p) = ln (n choose y) + y ln p + (n − y) ln(1 − p)
∂ ln L(p)/∂p = y/p − (n − y)/(1 − p) = 0 ⇔ y/p = (n − y)/(1 − p) ⇔ y(1 − p) = p(n − y) ⇔ y = np
⇒ m.l.e. p̂ = Y/n.
E(p̂) = (1/n) E(Y) = p ⇒ the m.l.e. p̂ = Y/n is unbiased.
Var(p̂) = (1/n²) Var(Y) = (1/n) p(1 − p) → 0 as n → ∞
⇒ the m.l.e. p̂ = Y/n is consistent for p.
(b) X1, . . . , Xn iid ∼ N(µ, σ²). The likelihood function is
L(µ, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp(−(xi − µ)²/(2σ²)) = (2π)^{−n/2} (σ²)^{−n/2} exp(−(1/(2σ²)) Σ_{i=1}^n (xi − µ)²)
ln L(µ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ_{i=1}^n (xi − µ)²
∂ ln L(µ, σ²)/∂µ = (1/σ²) Σ_{i=1}^n (xi − µ) = 0 ⇒ µ̂ = X̄
∂ ln L(µ̂, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (xi − x̄)² = 0 ⇒ σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)²
E(µ̂) = E(X̄) = µ (unbiased), Var(µ̂) = Var(X̄) = σ²/n → 0 as n → ∞
⇒ the m.l.e. µ̂ is consistent for µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ² (biased).
E(σ̂²) = ((n−1)/n) σ² → σ² as n → ∞ ⇒ σ̂² is asymptotically unbiased.
Var(σ̂²) = Var((1/n) Σ_{i=1}^n (Xi − X̄)²) = (1/n²) Var(σ² · Σ_{i=1}^n (Xi − X̄)²/σ²)
= (σ⁴/n²) Var(Σ_{i=1}^n (Xi − X̄)²/σ²) = (σ⁴/n²) · 2(n−1) = (2(n−1)/n²) σ⁴ → 0 as n → ∞,
since Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²(n−1), whose variance is 2(n−1).
⇒ σ̂² is consistent for σ².
Suppose that we have the m.l.e. θ̂ = θ̂(x1, . . . , xn) of a parameter θ and our interest is a new parameter τ(θ), a function of θ.
What is the m.l.e. of τ(θ)?
The space of τ(θ) is T = {τ : ∃ θ ∈ Θ s.t. τ = τ(θ)}.
If τ is 1-1, define the induced likelihood M(τ, x1, . . . , xn) = L(τ^{−1}(τ), x1, . . . , xn) for τ ∈ T. Then
M(τ(θ̂), x1, . . . , xn) = L(τ^{−1}(τ(θ̂)), x1, . . . , xn)
= L(θ̂, x1, . . . , xn)
≥ L(θ, x1, . . . , xn), ∀θ ∈ Θ
= L(τ^{−1}(τ(θ)), x1, . . . , xn)
= M(τ(θ), x1, . . . , xn), ∀θ ∈ Θ,
i.e., M(τ(θ̂), x1, . . . , xn) ≥ M(τ, x1, . . . , xn), ∀τ ∈ T.
⇒ τ (θ̂) is m.l.e. of τ (θ).
This is the invariance property of m.l.e.
Example:
(1) If Y ∼ b(n, p), the m.l.e. of p is p̂ = Y/n.

τ(p)      m.l.e. of τ(p)
p²        p̂² = (Y/n)²
√p        √p̂ = √(Y/n)
e^p       e^{p̂} = e^{Y/n}
e^{−p}    e^{−p̂} = e^{−Y/n}

Note: p(1 − p) is not a 1-1 function of p.
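Invariance says there is no need to re-maximize for τ(p); maximizing the induced likelihood directly gives the same answer as plugging p̂ in. A brute-force check for the row τ(p) = e^{−p} (grid search over τ; n, p, and the grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p_true = 50, 0.3
y = rng.binomial(n, p_true)   # one observation Y ~ b(n, p)
p_hat = y / n                 # m.l.e. of p

# maximize the induced likelihood in tau = exp(-p); tau is 1-1 with p = -ln(tau)
taus = np.linspace(np.exp(-1) + 1e-9, 1 - 1e-9, 100_000)  # tau in (e^-1, 1)
ps = -np.log(taus)
log_lik = y * np.log(ps) + (n - y) * np.log(1 - ps)

print(np.exp(-p_hat), taus[np.argmax(log_lik)])  # agree up to grid resolution
```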
(2) X1, . . . , Xn iid ∼ N(µ, σ²); the m.l.e. of (µ, σ²) is (X̄, (1/n) Σ (Xi − X̄)²).
The m.l.e. of (µ, σ) is (X̄, √((1/n) Σ (Xi − X̄)²)) (∵ σ ∈ (0, ∞), so σ² → σ is 1-1).
You can also solve
∂ ln L(µ, σ², x1, . . . , xn)/∂µ = 0 and
∂ ln L(µ, σ², x1, . . . , xn)/∂σ = 0
for µ and σ directly.
(µ², σ) is not a 1-1 function of (µ, σ²)
(∵ µ ∈ (−∞, ∞), so µ → µ² is not 1-1).
Best estimator:
Def. An unbiased estimator θ̂ of θ is the uniformly minimum variance unbiased estimator (UMVUE) if Varθ(θ̂) ≤ Varθ(θ̃) for every unbiased estimator θ̃ of θ and all θ ∈ Θ.
There are several ways of deriving the UMVUE of θ.
Cramér-Rao lower bound for the variance of an unbiased estimator:
Regularity conditions:
(a) The parameter space Θ is an open interval (a, b), (a, ∞), or (−∞, b), where a, b are constants not depending on θ.
(b) The support {x : f(x, θ) > 0} does not depend on θ.
(c) ∫ (∂f(x, θ)/∂θ) dx = (∂/∂θ) ∫ f(x, θ) dx = 0.
Let t(X1, . . . , Xn) be an unbiased estimator of τ(θ), so that
τ(θ) = E(t(X1, . . . , Xn)) = ∫ ··· ∫ t(x1, . . . , xn) Π_{i=1}^n f(xi, θ) dx1 ··· dxn.
Differentiating both sides with respect to θ gives
τ'(θ) = ∫ ··· ∫ t(x1, . . . , xn) (∂/∂θ) Π_{i=1}^n f(xi, θ) dx1 ··· dxn.
Now,
(∂/∂θ) Π_{i=1}^n f(xi, θ) = (∂/∂θ) [f(x1, θ) f(x2, θ) ··· f(xn, θ)]
= (∂f(x1, θ)/∂θ) Π_{i≠1} f(xi, θ) + ··· + (∂f(xn, θ)/∂θ) Π_{i≠n} f(xi, θ)
= Σ_{j=1}^n (∂f(xj, θ)/∂θ) Π_{i≠j} f(xi, θ)
= Σ_{j=1}^n (∂ ln f(xj, θ)/∂θ) f(xj, θ) Π_{i≠j} f(xi, θ)
= [Σ_{j=1}^n ∂ ln f(xj, θ)/∂θ] Π_{i=1}^n f(xi, θ)
Cauchy-Schwarz Inequality:
[E(XY)]² ≤ E(X²) E(Y²)
Then
τ'(θ) = ∫ ··· ∫ (t(x1, . . . , xn) − τ(θ)) [Σ_{j=1}^n ∂ ln f(xj, θ)/∂θ] Π_{i=1}^n f(xi, θ) Π_{i=1}^n dxi
(subtracting τ(θ) changes nothing because, by (c), E(∂ ln f(Xj, θ)/∂θ) = ∫ (∂f(x, θ)/∂θ) dx = 0)
= E[(t(X1, . . . , Xn) − τ(θ)) Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ].
By the Cauchy-Schwarz inequality,
(τ'(θ))² ≤ E[(t(X1, . . . , Xn) − τ(θ))²] E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²]
⇒ Var(t(X1, . . . , Xn)) ≥ (τ'(θ))² / E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²].
Since
E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²] = Σ_{j=1}^n E[(∂ ln f(Xj, θ)/∂θ)²] + Σ_{i≠j} E[(∂ ln f(Xj, θ)/∂θ)(∂ ln f(Xi, θ)/∂θ)]
= Σ_{j=1}^n E[(∂ ln f(Xj, θ)/∂θ)²]
(the cross terms vanish because Xi and Xj are independent and each factor has mean 0)
= n E[(∂ ln f(X, θ)/∂θ)²].
Then we have
Varθ(t(X1, . . . , Xn)) ≥ (τ'(θ))² / (n Eθ[(∂ ln f(X, θ)/∂θ)²]).
You may further check that
Eθ(∂² ln f(X, θ)/∂θ²) = −Eθ[(∂ ln f(X, θ)/∂θ)²].
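The identity can be checked by Monte Carlo for a concrete family, e.g. with the Poisson derivatives computed in example (a) below (λ and the simulation size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)  # Monte Carlo draws of X ~ Poisson(lambda)

score = x / lam - 1       # (d/d lambda) ln f(X, lambda)
hess  = -x / lam**2       # (d^2/d lambda^2) ln f(X, lambda)
print(hess.mean(), -(score**2).mean())  # both approximately -1/lambda
```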
Note:
If τ(θ) = θ, then τ'(θ) = 1 and any unbiased estimator θ̂ satisfies
Varθ(θ̂) ≥ 1 / (−n Eθ(∂² ln f(X, θ)/∂θ²)).
Example:
(a) X1, . . . , Xn iid ∼ Poisson(λ), E(X) = λ, Var(X) = λ.
MLE λ̂ = X̄, E(λ̂) = λ, Var(λ̂) = λ/n.
p.d.f. f(x, λ) = λ^x e^{−λ}/x!, x = 0, 1, . . .
⇒ ln f(x, λ) = x ln λ − λ − ln x!
⇒ (∂/∂λ) ln f(x, λ) = x/λ − 1
⇒ (∂²/∂λ²) ln f(x, λ) = −x/λ²
E((∂²/∂λ²) ln f(X, λ)) = E(−X/λ²) = −E(X)/λ² = −1/λ
The Cramér-Rao lower bound is
1/(−n(−1/λ)) = λ/n = Var(λ̂)
⇒ the MLE λ̂ = X̄ attains the bound and is the UMVUE of λ.
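A simulation confirming that the sampling variance of λ̂ = X̄ sits at the bound λ/n (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
lam, n, reps = 3.0, 25, 200_000  # illustrative values

lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)  # X-bar for each sample
print(lam_hat.var(), lam / n)  # sampling variance vs. the C-R bound lambda/n
```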
(b) X1, . . . , Xn iid ∼ Bernoulli(p), E(X) = p, Var(X) = p(1 − p).
We want the UMVUE of p.
p.d.f. f(x, p) = p^x (1 − p)^{1−x}, x = 0, 1
⇒ ln f(x, p) = x ln p + (1 − x) ln(1 − p)
(∂/∂p) ln f(x, p) = x/p − (1 − x)/(1 − p)
(∂²/∂p²) ln f(x, p) = −x/p² − (1 − x)/(1 − p)²
E((∂²/∂p²) ln f(X, p)) = E(−X/p² − (1 − X)/(1 − p)²) = −1/p − 1/(1 − p) = −1/(p(1 − p))
The Cramér-Rao lower bound is
1/(−n(−1/(p(1 − p)))) = p(1 − p)/n.
The m.l.e. of p is p̂ = X̄.
E(p̂) = E(X̄) = p, Var(p̂) = Var(X̄) = p(1 − p)/n = C-R lower bound.
⇒ MLE p̂ is the UMVUE of p.
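The same numerical check for the Bernoulli case (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
p, n, reps = 0.3, 40, 200_000  # illustrative values

p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)  # p-hat = X-bar
print(p_hat.mean(), p)               # close to p (unbiased)
print(p_hat.var(), p * (1 - p) / n)  # variance attains the C-R bound
```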