Chapter 3 - Statistical Inference (Point Estimation)
(b) Are there general rules for deriving estimators? We will introduce two
methods for deriving estimators of θ.
Def. An estimator θ̂(X1, . . . , Xn) is unbiased for θ if
Eθ(θ̂(X1, . . . , Xn)) = θ, ∀θ ∈ Θ.
Eθ(θ̂(X1, . . . , Xn)) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} θ̂(x1, . . . , xn) f(x1, . . . , xn, θ) dx1 ··· dxn
= ∫_{−∞}^{∞} θ* f_{θ̂}(θ*) dθ*, where θ̂ = θ̂(X1, . . . , Xn) is a r.v. with p.d.f. f_{θ̂}(θ*).
Example: X1, . . . , Xn iid ∼ N(µ, σ²). Suppose that our interest is µ.
X1: Eµ(X1) = µ, so X1 is unbiased for µ.
(X1 + X2)/2: Eµ((X1 + X2)/2) = µ, so (X1 + X2)/2 is unbiased for µ.
X̄: Eµ(X̄) = µ, so X̄ is unbiased for µ.
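A quick Monte Carlo check of these three unbiased estimators (a sketch in Python with numpy; the values of µ, σ, n and the replication count are illustrative choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 100_000  # illustrative values

# reps independent samples of size n from N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, n))

estimators = {
    "X1":        X[:, 0],                  # first observation
    "(X1+X2)/2": (X[:, 0] + X[:, 1]) / 2,  # average of first two
    "X-bar":     X.mean(axis=1),           # sample mean
}
for name, est in estimators.items():
    # each Monte Carlo average should be close to mu = 5
    print(f"{name:10s} mean = {est.mean():.3f}  variance = {est.var():.3f}")
```

All three averages hover near µ, but their variances differ greatly, so unbiasedness alone does not pick out a best estimator.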
• an → a as n → ∞ if, for every ε > 0, there exists N > 0 such that |an − a| < ε whenever n ≥ N.
{Xn} is a sequence of r.v.'s. How can we define Xn → X as n → ∞?
Def. Xn →P X (Xn converges to X in probability) if, for every ε > 0, P(|Xn − X| ≥ ε) → 0 as n → ∞.
Thm. If E(Xn) = a (or E(Xn) → a) and Var(Xn) → 0, then Xn →P a.
Proof. Chebyshev's Inequality:
P(|Xn − X| ≥ ε) ≤ E(Xn − X)²/ε², or P(|X − µ| ≥ kσ) ≤ 1/k².
For ε > 0,
P(|Xn − a| ≥ ε) ≤ E(Xn − a)²/ε² = [Var(Xn) + (E(Xn) − a)²]/ε² → 0 as n → ∞,
so Xn →P a.
WLLN (Weak Law of Large Numbers): If X1, . . . , Xn are iid with mean µ and variance σ², then X̄ →P µ.
Proof. E(X̄) = µ, Var(X̄) = σ²/n → 0 as n → ∞ ⇒ X̄ →P µ, by the theorem above.
Def. We say that θ̂ is a consistent estimator of θ if θ̂ →P θ.
Example: X1, . . . , Xn is a random sample with mean µ and finite variance σ². Is X1 a consistent estimator of µ?
E(X1) = µ, so X1 is unbiased for µ.
Let ε > 0. P(|X1 − µ| ≥ ε) does not depend on n, so it cannot tend to 0 (unless X1 is degenerate); hence X1 is not consistent for µ.
• Unbiasedness and consistency are two basic requirements for a good estimator.
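The difference between the two conditions can be seen by estimating P(|θ̂ − µ| ≥ ε) by simulation; a minimal sketch (standard normal data, with ε and the sample sizes chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 0.0, 0.2, 20_000  # illustrative values

for n in [10, 100, 1000]:
    X = rng.normal(mu, 1.0, size=(reps, n))
    p_mean = np.mean(np.abs(X.mean(axis=1) - mu) >= eps)  # P(|X-bar - mu| >= eps)
    p_x1   = np.mean(np.abs(X[:, 0]        - mu) >= eps)  # P(|X1 - mu| >= eps)
    print(f"n = {n:5d}   X-bar: {p_mean:.3f}   X1: {p_x1:.3f}")
```

The X̄ column shrinks toward 0 as n grows (consistency), while the X1 column stays flat, matching the argument above.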
Moments:
Let X be a random variable having p.d.f. f(x, θ). The population kth moment is defined by
Eθ(X^k) = Σ_{all x} x^k f(x, θ) (discrete case), or
Eθ(X^k) = ∫_{−∞}^{∞} x^k f(x, θ) dx (continuous case).
The sample kth moment is defined by (1/n) Σ_{i=1}^n Xi^k.
Note:
E((1/n) Σ_{i=1}^n Xi^k) = (1/n) Σ_{i=1}^n E(Xi^k) = (1/n) Σ_{i=1}^n Eθ(X^k) = Eθ(X^k)
⇒ The sample kth moment is unbiased for the population kth moment.
Var((1/n) Σ_{i=1}^n Xi^k) = (1/n²) Var(Σ_{i=1}^n Xi^k) = (1/n²) Σ_{i=1}^n Var(Xi^k) = (1/n) Var(X^k) → 0 as n → ∞.
⇒ (1/n) Σ_{i=1}^n Xi^k →P Eθ(X^k)
⇒ (1/n) Σ_{i=1}^n Xi^k is a consistent estimator of Eθ(X^k).
S² = (1/(n−1)) [Σ_{i=1}^n Xi² − nX̄²] = (n/(n−1)) [(1/n) Σ_{i=1}^n Xi² − X̄²] →P E(X²) − µ² = σ² + µ² − µ² = σ²
since X1, . . . , Xn are iid with mean µ and variance σ², so
X1², . . . , Xn² are iid r.v.'s with mean E(X²) = µ² + σ², and
by the WLLN, (1/n) Σ_{i=1}^n Xi² →P E(X²) = µ² + σ².
⇒ S² →P σ².
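The convergence S² →P σ² can also be watched numerically; a small sketch (normal data with illustrative µ and σ²):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 4.0  # illustrative values

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    s2 = x.var(ddof=1)  # S^2, the 1/(n-1) version
    print(f"n = {n:8d}   S^2 = {s2:.4f}   (sigma^2 = {sigma2})")
```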
Def. Let X1, . . . , Xn be a random sample from a distribution with p.d.f. f(x, θ), θ = (θ1, . . . , θr). The method of moments estimator θ̂ is the solution of the equations
(1/n) Σ_{i=1}^n Xi^k = Eθ(X^k), k = 1, . . . , r.
Example:
(a) X1, . . . , Xn iid ∼ Bernoulli(p).
Setting X̄ = Ep(X) = p
⇒ the method of moments estimator of p is p̂ = X̄.
By the WLLN, p̂ = X̄ →P Ep(X) = p ⇒ p̂ is consistent for p.
E(p̂) = E(X̄) = E(X) = p ⇒ p̂ is unbiased for p.
(b) X1, . . . , Xn iid with mean µ and variance σ². Matching the first two moments gives µ̂ = X̄ and
σ̂² = (1/n) Σ_{i=1}^n Xi² − X̄² = (1/n) Σ_{i=1}^n (Xi − X̄)².
X̄ is an unbiased and consistent estimator for µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) E((1/(n−1)) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ²
⇒ σ̂² is not unbiased for σ².
σ̂² = (1/n) Σ_{i=1}^n Xi² − X̄² →P E(X²) − µ² = σ²
⇒ σ̂² is consistent for σ².
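The two moment equations above translate directly into code; a sketch, where `moment_estimates` is a hypothetical helper name (any two-parameter family matched through its first two moments works the same way):

```python
import numpy as np

def moment_estimates(x):
    """Method of moments for (mu, sigma^2): match the first two moments.

    Solves  (1/n) sum x_i   = mu
            (1/n) sum x_i^2 = sigma^2 + mu^2.
    """
    x = np.asarray(x, dtype=float)
    m1 = x.mean()             # sample 1st moment
    m2 = (x ** 2).mean()      # sample 2nd moment
    return m1, m2 - m1 ** 2   # (mu-hat, sigma^2-hat)

rng = np.random.default_rng(3)
x = rng.normal(2.0, 3.0, size=5000)  # illustrative N(2, 9) data
print(moment_estimates(x))           # approximately (2.0, 9.0)
```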
Def. The likelihood function of a random sample is its joint p.d.f.
L(θ) = L(θ, x1, . . . , xn) = f(x1, . . . , xn, θ), θ ∈ Θ,
considered as a function of θ.
For fixed (x1, . . . , xn), the value L(θ, x1, . . . , xn) is called the likelihood at θ.
Note:
(a) The maximum likelihood estimator (m.l.e.) θ̂ = θ̂(x1, . . . , xn) is a value of θ that maximizes L(θ, x1, . . . , xn) over θ ∈ Θ.
(b) How to derive the m.l.e.?
∂ ln x/∂x = 1/x > 0 ⇒ ln x is increasing in x
⇒ if L(θ1) ≥ L(θ2), then ln L(θ1) ≥ ln L(θ2).
If θ̂ is the m.l.e., then L(θ̂, x1, . . . , xn) = max_{θ∈Θ} L(θ, x1, . . . , xn) and
ln L(θ̂, x1, . . . , xn) = max_{θ∈Θ} ln L(θ, x1, . . . , xn).
Two cases for solving for the m.l.e.:
(b.1) Solve ∂ ln L(θ)/∂θ = 0.
(b.2) L(θ) is monotone: solve max_{θ∈Θ} L(θ, x1, . . . , xn) from the monotonicity directly.
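When the equation in (b.1) has no closed-form solution, ln L can be maximized numerically. A sketch using scipy's bounded scalar minimizer on Poisson data, where the closed form λ̂ = X̄ is available to confirm the answer (sample size and true λ are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(4)
x = rng.poisson(3.5, size=200)  # illustrative Poisson(3.5) sample

def neg_log_lik(lam):
    # -ln L(lambda) for an iid Poisson sample: -sum[x ln(lam) - lam - ln(x!)]
    return -np.sum(x * np.log(lam) - lam - gammaln(x + 1))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())  # the numerical m.l.e. matches the closed form X-bar
```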
Order statistics:
Let (X1 , . . . , Xn ) be a random sample with d.f. F and p.d.f. f.
Let (Y1, . . . , Yn) be a permutation of (X1, . . . , Xn) such that Y1 ≤ Y2 ≤ · · · ≤ Yn.
Then we call (Y1 , . . . , Yn ) the order statistic of (X1 , . . . , Xn ) where Y1 is
the first (smallest) order statistic, Y2 is the second order statistic,. . . , Yn is
the largest order statistic.
The distribution function of Yn is Gn(y) = P(Yn ≤ y) = P(X1 ≤ y, . . . , Xn ≤ y) = (F(y))^n
⇒ p.d.f. of Yn is gn(y) = D_y (F(y))^n = n(F(y))^{n−1} f(y).
Distribution function of Y1 is G1(y) = 1 − P(Y1 > y) = 1 − (1 − F(y))^n, with p.d.f. g1(y) = n(1 − F(y))^{n−1} f(y).
Example: X1, . . . , Xn iid ∼ U(0, θ) with f(x, θ) = (1/θ) I[0,θ](x). Find the m.l.e. of θ.
Let Yn = max{X1, . . . , Xn}.
Then Π_{i=1}^n I[0,θ](xi) = 1 ⇔ 0 ≤ xi ≤ θ for all i = 1, . . . , n ⇔ 0 ≤ yn ≤ θ.
We then have
L(θ) = (1/θ^n) I[0,θ](yn) = (1/θ^n) I[yn,∞)(θ) = { 1/θ^n if θ ≥ yn; 0 if θ < yn },
which is decreasing in θ on [yn, ∞) ⇒ the m.l.e. is θ̂ = Yn.
The p.d.f. of Yn is
gn(y) = n (y/θ)^{n−1} (1/θ) = n y^{n−1}/θ^n, 0 ≤ y ≤ θ.
E(Yn) = ∫_0^θ y · n y^{n−1}/θ^n dy = (n/(n+1)) θ ≠ θ ⇒ the m.l.e. θ̂ = Yn is not unbiased.
However, E(Yn) = (n/(n+1)) θ → θ as n → ∞, so the m.l.e. θ̂ is asymptotically unbiased.
E(Yn²) = ∫_0^θ y² · n y^{n−1}/θ^n dy = (n/(n+2)) θ²
Var(Yn) = E(Yn²) − (E(Yn))² = (n/(n+2)) θ² − (n/(n+1))² θ² → θ² − θ² = 0 as n → ∞.
⇒ Yn →P θ ⇒ the m.l.e. θ̂ = Yn is consistent for θ.
Is there an unbiased estimator of θ?
E(((n+1)/n) Yn) = ((n+1)/n) E(Yn) = ((n+1)/n) · (n/(n+1)) θ = θ
⇒ ((n+1)/n) Yn is unbiased for θ.
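A simulation makes the bias of Yn and the effect of the (n+1)/n correction visible (θ, n, and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 10.0, 20, 100_000  # illustrative values

X = rng.uniform(0, theta, size=(reps, n))
Yn = X.max(axis=1)                 # m.l.e. theta-hat = Y_n

print(Yn.mean())                   # about (n/(n+1)) * theta = 9.52, biased low
print(((n + 1) / n * Yn).mean())   # about theta = 10, bias removed
```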
Example:
(a) Y ∼ b(n, p).
The likelihood function is
L(p) = fY(y, p) = (n choose y) p^y (1 − p)^{n−y}
ln L(p) = ln (n choose y) + y ln p + (n − y) ln(1 − p)
∂ ln L(p)/∂p = y/p − (n − y)/(1 − p) = 0 ⇔ y/p = (n − y)/(1 − p) ⇔ y(1 − p) = p(n − y) ⇔ y = np
⇒ m.l.e. p̂ = Y/n.
E(p̂) = (1/n) E(Y) = p ⇒ the m.l.e. p̂ = Y/n is unbiased.
Var(p̂) = (1/n²) Var(Y) = (1/n) p(1 − p) → 0 as n → ∞
⇒ the m.l.e. p̂ = Y/n is consistent for p.
(b) X1, . . . , Xn iid ∼ N(µ, σ²). The likelihood function is
L(µ, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp(−(xi − µ)²/(2σ²)) = (2π)^{−n/2} (σ²)^{−n/2} exp(−(1/(2σ²)) Σ_{i=1}^n (xi − µ)²)
ln L(µ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ_{i=1}^n (xi − µ)²
∂ ln L(µ, σ²)/∂µ = (1/σ²) Σ_{i=1}^n (xi − µ) = 0 ⇒ µ̂ = X̄
∂ ln L(µ̂, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (xi − x̄)² = 0 ⇒ σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)²
E(µ̂) = E(X̄) = µ (unbiased), Var(µ̂) = Var(X̄) = σ²/n → 0 as n → ∞
⇒ the m.l.e. µ̂ is consistent for µ.
E(σ̂²) = E((1/n) Σ (Xi − X̄)²) = ((n−1)/n) σ² ≠ σ² (biased).
E(σ̂²) = ((n−1)/n) σ² → σ² as n → ∞ ⇒ σ̂² is asymptotically unbiased.
Var(σ̂²) = Var((1/n) Σ_{i=1}^n (Xi − X̄)²) = (1/n²) Var(σ² · Σ_{i=1}^n (Xi − X̄)²/σ²)
= (σ⁴/n²) Var(Σ_{i=1}^n (Xi − X̄)²/σ²) = (σ⁴/n²) · 2(n−1) = (2(n−1)/n²) σ⁴ → 0 as n → ∞,
since Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²(n−1), whose variance is 2(n−1).
⇒ σ̂² is consistent for σ².
Suppose that we have the m.l.e. θ̂ = θ̂(x1, . . . , xn) of a parameter θ and our interest is a new parameter τ(θ), a function of θ.
What is the m.l.e. of τ(θ)?
The space of τ(θ) is T = {τ : ∃ θ ∈ Θ s.t. τ = τ(θ)}.
If τ is 1-1, define the induced likelihood M(τ, x1, . . . , xn) = L(τ^{−1}(τ), x1, . . . , xn) for τ ∈ T. Then
M(τ(θ̂), x1, . . . , xn) = L(τ^{−1}(τ(θ̂)), x1, . . . , xn)
= L(θ̂, x1, . . . , xn)
≥ L(θ, x1, . . . , xn), ∀θ ∈ Θ
= L(τ^{−1}(τ(θ)), x1, . . . , xn)
= M(τ(θ), x1, . . . , xn), ∀θ ∈ Θ,
i.e., M(τ(θ̂), x1, . . . , xn) ≥ M(τ, x1, . . . , xn), ∀τ ∈ T.
⇒ τ (θ̂) is m.l.e. of τ (θ).
This is the invariance property of m.l.e.
Example:
(1) If Y ∼ b(n, p), the m.l.e. of p is p̂ = Y/n.

τ(p)      m.l.e. of τ(p)
p²        p̂² = (Y/n)²
√p        √p̂ = √(Y/n)
e^p       e^{p̂} = e^{Y/n}
e^{−p}    e^{−p̂} = e^{−Y/n}

Note: p(1 − p) is not a 1-1 function of p.
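Invariance says there is no need to re-maximize for τ(p); maximizing the induced likelihood directly gives the same answer as plugging p̂ in. A brute-force check for the row τ(p) = e^{−p} (grid search over τ; n, p, and the grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p_true = 50, 0.3
y = rng.binomial(n, p_true)   # one observation Y ~ b(n, p)
p_hat = y / n                 # m.l.e. of p

# maximize the induced likelihood in tau = exp(-p); tau is 1-1 with p = -ln(tau)
taus = np.linspace(np.exp(-1) + 1e-9, 1 - 1e-9, 100_000)  # tau in (e^-1, 1)
ps = -np.log(taus)
log_lik = y * np.log(ps) + (n - y) * np.log(1 - ps)

print(np.exp(-p_hat), taus[np.argmax(log_lik)])  # agree up to grid resolution
```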
(2) X1, . . . , Xn iid ∼ N(µ, σ²); the m.l.e. of (µ, σ²) is (X̄, (1/n) Σ (Xi − X̄)²).
The m.l.e. of (µ, σ) is (X̄, √((1/n) Σ (Xi − X̄)²)) (∵ σ ∈ (0, ∞), so σ² → σ is 1-1).
You can also solve
∂ ln L(µ, σ², x1, . . . , xn)/∂µ = 0 and
∂ ln L(µ, σ², x1, . . . , xn)/∂σ = 0
for µ and σ directly.
(µ², σ) is not a 1-1 function of (µ, σ²)
(∵ µ ∈ (−∞, ∞), so µ → µ² is not 1-1).
Best estimator:
Def. An unbiased estimator θ̂ of θ is the uniformly minimum variance unbiased estimator (UMVUE) if Varθ(θ̂) ≤ Varθ(θ̃) for every unbiased estimator θ̃ of θ and all θ ∈ Θ.
There are several ways of deriving the UMVUE of θ.
Cramér-Rao lower bound for the variance of an unbiased estimator:
Regularity conditions:
(a) The parameter space Θ is an open interval (a, b), (a, ∞), or (−∞, b), where a, b are constants not depending on θ.
(b) The support {x : f(x, θ) > 0} does not depend on θ.
(c) ∫ (∂f(x, θ)/∂θ) dx = (∂/∂θ) ∫ f(x, θ) dx = 0.
Let t(X1, . . . , Xn) be an unbiased estimator of τ(θ), so that
τ(θ) = E(t(X1, . . . , Xn)) = ∫ ··· ∫ t(x1, . . . , xn) Π_{i=1}^n f(xi, θ) dx1 ··· dxn.
Differentiating both sides with respect to θ gives
τ'(θ) = ∫ ··· ∫ t(x1, . . . , xn) (∂/∂θ) Π_{i=1}^n f(xi, θ) dx1 ··· dxn.
Now,
(∂/∂θ) Π_{i=1}^n f(xi, θ) = (∂/∂θ) [f(x1, θ) f(x2, θ) ··· f(xn, θ)]
= (∂f(x1, θ)/∂θ) Π_{i≠1} f(xi, θ) + ··· + (∂f(xn, θ)/∂θ) Π_{i≠n} f(xi, θ)
= Σ_{j=1}^n (∂f(xj, θ)/∂θ) Π_{i≠j} f(xi, θ)
= Σ_{j=1}^n (∂ ln f(xj, θ)/∂θ) f(xj, θ) Π_{i≠j} f(xi, θ)
= [Σ_{j=1}^n ∂ ln f(xj, θ)/∂θ] Π_{i=1}^n f(xi, θ)
Cauchy-Schwarz Inequality:
[E(XY)]² ≤ E(X²) E(Y²)
Then
τ'(θ) = ∫ ··· ∫ (t(x1, . . . , xn) − τ(θ)) [Σ_{j=1}^n ∂ ln f(xj, θ)/∂θ] Π_{i=1}^n f(xi, θ) Π_{i=1}^n dxi
(subtracting τ(θ) changes nothing because, by (c), E(∂ ln f(Xj, θ)/∂θ) = ∫ (∂f(x, θ)/∂θ) dx = 0)
= E[(t(X1, . . . , Xn) − τ(θ)) Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ].
By the Cauchy-Schwarz inequality,
(τ'(θ))² ≤ E[(t(X1, . . . , Xn) − τ(θ))²] E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²]
⇒ Var(t(X1, . . . , Xn)) ≥ (τ'(θ))² / E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²].
Since
E[(Σ_{j=1}^n ∂ ln f(Xj, θ)/∂θ)²] = Σ_{j=1}^n E[(∂ ln f(Xj, θ)/∂θ)²] + Σ_{i≠j} E[(∂ ln f(Xj, θ)/∂θ)(∂ ln f(Xi, θ)/∂θ)]
= Σ_{j=1}^n E[(∂ ln f(Xj, θ)/∂θ)²]
(the cross terms vanish because Xi and Xj are independent and each factor has mean 0)
= n E[(∂ ln f(X, θ)/∂θ)²].
Then we have
Varθ(t(X1, . . . , Xn)) ≥ (τ'(θ))² / (n Eθ[(∂ ln f(X, θ)/∂θ)²]).
You may further check that
Eθ(∂² ln f(X, θ)/∂θ²) = −Eθ[(∂ ln f(X, θ)/∂θ)²].
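The identity can be checked by Monte Carlo for a concrete family, e.g. with the Poisson derivatives computed in example (a) below (λ and the simulation size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)  # Monte Carlo draws of X ~ Poisson(lambda)

score = x / lam - 1       # (d/d lambda) ln f(X, lambda)
hess  = -x / lam**2       # (d^2/d lambda^2) ln f(X, lambda)
print(hess.mean(), -(score**2).mean())  # both approximately -1/lambda
```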
Note:
If τ(θ) = θ, then τ'(θ) = 1 and any unbiased estimator θ̂ satisfies
Varθ(θ̂) ≥ 1 / (−n Eθ(∂² ln f(X, θ)/∂θ²)).
Example:
(a) X1, . . . , Xn iid ∼ Poisson(λ), E(X) = λ, Var(X) = λ.
MLE λ̂ = X̄, E(λ̂) = λ, Var(λ̂) = λ/n.
p.d.f. f(x, λ) = λ^x e^{−λ}/x!, x = 0, 1, . . .
⇒ ln f(x, λ) = x ln λ − λ − ln x!
⇒ (∂/∂λ) ln f(x, λ) = x/λ − 1
⇒ (∂²/∂λ²) ln f(x, λ) = −x/λ²
E((∂²/∂λ²) ln f(X, λ)) = E(−X/λ²) = −E(X)/λ² = −1/λ
The Cramér-Rao lower bound is
1/(−n(−1/λ)) = λ/n = Var(λ̂)
⇒ the MLE λ̂ = X̄ attains the bound and is the UMVUE of λ.
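A simulation confirming that the sampling variance of λ̂ = X̄ sits at the bound λ/n (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
lam, n, reps = 3.0, 25, 200_000  # illustrative values

lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)  # X-bar for each sample
print(lam_hat.var(), lam / n)  # sampling variance vs. the C-R bound lambda/n
```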
(b) X1, . . . , Xn iid ∼ Bernoulli(p), E(X) = p, Var(X) = p(1 − p).
We want the UMVUE of p.
p.d.f. f(x, p) = p^x (1 − p)^{1−x}, x = 0, 1
⇒ ln f(x, p) = x ln p + (1 − x) ln(1 − p)
(∂/∂p) ln f(x, p) = x/p − (1 − x)/(1 − p)
(∂²/∂p²) ln f(x, p) = −x/p² − (1 − x)/(1 − p)²
E((∂²/∂p²) ln f(X, p)) = E(−X/p² − (1 − X)/(1 − p)²) = −1/p − 1/(1 − p) = −1/(p(1 − p))
The Cramér-Rao lower bound is
1/(−n(−1/(p(1 − p)))) = p(1 − p)/n.
The m.l.e. of p is p̂ = X̄.
E(p̂) = E(X̄) = p, Var(p̂) = Var(X̄) = p(1 − p)/n = C-R lower bound.
⇒ MLE p̂ is the UMVUE of p.
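The same numerical check for the Bernoulli case (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
p, n, reps = 0.3, 40, 200_000  # illustrative values

p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)  # p-hat = X-bar
print(p_hat.mean(), p)               # close to p (unbiased)
print(p_hat.var(), p * (1 - p) / n)  # variance attains the C-R bound
```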