Maximum Likelihood Estimation
Point Estimation
Recall that a probability space (sometimes called a probability model) is a triple (Ω, B, P), where Ω is a sample space, B is a σ-algebra of events (subsets of Ω) and P is a probability function (defined on B). In statistics we need to be able to study several probability functions simultaneously.
Definition. A statistical experiment (sometimes called a statistical model) is a triple (Ω, B, 𝒫), where Ω is a sample space, B is a σ-algebra of events and 𝒫 is a collection of probability functions defined on B; that is, 𝒫 is a collection of probability functions such that (Ω, B, P) is a probability space for each P ∈ 𝒫.
Definition. A statistical model (Ω, B, 𝒫) is parametric if 𝒫 is of the form {Pθ : θ ∈ Θ}, Θ ⊆ R^k, where θ is a parameter, which takes on values in the parameter space Θ, a subset of R^k.
For concreteness, we will only consider parametric statistical models. Moreover, we will assume that the elementary outcomes are vectors of real numbers and that these outcomes are realizations of a collection of i.i.d. random variables. That is, each outcome is of the form
$$ x = (x_1, \ldots, x_n)', $$
where x1, …, xn are realizations of i.i.d. random variables X1, …, Xn whose common distribution has cdf F(·|θ).
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ [0, 1] is an unknown parameter. In this case, θ = p, Θ = [0, 1] ⊆ R and
$$ F(x|p) = \begin{cases} 0 & \text{for } x < 0 \\ 1-p & \text{for } 0 \le x < 1 \\ 1 & \text{for } x \ge 1. \end{cases} $$
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. In this case, Θ = R++ ⊆ R and
$$ F(x|\theta) = \begin{cases} 0 & \text{for } x < 0 \\ x/\theta & \text{for } 0 \le x < \theta \\ 1 & \text{for } x \ge \theta. \end{cases} $$
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. In this case, θ = (µ, σ²)′, Θ = R × R++ ⊆ R² and
$$ F(x|\mu,\sigma^2) = \int_{-\infty}^{x} \phi(t|\mu,\sigma^2)\,dt, \quad x \in \mathbb{R}, $$
where
$$ \phi(t|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(t-\mu)^2\right), \quad t \in \mathbb{R}. $$
Definition. Let X1, …, Xn be a random sample from a distribution with cdf F(·|θ), where θ is an unknown parameter. A point estimator is any statistic W(X1, …, Xn).
At this level of generality, a point estimator is just a random variable. A realized value of a (point)
estimator is called a (point) estimate. The quantity we are trying to estimate (typically θ) is called the
estimand. It is not required that the range of the estimator coincides with the range of the estimand;
that is, W (X1 , . . . , Xn ) ∈ Θ is not required. On the other hand, an estimator W (X1 , . . . , Xn ) of θ is
a good estimator (only) if it is “close” to θ in some probabilistic sense and this will typically require
W (X1 , . . . , Xn ) ∈ Θ.
Casella and Berger (Sections 7.2.1-7.2.3) discuss three methods that can be used to generate estimators
under quite general circumstances. We will cover two of these methods, the method of moments and
the maximum likelihood procedure. Method of moments estimators are obtained by solving a system of
equations, while maximum likelihood estimators are constructed by solving a maximization problem.
Suppose X1, …, Xn is a random sample from a distribution with cdf F(·|θ), where θ ∈ Θ is an unknown scalar parameter. Let µ : Θ → R be the function defined by
$$ \mu(\theta) = \int_{-\infty}^{\infty} x\, dF(x|\theta), \quad \theta \in \Theta. $$
As defined, µ(θ) is the expected value of a random variable with cdf F(·|θ). The true parameter value θ solves the equation
$$ E(X) = \mu(\theta), $$
where X is a random variable with cdf F(·|θ). The method of moments estimator θ̂ of θ is obtained by solving the sample analogue of this equation,
$$ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \mu(\hat{\theta}). $$
To the extent that X̄ is a good estimator of E (X) (it turns out that it often is), one would expect θ̂ to be
a good estimator of θ.
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ [0, 1] is an unknown parameter. In this case, θ = p, Θ = [0, 1] and
$$ \mu(p) = p, \quad 0 \le p \le 1. $$
The method of moments estimator p̂ is found by solving the equation µ(p̂) = X̄:
$$ \hat{p} = \bar{X}. $$
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. In this case,
$$ \mu(\theta) = \frac{\theta}{2}, \quad \theta > 0. $$
The method of moments estimator θ̂ is found by solving the equation µ(θ̂) = X̄:
$$ \mu(\hat{\theta}) = \frac{\hat{\theta}}{2} = \bar{X} \iff \hat{\theta} = 2\bar{X}. $$
As it turns out, this method of moments estimator is not a terribly good estimator. Notice that even
though θ is unknown, we do know that Xi > θ is impossible when Xi ∼ U [0, θ] . It is possible to have
Xi > θ̂ for some i (e.g. if n = 3 and X1 = X2 = 1 and X3 = 7), so it seems plausible that a better
estimator can be constructed.
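As a quick numerical illustration (a minimal simulation sketch that is not part of the original notes; the true θ, sample size, and seed below are arbitrary choices), one can check that 2X̄ may fall below the largest observation, even though no observation can exceed the true θ:

```python
import numpy as np

# Minimal sketch: method of moments for U[0, theta] (assumed theta = 5, n = 10).
rng = np.random.default_rng(0)
theta = 5.0
x = rng.uniform(0.0, theta, size=10)

theta_mom = 2 * x.mean()    # method of moments estimate 2 * X-bar
print(theta_mom, x.max())   # theta_mom can be smaller than max(x_i),
                            # even though X_i > theta is impossible
```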
In the case of a scalar parameter θ, the method of moments estimator is constructed by solving one (moment) equation in one unknown parameter. When θ is a k-dimensional parameter vector, θ = (θ1, …, θk)′, the method of moments estimator of θ is constructed by solving k equations in the k unknown parameters θ1, …, θk.
Definition. Let X1, …, Xn be a random sample from a distribution with cdf F(·|θ), where θ ∈ Θ ⊆ R^k is a vector of unknown parameters. For any j = 1, …, k, let µj : Θ → R be defined by
$$ \mu_j(\theta) = \int_{-\infty}^{\infty} x^j\, dF(x|\theta), \quad \theta \in \Theta. $$
The method of moments estimator θ̂ of θ solves the system of equations
$$ \frac{1}{n}\sum_{i=1}^{n} X_i^j = \mu_j(\hat{\theta}), \quad j = 1, \ldots, k. $$
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. In this case, θ = (µ, σ²)′, Θ = R × R++ ⊆ R² and the functions µ1(·) and µ2(·) are given by
$$ \mu_1(\mu,\sigma^2) = \mu, \qquad \mu_2(\mu,\sigma^2) = \sigma^2 + \mu^2. $$
Any solution (µ̂, σ̂²)′ to the equation
$$ \frac{1}{n}\sum_{i=1}^{n} X_i = \mu_1(\hat{\mu},\hat{\sigma}^2) = \hat{\mu} $$
satisfies
$$ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}, $$
and the second equation, (1/n) Σ Xi² = µ2(µ̂, σ̂²) = σ̂² + µ̂², then gives σ̂² = (1/n) Σ Xi² − X̄².
In all of the examples considered so far, there is a unique solution θ̂ ∈ Θ to the system
$$ \frac{1}{n}\sum_{i=1}^{n} X_i^j = \int_{-\infty}^{\infty} x^j\, dF(x|\hat{\theta}), \quad j = 1, \ldots, k, $$
of estimating equations. It is not difficult to construct examples where the method of moments breaks down. In such cases, some variant of the method of moments may work.
Example. Suppose Xi ∼ i.i.d. N(0, σ²), where σ² > 0 is an unknown parameter. In this case, θ = σ², Θ = R++ ⊆ R and the function µ1(·) is given by
$$ \mu_1(\sigma^2) = 0. $$
The equation
$$ \bar{X} = \mu_1(\hat{\sigma}^2) = 0 $$
does not involve σ̂² at all, so the first moment cannot be used to estimate σ². The second moment can: solving (1/n) Σ Xi² = µ2(σ̂²) = σ̂² gives σ̂² = (1/n) Σ Xi².
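A short numerical check of this point (a sketch that is not in the original notes; the true σ², the sample size, and the seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: for N(0, sigma^2) the first sample moment is uninformative,
# but the second sample moment estimates sigma^2 directly (sigma^2 = 4, n = 200 assumed).
rng = np.random.default_rng(1)
sigma2 = 4.0
x = rng.normal(0.0, np.sqrt(sigma2), size=200)

print(x.mean())        # close to 0 regardless of sigma^2
print((x**2).mean())   # variant method of moments estimate of sigma^2
```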
Generalizing this example, let X1, …, Xn be a random sample from a distribution with cdf F(·|θ), where θ ∈ Θ ⊆ R^k is a vector of unknown parameters. Even if we cannot find a unique solution θ̂ to the estimating equations
$$ \frac{1}{n}\sum_{i=1}^{n} X_i^j = \int_{-\infty}^{\infty} x^j\, dF(x|\hat{\theta}), \quad j = 1, \ldots, k, $$
it may be possible to choose functions g1, …, gk of the data such that the alternative system
$$ \frac{1}{n}\sum_{i=1}^{n} g_j(X_i) = \int_{-\infty}^{\infty} g_j(x)\, dF(x|\hat{\theta}), \quad j = 1, \ldots, k, $$
has a unique solution θ̂ ∈ Θ. Estimators θ̂ constructed in this way are also called method of moments estimators.
Definition. Let X = (X1, …, Xn)′ be a discrete (continuous) n-dimensional random vector with joint pmf (pdf) fX(·|θ) : R^n → R+, where θ ∈ Θ is an unknown parameter vector. For any x = (x1, …, xn)′, the likelihood function given x is the function L(·|x) : Θ → R+ given by
$$ L(\theta|x) = f_X(x|\theta), \quad \theta \in \Theta. $$
The log likelihood function given x is the function l(·|x) : Θ → [−∞, ∞) given by
$$ l(\theta|x) = \log L(\theta|x), \quad \theta \in \Theta. $$
When X1, …, Xn is a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), the likelihood function given x = (x1, …, xn)′ is
$$ L(\theta|x) = \prod_{i=1}^{n} f(x_i|\theta), \quad \theta \in \Theta, $$
and the log likelihood function is
$$ l(\theta|x) = \sum_{i=1}^{n} \log f(x_i|\theta), \quad \theta \in \Theta. $$
Definition. Let X1, …, Xn be a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), where θ ∈ Θ is an unknown parameter vector. When X = (X1, …, Xn)′ = x, a maximum likelihood estimate θ̂(x) of θ satisfies
$$ L(\hat{\theta}(x)|x) = \max_{\theta \in \Theta} L(\theta|x), $$
where L(·|x) is the likelihood function given x. The estimator θ̂(X) is a maximum likelihood estimator (MLE) of θ.
Maximum likelihood estimators often enjoy favorable large sample properties. That result is related to
the following fact, which in itself can be used to motivate the maximum likelihood estimator.
Theorem (Information Inequality; Ruud, Lemma D.2). Let X be a discrete (continuous) random variable with pmf (pdf) f0 and let f1 be any other pmf (pdf). Then
$$ E\left(\log f_1(X)\right) \le E\left(\log f_0(X)\right), $$
where the expectations are computed using f0.
Proof. Let Y = f1(X)/f0(X) and let 𝒳 = {x : f0(x) > 0}. Because log(·) is concave, Jensen's inequality gives
$$ E(\log(Y)) \le \log(E(Y)). $$
Now,
$$ E(Y) = \sum_{x \in \mathcal{X}} \frac{f_1(x)}{f_0(x)} \cdot f_0(x) = \sum_{x \in \mathcal{X}} f_1(x) \le \sum_{x \in \mathbb{R}} f_1(x) = 1 $$
if X is discrete, while
$$ E(Y) = \int_{\mathcal{X}} \frac{f_1(x)}{f_0(x)} \cdot f_0(x)\, dx = \int_{\mathcal{X}} f_1(x)\, dx \le \int_{-\infty}^{\infty} f_1(x)\, dx = 1 $$
if X is continuous. Therefore E(log f1(X)) − E(log f0(X)) = E(log(Y)) ≤ log(E(Y)) ≤ log 1 = 0, as was to be shown. ∎
Let X1, …, Xn be a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), where θ ∈ Θ is unknown. It follows from the information inequality that
$$ E_\theta\left(\log f(X|\theta^*)\right) \le E_\theta\left(\log f(X|\theta)\right) $$
for any θ* ∈ Θ, where Eθ(·) denotes the expected value computed using the true (unknown) cdf F(·|θ) of the random variable X. As a consequence, the true parameter value θ solves the problem of maximizing
$$ E_\theta\left(\log f(X|\theta^*)\right) $$
with respect to θ* ∈ Θ.
The sample analogue of this problem is that of maximizing the average log likelihood
$$ \frac{1}{n}\sum_{i=1}^{n} \log f(X_i|\theta^*) $$
with respect to θ* ∈ Θ. The average log likelihood is a strictly increasing function of L(θ*|X1, …, Xn). Specifically,
$$ \frac{1}{n}\sum_{i=1}^{n} \log f(X_i|\theta^*) = \frac{1}{n}\log L(\theta^*|X_1, \ldots, X_n). $$
Therefore, a maximum likelihood estimator θ̂(X1, …, Xn) maximizes the average log likelihood with respect to θ*:
$$ \frac{1}{n}\sum_{i=1}^{n} \log f\left(X_i|\hat{\theta}(X_1, \ldots, X_n)\right) = \max_{\theta^* \in \Theta} \frac{1}{n}\sum_{i=1}^{n} \log f(X_i|\theta^*). $$
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ [0, 1] is an unknown parameter. Each Xi is discrete with pmf
$$ f(x|p) = \begin{cases} 1-p & \text{for } x = 0 \\ p & \text{for } x = 1 \\ 0 & \text{otherwise} \end{cases} = \begin{cases} p^x (1-p)^{1-x} & \text{for } x \in \{0,1\} \\ 0 & \text{otherwise,} \end{cases} $$
where 0⁰ = 1 and 1(·) is the indicator function. It suffices to consider the case where xi ∈ {0, 1} for i = 1, …, n, as the likelihood is zero for all other values of x = (x1, …, xn)′.
The likelihood given x is
$$ L(p|x) = \prod_{i=1}^{n} f(x_i|p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^{n} x_i}\,(1-p)^{n - \sum_{i=1}^{n} x_i}, \quad p \in [0,1], $$
and the log likelihood is
$$ l(p|x) = \sum_{i=1}^{n} \log f(x_i|p) = \sum_{i=1}^{n}\left(x_i \log p + (1-x_i)\log(1-p)\right) = \left(\sum_{i=1}^{n} x_i\right)\log p + \left(n - \sum_{i=1}^{n} x_i\right)\log(1-p), \quad p \in [0,1], $$
where 0 · log 0 = 0.
If Σ_{i=1}^n xi = 0, then
$$ L(p|x) = (1-p)^n $$
is a decreasing function of p and p = 0 maximizes L(p|x) with respect to p ∈ [0, 1]. If Σ_{i=1}^n xi = n, then
$$ L(p|x) = p^n $$
is an increasing function of p and p = 1 maximizes L(p|x) with respect to p ∈ [0, 1]. If 0 < Σ_{i=1}^n xi < n, the maximizer lies in the interior of [0, 1] and satisfies the first-order condition
$$ \left.\frac{d}{dp}\, l(p|x)\right|_{p=\hat{p}} = \left(\sum_{i=1}^{n} x_i\right)\frac{1}{\hat{p}} - \left(n - \sum_{i=1}^{n} x_i\right)\frac{1}{1-\hat{p}} = 0 \iff \hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}. $$
In every case the maximizer is Σ_{i=1}^n xi / n, so
$$ \hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} $$
is the maximum likelihood estimator of p. In this case, the maximum likelihood estimator coincides with the method of moments estimator.
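As a numerical cross-check (a minimal sketch outside the original notes; the true p, the sample size, the seed, and the grid are arbitrary assumptions), maximizing the log likelihood over a grid reproduces p̂ = X̄:

```python
import numpy as np

# Minimal sketch: Bernoulli MLE.  Assumed true p = 0.3 and n = 50.
rng = np.random.default_rng(2)
x = rng.binomial(1, 0.3, size=50)

def loglik(p):
    # log L(p|x), valid for p strictly between 0 and 1
    return x.sum() * np.log(p) + (len(x) - x.sum()) * np.log(1 - p)

grid = np.linspace(0.001, 0.999, 999)
p_grid = grid[np.argmax([loglik(p) for p in grid])]
print(p_grid, x.mean())   # grid maximizer agrees with the closed form X-bar
```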
Exercise. Verify that p* = p maximizes
$$ E_p\left(\log f(X_i|p^*)\right) = p \cdot \log p^* + (1-p)\cdot \log(1-p^*) $$
with respect to p* ∈ [0, 1].
When a unique maximum likelihood estimator θ̂ of θ = (θ1, …, θk)′ exists, it can usually be constructed by solving the likelihood equations
$$ \left.\frac{\partial}{\partial \theta_j}\, l(\theta|X_1, \ldots, X_n)\right|_{\theta=\hat{\theta}} = 0, \quad j = 1, \ldots, k, $$
and verifying that a second-order condition holds. For instance, if θ is a scalar parameter, a unique solution θ̂ to
$$ \left.\frac{d}{d\theta}\, l(\theta|x_1, \ldots, x_n)\right|_{\theta=\hat{\theta}} = 0 $$
is the maximum likelihood estimate if the second derivative of l(·|x1, …, xn) is negative at θ̂ and the likelihood does not attain a larger value on the boundary of Θ.
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. The marginal pdf of Xi is f(·|µ, σ²), where
$$ f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right). $$
The likelihood given x is
$$ L(\mu,\sigma^2|x) = \prod_{i=1}^{n} f(x_i|\mu,\sigma^2) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x_i-\mu)^2\right) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right), $$
and the log likelihood is
$$ l(\mu,\sigma^2|x) = \sum_{i=1}^{n}\log f(x_i|\mu,\sigma^2) = \sum_{i=1}^{n}\left(-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(x_i-\mu)^2\right) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2. $$
The likelihood equations are
$$ \left.\frac{\partial}{\partial\mu}\, l(\mu,\sigma^2|X_1,\ldots,X_n)\right|_{\theta=\hat{\theta}} = \frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}(X_i - \hat{\mu}) = 0 $$
and
$$ \left.\frac{\partial}{\partial\sigma^2}\, l(\mu,\sigma^2|X_1,\ldots,X_n)\right|_{\theta=\hat{\theta}} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}(X_i-\hat{\mu})^2 = 0, $$
where θ = (µ, σ²)′ and θ̂ = (µ̂, σ̂²)′. The unique solution to these equations is
$$ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\hat{\mu})^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2. $$
The matrix
$$ \left.\frac{\partial^2}{\partial\theta\,\partial\theta'}\, l(\mu,\sigma^2|x)\right|_{\theta=\hat{\theta}} = \left.\begin{pmatrix} \frac{\partial^2}{\partial\mu\,\partial\mu} l(\mu,\sigma^2|x) & \frac{\partial^2}{\partial\mu\,\partial\sigma^2} l(\mu,\sigma^2|x) \\ \frac{\partial^2}{\partial\sigma^2\,\partial\mu} l(\mu,\sigma^2|x) & \frac{\partial^2}{\partial\sigma^2\,\partial\sigma^2} l(\mu,\sigma^2|x) \end{pmatrix}\right|_{\theta=\hat{\theta}} = \begin{pmatrix} -\frac{n}{\hat{\sigma}^2} & 0 \\ 0 & -\frac{n}{2\hat{\sigma}^4} \end{pmatrix} $$
is negative definite, so (µ̂, σ̂²)′ is a local maximizer of l(µ, σ²|x). In fact,
$$ \lim_{|\mu|\to\infty} L(\mu,\sigma^2|x) = 0 $$
for any σ² > 0, and
$$ \lim_{\sigma^2 \to 0^+} L(\mu,\sigma^2|x) = \lim_{\sigma^2 \to \infty} L(\mu,\sigma^2|x) = 0 $$
for any µ ∈ R, so (µ̂, σ̂²)′ is the maximum likelihood estimator of (µ, σ²)′. Once again, the maximum likelihood estimator coincides with the method of moments estimator.
In the present case, the second-order condition can also be verified using univariate calculus (Casella and Berger, Example 7.2.11).
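A minimal computational sketch (not in the original notes; the true parameters, sample size, and seed are arbitrary assumptions) of the closed-form Gaussian MLE:

```python
import numpy as np

# Minimal sketch: Gaussian MLE in closed form (true mu = 1, sigma^2 = 4, n = 100 assumed).
rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=100)

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()   # divides by n, not n - 1
print(mu_hat, sigma2_hat)
print(np.var(x))                          # np.var uses ddof=0, so it matches sigma2_hat
```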
Exercise. Verify that (µ*, σ*²) = (µ, σ²) maximizes
$$ E_{(\mu,\sigma^2)}\left(\log f(X_i|\mu^*,\sigma^{*2})\right) = E_{(\mu,\sigma^2)}\left(-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^{*2}) - \frac{1}{2\sigma^{*2}}(X_i-\mu^*)^2\right) = -\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma^{*2} - \frac{1}{2\sigma^{*2}}\left(\sigma^2 + (\mu-\mu^*)^2\right) $$
with respect to (µ*, σ*²) ∈ R × R++.
One case where the maximum likelihood estimator cannot be constructed by solving the likelihood equations is the following.
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. Each Xi is continuous with pdf
$$ f(x|\theta) = \begin{cases} 1/\theta & \text{for } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases} = \frac{1}{\theta}\, 1(0 \le x \le \theta). $$
It suffices to consider the case where xi ≥ 0 for i = 1, …, n, as the likelihood is zero for all other values of x = (x1, …, xn)′.
The likelihood given x is
$$ L(\theta|x) = \prod_{i=1}^{n} f(x_i|\theta) = \prod_{i=1}^{n} \frac{1}{\theta}\,1(0 \le x_i \le \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} 1(0 \le x_i \le \theta) = \frac{1}{\theta^n}\, 1\!\left(\max_{1\le i\le n} x_i \le \theta\right), \quad \theta > 0, $$
which is zero for θ < max_{1≤i≤n} xi and strictly decreasing in θ for θ ≥ max_{1≤i≤n} xi. The likelihood is therefore maximized at
$$ \hat{\theta} = \max_{1\le i\le n} X_i. $$
In this case, the maximum likelihood estimator is different from the method of moments estimator (the latter is 2·X̄). Unlike the method of moments estimator, the maximum likelihood estimator has the property that θ̂ ≥ Xi for every i. On the other hand, since max_{1≤i≤n} Xi is a lower bound on the true θ, θ̂ will tend to underestimate θ. Indeed, P(θ̂ ≤ θ) = 1.
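A small simulation sketch of these two properties (not part of the original notes; the true θ, sample size, number of replications, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: U[0, theta] with theta = 5 assumed; compare the MLE max(X_i)
# with the method of moments estimator 2 * X-bar over repeated samples.
rng = np.random.default_rng(4)
theta, n, reps = 5.0, 20, 10_000
x = rng.uniform(0.0, theta, size=(reps, n))

mle = x.max(axis=1)
mom = 2 * x.mean(axis=1)
print((mle <= theta).mean())             # always 1: the MLE never exceeds theta
print(mle.mean(), theta * n / (n + 1))   # MLE is biased downward, mean ~ n*theta/(n+1)
print(mom.mean())                        # MoM is unbiased on average but can exceed max(x_i)
```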
In the sense of the following definition, the maximum likelihood estimator max_{1≤i≤n} Xi in the preceding example is the nth order statistic and is often denoted by X(n).
Definition. Let X1, …, Xn be a random sample. The order statistics are the sample values placed in ascending order. They are denoted by X(1), …, X(n).
Example. For any random sample X1, …, Xn, X(1) = min_{1≤i≤n} Xi and X(n) = max_{1≤i≤n} Xi.
Remark. The pmf (pdf) of order statistics obtained from a discrete (continuous) distribution can be characterized using combinatorial arguments (Casella and Berger, Section 5.4).
Remark. Maximum likelihood estimators are equivariant in the sense that if θ̂ is a maximum likelihood estimator of θ, then τ(θ̂) is a maximum likelihood estimator of τ(θ) for any function τ(·) defined on Θ (Casella and Berger, Theorem 7.2.10).
Remark. Method of moments and maximum likelihood estimators are special cases of two general classes of estimators. A Z-estimator θ̂ of θ solves a system of estimating equations of the form
$$ \frac{1}{n}\sum_{i=1}^{n} \psi_j(X_i, \hat{\theta}) = 0, \quad j = 1, \ldots, k, $$
where each ψj : R × Θ → R is a function. Z-estimators are usually motivated by showing that θ is the only value of θ* ∈ Θ for which
$$ E_\theta\, \psi_j(X, \theta^*) = 0 \quad \forall j \in \{1, \ldots, k\}. $$
The leading special case is the method of moments estimator, which is a Z-estimator with
$$ \psi_j(X_i, \theta^*) = X_i^j - \int_{-\infty}^{\infty} x^j\, dF(x|\theta^*), \quad j = 1, \ldots, k. $$
An M-estimator θ̂ of θ solves a maximization problem of the form
$$ \frac{1}{n}\sum_{i=1}^{n} m(X_i, \hat{\theta}) = \max_{\theta^* \in \Theta} \frac{1}{n}\sum_{i=1}^{n} m(X_i, \theta^*), $$
where m : R × Θ → R is a function. M-estimators are usually motivated by showing that θ is the unique maximizer (with respect to θ* ∈ Θ) of
$$ E_\theta\, m(X, \theta^*). $$
The leading special case is the maximum likelihood estimator, which is an M-estimator with
$$ m(X_i, \theta^*) = \log f(X_i|\theta^*). $$
In particular, many maximum likelihood estimators can be interpreted as method of moments estimators with
$$ g_j(X_i, \theta^*) = \left.\frac{\partial}{\partial\theta_j}\log f(X_i|\theta)\right|_{\theta=\theta^*}, \quad j = 1, \ldots, k. $$
It will almost always be possible to find a set of functions {ψj : j = 1, …, k} such that θ is the only value of θ* ∈ Θ for which
$$ E_\theta\, \psi_j(X, \theta^*) = 0 \quad \forall j \in \{1, \ldots, k\}, $$
or a function m such that θ is the unique maximizer (with respect to θ* ∈ Θ) of
$$ E_\theta\, m(X, \theta^*). $$
An important exception occurs when the model fails to be identified in the sense of the following definition.
Definition. Let (Ω, B, {Pθ : θ ∈ Θ}) be a parametric statistical model. A parameter value θ1 ∈ Θ is identified if there does not exist another parameter value θ2 ∈ Θ such that Pθ1 = Pθ2. The model (Ω, B, {Pθ : θ ∈ Θ}) is identified if every parameter value θ ∈ Θ is identified.
In other words, a model is identified if knowledge of the true marginal cdf F(·|θ) implies knowledge of the parameter θ. This is a very modest and reasonable requirement. Identification is a property of the model and its parameterization, not of any particular estimation procedure.
Example. Suppose we observe Xi = 1(Yi ≤ 0), where Yi ∼ i.i.d. N(µ, σ²) and µ ∈ R and σ² > 0 are unknown parameters. In this case, Xi ∼ i.i.d. Ber(Φ(−µ/σ)), where Φ(·) is the cdf of the standard normal distribution:
$$ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}t^2\right) dt. $$
As a consequence, the marginal distribution of each Xi depends on (µ, σ²)′ only through µ/σ and any parameter value (µ1, σ1²)′ is unidentified. We can achieve identification by imposing an identifying assumption on the parameters. In this case, a natural identifying assumption is σ = 1.
Remark. If θ1 ∈ Θ is an unidentified parameter value, there is another parameter value θ2 ∈ Θ such that F(·|θ1) = F(·|θ2), implying
$$ E_{\theta_1} g(X) = \int_{-\infty}^{\infty} g(x)\, dF(x|\theta_1) = \int_{-\infty}^{\infty} g(x)\, dF(x|\theta_2) = E_{\theta_2} g(X) $$
for any function g. In particular, any solution θ* to a system of estimating equations based on the distribution indexed by θ1, or any maximizer of
$$ E_{\theta_1} m(X, \theta^*), $$
solves the same problem based on the distribution indexed by θ2, in particular the problem of maximizing
$$ E_{\theta_2} m(X, \theta^*). $$
If one attempts to estimate the parameters of an unidentified model, unique method of moments (maximum likelihood) estimators typically cannot be found.
Definition. The mean squared error (MSE) matrix of an estimator θ̂ of θ is the function (of θ) given by
$$ MSE_\theta(\hat{\theta}) = E_\theta\left[(\hat{\theta}-\theta)(\hat{\theta}-\theta)'\right], \quad \theta \in \Theta. $$
An estimator θ̂ of θ is unbiased if
$$ E_\theta(\hat{\theta}) = \theta \quad \forall \theta \in \Theta. $$
Many results derived using MSE generalize to other measures of closeness. It is convenient to use MSE because it is analytically tractable and has a straightforward interpretation in terms of the variance and bias of the estimator θ̂. Specifically,
$$ MSE_\theta(\hat{\theta}) = E_\theta\left[(\hat{\theta}-\theta)(\hat{\theta}-\theta)'\right] = E_\theta\left[\left(\hat{\theta}-E_\theta(\hat{\theta}) + E_\theta(\hat{\theta})-\theta\right)\left(\hat{\theta}-E_\theta(\hat{\theta}) + E_\theta(\hat{\theta})-\theta\right)'\right] $$
$$ = E_\theta\left[\left(\hat{\theta}-E_\theta(\hat{\theta})\right)\left(\hat{\theta}-E_\theta(\hat{\theta})\right)'\right] + \left(E_\theta(\hat{\theta})-\theta\right)\left(E_\theta(\hat{\theta})-\theta\right)' + E_\theta\left[\hat{\theta}-E_\theta(\hat{\theta})\right]\left(E_\theta(\hat{\theta})-\theta\right)' + \left(E_\theta(\hat{\theta})-\theta\right)E_\theta\left[\hat{\theta}-E_\theta(\hat{\theta})\right]' $$
$$ = Var_\theta(\hat{\theta}) + Bias_\theta(\hat{\theta})\cdot Bias_\theta(\hat{\theta})' $$
because Eθ[θ̂ − Eθ(θ̂)] = 0 and Eθ(θ̂) − θ = Biasθ(θ̂) is non-random. In particular, when θ is a scalar,
$$ MSE_\theta(\hat{\theta}) = Var_\theta(\hat{\theta}) + Bias_\theta(\hat{\theta})^2. $$
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. Two estimators of σ² are the sample variance
$$ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 $$
and the maximum likelihood estimator
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2. $$
Recall that (1/σ²) Σ(Xi − X̄)² ∼ χ²_{n−1}, with mean n − 1 and variance 2(n − 1), implying
$$ E(S^2) = \frac{\sigma^2}{n-1}\cdot E\left(\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\bar{X})^2\right) = \frac{\sigma^2}{n-1}\cdot(n-1) = \sigma^2 $$
and
$$ Var(S^2) = \left(\frac{\sigma^2}{n-1}\right)^2 Var\left(\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\bar{X})^2\right) = \left(\frac{\sigma^2}{n-1}\right)^2\cdot 2(n-1) = \frac{2}{n-1}\,\sigma^4. $$
In particular,
$$ MSE_{(\mu,\sigma^2)}(S^2) = \frac{2}{n-1}\,\sigma^4. $$
Similarly,
$$ E(\hat{\sigma}^2) = \frac{\sigma^2}{n}\cdot E\left(\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\bar{X})^2\right) = \frac{\sigma^2}{n}\cdot(n-1) = \frac{n-1}{n}\,\sigma^2 $$
and
$$ Var(\hat{\sigma}^2) = \left(\frac{\sigma^2}{n}\right)^2 Var\left(\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\bar{X})^2\right) = \left(\frac{\sigma^2}{n}\right)^2\cdot 2(n-1) = \frac{2(n-1)}{n^2}\,\sigma^4, $$
so
$$ MSE_{(\mu,\sigma^2)}(\hat{\sigma}^2) = \frac{2(n-1)}{n^2}\,\sigma^4 + \left(\frac{1}{n}\right)^2\sigma^4 = \frac{2n-1}{n^2}\,\sigma^4 = \frac{2-1/n}{n}\,\sigma^4 < \frac{2}{n-1}\,\sigma^4 = MSE_{(\mu,\sigma^2)}(S^2). $$
Unlike S 2 , σ̂ 2 is biased. Nonetheless, its variance is so much smaller than that of S 2 that its MSE is
smaller for all values of µ and σ 2 .
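A Monte Carlo check of this MSE ranking (a sketch not in the original notes; the parameter values, sample size, replication count, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: Monte Carlo check of MSE(sigma_hat^2) < MSE(S^2)
# (true mu = 0, sigma^2 = 1, n = 10, 100_000 replications assumed).
rng = np.random.default_rng(5)
n, reps, sigma2 = 10, 100_000, 1.0
x = rng.normal(0.0, 1.0, size=(reps, n))

s2 = x.var(axis=1, ddof=1)        # sample variance, unbiased
sig2_hat = x.var(axis=1, ddof=0)  # MLE, biased
print(((s2 - sigma2) ** 2).mean(), 2 / (n - 1))          # ~ 2*sigma^4/(n-1)
print(((sig2_hat - sigma2) ** 2).mean(), (2 - 1/n) / n)  # ~ (2 - 1/n)*sigma^4/n
```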
In this example, the MSE ranking does not depend on the true value of the parameter(s). In spite of this we do not know whether an even better estimator exists. To answer that question, it might appear natural to look for an estimator that minimizes MSE uniformly in µ and σ². Unfortunately, such an estimator does not exist.
The point is that in order to find a uniformly (in the value of the parameters) best estimator, we need to impose certain restrictions on the class of estimators under consideration.
Definition. Let W be a class of estimators of θ. An estimator θ̂ ∈ W is efficient relative to W if
$$ MSE_\theta(\hat{\theta}) \le MSE_\theta(W) \quad \forall \theta \in \Theta $$
for every W ∈ W.
Remark. When θ is a vector, the notation “MSEθ(θ̂) ≤ MSEθ(W)” is shorthand for “the matrix MSEθ(W) − MSEθ(θ̂) is positive semi-definite”.
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. The estimator (of σ²) σ̂² is efficient relative to W = {σ̂², S²}.
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. Consider the following class of estimators (of σ²):
$$ \mathcal{W} = \left\{\tilde{\sigma}^2_c = \frac{1}{c}\sum_{i=1}^{n}(X_i-\bar{X})^2 : c > 0\right\}. $$
The estimators
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 = \tilde{\sigma}^2_n \qquad \text{and} \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 = \tilde{\sigma}^2_{n-1} $$
are both members of W. It is not hard to show that σ̃²_{n+1} is efficient relative to W.
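A quick way to see why c = n + 1 wins (a sketch not in the original notes; n = 10 is an arbitrary assumption) is to evaluate MSE(σ̃²_c) over a grid of c. Using the mean and variance of Σ(Xi − X̄)² from the previous example, MSE(σ̃²_c) = [2(n − 1) + (n − 1 − c)²] σ⁴ / c²:

```python
import numpy as np

# Minimal sketch: MSE of sigma_tilde_c^2 = (1/c) * sum (X_i - X-bar)^2 under N(mu, sigma^2),
# in units of sigma^4.  MSE(c) = (2(n-1) + (n-1-c)^2) / c^2; n = 10 assumed.
n = 10
c = np.linspace(1.0, 30.0, 2901)
mse = (2 * (n - 1) + (n - 1 - c) ** 2) / c ** 2
print(c[np.argmin(mse)])   # minimized at c = n + 1 = 11
```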
Definition. Let Wu(θ) denote the class of unbiased estimators of θ with finite variance. An estimator θ̂ ∈ Wu(θ) is a uniformly minimum variance unbiased (UMVU) estimator of θ if Varθ(θ̂) ≤ Varθ(W) for every θ ∈ Θ and every W ∈ Wu(θ). For unbiased estimators, the MSE is simply the variance.
It turns out that UMVU estimators often exist. The Rao-Blackwell Theorem facilitates the search for UMVU estimators by showing that UMVU estimators can always be based on statistics that are sufficient in the sense of the following definition.
Definition. Let X1, …, Xn be a random sample from a distribution with cdf F(·|θ), where θ ∈ Θ is unknown. A statistic T = T(X1, …, Xn) is a sufficient statistic for θ if the conditional distribution of (X1, …, Xn)′ given T does not depend on θ.
Theorem (Rao-Blackwell Theorem; Casella and Berger, Theorem 7.3.17). Let θ̂ ∈ Wu(θ) and let T be any sufficient statistic for θ. Then
$$ \tilde{\theta} = E_{X|T}(\hat{\theta}|T) \in \mathcal{W}_u(\theta) $$
and
$$ Var_\theta(\tilde{\theta}) \le Var_\theta(\hat{\theta}) \quad \forall \theta \in \Theta. $$
Remark. The inequality Varθ(θ̃) ≤ Varθ(θ̂) is strict unless Pθ(θ̃ = θ̂) = 1.
Proof. The distribution of the estimator θ̂ = θ̂(X1, …, Xn) conditional on T does not depend on θ when T is sufficient. Therefore,
$$ \tilde{\theta} = E_{X|T}(\hat{\theta}|T) $$
is a function of T = T(X1, …, Xn) that does not depend on the true (unknown) value of θ. In particular, θ̃ is an estimator.
The estimator θ̃ is unbiased because
$$ E_\theta(\tilde{\theta}) = E_\theta\left(E_{X|T}(\hat{\theta}|T)\right) = E_\theta(\hat{\theta}) = \theta, $$
where the second equality uses the law of iterated expectations and the last equality uses the fact that θ̂ is unbiased.
Applying the conditional variance identity, we have:
$$ Var_\theta(\hat{\theta}) = Var_\theta\left(E_{X|T}(\hat{\theta}|T)\right) + E_\theta\left(Var_{X|T}(\hat{\theta}|T)\right) \ge Var_\theta\left(E_{X|T}(\hat{\theta}|T)\right) = Var_\theta(\tilde{\theta}). \;\blacksquare $$
Remark. With a little more effort, a proof of the relation Varθ(θ̂) ≥ Varθ(θ̃) can be based on the conditional version of Jensen's inequality:
$$ Var_\theta(\hat{\theta}) = E_\theta\left((\hat{\theta}-\theta)^2\right) = E_\theta\left(E_{X|T}\left((\hat{\theta}-\theta)^2|T\right)\right) \ge E_\theta\left(\left(E_{X|T}(\hat{\theta}|T)-\theta\right)^2\right) = E_\theta\left((\tilde{\theta}-\theta)^2\right) = Var_\theta(\tilde{\theta}), $$
where the second equality uses the law of iterated expectations and the inequality uses the conditional version of Jensen's inequality. This method of proof is applicable whenever the measure of closeness is of the form Eθ(L(θ̂, θ)), where L(θ̂, θ) is a convex function of θ̂:
$$ E_\theta\left(L(\hat{\theta},\theta)\right) = E_\theta\left(E_{X|T}\left(L(\hat{\theta},\theta)|T\right)\right) \ge E_\theta\left(L\left(E_{X|T}(\hat{\theta}|T),\theta\right)\right) = E_\theta\left(L(\tilde{\theta},\theta)\right). $$
For instance, |θ̂ − θ| is a convex function of θ̂ and therefore
$$ E_\theta\left(|\hat{\theta}-\theta|\right) = E_\theta\left(E_{X|T}\left(|\hat{\theta}-\theta|\;\middle|\;T\right)\right) \ge E_\theta\left(\left|E_{X|T}(\hat{\theta}|T)-\theta\right|\right) = E_\theta\left(|\tilde{\theta}-\theta|\right). $$
It follows from the Rao-Blackwell Theorem that when looking for UMVU estimators, there is no need to consider estimators that cannot be written as functions of a sufficient statistic. Indeed, it suffices to look at estimators that are necessary statistics in the sense that they can be written as functions of every sufficient statistic. Here, the word “every” is crucial because any estimator is a function of (X1, …, Xn)′ and (X1, …, Xn)′ is always a sufficient statistic.
Of course, the usefulness of the Rao-Blackwell Theorem depends on the extent to which sufficient statistics of low dimension are available and easy to find. As it turns out, sufficient statistics of the same dimension as θ are available in many cases. In cases where determination of sufficient statistics by means of the definition is tedious, the following characterization of sufficiency may be useful.
Theorem (Factorization Theorem; Casella and Berger, Theorem 6.2.6). A statistic T(X1, …, Xn) is a sufficient statistic for θ if and only if the joint pmf (pdf) of (X1, …, Xn)′ factors as
$$ f_X(x_1, \ldots, x_n|\theta) = g\left(T(x_1, \ldots, x_n)|\theta\right)\cdot h(x_1, \ldots, x_n) $$
for some functions g and h.
Example. For any random sample X1, …, Xn from a discrete (continuous) distribution, two sufficient statistics are (X1, …, Xn)′ and (X(1), …, X(n))′.
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ [0, 1] is an unknown parameter. The joint pmf of (X1, …, Xn)′ is
$$ f_X(x_1, \ldots, x_n|p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}\cdot 1(x_i \in \{0,1\}) = p^{\sum_{i=1}^{n} x_i}(1-p)^{n-\sum_{i=1}^{n} x_i}\prod_{i=1}^{n} 1(x_i \in \{0,1\}) = g\left(\sum_{i=1}^{n} x_i \,\Big|\, p\right)\cdot h(x_1, \ldots, x_n), $$
where
$$ g(t|p) = p^t(1-p)^{n-t} \qquad \text{and} \qquad h(x_1, \ldots, x_n) = \prod_{i=1}^{n} 1(x_i \in \{0,1\}). $$
Therefore, Σ_{i=1}^n Xi is a sufficient statistic for p.
The maximum likelihood (and method of moments) estimator
$$ \hat{p} = \frac{\sum_{i=1}^{n} X_i}{n} $$
is a function of Σ_{i=1}^n Xi.
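A small simulation sketch of what sufficiency means here (not in the original notes; n, the conditioning value t, the two values of p, and the seed are arbitrary assumptions): conditional on the total Σ Xi = t, the distribution of the sample no longer depends on p.

```python
import numpy as np

# Minimal sketch: condition on sum(X_i) = t and check that P(X_1 = 1 | sum = t)
# does not depend on p (n = 5, t = 2 assumed; the answer is t/n = 0.4 for both p values).
rng = np.random.default_rng(6)
n, t, reps = 5, 2, 200_000

for p in (0.2, 0.7):
    x = rng.binomial(1, p, size=(reps, n))
    keep = x[x.sum(axis=1) == t]     # keep only samples with sum(X_i) = t
    print(p, keep[:, 0].mean())      # ~ 0.4 regardless of p
```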
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. The joint pdf of (X1, …, Xn)′ is
$$ f_X(x_1, \ldots, x_n|\theta) = \prod_{i=1}^{n} \frac{1}{\theta}\,1(0 \le x_i \le \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} 1(0 \le x_i \le \theta) = \frac{1}{\theta^n}\,1\!\left(x_{(n)} \le \theta\right)\cdot 1\!\left(x_{(1)} \ge 0\right) = g\left(x_{(n)}|\theta\right)\cdot h(x_1, \ldots, x_n), $$
where
$$ g(t|\theta) = \frac{1}{\theta^n}\,1(t \le \theta) \qquad \text{and} \qquad h(x_1, \ldots, x_n) = 1\!\left(x_{(1)} \ge 0\right). $$
Therefore, X(n) is a sufficient statistic for θ, and the maximum likelihood estimator θ̂ = X(n) is a function of X(n).
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. The joint pdf of (X1, …, Xn)′ is
$$ f_X(x_1, \ldots, x_n|\mu,\sigma^2) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x_i-\mu)^2\right) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i^2 + \mu^2 - 2\mu x_i\right)\right) $$
$$ = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left(-\frac{n\mu^2}{2\sigma^2}\right)\exp\left(\frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right) = g\left(\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} x_i^2 \,\Big|\, \mu,\sigma^2\right)\cdot h(x_1, \ldots, x_n), $$
where
$$ g(t_1, t_2|\mu,\sigma^2) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left(-\frac{n\mu^2}{2\sigma^2}\right)\exp\left(\frac{\mu}{\sigma^2}t_1 - \frac{1}{2\sigma^2}t_2\right) \qquad \text{and} \qquad h(x_1, \ldots, x_n) = 1. $$
Therefore, (Σ_{i=1}^n Xi, Σ_{i=1}^n Xi²)′ is a sufficient statistic for (µ, σ²)′.
The maximum likelihood (and method of moments) estimator of µ,
$$ \hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, $$
is a function of (Σ_{i=1}^n Xi, Σ_{i=1}^n Xi²)′, as is the maximum likelihood (and method of moments) estimator of σ²,
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2. $$
A sufficient statistic T is particularly useful if unbiased estimators based on T are essentially unique in the sense that any two unbiased estimators based on T are equal with probability one. Indeed, if we can somehow find a sufficient statistic T such that unbiased estimators based on T are essentially unique, then any θ̂ ∈ Wu(θ) based on T is UMVU. The Lehmann-Scheffé Theorem establishes essential uniqueness of unbiased estimators based on a complete sufficient statistic T.
Definition. A statistic T is complete if, for every function g,
$$ E_\theta(g(T)) = 0 \quad \forall \theta \in \Theta $$
implies
$$ P_\theta(g(T) = 0) = 1 \quad \forall \theta \in \Theta. $$
Theorem (Lehmann-Scheffé Theorem; Casella and Berger, Theorem 7.5.1). Unbiased estimators based on complete sufficient statistics are essentially unique.
Complete sufficient statistics can often be found if the family {f(·|θ) : θ ∈ Θ} of pmfs/pdfs is an exponential family.
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ (0, 1). The marginal pmf satisfies
$$ f(x|p) = p^x(1-p)^{1-x} = h(x)\,c(p)\exp\left(\eta(p)\,t(x)\right), \quad x \in \{0,1\}, $$
where
$$ h(x) = 1, \qquad c(p) = 1-p, \qquad \eta(p) = \log\left(\frac{p}{1-p}\right), \qquad t(x) = x. $$
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. In this case, the marginal pdf satisfies
$$ f(x|\mu,\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left(-\frac{\mu^2}{2\sigma^2}\right)\exp\left(\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2\right) = h(x)\,c(\mu,\sigma^2)\exp\left(\sum_{i=1}^{2}\eta_i(\mu,\sigma^2)\,t_i(x)\right), $$
where
$$ h(x) = 1, \qquad c(\mu,\sigma^2) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left(-\frac{\mu^2}{2\sigma^2}\right), $$
$$ \eta_1(\mu,\sigma^2) = \frac{\mu}{\sigma^2}, \qquad t_1(x) = x, \qquad \eta_2(\mu,\sigma^2) = -\frac{1}{2\sigma^2}, \qquad t_2(x) = x^2. $$
Theorem (Casella and Berger, Theorem 6.2.25). Let X1, …, Xn be a random sample from a discrete (continuous) exponential family with pmf (pdf)
$$ f(x|\theta) = h(x)\,c(\theta)\exp\left(\sum_{i=1}^{d}\eta_i(\theta)\,t_i(x)\right), \quad x \in \mathbb{R},\; \theta \in \Theta. $$
If the set {(η1(θ), …, ηd(θ))′ : θ ∈ Θ} ⊆ R^d contains an open set, then
$$ T = \left(\sum_{i=1}^{n} t_1(X_i), \ldots, \sum_{i=1}^{n} t_d(X_i)\right)' $$
is a complete sufficient statistic for θ.
Remark. A set A ⊆ R^d contains an open set if and only if we can find constants a1^L < a1^U, …, ad^L < ad^U such that
$$ [a_1^L, a_1^U] \times \cdots \times [a_d^L, a_d^U] \subseteq A; $$
that is, a set A ⊆ R^d contains an open set if and only if it contains a d-dimensional rectangle.
Example. Suppose Xi ∼ i.i.d. Ber(p), where p ∈ (0, 1). The marginal pmf satisfies
$$ f(x|p) = h(x)\,c(p)\exp\left(\eta(p)\,t(x)\right), \quad x \in \{0,1\}, $$
where
$$ h(x) = 1, \qquad c(p) = 1-p, \qquad \eta(p) = \log\left(\frac{p}{1-p}\right), \qquad t(x) = x. $$
The set
$$ \{\eta(p) : p \in (0,1)\} = \left\{\log\left(\frac{p}{1-p}\right) : p \in (0,1)\right\} = \mathbb{R} $$
is open, so
$$ \sum_{i=1}^{n} t(X_i) = \sum_{i=1}^{n} X_i $$
is a complete sufficient statistic for p. The maximum likelihood (and method of moments) estimator p̂ = X̄ is unbiased,
$$ E_p(\hat{p}) = E_p(X_i) = p, $$
and is based on Σ_{i=1}^n Xi. Therefore, p̂ is a UMVU estimator of p.
The conclusion is not affected if Θ = [0, 1] is considered, as Varp(p̂) = 0 when p ∈ {0, 1}.
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. In this case, the marginal pdf satisfies
$$ f(x|\mu,\sigma^2) = h(x)\,c(\mu,\sigma^2)\exp\left(\sum_{i=1}^{2}\eta_i(\mu,\sigma^2)\,t_i(x)\right), $$
where
$$ h(x) = 1, \qquad c(\mu,\sigma^2) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left(-\frac{\mu^2}{2\sigma^2}\right), $$
$$ \eta_1(\mu,\sigma^2) = \frac{\mu}{\sigma^2}, \qquad t_1(x) = x, \qquad \eta_2(\mu,\sigma^2) = -\frac{1}{2\sigma^2}, \qquad t_2(x) = x^2. $$
The set
$$ \left\{\left(\eta_1(\mu,\sigma^2), \eta_2(\mu,\sigma^2)\right)' : \mu \in \mathbb{R},\ \sigma^2 > 0\right\} = \left\{\left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)' : \mu \in \mathbb{R},\ \sigma^2 > 0\right\} = \mathbb{R}\times(-\infty, 0) $$
is open, so
$$ \left(\sum_{i=1}^{n} t_1(X_i), \sum_{i=1}^{n} t_2(X_i)\right)' = \left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2\right)' $$
is a complete sufficient statistic for (µ, σ²)′. The estimator
$$ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^{n} X_i^2 - \frac{1}{(n-1)n}\left(\sum_{i=1}^{n} X_i\right)^2 $$
is unbiased and is based on (Σ_{i=1}^n Xi, Σ_{i=1}^n Xi²)′; S² is therefore a UMVU estimator of σ².
Outside the exponential family of distributions, we typically have to find complete sufficient statistics (if they exist) by applying the definition of completeness.
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. The sufficient statistic T = T(X1, …, Xn) = X(n) is continuous with pdf f(·|θ) given by (Casella and Berger, Example 6.2.23)
$$ f(t|\theta) = n t^{n-1}\theta^{-n}\,1(0 \le t \le \theta). $$
The statistic T is complete if the set {x ∈ R+ : g(x) ≠ 0} has (Lebesgue) measure zero whenever g : R → R is a function satisfying
$$ E_\theta(g(T)) = \int_{0}^{\theta} g(t)\, n t^{n-1}\theta^{-n}\, dt = 0 \quad \forall \theta > 0. $$
Given any such function g, let g⁺ and g⁻ be the positive and negative parts of g, respectively; that is, let g⁺(t) = max(0, g(t)) and g⁻(t) = max(0, −g(t)). By assumption,
$$ \int_{0}^{\theta} g^{+}(t)\, t^{n-1}\, dt = \int_{0}^{\theta} g^{-}(t)\, t^{n-1}\, dt \quad \forall \theta > 0. $$
It can be shown (using the Radon-Nikodym theorem) that this implies that the set {x ∈ R+ : g⁺(x) ≠ g⁻(x)} has measure zero. Therefore, the set {x ∈ R+ : g(x) = g⁺(x) − g⁻(x) ≠ 0} has measure zero. In particular, X(n) is a complete sufficient statistic.
The maximum likelihood estimator θ̂ML = X(n) is based on X(n) but is biased because
$$ E_\theta(\hat{\theta}_{ML}) = \int_{0}^{\theta} t f(t|\theta)\, dt = \int_{0}^{\theta} n t^{n}\theta^{-n}\, dt = \left.\frac{n}{n+1}\, t^{n+1}\theta^{-n}\right|_{t=0}^{\theta} = \frac{n}{n+1}\,\theta. $$
The rescaled estimator
$$ \hat{\theta} = \frac{n+1}{n}\, X_{(n)} $$
is unbiased and is based on the complete sufficient statistic X(n). As a consequence, θ̂ is a UMVU estimator of θ.
In this example, we constructed a UMVU estimator of θ by finding “the” function θ̂(·) such that
$$ E_\theta(\hat{\theta}(T)) = \theta \quad \forall \theta \in \Theta. $$
In cases where an unbiased estimator θ̂ (not based on a complete sufficient statistic) has already been found, a UMVU estimator of θ can be found by “Rao-Blackwellization”; that is,
$$ \tilde{\theta} = E(\hat{\theta}|T) $$
is a UMVU estimator of θ whenever T is a complete sufficient statistic.
Example. Suppose Xi ∼ i.i.d. U[0, θ], where θ > 0 is an unknown parameter. Since Eθ(Xi) = θ/2, an unbiased estimator of θ is
$$ \hat{\theta} = 2X_1. $$
Rao-Blackwellizing with respect to the complete sufficient statistic X(n) gives
$$ \tilde{\theta} = E\left(2X_1|X_{(n)}\right) = \frac{n+1}{n}\, X_{(n)}, $$
which is a UMVU estimator of θ.
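A simulation sketch of the variance reduction from Rao-Blackwellization (not in the original notes; the true θ, sample size, replication count, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: Rao-Blackwellization for U[0, theta] (theta = 5, n = 10 assumed).
# Both 2*X_1 and (n+1)/n * max(X_i) are unbiased; the latter has much smaller variance.
rng = np.random.default_rng(7)
theta, n, reps = 5.0, 10, 100_000
x = rng.uniform(0.0, theta, size=(reps, n))

crude = 2 * x[:, 0]                  # unbiased but ignores most of the sample
rb = (n + 1) / n * x.max(axis=1)     # E(2*X_1 | X_(n)) = (n+1)/n * X_(n)
print(crude.mean(), rb.mean())       # both ~ theta
print(crude.var(), rb.var())         # variance drops dramatically
```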
If a UMVU estimator does not exist or is hard to find, it is nice to have a benchmark against which all estimators can be compared. That is, it is nice to have a lower bound on the variance of any unbiased estimator.
Definition. An estimator θ̂ ∈ Wu(θ) is a locally minimum variance unbiased (LMVU) estimator at θ0 ∈ Θ if Varθ0(θ̂) ≤ Varθ0(W) for every W ∈ Wu(θ).
Suppose an LMVU estimator exists at every θ0 ∈ Θ and let VL(θ0) denote the variance of the LMVU estimator at θ0. The function VL(·) : Θ → R+ provides us with a lower bound on the variance of any estimator θ̂ ∈ Wu(θ). The bound is sharp in the sense that it can be attained for any θ0 ∈ Θ. If the LMVU estimator (or its variance) is hard to find, we may nonetheless be able to find a nontrivial lower bound on the variance of any unbiased estimator of θ. A very useful bound, the Cramér-Rao bound, can be obtained by applying the covariance inequality, a corollary of the Cauchy-Schwarz inequality.
Corollary (Covariance Inequality; Casella and Berger, Example 4.7.4). If (X, Y) is a bivariate random vector with Var(Y) > 0, then
$$ Var(X) \ge \frac{Cov(X,Y)^2}{Var(Y)}. $$
Let X1, …, Xn be a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), where θ ∈ Θ is an unknown parameter vector. Moreover, let θ̂ = θ̂(X1, …, Xn) be any estimator of θ and let ψ = ψ(X1, …, Xn; θ) be any vector-valued function of (X1, …, Xn)′ and θ. It follows from the (multivariate version of the) covariance inequality that
$$ Var_\theta(\hat{\theta}) \ge Cov_\theta(\hat{\theta}, \psi)\, Var_\theta(\psi)^{-1}\, Cov_\theta(\hat{\theta}, \psi)'. $$
In general, the lower bound on the right hand side depends on θ̂ and therefore the inequality may not seem to be very helpful.
Remark. It can be shown that the function (of θ and θ̂) Covθ(θ̂, ψ) depends on θ̂ only through Eθ(θ̂) if and only if
$$ Cov_\theta(\psi, U) = 0 \quad \forall \theta \in \Theta $$
for every statistic U = U(X1, …, Xn) satisfying Eθ(U) = 0 ∀θ ∈ Θ.
Definition. Let X1, …, Xn be a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), where θ ∈ Θ ⊆ R^k is an unknown parameter vector. The score function is the (random) function S(·|X1, …, Xn) : Θ → R^k given by
$$ S(\theta|X_1, \ldots, X_n) = \frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log f(X_i|\theta), \quad \theta \in \Theta. $$
Theorem (Cramér-Rao Inequality). Let θ̂ = θ̂(X1, …, Xn) be an estimator of θ. If
$$ E_\theta\left(S(\theta|X_1, \ldots, X_n)\right) = 0 $$
and
$$ E_\theta\left(\hat{\theta}\cdot S(\theta|X_1, \ldots, X_n)'\right) = I, $$
then
$$ Var_\theta(\hat{\theta}) \ge Var_\theta\left(S(\theta|X_1, \ldots, X_n)\right)^{-1}. $$
Proof. Writing S(θ) = S(θ|X1, …, Xn), the assumptions imply
$$ Cov_\theta\left(\hat{\theta}, S(\theta)\right) = E_\theta\left(\hat{\theta}\cdot\left(S(\theta) - E_\theta(S(\theta))\right)'\right) = E_\theta\left(\hat{\theta}\cdot S(\theta)'\right) = I, $$
so the covariance inequality gives
$$ Var_\theta(\hat{\theta}) \ge Cov_\theta\left(\hat{\theta}, S(\theta)\right) Var_\theta\left(S(\theta)\right)^{-1} Cov_\theta\left(\hat{\theta}, S(\theta)\right)' = Var_\theta\left(S(\theta)\right)^{-1}. \;\blacksquare $$
The conditions
$$ E_\theta\left(S(\theta|X_1, \ldots, X_n)\right) = 0 \qquad \text{and} \qquad E_\theta\left(\hat{\theta}\cdot S(\theta|X_1, \ldots, X_n)'\right) = I $$
both have natural interpretations. Suppose X = (X1, …, Xn)′ is continuous with joint pdf fX(·|θ) and let T(X1, …, Xn) be any statistic with Eθ|T(X1, …, Xn)| < ∞. If we can interchange the order of integration and differentiation, then
$$ \frac{\partial}{\partial\theta'} E_\theta(T(X)) = \frac{\partial}{\partial\theta'}\int_{\mathbb{R}^n} T(x) f_X(x|\theta)\, dx = \int_{\mathbb{R}^n} T(x)\frac{\partial}{\partial\theta'} f_X(x|\theta)\, dx = \int_{\mathbb{R}^n} T(x)\left(\frac{\partial}{\partial\theta'}\log f_X(x|\theta)\right) f_X(x|\theta)\, dx = E_\theta\left(T(X)\, S(\theta|X)'\right) $$
since
$$ S(\theta|X_1, \ldots, X_n) = \frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log f(X_i|\theta) = \frac{\partial}{\partial\theta}\log\left(\prod_{i=1}^{n} f(X_i|\theta)\right) = \frac{\partial}{\partial\theta}\log f_X(X_1, \ldots, X_n|\theta) $$
when X1, …, Xn is a random sample from a distribution with pdf f(·|θ). Setting T(X) = 1 and T(X) = θ̂ (an unbiased estimator of θ), we obtain the relations
$$ E_\theta\left(S(\theta|X_1, \ldots, X_n)\right) = 0 \qquad \text{and} \qquad E_\theta\left(\hat{\theta}\cdot S(\theta|X_1, \ldots, X_n)'\right) = I, $$
respectively.
Conditions under which we can interchange the order of integration and differentiation are available (Casella and Berger, Section 2.4).
Lemma. Let X1, …, Xn be a random sample from a discrete (continuous) distribution with pmf (pdf) f(·|θ), where θ ∈ Θ. Suppose
(i) Θ is open.
(ii) The set {x ∈ R : f(x|θ) > 0} does not depend on θ.
(iii) For every θ ∈ Θ there is a function bθ : R → R+ and a constant ∆θ > 0 such that
$$ E_\theta\left(b_\theta(X)\right) < \infty $$
and
$$ \left|\frac{f(x|\theta+\delta) - f(x|\theta)}{\delta}\right| < b_\theta(x) \quad \forall x \in \mathbb{R} $$
whenever 0 < |δ| < ∆θ. Then
$$ \frac{\partial}{\partial\theta'} E_\theta(T(X)) = E_\theta\left(T(X)\, S(\theta|X)'\right) $$
for any statistic T(X) = T(X1, …, Xn) with Eθ|T(X)| < ∞.
Remark. Conditions (ii) and (iii) of the lemma hold whenever f(x|θ) is of the form
$$ f(x|\theta) = h(x)\,c(\theta)\exp\left(\sum_{i=1}^{d}\eta_i(\theta)\,t_i(x)\right), \quad x \in \mathbb{R},\; \theta \in \Theta; $$
that is, whenever {f(·|θ) : θ ∈ Θ} is an exponential family.
The quantity
$$ I(\theta) = E_\theta\left(S(\theta|X_1, \ldots, X_n)\, S(\theta|X_1, \ldots, X_n)'\right) $$
is called the information matrix, or the Fisher information. Under the assumptions of the Cramér-Rao Inequality, the Fisher information is
$$ I(\theta) = Var_\theta\left(S(\theta|X_1, \ldots, X_n)\right) $$
and I(θ)^{−1} provides a lower bound on the variance of any estimator θ̂ ∈ Wu(θ).
The Fisher information is easy to compute when X1, …, Xn is a random sample (Casella and Berger, Corollary 7.3.10):
$$ I(\theta) = E_\theta\left(\left(\frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log f(X_i|\theta)\right)\left(\frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log f(X_i|\theta)\right)'\right) = n\cdot E_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)'\right), $$
where X is a random variable with cdf F(·|θ). Under additional regularity conditions,
$$ I(\theta) = n\cdot E_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)'\right) = -n\cdot E_\theta\left(\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(X|\theta)\right). $$
If the conditions of the Cramér-Rao inequality are satisfied and it just so happens that an unbiased estimator attains the bound, then the estimator is UMVU. The Cramér-Rao inequality can therefore be used to establish optimality in some cases.
Example. Suppose Xi ∼ i.i.d. Ber (p) , where p ∈ (0, 1) is unknown. When x ∈ {0, 1} , we have:
$$ \log f(x|p) = \log\left(p^x(1-p)^{1-x}\right) = x\cdot\log p + (1-x)\log(1-p) $$
and
$$ \frac{\partial}{\partial p}\log f(x|p) = \frac{x}{p} - \frac{1-x}{1-p} = \frac{x(1-p) + p(x-1)}{p(1-p)} = \frac{x}{p(1-p)} - \frac{1}{1-p}. $$
The conditions of the Cramér-Rao inequality are satisfied, so the Fisher information is
$$ I(p) = n\cdot Var_p\left(\frac{X_i}{p(1-p)} - \frac{1}{1-p}\right) = n\cdot\left(\frac{1}{p(1-p)}\right)^2 Var_p(X_i) = n\cdot\left(\frac{1}{p(1-p)}\right)^2 p(1-p) = \frac{n}{p(1-p)}, $$
and the Cramér-Rao bound is
$$ I(p)^{-1} = \frac{p(1-p)}{n}. $$
The variance of the maximum likelihood estimator p̂ = X̄ of p satisfies
$$ Var_p(\hat{p}) = Var_p(\bar{X}) = \frac{Var_p(X_i)}{n} = \frac{p(1-p)}{n}. $$
The maximum likelihood estimator attains the lower bound I(p)^{−1} and is therefore UMVU.
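A quick empirical confirmation that the bound is attained (a sketch not in the original notes; the true p, sample size, replication count, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: the variance of the Bernoulli MLE p-hat = X-bar attains the
# Cramer-Rao bound p(1-p)/n (p = 0.3, n = 25, 200_000 replications assumed).
rng = np.random.default_rng(8)
p, n, reps = 0.3, 25, 200_000
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
print(p_hat.var(), p * (1 - p) / n)   # empirical variance ~ the bound
```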
There are cases where the Cramér-Rao bound does not apply or fails to be sharp.
Example. Suppose Xi ∼ i.i.d. U [0, θ] , where θ > 0 is an unknown parameter. When 0 < x < θ,
we have:
$$ \log f(x|\theta) = \log\frac{1}{\theta} = -\log\theta $$
and
$$ \frac{\partial}{\partial\theta}\log f(x|\theta) = -\frac{1}{\theta}. $$
The conditions of the Cramér-Rao inequality are not satisfied: the support of f(·|θ) depends on θ, and Eθ(∂ log f(X|θ)/∂θ) = −1/θ ≠ 0. There is therefore no guarantee that the quantity
$$ \left(n\cdot E_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right)\right)^{-1} = \frac{\theta^2}{n} $$
delivers a lower bound on the variance of unbiased estimators. In fact, the variance of the UMVU estimator
$$ \hat{\theta} = \frac{n+1}{n}\, X_{(n)} $$
is θ²/(n(n + 2)) < θ²/n, so the "bound" fails.
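A simulation sketch of this failure (not in the original notes; the true θ, sample size, replication count, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: for U[0, theta] the naive "Cramer-Rao bound" theta^2/n is not a bound;
# the UMVU estimator (n+1)/n * max(X_i) has variance theta^2/(n(n+2)) (theta = 5, n = 10 assumed).
rng = np.random.default_rng(9)
theta, n, reps = 5.0, 10, 200_000
x = rng.uniform(0.0, theta, size=(reps, n))
umvu = (n + 1) / n * x.max(axis=1)
print(umvu.var())                              # ~ theta^2 / (n(n+2)) ~ 0.208
print(theta**2 / (n * (n + 2)), theta**2 / n)  # the naive "bound" is larger
```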
Example. Suppose Xi ∼ i.i.d. N(µ, σ²), where µ ∈ R and σ² > 0 are unknown parameters. We have:
$$ \log f(x|\mu,\sigma^2) = \log\left((2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)\right) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(x-\mu)^2, $$
$$ \frac{\partial}{\partial\mu}\log f(x|\mu,\sigma^2) = \frac{1}{\sigma^2}(x-\mu), $$
and
$$ \frac{\partial}{\partial\sigma^2}\log f(x|\mu,\sigma^2) = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(x-\mu)^2. $$
The conditions of the Cramér-Rao inequality are satisfied, so the Fisher information is
$$ I(\mu,\sigma^2) = n\cdot\begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix} $$
because
$$ Var_{(\mu,\sigma^2)}\left(\frac{\partial}{\partial\mu}\log f(X_i|\mu,\sigma^2)\right) = \frac{1}{\sigma^4}\, Var_{(\mu,\sigma^2)}(X_i-\mu) = \frac{1}{\sigma^2}, $$
$$ Var_{(\mu,\sigma^2)}\left(\frac{\partial}{\partial\sigma^2}\log f(X_i|\mu,\sigma^2)\right) = \frac{1}{(2\sigma^4)^2}\, Var_{(\mu,\sigma^2)}\left((X_i-\mu)^2\right) = \frac{1}{2\sigma^4}, $$
and
$$ Cov_{(\mu,\sigma^2)}\left(\frac{\partial}{\partial\mu}\log f(X_i|\mu,\sigma^2),\ \frac{\partial}{\partial\sigma^2}\log f(X_i|\mu,\sigma^2)\right) = Cov_{(\mu,\sigma^2)}\left(\frac{1}{\sigma^2}(X_i-\mu),\ \frac{1}{2\sigma^4}(X_i-\mu)^2\right) = 0. $$
As a consequence, a lower bound on the covariance matrix of an unbiased estimator of (µ, σ²)′ is
$$ I(\mu,\sigma^2)^{-1} = \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}. $$
In particular, no unbiased estimator of µ can have variance smaller than σ²/n. This lower bound is attained by the maximum likelihood estimator µ̂ = X̄. The Cramér-Rao lower bound on the variance of unbiased estimators of σ² is 2σ⁴/n. The variance of the UMVU estimator S² is 2σ⁴/(n − 1) and the Cramér-Rao bound therefore cannot be attained.
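A numerical sketch of both facts (not in the original notes; the parameter values, sample size, replication count, and seed are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch: for N(mu, sigma^2), X-bar attains the bound sigma^2/n, while the
# UMVU estimator S^2 has variance 2*sigma^4/(n-1) > 2*sigma^4/n
# (mu = 0, sigma^2 = 1, n = 10, 200_000 replications assumed).
rng = np.random.default_rng(10)
n, reps = 10, 200_000
x = rng.normal(0.0, 1.0, size=(reps, n))
print(x.mean(axis=1).var(), 1 / n)                       # ~ sigma^2/n: bound attained
print(x.var(axis=1, ddof=1).var(), 2 / (n - 1), 2 / n)   # S^2 misses the bound 2*sigma^4/n
```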
Theorem. Suppose θ̂0 is a UMVU estimator of a scalar parameter θ and θ̂1 ∈ Wu(θ). Then
$$ Cov_\theta\left(\hat{\theta}_0, \hat{\theta}_1 - \hat{\theta}_0\right) = 0 \quad \forall \theta \in \Theta. $$
Proof. For any a ∈ R, the estimator θ̂a = θ̂0 + a(θ̂1 − θ̂0) is unbiased and
$$ Var_\theta(\hat{\theta}_a) = Var_\theta\left(\hat{\theta}_0 + a(\hat{\theta}_1 - \hat{\theta}_0)\right) = Var_\theta(\hat{\theta}_0) + a^2\, Var_\theta(\hat{\theta}_1 - \hat{\theta}_0) + 2a\cdot Cov_\theta\left(\hat{\theta}_0, \hat{\theta}_1 - \hat{\theta}_0\right). $$
Therefore, if Varθ(θ̂a) ≥ Varθ(θ̂0) for any θ ∈ Θ and any a ∈ R, then
$$ \left.\frac{d}{da}\, Var_\theta(\hat{\theta}_a)\right|_{a=0} = 2\cdot Cov_\theta\left(\hat{\theta}_0, \hat{\theta}_1 - \hat{\theta}_0\right) = 0 \quad \forall \theta \in \Theta. \;\blacksquare $$
Remark. Conversely, if θ̂ ∈ Wu(θ) satisfies
$$ Cov_\theta\left(\hat{\theta}, \tilde{\theta} - \hat{\theta}\right) = 0 \quad \forall \theta \in \Theta $$
for every θ̃ ∈ Wu(θ), then θ̂ is UMVU, because
$$ Var_\theta(\tilde{\theta}) = Var_\theta\left(\hat{\theta} + (\tilde{\theta} - \hat{\theta})\right) = Var_\theta(\hat{\theta}) + Var_\theta(\tilde{\theta} - \hat{\theta}) + 2\, Cov_\theta\left(\hat{\theta}, \tilde{\theta} - \hat{\theta}\right) = Var_\theta(\hat{\theta}) + Var_\theta(\tilde{\theta} - \hat{\theta}) \ge Var_\theta(\hat{\theta}) $$
whenever Covθ(θ̂, θ̃ − θ̂) = 0.
Corollary. If θ̂ is a UMVU estimator of θ and θ̃ ∈ Wu(θ), then
$$ Var_\theta\left(\tilde{\theta} - \hat{\theta}\right) = Var_\theta(\tilde{\theta}) - Var_\theta(\hat{\theta}) $$
and
$$ Cov_\theta\left(\hat{\theta}, \tilde{\theta}\right) = Var_\theta(\hat{\theta}) $$
for every θ ∈ Θ.
Proof. Now,
$$ Cov_\theta\left(\hat{\theta}, \tilde{\theta}\right) = Cov_\theta\left(\hat{\theta}, \tilde{\theta} - \hat{\theta} + \hat{\theta}\right) = Cov_\theta\left(\hat{\theta}, \tilde{\theta} - \hat{\theta}\right) + Var_\theta(\hat{\theta}) = Var_\theta(\hat{\theta}), $$
where the last equality holds because Covθ(θ̂, θ̃ − θ̂) = 0 in view of the theorem. Using this relation,
$$ Var_\theta\left(\tilde{\theta} - \hat{\theta}\right) = Var_\theta(\tilde{\theta}) + Var_\theta(\hat{\theta}) - 2\, Cov_\theta\left(\hat{\theta}, \tilde{\theta}\right) = Var_\theta(\tilde{\theta}) - Var_\theta(\hat{\theta}). \;\blacksquare $$
Corollary (Casella and Berger, Theorem 7.3.19). UMVU estimators are essentially unique in the sense that if θ̂ is a UMVU estimator of θ and θ̃ ∈ Wu(θ), then
$$ Var_\theta(\tilde{\theta}) > Var_\theta(\hat{\theta}) $$
unless Pθ(θ̃ = θ̂) = 1.
Proof. If a random variable X has Var(X) = 0, then P(X = E(X)) = 1. Unbiasedness implies Eθ(θ̃ − θ̂) = Eθ(θ̃) − Eθ(θ̂) = 0 and it therefore suffices to show that Varθ(θ̃) > Varθ(θ̂) unless Varθ(θ̃ − θ̂) = 0. The stated result now follows from the previous corollary. ∎