Module 3
Recall the (parametric) inference problem: Let X1 , · · · , Xn be a random sample from some
distribution F , which is parameterized by some parameter vector θ, θ ∈ Θ, where Θ is the parameter
space. Our goal is to draw inference about F (which is equivalent to drawing inference about θ, or some function of θ) using
the random sample X1 , · · · , Xn .
Estimator. The main task of parametric inference is to estimate the parameter θ with the help
of samples X1 , · · · , Xn . For example, suppose it is assumed that the random sample belongs to the
normal family of distributions, but the parameters of the distribution, µ, σ 2 , are not specified. A
statistician naturally uses functions of sample observations (statistics) to estimate the parameters. For
example, one may use the sample mean, X̄n , to estimate µ, and the sample variance Sn2 to estimate
σ 2 . A statistic that is used to estimate a parameter is called an estimator. A realization of the
estimator based on a sample, which serves as a potential value of the parameter, is called an estimate.
Of course, for any parameter there exist many estimators. For example, one can also put forward
the sample median X̃me , instead of sample mean X̄n , to estimate µ. In this module, we will learn
some desirable properties that a good estimator should satisfy.
Note: For an unbiased estimator T (X), the MSE is equal to the variance of the estimator.
Remark 5. A biased estimator may have lower MSE than an unbiased estimator. For example,
consider a random sample {X1 , . . . , Xn } from Normal(µ, σ²), µ ∈ R, σ² > 0. The MSE of the unbiased sample variance Sn⋆2 , namely 2σ⁴/(n − 1), is higher than the MSE of the sample variance Sn2 , namely (2n − 1)σ⁴/n².
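The following Python sketch (not part of the notes; the sample size, parameter values and replication count are arbitrary choices) estimates both MSEs by Monte Carlo and reproduces the comparison above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, mu, sigma2, reps = 10, 1.0, 4.0, 200_000     # arbitrary settings for the check

    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    s2_unbiased = x.var(axis=1, ddof=1)             # S*_n^2, divisor n - 1
    s2_biased = x.var(axis=1, ddof=0)               # S_n^2,  divisor n

    print(np.mean((s2_unbiased - sigma2) ** 2), 2 * sigma2**2 / (n - 1))       # ~ 3.56
    print(np.mean((s2_biased - sigma2) ** 2), (2 * n - 1) * sigma2**2 / n**2)  # ~ 3.04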
Example 2. Let {X1 , . . . , Xn } be a random sample from Bernoulli(p), 0 ≤ p ≤ 1. Then X̄n is an unbiased estimator of p, and the MSE of X̄n is p(1 − p)/n.
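A similar quick Monte Carlo check (again with arbitrary illustrative values) confirms that the MSE of X̄n agrees with p(1 − p)/n.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, reps = 20, 0.3, 200_000                       # arbitrary settings for the check
    xbar = rng.binomial(n, p, size=reps) / n            # X-bar for a Bernoulli(p) sample of size n
    print(np.mean((xbar - p) ** 2), p * (1 - p) / n)    # both close to 0.0105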
Remark 6. An unbiased estimator of θ may not exist. For example, in the Bernoulli(p) example
above, an unbiased estimator of p2 based on a random sample of size n = 1 does not exist. (WHY?)
For a parameter θ, if there exists an unbiased estimator of θ, then θ is called U-estimable.
Remark 7. An unbiased estimator may not always be a good estimator. Consider a random sample
X of size n = 1 from the Poisson(λ) distribution, and suppose θ = exp{−3λ}. It can be shown that T = T (X) = (−2)^X is an unbiased estimator of θ. However, T is an absurd estimator, as it may take large negative values, although θ is positive.
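To see numerically that T = (−2)^X is unbiased for exp{−3λ} and yet takes absurd values, one may run the following sketch (the value of λ and the number of replications are arbitrary choices).

    import numpy as np

    rng = np.random.default_rng(2)
    lam, reps = 0.7, 1_000_000             # arbitrary settings for the check
    x = rng.poisson(lam, size=reps)
    t = (-2.0) ** x                        # the estimator T = (-2)^X
    print(t.mean(), np.exp(-3 * lam))      # both close to exp(-3*lam) ~ 0.12
    print(t.min())                         # large negative realizations, although theta > 0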
Remark 8. Usually the unbiased estimator of a U-estimable parameter θ is not unique. In fact, if there exist two different unbiased estimators T1 and T2 of θ, then for any α ∈ (0, 1), Tα = αT1 + (1 − α)T2
is also unbiased for θ.
As MSE is equal to variance for an unbiased estimator, the best estimator (in terms of MSE) in
the class of unbiased estimators is the one with minimum variance.
Definition 3 (Uniformly minimum variance unbiased estimator). An estimator of θ, T (X), is called
a uniformly minimum variance unbiased estimator (UMVUE) if
(i) Eθ {T (X)} = θ for all θ ∈ Θ, and
(ii) for any other estimator T ′ (X) with Eθ {T ′ (X)} = θ, varθ {T (X)} ≤ varθ {T ′ (X)} for all θ ∈ Θ.
Remark 9. In the above definition, it is important that (i) and (ii) hold for all θ ∈ Θ. If (i) and (ii) hold only for a particular choice of θ, say θ0 , then the corresponding estimator T (X) is called a locally minimum variance unbiased estimator (LMVUE).
Corollary 1 (Properties of UMVUE: 2). The UMVUE of θ is unique. [Proof]
The next theorem provides a way to improve an unbiased estimator of θ using a sufficient statistic.
Theorem 2 (Rao-Blackwell Theorem). Let T1 (X) be an unbiased estimator of θ and T (X) be a
sufficient statistic for θ. Then the conditional expectation ϕ(t) = E(T1 (X) | T (X) = t) defines a
statistic ϕ(T ). This statistic ϕ(T )
(i) is a function of the sufficient statistic T (X) for θ,
(ii) is an unbiased estimator of θ, and
(iii) satisfies varθ (ϕ(T )) ≤ varθ (T1 (X)) for all θ ∈ Θ.
Proof of Theorem 2 requires the following result:
Result 1. Under the existence and finiteness of all the relevant expectations, the following properties
of conditional expectation and variance are satisfied:
(A) EZ EY |Z {h(Y ) | Z} = E {h(Y )} ,
(B) varZ EY |Z {h(Y ) | Z} + EZ varY |Z {h(Y ) | Z} = varY {h(Y )} .
[Proof of Theorem 2]
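As an illustration of conclusion (iii), the following sketch uses an example chosen here (not taken from the notes): a Bernoulli(p) sample, the crude unbiased estimator T1 = X1, and the sufficient statistic T = Σ Xi, for which ϕ(T) = E(X1 | T) = T/n = X̄n.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p, reps = 10, 0.4, 100_000          # arbitrary settings for the check
    x = rng.binomial(1, p, size=(reps, n))
    t1 = x[:, 0]                           # crude unbiased estimator T1 = X1
    phi = x.mean(axis=1)                   # Rao-Blackwellized estimator phi(T) = T/n
    print(t1.mean(), phi.mean())           # both close to p = 0.4 (unbiasedness)
    print(t1.var(), phi.var())             # ~ p(1-p) = 0.24 versus p(1-p)/n = 0.024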
Remark 12. If T1 (X) is solely a function of the sufficient statistic T (X), then varθ (ϕ(T )) =
varθ (T1 (X)) for all θ ∈ Θ. [Proof]
Remark 13. For any statistic T (X) and an unbiased estimator T1 (X), E{E(T1 (X) | T (X) = t)} = θ, and conclusion (iii) in Theorem 2 holds. However, if T (X) is not sufficient, then E(T1 (X) | T (X) = t) may not be a statistic. For instance, consider the function E(n^{-1} Σ_{i=1}^n Xi | X1 ) in the above example. Observe that

    E( n^{-1} Σ_{i=1}^n Xi | X1 = x1 ) = x1/n + n^{-1} Σ_{i=2}^n E(Xi) = {x1 + (n − 1)θ}/n,

which depends on the unknown θ and hence is not an estimator.
Remark 14. The Rao-Blackwell theorem indicates that the UMVUE must be a function of a sufficient statistic. If it is not, then a better estimator can be obtained by considering the conditional expectation given a sufficient statistic.
Note: The Rao-Blackwell theorem provides a way to improve on an existing estimator. Based on the theorem, we can make the general recommendation of selecting an appropriate (unbiased) function of a sufficient statistic as an estimator of θ. However, the class of sufficient statistics is also uncountable (as any one-one function of a sufficient statistic is also sufficient). Thus, this theorem does not directly indicate the choice of UMVUE.
Remark 15. It is not, in general, easy to characterize the class of unbiased estimators of zero; consequently, verifying whether cov(T (X), U (X)) = 0 for every unbiased estimator U (X) of zero is difficult. However, if the distribution of T (X) is complete, then the only unbiased estimator of zero that is a function of T is zero itself (with probability one). Thus the covariance between any function of T that is an unbiased estimator of θ and any such unbiased estimator of zero must be zero. The Lehmann–Scheffé theorem formalizes this idea.
2 Methods of Finding UMVUE
I. Rao-Blackwellization: If an unbiased estimator of θ is available and the distribution of the complete sufficient statistic is known, then one may find the UMVUE by taking the conditional expectation of the unbiased estimator given the complete sufficient statistic.
Example 7. Let X1 , . . . , Xn be a random sample from the Poisson(λ) distribution. Consider the problem of estimating P (X = 1) = θ = λ exp{−λ}. Observe that Y1 = I{1}(X1 ) is an unbiased estimator of θ, and T (X) = Σ_{i=1}^n Xi is complete sufficient for λ, and hence for θ. The conditional distribution of Y1 given T = t is a two-point distribution that puts mass ϕ(t) = t(n − 1)^{t−1}/n^t on Y1 = 1 and mass 1 − ϕ(t) on Y1 = 0. Thus the conditional expectation is ϕ(t), and the statistic ϕ(T ) = T (1 − 1/n)^T /(n − 1) is the UMVUE of θ.
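The unbiasedness of ϕ(T) can be checked by simulation; in the following sketch the choices of n, λ and the number of replications are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    n, lam, reps = 8, 1.3, 500_000           # arbitrary settings for the check
    t = rng.poisson(n * lam, size=reps)      # T = sum of n Poisson(lam) observations
    phi = t * (1 - 1 / n) ** t / (n - 1)     # the Rao-Blackwellized estimator phi(T)
    print(phi.mean(), lam * np.exp(-lam))    # both close to 0.354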
III. Method of Solving: Let T (X) be a complete sufficient statistic. Since any function of T that is an unbiased estimator of θ is the UMVUE of θ (by the Lehmann–Scheffé theorem), one can obtain the UMVUE directly by solving for the function g(T ) such that Eθ {g(T )} = θ.
Example 8. In the same problem of estimating P (X = 1) = θ = λ exp{−λ}, we can apply the direct solving method as follows. Since T ∼ Poisson(nλ), writing out Eλ {g(T )} = θ and multiplying both sides by exp{nλ} gives

    g(0) + g(1)nλ + g(2)n²λ²/2! + · · · + g(t)n^t λ^t/t! + · · · = λ exp{(n − 1)λ} = λ + (n − 1)λ² + · · · + (n − 1)^{t−1} λ^t/(t − 1)! + · · · .

Matching the coefficients of λ^t gives g(t) = t(n − 1)^{t−1}/n^t, which arrives at the same choice of UMVUE.
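The coefficient matching can also be verified symbolically; the following sympy sketch fixes arbitrary values of n and t and recovers g(t) = t(n − 1)^{t−1}/n^t.

    import sympy as sp

    lam = sp.symbols('lam', positive=True)
    n, t = 5, 3                              # arbitrary choices for the check
    # coefficient of lam**t in lam*exp((n-1)*lam) must equal g(t)*n**t/t!
    coeff = sp.series(lam * sp.exp((n - 1) * lam), lam, 0, t + 2).removeO().coeff(lam, t)
    g_t = sp.simplify(coeff * sp.factorial(t) / n**t)
    print(g_t, sp.Rational(t * (n - 1)**(t - 1), n**t))   # both equal 48/125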
Remark 17. From the Lehmann–Scheffé theorem, it is intuitive that if T is a complete sufficient statistic and T ′ is any other sufficient statistic, then ϕ(T ′ ) (ϕ as described in the Rao-Blackwell theorem) must be a function of ϕ(T ), which in turn implies that T must be minimal sufficient. We conclude this section with the proof of this result, which was stated as Theorem 6 in Module 2.
Theorem 4. A complete-sufficient statistic is minimal sufficient. [Proof]
3 Variance Inequalities
In many situations it is not possible to obtain a complete sufficient statistic, or it is difficult to verify
if a given estimator is the UMVUE. Further, in many situations a UMVUE does not exist. In such cases, if one can obtain a tight (achievable) lower bound for the variances of all unbiased estimators, then it becomes possible to evaluate the performance of any unbiased estimator. The Cramér–Rao lower bound serves this purpose.
Theorem 5 (Cramér–Rao Lower Bound, CRLB). Let X1 , · · · , Xn be a sample with pdf fX (·; θ), θ ∈ Θ,
satisfying the following regularity conditions:
(i) Θ is an open interval in R, and the support SX does not depend on θ.
(ii) For each x ∈ SX and θ ∈ Θ, the derivative ∂ log fX (x; θ)/∂θ exists and is finite.
(iii) For any statistic S(X) with Eθ (|S(X)|) < ∞ for all θ, we have

    (∂/∂θ) Eθ {S(X)} = ∫ S(x) (∂/∂θ) fX (x; θ) dx.

Let T (X) be a statistic satisfying varθ {T (X)} < ∞. Define ψ(θ) = Eθ {T (X)}, ψ′(θ) = (∂/∂θ)ψ(θ), and I(θ) = Eθ [{(∂/∂θ) log fX (X; θ)}²]. If 0 < I(θ) < ∞, then T (X) satisfies

    varθ {T (X)} ≥ [ψ′(θ)]² / I(θ).                                  (1)
[Proof of Theorem 5]
Remark 18. From the proof of Theorem 5 it follows that equality holds in (1) if T (X) − ψ(θ) = k(θ) (∂/∂θ) log fX (X; θ) with probability one. Integrating both sides with respect to θ, we observe that equality holds for the one-parameter exponential family with T (X) being a sufficient statistic.
Remark 19. When ψ(θ) = θ, then the CRLB reduces to varθ {T (X)} ≥ [I(θ)]−1 .
Remark 20. When X1 , · · · , Xn are i.i.d., then I(θ) = nI1 (θ), where I1 (θ) = Eθ [{(∂/∂θ) log fX1 (X1 ; θ)}²].
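As a numerical illustration (an example chosen here), for Poisson(λ) the score of a single observation is X/λ − 1, so I1(λ) = var(X)/λ² = 1/λ; the sketch below checks this and the relation I(λ) = nI1(λ) by Monte Carlo.

    import numpy as np

    rng = np.random.default_rng(5)
    lam, n, reps = 2.0, 6, 200_000               # arbitrary settings for the check
    x = rng.poisson(lam, size=(reps, n))
    score1 = x[:, 0] / lam - 1                   # score of a single observation
    score_n = x.sum(axis=1) / lam - n            # score of the whole sample
    print(np.mean(score1**2), 1 / lam)           # I1(lam) ~ 0.5
    print(np.mean(score_n**2), n / lam)          # I(lam) = n*I1(lam) ~ 3.0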
Remark 21. The quantity I(θ) is called the Fisher information of the sample X1 , · · · , Xn . As n
increases, I(θ) increases and consequently the variance of the UMVUE decreases. Thus the estimator
becomes more concentrated around θ, i.e., has more information about θ. To understand the intuition
behind the term information further, read this.
Remark 22. If fX (x; θ) satisfies

    (∂/∂θ) Eθ {(∂/∂θ) log fX (X; θ)} = ∫ (∂/∂θ)[{(∂/∂θ) log fX (x; θ)} fX (x; θ)] dx,

then

    Eθ [{(∂/∂θ) log fX (X; θ)}²] = −Eθ {(∂²/∂θ²) log fX (X; θ)}.
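A quick Monte Carlo check of this identity (with an example chosen here: a single observation from Normal(θ, σ²) with σ² known) is given below; both sides equal 1/σ².

    import numpy as np

    rng = np.random.default_rng(6)
    theta, sigma2, reps = 1.5, 2.0, 500_000      # arbitrary settings for the check
    x = rng.normal(theta, np.sqrt(sigma2), size=reps)
    lhs = np.mean(((x - theta) / sigma2) ** 2)   # E[(d/dtheta log f)^2], the squared score
    rhs = 1 / sigma2                             # -E[d^2/dtheta^2 log f] = 1/sigma^2 exactly
    print(lhs, rhs)                              # both close to 0.5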
Remark 23. CRLB provides another way to verify if an unbiased estimator of θ is UMVUE or not.
We can simply compute the variance of the given unbiased estimator, and check if the variance matches
the CRLB. However, one must be cautious in applying this method: the regularity conditions (i)-(iii) must be satisfied by the underlying class of distributions and by the estimators under consideration. For example, the Uniform(0, θ) and the location exponential families do not satisfy condition (i).
Remark 24. Even when the family of distributions satisfies the regularity conditions, CRLB may not
be achieved by the UMVUE of θ. The following is one such example.
Example 9. Let X1 , · · · , Xn be a random sample from Normal(µ, σ²). For estimating σ², it is not difficult to see that the CRLB is 2σ⁴/n, whereas the variance of the UMVUE Sn⋆2 (being an unbiased estimator based on a complete sufficient statistic) is 2σ⁴/(n − 1) > 2σ⁴/n.
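A short simulation (with arbitrary parameter values) confirms that the variance of Sn⋆2 stays above the CRLB.

    import numpy as np

    rng = np.random.default_rng(7)
    n, mu, sigma2, reps = 10, 0.0, 3.0, 200_000   # arbitrary settings for the check
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    s2 = x.var(axis=1, ddof=1)                    # the UMVUE S*_n^2 of sigma^2
    print(s2.var())                               # ~ 2*sigma2^2/(n-1) = 2.0
    print(2 * sigma2**2 / n)                      # CRLB = 1.8, strictly smaller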
Example 10. Let X1 , . . . , Xn be a random sample from a location family of distributions, i.e., Xi = θ + Wi , i = 1, . . . , n, where the Wi are i.i.d. from a distribution with p.d.f. fW , free of θ. Then I1 (θ) = E[{f′W (W )/fW (W )}²] is free of θ, and hence so is I(θ) = nI1 (θ).
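As a numerical illustration (an example chosen here: W standard logistic, for which f′W(w)/fW(w) = −tanh(w/2) and I1(θ) = 1/3), the following sketch shows that the estimated information does not change with θ.

    import numpy as np

    rng = np.random.default_rng(8)
    reps = 500_000                                 # arbitrary replication count
    for theta in (0.0, 2.0, -5.0):                 # the answer should not depend on theta
        x = theta + rng.logistic(size=reps)        # X = theta + W, W standard logistic
        score = np.tanh((x - theta) / 2)           # d/dtheta log f_W(x - theta)
        print(theta, np.mean(score**2))            # ~ 1/3 in every case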