
MTH211A: Theory of Statistics

Module 3: Point Estimation

Recall the (parametric) inference problem: Let X1 , · · · , Xn be a random sample from some
distribution F , which is parameterized by some parameter vector θ, θ ∈ Θ, where Θ is the parameter
space. Our goal is to infer about F (which is equivalent to infer about θ, or some function of θ) using
the random sample X1 , · · · , Xn .

Estimator. The main task of parametric inference is to estimate the parameter θ with the help
of samples X1 , · · · , Xn . For example, suppose it is assumed that the random sample belongs to the
normal family of distributions, but the parameters of the distribution, µ, σ 2 , are not specified. A
statistician naturally uses functions of sample observations (statistics) to estimate the parameters. For
example, one may use the sample mean, X̄n , to estimate µ, and the sample variance Sn2 to estimate
σ 2 . A statistic that is used to estimate a parameter is called an estimator. A realization of the
estimator based on a sample, which serves as a potential value of the parameter, is called an estimate.
Of course, for any parameter there exist many estimators. For example, one can also put forward
the sample median X̃me , instead of sample mean X̄n , to estimate µ. In this module, we will learn
some desirable properties that a good estimator should satisfy.

1 Desirable properties of a point estimator


Definition 1 (Mean Squared Error). The mean squared error (MSE) of an estimator T (X) of the
parameter θ is defined as Eθ {T (X) − θ}2 .
Remark 1. In the definition of MSE, the suffix θ in Eθ (·) is used to emphasize that the data generating
process of the sample {X1 , . . . , Xn } involves the parameter θ. As the expectation is taken over X, the
MSE is a function of θ only. For example, let {X1 , . . . , Xn } be a random sample from N (µ, σ 2 ), then
the MSE of X̄n is Eµ,σ2 (X̄n − µ)2 = varµ,σ2 (X̄n ) = σ 2 /n.
Remark 2. Instead of MSE one can use other (expected) measures of difference, for example, mean
absolute error (MAE) Eθ |T (X) − θ|. However, MSE is usually preferred over MAE due to algebraic
amenability.
Remark 3. The MSE can be decomposed into two parts:
MSE = Eθ [{T (X) − Eθ (T (X))} + {Eθ (T (X)) − θ}]² = varθ {T (X)} + {Eθ (T (X)) − θ}² = variance + bias².
Intuitively, the MSE is the sum of two quantities, one measuring accuracy (bias²), and the other
measuring precision (variance).
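The middle step uses the fact that the cross term vanishes: 2Eθ [{T (X) − Eθ (T (X))}{Eθ (T (X)) − θ}] = 2{Eθ (T (X)) − θ} Eθ {T (X) − Eθ (T (X))} = 0, since Eθ (T (X)) − θ is a constant and the first factor has expectation zero.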
Remark 4. Naturally, one would like to use the point estimator having minimum MSE among the pool
of all possible estimators. However, a minimum-MSE estimator does not generally exist. Therefore,
one possible way forward is to focus on a (reasonable) sub-class of all possible estimators and try to
find the estimator with minimum MSE in that subclass. The subclass we focus on is the class of
unbiased estimators.
Definition 2 (Unbiased estimator). The expected difference between the estimator T (X) and the parameter
θ, when the data generating process involves θ, is called the bias. An estimator T (X) is called unbiased
for θ if the bias is zero for all θ ∈ Θ, i.e., if Eθ {T (X)} = θ for all θ ∈ Θ.
Example 1. Let {X1 , . . . , Xn } be a random sample from some population with finite mean µ. Then
the sample mean X̄n is an unbiased estimator of µ. Further, if {X1 , . . . , Xn } is a random
sample from some population with finite variance σ 2 , then the estimator Sn⋆2 = nSn2 /(n−
1) is an unbiased estimator of σ 2 .

Note: For an unbiased estimator T (X), the MSE is equal to the variance of the estimator.
Remark 5. A biased estimator may have lower MSE than an unbiased estimator. For example,
consider a random sample {X1 , . . . , Xn } from Normal(µ, σ²), µ ∈ R, σ² > 0. The MSE of the unbiased
sample variance Sn⋆2 , which is 2σ⁴/(n − 1), is higher than the MSE of the (biased) sample variance Sn2 , which is (2n − 1)σ⁴/n².
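Remark 5 can be checked quickly by simulation; the following is a minimal Python sketch (the values of n, σ² and the number of replications below are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, mu, sigma2 = 10, 0.0, 4.0
    reps = 200000
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

    s2_biased = x.var(axis=1, ddof=0)    # S_n^2, divisor n
    s2_unbiased = x.var(axis=1, ddof=1)  # S_n*^2, divisor n-1

    mse_biased = np.mean((s2_biased - sigma2) ** 2)      # approx (2n-1)*sigma2**2/n**2
    mse_unbiased = np.mean((s2_unbiased - sigma2) ** 2)  # approx 2*sigma2**2/(n-1)
    print(mse_biased, mse_unbiased)  # the biased S_n^2 has the smaller MSE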
Example 2. Let {X1 , . . . , Xn } be a random sample from Bernoulli(p), 0 ≤ p ≤ 1. Then X̄n is
an unbiased estimator of p, and the MSE of X̄n is p(1 − p)/n.
Remark 6. An unbiased estimator of θ may not exist. For example, in the Bernoulli(p) example
above, an unbiased estimator of p2 based on a random sample of size n = 1 does not exist. (WHY?)
For a parameter θ, if there exists an unbiased estimator of θ, then θ is called U-estimable.
Remark 7. An unbiased estimator may not always be a good estimator. Consider a random sample
X of size n = 1 from Poisson(λ) distribution, and suppose θ = exp{−3λ}. It can be shown that
T = T (X) = (−2)^X is an unbiased estimator of θ. However, T is an absurd estimator as it may take
large negative values, although θ is positive.
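To verify the unbiasedness claim in Remark 7, note that

Eλ {(−2)^X } = Σ_{x=0}^{∞} (−2)^x e^{−λ} λ^x /x! = e^{−λ} Σ_{x=0}^{∞} (−2λ)^x /x! = e^{−λ} e^{−2λ} = exp{−3λ} = θ,

so T = (−2)^X is indeed unbiased for θ, even though it is negative whenever X is odd.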
Remark 8. Usually an unbiased estimator of a U-estimable parameter θ is not unique. In fact, if there
exist two different unbiased estimators T1 and T2 of θ, then for any α ∈ (0, 1), Tα = αT1 + (1 − α)T2
is also unbiased for θ.
As MSE is equal to variance for an unbiased estimator, the best estimator (in terms of MSE) in
the class of unbiased estimators is the one with minimum variance.
Definition 3 (Uniformly minimum variance unbiased estimator). An estimator of θ, T (X), is called
a uniformly minimum variance unbiased estimator (UMVUE) if
(i) Eθ {T (X)} = θ for all θ ∈ Θ, and
(ii) for any other estimator T ′ (X) with Eθ {T ′ (X)} = θ, varθ {T (X)} ≤ varθ {T ′ (X)} for all θ ∈ Θ.
Remark 9. In the above definition, it is important that (i) and (ii) hold for all θ ∈ Θ. If (i) and
(ii) hold only for a particular choice of θ, say θ0 , then the corresponding estimator T (X) is called a locally
minimum variance unbiased estimator (LMVUE).

1.1 Properties of UMVUE


Understanding the properties of the UMVUE is necessary for finding it in practical applications.
The first property of the UMVUE says that an UMVUE must be uncorrelated with every unbiased estimator
of zero. One can interpret a random variable having zero expectation as random noise. Ideally,
the best unbiased estimator should be unrelated to any unbiased estimator of zero, as no part of the
UMVUE should be explained by random noise.
Remark 10 (Role of unbiased estimator of zero). An unbiased estimator of zero, say S(X), is an
estimator satisfying Eθ {S(X)} = 0 for all θ ∈ Θ. For any unbiased estimator of θ, say T (X), and any
unbiased estimator of zero, say S(X), an (uncountable) class of unbiased estimators of θ can be obtained
by adding multiples of S(X) to T (X), i.e., T (X) + αS(X), α ∈ R. An UMVUE must have the
minimum variance within this class of unbiased estimators, over all possible choices of α and S.
Theorem 1 (Properties of UMVUE: 1). Let U be the class of all unbiased estimators of θ ∈ Θ with finite
variance, i.e., U = {T = T (X) : Eθ (T ) = θ and Eθ (T ²) < ∞ for all θ ∈ Θ}, and let V0 be the class
of unbiased estimators of zero with finite variance, i.e., V0 = {S = S(X) : Eθ (S) = 0 and Eθ (S ²) < ∞ for all θ ∈ Θ}.
Suppose U is non-empty. Then T (X) ∈ U is UMVUE of θ ∈ Θ if and only if T (X) is uncorrelated
with all unbiased estimators of zero. [Proof]
Remark 11. The above characterization of UMVUE is of limited application, as the class of unbiased
estimators of zero is also very large. However, the above theorem is useful in proving that an unbiased
estimator of θ is not an UMVUE.
Example 3. Let X1 , . . . , Xn be a random sample from the Poisson(θ) distribution. The estimators
T1 (X) = X̄n , T2 (X) = Sn⋆2 and T3 (X) = X1 are unbiased for θ. However, we can
discard T3 as a possible candidate for the UMVUE since varθ (X̄n ) < varθ (X1 ) for n > 1. Now,
to check whether T2 can be the UMVUE, consider the covariance of T2 with the unbiased
estimator of zero T1 − T2 . Verify that this covariance is non-zero. Hence, by Theorem 1, T2 cannot be an UMVUE of θ.
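The non-zero covariance can also be seen in a small simulation (an illustrative Python sketch; the choices of n, θ and the number of replications are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    n, theta, reps = 5, 2.0, 400000
    x = rng.poisson(theta, size=(reps, n))

    t1 = x.mean(axis=1)         # T1 = sample mean
    t2 = x.var(axis=1, ddof=1)  # T2 = unbiased sample variance
    z = t1 - t2                 # unbiased estimator of zero under Poisson

    # a non-zero covariance means T2 is correlated with an unbiased estimator
    # of zero, so by Theorem 1 it cannot be the UMVUE
    print(np.cov(t2, z)[0, 1])  # approx -2*theta**2/(n - 1)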

Corollary 1 (Properties of UMVUE: 2). The UMVUE of θ is unique. [Proof]
The next theorem provides a way to improve an unbiased estimator of θ using a sufficient statistic.
Theorem 2 (Rao-Blackwell Theorem). Let T1 (X) be an unbiased estimator of θ and T (X) be a
sufficient statistic for θ. Then the conditional expectation ϕ(t) = E(T1 (X) | T (X) = t) defines a
statistic ϕ(T ). This statistic ϕ(T ) is
(i) a function of the sufficient statistic T (X) for θ,
(ii) an unbiased estimator of θ, and
(iii) satisfies varθ (ϕ(T )) ≤ varθ (T1 (X)) for all θ ∈ Θ.
Proof of Theorem 2 requires the following result:

Result 1. Under the existence and finiteness of all the relevant expectations, the following properties
of conditional expectation and variance are satisfied:
(A) EZ [EY|Z {h(Y ) | Z}] = E {h(Y )} ,
(B) varZ [EY|Z {h(Y ) | Z}] + EZ [varY|Z {h(Y ) | Z}] = varY {h(Y )} .

[Proof of Theorem 2]

Example 4. Let X1 , . . . , Xn be a random sample from Bernoulli(θ). Then X1 is an unbiased estimator
of θ. However, a better estimator can be constructed by considering T1 (X) = E(X1 | Σ_{i=1}^{n} Xi ),
as Σ_{i=1}^{n} Xi is a sufficient statistic.

Remark 12. If T1 (X) is solely a function of the sufficient statistic T (X), then varθ (ϕ(T )) =
varθ (T1 (X)) for all θ ∈ Θ. [Proof]
Remark 13. For any statistic T (X) and an unbiased estimator T1 (X), E {E(T1 (X) | T (X) = t)} = θ,
and conclusion (iii) in Theorem 2 holds. However, if T (X) is not sufficient, then E(T1 (X) | T (X) = t)
may not be a statistic. For instance, consider the function E(n^{−1} Σ_{i=1}^{n} Xi | X1 ) in the above example.
Observe that

E( (1/n) Σ_{i=1}^{n} Xi | X1 = x1 ) = x1 /n + (1/n) Σ_{i=2}^{n} E(Xi ) = {x1 + (n − 1)θ}/n,

which is not an estimator, as it depends on the unknown θ.
Remark 14. The Rao-Blackwell theorem indicates that the UMVUE must be a function of a sufficient
statistic. If not, then a better estimator can be obtained by considering the conditional expectation given a
sufficient statistic.

Note: The Rao-Blackwell theorem provides a way to improve on an existing estimator. Based on the
theorem, we can make a general recommendation of selecting an appropriate (unbiased) function of a
sufficient statistic as an estimator of θ. However, the class of sufficient statistics is also uncountable (as
any one-one function of a sufficient statistic is also sufficient). Thus, the theorem does not directly indicate
the choice of UMVUE.

Remark 15. It is not, in general, easy to characterize the class of unbiased estimators of zero;
consequently, verifying that cov(T (X), U (X)) = 0 for every unbiased estimator U (X) of zero is difficult.
However, if the distribution of T (X) is complete and U is a function of T , then the only such
unbiased estimator of zero is zero itself (with probability one). Thus, the correlation of an estimator of θ
based on T with such an unbiased estimator of zero must be zero. The Lehmann-Scheffé theorem
formalizes this idea.

Theorem 3 (Lehmann-Scheffé Theorem). If T is a complete-sufficient statistic, and there exists an


unbiased estimator T1 (X) of θ, then the UMVUE of θ is given by ϕ(T ) = E(T1 | T ). [Proof]
Remark 16. The above theorem says that if one can obtain an unbiased estimator of θ that is a function
of a complete-sufficient statistic, then that estimator must be the UMVUE of θ. For instance, in Example 4,
T1 (X) is the UMVUE of θ.

2 Methods of Finding UMVUE
I. Rao-Blackwellization: Suppose an unbiased estimator of θ is available, and the distribution of the
complete sufficient statistic is known. Then one may find the UMVUE by taking the conditional
expectation of the unbiased estimator given the complete sufficient statistic.

Example 7. Let X1 , . . . , Xn be a random sample from the Poisson(λ) distribution. Consider the problem
of estimating P (X = 1) = θ = λ exp{−λ}. Observe that Y1 = I{1} (X1 ) is an unbiased
estimator of θ, and T (X) = Σ_{i=1}^{n} Xi is sufficient for λ, and hence for θ. The conditional
distribution of Y1 given T = t is a two-point distribution, taking the value 1 with probability
ϕ(t) = t(n − 1)^{t−1}/n^t and the value 0 with probability 1 − ϕ(t). Thus the conditional
expectation is ϕ(t), and the statistic ϕ(T ) = T (1 − 1/n)^T /(n − 1) is the UMVUE for θ.
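A quick numerical check of this Rao-Blackwellized estimator (a minimal Python sketch; n, λ and the number of replications are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    n, lam, reps = 8, 1.5, 400000
    theta = lam * np.exp(-lam)              # P(X = 1)

    x = rng.poisson(lam, size=(reps, n))
    y1 = (x[:, 0] == 1).astype(float)       # unbiased indicator estimator Y1
    t = x.sum(axis=1)
    phi = t * (1 - 1 / n) ** t / (n - 1)    # phi(T) = T(1 - 1/n)^T/(n - 1)

    print(theta, y1.mean(), phi.mean())     # both sample means should be close to theta
    print(y1.var(), phi.var())              # phi(T) has the smaller variance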

III. Method of Solving: Let T (X) be a complete sufficient statistic. As it is known that any function
of T which is an unbiased estimator of θ is the UMVUE of θ, one can obtain the UMVUE directly
by solving for the function g(T ) such that Eθ (g(T )) = θ.

Example 8. In the same problem of estimating P (X = 1) = θ = λ exp{−λ}, we can apply the direct
solving method as follows:
Eλ {g(T )} = θ ⇔ g(0) + g(1)nλ + · · · + g(t) n^t λ^t /t! + · · · = λ + (n − 1)λ² + · · · + (n − 1)^{t−1} λ^t /(t − 1)! + · · ·

(both sides of Eλ {g(T )} = θ multiplied by e^{nλ} and expanded as power series in λ), which arrives at the same choice of UMVUE.
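Equating the coefficients of λ^t on the two sides gives g(t) n^t /t! = (n − 1)^{t−1}/(t − 1)!, i.e., g(t) = t(n − 1)^{t−1}/n^t , which is exactly the ϕ(t) obtained by Rao-Blackwellization in Example 7.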

Remark 17. From the Lehmann-Scheffé theorem, it is intuitive that if T is a complete sufficient statistic,
and T ′ is any other sufficient statistic, then ϕ(T ′ ) (ϕ as described in the Rao-Blackwell theorem) must be
a function of ϕ(T ), which in turn implies that T must be minimal sufficient. We conclude this section
with the proof of Theorem 6, stated in Module 2 and restated below as Theorem 4.
Theorem 4. A complete-sufficient statistic is minimal sufficient. [Proof]

3 Variance Inequalities
In many situations it is not possible to obtain a complete sufficient statistic, or it is difficult to verify
whether a given estimator is the UMVUE. Further, in many situations an UMVUE does not exist. In such
cases, if one can obtain a tight (achievable) lower bound for the variances of all unbiased
estimators, then it is possible to evaluate the performance of any unbiased estimator.
The Cramér-Rao lower bound serves this purpose.
Theorem 5 (Cramér-Rao Lower Bound, CRLB). Let X1 , · · · , Xn be a sample with pdf fX (·; θ), θ ∈ Θ,
satisfying the following regularity conditions:
(i) Θ is an open interval in R, and the support SX does not depend on θ.
(ii) For each x ∈ SX and θ ∈ Θ, the derivative ∂ log fX (x; θ)/∂θ exists and is finite.
(iii) For any statistic S(X) with Eθ (|S(X)|) < ∞ for all θ, we have

∂/∂θ Eθ {S(X)} = ∫ S(x) {∂fX (x; θ)/∂θ} dx.
Let T (X) be a statistic satisfying varθ {T (X)} < ∞. Define Eθ {T (X)} = ψ(θ), ψ′(θ) = ∂ψ(θ)/∂θ, and
I(θ) = Eθ [{∂ log fX (X; θ)/∂θ}²].
If 0 < I(θ) < ∞, then T (X) satisfies

varθ {T (X)} ≥ [ψ′(θ)]² / I(θ).     (1)
[Proof of Theorem 5]

Remark 18. From the proof of Theorem 5 it follows that equality holds in (1) if T (X) − ψ(θ) =
k(θ) ∂ log fX (X; θ)/∂θ with probability one. Integrating both sides with respect to θ, we observe that
equality holds for the one-parameter exponential family with T (X) being a sufficient statistic.
Remark 19. When ψ(θ) = θ, the CRLB reduces to varθ {T (X)} ≥ [I(θ)]^{−1} .
Remark 20. When X1 , · · · , Xn are i.i.d., I(θ) = nI1 (θ), where I1 (θ) = Eθ [{∂ log fX1 (X1 ; θ)/∂θ}²].
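For instance, for a single observation from Poisson(λ), log fX1 (x; λ) = −λ + x log λ − log x!, so ∂ log fX1 (x; λ)/∂λ = x/λ − 1 and I1 (λ) = Eλ {(X1 /λ − 1)²} = varλ (X1 )/λ² = 1/λ. Hence I(λ) = n/λ, and the CRLB for unbiased estimators of λ is λ/n, which is attained by X̄n ; this is consistent with Remark 18, since the Poisson family is a one-parameter exponential family.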
Remark 21. The quantity I(θ) is called the Fisher information of the sample X1 , · · · , Xn . As n
increases, I(θ) increases and consequently the variance of the UMVUE decreases. Thus the estimator
becomes more concentrated around θ, i.e., the sample has more information about θ.
Remark 22. If fX (x; θ) satisfies

∂/∂θ Eθ {∂ log fX (X; θ)/∂θ} = ∫ ∂/∂θ [{∂ log fX (x; θ)/∂θ} fX (x; θ)] dx,

then

Eθ [{∂ log fX (X; θ)/∂θ}²] = −Eθ {∂² log fX (X; θ)/∂θ²}.

Remark 23. CRLB provides another way to verify if an unbiased estimator of θ is UMVUE or not.
We can simply compute the variance of the given unbiased estimator, and check if the variance matches
the CRLB. However, one must be cautious about applying this method. The regularity conditions (i)-
(iii) must be satisfied by the underlying class of distributions, and the estimators under consideration.
For example, the Uniform(0, θ) and location-exponential families of distributions do not satisfy condition (i).
Remark 24. Even when the family of distributions satisfies the regularity conditions, CRLB may not
be achieved by the UMVUE of θ. The following is one such example.

Example 9. Let X1 , · · · , Xn be a random sample from Normal(µ, σ²). It is not difficult to see that
the CRLB for estimating σ² is 2σ⁴/n, whereas the variance of the UMVUE Sn⋆2 (an unbiased estimator
based on a complete sufficient statistic) is 2σ⁴/(n − 1).

Example 10. Let X1 , . . . , Xn be a random sample from a location family of distributions, i.e., Xi =
θ + Wi , i = 1, . . . , n, where the Wi 's are i.i.d. from a distribution with p.d.f. fW , free of θ.
Then I1 (θ) = E[{fW′ (W )/fW (W )}²], and hence I(θ) = nI1 (θ), is free of θ.
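To see this, note that fXi (x; θ) = fW (x − θ), so ∂ log fXi (x; θ)/∂θ = −fW′ (x − θ)/fW (x − θ), and the substitution w = x − θ gives

I1 (θ) = ∫ {fW′ (x − θ)/fW (x − θ)}² fW (x − θ) dx = ∫ {fW′ (w)/fW (w)}² fW (w) dw,

which does not involve θ.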

We conclude this section with some definitions.


Definition 4 (Efficiency of two estimators). Let T1 and T2 be two unbiased estimators of the parameter
ψ(θ). Suppose that Eθ (Ti ²) < ∞ for i = 1, 2. Then the efficiency of T1 relative to T2 is defined as

eff θ (T1 | T2 ) = varθ (T2 )/varθ (T1 ).
We say that T1 is more efficient than T2 if eff θ (T1 | T2 ) > 1.
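For instance, in the Poisson(θ) setting of Example 3, T1 = X̄n and T2 = X1 are both unbiased for θ, with varθ (T1 ) = θ/n and varθ (T2 ) = θ, so eff θ (T1 | T2 ) = n: the sample mean is n times as efficient as a single observation.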
The following definition assumes that the underlying family of distributions satisfies the regularity
conditions stated in Theorem 5.
Definition 5 (Efficiency of an Estimator). The ratio of the CRLB to the actual variance of an unbiased
estimator T (X) is called the efficiency of T .
Definition 6 (Efficient Estimator). An unbiased estimator T (X) of θ is said to be efficient or most
efficient if the variance of T attains the CRLB.
