Stat2602 Chapter 3
Definition
A statistical model specifies the joint pmf or pdf $f(\mathbf{x};\boldsymbol\theta)$ of an observable random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ up to an unknown parameter vector $\boldsymbol\theta = (\theta_1, \ldots, \theta_p) \in \Theta$. The set of possible values of $\mathbf{X}$ is called the sample space, and the set $\Theta$ of possible values of $\boldsymbol\theta$ is called the parameter space.
Example 3.1
Suppose that $X_1, X_2, \ldots, X_{20}$ are i.i.d. $N(\theta, 2^2)$ random variables, where $\theta > 0$ is unknown. The joint pdf is
$$f(\mathbf{x};\theta) = \prod_{i=1}^{20} \frac{1}{2\sqrt{2\pi}} \exp\!\left(-\frac{(x_i-\theta)^2}{2\cdot 2^2}\right) = \left(2\pi\cdot 2^2\right)^{-20/2} \exp\!\left(-\frac{1}{8}\sum_{i=1}^{20}(x_i-\theta)^2\right).$$
The sample space for this model is $\mathbb{R}^{20}$, and the parameter space is $\Theta = (0, \infty)$.
Example 3.2
Suppose that in an election of governor for a large city, there are 3 candidates: Ada, Bob, and Carter. Denote by $\theta_1$, $\theta_2$, and $\theta_3$ respectively the proportions of the votes they will receive, where $\theta_1, \theta_2, \theta_3$ are unknown constants. An opinion poll asks a sample of 100 voters whom they would vote for, and observes the counts as $X_1$, $X_2$, $X_3$. Then $\mathbf{X} = (X_1, X_2, X_3)$, $\boldsymbol\theta = (\theta_1, \theta_2, \theta_3)$, and $p = 3$. Since the voting population is assumed to be very large, it is reasonable and permissible to think of the probabilities as unchanging once a voter is selected for the sample. Hence we may assume the following trinomial pmf of $\mathbf{X}$:
$$f(\mathbf{x};\boldsymbol\theta) = \frac{100!}{x_1!\,x_2!\,x_3!}\,\theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3}.$$
The sample space for this model is $\{(x_1, x_2, x_3) \in \{0,\ldots,100\}^3 \mid x_1 + x_2 + x_3 = 100\}$, and the parameter space is $\{(\theta_1, \theta_2, \theta_3) \in [0,1]^3 \mid \theta_1 + \theta_2 + \theta_3 = 1\}$. Note that $X_1, X_2, X_3$ are not independent in this model.
Remark
“Essentially, all models are wrong, but some are useful.” – George E.P. Box
Definition
A statistic is a function $T = T(\mathbf{X})$ of the sample $\mathbf{X}$ that does not depend on any unknown parameter.
Note that $T$ may be vector valued, i.e. it may contain one or more components.
Example 3.3
On the other hand, the sample mean $\bar{X}$ and sample variance $S^2$ are statistics as they are functions of $\mathbf{X} = (X_1, X_2, \ldots, X_{20})$. Moreover, the sample minimum $X_{(1)}$, sample maximum $X_{(20)}$, sample median $M = (X_{(10)} + X_{(11)})/2$, etc., are also statistics. Note that the sample itself is just an identity function of the sample: $\mathbf{X} = T(\mathbf{X})$; therefore the original sample $(X_1, X_2, \ldots, X_{20})$ is also a statistic.
One primary goal of statistical analysis is to draw inferences about the unknown
parameters, using observed statistics. Point estimation is the process of using a
statistic to calculate a single value which is to serve as a “guess” of the unknown
parameter value.
Definition
A point estimator is any statistic $\hat\theta = T(\mathbf{X})$ used to estimate an unknown parameter $\theta$; the value it takes for the observed data is called a point estimate.
Note that the notations $\hat\theta$ and $\theta$ refer to different quantities, with $\hat\theta = T(\mathbf{X})$ as the sample statistic that can be observed from the sample, while $\theta$ is the population parameter that is unobservable.
Roughly speaking, the parameter $\theta$ is the target; the estimator $\hat\theta$ is the tool; and the estimate is the result.
Example 3.4
From the data, $\bar{X} = 99.2$. Hence a point estimate of the population mean $\mu$ is $\hat\mu = 99.2$. Similarly, using the sample standard deviation $\hat\sigma = S$ as the point estimator of $\sigma$, a point estimate is obtained as $\hat\sigma = 13.33$.
Note that $\mu$ is also the population median for this normal model. We may also use the sample median to estimate the value of $\mu$. In general, for any parameter there may be many different methods to do the estimation.
Example 3.5
Suppose that we can observe some random points which are known to be uniformly distributed from 0 to an unknown constant $\theta$. Then we have the following statistical model: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} U(0, \theta)$, with parameter space $\Theta = (0, \infty)$.
However, since the population mean is $\theta/2$, it is also reasonable to estimate $\theta$ by using the sample mean, i.e. $\hat\theta = 2\bar{X}$.
Definition
The bias of an estimator $\hat\theta$ of $\theta$ is defined as
$$\mathrm{bias}(\hat\theta) = E(\hat\theta) - \theta.$$
Note that the expectation is taken with respect to the sampling distribution of $\hat\theta = T(\mathbf{X})$. We say that $\hat\theta$ overestimates $\theta$ if $\mathrm{bias}(\hat\theta) > 0$, and underestimates $\theta$ if $\mathrm{bias}(\hat\theta) < 0$.
Unbiased Estimator
An estimator $\hat\theta$ is unbiased for $\theta$ if
$$E(\hat\theta) = \theta \quad \text{for all values of } \theta,$$
i.e. the estimator is unbiased if $\mathrm{bias}(\hat\theta) = 0$ for any $\theta$.
Example 3.6
1. The sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$ as
$$E(\bar{X}) = \mu \quad \text{for all } \mu.$$
2. The sample variance $S^2 = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator of the population variance $\sigma^2$ as
$$E(S^2) = \sigma^2 \quad \text{for all } \sigma^2.$$
Example 3.7
Consider the uniform model in Example 3.5 and the estimator $\hat\theta_2 = X_{(n)} = \max(X_1, \ldots, X_n)$. Using the formula in Section 2.1.4 for order statistics, the pdf of $\hat\theta_2$ is obtained as
$$f_{\hat\theta_2}(y) = n\left(\frac{y}{\theta}\right)^{n-1}\frac{1}{\theta} = \frac{n y^{n-1}}{\theta^n}, \qquad 0 < y < \theta.$$
Hence
$$E(\hat\theta_2) = \int_0^\theta y\,\frac{n y^{n-1}}{\theta^n}\,dy = \left[\frac{n y^{n+1}}{(n+1)\theta^n}\right]_0^\theta = \frac{n\theta}{n+1},$$
$$\mathrm{bias}(\hat\theta_2) = E(\hat\theta_2) - \theta = \frac{n\theta}{n+1} - \theta = -\frac{\theta}{n+1} < 0,$$
i.e. $\hat\theta_2$ underestimates $\theta$. A simple rescaling gives an unbiased estimator:
$$\hat\theta_3 = \frac{n+1}{n}X_{(n)}, \qquad E(\hat\theta_3) = \frac{n+1}{n}\cdot\frac{n\theta}{n+1} = \theta.$$
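The bias calculation above is easy to check by simulation. Below is a minimal Python sketch; the true $\theta$, the sample size and the number of replications are arbitrary choices made only for illustration, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 10, 200_000    # assumed values for illustration

x = rng.uniform(0, theta, size=(reps, n))
theta2 = x.max(axis=1)               # biased estimator X_(n)
theta3 = (n + 1) / n * theta2        # bias-corrected estimator

print("bias of X_(n)        :", theta2.mean() - theta,
      " (theory:", -theta / (n + 1), ")")
print("bias of (n+1)/n X_(n):", theta3.mean() - theta, " (theory: 0)")
```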
Example 3.8
Suppose that $X \sim \mathrm{Binomial}(n, p)$ with $p \in (0, 1)$ unknown. An unbiased estimator of $p$ is the sample proportion $\hat{p} = X/n$, as
$$E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p \quad \text{for all } p \in (0,1).$$
However, $\hat{p}^2 = X^2/n^2$ is not unbiased for $p^2$ as
$$E(\hat{p}^2) = \mathrm{Var}(\hat{p}) + \left[E(\hat{p})\right]^2 = \frac{p(1-p)}{n} + p^2 \neq p^2.$$
Rearranging,
$$E(\hat{p}^2) = \frac{p(1-p)}{n} + p^2 = \frac{p + (n-1)p^2}{n} \quad\Longrightarrow\quad E\!\left(\frac{n\hat{p}^2 - \hat{p}}{n-1}\right) = p^2,$$
so an unbiased estimator of $p^2$ is
$$\frac{n\hat{p}^2 - \hat{p}}{n-1} = \frac{1}{n-1}\left(\frac{X^2}{n} - \frac{X}{n}\right) = \frac{X(X-1)}{n(n-1)}.$$
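A quick simulation check of this unbiasedness; the sample size, success probability and replication count below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 30, 0.3, 500_000        # assumed values for illustration

x = rng.binomial(n, p, size=reps)
phat_sq = (x / n) ** 2               # biased estimator of p^2
unbiased = x * (x - 1) / (n * (n - 1))

print("E[phat^2] approx          :", phat_sq.mean(), "  target p^2 =", p**2)
print("E[X(X-1)/(n(n-1))] approx :", unbiased.mean())
```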
Example 3.9
To find an unbiased estimator of $e^{-\lambda}$, consider the fact that for any $i = 1, 2, \ldots, n$, we have $e^{-\lambda} = P(X_i > 1)$.
Define the indicator
$$I(X_i > 1) = \begin{cases} 1 & \text{if } X_i > 1, \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$E\!\left[I(X_i > 1)\right] = 1\cdot P(X_i > 1) + 0\cdot P(X_i \le 1) = P(X_i > 1) = e^{-\lambda}.$$
Hence an unbiased estimator of $e^{-\lambda}$ is
$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n} I(X_i > 1),$$
which is the sample proportion of the sample data with values greater than one.
Definition
An estimator $\hat\theta$ is said to be asymptotically unbiased for $\theta$ if
$$\lim_{n\to\infty} \mathrm{bias}(\hat\theta) = 0.$$
Example 3.10
The estimator $\hat\theta_2 = X_{(n)}$ in Example 3.7 is asymptotically unbiased as
$$\lim_{n\to\infty} \mathrm{bias}(\hat\theta_2) = \lim_{n\to\infty} \left(-\frac{\theta}{n+1}\right) = 0.$$
Similarly, $\hat{p}^2$ in Example 3.8 is asymptotically unbiased for $p^2$ as
$$\lim_{n\to\infty} \mathrm{bias}(\hat{p}^2) = \lim_{n\to\infty} \frac{p(1-p)}{n} = 0.$$
Hence an estimator can be asymptotically unbiased even though for each finite sample size it is biased.
Definition
The mean square error (MSE) of an estimator $\hat\theta$ of $\theta$ is
$$\mathrm{MSE}(\hat\theta) = E\!\left[(\hat\theta - \theta)^2\right].$$
Remark
The MSE of an estimator can be expressed in terms of its bias and variance:
$$\begin{aligned}
\mathrm{MSE}(\hat\theta) &= E\!\left[(\hat\theta - \theta)^2\right] \\
&= E\!\left[\left(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta\right)^2\right] \\
&= E\!\left[\left(\hat\theta - E(\hat\theta)\right)^2\right] + \left(E(\hat\theta) - \theta\right)^2 + 2\left(E(\hat\theta) - \theta\right)E\!\left[\hat\theta - E(\hat\theta)\right] \\
&= \mathrm{Var}(\hat\theta) + \left[\mathrm{bias}(\hat\theta)\right]^2,
\end{aligned}$$
since $E[\hat\theta - E(\hat\theta)] = 0$. Therefore the MSE can be regarded as a measure combining the precision and bias of an estimator. In particular, if $\hat\theta$ is unbiased, then
$$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta).$$
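The decomposition can be illustrated numerically. The sketch below uses the biased estimator $\hat\theta_2 = X_{(n)}$ from Example 3.7; the true $\theta$, sample size and replication count are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 8, 300_000     # illustrative values only

est = rng.uniform(0, theta, size=(reps, n)).max(axis=1)  # theta_hat_2 = X_(n)

mse  = np.mean((est - theta) ** 2)
bias = est.mean() - theta
var  = est.var()

print("MSE          :", mse)
print("Var + bias^2 :", var + bias ** 2)   # should match MSE up to Monte Carlo error
```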
Efficiency
Suppose that there are two estimators $\hat\theta_1$, $\hat\theta_2$ of a parameter $\theta$. The ratio
$$\mathrm{eff}(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{MSE}(\hat\theta_2)}{\mathrm{MSE}(\hat\theta_1)}$$
is called the efficiency of $\hat\theta_1$ compared to $\hat\theta_2$.
The estimator $\hat\theta_1$ is said to be better (or formally, more efficient) than the estimator $\hat\theta_2$ for estimating $\theta$ if $\mathrm{eff}(\hat\theta_1, \hat\theta_2) \ge 1$ for all possible values of $\theta$, provided that the inequality is strict for at least one value of $\theta$.
Example 3.11
Consider the following three estimators of $\theta$ in the uniform model of Example 3.7:
$$\hat\theta_1 = 2\bar{X} \quad \text{(unbiased)}; \qquad \hat\theta_2 = X_{(n)} \quad \text{(biased, with } \mathrm{bias}(\hat\theta_2) = -\tfrac{\theta}{n+1}\text{)}; \qquad \hat\theta_3 = \frac{n+1}{n}X_{(n)} \quad \text{(unbiased)}.$$
For $\hat\theta_1$,
$$\mathrm{MSE}(\hat\theta_1) = \mathrm{Var}(\hat\theta_1) = 4\,\mathrm{Var}(\bar{X}) = \frac{4}{n}\cdot\frac{\theta^2}{12} = \frac{\theta^2}{3n}.$$
For $\hat\theta_2$,
$$E(\hat\theta_2^2) = \int_0^\theta y^2\,\frac{n y^{n-1}}{\theta^n}\,dy = \left[\frac{n y^{n+2}}{(n+2)\theta^n}\right]_0^\theta = \frac{n\theta^2}{n+2},$$
$$\mathrm{Var}(\hat\theta_2) = \frac{n\theta^2}{n+2} - \left(\frac{n\theta}{n+1}\right)^2 = \frac{n\theta^2}{(n+1)^2(n+2)},$$
and hence
$$\mathrm{MSE}(\hat\theta_2) = \left[\mathrm{bias}(\hat\theta_2)\right]^2 + \mathrm{Var}(\hat\theta_2) = \frac{\theta^2}{(n+1)^2} + \frac{n\theta^2}{(n+1)^2(n+2)} = \frac{2\theta^2}{(n+1)(n+2)}.$$
Therefore
$$\mathrm{eff}(\hat\theta_2, \hat\theta_1) = \frac{\mathrm{MSE}(\hat\theta_1)}{\mathrm{MSE}(\hat\theta_2)} = \frac{(n+1)(n+2)}{6n} \ge 1,$$
so $\hat\theta_2$ is more efficient than $\hat\theta_1$.
Similarly, for $\hat\theta_3$,
$$\mathrm{MSE}(\hat\theta_3) = \mathrm{Var}(\hat\theta_3) = \left(\frac{n+1}{n}\right)^2 \mathrm{Var}(\hat\theta_2) = \frac{(n+1)^2}{n^2}\cdot\frac{n\theta^2}{(n+1)^2(n+2)} = \frac{\theta^2}{n(n+2)},$$
and
$$\mathrm{eff}(\hat\theta_3, \hat\theta_2) = \frac{\mathrm{MSE}(\hat\theta_2)}{\mathrm{MSE}(\hat\theta_3)} = \frac{2\theta^2/[(n+1)(n+2)]}{\theta^2/[n(n+2)]} = \frac{2n}{n+1} \ge 1,$$
so $\hat\theta_3$ is more efficient than $\hat\theta_2$.
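The three MSE formulas can be checked against simulation. The following sketch compares the Monte Carlo MSEs with the theoretical ones; $\theta$, $n$ and the number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 10.0, 5, 400_000    # illustrative values only

x = rng.uniform(0, theta, size=(reps, n))
t1 = 2 * x.mean(axis=1)              # 2 * sample mean
t2 = x.max(axis=1)                   # sample maximum
t3 = (n + 1) / n * t2                # rescaled maximum

for name, est, theory in [
    ("2*Xbar        ", t1, theta**2 / (3 * n)),
    ("X_(n)         ", t2, 2 * theta**2 / ((n + 1) * (n + 2))),
    ("(n+1)/n X_(n) ", t3, theta**2 / (n * (n + 2))),
]:
    print(name, "MSE:", np.mean((est - theta) ** 2), " theory:", theory)
```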
Example 3.12
Suppose that for a particular statistical model we use $\hat\theta_1 = 2$ as an estimator of the unknown parameter $\theta$, i.e. we estimate the value of $\theta$ by 2 no matter what we observe from the sample. The mean square error of such an estimator is $(2 - \theta)^2$, which is zero when $\theta = 2$ but grows without bound as $\theta$ moves away from 2. On the other hand, we may use $\hat\theta_2 = 5$ as the estimator. Then it will be the best if $\theta = 5$. Therefore, comparing $\hat\theta_1$ and $\hat\theta_2$, neither of them can outperform the other for all values of $\theta$.
As can be seen from this simple example, it is impossible to find a "best" estimator with minimum MSE for all $\theta$. In searching for a "best" estimator, we often impose some desirable constraints, such as unbiasedness.
Definition
An unbiased estimator $\hat\theta$ of $\theta$ is called a best unbiased estimator, or uniformly minimum-variance unbiased estimator (UMVUE), of $\theta$ if $\mathrm{Var}(\hat\theta) \le \mathrm{Var}(\tilde\theta)$ for every other unbiased estimator $\tilde\theta$ and for all values of $\theta$.
Methods for finding the best unbiased estimator will be discussed in later sections.
A good estimator should have the property that when we observe a very large sample, the observed value of $\hat\theta_n$ would be arbitrarily close to the true value $\theta$.
Definition
An estimator $\hat\theta_n$ (computed from a sample of size $n$) is said to be a consistent estimator of $\theta$ if $\hat\theta_n$ converges in probability to $\theta$ as $n \to \infty$, i.e. for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\!\left(|\hat\theta_n - \theta| < \varepsilon\right) = 1.$$
Remarks
1. If $\lim_{n\to\infty} \mathrm{MSE}(\hat\theta_n) = 0$, then $\hat\theta_n$ is a consistent estimator of $\theta$.
Proof
For any $\varepsilon > 0$,
$$P\!\left(|\hat\theta_n - \theta| < \varepsilon\right) = P\!\left((\hat\theta_n - \theta)^2 < \varepsilon^2\right) \ge 1 - \frac{E\!\left[(\hat\theta_n - \theta)^2\right]}{\varepsilon^2} = 1 - \frac{\mathrm{MSE}(\hat\theta_n)}{\varepsilon^2} \to 1$$
as $n \to \infty$.
2. If $\hat\theta_n$ is an unbiased estimator of $\theta$ with $\lim_{n\to\infty} \mathrm{Var}(\hat\theta_n) = 0$, then $\hat\theta_n$ is a consistent estimator of $\theta$.
3. If $\hat\theta_n$ is an estimator of $\theta$ with $\lim_{n\to\infty} \mathrm{bias}(\hat\theta_n) = 0$ and $\lim_{n\to\infty} \mathrm{Var}(\hat\theta_n) = 0$, then $\hat\theta_n$ is a consistent estimator of $\theta$.
Example 3.13
Consider the three estimators of $\theta$ in the statistical model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} U(0, \theta)$ from Example 3.11:
$$\mathrm{MSE}(\hat\theta_1) = \frac{\theta^2}{3n}, \qquad \mathrm{MSE}(\hat\theta_2) = \frac{2\theta^2}{(n+1)(n+2)}, \qquad \mathrm{MSE}(\hat\theta_3) = \frac{\theta^2}{n(n+2)}.$$
Since these MSEs all converge to zero as $n$ tends to infinity, all three estimators are consistent.
Example 3.14
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample drawn from a population with finite mean $\mu$ and finite variance $\sigma^2$. Then by the law of large numbers we have
$$\bar{X} \overset{P}{\to} \mu \quad \text{and} \quad S \overset{P}{\to} \sigma,$$
i.e. the sample mean and the sample standard deviation are consistent estimators of $\mu$ and $\sigma$ respectively.
Now suppose that we randomly draw a point uniformly from 0 to 1 and denote it as $U$, i.e. $U \sim U(0,1)$. Then consider the following estimator of $\mu$:
$$\hat\mu' = \begin{cases} n & \text{if } U < 1/n, \\ \bar{X} & \text{otherwise.} \end{cases}$$
Its expectation is
$$E(\hat\mu') = n\cdot\frac{1}{n} + E(\bar{X})\left(1 - \frac{1}{n}\right) = 1 + \mu\left(1 - \frac{1}{n}\right),$$
so that
$$\lim_{n\to\infty} \mathrm{bias}(\hat\mu') = \lim_{n\to\infty}\left(1 - \frac{\mu}{n}\right) = 1 \neq 0,$$
i.e. $\hat\mu'$ is not asymptotically unbiased. Nevertheless, for any $\varepsilon > 0$,
$$P\!\left(|\hat\mu' - \mu| < \varepsilon\right) \ge P\!\left(U \ge \frac{1}{n},\; |\bar{X} - \mu| < \varepsilon\right) = \left(1 - \frac{1}{n}\right)P\!\left(|\bar{X} - \mu| < \varepsilon\right) \to 1$$
as $n \to \infty$, so $\hat\mu'$ is a consistent estimator of $\mu$ even though it is not asymptotically unbiased.
Method of Moment
Equate the population moments of the random variable to the corresponding sample moments and then solve the resulting equations for the parameters:
$$r\text{-th population moment: } \mu_r = E(X^r); \qquad r\text{-th sample moment: } m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r.$$
Example 3.15
In Example 3.7, $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} U(0, \theta)$, we have $E(X) = \theta/2$. The MME of $\theta$ can be obtained by solving the equation
$$\bar{X} = \frac{\hat\theta}{2},$$
which gives $\hat\theta_{MM} = 2\bar{X}$.
Example 3.16
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from the $\mathrm{Beta}(\alpha, \beta)$ distribution with both parameters unknown.
Since there are two unknown parameters, we will need two equations based on the first two moments:
$$E(X) = \frac{\alpha}{\alpha+\beta}, \qquad E(X^2) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.$$
The MMEs of $\alpha$ and $\beta$ can be obtained by solving
$$m_1 = \bar{X} = \frac{\hat\alpha}{\hat\alpha+\hat\beta}, \qquad m_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 = \frac{\hat\alpha(\hat\alpha+1)}{(\hat\alpha+\hat\beta)(\hat\alpha+\hat\beta+1)},$$
which gives
$$\hat\alpha = \frac{m_1(m_1 - m_2)}{m_2 - m_1^2}, \qquad \hat\beta = \frac{(1 - m_1)(m_1 - m_2)}{m_2 - m_1^2}.$$
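A small sketch of this moment matching in Python; the true $\alpha$, $\beta$ and the sample size are invented purely to generate a test sample.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha_true, beta_true, n = 2.0, 5.0, 10_000   # illustrative values only

x = rng.beta(alpha_true, beta_true, size=n)
m1, m2 = x.mean(), np.mean(x ** 2)

alpha_hat = m1 * (m1 - m2) / (m2 - m1 ** 2)
beta_hat  = (1 - m1) * (m1 - m2) / (m2 - m1 ** 2)
print("MME:", alpha_hat, beta_hat, " true:", alpha_true, beta_true)
```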
Example 3.17
Using the first moment $E(X) = 1/\lambda$ of the exponential model, the MME of $\lambda$ can be easily obtained as $\hat\lambda = 1/\bar{X}$. Similarly, the MME of $e^{-\lambda}$ is given by $\widehat{e^{-\lambda}} = e^{-1/\bar{X}}$.
Remarks
1. Method of moment estimators may not be unbiased, even when the sample size is large.
2. Method of moment estimators may not have the smallest variance, even when the sample size is large.
3. Although a method of moment estimator may not be the best, it is easy to find, and therefore it can be used as a first guess of the parameter values.
Definition
Given the observed data $\mathbf{x}$, the likelihood function $L(\boldsymbol\theta; \mathbf{x})$ is the joint pmf (or pdf) $f(\mathbf{x}; \boldsymbol\theta)$ regarded as a function of $\boldsymbol\theta$, and the log-likelihood function is
$$l(\boldsymbol\theta; \mathbf{x}) = \log L(\boldsymbol\theta; \mathbf{x}).$$
In particular, if $X_1, X_2, \ldots, X_n$ is a random sample (iid) with common pmf or pdf $f(x; \boldsymbol\theta)$, then
$$L(\boldsymbol\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \boldsymbol\theta) \quad \text{and} \quad l(\boldsymbol\theta; \mathbf{x}) = \sum_{i=1}^{n} \log f(x_i; \boldsymbol\theta).$$
Maximum Likelihood
The maximum likelihood estimator (MLE) $\hat{\boldsymbol\theta}_{ML}$ is the value of $\boldsymbol\theta$ in the parameter space that maximizes the likelihood function $L(\boldsymbol\theta; \mathbf{X})$. Since the logarithm function is strictly increasing, we can also obtain the MLE by maximizing the log-likelihood function.
Example 3.18
Consider the Poisson model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Poisson}(\lambda)$. The likelihood function is
$$L(\lambda; \mathbf{x}) = \prod_{i=1}^{n} p(x_i; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \left(\prod_{i=1}^{n} \frac{1}{x_i!}\right)\lambda^{\sum_{i=1}^{n} x_i}e^{-n\lambda},$$
so the log-likelihood is
$$l(\lambda; \mathbf{x}) = -\sum_{i=1}^{n}\log(x_i!) + \left(\sum_{i=1}^{n} x_i\right)\log\lambda - n\lambda.$$
Setting the derivative to zero,
$$\frac{d}{d\lambda} l(\lambda; \mathbf{x}) = \frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0 \quad\Longrightarrow\quad \hat\lambda = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},$$
so the MLE of $\lambda$ is $\hat\lambda_{ML} = \bar{X}$.
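As a quick sanity check, the log-likelihood can be maximized numerically and compared with the closed form $\bar{x}$; the data below are simulated under an arbitrary $\lambda$ chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(5)
x = rng.poisson(3.7, size=200)       # illustrative data, lambda = 3.7 assumed

def neg_loglik(lam):
    # -l(lambda; x) with log(x_i!) written via gammaln
    return -(np.sum(x) * np.log(lam) - len(x) * lam - np.sum(gammaln(x + 1)))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print("numerical MLE:", res.x, "  closed form x-bar:", x.mean())
```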
Example 3.19
Consider the statistical model in Example 3.7: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} U(0, \theta)$, $\Theta = (0, \infty)$. Since the pdf of $U(0,\theta)$ is
$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 \le x \le \theta, \\ 0 & \text{otherwise,} \end{cases}$$
the likelihood function is
$$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{for } 0 \le x_1 \le \theta,\; 0 \le x_2 \le \theta,\; \ldots,\; 0 \le x_n \le \theta, \\ 0 & \text{otherwise,} \end{cases}$$
or simply,
$$L(\theta; \mathbf{x}) = \begin{cases} \dfrac{1}{\theta^n} & \text{for } \theta \ge \max(x_1, x_2, \ldots, x_n), \\ 0 & \text{otherwise.} \end{cases}$$
Since $1/\theta^n$ is decreasing in $\theta$, the likelihood is maximized at the smallest admissible value of $\theta$, and hence the MLE is $\hat\theta_{ML} = \max(X_1, X_2, \ldots, X_n) = X_{(n)}$.
Example 3.20
Consider the normal model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ with $\boldsymbol\theta = (\mu, \sigma)$ unknown. The likelihood function is
$$L(\mu, \sigma; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \mu, \sigma) = \left(2\pi\sigma^2\right)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right),$$
so the log-likelihood is
$$l(\mu, \sigma; \mathbf{x}) = \log L(\mu, \sigma; \mathbf{x}) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
Since
$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \ge \sum_{i=1}^{n}(x_i - \bar{x})^2,$$
for every fixed $\sigma$ the log-likelihood is maximized over $\mu$ at $\mu = \bar{x}$. Hence consider
$$l(\bar{x}, \sigma; \mathbf{x}) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
Setting the derivative with respect to $\sigma$ to zero,
$$\frac{d}{d\sigma} l(\bar{x}, \sigma; \mathbf{x}) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 0 \quad\Longrightarrow\quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
It can be easily verified by the second derivative test that $l(\bar{x}, \sigma; \mathbf{x})$ achieves its maximum at $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Therefore the MLE of $\boldsymbol\theta = (\mu, \sigma)$ is given by
$$\hat{\boldsymbol\theta}_{ML} = \left(\bar{X},\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\right) = \left(\bar{X},\; \sqrt{\frac{n-1}{n}}\,S\right).$$
In particular, $\hat\mu_{ML} = \bar{X}$ and $\hat\sigma_{ML} = \sqrt{\frac{n-1}{n}}\,S$, which are quite close to the usual estimators.
Example 3.21
Suppose that we draw a random sample from the Gamma distribution, i.e. we have the statistical model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Gamma}(\alpha, \beta)$, $\boldsymbol\theta = (\alpha, \beta) \in (0,\infty)\times(0,\infty)$. The likelihood function is
$$L(\alpha, \beta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \alpha, \beta) = \prod_{i=1}^{n} \frac{\beta^\alpha}{\Gamma(\alpha)} x_i^{\alpha-1}e^{-\beta x_i} = \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left(\prod_{i=1}^{n} x_i\right)^{\alpha-1}e^{-\beta\sum_{i=1}^{n} x_i}.$$
Setting the partial derivatives of the log-likelihood to zero,
$$\frac{\partial l}{\partial\alpha} = 0 \quad\Longrightarrow\quad n\log\beta - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \sum_{i=1}^{n}\log x_i = 0,$$
$$\frac{\partial l}{\partial\beta} = 0 \quad\Longrightarrow\quad \frac{n\alpha}{\beta} - \sum_{i=1}^{n} x_i = 0 \quad\Longrightarrow\quad \beta = \frac{\alpha}{\bar{x}}.$$
There is, however, no closed-form solution to these equations. Numerical methods such as Newton's method will be needed to find the MLEs of $\alpha$ and $\beta$.
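As an illustration of such a numerical solution, the sketch below maximizes the Gamma log-likelihood with a general-purpose optimizer rather than Newton's method; the true parameter values and sample size are arbitrary choices made only for the demonstration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(6)
x = rng.gamma(shape=2.5, scale=1 / 1.5, size=500)   # alpha = 2.5, beta = 1.5 assumed

def neg_loglik(params):
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf                # keep the search inside the parameter space
    n = len(x)
    return -(n * a * np.log(b) - n * gammaln(a)
             + (a - 1) * np.sum(np.log(x)) - b * np.sum(x))

res = minimize(neg_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
print("MLE (alpha, beta):", res.x)
```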
Under some regularity conditions, the MLE has the following nice properties, which justify its popularity across a wide range of statistical models.
1. Asymptotically Unbiased
In small samples, the MLE $\hat\theta_{ML}$ may be biased. However, the bias converges to zero as $n \to \infty$, i.e. $\hat\theta_{ML}$ is approximately unbiased in large samples.
2. Asymptotically Efficient
The MLE $\hat\theta_{ML}$ is the best unbiased estimator in large samples, i.e. it has the smallest variance among all unbiased estimators as $n \to \infty$.
3. Consistency
The MLE $\hat\theta_{ML}$ converges in probability to the true parameter value $\theta$ as $n \to \infty$.
4. Asymptotic Normality
The MLE $\hat\theta_{ML}$ is approximately normally distributed when the sample size is large.
Example 3.22
Consider the exponential model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda)$. The likelihood function is
$$L(\lambda; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_{i=1}^{n} x_i}.$$
Maximizing the log-likelihood $l(\lambda; \mathbf{x}) = n\log\lambda - \lambda\sum_{i=1}^{n} x_i$ gives $\hat\lambda_{ML} = 1/\bar{X}$.
In this section, we first consider only statistical models with a one-dimensional parameter space. Generalization to a $p$-dimensional parameter vector $\boldsymbol\theta$ requires the use of matrix algebra and multivariable calculus and will be briefly described later. Details will be taught in an advanced statistical inference course.
Definition
Let $\mathbf{X}$ be a random vector with likelihood function $L(\theta; \mathbf{X})$. The score function is defined as the derivative of the log-likelihood function:
$$S(\mathbf{X}; \theta) = \frac{d}{d\theta}\log L(\theta; \mathbf{X}),$$
and the variance of the score function is called the Fisher information:
$$I(\theta) = \mathrm{Var}\!\left(S(\mathbf{X}; \theta)\right).$$
Remarks
1. The score function measures how sensitively the likelihood function $L(\theta; \mathbf{X})$ depends on the parameter $\theta$, while the Fisher information measures the amount of information that the observed $\mathbf{X}$ carries about the unknown parameter $\theta$.
2. Under regularity conditions, the Fisher information can also be computed as
$$I(\theta) = E\!\left[S(\mathbf{X}; \theta)^2\right] = -E\!\left[\frac{d^2}{d\theta^2}\log L(\theta; \mathbf{X})\right].$$
The proof is beyond the scope of this course and a mathematical justification is given in the supplementary notes.
Example 3.23
Consider the Poisson model in Example 3.18. The likelihood function is
$$L(\lambda; \mathbf{x}) = \prod_{i=1}^{n} p(x_i; \lambda) = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)\lambda^{\sum_{i=1}^{n} x_i} e^{-n\lambda},$$
so
$$\frac{d}{d\lambda}\log L(\lambda; \mathbf{X}) = -n + \frac{1}{\lambda}\sum_{i=1}^{n} X_i, \qquad \frac{d^2}{d\lambda^2}\log L(\lambda; \mathbf{X}) = -\frac{1}{\lambda^2}\sum_{i=1}^{n} X_i.$$
Hence the Fisher information is
$$I(\lambda) = -E\!\left[\frac{d^2}{d\lambda^2}\log L(\lambda; \mathbf{X})\right] = \frac{1}{\lambda^2}E\!\left[\sum_{i=1}^{n} X_i\right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}.$$
Example 3.24
Consider the exponential model in Example 3.22. The likelihood function is
$$L(\lambda; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \lambda) = \lambda^n e^{-\lambda\sum_{i=1}^{n} x_i},$$
so
$$\frac{d}{d\lambda}\log L(\lambda; \mathbf{X}) = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i, \qquad \frac{d^2}{d\lambda^2}\log L(\lambda; \mathbf{X}) = -\frac{n}{\lambda^2}.$$
Hence the Fisher information is
$$I(\lambda) = -E\!\left[\frac{d^2}{d\lambda^2}\log L(\lambda; \mathbf{X})\right] = \frac{n}{\lambda^2}.$$
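The defining relation $I(\lambda) = \mathrm{Var}(S(\mathbf{X};\lambda))$ can be illustrated by simulation for this exponential model; the $\lambda$, $n$ and replication count below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 2.0, 50, 200_000      # illustrative values only

x = rng.exponential(scale=1 / lam, size=(reps, n))
score = n / lam - x.sum(axis=1)      # S(X; lambda) = n/lambda - sum(X_i)

print("Var(score):", score.var(), "  n/lambda^2 =", n / lam**2)
```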
Cramér-Rao Inequality
Let $\hat\theta$ be an unbiased estimator of $\tau(\theta)$, a differentiable function of the parameter $\theta$. Then under regularity conditions,
$$\mathrm{Var}(\hat\theta) \ge \frac{[\tau'(\theta)]^2}{I(\theta)}.$$
The quantity $[\tau'(\theta)]^2 / I(\theta)$ gives a lower bound to the variance of all unbiased estimators of $\tau(\theta)$ and is called the Cramér-Rao Lower Bound.
Moreover, under the same regularity conditions, the MLE $\hat\tau_{ML} = \tau(\hat\theta_{ML})$ of $\tau(\theta)$ satisfies
$$\hat\tau_{ML} - \tau(\theta) \overset{L}{\to} N\!\left(0, \frac{[\tau'(\theta)]^2}{I(\theta)}\right) \quad \text{as } n \to \infty,$$
or equivalently,
$$\sqrt{\frac{I(\theta)}{[\tau'(\theta)]^2}}\left(\hat\tau_{ML} - \tau(\theta)\right) \overset{L}{\to} N(0, 1).$$
Example 3.25
From Example 3.23, the Fisher information for the Poisson model was determined as $I(\lambda) = n/\lambda$. Taking $\tau(\lambda) = \lambda$, the Cramér-Rao Lower Bound is
$$\mathrm{Var}(\hat\lambda) \ge \frac{[\tau'(\lambda)]^2}{I(\lambda)} = \frac{1}{n/\lambda} = \frac{\lambda}{n}.$$
Therefore for any unbiased estimator $\hat\lambda$ of $\lambda$, we have $\mathrm{Var}(\hat\lambda) \ge \lambda/n$.
Since $\bar{X}$ is unbiased for $\lambda$ and $\mathrm{Var}(\bar{X}) = \lambda/n$, the variance of $\bar{X}$ attains the Cramér-Rao Lower Bound and hence
$$\mathrm{Var}(\hat\lambda) \ge \mathrm{Var}(\bar{X})$$
for any unbiased estimator $\hat\lambda$, i.e. $\bar{X}$ has the minimum variance among all unbiased estimators. The sample mean is therefore the best unbiased estimator, or the Uniformly Minimum-Variance Unbiased Estimator (UMVUE), of $\lambda$.
Moreover,
$$\sqrt{I(\lambda)}\left(\bar{X} - \lambda\right) = \sqrt{\frac{n}{\lambda}}\left(\bar{X} - \lambda\right) \overset{L}{\to} N(0, 1) \quad \text{as } n \to \infty,$$
i.e. $\bar{X}$ is approximately distributed as $N(\lambda, \lambda/n)$ when the sample size $n$ is large. This is exactly the same result obtained from the central limit theorem.
Similarly, for $\tau(\lambda) = \lambda^2$ the MLE is $\hat\tau_{ML} = \bar{X}^2$, with $[\tau'(\lambda)]^2/I(\lambda) = (2\lambda)^2\lambda/n = 4\lambda^3/n$, and approximately $\bar{X}^2 \sim N\!\left(\lambda^2, 4\lambda^3/n\right)$.
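A short simulation check of these variances; $\lambda$, $n$ and the replication count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, n, reps = 4.0, 100, 200_000     # illustrative values only

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print("Var(Xbar)  :", xbar.var(),      "  CRLB lambda/n =", lam / n)
print("Var(Xbar^2):", (xbar**2).var(), "  approx 4*lambda^3/n =", 4 * lam**3 / n)
```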
Example 3.26
From Example 3.24, the Fisher information for the exponential model was determined as $I(\lambda) = n/\lambda^2$. Taking $\tau(\lambda) = \lambda$, the Cramér-Rao Lower Bound is
$$\mathrm{Var}(\hat\lambda) \ge \frac{[\tau'(\lambda)]^2}{I(\lambda)} = \frac{1}{n/\lambda^2} = \frac{\lambda^2}{n}$$
for any unbiased estimator $\hat\lambda$ of $\lambda$. Moreover,
$$\sqrt{I(\lambda)}\left(\hat\lambda_{ML} - \lambda\right) = \frac{\sqrt{n}}{\lambda}\left(\hat\lambda_{ML} - \lambda\right) \overset{L}{\to} N(0, 1) \quad \text{as } n \to \infty,$$
i.e. $\hat\lambda_{ML} = 1/\bar{X}$ is approximately distributed as $N(\lambda, \lambda^2/n)$ when the sample size $n$ is large.
Similarly, the MLE of $e^{-\lambda}$ is $e^{-1/\bar{X}}$, with $[\tau'(\lambda)]^2/I(\lambda) = \lambda^2 e^{-2\lambda}/n$ for $\tau(\lambda) = e^{-\lambda}$, and approximately
$$e^{-1/\bar{X}} \sim N\!\left(e^{-\lambda}, \frac{\lambda^2 e^{-2\lambda}}{n}\right).$$
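The asymptotic approximation for $e^{-1/\bar{X}}$ can be checked numerically; $\lambda$, $n$ and the number of replications below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(9)
lam, n, reps = 1.5, 200, 100_000     # illustrative values only

xbar = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
est = np.exp(-1 / xbar)              # MLE of exp(-lambda)

print("mean of estimator:", est.mean(), "  target exp(-lambda) =", np.exp(-lam))
print("var  of estimator:", est.var(),
      "  approx lambda^2 exp(-2 lambda)/n =", lam**2 * np.exp(-2 * lam) / n)
```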
Remarks
2. The asymptotic distribution of the MLE has mean equal to the parameter and variance equal to the Cramér-Rao Lower Bound. Therefore the MLE is asymptotically unbiased and efficient.
3. The "regularity conditions" for the asymptotic normality of the MLE include:
- The support of the data distribution cannot depend on the parameter. For example, $U(0, \theta)$ violates this condition.
- The MLE cannot lie on the boundary of the set of possible parameters. For example, this condition is violated for the normal model $N(\mu, \sigma^2)$ with $\mu \ge 0$, as the MLE of $\mu$ would be obtained as $\hat\mu = \bar{X}$ if $\bar{X} > 0$ and $\hat\mu = 0$ if $\bar{X} \le 0$.
- The number of nuisance parameters cannot increase with the sample size. Nuisance parameters are parameters other than the one being estimated. For example, this condition is violated if we are estimating $\mu$ based on the model $X_i \sim N(\mu, \sigma_i^2)$, $i = 1, 2, \ldots, n$, as the number of variances $\sigma_i^2$ increases with the sample size.
- Both $\frac{d}{d\theta}\log L(\theta; \mathbf{X})$ and $\frac{d^2}{d\theta^2}\log L(\theta; \mathbf{X})$ exist.
- $I(\theta)$ is continuous.
For a $p$-dimensional parameter vector $\boldsymbol\theta$, the Fisher information becomes the matrix
$$I(\boldsymbol\theta) = -E\!\left[\frac{\partial^2}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\log L(\boldsymbol\theta; \mathbf{X})\right],$$
where the $(i,j)$-th element is $-E\!\left[\dfrac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log L(\boldsymbol\theta; \mathbf{X})\right]$. For an unbiased estimator $\hat\theta$ of a function $\tau(\boldsymbol\theta)$, the Cramér-Rao Lower Bound becomes
$$\mathrm{Var}(\hat\theta) \ge \left(\frac{\partial\tau(\boldsymbol\theta)}{\partial\boldsymbol\theta}\right)' I(\boldsymbol\theta)^{-1}\left(\frac{\partial\tau(\boldsymbol\theta)}{\partial\boldsymbol\theta}\right),$$
where $\dfrac{\partial\tau(\boldsymbol\theta)}{\partial\boldsymbol\theta} = \left(\dfrac{\partial\tau(\boldsymbol\theta)}{\partial\theta_1}, \dfrac{\partial\tau(\boldsymbol\theta)}{\partial\theta_2}, \ldots, \dfrac{\partial\tau(\boldsymbol\theta)}{\partial\theta_p}\right)'$.
Example 3.27
Consider the normal model: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $\boldsymbol\theta = (\mu, \sigma)$, $\Theta = \{(\mu, \sigma) : \mu \in \mathbb{R},\ \sigma > 0\}$. The log-likelihood is
$$\log L(\boldsymbol\theta; \mathbf{X}) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2.$$
The first-order partial derivatives are
$$\frac{\partial}{\partial\mu}\log L(\boldsymbol\theta; \mathbf{X}) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu), \qquad \frac{\partial}{\partial\sigma}\log L(\boldsymbol\theta; \mathbf{X}) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(X_i - \mu)^2,$$
and the second-order partial derivatives satisfy
$$\frac{\partial^2}{\partial\mu^2}\log L(\boldsymbol\theta; \mathbf{X}) = -\frac{n}{\sigma^2} \quad\Longrightarrow\quad -E\!\left[\frac{\partial^2}{\partial\mu^2}\log L(\boldsymbol\theta; \mathbf{X})\right] = \frac{n}{\sigma^2},$$
$$\frac{\partial^2}{\partial\mu\,\partial\sigma}\log L(\boldsymbol\theta; \mathbf{X}) = -\frac{2}{\sigma^3}\sum_{i=1}^{n}(X_i - \mu) \quad\Longrightarrow\quad -E\!\left[\frac{\partial^2}{\partial\mu\,\partial\sigma}\log L(\boldsymbol\theta; \mathbf{X})\right] = 0,$$
$$\frac{\partial^2}{\partial\sigma^2}\log L(\boldsymbol\theta; \mathbf{X}) = \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n}(X_i - \mu)^2 \quad\Longrightarrow\quad -E\!\left[\frac{\partial^2}{\partial\sigma^2}\log L(\boldsymbol\theta; \mathbf{X})\right] = \frac{2n}{\sigma^2}.$$
Hence the Fisher information matrix is
$$I(\boldsymbol\theta) = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{2n}{\sigma^2} \end{pmatrix} = \frac{n}{\sigma^2}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},$$
with inverse given by
$$I(\boldsymbol\theta)^{-1} = \frac{\sigma^2}{2n}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.$$
For the parameter $\mu$, the Cramér-Rao Lower Bound for any unbiased estimator $\hat\mu$ is
$$\mathrm{Var}(\hat\mu) \ge (1, 0)\,\frac{\sigma^2}{2n}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\sigma^2}{n}.$$
Since the sample mean $\bar{X}$ is unbiased for $\mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$, it is an efficient estimator of $\mu$, and hence the UMVUE of $\mu$.
On the other hand, for the parameter $\sigma^2$, the Cramér-Rao Lower Bound for any unbiased estimator $\hat\sigma^2$ is
$$\mathrm{Var}(\hat\sigma^2) \ge (0, 2\sigma)\,\frac{\sigma^2}{2n}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 2\sigma \end{pmatrix} = \frac{2\sigma^4}{n}.$$
Finally, for the coefficient of variation $\tau(\mu, \sigma) = \sigma/\mu$, the Cramér-Rao Lower Bound for any unbiased estimator $\hat\tau$ is given by
$$\mathrm{Var}(\hat\tau) \ge \left(-\frac{\sigma}{\mu^2}, \frac{1}{\mu}\right)\frac{\sigma^2}{2n}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -\sigma/\mu^2 \\ 1/\mu \end{pmatrix} = \frac{\sigma^2}{2n}\left(\frac{2\sigma^2}{\mu^4} + \frac{1}{\mu^2}\right).$$
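A small numeric sketch of the matrix computation above; the $\mu$, $\sigma$ and $n$ values are arbitrary illustrative assumptions.

```python
import numpy as np

mu, sigma, n = 2.0, 0.5, 50          # illustrative values only

info = (n / sigma**2) * np.diag([1.0, 2.0])   # Fisher information matrix
info_inv = np.linalg.inv(info)

grad_cv = np.array([-sigma / mu**2, 1 / mu])  # gradient of tau = sigma/mu w.r.t. (mu, sigma)
crlb_cv = grad_cv @ info_inv @ grad_cv

print("CRLB for sigma/mu:", crlb_cv,
      "  formula:", sigma**2 / (2 * n) * (2 * sigma**2 / mu**4 + 1 / mu**2))
```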
Definition
A statistic $T = T(\mathbf{X})$ is said to be sufficient for $\boldsymbol\theta$ if the conditional distribution of $\mathbf{X}$ given $T$ does not depend on $\boldsymbol\theta$.
Suppose we know only the value of a sufficient statistic $T$ but not the complete data $\mathbf{X}$. According to the definition, the conditional distribution of $\mathbf{X}$ given $T$ does not depend on $\boldsymbol\theta$. Therefore we can use this conditional distribution to generate another set of data $\mathbf{X}^*$, without knowing the value of $\boldsymbol\theta$. This generated data $\mathbf{X}^*$ should have the same distribution as the original data $\mathbf{X}$. If we apply the same statistical method to make inference about $\boldsymbol\theta$, the performance would be the same no matter whether we apply it to $\mathbf{X}$ or $\mathbf{X}^*$. Therefore knowing the value of $T$ is as good as having the complete data $\mathbf{X}$, i.e. $T$ contains all the information about $\boldsymbol\theta$ and is sufficient for $\boldsymbol\theta$.
Example 3.28
Consider a sequence of Bernoulli trials $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(p)$, $p \in (0, 1)$.
Let $T = \sum_{i=1}^{n} X_i$. Then $T$ is simply the count of the number of successes out of the $n$ trials, and $T \sim \mathrm{Binomial}(n, p)$.
The conditional pmf of $\mathbf{X}$ given $T$ is
$$p(\mathbf{x} \mid t) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t) = \frac{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n,\ T = t)}{P(T = t)}.$$
For $\sum_{i=1}^{n} x_i = t$,
$$\begin{aligned}
p(\mathbf{x} \mid t) &= \frac{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)}{P(T = t)} = \frac{P(X_1 = x_1)\,P(X_2 = x_2)\cdots P(X_n = x_n)}{P(T = t)} \\
&= \frac{p^{x_1}(1-p)^{1-x_1}\,p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n}}{\binom{n}{t}p^t(1-p)^{n-t}} \\
&= \frac{p^{x_1+x_2+\cdots+x_n}(1-p)^{n-(x_1+x_2+\cdots+x_n)}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{p^t(1-p)^{n-t}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{1}{\binom{n}{t}}.
\end{aligned}$$
In summary,
$$p(\mathbf{x} \mid t) = \begin{cases} \dfrac{1}{\binom{n}{t}} & \text{if } x_1 + x_2 + \cdots + x_n = t, \\ 0 & \text{otherwise,} \end{cases}$$
which does not depend on the parameter $p$. Therefore $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$. Knowing the number of successes is already enough for us to make
inference about the success probability p. Once we have the number of successes,
we don’t need to know which of the Bernoulli trials are successes.
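The fact that the conditional distribution of the data given $T$ is free of $p$ can also be seen empirically: for two different values of $p$, the distribution of the data pattern among samples with the same $T$ is (up to simulation noise) identical. A rough sketch, with $n$, the two $p$ values and the conditioning value $t$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(10)
n, t, reps = 4, 2, 400_000           # illustrative values only

for p in (0.2, 0.7):
    x = rng.binomial(1, p, size=(reps, n))
    keep = x[x.sum(axis=1) == t]                 # condition on T = t
    first_two = np.mean((keep[:, 0] == 1) & (keep[:, 1] == 1))
    # P(X1=1, X2=1 | T=2) should be 1/C(4,2) = 1/6 regardless of p
    print(f"p = {p}: P(X1=1, X2=1 | T={t}) ~ {first_two:.4f}  (theory {1/6:.4f})")
```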
Example 3.29
Consider again the Bernoulli trials in Example 3.28 with $n = 3$. Let $T = \dfrac{X_1 + 2X_2 + 3X_3}{6}$ be a weighted mean of the sample, i.e. the third trial will be counted more than the first two trials. Consider the conditional probability
$$\begin{aligned}
P\!\left(X_1 = 1, X_2 = 1, X_3 = 0 \,\Big|\, T = \tfrac{1}{2}\right) &= \frac{P(X_1 = 1, X_2 = 1, X_3 = 0)}{P\!\left(T = \tfrac{1}{2}\right)} = \frac{P(X_1 = 1, X_2 = 1, X_3 = 0)}{P(X_1 + 2X_2 + 3X_3 = 3)} \\
&= \frac{P(X_1 = 1, X_2 = 1, X_3 = 0)}{P(X_1 = 1, X_2 = 1, X_3 = 0) + P(X_1 = 0, X_2 = 0, X_3 = 1)} \\
&= \frac{p^2(1-p)}{p^2(1-p) + (1-p)^2 p} = p,
\end{aligned}$$
which depends on $p$. Hence the weighted mean $T$ is not a sufficient statistic for $p$.
Remarks
Fisher-Neyman Criterion
For a statistical model with random vector $\mathbf{X}$ and parameter vector $\boldsymbol\theta$, the statistic $T = T(\mathbf{X})$ is sufficient for $\boldsymbol\theta$ if and only if the ratio
$$\frac{p(\mathbf{x}; \boldsymbol\theta)}{q(T(\mathbf{x}); \boldsymbol\theta)}$$
does not depend on $\boldsymbol\theta$, where $p(\mathbf{x}; \boldsymbol\theta)$ and $q(t; \boldsymbol\theta)$ are the pmfs (or pdfs) of $\mathbf{X}$ and $T = T(\mathbf{X})$, respectively.
Factorization Criterion
For a statistical model with random vector $\mathbf{X}$ and parameter vector $\boldsymbol\theta$, the statistic $T = T(\mathbf{X})$ is sufficient for $\boldsymbol\theta$ if and only if there exist functions $g(t; \boldsymbol\theta)$ and $h(\mathbf{x})$ such that the pmf (or pdf) $p(\mathbf{x}; \boldsymbol\theta)$ of $\mathbf{X}$ can be written as
$$p(\mathbf{x}; \boldsymbol\theta) = h(\mathbf{x})\,g\!\left(T(\mathbf{x}); \boldsymbol\theta\right),$$
i.e. the pmf (or pdf) of $\mathbf{X}$ can be "factorized", with the function involving $\boldsymbol\theta$ depending on $\mathbf{x}$ only through $T(\mathbf{x})$.
These criteria hold in both the discrete and continuous cases. However, the proofs for the continuous case need measure theory and are omitted. The proofs for the discrete case can be found in the supplementary notes.
Example 3.30
For the Bernoulli model of Example 3.28, the joint pmf is
$$p(\mathbf{x}; p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^{n} x_i}(1-p)^{n - \sum_{i=1}^{n} x_i},$$
which is of the form $h(\mathbf{x})\,g\!\left(\sum_{i=1}^{n} x_i;\, p\right)$ with $h(\mathbf{x}) = 1$. Therefore, based on the factorization criterion, $T = \sum_{i=1}^{n} X_i$ is sufficient for $p$.
Example 3.31
For the Gamma model of Example 3.21, the joint pdf is
$$f(\mathbf{x}; \boldsymbol\theta) = \prod_{i=1}^{n} \frac{\beta^\alpha}{\Gamma(\alpha)} x_i^{\alpha-1}e^{-\beta x_i} = \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left(\prod_{i=1}^{n} x_i\right)^{\alpha-1} e^{-\beta\sum_{i=1}^{n} x_i},$$
which depends on $\mathbf{x}$ only through $\left(\prod_{i=1}^{n} x_i,\ \sum_{i=1}^{n} x_i\right)$. Therefore $T = \left(\prod_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i\right)$ is sufficient for $\boldsymbol\theta = (\alpha, \beta)$.
Example 3.32
Consider the normal model $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ with $\boldsymbol\theta = (\mu, \sigma^2)$ unknown. The joint pdf is
$$f(\mathbf{x}; \boldsymbol\theta) = \left(2\pi\sigma^2\right)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right)\right),$$
which depends on $\mathbf{x}$ only through $\left(\sum_{i=1}^{n} x_i,\ \sum_{i=1}^{n} x_i^2\right)$, so $T = \left(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\right)$ is sufficient for $\boldsymbol\theta$. We may also write the pdf as
$$f(\mathbf{x}; \boldsymbol\theta) = \left(2\pi\sigma^2\right)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2\right)\right) = \left(2\pi\sigma^2\right)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\left((n-1)s^2 + n(\bar{x} - \mu)^2\right)\right),$$
so $\left(\bar{X}, S^2\right)$ is also a sufficient statistic for $\boldsymbol\theta$.
Example 3.33
Consider the normal model with known variance: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, 1)$. The joint pdf is
$$f(\mathbf{x}; \mu) = (2\pi)^{-n/2}\exp\!\left(-\frac{1}{2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right)\right) = \left[(2\pi)^{-n/2}\exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)\right]\exp\!\left(-\frac{n}{2}(\bar{x} - \mu)^2\right),$$
so by the factorization criterion $\bar{X}$ (equivalently $\sum_{i=1}^{n} X_i$) is a sufficient statistic for $\mu$.
Example 3.34
Putting $\sigma = 1$ into the expressions in Example 3.32, the pdf can be expressed as
$$f(\mathbf{x}; \mu) = (2\pi)^{-n/2}\exp\!\left(-\frac{1}{2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right)\right) = (2\pi)^{-n/2}\exp\!\left(-\frac{1}{2}\left((n-1)s^2 + n(\bar{x} - \mu)^2\right)\right).$$
The factor involving $s^2$ does not contain $\mu$ and so belongs to $h(\mathbf{x})$; the part involving $\mu$ depends on the data only through $\bar{x}$. Hence $\bar{X}$ alone is sufficient for $\mu$, while the sample variance $S^2$ is not a sufficient statistic for $\mu$.
The following theorem shows how we can possibly "improve" an estimator using a sufficient statistic.
Rao-Blackwell Theorem
Let $\hat\theta$ be an estimator of $\theta$ and let $T$ be a sufficient statistic for $\theta$. Then
$$\hat\theta^* = E(\hat\theta \mid T)$$
is an estimator of $\theta$ which has mean square error no greater than the mean square error of $\hat\theta$.
Proof
By the law of total expectation, $E(\hat\theta^*) = E\!\left[E(\hat\theta \mid T)\right] = E(\hat\theta)$. Hence $\hat\theta^*$ and $\hat\theta$ have the same expectation and the same bias. Consider the variance:
$$\mathrm{Var}(\hat\theta) = \mathrm{Var}\!\left(E(\hat\theta \mid T)\right) + E\!\left[\mathrm{Var}(\hat\theta \mid T)\right] = \mathrm{Var}(\hat\theta^*) + E\!\left[\mathrm{Var}(\hat\theta \mid T)\right] \ge \mathrm{Var}(\hat\theta^*).$$
As a result,
$$\mathrm{MSE}(\hat\theta^*) = \left[\mathrm{bias}(\hat\theta^*)\right]^2 + \mathrm{Var}(\hat\theta^*) \le \left[\mathrm{bias}(\hat\theta)\right]^2 + \mathrm{Var}(\hat\theta) = \mathrm{MSE}(\hat\theta).$$
Remarks
1. The theorem suggests that we can always try to improve an estimator $\hat\theta$ using a sufficient statistic $T$, so that $\hat\theta^* = E(\hat\theta \mid T)$ performs as well as, and possibly better than, the original $\hat\theta$. Such a process is often called Rao-Blackwellization.
2. Equality of the variances holds only when
$$E\!\left[\mathrm{Var}(\hat\theta \mid T)\right] = E\!\left[(\hat\theta - \hat\theta^*)^2\right] = 0,$$
i.e. if $\hat\theta^* = E(\hat\theta \mid T)$ has the same variance as the unbiased estimator $\hat\theta$, then $P(\hat\theta = \hat\theta^*) = 1$. This implies that a best unbiased estimator (if it exists) must be a function of the sufficient statistic.
Example 3.35
Consider the Poisson model: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Poisson}(\lambda)$, $\lambda \in (0, \infty)$, with joint pmf
$$p(\mathbf{x}; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)\lambda^{\sum_{i=1}^{n} x_i}e^{-n\lambda}.$$
Therefore $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\lambda$. Now suppose we want to find an unbiased estimator of $e^{-\lambda} = P(X_1 = 0)$. A natural candidate is $e^{-\bar{X}}$, but since $T \sim \mathrm{Poisson}(n\lambda)$,
$$E\!\left(e^{-\bar{X}}\right) = E\!\left(e^{-T/n}\right) = M_T\!\left(-\tfrac{1}{n}\right) = \exp\!\left(n\lambda\left(e^{-1/n} - 1\right)\right) \neq e^{-\lambda},$$
so $e^{-\bar{X}}$ is biased. Consider instead the simple unbiased estimator
$$\hat\theta = \begin{cases} 1 & \text{if } X_1 = 0, \\ 0 & \text{otherwise.} \end{cases}$$
Of course it is rather silly to do the estimation using only one out of $n$ observations. However, it is the starting point from which we can do the Rao-Blackwellization.
For $t \ge 0$,
$$\begin{aligned}
E(\hat\theta \mid T = t) &= P(\hat\theta = 1 \mid T = t) = P(X_1 = 0 \mid T = t) = \frac{P(X_1 = 0,\ T = t)}{P(T = t)} = \frac{P\!\left(X_1 = 0,\ \sum_{i=2}^{n} X_i = t\right)}{P(T = t)} \\
&= \frac{e^{-\lambda}\cdot e^{-(n-1)\lambda}\dfrac{[(n-1)\lambda]^t}{t!}}{e^{-n\lambda}\dfrac{(n\lambda)^t}{t!}} = \left(\frac{n-1}{n}\right)^t = \left(1 - \frac{1}{n}\right)^t.
\end{aligned}$$
Hence the Rao-Blackwellized estimator is $\hat\theta^* = E(\hat\theta \mid T) = \left(1 - \tfrac{1}{n}\right)^T$, which is unbiased for $e^{-\lambda}$ and is a function of the sufficient statistic $T$.
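A simulation sketch comparing the crude estimator $I(X_1 = 0)$ with its Rao-Blackwellized version $(1 - 1/n)^T$; $\lambda$, $n$ and the number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, n, reps = 1.0, 20, 300_000      # illustrative values only
target = np.exp(-lam)

x = rng.poisson(lam, size=(reps, n))
crude = (x[:, 0] == 0).astype(float)           # I(X1 = 0)
rao_b = (1 - 1 / n) ** x.sum(axis=1)           # (1 - 1/n)^T

for name, est in [("I(X1=0)  ", crude), ("(1-1/n)^T", rao_b)]:
    print(name, "mean:", est.mean(), " MSE:", np.mean((est - target) ** 2))
```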