
Chapter 7.

Statistical Estimation
7.6: Properties of Estimators I
(From “Probability & Statistics with Applications to Computing” by Alex Tsun)

Now that we have all these techniques for computing estimators, you might be wondering which one is the
“best”. Actually, a better question would be: how can we determine which estimator (rather than which
technique) is “better”? There are many more ways to estimate besides MLE/MoM/MAP, and in different
scenarios, different techniques may work better. In these notes, we will consider some properties of
estimators that allow us to compare their “goodness”.

7.6.1 Bias

The first estimator property we’ll cover is Bias. The bias of an estimator measures whether, in
expectation, the estimator equals the true parameter.

Definition 7.6.1: Bias

Let θ̂ be an estimator for θ. The bias of θ̂ as an estimator for θ is

Bias(θ̂, θ) = E[θ̂] − θ

If

• Bias(θ̂, θ) = 0, or equivalently E[θ̂] = θ, then we say θ̂ is an unbiased estimator of θ.

• Bias(θ̂, θ) > 0, then θ̂ typically overestimates θ.

• Bias(θ̂, θ) < 0, then θ̂ typically underestimates θ.
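When a closed-form expectation is hard to compute, the bias can also be approximated by simulation. Below is a minimal sketch of that idea (my own illustration, not from these notes; the helper name `empirical_bias` and the NumPy-based setup are assumptions), which estimates E[θ̂] − θ by averaging the estimator over many simulated samples.

```python
import numpy as np

def empirical_bias(estimator, draw_sample, theta, trials=100_000, seed=0):
    """Monte Carlo approximation of Bias(theta_hat, theta) = E[theta_hat] - theta.

    estimator:   maps a 1-D sample array to a single estimate
    draw_sample: maps an np.random.Generator to a fresh iid sample drawn with parameter theta
    """
    rng = np.random.default_rng(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(trials)]
    return float(np.mean(estimates)) - theta
```

Each of the examples below can be sanity-checked with this kind of simulation.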

Let’s go through some examples!

Example(s)

First, recall that if x1, ..., xn are iid realizations from Poi(θ), then the MLE and MoM estimators were both the
sample mean:

θ̂ = θ̂_MLE = θ̂_MoM = (1/n) ∑_{i=1}^n x_i

Show that θ̂ is an unbiased estimator of θ.


Solution

E[θ̂] = E[(1/n) ∑_{i=1}^n x_i]
      = (1/n) ∑_{i=1}^n E[x_i]        [linearity of expectation]
      = (1/n) ∑_{i=1}^n θ             [E[Poi(θ)] = θ]
      = (1/n) · nθ
      = θ

so Bias(θ̂, θ) = E[θ̂] − θ = 0 and θ̂ is unbiased.

This makes sense: the average of your samples should be “on-target” for the true average!
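As a quick numerical check, here is a hypothetical simulation (not part of the original notes; the choices θ = 3 and n = 10 are arbitrary) confirming that the sample mean of Poi(θ) draws is unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 3.0, 10, 200_000
# Each row is one iid sample of size n from Poi(theta); take the sample mean of each row.
sample_means = rng.poisson(theta, size=(trials, n)).mean(axis=1)
print(sample_means.mean() - theta)  # approximately 0, as the derivation above predicts
```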

Example(s)

First, recall that if x1, ..., xn are iid realizations from (continuous) Unif(0, θ), then

θ̂_MLE = x_max        θ̂_MoM = 2 · (1/n) ∑_{i=1}^n x_i

Sure, θ̂_MLE maximizes the likelihood, so in a way θ̂_MLE is better than θ̂_MoM. But what are the biases
of these estimators? Before doing any computation: do you think θ̂_MLE and θ̂_MoM are overestimates,
underestimates, or unbiased?

Solution I actually think θ̂_MoM is spot-on, since the average of the samples should be close to θ/2, and
multiplying by 2 would seem to give the true θ. On the other hand, θ̂_MLE might be a bit of an underestimate,
since the largest sample probably isn’t exactly θ (the true θ is probably a little larger than our maximum).

• Bias of the maximum likelihood estimator.


Recall from 5.10 that the density of the largest order statistic (i.e., the maximum of the sample) is

f_{X_max}(y) = n [F_X(y)]^{n−1} f_X(y) = n (y/θ)^{n−1} · (1/θ)

You could also instead first find the CDF of X_max as

F_{X_max}(y) = P(X_max ≤ y) = [P(X_i ≤ y)]^n = [F_X(y)]^n = (y/θ)^n

since the max is less than or equal to a value if and only if each of the samples is, and then take the derivative.
Using this density we can compute the expected value of θ̂_MLE as follows:

E[θ̂_MLE] = E[X_max] = ∫_0^θ y · n (y/θ)^{n−1} (1/θ) dy = (n/θ^n) ∫_0^θ y^n dy = (n/θ^n) [y^{n+1}/(n+1)]_0^θ = (n/(n+1)) θ

This makes sense: if I had 3 samples from Unif(0, 1), for example, I would expect them to land around
1/4, 2/4, 3/4, so my expected max would be 3/4 = n/(n+1). Similarly, if I had 4 samples, I would expect
them around 1/5, 2/5, 3/5, 4/5, and my expected max would again be n/(n+1) = 4/5.

Finally,

Bias(θ̂_MLE, θ) = E[θ̂_MLE] − θ = (n/(n+1)) θ − θ = −(1/(n+1)) θ

• Bias of the method of moments estimator.

E[θ̂_MoM] = E[2 · (1/n) ∑_{i=1}^n x_i] = (2/n) ∑_{i=1}^n E[x_i] = (2/n) · n · (θ/2) = θ        [E[Unif(0, θ)] = θ/2]

Bias(θ̂_MoM, θ) = E[θ̂_MoM] − θ = θ − θ = 0

• Analysis of Results
This means that θ̂_MLE typically underestimates θ and θ̂_MoM is an unbiased estimator of θ. But something
isn’t quite right...

Suppose the samples are x1 = 1, x2 = 9, x3 = 2. Then, we would have

θ̂_MLE = max{1, 9, 2} = 9        θ̂_MoM = (2/3)(1 + 9 + 2) = 8
However, based on our sample, the MoM estimate is impossible. If the true parameter were 8, then the
distribution we drew the sample from would be Unif(0, 8), in which case the likelihood of observing a 9 is 0.
But we did see a 9 in our sample. So, even though θ̂_MoM is unbiased, it can still yield an impossible
estimate. This just goes to show that finding the right estimator is actually quite tricky.

A good solution would be to “de-bias” the MLE by scaling it appropriately. If you decided to use a
new estimator based on the MLE,

θ̂ = ((n + 1)/n) · θ̂_MLE

you would now get an unbiased estimator that can never give an impossible estimate! But now it does not
maximize the likelihood anymore...
Actually, the MLE is what we call “asymptotically unbiased”, meaning unbiased in the limit. This is because

Bias(θ̂_MLE, θ) = −(1/(n + 1)) θ → 0

as n → ∞. So usually we might just leave it, because we can’t seem to win...
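A small simulation makes the comparison concrete. The sketch below is my own illustration (the values θ = 10 and n = 5 are arbitrary choices, not from the text): it estimates the expectations of θ̂_MLE, θ̂_MoM, and the de-biased MLE ((n+1)/n) · θ̂_MLE by averaging over many simulated samples.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, trials = 10.0, 5, 200_000
x = rng.uniform(0, theta, size=(trials, n))  # each row: one sample of size n from Unif(0, theta)

mle      = x.max(axis=1)          # theta_hat_MLE = x_max
mom      = 2 * x.mean(axis=1)     # theta_hat_MoM = 2 * sample mean
debiased = (n + 1) / n * mle      # scaled MLE from the discussion above

print(mle.mean())       # about n/(n+1) * theta = 8.33...: underestimates
print(mom.mean())       # about theta = 10: unbiased (but a single estimate can fall below x_max)
print(debiased.mean())  # about theta = 10: unbiased, and never below x_max
```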

Example(s)

Recall that if x1, ..., xn ∼ Exp(θ) are iid, our MLE and MoM estimators were both the inverse sample
mean:

θ̂ = θ̂_MLE = θ̂_MoM = 1/x̄ = n / ∑_{i=1}^n x_i

What can you say about the bias of this estimator?

Solution

E[θ̂] = E[n / ∑_{i=1}^n x_i]
      ≥ n / ∑_{i=1}^n E[x_i]        [Jensen’s inequality]
      = n / ∑_{i=1}^n (1/θ)         [E[Exp(θ)] = 1/θ]
      = n / (n/θ)
      = θ

The inequality comes from Jensen’s (section 6.3): since g(x1, ..., xn) = n / ∑_{i=1}^n x_i is convex (at least in the
positive orthant, where all x_i ≥ 0), we have that E[g(x1, ..., xn)] ≥ g(E[x1], E[x2], ..., E[xn]). It is convex
for a reason similar to why 1/x is a convex function. So E[θ̂] ≥ θ systematically, and we typically have an
overestimate.
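Here is an illustrative check of this overestimate (assumed setup, not from the notes; θ = 2 and n = 5 are arbitrary). Note that NumPy’s exponential sampler is parameterized by the scale 1/θ rather than the rate θ.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, trials = 2.0, 5, 200_000
x = rng.exponential(scale=1 / theta, size=(trials, n))  # rows are iid Exp(theta) samples
estimates = n / x.sum(axis=1)                           # theta_hat = 1 / sample mean
print(estimates.mean())  # noticeably larger than theta = 2 for this small n
```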

7.6.2 Variance and Mean Squared Error

We are often also interested in how much an estimator varies (we would like it to be unbiased and have small
variance, so that it is more accurate). One metric that captures this property is the estimator’s variance.
The variance of an estimator θ̂ is

Var(θ̂) = E[(θ̂ − E[θ̂])²]

This is just the definition of variance applied to the random variable θ̂, and isn’t actually a new definition.
But maybe instead of just computing the variance, we want a slightly different metric, one that measures the
squared difference of the estimator from the true parameter θ rather than from its own expectation:

E[(θ̂ − θ)²]

We call this quantity the mean squared error (MSE), and it is related to both bias and variance! Look
closely at the difference: if θ̂ is unbiased, then E[θ̂] = θ and the MSE and variance are actually equal!

Definition 7.6.2: Mean Squared Error

The mean squared error of an estimator θ̂ of θ is

MSE(θ̂, θ) = E[(θ̂ − θ)²]

If θ̂ is an unbiased estimator of θ (i.e., E[θ̂] = θ), then you can see that MSE(θ̂, θ) = Var(θ̂). In
fact, in general, MSE(θ̂, θ) = Var(θ̂) + Bias(θ̂, θ)².

This leads to what is known as the “Bias-Variance Tradeoff” in machine learning and statistics. Usually, we
want to minimize the MSE, and these two quantities are often inversely related: decreasing one leads to an
increase in the other, and finding the right balance minimizes the MSE. It’s hard to see why that might
be the case here, since we aren’t yet working with very complex estimators (we’re just learning the basics!).
Proof of Alternate MSE Formula. We will prove that MSE(θ̂, θ) = Var(θ̂) + Bias(θ̂, θ)².

MSE(θ̂, θ) = E[(θ̂ − θ)²]                                                         [def of MSE]
          = E[((θ̂ − E[θ̂]) + (E[θ̂] − θ))²]                                       [add and subtract E[θ̂]]
          = E[(θ̂ − E[θ̂])²] + 2 E[(θ̂ − E[θ̂])(E[θ̂] − θ)] + E[(E[θ̂] − θ)²]         [(a + b)² = a² + 2ab + b²]
          = Var(θ̂) + 0 + Bias(θ̂, θ)²                                            [def of var and bias; E[θ̂ − E[θ̂]] = 0 and E[θ̂] − θ is a constant]
          = Var(θ̂) + Bias(θ̂, θ)²

It is highly desirable that the MSE of an estimator is low! We want a small difference between θ̂ and θ. Use
the formula above to compute the MSE: Var(θ̂) is something we learned how to compute a long time ago, and
there are several examples of bias computations above.
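For instance, here is a sketch (my own, reusing the Unif(0, θ) MLE from earlier with arbitrary θ = 10 and n = 5, not from the text) that estimates the MSE, variance, and bias by simulation and checks the decomposition MSE = Var + Bias² numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, trials = 10.0, 5, 200_000
estimates = rng.uniform(0, theta, size=(trials, n)).max(axis=1)  # theta_hat_MLE = x_max

mse  = np.mean((estimates - theta) ** 2)  # empirical E[(theta_hat - theta)^2]
var  = estimates.var()                    # empirical Var(theta_hat)
bias = estimates.mean() - theta           # empirical Bias(theta_hat, theta)
print(mse, var + bias ** 2)  # the two numbers agree, illustrating MSE = Var + Bias^2
```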

Example(s)

First, recall that if x1, ..., xn are iid realizations from Poi(θ), then the MLE and MoM estimators were both the
sample mean:

θ̂ = θ̂_MLE = θ̂_MoM = (1/n) ∑_{i=1}^n x_i

Compute the MSE of θ̂ as an estimator of θ.

Solution To compute the MSE, let’s compute the bias and variance separately. Earlier, we showed that

Bias(θ̂, θ) = E[θ̂] − θ = θ − θ = 0

Now for the variance:

Var(θ̂) = Var((1/n) ∑_{i=1}^n x_i)
        = (1/n²) ∑_{i=1}^n Var(x_i)        [variance adds if independent; the constant 1/n comes out squared]
        = (1/n²) ∑_{i=1}^n θ               [Var(Poi(θ)) = θ]
        = (1/n²) · nθ
        = θ/n

Finally, using both of those results:

MSE(θ̂, θ) = Var(θ̂) + Bias(θ̂, θ)² = θ/n + 0² = θ/n
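A final numerical check (assumed setup, not from the notes; θ = 3 and n = 10 are arbitrary) of the result MSE(θ̂, θ) = θ/n:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, trials = 3.0, 10, 200_000
sample_means = rng.poisson(theta, size=(trials, n)).mean(axis=1)  # theta_hat for each trial
print(np.mean((sample_means - theta) ** 2), theta / n)  # both approximately 0.3
```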
