
Handout 14: Unbiasedness and MSE

We begin making the transition from the ideas and tools of probability
theory to the theory and methods of statistical inference. For simplicity,
suppose that we have the data $X_1, X_2, \ldots, X_n$ in hand, where the $X$'s
are assumed to be independent random variables with a common distribution
$F_\theta$ indexed by the parameter $\theta$, whose value is unspecified. In
practice, we would often be willing to assume that the distribution $F_\theta$ has
a density or probability mass function of some known form. For example,
physical measurements might be assumed to be random draws from
an $N(\mu, \sigma^2)$ distribution, counts might be assumed to be $\mathrm{Binomial}(n, p)$
or $\mathrm{Poisson}(\lambda)$ variables, and the failure times of engineered systems in a
life-testing experiment might be assumed to be $\Gamma(\alpha, \beta)$ variables. Note
that such assumptions do not specify the exact distribution that applies
to the experiment of interest, but rather specify only its “type.”

Suppose that a particular parametric model $F_\theta$ has been assumed
to apply to the available data $X_1, X_2, \ldots, X_n$. It is virtually always the
case in practice that the exact value of the parameter $\theta$ is not known. A
naïve but nonetheless useful way to think of statistical estimation is to
equate it with the process of guessing. The goal of statistical estimation
is to make an educated guess about the value of the unknown parameter
$\theta$. What makes one's guess “educated” is the fact that the estimate
of $\theta$ is informed by the data. A point estimator of $\theta$ is a fully specified
function of the data which, when the data is revealed, yields a numerical
guess at the value of $\theta$. The estimator will typically be represented by
the symbol $\hat{\theta}$, pronounced “theta hat,” whose dependence on the data
is reflected in the equation:

$$\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$$

Let us suppose, for example, that the experiment of interest involves
$n$ independent tosses of a bent coin, where $n$ is a known integer. The
parameter $p$, the probability of heads in a given coin toss, is treated
as an unknown constant. It seems quite reasonable to assume that the
experimental data we will observe is well described as a sequence of
iid Bernoulli trials. Most people would base their estimate of $p$ on the
variable $X$, the number of successes in $n$ iid Bernoulli trials; of course,
$X \sim \mathrm{Binomial}(n, p)$. The sample proportion of successes, $\hat{p} = X/n$, is a natural
estimator of $p$ and is, in fact, the estimator of $p$ that most people would
use.
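
As a concrete (hypothetical) illustration of this estimator, the following Python sketch simulates $n$ tosses of a bent coin and computes the sample proportion $\hat{p} = X/n$; the seed, the sample size, and the “true” value $p = 0.3$ are assumptions chosen only for the example, not part of the handout.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n = 100          # known number of tosses
p_true = 0.3     # hypothetical true probability of heads (unknown in practice)

# Simulate the n iid Bernoulli trials and count the successes X.
tosses = rng.binomial(1, p_true, size=n)
X = tosses.sum()

# The sample proportion is the natural point estimator of p.
p_hat = X / n
print(f"X = {X}, p_hat = {p_hat:.3f}")
```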

What makes a “good” estimator? Ideally, we would hope that it tends
to be close to the true parameter it is estimating. The following definition
formalizes this notion.

Definition 1 (Mean Squared Error (MSE)) The mean squared error
of an estimator $\hat{\theta}$ of the parameter $\theta$ is defined as:

$$MSE(\hat{\theta}) = E\left[\left(\hat{\theta} - \theta\right)^2\right]$$
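
As a hedged illustration connecting this definition to the coin example above, the sketch below approximates $MSE(\hat{p})$ by Monte Carlo, averaging $(\hat{p} - p)^2$ over many simulated experiments; the settings ($n = 100$, $p = 0.3$, 10,000 replications) are assumptions for the example, and the exact value $p(1-p)/n$ is printed only as a check.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n, p_true, n_reps = 100, 0.3, 10_000   # assumed settings for illustration

# Simulate X ~ Binomial(n, p) many times and form p_hat = X / n each time.
X = rng.binomial(n, p_true, size=n_reps)
p_hat = X / n

# Monte Carlo approximation of MSE(p_hat) = E[(p_hat - p)^2].
mse_mc = np.mean((p_hat - p_true) ** 2)
print(f"Monte Carlo MSE : {mse_mc:.5f}")
print(f"Exact p(1-p)/n  : {p_true * (1 - p_true) / n:.5f}")
```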

The MSE has a useful decomposition that we will often use in the
analyses of various estimators.

Theorem 1 (Decomposition of MSE) The mean squared error of $\hat{\theta}$
can be decomposed as follows:

$$MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2,$$

where

$$Bias(\hat{\theta}) = E(\hat{\theta} - \theta).$$

Proof. This result can be established with the following:

$$\hat{\theta} - \theta = (\hat{\theta} - E\hat{\theta}) + (E\hat{\theta} - \theta)$$

$$(\hat{\theta} - \theta)^2 = (\hat{\theta} - E\hat{\theta})^2 + (E\hat{\theta} - \theta)^2 + 2\,(\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta)$$

Taking expectations on both sides, notice that the cross-term drops out, since $E\hat{\theta} - \theta$ is a constant and $E(\hat{\theta} - E\hat{\theta}) = 0$:

$$E(\hat{\theta} - \theta)^2 = E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2$$

$$MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2,$$

giving the desired result. ■
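
To make the decomposition concrete, here is a small simulation sketch (an added illustration, not from the handout) that checks $MSE \approx Var + Bias^2$ for the deliberately biased shrinkage estimator $\tilde{p} = (X + 1)/(n + 2)$ of a Binomial proportion; the estimator, seed, and settings are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n, p_true, n_reps = 20, 0.3, 200_000   # assumed settings for illustration

# A deliberately biased "shrinkage" estimator of p, used only to exercise the identity.
X = rng.binomial(n, p_true, size=n_reps)
p_tilde = (X + 1) / (n + 2)

mse = np.mean((p_tilde - p_true) ** 2)   # Monte Carlo MSE
var = np.var(p_tilde)                    # Monte Carlo variance
bias = np.mean(p_tilde) - p_true         # Monte Carlo bias

# The two printed values agree, reflecting MSE = Var + Bias^2.
print(f"MSE          : {mse:.6f}")
print(f"Var + Bias^2 : {var + bias**2:.6f}")
```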

Notice that the MSE is, in general, a function of the actual value
of $\theta$. The decomposition leads to the following definition:

Definition 2 (Unbiasedness) An estimator $\hat{\theta}$ is unbiased for $\theta$ if
$Bias(\hat{\theta}) = 0$.

We close with a result that establishes the unbiasedness of the standard
estimators of the mean and variance of an arbitrary distribution.

Theorem 2 If $X_1, \ldots, X_n \overset{iid}{\sim} F$ for some distribution $F$ with finite mean
$\mu$ and variance $\sigma^2$, then

$$\bar{X} = \frac{1}{n}\sum_i X_i \qquad \text{and} \qquad s^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$$

are unbiased estimators of $\mu$ and $\sigma^2$, respectively.

Proof. The first result is a simple application of the linearity of the
expectation operator. The second comes from a derivation similar to
the decomposition of the MSE:

$$E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = E\left[\sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2\right]$$

$$= E\left[\sum_{i=1}^{n} (X_i - \mu)^2 + 2(\mu - \bar{X})\sum_{i=1}^{n} (X_i - \mu) + n(\bar{X} - \mu)^2\right]$$

$$= E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - 2n(\bar{X} - \mu)^2 + n(\bar{X} - \mu)^2\right]$$

$$= E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$$

$$= \sum_{i=1}^{n} E(X_i - \mu)^2 - n\,E(\bar{X} - \mu)^2$$

$$= \sum_{i=1}^{n} \sigma^2 - n\left(\frac{\sigma^2}{n}\right)$$

$$= (n - 1)\,\sigma^2,$$

where the third line uses $\sum_{i=1}^{n}(X_i - \mu) = n(\bar{X} - \mu)$ and the second-to-last line uses
$E(\bar{X} - \mu)^2 = Var(\bar{X}) = \sigma^2/n$. Dividing both sides by $(n - 1)$ finishes the proof. ■
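
As a brief sanity check of this theorem (an added sketch, not part of the handout), the Python code below draws many samples from an assumed $N(\mu, \sigma^2)$ distribution and compares the average of $\bar{X}$ and of $s^2$ (with the $n-1$ divisor) to the true values; the seed, $\mu$, $\sigma^2$, and sample sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

n, n_reps = 10, 100_000       # assumed sample size and number of replications
mu, sigma2 = 5.0, 4.0         # hypothetical true mean and variance

# Draw n_reps samples of size n and compute both estimators for each sample.
samples = rng.normal(mu, np.sqrt(sigma2), size=(n_reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # divides by n-1: unbiased for sigma^2
s2_biased = samples.var(axis=1, ddof=0)   # divides by n: biased downward

print(f"average of xbar      : {xbar.mean():.3f}   (mu = {mu})")
print(f"average of s^2       : {s2.mean():.3f}   (sigma^2 = {sigma2})")
print(f"average of biased s^2: {s2_biased.mean():.3f}   (expected (n-1)/n * sigma^2 = {(n - 1) / n * sigma2:.3f})")
```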
