02 Point Estimators

Properties of point estimators

Advanced Statistics II

Prof. Dr. Matei Demetrescu

Statistics and Econometrics (CAU Kiel) Summer 2021 1 / 33


Today’s outline

Properties of point estimators

1 Estimators and their risk

2 Finite-sample properties

3 Large-sample properties

4 Up next



Estimators and their risk




Two flavours of estimation


Estimation is concerned with estimating θ or q(θ) using the
outcome of a random sample X.

Point estimation: The outcome of some statistic, say t(X1, ..., Xn),
is taken as an estimate of the unknown θ0 or q(θ0).

Interval estimation: For a scalar target, define two statistics, say
t1(X1, ..., Xn) and t2(X1, ..., Xn), so that

[t1(X1, ..., Xn), t2(X1, ..., Xn)]

is an interval for which we control the probability that the unknown
θ0 or q(θ0) lies between t1 and t2 (coverage probability).
(For parameter vectors: confidence regions.)

We focus on point estimation in AS II.


Just another statistic...

Definition
A statistic, T = t(X ), whose outcomes are used to estimate the value of a
scalar or vector function, q(θ), of the parameter vector, θ, is called a
point estimator.
An observed outcome of an estimator is called a point estimate.

There are many possibilities for choosing t;

... but how do we select among them?

There are two aspects:


Formulate generic ways to propose estimators (estimation principles);
Evaluate (the statistical properties of) such estimators.


Three estimators of the mean of a standard normal


[Figure: estimated sampling densities of three estimators of the mean of a
standard normal (Estimator 1, Estimator 2, Estimator 3), each plotted over
the range −1.0 to 1.0.]
... which one?

Decision theoretic criteria


Often, estimators and tests are computed to enable some decision:
estimate Value-at-Risk and decide to invest
... or not.

(Normative) decision theory gives us tools to evaluate estimators and tests.

Just call the respective statistics decisions:
δ(X) = θ̂ for estimation; for testing,
δ(X) = I{X ∈ Cr}, with Cr the so-called critical, or rejection, region.

Key notion: the loss function

For (single) parameter estimation, you could use the squared-error loss,
L(δ(X)) = (δ(X) − θ0)²
For testing, you could use the zero-one loss.

Minimize risk

To evaluate decisions, we need to know what the average loss (risk) is,

R(δ, θ) = Eθ(L(δ)),

so that we can minimize it.

But: each θ may lead to a different optimal decision!

And θ is unknown to begin with...

If there is no uniformly (in θ) best decision, one could
minimize the weighted risk, ∫ w(θ) R(δ, θ) dθ,
... or hedge against the worst-case scenario by minimizing
supθ R(δ, θ) (minimax).

Specific details (focussing on MSE loss) in due time. Until then, ...
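These criteria are easy to probe numerically. The sketch below is a hypothetical illustration (not from the slides): it compares the exact MSE risk of two Bernoulli estimators of p, the sample mean and a shrinkage-type estimator (ΣXi + 1)/(n + 2), then ranks them by uniform-weighted and by worst-case risk. Both the second estimator and the uniform weighting are my illustrative choices.

```python
import numpy as np

n = 25
p = np.linspace(0.0, 1.0, 201)

# Risk (expected squared-error loss) of the sample mean: unbiased,
# so risk = variance = p(1-p)/n.
risk_mean = p * (1 - p) / n

# Risk of the shrinkage estimator (sum X_i + 1)/(n + 2): bias^2 + variance.
mean_shrink = (n * p + 1) / (n + 2)
risk_shrink = (mean_shrink - p) ** 2 + n * p * (1 - p) / (n + 2) ** 2

# Neither decision rule dominates uniformly in p...
print(risk_shrink[100] < risk_mean[100])  # at p = 0.5 shrinkage is better
print(risk_shrink[4] > risk_mean[4])      # near p = 0 the sample mean is better

# ...so rank by weighted risk (uniform weights) or by worst-case risk (minimax):
print(risk_shrink.mean() < risk_mean.mean())  # smaller weighted risk
print(risk_shrink.max() < risk_mean.max())    # smaller worst-case risk
```

Here the shrinkage rule wins under both criteria, yet the sample mean is still strictly better for p near 0 or 1: exactly the situation where no uniformly best decision exists.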

Small vs. large sample properties

We rank estimators according to their (MSE) risk.

Recall, this is the expected (MSE) loss, where the expectation is taken
w.r.t. the sampling distribution.

Small-sample (finite-sample) properties refer to the exact sampling


distribution of the estimator for finite samples;
Large-sample (asymptotic) properties refer to approximations to
sampling distribution properties based upon limiting distributions.

We resort to asymptotic properties when finite-sample ones are not


tractable (or we don’t want to make too many assumptions).



Finite-sample properties




Finite-sample risk

We agree that

We should pick estimators with “minimal” risk.

As a popular criterion, we take a quadratic loss, L(u) = u².¹

Definition (scalar case)
The mean squared error (MSE) of an estimator T = t(X) of q(θ) is
defined as

MSEθ(T) = Eθ[(T − q(θ))²]  ∀ θ ∈ Ω.

Different θs imply different MSEs in general!

¹ Or a quadratic form in the multivariate q(θ) case.

Some implications

The MSE can be decomposed into the variance and the squared bias of the
estimator, as

Eθ[(T − q(θ))²] = Eθ[(T − Eθ(T) + Eθ(T) − q(θ))²]
                = Eθ[(T − Eθ(T))²] + (Eθ(T) − q(θ))² + 2 Eθ[T − Eθ(T)] · (Eθ(T) − q(θ))
                = Varθ(T) + (Eθ(T) − q(θ))²,

where (Eθ(T) − q(θ))² is the squared bias and the cross term vanishes
because Eθ(T − Eθ(T)) = 0.

Large MSE of T
is implied by large variance, large bias, or both;
and both variance and bias depend on θ!
Estimators with smaller MSEs are preferred (are “more efficient”).
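The decomposition is easy to check by Monte Carlo. A minimal sketch, with illustrative values n = 20, µ = 1, σ = 2 of my choosing and the sample mean as estimator T:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma, reps = 20, 1.0, 2.0, 100_000

# Draws from the sampling distribution of T = sample mean, estimating q(theta) = mu
T = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

mse   = np.mean((T - mu) ** 2)     # Monte Carlo MSE
var   = np.var(T)                  # Var_theta(T)
bias2 = (np.mean(T) - mu) ** 2     # squared bias

# The identity MSE = Var + Bias^2 holds exactly for the empirical moments
print(abs(mse - (var + bias2)) < 1e-12)   # True
print(round(mse, 3))                      # close to sigma^2 / n = 0.2
```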


Relative efficiency & admissibility

Definition
Let T and T* be two estimators of a scalar q(θ). The relative efficiency of
T w.r.t. T* is given by

REθ(T, T*) = MSEθ(T*) / MSEθ(T)  ∀ θ ∈ Ω.

T is relatively more efficient than T* if

REθ(T, T*) ≥ 1 ∀ θ ∈ Ω and REθ(T, T*) > 1 for at least one θ ∈ Ω.

If some T is relatively more efficient than some other estimator T*, then
T* is inadmissible for estimating q(θ); an estimator that is not dominated
in this sense is admissible.


Bernoulli

Example
Suppose (X1 , ..., Xn ) is a random sample from a Bernoulli distribution
with P(Xi = 1) = p and n = 25.

Consider the following two estimators for p ∈ [0, 1]:

T = (1/n) Σᵢ₌₁ⁿ Xᵢ  and  T* = (1/(n+1)) Σᵢ₌₁ⁿ Xᵢ.

Which estimator, if either, is preferred on the basis of MSE?
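The comparison can be settled in closed form; a sketch evaluating both exact MSEs on a grid (the cutoff (2n+1)/(3n+1) follows from equating the two MSE expressions and is derived here, not stated on the slide):

```python
import numpy as np

n = 25
p = np.linspace(0.001, 0.999, 999)

# E(T) = p, Var(T) = p(1-p)/n  =>  MSE(T) = p(1-p)/n
mse_T = p * (1 - p) / n
# E(T*) = np/(n+1), so bias = -p/(n+1); Var(T*) = np(1-p)/(n+1)^2
mse_Tstar = (p / (n + 1)) ** 2 + n * p * (1 - p) / (n + 1) ** 2

# Neither dominates: T* has smaller MSE iff p < (2n+1)/(3n+1), about 0.67
cutoff = (2 * n + 1) / (3 * n + 1)
print(np.all(mse_Tstar[p < cutoff] < mse_T[p < cutoff]))  # True
print(np.all(mse_Tstar[p > cutoff] > mse_T[p > cutoff]))  # True
```

So neither estimator is preferred for all p: T* wins for small and moderate p, T for p closer to 1.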


Non-existence of the most efficient estimator

In general, no MSE-optimal estimator exists; see the following example:

Assume that we want to estimate the scalar θ.
Consider the degenerate estimator T1 = t1(X) = θ1 (a fixed value,
ignoring the sample information) with

MSEθ(T1) = Eθ[(θ1 − θ)²] = (θ1 − θ)²,

which is exactly zero for θ = θ1,

... so an MSE-optimal estimator would have to always equal the true
parameter without knowing it in advance!


Unbiasedness

But

Optimality within restricted classes is achievable!

Definition
An estimator T is said to be an unbiased estimator of q(θ) iff

Eθ (T ) = q(θ) ∀ θ ∈ Ω.

Otherwise, the estimator is said to be biased.

An unbiased estimator has the appealing property that its outcomes
equal q(θ) on average.


Location and scale


Example
Let (X1, ..., Xn) be an iid(µ, σ²) sample.

Then T = (1/n) Σᵢ₌₁ⁿ Xᵢ is unbiased for µ: E(T) = (1/n) Σᵢ₌₁ⁿ E(Xᵢ) = µ.

Consider then the sample variance

T = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)²,  with X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ,

as an estimator for σ². Its expectation is

E(T) = (1/n) E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²] = (1/n) E[Σᵢ₌₁ⁿ (Xᵢ − µ)² − n(X̄ − µ)²]
     = (1 − 1/n) σ² ≠ σ².

An unbiased estimator is S² = T / (1 − 1/n) = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)².
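A small simulation makes the bias visible (the values n = 10 and σ² = 4 are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 10, 4.0, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)

T  = ((x - xbar) ** 2).sum(axis=1) / n        # divide by n: biased downward
S2 = ((x - xbar) ** 2).sum(axis=1) / (n - 1)  # divide by n-1: unbiased

print(round(T.mean(), 2))   # close to (1 - 1/n) * sigma2 = 3.6
print(round(S2.mean(), 2))  # close to sigma2 = 4.0
```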


(Scalar) MVU Estimators

Unbiasedness seems to be a desirable property.

Definition
An estimator T is said to be a minimum-variance unbiased estimator of
q(θ) iff
1. Eθ(T) = q(θ) ∀ θ ∈ Ω, that is, T is unbiased, and
2. Varθ(T) ≤ Varθ(T*) ∀ θ ∈ Ω for any other unbiased estimator T*.

The Minimum Variance Unbiased Estimator has the smallest MSE within
the class of unbiased estimators.


... and L ones

Finding an MVUE for q(θ) is challenging in general.

What if we focus on linear estimators?

Definition
An estimator T is said to be a BLUE of q(θ) iff
1. T is a linear function of the random sample X = (X1, ..., Xn)′, i.e.,

T = a′X = a1X1 + · · · + anXn,

2. Eθ (T ) = q(θ) ∀ θ ∈ Ω, that is, T is unbiased, and


3. Varθ (T ) ≤ Varθ (T ∗ ) ∀θ ∈ Ω for any other linear and unbiased
estimator T ∗ .


BLUEs

Example
Let (X1, ..., Xn) be a random sample from a population with E(X) = µ
and Var(X) = σ².

The BLUE for µ is

T = (1/n)X1 + · · · + (1/n)Xn = (1/n) Σᵢ₌₁ⁿ Xᵢ.

Regularity conditions assumed, the OLS/GLS estimator (thoroughly
discussed in econometrics) is BLUE.
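The calculation behind this example is short: for iid data, a linear estimator a′X is unbiased for µ iff Σaᵢ = 1, and its variance is σ² Σaᵢ², minimized by equal weights aᵢ = 1/n. A sketch verifying the minimum numerically (the random search over weight vectors is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 8, 1.0

for _ in range(1000):
    a = rng.normal(size=n)
    a = a / a.sum()          # normalize so that sum(a) = 1: unbiased for mu
    # Var(a'X) = sigma^2 * sum(a_i^2) >= sigma^2 / n, attained at a_i = 1/n
    assert sigma2 * (a ** 2).sum() >= sigma2 / n - 1e-12

print("minimum variance:", sigma2 / n)  # achieved by the sample mean
```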



Large-sample properties




Asymptotic properties

Sometimes, finite sample properties of T are intractable. E.g.


when T is a complicated non-linear function of the random sample
t(X ), or
under semiparametric model specifications for X
(i.e. no specific distribution assumed).

Asymptotic properties are analogous to finite sample properties,


... except that they are based on the asymptotic distribution.


Recall asymptotic distributions

Definition (Asymptotic Distribution)


Let {Zn} be a sequence of random variables defined by Zn = h(Xn, an), where

Xn →d X (nondegenerate), and
an is a sequence of numbers/parameters.

An asymptotic distribution for Zn is the distribution of h(X, an),

Zn ∼a h(X, an), read “Zn is asymptotically distributed as h(X, an)”.


Consistent estimators

Definition
An estimator Tn is said to be a consistent estimator of q(θ) iff

plimθ Tn = q(θ) ∀ θ ∈ Ω.

A consistent estimator converges in probability to what is being
estimated.²
For large enough n, the probability that Tn ∈ [q(θ) − ε, q(θ) + ε] is
practically unity for any (small) ε > 0.
I.e., the sampling density of Tn concentrates on the true value q(θ)
as n → ∞.

(To establish consistency of a given Tn, one usually approximates Tn with
a function of some sample average which we know how to deal with.)

² Alternative formulation: “is consistent for...”

Non-uniqueness

Example
Let (X1, ..., Xn) be a random sample from a population with E(X) = µ and
Var(X) = σ². Then the sample mean Tn = X̄n = (1/n) Σᵢ₌₁ⁿ Xᵢ is a consistent
estimator for µ since

E(X̄n) = µ  and  Var(X̄n) = σ²/n → 0 as n → ∞.

Now, consider as an alternative estimator Tn* = (1/(n−k)) Σᵢ₌₁ⁿ Xᵢ for a
fixed value of k. Even though Tn* is a biased estimator for µ, it is
consistent, since

E(Tn*) = nµ/(n−k) → µ  and  Var(Tn*) = nσ²/(n−k)² → 0 as n → ∞.

We typically have many consistent estimators for an estimation problem
(in this example, asymptotically equivalent ones).
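Both sampling distributions can be watched concentrating on µ as n grows. A sketch with illustrative settings (µ = 2, σ = 1, k = 5, tolerance ε = 0.1 are my choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, k, eps, reps = 2.0, 1.0, 5, 0.1, 5_000

for n in (20, 200, 2_000):
    x = rng.normal(mu, sigma, size=(reps, n))
    Tn      = x.mean(axis=1)           # unbiased and consistent
    Tn_star = x.sum(axis=1) / (n - k)  # biased, but still consistent
    # Estimated P(|estimator - mu| > eps): shrinks toward 0 for both
    print(n, (np.abs(Tn - mu) > eps).mean(), (np.abs(Tn_star - mu) > eps).mean())
```

For small n the biased Tn* misses the ε-band much more often (its bias kµ/(n−k) is then sizeable), but by n = 2000 both exceedance probabilities are essentially zero.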


Consistent asymptotically normal (CAN)

Definition
An estimator Tn is said to be a CAN estimator of q(θ) iff

√n (Tn − q(θ)) →d N(0, Σ),

where Σ is a symmetric positive definite (covariance) matrix.

The asymptotic/approximative distribution of a CAN estimator Tn is

Tn ∼a N(q(θ), Σ/n).

Since Zn = √n (Tn − q(θ)) ∼a N(0, Σ), Slutsky’s theorem implies consistency!

Btw., non-uniqueness of the kind discussed on the previous slide is avoided
by focussing on the √n scaling.


The (eternal) sample average

Example
Let (X1, ..., Xn) be a random sample from a population with E(X) = µ
and Var(X) = σ². Then the sample mean Tn = X̄n = (1/n) Σᵢ₌₁ⁿ Xᵢ is a
CAN estimator for µ since, by the Lindeberg-Lévy CLT,

√n (X̄n − µ) →d N(0, σ²)  and  X̄n ∼a N(µ, σ²/n).
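The CLT requires no normality of the data. A sketch with skewed Exponential(1) draws (so µ = σ² = 1; the distribution and sample sizes are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 400, 20_000

x = rng.exponential(1.0, size=(reps, n))  # skewed data with mu = sigma^2 = 1
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)   # root-n (X_bar - mu)

# z should look approximately N(0, sigma^2) = N(0, 1): mean near 0,
# variance near 1, upper tail mass near the Gaussian value P(Z > 1.96) = 0.025
print(round(z.mean(), 2), round(z.var(), 2))
print(round((z > 1.96).mean(), 3))  # close to 0.025 (a little skew remains)
```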


Leaving the finite sample for good


Asymptotic versions of MSE, bias and variance can be defined w.r.t. the
unique asymptotic distribution of CAN estimators.

The asymptotic MSE of a CAN estimator Tn for the scalar q(θ) with
Tn ∼a N(q(θ), σ²/n) is

AMSEθ(Tn) = EA[(Tn − q(θ))²]              (EA: expectation w.r.t. the asymptotic distribution)
          = AVar(Tn) + [EA(Tn) − q(θ)]²   (AVar: variance of the asymptotic distribution)
          = AVar(Tn)                      (the asymptotic bias is 0)
          = σ²/n.

A CAN estimator for q(θ) is necessarily asymptotically unbiased.


(Scalar) Asymptotic relative efficiency

Definition
Let Tn and Tn* be CAN estimators of q(θ) such that

n^(1/2) (Tn − q(θ)) →d N(0, σ²_T)  and  n^(1/2) (Tn* − q(θ)) →d N(0, σ²_T*).

The asymptotic relative efficiency of Tn with respect to Tn* is

AREθ(Tn, Tn*) = AMSEθ(Tn*) / AMSEθ(Tn) = σ²_T* / σ²_T  ∀ θ ∈ Ω.

Tn is asymptotically relatively more efficient than Tn* if

AREθ(Tn, Tn*) ≥ 1 ∀ θ ∈ Ω and ∃ θ ∈ Ω with AREθ(Tn, Tn*) > 1.
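A classical illustration not on the slide: under normality the sample median is CAN with asymptotic variance πσ²/2, so ARE(mean, median) = π/2 ≈ 1.57 and the mean is asymptotically more efficient. A simulation sketch (n, the number of replications, and the N(0, 1) population are my choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1_000, 10_000

x = rng.normal(0.0, 1.0, size=(reps, n))
avar_mean   = n * np.var(x.mean(axis=1))        # ~ sigma^2 = 1
avar_median = n * np.var(np.median(x, axis=1))  # ~ (pi/2) * sigma^2

# ARE(mean, median) = avar_median / avar_mean, roughly pi/2 ~ 1.57 > 1
print(round(avar_median / avar_mean, 2))
```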


Uniformly smallest asymptotic variance?

If the estimator Tn is asymptotically relatively more efficient than Tn*, then
Tn* is called asymptotically inadmissible.
Otherwise, Tn* is asymptotically admissible.³

One would like to define asymptotic efficiency in terms of the CAN estimator
with uniformly smallest asymptotic variance.
However,
one can show that for any CAN estimator,
there is an alternative estimator that has a smaller asymptotic variance for
at least one θ ∈ Ω (Hodges’ estimator).
Hence, we cannot define an achievable lower bound to the asymptotic
variance of CAN estimators.

³ In the scalar case; see Mittelhammer (1996, Def. 7.16) for the vector case.

Unless...

But it can be shown that,
under mild regularity conditions (Le Cam),
there does exist a lower bound for the asymptotic variance of a CAN
estimator
that holds for almost all θ ∈ Ω, i.e. except perhaps on a countable
set of θ values (this matches the so-called Cramér-Rao lower bound).

Definition
If Tn is a CAN estimator of q(θ) having the smallest asymptotic variance
among all CAN estimators ∀ θ ∈ Ω, except on a set of Lebesgue measure
zero, Tn is said to be asymptotically efficient.



Up next




Coming up

The Cramér-Rao lower bound
