
Statistics

Topic 4: Point Estimators

Based on documents by Jesús María Pinar Pérez



1 Point estimators
2 Bias of an estimator
Properties of unbiased estimators
Unbiased estimators Examples
Variance of an estimator
Mean square error
3 Efficiency of an estimator
Cramér-Rao Lower Bound
Relative efficiency
Efficiency Examples
4 Consistency of an estimator
Consistency Examples
5 Method of Moments
Moments
MOM Examples
6 Maximum Likelihood Estimation
MLE Examples

Point estimators

Introduction: Point estimators

Suppose that the population is represented by its distribution function f with some unknown population parameter denoted by θ. That parameter can be, for example, µ, σ, p, etc.
We select a function of the sample random variables X1 , X2 , · · · , Xn (the observations), which we denote by θ̂ = f (X1 , X2 , · · · , Xn ) and use to draw inferences about the value of the parameter θ.
The statistic θ̂ is a function of the sample observations, so it will generally take a different value for each particular sample X1 , X2 , · · · , Xn ; the estimate is therefore based only on the information provided by the sample.


Estimate

The goal is to use statistics to estimate the parameters.

We will see TWO types of estimators:
Point estimation. We obtain a single value (a point) as an estimate of the parameter.
Confidence intervals. We obtain an interval within which the parameter lies with a certain probability. (Next topic.)


Point estimators

Let X be a population distributed with some pdf f (x , θ), where θ denotes the unknown parameter(s).
Let X1 , X2 , · · · , Xn be a series of random elements picked from the population. They form the selected sample of size n and are independent and identically distributed (i.i.d.) random variables.
We define a point estimator θ̂ as a statistic that is used to approximate the unknown parameter(s) θ.

Of course, once we have picked a sample, θ̂ can be calculated and assigned a value. This value is called the point estimate of θ. To summarize, the estimator θ̂ is typically a "formula", whereas the estimate is an actual number.


Common point estimators

Are you looking for the (unknown) mean of a population? Collect a sample and report its average.
Are you looking for the (unknown) population variance? Collect a sample and report its sample variance.
Recall that every distribution we have seen had some parameters that were required to define it. For example, the binomial distribution needed n > 0 and 0 ≤ p ≤ 1, whereas the exponential distribution or the Poisson distribution needed λ > 0.
For a single population:

Parameter                       Point estimator
Population mean µ               Sample average θ̂ = X̄
Population variance σ²          Sample variance θ̂ = S²
Population proportion p         Sample proportion θ̂ = p̂
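For illustration, here is a minimal Python sketch (using NumPy) that computes these three point estimates from a sample; the numeric values below are assumptions for the example only.

    import numpy as np

    # Hypothetical sample values (assumed, for illustration only)
    sample = np.array([0.7, 0.77, 0.65, 0.5, 0.83])
    successes = np.array([1, 0, 1, 1, 0])   # Bernoulli-type data for a proportion

    x_bar = sample.mean()        # point estimate of the population mean mu
    s2 = sample.var(ddof=1)      # sample variance S^2 (divides by n - 1)
    p_hat = successes.mean()     # sample proportion p-hat

    print(x_bar, s2, p_hat)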


Example

Suppose we want to estimate the average income µ of all families in a city. It seems logical to use the sample mean X̄ as an estimator of the population mean µ.
We need to select a random sample (for example n = 80), and then we would obtain the average income of the sample, for example x̄ = 800€.
Then the estimator of the population mean µ will be

µ̂ = X̄ = (X1 + X2 + · · · + Xn)/n,

that is, the statistic will be the mean X̄ and the point estimate will be x̄ = 800€.

Bias of an estimator

What makes a good estimator?


We will have as many point estimates, in each sample, as estimators we have
constructed. We need criteria that allow us to choose the “best” estimator
among the possible ones in each case.
Every estimator has two main items we want to evaluate it by: bias and
standard error or variance. Reducing bias improves accuracy, while
reducing variance minimizes dispersion.


Bias

We define the bias of a point estimator as the difference between its expectation
and the parameter itself.
Bias

bias[θ̂] = E [θ̂] − θ

The bias can be positive, negative, or even zero.


If bias (θ̂) > 0 the estimator overestimates the value of the unknown
parameter.
If bias (θ̂) < 0 the estimator underestimates the value of the unknown
parameter.
If bias (θ̂) = 0 the estimator is referred to as unbiased.


Bias example

Assume a population with mean µ and variance σ². As the mean is unknown, you decide to use the following three approaches to estimate it:
1 Get the average of a sample of 3 randomly picked observations, that is, (X1 + X2 + X3)/3.
2 Get a sample of 3 randomly picked observations and calculate (2 · X1 + X2 − X3)/2.
3 Get a sample of 3 randomly picked observations and calculate 2X1 + X2 − X3.

What are the biases of each of the three point estimators?

Bias example
θ̂1 = (X1 + X2 + X3)/3. Then
E[θ̂1] = E[(X1 + X2 + X3)/3] = (1/3)(E[X1] + E[X2] + E[X3]) = (1/3)(µ + µ + µ) = µ =⇒ bias[θ̂1] = µ − µ = 0.

θ̂2 = (2 · X1 + X2 − X3)/2. Then
E[θ̂2] = E[(2 · X1 + X2 − X3)/2] = (2µ + µ − µ)/2 = µ =⇒ bias[θ̂2] = µ − µ = 0.

θ̂3 = 2 · X1 + X2 − X3. Then
E[θ̂3] = E[2 · X1 + X2 − X3] = 2µ + µ − µ = 2µ =⇒ bias[θ̂3] = 2µ − µ = µ.

The first two estimators are unbiased. The last one is biased and its bias is µ.
Properties of unbiased estimators

If θ̂1 and θ̂2 are two unbiased estimators of the parameter θ, then the
estimator θ̂ defined as

θ̂ = λθ̂1 + (1 − λ)θ̂2 , λ ∈ (0, 1).

is also an unbiased estimator of the parameter θ.


In fact, since θ̂1 and θ̂2 are two unbiased estimators, E[θ̂1] = E[θ̂2] = θ. Hence

E[θ̂] = E[λθ̂1 + (1 − λ)θ̂2] = λE[θ̂1] + (1 − λ)E[θ̂2] = λθ + (1 − λ)θ = θ.

Unbiased estimators example

Show that

S² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)²

is an unbiased estimator of the population variance.
Recall that, for S² defined this way, E[S²] = σ².

Hence
bias[S²] = E[S²] − σ² = 0.


Standard Error

The variance of a point estimator is

Var[θ̂] = E[(θ̂ − E[θ̂])²].

We define the standard error of a point estimator as the square root of its variance, that is,

SE[θ̂] = √Var[θ̂].

We want this to be as small as possible. A point estimator with minimum variance and zero bias is called a minimum variance unbiased estimator.

Variances example
Assume the same population with unknown mean µ and variance σ² (as in the bias example). We use again the three estimators from before. What are their variances?

θ̂1 = (X1 + X2 + X3)/3. Then
Var[θ̂1] = Var[(X1 + X2 + X3)/3] = (1/9)(Var[X1] + Var[X2] + Var[X3]) = (1/9) · 3σ² = σ²/3.

θ̂2 = (2 · X1 + X2 − X3)/2. Then
Var[θ̂2] = Var[(2 · X1 + X2 − X3)/2] = Var[X1] + (1/4)Var[X2] + (1/4)Var[X3] = σ² + σ²/4 + σ²/4 = (3/2)σ².

θ̂3 = 2 · X1 + X2 − X3. Then
Var[θ̂3] = Var[2 · X1 + X2 − X3] = 4σ² + σ² + σ² = 6σ².

Among the three options, θ̂1 is the minimum variance unbiased estimator.
Mean square error


We define the mean square error of a point estimator as the expected value
of the square error (θ̂ − θ)2 :

h i
MSE = E (θ̂ − θ)2

We can use this to derive the fact that the mean square error is equal to the
summation of the variance plus the square of the bias:

MSE(θ̂)
h i
= E (θ̂ − θ)2 = E [(θ̂ − E [θ̂])2 ] + (θ − E [θ̂])2 − 2E [(θ̂ − E [θ̂])(θ − E [θ̂])] =
= Var[θ̂] + bias[θ̂]2 .
By definition, the MSE tries to capture both bias and variance at the same
time. Hence, we typically say that one estimator is better than another if its
MSE is smaller.
Mean square errors example


Assume the same population with unknown mean µ and variance σ². We use the three estimators θ̂1, θ̂2, and θ̂3. What are the mean square errors of each one? Which one would we prefer?

θ̂1 = (X1 + X2 + X3)/3. Then
MSE(θ̂1) = Var[θ̂1] + bias[θ̂1]² = σ²/3 + 0 = σ²/3.

θ̂2 = (2 · X1 + X2 − X3)/2. Then
MSE(θ̂2) = Var[θ̂2] + bias[θ̂2]² = 3σ²/2 + 0 = 3σ²/2.

θ̂3 = 2 · X1 + X2 − X3. Then
MSE(θ̂3) = Var[θ̂3] + bias[θ̂3]² = 6σ² + µ².

Hence θ̂1 has the smallest MSE (as expected), followed by θ̂2.
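A minimal Monte Carlo sketch of this comparison, assuming (for illustration only) a normal population with µ = 10 and σ = 2:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, reps = 10.0, 2.0, 200_000

    # Draw 'reps' samples of size 3 and apply the three estimators to each sample
    X = rng.normal(mu, sigma, size=(reps, 3))
    t1 = X.mean(axis=1)                          # (X1 + X2 + X3) / 3
    t2 = (2 * X[:, 0] + X[:, 1] - X[:, 2]) / 2   # (2*X1 + X2 - X3) / 2
    t3 = 2 * X[:, 0] + X[:, 1] - X[:, 2]         # 2*X1 + X2 - X3

    for name, t in [("theta1", t1), ("theta2", t2), ("theta3", t3)]:
        bias = t.mean() - mu
        var = t.var()
        mse = np.mean((t - mu) ** 2)
        print(f"{name}: bias={bias:.3f}  var={var:.3f}  mse={mse:.3f}")

    # Theory: biases 0, 0, mu; variances sigma^2/3, 1.5*sigma^2, 6*sigma^2;
    # MSEs sigma^2/3, 1.5*sigma^2, 6*sigma^2 + mu^2.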
Efficiency of an estimator

Efficiency of an Estimator

Efficiency is measured by the variance of the estimator.


The most efficient estimator among a group of unbiased estimators will be
the one with the smallest variance.


Efficiency of an Estimator, visual example

Each curve represents the probability distribution of an unbiased estimator


(centered at the true parameter θ = 50).
Estimator 1 (Small Variance, solid line) is the most efficient because it
is the most concentrated around θ, leading to more precise estimates.
Estimator 2 (Medium Variance, dashed line) and Estimator 3 (Large
Variance, dotted line) have increasing spread, indicating higher variance
and lower efficiency.
Cramér-Rao Lower Bound (CRLB)

To select the most efficient estimator among the set of all possible estimators of an unknown parameter θ, it would be necessary to calculate their variances and check which is the smallest.
We may not be able to enumerate the set of all possible estimators, so we typically investigate the efficiency of an estimator by comparing its variance to a quantity called the Cramér-Rao bound.
The Cramér-Rao bound provides a lower bound for the variance of any unbiased estimator:

Var[θ̂] ≥ 1 / I(θ)

where I(θ) is the Fisher Information, defined as:

I(θ) = E[(∂/∂θ ln f (X ; θ))²].


Cramér-Rao Lower Bound (CRLB)

If we have n independent and identically distributed random variables with density function f (x ; θ), then:

In(θ) = n · E[(∂/∂θ ln f (X ; θ))²] = n · I(θ).

The Cramér-Rao Bound for the entire sample then becomes:

Var[θ̂] ≥ 1 / In(θ) = 1 / (n · I(θ)).

The Cramér-Rao Bound: Example 1


Suppose we have n independent observations: X1 , X2 , · · · , Xn ∼ Exp(λ). Let us
compute the Cramér-Rao Bound.

Recall that, for the exponential distribution we have


(
λe −λx x ≥ 0,
f (x ) = .
0 x <0

Where E [X ] = λ1 , Var[X ] = 1
λ2 and E [X 2 ] = Var[X ] + E [X ]2 = 2
λ2 .
Then, for x ≥ 0,
∂ ∂ 1
ln f (X ; λ) = (ln λ − λX ) = − X .
∂λ ∂λ λ
So that, the Fisher information is:
" 2 #

I(λ) = E ln f (X ; λ) .
∂λ


Then

I(λ) = E[(1/λ − X)²] = E[1/λ² − 2X/λ + X²] = 1/λ² − 2/λ² + 2/λ² = 1/λ².

Thus, the Cramér-Rao Bound (for n independent random variables) is:

Var(λ̂) ≥ 1 / In(λ) = λ²/n.
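A small simulation sketch (with an assumed rate λ = 2) that checks the Fisher information I(λ) = 1/λ² by averaging the squared score 1/λ − X over simulated exponential draws:

    import numpy as np

    rng = np.random.default_rng(1)
    lam, n_draws = 2.0, 500_000        # assumed rate and number of simulated draws

    x = rng.exponential(scale=1 / lam, size=n_draws)   # Exp(lambda) has mean 1/lambda
    score = 1 / lam - x                                # d/d-lambda of ln f(X; lambda)

    print("Monte Carlo E[score^2]:", np.mean(score ** 2))
    print("Theoretical 1/lambda^2:", 1 / lam ** 2)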


Efficiency of unbiased estimator

The efficiency of an unbiased estimator θ̂ of the parameter θ is defined as:

Efficiency(θ̂) = Cramér-Rao Bound / Var(θ̂),

which satisfies Efficiency(θ̂) ≤ 1.

An unbiased estimator is most efficient when its efficiency is equal to 1. This means that the estimator achieves the Cramér-Rao Lower Bound, which represents the lowest possible variance an unbiased estimator can have.

The Cramér-Rao Bound - Efficiency: Example 2

Given: X1 , X2 , · · · , Xn ∼ N(µ, σ²) with unknown µ. Compute the Cramér-Rao Bound (CRB) and the efficiency of the sample mean µ̂ for a normal distribution.

The Fisher Information for the parameter µ is:

I(µ) = E[(∂/∂µ ln f (X ; µ))²]

For a normal distribution: f (x ; µ) = (1/√(2πσ²)) e^(−(x − µ)²/(2σ²)). Then

ln f (x ; µ) = ln(1/√(2πσ²)) − (x − µ)²/(2σ²),

so that

∂/∂µ ln f (x ; µ) = (x − µ)/σ².
Hence

I(µ) = E[((X − µ)/σ²)²] = E[(X − µ)²]/σ⁴ = 1/σ²,

because E[(X − µ)²] = σ².

For a sample of size n, the Fisher Information is:

In(µ) = n/σ².

The Cramér-Rao Bound is:

Var(µ̂) ≥ 1 / In(µ) = σ²/n.

This is the lowest possible variance for any unbiased estimator of µ.

Let us compute the efficiency of the sample mean.

The sample mean is defined as:

µ̂ = X̄ = (1/n) ∑_{i=1}^n Xi .

Recall that the variance of the sample mean is:

Var(µ̂) = σ²/n.

Since this exactly matches the Cramér-Rao Bound, the efficiency is:

Efficiency(µ̂) = Cramér-Rao Bound / Var(µ̂) = (σ²/n) / (σ²/n) = 1.

The sample mean µ̂ is a minimum variance unbiased estimator for µ.


Relative efficiency

We may also define the relative efficiency as the ratio of two estimators' mean square errors: if θ̂1 and θ̂2 are unbiased estimators of θ, the relative efficiency of θ̂1 with respect to θ̂2 is:

Relative efficiency(θ̂1, θ̂2) = Efficiency[θ̂1] / Efficiency[θ̂2] = Var[θ̂2] / Var[θ̂1].

If the relative efficiency is less than 1, then we say that the point estimator θ̂2 is preferred to the point estimator θ̂1, i.e., the variance of θ̂2 is smaller than the variance of θ̂1.


Relative efficiency

Assume the same population with unknown mean µ and variance σ². We use the three estimators θ̂1, θ̂2, and θ̂3. The relative efficiency of the only two unbiased estimators, θ̂1 and θ̂2, is:

Relative efficiency(θ̂1, θ̂2) = Var[θ̂2] / Var[θ̂1] = (3σ²/2) / (σ²/3) = 9/2 > 1,

so θ̂1 is preferred.

Efficiency: example

Let (X1 , X2 , · · · , Xn ) be a simple random sample from a population N(µ, σ). We use the sample mean X̄ and the sample median Xm as estimators of the population mean µ. Study their relative efficiency.

Since the original population is normally distributed, we know that both are unbiased estimators, due to the symmetry of the distribution and the coincidence of the mean, median and mode.
Recall that Var(X̄) = σ²/n.
It can be proved that, for large n, Var(Xm) ≈ (π/2)(σ²/n) ≈ 1.57 σ²/n.
Hence, the relative efficiency is

Var(Xm) / Var(X̄) = (1.57 σ²/n) / (σ²/n) = 1.57.

This implies that the sample median Xm is 1.57 times less efficient than the sample mean X̄ in estimating the population mean µ.
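A quick simulation sketch of this comparison, assuming (for illustration) N(0, 1) samples of size n = 101; the variance ratio should come out close to π/2 ≈ 1.57:

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 101, 100_000                    # assumed sample size and replications

    samples = rng.normal(0.0, 1.0, size=(reps, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)

    print("Var(mean)  :", means.var())
    print("Var(median):", medians.var())
    print("ratio      :", medians.var() / means.var())   # close to pi/2 ~ 1.57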
Consistency of an estimator

Consistency of an Estimator
Until now the estimators were based on random samples of a fixed size n, but it seems logical to think that when n is larger the estimator will be better.

Let {θ̂1 , θ̂2 , · · · , θ̂n } be a sequence of estimators of the parameter θ obtained from samples of sizes 1, 2, · · · , n, respectively. We say that the sequence θ̂n is consistent if, for all ε > 0,

lim_{n→∞} P(|θ̂n − θ| ≥ ε) = 0.

Each element of the sequence will be said to be a consistent estimator.

This means that as the sample size n increases, the estimator θ̂n converges in probability to the true parameter θ. For example:
If estimating the population mean µ, then θ̂n = X̄n = (1/n) ∑_{i=1}^n Xi .
If estimating the population variance σ², then θ̂n = Sn² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄n)².

Consistency of an Estimator

Chebyshev’s Theorem: Applying Chebyshev’s Theorem to the random variable θ̂n we have that

P(|θ̂n − θ| ≥ ε) ≤ E[(θ̂n − θ)²]/ε² = MSE(θ̂n)/ε² = Var(θ̂n)/ε² + bias(θ̂n)²/ε².

Therefore, if {θ̂1 , θ̂2 , · · · , θ̂n } is a sequence of unbiased estimators, since bias(θ̂n) = 0 for every n, it is enough to show that

lim_{n→∞} Var(θ̂n) = 0.

Consistency of the sample mean. Exercise 6


Let X1 , X2 , · · · , Xn be an i.i.d sample of size n from a population with mean µ and
(finite) variance σ 2 . Recall that the sample mean, for every n is defined as
n
1X
X̄n = Xi .
n i=1

We have showed that X̄n is an unbiased estimator for every n.


Recall that, for every n,
σ2
Var(X̄n ) = .
n

Hence, since σ 2 is finite,

σ2
lim Var(X̄n ) = lim = 0.
n→∞ n→∞ n

Therefore for every n, X̄n is a consistent estimator of µ.
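A short simulation sketch of this convergence, assuming (for illustration) a N(µ = 5, σ = 3) population and ε = 0.1, estimating P(|X̄n − µ| ≥ ε) for increasing n:

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma, eps, reps = 5.0, 3.0, 0.1, 2000   # assumed population and tolerance

    for n in (10, 100, 1000, 10_000):
        xbars = np.array([rng.normal(mu, sigma, n).mean() for _ in range(reps)])
        prob = np.mean(np.abs(xbars - mu) >= eps)
        print(f"n={n:6d}  estimated P(|Xbar_n - mu| >= {eps}) = {prob:.4f}")
    # The estimated probability shrinks towards 0 as n grows, as consistency requires.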


Consistency of the sample variance. Exercise 6


Let X1 , X2 , · · · , Xn be an independent and normally distributed sample of size n
from a population with mean µ and (finite) variance σ 2 . For each n, the sample
variance, is defined by
n
1 X
Sn2 = (Xi − X̄n )2 ,
n − 1 i=1

Recall that, since E [Sn2 ] = σ 2 , then Sn2 is an unbiased estimator for every n.
If the population follows a normal distribution, recall that, for every n, the
variance of Sn2 is given by:

2σ 4
Var(Sn2 ) = .
n−1

So that
2σ 4
lim Var(Sn2 ) = lim = 0.
n→∞ n→∞ n − 1

Therefore for every n, Sn2 is a consistent estimator of σ 2 .


Method of Moments

Method of estimation

Assume we are provided a population X distributed with unknown parameter(s) θ.


We want to estimate θ. Given a series of observations (a sample) X1 , X2 , · · · , Xn ,
how to come up with a “good” point estimator θ̂? Two widely used methods for
parameter estimation are:
Method of Moments (MOM): Matches sample moments to population
moments to estimate parameters.
Maximum Likelihood Estimation (MLE): Finds the parameter that
maximizes the likelihood function.


Moments
Suppose that we have a population X distributed with PDF f (x ). We have managed to collect a set of samples from the population, X1 , X2 , · · · , Xn . Then:
The k-th population moment of a continuous population X is

E[X^k] = ∫_{−∞}^{+∞} x^k f (x ) dx .

The k-th population moment of a discrete population X with PMF P(x ) is

E[X^k] = ∑_{x ∈ X} x^k P(x ).

The k-th sample moment of X is

(1/n) ∑_{i=1}^n Xi^k ,

where X1 , X2 , · · · , Xn are samples from the population X .


Moments

By definition, the first population moment of X is the population mean, and the first sample moment of X is the sample average.
On the other hand, the second population moment of X is not the population variance; instead, E[X²] is only part of the calculation of the variance:

Var[X] = E[X²] − (E[X])².

Similarly, the second sample moment of X is not the sample variance!

Moments: Example

Suppose that f (x ) = (1/2)(1 − α · x ), defined for −1 ≤ x ≤ 1, where α is some parameter. What are the first three population moments?
First moment:

E[X] = ∫_{−1}^{+1} x · f (x ) dx = ∫_{−1}^{+1} x · (1/2)(1 − α · x ) dx = −α/3.

Second moment:

E[X²] = ∫_{−1}^{+1} x² · f (x ) dx = ∫_{−1}^{+1} x² · (1/2)(1 − α · x ) dx = 1/3.

Third moment:

E[X³] = ∫_{−1}^{+1} x³ · f (x ) dx = ∫_{−1}^{+1} x³ · (1/2)(1 − α · x ) dx = −α/5.

Method of moments: Example


Suppose that we have collected n = 5 samples from the population distributed with
X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83 . What are the first three
sample moments?
First moment:
n
1X 1
Xi = (0.7 + 0.77 + 0.65 + 0.5 + 0.83) = 0.69.
n i=1 5

Second moment:
n
1X 2 1
Xi = 0.72 + 0.772 + 0.652 + 0.52 + 0.832 = 0.48886.

n i=1 5

Third moment:
n
1X 3 1
Xi = 0.73 + 0.773 + 0.653 + 0.53 + 0.833 = 0.354189.

n i=1 5


The method of moments (MOM)

The main idea behind the method is the following:

We want to match the empirical (sample) moments of a distribution to the population moments. Before we apply the method, we make a couple of observations.

The k-th moment of f (x ), E[X^k], depends only on the unknown parameters θ1 , θ2 , · · · , θm .
The k-th moment of the sample, (1/n) ∑_{i=1}^n Xi^k , depends only on the data (the sample itself)!


The method of moments (MOM)

Suppose that we have m unknown parameters θ1 , θ2 , . . . , θm . The method of moments estimators θ̂1 , θ̂2 , · · · , θ̂m can be obtained as follows:
1 Get the first m moments of f (x ) and of the sample.
2 Equate them.
3 Solve the resulting system of equations with m unknowns, the estimators θ̂i .

Observation
We may need to take more than m moments if some moments are zero or produce equations in the same variables as the previous ones.


Properties of the estimators obtained by the MOM

Unbiased: If the unknown parameters that we try to estimate are population


moments, then the estimators obtained by this method are unbiased.
Consistency: Under quite general conditions the estimators obtained by this
method are consistent.


MOM: Example 1
Recall the population X distributed with f (x ) = (1/2)(1 − α · x ), where α is some (unknown) parameter and −1 ≤ x ≤ 1.
We have collected a sample of n = 5 observations from X and we found the observations to be X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83.
The first population and the first sample moments are

E[X] = −α/3   and   (1/n) ∑_{i=1}^n Xi = 0.69.

Equating, we get:

−α/3 = 0.69 =⇒ α̂ = −2.07.

We put a "hat" on top of α when we assign a value to it in the end. This is done to signal that this is merely an estimate and is not necessarily the true value.
In general, letting X̄ = (1/n) ∑_{i=1}^n Xi , given any sample of n observations, we may calculate the method of moments estimator for α as:

α̂ = −3 · X̄ .
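A minimal sketch of this calculation in Python, using the five observations above:

    import numpy as np

    x = np.array([0.7, 0.77, 0.65, 0.5, 0.83])

    m1 = x.mean()            # first sample moment
    alpha_hat = -3 * m1      # method of moments estimator: alpha-hat = -3 * Xbar

    print("first sample moment:", m1)         # 0.69
    print("alpha_hat          :", alpha_hat)  # -2.07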
MOM: Example 2
Consider a (continuous) population X distributed with PDF

f (x ) = 1/2 − (1/4) · θ1 · x + (1/8) · θ2 · x³ ,   −1 ≤ x ≤ 1,

where θ1 and θ2 are unknown. We have collected a sample of size n = 4 from the population: X1 , X2 , X3 , X4 . What are the method of moments estimators for θ1 and θ2 ?

Expectation (population moment)                     Reality (sample moment)
E[X] = ∫_{−1}^{+1} x f (x ) dx = −θ1/6 + θ2/20      (1/4)(X1 + X2 + X3 + X4) = X̄
E[X²] = ∫_{−1}^{+1} x² f (x ) dx = 1/3              (1/4)(X1² + X2² + X3² + X4²)
    Problem! Can't use this equality: it involves neither θ1 nor θ2.
E[X³] = ∫_{−1}^{+1} x³ f (x ) dx = −θ1/10 + θ2/28   (1/4)(X1³ + X2³ + X3³ + X4³) = m3
Here is the final system of equations:

−θ1/6 + θ2/20 = X̄ ,
−θ1/10 + θ2/28 = m3 .

Assume that our sample was X1 = 0.7, X2 = 0.6, X3 = 0.3, X4 = 0.7. Then:

X̄ = (1/4) ∑ Xi = 2.3/4 = 0.575
m3 = (1/4) ∑ Xi³ = 0.3575.

So the system becomes:

−θ1/6 + θ2/20 = 0.575,
−θ1/10 + θ2/28 = 0.3575.

The solution is: θ̂1 = −2.79, θ̂2 = 2.19.
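A small sketch that solves this 2×2 linear system numerically, using the right-hand-side values above:

    import numpy as np

    # System:  -theta1/6  + theta2/20 = 0.575
    #          -theta1/10 + theta2/28 = 0.3575
    A = np.array([[-1 / 6, 1 / 20],
                  [-1 / 10, 1 / 28]])
    b = np.array([0.575, 0.3575])

    theta1, theta2 = np.linalg.solve(A, b)
    print(theta1, theta2)   # approximately -2.79 and 2.19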


Example: MOM for Exponential Distribution


We suspect X is a population that is exponentially distributed, but with unknown
rate λ. We have collected a sample from that population: X1 , X2 , · · · , Xn . We
have one unknown parameter (λ) so we will need one equation.

Let us try the first population moment:


Z ∞ ∞
1
Z
E [X ] = xf (x )dx = λxe −λx dx = .
−∞ 0 λ
Similarly, we may obtain the first sample moment as:
X1 + X2 + . . . + Xn
X̄ = .
n
Equating the two (per the method of moments), we get:

1 n
λ̂ = = Pn .
X̄ i=1 Xi

Example: MOM for the Normal Distribution

We have some normally distributed population with mean µ and variance σ². They are both unknown. We have collected n observations (a sample) from the population: X1 , X2 , · · · , Xn . What are the method of moments estimators for µ and σ²?

For the population moments, we need (at least) the first two: E[X] and E[X²]. The first one is easy, as it is equal to µ.
The second one, on the other hand, is not the variance: it is used in the variance calculation! Recall that σ² = E[X²] − (E[X])² =⇒ E[X²] = σ² + (E[X])² = σ² + µ².
In summary, we have:

E[X] = µ
E[X²] = σ² + µ².


The sample moments are easier to calculate:

(1/n) ∑_{i=1}^n Xi = X̄    and    (1/n) ∑_{i=1}^n Xi² .

Equating the population and sample moments, we get the following system of equations:

µ = X̄ =⇒ µ̂ = X̄ .
µ² + σ² = (1/n) ∑_{i=1}^n Xi² =⇒ σ̂² = (∑_{i=1}^n Xi² − nX̄²)/n .
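A small sketch, with simulated data assumed only for illustration, computing these method of moments estimates; note that σ̂² divides by n, unlike the sample variance S², which divides by n − 1:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(10.0, 2.0, size=500)          # assumed N(mu=10, sigma=2) sample

    mu_hat = x.mean()                            # MOM estimate of mu
    sigma2_hat = (x ** 2).mean() - mu_hat ** 2   # MOM estimate of sigma^2 (divides by n)

    print(mu_hat, sigma2_hat)
    print(x.var(ddof=0))                         # same as sigma2_hat, up to rounding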

Example: MOM for Bernoulli Distribution. Exercise 17

Let X be a Bernoulli random variable with unknown probability of success p. How do we estimate it using the method of moments?

Let us run n experiments of that Bernoulli random variable and mark each of them as Xi = 1 (when successful) or Xi = 0 (when failed). Then:

E[X] = 0 · P(X = 0) + 1 · P(X = 1) = 0 · (1 − p) + 1 · p = p,
X̄ = (1/n) ∑_{i=1}^n Xi   (the first sample moment).

Equating the two, we get that

p̂ = (1/n) ∑_{i=1}^n Xi = X̄ .

Maximum Likelihood Estimation

Maximum Likelihood Estimators (MLE)

Given a sample X1 , X2 , ..., Xn from a probability distribution with PDF f (x ; θ):

The likelihood function of a sample of n observations X1 , X2 , · · · , Xn is defined as

L(θ) = f (X1 ; θ) · f (X2 ; θ) · · · f (Xn ; θ) = ∏_{i=1}^n f (Xi ; θ) .

The maximum likelihood estimator (MLE) of θ is the value that maximizes the likelihood function.
Since the logarithm is a non-decreasing function, maximizing ln L(θ) gives the same result.
The MLE is then found by solving:

∂/∂θ ln L(θ) = ∂/∂θ ln(∏_{i=1}^n f (Xi ; θ)) = ∑_{i=1}^n ∂/∂θ ln f (Xi ; θ) = 0.


Properties of the MLE

Unbiased: An MLE does not have to be unbiased, but MLEs are asymptotically unbiased. That is, as the sample size n → ∞, the bias of an MLE tends to zero.
Consistency: All MLEs are consistent.
Efficiency: If there is an efficient estimator θ̂ of the parameter θ, then it is also the maximum likelihood estimator and it is unique. However, a maximum likelihood estimator does not have to be efficient. Maximum likelihood estimators are asymptotically efficient.

MLE: Example 1
We have a sample X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83 from a distribution with PDF f (x ) = (1/2)(1 − α · x ), −1 ≤ x ≤ 1. Find the MLE of α.

We build the log-likelihood function as:

ln L(α) = ln f (X1) + ln f (X2) + ln f (X3) + ln f (X4) + ln f (X5)
        = ln[(1/2)(1 − 0.7α)] + ln[(1/2)(1 − 0.77α)] + ln[(1/2)(1 − 0.65α)] + ln[(1/2)(1 − 0.5α)] + ln[(1/2)(1 − 0.83α)].

Note that ∂/∂α ln[(1/2)(1 − Xi α)] = Xi/(αXi − 1).

Hence, setting ∂ ln L/∂α = 0 gives

0.7/(1 − 0.7α) + 0.77/(1 − 0.77α) + 0.65/(1 − 0.65α) + 0.5/(1 − 0.5α) + 0.83/(1 − 0.83α) = 0
=⇒ α̂ ≈ 1.88.
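A numerical sketch of this last step, solving the first-order condition with SciPy's brentq; the bracket [1.85, 1.95] is an assumption chosen around the solution reported above:

    import numpy as np
    from scipy.optimize import brentq

    x = np.array([0.7, 0.77, 0.65, 0.5, 0.83])

    def score(alpha):
        # Left-hand side of the first-order condition above
        return np.sum(x / (1 - alpha * x))

    alpha_hat = brentq(score, 1.85, 1.95)   # bracket assumed around the reported root
    print(alpha_hat)                        # approximately 1.88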


MLE for an exponential distribution. Exercise 13

We have obtained a sample of n (positive) observations X1 , X2 , · · · , Xn with average X̄ = (X1 + X2 + . . . + Xn)/n. We also assume that the population is exponentially distributed with rate λ. What is the MLE estimator for λ?

Recall that, for the exponential distribution, we have

f (x ) = λe^(−λx) for x ≥ 0, and f (x ) = 0 for x < 0.

First, we build the log-likelihood function as:

ln L(λ) = ln(λe^(−λX1)) + ln(λe^(−λX2)) + . . . + ln(λe^(−λXn))
        = (ln λ − λX1) + (ln λ − λX2) + . . . + (ln λ − λXn)
        = n ln λ − λ(X1 + X2 + . . . + Xn) .


Again, we find the maximizer:

∂ ln L(λ)/∂λ = (n ln λ − λ(X1 + X2 + . . . + Xn))′ = n/λ − (X1 + X2 + . . . + Xn) = 0
=⇒ λ̂ = n / ∑_{i=1}^n Xi = 1/X̄ .

Observe how we have reached the same result as when using the method of moments. This is not necessarily always the case.
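A quick numerical sketch (with assumed, simulated data) that maximizes this log-likelihood directly and checks the answer against 1/X̄:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(4)
    x = rng.exponential(scale=1 / 2.0, size=200)   # assumed sample with true rate 2

    def neg_log_lik(lam):
        # negative of  n*ln(lambda) - lambda*sum(x)
        return -(len(x) * np.log(lam) - lam * x.sum())

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
    print("numerical MLE:", res.x)
    print("1 / Xbar     :", 1 / x.mean())          # the two should agree closely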


MLE for a Bernoulli distribution. Exercise 17

Consider that we have a population producing random variables distributed as Bernoulli with probability of success p. We have obtained a sample of n observations with q successes (let them be Xi = 1) and n − q failures (Xi = 0). What is the MLE estimator for the unknown p?

Recall that, for a Bernoulli random variable, the probability mass function (PMF) is P(0) = 1 − p and P(1) = p.
Without loss of generality, assume we arrange the observations with the successes first (the first, say, q observations) and the failures next (the remaining n − q observations).
We are now ready to build the likelihood function:

L(p) = (∏_{i=1}^{q} p) · (∏_{i=q+1}^{n} (1 − p)) = p^q · (1 − p)^(n−q) .


We assume that p ∈ (0, 1). Then the log-likelihood function is

ln L(p) = q ln(p) + (n − q) ln(1 − p).

The derivative of the log-likelihood function is

∂ ln L(p)/∂p = q/p − (n − q)/(1 − p).

Now, equate this to 0 to get the maximizer:

q/p − (n − q)/(1 − p) = 0 =⇒ q(1 − p) − (n − q)p = 0 =⇒ p̂ = q/n .

Example: MLE for Normal Distribution


Suppose X1 , X2 , ..., Xn ∼ N(µ, σ 2 ), with unknown µ. What
 is the MLE  estimator
2
i −µ)
for the unknown µ? Recall that f (Xi ; µ, σ 2 ) = √ 1 2 exp − (X2σ 2 is the PDF
2πσ
for each Xi .

The log-likelihood function is:


n
1 (Xi − µ)2 
Y  
ln L(µ, σ ) = ln
2
√ exp −
i=1 2πσ 2 2σ 2
n
n 1 X
= − ln(2πσ 2 ) − 2 (Xi − µ)2 .
2 2σ i=1

Differentiating with respect to µ and setting it to zero:


n n
∂ 1 X X
ln L(µ, σ 2 ) = 2 (Xi − µ) = 0 ⇒ nµ = Xi .
∂µ σ i=1 i=1
Pn
So that µ̂ = 1
n i=1 Xi .