ST Topic 4
1 Point estimators
2 Bias of an estimator
Properties of unbiased estimators
Unbiased estimators Examples
Variance of an estimator
Mean square error
3 Efficiency of an estimator
Cramér-Rao Lower Bound
Relative efficiency
Efficiency Examples
4 Consistency of an estimator
Consistency Examples
5 Method of Moments
Moments
MOM Examples
6 Maximum Likelihood Estimation
MLE Examples
Point estimators
Of course, once we have picked a sample, θ̂ can be calculated and assigned a value. This value is called the point estimate. To summarize, the estimator θ̂ is typically a "formula", whereas the point estimate is an actual number.
Are you looking for the (unknown) mean of a population? Collect a sample and report its average.
Are you looking for the (unknown) population variance? Collect a sample and report its sample variance.
Recall that every distribution we have seen had some parameters that were required to define it. For example, the binomial distribution needed n > 0 and 0 ≤ p ≤ 1, whereas the exponential distribution or the Poisson distribution needed λ > 0.
For a single population:
Parameters                  Point estimators
Population mean µ           Sample average θ̂ = X̄
Population variance σ²      Sample variance θ̂ = S²
Population proportion p     Sample proportion θ̂ = p̂
Example
that is, the mean statistic will be X̄ and the point estimate will be x̄ = 800€.
Bias of an estimator
Bias
We define the bias of a point estimator as the difference between its expectation and the parameter itself:
bias[θ̂] = E[θ̂] − θ.
Bias example
Assume a population with mean µ and variance σ². As the mean is unknown, you decide to use the following three approaches to estimate it:
1 Get the average from a sample of 3 randomly picked observations, that is, (X1 + X2 + X3)/3.
2 Get a sample of 3 randomly picked observations and calculate (2·X1 + X2 − X3)/2.
3 Get a sample of 3 randomly picked observations and calculate 2·X1 + X2 − X3.
θ̂1 = (X1 + X2 + X3)/3. Then
E[θ̂1] = E[(X1 + X2 + X3)/3] = (1/3)(E[X1] + E[X2] + E[X3]) = (1/3)(µ + µ + µ) = µ ⟹ bias[θ̂1] = µ − µ = 0.
θ̂2 = (2·X1 + X2 − X3)/2. Then
E[θ̂2] = E[(2·X1 + X2 − X3)/2] = (2µ + µ − µ)/2 = µ ⟹ bias[θ̂2] = µ − µ = 0.
θ̂3 = 2·X1 + X2 − X3. Then
E[θ̂3] = E[2·X1 + X2 − X3] = 2µ + µ − µ = 2µ ⟹ bias[θ̂3] = 2µ − µ = µ.
The first two estimators are unbiased. The last one is biased and its bias is µ.
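To make the bias calculations concrete, here is a minimal simulation sketch (not part of the slides); the normal population with µ = 10 and σ = 2 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 10.0, 2.0, 200_000   # arbitrary illustration values

# Draw `reps` samples of size 3 and evaluate each estimator on every sample.
X = rng.normal(mu, sigma, size=(reps, 3))
theta1 = X.mean(axis=1)                          # (X1 + X2 + X3) / 3
theta2 = (2 * X[:, 0] + X[:, 1] - X[:, 2]) / 2   # (2·X1 + X2 − X3) / 2
theta3 = 2 * X[:, 0] + X[:, 1] - X[:, 2]         # 2·X1 + X2 − X3

for name, est in [("theta1", theta1), ("theta2", theta2), ("theta3", theta3)]:
    print(name, "empirical bias ≈", round(est.mean() - mu, 3))
# Expected: ≈ 0 for theta1 and theta2, ≈ µ (= 10) for theta3.
```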
Properties of unbiased estimators
If θ̂1 and θ̂2 are two unbiased estimators of the parameter θ, then any estimator θ̂ defined as a weighted average
θ̂ = λ·θ̂1 + (1 − λ)·θ̂2, with 0 ≤ λ ≤ 1
(for example, θ̂ = (θ̂1 + θ̂2)/2), is also an unbiased estimator of θ, since E[θ̂] = λ·E[θ̂1] + (1 − λ)·E[θ̂2] = λ·θ + (1 − λ)·θ = θ.
Unbiased estimators Examples
Show that
S² = ∑_{i=1}^{n} (Xi − X̄)² / (n − 1)
is an unbiased estimator of the population variance.
Since E[∑_{i=1}^{n} (Xi − X̄)²] = (n − 1)·σ², we have E[S²] = σ². Hence
bias[S²] = E[S²] − σ² = 0.
Variance of an estimator
Standard Error
We define the standard error of a point estimator as the square root of its variance, that is,
SE[θ̂] = √Var[θ̂].
We want this to be as small as possible. A point estimator with zero bias and minimum variance is called a minimum variance unbiased estimator.
Variances example
Assume the same population with unknown mean µ and variance σ² (as in the bias example). We use again the three estimators from before. What are their variances?
θ̂1 = (X1 + X2 + X3)/3. Then
Var[θ̂1] = Var[(X1 + X2 + X3)/3] = (1/9)(Var[X1] + Var[X2] + Var[X3]) = (1/9)·3σ² = σ²/3.
θ̂2 = (2·X1 + X2 − X3)/2. Then
Var[θ̂2] = Var[(2·X1 + X2 − X3)/2] = Var[X1] + (1/4)·Var[X2] + (1/4)·Var[X3] = σ² + σ²/4 + σ²/4 = (3/2)·σ².
θ̂3 = 2·X1 + X2 − X3. Then
Var[θ̂3] = 4·Var[X1] + Var[X2] + Var[X3] = 4σ² + σ² + σ² = 6σ².
Mean square error
MSE = E[(θ̂ − θ)²]
We can use this to derive the fact that the mean square error is equal to the variance plus the square of the bias:
MSE(θ̂) = E[(θ̂ − θ)²] = E[(θ̂ − E[θ̂])²] + (θ − E[θ̂])² − 2·E[(θ̂ − E[θ̂])·(θ − E[θ̂])] = Var[θ̂] + bias[θ̂]²,
where the cross term vanishes because E[θ̂ − E[θ̂]] = 0 and θ − E[θ̂] is a constant.
By definition, the MSE captures both bias and variance at the same time. Hence, we typically say that one estimator is better than another if its MSE is smaller.
For the three estimators of the example:
MSE(θ̂1) = Var[θ̂1] + bias[θ̂1]² = σ²/3 + 0 = σ²/3,
MSE(θ̂2) = Var[θ̂2] + bias[θ̂2]² = (3/2)·σ² + 0 = (3/2)·σ²,
MSE(θ̂3) = Var[θ̂3] + bias[θ̂3]² = 6σ² + µ².
Hence θ̂1 has the smallest MSE (as expected), followed by θ̂2.
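The MSE decomposition can also be checked by simulation; the sketch below reuses the same hypothetical normal population (µ = 10, σ = 2) and compares the directly estimated MSE with Var + bias² for each estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, reps = 10.0, 2.0, 200_000   # arbitrary illustration values

X = rng.normal(mu, sigma, size=(reps, 3))
estimators = {
    "theta1": X.mean(axis=1),
    "theta2": (2 * X[:, 0] + X[:, 1] - X[:, 2]) / 2,
    "theta3": 2 * X[:, 0] + X[:, 1] - X[:, 2],
}
for name, est in estimators.items():
    mse = np.mean((est - mu) ** 2)                        # E[(θ̂ − θ)²]
    var_plus_bias2 = est.var() + (est.mean() - mu) ** 2   # Var[θ̂] + bias[θ̂]²
    print(f"{name}: MSE ≈ {mse:.2f}, Var + bias² ≈ {var_plus_bias2:.2f}")
# With σ = 2 and µ = 10: σ²/3 ≈ 1.33, (3/2)σ² = 6, 6σ² + µ² = 124.
```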
Efficiency of an estimator
Cramér-Rao Lower Bound
To select the most efficient estimator among the set of all possible estimators of an unknown parameter θ, we would need to calculate their variances and check which is the smallest.
We may not be able to define the set of all possible estimators, so we typically investigate the efficiency of an estimator by comparing its variance to a quantity called the Cramér-Rao bound.
The Cramér-Rao bound provides a lower bound for the variance of any unbiased estimator:
Var[θ̂] ≥ 1 / I(θ),
where I(θ) is the Fisher information, defined as
I(θ) = E[(∂/∂θ ln f(X; θ))²].
For a random sample of size n, the information of the sample is I_n(θ) = n·I(θ), so the bound becomes Var[θ̂] ≥ 1 / (n·I(θ)).
Example: the exponential distribution. Consider an exponential population with PDF f(x; λ) = λ·e^(−λx) for x ≥ 0, where E[X] = 1/λ, Var[X] = 1/λ² and E[X²] = Var[X] + E[X]² = 2/λ².
Then, for x ≥ 0,
∂/∂λ ln f(X; λ) = ∂/∂λ (ln λ − λX) = 1/λ − X.
So the Fisher information is
I(λ) = E[(∂/∂λ ln f(X; λ))²] = E[(1/λ − X)²] = E[1/λ² − 2X/λ + X²] = 1/λ² − 2/λ² + 2/λ² = 1/λ².
Hence, for a sample of size n, any unbiased estimator λ̂ satisfies
Var(λ̂) ≥ 1 / I_n(λ) = λ²/n.
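As a sanity check on the Fisher information computed above, the following sketch (an arbitrary rate λ = 1.5 and a plain Monte Carlo average, not from the slides) estimates E[(1/λ − X)²] and compares it with 1/λ².

```python
import numpy as np

rng = np.random.default_rng(2)
lam, draws = 1.5, 1_000_000                      # arbitrary rate, many draws for a stable average

X = rng.exponential(scale=1 / lam, size=draws)   # NumPy parametrizes Exp by its mean 1/λ
score = 1 / lam - X                              # ∂/∂λ ln f(X; λ)
print("Monte Carlo I(λ):", np.mean(score ** 2))  # ≈ 1/λ²
print("Theoretical I(λ):", 1 / lam ** 2)
print("Cramér-Rao bound for n = 50:", lam ** 2 / 50)   # λ²/n
```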
For an unbiased estimator θ̂, the efficiency is defined as the ratio of the Cramér-Rao bound to the estimator's variance:
Efficiency(θ̂) = Cramér-Rao Bound / Var(θ̂).
Example: efficiency of the sample mean for a normal population.
For a normal distribution, f(x; µ) = (1/√(2πσ²))·e^(−(x − µ)²/(2σ²)). Then
ln f(x; µ) = ln(1/√(2πσ²)) − (x − µ)²/(2σ²),
so that
∂/∂µ ln f(x; µ) = (x − µ)/σ².
Hence
I(µ) = E[((X − µ)/σ²)²] = E[(X − µ)²]/σ⁴ = 1/σ²,
and the Cramér-Rao bound for a sample of size n is 1/(n·I(µ)) = σ²/n.
Take µ̂ = X̄, the sample mean. Its variance is
Var(µ̂) = σ²/n.
Since this exactly matches the Cramér-Rao bound, the efficiency is
Efficiency(µ̂) = Cramér-Rao Bound / Var(µ̂) = (σ²/n) / (σ²/n) = 1.
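A small simulation (arbitrary values µ = 0, σ = 2, n = 25, not from the slides) confirms that the variance of X̄ sits right at the Cramér-Rao bound σ²/n, i.e. efficiency ≈ 1.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 2.0, 25, 100_000   # arbitrary illustration values

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
bound = sigma ** 2 / n
print("empirical Var(X̄):", xbar.var())
print("Cramér-Rao bound σ²/n:", bound)
print("estimated efficiency:", bound / xbar.var())   # ≈ 1
```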
Relative efficiency
We may also define the relative efficiency as the ratio of the two estimators' mean square errors: if θ̂1 and θ̂2 are unbiased estimators of θ, the relative efficiency of θ̂1 with respect to θ̂2 is
Relative efficiency(θ̂1, θ̂2) = Efficiency[θ̂1] / Efficiency[θ̂2] = Var[θ̂2] / Var[θ̂1].
If the relative efficiency is less than 1, then we say that the point estimator θ̂2 is preferred to the point estimator θ̂1; equivalently, the variance of θ̂2 is smaller than the variance of θ̂1.
Relative efficiency example
Assume the same population with unknown mean µ and variance σ². We use the three estimators θ̂1, θ̂2, and θ̂3. The relative efficiency of the only two unbiased estimators is:
θ̂1, θ̂2:
Relative efficiency(θ̂1, θ̂2) = Var[θ̂2] / Var[θ̂1] = ((3/2)·σ²) / (σ²/3) = 9/2 > 1,
so θ̂1 is preferred.
Efficiency: example
Let (X1, X2, …, Xn) be a simple random sample from a population N(µ, σ). We use the sample mean X̄ and the sample median Xm as estimators of the population mean µ. Study their relative efficiency.
Since the original population is normally distributed, both are unbiased estimators, due to the symmetry of the distribution and the coincidence of the mean, median and mode.
Recall that Var(X̄) = σ²/n.
It can be proved that Var(Xm) = (π/2)·(σ²/n) ≈ 1.57·(σ²/n).
Hence, the relative efficiency is
Var(Xm) / Var(X̄) = (1.57·σ²/n) / (σ²/n) = 1.57.
This implies that the sample median Xm is 1.57 times less efficient than the sample mean X̄ in estimating the population mean µ.
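The π/2 factor can be seen empirically; the sketch below (an arbitrary standard normal population, with an odd sample size so the median is a single order statistic) estimates the variance ratio of median and mean.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 101, 50_000   # odd n so the sample median is one observation

samples = rng.normal(0.0, 1.0, size=(reps, n))
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
print("Var(Xm) / Var(X̄) ≈", var_median / var_mean)   # ≈ π/2 ≈ 1.57 for large n
```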
Consistency of an estimator
Until now the estimators were based on random samples of size n, but it seems logical to think that when n is larger the estimator will be better.
Let {θ̂1, θ̂2, …, θ̂n} be a sequence of estimators of the parameter θ obtained from samples of sizes 1, 2, …, n, respectively. We say that the sequence θ̂n is consistent if, for all ε > 0,
lim_{n→∞} P(|θ̂n − θ| ≥ ε) = 0.
This means that as the sample size n increases, the estimator θ̂n converges in probability to the true parameter θ. For example:
If estimating the population mean µ, then θ̂n = X̄n = (1/n) ∑_{i=1}^{n} Xi.
If estimating the population variance σ², then θ̂n = Sn² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄n)².
A sufficient condition for consistency: if θ̂n is an unbiased estimator of θ for every n and
lim_{n→∞} Var(θ̂n) = 0,
then θ̂n is a consistent estimator of θ.
Consistency Examples
The sample mean: since E[X̄n] = µ, X̄n is an unbiased estimator of µ for every n, and
lim_{n→∞} Var(X̄n) = lim_{n→∞} σ²/n = 0,
so X̄n is a consistent estimator of µ.
The sample variance: recall that, since E[Sn²] = σ², Sn² is an unbiased estimator for every n.
If the population follows a normal distribution, recall that, for every n, the variance of Sn² is given by
Var(Sn²) = 2σ⁴/(n − 1),
so that
lim_{n→∞} Var(Sn²) = lim_{n→∞} 2σ⁴/(n − 1) = 0,
and Sn² is a consistent estimator of σ².
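Consistency of X̄n can be illustrated by estimating P(|X̄n − µ| ≥ ε) for growing n; in the sketch below the population (normal with µ = 3, σ = 2) and ε = 0.1 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, eps, reps = 3.0, 2.0, 0.1, 20_000   # arbitrary illustration values

for n in (10, 100, 1_000, 10_000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) >= eps)
    print(f"n={n:>6}: P(|X̄n − µ| ≥ {eps}) ≈ {prob:.3f}")   # shrinks toward 0 as n grows
```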
Method of Moments
Moments
Suppose that we have a population X distributed with PDF f(x). We have managed to collect a set of samples from the population, X1, X2, …, Xn. Then:
The k-th population moment of a continuous population X is
E[X^k] = ∫_{−∞}^{+∞} x^k f(x) dx.
In particular, the second population moment is related to the variance:
Var[X] = E[X²] − (E[X])².
Moments: Example
Consider a population X distributed with f(x) = (1/2)(1 − α·x), where α is some (unknown) parameter and −1 ≤ x ≤ 1.
First moment:
E[X] = ∫_{−1}^{+1} x·f(x) dx = ∫_{−1}^{+1} x·(1/2)(1 − α·x) dx = −α/3.
Third moment:
E[X³] = ∫_{−1}^{+1} x³·f(x) dx = ∫_{−1}^{+1} x³·(1/2)(1 − α·x) dx = −α/5.
MOM Examples
Sample moments: suppose we have collected a sample of n = 5 observations from X: X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83.
First moment:
(1/n) ∑_{i=1}^{n} Xi = (1/5)(0.7 + 0.77 + 0.65 + 0.5 + 0.83) = 0.69.
Second moment:
(1/n) ∑_{i=1}^{n} Xi² = (1/5)(0.7² + 0.77² + 0.65² + 0.5² + 0.83²) = 0.48886.
Third moment:
(1/n) ∑_{i=1}^{n} Xi³ = (1/5)(0.7³ + 0.77³ + 0.65³ + 0.5³ + 0.83³) = 0.354189.
The method of moments in general: suppose the population distribution depends on m unknown parameters θ1, θ2, …, θm.
The k-th population moment E[X^k] depends on the unknown parameters θ1, θ2, …, θm.
The k-th moment of the sample, (1/n) ∑_{i=1}^{n} Xi^k, depends only on the data (the sample itself)!
The method of moments equates the first m population moments to the corresponding sample moments and solves the resulting system of equations for θ1, θ2, …, θm.
Observation
We may need to use more than m moments if some of them are zero or produce equations in the same unknowns as the previous ones.
MOM: Example 1
Recall the population X distributed with f(x) = (1/2)(1 − α·x), where α is some (unknown) parameter and −1 ≤ x ≤ 1.
We have collected a sample of n = 5 observations from X and we found the observations to be X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83.
The first population and the first sample moments are
E[X] = −α/3 and (1/n) ∑_{i=1}^{n} Xi = 0.69.
Equating, we get
−α/3 = 0.69 ⟹ α̂ = −2.07.
We put a "hat" on top of α when we assign a value to it in the end. This is done to signal that this is merely an estimate and is not necessarily the true value.
In general, letting X̄ = (1/n) ∑_{i=1}^{n} Xi, given any sample of n observations, we may calculate the method of moments estimator for α as
α̂ = −3·X̄.
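In code, the method of moments estimate for this example reduces to one line; here is a minimal sketch using the sample from the slides.

```python
import numpy as np

sample = np.array([0.7, 0.77, 0.65, 0.5, 0.83])
alpha_hat = -3 * sample.mean()   # equate E[X] = −α/3 to the first sample moment X̄
print(alpha_hat)                 # -2.07
```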
MOM: Example 2
Consider a (continuous) population X distributed with PDF
f(x) = 1/2 − (1/4)·θ1·x + (1/8)·θ2·x³, −1 ≤ x ≤ 1,
where θ1 and θ2 are unknown. We have collected a sample of size n = 4 from the population: X1, X2, X3, X4. What are the method of moments estimators for θ1 and θ2?
Here is the final system of equations:
−θ1/6 + θ2/20 = X̄,
−θ1/10 + θ2/28 = (1/n) ∑_{i=1}^{n} Xi³.
Assume that our sample was X1 = 0.7, X2 = 0.6, X3 = 0.3, X4 = 0.7. Then:
X̄ = (1/4) ∑ Xi = 2.3/4 = 0.575,
(1/4) ∑ Xi³ = 0.23225.
Solving this linear system in θ1 and θ2 gives the method of moments estimates θ̂1 and θ̂2.
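Since the two moment equations are linear in θ1 and θ2, they can be solved with a small linear-algebra call; a sketch using the sample above:

```python
import numpy as np

sample = np.array([0.7, 0.6, 0.3, 0.7])
m1 = sample.mean()            # first sample moment X̄
m3 = np.mean(sample ** 3)     # third sample moment (1/n) Σ Xi³

# Moment equations: −θ1/6 + θ2/20 = m1   and   −θ1/10 + θ2/28 = m3
A = np.array([[-1 / 6, 1 / 20],
              [-1 / 10, 1 / 28]])
theta1_hat, theta2_hat = np.linalg.solve(A, np.array([m1, m3]))
print(theta1_hat, theta2_hat)
```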
MOM for an exponential population: the first population moment is E[X] = 1/λ. Equating it to the first sample moment X̄ gives
λ̂ = 1/X̄ = n / ∑_{i=1}^{n} Xi.
MOM for a population with unknown mean µ and variance σ²:
For the population moments, we need (at least) the first two: E[X] and E[X²]. The first one is easy, as it is equal to µ.
The second one, on the other hand, is not the variance: it is used in the variance calculation! Recall that σ² = E[X²] − (E[X])², so E[X²] = σ² + (E[X])² = σ² + µ².
In summary, we have:
E[X] = µ,
E[X²] = σ² + µ².
Equating to the sample moments:
µ = X̄ ⟹ µ̂ = X̄,
µ² + σ² = (1/n) ∑_{i=1}^{n} Xi² ⟹ σ̂² = (∑_{i=1}^{n} Xi² − n·µ̂²)/n = (∑_{i=1}^{n} Xi² − n·X̄²)/n.
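A short sketch of these two formulas on simulated data (the N(5, 3²) population and the sample size are arbitrary choices); it also shows that the MOM variance estimate coincides with the divide-by-n sample variance.

```python
import numpy as np

rng = np.random.default_rng(6)
sample = rng.normal(5.0, 3.0, size=200)   # arbitrary data, just to exercise the formulas

mu_hat = sample.mean()                               # from µ = X̄
sigma2_hat = np.mean(sample ** 2) - mu_hat ** 2      # from µ² + σ² = (1/n) Σ Xi²
print(mu_hat, sigma2_hat)
print(np.isclose(sigma2_hat, sample.var(ddof=0)))    # MOM σ̂² equals the divide-by-n variance
```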
MOM for a Bernoulli population with unknown success probability p:
Let us run n experiments of the Bernoulli random variable and mark each of them as Xi, with a 1 (when successful) or a 0 (when failed). Then:
E[X] = 0·P(X = 0) + 1·P(X = 1) = 0·(1 − p) + 1·p = p,
X̄ = (1/n) ∑_{i=1}^{n} Xi, the first sample moment.
Equating the two gives p̂ = X̄, the proportion of successes in the sample.
Maximum Likelihood Estimation
Given a sample X1, X2, …, Xn from a population with PDF (or PMF) f(x; θ), the likelihood function is
L(θ) = ∏_{i=1}^{n} f(Xi; θ),
and the maximum likelihood estimator (MLE) θ̂ is the value of θ that maximizes L(θ), or equivalently ln L(θ).
Properties of maximum likelihood estimators:
Unbiasedness: an MLE does not have to be unbiased, but MLEs are asymptotically unbiased. That is, as the sample size n → ∞, the bias of an MLE tends to zero.
Consistency: all MLEs are consistent.
Efficiency: if there is an efficient estimator θ̂ of the parameter θ, then it is also the maximum likelihood estimator and it is unique. However, not every maximum likelihood estimator is efficient. Maximum likelihood estimators are asymptotically efficient.
MLE: Example 1
We have a sample X1 = 0.7, X2 = 0.77, X3 = 0.65, X4 = 0.5, X5 = 0.83 from a distribution with PDF f(x) = (1/2)(1 − α·x), −1 ≤ x ≤ 1. Find the MLE of α.
The log-likelihood is
ln L(α) = ∑_{i=1}^{n} ln[(1/2)(1 − α·Xi)] = −n·ln 2 + ∑_{i=1}^{n} ln(1 − α·Xi).
Hence
∂ln L/∂α = −0.7/(1 − 0.7α) − 0.77/(1 − 0.77α) − 0.65/(1 − 0.65α) − 0.5/(1 − 0.5α) − 0.83/(1 − 0.83α) = 0
⟹ α̂ ≈ 1.88.
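The score equation above has no closed-form solution, so it must be solved numerically. Below is a minimal bisection sketch (not from the slides); the bracket [1.6, 1.95] is an assumption chosen by inspection so that it contains the root and avoids the poles at α = 1/Xi.

```python
import numpy as np

x = np.array([0.7, 0.77, 0.65, 0.5, 0.83])

def score(alpha):
    # ∂/∂α ln L(α) = −Σ Xi / (1 − α·Xi)
    return -np.sum(x / (1 - alpha * x))

lo, hi = 1.6, 1.95          # bracket with a sign change, avoiding the poles at 1/Xi
for _ in range(60):
    mid = (lo + hi) / 2
    if score(lo) * score(mid) <= 0:
        hi = mid
    else:
        lo = mid
print((lo + hi) / 2)        # ≈ 1.88, matching the slide
```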
MLE for an exponential population: with f(x; λ) = λ·e^(−λx), the log-likelihood of the sample is ln L(λ) = n·ln λ − λ·(X1 + X2 + … + Xn). Then
∂ln L(λ)/∂λ = (n·ln λ − λ·(X1 + X2 + … + Xn))′ = 0 ⟹
n/λ − (X1 + X2 + … + Xn) = 0 ⟹ λ̂ = n / ∑_{i=1}^{n} Xi = 1/X̄.
Observe how we have reached the same result as when using the method of moments. This is not necessarily always the case.
MLE for a Bernoulli population: recall that, for a Bernoulli random variable, its probability mass function (PMF) is P(0) = 1 − p and P(1) = p.
Without loss of generality, assume we arrange the observations with the successes first (the first, say, q observations) and the failures next (the remaining n − q observations).
We are now ready to build the likelihood function:
L(p) = (∏_{i=1}^{q} p) · (∏_{i=q+1}^{n} (1 − p)) = p^q · (1 − p)^(n−q).
Taking logarithms and differentiating,
∂ln L(p)/∂p = q/p − (n − q)/(1 − p).
Now, equate this to 0 to get the maximizer:
q/p − (n − q)/(1 − p) = 0 ⟹ q(1 − p) − (n − q)p = 0 ⟹ p̂ = q/n.
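As a quick check that p̂ = q/n really maximizes the Bernoulli likelihood, this sketch (with an arbitrary simulated 0/1 sample) compares the closed form with a brute-force grid maximization of ln L(p).

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=40)    # 0/1 sample; true p = 0.3 is an arbitrary choice
q, n = x.sum(), x.size

p_grid = np.linspace(0.001, 0.999, 9_999)
loglik = q * np.log(p_grid) + (n - q) * np.log(1 - p_grid)   # ln L(p) = q·ln p + (n−q)·ln(1−p)
print("grid maximizer:", p_grid[np.argmax(loglik)])
print("closed form q/n:", q / n)     # the two agree up to grid resolution
```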