MS Notes Week 10
The Method of Moments is a simple technique based on the idea that the sample
moments are “natural” estimators of population moments.
Example 2.17. Let $Y_i \overset{iid}{\sim} N(\mu, \sigma^2)$. We will find the Method of Moments estimators of $\mu$ and $\sigma^2$.
We have $\mu_1' = E(Y) = \mu$, $\mu_2' = E(Y^2) = \sigma^2 + \mu^2$, $m_1' = \bar{Y}$ and $m_2' = \sum_{i=1}^n Y_i^2 / n$. So, the Method of Moments estimators of $\mu$ and $\sigma^2$ satisfy the equations
\[
\hat{\mu} = \bar{Y}, \qquad \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n} \sum_{i=1}^n Y_i^2.
\]
Thus, we obtain
\[
\hat{\mu} = \bar{Y}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n Y_i^2 - \bar{Y}^2 = \frac{1}{n} \sum_{i=1}^n (Y_i - \bar{Y})^2.
\]
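As a quick numerical check (an addition to the notes, with arbitrary parameter values and sample size), the following Python sketch simulates a normal sample and verifies that the Method of Moments estimates above equal the sample mean and the divisor-$n$ sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=1000)   # simulated N(mu = 5, sigma^2 = 4) sample

# Sample moments m1' and m2'
m1 = y.mean()
m2 = np.mean(y**2)

# Method of Moments estimates: mu_hat = m1', sigma2_hat = m2' - m1'^2
mu_hat = m1
sigma2_hat = m2 - m1**2

# Equivalent form: average squared deviation from the sample mean (divisor n)
print(mu_hat, sigma2_hat, np.mean((y - y.mean())**2))
```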
Example 2.18. Let $Y_i \overset{iid}{\sim} \text{Poisson}(\lambda)$. We will find the Method of Moments estimator of $\lambda$. We know that for this distribution $E(Y_i) = \text{var}(Y_i) = \lambda$. Hence, by comparing the first and second population and sample moments we get two different estimators of the same parameter,
\[
\hat{\lambda}_1 = \bar{Y}, \qquad \hat{\lambda}_2 = \frac{1}{n} \sum_{i=1}^n Y_i^2 - \bar{Y}^2.
\]
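The following short Python sketch (an added illustration, with an arbitrary choice of $\lambda$ and sample size) computes both estimators on a simulated Poisson sample; both are close to the true value, but they are different statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.5                                   # true lambda (arbitrary choice)
y = rng.poisson(lam, size=2000)

lambda_hat_1 = y.mean()                     # estimator from the first moment
lambda_hat_2 = np.mean(y**2) - y.mean()**2  # estimator from the second moment (divisor-n variance)

print(lambda_hat_1, lambda_hat_2)
```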
Exercise 2.11. Let Y = (Y1 , . . . , Yn )T be a random sample from the distribution
with the pdf given by
\[
f(y; \vartheta) =
\begin{cases}
\dfrac{2}{\vartheta^2} (\vartheta - y), & y \in [0, \vartheta], \\[4pt]
0, & \text{elsewhere.}
\end{cases}
\]
The method of Maximum Likelihood was introduced by R. A. Fisher and it is the most common method of constructing estimators. We will illustrate the method with the following simple example.
Example 2.19. Assume that $Y_i \overset{iid}{\sim} \text{Bernoulli}(p)$, $i = 1, 2, 3, 4$, with probability of success equal to $p$, where $p \in \Theta = \{\frac{1}{4}, \frac{2}{4}, \frac{3}{4}\}$, i.e., $p$ belongs to a parameter space of only three elements. We want to estimate the parameter $p$ based on observations of the random sample $Y = (Y_1, Y_2, Y_3, Y_4)^T$.
The different values of the joint pmf for all p ∈ Θ are given in the table below
\[
\begin{array}{c|ccc}
\sum_{i=1}^{4} y_i & p = \tfrac{1}{4} & p = \tfrac{2}{4} & p = \tfrac{3}{4} \\
\hline
0 & \tfrac{81}{256} & \tfrac{16}{256} & \tfrac{1}{256} \\
1 & \tfrac{27}{256} & \tfrac{16}{256} & \tfrac{3}{256} \\
2 & \tfrac{9}{256} & \tfrac{16}{256} & \tfrac{9}{256} \\
3 & \tfrac{3}{256} & \tfrac{16}{256} & \tfrac{27}{256} \\
4 & \tfrac{1}{256} & \tfrac{16}{256} & \tfrac{81}{256}
\end{array}
\]
The entries are the values of $P(Y = y; p)$ for an observed sample $y$ with the given number of successes $\sum_{i=1}^{4} y_i$.
We see that $P(\sum_{i=1}^{4} Y_i = 0)$ is largest when $p = \frac{1}{4}$. It can be interpreted that when the observed value of the random sample is $(0, 0, 0, 0)^T$ the most likely value of the parameter $p$ is $\hat{p} = \frac{1}{4}$. Then, this value can be considered as an estimate of $p$.
Similarly, we can conclude that when the observed value of the random sample is, for example, $(0, 1, 1, 0)^T$, then the most likely value of the parameter is $\hat{p} = \frac{1}{2}$.
Altogether, we have
\[
\begin{aligned}
\hat{p} &= \tfrac{1}{4} && \text{if we observe all failures or just one success;} \\
\hat{p} &= \tfrac{1}{2} && \text{if we observe two failures and two successes;} \\
\hat{p} &= \tfrac{3}{4} && \text{if we observe three successes and one failure, or four successes.}
\end{aligned}
\]
Note that, for each point $(y_1, y_2, y_3, y_4)^T$, the estimate $\hat{p}$ is the value of the parameter $p$ for which the joint mass function, treated as a function of $p$, attains its maximum (or its largest value).
Here, we treat the joint pmf as a function of the parameter $p$ for a given $y$. Such a function is called the likelihood function and is denoted by $L(p|y)$.
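A small Python sketch (an added illustration; the observed sample is chosen arbitrarily) evaluates the likelihood $L(p|y)$ over the three-point parameter space of Example 2.19 and picks the maximizing value, reproducing the reasoning behind the table above.

```python
from fractions import Fraction

# Parameter space from Example 2.19 and an observed sample, e.g. (0, 1, 1, 0)
Theta = [Fraction(1, 4), Fraction(2, 4), Fraction(3, 4)]
y = (0, 1, 1, 0)

def likelihood(p, y):
    """Joint pmf of independent Bernoulli(p) observations, viewed as a function of p."""
    L = Fraction(1)
    for yi in y:
        L *= p if yi == 1 else (1 - p)
    return L

for p in Theta:
    print(p, likelihood(p, y))

# The maximum likelihood estimate is the p in Theta with the largest likelihood
p_hat = max(Theta, key=lambda p: likelihood(p, y))
print("p_hat =", p_hat)
```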
Properties of MLE
MLEs are asymptotically normal and asymptotically unbiased. Also, they are asymptotically efficient, that is,
\[
\operatorname{eff}(g(\hat{\vartheta})) = \lim_{n \to \infty} \frac{\operatorname{CRLB}(g(\vartheta))}{\operatorname{var} g(\hat{\vartheta})} = 1.
\]
In this case, for large $n$, $\operatorname{var} g(\hat{\vartheta})$ is approximately equal to the CRLB. Therefore, for large $n$,
\[
g(\hat{\vartheta}) \sim N\big(g(\vartheta), \operatorname{CRLB}(g(\vartheta))\big)
\]
approximately. This is called the asymptotic distribution of $g(\hat{\vartheta})$.
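To illustrate this asymptotic behaviour, the simulation below (an addition to the notes, using the Poisson model for concreteness, where $\hat{\lambda} = \bar{Y}$ and the CRLB for $\lambda$ is $\lambda/n$) compares the empirical variance of the MLE with the CRLB and standardizes the estimates; the chosen $\lambda$, $n$ and number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 200, 5000            # true lambda, sample size, replications (arbitrary)

# MLE of lambda in the Poisson model is the sample mean of each replicated sample
lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)

crlb = lam / n                           # Cramer-Rao lower bound for estimating lambda
print("empirical variance of the MLE:", lam_hat.var())
print("CRLB:                        ", crlb)

# Standardised estimates should look approximately N(0, 1)
z = (lam_hat - lam) / np.sqrt(crlb)
print("mean of z:", z.mean(), "  sd of z:", z.std())
```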
We need to find the value of $\lambda$ which maximizes the likelihood. This value will also maximize $\ell(\lambda|y) = \log L(\lambda|y)$, which is easier to work with. Now, we have
\[
\ell(\lambda|y) = \sum_{i=1}^n y_i \log \lambda - n\lambda - \sum_{i=1}^n \log(y_i!).
\]
Differentiating with respect to $\lambda$ and setting the derivative equal to zero gives $\sum_{i=1}^n y_i / \lambda - n = 0$, so the maximum likelihood estimator of $\lambda$ is $\hat{\lambda} = \bar{Y}$.
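As a numerical cross-check (an added sketch with arbitrary simulated data), maximizing this Poisson log-likelihood numerically should return the sample mean, in agreement with $\hat{\lambda} = \bar{Y}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(3)
y = rng.poisson(2.5, size=100)           # simulated data with an arbitrary true lambda

def neg_loglik(lam):
    # negative of l(lambda | y) = sum y_i log(lambda) - n*lambda - sum log(y_i!)
    return -(np.sum(y) * np.log(lam) - len(y) * lam - np.sum(gammaln(y + 1)))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print("numerical maximiser:", res.x, "   sample mean:", y.mean())
```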
Thus, we have
\[
\frac{\partial \ell}{\partial \mu} = \frac{1}{2\sigma^2} \sum_{i=1}^n 2(y_i - \mu) = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu)
\]
and
\[
\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2} \, \frac{2\pi}{2\pi\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (y_i - \mu)^2 = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (y_i - \mu)^2.
\]
Setting these derivatives equal to zero gives $\hat{\mu} = \bar{Y}$, and substituting this into the second equation shows that $\hat{\sigma}^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 / n$ is the maximum likelihood estimator of $\sigma^2$. These are the same as the Method of Moments estimators.
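The sketch below (an added illustration with arbitrary simulated data) maximizes the normal log-likelihood numerically and compares the result with the closed-form MLEs $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 / n$ derived above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.normal(10.0, 3.0, size=500)      # arbitrary true mu = 10, sigma = 3

def neg_loglik(theta):
    mu, sigma2 = theta
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)))

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
mu_mle, sigma2_mle = res.x

# Compare the numerical optimum with the closed-form estimators
print(mu_mle, y.mean())
print(sigma2_mle, np.mean((y - y.mean())**2))
```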
\[
f(y; \lambda, \alpha) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\lambda y}, \quad \text{for } y > 0.
\]
(c) Knowing that $E(Y_i) = \alpha \frac{1}{\lambda}$ for all $i = 1, \ldots, n$, check that $\widehat{g(\lambda)}$, the MLE of $g(\lambda)$, is an unbiased estimator of $g(\lambda)$. What can you conclude about the properties of the estimator?
If $Y_1, \ldots, Y_n$ are independent random variables which have the same variance and higher-order moments, and if each $E(Y_i)$ is a linear function of $\vartheta_1, \ldots, \vartheta_p$, then the Least Squares estimates of $\vartheta_1, \ldots, \vartheta_p$ are obtained by minimizing
\[
S(\vartheta) = \sum_{i=1}^n \{Y_i - E(Y_i)\}^2.
\]
The Least Squares estimator of ϑj has minimum variance amongst all linear
unbiased estimators of ϑj and is known as the best linear unbiased estimator
(BLUE). If the Yi s have a normal distribution, then the Least Squares estimator
of ϑj is the Maximum Likelihood estimator, has a normal distribution and is the
MVUE.
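As a minimal illustration of the criterion $S(\vartheta)$ (an added sketch, using the simplest linear model $E(Y_i) = \vartheta$ with arbitrary simulated data), numerically minimizing $S$ recovers the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
y = rng.normal(7.0, 2.0, size=50)        # any sample with constant mean E(Y_i) = theta

def S(theta):
    # least squares criterion S(theta) = sum (Y_i - E(Y_i))^2 with E(Y_i) = theta
    return np.sum((y - theta) ** 2)

res = minimize_scalar(S)
print("minimiser of S:", res.x, "   sample mean:", y.mean())
```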
Example 2.22. Suppose that Y1 , . . . , Yn1 are independent N (µ1 , σ 2 ) random vari-
ables and that Yn1 +1 , . . . , Yn are independent N (µ2 , σ 2 ) random variables. Find
the least squares estimators of µ1 and µ2 .
Since
\[
E(Y_i) =
\begin{cases}
\mu_1, & i = 1, \ldots, n_1, \\
\mu_2, & i = n_1 + 1, \ldots, n,
\end{cases}
\]
it is a linear function of $\mu_1$ and $\mu_2$. The Least Squares estimators are obtained by minimizing
\[
S = \sum_{i=1}^n \{Y_i - E(Y_i)\}^2 = \sum_{i=1}^{n_1} (Y_i - \mu_1)^2 + \sum_{i=n_1+1}^{n} (Y_i - \mu_2)^2.
\]
Now,
\[
\frac{\partial S}{\partial \mu_1} = -2 \sum_{i=1}^{n_1} (Y_i - \mu_1) = 0 \;\Rightarrow\; \hat{\mu}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} Y_i = \bar{Y}_1
\]
and
\[
\frac{\partial S}{\partial \mu_2} = -2 \sum_{i=n_1+1}^{n} (Y_i - \mu_2) = 0 \;\Rightarrow\; \hat{\mu}_2 = \frac{1}{n_2} \sum_{i=n_1+1}^{n} Y_i = \bar{Y}_2,
\]
where $n_2 = n - n_1$.
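A short Python sketch (added here; the group sizes, means and common variance are arbitrary) confirms that the Least Squares estimators in this two-sample model are simply the two group means.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2 = 12, 15                                  # arbitrary group sizes
y = np.concatenate([rng.normal(4.0, 1.0, n1),    # Y_1, ..., Y_{n1}     ~ N(mu1, sigma^2)
                    rng.normal(9.0, 1.0, n2)])   # Y_{n1+1}, ..., Y_n   ~ N(mu2, sigma^2)

# Least squares estimators derived above: the two group means
mu1_hat = y[:n1].mean()
mu2_hat = y[n1:].mean()
print(mu1_hat, mu2_hat)
```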
Now,
\[
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum_{i=1}^n Y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^n x_i = 0 \;\Rightarrow\; \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}
\]
and
\[
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^n x_i (Y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum_{i=1}^n x_i Y_i - \hat{\beta}_0 \sum_{i=1}^n x_i - \hat{\beta}_1 \sum_{i=1}^n x_i^2 = 0.
\]
Substituting $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$ into the second equation and solving gives
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i - n \bar{x} \bar{Y}}{\sum_{i=1}^n x_i^2 - n \bar{x}^2}.
\]
These are the Least Squares estimators of the regression coefficients $\beta_0$ and $\beta_1$.
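The following sketch (an added illustration; the helper name `least_squares_line` is made up) solves the two normal equations above directly for $\hat{\beta}_0$ and $\hat{\beta}_1$.

```python
import numpy as np

def least_squares_line(x, y):
    """Solve the two normal equations above for (beta0_hat, beta1_hat)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    # Normal equations:
    #   n*b0      + b1*sum(x)    = sum(y)
    #   b0*sum(x) + b1*sum(x**2) = sum(x*y)
    A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    beta0_hat, beta1_hat = np.linalg.solve(A, b)
    return beta0_hat, beta1_hat

# Example usage with a small artificial data set
print(least_squares_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.1]))
```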
Exercise 2.13. Given data, (x1 , y1), . . . , (xn , yn ), assume that Yi ∼ N (β0 +β1 xi , σ 2 )
independently for i = 1, 2, . . . , n and that σ 2 is known.
(a) Show that the Maximum Likelihood Estimators of β0 and β1 must be the same
as the Least Squares Estimators of these parameters.
(b) The quench bath temperature in a heat treatment operation was thought to
affect the Rockwell hardness of a certain coil spring. An experiment was
run in which several springs were treated under four different temperatures.
The table gives values of the set temperatures (x) and the observed hardness
(y, coded). Assuming that hardness depends linearly on temperature and that the variance of the random variables is constant, we may write the following model:
\[
E(Y_i) = \beta_0 + \beta_1 x_i, \qquad \operatorname{var}(Y_i) = \sigma^2.
\]
Run 1 2 3 4 5 6 7 8 9 10 11 12 13 14
x 30 30 30 30 40 40 40 50 50 50 60 60 60 60
y 55.8 59.1 54.8 54.6 43.1 42.2 45.2 31.6 30.9 30.8 17.5 20.5 17.2 16.9
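One way to compute the Least Squares fit for these data (an added sketch, not a worked solution from the notes) is to plug the tabled values into the normal equations derived above; the code only computes the fitted coefficients.

```python
import numpy as np

# Temperature (x) and coded hardness (y) from the table above
x = np.array([30, 30, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60], dtype=float)
y = np.array([55.8, 59.1, 54.8, 54.6, 43.1, 42.2, 45.2,
              31.6, 30.9, 30.8, 17.5, 20.5, 17.2, 16.9])

# Least Squares estimates via the normal equations
n = len(x)
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
beta0_hat, beta1_hat = np.linalg.solve(A, b)
print(beta0_hat, beta1_hat)
```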
$T(Y_1, \ldots, Y_n) \to \Theta$,
that is, their values belong to the parameter space. However, the values vary with
the observed sample. If the estimator is MVUE we may expect that “on average”
the calculated estimates are very close to the true parameter and also that the
variability of the estimates is the smallest possible.