
2.3 Methods of Estimation

2.3.1 Method of Moments

The Method of Moments is a simple technique based on the idea that the sample
moments are “natural” estimators of population moments.

The k-th population moment of a random variable Y is

    \mu'_k = E(Y^k), \qquad k = 1, 2, \ldots,

and the k-th sample moment of a sample Y_1, . . . , Y_n is

    m'_k = \frac{1}{n} \sum_{i=1}^{n} Y_i^k, \qquad k = 1, 2, \ldots .

If Y_1, . . . , Y_n are assumed to be independent and identically distributed, then the
Method of Moments estimators of the distribution parameters ϑ_1, . . . , ϑ_p are obtained
by solving the set of p equations

    \mu'_k = m'_k, \qquad k = 1, 2, \ldots, p.
Under fairly general conditions, Method of Moments estimators are asymptotically
normal and asymptotically unbiased. However, they are not, in general, efficient.

Example 2.17. Let Y_1, . . . , Y_n be iid N(µ, σ²) random variables. We will find the
Method of Moments estimators of µ and σ².

We have µ'_1 = E(Y) = µ, µ'_2 = E(Y²) = σ² + µ², m'_1 = Ȳ and m'_2 = (1/n) ∑_{i=1}^n Y_i².
So, the Method of Moments estimators of µ and σ² satisfy the equations

    \hat{\mu} = \bar{Y},
    \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 .

Thus, we obtain

    \hat{\mu} = \bar{Y},
    \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 .
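To make the computation concrete, here is a minimal Python sketch (not part of the original notes) applying these moment equations to simulated normal data; the parameter values and sample size are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    y = rng.normal(loc=5.0, scale=2.0, size=1000)   # simulated N(mu = 5, sigma^2 = 4) sample

    # Match the first two sample moments to the population moments mu and sigma^2 + mu^2.
    m1 = y.mean()                # m'_1 = Ybar
    m2 = np.mean(y**2)           # m'_2 = (1/n) * sum of Y_i^2

    mu_hat = m1                  # mu-hat = Ybar
    sigma2_hat = m2 - m1**2      # sigma^2-hat = m'_2 - Ybar^2 = (1/n) * sum of (Y_i - Ybar)^2

    print(mu_hat, sigma2_hat)    # close to 5 and 4 for a large sample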

Estimators obtained by the Method of Moments are not always unique.

Example 2.18. Let Y_1, . . . , Y_n be iid Poisson(λ) random variables. We will find the
Method of Moments estimator of λ. We know that for this distribution E(Y_i) = var(Y_i) = λ.
Hence, by matching the mean and the variance to their sample counterparts, we get two
different estimators of the same parameter:

    \hat{\lambda}_1 = \bar{Y},
    \hat{\lambda}_2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}^2 .
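A small numerical illustration (a sketch, not from the notes) of how the two moment estimators can differ on the same Poisson sample:

    import numpy as np

    rng = np.random.default_rng(seed=2)
    y = rng.poisson(lam=3.0, size=50)            # simulated Poisson(lambda = 3) sample

    lambda_hat_1 = y.mean()                      # from the mean:     Ybar
    lambda_hat_2 = np.mean(y**2) - y.mean()**2   # from the variance: (1/n) * sum Y_i^2 - Ybar^2

    # Both target lambda = 3, but their values generally differ on a finite sample.
    print(lambda_hat_1, lambda_hat_2)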


Exercise 2.11. Let Y = (Y_1, . . . , Y_n)^T be a random sample from the distribution
with the pdf given by

    f(y; \vartheta) = \begin{cases} \dfrac{2}{\vartheta^2}(\vartheta - y), & y \in [0, \vartheta], \\[4pt] 0, & \text{elsewhere}. \end{cases}

Find an estimator of ϑ using the Method of Moments.

2.3.2 Method of Maximum Likelihood

This method was introduced by R. A. Fisher and is the most common method of
constructing estimators. We will illustrate it with the following simple example.
Example 2.19. Assume that Y_i ~ Bernoulli(p) iid, i = 1, 2, 3, 4, with probability of
success equal to p, where p ∈ Θ = {1/4, 2/4, 3/4}, i.e., p belongs to a parameter space
of only three elements. We want to estimate the parameter p based on observations of
the random sample Y = (Y_1, Y_2, Y_3, Y_4)^T.

The joint pmf is

    P(Y = y; p) = \prod_{i=1}^{4} P(Y_i = y_i; p) = p^{\sum_{i=1}^{4} y_i} (1-p)^{4 - \sum_{i=1}^{4} y_i} .

The different values of the joint pmf for all p ∈ Θ are given in the table below.

    P(Y = y; p)         p = 1/4     p = 2/4     p = 3/4
    ----------------------------------------------------
    ∑ y_i = 0            81/256      16/256       1/256
    ∑ y_i = 1            27/256      16/256       3/256
    ∑ y_i = 2             9/256      16/256       9/256
    ∑ y_i = 3             3/256      16/256      27/256
    ∑ y_i = 4             1/256      16/256      81/256

We see that P(∑_{i=1}^{4} Y_i = 0) is largest when p = 1/4. This can be interpreted
as follows: when the observed value of the random sample is (0, 0, 0, 0)^T, the most
likely value of the parameter p is p̂ = 1/4, so this value can be taken as an estimate
of p. Similarly, when the observed value of the random sample is, for example,
(0, 1, 1, 0)^T, the most likely value of the parameter is p̂ = 1/2. Altogether, we have

    p̂ = 1/4  if we observe all failures or just one success;
    p̂ = 1/2  if we observe two failures and two successes;
    p̂ = 3/4  if we observe three successes and one failure, or four successes.

Note that, for each point (y_1, y_2, y_3, y_4)^T, the estimate p̂ is the value of the
parameter p for which the joint mass function, treated as a function of p, attains
its maximum (its largest value).

Here, we treat the joint pmf as a function of the parameter p for a given y. Such a
function is called the likelihood function and is denoted by L(p|y). □
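The table can be reproduced in a few lines; the sketch below (illustrative only, not part of the notes) evaluates L(p|y) over the three-point parameter space and picks the maximizing value for one observed sample.

    from fractions import Fraction

    Theta = [Fraction(1, 4), Fraction(2, 4), Fraction(3, 4)]    # three-point parameter space

    def likelihood(p, y):
        """Joint Bernoulli pmf p^sum(y) * (1 - p)^(n - sum(y)), viewed as a function of p."""
        s = sum(y)
        return p**s * (1 - p)**(len(y) - s)

    y_obs = (0, 1, 1, 0)                                        # two successes, two failures
    p_hat = max(Theta, key=lambda p: likelihood(p, y_obs))
    print(p_hat)                                                # Fraction(1, 2), matching the table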

Now we introduce a formal definition of the Maximum Likelihood Estimator (MLE).


Definition 2.11. The MLE(ϑ) is the statistic T(Y) = ϑ̂ whose value for a given y
satisfies the condition

    L(\hat{\vartheta}|y) = \sup_{\vartheta \in \Theta} L(\vartheta|y),

where L(ϑ|y) is the likelihood function for ϑ.

Properties of MLE

The MLEs are invariant, that is,

    \mathrm{MLE}(g(\vartheta)) = g(\mathrm{MLE}(\vartheta)) = g(\hat{\vartheta}).

MLEs are asymptotically normal and asymptotically unbiased. They are also
asymptotically efficient, that is,

    \mathrm{eff}(g(\hat{\vartheta})) = \lim_{n \to \infty} \frac{\mathrm{CRLB}(g(\vartheta))}{\mathrm{var}\, g(\hat{\vartheta})} = 1.

In this case, for large n, the variance of the MLE of g(ϑ) is approximately equal
to the CRLB. Therefore, for large n,

    g(\hat{\vartheta}) \sim N\bigl(g(\vartheta), \mathrm{CRLB}(g(\vartheta))\bigr)

approximately. This is called the asymptotic distribution of g(ϑ̂).
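As a rough numerical illustration (not part of the original notes), the sketch below simulates the sampling distribution of the Poisson MLE λ̂ = Ȳ (derived in Example 2.20 below) and compares its variance with λ/n, the Cramér–Rao lower bound for this model; the parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(seed=3)
    lam, n, reps = 3.0, 200, 5000

    # The Poisson MLE of lambda is the sample mean (see Example 2.20);
    # replicate the experiment many times to look at its sampling distribution.
    mle_values = rng.poisson(lam=lam, size=(reps, n)).mean(axis=1)

    print(mle_values.mean())    # close to lambda = 3         (approximately unbiased)
    print(mle_values.var())     # close to lambda / n = 0.015 (variance near the CRLB)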

Example 2.20. Suppose that Y_1, . . . , Y_n are independent Poisson(λ) random
variables. Then the likelihood is

    L(\lambda|y) = \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} = \frac{\lambda^{\sum_{i=1}^{n} y_i}\, e^{-n\lambda}}{\prod_{i=1}^{n} y_i!} .

We need to find the value of λ which maximizes the likelihood. This value will
also maximize ℓ(λ|y) = log L(λ|y), which is easier to work with. Now, we have

    \ell(\lambda|y) = \sum_{i=1}^{n} y_i \log\lambda - n\lambda - \sum_{i=1}^{n} \log(y_i!) .

The value of λ which maximizes ℓ(λ|y) is the solution of dℓ/dλ = 0. Thus, solving
the equation

    \frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^{n} y_i}{\lambda} - n = 0

yields the estimator λ̂ = T(Y) = (1/n) ∑_{i=1}^n Y_i = Ȳ, which is the same as the
Method of Moments estimator. The second derivative is negative for all λ; hence
λ̂ indeed maximizes the log-likelihood. □
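A quick numerical check (an illustrative sketch, not from the notes): evaluate the Poisson log-likelihood on a grid and confirm that the maximizer is essentially the sample mean. The simulated data and grid are arbitrary choices.

    import numpy as np
    from math import lgamma

    rng = np.random.default_rng(seed=4)
    y = rng.poisson(lam=2.5, size=100)

    def log_likelihood(lam):
        # sum(y) * log(lam) - n * lam - sum(log(y_i!)), using log(y!) = lgamma(y + 1)
        return y.sum() * np.log(lam) - len(y) * lam - sum(lgamma(int(v) + 1) for v in y)

    grid = np.linspace(0.1, 6.0, 2000)
    lam_best = max(grid, key=log_likelihood)
    print(lam_best, y.mean())   # the grid maximizer is (approximately) the sample mean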
Example 2.21. Suppose that Y_1, . . . , Y_n are independent N(µ, σ²) random
variables. Then the likelihood is

    L(\mu, \sigma^2|y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu)^2}{2\sigma^2} \right\}
                       = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i-\mu)^2 \right\}

and so the log-likelihood is

    \ell(\mu, \sigma^2|y) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i-\mu)^2 .

Thus, we have

    \frac{\partial\ell}{\partial\mu} = \frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(y_i-\mu) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i-\mu)

and

    \frac{\partial\ell}{\partial\sigma^2} = -\frac{n}{2}\,\frac{2\pi}{2\pi\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i-\mu)^2 = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i-\mu)^2 .

Setting these equations to zero, we obtain

    \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i-\hat{\mu}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = n\hat{\mu},

so that µ̂ = Ȳ is the maximum likelihood estimator of µ, and

    -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{n} (y_i-\bar{y})^2 = 0 \;\Rightarrow\; n\hat{\sigma}^2 = \sum_{i=1}^{n} (y_i-\bar{y})^2,

so that σ̂² = (1/n) ∑_{i=1}^n (Y_i − Ȳ)² is the maximum likelihood estimator of σ².
These are the same as the Method of Moments estimators. □
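The closed-form answer can also be verified numerically; below is a minimal sketch (not part of the notes) that minimizes the negative log-likelihood with a general-purpose optimizer on simulated data, with all numbers chosen arbitrarily.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(seed=5)
    y = rng.normal(loc=1.0, scale=3.0, size=500)

    def neg_loglik(theta, y):
        mu, log_sigma2 = theta                 # optimize log(sigma^2) so that sigma^2 stays positive
        sigma2 = np.exp(log_sigma2)
        n = len(y)
        return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((y - mu)**2) / (2 * sigma2)

    res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,))
    mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

    # Should agree with the closed-form MLEs: Ybar and (1/n) * sum of (Y_i - Ybar)^2.
    print(mu_hat, y.mean())
    print(sigma2_hat, np.mean((y - y.mean())**2))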

Exercise 2.12. Let Y = (Y_1, . . . , Y_n)^T be a random sample from a Gamma
distribution, Gamma(λ, α), with the following pdf:

    f(y; \lambda, \alpha) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\lambda y}, \qquad \text{for } y > 0.

Assume that the parameter α is known.

(a) Identify the complete sufficient statistic for λ.

(b) Find the MLE[g(λ)], where g(λ) = 1/λ. Is the estimator a function of the
complete sufficient statistic?

(c) Knowing that E(Y_i) = α/λ for all i = 1, . . . , n, check that the MLE[g(λ)] is
an unbiased estimator of g(λ). What can you conclude about the properties
of the estimator?

2.3.3 Method of Least Squares

If Y_1, . . . , Y_n are independent random variables which have the same variance
(and higher-order moments), and if each E(Y_i) is a linear function of ϑ_1, . . . , ϑ_p,
then the Least Squares estimates of ϑ_1, . . . , ϑ_p are obtained by minimizing

    S(\vartheta) = \sum_{i=1}^{n} \{ Y_i - E(Y_i) \}^2 .

The Least Squares estimator of ϑ_j has minimum variance amongst all linear
unbiased estimators of ϑ_j and is known as the best linear unbiased estimator
(BLUE). If the Y_i have a normal distribution, then the Least Squares estimator
of ϑ_j is also the Maximum Likelihood estimator, has a normal distribution, and
is the MVUE.

Example 2.22. Suppose that Y_1, . . . , Y_{n_1} are independent N(µ_1, σ²) random
variables and that Y_{n_1+1}, . . . , Y_n are independent N(µ_2, σ²) random variables.
Find the least squares estimators of µ_1 and µ_2.

Since

    E(Y_i) = \begin{cases} \mu_1, & i = 1, \ldots, n_1, \\ \mu_2, & i = n_1 + 1, \ldots, n, \end{cases}

it is a linear function of µ_1 and µ_2. The Least Squares estimators are obtained by
minimizing

    S = \sum_{i=1}^{n} \{ Y_i - E(Y_i) \}^2 = \sum_{i=1}^{n_1} (Y_i - \mu_1)^2 + \sum_{i=n_1+1}^{n} (Y_i - \mu_2)^2 .

Now,

    \frac{\partial S}{\partial \mu_1} = -2 \sum_{i=1}^{n_1} (Y_i - \mu_1) = 0 \;\Rightarrow\; \hat{\mu}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} Y_i = \bar{Y}_1

and

    \frac{\partial S}{\partial \mu_2} = -2 \sum_{i=n_1+1}^{n} (Y_i - \mu_2) = 0 \;\Rightarrow\; \hat{\mu}_2 = \frac{1}{n_2} \sum_{i=n_1+1}^{n} Y_i = \bar{Y}_2 ,

where n_2 = n − n_1. So, we estimate the mean of each group in the population by
the mean of the corresponding sample. □
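For concreteness, a minimal Python sketch of these two estimators (with made-up illustrative numbers, not from the notes):

    import numpy as np

    # Illustrative data: the first n1 observations form group 1, the rest form group 2.
    y = np.array([4.9, 5.3, 5.1, 4.8, 7.2, 6.9, 7.4, 7.1, 7.0])
    n1 = 4

    # The least squares estimate of each group mean is the corresponding sample mean.
    mu1_hat = y[:n1].mean()
    mu2_hat = y[n1:].mean()
    print(mu1_hat, mu2_hat)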

Example 2.23. Suppose that Y_i ∼ N(β_0 + β_1 x_i, σ²) independently for
i = 1, 2, . . . , n, where x_i is some explanatory variable. This is called the simple
linear regression model. Find the least squares estimators of β_0 and β_1.

Since E(Y_i) = β_0 + β_1 x_i is a linear function of β_0 and β_1, we can obtain the
least squares estimates by minimizing

    S = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 .

Now,

    \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i) = 0
    \;\Rightarrow\; \sum_{i=1}^{n} Y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0
    \;\Rightarrow\; \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}

and

    \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (Y_i - \beta_0 - \beta_1 x_i) = 0
    \;\Rightarrow\; \sum_{i=1}^{n} x_i Y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0 .

Substituting the first equation into the second one, we have

    \sum_{i=1}^{n} x_i Y_i - (\bar{Y} - \hat{\beta}_1 \bar{x}) \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0
    \;\Rightarrow\; \left( n\bar{x}^2 - \sum_{i=1}^{n} x_i^2 \right) \hat{\beta}_1 = n\bar{x}\bar{Y} - \sum_{i=1}^{n} x_i Y_i .

Hence, we have the estimators

    \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}
    \qquad \text{and} \qquad
    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} .

These are the Least Squares estimators of the regression coefficients β_0 and β_1.
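The two formulas translate directly into code; the following sketch (illustrative data only, not from the notes) computes the estimates.

    import numpy as np

    # Illustrative data for the model E(Y_i) = beta0 + beta1 * x_i.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    xbar, ybar = x.mean(), y.mean()
    n = len(x)

    beta1_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
    beta0_hat = ybar - beta1_hat * xbar

    print(beta0_hat, beta1_hat)   # fitted intercept and slope

For this simple model the same values are returned by np.polyfit(x, y, 1), which performs an equivalent least squares fit.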


Exercise 2.13. Given data (x_1, y_1), . . . , (x_n, y_n), assume that Y_i ∼ N(β_0 + β_1 x_i, σ²)
independently for i = 1, 2, . . . , n and that σ² is known.

(a) Show that the Maximum Likelihood Estimators of β0 and β1 must be the same
as the Least Squares Estimators of these parameters.

(b) The quench bath temperature in a heat treatment operation was thought to
affect the Rockwell hardness of a certain coil spring. An experiment was run in
which several springs were treated under four different temperatures. The table
gives values of the set temperatures (x) and the observed hardness (y, coded).
Assuming that hardness depends on temperature linearly and that the variance of
the random variables is constant, we may write the following model:

    E(Y_i) = \beta_0 + \beta_1 x_i, \qquad \mathrm{var}(Y_i) = \sigma^2 .

Calculate the LS estimates of β_0 and β_1. What is the estimate of the expected
hardness at temperature x = 40?

Run 1 2 3 4 5 6 7 8 9 10 11 12 13 14
x 30 30 30 30 40 40 40 50 50 50 60 60 60 60
y 55.8 59.1 54.8 54.6 43.1 42.2 45.2 31.6 30.9 30.8 17.5 20.5 17.2 16.9

In this section we have been considering so-called point estimators. We used
various methods, such as the Method of Moments, Maximum Likelihood or Least
Squares, to derive them. We may also construct estimators using the Rao-Blackwell
Theorem. The estimators are functions

    T(Y_1, . . . , Y_n) → Θ,

that is, their values belong to the parameter space. However, the values vary with
the observed sample. If the estimator is MVUE, we may expect that "on average"
the calculated estimates are very close to the true parameter and also that the
variability of the estimates is the smallest possible.

Sometimes it is more appropriate to construct an interval which covers the unknown
parameter with high probability and whose limits depend on the sample. We
introduce such intervals in the next section. The point estimators are used in
constructing these intervals.
