
MTH211A: Theory of Statistics

Module 4: Methods of Point Estimation

In the previous modules, we have talked about properties of good estimators. Next, we will discuss
some common methods of finding point estimators of θ. The two most popular frequentist methods of
finding estimators are:
(A) method of moments (MoM), and
(B) maximum likelihood (ML) estimation.
We will now discuss these methods in detail.

1 Method of Moments (MoM)


One of the simplest and oldest methods of finding estimators is the method of moments or substitution principle. Let X1 , . . . , Xn be a random sample from a population with distribution function {Fθ ; θ ∈ Θ}. Method of moments estimators are found by equating the first k sample moments to the corresponding k population moments, and solving the resulting system of equations.
Let θ be a k-dimensional parameter. Then we usually require a system of k equations to obtain an
estimate of θ. Let Mr′ be the r-th sample raw moment, and µ′r = E (X1r ) be the r-th raw moment of
Fθ , r = 1, . . . , k. Of course, for each r, µ′r will be a function of θ. Then we obtain estimates of θ by
solving the following equations:
Mr′ = µ′r , r = 1, . . . , k.

Example 1. Let X1 , . . . , Xn be a random sample from Normal(µ, σ 2 ). The MoM estimators of µ and
σ 2 are X̄ and S 2 respectively.
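As a small sketch of Example 1 (the function name is illustrative, and S 2 here is taken with the n-divisor, as the substitution principle yields), the MoM computation can be written as:

```python
# Method of moments for Normal(mu, sigma^2): equate the first two sample raw
# moments M1', M2' to mu and mu^2 + sigma^2. Solving gives mu_hat = xbar and
# sigma2_hat = M2' - M1'^2 = (1/n) sum (x_i - xbar)^2.

def mom_normal(xs):
    n = len(xs)
    m1 = sum(xs) / n                   # first sample raw moment M1'
    m2 = sum(x * x for x in xs) / n    # second sample raw moment M2'
    return m1, m2 - m1 ** 2            # (mu_hat, sigma2_hat)

mu_hat, s2_hat = mom_normal([1.0, 2.0, 3.0, 4.0])
print(mu_hat, s2_hat)  # 2.5 1.25
```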
Remark 1. From the Weak (Strong) Law of Large Numbers, Mr′ → µ′r in probability (almost surely). Therefore, if one is interested in the population moments, then MoM provides consistent (strongly consistent) estimators.
Remark 2. However, MoM may lead to estimators having sub-optimal sampling properties, and may
lead to absurd estimators in some cases.

Example 2. Let X1 , . . . , Xn be a random sample from Uniform(α, β). The MoM estimators of α and
β are T1 (X) and T2 (X), respectively, where
T1 (X) = X̄n − √( 3 Σni=1 (Xi − X̄n )2 / n ),  and  T2 (X) = X̄n + √( 3 Σni=1 (Xi − X̄n )2 / n ).
Observe that neither estimator is a function of the minimal sufficient statistic (X(1) , X(n) ).
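A numerical sketch of Example 2 (function names are illustrative); it also displays the absurdity mentioned in Remark 2: the MoM estimate of β can fall below the observed sample maximum.

```python
import math

# MoM for Uniform(alpha, beta): mean = (alpha + beta)/2 and
# variance = (beta - alpha)^2 / 12 give the estimators T1 and T2 above.

def mom_uniform(xs):
    n = len(xs)
    xbar = sum(xs) / n
    m2c = sum((x - xbar) ** 2 for x in xs) / n  # second central sample moment
    hw = math.sqrt(3.0 * m2c)                   # half-width sqrt(3 * m2c)
    return xbar - hw, xbar + hw                 # (alpha_hat, beta_hat)

a_hat, b_hat = mom_uniform([0.0, 0.0, 0.0, 0.0, 10.0])
print(b_hat < 10.0)  # True: beta_hat ~ 8.93 lies below the observed maximum
```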

2 Maximum Likelihood (ML) Estimation


Definition 1 (Likelihood Function). Let X = (X1 , . . . , Xn )′ be a random sample from a population
with distribution function {Fθ ; θ ∈ Θ}. Suppose the distribution {Fθ ; θ ∈ Θ} is characterized by a
pdf (or, pmf ) fX (·; θ). Further, suppose x is a realization of X. Then the function of θ defined as
L(θ | x) = f (x; θ) is called the likelihood function.

Definition 2 (Maximum Likelihood Estimate, MLE). Given a realization x, let θ̂ be the value in Θ that maximizes the likelihood function L(θ | x) with respect to θ. Then θ̂ is called the MLE of the parameter θ.

Note that the maximizer θ̂ is nothing but a function of the realization x. Thus we can treat the maximizer of the likelihood function as a statistic, i.e., an estimator of θ. This estimator is called the Maximum Likelihood (ML) estimator. Notationally, we write θ̂ = θ̂ML (X).

Example 3. Suppose there are n tosses of a coin, and we know neither n nor the probability of a head (p). However, we know that n is between 3 and 5, and that one side of the coin is twice as heavy as the other (i.e., either p = 2(1 − p) or (1 − p) = 2p, so p ∈ {1/3, 2/3}). Then what is the MLE of θ = (n, p) given we observe x heads, x = 1, . . . , 5?
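Since the parameter space here is finite (n ∈ {3, 4, 5} and p ∈ {1/3, 2/3}), the MLE can be found by direct enumeration of the binomial likelihood. A sketch (the function name is illustrative):

```python
from math import comb

# Enumerate L(n, p | x) = C(n, x) p^x (1 - p)^(n - x) over the finite
# parameter set and return the maximizing pair (n, p).

def mle_coin(x):
    best, best_lik = None, -1.0
    for n in (3, 4, 5):
        for p in (1 / 3, 2 / 3):
            if x > n:        # cannot observe more heads than tosses
                continue
            lik = comb(n, x) * p ** x * (1 - p) ** (n - x)
            if lik > best_lik:
                best, best_lik = (n, p), lik
    return best

print(mle_coin(5))  # (5, 0.6666666666666666): five heads force n = 5, p = 2/3
```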

Remark 3. If the likelihood function is differentiable with respect to θ, then one may take the differentiation approach to finding the MLE. For a multidimensional θ, if L(θ) is twice continuously differentiable with respect to each θj , then a critical point of L(θ | x) can be obtained by solving ∂L(θ | x)/∂θ = 0. To verify that the critical point is a maximizer, one can check whether the Hessian matrix ∂ 2 L(θ | x)/∂θ∂θ′ is negative definite at the critical point.
Remark 4. Often it is convenient to work with the log-likelihood function instead of the likelihood function. As the logarithm is a strictly monotone function, the maximizers of the likelihood and the log-likelihood coincide. The log-likelihood is generally denoted by l(θ; x).

Example 4. Let X1 , . . . , Xn be a random sample from Normal(µ, σ 2 ). Then the MLEs of µ and σ 2 are X̄n and Sn2 , respectively.
Example 5. Let X1 , . . . , Xn be a random sample from Uniform(α, β). Then the MLEs of α and β are X(1) and X(n) , respectively.
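A minimal sketch of Example 5: the likelihood (β − α)−n 1{α ≤ X(1) , X(n) ≤ β} is maximized by shrinking the interval onto the data, so the MLE is simply the pair of sample extremes. Unlike the MoM estimators of Example 2, these always cover every observation.

```python
# MLE for Uniform(alpha, beta): alpha_hat = X_(1), beta_hat = X_(n),
# since widening the interval beyond the data only decreases (beta - alpha)^(-n).
def mle_uniform(xs):
    return min(xs), max(xs)

print(mle_uniform([0.0, 0.0, 0.0, 0.0, 10.0]))  # (0.0, 10.0)
```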

In the above two examples, we have seen that the MLE is a function of a sufficient statistic. This is true in general, as stated by the following theorem.

Theorem 1 (Properties of MLE: 1). Let X1 , . . . , Xn be a random sample from some distribution with pdf (or pmf ) fθ , θ ∈ Θ ⊆ Rk , and let T(X) be a sufficient statistic for θ. Then an MLE, if it exists and is unique, is a function of T. If an MLE exists but is not unique, then one can find an MLE which is a function of T. [Proof]

Remark 5. A maximum likelihood estimate may not exist. See the following example.

Example 6. Let X1 , X2 be a random sample from Bernoulli(θ), θ ∈ (0, 1). Suppose the realization
(0, 0) is observed. Then the MLE does not exist.
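A quick numerical illustration of Example 6: with realization (0, 0) the likelihood is L(θ) = (1 − θ)2 , which keeps increasing as θ decreases toward 0, so the supremum sits at the boundary point θ = 0, outside the open parameter space (0, 1).

```python
# L(theta) = (1 - theta)^2 for the realization (0, 0); no maximizer exists
# in the open interval (0, 1) because L increases as theta approaches 0.
def lik(theta):
    return (1 - theta) ** 2

print(lik(0.1) < lik(0.01) < lik(0.001))  # True: L increases toward theta = 0
```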

Remark 6. Even if MLE exists, it may not be unique. See the following example.

Example 7. Let X1 , . . . , Xn be a random sample from the Double Exponential(θ, σ) distribution, θ ∈ R. Then θ̂ML is a median of X1 , . . . , Xn , which is not unique when n is even.
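A sketch of the non-uniqueness in Example 7: up to constants that do not affect the maximizer, the log-likelihood in θ is −Σ|Xi − θ|, so for an even sample size every point between the two middle order statistics attains the same maximum value.

```python
# Log-likelihood in the location theta for a Laplace (double exponential)
# sample, dropping constants that do not depend on theta.
def laplace_loglik(theta, xs):
    return -sum(abs(x - theta) for x in xs)

xs = [0.0, 1.0, 3.0, 4.0]
print(laplace_loglik(1.5, xs) == laplace_loglik(2.5, xs))  # True: tie on [1, 3]
```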

Remark 7. The method of maximum likelihood estimation may also produce an absurd (not meaningful) estimator in some cases.

In spite of all the above shortcomings, ML estimation is by far the most popular and reasonable frequentist method of estimation, because the MLE possesses a list of desirable properties. We will discuss some of them below.
Theorem 2 (Properties of MLE: 2). Suppose the regularity conditions of the CRLB (see Theorem 5 of Module 3) are satisfied, the log-likelihood is twice differentiable, and there exists an unbiased estimator θ̂⋆ of θ whose variance attains the CRLB. Suppose further that the likelihood equation has a unique maximizer θ̂ML (X). Then θ̂⋆ = θ̂ML (X). [Proof]
Corollary 1. Theorem 2 implies that if the CRLB is attained by any estimator, then it must be an
MLE. However, the converse is not true, i.e., the variance of an MLE may not attain the CRLB.

2.1 Invariance Property
Let η := Ψ(θ) be any function of θ, and suppose we are interested in the optimal value of η given a sample X1 , . . . , Xn . Let H = {η = Ψ(θ) : θ ∈ Θ} be the set of all possible values of η, and for each η ∈ H, let Aη = {θ ∈ Θ : Ψ(θ) = η}. We are interested in the value of η for which the likelihood is maximized, i.e., the η for which Aη contains θ̂ML . If we denote that optimal η by η̂ML , then η̂ML := arg maxη∈H supθ∈Aη L(θ | X) = arg maxη∈H L⋆ (η | X). The function L⋆ (η | X) = supθ∈Aη L(θ | X) is called the induced likelihood of η, and the maximizer of the induced likelihood is called the MLE of η. The following theorem states that η̂ML = Ψ(θ̂ML ) for any function Ψ.
Theorem 3 (Properties of MLE: 3, Invariance Property). Let {fX (·; θ) : θ ∈ Θ} be a family of PDFs (PMFs), and let L(θ | X) be the likelihood function. Suppose Θ ⊆ Rk , k ≥ 1. Let Ψ : Θ → Λ be a mapping of Θ onto Λ, where Λ ⊆ Rp (1 ≤ p ≤ k). If θ̂ML (X) is an MLE of θ, then Ψ(θ̂ML (X)) is an MLE of Ψ(θ). [Proof]

Example 8. Let X1 , . . . , Xn be a random sample from Gamma(1, θ) distribution, θ > 0. Find an MLE
of θ.
Example 9. Let X1 , . . . , Xn be a random sample from the Poisson(θ) distribution, θ > 0. Find an MLE of P (X = 0) = exp{−θ}.
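Assuming the standard result that the Poisson MLE of θ is X̄n , the invariance property (Theorem 3) gives the MLE of exp{−θ} (the Poisson probability of observing 0) as exp{−X̄n }. A sketch with an illustrative function name:

```python
import math

# Invariance in action: theta_hat = xbar is the MLE of theta for Poisson data,
# so exp(-xbar) is the MLE of exp(-theta) by Theorem 3.
def mle_poisson_exp_neg_theta(xs):
    theta_hat = sum(xs) / len(xs)  # MLE of theta
    return math.exp(-theta_hat)    # MLE of exp(-theta) by invariance

print(mle_poisson_exp_neg_theta([1, 2, 3]))  # exp(-2) ~ 0.1353...
```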

2.2 Asymptotic properties of MLE


Definition 3 (Consistent Estimator). Consider the family of distributions Fθ indexed by θ ∈ Θ ⊆ Rk , and let X1 , . . . , Xn be a random sample from Fθ0 for some θ0 ∈ Θ. Then the estimator T(X) is called a consistent estimator of θ if T(X) → θ0 in probability as n → ∞, i.e.,

Pθ0 (∥T(X) − θ0 ∥ > ϵ) → 0 as n → ∞, for every ϵ > 0.

Definition 4 (Asymptotic Normality). Consider the family of distributions Fθ indexed by θ ∈ Θ ⊆ Rk , and let X1 , . . . , Xn be a random sample from Fθ0 for some θ0 ∈ Θ. Then the estimator T(X) of θ is called asymptotically normal (AN) if there exists a k × k positive definite matrix Vn (θ) depending on n and θ such that

[Vn (θ0 )]−1/2 (T(X) − θ0 ) →d N (0, Ik ) as n → ∞,

where Ik is the identity matrix of order k.
An estimator is called a CAN estimator if it is consistent and asymptotically normal.
Definition 5 (Asymptotically Efficient Estimator). Suppose the Fisher information In (θ) = E[S(X; θ)S(X; θ)′ ] is well-defined and positive definite. An asymptotically normal estimator T(X) is said to be asymptotically efficient iff Vn (θ) = In−1 (θ).
In the following theorem, we will see that under some regularity conditions MLE is a CAN estimator.
Theorem 4 (Properties of MLE: 4, Asymptotic Normality). Consider the family of distributions Fθ indexed by θ ∈ Θ ⊆ Rk which satisfies some regularity conditions (for details see the book ‘A Course in Large Sample Theory’ by Ferguson, Chapter 18). Let X1 , . . . , Xn be a random sample from Fθ0 for some θ0 ∈ Θ, and let θ̂ML (X) be the MLE of θ. Then

[I(θ0 )]1/2 √n (θ̂ML (X) − θ0 ) →D N(0, Ik ), where I(θ0 ) = Eθ0 [S(X; θ0 )S(X; θ0 )′ ] = −Eθ0 [∂S(X; θ0 )/∂θ′ ].

Here S(X; θ) = ∂ log fX (·; θ)/∂θ is the score function based on one sample. [Without Proof]
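The two expressions for I(θ0 ) can be checked numerically for a simple one-parameter model. For Poisson(θ), S(x; θ) = x/θ − 1, and both Eθ [S 2 ] and −Eθ [∂S/∂θ] equal 1/θ. The sketch below (truncating the expectations at a large k, an assumption that is harmless for moderate θ) is only an illustration:

```python
import math

# Check E[S^2] = -E[dS/dtheta] = 1/theta for the Poisson score
# S(x; theta) = x/theta - 1, truncating the expectations at kmax.
def fisher_info_two_ways(theta, kmax=100):
    p = math.exp(-theta)                    # Poisson pmf at k = 0
    e_s2, e_neg_ds = 0.0, 0.0
    for k in range(kmax):
        e_s2 += (k / theta - 1.0) ** 2 * p  # contributes to E[S^2]
        e_neg_ds += (k / theta ** 2) * p    # -E[dS/dtheta] = E[X]/theta^2
        p *= theta / (k + 1)                # pmf recurrence p(k+1) = p(k) * theta/(k+1)
    return e_s2, e_neg_ds

print(fisher_info_two_ways(2.0))  # both values ~ 0.5 = 1/theta
```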
Corollary 2. Theorem 4 implies that under appropriate regularity conditions, the MLE is a consistent estimator of θ, i.e., θ̂ML (X) → θ0 in probability as n → ∞. [Proof]
Further, from the definition of asymptotic efficiency it is also clear that MLE is an asymptotically
efficient estimator.
