Module 4
In the previous modules, we have talked about properties of good estimators. Next, we will discuss
some common methods of finding point estimators of θ. The two most popular frequentist methods of
finding estimators are:
(A) method of moments (MoM), and
(B) maximum likelihood (ML) estimation.
We will now discuss these methods in detail.
Example 1. Let X_1, . . . , X_n be a random sample from Normal(µ, σ²). The MoM estimators of µ and σ² are X̄ and S², respectively.
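The moment equations in Example 1 can be solved explicitly; the following is a short sketch of the standard derivation, writing M_1' and M_2' for the first two sample moments.
\[
M_1' = \bar{X}_n = \mu, \qquad
M_2' = \frac{1}{n}\sum_{i=1}^{n} X_i^2 = \mu^2 + \sigma^2
\;\Longrightarrow\;
\hat{\mu} = \bar{X}_n, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}_n^2
             = \frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2 .
\]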
Remark 1. By the Weak (Strong) Law of Large Numbers, the r-th sample moment M_r' = (1/n) Σ_{i=1}^{n} X_i^r converges in probability (almost surely) to the r-th population moment µ_r' = E(X^r). Therefore, if one is interested in the population moments, then MoM provides consistent (strongly consistent) estimators.
Remark 2. However, MoM may yield estimators with sub-optimal sampling properties and, in some cases, even absurd estimators.
Example 2. Let X_1, . . . , X_n be a random sample from Uniform(α, β). The MoM estimators of α and β are T_1(X) and T_2(X), respectively, where
\[
T_1(X) = \bar{X}_n - \sqrt{\frac{3\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}{n}},
\qquad
T_2(X) = \bar{X}_n + \sqrt{\frac{3\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}{n}}.
\]
Observe that neither estimator is a function of the minimal sufficient statistic (X_(1), X_(n)).
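As a quick numerical illustration of Example 2 (the true values of α, β and the sample size below are arbitrary choices for the sketch, not taken from the notes), one can draw a Uniform(α, β) sample and compute T_1(X) and T_2(X) directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true values (not from the notes)
alpha, beta, n = 2.0, 5.0, 500
x = rng.uniform(alpha, beta, size=n)

xbar = x.mean()
m2 = np.mean((x - xbar) ** 2)        # second central sample moment (1/n convention)

# MoM estimates from Example 2: xbar -/+ sqrt(3 * m2)
t1 = xbar - np.sqrt(3.0 * m2)
t2 = xbar + np.sqrt(3.0 * m2)
print(t1, t2)                        # should be close to (alpha, beta) = (2, 5)
```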
Definition 2 (Maximum Likelihood Estimate, MLE). Given a realization x, let θ̂ be the value in Θ that maximizes the likelihood function L(θ | x) with respect to θ; then θ̂ is called the MLE of the parameter θ.
Note that the maximizer θ̂ is nothing but a function of the realization x. Thus we can treat the maximizer of the likelihood function as a statistic or estimator of θ. This estimator is called the Maximum Likelihood (ML) estimator. Notationally, we write θ̂ = θ̂_ML(X).
Example 3. Suppose there are n tosses of a coin, and we know neither the value of n nor the probability of a head (p). However, we know that n is between 3 and 5, and that one of the sides of the coin is twice as heavy as the other (i.e., either p = 2(1 − p) or (1 − p) = 2p). Then what is the MLE of θ = (n, p), given that we observe x heads, x = 1, . . . , 5?
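Because the parameter space in Example 3 is finite (n ∈ {3, 4, 5} and p ∈ {1/3, 2/3}), the MLE can be found by simply evaluating the binomial likelihood at every admissible pair; the sketch below (the function name mle_coin is ours, purely for illustration) does exactly that for each observed x.

```python
from math import comb

def mle_coin(x):
    """Brute-force MLE of theta = (n, p) in Example 3, given x observed heads,
    with n restricted to {3, 4, 5} and p restricted to {1/3, 2/3}."""
    # Only pairs with n >= x have positive likelihood.
    candidates = [(n, p) for n in (3, 4, 5) for p in (1 / 3, 2 / 3) if x <= n]
    # Binomial likelihood L(n, p | x) = C(n, x) * p^x * (1 - p)^(n - x)
    likelihood = lambda n, p: comb(n, x) * p ** x * (1 - p) ** (n - x)
    return max(candidates, key=lambda pair: likelihood(*pair))

for x in range(1, 6):
    print(x, mle_coin(x))
```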
Remark 3. If the likelihood function is differentiable with respect to θ, then one may take the differentiation approach to finding the MLE. In the case of a function L(θ) of several variables θ = (θ_1, . . . , θ_k), if the function is twice continuously differentiable with respect to each θ_j, then a critical point of L(θ | x) can be obtained by equating ∂L(θ | x)/∂θ = 0. Then, to verify that the critical point is a maximizer, one can check whether the Hessian matrix ∂²L(θ | x)/(∂θ ∂θ') is negative definite at the critical point.
Remark 4. It is often convenient to work with the log-likelihood function instead of the likelihood function. As the logarithm is a monotone function, the maximizers of the likelihood and the log-likelihood are the same. The log-likelihood is generally denoted by l(θ; x).
Example 4. Let X_1, . . . , X_n be a random sample from Normal(µ, σ²). Then the MLEs of µ and σ² are X̄_n and S_n², respectively.
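Example 4 can be checked with the differentiation approach of Remark 3; a sketch of the score equations is given below, assuming, as in the usual convention, that S_n² denotes (1/n) Σ_{i=1}^{n} (X_i − X̄_n)².
\[
l(\mu, \sigma^2 \mid x) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 ,
\]
\[
\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
\;\Longrightarrow\; \hat{\mu} = \bar{x}_n ,
\qquad
\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
\;\Longrightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 ,
\]
and the Hessian of l can be verified to be negative definite at (\hat{\mu}, \hat{\sigma}^2).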
Example 5. Let X_1, . . . , X_n be a random sample from Uniform(α, β). Then the MLEs of α and β are X_(1) and X_(n), respectively, since the likelihood L(α, β | x) = (β − α)^{−n} 1{α ≤ x_(1) ≤ x_(n) ≤ β} is maximized by making the interval [α, β] as short as possible while still containing all the observations.
In the above two examples, we have seen that the MLE is a function of a sufficient statistic. This phenomenon is true in general, as stated by the following theorem.
Theorem 1 (Properties of MLE: 1). Let X_1, . . . , X_n be a random sample from some distribution with pdf (or pmf) f_θ, θ ∈ Θ ⊆ R^k, and let T(X) be a sufficient statistic for θ. Then an MLE, if it exists and is unique, is a function of T. If an MLE exists but is not unique, then one can find an MLE which is a function of T. [Proof]
Remark 5. A maximum likelihood estimate may not exist. See the following example.
Example 6. Let X_1, X_2 be a random sample from Bernoulli(θ), θ ∈ (0, 1). Suppose the realization (0, 0) is observed. Then the MLE does not exist: the likelihood L(θ | 0, 0) = (1 − θ)² increases as θ decreases towards 0, but its supremum is not attained on the open interval (0, 1).
Remark 6. Even if MLE exists, it may not be unique. See the following example.
Remark 7. The method of maximum likelihood estimation may produce an absurd (not meaningful)
estimator. In the example
In spite of all the above shortcomings, MLE is by far the most popular and reasonable frequentist method of estimation. The reason is that the MLE possesses a number of desirable properties. We discuss some of them below.
Theorem 2 (Properties of MLE: 2). Suppose the regularity conditions of the CRLB (see Theorem 5 of Module 3) are satisfied, the log-likelihood is twice differentiable, and there exists an unbiased estimator θ̂⋆ of θ whose variance attains the CRLB. Suppose further that the likelihood equation has a unique maximizer θ̂_ML(X). Then θ̂⋆ = θ̂_ML(X). [Proof]
Corollary 1. Theorem 2 implies that if the CRLB is attained by any estimator, then it must be an
MLE. However, the converse is not true, i.e., the variance of an MLE may not attain the CRLB.
2.1 Invariance Property
Let η := Ψ(θ) be any function of θ, and suppose we are interested in the optimal value of η given a sample X_1, . . . , X_n. Let H = {η = Ψ(θ) : θ ∈ Θ} be the set of all possible values of η, and for each η ∈ H, let A_η = {θ ∈ Θ : Ψ(θ) = η}. We are interested in obtaining the value of η for which the likelihood is maximized, i.e., the η for which A_η contains θ̂_ML. If we denote that optimal η by η̂_ML, then
\[
\hat{\eta}_{ML} := \arg\max_{\eta \in H} \sup_{\theta \in A_\eta} L(\theta \mid X)
                 = \arg\max_{\eta \in H} L^\star(\eta \mid X).
\]
The function L^⋆(η | X) = sup_{θ ∈ A_η} L(θ | X) is called the induced likelihood of η, and the maximizer of the induced likelihood is called the MLE of η. The following theorem states that η̂_ML = Ψ(θ̂_ML) for any function Ψ.
Theorem 3 (Properties of MLE: 3, Invariance Property). Let {f_X(·; θ) : θ ∈ Θ} be a family of PDFs (PMFs), and let L(θ | X) be the likelihood function. Suppose Θ ⊆ R^k, k ≥ 1. Let Ψ : Θ → Λ be a mapping of Θ onto Λ, where Λ ⊆ R^p (1 ≤ p ≤ k). If θ̂_ML(X) is an MLE of θ, then Ψ(θ̂_ML(X)) is an MLE of Ψ(θ). [Proof]
Example 8. Let X1 , . . . , Xn be a random sample from Gamma(1, θ) distribution, θ > 0. Find an MLE
of θ.
Example 9. Let X_1, . . . , X_n be a random sample from the Poisson(θ) distribution, θ > 0. Find an MLE of P(X = 0) = exp{−θ}.
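As a numerical sketch of Example 9 (the true θ and the sample size below are arbitrary illustrative choices): since the MLE of θ from a Poisson(θ) sample is X̄_n, the invariance property (Theorem 3) gives exp{−X̄_n} as the MLE of exp{−θ}; the code compares this with a direct grid maximization of the likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, n = 2.0, 400            # illustrative values
x = rng.poisson(theta_true, size=n)

# MLE of theta for a Poisson sample is the sample mean ...
theta_hat = x.mean()
# ... so, by invariance (Theorem 3), the MLE of exp(-theta) is:
eta_hat = np.exp(-theta_hat)

# Direct check: maximize the log-likelihood over a fine grid of theta
grid = np.linspace(0.01, 10, 20000)
loglik = np.sum(x) * np.log(grid) - n * grid   # up to an additive constant
eta_direct = np.exp(-grid[np.argmax(loglik)])

print(eta_hat, eta_direct)          # the two values should (nearly) agree
```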
Theorem 4 (Properties of MLE: 4, Asymptotic Normality). Let X_1, . . . , X_n be a random sample from f_X(·; θ_0), where θ_0 ∈ Θ ⊆ R^k denotes the true value of the parameter. Under appropriate regularity conditions,
\[
I(\theta_0)^{1/2}\,\sqrt{n}\,\bigl\{\hat{\theta}_{ML}(X) - \theta_0\bigr\} \xrightarrow{D} N(0, I_k),
\quad \text{where} \quad
I(\theta_0) = E_{\theta_0}\!\bigl[S(X; \theta_0)\,S(X; \theta_0)'\bigr]
            = -E_{\theta_0}\!\left[\frac{\partial S(X; \theta_0)}{\partial \theta'}\right].
\]
Here S(X; θ) = ∂ log f_X(·; θ)/∂θ is the score function based on one sample. [Without Proof]
Corollary 2. Theorem 4 implies that, under appropriate regularity conditions, the MLE is a consistent estimator of θ, i.e., θ̂_ML(X) → θ in probability as n → ∞. [Proof]
Further, from the definition of asymptotic efficiency, it is also clear that the MLE is an asymptotically efficient estimator.
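The following simulation sketch illustrates Theorem 4 and Corollary 2 for an Exponential model with rate θ (so the MLE is 1/X̄_n and I(θ) = 1/θ²); the true rate, the sample size, and the number of replications are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 1.5                        # illustrative true rate of an Exponential(theta) model
reps, n = 5000, 200

# MLE of the rate theta from an Exponential sample x_1, ..., x_n is 1 / xbar
samples = rng.exponential(scale=1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)

# Consistency (Corollary 2): the MLEs concentrate around theta0
print("mean of MLEs:", theta_hat.mean())

# Asymptotic normality (Theorem 4): with I(theta0) = 1 / theta0**2,
# z = sqrt(n) * I(theta0)^{1/2} * (theta_hat - theta0) should be ~ N(0, 1)
z = np.sqrt(n) * (theta_hat - theta0) / theta0
print("mean, sd of z:", z.mean(), z.std())   # roughly 0 and 1
```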