
Chapter 10 Point Estimation


Statistical inference comprises estimation (point estimation and interval estimation) and hypothesis testing; this chapter deals with point estimation.

Usually, we use the value of a statistic to estimate a population parameter. This value is called a point estimate of the parameter, and the statistic itself is referred to as an estimator.
Example: If we take a random sample of size n, the observed sample mean x̄ may be used as a point estimate of the population mean µ. The statistic X̄ is an estimator of µ.

X̄ is an estimator of µ
S² is an estimator of σ²
p̂ = X/n is an estimator of p

Since there are many possible estimators, it is necessary to study some desirable properties of estimators.

Properties of Estimators
Section 10.2 Unbiased Estimators
A statistic θ̂ is an unbiased estimator of the parameter θ if and only if E(θ̂) = θ.

Let X₁, X₂, …, Xₙ be a random sample from a population with mean µ, and consider the sample mean X̄. Then
E(X̄) = (1/n) Σ E(Xᵢ) = (1/n) Σ µ = µ.
Therefore, X̄ is an unbiased estimator of µ.

If S² is the variance of a random sample from an infinite population, then E(S²) = σ².

If d̂ is biased for θ and E(d̂) → θ as n → ∞, then we say that d̂ is asymptotically unbiased for θ.
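A minimal simulation sketch (not from the text; the normal population and all numbers are assumptions chosen for the demo) that checks these unbiasedness statements numerically:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 10.0, 4.0, 15, 50000           # assumed population mean/variance, sample size, replications
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))  # many random samples from a normal population

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)          # sample variance S^2 with divisor n - 1
s2_div_n = x.var(axis=1, ddof=0)    # variance with divisor n instead

print(xbar.mean())        # close to mu = 10, illustrating E(X̄) = µ
print(s2.mean())          # close to sigma^2 = 4, illustrating E(S²) = σ²
print(s2_div_n.mean())    # close to (n-1)/n * 4 ≈ 3.73: biased, though asymptotically unbiased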

Problems of Unbiasedness
• If θ̂ is unbiased for θ, it does not follow that ω(θ̂) is unbiased for ω(θ).
• Unbiased estimators are not necessarily unique.

Section 10.3 Efficiency


Definition: If an unbiased estimator θ̂ has the smallest variance of all unbiased estimators of θ, it is called the minimum variance unbiased (MVU) estimator or the best unbiased estimator for θ.

If θ̂ is an unbiased estimator of θ, under very general conditions (like differentiation with respect to θ under the ∫ or Σ sign), the variance of θ̂ must satisfy the inequality

var(θ̂) ≥ 1 / ( n E[ (∂ ln f(X)/∂θ)² ] ).

The above is called the Cramér-Rao inequality. If θ̂ is an unbiased estimator of θ and

var(θ̂) = 1 / ( n E[ (∂ ln f(X)/∂θ)² ] ),

then θ̂ is a minimum variance unbiased estimator of θ.



The quantity in the denominator, n E[ (∂ ln f(X)/∂θ)² ], is referred to as the information about θ supplied by the sample. Note that

E[ (∂ ln f(X)/∂θ)² ] = −E[ ∂² ln f(X)/∂θ² ].
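As an illustration of the bound (an example of mine, not one taken from the text), consider a Bernoulli population with f(x; θ) = θ^x (1 − θ)^(1−x), x = 0, 1. Here

ln f(X) = X ln θ + (1 − X) ln(1 − θ),   ∂ ln f(X)/∂θ = (X − θ) / [θ(1 − θ)],

so E[ (∂ ln f(X)/∂θ)² ] = var(X) / [θ(1 − θ)]² = 1/[θ(1 − θ)], and the Cramér-Rao bound is θ(1 − θ)/n. Since p̂ = X/n (with X the number of successes) is unbiased with var(p̂) = θ(1 − θ)/n, it attains the bound and is therefore a minimum variance unbiased estimator of θ.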

Let θ̂₁ and θ̂₂ be two unbiased estimators of θ with var(θ̂₁) < var(θ̂₂); we then say that θ̂₁ is relatively more efficient. The efficiency of θ̂₂ relative to θ̂₁ is defined as the ratio

var(θ̂₁) / var(θ̂₂).
Example 10.7 on page 286: the efficiency of the sample median relative to the sample mean is about 64%. Thus, for large samples, the sample mean requires only about 64% as many observations as the sample median to estimate the population mean µ with the same reliability.
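A minimal simulation sketch (mine, not from the text), assuming a normal population, to check the roughly 64% figure numerically:

import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20000                              # assumed sample size and number of simulated samples
samples = rng.normal(0.0, 1.0, size=(reps, n))    # normal population assumed

var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

print(var_mean / var_median)   # efficiency of the median relative to the mean, roughly 0.64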

Note that relative efficiency is based on unbiased estimators. For biased estimators, we compare them by using the mean square error instead of the variance. The mean square error is defined as MSE = E[(θ̂ − θ)²].

Examples
Problem 10.7 page 287

Problem 10.15 page 287

Problem 10.23 page 288

Example: For what value of k is θ̂ = kX an unbiased estimator of θ for the population f(x) = 1/θ, 0 < x < θ?
Solution:
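(A sketch of the reasoning, not the text's worked solution: for this population E(X) = ∫ x(1/θ) dx over 0 < x < θ, which equals θ/2, so E(kX) = kθ/2 and unbiasedness requires k = 2; the same value works if X is replaced by the sample mean X̄, since E(X̄) = θ/2 as well.)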

Problem 10.3 page 286



Section 10.4 Consistency


The statistic θ̂ is a consistent estimator of θ if and only if, for each positive constant c,

P(|θ̂ − θ| ≥ c) → 0 as n → ∞, or equivalently P(|θ̂ − θ| < c) → 1 as n → ∞.

That is, when n is large, the estimator takes on values that are very close to the parameter it estimates.
Recall Chebyshev's theorem: P(|X̄ − µ| < c) ≥ 1 − σ²/(nc²), from which the law of large numbers follows.

The statistic θ̂ is a consistent estimator of the parameter θ if (i) θ̂ is unbiased and (ii) var(θ̂) → 0 as n → ∞. Note that a biased estimator may still be consistent provided it is asymptotically unbiased; thus, the above conditions are sufficient but not necessary for consistency.
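A minimal simulation sketch (mine, not from the text; the exponential population and the numbers are assumptions for the demo) showing P(|X̄ − µ| ≥ c) shrinking as n grows:

import numpy as np

rng = np.random.default_rng(2)
mu, c, reps = 1.0, 0.1, 10000                  # assumed population mean, tolerance, samples per n

for n in (10, 100, 1000, 10000):
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) >= c))  # estimated P(|X̄ - µ| >= c), tends to 0 as n grows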

Section 10.5 Sufficiency


An estimator θˆ is sufficient if it uses all the information in a sample relevant to the estimation of
the population parameter θ.

Consider the outcomes of n trials of a Bernoulli experiment, X₁, X₂, …, Xₙ, where Xᵢ = 1 with probability θ and Xᵢ = 0 with probability 1 − θ. Suppose we are given Y = ΣXᵢ, the number of successes in n trials. Given Y, can we gain any additional information about θ by looking at other functions of X₁, X₂, …, Xₙ? To answer this question, let us look at the conditional distribution of X₁, X₂, …, Xₙ given Y.
P(X₁ = x₁, …, Xₙ = xₙ | Y = y) = P(X₁ = x₁, …, Xₙ = xₙ, Y = y) / P(Y = y)
= θ^y (1 − θ)^(n−y) / [ (n choose y) θ^y (1 − θ)^(n−y) ] = 1 / (n choose y),

which is independent of θ. Thus, once Y is known, no other information from X₁, X₂, …, Xₙ will shed additional light on the possible value of θ. So Y contains all the information about θ; therefore, Y is sufficient for θ. If P(X₁ = x₁, …, Xₙ = xₙ | Y = y) depends on θ, some outcomes x₁, …, xₙ are more probable for some values of θ than for others.

Definition: The statistic θ̂ is a sufficient estimator of θ if and only if, for each value of θ̂, the conditional distribution of X₁, X₂, …, Xₙ given Θ̂ = θ̂ is independent of θ.

The above definition may not be easy to work with when verifying the sufficiency property, so we now state the following factorization theorem.

Factorization Theorem: The statistic θ̂ is a sufficient estimator of the parameter θ if and only if the joint density or probability distribution of the random sample X₁, X₂, …, Xₙ can be factored so that

f(x₁, x₂, …, xₙ; θ) = g(θ̂; θ) h(x₁, x₂, …, xₙ),

where g(θ̂; θ) depends only on θ̂ and θ, and h(x₁, x₂, …, xₙ) does not depend on θ.
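As an illustration of the theorem (an example of mine, not one from the text), take a random sample from a Poisson population with mean θ. The joint distribution factors as

f(x₁, …, xₙ; θ) = Π e^(−θ) θ^(xᵢ) / xᵢ! = [ e^(−nθ) θ^(Σxᵢ) ] · [ 1 / Π xᵢ! ],

where the first factor depends on the observations only through Σxᵢ (together with θ) and the second factor does not involve θ, so Y = ΣXᵢ is sufficient for θ.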

Property: If θ̂ is sufficient for θ, then any single-valued function Y = u(θ̂) not involving θ is also sufficient for θ. Also, u(θ̂) is sufficient for u(θ) provided y = u(θ̂) can be solved to give the single-valued inverse θ̂ = ω(y).

Example: Show that the estimator in Problem 10.23 (on page 288) is consistent.
Solution:

Problem 10.42 page 295

Problem 10.48 page 296

Example: Consider the density function f(x) = 1/θ, 0 < x < θ, and suppose we use Yₙ, the largest order statistic, to estimate θ. Check whether this estimator is (a) unbiased and (b) consistent, and find the efficiency of the sample mean relative to Yₙ.
Solution:
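(Not the worked solution, but a quick numerical check under assumed values: for a uniform population on (0, θ), the largest order statistic Yₙ has E(Yₙ) = nθ/(n + 1), so it is biased but asymptotically unbiased.)

import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 5.0, 50, 20000                 # assumed parameter, sample size, replications
x = rng.uniform(0.0, theta, size=(reps, n))

y_n = x.max(axis=1)                             # largest order statistic
two_xbar = 2 * x.mean(axis=1)                   # unbiased estimator of theta based on the sample mean

print(y_n.mean())                               # about n*theta/(n+1) = 4.90..., confirming the bias
print(((y_n - theta) ** 2).mean())              # mean square error of Y_n
print(((two_xbar - theta) ** 2).mean())         # mean square error of 2*X̄, much larger here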

Section 10.6 Robustness


An estimator is robust if its sampling distribution is not seriously affected by violations of the underlying assumptions. For example, the small-sample t procedures assume that the population is normal, and robustness concerns how well such procedures hold up when that assumption is violated.

Methods for Finding Estimators


Among the various methods of estimation are method of moments, method of maximum
likelihood, method of least squares (in Chapter 14) and Bayes’ method.

Section 10.7 Method of Moments (Moment Estimators)


This method consists of equating the first few moments of a population to the corresponding
moments of a sample. These equations are then solved for the unknown parameters. The number
of equations is the same as the number of unknown parameters.

If X₁, X₂, …, Xₙ is a random sample, the kth sample moment is m′ₖ = (1/n) Σ xᵢ^k. The method of moments leads to the equations

m′ₖ = µ′ₖ, k = 1, 2, …, p,

for the p parameters of the population. Note that one may instead use

mₖ = µₖ, k = 1, 2, …, p,

where mₖ is the kth central sample moment and µₖ is the kth central population moment.
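A small sketch of the method in code (my example, assuming a gamma population with shape α and scale β, so that µ = αβ and σ² = αβ², giving α̂ = x̄²/m₂ and β̂ = m₂/x̄ with m₂ the second central sample moment):

import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 3.0, 2.0                         # assumed true shape and scale
x = rng.gamma(shape=alpha, scale=beta, size=5000)

xbar = x.mean()
m2 = ((x - xbar) ** 2).mean()                  # second central sample moment

alpha_hat = xbar ** 2 / m2                     # solve xbar = alpha*beta and m2 = alpha*beta^2
beta_hat = m2 / xbar
print(alpha_hat, beta_hat)                     # should be close to 3.0 and 2.0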

Example: Find the moment estimates for the binomial parameters θ and n.
Solution:
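(A sketch of one route, not the text's worked solution: equating the first two moments, x̄ = nθ and m₂ = nθ(1 − θ), gives θ̂ = 1 − m₂/x̄ and n̂ = x̄/θ̂ = x̄²/(x̄ − m₂).)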

Section 10.8 Method of Maximum Likelihood (ML)


Definition: If X₁, X₂, …, Xₙ is a random sample from a population with parameter θ, the likelihood function of the sample is L(θ) = f(x₁, x₂, …, xₙ; θ).
The method of maximum likelihood consists of maximizing L(θ) with respect to θ. The value of θ obtained in this way is the maximum likelihood estimate (mle) of θ.

Advantages: (i) the method of maximum likelihood yields sufficient estimators whenever they exist, and (ii) maximum likelihood estimators are asymptotically minimum variance unbiased estimators.

Note
a. The value of θ that maximizes L(θ) is the same as the value that maximizes ln[L(θ)], and it is often easier to work with ln[L(θ)].
b. It is not always the case that differentiation can be used to obtain the mle. In particular, when the domain of the density depends on the parameter, differentiation generally cannot be used.

The regular case is when the method of differentiation works; the non-regular case is when it does not.
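In the regular case the maximization can also be carried out numerically. A minimal sketch (mine, not from the text; the exponential population with mean θ is an assumption), checking that the numerical maximizer agrees with the closed-form mle θ̂ = x̄:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=200)       # sample from an exponential population with assumed mean 2

def neg_log_lik(theta):
    # negative log-likelihood for f(x; theta) = (1/theta) * exp(-x/theta)
    return len(x) * np.log(theta) + x.sum() / theta

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())                         # the two values agree (numerical mle vs. x̄)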

Examples
Problem 10.51 page 301

Problem 10.53 page 301

Problem 10.59 page 302

Example: If X₁, X₂, …, Xₙ is a random sample from a population with density function f(x; θ) = 1/θ, 0 < x < θ, find the maximum likelihood estimate of θ.
Solution:
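(A sketch of the reasoning, not the text's worked solution: L(θ) = 1/θ^n as long as θ exceeds every observation, and 0 otherwise; since 1/θ^n decreases as θ increases, the likelihood is maximized by taking θ as small as the data allow, namely θ̂ = yₙ = max(x₁, …, xₙ). This is the non-regular case, since the domain depends on θ.)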

Example: If X₁, X₂, …, Xₙ is a random sample from a population with density function f(x; θ) = e^(−(x−θ)), x > θ, find the maximum likelihood estimate of θ.
Solution:
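(A sketch of the reasoning, not the text's worked solution: L(θ) = e^(−Σ(xᵢ−θ)) = e^(nθ) e^(−Σxᵢ) as long as θ is below every observation, and 0 otherwise; this increases with θ, so the likelihood is maximized at θ̂ = y₁ = min(x₁, …, xₙ), another non-regular case.)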

Problem 10.82 page 309

Invariance Property: If θ̂ is a maximum likelihood estimator of θ and the function g(θ) is continuous, then g(θ̂) is also a maximum likelihood estimator of g(θ).

Section 10.9 Bayesian Estimation


If we assume that the parameters are random variables with prior distributions, we can use Bayesian estimation for the parameters. In Bayesian estimation, we combine prior beliefs about a parameter with the sample evidence to obtain ϕ(θ | x), the conditional density of Θ given X = x. This conditional distribution is also called the posterior distribution of Θ. By definition,

ϕ(θ | x) = f(θ, x) / g(x) = h(θ) f(x | θ) / g(x),

where h(θ) is the prior distribution of Θ and g(x) is the marginal distribution of X. The prior h(θ) reflects the subjective belief about Θ before the sample is taken; the posterior distribution is the conditional distribution of Θ after the sample is taken.

Bayesian point estimation amounts to finding a decision function δ(x) that predicts the value of Θ when the value x and the conditional density ϕ(θ | x) are known. In general, the mean or the median of a distribution is used to predict the value of a random variable. In Bayesian statistics, the choice of the decision function depends on a loss function L(Θ, δ(x)). One method is to select the decision function δ(x) for which the conditional expectation of the loss is a minimum.

Definition: A Bayes estimate is a decision function δ(x) that minimizes

E{ L(Θ, δ(x)) | X = x } = ∫ L(θ, δ(x)) ϕ(θ | x) dθ,

the integral being taken from −∞ to ∞, if Θ is continuous; replace integration with summation if Θ is discrete. The random variable δ(X) is the Bayes estimator of θ.

If we use the squared-error loss function, L(θ, δ(x)) = (θ − δ(x))², the Bayes estimate is δ(x) = E(Θ | x), the mean of the conditional distribution of Θ given X = x. This is so because E[(U − b)²] is minimized when b = E(U). If the loss function is L(θ, δ(x)) = |θ − δ(x)|, then the median of the conditional distribution of Θ given X = x is the Bayes estimate.
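As an illustration (my example, not necessarily the setup of the problems below): let X be binomial with parameters n and Θ, and let the prior of Θ be a beta distribution with parameters α and β. Then

ϕ(θ | x) ∝ θ^(x+α−1) (1 − θ)^(n−x+β−1),

which is a beta distribution with parameters x + α and n − x + β, and under squared-error loss the Bayes estimate is its mean,

δ(x) = (x + α) / (n + α + β).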

Problem 10.77, page 307

Section 10.10 Theory in Practice


See the derivation of the mean square error. In particular, see the derivation of the result MSE(θ̂) = σ²_θ̂ + (bias)² on page 309.
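The derivation is short: writing θ̂ − θ = (θ̂ − E(θ̂)) + (E(θ̂) − θ) and expanding the square, the cross term vanishes because E[θ̂ − E(θ̂)] = 0, so

MSE(θ̂) = E[(θ̂ − θ)²] = var(θ̂) + [E(θ̂) − θ]² = σ²_θ̂ + (bias)².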

Problem 10.31 page 288
