Point Estimation
TABLE OF CONTENTS
Section No. and Heading
Learning Objectives
Practice Questions
POINT ESTIMATION
Learning Objectives
Now we will explain that estimators themselves are random variables. Usually we
describe a sample of size n by the values $x_1, x_2, \ldots, x_n$ of the random variables $X_1, X_2, \ldots, X_n$. If sampling is with replacement, $X_1, X_2, \ldots, X_n$ would be independent, identically
distributed random variables having probability distribution $f(x)$. Their joint distribution
would then be
$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1)\,f(x_2)\cdots f(x_n)$
Now we can use the sample values $x_1, x_2, \ldots, x_n$ to compute some statistic (mean, variance,
etc.) and use this as an estimate of a population parameter. Algebraically, a statistic for a
sample of size n can be defined as a function of the random variables $X_1, X_2, \ldots, X_n$, i.e.,
$g(X_1, X_2, \ldots, X_n)$. The function $g(X_1, X_2, \ldots, X_n)$, that is, any statistic, is another random
variable, whose values can be represented by $g(x_1, x_2, \ldots, x_n)$. The same holds true if
we have more than one sample. Suppose we take two samples of heights of m male
students and n female students at a particular university. We represent the sample values by
$x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_n$ respectively. The difference between the two sample mean
heights is $\bar{X} - \bar{Y}$, and this is the sensible statistic for estimating $\mu_1 - \mu_2$, the difference between
the two population mean heights. Now the statistic $\bar{X} - \bar{Y}$ is a linear combination of the
random variables $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ and so is itself a random variable.
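To make this concrete, here is a small simulation sketch (not part of the original text; the normal population with mean 170 and standard deviation 10, and the sample size n = 25, are illustrative assumptions). It draws many samples and shows that the statistic $g(X_1, \ldots, X_n) = \bar{X}$ varies from sample to sample, i.e., it is itself a random variable with its own distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population: normal with mean 170 and standard deviation 10 (e.g. heights in cm).
mu, sigma, n = 170.0, 10.0, 25

# Draw many independent samples of size n and compute the statistic g(X1,...,Xn) = sample mean.
sample_means = np.array([rng.normal(mu, sigma, n).mean() for _ in range(10_000)])

# The statistic is itself a random variable: it has its own mean and spread.
print("mean of the sampling distribution of X-bar:", sample_means.mean())  # close to mu
print("sd of the sampling distribution of X-bar:  ", sample_means.std())   # close to sigma/sqrt(n)
print("theoretical value sigma/sqrt(n):           ", sigma / np.sqrt(n))
```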
Since estimators are random variables, one of the key problems of point estimation
is to study their sampling distributions in order to compare different estimators.
For instance, when we estimate the variance of a population on the basis of a random
sample, we can hardly expect that the value of $S^2$ we get will actually equal $\sigma^2$, but it would
be reassuring, at least, to know whether we can expect it to be close. Similarly, suppose we
draw a random sample of size n from a normal population with mean value $\mu$. The sample
arithmetic mean $\bar{X}$ is a natural statistic for estimating $\mu$. However, the median of the population,
the average of the two extreme observations in the population, and the k% trimmed mean are also
equal to $\mu$, since normal distributions are symmetric. So we can consider any of the
following estimators for $\mu$:
(a) Estimator 1 = $\bar{X}$ = the arithmetic mean of the sample
(b) Estimator 2 = $\tilde{X}$ = the median of the sample
(c) Estimator 3 = the average of the two extreme observations in the sample
(d) Estimator 4 = the k% trimmed mean of the sample
Which of these estimators is closest to the true value? We cannot answer this without
knowing the true value of $\mu$ (in which case estimation is unnecessary). Questions that can be
answered are: "Which estimator, when used on other samples of X's, will tend to produce
estimates closest to the true value? Which will expose us to the smallest risk? Which will give
us the most information at the lowest cost?" and so forth. To decide which estimator is most
appropriate in a given situation, various statistical properties of estimators can be used. The most
basic of these is unbiasedness: an estimator $\hat{\theta}$ of a parameter $\theta$ is said to be unbiased if
$E(\hat{\theta}) = \theta$ for every possible value of $\theta$; otherwise it is biased.
Figure 1. The pdf's of a biased estimator and an unbiased estimator for a parameter $\theta$.
One may feel that it is necessary to know the true parameter value to see whether
an estimator is biased or unbiased. This is not usually the case, because unbiasedness is a
general property of the estimator's sampling distribution (where it is centered), and this property
typically does not depend on any particular parameter value. The following examples will
illustrate this:
For example, if X is a binomial random variable with parameters n and p, the sample proportion $\hat{p} = X/n$ is an unbiased estimator of p.
Proof: $E(\hat{p}) = E\!\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{1}{n}(np) = p$
Hence the distribution of the estimator $\hat{p}$ will be centered at the true value p.
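As a quick numerical check (an added sketch, not from the original text; the values n = 20 and p = 0.3 are arbitrary assumptions), simulating many binomial observations shows that the average of X/n is close to p:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 20, 0.3           # assumed binomial parameters, for illustration only
reps = 100_000

x = rng.binomial(n, p, size=reps)   # one binomial count per replication
p_hat = x / n                       # the estimator X/n in each replication

# Unbiasedness: the long-run average of p_hat is (approximately) the true p.
print("average of p_hat:", p_hat.mean())   # ~ 0.3
print("true value of p: ", p)
```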
Example 3: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. Show that $S^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a biased estimator of $\sigma^2$.
Proof: Since $X_1, X_2, \ldots, X_n$ are random variables having the same distribution as the
population, which has mean $\mu$, we have
$E(X_i) = \mu$ for $i = 1, 2, \ldots, n$
We have, as required,
$E(\bar{X}) = \dfrac{1}{n}\sum_{i=1}^{n}E(X_i) = \dfrac{1}{n}(n\mu) = \mu$
$E(S^2) = E\!\left[\dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
$= \dfrac{1}{n}\,E\!\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right] = \dfrac{1}{n}\left[n\sigma^2 - n\cdot\dfrac{\sigma^2}{n}\right]$
It follows that $E(S^2) = \dfrac{n-1}{n}\,\sigma^2 = \sigma^2 - \dfrac{\sigma^2}{n}$,
which is very nearly $\sigma^2$ only for large values of n (say, n ≥ 30). The desired unbiased estimator
is defined by
$s^2 = \dfrac{n}{n-1}\,S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, so that $E(s^2) = \sigma^2$.
It can be noted that we have divided the sum of squared deviations by (n − 1) instead of n.
The reason for this is that, by definition, we should have taken deviations from $\mu$ rather than
from $\bar{x}$. But we do not know the value of $\mu$, so we have to take deviations from $\bar{x}$. Since the
sum of squared deviations is smallest when taken about $\bar{x}$ (as the following proof shows), taking
deviations from $\bar{x}$ underestimates the true sum of squared deviations about $\mu$.
Proof:
Denote $\sum_{i=1}^{n}(x_i - c)^2$ by L. Now L will be minimised when its first derivative with respect to c is
zero and its second derivative with respect to c is positive. Differentiating with respect to c, we
get
$\dfrac{dL}{dc} = 2\sum_{i=1}^{n}(x_i - c)(-1) = 0$
$\Rightarrow \sum_{i=1}^{n}x_i = nc \Rightarrow c = \dfrac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$
$\dfrac{d^2L}{dc^2} = 2n > 0$
In order to make a correction for this underestimation, we divide by (n − 1) rather than by n.
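The underestimation, and the effect of the (n − 1) correction, can be seen numerically. The following simulation is an added sketch (the normal population with $\mu = 0$, $\sigma^2 = 4$ and the sample size n = 5 are assumptions): dividing the sum of squared deviations by n gives an average close to $(n-1)\sigma^2/n$, while dividing by n − 1 gives an average close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma2, n = 0.0, 4.0, 5          # assumed population parameters and sample size
reps = 50_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2   # squared deviations from x-bar

S2 = dev2.sum(axis=1) / n        # divide by n     -> biased estimator
s2 = dev2.sum(axis=1) / (n - 1)  # divide by n - 1 -> unbiased estimator

print("average of S^2 (divide by n):    ", S2.mean())   # ~ (n-1)/n * sigma^2 = 3.2
print("average of s^2 (divide by n - 1):", s2.mean())   # ~ sigma^2 = 4.0
print("theoretical (n-1)/n * sigma^2:   ", (n - 1) / n * sigma2)
```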
Now we will discuss two basic difficulties associated with the concept of
unbiasedness. The first difficulty is that unbiasedness may not
be retained under functional transformations: if $\hat{\theta}$ is an unbiased estimator of $\theta$, it does
not necessarily follow that $g(\hat{\theta})$ is an unbiased estimator of $g(\theta)$. For example, although $s^2$ is
an unbiased estimator of $\sigma^2$, s is not an unbiased estimator of $\sigma$; taking the square root
destroys the property of unbiasedness. The second difficulty associated with the concept of
unbiasedness is that unbiased estimators are not necessarily unique. The following example
will illustrate this:
Example 4: Suppose that, for fixed values $x_1, x_2, \ldots, x_n$, we observe $Y_1, Y_2, \ldots, Y_n$ with $E(Y_i) = \beta x_i$. So for any fixed x, Y is a random variable having mean value $\beta x$. That is, we assume that
the mean value of Y is related to x by a line passing through (0, 0), but that the observed
value of Y will typically deviate from this line. Now we can consider any of the following
three estimators of $\beta$:
(1) $\hat{\beta}_1 = \dfrac{\sum Y_i}{\sum x_i}$
(2) $\hat{\beta}_2 = \dfrac{1}{n}\sum \dfrac{Y_i}{x_i}$
(3) $\hat{\beta}_3 = \dfrac{\sum x_i Y_i}{\sum x_i^2}$
All three are unbiased:
(1) $E(\hat{\beta}_1) = E\!\left(\dfrac{\sum Y_i}{\sum x_i}\right) = \dfrac{\sum E(Y_i)}{\sum x_i} = \dfrac{\sum \beta x_i}{\sum x_i} = \dfrac{\beta \sum x_i}{\sum x_i} = \beta$
(2) $E(\hat{\beta}_2) = E\!\left(\dfrac{1}{n}\sum \dfrac{Y_i}{x_i}\right) = \dfrac{1}{n}\sum \dfrac{E(Y_i)}{x_i} = \dfrac{1}{n}\sum \beta = \dfrac{1}{n}(n\beta) = \beta$
(3) $E(\hat{\beta}_3) = E\!\left(\dfrac{\sum x_i Y_i}{\sum x_i^2}\right) = \dfrac{\sum x_i E(Y_i)}{\sum x_i^2} = \dfrac{\sum \beta x_i^2}{\sum x_i^2} = \dfrac{\beta\sum x_i^2}{\sum x_i^2} = \beta$
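A numerical illustration (an added sketch; the x values, the true slope $\beta = 2$ and the standard normal errors are assumptions made only for this example): over repeated samples all three estimators average out to $\beta$, i.e. they are all unbiased, but their variances differ, which is why unbiasedness alone does not single out one best estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # fixed x values (assumed)
beta = 2.0                                 # assumed true slope
reps = 20_000

b1 = np.empty(reps); b2 = np.empty(reps); b3 = np.empty(reps)
for r in range(reps):
    y = beta * x + rng.normal(0.0, 1.0, size=x.size)   # E(Y_i) = beta * x_i
    b1[r] = y.sum() / x.sum()                # estimator (1)
    b2[r] = np.mean(y / x)                   # estimator (2)
    b3[r] = np.sum(x * y) / np.sum(x ** 2)   # estimator (3)

for name, b in [("(1)", b1), ("(2)", b2), ("(3)", b3)]:
    print(name, "mean =", round(b.mean(), 3), " variance =", round(b.var(), 4))
# All three means are close to beta = 2 (unbiasedness), but the variances differ.
```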
Similarly, if $X_1, X_2, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$, then
$\bar{X}$, the sample median $\tilde{X}$, and the trimmed mean with any trimming percentage are all unbiased estimators of $\mu$.
It can be seen that when two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of $\theta$, the pdf of each is centered at
$\theta$, but $\hat{\theta}_2$ may have more spread than $\hat{\theta}_1$. In that case we select $\hat{\theta}_1$. The unbiased estimator whose variance is least among all unbiased
estimators of $\theta$ is called the minimum variance unbiased estimator (MVUE) of $\theta$.
Example 5: For a normal population, the sampling distributions of the mean and the median
both have the same mean, namely, the population mean. So both are unbiased estimators.
However, the variance of the sampling distribution of the mean is $\sigma^2/n$, which is smaller
than the (large-sample) variance of the sampling distribution of the median, $\dfrac{\pi}{2}\cdot\dfrac{\sigma^2}{n}$.
Therefore, the mean provides a more efficient estimate than the median, and the
efficiency of the median relative to the mean is approximately
$\dfrac{\sigma^2/n}{\pi\sigma^2/2n} = \dfrac{2}{\pi} = 0.637$,
or about 64%. It means that the mean requires only 64% as many observations as the median
to estimate $\mu$ with the same reliability.
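A short simulation sketch (added here; the standard normal population and the sample size n = 101 are assumptions) reproduces the roughly 64% figure:

```python
import numpy as np

rng = np.random.default_rng(4)

n, reps = 101, 20_000
samples = rng.standard_normal((reps, n))   # assumed standard normal population

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

efficiency = means.var() / medians.var()   # Var(mean) / Var(median)
print("simulated efficiency of median relative to mean:", round(efficiency, 3))
print("large-sample theoretical value 2/pi:            ", round(2 / np.pi, 3))
```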
Question 1: Show that $S^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a biased but more efficient (smaller mean squared error) estimator of the population variance $\sigma^2$ than the unbiased estimator $s^2$, for a random sample from a normal population.
Solution: For a normal population, $\mathrm{Var}(s^2) = \dfrac{2\sigma^4}{n-1}$.
So $\mathrm{Var}(S^2) = \mathrm{Var}\!\left(\dfrac{n-1}{n}\,s^2\right) = \left(\dfrac{n-1}{n}\right)^2\dfrac{2\sigma^4}{n-1} = \dfrac{2(n-1)\sigma^4}{n^2}$
Hence $\mathrm{MSE}(S^2) = \mathrm{Var}(S^2) + [\mathrm{bias}(S^2)]^2 = \dfrac{2(n-1)\sigma^4}{n^2} + \dfrac{\sigma^4}{n^2} = \dfrac{(2n-1)\sigma^4}{n^2}$,
which is smaller than $\mathrm{MSE}(s^2) = \mathrm{Var}(s^2) = \dfrac{2\sigma^4}{n-1}$ for every n > 1.
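This comparison can also be checked by simulation (an added sketch; the normal population with $\sigma^2 = 1$ and the sample size n = 10 are assumptions): the divide-by-n estimator has the smaller mean squared error even though it is biased.

```python
import numpy as np

rng = np.random.default_rng(5)

sigma2, n, reps = 1.0, 10, 100_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2

S2 = dev2.sum(axis=1) / n          # biased estimator (divide by n)
s2 = dev2.sum(axis=1) / (n - 1)    # unbiased estimator (divide by n - 1)

print("MSE of S^2:", round(np.mean((S2 - sigma2) ** 2), 4))   # ~ (2n-1)*sigma^4/n^2 = 0.19
print("MSE of s^2:", round(np.mean((s2 - sigma2) ** 2), 4))   # ~ 2*sigma^4/(n-1)    = 0.2222
```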
Another desirable property of an estimator is consistency. Informally, consistency means
that when n is sufficiently large, we can be practically certain that the error made with a
consistent estimator will be less than any small pre-assigned positive constant.
Figure 4: The variance of a consistent estimator tends to 0 as n → ∞, so its sampling distribution concentrates around the true parameter value as n → ∞.
To see how the best choice of estimator depends on the population being sampled, consider again the four estimators introduced earlier: the sample mean $\bar{X}$, the sample median $\tilde{X}$, the average of the two extreme observations, and the trimmed mean.
1) If we draw a random sample from a normal population, then $\bar{X}$ is the best among the
four estimators, since its variance is least among all unbiased estimators.
2) If we draw a random sample from a Cauchy distribution, then $\bar{X}$ and the average of the two extreme
observations are bad estimators of the centre of the distribution, while $\tilde{X}$ is reasonably good. $\bar{X}$ is bad as it is very
sensitive to extreme observations, and due to the heavy tails of the Cauchy distribution it is
very likely that a few such observations appear in any sample.
3) If we draw a random sample from a uniform distribution, then the average of the two extreme observations is the best estimator.
This estimator is very sensitive to extreme observations, but such observations are unlikely to
appear in any sample, as the uniform distribution does not have tails.
4) The trimmed mean is not best in any of these three situations. However, it is quite good
in all three. Hence the trimmed mean with a small trimming percentage is called a robust estimator, i.e.,
one that performs reasonably well for a wide variety of population distributions.
So both the distribution of the population and the sampling distribution of the estimator are important
in deciding which estimator is best for a given situation; the simulation sketch below illustrates this comparison.
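The following simulation is an added sketch (the particular populations, the sample size n = 25, the 10% trimming proportion and the use of the interquartile range as a measure of spread are all assumptions made for illustration). It compares the sampling variability of the four estimators under normal, Cauchy and uniform populations, each symmetric about 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 25, 5_000
names = ["mean", "median", "midrange", "trimmed 10%"]

def estimates(sample):
    return {
        "mean": sample.mean(),
        "median": np.median(sample),
        "midrange": (sample.min() + sample.max()) / 2,    # average of the two extremes
        "trimmed 10%": stats.trim_mean(sample, 0.10),
    }

populations = {
    "normal":  lambda: rng.standard_normal(n),
    "Cauchy":  lambda: rng.standard_cauchy(n),
    "uniform": lambda: rng.uniform(-1.0, 1.0, n),
}

for pop, draw in populations.items():
    results = {k: [] for k in names}
    for _ in range(reps):
        for k, v in estimates(draw()).items():
            results[k].append(v)
    # Interquartile range of each estimator (more readable than variance for the Cauchy case).
    iqr = {k: round(np.percentile(v, 75) - np.percentile(v, 25), 3) for k, v in results.items()}
    print(pop, iqr)
```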
Besides its sampling distribution, the precision of an estimator is often reported through the
standard error of the relevant estimator, which we can denote by $\sigma_{\hat{\theta}}$. It is the standard deviation of the
sampling distribution of the estimator, i.e., the size of a typical or representative deviation between an estimate and the value of the parameter being estimated.
Example 6: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population; then the
standard error of $\hat{\mu} = \bar{X}$ is given by $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. If we do not know the value of $\sigma$, we can
estimate it by the sample standard deviation s, which gives the estimated standard error $s/\sqrt{n}$.
We can also use the standard error of the estimator to convert a point estimate
into an interval estimate.
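A minimal sketch of this use of the standard error (the data values below are made up purely for illustration): the point estimate is $\bar{x}$, the estimated standard error is $s/\sqrt{n}$, and a rough interval estimate takes the form estimate ± 2·(standard error).

```python
import numpy as np

# Hypothetical observed sample (assumed data, for illustration only)
x = np.array([24.1, 25.3, 23.8, 26.0, 24.7, 25.5, 24.9, 25.1])
n = x.size

x_bar = x.mean()              # point estimate of mu
s = x.std(ddof=1)             # sample standard deviation (divide by n - 1)
se = s / np.sqrt(n)           # estimated standard error of x-bar

print("point estimate x-bar:        ", round(x_bar, 3))
print("estimated standard error:    ", round(se, 3))
print("rough interval x-bar +/- 2se:", (round(x_bar - 2 * se, 3), round(x_bar + 2 * se, 3)))
```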
As we have seen in this chapter, there can be many different ways (estimators) of
estimating a parameter of a population. Further, different estimators possess the desirable
properties discussed above to varying degrees. Therefore, it would seem desirable to have some general
methods that yield estimators with reasonably good properties. Here we will discuss two
such methods: the method of moments, which is historically one of the oldest methods,
and the method of maximum likelihood. Although maximum likelihood estimators are
generally preferable to moment estimators because of certain efficiency properties, they
often require significantly more computation than do moment estimators.
The kth moment of the population is $\mu'_k = E(X^k)$, and the kth moment of a sample $x_1, x_2, \ldots, x_n$ is $m'_k = \dfrac{1}{n}\sum_{i=1}^{n}x_i^k$. Thus the first population moment is $E(X) = \mu$, and the first sample moment is $m'_1 = \dfrac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$.
The method of moments consists of equating the first few moments of a population
to the corresponding moments of a sample, thus getting as many equations as are needed
to solve for the unknown parameters of the population.
Thus, if the population has p unknown parameters, the method of moments consists of solving the system of equations
$\mu'_k = m'_k$,  k = 1, 2, ..., p
for those parameters.
For instance, for a binomial population with n known, equating the first moments gives $np = m'_1 = \bar{x}$; hence $\hat{p} = \bar{x}/n$.
If both n and p are unknown, then the system of equations we shall have to solve is
$np = m'_1$ and $npq + n^2p^2 = m'_2$, where q = 1 − p,
and solving these two equations for n and p, we find the estimates of the two parameters of
the binomial distribution.
Since $npq + n^2p^2 = m'_2$ and $np = m'_1$, dividing the first equation by the second gives
$\dfrac{m'_2}{m'_1} = q + np = (1 - p) + m'_1$
so that $\dfrac{m'_2 - (m'_1)^2}{m'_1} = 1 - p$
$\Rightarrow \hat{p} = 1 - \dfrac{m'_2 - (m'_1)^2}{m'_1}$
and $\hat{n} = \dfrac{m'_1}{\hat{p}} = \dfrac{(m'_1)^2}{(m'_1)^2 + m'_1 - m'_2}$
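A sketch of the computation (the counts below are hypothetical data assumed only to illustrate the formulas just derived):

```python
import numpy as np

# Hypothetical observed binomial counts (assumed data)
x = np.array([3, 5, 4, 6, 2, 5, 4, 3, 5, 4])

m1 = np.mean(x)        # first sample moment (the sample mean)
m2 = np.mean(x ** 2)   # second sample moment

p_hat = 1 - (m2 - m1 ** 2) / m1   # method-of-moments estimate of p
n_hat = m1 / p_hat                # method-of-moments estimate of n

print("m'_1 =", m1, " m'_2 =", m2)
print("p_hat =", round(p_hat, 4), " n_hat =", round(n_hat, 2))
```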
Question 5: Given a random sample of size n from a uniform population with β = 1, use the
method of moments to obtain a formula for estimating the parameter α.
Example 8: Suppose Mr X receives five letters on some particular day, but unfortunately
one of them gets misplaced before he has a chance to open it. If among the remaining four
letters three contain credit-card billings and the other one does not, what might be a good
estimate of k, the total number of credit-card billings among the five letters received?
Clearly k must be three or four. Assuming that each letter had the same chance of being
misplaced, we find that the probability of the observed data is
$\dfrac{\binom{3}{3}\binom{2}{1}}{\binom{5}{4}} = \dfrac{2}{5}$ for k = 3
and
$\dfrac{\binom{4}{3}\binom{1}{1}}{\binom{5}{4}} = \dfrac{4}{5}$ for k = 4
Therefore, if we choose as our estimate of k the value that maximizes the probability of
getting the observed data, we obtain k=4. We call this estimate a maximum likelihood
estimate and the method by which it was obtained is called the method of maximum
likelihood.
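The two probabilities, and the choice of k that maximizes the probability of the observed data, can be checked directly with a small added sketch (it simply enumerates the hypergeometric probabilities for k = 3 and k = 4):

```python
from math import comb

# 5 letters in total, 4 of them opened (1 misplaced at random),
# and 3 of the 4 opened letters contain credit-card billings.
total, opened, observed = 5, 4, 3

def prob_observed(k):
    # P(exactly 3 of the k billing letters are among the 4 opened letters)
    return comb(k, observed) * comb(total - k, opened - observed) / comb(total, opened)

for k in (3, 4):
    print("k =", k, " probability of the observed data =", prob_observed(k))

print("maximum likelihood estimate of k:", max((3, 4), key=prob_observed))   # k = 4
```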
In the general case, if the observed sample values are $x_1, x_2, \ldots, x_n$, we can write in the
discrete case
$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1, x_2, \ldots, x_n; \theta)$
which is just the value of the joint
probability distribution of the random variables $X_1, X_2, \ldots, X_n$ at the sample point $(x_1, x_2, \ldots, x_n)$. Since the sample values have been observed and are therefore fixed numbers,
we regard $f(x_1, x_2, \ldots, x_n; \theta)$ as the value of a function of the parameter $\theta$, referred to as
the likelihood function $L(\theta)$. A similar definition applies when the random sample comes
from a continuous population, but in that case $f(x_1, x_2, \ldots, x_n; \theta)$ is the value of the joint
probability density at the sample point $(x_1, x_2, \ldots, x_n)$. The method of maximum
likelihood consists of maximizing the likelihood function with respect to $\theta$, and we refer to
the value of $\theta$ which maximizes the likelihood function as the maximum likelihood estimate
of $\theta$. To maximize $L(\theta) = f(x_1, x_2, \ldots, x_n; \theta)$ we take the derivative of $L(\theta)$ with respect
to $\theta$ and set it equal to zero.
Question 6: Given x "successes" in n trials, find the maximum likelihood estimator of the
parameter $\theta$ of the binomial distribution.
Solution: Since $L(\theta) = b(x; n, \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}$, it will be convenient to make use of the fact that the
value of $\theta$ which maximizes $L(\theta)$ will also maximize
$\ln L(\theta) = \ln\binom{n}{x} + x\ln\theta + (n - x)\ln(1-\theta)$
Thus we get $\dfrac{d\,\ln L(\theta)}{d\theta} = \dfrac{x}{\theta} - \dfrac{n - x}{1 - \theta}$
and, equating this derivative to 0 and solving for $\theta$, we find that the likelihood function has
a maximum at $\theta = \dfrac{x}{n}$. Hence the maximum likelihood estimator of the parameter of the
binomial distribution is $\hat{\Theta} = \dfrac{X}{n}$.
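The closed-form answer $\hat{\theta} = x/n$ can be confirmed numerically (an added sketch; x = 7 successes in n = 20 trials is an assumed example) by evaluating the log-likelihood over a grid of $\theta$ values:

```python
import numpy as np
from math import comb, log

x, n = 7, 20   # assumed data: 7 successes in 20 trials

def log_likelihood(theta):
    return log(comb(n, x)) + x * log(theta) + (n - x) * log(1 - theta)

grid = np.linspace(0.001, 0.999, 999)
log_L = np.array([log_likelihood(t) for t in grid])

print("grid maximiser of ln L(theta):", round(grid[log_L.argmax()], 3))   # ~ 0.35
print("closed-form answer x/n:       ", x / n)                            # 0.35
```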
Question 7: Suppose that n observations $x_1, x_2, \ldots, x_n$ are made from a normally
distributed population. Find
(a) the maximum likelihood estimate of the mean if variance is known but mean is unknown
(b) the maximum likelihood estimate of the variance if mean is known but variance is
unknown.
Solution:
(a) Since $f(x_i, \mu) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x_i - \mu)^2/2\sigma^2}$
we have
(1) $L = f(x_1, \mu)\cdots f(x_n, \mu) = \left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{\!n} e^{-\sum_{i=1}^{n}(x_i - \mu)^2/2\sigma^2}$
Therefore,
(2) $\ln L = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$
(3) $\dfrac{\partial \ln L}{\partial \mu} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$
Setting $\dfrac{\partial \ln L}{\partial \mu} = 0$ gives
(4) $\sum_{i=1}^{n}(x_i - \mu) = 0$, i.e., $\sum_{i=1}^{n}x_i - n\mu = 0$
or
(5) $\hat{\mu} = \dfrac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$
(b) Since $f(x_i, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x_i - \mu)^2/2\sigma^2}$
we have
(1) $L = f(x_1, \sigma^2)\cdots f(x_n, \sigma^2) = (2\pi\sigma^2)^{-n/2}\,e^{-\sum_{i=1}^{n}(x_i - \mu)^2/2\sigma^2}$
Therefore,
(2) $\ln L = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$
(3) $\dfrac{\partial \ln L}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2$
Setting $\dfrac{\partial \ln L}{\partial \sigma^2} = 0$ gives
(4) $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
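As a numerical check on parts (a) and (b) (an added sketch; the data are simulated from assumed true values $\mu = 5$ and $\sigma^2 = 4$ with n = 50), the analytic answers $\bar{x}$ and $\frac{1}{n}\sum(x_i - \mu)^2$ can be compared with a direct numerical maximisation of the log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
mu_true, sigma2_true, n = 5.0, 4.0, 50                 # assumed true values
x = rng.normal(mu_true, np.sqrt(sigma2_true), n)

# (a) variance known, mean unknown: minimise -ln L (constants dropped) over mu.
neg_ll_mu = lambda mu: 0.5 * np.sum((x - mu) ** 2) / sigma2_true
mu_mle = minimize_scalar(neg_ll_mu).x
print("numerical MLE of mu:      ", round(mu_mle, 4), "   analytic x-bar:", round(x.mean(), 4))

# (b) mean known, variance unknown: minimise -ln L (constants dropped) over sigma^2.
neg_ll_s2 = lambda s2: 0.5 * n * np.log(s2) + 0.5 * np.sum((x - mu_true) ** 2) / s2
s2_mle = minimize_scalar(neg_ll_s2, bounds=(1e-6, 100.0), method="bounded").x
print("numerical MLE of sigma^2: ", round(s2_mle, 4),
      "   analytic mean of (x_i - mu)^2:", round(np.mean((x - mu_true) ** 2), 4))
```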
Question 8: Based on a single observed value x of a random variable X with density $f(x; \theta)$, find the maximum likelihood estimate of the parameter $\theta$, x being the sample value. Show also that the estimate is biased.
Solution: For a sample of unit size (n = 1) the likelihood function is simply $L(\theta) = f(x; \theta)$. Setting $d\,\ln L(\theta)/d\theta = 0$ and solving for $\theta$ gives the maximum likelihood estimate $\hat{\theta}$ as a function of the single observation x. Taking expectations over X then shows that $E(\hat{\theta}) \neq \theta$, so the estimate is biased.
Practice Questions:
Q.1 Assuming that the population is normal, give examples of estimators (or estimates)
which are
(a) unbiased and efficient
(b) unbiased and inefficient
(c) biased and inefficient.
Q.2 Show that $\bar{X}$ is a minimum variance unbiased estimator of the mean of a normal
population.
Q.3 If $\hat{\theta}$ is an estimator of a parameter $\theta$, its bias is given by $b = E(\hat{\theta}) - \theta$. Show that
$E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + b^2$.
Q.4 If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of the same parameter $\theta$, what condition must be
imposed on the constants $k_1$ and $k_2$ so that $k_1\hat{\theta}_1 + k_2\hat{\theta}_2$ is also an unbiased estimator of $\theta$?
Q.5 Suppose that we use the largest value of a random sample of size n to estimate the
parameter $\theta$ of a population having the density
$f(x) = \dfrac{1}{\theta}$ for $0 < x < \theta$
$\quad\;\; = 0$ otherwise
Check whether this estimator is (a) unbiased and (b) consistent.
Q.6 Show that for a random sample from a normal population, the sample variance $s^2$ is a consistent estimator of $\sigma^2$.
Q.7 In estimating the mean of a normal population on the basis of a random sample of
size 2n+1, what is the efficiency of the median relative to the mean?
Q.8 If $x_1, x_2, \ldots, x_n$ are the values of a random sample of size n from a population having
the density
$f(x; \theta) =$
=0 otherwise
find an estimator for by the method of moments.
Q.9 Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$.
a. Derive the equations whose solutions yield the maximum likelihood estimators of $\alpha$
and $\beta$. Do you think they can be solved explicitly?
b. Show that the mle of $\mu = \alpha\beta$ is $\hat{\mu} = \bar{X}$.
Q.10 Among N independent random variables having identical binomial distributions with the
parameters $\theta$ and n = 2, suppose that $n_0$ take on the value zero, $n_1$ take on the value one, and $n_2$ take on
the value two. Find an estimate of $\theta$ using the method of maximum likelihood.