


1. Estimators
The Basic Statistical Model
As usual, our starting point is a random experiment with an underlying sample space and a probability measure $\mathbb{P}$. In the basic
statistical model, we have an observable random variable $\boldsymbol{X}$ taking values in a set $S$. Recall that in general, this variable can have
quite a complicated structure. For example, if the experiment is to sample $n$ objects from a population and record various
measurements of interest, then the data vector has the form
$$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$$
where $X_i$ is the vector of measurements for the $i$th object. The most important special case is when $X_1, X_2, \ldots, X_n$ are
independent and identically distributed (IID). In this case $\boldsymbol{X}$ is a random sample of size $n$ from the distribution of an underlying
measurement variable $X$.
Statistics
Recall also that a statistic is an observable function of the outcome variable of the random experiment: $\boldsymbol{U} = u(\boldsymbol{X})$. Thus, a
statistic is simply a random variable derived from the observation variable $\boldsymbol{X}$, with the assumption that $\boldsymbol{U}$ is also observable. As
the notation indicates, $\boldsymbol{U}$ is typically also vector-valued. Note that the original data vector $\boldsymbol{X}$ is itself a statistic, but usually we are
interested in statistics derived from $\boldsymbol{X}$. A statistic $\boldsymbol{U}$ may be computed to answer an inferential question. In this context, if the
dimension of $\boldsymbol{U}$ (as a vector) is smaller than the dimension of $\boldsymbol{X}$ (as is usually the case), then we have achieved data reduction.
Ideally, we would like to achieve significant data reduction with no loss of information about the inferential question at hand.
Parameters
In the technical sense, a parameter $\theta$ is a function of the distribution of $\boldsymbol{X}$, taking values in a parameter space $\Theta$. Typically, the
distribution of $\boldsymbol{X}$ will have $k$ real parameters of interest, so that $\theta$ has the form $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ and thus $\Theta \subseteq \mathbb{R}^k$. In
many cases, one or more of the parameters are unknown and must be estimated from the data variable $\boldsymbol{X}$. This is one of the
most important and basic of all statistical problems, and is the subject of this chapter.
Estimators
Suppose now that we have an unknown real parameter $\theta$ taking values in a parameter space $\Theta \subseteq \mathbb{R}$. A real-valued statistic
$W = w(\boldsymbol{X})$ that is used to estimate $\theta$ is called, appropriately enough, an estimator of $\theta$. Thus, the estimator $W$ is a random variable
and hence has a distribution, a mean, a variance, and so on. When we actually run the experiment and observe the data $\boldsymbol{x}$, the
observed value $w(\boldsymbol{x})$ (a single number) is the estimate of the parameter $\theta$.
Basic Properties
The (random) error is the difference between the estimator and the parameter: $W - \theta$. The quality of $W$ as an estimator of $\theta$ is
usually measured by the first two moments of the error random variable. Keep in mind that the distribution of $W$ (and hence its
moments) depends on the unknown parameter $\theta$, even though we generally suppress this dependence in our notation. First, the
expected value of the error is known as the bias:
$$\operatorname{bias}(W) = \mathbb{E}(W - \theta)$$
1. $\operatorname{bias}(W) = \mathbb{E}(W) - \theta$.
Proof:
This follows from basic properties of expected value: $\mathbb{E}(W - \theta) = \mathbb{E}(W) - \theta$. Recall that our point of view is that $\theta$ is deterministic (that is, non-random)
even though unknown.
The concept of bias leads to some natural definitions:
$W$ is unbiased if $\operatorname{bias}(W) = 0$, or equivalently $\mathbb{E}(W) = \theta$, for all $\theta \in \Theta$. Thus, the expected value of the estimator is the
parameter being estimated, clearly a desirable property.
$W$ is negatively biased if $\operatorname{bias}(W) \le 0$, or equivalently $\mathbb{E}(W) \le \theta$, for all $\theta \in \Theta$. In this case, the estimator tends to underestimate
the parameter, on average.
$W$ is positively biased if $\operatorname{bias}(W) \ge 0$, or equivalently $\mathbb{E}(W) \ge \theta$, for all $\theta \in \Theta$. In this case, the estimator tends to overestimate
the parameter, on average.
Our definitions of negative and positive bias are weak in the sense that the weak inequalities $\le$ and $\ge$ are used. There are
corresponding strong definitions, of course, using the strong inequalities $<$ and $>$. Note, however, that none of these definitions
may apply. For example, it might be the case that $\mathbb{E}(W) < \theta$ for some $\theta \in \Theta$, $\mathbb{E}(W) = \theta$ for other $\theta \in \Theta$, and $\mathbb{E}(W) > \theta$
for yet other $\theta \in \Theta$.
In addition to bias, we want to measure the distance between the estimator and the parameter. The most important ways to do this are
with the mean square error
$$\operatorname{MSE}(W) = \mathbb{E}\left[(W - \theta)^2\right]$$
and its square root, the root mean square error $\sqrt{\operatorname{MSE}(W)}$.
2. $\operatorname{MSE}(W) = \operatorname{var}(W) + \operatorname{bias}^2(W)$.
Proof:
This follows from basic properties of expected value and variance:
$$\operatorname{MSE}(W) = \mathbb{E}\left[(W - \theta)^2\right] = \operatorname{var}(W - \theta) + \left[\mathbb{E}(W - \theta)\right]^2 = \operatorname{var}(W) + \operatorname{bias}^2(W)$$
In particular, if the estimator $W$ is unbiased, then the mean square error of $W$ is simply the variance of $W$.
Ideally, we would like to have unbiased estimators with small mean square error. However, this is not always possible, and Exercise
2 shows the delicate relationship between bias and mean square error. In the next section we will see an example with two
estimators of a parameter that are multiples of each other; one is unbiased, but the other has smaller mean square error. However, if
we have two unbiased estimators of $\theta$, denoted $U$ and $V$, we naturally prefer the one with the smaller variance (mean square error).
The relative efficiency of $V$ to $U$ is simply the ratio of the variances:
$$\frac{\operatorname{var}(U)}{\operatorname{var}(V)}$$
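The following is a minimal Monte Carlo sketch of these definitions (in Python with NumPy; the normal sampling distribution, the parameter values, and the pair of variance estimators are illustrative assumptions, not taken from the text). It estimates the bias and mean square error of two estimators that are multiples of each other, in the spirit of the example mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (an assumption, not from the text): estimate the variance
# sigma^2 of a normal distribution from a sample of size n, using two estimators
# that are multiples of each other: the unbiased S^2 (denominator n - 1) and the
# biased version V = (n - 1)/n * S^2 (denominator n).
mu, sigma, n, reps = 2.0, 3.0, 10, 100_000
theta = sigma**2  # the parameter being estimated

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # unbiased estimator
v = samples.var(axis=1, ddof=0)    # biased estimator, a multiple of s2

for name, est in [("S^2 (unbiased)", s2), ("V (biased)", v)]:
    bias = est.mean() - theta                 # empirical bias  E(W) - theta
    mse = np.mean((est - theta) ** 2)         # empirical MSE   E[(W - theta)^2]
    print(f"{name}: bias ~ {bias:.3f}, MSE ~ {mse:.3f}")

# Ratio of mean square errors (both estimators compared through MSE, since V is biased)
print("MSE(S^2) / MSE(V) ~", np.mean((s2 - theta) ** 2) / np.mean((v - theta) ** 2))
```

For normal data the biased estimator typically shows the smaller mean square error, which is exactly the trade-off described above.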
Asymptotic Properties
Often we have a general formula that defines an estimator of $\theta$ for any sample size $n$. Technically, this gives a sequence of
real-valued estimators of $\theta$:
$$U_1, U_2, \ldots, U_n, \ldots$$
where $U_n$ is a function of the first $n$ observations $(X_1, X_2, \ldots, X_n)$. In this case, we can discuss the asymptotic properties of the estimators as $n \to \infty$. Most of the definitions are natural
generalizations of the ones above. First, the sequence of estimators $\boldsymbol{U} = (U_1, U_2, \ldots)$ is said to be asymptotically unbiased if
$$\operatorname{bias}(U_n) \to 0 \text{ as } n \to \infty \text{ for every } \theta \in \Theta$$
3. $\boldsymbol{U}$ is asymptotically unbiased if and only if $\mathbb{E}(U_n) \to \theta$ as $n \to \infty$ for every $\theta \in \Theta$.
Suppose now that $\boldsymbol{U} = (U_1, U_2, \ldots)$ and $\boldsymbol{V} = (V_1, V_2, \ldots)$ are two sequences of estimators that are asymptotically unbiased for $\theta$. The asymptotic
relative efficiency of $\boldsymbol{V}$ to $\boldsymbol{U}$ is the following limit, if it exists:
$$\lim_{n \to \infty} \frac{\operatorname{var}(U_n)}{\operatorname{var}(V_n)}$$
Naturally, we expect our estimators to improve, in some sense, as the sample size $n$ increases. Specifically, the sequence of
estimators $\boldsymbol{U}$ is said to be consistent for $\theta$ if $U_n \to \theta$ as $n \to \infty$ in probability for each $\theta \in \Theta$; that is,
$$\mathbb{P}\left(\left|U_n - \theta\right| > \epsilon\right) \to 0 \text{ as } n \to \infty \text{ for every } \epsilon > 0$$
4. If $\operatorname{MSE}(U_n) \to 0$ as $n \to \infty$ for each $\theta \in \Theta$, then $\boldsymbol{U}$ is consistent for $\theta$.
Proof:
From Markov's inequality,
$$\mathbb{P}\left(\left|U_n - \theta\right| > \epsilon\right) = \mathbb{P}\left[\left(U_n - \theta\right)^2 > \epsilon^2\right] \le \frac{\mathbb{E}\left[(U_n - \theta)^2\right]}{\epsilon^2} = \frac{\operatorname{MSE}(U_n)}{\epsilon^2} \to 0 \text{ as } n \to \infty$$
The condition in Theorem 4 is known as mean-square consistency. Thus, mean-square consistency implies simple consistency. This
is just a statistical version of the theorem that mean-square convergence implies convergence in probability.
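As a quick numerical illustration (not from the text; the exponential sampling distribution, $\epsilon = 0.2$, and the sample sizes are assumptions made for the sketch), the snippet below estimates $\mathbb{P}(|M_n - \mu| > \epsilon)$ for the sample mean $M_n$, reviewed in the next subsection, and compares it with the mean-square bound $\operatorname{MSE}(M_n)/\epsilon^2 = \sigma^2 / (n \epsilon^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sketch (assumptions: exponential(1) sampling distribution, eps = 0.2):
# watch P(|M_n - mu| > eps) shrink as n grows, and compare it with the
# mean-square (Chebyshev-type) bound MSE(M_n) / eps^2 = sigma^2 / (n eps^2).
mu = sigma = 1.0          # exponential(1) has mean 1 and standard deviation 1
eps, reps = 0.2, 50_000

for n in [10, 50, 250, 1000]:
    m_n = rng.exponential(mu, size=(reps, n)).mean(axis=1)   # sample means M_n
    p_emp = np.mean(np.abs(m_n - mu) > eps)                   # empirical probability
    bound = sigma**2 / (n * eps**2)                           # MSE(M_n) / eps^2
    print(f"n = {n:5d}: P(|M_n - mu| > {eps}) ~ {p_emp:.4f}, bound = {bound:.4f}")
```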
In the next several subsections, we will review several basic estimation problems that were studied in the chapter on Random
Samples.
Estimation in the Single Variable Model
Suppose that $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the distribution of a real-valued random variable $X$ that
has mean $\mu$ and standard deviation $\sigma$. We will also assume that the fourth central moment $d_4 = \mathbb{E}\left[(X - \mu)^4\right]$ is finite. Recall that
$d_4 / \sigma^4$ is the kurtosis of $X$.
Estimating the Mean
This subsection is a review of some results obtained in the section on the Law of Large Numbers in the chapter on Random
Samples. Recall that a natural estimator of the distribution mean $\mu$ is the sample mean, defined by
$$M_n = \frac{1}{n} \sum_{i=1}^n X_i$$
5. The sample mean $M_n$ satisfies the following properties:
a. $\mathbb{E}(M_n) = \mu$, so $M_n$ is an unbiased estimator of $\mu$.
b. $\operatorname{var}(M_n) = \sigma^2 / n \to 0$ as $n \to \infty$, so $M_n$ is a consistent estimator of $\mu$.
6. In the sample mean experiment, set the sampling distribution to gamma. Increase the sample size with the scroll bar and note
graphically and numerically the unbiased and consistent properties. Run the experiment 1000 times and note the apparent
convergence of the sample mean to the distribution mean.
7. Run the normal estimation experiment 1000 times for several values of the parameters. In each case, compare the empirical
bias and mean square error of $M_n$ with the theoretical values.
The consistency of the sample mean $M_n$ as an estimator of the distribution mean $\mu$ is simply the weak law of large numbers.
Moreover, there are a number of important special cases of the results in Exercise 5. See the section on Sample Mean for the
details.
Suppose that $X = \boldsymbol{1}_A$, the indicator variable for an event $A$ that has probability $\mathbb{P}(A)$. Then the sample mean of the random
sample is the relative frequency or empirical probability of $A$, denoted $P_n(A)$. Hence $P_n(A)$ is an unbiased and
consistent estimator of $\mathbb{P}(A)$.
Suppose that $F$ denotes the distribution function of a real-valued random variable $X$. Then for fixed $x \in \mathbb{R}$, the empirical
distribution function $F_n(x)$ is simply the sample mean for a random sample of size $n$ from the distribution of the indicator
variable $\boldsymbol{1}(X \le x)$. Hence $F_n(x)$ is an unbiased and consistent estimator of $F(x)$.
Suppose that $X$ is a random variable with a discrete distribution on a countable set $S$ and $f$ denotes the probability density
function of $X$. Then for fixed $x \in S$, the empirical probability density function $f_n(x)$ is simply the sample mean for a
random sample of size $n$ from the distribution of the indicator variable $\boldsymbol{1}(X = x)$. Hence $f_n(x)$ is an unbiased and
consistent estimator of $f(x)$.
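For instance, here is a small sketch (assuming standard normal data and the point $x = 0.5$, both purely illustrative choices) showing the empirical distribution function at a point as a sample mean of indicator variables, converging to the true value as the sample size grows.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

# Illustrative sketch (assumption: standard normal data, x = 0.5): the empirical
# distribution function F_n(x) is just the sample mean of the indicator 1(X <= x),
# so it inherits the unbiasedness and consistency of the sample mean.
x = 0.5
true_F = 0.5 * (1 + erf(x / sqrt(2)))   # F(0.5) for the standard normal

for n in [20, 200, 2000, 20000]:
    sample = rng.standard_normal(n)
    F_n = np.mean(sample <= x)          # sample mean of indicator variables
    print(f"n = {n:6d}: F_n({x}) = {F_n:.4f}  (true F({x}) = {true_F:.4f})")
```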
8. In the matching experiment, the random variable is the number of matches. Run the simulation 1000 times and note the apparent
convergence of
a. the sample mean to the distribution mean.
b. the empirical density function to the probability density function.
Estimating the Variance
This subsection is a review of some results obtained in the section on the Sample Variance in the chapter on Random Samples.
Recall first that if $\mu$ is known (almost always an artificial assumption), then a natural estimator of $\sigma^2$ is a special version of the
sample variance, defined by
$$W_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$$
9. $W_n^2$ is the sample mean of the random sample $\left((X_1 - \mu)^2, (X_2 - \mu)^2, \ldots, (X_n - \mu)^2\right)$ and satisfies the following properties:
a. $\mathbb{E}(W_n^2) = \sigma^2$, so $W_n^2$ is an unbiased estimator of $\sigma^2$.
b. $\operatorname{var}(W_n^2) = \frac{1}{n}\left(d_4 - \sigma^4\right) \to 0$ as $n \to \infty$, so $W_n^2$ is a consistent estimator of $\sigma^2$.
If $\mu$ is unknown (the more reasonable assumption), then a natural estimator of the distribution variance $\sigma^2$ is the standard version of
the sample variance, defined by
$$S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M_n)^2$$
10. The sample variance $S_n^2$ satisfies the following properties:
a. $\mathbb{E}(S_n^2) = \sigma^2$, so $S_n^2$ is an unbiased estimator of $\sigma^2$.
b. $\operatorname{var}(S_n^2) = \frac{1}{n}\left(d_4 - \frac{n-3}{n-1}\,\sigma^4\right) \to 0$ as $n \to \infty$, so $S_n^2$ is a consistent estimator of $\sigma^2$.
11. Run the exponential experiment 1000 times and note the apparent convergence of the sample standard deviation to the
distribution standard deviation.
12. The following properties compare $W_n^2$ to $S_n^2$ as estimators of $\sigma^2$:
a. $\operatorname{var}(W_n^2) \le \operatorname{var}(S_n^2)$. Thus, $W_n^2$ is better than $S_n^2$, assuming that $\mu$ is known so that we can actually use $W_n^2$.
b. The asymptotic relative efficiency of $W_n^2$ to $S_n^2$ is 1, so for large sample sizes, $S_n^2$ works just about as well as $W_n^2$.
13. Run the normal estimation experiment 1000 times for several values of the parameters. In each case, compare the empirical
bias and mean square error of $W_n^2$ and of $S_n^2$ to their theoretical values. Which estimator seems to work better?
Of course, the sample standard deviation $S_n$ is a natural estimator of the distribution standard deviation $\sigma$. Unfortunately, this
estimator is biased.
14. $\mathbb{E}(S_n) \le \sigma$, so $S_n$ is negatively biased as an estimator of $\sigma$.
Thus, we should not be too obsessed with the unbiased property. For most sampling distributions, there will be no statistic $U$ with
the property that $U$ is an unbiased estimator of $\sigma$ and $U^2$ is an unbiased estimator of $\sigma^2$.
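A short simulation sketch (assuming gamma-distributed data with shape 2 and a sample size of 15, purely as illustrative choices) comparing $W_n^2$ and $S_n^2$, and checking the negative bias of the sample standard deviation $S_n$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative simulation (assumptions: gamma(shape=2, scale=1) data, n = 15):
# compare W^2 (known mu) and S^2 (unknown mu) as estimators of sigma^2, and
# check that the sample standard deviation S underestimates sigma on average.
shape, n, reps = 2.0, 15, 200_000
mu, var_true = shape, shape            # gamma(k, 1) has mean k and variance k
sigma = np.sqrt(var_true)

x = rng.gamma(shape, 1.0, size=(reps, n))
w2 = np.mean((x - mu) ** 2, axis=1)    # special version, denominator n, known mu
s2 = x.var(axis=1, ddof=1)             # standard version, denominator n - 1
s = np.sqrt(s2)                        # sample standard deviation

print("E(W^2) ~", w2.mean(), " var(W^2) ~", w2.var())
print("E(S^2) ~", s2.mean(), " var(S^2) ~", s2.var())
print("E(S)   ~", s.mean(), "  (sigma =", sigma, ") -> negative bias")
```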
The Poisson Distribution
Let's consider a simple example that illustrates some of the ideas of the last two subsections. Recall that the Poisson distribution with
parameter $a > 0$ has probability density function $f$ given by
$$f(n) = e^{-a} \frac{a^n}{n!}, \quad n \in \mathbb{N}$$
The Poisson distribution is often used to model the number of random points in a region of time or space, and is studied in more
detail in the chapter on the Poisson Process. The parameter $a$ is proportional to the size of the region of time or space; the
proportionality constant is the average rate of the random points. The distribution is named for Simeon Poisson.
15. Suppose that $X$ has the Poisson distribution with parameter $a$. The factorial moments are $\mathbb{E}\left[X^{(k)}\right] = a^k$ for $k \in \mathbb{N}$. Hence
a. $\mathbb{E}(X) = a$
b. $\operatorname{var}(X) = a$
c. $d_4 = \mathbb{E}\left[(X - a)^4\right] = 3 a^2 + a$
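A quick numerical check of these moments (with $a = 2.5$ as an assumed illustrative value):

```python
import numpy as np

rng = np.random.default_rng(6)

# Quick numerical check (an illustration, with a = 2.5 as an assumed value) of the
# Poisson moments quoted above: mean a, variance a, and fourth central moment 3a^2 + a.
a = 2.5
x = rng.poisson(a, size=2_000_000)

print("E(X)         ~", x.mean(), "   (a =", a, ")")
print("var(X)       ~", x.var(), "   (a =", a, ")")
print("E[(X - a)^4] ~", np.mean((x - a) ** 4), "   (3a^2 + a =", 3 * a**2 + a, ")")
```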
Suppose now that $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Poisson distribution with parameter $a$. From the
previous exercise, $a$ is both the mean and the variance of the sampling distribution, so that we could use either the sample mean $M_n$
or the sample variance $S_n^2$ as an estimator of $a$. Both are unbiased, so which is better? Naturally, we use mean square error as our
criterion.
16. The following properties compare $M_n$ to $S_n^2$ as estimators of $a$:
a. $\operatorname{var}(M_n) = \frac{a}{n}$
b. $\operatorname{var}(S_n^2) = \frac{1}{n}\left(a + \frac{2 n}{n - 1}\, a^2\right)$
c. $\operatorname{var}(M_n) < \operatorname{var}(S_n^2)$, so the sample mean $M_n$ is a better estimator of the parameter $a$ than the sample variance $S_n^2$.
d. The asymptotic relative efficiency of $M_n$ to $S_n^2$ is $1 + 2 a$.
17. Run the Poisson experiment 100 times for several values of the parameter. In each case, compute the estimators $M_n$ and $S_n^2$.
Which estimator seems to work better?
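Here is a simulation sketch of this comparison (the values $a = 4$ and $n = 30$ are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative simulation (assumptions: a = 4, n = 30): for Poisson data, both the
# sample mean M and the sample variance S^2 are unbiased estimators of a, but M
# has the smaller mean square error, in line with the comparison above.
a, n, reps = 4.0, 30, 100_000

x = rng.poisson(a, size=(reps, n))
m = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print("MSE(M)   ~", np.mean((m - a) ** 2), " theoretical a/n =", a / n)
print("MSE(S^2) ~", np.mean((s2 - a) ** 2),
      " theoretical (a + 2n a^2/(n-1))/n =", (a + 2 * n * a**2 / (n - 1)) / n)
```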
18. The emission of elementary particles from a sample of radioactive material in a time interval is often assumed to follow the
Poisson distribution. Thus, suppose that the alpha emissions data set is a sample from a Poisson distribution. Estimate the rate
parameter $a$
a. using the sample mean.
b. using the sample variance.
Answer:
a. 8.367
b. 8.649
Estimation in the Bivariate Model
In this subsection we review some of the results obtained in the section on Correlation and Regression in the chapter on
Random Samples.
Suppose that $\left((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\right)$ is a random sample of size $n$ from the distribution of $(X, Y)$, where $X$ is a
real-valued random variable with mean $\mu$ and standard deviation $\sigma$, and where $Y$ is a real-valued random variable with mean $\nu$ and
standard deviation $\tau$. Let $\delta = \operatorname{cov}(X, Y)$, the covariance of $X$ and $Y$, and $\rho = \operatorname{cor}(X, Y)$, the correlation of $X$ and $Y$. We need one
higher-order moment as well: let $\delta_2 = \mathbb{E}\left[(X - \mu)^2 (Y - \nu)^2\right]$, which we assume is finite. As usual, we will let $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ and
$\boldsymbol{Y} = (Y_1, Y_2, \ldots, Y_n)$; these are random samples of size $n$ from the distributions of $X$ and $Y$, respectively.
Estimating the Covariance
If $\mu$ and $\nu$ are known (almost always an artificial assumption), then a natural estimator of the distribution covariance $\delta$ is a special
version of the sample covariance, defined by
$$W_n = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)(Y_i - \nu)$$
19. $W_n$ is the sample mean of the random sample $\left((X_1 - \mu)(Y_1 - \nu), (X_2 - \mu)(Y_2 - \nu), \ldots, (X_n - \mu)(Y_n - \nu)\right)$ and satisfies
the following properties:
a. $\mathbb{E}(W_n) = \delta$, so $W_n$ is an unbiased estimator of $\delta$.
b. $\operatorname{var}(W_n) = \frac{1}{n}\left(\delta_2 - \delta^2\right) \to 0$ as $n \to \infty$, so $W_n$ is a consistent estimator of $\delta$.
If $\mu$ and $\nu$ are unknown (usually the more reasonable assumption), then a natural estimator of the distribution covariance $\delta$ is the
standard version of the sample covariance, defined by
$$S_n = \frac{1}{n - 1} \sum_{i=1}^n \left(X_i - M(\boldsymbol{X})\right)\left(Y_i - M(\boldsymbol{Y})\right)$$
where $M(\boldsymbol{X})$ and $M(\boldsymbol{Y})$ are the sample means of $\boldsymbol{X}$ and $\boldsymbol{Y}$, respectively.
20. The sample covariance $S_n$ satisfies the following properties:
a. $\mathbb{E}(S_n) = \delta$, so $S_n$ is an unbiased estimator of $\delta$.
b. $\operatorname{var}(S_n) \to 0$ as $n \to \infty$, so $S_n$ is a consistent estimator of $\delta$.
21. The following properties compare the sample covariances $W_n$ and $S_n$ as estimators of the distribution covariance $\delta$:
a. $\operatorname{var}(W_n) \le \operatorname{var}(S_n)$. Thus, $W_n$ is better than $S_n$, assuming that $\mu$ and $\nu$ are known so that we can actually use $W_n$.
b. The asymptotic relative efficiency of $W_n$ to $S_n$ is 1, so for large sample sizes, $S_n$ works just about as well as $W_n$.
Estimating the Correlation
A natural estimator of the distribution correlation $\rho$ is the sample correlation
$$R_n = \frac{S_n}{S(\boldsymbol{X})\, S(\boldsymbol{Y})}$$
where $S_n$ is the sample covariance and $S(\boldsymbol{X})$, $S(\boldsymbol{Y})$ are the sample standard deviations of $\boldsymbol{X}$ and $\boldsymbol{Y}$.
Note that this statistic is a nonlinear function of the sample covariance and the two sample standard deviations. For most
distributions of $(X, Y)$, we have no hope of computing the bias or mean square error of this estimator. If we could compute the mean,
we would probably find that the estimator is biased. On the other hand, even though we cannot compute the mean
square error, a simple application of the law of large numbers shows that $R_n \to \rho$ as $n \to \infty$ with probability 1. Thus, the estimator $R_n$
is at least consistent.
Estimating the Regression Coefficients
Recall that the distribution regression line, with $X$ as the predictor variable and $Y$ as the response variable, is $y = a + b x$, where
$$b = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}, \qquad a = \mathbb{E}(Y) - b\, \mathbb{E}(X)$$
On the other hand, the sample regression line is $y = A + B x$, where
$$B = \frac{S_n}{S^2(\boldsymbol{X})}, \qquad A = M(\boldsymbol{Y}) - B\, M(\boldsymbol{X})$$
Of course, the statistics $A$ and $B$ are natural estimators of the parameters $a$ and $b$, respectively, and in a sense are derived from our
previous estimators of the distribution mean, variance, and covariance. Once again, for most distributions of $(X, Y)$, it would be
difficult to compute the bias and mean square errors of these estimators. But applications of the law of large numbers show that
$A \to a$ and $B \to b$ as $n \to \infty$ with probability 1.
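The sketch below (assuming bivariate normal data with illustrative parameter values, not a data set from the text) computes the sample covariance, sample correlation, and sample regression coefficients and compares them with the distribution quantities they estimate.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative sketch (assumed bivariate normal data with known parameters): compute
# the sample covariance, sample correlation, and sample regression coefficients, and
# compare them with the distribution quantities they estimate.
mu, nu, sigma, tau, rho, n = 1.0, 2.0, 1.5, 0.5, 0.6, 5000
delta = rho * sigma * tau                       # distribution covariance
cov_matrix = [[sigma**2, delta], [delta, tau**2]]

x, y = rng.multivariate_normal([mu, nu], cov_matrix, size=n).T

s_xy = np.cov(x, y, ddof=1)[0, 1]               # sample covariance (denominator n - 1)
r = np.corrcoef(x, y)[0, 1]                     # sample correlation
slope = s_xy / x.var(ddof=1)                    # B = S(X, Y) / S^2(X)
intercept = y.mean() - slope * x.mean()         # A = M(Y) - B * M(X)

print("sample covariance  ~", s_xy, " (delta =", delta, ")")
print("sample correlation ~", r, " (rho =", rho, ")")
print("sample regression line: y ~", intercept, "+", slope, "x")
print("distribution regression line: y =", nu - delta / sigma**2 * mu, "+", delta / sigma**2, "x")
```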
Data Analysis Exercises
22. For Michelson's velocity of light data, compute the sample mean and sample variance.
Answer:
852.4, 6242.67
23. For Cavendish's density of the earth data, compute the sample mean and sample variance.
Answer:
5.448, 0.048817
24. For Short's parallax of the sun data, compute the sample mean and sample variance.
Answer:
8.616, 0.561032
25. Consider the Cicada data.
a. Compute the sample mean and sample variance of the body length variable.
b. Compute the sample mean and sample variance of the body weight variable.
c. Compute the sample covariance and sample correlation between the body length and body weight variables.
Answer:
a. 24.0, 3.92
b. 0.180, 0.003512
c. 0.0471, 0.4012
26. Consider the M&M data.
a. Compute the sample mean and sample variance of the net weight variable.
b. Compute the sample mean and sample variance of the total number of candies.
c. Compute the sample covariance and sample correlation between the number of candies and the net weight.
Answer:
a. 57.1, 5.68
b. 49.215, 2.3163
c. 2.878, 0.794
27. Consider the Pearson data.
a. Compute the sample mean and sample variance of the height of the father.
b. Compute the sample mean and sample variance of the height of the son.
c. Compute the sample covariance and sample correlation between the height of the father and the height of the son.
Answer:
a. 67.69, 7.5396
b. 68.68, 7.9309
c. 3.875, 0.501
The estimators of the mean, variance, and covariance that we have considered in this section have been natural in a sense. However,
for other parameters, it is not clear how to even find a reasonable estimator in the first place. In the next several sections, we will
consider the problem of constructing estimators. Then we return to the study of the mathematical properties of estimators, and
consider the question of when we can know that an estimator is the best possible, given the data.
