Module 06 - One Population Parameter Estimation - Topic 4A
PARAMETER ESTIMATION
In the real world, population parameters are almost always unknown, because they represent
summary measurements about large populations.
We will discuss how to estimate unknown population parameters using known sample
statistics.
Estimation
(i) Point estimation: This is a single number (or value) calculated from available sample
data and used to estimate a population parameter.
Example
(ii) Interval estimation: This consists of two numbers that define a range of values, which
will enclose the unknown population parameter at some specified
probability level.
Example
We are 90% sure that the population mean for Tutorial Quiz 1 (out of 10) lies between 6 and 7.
Confidence Intervals
Since a sample statistic such as 𝑥̅ varies from sample to sample, this variation can be taken into
consideration when estimating the true parameter. Hence an “Interval Estimate” for the population
mean is obtained.
The interval that is constructed has a specified confidence or probability of correctly estimating the
true value of the population parameter.
For example, an interval can be constructed for which there is 95% confidence that it includes
the true value of the parameter. In other words, if we repeat the process of obtaining samples and
constructing intervals, in the long run 95% of these intervals will contain the true population mean.
These intervals are called confidence intervals. The “level of confidence” is given as 100 (1 – α) %
for a given value 0 < α < 1. The choice of the confidence level is somewhat arbitrary.
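The long-run interpretation can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 50, standard deviation 10), the sample size, and the function name `coverage_rate` are assumptions chosen for the demonstration, and it uses the known-σ interval for the mean that is introduced later in this topic.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    mu, sigma, n and trials are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Critical value z_{alpha/2} for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Half-width of the interval; constant here because sigma is treated as known
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

With a 95% level, the printed share should be close to 0.95, although it will not match exactly because only a finite number of samples is simulated.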
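Each confidence interval in this topic has the form: point estimate ± (critical value × standard error).
The choice of the critical value and standard error depends on what information we have about the
population.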
In this topic we will be only estimating the population mean or proportion for one population. The
following table shows the point estimator (the sample statistic) that will be used when calculating
the confidence intervals.
Parameter                Symbol    Point estimator
Population mean          µ         𝑥̅
Population proportion    p         𝑝̂
The normal distribution has such useful properties that even in cases when the data is far from
“normal” (for example, when it has a marked positive skew), a transformation (taking logarithms or
square root are the most commonly used transformations) may be applied to the original data to
make it more symmetric (and therefore more like a normal distribution). The transformed data is
then analysed as if it came from a normal distribution.
We also know that the powerful result known as the Central Limit Theorem guarantees that the
sampling distribution of the mean tends to a Normal Distribution as the sample size increases.
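A quick simulation can illustrate the Central Limit Theorem. This is a sketch under stated assumptions: the exponential population, the sample sizes 2 and 50, and the helper names `sample_means` and `skewness` are all chosen here for illustration and do not come from the notes.

```python
import random
from statistics import mean, pstdev

def sample_means(n, trials=5000, seed=2):
    """Means of `trials` samples of size n from a right-skewed exponential population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

skew_small = skewness(sample_means(2))    # tiny samples: means are still clearly skewed
skew_large = skewness(sample_means(50))   # larger samples: means are much more symmetric
print(f"skewness of sample means, n=2:  {skew_small:.2f}")
print(f"skewness of sample means, n=50: {skew_large:.2f}")
```

The skewness of the sample means shrinks towards zero as n grows, which is what we expect if their distribution is approaching a normal shape.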
The critical value is z_{α/2}, while the standard error of the sample mean is σ/√n.
Hence the confidence interval to estimate the population mean, given that the population standard
deviation is known, is as follows:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Example
An Internet service provider (ISP) conducted a survey of 250 of its customers, and found that the
average amount of time spent online was 10.5 hours per week. It is known that the population
standard deviation is 5.2 hours.
Construct a 95% confidence interval for the average online time for all users of this particular ISP.
Solution
$$10.5 \pm 1.96 \times \frac{5.2}{\sqrt{250}}$$
$$10.5 \pm 0.645$$
Thus we can say that we are 95% confident that the average online time for all users of this particular
ISP is between 9.855 and 11.145 hours per week.
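The same calculation can be sketched in Python. The function name `z_interval` is an assumption made for this illustration; the critical value comes from the standard library's `NormalDist` rather than from a table, so it is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)                   # critical value x standard error
    return x_bar - margin, x_bar + margin

# ISP example: n = 250, sample mean 10.5 hours, sigma = 5.2 hours
low, high = z_interval(10.5, 5.2, 250)
print(f"95% CI: {low:.3f} to {high:.3f} hours per week")
```

To three decimal places this agrees with the interval obtained by hand.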
Just as the mean μ of the population is usually unknown, the population standard deviation σ is also
usually unknown. We then estimate σ by s, the sample standard deviation. In this case, the
standard error of 𝑥̅ has to be estimated as s/√n. But now the standardised statistic
(𝑥̅ − μ)/(s/√n) can no longer be taken to follow a standard normal distribution.
Student t distributions
The t-distributions were discovered by William S Gosset in 1908. Gosset was a statistician employed
by the Guinness brewing company, which had stipulated that he not publish under his own name.
He therefore wrote under the pen name “Student”. The Student t-distributions were named after
him; and these distributions arise in the following situation.
If a random variable X has a normal distribution, then (𝑥̅ − μ)/(s/√n) has a “t-distribution” with (n − 1)
“degrees of freedom (d.f.)”. The d.f. is a positive integer. A t-distributed variable is denoted by ‘t’, with
the d.f. specified as a subscript, for example t₈.
• The graph extends indefinitely to the left and right, and is “mound-shaped”.
• A t-curve is symmetric about its mean.
• The high point of the t-distribution occurs at its mean, which is always equal to zero.
• There are an infinite number of t-curves. Each is determined by its degrees of freedom.
• As the d.f. increase, the t-distribution gets closer to a standard normal distribution.
• Critical values/probabilities are tabulated in t-distribution tables.
t_{α/2, n−1} is the critical value of the t-distribution with (n − 1) degrees of freedom (d.f.), while the
estimated standard error of the sample mean is s/√n.
Hence the confidence interval to estimate the population mean, given that the population standard
deviation is unknown, is as follows:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
Example
A random sample of 9 packets of a certain breakfast cereal is taken, and reveals an average fibre
content of 3.6 grams, with a sample standard deviation of 0.9 grams.
Construct a 90% confidence interval for the true average fibre content for this breakfast cereal.
Assume that the fibre content is normally distributed.
Solution
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
$$3.6 \pm 1.860 \times \frac{0.9}{\sqrt{9}}$$
$$3.6 \pm 0.56$$
Thus, we can say that a 90% confidence interval for the true average fibre content for this breakfast cereal is
between 3.04 and 4.16 grams.
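A minimal sketch of the same calculation follows. Python's standard library has no t-distribution, so this illustration assumes the critical value is looked up in a t-table and passed in by hand; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    margin = t_crit * s / sqrt(n)  # critical value x estimated standard error
    return x_bar - margin, x_bar + margin

# Cereal example: n = 9, so d.f. = 8; for 90% confidence the table gives 1.860
low, high = t_interval(3.6, 0.9, 9, t_crit=1.860)
print(f"90% CI: {low:.2f} to {high:.2f} grams")
```

Passing the critical value explicitly keeps the link to the t-table visible: changing the confidence level or the sample size means looking up a different table entry.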
When we are dealing with a categorical variable (e.g. gender, preference, etc.), the parameter we
are most interested in is the proportion, p. Consider a population, of which only some individuals
possess the desired characteristic. We may be interested in estimating the proportion possessing
that characteristic.
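If we take a random sample of size n from a large population and count the number which have the
characteristic ("number of successes"), x, then x will have, at least approximately, a binomial distribution
with parameters n and p. (Strictly, sampling without replacement gives a binomial distribution only
approximately; the approximation is good when the population is much larger than the sample.) If n is
sufficiently large (np and nq both greater than 5, where q = 1 − p), the binomial distribution of x can
be approximated by a normal distribution.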
The same principles used for the confidence interval for the mean are used for the confidence
interval of the population proportion. Here we want to obtain a plausible range of values for the
population proportion, p. Keep in mind, p should have a value between 0 and 1. However when we
use the sample proportion in constructing confidence intervals for p, this can lead to confidence
intervals which contain values outside of 0 and 1.
Hence the confidence interval to estimate the population proportion, given that the sample is large,
is as follows:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$
Example
A survey was taken of women in a major city to determine what factor was the most important in
deciding where to shop. The results appear below. If the sample size was 1200, estimate with 95%
confidence the proportion of women who identified "price and value" as the most important factor.
Factor                                  Percentage
Price and Value                         40%
Quality and Selection of Merchandise    30%
Service                                 15%
Shopping Environment                    15%
Solution
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$
$$0.4 \pm 1.96\sqrt{\frac{0.4 \times 0.6}{1200}}$$
$$0.4 \pm 0.028$$
$$0.372 < p < 0.428$$
Thus we can say that a 95% confidence interval for the true proportion of women who identified "price and
value" as the most important factor is between 37.2% and 42.8%.
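The proportion interval can be sketched in the same way. The function name `proportion_interval` is an assumption for this illustration, and the large-sample check applies the "both greater than 5" rule to the sample proportion, since the true p is unknown.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, level=0.95):
    """Large-sample confidence interval for a population proportion."""
    q_hat = 1 - p_hat
    # Rule of thumb from the notes, applied to the sample proportion
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Shopping survey: 40% of 1200 women chose "price and value"
low, high = proportion_interval(0.40, 1200)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

For this example the result rounds to the same 37.2% to 42.8% range found by hand.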
The sampling procedure, together with the sample size, controls the total amount of relevant information in
a sample. At this point in our study, we are concerned with the simplest sampling situation – in other words,
simple random sampling from a relatively large population.
We need to recognise that the confidence intervals we have covered are basically the sample statistic plus
or minus an error value. If this error value (known as the maximum error or B) is known, we can then calculate
the sample size needed. The “B” stands for “(error) bound” here.
The sample size determination formulae we are about to give you come from the formulae for the maximum
desired error of the estimates. Basically, they come from the corresponding confidence interval formulae.
The formula is then solved for n.
Be sure to round the answer obtained UP to the next whole number, not off to the nearest whole number. If
you round off, then you will exceed your maximum error of the estimate in some cases.
By rounding up, you will have a smaller maximum error of the estimate than allowed, but this is better than
having a larger one than desired.
Estimating µ:
$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$
Note: If the population standard deviation is unknown, then we have to estimate σ so that we can determine
the sample size. In such cases, it is acceptable to use the estimate σ ≈ range / 4.
Estimating p:
$$n \ge \left(\frac{z_{\alpha/2}}{B}\right)^2 \hat{p}\,\hat{q}$$
Note: When we are not given an estimate of the “true” proportion in advance, we should assume it will be
50%. (i.e. 𝑝̂ = 𝑞 = 0.5)
Example
A fast food company wants to determine the average number of times that fast food users visit fast food
restaurants per week. They have decided that their estimate needs to be accurate to within one-tenth of a
Solution
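B = 0.1
σ = 0.7
α = 0.05, hence z_{0.025} = 1.96
$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.96 \times 0.7}{0.1}\right)^2 = 188.2384$$
Rounding up, the required sample size is n = 189.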
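The sample size calculation for a mean, including the round-up rule, can be sketched as follows. The function name `sample_size_for_mean` is an assumption for this illustration; `math.ceil` does the rounding up to the next whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, level=0.95):
    """Smallest n for which the maximum error of the estimate is at most `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)          # always round UP, never to nearest

# Fast food example: B = 0.1, sigma = 0.7, 95% confidence
n_required = sample_size_for_mean(sigma=0.7, bound=0.1)
print(n_required)  # 189
```

Using the exact critical value rather than the rounded 1.96 changes the unrounded result slightly, but the rounded-up sample size is the same here.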
Example
A publisher wants to know what percent of the population might be interested in a new magazine on making
the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population
is retired. They are willing to accept an error rate of 5%, and they want to be 95% certain that their finding
does not differ from the true rate by more than 5%. What is the required sample size?
Solution
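B = 0.05
𝑝̂ = 0.22
α = 0.05, hence z_{0.025} = 1.96
$$n \ge \left(\frac{z_{\alpha/2}}{B}\right)^2 \hat{p}\,\hat{q} = \left(\frac{1.96}{0.05}\right)^2 \times 0.22 \times 0.78 = 263.687$$
Rounding up, the required sample size is n = 264.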
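A corresponding sketch for proportions is given below. The function name `sample_size_for_proportion` is hypothetical; its default of 0.5 for the prior proportion follows the note above about what to assume when no estimate is available.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(bound, p_hat=0.5, level=0.95):
    """Smallest n so that the maximum error for a proportion is at most `bound`.

    p_hat defaults to 0.5, the value to assume when no prior estimate is given.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    return ceil((z / bound) ** 2 * p_hat * (1 - p_hat))

# Publisher example: prior estimate 22%, B = 0.05, 95% confidence
n_with_prior = sample_size_for_proportion(0.05, p_hat=0.22)
# Same requirement with no prior estimate
n_conservative = sample_size_for_proportion(0.05)
print(n_with_prior, n_conservative)  # 264 385
```

The product p̂q̂ is largest when p̂ = 0.5, so the default gives the largest, and therefore safest, sample size for a given error bound.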