Module 06 - One Population Parameter Estimation - Topic 4A

ONE POPULATION PARAMETER ESTIMATION

A Statistician is one who collects data and draws confusion.


H.M. Berston

200032 Statistics for Business Page 71


4.1 Introduction
When computing probabilities in sampling distribution problems, we need to know the values of
the relevant parameters, which is a highly unlikely circumstance.

In the real world, population parameters are almost always unknown, because they represent
summary measurements about large populations.

We will discuss how to estimate unknown population parameters using known sample
statistics.

Estimation

The estimation procedures that we cover fall into two categories:

(i) Point estimation: This is a single number (or value) calculated from available sample
data and used to estimate a population parameter.

Example

𝑥̅ is a logical point estimate of the population mean µ.

(ii) Interval estimation: This consists of two numbers that define a range of values, which
will enclose the unknown population parameter at some specified
probability level.

Example

We are 90% sure that the population mean for Tutorial Quiz 1 (out of 10) lies between 6 and 7.

Confidence Intervals

Since a sample statistic such as 𝑥̅ varies from sample to sample, this variation can be taken into
consideration when estimating the true parameter. Hence an “interval estimate” for the population
mean is obtained.

The interval that is constructed has a specified confidence or probability of correctly estimating the
true value of the population parameter.

Example

An interval can be constructed for which there is 95% confidence that it includes the true value
of the parameter. In other words, if we repeat the process of obtaining samples and constructing
intervals, in the long run 95% of these intervals will contain the true population mean.

These intervals are called confidence intervals. The “level of confidence” is given as 100 (1 – α) %
for a given value 0 < α < 1. The choice of the confidence level is somewhat arbitrary.
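The long-run interpretation can be checked with a small simulation. The following is a minimal sketch, not part of the original notes: the population values (µ = 50, σ = 10), the sample size, the number of trials and the seed are all assumed for illustration, and σ is treated as known so that z = 1.96 can be used.

```python
import random
import statistics

def coverage_rate(mu=50.0, sigma=10.0, n=30, trials=4000, z=1.96, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # critical value x standard error
    hits = 0
    for _ in range(trials):
        # Draw one sample from the (assumed) normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Does this sample's interval capture the true mean?
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

The printed proportion should be close to 0.95. Any single interval either contains µ or it does not; the 95% describes the procedure, not one particular interval.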

The basic form for all confidence intervals is as follows:

Sample statistic ± critical value × standard error of the sample statistic

The choice of the critical value and standard error depends on what information we have about the
population.

In this topic we will only be estimating the population mean or proportion for one population. The
following table shows the point estimator (the sample statistic) that will be used when calculating
the confidence intervals.

Estimating               Population parameter    Sample statistic

Population mean          µ                       𝑥̅

Population proportion    p                       𝑝̂

4.2 Estimating μ and p


Topic 3B introduced the Normal Distribution, by far the most commonly used continuous
distribution, as many phenomena follow a normal distribution. It also usually gives a good
approximation even when the data are not exactly normal, so the normal distribution is often
used as an approximation, especially in modern finance.

The normal distribution has such useful properties that even in cases when the data is far from
“normal” (for example, when it has a marked positive skew), a transformation (taking logarithms or
square root are the most commonly used transformations) may be applied to the original data to
make it more symmetric (and therefore more like a normal distribution). The transformed data is
then analysed as if it came from a normal distribution.

We also know that the powerful result known as the Central Limit Theorem guarantees that the
sampling distribution of the mean tends to a Normal Distribution as the sample size increases.
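A short simulation can illustrate this tendency. The sketch below is an illustration only: it assumes an exponential population (strongly positively skewed, mean 1), and the sample sizes, number of repetitions and seed are arbitrary choices. It measures the skewness of the simulated sample means, which should shrink towards 0 (the skewness of a normal distribution) as n grows.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(2)
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 10, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness for n = 50 should be much smaller than for n = 2, which is why a large sample size can stand in for a normal population in the assumptions that follow.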

Estimating µ with σ known

Assumptions of the population:

1. The population is normally distributed; or
2. If the population is not normal, then the sample size must be large (Central Limit Theorem).
3. The population standard deviation σ is known.

The critical value is zα/2, while the standard error of the sample mean is σ/√n.

Hence the confidence interval to estimate the population mean, given that the population standard
deviation is known, is as follows:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Example

How much time do computer users spend on the “information superhighway”?

An Internet service provider (ISP) conducted a survey of 250 of its customers, and found that the
average amount of time spent online was 10.5 hours per week. It is known that the population
standard deviation is 5.2 hours.

Construct a 95% confidence interval for the average online time for all users of this particular ISP.

Solution

95% confidence interval: 100(1 – α) = 95, so α = 0.05 and zα/2 = z0.025 = 1.96.

An ISP conducted a survey of 250 customers: n = 250.
Average amount of time spent online was 10.5 hours: 𝑥̅ = 10.5.
Population standard deviation is 5.2 hours: σ = 5.2.

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 10.5 \pm 1.96 \times \frac{5.2}{\sqrt{250}} = 10.5 \pm 0.645$$

$$9.855 < \mu < 11.145$$

Thus we can say that we are 95% confident that the average online time for all users of this particular
ISP is between 9.855 and 11.145 hours per week.
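The same calculation can be sketched in Python. This is an illustrative helper rather than anything prescribed by the notes: the function name `z_confidence_interval` is made up here, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a printed z-table, so it uses 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)             # critical value x standard error
    return xbar - margin, xbar + margin

# ISP example: n = 250, sample mean 10.5 hours, sigma = 5.2 hours.
lower, upper = z_confidence_interval(xbar=10.5, sigma=5.2, n=250)
print(f"{lower:.3f} < mu < {upper:.3f}")
```

To three decimal places this agrees with the hand calculation, since the unrounded critical value differs from 1.96 only in the fifth decimal place.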

Estimating µ with σ unknown

Just as the mean μ of the population is usually unknown, the population standard deviation σ is also
usually unknown. We then estimate σ by s, the sample standard deviation. In this case, the
standard error of 𝑥̅ has to be estimated as s/√n. But now (𝑥̅ – µ)/(s/√n) can no longer be taken
to follow a standard normal distribution.

Student t distributions

The t-distributions were discovered by William S Gosset in 1908. Gosset was a statistician employed
by the Guinness brewing company, which had stipulated that he not publish under his own name.
He therefore wrote under the pen name “Student”. The Student t-distributions were named after
him; and these distributions arise in the following situation.
If a random variable X has a normal distribution, then (𝑥̅ – µ)/(s/√n) has a “t-distribution” with (n – 1)
“degrees of freedom (d.f.)”. The d.f. is a positive integer. A t-distributed variable is denoted by ‘t’ and
the d.f. is specified as a subscript, for example t8.

Properties of the t-distribution

• The graph extends indefinitely to the left and right, and is “mound-shaped”.
• A t-curve is symmetric about its mean.
• The high point of the t-distribution occurs at its mean, which is always equal to zero.
• There are an infinite number of t-curves. Each is determined by its degrees of freedom.
• As the d.f. increase, the t-distribution gets closer to a standard normal distribution.
• Critical values/probabilities are tabulated in t-distribution tables.
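The convergence to the standard normal can be seen numerically. The sketch below is an illustration under stated assumptions: it integrates the Student t density with Simpson's rule (a swapped-in numerical method, not something from the notes), and the cut-off 1.96 and the list of d.f. values are arbitrary choices. It compares P(T > 1.96) for several d.f. with the normal tail area P(Z > 1.96).

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x > 0: Simpson's rule on [0, x], then symmetry about 0."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_x = total * h / 3
    return 0.5 - area_0_to_x

normal_tail = 1 - NormalDist().cdf(1.96)
tails = {df: t_upper_tail(1.96, df) for df in (2, 5, 10, 30, 100, 1000)}
for df, tail in tails.items():
    print(f"d.f. = {df:4d}: P(T > 1.96) = {tail:.4f}")
print(f"standard normal: P(Z > 1.96) = {normal_tail:.4f}")
```

The tail area falls steadily towards the normal value of about 0.025 as the d.f. increase. For small d.f. the t-curve has heavier tails, so its critical values are larger than the corresponding z values.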

Assumptions of the population:

1. The population is normally distributed; or
2. If the population is not normal, then the sample size must be large (Central Limit Theorem).
3. The population standard deviation σ is unknown.

tα/2, n – 1 is the critical value of the t-distribution with (n – 1) degrees of freedom (d.f.), while the
standard error of the sample mean is estimated by s/√n.

Hence the confidence interval to estimate the population mean, given that the population standard
deviation is unknown, is as follows:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Example

A random sample of 9 packets of a certain breakfast cereal is taken, and reveals an average fibre
content of 3.6 grams, with a standard deviation of 0.9 grams.

Construct a 90% confidence interval for the true average fibre content for this breakfast cereal.
Assume that the fibre content is normally distributed.

Solution

90% confidence interval: 100(1 – α) = 90, so α = 0.10 and α/2 = 0.05.

Sample of 9 packets: n = 9, d.f. = n – 1 = 8, tα/2, n – 1 = t0.05, 8 = 1.860.
Average fibre content of 3.6 grams: 𝑥̅ = 3.6.
Standard deviation of 0.9 grams: s = 0.9.

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 3.6 \pm 1.860 \times \frac{0.9}{\sqrt{9}} = 3.6 \pm 0.56$$

$$3.04 < \mu < 4.16$$

Thus, we can say that a 90% confidence interval for the true average fibre content for this breakfast cereal is
between 3.04 and 4.16 grams.
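The arithmetic for this example can be sketched as follows. The helper name `t_confidence_interval` is hypothetical, and the critical value is passed in by hand because it is assumed to come from a t-table (1.860 for 8 d.f. and an upper-tail area of 0.05, as used above).

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    margin = t_crit * s / sqrt(n)  # critical value x estimated standard error
    return xbar - margin, xbar + margin

# Cereal example: n = 9, sample mean 3.6 g, s = 0.9 g, t_{0.05, 8} = 1.860.
lower, upper = t_confidence_interval(xbar=3.6, s=0.9, n=9, t_crit=1.860)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Had the z value 1.645 been used instead of 1.860, the interval would be narrower than the data justify, because it would ignore the extra uncertainty from estimating σ by s.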

Estimating p – the population proportion

When we are dealing with a categorical variable (e.g. gender, preference, etc.), the parameter we
are most interested in is the proportion, p. Consider a population, of which only some individuals
possess the desired characteristic. We may be interested in estimating the proportion possessing
that characteristic.

If we take a sample of size n (sampling without replacement from a population that is large relative
to n) and count the number which have the characteristic ("number of successes"), x, then x will have
approximately a binomial distribution with parameters n and p. If n is sufficiently large (np and nq
both greater than 5, where q = 1 – p), the binomial distribution (of x) can be approximated by a
normal distribution.

The same principles used for the confidence interval for the mean are used for the confidence
interval of the population proportion. Here we want to obtain a plausible range of values for the
population proportion, p. Keep in mind, p should have a value between 0 and 1. However when we
use the sample proportion in constructing confidence intervals for p, this can lead to confidence
intervals which contain values outside of 0 and 1.

When we have a large sample, the critical value is zα/2, while the standard error of the sample
proportion is estimated by √(𝑝̂(1 – 𝑝̂)/n), or more simply √(𝑝̂𝑞̂/n), where 𝑞̂ = 1 – 𝑝̂.

Hence the confidence interval to estimate the population proportion, given that the sample is large,
is as follows:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Example

A survey was taken of women in a major city to determine what factor was the most important in
deciding where to shop. The results appear below. If the sample size was 1200, estimate with 95%
confidence the proportion of women who identified "price and value" as the most important factor.

Factor Percentage
Price and Value 40%
Quality and Selection of Merchandise 30%
Service 15%
Shopping Environment 15%

Solution

95% confidence interval: α = 0.05, hence z0.025 = 1.96.

The sample size was 1200: n = 1200.
Price and Value most important: 𝑝̂ = 0.4, so 𝑞̂ = 1 – 0.4 = 0.6.

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.4 \pm 1.96\sqrt{\frac{0.4 \times 0.6}{1200}} = 0.4 \pm 0.028$$

$$0.372 < p < 0.428$$
Thus we can say that a 95% confidence interval for the true proportion of women who identified "price and
value" as the most important factor is between 37.2% and 42.8%.
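As a cross-check, the large-sample interval for a proportion can be sketched in Python. The function name is hypothetical, and applying the "greater than 5" rule of thumb to the sample values n𝑝̂ and n𝑞̂ (rather than the unknown np and nq) is an assumption made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a proportion."""
    q_hat = 1 - p_hat
    # Rule of thumb from the notes, checked on the sample values (assumption).
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Shopping survey: n = 1200, 40% chose "price and value".
lower, upper = proportion_confidence_interval(p_hat=0.4, n=1200)
print(f"{lower:.3f} < p < {upper:.3f}")
```

With n = 1200 the check passes comfortably (n𝑝̂ = 480 and n𝑞̂ = 720), and the result matches the interval of 37.2% to 42.8% found by hand.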

4.3 Sample Size Determination
One of the most frequent questions asked of the statistician is,

How many measurements should be included in the sample?

The sampling procedure, together with the sample size, controls the total amount of relevant information in
a sample. At this point in our study, we are concerned with the simplest sampling situation – in other words,
simple random sampling from a relatively large population.

We need to recognise that the confidence intervals we have covered are basically the sample statistic plus
or minus an error value. If this error value (known as the maximum error or B) is known, we can then calculate
the sample size needed. The “B” stands for “(error) bound” here.

The sample size determination formulae we are about to give you come from the formulae for the maximum
desired error of the estimates. Basically, they come from the corresponding confidence interval formulae.
The formula is then solved for n.
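As a sketch of that rearrangement, set the error term of the z-based interval equal to the bound B and solve for n. The first three lines treat the mean and the last two the proportion; requiring the error to be at most B turns each final equality into n ≥ the value shown.

```latex
\begin{aligned}
B &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{B} \\
n &= \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^{2} \\[6pt]
B &= z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \\
n &= \left(\frac{z_{\alpha/2}}{B}\right)^{2}\hat{p}\hat{q}
\end{aligned}
```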

Be sure to round the answer obtained UP to the next whole number, not off to the nearest whole number. If
you round off, then you will exceed your maximum error of the estimate in some cases.
By rounding up, you will have a smaller maximum error of the estimate than allowed, but this is better than
having a larger one than desired.

Sample Size Determination when Estimating µ and p


When estimating the population mean or proportion, we can rearrange the error term in the confidence
intervals that use z as the critical value.

Estimating µ:

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^{2}$$

Note: If the population standard deviation is unknown, then we have to estimate σ so we can determine
the sample size. In such cases, it is acceptable to use the following estimate: σ ≈ range / 4.

Estimating p:

$$n \ge \left(\frac{z_{\alpha/2}}{B}\right)^{2}\hat{p}\hat{q}$$

Note: When we are not given an estimate of the “true” proportion in advance, we should assume it will be
50% (i.e. 𝑝̂ = 𝑞̂ = 0.5).

Example

A fast food company wants to determine the average number of times that fast food users visit fast food
restaurants per week. They have decided that their estimate needs to be accurate to within one-tenth of a
visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more
than one-tenth of a visit. Previous research has shown that the standard deviation is 0.7 visits. What is the
required sample size?

Solution

From the question we can determine the following

B = 0.1
σ = 0.7
α = 0.05, hence z0.025 = 1.96

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^{2} = \left(\frac{1.96 \times 0.7}{0.1}\right)^{2} = 188.2384$$

Hence a sample size of 189 or more is needed
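The effect of rounding up rather than rounding off can be checked directly. This is a minimal sketch with hypothetical helper names; it uses the table value z = 1.96 from the solution above.

```python
from math import ceil, sqrt

def sample_size_for_mean(sigma, bound, z=1.96):
    """Smallest n whose margin of error is at most `bound` (always rounds up)."""
    return ceil((z * sigma / bound) ** 2)

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) achieved by a sample of size n."""
    return z * sigma / sqrt(n)

# Fast food example: sigma = 0.7 visits, B = 0.1 visits.
n = sample_size_for_mean(sigma=0.7, bound=0.1)
print("required n:", n)
print("margin with n = 188:", round(margin_of_error(0.7, 188), 4))
print("margin with n = 189:", round(margin_of_error(0.7, 189), 4))
```

Rounding 188.2384 off to 188 would leave a margin of error slightly above 0.1, whereas 189 brings it just below the bound.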

Example

A publisher wants to know what percent of the population might be interested in a new magazine on making
the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population
is retired. They are willing to accept an error rate of 5%, and they want to be 95% certain that their finding
does not differ from the true rate by more than 5%. What is the required sample size?

Solution

From the question we can determine the following

B = 0.05
𝑝̂ = 0.22
α = 0.05 Hence z0.025 = 1.96

$$n \ge \left(\frac{z_{\alpha/2}}{B}\right)^{2}\hat{p}\hat{q} = \left(\frac{1.96}{0.05}\right)^{2} \times 0.22 \times 0.78 = 263.687$$

Hence a sample size of 264 or more is needed
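A short sketch of the same calculation follows, with a hypothetical function name and the table value z = 1.96. The default of 𝑝̂ = 0.5 reflects the note above about what to assume when no prior estimate is available.

```python
from math import ceil

def sample_size_for_proportion(bound, p_hat=0.5, z=1.96):
    """Smallest n giving a margin of error of at most `bound` for a proportion.

    p_hat defaults to 0.5, the value to assume when no estimate is available.
    """
    return ceil((z / bound) ** 2 * p_hat * (1 - p_hat))

# Publisher example: B = 0.05 with a prior estimate of 22%.
n_with_estimate = sample_size_for_proportion(bound=0.05, p_hat=0.22)
# Same bound with no prior estimate, so p_hat = 0.5 is assumed.
n_conservative = sample_size_for_proportion(bound=0.05)
print(n_with_estimate, n_conservative)
```

The product 𝑝̂𝑞̂ is largest when 𝑝̂ = 0.5, so assuming 50% gives the largest (most cautious) sample size for a given bound, which is why it is the fallback when nothing is known about p.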
