Fstats ch2 PDF
2 ESTIMATION
Objectives
After studying this chapter you should
• be able to calculate confidence intervals for the mean of a
normal distribution with unknown variance;
• be able to calculate confidence intervals for the variance and
standard deviation of a normal distribution;
• be able to calculate approximate confidence intervals for a
proportion;
• be able to calculate approximate confidence intervals for the
mean of a Poisson distribution.
2.0 Introduction
In Chapter 9 of the text Statistics the idea of a confidence interval
was introduced. Confidence intervals are used when we want to
estimate a population parameter from a sample. The parameter
may be estimated by a single value (a point estimate) but it is
usually preferable to estimate it by an interval which will give
some indication of the amount of uncertainty attached to the
estimate. In Statistics, estimation of the mean of a normal
population with known standard deviation was considered.
If $\bar{x}$ is the mean of a random sample of size n from a normal
distribution with mean µ and standard deviation σ, there is a
probability of 0.95 that $\bar{x}$ lies within $1.96\,\sigma/\sqrt{n}$ of µ. If this is the
case the interval
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
will contain µ. This interval is called a 95% confidence interval
for µ.
If further samples of size n were taken and the calculation
repeated, different intervals would be calculated. 95% of these
intervals would contain µ, but 5% would not.
[Figure: sampling distribution of $\bar{x}$, centred on µ, with area 0.95 between $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$ and 0.025 in each tail.]
Note: although µ is unknown, it does not vary; it is the intervals
that vary. It is possible to calculate 99% or even 99.9%
confidence intervals, which would be wider than the 95% interval,
but it is not possible to calculate 100% confidence intervals.
Chapter 2 Estimation
Example
The length of time a bus takes to travel from Chorlton to All
Saints in the morning rush hour is normally distributed with
standard deviation 4 minutes. A random sample of 6 journeys
took 23, 19, 25, 34, 24 and 28 minutes. Find
Solution
The sample mean is $\bar{x} = \dfrac{153}{6} = 25.5$.
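The interval $\bar{x} \pm z\,\sigma/\sqrt{n}$ from the Introduction can be checked numerically for these journey times. The following is a minimal sketch, assuming a 95% interval is wanted and taking σ = 4 as known; the function name `z_interval` is illustrative and not from the text.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(data, sigma, confidence=0.95):
    """Confidence interval for a normal mean when sigma is known."""
    n = len(data)
    x_bar = mean(data)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

journey_times = [23, 19, 25, 34, 24, 28]
low, high = z_interval(journey_times, sigma=4)
print(f"95% interval: ({low:.1f}, {high:.1f})")  # (22.3, 28.7)
```

Raising `confidence` to 0.99 gives a wider interval, as the Introduction notes.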
Probably the journey times for similar journeys had been studied
and the standard deviation found to be about four minutes. The
statement that the standard deviation was 'known' was probably
something of an exaggeration. The same argument applies to the
assumption of a normal distribution. However, as you are dealing with the sample
mean, no great error will result from assuming a normal
distribution unless the distribution is extremely unusual.
In the case of the bus journey times the estimate is $\hat{\sigma} = 5.09$; the formula
$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
or equivalent can be used.
For a 95% confidence interval, σ known, the formula $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
was used. Now the known standard deviation σ will be replaced by
an estimate σ̂. There will be some uncertainty in this estimate
since, if you started again and took a different sample of the same
size, the estimate σ̂ would almost certainly be different. It
therefore seems reasonable that to allow for this extra uncertainty,
the interval should be widened by increasing the figure of 1.96
which came from tables of the normal distribution. How much you
need to increase it by has fortunately been calculated for you and is
tabulated in tables of the t distribution.
In the example there were 6 bus journey times and so the estimate of
5.09 for the standard deviation is based on 5 degrees of freedom.
To find a 95% confidence interval you therefore require the
upper and lower 0.025 tails of the t distribution with 5 degrees
of freedom, denoted $t_5$. As with the standard normal distribution,
the t distribution is symmetrical about zero and the required values
are ±2.571.
[Figure: $t_5$ distribution with central area 0.95 between −2.571 and 2.571 and 0.025 in each tail.]
The 95% confidence interval for the mean is
$$25.5 \pm 2.571 \times \frac{5.09}{\sqrt{6}}$$
i.e. 25.5 ± 5.34 or (20.2, 30.8)
Note: For the use of the t distribution to be valid the data must be
normally distributed. However, small deviations from
normality will not seriously affect the results.
$$\bar{x} \pm t_{n-1}\,\frac{\hat{\sigma}}{\sqrt{n}}$$
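As a check on the bus journey calculation, the formula above can be sketched in a few lines. This assumes the critical value is read from t tables (2.571 for 5 degrees of freedom, as quoted earlier) and passed in by hand; the function name `t_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval for a normal mean, sigma estimated from the data.

    t_crit is the tabulated t value with len(data) - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    sigma_hat = stdev(data)  # uses the divisor n - 1
    half_width = t_crit * sigma_hat / sqrt(n)
    return x_bar - half_width, x_bar + half_width

journey_times = [23, 19, 25, 34, 24, 28]
low, high = t_interval(journey_times, t_crit=2.571)
print(f"95% interval: ({low:.1f}, {high:.1f})")  # (20.2, 30.8)
```

Passing the table value explicitly keeps the sketch within the standard library and mirrors the way the calculation is done by hand.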
Example
The resistances (in ohms) of a random sample from a batch of
resistors were
Solution
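The data gives $\bar{x} = 2373.4$ and $\hat{\sigma} = 47.4$.
(i) $t_{6,\,0.025} = 2.447$, giving
95% confidence interval for the mean as
$$2373.4 \pm 2.447 \times \frac{47.4}{\sqrt{7}}$$
⇒ 2373.4 ± 43.8
⇒ (2330, 2417).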
(ii) $t_{6,\,0.05} = 1.943$, giving
90% confidence interval for the mean as
$$2373.4 \pm 1.943 \times \frac{47.4}{\sqrt{7}}$$
⇒ 2373.4 ± 34.8
⇒ (2339, 2408).
[Figure: $t_6$ distribution with central area 0.90 between −1.943 and 1.943 and 0.05 in each tail.]
Activity 1
6.86 11.53 12.41 12.08 12.80 10.42 8.99 9.55 8.23 5.84
7.59 5.96 10.14 10.12 10.22 10.42 11.83 8.73 11.57 11.83
10.76 9.93 10.63 7.94 12.44 12.49 9.63 9.45 13.40 10.78
13.44 11.85 13.62 13.24 12.56 10.56 10.77 8.51 11.65 9.36
8.12 11.88 11.68 7.36 7.07 10.04 9.55 12.97 10.85 8.58
8.27 9.22 11.36 9.43 8.80 9.07 7.66 13.16 8.34 7.12
3.49 13.04 13.16 11.48 8.30 10.01 10.29 11.78 13.18 8.18
10.00 12.27 14.18 9.91 9.62 7.48 8.50 10.53 13.06 6.74
6.05 9.96 7.51 10.19 9.07 9.29 6.01 12.02 10.04 10.64
9.74 8.23 9.45 5.41 9.68 10.64 6.77 10.76 8.10 10.33
8.34 11.61 9.72 11.24 12.84 6.10 10.78 8.27 8.52 7.42
8.91 8.52 10.66 14.06 9.37 10.44 11.81 9.87 9.78 10.44
10.82 10.10 7.68 11.87 7.49 9.99 8.54 4.65 5.37 8.83
15.00 10.02 9.41 8.16 9.54 9.32 6.15 12.59 12.24 13.02
9.80 8.61 8.92 8.86 11.92 13.01 14.11 11.57 10.46 11.27
8.35 8.95 9.12 7.20 11.20 13.42 13.46 12.80 10.99 10.33
14.31 7.72 9.88 10.57 13.20 11.90 8.48 9.41 7.76 10.35
8.78 9.45 11.48 10.96 7.68 9.26 14.29 8.35 6.80 8.29
8.83 10.72 10.02 11.80 13.56 13.00 10.79 7.51 8.15 10.14
11.02 8.49 9.82 8.97 9.86 7.74 11.81 9.87 10.77 9.18
Now take the next sample of 3 and repeat the calculation. Carry
on until you have calculated at least 20, and preferably more,
intervals. If possible work with a group so that the labour of
calculation may be divided up between you.
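If a computer is available, the repeated calculation can be shared out in a loop. The sketch below rests on two assumptions, since the opening instructions of the activity are not reproduced here: each sample is taken as three consecutive values from the table, and a 95% interval is based on the t distribution with 2 degrees of freedom (tabulated value 4.303). Only the first row of the data is typed in.

```python
from math import sqrt
from statistics import mean, stdev

# Upper 0.025 point of the t distribution with 2 degrees of freedom (from tables).
T_2 = 4.303

# First row of the Activity 1 data; grouping into consecutive
# samples of 3 is an assumption made for this sketch.
values = [6.86, 11.53, 12.41, 12.08, 12.80, 10.42, 8.99, 9.55, 8.23, 5.84]

def intervals_from_samples(values, size=3, t_crit=T_2):
    """Return one (low, high) interval per complete consecutive sample."""
    intervals = []
    for start in range(0, len(values) - size + 1, size):
        sample = values[start:start + size]
        half_width = t_crit * stdev(sample) / sqrt(size)
        intervals.append((mean(sample) - half_width, mean(sample) + half_width))
    return intervals

intervals = intervals_from_samples(values)
for low, high in intervals:
    print(f"({low:.2f}, {high:.2f})")
```

Extending `values` with further rows produces more intervals, and the proportion containing the population mean can then be counted.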
Exercise 2A
1. Samples of a high temperature lubricant were
tested and the temperature (°C) at which they
ceased to be effective were as follows:
235 242 235 240 237 234 239 237
Calculate a 95% confidence interval for the mean.
2. In a study aimed at improving the design of bus
cabs the functional arm reach of a random sample
of bus drivers was measured. The results, in mm,
were as follows:
701, 642, 651, 700, 672, 674, 656, 649
Calculate a 95% confidence interval for the mean.
3. As part of a research study on pattern recognition
a random sample of students on a design course
were asked to examine a picture and see if they
could recognise a word. The picture contained
the word 'technology' written backwards. The
times, in seconds, taken to recognise the word
were as follows:
55, 28, 79, 54, 87, 61, 62, 68, 38
Calculate
(a) a 95% confidence interval for the mean,
(b) a 99% confidence interval for the mean.
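To do this you need to use the fact that for a sample of size n
from a normal distribution
$$\frac{\sum (x - \bar{x})^2}{\sigma^2}$$
is distributed as $\chi^2$ with $n - 1$ degrees of freedom.
For the bus journey times $n - 1 = 5$, and tables of the $\chi^2$
distribution with 5 degrees of freedom give the values 0.831 and
12.833 as cutting off the lower and upper 0.025 tails, so there is
a probability of 0.95 that $\sum (x - \bar{x})^2 / \sigma^2$
will lie between them.
Since $\sum (x - \bar{x})^2 = 5 \times 25.9$,
$$0.831 < 5 \times \frac{25.9}{\sigma^2} < 12.833$$
$$\Rightarrow\ 0.006417 < \frac{1}{\sigma^2} < 0.0990965$$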
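The remaining step can be sketched as follows, using the rounded figures above ($5 \times 25.9 = 129.5$). Taking reciprocals reverses the inequalities, and taking square roots then gives limits for σ:
$$\frac{129.5}{12.833} < \sigma^2 < \frac{129.5}{0.831} \ \Rightarrow\ 10.09 < \sigma^2 < 155.8$$
$$\Rightarrow\ 3.18 < \sigma < 12.5$$
These limits are worked from the values quoted in this section and are offered as a check rather than as the printed answer.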
Example
In processing grain in the brewing industry, the percentage
extract recovered is measured. A particular brewery introduces
a new source of grain and the percentage extract on eleven
separate days is as follows:
95.2, 93.1, 93.5, 95.9, 94.0, 92.0, 94.4, 93.2, 95.5, 92.3, 95.4
Solution
For this data
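n = 11, $\bar{x} = 94.045$, $\hat{\sigma} = 1.34117$
(a)
(i) 90% confidence interval for variance is given by
$$3.940 < 10 \times \frac{1.34117^2}{\sigma^2} < 18.307$$
$$\Rightarrow\ 0.2190 < \frac{1}{\sigma^2} < 1.0178$$
[Figure: $\chi^2$ distribution with 10 degrees of freedom, with central area 0.90 between 3.940 and 18.307 and 0.05 in each tail.]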
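The inequality can be inverted numerically to give limits for the variance and standard deviation. The following is a minimal sketch, assuming the two $\chi^2$ values are read from tables (3.940 and 18.307 for 10 degrees of freedom, as above); the function name `variance_interval` is illustrative.

```python
from math import sqrt
from statistics import variance

def variance_interval(data, chi2_lower, chi2_upper):
    """Confidence interval for a normal variance.

    chi2_lower and chi2_upper are tabulated chi-squared values with
    len(data) - 1 degrees of freedom cutting off equal tails.
    """
    n = len(data)
    sum_sq = (n - 1) * variance(data)  # sum of squared deviations from the mean
    # Dividing by the larger table value gives the lower limit.
    return sum_sq / chi2_upper, sum_sq / chi2_lower

extract = [95.2, 93.1, 93.5, 95.9, 94.0, 92.0, 94.4, 93.2, 95.5, 92.3, 95.4]
var_low, var_high = variance_interval(extract, 3.940, 18.307)
print(f"variance: ({var_low:.2f}, {var_high:.2f})")
print(f"standard deviation: ({sqrt(var_low):.2f}, {sqrt(var_high):.2f})")
```

The upper limit for the variance comes out at about 4.57, so a variance of 6.25 lies above the interval.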
(b) The mean of the previous source of grain was 94.2. This lies
in the middle of the confidence interval calculated for the
mean of the new source of grain. There is therefore no
evidence that the means differ.
The standard deviation of the previous source of grain was
2.5 and hence the variance was $2.5^2 = 6.25$.
This is above the upper limit of the confidence interval for
the variance of the new source of grain. This suggests that
the new source gives less variability.
Combining these two conclusions suggests that the new
source is preferable to the previous source.
Activity 2
Take a sample of size 6 from the data in Activity 1. Calculate
an 80% confidence interval for the standard deviation.
Exercise 2B
1. Using the data in Questions 1, 2 and 3 of
Exercise 2A, calculate 95% confidence intervals
for the population standard deviations.
2. The external diameter, in cm, of a random
sample of piston rings produced on a particular
machine were
9.91, 9.89, 10.12, 9.98, 10.09,
9.81, 10.01, 9.99, 9.86
Calculate a 95% confidence interval for the
standard deviation. Assume a normal
distribution.
Do your results support the manufacturer's claim
that the standard deviation is 0.06 cm?
3. The vitamin C content of a random sample of 5
lemons was measured. The results in 'mg per
10 g' were
1.04, 0.95, 0.63, 1.62, 1.11
Assuming a normal distribution calculate a 95%
confidence interval for the standard deviation.
A greengrocer claimed that the method of
determining the vitamin C content was extremely
unreliable and that the observed variability was
more due to errors in the determination rather
than to actual differences between lemons. To
check this 7 independent determinations were
made of the vitamin C content of the same
lemon. The results were as follows
1.21, 1.22, 1.21, 1.23, 1.24, 1.23, 1.22
Assuming a normal distribution, calculate a 90%
confidence interval for the standard deviation of
the determinations. Does your result support the
greengrocer's claim?
The proportion of claims for less than £500 is estimated by $\dfrac{r}{n}$.
$$\frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}.$$
$$\frac{0.15 \times 0.85}{120} = 0.0010625$$
⇒ 0.15 ± 0.064
⇒ (0.086, 0.214).
In general, the formula for an approximate confidence interval
for a proportion is
$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
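The formula can be checked against the figures for the claims example. This is a minimal sketch, assuming a 95% interval with $\hat{p} = 0.15$ and n = 120 as in the calculation above; the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Approximate confidence interval for a proportion (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_interval(0.15, 120)
print(f"({low:.3f}, {high:.3f})")  # (0.086, 0.214)
```

Because the interval relies on the normal approximation to the binomial distribution, it is only appropriate when n is reasonably large.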
Example
Employees of a firm carrying out motorway maintenance are
issued with brightly coloured waterproof jackets. These come
in five different sizes numbered 1 to 5. The last 40 jackets
issued were of the following sizes
2 3 3 1 3 3 2 4 3 2 5 4 1 2 3 3 2 4 5 3
2 4 4 1 5 3 3 2 3 3 1 3 4 3 3 2 5 1 4 4
Solution
(a) (i) There are 15 out of 40 requiring size 3, a proportion of
$\dfrac{15}{40} = 0.375$. An approximate 95% confidence interval
is given by
$$0.375 \pm 1.96 \times \sqrt{\frac{0.375 \times 0.625}{40}}$$
⇒ 0.375 ± 0.150
⇒ (0.225, 0.525).
(ii) The confidence interval is approximate because an
estimate of p is used (the true value is unknown) and
because the normal distribution is used as an
approximation to the binomial distribution.
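$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Hence if the interval is $\hat{p} \pm 0.1$,
$$0.1 = z\sqrt{\frac{0.375 \times 0.625}{40}}$$
i.e. z = 1.306;
this is a (1 − 2 × 0.096) × 100 = 81 per cent confidence
interval.
[Figure: standard normal distribution with area 0.9042 below z = 1.306 and 0.0958 above.]
$$0.1 = 1.96\sqrt{\frac{0.375 \times 0.625}{n}}$$
$$\Rightarrow\ \sqrt{n} = 9.4888$$
$$\Rightarrow\ n = 90.04$$
Sample of size 90 needed.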
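Both calculations above rearrange the same half-width formula, once for z and once for n. The sketch below repeats them with the figures from this example ($\hat{p} = 0.375$, half-width 0.1); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.375
half_width = 0.1

# Confidence level of the interval p_hat ± 0.1 when n = 40.
z = half_width / sqrt(p_hat * (1 - p_hat) / 40)
confidence = 2 * NormalDist().cdf(z) - 1
print(f"z = {z:.3f}, confidence = {100 * confidence:.0f}%")

# Sample size giving a 95% interval of p_hat ± 0.1, using z = 1.96.
n_needed = (1.96 / half_width) ** 2 * p_hat * (1 - p_hat)
print(f"n = {n_needed:.2f}")
```

The sample size depends on the estimate of p used, so the result is a guide rather than an exact requirement.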
Exercise 2C
1. When a random sample of 80 climbing ropes
were subjected to a strain equivalent to the
weight of ten climbers, 12 of them broke.
Calculate a 95% confidence interval for the
proportion of ropes which would break under this
strain.
2. Data from a completed questionnaire were
entered into a computer as a series of binary
digits (i.e. each digit was 0 or 1). A check on
1000 digits revealed errors in 19 of them.
Assuming the probability of an error is the same
for each digit entered, calculate a 90%
confidence interval for the proportion of digits
where an error will be made.
3. A large civil engineering firm issues every new
employee with a safety helmet. Five different
sizes are available numbered 1 to 5. A random
sample of 90 employees required the following
sizes
2 4 2 2 2 5 4 5 4 4
4 2 4 3 4 2 3 1 5 4
3 2 3 3 3 4 3 2 4 4
3 4 4 5 3 3 3 2 4 4
2 2 3 2 3 2 3 3 5 4
2 3 4 2 4 3 2 2 3 2
3 4 2 3 4 5 2 3 3 2
4 3 2 2 3 3 3 2 3 4
2 3 2 4 2 3 3 2 2 3
⇒ (118.6, 165.4).
In general the formula is
$$m \pm z\sqrt{m}$$
The calculation could have been carried out by finding the mean
number of plants observed in 10 areas of 1 m² and basing the
calculation on this. However, there would be no advantage in
this as it would give the identical answer.
$$14.2 \pm 1.96\sqrt{\frac{14.2}{10}}$$
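The equivalence of the two routes can be checked numerically. The sketch below assumes a total count of 142 plants over the 10 areas, a figure inferred from the interval (118.6, 165.4) and the mean of 14.2 rather than stated in this section; the function name `poisson_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def poisson_interval(count, confidence=0.95, units=1):
    """Approximate interval for a Poisson mean per unit.

    count is the total observed over `units` units of area or time.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(count)
    return (count - half_width) / units, (count + half_width) / units

total = 142  # assumed total over 10 areas of 1 m²
low, high = poisson_interval(total)
print(f"per 10 m²: ({low:.1f}, {high:.1f})")  # (118.6, 165.4)

low_1, high_1 = poisson_interval(total, units=10)
print(f"per 1 m²: ({low_1:.2f}, {high_1:.2f})")
```

Dividing the limits for the total by 10 gives the same interval as working with the mean of 14.2 directly.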
Exercise 2D
1. Cars pass a point on a motorway during the
morning rush hour at random at a constant
average rate. An observer counts 212 cars
passing during a 5 minute interval. Calculate
(a) a 95% confidence interval for the mean
number of cars passing in a 5 minute interval,
(b) a 95% confidence interval for the mean
number of cars passing in a one minute
interval,
(c) a 90% confidence interval for the mean
number of cars passing in a two minute
interval,
(d) a 99% confidence interval for the mean
number of cars passing in an hour.
2. The number of times a machine needs resetting
on a night shift follows a Poisson distribution.
On three randomly selected nights it was reset 9,
5 and 11 times. Calculate a 95% confidence
interval for the average number of times it needs
resetting per night.
3. The number of a certain type of organism
suspended in a liquid follows a Poisson
distribution. 10 cc of the liquid are found to
contain 35 of the organisms. Calculate
(a) a 90% confidence interval for the mean
number of organisms per 10 cc,
(b) a 95% confidence interval for the mean
number of organisms per cc,
(c) a 99% confidence interval for the mean
number of organisms per 100 cc.
A further 10 cc of the liquid were examined and
found to contain 26 of the organisms. Modify
your answers to (a), (b) and (c) to take account
of this additional data.
Assuming that the claim of a standard
deviation of 300 hours is correct and that the
lives of the new components follow a normal
distribution, calculate
(i) a 90% confidence interval for the
mean working life of the components,
(ii) how many components it would be
necessary to test to make the width of
a 90% confidence interval for the
mean just less than 100 hours.
(c) Lives of components commonly follow a
distribution that is not normal. If the
assumption of normality is invalid in this
case, comment briefly on the amount of
uncertainty in your answers to (b)(i) and
(b)(ii).
(d) Using all the information available, compare
the two designs and recommend which one
should be used. (AEB)
4. The resistances (in ohms) of a sample from a
batch of resistors were
2314, 2456, 2389, 2361, 2360, 2332, 2402.
Assuming that the sample is from a normal
distribution,
(a) Calculate a 90% confidence interval for the
standard deviation of the batch.
Past experience suggests that the standard
deviation, σ, is 35 ohms.
(b) Calculate a 95% confidence interval for the
mean resistance of the batch
(i) assuming σ = 35,
(ii) making no assumption about the
standard deviation.
(c) Compare the merits of the confidence
intervals calculated in (b). (AEB)
5. Packets of baking powder have a nominal weight
of 200 g. The distribution of weights is normal
and the standard deviation is 7 g. Average
quantity system legislation states that, if the
nominal weight is 200 g,
(i) the average weight must be at least
200 g,
(ii) not more than 2.5% of packages
may weigh less than 191 g,
(iii) not more than 1 in 1000 packages may
weigh less than 182 g.
A random sample of 30 packages had the
following weights:
218 207 214 189 211 206 203 217 183 186
219 213 207 214 203 204 195 197 213 212
188 221 217 184 186 216 198 211 216 200
(a) Calculate a 95% confidence interval for the
mean weight.
(b) Find the proportion of packets in the sample
weighing less than 191 g and use your result
to calculate an approximate 95% confidence
interval for the proportion of all packets
weighing less than 191 g.
(c) Assuming that the mean weight is at the lower
limit of the interval calculated in (a), what
proportion of packets would weigh less than
182 g?
(d) Discuss the suitability of the packets from the
point of view of the average quantity system.
A simple adjustment will change the mean
weight of future packages. Changing the
standard deviation is possible but very
expensive. Without carrying out any further
calculations, discuss any adjustments you
might recommend. (AEB)
6. A car manufacturer introduces a new method of
assembling a new component. The old method
had a mean assembly time of 42 minutes with a
standard deviation of 4 minutes. The
manufacturer would like the assembly time to be
as short as possible and to have as little
variation as possible. He expects the new
method to have a smaller mean but to leave the
variability unchanged. A random sample of
assembly times, in minutes, taken after the new
method had become established was
27, 19, 68, 41, 17, 52, 35, 72, 38.
A statistician glanced at the data and said she
thought the variability had increased.
(a) Suggest why she said this.
(b) Assuming the data may be regarded as a
random sample from a normal distribution,
calculate a 95% confidence interval for the
standard deviation. Does this confirm the
statistician's claim or not?
(c) Calculate a 90% confidence interval for the
mean using a method which is appropriate in
the light of your answer to (b).
(d) Comment on the suitability of the new
process. (AEB)
7. Stud anchors are used in the construction
industry. Samples are tested by embedding them
in concrete and applying a steadily increasing
load until the stud fails.
(a) A sample of 6 tests gave the following
maximum loads in kN
27.0, 30.5, 28.0, 23.0, 27.5, 26.5
Assuming a normal distribution for maximum
loads, find 95% confidence intervals for
(i) the mean,
(ii) the standard deviation.
(b) If the mean was at the lower end and the
standard deviation at the upper end of the
confidence intervals calculated in (a), find
the value of k which the maximum load
would exceed with probability 0.99.
Safety regulations state that the greatest load
that may be applied under working conditions
is $(\bar{x} - 2\hat{\sigma})/3$ where $\bar{x}$ is the mean and $\hat{\sigma}^2$ is
the unbiased estimate of variance calculated
from a sample of 6 tests. Calculate this
figure for the data above and comment on the
adequacy of this regulation in these
circumstances. (AEB)
8. A campaign to combat the economic devastation
caused to coalfield communities by pit closures
employed a researcher. The campaign organisers
wished to know the proportion of redundant
miners who were able to find alternative
employment within a year of becoming
redundant.
(a) The researcher found that the probability of a
redundant miner visited at home refusing to
answer a questionnaire is 0.2.
What is the probability that on a day when he
visits twelve redundant miners at home
(i) 3 or fewer will refuse to answer the
questionnaire,
(ii) exactly 3 will refuse to answer the
questionnaire,
(iii) at least 10 will agree to answer the
questionnaire?
(b) The researcher decided to try a postal survey
and as a pilot scheme sent out 70
questionnaires to randomly selected
redundant miners. There were 26 completed
questionnaires returned. Calculate an
approximate 95% confidence interval for the
proportion of redundant miners who would
return a completed questionnaire.
(c) Of the 26 who replied, 10 had obtained
employment. Of all redundant miners who
would reply to a questionnaire, a proportion
p have obtained employment. Approximately
how many replies would be necessary to
obtain a 95% confidence interval of width 0.1
for p?
(d) Using the results of (b) and (c) estimate
approximately how many letters should be
sent out to give a high probability of
obtaining sufficient replies to calculate the
confidence interval in (c). Explain your
answer. (AEB)
9. It is known that repeated weighings of the same
object on a particular chemical balance give
readings which are normally distributed with
mean equal to the mass of the object. Past
experience suggests that the standard deviation,
σ, is 0.25 mg. Seven repeated weighings gave
the following readings (mg).
19.3, 19.5, 19.1, 19.0, 19.8, 19.7, 19.4
(a) Use the data to calculate a 95% confidence
interval for σ.
(b) Calculate a 95% confidence interval for the
mass of the object assuming σ = 0.25 mg.
(c) Calculate a 95% confidence interval for the
mass of the object, making no assumption
about σ, and using only data from the
sample.
(d) Give two reasons for preferring the
confidence interval calculated in (b) to that
calculated in (c).