Cambridge International AS & A Level Mathematics: Probability & Statistics 2
3 Niheda wishes to choose a representative sample of six employees from the 78 employees at her place of work.
a Niheda considers taking as her sample the first six people arriving at work one morning. Give two reasons why this method is unsatisfactory.
b Niheda decides to use the following method to choose her sample. She numbers each employee at her place of work and generates the following random numbers on her calculator:
642 784 034 796 313 215 950 850 565 013 311 170 929
From these random numbers, she chooses employees 40 47 63 32 59 and 8. Explain how she chose these employees.
4 The manufacturer of a new chocolate bar wishes to find out what people think of it. The manufacturer decides to interview a sample of people. Describe the bias in the method used to select each of the following samples.
a A sample of people who have just bought the chocolate bar.
c The first 20 males shopping at a store where the chocolate bar is sold.
5 Milek wishes to choose a sample of four students from a class of 16 students. The students are numbered from 3 to 18, inclusive. Milek throws three fair dice and adds the scores. Explain why this method of choosing the sample is biased.
6 Describe briefly how to use random numbers to choose a sample of 50 employees from a company with 712 employees.
Different samples of data chosen from the same population will most likely have different,
The sample mean is the mean of all the items in your chosen sample.
The sample size is the number of items you choose to be in your sample.
You can explore the distribution of sample means using any distribution, discrete or …
For example, suppose you spin a fair four-sided spinner, numbered 1, 1, 2 and 4, a number of times. With a sample size of five, you could get the following outcomes:
1 1 4 2 4 or 1 1 1 4 1 or 4 1 2 2 1 or …
The sample means for each of the samples presented are $\frac{12}{5} = 2.4$, $\frac{8}{5} = 1.6$ and $\frac{10}{5} = 2$, respectively. It would take a very long time to list all possible samples and work out each sample mean. If we did, we could create a table showing the probability distribution of the sample mean and create a graph to show the distribution of the sample means.
To show how this works we can begin with the simplest sample size, 1, and explore the probability distribution of the means of increasing sample sizes using this fair four-sided spinner.
Let the random variable $X$ be ‘the score on the spinner when it is spun’.
When we spin the spinner, it is equally likely to land on each side. With a sample size of 1 …
To distinguish between the probability distribution of scores and the probability distribution of sample means we use $\bar{X}$ to represent the random variable of sample means. To follow the explanation it is easier to refer to the distribution of sample size 1 as $\bar{X}(1)$.
The probability distribution of $\bar{X}(1)$ is:
Sample mean, $\bar{x}$ | 1 | 2 | 4
$P(\bar{X}(1) = \bar{x})$ | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{4}$
The following figure shows the graph of the probability distribution of the sample means of size 1.
[Figure: probability distribution of the sample means of size 1; vertical axis $P(\bar{X}(1))$, horizontal axis sample mean, $\bar{x}$.]
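$$E(\bar{X}(1)) = 1 \times \frac{1}{2} + 2 \times \frac{1}{4} + 4 \times \frac{1}{4} = 2$$
$$\mathrm{Var}(\bar{X}(1)) = 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4} + 4^2 \times \frac{1}{4} - 2^2 = 5.5 - 2^2 = 1.5$$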
Suppose we now choose random samples of size 2. If $X_1$ is the score from the first spin and $X_2$ the score from the second spin, then we can draw a table to show all possible sample means.
$X_2$ \ $X_1$ | 1 | 1 | 2 | 4
1 | 1 | 1 | $1\frac{1}{2}$ | $2\frac{1}{2}$
1 | 1 | 1 | $1\frac{1}{2}$ | $2\frac{1}{2}$
2 | $1\frac{1}{2}$ | $1\frac{1}{2}$ | 2 | 3
4 | $2\frac{1}{2}$ | $2\frac{1}{2}$ | 3 | 4
From this we can find the probability distribution of the sample means of size 2, $\bar{X}(2)$. For example, there are 16 possible sample means and the sample mean $1\frac{1}{2}$ appears four times in the table; hence, $P(\bar{X}(2) = 1\frac{1}{2}) = \frac{4}{16} = \frac{1}{4}$.
Sample mean, $\bar{x}$ | 1 | $1\frac{1}{2}$ | 2 | $2\frac{1}{2}$ | 3 | 4
$P(\bar{X}(2) = \bar{x})$ | $\frac{1}{4}$ | $\frac{1}{4}$ | $\frac{1}{16}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{16}$
This probability distribution is the distribution of the sample means of size 2. The following diagram shows the graph of the probability distribution of the sample means of size 2.
[Figure: probability distribution of the sample means of size 2; vertical axis $P(\bar{X}(2))$, horizontal axis sample mean, $\bar{x}$.]
$$E(\bar{X}(2)) = \left(1 \times \frac{1}{4}\right) + \left(1\frac{1}{2} \times \frac{1}{4}\right) + \left(2 \times \frac{1}{16}\right) + \left(2\frac{1}{2} \times \frac{1}{4}\right) + \left(3 \times \frac{1}{8}\right) + \left(4 \times \frac{1}{16}\right) = 2$$
$$\mathrm{Var}(\bar{X}(2)) = 1^2 \times \frac{1}{4} + \left(1\frac{1}{2}\right)^2 \times \frac{1}{4} + 2^2 \times \frac{1}{16} + \left(2\frac{1}{2}\right)^2 \times \frac{1}{4} + 3^2 \times \frac{1}{8} + 4^2 \times \frac{1}{16} - 2^2 = 4.75 - 4 = 0.75$$
Note that $E(\bar{X}(1)) = E(\bar{X}(2))$, whereas $\mathrm{Var}(\bar{X}(1)) \ne \mathrm{Var}(\bar{X}(2))$.
In fact, $\mathrm{Var}(\bar{X}(2)) = \frac{1}{2}\mathrm{Var}(\bar{X}(1))$.
REWIND
We can explore these results using our knowledge of linear combinations of random variables from Chapter 3. From Chapter 3 we know that:
$$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2).$$
$$E(\bar{X}(2)) = E\left(\tfrac{1}{2}(X_1 + X_2)\right) = E\left(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\right) = \tfrac{1}{2}E(X) + \tfrac{1}{2}E(X) = E(X) = 2 \text{ (as before)}$$
$$\mathrm{Var}(\bar{X}(2)) = \mathrm{Var}\left(\tfrac{1}{2}(X_1 + X_2)\right) = \mathrm{Var}\left(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\right) = \frac{1}{2^2}\mathrm{Var}(X) + \frac{1}{2^2}\mathrm{Var}(X) = \frac{1}{2}\mathrm{Var}(X) = \frac{1}{2} \times 1.5$$
To confirm these results will always work, we can explore what happens when we take a sample size of 3.
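The probability distribution of the sample mean scores for $\bar{X}(3)$ is shown in the following table.
Sample mean, $\bar{x}$ | 1 | $\frac{4}{3}$ | $\frac{5}{3}$ | 2 | $\frac{7}{3}$ | $\frac{8}{3}$ | 3 | $\frac{10}{3}$ | 4
$P(\bar{X}(3) = \bar{x})$ | $\frac{1}{8}$ | $\frac{3}{16}$ | $\frac{3}{32}$ | $\frac{13}{64}$ | $\frac{3}{16}$ | $\frac{3}{64}$ | $\frac{3}{32}$ | $\frac{3}{64}$ | $\frac{1}{64}$
The following diagram shows the graph of the probability distribution of the sample means for a sample size of 3.
[Figure: probability distribution of the sample means of size 3; vertical axis $P(\bar{X}(3))$, horizontal axis sample mean, $\bar{x}$.]
EXPLORE 5.5
See if you can verify the probabilities of all other scores in the distribution table for samples of size 2.
For a sample of size 3, a mean score 2 can happen in the following four ways:
1 1 4, 1 4 1, 4 1 1, 2 2 2
The probability of this score is
$$3\left(\frac{1}{2} \times \frac{1}{2} \times \frac{1}{4}\right) + \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} = \frac{3}{16} + \frac{1}{64} = \frac{13}{64}$$
A mean score of $\frac{7}{3}$ can happen in the following six ways:
1 2 4, 1 4 2, 2 1 4, 2 4 1, 4 1 2, 4 2 1
each with a probability of $\frac{1}{2} \times \frac{1}{4} \times \frac{1}{4} = \frac{1}{32}$. Hence, the probability of this score is $\frac{6}{32} = \frac{3}{16}$.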
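$$E(\bar{X}(3)) = 1 \times \frac{1}{8} + \frac{4}{3} \times \frac{3}{16} + \frac{5}{3} \times \frac{3}{32} + 2 \times \frac{13}{64} + \frac{7}{3} \times \frac{3}{16} + \frac{8}{3} \times \frac{3}{64} + 3 \times \frac{3}{32} + \frac{10}{3} \times \frac{3}{64} + 4 \times \frac{1}{64}$$
$$= \frac{1}{8} + \frac{1}{4} + \frac{5}{32} + \frac{13}{32} + \frac{7}{16} + \frac{1}{8} + \frac{9}{32} + \frac{5}{32} + \frac{1}{16} = \frac{64}{32} = 2$$
$$\mathrm{Var}(\bar{X}(3)) = 1^2 \times \frac{1}{8} + \left(\frac{4}{3}\right)^2 \times \frac{3}{16} + \left(\frac{5}{3}\right)^2 \times \frac{3}{32} + 2^2 \times \frac{13}{64} + \left(\frac{7}{3}\right)^2 \times \frac{3}{16} + \left(\frac{8}{3}\right)^2 \times \frac{3}{64} + 3^2 \times \frac{3}{32} + \left(\frac{10}{3}\right)^2 \times \frac{3}{64} + 4^2 \times \frac{1}{64} - 2^2$$
$$= \frac{1}{8} + \frac{1}{3} + \frac{25}{96} + \frac{13}{16} + \frac{49}{48} + \frac{1}{3} + \frac{27}{32} + \frac{25}{48} + \frac{1}{4} - 2^2 = 4\frac{1}{2} - 4 = \frac{1}{2}$$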
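If $X_1$ is the score from the first spin, $X_2$ the score from the second spin, and $X_3$ the score from the third spin, then:
$$E(\bar{X}(3)) = E\left(\tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3\right) = \tfrac{1}{3}E(X_1) + \tfrac{1}{3}E(X_2) + \tfrac{1}{3}E(X_3) = E(X) = 2$$
$$\mathrm{Var}(\bar{X}(3)) = \mathrm{Var}\left(\tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3\right) = \frac{1}{3^2}\mathrm{Var}(X_1) + \frac{1}{3^2}\mathrm{Var}(X_2) + \frac{1}{3^2}\mathrm{Var}(X_3) = \frac{1}{3}\mathrm{Var}(X) = \frac{1}{3} \times 1.5 = 0.5$$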
EXPLORE 5.6
Are you able to use your knowledge of linear combinations of random variables to work out the results $E(\bar{X}(4))$ and $\mathrm{Var}(\bar{X}(4))$ for a sample of size 4?
What about our original sample size of 5? Can you find $E(\bar{X}(5))$ and $\mathrm{Var}(\bar{X}(5))$ without having to list all possible samples?
What do you think the graph of the sample means will look like as the sample size $n$ increases?
If you take many samples and calculate the mean of each sample, these means have a distribution called the distribution of the sample mean. A sample mean can be regarded as a random variable.
… found, then: $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
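The spinner calculations for samples of size 1, 2 and 3 can be checked by brute force. The following is a minimal sketch, not part of the original text: it lists every ordered sample from the spinner numbered 1, 1, 2 and 4 and builds the distribution of the sample mean exactly. The function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Spinner faces from the text: 1, 1, 2 and 4, each equally likely.
FACES = [1, 1, 2, 4]

def sample_mean_distribution(n):
    """Return {sample mean: probability} for samples of size n."""
    dist = {}
    p = Fraction(1, len(FACES) ** n)  # every ordered sample is equally likely
    for sample in product(FACES, repeat=n):
        mean = Fraction(sum(sample), n)
        dist[mean] = dist.get(mean, 0) + p
    return dist

def expectation(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance computed as E(X^2) - [E(X)]^2."""
    mu = expectation(dist)
    return sum(x * x * p for x, p in dist.items()) - mu * mu

for n in (1, 2, 3):
    d = sample_mean_distribution(n)
    # The mean stays at 2 while the variance is 1.5 divided by n.
    print(n, expectation(d), variance(d))
```

Exact fractions are used so that the results can be compared directly with the hand calculations, for example the probability 13/64 of a mean score of 2 when n = 3.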
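WORKED EXAMPLE 5.1
a Show that for samples of size 1 drawn from a fair six-sided die numbered 1, 2, 3, 4, 5 and 6, $E(\bar{X}(1)) = 3\frac{1}{2}$ and $\mathrm{Var}(\bar{X}(1)) = \frac{35}{12}$.
b Work out $E(\bar{X}(2))$ and $\mathrm{Var}(\bar{X}(2))$.
Answer
a … table.
$$E(\bar{X}(1)) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = \frac{21}{6} = 3\frac{1}{2}$$
$$\mathrm{Var}(\bar{X}(1)) = 1^2 \times \frac{1}{6} + 2^2 \times \frac{1}{6} + 3^2 \times \frac{1}{6} + 4^2 \times \frac{1}{6} + 5^2 \times \frac{1}{6} + 6^2 \times \frac{1}{6} - \left(3\frac{1}{2}\right)^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12}$$
b … algebra, as you have found $E(X)$ and $\mathrm{Var}(X)$.
$$E(\bar{X}(2)) = \frac{1}{2}E(X) + \frac{1}{2}E(X) = E(X) = 3\frac{1}{2}$$
$$\mathrm{Var}(\bar{X}(2)) = \frac{1}{2^2}\mathrm{Var}(X) + \frac{1}{2^2}\mathrm{Var}(X) = \frac{1}{2}\mathrm{Var}(X) = \frac{1}{2} \times \frac{35}{12} = \frac{35}{24}$$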
We have now found that the means of random samples of size $n$ from a population with mean $\mu$ and variance $\sigma^2$ will have a distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, but what …
Below are the graphs of the probability distributions for sample means of size 1, 2 and 3 for …
[Figure: three probability distribution graphs, $P(\bar{X}(1))$, $P(\bar{X}(2))$ and $P(\bar{X}(3))$, each plotted against sample mean, $\bar{x}$, from 0 to 4.]
We can see the shape of the distribution changes as $n$ increases; this tells us the distribution of sample means does not depend on the shape of the original distribution.
To examine the shape of the probability distribution of sample means as $n$ increases, let us explore an example using a more familiar object, the sample mean of scores on an ordinary fair die, numbered 1, 2, 3, 4, 5 and 6. The following graph shows the probability distribution …
[Figure: probability distribution of the sample means of size 1 for a fair die; vertical axis $P(\bar{X}(1))$, horizontal axis sample mean, $\bar{x}$, from 0 to 6.]
For sample size of 1, the probability distribution graph is uniform; each score has probability $\frac{1}{6}$.
The following graph shows the probability distribution of the sample means of size 2.
[Figure: probability distribution of the sample means of size 2 for a fair die; vertical axis $P(\bar{X}(2))$, horizontal axis sample mean, $\bar{x}$, from 0 to 6.]
For sample of size 2, the probability distribution graph is symmetrical about the mean value.
If we draw probability distribution graphs for larger sample sizes; for example, samples of size 6 and 10, shown on the following graphs, we can see the distribution of sample means …
[Figure: probability distribution of the sample means of size 6 for a fair die; vertical axis $P(\bar{X}(6))$, horizontal axis sample mean, $\bar{x}$, from 1.0 to 6.0.]
[Figure: probability distribution of the sample means of size 10 for a fair die; vertical axis $P(\bar{X}(10))$, horizontal axis sample mean, $\bar{x}$, from 1.0 to 6.0.]
For a fair ordinary die, and where the sample size is 1, the original distribution is uniform (rectangular). From samples of size 2 onwards, the graphs of the probability distributions of sample means show the peak of the graph at the mean of the original distribution.
As the sample size increases, the probability of getting a sample mean further away from the actual mean of the distribution, such as the mean of samples of size 1, becomes smaller and smaller. Hence, the variance of the distribution of the sample mean becomes smaller as $n$ becomes larger.
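The shrinking spread described above can be checked numerically. The following is a rough sketch, not the method used to draw the book's graphs: it builds the exact distribution of the total of $n$ fair dice by repeated convolution, rescales it to the sample mean, and reports the variance and the most likely sample mean for the sample sizes shown in the graphs. All function names are illustrative.

```python
from fractions import Fraction

def total_distribution(n, faces=(1, 2, 3, 4, 5, 6)):
    """Exact distribution of the total of n fair dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    p = Fraction(1, len(faces))
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for f in faces:
                new[total + f] = new.get(total + f, 0) + prob * p
        dist = new
    return dist

def mean_distribution(n):
    """Distribution of the sample mean: each total divided by n."""
    return {Fraction(t, n): pr for t, pr in total_distribution(n).items()}

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

for n in (1, 2, 6, 10):
    d = mean_distribution(n)
    mode = max(d, key=d.get)  # for n = 1 all scores tie, so this is just the first one
    print(n, float(variance(d)), float(mode), float(d[mode]))
```

For each sample size the variance equals 35/(12n), and for n = 2, 6 and 10 the most likely sample mean is 3.5, the mean of a single die.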
EXPLORE 5.7
You can create probability distribution graphs for different sample sizes. Search …
EXPLORE 5.8
Look back at your graphs generated from means of samples of single-digit random numbers from Explore 5.4. What conclusions can you now draw from these graphs?
For large sample sizes, the distribution of a sample mean is approximately normal. This normal distribution will have mean $\mu$ and variance $\frac{\sigma^2}{n}$.
This result is true for all distributions of sample means, regardless of whether the underlying population is normal. This is the fundamental property of the central limit theorem.
The central limit theorem (CLT) states that, provided $n$ is large, the distribution of sample means is approximately normal: $\bar{X}(n) \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where the original population has mean $\mu$ and variance $\sigma^2$.
The value of sample size $n$ required for the central limit theorem to be a good approximation depends on the distribution of the original population. If the original population is approximately normal, then a low value of $n$ is sufficient. However, if the original population does not display any features of a normal distribution, then the value of $n$ will need to be large. For any population, the central limit theorem can be used for sample size $n > 50$.
… of sample means from a normal distribution must also be a normal distribution since $E(\bar{X}(n)) = \mu$ and $\mathrm{Var}(\bar{X}(n)) = \frac{\sigma^2}{n}$ …
… means becomes more peaked and centred around $\mu$, as shown in the following diagram.
[Figure: graph of $f(x)$ against $x$ for $\bar{X}(n) \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, centred on $\mu$.]
KEY POINT 5.6
The central limit theorem is important when samples of data are being explored, because the distribution of means of samples is approximately normal even when the parent population is not normal. The central limit theorem allows the use of the normal distribution to make statistical judgements from sample data from any distribution.
WORKED EXAMPLE 5.2
The masses of a variety of pears are normally distributed with mean 45 g and variance 52 g². The pears are packed in bags of six. Find the percentage of bags of pears with a total mass of more than 300 g.
Answer 1
Sample mean $\bar{X} \sim N\left(45, \frac{52}{6}\right)$. In a bag with total mass 300 g, each pear will have an average mass of $\frac{300}{6}$ g. Use normal tables to calculate the probability and, hence, the percentage.
$$1 - \Phi\left(\frac{\frac{300}{6} - 45}{\sqrt{\frac{52}{6}}}\right) = 1 - \Phi(1.698) = 1 - 0.9553 = 0.0447 = 4.47\%$$
Answer 2
Working with the total mass of the six pears instead, $X \sim N(270, 312)$, since $6 \times 45 = 270$ and $6 \times 52 = 312$.
$$1 - \Phi\left(\frac{300 - 270}{\sqrt{312}}\right) = 1 - \Phi(1.698)$$
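The two answers to the pears example can be compared numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of normal tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 45, 52, 6   # mean (g) and variance (g^2) of one pear, pears per bag
limit = 300              # total mass threshold for a bag (g)

# Answer 1: work with the sample mean, N(mu, var / n)
mean_dist = NormalDist(mu, sqrt(var / n))
p_mean = 1 - mean_dist.cdf(limit / n)

# Answer 2: work with the total mass, N(n * mu, n * var)
total_dist = NormalDist(n * mu, sqrt(n * var))
p_total = 1 - total_dist.cdf(limit)

# Both routes standardise to the same z-value, so the probabilities agree.
print(round(p_mean, 4), round(p_total, 4))
```

Both probabilities come out at about 0.0447, matching the 4.47% found with tables.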
WORKED EXAMPLE 5.3
During an exercise session, women will drink, on average, 500 ml of water with a standard deviation of 50 ml. 25 women are taking part in the exercise session. You have available 13 litres of water. What is the probability you will have sufficient water?
Answer
The situation described has $\mu = 500$ and $\sigma = 50$. You do not know if the situation follows a normal distribution. However, the distribution of sample means is approximately normal, and its mean is the same as the population mean.
The probability of sufficient water implies less than 13000 ml will be needed by the group of women; that is, on average, less than $\frac{13000}{25} = 520$ ml each.
$$\bar{X} \sim N\left(500, \frac{50^2}{25}\right)$$
$$P(\bar{X} < 520) = P\left(Z < \frac{520 - 500}{\frac{50}{\sqrt{25}}}\right) = P(Z < 2) = 0.977$$
Use normal tables to calculate the probability.
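WORKED EXAMPLE 5.4
…
$$f(x) = \begin{cases} \dfrac{x}{2} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
… of $X$ is less than $\frac{3}{2}$.
Answer
First find the mean and variance of $X$.
$$\text{Mean} = \int_0^2 \frac{x^2}{2}\,dx = \left[\frac{x^3}{6}\right]_0^2 = \frac{4}{3}$$
$$\text{Variance} = \int_0^2 \frac{x^3}{2}\,dx - \left(\frac{4}{3}\right)^2 = \left[\frac{x^4}{8}\right]_0^2 - \frac{16}{9} = \frac{2}{9}$$
$$\bar{X} \sim N\left(\frac{4}{3}, \frac{2}{450}\right), \quad \text{since } \frac{2}{9} \div 50 = \frac{2}{450}$$
$$P\left(\bar{X} < \frac{3}{2}\right) = \Phi\left(\frac{\frac{3}{2} - \frac{4}{3}}{\sqrt{\frac{2}{450}}}\right) = \Phi(2.5) = 0.994$$
REWIND
… Chapter 4.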
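The integrals in Worked example 5.4 can be checked numerically. The sketch below approximates the mean and variance of the density $f(x) = \frac{x}{2}$ on $0 \le x \le 2$ with a simple midpoint rule and then applies the central limit theorem. The sample size of 50 is an assumption taken from the "÷ 50" step in the working, and the helper `integrate` is a hypothetical name rather than a library function.

```python
from math import sqrt
from statistics import NormalDist

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    """Density from the example: f(x) = x/2 on 0 <= x <= 2."""
    return x / 2

mean = integrate(lambda x: x * pdf(x), 0, 2)                 # close to 4/3
var = integrate(lambda x: x * x * pdf(x), 0, 2) - mean ** 2  # close to 2/9

n = 50  # assumed sample size, inferred from the working
p = NormalDist(mean, sqrt(var / n)).cdf(1.5)  # P(sample mean < 3/2)
print(round(mean, 4), round(var, 4), round(p, 4))
```

A numerical integral is used here only as a cross-check; the exact antiderivatives in the worked answer give the same mean and variance.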
WORKED EXAMPLE 5.5
An IT security firm detects threats to steal online data at the rate of 12.2 per day. The threats occur singly and at random. A random sample of 100 weeks is chosen. Find the probability that the average weekly number of threats detected is less than 86.
Answer 1
First define the distribution. Let the random variable $T$ be the number of threats detected in a week. Then $T \sim \mathrm{Po}(85.4)$, since $12.2 \times 7 = 85.4$, and $\bar{T}$ is the mean of a random sample of 100 observations of $T$.
$$\bar{T} \sim N\left(85.4, \frac{85.4}{100}\right)$$
$$P(\bar{T} < 86) = \Phi\left(\frac{86 - \frac{1}{200} - 85.4}{\sqrt{\frac{85.4}{100}}}\right) = \Phi(0.6439) = 0.74$$
Note that the continuity correction is $\frac{1}{2 \times 100} = \frac{1}{200}$, not $\pm\frac{1}{2}$.
REWIND
… Poisson distribution in …
To explain why, for this example we can find the required probability using an alternative method.
Answer 2
Let the random variable $X$ be the number of threats over 100 weeks. Then: …
This time, we are using the mean over the whole interval of 100 weeks.
In the first method, we found $P(\bar{T} < 86)$, whereas in the second method we found $P(X < 8600)$; 86 is 100 times smaller than 8600, and the continuity correction $\frac{1}{200}$ is 100 times smaller than $\frac{1}{2}$.
TIP
When using the central … Poisson distributions, the continuity correction is $\pm\frac{1}{2n}$.
The random variable $X \sim B(60, 0.25)$. The random variable $\bar{X}$ is the mean of a random sample of 50 observations of $X$. Find $P(\bar{X} \le 16)$.
Answer
Mean $np = 60 \times 0.25 = 15$ and variance $np(1 - p) = 60 \times 0.25 \times 0.75 = 11.25$, so
$$\bar{X} \sim N\left(15, \frac{11.25}{50}\right)$$
Use continuity correction $\frac{1}{2 \times 50} = \frac{1}{100}$.
$$P(\bar{X} \le 16) = \Phi\left(\frac{16 + \frac{1}{100} - 15}{\sqrt{\frac{11.25}{50}}}\right) = \Phi(2.129) = 0.983$$
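The continuity correction of $\frac{1}{2n}$ used in the last two examples can be checked against the more familiar correction of $\frac{1}{2}$ applied to a total. The following is a minimal sketch under the assumption that the parent variable takes integer values; `p_mean_below` is a hypothetical helper written for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def phi(z):
    """Standard normal cumulative distribution function."""
    return NormalDist().cdf(z)

def p_mean_below(c, mu, var, n, strict):
    """Approximate P(mean < c) if strict, else P(mean <= c), for the mean of
    n observations of an integer-valued variable with mean mu and variance var."""
    correction = 1 / (2 * n)  # continuity correction for a mean of n values
    bound = c - correction if strict else c + correction
    return phi((bound - mu) / sqrt(var / n))

# Worked example 5.5: weekly threats, mean = variance = 12.2 * 7 = 85.4
p_threats = p_mean_below(86, 85.4, 85.4, 100, strict=True)

# The same event via the total over 100 weeks, with the usual correction of 1/2
p_total = phi((8600 - 0.5 - 8540) / sqrt(8540))

# Binomial example: B(60, 0.25) has mean 15 and variance 11.25
p_binomial = p_mean_below(16, 15, 11.25, 50, strict=False)

print(round(p_threats, 4), round(p_total, 4), round(p_binomial, 4))
```

The first two values agree, which is the point made in the text: dividing the total by 100 also divides the correction of $\frac{1}{2}$ by 100.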
EXERCISE 5B
1 The random variable $X$ has mean 6 and variance 8. The random variable $\bar{X}$ is the mean of a random sample of 80 observations of $X$. State the approximate distribution of $\bar{X}$, giving its parameters, and find the probability that the sample mean is less than 6.4.
2 The random variable $X$ has mean 30 and variance 36. The random variable $\bar{X}$ is the mean of a random sample of 100 observations of $X$. State the approximate distribution of $\bar{X}$, giving its parameters, and find the probability that the sample mean is greater than 31.
3 The random variable $Y$ has mean 21 and standard deviation 4.2. The random variable $\bar{Y}$ is the mean of a random sample of 50 observations of $Y$. State the approximate distribution of $\bar{Y}$, giving its parameters, and …
4 The time taken for telephone calls to a call centre to be answered is normally distributed with mean 20 seconds and standard deviation 5 seconds. Find the probability that for 16 randomly selected calls made to the centre, the mean time taken to answer the calls is less than 18 seconds.
PS 5 Ciara needs 5 kg of flour, so she buys 10 bags, each labelled as containing 500 g. Unknown to her, the bags contain, on average, 510 g with variance 120 g². What is the probability that Ciara actually buys less flour …
PS 6 The length, in cm, of an electrical component produced by a company may be considered to be a continuous …
$$f(x) = \begin{cases} \dfrac{5}{2} & 1.8 \le x \le 2.2 \\ 0 & \text{otherwise} \end{cases}$$
a Calculate the probability that the mean, $\bar{X}$, of a random sample of 40 of these components is greater than 2.05 cm.
b Calculate the probability that the mean, $\bar{X}$, of a random sample of 20 of these components is …
7 A random sample of size 60 is taken from the random variable $X$, where $X \sim B(45, 0.4)$. Given that $\bar{X}$ is the …
a $P(\bar{X} < 19)$
b $P(\bar{X} \le 18)$
8 A random sample of size 50 is taken from random variable $X$, where $X \sim \mathrm{Po}(2)$. Find $P(1.5 < \bar{X} \le 2.2)$, where …
Checklist of learning and understanding
● ‘Population’ means all the items of interest within a study.
● ‘Sample’ describes part of a population.
● Random numbers can be used to generate a sample in which you have no control over the selection.
● Random sampling is a process whereby each member of the population has an equal chance of selection.
● Random sampling does not guarantee that the resulting sample will be representative of the population.
● The central limit theorem allows the use of the normal distribution to make statistical judgements from sample data from any distribution.
● For samples of size $n$ drawn from a population with mean $\mu$ and variance $\sigma^2$, the distribution of sample means has mean $\mu$ and variance $\frac{\sigma^2}{n}$.
END-OF-CHAPTER REVIEW EXERCISE 5
PS 1 The mean and standard deviation of the time spent by visitors at an art gallery are 3.5 hours and 1.5 hours, respectively.
a Find the probability that the mean time spent in the art gallery by a random sample of:
i 60 people is more than 4 hours [3]
b What assumption(s), if any, did you need to make in part a ii? [1]
PS 2 The score on a four-sided spinner is given by the random variable $X$ with probability distribution
X 2 3 4 5
a Show that the variance is 1.01. [3]
b The spinner is spun 100 times and each score noted. Let $S$ be the random variable for the sum of …
c Use a normal distribution to work out the probability that the sum of the 100 observations is less than 350. Explain why you can use the normal distribution in this situation. [4]
PS 3 The burn time, in minutes, for a certain brand of candle can be modelled by a normal distribution with mean 90 and standard deviation 15.6. Find the probability that a random sample of five candles, each one lit immediately after another burns out, will burn for a total of 500 minutes or less. [5]
4 A random sample of 35 observations is to be taken from a normal distribution with mean 15 and variance 9. If $\bar{X}$ is the sample mean, find:
a $P(\bar{X} < 16.2)$ [4]
M 5 There are 12 equally talented children at a sports club. Jamil wishes to choose one child at random from these children to represent the club. The children are numbered 1, 2, 3 and so on up to 12. Jamil then throws two ordinary fair dice, each numbered 1 to 6, and he finds the sum of the scores. He chooses the child whose number is the same as the sum of the scores.
6 Dominic wishes to choose a random sample of five students from the 150 students in his year. He numbers the students from 1 to 150. Then he uses his calculator to generate five random numbers between 0 and 1. He multiplies each random number by 150 and rounds up to the next …
i Dominic’s first random number is 0.392. Find the student number that is produced by this …
ii Dominic’s second student number is 104. Find a possible random number that would produce …
iii Explain briefly why five random numbers may not be enough to produce a sample of five …
M 7 It is known that the number, $N$, of words contained in the leading article each day in a certain newspaper can be modelled by a normal distribution with mean 352 and variance 29. A researcher takes a random sample of 10 leading articles and finds the sample mean, $\bar{N}$, of $N$.
ii Find $P(\bar{N} > 354)$. [3]
8 Jyothi wishes to choose a representative sample of 5 students from the 82 members of her school year.
i She considers going into the canteen and choosing a table with five students from her year sitting at it, and using these five people as her sample. Give two reasons why this method is unsatisfactory. [2]
ii Jyothi decides to use another method. She numbers all the students in her year from 1 to 82. Then she uses her calculator and generates the following random numbers.
231492 762305 346280
From these numbers, she obtains the student numbers 23, 14, 76, 5, 34 and 62. Explain how Jyothi obtained these student numbers from the list of random numbers. [3]
Cambridge International AS & A Level Mathematics 9709 Paper 73 Q1 June 2015
PS 9 The editor of a magazine wishes to obtain the views of a random sample of readers about the future of the magazine.
i A sub-editor proposes that they include in one issue of the magazine a questionnaire for readers to complete and return. Give two reasons why the readers who return the questionnaire would not …
The editor decides to use a table of random numbers to select a random sample of 50 readers from the 7302 regular readers. These regular readers are numbered from 1 to 7302. The first few random numbers which the editor obtains from the table are as follows.
ii Use these random numbers to select the first three members in the sample. [2]
M 10 The lengths of time people take to complete a certain type of puzzle are normally distributed with mean 48.8 minutes and standard deviation 15.6 minutes. The random variable $X$ represents the time taken, in minutes, by a randomly chosen person to solve this type of puzzle. The times taken by random samples of 5 people are noted. The mean time $\bar{X}$ is calculated for each sample.
Chapter 6
Estimation
■ calculate unbiased estimates of the population mean and variance from a sample
■ formulate hypotheses and carry out a hypothesis test concerning the population mean in cases where the population is normally distributed with known variance or where a large sample is used
■ determine and interpret a confidence interval for a population mean in cases where the population is normally distributed with known variance or where a large sample is used
■ determine, from a large sample, an approximate confidence interval for a population proportion.
PREREQUISITE KNOWLEDGE
Where it comes from | What you should be able to do | Check your skills
Probability & Statistics 1, Chapters 2 and 3 | Calculate the mean and variance from raw and summarised data. | Calculate the mean, variance and standard deviation for the following data sets: 1 $n = 11$, $\sum x = 16.5$, $\sum x^2 = 25.85$; 2 $n = 8$, $\sum x = 434$, $\sum x^2 = 26\,630$; 3 20 24 15 18 16 25 22
Chapter 1 | Formulate and carry out … | State the null and alternative hypotheses and test … at 10%; 6 $X \sim N(54, 3^2)$; sample value 50; one-tailed test at 5%; … at 1%
Probability & Statistics 1, … | Know how to approximate … normal distribution. | Express the following as approximate normal …: 8 $X \sim B(42, 0.4)$; 9 $X \sim B(100, 0.55)$
Chapter 5 explained that it is not always possible to collect data about every item in a population. There are many practical situations when it is necessary to use a sample to obtain information about a population. For example, an asthma attack may lead to a hospital admission. Sample data allow us to estimate the number of people likely to require a stay in hospital following an asthma attack. Studying the length of the hospital stay will allow us to estimate hospital staffing and other resources. In turn, this allows the hospital to assess its resources and plan for the needs of other patients.
… predictions of their numbers. For example, the estimate of the population of mountain gorillas is that there are fewer than 800 left in the world. Snow leopards live in 12 countries in central Asia. Since the start of this century, the estimated number of snow leopards has decreased by about 20%. The actual numbers of snow leopards and mountain gorillas are …
Consider a study that claims two-thirds of adults living in a particular country are overweight. It is unlikely that every adult in that country was weighed; yet the study states they have evidence to justify their claim. That evidence comes from summary statistics …
The summary statistics calculated from a sample, the sample mean and the sample variance, are used to draw conclusions about the whole population based on the evidence from the sample. These calculated summary statistics, since they only use part of a population, are estimates. To differentiate between sample statistics and population statistics, the following convention is used:
● Population parameters, such as mean and variance, use Greek letters $\mu$ and $\sigma^2$, respectively.
● Estimates of population parameters from a sample are written using Roman letters; for example, $\bar{x}$ is the sample mean and $s^2$ is the sample variance.
Note that the subject you are studying is ‘statistics’ and confusingly an estimate of a population parameter is called a statistic; so estimates of a population’s mean and variance …
REWIND
Section 5.2 in the previous chapter explained about the sample mean, $\bar{x}$. This is an estimate for the mean, $\mu$, of a population. The sample mean is an unbiased estimate since the expected value of the sampling distribution of the sample mean is equal to the mean of the population, the parameter it is estimating.
As an example, suppose you wish to find out the average number of fiction books people read each month. You cannot ask the entire population, so instead you ask a sample of the population and work out the average number of fiction books read each month from the sample data. For an unbiased estimate you need to use an unbiased sampling method, such as random sampling, that ensures all members of the population have an equal chance of being selected for the sample; and, of course, you must ask unambiguous questions, making it clear that you are only interested in …
6.1 Unbiased estimates of population mean and variance
… unknown parameter in a population. The bias of an estimate is the difference between the expected value of the estimate and the true value of the parameter. This difference is the …
The reliability of an estimate can also depend on the variance of the population. A population with a small variance implies that the data are not widely dispersed and any … with a large variance implies that the data are widely dispersed and so an unrepresentative sample may easily arise.
A statistic is an unbiased estimate of a given population parameter when the mean of the sampling …
If $\hat{U}$ is some statistic derived from a random sample taken from a population, then $\hat{U}$ is an unbiased …
The most efficient estimate is one that is unbiased and has the smallest variance.
All the examples presented in Chapter 5 involved a sample from a population with known variance. In practice, if you do not know the population mean, you are unlikely to know …
To explore the sampling distribution of the sample variance, we can return to the example about the spinner numbered 1, 1, 2, 4. In Section 5.2, we found that this distribution has mean 2 and variance 1.5.
To explore the variance as the statistic, for a sample size of 1 we can work out the expectation of the variance $E(V)$.
Sample outcomes | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{1} - \bar{x}^2$ | Probability (outcome)
1 | 1 | 1 | 0 | $\frac{1}{2}$
2 | 4 | 2 | 0 | $\frac{1}{4}$
4 | 16 | 4 | 0 | $\frac{1}{4}$
The sample variance, for sample size of 1, $E(V) = 0$.
This is not equal to the variance of the original population, so the sample variance is not …
We do not need to explore the variance for another sample size since a single example that shows the variance is biased is sufficient to prove the point. However, it is worthwhile exploring other sample sizes to see if there is a possible connection between the sample …
For a sample of size 2, first list all possible sample outcomes, together with the variance and probability of choosing that sample.
Sample outcomes | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{2} - \bar{x}^2$ | Probability (outcome)
1 1 | 2 | 1 | 0 | $\frac{4}{16}$
2 2 | 8 | 2 | 0 | $\frac{1}{16}$
4 4 | 32 | 4 | 0 | $\frac{1}{16}$
1 2 | 5 | $1\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{4}{16}$
1 4 | 17 | $2\frac{1}{2}$ | $2\frac{1}{4}$ | $\frac{4}{16}$
2 4 | 20 | 3 | 1 | $\frac{2}{16}$
You can check the values for the probabilities of these sample outcomes in Chapter 5, Section 5.2.
We can now draw a probability distribution table for the sample variance, sample size of 2.
$v$ | 0 | $\frac{1}{4}$ | 1 | $2\frac{1}{4}$
$P(V = v)$ | $\frac{3}{8}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{4}$
Hence, $E(V) = 0 \times \frac{3}{8} + \frac{1}{4} \times \frac{1}{4} + 1 \times \frac{1}{8} + 2\frac{1}{4} \times \frac{1}{4} = \frac{3}{4}$
Comparing this result with the variance of the original population, we find that $E(V) = \frac{3}{4}$ and $\sigma^2 = 1\frac{1}{2}$, the variance of the original population.
Notice that $\frac{3}{4} = \frac{1}{2} \times 1\frac{1}{2}$, or $E(V) = \frac{n-1}{n} \times \sigma^2$, where $n$ is the sample size.
We need more than just this example to see if this relationship between the original variance and the estimate of variance always holds.
Here are the data for a sample size of 3. You can refer to Chapter 5, Section 5.2 for the probabilities of these sample outcomes.

Sample outcome | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{3} - \bar{x}^2$ | Probability (outcome)
1, 1, 1 | 3 | 1 | 0 | 8/64
1, 1, 2 | 6 | 4/3 | 2/9 | 3/16
1, 2, 2 | 9 | 5/3 | 2/9 | 3/32
2, 2, 2 | 12 | 2 | 0 | 1/64
1, 1, 4 | 18 | 2 | 2 | 3/16
1, 2, 4 | 21 | 7/3 | 14/9 | 6/32
2, 2, 4 | 24 | 8/3 | 8/9 | 3/64
1, 4, 4 | 33 | 3 | 2 | 3/32
2, 4, 4 | 36 | 10/3 | 8/9 | 3/64
4, 4, 4 | 48 | 4 | 0 | 1/64
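The probability distribution table for the sample variance, sample size of 3, is therefore:

v | 0 | 2/9 | 8/9 | 14/9 | 2
P(V = v) | 5/32 | 9/32 | 3/32 | 6/32 | 9/32

Hence,
$$E(V) = \left(0 \times \frac{5}{32}\right) + \left(\frac{2}{9} \times \frac{9}{32}\right) + \left(\frac{8}{9} \times \frac{3}{32}\right) + \left(\frac{14}{9} \times \frac{6}{32}\right) + \left(2 \times \frac{9}{32}\right) = 1,$$
and if we use $\frac{n-1}{n} \times \sigma^2$ with $n = 3$ we get $\frac{3-1}{3} \times 1\frac{1}{2} = 1$. The same relationship holds:
$$E(V) = \frac{n-1}{n} \times \sigma^2$$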
Using the results we met in Chapter 3, Key point 3.3, this means that $E\left(\frac{nV}{n-1}\right) = \sigma^2$ and
$$\frac{nV}{n-1} = \frac{n}{n-1}\left(\frac{\sum X^2}{n} - \bar{X}^2\right) = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right).$$
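The relationship $E(V) = \frac{n-1}{n} \times \sigma^2$ found for the spinner can be checked by brute force. The following Python sketch is an illustration only (the function names are our own, not from the text): it lists every ordered sample of size $n$ from the faces 1, 1, 2, 4, treats each as equally likely, and averages the variance $v = \frac{\sum x^2}{n} - \bar{x}^2$ using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Spinner faces from the text; each face is assumed equally likely.
FACES = [1, 1, 2, 4]

def population_variance():
    """Variance of a single spin: E(X^2) - (E(X))^2."""
    mean = Fraction(sum(FACES), len(FACES))
    return Fraction(sum(x * x for x in FACES), len(FACES)) - mean ** 2

def expected_sample_variance(n):
    """Exact E(V), averaging v = (sum x^2)/n - mean^2 over all ordered samples."""
    total = Fraction(0)
    count = 0
    for sample in product(FACES, repeat=n):
        mean = Fraction(sum(sample), n)
        total += Fraction(sum(x * x for x in sample), n) - mean ** 2
        count += 1
    return total / count

sigma2 = population_variance()
for n in (1, 2, 3):
    # Print E(V) next to (n - 1)/n * sigma^2 for comparison.
    print(n, expected_sample_variance(n), Fraction(n - 1, n) * sigma2)
```

Because the four faces are equally likely, averaging over ordered samples gives the same probabilities as the tables in the text.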
KEY POINT 6.2

For sample size $n$ taken from a population, an unbiased estimate of the population mean $\mu$ is the sample mean $\bar{x}$.

An unbiased estimate of the population variance $\sigma^2$ is:
$$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$$
TIP

Data may be raw data or summarised data. Use one of the equivalent formulae for variance to suit the information:
$$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2$$
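The three formulae for $s^2$ can be compared numerically. The sketch below is illustrative: the function names and the small data set are made up for the demonstration, and Python's built-in `statistics.variance` (which also divides by $n - 1$) is used as a cross-check.

```python
import statistics

def s2_from_mean(sum_x2, n, mean):
    """s^2 = (sum x^2 - n * mean^2) / (n - 1)."""
    return (sum_x2 - n * mean ** 2) / (n - 1)

def s2_from_sums(sum_x, sum_x2, n):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def s2_from_raw(data):
    """s^2 = sum (x - mean)^2 / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical raw data, chosen only to illustrate the calculation.
data = [2, 3, 3, 5, 7]
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)

print(s2_from_mean(sum_x2, n, sum_x / n))
print(s2_from_sums(sum_x, sum_x2, n))
print(s2_from_raw(data))
print(statistics.variance(data))
```

Which form is most convenient depends on whether you are given raw values, the sums $\sum x$ and $\sum x^2$, or the sample mean.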
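WORKED EXAMPLE 6.1

A conservationist wishes to estimate the variance of the numbers of eggs laid by Melodious larks. The following data summarise the numbers of eggs, $m$, laid in a sample of 30 nests:

$\sum m^2 = 162$, $\sum m = 66$

Use the data to find an unbiased estimate for the variance of the number of eggs laid by Melodious larks.

Answer

$$s^2 = \frac{1}{n-1}\left(\sum m^2 - \frac{(\sum m)^2}{n}\right) = \frac{1}{30-1}\left(162 - \frac{66^2}{30}\right) = 0.579$$

Use $\frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$.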
Note that if the question had stated: ‘The following data summarise
be just that group of nests and you would use the variance formula
$$\frac{1}{n}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{30}(162 - 145.2) = 0.56.$$

Note that $\frac{29}{30} \times 0.579\ldots = 0.56$.
WORKED EXAMPLE 6.2

A team of conservationists monitoring a tiger population record the number of tiger cubs in a sample of 24 litters. The table shows their findings.

Number of cubs, c | 1 | 2 | 3 | 4 | >4
Frequency, f | 2 | 7 | 12 | 3 | 0

Find unbiased estimates for the mean and variance of the number of tiger cubs in the litters.

Answer

Adapt formulae for grouped frequency.
$$\bar{c} = \frac{\sum fc}{\sum f} = \frac{(1 \times 2) + (2 \times 7) + (3 \times 12) + (4 \times 3)}{24} = \frac{64}{24} = 2\frac{2}{3}$$
Unbiased estimate for mean is equal to the sample mean.

Use $\sigma_{n-1}$ or $s_{n-1}$. Show key values in your working.
$$s^2 = \frac{1}{n-1}\left(\sum fc^2 - \frac{(\sum fc)^2}{n}\right)$$
$$= \frac{1}{24-1}\left((2 \times 1^2) + (7 \times 2^2) + (12 \times 3^2) + (3 \times 4^2) - \frac{64^2}{24}\right)$$
$$= \frac{1}{23}\left(186 - \frac{4096}{24}\right) = \frac{2}{3}$$
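The grouped-frequency calculation in Worked example 6.2 can be reproduced with exact fractions. This is a minimal sketch; the function name `grouped_estimates` is our own.

```python
from fractions import Fraction

def grouped_estimates(values, freqs):
    """Unbiased estimates of mean and variance from a frequency table."""
    n = sum(freqs)
    sum_fc = sum(f * c for c, f in zip(values, freqs))
    sum_fc2 = sum(f * c * c for c, f in zip(values, freqs))
    mean = Fraction(sum_fc, n)
    s2 = (sum_fc2 - Fraction(sum_fc ** 2, n)) / (n - 1)
    return mean, s2

# Tiger cub data from Worked example 6.2 (the '>4' class has frequency 0).
cubs = [1, 2, 3, 4]
litters = [2, 7, 12, 3]
mean, s2 = grouped_estimates(cubs, litters)
print(mean, s2)
```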
EXERCISE 6A

In all of the following questions you are given some data and some descriptive statistics of the data. Your task is to find unbiased estimates of the population mean and variance.

2 Data: the time taken, t minutes, in a random sample of dental check-up appointments.

3 Data: the yield per plant, in kg rounded to the nearest 100 g, of a random sample of a variety of aubergine plants.

4 Data: the volumes, in ml, for a brand of ice cream in a 750 ml container.

5 Data: the total mass, x grams, for a random sample of quail eggs.

6 Data: the total number of faults found in a random sample of 60 silk scarves.

7 Data: the time taken, in days, for a random sample of letters posted second class to be delivered.

Number of days for letter to be delivered | 1 | 2 | 3 | 4 | 5
Number of letters | 24 | 32 | 29 | 9 | 6
6.2 Hypothesis testing of the population mean

Sample data are often collected to test a statistical hypothesis about a population. Such a sample, even if it is a random sample, may or may not be representative of the population. The central limit theorem studied in Chapter 5 shows that random sample estimates can be used to make statements about populations without having to assume that the populations have normal distributions. The sample mean and sample variance can be calculated from the sample, and these estimates can be used to see whether the data support or reject the null hypothesis. For sample data, $\sqrt{\frac{\text{sample variance}}{n}}$; that is, $\frac{s}{\sqrt{n}}$, is referred to as the standard error.

We follow the same process as previously used when carrying out a hypothesis test of the population mean. Ideally, we will set up the hypotheses, then collect the sample of data, in that order.
REWIND

The steps to carry out a hypothesis test are explained in Chapter 1, but in summary they are:

• State the null and alternative hypotheses.

with known variance
KEY POINT 6.3

If the population mean is unknown, but the population variance is known, sample data can be used to carry out a hypothesis test that the population mean has a particular value, as follows:

For a sample size $n$ drawn from a normal distribution with known variance, $\sigma^2$, and sample mean $\bar{x}$, the test statistic is:
$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$
WORKED EXAMPLE 6.3

The masses of cucumbers grown at a smallholding are normally distributed with mean 310 g and standard deviation 22 g. Producers of a new plant food claim that its use increases the masses of cucumbers. To test this claim, some cucumber plants are grown using the new plant food and a random sample of 40 cucumbers from these plants is selected and weighed. The mean mass of these cucumbers is 316 g.

Assuming the standard deviation of the masses of the sample is the same as the standard deviation of the population, test the producers' claim at the 5% level of significance.

Answer 1

Then $\bar{X} \sim N\left(310, \frac{22^2}{40}\right)$.

$H_0: \mu = 310$ and $H_1: \mu > 310$
Use a one-tailed test, as you are looking for an increase in weight.

One-tailed test at 5% level of significance.

Use $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
$$P(\bar{X} > 316) \approx P\left(z > \frac{316 - 310}{\frac{22}{\sqrt{40}}}\right) = 1 - \Phi(1.725) = 0.0423 \text{ or } 4.23\%.$$

4.23% < 5%, so the sample mean lies in the critical region.

Answer 2

One-tailed test at 5% level of significance, $z = \Phi^{-1}(0.95) = 1.645$.

Test statistic: $z = \frac{316 - 310}{\frac{22}{\sqrt{40}}} = 1.725$. Compare it with the critical value, which is 1.645: since $1.725 > 1.645$, the test statistic lies in the critical region.
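Both approaches in Worked example 6.3 (comparing a probability with the significance level, or a test statistic with a critical value) can be sketched in Python. This is an illustration under the example's assumptions; the helper name `z_statistic` is our own, and `statistics.NormalDist` from the standard library stands in for normal tables.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic z = (x_bar - mu) / (sigma / sqrt(n)) for known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

standard_normal = NormalDist()  # N(0, 1)

# Figures from Worked example 6.3.
z = z_statistic(sample_mean=316, mu0=310, sigma=22, n=40)
p_value = 1 - standard_normal.cdf(z)       # upper-tail probability
critical = standard_normal.inv_cdf(0.95)   # 5% one-tailed critical value

print(round(z, 3), round(p_value, 4), round(critical, 3))
print("in critical region:", z > critical)
```

The two comparisons always agree: the probability is below 5% exactly when the test statistic is beyond the 5% critical value.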
WORKED EXAMPLE 6.4

The burn time, in minutes, for a certain brand of candle is modelled by a normal distribution with standard deviation 5.7. The manufacturer claims that the mean is 250 minutes. Lanfen randomly selects ten of these candles and records their burn times, in minutes:

245 247 236 255 250 239 241 252 251 243

Stating any assumptions you make, investigate at the 5% level of significance whether the manufacturer’s claim is valid.

Answer

Assumptions:
Random sample chosen.
Standard deviation of the sample the same as the population.
The assumptions are the conditions that allow you to use a sample to investigate the claim.

First find the mean of the sample.
$$\text{Mean} = \frac{1}{10}(245 + 247 + 236 + 255 + 250 + 239 + 241 + 252 + 251 + 243) = \frac{2459}{10} = 245.9$$

State the null and alternative hypotheses.
$H_0: \mu = 250$
$H_1: \mu \neq 250$
This is a two-tailed test, as Lanfen is not investigating whether the claim is only too high or only too low.

$\bar{X} \sim N\left(250, \frac{5.7^2}{10}\right)$
$$z = \frac{245.9 - 250}{\frac{5.7}{\sqrt{10}}} = -2.275$$

Compare the test statistic with the critical value. For a two-tailed test at the 5% level, the critical values are $\pm 1.96$.

$-2.275 < -1.96$

Reject $H_0$. There is sufficient evidence to doubt the manufacturer’s claim.
It is possible to carry out a hypothesis test of a population mean when the population variance is unknown. Provided the sample is large, we follow the same process as for a hypothesis test of a population mean from a normal population with known variance. For the variance, we use $s^2$, an unbiased estimate of the population variance, where
$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right).$$

KEY POINT 6.4

If the population mean and population variance are unknown, sample data can be used to conduct a hypothesis test that the population mean has a particular value, as follows:

For a large sample size $n$ drawn from a population with unknown variance, and with sample mean $\bar{x}$, the test statistic is:
$$z = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$$
where
$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right).$$
WORKED EXAMPLE 6.5

A researcher believes that students underestimate how long 1 minute is. To test his belief, 42 students are chosen at random. Each student, in turn, closes their eyes and estimates 1 minute. The results for their times, x seconds, are summarised as follows:

$n = 42$, $\sum x = 2471$, $\sum x^2 = 146\,801$

Investigate at the 10% level of significance if there is any evidence to support the researcher’s claim. What advice

Answer

1 minute = 60 seconds.
$H_0: \mu = 60$
$H_1: \mu < 60$

appropriate and find the critical value using tables.
critical value z is −1.282.

Find unbiased estimates for the mean and variance.
$$\bar{x} = \frac{\sum x}{n} = \frac{2471}{42} = 58.83$$
Use $\frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$ to find an unbiased estimate for the variance.
$$s^2 = \frac{1}{42-1}\left(146\,801 - \frac{2471^2}{42}\right) = 34.73$$

$$z = \frac{58.83 - 60}{\sqrt{\frac{34.73}{42}}} = -1.283$$

Compare the test statistic with the critical value and
researcher’s claim.
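The large-sample test in Worked example 6.5 works directly from the summary values $n$, $\sum x$ and $\sum x^2$. The following Python sketch shows one way to organise the arithmetic; the function name `large_sample_z` is our own and `statistics.NormalDist` is used in place of tables.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z(sum_x, sum_x2, n, mu0):
    """Return (mean, s^2, z) using the unbiased variance estimate s^2."""
    mean = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    z = (mean - mu0) / sqrt(s2 / n)
    return mean, s2, z

# Summary values from Worked example 6.5; H0: mu = 60, H1: mu < 60.
mean, s2, z = large_sample_z(sum_x=2471, sum_x2=146801, n=42, mu0=60)
critical = NormalDist().inv_cdf(0.10)   # lower-tail 10% critical value

print(round(mean, 2), round(s2, 2), round(z, 3), round(critical, 3))
```

Here the test statistic and the critical value are very close, so the conclusion is sensitive to rounding in the intermediate values.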
EXERCISE 6B

PS 1 The manufacturer of a ‘fast-acting pain relief tablet’ claims that the time taken for its tablet to work follows a normal distribution with mean 18.4 minutes and variance $3.6^2$ minutes². Tyler claims that the tablets do not work that quickly. To test the claim, a random sample of 40 people record the time taken for the tablet to
Assuming the sample and population variances are the same, carry out an appropriate hypothesis test at the 1% level of significance.

PS 2 IQ test scores are normally distributed and are designed to have a mean score of 100. Anna believes the mean is higher than 100. A random sample of 180 people’s IQ test scores, x, are summarised as follows.

b Anna then discovers that the IQ test is also designed to have a variance of $15^2$. A random sample of six people take the test and their IQ test scores are:

To test Anna’s belief, carry out an appropriate hypothesis test, using just the random sample of six people, at the 2% level of significance.

c Comment, with reasons, on the reliability of your answers to parts a and b.

PS 3 The mass of pesto dispensed by a machine to fill a jar is a normally distributed random variable with mean 380 g. The variance of the mass, in grams², of the pesto in the jars is 6.4. Each week a check is made to see that the mean mass dispensed by the machine has not significantly reduced. One particular week a sample of ten jars is checked. The mean mass of pesto in these jars is 378.7 g. Carry out an appropriate hypothesis test at

PS 4 The average mass of large eggs is 68 g. The variance of the masses, in grams², of large eggs is $1.7^2$. A farm shop sells large eggs singly. A customer claims that the eggs are underweight. To test the claim, a random

68 65 59 72 65 60 71 73

Carry out an appropriate hypothesis test at the 1% level of significance, stating any assumption(s) you have made.

PS 5 A machine dispenses ice cream into a cone. The amount dispensed follows a normal distribution with mean 80 ml and the variance of the amount of ice cream dispensed, in ml², is 9. A consumer complains that the amount is too low. To check whether the machine is dispensing the correct amount, a sample of six cones is checked. The volumes in ml are as follows:

82 72 75 80 76 80

Carry out an appropriate hypothesis test at the 5% level of significance, stating any assumption(s) you have made.

PS 6 A shop sells 2 kg bags of potatoes. A quality control inspection checks the masses of 80 randomly chosen

$\sum x = 158.14$ and $\sum x^2 = 314.094$

Assuming the masses of the bags of potatoes are normally distributed, investigate at the 5% level of significance whether there is any evidence that the bags are underweight.

PS 7 A manufacturer claims its light bulbs last for an average of 2000 hours. A random sample of 42 light bulbs is tested. The lengths of time the light bulbs last, t hours, are summarised as follows:

Test the manufacturer’s claim at the 10% level of significance, stating any assumptions you have made.
When a hypothesis test reveals statistically significant results, the results are applicable to the sample. Often we use the results as if they apply to the population. However, we cannot

The hypothesis tests studied so far, in Chapter 1, Chapter 2 and earlier in this chapter, involve a single parameter, the population mean, from a sample of data. To allow for the issue that the sample may or may not be representative of the whole population, sample data can also be used to construct an interval that specifies the limits within which it is likely that the population mean will lie. This interval is a confidence interval (CI).

if the same population is sampled many times and each time an interval estimate is found,
It is possible to construct one-sided or two-sided confidence intervals. However, we will consider only symmetrical two-sided intervals.

A 95% confidence interval is the range of values in which we can be 95% confident that the true mean lies. If that interval is from a to b, then: P(a < true mean < b) = 0.95. The central 95% of the sample distribution is from the 2.5th to the 97.5th percentile.

For a normal distribution $N(\mu, \sigma^2)$, we find from normal tables that the central 95% lies between −1.96 and +1.96 standard deviations either side of the mean.
The sample means are distributed $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where $\mu$ is the true mean.
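[Figure: normal curve for the sample means, marking $-1.96\frac{\sigma}{\sqrt{n}}$ and $+1.96\frac{\sigma}{\sqrt{n}}$ either side of the mean]

If we work out sample means for a large number of samples, 95% of the time we would expect the sample mean to satisfy:
$$\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}$$
which rearranges to give:
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$

So to find a 95% confidence interval, use the sample values and work out the interval $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$. An alternative way to write the interval is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.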
KEY POINT 6.5

A 95% confidence interval means that 95% of possible sample means lie within the interval. It tells us the probability that the true mean lies within the interval is 0.95, and the probability that the true mean lies outside the interval is 0.05.
Consider Worked example 6.3. The hypothesis test found that there was evidence to accept the producer’s claim that its plant food increases the mass of cucumbers.

The sample data are summarised by sample mean $\bar{x} = 316$ and standard error $\frac{22}{\sqrt{40}} = 3.48$.

A 95% confidence interval for these values is $316 \pm 1.96 \times 3.48$; that is, (309, 323).

This means we can be 95% confident that the true mean lies in this range.

Before using the new plant food, the mean mass of cucumbers was 310 g. This mass just lies within the confidence interval (309, 323), at the lower end. We could conclude that it is possible that the plant food does not increase the mass of the cucumbers and the sample is not representative.

The percentage level chosen for the confidence interval does affect the size of the interval.

A 90% confidence interval will give a narrower interval.

From normal tables, the central 90% lies within 1.645 standard deviations of the mean.

The 90% confidence interval is $316 \pm 1.645 \times 3.48$; that is, (310, 322).

[Figure: normal curve centred on $\mu$, marking $-1.645\frac{\sigma}{\sqrt{n}}$ and $+1.645\frac{\sigma}{\sqrt{n}}$]

The lower bound 310 has been rounded from 310.3. Using a 90% confidence interval, the original population mean 310 lies just outside the interval and so you would

What about a 99% confidence interval? Using normal tables, the 99% confidence interval can be calculated as $\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$, giving $316 \pm 2.576 \times 3.48$; that is, (307, 325).

Compare the confidence intervals, with all values given to the nearest integer: 90% CI = (310, 322), 95% CI = (309, 323) and 99% CI = (307, 325). We can see that the higher the percentage, the more confident we can be that the true mean lies within that interval.

However, the higher percentage gives a wider interval, and this means the information we have about the true mean is less precise; that is, there is a greater range of possible values for the true mean.
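The three intervals for the cucumber data can be recomputed side by side. This is a small Python sketch, assuming the values from Worked example 6.3 ($\bar{x} = 316$, $\sigma = 22$, $n = 40$); the function name `confidence_interval` is our own, and the multiplier is taken from `statistics.NormalDist` rather than from tables.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, percent):
    """Symmetric two-sided interval x_bar +/- k * sigma / sqrt(n)."""
    # k leaves (100 - percent)/2 percent in each tail of N(0, 1).
    k = NormalDist().inv_cdf(0.5 + percent / 200)
    margin = k * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

for percent in (90, 95, 99):
    low, high = confidence_interval(316, 22, 40, percent)
    print(percent, round(low, 1), round(high, 1))
```

The printed bounds widen as the percentage increases, which is the trade-off between confidence and precision described in the text.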
Sample size also affects the size of a confidence interval.

Consider a population with known standard deviation 15. A random sample with $n = 100$ and $\bar{x} = 20$ has standard error $\frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5$. A 95% confidence interval is then $20 \pm 1.96 \times 1.5$ or (17.1, 22.9).

Let us increase n. If $n = 400$ with $\bar{x} = 20$, then standard error $\frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{400}} = 0.75$ and the 95% confidence interval is $20 \pm 1.96 \times 0.75$ or (18.5, 21.5).
KEY POINT 6.6

A confidence interval for an unknown population parameter, such as the mean, at a P% confidence level, is an interval constructed so that there is a probability of P% that the interval includes the parameter.

To find the confidence interval for a population mean with known variance $\sigma^2$, calculate $\bar{x} \pm k\frac{\sigma}{\sqrt{n}}$, where k is determined by the percentage level of the confidence interval.

% CI | 90 | 95 | 98 | 99
k | 1.645 | 1.960 | 2.326 | 2.576

The greater the percentage, the more confident we can be that the true parameter lies within the interval.

The greater the percentage, the wider the confidence interval and the less precise we can be about the value of the parameter.

When choosing the sample size, n, as n increases the standard error $\frac{\sigma}{\sqrt{n}}$ decreases and the resulting confidence interval is narrower.
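The multipliers k and the effect of sample size on interval width can both be checked with a few lines of Python. This is a sketch only; the function names `k_value` and `ci_width` are our own, and `statistics.NormalDist().inv_cdf` plays the role of reading normal tables in reverse.

```python
from math import sqrt
from statistics import NormalDist

def k_value(percent):
    """Multiplier k for a symmetric two-sided interval at the given level."""
    tail = (1 - percent / 100) / 2      # probability in each tail
    return NormalDist().inv_cdf(1 - tail)

def ci_width(sigma, n, percent):
    """Total width 2 * k * sigma / sqrt(n) of the confidence interval."""
    return 2 * k_value(percent) * sigma / sqrt(n)

for percent in (90, 95, 98, 99):
    print(percent, round(k_value(percent), 3))

# Known standard deviation 15, as in the text: n = 100 compared with n = 400.
print(ci_width(15, 100, 95), ci_width(15, 400, 95))
```

Multiplying the sample size by 4 halves the width, because the width is proportional to $\frac{1}{\sqrt{n}}$.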
EXPLORE 6.1

As sample size increases, the value of $\frac{\sigma}{\sqrt{n}}$ decreases and so the width of a confidence interval decreases. Why do you think it is not usual practice to use very large samples? Hint: Find the proportional decrease in the width of the confidence interval

Discuss these two questions in relation to the scenarios that follow:

● How large a sample do you actually need?

Scenario 1: Health officials for a city with population around 40 000 are concerned with the increase in body mass index, BMI, in the population. Would your sample numbers change if the population was, say, 120 000? Explain why or why not.

treatment. Discuss the possible advantages and disadvantages in combining several different trials of a new drug treatment. (Note that combining the results of many scientific studies is called a meta-analysis.)
WORKED EXAMPLE 6.6

Excessive vegetation in pond water can cause the appearance of unwanted organisms. Over a long period of time it has been found that the number of unwanted organisms in 100 ml of pond water is approximately normally distributed with standard deviation 12. Adam takes six random 100 ml samples of water from his pond. The numbers of unwanted organisms in the samples are 56, 102, 48, 74, 88 and 67.

a Find a 95% confidence interval for the mean number of organisms in 100 ml of the pond water.

b If the mean number of unwanted organisms in 100 ml of pond water is above 80, vegetation should be removed. Use your results to decide whether Adam needs to remove vegetation from his pond. What advice

Answer

$$\bar{x} = \frac{1}{6}(56 + 102 + 48 + 74 + 88 + 67) = 72.5$$
$$n = 6,\quad \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{6}} = 4.9$$
$72.5 \pm 1.96 \times 4.9 = 72.5 \pm 9.6$, giving the 95% confidence interval (62.9, 82.1).

Despite the sample mean 72.5 being less than 80, the
WORKED EXAMPLE 6.7

The label on a certain packet of sweets states the contents are 100 g. It is known that the standard deviation is 5 g. The mechanism producing these packets of sweets is checked. From a random sample of ten packets, the mean is 103.8 g. Find a 99% confidence interval for the mean contents of the packets of sweets. Use your result to explain whether the mechanism needs adjustment.

Answer

Use $\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$.
$$\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}} = 103.8 \pm 2.576 \times \frac{5}{\sqrt{10}} = 103.8 \pm 4.073$$

The confidence interval tells us that it is possible for the
adjustment.
Quality control of manufacturing processes is one application of sampling methods. Random samples of the
consumers of the product get what they pay for. With any product, there can be slight variations in some parameter,
manufacturing process is working correctly.
To find a confidence interval for a population mean, we rely upon knowing the standard deviation, $\sigma$, of the original population. However, since calculating standard deviation involves knowing the mean, it is more likely that the actual value of the standard deviation will be unknown. Instead, we can use the sample data to calculate an unbiased estimate of variance, $s^2$, and then use $s$ in place of $\sigma$ to find the confidence interval.

The procedure for finding a confidence interval using an unbiased estimate of standard deviation from a sample gives a reasonably accurate result provided the sample is sufficiently large. How large is sufficiently large? Look back at Explore 6.1. If you compare sample sizes 25 and 400, then since $\sqrt{25} = 5$ and $\sqrt{400} = 20$ you will find that increasing the sample size by 16 times ($16 \times 25 = 400$) only reduces the margin of error to one-quarter of its previous value: $\frac{1}{\sqrt{25}} \times \frac{1}{4} = \frac{1}{\sqrt{400}}$. ‘Large’ is not precisely defined; as a general rule, it
KEY POINT 6.7

To find the confidence interval for a population mean using a large sample, calculate $\bar{x} \pm k\frac{s}{\sqrt{n}}$, where $s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$ and k is determined by the percentage level of the confidence interval.
a Find a 90% confidence interval for the mean mass of the strawberries.

b An α% confidence interval for the population mean, based on this sample, is found to have width of 3.65 grams. Find α.

Answer

a $\bar{x} = \frac{\sum x}{n} = \frac{972}{60} = 16.2$

Use $\frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$ to find an unbiased estimate of the variance.
$$s^2 = \frac{1}{60-1}\left(17\,304.78 - \frac{972^2}{60}\right) = 26.413, \text{ so } s = \sqrt{26.413} = 5.14$$

The sample is sufficiently large to use $\bar{x} \pm 1.645\frac{s}{\sqrt{n}}$.
$$\bar{x} \pm 1.645\frac{s}{\sqrt{n}} = 16.2 \pm 1.645 \times \frac{5.14}{\sqrt{60}} = 16.2 \pm 1.09$$

The confidence interval will be approximate, as the

b The width of the interval is $2k\frac{s}{\sqrt{n}}$.
$$2k \times \frac{5.14}{\sqrt{60}} = 3.65$$
$$k = \frac{3.65\sqrt{60}}{2 \times 5.14} = 2.75$$

For $z = 2.75$, from tables $p = 0.997$. $2(1 - p)$ is the percentage in both tails.
$\alpha = 1 - 2(1 - p)$
so $\alpha = 99.4\%$.
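Part b runs the confidence interval calculation in reverse: from a given width back to the confidence level. A minimal Python sketch of that step follows, using $s = 5.14$ and $n = 60$ from the answer; the function name `confidence_level_from_width` is our own and `statistics.NormalDist().cdf` replaces the table look-up.

```python
from math import sqrt
from statistics import NormalDist

def confidence_level_from_width(width, s, n):
    """Confidence level whose interval x_bar +/- k*s/sqrt(n) has this width."""
    k = width * sqrt(n) / (2 * s)      # from width = 2 * k * s / sqrt(n)
    p = NormalDist().cdf(k)            # area to the left of k
    return 1 - 2 * (1 - p)             # remove both tails

alpha = confidence_level_from_width(width=3.65, s=5.14, n=60)
print(round(100 * alpha, 1))
```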
EXERCISE 6C

For questions 1 and 2 you may refer to the answers to Exercise 6A for unbiased estimates of population mean and variance.

Calculate:

2 The following data summarise the time taken, t minutes, in a random sample of dental check-up appointments:

$n = 30$, $\sum t = 630$ and $\sum t^2 = 13\,770$

Calculate:
a a 95% confidence interval for the population mean
b a 98% confidence interval for the population mean.

3 The following data summarise the total mass, x grams, of the yield for a random sample of 44 chilli plants.

$\sum x = 842$ and $\sum x^2 = 16\,364$

Calculate:
a a 99% confidence interval for the population mean

4 The following data summarise the volume, x litres, for a random sample of bottles of juice.

Calculate:
a a 90% confidence interval for the population mean

5 The following data summarise the total mass, x grams, for a random sample of quail eggs:

$n = 30$, $\sum x = 254.4$ and $\sum x^2 = 2271.6$

b An α% confidence interval for the population mean, based on this sample, is found to have width

6 The following data summarise the masses, x kg, of 60 bags of dry pet food.

c An α% confidence interval for the population mean, based on this sample, is found to have width of

M 7 a Explain why the width of a 98% confidence interval for the mean of a standard normal distribution is 4.652.

b The result, X, of testing the breaking strain of a brand of fishing line is a normally distributed random variable with mean $\mu$ and variance 2.25. The testers wish to have a 98% confidence interval for $\mu$ with a total width less than 1. Find the least number of tests needed.
6.4 Confidence intervals for population proportion

Not every statistical investigation concerns means of samples. Consider, for example, opinion polls. Many organisations carry out opinion polls to gauge voter intentions. Different polls for the same election do not always agree, even when the people chosen are representative samples of the population. When the data have been collected and presented, it is possible to calculate probabilities. We need to question, and statistically

There are a number of situations in which people are required to choose between two options, such as the UK Brexit vote where the options were to either remain or leave the European Union. The following example models an opinion poll for such a situation.

Sofia and Diego are the only two candidates in an election; there is no third option and everyone has to vote. Let a vote for Sofia be called a success. In an opinion poll of n people, where r people say they will vote for Sofia, the proportion of successes $\hat{p} = \frac{r}{n}$.

The binomial distribution is a suitable model for this situation since there are only two outcomes, there is a fixed number of people in the poll and each person independently chooses who to vote for.

Let the random variable, X, be the number of people who vote for Sofia. Then $X \sim B(n, p)$.

Let $\hat{P}$ be the random variable ‘the proportion of the sample voting for Sofia’. Then $\hat{P} = \frac{X}{n}$.

Expected value, $E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n} \times np = p$, and so $\hat{p}$ is an unbiased estimate of p.

Variance, $\mathrm{Var}(\hat{P}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{1}{n^2} \times np(1-p) = \frac{p(1-p)}{n}$

Before the election, an opinion poll of a random sample of 200 people is conducted. In this opinion poll 108 people say they will vote for Sofia and 92 say they will vote for Diego. With more than half of the people in the sample voting for Sofia, you may conclude that Sofia will win the election. To investigate how reliable this conclusion is we would have to

For sufficiently large values of n, such that $np > 5$ and $n(1-p) > 5$, a binomial distribution can be approximated by a normal distribution, so the distribution of the sample proportion is $N\left(p, \frac{p(1-p)}{n}\right)$.

Confidence intervals for a population proportion are worked out in a similar way to those for a population mean, using $\hat{p} \pm k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

Note that this is an approximate confidence interval since a population proportion has a
However, it is not necessary to apply continuity corrections when finding these confidence intervals.

Returning to the opinion poll for Sofia and Diego, for the random sample of 200 people:
$$\hat{p} = \frac{108}{200} = 0.54$$
For a 95% confidence interval:
$$\hat{p} \pm k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.54 \pm 1.96 \times \sqrt{0.001242} = 0.54 \pm 0.07$$

So the confidence interval is (0.471, 0.609).

With only two candidates, the winner needs more than 50% of the votes. The question to be resolved is where the confidence interval lies with respect to the 50% value.

The following diagram shows the range of the confidence interval crossing the 50%, or 0.5, mark.

[Diagram: number line showing the confidence interval from 0.471 to 0.609, with 0.5 marked inside it]

So for this sample, even though more than half said they would vote for Sofia, the confidence interval suggests that the proportion of votes for Sofia could be less than half.
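The interval for the opinion poll can be reproduced with a short Python sketch. The function name `proportion_interval` is our own; the multiplier k = 1.96 is the 95% value used in the text.

```python
from math import sqrt

def proportion_interval(successes, n, k):
    """Approximate interval p_hat +/- k * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Opinion poll from the text: 108 of 200 people say they will vote for Sofia.
low, high = proportion_interval(108, 200, k=1.96)
print(round(low, 3), round(high, 3))
print("interval includes 0.5:", low < 0.5 < high)
```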
Suppose instead you want to know how many people to poll (i.e. to select for the sample) to find a confidence interval of a given width. You could ask, ‘What sample size is needed for an approximate 95% confidence interval for this proportion to have a width of 0.03?’.

To find a confidence interval, we calculate $\hat{p} \pm k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, so the width of the confidence interval is given by $2k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The question requires the same proportion as the sample, so $\hat{p} = 0.54$.

$$2 \times 1.96\sqrt{\frac{0.54 \times (1 - 0.54)}{n}} = 0.03$$
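Rearranging the width equation for n gives $n = \left(\frac{2k}{\text{width}}\right)^2 \hat{p}(1-\hat{p})$. The Python sketch below carries out this rearrangement for the poll values; the function name `sample_size_for_width` is our own, and k = 1.96 is taken from the text rather than computed to more decimal places.

```python
from math import ceil

def sample_size_for_width(p_hat, width, k):
    """Solve 2 * k * sqrt(p_hat * (1 - p_hat) / n) = width for n."""
    return (2 * k / width) ** 2 * p_hat * (1 - p_hat)

# Values from the text: p_hat = 0.54, width 0.03, 95% level (k = 1.96).
n_exact = sample_size_for_width(0.54, 0.03, 1.96)
print(round(n_exact, 1), ceil(n_exact))
```

Rounding up gives the smallest whole number of people for which the width does not exceed 0.03; the answer is several thousand, far larger than the original poll of 200.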
George Gallup showed the importance of opinion polls about the guidance of
The work he began studying social, moral and religious polls on the BBC
political parties.
EXPLORE 6.2

Find media reports on the results of an opinion poll. Does the report comment on how many people or voters were included in the poll? Does the report comment on the sampling method employed? Use the information given in the report to discuss
KEY POINT 6.8

For a large random sample, size n, an approximate confidence interval for the population proportion, p, is:
$$\left(\hat{p} - k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
where k is determined by the percentage level of the confidence interval.
WORKED EXAMPLE 6.9

A Sudoku puzzle is classified as ‘easy’ if more than 70% of the people attempting to solve it do so within 10 minutes, and ‘hard’ if less than 20% of people take less than 10 minutes to complete it. Otherwise it is classified as ‘average’. Of 120 people given a Sudoku puzzle, 87 completed it within 10 minutes.

a Find an approximate 99% confidence interval for the proportion of people completing the puzzle within 10 minutes. Comment on how the Sudoku puzzle

TIP

Samples have a margin of error. When you find a confidence interval, consider where it lies in relation to the boundary, or value, of interest.

Answer

a First, find the sample proportion.
$$\hat{p} = \frac{87}{120} = 0.725$$

Use 2.58 for a 99% confidence interval, with $\hat{p} \pm k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
$$0.725 \pm 2.58\sqrt{\frac{0.725(1 - 0.725)}{120}}$$

[Diagram: number line marking the classification boundaries 0.2 and 0.7]
WORKED EXAMPLE 6.10

Apprentices work four days a week and spend one day a week at college. It is proposed that the college day is changed from a Monday to a Friday. The college will consider changing the day if 80% of apprentices are in favour of the change. In a sample of apprentices, how many should be asked to be 90% certain of gaining 80% support that is not more than 5% wrong?

Answer

You want the proportion in favour to be 80% or 0.8.
$\hat{p} = 0.8$

Calculate $k\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.05$. Use 1.645 for 90% confidence interval.
$$1.645\sqrt{\frac{0.8(1 - 0.8)}{n}} = 0.05$$
$$n = \left(\frac{1.645}{0.05}\right)^2 \times 0.8 \times (1 - 0.8) = 173$$
EXERCISE 6D

1 A quality control check of a random sample of 120 pairs of jeans produced at a factory finds that 24 pairs are sub-standard. Calculate the following confidence intervals for the proportion of jeans produced that are sub-standard:

b a 98% confidence interval.

2 At a university, a random sample of 250 students is asked if they use a certain social media app. Of the students in the sample, 92 use this social media app. Calculate a 95% confidence interval for the proportion of

3 A four-sided spinner has sides coloured red, yellow, green and blue. The probability that the spinner lands on yellow is p. In an experiment, the spinner lands on yellow 18 times out of 80 spins. Find an approximate

b This experiment is carried out ten times. How many of the confidence intervals would be expected to

M 5 The proportion of European men who are red-green colour-blind is 8%. How large a sample would need to be selected to be 95% certain that it contains at least this proportion of red-green colour-blind men?

M PS 6 A random sample of 200 bees from a colony is tested to find out how many are infected with Varroa mites.

a Calculate a 99% confidence interval for the proportion of the colony infected with Varroa mites.

b The colony of bees will collapse and will not survive if 35% or more are infected with Varroa mites. Show why it is possible, at the 99% confidence level, that the colony of bees might collapse.
Checklist of learning and understanding
● If U is some statistic derived from a random sample taken from a population, then U is an unbiased estimate for Φ if E(U) = Φ.
● For sample size n taken from a population, an unbiased estimate of:
  ● the population mean µ is the sample mean $\bar{x}$
  ● the population variance σ² is:
    $$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2$$
● To test a hypothesis about a sample mean, $\bar{x}$, for a sample size n drawn from a normal distribution, use the test statistic $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
● The test statistic, z, can be used to test a hypothesis about a population mean drawn from any population.
● Where the population variance is unknown, use the unbiased estimate of variance $s^2$.
● A confidence interval for an unknown population parameter, such as the mean, is an interval
  ● population mean with known variance, $\sigma^2$, is $\left(\bar{x} - k\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + k\dfrac{\sigma}{\sqrt{n}}\right)$
  ● population mean using a large sample is $\left(\bar{x} - k\dfrac{s}{\sqrt{n}},\ \bar{x} + k\dfrac{s}{\sqrt{n}}\right)$, where $s = \sqrt{\dfrac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)}$
  ● population proportion, $p$, is $\left(\hat{p} - k\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + k\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right)$
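The checklist formulas can be tried out numerically. The following is a minimal Python sketch under stated assumptions: all function names are hypothetical, the data are made up for illustration and do not come from this book, and the critical value k is taken from the standard normal distribution using the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided standard normal critical value k, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the population mean and variance from n, sum(x), sum(x^2)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance


def z_statistic(sample_mean, mu, sigma, n):
    """Test statistic z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


def mean_interval(sample_mean, sd, n, confidence):
    """x_bar +/- k * sd / sqrt(n); sd is sigma if known, or s for a large sample."""
    half_width = critical_value(confidence) * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def proportion_interval(p_hat, n, confidence):
    """p_hat +/- k * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = critical_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Made-up illustrative data (an assumption, not taken from the book).
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
mean, variance = unbiased_estimates(len(data), sum(data), sum(x * x for x in data))
print(mean, variance)

print(z_statistic(sample_mean=5.0, mu=4.0, sigma=2.0, n=16))
print(mean_interval(sample_mean=5.0, sd=2.0, n=16, confidence=0.95))
print(proportion_interval(p_hat=0.5, n=100, confidence=0.95))
```

Working from the summary totals Σx and Σx² mirrors how many of the exercises present their data. The critical value is computed rather than read from tables, so results may differ slightly from answers based on rounded values such as 1.96 or 2.576.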
END-OF-CHAPTER REVIEW EXERCISE 6
PS 1 The worldwide proportion of left-handed people is 10%.
a Find a 95% confidence interval for the proportion of left-handed people in a random sample of 200 people from town A. [3]
From a random sample of 100 people in town B, an α% confidence interval for the
i Show that the proportion of left-handed people in the sample from town B is 16%. [2]
PS 2 The label on a jar of jam carries the words ‘minimum contents 272 g’.
a Explain why, in practice, the average contents need to be greater than 272 g. [1]
b The mass of jam dispensed by a machine used to fill the jars is a normally distributed random variable with mean 276 g. The variance of the mass of the jam, in grams², dispensed by the machine is 1.82. Each week there is a check to see if the mean mass dispensed by the machine is 276 g. One particular week a sample of eight jars is checked. The mean mass of jam in these jars is 277.7 g. Carry out an appropriate hypothesis test at the 5% level of significance, stating any assumptions you have made. [5]
M 3 An employer who is being sued for the wrongful dismissal of an employee is advised that any award paid out will be based on national average earnings for employees of a similar age. A random sample of 120 people is found to have a mean income of $21 000 with standard error $710.
a Find a 95% confidence interval for the award. [3]
b The employer wants to know the upper limit of the award that is very unlikely to be exceeded. The employer
i Explain why the required size of the confidence interval is 99.8%. [1]
ii Work out the unlikely upper limit of the award, giving your answer to the nearest dollar. [3]
M 4 The volume, v ml, of liquid dispensed by a vending machine for a random sample of 60 hot drinks is summarised as follows:
b Work out a 90% confidence interval for the population mean. [3]
M PS 5 The manufacturer of a certain smartphone advertises that the average charging time for the battery is 80 minutes with standard deviation 2.6 minutes. Owners of these smartphones suggest that the time is longer. A random sample of the phones was charged from 0 to 100% and their times, in minutes, are as follows.
88 85 82 77 86 75 80 79
a Investigate at the 5% level of significance whether or not the manufacturer’s claim is justified, stating any
b The given length of time a charged smartphone battery will last is normally distributed with mean 24 hours. The variance of the time taken, in minutes, for the smartphone to work is 1. The variance of the length of time taken, in hours squared, for the battery to last is 1. Sami tests a random sample of five batteries; the sample mean time is 23.2 hours. Investigate at the 5% level of significance whether the time batteries last is less than the time given, stating any assumption(s) you make. [5]
c In a single sample, determine how long the battery could last for, if a Type I error occurs. [1]
PS 6 The manufacturer of a tablet computer claims that the mean battery life is 11 hours. A consumer organisation wished to test whether the mean is actually greater than 11 hours. They invited a random sample of members to report the battery life of their tablets. They then calculated the sample mean. Unfortunately a fire destroyed the
the tablet
Sample size, n
Is the result significant at the 5% level? Yes
Given that the population of battery lives is normally distributed with standard deviation 1.6 hours, find the set of possible values of the sample size, n. [5]
PS 7 Parcels arriving at a certain office have weights W kg, where the random variable W has mean µ and standard deviation 0.2. The value of µ used to be 2.60, but there is a suspicion that this may no longer be true. In order to test at the 5% significance level whether the value of µ has increased, a random sample of 75 parcels is
i The mean weight of the 75 parcels is found to be 2.64 kg. Carry out the test. [4]
ii Later another test of the same hypotheses at the 5% significance level, with another random sample of 75 parcels, is carried out. Given that the value of µ is now 2.68, calculate the probability
PS 8 Last year Samir found that the time for his journey to work had mean 45.7 minutes and standard deviation 3.2 minutes. Samir wishes to test whether his average journey time has increased this year. He notes the times, in minutes, for a random sample of 8 journeys this year with the following results.
It may be assumed that the population of this year’s journey times is normally distributed with standard
i State, with a reason, whether Samir should use a one-tail or a two-tail test. [2]
ii Show that there is no evidence at the 5% significance level that Samir’s mean journey time
iii State, with a reason, which one of the errors, Type I or Type II, might have been made in
M 9 The management of a factory thinks that the mean time required to complete a particular task is 22 minutes. The times, in minutes, taken by employees to complete this task have a normal distribution with mean µ and standard deviation 3.5. An employee claims that 22 minutes is not long enough for the task. In order to investigate this claim, the times for a random sample of 12 employees are used to test the null hypothesis
i Show that the null hypothesis is rejected in favour of the alternative hypothesis
ii Find the probability of a Type II error given that the actual mean time is 25.8 minutes. [4]
10 A doctor wishes to investigate the mean fat content in low-fat burgers. He takes a random sample of 15 burgers and sends them to a laboratory where the mass, in grams, of fat in each burger is determined. The results are as follows.
9 7 8 9 6 11 7 9 8 9 8 10 7 9 9
Assume that the mass, in grams, of fat in low-fat burgers is normally distributed with mean µ and that the
i Calculate a 99% confidence interval for µ. [4]
ii Explain whether it was necessary to use the Central Limit Theorem in the calculation in part i. [2]
iii The manufacturer claims that the mean mass of fat in burgers of this type is 8 g. Use your answer to part i to comment on this claim. [2]
M 11 The masses of sweets produced by a machine are normally distributed with mean µ grams and standard deviation 1.0 grams. A random sample of 65 sweets produced by the machine has a mean
The manufacturer claims that the machine produces sweets with a mean mass of 30 grams.
ii Use the confidence interval found in part i to draw a conclusion about this claim. [2]
iii Another random sample of 65 sweets produced by the machine is taken. This sample gives a 99% confidence interval that leads to a different conclusion from that found in part ii. Assuming that the value of µ has not changed, explain how this can be possible. [1]
12 A random sample of n people were questioned about their internet use. 87 of them had a high-speed internet connection. A confidence interval for the population proportion having a high-speed internet connection is 0.1129 < p < 0.1771.
i Write down the mid-point of this confidence interval and hence find the value of n. [3]
[4]
PS 13 The masses of packets of cornflakes are normally distributed with standard deviation 11 g. A random sample of 20 packets was weighed and found to have a mean mass of 746 g.
i Test at the 4% significance level whether there is enough evidence to conclude that the population mean mass is less than 750 g. [4]
ii Given that the population mean mass actually is 750 g, find the smallest possible sample size, n, for which it is at least 97% certain that the mean mass of the sample exceeds 745 g. [4]
CROSS-TOPIC REVIEW EXERCISE 2
M 1 The time to failure, in years, for two types of kettle can be modelled by the continuous random variables X and Y which, respectively, have probability density functions as follows:
$$f(x) = \begin{cases} \dfrac{x}{8} & 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases} \qquad f(y) = \begin{cases} \dfrac{9}{4y^3} & 1 \le y \le 3 \\ 0 & \text{otherwise} \end{cases}$$
Show that the probability of failure by time t is the same for both X and Y if t satisfies the equation $t^4 - 18t^2 + 18 = 0$ and verify that this time is just over 1 year. [7]
$$f(x) = \begin{cases} 0.25 & 4 \le x \le 8 \\ 0 & \text{otherwise} \end{cases}$$
a Sketch the graph of y = f(x). [2]
b State the mean and use integration to find the variance. [3]
M 3 Chakib cycles to college. He models his journey time, T minutes, by the following probability density function:
$$f(t) = \begin{cases} \dfrac{1}{100}(25 - t) & 10 \le t \le 20 \\ 0 & \text{otherwise} \end{cases}$$
Chakib finds that a random sample of 20 of his journey times has mean 12.4 minutes.
b Write down the approximate distribution of the sample mean for a sample of size 20. [3]
PS 4 i Give a reason for using a sample rather than the whole population in carrying out a statistical investigation. [1]
ii Tennis balls of a certain brand are known to have a mean height of bounce of 64.7 cm, when dropped from a height of 100 cm. A change is made in the manufacturing process and it is required to test whether this change has affected the mean height of bounce. 100 new tennis balls are tested and it is found that their mean height of bounce when dropped from a height of 100 cm is 65.7 cm and the unbiased estimate of the
b Use your answer to part ii a to explain what conclusion can be drawn about whether the
5 The diameter, in cm, of pistons made in a certain factory is denoted by X, where X is normally distributed with mean µ and variance σ². The diameters of a random sample of 100 pistons were measured, with the following results:
n = 100, Σx = 208.7, Σx² = 435.57
i [3]
The pistons are designed to fit into cylinders. The internal diameter, in cm, of the cylinders is denoted by Y, where Y has an independent normal distribution with mean 2.12 and variance 0.000144. A piston will not fit
ii Using your answers to part i, find the probability that a randomly chosen piston will not fit into a randomly
Cambridge International AS & A Level Mathematics 9709 Paper 73 Q7 November 2015
6 The marks, x, of a random sample of 50 students in a test were summarised as follows:
i Calculate unbiased estimates of the population mean and variance. [3]
ii Each student’s mark is scaled using the formula y = 1.5x + 10. Find estimates of the population mean and
Cambridge International AS & A Level Mathematics 9709 Paper 73 Q4 June 2015
PS 7 In a survey a random sample of 150 households in Nantville were asked to fill in a questionnaire about household budgeting.
i The results showed that 33 households owned more than one car. Find an approximate 99% confidence interval for the proportion of all households in Nantville with more than one car. [4]
ii The results also included the weekly expenditure on food, x dollars, of the households. These were summarised as follows:
Find unbiased estimates of the mean and variance of the weekly expenditure on food of all households in Nantville. [3]
iii The government has a list of all the households in Nantville numbered from 1 to 9526. Describe briefly how to use random numbers to select a sample of 150 households from this list. [3]
PS 8 The number of hours that Mrs Hughes spends on her business in a week is normally distributed with mean µ and standard deviation 4.8. In the past the value of µ has been 49.5.
i Assuming that µ is still equal to 49.5, find the probability that in a random sample of 40 weeks the mean time spent on her business in a week is more than 50.3 hours. [4]
Following a change in her arrangements, Mrs Hughes wishes to test whether µ has decreased. She chooses a random sample of 40 weeks and notes that the total number of hours she spent on her business during these weeks is 1920.
ii a Explain why a one-tail test is appropriate. [1]
b Carry out the test at the 6% significance level. [4]
c Explain whether it was necessary to use the Central Limit Theorem in part ii b. [1]
Cambridge International AS & A Level Mathematics 9709 Paper 72 Q5 November 2014
PS 9 Following a change in flight schedules, an airline pilot wished to test whether the mean distance that he flies in a week has changed. He noted the distances, x km, that he flew in 50 randomly chosen weeks and summarised the results as follows.
i Calculate unbiased estimates of the population mean and variance. [3]
ii In the past, the mean distance that he flew in a week was 2850 km. Test, at the 5% significance level, whether the mean distance has changed. [5]
Cambridge International AS & A Level Mathematics 9709 Paper 71 Q3 November 2013
10 Each of a random sample of 15 students was asked how long they spent revising for an exam.
50 70 80 60 65 110 10 70 75 60 65 45 50 70 50
Assume that the times for all students are normally distributed with mean µ minutes and standard deviation 12 minutes.
i [4]
11 In the past the weekly profit at a store had mean $34 600 and standard deviation $4500. Following a change of ownership, the mean weekly profit for 90 randomly chosen weeks was $35 400.
i Stating a necessary assumption, test at the 5% significance level whether the mean weekly profit has increased. [6]
ii State, with a reason, whether it was necessary to use the Central Limit Theorem in part i. [2]
The mean weekly profit for another random sample of 90 weeks is found and the same test is carried out at the 5% significance level.
iv Given that the population mean weekly profit is now $36 500, calculate the probability of a Type II error. [5]
M 12 In order to obtain a random sample of people who live in her town, Jane chooses people at random from the
i Give a reason why Jane’s method will not give a random sample of people who live in
Jane now uses a valid method to choose a random sample of 200 people from her town and finds that 38 live in apartments.
ii Calculate an approximate 99% confidence interval for the proportion of all people in Jane’s town who live in apartments. [4]
iii Jane uses the same sample to give a confidence interval of width 0.1 for this proportion. This interval is
Cambridge International AS & A Level Mathematics 9709 Paper 72 Q6 November 2012
PS 13 The volumes of juice in bottles of Apricola are normally distributed. In a random sample of 8 bottles, the volumes of juice, in millilitres, were found to be as follows.
i Find unbiased estimates of the population mean and variance. [3]
A random sample of 50 bottles of Apricola gave unbiased estimates of 331 millilitres and 4.20 millilitres² for the population mean and variance respectively.
ii Use this sample of size 50 to calculate a 98% confidence interval for the population mean. [3]
iii The manufacturer claims that the mean volume of juice in all bottles is 333 millilitres. State, with a reason,
Cambridge International AS & A Level Mathematics 9709 Paper 71 Q4 November 2011
PS 14 Metal bolts are produced in large numbers and have lengths which are normally distributed with mean 2.62 cm
i Find the probability that a random sample of 45 bolts will have a mean length of more than 2.55 cm. [3]
ii The machine making these bolts is given an annual service. This may change the mean length of bolts produced but does not change the standard deviation. To test whether the mean has changed, a random sample of 30 bolts is taken and their lengths noted. The sample mean length is m cm. Find the set of values of m which result in rejection at the 10% significance level of the hypothesis that no change in the mean
15 There are 18 people in Millie’s class. To choose a person at random she numbers the people in the class from 1 to 18 and presses the random number button on her calculator to obtain a 3-digit decimal. Millie then multiplies the first digit in this decimal by two and chooses the person corresponding to this new number. Decimals in
i Give a reason why this is not a satisfactory method of choosing a person. [1]
Millie obtained a random sample of 5 people of her own age by a satisfactory sampling method and found that their heights in metres were 1.66, 1.68, 1.54, 1.65 and 1.57. Heights are known to be normally distributed with variance 0.0052 m².
ii Find a 98% confidence interval for the mean height of people of Millie’s age. [3]
16 When Sunil travels from his home in England to visit his relatives in India, his journey is in four stages. The times, in hours, for the stages have independent normal distributions as follows:
Bus from home to the airport: N(3.75, 1.45)
Waiting in the airport: N(3.1, 0.785)
Flight from England to India: N(11, 1.3)
i Find the probability that the flight time is shorter than the total time for the other three stages. [6]
ii Find the probability that, for 6 journeys to India, the mean time waiting in the airport
PS 17 The times taken for the pupils in Ming’s year group to do their English homework have a normal distribution with standard deviation 15.7 minutes. A teacher estimates that the mean time is 42 minutes. The times taken by a random sample of 3 students from the year group were 27, 35 and 43 minutes. Carry out a hypothesis test at the 10% significance level to determine whether the teacher’s estimate for the mean should be accepted, stating the null and alternative hypotheses. [5]
M 18 Diameters of golf balls are known to be normally distributed with mean µ cm and standard deviation σ cm. A random sample of 130 golf balls was taken and the diameters, x cm, were measured. The results are summarised by Σx = 555.1 and Σx² = 2371.30.
ii [3]
iii 300 random samples of 130 balls are taken and a 97% confidence interval is calculated for each sample. How many of these intervals would you expect not to contain µ? [1]
PS 19 The time in hours taken for clothes to dry can be modelled by the continuous random variable with probability
f(t) = k t for 1 ≤ t ≤ 4, and 0 otherwise,
where k is a constant.
3
iii Find the median time taken for clothes to dry. [3]
iv Find the probability that the time taken for clothes to dry is between the mean time and the median time. [2]
20 A magazine conducted a survey about the sleeping time of adults. A random sample of 12 adults was chosen from the adults travelling to work on a train.
i Give a reason why this is an unsatisfactory sample for the purposes of the survey. [1]
ii State a population for which this sample would be satisfactory. [1]
A satisfactory sample of 12 adults gave numbers of hours of sleep as shown below.
4.6 6.8 5.2 6.2 5.7 7.1 6.3 5.6 7.0 5.8 6.5 7.2
iii Calculate unbiased estimates of the mean and variance of the sleeping times of adults. [3]