
Cambridge International AS & A Level Mathematics: Probability & Statistics 2

3 Niheda wishes to choose a representative sample of six employees from the 78 employees at her place of work.

a Niheda considers taking as her sample the first six people arriving at work one morning. Give two reasons why this method is unsatisfactory.

b Niheda decides to use the following method to choose her sample. She numbers each employee at her place of work and generates the following random numbers on her calculator:

642 784 034 796 313 215 950 850 565 013 311 170 929

From these random numbers, she chooses employees 40, 47, 63, 32, 59 and 8. Explain how she chose these employees.

4 The manufacturer of a new chocolate bar wishes to find out what people think of it. The manufacturer decides to interview a sample of people. Describe the bias in the method used to select each of the following samples.

a A sample of people who have just bought the chocolate bar.

b A sample of people aged between 25 and 29.

c The first 20 males shopping at a store where the chocolate bar is sold.

5 Milek wishes to choose a sample of four students from a class of 16 students. The students are numbered from 3 to 18, inclusive. Milek throws three fair dice and adds the scores. Explain why this method of choosing the sample is biased.

6 Describe briefly how to use random numbers to choose a sample of 50 employees from a company with 712 employees.

5.2 The distribution of sample means

Different samples of data chosen from the same population will most likely have different, but not necessarily dissimilar, means.

The sample mean is the mean of all the items in your chosen sample.

The sample size is the number of items you choose to be in your sample.

You can explore the distribution of sample means using any distribution, discrete or continuous, normal or otherwise, so long as it has a defined mean value.

For example, suppose you spin a fair four-sided spinner, numbered 1, 1, 2 and 4, a number of times. With a sample size of five, you could get the following outcomes:

1 1 4 2 4 or 1 1 1 4 1 or 4 1 2 2 1 or …

There are too many possibilities to list.

The sample means for each of the samples presented are $\frac{12}{5} = 2.4$, $\frac{8}{5} = 1.6$ and $\frac{10}{5} = 2$, respectively. It would take a very long time to list all possible samples and work out each sample mean. If we did, we could create a table showing the probability distribution of the sample mean and create a graph to show the distribution of the sample means.
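Listing every sample by hand is impractical, but a short program can do it. The following is a minimal sketch, not part of the coursebook: it enumerates all $4^5 = 1024$ equally likely ordered samples of size five from the spinner and tallies the sample means. The names `FACES` and `sample_mean_distribution` are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Faces of the fair four-sided spinner from the text (the score 1 appears twice).
FACES = (1, 1, 2, 4)


def sample_mean_distribution(faces, n):
    """Return {sample mean: probability} for n spins, by listing every ordered sample."""
    counts = Counter()
    for sample in product(faces, repeat=n):
        counts[Fraction(sum(sample), n)] += 1
    total = len(faces) ** n  # every ordered sample is equally likely
    return {mean: Fraction(count, total) for mean, count in sorted(counts.items())}


dist = sample_mean_distribution(FACES, 5)
for mean, prob in dist.items():
    print(f"mean {float(mean):.1f}: probability {prob}")
```

Exact fractions are used instead of floating-point numbers so that the probabilities can be compared directly with hand calculations.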

To show how this works we can begin with the simplest sample size, 1, and explore the probability distribution of the means of increasing sample sizes using this fair four-sided spinner.

Let the random variable $X$ be ‘the score on the spinner when it is spun’.

When we spin the spinner, it is equally likely to land on each side. With a sample size of 1 the mean of the sample is the same as the score.


Copyright Material - Review Only - Not for Redistribution

Chapter 5: Sampling
To distinguish between the probability distribution of scores and the probability distribution of sample means we use $\bar{X}$ to represent the random variable of sample means. To follow the explanation it is easier to refer to the distribution of sample means of size 1 as $\bar{X}(1)$.

The probability distribution of $\bar{X}(1)$ is:

| Sample mean, $\bar{x}$ | 1 | 2 | 4 |
|---|---|---|---|
| $P(\bar{X}(1) = \bar{x})$ | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{4}$ |

The following figure shows the graph of the probability distribution of the sample means of size 1.

[Figure: graph of $P(\bar{X}(1))$ (vertical axis, 0 to 0.5) against sample mean, $\bar{x}$ (horizontal axis, 0 to 4).]

$$E(\bar{X}(1)) = \left(1 \times \tfrac{1}{2}\right) + \left(2 \times \tfrac{1}{4}\right) + \left(4 \times \tfrac{1}{4}\right) = 2$$

$$\operatorname{Var}(\bar{X}(1)) = \left(1^2 \times \tfrac{1}{2}\right) + \left(2^2 \times \tfrac{1}{4}\right) + \left(4^2 \times \tfrac{1}{4}\right) - \left(E(\bar{X}(1))\right)^2 = 5.5 - 2^2 = 1.5$$
Suppose we now choose random samples of size 2. If $X_1$ is the score from the first spin and $X_2$ the score from the second spin, then we can draw a table to show all possible sample means.

| $X_2$ \ $X_1$ | 1 | 1 | 2 | 4 |
|---|---|---|---|---|
| 1 | 1 | 1 | $1\frac{1}{2}$ | $2\frac{1}{2}$ |
| 1 | 1 | 1 | $1\frac{1}{2}$ | $2\frac{1}{2}$ |
| 2 | $1\frac{1}{2}$ | $1\frac{1}{2}$ | 2 | 3 |
| 4 | $2\frac{1}{2}$ | $2\frac{1}{2}$ | 3 | 4 |

From this we can find the probability distribution of the sample means of size 2, $\bar{X}(2)$. For example, there are 16 possible sample means and the sample mean $1\frac{1}{2}$ appears four times in the table; hence, $P(\bar{X}(2) = 1\frac{1}{2}) = \frac{4}{16} = \frac{1}{4}$.

| Sample mean, $\bar{x}$ | 1 | $1\frac{1}{2}$ | 2 | $2\frac{1}{2}$ | 3 | 4 |
|---|---|---|---|---|---|---|
| $P(\bar{X}(2) = \bar{x})$ | $\frac{1}{4}$ | $\frac{1}{4}$ | $\frac{1}{16}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{16}$ |

This probability distribution is the distribution of the sample means of size 2. The following diagram shows the graph of the probability distribution of the sample means of size 2.


[Figure: graph of $P(\bar{X}(2))$ (vertical axis, 0 to 0.5) against sample mean, $\bar{x}$ (horizontal axis, 1 to 4).]

$$E(\bar{X}(2)) = \left(1 \times \tfrac{1}{4}\right) + \left(1\tfrac{1}{2} \times \tfrac{1}{4}\right) + \left(2 \times \tfrac{1}{16}\right) + \left(2\tfrac{1}{2} \times \tfrac{1}{4}\right) + \left(3 \times \tfrac{1}{8}\right) + \left(4 \times \tfrac{1}{16}\right) = 2$$

$$\operatorname{Var}(\bar{X}(2)) = \left(1^2 \times \tfrac{1}{4}\right) + \left(\left(1\tfrac{1}{2}\right)^2 \times \tfrac{1}{4}\right) + \left(2^2 \times \tfrac{1}{16}\right) + \left(\left(2\tfrac{1}{2}\right)^2 \times \tfrac{1}{4}\right) + \left(3^2 \times \tfrac{1}{8}\right) + \left(4^2 \times \tfrac{1}{16}\right) - 2^2 = 4.75 - 4 = 0.75$$

Note that $E(\bar{X}(1)) = E(\bar{X}(2))$, whereas $\operatorname{Var}(\bar{X}(1)) \ne \operatorname{Var}(\bar{X}(2))$.

In fact, $\operatorname{Var}(\bar{X}(2)) = \frac{1}{2}\operatorname{Var}(\bar{X}(1))$.

To confirm these results will always work, we can explore what happens when we take a sample size of 3.

REWIND

We can explore these results using our knowledge of linear combinations of random variables from Chapter 3. From Chapter 3 we know that $E(X_1 + X_2) = E(X_1) + E(X_2)$ and, for independent $X_1$ and $X_2$, $\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2)$.

Using these results to check our findings from the table, we find:

$$E(\bar{X}(2)) = E\left(\tfrac{1}{2}(X_1 + X_2)\right) = E\left(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\right) = \tfrac{1}{2}E(X) + \tfrac{1}{2}E(X) = E(X) = 2 \text{ (as before)}$$

$$\operatorname{Var}(\bar{X}(2)) = \operatorname{Var}\left(\tfrac{1}{2}(X_1 + X_2)\right) = \operatorname{Var}\left(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\right) = \tfrac{1}{2^2}\operatorname{Var}(X) + \tfrac{1}{2^2}\operatorname{Var}(X) = \tfrac{1}{2}\operatorname{Var}(X) = \tfrac{1}{2} \times 1.5 = 0.75 \text{ (as before)}$$
The probability distribution of the sample mean scores for $\bar{X}(3)$ is shown in the following table.

| Sample mean, $\bar{x}$ | 1 | $\frac{4}{3}$ | $\frac{5}{3}$ | 2 | $\frac{7}{3}$ | $\frac{8}{3}$ | 3 | $\frac{10}{3}$ | 4 |
|---|---|---|---|---|---|---|---|---|---|
| $P(\bar{X}(3) = \bar{x})$ | $\frac{1}{8}$ | $\frac{3}{16}$ | $\frac{3}{32}$ | $\frac{13}{64}$ | $\frac{3}{16}$ | $\frac{3}{64}$ | $\frac{3}{32}$ | $\frac{3}{64}$ | $\frac{1}{64}$ |

EXPLORE 5.5

See if you can verify the probabilities of all other scores in the distribution table for samples of size 2.

The following diagram shows the graph of the probability distribution of the sample means of size 3.

[Figure: graph of $P(\bar{X}(3))$ (vertical axis, 0 to 0.5) against sample mean, $\bar{x}$ (horizontal axis, 0 to 4).]

REWIND

We can use our knowledge of permutations and combinations, from Probability & Statistics 1 Coursebook, Chapter 5, to list all possible outcomes for a sample size of 3.

For example, for a sample of size 3, a mean score $\frac{7}{3}$ can happen in the following six ways:

1 2 4, 1 4 2, 2 1 4, 2 4 1, 4 1 2, 4 2 1

Each arrangement has a probability of $\frac{1}{2} \times \frac{1}{4} \times \frac{1}{4} = \frac{1}{32}$. Hence, the probability of obtaining a mean score $\frac{7}{3}$ is $\frac{6}{32} = \frac{3}{16}$.

For a sample of size 3, a mean score 2 can happen in the following four ways:

1 1 4, 1 4 1, 4 1 1, 2 2 2

And the probability of a mean score of 2 is:

$$3\left(\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{4}\right) + \tfrac{1}{4} \times \tfrac{1}{4} \times \tfrac{1}{4} = \tfrac{3}{16} + \tfrac{1}{64} = \tfrac{13}{64}$$
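The counting argument for samples of size 3 can be checked mechanically. This sketch is an illustration, not the coursebook's method: it treats the spinner as three distinct scores with probabilities $\frac{1}{2}$, $\frac{1}{4}$ and $\frac{1}{4}$, multiplies the probabilities along each ordered sample, and adds them up by sample mean. The names `SCORE_PROBS` and `mean_distribution` are hypothetical labels.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Distinct scores on the spinner numbered 1, 1, 2, 4 and their probabilities.
SCORE_PROBS = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}


def mean_distribution(score_probs, n):
    """Distribution of the mean of n independent spins, as {mean: probability}."""
    dist = defaultdict(Fraction)
    for sample in product(score_probs, repeat=n):
        prob = Fraction(1)
        for score in sample:
            prob *= score_probs[score]  # independent spins: multiply probabilities
        dist[Fraction(sum(sample), n)] += prob
    return dict(dist)


dist3 = mean_distribution(SCORE_PROBS, 3)
print(dist3[Fraction(7, 3)])  # six arrangements of 1, 2, 4, each with probability 1/32
print(dist3[Fraction(2)])     # three arrangements of 1, 1, 4 plus the single 2, 2, 2
```

Weighting three distinct scores gives the same distribution as listing four equally likely faces, but needs fewer cases ($3^3 = 27$ rather than $4^3 = 64$).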


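$$E(\bar{X}(3)) = \left(1 \times \tfrac{1}{8}\right) + \left(\tfrac{4}{3} \times \tfrac{3}{16}\right) + \left(\tfrac{5}{3} \times \tfrac{3}{32}\right) + \left(2 \times \tfrac{13}{64}\right) + \left(\tfrac{7}{3} \times \tfrac{3}{16}\right) + \left(\tfrac{8}{3} \times \tfrac{3}{64}\right) + \left(3 \times \tfrac{3}{32}\right) + \left(\tfrac{10}{3} \times \tfrac{3}{64}\right) + \left(4 \times \tfrac{1}{64}\right)$$

$$= \tfrac{1}{8} + \tfrac{1}{4} + \tfrac{5}{32} + \tfrac{13}{32} + \tfrac{7}{16} + \tfrac{1}{8} + \tfrac{9}{32} + \tfrac{5}{32} + \tfrac{1}{16} = \tfrac{64}{32} = 2$$

$$\operatorname{Var}(\bar{X}(3)) = \left(1^2 \times \tfrac{1}{8}\right) + \left(\left(\tfrac{4}{3}\right)^2 \times \tfrac{3}{16}\right) + \left(\left(\tfrac{5}{3}\right)^2 \times \tfrac{3}{32}\right) + \left(2^2 \times \tfrac{13}{64}\right) + \left(\left(\tfrac{7}{3}\right)^2 \times \tfrac{3}{16}\right) + \left(\left(\tfrac{8}{3}\right)^2 \times \tfrac{3}{64}\right) + \left(3^2 \times \tfrac{3}{32}\right) + \left(\left(\tfrac{10}{3}\right)^2 \times \tfrac{3}{64}\right) + \left(4^2 \times \tfrac{1}{64}\right) - 2^2$$

$$= \tfrac{1}{8} + \tfrac{1}{3} + \tfrac{25}{96} + \tfrac{13}{16} + \tfrac{49}{48} + \tfrac{1}{3} + \tfrac{27}{32} + \tfrac{25}{48} + \tfrac{1}{4} - 2^2 = 4\tfrac{1}{2} - 4 = \tfrac{1}{2}$$

Alternatively, using linear combinations of random variables:

If $X_1$ is the score from the first spin, $X_2$ the score from the second spin, and $X_3$ the score from the third spin, then:

$$E(\bar{X}(3)) = E\left(\tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3\right) = E(X) = 2 \text{ (as before)}$$

$$\operatorname{Var}(\bar{X}(3)) = \operatorname{Var}\left(\tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3\right) = \tfrac{1}{3^2}\operatorname{Var}(X_1) + \tfrac{1}{3^2}\operatorname{Var}(X_2) + \tfrac{1}{3^2}\operatorname{Var}(X_3) = \tfrac{1}{3}\operatorname{Var}(X) = \tfrac{1}{3} \times 1.5 = \tfrac{1}{2} \text{ (as before)}$$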

EXPLORE 5.6

Are you able to use your knowledge of linear combinations of random variables to work out the results $E(\bar{X}(4))$ and $\operatorname{Var}(\bar{X}(4))$ for a sample of size 4?

What about our original sample size of 5? Can you find $E(\bar{X}(5))$ and $\operatorname{Var}(\bar{X}(5))$ without having to list all possible samples?

Can you extend your results to sample size $n$?

What do you think the graph of the sample means will look like as the sample size $n$ increases?

KEY POINT 5.4

If you take many samples and calculate the mean of each sample, these means have a distribution called the distribution of the sample mean. A sample mean can be regarded as a random variable.

If a random sample consists of $n$ observations of a random variable $X$ and the mean $\bar{X}$ is found, then:

$E(\bar{X}(n)) = \mu$ where $\mu = E(X)$ and $\operatorname{Var}(\bar{X}(n)) = \frac{\sigma^2}{n}$ where $\sigma^2 = \operatorname{Var}(X)$.
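The pattern seen for sample sizes 1, 2 and 3 can be written out for a general sample size. The following derivation is a sketch that assumes the $n$ observations $X_1, \dots, X_n$ are independent, each with mean $\mu$ and variance $\sigma^2$; it uses only the linear-combination results quoted from Chapter 3.

```latex
\begin{align*}
E(\bar{X}(n)) &= E\!\left(\tfrac{1}{n}X_1 + \tfrac{1}{n}X_2 + \dots + \tfrac{1}{n}X_n\right)
  = \tfrac{1}{n}E(X_1) + \dots + \tfrac{1}{n}E(X_n)
  = n \cdot \tfrac{1}{n}\,\mu = \mu \\[4pt]
\operatorname{Var}(\bar{X}(n)) &= \tfrac{1}{n^2}\operatorname{Var}(X_1) + \dots + \tfrac{1}{n^2}\operatorname{Var}(X_n)
  = n \cdot \tfrac{1}{n^2}\,\sigma^2 = \frac{\sigma^2}{n}
\end{align*}
```

The factor $\frac{1}{n^2}$ appears because multiplying a random variable by a constant multiplies its variance by the square of that constant.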
WORKED EXAMPLE 5.1

a Show that for samples of size 1 drawn from a fair six-sided die numbered 1, 2, 3, 4, 5 and 6, $E(\bar{X}(1)) = 3\frac{1}{2}$ and $\operatorname{Var}(\bar{X}(1)) = \frac{35}{12}$.

b Work out $E(\bar{X}(2))$ and $\operatorname{Var}(\bar{X}(2))$.

Answer

a You may choose to draw a probability distribution table.

$$E(\bar{X}(1)) = \left(1 \times \tfrac{1}{6}\right) + \left(2 \times \tfrac{1}{6}\right) + \left(3 \times \tfrac{1}{6}\right) + \left(4 \times \tfrac{1}{6}\right) + \left(5 \times \tfrac{1}{6}\right) + \left(6 \times \tfrac{1}{6}\right) = \tfrac{21}{6} = 3\tfrac{1}{2}$$

$$\operatorname{Var}(\bar{X}(1)) = \left(1^2 \times \tfrac{1}{6}\right) + \left(2^2 \times \tfrac{1}{6}\right) + \left(3^2 \times \tfrac{1}{6}\right) + \left(4^2 \times \tfrac{1}{6}\right) + \left(5^2 \times \tfrac{1}{6}\right) + \left(6^2 \times \tfrac{1}{6}\right) - \left(3\tfrac{1}{2}\right)^2 = \tfrac{91}{6} - \tfrac{49}{4} = \tfrac{35}{12}$$

b You can use expectation algebra, as you have found $E(X)$ and $\operatorname{Var}(X)$.

$$E(\bar{X}(2)) = \tfrac{1}{2}E(X) + \tfrac{1}{2}E(X) = E(X) = 3\tfrac{1}{2}$$

$$\operatorname{Var}(\bar{X}(2)) = \tfrac{1}{2^2}\operatorname{Var}(X) + \tfrac{1}{2^2}\operatorname{Var}(X) = \tfrac{1}{2}\operatorname{Var}(X) = \tfrac{1}{2} \times \tfrac{35}{12} = \tfrac{35}{24}$$
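As a quick check of Worked example 5.1, the sketch below computes the mean and variance of a single die score exactly and then scales the variance by $\frac{1}{n}$ for $n = 2$. The helper name `mean_and_variance` is an assumption made for illustration.

```python
from fractions import Fraction


def mean_and_variance(values):
    """Exact mean and variance of a variable taking each value with equal probability."""
    p = Fraction(1, len(values))
    mean = sum(p * v for v in values)
    variance = sum(p * v * v for v in values) - mean ** 2
    return mean, variance


die_mean, die_var = mean_and_variance(range(1, 7))
print(die_mean, die_var)      # E(X) and Var(X) for one throw
print(die_mean, die_var / 2)  # mean and variance of the mean of two throws
```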

The central limit theorem

We have now found that the means of random samples of size $n$ from a population with mean $\mu$ and variance $\sigma^2$ will have a distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, but what sort of distribution will it be?

Below are the graphs of the probability distributions for sample means of size 1, 2 and 3 for the spinner numbered 1, 1, 2 and 4.

[Figure: three panels showing $P(\bar{X}(1))$, $P(\bar{X}(2))$ and $P(\bar{X}(3))$ (vertical axes, 0 to 0.5) against sample mean, $\bar{x}$ (horizontal axes, 0 to 4).]

We can see the shape of the distribution changes as $n$ increases; this suggests that, as $n$ grows, the shape of the distribution of sample means depends less and less on the shape of the original distribution.

To examine the shape of the probability distribution of sample means as $n$ increases, let us explore an example using a more familiar object, the sample mean of scores on an ordinary fair die, numbered 1, 2, 3, 4, 5 and 6. The following graph shows the probability distribution of the sample means of size 1.


[Figure: graph of $P(\bar{X}(1))$ (vertical axis, 0 to 0.2) against sample mean, $\bar{x}$ (horizontal axis, 0 to 6).]

For a sample of size 1, the probability distribution graph is uniform; each score has probability $\frac{1}{6}$.

The following graph shows the probability distribution of the sample means of size 2.

[Figure: graph of $P(\bar{X}(2))$ (vertical axis, 0 to 0.2) against sample mean, $\bar{x}$ (horizontal axis, 0 to 6).]

For a sample of size 2, the probability distribution graph is symmetrical about the mean value.

If we draw probability distribution graphs for larger sample sizes, for example samples of size 6 and 10, shown on the following graphs, we can see the distribution of sample means increasingly begins to resemble a normal distribution.

[Figure: two graphs, $P(\bar{X}(6))$ and $P(\bar{X}(10))$ (vertical axes, 0 to 0.1), each against sample mean, $\bar{x}$ (horizontal axes, 1.0 to 6.0).]
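Distributions like the ones graphed for the die can be computed without listing every sample. The sketch below is one possible approach, not taken from the coursebook: it builds the exact distribution of the total of $n$ dice by repeated convolution, then rescales totals to means. The function name `dice_mean_distribution` is hypothetical.

```python
from fractions import Fraction


def dice_mean_distribution(n, faces=6):
    """Exact distribution of the mean score of n fair dice, as {mean: probability}."""
    totals = {0: Fraction(1)}  # before any throw, the total is 0 with certainty
    for _ in range(n):
        updated = {}
        for total, prob in totals.items():
            for score in range(1, faces + 1):
                key = total + score
                updated[key] = updated.get(key, Fraction(0)) + prob / faces
        totals = updated
    return {Fraction(total, n): prob for total, prob in totals.items()}


for n in (1, 2, 6, 10):
    dist = dice_mean_distribution(n)
    mean = sum(m * p for m, p in dist.items())
    variance = sum(m * m * p for m, p in dist.items()) - mean ** 2
    # The mean stays at 7/2 while the variance shrinks like (35/12)/n.
    print(n, mean, variance, float(max(dist.values())))
```

Plotting each `dist` as a bar chart (with any plotting tool) should reproduce the progression from a flat graph towards a bell shape.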
For a fair ordinary die, and where the sample size is 1, the original distribution is uniform (rectangular). From samples of size 2 onwards, the graphs of the probability distributions of sample means show the peak of the graph at the mean of the original distribution.

As the sample size increases, the probability of getting a sample mean further away from the actual mean of the distribution, such as the mean of samples of size 1, becomes smaller and smaller. Hence, the variance of the distribution of the sample mean becomes smaller as $n$ becomes larger.

EXPLORE 5.7

You can create probability distribution graphs for different sample sizes. Search ‘dice experiment’ for a suitable program to use.

EXPLORE 5.8

Look back at your graphs generated from means of samples of single-digit random numbers from Explore 5.4. What conclusions can you now draw from these graphs?

KEY POINT 5.5

For large sample sizes, the distribution of a sample mean is approximately normal. This normal distribution will have mean $\mu$ and variance $\frac{\sigma^2}{n}$.

This result is true for all distributions of sample means, regardless of whether the underlying population is normal. This is the fundamental property of the central limit theorem.

The central limit theorem (CLT) states that, provided $n$ is large, the distribution of sample means of size $n$ is approximately:

$$\bar{X}(n) \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

where the original population has mean $\mu$ and variance $\sigma^2$.

How large $n$ must be for the central limit theorem to give a good approximation depends on the distribution of the original population. If the original population is approximately normal, then a low value of $n$ is sufficient. However, if the original population does not display any features of a normal distribution, then the value of $n$ will need to be large. For any population, the central limit theorem can be used for sample size $n > 50$.

It follows that if the original distribution is normal, $X \sim N(\mu, \sigma^2)$, then the distribution of sample means from a normal distribution must also be a normal distribution since $E(\bar{X}(n)) = \mu$ and $\operatorname{Var}(\bar{X}(n)) = \frac{\sigma^2}{n}$. As $n$ increases, the shape of the distribution of sample means becomes more peaked and centred around $\mu$, as shown in the following diagram.

[Figure: density curves $f(x)$ for $\bar{X}(n) \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, centred on $\mu$; horizontal axis $x$ marked 0, $\mu$ and 50.]

KEY POINT 5.6

The central limit theorem is important when samples of data are being explored, because the distribution of means of samples is approximately normal even when the parent population is not normal. The central limit theorem allows the use of the normal distribution to make statistical judgements from sample data from any distribution.
WORKED EXAMPLE 5.2

The masses of a variety of pears are normally distributed with mean 45 g and variance 52 g². The pears are packed in bags of six. Find the percentage of bags of pears with a total mass of more than 300 g.

Answer 1

Sample mean $\bar{X} \sim N\left(45, \frac{52}{6}\right)$. In a bag with total mass 300 g, each pear will have an average mass of $\frac{300}{6}$ g. Use normal tables to calculate the probability and, hence, the percentage.

$$1 - \Phi\left(\frac{\frac{300}{6} - 45}{\sqrt{\frac{52}{6}}}\right) = 1 - \Phi(1.698) = 1 - 0.9553 = 0.0447 = 4.47\%$$

Answer 2

An alternative is to find the distribution for the bag of pears, and multiply the mean and variance by the number of pears in the bag. Then the total mass has distribution:

$$X \sim N(270, 312)$$

Use the normal tables, as before.

$$1 - \Phi\left(\frac{300 - 270}{\sqrt{312}}\right) = 1 - \Phi(1.698) = 1 - 0.9553 = 0.0447 = 4.47\%$$
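Both answers to Worked example 5.2 can be reproduced without normal tables. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the constant names are illustrative, and small differences from table values are expected because the tables round $z$.

```python
from math import sqrt
from statistics import NormalDist

MEAN_G, VARIANCE_G2, PEARS_PER_BAG, LIMIT_G = 45, 52, 6, 300

# Method 1: distribution of the mean mass per pear in a bag, N(45, 52/6).
mean_dist = NormalDist(MEAN_G, sqrt(VARIANCE_G2 / PEARS_PER_BAG))
p_mean = 1 - mean_dist.cdf(LIMIT_G / PEARS_PER_BAG)

# Method 2: distribution of the total mass of a bag, N(270, 312).
total_dist = NormalDist(MEAN_G * PEARS_PER_BAG, sqrt(VARIANCE_G2 * PEARS_PER_BAG))
p_total = 1 - total_dist.cdf(LIMIT_G)

print(f"{p_mean:.4f} {p_total:.4f}")  # the two methods should agree
```

Note that `NormalDist` takes the standard deviation, not the variance, as its second argument, hence the square roots.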
WORKED EXAMPLE 5.3

During an exercise session, women will drink, on average, 500 ml of water with a standard deviation of 50 ml. 25 women are taking part in the exercise session. You have available 13 litres of water. What is the probability you will have sufficient water?

Answer

The situation described has $\mu = 500$ and $\sigma = 50$. You do not know if the situation follows a normal distribution. However, by the central limit theorem the distribution of sample means is approximately normal, and its mean is the same as the population mean.

The probability of sufficient water implies less than 13000 ml will be needed by the group of women. There will be sufficient water if each woman drinks, on average, less than $\frac{13000}{25} = 520$ ml.

$$\bar{X} \sim N\left(500, \frac{50^2}{25}\right)$$

Use normal tables to calculate the probability.

$$P(\bar{X} < 520) = P\left(Z < \frac{520 - 500}{\frac{50}{\sqrt{25}}}\right) = P(Z < 2) = 0.977$$
WORKED EXAMPLE 5.4

A continuous random variable, $X$, has probability density given by:

$$f(x) = \begin{cases} \frac{x}{2} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

Calculate the probability that the mean, $\bar{X}$, of a random sample of 50 observations of $X$ is less than $\frac{3}{2}$.

Answer

First find the mean and variance of $X$.

$$\text{Mean} = \int_0^2 \frac{x^2}{2}\,dx = \left[\frac{x^3}{6}\right]_0^2 = \frac{4}{3}$$

$$\text{Variance} = \int_0^2 \frac{x^3}{2}\,dx - \left(\frac{4}{3}\right)^2 = \left[\frac{x^4}{8}\right]_0^2 - \frac{16}{9} = \frac{2}{9}$$

Use the CLT to define the distribution of $\bar{X}$.

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}(X) \div n = \frac{2}{9} \div 50 = \frac{2}{450}$$

$$\bar{X} \sim N\left(\frac{4}{3}, \frac{2}{450}\right)$$

Use normal tables to calculate the probability.

$$P\left(\bar{X} < \frac{3}{2}\right) \approx \Phi\left(\frac{\frac{3}{2} - \frac{4}{3}}{\sqrt{\frac{2}{450}}}\right) = \Phi(2.5) = 0.9938$$

REWIND

We learnt how to find mean and variance in Chapter 4.
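The integrals in Worked example 5.4 are simple enough to evaluate in closed form, which makes a numerical check straightforward. In this sketch the helper `moment` is a hypothetical name; it uses the fact that $\int_0^2 x^k \cdot \frac{x}{2}\,dx = \frac{2^{k+2}}{2(k+2)}$, which applies to this density only.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist


def moment(k):
    """Exact k-th moment of f(x) = x/2 on [0, 2]: integral of x**(k+1)/2 from 0 to 2."""
    return Fraction(2 ** (k + 2), 2 * (k + 2))


n = 50
mean = moment(1)                   # E(X)
variance = moment(2) - mean ** 2   # E(X^2) - E(X)^2
z = float(Fraction(3, 2) - mean) / sqrt(float(variance / n))
print(mean, variance, round(z, 6), round(NormalDist().cdf(z), 4))
```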
WORKED EXAMPLE 5.5

An IT security firm detects threats to steal online data at the rate of 12.2 per day. The threats occur singly and at random. A random sample of 100 weeks is chosen. Find the probability that the average weekly number of threats detected is less than 86.

Answer 1

First define the distribution. Let the random variable $T$ be the total number of threats detected in one week. Then $T \sim \text{Po}(85.4)$.

Use the CLT to define the distribution of $\bar{T}$.

$$\bar{T} \sim N\left(85.4, \frac{85.4}{100}\right)$$

The continuity correction is $\frac{1}{2n} = \frac{1}{2 \times 100}$.

$$P(\bar{T} < 86) = \Phi\left(\frac{\left(86 - \frac{1}{200}\right) - 85.4}{\sqrt{\frac{85.4}{100}}}\right) = \Phi(0.6439) = 0.74$$

REWIND

We learnt about the Poisson distribution in Chapter 2.

When working with the mean of $n$ observations of a discrete random variable, the continuity correction is $\pm\frac{1}{2n}$, not $\pm\frac{1}{2}$.

To explain why, for this example we can find the required probability using an alternative method.

Answer 2

This time, we are using the mean over the whole interval of 100 weeks. Let the random variable $X$ be the number of threats over 100 weeks. Then:

$$X \sim \text{Po}(8540) \approx N(8540, 8540)$$

To calculate the probability using this method, we use the usual continuity correction $\pm\frac{1}{2}$, depending on the situation.

$$P(X < 8600) \approx \Phi\left(\frac{8599.5 - 8540}{\sqrt{8540}}\right) = \Phi(0.644) \approx 0.74$$

And we get the same answer as before.

In the first method, we found $P(\bar{T} < 86)$, whereas in the second method we found $P(X < 8600)$; 86 is 100 times smaller than 8600, and the continuity correction $\frac{1}{200}$ is 100 times smaller than $\frac{1}{2}$.

TIP

When using the central limit theorem for sample means of size $n$ taken from a discrete distribution, such as the binomial or Poisson distributions, the continuity correction is $\pm\frac{1}{2n}$.
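The agreement between the two methods in Worked example 5.5 can be seen numerically. The sketch below (variable names are illustrative) computes the standardised value both ways; dividing the numerator and denominator of the second expression by $n = 100$ gives the first, so they should match up to rounding error.

```python
from math import sqrt
from statistics import NormalDist

RATE_PER_WEEK = 12.2 * 7  # 85.4 threats per week
WEEKS = 100

# Method 1: mean of 100 weekly counts, continuity correction 1/(2n).
z_mean = (86 - 1 / (2 * WEEKS) - RATE_PER_WEEK) / sqrt(RATE_PER_WEEK / WEEKS)

# Method 2: total over 100 weeks, Po(8540) approximated by N(8540, 8540),
# with the usual continuity correction of 1/2.
total_rate = RATE_PER_WEEK * WEEKS
z_total = (86 * WEEKS - 0.5 - total_rate) / sqrt(total_rate)

print(round(z_mean, 4), round(z_total, 4), round(NormalDist().cdf(z_mean), 2))
```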

WORKED EXAMPLE 5.6

The random variable $X \sim B(60, 0.25)$. The random variable $\bar{X}$ is the mean of a random sample of 50 observations of $X$. Find $P(\bar{X} \le 16)$.

Answer

Mean $np = 60 \times 0.25 = 15$.

Variance $np(1 - p) = 60 \times 0.25 \times 0.75 = 11.25$.

$$\bar{X} \sim N\left(15, \frac{11.25}{50}\right)$$

Use continuity correction $\frac{1}{2 \times 50}$.

$$P(\bar{X} \le 16) = \Phi\left(\frac{16 + \frac{1}{100} - 15}{\sqrt{\frac{11.25}{50}}}\right) = \Phi(2.129) = 0.983$$
EXERCISE 5B

1 The random variable $X$ has mean 6 and variance 8. The random variable $\bar{X}$ is the mean of a random sample of 80 observations of $X$. State the approximate distribution of $\bar{X}$, giving its parameters, and find the probability that the sample mean is less than 6.4.

2 The random variable $X$ has mean 30 and variance 36. The random variable $\bar{X}$ is the mean of a random sample of 100 observations of $X$. State the approximate distribution of $\bar{X}$, giving its parameters, and find the probability that the sample mean is greater than 31.

3 The random variable $Y$ has mean 21 and standard deviation 4.2. The random variable $\bar{Y}$ is the mean of a random sample of 50 observations of $Y$. State the approximate distribution of $\bar{Y}$, giving its parameters, and work out $P(\bar{Y} < 22)$.

4 The time taken for telephone calls to a call centre to be answered is normally distributed with mean 20 seconds and standard deviation 5 seconds. Find the probability that for 16 randomly selected calls made to the centre, the mean time taken to answer the calls is less than 18 seconds.

PS 5 Ciara needs 5 kg of flour, so she buys 10 bags, each labelled as containing 500 g. Unknown to her, the bags contain, on average, 510 g with variance 120 g². What is the probability that Ciara actually buys less flour than she needs?

PS 6 The length, in cm, of an electrical component produced by a company may be considered to be a continuous random variable $X$, having probability density function as follows:

$$f(x) = \begin{cases} \frac{5}{2} & 1.8 \le x \le 2.2 \\ 0 & \text{otherwise} \end{cases}$$

a Calculate the probability that the mean, $\bar{X}$, of a random sample of 40 of these components is greater than 2.05 cm.

b Calculate the probability that the mean, $\bar{X}$, of a random sample of 20 of these components is less than 2.05 cm.

7 A random sample of size 60 is taken from the random variable $X$, where $X \sim B(45, 0.4)$. Given that $\bar{X}$ is the sample mean, find:

a $P(\bar{X} < 19)$

b $P(\bar{X} \le 18)$

8 A random sample of size 50 is taken from random variable $X$, where $X \sim \text{Po}(2)$. Find $P(1.5 < \bar{X} \le 2.2)$, where $\bar{X}$ is the sample mean.
Checklist of learning and understanding

● ‘Population’ means all the items of interest within a study.

● ‘Sample’ describes part of a population.

● Biased sampling occurs when the sample is unrepresentative of the population.

● Random numbers can be used to generate a sample in which you have no control over the selection.

● Random sampling is a process whereby each member of the population has an equal chance of selection.

● Random sampling does not guarantee that the resulting sample will be representative of the population.

● The central limit theorem allows the use of the normal distribution to make statistical judgements from sample data from any distribution.

● For samples of size $n$ drawn from a population with mean $\mu$ and variance $\sigma^2$, the distribution of sample means $\bar{X}$ is approximately normal and $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where $n$ is large.
END-OF-CHAPTER REVIEW EXERCISE 5

PS 1 The mean and standard deviation of the time spent by visitors at an art gallery are 3.5 hours and 1.5 hours, respectively.

a Find the probability that the mean time spent in the art gallery by a random sample of:

i 60 people is more than 4 hours [3]

ii 20 people is less than 4 hours. [3]

b What assumption(s), if any, did you need to make in part a ii? [1]

PS 2 The score on a four-sided spinner is given by the random variable $X$ with probability distribution as shown in the table.

| $x$ | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| $P(X = x)$ | 0.1 | 0.4 | 0.2 | 0.3 |

a Show that the variance is 1.01. [3]

b The spinner is spun 100 times and each score noted. Let $S$ be the random variable for the sum of 100 observations. Write down the approximate distribution of $S$. [2]

c Use a normal distribution to work out the probability that the sum of the 100 observations is less than 350. Explain why you can use the normal distribution in this situation. [4]

PS 3 The burn time, in minutes, for a certain brand of candle can be modelled by a normal distribution with mean 90 and standard deviation 15.6. Find the probability that a random sample of five candles, each one lit immediately after another burns out, will burn for a total of 500 minutes or less. [5]

4 A random sample of 35 observations is to be taken from a normal distribution with mean 15 and variance 9. If $\bar{X}$ is the sample mean, find:

a $P(\bar{X} < 16.2)$ [4]

b the value of $k$, where $P(\bar{X} < k) = 0.75$. [4]

M 5 There are 12 equally talented children at a sports club. Jamil wishes to choose one child at random from these children to represent the club. The children are numbered 1, 2, 3 and so on up to 12. Jamil then throws two ordinary fair dice, each numbered 1 to 6, and he finds the sum of the scores. He chooses the child whose number is the same as the sum of the scores.

a Explain why this is a biased method of choosing a child. [2]

b Describe briefly an unbiased method of choosing a child. [2]

6 Dominic wishes to choose a random sample of five students from the 150 students in his year. He numbers the students from 1 to 150. Then he uses his calculator to generate five random numbers between 0 and 1. He multiplies each random number by 150 and rounds up to the next whole number to give a student number.

i Dominic’s first random number is 0.392. Find the student number that is produced by this random number. [1]

ii Dominic’s second student number is 104. Find a possible random number that would produce this student number. [1]

iii Explain briefly why five random numbers may not be enough to produce a sample of five student numbers. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q2 November 2016
M 7 It is known that the number, N , of words contained in the leading article each day in a certain

ev
br
amnewspaper can be modelled by a normal distribution with mean 352 and variance 29. A researcher

-R
takes a random sample of 10 leading articles and finds the sample mean, N , of N.
-C

i State the distribution of N , giving the values of any parameters. [2]

s
es
ii Find P(N > 354). [3]
y

Pr
op

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q1 November 2015

ity
C

8 Jyothi wishes to choose a representative sample of 5 students from the 82 members of her school year.

rs
w

i She considers going into the canteen and choosing a table with five students from her year sitting
ie

ve
at it, and using these five people as her sample. Give two reasons why this method is unsatisfactory. [2]

y
ev

op
ni

ii Jyothi decides to use another method. She numbers all the students in her year from 1 to 82. Then
R

C
she uses her calculator and generates the following random numbers.
ge

w
231492 762305 346280

ie
id

From these numbers, she obtains the student numbers 23, 14, 76, 5, 34 and 62. Explain how Jyothi

ev
br

obtained these student numbers from the list of random numbers. [3]
am

-R
Cambridge International AS & A Level Mathematics 9709 Paper 73 Q1 June 2015
-C

PS 9 The editor of a magazine wishes to obtain the views of a random sample of readers about the future of the magazine.

i A sub-editor proposes that they include in one issue of the magazine a questionnaire for readers to complete and return. Give two reasons why the readers who return the questionnaire would not form a random sample. [2]

The editor decides to use a table of random numbers to select a random sample of 50 readers from the 7302 regular readers. These regular readers are numbered from 1 to 7302. The first few random numbers which the editor obtains from the table are as follows.

49757 80239 52038 60882

ii Use these random numbers to select the first three members in the sample. [2]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q2 November 2010

M 10 The lengths of time people take to complete a certain type of puzzle are normally distributed with mean 48.8 minutes and standard deviation 15.6 minutes. The random variable $X$ represents the time taken, in minutes, by a randomly chosen person to solve this type of puzzle. The times taken by random samples of 5 people are noted. The mean time $\bar{X}$ is calculated for each sample.

i State the distribution of $\bar{X}$, giving the values of any parameters. [2]

ii Find $P(\bar{X} < 50)$. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 7 Q2 June 2008

Chapter 6

Estimation

In this chapter you will learn how to:

■ calculate unbiased estimates of the population mean and variance from a sample
■ formulate hypotheses and carry out a hypothesis test concerning the population mean in cases where the population is normally distributed with known variance or where a large sample is used
■ determine and interpret a confidence interval for a population mean in cases where the population is normally distributed with known variance or where a large sample is used
■ determine, from a large sample, an approximate confidence interval for a population proportion.

PREREQUISITE KNOWLEDGE

| Where it comes from | What you should be able to do | Check your skills |
|---|---|---|
| Probability & Statistics 1, Chapters 2 and 3 | Calculate the mean and variance from raw and summarised data. | Calculate the mean, variance and standard deviation for the following data sets: 1) $n = 11$, $\sum x = 16.5$, $\sum x^2 = 25.85$; 2) $n = 8$, $\sum x = 434$, $\sum x^2 = 26\,630$; 3) 20 24 15 18 16 25 22; 4) 6.5 9.3 13.7 15.1 20.4 |
| Chapter 1 | Formulate and carry out hypothesis testing. | State the null and alternative hypotheses and test statistic for the following: 5) X ~ N(86, 16); sample value 84; two-tailed test at 10%; 6) X ~ N(54, 32); sample value 50; one-tailed test at 5%; 7) X ~ N(18, 3); sample value 20; one-tailed test at 1% |
| Probability & Statistics 1, Chapter 8 | Know how to approximate a binomial distribution by a normal distribution. | Express the following as approximate normal distributions: 8) X ~ B(42, 0.4); 9) X ~ B(100, 0.55) |

Why do we study estimation?

Chapter 5 explained that it is not always possible to collect data about every item in a population. There are many practical situations when it is necessary to use a sample to obtain information about a population. For example, an asthma attack may lead to a hospital admission. Sample data allow us to estimate the number of people likely to require a stay in hospital following an asthma attack. Studying the length of the hospital stay will allow us to estimate hospital staffing and other resources. In turn, this allows the hospital to assess its resources and plan for the needs of other patients.

Conservationists study only samples of the population of certain species to make predictions of their numbers. For example, the estimate of the population of mountain gorillas is that there are fewer than 800 left in the world. Snow leopards live in 12 countries in central Asia. Since the start of this century, the estimated number of snow leopards has decreased by about 20%. The actual numbers of snow leopards and mountain gorillas are unknown. These numbers are estimates.

Consider a study that claims two-thirds of adults living in a particular country are overweight. It is unlikely that every adult in that country was weighed; yet the study states they have evidence to justify their claim. That evidence comes from summary statistics from a sample of adults.

The summary statistics calculated from a sample, the sample mean and the sample variance, are used to draw conclusions about the whole population based on the evidence from the sample. These calculated summary statistics, since they only use part of a population, are estimates. To differentiate between sample statistics and population statistics, the following convention is used:

● Population parameters, such as mean and variance, use Greek letters $\mu$ and $\sigma^2$, respectively.
● Estimates of population parameters from a sample are written using Roman letters; for example, $\bar{x}$ is the sample mean and $s^2$ is the sample variance.

Note that the subject you are studying is ‘statistics’ and, confusingly, an estimate of a population parameter is called a statistic; so estimates of a population’s mean and variance are called population statistics.

REWIND

Section 5.2 in the previous chapter explained the sample mean, $\bar{x}$. This is an estimate for the mean, $\mu$, of a population. The sample mean is an unbiased estimate since the expected value of the sampling distribution of the sample mean is equal to the mean of the population, the parameter it is estimating.

As an example, suppose you wish to find out the average number of fiction books people read each month. You cannot ask the entire population, so instead you ask a sample of the population and work out the average number of fiction books read each month from the sample data. For an unbiased estimate you need to use an unbiased sampling method, such as random sampling, that ensures all members of the population have an equal chance of being selected for the sample; and, of course, you must ask unambiguous questions, making it clear that you are only interested in fiction and not non-fiction books.

6.1 Unbiased estimates of population mean and variance

One objective when taking a sample is to estimate population statistics. A statistic or estimate is a numerical value calculated from a set of data and used in place of an unknown parameter in a population. The bias of an estimate is the difference between the expected value of the estimate and the true value of the parameter. This difference is the sampling error. The most efficient estimate is one that is unbiased and has the smallest variance.

The reliability of an estimate can also depend on the variance of the population. A population with a small variance implies that the data are not widely dispersed and any sample is therefore less likely to be seriously unrepresentative. Conversely, a population with a large variance implies that the data are widely dispersed and so an unrepresentative sample may easily arise.

KEY POINT 6.1

A statistic is an estimate of a given population parameter, calculated from sample data.

A statistic is an unbiased estimate of a given population parameter when the mean of the sampling distribution of that statistic is equal to the parameter being estimated.

If $\hat{U}$ is some statistic derived from a random sample taken from a population, then $\hat{U}$ is an unbiased estimate for $U$ if $E(\hat{U}) = U$.

The most efficient estimate is one that is unbiased and has the smallest variance.

All the examples presented in Chapter 5 involved a sample from a population with known variance. In practice, if you do not know the population mean, you are unlikely to know the true population variance either.

To explore the sampling distribution of the sample variance, we can return to the example about the spinner numbered 1, 1, 2, 4. In Section 5.2, we found that this distribution has mean 2 and variance 1.5.

To explore the variance as the statistic, for a sample size of 1 we can work out the expectation of the variance $E(V)$.

| Sample outcome | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{1} - \bar{x}^2$ | Probability (outcome) |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | $\frac{1}{2}$ |
| 2 | 4 | 2 | 0 | $\frac{1}{4}$ |
| 4 | 16 | 4 | 0 | $\frac{1}{4}$ |

For a sample of size 1, the expectation of the sample variance is $E(V) = 0$.

This is not equal to the variance of the original population, so the sample variance is not an unbiased estimate for the variance.

We do not need to explore the variance for another sample size since a single example that shows the variance is biased is sufficient to prove the point. However, it is worthwhile exploring other sample sizes to see if there is a possible connection between the sample variance and the population variance.

For a sample of size 2, first list all possible sample outcomes, together with the variance and probability of choosing that sample.

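| Sample outcome | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{2} - \bar{x}^2$ | Probability (outcome) |
|---|---|---|---|---|
| 1 1 | 2 | 1 | 0 | $\frac{4}{16}$ |
| 2 2 | 8 | 2 | 0 | $\frac{1}{16}$ |
| 4 4 | 32 | 4 | 0 | $\frac{1}{16}$ |
| 1 2 | 5 | $1\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{4}{16}$ |
| 1 4 | 17 | $2\frac{1}{2}$ | $2\frac{1}{4}$ | $\frac{4}{16}$ |
| 2 4 | 20 | 3 | 1 | $\frac{2}{16}$ |
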
You can check the values for the probabilities of these sample outcomes in Chapter 5, Section 5.2.

We can now draw a probability distribution table for the sample variance, sample size of 2.

| $v$ | 0 | $\frac{1}{4}$ | 1 | $2\frac{1}{4}$ |
|---|---|---|---|---|
| $P(V = v)$ | $\frac{3}{8}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{4}$ |

Hence, $E(V) = \left(0 \times \frac{3}{8}\right) + \left(\frac{1}{4} \times \frac{1}{4}\right) + \left(1 \times \frac{1}{8}\right) + \left(2\frac{1}{4} \times \frac{1}{4}\right) = \frac{3}{4}$

Comparing this result with the variance of the original population, we find that $E(V) = \frac{3}{4}$ and $\sigma^2 = 1\frac{1}{2}$, the variance of the original population.

Notice that $\frac{3}{4} = \frac{1}{2} \times 1\frac{1}{2}$, or $E(V) = \frac{n-1}{n} \times \sigma^2$, where $n$ is the sample size.

We need more than just this example to see if this relationship between the original variance and the estimate of variance always holds.

Here are the data for a sample size of 3. You can refer to Chapter 5, Section 5.2 for outcomes and probabilities.

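| Sample outcome | $\sum x^2$ | $\bar{x}$ | Variance, $v = \frac{\sum x^2}{3} - \bar{x}^2$ | Probability (outcome) |
|---|---|---|---|---|
| 1 1 1 | 3 | 1 | 0 | $\frac{8}{64}$ |
| 1 1 2 | 6 | $\frac{4}{3}$ | $\frac{2}{9}$ | $\frac{3}{16}$ |
| 1 2 2 | 9 | $\frac{5}{3}$ | $\frac{2}{9}$ | $\frac{3}{32}$ |
| 2 2 2 | 12 | 2 | 0 | $\frac{1}{64}$ |
| 1 1 4 | 18 | 2 | 2 | $\frac{3}{16}$ |
| 1 2 4 | 21 | $\frac{7}{3}$ | $\frac{14}{9}$ | $\frac{6}{32}$ |
| 2 2 4 | 24 | $\frac{8}{3}$ | $\frac{8}{9}$ | $\frac{3}{64}$ |
| 1 4 4 | 33 | 3 | 2 | $\frac{3}{32}$ |
| 2 4 4 | 36 | $\frac{10}{3}$ | $\frac{8}{9}$ | $\frac{3}{64}$ |
| 4 4 4 | 48 | 4 | 0 | $\frac{1}{64}$ |
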
The probability distribution table for the sample variance, sample size of 3, is therefore:

| $v$ | 0 | $\frac{2}{9}$ | $\frac{8}{9}$ | $\frac{14}{9}$ | 2 |
|---|---|---|---|---|---|
| $P(V = v)$ | $\frac{5}{32}$ | $\frac{9}{32}$ | $\frac{3}{32}$ | $\frac{6}{32}$ | $\frac{9}{32}$ |

Hence, $E(V) = \left(0 \times \frac{5}{32}\right) + \left(\frac{2}{9} \times \frac{9}{32}\right) + \left(\frac{8}{9} \times \frac{3}{32}\right) + \left(\frac{14}{9} \times \frac{6}{32}\right) + \left(2 \times \frac{9}{32}\right) = 1$, and if we use the relationship $E(V) = \frac{n-1}{n} \times \sigma^2$ for sample size of 3, we find that $E(V) = \frac{3-1}{3} \times 1\frac{1}{2} = 1$, the same value.

In general, for a population where $\sigma^2$ is the population variance, the expectation of sample variance $E(V)$ is given by:

$$E(V) = \frac{n-1}{n} \times \sigma^2$$
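
The spinner tables above can be reproduced by brute force. The following is a minimal sketch, assuming the spinner's four faces (1, 1, 2, 4) are equally likely and the spins are independent; the function name `expected_sample_variance` is an illustrative choice, not something from the text. It lists every ordered sample of size n, computes the sample variance with divisor n, and compares the average with $\frac{n-1}{n}\sigma^2$ using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Spinner from the text: faces 1, 1, 2, 4, assumed equally likely.
FACES = [1, 1, 2, 4]


def expected_sample_variance(n):
    """Return E(V) for samples of size n, where V = (sum of x^2)/n - xbar^2."""
    outcomes = list(product(FACES, repeat=n))  # every ordered sample, equally likely
    total = Fraction(0)
    for sample in outcomes:
        mean = Fraction(sum(sample), n)
        total += Fraction(sum(x * x for x in sample), n) - mean ** 2
    return total / len(outcomes)


# Population mean and variance of the spinner (divisor 4: the whole population).
mu = Fraction(sum(FACES), len(FACES))
sigma2 = Fraction(sum(x * x for x in FACES), len(FACES)) - mu ** 2

for n in (1, 2, 3, 4):
    # Columns: n, E(V) by enumeration, (n - 1)/n * sigma^2
    print(n, expected_sample_variance(n), Fraction(n - 1, n) * sigma2)
```

For n = 2 and n = 3 the enumeration gives 3/4 and 1, matching the values found from the tables, and the two columns agree for every n tried.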

Using the results we met in Chapter 3, Key point 3.3, this means that $E\left(\frac{nV}{n-1}\right) = \sigma^2$ and

$$\frac{nV}{n-1} = \frac{n}{n-1}\left(\frac{\sum X^2}{n} - \bar{X}^2\right) = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right).$$

KEY POINT 6.2

For sample size $n$ taken from a population, an unbiased estimate of the population mean $\mu$ is the sample mean $\bar{x}$.

An unbiased estimate of the population variance $\sigma^2$ is:

$$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$$

TIP

Data may be raw data or summarised data. Use one of the equivalent formulae for variance to suit the information:

$$\frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2$$

To find an unbiased estimate of variance on your calculator, use $\sigma_{n-1}$ or $s_{n-1}$. These keys give the standard deviation, so square the result to obtain the variance.

WORKED EXAMPLE 6.1

A conservationist wishes to estimate the variance of the numbers of eggs laid by Melodious larks. The following data summarise her results for a sample of 30 Melodious larks’ nests, where $m$ is the number of eggs in a nest.

$\sum m^2 = 162$, $\sum m = 66$

Use the data to find an unbiased estimate for the variance of the number of eggs laid by Melodious larks.

Answer

The summarised data suggest using $\frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$.

$$\frac{1}{n-1}\left(\sum m^2 - \frac{\left(\sum m\right)^2}{n}\right) = \frac{1}{30-1}\left(162 - \frac{66^2}{30}\right) = 0.579$$

Note that if the question had stated: ‘The following data summarise the results of number of eggs laid in 30 nests of Melodious larks. $\sum m^2 = 162$, $\sum m = 66$. Find the variance.’, then the population would be just that group of nests and you would use the variance formula

$$\frac{1}{n}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{30}(162 - 145.2) = 0.56.$$

Note that $\frac{29}{30} \times 0.579\ldots = 0.56$.
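
The two formulae in Worked example 6.1 differ only in the divisor. As a minimal sketch (the function names `unbiased_variance` and `population_variance` are illustrative assumptions, not from the text), both can be computed from the summary values n, $\sum x$ and $\sum x^2$:

```python
def unbiased_variance(n, sum_x, sum_x2):
    """Unbiased estimate of the population variance (divisor n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def population_variance(n, sum_x, sum_x2):
    """Variance when the data are the whole population (divisor n)."""
    return (sum_x2 - sum_x ** 2 / n) / n


# Summary data for the 30 Melodious lark nests.
s2 = unbiased_variance(30, 66, 162)
var = population_variance(30, 66, 162)
print(round(s2, 3), round(var, 2))
```

Multiplying the unbiased estimate by 29/30 recovers the divisor-n variance, which is the relationship noted at the end of the worked example.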
WORKED EXAMPLE 6.2

A team of conservationists monitoring a tiger population record the number of tiger cubs in a sample of 24 litters. The table shows their findings.

| Number of cubs, $c$ | 1 | 2 | 3 | 4 | >4 |
|---|---|---|---|---|---|
| Frequency, $f$ | 2 | 7 | 12 | 3 | 0 |

Find unbiased estimates for the mean and variance of the number of tiger cubs in the litters.

Answer

Adapt the formulae for grouped frequency. The unbiased estimate for the mean is equal to the sample mean.

$$\bar{c} = \frac{\sum fc}{\sum f} = \frac{(1 \times 2) + (2 \times 7) + (3 \times 12) + (4 \times 3)}{24} = \frac{64}{24} = 2\frac{2}{3}$$

You can input data into the calculator and use $\sigma_{n-1}$ or $s_{n-1}$. Show key values in your working.

$$s^2 = \frac{1}{n-1}\left(\sum fc^2 - \frac{\left(\sum fc\right)^2}{n}\right) = \frac{1}{24-1}\left((2 \times 1^2) + (7 \times 2^2) + (12 \times 3^2) + (3 \times 4^2) - \frac{64^2}{24}\right) = \frac{1}{23}\left(186 - \frac{4096}{24}\right) = \frac{2}{3}$$
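
The same calculation can be written for any frequency table. This is a small sketch under the assumption that the data arrive as two parallel lists of values and frequencies; the helper name `estimates_from_frequencies` is hypothetical.

```python
def estimates_from_frequencies(values, freqs):
    """Return unbiased estimates (mean, variance) from a frequency table."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)  # divisor n - 1 for an unbiased estimate
    return mean, s2


# Tiger cub data: 1, 2, 3, 4 cubs with frequencies 2, 7, 12, 3.
cub_mean, cub_s2 = estimates_from_frequencies([1, 2, 3, 4], [2, 7, 12, 3])
print(cub_mean, cub_s2)
```

Both results should agree with the worked values $2\frac{2}{3}$ and $\frac{2}{3}$, up to floating-point rounding.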
EXERCISE 6A

In all of the following questions you are given some data and some descriptive statistics of the data. Your task is to find unbiased estimates of the population mean and variance in each question.

1 Data: the length, $x$ cm, of an electrical component.

$n = 32$, $\sum x = 70.4$ and $\sum x^2 = 175.56$

2 Data: the time taken, $t$ minutes, in a random sample of dental check-up appointments.

$n = 30$, $\sum t = 630$ and $\sum t^2 = 13\,770$

3 Data: the yield per plant, in kg rounded to the nearest 100 g, of a random sample of a variety of aubergine plants.

3.5  3.7  4.1  4.4  4.6  4.5  4.5  4.3  4.2

4 Data: the volumes, in ml, for a brand of ice cream in a 750 ml container.

748  751  748  751  745  756  753  760

5 Data: the total mass, $x$ grams, for a random sample of quail eggs.

$n = 16$, $\sum x = 128.4$ and $\sum x^2 = 1137.6$

6 Data: the total number of faults found in a random sample of 60 silk scarves.

| Number of faults per silk scarf | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Number of silk scarves | 26 | 16 | 7 | 8 | 3 |

7 Data: the time taken, in days, for a random sample of letters posted second class to be delivered.

| Number of days for letter to be delivered | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Number of letters | 24 | 32 | 29 | 9 | 6 |

6.2 Hypothesis testing of the population mean

Sample data are often collected to test a statistical hypothesis about a population. Such a sample, even if it is a random sample, may or may not be representative of the population. The central limit theorem studied in Chapter 5 shows that random sample estimates can be used to make statements about populations without having to assume that the populations have normal distributions. Estimates of the population mean and variance can be calculated from the sample, and these estimates can be used to see if the data support or reject the null hypothesis. For sample data, $\sqrt{\frac{\text{sample variance}}{n}}$; that is, $\frac{s}{\sqrt{n}}$, is referred to as the standard error.

We follow the same process as previously used when carrying out a hypothesis test of the population mean. Ideally, we set up the hypotheses first and then collect the sample of data.

REWIND

The steps to carry out a hypothesis test are explained in Chapter 1, but in summary they are:

• Decide whether the situation calls for a one-tailed or two-tailed test.
• State the null and alternative hypotheses.
• Decide on the significance level.
• Calculate the test statistic.
• Compare the test statistic with the critical value(s), or the calculated probability with the significance level.
• Interpret the result in terms of the original claim.

You may also need to consider Type I and Type II errors:

• A Type I error occurs when a true null hypothesis is rejected.
• A Type II error occurs when a false null hypothesis is accepted.

Hypothesis test of a population mean from a normal population with known variance

KEY POINT 6.3

If the population mean is unknown, but the population variance is known, sample data can be used to carry out a hypothesis test that the population mean has a particular value, as follows:

For a sample size $n$ drawn from a normal distribution with known variance, $\sigma^2$, and sample mean $\bar{x}$, the test statistic is:

$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

WORKED EXAMPLE 6.3

The masses of cucumbers grown at a smallholding are normally distributed with mean 310 g and standard deviation 22 g. Producers of a new plant food claim that its use increases the masses of cucumbers. To test this claim, some cucumber plants are grown using the new plant food and a random sample of 40 cucumbers from these plants is selected and weighed. The mean mass of these cucumbers is 316 g.

Assuming the standard deviation of the masses of the sample is the same as the standard deviation of the population, test the claim at a 5% level of significance.

Answer 1

First set up the test. Let $\bar{X}$ be the mean mass of a sample of 40 cucumbers. Then $\bar{X} \sim N\left(310, \frac{22^2}{40}\right)$.

Use a one-tailed test, as you are looking for an increase in mass.

$H_0$: $\mu = 310$
$H_1$: $\mu > 310$

One-tailed test at 5% level of significance.

Calculate the test statistic using $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ and compare the resulting probability with the 5% significance level.

$$P(\bar{X} > 316) = P\left(Z > \frac{316 - 310}{\frac{22}{\sqrt{40}}}\right) = 1 - \Phi(1.725) = 0.0423 \text{ or } 4.23\%$$

Comment on your result in the context of the question.

4.23% < 5%, so the sample mean is in the critical region. We reject $H_0$: there is some evidence to support the plant food producer’s claim.

Answer 2

Alternative way of writing the solution: compare the test statistic $z$ with the critical value from tables.

$H_0$: $\mu = 310$
$H_1$: $\mu > 310$

One-tailed test at 5% level of significance; the critical value is $z = \Phi^{-1}(0.95) = 1.645$.

Calculate the test statistic and compare it with the critical value.

$$z = \frac{316 - 310}{\frac{22}{\sqrt{40}}} = 1.725 \text{ and } 1.725 > 1.645$$

So reject $H_0$ and accept $H_1$.

Comment on your result in the context of the question.

There is some evidence to accept the plant food producer’s claim.
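
Both routes through Worked example 6.3 (comparing probabilities, or comparing z with the critical value) can be checked with Python's standard library. This is a sketch only: `z_statistic` is an illustrative helper name, and `statistics.NormalDist` stands in for the normal tables used in the text.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic for a mean when the population variance is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


standard_normal = NormalDist()              # N(0, 1), replaces the printed tables

z = z_statistic(316, 310, 22, 40)
p_value = 1 - standard_normal.cdf(z)        # upper tail, since H1 is mu > 310
critical = standard_normal.inv_cdf(0.95)    # one-tailed 5% critical value

print(round(z, 3), round(p_value, 4), round(critical, 3))
print("reject H0" if z > critical else "do not reject H0")
```

The two comparisons always lead to the same decision, because `p_value < 0.05` exactly when `z > critical`.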


WORKED EXAMPLE 6.4

The burn time, in minutes, for a certain brand of candle is modelled by a normal distribution with standard deviation 5.7. The manufacturer claims that the mean is 250 minutes. Lanfen randomly selects ten of these candles and finds that their burn times in minutes are as follows:

245 247 236 255 250 239 241 252 251 243

Stating any assumptions you make, investigate at the 5% level of significance whether the manufacturer’s claim is valid.

Answer

Assumptions (the conditions that allow you to use a sample to investigate the claim):
Random sample chosen.
Standard deviation of the sample the same as the population.

First find the mean of the sample.

$$\text{Mean} = \frac{1}{10}(245 + 247 + 236 + 255 + 250 + 239 + 241 + 252 + 251 + 243) = \frac{2459}{10} = 245.9$$

State the null and alternative hypotheses. This is a two-tailed test, as Lanfen is not investigating whether the claim is only too high or only too low.

$H_0$: $\mu = 250$
$H_1$: $\mu \neq 250$

Use tables to find the critical values. For a two-tailed test at 5% significance, $\Phi^{-1}(0.025) = -1.96$, so the critical values of $z$ are $\pm 1.96$.

Calculate the test statistic.

$$\bar{X} \sim N\left(250, \frac{5.7^2}{10}\right)$$

$$z = \frac{245.9 - 250}{\frac{5.7}{\sqrt{10}}} = -2.275$$

Compare the test statistic with the critical value.

$\Phi(-2.275) = 1 - 0.9886 = 0.0114 < 2.5\%$, or $-2.275 < -1.96$.

Comment on your result in the context of the question.

Reject $H_0$. There is sufficient evidence to doubt the manufacturer’s claim.
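
For a two-tailed test the rejection region is split between both tails. The sketch below reworks the candle data from raw values, again using `statistics.NormalDist` in place of tables; the variable names are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist, mean

burn_times = [245, 247, 236, 255, 250, 239, 241, 252, 251, 243]
mu0, sigma, alpha = 250, 5.7, 0.05          # claimed mean, known sd, significance level

x_bar = mean(burn_times)
z = (x_bar - mu0) / (sigma / sqrt(len(burn_times)))

critical = NormalDist().inv_cdf(1 - alpha / 2)   # 2.5% in each tail
p_value = 2 * NormalDist().cdf(-abs(z))          # probability in both tails combined

print(round(x_bar, 1), round(z, 3), round(critical, 2))
print("reject H0" if abs(z) > critical else "do not reject H0")
```

Comparing `abs(z)` with the positive critical value is equivalent to comparing the single-tail probability 0.0114 with 2.5%, as done in the worked example.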
Hypothesis test of population mean using a large sample

It is possible to carry out a hypothesis test of a population mean when the population variance is unknown. Provided the sample is large, we follow the same process as for a hypothesis test of a population mean from a normal population with known variance. For the variance, we use $s^2$, an unbiased estimate of the population variance, where

$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right).$$

KEY POINT 6.4

If the population mean and population variance are unknown, sample data can be used to conduct a hypothesis test that the population mean has a particular value, as follows:

For a large sample of size $n$, with sample mean $\bar{x}$, drawn from a population with unknown variance, the test statistic is:

$$z = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$$

where $s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$.

WORKED EXAMPLE 6.5

A researcher believes that students underestimate how long 1 minute is. To test his belief, 42 students are chosen at random. Each student, in turn, closes their eyes and estimates 1 minute. The results for their times, $x$ seconds, are summarised as follows:

$\sum x = 2471$ and $\sum x^2 = 146\,801$

Investigate at the 10% level of significance if there is any evidence to support the researcher’s claim. What advice would you give to the researcher based on your findings?

Answer

First, state the null and alternative hypotheses. Use 1 minute = 60 seconds.

$H_0$: $\mu = 60$
$H_1$: $\mu < 60$

Decide whether a one-tailed or two-tailed test is appropriate and find the critical value using tables.

One-tailed test at 10% level of significance; the critical value of $z$ is $-1.282$.

Find unbiased estimates for the mean and variance. Use $\frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)$ to find an unbiased estimate for the variance.

$$\bar{x} = \frac{\sum x}{n} = \frac{2471}{42} = 58.83$$

$$s^2 = \frac{1}{42-1}\left(146\,801 - \frac{2471^2}{42}\right) = 34.73$$

Next, calculate the test statistic.

$$z = \frac{58.83 - 60}{\sqrt{\frac{34.73}{42}}} = -1.287$$

This value uses the rounded mean 58.83; with unrounded values $z \approx -1.283$, which leads to the same conclusion.

Compare the test statistic with the critical value and comment in context of the question.

$-1.287 < -1.282$

Reject $H_0$ and accept $H_1$. There is evidence to support the researcher’s claim.

Advise the researcher of your findings, always in the context of the original problem.

The value of the test statistic and the critical value are very close; advise the researcher to do more tests.
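
When only summary totals are available, the large-sample test can be run directly from n, $\sum x$ and $\sum x^2$. The following is a hedged sketch with illustrative variable names. Because it keeps full precision, it gives z ≈ −1.283 rather than the −1.287 obtained above from the rounded mean 58.83; the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

# Summary data for the 42 students' estimates of one minute (in seconds).
n, sum_x, sum_x2 = 42, 2471, 146801
mu0 = 60                                     # H0: mean estimate is 60 seconds

x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)     # unbiased estimate of the variance
z = (x_bar - mu0) / sqrt(s2 / n)             # s / sqrt(n) is the standard error

critical = NormalDist().inv_cdf(0.10)        # lower-tail 10% critical value
print(round(x_bar, 2), round(s2, 2), round(z, 3), round(critical, 3))
print("reject H0" if z < critical else "do not reject H0")
```

The margin between `z` and `critical` is tiny, which supports the advice to collect more data before drawing a firm conclusion.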


EXERCISE 6B

PS 1 The manufacturer of a ‘fast-acting pain relief tablet’ claims that the time taken for its tablet to work follows a normal distribution with mean 18.4 minutes and variance 3.62 minutes². Tyler claims that the tablets do not work that quickly. To test the claim, a random sample of 40 people record the time taken for the tablet to work. The mean time for this sample is 19.7 minutes.

Assuming the sample and population variances are the same, carry out an appropriate hypothesis test at the 1% level of significance.

PS 2 IQ test scores are normally distributed and are designed to have a mean score of 100. Anna believes the mean is higher than 100. A random sample of 180 people’s IQ test scores, $x$, are summarised as follows.

$\sum x = 18\,432$ and $\sum x^2 = 1\,926\,709.4$

a Carry out an appropriate hypothesis test at the 2% level of significance.

b Anna then discovers that the IQ test is also designed to have a variance of $15^2$. A random sample of six people take the test and their IQ test scores are:

103 109 112 96 100 104

To test Anna’s belief, carry out an appropriate hypothesis test, using just the random sample of six people, at the 2% level of significance.

c Comment, with reasons, on the reliability of your answers to parts a and b.

PS 3 The mass of pesto dispensed by a machine to fill a jar is a normally distributed random variable with mean 380 g. The variance of the mass, in grams², of the pesto in the jars is 6.4. Each week a check is made to see that the mean mass dispensed by the machine has not significantly reduced. One particular week a sample of ten jars is checked. The mean mass of pesto in these jars is 378.7 g. Carry out an appropriate hypothesis test at the 5% level of significance, stating any assumptions you have made.

PS 4 The average mass of large eggs is 68 g. The variance of the masses, in grams², of large eggs is 1.72. A farm shop sells large eggs singly. A customer claims that the eggs are underweight. To test the claim, a random sample of large eggs is weighed. Their masses, in grams, are as follows:

68 65 59 72 65 60 71 73

Carry out an appropriate hypothesis test at the 1% level of significance, stating any assumption(s) you have made.

PS 5 A machine dispenses ice cream into a cone. The amount dispensed follows a normal distribution with mean 80 ml and the variance of the amount of ice cream dispensed, in ml², is 9. A consumer complains that the amount is too low. To check whether the machine is dispensing the correct amount, a sample of six cones is checked. The volumes in ml are as follows:

82 72 75 80 76 80

Carry out an appropriate hypothesis test at the 5% level of significance, stating any assumption(s) you have made.

PS 6 A shop sells 2 kg bags of potatoes. A quality control inspection checks the masses of 80 randomly chosen bags. Their masses, $x$, are summarised as follows:

$\sum x = 158.14$ and $\sum x^2 = 314.094$

Assuming the masses of the bags of potatoes are normally distributed, investigate at the 5% level of significance whether there is any evidence that the bags are underweight.

PS 7 A manufacturer claims its light bulbs last for an average of 2000 hours. A random sample of 42 light bulbs is tested. The lengths of time the light bulbs last, $t$ hours, are summarised as follows:

$\sum t = 83\,895$ and $\sum t^2 = 167\,589\,883.6$

Test the manufacturer’s claim at the 10% level of significance, stating any assumptions you have made.

6.3 Confidence intervals for population mean

When a hypothesis test reveals statistically significant results, the results are applicable to the sample. Often we use the results as if they apply to the population. However, we cannot be certain that the sample is actually representative of the population.

The hypothesis tests studied so far, in Chapter 1, Chapter 2 and earlier in this chapter, involve a single parameter, the population mean, from a sample of data. To allow for the issue that the sample may or may not be representative of the whole population, sample data can also be used to construct an interval that specifies the limits within which it is likely that the population mean will lie. This interval is a confidence interval (CI). A confidence interval for a parameter is constructed at a P% level of confidence such that if the same population is sampled many times and each time an interval estimate is found, the true population parameter will occur in P% of those intervals.
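
The repeated-sampling definition can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than part of the course method: it assumes a normal population with known standard deviation, uses the interval $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ that is derived later in this section, and the function name `coverage`, the seed and the number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, level, trials, seed=1):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + level / 2)      # e.g. about 1.96 for level = 0.95
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials


# Population values borrowed from the cucumber example: mean 310 g, sd 22 g, samples of 40.
observed = coverage(mu=310, sigma=22, n=40, level=0.95, trials=2000)
print(observed)
```

With many trials the printed fraction should settle close to 0.95: the true mean is fixed, and it is the interval that varies from sample to sample.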


It is possible to construct one-sided or two-sided confidence intervals. However, we will consider only symmetrical two-sided intervals.

A 95% confidence interval is the range of values in which we can be 95% confident that the true mean lies. If that interval is from $a$ to $b$, then $P(a < \text{true mean} < b) = 0.95$. The central 95% of the sample distribution is from the 2.5th to the 97.5th percentile.

For a normal distribution $N(\mu, \sigma^2)$, we find from normal tables that the central 95% lies between $-1.96$ and $+1.96$ standard deviations either side of the mean.

For a sample distribution, we use $N\left(\mu, \frac{\sigma^2}{n}\right)$.

Although the sample mean, $\bar{x}$, is an unbiased estimate of the true mean, it is only an estimate. It is not necessarily the true mean.

[Figure: normal curve for the distribution of the sample mean, with the central 95% shaded between $-1.96\frac{\sigma}{\sqrt{n}}$ and $+1.96\frac{\sigma}{\sqrt{n}}$ either side of the mean.]

If we work out sample means for a large number of samples, 95% of the time we would expect the sample mean, $\bar{x}$, to lie within the shaded area; that is:

$$\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}$$

which rearranges to give:

$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$

So to find a 95% confidence interval, use the sample values and work out the interval $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$. An alternative way to write the interval is $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.
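
The interval formula translates directly into a short function. This is a minimal sketch, assuming a known population standard deviation; `confidence_interval` is a hypothetical helper name, and the multiplier is taken from `statistics.NormalDist` rather than from tables. The usage lines apply it to the figures from Worked example 6.3 (sample mean 316, σ = 22, n = 40).

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, level=0.95):
    """Symmetric two-sided interval x_bar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # central `level` of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


low, high = confidence_interval(316, 22, 40)
print(round(low, 1), round(high, 1))
```

Passing a different `level`, such as 0.90 or 0.99, changes only the multiplier z, so higher confidence levels give wider intervals.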

KEY POINT 6.5

A 95% confidence interval means that 95% of possible sample means lie within the interval. It tells us the probability that the true mean lies within the interval is 0.95, and the probability that the true mean does not lie within the interval is 1 – 0.95 = 0.05.

Confidence intervals for a population mean from a normal population with known variance

Consider Worked example 6.3. The hypothesis test found that there was evidence to accept the producer’s claim that its plant food increases the mass of cucumbers.

The sample data are summarised by sample mean $\bar{x} = 316$ and standard error $\frac{22}{\sqrt{40}} = 3.48$.

A 95% confidence interval for these values is 316 ± 1.96 × 3.48; that is, (309, 323).

This means we can be 95% confident that the true mean lies in this range.

Before using the new plant food, the mean mass of cucumbers was 310 g. This mass just lies within the confidence interval (309, 323), at the lower end. We could conclude that it is possible that the plant food does not increase the mass of the cucumbers and the sample is not representative.

The percentage level chosen for the confidence interval does affect the size of the interval. For example, consider what happens with a 90% confidence interval.

A 90% confidence interval will give a narrower interval.

From normal tables, the central 90% lies within 1.645 standard deviations of the mean.

The 90% confidence interval is 316 ± 1.645 × 3.48; that is, (310, 322).

[Figure: normal curve centred on $\mu$, with the central region between $\mu - 1.645\frac{\sigma}{\sqrt{n}}$ and $\mu + 1.645\frac{\sigma}{\sqrt{n}}$ shaded.]

The lower bound 310 has been rounded from 310.3. Using a 90% confidence interval, the original population mean 310 lies just outside the interval and so you would accept the producer’s claim.

What about a 99% confidence interval? Using normal tables, the 99% confidence interval can be calculated as $\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$, giving 316 ± 2.576 × 3.48; that is, (307, 325).

Compare the confidence intervals, with all values given to the nearest integer: 90% CI = (310, 322), 95% CI = (309, 323) and 99% CI = (307, 325). We can see that the higher the percentage, the more confident we can be that the true mean lies within that interval.

However, the higher percentage gives a wider interval, and this means the information we have about the true mean is less precise; that is, there is a greater range of possible values for the true mean.

Sample size also affects the size of a confidence interval.

Consider a population with known standard deviation 15. A random sample with $n = 100$ and $\bar{x} = 20$ has standard error $\frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5$. A 95% confidence interval is then 20 ± 1.96 × 1.5 or (17.1, 22.9).

Let us increase $n$. If $n = 400$ with $\bar{x} = 20$, then standard error $\frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{400}} = 0.75$ and the 95% confidence interval is narrower, 20 ± 1.96 × 0.75 or (18.5, 21.5).
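
The two intervals above can be reproduced with a short Python sketch. The function name `mean_ci_known_sigma` is an illustrative choice, not notation from this book; it simply evaluates $\bar{x} \pm k\frac{\sigma}{\sqrt{n}}$.

```python
from math import sqrt


def mean_ci_known_sigma(x_bar, sigma, n, k=1.96):
    """Return (lower, upper) for x_bar ± k * sigma / sqrt(n).

    k defaults to 1.96, the value used for a 95% confidence interval.
    """
    margin = k * sigma / sqrt(n)  # k standard errors
    return x_bar - margin, x_bar + margin


# Same sample mean and population standard deviation, two sample sizes.
for n in (100, 400):
    lower, upper = mean_ci_known_sigma(20, 15, n)
    print(n, round(lower, 1), round(upper, 1))
```

Quadrupling the sample size halves the standard error, so the second interval is half as wide as the first.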


KEY POINT 6.6

A confidence interval for an unknown population parameter, such as the mean, at a P% confidence level, is an interval constructed so that there is a probability of P% that the interval includes the parameter.

To find the confidence interval for a population mean with known variance $\sigma^2$, calculate $\bar{x} \pm k\frac{\sigma}{\sqrt{n}}$, where $k$ is determined by the percentage level of the confidence interval.

% CI    90       95       98       99
k       1.645    1.960    2.326    2.576

The greater the percentage, the more confident we can be that the true parameter lies within the interval.

The greater the percentage, the wider the confidence interval and the less precise we can be about the value of the true parameter.

When choosing the sample size, $n$, as $n$ increases the standard error $\frac{\sigma}{\sqrt{n}}$ decreases and the resulting confidence interval becomes narrower.
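
The values of $k$ in the table come from normal tables. As a hedged cross-check, the sketch below recovers them from the inverse of the standard normal distribution function using Python's standard library; the helper name `critical_value` is an assumption made for this illustration.

```python
from statistics import NormalDist


def critical_value(confidence_percent):
    """Return k such that the central confidence_percent% of N(0, 1)
    lies between -k and +k."""
    tail = (1 - confidence_percent / 100) / 2  # area in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (90, 95, 98, 99):
    print(level, round(critical_value(level), 3))
```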


EXPLORE 6.1

As sample size increases, the value of $\frac{\sigma}{\sqrt{n}}$ decreases and so the width of a confidence interval decreases. Why do you think it is not usual practice to use very large samples? Hint: Find the proportional decrease in the width of the confidence interval for different values of $n$.

Discuss these two questions in relation to the scenarios that follow:

● How large a sample do you actually need?
● What confidence level are you prepared to accept?

Scenario 1: Health officials for a city with population around 40 000 are concerned with the increase in body mass index, BMI, in the population. Would your sample numbers change if the population was, say, 120 000? Explain why or why not.

Scenario 2: A health body wishes to investigate the effectiveness of a new drug treatment. Discuss the possible advantages and disadvantages in combining several different trials of a new drug treatment. (Note that combining the results of many scientific studies is called a meta-analysis.)

WORKED EXAMPLE 6.6

Excessive vegetation in pond water can cause the appearance of unwanted organisms. Over a long period of time it has been found that the number of unwanted organisms in 100 ml of pond water is approximately normally distributed with standard deviation 12. Adam takes six random 100 ml samples of water from his pond. The numbers of unwanted organisms in the samples are 56, 102, 48, 74, 88 and 67.

a Find a 95% confidence interval for the mean number of organisms in 100 ml of the pond water.

b If the mean number of unwanted organisms in 100 ml of pond water is above 80, vegetation should be removed. Use your results to decide whether Adam needs to remove vegetation from his pond. What advice would you give Adam?

Answer

a First, find the mean of the sample.

$$\bar{x} = \frac{1}{6}(56 + 102 + 48 + 74 + 88 + 67) = 72.5$$

Then find the standard error.

$$n = 6, \quad \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{6}} = 4.9$$

Use $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ for a 95% confidence interval.

CI = 72.5 ± (1.96 × 4.9) or (62.9, 82.1)

b Use your result to comment in context, to explain and justify the advice.

The upper value of the range of values likely to contain the mean is 82.1, so the threshold of 80 organisms lies within the 95% confidence interval.

Despite the sample mean 72.5 being less than 80, the true mean could be as high as 82.1. Advise Adam to remove some of the vegetation.

Advise Adam to take more samples of pond water.

WORKED EXAMPLE 6.7

The label on a certain packet of sweets states the contents are 100 g. It is known that the standard deviation is 5 g. The mechanism producing these packets of sweets is checked. From a random sample of ten packets, the mean is 103.8 g. Find a 99% confidence interval for the mean contents of the packets of sweets. Use your result to explain whether the mechanism needs adjustment.

Answer

Use $\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$.

$$\bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}} = 103.8 \pm 2.576 \times \frac{5}{\sqrt{10}} = 103.8 \pm 4.073$$

The CI is (99.7, 107.9).

Comment in context. It is important that consumers do not get less than advertised.

The confidence interval tells us that it is possible for the true mean to be below 100 g. The mechanism may need adjustment.

DID YOU KNOW?

Quality control of manufacturing processes is one application of sampling methods. Random samples of the output of a manufacturing process are statistically checked to ensure the product falls within specified limits and consumers of the product get what they pay for. With any product, there can be slight variations in some parameter, such as in the radius of a wheel bolt. Statistical calculations using the distribution of frequent samples, usually chosen automatically, will give information to suggest whether the manufacturing process is working correctly.

Confidence intervals for a population mean using a large sample

To find a confidence interval for a population mean, we rely upon knowing the standard deviation, $\sigma$, of the original population. However, since calculating standard deviation involves knowing the mean, it is more likely that the actual value of the standard deviation will be unknown. Instead, we can use the sample data to calculate an unbiased estimate of variance, $s^2$, and then use $s$ in place of $\sigma$ to find the confidence interval.

The procedure for finding a confidence interval using an unbiased estimate of standard deviation from a sample gives a reasonably accurate result provided the sample is sufficiently large. How large is sufficiently large? Look back at Explore 6.1. If you compare sample sizes 25 and 400, then since $\sqrt{25} = 5$ and $\sqrt{400} = 20$ you will find that increasing the sample size by 16 times (16 × 25 = 400) only reduces the margin of error to one-quarter of its previous value $\left(\frac{1}{\sqrt{25}} \times \frac{1}{4} = \frac{1}{\sqrt{400}}\right)$. ‘Large’ is not precisely defined; as a general rule, it can be taken to be a sample size of 30 or more.

KEY POINT 6.7

To find the confidence interval for a population mean using a large sample, calculate $\bar{x} \pm k\frac{s}{\sqrt{n}}$, where $s = \sqrt{\frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)}$ and $k$ is determined by the percentage level of the confidence interval.
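
As a minimal sketch of this key point, the Python below works from summary statistics $n$, $\sum x$ and $\sum x^2$. The function names and the summary values in the usage lines ($n = 50$, $\sum x = 1000$, $\sum x^2 = 20\,450$) are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
from math import sqrt


def unbiased_sd(n, sum_x, sum_x2):
    """Square root of the unbiased variance estimate
    s^2 = (sum_x2 - n * x_bar^2) / (n - 1)."""
    x_bar = sum_x / n
    return sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))


def mean_ci_large_sample(n, sum_x, sum_x2, k):
    """Approximate CI x_bar ± k * s / sqrt(n) for a large sample."""
    x_bar = sum_x / n
    margin = k * unbiased_sd(n, sum_x, sum_x2) / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary data: x_bar = 20, s^2 = 450/49, s/sqrt(n) = 3/7.
lower, upper = mean_ci_large_sample(50, 1000, 20450, k=1.96)
print(round(lower, 2), round(upper, 2))
```

With these values the margin is 1.96 × 3/7 = 0.84, so the interval is 20 ± 0.84.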
WORKED EXAMPLE 6.8

A sample of 60 strawberries is weighed, in grams. The results are summarised as follows:

$\sum x = 972$ and $\sum x^2 = 17\,304.78$

a Find a 90% confidence interval for the mean mass of the strawberries.

b An $\alpha$% confidence interval for the population mean, based on this sample, is found to have width of 3.65 grams. Find $\alpha$.

Answer

a Find unbiased estimates for the mean and variance.

$$\bar{x} = \frac{\sum x}{n} = \frac{972}{60} = 16.2$$

Use $\frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$ to find an unbiased estimate for the variance.

$$s^2 = \frac{1}{60-1}\left(17\,304.78 - \frac{972^2}{60}\right) = 26.413, \text{ so } s = \sqrt{26.413} = 5.14$$

The sample is sufficiently large to use $\bar{x} \pm 1.645\frac{s}{\sqrt{n}}$.

$$\bar{x} \pm 1.645\frac{s}{\sqrt{n}} = 16.2 \pm 1.645 \times \frac{5.14}{\sqrt{60}} = 16.2 \pm 1.09$$

The CI is (15.1, 17.3). The confidence interval will be approximate, as the population standard deviation is unknown.

b The CI is $\bar{x} \pm k\frac{s}{\sqrt{n}}$, so its width is $2k\frac{s}{\sqrt{n}}$. Here you can use the value of $s$ from part a.

$$2k \times \frac{5.14}{\sqrt{60}} = 3.65$$

$$k = \frac{3.65 \times \sqrt{60}}{2 \times 5.14} = 2.75$$

For $z = 2.75$, from tables $p = 0.997$. $1 - p$ gives the proportion in one tail; $2(1 - p)$ is the proportion in both tails.

$\alpha = 1 - 2(1 - p)$
so $\alpha = 99.4\%$.
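
Part b can be checked numerically. The sketch below is one possible way to do this, with the hypothetical helper `confidence_level_from_width` standing in for the table look-up: it solves the width equation for $k$ and then evaluates the standard normal distribution function in place of normal tables.

```python
from math import sqrt
from statistics import NormalDist


def confidence_level_from_width(width, s, n):
    """Return the confidence level (as a percentage) of an interval
    x_bar ± k * s / sqrt(n) whose total width is `width`."""
    k = width * sqrt(n) / (2 * s)  # from width = 2k * s / sqrt(n)
    p = NormalDist().cdf(k)        # P(Z < k), as read from normal tables
    return 100 * (1 - 2 * (1 - p))  # remove both tails


alpha = confidence_level_from_width(3.65, 5.14, 60)
print(round(alpha, 1))
```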
EXERCISE 6C

For questions 1 and 2 you may refer to the answers to Exercise 6A for unbiased estimates of population mean and variance. Give all confidence limits correct to 3 significant figures.

1 The following data summarise the length, $x$ cm, of an electrical component:

$n = 32$, $\sum x = 70.4$ and $\sum x^2 = 175.56$

Calculate:

a a 98% confidence interval for the population mean

b a 90% confidence interval for the population mean.

2 The following data summarise the time taken, $t$ minutes, in a random sample of dental check-up appointments:

$n = 30$, $\sum t = 630$ and $\sum t^2 = 13\,770$

Calculate:

a a 95% confidence interval for the population mean

b a 98% confidence interval for the population mean.

3 The following data summarise the total mass, $x$ grams, of the yield for a random sample of 44 chilli plants.

$\sum x = 842$ and $\sum x^2 = 16\,364$

Calculate:

a a 99% confidence interval for the population mean

b a 95% confidence interval for the population mean.

4 The following data summarise the volume, $x$ litres, for a random sample of bottles of juice.

$n = 68$, $\sum x = 134.14$ and $\sum x^2 = 266.094$

Calculate:

a a 90% confidence interval for the population mean

b a 99% confidence interval for the population mean.

5 The following data summarise the total mass, $x$ grams, for a random sample of quail eggs:

$n = 30$, $\sum x = 254.4$ and $\sum x^2 = 2271.6$

a Calculate a 99% confidence interval for the population mean.

b An $\alpha$% confidence interval for the population mean, based on this sample, is found to have width of 1.3 grams. Find $\alpha$.

6 The following data summarise the masses, $x$ kg, of 60 bags of dry pet food.

$\sum x = 117$ and $\sum x^2 = 232.72$

a Calculate unbiased estimates for the population mean and variance.

b Calculate a 98% confidence interval for the population mean.

c An $\alpha$% confidence interval for the population mean, based on this sample, is found to have width of 0.118 kg. Find $\alpha$.

M 7 a Explain why the width of a 98% confidence interval for the mean of a standard normal distribution is 4.652.

b The result, $X$, of testing the breaking strain of a brand of fishing line is a normally distributed random variable with mean $\mu$ and variance 2.25. The testers wish to have a 98% confidence interval for $\mu$ with a total width less than 1. Find the least number of tests needed.

6.4 Confidence intervals for population proportion

Not every statistical investigation concerns means of samples. Consider, for example, opinion polls. Many organisations carry out opinion polls to gauge voter intentions. Different polls for the same election do not always agree, even when the people chosen are representative samples of the population. When the data have been collected and presented, it is possible to calculate probabilities. We need to question, and statistically calculate, the reliability of their results.

There are a number of situations in which people are required to choose between two options, such as the UK Brexit vote where the options were to either remain in or leave the European Union. The following example models an opinion poll for such a situation.

Sofia and Diego are the only two candidates in an election; there is no third option and everyone has to vote. Let a vote for Sofia be called a success. In an opinion poll of $n$ people, where $r$ people say they will vote for Sofia, the proportion of successes is $\hat{p} = \frac{r}{n}$.

The binomial distribution is a suitable model for this situation since there are only two outcomes, there is a fixed number of people in the poll and each person independently chooses who to vote for.

Let the random variable, $X$, be the number of people who vote for Sofia. Then $X \sim B(n, p)$, $E(X) = np$ and $\text{Var}(X) = np(1 - p)$.

Let $\hat{P}$ be the random variable ‘the proportion of the sample voting for Sofia’. Then $\hat{P} = \frac{X}{n}$.

Expected value, $E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n} \times np = p$, and so $\hat{p}$ is an unbiased estimate for $p$.

Variance, $\text{Var}(\hat{P}) = \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\text{Var}(X) = \frac{1}{n^2} \times np(1 - p) = \frac{p(1 - p)}{n}$

Before the election, an opinion poll of a random sample of 200 people is conducted. In this opinion poll 108 people say they will vote for Sofia and 92 say they will vote for Diego. With more than half of the people in the sample voting for Sofia, you may conclude that Sofia will win the election. To investigate how reliable this conclusion is we would have to find a confidence interval for the population proportion.

For sufficiently large values of $n$, such that $np > 5$ and $n(1 - p) > 5$, a binomial distribution can be approximated by a normal distribution. So an approximate distribution of the sample proportion is $N\left(p, \frac{p(1 - p)}{n}\right)$.

Confidence intervals for a population proportion are worked out in a similar way to those for the population mean. We calculate $\hat{p} \pm k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, where $k$ is determined by the percentage level of the confidence interval.

Note that this is an approximate confidence interval since the sample proportion is based on a binomial distribution, which is discrete, whereas the normal distribution is continuous. However, it is not necessary to apply continuity corrections when finding these confidence intervals.

Returning to the opinion poll for Sofia and Diego, for the random sample of 200 people:

Sample proportion, $\hat{p} = \frac{108}{200} = 0.54$

Estimated variance of the sample proportion $= \frac{\hat{p}(1 - \hat{p})}{n} = \frac{0.54 \times (1 - 0.54)}{200} = 0.001242$

For a 95% confidence interval:

$$\hat{p} \pm k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.54 \pm 1.96 \times \sqrt{0.001242} = 0.54 \pm 0.069$$

So the confidence interval is (0.471, 0.609).
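
The calculation for the opinion poll can be sketched in Python as follows. The function name `proportion_ci` is an illustrative assumption; it evaluates $\hat{p} \pm k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ with no continuity correction.

```python
from math import sqrt


def proportion_ci(successes, n, k=1.96):
    """Approximate CI for a population proportion:
    p_hat ± k * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = k * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Opinion poll: 108 of 200 people say they will vote for Sofia.
lower, upper = proportion_ci(108, 200)
print(round(lower, 3), round(upper, 3))
```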
With only two candidates, the winner needs more than 50% of the votes. The question to be resolved is where the confidence interval lies with respect to the 50% value.

The following diagram shows the range of the confidence interval crossing the 50%, or 0.5, mark.

[Diagram: number line showing the confidence interval from 0.471 to 0.609, with 0.5 marked inside the interval.]

So for this sample, even though more than half said they would vote for Sofia, the confidence interval suggests that the proportion of votes for Sofia could be less than half.

Suppose instead you want to know how many people to poll (i.e. to select for the sample) to find a confidence interval of a given width. You could ask, ‘What sample size is needed for an approximate 95% confidence interval for this proportion to have a width of 0.03?’.

To find a confidence interval, we calculate $\hat{p} \pm k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, so the width of the confidence interval is given by $2k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. The question requires the same proportion as the sample, so $\hat{p} = 0.54$.

For a 95% confidence interval we use $k = 1.96$.

$$2 \times 1.96 \times \sqrt{\frac{0.54 \times (1 - 0.54)}{n}} = 0.03$$

$n = 4240$, to 3 significant figures.
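
Rearranging the width equation gives $n = \hat{p}(1 - \hat{p})\left(\frac{2k}{\text{width}}\right)^2$. The sketch below evaluates this directly; the function name `sample_size_for_width` is hypothetical, and the function returns the unrounded value so that the rounding decision is left to the reader.

```python
def sample_size_for_width(p_hat, width, k):
    """Solve 2k * sqrt(p_hat * (1 - p_hat) / n) = width for n.

    Returns the exact (unrounded) value of n.
    """
    return p_hat * (1 - p_hat) * (2 * k / width) ** 2


n = sample_size_for_width(p_hat=0.54, width=0.03, k=1.96)
print(round(n, 1))  # compare with 4240 to 3 significant figures
```

If the requirement is stated as a margin of error rather than a total width, pass twice the margin as `width`.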


DID YOU KNOW?

A claim by general election opinion polls is that they have a 3% margin of error. In practice, many such polls will have a sample size of 1000.

George Gallup showed the importance of opinion polls when he successfully predicted that Franklin Roosevelt would win the 1936 US presidential election. George Gallup continued to work in the field of public opinion. The work he began studying social, moral and religious opinions continues to this day in over 160 countries.

Today, opinion polls provide research and advice to many large organisations, such as manufacturers of new products, and political parties.

WEB LINK

You can find out more about the guidance of a large UK corporation on conducting and reporting of opinion polls on the BBC website.

EXPLORE 6.2

Find media reports on the results of an opinion poll. Does the report comment on how many people or voters were included in the poll? Does the report comment on the sampling method employed? Use the information given in the report to discuss the reliability of the results in the opinion poll.

KEY POINT 6.8

For a large random sample, size $n$, an approximate confidence interval for the population proportion, $p$, is:

$$\left(\hat{p} - k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$

where $k$ is determined by the percentage level of the confidence interval.

WORKED EXAMPLE 6.9

A Sudoku puzzle is classified as ‘easy’ if more than 70% of the people attempting to solve it do so within 10 minutes, and ‘hard’ if less than 20% of people take less than 10 minutes to complete it. Otherwise it is classified as ‘average’. Of 120 people given a Sudoku puzzle, 87 completed it within 10 minutes.

a Find an approximate 99% confidence interval for the proportion of people completing the puzzle within 10 minutes. Comment on how the Sudoku puzzle should be classified.

b 200 random samples of 120 people were taken and a 99% confidence interval for the proportion was found from each sample. How many of these 200 confidence intervals would be expected to include the true proportion?

TIP

Samples have a margin of error. When you find a confidence interval, you need to consider where it lies in relation to the boundary, or boundaries, used as guides for action.

Answer

a First, find the sample proportion.

$$\hat{p} = \frac{87}{120} = 0.725$$

Use 2.58 (2.576 to 3 significant figures) for a 99% confidence interval, and calculate $\hat{p} \pm k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.

$$0.725 \pm 2.58 \times \sqrt{\frac{0.725 \times (1 - 0.725)}{120}} = 0.725 \pm 0.105$$

CI = (0.62, 0.83)

Look where the confidence interval lies with respect to the boundaries of interest.

[Diagram: number line with the boundaries 0.2 and 0.7 marked and the confidence interval (0.62, 0.83) crossing 0.7.]

The CI crosses the 70% boundary, so although in the sample more than 70% of people completed it in less than 10 minutes, the interval suggests there will be some samples where less than 70% complete it in this time. It could be classified as easy or average.

b 200 × 0.99 = 198
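
The idea behind part b, that about 99% of intervals built in this way contain the true proportion, can be explored by simulation. The sketch below is only an illustration under stated assumptions: the true proportion is taken to be 0.7 (in practice it is unknown), the function name `coverage` is hypothetical, and because the interval is approximate the simulated rate will be close to, but not exactly, 0.99.

```python
import random
from math import sqrt


def coverage(true_p, n, k, trials, seed=0):
    """Fraction of simulated samples whose approximate CI for the
    proportion contains true_p (an assumed value for illustration)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Simulate n independent people, each a success with prob. true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = k * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


rate = coverage(true_p=0.7, n=120, k=2.576, trials=2000)
print(rate)
```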


WORKED EXAMPLE 6.10

Apprentices work four days a week and spend one day a week at college. It is proposed that the college day is changed from a Monday to a Friday. The college will consider changing the day if 80% of apprentices are in favour of the change. In a sample of apprentices, how many should be asked to be 90% certain of gaining 80% support that is not more than 5% wrong?

Answer

You want the proportion in favour to be 80% or 0.8, so $\hat{p} = 0.8$.

The margin of error is 5% = 0.05. Use 1.645 for a 90% confidence interval and calculate $k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.05$.

$$1.645 \times \sqrt{\frac{0.8 \times (1 - 0.8)}{n}} = 0.05$$

$$n = \left(\frac{1.645}{0.05}\right)^2 \times 0.8 \times (1 - 0.8) \approx 173$$

EXERCISE 6D

1 A quality control check of a random sample of 120 pairs of jeans produced at a factory finds that 24 pairs are sub-standard. Calculate the following confidence intervals for the proportion of jeans produced that are sub-standard:

a a 90% confidence interval

b a 98% confidence interval.

2 At a university, a random sample of 250 students is asked if they use a certain social media app. Of the students in the sample, 92 use this social media app. Calculate a 95% confidence interval for the proportion of students at the university who use this social media app.

3 A four-sided spinner has sides coloured red, yellow, green and blue. The probability that the spinner lands on yellow is $p$. In an experiment, the spinner lands on yellow 18 times out of 80 spins. Find an approximate 99% confidence interval for the value of $p$.

PS 4 A biased coin flipped 500 times results in tails 272 times.

a Find a 90% confidence interval for the probability of obtaining a tail.

b This experiment is carried out ten times. How many of the confidence intervals would be expected to contain the population proportion of obtaining a tail?

M 5 The proportion of European men who are red-green colour-blind is 8%. How large a sample would need to be selected to be 95% certain that it contains at least this proportion of red-green colour-blind men?

M PS 6 A random sample of 200 bees from a colony is tested to find out how many are infected with Varroa mites. Forty bees are found to be infected.

a Calculate a 99% confidence interval for the proportion of the colony infected with Varroa mites.

b The colony of bees will collapse and will not survive if 35% or more are infected with Varroa mites. Show why it is possible, at the 99% confidence level, that the colony of bees might collapse.

Checklist of learning and understanding

● If $U$ is some statistic derived from a random sample taken from a population, then $U$ is an unbiased estimate for $\Phi$ if $E(U) = \Phi$.

● For sample size $n$ taken from a population, an unbiased estimate of:

  ● the population mean $\mu$ is the sample mean $\bar{x}$

  ● the population variance $\sigma^2$ is:

$$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2$$

● To test a hypothesis about a population mean, $\mu$, using the sample mean, $\bar{x}$, for a sample size $n$ drawn from a normal distribution with known variance, $\sigma^2$, calculate the test statistic

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}.$$

● The test statistic, $z$, can be used to test a hypothesis about a population mean drawn from any population, provided the sample size is large.

● Where the population variance is unknown, use the unbiased estimate of variance $s^2$.

● A confidence interval for an unknown population parameter, such as the mean, is an interval constructed so that it has a given probability that it includes the parameter.

● Confidence interval for:

  ● population mean with known variance, $\sigma^2$, is $\left(\bar{x} - k\frac{\sigma}{\sqrt{n}},\ \bar{x} + k\frac{\sigma}{\sqrt{n}}\right)$

  ● population mean using a large sample is $\left(\bar{x} - k\frac{s}{\sqrt{n}},\ \bar{x} + k\frac{s}{\sqrt{n}}\right)$, where $s = \sqrt{\frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)}$

  ● population proportion, $p$, is $\left(\hat{p} - k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + k\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$,

  where $k$ is determined by the percentage level of the confidence interval.

END-OF-CHAPTER REVIEW EXERCISE 6

PS 1 The worldwide proportion of left-handed people is 10%.

a Find a 95% confidence interval for the proportion of left-handed people in a random sample of 200 people from town A. [3]

b In town B, there is a greater proportion of left-handed people than there is in town A. From a random sample of 100 people in town B, an $\alpha$% confidence interval for the proportion, $p$, of left-handed people is calculated to be (0.113, 0.207).

i Show that the proportion of left-handed people in the sample from town B is 16%. [2]

ii Calculate the value of $\alpha$. [3]

PS 2 The label on a jar of jam carries the words ‘minimum contents 272 g’.

a Explain why, in practice, the average contents need to be greater than 272 g. [1]

b The mass of jam dispensed by a machine used to fill the jars is a normally distributed random variable with mean 276 g. The variance of the mass of the jam, in grams², dispensed by the machine is 1.82. Each week there is a check to see if the mean mass dispensed by the machine is 276 g. One particular week a sample of eight jars is checked. The mean mass of jam in these jars is 277.7 g. Carry out an appropriate hypothesis test at the 5% level of significance, stating any assumptions you have made. [5]

M 3 An employer who is being sued for the wrongful dismissal of an employee is advised that any award paid out will be based on national average earnings for employees of a similar age. A random sample of 120 people is found to have a mean income of $21 000 with standard error $710.

a Find a 95% confidence interval for the award. [3]

b The employer wants to know the upper limit of the award that is very unlikely to be exceeded. The employer defines ‘unlikely’ as a probability of 0.001.

i Explain why the required size of the confidence interval is 99.8%. [1]

ii Work out the unlikely upper limit of the award, giving your answer to the nearest dollar. [3]

M 4 The volume, $v$ ml, of liquid dispensed by a vending machine for a random sample of 60 hot drinks is summarised as follows:

$\sum v = 17\,280$ and $\sum v^2 = 5\,015\,000$

a Find unbiased estimates of the population mean and variance. [2]

b Work out a 90% confidence interval for the population mean. [3]

M PS 5 The manufacturer of a certain smartphone advertises that the average charging time for the battery is 80 minutes with standard deviation 2.6 minutes. Owners of these smartphones suggest that the time is longer. A random sample of the phones were charged from 0 to 100% and their times, in minutes, are as follows.

88  85  82  77  86  75  80  79

a Investigate at the 5% level of significance whether or not the manufacturer’s claim is justified, stating any assumption(s) you have made. [6]

b The given length of time a charged smartphone battery will last is normally distributed with mean 24 hours. The variance of the length of time, in hours squared, for which the battery lasts is 1. Sami tests a random sample of five batteries; the sample mean time is 23.2 hours. Investigate at the 5% level of significance whether the time batteries last is less than the time given, stating any assumption(s) you make. [5]

c In a single sample, determine how long the battery could last for, if a Type I error occurs. [1]

PS 6 The manufacturer of a tablet computer claims that the mean battery life is 11 hours. A consumer organisation wished to test whether the mean is actually greater than 11 hours. They invited a random sample of members to report the battery life of their tablets. They then calculated the sample mean. Unfortunately a fire destroyed the records of this test except for the following partial document.

Test of the mean batter
the tablet

Sample size, n
Sample mean (hours)                            11.8
Is the result significant at the 5% level?     Yes
Is the result significant at the 2.5% level?   No

Given that the population of battery lives is normally distributed with standard deviation 1.6 hours, find the set of possible values of the sample size, n. [5]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q4 November 2016


PS 7 Parcels arriving at a certain office have weights W kg, where the random variable W has mean µ and standard deviation 0.2. The value of µ used to be 2.60, but there is a suspicion that this may no longer be true. In order to test at the 5% significance level whether the value of µ has increased, a random sample of 75 parcels is chosen. You may assume that the standard deviation of W is unchanged.

i The mean weight of the 75 parcels is found to be 2.64 kg. Carry out the test. [4]

ii Later another test of the same hypotheses at the 5% significance level, with another random sample of 75 parcels, is carried out. Given that the value of µ is now 2.68, calculate the probability of a Type II error. [5]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q6 November 2015
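
The parcel question combines an upper-tail z-test with a Type II error calculation. The following is a minimal numerical sketch of that reasoning, assuming the sample mean can be treated as normally distributed (n = 75 is large enough for the Central Limit Theorem). The function names `one_tail_z_test` and `type_ii_probability` are illustrative only and do not come from the textbook. Because the critical value is computed exactly rather than read from tables, the last decimal place may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def one_tail_z_test(sample_mean, mu0, sigma, n, alpha):
    """Upper-tail z-test of H0: mu = mu0 against H1: mu > mu0, sigma known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    z_critical = NormalDist().inv_cdf(1 - alpha)
    return z, z_critical, z > z_critical


def type_ii_probability(mu0, mu_true, sigma, n, alpha):
    """Probability of not rejecting H0 in the same test when the mean is mu_true."""
    standard_error = sigma / sqrt(n)
    # Smallest sample mean that leads to rejection of H0.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # H0 is (wrongly) kept when the sample mean falls below that boundary.
    return NormalDist(mu_true, standard_error).cdf(critical_mean)


# Values taken from the parcel question: mu0 = 2.60, sigma = 0.2, n = 75.
z, z_critical, reject = one_tail_z_test(2.64, 2.60, 0.2, 75, 0.05)
beta = type_ii_probability(2.60, 2.68, 0.2, 75, 0.05)
print(f"z = {z:.3f}, critical value = {z_critical:.3f}, reject H0: {reject}")
print(f"P(Type II error) = {beta:.4f}")
```

The same two functions can be reused for the other one-tail tests in this exercise by changing the arguments; a lower-tail test would need the inequality and the tail reversed.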


PS 8 Last year Samir found that the time for his journey to work had mean 45.7 minutes and standard deviation 3.2 minutes. Samir wishes to test whether his average journey time has increased this year. He notes the times, in minutes, for a random sample of 8 journeys this year with the following results.

46.2 41.7 49.2 47.1 47.2 48.4 53.7 45.5

It may be assumed that the population of this year’s journey times is normally distributed with standard deviation 3.2 minutes.

i State, with a reason, whether Samir should use a one-tail or a two-tail test. [2]

ii Show that there is no evidence at the 5% significance level that Samir’s mean journey time has increased. [5]

iii State, with a reason, which one of the errors, Type I or Type II, might have been made in carrying out the test in part ii. [2]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q6 June 2012


M 9 The management of a factory thinks that the mean time required to complete a particular task is 22 minutes. The times, in minutes, taken by employees to complete this task have a normal distribution with mean µ and standard deviation 3.5. An employee claims that 22 minutes is not long enough for the task. In order to investigate this claim, the times for a random sample of 12 employees are used to test the null hypothesis µ = 22 against the alternative hypothesis µ > 22 at the 5% significance level.

i Show that the null hypothesis is rejected in favour of the alternative hypothesis if x̄ > 23.7 (correct to 3 significant figures), where x̄ is the sample mean. [3]

ii Find the probability of a Type II error given that the actual mean time is 25.8 minutes. [4]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q5 November 2011


10 A doctor wishes to investigate the mean fat content in low-fat burgers. He takes a random sample of 15 burgers and sends them to a laboratory where the mass, in grams, of fat in each burger is determined. The results are as follows.

9 7 8 9 6 11 7 9 8 9 8 10 7 9 9

Assume that the mass, in grams, of fat in low-fat burgers is normally distributed with mean µ and that the population standard deviation is 1.3.

i Calculate a 99% confidence interval for µ. [4]

ii Explain whether it was necessary to use the Central Limit Theorem in the calculation in part i. [2]

iii The manufacturer claims that the mean mass of fat in burgers of this type is 8 g. Use your answer to part i to comment on this claim. [2]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q4 June 2011


M 11 The masses of sweets produced by a machine are normally distributed with mean µ grams and standard deviation 1.0 grams. A random sample of 65 sweets produced by the machine has a mean mass of 29.6 grams.

i Find a 99% confidence interval for µ. [3]

The manufacturer claims that the machine produces sweets with a mean mass of 30 grams.

ii Use the confidence interval found in part i to draw a conclusion about this claim. [2]

iii Another random sample of 65 sweets produced by the machine is taken. This sample gives a 99% confidence interval that leads to a different conclusion from that found in part ii. Assuming that the value of µ has not changed, explain how this can be possible. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q3 November 2010
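
A confidence interval for a mean with known standard deviation has the form x̄ ± zσ/√n. The sketch below applies that formula to the figures in the sweets question. The helper name `mean_confidence_interval` is hypothetical, and the z-value is computed exactly rather than taken from tables, so the final decimal place may differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, level):
    """Symmetric confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. level 0.99 -> upper 0.5% point
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Figures from the sweets question: mean 29.6 g, sigma 1.0 g, n = 65.
low, high = mean_confidence_interval(29.6, 1.0, 65, 0.99)
claim_inside = low <= 30 <= high
print(f"99% CI: ({low:.2f}, {high:.2f}); contains 30: {claim_inside}")
```

Whether the claimed value lies inside the interval is the basis for the comment required in part ii.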


12 A random sample of n people were questioned about their internet use. 87 of them had a high-speed internet connection. A confidence interval for the population proportion having a high-speed internet connection is 0.1129 < p < 0.1771.

i Write down the mid-point of this confidence interval and hence find the value of n. [3]

ii This interval is an α% confidence interval. Find α. [4]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q2 June 2010


PS 13 The masses of packets of cornflakes are normally distributed with standard deviation 11 g. A random sample of 20 packets was weighed and found to have a mean mass of 746 g.

i Test at the 4% significance level whether there is enough evidence to conclude that the population mean mass is less than 750 g. [4]

ii Given that the population mean mass actually is 750 g, find the smallest possible sample size, n, for which it is at least 97% certain that the mean mass of the sample exceeds 745 g. [4]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q5 November 2009

CROSS-TOPIC REVIEW EXERCISE 2

M 1 The time to failure, in years, for two types of kettle can be modelled by the continuous random variables X and Y which, respectively, have probability density functions as follows:

$$f(x) = \begin{cases} \dfrac{x}{8} & 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases} \qquad f(y) = \begin{cases} \dfrac{9}{4y^3} & 1 \le y \le 3 \\ 0 & \text{otherwise} \end{cases}$$

Show that the probability of failure by time t is the same for both X and Y if t satisfies the equation t⁴ − 18t² + 18 = 0 and verify that this time is just over 1 year. [7]
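
The claim that the common failure time is "just over 1 year" can be checked numerically. The sketch below assumes the two densities are f(x) = x/8 on 0 ≤ x ≤ 4 and f(y) = 9/(4y³) on 1 ≤ y ≤ 3, integrates them by hand into the cumulative functions `cdf_x` and `cdf_y` (names chosen here for illustration), and solves the quartic as a quadratic in u = t².

```python
from math import sqrt


def cdf_x(t):
    """P(X <= t) for f(x) = x/8 on 0 <= x <= 4."""
    return t ** 2 / 16


def cdf_y(t):
    """P(Y <= t) for f(y) = 9/(4 y^3) on 1 <= y <= 3."""
    return 9 / 8 * (1 - 1 / t ** 2)


# t^4 - 18 t^2 + 18 = 0 is a quadratic in u = t^2 with roots u = 9 +/- sqrt(63).
# The larger root gives t > 4, outside both supports, so only the smaller one is used.
u_small = 9 - sqrt(63)
t_root = sqrt(u_small)

print(f"t = {t_root:.4f}")
print(f"P(X <= t) = {cdf_x(t_root):.5f}, P(Y <= t) = {cdf_y(t_root):.5f}")
```

Both cumulative probabilities agree at this value of t, which is a useful cross-check on the algebra in the written solution.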

2 A continuous random variable X has probability density function given by:

$$f(x) = \begin{cases} 0.25 & 4 \le x \le 8 \\ 0 & \text{otherwise} \end{cases}$$

a Sketch the graph of y = f(x). [2]

b State the mean and use integration to find the variance. [3]

c The mean of a random sample of 40 observations of X is denoted by X̄. State the approximate distribution of X̄, giving its parameters. [3]

d Find the value of a, where P(X̄ < a) = 0.9. [3]

M 3 Chakib cycles to college. He models his journey time, T minutes, by the following probability density function:

$$f(t) = \begin{cases} \dfrac{1}{100}(25 - t) & 10 \le t \le 20 \\ 0 & \text{otherwise} \end{cases}$$

a Work out the mean and variance. [7]

Chakib finds that a random sample of 20 of his journey times has mean 12.4 minutes.

b Write down the approximate distribution of the sample mean for a sample of size 20. [3]

c Show that Chakib’s model is not suitable. [4]


PS 4 i Give a reason for using a sample rather than the whole population in carrying out a statistical investigation. [1]

ii Tennis balls of a certain brand are known to have a mean height of bounce of 64.7 cm, when dropped from a height of 100 cm. A change is made in the manufacturing process and it is required to test whether this change has affected the mean height of bounce. 100 new tennis balls are tested and it is found that their mean height of bounce when dropped from a height of 100 cm is 65.7 cm and the unbiased estimate of the population variance is 15 cm².

a Calculate a 95% confidence interval for the population mean. [3]

b Use your answer to part ii a to explain what conclusion can be drawn about whether the change has affected the mean height of bounce. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q3 June 2016


5 The diameter, in cm, of pistons made in a certain factory is denoted by X, where X is normally distributed with mean µ and variance σ². The diameters of a random sample of 100 pistons were measured, with the following results:

n = 100, ∑x = 208.7, ∑x² = 435.57

i Calculate unbiased estimates of µ and σ². [3]

The pistons are designed to fit into cylinders. The internal diameter, in cm, of the cylinders is denoted by Y, where Y has an independent normal distribution with mean 2.12 and variance 0.000144. A piston will not fit into a cylinder if Y − X < 0.01.

ii Using your answers to part i, find the probability that a randomly chosen piston will not fit into a randomly chosen cylinder. [6]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q7 November 2015
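
Several questions in this exercise give only n, ∑x and ∑x². The sketch below shows how unbiased estimates follow from those summaries, using the piston figures, and then combines them with the cylinder distribution to model Y − X. The function name `unbiased_estimates` is illustrative, and treating the estimates as if they were the true parameters of X is the assumption the question itself asks for.

```python
from math import sqrt
from statistics import NormalDist


def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the population mean and variance from summary totals."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance


# Piston summaries: n = 100, sum of x = 208.7, sum of x^2 = 435.57.
mean_x, var_x = unbiased_estimates(100, 208.7, 435.57)

# Y ~ N(2.12, 0.000144) independently of X, so Y - X is normal with
# mean 2.12 - mean_x and variance 0.000144 + var_x.
difference = NormalDist(2.12 - mean_x, sqrt(0.000144 + var_x))
p_no_fit = difference.cdf(0.01)  # P(Y - X < 0.01)

print(f"mean = {mean_x:.3f}, variance = {var_x:.6f}")
print(f"P(piston does not fit) = {p_no_fit:.3f}")
```

Note that variances add even though the means are subtracted, which is a common slip in this type of question.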

6 The marks, x, of a random sample of 50 students in a test were summarised as follows:

n = 50, ∑x = 1508, ∑x² = 51 825

i Calculate unbiased estimates of the population mean and variance. [3]

ii Each student’s mark is scaled using the formula y = 1.5x + 10. Find estimates of the population mean and variance of the scaled marks, y. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q4 June 2015

PS 7 In a survey a random sample of 150 households in Nantville were asked to fill in a questionnaire about household budgeting.

i The results showed that 33 households owned more than one car. Find an approximate 99% confidence interval for the proportion of all households in Nantville with more than one car. [4]

ii The results also included the weekly expenditure on food, x dollars, of the households. These were summarised as follows:

n = 150, ∑x = 19 035, and ∑x² = 4 054 716

Find unbiased estimates of the mean and variance of the weekly expenditure on food of all households in Nantville. [3]

iii The government has a list of all the households in Nantville numbered from 1 to 9526. Describe briefly how to use random numbers to select a sample of 150 households from this list. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q4 November 2014
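
An approximate confidence interval for a proportion uses the normal approximation p̂ ± z√(p̂(1 − p̂)/n). The sketch below applies it to the Nantville car-ownership figures. The helper `proportion_confidence_interval` is a hypothetical name, and the exact z-value is used instead of a table value, so the last digit may differ from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, level):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 33 of the 150 sampled households owned more than one car.
low, high = proportion_confidence_interval(33, 150, 0.99)
print(f"99% CI for the proportion: ({low:.3f}, {high:.3f})")
```

The same function covers the other proportion intervals in this exercise, for example the sample of 200 people of whom 38 live in apartments.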


PS 8 The number of hours that Mrs Hughes spends on her business in a week is normally distributed with mean µ and standard deviation 4.8. In the past the value of µ has been 49.5.

i Assuming that µ is still equal to 49.5, find the probability that in a random sample of 40 weeks the mean time spent on her business in a week is more than 50.3 hours. [4]

Following a change in her arrangements, Mrs Hughes wishes to test whether µ has decreased. She chooses a random sample of 40 weeks and notes that the total number of hours she spent on her business during these weeks is 1920.

ii a Explain why a one-tail test is appropriate. [1]

b Carry out the test at the 6% significance level. [4]

c Explain whether it was necessary to use the Central Limit theorem in part ii b. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q5 November 2014

PS 9 Following a change in flight schedules, an airline pilot wished to test whether the mean distance that he flies in a week has changed. He noted the distances, x km, that he flew in 50 randomly chosen weeks and summarised the results as follows.

n = 50, ∑x = 143 300, and ∑x² = 410 900 000

i Calculate unbiased estimates of the population mean and variance. [3]

ii In the past, the mean distance that he flew in a week was 2850 km. Test, at the 5% significance level, whether the mean distance has changed. [5]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q3 November 2013

10 Each of a random sample of 15 students was asked how long they spent revising for an exam.

50 70 80 60 65 110 10 70 75 60 65 45 50 70 50

Assume that the times for all students are normally distributed with mean µ minutes and standard deviation 12 minutes.

i Calculate a 92% confidence interval for µ. [4]

ii Explain what is meant by a 92% confidence interval for µ. [1]

iii Explain what is meant by saying that a sample is ‘random’. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q3 June 2013


11 In the past the weekly profit at a store had mean $34 600 and standard deviation $4500. Following a change of ownership, the mean weekly profit for 90 randomly chosen weeks was $35 400.

i Stating a necessary assumption, test at the 5% significance level whether the mean weekly profit has increased. [6]

ii State, with a reason, whether it was necessary to use the Central Limit Theorem in part i. [2]

The mean weekly profit for another random sample of 90 weeks is found and the same test is carried out at the 5% significance level.

iii State the probability of a Type I error. [1]

iv Given that the population mean weekly profit is now $36 500, calculate the probability of a Type II error. [5]

Cambridge International AS & A Level Mathematics 9709 Paper 73 Q7 June 2013


M 12 In order to obtain a random sample of people who live in her town, Jane chooses people at random from the telephone directory for her town.

i Give a reason why Jane’s method will not give a random sample of people who live in the town. [1]

Jane now uses a valid method to choose a random sample of 200 people from her town and finds that 38 live in apartments.

ii Calculate an approximate 99% confidence interval for the proportion of all people in Jane’s town who live in apartments. [4]

iii Jane uses the same sample to give a confidence interval of width 0.1 for this proportion. This interval is an x% confidence interval. Find the value of x. [4]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q6 November 2012

PS 13 The volumes of juice in bottles of Apricola are normally distributed. In a random sample of 8 bottles, the volumes of juice, in millilitres, were found to be as follows.

332 334 330 328 331 332 329 333

i Find unbiased estimates of the population mean and variance. [3]

A random sample of 50 bottles of Apricola gave unbiased estimates of 331 millilitres and 4.20 millilitres² for the population mean and variance respectively.

ii Use this sample of size 50 to calculate a 98% confidence interval for the population mean. [3]

iii The manufacturer claims that the mean volume of juice in all bottles is 333 millilitres. State, with a reason, whether your answer to part ii supports this claim. [1]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q4 November 2011

PS 14 Metal bolts are produced in large numbers and have lengths which are normally distributed with mean 2.62 cm and standard deviation 0.30 cm.

i Find the probability that a random sample of 45 bolts will have a mean length of more than 2.55 cm. [3]

ii The machine making these bolts is given an annual service. This may change the mean length of bolts produced but does not change the standard deviation. To test whether the mean has changed, a random sample of 30 bolts is taken and their lengths noted. The sample mean length is m cm. Find the set of values of m which result in rejection at the 10% significance level of the hypothesis that no change in the mean length has occurred. [4]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q3 June 2010


15 There are 18 people in Millie’s class. To choose a person at random she numbers the people in the class from 1 to 18 and presses the random number button on her calculator to obtain a 3-digit decimal. Millie then multiplies the first digit in this decimal by two and chooses the person corresponding to this new number. Decimals in which the first digit is zero are ignored.

i Give a reason why this is not a satisfactory method of choosing a person. [1]

Millie obtained a random sample of 5 people of her own age by a satisfactory sampling method and found that their heights in metres were 1.66, 1.68, 1.54, 1.65 and 1.57. Heights are known to be normally distributed with variance 0.0052 m².

ii Find a 98% confidence interval for the mean height of people of Millie’s age. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 72 Q1 November 2009


16 When Sunil travels from his home in England to visit his relatives in India, his journey is in four stages. The times, in hours, for the stages have independent normal distributions as follows:

Bus from home to the airport: N(3.75, 1.45)
Waiting in the airport: N(3.1, 0.785)
Flight from England to India: N(11, 1.3)
Car in India to relatives: N(3.2, 0.81)

i Find the probability that the flight time is shorter than the total time for the other three stages. [6]

ii Find the probability that, for 6 journeys to India, the mean time waiting in the airport is less than 4 hours. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 71 Q6 June 2009


PS 17 The times taken for the pupils in Ming’s year group to do their English homework have a normal distribution with standard deviation 15.7 minutes. A teacher estimates that the mean time is 42 minutes. The times taken by a random sample of 3 students from the year group were 27, 35 and 43 minutes. Carry out a hypothesis test at the 10% significance level to determine whether the teacher’s estimate for the mean should be accepted, stating the null and alternative hypotheses. [5]

Cambridge International AS & A Level Mathematics 9709 Paper 7 Q2 November 2008


M 18 Diameters of golf balls are known to be normally distributed with mean µ cm and standard deviation σ cm. A random sample of 130 golf balls was taken and the diameters, x cm, were measured. The results are summarised by ∑x = 555.1 and ∑x² = 2371.30.

i Calculate unbiased estimates of µ and σ². [3]

ii Calculate a 97% confidence interval for µ. [3]

iii 300 random samples of 130 balls are taken and a 97% confidence interval is calculated for each sample. How many of these intervals would you expect not to contain µ? [1]

Cambridge International AS & A Level Mathematics 9709 Paper 7 Q4 November 2008


PS 19 The time in hours taken for clothes to dry can be modelled by the continuous random variable with probability density function given by:

$$f(t) = \begin{cases} k\sqrt{t} & 1 \le t \le 4, \\ 0 & \text{otherwise,} \end{cases}$$

where k is a constant.

i Show that k = 3/14. [3]

ii Find the mean time taken for clothes to dry. [4]

iii Find the median time taken for clothes to dry. [3]

iv Find the probability that the time taken for clothes to dry is between the mean time and the median time. [2]

Cambridge International AS & A Level Mathematics 9709 Paper 7 Q7 November 2008
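
The value of k in the clothes-drying question follows from requiring the density to integrate to 1. The short derivation below assumes the density is f(t) = k√t on 1 ≤ t ≤ 4; this reading is an inference from the stated result k = 3/14, since a density of the form kt on the same interval would instead give k = 2/15.

```latex
\[
\int_{1}^{4} k\sqrt{t}\,\mathrm{d}t
  = k\left[\tfrac{2}{3}\,t^{3/2}\right]_{1}^{4}
  = \tfrac{2}{3}\,k\,(8 - 1)
  = \tfrac{14}{3}\,k
  = 1
  \quad\Longrightarrow\quad
  k = \tfrac{3}{14}.
\]
```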


20 A magazine conducted a survey about the sleeping time of adults. A random sample of 12 adults was chosen from the adults travelling to work on a train.

i Give a reason why this is an unsatisfactory sample for the purposes of the survey. [1]

ii State a population for which this sample would be satisfactory. [1]

A satisfactory sample of 12 adults gave numbers of hours of sleep as shown below.

4.6 6.8 5.2 6.2 5.7 7.1 6.3 5.6 7.0 5.8 6.5 7.2

iii Calculate unbiased estimates of the mean and variance of the sleeping times of adults. [3]

Cambridge International AS & A Level Mathematics 9709 Paper 7 Q1 June 2008

