Data Analysis
$$\text{s.d.} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

where $x_i$ represents each datapoint $(x_1, x_2, \ldots, x_n)$,
$\bar{x}$ is the mean,
$n$ is the number of values.
Then $(x_i - \bar{x})^2$ gives the square of the difference between each value and the mean
(squaring exaggerates the effect of data points far from the mean and gets rid of negative
values), and

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$

sums up all these squared differences.
The expression

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

gives an average value to these differences. If all the data were the same, then each $x_i$
would equal $\bar{x}$ and the expression would be zero.
Finally we take the square root of the expression so that the dimensions of the standard
deviation are the same as those of the data.
So standard deviation is a measure of the spread of the data. The greater its value, the
more spread out the data are. This is illustrated by the two frequency polygons shown
above. Although both sets of data have the same mean, the data represented by the
'dotted' frequency polygon will have a greater standard deviation than the other.
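The steps described above can be traced in a short Python sketch. The function name `standard_deviation` and the two sample lists are illustrative assumptions rather than data from the text; the lists are chosen to share a mean while differing in spread.

```python
import math


def standard_deviation(data):
    """Standard deviation as described in the text (dividing by n)."""
    n = len(data)
    mean = sum(data) / n
    # Square each difference from the mean, which removes negative values
    squared_diffs = [(x - mean) ** 2 for x in data]
    # Average the squared differences
    average_sq_diff = sum(squared_diffs) / n
    # The square root restores the units of the original data
    return math.sqrt(average_sq_diff)


# Hypothetical data sets with the same mean (10) but different spread
narrow = [9, 10, 10, 11]
wide = [2, 6, 14, 18]

print(standard_deviation(narrow))
print(standard_deviation(wide))
print(standard_deviation([4, 4, 4, 4]))  # identical values give 0.0
```

The more widely spread list gives the larger value, and a list of identical values gives zero, as the text describes.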
Worked Example 1
Find the mean and standard deviation of the numbers,
6, 7, 8, 5, 9.
Solution
The mean, $\bar{x}$, is given by

$$\bar{x} = \frac{6 + 7 + 8 + 5 + 9}{5} = \frac{35}{5} = 7.$$
9.6
MEP Pupil Text 9
Now the standard deviation can be calculated.
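$$
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{(6-7)^2 + (7-7)^2 + (8-7)^2 + (5-7)^2 + (9-7)^2}{5}} \\
&= \sqrt{\frac{1 + 0 + 1 + 4 + 4}{5}} \\
&= \sqrt{\frac{10}{5}} \\
&= \sqrt{2} \\
&= 1.414 \quad \text{(to 3 decimal places)}
\end{aligned}
$$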
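As a cross-check on Worked Example 1, the same values can be obtained from Python's standard `statistics` module. This is only a sketch for verification; the variable names are illustrative.

```python
import statistics

data = [6, 7, 8, 5, 9]

mean = statistics.mean(data)
# pstdev divides by n (the "population" form), matching the formula in the text.
# statistics.stdev divides by n - 1 instead and would give a different value.
sd = statistics.pstdev(data)

print(mean)
print(round(sd, 3))  # 1.414
```

The choice of `pstdev` rather than `stdev` matters here: only the former uses the divisor $n$ that this section uses.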
An alternative formula for standard deviation is
$$\text{s.d.} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2}$$
This expression is much more convenient for calculations done without a calculator. The
proof that the two formulae are equivalent is given below, although it is beyond the scope
of the GCSE syllabus.
Proof
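You can see the proof of the equivalence of the two formulae by noting that

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\
&= \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} \left( 2 x_i \bar{x} \right) + \sum_{i=1}^{n} \bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \bar{x}^2 \sum_{i=1}^{n} 1
\end{aligned}
$$

(since the expressions $2\bar{x}$ and $\bar{x}^2$ are common for each term in the summation).

But $\sum_{i=1}^{n} 1 = n$, by definition, thus

$$
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 &= \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \bar{x}^2 n \right) && \text{(substituting } \textstyle\sum_{i=1}^{n} 1 = n \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - 2\bar{x} \, \frac{\sum_{i=1}^{n} x_i}{n} + \bar{x}^2 && \text{(dividing by } n \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 && \text{(substituting } \bar{x} \text{ for } \textstyle\frac{\sum_{i=1}^{n} x_i}{n} \text{)} \\
&= \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2
\end{aligned}
$$

and the result follows.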
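The equivalence can also be checked numerically. The sketch below implements both formulae and compares them on a few data sets; the function names and the second and third data sets are illustrative assumptions.

```python
import math


def sd_definition(data):
    """s.d. from the mean of the squared differences from the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_alternative(data):
    """s.d. from the mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(x * x for x in data) / n - mean ** 2)


# The first data set is from Worked Example 1; the others are made up
for data in ([6, 7, 8, 5, 9], [1.5, 2.5, 4.0, 10.0], [3, 3, 8]):
    a = sd_definition(data)
    b = sd_alternative(data)
    # Compare with a tolerance, since floating-point rounding can differ slightly
    print(data, math.isclose(a, b, rel_tol=1e-9))
```

A tolerance is used because the two formulae round differently in floating-point arithmetic. On a computer the alternative formula can lose accuracy when the spread is tiny compared with the mean, so its convenience is mainly for hand calculation.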
Worked Example 2
Find the mean and standard deviation of each of the following sets of numbers.
(a) 10, 11, 12, 13, 14 (b) 5, 6, 12, 18, 19
Solution
(a) The mean, $\bar{x}$, is given by

$$\bar{x} = \frac{10 + 11 + 12 + 13 + 14}{5} = \frac{60}{5} = 12$$
The standard deviation can now be calculated using the alternative formula.
$$
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{10^2 + 11^2 + 12^2 + 13^2 + 14^2}{5} - 12^2} \\
&= \sqrt{146 - 144} \\
&= \sqrt{2} \\
&= 1.414 \quad \text{(to 3 decimal places)}.
\end{aligned}
$$
(b) The mean, $\bar{x}$, is given by

$$\bar{x} = \frac{5 + 6 + 12 + 18 + 19}{5} = 12 \quad \text{(as in part (a))}.$$
The standard deviation is given by
$$
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{5^2 + 6^2 + 12^2 + 18^2 + 19^2}{5} - 12^2} \\
&= \sqrt{178 - 144} \\
&= 5.831 \quad \text{(to 3 decimal places)}.
\end{aligned}
$$
Note that both sets of numbers have the same mean value, but that set (b) has a much
larger standard deviation. This is expected, as the spread in set (b) is clearly far more
than in set (a).
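The two calculations in Worked Example 2 can be reproduced with a short Python sketch of the alternative formula. The helper name `mean_and_sd` is an illustrative assumption.

```python
import math


def mean_and_sd(data):
    """Return (mean, s.d.) using the alternative formula, dividing by n."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x ** 2 for x in data) / n
    return mean, math.sqrt(mean_of_squares - mean ** 2)


set_a = [10, 11, 12, 13, 14]
set_b = [5, 6, 12, 18, 19]

for label, data in (("a", set_a), ("b", set_b)):
    mean, sd = mean_and_sd(data)
    print(label, mean, round(sd, 3))
```

Both sets give a mean of 12, while the rounded standard deviations are 1.414 and 5.831 respectively.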
Worked Example 3
The table below gives the number of road traffic accidents per day in a small town.
Accidents per day    0    1    2    3    4    5    6
Frequency            5    8    6    3    2    0    1
Find the mean and standard deviation of this data.
Solution
The necessary calculations for each datapoint, $x_i$, are set out below.

Accidents per day   Frequency
     $(x_i)$         $(f_i)$     $x_i^2$   $x_i f_i$   $x_i^2 f_i$
        0               5           0          0            0
        1               8           1          8            8
        2               6           4         12           24
        3               3           9          9           27
        4               2          16          8           32
        5               0          25          0            0
        6               1          36          6           36
     TOTALS            25                     43          127
From the totals,

$$n = 25, \qquad \sum_{i=1}^{n} x_i f_i = 43, \qquad \sum_{i=1}^{n} x_i^2 f_i = 127.$$
The mean, $\bar{x}$, is now given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{n} = \frac{43}{25} = 1.72.$$
The standard deviation is now given by
$$
\begin{aligned}
\text{s.d.} &= \sqrt{\frac{\sum_{i=1}^{n} x_i^2 f_i}{n} - \bar{x}^2} \\
&= \sqrt{\frac{127}{25} - 1.72^2} \\
&= 1.457.
\end{aligned}
$$
Most scientific calculators have statistical functions which will calculate the mean and
standard deviation of a set of data.
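Where a calculator's statistical functions are not available, the frequency-table method of Worked Example 3 can be sketched in Python. The function name `frequency_mean_and_sd` is an illustrative assumption, and the frequencies are those used in the solution table.

```python
import math


def frequency_mean_and_sd(values, frequencies):
    """Mean and s.d. (dividing by n) of data given as a frequency table."""
    n = sum(frequencies)  # total frequency
    sum_xf = sum(x * f for x, f in zip(values, frequencies))
    sum_x2f = sum(x * x * f for x, f in zip(values, frequencies))
    mean = sum_xf / n
    sd = math.sqrt(sum_x2f / n - mean ** 2)
    return n, sum_xf, sum_x2f, mean, sd


# Frequencies as used in the solution table of Worked Example 3
accidents = [0, 1, 2, 3, 4, 5, 6]
frequency = [5, 8, 6, 3, 2, 0, 1]

n, sum_xf, sum_x2f, mean, sd = frequency_mean_and_sd(accidents, frequency)
print(n, sum_xf, sum_x2f)  # totals row: 25 43 127
print(round(mean, 2), round(sd, 3))
```

Returning the column totals alongside the mean and standard deviation makes it easy to compare each intermediate value with the table.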
Exercises
1. (a) Find the mean and standard deviation of each set of data given below.
A 51 56 51 49 53 62
B 71 76 71 69 73 82
C 102 112 102 98 106 124
(b) Describe the relationship between each set of numbers and also the
relationship between their means and standard deviations.
2. Two machines, A and B, fill empty packets with soap powder. A sample of boxes
was taken from each machine and the weight of powder (in kg) was recorded.
A 2.27 2.31 2.18 2.2 2.26 2.24
B 2.78 2.62 2.61 2.51 2.59 2.67 2.62 2.68 2.70
(a) Find the mean and standard deviation for each machine.
(b) Which machine is more consistent?
3. Two groups of students were trying to find the acceleration due to gravity.
Each group conducted 5 experiments.
Group A 9.4 9.6 10.2 10.8 10.1
Group B 9.5 9.7 9.6 9.4 9.8
Find the mean and standard deviation for each group, and comment on their results.
4. The number of matches per box was counted for 100 boxes of matches.
The results are given in the table below.
Number of Matches Frequency
44 28
45 31
46 14
47 15
48 8
49 2
50 2
Find the mean and standard deviation of this data.
5. When two dice were thrown 50 times the total scores shown below were obtained.
Score Frequency
2 1
3 0
4 4
5 8
6 12
7 9
8 7
9 5
10 3
11 1
12 0
Find the mean and standard deviation of these scores.
6. The length of telephone calls from an office was recorded. The results are given in
the table below.
Length of call (mins)   0 < t ≤ 0.5   0.5 < t ≤ 1.0   1.0 < t ≤ 2.0   2.0 < t ≤ 5.0
Frequency                    8              10              12               4
Estimate the mean and standard deviation using this table.
7. The charges (to the nearest £) made by a garage for repair work on cars in one
week are given in the table below.
Charge (£)   20–29   30–49   50–99   100–149   150–199   200–300
Frequency      10      22       6         2         4         1
Use this table to estimate the mean and standard deviation.
8. Thirty families were selected at random in two different countries. They were
asked how many children there were in each family.
Country A        Country B
1   2   1        2   2   1
2   0   1        4   1   1
2   2   3        2   3   5
2   1   1        3   5   1
2   2   2        9   4   2
3   1   0        6   5   1
5   1   4        4   2   2
0   2   1        4   4   5
1   1   0        3   2   2
2   0   2        4   7   0
Find the mean and standard deviation for each country and comment on the results.
9. (a) Calculate the standard deviation of the numbers
3, 4, 5, 6, 7.
(b) Show that the standard deviation of every set of five consecutive integers is
the same as the answer to part (a).
(LON)
10. Ten students sat a test in Mathematics, marked out of 50. The results are shown
below for each student.
25, 27, 35, 4, 49, 10, 12, 45, 45, 48
(a) Calculate the mean and standard deviation of the data.
The same students also sat an English test, marked out of 50. The mean and
standard deviation are given by
mean = 30, standard deviation = 3.6.
(b) Comment on and contrast the results in Mathematics and English.
(SEG)
11. Ten boys sat a test which was marked out of 50. Their marks were
28, 42, 35, 17, 49, 12, 48, 38, 24 and 27.
(a) Calculate
(i) the mean of the marks,
(ii) the standard deviation of the marks.
Ten girls sat the same test. Their marks had a mean of 30 and a standard deviation
of 6.5.
(b) Compare the performances of the boys and girls.
(NEAB)
12. There are twenty pupils in class A and twenty pupils in class B. All the pupils in
class A were given an I.Q. test. Their scores on the test are given below.
100, 104, 106, 107, 109, 110, 113, 114, 116, 117,
118, 119, 119, 121, 124, 125, 127, 127, 130, 134.
(a) The mean of their scores is 117. Calculate the standard deviation.
(b) Class B takes the same I.Q. test. They obtain a mean of 110 and a standard
deviation of 21. Compare the data for class A and class B.
(c) Class C has only 5 pupils. When they take the I.Q. test they all score 105.
What is the value of the standard deviation for class C?
(SEG)
13. The following are the scores in a test for a set of 15 students.
5 4 8 7 3
6 5 9 6 10
7 8 6 4 2
(a) (i) Calculate the mean score.
(ii) Calculate the standard deviation of the scores.
A set of 10 different students took the same test. Their scores are listed below.
5 6 6 7 7
4 7 8 3 7
(b) After making any necessary calculations for the second set, compare the two
sets of scores. Your answer should be understandable to someone who does
not study Statistics.
(MEG)
14. In a survey on examination qualifications, 50 people were asked,
"How many subjects are listed on your GCSE certificate?"
The frequency distribution of their responses is recorded in the table below.
Number of subjects 1 2 3 4 5 6 7
Number of people 5 3 7 8 9 10 8
(a) Calculate the mean and standard deviation of the distribution.
(b) A Normal Distribution has approximately 68% of its data values within one
standard deviation of the mean.
Use your answers to part (a) to check if the given distribution satisfies this
property of a Normal Distribution. Show your working clearly.
(MEG)