07 - S1 Chapter 7
CHAPTER 7
Correlation
Learning objectives
In earlier chapters, only single variables have been considered. Now you will be working with
pairs of variables.
After studying this chapter, you should be able to:
■ investigate the strength of a linear relationship between two variables by using suitable
statistical analysis
■ evaluate and interpret the product moment correlation coefficient.
Week                                  1    2    3    4    5    6    7    8
x (number of step-ups)               15   50   35   25   20   30   10   45
y (pulse rate, beats per minute)    114  155  132  112   96  105   78  113
Correlation 125
[Scatter diagram: pulse rate y (beats per minute) plotted against number of step-ups x]
[Scatter diagram: y plotted against Maths]
The appearance of the scatter diagram is now very different. The existence of correlation is
much more difficult to identify. Scales should cover the range of the given data.
The table below gives the marks obtained by the ten pupils
taking maths and history tests.
Pupil                          A   B   C   D   E   F   G   H   I   J
Maths mark (out of 30), x     20  23   8  29  14  11  11  20  17  17
History mark (out of 60), z   28  21  42  32  44  56  36  24  51  26
Calculating the mean for z:
$\bar{z} = \frac{\sum z}{n} = \frac{360}{10} = 36$
[Scatter diagram: history mark z plotted against maths mark x]
The scatter diagram for maths and history shows a clear tendency for points to run from
top-left to bottom-right. This indicates that negative correlation exists between x and z.
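The direction of the correlation can also be checked numerically. The short Python sketch below uses the marks from the table and sums the products of the deviations from the two means; a negative total corresponds to points lying mostly in the top-left and bottom-right regions of the diagram. The variable names are assumptions chosen for illustration and are not part of the text.

```python
# Maths marks (x, out of 30) and history marks (z, out of 60) for pupils A to J.
maths = [20, 23, 8, 29, 14, 11, 11, 20, 17, 17]
history = [28, 21, 42, 32, 44, 56, 36, 24, 51, 26]

n = len(maths)
mean_x = sum(maths) / n      # 170 / 10
mean_z = sum(history) / n    # 360 / 10

# Sum of the products of the deviations from the means.
s_xz = sum((x - mean_x) * (z - mean_z) for x, z in zip(maths, history))

# The sign of the total gives the direction of any linear correlation.
if s_xz < 0:
    direction = "negative"
elif s_xz > 0:
    direction = "positive"
else:
    direction = "none"

print(mean_x, mean_z, s_xz, direction)
```

The sign only gives the direction; it says nothing yet about how strong the relationship is.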
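[Sketches illustrating examples of possible values of r]
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left\{\sum x^2 - \frac{(\sum x)^2}{n}\right\}\left\{\sum y^2 - \frac{(\sum y)^2}{n}\right\}}}$$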
$r = \frac{213}{\sqrt{360 \times 250}} = 0.71$
This, of course, can be found directly from your calculator.
The interpretation of the value of r is very important. The value of r tells you how close
the points are to lying on a straight line.
[Sketch: no correlation, r near 0]
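To illustrate how r reflects closeness to a straight line, here is a minimal Python sketch of the product moment correlation coefficient computed from paired data. The function name pmcc and the two straight-line data sets are assumptions made up for this illustration; the maths and history marks are those from the table earlier in the chapter.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical points lying exactly on a straight line.
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]     # y = 2x + 1
falling = [11, 9, 7, 5, 3]    # y = 13 - 2x

r_rising = pmcc(xs, rising)
r_falling = pmcc(xs, falling)

# Maths (x) and history (z) marks for pupils A to J.
maths = [20, 23, 8, 29, 14, 11, 11, 20, 17, 17]
history = [28, 21, 42, 32, 44, 56, 36, 24, 51, 26]
r_marks = pmcc(maths, history)

print(round(r_rising, 3), round(r_falling, 3), round(r_marks, 3))
```

The two straight-line sets give values at the ends of the range for r, while the marks give a value of roughly -0.58, in keeping with the moderate negative trend seen in the scatter diagram.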
Chp-07 124-144.qxd 26/5/04 4:26 pm Page 129
$-1 \le r \le 1$
Solution
$\sum x = 108$ and $\sum y = 6372$
$\sum x^2 = 1060.1$, $\sum y^2 = 3\,396\,942$
$\sum xy = 56\,825.4$
$S_{xx} = 1060.1 - \frac{108^2}{12} = 88.1$
$S_{yy} = 3\,396\,942 - \frac{6372^2}{12} = 13\,410$
Then
$S_{xy} = 56\,825.4 - \frac{108 \times 6372}{12} = -522.6$
So
$r = \frac{-522.6}{\sqrt{88.1 \times 13\,410}} = -0.481$ (to 3 s.f.)
Considering the value of r and the scatter diagram, there is evidence of weak negative
correlation between age and ATST. This would indicate that older children have less ATST
than younger children. However, the relationship is fairly weak.
Note that it would be worth investigating child I who seems to have an abnormally high ATST.
Perhaps the child was ill during the experiment or perhaps there is some other reason for
the excessive amount of sleep.
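The working in this solution can be checked with a short Python sketch that starts from the summary totals rather than the raw data. The function name pmcc_from_totals is an assumption for illustration, and n = 12 is taken from the divisor used in the working.

```python
import math

def pmcc_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy, r) computed from summary totals."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    return s_xx, s_yy, s_xy, r

# Totals quoted in the solution for age (x) and ATST (y).
s_xx, s_yy, s_xy, r = pmcc_from_totals(
    n=12, sum_x=108, sum_y=6372,
    sum_x2=1060.1, sum_y2=3396942, sum_xy=56825.4,
)

print(round(s_xx, 1), round(s_yy), round(s_xy, 1), round(r, 3))
```

Working from totals mirrors the layout of the hand calculation, so each intermediate value can be compared with the corresponding line of the solution.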
Solution
$S_{xx} = 2723.75 - \frac{140.5^2}{10} = 749.725$
$S_{yy} = 4489 - \frac{193^2}{10} = 764.1$
Non-linear relationships
As illustrated in section 7.2, r measures linear relationships only. It is of no use at all
when a non-linear relationship is evident. There may well be a very clear relationship
between the variables being considered but if that relationship is not linear then r will
not help at all.
Note that clear non-linear relationships identified on scatter diagrams should always be
commented upon but the evaluation of r is not appropriate.
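A small illustration of this point, using made-up data that do not come from the text: the points below lie exactly on the curve y = x², yet r comes out as zero because the relationship is not linear. The helper name pmcc is an assumption for this sketch.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: every point lies exactly on the curve y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pmcc(xs, ys)
print(r)
```

Here y is completely determined by x, so a scatter diagram would show a clear curve, but the value of r on its own would suggest no relationship at all.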
Freak results
An unusual result can drastically alter the value of r.
Unexpected results should always be commented upon
and investigated further as their inclusion or exclusion
in any calculations can completely change the final
result.
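The effect of a single unusual result can be seen in the following Python sketch. The data are invented purely for illustration and the helper name pmcc is an assumption; the only point being made is how much one extra pair can move r.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data showing very little linear pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 1, 5, 2]
r_without = pmcc(xs, ys)

# The same data with one freak result, (20, 20), included.
r_with = pmcc(xs + [20], ys + [20])

print(round(r_without, 2), round(r_with, 2))
```

With these numbers r moves from a value close to zero to one above 0.9 when the single extra point is included, which is why such a result should be commented upon and investigated rather than silently kept or discarded.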
Solution
(a) The scatter diagram indicates little or no correlation
between the two variables. r could be evaluated but would
clearly be close to zero.
Solution
(a) [Scatter diagram: y plotted against x]
(c) The scores would be less variable.
The scatter diagram would be more compact but the overall shape would be similar.
Training would lead to a more consistent scale for x and y. Without training, people's views
on texture or flavour would vary widely.
(d) $r = \frac{\sum (y_i - \bar{y})(z_i - \bar{z})}{\sqrt{\sum (y_i - \bar{y})^2 \sum (z_i - \bar{z})^2}} = \frac{S_{yz}}{\sqrt{S_{yy} S_{zz}}}$
A calculator cannot be used to obtain r directly in this case; the formula must be used.
therefore,
$r = \frac{1979.8}{\sqrt{2117.6 \times 2516.9}} = 0.858$
Country x y
A 130 150
B 5950 43
C 560 121
D 2010 53
E 1870 41
F 170 169
G 390 143
H 580 59
I 820 75
J 6620 20
K 3800 39
Solution
(a) [Scatter diagram: infant mortality y plotted against x]
(ii) PMCC measures the strength of a linear relationship. It is not a suitable measure for
data which clearly shows a non-linear relationship, as in this case.
Look back to the beginning of section 7.5. A clear curve is seen.
EXERCISE 7A
1 (a) For each of the following scatter diagrams, state whether
or not the product moment correlation coefficient is an
appropriate measure to use.
(i) [scatter diagram] r = 1
(ii) [scatter diagram] r = 0.3
(iii) [scatter diagram] r = 1.2
(b) State, giving a reason, whether or not the value
underneath each diagram might be a possible value of
this correlation coefficient.
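$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left\{\sum x^2 - \frac{(\sum x)^2}{n}\right\}\left\{\sum y^2 - \frac{(\sum y)^2}{n}\right\}}}$$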
Remember, this can be found directly using a
calculator.
r is a measure of linear relationship only and
$-1 \le r \le 1$
Do not refer to r if a scatter diagram clearly shows a
non-linear connection.
3 $r = 1$ or $r = -1$ implies that the points all exactly
lie on a straight line.
$r = 0$ implies no linear relationship is present.
But … no linear relationship between the variables
does not necessarily mean that $r = 0$.
4 Even if r is close to 1 or $-1$, no causal link should
be assumed between the variables without thinking
very carefully about the nature of the data involved.
Remember the feet stretching! Will it really help you
to get better at maths?
(b) r = 0.784
(c) r = 0.145