Bivariate Data Notes
BIVARIATE DATA
REGRESSION AND CORRELATION
SYLLABUS OBJECTIVES
You should be able to:
1. DEFINITIONS
1.2. Correlation
Correlation is a statistical method used to determine whether there is a linear
relationship between variables. If a relationship exists, we say the variables are correlated.
1.3. Regression
Regression is a statistical method used to describe the nature of the relationship
between variables, i.e. whether it is positive or negative, linear or non-linear.
2. SCATTER PLOTS
In regression and correlation we are interested in answering two questions: are two
variables linearly related and, if so, how strong is the relationship?
A LEVEL STATISTICS NOTES COMPILED BY MANYUVIRE D CELL 0783235483
A visual way to describe the nature of the relationship between two variables is a
scatter plot, which is a graph of the ordered pairs (x, y).
Example 1
Construct a scatter plot for the Mathematics and Physics test marks of 10 students
shown below.
Maths 4 9 6 10 20 7 12 17 11 10
Physics 6 8 9 13 20 9 10 17 13 8
Letting Maths = x and Physics = y, the (x ; y) scatter plot is as follows.
If the points appear to lie along a straight line, we say the variables are correlated. In
our example the variables are positively correlated, as they appear to increase together.
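The plot in Example 1 can be roughed out without graph paper. The following is a minimal sketch in Python, not part of the original notes: `text_scatter` is a hypothetical helper that assumes whole-number marks between 0 and 20 and prints a character grid with `*` at each data point.

```python
# Marks from Example 1 (x = Maths, y = Physics).
maths = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
physics = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]


def text_scatter(xs, ys, size=20):
    """Return a rough character-grid scatter plot; '*' marks a data point.

    Hypothetical helper: assumes integer values between 0 and `size`.
    """
    points = set(zip(xs, ys))
    rows = []
    for y in range(size, -1, -1):  # print the highest y value first
        cells = "".join(
            "*" if (x, y) in points else "." for x in range(size + 1)
        )
        rows.append(f"{y:2d} {cells}")
    return "\n".join(rows)


print(text_scatter(maths, physics))
```

The stars run from the bottom left to the top right of the grid, which is the pattern described as positive correlation.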
The two variables in bivariate data are called the independent variable and the
dependent variable.
TASK 1
TASK 2
For each of the following data sets, plot a scatter diagram and comment.
a)
X 8 10 15 6 11 12 13 11 9
Y 14 10 18 13 14 13 16 11 12
b)
X 25 18 32 27 21 35 28 30 16
Y 16 11 20 17 15 26 22 20 10
c)
X 5 7 12 16 20
Y 4 12 18 21 24
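3. REGRESSION FUNCTION y on x
From a scatter plot, a mathematical relationship between the two variables can be
seen.
The regression line of y on x has the form y = a + bx. To find this line of best fit
you need to calculate the values of a and b:
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$ and $a = \bar{y} - b\bar{x}$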
Example 2
Using the data in Example 1 of 10 students' marks in Physics and Maths, and letting
Maths = x and Physics = y, calculate the regression line of y on x.
Maths 4 9 6 10 20 7 12 17 11 10
Physics 6 8 9 13 20 9 10 17 13 8
SOLUTION
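x     y     x²    y²    xy
4     6     16    36    24
9     8     81    64    72
6     9     36    81    54
10    13    100   169   130
20    20    400   400   400
7     9     49    81    63
12    10    144   100   120
17    17    289   289   289
11    13    121   169   143
10    8     100   64    80
Σx = 106, Σy = 113, Σx² = 1336, Σy² = 1453, Σxy = 1375
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1375 - \frac{(106)(113)}{10}}{1336 - \frac{106^2}{10}} = \frac{177{,}2}{212{,}4} = 0{,}834274952$
$a = \bar{y} - b\bar{x} = 11{,}3 - (0{,}834274952)(10{,}6) = 2{,}456685509$
$y = 2{,}456685509 + 0{,}834274952x$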
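The column sums and the values of a and b for Example 2 can be checked with a short script. This is a minimal sketch in Python, not part of the original notes: the variable names are illustrative, and decimal points are used where the notes use decimal commas.

```python
# Marks from Example 1 (x = Maths, y = Physics).
xs = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
ys = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Corrected sum of products and corrected sum of squares of x.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

b = s_xy / s_xx                 # gradient of the line y on x
a = sum_y / n - b * sum_x / n   # intercept: a = y_bar - b * x_bar

print(f"y = {a:.6f} + {b:.6f}x")
```

Working from the raw sums, as here, avoids rounding the means before they are used.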
3. To retrieve your values, press RCL followed by the key for the value you need,
for example RCL b.
NB: Learn how to use your calculator from its user guide.
4. REGRESSION LINE x on y
The regression line of x on y has the form x = c + dy. To find this line of best fit
you need to calculate the values of c and d.
Example 3
Using the results from Example 2, calculate the regression line of x on y.
SOLUTION
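$d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{1375 - \frac{(106)(113)}{10}}{1453 - \frac{113^2}{10}} = \frac{177{,}2}{176{,}1} = 1{,}006246451$
$c = \bar{x} - d\bar{y} = 10{,}6 - (1{,}006246451)(11{,}3) = -0{,}770584896$
$x = -0{,}770584896 + 1{,}006246451y$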
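The x on y calculation mirrors the y on x one, with the sum of squares of y in the denominator. This is a minimal sketch in Python, not part of the original notes: the variable names are illustrative and decimal points replace the decimal commas used in the notes.

```python
# Marks from Example 1 (x = Maths, y = Physics).
xs = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
ys = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Corrected sum of products and corrected sum of squares of y.
s_xy = sum_xy - sum_x * sum_y / n
s_yy = sum_y2 - sum_y ** 2 / n

d = s_xy / s_yy                 # gradient of the line x on y
c = sum_x / n - d * sum_y / n   # intercept: c = x_bar - d * y_bar

print(f"x = {c:.6f} + {d:.6f}y")
```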
A regression equation can be used to estimate the value of the variable y for any
value of x, and vice versa.
Estimating outside the range of the data (extrapolation) usually gives unreliable values.
To estimate, substitute the given value into your equation and solve.
Example 4
Use the regression line y = 2,456685509 + 0,834274952x in example 2 to find the
physics mark for a maths mark of 15 to the nearest whole number.
SOLUTION
Maths = x, so substitute x = 15 into the equation.
y = 2,456685509 + 0,834274952(15)
y = 14,97080979 ≈ 15
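The substitution in Example 4, together with the warning about estimating outside the range of the data, can be sketched in Python. This is an illustrative sketch only: `estimate_physics` is a hypothetical helper, the coefficients are copied from Example 2, and the range 4 to 20 is taken from the Maths marks in Example 1.

```python
# Regression line y on x from Example 2 (decimal points instead of commas).
a = 2.456685509
b = 0.834274952

# Smallest and largest Maths marks in the data of Example 1.
x_min, x_max = 4, 20


def estimate_physics(maths_mark):
    """Return (estimated Physics mark, True if maths_mark is inside the data range)."""
    within_range = x_min <= maths_mark <= x_max
    return a + b * maths_mark, within_range


estimate, reliable = estimate_physics(15)
print(round(estimate), reliable)
```

A call such as `estimate_physics(25)` still returns a number, but the second value is `False`, flagging that the estimate is an extrapolation.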
The line of best fit should pass through the point $(\bar{x} ; \bar{y})$ on a scatter plot.
To draw the line of best fit you need just three points, $(\bar{x} ; \bar{y})$ being one of the
three.
Take two other x values and calculate the y value for each using the regression
equation.
These x values should be within the range of the x values, so the best choice is
the highest and the lowest.
Example 5
Fit the regression line of y on x from Example 2 onto a scatter plot.
SOLUTION
$(\bar{x} ; \bar{y}) = (10{,}6 ; 11{,}3)$
Using the equation, when x = 4, y = 5,8 and when x = 20, y = 19,1. Therefore the
other two points are (4 ; 5,8) and (20 ; 19,1).
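The three plotting points in Example 5 can be generated directly from the data and the regression equation. This is a minimal sketch in Python, not part of the original notes: the variable names are illustrative and the coefficients are copied from Example 2.

```python
# Marks from Example 1 (x = Maths, y = Physics).
xs = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
ys = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]

# Regression line y on x from Example 2.
a = 2.456685509
b = 0.834274952

n = len(xs)
mean_point = (sum(xs) / n, sum(ys) / n)   # the line passes through this point
low_point = (min(xs), a + b * min(xs))    # point at the lowest x value
high_point = (max(xs), a + b * max(xs))   # point at the highest x value

for x, y in (low_point, mean_point, high_point):
    print(f"({x:g} ; {y:.1f})")
```

Substituting the mean of x into the equation returns the mean of y, which is a quick check that a and b were calculated correctly.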
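The product moment correlation coefficient, r, is given by
$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$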
Example 6
For the data in example 2 calculate the product moment correlation coefficient and
comment.
SOLUTION
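$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{1375 - \frac{(106)(113)}{10}}{\sqrt{\left(1336 - \frac{106^2}{10}\right)\left(1453 - \frac{113^2}{10}\right)}} = \frac{177{,}2}{\sqrt{(212{,}4)(176{,}1)}}$
= 0,9162348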
There is a strong positive correlation between Maths and Physics marks.
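The value of r in Example 6 can be checked from the raw data. This is a minimal sketch in Python, not part of the original notes: the variable names are illustrative and decimal points replace the decimal commas used in the notes.

```python
from math import sqrt

# Marks from Example 1 (x = Maths, y = Physics).
xs = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
ys = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)

# Corrected sums of products and squares.
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.7f}")
```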
The r value can be retrieved from the calculator if the bivariate data is entered, for
example by pressing RCL r if using Sharp EL-531WH.
8. COEFFICIENT OF DETERMINATION
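The coefficient of determination is r². It measures the strength of the linear
relationship between the variables x and y, as the proportion of the variation in one
variable that is explained by the variation in the other.
The closer together the two regression lines (y on x and x on y) are, the nearer r is to 1 or −1.
If the two lines stay close together as they move away from their point of intersection,
the r value will be close to 1 or −1 (i.e. a strong positive or negative correlation).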
The value of r tells us how close the regression lines y on x and x on y are to
each other away from their point of intersection.
The greater the magnitude of r, the closer the lines are to each other.
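The link between r and the two regression lines can be made explicit. The following is a worked identity rather than part of the original notes: it assumes the shorthand $S_{xy}$, $S_{xx}$, $S_{yy}$ defined in its first line, and checks the result against the gradients b and d from Examples 2 and 3.

```latex
S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \qquad
S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad
S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}

b \cdot d = \frac{S_{xy}}{S_{xx}} \cdot \frac{S_{xy}}{S_{yy}}
          = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2

% Check with Examples 2, 3 and 6:
0{,}834274952 \times 1{,}006246451 \approx 0{,}8395 \approx (0{,}9162348)^2
```

When r = 1 or r = −1 the product bd equals 1, so d = 1/b and the two lines coincide. When r = 0 both gradients are zero, so the lines are $y = \bar{y}$ and $x = \bar{x}$, which are perpendicular.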
1. Ten boys compete in throwing a cricket ball, and the table shows the height of
each boy (x cm), measured to the nearest centimetre, and the distance (y m) to
which he can throw the ball.
Boys A B C D E F G H I J
x 122 124 133 138 144 156 158 161 164 168
y 41 38 52 56 29 54 59 61 63 67
2. When a car is switched on, its temperature (in °C) is measured and recorded at
eight different intervals of time (t). The results are given in the table below.
t (min)            20 30 40 50 60 70 80 90
Temperature (°C)   42 52 64 66 91 86 98 104
a) Represent the data on a scatter diagram. [3]
b) Calculate the equation of the regression line of temperature on t and fit it on the scatter diagram. [6]
c) Estimate the temperature when t = 65 minutes using the fitted line. [2]
d) Calculate the product moment correlation coefficient and comment. [4]
[NOV 2019 P2 no 6]
3.
D  0  5  10 15 20 25 30 35
M  90 82 56 68 58 46 30 20
a) i) Draw a scatter diagram for the data.
ii) Comment on the relationship between the two sets of data. [4]
b) i) Calculate the product moment correlation coefficient,
ii) Comment on the product moment correlation value. [4]
c) i) Find the equation of the regression line M on D. [4]
ii) Use the regression line to estimate the value of M when D is
1. 12
2. 45 [4]
[NOV 2018 P2 no 7]
4. The data below summarise the altitude x (in metres) above sea level and the mean
air temperature y (in °C) for 10 weather stations.
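1. i) Regression line y on x:
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{77689 - \frac{(1468)(520)}{10}}{218070 - \frac{1468^2}{10}} = \frac{1353}{2567{,}6} = 0{,}52695$
$a = \bar{y} - b\bar{x} = 52 - (0{,}52695)(146{,}8) = -25{,}356$
$y = -25{,}356 + 0{,}52695x$
Regression line x on y:
$d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{77689 - \frac{(1468)(520)}{10}}{28382 - \frac{520^2}{10}} = \frac{1353}{1342} = 1{,}0082$
$c = \bar{x} - d\bar{y} = 146{,}8 - (1{,}0082)(52) = 94{,}374$
$x = 94{,}374 + 1{,}0082y$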
ii) $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{1353}{\sqrt{(2567{,}6)(1342)}} = 0{,}7288$
Coefficient of determination $= r^2 = 0{,}5312$
53,12% of the variation in the distance thrown is explained by the variation in
height. The relationship between x and y is not strong.
2.
a)
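b) Let t = x and temperature = y.
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{36940 - \frac{(440)(603)}{8}}{28400 - \frac{440^2}{8}} = \frac{3775}{4200} = 0{,}8988$
$a = \bar{y} - b\bar{x} = 75{,}375 - (0{,}8988)(55) = 25{,}941$
$y = 25{,}941 + 0{,}8988x$
The regression line of temperature on t is y = 25,941 + 0,8988t, where y is the temperature.
c) The estimated temperature when t = 65 minutes, using the fitted line, is 84 °C.
d) $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{3775}{\sqrt{(4200)(3565{,}875)}} = 0{,}97545$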
3. a) i) [Scatter diagram: M on the vertical axis (0 to 100) against D on the horizontal axis (0 to 40)]
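b) i) $r = \frac{\sum DM - \frac{\sum D \sum M}{n}}{\sqrt{\left(\sum D^2 - \frac{(\sum D)^2}{n}\right)\left(\sum M^2 - \frac{(\sum M)^2}{n}\right)}} = \frac{5900 - \frac{(140)(450)}{8}}{\sqrt{\left(3500 - \frac{140^2}{8}\right)\left(29364 - \frac{450^2}{8}\right)}} = \frac{-1975}{\sqrt{(1050)(4051{,}5)}} = -0{,}95755$
ii) There is a strong negative correlation.
c) i) $b = \frac{\sum DM - \frac{\sum D \sum M}{n}}{\sum D^2 - \frac{(\sum D)^2}{n}} = \frac{-1975}{1050} = -1{,}88095$ (−1,88 to 2 decimal places)
$a = \bar{M} - b\bar{D} = 56{,}25 - (-1{,}88095)(17{,}5) = 89{,}17$
$M = 89{,}17 - 1{,}88D$
ii) Using the unrounded coefficients, the value of M when D is
1. 12 is 89,1667 − 1,88095(12) = 66,60
2. 45 is 89,1667 − 1,88095(45) = 4,52
D = 45 lies outside the range of the data (0 to 35), so the second estimate is less reliable.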
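4. [JUNE 2014 no 9]
i) $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = -0{,}9051$
There is a strong negative correlation between x and y.
ii) $b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = -0{,}007852967$
$a = \bar{y} - b\bar{x} = 9{,}42 - (-0{,}00785)(106{,}7) = 10{,}26$
$y = 10{,}26 - 0{,}00785x$
iii) The mean air temperature at a place 250 m above sea level is
$y = 10{,}26 - 0{,}00785(250) = 8{,}30$
Therefore the temperature is 8,30 °C.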
5. a)
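b) $\sum (d - 300) = 394$
$\sum d - 6(300) = 394$
$\sum d - 1800 = 394$
$\sum d = 2194$
$\sum (d - 300)^2 = 123648$
$\sum (d^2 - 600d + 90000) = 123648$
$\sum d^2 - 600\sum d + 6(90000) = 123648$
$\sum d^2 - 600(2194) + 540000 = 123648$
$\sum d^2 = 900048$
$\sum (t - 200) = 570$
$\sum t - 1200 = 570$
$\sum t = 1770$
$\sum (t - 200)^2 = 103500$
$\sum (t^2 - 400t + 40000) = 103500$
$\sum t^2 - 400\sum t + 6(40000) = 103500$
$\sum t^2 - 400(1770) + 240000 = 103500$
$\sum t^2 = 571500$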
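The two decodings above follow the same pattern, so they can be checked with one small function. This is a minimal sketch in Python, not part of the original notes: `decode_sums` is a hypothetical helper, and the shifts 300 and 200 and the sample size 6 are read off the working above.

```python
def decode_sums(n, shift, coded_sum, coded_sum_sq):
    """Recover sum(x) and sum(x^2) from sum(x - shift) and sum((x - shift)^2).

    Uses sum(x - k) = sum(x) - n*k and
    sum((x - k)^2) = sum(x^2) - 2*k*sum(x) + n*k^2.
    """
    total = coded_sum + n * shift
    total_sq = coded_sum_sq + 2 * shift * total - n * shift ** 2
    return total, total_sq


sum_d, sum_d2 = decode_sums(6, 300, 394, 123648)
sum_t, sum_t2 = decode_sums(6, 200, 570, 103500)

print(sum_d, sum_d2, sum_t, sum_t2)
print(round(sum_d / 6, 1), sum_t / 6)  # the means of d and t
```

The means printed on the last line are the values substituted into the formula for a in part i).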
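i) Let d = x and t = y. The regression line of t on d is then the regression line of y on x.
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = 0{,}680$
$a = \bar{y} - b\bar{x} = 295 - 0{,}68(365{,}7) = 46{,}32$
$y = 46{,}32 + 0{,}680x$, that is, $t = 46{,}32 + 0{,}680d$
ii) On the graph.
iii) From the graph, the time taken by a participant who travelled 350 km is 275 min.
c) $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = 0{,}95733$
There is a strong positive correlation between distance travelled and time.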