
Bivariate Data Notes

The document provides comprehensive notes on bivariate data, focusing on regression and correlation analysis. Key topics include plotting scatter diagrams, calculating regression lines, and determining correlation coefficients, with examples and tasks for practice. It also discusses the relationships between correlation coefficients and regression lines, along with past exam questions for further application of the concepts.

Uploaded by

chiyanjapeter7
Copyright
© All Rights Reserved

A LEVEL STATISTICS NOTES COMPILED BY MANYUVIRE D CELL 0783235483

BIVARIATE DATA
REGRESSION AND CORRELATION
SYLLABUS OBJECTIVES
You should be able to:

• plot scatter diagrams
• draw lines of best fit
• find the equations of regression lines
• calculate Pearson's product moment correlation coefficient (r)
• compute the coefficient of determination (r²)
• solve problems involving regression and correlation

1. DEFINITIONS

1.1. Bivariate data


This is data that comes in pairs (x; y); there may or may not be a relationship
between the two variables.

1.2. Correlation
Correlation is a statistical method used to determine whether there is a linear
relationship between variables. If a relationship exists, we say the variables are correlated.

1.3. Regression
Regression is a statistical method used to describe the nature of the relationship
between variables, i.e. whether it is positive or negative, linear or nonlinear.

2. SCATTER PLOTS

• In regression and correlation we want to answer two questions: are two variables
linearly related and, if so, how strong is the relationship?


• A visual way to describe the nature of the relationship between two variables is a
scatter plot, which is a graph of the ordered pairs (x; y).

Example 1

Construct a scatter plot for the Mathematics and Physics test marks of 10 students
shown below.

Maths     4   9   6  10  20   7  12  17  11  10
Physics   6   8   9  13  20   9  10  17  13   8

Letting Maths = x and Physics = y, the (x; y) scatter plot is as follows.

[Figure: scatter plot of Physics marks (y) against Maths marks (x)]

If the points appear to lie on a straight line pattern we can say the variables are correlated. In
our example the variables are positively correlated as they appear to increase together.

2.1. Typical scatter diagrams and comments associated with them.

2.2. Dependent and independent variables

• The two variables in bivariate data are called the independent variable and the
dependent variable.


• The independent variable is the variable in regression that can be controlled
or manipulated. It is represented by x and plotted on the x-axis of the scatter plot.

• The dependent variable is the variable in regression that cannot be controlled
or manipulated. It is represented by y and plotted on the y-axis.

TASK 1

1. State the independent and dependent variables in each of the following:

a. Fertiliser quantity and yield per hectare.
b. Workers’ salaries and company profits.
c. Time of sunshine and temperature.
d. Speed and time taken to complete a journey.

TASK 2
For the following data plot scatter diagrams and comment
a)

X 8 10 15 6 11 12 13 11 9
Y 14 10 18 13 14 13 16 11 12

b)

X 25 18 32 27 21 35 28 30 16
Y 16 11 20 17 15 26 22 20 10

c)

X 5 7 12 16 20
Y 4 12 18 21 24

3. REGRESSION FUNCTION y on x

• From a scatter plot a mathematical relationship between the two variables can be
seen.

• The linear relationship is a function y = f(x) whose graph is a straight line called a
regression line or line of best fit.

• It is of the form y = a + bx, where b is the regression coefficient (the amount by
which y changes for a unit increase in x) and a is the y-intercept.

• To find the line of best fit you need to calculate the values of a and b.

$b = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$, which can be written as $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$, and $a = \bar{y} - b\bar{x}$

Example 2

Using the data in Example 1 (10 students' marks in Maths and Physics) and letting
Maths = x and Physics = y, calculate the regression line of y on x.

Maths 4 9 6 10 20 7 12 17 11 10

Physics 6 8 9 13 20 9 10 17 13 8

SOLUTION

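x     y     x²     y²     xy
4     6     16     36     24
9     8     81     64     72
6     9     36     81     54
10    13    100    169    130
20    20    400    400    400
7     9     49     81     63
12    10    144    100    120
17    17    289    289    289
11    13    121    169    143
10    8     100    64     80

Σx = 106    Σy = 113    Σx² = 1336    Σy² = 1453    Σxy = 1375

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{10(1375) - (106)(113)}{10(1336) - (106)^2} = \dfrac{1772}{2124} = 0,834274952 = 0,83$ to 2 d.p.

$a = \bar{y} - b\bar{x} = 11,3 - 0,834274952(10,6) = 2,456685509$

The regression line y on x is y = 2,456685509 + 0,834274952x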
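
As a check on the working in Example 2, the following minimal Python sketch (not part of the original notes) computes a and b from the raw marks using the summation form of the least-squares formulas. The function name `regression_y_on_x` and the variable names are illustrative assumptions.

```python
# Marks of the 10 students from Example 1 (x = Maths, y = Physics).
maths = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]
physics = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]


def regression_y_on_x(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    # b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = regression_y_on_x(maths, physics)
print(round(a, 4), round(b, 4))  # 2.4567 0.8343
```

The printed values agree with the hand calculation to four decimal places.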

• The linear regression statistics mode (y = a + bx) on your calculator lets you
input the pairs of data (x; y) and then recall values such as a, b and n, together with
all the summations. For example, when using a Sharp EL-531WH:
1. Set mode 1-1.
2. To input the pair (4; 6) press 4 (x,y) 6 DATA, i.e. 4 STO 6 M+. Enter all the pairs
in the same way.


3. To get your values press RCL followed by the quantity you need, for example
RCL b.
NB: Learn how to use your calculator from the calculator user guide.

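4. REGRESSION LINE x on y

• It is of the form x = c + dy, where d is the regression coefficient (the amount by
which x changes for a unit increase in y).

• To find the line of best fit you need to calculate the values of c and d.

$d = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$, which can be written as $d = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$, and $c = \bar{x} - d\bar{y}$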

Example 3
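Using the results from Example 2, calculate the regression line of x on y.
SOLUTION

$d = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{10(1375) - (106)(113)}{10(1453) - (113)^2} = \dfrac{1772}{1761} = 1,006246451$

$c = \bar{x} - d\bar{y} = 10,6 - 1,006246451(11,3) = -0,770584896$

The regression line x on y is x = -0,770584896 + 1,006246451y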
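
The x on y line uses the same calculation as the y on x line with the roles of the two variables swapped. The Python sketch below (an illustration, not part of the original notes; the helper name `least_squares` is an assumption) shows this, and also shows that the x on y line is not obtained by simply rearranging the y on x equation.

```python
maths = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]    # x
physics = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]   # y


def least_squares(u, v):
    """Return (intercept, slope) of the least-squares line v = intercept + slope*u."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uu = sum(p * p for p in u)
    sum_uv = sum(p * q for p, q in zip(u, v))
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    intercept = sum_v / n - slope * sum_u / n
    return intercept, slope


a, b = least_squares(maths, physics)   # y on x: y = a + b*x
c, d = least_squares(physics, maths)   # x on y: x = c + d*y (roles swapped)

print(round(c, 4), round(d, 4))  # -0.7706 1.0062
# Rearranging y = a + b*x would give a gradient of 1/b for x in terms of y,
# which differs from d unless the correlation is perfect.
print(round(1 / b, 4))           # 1.1986
```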

5. MAKING PREDICTIONS USING REGRESSION LINE

• A regression equation can be used to estimate the value of the variable y for a given
value of x, and vice versa.

• Estimation is reliable only when the given value lies within the range of the data
(interpolation).

• Estimation outside the range of the data (extrapolation) usually gives unreliable values.

• To estimate, substitute the given value into your equation and solve.
Example 4
Use the regression line y = 2,456685509 + 0,834274952x in example 2 to find the
physics mark for a maths mark of 15 to the nearest whole number.
SOLUTION
Maths = x, so substitute x = 15 into the equation.
y = 2,456685509 + 0,834274952(15)
y = 14,97080979 ≈ 15
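
The substitution in Example 4, together with the range check recommended in this section, can be sketched in Python as follows. This is only an illustration: the function name `estimate_physics` and the constant names are assumptions, and the limits 4 and 20 are the smallest and largest Maths marks in Example 1.

```python
# Coefficients of the y-on-x line from Example 2.
A = 2.456685509
B = 0.834274952
X_MIN, X_MAX = 4, 20  # smallest and largest Maths marks in the data


def estimate_physics(maths_mark):
    """Return (estimate, reliable); reliable is False when extrapolating."""
    estimate = A + B * maths_mark
    reliable = X_MIN <= maths_mark <= X_MAX
    return estimate, reliable


estimate, reliable = estimate_physics(15)
print(round(estimate), reliable)   # 15 True
print(estimate_physics(30)[1])     # False (30 is outside the data range)
```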

6. FITTING REGRESSION LINE ON SCATTER PLOT

• The line of best fit should pass through the point (x̄; ȳ) on a scatter plot.

• To draw the line of best fit you need just three points, (x̄; ȳ) being one of the three
points.


• Take two other x values and calculate the y value for each using the regression
equation.

• These x values should be within the range of the x values, so the best choice is
the highest and the lowest.
Example 5
Fit the regression line y on x from Example 2 on a scatter plot.
SOLUTION
(x̄; ȳ) = (10,6; 11,3)
When x = 4, y = 5,8 using the equation, and when x = 20, y = 19,1. Therefore the
other two points are (4; 5,8) and (20; 19,1).
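
The reason the line of best fit always passes through (x̄; ȳ) follows from the formula a = ȳ − b x̄ used in Example 2. A short derivation, written in LaTeX as a sketch, substitutes x = x̄ into y = a + bx:

```latex
\begin{align*}
y &= a + b\bar{x} \\
  &= (\bar{y} - b\bar{x}) + b\bar{x} \\
  &= \bar{y}
\end{align*}
```

For the data in Example 2 this gives 2,456685509 + 0,834274952(10,6) = 11,3, which is ȳ.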

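7. PRODUCT MOMENT CORRELATION

• It is a numerical measure of linear correlation.

• The numerical value r is such that −1 ≤ r ≤ 1.

• r = 1 indicates a perfect positive correlation.

• r = −1 indicates a perfect negative correlation.

• r = 0 means there is no linear correlation.

$r = \dfrac{S_{xy}}{S_x S_y}$ where $S_{xy} = \dfrac{\sum xy}{n} - \bar{x}\bar{y}$ and $S_x = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$ and $S_y = \sqrt{\dfrac{\sum y^2}{n} - \bar{y}^2}$

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$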

Example 6
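For the data in Example 2, calculate the product moment correlation coefficient and
comment.
SOLUTION

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \dfrac{1772}{\sqrt{(2124)(1761)}}$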


= 0,9162348
There is a strong positive correlation between Maths and Physics marks.
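
The value of r in Example 6 can be checked with a short Python sketch (an illustration only; the function name `pearson_r` is an assumption) that applies the summation form of the product moment formula to the raw marks.

```python
import math

maths = [4, 9, 6, 10, 20, 7, 12, 17, 11, 10]    # x
physics = [6, 8, 9, 13, 20, 9, 10, 17, 13, 8]   # y


def pearson_r(x, y):
    """Product moment correlation coefficient from the summation formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(p * q for p, q in zip(x, y)) - sum_x * sum_y
    spread_x = n * sum(p * p for p in x) - sum_x ** 2
    spread_y = n * sum(q * q for q in y) - sum_y ** 2
    return numerator / math.sqrt(spread_x * spread_y)


r = pearson_r(maths, physics)
print(round(r, 4))  # 0.9162
```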

The r value can be retrieved from the calculator if the bivariate data is entered, for
example by pressing RCL r if using Sharp EL-531WH.

8. COEFFICIENT OF DETERMINATION

• It measures the strength of the linear relationship between the variables x and y.

• The higher the value of r² is, the better the fit.

• The coefficient of determination also gives the proportion of the variability in the
dependent variable that is explained by changes in the independent variable.

• It is obtained by squaring the product moment correlation coefficient r to give r².

• For example, from Example 6, r² = 0,9162348² = 0,83948621.

• It also shows the relationship between the regression coefficients of y on x and x on y,
i.e. b and d: r² is the product of the two (if y = a + bx and x = c + dy,
then r² = b × d).
From Examples 2 and 3 we have the equations y = 2,456685509 + 0,834274952x
and
x = −0,770584896 + 1,006246451y
Our r² = b × d
= 0,834274952 × 1,006246451
= 0,83948621
Note: This is the same value as that obtained by squaring the product moment
correlation coefficient.
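
The result r² = b × d can be seen directly from the summation formulas for the two regression coefficients and for r. The derivation below is a sketch in LaTeX using the standard least-squares expressions for b and d:

```latex
\begin{align*}
b \times d
  &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
     \times
     \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} \\
  &= \frac{\left(n\sum xy - \sum x \sum y\right)^2}
          {\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)} \\
  &= r^2
\end{align*}
```

Since b, d and r all share the numerator n Σxy − Σx Σy and have positive denominators, they always have the same sign, so r takes the sign of b and d when the square root of b × d is taken.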

9. RELATIONSHIP BETWEEN CORRELATION COEFFICIENT r AND REGRESSION LINES y on x and x on y.

• Regression lines y on x and x on y can be plotted on the same scatter plot.

• The two lines intersect at (x̄; ȳ). The closer together they are, the nearer r is to
1 or −1 (i.e. the stronger the positive or negative correlation).


• The following diagrams explain that:

[Figures: regression lines y on x and x on y drawn on the same axes]

• The value of r tells us how close the regression lines y on x and x on y are to
each other away from their point of intersection.

• The greater the magnitude of r, the closer the lines are to each other.
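
One way to put a number on how close the two regression lines are is the angle between them when both are drawn with x on the horizontal axis. The Python sketch below is an illustration only: the function name `angle_between_lines` is an assumption, the angle depends on the scales chosen for the axes, and the x on y line x = c + dy has gradient 1/d on such a diagram.

```python
import math


def angle_between_lines(b, d):
    """Angle in degrees between y = a + b*x and x = c + d*y on the same axes.

    The x-on-y line has gradient 1/d when y is on the vertical axis.
    Assumes d is not zero and equal scales on both axes.
    """
    return abs(math.degrees(math.atan(1 / d) - math.atan(b)))


# Regression coefficients from Examples 2 and 3 (r is about 0,916):
# the lines are only a few degrees apart.
print(angle_between_lines(0.834274952, 1.006246451))

# Perfect correlation: b * d = 1, so the two lines coincide.
print(angle_between_lines(2.0, 0.5))  # 0.0
```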

10. PAST EXAM QUESTIONS

1. Ten boys compete in throwing a cricket ball. The table shows the height of
each boy (x cm), measured to the nearest centimetre, and the distance (y m) to
which he can throw the ball.

Boys   A    B    C    D    E    F    G    H    I    J
x      122  124  133  138  144  156  158  161  164  168
y      41   38   52   56   29   54   59   61   63   67

Σx = 1 468; Σx² = 218 070; Σy = 520; Σy² = 28 382; Σxy = 77 689

Calculate
(i) The regression lines y on x and x on y.


(ii) Coefficient of determination and comment on its significance. [12]


[SPECIMEN P1 no 12]

2. When a car is switched on, the temperature (in °C) is measured and recorded at
eight different intervals of time (t). The results are given in the table below.

t (min)            20  30  40  50  60  70  80  90
Temperature (°C)   42  52  64  66  91  86  98  104

a) Represent the data on a scatter diagram. [3]
b) Calculate the equation of the regression line of temperature on t and fit it on the
scatter diagram. [6]
c) Estimate the temperature when t = 65 minutes using the fitted line. [2]
d) Calculate the product moment correlation coefficient and comment. [4]
[NOV 2019 P2 no 6]

3. The following table shows two sets of data D and M.

D 0 5 10 15 20 25 30 35
M 90 82 56 68 58 46 30 20
a) i) Draw a scatter diagram for the data.
ii) Comment on the relationship between the two sets of data. [4]
b) i) Calculate the product moment correlation coefficient,
ii) Comment on the product moment correlation value. [4]
c) i) Find the equation of the regression line M on D. [4]
ii) Use the regression line to estimate the value of M when D is
1. 12
2. 45 [4]
[NOV 2018 P2 no 7]

4. The data below summarise the altitude x (in metres) above sea level and the mean
air temperature y (in °C) for 10 weather stations.

i) Calculate the product moment correlation coefficient and comment on the
relationship between x and y. [3]
ii) Calculate the equation of the regression line of y on x, giving your answer in the
form y = a + bx. [4]
iii) Use your result from part ii) to estimate the mean air temperature at a place 250
m above sea level. [2]
[JUN 2014 P4 no 9]

5. Participants in a ZIMSEC workshop on syllabus interpretation were asked to
report the distance d, in kilometres, that they drove and the time t, in minutes, taken.
The table below gives a random sample of the values reported.

d(km) 263 211 290 580 473 377


t(min) 180 210 240 420 390 330

a) Plot these data on a scatter diagram. Use a scale of 2 cm to represent 50 km on the
horizontal axis and 2 cm to represent 50 minutes on the vertical axis. [3]
b) i) Obtain the equation of the estimated regression line of t on d. [4]
ii) Draw the regression line on your diagram. [1]
iii) Use the regression line to estimate the time taken by a participant who travelled 350
km. [2]
c) Find the product moment correlation coefficient between t and d. Comment on the results. [4]
[JUN 2008 P4 no 11]

11. SOLUTIONS TO EXAM PRACTICE QUESTIONS

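1. i) Regression line y on x:

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{10(77689) - (1468)(520)}{10(218070) - (1468)^2}$

$= \dfrac{13530}{25676}$
= 0,52695
$a = \bar{y} - b\bar{x}$
= 52 − (0,52695)(146,8)
= −25,356
y = −25,356 + 0,52695x
Regression line x on y:

$d = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{10(77689) - (1468)(520)}{10(28382) - (520)^2}$

$= \dfrac{13530}{13420}$
= 1,0082
$c = \bar{x} - d\bar{y}$
c = 146,8 − (1,0082)(52)
= 94,374
x = 94,374 + 1,0082y

ii) $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$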


$= \dfrac{13530}{\sqrt{(25676)(13420)}}$
= 0,7288
Coefficient of determination = r²
= 0,5312
53,12% of the variation in the distance thrown is explained by the variation in
height. The relationship between x and y is not strong.
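
The arithmetic for Question 1 can be checked from the summary totals given in the question. The Python sketch below is only an illustration (all variable names are assumptions); it uses the totals Σx = 1 468, Σx² = 218 070, Σy = 520, Σy² = 28 382 and Σxy = 77 689 with n = 10.

```python
import math

# Summary totals given in Question 1.
n = 10
sum_x, sum_xx = 1468, 218070
sum_y, sum_yy = 520, 28382
sum_xy = 77689

s_xy = n * sum_xy - sum_x * sum_y   # 13530
s_xx = n * sum_xx - sum_x ** 2      # 25676
s_yy = n * sum_yy - sum_y ** 2      # 13420

b = s_xy / s_xx                     # gradient of y on x
a = sum_y / n - b * sum_x / n       # intercept of y on x
d = s_xy / s_yy                     # coefficient of x on y
c = sum_x / n - d * sum_y / n       # intercept of x on y
r = s_xy / math.sqrt(s_xx * s_yy)   # product moment correlation coefficient

print(f"y = {a:.3f} + {b:.5f}x")
print(f"x = {c:.3f} + {d:.4f}y")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```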
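2.
a) [Scatter diagram of temperature against t]

b) Let x = t and y = temperature.
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{8(36940) - (440)(603)}{8(28400) - (440)^2}$

$= \dfrac{30200}{33600}$
= 0,8988
$a = \bar{y} - b\bar{x}$
= 75,375 − (0,8988)(55)
= 25,941
y = 25,941 + 0,8988x
The regression line of temperature on t is: temperature = 25,941 + 0,8988t
c) The estimated temperature when t = 65 minutes using the fitted line is 84 °C.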

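d) $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$

$= \dfrac{30200}{\sqrt{(33600)(28527)}}$
= 0,97545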


There is a very strong positive correlation between time and temperature.

3. a) i) [Scatter diagram of M (vertical axis, 0 to 100) against D (horizontal axis, 0 to 40)]

ii) The two sets of data have a negative correlation.

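b) i) $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$, with x = D and y = M

$= \dfrac{-15800}{\sqrt{(8400)(32412)}}$
= −0,95755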
ii) There is a strong negative correlation.

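c) i) Let x = D and y = M.
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{8(5900) - (140)(450)}{8(3500) - (140)^2}$

$= \dfrac{-15800}{8400}$
= −1,881
$a = \bar{y} - b\bar{x}$
= 56,25 − (−1,881)(17,5)
= 89,17
y = 89,17 − 1,881x, i.e. M = 89,17 − 1,881D
ii) The value of M when D is
1. 12 is 89,17 − 1,881(12) ≈ 66,6
2. 45 is 89,17 − 1,881(45) ≈ 4,5 (D = 45 is outside the range of the data, so this estimate is less reliable)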
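4. June 2014 no 9

i) $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$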


= −0,9051
There is a strong negative correlation between x and y.

ii) $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
= −0,007852967
$a = \bar{y} - b\bar{x}$
= 9,42 − (−0,00785)(106,7)
= 10,26
y = 10,26 − 0,00785x
iii) The mean air temperature at a place 250 m above sea level is
y = 10,26 − 0,00785(250)
= 8,30
Therefore the temperature is 8,30 °C.
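5. a) [Scatter diagram of t against d]

b) $\sum (d - 300) = 394$
$\sum d - \sum 300 = 394$
$\sum d - 6(300) = 394$
$\sum d - 1800 = 394$
$\sum d = 2194$

$\sum (d - 300)^2 = 123648$
$\sum (d^2 - 600d + 90000) = 123648$
$\sum d^2 - 600\sum d + 6(90000) = 123648$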


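$\sum d^2 - 600(2194) + 540000 = 123648$
$\sum d^2 = 900048$

$\sum (t - 200) = 570$
$\sum t - 1200 = 570$
$\sum t = 1770$

$\sum (t - 200)^2 = 103500$
$\sum (t^2 - 400t + 40000) = 103500$
$\sum t^2 - 400\sum t + 6(40000) = 103500$
$\sum t^2 - 400(1770) + 240000 = 103500$
$\sum t^2 = 571500$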
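
The step from the coded totals Σ(d − 300) = 394, Σ(d − 300)² = 123 648, Σ(t − 200) = 570 and Σ(t − 200)² = 103 500 to the uncoded totals can be sketched in Python. The helper name `decode_totals` is an assumption made for illustration; the cross-check at the end uses the raw values from the table in the question.

```python
def decode_totals(coded_sum, coded_sum_sq, k, n):
    """Recover (sum v, sum v^2) from sum(v - k) and sum((v - k)^2) for n values."""
    total = coded_sum + n * k                        # sum v = sum(v - k) + n*k
    # sum (v - k)^2 = sum v^2 - 2k*sum v + n*k^2, rearranged for sum v^2.
    total_sq = coded_sum_sq + 2 * k * total - n * k * k
    return total, total_sq


print(decode_totals(394, 123648, 300, 6))  # (2194, 900048)
print(decode_totals(570, 103500, 200, 6))  # (1770, 571500)

# Cross-check against the raw data given in the question.
d = [263, 211, 290, 580, 473, 377]
t = [180, 210, 240, 420, 390, 330]
assert (sum(d), sum(v * v for v in d)) == (2194, 900048)
assert (sum(t), sum(v * v for v in t)) == (1770, 571500)
```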
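i) Let d = x and t = y. The regression line t on d is the regression line y on x.

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{6(713730) - (2194)(1770)}{6(900048) - (2194)^2} = \dfrac{399000}{586652} = 0,680$

$a = \bar{y} - b\bar{x}$
= 295 − 0,68(365,7)
= 46,32
y = 46,32 + 0,680x, so t = 46,32 + 0,680d
ii) On the graph.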
iii) Substituting d = 350 into the regression line gives t = 46,32 + 0,680(350) = 284,32, so the estimated time taken by a participant who travelled 350 km is about 284 min.

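c) $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \dfrac{399000}{\sqrt{(586652)(296100)}}$

= 0,95733
There is a strong positive correlation between distance travelled and time.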
