National Junior College Mathematics Department 2016

National Junior College

2015 – 2016 H2 Mathematics
Correlation and Regression (Lecture Notes)

Topic 22: Correlation & Regression

Key Questions to Answer:

• What is bivariate data?
  o What is meant by an independent and a dependent variable?
• How do we plot a scatter diagram?
  o How do we determine if there is a linear relationship between the two variables from the scatter diagram?
• What does the product moment correlation coefficient, r, measure?
  o How do we calculate the product moment correlation coefficient for a given set of bivariate data?
  o How do we relate the value of the product moment correlation coefficient (in particular, values close to –1, 0 and 1) to the appearance of the scatter diagram?
  o Does zero correlation necessarily mean that there is no relationship between the two variables?
  o Does a high correlation between two variables imply that one directly causes the other?
• What is meant by linear regression?
  o What is a least squares regression line and how does it relate to the scatter diagram?
  o How do we determine the equation of a least squares regression line for a given set of bivariate data?
  o How do we interpret the values of the slope and intercept of a least squares regression line in a practical situation?
• How do we use a regression line to perform prediction or estimation of a value in a practical situation?
  o How is the choice of regression line used to perform estimation affected by
     - the existence of a dependence relationship between the two variables?
     - which variable we are estimating the value of (given the value of the other variable)?
  o How is the reliability of the estimate affected by
     - the strength of the linear relationship between the two variables (observed from the value of r and/or the scatter diagram)?
     - whether the given value (to input) falls inside or outside the data range for the variable (i.e. whether it is obtained through interpolation or extrapolation)?
• How do we perform linearisation on a set of bivariate data to fit a non-linear model?
  o Given different models (linear and/or non-linear), how do we determine which model fits the data the best?

2015 – 2016 / H2 Maths / Correlation and Regression Page 1 of 27


§1 Introduction

1.1 Bivariate Data and Scatter Diagrams

A set of data comprising the values of two variables, say x and y, obtained from the
same sample, is known as bivariate data.

Some examples of bivariate data include the following:

(a) Amount of advertising time for a product and number of sales for that
product
(b) Heights of persons and their ages
(c) Mathematics test scores and English test scores

When a set of bivariate data is plotted in the Cartesian plane, a scatter diagram (or
scatter plot) is produced. Some examples of scatter diagrams are as follows:

Figure 1.1. Scatter diagram for the Senior High 2 Lecture Test percentage scores (y) for
a class against their Senior High 1 Promotional Examination percentage scores (x)

1.2 Independent & Dependent Variables

In a set of bivariate data, one of the two variables may be affected or influenced by
the value of the other variable, which is controlled. In this case, the variable whose
values have been controlled is called the independent variable, while the other
variable is called the dependent variable.


For example, in the set of data on the amount of advertising time for a product and
number of sales for that product, the amount of advertising time is the independent
variable, while the number of sales for that product is the dependent variable.

On the other hand, in a set of data comprising the heights and intelligence quotients
(IQ) of a group of people, neither variable is likely to depend on the other.

Example 1.1.

In a city, the number of outlets for a particular café, x, and the number of car
accidents, y, are recorded over a period of time. The set of data obtained is given as
follows.

x 25 45 60 75 90
y 88 72 57 44 23

(i) Sketch a scatter diagram for this set of data.

(ii) Referring to the scatter diagram, describe a possible relationship between the
two variables x and y.
(iii) Explain whether the relationship you have observed in part (ii) suggests that
one variable directly causes the other variable.
(iv) If x and y represent the following variables instead, state whether or not one
variable depends on the other, and identify the independent and dependent
variables when that happens.
(a) Time passed (x), concentration of a substance in a solution (y)
(b) Air temperature (x), Wind speed (y)
(c) Mathematics Test scores (x), English Test scores (y)

Solution.

(i) The graphic calculator can be used as a tool to sketch the scatter diagram, as described in the following procedure:

No.  Keys to Press/Steps
1.   Press [key icon].
2.   Press [key icon]. Key in the x and y values into columns L1 and L2 respectively.
3.   Exit to the main screen. Then press [key icon].
4.   Adjust the settings accordingly.
5.   Press [key icon]. Then press [key icon] (for “ZoomStat”).


(ii) As x increases, y decreases at a roughly constant rate, i.e. there is a negative linear relationship between x and y.

(iii) No. The decrease in the number of car accidents could be due to a recent
campaign on road safety, while the increase in the number of outlets of the
café could be simply due to the café expanding its business at the same time.
It is not likely that increase in the number of outlets of the cafe has caused a
decrease in the number of car accidents.

(iv) (a) x is the independent variable; y is the dependent variable.

(b) y is the independent variable; x is the dependent variable.
(c) It is not evident whether x depends on y or vice versa.

Notes:

Care and caution are needed when interpreting a scatter diagram.

1. While there may appear to be a mathematical relationship between the two variables, it does not mean that there is a relationship in reality.

2. The appearance of a mathematical relationship does not imply that there is a causal relationship. An increase in one variable does not necessarily cause an increase (or decrease) in the other variable.

Further qualitative analysis is needed to ascertain the true effect of one variable on
the other.

§2 Linear Regression

As in Example 1.1, we may be interested in finding a mathematical relationship between the variables in the form y = f(x), so that we can estimate or predict the value of y given a value of x which does not appear in the set of bivariate data, or vice versa.

If it appears from the scatter diagram that a linear relationship is a sensible interpretation, we may then attempt to find a model for the relationship in the form of a regression line, i.e. f(x) = a + bx for some real constants a and b.


This is akin to finding a “best-fit” line, where a line is drawn on the scatter diagram such that there are as many points above the line as below it (or as many points to the left of the line as to the right of it). However, the choice of line is subject to each individual's personal judgement of “closeness”. Hence, there can be many possible “best-fit” lines for the same set of bivariate data.

In the following sub-section, we discuss how to work out the equation of a regression line mathematically.

2.1 The Method of Least Squares

Consider the scatter diagram in Example 1.1. To find a regression line mathematically, consider the vertical distances e1, e2, e3, e4 and e5 drawn from each point to a “best-fit” line which we have drawn for the data, as shown below.

Note that the values of e1, e2, e3, e4 and e5 represent the “y-errors”, i.e. the errors
in using the chosen line to estimate the values of y for the values of x given in the
data set.

Logically speaking, the values of the errors should be as small as possible if the line
chosen is indeed the best-fit line. Hence we aim to minimise the values of the errors
when calculating the equation of the regression line.

However, since these errors will be positive or negative according to whether the points are above or below the line, we work with the squares of these values instead and consider their sum,

$$\sum e_k^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2.$$

By minimising the sum $\sum e_k^2$, one will obtain the least squares regression line of y on x.

Contrastingly, if we were to instead consider the horizontal distances d1, d2, d3, d4
and d5 drawn from each point to a best-fit line, as shown below:

and minimise the values of the “x-errors” by minimising the sum of squares

$$\sum d_k^2 = d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2,$$

we would obtain the least squares regression line of x on y.

For the purpose of the H2 Maths syllabus, you are not required to find the
equations of the regression lines analytically. However, you will need to know
how to use the graphic calculator to obtain the equations of the regression lines,
which is illustrated in the next example.
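
Although the syllabus does not require it, the idea of "minimising the sum of squared y-errors" can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names are illustrative, and the closed-form slope b = S_xy / S_xx is the one derived in Appendix B. It uses the data from Example 1.1 and confirms that nudging the intercept or slope away from the least squares values only increases the sum of squared errors.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared y-errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx           # slope, as in Appendix B
    a = y_bar - b * x_bar     # intercept, so the line passes through the means
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Data from Example 1.1.
xs = [25, 45, 60, 75, 90]
ys = [88, 72, 57, 44, 23]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Sums of squared errors for lines with a slightly different intercept and/or slope.
nudged = [
    sum_squared_errors(xs, ys, a + da, b + db)
    for da in (-1.0, 0.0, 1.0)
    for db in (-0.05, 0.0, 0.05)
    if (da, db) != (0.0, 0.0)
]

print(f"y = {a:.3f} + ({b:.4f})x, sum of squared errors = {best:.3f}")
print(all(err > best for err in nudged))
```

Swapping the roles of the two lists (passing the y values first) gives the x on y line in the same way, since that line minimises the squared horizontal distances instead.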

Example 2.1.

Consider the following set of bivariate data.

x 25 50 60 80 90
y 80 90 50 44 10

(i) Find the equations of the regression lines of y on x and x on y, and sketch both
lines in a single scatter diagram.

(ii) Find the coordinates of the point of intersection between both lines found in
part (i). How do the values compare to the sample means of x and y in the set
of data?

Suppose the variables x and y are such that neither depends on the other.

(iii) Using a suitable regression line, estimate the value of

(a) y, when x = 70, and
(b) x, when y = 50.
Justify your choice of regression line for each of parts (iii)(a) and (iii)(b).

Suppose instead that x represents air temperature and y represents wind speed.

(iv) Will the choice of regression lines in parts (iii)(a) and (iii)(b) change? Why
or why not?

(v) Interpret the slope of the x on y line in the context of this question.


Solution.

(i) To find the equation of the regression line of y on x:

After entering the data into the GC (as illustrated in Example 1.1),

No.  Keys to Press/Steps
1.   Press [key icons] to enter the ‘CALC’ sub-menu.
2.   Press [key icon] to select ‘8: LinReg(a + bx)’ and enter L1 (x) and L2 (y) as the ‘Xlist’ and ‘Ylist’ respectively.
3.   Press [key icon], then [key icon] to calculate the equation of the regression line of y on x.

Hence the equation of the regression line of y on x is y = 119.85 – 1.0664x.


To find the equation of the regression line of x on y:

After entering the data into the GC (as illustrated in Example 1.1),

No.  Keys to Press/Steps
1.   Press [key icons] to enter the ‘CALC’ sub-menu.
2.   Press [key icon] to select ‘8: LinReg(a + bx)’ and enter L2 (y) and L1 (x) as the ‘Xlist’ and ‘Ylist’ respectively.
3.   Press [key icon], then [key icon] to calculate the equation of the regression line of x on y.

Hence the equation of the regression line of x on y is x = 99.080 – 0.69489y.

To sketch this line on the same axes, we make y the subject:

$$y = \frac{x - 99.080}{-0.69489}, \quad \text{i.e. } y = 142.58 - 1.4391x.$$

Sketching using the GC (with the scatter plot turned on as illustrated in
Example 1.1),


(ii) Rearranging the equations of the two regression lines, we have

1.066412214x + y = 119.851145
x + 0.694885897y = 99.07978512

Using the GC (PolySmlt2 App), the point of intersection between the y on x and x on y lines has coordinates (61.0, 54.8).

Using the “2-Var Stats” command in the GC (in the “STAT” “CALC” menu), sample mean of x = 61 and sample mean of y = 54.8, which coincide with the coordinates of the point of intersection between the two lines.


(iii) (a) Since we are estimating the value of y, we wish to minimise the y-errors. Hence the appropriate regression line to use in this case is the y on x line. Therefore, when x = 70,
Estimated value of y = 119.85 – 1.0664(70)
= 45.202
= 45.2 (to 3 significant figures)

Note: Premature rounding off in the equation of the line will lead to inaccuracy in the estimate, e.g. “Estimated value of y = 120 – 1.07(70) = 45.1 (to 3 s.f.)”.

(iii) (b) Since we are estimating the value of x, we wish to minimise the x-errors. Hence the appropriate regression line to use in this case is the x on y line. Therefore, when y = 50,
Estimated value of x = 99.080 – 0.69489(50)
= 64.336
= 64.3 (to 3 significant figures)

(iv) The choice of line for part (iii)(a) should be changed to the x on y line, while
the choice of line for part (iii)(b) should remain the same. As y is the
controlled variable, there is no error to speak of for y, and therefore the x on
y line should be used to carry out estimation in both parts.

(v) Every unit increase in wind speed (y) will lead to an approximate decrease
of 0.695 units in the air temperature (x).

Notes:

1. The choice of regression line to perform estimation in any general scenario is summarised in the following table:

Scenario          Estimate y given value of x    Estimate x given value of y
y depends on x    Use the y on x line.           Use the y on x line.
x depends on y    Use the x on y line.           Use the x on y line.
no dependence     Use the y on x line.           Use the x on y line.

2. If the sample means of the two variables are $\bar{x}$ and $\bar{y}$, then the point $(\bar{x}, \bar{y})$ lies on both the regression lines of y on x and x on y, i.e. the two lines must intersect at $(\bar{x}, \bar{y})$. (For the proof, refer to the appendix on the derivation of the working formulae to find the equations of the y on x and x on y lines.)
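
The results of Example 2.1 can be reproduced without a GC. The sketch below is an illustration only: `regression_line` is a hypothetical helper built from the working formulae in Appendix B. It computes both regression lines for the Example 2.1 data, finds their point of intersection, and evaluates the two estimates from part (iii).

```python
def regression_line(us, vs):
    """Least squares line of v on u, i.e. v = a + b*u. Returns (a, b)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b = s_uv / s_uu
    return v_bar - b * u_bar, b


# Data from Example 2.1.
xs = [25, 50, 60, 80, 90]
ys = [80, 90, 50, 44, 10]

a, b = regression_line(xs, ys)   # y on x:  y = a + b*x
c, d = regression_line(ys, xs)   # x on y:  x = c + d*y

# Intersection: substitute y = a + b*x into x = c + d*y and solve for x.
x_star = (c + d * a) / (1 - d * b)
y_star = a + b * x_star

y_at_70 = a + b * 70   # estimate y when x = 70, using the y on x line
x_at_50 = c + d * 50   # estimate x when y = 50, using the x on y line

print(f"y on x: y = {a:.2f} + ({b:.4f})x")
print(f"x on y: x = {c:.3f} + ({d:.5f})y")
print(f"intersection: ({x_star:.1f}, {y_star:.1f}); means: ({sum(xs) / 5}, {sum(ys) / 5})")
print(f"y(70) = {y_at_70:.1f}, x(50) = {x_at_50:.1f}")
```

Working with the unrounded coefficients, as the code does, avoids the premature-rounding problem noted in part (iii).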

§3 The Product Moment Correlation Coefficient

In the previous section, we have discussed how to use regression lines to carry out
estimation/prediction of values for a variable given a set of bivariate data.

However, the validity of performing this procedure to carry out estimation depends
on the assumption that a linear model is a good fit for the data we are given, which
is not always true. For example, the following scatter diagrams illustrate how
different sets of bivariate data may demonstrate different possible relationships (or
equivalently types of correlation) between the two variables.

[Figure: four scatter diagrams.
Positive correlation: Physics Marks (Y) against Maths Marks (X).
Negative correlation: Temperature (Y °C) against Altitude (X m).
Quadratic relation: Monthly Salary ($Y) against Age (X yrs).
No observable relation: plotted against Weight (X kg).]

While the scatter diagram can show clearly if a linear model is a good fit for the
data, this may not always be a practical approach to determine the degree of linear
correlation between the two variables, especially when we are dealing with a large
set of data. In this case, a possible alternative to using the scatter diagram would be
to use a certain measure called the product moment correlation coefficient
(conventionally denoted by r), which is calculated based on the following formula:

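$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)\left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}} \quad \text{(in MF15)}$$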

The value of r gives us an indication of the following:

Direction of correlation: If r is positive (negative), then the two variables are positively (negatively) correlated, i.e. one generally increases (decreases) in value as the other increases.

Strength of linear correlation: The closer the absolute value of r, |r|, is to 1, the stronger the linear correlation, i.e. the better a linear model fits the set of bivariate data.

Notes:

(1) For any set of data, –1 ≤ r ≤ 1.

(2) If r = 1, then the set of bivariate data demonstrates a perfect positive linear
correlation between the two variables.


If r = –1, then the set of bivariate data demonstrates a perfect negative linear
correlation between the two variables.

If a set of bivariate data demonstrates a perfect linear correlation, then ALL the points in the scatter diagram are collinear. In this case, the y on x and x on y lines are the same line, which passes through every single point in the scatter plot.

The following diagrams illustrate how the two regression lines appear for
different values of r.

[Figure: sketches of the y on x and x on y regression lines for each of the following cases.
Perfect positive linear correlation, r = 1 (y on x and x on y coincide)
Strong positive linear correlation, e.g. r = 0.8
Weak positive linear correlation, e.g. r = 0.4
No linear correlation, r = 0
Weak negative linear correlation, e.g. r = –0.4
Strong negative linear correlation, e.g. r = –0.9
Perfect negative linear correlation, r = –1 (y on x and x on y coincide)]

In summary, the closer the absolute value of r is to 1, i.e. the stronger the linear relationship demonstrated, the closer the two regression lines are to each other.

Note that, whenever the two lines do not coincide, the x on y line is always STEEPER than the y on x line (with y plotted on the vertical axis).

(3) r is a dimensionless quantity, i.e. it has NO units, regardless of the units of each variable.

(4) The value of r depends on the given set of bivariate data and hence it may
change in value if more data pairs are added to the bivariate data (after
conducting more trials of the experiment), or if a data pair that has been
discovered to be an outlier has been removed from the data.

(5) On the other hand, the value of r is unaffected by any translation or positive scaling of either variable, since the strength of the linear relationship between any two variables is preserved under such linear transformations. (A reflection, i.e. scaling one variable by a negative factor, preserves |r| but reverses the sign of r.) For example, the value of r for a set of bivariate data for two variables, say temperature and amount of rainfall, will stay the same whether the temperatures are measured in Celsius or Fahrenheit, since the conversion from one unit to the other can be expressed as a linear relationship, $F = \frac{9}{5}C + 32$, where C and F are the temperatures measured in Celsius and Fahrenheit respectively.
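
Note (5) can be illustrated with a short Python sketch. The temperature and rainfall readings below are made up purely for illustration (they are not from the notes), and `pmcc` is a hypothetical helper implementing the MF15 formula. The sketch compares r before and after converting Celsius to Fahrenheit, and also shows what happens when one variable is negated: the magnitude of r is unchanged while its sign reverses.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient, using the MF15 form of the formula."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical readings, invented for this illustration only.
celsius = [24.0, 26.5, 28.0, 30.5, 31.0, 33.5]
rainfall_mm = [210.0, 180.0, 175.0, 120.0, 130.0, 90.0]

fahrenheit = [9 / 5 * c + 32 for c in celsius]   # F = (9/5)C + 32
negated = [-c for c in celsius]                  # reflection of the temperature variable

r_c = pmcc(celsius, rainfall_mm)
r_f = pmcc(fahrenheit, rainfall_mm)
r_neg = pmcc(negated, rainfall_mm)

print(f"r (Celsius)    = {r_c:.4f}")
print(f"r (Fahrenheit) = {r_f:.4f}")
print(f"r (negated)    = {r_neg:.4f}")
```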

Example 3.1.

A set of bivariate data between two variables x and y is given as follows. Calculate
the product-moment correlation coefficient, r, for this set of data.

x 2 1 3 3 4 4 4
y 1.5 2 3 3 3.5 4 4

Solution:

To find r given complete sample data

No.  Keys to Press/Steps
1.   Press [key icon], and turn “STAT DIAGNOSTICS” on.
2.   Press [key icon].
3.   Press [key icon]. Key in the x and y values into columns L1 and L2 respectively.
4.   Exit to the main screen. Then press [key icons] to enter the ‘CALC’ sub-menu.
5.   Press [key icon] to select ‘8: LinReg(a + bx)’ and enter L1 (x) and L2 (y) as the ‘Xlist’ and ‘Ylist’ respectively.
6.   Press [key icon], then [key icon]. This time, both the coefficients of the y on x line and the value of r will appear.

Thus, r = 0.905 (3 s.f.)

Note: The value of r will appear only if STAT DIAGNOSTICS has been turned
on.

Example 3.2.

A set of bivariate data comprising 7 data pairs for two variables x and y is collected. The data is summarised as follows.

$$\sum x = 21, \quad \sum x^2 = 71, \quad \sum y = 21, \quad \sum y^2 = 68.5, \quad \sum xy = 69.$$

Calculate the product-moment correlation coefficient, r.

Solution:

To find r given summarised sample data

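From the formulae booklet (MF15),

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)\left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}} = \frac{69 - \dfrac{(21)(21)}{7}}{\sqrt{\left(71 - \dfrac{21^2}{7}\right)\left(68.5 - \dfrac{21^2}{7}\right)}} = 0.905 \text{ (to 3 s.f.)}$$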
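
As a cross-check of Examples 3.1 and 3.2, the following Python sketch evaluates r in two ways: from the summarised sums (the MF15 form) and directly from the seven data pairs (the deviation form). The function names are illustrative and not from the notes. Both routes should agree, since the sums in Example 3.2 are those of the data in Example 3.1.

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """r from summarised data, using the MF15 form of the formula."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


def pmcc_from_data(xs, ys):
    """r from raw data, using deviations from the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den


# Summarised data from Example 3.2.
r_summary = pmcc_from_sums(n=7, sum_x=21, sum_y=21, sum_x2=71, sum_y2=68.5, sum_xy=69)

# Raw data from Example 3.1.
xs = [2, 1, 3, 3, 4, 4, 4]
ys = [1.5, 2, 3, 3, 3.5, 4, 4]
r_raw = pmcc_from_data(xs, ys)

print(round(r_summary, 3), round(r_raw, 3))  # both 0.905
```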


Example 3.3.

The table below shows four sets of bivariate data.

x1 y1 x2 y2 x3 y3 x4 y4
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.10 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.10 4 5.39 19 12.50
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89

(i) Find the values of the product moment correlation coefficient for each set of
data.

(ii) Sketch the scatter diagrams for each of the sets of the bivariate data.

(iii) Using your sketches in part (ii), comment on the strengths of the linear relationships between the two variables for each set of data, and compare the effectiveness of using the r-values with that of using the scatter diagrams to determine the strength of the linear relationship of a set of bivariate data.

Solution:

(i) One can verify that the correlation coefficients for the four sets of data are all
equal to 0.816.

(ii) The following scatter diagrams are drawn based on the above data.

[Figure: four scatter diagrams, one for each data set: y1 against x1, y2 against x2, y3 against x3 and y4 against x4.]

(iii) Based on the sketches in part (ii), the 3rd set of data demonstrates the strongest
linear relationship between the two variables, albeit with the presence of an
outlier.

Since all the r-values are the same, we cannot tell which data set has the
strongest linear relationship based on the r-value alone. Rather, we need to
look at the scatter diagram to help us decide.

In other words, the r-value alone is sometimes not enough to fully illustrate
the strength of the linear relationship between the two variables. The scatter
diagrams give a clearer and more complete picture in this aspect.
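
The claim in part (i) can be verified with a few lines of Python. This is a sketch only; `pmcc` is a hypothetical helper, and the data is copied from the table in Example 3.3. It prints r for each of the four data sets, rounded to 3 decimal places.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient from raw data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den


# Data sets 1 to 3 share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

datasets = {
    1: (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {k: pmcc(xs, ys) for k, (xs, ys) in datasets.items()}

for k, r in r_values.items():
    print(f"data set {k}: r = {r:.3f}")
```

All four values round to 0.816, even though the scatter diagrams look completely different, which is exactly why the r-value should be read together with the scatter diagram.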

Example 3.4.

(i) Sketch an example of a scatter diagram indicating the following:

“A linear (product-moment) correlation coefficient close to zero but there is an obvious relation between the variables.”

(ii) If the estimated product-moment correlation coefficient has a value close to +1 or to –1, explain why this need not imply that there is a linear relationship between the variables.

Solution

(i) [Figure: scatter diagram of Monthly Salary ($Y) against Age (X yrs).]

r is close to zero but there exists an obvious relationship (possibly quadratic) between the variables.


(ii) A value of r close to +1 or to –1 indicates a linear relationship only for values within the sample range. It does not tell us the relationship for data values outside this range. For example, if r is calculated using only the last four pairs of data in the above scatter diagram, the relationship appears almost linear, even though the full data set does not follow a straight line.

§4 Reliability of Estimates

From Example 2.1 (iii), we see how a regression line can be used to estimate the
value of one of the two variables for a set of bivariate data, given a value of the
other variable.

However, from Example 3.4 (ii), note that even if the bivariate data demonstrates a
linear relationship, adding more data points may show a different relationship
between the two variables altogether. Hence it is risky to use the regression line to
estimate the value of x (or y) given a value of y (or x) that lies outside the range of
values of y (or x) in the data set.

Therefore, to determine if an estimate (or predicted value) obtained from a

regression line is reliable, we consider the following.

1. Does the given value lie in the range of values of the data set?
   NO:  The estimate (or predicted value) is NOT reliable.
   YES: Go to question 2.

2. Is the absolute value of r close to 1, i.e. is there a strong linear relationship between the two variables based on the given data set?
   NO:  The estimate (or predicted value) is NOT reliable.
   YES: The estimate (or predicted value) is reliable.

The process of carrying out estimation from within the given data range is known
as interpolation, while the process of carrying out estimation from beyond the data
range is known as extrapolation.
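
The two checks above can be written as a small decision function. The Python sketch below is only an illustration: the function name and the cut-off `min_abs_r = 0.9` are assumptions made here, since the notes do not fix a numerical threshold for "close to 1". The readings are the y values from Example 4.1, where |r| = 0.999.

```python
def assess_estimate(given_value, observed_values, r, min_abs_r=0.9):
    """Apply the two reliability checks and return a short verdict.

    min_abs_r is an assumed cut-off for "|r| close to 1"; it is not from the notes.
    """
    low, high = min(observed_values), max(observed_values)
    if not (low <= given_value <= high):
        return "not reliable (extrapolation)"
    if abs(r) < min_abs_r:
        return "not reliable (weak linear correlation)"
    return "reliable (interpolation, strong linear correlation)"


# Instrument readings y from Example 4.1.
y_readings = [6.26, 5.47, 4.67, 3.91, 3.29, 2.28, 1.44]

print(assess_estimate(3.5, y_readings, -0.999))
print(assess_estimate(8.0, y_readings, -0.999))  # 8.0 is a made-up value outside the data range
print(assess_estimate(3.5, y_readings, 0.4))     # 0.4 is a made-up weak r, for contrast
```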

Example 4.1.

An instrument is used to measure the amount of Vitamin C in a given volume of liquid. It is standardised by using it on seven specimen solutions containing known amounts of Vitamin C, x, in micrograms/ml. The reading on the instrument is denoted by y. Corresponding values of x and y are given in the table:

x 100 200 300 400 500 600 700

y 6.26 5.47 4.67 3.91 3.29 2.28 1.44

(i) Find r, the product moment correlation coefficient between x and y.

(ii) Using a suitable regression line, estimate the value of x when y = 3.5.
(iii) Comment on the reliability of this estimate that you have obtained in part
(ii).


Solution
(i) By GC, r = –0.999.

(ii) Since the reading on the instrument depends on the amount of Vitamin C in the solution, we should use the y on x line (even though we are estimating x from a given value of y).

From the GC, the equation of the y on x regression line is given by:
y = –0.0079357143x + 7.077142857

When y = 3.5,

$$3.5 = -0.0079357x + 7.0771$$
$$x = \frac{3.5 - 7.0771}{-0.0079357} = 450.76 \approx 451 \text{ (to 3 s.f.)}$$

(iii) Since
• the given value of y, i.e. 3.5, is in the data range for y, [1.44, 6.26], and
• the value of r, –0.999, has an absolute value that is very close to 1,
this suggests that the estimated value is reliable.

Example 4.2.
An anemometer is used to estimate wind speed by observing the rotational speed of
its vanes. This speed is converted to wind speed by means of an equation obtained
from calibrating the instrument in a wind tunnel. In this calibration process, the
wind speed is fixed precisely and the resulting anemometer speed noted. For a
particular anemometer, this process produced the following set of data:

Actual wind speed, s (m/s):        1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9
Anemometer speed, r (revs/min):    30   38   48   58   68   80   92   106  120  134

(a) Obtain the equation of the estimated least squares regression line of r on s
and the line of s on r.
(b) If the actual wind speed is 1.65m/s, use an appropriate regression line to
estimate the rotational speed of the anemometer.
(c) Demonstrate, using the above regression line as an example, that it is
unwise to extrapolate beyond the range of the data.

Solution

(a) By GC, r = –90.8 + 116s and s = 0.78772 + 0.0085566r.

(b) Since the wind speed is fixed precisely, it is the independent variable.
Hence we use the line of r on s.
Therefore, r = –90.8 + 116(1.65) = 100.6 revs/min.

(c) Using the r on s line: When s = 0, r = –90.8 revs/min. That is, the
anemometer speed is negative! Thus, it is unwise to extrapolate beyond the
data range.
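
The danger of extrapolation in part (c) can be seen by evaluating the r on s line at wind speeds inside and outside the calibration range. The Python sketch below is illustrative only: `regression_line` is a hypothetical helper based on the Appendix B formulae, and the wind speeds 0.0 and 0.5 m/s are chosen here simply because they lie below the data range of 1.0 to 1.9 m/s.

```python
def regression_line(us, vs):
    """Least squares line of v on u, i.e. v = a + b*u. Returns (a, b)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b = s_uv / s_uu
    return v_bar - b * u_bar, b


# Calibration data from Example 4.2.
wind_speed = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]   # s, in m/s
anemometer = [30, 38, 48, 58, 68, 80, 92, 106, 120, 134]          # r, in revs/min

a, b = regression_line(wind_speed, anemometer)   # r on s:  r = a + b*s

inside = a + b * 1.65                       # interpolation, as in part (b)
outside = [a + b * s for s in (0.0, 0.5)]   # extrapolation below the data range

print(f"r on s: r = {a:.1f} + {b:.0f}s")
print(f"s = 1.65 gives r = {inside:.1f} revs/min")
print("s = 0.0 and 0.5 give r =", [round(v, 1) for v in outside])
```

Both extrapolated values are negative, which is physically impossible for a rotational speed, so the line should not be trusted outside the range of the calibration data.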


§5 Linearisation of Data

Suppose a strong but non-linear relationship can be observed from the data, say one of the form $y = \frac{a}{x} + b$. In this case, we can introduce another variable w, where $w = \frac{1}{x}$, and carry out linear regression between w and y instead (since y = aw + b).

The process of carrying out such a transformation to achieve linearity is known as linearisation of data. The following are some examples of how linearisation of data can be carried out for various non-linear models.

     Non-Linear Equation        Transformed Variables
(a)  $y = \frac{a}{x} + b$      y = aw + b, where $w = \frac{1}{x}$
(b)  $y = ax^2 + b$             y = au + b, where $u = x^2$
(c)  $y = a + b \ln x$          y = a + bv, where $v = \ln x$

Example 5.1.

An experiment is conducted to determine the relationship between the variables x and y. The following table gives the experimental values.

x 1 2 3 4 5 10 30 50
y 5.145 4.139 3.809 3.640 3.542 3.341 3.212 3.181

Find, correct to 4 decimal places, the product moment correlation coefficient between

(a) x and y,
(b) $\frac{1}{x}$ and y,
(c) ln x and y.

Use your answers to parts (a), (b) and (c) to explain which of $y = a + bx$, $y = a + \frac{b}{x}$, $y = a + b \ln x$ is the best model for this set of data.

Solution:

(a) From GC, r-value between x and y = –0.5994 (to 4 d.p.)


(b) To obtain the r-value (or regression model) for transformed data

No.  Keys to Press/Steps
1.   Press [key icon].
2.   Press [key icon]. Key in the x and y values into columns L1 and L2 respectively.
3.   Scroll towards the right and up into the header row to enter the cell with “L3”.
4.   Press [key icons] to define L3 as 1/L1, i.e. L3 is to comprise all the values of 1/x in the data set.
5.   Press [key icon] to generate all the values of 1/x.
6.   Exit to the main screen, ensure that STAT Diagnostics is on, and press [key icons] to enter the “LinReg(ax + b)” command. Enter L3 and L2 as the X and Y lists respectively.
7.   Select “Calculate” to obtain the r-value between 1/x and y. The regression line for y on 1/x is also obtained (but not required for this question).

Therefore, r-value between 1/x and y = 1.0000 (to 4 d.p.)

(c) Following a procedure similar to part (b) (set L4 as ln L1, then set X and Y
lists as L4 and L2 respectively in the “LinReg(ax + b)” command).
r-value between ln x and y = –0.8547 (to 4 d.p.)

Since |r| is closest to 1 for part (b), the best model for this set of data is $y = a + \frac{b}{x}$.
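
The comparison in Example 5.1 amounts to transforming x, computing r between the transformed values and y, and choosing the model with the largest |r|. A minimal Python sketch of this procedure is given below; `pmcc` and the dictionary of candidate models are illustrative names, not part of the notes or the GC workflow.

```python
from math import log, sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient from raw data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den


# Data from Example 5.1.
xs = [1, 2, 3, 4, 5, 10, 30, 50]
ys = [5.145, 4.139, 3.809, 3.640, 3.542, 3.341, 3.212, 3.181]

# Each candidate model is linear in a transformed version of x.
candidates = {
    "y = a + bx": xs,
    "y = a + b/x": [1 / x for x in xs],
    "y = a + b ln x": [log(x) for x in xs],
}

r_values = {model: pmcc(transformed, ys) for model, transformed in candidates.items()}
best_model = max(r_values, key=lambda model: abs(r_values[model]))

for model, r in r_values.items():
    print(f"{model:15s} r = {r:.4f}")
print("best model:", best_model)
```

The comparison uses |r| rather than r, since a strong negative linear correlation between the transformed variable and y is just as good a fit as a strong positive one.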


Example 5.2.

A car is placed in a wind tunnel and the drag force F for different wind speeds, v,
in appropriate units, is recorded. The results are shown in the table.

v 0 4 8 12 16 20 24 28 32 36
F 0 2.5 5.1 8.8 11.2 13.6 17.6 22.0 27.8 33.9

(i) Draw the scatter diagram for these values, labelling the axes clearly.

(ii) It is thought that the drag force F can be modelled by one of the formulae

$$F = a + bv \quad \text{or} \quad F = c + dv^2$$

Use your answer to part (i) to explain which of $F = a + bv$ or $F = c + dv^2$ is the better model. [2010/II/10 (modified)]

Solution:

(i) [Figure: scatter diagram of F against v.]

(ii) Since the points appear to follow a curve (or trend) that is increasing at an increasing rate (with respect to the variable v), the model $F = c + dv^2$ is the better model in this case.


Appendix A: Equivalence of the 2 Formulae for the Product Moment

Correlation Coefficient

To show that

x y
x x y y xy
r and r n
2 2 2 2
x x y y 2
x 2
y
x y
n n

are equivalent, we need to show the following results:

2
2
2
x
x x x
n
2
2 y
y y y2
n
x y
x x y y xy
n

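Proof

$$\begin{aligned}
\sum (x - \bar{x})^2 &= \sum \left(x^2 - 2x\bar{x} + \bar{x}^2\right) \\
&= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
&= \sum x^2 - 2\left(\frac{\sum x}{n}\right)\sum x + n\left(\frac{\sum x}{n}\right)^2 \\
&= \sum x^2 - \frac{2}{n}\left(\sum x\right)^2 + \frac{1}{n}\left(\sum x\right)^2 \\
&= \sum x^2 - \frac{1}{n}\left(\sum x\right)^2
\end{aligned}$$

The proof for $\sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$ is similar.

$$\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= \sum \left(xy - \bar{x}y - x\bar{y} + \bar{x}\bar{y}\right) \\
&= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\bar{x}\bar{y} \\
&= \sum xy - \frac{\sum x}{n}\sum y - \frac{\sum y}{n}\sum x + n\,\frac{\sum x}{n}\,\frac{\sum y}{n} \\
&= \sum xy - \frac{\sum x \sum y}{n}
\end{aligned}$$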
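
As a numerical sanity check of the three results, the sketch below evaluates both sides of each identity and both forms of r. It is an illustration only: the data are the wind tunnel values from Example 5.2, reused here as a convenient sample, and the variable names are hypothetical.

```python
import math

# Sample data (wind tunnel values from Example 5.2).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
y = [0, 2.5, 5.1, 8.8, 11.2, 13.6, 17.6, 22.0, 27.8, 33.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Left-hand sides: sums of deviations about the means.
dev_xx = sum((xi - x_bar) ** 2 for xi in x)
dev_yy = sum((yi - y_bar) ** 2 for yi in y)
dev_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Right-hand sides: raw sums only.
raw_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
raw_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
raw_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# The two formulae for r.
r_dev = dev_xy / math.sqrt(dev_xx * dev_yy)
r_raw = raw_xy / math.sqrt(raw_xx * raw_yy)

print(f"deviation form: r = {r_dev:.6f}")
print(f"raw-sum form:   r = {r_raw:.6f}")
```

The two printed values agree up to floating-point rounding, as the proof requires.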


Appendix B

B1 Least-Squares Regression Line of y on x: y = a + bx

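Now, $\sum e_i^2 = \sum (y_i - a - bx_i)^2$.

First, allow a to vary while keeping all others constant.

Differentiating wrt a, we get $\frac{\partial}{\partial a}\sum e_i^2 = -2\sum (y_i - a - bx_i)$, where $\partial$ denotes partial derivative.

Let $\frac{\partial}{\partial a}\sum e_i^2 = 0$, we get $\sum (y_i - a - bx_i) = 0$.

Hence, $\sum y_i = na + b\sum x_i$ --- Eqn (1)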

Next, allow b to vary while keeping all others constant.

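Differentiating wrt b, we get $\frac{\partial}{\partial b}\sum e_i^2 = 2\sum (y_i - a - bx_i)(-x_i)$.

Let $\frac{\partial}{\partial b}\sum e_i^2 = 0$, we get $-2\sum (x_i y_i - ax_i - bx_i^2) = 0$.

Hence, $\sum x_i y_i = a\sum x_i + b\sum x_i^2$ --- Eqn (2)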

Equations (1) and (2) are called the normal equations of y on x.

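[Eqn (1) $\times \sum x_i$] – [Eqn (2) $\times\, n$] gives $\sum x_i \sum y_i - n\sum x_i y_i = b\left(\sum x_i\right)^2 - bn\sum x_i^2$.

Thus, it can be shown that

$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{S_{xy}}{S_{xx}},$$

where $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ (or equivalently $\sum x^2 - n\bar{x}^2$) and $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ (or equivalently $\sum xy - n\bar{x}\bar{y}$).

[Eqn (1) $\div\, n$] gives $\bar{y} = a + b\bar{x} \Rightarrow a = \bar{y} - b\bar{x}$.

Thus, the equation of the regression line of y on x is given by:

$$y = (\bar{y} - b\bar{x}) + bx \;\Rightarrow\; y - \bar{y} = b(x - \bar{x}), \quad \text{where } b = \frac{S_{xy}}{S_{xx}}.$$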
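
The derivation can be checked numerically by solving the two normal equations directly and comparing with the closed form $b = S_{xy}/S_{xx}$, $a = \bar{y} - b\bar{x}$. The sketch below is an illustration under assumptions: it reuses the Example 5.2 wind tunnel data as sample values (x = v, y = F), solves the 2×2 system by Cramer's rule (a choice made here, not a method from the notes), and all variable names are hypothetical.

```python
import math

# Sample data (Example 5.2): x = v, y = F.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
y = [0, 2.5, 5.1, 8.8, 11.2, 13.6, 17.6, 22.0, 27.8, 33.9]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations of y on x:
#   sum_y  = n*a     + b*sum_x     (Eqn 1)
#   sum_xy = a*sum_x + b*sum_xx    (Eqn 2)
# Solve this 2x2 linear system for a and b by Cramer's rule.
det = n * sum_xx - sum_x ** 2
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

# Closed form from the derivation.
x_bar, y_bar = sum_x / n, sum_y / n
S_xy = sum_xy - sum_x * sum_y / n
S_xx = sum_xx - sum_x ** 2 / n
b_formula = S_xy / S_xx
a_formula = y_bar - b_formula * x_bar

print(f"normal equations: a = {a:.4f}, b = {b:.4f}")
print(f"closed form:      a = {a_formula:.4f}, b = {b_formula:.4f}")
```

Because $a = \bar{y} - b\bar{x}$, the fitted line always passes through the point $(\bar{x}, \bar{y})$.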


B2 Least-Squares Regression Line of x on y: x = a′ + b′y

Now, $\sum d_i^2 = \sum (x_i - a' - b'y_i)^2$.

Partially differentiating with respect to $a'$ and $b'$ as above, we get the following normal equations:

$\sum x_i = na' + b'\sum y_i$ --- Eqn (3)

$\sum x_i y_i = a'\sum y_i + b'\sum y_i^2$ --- Eqn (4)

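Solving equations (3) and (4), we get

$$b' = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum y_i^2 - n\bar{y}^2} = \frac{S_{xy}}{S_{yy}} \quad \text{and} \quad a' = \bar{x} - b'\bar{y},$$

where $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$ (or equivalently $\sum y^2 - n\bar{y}^2$) and $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ (or equivalently $\sum xy - n\bar{x}\bar{y}$).

Thus, the equation of the regression line of x on y is given by:

$$x = (\bar{x} - b'\bar{y}) + b'y \;\Rightarrow\; x - \bar{x} = b'(y - \bar{y}), \quad \text{where } b' = \frac{S_{xy}}{S_{yy}}.$$

Re-arranging this equation, we get

$$y - \bar{y} = \frac{1}{b'}(x - \bar{x}), \quad \text{where } b' = \frac{S_{xy}}{S_{yy}}.$$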
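
To see the x on y results in numbers, the sketch below computes $a'$ and $b'$ from the closed form, confirms that they satisfy Eqn (3) and Eqn (4), and evaluates the slope $1/b'$ of the re-arranged line. It is illustrative only: the Example 5.2 data are reused as sample values (x = v, y = F), and the names `eqn3_gap`, `eqn4_gap` and `slope_in_xy_plane` are hypothetical.

```python
# Sample data (Example 5.2): x = v, y = F.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
y = [0, 2.5, 5.1, 8.8, 11.2, 13.6, 17.6, 22.0, 27.8, 33.9]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_yy = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
x_bar, y_bar = sum_x / n, sum_y / n

# Closed form for the regression line of x on y.
S_xy = sum_xy - sum_x * sum_y / n
S_yy = sum_yy - sum_y ** 2 / n
b_prime = S_xy / S_yy
a_prime = x_bar - b_prime * y_bar

# Residuals of the normal equations; both should be zero up to rounding.
eqn3_gap = sum_x - (n * a_prime + b_prime * sum_y)
eqn4_gap = sum_xy - (a_prime * sum_y + b_prime * sum_yy)

# Slope of the same line when written as y - y_bar = (1/b')(x - x_bar).
slope_in_xy_plane = 1 / b_prime

print(f"a' = {a_prime:.4f}, b' = {b_prime:.4f}")
print(f"Eqn (3) gap = {eqn3_gap:.2e}, Eqn (4) gap = {eqn4_gap:.2e}")
print(f"slope of x on y line in the x-y plane = {slope_in_xy_plane:.4f}")
```

In general this slope $1/b'$ differs from the slope $b = S_{xy}/S_{xx}$ of the y on x line, so the two regression lines are distinct lines unless the points are perfectly collinear.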
