SEHH1008 Chapter 04 Correlation and Regression


SEHH1008 Mathematics and Statistics for College Students


Chapter 4 – Correlation and Regression
• Make a scatter diagram.
• Compute the sample correlation coefficient r.
• Investigate the meaning of the correlation coefficient r.
• State the least-squares criterion.
• Find the equation of the least-squares line. Graph the least-squares line.
• Use the least-squares line to predict a value of the response variable y for a specified value of the explanatory variable x.
• Use r² to determine explained and unexplained variation of the response variable y.



Scatter Diagram
A scatter diagram is a graph in which data pairs (𝑥, 𝑦) are plotted as individual points on a grid with horizontal axis 𝑥 and vertical axis 𝑦.

In ordered pairs (𝑥, 𝑦), 𝑥 is called the explanatory variable and 𝑦 is called the response variable.



Example 1 – Scatter Diagram
A study on the relation between amount of sleep and coffee consumption was conducted in a small factory with 8 workers. Each worker's weekly amount of sleep 𝑥 (in hours) and weekly coffee consumption 𝑦 (in cups) were recorded in the following table.

𝑥 (hours)   21   28   35   37   40   43   48   56
𝑦 (cups)    21   14   12    5   14    7   10    3
(a) Construct a scatter diagram for the data.
(b) Comment on the relationship between x and y.



Example 1 – Scatter Diagram
(a) Scatter diagram for the relation between amount of sleep and coffee consumption

[Figure: scatter diagram with Amount of Sleep (in hours) on the horizontal axis and Coffee Consumption (in cups) on the vertical axis.]

(b) Smaller values of x tend to be associated with larger values of y, and larger values of x tend to be associated with smaller values of y. In general, as the amount of sleep goes up, the coffee consumption goes down.

Linear Correlation
How do we find a line that “best fits” the data?
Use a downward-sloping line that lies close to the points.

[Figure: scatter diagram of Coffee Consumption (in cups) against Amount of Sleep (in hours).]



Linear Correlation
The “best-fitting” line should be the one that comes closest to each of the points of the scatter diagram.

• If the points of a scatter diagram are located so that no straight line is realistically a “good” fit, we then say that the points possess no linear correlation.
• If the points seem close to a straight line, we say the linear correlation is moderate to high, depending on how close the points lie to the line.
• If all the points do, in fact, lie on a line, then we have perfect linear correlation.

Linear Correlation
[Figure: scatter diagrams with no linear correlation, high linear correlation, and perfect linear correlation.]



Correlation Coefficient
The sample correlation coefficient 𝑟 is a numerical measurement that assesses the strength of a linear relationship between two variables 𝑥 and 𝑦. The population correlation coefficient 𝜌 is computed from all population data pairs (𝑥, 𝑦).



Sample Correlation Coefficient r
Properties of 𝑟:

1. 𝑟 is a unitless measurement between −1 and 1. In symbols, −1 ≤ 𝑟 ≤ +1.
• If 𝑟 = 1, there is perfect positive linear correlation.
• If 𝑟 = −1, there is perfect negative linear correlation.
• If 𝑟 = 0, there is no linear correlation.
• The closer 𝑟 is to 1 or −1, the better a line describes the relationship between the two variables 𝑥 and 𝑦.

2. Positive values of 𝑟 imply that as 𝑥 increases, 𝑦 tends to increase. Negative values of 𝑟 imply that as 𝑥 increases, 𝑦 tends to decrease.



Sample Correlation Coefficient r
Properties of 𝑟:

3. The value of 𝑟 is the same regardless of which variable is the explanatory variable and which is the response variable. In other words, the value of 𝑟 is the same for the pairs (𝑥, 𝑦) and the corresponding pairs (𝑦, 𝑥).

4. The value of 𝑟 does not change when either variable is converted to different units.
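Properties 3 and 4 can be checked numerically. The following is a minimal sketch, assuming the Example 1 sleep and coffee data and the computational formula for 𝑟 given in the procedure later in this chapter. The function name `pearson_r` and the hours-to-minutes conversion are illustrative choices, not part of the course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Example 1 data: weekly sleep (hours) and weekly coffee consumption (cups)
sleep_hours = [21, 28, 35, 37, 40, 43, 48, 56]
coffee_cups = [21, 14, 12, 5, 14, 7, 10, 3]

r_xy = pearson_r(sleep_hours, coffee_cups)
r_yx = pearson_r(coffee_cups, sleep_hours)       # property 3: swap x and y
sleep_minutes = [60 * h for h in sleep_hours]    # property 4: change units
r_minutes = pearson_r(sleep_minutes, coffee_cups)

print(round(r_xy, 4), round(r_yx, 4), round(r_minutes, 4))
```

All three printed values should agree, and the sign is negative because coffee consumption tends to fall as sleep rises.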



Sample Correlation Coefficient r

No Correlation             |𝑟| = 0.0
Very Weak Correlation      0.0 < |𝑟| < 0.2
Weak Correlation           0.2 ≤ |𝑟| < 0.4
Moderate Correlation       0.4 ≤ |𝑟| < 0.6
Strong Correlation         0.6 ≤ |𝑟| < 0.8
Very Strong Correlation    0.8 ≤ |𝑟| < 1.0
Perfect Correlation        |𝑟| = 1.0
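The bands in the table can be turned into a small lookup. This is a sketch only: the function name `describe_strength` is hypothetical, and it assumes the bands are applied to the magnitude of 𝑟, so that negative correlations are graded the same way as positive ones.

```python
def describe_strength(r):
    """Return the strength label from the table for a correlation coefficient r.

    Assumption: the bands are applied to |r|, so the sign of r is ignored here.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0.0:
        return "No Correlation"
    if size < 0.2:
        return "Very Weak Correlation"
    if size < 0.4:
        return "Weak Correlation"
    if size < 0.6:
        return "Moderate Correlation"
    if size < 0.8:
        return "Strong Correlation"
    if size < 1.0:
        return "Very Strong Correlation"
    return "Perfect Correlation"

print(describe_strength(0.9350))   # Very Strong Correlation
print(describe_strength(-0.45))    # Moderate Correlation
```

The label describes strength only. The direction (positive or negative) still has to be read from the sign of 𝑟.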



Positive Linear Correlation
If there is a positive linear relation between variables 𝑥 and
𝑦, then high values of 𝑥 are paired with high values of 𝑦, and
low values of 𝑥 are paired with low values of 𝑦.

[Figure: scatter diagram showing positive linear correlation.]



Negative Linear Correlation
In the case of negative linear correlation, high values of 𝑥
are paired with low values of 𝑦, and low values of 𝑥 are paired
with high values of 𝑦.

[Figure: scatter diagram showing negative linear correlation.]



Little or No Linear Correlation
If there is little or no linear correlation between 𝑥 and 𝑦, then both high and low 𝑥 values are sometimes paired with high 𝑦 values and sometimes paired with low 𝑦 values.

[Figure: scatter diagram showing little or no linear correlation.]



How to compute 𝑟
Procedure:

1. Using the data pairs, compute Σ𝑥, Σ𝑦, Σ𝑥², Σ𝑦² and Σ𝑥𝑦.

2. With 𝑛 = sample size, compute the sample correlation coefficient 𝑟:

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$



Example 2 – Sample Correlation Coefficient r
A study on the relationship between IQ scores and mathematics competence of Primary 1 students was conducted. The following data were collected from 8 Primary 1 students, where x is the student's IQ score and y is the student's score in a mathematics examination.

𝑥   95   98  100  102  104  110  116  125
𝑦   66   70   75   69   85   83   95   98

(a) Construct a scatter diagram. Do you expect r to be positive?
(b) Compute the sample correlation coefficient r, correct to 4 decimal places.
(c) Is there a linear correlation between IQ scores and mathematics competence of Primary 1 students?



Example 2 – Sample Correlation Coefficient r
(a) Scatter diagram for the relation between IQ scores and mathematics competence.

[Figure: scatter diagram with IQ scores on the horizontal axis and math exam scores on the vertical axis.]

It appears that as x values increase, y values also tend to increase. Thus, r should be positive.

Example 2 – Sample Correlation Coefficient r
(b)
  𝑥     𝑦      𝑥²     𝑦²      𝑥𝑦
  95    66    9025   4356    6270
  98    70    9604   4900    6860
 100    75   10000   5625    7500
 102    69   10404   4761    7038
 104    85   10816   7225    8840
 110    83   12100   6889    9130
 116    95   13456   9025   11020
 125    98   15625   9604   12250

Σ𝑥 = 850, Σ𝑦 = 641, Σ𝑥² = 91030, Σ𝑦² = 52385, Σ𝑥𝑦 = 68908

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{8(68908) - (850)(641)}{\sqrt{8(91030) - 850^2}\;\sqrt{8(52385) - 641^2}} = 0.934959 \approx 0.9350 $$
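The calculation in part (b) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and the data are the eight (𝑥, 𝑦) pairs of Example 2.

```python
from math import sqrt

# Example 2 data: IQ score (x) and mathematics exam score (y)
iq = [95, 98, 100, 102, 104, 110, 116, 125]
math_score = [66, 70, 75, 69, 85, 83, 95, 98]

n = len(iq)
sum_x = sum(iq)                                        # 850
sum_y = sum(math_score)                                # 641
sum_x2 = sum(x * x for x in iq)                        # 91030
sum_y2 = sum(y * y for y in math_score)                # 52385
sum_xy = sum(x * y for x, y in zip(iq, math_score))    # 68908

numerator = n * sum_xy - sum_x * sum_y                 # 6414
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(f"r = {r:.4f}")   # r = 0.9350
```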



Example 2 – Sample Correlation Coefficient r

(c) Since r is very close to 1, there is an indication of a very strong positive linear correlation between IQ scores and mathematics competence of Primary 1 students.



Linear Regression Line
Which linear regression line best fits the data?

[Figure: scatter diagram with IQ scores on the horizontal axis and math exam scores on the vertical axis.]



Least-squares Criterion
To find the linear equation that “best fits” the data points of the scatter diagram, we use the least-squares criterion: choose the line that minimizes the sum of the squares of the vertical distances between the data points (x, y) and the line.



Least-squares Criterion
𝑑 represents the difference between the 𝑦 coordinate of the data point and the corresponding 𝑦 coordinate on the line.

• If the data point lies above the line, 𝑑 is positive.
• If the data point lies below the line, 𝑑 is negative.

[Figure: data points around a line, with the vertical distance 𝑑 marked from each point to the line.]
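The criterion can be checked numerically. The sketch below assumes the Example 2 data and the slope and intercept formulas stated later in this chapter. It compares the sum of squared vertical distances for the least-squares line with that of a few slightly shifted lines. The function names and the chosen shifts are hypothetical, picked only for illustration.

```python
def sum_squared_distances(xs, ys, a, b):
    """Sum of d**2, where d = y - (a + b*x) is the signed vertical distance."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_coefficients(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Example 2 data: IQ score (x) and mathematics exam score (y)
iq = [95, 98, 100, 102, 104, 110, 116, 125]
math_score = [66, 70, 75, 69, 85, 83, 95, 98]

a, b = least_squares_coefficients(iq, math_score)
best = sum_squared_distances(iq, math_score, a, b)

# Hypothetical competing lines: shift the intercept or the slope slightly.
shifts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.01), (0.0, -0.01)]
others = [sum_squared_distances(iq, math_score, a + da, b + db) for da, db in shifts]

for (da, db), other in zip(shifts, others):
    print(f"intercept {da:+.2f}, slope {db:+.2f}: {other:.2f} (least squares: {best:.2f})")
```

Every shifted line should give a larger sum than the least-squares line, which is what the criterion requires.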



Least-squares Line
ŷ represents the value of the response variable 𝑦 estimated using the least-squares line and a given value of the explanatory variable 𝑥.

The equation of the least-squares line is

ŷ = 𝑎 + 𝑏𝑥

where ŷ is the estimated 𝑦 value, 𝑎 is the intercept of the line, 𝑏 is the slope of the line, and 𝑥 is the 𝑥 value.



Least-squares Line – Slope
The slope of the least-squares line tells us how many units the response variable 𝑦 is expected to change for each unit change in the explanatory variable 𝑥.

The number of units of change in 𝑦 for each unit change in 𝑥 is also called the marginal change of the response variable.



How to find the equation of ŷ = 𝑎 + 𝑏𝑥
Procedure:

1. Using the data pairs, compute Σ𝑥, Σ𝑦, Σ𝑥², Σ𝑦², Σ𝑥𝑦, x̄ and ȳ.

2. With 𝑛 = sample size, compute

• Slope 𝑏:

$$ b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} $$

• 𝑦-intercept 𝑎:

$$ a = \bar{y} - b\bar{x} $$

3. The equation of the least-squares line is ŷ = 𝑎 + 𝑏𝑥.



Example 3 – Least-squares Line ŷ = 𝑎 + 𝑏𝑥
(Example 2 continued)
A study on the relationship between IQ scores and mathematics competence of Primary 1 students was conducted. The following data were collected from 8 Primary 1 students, where x is the student's IQ score and y is the student's score in a mathematics examination.

𝑥   95   98  100  102  104  110  116  125
𝑦   66   70   75   69   85   83   95   98

(a) Find the equation of the least-squares line, correct to 4 decimal places.
(b) Graph the least-squares line on the scatter diagram constructed in Example 2.
(c) Interpret the meaning of the slope.
(d) Predict the mathematics exam score for a student with an IQ score of 107, correct to the nearest integer.

Example 3 – Least-squares Line ŷ = 𝑎 + 𝑏𝑥

(a)
  𝑥     𝑦      𝑥²     𝑦²      𝑥𝑦
  95    66    9025   4356    6270
  98    70    9604   4900    6860
 100    75   10000   5625    7500
 102    69   10404   4761    7038
 104    85   10816   7225    8840
 110    83   12100   6889    9130
 116    95   13456   9025   11020
 125    98   15625   9604   12250

Σ𝑥 = 850, Σ𝑦 = 641, Σ𝑥² = 91030, Σ𝑦² = 52385, Σ𝑥𝑦 = 68908

Example 3 – Least-squares Line ŷ = 𝑎 + 𝑏𝑥

Σ𝑥 = 850, Σ𝑦 = 641, Σ𝑥² = 91030, Σ𝑦² = 52385, Σ𝑥𝑦 = 68908

$$ \bar{x} = \frac{\sum x}{n} = \frac{850}{8} = 106.25 $$

$$ \bar{y} = \frac{\sum y}{n} = \frac{641}{8} = 80.125 $$

$$ b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{8(68908) - (850)(641)}{8(91030) - 850^2} = \frac{6414}{5740} = 1.1174 $$

$$ a = \bar{y} - b\bar{x} = \frac{641}{8} - \frac{6414}{5740}\cdot\frac{850}{8} = -38.6010 $$

The equation of the least-squares line is ŷ = −38.6010 + 1.1174𝑥.
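The slope and intercept for this example can be reproduced in code. This is a minimal sketch, assuming the eight (𝑥, 𝑦) pairs of Example 2; the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = sum_y / n - b * (sum_x / n)                                # a = y-bar - b * x-bar
    return a, b

# Example 2 data: IQ score (x) and mathematics exam score (y)
iq = [95, 98, 100, 102, 104, 110, 116, 125]
math_score = [66, 70, 75, 69, 85, 83, 95, 98]

a, b = least_squares_line(iq, math_score)
print(f"y-hat = {a:.4f} + {b:.4f}x")   # y-hat = -38.6010 + 1.1174x
```

Note that 𝑎 is computed from the unrounded slope. Rounding 𝑏 first and then computing 𝑎 can shift the last decimal place.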

Example 3 – Least-squares Line ŷ = 𝑎 + 𝑏𝑥

(b) The point (x̄, ȳ) is always on the least-squares line.

[Figure: scatter diagram with IQ scores on the horizontal axis and math exam scores on the vertical axis, with the point (x̄, ȳ) marked.]
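The fact that (x̄, ȳ) always lies on the least-squares line follows directly from the intercept formula 𝑎 = ȳ − 𝑏x̄. Substituting 𝑥 = x̄ into the line gives:

```latex
\hat{y}\,\big|_{x=\bar{x}} = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}
```

For this example, −38.601045 + 1.117422(106.25) ≈ 80.125, which is ȳ. So one convenient way to graph the line is to plot (x̄, ȳ) and one other point computed from the equation.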

Example 3 – Least-squares Line ŷ = 𝑎 + 𝑏𝑥

(c) The equation of the least-squares line is

ŷ = −38.6010 + 1.1174𝑥

The slope 𝑏 = 1.1174 indicates that if a student's IQ score (𝑥) increases by 1, his/her score in the mathematics exam (𝑦) is expected to increase by 1.1174.

(d) The predicted mathematics exam score for a student with an IQ score of 107 is

ŷ = −38.601045 + 1.117422(107) = 80.96 ≈ 81
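The prediction in part (d) is a direct evaluation of the line. The sketch below, with a hypothetical helper `predict`, evaluates it once with the unrounded coefficients and once with the 4-decimal coefficients from part (a), to show that both give the same integer answer here.

```python
def predict(a, b, x):
    """Estimated response y-hat = a + b*x for a given x."""
    return a + b * x

# Unrounded coefficients, rebuilt from the sums in Example 3(a)
b_exact = 6414 / 5740
a_exact = 641 / 8 - b_exact * (850 / 8)

# Coefficients rounded to 4 decimal places, as reported in part (a)
a_rounded, b_rounded = -38.6010, 1.1174

y_hat_exact = predict(a_exact, b_exact, 107)
y_hat_rounded = predict(a_rounded, b_rounded, 107)

print(f"{y_hat_exact:.2f} -> {round(y_hat_exact)}")       # 80.96 -> 81
print(f"{y_hat_rounded:.2f} -> {round(y_hat_rounded)}")   # 80.96 -> 81
```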

Coefficient of Determination 𝑟²
The coefficient of determination 𝑟² is a measure of the proportion of variation in 𝑦 that is explained by the regression line, using 𝑥 as the explanatory variable.

$$ r^2 = \frac{\left[n\sum xy - \left(\sum x\right)\left(\sum y\right)\right]^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]} $$

Coefficient of Determination 𝑟²
The value 𝑟² is the ratio of explained variation over total variation. That is, 𝑟² is the fractional amount of total variation in 𝑦 that can be explained by using the linear model ŷ = 𝑎 + 𝑏𝑥.

1 − 𝑟² is the fractional amount of total variation in 𝑦 that is unexplained variation, due to random chance or to the possibility of lurking variables that influence 𝑦.
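The ratio can be made concrete by computing the three kinds of variation directly. The sketch below assumes the Example 2 data and measures each variation as a sum of squared deviations: total variation about ȳ, explained variation of the fitted values about ȳ, and unexplained variation of the data about the fitted values. The variable names are illustrative.

```python
# Example 2 data: IQ score (x) and mathematics exam score (y)
iq = [95, 98, 100, 102, 104, 110, 116, 125]
math_score = [66, 70, 75, 69, 85, 83, 95, 98]

n = len(iq)
mean_x = sum(iq) / n
mean_y = sum(math_score) / n

# Least-squares slope and intercept
sum_xy = sum(x * y for x, y in zip(iq, math_score))
sum_x2 = sum(x * x for x in iq)
b = (n * sum_xy - sum(iq) * sum(math_score)) / (n * sum_x2 - sum(iq) ** 2)
a = mean_y - b * mean_x

fitted = [a + b * x for x in iq]   # y-hat for each observed x

total_variation = sum((y - mean_y) ** 2 for y in math_score)
explained_variation = sum((f - mean_y) ** 2 for f in fitted)
unexplained_variation = sum((y - f) ** 2 for y, f in zip(math_score, fitted))

r_squared = explained_variation / total_variation
print(f"r^2     = {r_squared:.4f}")       # r^2     = 0.8741
print(f"1 - r^2 = {1 - r_squared:.4f}")   # 1 - r^2 = 0.1259
```

For a least-squares line, the explained and unexplained parts add up to the total variation, which is why 𝑟² and 1 − 𝑟² together account for all of it.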

Example 4 – Coefficient of Determination 𝑟²
(Example 2 continued)
A study on the relationship between IQ scores and mathematics competence of Primary 1 students was conducted. The following data were collected from 8 Primary 1 students, where x is the student's IQ score and y is the student's score in a mathematics examination.

𝑥   95   98  100  102  104  110  116  125
𝑦   66   70   75   69   85   83   95   98

(a) Find the value of the coefficient of determination 𝑟², correct to 4 decimal places.
(b) What fractional part of the variability in 𝑦 can be associated with the variability in 𝑥?
(c) What fractional part of the variability in 𝑦 is not associated with a corresponding variability in 𝑥?

Example 4 – Coefficient of Determination 𝑟²
(a) Coefficient of determination

$$ r^2 = \frac{\left[n\sum xy - \left(\sum x\right)\left(\sum y\right)\right]^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]} = \frac{\left[8(68908) - (850)(641)\right]^2}{\left[8(91030) - 850^2\right]\left[8(52385) - 641^2\right]} = 0.874148 \approx 0.8741 $$

(b) About 87.41% of the variation in the mathematics exam score y can be explained by the corresponding variation in the IQ score x using the least-squares line.

(c) About 12.59% of the variation in the mathematics exam score y cannot be explained by the corresponding variation in the IQ score x using the least-squares line.