3.7 Linear Regression
Covariance
- a measure of correlation whose sign reveals whether the relationship between the variables is positive or negative
$s_{xy} = \frac{1}{n-1}\left(\sum xy - \frac{\sum x \sum y}{n}\right)$
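The covariance formula can be checked with a short Python sketch. The function name `covariance` is only illustrative, and the three data points are borrowed from Example #1 of these notes.

```python
def covariance(xs, ys):
    """Sample covariance via the computational formula
    s_xy = (sum(xy) - sum(x) * sum(y) / n) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)


# Illustrative data: the three points used in Example #1 of these notes.
xs = [2, 6, 7]
ys = [13, 20, 27]
s_xy = covariance(xs, ys)
print(s_xy)  # 17.5
```

The result is positive, so y tends to increase as x increases.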
Residual
- the difference between the values of $y$ at the data point and at the point that lies on the line of best fit and has the same $x$-coordinate as the data point
residual = observed − predicted (actual point minus the point on the regression line)
- positive residual = data point above the line of best fit; negative residual = data point below it
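A minimal sketch of the residual calculation in Python. The helper name `residuals` is hypothetical; the points and the line y = 2.5x + 7.5 are taken from Example #1 later in these notes.

```python
def residuals(points, slope, intercept):
    """Return observed y minus predicted y for each (x, y) point."""
    return [y - (slope * x + intercept) for x, y in points]


# Assumed for illustration: the points and fitted line y = 2.5x + 7.5
# from Example #1 of these notes.
points = [(2, 13), (6, 20), (7, 27)]
res = residuals(points, slope=2.5, intercept=7.5)

for (x, y), e in zip(points, res):
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"({x}, {y}): residual {e:+.1f}, point is {side} the line")
```

For a least-squares line the residuals add up to zero, which is a quick check on the arithmetic.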
r-value
- the coefficient of correlation for a linear regression (line of best fit)
Linear Correlation and Regression Practice
Determine the linear regression and coefficient of correlation for the following examples.
Example #1
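x      y      x²     y²     xy
2      13     4      169    26
6      20     36     400    120
7      27     49     729    189
Σx     Σy     Σx²    Σy²    Σxy
15     60     89     1298   335

n = 3
$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{3(335) - (15)(60)}{3(89) - (15)^2} = \frac{105}{42} = 2.5$
$b = \bar{y} - a\bar{x} = \frac{60}{3} - 2.5\left(\frac{15}{3}\right) = 20 - 12.5 = 7.5$
∴ the linear regression is $y = 2.5x + 7.5$
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \frac{3(335) - (15)(60)}{\sqrt{\left(3(89) - (15)^2\right)\left(3(1298) - (60)^2\right)}} = 0.945$
The linear correlation is strong and positive. The slope is the rate of change (change in y over change in x): for every 1-unit increase in x, the y-value increases by 2.5 units.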
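The hand calculation for Example #1 can be reproduced with a short Python sketch. The function name `linear_regression` is illustrative only; it applies the formulas for a, b and r to the raw data rather than to the pre-computed totals.

```python
import math


def linear_regression(xs, ys):
    """Return (a, b, r) for the least-squares line y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    num = n * sxy - sx * sy
    den_x = n * sxx - sx ** 2
    den_y = n * syy - sy ** 2
    if den_x == 0 or den_y == 0:
        raise ValueError("x and y must each take at least two distinct values")

    a = num / den_x                    # slope
    b = sy / n - a * sx / n            # intercept: mean(y) - a * mean(x)
    r = num / math.sqrt(den_x * den_y)  # coefficient of correlation
    return a, b, r


a, b, r = linear_regression([2, 6, 7], [13, 20, 27])
print(f"y = {a:.1f}x + {b:.1f}, r = {r:.3f}")  # y = 2.5x + 7.5, r = 0.945
```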
Example #2
Are the marks one receives in a course related to the amount of time spent studying the subject? To analyze
this possibility, a student took a random sample of 10 students who had enrolled in an accounting class last
semester. She asked each to report his or her mark in the course and the total number of hours spent studying
accounting. These data are listed here.
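Hours spent   40  42  37  48  25  44  41  48  35  28
Marks         77  63  79  86  51  78  83  90  65  47

Let x = hours spent and y = marks, with n = 10.
Σx = 388   Σy = 719   Σx² = 15592   Σy² = 53643   Σxy = 28798

a) Calculate the covariance.
$s_{xy} = \frac{1}{n-1}\left(\sum xy - \frac{\sum x \sum y}{n}\right) = \frac{1}{10-1}\left(28798 - \frac{(388)(719)}{10}\right) = 100.089$
The covariance is positive, so there is a positive linear correlation.

b) Calculate the coefficient of correlation.
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \frac{10(28798) - (388)(719)}{\sqrt{\left(10(15592) - (388)^2\right)\left(10(53643) - (719)^2\right)}} = 0.880$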
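As a cross-check on parts a) and b), the same two statistics can be computed from deviations about the means instead of from the column totals. This sketch relies on the standard identity r = s_xy / (s_x · s_y), which these notes do not state; the variable names are illustrative.

```python
import math

hours = [40, 42, 37, 48, 25, 44, 41, 48, 35, 28]
marks = [77, 63, 79, 86, 51, 78, 83, 90, 65, 47]
n = len(hours)

mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Sample covariance and standard deviations from deviations about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)) / (n - 1)
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in hours) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in marks) / (n - 1))

# Assumption: r = s_xy / (s_x * s_y), equivalent to the totals formula.
r = s_xy / (s_x * s_y)
print(f"s_xy = {s_xy:.3f}, r = {r:.3f}")  # s_xy = 100.089, r = 0.880
```

Both values agree with the totals-based calculation to three decimal places.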
c) What do the statistics you have calculated tell you about the relationship between marks and study times?
The r-value of 0.880 indicates a strong positive linear correlation between the two variables. This
suggests that marks tend to rise with study time: the more hours a student spent studying, the
higher the mark tended to be.
Example #3:
Least Squares Method for Finding Equation of a Line of Best Fit
Age (years), x    Annual income ($ thousands), y
33                33
25                31
19                18
44                52
50                56
54                60
38                44
29                35
Σx = 292   Σy = 329   Σx² = 11712   Σy² = 14975   Σxy = 13221

1) Substitute these totals into the formula for a.
$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{8(13221) - (292)(329)}{8(11712) - (292)^2} \approx 1.15$
2) To determine b, you also need the means of x and y.
$b = \bar{y} - a\bar{x} = \frac{329}{8} - 1.15\left(\frac{292}{8}\right) \approx -0.85$
3) The equation of the line of best fit is $y = 1.15x - 0.85$.
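The slope and intercept can be recomputed directly from the column totals. This sketch also shows, as an aside, how rounding a to 1.15 before computing b (as done in these notes) shifts the intercept slightly; the variable names are illustrative.

```python
# Column totals from the table (n = 8 employees).
n = 8
sum_x, sum_y = 292, 329
sum_x2, sum_xy = 11712, 13221

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
mean_x, mean_y = sum_x / n, sum_y / n

b_exact = mean_y - a * mean_x              # uses the unrounded slope
b_rounded = mean_y - round(a, 2) * mean_x  # uses a = 1.15, as in the notes

print(f"a = {a:.4f}")                           # a = 1.1504
print(f"b (unrounded a) = {b_exact:.3f}")       # b (unrounded a) = -0.864
print(f"b (a rounded to 1.15) = {b_rounded:.3f}")  # b (a rounded to 1.15) = -0.850
```

The difference is small here, but carrying extra decimal places in a until the last step keeps the intercept more accurate.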
4) You can use the equation of the line of best fit as a model.
5) Predict the income for an employee who is 21 and an employee retiring at age 65.
Age 21: $y = 1.15(21) - 0.85 = 23.3$
A 21-year-old employee will earn approx. $23,300.
Age 65: $y = 1.15(65) - 0.85 = 73.9$
A 65-year-old employee will earn approx. $73,900.
6) Determine which one is interpolation and which one is extrapolation from question #5.
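The prediction for the 21-year-old employee is interpolation because age 21 is within the range of the given data (ages 19 to 54).
The prediction for the 65-year-old employee is extrapolation because age 65 is outside the range of the given data.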
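Steps 5 and 6 can be sketched in Python as follows. The function names are hypothetical, and the code assumes income is recorded in thousands of dollars, as the predicted values in step 5 imply.

```python
def predict_income(age, slope=1.15, intercept=-0.85):
    """Predicted annual income in thousands of dollars (model from step 3)."""
    return slope * age + intercept


def prediction_type(age, observed_ages):
    """'interpolation' inside the observed age range, else 'extrapolation'."""
    lo, hi = min(observed_ages), max(observed_ages)
    return "interpolation" if lo <= age <= hi else "extrapolation"


ages = [33, 25, 19, 44, 50, 54, 38, 29]  # ages from the data table

for age in (21, 65):
    income = predict_income(age) * 1000  # assumed unit: $ thousands -> dollars
    print(f"age {age}: about ${income:,.0f} ({prediction_type(age, ages)})")
```

Extrapolated values such as the age-65 prediction should be treated with more caution, since the linear trend is only supported by data between ages 19 and 54.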