
3.7 Linear Regression

The document provides an overview of linear regression, including the least squares method for determining the line of best fit, covariance as a measure of correlation, and residuals as the differences between actual data points and predicted values. It also discusses the coefficient of determination and includes examples of calculating linear regression and correlation coefficients based on sample data. Additionally, it explains the concepts of interpolation and extrapolation in the context of predicting values using regression equations.

Uploaded by

dctutor2005
Copyright
© All Rights Reserved

Linear Regression

- a technique to find y = ax + b for the line of best fit
- "Least Squares Method"

slope:
$$a = \frac{n\sum xy - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

intercept:
$$b = \bar{y} - a\bar{x}$$
where $\bar{y}$ is the mean of all y-values and $\bar{x}$ is the mean of all x-values

linear regression equation: y = ax + b
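The slope and intercept formulas can be sketched in a few lines of Python. This is a minimal illustration and not part of the original notes: the function name `least_squares_line` is a hypothetical helper, and the three data points are the ones from Example #1 in the practice section.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line of best fit y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # slope: a = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # intercept: b = mean(y) - a * mean(x)
    b = sum_y / n - a * sum_x / n
    return a, b

# Data points from Example #1: (2, 13), (6, 20), (7, 27)
a, b = least_squares_line([2, 6, 7], [13, 20, 27])
print(a, b)  # prints 2.5 7.5
```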

Covariance
- a measure of correlation that reveals whether the relationship between the variables is positive or negative

$$s_{xy} = \frac{1}{n-1}\left[\sum xy - \frac{\sum x \sum y}{n}\right]$$
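As a rough sketch of how the sign of the covariance shows the direction of the relationship, the formula can be coded as below. The function name `sample_covariance` is hypothetical. The first data set is taken from Example #1; the second reverses its y-values and is made up purely for contrast.

```python
def sample_covariance(xs, ys):
    """Sample covariance: (1/(n-1)) * (sum(xy) - sum(x)*sum(y)/n)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)

# Example #1 data: y rises as x rises, so the covariance is positive.
s_up = sample_covariance([2, 6, 7], [13, 20, 27])

# Made-up contrast: same x-values, y-values reversed, so y falls as x rises.
s_down = sample_covariance([2, 6, 7], [27, 20, 13])

print(s_up, s_down)  # prints 17.5 -17.5
```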

Residual
- the difference between the value of y at the data point and the value of y at the point that lies on the line of best fit and has the same x-coordinate as the data point

$$\text{residual} = \text{observed} - \text{predicted}$$
(observed = the actual data point; predicted = the point on the regression line)

positive = data above the line of best fit
negative = data below the line of best fit

[Figure: data points and the line of best fit, with one residual marked; at the marked point the residual is 2.]
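A short sketch of the residual calculation, assuming the line y = 2.5x + 7.5 and the three data points from Example #1 in the practice section. The helper name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y, where predicted y = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

xs = [2, 6, 7]
ys = [13, 20, 27]
res = residuals(xs, ys, a=2.5, b=7.5)

for x, r in zip(xs, res):
    # A positive residual means the data point is above the line of best fit.
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x = {x}: residual = {r:+.1f} ({position} the line)")
```

For these points the residuals are 0.5, -2.5 and 2, and they add up to zero, which is always the case for a least squares line with an intercept.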

r-value = coefficient of correlation (for linear regression, line of best fit)

Coefficient of Determination (for non-linear regression, curve of best fit)
o $r^2$ (the square of the correlation coefficient)
o it measures the proportion of the variation in y that is explained by the variation in x
o it indicates how closely the data points follow the line/curve of best fit (it is not the percentage of points that lie exactly on it)
o $0 \le r^2 \le 1$; when $r^2 = 1$, the curve is a perfect fit
o this applies to any type of regression
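The sketch below illustrates the "proportion of variation explained" reading of r². It assumes the data and the line y = 2.5x + 7.5 from Example #1; the function name `correlation` and the variable names are hypothetical. For a least squares line, r² equals 1 minus (sum of squared residuals) / (total squared variation of y about its mean).

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r computed from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs, ys = [2, 6, 7], [13, 20, 27]
r = correlation(xs, ys)

# Proportion of the variation in y explained by the line y = 2.5x + 7.5
mean_y = sum(ys) / len(ys)
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_resid = sum((y - (2.5 * x + 7.5)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_resid / ss_total

print(round(r, 3), round(r ** 2, 3), round(explained, 3))  # prints 0.945 0.893 0.893
```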

Linear Correlation and Regression Practice
Determine the linear regression and coefficient of correlation for the following examples.
Example #1

x y x² y² xy
2 13 4 169 26
6 20 36 400 120
7 27 49 729 189

∑x ∑y ∑x² ∑y² ∑xy
15 60 89 1298 335

n = 3

$$a = \frac{3(335) - (15)(60)}{3(89) - (15)^2} = 2.5$$

$$b = \frac{60}{3} - 2.5\left(\frac{15}{3}\right) = 7.5$$

∴ the linear regression is y = 2.5x + 7.5

$$r = \frac{3(335) - (15)(60)}{\sqrt{\left(3(89) - (15)^2\right)\left(3(1298) - (60)^2\right)}} = 0.945$$

∴ It is a strong positive linear correlation.

The slope is the rate of change: $m = \frac{\Delta y}{\Delta x}$
For every increase of 1 in the x-value, the y-value will increase by 2.5 units.

Example #2:

Are the marks one receives in a course related to the amount of time spent studying the subject? To analyze
this possibility, a student took a random sample of 10 students who had enrolled in an accounting class last
semester. She asked each to report his or her mark in the course and the total number of hours spent studying
accounting. These data are listed here.
Hours spent (x): 40 42 37 48 25 44 41 48 35 28
Marks (y): 77 63 79 86 51 78 83 90 65 47

∑x = 388, ∑y = 719, ∑x² = 15592, ∑y² = 53643, ∑xy = 28798, n = 10

a) Calculate the covariance.

$$s_{xy} = \frac{1}{n-1}\left[\sum xy - \frac{\sum x \sum y}{n}\right] = \frac{1}{10-1}\left[28798 - \frac{(388)(719)}{10}\right] = 100.089$$

∴ there is a positive linear correlation.

b) Calculate the coefficient of correlation.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{10(28798) - (388)(719)}{\sqrt{\left(10(15592) - (388)^2\right)\left(10(53643) - (719)^2\right)}} = 0.880$$

∴ It is a strong positive linear correlation.
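The answers to parts a) and b) can be cross-checked with an equivalent form of the same formulas: the covariance written with deviations from the means, and r written as the covariance divided by the product of the two sample standard deviations. This is a sketch only; the variable names are hypothetical, and `statistics.stdev` is Python's standard-library sample standard deviation.

```python
import statistics

hours = [40, 42, 37, 48, 25, 44, 41, 48, 35, 28]
marks = [77, 63, 79, 86, 51, 78, 83, 90, 65, 47]

n = len(hours)
mean_h = sum(hours) / n
mean_m = sum(marks) / n

# Covariance in deviation form: sum of (x - mean_x)(y - mean_y), divided by n - 1.
cov = sum((h - mean_h) * (m - mean_m) for h, m in zip(hours, marks)) / (n - 1)

# r equals the covariance divided by the product of the sample standard deviations.
r = cov / (statistics.stdev(hours) * statistics.stdev(marks))

print(f"{cov:.3f} {r:.3f}")  # prints 100.089 0.880
```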

c) What do the statistics you have calculated tell you about the relationship between marks and study times?
The r-value indicates a strong positive linear correlation between the two variables. This
suggests that study time is strongly associated with the result: the more time spent
studying, the higher the marks tend to be.

Example #3:
Least Squares Method for Finding Equation of a Line of Best Fit
Age (years) (x)   Annual Income (thousands of $) (y)
33 33
25 31
19 18
44 52
50 56
54 60
38 44
29 35

∑x = 292, ∑y = 329, ∑x² = 11712, ∑y² = 14975, ∑xy = 13221, n = 8

1) Substitute these totals into the formula for a.

$$a = \frac{n\sum xy - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{8(13221) - (292)(329)}{8(11712) - (292)^2} = 1.15$$

2) To determine b, you also need the means of x and y.

$$b = \bar{y} - a\bar{x} = \frac{329}{8} - 1.15\left(\frac{292}{8}\right) = -0.85$$

3) Now substitute the values of a and b into the equation:

$$y = 1.15x - 0.85$$
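Steps 1 to 3 can be checked numerically from the totals. The sketch below (variable names are hypothetical) also shows a rounding detail: the value b = -0.85 comes from using a rounded to 1.15, while the unrounded slope gives an intercept closer to -0.864.

```python
# Totals from the table: n = 8 employees
n = 8
sum_x, sum_y = 292, 329
sum_x2, sum_xy = 11712, 13221

# Step 1: slope
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Step 2: intercept, with the unrounded slope and with the slope rounded to 1.15
b_exact = sum_y / n - a * sum_x / n
b_rounded = sum_y / n - round(a, 2) * sum_x / n

print(f"a = {a:.4f}")
print(f"b with unrounded a = {b_exact:.3f}")
print(f"b with a rounded to 2 decimals = {b_rounded:.3f}")
```

The difference is small for ages inside the data range, but it grows as x moves further from the mean age.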

4) You can use the equation of the line of best fit as a model.

5) Predict the income for an employee who is 21 and an employee retiring at age 65.

a) For a 21-year-old employee,

$$y = 1.15(21) - 0.85 = 23.3$$

∴ a 21-year-old employee will earn approx. $23300.

b) For a 65-year-old employee,

$$y = 1.15(65) - 0.85 = 73.9$$

∴ a 65-year-old employee will earn approx. $73900.

6) Determine which one is interpolation and which one is extrapolation from question #5.
The prediction for the 21-year-old employee is interpolation because it is within the given interval of the independent variable (ages 19 to 54).
The prediction for the 65-year-old employee is extrapolation because it is outside the given interval of the independent variable.
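The predictions and the interpolation/extrapolation check can be sketched as follows. The function names `predict_income` and `kind_of_prediction` are hypothetical, and the sketch assumes that y is measured in thousands of dollars, as the answers to question #5 imply.

```python
# Ages (x-values) from the data table
ages = [33, 25, 19, 44, 50, 54, 38, 29]

def predict_income(age):
    """Predicted income in thousands of dollars from the model y = 1.15x - 0.85."""
    return 1.15 * age - 0.85

def kind_of_prediction(age, observed):
    """Interpolation if the age lies within the observed range, otherwise extrapolation."""
    if min(observed) <= age <= max(observed):
        return "interpolation"
    return "extrapolation"

for age in (21, 65):
    print(age, round(predict_income(age), 1), kind_of_prediction(age, ages))
# prints:
# 21 23.3 interpolation
# 65 73.9 extrapolation
```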

7) Comment on the accuracy of both estimates.
The model needs a restriction on the domain because children and retirees should be excluded.
