Math 141: Lecture 18: Correlation and Regression


Math 141

Lecture 18: Correlation and Regression

Albyn Jones
Library 304
[email protected]
www.people.reed.edu/~jones/courses/141

Albyn Jones Math 141


Correlation

Definition: Correlation Coefficient

\[
\mathrm{Cor}(X, Y) = \rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}
\]

Recall: the correlation coefficient measures the strength of linear association. It does not measure non-linear association!

People often compute correlation coefficients as if they were general measures of association. Do not be fooled!
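A minimal R sketch of both points, assuming made-up data (the vectors `x`, `y`, and `y2` are illustrative and not from the lecture). It computes the correlation directly from the definition, then shows a pair that is perfectly dependent but non-linear, whose correlation is essentially zero.

```r
# Illustrative data (an assumption, not from the lecture)
x <- seq(-3, 3, by = 0.1)        # grid symmetric about 0
y <- 2 * x + 1                   # exact linear relationship

# Correlation from the definition: Cov(X, Y) / (sd(X) * sd(Y))
r_def     <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)           # same quantity via the built-in function

# A perfect but non-linear relationship: y2 is a function of x, yet the
# correlation is zero (up to rounding) because x is symmetric about 0
y2 <- x^2
r_nonlinear <- cor(x, y2)
```

Here `r_def` and `r_builtin` agree, while `r_nonlinear` is zero up to rounding even though `y2` is completely determined by `x`.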



Correlation Examples
ρ = 0

[Figure: scatterplot titled "Correlation 0"; vertical axis y from −3 to 3, horizontal axis from −3 to 3.]


Correlation Examples
ρ = .25

[Figure: scatterplot titled "Correlation .25"; vertical axis Z from −3 to 4, horizontal axis from −3 to 3.]


Correlation Examples
ρ = .5

[Figure: scatterplot titled "Correlation .5"; vertical axis Z from −3 to 3, horizontal axis from −3 to 3.]


Correlation Examples
ρ = .75

[Figure: scatterplot titled "Correlation .75"; vertical axis Z from −6 to 4, horizontal axis from −3 to 3.]


Correlation Examples
ρ = .95

[Figure: scatterplot titled "Correlation .95"; vertical axis Z from −10 to 10, horizontal axis from −3 to 3.]
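Scatterplots like these can be approximated by simulation. The following is a sketch under stated assumptions: the helper `sim_cor`, the seed, and the sample size are all hypothetical choices, and the construction (standard normal variables mixed to a target correlation) is not necessarily how the lecture's plots were generated.

```r
# Simulate n pairs (x, z) whose population correlation is rho.
# Construction (an assumption): z = rho * x + sqrt(1 - rho^2) * e,
# with x and e independent N(0, 1), so Var(z) = 1 and Cor(x, z) = rho.
sim_cor <- function(n, rho) {
  x <- rnorm(n)
  e <- rnorm(n)
  z <- rho * x + sqrt(1 - rho^2) * e
  data.frame(x = x, z = z)
}

set.seed(141)                        # arbitrary seed, for reproducibility
rhos <- c(0, 0.25, 0.5, 0.75, 0.95)  # the correlations shown on the slides
sims <- lapply(rhos, function(rho) sim_cor(10000, rho))
sample_cors <- sapply(sims, function(d) cor(d$x, d$z))

# One scatterplot per target correlation
for (i in seq_along(rhos)) {
  plot(sims[[i]]$x, sims[[i]]$z, xlab = "x", ylab = "z",
       main = paste("Correlation", rhos[i]))
}
```

The sample correlations in `sample_cors` should land close to the targets in `rhos`, with small differences due to sampling variability.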


Caution!

\[
\rho_{x,y} = .58, \qquad \rho_{x^2,y} = .99
\]

[Figure: scatterplot with vertical axis Y from 0 to 50 and horizontal axis from −4 to 6.]
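The same effect can be reproduced in R. This is a sketch with invented data, assuming `y` is a noisy quadratic function of `x` over a similar range; the lecture's actual data are not given, so the numbers will not match the slide exactly.

```r
# Illustrative data (an assumption): y is a noisy quadratic function of x
set.seed(141)                          # arbitrary seed, for reproducibility
x <- seq(-4, 6, length.out = 200)      # similar x range to the plot
y <- x^2 + rnorm(length(x), sd = 1)

cor_x_y  <- cor(x, y)      # moderate: the relationship is not linear in x
cor_x2_y <- cor(x^2, y)    # near 1: the relationship is linear in x^2

plot(x, y)                 # the curvature is obvious in the plot
```

The moderate value of `cor_x_y` understates how strongly `y` depends on `x`; only the plot, or the correlation with the transformed variable, reveals it.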


Regression: Terminology

Linear Regression: fitting lines to data

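[Figure: scatterplot with a fitted line; vertical axis Y from 0 to 15, horizontal axis from 2 to 10. Labels mark an observation (OBS), its fitted value on the line (FIT), and the residual between them (RESIDUAL).]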


Reminder: Equation of a line

\[
Y = \beta_0 + \beta_1 X
\]

$\beta_0$: the intercept.
$\beta_1$: the slope.


The Regression Model

\[
Y = \beta_0 + \beta_1 X + \epsilon
\]

$Y$: the response variable, also called the dependent variable.
$X$: the explanatory variable, also called the independent variable.
$\beta_0$: the intercept.
$\beta_1$: the slope.
$\epsilon$: the error term, or deviation from the line.


The Regression Model, Continued

\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i
\]

We will usually be working with a model for the error term:

\[
\epsilon_i \sim \text{IID } N(0, \sigma^2)
\]
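To make the model concrete, here is a minimal sketch that generates data from it. The parameter values, sample size, and the uniform distribution for `x` are arbitrary illustrative choices, not values from the lecture.

```r
set.seed(141)            # arbitrary seed, for reproducibility
n     <- 100             # sample size (illustrative)
beta0 <- 2               # intercept (illustrative)
beta1 <- 0.5             # slope (illustrative)
sigma <- 1               # error standard deviation (illustrative)

x   <- runif(n, 0, 10)                 # explanatory variable
eps <- rnorm(n, mean = 0, sd = sigma)  # IID N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps         # response generated by the model

plot(x, y)
abline(a = beta0, b = beta1)           # the true line; points scatter around it
```

Each observation is the value on the true line plus its own independent normal error, which is exactly what the two displayed equations say.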


Regression: fitting lines to data

Question: What is a 'good fit'?

Answer: We want the line to be close to the points in some sense: minimize the distances from the points to the line.

For example: choose $(\beta_0, \beta_1)$ to minimize the sum of the absolute values of the residuals $r_i = Y_i - (\beta_0 + \beta_1 X_i)$:

\[
\min_{\beta_0, \beta_1} \sum_i |Y_i - (\beta_0 + \beta_1 X_i)| = \min_{\beta_0, \beta_1} \sum_i |r_i|
\]

Or: minimize the sum of the squares of the residuals:

\[
\min_{\beta_0, \beta_1} \sum_i (Y_i - (\beta_0 + \beta_1 X_i))^2 = \min_{\beta_0, \beta_1} \sum_i r_i^2
\]
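Both criteria can be minimized numerically. The sketch below uses invented data and base R's general-purpose optimizer `optim()`; the function names `sum_abs` and `sum_sq` are hypothetical, and using `optim()` here is only for illustration, not the method used in the course.

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- runif(50, 0, 10)
y <- 1 + 2 * x + rnorm(50)

# The two criteria as functions of b = c(beta0, beta1)
sum_abs <- function(b) sum(abs(y - (b[1] + b[2] * x)))   # sum of |r_i|
sum_sq  <- function(b) sum((y - (b[1] + b[2] * x))^2)    # sum of r_i^2

# Minimize each numerically (Nelder-Mead is optim's default method)
ctrl    <- list(reltol = 1e-12, maxit = 5000)
fit_abs <- optim(c(0, 0), sum_abs, control = ctrl)  # least absolute deviations
fit_sq  <- optim(c(0, 0), sum_sq,  control = ctrl)  # least squares

fit_abs$par       # (beta0, beta1) found for the absolute-value criterion
fit_sq$par        # (beta0, beta1) found for the squared criterion
coef(lm(y ~ x))   # lm() solves the least squares problem directly
```

The two criteria generally give similar but not identical lines; the squared criterion is the one `lm()` uses.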


Least Squares

Around 1800, mathematicians (notably Legendre and Gauss) realized that minimizing the sum of the squares of distances (which has come to be known as the Residual Sum of Squares)

\[
\mathrm{RSS}(\beta_0, \beta_1) = \sum_i (Y_i - (\beta_0 + \beta_1 X_i))^2
\]

was analytically tractable, via calculus: it is a quadratic function of the parameters $(\beta_0, \beta_1)$.
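The calculus step can be sketched as follows. This is the standard derivation, written with $\bar{X}$ and $\bar{Y}$ for the sample means: set both partial derivatives of RSS to zero.

```latex
\begin{align*}
\frac{\partial\,\mathrm{RSS}}{\partial \beta_0}
  &= -2 \sum_i \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0, \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_1}
  &= -2 \sum_i X_i \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0.
\end{align*}
```

The first equation gives $\beta_0 = \bar{Y} - \beta_1 \bar{X}$. Substituting this into the second and rearranging gives $\beta_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$.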


Solutions

\[
\hat\beta_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
\]

\[
\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}
\]
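These formulas are easy to check numerically. The sketch below uses invented data and compares the hand-computed estimates `b0` and `b1` (hypothetical names) with the coefficients returned by `lm()`.

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- runif(30, 0, 10)
y <- 3 - 0.7 * x + rnorm(30)

xbar <- mean(x)
ybar <- mean(y)

# Least squares estimates from the closed-form solutions
b1 <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)   # slope
b0 <- ybar - b1 * xbar                                   # intercept

fit <- lm(y ~ x)
c(b0, b1)     # hand-computed estimates
coef(fit)     # the same values from lm()
```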


Connection to Correlation

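The sample correlation coefficient is

\[
r = \hat\rho = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}
\]

Compare to the LSE for the slope:

\[
\hat\beta_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = r \, \frac{\sqrt{\sum_i (Y_i - \bar{Y})^2}}{\sqrt{\sum_i (X_i - \bar{X})^2}}
\]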


Correlation is Regression!

In other words:

\[
\beta_1 = \rho \, \frac{\sigma_y}{\sigma_x}
\]

Conclusion: don't use the correlation coefficient as a measure of association unless the relationship is linear! The correlation coefficient is just the slope of the line after both variables have been standardized!
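A short numerical check of both statements, assuming invented data: the fitted slope equals $r \, s_y / s_x$, and after standardizing both variables the fitted slope equals $r$ itself. The variable names are illustrative.

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- rnorm(40, mean = 10, sd = 3)
y <- 5 + 1.5 * x + rnorm(40, sd = 4)

r            <- cor(x, y)
slope_raw    <- unname(coef(lm(y ~ x))[2])   # least squares slope
slope_from_r <- r * sd(y) / sd(x)            # slope rebuilt from the correlation

# Standardize both variables to mean 0 and standard deviation 1
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
slope_std <- unname(coef(lm(zy ~ zx))[2])    # slope on the standardized scale
```

Here `slope_raw` matches `slope_from_r`, and `slope_std` matches `r`.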


Residual Variance: $\sigma^2$

We estimate the residual variance $\sigma^2$ by the average of the squares of the residuals, dividing by $n - 2$ rather than $n$ because two parameters ($\beta_0$ and $\beta_1$) were estimated:

\[
s^2 = \frac{\sum_i r_i^2}{n - 2} = \frac{\sum_i (Y_i - (\hat\beta_0 + \hat\beta_1 X_i))^2}{n - 2}
\]

The residual standard error ($s$) is the square root of that quantity, and represents a typical deviation from the regression line.
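As a sketch with invented data, the residual standard error can be computed from the residuals and compared with the value `summary()` reports for an `lm` fit (stored in its `sigma` component).

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- runif(60, 0, 10)
y <- 1 + 2 * x + rnorm(60, sd = 3)

fit <- lm(y ~ x)
res <- residuals(fit)          # r_i = Y_i - (b0 + b1 * X_i)
n   <- length(y)

s2 <- sum(res^2) / (n - 2)     # estimate of the residual variance sigma^2
s  <- sqrt(s2)                 # residual standard error

summary(fit)$sigma             # the same quantity as reported by summary()
```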


Correlation Again

Coefficient of Determination, aka $R^2$

The squared correlation between the fitted line and the observations:

\[
R^2 = \mathrm{cor}^2(Y, \hat{Y}) = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\sum_i (Y_i - (\hat\beta_0 + \hat\beta_1 X_i))^2}{\sum_i (Y_i - \bar{Y})^2}
\]

Because $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, it is also the proportion of variance explained by the regression.
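The two expressions for $R^2$ can be compared numerically. This sketch uses invented data; `R2_cor` and `R2_ss` are hypothetical names for the two versions.

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- runif(60, 0, 10)
y <- 1 + 2 * x + rnorm(60, sd = 3)

fit <- lm(y ~ x)

R2_cor <- cor(y, fitted(fit))^2       # squared correlation of Y and fitted values

RSS <- sum(residuals(fit)^2)          # residual sum of squares
TSS <- sum((y - mean(y))^2)           # total sum of squares
R2_ss <- 1 - RSS / TSS                # proportion of variance explained

summary(fit)$r.squared                # the value reported by summary()
```

All three values agree. With a single explanatory variable, they also equal `cor(x, y)^2`.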


Properties

It is easy to show that $\bar{Y}$ is the LSE for the population mean $\mu_y$.

In general, least squares estimates have similar properties to those of sample means, such as ease of computation and sensitivity to outliers, plus:

The Gauss-Markov Theorem: the LSE gives unbiased estimates with minimum variance among linear unbiased estimators if the errors (deviations) are uncorrelated, symmetrically distributed with mean 0 and constant variance.

If the intercept is included in the model, the residuals sum to 0 and the fitted line passes through $(\bar{X}, \bar{Y})$.
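Three of these properties can be checked directly in R. The data below are invented for illustration; the intercept-only model `y ~ 1` is used to show that the sample mean is itself a least squares estimate.

```r
# Illustrative data (an assumption, not from the lecture)
set.seed(141)
x <- runif(25, 0, 10)
y <- 4 + 0.8 * x + rnorm(25)

# 1. The sample mean is the least squares estimate in the intercept-only model
lse_mean <- unname(coef(lm(y ~ 1)))

# 2. With an intercept in the model, the residuals sum to zero (up to rounding)
fit <- lm(y ~ x)
resid_sum <- sum(residuals(fit))

# 3. The fitted line passes through (mean(x), mean(y))
y_at_xbar <- unname(predict(fit, newdata = data.frame(x = mean(x))))
```

Here `lse_mean` and `y_at_xbar` both equal `mean(y)`, and `resid_sum` is zero up to floating-point error.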


R Functions

plot(y ~ x, data = YourDataFrame)         # always look at your data!
lsFit <- lm(y ~ x, data = YourDataFrame)  # compute the LS fit
summary(lsFit)                            # display results
plot(fitted(lsFit), residuals(lsFit))     # plot residuals against fitted values
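For a version that runs as-is, here is the same workflow on `cars`, a small data set that ships with R (stopping distance `dist` against `speed`). The choice of data set is an assumption made for illustration; it is not used in the lecture.

```r
# `cars` is a built-in data frame with columns speed and dist
plot(dist ~ speed, data = cars)          # always look at your data!

lsFit <- lm(dist ~ speed, data = cars)   # compute the least squares fit
abline(lsFit)                            # add the fitted line to the scatterplot
summary(lsFit)                           # coefficients, residual standard error, R^2

plot(fitted(lsFit), residuals(lsFit))    # residuals against fitted values
abline(h = 0, lty = 2)                   # reference line at zero
```

In the residual plot, look for curvature or a change in spread, since either one suggests the straight-line model with constant variance is not adequate.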
