Simple Linear Regression and Correlation: Abrasion Loss vs. Hardness
[Figure: scatterplot of Price Sold at Auction vs. number of Bidders]
• Regression is a method for studying the
relationship between two or more
quantitative variables
• Example: Health data
Variables:
Percent of Obese Individuals
Percent of Active Individuals
Data from CDC. Units are regions of U.S. in 2014.
   PercentObesity  PercentActive
1            29.7           55.3
2            28.9           51.9
3            35.9           41.2
4            24.7           56.3
5            21.3           60.4
6            26.3           50.9
...
[Figure: scatterplot of Percent Obese vs. Percent Active]
A scatterplot or scatter diagram can give us a general idea of the relationship between obesity and activity.
Yi = β0 + β1 xi + εi
– So, E[Yi|xi] = β0 + β1 xi + 0 = β0 + β1 xi
Example: Consider the model that regresses Oxygen purity on Hydrocarbon level in a distillation process with β0 = 75 and β1 = 15.
The conditional mean for x = 1:

E[Y | x = 1] = 75 + 15 · 1 = 90
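As a quick sketch, this conditional mean is easy to compute in code; the values β0 = 75 and β1 = 15 are taken from the oxygen-purity example above.

```python
def conditional_mean(x, beta0=75.0, beta1=15.0):
    """E[Y|x] = beta0 + beta1 * x for the oxygen-purity example."""
    return beta0 + beta1 * x

print(conditional_mean(1))  # E[Y|x=1] = 75 + 15*1 = 90.0
```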
These values that randomly scatter around a
conditional mean are called errors.
• The model can also be written as:

Yi | xi ∼ N(β0 + β1 xi, σ²)
Simple Linear Regression
Assumptions
• Key assumptions
– independent errors
(this essentially equates to independent
observations in the case of SLR)
Simple Linear Regression
Estimation
– We let ‘hats’ denote predicted values or estimates of parameters, so we have:

g(β̂0, β̂1) = Σ_{i=1}^n (yi − ŷi)² = Σ_{i=1}^n (yi − (β̂0 + β̂1 xi))²
– This vertical distance of a point from the
fitted line is called a residual. The resid-
ual for observation i is denoted ei and
ei = yi − ŷi
– To minimize g(β̂0, β̂1) = Σ_{i=1}^n (yi − (β̂0 + β̂1 xi))², take partial derivatives with respect to β̂0 and β̂1 and set each equal to zero.
Simplifying the above gives:

n β̂0 + β̂1 Σ_{i=1}^n xi = Σ_{i=1}^n yi

β̂0 Σ_{i=1}^n xi + β̂1 Σ_{i=1}^n xi² = Σ_{i=1}^n xi yi
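A minimal sketch of solving these two normal equations directly, using Cramer's rule on the 2×2 system; the toy data below is hypothetical, chosen to lie exactly on a line so the answer is easy to check.

```python
def fit_normal_equations(x, y):
    """Solve the normal equations
         n*b0 + b1*sum(x)        = sum(y)
         b0*sum(x) + b1*sum(x^2) = sum(x*y)
    for the least-squares estimates (b0, b1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of the 2x2 system
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_normal_equations(x, y)
print(b0, b1)  # 1.0 2.0
```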
Solving the normal equations gives

β̂0 = ȳ − β̂1 x̄
• Example: Cigarette data
(Nicotine vs. Tar content)
[Figure: scatterplot of Nicotine (Nic) vs. Tar]
n = 25

Summary statistics:

Σ_{i=1}^n xi = 305.4, x̄ = 12.216
Σ_{i=1}^n yi = 21.91, ȳ = 0.8764
Σ_{i=1}^n (yi − ȳ)(xi − x̄) = 47.01844
Σ_{i=1}^n (xi − x̄)² = 770.4336
Σ_{i=1}^n xi² = 4501.2, Σ_{i=1}^n yi² = 22.2105

β̂1 = Sxy / Sxx = 47.01844 / 770.4336 = 0.061029

and

β̂0 = ȳ − β̂1 x̄
   = 0.8764 − 0.061029(12.216)
   = 0.130870
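These computations can be checked directly from the summary statistics on the slide (a sketch; the numeric values are copied from above).

```python
# Cigarette data: slope and intercept from the summary statistics above.
Sxy = 47.01844      # sum of (y_i - y_bar)(x_i - x_bar)
Sxx = 770.4336      # sum of (x_i - x_bar)^2
x_bar, y_bar = 12.216, 0.8764

beta1_hat = Sxy / Sxx                   # slope estimate
beta0_hat = y_bar - beta1_hat * x_bar   # intercept estimate
print(beta1_hat, beta0_hat)
```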
Simple Linear Regression
Estimating σ 2
Recall the model:

Yi = β0 + β1 xi + εi with εi iid ∼ N(0, σ²)

∗ SSE = error sum of squares = Σ_{i=1}^n (yi − ŷi)²
∗ ‘2’ is subtracted from n in the denominator of σ̂² = SSE/(n − 2) because we used 2 degrees of freedom for estimating the slope and intercept (i.e., 2 parameters were estimated when modeling the conditional mean)
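A sketch of the resulting estimator σ̂² = SSE/(n − 2), using hypothetical data and hypothetical coefficient values (not from the slides), chosen so the arithmetic is easy to follow.

```python
def sse(x, y, b0, b1):
    """Error sum of squares: sum of (y_i - y_hat_i)^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def sigma2_hat(x, y, b0, b1):
    """Estimate sigma^2 as SSE / (n - 2); 2 parameters were estimated."""
    return sse(x, y, b0, b1) / (len(x) - 2)

# Hypothetical data and coefficients for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(sigma2_hat(x, y, 0.0, 2.0))  # residuals 0.1, -0.1, 0.2, -0.2 -> SSE = 0.10, so 0.10/2 = 0.05
```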