Module 3 - SimpleLinearRegression - Afterclass1b
Bordeaux wine
Building a model
§ Dependent variable:
– Typical price in 1990–1991 wine auctions (approximates quality)
– Apply a logarithmic transformation
q Gives a better linear fit
§ Independent variables:
– Age of wine (in 1990)
q Older wines are more expensive
– Weather
q Average Growing Season Temperature (AGST)
q Harvest Rain
q Winter Rain
– Population of France
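The log transformation mentioned above can be sketched as follows. The prices here are made-up placeholder values, not the actual auction data; the point is only that taking the log of a right-skewed price variable tends to produce a better linear fit, which is why the model predicts LogPrice rather than Price.

```python
import numpy as np

# Hypothetical auction prices (illustrative only; not the Bordeaux dataset).
prices = np.array([29.0, 54.5, 15.6, 37.0, 100.0])

# Log-transform the dependent variable: large prices are pulled in,
# reducing skew so a linear model fits the transformed values better.
log_prices = np.log(prices)

print(log_prices.round(4))
```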
The wine data (1952 - 1978)
Baseline model (Take the mean)
One-Variable Linear Regression
Simple Regression Model
The population model of y with one predictor variable x is:
y = β0 + β1·x + ε

§ β1 is the slope for x, which is the change in E[Y|x] for a unit increase in x
§ The random errors ε are independent, identically distributed random variables; the output does not show them, but it does estimate their standard error
§ The random errors ε and the independent variable x are uncorrelated
§ These assumptions are important for effective business analytics
Estimated Regression Function
§ Estimates the regression model with n observations (xi,yi) for i = 1, …, n
ŷ = b0 + b1·x

§ b0 is the sample estimate of the population intercept β0
One-Variable Linear Regression
Data and Predicted Values
§ What is the observed y when x = 1? Observed: y = 6
§ What is the predicted ŷ when x = 4? Observed: y = 4; predicted: ŷ = 1 + (2)(4) = 9
Estimated Model and Residuals
§ Residuals are the differences between the observed and predicted values of y:
– r = y − ŷ
– Each observation has one observed y, one predicted ŷ, and one residual r
§ The residuals are errors between the observed and predicted values
(Figure: each observation yi, its prediction ŷi on the fitted line, and the residual ri = yi − ŷi)
Computing Residuals
§ What are the residuals r3 at x = 2 and r4 at x = 3?

r3 = y3 − ŷ3 = 3 − (1 + 2·2) = 3 − 5 = −2
r4 = y4 − ŷ4 = 11 − (1 + 2·3) = 11 − 7 = 4
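The residual computation above can be sketched directly, using the slides' toy fitted line ŷ = 1 + 2x and the two observations shown:

```python
# Toy fitted line from the slides: y_hat = 1 + 2x.
b0, b1 = 1.0, 2.0

# The two observations used on the slide: (x=2, y=3) and (x=3, y=11).
for x, y in [(2, 3), (3, 11)]:
    y_hat = b0 + b1 * x   # predicted value on the fitted line
    r = y - y_hat         # residual = observed - predicted
    print(f"x={x}: y_hat={y_hat}, residual={r}")
# residuals: -2.0 at x=2 and 4.0 at x=3
```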
Ordinary Least Squares (OLS) Criterion
The least squares line finds the estimates b0 and b1 of the coefficients to
minimize the sum-of-squares error for a sample {(xi, yi)} with n observations:

SSE(b0, b1) = Σi=1..n (yi − ŷi)², where ŷi = b0 + b1·xi for i = 1, …, n

Why squared? The sum of (unsquared) residuals could be zero.

Setting ∂SSE(b0, b1)/∂b0 = 0 and ∂SSE(b0, b1)/∂b1 = 0 gives:

b1 = Σi=1..n (xi − x̄)(yi − ȳ) / Σi=1..n (xi − x̄)²
b0 = ȳ − b1·x̄

x̄: sample average of the independent variable (ȳ: of the dependent variable)
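The closed-form OLS estimates above can be sketched in a few lines. The data here are a toy set chosen to lie exactly on the slides' line ŷ = 1 + 2x, so the formulas should recover b0 = 1 and b1 = 2:

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS: b1 = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)², b0 = ȳ - b1·x̄."""
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    return b0, b1

# Toy data lying exactly on y = 1 + 2x, so SSE = 0 at the optimum.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
b0, b1 = ols_fit(x, y)
print(b0, b1)  # -> 1.0 2.0
```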
§ Estimated standard errors for the estimated intercept and slope coefficients
§ t-score = (Estimated Coefficient − 0)/(Standard Error)
§ Two-Tail Test: p-value = 2·P(T < −|t-score|)
§ Coefficient of Determination: R-Squared
One-Variable Linear Regression
ŷ = −3.4178 + 0.6351·AGST
Estimate a Linear Model (One Variable)
• Estimated model for price:
ŷ = −3.4178 + 0.6351·AGST
• The predicted LogPrice increases by 0.6351 for every 1 degree increase in AGST
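The estimated model can be used for prediction directly. This sketch uses the coefficients from the slides and verifies the interpretation of the slope: a one-degree increase in AGST raises the predicted LogPrice by 0.6351.

```python
# Estimated one-variable model from the slides:
#   LogPrice_hat = -3.4178 + 0.6351 * AGST
def predict_log_price(agst):
    return -3.4178 + 0.6351 * agst

# Slope interpretation: +1 degree of AGST adds 0.6351 to predicted LogPrice.
delta = predict_log_price(17.0) - predict_log_price(16.0)
print(round(delta, 4))  # -> 0.6351
```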
T-Tests for the Coefficients: H0: bj = 0 versus HA: bj ≠ 0
§ Degrees of freedom: df = n − k − 1 = 25 − 1 − 1 = 23
§ For the AGST coefficient, t-score = 4.208 with df = 23, so p-value < 0.01 (significant)
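The t-test above can be sketched numerically. The standard error used here (0.1509) is an illustrative value chosen so the t-score matches the 4.208 annotated on the slide; it is not taken from an output table shown in this deck:

```python
from scipy import stats

coef = 0.6351          # estimated AGST slope from the slides
se = 0.1509            # illustrative standard error (assumed, not shown here)
df = 25 - 1 - 1        # n = 25 observations, one predictor -> df = 23

# Test H0: beta = 0 with a two-tail t-test.
t_score = (coef - 0.0) / se
p_value = 2 * stats.t.cdf(-abs(t_score), df)
print(round(t_score, 3), p_value < 0.01)
```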
How well the model fits data
§ The simplest commonly used measure of fit is R² (the coefficient of
determination): R² = 1 − SSE/SST
– Decomposition of variation of Y: SST = SSR + SSE
(Total variation = Explained variation + Unexplained variation)
– R² = 1 − SSE/SST = SSR/SST
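The R² definition above can be sketched with toy numbers (made up for illustration, not from the wine data):

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - SSE/SST: fraction of the variation in y the model explains."""
    sse = np.sum((y - y_hat) ** 2)       # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    return 1.0 - sse / sst

# Toy observed and predicted values.
y = np.array([2.0, 4.0, 6.0, 8.0])
y_hat = np.array([3.0, 4.5, 5.5, 7.0])
print(round(r_squared(y, y_hat), 3))  # -> 0.875
```

Predicting the mean for every observation (the baseline model) gives SSE = SST and hence R² = 0, which is why R² measures improvement over the baseline.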
Coefficient of Determination: R-Squared
• R-Squared is a measure of fit
• A bigger R-Squared indicates a better fit, all else being equal
• 43.5% of the variation of prices is explained by the simple regression on AGST
• 0 ≤ R-Squared ≤ 1
Use each variable on its own
§ R² = 0.44 using Average Growing Season Temperature (variable significant at 0.001)
§ R² = 0.32 using Harvest Rain (variable significant at 0.01)
§ R² = 0.22 using France Population (variable significant at 0.05)
§ R² = 0.20 using Age (variable significant at 0.05)
§ R² = 0.02 using Winter Rain (not significant)
§ Multivariate linear regression allows us to use more than one
variable to potentially improve our predictive ability.