8 SLR Gsba 545 2024
• Inference
⎼ F and t-testing
Simple Linear Regression Model
𝑌 = β0 + β1 𝑋 + ε
β0 is the Y-intercept, or mean of Y when X is 0*
β1 is the slope, or change in the mean of Y per unit change in X
ε is the error term describing the leftover effect on Y
* Note: Be careful… you need to have data where X is 0 for this to make sense.
Ordinary Least Squares (OLS) Estimation
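\[ \widehat{\text{Ave\$}} = -1398.77 + 145.371\,\text{HP} \]
Figure labels: True line of means; Least squares line: \( \hat{Y} = -1399 + 145X \)
Optional background calculations:
Slope (b1)
\[ b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = r_{XY}\,\frac{s_Y}{s_X} = 145.37 \]
Y-Intercept (b0)
\( \bar{Y} = 19509.68, \quad \bar{X} = 143.83 \)
\[ b_0 = \bar{Y} - b_1 \bar{X} = 19509.68 - 145.37(143.83) = -1399 \]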
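The slope and intercept formulas above can be sketched in a few lines of plain Python. The toy data and the helper name `ols_fit` are hypothetical (they are not the course data set); only the last three lines use the summary values quoted on the slide.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

# Hypothetical toy data (NOT the course data): points lying exactly on y = 2 + 3x.
x_toy = [1, 2, 3, 4, 5]
y_toy = [5, 8, 11, 14, 17]
b0_toy, b1_toy = ols_fit(x_toy, y_toy)
print(b0_toy, b1_toy)

# Intercept check using only the summary values quoted on the slide.
y_bar, x_bar, b1 = 19509.68, 143.83, 145.37
b0_slide = y_bar - b1 * x_bar
print(round(b0_slide))  # -1399
```

The toy fit recovers the line the points were generated from, and the slide's summary values reproduce the intercept of about −1399.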
Ave$ and HP Model: Excel
Ave$ and HP Output: Excel
% of variation in Ave Price explained when incorporating HP
How much we expect to be off, on average, when predicting Ave Price
Coefficients for regression equation
Determines if model, overall, is significant or useful for predicting Ave Price
Determines if HP is significant or useful for predicting Ave Price
Ave$ and HP Model: JMP
Fit Model
Ave$ and HP Output: JMP
% of variation in Ave Price explained when incorporating HP
How much we expect to be off, on average, when predicting Ave Price
Determines if entire model, overall, is significant or useful for predicting Ave Price
Coefficients for regression equation
Determines if HP is significant or useful for predicting Ave Price
Ave$ and HP Model: Python
Ave$ and HP Output: Python (ANOVA)
Python doesn’t
automatically create a full
ANOVA table or report the
RMSE (Standard Error)
value, so a little more code
is necessary.
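As a rough illustration of the "little more code", here is a library-free sketch that builds the ANOVA quantities and the RMSE by hand. The course's own Python code is not shown in these notes, so the function name `anova_table` and the toy data are assumptions made only for this example.

```python
import math
from statistics import mean

def anova_table(x, y):
    """Return simple-linear-regression ANOVA quantities as a dict."""
    n, k = len(x), 1                      # k = number of predictors
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
    msr, mse = ssr / k, sse / (n - k - 1)
    return {"df_model": k, "df_error": n - k - 1, "SSR": ssr, "SSE": sse,
            "MSR": msr, "MSE": mse, "F": msr / mse,
            "RMSE": math.sqrt(mse), "R2": ssr / (ssr + sse)}

# Hypothetical toy data, not the course data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = anova_table(x, y)
for name, value in table.items():
    print(f"{name:>8}: {value:.4f}")
```

For the toy data this gives SSR = 3.6, SSE = 2.4, and F = 4.5; the same quantities appear in the Excel and JMP ANOVA tables.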
Ave$ and HP Output: Python
% of variation in Ave Price
explained when incorporating HP
Determines if HP is significant or
useful for predicting Ave Price
RMSE
Model Assumptions
Assumptions about the model error terms
1. Constant Variance (Homoscedasticity): Variance of error terms, σ², is constant.
2. Normality: Error terms are normally distributed.
3. Independence: Error terms are statistically independent of each other.
4. Linearity: Linear in parameters.
Model Assumptions: Constant Variance
If non-constant variance exists, output results cannot be fully trusted.
Measures should be taken to obtain random error scattering.
Model Assumptions: Normality
Incorporating
transformations to
variables may help uncover
the true relationships.
Standard Error of Estimate (RMSE)
\[ \text{Std Error} = \text{RMSE} = S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-k-1}} = \sqrt{\frac{\sum e_i^2}{n-k-1}} \]
• Measures standard deviation of predicted vs. actual values
• Measures average error of estimate
• Denoted by “Standard Error” in Excel and “Root Mean
Square Error (RMSE)” in JMP and other programs
• Affects parameter significance & prediction accuracy
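A short sketch of the RMSE calculation, using the MSE and error degrees of freedom reported on the F-test slides. The helper name `rmse_from_residuals` and the toy residuals are hypothetical and only show the general formula.

```python
import math

def rmse_from_residuals(residuals, k=1):
    """Se = sqrt( sum(e_i^2) / (n - k - 1) ) for k predictors."""
    n = len(residuals)
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))

# Hypothetical residuals, only to exercise the formula.
toy_rmse = rmse_from_residuals([-0.8, 0.6, 1.0, -0.6, -0.2])

# Slide values: MSE = 35723966 with n - k - 1 = 91 error degrees of freedom.
mse = 35723966
df_error = 91
sse = mse * df_error                  # implied sum of squared residuals
rmse = math.sqrt(sse / df_error)      # same as sqrt(MSE)
print(round(rmse, 2))
```

On the slide values this comes to roughly $5,977, read as the typical size of a miss when predicting Ave Price from HP.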
F-test for Overall Model: Excel
Testing H0: β1 = 0 vs Ha: β1 ≠ 0 at the α level of significance.
Reject H0 if: p-value (Sig F) < α
F-test for Overall Model: JMP
\[ F = \frac{MSR}{MSE} = \frac{5.3331 \times 10^{9}}{35723966} = 149.29 > 3.946 = F_{0.05,1,91} \]
p-value ≈ 0.000 < 0.05 = α
⇒ Reject H0, so HP is useful
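The F-test arithmetic can be checked with a minimal sketch. All numbers are the slide values; the critical value 3.946 is taken from the slide rather than computed, so no statistics library is assumed.

```python
msr = 5.3331e9        # mean square regression (slide value)
mse = 35723966        # mean square error (slide value)
f_crit = 3.946        # F(0.05, 1, 91) critical value quoted on the slide

f_stat = msr / mse
reject_h0 = f_stat > f_crit   # critical-value form of the decision rule
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")  # F = 149.29, reject H0: True
```

Comparing F to the critical value gives the same decision as comparing the p-value to α.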
F-test for Overall Model: Python
\[ F = \frac{MSR}{MSE} = \frac{5.3331 \times 10^{9}}{3.572397 \times 10^{7}} = 149.29 > 3.946 = F_{0.05,1,91} \]
Output date: Fri, 11 Oct 2024
Slope Significance: Standard Error (𝑠𝑏1 )
Describes the possible sample-to-sample variability of b1.
• As RMSE increases, so does \( s_{b_1} \)
• As n increases, \( s_{b_1} \) decreases
• As \( s_x \) (std deviation of X) increases, \( s_{b_1} \) decreases
\[ s_{b_1} = \frac{RMSE}{\sqrt{n-1}} \times \frac{1}{s_x} \]
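The three bullets can be illustrated with a small sketch of the standard error formula. The function name `slope_std_error` and every input value below are hypothetical, chosen only to show the direction of each effect.

```python
import math

def slope_std_error(rmse, n, s_x):
    """s_b1 = RMSE / (sqrt(n - 1) * s_x)."""
    return rmse / (math.sqrt(n - 1) * s_x)

# Hypothetical baseline, then one input changed at a time.
base = slope_std_error(rmse=6000, n=93, s_x=50)
more_noise = slope_std_error(rmse=9000, n=93, s_x=50)     # larger RMSE
more_data = slope_std_error(rmse=6000, n=373, s_x=50)     # larger n
more_spread = slope_std_error(rmse=6000, n=93, s_x=100)   # larger s_x

print(f"base={base:.2f} noise={more_noise:.2f} "
      f"data={more_data:.2f} spread={more_spread:.2f}")
```

Only the noisier case raises the standard error; more data or a wider spread in X lowers it.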
Slope Significance Test
If regression assumptions hold, we can reject 𝐻0 : β1 = 0 in favor of 𝐻a : β1 ≠ 0 at
the α level of significance if and only if the corresponding p-value < α (usually 0.05).
Test Statistic
\[ t = \frac{b_1 - \beta_1}{s_{b_1}} \]
95% Confidence Interval for β1
𝑏1 ± 𝑡0.025,𝑑𝑓 𝑠𝑏1
* tα, tα/2, and p-values are based on n–k–1 degrees of freedom, found as df Residual on Excel output
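A sketch of the slope test and the 95% confidence interval, using the slope and standard error from the Excel output. One assumption to flag: the t critical value is not quoted in these notes, so it is recovered from the quoted F critical value, using the fact that F(0.05, 1, df) equals the square of t(0.025, df).

```python
import math

b1, s_b1 = 145.371, 11.898   # slope and its standard error (slide values)
f_crit = 3.946               # F(0.05, 1, 91) quoted on the F-test slides

# ASSUMPTION: t(0.025, 91) is derived here, not read from the slides.
# With one numerator df, F(0.05, 1, df) = t(0.025, df)^2.
t_crit = math.sqrt(f_crit)

t_stat = (b1 - 0) / s_b1     # beta1 = 0 under H0
margin = t_crit * s_b1
ci = (b1 - margin, b1 + margin)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, t_crit = {t_crit:.3f}, reject H0: {reject_h0}")
print(f"95% CI for beta1: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The interval runs from roughly 122 to 169 dollars per unit of HP and excludes 0, which agrees with rejecting H0. In simple linear regression t² also matches the overall F statistic (12.218² ≈ 149.3).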
Slope Significance: Excel Output
\[ t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} = 12.218 \]
The slope of HP is > 12 std errors
away from being 0 (or worthless).
Slope Significance: JMP Output
\[ t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} \]
Slope Significance: Python Output
\[ t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} \]
Estimation: Prediction Intervals
Prediction (X = x)
\( \hat{Y} = b_0 + b_1 x \)
Ave$ (HP = 210): \( \hat{Y} = b_0 + b_1 x = -1399 + 145.37(210) \) = $29,129
* In JMP and other programs, Se is denoted RMSE, or Root Mean Square Error.
Estimation: Confidence Intervals
Prediction (X = x)
\( \hat{Y} = b_0 + b_1 x \)
Ave$ (HP = 210): \( \hat{Y} = b_0 + b_1 x = -1399 + 145.37(210) \) = $29,129
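A hedged sketch of the point prediction together with both interval types. The interval formulas did not survive in these notes, so the code uses the usual textbook half-widths as an assumption. It also assumes n = 93 (implied by n − k − 1 = 91 with k = 1) and takes the t critical value as the square root of the quoted F critical value.

```python
import math

b0, b1 = -1399, 145.37        # rounded coefficients used on the slide
x_new = 210                   # HP value to predict at
y_hat = b0 + b1 * x_new       # point prediction

# Slide values reused from earlier sections.
mse = 35723966                # MSE, so Se = RMSE = sqrt(MSE)
s_b1 = 11.898                 # standard error of the slope
x_bar = 143.83                # sample mean of HP
t_crit = math.sqrt(3.946)     # ASSUMPTION: t(0.025, 91) = sqrt(F(0.05, 1, 91))
n = 93                        # ASSUMPTION: implied by n - k - 1 = 91 with k = 1

# Textbook half-widths (an assumption, not taken from the slides).
# Because s_b1 = Se / sqrt(sum (x_i - x_bar)^2), the distance term
# Se^2 (x - x_bar)^2 / sum (x_i - x_bar)^2 equals (s_b1 * (x - x_bar))^2.
var_mean = mse / n + (s_b1 * (x_new - x_bar)) ** 2   # variance of the estimated mean
ci_half = t_crit * math.sqrt(var_mean)               # confidence interval for mean Y
pi_half = t_crit * math.sqrt(mse + var_mean)         # prediction interval for one new Y

print(f"Predicted Ave$: {y_hat:,.0f}")
print(f"95% CI for mean Ave$: {y_hat - ci_half:,.0f} to {y_hat + ci_half:,.0f}")
print(f"95% PI for one new observation: {y_hat - pi_half:,.0f} to {y_hat + pi_half:,.0f}")
```

The prediction interval is always wider than the confidence interval because it adds the error variance of a single new observation to the uncertainty in the estimated mean.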