8 SLR Gsba 545 2024

Data-Driven Decision Making:

Simple Linear Regression Analysis


GSBA 545, Fall 2024
Professor Dawn Porter
Simple Linear Regression Analysis
• Simple linear regression (SLR) model

• Regression model assumptions


⎼ normality, independence, linearity & constant variance

• Inference
⎼ F-tests and t-tests

• Confidence & prediction intervals

Simple Linear Regression Model
$$Y = \beta_0 + \beta_1 X + \varepsilon$$

β0 is the Y-intercept, or the mean of Y when X is 0*
β1 is the slope, or the change in the mean of Y per unit change in X
ε is the error term, describing the leftover effect on Y not captured by X

* Note: Be careful… you need data where X is at or near 0 for this interpretation to make sense.
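To make the roles of β0, β1 and ε concrete, here is a minimal Python sketch that generates data from the model. The function name `simulate_slr` and the parameter values are hypothetical (they loosely echo the Ave$/HP scale but are not the course data). The errors are drawn independently from a normal distribution with one constant standard deviation, which matches the model assumptions covered later in these notes.

```python
import random

def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Draw Y = beta0 + beta1 * X + eps for each x, with eps ~ Normal(0, sigma).

    beta0, beta1 and sigma are population parameters chosen by the caller
    (hypothetical here). Each eps is drawn independently with the same sigma.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameters for illustration only, not estimates from the Ave$/HP data.
hp = [100, 150, 200, 250]
prices = simulate_slr(hp, beta0=-1400, beta1=145, sigma=6000, seed=1)
```

Setting `sigma=0` removes the error term, so every simulated Y falls exactly on the line of means β0 + β1X.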
Ordinary Least Squares (OLS) Estimation
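$$\widehat{Ave\$} = -1398.77 + 145.371\,HP$$

Optional background calculations:

Slope (b1)
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = r_{XY}\,\frac{s_Y}{s_X} = 145.37$$

Y-Intercept (b0)
$$\bar{Y} = 19509.68, \qquad \bar{X} = 143.83$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 19509.68 - 145.37(143.83) \approx -1399$$

[Figure: scatterplot of Ave$ vs. HP showing the true line of means and the least squares line, Ŷ = −1399 + 145X]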
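The optional background calculations can be reproduced in a few lines of plain Python. This is a sketch with hypothetical helper names (`ols_fit`, `slope_from_correlation`); it shows that the deviation formula and the r·sY/sX formula give the same slope, and the last line repeats the slide's intercept arithmetic with the reported means.

```python
import math
import statistics

def ols_fit(x, y):
    """Return (b0, b1) for the least squares line Y-hat = b0 + b1 * X."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope: cross-deviations over squared X-deviations
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (X-bar, Y-bar)
    return b0, b1

def slope_from_correlation(x, y):
    """Equivalent slope formula: b1 = r_XY * s_Y / s_X."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)                       # sample correlation r_XY
    return r * statistics.stdev(y) / statistics.stdev(x)

# The slide's intercept arithmetic, using the reported means and the rounded slope:
b0_slide = 19509.68 - 145.37 * 143.83   # about -1398.9, reported as -1399
```

The small gap between −1398.9 and the software's −1398.77 comes from rounding the slope and the means before multiplying.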
Ave$ and HP Model: Excel
Ave$ and HP Output: Excel
[Screenshot of Excel regression output, annotated:]
• R Square: % of variation in Ave Price explained when incorporating HP
• Standard Error (RMSE): how much we expect to be off, on average, when predicting Ave Price
• Coefficients: coefficients for the regression equation
• ANOVA F-test (Significance F): determines if the model, overall, is significant or useful for predicting Ave Price
• HP p-value: determines if HP is significant or useful for predicting Ave Price
Ave$ and HP Model: JMP
Fit Model
Ave$ and HP Output: JMP
[Screenshot of JMP regression output, annotated:]
• R²: % of variation in Ave Price explained when incorporating HP
• RMSE: how much we expect to be off, on average, when predicting Ave Price
• Overall F-test: determines if the entire model, overall, is significant or useful for predicting Ave Price
• Estimates: coefficients for the regression equation
• HP t-test: determines if HP is significant or useful for predicting Ave Price
Ave$ and HP Model: Python
Ave$ and HP Output: Python (ANOVA)

Python doesn’t
automatically create a full
ANOVA table or report the
RMSE (Standard Error)
value, so a little more code
is necessary.
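One way to fill that gap, sketched here in plain Python without any regression library, is to compute the ANOVA pieces and the RMSE directly from the fitted line. The function name `anova_slr` and the toy data are hypothetical. If the course code uses statsmodels (an assumption), `statsmodels.stats.anova.anova_lm` applied to a model fit with the formula interface produces an ANOVA table, and `math.sqrt(results.mse_resid)` gives the RMSE.

```python
import math
import statistics

def anova_slr(x, y):
    """ANOVA quantities and RMSE for a simple (k = 1) least squares fit of y on x."""
    n, k = len(y), 1
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # Regression (explained)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Error/Residual (unexplained)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # Total; equals SSR + SSE
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {
        "df": (k, n - k - 1, n - 1),   # Regression, Error, Total
        "SS": (ssr, sse, sst),
        "MS": (msr, mse),
        "F": msr / mse,
        "R2": ssr / sst,
        "RMSE": math.sqrt(mse),        # "Standard Error" in Excel, RMSE in JMP
    }

# Toy data for illustration only (not the Ave$/HP data).
result = anova_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this toy data SSR = 3.6 and SSE = 2.4, so F = 3.6 / (2.4 / 3) = 4.5, R² = 3.6 / 6 = 0.6 and RMSE = √0.8.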
Ave$ and HP Output: Python
[Screenshot of Python regression output dated Fri, 11 Oct 2024, annotated:]
• R²: % of variation in Ave Price explained when incorporating HP
• Overall F-test: determines if the entire model, overall, is significant or useful for predicting Ave Price
• RMSE: how much we expect to be off, on average, when predicting Ave Price
• Coefficients for the regression equation
• HP t-test: determines if HP is significant or useful for predicting Ave Price
Model Assumptions
Assumptions about the model error terms:
1. Constant Variance (Homoscedasticity): The variance of the error terms, σ², is the same for all values of X.
2. Normality: Error terms follow a normal distribution for all values of X.
3. Independence: Error terms are statistically independent of each other.
4. Linearity: The model is linear in its parameters (β0 and β1).
Model Assumptions: Constant Variance

If non-constant variance exists, the output results cannot be fully trusted.

Measures should be taken (for example, transforming variables) to obtain a random scatter of the errors.
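Residual plots are the main tool for this check, but a rough numeric companion can help. The sketch below (hypothetical function `spread_ratio`) sorts the residuals by X and compares the residual standard deviation in the upper half of X with that in the lower half; a ratio far from 1 hints at the funnel shape of non-constant variance. It is an informal check, not a formal hypothesis test.

```python
import statistics

def spread_ratio(x, residuals):
    """Residual std dev for the upper half of X divided by that for the lower half.

    Values far from 1 suggest non-constant variance. Informal check only.
    Needs at least 4 observations so each half has 2 or more residuals;
    with an odd count, the middle observation is left out.
    """
    pairs = sorted(zip(x, residuals))          # order residuals by their X value
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return statistics.stdev(upper) / statistics.stdev(lower)

# Hypothetical residuals that fan out as X grows: the ratio is about 10.
ratio = spread_ratio(list(range(1, 11)), [1, -1, 1, -1, 1, -10, 10, -10, 10, -10])
```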
Model Assumptions: Normality

An approximately normal distribution of Y values is assumed at each level of X, allowing us to create confidence and prediction intervals.
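A quick numeric check to pair with a histogram or normal probability plot of the residuals: divide each residual by Se and count how many fall within ±2. With roughly normal errors, about 95% should. This sketch (hypothetical function `share_within_two_se`) uses e/Se as an approximate standardization, which is simpler than the studentized residuals some software reports.

```python
import math

def share_within_two_se(residuals, k=1):
    """Fraction of residuals whose approximately standardized value e / Se lies in [-2, 2].

    Se uses the n - k - 1 denominator (k = number of independent variables).
    With roughly normal errors, about 95% of residuals should land in this range.
    """
    n = len(residuals)
    se = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))
    return sum(abs(e / se) <= 2 for e in residuals) / n

# Hypothetical residuals: one large value out of ten falls outside +/- 2 Se.
share = share_within_two_se([0] * 9 + [10])
```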
Model Assumptions: Independence
If there is dependence between rows, usually seen in data collected over time, there will probably be an issue with independence.

Other methods need to be employed to incorporate the dependence.
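A common first check for dependence in time-ordered data is the Durbin-Watson statistic, DW = Σ(eₜ − eₜ₋₁)² / Σeₜ². Values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. statsmodels provides `statsmodels.stats.stattools.durbin_watson`, and its OLS summary table reports a Durbin-Watson value; the version below is a plain-Python sketch with a hypothetical function name.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    Near 2: little first-order autocorrelation.
    Toward 0: positive autocorrelation (neighbors move together).
    Toward 4: negative autocorrelation (neighbors alternate).
    Assumes at least one nonzero residual.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

For example, `durbin_watson([1, 1, 1, 1])` returns 0.0 (strong positive dependence) and `durbin_watson([1, -1, 1, -1])` returns 3.0 (alternating signs).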
Model Assumptions: Linearity
If there is a nonlinear relationship in the data, a straight-line OLS fit will not perform well.

Transforming variables (e.g., taking logs) may help uncover the true relationship.
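As an illustration of a transformation, suppose (hypothetically) that Y grows exponentially in X, Y = 3e^(0.5X). A straight line in X fits poorly, but ln(Y) = ln(3) + 0.5X is linear in the parameters, so ordinary least squares on ln(Y) recovers them. The data and the helper `ols_fit` below are illustrative assumptions.

```python
import math
import statistics

def ols_fit(x, y):
    """Least squares (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical curved data: Y grows exponentially with X.
x = [0, 1, 2, 3, 4, 5]
y = [3 * math.exp(0.5 * xi) for xi in x]

# Regress ln(Y) on X: the slope is about 0.5 and exp(intercept) is about 3.
b0, b1 = ols_fit(x, [math.log(yi) for yi in y])
```

On the log scale, the slope means each one-unit increase in X multiplies Y by e^0.5, rather than adding a fixed amount.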
Standard Error of Estimate (RMSE)

$$\text{Std Error} = RMSE = S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-k-1}} = \sqrt{\frac{\sum e_i^2}{n-k-1}}$$

• Measures the standard deviation of the actual values around the predicted values
• Measures the typical (average) size of the estimation error
• Denoted by “Standard Error” in Excel and “Root Mean Square Error (RMSE)” in JMP and other programs
• Affects parameter significance & prediction accuracy

* n = number of observations and k = number of independent variables in the model.
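The formula translates directly into code. This sketch (hypothetical function `rmse`) takes the residuals and k. The last line ties two of the slide numbers together: RMSE is the square root of MSE, and √35,723,966 ≈ 5976.95, which rounds to the 5977 used in the interval calculations.

```python
import math

def rmse(residuals, k):
    """Standard error of estimate: Se = sqrt(SSE / (n - k - 1)).

    n is the number of observations (len(residuals));
    k is the number of independent variables (k = 1 in simple linear regression).
    """
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - k - 1))

# MSE reported in the ANOVA output for the Ave$/HP model; its square root is Se.
se_from_mse = math.sqrt(35723966)
```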


Measures of Variation
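Prediction (x = 210):
$$\hat{Y} = b_0 + b_1 X = -1399 + 145.37(210) = \$29{,}129$$

For car 52 (x = 210): Y_52 = $36,100, Ŷ_52 = $29,129, and Ȳ = $19,510.
$$Y_{52} - \bar{Y} = 36{,}100 - 19{,}510 = \$16{,}590 \quad \text{(total deviation)}$$
$$\hat{Y}_{52} - \bar{Y} = 29{,}129 - 19{,}510 = \$9{,}619 \quad \text{(explained by the model)}$$
$$e_{52} = Y_{52} - \hat{Y}_{52} = 36{,}100 - 29{,}129 = \$6{,}971 \quad \text{(residual)}$$

The model improved our prediction for that car by $9,619 versus using just the mean; the remaining $6,971 is the residual ($9,619 + $6,971 = $16,590).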
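The arithmetic for car 52 can be checked in Python. This sketch uses the slide's rounded coefficients, so the results match the dollar figures only after rounding; it also confirms that the explained part and the residual add back to the total deviation from the mean.

```python
b0, b1 = -1399, 145.37        # rounded coefficients from the slide
x52, y52, y_bar = 210, 36100, 19510

y_hat52 = b0 + b1 * x52       # about 29128.7, shown as $29,129
total = y52 - y_bar           # total deviation from the mean
explained = y_hat52 - y_bar   # part explained by the regression
residual = y52 - y_hat52      # e_52, the part left unexplained
```

Summed over all cars, the squares of these three pieces give SST, SSR and SSE in the ANOVA table, with SST = SSR + SSE.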

F-test for Overall Model: Excel
Testing H0: β1 = 0 vs. Ha: β1 ≠ 0 at the α level of significance.
Reject H0 if: p-value (Significance F) < α

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful
F-test for Overall Model: JMP

$$F = \frac{MSR}{MSE} = \frac{5.3331 \times 10^{9}}{35{,}723{,}966} = 149.29 > 3.946 = F_{0.05,\,1,\,91}$$
(numerator df = k = 1; denominator df = n − k − 1 = 91)

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful
F-test for Overall Model: Python

$$F = \frac{MSR}{MSE} = \frac{5.3331 \times 10^{9}}{3.572397 \times 10^{7}} = 149.29 > 3.946 = F_{0.05,\,1,\,91}$$

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful
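The F computation can be reproduced from the two mean squares. This sketch hardcodes the values reported in the output and the slide's critical value 3.946 rather than looking it up; if SciPy is available, `scipy.stats.f.ppf(0.95, 1, 91)` returns the critical value and `scipy.stats.f.sf(f_stat, 1, 91)` the p-value. It also checks a useful identity: with a single predictor, the overall F statistic equals the square of the slope's t statistic (145.371 / 11.898 from the slope-significance output), so the F-test and the t-test for HP reach the same conclusion.

```python
msr = 5.3331e9        # MSR from the ANOVA output (numerator df = k = 1)
mse = 35723966        # MSE from the ANOVA output (denominator df = n - k - 1 = 91)
f_crit = 3.946        # F_{0.05, 1, 91} as quoted on the slide

f_stat = msr / mse    # about 149.29
reject_h0 = f_stat > f_crit

# With k = 1, F equals t squared (up to rounding in the reported values).
t_slope = 145.371 / 11.898
```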
Slope Significance: Standard Error (s_b1)
Describes the possible sample-to-sample variability of b1.
$$s_{b_1} = \frac{RMSE}{\sqrt{n-1}} \times \frac{1}{s_X}$$
• As RMSE increases, so does s_b1
• As n increases, s_b1 decreases
• As s_X (standard deviation of X) increases, s_b1 decreases
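The formula can also be written as s_b1 = RMSE / √Σ(Xᵢ − X̄)², because Σ(Xᵢ − X̄)² = (n − 1)s_X². The sketch below (hypothetical function names) computes both forms, and the usage lines illustrate the third bullet: doubling the spread of X halves s_b1 when RMSE and n stay fixed.

```python
import math
import statistics

def se_slope(x, rmse):
    """Standard error of the slope: s_b1 = RMSE / (sqrt(n - 1) * s_X)."""
    n = len(x)
    return rmse / (math.sqrt(n - 1) * statistics.stdev(x))

def se_slope_sxx(x, rmse):
    """Same quantity via s_b1 = RMSE / sqrt(sum((x - x_bar)^2))."""
    x_bar = statistics.mean(x)
    return rmse / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))

# Hypothetical X values and RMSE: spreading X twice as wide halves s_b1.
narrow = se_slope([1, 2, 3, 4, 5], rmse=2.0)
wide = se_slope([2, 4, 6, 8, 10], rmse=2.0)
```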
Slope Significance Test
If the regression assumptions hold, we can reject H0: β1 = 0 in favor of Ha: β1 ≠ 0 at the α level of significance if and only if the corresponding p-value < α (usually 0.05).

Test Statistic
$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \quad \text{where } \beta_1 = 0 \text{ under } H_0$$

95% Confidence Interval for β1
$$b_1 \pm t_{0.025,\,df}\; s_{b_1}$$

* tα, tα/2, and p-values are based on n − k − 1 degrees of freedom, found as df Residual in the Excel output.
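Plugging the reported slope and its standard error into the test statistic and the 95% confidence interval gives the numbers below. The critical value 1.986 is the one quoted on the JMP and Python slides (with SciPy, `scipy.stats.t.ppf(0.975, 91)` would compute it); the variable names are illustrative.

```python
b1, s_b1 = 145.371, 11.898   # slope estimate and its standard error from the output
t_crit = 1.986               # t_{0.025, 91} as quoted on the slides (n - k - 1 = 91)

t_stat = (b1 - 0) / s_b1     # H0: beta1 = 0, so the hypothesized value is 0
ci_low = b1 - t_crit * s_b1  # lower 95% limit for beta1
ci_high = b1 + t_crit * s_b1 # upper 95% limit for beta1
```

The interval is about (121.74, 169.00). Because it excludes 0, it agrees with rejecting H0: β1 = 0 at α = 0.05: each extra unit of HP is associated with roughly $122 to $169 more in mean Ave$.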
Slope Significance: Excel Output
$$t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} = 12.218$$
The slope of HP is more than 12 standard errors away from 0 (i.e., from being worthless).

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful

Slope Significance: JMP Output

$$t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} = 12.22 > 1.986 = t_{0.025,\,91}$$

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful
Slope Significance: Python Output

$$t = \frac{b_1}{s_{b_1}} = \frac{145.371}{11.898} = 12.22 > 1.986 = t_{0.025,\,91}$$

p-value ≈ 0.000 < 0.05 = α
⇒ Reject, so HP is useful
Estimation: Prediction Intervals
Prediction (X = x)
$$\hat{Y} = b_0 + b_1 x$$
Ave$ (HP = 210):
$$\hat{Y} = b_0 + b_1 x = -1399 + 145.37(210) = \$29{,}129$$

A 95% prediction interval for an individual value of Y is
$$\text{95\% PI: } \hat{Y} \pm t_{0.025,\,df_{Error}} \times S_e{}^{*}$$

A 95% PI for Ave$ when HP = 210:
$$29{,}129 \pm 1.986 \times 5977 = (\$17{,}259,\ \$40{,}999)$$

* In JMP and other programs, Se is denoted RMSE, or Root Mean Square Error.
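A minimal sketch of the slide's prediction-interval arithmetic (hypothetical function name `prediction_interval`). The slide's formula is a simplified version: the textbook interval for an individual Y multiplies Se by √(1 + 1/n + (x − x̄)²/Σ(xᵢ − x̄)²), which is always at least as wide, and noticeably wider for x far from x̄. The code follows the slide.

```python
def prediction_interval(y_hat, t_crit, se):
    """95% PI for an individual Y using the slide's simplified formula y_hat +/- t * Se.

    The exact interval multiplies Se by
    sqrt(1 + 1/n + (x - x_bar)**2 / sum((x_i - x_bar)**2)), so it is slightly wider.
    """
    half_width = t_crit * se
    return y_hat - half_width, y_hat + half_width

# Ave$ at HP = 210, with t_{0.025, 91} = 1.986 and Se = 5977 from the slides.
low, high = prediction_interval(29129, 1.986, 5977)
```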
Estimation: Confidence Intervals
Prediction (X = x)
$$\hat{Y} = b_0 + b_1 x$$
Ave$ (HP = 210):
$$\hat{Y} = b_0 + b_1 x = -1399 + 145.37(210) = \$29{,}129$$

A 95% confidence interval for the mean value of Y is
$$\text{95\% CI: } \hat{Y} \pm t_{0.025,\,df_{Error}} \times \frac{S_e{}^{*}}{\sqrt{n}}$$

A 95% CI for Ave$ when HP = 210:
$$29{,}129 \pm 1.986 \times \frac{5977}{\sqrt{93}} = (\$27{,}898,\ \$30{,}360)$$

* In JMP and other programs, Se is denoted RMSE, or Root Mean Square Error.
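The same arithmetic for the mean of Y, as a minimal sketch (hypothetical function name `mean_confidence_interval`). The slide's Se/√n term matches the exact standard error, Se·√(1/n + (x − x̄)²/Σ(xᵢ − x̄)²), only when x = x̄; at HP = 210, away from X̄ = 143.83, the exact interval is somewhat wider. The code follows the slide, and the CI is much narrower than the PI because it targets an average rather than a single car.

```python
import math

def mean_confidence_interval(y_hat, t_crit, se, n):
    """95% CI for the mean of Y using the slide's simplified formula y_hat +/- t * Se / sqrt(n).

    The exact interval uses Se * sqrt(1/n + (x - x_bar)**2 / sum((x_i - x_bar)**2)),
    which equals Se / sqrt(n) only at x = x_bar.
    """
    half_width = t_crit * se / math.sqrt(n)
    return y_hat - half_width, y_hat + half_width

# Ave$ at HP = 210, with t = 1.986, Se = 5977 and n = 93 from the slides.
low, high = mean_confidence_interval(29129, 1.986, 5977, 93)
```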
