Chapter9 MultipleRegressionAnalysis Correlation.pdf

Chapter 9 discusses multiple linear regression and correlation analysis, including the correlation matrix, coefficient of multiple determination (R²), and significance tests for regression models. It provides an example involving the BIG BEAR Company, demonstrating how advertising costs affect sales volume, and includes a dataset analyzing heating costs related to temperature, insulation, and furnace age. The chapter emphasizes the importance of understanding relationships between variables and the implications of multicollinearity in regression analysis.

Uploaded by

kgzawhein1910
Copyright
© © All Rights Reserved

CHAPTER 9

Multiple Linear Regression and Correlation Matrix

• Correlation matrix
• Coefficient of multiple determination (R²)
• Multiple linear regression equation
• F test for significance of multiple linear regression model
• t test for significance of individual slope coefficients
• Multiple standard error of the estimate (Se)
• Qualitative independent variables (dummy variables)
EXAMPLE 1 FROM CHAPTER 8
The BIG BEAR Company has advertised its products on TV several times. The advertising costs and
the sales volume after advertising were recorded on ten occasions, and the regression output is
as follows.

Independent variable is advertising cost (million baht)
Dependent variable is sales volume (million baht)

SUMMARY OUTPUT: Regression Analysis

Regression Statistics
Multiple R           0.759014109
R Square             0.576102418
Adjusted R Square    0.52311522
Standard Error       9.900823995
Observations         10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    1065.789474   1065.789   10.87248   0.01090193
Residual     8    784.2105263   98.02632
Total        9    1850

             Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept    18.94736842    8.498818559      2.229412   0.056349   -0.65094232   38.54568
x            11.84210526    3.591406333      3.297345   0.010902   3.560307409   20.1239

This suggests advertising cost has a significant effect on sales, explaining about 57.61% of
sales variation (from R²).

Simple linear regression is known as bivariate linear regression, in which one dependent
variable y is predicted using one independent variable x.
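The slope row of the output above can be checked by hand. The sketch below recomputes the t statistic and the 95% confidence interval for the slope from the reported coefficient and standard error. The critical value 2.306 is an assumption taken from a standard t table (two-tailed, 5%, 8 degrees of freedom), and the advertising cost passed to `predict_sales` is a made-up input for illustration only.

```python
# Values copied from the coefficient table (simple regression, n = 10).
a = 18.94736842      # intercept
b = 11.84210526      # slope for advertising cost
s_b = 3.591406333    # standard error of the slope
n = 10

# Assumption: t(0.025, df = n - 2 = 8) = 2.306, read from a t table.
t_crit = 2.306

t_stat = b / s_b                 # test statistic for H0: beta = 0
lower = b - t_crit * s_b         # lower 95% limit for the slope
upper = b + t_crit * s_b         # upper 95% limit for the slope


def predict_sales(ad_cost):
    """Predicted sales volume (million baht) for a given advertising cost."""
    return a + b * ad_cost


print(f"t = {t_stat:.4f}, 95% CI for slope = ({lower:.4f}, {upper:.4f})")
# Hypothetical input: advertising cost of 2 million baht.
print(f"predicted sales = {predict_sales(2):.4f}")
```

The interval excludes zero, which agrees with the slope's P-value of 0.010902 being below 0.05.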
RECALL: SIMPLE LINEAR REGRESSION
• Simple linear regression is known as bivariate linear regression, in which one
dependent variable y is predicted using one independent variable x.
• Regression analysis with two or more independent variables is called multiple
linear regression.
• Multiple regression analysis is similar in principle to simple regression
analysis. However, it is more complex conceptually and computationally.

General Form (simple linear regression)
• Population model: $Y = \alpha + \beta X$
• Sample equation: $\hat{y} = a + bx$

General Form (multiple linear regression)
• Population model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
• Sample equation: $\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
CORRELATION MATRIX
• Pearson’s coefficient of correlation (r)
• t test for significance of the correlation coefficient (ρ)
PEARSON’S CORRELATION COEFFICIENT (r)
Pearson’s correlation coefficient (r) is a number between -1 and 1 that indicates the strength
and direction of the linear relationship between two variables.
• r = 1: perfect positive correlation (x ↑ y ↑ or x ↓ y ↓)
• r = 0: zero correlation (there is no linear relationship between the variables)
• r = -1: perfect negative correlation (x ↑ y ↓ or x ↓ y ↑)
CORRELATION ANALYSIS

As in simple linear regression and correlation (Chapter 8), correlation analysis is used to
measure the relationship between two variables. In multiple linear regression (Chapter 9)
there are several variables, so the analysis reports a set of Pearson’s correlation
coefficients in a correlation matrix.

The correlation matrix shows all the possible correlation coefficients (r) between pairs of
variables.
CORRELATION ANALYSIS
The dataset examines heating costs across 20 properties and their
relationship to three key factors: mean outside temperature, attic
insulation, and furnace age. This data collection aims to understand
how these environmental and structural factors impact overall
heating expenses in residential or commercial properties.

                            Heating Cost ($)   Mean Outside Temp. (°F)   Attic Insulation (inches)   Age of Furnace (years)
Heating Cost ($)            1
Mean Outside Temp. (°F)     -0.8115            1
Attic Insulation (inches)   -0.2571            -0.1030                   1
Age of Furnace (years)      0.5367             -0.4860                   0.0636                      1
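The lower triangle above can be reproduced from raw data. The sketch below uses the 20-home table listed later in the chapter (heating cost, mean outside temperature, attic insulation). The furnace-age column is not listed in the document, so it is left out here. The function name `pearson_r` and the short variable labels are illustrative choices, not part of the original slides.

```python
import math

# Data transcribed from the chapter's 20-home table.
cost = [250, 360, 165, 43, 92, 200, 355, 290, 230, 120,
        73, 205, 400, 320, 72, 272, 94, 190, 235, 139]   # heating cost ($)
temp = [35, 29, 36, 60, 65, 30, 10, 7, 21, 55,
        54, 48, 20, 39, 60, 20, 58, 40, 27, 30]          # mean outside temp (°F)
insul = [3, 4, 7, 6, 5, 5, 6, 10, 9, 2,
         12, 5, 5, 4, 8, 5, 7, 8, 9, 7]                  # attic insulation (inches)


def pearson_r(x, y):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


variables = {"cost": cost, "temp": temp, "insul": insul}
names = list(variables)

# Full symmetric matrix; the slides show only the lower triangle.
matrix = {row: {col: pearson_r(variables[row], variables[col]) for col in names}
          for row in names}

for row in names:
    print(row.ljust(6), "  ".join(f"{matrix[row][col]:8.4f}" for col in names))
```

Each off-diagonal entry should agree with the corresponding cell of the matrix to four decimal places.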
CORRELATION ANALYSIS
Interpret the meaning of Pearson’s correlation coefficients (r) in the correlation matrix above.

• There is a ………………… linear relationship between …………. and ……………...
• There is a ………………… linear relationship between …………. and ……………...
• There is a ………………… linear relationship between …………. and ……………...
MULTICOLLINEARITY
What Is Multicollinearity?
When the independent variables $X_1, X_2, \ldots, X_k$ are related to each other instead of being
independent, we have a condition known as multicollinearity. If only two predictors are correlated, we
have collinearity. Almost any data set will have some degree of correlation among the predictors.
The depth of our concern depends on the degree of multicollinearity.
Variance Inflation
Multicollinearity does not bias the least squares estimates or the predictions for Y, but it does induce
variance inflation. When predictors are strongly intercorrelated, the variances of their estimated
coefficients tend to become inflated, widening the confidence intervals for the true coefficients
$\beta_1, \beta_2, \ldots, \beta_k$ and making the t statistics less reliable. It can thus be difficult to identify the separate
contribution of each predictor to “explaining” the response variable, due to the entanglement of their
roles.
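One common way to quantify variance inflation, not named in the slides and introduced here as an assumption, is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The VIFs equal the diagonal of the inverse of the predictors' correlation matrix, so they can be sketched from the three predictor correlations reported in the correlation matrix. The helper `invert` is an illustrative Gauss-Jordan routine.

```python
# Predictor-only correlations from the correlation matrix, in the order
# mean outside temperature, attic insulation, age of furnace.
R = [[1.0, -0.1030, -0.4860],
     [-0.1030, 1.0, 0.0636],
     [-0.4860, 0.0636, 1.0]]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


inverse = invert(R)
names = ["temp", "insulation", "furnace age"]
vif = [inverse[j][j] for j in range(len(names))]

for name, value in zip(names, vif):
    print(f"VIF({name}) = {value:.3f}")
```

All three values are close to 1, which suggests little variance inflation in this data set. A frequently quoted rule of thumb treats values above about 10 as a warning sign, but that threshold is a convention rather than something stated in this chapter.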
RECALL: CORRELATION ANALYSIS (CHAPTER 8)

Order of the analysis
• Start with data visualization (scatter plot) to see the pattern
• Then check the coefficient of correlation (r) for the direction and strength of the relationship
• Then check the coefficient of determination (r²) to check how total variation in x can be used
  to explain total variation in y
• Finally test specific parameter significance (t test on ρ) to check whether the statistical
  inference can be generalized from sample to population
RECALL: t TEST FOR SIGNIFICANCE OF THE CORRELATION COEFFICIENT (ρ)

This method is statistical inference. We use the sample correlation coefficient (r) to infer, or
draw a conclusion about, the population correlation coefficient (ρ) at a particular level of
significance (α).

Hypotheses
State the null and alternative hypotheses in symbolic form
H0: ρ = 0     H0: ρ ≥ 0     H0: ρ ≤ 0
HA: ρ ≠ 0     HA: ρ < 0     HA: ρ > 0

Test statistic
Calculate the test statistic
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $df = n - 2$
t Test for Significance of the Correlation Coefficient (ρ)
Determine whether there is any correlation between the heating cost ($) and the mean outside
temperature (°F) in the population, at level of significance = 1%.

From the correlation matrix, r = -0.8115 between heating cost ($) and mean outside
temperature (°F), with n = 20.

Hypotheses
H0: ρ = 0     H0: ρ ≥ 0     H0: ρ ≤ 0
HA: ρ ≠ 0     HA: ρ < 0     HA: ρ > 0

Test statistic
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $df = n - 2$

There is …………………. between the heating cost ($) and the mean outside temperature
(°F) in the population, at the 1% level of significance.
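The calculation for this example can be sketched as follows, using the standard test statistic t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The critical value 2.878 is an assumption taken from a t table (two-tailed, α = 0.01, df = 18); it is not printed in the slides.

```python
import math

r = -0.8115     # sample correlation: heating cost vs mean outside temperature
n = 20          # number of properties
alpha = 0.01    # level of significance

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: t(alpha/2 = 0.005, df = 18) = 2.878 from a t table.
t_crit = 2.878

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, critical value = ±{t_crit}, reject H0: {reject_h0}")
```

Because |t| is well above the critical value, the null hypothesis of zero correlation is rejected at the 1% level.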
t Test for Significance of the Correlation Coefficient (ρ)
The CORR Procedure

4 Variables: Heat Mean Attic Age

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
(each variable shows r on its first row and the p-value on its second row)

                                Heat        Mean        Attic       Age
Heat                            1.00000    -0.81151    -0.25710     0.53673
Heating Cost ( $ )                          <.0001      0.2738      0.0147
Mean                           -0.81151     1.00000    -0.10302    -0.48599
Mean Outside Temperature        <.0001                  0.6656      0.0298
Attic                          -0.25710    -0.10302     1.00000     0.06362
Attic Insulation ( inches )     0.2738      0.6656                  0.7899
Age                             0.53673    -0.48599     0.06362     1.00000
Age of Furnace ( years )        0.0147      0.0298      0.7899

• At 1% significance level, is there any significant correlation between heating cost ($) and
  attic insulation (inches) in the population?
• At 1% significance level, is there any significant correlation between heating cost ($) and
  age of furnace (years) in the population?
• At 1% significance level, is there any significant positive correlation between heating
  cost ($) and age of furnace (years) in the population?
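The three questions can be approached with the p-value method: reject H0 when the p-value is below α. The output reports two-tailed p-values (Prob > |r|), so for a one-tailed alternative the p-value is halved when the sign of r agrees with the alternative. The sketch below applies this rule; the function name `one_tailed_p` is illustrative.

```python
ALPHA = 0.01


def one_tailed_p(r, p_two_tailed, alternative):
    """Convert a two-tailed p-value for H0: rho = 0 into a one-tailed p-value."""
    half = p_two_tailed / 2
    if alternative == "greater":          # HA: rho > 0
        return half if r > 0 else 1 - half
    if alternative == "less":             # HA: rho < 0
        return half if r < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# r and two-tailed p-values copied from the CORR output.
p_attic = 0.2738                                    # heating cost vs attic insulation
p_age = 0.0147                                      # heating cost vs age of furnace
p_age_positive = one_tailed_p(0.53673, 0.0147, "greater")

decisions = {
    "cost vs attic (two-tailed)": p_attic < ALPHA,
    "cost vs age (two-tailed)": p_age < ALPHA,
    "cost vs age (positive, one-tailed)": p_age_positive < ALPHA,
}

for question, significant in decisions.items():
    print(f"{question}: significant at 1% = {significant}")
```

The same r = 0.53673 is not significant in the two-tailed test but is significant in the one-tailed test, because the one-tailed p-value is half of 0.0147.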
t Test for Significance of the Correlation Coefficient (ρ)
Using the CORR Procedure output above:
• Which pair of the variables has the strongest correlation?
• Which pair of the variables has the weakest correlation?
MULTIPLE LINEAR REGRESSION
• Multiple linear regression equation
• F test for significance of multiple linear regression model
• t test for significance of individual slope coefficients
FYI
Cause and Effect?
When we propose a regression model, we might have a causal mechanism in mind, but cause
and effect is not proven by a simple regression. We cannot assume that the explanatory
variable is “causing” the variation we see in the response variable.

Extrapolation Outside the Range of X


Predictions from our fitted regression model are stronger within the range of our sample x
values. The relationship seen in the scatter plot may not be true for values far outside our
observed x range. Extrapolation outside the observed range of x is always tempting but should
be approached with caution.
EXAMPLE 1: MULTIPLE LINEAR REGRESSION
The dataset examines heating costs across 20 properties and their relationship to three key
factors: mean outside temperature, attic insulation, and furnace age. This data collection aims to
understand how these environmental and structural factors impact overall heating expenses in
residential or commercial properties.

Independent variable is ………………………………………………………………


Dependent variable is ………………………………

Order of the analysis


• Start with building the model
• Then test overall model significance (ANOVA)
• Then test specific parameter significance (t-test)
• Finally examine various measures of model fit (𝑅2 , 𝑆𝑒 )
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8968
R Square             0.8042
Adjusted R Square    0.7675
Standard Error       51.0486
Observations         20

$R^2 = \dfrac{SSR}{SST}$
$S_e = \sqrt{MSE}$

ANOVA
             df   SS            MS           F         Significance F
Regression   3    171220.4728   57073.4909   21.9012   6.56178E-06
Residual     16   41695.2772    2605.9548
Total        19   212915.7500

                            Coefficients   Standard Error   t Stat    P-value
Intercept                   427.1938       59.6014          7.1675    2.23764E-06
Mean Outside Temp. (°F)     -4.5827        0.7723           -5.9336   2.10035E-05
Attic Insulation (inches)   -14.8309       4.7544           -3.1194   0.0066
Age of Furnace (years)      6.1010         4.0121           1.5207    0.1479
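The fit measures in the Regression Statistics block follow directly from the ANOVA sums of squares. The sketch below recomputes R², adjusted R² and Se. The adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), is the standard one and is stated here as an assumption since the slides do not print it.

```python
import math

# Sums of squares copied from the ANOVA table (n = 20 observations, k = 3 predictors).
ssr = 171220.4728    # regression sum of squares
sse = 41695.2772     # residual (error) sum of squares
sst = 212915.7500    # total sum of squares
n, k = 20, 3

r_square = ssr / sst                                        # R² = SSR / SST
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)   # penalizes extra predictors
mse = sse / (n - k - 1)                                     # mean square error
s_e = math.sqrt(mse)                                        # multiple standard error of the estimate

print(f"R² = {r_square:.4f}")
print(f"Adjusted R² = {adj_r_square:.4f}")
print(f"Se = {s_e:.4f}")
```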
1: Build regression model (Regression equation)

$\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$
$\hat{y} = 427.1938 - 4.5827 x_1 - 14.8309 x_2 + 6.1010 x_3$

where $x_1$ is the mean outside temperature (°F), $x_2$ is the attic insulation (inches) and $x_3$ is
the age of furnace (years). The coefficients are read from the Coefficients column of the
summary output above.
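Once the equation is built, a prediction is a direct substitution. The sketch below wraps the fitted coefficients in a small function. The example home (30 °F, 5 inches of insulation, 10-year-old furnace) is hypothetical and is not a row of the data set.

```python
# Coefficients read from the three-predictor summary output.
INTERCEPT = 427.1938
B_TEMP = -4.5827       # mean outside temperature (°F)
B_INSUL = -14.8309     # attic insulation (inches)
B_AGE = 6.1010         # age of furnace (years)


def predict_heating_cost(temp_f, insulation_in, furnace_age_yr):
    """Predicted heating cost ($) from the fitted regression equation."""
    return (INTERCEPT
            + B_TEMP * temp_f
            + B_INSUL * insulation_in
            + B_AGE * furnace_age_yr)


# Hypothetical home, chosen inside the observed range of each predictor's likely values.
example = predict_heating_cost(30, 5, 10)
print(f"predicted heating cost = ${example:.2f}")
```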
RECALL: F TEST FOR SIGNIFICANCE OF SIMPLE LINEAR REGRESSION

2: Test overall model significance (ANOVA) – CV vs test statistic

$R^2 = \dfrac{SSR}{SST}$
$S_e = \sqrt{MSE}$

F Test for Significance of Multiple Linear Regression

2: Test overall model significance (ANOVA) – CV vs test statistic

k = the number of independent variables
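The global F test can be sketched with the standard statistic F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1)), which has k and n - k - 1 degrees of freedom. The numbers below come from the heating-cost ANOVA table in this chapter. The critical value 3.24 is an assumption read from an F table (α = 0.05, df 3 and 16), and α = 0.05 itself is an assumed significance level.

```python
# ANOVA inputs from the heating-cost example (n = 20 observations, k = 3 predictors).
ssr = 171220.4728
sse = 41695.2772
n, k = 20, 3
alpha = 0.05                    # assumed level of significance

df_regression = k
df_residual = n - k - 1

msr = ssr / df_regression       # mean square regression
mse = sse / df_residual         # mean square error
f_stat = msr / mse              # test statistic

# Approach 1 (CV vs test statistic).
# Assumption: F(0.05; 3, 16) = 3.24 from an F table.
f_crit = 3.24
reject_by_cv = f_stat > f_crit

# Approach 2 (alpha vs p-value), using Significance F from the output.
significance_f = 6.56178e-06
reject_by_p = significance_f < alpha

print(f"F = {f_stat:.4f} with df = ({df_regression}, {df_residual})")
print(f"reject H0 by critical value: {reject_by_cv}; by p-value: {reject_by_p}")
```

Both approaches lead to the same decision, as they always should for the same α.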


2: Test overall model significance (ANOVA)

H0: The regression model is not significant or not useful
Ha: The regression model is significant or useful

• CV vs test statistic: compare the F statistic in the ANOVA table of the summary output with
  the critical value.
• Alpha vs P-value: compare Significance F in the ANOVA table of the summary output with α.

2: Test overall model significance (ANOVA) & R²

The regression model is significant or useful and approximately …………….. % of the total
variation in ………………….. can be explained by the three independent variables, namely mean
outside temperature (℉), attic insulation (inches) and age of furnace (years).
3: Slopes (b1)
With all other variables held constant, if the mean outside temperature ……………. , the heating
cost is expected to ……………………………

3: Slopes (b2)
With all other variables held constant, if the attic insulation …………………. , the heating cost is
expected to ……………………………

3: Slopes (b3)
With all other variables held constant, if the age of furnace …………………. , the heating cost is
expected to ……………………………
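The phrase "with all other variables held constant" can be made concrete: raising one predictor by one unit while freezing the others changes the prediction by exactly that predictor's slope. The sketch below checks this numerically with the fitted coefficients. The baseline home is hypothetical, and any baseline gives the same differences because the model is linear.

```python
# Fitted coefficients from the three-predictor summary output.
intercept = 427.1938
slopes = {"temp": -4.5827, "insulation": -14.8309, "age": 6.1010}


def predict(home):
    """Predicted heating cost ($) for a dict of predictor values."""
    return intercept + sum(slopes[name] * home[name] for name in slopes)


# Hypothetical baseline home.
baseline = {"temp": 30, "insulation": 5, "age": 10}

effects = {}
for name in slopes:
    bumped = dict(baseline)
    bumped[name] += 1                       # one-unit increase in a single predictor
    effects[name] = predict(bumped) - predict(baseline)

for name, change in effects.items():
    print(f"+1 unit of {name}: predicted cost changes by {change:+.4f} dollars")
```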
RECALL: t TEST FOR SIGNIFICANCE OF LINEAR RELATIONSHIP BETWEEN TWO VARIABLES (β)

Step 1: State the null and alternative hypotheses in symbolic form
Step 2: Find the critical value (-CV and +CV for a two-tailed test; -CV or +CV for a one-tailed test)
Step 3: Find the test statistic
Step 4: Decision & Conclusion

The null hypothesis is ……………….. and the conclusion would be that there is
…………………… linear relationship between ………………………. and ……………………...

If βi could be zero, we can eliminate the corresponding variable from our model, and we need
to find a new regression equation.
3: t Test for Significance of Individual Slope Coefficients (β1, β2, β3)

Use the Coefficients, Standard Error, t Stat and P-value columns of the summary output above.

• The null hypothesis is …………… and the conclusion would be that there is
  ……................................. linear relationship between mean outside temperature and heating cost.
• The null hypothesis is …………… and the conclusion would be that there is
  ……................................. linear relationship between attic insulation and heating cost.
• The null hypothesis is …………… and the conclusion would be that there is
  ……................................. linear relationship between age of furnace and heating cost.

Since β3 could be zero, we can eliminate age of furnace from our model and then run the
regression analysis again.
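The individual slope tests can be sketched with the statistic t = b / Sb and n - k - 1 degrees of freedom. The coefficients and standard errors are those of the three-predictor output. The critical value 2.120 is an assumption from a t table (two-tailed, α = 0.05, df = 16).

```python
# (coefficient, standard error) pairs from the three-predictor output.
coefficients = {
    "Mean Outside Temp.": (-4.5827, 0.7723),
    "Attic Insulation": (-14.8309, 4.7544),
    "Age of Furnace": (6.1010, 4.0121),
}
n, k = 20, 3
df = n - k - 1

# Assumption: t(0.025, df = 16) = 2.120 from a t table (alpha = 0.05, two-tailed).
t_crit = 2.120

results = {}
for name, (b, s_b) in coefficients.items():
    t_stat = b / s_b                         # test statistic for H0: beta_i = 0
    results[name] = (t_stat, abs(t_stat) > t_crit)

for name, (t_stat, reject) in results.items():
    print(f"{name}: t = {t_stat:.4f}, reject H0: {reject}")
```

Only the furnace-age slope fails to reach significance, which matches its P-value of 0.1479 and motivates dropping that variable.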
EXAMPLE 1: MULTIPLE LINEAR REGRESSION
Regression rerun after eliminating age of furnace from the model.

1: Build regression model (Regression equation)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8808
R Square             0.7759
Adjusted R Square    0.7495
Standard Error       52.9824
Observations         20

ANOVA
             df   SS            MS           F         Significance F
Regression   2    165194.5213   82597.2607   29.4241   3.01497E-06
Residual     17   47721.2287    2807.1311
Total        19   212915.7500

                            Coefficients   Standard Error   t Stat    P-value
Intercept                   490.2859       44.4098          11.0400   3.56342E-09
Mean Outside Temp. (°F)     -5.1499        0.7019           -7.3372   1.16062E-06
Attic Insulation (inches)   -14.7181       4.9339           -2.9831   0.0084
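The two-predictor output can be reproduced from the raw data. The sketch below is a minimal least-squares fit that solves the normal equations (XᵀX)b = Xᵀy with plain Gaussian elimination, using the 20-home data listed later in the chapter. The helper names `solve` and `fit_ols` are illustrative. A production analysis would normally use a statistics package instead.

```python
cost = [250, 360, 165, 43, 92, 200, 355, 290, 230, 120,
        73, 205, 400, 320, 72, 272, 94, 190, 235, 139]   # heating cost ($)
temp = [35, 29, 36, 60, 65, 30, 10, 7, 21, 55,
        54, 48, 20, 39, 60, 20, 58, 40, 27, 30]          # mean outside temp (°F)
insul = [3, 4, 7, 6, 5, 5, 6, 10, 9, 2,
         12, 5, 5, 4, 8, 5, 7, 8, 9, 7]                  # attic insulation (inches)


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(y, *predictors):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0, *values] for values in zip(*predictors)]   # design matrix with intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


a, b1, b2 = fit_ols(cost, temp, insul)

fitted = [a + b1 * t + b2 * i for t, i in zip(temp, insul)]
mean_y = sum(cost) / len(cost)
sse = sum((y - f) ** 2 for y, f in zip(cost, fitted))
sst = sum((y - mean_y) ** 2 for y in cost)
r_square = 1 - sse / sst

print(f"y-hat = {a:.4f} + ({b1:.4f})·temp + ({b2:.4f})·insulation")
print(f"R² = {r_square:.4f}")
```

R² falls only slightly compared with the three-predictor model (0.8042 to 0.7759), which is consistent with furnace age contributing little once temperature and insulation are included.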
EXAMPLE 2

Order of the analysis
• Start with building the model
• Then test overall model significance (ANOVA)
• Then test specific parameter significance (t-test)
• Finally examine various measures of model fit (R², Se)

What factors determine how happy workers are in their jobs? Use the following data and
multiple regression to produce a model to predict employee satisfaction and then comment on
the results of the process.

Job            Relationship      Overall quality of   Total hours       Opportunities
satisfaction   with supervisor   work environment     worked per week   for advancement
60             2                 7                    56                3
15             1                 1                    75                1
95             5                 9                    40                5
56             4                 8                    69                4
40             2                 3                    55                3
80             4                 8                    40                5
10             1                 1                    90                1
90             5                 9                    35                5
30             3                 6                    55                2
65             3                 6                    55                2
75             3                 6                    54                3
15             2                 1                    55                1
EXAMPLE 3

Order of the analysis
• Start with building the model
• Then test overall model significance (ANOVA)
• Then test specific parameter significance (t-test)
• Finally examine various measures of model fit (R², Se)

The owner of Showtime Movie Theaters, Inc., would like to predict weekly gross revenue as a
function of advertising expenditures. Historical data for a sample of eight weeks follow.

Weekly Gross          Social Media             Television
Revenue in $1000s     Advertising in $1000s    Advertising in $1000s
96                    5                        1.5
90                    2                        2
95                    4                        1.5
92                    2.5                      2.5
95                    3                        3.3
94                    3.5                      2.3
94                    2.5                      4.2
94                    3                        2.5
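For the first step (building the model), a two-predictor regression has a compact closed form in terms of sums of squared and cross deviations. The sketch below applies it to the eight weeks of data above. The helper `s` and the variable names are illustrative, and the significance tests in the remaining steps are left to the reader.

```python
revenue = [96, 90, 95, 92, 95, 94, 94, 94]        # weekly gross revenue ($1000s)
social = [5, 2, 4, 2.5, 3, 3.5, 2.5, 3]           # social media advertising ($1000s)
tv = [1.5, 2, 1.5, 2.5, 3.3, 2.3, 4.2, 2.5]       # television advertising ($1000s)


def mean(values):
    return sum(values) / len(values)


def s(u, v):
    """Sum of cross deviations of u and v about their means."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


s11, s22, s12 = s(social, social), s(tv, tv), s(social, tv)
s1y, s2y = s(social, revenue), s(tv, revenue)

# Solve the 2x2 normal equations in deviation form by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = mean(revenue) - b1 * mean(social) - b2 * mean(tv)

# R² = SSR / SST, with SSR = b1·S1y + b2·S2y for a least-squares fit.
r_square = (b1 * s1y + b2 * s2y) / s(revenue, revenue)

print(f"y-hat = {a:.4f} + {b1:.4f}·social + {b2:.4f}·tv")
print(f"R² = {r_square:.4f}")
```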
DUMMY VARIABLE
Multiple regression models can also be written to include qualitative (or categorical) independent
variables. Qualitative variables, unlike quantitative variables, cannot be measured on a
numerical scale. Therefore, we must code the values of the qualitative variables (called levels)
as numbers before we can fit the model. These coded qualitative variables are called dummy (or
indicator) variables because the numbers assigned to the various levels are arbitrarily selected.

To illustrate, suppose a female executive at a certain company claims that male executives earn
higher salaries, on average, than female executives with the same education, experience, and
responsibilities. To support her claim, she wants to model the salary y of an executive using a
qualitative independent variable representing the gender of an executive (male or female).
EXAMPLE 1

An empirical investigation was conducted to examine the factors that impact overall heating
expenses across 20 residential and commercial properties. The initial analysis employed
multiple linear regression to assess the relationship between heating expenditure and three
predictor variables: mean ambient temperature, attic insulation (inches), and furnace age (years).

Upon statistical analysis, the results indicated no significant linear association between
furnace age and heating costs (p > 0.05).

Consequently, the study was modified to incorporate an alternative binary predictor variable:
the presence or absence of a garage structure within each property unit.

Home   Heating Cost ($)   Mean Outside Temp. (°F)   Attic Insulation (inches)   Garage
1      250                35                        3                           0
2      360                29                        4                           1
3      165                36                        7                           0
4      43                 60                        6                           0
5      92                 65                        5                           0
6      200                30                        5                           0
7      355                10                        6                           1
8      290                7                         10                          1
9      230                21                        9                           0
10     120                55                        2                           0
11     73                 54                        12                          0
12     205                48                        5                           1
13     400                20                        5                           1
14     320                39                        4                           1
15     72                 60                        8                           0
16     272                20                        5                           1
17     94                 58                        7                           0
18     190                40                        8                           1
19     235                27                        9                           0
20     139                30                        7                           0
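A quick way to see what a dummy variable does is to regress heating cost on the garage indicator alone. In that simple regression the intercept equals the mean cost of the group coded 0 and the slope equals the difference between the two group means. The sketch below shows this with the table's data. It assumes 1 means a garage is present and 0 means absent, which the table does not state explicitly, and it ignores temperature and insulation, so it is not the chapter's full model.

```python
cost = [250, 360, 165, 43, 92, 200, 355, 290, 230, 120,
        73, 205, 400, 320, 72, 272, 94, 190, 235, 139]   # heating cost ($)
garage = [0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
          0, 1, 1, 1, 0, 1, 0, 1, 0, 0]                  # assumed coding: 1 = garage, 0 = none


def mean(values):
    return sum(values) / len(values)


# Group means for each level of the dummy variable.
mean_with = mean([c for c, g in zip(cost, garage) if g == 1])
mean_without = mean([c for c, g in zip(cost, garage) if g == 0])

# Simple regression of cost on the dummy: slope = Sxy / Sxx, intercept = y-bar - slope·x-bar.
mean_g, mean_c = mean(garage), mean(cost)
sxy = sum((g - mean_g) * (c - mean_c) for g, c in zip(garage, cost))
sxx = sum((g - mean_g) ** 2 for g in garage)
slope = sxy / sxx
intercept = mean_c - slope * mean_g

print(f"mean cost, garage = 1: {mean_with:.2f}")
print(f"mean cost, garage = 0: {mean_without:.2f}")
print(f"y-hat = {intercept:.2f} + {slope:.2f}·garage")
```

In the full multiple regression the garage coefficient has the same reading, a shift in predicted heating cost between the two groups, but with temperature and insulation held constant.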
REVISION: SLR vs MLR

SLR: the linear relationship between one Y and one X
Predict y from x (for example, predict Final from Midterm)

Model: $Y = \alpha + \beta x$

          Test Linear Relationship     Test Correlation
Ho :      β = 0                        ρ = 0
Ha :      β ≠ 0                        ρ ≠ 0
CV :      t, df = n − 2
ts :      t = b / Sb
D & C :

MLR: the linear relationship between one Y and many X
Predict Final from Quiz, Midterm, Attendance and Assignment

        y     x1     x2    x3    x4
No.     FIN   Quiz   Mid   ATT   ASS
1       40    6      25    4     5
:       :     :      :     :     :
100

• Scatter diagram
• Correlation (r)
• Test correlation: Ho : ρ = 0
• Regression model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$
• Test regression model (global test): Ho : β1 = β2 = β3 = β4 = 0
  (K = the number of independent variables)
• Test individual slope (test linear relationship): Ho : β1 = 0, Ho : β2 = 0, Ho : β3 = 0, Ho : β4 = 0

Ho : β1 = 0   There is no linear relationship between y and x1.
Ha : β1 ≠ 0   There is a linear relationship between y and x1.

Ho : β2 = 0   There is no linear relationship between y and x2.
Ha : β2 ≠ 0   There is a linear relationship between y and x2.

Ho : β3 = 0   There is no linear relationship between y and x3.
Ha : β3 ≠ 0   There is a linear relationship between y and x3.