Chapter9 MultipleRegressionAnalysis Correlation.pdf
Simple linear regression is known as bivariate linear regression, in which one dependent variable y is predicted using one independent variable x.

Regression Statistics
Multiple R          0.759014109
R Square            0.576102418
Adjusted R Square   0.52311522
Standard Error      9.900823995
Observations        10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    1065.789474   1065.789   10.87248   0.01090193
Residual     8    784.2105263   98.02632
Total        9    1850

            Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   18.94736842    8.498818559      2.229412   0.056349   -0.65094232   38.54568    -0.65094232   38.54567916
x           11.84210526    3.591406333      3.297345   0.010902   3.560307409   20.1239     3.560307409   20.12390312

This suggests advertising cost has a significant effect on sales, explaining about 57.61% of sales variation (from R²).
CORRELATION MATRIX
• Correlation (r)
• t Test for significance of the correlation coefficient (ρ)
PEARSON’S CORRELATION COEFFICIENT (r)
Pearson’s correlation coefficient (r) is a number between -1 and 1 that conveys the strength and direction of the linear relationship between two variables.
r = 1    Perfect positive correlation   x ↑ y ↑ or x ↓ y ↓
r = 0    Zero correlation               There’s no linear relationship between the variables
r = -1   Perfect negative correlation   x ↑ y ↓ or x ↓ y ↑
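As a minimal sketch of how r is obtained from raw data, the snippet below computes Pearson's r from its definition (covariation divided by the square root of the product of the two sums of squares). The function name `pearson_r` is illustrative; the two data columns are the heating cost and mean outside temperature values listed in the Example 1 data table of this chapter.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Heating cost ($) and mean outside temperature (F) for the 20 homes.
cost = [250, 360, 165, 43, 92, 200, 355, 290, 230, 120,
        73, 205, 400, 320, 72, 272, 94, 190, 235, 139]
temp = [35, 29, 36, 60, 65, 30, 10, 7, 21, 55,
        54, 48, 20, 39, 60, 20, 58, 40, 27, 30]

r = pearson_r(temp, cost)
print(round(r, 4))  # -0.8115
```

The negative sign gives the direction (cost falls as temperature rises) and the magnitude gives the strength.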
CORRELATION ANALYSIS
CORRELATION MATRIX
                            Heating Cost ($)   Mean Outside Temp. (F)   Attic Insulation (inches)   Age of Furnace (years)
Heating Cost ($)            1
Mean Outside Temp. (F)      -0.8115            1
Attic Insulation (inches)   -0.2571            -0.1030                  1
Age of Furnace (years)      0.5367             -0.4860                  0.0636                      1
Interpret the meaning of Pearson’s correlation coefficients (r).
MULTICOLLINEARITY
What Is Multicollinearity?
When the independent variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑘 are related to each other instead of being
independent, we have a condition known as multicollinearity. If only two predictors are correlated, we
have collinearity. Almost any data set will have some degree of correlation among the predictors.
The depth of our concern would depend on the degree of multicollinearity.
Variance Inflation
Multicollinearity does not bias the least squares estimates or the predictions for Y, but it does induce
variance inflation. When predictors are strongly intercorrelated, the variances of their estimated
coefficients tend to become inflated, widening the confidence intervals for the true coefficients 𝛽1 ,
𝛽2 , . . . , 𝛽𝑘 and making the t statistics less reliable. It can thus be difficult to identify the separate
contribution of each predictor to “explaining” the response variable, due to the entanglement of their
roles.
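Variance inflation is usually quantified with the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below is one way to get the VIFs for the three heating-cost predictors directly from their pairwise correlations in the correlation matrix. It relies on the fact that VIF_j is the j-th diagonal element of the inverse of the predictor correlation matrix, and the closed form used here holds only for exactly three predictors. The function name is illustrative.

```python
def vif_three_predictors(r12, r13, r23):
    """VIFs for three predictors from their pairwise correlations.

    VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of the inverse
    of the predictor correlation matrix; for a 3x3 matrix that element is a
    2x2 cofactor divided by the determinant.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return (
        (1 - r23**2) / det,  # predictor 1
        (1 - r13**2) / det,  # predictor 2
        (1 - r12**2) / det,  # predictor 3
    )

# Pairwise correlations among the predictors, read off the correlation matrix:
# temperature-insulation, temperature-furnace age, insulation-furnace age.
vif_temp, vif_attic, vif_age = vif_three_predictors(-0.1030, -0.4860, 0.0636)
print(round(vif_temp, 2), round(vif_attic, 2), round(vif_age, 2))  # 1.32 1.01 1.31
```

Values this close to 1 indicate very little variance inflation among these predictors.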
RECALL: CORRELATION ANALYSIS (CHAPTER 8)
This method is statistical inference. We use the sample correlation coefficient (r) to infer, or draw a conclusion about, the population correlation coefficient (ρ) at a particular level of significance (α).
Hypotheses
State the null and alternative hypotheses in symbolic form
H0: ρ = 0    H0: ρ ≥ 0    H0: ρ ≤ 0
HA: ρ ≠ 0    HA: ρ < 0    HA: ρ > 0
Test statistic
t = r √(n − 2) / √(1 − r²), with df = n − 2
There is …………………. between the heating cost ($) and the mean outside temperature (F) in the population, at the 1% level of significance.
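The test can be sketched numerically using the usual test statistic for a correlation coefficient, t = r √(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The values r = −0.81151 and n = 20 are taken from the correlation output for heating cost and mean outside temperature. The critical value 2.878 is the two-tailed t(0.005, 18) entry from a standard t table and is an assumption brought in from outside this chapter.

```python
from math import sqrt

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

r, n = -0.81151, 20        # heating cost vs. mean outside temperature
t_stat = correlation_t_stat(r, n)

# Two-tailed critical value t(0.005, 18) from a standard t table (assumed here).
t_crit = 2.878
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)  # -5.892 True
```

For a one-tailed alternative, only the critical value and the direction of the comparison change; the test statistic is the same.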
t Test for Significance of the Correlation Coefficient (ρ)
The CORR Procedure
4 Variables: Heat Mean Attic Age
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
                                Heat       Mean      Attic        Age
Heat                         1.00000   -0.81151   -0.25710    0.53673
Heating Cost ($)                         <.0001     0.2738     0.0147
Mean                        -0.81151    1.00000   -0.10302   -0.48599
Mean Outside Temperature      <.0001                0.6656     0.0298
Attic                       -0.25710   -0.10302    1.00000    0.06362
Attic Insulation (inches)     0.2738     0.6656                0.7899
Age                          0.53673   -0.48599    0.06362    1.00000
Age of Furnace (years)        0.0147     0.0298     0.7899
At the 1% significance level, is there any significant correlation between heating cost ($) and attic insulation (inches) in the population?
At the 1% significance level, is there any significant correlation between heating cost ($) and age of furnace (years) in the population?
At the 1% significance level, is there any significant positive correlation between heating cost ($) and age of furnace (years) in the population?
t Test for Significance of the Correlation Coefficient (ρ)
From the same CORR Procedure output (Pearson Correlation Coefficients, N = 20):
Which pair of the variables has the strongest correlation?
Which pair of the variables has the weakest correlation?
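One way to compare the pairs is by the absolute value of r, since the sign only indicates direction. The short sketch below applies that rule to the six coefficients copied from the CORR Procedure output. The dictionary layout is just an illustrative choice.

```python
from itertools import combinations

names = ["Heat", "Mean", "Attic", "Age"]
# Pearson coefficients copied from the CORR Procedure output.
r = {
    ("Heat", "Mean"): -0.81151,
    ("Heat", "Attic"): -0.25710,
    ("Heat", "Age"): 0.53673,
    ("Mean", "Attic"): -0.10302,
    ("Mean", "Age"): -0.48599,
    ("Attic", "Age"): 0.06362,
}

# Strength is judged by |r|; the sign only gives the direction.
pairs = list(combinations(names, 2))
strongest = max(pairs, key=lambda p: abs(r[p]))
weakest = min(pairs, key=lambda p: abs(r[p]))
print(strongest, weakest)
```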
MULTIPLE LINEAR REGRESSION
• Multiple linear regression equation
• F test for significance of multiple linear regression model
• t test for significance of individual slope coefficients
FYI
Cause and Effect?
When we propose a regression model, we might have a causal mechanism in mind, but cause
and effect is not proven by a simple regression. We cannot assume that the explanatory
variable is “causing” the variation we see in the response variable.
EXAMPLE 1: MULTIPLE LINEAR REGRESSION
The dataset examines heating costs across 20 properties and their relationship to three key
factors: mean outside temperature, attic insulation, and furnace age. This data collection aims to
understand how these environmental and structural factors impact overall heating expenses in
residential or commercial properties.
ANOVA
             df   SS            MS           F         Significance F
Regression   3    171220.4728   57073.4909   21.9012   6.56178E-06
Residual     16   41695.2772    2605.9548
Total        19   212915.7500
R² = SSR / SST
Se = √MSE
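The measures of fit can be reproduced from the ANOVA table alone. The sketch below plugs in the sums of squares and degrees of freedom shown above. The adjusted R² line uses the standard formula 1 − (SSE / df residual) / (SST / df total), which is not written out in the slides.

```python
from math import sqrt

# Sums of squares and degrees of freedom from the ANOVA table above.
ssr, sse, sst = 171220.4728, 41695.2772, 212915.7500
df_resid, df_total = 16, 19

r_squared = ssr / sst                                  # share of variation explained
adj_r_squared = 1 - (sse / df_resid) / (sst / df_total)
std_error = sqrt(sse / df_resid)                       # Se = sqrt(MSE)

print(round(r_squared, 4), round(adj_r_squared, 4), round(std_error, 4))
# 0.8042 0.7675 51.0486
```

These agree with the R Square, Adjusted R Square, and Standard Error entries of the Regression Statistics block.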
F Test for Significance of Multiple Linear Regression
ANOVA
             df   SS            MS           F         Significance F
Regression   3    171220.4728   57073.4909   21.9012   6.56178E-06
Residual     16   41695.2772    2605.9548
Total        19   212915.7500
Regression Statistics
Multiple R          0.8968
R Square            0.8042
Adjusted R Square   0.7675
Standard Error      51.0486
Observations        20
With all other variables held constant, if the mean outside temperature ……………. , the heating cost is expected to ……………………………
With all other variables held constant, if the attic insulation …………………. , the heating cost is expected to ……………………………
With all other variables held constant, if the age of furnace …………………. , the heating cost is expected to ……………………………
The null hypothesis is ……………….. and the conclusion would be that there is
…………………… linear relationship between ………………………. and ……………………...
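The F statistic in the ANOVA table can be recomputed as MSR / MSE, which makes the global test concrete. In the sketch below, the sums of squares and the Significance F value are taken from the ANOVA table, while the 5% level of significance is only an illustrative choice, since the slide leaves the level unspecified.

```python
# Values from the ANOVA table: k = 3 predictors, n = 20 observations.
ssr, sse = 171220.4728, 41695.2772
k, n = 3, 20

msr = ssr / k              # mean square regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse
print(round(f_stat, 4))    # 21.9012

# Global test H0: beta1 = beta2 = beta3 = 0.
# Excel's "Significance F" is the p-value of this statistic.
alpha = 0.05               # illustrative level of significance (an assumption)
significance_f = 6.56178e-06
print(significance_f < alpha)  # True -> reject H0
```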
3: t Test for Significance of Individual Slope Coefficients (𝛽1 , 𝛽2 , 𝛽3 )
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.8968
R Square            0.8042
Adjusted R Square   0.7675
Standard Error      51.0486
Observations        20
ANOVA
             df   SS            MS           F         Significance F
Regression   3    171220.4728   57073.4909   21.9012   6.56178E-06
Residual     16   41695.2772    2605.9548
Total        19   212915.7500
Since 𝛽3 could be zero, we can eliminate age of furnace from the model and run the regression analysis again. The ANOVA table for the reduced model, with the two remaining predictors, is:
ANOVA
             df   SS            MS           F         Significance F
Regression   2    165194.5213   82597.2607   29.4241   3.01497E-06
Residual     17   47721.2287    2807.1311
Total        19   212915.7500
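The coefficient table for furnace age is not reproduced here, but the two ANOVA tables (three predictors and two predictors) are enough for a cross-check. When a single predictor is dropped, the partial F statistic equals the square of that predictor's t statistic. The sketch below uses that identity. The critical value 2.120 is the two-tailed t(0.025, 16) entry from a standard t table, an assumption brought in from outside this chapter.

```python
from math import sqrt

# Residual sums of squares from the two ANOVA tables.
sse_full, df_full = 41695.2772, 16        # temperature, insulation, furnace age
sse_reduced, df_reduced = 47721.2287, 17  # temperature, insulation only

# Partial F for the single dropped predictor; with one dropped term, F = t^2.
partial_f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
abs_t = sqrt(partial_f)

# Two-tailed critical value t(0.025, 16) from a standard t table (assumed here).
t_crit = 2.120
print(round(partial_f, 3), round(abs_t, 3), abs_t > t_crit)
# 2.312 1.521 False
```

Because |t| falls short of the critical value, the slope for furnace age is not significantly different from zero at the 5% level, which is consistent with dropping it.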
What factors determine how happy workers are in their jobs? Use the following data and multiple regression to produce a model to predict employee satisfaction and then comment on the results of the process.
Job            Relationship      Overall quality of   Total hours       Opportunities
satisfaction   with supervisor   work environment     worked per week   for advancement
60             2                 7                    56                3
15             1                 1                    75                1
95             5                 9                    40                5
56             4                 8                    69                4
40             2                 3                    55                3
80             4                 8                    40                5
10             1                 1                    90                1
90             5                 9                    35                5
30             3                 6                    55                2
65             3                 6                    55                2
75             3                 6                    54                3
15             2                 1                    55                1
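A minimal sketch of the first step, building the model, is given below. It fits the least-squares equation to the twelve rows above by solving the normal equations (XᵀX)b = Xᵀy in plain Python. The helper `solve` and the variable names are illustrative and not part of the source, and in practice Excel or a statistics package would produce the same coefficients together with the full ANOVA and t-test output.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# (satisfaction, supervisor, environment, hours, advancement) per employee.
rows = [(60, 2, 7, 56, 3), (15, 1, 1, 75, 1), (95, 5, 9, 40, 5), (56, 4, 8, 69, 4),
        (40, 2, 3, 55, 3), (80, 4, 8, 40, 5), (10, 1, 1, 90, 1), (90, 5, 9, 35, 5),
        (30, 3, 6, 55, 2), (65, 3, 6, 55, 2), (75, 3, 6, 54, 3), (15, 2, 1, 55, 1)]
y = [r[0] for r in rows]
X = [[1.0, *r[1:]] for r in rows]          # leading 1.0 is the intercept column
n, p = len(y), len(X[0])                   # p = k + 1 coefficients
k = p - 1

xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
coef = solve(xtx, xty)                     # [intercept, b1, b2, b3, b4]

fitted = [sum(c * v for c, v in zip(coef, xi)) for xi in X]
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r2 = 1 - sse / sst
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))
se = (sse / (n - k - 1)) ** 0.5
print([round(c, 3) for c in coef])
print(round(r2, 4), round(f_stat, 3), round(se, 3))
```

The printed F statistic and R² feed directly into the later steps of the analysis: the overall test first, then the individual slopes, then the measures of fit.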
Order of the analysis
• Start with building the model
• Then test overall model significance (ANOVA)
• Then test specific parameter significance (t-test)
• Finally examine various measures of model fit (𝑅2 , 𝑆𝑒 )
EXAMPLE 3
The owner of Showtime Movie Theaters, Inc., would like to predict weekly gross revenue as a function of advertising expenditures. Historical data for a sample of eight weeks follow.
To illustrate, suppose a female executive at a certain company claims that male executives earn
higher salaries, on average, than female executives with the same education, experience, and
responsibilities. To support her claim, she wants to model the salary y of an executive using a
qualitative independent variable representing the gender of an executive (male or female).
EXAMPLE 1
An empirical investigation was conducted to examine the factors that impact overall heating expenses across 20 residential and commercial properties. The initial analysis employed multiple linear regression to assess the relationship between heating expenditure and three predictor variables: mean ambient temperature, attic insulation (inches), and furnace age (years).
Upon statistical analysis, the results indicated no significant linear association between furnace age and heating costs (p > 0.05).
Consequently, the study was modified to incorporate an alternative binary predictor variable: the presence or absence of a garage structure within each property unit.
Home   Heating Cost ($)   Mean Outside Temp. (F)   Attic Insulation (inches)   Garage
1      250                35                       3                           0
2      360                29                       4                           1
3      165                36                       7                           0
4      43                 60                       6                           0
5      92                 65                       5                           0
6      200                30                       5                           0
7      355                10                       6                           1
8      290                7                        10                          1
9      230                21                       9                           0
10     120                55                       2                           0
11     73                 54                       12                          0
12     205                48                       5                           1
13     400                20                       5                           1
14     320                39                       4                           1
15     72                 60                       8                           0
16     272                20                       5                           1
17     94                 58                       7                           0
18     190                40                       8                           1
19     235                27                       9                           0
20     139                30                       7                           0
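The effect of adding the 0/1 garage variable can be sketched with the same least-squares machinery. The code below fits the model with temperature and insulation only, then refits with the garage dummy added. The helpers `solve` and `ols` are illustrative names, not part of the source, and Excel or a statistics package would normally be used instead. With a dummy variable coded this way, its coefficient estimates the difference in heating cost between homes with and without a garage, holding temperature and insulation constant.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    X = [[1.0, *row] for row in zip(*columns)]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    coef = solve(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(X, y))
    return coef, sse

# Columns of the data table above (20 homes).
cost = [250, 360, 165, 43, 92, 200, 355, 290, 230, 120,
        73, 205, 400, 320, 72, 272, 94, 190, 235, 139]
temp = [35, 29, 36, 60, 65, 30, 10, 7, 21, 55,
        54, 48, 20, 39, 60, 20, 58, 40, 27, 30]
attic = [3, 4, 7, 6, 5, 5, 6, 10, 9, 2, 12, 5, 5, 4, 8, 5, 7, 8, 9, 7]
garage = [0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0]

_, sse_reduced = ols([temp, attic], cost)            # two-predictor model
coef, sse_garage = ols([temp, attic, garage], cost)  # garage dummy added
b0, b_temp, b_attic, b_garage = coef

print(round(sse_reduced, 2), round(sse_garage, 2))
print([round(c, 2) for c in coef])
```

The first printed value can be compared with the residual sum of squares in the two-predictor ANOVA table. The drop from the first to the second value is the variation additionally explained by the garage variable.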
REVISION: SLR vs MLR
SLR
Test Linear Relationship / Test Correlation
H0: β = 0
Ha: β ≠ 0
CV: t(α/2, n − 2)
TS: t = b / Sb
D & C:
MLR
Test Regression Model (Global test) / Test Individual Slope (Test Linear Relationship)
(K = the number of independent variables)
SLR   The linear relationship between 1 Y and 1 X
Predict Final from Midterm
Predict y from x
Test Correlation
H0: ρ = 0
Ha: ρ ≠ 0
Y = α + βx
Test Linear Relationship between x and y.
H0: β = 0
HA: β ≠ 0
MLR   The linear relationship between 1 Y and many X
      y     x1     x2    x3    x4
No.   FIN   Quiz   Mid   ATT   ASS
1     40    6      25    4     5
:     :     :      :     :     :
100
Scatter Diagram
Correlation r
Test Correlation         H0: ρ = 0
Regression Model         Y = α + β1X1 + β2X2 + β3X3 + β4X4
Test Regression Model    H0: β1 = β2 = β3 = β4 = 0
Test Individual Slope    H0: β1 = 0   H0: β2 = 0   H0: β3 = 0   H0: β4 = 0