
Multiple Regression Analysis

This document discusses multiple regression analysis. Multiple regression allows predicting a dependent variable (Y) from two or more independent variables (X1, X2, etc). It extends simple linear regression, which uses only one independent variable. The multiple regression equation estimates the effect of each independent variable on Y while controlling for the other variables. The R-squared statistic indicates how well the model fits the data. A case study demonstrates applying multiple regression to predict home heating oil use based on temperature and insulation.

Uploaded by

Mira

Regression Analysis

Multiple Regression
[ Cross-Sectional Data ]
Learning Objectives
Explain the linear multiple regression model [for cross-sectional data]
Interpret linear multiple regression computer output
Explain multicollinearity
Describe the types of multiple regression models
Regression Modeling Steps
Define problem or question
Specify model
Collect data
Do descriptive data analysis
Estimate unknown parameters
Evaluate model
Use model for prediction
Simple vs. Multiple
Simple regression:
$\beta_1$ represents the unit change in Y per unit change in X.
Does not take into account any other variable besides the single independent variable.
Multiple regression:
$\beta_i$ represents the unit change in Y per unit change in $X_i$.
Takes into account the effect of the other $\beta_i$s.
Net regression coefficient.
Assumptions
Linearity - the Y variable is linearly related to the value of the X variable.
Independence of Error - the error (residual) is independent for each value of X.
Homoscedasticity - the variation around the line of regression is constant for all values of X.
Normality - the values of Y are normally distributed at each value of X.

Goal
Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variables.
Simple Regression
A statistical model that utilizes one quantitative independent variable X to predict the quantitative dependent variable Y.
Multiple Regression
A statistical model that utilizes two or more quantitative and qualitative explanatory variables ($x_1, \ldots, x_P$) to predict a quantitative dependent variable Y.
Caution: have at least two or more quantitative explanatory variables (rule of thumb)
Multiple Regression Model
[Figure: Y plotted against $X_1$ and $X_2$, with the error e marked]
Hypotheses
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \cdots = \beta_P = 0$
$H_1$: At least one regression coefficient is not equal to zero
Hypotheses (alternate format)
$H_0$: $\beta_i = 0$
$H_1$: $\beta_i \neq 0$

Types of Models
Positive linear relationship
Negative linear relationship
No relationship between X and Y
Positive curvilinear relationship
U-shaped curvilinear
Negative curvilinear relationship
Multiple Regression Models
Linear: Linear, Dummy Variable, Interaction
Non-Linear: Polynomial, Square Root, Log, Reciprocal, Exponential
Multiple Regression Equations
This is too complicated!
You've got to be kiddin'!
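Linear Model
Relationship between one dependent & two or more independent variables is a linear function
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_P X_P + \varepsilon$$
where:
Y = dependent (response) variable
$X_1, X_2, \ldots, X_P$ = independent (explanatory) variables
$\beta_0$ = population Y-intercept
$\beta_1, \beta_2, \ldots, \beta_P$ = population slopes
$\varepsilon$ = random error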
Method of Least Squares
The straight line that best fits the data.
Determine the straight line for which the sum of the squared differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) is as small as possible.
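As a hedged sketch in standard notation (the symbols $b_j$ for the sample estimates and $n$ for the number of observations are assumptions, not taken from the slides), the least-squares criterion can be written as:

```latex
\min_{b_0, b_1, \ldots, b_P} \; \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2,
\qquad \hat{Y}_i = b_0 + b_1 X_{1i} + \cdots + b_P X_{Pi}
```

Squaring the differences keeps positive and negative errors from cancelling and penalizes large misses more heavily than small ones.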
Measures of Variation
Explained variation (sum of squares due to regression, SSR)
Unexplained variation (error sum of squares, SSE)
Total sum of squares (SST)
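The three measures fit together as a decomposition of the total variation. A sketch in standard notation, where $\bar{Y}$ (the sample mean of Y) and $n$ (the number of observations) are symbols assumed here rather than defined in the slides:

```latex
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad
SST = SSR + SSE
```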
Coefficient of Multiple Determination
When null hypothesis is rejected, a relationship between Y and the X variables exists.
Strength measured by $R^2$ [ several types ]
Coefficient of Multiple Determination
$R^2_{Y.12 \ldots P}$
The proportion of the variation in Y that is explained by the set of explanatory variables selected
Standard Error of the Estimate
$S_{Y.X}$
the measure of variability around the line of regression
Confidence interval estimates
True mean $\mu_{Y.X}$
Individual $\hat{Y}_i$
Interval Bands [from simple regression]
[Figure: interval bands around the fitted line $\hat{Y}_i = b_0 + b_1 X$, plotted as Y against X, with $\bar{X}$ and a given X marked]
Multiple Regression Equation
$$\hat{Y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_P x_P$$
where:
$\beta_0$ = y-intercept {a constant value}
$\beta_1$ = slope of Y with variable $x_1$, holding the effects of variables $x_2, x_3, \ldots, x_P$ constant
$\beta_P$ = slope of Y with variable $x_P$, holding the effects of all other variables constant
Who is in Charge?
Mini-Case
Predict the consumption of home heating oil during January for homes located around Screne Lakes. Two explanatory variables are selected: average daily atmospheric temperature (°F) and the amount of attic insulation (inches).
Oil (Gal)    Temp (°F)    Insulation (in)
275.30       40           3
363.80       27           3
164.30       40           10
 40.80       73           6
 94.30       64           6
230.90       34           6
366.70        9           6
300.60        8           10
237.80       23           10
121.40       63           3
 31.40       65           10
203.50       41           6
441.10       21           3
323.00       38           3
 52.50       58           10
Mini-Case
Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature (°F) and amount of insulation in inches.
Mini-Case
What preliminary conclusions can home owners draw from the data?

What could a home owner expect heating oil consumption (in gallons) to be if the outside temperature is 15 °F when the attic insulation is 10 inches thick?
Multiple Regression Equation
[mini-case]
Dependent variable: Gallons Consumed
----------------------------------------------------------------
                            Standard      T
Parameter      Estimate     Error         Statistic    P-Value
----------------------------------------------------------------
CONSTANT       562.151      21.0931       26.6509      0.0000
Insulation     -20.0123     2.34251       -8.54313     0.0000
Temperature    -5.43658     0.336216      -16.1699     0.0000
----------------------------------------------------------------
R-squared = 96.561 percent
R-squared (adjusted for d.f.) = 95.9879 percent
Standard Error of Est. = 26.0138
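The computer output above can be approximated from the mini-case data with an ordinary least-squares fit. This is a minimal sketch assuming NumPy is available. The variable names are illustrative, and it is not the statistics package that produced the slide output.

```python
import numpy as np

# Mini-case data from the table: oil (gallons), temperature (°F), insulation (inches)
oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                 dtype=float)

# Design matrix: a column of ones for the intercept, then the two X variables
X = np.column_stack([np.ones_like(temp), temp, insul])
coef, *_ = np.linalg.lstsq(X, oil, rcond=None)
b0, b_temp, b_insul = coef

fitted = X @ coef
sse = np.sum((oil - fitted) ** 2)       # unexplained variation
sst = np.sum((oil - oil.mean()) ** 2)   # total variation
n, k = X.shape                          # k counts the intercept column too
r2 = 1 - sse / sst
std_err = np.sqrt(sse / (n - k))        # standard error of the estimate

print(f"intercept={b0:.3f} temp={b_temp:.5f} insul={b_insul:.4f}")
print(f"R-squared={r2:.5f} std_err={std_err:.4f}")
```

The estimates should agree with the Estimate column and the summary lines of the output up to rounding.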
Multiple Regression Equation
[mini-case]

$$\hat{Y} = 562.15 - 5.44x_1 - 20.01x_2$$

where: $x_1$ = temperature [degrees F]
$x_2$ = attic insulation [inches]
Multiple Regression Equation
[mini-case]
$$\hat{Y} = 562.15 - 5.44x_1 - 20.01x_2$$
thus:
For a home with zero inches of attic insulation and an outside temperature of 0 °F, 562.15 gallons of heating oil would be consumed.
[ caution .. data boundaries .. extrapolation ]
[Figure: Y against X, with interpolation inside the relevant range of X and extrapolation on either side of it]
Multiple Regression Equation
[mini-case]
$$\hat{Y} = 562.15 - 5.44x_1 - 20.01x_2$$
For a home with zero attic insulation and an outside temperature of zero, 562.15 gallons of heating oil would be consumed. [ caution .. data boundaries .. extrapolation ]
For each one-degree F increase in temperature, for a given amount of attic insulation, heating oil consumption drops 5.44 gallons.
For each one-inch increase in attic insulation, at a given temperature, heating oil consumption drops 20.01 gallons.
Multiple Regression Prediction
[mini-case]
$$\hat{Y} = 562.15 - 5.44x_1 - 20.01x_2$$

with $x_1$ = 15 °F and $x_2$ = 10 inches

$$\hat{Y} = 562.15 - 5.44(15) - 20.01(10) = 280.45 \text{ gallons consumed}$$
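The prediction can be checked with a short sketch that also tests whether the inputs fall inside the range of the mini-case data, which is what the extrapolation caution is about. The function names and the range constants are illustrative assumptions. The ranges are the minimum and maximum values in the data table.

```python
def predict_oil(temp_f, insulation_in):
    """Fitted mini-case equation, using the rounded coefficients from the slides."""
    return 562.15 - 5.44 * temp_f - 20.01 * insulation_in

# Smallest and largest values of each explanatory variable in the data table
TEMP_RANGE = (8, 73)     # degrees F
INSUL_RANGE = (3, 10)    # inches

def in_relevant_range(temp_f, insulation_in):
    """True when both inputs lie inside the observed data boundaries."""
    return (TEMP_RANGE[0] <= temp_f <= TEMP_RANGE[1]
            and INSUL_RANGE[0] <= insulation_in <= INSUL_RANGE[1])

prediction = predict_oil(15, 10)
print(round(prediction, 2), in_relevant_range(15, 10))
```

The case of 15 °F and 10 inches is an interpolation. The intercept case of 0 °F and zero insulation lies outside both ranges, which is why that reading carries a caution.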
Coefficient of Multiple Determination
[mini-case]
$R^2_{Y.12} = 0.9656$

96.56 percent of the variation in heating oil can be explained by the variation in temperature and insulation.
Coefficient of Multiple Determination
Proportion of variation in Y explained by all X variables taken together
$$R^2_{Y.12} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST}$$
Never decreases when new X variable is added to model
Only Y values determine SST
Disadvantage when comparing models
Coefficient of Multiple Determination (Adjusted)
Proportion of variation in Y explained by all X variables taken together
Reflects
Sample size
Number of independent variables
Smaller [more conservative] than $R^2_{Y.12}$
Used to compare models
Coefficient of Multiple Determination
(adjusted)
$R^2_{(adj)\,Y.12 \ldots P}$
The proportion of the variation in Y that is explained by the set of independent [explanatory] variables selected, adjusted for the number of independent variables and the sample size.
Coefficient of Multiple Determination
(adjusted) [Mini-Case]
$R^2_{adj} = 0.9599$

95.99 percent of the variation in heating oil consumption can be explained by the model, adjusted for number of independent variables and the sample size
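The adjusted value can be reproduced from the unadjusted one. The slides do not state the formula, so the standard adjustment is assumed here, with $n$ the number of observations and $P$ the number of independent variables (for the mini-case, $n = 15$ and $P = 2$):

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - P - 1}
          = 1 - (1 - 0.96561)\,\frac{14}{12}
          \approx 0.9599
```

The result agrees with the adjusted R-squared of 95.9879 percent reported in the computer output.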
Coefficient of Partial Determination
Proportion of variation in Y explained by variable $X_P$ holding all others constant
Must estimate separate models
Denoted $R^2_{Y1.2}$ in two X variables case
Coefficient of partial determination of $X_1$ with Y holding $X_2$ constant
Useful in selecting X variables
Coefficient of Partial Determination [p. 878]
$R^2_{Y1.234 \ldots P}$
The coefficient of partial determination of variable Y with $x_1$, holding constant the effects of variables $x_2, x_3, x_4, \ldots, x_P$.
Coefficient of Partial Determination
[Mini-Case]
$R^2_{Y1.2} = 0.9561$

For a fixed (constant) amount of insulation, 95.61 percent of the variation in heating oil can be explained by the variation in average atmospheric temperature. [p. 879]
Coefficient of Partial Determination
[Mini-Case]

$R^2_{Y2.1} = 0.8588$

For a fixed (constant) temperature, 85.88 percent of the variation in heating oil can be explained by the variation in amount of insulation.
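Because the partial coefficients require separate models, a sketch of the computation may help. It assumes NumPy and uses the standard identity that the coefficient of partial determination is the share of the reduced model's unexplained variation that the added variable removes. The helper name `sse` is illustrative.

```python
import numpy as np

# Mini-case data: oil (gallons), temperature (°F), insulation (inches)
oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                 dtype=float)

def sse(y, *columns):
    """Error sum of squares from a least-squares fit of y on an intercept plus columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

sse_full = sse(oil, temp, insul)    # model with both X variables
sse_insul_only = sse(oil, insul)    # reduced model without temperature
sse_temp_only = sse(oil, temp)      # reduced model without insulation

# Share of the reduced model's unexplained variation removed by the added variable
r2_y1_2 = (sse_insul_only - sse_full) / sse_insul_only   # temperature, insulation held constant
r2_y2_1 = (sse_temp_only - sse_full) / sse_temp_only     # insulation, temperature held constant

print(round(r2_y1_2, 4), round(r2_y2_1, 4))
```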

Testing Overall Significance
Shows if there is a linear relationship between all X variables together & Y
Uses p-value
Hypotheses
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_P = 0$
No linear relationship
$H_1$: At least one coefficient is not 0
At least one X variable affects Y
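The slides only say that the test uses a p-value. As a hedged sketch, the statistic conventionally behind that p-value is the overall F ratio, assumed here with $n$ observations and $P$ independent variables, and evaluated from the mini-case R-squared of 0.96561 with $n = 15$ and $P = 2$:

```latex
F = \frac{SSR / P}{SSE / (n - P - 1)}
  = \frac{R^2 / P}{\left(1 - R^2\right) / (n - P - 1)}
  = \frac{0.96561 / 2}{0.03439 / 12}
  \approx 168.5
```

A ratio this large on 2 and 12 degrees of freedom corresponds to a very small p-value, so the null hypothesis of no linear relationship would be rejected.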
Testing Model Portions
Examines the contribution of a set of X variables to the relationship with Y
Null hypothesis:
Variables in set do not significantly improve the model when all other variables are included
Must estimate separate models
Used in selecting X variables
Diagnostic Checking
$H_0$: retain or reject
If reject - {p-value ≤ 0.05}
$R^2_{adj}$
Correlation matrix
Partial correlation matrix
Multicollinearity
High correlation between X variables
Coefficients measure combined effect
Leads to unstable coefficients depending on
X variables in model
Always exists; matter of degree
Example: Using both total number of rooms
and number of bedrooms as explanatory
variables in same model
Detecting Multicollinearity
Examine correlation matrix
Correlations between pairs of X variables are larger than their correlations with the Y variable
Few remedies
Obtain new sample data
Eliminate one correlated X variable
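Examining the correlation matrix can be sketched for the mini-case data as follows. This assumes NumPy, and the variable names are illustrative.

```python
import numpy as np

# Mini-case data: oil (gallons), temperature (°F), insulation (inches)
oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                 dtype=float)

# Each row passed to corrcoef is treated as one variable
labels = ["oil", "temp", "insul"]
corr = np.corrcoef(np.vstack([oil, temp, insul]))

for name, row in zip(labels, corr):
    print(f"{name:>6}", np.round(row, 3))

# Correlation between the two X variables: the entry to inspect for multicollinearity
r_temp_insul = corr[1, 2]
```

In this data the two X variables are nearly uncorrelated with each other while each is negatively correlated with Y, so multicollinearity is not a concern for the mini-case.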
Evaluating Multiple Regression Model Steps
Examine variation measures
Do residual analysis
Test parameter significance
Overall model
Portions of model
Individual coefficients
Test for multicollinearity
Dummy-Variable Regression Model
Involves categorical X variable with
two levels
e.g., female-male, employed-not employed, etc.
Variable levels coded 0 & 1
Assumes only intercept is different
Slopes are constant across categories
Dummy-Variable Model Relationships
[Figure: Y against $X_1$ for Females and Males; two parallel lines with the same slope $b_1$ and intercepts $b_0$ and $b_0 + b_2$]
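A dummy-variable fit can be sketched on a small hypothetical data set. The numbers below are invented for illustration and are not from the slides. They are generated without noise from Y = 10 + 2·X1 + 5·D, so the fit should recover those values exactly. NumPy is assumed.

```python
import numpy as np

# Hypothetical, noise-free illustration data (not from the slides)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

# 0/1 coding of the two-level categorical variable: A -> 0, B -> 1
d = np.array([1.0 if g == "B" else 0.0 for g in group])

y = 10.0 + 2.0 * x1 + 5.0 * d

# Intercept, quantitative X, and the dummy column
X = np.column_stack([np.ones_like(x1), x1, d])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Both categories share the slope b1; only the intercept shifts
intercept_a = b0
intercept_b = b0 + b2
print(round(b1, 3), round(intercept_a, 3), round(intercept_b, 3))
```

The category coded 0 has intercept b0 and the category coded 1 has intercept b0 + b2, which is the pair of parallel lines described in the figure.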
Dummy Variables

Permits use of qualitative data (e.g.: seasonal, class standing, location, gender).

0, 1 coding (nominal data)

As part of Diagnostic Checking; incorporate outliers (i.e.: large residuals) and influence measures.


Interaction Regression Model
Hypothesizes interaction between pairs of X variables
Response to one X variable varies at different levels of another X variable
Contains two-way cross product terms
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$$
Can be combined with other models
e.g. dummy variable models
Effect of Interaction
Given:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$$
Without interaction term, effect of $X_1$ on Y is measured by $\beta_1$
With interaction term, effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
Effect increases as $X_{2i}$ increases (when $\beta_3 > 0$)
Interaction Example
[Figure: Y (axis 0 to 12) against $X_1$ (axis 0 to 1.5), one line for each value of $X_2$]
$$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$$
With $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
With $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
Effect (slope) of $X_1$ on Y does depend on $X_2$ value
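The two lines in the interaction example can be verified with a few lines of code. This is a minimal sketch; the function names are illustrative.

```python
def y(x1, x2):
    """Example equation from the slides: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(x2):
    """Change in Y per unit change in X1 at a fixed X2 (here 2 + 4*X2)."""
    return y(1, x2) - y(0, x2)

def intercept_at(x2):
    """Value of Y at X1 = 0 for a fixed X2 (here 1 + 3*X2)."""
    return y(0, x2)

for x2 in (0, 1):
    print(f"X2={x2}: Y = {intercept_at(x2)} + {slope_in_x1(x2)}*X1")
```

With X2 = 0 the slope in X1 is 2, and with X2 = 1 it is 6, so the effect of X1 depends on the level of X2.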

Inherently Linear Models
Non-linear models that can be expressed in linear form
Can be estimated by least squares in linear form
Require data transformation
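As a sketch of what "expressed in linear form" means, the hypothetical data below are generated without noise from Y = 2 + 3 ln(X1). The numbers are invented for illustration and are not from the slides. Transforming X1 to ln(X1) turns the curve into a straight line that ordinary least squares can fit. NumPy is assumed.

```python
import numpy as np

# Hypothetical, noise-free illustration data (not from the slides)
x1 = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 2.0 + 3.0 * np.log(x1)

# Data transformation: regress Y on ln(X1) instead of X1
ln_x1 = np.log(x1)
X = np.column_stack([np.ones_like(ln_x1), ln_x1])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(b0, 3), round(b1, 3))
```

The same pattern applies to the other transformations: replace the column with its square root or reciprocal, or take the logarithm of Y for the exponential model, and then fit the linear form.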
Curvilinear Model Relationships
[Figure: panels of curvilinear relationships between Y and $X_1$]
Logarithmic Transformation
$$Y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \varepsilon$$
[Figure: Y against $X_1$, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]
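Square-Root Transformation
$$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$$
[Figure: Y against $X_1$, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]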
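Reciprocal Transformation
$$Y_i = \beta_0 + \beta_1 \frac{1}{X_{1i}} + \beta_2 \frac{1}{X_{2i}} + \varepsilon_i$$
[Figure: Y against $X_1$, with curves for $\beta_1 > 0$ and $\beta_1 < 0$ approaching an asymptote]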
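Exponential Transformation
$$Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \, \varepsilon_i$$
[Figure: Y against $X_1$, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]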
Overview
Explained the linear multiple regression model
Interpreted linear multiple regression computer output
Explained multicollinearity
Described the types of multiple regression models
Source of Elaborate Slides
Prentice Hall, Inc.
Levine et al., First Edition
Regression Analysis
[Multiple Regression]
*** End of Presentation ***
Questions?
