
Chapter 14 Multiple Regression and Correlation Analysis

The document discusses multiple regression analysis and correlation. It defines multiple regression, discusses assumptions of multiple regression, and provides an example of using multiple regression to predict yearly food expenditures based on income, family size, and whether there are college students in the family.

14- 1

Chapter Fourteen

McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.


14- 2
Chapter Fourteen
Multiple Regression and Correlation Analysis
GOALS
When you have completed this chapter, you will be able to:

ONE
Describe the relationship between two or more independent variables
and the dependent variable using a multiple regression equation.
TWO
Compute and interpret the multiple standard error of estimate and the
coefficient of determination.
THREE
Interpret a correlation matrix.
FOUR
Set up and interpret an ANOVA table.
14- 3
Chapter Fourteen continued
Multiple Regression and Correlation
Analysis
GOALS
When you have completed this chapter, you will be able to:

FIVE
Conduct a test of hypothesis to determine if any of the set of
regression coefficients differ from zero.
SIX
Conduct a test of hypothesis on each of the regression
coefficients.

Goals
14- 4

Multiple Regression and Correlation Analysis

The general multiple regression equation with k independent variables is given by:

Y’ = a + b1X1 + b2X2 + ... + bkXk

a is the Y-intercept.
X1 to Xk are the independent variables.
Greek letters are used for a (α) and b (β) when denoting population parameters.

Multiple Regression Analysis
14- 5

bj is the net change in Y for each unit change in Xj, holding all other values constant, where j=1 to k. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

The least squares criterion is used to develop this equation.
Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB is recommended.

Multiple Regression
Analysis
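To make the least squares criterion concrete, here is a minimal sketch in Python that solves the normal equations for a, b1, ..., bk. It is not how Excel or MINITAB work internally. The function name fit_least_squares and the small data set are hypothetical, chosen so that the true coefficients are known in advance.

```python
def fit_least_squares(rows, y):
    """Return [a, b1, ..., bk] that minimize the sum of squared residuals."""
    # Design matrix: a leading 1 for the intercept a, then X1..Xk.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * c for x, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data, built so that Y = 2 + 3*X1 + 0.5*X2 exactly.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in rows]

coefs = fit_least_squares(rows, y)
print([round(c, 6) for c in coefs])
```

Because the hypothetical Y values contain no random error, the fit should recover a = 2, b1 = 3 and b2 = 0.5 up to floating-point rounding. With real data the coefficients are estimates and the residuals are not zero.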
14- 6

The Multiple Standard Error of Estimate is a measure of the effectiveness of the regression equation.

It is measured in the same units as the dependent variable.
It is difficult to determine what is a large value and what is a small value of the standard error.

The formula is:

s = sqrt[ Σ(Y – Y’)² / (n – (k + 1)) ]

Multiple Standard Error of Estimate
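As a quick check of the definition, the sketch below computes the multiple standard error of estimate as the square root of the error sum of squares divided by n – (k + 1). The function name is hypothetical. The inputs (SSE = 2623764, n = 12, k = 3) are taken from the MINITAB output for Example 1 later in this chapter.

```python
import math

def multiple_standard_error(sse, n, k):
    """Square root of SSE divided by its degrees of freedom, n - (k + 1)."""
    return math.sqrt(sse / (n - (k + 1)))

# Values from the Example 1 MINITAB output: Residual Error SS, n = 12, k = 3.
s = multiple_standard_error(sse=2623764, n=12, k=3)
print(round(s, 1))
```

Rounded to one decimal, the result agrees with the S = 572.7 reported in that output.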
14- 7

Assumptions In Multiple Regression and Correlation

The independent variables and the dependent variable have a linear relationship.
The dependent variable must be continuous and at least interval-scaled.
The residuals should follow the normal distribution with mean 0.
The variation in (Y-Y’), or the residual, must be the same for all values of Y. When this is the case, we say the difference exhibits homoscedasticity.
Successive values of the dependent variable must be uncorrelated, or not autocorrelated.

Multiple Regression and Correlation Assumptions
14- 8

ANOVA TABLE

Source       df       SS                       MS
Regression   k        SSR = Σ(Y’ – Ȳ)²         SSR/k
Error        n-k-1    SSE = Σ(Y – Y’)²         SSE/(n-k-1)
Total        n-1      SS Total = Σ(Y – Ȳ)²

Explained Variation (SSR): variation accounted for by the set of independent variables.
Unexplained or Random Variation (SSE): variation not accounted for by the independent variables.
Total Variation (SS Total): the sum of the explained and unexplained variation.

ANOVA table
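The arithmetic behind an ANOVA table can be sketched in a few lines. The function name anova_table is hypothetical. The sketch assumes k degrees of freedom for regression, n – (k + 1) for error and n – 1 for the total, which matches the DF column of the MINITAB output for Example 1 (3, 8 and 11 with n = 12 and k = 3). The sums of squares are taken from that same output.

```python
def anova_table(ssr, sse, n, k):
    """Degrees of freedom, mean squares and F from the two sums of squares."""
    df_reg, df_err, df_tot = k, n - (k + 1), n - 1
    msr = ssr / df_reg          # mean square regression
    mse = sse / df_err          # mean square error
    return {
        "df": (df_reg, df_err, df_tot),
        "SS total": ssr + sse,  # explained + unexplained variation
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }

# Sums of squares from the Example 1 MINITAB output.
table = anova_table(ssr=10762903, sse=2623764, n=12, k=3)
print(table["df"], table["SS total"], round(table["F"], 2))
```

The computed F rounds to 10.94, and the total sum of squares is 13386667, both as printed in the output.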
14- 9

A correlation matrix is used to show all possible simple correlation coefficients among the variables.

The matrix is useful for locating correlated independent variables.
It shows how strongly each independent variable is correlated with the dependent variable.

Correlation Coefficients   Cars     Advertising   Sales force
Cars                       1.000
Advertising                0.808    1.000
Sales force                0.872    0.537         1.000

Correlation Matrix
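A correlation matrix is simply the simple (Pearson) correlation coefficient computed for every pair of variables. The sketch below shows one way to build it without a statistics package. The function names and the data values are hypothetical. They are not the data behind the Cars matrix above, so the printed numbers will differ from it.

```python
import math

def pearson(x, y):
    """Simple correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict holding r for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical observations for illustration only.
data = {
    "Cars": [10, 12, 15, 11, 18],
    "Advertising": [3, 4, 6, 3, 7],
    "Sales force": [5, 6, 6, 5, 9],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 3) for r in row.values()])
```

The diagonal is always 1 and the matrix is symmetric, which is why slides usually show only the lower triangle.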
14- 10
The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βs are 0

The test statistic follows the F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.

Global Test
14- 11

The test of individual variables is used to determine which independent variables have nonzero regression coefficients.

The variables whose regression coefficients do not differ significantly from zero are usually dropped from the analysis.
The test statistic follows the t distribution with n-(k+1) degrees of freedom.

t = (bj – 0) / s_bj

where s_bj is the standard error of the coefficient bj.

Test for Individual Variables
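The t statistic for an individual coefficient is the coefficient divided by its standard error. The sketch below applies that ratio to the Coef and SE Coef columns of the MINITAB output for Example 1, which appears later in this chapter. The function and variable names are hypothetical.

```python
def t_statistic(b, se_b):
    """t = (b - 0) / s_b for the null hypothesis that the coefficient is 0."""
    return (b - 0) / se_b

# (Coef, SE Coef) pairs from the Example 1 MINITAB output.
output = {
    "Income": (1.092, 3.153),
    "Size": (748.4, 303.0),
    "Student": (564.5, 495.1),
}
t_values = {name: round(t_statistic(b, se), 2) for name, (b, se) in output.items()}
print(t_values)
```

The rounded ratios match the T column of that output (0.35, 2.47 and 1.14). The p-values additionally require the t distribution with n-(k+1) degrees of freedom, which this sketch does not compute.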
14- 12
A market researcher for
Super Dollar Super Markets
is studying the yearly amount
families of four or more
spend on food. Three
independent variables are
thought to be related to
yearly food expenditures
(Food). Those variables are:
total family income (Income)
in $00, size of family (Size),
and whether the family has
children in college (College).

EXAMPLE 1
14- 13

Food expenditures = a + b1*(Income) + b2*(Size) + b3*(College)

Note the following regarding the regression equation.
The variable College is called a dummy or indicator variable. It can take only one of two possible outcomes. That is, a child is a college student or not.
Other examples of dummy variables include gender, whether a part is acceptable or unacceptable, and whether a voter will or will not vote for the incumbent governor.
We usually code one value of the dummy variable as “1” and the other “0.”

Example 1 continued
14- 14

Example 1 continued
14- 15
Use a computer software package,
such as MINITAB or Excel, to
develop a correlation matrix.

From the analysis provided by MINITAB, write out the regression equation:
Y’ = 954 + 1.09X1 + 748X2 + 565X3
Food Expenditure = $954 + $1.09*income + $748*size + $565*college

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

Example 1 continued
14- 16
The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor Coef SE Coef T P


Constant 954 1581 0.60 0.563
Income 1.092 3.153 0.35 0.738
Size 748.4 303.0 2.47 0.039
Student 564.5 495.1 1.14 0.287

S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1%

Analysis of Variance

Source DF SS MS F P
Regression 3 10762903 3587634 10.94 0.003
Residual Error 8 2623764 327970
Total 11 13386667

Example 1 continued
14- 17

Food Expenditure = $954 + $1.09*income + $748*size + $565*college

Each additional $100 of income per year will increase the amount spent on food by $1.09 per year, because income is entered in hundreds of dollars.
An additional family member will increase the amount spent per year on food by $748.
A family with a college student will spend $565 more per year on food than those without a college student.

Food Expenditure = $954 + $1.09*500 + $748*4 + $565*0
So a family of 4, with no college students, and an income of $50,000 will spend an estimated $4,491.

Example 1 continued
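The substitution can be checked with a short sketch. The function name predict_food is hypothetical. The coefficients are the rounded values from the regression equation, and income is entered in hundreds of dollars, as the example specifies (500 for $50,000).

```python
def predict_food(income_hundreds, size, college):
    """Estimated yearly food expenditure from the fitted equation."""
    return 954 + 1.09 * income_hundreds + 748 * size + 565 * college

# Family of 4, no college students, income of $50,000 (entered as 500).
estimate = predict_food(income_hundreds=500, size=4, college=0)
print(round(estimate, 2))

# One more unit of Income is an extra $100 of yearly income.
income_effect = predict_food(501, 4, 0) - predict_food(500, 4, 0)
print(round(income_effect, 2))
```

The estimate comes to $4,491. Raising Income by one unit changes the estimate by the Income coefficient, 1.09, while the dummy variable adds 565 only when it is coded 1.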
14- 18

From the regression output we note:
The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

           Food     Income   Size     College
Food       1.000
Income     0.587    1.000
Size       0.876    0.609    1.000
College    0.773    0.491    0.743    1.000

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.
Most of the correlations among the independent variables are between –.70 and .70 and should not cause problems. The exception is Size and College (0.743), which slightly exceeds .70.

Correlation matrix
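The –.70 to .70 rule of thumb can be applied mechanically to the matrix. The sketch below uses the correlations printed above. The function name flag_pairs and the dictionary layout are hypothetical choices for illustration.

```python
# Correlations among the independent variables, from the matrix above.
among_independents = {
    ("Income", "Size"): 0.609,
    ("Income", "College"): 0.491,
    ("Size", "College"): 0.743,
}
# Correlations of each independent variable with Food.
with_food = {"Income": 0.587, "Size": 0.876, "College": 0.773}

def flag_pairs(corr, limit=0.70):
    """Pairs whose correlation falls outside the range -limit to +limit."""
    return [pair for pair, r in corr.items() if abs(r) > limit]

flagged = flag_pairs(among_independents)
strongest = max(with_food, key=lambda name: abs(with_food[name]))
print(flagged, strongest)
```

With these values the check singles out the Size and College pair (0.743) and confirms that Size has the strongest correlation with Food (0.876).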
14- 19

Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

H0 is rejected if F > 4.07, the critical value at the .05 significance level with 3 and 8 degrees of freedom.
From the MINITAB output, the computed value of F is 10.94.
Decision: H0 is rejected. Not all the regression coefficients are zero.

Example 1 continued
14- 20

Conduct an individual test to determine which coefficients are not zero. These are the hypotheses for the independent variable family size:

H0: β2 = 0
H1: β2 ≠ 0

Using the 5% level of significance, reject H0 if the p-value < .05.
From the MINITAB output, the only significant variable is Size (family size), using the p-values. The other variables can be omitted from the model.

Example 1 continued
14- 21

We rerun the analysis using only the significant independent variable, family size. The new regression equation is:
Y’ = 340 + 1031X2

The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-square term was reduced by only 3.6 percentage points.

Example 1 continued
14- 22

Regression Analysis: Food versus Size

The regression equation is


Food = 340 + 1031 Size

Predictor Coef SE Coef T P


Constant 339.7 940.7 0.36 0.726
Size 1031.0 179.4 5.75 0.000

S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4%

Analysis of Variance

Source DF SS MS F P
Regression 1 10275977 10275977 33.03 0.000
Residual Error 10 3110690 311069
Total 11 13386667

Example 1 continued
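The R-Sq and R-Sq(adj) figures in both MINITAB outputs can be reproduced from their ANOVA tables. The sketch below is a minimal check. The function names are hypothetical, and it assumes the usual definitions: R-square is SSR divided by SS Total, and adjusted R-square compares the error and total mean squares.

```python
def r_squared(ssr, ss_total):
    """Share of the total variation explained by the regression."""
    return ssr / ss_total

def adjusted_r_squared(sse, ss_total, n, k):
    """R-square adjusted for the number of independent variables k."""
    return 1 - (sse / (n - (k + 1))) / (ss_total / (n - 1))

SS_TOTAL, N = 13386667, 12

# (SSR, SSE, k) from the two MINITAB outputs in Example 1.
models = {"full": (10762903, 2623764, 3), "size only": (10275977, 3110690, 1)}
results = {
    name: (round(100 * r_squared(ssr, SS_TOTAL), 1),
           round(100 * adjusted_r_squared(sse, SS_TOTAL, N, k), 1))
    for name, (ssr, sse, k) in models.items()
}
print(results)
```

The full model gives 80.4 and 73.1 percent. The model with family size alone gives 76.8 and 74.4 percent. R-square falls by 3.6 percentage points, while the adjusted value rises slightly because two weak predictors were removed.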
14- 23

A residual is the difference between the actual value of Y and the predicted value Y’.

Residuals should follow the normal distribution. Histograms are useful in checking this requirement.

A plot of the residuals and their corresponding Y’ values is used for showing that there are no trends or patterns in the residuals.

Analysis of Residuals
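The two checks described above start from the same step: subtract each predicted value Y’ from the actual Y, then look at how the differences are distributed. The sketch below is only an illustration. The actual and predicted values are hypothetical, not the Example 1 data, and the function names and bin settings are assumptions.

```python
def residuals(actual, predicted):
    """Residual = actual Y minus predicted Y' for each observation."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]

def histogram(values, lower, width, bins):
    """Count values in equal-width bins starting at `lower`."""
    counts = [0] * bins
    for v in values:
        i = int((v - lower) // width)
        if 0 <= i < bins:
            counts[i] += 1
    return counts

# Hypothetical actual and predicted yearly food expenditures.
actual = [3900, 4100, 5200, 5600, 6400, 7100]
predicted = [4000, 4300, 5000, 5500, 6700, 7000]

res = residuals(actual, predicted)
counts = histogram(res, lower=-400, width=200, bins=4)

# Text histogram: one row per bin, one asterisk per residual.
for i, c in enumerate(counts):
    start = -400 + 200 * i
    print(f"{start:>5} to {start + 200:>4}: {'*' * c}")
```

A roughly symmetric, single-peaked histogram is consistent with the normality assumption. Plotting res against predicted is the second check, for trends or patterns.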
14- 24

[Figure: Residual plot. Residuals (vertical axis) plotted against estimated values of Y, Y’ (horizontal axis).]
14- 25

[Figure: Histogram of residuals. Frequency (vertical axis) against residuals (horizontal axis).]
