
Partial Correlation and Multiple Regression and Correlation


In This Presentation
 Partial correlations
 Multiple regression
 Using the multiple regression line to predict Y
 Multiple correlation coefficient (R²)
 Limitations of multiple regression and
correlation
Introduction
 Multiple Regression and Correlation allow us
to:
1. Disentangle and examine the separate
effects of the independent variables.
2. Use all of the independent variables to
predict Y.
3. Assess the combined effects of the
independent variables on Y.
Partial Correlation
 Partial Correlation measures the correlation
between X and Y controlling for Z
 Comparing the bivariate (“zero-order”)
correlation to the partial (“first-order”)
correlation allows us to determine if the
relationship between X and Y is direct,
spurious, or intervening
 Interaction cannot be determined with
partial correlations
Partial Correlation
 Note the subscripts in the symbol for a partial
correlation coefficient:
r_xy·z
which indicates that the correlation coefficient is for X
and Y controlling for Z
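The first-order partial correlation can be computed directly from the three zero-order correlations: r_xy·z = (r_xy − r_xz·r_yz) / √[(1 − r_xz²)(1 − r_yz²)]. Below is a minimal Python sketch of that formula; the input values are illustrative placeholders, not taken from any dataset in these slides:

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between X and Y controlling for Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Illustrative inputs; with these values the partial correlation is 0.43.
print(round(partial_corr(r_xy=0.50, r_xz=0.30, r_yz=0.40), 2))
```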
Partial Correlation
Example
 The table below lists husbands’ hours of housework per week (Y),
number of children (X), and husbands’ years of education (Z) for a
sample of 12 dual-career households
Partial Correlation
Example
 A correlation matrix appears below
 The bivariate (zero-order) correlation between husbands’
housework and number of children is +0.50
 This indicates a positive relationship
Partial Correlation
Example
 Calculating the partial (first-order) correlation between
husbands’ housework and number of children controlling for
husbands’ years of education yields +0.43
Partial Correlation
Example
 Comparing the bivariate correlation (+0.50) to
the partial correlation (+0.43) shows little
change
 The relationship between number of children
and husbands’ housework is essentially
unchanged once husbands’ education is controlled
 Therefore, we have evidence of a direct
relationship
Multiple Regression
Previously, the bivariate regression equation was:

Y = a + bX

In the multivariate case, the regression equation becomes:

Y = a + b1X1 + b2X2
Multiple Regression
Y = a + b1X1 + b2X2
Notation
 a is the Y intercept, where the regression line crosses the
Y axis
 b1 is the partial slope for X1 on Y
 b1 indicates the change in Y for a one-unit change in X1,
controlling for X2
 b2 is the partial slope for X2 on Y
 b2 indicates the change in Y for a one-unit change in X2,
controlling for X1
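With the partial slopes in hand, the regression line can be used to generate predicted Y values. Here is a minimal Python sketch; the coefficients are the unstandardized weights reported in the SPSS output later in this presentation (a = 25.838, b1 = .007 for Daily Calorie Intake, b2 = .315 for People who Read %), and the predictor values plugged in are hypothetical:

```python
def predict_y(a, b1, x1, b2, x2):
    """Predicted Y from the multiple regression line Y = a + b1*X1 + b2*X2."""
    return a + b1 * x1 + b2 * x2

# Hypothetical country: 2,500 calories per day, 80% of people who read.
print(predict_y(a=25.838, b1=0.007, x1=2500, b2=0.315, x2=80))  # 68.538 years
```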
Multiple Regression using SPSS
 Suppose we are interested in the link between Daily Calorie Intake
and Female Life Expectancy in developing countries
 Suppose further that we wish to look at other variables that might
predict female life expectancy
 One way to do this is to add additional variables to the equation and
conduct a multiple regression analysis
 For example, literacy rate (the percentage of people who read), on the
assumption that those who read can access health and medical information
Multiple Regression using SPSS:
Steps to Set Up the Analysis
 In the Data Editor, go to Analyze > Regression >
Linear and click Reset
 Put Average Female Life Expectancy into
the Dependent box
 Put Daily Calorie Intake and People who
Read % into the Independent(s) box
 Under Statistics, select Estimates,
Confidence Intervals, Model Fit,
Descriptives, Part and Partial Correlation,
R Square Change, Collinearity
Diagnostics, and click Continue
 Under Options, check Include Constant in
the Equation, click Continue and then OK
 Compare your output to the next several
slides
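For readers without SPSS, the same analysis can be sketched in Python with pandas and statsmodels (both assumed to be installed). The file name and column names below are hypothetical placeholders for however your copy of the data is stored:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("countries.csv")  # hypothetical file name
X = sm.add_constant(df[["calorie_intake", "pct_who_read"]])  # adds the constant term
y = df["female_life_expectancy"]

model = sm.OLS(y, X, missing="drop").fit()
print(model.summary())   # coefficients, t tests, R square, F test
print(model.conf_int())  # 95% confidence intervals around the B weights
```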
Interpreting Your SPSS Multiple
Regression Output
 First let’s look at the zero-order (pairwise) correlations
between Average Female Life Expectancy (Y), Daily Calorie
Intake (X1), and People who Read % (X2). Note that these
are .776 for Y with X1 (r_YX1), .869 for Y with X2 (r_YX2), and
.682 for X1 with X2 (r_X1X2).

Correlations (Pearson; N = 74 for every pair; all Sig. (1-tailed) = .000)

                                 Avg. female       Daily calorie   People who
                                 life expectancy   intake          read (%)
Average female life expectancy   1.000             .776            .869
Daily calorie intake             .776              1.000           .682
People who read (%)              .869              .682            1.000
Examining the Regression Weights
Coefficients (dependent variable: Average female life expectancy)

Model                  B        Std. Error   Beta   t       Sig.   95% CI for B
(Constant)             25.838   2.882               8.964   .000   20.090 to 31.585
People who read (%)    .315     .034         .636   9.202   .000   .247 to .383
Daily calorie intake   .007     .001         .342   4.949   .000   .004 to .010

Model                  Zero-order   Partial   Part   Tolerance   VIF
People who read (%)    .869         .738      .465   .535        1.868
Daily calorie intake   .776         .506      .250   .535        1.868

• Above are the raw (unstandardized) and standardized regression weights
for the regression of female life expectancy on daily calorie intake and
percentage of people who read.
• The standardized regression coefficient (beta weight) for daily calorie
intake is .342.
• The beta weight for percentage of people who read is much larger, .636.
• This weight means that for every one standard deviation increase on the
people who read variable, Y (female life expectancy) is predicted to
increase by .636 standard deviations, holding daily calorie intake constant.
• Note that both beta coefficients are significant at p < .001.
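Beta weights are simply the slopes obtained after z-scoring every variable, so they can be checked directly. A minimal sketch, reusing the hypothetical file and column names from the earlier Python example:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("countries.csv")  # hypothetical file name, as before
cols = ["female_life_expectancy", "calorie_intake", "pct_who_read"]
z = (df[cols] - df[cols].mean()) / df[cols].std()  # standardize each variable
Zx = sm.add_constant(z[["calorie_intake", "pct_who_read"]])
betas = sm.OLS(z["female_life_expectancy"], Zx, missing="drop").fit().params
print(betas)  # the slopes on standardized data are the beta weights
```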
R, R Square, and the SEE
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .905a   .818       .813                4.948

Change statistics: R Square Change = .818, F Change = 159.922, df1 = 2, df2 = 71, Sig. F Change = .000
a. Predictors: (Constant), People who read (%), Daily calorie intake

Above is the model summary, which contains several important
statistics. It gives us R and R square for the regression of
Y (female life expectancy) on the two predictors. R is .905,
which is a very high multiple correlation. R square tells us what
proportion of the variation in female life expectancy is
explained by the two predictors: a very high .818. It also gives
us the standard error of the estimate, which feeds into the
standard errors used to put confidence intervals around the
unstandardized regression coefficients
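These model summary figures can be verified by hand from the sums of squares in the ANOVA table on the next slide. A quick sketch in Python:

```python
ss_regression, ss_total = 7829.451, 9567.459  # from the ANOVA table
n, k = 74, 2                                  # cases and predictors

r_square = ss_regression / ss_total                        # 0.818
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)  # 0.813
see = ((ss_total - ss_regression) / (n - k - 1)) ** 0.5    # 4.948
print(r_square, adj_r_square, see)
```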
F Test for the Significance of the
Regression Equation
ANOVAb

Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   7829.451         2    3914.726      159.922   .000a
Residual     1738.008         71   24.479
Total        9567.459         73

a. Predictors: (Constant), People who read (%), Daily calorie intake
b. Dependent Variable: Average female life expectancy

Next we look at the F test of the significance of the
regression equation, which in standardized form is Z_Y = .342 Z_X1 + .636 Z_X2.
Is this so much better a predictor of female life expectancy (Y) than simply
using the mean of Y that the difference is statistically significant? The F
test is a ratio of the mean square for the regression equation to the mean
square for the “residual” (the departures of the actual scores on Y from what
the regression equation predicted). In this case we have a very large value
of F, which is significant at p < .001. Thus it is reasonable to conclude
that our regression equation is a significantly better predictor than the
mean of Y.
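The F ratio is easy to verify from the ANOVA table: each mean square is its sum of squares divided by its degrees of freedom, and F is their ratio. A quick sketch (scipy is assumed for the p-value):

```python
from scipy.stats import f

ms_regression = 7829.451 / 2      # 3914.726
ms_residual = 1738.008 / 71       # 24.479
F = ms_regression / ms_residual   # about 159.92
print(F, f.sf(F, dfn=2, dfd=71))  # the p-value is far below .001
```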
Confidence Intervals around the
Regression Weights

Coefficientsa

Model                  B        Std. Error   Beta   t       Sig.   95% CI for B       Zero-order   Partial   Part
(Constant)             25.838   2.882               8.964   .000   20.090 to 31.585
Daily calorie intake   .007     .001         .342   4.949   .000   .004 to .010       .776         .506      .250
People who read (%)    .315     .034         .636   9.202   .000   .247 to .383       .869         .738      .465

a. Dependent Variable: Average female life expectancy

Finally, your output provides confidence intervals around the
unstandardized regression coefficients. Thus we can say
with 95% confidence that the unstandardized weight to apply
to daily calorie intake to predict female life expectancy
lies between .004 and .010, and that the unstandardized
weight to apply to percentage of people who read lies
between .247 and .383
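Each interval is B ± t × (Std. Error), where t is the critical value for 95% confidence with n − k − 1 = 71 degrees of freedom. A quick sketch using the rounded table values; SPSS works from the unrounded B and standard error, which is why its interval (.004 to .010) is slightly wider than what the rounded inputs give:

```python
from scipy.stats import t

b, se, df_resid = 0.007, 0.001, 71       # rounded values from the table
t_crit = t.ppf(0.975, df_resid)          # about 1.994 for a two-tailed 95% CI
print(b - t_crit * se, b + t_crit * se)  # roughly .005 to .009 from rounded inputs
```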
Limitations
Multiple regression and correlation are among the most powerful
techniques available to researchers. But powerful techniques have
high demands.
These techniques require:
 Every variable is measured at the interval-ratio level
 Each independent variable has a linear relationship with the
dependent variable
 Independent variables do not interact with each other
 Independent variables are uncorrelated with each other

When these requirements are violated (as they often are), these
techniques will produce biased and/or inefficient estimates. There
are more advanced techniques available to researchers that can
correct for violations of these requirements. Such techniques are
beyond the scope of this text.
