Lecture 25 - Multiple Regression

Multiple Regression

Error Analysis
• Y = a + bX. This equation gives the conditional mean of Y at any given value of X.
• So, in reality, our line gives us the expected mean of Y at each value of X.
• The line's equation tells you how the mean of your dependent variable changes as your independent variable goes up.
Error Analysis
• As you know, every mean has a distribution around it, so there is a standard deviation. This is true for conditional means as well. So, you also have a conditional standard deviation.
Error Analysis
• The conditional standard deviation can tell you the improvement in predicting Y when taking X into account.
• Conditional standard deviations will be smaller than Y's original standard deviation.

[Figure: scatter plot contrasting Y's original standard deviation with the smaller conditional standard deviation around the regression line.]

• If this is so, you have improved prediction of the mean value of Y by looking at each level of X.
• If there were no relationship, the conditional standard deviation would be the same as the original, and the regression line would be flat at the mean of Y.
Error Analysis
• Let's call the variation around the mean of Y "Error 1."
• The variation around the line when using X is "Error 2."

[Figure: scatter plot showing Error 1 (deviations from the mean of Y) and Error 2 (deviations from the regression line).]

• Like ANOVA, we'll use sums of squares to represent error.
• Error 1 (E1) = Σ(Ȳ − Y)², also called the "Sum of Squares."
• Error 2 (E2) = Σ(Yline − Y)², also called the "Sum of Squared Errors."
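To make the two error terms concrete, here is a minimal Python sketch (using numpy, with made-up data rather than the lecture's) that fits a least-squares line and computes both sums of squares:

```python
import numpy as np

# Hypothetical data (not from the lecture), purely to illustrate.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)

# Least-squares line: np.polyfit returns [slope, intercept] for deg=1.
b, a = np.polyfit(X, Y, deg=1)
Y_line = a + b * X

# Error 1: variation around the mean of Y ("Sum of Squares").
E1 = np.sum((Y.mean() - Y) ** 2)

# Error 2: variation around the line ("Sum of Squared Errors").
E2 = np.sum((Y_line - Y) ** 2)

print(f"E1 = {E1:.2f}, E2 = {E2:.2f}")  # E2 < E1 when X improves prediction
```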


R-Squared
• AKA the "coefficient of determination" (the square of the Pearson correlation).
• It is the proportional reduction in error from using the line instead of the mean: r² = (E1 − E2) / E1.
• These formulas are for explanatory purposes only; no need to memorize them.

[Figure: scatter plot contrasting Error 1 (around the mean of Y) with Error 2 (around the regression line).]
R-Squared
• Is the improvement obtained by using X to get as near as possible to each case's value of Y, over just using the mean of Y alone.
• Falls between 0 and 1.
  • An r² of 1 means an exact fit (there is no variation of scores around the regression line).
  • An r² of 0 means no relationship (as much scatter as in the original Y variable, and a flat regression line through the mean of Y).
• Would be the same for X regressed on Y as for Y regressed on X.
• Can be interpreted as the percentage of variability in Y that is explained by X.
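A short sketch of the proportional-reduction-in-error idea, again with hypothetical data: r² computed as (E1 − E2) / E1 matches the squared Pearson correlation, as the slide states.

```python
import numpy as np

# Hypothetical data again: r-squared as proportional reduction in error.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)

b, a = np.polyfit(X, Y, deg=1)
E1 = np.sum((Y.mean() - Y) ** 2)       # error around the mean
E2 = np.sum((a + b * X - Y) ** 2)      # error around the line

print((E1 - E2) / E1)                  # r-squared
print(np.corrcoef(X, Y)[0, 1] ** 2)    # same number: squared Pearson r
```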
Error Analysis, SPSS
• In the SPSS ANOVA table: the Regression sum of squares is the distance from the line to the mean (215.464); the Residual sum of squares is the distance from the data points to the line; and the Total sum of squares is the distance from the data points to the mean (3019.978).
• r² = 215.464 ÷ 3019.978 = .071
Regression, Inferential Statistics

Descriptive:
• The equation for your line is a descriptive statistic. It tells you the real, best-fitted line that minimizes squared errors FOR YOUR DATA POINTS.

Inferential:
• What can we say about the relationship between your variables in the population?
• The inferential statistics are estimates based on the best-fitted line.
• F-test
Regression, Inferential Statistics
• The ratio of the Regression (line to the mean of Y) to the Residual (data points to the line) mean squares, each a sum of squares divided by its degrees of freedom, forms an F ratio in repeated sampling.
• Null: r² = 0 in the population. If F exceeds the critical F, then your variables have a relationship in the population (X explains some of the variation in Y).
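A sketch of the computation, patterned on the sums of squares from the earlier SPSS slide; the sample size n = 100 is an assumption for illustration, not a figure from the lecture.

```python
import numpy as np
from scipy import stats

# Sums of squares patterned on the SPSS output shown earlier.
ss_regression = 215.464
ss_total = 3019.978
ss_residual = ss_total - ss_regression

n, k = 100, 1                        # assumed n; k = number of predictors
df_regression = k
df_residual = n - k - 1

# F is the ratio of the regression and residual mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
F = ms_regression / ms_residual

p = stats.f.sf(F, df_regression, df_residual)   # upper-tail p-value
print(f"F({df_regression}, {df_residual}) = {F:.2f}, p = {p:.4f}")
```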
Regression, Inferential Statistics
• What about the slope, AKA the "coefficient"?
• From sample to sample, different slopes would be obtained.
• The slope has a sampling distribution that is normally distributed.
• So we can do a significance test.
On the regression SPSS output:
• The standard error and t appear, and the p-value too!
• The slope and y-intercept can be found in the output.
• The Regression is significant.
• The Slope is significant.
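This is roughly what SPSS is doing behind that output; a sketch with hypothetical data, computing the slope's standard error, t, and two-tailed p:

```python
import numpy as np
from scipy import stats

# Hypothetical data; SPSS prints these same quantities (slope,
# standard error, t, p) in its Coefficients table.
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
n = len(X)

b, a = np.polyfit(X, Y, deg=1)
residuals = Y - (a + b * X)

# Standard error of the slope: sqrt(MSE / sum of squared X deviations).
mse = np.sum(residuals ** 2) / (n - 2)
se_b = np.sqrt(mse / np.sum((X - X.mean()) ** 2))

t = b / se_b
p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p-value
print(f"slope = {b:.3f}, SE = {se_b:.3f}, t({n - 2}) = {t:.2f}, p = {p:.4f}")
```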

Regression SPSS Example
• There is evidence of a negative relationship in the population between Income and Happiness.
• 7.1% of the variation in Marriage attitude is explained by age.
• The wealthier people get, the more likely they are to be happy… maybe.
• BUT… this doesn't tell us more than the correlation.
Multiple Regression
• Multiple Regression is very popular among social scientists.
• Most social phenomena have more than one cause.
• It is very difficult to manipulate just one social variable through experimentation.
• Social scientists must attempt to model complex social realities to explain them.
Multiple Regression
• Multiple Regression allows us to:
  • Use several variables at once to explain the variation in a continuous dependent variable.
  • Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.
  • Write a mathematical equation that tells us the overall effects of several variables together, and the unique effects of each, on a continuous dependent variable.
  • Control for other variables to demonstrate whether bivariate relationships are spurious.
Multiple Regression
• For example: a researcher could look at the relationship between Job Satisfaction, Family Income, and Happiness.

Independent Variables → Dependent Variable
Job Satisfaction, Family Income → Happiness
Multiple Regression
• For example:
  • Null Hypothesis: Together, Job Satisfaction and Family Income do not predict Happiness.
  • Research Hypothesis: Together, Job Satisfaction and Family Income predict Happiness.
Multiple Regression
• Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
• Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.
Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Happiness (Y): 2 5 1 9 6 3 0 3 7 7 2 5 1 9 6 3 0 3 7 14
Job Satisfaction (X1) 12 16 20 12 9 18 16 14 9 12 12 10 20 11 9 18 16 14 9 8
Family Income 1=$10K (X2): 3 4 9 5 4 12 10 1 4 3 10 4 9 4 4 12 10 6 4 1
Multiple Regression

[Figure: 3-D plot of the plotted coordinates for Job Satisfaction (X1), Family Income (X2), and Happiness (Y).]

• What multiple regression does is fit a plane to these coordinates.
Multiple Regression
• Mathematically, that plane is:

Y = a + b1X1 + b2X2

a = the y-intercept, where both X's equal zero
b = the coefficient or slope for each variable

• For our problem, SPSS says the equation is:

Y = 1.6 + .21X1 - .003X2
Expected Happiness = 1.6 + .21*Job Satisfaction - .003*Family Income
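As an illustration of the mechanics (not of SPSS itself), this sketch fits the plane by ordinary least squares to the 20 cases tabled earlier; note its coefficients need not match the slide's rounded SPSS equation.

```python
import numpy as np

# The 20 cases from the table above.
Y  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7,
               2, 5, 1, 9, 6, 3, 0, 3, 7, 14], dtype=float)
X1 = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12,
               12, 10, 20, 11, 9, 18, 16, 14, 9, 8], dtype=float)
X2 = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3,
               10, 4, 9, 4, 4, 12, 10, 6, 4, 1], dtype=float)

# Design matrix with a column of ones for the intercept a.
design = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: solve for [a, b1, b2].
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b1, b2 = coefs
print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.3f}*X2")
```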
Multiple Regression
• 12% of the variation in happiness is explained by job satisfaction and family income!

Y = 1.6 + .21X1 - .003X2
Multiple Regression
• For our problem, SPSS says the equation is:

Y = 1.6 + .21X1 - .003X2
Expected Happiness = 1.6 + .21*Job Satisfaction - .003*Family Income

• We can input values and see what Y would be for those values.
  • Job Satisfaction = 4
  • Family Income = 50

Y = 1.6 + .21(4) - .003(50)
Y = 1.6 + .84 - .15
Y = 2.29
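The same plug-in as a small Python function, using the slide's equation:

```python
def predicted_happiness(job_satisfaction: float, family_income: float) -> float:
    # The slide's SPSS equation: Y = 1.6 + .21*X1 - .003*X2
    return 1.6 + 0.21 * job_satisfaction - 0.003 * family_income

print(round(predicted_happiness(4, 50), 2))  # 2.29
```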
Multiple Regression
• Standardized Coefficient: a statistic that provides a way to compare the relative importance of different variables in a multiple regression analysis.
  • It is the regression coefficient expressed in z-scores, so the standardized coefficient is the number of standard deviations that the DV changes when the IV changes by one standard deviation.
  • It allows you to compare different IVs' influence on the DV. Whichever standardized coefficient is furthest from 0 (they typically fall between -1 and +1) is the strongest influence on the DV.
• For each 1 SD increase in Job Satisfaction, there is a .26 SD increase in happiness.
• For each 1 SD increase in Family Income, there is a .19 SD decrease in happiness.
• Look at the standardized coefficients to compare variables' impact on the DV.
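A sketch of where betas come from: z-score every variable and refit, and the slopes of the standardized fit are the standardized coefficients. The data reuses the 20 tabled cases, so the printed values are illustrative rather than the slide's .26 and -.19.

```python
import numpy as np

def zscore(v):
    # Convert to z-scores using the sample standard deviation.
    return (v - v.mean()) / v.std(ddof=1)

Y  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7,
               2, 5, 1, 9, 6, 3, 0, 3, 7, 14], dtype=float)
X1 = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12,
               12, 10, 20, 11, 9, 18, 16, 14, 9, 8], dtype=float)
X2 = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3,
               10, 4, 9, 4, 4, 12, 10, 6, 4, 1], dtype=float)

zY, zX1, zX2 = zscore(Y), zscore(X1), zscore(X2)
design = np.column_stack([np.ones_like(zX1), zX1, zX2])
betas, *_ = np.linalg.lstsq(design, zY, rcond=None)

# betas[0] is ~0 (the intercept vanishes for standardized variables);
# betas[1] and betas[2] are the standardized coefficients to compare.
print(f"beta(X1) = {betas[1]:.2f}, beta(X2) = {betas[2]:.2f}")
```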


Multiple Regression - SPSS
• Analyze > Regression > Linear
• Put in your single DV.
• Put in 2+ IVs.
• Click OK. (That's it!)
Multiple Regression - SPSS
 Let’s take a look at an example in SPSS…
Writing Statistics – Regression
• For predictors, the degrees of freedom for the t-test is the DF Residual.
• Inferential Statistics – Regression:
  • F(2, 125) = 7.11, p = .015 – Now we have DF Regression and DF Residual reported here (just like ANOVA).
  • Where does this go in a sentence? We have two independent variables now and a different conclusion. Let's start with our overall findings:
    • A multiple linear regression was conducted, which showed that age and income together significantly predicted happiness, F(2, 125) = 7.11, p = .015.
  • Next, add individual variables' unique contributions:
    • On its own, age uniquely predicted happiness [β = -.34, t(125) = -7.12, p = .013], and income uniquely predicted happiness [β = .52, t(125) = 6.53, p = .001].
Writing Statistics – Regression
• A multiple linear regression was conducted, which showed that having trust in people and access to natural environments together ___________ predict life satisfaction, F(_, ____) = _____, p = _____.
• A multiple linear regression was conducted, which showed that having trust in people and access to natural environments together significantly predict life satisfaction, F(2, 1128) = 21.85, p < 0.001.
Writing Statistics – Regression
• On their own, trust ______________ satisfaction [β = ___, t(____) = _____, p = ____] and access to natural environments _____________ satisfaction [β = ___, t(____) = _____, p = ____].
• On their own, trust does uniquely predict satisfaction [β = .16, t(1128) = 5.53, p < 0.001] and access to natural environments does uniquely predict satisfaction [β = .09, t(1128) = 3.06, p = 0.002].
Writing Statistics – Regression
• Together, trust and access to natural environments account for _____________ of the variability in life satisfaction.
• Together, trust and access to natural environments account for 3.7% of the variability in life satisfaction.
In-Class Assignment #22