UKP6053 - L8 Multiple Regression

MULTIPLE REGRESSION

Lecture 8
LEARNING OBJECTIVES
 In this lecture you will learn:
 What simple and multiple regression mean
 The rationale behind these forms of analysis
 How to conduct simple bivariate and multiple regression analyses using SPSS
 How to interpret the results of a regression analysis
ASSIGNMENT
 Using the survey.sav file:
1. Run a multiple linear regression with Total Optimism as the dependent variable (DV) and both Total Self-esteem and Total Life Satisfaction as independent variables (IVs).
2. Interpret the results using APA style.
MULTIPLE LINEAR REGRESSION

LINEAR RELATIONS BETWEEN TWO OR MORE IVS AND A SINGLE DV
MULTIPLE REGRESSION
 Multiple regression is used when there is more than one predictor variable.

 Two major uses of multiple regression:
 Prediction
 Causal analysis
LINEAR REGRESSION SUMMARY

 Linear Regression: a single predictor X → Y
 Multiple Linear Regression: several predictors (X1, X2, X3, X4, X5) → Y

[Diagrams: Correlation (X with Y), Simple Regression (X → Y), Partial Correlation, and MLR (X1 and X2 predicting Y)]
USES OF MULTIPLE REGRESSION
 Multiple regression can be used to examine the
following:
 How well a set of variables predicts an outcome
 Which variable in a set of variables is the best predictor of the outcome
 Whether a predictor variable still predicts the outcome when another variable is controlled for
MULTIPLE REGRESSION - EXAMPLE
What might predict exam performance?

[Diagram] Predictors: Motivation, Attendance at lectures, Books read → Exam Performance (Grade)
REGRESSION EQUATION
 Y = b1x1 + b2x2 + ... + bixi + a + e
• Y = observed DV scores
• b1 to bi = unstandardised regression coefficients (the Bs in SPSS) - the slopes
• x1 to xi = IV scores
• a = Y-axis intercept
• e = error (residual)
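As a minimal illustration of the equation above, the sketch below (Python, not SPSS) computes predicted DV scores from two IVs. The coefficient values and variable names are made up for illustration, not taken from the lecture's dataset.

```python
import numpy as np

# Hypothetical unstandardised coefficients (the "B" values SPSS would report)
a = 39.0            # intercept
b1, b2 = 3.8, 1.3   # slopes for x1 and x2

# IV scores for three hypothetical cases
x1 = np.array([2, 5, 8])      # e.g. books read
x2 = np.array([10, 15, 20])   # e.g. lectures attended

# Regression equation: predicted Y = a + b1*x1 + b2*x2
y_pred = a + b1 * x1 + b2 * x2
print(y_pred)   # observed Y = predicted Y + residual (e)
```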
MULTIPLE CORRELATION COEFFICIENT (R)

• “Big R” (capitalised, i.e., R)
• Equivalent of r, but takes into account that there are multiple predictors (IVs)
• Always positive, between 0 and 1
• Interpretation is similar to that for r (the bivariate correlation coefficient)
COEFFICIENT OF DETERMINATION (R2)

• “Big R squared”
• The squared multiple correlation coefficient
• Usually report R2 instead of R
• Indicates the % of variance in the DV explained by the combined effects of the IVs
• Analogous to r2
RULE OF THUMB: INTERPRETATION OF R2

• R2 = .00 = no linear relationship
• R2 = .10 = small (R ~ .3)
• R2 = .25 = moderate (R ~ .5)
• R2 = .50 = strong (R ~ .7)
• R2 = 1.00 = perfect linear relationship
• R2 ~ .30 is good for the social sciences
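To make R and R2 concrete, the short sketch below fits a two-predictor model in Python (statsmodels) on randomly generated data; the data and effect sizes are illustrative assumptions, not the lecture's survey.sav variables.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 2 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=100)   # DV with a known structure

X = sm.add_constant(np.column_stack([x1, x2]))   # intercept column + two IVs
model = sm.OLS(y, X).fit()

r_squared = model.rsquared        # % of variance in the DV explained by the IVs
big_r = np.sqrt(r_squared)        # multiple correlation coefficient (always >= 0)
print(f"R = {big_r:.2f}, R2 = {r_squared:.2f}")
```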
ADJUSTED R2
• Used for estimating the explained variance in the population.
• Report both R2 and adjusted R2.
• Particularly for small N, and where results are to be generalised, take more note of adjusted R2.
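A sketch of the usual adjustment formula, which shrinks R2 according to the number of predictors; the sample size and R2 value below are hypothetical, used only to show the calculation.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1),
    where n = sample size and k = number of IVs."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = .468 with a hypothetical n = 100 cases and k = 2 predictors
print(round(adjusted_r2(0.468, 100, 2), 3))   # slightly below .468
```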
REGRESSION COEFFICIENTS
 Y = b1x1 + b2x2 + ... + bixi + a + e

• Y-intercept (a)
• Slopes (b):
– Unstandardised
– Standardised
• Slopes are the weighted loading of each IV, adjusted for the other IVs in the model.
UNSTANDARDISED REGRESSION COEFFICIENTS

• B = unstandardised regression coefficient
• Used in the regression equation and for predicting Y scores
• But Bs can’t be compared with one another unless all IVs are measured on the same scale
STANDARDISED REGRESSION COEFFICIENTS

• Beta (β) = standardised regression coefficient
• Used for comparing the relative strength of predictors
• β = r in simple linear regression, but this is only true in MLR when the IVs are uncorrelated.

• Which IVs are the most important?
• Compare the standardised regression coefficients (βs)
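The relation between B and β can be seen by rescaling: β = B × sd(x) / sd(y). A sketch with invented data on deliberately different scales follows (Python, not SPSS).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(50, 10, size=200)   # IV on roughly a 0-100 scale
x2 = rng.normal(5, 2, size=200)     # IV on roughly a 0-10 scale
y = 10 + 0.3 * x1 + 2.0 * x2 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2]))
b = sm.OLS(y, X).fit().params[1:]   # unstandardised Bs (scale-dependent)

# Standardised betas: rescale each B by sd(x)/sd(y) so predictors are comparable
betas = b * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print(betas)   # same values you would get by fitting on z-scored variables
```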
ASSUMPTIONS

1. Levels of measurement
– IVs = metric (interval or ratio) or dichotomous
– DV = metric (interval or ratio)
2. Sample size
– Ratio of cases to IVs, and total N:
– Minimum 5:1; > 20 cases in total
– Ideal 20:1; > 100 cases in total
– Tabachnick and Fidell (2001, p. 117) give a formula for calculating the required sample size, taking into account the number of independent variables you wish to use: N > 50 + 8m (where m = number of independent variables). For example, with m = 2 IVs you would need N > 50 + 8(2) = 66 cases.
3. NORMALITY, LINEARITY, HOMOSCEDASTICITY,
INDEPENDENCE OF RESIDUALS

 These assumptions can be checked from the residuals scatterplots generated as part of the multiple regression procedure.
 Residuals are the differences between the obtained and the predicted dependent variable (DV) scores. The residuals scatterplots allow you to check:
 linearity:
– linear relations exist between the IVs and the DV
– the residuals should have a straight-line relationship with the predicted DV scores
 normality:
– the residuals should be normally distributed about the predicted DV scores
– if variables are non-normal, there will be heteroscedasticity
 homoscedasticity:
– the variance of the residuals about the predicted DV scores should be the same for all predicted scores
4. OUTLIERS

• Multiple regression is very sensitive to outliers (very high or very low scores). Checking for extreme scores should be part of the initial data screening process. You should do this for all the variables, both dependent and independent, that you will be using in your regression analysis.
• Extreme cases should be deleted or modified.
• Outliers on your dependent variable can be identified from the standardised residual plot that can be requested. Outliers are defined as cases with standardised residual values above about 3.3 (or below –3.3).
CONT.

 A case may be within the normal range for each variable individually, but be a multivariate outlier based on an unusual combination of responses which unduly influences multivariate test results.
e.g., a person who:
– Is 19 years old
– Has 3 children
– Has a post-graduate degree
5. MULTICOLLINEARITY AND SINGULARITY
This refers to the relationships among the independent variables.
 Multicollinearity
– high correlations (e.g., over .7) between IVs
 Singularity occurs when one independent variable is actually a combination of other independent variables
– perfect correlations among IVs
– leads to unstable regression coefficients
 Multiple regression doesn’t like multicollinearity or singularity, and these certainly don’t contribute to a good regression model, so always check for these problems before you start.
MULTICOLLINEARITY
 Detect via:
 Correlation matrix - are there large correlations among the IVs?
 Tolerance statistic - if < .3, consider excluding that variable.
 Variance Inflation Factor (VIF) - look for VIF < 3; otherwise consider excluding the variable.
6. CAUSALITY

• Like correlation, regression does not tell us about the causal relationship between variables.
• In many analyses, the IVs and DV could be swapped around – therefore, it is important to:
– take a theoretical position
– acknowledge alternative explanations
EXAMPLE (FROM FILE: SURVEY.SAV)

 What you need:
• One continuous dependent variable (total perceived stress); and
• Two or more continuous independent variables (mastery, PCOISS). (You can also use dichotomous independent variables, e.g. males = 1, females = 2.)

 What it does:
• Multiple regression tells you how much of the variance in your dependent variable can be explained by your independent variables.
• It also gives you an indication of the relative contribution of each independent variable.
• Tests allow you to determine the statistical significance of the results, both in terms of the model itself and the individual independent variables.
CONT.

 Assumptions:
• The major assumptions for multiple regression are described in an earlier section of this presentation.
• Some of these assumptions can be checked as part of the multiple regression analysis.
STANDARD MULTIPLE REGRESSION
 In this example two questions will be addressed:

Question 1: How well do the two measures of control (mastery, PCOISS) predict perceived stress? How much variance in perceived stress scores can be explained by scores on these two scales?
Question 2: Which is the best predictor of perceived stress: control of external events (Mastery scale) or control of internal states (PCOISS)?
PROCEDURE FOR STANDARD MULTIPLE REGRESSION

1. From the menu at the top of the screen click on: Analyze, then click on Regression, then on Linear.
2. Click on your continuous dependent variable and move it into the Dependent box.
3. Click on your independent variables and move them into the Independent box.
4. For Method, make sure Enter is selected (this will give you standard multiple regression).
5. Click on the Statistics button.
• Tick the boxes marked Estimates, Confidence Intervals, Model fit, Descriptives, Part and partial correlations and Collinearity diagnostics.
• In the Residuals section tick Casewise diagnostics and Outliers outside 3 standard deviations.
• Click on Continue.
6. Click on the Options button. In the Missing Values section click on Exclude cases pairwise.
7. Click on the Plots button.
• Click on *ZRESID and the arrow button to move it into the Y box.
• Click on *ZPRED and the arrow button to move it into the X box.
• In the section headed Standardized Residual Plots, tick the Normal probability plot option.
• Click on Continue.
8. Click on the Save button.
• In the section labelled Distances tick the Mahalanobis box (this will identify multivariate outliers for you) and Cook’s.
• Click on Continue.
9. Click on OK.
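For readers who want to replicate roughly the same analysis outside SPSS, here is a sketch in Python with statsmodels. The column names (tpstress, tmast, tpcoiss) are assumptions about how the survey.sav variables might be labelled, not confirmed names from the file.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_spss("survey.sav")                    # requires the pyreadstat package
data = df[["tpstress", "tmast", "tpcoiss"]].dropna()   # hypothetical column names

X = sm.add_constant(data[["tmast", "tpcoiss"]])    # IVs plus intercept
model = sm.OLS(data["tpstress"], X).fit()          # "Enter": all IVs in one step

print(model.summary())                             # R2, adjusted R2, F, B, t, Sig.
influence = model.get_influence()
print(influence.resid_studentized_internal.max())  # screen standardised residuals
```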
The output generated from this procedure is shown below.
[SPSS output tables, slides 34-38]
Standard Error of the Estimate
The standard error of the estimate is a measure of the accuracy of predictions.

[Scatterplots: Graph A and Graph B] You can see that in Graph A the points are closer to the regression line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than those in Graph B.
INTERPRETATION OF OUTPUT FROM
STANDARD MULTIPLE REGRESSION

 Step 1: Checking the assumptions
 Step 2: Evaluating the model
 Step 3: Evaluating each of the independent variables
STEP 1: CHECKING THE ASSUMPTIONS

Multicollinearity (table labelled Correlations)

 Check that your independent variables show at least some relationship with your dependent variable (above .3 preferably).
 In this case both of the scales (Total Mastery and Total PCOISS) correlate substantially with Total Perceived Stress (–.61 and –.58 respectively).
 Check that the correlation between each pair of your independent variables is not too high (say, .7 or more). If it is, omit one of the variables or form a composite variable from the scores of the two highly correlated variables.
 In the example presented here the correlation is .52, which is less than .7; therefore all variables will be retained.
MULTICOLLINEARITY “CONT.“
 Collinearity diagnostics (table labelled Coefficients)
 These can pick up on problems with multicollinearity that may not be evident in the correlation matrix. You should take them only as a warning sign, and check the correlation matrix.

 Tolerance: an indicator of how much of the variability of the specified independent variable is not explained by the other independent variables in the model; it is calculated using the formula 1 – R2 for each variable. If this value is very small (less than .10), it indicates that the multiple correlation with the other variables is high, suggesting the possibility of multicollinearity.
 VIF (Variance Inflation Factor): the inverse of the Tolerance value (1 divided by Tolerance). VIF values above 10 would be a concern here, indicating multicollinearity.

 In this example the tolerance value is .729, which is not less than .10, and the VIF value is 1.372, which is well below the cut-off of 10; therefore, we have not violated the multicollinearity assumption.
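Outside SPSS, comparable tolerance and VIF figures can be obtained with statsmodels; the sketch below uses invented predictor data purely for illustration (variance_inflation_factor is an existing statsmodels function).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = 0.5 * x1 + rng.normal(size=300)            # moderately correlated with x1

X = sm.add_constant(np.column_stack([x1, x2]))  # constant + two IVs

# VIF for each IV (skip column 0, the constant); Tolerance = 1 / VIF
for i, name in enumerate(["x1", "x2"], start=1):
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.3f}, Tolerance = {1 / vif:.3f}")
```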
OUTLIERS, NORMALITY, LINEARITY,
HOMOSCEDASTICITY, INDEPENDENCE OF RESIDUALS.
 Check these by inspecting the residuals scatterplot and the Normal Probability Plot of the regression standardised residuals.
 In the Normal Probability Plot, a straight diagonal line from bottom left to top right suggests no major deviations from normality.
 In the residuals scatterplot, the residuals should be roughly rectangularly distributed, with most of the scores concentrated in the centre (along the 0 point).
 What you don’t want to see is a clear or systematic pattern to your residuals (e.g. curvilinear, or higher on one side than the other). Deviations from a centralised rectangle suggest some violation of the assumptions.
OUTLIERS

 The presence of outliers can also be detected from the scatterplot.
 Outliers can be defined as cases that have a standardised residual (as displayed in the scatterplot) of more than 3.3 or less than –3.3.
 With large samples, it is common to find a number of outlying residuals. If you find only a few, it may not be necessary to take any action.
 Outliers can also be checked by inspecting the Mahalanobis distances.
MAHALANOBIS
 To identify which cases are outliers you will need to determine the critical chi-square value, using the number of independent variables as the degrees of freedom.
 In this example there are two independent variables; therefore the critical value is 13.82.
 To identify outliers on Mah_1, use Descriptives, Explore, request Outliers from the list of Statistics, and ask for the program to label the cases by ID.
 The five highest values will be displayed; check that none of them exceeds the critical value obtained from the chi-square table. If you have a large sample, don’t worry about a small number of outliers.
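As a sketch of the same check done by hand (illustrative random data; the .001 criterion and df = number of IVs follow the slide above):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))             # two IVs for 200 hypothetical cases

# Squared Mahalanobis distance of each case from the centroid of the IVs
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

critical = chi2.ppf(1 - 0.001, df=X.shape[1])   # ~13.82 for 2 IVs at alpha = .001
print(round(float(critical), 2), (d2 > critical).sum(), "multivariate outlier(s)")
```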
CASEWISE DIAGNOSTICS
 This presents information about cases that have standardised residual values above 3.0 or below –3.0.
 In a normally distributed sample we would expect only 1 per cent of cases to fall outside this range.
 In this sample we have found one case (case number 152) with a residual value of –3.475.
 To check whether this strange case is having any undue influence on the results for our model as a whole, we can check the value for Cook’s Distance given towards the bottom of the Residuals Statistics table. Cases with values larger than 1 are a potential problem. In our example the maximum value for Cook’s Distance is .09.
STEP 2: EVALUATING THE MODEL
LOOK IN THE “MODEL SUMMARY BOX”
 R Square gives some information about the goodness of fit of a model. In regression, the R2 coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R2 of 1.0 indicates that the regression line perfectly fits the data.
 In this case the value is .468. Expressed as a percentage, this means that our model explains 46.8 per cent of the variance in perceived stress. This is quite a respectable result, particularly when you compare it to some of the results reported in the journals!
STEP 2: EVALUATING THE MODEL “CONT.“
 Adjusted R Square is a modification of R2 that adjusts for the number of explanatory terms in a model.
 Unlike R2, the adjusted R2 increases only if the new term improves the model more than would be expected by chance. The adjusted R2 can be negative, and will always be less than or equal to R2.
 Adjusted R2 does not have the same interpretation as R2. When a small sample is involved, the R Square value in the sample tends to be a rather optimistic overestimation of the true value in the population. The Adjusted R Square statistic ‘corrects’ this value to provide a better estimate of the true population value. If you have a small sample you may wish to consider reporting this value, rather than the normal R Square value.
ANOVA

 Look in the table labelled ANOVA.
 This tests the null hypothesis that multiple R in the population equals 0.
 The model in this example reaches statistical significance (Sig. = .000, which really means p < .0005).
STEP 3: EVALUATING EACH OF THE
INDEPENDENT VARIABLES

 We want to know which of the variables included in the model contributed to the prediction of the dependent variable.
 We find this information in the output box labelled Coefficients. Look in the column labelled Beta under Standardized Coefficients. To compare the different variables it is important that you look at the standardised coefficients, not the unstandardised ones.
 If you were interested in constructing a regression equation, you would use the unstandardised coefficient values listed as B.
BETA
 Look down the Beta column and find which beta value is the largest (ignoring any negative signs out the front).
 The largest beta coefficient means that this variable makes the strongest unique contribution to explaining the dependent variable, when the variance explained by all other variables in the model is controlled for.
 For each of these variables, check the value in the column marked Sig. This tells you whether this variable is making a statistically significant unique contribution to the equation.
 If the Sig. value is less than .05 (.01, .0001, etc.), then the variable is making a significant unique contribution to the prediction of the dependent variable.
PART CORRELATION COEFFICIENTS (SEMI-
PARTIAL CORRELATION COEFFICIENTS)

 If you square this value (whichever name it is given) you get an indication of the contribution of that variable to the total R squared. In other words, it tells you how much of the total variance in the dependent variable is uniquely explained by that variable, and how much R squared would drop if it wasn’t included in your model.
 Example: suppose a variable has a part correlation coefficient of –.36. If we square this (multiply it by itself) we get .13, indicating that this variable uniquely explains 13 per cent of the variance in the dependent variable.
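A rough way to see the "drop in R2" interpretation with invented data: the squared part (semipartial) correlation for a predictor equals the fall in R2 when that predictor is removed from the model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
y = 1 + 0.4 * x1 + 0.6 * x2 + rng.normal(size=500)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
without_x1 = sm.OLS(y, sm.add_constant(x2)).fit()

# Squared semipartial (part) correlation of x1 = drop in R2 when x1 is removed
sr2_x1 = full.rsquared - without_x1.rsquared
print(round(sr2_x1, 3))
```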
CHECKLIST FOR STANDARD MULTIPLE REGRESSION
1. Issues
a. Ratio of cases to IVs and missing data
b. Normality, linearity, and homoscedasticity of residuals
c. Outliers
d. Multicollinearity and singularity
e. Outliers in the solution
2. Major analyses
a. Multiple R2 and its confidence limits, F ratio
b. Adjusted multiple R2, overall proportion of variance accounted for
c. Significance of regression coefficients
d. Squared semipartial correlations
3. Additional analyses
a. Post hoc significance of correlations
b. Unstandardized (B) weights, confidence limits
c. Standardized (β) weights
d. Unique versus shared variability
e. Suppressor variables
f. Prediction equation
RESULTS
 A standard multiple regression was performed between number of visits to health professionals as the dependent variable and physical health, mental health, and stress as independent variables. Analysis was performed using SPSS REGRESSION and SPSS EXPLORE for evaluation of assumptions.
 Results of evaluation of assumptions led to transformation of
the variables to reduce skewness, reduce the number of outliers,
and improve the normality, linearity, and homoscedasticity of
residuals. A square root transformation was used on the
measure of stress. Logarithmic transformations were used on
number of visits to health professionals and on physical health.
One IV, mental health, was positively skewed without
transformation and negatively skewed with it; it was not
transformed. With the use of a p < .001 criterion for
Mahalanobis distance no outliers among the cases were found.
No cases had missing data and no suppressor variables were
found, N = 465.
 Table …… displays the correlations between the variables, the unstandardized regression coefficients (B) and intercept, the standardized regression coefficients (β), the semipartial correlations (sri2), R2, and adjusted R2. R for regression was significantly different from zero, F(3, 461) = 92.90, p < .001, with R2 at .38 and 95% confidence limits from .30 to .44. The adjusted R2 value of .37 indicates that more than a third of the variability in visits to health professionals is predicted by number of physical health symptoms, stress, and mental health symptoms. For the two regression coefficients that differed significantly from zero, 95% confidence limits were calculated. The confidence limits for (square root of) stress were 0.0091 to 0.0223, and those for (log of) physical health were 0.8686 to 1.2113.
 The three IVs in combination contributed another .15 in
shared variability. Altogether, 38% (37% adjusted) of the
variability in visits to health professionals was predicted by
knowing scores on these three IVs. The size and direction of
the relationships suggest that more visits to health
professionals are made among women with a large number
of physical health symptoms and higher stress. Between
those two, however, number of physical health symptoms is
much more important, as indicated by the squared semi-
partial correlations.
 Although the bivariate correlation between (log of) visits to health professionals and mental health was statistically different from zero using a post hoc correction, r = .36, F(3, 461) = 22.16, p < .01, mental health did not contribute significantly to regression. Apparently, the relationship between the number of visits to health professionals and mental health is mediated by the relationships between physical health, stress, and visits to health professionals.
TYPES OF MLR

• Standard or direct (simultaneous)
• Hierarchical or sequential
• Stepwise (forward & backward)
DIRECT OR STANDARD

• All predictor variables are entered together (simultaneously)
• Allows assessment of the relationship between all predictor variables and the criterion (Y) variable, if there is good theoretical reason for doing so
• Manual technique & commonly used
HIERARCHICAL (SEQUENTIAL)

• IVs are entered in blocks or stages.
– The researcher defines the order of entry for the variables, based on theory.
– May enter ‘nuisance’ variables first to ‘control’ for them, then test the ‘purer’ effect of the next block of important variables.
• R2 change = additional variance in Y explained at each stage of the regression.
– F test of R2 change.
STEPWISE

• Combines forward & backward.
• At each step, variables may be entered or removed if they meet certain criteria.
• Useful for developing the best prediction equation from the smallest number of variables.
• Redundant predictors are removed.
FORWARD SELECTION

• The strongest predictor variables are entered, one by one, if they meet a criterion (e.g., p < .05)
• Best predictor = the IV with the highest r with Y

BACKWARD ELIMINATION

• All predictor variables are entered, then the weakest predictors are removed, one by one, if they meet a criterion (e.g., p > .05)
• Worst predictor = the IV with the lowest r with Y
WHICH METHOD?

• Standard: to assess the impact of all IVs simultaneously
• Hierarchical: to test specific hypotheses derived from theory
• Stepwise: if the goal is accurate statistical prediction – computer driven
LOGISTIC REGRESSION
 There are many research situations, however, when the dependent variable of interest is categorical (e.g. win/lose; fail/pass; dead/alive).
 Logistic regression allows you to test models to predict categorical outcomes with two or more categories. Your predictor (independent) variables can be either categorical or continuous, or a mix of both in the one model.
 There is a family of logistic regression techniques available in SPSS that will allow you to explore the predictive ability of sets or blocks of variables, and to specify the entry of variables.
TECHNIQUES OF LOGISTIC REGRESSION
 The Forced Entry Method is the default procedure available in SPSS. In this approach all predictor variables are tested in one block to assess their predictive ability, while controlling for the effects of the other predictors in the model.
 The stepwise procedures (e.g. forward and backward) allow you to specify a large group of potential predictors from which SPSS can pick a subset that provides the best predictive power.
 These stepwise procedures have been criticised (in both logistic and multiple regression) because they can be heavily influenced by random variation in the data, with variables being included or removed from the model on purely statistical grounds.
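A minimal sketch of binary logistic regression in Python (not SPSS); the outcome, predictor and coefficients here are invented purely for illustration of a forced-entry fit.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
hours_studied = rng.uniform(0, 20, size=300)              # continuous predictor
p_pass = 1 / (1 + np.exp(-(-3 + 0.4 * hours_studied)))    # true pass probability
passed = rng.binomial(1, p_pass)                          # categorical DV: fail/pass

X = sm.add_constant(hours_studied)
logit_model = sm.Logit(passed, X).fit(disp=0)   # forced entry: all predictors in one block
print(logit_model.params)                       # log-odds coefficients
print(np.exp(logit_model.params))               # odds ratios
```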
THE END

 There will be “A MIDTERM EXAM”
 THE EXAM’S TOPICS ARE r, LR and MLR

ENRICHMENT
MULTIPLE REGRESSION USING SPSS
Analyze → Regression → Linear

[SPSS screenshot, slide 70]
MULTIPLE REGRESSION: SPSS OUTPUT

Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | Lectures attended, Number of books read(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: Grade achieved
MULTIPLE REGRESSION: SPSS OUTPUT

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .605(a) | .367 | .336 | 13.711

a. Predictors: (Constant), Lectures attended, Number of books read
MULTIPLE REGRESSION: SPSS OUTPUT
ANOVA(b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 4569.053 | 2 | 2284.526 | 12.153 | .000(a)
Residual | 7895.258 | 42 | 187.982 | |
Total | 12464.311 | 44 | | |

a. Predictors: (Constant), Lectures attended, Number of books read
b. Dependent Variable: Grade achieved

For the overall model: F(2, 42) = 12.153, p < .001
MULTIPLE REGRESSION: SPSS OUTPUT
Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | 39.173 | 6.625 | | 5.913 | .000
Number of books read | 3.832 | 1.712 | .331 | 2.238 | .031
Lectures attended | 1.290 | .536 | .356 | 2.407 | .021

a. Dependent Variable: Grade achieved

Number of books read is a significant predictor: β = .33, t(42) = 2.24, p < .05
Lectures attended is a significant predictor: β = .36, t(42) = 2.41, p < .05
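Using the unstandardised B values from the Coefficients table above, the prediction equation can be applied directly; the student's scores below are hypothetical.

```python
# Grade = 39.173 + 3.832 * (books read) + 1.290 * (lectures attended)
def predicted_grade(books: float, lectures: float) -> float:
    return 39.173 + 3.832 * books + 1.290 * lectures

print(predicted_grade(books=4, lectures=15))   # hypothetical student: about 73.85
```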
MAJOR TYPES OF MULTIPLE REGRESSION
 There are different types of multiple regression:
 Theory-based model building:
 Standard multiple regression (Enter)
 Hierarchical multiple regression (Block entry)
 Statistical model building:
 Sequential multiple regression (Forward, Backward, Stepwise)
STANDARD MULTIPLE REGRESSION
 Most common method. All the predictor variables are entered into the analysis simultaneously (i.e., Enter).

 Used to examine how much:
 an outcome variable is explained by a set of predictor variables as a group
 variance in the outcome variable is explained by a single predictor (unique contribution)
EXAMPLE
 The different methods of regression and their associated outputs will be illustrated using:
 Outcome variable
 Essay mark
 Predictor variables
 Number of lectures attended (out of 20)
 Motivation of the student (on a scale from 0 – 100)
 Number of course books read (from 0 – 10)

[Diagram: Motivation, Attendance at lectures, Books read → Exam Performance (Grade)]
ENTER OUTPUT
Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | books, lectures, motivation(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: essay
ENTER OUTPUT

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .918(a) | .842 | .812 | 6.84522

a. Predictors: (Constant), books, lectures, motivation

R Square = proportion of variance in the outcome accounted for by the predictor variables
Adjusted R Square = takes into account the sample size and the number of predictor variables
ENTER OUTPUT

ANOVA(b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 95293.006 | 3 | 31764.335 | 17.030 | .000(a)
Residual | 382376.0 | 205 | 1865.249 | |
Total | 477669.0 | 208 | | |

a. Predictors: (Constant), Gender identification, Negative impressions males hold about females, Positive impressions males hold about females
b. Dependent Variable: Negative impression about males
ENTER OUTPUT
Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | 19.738 | 5.399 | | 3.656 | .002
lectures | 1.217 | .469 | .490 | 2.595 | .020
motivation | .352 | .144 | .466 | 2.450 | .026
books | .509 | .504 | .103 | 1.010 | .327

a. Dependent Variable: essay

Beta = standardised regression coefficient; it shows the degree to which the predictor variable predicts the outcome variable with all other variables held constant
HIERARCHICAL MULTIPLE REGRESSION
 aka sequential regression

 Predictor variables are entered in a prearranged order of steps (i.e., block entry)

 Can examine how much variance is accounted for by a predictor when others are already in the model

[SPSS screenshots, slides 83-84]
Don’t forget to choose the R Square change option from the Statistics menu.
BLOCK ENTRY OUTPUT
Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | lectures(a) | . | Enter
2 | books, motivation(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: essay
BLOCK ENTRY OUTPUT
Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .884(a) | .781 | .768 | 7.60374
2 | .918(b) | .842 | .812 | 6.84522

a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation

Change Statistics

Model | R Square Change | F Change | df1 | df2 | Sig. F Change
1 | .781 | 64.069 | 1 | 18 | .000
2 | .061 | 3.105 | 2 | 16 | .073

NB – in the actual output these change statistics appear on the same long line as the Model Summary table.
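The F Change statistic above can be approximated from the two models' R2 values; a sketch using the figures from this output (differences from SPSS's 3.105 are due to rounding of the reported R2 values):

```python
def f_change(r2_full: float, r2_reduced: float, df1: int, df2: int) -> float:
    """F for the R2 change when df1 predictors are added:
    F = ((R2_full - R2_reduced) / df1) / ((1 - R2_full) / df2)."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)

# Block 2 above: R2 rises from .781 to .842 when 2 IVs are added, with df2 = 16
print(round(f_change(0.842, 0.781, df1=2, df2=16), 3))
```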
BLOCK ENTRY OUTPUT
ANOVA(c)

Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 3704.295 | 1 | 3704.295 | 64.069 | .000(a)
  Residual | 1040.705 | 18 | 57.817 | |
  Total | 4745.000 | 19 | | |
2 Regression | 3995.288 | 3 | 1331.763 | 28.422 | .000(b)
  Residual | 749.712 | 16 | 46.857 | |
  Total | 4745.000 | 19 | | |

a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation
c. Dependent Variable: essay
BLOCK ENTRY OUTPUT
Coefficients(a)

Model | B | Std. Error | Beta | t | Sig.
1 (Constant) | 30.311 | 3.042 | | 9.965 | .000
  lectures | 2.194 | .274 | .884 | 8.004 | .000
2 (Constant) | 19.738 | 5.399 | | 3.656 | .002
  lectures | 1.217 | .469 | .490 | 2.595 | .020
  motivation | .352 | .144 | .466 | 2.450 | .026
  books | .509 | .504 | .103 | 1.010 | .327

a. Dependent Variable: essay
STATISTICAL MULTIPLE REGRESSION
 aka sequential techniques

 Relies on SPSS selecting which predictor variables to include in the model

 Three types:
 Forward selection
 Backward selection
 Stepwise selection
 Forward Starts with no variables in model, tries
them all, includes best predictor, repeats

 Backward Starts with ALL variable, removes


lowest contributor, repeats

 Stepwise Combination. Starts as Forward, checks


that all variables are making contribution after each
iteration (like Backward)

91
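A toy sketch of forward selection in Python: a simplified, p-value-based loop over statsmodels fits. Real stepwise implementations (including SPSS's) differ in their entry/removal criteria, so treat this only as an illustration of the idea.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X: pd.DataFrame, alpha: float = 0.05) -> list:
    """Greedy forward selection: at each step add the predictor with the
    smallest p-value, as long as that p-value is below alpha."""
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        if not remaining:
            break
        pvals = {}
        for col in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            pvals[col] = model.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break                 # no remaining predictor meets the criterion
        selected.append(best)
    return selected

# Illustrative data: only x1 and x2 truly predict y
rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 0.6 * X["x1"] + 0.4 * X["x2"] + rng.normal(size=200)
print(forward_select(y, X))   # typically ['x1', 'x2']
```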
SUMMARY OF MODEL SELECTION
TECHNIQUES

 Theory based
 Enter – all predictors entered together (standard)
 Block entry – predictors entered in groups (hierarchical)

 Statistical based
 Forward – variables are entered into the model based on their statistical significance
 Backward – variables are removed from the model based on their statistical significance
 Stepwise – variables are moved in and out of the model based on their statistical significance
ASSUMPTIONS OF REGRESSION
 Linearity
 The relationship between the dependent variable and the predictors must be linear
 Check: violations are assessed using a scatterplot
 Independence
 Values on the outcome variable must be independent
 i.e., each value comes from a different participant
 Homoscedasticity
 At each level of the predictor variable the variance of the residual terms should be equal (i.e. all data points should be about as close to the line of best fit)
 Can indicate whether all data are drawn from the same sample
 Normality
 Residuals/errors should be normally distributed
 Check: violations using histograms (e.g., outliers)
 Multicollinearity
 Predictor variables should not be highly correlated
OTHER IMPORTANT ISSUES
 Regression as presented here is for continuous/interval predictors, or categorical predictors with ONLY two categories
 More than two categories are possible (dummy coding)
 The outcome must be continuous/interval

 Sample size
 Multiple regression needs a relatively large sample size
 some authors suggest using between 10 and 20 participants per predictor variable
 others argue there should be 50 cases more than the number of predictors
 to be sure that one is not capitalising on chance effects
OUTCOMES
 So – what is regression?

 This lecture has:
 introduced the different types of regression
 detailed how to conduct and interpret regression using SPSS
 described the underlying assumptions of regression
 outlined the data types and sample sizes needed for regression
 outlined the major limitation of a regression analysis
HIERARCHICAL MULTIPLE REGRESSION

 Using hierarchical multiple regression (also referred to as sequential regression) means that we will be entering our variables in steps or blocks in a predetermined order (not letting the computer decide, as would be the case for stepwise regression).
 In the first block we will ‘force’ age and socially desirable responding into the analysis. This has the effect of statistically controlling for these variables.
 In the second step we enter the other independent variables into the ‘equation’ as a block, just as we did in the previous example.
 The difference this time is that the possible effect of the variables in the first block has been ‘removed’, and we can then see whether our block of independent variables is still able to explain some of the remaining variance in our dependent variable.
PROCEDURE FOR HIERARCHICAL MULTIPLE
REGRESSION
1. From the menu at the top of the screen click on: Analyze, then click on Regression, then on Linear.
2. Choose your continuous dependent variable and move it into the Dependent box.
3. Move the variables you wish to control for into the Independent box. This will be the first block of variables to be entered in the analysis (Block 1 of 1).
4. Click on the button marked Next. This will give you a second independent variables box to enter your second block of variables into (you should see Block 2 of 2).
5. Choose your next block of independent variables.
6. In the Method box make sure that this is set to the default (Enter). This will give you standard multiple regression for each block of variables entered.
7. Click on the Statistics button. Tick the boxes marked Estimates, Model fit, R squared change, Descriptives, Part and partial correlations and Collinearity diagnostics. Click on Continue.
8. Click on the Options button. In the Missing Values section click on Exclude cases pairwise.
9. Click on the Save button. Click on Mahalanobis and Cook’s. Click on Continue and then OK.

Some of the output generated from this procedure is shown below.
[SPSS output, slides 100-101]
INTERPRETATION OF OUTPUT
 The output generated from this analysis is similar to the previous output, but with some extra pieces of information.
 In the Model Summary box there are two models listed. Model 1 refers to the first block of variables that were entered, while Model 2 includes all the variables that were entered in both blocks.
STEP 1: EVALUATING THE MODEL
Check the R Square values in the first Model Summary box.

 After the variables in Block 1 have been entered, the overall model explains 5.7 per cent of the variance (.057 × 100).
 After the Block 2 variables have also been included, the model as a whole explains 47.4 per cent (.474 × 100).
 It is important to note that this second R Square value includes all the variables from both blocks, not just those included in the second step.
Look in the column labelled R Square Change.
 This will help you to find out how much of the overall variance is explained by our variables of interest after the effects of the first block of variables are removed.
 In the output presented above you will see, on the line marked Model 2, that the R Square Change value is .417. This means that the second block of variables explains an additional 41.7 per cent (.417 × 100) of the variance in the dependent variable, even when the effects of the first block of variables are statistically controlled for. This is a statistically significant contribution, as indicated by the Sig. F Change value for this line (.000).
 The ANOVA table indicates that the model as a whole (which includes both blocks of variables) is significant [F(4, 421) = 94.78, p < .0005].
STEP 2: EVALUATING EACH OF THE
INDEPENDENT VARIABLES
 To find out how well each of the variables contributes to the equation we need to look in the Coefficients table.
 Always look in the Model 2 row. This summarises the results with all the variables entered into the equation. Scanning the Sig. column, there are only two variables that make a statistically significant contribution (less than .05).
 Remember, these beta values represent the unique contribution of each variable, when the overlapping effects of all other variables are statistically removed. In different equations, with a different set of independent variables, or with a different sample, these values would change.
