UKP6053 - L8 Multiple Regression

MULTIPLE REGRESSION

Lecture 8
LEARNING OBJECTIVES
 In this lecture you will learn:
 What simple and multiple regression mean
 The rationale behind these forms of analysis
 How to conduct simple bivariate and multiple regression analyses using SPSS
 How to interpret the results of a regression analysis
ASSIGNMENT
 Using the survey.sav file:
1. Run a multiple linear regression with Total Optimism as the dependent variable (DV) and both Total Self-esteem and Total Life Satisfaction as independent variables (IVs).
2. Interpret the results using APA style.
MULTIPLE LINEAR REGRESSION

LINEAR RELATIONS BETWEEN TWO OR MORE IVS AND A SINGLE DV
MULTIPLE REGRESSION
 Multiple regression is used when there is more than one predictor variable.

 Two major uses of multiple regression:
 Prediction
 Causal analysis
LINEAR REGRESSION SUMMARY

 Linear Regression: a single predictor X → Y
 Multiple Linear Regression: several predictors (X1, X2, X3, X4, X5) → Y

[Diagrams: Correlation (X with Y), Simple Regression (X → Y), Partial Correlation, and MLR (X1 and X2 predicting Y)]
USES OF MULTIPLE REGRESSION
 Multiple regression can be used to examine the
following:
 How well a set of variables predicts an outcome
 Which variable in a set of variables is the best predictor of the outcome
 Whether a predictor variable still predicts the outcome when another variable is controlled for
MULTIPLE REGRESSION - EXAMPLE
What might predict exam performance?

[Diagram] Predictors: Motivation, Attendance at lectures, Books read → Exam Performance (Grade)
REGRESSION EQUATION
 Y = b1x1 + b2x2 + ... + bixi + a + e
• Y = observed DV scores
• b1 to bi = unstandardised regression coefficients (the Bs in SPSS) - the slopes
• x1 to xi = IV scores
• a = Y-axis intercept
• e = error (residual)
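As a minimal illustration of the equation above, the sketch below (Python, not SPSS) computes predicted DV scores from two IVs. The coefficient values and variable names are made up for illustration, not taken from the lecture's dataset.

```python
import numpy as np

# Hypothetical unstandardised coefficients (the "B" values SPSS would report)
a = 39.0            # intercept
b1, b2 = 3.8, 1.3   # slopes for x1 and x2

# IV scores for three hypothetical cases
x1 = np.array([2, 5, 8])      # e.g. books read
x2 = np.array([10, 15, 20])   # e.g. lectures attended

# Regression equation: predicted Y = a + b1*x1 + b2*x2
y_pred = a + b1 * x1 + b2 * x2
print(y_pred)   # observed Y = predicted Y + residual (e)
```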
MULTIPLE CORRELATION COEFFICIENT (R)

• “Big R” (capitalised, i.e., R)
• Equivalent of r, but takes into account that there are multiple predictors (IVs)
• Always positive, between 0 and 1
• Interpretation is similar to that for r (the bivariate correlation coefficient)
COEFFICIENT OF DETERMINATION (R2)

• “Big R squared”
• The squared multiple correlation coefficient
• Usually report R2 instead of R
• Indicates the % of variance in the DV explained by the combined effects of the IVs
• Analogous to r2
RULE OF THUMB: INTERPRETATION OF R2

• R2 = .00 = no linear relationship
• R2 = .10 = small (R ~ .3)
• R2 = .25 = moderate (R ~ .5)
• R2 = .50 = strong (R ~ .7)
• R2 = 1.00 = perfect linear relationship
• R2 ~ .30 is good for the social sciences
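To make R and R2 concrete, the short sketch below fits a two-predictor model in Python (statsmodels) on randomly generated data; the data and effect sizes are illustrative assumptions, not the lecture's survey.sav variables.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 2 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=100)   # DV with a known structure

X = sm.add_constant(np.column_stack([x1, x2]))   # intercept column + two IVs
model = sm.OLS(y, X).fit()

r_squared = model.rsquared        # % of variance in the DV explained by the IVs
big_r = np.sqrt(r_squared)        # multiple correlation coefficient (always >= 0)
print(f"R = {big_r:.2f}, R2 = {r_squared:.2f}")
```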
ADJUSTED R2
• Used for estimating the explained variance in the population.
• Report both R2 and adjusted R2.
• Particularly for small N, and where results are to be generalised, take more note of adjusted R2.
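A sketch of the usual adjustment formula, which shrinks R2 according to the number of predictors; the sample size and R2 value below are hypothetical, used only to show the calculation.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1),
    where n = sample size and k = number of IVs."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = .468 with a hypothetical n = 100 cases and k = 2 predictors
print(round(adjusted_r2(0.468, 100, 2), 3))   # slightly below .468
```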
REGRESSION COEFFICIENTS
 Y = b1x1 + b2x2 + ... + bixi + a + e

• Y-intercept (a)
• Slopes (b):
– Unstandardised
– Standardised
• Slopes are the weighted loading of each IV, adjusted for the other IVs in the model.
UNSTANDARDISED REGRESSION COEFFICIENTS

• B = unstandardised regression coefficient
• Used in the regression equation and for predicting Y scores
• But Bs can’t be compared with one another unless all IVs are measured on the same scale
STANDARDISED REGRESSION COEFFICIENTS

• Beta (β) = standardised regression coefficient
• Used for comparing the relative strength of predictors
• β = r in simple linear regression, but this is only true in MLR when the IVs are uncorrelated.

• Which IVs are the most important?
• Compare the standardised regression coefficients (βs)
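The relation between B and β can be seen by rescaling: β = B × sd(x) / sd(y). A sketch with invented data on deliberately different scales follows (Python, not SPSS).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(50, 10, size=200)   # IV on roughly a 0-100 scale
x2 = rng.normal(5, 2, size=200)     # IV on roughly a 0-10 scale
y = 10 + 0.3 * x1 + 2.0 * x2 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2]))
b = sm.OLS(y, X).fit().params[1:]   # unstandardised Bs (scale-dependent)

# Standardised betas: rescale each B by sd(x)/sd(y) so predictors are comparable
betas = b * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print(betas)   # same values you would get by fitting on z-scored variables
```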
ASSUMPTIONS

1. Levels of measurement
– IVs = metric (interval or ratio) or dichotomous
– DV = metric (interval or ratio)
2. Sample size
– Ratio of cases to IVs, and total N:
– Minimum 5:1; > 20 cases in total
– Ideal 20:1; > 100 cases in total
– Tabachnick and Fidell (2001, p. 117) give a formula for calculating the required sample size, taking into account the number of independent variables you wish to use: N > 50 + 8m (where m = number of independent variables). For example, with m = 2 IVs you would need N > 50 + 8(2) = 66 cases.
3. NORMALITY, LINEARITY, HOMOSCEDASTICITY,
INDEPENDENCE OF RESIDUALS

 These assumptions can be checked from the residuals scatterplots generated as part of the multiple regression procedure.
 Residuals are the differences between the obtained and the predicted dependent variable (DV) scores. The residuals scatterplots allow you to check:
 linearity:
– linear relations exist between the IVs and the DV
– the residuals should have a straight-line relationship with the predicted DV scores
 normality:
– the residuals should be normally distributed about the predicted DV scores
– if variables are non-normal, there will be heteroscedasticity
 homoscedasticity:
– the variance of the residuals about the predicted DV scores should be the same for all predicted scores
4. OUTLIERS

• Multiple regression is very sensitive to outliers (very high or very low scores). Checking for extreme scores should be part of the initial data screening process. You should do this for all the variables, both dependent and independent, that you will be using in your regression analysis.
• Extreme cases should be deleted or modified.
• Outliers on your dependent variable can be identified from the standardised residual plot that can be requested. Outliers are defined as cases with standardised residual values above about 3.3 (or below –3.3).
CONT.

 A case may be within the normal range for each variable individually, but be a multivariate outlier based on an unusual combination of responses which unduly influences multivariate test results.
e.g., a person who:
– Is 19 years old
– Has 3 children
– Has a post-graduate degree
5. MULTICOLLINEARITY AND SINGULARITY
This refers to the relationships among the independent variables.
 Multicollinearity
– high correlations (e.g., over .7) between IVs
 Singularity occurs when one independent variable is actually a combination of other independent variables
– perfect correlations among IVs
– leads to unstable regression coefficients
 Multiple regression doesn’t like multicollinearity or singularity, and these certainly don’t contribute to a good regression model, so always check for these problems before you start.
MULTICOLLINEARITY
 Detect via:
 Correlation matrix - are there large correlations among the IVs?
 Tolerance statistic - if < .3, consider excluding that variable.
 Variance Inflation Factor (VIF) - look for VIF < 3; otherwise consider excluding the variable.
6. CAUSALITY

• Like correlation, regression does not tell us about the causal relationship between variables.
• In many analyses, the IVs and DV could be swapped around – therefore, it is important to:
– take a theoretical position
– acknowledge alternative explanations
EXAMPLE (FROM FILE: SURVEY.SAV)

 What you need:
• One continuous dependent variable (total perceived stress); and
• Two or more continuous independent variables (mastery, PCOISS). (You can also use dichotomous independent variables, e.g. males = 1, females = 2.)

 What it does:
• Multiple regression tells you how much of the variance in your dependent variable can be explained by your independent variables.
• It also gives you an indication of the relative contribution of each independent variable.
• Tests allow you to determine the statistical significance of the results, both in terms of the model itself and the individual independent variables.
CONT.

 Assumptions:
• The major assumptions for multiple regression are described in an earlier section of this presentation.
• Some of these assumptions can be checked as part of the multiple regression analysis.
STANDARD MULTIPLE REGRESSION
 In this example two questions will be addressed:

Question 1: How well do the two measures of control (mastery, PCOISS) predict perceived stress? How much variance in perceived stress scores can be explained by scores on these two scales?
Question 2: Which is the best predictor of perceived stress: control of external events (Mastery scale) or control of internal states (PCOISS)?
PROCEDURE FOR STANDARD MULTIPLE REGRESSION

1. From the menu at the top of the screen click on: Analyze, then click on Regression, then on Linear.
2. Click on your continuous dependent variable and move it into the Dependent box.
3. Click on your independent variables and move them into the Independent box.
4. For Method, make sure Enter is selected (this will give you standard multiple regression).
5. Click on the Statistics button.
• Tick the boxes marked Estimates, Confidence Intervals, Model fit, Descriptives, Part and partial correlations and Collinearity diagnostics.
• In the Residuals section tick Casewise diagnostics and Outliers outside 3 standard deviations.
• Click on Continue.
6. Click on the Options button. In the Missing Values section click on Exclude cases pairwise.
7. Click on the Plots button.
• Click on *ZRESID and the arrow button to move it into the Y box.
• Click on *ZPRED and the arrow button to move it into the X box.
• In the section headed Standardized Residual Plots, tick the Normal probability plot option.
• Click on Continue.
8. Click on the Save button.
• In the section labelled Distances tick the Mahalanobis box (this will identify multivariate outliers for you) and Cook’s.
• Click on Continue.
9. Click on OK.
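For readers who want to replicate roughly the same analysis outside SPSS, here is a sketch in Python with statsmodels. The column names (tpstress, tmast, tpcoiss) are assumptions about how the survey.sav variables might be labelled, not confirmed names from the file.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_spss("survey.sav")                    # requires the pyreadstat package
data = df[["tpstress", "tmast", "tpcoiss"]].dropna()   # hypothetical column names

X = sm.add_constant(data[["tmast", "tpcoiss"]])    # IVs plus intercept
model = sm.OLS(data["tpstress"], X).fit()          # "Enter": all IVs in one step

print(model.summary())                             # R2, adjusted R2, F, B, t, Sig.
influence = model.get_influence()
print(influence.resid_studentized_internal.max())  # screen standardised residuals
```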
The output generated from this procedure is shown below.
[SPSS output tables, slides 34-38]
Standard Error of the Estimate
The standard error of the estimate is a measure of the accuracy of predictions.

[Scatterplots: Graph A and Graph B] You can see that in Graph A the points are closer to the regression line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than those in Graph B.
INTERPRETATION OF OUTPUT FROM
STANDARD MULTIPLE REGRESSION

 Step 1: Checking the assumptions
 Step 2: Evaluating the model
 Step 3: Evaluating each of the independent variables
STEP 1: CHECKING THE ASSUMPTIONS

Multicollinearity (table labelled Correlations)

 Check that your independent variables show at least some relationship with your dependent variable (above .3 preferably).
 In this case both of the scales (Total Mastery and Total PCOISS) correlate substantially with Total Perceived Stress (–.61 and –.58 respectively).
 Check that the correlation between each pair of your independent variables is not too high (say, .7 or more). If it is, omit one of the variables or form a composite variable from the scores of the two highly correlated variables.
 In the example presented here the correlation is .52, which is less than .7; therefore all variables will be retained.
MULTICOLLINEARITY “CONT.“
 Collinearity diagnostics (table labelled Coefficients)
 These can pick up on problems with multicollinearity that may not be evident in the correlation matrix. You should take them only as a warning sign, and check the correlation matrix.

 Tolerance: an indicator of how much of the variability of the specified independent variable is not explained by the other independent variables in the model; it is calculated using the formula 1 – R2 for each variable. If this value is very small (less than .10), it indicates that the multiple correlation with the other variables is high, suggesting the possibility of multicollinearity.
 VIF (Variance Inflation Factor): the inverse of the Tolerance value (1 divided by Tolerance). VIF values above 10 would be a concern here, indicating multicollinearity.

 In this example the tolerance value is .729, which is not less than .10, and the VIF value is 1.372, which is well below the cut-off of 10; therefore, we have not violated the multicollinearity assumption.
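Outside SPSS, comparable tolerance and VIF figures can be obtained with statsmodels; the sketch below uses invented predictor data purely for illustration (variance_inflation_factor is an existing statsmodels function).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = 0.5 * x1 + rng.normal(size=300)            # moderately correlated with x1

X = sm.add_constant(np.column_stack([x1, x2]))  # constant + two IVs

# VIF for each IV (skip column 0, the constant); Tolerance = 1 / VIF
for i, name in enumerate(["x1", "x2"], start=1):
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.3f}, Tolerance = {1 / vif:.3f}")
```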
OUTLIERS, NORMALITY, LINEARITY,
HOMOSCEDASTICITY, INDEPENDENCE OF RESIDUALS.
 Check these by inspecting the residuals scatterplot and the Normal Probability Plot of the regression standardised residuals.
 In the Normal Probability Plot, a straight diagonal line from bottom left to top right suggests no major deviations from normality.
 In the residuals scatterplot, the residuals should be roughly rectangularly distributed, with most of the scores concentrated in the centre (along the 0 point).
 What you don’t want to see is a clear or systematic pattern to your residuals (e.g. curvilinear, or higher on one side than the other). Deviations from a centralised rectangle suggest some violation of the assumptions.
OUTLIERS

 The presence of outliers can also be detected from the scatterplot.
 Outliers can be defined as cases that have a standardised residual (as displayed in the scatterplot) of more than 3.3 or less than –3.3.
 With large samples, it is common to find a number of outlying residuals. If you find only a few, it may not be necessary to take any action.
 Outliers can also be checked by inspecting the Mahalanobis distances.
MAHALANOBIS
 To identify which cases are outliers you will need to determine the critical chi-square value, using the number of independent variables as the degrees of freedom.
 In this example there are two independent variables; therefore the critical value is 13.82.
 To identify outliers on Mah_1, use Descriptives, Explore, request Outliers from the list of Statistics, and ask for the program to label the cases by ID.
 The five highest values will be displayed; check that none of them exceeds the critical value obtained from the chi-square table. If you have a large sample, don’t worry about a small number of outliers.
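As a sketch of the same check done by hand (illustrative random data; the .001 criterion and df = number of IVs follow the slide above):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))             # two IVs for 200 hypothetical cases

# Squared Mahalanobis distance of each case from the centroid of the IVs
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

critical = chi2.ppf(1 - 0.001, df=X.shape[1])   # ~13.82 for 2 IVs at alpha = .001
print(round(float(critical), 2), (d2 > critical).sum(), "multivariate outlier(s)")
```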
CASEWISE DIAGNOSTICS
 This presents information about cases that have standardised residual values above 3.0 or below –3.0.
 In a normally distributed sample we would expect only 1 per cent of cases to fall outside this range.
 In this sample we have found one case (case number 152) with a residual value of –3.475.
 To check whether this strange case is having any undue influence on the results for our model as a whole, we can check the value for Cook’s Distance given towards the bottom of the Residuals Statistics table. Cases with values larger than 1 are a potential problem. In our example the maximum value for Cook’s Distance is .09.
STEP 2: EVALUATING THE MODEL
LOOK IN THE “MODEL SUMMARY BOX”
 R Square gives some information about the goodness of fit of a model. In regression, the R2 coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R2 of 1.0 indicates that the regression line perfectly fits the data.
 In this case the value is .468. Expressed as a percentage, this means that our model explains 46.8 per cent of the variance in perceived stress. This is quite a respectable result, particularly when you compare it to some of the results reported in the journals!
STEP 2: EVALUATING THE MODEL “CONT.“
 Adjusted R Square is a modification of R2 that adjusts for the number of explanatory terms in a model.
 Unlike R2, the adjusted R2 increases only if the new term improves the model more than would be expected by chance. The adjusted R2 can be negative, and will always be less than or equal to R2.
 Adjusted R2 does not have the same interpretation as R2. When a small sample is involved, the R Square value in the sample tends to be a rather optimistic overestimation of the true value in the population. The Adjusted R Square statistic ‘corrects’ this value to provide a better estimate of the true population value. If you have a small sample you may wish to consider reporting this value, rather than the normal R Square value.
ANOVA

 Look in the table labelled ANOVA.
 This tests the null hypothesis that multiple R in the population equals 0.
 The model in this example reaches statistical significance (Sig. = .000, which really means p < .0005).
STEP 3: EVALUATING EACH OF THE
INDEPENDENT VARIABLES

 We want to know which of the variables included in the model contributed to the prediction of the dependent variable.
 We find this information in the output box labelled Coefficients. Look in the column labelled Beta under Standardized Coefficients. To compare the different variables it is important that you look at the standardised coefficients, not the unstandardised ones.
 If you were interested in constructing a regression equation, you would use the unstandardised coefficient values listed as B.
BETA
 Look down the Beta column and find which beta value is the largest (ignoring any negative signs out the front).
 The largest beta coefficient means that this variable makes the strongest unique contribution to explaining the dependent variable, when the variance explained by all other variables in the model is controlled for.
 For each of these variables, check the value in the column marked Sig. This tells you whether this variable is making a statistically significant unique contribution to the equation.
 If the Sig. value is less than .05 (.01, .0001, etc.), then the variable is making a significant unique contribution to the prediction of the dependent variable.
PART CORRELATION COEFFICIENTS (SEMI-
PARTIAL CORRELATION COEFFICIENTS)

 If you square this value (whichever name it is given) you get an indication of the contribution of that variable to the total R squared. In other words, it tells you how much of the total variance in the dependent variable is uniquely explained by that variable, and how much R squared would drop if it wasn’t included in your model.
 Example: suppose a variable has a part correlation coefficient of –.36. If we square this (multiply it by itself) we get .13, indicating that this variable uniquely explains 13 per cent of the variance in the dependent variable.
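A rough way to see the "drop in R2" interpretation with invented data: the squared part (semipartial) correlation for a predictor equals the fall in R2 when that predictor is removed from the model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
y = 1 + 0.4 * x1 + 0.6 * x2 + rng.normal(size=500)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
without_x1 = sm.OLS(y, sm.add_constant(x2)).fit()

# Squared semipartial (part) correlation of x1 = drop in R2 when x1 is removed
sr2_x1 = full.rsquared - without_x1.rsquared
print(round(sr2_x1, 3))
```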
CHECKLIST FOR STANDARD MULTIPLE REGRESSION
1. Issues
a. Ratio of cases to IVs and missing data
b. Normality, linearity, and homoscedasticity of residuals
c. Outliers
d. Multicollinearity and singularity
e. Outliers in the solution
2. Major analyses
a. Multiple R2 and its confidence limits, F ratio
b. Adjusted multiple R2, overall proportion of variance accounted for
c. Significance of regression coefficients
d. Squared semipartial correlations
3. Additional analyses
a. Post hoc significance of correlations
b. Unstandardized (B) weights, confidence limits
c. Standardized (β) weights
d. Unique versus shared variability
e. Suppressor variables
f. Prediction equation
RESULTS
 A standard multiple regression was performed between number of visits to health professionals as the dependent variable and physical health, mental health, and stress as independent variables. Analysis was performed using SPSS REGRESSION and SPSS EXPLORE for evaluation of assumptions.
 Results of evaluation of assumptions led to transformation of
the variables to reduce skewness, reduce the number of outliers,
and improve the normality, linearity, and homoscedasticity of
residuals. A square root transformation was used on the
measure of stress. Logarithmic transformations were used on
number of visits to health professionals and on physical health.
One IV, mental health, was positively skewed without
transformation and negatively skewed with it; it was not
transformed. With the use of a p < .001 criterion for
Mahalanobis distance no outliers among the cases were found.
No cases had missing data and no suppressor variables were
found, N = 465.
 Table …… displays the correlations between the variables, the unstandardized regression coefficients (B) and intercept, the standardized regression coefficients (β), the semipartial correlations (sri2), R2, and adjusted R2. R for regression was significantly different from zero, F(3, 461) = 92.90, p < .001, with R2 at .38 and 95% confidence limits from .30 to .44. The adjusted R2 value of .37 indicates that more than a third of the variability in visits to health professionals is predicted by number of physical health symptoms, stress, and mental health symptoms. For the two regression coefficients that differed significantly from zero, 95% confidence limits were calculated. The confidence limits for (square root of) stress were 0.0091 to 0.0223, and those for (log of) physical health were 0.8686 to 1.2113.
 The three IVs in combination contributed another .15 in
shared variability. Altogether, 38% (37% adjusted) of the
variability in visits to health professionals was predicted by
knowing scores on these three IVs. The size and direction of
the relationships suggest that more visits to health
professionals are made among women with a large number
of physical health symptoms and higher stress. Between
those two, however, number of physical health symptoms is
much more important, as indicated by the squared semi-
partial correlations.
 Although the bivariate correlation between (log of) visits to health professionals and mental health was statistically different from zero using a post hoc correction, r = .36, F(3, 461) = 22.16, p < .01, mental health did not contribute significantly to regression. Apparently, the relationship between the number of visits to health professionals and mental health is mediated by the relationships between physical health, stress, and visits to health professionals.
TYPES OF MLR

• Standard or direct (simultaneous)
• Hierarchical or sequential
• Stepwise (forward & backward)
DIRECT OR STANDARD

• All predictor variables are entered together (simultaneously)
• Allows assessment of the relationship between all predictor variables and the criterion (Y) variable, if there is good theoretical reason for doing so
• Manual technique & commonly used
HIERARCHICAL (SEQUENTIAL)

• IVs are entered in blocks or stages.
– The researcher defines the order of entry for the variables, based on theory.
– May enter ‘nuisance’ variables first to ‘control’ for them, then test the ‘purer’ effect of the next block of important variables.
• R2 change = additional variance in Y explained at each stage of the regression.
– F test of R2 change.
STEPWISE

• Combines forward & backward.
• At each step, variables may be entered or removed if they meet certain criteria.
• Useful for developing the best prediction equation from the smallest number of variables.
• Redundant predictors are removed.
FORWARD SELECTION

• The strongest predictor variables are entered, one by one, if they meet a criterion (e.g., p < .05)
• Best predictor = the IV with the highest r with Y

BACKWARD ELIMINATION

• All predictor variables are entered, then the weakest predictors are removed, one by one, if they meet a criterion (e.g., p > .05)
• Worst predictor = the IV with the lowest r with Y
WHICH METHOD?

• Standard: to assess the impact of all IVs simultaneously
• Hierarchical: to test specific hypotheses derived from theory
• Stepwise: if the goal is accurate statistical prediction – computer driven
LOGISTIC REGRESSION
 There are many research situations, however, when the dependent variable of interest is categorical (e.g. win/lose; fail/pass; dead/alive).
 Logistic regression allows you to test models to predict categorical outcomes with two or more categories. Your predictor (independent) variables can be either categorical or continuous, or a mix of both in the one model.
 There is a family of logistic regression techniques available in SPSS that will allow you to explore the predictive ability of sets or blocks of variables, and to specify the entry of variables.
TECHNIQUES OF LOGISTIC REGRESSION
 The Forced Entry Method is the default procedure available in SPSS. In this approach all predictor variables are tested in one block to assess their predictive ability, while controlling for the effects of the other predictors in the model.
 The stepwise procedures (e.g. forward and backward) allow you to specify a large group of potential predictors from which SPSS can pick a subset that provides the best predictive power.
 These stepwise procedures have been criticised (in both logistic and multiple regression) because they can be heavily influenced by random variation in the data, with variables being included or removed from the model on purely statistical grounds.
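A minimal sketch of binary logistic regression in Python (not SPSS); the outcome, predictor and coefficients here are invented purely for illustration of a forced-entry fit.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
hours_studied = rng.uniform(0, 20, size=300)              # continuous predictor
p_pass = 1 / (1 + np.exp(-(-3 + 0.4 * hours_studied)))    # true pass probability
passed = rng.binomial(1, p_pass)                          # categorical DV: fail/pass

X = sm.add_constant(hours_studied)
logit_model = sm.Logit(passed, X).fit(disp=0)   # forced entry: all predictors in one block
print(logit_model.params)                       # log-odds coefficients
print(np.exp(logit_model.params))               # odds ratios
```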
THE END

 There will be “A MIDTERM EXAM”
 THE EXAM’S TOPICS ARE r, LR and MLR

ENRICHMENT
MULTIPLE REGRESSION USING SPSS
Analyze → Regression → Linear

[SPSS screenshot, slide 70]
MULTIPLE REGRESSION: SPSS OUTPUT

Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | Lectures attended, Number of books read(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: Grade achieved
MULTIPLE REGRESSION: SPSS OUTPUT

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .605(a) | .367 | .336 | 13.711

a. Predictors: (Constant), Lectures attended, Number of books read
MULTIPLE REGRESSION: SPSS OUTPUT
ANOVA(b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 4569.053 | 2 | 2284.526 | 12.153 | .000(a)
Residual | 7895.258 | 42 | 187.982 | |
Total | 12464.311 | 44 | | |

a. Predictors: (Constant), Lectures attended, Number of books read
b. Dependent Variable: Grade achieved

For the overall model: F(2, 42) = 12.153, p < .001
MULTIPLE REGRESSION: SPSS OUTPUT
Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | 39.173 | 6.625 | | 5.913 | .000
Number of books read | 3.832 | 1.712 | .331 | 2.238 | .031
Lectures attended | 1.290 | .536 | .356 | 2.407 | .021

a. Dependent Variable: Grade achieved

Number of books read is a significant predictor: β = .33, t(42) = 2.24, p < .05
Lectures attended is a significant predictor: β = .36, t(42) = 2.41, p < .05
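Using the unstandardised B values from the Coefficients table above, the prediction equation can be applied directly; the student's scores below are hypothetical.

```python
# Grade = 39.173 + 3.832 * (books read) + 1.290 * (lectures attended)
def predicted_grade(books: float, lectures: float) -> float:
    return 39.173 + 3.832 * books + 1.290 * lectures

print(predicted_grade(books=4, lectures=15))   # hypothetical student: about 73.85
```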
MAJOR TYPES OF MULTIPLE REGRESSION
 There are different types of multiple regression:
 Theory-based model building:
 Standard multiple regression (Enter)
 Hierarchical multiple regression (Block entry)
 Statistical model building:
 Sequential multiple regression (Forward, Backward, Stepwise)
STANDARD MULTIPLE REGRESSION
 Most common method. All the predictor variables are entered into the analysis simultaneously (i.e., Enter).

 Used to examine how much:
 an outcome variable is explained by a set of predictor variables as a group
 variance in the outcome variable is explained by a single predictor (unique contribution)
EXAMPLE
 The different methods of regression and their associated outputs will be illustrated using:
 Outcome variable
 Essay mark
 Predictor variables
 Number of lectures attended (out of 20)
 Motivation of the student (on a scale from 0 – 100)
 Number of course books read (from 0 – 10)

[Diagram: Motivation, Attendance at lectures, Books read → Exam Performance (Grade)]
ENTER OUTPUT
Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | books, lectures, motivation(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: essay
ENTER OUTPUT

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .918(a) | .842 | .812 | 6.84522

a. Predictors: (Constant), books, lectures, motivation

R Square = proportion of variance in the outcome accounted for by the predictor variables
Adjusted R Square = takes into account the sample size and the number of predictor variables
ENTER OUTPUT

ANOVA(b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 95293.006 | 3 | 31764.335 | 17.030 | .000(a)
Residual | 382376.0 | 205 | 1865.249 | |
Total | 477669.0 | 208 | | |

a. Predictors: (Constant), Gender identification, Negative impressions males hold about females, Positive impressions males hold about females
b. Dependent Variable: Negative impression about males
ENTER OUTPUT
Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | 19.738 | 5.399 | | 3.656 | .002
lectures | 1.217 | .469 | .490 | 2.595 | .020
motivation | .352 | .144 | .466 | 2.450 | .026
books | .509 | .504 | .103 | 1.010 | .327

a. Dependent Variable: essay

Beta = standardised regression coefficient; it shows the degree to which the predictor variable predicts the outcome variable with all other variables held constant
HIERARCHICAL MULTIPLE REGRESSION
 aka sequential regression

 Predictor variables are entered in a prearranged order of steps (i.e., block entry)

 Can examine how much variance is accounted for by a predictor when others are already in the model

[SPSS screenshots, slides 83-84]
Don’t forget to choose the R Square change option from the Statistics menu.
BLOCK ENTRY OUTPUT
Variables Entered/Removed(b)

Model | Variables Entered | Variables Removed | Method
1 | lectures(a) | . | Enter
2 | books, motivation(a) | . | Enter

a. All requested variables entered.
b. Dependent Variable: essay
BLOCK ENTRY OUTPUT
Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .884(a) | .781 | .768 | 7.60374
2 | .918(b) | .842 | .812 | 6.84522

a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation

Change Statistics

Model | R Square Change | F Change | df1 | df2 | Sig. F Change
1 | .781 | 64.069 | 1 | 18 | .000
2 | .061 | 3.105 | 2 | 16 | .073

NB – in the actual output these change statistics appear on the same long line as the Model Summary table.
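The F Change statistic above can be approximated from the two models' R2 values; a sketch using the figures from this output (differences from SPSS's 3.105 are due to rounding of the reported R2 values):

```python
def f_change(r2_full: float, r2_reduced: float, df1: int, df2: int) -> float:
    """F for the R2 change when df1 predictors are added:
    F = ((R2_full - R2_reduced) / df1) / ((1 - R2_full) / df2)."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)

# Block 2 above: R2 rises from .781 to .842 when 2 IVs are added, with df2 = 16
print(round(f_change(0.842, 0.781, df1=2, df2=16), 3))
```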
BLOCK ENTRY OUTPUT
ANOVA(c)

Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 3704.295 | 1 | 3704.295 | 64.069 | .000(a)
  Residual | 1040.705 | 18 | 57.817 | |
  Total | 4745.000 | 19 | | |
2 Regression | 3995.288 | 3 | 1331.763 | 28.422 | .000(b)
  Residual | 749.712 | 16 | 46.857 | |
  Total | 4745.000 | 19 | | |

a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation
c. Dependent Variable: essay
BLOCK ENTRY OUTPUT
Coefficients(a)

Model | B | Std. Error | Beta | t | Sig.
1 (Constant) | 30.311 | 3.042 | | 9.965 | .000
  lectures | 2.194 | .274 | .884 | 8.004 | .000
2 (Constant) | 19.738 | 5.399 | | 3.656 | .002
  lectures | 1.217 | .469 | .490 | 2.595 | .020
  motivation | .352 | .144 | .466 | 2.450 | .026
  books | .509 | .504 | .103 | 1.010 | .327

a. Dependent Variable: essay
STATISTICAL MULTIPLE REGRESSION
 aka sequential techniques

 Relies on SPSS selecting which predictor variables to include in the model

 Three types:
 Forward selection
 Backward selection
 Stepwise selection
 Forward Starts with no variables in model, tries
them all, includes best predictor, repeats

 Backward Starts with ALL variable, removes


lowest contributor, repeats

 Stepwise Combination. Starts as Forward, checks


that all variables are making contribution after each
iteration (like Backward)

91
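A toy sketch of forward selection in Python: a simplified, p-value-based loop over statsmodels fits. Real stepwise implementations (including SPSS's) differ in their entry/removal criteria, so treat this only as an illustration of the idea.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X: pd.DataFrame, alpha: float = 0.05) -> list:
    """Greedy forward selection: at each step add the predictor with the
    smallest p-value, as long as that p-value is below alpha."""
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        if not remaining:
            break
        pvals = {}
        for col in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            pvals[col] = model.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break                 # no remaining predictor meets the criterion
        selected.append(best)
    return selected

# Illustrative data: only x1 and x2 truly predict y
rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 0.6 * X["x1"] + 0.4 * X["x2"] + rng.normal(size=200)
print(forward_select(y, X))   # typically ['x1', 'x2']
```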
SUMMARY OF MODEL SELECTION
TECHNIQUES

 Theory based
 Enter – all predictors entered together (standard)
 Block entry – predictors entered in groups (hierarchical)

 Statistical based
 Forward – variables are entered into the model based on their statistical significance
 Backward – variables are removed from the model based on their statistical significance
 Stepwise – variables are moved in and out of the model based on their statistical significance
ASSUMPTIONS OF REGRESSION
 Linearity
 The relationship between the dependent variable and the predictors must be linear
 Check: violations are assessed using a scatterplot
 Independence
 Values on the outcome variable must be independent
 i.e., each value comes from a different participant
 Homoscedasticity
 At each level of the predictor variable the variance of the residual terms should be equal (i.e. all data points should be about as close to the line of best fit)
 Can indicate whether all data are drawn from the same sample
 Normality
 Residuals/errors should be normally distributed
 Check: violations using histograms (e.g., outliers)
 Multicollinearity
 Predictor variables should not be highly correlated
OTHER IMPORTANT ISSUES
 Regression as presented here is for continuous/interval predictors, or categorical predictors with ONLY two categories
 More than two categories are possible (dummy coding)
 The outcome must be continuous/interval

 Sample size
 Multiple regression needs a relatively large sample size
 some authors suggest using between 10 and 20 participants per predictor variable
 others argue there should be 50 cases more than the number of predictors
 to be sure that one is not capitalising on chance effects
OUTCOMES
 So – what is regression?

 This lecture has:
 introduced the different types of regression
 detailed how to conduct and interpret regression using SPSS
 described the underlying assumptions of regression
 outlined the data types and sample sizes needed for regression
 outlined the major limitation of a regression analysis
HIERARCHICAL MULTIPLE REGRESSION

 Using hierarchical multiple regression (also referred to as sequential regression) means that we will be entering our variables in steps or blocks in a predetermined order (not letting the computer decide, as would be the case for stepwise regression).
 In the first block we will ‘force’ age and socially desirable responding into the analysis. This has the effect of statistically controlling for these variables.
 In the second step we enter the other independent variables into the ‘equation’ as a block, just as we did in the previous example.
 The difference this time is that the possible effect of the variables in the first block has been ‘removed’, and we can then see whether our block of independent variables is still able to explain some of the remaining variance in our dependent variable.
PROCEDURE FOR HIERARCHICAL MULTIPLE
REGRESSION
1. From the menu at the top of the screen click on: Analyze, then click on Regression, then on Linear.
2. Choose your continuous dependent variable and move it into the Dependent box.
3. Move the variables you wish to control for into the Independent box. This will be the first block of variables to be entered in the analysis (Block 1 of 1).
4. Click on the button marked Next. This will give you a second independent variables box to enter your second block of variables into (you should see Block 2 of 2).
5. Choose your next block of independent variables.
6. In the Method box make sure that this is set to the default (Enter). This will give you standard multiple regression for each block of variables entered.
7. Click on the Statistics button. Tick the boxes marked Estimates, Model fit, R squared change, Descriptives, Part and partial correlations and Collinearity diagnostics. Click on Continue.
8. Click on the Options button. In the Missing Values section click on Exclude cases pairwise.
9. Click on the Save button. Click on Mahalanobis and Cook’s. Click on Continue and then OK.

Some of the output generated from this procedure is shown below.
[SPSS output, slides 100-101]
INTERPRETATION OF OUTPUT
 The output generated from this analysis is similar to the previous output, but with some extra pieces of information.
 In the Model Summary box there are two models listed. Model 1 refers to the first block of variables that were entered, while Model 2 includes all the variables that were entered in both blocks.
STEP 1: EVALUATING THE MODEL
Check the R Square values in the first Model Summary box.

 After the variables in Block 1 have been entered, the overall model explains 5.7 per cent of the variance (.057 × 100).
 After the Block 2 variables have also been included, the model as a whole explains 47.4 per cent (.474 × 100).
 It is important to note that this second R Square value includes all the variables from both blocks, not just those included in the second step.
Look in the column labelled R Square Change.
 This will help you to find out how much of the overall variance is explained by our variables of interest after the effects of the first block of variables are removed.
 In the output presented above you will see, on the line marked Model 2, that the R Square Change value is .417. This means that the second block of variables explains an additional 41.7 per cent (.417 × 100) of the variance in the dependent variable, even when the effects of the first block of variables are statistically controlled for. This is a statistically significant contribution, as indicated by the Sig. F Change value for this line (.000).
 The ANOVA table indicates that the model as a whole (which includes both blocks of variables) is significant [F(4, 421) = 94.78, p < .0005].
STEP 2: EVALUATING EACH OF THE
INDEPENDENT VARIABLES
 To find out how well each of the variables contributes to the equation we need to look in the Coefficients table.
 Always look in the Model 2 row. This summarises the results with all the variables entered into the equation. Scanning the Sig. column, there are only two variables that make a statistically significant contribution (less than .05).
 Remember, these beta values represent the unique contribution of each variable, when the overlapping effects of all other variables are statistically removed. In different equations, with a different set of independent variables, or with a different sample, these values would change.
