In this chapter: introduction to the Kendall tau-b correlation; introduction to scatterplots; partial correlation; linear regression; logistic regression.

Sometimes we wish to collect a score on two variables from a set of participants to see whether there is a relationship between the variables. For example, we might measure a person’s experience in a job (number of weeks) and the number of items they produce each day to ask if there is a relationship between experience and productivity. Where we have two variables like this, produced by the same or related participants, we are able to examine the association between the variables by a correlation. A correlation is performed to test the degree to which the scores on the two variables co-relate – that is, the extent to which the variation in the scores on one variable results in a corresponding variation in the scores on the second variable.
We will only be considering linear correlations in this book. The simplest relationship between two variables is a linear relationship, and a linear relationship is the underlying model we propose for our data. The reasons for this are explained in an earlier chapter (see Chapter 7). In this case, with two variables, we are arguing that if they are correlated, then if we plot the points on a graph they will follow a straight line. We refer to this line as the regression line. However, we are unlikely to find that our data points lie exactly on a straight line. We explain this by claiming that it arises from random factors referred to as ‘error’. Given that the equation of a straight line is defined mathematically as Y = a + bX, where X and Y are the variables, with ‘a’ the intercept (or ‘constant’) and ‘b’ the slope of the line, our observed values are defined as follows:

Y = a + bX + error
We can work out the regression line by finding the equation that gives us the smallest
amount of error.
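For reference, ‘the smallest amount of error’ here means the least squares solution, a standard statistical result: the regression line minimises the sum of the squared errors, and the slope and intercept that achieve this are

b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}

where \bar{X} and \bar{Y} denote the means of the two variables. SPSS calculates these values for us in the linear regression procedure later in this chapter.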
A strong correlation indicates that there is only a small amount of error and
most of the points lie close to the regression line; a weak correlation indicates that
there is a lot of error and the points are more scattered. In the second case we are
likely to conclude that a linear relationship is not a good model for our data.
High values of one variable associated with high values of the second variable
indicate that the correlation is positive. For example, we might find a positive
correlation between height and foot size, with taller people having larger feet and
shorter people having smaller feet. When high values of the first variable are associ-
ated with low values of the second variable then we refer to this as a negative
correlation. So, for a car travelling at a constant speed along a track, we will find
that the distance travelled is negatively correlated with the amount of petrol left in
the tank.
The statistical measures of correlation in this chapter (Pearson, Spearman,
Kendall tau-b) all produce a statistic that ranges from –1, indicating a perfect negative
correlation, to +1, indicating a perfect positive correlation. A value of zero indicates
no correlation at all.
The important point to remember is that we are considering a linear correlation here, so our key assumption is that the points follow a straight line if they are correlated. If we believe the relationship between the variables is not linear, then we do not use the Pearson statistic but instead use the Spearman or Kendall tau-b statistics, explained later in the chapter.
Scenario
A researcher postulated that students who spent the most time studying would
achieve the highest marks in science examinations, whereas those who did the least
studying would achieve lower marks. The researcher noted the results of ten first-
year university students, showing how much time (in hours) they spent studying
(on average per week throughout the year) along with their end-of-year examination
marks (out of 100).
As our prediction states that we are expecting a positive correlation, this indicates a direction. Our prediction is therefore one-tailed. If we did not know in which direction the relationship between the two variables would go, we would have a two-tailed prediction, and we would be looking for either a positive or a negative correlation.
Data entry
Enter the dataset as shown in the example.

Remember when entering data without decimal places to change the decimal places to zero in the Variable View. See Chapter 2 for the full data entry procedure.
• Select the Analyze drop-down menu, then Correlate and Bivariate. In the Bivariate Correlations window, send both variables across to the Variables box.
• You can see that SPSS selects the Pearson coefficient as a default.
• Select whether your prediction is One-tailed or Two-tailed. Ours is one-tailed as we stated there would be a positive correlation.
• Click on OK.
• The Flag significant box is selected as default. Significant correlations are highlighted
underneath the output table with a * for a significance of p < .05 and ** for p < .01.
• If you require the means and standard deviations for each variable, click on the
Options button and tick means and standard deviations, then Continue and OK.
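If you prefer syntax to the menus, a minimal sketch of the equivalent command is given below; the variable names StudyTime and ScienceExam are assumed here, so substitute the names in your own data file.

CORRELATIONS
  /VARIABLES=StudyTime ScienceExam
  /PRINT=ONETAIL NOSIG
  /MISSING=PAIRWISE.

This mirrors the dialog choices above: a one-tailed Pearson correlation with significant correlations flagged.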
SPSS output
SPSS produces one output table, the Correlations table, unless descriptive statistics
have been selected.
SPSS essential

The appropriate graph to support a correlation is a scatterplot – see later in this chapter for more details.

• The Pearson Correlation test statistic = .721. SPSS indicates with ** that it is significant at the .01 level for a one-tailed prediction. The actual p value is shown to be .009. These figures are duplicated in the matrix.
• A conventional way of reporting these figures would be as follows:

r = .72, N = 10, p < .01
• These results indicate that as study time increases, science exam performance also increases, which is a positive correlation.
• As the r value reported is positive and p < .01, we can state that we have a positive correlation between our two variables and our null hypothesis can be rejected. If the r value was negative this would indicate a negative correlation, and would be counter to our hypothesis.
• The Pearson Correlation output matrix also shows the r value when ‘Study time’ is correlated with itself, and there is a perfect correlation coefficient of 1.000. Similarly, ‘Science Exam’ has a perfect correlation with itself, r = 1.000. These values are therefore not required.
Scenario
Two teachers were asked to rate the same eight teenagers on the variable ‘how well
they are likely to do academically at university’ on a 0–20 scale, from unlikely (0)
to highly likely (20). It was thought that there would be a significant correlation between the teachers’ ratings.
As our prediction does not state whether we expect a positive or negative correlation, we
have a two-tailed prediction. If we predicted that our correlation would be either positive
or negative, then we would have a one-tailed prediction.
Data entry
Enter the dataset as shown in the example.
Remember when entering data without decimal places to change the decimal places to zero in the Variable View. See Chapter 2 for the full data entry procedure.
The Flag significant correlations box is selected as default. Significant correlations are highlighted underneath the output table with a * for a significance of p < .05 and ** for p < .01.

• Highlight both teacher variables and send them to the Variables box.
• As SPSS selects the Pearson correlation coefficient as a default, deselect that box and put a tick in the Spearman box.
• Select whether your prediction is One-tailed or Two-tailed.
• Click on OK.
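Again, for syntax users, a rough equivalent of these dialog choices is sketched below (the variable names Teacher1 and Teacher2 are our assumption):

NONPAR CORR
  /VARIABLES=Teacher1 Teacher2
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.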
SPSS output
In order to check if there is a significant correlation between the two teachers’ ratings
the Correlations table must be observed.
SPSS essential
• Spearman’s rho correlation test statistic = .833. This shows a positive correlation between the two teachers’ ratings. SPSS also illustrates with * that it is significant at the .05 level for a two-tailed prediction. The actual p value is shown to be .010. (By double clicking on the figure of .010 in the output table the value appears to six decimal places, .010176, showing that it is just over .01.) These figures are duplicated in the matrix.
• By observing the Spearman correlation output matrix it can be seen that teacher 1 is (of course) perfectly correlated with teacher 1, hence the Spearman’s rho correlation coefficient of 1.000. Similarly, teacher 2 is perfectly correlated with teacher 2, with a Spearman’s rho correlation coefficient of 1.000.
• A conventional way of reporting the correlation between the two teachers is as follows:

rs = .83, N = 8, p < .05

• These results indicate that as one teacher’s ratings increase the other teacher’s ratings increase as well. Therefore, each teacher’s ratings of the teenagers’ expected academic performance are similar, with a student rated highly by one teacher rated highly by the other as well.
While a scatterplot is generally the most appropriate illustrative statistic to support a correlation, when conducting a Spearman’s test it should be used with caution. The Spearman’s rho correlation coefficient is produced by using the rank of the scores rather than the actual raw data, whereas the scatterplot displays the raw scores. This means that the test is predicting that the scores are monotonically related – that they are increasing together – and not that they lie along a straight line. The procedure for a scatterplot is at the end of this section.
Scenario
A consumer testing company wanted to pilot a new breakfast cereal. They decided to ask people if the new brand was as tasty as a current leading brand. They gave twenty people the current leading brand (brand A) and also gave them the new breakfast cereal (brand B). The order in which the participants tasted each brand differed to counterbalance order effects. Each participant was then asked to rate their enjoyment of the cereal on a 1–10 scale (1 = they didn’t enjoy it, 10 = it was very tasty).
• Remember when entering data without decimal places to change the decimal places to zero in the Variable View.
• As we are worried about the number of tied ranks in our dataset, we are going to carry out a Kendall tau-b rather than a Spearman correlation. See Chapter 2 for the full data entry procedure.
The Flag significant correlations box is selected as default. Significant correlations are highlighted underneath the output table with a * for a significance of p < .05 and ** for p < .01.
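The dialog choices are the same as for the Spearman correlation, except that a tick is put in the Kendall’s tau-b box instead. A sketch of the equivalent syntax, assuming the variables are named BrandA and BrandB, is:

NONPAR CORR
  /VARIABLES=BrandA BrandB
  /PRINT=KENDALL ONETAIL NOSIG
  /MISSING=PAIRWISE.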
SPSS output
The Kendall tau-b output is displayed in the Correlations table.
SPSS essential
• The Kendall tau-b correlation output matrix shows a correlation coefficient of .397. As this value is a positive number it shows that our data is positively correlated. SPSS also indicates with * that it is significant at the .05 level for a one-tailed prediction. The actual p value is shown to be .017.
• A conventional way of reporting these figures is as follows:

Kendall tau-b = .40, N = 20, p < .05

• These results indicate that as the ratings for breakfast cereal brand A increase, so do the ratings for breakfast cereal brand B. Therefore, as there are similar ratings of enjoyment for both cereals, the consumer testing company are happy to recommend the tasty new breakfast cereal.
INTRODUCTION TO SCATTERPLOTS

While a scatterplot is generally the most appropriate illustrative statistic to support a correlation, when conducting a Kendall tau-b test it should be used with caution. The Kendall tau-b correlation coefficient is produced by using the rank of the scores rather than the actual raw data, whereas the scatterplot displays the raw scores. The procedure for producing a scatterplot is shown next.

A scatterplot or scattergram illustrates the scores or data that we wish to correlate, where the axes are the two variables. If the scores on one variable increase and so do the scores on the second variable, this is known as a positive correlation. If scores on one variable increase while the scores on the other variable decrease, this is known as a negative correlation. When the points are randomly scattered there is generally no correlation between the two variables. Although a scatterplot is recommended as an illustration supporting a correlation, it must be used with caution in conjunction with Spearman and Kendall tau-b correlations because these nonparametric analyses use the rank scores rather than the actual raw data, whereas the scatterplot displays the raw scores.

When producing a scatterplot you can ask SPSS to produce the regression line – the line of best fit. This line minimises the squared vertical distances of the points from the straight line.

Scatterplot procedure through Chart Builder

The procedure for creating scatterplots through the Chart Builder command is similar to other interactive charts and graphs that were produced in an earlier chapter (see Chapter 4). We shall use the data of the Pearson correlation example.
• Select the Graphs drop-down menu and select Chart Builder. A Chart Builder window appears, giving you the option to set the measurement level of your variables. Normally, you have already set the measurement level.
• Double click on the type of scatterplot you require. We are using the simple one on the top left of the eight choices. Alternatively, you can drag the simple scatterplot icon into the preview pane.
• The Chart preview will show the type of scatterplot you wish to produce.
• The Element Properties window also appears. We are not changing the element properties.
• Press OK.
• Although the positive linear relationship can be seen from the chart, adding a regression line will enable a more accurate judgement to be made.
• To insert the regression line, double click inside the scatterplot output and the SPSS Chart Editor window appears.
• Select the Elements drop-down menu and click Fit Line at Total. The Properties window appears. Check that the Linear radio button is selected and then click Close. Close the Chart Editor.
• Your scatterplot now displays a regression line showing the positive correlation between study time and science exam marks.
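Chart Builder pastes rather verbose GGRAPH syntax, but the older GRAPH command produces the same simple scatterplot in one step. A sketch, with our assumed variable names, is:

GRAPH
  /SCATTERPLOT(BIVAR)=StudyTime WITH ScienceExam.

The regression line is still added afterwards through the Chart Editor, as described above.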
PARTIAL CORRELATION
Previously we have analysed some example data to show a significant correlation
between study time and science examination performance. However, we might decide
that a third variable, ‘intelligence’, could be influencing the correlation. If intelligence
positively correlates with study time – that is, the more intelligent students spend
the most time studying – and if it also positively correlates with examination
performance – that is, the more intelligent students get the higher marks in the
examination – then the correlation of study time and examination performance might
simply be due to the third factor, ‘intelligence’. If this is the case, then the relationship
between study time and examination performance is not genuine, in that the reason
they correlate is because they are both an outcome of ‘intelligence’. That is, the more
intelligent students both study more and get higher marks in the examination. If we
take out the effect of intelligence, the relationship of study time to examination
performance could disappear.
To answer the question of the influence of intelligence on the study time/
examination performance correlation we need to examine the correlation of study
time and examination performance after removing the effects of intelligence. If the
correlation disappears, we will know that it was due to the third factor. To do this
we calculate a partial correlation.
Data entry
Enter the dataset as shown in the example.
We could add labels to our variables as in the earlier example. See Chapter 2 for the full data entry procedure.
partial correlation: The correlation of two variables after having removed the effects of a third variable from both.

• In the Partial Correlations window highlight the two variables that we want to correlate and then send them across to the Variables box.
• The third factor that we want to control for needs to be sent to the Controlling for box.
• Change the Test of Significance to One-tailed. Click OK.
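A sketch of the equivalent syntax, assuming the variables are named StudyTime, ScienceExam and Intelligence, is:

PARTIAL CORR
  /VARIABLES=StudyTime ScienceExam BY Intelligence
  /SIGNIFICANCE=ONETAIL
  /MISSING=LISTWISE.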
SPSS essential
• The correlation test statistic = .665. The p value is shown to be .025. As this value is under .05, there is a significant correlation. These figures are duplicated in the matrix.
• A conventional way of reporting these figures is r = .665, df = 7, p < .05.
• These results indicate that as study time increases science exam performance also increases, when the effects of intelligence have been controlled for. This is a positive correlation. So the relationship between study time and science exam performance is not a result of intelligence.
• The output matrix also shows that study time is perfectly correlated with itself, r = 1.000. Similarly, science exam has a perfect correlation with itself, r = 1.000. These values are therefore not required.
LINEAR REGRESSION

linear regression: A regression that is assumed to follow a linear model. For two variables this is a straight line of best fit, which minimises the ‘error’.

We have previously identified a positive correlation between the two variables ‘study time’ and ‘science exam’ mark. We may wish to investigate this relationship further by examining whether study time reliably predicts the science exam mark. To do this we use a linear regression.

Linear regression test procedure
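The regression is run from the menus via Analyze, Regression and Linear, sending ‘science exam’ to the Dependent box and ‘study time’ to the Independent(s) box. A minimal syntax sketch, with our assumed variable names, is:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT ScienceExam
  /METHOD=ENTER StudyTime.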
SPSS output
The first table reminds us that we are predicting science scores (the dependent
variable) from the study time (the independent variable).
The next table is the Model Summary, which provides us with the correlation
coefficient. We can compare this table with the output from the Pearson correlation
on the same data, shown earlier.
SPSS essential
• The R Square value in the Model Summary table shows the amount of variance in the dependent variable that can be explained by the independent variable.
• In our example the independent variable of study time accounts for 51.9 per cent of the variability in science exam scores.
SPSS advanced
• The R value (.721) indicates that as study time increases the science score also increases, and this is a positive correlation, with r = .721. We know this to be statistically significant from the Pearson correlation output.
• The Adjusted R Square adjusts for a bias in R Square. R Square is sensitive to the number of variables and scores there are, and the Adjusted R Square corrects for this.
• The Std. Error of the Estimate is a measure of the accuracy of the prediction.
The ANOVA summary table that follows shows details of the significance of the
regression.
SPSS essential
• The ANOVA tests the significance of the regression model. In our example, does the independent variable, study time, explain a significant amount of the variance in the dependent variable, science exam result?
• As with any ANOVA, the essential pieces of information needed are the df, the F value and the probability value. We can see from the table that F(1,8) = 8.647, p < .05, and we can therefore conclude that the regression is statistically significant.
Now we have the Coefficients output table, which gives us the regression
equation.
SPSS essential

• The Unstandardized Coefficients B column gives us the value of the intercept (from the Constant row) and the slope of the regression line (from the Study Time row). This gives us the following regression equation:

Science exam score = 34.406 + (.745 × Study time)

• The Standardized Beta Coefficient column informs us of the contribution that an individual variable makes to the model. From the table we can see that study time ‘contributes’ .721 to science exam performance, which is our Pearson’s r value.
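To illustrate how the equation is used for prediction: a student who studies for 20 hours per week would be predicted to score 34.406 + (.745 × 20) = 49.3 (to one decimal place) in the science exam.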
SPSS advanced
• The t value for the Constant (t = 4.539, p < .01) tells us that the intercept is significantly different from zero.
• The t value for study time (t = 2.941, p < .05) shows that the regression is significant.
LOGISTIC REGRESSION
Sometimes we wish to create a regression equation that predicts a binary dependent
variable rather than a continuous dependent variable. So, rather than predicting an
examination mark (as in the example above) we wish to predict whether the value
of a variable will be 0 or 1, or no or yes. A logistic regression aims to see whether
a value of the binary dependent variable can be predicted by the scores of an
independent variable – for example, which factor(s) might significantly predict
whether students will succeed or fail at a specific exam.
With binary dependent variables the regression is not based directly on the
function of the straight line but on the logistic function, which ranges between 0
and 1. If the probability of a ‘yes’ response to a yes/no question in a questionnaire
is p, then the odds of getting a ‘yes’ response is p/(1 – p). The natural logarithm of
these odds is called the logit, ln (p/(1 – p)). It is the logit, rather than Y (as we saw
in the linear regression above) that is predicted to be linearly related to the
independent variable X. We can rearrange the regression equation of logit = a + bX to predict the values of p. We predict that Y = 1 when p ≥ .5 and Y = 0 when p < .5. The point where a + bX = 0 is the point at which the prediction changes from 0 to 1: a 1 (or ‘yes’ value) is predicted for positive values of a + bX and a 0 (or ‘no’ value) for negative values.
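Rearranging logit = a + bX to give p directly produces the logistic function itself, a standard result:

p = \frac{1}{1 + e^{-(a + bX)}}

This is the S-shaped curve, bounded by 0 and 1, on which the regression is based.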
A logistic regression is the appropriate test to use when we have one or more independent variables measured on an interval scale and we are trying to predict group membership on a dependent variable measured as a nominal category.
Scenario
It was noted in a large town that many drivers used their cars to drive to work.
In order to reduce traffic congestion and support sustainable travel methods, twenty
drivers in a commuter car park were asked their distance to work (measured in miles)
and also whether they would use public transport if the price was reduced by
20 per cent (responses ‘yes’ or ‘no’).
Data entry
Enter the dataset as shown in the example. The Switch variable records whether the commuter would change to public transport with the price reduction. A value of 0 is coded for ‘no’ responses and 1 for ‘yes’ responses. See Chapter 2 for the full data entry procedure.
• Binary responses are often recorded as either a 0 or a 1. For a logistic regression you need to code your values as either a 0 or a 1.
• Remember that to see the numerical values instead of the value labels you need to go to the View drop-down menu and deselect Value Labels.
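The analysis is run from the menus via Analyze, Regression and Binary Logistic, with Switch as the dependent variable and Distance as a covariate. A minimal syntax sketch, with our assumed variable names, is:

LOGISTIC REGRESSION VARIABLES Switch
  /METHOD=ENTER Distance
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).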
SPSS output
The first table is the Case Processing Summary, which tells us how many cases are included in the analysis.

• We can see that there were 20 cases included for analysis and there were no Missing Cases.

The Block 0: Beginning Block shows the model with no predictor variables included.

• The rows in the Classification Table display the observed number of 0s and 1s in our dependent variable.
• It shows that a basic model predicting all the results as ‘yes’ responses – without including the ‘Distance’ variable – would give an accuracy of prediction of 60 per cent.
• The Variables in the Equation table shows the significance of the basic model without having included the ‘Distance’ variable.
• These values are not usually reported in reports or academic papers.
• By examining the Variables not in the Equation box we can see that the variable ‘Distance’ (if it had been entered into the equation) would have been a significant predictor of the ‘Switch to public transport’.
SPSS essential
• The first table in the block is the Omnibus Tests of Model Coefficients. This shows how much the current step, block and model predict the dependent variable compared to the basic model in Block 0.

In our example, as we have just one independent variable, all of the values are the same. There is a chi-square of 8.477, df = 1, p = .003. As the chi-square is significant, it indicates that the new model is a better predictor than the basic model in Block 0.
SPSS advanced
• The Model Summary table presents estimations of the amount of variance explained by the logistic regression model.
• The Cox & Snell R Square value and the Nagelkerke R Square value in the Model Summary table show estimates of the amount of variance in the dependent variable that can be explained by the independent variable. The Nagelkerke value is usually the larger of the two and more often reported, indicating that 46.7 per cent of the variation is explained by the model.
SPSS essential
• The Classification Table displays the number of observed cases that are correctly predicted by the model. It also shows the Overall Percentage of the cases that are correctly predicted by the model. We can see that 70 per cent of the cases are correctly predicted by the model. This is a higher value than the previous Classification Table of the basic model in Block 0, showing that the model has more predictive power.
• The Variables in the Equation table shows the output of the model including the predictor variable of ‘Distance’ plus the constant. This table shows the logistic regression model that has been produced for the data.
• The Wald statistic gives the significance of each component of the logistic regression. Distance is significant at p < .05 (Wald = 4.965, df = 1, Sig = .026).
• As we can see, ‘Distance’ has a significant effect on the prediction. The Exp(B) value, as it is larger than 1, indicates that as Distance increases by one mile, the odds that the person will switch to a ‘yes’ response increase.
SPSS advanced
• The B values give the logistic regression coefficients that can be used in the formula to predict the logit values. The logit is 0 at the point where the prediction changes from no to yes, so we can use the B values to find this point. Using the regression equation logit = a + bX, the table gives the slope b = .104 (the coefficient for Distance) and the intercept a = –2.522 (the Constant), so 0 = .104 × X – 2.522 can be solved to give the value of Distance at the cut-off point. So X = 2.522/.104 = 24.25. Therefore, the logistic regression model predicts a ‘yes’ response for commuters who travel over 24.25 miles and a ‘no’ response for those driving less than 24.25 miles.
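Note also that the Exp(B) value is simply e raised to the power of B: with B = .104 for Distance, Exp(B) = e^.104 ≈ 1.11, so each additional mile of commute multiplies the odds of switching to public transport by about 1.11.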
FAQ
I’ve carried out a correlation and my output states the significance for a two-tailed test.
However, my prediction is one-tailed. What have I done wrong?
When carrying out correlations in SPSS you need to specify whether you require the test to be
one- or two-tailed in the Bivariate Correlations window. To obtain a one-tailed significance
value after selecting a two-tailed calculation, divide the p value by 2, or redo the correlation but
this time select the one-tailed option.
FAQ
I have predicted a positive correlation in my study (one-tailed). My result has come out as r = –0.7, which is highly significant (p < .01). Have I found support for my hypothesis?
No, you have predicted a positive correlation, which is that r would be between 0 and +1.
However, your results show a negative correlation with r = – 0.7, so the result has gone in the
opposite direction to that which you predicted.
FAQ
I have got two variables but am unsure whether to calculate a correlation or regression.
This will depend on what exactly you want to find out from your analysis. If you are interested
in the strength of the linear relationship between the two variables, then a correlation will be
the most appropriate. However, if you wish to predict values of one variable by the values of the
other variable, you should be calculating a linear regression.
Further details of linear correlation and linear regression can be found in Hinton (2014), Chapter 20.