
Gignac, G. E. (2023). How2statsbook (Online Edition 2). Perth, Australia: Author.

For data files visit: www.how2statsbook.com

5
Bivariate Correlation

Contents
Pearson Correlation
    Formulation
    Calculating the Statistical Significance of a Correlation: Manually in Excel
    Pearson Correlation – SPSS
Coefficient of Determination
Interpretation Guidelines
Pearson Correlation – Confidence Intervals
    Pearson Correlation – Confidence Intervals – SPSS
    Pearson Correlation: Negative Correlation
    Pearson Correlation: Non-Significant Correlation
    Pearson Correlation: Power
    Can the null hypothesis ever be accepted?
    Pearson Correlation Power: SPSS
    Scatter Plots
Pearson Correlation: Assumptions
Spearman Rank Correlation
    Spearman Rank Correlation: SPSS
Summary
Advanced Topics
    Pearson Covariation
    Bootstrapped Confidence Intervals: SPSS
    Pearson Correlation: Randomization Test
    Pearson Correlation: Randomization Test – SPSS
    Do not Praise p < .05
    Spearman Correlation: Confidence Intervals
    Polychoric Correlation
    Polychoric Correlation: SPSS
    Tetrachoric Correlation
    Test the Difference Between Two Correlations
    Bayesian Analysis
    Correction for Small Sample Bias
    Correction for Range Restriction (in progress)
Practice Questions
Advanced Practice Questions

Is getting a college/university degree a good financial investment? My hunch is that most people think so; however, it is a rather complicated question to answer in a clear and convincing way. In this chapter, I demonstrate how to answer the following related question: Is there an association between completing more years of education and earning more money? Based on a sample of 7,403 Americans, Zagorsky (2007) found that there was an association. In the first section of this chapter, I describe and demonstrate the statistic that Zagorsky (2007) used to uncover the answer to the question: the Pearson correlation.

Pearson Correlation
The Pearson correlation coefficient (a.k.a., Pearson’s r) quantifies the degree of
linear association between two variables which are assumed to be measured on an
interval/ratio scale. The Pearson correlation can take on two types of values in terms of
direction: positive and negative. A positive correlation implies that as the numerical value
of one variable increases, the numerical value of another variable also increases. For
example, the correlation between height and weight is positive in nature: taller people tend
to be heavier than shorter people. It's a tendency, not a one-to-one association. By contrast, a
negative correlation implies that as the numerical value of one variable increases, the
numerical value of another variable decreases. For example, the outside temperature and
the amount of clothes people wear: Hotter days tend to be associated with less clothes
worn.
A Pearson correlation is known as a standardized index, because it is range bound. Theoretically, values of Pearson's r can range from -1.0 to 1.0. Most correlations reported in scientific papers are somewhere between |.10| and |.70| (i.e., between .10 and .70, or between -.10 and -.70). It is rare to see a correlation larger than |.70|. A Pearson r value of .00 implies the total absence of an association between two variables. It is also relatively uncommon to observe a correlation of exactly .00.
Many published studies use correlations to report their results. For example, Lamont
and Lundstrom (1977) reported a correlation of r = .58 between height and weight in 143
salespeople. So, as you would expect, taller people tended to be heavier. More surprisingly,
a correlation of r = .36 was reported between height and incentive earnings in the same
sample of 143 salespeople. So, taller salespeople tended to achieve better sales. By contrast, Lamont and Lundstrom (1977) reported a negative correlation of r = -.20 between empathy and job performance in the same salespeople. So, higher levels of empathy were associated with lower levels of sales job performance (interesting, huh?). Finally, Lamont and Lundstrom (1977) reported an essentially zero correlation (r = -.02) between number of salesperson hobbies and sales performance. Obviously, Lamont and Lundstrom (1977) found the Pearson correlation to be a useful and versatile method to describe the results of their study. Fortunately, calculating a Pearson correlation is easy, if you understand z-scores already (see Chapter 2).

Hypotheses
Typically, researchers specify (either explicitly or implicitly) two hypotheses in
relation to a correlation analysis: a null hypothesis and an alternative hypothesis.


Null Hypothesis (H0): There is no association between the two variables of interest.

Alternative Hypothesis (H1): There is an association between two variables of interest.

Researchers conduct inferential statistics in order to determine whether they can reject the null hypothesis or not. If the statistical analysis is associated with a p < .05, they tend to reject the null hypothesis and suggest that there is evidence in favour of the alternative hypothesis.

Formulation
I believe the best way to understand the nature of the Pearson correlation is to understand that it is, essentially, the average of a series of cross-products of z-scores (each case's two z-scores multiplied together). Recall that a z-score is a standardized score with a mean of 0 and a standard deviation of 1. If you were interested in the association between two variables, the scores associated with the two variables could be converted into z-scores. When there is a positive association between two variables, the larger z-scores across the two variables will tend to "hang" together. That is, the cases in the sample of data with relatively large z-scores on one variable will tend to correspond to relatively large z-scores on the other variable. Alternatively, when there is a negative correlation between two variables, the positive z-scores on one variable tend to be matched up with negative z-scores on the other variable across the cases.
Consider the alternative hypothesis associated with the opening of this chapter:
there is an association between years of education completed and earnings. For the
purposes of demonstration, I have simulated a data set which corresponds very closely to
the results reported by Zagorsky (2007). To keep the calculations more manageable, I have
simulated the data to correspond to a sample of 40 participants, rather than the 7,403
participants included in the Zagorsky (2007) investigation. I have also divided the annual
earnings variable by 365 to yield earnings per day, again, to keep the calculations more
manageable. As can be seen in Table C5.1, the X column corresponds to years of education completed, and the Y column corresponds to earnings per day. There are four steps to the calculation of a Pearson correlation (Data File: education_earnings_N_40):

1: Convert the raw scores into z-scores
2: Multiply each case's corresponding z-scores
3: Sum the multiplied z-scores across all cases
4: Divide the sum of the product z-scores by N – 1

The above four steps can be summarised neatly with the following Pearson correlation
formula:

\[ \text{Pearson's } r = \frac{\sum z_X z_Y}{N - 1} \tag{1} \]

where ∑z_X z_Y is equal to the sum of the cross-products of the z-scores associated with variables X and Y, and N - 1 is equal to the sample size minus 1. The reason it is necessary to subtract 1 from the sample size (N – 1) is that this "correction" is known to provide a more accurate estimate of a Pearson correlation at the population level (Watch Video 5.1: Pearson r Formula Explained).
As can be seen in Table C5.1, I calculated the z-scores for each of the two variables
(X and Y). The z-scores for years of education are in the ZX column. The z-scores for the
earnings per day are in the ZY column. If you need a refresher on z-scores, consult chapter 2.
Next, I multiplied the ZX and ZY variables together (see column ZXZY). I then summed the
product of the z-scores. Finally, I divided the sum of the products by N - 1 (i.e., 39). The result of the division corresponded to .338. Researchers tend to write r = .338. The value of r = .338 is the Pearson correlation between years of education completed and earnings per day for
Americans. Finally, I’ll note that r = .338 corresponds very closely to the Pearson correlation
reported by Zagorsky (2007) with N = 7,403 (Watch Video 5.2: Pearson Correlation
Calculations Step-by-Step with Excel).

\[ \text{Pearson's } r = \frac{13.19}{39} = .338 \]


Table C5.1. Calculations involved with a Pearson Correlation


ID   X   z_X   Y   z_Y   z_X·z_Y
1 17 1.04 35.08 -1.21 -1.26
2 18 1.37 63.01 -.81 -1.11
3 15 .39 46.58 -1.05 -.41
4 14 .07 44.57 -1.08 -.08
5 12 -.59 28.24 -1.31 .77
6 16 .72 90.41 -.41 -.30
7 15 .39 71.20 -.69 -.27
8 12 -.59 38.82 -1.16 .68
9 18 1.37 126.03 .10 .14
10 16 .72 101.37 -.25 -.18
11 13 -.26 61.79 -.83 .22
12 19 1.70 153.42 .50 .85
13 13 -.26 67.57 -.74 .19
14 13 -.26 69.10 -.72 .19
15 13 -.26 71.18 -.69 .18
16 13 -.26 73.61 -.66 .17
17 15 .39 101.57 -.25 -.10
18 13 -.26 91.26 -.40 .10
19 13 -.26 92.92 -.38 .10
20 13 -.26 97.52 -.31 .08
21 9 -1.57 54.95 -.93 1.46
22 11 -.91 84.41 -.50 .46
23 15 .39 128.80 .14 .05
24 7 -2.22 41.62 -1.12 2.49
25 14 .07 129.82 .16 .01
26 13 -.26 123.87 .07 -.02
27 16 .72 163.12 .64 .46
28 10 -1.24 97.82 -.31 .38
29 12 -.59 121.12 .03 -.02
30 12 -.59 133.36 .21 -.12
31 9 -1.57 108.32 -.15 .24
32 17 1.04 206.88 1.27 1.32
33 20 2.02 243.20 1.80 3.64
34 16 .72 206.93 1.27 .91
35 11 -.91 180.27 .89 -.81
36 21 2.35 293.88 2.53 5.95
37 13 -.26 206.30 1.26 -.33
38 12 -.59 211.62 1.34 -.79
39 9 -1.57 217.34 1.42 -2.23
40 14 .07 280.59 2.34 .16
Sum 13.19
Note. X = education (years completed); Y = earnings (per day); z_X = education z-scores; z_Y = earnings z-scores; z_X·z_Y = cross-products.
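
The four calculation steps map directly onto a few lines of code. Below is a minimal sketch in Python (assuming NumPy is available; x and y stand for the education and earnings columns of the data file):

import numpy as np

def pearson_r(x, y):
    # Step 1: convert the raw scores into z-scores (sample SD, hence ddof=1)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    # Steps 2 and 3: multiply each case's z-scores and sum the cross-products
    sum_of_cross_products = np.sum(zx * zy)
    # Step 4: divide the sum of the cross-products by N - 1
    return sum_of_cross_products / (len(x) - 1)

Applied to the 40 cases in Table C5.1, this function returns approximately .338 (i.e., 13.19 / 39).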

Calculating the Statistical Significance of a Correlation: Manually in Excel


Now that the Pearson correlation has been estimated, it must be determined
whether it is significant statistically or not. In more precise terms, the p-value associated
with the correlation must be estimated. For those interested, I demonstrate in this section

how to calculate the statistical significance (i.e., p-value) associated with a Pearson
correlation by comparing it to the t-distribution. I believe doing so will help you understand
the nature of statistics better. However, for those not interested in such things, I
demonstrate in the following section how to perform a Pearson correlation in SPSS through
the menu options, which calculates the statistical significance of the correlation
automatically, without any mention of the t-distribution.
Recall from chapter 3 that z-values and t-values of |1.96| or greater from their
respective means are relatively rare observations. Furthermore, with respect to the one-
sample t-test, a calculated t-statistic greater than |1.96| was found to be so large as to be
conventionally considered statistically significant (i.e., p < .05), when the sample size was
approximately 50 or greater, because less than 5% of calculated t-statistics would be
expected to be that large or larger. In the case of the Pearson correlation, the task is to convert the raw score Pearson correlation into a t-statistic, so that it can be evaluated against the t-distribution for its "unusualness". By "unusualness", I mean unusually deviant from the expectation of a t-value of zero (i.e., a correlation of .00), within sampling fluctuations (i.e., the null hypothesis). To do so, it is necessary to execute the following three
steps:

Step 1: Calculate the standard error for the correlation
Step 2: Divide the correlation by the standard error to obtain the t-statistic
Step 3: Compare the calculated t-statistic against the t-distribution to obtain the p-value.

In order to execute step 1, the Pearson correlation standard error must be calculated. Recall that the concept of a standard error was discussed in detail in chapter 3.
In the Pearson correlation context, the standard error represents the expected standard
deviation of correlations, if the study were to be conducted many times over again, with
different random samples (but with the same size N). That is, based on the sample of 40
cases used in the current example, the correlation between education and earnings was
estimated at .338. If the same study were conducted with a new sample of 40 cases, it is
highly unlikely that the estimated correlation would be exactly .338. Instead, the estimate
of the correlation would vary from sample to sample simply by chance. In statistics, the fact
that estimates of the same parameter vary from sample to sample is known as “error”.
Specifically, it is referred to as the standard error.
As mentioned in chapter 3, virtually every statistic has been discovered to be
associated with a particular standard error. Most statistics have their own unique standard error formula. The Pearson correlation has the following standard error (SE_r) formula:

\[ SE_r = \sqrt{\frac{1 - r^2}{N - 2}} \]


As was the case with the standard error of the mean (chapter 3), sample size is a key
characteristic in the estimation of the standard error of a Pearson correlation. The larger the
sample size, the smaller the amount of standard error. For the current example (r = .338; N
= 40), the standard error was estimated at .153:

\[ SE_r = \sqrt{\frac{1 - .114}{38}} = .153 \]
Is .153 a lot of standard error? It depends on the estimated correlation. In this example, the correlation point-estimate was .338, which is about two times larger than the standard error. In order to determine whether the estimated correlation is sufficiently large, in comparison to its standard error estimate, to be declared "statistically significant", the correlation can be converted into a t-statistic by dividing the observed correlation by the corresponding standard error estimate:

\[ t = \frac{r}{\sqrt{\frac{1 - r^2}{N - 2}}} \tag{2} \]
The ratio of the correlation to the standard error (i.e., formula 2) is known to follow the t-
distribution with specified degrees of freedom. As mentioned previously, statisticians
typically regard a t-value of |1.96| as a relatively unusual value. So unusual, one would be
justified in considering the value “statistically significant” or “beyond chance”, in most cases.
I have to write “in most cases”, because it depends on sample size (or degrees of freedom,
more precisely).
Based on the years of education and earnings example, where the correlation was
estimated at .338 (and N = 40), the following t-value was calculated:
\[ t = \frac{.338}{\sqrt{\frac{1 - .114}{38}}} = \frac{.338}{.153} = 2.21 \]
Thus, the t-statistic was estimated at 2.21 (i.e., .338 / .153). What are the chances
of having obtained a t-statistic as large as |2.21| or larger just by chance? I’d say not high,
but it is necessary to estimate the probability precisely, because sample size does come into
play, here. To do so, the calculated t-statistic of 2.21 can be placed within the context of the
t-distribution with specified degrees of freedom, in order to determine the proportion (i.e.,
p-value) of t-statistics that are equal to or larger than |2.21|, when the null hypothesis is
expected to be true (recall, we always go into a study expecting the null hypothesis to be
true). The chances of having obtained a t-value of |2.21| (or greater) just by chance can be
calculated with the following Excel (or Google Sheets) function (Watch Video 5.3: Convert
Pearson r into t-value to get p-value):

=TDIST(2.21, 38, 2)

The value of 38 in the function above corresponds to the degrees of freedom (N – 2). The
value of 2 corresponds to a two-tailed test. I obtained the following result: .033.

Thus, based on 38 degrees of freedom (N – 2), a calculated t-value of 2.21 corresponded to a p-value of .033. Consequently, the null hypothesis of no association between years of education completed and earnings can be rejected, p = .033. In practice, today, there is no need to calculate a t-value for a correlation, and then obtain the p-value. Instead, commonly used statistical programs do it all for you, automatically.
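
For readers who prefer code to spreadsheet functions, the three steps above can be sketched in Python with SciPy (the function name r_to_t_p is my own illustrative label):

import math
from scipy import stats

def r_to_t_p(r, n):
    # Step 1: normal theory standard error of the correlation
    se = math.sqrt((1 - r**2) / (n - 2))
    # Step 2: divide the correlation by its standard error
    t = r / se
    # Step 3: two-tailed p-value from the t-distribution with N - 2 df
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p

t, p = r_to_t_p(.338, 40)  # t ≈ 2.21, p ≈ .033, matching =TDIST(2.21, 38, 2)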

Pearson Correlation – SPSS


In order to conduct a Pearson correlation efficiently in SPSS, you can use the
‘Bivariate’ menu option (Watch Video 5.4: Pearson’s r in SPSS). The analysis produced the
following output:

It can be seen that the Pearson correlation was estimated at .337. Furthermore, the
correlation was associated with a p-value of .033, which is identical to the p-value I estimated
“manually” above with the t-formula and the corresponding t-distribution. As the p-value
was less than .05, one would conclude that the correlation is statistically significant, p < .05,
or p = .033 more precisely. Unfortunately, SPSS does not report the t-value in the
‘Correlations’ table. Consequently, if you wanted to report the t-value associated with a
Pearson r analysis, you would need to use the procedure describe in the preceding section.
In practice, researchers rarely ever report the t-value associated with a Pearson r analysis.
Consequently, you shouldn’t feel compelled to do so yourself. In fact, you might confuse
your readers if you did.


Coefficient of Determination
A beneficial characteristic of the Pearson correlation is that it can be squared. When
a Pearson correlation is squared (r2), it represents the proportion of variance in the
dependent variable that is accounted for by the independent variable. The proportion is
known as the coefficient of determination. A coefficient of determination can be multiplied
by 100 to transform the proportion into a percentage.
In the education and earnings example, the independent variable was years of
education completed, and the dependent variable was earnings per day. As the correlation
coefficient was estimated at .338, the coefficient of determination was equal to .114.
Multiplied by 100, one gets 11.4%. Thus, 11.4% of the variance in earnings was accounted
for by years of education completed. That doesn't seem like much, does it? Correspondingly, 88.6% of the variability in how much people earn had nothing to do with years of education completed.¹

¹ One of the weirdest statistical terms is the 'coefficient of alienation'. It represents the proportion of variance in the dependent variable that is not accounted for by the independent variable. It is the opposite of the coefficient of determination and can be estimated with the following formula: 1 - r². In this example, the coefficient of alienation was .886.
Perhaps this low value should not come as a surprise. We all know plenty of people
who did not complete a university education and went on to earn very good money, often
in business, real-estate, or entertainment, for example. Also, academics are some of the
most educated people around, but many earn only about one standard deviation above the mean in earnings. Money isn't everything, of course. Despite these observations, there
was, nonetheless, a statistically significant association between years of education
completed and earnings. Unfortunately, SPSS does not have an option to calculate the
coefficient of determination. It must be calculated “by hand” with a calculator (Watch Video
5.5: How to calculate the coefficient of determination).
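
The arithmetic is simple enough to check in a few lines of Python (a sketch, using the r value estimated earlier in the chapter):

r = .338
r_squared = r ** 2            # coefficient of determination: ≈ .114
percentage = r_squared * 100  # ≈ 11.4% of the variance accounted for
alienation = 1 - r_squared    # coefficient of alienation: ≈ .886 (see footnote ¹)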

Interpretation Guidelines
Cohen (1992) published some extremely popular guidelines for interpreting the
magnitude of a Pearson correlation: |.10| = small; |.30| = medium, and |.50| = large. These
values were suggested based on Cohen’s experience reading published scientific papers.
Expressed as coefficients of determination, Cohen’s guidelines correspond to .01, .09, and
.25; or 1%, 9% and 25% shared variance. Based on Cohen’s (1992) guidelines, the estimated
correlation of .338 between years of education completed and earnings would be
considered a medium sized correlation.
Gignac and Szodorai (2016) were a bit sceptical of Cohen’s guidelines. Consequently,
Gignac and Szodorai (2016) took the time to review a large number of papers systematically
to determine small, medium, and large correlations empirically (not just a hunch). Gignac
and Szodorai (2016) found that Cohen’s (1992) guidelines were too conservative.
Specifically, they found that a .50 correlation occurred in less than 3% of studies. Thus,
Gignac and Szodorai (2016) suggested the following empirically derived guidelines for
interpreting the magnitude of a correlation: < .10 = relatively small; .20 = typical; > .30 =
relatively large. Thus, based on Gignac and Szodorai’s (2016) guidelines, the correlation of
.338 between education and earnings would be considered relatively large.

Pearson Correlation – Confidence Intervals


In addition to reporting the p-value associated with a Pearson correlation, it is often
beneficial to report the 95% confidence intervals. In fact, it has become strongly
recommended to report confidence intervals for all inferential results in scientific papers
(APA, 2001; ERA, 2006). In simple terms, a confidence interval provides an estimate of
precision. For example, just how confident are we that the correlation between years of
education and earnings in the population is precisely .338, as estimated in the example
above based on 40 participants? Technically, statisticians refer to the estimated effect as a
point-estimate. In this example, .338 is the point-estimate.
In contrast to a point-estimate, confidence intervals have two values: a lower-
bound estimate and an upper-bound estimate. The confidence intervals “surround” the
point-estimate. Confidence intervals associated with a correlation provide information
about the accuracy of a point-estimate, from a repeated sampling perspective. Other than
the level of confidence one wishes (say, 90%, or 95%, or 99%), there is only one other factor
that determines the magnitude of the range in the lower-bound and upper-bound estimates:
sample size. The greater the sample size, the narrower the range in the lower-bound and
upper-bound estimates. All other things equal, larger sample sizes help to provide greater
confidence in the precision of a point-estimate, as described in detail in chapter 3. Some
statisticians are so smitten by confidence intervals that they recommend calculating and
reporting them, instead of p-values. However, it should be noted that confidence intervals
and p-values are directly related. When a point-estimate is significant statistically, it implies
that the confidence intervals (lower-bound and upper-bound) will not intersect with zero.
The calculation of confidence intervals “by hand” is a bit tedious, because the
Pearson correlation needs to be transformed into a Fisher’s z-value (not to be confused with
a typical z-value). Then, the lower- and upper-bound Fisher’s z-values need to be back
transformed into Pearson correlations. I’d prefer not to demonstrate such calculations, as
they offer little in the way of insights. However, conceptually, one should keep in mind that
as the standard error associated with a correlation increases, so does the range between the
lower- and upper-bound confidence intervals.
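
For the curious, the Fisher's z route amounts to only a few lines. Below is a minimal sketch in Python (assuming SciPy; r_confidence_interval is my own illustrative name):

import math
from scipy import stats

def r_confidence_interval(r, n, conf=.95):
    # Transform r to Fisher's z, whose sampling distribution is approximately normal
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)  # standard error of Fisher's z
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)
    # Back-transform the lower and upper bounds to the correlation metric
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r_confidence_interval(.337, 40)  # approximately (.029, .587)

Note that the interval is not symmetric around the point-estimate; the back-transformation compresses the bound that sits closer to ±1.0.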

Pearson Correlation – Confidence Intervals – SPSS


Unfortunately, SPSS does not offer a menu-driven option to calculate normal theory
confidence intervals for a correlation. This is particularly disappointing, considering the
emphasis on confidence intervals, rather than only p-values, in contemporary research.
However, Weaver and Koopman (2014) have created a SPSS macro that can be used to

calculate accurate normal theory confidence intervals for Pearson correlations. I wrote
“normal theory”, because this approach does assume your data are relatively normally
distributed. If your data are substantially skewed and/or kurtotic, you’ll want to use
bootstrapping to estimate the confidence intervals for a Pearson correlation. You can
download the Weaver and Koopman (2014) macro here: http://tinyurl.com/y4w93a2t. When I ran the SPSS macro on the education and earnings per day variables (recall: r = .337 and N = 40), I obtained the following results (Watch Video 5.6: Pearson r confidence intervals SPSS):

It can be seen above that the lower- and upper-bound 95% confidence intervals were .029 and .587, respectively. Thus, it may be suggested that there is a 95% chance that the correlation between years of education completed and earnings in the population is somewhere between .029 and .587. Because the confidence intervals did not intersect with zero, the correlation was statistically significant, p < .05. Correspondingly, the normal theory correlation p-value was estimated at .033 (i.e., less than .05), as reported in the previous section.
Evidently, the range in the confidence intervals (lower-bound, upper-bound) in this example was large: from a correlation as small as r = .029 to one as large as r = .587. The reason the range was so large is that the sample size used in this example was relatively small (i.e., N = 40). This is why so many researchers want to see the confidence intervals associated with a correlation (or any effect size, for that matter): they want to know how much confidence can be placed in the reported result. Zagorsky (2007) actually used a much larger sample size in his investigation (N = 7,403); consequently, the 95% confidence intervals would be expected to be narrower than those reported here for N = 40.²
Later in the chapter, I describe a method to estimate confidence intervals for
Spearman correlations. Finally, I’ll note that SPSS does offer a bootstrapped approach to the
estimation of a correlation coefficient confidence intervals. However, it would be necessary
to have access to the bootstrapping module that SPSS offers. I demonstrate how to employ
the bootstrapped technique to confidence interval estimation in the Advanced Topics
section of this chapter.

² In fact, based on N = 7,403, I estimated the 95% confidence intervals at .32 and .36.

Pearson Correlation: Negative Correlation


If anyone ever tells you that you have a unique looking face, you should probably
take it as an insult. Rhodes and Tremewan (1993) had 24 people rate the distinctiveness and the attractiveness of 60 faces (50% female). Attractiveness was measured on a 7-point scale (1 = unattractive; 7 = very attractive), and distinctiveness was also measured on a 7-point scale (1 = very difficult to pick out of a crowd at a busy railway station; 7 = easy to pick out of a crowd at a busy railway station). Rhodes and Tremewan estimated the association between the
60 distinctiveness ratings and the 60 attractiveness ratings with a Pearson correlation. I have
simulated data to correspond very closely to the results reported by Rhodes and Tremewan
(1993) (Data File: attractiveness_distinctiveness). When I performed the analysis in SPSS, I
obtained the following result (Watch Video 5.7: Example of a negative correlation in SPSS):

The correlation was estimated to be negative in direction, r = -.317. Furthermore, the correlation was significant statistically, p = .014. Thus, higher levels of distinctiveness were associated with lower levels of attractiveness. Based on Gignac and Szodorai's (2016) guidelines, the correlation would be considered relatively large. The coefficient of determination was equal to r² = .100. Thus, 10.0% of the variance in attractiveness was accounted for by distinctiveness. Based on the SPSS syntax file I adapted, the normal theory 95% confidence intervals were estimated at -.533 and -.062. Thus, the confidence intervals did not intersect with zero, which implies that the correlation was statistically significantly different from zero. Additionally, there is a 95% chance that the correlation in the population is somewhere between -.533 and -.062.

Pearson Correlation: Non-Significant Correlation


Some academics are very critical of the utility of psychotherapy (e.g., Dawes, 1994).
In part, the criticism rests on the observation that there does not appear to be a correlation
between years of experience as a psychotherapist and degree of improvement experienced
by the client (Smith & Glass, 1977). I have simulated some data to correspond to the results
obtained from such studies (Data File: experience_improvement). There are two variables:
(1) years of experience; and (2) degree of improvement measured on a 10-point scale (1 =
no improvement at all; 10 = complete recovery). When I ran the analysis in SPSS, I obtained
the following table (Watch Video 5.8: Example of a non-significant correlation in SPSS):


It can be seen that the null hypothesis was not rejected, p = .467. That is, because the p-value was not less than .05, we cannot reject the null hypothesis. The p-value of .467 indicates that there is a 46.7% chance that we would fool ourselves if we concluded that there was an association between the two variables in the population. It should be emphasized that one never "accepts" the null hypothesis from the frequentist perspective.³ One simply fails to reject the null hypothesis. Finally, the normal theory 95% confidence intervals were -.202 and .415. As the confidence intervals intersected with zero, the null hypothesis cannot be rejected, consistent with the p-value. However, it should be acknowledged that the correlation between years of clinical experience and client improvement might be as high as .415.
Pearson Correlation: Power
Power represents the probability of rejecting the null hypothesis, when it is in fact
false (i.e., should be rejected). Power is a very valuable characteristic associated with a
statistical analysis. Imagine conducting an empirical investigation with very little chance of
rejecting the null hypothesis. What’s the point? You will likely waste your time.
Power can range from .00 to 1.0. Furthermore, power is always a function of three
characteristics: alpha, effect size, and sample size. Alpha (α) represents a specified
probability. Specifically, it is the maximum probability deemed acceptable with respect to
making a wrong decision to reject the null hypothesis. In statistics, to reject the null
hypothesis erroneously is known as a type I error. Somewhat less commonly discussed is a type II error, which represents the failure to reject the null hypothesis, when it is in fact false in the population (see Table C5.2).⁴ Higher levels of alpha are associated with higher levels
of power. However, the research community essentially insists on the specification of alpha
at .05, so there is no real scope to manipulate alpha to achieve greater power. Larger effect
sizes (in the population) and larger sample sizes are associated with greater levels of power.
Effect sizes in the population are, essentially, out of the control of researchers, so, there is
not much scope for manipulation in this case, either. Sample size, however, can be manipulated. That is, researchers can spend more time and money to collect more data (increase N) to help increase power.

³ Bayesians may have a different perspective on this matter. See the Advanced Topics at the end of this chapter for a demonstration of a Bayesian correlation analysis.
⁴ There is also a type III error, which represents the occurrence of rejecting the null hypothesis in the direction opposite to which the effect exists in the population (e.g., reporting a correlation of .30, p < .05, based on a sample, when the correlation is actually negative in the population). Finally, a type IV error is said to have occurred when any of the preceding three types of errors are confused in conversation or writing.

Table C5.2. Alpha and Beta Table

                      H0 True             H0 False
Reject H0             Type I Error (α)    Power (1 - β)
Fail to Reject H0     1 - α               Type II Error (β)
Note. H0 = null hypothesis; α = alpha; β = beta

Pulling a number out of thin air, Cohen (1988) recommended that researchers
conduct investigations in such a way that power is at least .80, i.e., that there is at least an
80% chance of rejecting the null hypothesis, if it is in fact false. Overwhelmingly, researchers
have adopted Cohen’s (1988) .80 guideline. Unfortunately, most studies are underpowered:
they lack the “strength” to reject the null hypothesis. Based on surveys of the literature, the
power of statistical analyses reported in the literature tends to be much lower than .80
(Cashen & Geiger, 1989; Sedlmeier & Gigerenzer, 1989).
There are two ways to think about power: prospectively and retrospectively.
Prospective power is calculated prior to conducting a particular investigation. In practice,
prospective power is estimated to help determine the sample size required for an
investigation to help ensure that power is at least .80. By contrast, retrospective power is
calculated after the fact. That is, after a study has been conducted and the statistical results
have been obtained, the reported effects are evaluated with respect to power.
Retrospective power is especially useful to evaluate when a null hypothesis has failed to be rejected. That is, a statistical analysis may fail to uncover a statistically significant correlation
between two variables (e.g., years of experience and improvement), but the analysis may
have been associated with a very low level of statistical power. If that is the case, then not
much can be said about the failure to reject the null hypothesis: it was almost inevitable. My
hunch is that the result I described above relevant to therapist experience and client
improvement was underpowered (N = 40). I explore this possibility further below.

Can the null hypothesis ever be accepted?


Strictly speaking, it is inappropriate to accept a null hypothesis from a statistical (frequentist) perspective. Instead, frequentist statisticians insist that we only ever state that we failed to reject the null hypothesis. However, I do not believe this to be a logically consistent position. Instead, I believe that a null hypothesis can be essentially accepted, if p > .05 and power is greater than .95.
Consider that statistical results are framed within the context of probability. If p =
.049, for example, we are allowed to reject the null hypothesis, but there is still a very real
4.9% chance that we rejected the null hypothesis inappropriately (i.e., the null hypothesis is
in fact true; i.e., type I error). Researchers have overwhelmingly accepted this approach to


statistical analysis. Consequently, it is logically consistent to accept a null hypothesis, if a null hypothesis failed to be rejected (p > .05), and the power associated with the analysis was 95% or more. That is, with power = .95, there is only a 5% chance that the null hypothesis was not rejected, when it is in fact false. If a researcher wished to be more conservative, he or she could specify a minimum power criterion of .99, rather than .95. In the next section, I calculate the power associated with the correlation analysis conducted above, in order to evaluate the null hypothesis that years of experience as a psychotherapist is unrelated to the degree of improvement experienced by the client.

Pearson Correlation Power: SPSS


Unfortunately, SPSS does not provide the opportunity to estimate power for correlations within the menus. However, I have prepared a syntax file that will do so. All the user has to specify is the size of the correlation (in a column named 'r') and the sample size (in a column named 'n'). Once r and N are inputted into SPSS, simply run the SPSS syntax that can be found here: https://tinyurl.com/zj57umz (Watch Video 5.9: Calculate power for a correlation in SPSS).
To demonstrate the usefulness of the syntax, I estimated the power associated with
the psychotherapy experience and improvement investigation described in the section
above relevant to a non-significant Pearson correlation (r = .118, p = .467). Recall that the
sample size was 40. Based on the syntax above, the power was estimated at .11, as can be
seen in the SPSS result below.

Thus, there was only an 11% chance of rejecting the null hypothesis, in this case (if the null
hypothesis were in fact false). Consequently, I would not place much confidence in the fact
that the analysis failed to reject the null hypothesis, in this case. The sample size was too
small to be associated with respectable power. In order to achieve power of .95, the
investigation would have required a sample size of 910, if the correlation is r = .12 in the
population.
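
If you would like to check such numbers yourself, power for a two-tailed test of a zero population correlation can be approximated via Fisher's z transformation. This is a standard approximation, not necessarily the exact routine implemented in the syntax file above; a sketch in Python:

import math
from scipy import stats

def correlation_power(r, n, alpha=.05):
    # Fisher's z approximation: atanh(r) * sqrt(N - 3) is ~normal under H1
    z = math.atanh(r) * math.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # Probability that the test statistic lands beyond either critical value
    return stats.norm.sf(z_crit - z) + stats.norm.sf(z_crit + z)

correlation_power(.118, 40)  # ≈ .11, as reported above
correlation_power(.12, 910)  # ≈ .95, the sample size quoted for power of .95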

With the syntax I have developed above, I created a table (see Table C5.3) which
specifies the levels of power associated with various levels of correlation coefficient (.10 to
.60) and various sample sizes (5 to 800). It can be seen that conducting an investigation with
fewer than 20 participants is probably a waste of time, unless a very large effect size is
anticipated. Additionally, it can be seen that with a sample size of N = 30 (a very commonly used sample size in clinical research), the chances of detecting a medium sized correlation coefficient (i.e., r = .30, according to Cohen, 1992) as statistically significant is only 36%.


Finally, if a small effect size is anticipated (r = .10), even a sample size as large as 400 will
only detect it as statistically significant about 50% of the time.
I have also calculated the sample sizes required to achieve three levels of power
(.80, .95, and .99) across correlations which range in magnitude from |.10| to |.60|. As can
be seen in the Table C5.4, in order to achieve power of .80, a small correlation by Cohen’s
(1992) guidelines (i.e., .10) requires a sample size of 790! However, if a very large effect is
expected (r = .50), a sample size as small as 28 will achieve power of .80.

Table C5.3. Correlation Coefficient Power Table (alpha = .05)


Population Correlation
N .10 .20 .30 .40 .50 .60
5 .03 .04 .06 .08 .11 .15
10 .04 .07 .12 .19 .30 .46
15 .05 .10 .18 .31 .49 .71
20 .06 .13 .24 .42 .64 .85
25 .07 .15 .30 .52 .76 .93
30 .07 .18 .36 .61 .84 .97
50 .10 .28 .57 .84 .97 .99
75 .13 .38 .72 .94 .99 .99
100 .16 .52 .87 .99 .99 .99
200 .29 .82 .99 .99 .99 .99
400 .52 .98 .99 .99 .99 .99
800 .81 .99 .99 .99 .99 .99
Note. These power estimates are based on N-2 (df) not N; negative and positive
correlations are associated with the same levels of power.

Table C5.4. Sample Size Required for a Significant Correlation Coefficient (alpha = .05)
Population Correlation
Power .10 .20 .30 .40 .50 .60
.80 790 190 84 40 28 18
.95 1290 308 132 71 42 27
.99 1820 414 177 94 64 35
Note. These power estimates are based on N-2 (df) not N; negative and positive
correlations are associated with the same levels of power.

Scatter Plots
An impression of the nature of the association between two variables can be
appreciated by an examination of a scatter plot. A scatter plot consists of two axes: an X-axis
and a Y-axis. Conventionally, the X-axis is used to represent the values for the predictor

(independent) variable, while the Y-axis is used to represent the values of the predicted
(dependent) variable. With respect to a correlation, which variable is labelled the X variable
and which the Y variable is often arbitrary. However, with respect to a more sophisticated
statistical analysis such as regression, each variable is typically clearly designated the
predictor variable and the predicted variable (see chapter on Bivariate Regression).
For the purpose of illustration, I have created scatter plots to depict the three levels
of correlation demarcated as small, medium, and large by Cohen (1988). The data are
simulated based on a sample size of 387. The positive correlations are presented in the top-
half of Figure C5.1 and the negative correlations presented in the bottom-half.

Figure C5.1. Example Scatter Plots Across Small, Medium, and Large Effect Sizes

[Six scatter plots. Top row: r = .10 (r² = .01), r = .30 (r² = .09), r = .50 (r² = .25). Bottom row: r = -.10 (r² = .01), r = -.30 (r² = .09), r = -.50 (r² = .25).]

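In case you would like to generate similar plots yourself, one simple way to simulate two variables with a specified population correlation is to mix a predictor with independent noise. A sketch in Python (assuming NumPy and Matplotlib; the seed is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
n, rho = 387, .50  # sample size and population correlation
x = rng.standard_normal(n)
# Weighting x by rho and the noise by sqrt(1 - rho^2) yields corr(x, y) ≈ rho
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

plt.scatter(x, y, s=10)
plt.xlabel("X (predictor)")
plt.ylabel("Y (predicted)")
plt.show()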

It can be seen that when the correlation is small (.10 or -.10) there is essentially no
discernible pattern at all within the scatter plot. This should come as no surprise, as a
correlation of .10 implies that only 1% (r² = .01) of the variance in the dependent variable is
shared by the independent variable. By contrast, the scatter plots which depict a large
correlation by Cohen’s (1988) guidelines (r = |.50|) evidence a clear pattern. With respect
to a positive correlation, there is a clear, upward tilt in the observations. With respect to the
negative correlation, there is a clear, downward tilt in the observations.
The scatter plots above are more attractive looking than what you might expect in
most research scenarios. For the purposes of more realistic illustration, I have created the
scatter plots for the three Pearson correlation examples described earlier in the chapter. As
can be seen in Figure C5.2, the scatter plots are much less attractive and/or informative. So

much so, one would probably not bother reporting them in a report or manuscript, unless it
was for educational purposes. However, all scatter plots can be useful to examine to
evaluate the possibility of outliers or influential cases, as well as non-linearity. These are
topics I address in detail in the chapter devoted to bivariate regression.

Figure C5.2. Realistic Scatter Plots Across Various Effect Sizes

[Three scatter plots: education and earnings (r = .338), distinctiveness and attractiveness (r = -.317), and experience and improvement (r = .118).]

Pearson Correlation: Assumptions


The Pearson correlation is a parametric statistic, which implies that it is associated
with a variety of assumptions. Strictly speaking, all of the assumptions have to be satisfied
in order for the estimated p-value to be accurate. However, as I describe below, there is
empirical research to suggest that some of the assumptions can be violated to some degree,
without affecting the accuracy of the corresponding p-value adversely.

1: Random sampling
Random sampling implies that all cases in the population have an equal chance of
selection into the sample. In practice, it is entirely unrealistic to run a study such that every
person (or case) in the population to which you wish to infer your results has an equal chance
of selection into your sample. Consequently, in practice, this first assumption is always
violated. The extent to which the violation of this assumption affects the accuracy of
statistical results is anyone’s guess. Ultimately, there is not much that can be done about it.
The show must go on, as they say.
In the years of education completed and earnings example, the data were obtained from the National Longitudinal Survey of Youth, which is a US nationally representative sample (see Zagorsky, 2007, for more details). Although not a random sample, it is one of the better quality samples that one sees in published research.

2: Independence of observations
Independence of observations implies that the participants in the investigation have
not influenced each other with respect to the variable(s) of interest. Typically, it is easy to


satisfy this assumption, if the participants complete the tasks, tests, and/or questionnaires
independently (i.e., on their own).
In the years of education completed and earnings example, the participants
completed the survey independently. Consequently, it is likely that their responses to the
years of education and annual earnings questions were independent, which would satisfy
the assumption of independence of observations.

3: Data measured on a continuous scale


Strictly speaking, a Pearson correlation assumes the independent and dependent
variables have been measured on a continuous scale. Specifically, an interval or ratio scale.
In the years of education and earnings example, the data were measured on a ratio scale,
because it is possible to score 0 for years of education completed and $0 for annual earnings
(or per day earnings).
Simulation research has shown that parametric tests (e.g., Pearson correlation) will
work pretty well in cases with data scored on an ordinal scale, so long as the scale has at least 5 points (O'Brien & Homer, 1987; Rasmussen, 1989). However, I would place a major qualifier on such a recommendation: the participant responses associated with the 5-point scale must actually cover the whole spectrum of the scale. All too often, a theoretical 5-point scale
actually turns out to be a 3-point scale. For example, I am familiar with a questionnaire that
has the following item: ‘I create a positive work environment for others’. Participants
respond to the item based on the following scale: 1 = Almost Never; 2 = Seldom; 3 =
Sometimes; 4 = Usually; 5 = Almost Always. Based on a sample of 4,775 participants, the
histogram displayed in Figure C5.3 was obtained.

Figure C5.3. Real Data Histogram Associated with 5-point Likert Scale

Clearly, this particular theoretical 5-point scale is really only a 3-point scale in
practice. Thus, the onus should be on the researcher to demonstrate that the full 5-point
scale has been used by the participants, prior to applying parametric statistics. If the above
item were used as a dependent variable, it would not be appropriate to perform a
parametric statistical analysis. It would be necessary to perform a nonparametric analysis
(e.g., Spearman correlation; discussed further below).


4: Linearity
Many sources state that the Pearson correlation assumes a linear association
between the independent and dependent variables. I would prefer to say that the Pearson
correlation is limited in that it can only quantify linear associations. So, it’s a limitation, not
an assumption, per se. However, it is always good practice to examine scatter plots to help
identify non-linearity (a.k.a., curvilinearity) in the data. There is a way to test for curvilinearity, which I describe in the chapter devoted to multiple regression.

5: Normally distributed data


Theoretically, the Pearson correlation assumes perfectly normally distributed data.
However, based on empirical simulation research, it has been discovered that the Pearson
correlation is fairly robust to violations of the theoretical assumption of normality. Based on
my reading of the simulation research (Bishara & Hittner, 2012; Edgell & Noon, 1984; Havlicek & Peterson, 1977), normal theory estimation ("regular" p-values) will provide
respectably accurate p-values, when the data are skewed less than |2.0| and the kurtosis is
less than |9.0|. These are absolute skew and kurtosis values. They are not z-statistics or t-
statistics (i.e., skew divided by the standard error of skew). Also, the sample size needs to
be greater than 10.

Figure C5.4. Education and Earnings Per Day Histograms

            Education   Earnings per day
Skew        .221        .911
Kurtosis    .126        .072
Mean        13.80       118.99
Median      13.00       99.60
SD          3.07        69.19

In the years of education completed and earnings per day example, the data were
somewhat non-normally distributed, however, they were not excessively so, as can be seen
in Figure C5.4 (Watch Video 5.10: Check normality for a Pearson correlation). In particular,


it will be noted that the years of education completed distribution was associated with low
levels of skew (.221) and kurtosis (.126). However, it was noted that the earnings per day
variable was more substantially skewed at .91. As the value of .91 is much less than |2.0|,
the data should be considered sufficiently normally distributed to perform a normal theory
Pearson correlation analysis on the data, which yielded a p-value of .033, in this case. I
discuss the issue of severe non-normality and correlations in the Advanced Topics section of
this chapter.
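
A quick way to screen variables against these thresholds is sketched below in Python. I am assuming here that the |2.0| and |9.0| cut-offs apply to ordinary sample skew and excess kurtosis, i.e., the kind of descriptive values SPSS reports:

from scipy import stats

def normal_enough_for_pearson(x):
    skew = stats.skew(x, bias=False)      # sample skewness (0 = symmetric)
    kurt = stats.kurtosis(x, bias=False)  # excess kurtosis (0 = normal)
    # The chapter's rule of thumb: |skew| < 2.0, |kurtosis| < 9.0, and N > 10
    return abs(skew) < 2.0 and abs(kurt) < 9.0 and len(x) > 10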

Spearman Rank Correlation


The Spearman rank correlation (rS) is simply a Pearson correlation applied to ranked
data. The data can be originally ranked by the participants in the investigation. Alternatively,
the originally non-ranked data provided by participants can be ranked by the researcher,
after the data have been collected. As the Spearman correlation is based on ranked data, it
does not assume normally distributed data. In fact, the Spearman correlation can handle any
level of non-normality, unlike the Pearson correlation.
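
The definition is easy to verify in code: ranking both variables and then computing a Pearson correlation on the ranks reproduces Spearman's rS. A sketch in Python (the data values are arbitrary, for illustration only):

import numpy as np
from scipy import stats

x = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y = np.array([2, 7, 1, 8, 3, 9, 4, 6])

# Spearman's r_S computed directly...
r_s, p = stats.spearmanr(x, y)
# ...equals Pearson's r computed on the ranks (ties receive average ranks)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
assert np.isclose(r_s, r_on_ranks)
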
In my opinion, the Spearman rank correlation is much too frequently applied,
because researchers automatically turn to the Spearman correlation, when the data are
perceived to be excessively skewed. However, as described in the Advanced Topics section
at the end of this chapter, there are more attractive options than the Spearman correlation
in such cases, including the Pearson correlation as estimated via bootstrapping and/or
randomization. Also, the Pearson correlation is relatively robust to violations of normality
(skew < |2.0| and kurtosis < |9.0|). Consequently, it should be the rare day when you turn
to the Spearman correlation for the purposes of estimating the association between two
variables. The main reason I prefer to use a Pearson correlation, when at all possible, is that
I much prefer to use the data that I collected to estimate the association between two
variables, rather than transform the data in some way (either ranks or any other
transformation). If you transform your data, your results are limited to those transformed
data. That is not an attractive situation. Again, I prefer to remain as close as possible to the
“real” data.
The only two possible scenarios in which I can imagine the justifiable application of
the Spearman correlation include: (1) when either the independent variable or the
dependent variable (or both) have been measured on a Likert scale with only 3- or 4-points;
or (2) when the data have been measured with ranks (i.e., rank-ordered data provided by
the participants).

Spearman Rank Correlation: SPSS


It is not easy to find good quality published examples of the use of the Spearman correlation, because most data are better analysed with Pearson's (complemented with bootstrapping or randomization estimation) or a polychoric correlation (as described in the Advanced Topics of this chapter). However, for the purposes of demonstration, I have

simulated some data with two variables hypothesized to be related. In this hypothetical
example, the researcher was interested in testing the hypothesis that there is an association
between socio-economic status and reading ability in children. I have created a data file to
demonstrate the example (Data File: ses_reading_3_points).
The independent variable is parents' socio-economic status (SES) and the dependent variable is reading ability (reading) in children aged 7 years. SES was measured on the following 3-point scale: 1 = low; 2 = moderate; and 3 = high. Reading ability was assessed by each student's teacher on the following 3-point scale: 1 = poor; 2 = average; and 3 = excellent. Because the data were measured on a limited information ordinal scale (fewer than 5 points), it would arguably be inappropriate to apply a Pearson correlation in this case. However, a Spearman correlation could be applied justifiably to such a limited information ordinal scale. In order to perform a Spearman rank correlation in SPSS, one can use the 'Bivariate' menu option (Watch Video 5.11: Spearman Rank Correlation in SPSS). SPSS produced the following output:

As can be seen in the SPSS table entitled 'Correlations', the Spearman correlation
was estimated at rs = .398. Furthermore, the correlation was statistically significant, p = .016.
The Spearman correlation can be squared (i.e., coefficient of determination) just like the
Pearson correlation. However, one must restrict the interpretation to ranks, not the raw
data. Thus, 15.8% (rs2 = .158) of the variability in the reading ability ranks was accounted for
by SES ranks. Confidence intervals can be estimated for Spearman correlations. There are
two options: normal theory and bootstrapping. I describe the normal theory estimation
approach next. I describe the bootstrapping (and randomization) approach in the Advanced
Topics section of this chapter.

Summary
This chapter introduced the foundations relevant to the estimation of the
association between two variables via correlation. The correlations were tested for
statistical significance with normal theory estimation (i.e., with reference to the t-
distribution). The Pearson and Spearman correlations are very commonly used statistics in
many scientific disciplines, so I suspect you will use and read them often. The Advanced
Topics section of the chapter, which follows below, introduces additional measures of
association (e.g., Pearson covariance, polychoric correlation), as well as additional
estimation techniques (bootstrapping, randomization, Bayesian).


Advanced Topics

Pearson Covariation
The Pearson correlation is a standardized measure of association. It is considered
standardized because it is bounded by two values: -1.0 and 1.0. It is also considered
standardized because it is expressed in standard deviation units. For example, the
correlation between years of education completed and earnings was reported at .337 in the
Foundations section of this chapter. Consequently, it may be said that a one standard
deviation increase in years of education completed (SD = 3.07) is associated with a .337
standard deviation increase in earnings per day (SD = 69.19), i.e., .337 × 69.19 = $23.32.
Despite the above, this chapter would be incomplete if I did not mention that there
is an unstandardized version of Pearson's correlation known as Pearson's covariance. People
very rarely report Pearson's covariance, mostly because it is not naturally interpretable like
its standardized counterpart (i.e., Pearson's r). However, some statistical analyses
are based on Pearson's covariance (e.g., structural equation modeling); consequently,
you should at least know that it exists. As can be seen in the SPSS table entitled ‘Inter-Item
Covariance Matrix’, the Pearson covariance between years of education completed and
earnings per day was estimated at 71.49 (Data File: education_earnings_N_40) (Watch
Video 5.12: Pearson Covariance in SPSS).
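The relationship between the two statistics is straightforward: dividing the covariance by the product of the two standard deviations returns Pearson's r. Here is a quick back-of-the-envelope check in Python using the values reported above (an illustration, not an SPSS procedure):

# Pearson's r is the covariance rescaled by the two standard deviations
cov_xy = 71.49  # covariance between education and earnings per day (SPSS output)
sd_x = 3.07     # SD of years of education completed
sd_y = 69.19    # SD of earnings per day

r = cov_xy / (sd_x * sd_y)
print(round(r, 3))  # 0.337, matching the standardized estimate reported earlier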

Bootstrapped Confidence Intervals: SPSS


The p-values estimated in the main portion of this chapter are based on what is
known as ‘normal theory estimation’. As described in the assumptions section above, normal
theory estimation assumes normally distributed data. Fortunately, there are estimation
techniques other than normal theory that deserve serious consideration. For example,
bootstrapping is a particularly attractive estimation technique, because it has fewer
assumptions. Specifically, the bootstrapping estimation technique does not assume
normally distributed data at all. I describe the nature of bootstrapping in more detail in
the section below devoted to randomization estimation, as they are usefully compared and
contrasted, for the purposes of understanding.
In order to conduct a bootstrapped correlation analysis in SPSS (assuming you have
this extra module installed), click the 'Bootstrap' button and select the
bias-corrected and accelerated (BCa) confidence intervals option (Watch Video 5.13:
Bootstrapped Pearson Correlation in SPSS). SPSS produced the following output:


You can see that the correlation ("point-estimate") was .337, which is identical to
the original Pearson correlation estimate reported in the main portion of the chapter.
Bootstrapping does not produce a point-estimate different from that of normal theory
estimation. Importantly, however, it can be seen that the bootstrapped 95% Pearson r
confidence intervals were estimated at r = -.035 and r = .620. Thus, there is a 95% chance
that the correlation between years of education and earnings is between -.035 and .620 in
the population. The fact that the confidence intervals intersected with zero (i.e., one
bound is negative and the other is positive) implies that the point-estimate correlation is not
significant statistically, that is, p > .05. Such a result is in contrast to the p = .033 reported in
the Foundations section of this chapter via normal theory estimation (i.e., reference to the
t-distribution). I discuss the discrepancy further below. Note that if you perform the
bootstrap analysis in SPSS yourself, your results might deviate from mine very slightly,
because the re-samples would not be expected to be identical across executions of the
analysis.
The bootstrapping utility also estimated a standard error (SEr = .173). It is acceptable
to divide the correlation point-estimate (r = .337) by the bootstrapped standard error (.173)
to obtain a t-value. In this case, the t-value corresponded to 1.948 (.337/.173), which is less
than |1.96|. I cannot think of a scenario where a t-value less than |1.96| will be found to be
significant statistically, no matter what the sample size (i.e., unless a person uses a one-tailed
test, which I don’t recommend). I estimated the bootstrapped p-value at p = .059. 5
Consequently, based on bootstrapping, the correlation between years of completed
education and earnings per day was not found to be significant statistically (p = .059), which
is inconsistent with the normal theory estimated p-value of p = .033 (see Foundations section
of this chapter). The normal theory and bootstrapped p-values differ because the earnings
per day distribution was moderately skewed (skew = .91; see section above on
Assumptions). Bootstrapping tends to be more robust across all levels of skew; however, it
is not a perfect estimation technique either (Bishara & Hittner, 2012).
Consequently, in this case, the researcher is left with the problem of deciding which
estimation technique to trust.

5 To estimate the p-value, I used the following Excel function: =TDIST(1.948, 38, 2)

As a general statement, I would trust bootstrapping over normal theory estimation
results. Fortunately, there is another estimation technique option that can help resolve the
discrepancy. The estimation technique is known as randomization estimation. Bishara &
Hittner (2012) found randomization estimation to be the most robust across all levels of non-
normality and all sample sizes (even as small as N = 5). I describe randomization estimation
in the next section of the chapter.
It is worth noting that bootstrapping can be used for the purposes of estimating
confidence intervals for Spearman correlations, as well. The procedure is the same as that
described above relevant to the Pearson correlation. I obtained the following result, based
on the SES and reading ability example I described in the Spearman correlation section of
the chapter. As can be seen in the SPSS output, the lower-bound and upper-bound estimates
were .072 and .654, respectively. Your result might deviate from these values very slightly,
because the re-samples would not be expected to be identical across executions of the
analysis (Watch Video 5.14: Bootstrapped Spearman Correlation in SPSS).

Pearson Correlation: Randomization Test


The third estimation approach described in this chapter is randomization.
Randomization is similar to bootstrapping in that it is a resampling procedure. As noted
above, Bishara and Hittner (2012) found randomization estimation to be the most robust
across all levels of non-normality and all sample sizes (even as small as N = 5).
Consequently, it is a useful estimation
technique to know. The main distinction between bootstrapping and randomization is that,
in typical bootstrapping, yoked-pairs of data are drawn randomly from the sample (with
replacement). So, for example, a person’s score on the education variable and the same
person’s score on the earnings per day variable would be drawn from the original sample
(sometimes called the parent sample). Randomly sampled scores are drawn until the
bootstrapped sample reaches the N of the original sample. Then, a Pearson correlation is
estimated between the two variables of interest from the bootstrapped sample. The
Pearson correlation is saved into a different data file. When the procedure is repeated a
large number of times (say, at least, 1,000), a respectably stable distribution of bootstrapped
correlations will emerge. The standard deviation of the estimated correlations represents
the standard error. Confidence intervals can be estimated from the bootstrapped standard
error. If the 95% confidence intervals do not intersect with zero, then the correlation may
be considered statistically significant (p < .05). The beauty of bootstrapping is that it creates
a unique sampling distribution from the observed data, which might be skewed or kurtotic,
rather than relying upon the normal theory distribution for a particular statistic under the
null hypothesis (which was derived analytically).
In contrast to bootstrapping, randomization does not yoke the data. Instead, one of
the variable’s scores is held constant (does not matter whether X or Y) and the other
variable’s scores are shuffled (or scrambled) into a different order. Each time the data are
shuffled into a different order, a Pearson correlation is estimated and saved into a different
data file. When the procedure is repeated a large number of times (say, at least, 1,000), a
distribution of randomized correlations will emerge. However, in contrast to bootstrapping,
the distribution of correlations will have a mean of zero. If the observed correlation
associated with the originally collected paired data is larger, in absolute value, than 95.0%
of the absolute randomized correlations, then one would declare the observed correlation
statistically significant, p < .05. Stated alternatively, if the correlation is positive, and it is
larger than the randomized correlation that corresponds to the 97.5th percentile, then the
null hypothesis can be rejected. Alternatively, if the correlation is negative, and it falls below
the correlation that corresponds to the 2.5th percentile (i.e., it is even "more negative"),
then the null hypothesis can be rejected. It has been argued that randomization is
better than bootstrapping, because in randomization, only unique shuffles of the data will
be executed, whereas in bootstrapping, some identical bootstrapped samples may be
resampled from the original sample. I would say more research is needed to specify exactly
when and how bootstrapping results will differ from randomization results. In the education
and earnings example, the sample size was N = 40, which implies an enormous number of
unique permutations of the data (40! ≈ 8.2 × 10⁴⁷; far more than the 5,000 shuffles typically
executed) (Watch Video 5.15: The Difference Between Bootstrapping and Randomization -
Explained).
Theoretically, a randomization approach to estimation may be used for any data set
and statistic. However, it tends to be used in two cases: (1) when the data are non-normally
distributed; and (2) when the data are not even remotely close to randomly sampled. I
demonstrate the two instances with examples below.
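Before turning to SPSS, the shuffling logic can be conveyed with a minimal Python sketch (illustrative only; the analyses below use the Hayes macro, not this code):

import numpy as np

def randomization_test(x, y, n_perm=5000, seed=0):
    """Permutation p-value for a Pearson correlation: hold x fixed,
    shuffle y, and compare the observed r to the null distribution."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    r_obs = np.corrcoef(x, y)[0, 1]
    null_rs = np.empty(n_perm)
    for i in range(n_perm):
        null_rs[i] = np.corrcoef(x, rng.permutation(y))[0, 1]
    p = np.mean(np.abs(null_rs) >= abs(r_obs))  # two-tailed p-value
    # The chapter's randomization CI: point-estimate +/- 1.96 * the SD of
    # the shuffled correlations (the SD serves as a standard error)
    se = null_rs.std(ddof=1)
    return r_obs, p, (r_obs - 1.96 * se, r_obs + 1.96 * se)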

Pearson Correlation: Randomization Test – SPSS


In some cases, one may have data for which there is no possibility to infer the results
to a population. However, one may nonetheless wish to estimate the association between
two columns of data, because the correlation would represent something interesting. For
example, Perilloux, Fleischman, and Buss (2011) had a group of men (N = 96) and women (N
= 213) rank their preference for 13 mate characteristics. Higher ranks indicated a greater
preference. As can be seen in Table C5.5, the males rated 'attractive' as the most
preferred characteristic (M = 10.66) in a mate and 'religious' as the least preferred (M =
3.48). Females, by contrast, rated 'kind' as the most preferred characteristic (M = 11.57) and
housekeeper as the least preferred (M = 2.81). Although the results were reported by
Perilloux et al. in such a way as to suggest that males and females are different, I believe
such an interpretation is somewhat misleading. It's clear in Table C5.5 that the rankings males
and females provided were very similar, in the sense that there appears to be a positive
correlation between them. Of course, such a hypothesis needs to be tested
statistically (Data File: mate_preferences).

Table C5.5. Mean Mate Characteristic Rankings


Trait Males Females
Kind 10.01 11.57
Religious 3.48 3.93
Personality 8.88 8.97
Creative 5.66 4.64
Housekeeper 3.98 2.81
Intelligent 10.39 10.40
Earning capacity 4.26 7.42
Wants kids 5.65 6.09
Easy going 8.63 7.59
Heredity 5.16 4.58
College graduate 5.46 6.99
Attractive 10.66 8.08
Healthy 8.79 7.97

To estimate the degree of similarity, I calculated a Pearson correlation on the data
reported in Table C5.5. However, importantly, because the 13 traits are not a random sample
of traits, nor is there a population to which the results can be inferred, one cannot estimate,
justifiably, the p-value associated with the Pearson correlation with normal theory (or
bootstrapping, for that matter). 6 Instead, in this case, one could estimate the probability of
committing a Type I error (i.e., falsely reject the null hypothesis) with something known as a
randomization test (a.k.a., permutation test). SPSS does not include an option to calculate a
p-value via randomization. However, Hayes (1998) wrote an SPSS syntax procedure to
perform a randomization Pearson correlation analysis (Watch Video 5.16: Randomized
(Permutation) Correlation in SPSS (Example 1)). Based on 5,000 randomized re-samples (the
minimum Hayes recommends), I obtained the following result:

6 I would accept the argument that the example study described in the Foundations section of this chapter relevant to face distinctiveness and attractiveness should have used a randomization approach to standard error and p-value estimation. That is, the level of analysis was based on the number of faces that were rated.

The p-value was reported as 4.000 × 10⁻⁴, which corresponds to p = .0004 (or, .0004
+ .0000 = .0004). Thus, the null hypothesis of no association between the male and female
preferred traits was rejected, p = .0004. Furthermore, the correlation was estimated at r =
.84; thus, 70.1% (r2 = .701) of the variability in the female mate characteristic ranked preferences
was shared with the male mate characteristic ranked preferences: males and females are
actually very similar in this respect.
For the second randomization correlation demonstration, I return to the years of
education completed and earnings data. Although the education and earnings data were
sufficiently normally distributed for the purposes of estimating the p-value associated with
a Pearson correlation via normal theory, one might nonetheless be uncomfortable with the
fact that the earnings distribution was fairly skewed (skew = .911). Consequently, one might
want to employ a randomization procedure, in order to estimate the p-value for the Pearson
correlation. When I ran the Hayes (1998) syntax, I obtained the following results (Watch
Video 5.17: Randomized (Permutation) Correlation in SPSS (Example 2)).


As can be seen in the SPSS output, the null hypothesis was rejected, p = .036 (i.e., 3.62000 ×
10⁻² = .036; or .0184 + .0178 = .036). Furthermore, the randomization estimation
approach corroborated the normal theory p-value of .033 reported in the Foundations
section of this chapter. Finally, an appropriate method to estimate 95% confidence intervals
for a randomization test of a correlation is to multiply the standard deviation of randomized
correlation estimates by 1.96. Then, add and subtract the value to and from the point-
estimate. For example, in this case, the standard deviation associated with the 5,000
randomized correlation estimates was .161 (the output produces a histogram of the
estimates which includes the SD of the randomized correlations). Thus, .161 × 1.96 = .316.
Therefore, .337+.316 = .653, and .337 - .316 = .021. The 95% confidence intervals were .021
and .653.
In most cases, normal theory, bootstrapped, and randomization p-values will
corroborate each other, i.e., all of the p-values will be less than .05 or all of the p-values will be
greater than .05. However, in some cases, like this education and earnings example, they
will diverge. The simulation research suggests that randomization is the most robust (e.g.,
Bishara & Hittner, 2012), so I would trust it over other estimation methods, at this stage.

Do not Praise p < .05


I would like to conclude the Advanced Topics section on the bootstrapped and
randomization estimation techniques by pointing out that there is nothing magical about p
< .05. Researchers could have agreed upon p < .06 as a demarcation for a statistically
significant effect (see Cohen, 1994), which would imply that the correlation between
education and earnings would be statistically significant, based on the bootstrapped
estimation approach (p = .059). Ultimately, the three estimation methods yielded p-values that were
very much in the same ballpark. To help support my position, I have listed the upper and
lower-bound confidence intervals associated with the three estimation techniques for the
education and earnings per day data (see Table C5.6). They really tell the same story, in my
view. Specifically, the correlation in the population is likely positive in magnitude, but a
larger sample size would be required to pin-point more precisely the magnitude of the
positive correlation. Fortunately, Zagorsky (2007) actually had an N of 7,403. However, I used
N = 40 to make the results more interesting from an educational perspective.

Table C5.6. 95% Confidence Intervals Associated with Three Estimation Techniques
L-B P-E U-B
Normal Theory .029 .337 .587
Bootstrapped (BCa) -.035 .337 .620
Randomization .033 .337 .641
Note. L-B = Lower-Bound; P-E = Point Estimate; U-B = Upper-Bound

Spearman Correlation: Confidence Intervals


If you do not have access to the SPSS bootstrapping module, you can use the SPSS
syntax I developed to estimate the normal theory 95% confidence intervals for a Spearman
correlation. The procedure is based on work published by Bonett and Wright (2000). You can
download the syntax here. To demonstrate the utility of the syntax, I revisited the
socioeconomic status and reading ability Spearman correlation example described in the
main section of the chapter. Recall that the Spearman correlation was estimated at rs = .398
and the sample size was N = 36. When I ran the SPSS syntax on the Spearman correlation
of .398, I obtained the following results (Watch Video: 5.18: Spearman Correlation
Confidence Intervals in SPSS).

It can be seen that the 95% lower-bound (i.e., CI95_r_lb) and upper-bound (CI95_r_ub)
confidence intervals corresponded to rs = .07 and rs = .65, respectively. Such a result
corroborated the statistically significant result (i.e., p < .05). However, it must be
acknowledged that the range in the confidence intervals is wide, which is a symptom of the
small sample size used in the example.
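For readers who prefer to see the arithmetic, the Bonett and Wright (2000) interval can be computed in a few lines (a Python sketch of the same normal theory calculation performed by the syntax above):

import numpy as np

rs, n = 0.398, 36                        # Spearman r and sample size from the example
z = np.arctanh(rs)                       # Fisher z-transform
se = np.sqrt((1 + rs**2 / 2) / (n - 3))  # Bonett-Wright adjusted standard error
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
print(round(lo, 2), round(hi, 2))        # 0.07 0.65, matching the output above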

Polychoric Correlation
A good case can be made to apply a polychoric correlation to data that are measured
on limited information ordinal scales, i.e., less than 5-points (Bollen & Barb, 1981; O’Brien &
Homer, 1987). Recall that, strictly speaking, the Pearson correlation assumes the data have
been measured on a continuous scale, either interval or ratio. The Pearson correlation works
reasonably well on data measured on an ordinal scale, so long as the data have 5-points in
the scale. However, for data measured on a scale with 4-points or less, you should seriously
consider alternatives to the Pearson correlation. The Spearman correlation is an option, in
this case. However, the Spearman correlation represents the association between two
ranked variables, which is not a representation of the association between two continuously
scored dimensions. Often, researchers want to know the association between the variables
of interest along a continuum.
The polychoric correlation assumes that there is an underlying continuous
dimension that mediates responses on the discrete variable. For example, reading ability is
arguably a continuous dimension with a lot of subtle variability across people. However, in the
SES and reading ability example described in the Spearman correlation Foundations section
of this chapter, reading ability was measured with three coarse categories (poor, average,
excellent). Theoretically, socio-economic status is also a continuous dimension, but it was
measured with three categories, as well (low, moderate, high). Therefore, the polychoric
correlation could be applied to the SES and reading ability data to estimate the “corrected”
correlation between SES and reading ability in children. In a sense, you can view a polychoric
correlation as a corrected Pearson correlation. It is corrected in the sense that the polychoric
procedure “stretches” out the relatively discontinuous data into a form that is more
continuous. Then, a Pearson correlation is applied to the “stretched” data. In theory, I am in
favour of applying the polychoric correlation to any data that are scored on an ordinal scale,
when there is an underlying continuous dimension, even if the scale has 5-points or more.
However, in practice, it can be difficult to estimate the polychoric correlations, given the
limited software that can perform the analyses efficiently.
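To give a sense of the machinery, below is a rough two-step polychoric sketch in Python (an illustration of the logic described above, not the Lorenzo-Seva and Ferrando macro): thresholds are estimated from the marginal proportions, and the bivariate-normal correlation that best reproduces the observed contingency table is then found by maximum likelihood.

import numpy as np
from scipy import stats, optimize

def polychoric(table):
    """Two-step polychoric correlation for a 2-way contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Step 1: thresholds = normal quantiles of the cumulative marginal
    # proportions (+/- 10 stands in for +/- infinity on a standard normal scale)
    a = stats.norm.ppf(np.cumsum(table.sum(axis=1))[:-1] / n)
    b = stats.norm.ppf(np.cumsum(table.sum(axis=0))[:-1] / n)
    a = np.concatenate(([-10.0], a, [10.0]))
    b = np.concatenate(([-10.0], b, [10.0]))

    # Step 2: choose rho to maximize the multinomial likelihood of the cells
    def neg_loglik(rho):
        mvn = stats.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
        F = lambda x, y: mvn.cdf([x, y])
        ll = 0.0
        for i in range(table.shape[0]):
            for j in range(table.shape[1]):
                p = (F(a[i + 1], b[j + 1]) - F(a[i], b[j + 1])
                     - F(a[i + 1], b[j]) + F(a[i], b[j]))
                ll += table[i, j] * np.log(max(p, 1e-12))
        return -ll

    res = optimize.minimize_scalar(neg_loglik, bounds=(-0.99, 0.99),
                                   method="bounded")
    return res.x

Applied to the 3 × 3 SES-by-reading cross-tabulation, a routine of this kind returns the "stretched" correlation analogous to the macro output discussed next.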

Polychoric Correlation: SPSS


SPSS does not have a menu-driven option to estimate polychoric correlations.
Fortunately, however, Lorenzo-Seva and Ferrando (2015) published an SPSS macro that can
calculate polychoric correlations on two or more variables, simultaneously. When I ran the
SPSS polychoric macro, I obtained the following output (Watch Video 5.19: Polychoric
Correlation in SPSS):

Thus, the correlation between SES and reading ability was estimated at .415, which
is slightly larger than the correlation coefficient based on Spearman’s correlation (i.e., .398).
In practice, you can expect the polychoric correlation to be larger than the Spearman
correlation, so it may be considered the preferred option for those who wish to reject the null
hypothesis. Unfortunately, the SPSS macro does not report a p-value; however, there is an
option to estimate 95% confidence intervals (warning: be prepared to wait a significant
amount of time for the program to estimate the confidence intervals).


I’ll note that the Pearson correlation applied to these data came out at .393, which
is very comparable to both the Spearman and polychoric correlations, in this case. There are
certainly scenarios where larger deviations can be observed, so you would be wise to
consider the polychoric correlation, when you have limited information ordinal variables.

Tetrachoric Correlation
Researchers sometimes use the term tetrachoric correlation when the polychoric
correlation is applied to data measured on a dichotomous scale. The
reason is that the tetrachoric correlation was established first. However, the tetrachoric
correlation is limited to data that are dichotomous in nature. It does not work on data scored
on 3-point or 4-point ordinal scales. The polychoric correlation is a generalisation of the
tetrachoric correlation. That is, the polychoric correlation works equally well on
dichotomous data and 3-point and 4-point ordinal data. Consequently, the tetrachoric
correlation should be considered redundant these days.

Test the Difference Between Two Correlations


In some cases, researchers want to determine whether one independent variable is more
substantially correlated with a dependent variable than another independent variable is.
For example, Duckworth and Seligman (2005) wanted to know whether self-
discipline was a better correlate of academic performance (grade point average; GPA) than
intelligence (IQ). To test the hypothesis, they collected data from 164 adolescents in the 8th
grade. Specifically, they administered a collection of self-discipline measures and an
intelligence test. They obtained the students’ GPA scores from the school at the end of the
academic year. The correlation between IQ and GPA was estimated at r = .32, p < .001. The
correlation between self-discipline and GPA was estimated at r = .67, p < .001. Although
some people might simply state that a correlation of r = .67 is larger than a correlation of r
= .32, statistically, it is inappropriate to do so. It is inappropriate, because it is possible that
the numerical difference between .32 and .67 arose simply by chance. Consequently, the
difference between the two correlations (Δr =|.35|) must be tested for statistical
significance.
Meng, Rosenthal & Rubin (1992) developed a method to test the difference
between two correlations, where the two correlations share the same dependent variable.
In the Duckworth et al. (2005) study, the dependent variable was GPA for both correlations.
Thus, the Meng et al. (1992) procedure would be appropriate, here. Unfortunately, most
statistical programs do not include easily accessible utilities to test the difference between
two correlations. Fortunately, however, some syntax has been created by IBM/SPSS to
employ the Meng et al. (1992) method. I have modified the syntax slightly to help enhance
the clarity of the output. The syntax can be found here: (https://tinyurl.com/j4ejja2). When
I tested the difference between the .32 and .67 correlations with the syntax, based on a
sample size of 164, I obtained the following results (Watch Video 5.20: Test the difference
between two correlations in SPSS).

As can be seen in the output above, the numerical difference between the two
correlations was equal to Δr = -.35. Whether the difference in the correlations is negative or
positive is arbitrary, as it simply depends on which correlation you identify as independent
variable 1. The 95% confidence intervals associated with the difference in the correlations
were estimated at r = -.603 and r = -.255. Thus, there is a 95% chance that the difference in
the correlations at the population level is somewhere between -.603 and -.255. The
statistical test of the difference in the correlations is based on the z-distribution. In this
example, the z-value was calculated at -4.294, which was statistically significant, p < .001.
The output includes 1-tailed and 2-tailed p-values. Personally, I don’t believe in the validity
of 1-tailed p-value testing. Consequently, I recommend that you consult the 2-tailed p-value
only. Either way, in this research scenario, self-discipline was found to be a statistically
significantly stronger correlate of GPA than IQ.
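For readers who want to see the computation, below is a sketch of the Meng et al. (1992) z-test in Python. Note one extra ingredient: the test requires the correlation between the two independent variables (IQ and self-discipline, here). The value of .13 used below is a hypothetical placeholder chosen purely for illustration (it happens to reproduce a z of about -4.29); it is not a figure taken from Duckworth and Seligman (2005).

import numpy as np
from scipy import stats

def meng_test(r1, r2, r_x, n):
    """Meng, Rosenthal, & Rubin (1992) test for two dependent correlations
    (r1 and r2) that share the same dependent variable; r_x is the
    correlation between the two independent variables."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    r2_bar = (r1**2 + r2**2) / 2
    f = min((1 - r_x) / (2 * (1 - r2_bar)), 1.0)  # f is capped at 1
    h = (1 - f * r2_bar) / (1 - r2_bar)
    z = (z1 - z2) * np.sqrt((n - 3) / (2 * (1 - r_x) * h))
    p = 2 * stats.norm.sf(abs(z))                 # two-tailed p-value
    return z, p

z, p = meng_test(0.32, 0.67, 0.13, 164)  # r_x = .13 is an assumed value
print(round(z, 2), p)                    # roughly z = -4.29, p < .001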
Although methods have been developed to test the difference between two
correlations with two different dependent variables (and two different independent
variables), the vast majority of research scenarios do not involve such variables.
Consequently, I do not discuss them here. I'll also note that the Meng et al. (1992) procedure
will work just fine if the research scenario involves a common independent variable and two
different dependent variables. One simply has to “flip” the variables around for the purposes
of the analysis.

Bayesian Analysis
In the Foundations section of this chapter, I argued that compelling statistical
support for the null hypothesis can be provided when one fails to reject the null
hypothesis (p > .05) and the power associated with the analysis was .95 or greater. With
respect to the years of experience and psychotherapy example, the null hypothesis was not
rejected, r = .118, p = .467. However, as the sample size was only 40, the power associated
with the analysis was only .11. Thus, there was only an 11% chance of rejecting the null
hypothesis, if it were in fact false in the population. Based on such results, I would not
suggest that the null hypothesis was supported.
An alternative approach to evaluating support for hypotheses is based on Bayes
factors. If you are interested to learn about the principles of the Bayesian approach, I
encourage you to read Jarosz and Wiley (2014) for an accessible introduction. Essentially,
Bayes factors provide an estimate of the likelihood that the data are better represented by
one hypothesis versus another hypothesis. In many cases, researchers are interested in
comparing the null hypothesis versus the alternative hypothesis. In the context of Bayesian
analyses, typically, the null hypothesis is represented by BF01 and the alternative hypothesis
is represented by BF10. A Bayesian analysis calculates the likelihood ratio for each of the BF01
and BF10 terms. The calculations involved in estimating the likelihood ratios are not terribly
complicated. However, I have created an Excel file that does the calculations in an
automated fashion for a bivariate correlation (Download here: https://tinyurl.com/hyjcyar).
All that needs to be inputted into the Excel file is the observed correlation and the sample
size (i.e., row H1; input N and R; for a bivariate correlation k = 2).
I performed a Bayesian analysis on the years of clinical experience and improvement
example data introduced in the Foundations section of this chapter. As can be seen below,
the Bayesian analysis showed that the data were 4.778 times more likely to occur under the
null model (BF01), in comparison to the alternative model.
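The BIC-based approximation described by Jarosz and Wiley (2014) can be computed in a couple of lines; as a check, the sketch below (not the Excel file itself) reproduces the 4.778 value from just the observed correlation and the sample size:

import numpy as np

r, n = 0.118, 40
# BIC difference: one-predictor model vs. intercept-only null model
delta_bic = n * np.log(1 - r**2) + np.log(n)
bf01 = np.exp(delta_bic / 2)  # evidence for the null over the alternative
print(round(bf01, 3))         # 4.778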

Raftery (1995) provided guidelines to interpret Bayes factors. I have listed the
guidelines in Table C5.7. Based on the guidelines, the Bayes factor of 4.778 can be
interpreted as positive support for the null hypothesis. In my view, one would need at least
‘strong’ support in favour of the null hypothesis to seriously consider the data more
consistent with the null hypothesis.

Table C5.7. Raftery's (1995) Guidelines for Interpreting Bayes Factors

Bayes factor      Interpretation
1 to 3            weak
3 to 20           positive
20 to 150         strong
> 150             very strong

I’ll note that there are programs and websites that can calculate Bayes factors for a
correlation/regression weight. Furthermore, those programs/websites may give results a
little different to the results provided by the Excel file I prepared. The reason the results may
differ is that some approaches to Bayesian analysis incorporate additional information (or
assumptions) that can affect the results. That does not necessarily imply that those
programs/websites yield results that are more (or less) accurate than what is reported by
the Excel file I prepared.


Correction for Small Sample Bias


It was mentioned in the Foundations section of this chapter that the sum of the
cross-products in the Pearson correlation formula needs to be divided by N – 1, rather than
N, because N – 1 yields more accurate estimates of the effect in the population. As it turns
out, dividing by N – 1 is not fully accurate. Although not well-known, it has been
established that the conventional Pearson correlation formula, with N – 1 in the
denominator, applied to a sample of data underestimates the correlation in the population
slightly (Olkin & Pratt, 1958). The underestimation can be corrected with the following
formula:
r* = r[1 + (1 − r2) / (2(N − 3))], for N > 7
Thus, for example, a Pearson correlation of r = .59 estimated from a sample of N = 11 would
yield a p-value of .056. Consequently, the null hypothesis could not be rejected. However,
once the Olkin and Pratt correction is applied, the estimated correlation is r* = .614, p = .044.
I have created an Excel function to estimate the corrected correlation, as well as the
corresponding t-value and p-value, if you’re desperate for a little more statistical power. The
Excel file with the function can be downloaded here: http://tinyurl.com/zkgapeg
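As an alternative to the Excel file, the correction and its associated t- and p-values can be sketched as follows (a Python illustration that reproduces the example above):

import numpy as np
from scipy import stats

def olkin_pratt(r, n):
    """Olkin-Pratt corrected correlation, with t and two-tailed p (df = n - 2)."""
    r_star = r * (1 + (1 - r**2) / (2 * (n - 3)))
    t = r_star * np.sqrt(n - 2) / np.sqrt(1 - r_star**2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return r_star, t, p

r_star, t, p = olkin_pratt(0.59, 11)
print(round(r_star, 3), round(p, 3))  # 0.614 0.044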

Correction for Range Restriction (in progress)


Practice Questions

1: How strongly related are procrastination and exam performance?


Procrastination affects a lot of people's lives negatively. Tice and Baumeister (1997) measured
procrastination among 57 university students to test the hypothesis that there is an
association between self-reported procrastination and exam performance. I have simulated
some data to correspond exactly to the results reported by Tice and Baumeister (1997). Test the null
hypothesis that there is no association between individual differences in self-reported
procrastination and exam performance in university (Data File: procrastination_exams)
(Watch Video 5.P1: Procrastination and Exams Pearson Correlation (Practice 1)).

2: Can people estimate how physically attractive they are accurately?


Everyone has an impression of how physically attractive they are. However, how
accurate are such impressions? In a personality and perceived physical attractiveness study,
Dunkel et al. (2016) analysed data that included self-reported physical attractiveness scores
and rater-reported physical attractiveness scores for 25- to 34-year-olds. The participants
responded to the following question: "How attractive are you?" on a 4-point scale: 1 = not
at all attractive, 2 = slightly attractive, 3 = moderately attractive, and 4 = very attractive. At the
end of the testing session, each participant was rated for attractiveness (“How physically
attractive is the respondent?”) by the research assistant on a 5-point scale: 1 = very
unattractive, 2 = unattractive, 3 = about average, 4 = attractive, and 5 = very attractive. I
have simulated some data to correspond very closely to the results reported by Dunkel et
al. (2016). For the sake of this example, I have specified the sample size at 100. What is the
correlation between self-reported attractiveness and rater-reported attractiveness? Is the
correlation significant statistically? (Data File: self_other_attractiveness) (Watch Video
5.P2: Self-Rated and Other Rated Attractiveness Spearman Correlation (Practice 2))

3: Does goal-setting relate to university performance?


Cetin (2015) was interested in individual differences in goal-setting behaviour and
performance in university. To investigate the hypothesized association between the two
variables, Cetin (2015) collected goal-setting and grade point average (GPA) data from 166
university students. Goal-setting was measured with a subscale (interval level of
measurement) from the Academic Self-Regulated Learning Scale (Magno, 2010). Student
GPA was measured via self-report. I have simulated some data to correspond to the results
found in Cetin (2015). 7 Test the null hypothesis that there is no association between self-
reported goal-setting behaviour and self-reported academic performance (GPA) (Data File:

7 Cetin's (2015) GPA data were range restricted (GPA SD = .41). Consequently, I simulated the data such that it would be consistent with a GPA standard deviation of .71, rather than .41.

goal_setting_gpa) (Watch Video 5.P3: Goal Setting and University GPA Pearson Correlation
(Practice 3)).

4: Can students be trusted to report their own grade point average (GPA)?
Researchers often rely upon participants to self-report a particular piece of
information. They mostly assume that information is accurate. Cassady (2001) asked 89
university students to report their GPA. He then obtained the actual GPA of each student
from official university records. Test the null hypothesis that there is no association between
self-reported GPA and actual GPA (Data File: self_vs_records_gpa) (Watch Video 5.P4: Self-
Reported vs Actual GPA Pearson Correlation (Practice 4)).

5: Are extreme bodybuilders narcissists?


It’s seems logical to think that people essentially obsessed with their looks may tend
to be narcissistic. Muscle dysmorphia is a condition characterised by an intense
preoccupation with increasing one’s muscle mass and muscularity. Collis, Lewis and Crisp
(2016) were interested to know whether individual differences in muscle dysmorphia were
related to individual differences in narcissism. To this effect, they collected data from 117
males who were mostly weight trainers. Muscle dysmorphia was measured with the 25-item
Muscle Dysmorphia Inventory. Narcissism was measured with the 40-item Narcissistic
Personality Characteristic-40. Test the null hypothesis that the correlation between muscle
dysmorphia and narcissism is zero (Data File: dysmorphia_narcissism) (Watch Video 5.P5:
Muscle Dysmorphia and Narcissism Pearson Correlation (Practice 5)).

6: Can teachers be judged in 30 seconds?


Ambady and Rosenthal (1993) had 13 university teachers (6 women) videotaped
while they were teaching sections of an undergraduate course. Then, Ambady and Rosenthal
(1993) prepared 30 second video clips of each teacher in action. The 30 second video clips
were viewed and rated by nine strangers on 15 dimensions (e.g., warm, competent,
professional, etc.). An overall ‘global’ teaching variable was created by averaging the ratings
across the 15 dimensions. Finally, Ambady and Rosenthal (1993) obtained each of the 13
teachers' ratings from their own students at the end of the semester.
Estimate the correlation between stranger ratings and student ratings. I have simulated
some data to correspond very closely to the results reported by Ambady and Rosenthal. The
data can be found in the table below (Watch Video 5.P6: Judging Teachers Pearson
Correlation in SPSS (Practice 6)).


ID Stranger Student
1 2.28 3.22
2 1.50 1.18
3 1.38 1.97
4 2.98 4.02
5 4.71 4.82
6 3.19 3.78
7 2.18 1.34
8 3.52 3.33
9 1.00 1.28
10 4.42 2.45
11 2.56 2.64
12 1.60 1.86
13 1.21 2.05

Advanced Practice Questions

7: Which hemisphere is more substantially involved in the experience of fear?


An area of the human brain known as the amygdala has been implicated in the
experience of fear. To test this hypothesis, Phelps et al. (2001) measured brain activation in
12 adults who were exposed to the threat of a fearful experience (i.e., electric shock). Phelps
et al. (2001) were particularly interested in evaluating whether the left-hemisphere versus
the right-hemisphere were differentially activated during the threat of a fearful experience.
Test the hypothesis that the threat of a fearful experience (measured on a continuous scale)
was related to left-hemisphere amygdala activation (measured on a continuous scale).
Secondly, test the hypothesis that the threat of a fearful experience (measured on a
continuous scale) was related to right-hemisphere amygdala activation (measured on a
continuous scale). Finally, test the null hypothesis that the threat of fear by left-hemisphere
amygdala activation correlation and the threat by right-hemisphere amygdala correlation
were equal. (Data File: amygdala_fear) (Watch Video 5.P7: Test the Difference Between
Two Correlations (Practice 8))


References
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.
Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399.
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23–28.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Collis, N., Lewis, V., & Crisp, D. (2016). When is buff enough? The effect of body attitudes and narcissistic traits on muscle dysmorphia. The Journal of Men's Studies, 24(2), 213–225.
Cousineau, D. (2011). Randomization test of mean is computationally inaccessible when the number of groups exceeds two. Tutorials in Quantitative Methods for Psychology, 7(1), 15–18.
Dunkel, C. S., Nedelec, J. L., van der Linden, D., et al. (2016). Physical attractiveness and the general factor of personality. Adaptive Human Behavior and Physiology. doi:10.1007/s40750-016-0055-7
Edgell, S., & Noon, S. (1984). Effect of violation of normality on the t test of the correlation coefficient. Psychological Bulletin, 95, 576–583.
Havlicek, L., & Peterson, N. (1977). Effect of the violation of assumptions upon significance levels of the Pearson r. Psychological Bulletin, 84, 373–377.
Hayes, A. F. (1998). SPSS procedures for approximate randomization tests. Behavior Research Methods, Instruments, & Computers, 30(3), 536–543.
Jarosz, A. F., & Wiley, J. (2014). What are the odds? A practical guide to computing and reporting Bayes factors. The Journal of Problem Solving, 7(1), 2.
Lamont, L. M., & Lundstrom, W. J. (1977). Identifying successful industrial salesmen by personality and personal characteristics. Journal of Marketing Research, 517–529.
O'Brien, R. M., & Homer, P. (1987). Corrections for coarsely categorized measures: LISREL's polyserial and polychoric correlations. Quality and Quantity, 21(4), 349–360.
Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.
Tice, D. M., & Baumeister, R. F. (1997). Longitudinal study of procrastination, performance, stress, and health: The costs and benefits of dawdling. Psychological Science, 8(6), 454–458.
Weaver, B., & Koopman, R. (2014). An SPSS macro to compute confidence intervals for Pearson's correlation. Quantitative Methods for Psychology, 10(1), 29–39.


Zagorsky, J. L. (2007). Do you have to be smart to be rich? The impact of IQ on wealth, income and financial distress. Intelligence, 35(5), 489–501.
