
Pearson Correlation

Introduction
Correlation involving two variables is sometimes referred to as bivariate correlation. Bivariate
correlation involves only one group, but two different variables are gathered from each participant, and, of
course, the comparison needs to inherently make sense. Whereas it is reasonable to consider the correlation
between the amount of time a student spent taking an exam and the grade on that exam, it makes little sense to
assess the correlation between shoe size and exam grade, even though shoe size is a continuous variable.

The correlation value, also termed the correlation coefficient, is notated using a lowercase r and has a
value between −1 and +1. Correlations have two primary attributes: direction and strength. (The notation r
denotes a sample correlation coefficient; ρ denotes the population correlation coefficient.)

Direction is indicated by the sign of the r value: − or +. Positive correlations (r = 0 to +1) emerge
when the two variables move in the same direction. For example, we would expect that low homework hours
would correlate with low grades, and high homework hours would correlate with high grades. Negative
correlations (r = −1 to 0) emerge when the two variables move in different directions. For example, we
would expect that high alcohol consumption would correlate with low grades, just as we would expect that
low alcohol consumption would correlate with high grades.

Strength is indicated by the numeric value. A correlation in which r is close to 0 is considered
weaker than one in which r is nearer to −1 or +1. Continuing with the prior example, we would expect to find a strong
positive correlation between homework hours and grade (e.g., r = +.80); conversely, we would expect to
find a strong negative correlation between alcohol consumption and grade (e.g., r = −.80). However, we
would not expect that a variable such as height would have much to do with academic performance, and
hence we would expect to find a relatively weak correlation between height and grade (e.g., r = +.02 or r =
−.02).

Pearson correlation, also known as Pearson product-moment correlation, is used to determine the
strength and direction of a linear relationship between two continuous variables. It is most often used to
analyse the results of two types of study design: (a) to determine whether there is a relationship between two
variables; and (b) to determine whether changes in one variable are associated with changes in the other.
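
For readers who want to see the computation itself, the sample coefficient for paired values (x, y) is given by the standard formula

    r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]

As a minimal sketch outside SPSS, the same value can be obtained with Python's scipy library; the homework-hours and grade values below are hypothetical and purely illustrative, not from the example data set:

    import numpy as np
    from scipy import stats

    # Hypothetical paired data: homework hours and exam grades (illustrative values only)
    homework_hours = np.array([1, 2, 3, 5, 6, 8, 9, 10])
    grades = np.array([55, 60, 62, 70, 72, 80, 85, 88])

    # pearsonr returns the sample correlation coefficient r and a two-tailed p value
    r, p = stats.pearsonr(homework_hours, grades)
    print(f"r = {r:.3f}, p = {p:.4f}")   # r near +1 indicates a strong positive (direct) relationship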

In order to run a Pearson correlation, there are five assumptions that need to be considered. The first
two relate to your choice of study design and the measurements you chose to make, while the other three
relate to how your data fits the Pearson correlation model. These assumptions are as follows.

Basic Requirements of Pearson Correlation:


1. Your two variables should be measured on a continuous scale (i.e., they are measured at the interval
or ratio level). Examples of continuous variables include review time (measured in hours),
intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight
(measured in kg), and so forth.
2. Your two continuous variables should be paired, which means that each case (e.g., each participant)
has two values: one for each variable. For example, imagine that you had collected the review times
(measured in hours) and exam results (measured from 0 to 100) from 100 randomly sampled students
at a university (i.e., you have two continuous variables: "review time" and "exam performance"). Each
of the 100 students would have a value for revision time (e.g., "student #1" studied for "23 hours")
and exam performance (e.g., "student #1" scored "81 out of 100"). Therefore, you would have 100
paired values.
3. Normality (in each of the two variables involved)
4. Linearity – there should be a linear relationship between your two continuous variables as evident in
the scatterplot.
5. Homoscedasticity – this pertains to the density of the points (which represent pairs of values of the two
variables) along the regression line; the cloud of points should be densest in the middle and taper off at the
ends, as evident in the scatterplot.

As with normality, linearity and homoscedasticity can also be checked graphically. The
following scatterplots with trend lines show when linearity (or homoscedasticity) is satisfied and when it is
not. The points in the scatterplots represent the paired values (x, y) for the two variables; one variable is
plotted along the x-axis and the other is plotted along the y-axis.

Linearity satisfied – The points on the scatterplot should form a relatively straight line; the regression line should take a middle-of-the-road path through the cloud of points.

Linearity violated – If the overall shape of the points departs into some other shape that is not conducive to drawing a straight (regression) line through it, this constitutes a violation of the linearity assumption.

Homoscedasticity satisfied – The criterion of homoscedasticity is satisfied when the cloud of points is densest in the middle and tapers off at the ends.

Homoscedasticity violated – The criterion of homoscedasticity is violated if the points are, instead of being densest in the middle, concentrated in some other way.

The last three assumptions can be checked by exploring the data set. The normality of the data
can be tested through the Explore function, while linearity and homoscedasticity can be verified
from the results produced by the actual correlation test run. In cases where these three criteria are not
satisfied for the Pearson correlation test, the Spearman correlation test, which is conceptually similar to the
Pearson correlation test, is the better option.
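
As a rough illustration of that decision logic (this Python sketch is not part of the SPSS workflow described below, and the time and grade arrays are hypothetical), Shapiro-Wilk can be run on each variable and the analysis can fall back to Spearman when normality is in doubt; linearity and homoscedasticity would still be judged from the scatterplot:

    import numpy as np
    from scipy import stats

    # Hypothetical paired measurements (illustrative values only)
    time = np.array([45, 60, 72, 80, 85, 90, 95, 100, 105, 110])   # minutes taking the exam
    grade = np.array([58, 64, 70, 72, 75, 80, 82, 85, 88, 90])     # grade on the exam

    # Shapiro-Wilk normality test on each variable (p > .05 suggests approximate normality)
    normal_time = stats.shapiro(time).pvalue > 0.05
    normal_grade = stats.shapiro(grade).pvalue > 0.05

    if normal_time and normal_grade:
        r, p = stats.pearsonr(time, grade)      # parametric option
        test_used = "Pearson"
    else:
        r, p = stats.spearmanr(time, grade)     # conceptually similar non-parametric fallback
        test_used = "Spearman"

    print(f"{test_used} correlation: r = {r:.3f}, p = {p:.4f}")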

To illustrate how to run the Pearson correlation, let us consider the following example.

Example
An instructor wants to determine if there is a relationship between how long a student spends taking a
final exam (2 hours are allotted) and his or her grade on the exam (students are free to depart upon
completion). The instructor briefs the students that they are welcome to quietly leave the room upon
completing the exam. At the start of the exam, the instructor will start a stopwatch. When each student hands
in his or her exam, the instructor refers to the stopwatch and records the time (in minutes) on the back of
each exam. Is there a correlation between how long it takes for a student to complete an exam and the grade
on that exam?

Data Set
Open the data set: Pearson correlation test.

What test is appropriate? (based on research study design)


In the given case, the variables are time (in minutes) taking the exam and the grade on the exam,
which are both measured on a continuous scale. There are 30 individuals involved, with paired measurements.
Thus, based on the study design (considering assumptions 1 and 2), we can use the Pearson correlation.

Hypotheses
H0: There is no significant relationship between the length of time spent taking the exam and the
grade on the exam.
H1: There is a significant relationship between the length of time spent taking the exam and the
grade on the exam.

Checking for DATA ASSUMPTIONS for Pearson Correlation


We proceed in checking for the last three assumptions by exploring the data set.

Checking for Normality


SPSS Procedure:
1. SPSS Syntax: Analyze > Descriptive Statistics > Explore (a dialogue box appears)
2. Select and transfer the variables, time and grade, into the Dependent List box using
the arrow button (or drag-and-drop the variables into the boxes), as shown in Figure 1.
3. Click the Plots button, then tick Histogram and ‘Normality plots with tests’ (see Figure 2).
4. Click on the Continue button.
5. Click on the OK button. This will generate the output.

Figure 1 Figure 2

Figure 3    Figure 4
Obtained from the generated SPSS output are Figures 3 and 4. In Figure 3, you will notice that the
box plots, as well as the histograms, show that the data distributions of both variables, time and grade,
are approximately normal (and there are also no significant outliers). This can be verified using the
result of the normality test presented in Figure 4. The Shapiro-Wilk test confirms that the data
for both variables follow a normal distribution, as indicated by the obtained p values (0.464 and 0.392),
which are both higher than the 0.05 level of significance.

Checking for Linearity and Homoscedasticity in the Pearson Correlation Test Run
The test run for the Pearson correlation involves two parts: First, we will graph the paired variables
using a scatterplot, which will provide a clear graph showing the paired points from both variables on a
chart along with the regression line, sometimes referred to as a trend line, which can be thought of as the
average pathway through the points. Next, we will produce the correlation table, which will render the
correlation value (r) and the corresponding p value.
In order to check whether or not the linearity and homoscedasticity assumptions are satisfied, we
need to graph the paired variables using a scatterplot with a regression line.

SPSS Procedure: (Part 1 – generating the scatterplot)


1. SPSS Syntax: Graphs > Chart Builder (see Figure 5)
2. A dialogue window (Figure 6) will appear. Click on the OK button.
3. You will be presented with the Chart Builder window (Figure 7). Select "Scatter/Dot" (by clicking
on it) from the Choose from box in the bottom-left-hand corner of the Chart Builder window, as
shown in Figure 8.
4. Selecting "Scatter/Dot" will present eight different scatter/dot options in the lower-middle section of
the Chart Builder window. Drag-and-drop the top left-hand option (you will see it labelled as
"Simple Scatter" if you hover your mouse over the box) into the main chart preview pane, as shown
in Figure 9.
5. You will be presented with an additional window, the Element Properties window. Shown in the main
chart preview pane is a simple scatterplot with boxes for the y-axis ("Y-Axis?") and x-axis
("X-Axis?") for you to populate with the appropriate variables. Drag-and-drop the variable time
from the Variables box into the "X-Axis?" box in the main chart preview screen, and do the same for
the variable grade, but into the "Y-Axis?" box. You should end up with a screen like in Figure 10.
(You can ignore the box labelled "Filter?".)
NOTE: Although the Pearson correlation does not make any distinction between dependent and independent
variables, it is still customary to lay out the graph's x- and y-axes in such a way. For example, this study is
assuming that time in taking the test affects the grade in the exam, not the other way around. Therefore, time in
taking the test goes on the x-axis and grade in the exam on the y-axis, even though the Pearson correlation
does not make this distinction.

6. Click on "Y-Axis1 (Point1)" in the Edit Properties of box located in the Element Properties
window. Uncheck the Minimum option in the –Scale Range– area so that the Custom box is
highlighted and has a value of 0 (zero), as shown in Figure 11. (Unchecking the Minimum option is
intended to make the y−¿axis show a suitable range of values for the variable time.)
7. Click on the button in the Element Properties window to confirm these changes.
8. Click on the button in the Chart Builder window to generate the scatterplot (Figure 12).
9. When the scatterplot emerges, you will need to produce the regression line. In the Output panel,
double-click on the scatterplot. This will bring you to the Chart Editor window (see Figure 13).
10. Click the “Add Fit Line at Total” icon (emphasized in Figure 13) to include the regression line on
the scatterplot. (Just ignore the Properties dialogue box that will appear on your screen.)

11. When you see the regression line emerge on the scatterplot, close the Chart Editor, and you will
see that the regression line is now included on the scatterplot in the Output panel (see Figure 14).
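
For comparison, and strictly as a sketch outside SPSS, the same scatterplot with a regression (trend) line can be produced in Python with matplotlib and numpy; the time and grade arrays below are hypothetical stand-ins for the data set:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical paired data (illustrative values only)
    time = np.array([45, 60, 72, 80, 85, 90, 95, 100, 105, 110])   # minutes taking the exam
    grade = np.array([58, 64, 70, 72, 75, 80, 82, 85, 88, 90])     # grade on the exam

    plt.scatter(time, grade)

    # Fit a simple least-squares line and draw it as the regression (trend) line
    slope, intercept = np.polyfit(time, grade, 1)
    x_line = np.linspace(time.min(), time.max(), 100)
    plt.plot(x_line, slope * x_line + intercept)

    plt.xlabel("Time taking the exam (minutes)")
    plt.ylabel("Grade on the exam")
    plt.title("Scatterplot with regression line")
    plt.show()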

Figure 5

Figure 6

Figure 7 Figure 8 Figure 9

Figure 10

Figure 11    Figure 12

Figure 13    Figure 14

The coordinates of each point on the scatterplot in Figure 14 are derived from the two variables:
time and grade for each individual. The 30 paired values in the data set are shown in the scatterplot as
30 points/dots. For this example, you can conclude from visual inspection of the scatterplot in Figure 14
that there is a linear relationship between time spent in taking the test and grade in the exam. Thus, the
linearity criterion is satisfied. 
In this example, the linear relationship between our variables is positive (the line rises to the
right); that is, as the value of time spent taking the test increases, so does the grade in the exam. However,
when testing your own data, you might discover a negative relationship (i.e., as the value of one variable
increases, the value of the other variable decreases; the line falls to the right). You might also find that
your line/relationship is steeper or shallower than the line/relationship in this particular example.
However, for assessing linearity, all that matters is whether or not the relationship is linear (i.e., a
straight line) in order to proceed.
Moreover, of the 30 points in the scatterplot in Figure 14, most are located at the
middle portion of the line, while some are plotted towards the ends of the line. This means that
the homoscedasticity criterion is satisfied.
Hence, all the assumptions for a Pearson correlation are met. So, we can proceed to part 2.

SPSS Procedure: (Part 2 – correlation test)


1. SPSS Syntax: Analyze > Correlate > Bivariate (see Figure 15)
2. On the Bivariate Correlations window (Figure 16), move the time and grade variables from
the left panel to the right (Variables) panel. Make sure that the Pearson checkbox is checked.
3. Click on the OK button, and the correlation table will be generated (see Figure 17).
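
Outside SPSS, the same correlation table boils down to a single call; the sketch below again uses the hypothetical time and grade arrays from the earlier sketches together with Python's scipy, so the numbers it prints will not match Figure 17:

    import numpy as np
    from scipy import stats

    # Hypothetical paired data (illustrative values only)
    time = np.array([45, 60, 72, 80, 85, 90, 95, 100, 105, 110])
    grade = np.array([58, 64, 70, 72, 75, 80, 82, 85, 88, 90])

    # pearsonr returns the correlation coefficient r and the two-tailed p value
    r, p = stats.pearsonr(time, grade)
    print(f"r = {r:.3f}, p = {p:.4f}")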

Figure 15    Figure 16

Figure 17
Interpreting the Results of Pearson Correlation
The magnitude of the Pearson correlation coefficient determines the strength of the correlation.
Although there are no hard-and-fast rules for assigning strength of association to particular values, we
usually use the following table:

The correlation table in Figure 17 shows a strong positive correlation (r = .815) between time
and grade. The positive correlation corresponds to the positive slope (rising to the right) of the regression
line.
We can understand the correlation coefficient even better through the coefficient of determination.
The coefficient of determination is the proportion of variance in one variable that is "explained" by the other
variable and is calculated as the square of the correlation coefficient (r²). In this example, we have a
coefficient of determination of r² = .815² ≈ .664. This can also be expressed as a percentage, which is
approximately 66%. Remember that "explained" here refers to being explained statistically, not causally.
You could report this as: the time spent taking the test statistically explained 66% of the variability in the
grade in the exam.
Figure 17 also shows that the probability value for the test (p value) is .000. Despite the “.000”
that is presented in the output, the p value never really reaches zero, but it is definitely less than 0.0005.
Since the p value is less than the .05 level of significance, and the correlation coefficient r is greater than
zero, the null hypothesis is rejected. We would say that there is a statistically significant positive
correlation (p < .0005, α = .05) between time and grade. (Figure 17 shows that the relationship is also
statistically significant at α = .01.) The degrees of freedom are n − 2 = 28.
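
The interpretation above can be condensed into a few lines of arithmetic; this Python sketch simply re-derives the coefficient of determination, the degrees of freedom, and the decision from the values reported in the output (the p value is taken as its upper bound of .0005, since SPSS prints ".000"):

    # Values reported in the SPSS output above
    r = 0.815
    n = 30
    alpha = 0.05

    r_squared = r ** 2        # coefficient of determination: 0.815**2 ≈ 0.664, i.e. about 66%
    df = n - 2                # degrees of freedom for the test: 28
    p_upper_bound = 0.0005    # SPSS prints ".000", meaning p < .0005

    reject_null = p_upper_bound < alpha   # True: the correlation is statistically significant
    print(r_squared, df, reject_null)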

Reporting the Results

A Pearson's product-moment correlation was run to assess the relationship between
time in taking the test and grade in the exam of 30 individuals.
Preliminary analyses showed the relationship to be linear with both variables normally
distributed, as assessed by Shapiro-Wilk's test ( p > .05). The homoscedasticity criterion was
also satisfied as evident in the scatterplot.
There was a statistically significant, strong positive correlation between time in taking
the test and grade in the exam, r(28) = .815, p < .0005, with time spent taking the test
explaining 66% of the variation in the grade in the exam.
