Correlation Analysis
Variable Y \ X    Quantitative X    Ordinal X      Nominal X
Quantitative Y    Pearson r         Biserial rb    Point biserial rpb
2. What is the range of values that a correlation coefficient may take? How is a particular range of values of the
correlation coefficient interpreted?
The main result of a correlation is called the correlation coefficient. It ranges from -1.0 to +1.0. The closer r is to
+1 or -1, the more closely the two variables are related.
If r is close to 0, there is little or no linear relationship between the variables. If r is positive, then as one
variable gets larger the other tends to get larger. If r is negative, then as one gets larger the other tends to get smaller (often
called an "inverse" correlation).
While correlation coefficients are normally reported as r, squaring them makes them easier to understand. The
square of the coefficient, expressed as a percentage, is the percent of the variation in one variable that is related to the variation in the other.
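As a small illustration of the squaring step, the sketch below converts a few values of r into the percent of shared variation. The function name shared_variance_percent and the sample values of r are illustrative only and do not come from the text.

```python
def shared_variance_percent(r: float) -> float:
    """Return r squared, expressed as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return 100.0 * r * r


# Hypothetical values of r, chosen only to show how quickly r^2 shrinks.
for r in (0.3, -0.5, 0.8):
    print(f"r = {r:+.1f} -> about {shared_variance_percent(r):.0f}% shared variation")
```

Note that the sign of r is lost when it is squared, so r and r squared are best reported together.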
A correlation report can also show a second result of each test - statistical significance. In this case, the
significance level will tell you how likely it is that the correlations reported may be due to chance in the form of random
sampling error. If you are working with small sample sizes, choose a report format that includes the significance level.
This format also reports the sample size.
3. For each correlation coefficient, provide a description and an illustrative example to show its appropriateness and
how it can be computed.
a. Pearson Product-Moment Correlation
Formula
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
Example
The following table shows the grades obtained by six students in Algebra and
Trigonometry. Compute the Pearson product-moment correlation coefficient.

Student No.     1    2    3    4    5    6
Algebra        83   78   94   90   88   88
Trigonometry   82   83   93   94   84   86

To solve for the correlation coefficient, some values in the formula must be obtained.

    x        y        x^2        y^2        xy
   83       82       6889       6724       6806
   78       83       6084       6889       6474
   94       93       8836       8649       8742
   90       94       8100       8836       8460
   88       84       7744       7056       7392
   88       86       7744       7396       7568
Σx=521   Σy=522   Σx^2=45397  Σy^2=45550  Σxy=45442
Computation:
\[ r = \frac{(6)(45442) - (521)(522)}{\sqrt{\left[(6)(45397) - (521)^2\right]\left[(6)(45550) - (522)^2\right]}} = \frac{690}{\sqrt{(941)(816)}} \approx 0.79 \]
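The computation above can be checked with a short script. This is a minimal sketch that applies the raw-score formula to the Algebra and Trigonometry grades; the function name pearson_r is illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from the raw-score (computational) formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


algebra = [83, 78, 94, 90, 88, 88]
trigonometry = [82, 83, 93, 94, 84, 86]

r = pearson_r(algebra, trigonometry)
print(f"r = {r:.2f}")
```

The numerator works out to 690 and the two bracketed terms to 941 and 816, so the script evaluates 690 divided by the square root of 941 times 816.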
b. Phi-coefficient
Description
The phi coefficient is a measure of the degree of association between two binary or dichotomous variables. This measure is
interpreted in the same way as the Pearson correlation coefficient: it was also introduced by Karl Pearson, and it equals Pearson's r computed on two variables coded 0 and 1.
Formula
\[ \phi = \frac{ad - bc}{\sqrt{efgh}} \]

         X-    X+    Total
Y-       a     b     e
Y+       c     d     f
Total    g     h     n

Interpretation
Phi compares the product of the diagonal cells (a*d) to the product of the
off-diagonal cells (b*c). The denominator is an adjustment that ensures
that Phi is always between -1 and +1.
Example
The table below shows the first time driving test results of a sample of 200 individuals classified by gender and success or
failure in the examination. We wish to explore the association between the two variables, the null hypothesis being that there
is no relationship between gender and success/failure in driving test results.
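A minimal sketch of the phi computation follows, using the cell labels a, b, c, d from the 2x2 layout in the formula. The driving-test counts are not reproduced in the text, so the counts used here are hypothetical and serve only to show the arithmetic; the function name phi_coefficient is also illustrative.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    e, f = a + b, c + d      # row totals
    g, h = a + c, b + d      # column totals
    denominator = sqrt(e * f * g * h)
    if denominator == 0:
        raise ValueError("every row and column total must be nonzero")
    return (a * d - b * c) / denominator


# Hypothetical counts (NOT the driving-test data): 100 cases in total.
phi = phi_coefficient(a=30, b=20, c=10, d=40)
print(f"phi = {phi:.2f}")
```

When all cases fall on the main diagonal (b = c = 0) the function returns +1, and when all fall off the diagonal (a = d = 0) it returns -1, which matches the stated range.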
c. Point-Biserial Correlation
Description
The point biserial correlation coefficient (rpb) is a correlation coefficient used when
one variable is dichotomous; Y can either be "naturally" dichotomous, like
gender, or an artificially dichotomized variable. In most situations it is not
advisable to artificially dichotomize variables.
Formula
To calculate rpb, assume that the dichotomous variable Y has the two values 0 and 1. If
we divide the data set into two groups, group 1 which received the value "1" on Y and
group 2 which received the value "0" on Y, then the point-biserial correlation
coefficient (for a population) is calculated as follows:
\[ r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{\frac{n_1 n_0}{n^2}} \]
Where s_n is the standard deviation used when you have data for every member of the
population:
\[ s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]
M1 is the mean on the continuous variable X for all data points in group 1, and M0
the mean on the continuous variable X for all data points in group 2.
n1 is the number of data points in group 1, n0 is the number of data points in group 2
and n is the total sample size. There is an equivalent formula that uses s_{n-1}:
\[ r_{pb} = \frac{M_1 - M_0}{s_{n-1}}\sqrt{\frac{n_1 n_0}{n(n-1)}} \]
where
\[ s_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]
Interpretation
Pett (1997) asserts that the same criteria for evaluating the coefficient of determination
in regard to standard correlation can be applied to rpb² because of the close relationship
between rpb and the Pearson r. The coefficient of determination in the form of rpb²,
therefore, is a useful index for drawing conclusions from the data.
Example
Next, the researcher collects a small sample of 18 participants for her study, gathering the
following information (Table 1):

Participant    Car Ownership    Use of Public Transportation
 1             No                3
 2             No               12
 3             No               10
 4             No               11
 5             No               12
 6             No               23
 7             No               14
 8             No                0
 9             No               16
10             Yes               0
11             Yes               2
12             Yes               1
13             Yes               0
14             Yes               3
15             Yes               4
16             Yes               0
17             Yes               0
18             Yes               1
The next step would be to code the responses Yes as 0 and No as 1, making vehicle
ownership into a numerically dichotomous variable. At first glance, this may seem
counterintuitive because we usually associate 0 with a negative response (no) and 1 with a positive
response (yes). However, because the researcher hypothesizes that not having
a car, rather than having a car, leads to an increase in public transportation use,
the researcher will code No responses as 1 and Yes responses as 0. Recall that the
researcher wants to know about lack of car ownership, not car ownership, so the
hypothesis is couched in terms of a positive relationship.
The correlation coefficient, 0.735, means that those who do not own cars tend to use
public transportation more.
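The coefficient of 0.735 can be reproduced from the 18 observations with the population form of the formula (the one that uses s_n). This is a minimal sketch; the function name point_biserial and the variable names are illustrative only.

```python
from math import sqrt


def point_biserial(x, y):
    """Point-biserial r for continuous x and 0/1-coded y (s_n form)."""
    n = len(x)
    group1 = [xi for xi, yi in zip(x, y) if yi == 1]
    group0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must contain at least one case")
    m1 = sum(group1) / n1
    m0 = sum(group0) / n0
    mean_x = sum(x) / n
    s_n = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)  # population SD
    return (m1 - m0) / s_n * sqrt(n1 * n0 / n ** 2)


# Public transportation use for participants 1 to 18 (Table 1).
usage = [3, 12, 10, 11, 12, 23, 14, 0, 16, 0, 2, 1, 0, 3, 4, 0, 0, 1]
# Car ownership coded as in the text: No = 1, Yes = 0.
no_car = [1] * 9 + [0] * 9

r_pb = point_biserial(usage, no_car)
print(f"r_pb = {r_pb:.3f}")
```

Here the group means are about 11.22 (no car) and 1.22 (car), and both groups have 9 members, so the square-root factor equals 0.5.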
d. Spearman Rank-Order Correlation
Description
A monotonic relationship is a relationship that does one of the following: (1) as the value
of one variable increases, so does the value of the other variable; or (2) as the value of
one variable increases, the other variable value decreases. A monotonic relationship is an
important underlying assumption of the Spearman rank-order correlation. It is also
important to recognize the assumption of a monotonic relationship is less restrictive than
a linear relationship.
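Formula
\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \]
Where d_i is the difference in the paired ranks and n is the number of cases.
The formula to use when there are tied ranks is:
\[ r_s = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}} \]
where x_i and y_i are the ranks of the paired observations.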
Interpretation
The Spearman correlation coefficient, rs, can take values from +1 to -1. An rs of +1
indicates a perfect association of ranks, an rs of zero indicates no association between
ranks and an rs of -1 indicates a perfect negative association of ranks. The closer rs is to
zero, the weaker the association between the ranks.
Example
The table which follows shows the scores of 10 high school students in an English and
a Filipino exam. Both were 40-item tests.

English           18   20   14   34   40   35    7   10   28   38
Filipino          27   30   25   36   38   29   24   22   35   40
English (rank)     7    6    8    4    1    3   10    9    5    2
Filipino (rank)    7    5    8    3    2    6    9   10    4    1
d                  0    1    0    1    1    3    1    1    1    1
d^2                0    1    0    1    1    9    1    1    1    1
\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(16)}{10(10^2 - 1)} = 1 - \frac{96}{990} = 1 - 0.097 = 0.90 \]
The Spearman rho value of 0.90 indicates a strong positive relationship between the two
variables.
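The ranking and the sum of squared rank differences can be checked with a short script. This is a minimal sketch that assumes no tied scores, as in the English and Filipino data, and ranks the highest score as 1; the function names rank_descending and spearman_rho are illustrative only.

```python
def rank_descending(values):
    """Rank 1 = largest value. Assumes there are no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need the tied-ranks formula instead")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman coefficient from the rank-difference formula (no ties)."""
    n = len(x)
    rank_x, rank_y = rank_descending(x), rank_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


english = [18, 20, 14, 34, 40, 35, 7, 10, 28, 38]
filipino = [27, 30, 25, 36, 38, 29, 24, 22, 35, 40]

print(rank_descending(english))
print(rank_descending(filipino))
rho = spearman_rho(english, filipino)
print(f"rho = {rho:.3f}")
```

With a sum of squared differences of 16 and n = 10, the script evaluates 1 - 96/990.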
e. Rank-Biserial Correlation
Formula
\[ r_{rb} = \frac{2(\bar{y}_1 - \bar{y}_0)}{n} \]
Where n is the number of data pairs, and \bar{y}_0 and \bar{y}_1 are the Y score means for data pairs
with an x score of 0 and 1 respectively. These Y scores are ranks and the formula
assumes no tied ranks are present.
Von Christopher G. Chua, LPT, MST
Example
The table shows the performances of 12 Grade 7 students in Science during the first
quarter of the school year.

Student No.    Sex    Grade    Rank
 1             M      82        8
 2             M      85        7
 3             M      87        5
 4             M      80       10
 5             M      90        2
 6             M      88        4
 7             F      79       11
 8             F      81        9
 9             F      95        1
10             F      86        6
11             F      89        3
12             F      73       12

With males coded x = 0 and females coded x = 1, the mean ranks are \bar{y}_0 = 6 and \bar{y}_1 = 7, and n = 12:
\[ r_{rb} = \frac{2(\bar{y}_1 - \bar{y}_0)}{n} = \frac{2(7 - 6)}{12} = \frac{2(1)}{12} = \frac{2}{12} = 0.17 \]
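The mean ranks and the coefficient can be verified with a few lines of code. This sketch assumes males are coded 0 and females 1, which is the coding that yields the mean ranks of 6 and 7 used in the computation; the function name rank_biserial is illustrative only.

```python
def rank_biserial(ranks, group):
    """Rank-biserial r: 2 * (mean rank of group 1 - mean rank of group 0) / n."""
    n = len(ranks)
    ranks1 = [r for r, g in zip(ranks, group) if g == 1]
    ranks0 = [r for r, g in zip(ranks, group) if g == 0]
    if not ranks1 or not ranks0:
        raise ValueError("both groups must contain at least one case")
    mean1 = sum(ranks1) / len(ranks1)
    mean0 = sum(ranks0) / len(ranks0)
    return 2 * (mean1 - mean0) / n


# Ranks of the Science grades for students 1 to 12 (rank 1 = highest grade).
ranks = [8, 7, 5, 10, 2, 4, 11, 9, 1, 6, 3, 12]
# Assumed coding: students 1-6 are male (0), students 7-12 are female (1).
sex = [0] * 6 + [1] * 6

r_rb = rank_biserial(ranks, sex)
print(f"r_rb = {r_rb:.2f}")
```

Because rank 1 marks the highest grade, a positive value here means the group coded 1 has the numerically larger (that is, lower-performing) mean rank.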
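f. Biserial Correlation
Formula
\[ r_b = (\bar{Y}_1 - \bar{Y}_0)\left(\frac{pq / Y}{\sigma_Y}\right) \]
Where \bar{Y}_0 and \bar{Y}_1 are the Y score means for the data pairs with an x score of 0 and 1,
respectively; q = 1 - p and p are the proportions of data pairs with x scores of 0 and 1, respectively;
\sigma_Y is the population standard deviation for the y data; and Y (in the term pq/Y) is the height of the
standardized normal distribution at the point z that divides the distribution into areas q and p.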
Example
The following data present the test scores in Math of eight college students together
with their anxiety level during the exam. A two-point scale was used to measure anxiety
level where 0 corresponds to relaxed and 1 to anxious.

Test Score       65   78   84   90   88   93   70   83
Anxiety Level     0    0    1    0    1    1    1    0

Here \bar{Y}_1 = 83.75, \bar{Y}_0 = 79, p = q = 0.5, \sigma_Y = 9.16, and the height of the standard normal
curve at z = 0 is Y = 0.399.
\[ r_b = (83.75 - 79)\left(\frac{(0.5)(0.5) / 0.399}{9.16}\right) = (4.75)\left(\frac{0.627}{9.16}\right) = (4.75)(0.0684) = 0.32 \]
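The biserial formula can be evaluated directly with Python's standard library, which provides the standard normal density and inverse distribution function through statistics.NormalDist. This is a minimal sketch: the function name biserial is illustrative only, and z is taken as the point with area q below it, which is the usual convention and an assumption about what the text intends. For p = q = 0.5 that point is z = 0, where the normal ordinate is about 0.399.

```python
from math import sqrt
from statistics import NormalDist


def biserial(y, x):
    """Biserial r for continuous y and 0/1-coded x."""
    n = len(y)
    y1 = [v for v, g in zip(y, x) if g == 1]
    y0 = [v for v, g in zip(y, x) if g == 0]
    if not y1 or not y0:
        raise ValueError("both groups must contain at least one case")
    p = len(y1) / n          # proportion of cases with x = 1
    q = 1 - p                # proportion of cases with x = 0
    mean1 = sum(y1) / len(y1)
    mean0 = sum(y0) / len(y0)
    mean_y = sum(y) / n
    sigma_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)  # population SD
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(q)       # point with area q below it
    ordinate = standard_normal.pdf(z)    # height of the curve at z
    return (mean1 - mean0) * (p * q / ordinate) / sigma_y


scores = [65, 78, 84, 90, 88, 93, 70, 83]
anxiety = [0, 0, 1, 0, 1, 1, 1, 0]

r_b = biserial(scores, anxiety)
print(f"r_b = {r_b:.2f}")
```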
g. Tetrachoric Coefficient
Description
The tetrachoric correlation, for binary data, and the polychoric correlation, for ordered-category
data, are excellent ways to measure rater agreement. They estimate what the
correlation between raters would be if ratings were made on a continuous scale; they are,
theoretically, invariant over changes in the number or "width" of rating categories. The
tetrachoric and polychoric correlations also provide a framework that allows testing of
marginal homogeneity between raters. Thus, these statistics let one separately assess both
components of rater agreement: agreement on trait definition and agreement on
definitions of specific categories.
The tetrachoric correlation coefficient, rtet, is used when both variables are dichotomous,
like the phi, but we need also to be able to assume both variables really are continuous
and normally distributed. Thus it is applied to ordinal vs. ordinal data which has this
characteristic. Ranks are discrete so in this manner it differs from the Spearman. The
formula involves a trigonometric function called cosine.
Formula
\[ r_{tet} = \cos\left(\frac{180^\circ}{1 + \sqrt{BC / AD}}\right) \]
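The cosine formula is straightforward to evaluate once the four cell counts are known. The sketch below implements the expression exactly as written, with the angle in degrees. The text does not define which cells of the 2x2 table A, B, C, and D refer to, so the labelling (and therefore the sign of the result) is left to the caller; the function name tetrachoric_cosine is illustrative only.

```python
from math import cos, radians, sqrt


def tetrachoric_cosine(a, b, c, d):
    """Evaluate cos(180 degrees / (1 + sqrt(BC / AD))) for four cell counts."""
    if a * d == 0:
        raise ValueError("the product A*D must be nonzero")
    angle_in_degrees = 180.0 / (1.0 + sqrt((b * c) / (a * d)))
    return cos(radians(angle_in_degrees))


# Equal cell products (BC = AD) give an angle of 90 degrees, so the result is 0.
print(round(tetrachoric_cosine(10, 10, 10, 10), 6))
```

As BC grows relative to AD the angle shrinks toward 0 degrees and the value approaches +1; as BC shrinks toward 0 the angle approaches 180 degrees and the value approaches -1.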
Example