Correlation
• Is caffeine related to heart damage?
• Is there a relationship between a person’s age and his/her blood pressure?
• Is relationship satisfaction related to commitment?
• These are only a few of the many questions that can be answered by using techniques of correlation and regression analysis.
Correlation is a statistical technique that is used to measure and describe the relationship between two variables. Usually the two variables are simply observed as they exist naturally in the environment; there is no attempt to control or manipulate the variables.
Sir Francis Galton (1822-1911)
Geographer, meteorologist, tropical explorer, inventor of fingerprint identification, eugenicist, half-cousin of Charles Darwin and best-selling author.
Galton
◼ Obsessed with measurement
◼ Tried to measure everything from the weather to female beauty
◼ Invented correlation and regression
Correlation
• Correlation is a topic that focuses on the direction and degree of the relationship.
• The direction of the relationship refers to whether the relationship is positive or negative.
• The degree of relationship refers to the magnitude or strength of the relationship. The degree of relationship can vary from nonexistent to perfect. When the relationship is perfect, correlation is at its highest and we can exactly predict from one variable to the other.
CORRELATION
• The degree of relationship between the variables under consideration is measured through correlational analysis.
• The measure of correlation is called the CORRELATION COEFFICIENT.
The Characteristics of a Relationship
1. The Direction of the Relationship
The sign of the correlation, positive or negative,
describes the direction of the relationship.
• In a positive correlation, the two variables tend to change in the same direction: as the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases.
• In a negative correlation, the two variables tend to go in opposite directions. As the X variable increases, the Y variable decreases. That is, it is an inverse relationship.
• A positive relationship indicates that there is a direct relationship between the variables.
• A negative relationship indicates that there is an inverse relationship between X and Y.
2. The Form of the Relationship
• The most common use of correlation is to measure straight-line relationships. However, other forms of relationships do exist, and there are special correlations used to measure them.
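3. The Strength of the Relationship
• The numerical value of the correlation, regardless of its sign, measures the degree (strength) of the relationship: a value of 1.00 or -1.00 indicates a perfect relationship, and values near 0 indicate a weak one.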
Correlation
• A scatter diagram (scatter plot) can be used to determine whether a linear (straight-line) correlation exists between two variables.
[Figure: Scatter diagram]
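Dependent vs. Independent Variable
DEPENDENT VARIABLE (DV): The variable that is being predicted or estimated. It is scaled on the Y-axis.
INDEPENDENT VARIABLE (IV): The variable that provides the basis for the estimate (the predictor variable). It is scaled on the X-axis.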
Types of Correlation
• Linear Correlation
Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of variables having a linear relationship will form a straight line.
• Nonlinear Correlation
The correlation is nonlinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
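To make the "constant ratio" definition concrete, the sketch below computes the ratio Δy/Δx between consecutive points for two made-up data sets. The function name `change_ratios` and both data sets are hypothetical, chosen only for illustration: a linear relationship gives the same ratio everywhere, while a curved one does not.

```python
# Sketch: check the "constant ratio of change" idea on two hypothetical data sets.
def change_ratios(xs, ys):
    """Return the ratio Δy/Δx between each pair of consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]  # hypothetical linear relationship: y = 2x + 1
curved = [x ** 2 for x in xs]     # hypothetical nonlinear relationship: y = x²

print(change_ratios(xs, linear))  # [2.0, 2.0, 2.0, 2.0] -> constant ratio, linear
print(change_ratios(xs, curved))  # [3.0, 5.0, 7.0, 9.0] -> changing ratio, nonlinear
```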
Example: Constructing a Scatter Plot
• A marketing manager conducted a study to determine whether there is a linear relationship between money spent on advertising and company sales. The data are shown in the table. Display the data in a scatter plot.
Advertising expenses (P1000), x    Company sales (P1000), y
2.4                                225
1.6                                184
2.0                                220
2.6                                240
1.4                                180
1.6                                184
2.0                                186
2.2                                215
Solution: Constructing a Scatter Plot
[Figure: scatter plot of advertising expenses x (P1000) against company sales y (P1000)]
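A scatter plot is normally drawn with a plotting tool (for example, `matplotlib.pyplot.scatter` in Python). As a dependency-free sketch, the hypothetical helper `text_scatter` below places each (x, y) pair from the advertising table on a character grid, with larger sales printed higher and larger expenses farther right. Two observations share the values (1.6, 184), so they land in the same cell and the 8 observations produce 7 stars.

```python
def text_scatter(xs, ys, width=25, height=13):
    """Render (x, y) points as '*' on a character grid; the top row holds the largest y."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the grid (column 0 = smallest x, row 0 = smallest y).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so that larger y is printed higher
    lines = ["|" + "".join(cells) for cells in grid]  # "|" stands in for the y-axis
    lines.append("+" + "-" * width)                    # bottom line is the x-axis
    return "\n".join(lines)


x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]  # advertising expenses (P1000)
y = [225, 184, 220, 240, 180, 184, 186, 215]  # company sales (P1000)
plot = text_scatter(x, y)
print(plot)
```

Even in this rough form, the points rise from lower left to upper right, which is what a positive linear correlation looks like.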
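The Pearson correlation coefficient compares how much X and Y vary together with how much they vary separately:
$r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}}$
• $r = \frac{\text{covariability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}}$
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$
where $SP = \sum (X - M_X)(Y - M_Y)$ is the sum of products of deviations, and $SS_X = \sum (X - M_X)^2$ and $SS_Y = \sum (Y - M_Y)^2$ are the sums of squared deviations.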
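X     Y     X-M_X   Y-M_Y   (X-M_X)²   (Y-M_Y)²   (X-M_X)(Y-M_Y)
0     2     -6      -2      36         4          12
10    6     4       2       16         4          8
4     2     -2      -2      4          4          4
8     4     2       0       4          0          0
8     6     2       2       4          4          4
M_X = 6, M_Y = 4            SS_X = 64  SS_Y = 16  SP = 28
$r = \frac{28}{\sqrt{(64)(16)}} = \frac{28}{32} = 0.875$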
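The hand calculation above can be checked with a short sketch that follows the definitional formula step by step: deviations from the means, then SP, SS_X and SS_Y, then r = SP / √(SS_X · SS_Y). The function name `pearson_from_deviations` is hypothetical.

```python
import math


def pearson_from_deviations(xs, ys):
    """Return SP, SS_X, SS_Y and r = SP / sqrt(SS_X * SS_Y) using deviation scores."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    dev_x = [x - m_x for x in xs]  # X - M_X
    dev_y = [y - m_y for y in ys]  # Y - M_Y
    sp = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # sum of products
    ss_x = sum(dx * dx for dx in dev_x)                # sum of squares for X
    ss_y = sum(dy * dy for dy in dev_y)                # sum of squares for Y
    return sp, ss_x, ss_y, sp / math.sqrt(ss_x * ss_y)


X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
sp, ss_x, ss_y, r = pearson_from_deviations(X, Y)
print(sp, ss_x, ss_y, r)  # 28.0 64.0 16.0 0.875
```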
What is the sum of products (SP) for the following data?
X    Y
2    4
5    2
3    5
2    5
A. 6
B. -5
C. 43
D. None of the 3 choices is correct
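One way to check an answer like this is to compute SP both from deviations and from the raw-score (computational) form SP = ΣXY − (ΣX)(ΣY)/n; the two are algebraically equal. The function names below are hypothetical. The sketch also shows that 43 is ΣXY before the (ΣX)(ΣY)/n correction is subtracted.

```python
def sp_definitional(xs, ys):
    """SP = Σ(X - M_X)(Y - M_Y)."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    return sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = ΣXY - (ΣX)(ΣY)/n, algebraically equal to the definitional formula."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


X = [2, 5, 3, 2]
Y = [4, 2, 5, 5]
print(sum(x * y for x, y in zip(X, Y)))               # ΣXY = 43
print(sp_definitional(X, Y), sp_computational(X, Y))  # -5.0 -5.0
```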
A set of n = 5 pairs of X and Y values has SS_X = 5, SS_Y = 20, and SP = 8. For these data, the Pearson correlation is _____.
A. 0.08
B. 0.80
C. 0.32
D. 0.40
Correlation Coefficient
• The range of the correlation coefficient is -1 to 1.
The Coefficient of Correlation, r
The coefficient of correlation (r) is a measure of the strength of the linear relationship between two variables. It requires interval- or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
Perfect Correlation
• A perfect relationship is one in which a positive or negative relationship exists and all of the points fall on the line. An imperfect relationship is one in which a relationship exists, but not all of the points fall on the line.
Correlation Coefficient - Interpretation
[Figure: interpretation of the correlation coefficient]
Example: Finding the Correlation Coefficient
• Calculate the correlation coefficient for the advertising expenditures and company sales data. What can you conclude?
Advertising expenses (P1000), x    Company sales (P1000), y
2.4                                225
1.6                                184
2.0                                220
2.6                                240
1.4                                180
1.6                                184
2.0                                186
2.2                                215
Finding the Correlation Coefficient
• Step 1: Make a table, as shown in Step 2.
• Step 2: Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.
x    y    xy    x²    y²
-    -    -     -     -
-    -    -     -     -
-    -    -     -     -
Σx =    Σy =    Σxy =    Σx² =    Σy² =
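Once the columns are summed, r can be found with the raw-score formula r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}, which is algebraically equivalent to SP / √(SS_X · SS_Y). A minimal sketch for the advertising data, assuming that formula:

```python
import math

x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]  # advertising expenses (P1000)
y = [225, 184, 220, 240, 180, 184, 186, 215]  # company sales (P1000)
n = len(x)

# Column totals from Step 2.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score form of the Pearson correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"Σx = {sum_x:.1f}, Σy = {sum_y}, Σxy = {sum_xy:.1f}, Σx² = {sum_x2:.2f}, Σy² = {sum_y2}")
# Σx = 15.8, Σy = 1634, Σxy = 3289.8, Σx² = 32.44, Σy² = 337558
print(f"r = {r:.3f}")  # r = 0.913
```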
HOW TO INTERPRET
How do we interpret a correlation of _____?
• First, it is positive, so we see there is a direct relationship between the advertising expenses and company sales.
• Second, the value is fairly close to 1.00, so we conclude that the association is strong.
Exercise:
• Find out if there is a relationship between the number of hours that a random sample of ten students studied for an examination and the grades the students received.
X    Y    X²    Y²    XY
8 56
5 44
11 79
13 72
10 70
5 54
18 94
15 85
2 33
8 65
Exercise
• Find out if there is a relationship between self-concept and level of anxiety among ten college students.
X    Y    X²    Y²    XY
16 78
27 55
33 43
15 80
22 49
17 75
20 80
28 50
14 76
12 79
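To check the column totals and r for both exercises, the sketch below uses the same raw-score formula, r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}. The function name `correlation_table` is hypothetical.

```python
import math


def correlation_table(xs, ys):
    """Column totals for X, Y, X², Y², XY and Pearson r from the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return {"sum_x": sum_x, "sum_y": sum_y, "sum_x2": sum_x2,
            "sum_y2": sum_y2, "sum_xy": sum_xy, "r": r}


hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]
self_concept = [16, 27, 33, 15, 22, 17, 20, 28, 14, 12]
anxiety = [78, 55, 43, 80, 49, 75, 80, 50, 76, 79]

study = correlation_table(hours, grades)
anx = correlation_table(self_concept, anxiety)
print(study)
print(anx)
```

As in the advertising example, read the sign of r for the direction of the relationship and its size for the strength.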
TYPES OF CORRELATION
• Pearson r Correlation
• Point Biserial Correlation
• Kendall Rank Correlation
• Spearman Rank Correlation
Pearson r Correlation
• Pearson r correlation is the most widely used correlation statistic for measuring the degree of the relationship between linearly related variables.
• Both variables are continuous (interval or ratio).
Examples
1. Is there a statistically significant relationship between age and height?
2. Is there a relationship between temperature (measured in degrees Fahrenheit) and ice cream sales (measured by income)?
3. Is there a relationship between job satisfaction and income?
Point Biserial Correlation
• The correlation is computed with the Pearson correlation formula, except that one of the variables is dichotomous.
• One variable is continuous (interval or ratio) and the other is nominal with two values.
• For example: the association between salaries and gender.
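Because the point-biserial coefficient is Pearson r with the dichotomous variable coded 0 and 1, it also equals the shortcut r_pb = (M₁ − M₀)/s · √(pq), where s is the population standard deviation of all scores and p, q are the proportions of cases in each group. The data below are made up purely for illustration, and the function names are hypothetical; the sketch checks that both routes give the same value.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r from deviation scores."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def point_biserial_shortcut(groups, scores):
    """r_pb = (M1 - M0) / s * sqrt(p * q), with s the population SD of all scores."""
    n = len(scores)
    g1 = [s for g, s in zip(groups, scores) if g == 1]
    g0 = [s for g, s in zip(groups, scores) if g == 0]
    p, q = len(g1) / n, len(g0) / n
    mean = sum(scores) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in scores) / n)
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / s * math.sqrt(p * q)


# Hypothetical data: a dichotomous variable coded 0/1 and a continuous score.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [20, 22, 25, 23, 30, 28, 27, 32]

print(pearson_r(group, score), point_biserial_shortcut(group, score))
```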
Kendall Rank Correlation
• Kendall's rank correlation (tau) is a nonparametric measure of the strength of dependence between two variables.
• It is a good choice when the sample size is small and there are many tied ranks.
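A minimal sketch of Kendall's tau, assuming the tau-b variant, which corrects the denominator for tied pairs (the slide does not say which variant is intended). It compares every pair of cases and counts concordant pairs (both variables move in the same direction) and discordant pairs (they move in opposite directions). The contest ranks are hypothetical; SciPy also provides a library implementation, `scipy.stats.kendalltau`.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, which adjusts the denominator for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied on both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1     # both variables move in the same direction
        else:
            discordant += 1     # the variables move in opposite directions
    # Pairs not tied on X = concordant + discordant + tied_y_only, and likewise for Y.
    not_tied_x = concordant + discordant + tied_y_only
    not_tied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(not_tied_x * not_tied_y)


# Hypothetical finishing ranks of six contestants in two spelling competitions.
contest_1 = [1, 2, 3, 4, 5, 6]
contest_2 = [2, 1, 4, 3, 6, 5]
print(kendall_tau_b(contest_1, contest_2))  # 0.6 (12 concordant, 3 discordant pairs)
```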
Spearman Rank Correlation
• Spearman rank correlation is an excellent choice when you have ordinal data. Ordinal data have at least three categories, and the categories have a natural order. For example, first, second, and third place in a race are ordinal data.
• For example, imagine the same contestants participate in two spelling competitions. Suppose you have the finishing ranks for all contestants in both competitions and want to calculate the correlation between the two contests.
• Are people with a higher level of education more concerned about the environment?
• Is the number of symptoms a patient has related to their willingness to take medication?
To illustrate the computation of r_s, let’s assume that the raters’ attitude and attraction scores were only of ordinal scaling. Given this assumption, determine the value of the Spearman rank correlation coefficient, r_s (rho), for these data and compare it with the value of Pearson r.
Proportion of Similar Attitudes (X)    Attraction (Y)
0.30 8.9
0.44 9.3
0.67 9.6
0.00 6.2
0.50 8.8
0.15 8.1
0.58 9.5
0.32 7.1
0.72 11.0
1.00 11.7
0.87 11.5
0.09 7.3
0.82 10.0
0.64 10.0
0.24 7.5
Formula:
$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
Ranks run from lowest (1) to highest (15); the two tied y values of 10.0 share the average rank 11.5.
x       y       Rank of x (Rx)   Rank of y (Ry)   D      D²
0.30    8.9     5                7                -2     4
0.44    9.3     7                8                -1     1
0.67    9.6     11               10               1      1
0.00    6.2     1                1                0      0
0.50    8.8     8                6                2      4
0.15    8.1     3                5                -2     4
0.58    9.5     9                9                0      0
0.32    7.1     6                2                4      16
0.72    11.0    12               13               -1     1
1.00    11.7    15               15               0      0
0.87    11.5    14               14               0      0
0.09    7.3     2                3                -1     1
0.82    10.0    13               11.5             1.5    2.25
0.64    10.0    10               11.5             -1.5   2.25
0.24    7.5     4                4                0      0
                                                  ΣD² =  36.5
$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
• $r_s = 1 - \frac{6(36.5)}{15(15^2 - 1)}$
• $r_s = 1 - \frac{219}{3360}$
• $r_s \approx 0.9348$
• Note that r_s ≈ 0.93 and r ≈ 0.94. The values are not identical but are quite close.
• In general, when Pearson r is calculated using the interval or ratio properties of the data, its value will usually be close to, but not exactly the same as, the value calculated using only the ordinal properties of those data.
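The procedure above (rank each variable with tied values sharing the average rank, take D = Rx − Ry, then apply the formula) can be sketched as follows for the attitude/attraction data, alongside Pearson r on the raw scores for comparison. The function names are hypothetical. With tied ranks (here the two 10.0 attraction scores), the ΣD² formula is a close approximation to, rather than an exact match for, Pearson r computed on the ranks.

```python
import math


def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                  # extend the run of tied values
        shared = (i + j) / 2 + 1    # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Return ΣD² and r_s = 1 - 6ΣD² / (N(N² - 1)), with D the difference of paired ranks."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return d2, 1 - 6 * d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Ordinary Pearson r on the raw (interval/ratio) scores."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


attitudes = [0.30, 0.44, 0.67, 0.00, 0.50, 0.15, 0.58, 0.32,
             0.72, 1.00, 0.87, 0.09, 0.82, 0.64, 0.24]
attraction = [8.9, 9.3, 9.6, 6.2, 8.8, 8.1, 9.5, 7.1,
              11.0, 11.7, 11.5, 7.3, 10.0, 10.0, 7.5]

sum_d2, rs = spearman_rs(attitudes, attraction)
print(sum_d2, round(rs, 4), round(pearson_r(attitudes, attraction), 2))  # 36.5 0.9348 0.94
```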
Here ranks run from highest (1) to lowest; tied scores share the average of the ranks they occupy.
x (HS GPA)   y (College GPA)   Rank of x (Rx)   Rank of y (Ry)   D      D²
85           87                6.5              7                -0.5   0.25
80           84                12.5             11               1.5    2.25
77           83                15               13.5             1.5    2.25
82           80                10               15               -5     25
83           85                8.5              8.5              0      0
86           83                4.5              13.5             -9     81
80           84                12.5             11               1.5    2.25
87           90                2.5              2                0.5    0.25
83           88                8.5              5                3.5    12.25
80           84                12.5             11               1.5    2.25
90           92                1                1                0      0
86           88                4.5              5                -0.5   0.25
80           85                12.5             8.5              4      16
87           89                2.5              3                -0.5   0.25
85           88                6.5              5                1.5    2.25
                                                                 ΣD² =  146.5
• $r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
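Applying the formula to the GPA table: the sketch below recomputes the ranks (1 = highest score, as in the table; ranking in the opposite direction flips the sign of every D but leaves every D² unchanged) and evaluates r_s = 1 − 6(146.5)/(15(15² − 1)) = 1 − 879/3360. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values, descending=False):
    """Rank values (1 = smallest, or 1 = largest if descending); ties share the average rank."""
    sign = -1 if descending else 1
    order = sorted(range(len(values)), key=lambda i: sign * values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                  # extend the run of tied scores
        shared = (i + j) / 2 + 1    # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


hs_gpa = [85, 80, 77, 82, 83, 86, 80, 87, 83, 80, 90, 86, 80, 87, 85]
college_gpa = [87, 84, 83, 80, 85, 83, 84, 90, 88, 84, 92, 88, 85, 89, 88]

rx = average_ranks(hs_gpa, descending=True)
ry = average_ranks(college_gpa, descending=True)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # ΣD²
n = len(hs_gpa)
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, round(rs, 2))  # 146.5 0.74
```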
Learning Check:
• A researcher measures the correlation between rankings of ice cream flavors and consumers’ rankings of their favorite flavors. If ΣD² = 19.5 and n = 8, what is the value of the correlation coefficient?
A. 0.07
B. -0.34
C. 0.79
D. -0.94
The Spearman correlation coefficient is used to
measure the correlation between two ordinal
variables.
A. TRUE
B. FALSE
The point-biserial correlation coefficient is used to
measure the correlation between two continuous
variables.
A. TRUE
B. FALSE