Correlation

Okay, here are the steps: 1) Calculate the mean for each variable: X mean = 1.8 Y mean = 204 2) Calculate deviations from the mean: X' = X - X mean Y' = Y - Y mean 3) Calculate (X')^2, (Y')^2, (X')(Y') 4) Calculate: SSx = Σ(X')^2 = 5.2 SSy = Σ(Y')^2 = 16 SP = Σ(X')(Y') = 8 5) Use the formula: r = SP / √(SSx * SSy) r = 8 / √(

Uploaded by

Herald
Correlation

• Another area of inferential statistics
involves determining whether a
relationship exists between two or
more numerical or quantitative
variables.
• A businessperson
may want to know
whether the volume
of sales for a given
month is related to
the amount of
advertising the firm
does that month.
• Educators are
interested in
determining whether
the number of hours a
student studies is
related to the
student’s score on a
particular exam.
• Medical researchers
are interested in
questions such as:

Is caffeine related to
heart damage?
Is there a relationship
between a person’s age
and his/her blood
pressure?
•Is relationship
satisfaction
related to
commitment?
•These are only a few of the many
questions that can be answered by
using techniques of correlation and
regression analysis.
Correlation is a statistical technique
that is used to measure and describe
the relationship between two
variables. Usually the two variables
are simply observed as they exist
naturally in the environment—there
is no attempt to control or manipulate
the variables.
Sir Francis Galton
1822-1911

Geographer, meteorologist,
tropical explorer, inventor of
fingerprint identification,
eugenicist, half-cousin of
Charles Darwin and best-selling
author.
Galton
◼ Obsessed with
measurement
◼ Tried to measure
everything from the
weather to female
beauty
◼ Invented correlation and
regression
Correlation
• Correlation is a topic that focuses on the
direction and degree of the relationship.
• The direction of the relationship refers to
whether the relationship is positive or negative.
• The degree of relationship refers to the
magnitude or strength of the relationship. The
degree of relationship can vary from nonexistent
to perfect. When the relationship is perfect,
correlation is at its highest and we can exactly
predict from one variable to the other.
CORRELATION
•The degree of relationship between the
variables under consideration is
measured through correlational analysis.
•The measure of correlation is called the
CORRELATION COEFFICIENT.
The characteristics of a
relationship
1. The Direction of the Relationship
The sign of the correlation, positive or negative,
describes the direction of the relationship.
• In a positive correlation, the two variables tend to change in
the same direction: as the value of the X variable increases
from one individual to another, the Y variable also tends to
increase; when the X variable decreases, the Y variable also
decreases.
• In a negative correlation, the two variables tend to go in
opposite directions. As the X variable increases, the Y
variable decreases. That is, it is an inverse relationship.
•A positive relationship indicates
that there is a direct relationship
between the variables.
•A negative relationship indicates
that there is an inverse
relationship between X and Y.
2. The Form of the Relationship
•The most common use of correlation is to
measure straight-line relationships.
However, other forms of relationships do
exist and there are special correlations used
to measure them.
3. The Strength of the
Relationship
For example:
• A correlation of -0.97 is a strong
negative correlation, whereas a
correlation of 0.10 indicates a weak
positive correlation.
Which has a stronger correlation?
0.30 and 0.70
0.70 and -0.90
0.10 and 0.90
0.10 and -0.98
0.44 and -0.64
0.98 and -0.74
The correlation coefficient can vary from 1 to –1. The
sign of the coefficient tells us whether the
relationship is positive or negative.
The numerical part of the correlation coefficient
describes the magnitude of the correlation. The
higher the number, the greater is the correlation.
Since 1 is the highest number possible, it
represents a perfect correlation. A correlation
coefficient of 1 means the correlation is perfect
and the relationship is positive.
A correlation coefficient of –1 means the
correlation is perfect and the relationship is
negative. When the relationship is nonexistent, the
correlation coefficient equals 0. Imperfect
relationships have correlation coefficients varying
in magnitude between 0 and 1. They will be plus
or minus depending on the direction of the
relationship.
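The rules above (the sign gives the direction, the magnitude gives the strength, and nothing outside the range –1 to 1 is possible) can be sketched in a few lines of Python. The function names `describe_r` and `stronger` are illustrative assumptions, not part of the slides.

```python
def describe_r(r):
    """Return (direction, magnitude) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        # A value such as 1.98 cannot be a correlation coefficient.
        raise ValueError(f"r = {r} is outside [-1, 1]; recheck the computation")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


def stronger(r1, r2):
    """Return whichever coefficient has the larger magnitude."""
    return r1 if abs(r1) >= abs(r2) else r2


print(describe_r(-0.97))      # direction and magnitude of a strong negative r
print(stronger(0.10, -0.97))  # the sign does not affect strength
```

Comparing absolute values is the design choice here: a coefficient of -0.97 counts as stronger than 0.10 even though it is the smaller number.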
•Learning Check
A negative value for a correlation indicates
______________________
A. a much stronger relationship than if the correlation
were positive
B. a much weaker relationship than if the correlation
were positive
C. Increases in X tend to be accompanied by increases
in Y
D. Increases in X tend to be accompanied by decreases
in Y
A correlation of 0.10 is weaker than
0.74.
A.TRUE
B.FALSE
•A correlation of -0.98 is stronger
than 0.79.
A. TRUE
B. FALSE
Ana computed a correlation coefficient of
1.98. What does this mean?
•A. There is a weak relationship between the
variables
•B. There is a strong relationship between
the variables
•C. The direction of the relationship is
positive.
•D. There is a mistake in the computation.
Correlation Analysis and Scatter Diagram
• Correlation Analysis is the study of the relationship
between variables. It is also defined as group of
techniques to measure the association between two
variables.
• A Scatter Diagram is a chart that portrays the
relationship between the two variables. It is the usual
first step in correlation analysis.
Correlation
•A scatter diagram/plot can be used to
determine whether a linear (straight
line) correlation exists between two
variables.
Scatter Diagram

Dependent vs. Independent Variable

DEPENDENT VARIABLE (DV)
The variable that is being predicted or estimated. It is scaled on the
Y-axis.

INDEPENDENT VARIABLE (IV)
The variable that provides the basis for estimation. It is the predictor
variable. It is scaled on the X-axis.
Types of Correlation
• Linear Correlation
Correlation is said to be linear when the amount of
change in one variable tends to bear a constant ratio
to the amount of change in the other. The graph of
variables having a linear relationship will form a
straight line.
• Nonlinear Correlation
The correlation would be nonlinear if the amount of
change in one variable does not bear a constant
ratio to the amount of change in the other variable.
Example: Constructing a Scatter Plot
• A marketing manager conducted a study to determine whether
there is a linear relationship between money spent on
advertising and company sales. The data are shown in the table.
Display the data in a scatter plot.

Advertising expenses (P1000), x    Company sales (P1000), y
2.4                                225
1.6                                184
2.0                                220
2.6                                240
1.4                                180
1.6                                184
2.0                                186
2.2                                215
Solution: Constructing a Scatter Plot
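As a rough illustration of the scatter plot for the advertising data, the sketch below draws the points as text characters using only the Python standard library. The function name `text_scatter` and the grid size are assumptions made for this example; a plotting library would normally be used instead.

```python
# Advertising expenses (x) and company sales (y), both in P1000, from the table.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings that roughly plots the (x, y) points as '*'.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for xv, yv in zip(xs, ys):
        col = round((xv - x_min) / (x_max - x_min) * (width - 1))
        row = round((yv - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]


rows = text_scatter(x, y)
for line in rows:
    print("|" + line)          # y-axis (sales) on the left
print("+" + "-" * 40)          # x-axis (advertising expenses) along the bottom
```

The points drift from the lower left to the upper right, which is the visual signature of a positive linear relationship. The pair (1.6, 184) appears twice in the data, so it is drawn once.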
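$$r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}}$$

$$r = \frac{\text{covariability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}}$$

$$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$$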
X     Y     X-M   Y-M   (X-M)²   (Y-M)²   (X-M)(Y-M)
0     2
10    6
4     2
8     4
8     6

SSx = 64, SSy = 16, SP = 28

$$r = \frac{SP}{\sqrt{SS_x \, SS_y}} = \frac{28}{\sqrt{(64)(16)}} = 0.875$$
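The deviation-score computation for these five pairs can be sketched in Python as follows. The function name `pearson_by_deviations` is an assumption for illustration; the steps mirror the columns of the table (deviations from the mean, their squares, and their products).

```python
from math import sqrt

# The five (X, Y) pairs from the table.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]


def pearson_by_deviations(xs, ys):
    """Return (SSx, SSy, SP, r) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]              # X - M column
    dy = [v - mean_y for v in ys]              # Y - M column
    ss_x = sum(d * d for d in dx)              # sum of (X - M)^2
    ss_y = sum(d * d for d in dy)              # sum of (Y - M)^2
    sp = sum(a * b for a, b in zip(dx, dy))    # sum of (X - M)(Y - M)
    return ss_x, ss_y, sp, sp / sqrt(ss_x * ss_y)


ss_x, ss_y, sp, r = pearson_by_deviations(X, Y)
print(ss_x, ss_y, sp, r)
```

For this data the means are whole numbers (6 and 4), so the intermediate sums come out exactly as 64, 16, and 28.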
What is the sum of products (SP) for the
following data?

X   Y
2   4
5   2
3   5
2   5

A. 6
B. -5
C. 43
D. None of the 3 choices is correct
A set of n = 5 pairs of X and Y values has SSx = 5,
SSy = 20, and SP = 8. For these data, the Pearson
correlation is _____
A. 0.08
B. 0.80
C. 0.32
D. 0.40
Correlation Coefficient
• The range of the correlation coefficient is -1 to 1.
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength
of the relationship between two variables. It requires
interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect and strong
correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive
values indicate a direct relationship.
Perfect Correlation

• A perfect relationship is one in which a
positive or negative relationship exists
and all of the points fall on the line. An
imperfect relationship is one in which a
relationship exists, but not all of the
points fall on the line.
Correlation Coefficient - Interpretation

Correlation Coefficient
Example: Finding the Correlation Coefficient
• Calculate the correlation coefficient for the advertising
expenditures and company sales data. What can you conclude?

Advertising expenses (P1000), x    Company sales (P1000), y
2.4                                225
1.6                                184
2.0                                220
2.6                                240
1.4                                180
1.6                                184
2.0                                186
2.2                                215
Finding the correlation coefficient
• Step 1 Make a table, as shown in Step 2.
• Step 2 Find the values of xy, x², and y². Place them in the appropriate
columns and sum each column.

x     y     xy    x²    y²
-     -     -     -     -
-     -     -     -     -
-     -     -     -     -
Σx =  Σy =  Σxy = Σx² = Σy² =

• Step 3 Substitute in the formula to find the value of r.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Solution: Finding the Correlation Coefficient

x     y     xy      x²     y²
2.4   225   540     5.76   50625
1.6   184   294.4   2.56   33856
2.0   220   440     4      48400
2.6   240   624     6.76   57600
1.4   180   252     1.96   32400
1.6   184   294.4   2.56   33856
2.0   186   372     4      34596
2.2   215   473     4.84   46225

• r = 0.913
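The three steps (build the xy, x², and y² columns, sum them, then substitute into the formula) can be sketched in Python for the advertising data. The function name `pearson_r` is an assumption for illustration.

```python
from math import sqrt

# Advertising expenses (x) and company sales (y), both in P1000.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def pearson_r(xs, ys):
    """Pearson r from the column sums of x, y, xy, x^2, and y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))  # about 0.913, in line with the value reported on the slide
```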

HOW TO INTERPRET
How do we interpret a correlation of _____ ?
• First, it is positive, so we see there is a direct relationship
between the advertising expenses and company sales.
• The value is fairly close to 1.00, so we conclude that the
association is strong.

However, does this mean that more advertising causes more sales?
No, we have not demonstrated cause and effect here, only
that the two variables, advertising and sales, are related.
• r = 0.913
There is a strong positive correlation
between advertising expenditures
and company sales.
Correlation and Causation
•The fact that two variables are
strongly correlated does not in itself
imply a cause-and-effect
relationship between the variables.
Correlation does not imply causality
▪ Two variables might be associated because they
share a common cause.
▪ For example, SAT scores and College Grade are highly
associated, but probably not because scoring well on
the SAT causes a student to get high grades in
college.
▪ Being a good student, etc., would be the common
cause of the SATs and the grades.
Intervening and confounding factors

There is a strong positive association between Number of Years of
Education and Annual Income.
▪ In part, getting more education allows people to get better,
higher-paying jobs.
▪ But these variables are confounded with others, such as
socio-economic status.
IQ GPA xy x² y²
x y
110 1.0
112 1.6
118 1.2
119 2.1
122 2.6
125 1.8
127 2.6
130 2.0
132 3.2
134 2.6
136 3.0
138 3.6
IQ     GPA    xy       x²       y²
x      y
110    1.0    110.0    12100    1.00
112    1.6    179.2    12544    2.56
118    1.2    141.6    13924    1.44
119    2.1    249.9    14161    4.41
122    2.6    317.2    14884    6.76
125    1.8    225.0    15625    3.24
127    2.6    330.2    16129    6.76
130    2.0    260.0    16900    4.00
132    3.2    422.4    17424    10.24
134    2.6    348.4    17956    6.76
136    3.0    408.0    18496    9.00
138    3.6    496.8    19044    12.96
1503   27.3   3488.7   189187   69.13
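The slides do not state the resulting coefficient for the IQ and GPA data. As a hedged worked check, substituting the column totals into the computational formula for r, with n = 12, would give roughly:

$$r = \frac{12(3488.7) - (1503)(27.3)}{\sqrt{\left[12(189187) - (1503)^2\right]\left[12(69.13) - (27.3)^2\right]}} = \frac{832.5}{\sqrt{(11235)(84.27)}} \approx 0.856$$

Under that arithmetic, the relationship between IQ and GPA in this sample would be positive and fairly strong.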
Exercise:
• Find out if there is a relationship between the number of hours which a
random sample of ten students studied for an examination and the
grades the students received.
X Y X² Y² XY

8 56
5 44
11 79
13 72
10 70
5 54
18 94
15 85
2 33
8 65
Exercise
• Find out if there is a relationship between self-concept and level of
anxiety among ten college students

X Y X² Y² XY

16 78
27 55
33 43
15 80
22 49
17 75
20 80
28 50
14 76
12 79
TYPES OF CORRELATION
• Pearson r Correlation
• Point Biserial Correlation
• Kendall Rank Correlation
• Spearman Rank Correlation

Pearson r Correlation
• Pearson r correlation is the most widely used correlation statistic to
measure the degree of the relationship between linearly related
variables.
• Both variables are continuous (interval or ratio).
Examples
1.Is there a statistically significant relationship between age and
height?
2.Is there a relationship between temperature (measured in degrees
Fahrenheit) and ice cream sales (measured by income)?
3.Is there a relationship between job satisfaction and income?
Point Biserial Correlation
• Point-biserial correlation is computed with the Pearson correlation
formula, except that one of the variables is
dichotomous.
• One variable is continuous (interval or ratio) and one is nominal
with two values.
• For example: the association between salaries and gender
• Cholesterol concentration and smoking status ("smoker"
and "non-smoker")
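A minimal sketch of the point-biserial idea: code the dichotomous variable as 0 and 1, then apply the ordinary Pearson computation. The data below are hypothetical values invented for illustration, and the function name `pearson_r` is an assumption.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
# smoker: 0 = non-smoker, 1 = smoker; cholesterol in arbitrary units.
smoker = [0, 0, 0, 1, 1, 1]
cholesterol = [180, 190, 200, 210, 220, 230]


def pearson_r(xs, ys):
    """Pearson r computed from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    ss_x = sum((a - mean_x) ** 2 for a in xs)
    ss_y = sum((b - mean_y) ** 2 for b in ys)
    return sp / sqrt(ss_x * ss_y)


# With a 0/1-coded variable, this Pearson r is the point-biserial correlation.
r_pb = pearson_r(smoker, cholesterol)
print(round(r_pb, 4))
```

Which group is coded 1 only affects the sign of the result, not its magnitude.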
Kendall Rank Correlation
• Kendall rank correlation is a non-parametric test that measures the
strength of dependence between two variables.
• It is a reasonable choice when your sample size is small and has many
tied ranks.

Examples
1. Correlation between a student’s exam grade (A, B, C…) and the
time spent studying put in categories (<2 hours, 2–4 hours, 5–7
hours…)
2. Customer satisfaction (e.g. Very Satisfied, Somewhat Satisfied,
Neutral…) and delivery time (< 30 minutes, 30 minutes–1 hour,
1–2 hours, etc.)
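The slides do not give a formula for the Kendall coefficient. As a sketch under that caveat, the version below implements the tau-b variant, which adjusts for tied pairs; choosing tau-b is an assumption made here because the slide mentions tied ranks. The data are hypothetical ordinal codes invented for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2))."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1            # pair tied on x
        if dy == 0:
            tied_y += 1            # pair tied on y
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1    # both variables move in the same direction
            else:
                discordant += 1    # variables move in opposite directions
    n = len(xs)
    n0 = n * (n - 1) // 2          # total number of pairs
    return (concordant - discordant) / sqrt((n0 - tied_x) * (n0 - tied_y))


# Hypothetical ordinal codes: study-time category and grade category.
study_time = [1, 1, 2, 2, 3]
grade = [1, 2, 2, 3, 3]
tau = kendall_tau_b(study_time, grade)
print(tau)
```

The function assumes each variable has at least two distinct values; otherwise the denominator is zero.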
Spearman Rank Correlation
• is an excellent choice when you have ordinal data. Ordinal data have at least
three categories and the categories have a natural order. For example, first,
second, and third in a race are ordinal data.
• For example, imagine the same contestants participate in two spelling
competitions. Suppose you have the finishing ranks for all contestants in
both matches and want to calculate the correlation between contests
• Are people with a higher level of education more concerned about the
environment?
• Is the number of symptoms a patient has related to their willingness to take
medication?

To illustrate computation of rs, let’s assume that the raters’ attitude and attraction
scores were only of ordinal scaling. Given this assumption,
determine the value of the linear correlation coefficient rho for these data and
compare the value with the value of Pearson r.
Proportion of Similar Attitudes Attraction
(X) (Y)
0.30 8.9
0.44 9.3
0.67 9.6
0.00 6.2
0.50 8.8
0.15 8.1
0.58 9.5
0.32 7.1
0.72 11.0
1.00 11.7
0.87 11.5
0.09 7.3
0.82 10.0
0.64 10.0
0.24 7.5
Formula:

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
x y Rank of x Rank of y d d²
Rx Ry
0.30 8.9

0.44 9.3

0.67 9.6

0.00 6.2

0.50 8.8

0.15 8.1

0.58 9.5

0.32 7.1

0.72 11.0

1.00 11.7

0.87 11.5

0.09 7.3

0.82 10.0

0.64 10.0

0.24 7.5
x      y      Rank of x   Rank of y   d      d²
              Rx          Ry
0.30   8.9    5           7           -2     4
0.44   9.3    7           8           -1     1
0.67   9.6    11          10          1      1
0.00   6.2    1           1           0      0
0.50   8.8    8           6           2      4
0.15   8.1    3           5           -2     4
0.58   9.5    9           9           0      0
0.32   7.1    6           2           4      16
0.72   11.0   12          13          -1     1
1.00   11.7   15          15          0      0
0.87   11.5   14          14          0      0
0.09   7.3    2           3           -1     1
0.82   10.0   13          11.5        1.5    2.25
0.64   10.0   10          11.5        -1.5   2.25
0.24   7.5    4           4           0      0
                                      ΣD² =  36.5
$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(36.5)}{15(15^2 - 1)} = 1 - \frac{219}{3360} \approx 0.9348$$
• Note that rs = 0.93 and r = 0.94. The values are not
identical but quite close.
• In general, when Pearson r is calculated using the
interval or ratio properties of data, its values will
be close but not exactly the same as when
calculated on only the ordinal properties of those
data.
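The ranking and the shortcut formula for the attitude and attraction data can be sketched in Python. The function names `average_ranks` and `spearman_rs` are assumptions for illustration. Tied values share the mean of the ranks they occupy, as in the table; note that the shortcut formula is exact only when there are no ties, so with the one tie here it is a close approximation.

```python
x = [0.30, 0.44, 0.67, 0.00, 0.50, 0.15, 0.58, 0.32,
     0.72, 1.00, 0.87, 0.09, 0.82, 0.64, 0.24]
y = [8.9, 9.3, 9.6, 6.2, 8.8, 8.1, 9.5, 7.1,
     11.0, 11.7, 11.5, 7.3, 10.0, 10.0, 7.5]


def average_ranks(values):
    """Rank from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Return (sum of D^2, r_s) using r_s = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


sum_d2, rs = spearman_rs(x, y)
print(sum_d2, round(rs, 4))
```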
x Y Rank of x Rank of y D D²
(HS GPA) (College GPA) Rx Ry
85 87
80 84
77 83
82 80
83 85
86 83
80 84
87 90
83 88
80 84
90 92
86 88
80 85
87 89
85 88
ΣD² =
x Y Rank of x Rank of y D D²
(HS GPA) (College GPA) Rx Ry
85 87 6.5 7 -0.5 0.25
80 84 12.5 11 1.5 2.25
77 83 15 13.5 1.5 2.25
82 80 10 15 -5 25
83 85 8.5 8.5 0 0
86 83 4.5 13.5 -9 81
80 84 12.5 11 1.5 2.25
87 90 2.5 2 0.5 0.25
83 88 8.5 5 3.5 12.25
80 84 12.5 11 1.5 2.25
90 92 1 1 0 0
86 88 4.5 5 -0.5 0.25
80 85 12.5 8.5 4 16
87 89 2.5 3 -0.5 0.25
85 88 6.5 5 1.5 2.25
ΣD² = 146.5

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
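The slide stops at the formula. As a hedged worked check, substituting ΣD² = 146.5 and N = 15 would give roughly:

$$r_s = 1 - \frac{6(146.5)}{15(15^2 - 1)} = 1 - \frac{879}{3360} \approx 0.738$$

Because this data set contains many tied ranks, the shortcut formula is only an approximation here.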
Learning Check:
• A researcher measures the correlation in rankings for ice cream
flavors and consumers’ rankings of their favorite flavors. If ΣD² = 19.5
and n = 8, then what is the value of the correlation coefficient?
A. 0.07
B. -0.34
C. 0.79
D. -0.94
The Spearman correlation coefficient is used to
measure the correlation between two ordinal
variables.
A. TRUE
B. FALSE
The point-biserial correlation coefficient is used to
measure the correlation between two continuous
variables.
A. TRUE
B. FALSE
