Correlation
In general, a positive linear relationship (or
correlation) between two variables, X and Y, occurs
when high values of X tend to be associated with high
values of Y and low values of X tend to be
associated with low values of Y.
A negative linear relationship (or correlation)
between two variables, X and Y, occurs when high
values of X tend to be associated with low values of Y
and low values of X tend to be associated
with high values of Y.
Correlation
In previous research, positive correlations have been
found between, for example:
Motivation and academic success
Crime rate and unemployment rate
Salary and job satisfaction
Verbal ability and proficiency test performance
Teacher quality and student success
Correlation
In previous research, negative correlations have been
found between, for example:
Anxiety and test performance
College stress and academic success
Social support and isolation
Parents' education and number of children
Correlation Analysis
The analysis of relationships among variables is called
correlation analysis. It includes measuring the
correlation between the variables. It is concerned only
with the strength of the relationship; no causal effect is
implied.
A scatter plot (or scatter diagram) can be used to show
the relationship between two numerical variables.
The correlation coefficient measures the direction
and strength of the linear relationship between variables.
The value of r
It ranges from -1 to +1, with 0 at the midpoint.
Positive and negative signs indicate the direction of
the relationship.
When r is 0:
A correlation coefficient of 0 indicates that there is
no linear relationship between two variables X and Y.
However, there might be some nonlinear relationship
between them.
Nonlinear relationships can be quadratic,
logarithmic, exponential, and so on.
A scatter diagram can give us a picture of nonlinear
relationships.
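As a small illustration of the point above, the sketch below computes r for a hypothetical data set (not from the slides) in which Y is determined exactly by X, but through a quadratic rather than a linear rule. The helper name pearson_r is an assumption made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y = X squared, a perfect but nonlinear relationship.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # 0.0
```

Here r is 0 even though Y is completely determined by X, which is why a scatter diagram should be inspected alongside the coefficient.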
The Correlation Coefficient
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative
linear relationship
The closer to 1, the stronger the positive
linear relationship
The closer to 0, the weaker any linear
relationship
[Figure: The Correlation Coefficient. Scatter plots of Y against X for r = -1, r = -.6, r = 0, r = +1, and r = +.3]
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.
Absolute value of r
The absolute value of r indicates the
strength of the linear relationship.
The closer |r| is to 1.0, the stronger the linear
relationship.
The closer |r| is to zero, the weaker the linear
relationship.
Sample Data with r =1.0
Data set A
X Y
1 2
2 4
3 6
4 8
5 10
Sample Data with r close to 1
Data set B
X Y
1 2.5
2 3.5
3 5
4 7
5 9
Sample Data with r close to 0
Data set E
X Y
1 1
2 10
3 7
4 9
5 2
[Figure: Scatter Plot of Data set E with its Estimated Line (r = 0.04)]
Guide Interpretation
(Note: Rule of Thumb only)
Absolute value of r Interpretation
0.00 – 0.29 Very Low
0.30 – 0.49 Low
0.50 – 0.69 Moderate
0.70 – 0.89 High
0.90 – 1.00 Very High
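The rule of thumb above can be sketched as a small lookup. The function name interpret_r is hypothetical, and treating each band as a half-open interval (so that a value such as 0.295 falls into "Very Low") is an assumption, since the table does not say how values between bands are handled.

```python
def interpret_r(r):
    """Map a correlation coefficient to the rule-of-thumb label for |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    strength = abs(r)  # only the magnitude matters; the sign gives direction
    if strength < 0.30:
        return "Very Low"
    if strength < 0.50:
        return "Low"
    if strength < 0.70:
        return "Moderate"
    if strength < 0.90:
        return "High"
    return "Very High"

for value in (0.04, -0.6, 0.3092, 1.0):
    print(value, interpret_r(value))
```

Because the labels are a rule of thumb only, they describe the size of r and say nothing about statistical significance.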
Pearson Product Moment Correlation Coefficient
(or simply “Pearson r” or “r”)
\[ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \]
where: X and Y are variables under consideration.
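A minimal sketch of this formula, applied to the sample Data sets A, B, and E shown earlier, might look as follows. The function names are assumptions made for illustration, and the sample (n - 1) versions of covariance and variance are used; the divisor cancels in the ratio, so dividing by n would give the same r.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def variance(x):
    """Sample variance, i.e. the covariance of a variable with itself."""
    return covariance(x, x)

def pearson_r(x, y):
    """r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))

x = [1, 2, 3, 4, 5]
data_sets = {
    "A": [2, 4, 6, 8, 10],
    "B": [2.5, 3.5, 5, 7, 9],
    "E": [1, 10, 7, 9, 2],
}
results = {name: pearson_r(x, y) for name, y in data_sets.items()}
for name, r in results.items():
    print(f"Data set {name}: r = {r:.2f}")
```

This reproduces r = 1.00 for Data set A, a value close to 1 for Data set B, and r = 0.04 for Data set E.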
Testing Pearson r for significance
The Pearson coefficient describes the strength and direction of the
linear relationship in the sample. It is also important to determine
whether a linear relationship exists between the
variables in the entire population to which the sample
belongs.
If the correlation coefficient is not statistically
significant, there is no evidence of a linear relationship
between X and Y in the population to which the
sample belongs.
Test for significance of r
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic: t, with d.f. = n - 2
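One common way to carry out this test directly from r is the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), which in simple linear regression is numerically the same as the t test on the slope. The sketch below assumes that form; the function name t_statistic is hypothetical, r = 0.9914 is approximately what the Pearson formula gives for Data set B (n = 5), and the critical value 3.182 is the standard two-tailed t-table entry for alpha = 0.05 with 3 d.f.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: no linear relationship, with n - 2 d.f."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1.0:
        # A perfect linear fit makes the denominator zero.
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.9914, 5          # approximate r of Data set B, five pairs
t = t_statistic(r, n)
df = n - 2
t_critical = 3.182        # two-tailed, alpha = 0.05, 3 d.f. (t table)
reject_h0 = abs(t) > t_critical

print(f"t = {t:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

With so few observations the critical value is large, so only a very strong sample correlation reaches significance.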
Four Requirements for Pearson Correlation
#1: The two variables should be measured
at the interval or ratio level (i.e., they are
continuous).
#2: There needs to be a linear relationship
between the two variables.
If the relationship displayed in your scatterplot is
not linear, you will have to either use a nonparametric
equivalent to Pearson’s correlation or
transform your data.
#3: There should be no significant outliers.
Outliers are simply the few data points within your data that
do not follow the usual pattern.
[Figure: scatterplots highlighting the potential impact of outliers]
#4: The variables should be approximately normally distributed.
Use the following to test for the normality of
the variables:
• Kolmogorov-Smirnov Test
• Shapiro-Wilk Test
Coefficient of Determination
Coefficient of determination = r²
r² = proportion of variance shared by two variables.
It indicates what proportion of the variance in one of
the correlated variables is associated with the variance
in the other variable.
If r = .30 between motivation and task performance,
then r² is 0.09. This means that 9% of the variance
in motivation is associated with the
variance in task performance (or vice versa).
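The arithmetic in this example can be checked with a short sketch; the function name coefficient_of_determination is an assumption made for illustration.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2

r = 0.30
r_squared = coefficient_of_determination(r)
print(f"r = {r:.2f} -> r^2 = {r_squared:.2f} ({r_squared:.0%} shared variance)")
```

Note that squaring removes the sign, so r = -.30 corresponds to the same 9% of shared variance.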
Steps in Correlation Analysis
Step 1: Formulate the hypotheses.
H0: There is no significant relationship between X and
Y.
H1: There is a significant relationship between X and Y.
Step 2: Specify the level of significance.
Step 3: Gather data.
Step 4: Check the necessary assumptions.
Step 5: Compute the appropriate correlation
coefficient and test for significance.
Step 6: Make a decision.
Example
Research Question: Is there a significant relationship
between a firm’s financial performance and the age of
the chief financial officer?
Null Hypothesis: There is no significant relationship
between a firm’s financial performance and the age of
the chief financial officer.
Using Microsoft Excel
                      Age      Debt-to-Asset Ratio
Age                   1
Debt-to-Asset Ratio   0.3092   1
Using SPSS