Chapter 5
Correlation Analysis
Introduction
• Correlation coefficients are used to measure the strength of the
relationship or association between two quantitative variables.
• If the two variables are closely related, then the points in the
scatter plot will tend to bunch together; otherwise they will be
scattered all around.
Example: Height and quadriceps muscle strength in 41 male alcoholics
Figure 1. Scatter diagram showing muscle strength and height for 41 male alcoholics
• If you look at Figure 1, it is fairly easy to see that taller men
tend to be stronger than shorter men, or,
• looking at it the other way round, that stronger men tend to be
taller than weaker men.
• It is only a tendency: the tallest man is not the strongest, nor
is the shortest man the weakest.
• Correlation enables us to measure how close this association is.
• To see how correlation works, we can draw two lines on the
scatter diagram,
– a horizontal line through the mean strength and
– a vertical line through the mean height, as shown in Figure 2.
Figure 2. Scatter diagram showing muscle strength and height for 41
male alcoholics, with lines through the mean height and mean strength
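• For each subject, multiply the difference between his height and the
mean height by the difference between his strength and the mean
strength. The product is positive for points in the upper-right and
lower-left quarters of Figure 2 and negative in the other two.
• When we add the products for all subjects, the sum will be positive,
because there are more positive products than negative ones.
• Further, subjects with very large values for both height and
strength, or very small values for both, will have large positive
products.
• So the stronger the relationship is, the bigger the sum of products
will be.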
• If the sum of products is positive, we say that there is a positive
correlation between the variables; if it is negative, we say that
there is a negative correlation.
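The sum of products described above can be sketched in a few lines of Python. The function name `sum_of_products` and the five height/strength pairs are made up for illustration; they are not the data from the 41 subjects.

```python
def sum_of_products(xs, ys):
    """Sum over subjects of (x - mean of x) * (y - mean of y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical illustrative values, NOT the study data.
heights = [160, 165, 170, 175, 180]
strengths = [200, 350, 300, 400, 450]

total = sum_of_products(heights, strengths)
print("sum of products:", total)
print("positive correlation" if total > 0 else "negative or no correlation")
```

Reversing the order of `strengths`, so that taller subjects are weaker, flips the sign of the sum.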
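If we have two variables X and Y with values x_i and y_i for the ith
individual, the correlation between them, denoted by r(X,Y), is
given by:
\[
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{\sum xy - (\sum x)(\sum y)/n}
         {\sqrt{[\sum x^2 - (\sum x)^2/n]\,[\sum y^2 - (\sum y)^2/n]}}
\]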
Example 1: The correlation coefficient for muscle strength (y)
and height (x) will be:
n = 41
Σx = 7,000
Σy = 13,207
Σx² = 1,196,828
Σy² = 4,757,609
Σxy = 2,267,142
r = 0.42
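As a check on Example 1, the computational form of r can be evaluated directly from the sums given. This is a minimal sketch; the function name `pearson_r_from_sums` and its parameter names are chosen here for illustration.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient from the totals used in the computational formula."""
    sxy = sum_xy - sum_x * sum_y / n   # sum of products about the means
    sxx = sum_x2 - sum_x ** 2 / n      # sum of squares of x about its mean
    syy = sum_y2 - sum_y ** 2 / n      # sum of squares of y about its mean
    return sxy / math.sqrt(sxx * syy)


# Totals from Example 1 (x = height, y = muscle strength).
r_height = pearson_r_from_sums(
    n=41,
    sum_x=7000,
    sum_y=13207,
    sum_x2=1196828,
    sum_y2=4757609,
    sum_xy=2267142,
)
print(round(r_height, 2))
```

Rounded to two decimal places, the result agrees with the value r = 0.42 stated in the example.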
• The size of the correlation coefficient reflects how closely the
points on the scatter diagram lie to a straight line.
• r equals -1.0 or +1.0 only when the points lie exactly on a
straight line. When there is a perfect relationship that is not a
straight line, the correlation coefficient is less than 1.0.
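The point about straight lines can be illustrated with a short sketch that compares an exact straight-line relationship with an exact curved one. The two data sets are simulated here for illustration (y = 2x + 1 and y = x²) and are not the data behind the figures; `pearson_r` is a helper name chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(11))                            # 0, 1, ..., 10
r_line = pearson_r(xs, [2 * x + 1 for x in xs])  # exact straight line
r_curve = pearson_r(xs, [x ** 2 for x in xs])    # exact curve, no scatter

print("straight line:", r_line)
print("curve:", r_curve)
```

In both cases y is determined exactly by x, yet only the straight line gives r equal to 1 (up to floating-point rounding); the curve gives a value below 1.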
Figure 3. Scatter diagram showing simulated data from a population where
there is a perfect relationship between the variables and yet the population
correlation coefficient is less than one
Example 2: Figure 4. Scatter diagram showing muscle strength (y) and
age (x) for 41 male alcoholics
Figure 5. Scatter diagram showing muscle strength and age for 41
male alcoholics, with lines through the mean
Example: The correlation coefficient for muscle strength (y)
and age (x) will be:
n = 41
Σx = 1,785
Σy = 13,207
Σx² = 82,845
Σy² = 4,757,609
Σxy = 553,800
r = -0.42
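The arithmetic behind this value can be written out step by step using the computational form of r. The labels S_xy, S_xx and S_yy are introduced here for convenience and do not appear in the original slides; intermediate values are rounded to one decimal place.

```latex
\begin{aligned}
S_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n}
        = 553{,}800 - \frac{1{,}785 \times 13{,}207}{41} \approx -21{,}187.7 \\
S_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n}
        = 82{,}845 - \frac{1{,}785^2}{41} \approx 5{,}132.2 \\
S_{yy} &= \sum y^2 - \frac{(\sum y)^2}{n}
        = 4{,}757{,}609 - \frac{13{,}207^2}{41} \approx 503{,}344.4 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
   \approx \frac{-21{,}187.7}{50{,}825.8} \approx -0.42
\end{aligned}
```

The negative sign comes entirely from the numerator: the sum of products about the means is negative, so older men tend to have lower muscle strength.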
Inference on Correlation Coefficient
[Figure: three scatter plots of Y against X, showing r = 0 with b = 0,
r < 0 with b < 0, and r > 0 with b > 0]
Hypothesis testing on correlation coefficient
\[
t = r\sqrt{\frac{n-2}{1-r^2}} = 0.42\sqrt{\frac{41-2}{1-(0.42)^2}} = 2.89
\]
P < 0.01
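The test statistic can be reproduced with a short sketch. The function name `t_statistic` is chosen here for illustration; the sketch computes only t and its n - 2 degrees of freedom, and leaves the P value to be read from a t table or statistical package.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a zero correlation: r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


n = 41
t = t_statistic(r=0.42, n=n)
print(round(t, 2), "on", n - 2, "degrees of freedom")
```

Rounded to two decimal places, this matches the value t = 2.89 shown above.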
Interpretation of correlation coefficient
• Correlation coefficients lie within the range -1 to +1, with the
midpoint of zero indicating no linear association between the two
variables. Statistically independent variables have zero correlation,
but a correlation of zero does not by itself show that the variables
are independent.