
Chapter 5

This document discusses correlation analysis and correlation coefficients. It defines correlation coefficients as a measure of the strength of the linear relationship between two quantitative variables from -1 to 1. A value of 1 or -1 indicates a perfect linear relationship, while 0 indicates no linear relationship. The document provides an example of calculating the correlation coefficient between height and muscle strength in male alcoholics, finding a positive correlation of 0.42. It also discusses interpreting and testing the significance of correlation coefficients.

Uploaded by

Samuel Debele
Copyright
© All Rights Reserved

Chapter 5:

Correlation Analysis

Introduction
• Correlation coefficients are used to measure the strength of the
relationship or association between two quantitative variables.

• The standard method (Pearson correlation) leads to a quantity
called r that can take on any value from -1 to +1.

• This correlation coefficient, r, measures the degree of
'straight-line' association between the values of two variables.

• Thus a value of +1.0 or -1.0 is obtained if all the points in a
scatter plot lie on a perfectly straight line.

• Suppose we have data on two variables X and Y.

• If there is a close relationship between the two variables, the
points in the scatter plot will tend to cluster together;
otherwise they will be scattered all around.

• If high values of X are accompanied by high values of Y and low
values of X by low values of Y, then the sum of products of the
two deviations from the means will be positive, and hence r will
be positive too.

Example: Height and quadriceps muscle strength in 41 male alcoholics

Figure 1. Scatter diagram showing muscle strength and height for 41 male alcoholics
• Looking at Figure 1, it is fairly easy to see that taller men
tend to be stronger than shorter men, or,
• looking at it the other way round, that stronger men tend to be
taller than weaker men.

• It is only a tendency: the tallest man is not the strongest, nor
is the shortest man the weakest.
• Correlation enables us to measure how close this association is.

• The computation of the correlation coefficient is based on the
products of differences from the mean of the two variables.

• To see how correlation works, we can draw two lines on the
scatter diagram,
– a horizontal line through the mean strength and
– a vertical line through the mean height, as shown in Figure 2.

• Because large heights tend to go with large strength and small
heights with small strength, there are more observations in the
top right and bottom left quadrants than there are in the top
left and bottom right quadrants.

Figure 2. Scatter diagram showing muscle strength and height for 41
male alcoholics, with lines through the mean height and mean strength
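• For each subject, the product of the two differences from the
mean is positive in the top right and bottom left quadrants and
negative in the top left and bottom right quadrants.

• When we add the products for all subjects, the sum will be positive,
because there are more positive products than negative ones.

• Further, subjects with very large values for both height and
strength, or very small values for both, will have large positive
products.

• So the stronger the relationship is, the bigger the sum of products
will be.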
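
The quadrant argument above can be sketched on a small data set. The six height and strength values below are hypothetical toy numbers, not the 41-subject data behind Figures 1 and 2, and the variable names are chosen only for illustration.

```python
# Hypothetical toy data (NOT the 41 male alcoholics from the slides).
heights = [160, 165, 170, 175, 180, 185]
strengths = [200, 300, 250, 280, 310, 400]

mean_height = sum(heights) / len(heights)
mean_strength = sum(strengths) / len(strengths)

# Product of the two differences from the mean, one per subject.
products = [
    (h - mean_height) * (s - mean_strength)
    for h, s in zip(heights, strengths)
]

# Positive products: top right or bottom left quadrant.
n_positive = sum(p > 0 for p in products)
# Negative products: top left or bottom right quadrant.
n_negative = sum(p < 0 for p in products)

sum_of_products = sum(products)
print(n_positive, n_negative, sum_of_products)
```

In this toy set the few negative products are small, while the subjects far from both means contribute the large positive ones, so the sum comes out positive.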

• If the sum of products is positive, we say that there is a positive
correlation between the variables.

• If the sum of products is negative, we say that there is a negative
correlation between the variables.

• The sum of products will depend on the number of observations
and the units in which they are measured.
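
A minimal sketch of that last point, using hypothetical toy values rather than the study data: re-expressing height in metres instead of centimetres, or duplicating every observation, changes the sum of products even though the pattern in the scatter diagram is the same. The helper name `sum_of_products` is an assumption made for this illustration.

```python
def sum_of_products(xs, ys):
    """Sum of products of the differences from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Hypothetical toy data (not the data from the slides).
heights_cm = [160, 165, 170, 175, 180, 185]
strengths = [200, 300, 250, 280, 310, 400]

sp_cm = sum_of_products(heights_cm, strengths)

# Same subjects, height in metres: the sum shrinks by a factor of 100.
heights_m = [h / 100 for h in heights_cm]
sp_m = sum_of_products(heights_m, strengths)

# Same pattern, every subject counted twice: the sum doubles.
sp_doubled = sum_of_products(heights_cm * 2, strengths * 2)

print(sp_cm, sp_m, sp_doubled)
```

This is why the raw sum of products cannot be used directly as a measure of association and has to be standardized.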

If we have two variables X and Y with values x_i and y_i for the ith
individual, the correlation between them, denoted by r(X,Y), is
given by:

\[
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}
\]
The equation is symmetrical: it does not matter which variable is
called X and which is called Y.
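
As a rough check of the two forms of the formula, the sketch below computes r both from differences about the means and from raw sums, and confirms that swapping X and Y leaves the result unchanged. The function names and the toy data are assumptions made for illustration; they do not come from the slides.

```python
import math

def r_from_deviations(xs, ys):
    """Pearson r from differences about the means (first form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_from_raw_sums(xs, ys):
    """Pearson r from raw sums (second, computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (not the data from the slides).
xs = [160, 165, 170, 175, 180, 185]
ys = [200, 300, 250, 280, 310, 400]

r_a = r_from_deviations(xs, ys)
r_b = r_from_raw_sums(xs, ys)
r_swapped = r_from_deviations(ys, xs)  # X and Y exchanged
print(r_a, r_b, r_swapped)
```

The raw-sums form is convenient for hand calculation, while the deviations form is usually the numerically safer choice when the values are large relative to their spread.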

Example 1: The correlation coefficient for muscle strength (y)
and height (x) will be:

n = 41
Σx = 7,000
Σy = 13,207
Σx² = 1,196,828
Σy² = 4,757,609
Σxy = 2,267,142

r = 0.42
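
The value of r for Example 1 can be reproduced from the summary sums alone, using the raw-sums form of the formula. The function name `r_from_sums` and its parameter names are assumptions introduced for this sketch; the numbers are the ones listed in the example.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r computed from summary sums only."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)

# Summary sums from Example 1: height (x) and muscle strength (y).
r_height = r_from_sums(
    n=41,
    sum_x=7000,
    sum_y=13207,
    sum_x2=1196828,
    sum_y2=4757609,
    sum_xy=2267142,
)
print(round(r_height, 2))  # prints 0.42
```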
• The size of the correlation coefficient reflects the degree
of closeness to a straight line on the scatter diagram.

• Even when there is a perfect relationship between the variables,
r will not equal -1.0 or +1.0 unless the points lie on a straight
line; otherwise its magnitude is less than 1.0.

• Correlation measures closeness to a linear relationship, not to
just any perfect relationship.

Figure 3. Scatter diagram showing simulated data from a population where
there is a perfect relationship between the variables and yet the population
correlation coefficient is less than one
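
In the spirit of Figure 3, the sketch below compares an exact straight-line relationship with an exact but curved one. The data are simulated for illustration only (they are not the data plotted in Figure 3), and `pearson_r` is a helper name assumed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from differences about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))

# Perfect relationship on a straight line: y is determined by x.
ys_line = [3 * x + 2 for x in xs]
# Perfect relationship on a curve: y is still determined by x.
ys_curve = [x ** 2 for x in xs]

r_line = pearson_r(xs, ys_line)
r_curve = pearson_r(xs, ys_curve)
print(r_line, r_curve)
```

Both relationships are exact, but only the straight-line one gives r equal to 1 (up to floating-point rounding); the curved one gives a value below 1.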

Example 2: Figure 4. Scatter diagram showing muscle strength(y) and
age(x) for 41 male alcoholics

Figure 5. Scatter diagram showing muscle strength and age for 41
male alcoholics, with lines through the mean
Example 2: The correlation coefficient for muscle strength (y)
and age (x) will be:

n = 41
Σx = 1,785
Σy = 13,207
Σx² = 82,845
Σy² = 4,757,609
Σxy = 553,800

r = -0.42
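
The arithmetic behind this value can be written out by substituting the sums into the raw-sums form of the formula. The shorthand S_xy, S_xx and S_yy for the corrected sums is introduced here for convenience and does not appear in the slides; intermediate values are rounded.

\[
\begin{aligned}
S_{xy} &= 553{,}800 - \frac{(1{,}785)(13{,}207)}{41} \approx -21{,}187.7 \\
S_{xx} &= 82{,}845 - \frac{(1{,}785)^2}{41} \approx 5{,}132.2 \\
S_{yy} &= 4{,}757{,}609 - \frac{(13{,}207)^2}{41} \approx 503{,}344.4 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \approx \frac{-21{,}187.7}{\sqrt{5{,}132.2 \times 503{,}344.4}} \approx -0.42
\end{aligned}
\]

The negative sign comes entirely from S_xy: older men in this sample tend to have lower muscle strength.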

Inference on Correlation Coefficient

[Figure: three scatter plots of Y against X, illustrating r = 0 (slope b = 0), r < 0 (b < 0) and r > 0 (b > 0)]

Hypothesis testing on correlation coefficient

Under the null hypothesis that there is no association in the
population (ρ = 0), the appropriate test statistic

\[
t = r \sqrt{\frac{n - 2}{1 - r^2}}
\]

has a t-distribution with n - 2 degrees of freedom.

Example: for the muscle-height data:

\[
t = r \sqrt{\frac{n - 2}{1 - r^2}} = 0.42 \sqrt{\frac{41 - 2}{1 - (0.42)^2}} = 2.89
\]

P < 0.01
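
A minimal sketch of this test for the muscle-height data, using only the standard library. The function name `t_statistic` is an assumption, and the critical value 2.708 is the two-sided 1% point of the t-distribution with 39 degrees of freedom as read from standard t tables; it is quoted here as an assumption and does not appear in the slides.

```python
import math

def t_statistic(r, n):
    """t statistic for testing the null hypothesis rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.42, 41
t_value = t_statistic(r, n)
degrees_of_freedom = n - 2

# Assumed table value: two-sided 1% critical point for 39 d.f.
T_CRITICAL_1_PERCENT = 2.708

significant_at_1_percent = abs(t_value) > T_CRITICAL_1_PERCENT
print(round(t_value, 2), degrees_of_freedom, significant_at_1_percent)
```

Because the observed t exceeds the tabulated 1% point, the result agrees with the P < 0.01 reported for the example.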
Interpretation of correlation coefficient
• Correlation coefficients lie within the range -1 to +1, with the
mid-point of zero indicating no linear association between the two
variables. Statistically independent variables have zero
correlation, but zero correlation does not by itself imply
independence.

• A very small correlation does not necessarily indicate that two
variables are not associated, however.

• To be sure of this we should study a plot of the data, because it is
possible that the two variables display a non-linear relationship
(for example cyclical or curved).
• In such cases r will underestimate the association, as it is a
measure of linear association alone.
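
An extreme case of this underestimation can be sketched with simulated data: a symmetric curve where Y is completely determined by X, yet r is zero. The data and the helper name `pearson_r` are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from differences about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Simulated data: x symmetric about zero, y a U-shaped function of x.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

r_u_shape = pearson_r(xs, ys)
print(r_u_shape)
```

The positive and negative products cancel exactly on the two arms of the curve, which is why a plot of the data is needed before reading a small r as "no association".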