Correlation


Bangladesh University of Business and Technology

Course Title: Introduction to Statistics


Chapter: Correlation

Bivariate data

Bivariate data is data for which there are two variables for each observation. As an
example, the following bivariate data show the ages of husbands and wives of 10 married
couples.

Husband 36 72 37 36 51 50 47 50 37 41
Wife 35 67 33 35 50 46 47 42 36 41

Correlation

Correlation is a statistical technique that measures and analyses the degree or extent to
which two or more variables fluctuate with reference to one another.

Correlation thus denotes the interdependence amongst variates. The degree of the relationship is
expressed by a coefficient which ranges between $-1$ and $+1$. The direction of change is
indicated by the $+$ or $-$ sign.

If an increase (decrease) in one variable is accompanied by a corresponding increase
(decrease) in the other, i.e. if the changes are in the same direction, the variables are
positively correlated. For example, the heights and weights of a group of persons are
positively correlated, as are advertising expenditure and sales.

If an increase (decrease) in one variable is accompanied by a corresponding decrease
(increase) in the other, i.e. if the changes are in opposite directions, the variables are
negatively correlated. For example, TV registrations and cinema attendance are
negatively correlated.

An absence of correlation is indicated by zero.

Correlation thus expresses the relationship through a relative measure of change and it
has nothing to do with the units in which the variables are expressed.
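
The unit-free property can be checked numerically. The sketch below uses the husband and wife ages from the bivariate data above and the product-moment coefficient $r$ (Karl Pearson's coefficient, treated later in this chapter). The helper name `pearson_r` and the conversion of the husbands' ages from years to months are illustrative assumptions, not part of the original notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

husband_years = [36, 72, 37, 36, 51, 50, 47, 50, 37, 41]
wife_years = [35, 67, 33, 35, 50, 46, 47, 42, 36, 41]

# Assumption for illustration: express the husbands' ages in months
# instead of years, which is a pure change of units.
husband_months = [12 * age for age in husband_years]

r_years = pearson_r(husband_years, wife_years)
r_mixed = pearson_r(husband_months, wife_years)

# Both values agree (up to floating-point rounding).
print(round(r_years, 4), round(r_mixed, 4))
```

Rescaling one variable multiplies the numerator and the denominator of $r$ by the same factor, which is why the coefficient does not change.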

Q. What are the uses of Correlation?

Uses

Economic theory and business studies deal with relationships between variables such as
price and quantity demanded, or advertising expenditure and sales promotion
measures. Correlation analysis helps in deriving precisely the
degree and direction of such relationships.

The concepts of regression are also based upon the measure of correlation.

Scatter Diagram

A scatter diagram (also called a dotogram or scattergram) is a simple and attractive method of
diagrammatic representation of a bivariate distribution for ascertaining the nature of the correlation
between the variables. For a bivariate distribution $(x_i, y_i)$, $i = 1, 2, \ldots, n$, if the
values of the variables $X$ and $Y$ are plotted along the $x$-axis and $y$-axis respectively in
the $xy$-plane, the diagram of dots so obtained is known as a scatter diagram.

In other words, a scatter plot of two variables shows the values of one variable on the
$x$-axis and the values of the other variable on the $y$-axis. Scatter plots are well suited
for revealing the relationship between two variables.

[Figure: Scatter diagram, with x on the horizontal axis (0 to 120) and y on the vertical axis (0 to 70)]

Md. Moyazzem Hossain
Lecturer in Statistics

Types of Correlation

Correlation is described or classified in several different ways. Three of the most
important are:

1. Positive and negative correlation

2. Simple, partial and multiple correlation

3. Linear and non-linear correlation

1. Positive and negative correlation

If two variables change in the same direction (i.e. if one increases the other also increases,
or if one decreases the other also decreases), then this is called a positive correlation. For
example:

Positive correlation      Positive correlation
(both increasing)         (both decreasing)
  X     Y                   X     Y
 10    15                  80    50
 12    20                  70    45
 14    22                  60    30
 18    25                  40    20
 20    37                  30    10

If two variables change in opposite directions (i.e. if one increases, the other decreases
and vice versa), then the correlation is called a negative correlation. For example: TV
registrations and cinema attendance.

Negative correlation      Negative correlation
(X rising, Y falling)     (X falling, Y rising)
  X     Y                   X     Y
 20    40                 100    10
 30    30                  90    20
 40    22                  60    30
 60    15                  40    40
 80    12                  30    50

2. Simple, Partial and Multiple Correlation

• When only two variables are studied it is a problem of simple correlation.

• When three or more variables are studied it is a problem of either multiple or
partial correlation.

In multiple correlation three or more variables are studied simultaneously. For example,
when we study the relationship between the yield of rice per acre and both the amount of
rainfall and the amount of fertilizer used, it is a problem of multiple correlation. Similarly,
the relationship between plastic hardness, temperature and pressure is multivariate.

In partial correlation we recognize more than two variables, but consider only two of them
to be influencing each other, the effect of the other influencing variables being kept constant.
For example, in the rice problem above, if we limit our correlation analysis of yield and rainfall
to periods when a certain average daily temperature existed, it becomes a problem of partial
correlation.

3. Linear and non-linear correlation


The shape of the graph indicates whether the correlation between two variables is linear.
If the plotted points lie on a straight line, the correlation is called a “linear correlation”;
if they do not, the correlation is non-linear (curvilinear).

The distinction between linear and non-linear correlation is based upon the constancy of
the ratio of change between the variables. If the amount of change in one variable tends
to bear a constant ratio to the amount of change in the other variable then the correlation
is said to be linear. For example, observe the following two variables X and Y:

X: 10 20 30 40 50
Y: 70 140 210 280 350

It is clear that the ratio of change between the two variables is the same. If such variables
are plotted on a graph paper all the plotted points would fall on a straight line.
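
The constant-ratio test described above can be sketched in a few lines. The variable names are illustrative; the data are the X and Y values given in the text.

```python
x = [10, 20, 30, 40, 50]
y = [70, 140, 210, 280, 350]

# Ratio of the change in Y to the change in X between consecutive observations.
ratios = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

# The relationship is exactly linear when every ratio is the same.
is_linear = all(abs(r - ratios[0]) < 1e-12 for r in ratios)

print(ratios)     # [7.0, 7.0, 7.0, 7.0]
print(is_linear)  # True
```

With real data the ratios are rarely identical, so in practice one looks for ratios that are roughly constant rather than exactly equal.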

[Figure: Scatter diagram of the X and Y values above; the points fall on a straight line]


Correlation would be called non-linear or curvilinear if the amount of change in one
variable doesn’t bear a constant ratio to the amount of change in the other variable. For
example, if we double the amount of rainfall, the production of rice or wheat etc. would
not necessarily be doubled.

[Figure: Scatter diagram of a non-linear (curvilinear) relationship, with x on the horizontal axis and y on the vertical axis]

Properties of the Coefficient of Correlation

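The following are the important properties of the coefficient of correlation, $r$:

• The coefficient of correlation lies between $-1$ and $+1$, i.e. $-1 \le r \le +1$.

• The coefficient of correlation is the geometric mean of the two regression
coefficients. Symbolically: $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the regression coefficients of $x$ on $y$ and of $y$ on $x$, and $r$ takes their common sign.

• If $X$ and $Y$ are independent variables then the coefficient of correlation is zero.
However, the converse is not true.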

Degrees of Correlation

Through the coefficient of correlation, we can measure the degree or extent of the
correlation between two variables. On the basis of the coefficient of correlation we can
also determine whether the correlation is positive or negative and also its degree or
extent.

Perfect correlation: If two variables change in the same direction and in the
same proportion, the correlation between the two is perfect positive. According
to Karl Pearson the coefficient of correlation in this case is $+1$. On the other hand,
if the variables change in opposite directions and in the same proportion, the
correlation is perfect negative. Its coefficient of correlation is $-1$. In practice we
rarely come across these types of correlation.


Absence of correlation: If two series of two variables exhibit no relation
between them, or a change in one variable does not lead to a change in the other
variable, then we can say that there is no correlation between the two variables.
In such a case the coefficient of correlation is 0.

Limited degrees of correlation: If two variables are not
perfectly correlated, nor is there a perfect absence of correlation,
then we term the correlation limited correlation. It may be
positive or negative, and lies within the limits $\pm 1$.

High, moderate and low degree are the three categories of this kind of
correlation. The following table shows the degree of correlation indicated by the value of the coefficient of correlation.

Degree                     Positive            Negative
Absence of correlation     0                   0
Perfect correlation        +1                  -1
High degree                +0.75 to +1         -0.75 to -1
Moderate degree            +0.25 to +0.75      -0.25 to -0.75
Low degree                 0 to +0.25          0 to -0.25
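
The table can be turned into a small lookup, sketched below. The function name `degree_of_correlation` is hypothetical, and because the table does not say which band the boundary values 0.25 and 0.75 belong to, the code assumes each boundary belongs to the higher band.

```python
def degree_of_correlation(r):
    """Return a verbal label for a correlation coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} correlation"
    if size >= 0.75:      # assumption: 0.75 itself counts as high
        degree = "high"
    elif size >= 0.25:    # assumption: 0.25 itself counts as moderate
        degree = "moderate"
    else:
        degree = "low"
    return f"{degree} degree of {direction} correlation"

for value in (0.9, 0.6, -0.1, -1.0, 0.0):
    print(value, "->", degree_of_correlation(value))
```

These bands are a descriptive convention only; whether a given value counts as "high" also depends on the field of study and the sample size.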

Methods of Determining Correlation

We shall consider the following most commonly used methods.

(1) Scatter plot.

(2) Karl Pearson’s coefficient of correlation.

(3) Spearman’s rank-correlation coefficient.

(4) Method of least squares.

Scatter Plot (Scatter diagram or dot diagram)

In this method the values of the two variables are plotted on graph paper. One is taken
along the horizontal ($x$-axis) and the other along the vertical ($y$-axis). By plotting the
data, we get points (dots) on the graph which are generally scattered, hence the name
‘Scatter Plot’.

The manner in which these points are scattered suggests the degree and the direction of
correlation. The degree of correlation is denoted by $r$ and its direction is given by its
positive or negative sign.

If all points lie on a rising straight line the correlation is perfectly positive and
$r = +1$.

[Figure: Scatter diagram with all points on a rising straight line, x on the horizontal axis and y on the vertical axis]

If all points lie on a falling straight line the correlation is perfectly negative and
$r = -1$.


[Figure: Scatter diagram with all points on a falling straight line, x on the horizontal axis and y on the vertical axis]

If the points lie in a narrow strip, rising upwards, the correlation is
of a high degree and positive.

If the points lie in a narrow strip, falling downwards, the
correlation is of a high degree and negative.


If the points are spread widely over a broad strip, rising upwards,
the correlation is of a low degree and positive.

If the points are spread widely over a broad strip, falling
downwards, the correlation is of a low degree and negative.

If the points are spread (scattered) without any specific pattern,
the correlation is absent, i.e. $r = 0$.

[Figure: Scatter diagram with points scattered without any pattern, x on the horizontal axis (0 to 40) and y on the vertical axis (0 to 60)]

Though this method is simple and gives a rough idea about the existence
and the degree of correlation, it is not reliable. As it is not a
mathematical method, it cannot measure the degree of correlation exactly.


Merits and Limitations of the Method

Merits

• It is a simple and non-mathematical method of studying correlation between the
variables. As such it can be easily understood, and a rough idea can very quickly
be formed as to whether or not the variables are related.

• It is not influenced by the size of extreme values, whereas most of the
mathematical methods of finding correlation are influenced by extreme values.

• Making a scatter diagram is usually the first step in investigating the relationship
between the variables.

Limitations

By applying this method we can get an idea about the direction of correlation and also
whether it is high or low. But we cannot establish the exact degree of correlation between
the variables, as is possible by applying the mathematical methods.

Example 1: Given the following pairs of values:

Capital employed (crores of Rs.):    1    2    3    4    5    7    8    9   11   12
Profit (lakhs of Rs.):               3    5    4    7    9    8   10   11   12   14

1) Make a scatter diagram.

2) Do you think that there is any correlation between profits and capital employed?
Is it positive? Is it high or low?


[Figure: Scatter diagram of profit (lakhs of Rs., vertical axis, 0 to 16) against capital employed (crores of Rs., horizontal axis, 0 to 15)]

By looking at the scatter diagram we can say that the variables profits and capital
employed are correlated. Further, the correlation is positive because the trend of the points is
upward, rising from the lower left-hand corner to the upper right-hand corner of the
diagram.

The diagram also indicates that the degree of relationship is high because the plotted
points lie in a narrow band, which shows that it is a case of a high degree of positive
correlation.
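
When no plotting tool is at hand, the scatter diagram for this example can be approximated in plain text. The sketch below is a dependency-free illustration, not the method used in the notes; the function name `ascii_scatter` is hypothetical and it assumes small non-negative integer data, as in this example. A plotting library would normally be used instead.

```python
capital = [1, 2, 3, 4, 5, 7, 8, 9, 11, 12]   # crores of Rs.
profit = [3, 5, 4, 7, 9, 8, 10, 11, 12, 14]  # lakhs of Rs.

def ascii_scatter(xs, ys):
    """Return a text scatter diagram, one row per y value (largest at the top).

    Assumes small non-negative integer data so values can index a grid.
    """
    width = max(xs)
    rows = []
    for level in range(max(ys), 0, -1):
        cells = [" "] * (width + 1)
        for x, y in zip(xs, ys):
            if y == level:
                cells[x] = "*"
        rows.append(f"{level:>2} |" + "".join(cells).rstrip())
    rows.append("   +" + "-" * (width + 1))
    return "\n".join(rows)

plot = ascii_scatter(capital, profit)
print(plot)
```

The stars run from the lower left to the upper right in a narrow band, which matches the reading of the diagram given above.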

Karl Pearson’s Coefficient of Correlation

Of the several mathematical methods of measuring correlation, Karl Pearson’s
method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in
practice. The coefficient of correlation is denoted by the symbol $r$. If the two variables
under study are $X$ and $Y$, the following formula suggested by Karl Pearson can be used
for measuring the degree of relationship:

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

where $n$ is the number of pairs of observations.

The value of the coefficient of correlation as obtained by the above formula shall always
lie between $-1$ and $+1$.

When $r = +1$, it means there is perfect positive correlation between the variables.

When $r = -1$, it means there is a perfect negative correlation between the variables.

When $r = 0$, it means there is no linear relationship between the variables.

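Example 1: Calculate the coefficient of correlation between the heights
of fathers and their sons for the following data.

Height of father (cm):   165  166  167  168  167  169  170  172
Height of son (cm):      167  168  165  172  168  172  169  171

Solution:

We know that the coefficient of correlation is

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

Let the height of the father be $x$ and the height of the son be $y$. Here $n = 8$.

By using a calculator we get

$\sum x^2 = 225828$, $\sum x = 1344$, $\sum y^2 = 228532$, $\sum y = 1352$, $\sum xy = 227160$.

Therefore

$$r = \frac{227160 - \dfrac{1344 \times 1352}{8}}{\sqrt{\left(225828 - \dfrac{1344^2}{8}\right)\left(228532 - \dfrac{1352^2}{8}\right)}} = \frac{24}{\sqrt{36 \times 44}} = 0.603022689 \approx 0.603$$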
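
The calculator work in this solution can be reproduced with a short script. This is a minimal sketch using only the standard library; the variable names are illustrative, and the formula is the raw-sums form of Pearson's coefficient, which is consistent with the sums quoted in the solution.

```python
from math import sqrt

father = [165, 166, 167, 168, 167, 169, 170, 172]  # x, in cm
son = [167, 168, 165, 172, 168, 172, 169, 171]     # y, in cm

n = len(father)
sum_x = sum(father)                                # 1344
sum_y = sum(son)                                   # 1352
sum_xx = sum(x * x for x in father)                # 225828
sum_yy = sum(y * y for y in son)                   # 228532
sum_xy = sum(x * y for x, y in zip(father, son))   # 227160

numerator = sum_xy - sum_x * sum_y / n             # 24.0
denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
r = numerator / denominator

print(round(r, 3))  # 0.603
```

Printing the intermediate sums is a convenient way to check a hand calculation line by line.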


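Example 2: The following data consist of observations for the weights of 10 different
automobiles (in 1000 pounds) and the corresponding fuel consumptions (gallons per
100 miles).

Weight (x)    Fuel consumption (y)
   3.4               5.5
   3.8               5.9
   4.1               6.5
   2.2               3.3
   2.6               3.6
   2.9               4.6
   2.0               2.9
   2.7               3.6
   1.9               3.1
   3.4               4.9

We would like to find out how $y$ is correlated with $x$.

Solution: We know that the coefficient of correlation is

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

By using a calculator we get, with $n = 10$,

$\sum x^2 = 89.28$, $\sum x = 29$, $\sum y^2 = 207.31$, $\sum y = 43.9$, $\sum xy = 135.8$.

Therefore

$$r = \frac{135.8 - \dfrac{29 \times 43.9}{10}}{\sqrt{\left(89.28 - \dfrac{29^2}{10}\right)\left(207.31 - \dfrac{43.9^2}{10}\right)}} = \frac{8.49}{\sqrt{5.18 \times 14.589}} = 0.976629971 \approx 0.977$$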

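Example 3: Suppose that we took 7 mice and measured their body weight and their length
from nose to tail. We obtained the following results and want to know if there is any
relationship between the measured variables. [To keep the calculations simple, we will
use small numbers.]

Mouse    Units of weight (X)    Units of length (Y)
  1              1                      2
  2              4                      5
  3              3                      8
  4              4                     12
  5              8                     14
  6              9                     19
  7              8                     22

Solution: We know that the coefficient of correlation is

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

By using a calculator we get, with $n = 7$,

$\sum x^2 = 251$, $\sum x = 37$, $\sum y^2 = 1278$, $\sum y = 82$, $\sum xy = 553$.

Therefore

$$r = \frac{553 - \dfrac{37 \times 82}{7}}{\sqrt{\left(251 - \dfrac{37^2}{7}\right)\left(1278 - \dfrac{82^2}{7}\right)}} = 0.901441541 \approx 0.90$$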

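Example 4: The data below are the heights (cm) and weights (kg) of 20 female students
taking STAT 201. Calculate the coefficient of correlation between the heights and
weights of the female students.

SL    fht    fwt
 1    167     60
 2    164     65
 3    170     64
 4    163     47
 5    152     46
 6    160     57
 7    170     57
 8    160     55
 9    157     55
10    170     65
11    150     50
12    156     46
13    168     60
14    159     55
15    160     50
16    172     69
17    175     56
18    169     56
19    169     72
20    156     56

Solution

We know that the coefficient of correlation is

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

Let fht be denoted by $x$ and fwt by $y$. Here $n = 20$.

By using a calculator we get

$\sum x^2 = 534615$, $\sum x = 3267$, $\sum y^2 = 66113$, $\sum y = 1141$, $\sum xy = 187045$.

Therefore

$$r = \frac{187045 - \dfrac{3267 \times 1141}{20}}{\sqrt{\left(534615 - \dfrac{3267^2}{20}\right)\left(66113 - \dfrac{1141^2}{20}\right)}} = \frac{662.65}{\sqrt{950.55 \times 1018.95}} = 0.673318089 \approx 0.673$$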
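
With 20 observations the raw sums become large, so it is worth checking the result a second way. The sketch below computes $r$ for this example both from raw sums and from deviations about the means, two algebraically equivalent forms of Pearson's coefficient. The function names `r_from_sums` and `r_from_deviations` are illustrative assumptions.

```python
from math import sqrt

fht = [167, 164, 170, 163, 152, 160, 170, 160, 157, 170,
       150, 156, 168, 159, 160, 172, 175, 169, 169, 156]  # heights, cm
fwt = [60, 65, 64, 47, 46, 57, 57, 55, 55, 65,
       50, 46, 60, 55, 50, 69, 56, 56, 72, 56]            # weights, kg

def r_from_sums(x, y):
    """Computational form: uses the raw sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x) - sx ** 2 / n
    syy = sum(b * b for b in y) - sy ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    return sxy / sqrt(sxx * syy)

def r_from_deviations(x, y):
    """Definitional form: uses deviations of each value from its mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r_sums = r_from_sums(fht, fwt)
r_dev = r_from_deviations(fht, fwt)
print(round(r_sums, 3), round(r_dev, 3))  # 0.673 0.673
```

The raw-sums form suits a calculator; the deviation form subtracts the means first and so avoids differences between very large numbers.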

(3) Spearman’s Rank Correlation

The association between two series of ranks is called rank correlation. The method of
ascertaining the coefficient of correlation by ranks was devised by Charles Edward
Spearman in 1904. This method is especially useful when the actual magnitudes or
item values are not given and only their ranks in the series are known. Spearman’s rank
correlation coefficient, usually denoted by $\rho$ (rho), is given by the formula:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where $d$ stands for the difference between the pair of ranks and $n$ for the number of paired
observations.

The value of Spearman’s rank correlation coefficient ranges between $-1$ and $+1$. When
$\rho$ is $+1$, the concordance between the rankings is perfect and the ranks are in the same
direction. When $\rho$ is $-1$, the rankings again correspond perfectly, but the ranks run
in opposite directions.


In rank correlation we may have two types of problems:

A. Where actual ranks are given.

B. Where ranks are not given.

A. Where Actual Ranks are given

Where actual ranks are given, the steps required for computing rank correlation are:

• Take the difference of the two ranks, i.e. $R_1 - R_2$, and denote these differences by
$d$.

• Square these differences and obtain the total $\sum d^2$.

• Apply the formula $\rho = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$.

Example 1:
Two managers are asked to rank a group of employees in order of potential for eventually
becoming top managers. The rankings are as follows:

Employee    Ranked by Manager I    Ranked by Manager II
   A               10                      9
   B                2                      4
   C                1                      2
   D                4                      3
   E                3                      1
   F                6                      5
   G                5                      6
   H                8                      8
   I                7                      7
   J                9                     10

Compute the coefficient of rank correlation and comment on the value.

Solution:

Calculation of Rank Correlation Coefficient

Employee    Rank by Manager I ($R_1$)    Rank by Manager II ($R_2$)    $d = R_1 - R_2$    $d^2$
   A                 10                           9                         1              1
   B                  2                           4                        -2              4
   C                  1                           2                        -1              1
   D                  4                           3                         1              1
   E                  3                           1                         2              4
   F                  6                           5                         1              1
   G                  5                           6                        -1              1
   H                  8                           8                         0              0
   I                  7                           7                         0              0
   J                  9                          10                        -1              1
 Total                                                                      0             14

We know that,

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 14}{10(10^2 - 1)} = 1 - \frac{84}{990} = 0.915$$

Thus we find that there is a high degree of positive correlation in the ranks assigned by
the two managers.
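
The rank-difference calculation for the two managers can be checked with the short sketch below. The list names are illustrative; the ranks are those given for employees A to J.

```python
rank_i = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]    # Manager I, employees A..J
rank_ii = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]   # Manager II, employees A..J

n = len(rank_i)
d = [a - b for a, b in zip(rank_i, rank_ii)]   # rank differences
sum_d2 = sum(v * v for v in d)                 # total of squared differences

rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 3))  # 14 0.915
```

The differences `d` always sum to zero when both columns are complete rankings of the same items, which is a quick check on the arithmetic.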

B. Where Ranks are not given

When we are given the actual data and not the ranks, it is necessary to assign the
ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as
1. But whether we start with the lowest value or the highest value, we must follow the
same method for all the variables.

Example 1:

Calculate the rank correlation coefficient for the following data of marks in 2 tests given
to candidates for a clerical job:

Preliminary test:   92   89   87   86   83   77   71   63   53   50
Final test:         86   83   91   77   68   85   52   82   37   57

Solution:

Calculation of Rank Correlation Coefficient (rank 1 is given to the lowest mark in each test)

Preliminary test    Rank ($R_1$)    Final test    Rank ($R_2$)    $d = R_1 - R_2$    $d^2$
      92                10              86             9               1              1
      89                 9              83             7               2              4
      87                 8              91            10              -2              4
      86                 7              77             5               2              4
      83                 6              68             4               2              4
      77                 5              85             8              -3              9
      71                 4              52             2               2              4
      63                 3              82             6              -3              9
      53                 2              37             1               1              1
      50                 1              57             3              -2              4
    Total                                                              0             44

We know that,

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 44}{10(10^2 - 1)} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733$$

Thus there is a moderate to high degree of positive correlation between the preliminary and final tests (0.733 lies just below the 0.75 boundary of the high-degree band).
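
When only the raw marks are given, the ranking step can also be automated. The sketch below assigns rank 1 to the lowest mark, as in the solution, and then applies the rank-difference formula. The helper name `ranks_lowest_first` is hypothetical, and it assumes there are no tied marks (ties would need average ranks, which the notes do not cover).

```python
preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]

def ranks_lowest_first(values):
    """Give rank 1 to the lowest value; assumes no two values are equal."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks, not handled here")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rank_p = ranks_lowest_first(preliminary)
rank_f = ranks_lowest_first(final)

n = len(preliminary)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_p, rank_f))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_f)                 # [9, 7, 10, 5, 4, 8, 2, 6, 1, 3]
print(sum_d2, round(rho, 3))  # 44 0.733
```

Ranking from the highest mark downwards in both tests would give the same coefficient, since every difference `d` only changes sign.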

Merits and Limitations of the Rank Method

Merits

• This method is simpler to understand and easier to apply compared to Karl
Pearson’s method.

• Where the data are of a qualitative nature, like honesty, efficiency, intelligence
etc., this method can be used with great advantage. For example, the workers of
two factories can be ranked in order of efficiency and the degree of correlation
established by applying the method.

• This is the only method that can be used where we are given the ranks and not the
actual data.

• Even where actual data are given, the rank method can be applied for ascertaining
a rough degree of correlation.

Limitations

• This method cannot be used for finding out correlation in a grouped frequency
distribution.

• Where the number of observations exceeds 30 the calculations become quite
tedious and require a lot of time. Therefore this method should not be applied where
n exceeds 30 unless we are given the ranks and not the actual values of the
variable.

(4) Method of Least Squares

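To find the correlation coefficient by the method of least squares we have to calculate
the values of the two regression coefficients, that of $x$ on $y$ ($b_{xy}$) and that of $y$ on $x$ ($b_{yx}$).
The correlation coefficient is the square root of the product of the two regression coefficients. Symbolically,

$$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$$

where $r$ takes the common sign of the two regression coefficients.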
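
As a sketch of this method, the code below computes the two least-squares regression coefficients for the mouse data of Example 3 and recovers $r$ from their product. The notes do not give the formulas for the regression coefficients, so the standard least-squares slopes $b_{yx} = S_{xy}/S_{xx}$ and $b_{xy} = S_{xy}/S_{yy}$ are assumed here, with $S$ denoting sums of squared deviations and cross-products about the means.

```python
from math import copysign, sqrt

weight = [1, 4, 3, 4, 8, 9, 8]        # x, from Example 3
length = [2, 5, 8, 12, 14, 19, 22]    # y, from Example 3

n = len(weight)
mean_x, mean_y = sum(weight) / n, sum(length) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, length))
sxx = sum((x - mean_x) ** 2 for x in weight)
syy = sum((y - mean_y) ** 2 for y in length)

b_yx = sxy / sxx   # assumed standard least-squares slope of y on x
b_xy = sxy / syy   # assumed standard least-squares slope of x on y

# Both slopes share the sign of sxy, so r takes that sign too.
r = copysign(sqrt(b_yx * b_xy), sxy)
print(round(r, 3))  # 0.901
```

The value agrees with the coefficient obtained directly from Pearson's formula in Example 3, as it must, since $b_{yx} \cdot b_{xy} = S_{xy}^2 / (S_{xx} S_{yy})$.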

Coefficient of Determination

One very convenient and useful way of interpreting the value of the coefficient of correlation
between two variables is to use the square of the coefficient of correlation, which is called the
coefficient of determination. The coefficient of determination thus equals $r^2$.

For example, if $r = 0.9$, then $r^2$ will be 0.81, and this would mean that 81% of the
variation in the dependent variable has been explained by the independent variable.
