It is relatively reliable in the sense that it does not vary too much when repeated
samples are taken from one and the same population, at least not as much as
some other kinds of statistical descriptions.
6. The mean is typical in the sense that it is the center of gravity balancing the values
on either side of it.
1. Since the value of mean depends upon each and every item of the series, extreme
values, i.e., very small and very large items unduly affect the value of the average.
The smaller the number of observations, the greater is likely to be the impact of
extreme values.
2. In a distribution with open end classes, the value of the mean cannot be computed
without making an assumption regarding the size of the class interval of the open
end classes. If such classes contain a large proportion of the values, then the mean
may be subject to substantial error.
3. The arithmetic mean is not always a good measure of central tendency. The
mean provides a characteristic value, in the sense of indicating where most of
the values lie, only when the distribution of the variable is reasonably normal, i.e.,
bell shaped. In case of a U shaped distribution the mean is not likely to serve a
useful purpose.
When we are given a sequence of items, arithmetic mean can be calculated either by
direct method or by shortcut method.
i) Direct Method:
In the direct method, arithmetic mean can be obtained by adding all the items and then
dividing the total by the number of items.
Symbolically: $\bar{X} = \frac{\sum X}{N}$
Illustration 1 : The monthly incomes of 10 employees in an office are given below.
Income (Rs.): 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950
Calculate the arithmetic mean of incomes.
Solution: $\sum X = 16650$, $N = 10$
$\bar{X} = \frac{\sum X}{N} = \frac{16650}{10} = 1665$
The average income is Rs. 1665.
ii) Short cut Method:
In the short cut method, one of the items, usually the middle item will be taken as the
assumed mean and then deviations are calculated from the assumed mean. The formula for
calculating arithmetic mean is
$\bar{X} = A + \frac{\sum d}{N}$, where d = X - A, the deviations taken from the assumed mean A.
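Why the short cut formula agrees with the direct method can be seen in one line. The following is a sketch of the algebra, using only the symbols defined above (d = X - A, with A any assumed value):

```latex
\sum X = \sum (A + d) = NA + \sum d
\quad\Longrightarrow\quad
\bar{X} = \frac{\sum X}{N} = A + \frac{\sum d}{N}
```

The choice of A therefore affects only the size of the numbers handled, not the result.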
Illustration 2: The monthly incomes of 10 employees in an office are given below.
Income (Rs.) : 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050,1950
Calculate the arithmetic mean of incomes by short cut method.
Solution: Let us take 1750 as the assumed mean.
Income X (Rs.)     d = X - 1750
1780                   30
1760                   10
1690                  -60
1750                    0
1840                   90
1920                  170
1100                 -650
1810                   60
1050                 -700
1950                  200
Total                -850
A = 1750, N = 10, $\sum d = -850$
$\bar{X} = A + \frac{\sum d}{N} = 1750 + \frac{-850}{10} = 1750 - 85 = 1665$
The average income is Rs. 1665
Note that we get the same value of the arithmetic mean by either the direct method or the short
cut method.
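The two methods for individual observations can be sketched in a few lines of Python. This is only an illustration using the income data of Illustrations 1 and 2; the function names are hypothetical and not part of the text.

```python
# Incomes (Rs.) of the 10 employees from Illustrations 1 and 2.
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]

def mean_direct(values):
    """Direct method: sum of the items divided by the number of items."""
    return sum(values) / len(values)

def mean_shortcut(values, assumed_mean):
    """Short cut method: A + (sum of deviations from A) / N."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)

print(mean_direct(incomes))          # 1665.0
print(mean_shortcut(incomes, 1750))  # 1665.0
```

Any other assumed mean, for example 1800, returns the same answer.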
D. Calculation of Arithmetic Mean-Discrete Series:
When we are given a discrete series, the arithmetic mean can be calculated either by
direct method or by short cut method.
i) Direct Method:
The formula for computing arithmetic mean is
$\bar{X} = \frac{\sum fX}{N}$, where f = frequency, X = variable and N = total frequency.
Illustration 3: From the following data of marks obtained by 60 students of a class, calculate the
arithmetic mean.
Marks (X)              20   30   40   50   60   70
No. of Students (f)     8   12   20   10    6    4
Solution:
X         f       fX
20        8      160
30       12      360
40       20      800
50       10      500
60        6      360
70        4      280
Total    60     2460
$N = \sum f = 60$, $\sum fX = 2460$
$\bar{X} = \frac{\sum fX}{N} = \frac{2460}{60} = 41$
ii) Short cut Method:
In the short cut method, usually we take middle item of X as the assumed mean and
calculate the deviations from that assumed mean. In this case the arithmetic mean can be obtained
from the following formula.
$\bar{X} = A + \frac{\sum fd}{N}$, where A is the assumed mean,
d = X - A and N is the total frequency.
Illustration 4 : From the following data of marks obtained by 60 students of a class, calculate
arithmetic mean.
Marks (X) 20 30 40 50 60 70
No. of Students (f) 8 12 20 10 6 4
Solution :
X         f     d = X - 40     fd
20        8        -20        -160
30       12        -10        -120
40       20          0           0
50       10         10         100
60        6         20         120
70        4         30         120
Total    60                     60
N = 60, A = 40, $\sum fd = 60$
$\bar{X} = A + \frac{\sum fd}{N} = 40 + \frac{60}{60} = 41$
Note that we could get the same value by both the methods.
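For a discrete series the same comparison can be sketched as follows, using the marks data of Illustrations 3 and 4. The function names are illustrative assumptions.

```python
# Marks and number of students from Illustrations 3 and 4.
marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]

def discrete_mean_direct(xs, fs):
    """Direct method: sum(f * X) / N, where N is the total frequency."""
    n = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n

def discrete_mean_shortcut(xs, fs, assumed_mean):
    """Short cut method: A + sum(f * d) / N with d = X - A."""
    n = sum(fs)
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + total_fd / n

print(discrete_mean_direct(marks, students))        # 41.0
print(discrete_mean_shortcut(marks, students, 40))  # 41.0
```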
E. Calculation of Arithmetic Mean-Continuous Series:
In a continuous series, arithmetic mean can be computed by applying direct method or
short cut method or step deviation method.
i) Direct Method: The formula for obtaining arithmetic mean is given by
$\bar{X} = \frac{\sum fX}{N}$, where X represents the middle value of class intervals,
f = frequency of each class, N = total frequency.
Illustration 5: Calculate the arithmetic mean for the following data by direct method.
Solution :
Class        Frequency    Mid value     fX
Interval         f            X
0-10             5            5          25
10-20           10           15         150
20-30           25           25         625
30-40           30           35        1050
40-50           20           45         900
50-60           10           55         550
Total          100                     3300
$N = 100$, $\sum fX = 3300$
$\bar{X} = \frac{\sum fX}{N} = \frac{3300}{100} = 33$
Arithmetic mean of the marks is $\bar{X} = 33$.
ii) Short cut Method:
When the short cut method is used, the arithmetic mean is computed by applying the following
formula.
$\bar{X} = A + \frac{\sum fd}{N}$, where A = assumed mean,
d = X - A, the deviations of mid values from the assumed mean,
N = total frequency.
Illustration 6: Calculate the arithmetic mean for the following data by short cut method.
Solution :
Class Interval   Frequency   Mid value   d = X - 25     fd
                     f           X
0-10                 5           5          -20        -100
10-20               10          15          -10        -100
20-30               25          25            0           0
30-40               30          35           10         300
40-50               20          45           20         400
50-60               10          55           30         300
Total              100                                  800
$\bar{X} = A + \frac{\sum fd}{N} = 25 + \frac{800}{100} = 33$
Arithmetic mean of the marks is $\bar{X} = 33$.
iii) Step Deviation Method:
This method can be used for the calculation of various statistical measures in case of
continuous distributions with equal intervals. The calculations will be further simplified in this
method. A very important note with regard to this method is that, it can be applied only when the
class intervals are equal for all the classes.
The formula used in this method is
$\bar{X} = A + \frac{\sum fd'}{N} \times C$, where A = assumed mean, f = class frequency,
$d' = \frac{X - A}{C}$, C is the class interval,
X = middle value of the class interval.
Illustration 7: Calculate the arithmetic mean for the following data by step deviation method.
Solution :
Class Interval   Frequency   Mid value   d' = (X - 35)/10    fd'
                     f           X
0-10                 5           5            -3             -15
10-20               10          15            -2             -20
20-30               25          25            -1             -25
30-40               30          35             0               0
40-50               20          45             1              20
50-60               10          55             2              20
Total              100                                       -20
A = 35, C = 10, N = 100, $\sum fd' = -20$
$\bar{X} = A + \frac{\sum fd'}{N} \times C = 35 + \frac{-20}{100} \times 10 = 35 - 2 = 33$
Note that we could get the same value of arithmetic mean either by direct method or by short cut
method or by step deviation method.
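The three methods for a continuous series can be compared with a short sketch. It uses the class intervals and frequencies of Illustrations 5 to 7; the function names and the variable `mids` are illustrative assumptions.

```python
# Class intervals and frequencies from Illustrations 5, 6 and 7.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]
mids = [(low + high) / 2 for low, high in classes]  # mid values X

def mean_direct(xs, fs):
    """Direct method: sum(f * X) / N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def mean_shortcut(xs, fs, a):
    """Short cut method: A + sum(f * d) / N with d = X - A."""
    return a + sum(f * (x - a) for x, f in zip(xs, fs)) / sum(fs)

def mean_step_deviation(xs, fs, a, c):
    """Step deviation method: A + (sum(f * d') / N) * C with d' = (X - A) / C."""
    total = sum(f * ((x - a) / c) for x, f in zip(xs, fs))
    return a + total / sum(fs) * c

print(mean_direct(mids, freqs))
print(mean_shortcut(mids, freqs, 25))
print(mean_step_deviation(mids, freqs, 35, 10))
```

All three calls return 33 (up to floating-point rounding), matching the worked solutions.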
6.3.1 Median:
The median by definition refers to the middle value in a distribution. That means it is the
item which divides the total observations into two equal parts. Median is an item for which half of
the items are less and half the items are more than it. As distinct from the arithmetic mean, the
median is called a positional average. The term position refers to the place of a value in a series.
The place of the median in a series is such that an equal number of items lie on either side of it.
A. Merits of Median:
1) Median is especially useful in case of open end classes, since only the position
and not the values of items must be known. The median is also recommended if
the distribution has unequal classes, since it is easier to compute than the mean.
2) Extreme values do not affect the median as strongly as they do the mean. Very
often, when extreme values are present in a set of observations, the median is a
more satisfactory measure of central tendency than the mean.
4) It is the most appropriate average in dealing with qualitative data, i.e., where
ranks are given or there are other types of items that are not counted or measured
but are scored.
5) The value of median can be determined graphically whereas the value of mean
cannot be graphically ascertained.
6) The median indicates the value of the middle item in the distribution. This is a
clear cut meaning and makes the median a measure that can be easily explained.
B. Limitations of Median:
1) For calculating median, it is necessary to arrange the data; other averages do not need
any arrangement.
2) Since it is a positional average, its value is not determined by each and every item of the
distribution.
4) The value of median is affected more by sampling fluctuations than the value of the
arithmetic mean.
5) The median, in some cases, cannot be computed as exactly as the mean. When the number
of items included in a series of data is even, the median is determined approximately as the
mid-point of the two middle terms.
When we are given a sequence of observations, we have to arrange them either in the
ascending order or in the descending order of magnitude. If the total number of items is odd, the
middle term will be the median. If the total number of items is even, the average of the two middle
terms will be the median.
Illustration 8: The wages of eight workers are given below:
Wages (Rs.): 1100, 1150, 1080, 1120, 1200, 1160, 1400, 1210.
Calculate the median of the wages.
Solution: Arranging the items in ascending order of magnitude, we have
1080, 1100, 1120, 1150, 1160, 1200, 1210, 1400
Since we have eight terms in total, the median is the average of the fourth and fifth terms:
Median = $\frac{1150 + 1160}{2} = 1155$
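The rule for individual observations (middle term for an odd count, average of the two middle terms for an even count) can be sketched as below with the wage data of Illustration 8. The function name `median_of_items` is an illustrative assumption; Python's standard `statistics.median` follows the same rule and is used here only as a cross-check.

```python
import statistics

def median_of_items(values):
    """Median of individual observations after arranging them in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the middle term
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle terms

wages = [1100, 1150, 1080, 1120, 1200, 1160, 1400, 1210]
print(median_of_items(wages))    # 1155.0
print(statistics.median(wages))  # 1155.0
```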
Median = size of the $\frac{N+1}{2}$th item = size of the $\frac{122+1}{2}$th item.
∴ Median of the incomes = Rs. 1500
E. Calculation of Median - Continuous Series:
When we are given a continuous series, we have to calculate cumulative frequencies and
determine the particular class in which the value of the median lies. In a continuous series we have to
consider the N/2th item instead of the (N+1)/2th item. The following formula can be used to determine the exact
value of the median.
Median = $l + \frac{N/2 - m}{f} \times C$, where
l = lower limit of the median class
m = cumulative frequency of the class preceding the median class
f = simple frequency of the median class
C = class interval
Note that if we are given inclusive type of class intervals, we have to convert them into exclusive type by adding 0.5 to
the upper limit and subtracting 0.5 from the lower limit. Also, when the class intervals are unequal,
the frequencies need not be adjusted to make the class intervals equal and the same formula can be
applied.
Illustration 10: Calculate median for the following frequency distribution.
Marks        45-50  40-45  35-40  30-35  25-30  20-25  15-20  10-15  5-10
No. of St.     10     15     26     30     42     31     24     15     7
Solution: We have to first arrange the data in ascending order and find the cumulative frequencies.
Marks     No. of Students (f)    Cumulative frequency
5-10              7                      7
10-15            15                     22
15-20            24                     46
20-25            31                     77
25-30            42                    119     Median class
30-35            30                    149
35-40            26                    175
40-45            15                    190
45-50            10                    200
Median = size of the $\frac{N}{2}$th item = $\frac{200}{2}$ = 100th item, which lies in the class 25-30.
Median = $25 + \frac{100 - 77}{42} \times 5 = 25 + 2.74 = 27.74$
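The interpolation formula for a continuous series can be sketched as follows, using the distribution of Illustration 10 already arranged in ascending order. The function name `grouped_median` is an illustrative assumption, and the sketch assumes exclusive class intervals.

```python
def grouped_median(classes, freqs):
    """Median = l + (N/2 - m) / f * C for classes given in ascending order."""
    target = sum(freqs) / 2          # the N/2-th item
    cumulative = 0                   # m: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:             # the median class is reached
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")

classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
           (30, 35), (35, 40), (40, 45), (45, 50)]
freqs = [7, 15, 24, 31, 42, 30, 26, 15, 10]
print(round(grouped_median(classes, freqs), 2))  # 27.74
```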
6.3.2 Mode:
The mode or the modal value is that value in a series of observations which occurs with
the highest frequency. The mode is often said to be that value which occurs most often in the
data. Mode is that value about which the items are most closely concentrated. It is the value
which has the greatest frequency density in its immediate neighbourhood. For this reason mode
is also called the most typical or fashionable value of a distribution. A set of data may have single
mode in which case it is said to be unimodal, it may have two modes which makes it bimodal. If
the data have several modes, they are called multimodal.
There are many situations in which arithmetic mean and median fail to reveal the true
characteristic of data. For example, when we talk of most common wage, most common income,
most common height, most common size of shoe or ready made garments, we have in mind
mode and not the arithmetic mean or median. The shortcomings in mean and median may be
overcome by the use of mode which refers to the value which occurs most frequently in a
distribution.
A. Merits of Mode:
The following are the merits of mode when compared to other measures of central tendency.
2. Like median, the mode is not unduly affected by extreme values. Even if the high values
are very high and the low values are very low, we choose the most frequent value of the
data to be the modal value.
3. Its value can be determined in open-end distributions without ascertaining the class limits.
5. The value of mode can also be determined graphically whereas the value of mean cannot
be graphically ascertained.
B. Limitations of Mode:
1. The value of mode cannot always be determined. In some cases we may have a
bimodal series.
2. It is not capable of algebraic manipulations. For example, from the modes of
two sets of data we cannot calculate the overall mode of the combined data.
3. The value of mode is not based on each and every item of the series.
4. It is not a rigidly defined measure. There are several formulae for calculating the
mode, all of which usually give somewhat different answers.
5. While dealing with quantitative data, the disadvantages of the mode outweigh its
good features and hence it is used rarely.
For determining the mode, count the number of times the various values repeat themselves
and the value occurring maximum number of times will be the mode.
Illustration 11 :Calculate mode from the following data of marks obtained by 10 students.
Marks: 10, 27, 24, 12, 27, 27, 20, 18, 15 and 30.
Solution: We observe that the item 27 occurs 3 times and all the remaining items only once. Since
the item 27 occurs the maximum number of times, i.e., 3, the mode of the marks is 27.
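Counting repetitions is easy to sketch with `collections.Counter` from the Python standard library. The function name `modes` is an illustrative assumption; it returns a list so that bimodal and multimodal data are reported in full rather than reduced to one value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

marks = [10, 27, 24, 12, 27, 27, 20, 18, 15, 30]
print(modes(marks))            # [27]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (a bimodal series)
```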
In discrete series, quite often mode can be determined just by inspection, i.e., by looking
to that value of the variable around which the items are most heavily concentrated. However,
when the mode is determined just by inspection, an error of judgement is possible. There are
certain cases where the difference between the maximum frequency and the frequency preceding
it or succeeding it is very small and the items are heavily concentrated on either side. In such
cases, it is desirable to prepare a grouping table and an analysis table. These tables help us in
ascertaining the modal class.
A grouping table has 6 columns. In column 1, the maximum frequency is marked or put
in a circle. In column 2, frequencies are grouped in two’s. In column 3, leave the first frequency
and then group the remaining in two’s. In column 4, group the frequencies in three’s. In column 5,
leave the first frequency and group the remaining in three’s. In column 6, leave the first two
frequencies and then group the remaining in three’s. In each of these cases, take the maximum
total and mark it in a circle or bold type.
After preparing the grouping table, prepare an analysis table. While preparing this table, put the
column numbers on the left hand side and the various probable values of the mode on the right hand
side. The values against which the frequencies are the highest are marked in the grouping table and
then entered by means of a bar in the relevant box corresponding to the values they represent.
The procedure of preparing grouping table and analysis table shall be clear from the following
illustration.
Illustration 12: Calculate the value of mode for the following data.
Marks 10 15 20 25 30 35 40
No. of Students 8 12 36 35 28 18 9
Solution: Since it is difficult to say by inspection as to which 20, 25 or 30 is the modal value, we
prepare the grouping and analysis tables.
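Grouping Table
(Each group total is entered against the first value of its group; * marks the maximum total in each column.)
Marks (X)    I (f)     II     III     IV      V      VI
10             8       20             56
15            12              48              83*
20            36*      71*                           99*
25            35              63*     81*
30            28       46                     55
35            18              27
40             9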
Analysis Table
Column No.     20     25     30
I               1
II              1      1
III                    1      1
IV                     1      1
V               1      1
VI              1      1      1
Total           4      5      3
Corresponding to the maximum total 5, the value of the variable is 25. Hence the modal value is 25.
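The grouping and analysis procedure can be sketched in code. The function below is a hypothetical helper, not part of the text: it forms the six columns described earlier (singles, twos, twos after skipping one, threes, threes after skipping one, threes after skipping two), finds the maximum total in each column, and tallies every value belonging to a maximum group. Leftover frequencies that cannot form a complete group are ignored, which is an assumption consistent with the worked table.

```python
def grouping_analysis(values, freqs):
    """Return {value: number of columns in which it belongs to the maximum group}."""
    tally = {v: 0 for v in values}
    n = len(freqs)
    # (group size, leading frequencies skipped) for columns I to VI
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        groups = [range(start, start + size)
                  for start in range(skip, n - size + 1, size)]
        totals = [sum(freqs[i] for i in group) for group in groups]
        best = max(totals)
        for group, total in zip(groups, totals):
            if total == best:
                for i in group:
                    tally[values[i]] += 1
    return tally

marks = [10, 15, 20, 25, 30, 35, 40]
students = [8, 12, 36, 35, 28, 18, 9]
tally = grouping_analysis(marks, students)
print(tally)                         # 20 -> 4, 25 -> 5, 30 -> 3
print(max(tally, key=tally.get))     # 25
```

The tallies for 20, 25 and 30 agree with the totals 4, 5 and 3 in the analysis table.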
E. Calculation of Mode - Continuous Series:
When we are given a continuous series, first of all we have to identify the modal class
either by inspection or by preparing grouping and analysis tables. Then we have to apply the
following formula.
Mode = $l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C$, where
l = lower limit of the modal class
$\Delta_1$ = difference between the frequency of the modal class and the preceding class (ignoring
signs)
$\Delta_2$ = difference between the frequency of the modal class and the succeeding class (ignoring
signs)
C = class interval of the modal class
While applying the above formula for calculating mode, it is necessary to see that the
class intervals are equal. If they are unequal, they should first be made equal on the assumption
that the frequencies are equally distributed throughout the class; otherwise we will get misleading
results.
Illustration 13: Calculate mode from the following data.
Marks      No. of Students      Marks      No. of Students
0-10              3             50-60            15
10-20             5             60-70            12
20-30             7             70-80             6
30-40            10             80-90             2
40-50            12             90-100            8
Solution: By inspection, the modal class is 50-60.
$\Delta_1 = 15 - 12 = 3$, $\Delta_2 = 15 - 12 = 3$, $C = 10$, $l = 50$
Mode = $l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C = 50 + \frac{3}{3 + 3} \times 10 = 50 + 5 = 55$
∴ Mode of the marks = 55
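The formula can be sketched as below for the data of Illustration 13. The function name is an illustrative assumption. The modal class is picked by inspection as the class with the highest frequency; treating a missing neighbouring class as having frequency 0 is also an assumption of this sketch, and equal class intervals are required as stated above.

```python
def grouped_mode(classes, freqs):
    """Mode = l + d1 / (d1 + d2) * C, with the modal class chosen by inspection."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the modal class
    lower, upper = classes[k]
    before = freqs[k - 1] if k > 0 else 0                 # assumed 0 if no preceding class
    after = freqs[k + 1] if k < len(freqs) - 1 else 0     # assumed 0 if no succeeding class
    d1 = abs(freqs[k] - before)
    d2 = abs(freqs[k] - after)
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
print(grouped_mode(classes, freqs))  # 55.0
```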
6.3.3 Geometric Mean:
Geometric mean is defined as the Nth root of the product of N items. If there are two
items, we have to take the square root; if there are three items, the cube root; and so on.
Symbolically,
$GM = \sqrt[N]{X_1 \times X_2 \times \cdots \times X_N}$
When the number of items is three or more, the task of multiplying the numbers and then
calculating the Nth root becomes excessively difficult. Logarithms are used to simplify the
calculations. Geometric mean is then calculated as follows.
$GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right)$
Illustration 14: Daily income of ten families of a particular place is given below:
Daily income (Rs.): 85, 70, 15, 75, 500, 8, 45, 250, 40, 36.
Calculate the geometric mean.
Solution:
X        Log X        X        Log X
85       1.9294       8        0.9031
70       1.8451       45       1.6532
15       1.1761       250      2.3979
75       1.8751       40       1.6021
500      2.6990       36       1.5563
                      Total    17.6373
$GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right) = \text{Antilog}\left(\frac{17.6373}{10}\right) = \text{Antilog}(1.7637) = 58.03$
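The logarithmic route to the geometric mean can be sketched with `math.log10`. The function name is an illustrative assumption. At full floating-point precision the result is about 58.04; the 58.03 obtained above reflects four-figure logarithm tables.

```python
import math

def geometric_mean(values):
    """Antilog of the mean of the common logarithms of the items."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

incomes = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]
print(round(geometric_mean(incomes), 2))
```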
6.3.4 Harmonic Mean
The harmonic mean is based on the reciprocals of numbers averaged. It is defined as the
reciprocal of the arithmetic mean of the reciprocals of the individual observations. Thus the harmonic
mean can be obtained as follows.
$H.M. = \frac{N}{\sum (1/X)}$
Illustration 15: The monthly incomes of 8 employees are given below: Monthly income (Rs.):
1250, 1190, 1340, 1160, 1090, 1150, 1210, 1400
Find the harmonic mean
Solution:
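X        1/X         X        1/X
1250     0.0008      1090     0.0009
1190     0.0008      1150     0.0008
1340     0.0007      1210     0.0008
1160     0.0009      1400     0.0007
                     Total    0.0064
$H.M. = \frac{N}{\sum (1/X)} = \frac{8}{0.0064} = 1250$
Note that this value depends on the four-decimal rounding of the reciprocals used in the table. Carrying
more decimal places gives $\sum (1/X) \approx 0.006576$ and a harmonic mean of about Rs. 1216.5.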
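Because the reciprocals of large numbers are very small, the result is sensitive to how many decimal places are kept. The sketch below computes the harmonic mean of the incomes in Illustration 15 without intermediate rounding; the function name is an illustrative assumption.

```python
def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the items."""
    return len(values) / sum(1 / x for x in values)

incomes = [1250, 1190, 1340, 1160, 1090, 1150, 1210, 1400]
print(round(harmonic_mean(incomes), 2))  # about 1216.47 at full precision
```

A figure of 1250 arises only when the reciprocals are first cut to four decimal places, as in the table.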
6.4 Summary :
A single value that describes the characteristics of the entire mass of data is called an
average or a measure of central tendency. An average should possess certain characteristics in
order to be a good average. The very commonly used measures of central tendency are: arithmetic
mean, median, mode, geometric mean and harmonic mean. Each average can be obtained for
different types of data, viz., individual series, discrete series and continuous series. There are
certain merits and limitations for each average when compared to the other averages.
6.4 Unit end Questions :
1. Why are averages called measures of central tendency? What are the requisites
of a good average?
2. What is an arithmetic mean? Explain its merits and limitations. Describe various
methods of obtaining arithmetic mean.
3. Explain the merits and limitations of median.
4. Explain the concept of mode. How can you obtain the mode if it cannot be
identified by inspection?
GUIDELINE - 7 :
MEASURES OF DISPERSION-I
Structure
7.0 Objective
7.1 Quartile Deviation
7.1.1 Coefficient of Quartile Deviation
7.2 Individual Observation
7.3 Discrete Series
7.4 Merits and Limitations of Quartile Deviation
7.5 The mean Deviation
7.6 Computation of Mean Deviation - Individual Observation
7.7 Discrete Series
7.8 Continuous Series
7.9 Merits and Limitations of Mean Deviation
7.0 Objective:
In the previous lessons, you have studied the measures of central tendency. To judge how
representative a central value is, we have to study how the given observations are spread
around it, i.e., whether the values are close to the central value or
whether they are widely scattered to the left of the central value, to the right of it, or in
both directions. Measures of dispersion are statistical techniques for describing this phenomenon. In
this lesson, the methods of Quartile Deviation and Mean Deviation are discussed.
7.1 Quartile Deviation:
Quartile Deviation or Q.D. = $\frac{Q_3 - Q_1}{2}$
Quartile Deviation gives the average amount by which the two quartiles differ from the
median. In a symmetrical distribution the two quartiles $Q_1$ and $Q_3$ are equidistant from the
median, i.e., Med - $Q_1$ = $Q_3$ - Med, and as such the difference can be taken as a measure of
dispersion. In such a distribution, the median ± Q.D. covers exactly 50 percent of the observations.
When the quartile deviation is very small, it describes high uniformity or small variation of
the central 50% of items, and a high quartile deviation means that the variation among the central
items is large.
Coefficient of Q.D. = $\frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Coefficient of quartile deviation can be used to compare the degree of variation in different
distributions.
The process of computing quartile deviation is very simple: we have just to compute the
values of the upper and lower quartiles.
Find out the value of quartile deviation and its coefficient from the following data.
Roll No: 1 2 3 4 5 6 7
Marks : 20 28 40 12 30 15 50
Solution: Marks arranged in ascending order:
12 15 20 28 30 40 50
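$Q_1$ = size of the $\frac{N+1}{4}$th item = $\frac{7+1}{4}$ = 2nd item. Size of the 2nd
item is 15. Thus $Q_1 = 15$.
$Q_3$ = size of the $3\left(\frac{N+1}{4}\right)$th item = $\frac{3 \times 8}{4}$ = 6th item. Size of the 6th
item is 40. Thus $Q_3 = 40$.
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2} = 12.5$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15} = \frac{25}{55} = 0.455$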
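The calculation for individual observations can be sketched as follows. The function name and the linear interpolation used when the quartile position is fractional are assumptions of this sketch (the worked example only needs whole-numbered positions), and at least three observations are assumed.

```python
def quartile_deviation(values):
    """Return (Q1, Q3, Q.D., coefficient of Q.D.) using the (N + 1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)

    def item(position):
        # position is 1-based; a fractional part is resolved by linear
        # interpolation between neighbouring items (an assumption of this sketch)
        whole = int(position)
        fraction = position - whole
        value = ordered[whole - 1]
        if fraction:
            value += fraction * (ordered[whole] - ordered[whole - 1])
        return value

    q1 = item((n + 1) / 4)
    q3 = item(3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

marks = [20, 28, 40, 12, 30, 15, 50]
q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 3))  # 15 40 12.5 0.455
```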
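$Q_1$ = size of the $\frac{N+1}{4}$th item = $\frac{43+1}{4}$ = 11th item.
Size of the 11th item is 20. Thus $Q_1 = 20$.
$Q_3$ = size of the $3\left(\frac{N+1}{4}\right)$th item = $\frac{3 \times 44}{4}$ = 33rd item.
Size of the 33rd item is 40. Thus $Q_3 = 40$.
Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = 0.333$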
Calculate quartile deviation and the coefficient of quartile deviation from the following data:
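Calculation of Q.D. and its Coefficient
Wages (Rs. per week)      f      c.f.
Less than 35             14       14
35-37                    62       76
38-40                    99      175
41-43                    18      193
Over 43                   7      200
Q.D. = $\frac{Q_3 - Q_1}{2}$
$Q_1$ = size of the $\frac{N}{4}$th item = $\frac{200}{4}$ = 50th item. $Q_1$ lies in the class 35-37.
$Q_1 = L + \frac{N/4 - c.f.}{f} \times i = 35 + \frac{50 - 14}{62} \times 2 = 35 + 1.16 = 36.16$
$Q_3$ = size of the $\frac{3N}{4}$th item = $\frac{3 \times 200}{4}$ = 150th item. $Q_3$ lies in the class 38-40.
$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i = 38 + \frac{150 - 76}{99} \times 2 = 38 + 1.49 = 39.49$
Q.D. = $\frac{39.49 - 36.16}{2} = 1.67$
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{39.49 - 36.16}{39.49 + 36.16} = \frac{3.33}{75.65} = 0.044$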
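The interpolation steps for a continuous series can be sketched as below. The data structure (a list of lower limit L, class interval i and frequency f, with `None` for the limits of the open-end classes) and the function name are illustrative assumptions; the values of L and i are taken exactly as used in the worked solution.

```python
# (lower limit L, class interval i, frequency f); open-end classes carry None
wage_classes = [(None, None, 14), (35, 2, 62), (38, 2, 99), (41, 2, 18), (None, None, 7)]

def item_value(classes, position):
    """Value of the position-th item: L + (position - c.f.) / f * i."""
    cumulative = 0  # c.f. of the classes before the current one
    for lower, width, f in classes:
        if cumulative + f >= position:
            if lower is None:
                raise ValueError("item falls in an open-end class")
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds total frequency")

n = sum(f for _, _, f in wage_classes)      # 200
q1 = item_value(wage_classes, n / 4)        # 50th item
q3 = item_value(wage_classes, 3 * n / 4)    # 150th item
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 3))
```

Neither quartile falls in an open-end class here, which is why the quartile deviation can be computed for this distribution at all.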
It has a special utility in measuring variation in case of open end distributions or ones in
which the data may be ranked but not measured quantitatively.
It is also useful in erratic or badly skewed distributions, where the other measures of
dispersion would be warped by extreme values. The quartile deviation is not affected by the
presence of extreme values.
Limitations:
* Quartile deviation ignores 50% of the items, i.e., the first 25% and the last 25%. As the value of
quartile deviation does not depend upon every item of the series, it cannot be regarded as a
good method of measuring dispersion.
* Quartile deviation is not itself measured from an average, but it is a positional average.
Because of the above limitations, quartile deviation is not often useful for statistical inference.
7.5 The mean Deviation :
The mean deviation is also known as the average deviation. It is the average difference
between the items in a distribution and the median or mean of that series. Theoretically there is an
advantage in taking the deviations from the median, because the sum of deviations of items from the
median is minimum when signs are ignored. However, in practice, the arithmetic mean is more
frequently used in calculating the value of average deviation and this is the reason why it is more
commonly called mean deviation. In any case, the average used must be clearly stated in a given
problem so that any possible confusion in meaning is avoided.
If $x_1, x_2, x_3, \ldots, x_N$ are N given observations, then the mean deviation about an average A is given
by
$M.D. = \frac{1}{N} \sum |x - A| = \frac{1}{N} \sum |D|$ or $\frac{\sum |D|}{N}$,
where |D| = |x - A|, read as mod (x - A), is the modulus value or absolute value of the
deviation ignoring plus and minus signs.
Steps :
* Take the deviations of items from the median, ignoring ± signs, and denote these deviations by |D|.
If a distribution is normal, the mean ± mean deviation is the range that will include 57.5
percent of the items in the series. If it is moderately skewed, then we may expect approximately
57.5 percent of the items to fall within this range. Hence, if the average deviation is small, the
distribution is highly compact or uniform, since more than half of the cases are concentrated within
a small range around the mean.
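The coverage figure for a normal distribution can be checked numerically. The sketch below assumes a standard normal variable, for which the mean deviation about the mean equals σ√(2/π) (a standard result that is not derived in the text), and uses the error function from Python's `math` module to find the proportion of values lying within one mean deviation of the mean.

```python
import math

sigma = 1.0
mean_deviation = sigma * math.sqrt(2 / math.pi)  # about 0.7979 * sigma

# For a normal variable, P(|X - mean| <= k) = erf(k / (sigma * sqrt(2))).
coverage = math.erf(mean_deviation / (sigma * math.sqrt(2)))

print(round(mean_deviation, 4))   # 0.7979
print(round(100 * coverage, 1))   # 57.5
```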
The relative measure corresponding to the mean deviation, called the coefficient of mean
deviation, is obtained by dividing the mean deviation by the particular average used in computing
it. Thus, if the mean deviation has been computed from the median, the coefficient of
mean deviation shall be obtained by dividing the mean deviation by the median.
Coefficient of M.D. = $\frac{M.D.}{\text{Median}}$
If the mean has been used while calculating the value of mean deviation, the
coefficient of mean deviation shall be obtained by dividing the mean deviation by the mean.
Problem:
Calculate the mean deviation and its coefficient of the two income groups of five and
seven members given below:
Solution:
Group I Group II
4,800 400
5,800 1,400
Mean Deviation: Group I
$M.D. = \frac{\sum |D|}{N}$, where |D| = deviation from median ignoring signs.
Median = size of the $\frac{N+1}{2}$th item = $\frac{5+1}{2}$ = 3rd item.
Size of the 3rd item is 4,400.
$M.D. = \frac{1200}{5} = 240$
This means that the average deviation of the individual incomes from the median income
is Rs. 240.
Mean Deviation : Group II
Median = size of the $\frac{N+1}{2}$th item = $\frac{7+1}{2}$ = 4th item. Size of the 4th item is 4,400.
$\sum |D| = 4000$, N = 7
$M.D. = \frac{4000}{7} = 571.43$
Note: If we were to compute the coefficient of mean deviation, we would divide the mean deviation by the
median. Thus for the first group,
Coefficient of M.D. = $\frac{240}{4400} = 0.054$
and for the second group,
Coefficient of M.D. = $\frac{571.43}{4400} = 0.130$
$M.D. = \frac{\sum f|D|}{N}$ (by the same logic as given before),
where |D| denotes deviation from median ignoring signs.
Steps :
* Take the deviations of the items from the median, ignoring signs, and denote them by |D|.
* Multiply these deviations by the respective frequencies and obtain the total $\sum f|D|$.
Problem:
x 10 11 12 13 14
f 3 12 18 12 3
Solution:
x        f      |D|     f|D|     c.f.
10       3       2        6        3
11      12       1       12       15
12      18       0        0       33
13      12       1       12       45
14       3       2        6       48
Total   48               36
Median = size of the $\frac{N+1}{2}$th item = $\frac{48+1}{2}$ = 24.5th item.
Size of the 24.5th item is 12, hence median = 12.
$M.D. = \frac{\sum f|D|}{N} = \frac{36}{48} = 0.75$
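The discrete-series procedure can be sketched as follows. The function names are illustrative assumptions. The median is located through cumulative frequencies; this sketch simply returns the first value whose cumulative frequency reaches the (N + 1)/2 position, which is adequate here but does not average two different middle values.

```python
xs = [10, 11, 12, 13, 14]
fs = [3, 12, 18, 12, 3]

def discrete_median(values, freqs):
    """Value of the (N + 1)/2-th item, found via cumulative frequencies."""
    position = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("empty series")

def mean_deviation(values, freqs, centre):
    """sum(f * |x - centre|) / N."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

median = discrete_median(xs, fs)
print(median)                          # 12
print(mean_deviation(xs, fs, median))  # 0.75
```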
7.8 Calculation of Mean Deviation - Continuous Series:
For calculating mean deviation in continuous series the procedure remains the same as
discussed above. The only difference is that here we have to obtain the midpoint of the various
classes and take deviations of these points from median.
Problem :
Calculate the mean deviation and its coefficient from the following data:
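Class       f      c.f.    Mid point m    |D| = |m - 43|    f|D|
0-10        5        5          5               38           190
10-20       8       13         15               28           224
20-30      12       25         25               18           216
30-40      15       40         35                8           120
40-50      20       60         45                2            40
50-60      14       74         55               12           168
60-70      12       86         65               22           264
70-80       6       92         75               32           192
N = 92, $\sum f|D| = 1414$
Median = size of the $\frac{N}{2}$th item = $\frac{92}{2}$ = 46th item.
Median lies in the class 40-50.
Median = $L + \frac{N/2 - c.f.}{f} \times i = 40 + \frac{46 - 40}{20} \times 10 = 40 + 3 = 43$
$M.D. = \frac{\sum f|D|}{N} = \frac{1414}{92} = 15.37$
Coefficient of M.D. = $\frac{M.D.}{\text{Median}} = \frac{15.37}{43} = 0.357$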
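For a continuous series the same steps can be sketched with the mid points standing in for the items. The names below are illustrative assumptions; the median is found with the interpolation formula used in the solution.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 8, 12, 15, 20, 14, 12, 6]

def grouped_median(classes, freqs):
    """Median = L + (N/2 - c.f.) / f * i."""
    target = sum(freqs) / 2
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")

median = grouped_median(classes, freqs)
mids = [(low + high) / 2 for low, high in classes]
md = sum(f * abs(m - median) for m, f in zip(mids, freqs)) / sum(freqs)

print(median)                 # 43.0
print(round(md, 2))           # 15.37
print(round(md / median, 3))  # coefficient of M.D.
```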
Merits :
* The outstanding advantage of the average deviation is its relative simplicity. It is simple to
understand and easy to compute. Anyone familiar with the concept of the average can readily
appreciate the meaning of the average deviation. If a situation requires a
measure of dispersion that will be presented to the general public or to a group not very familiar
with statistics, the average deviation is useful.
* It is based on each and every item of the data. Consequently, a change in the value of any
item would change the value of mean deviation.
* Since deviations are taken from a central value, comparison about the formation of different
distributions can easily be made.
Limitations:
The greatest drawback of this method is that algebraic signs are ignored while taking the
deviations of the items.
Because of these limitations its use is limited and it is overshadowed as a measure of variation by
the superior standard deviation.
GUIDELINE-8 :
MEASURES OF DISPERSION-II
Structure
8.0 Objective
8.1 The Standard Deviation
8.1.1 Calculation of Standard Deviation
8.2 Calculation of Standard Deviation in Discrete Series
8.3 Calculation of Standard Deviation in Continuous Series
8.4 Coefficient of Variation
8.0 Objective
Standard deviation is another measure of dispersion. It is an important measure in the
sense that, while calculating mean deviation from the arithmetic mean, we have to take the modulus of
the deviations; otherwise the sum of deviations from the arithmetic mean will become zero. To obviate
this problem, standard deviation is used, in which the deviations are squared. Further, standard
deviation is useful in the computation of the coefficient of variation, coefficient of skewness,
etc. This lesson provides a detailed procedure for computing standard deviation and thereby
the coefficient of variation.
8.1 The Standard Deviation
The standard deviation concept was introduced by Karl Pearson in 1893. It is by far the
most important and widely used measure of studying dispersion. Its significance lies in the fact
that it is free from those defects from which the earlier methods suffer and satisfies most of the
properties of a good measure of dispersion. Standard deviation is also known as root mean
square deviation for the reason that it is the square root of the mean of the squared deviations from
the arithmetic mean. Standard deviation is denoted by the small Greek letter σ (read as sigma).
8.1.1 Calculation of Standard Deviation : Individual Observation:
In case of Individual observations standard deviations may be computed by applying
any of the following two methods.
Deviations taken from Actual Mean:
When deviations are taken from actual mean the following formula is applied :
$\sigma = \sqrt{\frac{\sum x^2}{N}}$, where $x = X - \bar{X}$
Steps :
* Calculate the actual mean of the series, i.e., $\bar{X}$.
* Take the deviations of the items from the mean, i.e., find $(X - \bar{X})$. Denote these
deviations by x.
* Square these deviations and obtain the total $\sum x^2$.
* Divide $\sum x^2$ by the total number of observations, i.e., N, and extract the square root. This
gives us the value of standard deviation.
Deviations taken from Assumed Mean :
When the actual mean is in fractions, say 123.674, it would be too cumbersome to
take deviations from it and then obtain squares of these deviations. In such a case either the mean
may be approximated or else the deviations may be taken from an assumed mean and the necessary
adjustment made in the value of the standard deviation. The former method of approximation is
less accurate and therefore, invariably in such a case, deviations are taken from an assumed mean.
When deviations are taken from an assumed mean the following formula is applied.
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
Steps :
* Take the deviations of the items from an assumed mean, i.e., obtain (X - A). Denote these
deviations by d. Take the total of these deviations, i.e., obtain $\sum d$.
* Square these deviations and obtain the total $\sum d^2$.
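Problem: Blood serum cholesterol levels of 10 persons are as under. Calculate the standard
deviation with the help of assumed mean.
x        d = (x - 264)      d²
260           -4            16
290          +26           676
255           -9            81
272           +8            64
263           -1             1
277          +13           169
...
$\sum x = 2641$, $\sum d = 1$, $\sum d^2 = 2689$, $N = 10$
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{2689}{10} - \left(\frac{1}{10}\right)^2} = \sqrt{268.89} = 16.40$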
(A) Actual Mean Method :
When this method is applied, deviations are taken from the actual mean, i.e., we find
$(X - \bar{X})$ and denote these deviations by x. These deviations are then squared and multiplied by
the respective frequencies. The following formula is applied.
$\sigma = \sqrt{\frac{\sum fx^2}{N}}$, where $x = X - \bar{X}$
(B) Assumed Mean Method :
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where d = X - A
Steps :
* Take the deviations of the items from an assumed mean and denote these deviations by
d.
* Multiply these deviations by the respective frequencies and obtain the total $\sum fd$.
* Obtain the squares of the deviations, i.e., calculate $d^2$.
* Multiply the squared deviations by the respective frequencies, and obtain the total $\sum fd^2$.
* Substitute the values in the above formula.
Problem:
Calculate the standard deviation from the data given below:
x f (x - 6.5) fd fd2
Size of item d
3.5 3 -3 9 27
4.5 7 -2 -14 28
5.5 22 -1 22 22
6.5 60 0 0 0
7.5 85 +1 +85 85
8.5 32 +2 +64 128
9.5 8 +3 +24 72
N=217 fd 128 fd 2 362
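$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{362}{217} - \left(\frac{128}{217}\right)^2} = \sqrt{1.6682 - 0.3479} = \sqrt{1.3203} \approx 1.149$$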
When the step deviation method is used, we take deviations of the mid points from an assumed
mean and divide these deviations by the width of the class interval, i.e., "i". In case the class
intervals are unequal, we divide the deviations of the midpoints by the lowest common factor and
use "C" instead of "i" in the formula. The formula for calculating standard deviation is:
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$$
where $d = \frac{x - A}{i}$ and i = class interval. The use of the above formula simplifies calculations.
Problem :
The annual salaries of a group of employees are given in the following table;
Salaries (x)    No. of persons (f)    d = (x - 60)/5     fd      fd²
    45                 3                   -3            -9      27
    50                 5                   -2           -10      20
    55                 8                   -1            -8       8
    60                 7                    0             0       0
    65                 9                   +1            +9       9
    70                 7                   +2           +14      28
    75                 4                   +3           +12      36
    80                 7                   +4           +28     112
N = 50, $\sum fd = 36$, $\sum fd^2 = 240$
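$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i = \sqrt{\frac{240}{50} - \left(\frac{36}{50}\right)^2} \times 5$$
$$= \sqrt{4.8 - 0.5184} \times 5 = \sqrt{4.2816} \times 5 = 10.35$$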
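The step deviation calculation above can be checked with a short script. This is a minimal sketch rather than part of the original lesson: the helper name `step_deviation_sd` and its parameters are illustrative assumptions, and the data are the salaries and frequencies from the table.

```python
from math import sqrt

def step_deviation_sd(values, freqs, assumed_mean, width):
    """Standard deviation by the step deviation method (hypothetical helper)."""
    n = sum(freqs)
    # d = (x - A) / i for every item
    d = [(x - assumed_mean) / width for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    # sigma = sqrt(sum(fd^2)/N - (sum(fd)/N)^2) * i
    return width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

salaries = [45, 50, 55, 60, 65, 70, 75, 80]
persons = [3, 5, 8, 7, 9, 7, 4, 7]

sd = step_deviation_sd(salaries, persons, assumed_mean=60, width=5)
print(round(sd, 2))  # 10.35
```

Calling the function with a different assumed mean (say 70) should give the same value, which is the point of the adjustment term in the formula.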
8.3 Calculation of Standard Deviation - Continuous Series:
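The formula is
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$$
$$d = \frac{m - A}{i}, \quad \text{where } i = \text{class interval}$$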
Steps :
* Take the deviations of these midpoints from an assumed mean and denote these deviations
by d.
* Multiply the frequencies of each class with these deviations and obtain $\sum fd$.
* Square the deviations and multiply them with the respective frequencies of each class
and obtain $\sum fd^2$.
Problem:
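Class      f      m     d = (m - 35)/10     fd      fd²
10-20     15     15          -2            -30      60
20-30     23     25          -1            -23      23
30-40     22     35           0              0       0
40-50     25     45          +1            +25      25
50-60     10     55          +2            +20      40
60-70      5     65          +3            +15      45
$\sum fd = 2$, $\sum fd^2 = 488$
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i = \sqrt{\frac{488}{125} - \left(\frac{2}{125}\right)^2} \times 10 \approx 19.76$$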
Coefficient of Variation :
This measure, developed by Karl Pearson, is the most commonly used measure of relative
variation. It is used in problems where we want to compare the variability of two or more
series. The series (or group) for which the coefficient of variation is greater is said to be
more variable or, conversely, less consistent, less uniform, less stable or less homogeneous. On the
other hand, the series for which the coefficient of variation is less is said to be less variable or more
consistent, more uniform, more stable or more homogeneous.
$$\text{Coefficient of Variation (C.V.)} = \frac{\sigma}{\bar{X}} \times 100$$
Problem:
Class (Rs.)     m      f     d = (m - 60)/5     fd      fd²
78-82          80      2          +4            +8      32
73-77          75      6          +3           +18      54
68-72          70      7          +2           +14      28
63-67          65     12          +1           +12      12
58-62          60     18           0             0       0
53-57          55     13          -1           -13      13
48-52          50      9          -2           -18      36
43-47          45      7          -3           -21      63
38-42          40      4          -4           -16      64
33-37          35      2          -5           -10      50
N = 80, $\sum fd = -26$, $\sum fd^2 = 352$
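$$\bar{X} = A + \frac{\sum fd}{N} \times i = 60 + \frac{-26}{80} \times 5 = 60 - 1.625 = 58.375$$
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i = \sqrt{\frac{352}{80} - \left(\frac{-26}{80}\right)^2} \times 5 = 10.36$$
$$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{10.36}{58.375} \times 100 = 17.75\%$$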
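As a cross-check on the shortcut arithmetic, the mean, standard deviation and coefficient of variation can be recomputed directly from the midpoints, without step deviations. This is only an illustrative sketch; the variable names are assumptions and the data are the midpoints and frequencies of the table above.

```python
from math import sqrt

midpoints = [80, 75, 70, 65, 60, 55, 50, 45, 40, 35]
freqs = [2, 6, 7, 12, 18, 13, 9, 7, 4, 2]

n = sum(freqs)
# Mean = sum(f * m) / N
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
# Variance straight from the definition: sum(f * (m - mean)^2) / N
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n
sd = sqrt(variance)
cv = sd / mean * 100

print(mean, round(sd, 2), round(cv, 2))  # 58.375 10.36 17.75
```

The direct route involves larger numbers but reaches the same figures, which is why the step deviation method is preferred for hand calculation.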
Problem:
From the prices of shares X and Y given below, find out which is more stable.
  X     x = X - X̄     x²        Y     y = Y - Ȳ     y²
 35       -16         256      108       +3          9
 54        +3           9      107       +2          4
 52        +1           1      105        0          0
 53        +2           4      105        0          0
 56        +5          25      106       +1          1
 58        +7          49      107       +2          4
 52        +1           1      104       -1          1
 50        -1           1      103       -2          4
 51         0           0      104       -1          1
 49        -2           4      101       -4         16
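Coefficient of Variation X:
$$C.V. = \frac{\sigma}{\bar{X}} \times 100$$
$$\bar{X} = \frac{\sum X}{N} = \frac{510}{10} = 51$$
$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{350}{10}} = 5.916$$
$$C.V. = \frac{5.916}{51} \times 100 = 11.6$$
Coefficient of Variation Y:
$$C.V. = \frac{\sigma}{\bar{Y}} \times 100$$
$$\bar{Y} = \frac{\sum Y}{N} = \frac{1050}{10} = 105$$
$$\sigma = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{40}{10}} = 2$$
$$C.V. = \frac{2}{105} \times 100 = 1.905$$
Since the coefficient of variation is much less in the case of shares Y, they are more stable
in value.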
GUIDELINE - 9 :
SOLVED PROBLEMS IN MEASURES OF CENTRAL
TENDENCY, MEASURES OF DISPERSION
Structure
9.0 Objectives
9.1 Introduction
9.0. Objectives
9.1. Introduction
Arithmetic mean, median, mode, geometric mean and harmonic mean constitute measures
of central tendency. Quartile deviation, mean deviation and standard deviation constitute measures
of dispersion. Pearson's coefficient of correlation and Spearman's rank correlation are the most
commonly used measures of correlation. The two regression equations, viz., the regression equation
of y on x and the regression equation of x on y, are applied for the purpose of estimation in regression
analysis.
Various problems on calculation of arithmetic mean, median, mode, geometric mean and
harmonic mean are presented in the following subsections.
Wages (Rs): 550, 610, 575, 625, 715, 815, 760, 590, 630 and 750. Find arithmetic
mean of the wages.
Sol. The given data are ungrouped data and we can find the mean by direct method.
$$\bar{X} = \frac{\sum X}{N} = \frac{6620}{10} = 662$$
∴ A.M. of wages, $\bar{X}$ = Rs.662.
X    64   63   62   61   60   59
F     8   18   12    9    7    6
Sol: The given table is a discrete series. We can apply the short cut method.
  X      f     d = X - 62     fd
 64      8         2          16
 63     18         1          18
 62     12         0           0
 61      9        -1          -9
 60      7        -2         -14
 59      6        -3         -18
Total   60                    -7
N = 60, $\sum fd = -7$
$$\bar{X} = A + \frac{\sum fd}{N} = 62 + \frac{-7}{60} = 62 - 0.1167 = 61.8833$$
Ex.3: Following is the distribution of 400 persons according to different income groups.
Income (Rs. 000) No. of persons Income (Rs. 000) No. of persons
0-2 81 10-25 27
2-3 103 25-50 6
3-5 115 50-75 2
5-10 64 75-100 2
Calculate arithmetic mean of the incomes.
Sol: The given data form a continuous series. We have to apply the short cut method; we
cannot apply the step deviation method because the class intervals are not equal.
Let us take assumed mean as A = 7.5
Income (Rs.000)     F      Mid value X     d = X - 7.5      fd
    0-2             81        1               -6.5        -526.5
    2-3            103        2.5             -5.0        -515.0
    3-5            115        4               -3.5        -402.5
    5-10            64        7.5              0             0
   10-25            27       17.5             10           270
   25-50             6       37.5             30           180
   50-75             2       62.5             55           110
   75-100            2       87.5             80           160
   Total           400                                    -724
N = 400; $\sum fd = -724$
$$\bar{X} = A + \frac{\sum fd}{N} = 7.5 + \frac{-724}{400} = 7.5 - 1.81 = 5.69$$
∴ Average Income = Rs.5,690.
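The short cut calculation for unequal class intervals can be verified as follows. This is a hedged sketch: the variable names are illustrative assumptions, the classes and frequencies come from the table, and the direct mean is computed alongside only as a check.

```python
classes = [(0, 2), (2, 3), (3, 5), (5, 10), (10, 25), (25, 50), (50, 75), (75, 100)]
persons = [81, 103, 115, 64, 27, 6, 2, 2]
A = 7.5  # assumed mean, as chosen in the solution

# Mid value of each class: (lower + upper) / 2
mids = [(lo + hi) / 2 for lo, hi in classes]
n = sum(persons)

# Short cut method: mean = A + sum(f * d) / N with d = mid - A
sum_fd = sum(f * (m - A) for f, m in zip(persons, mids))
mean_shortcut = A + sum_fd / n

# Direct method for comparison: mean = sum(f * mid) / N
mean_direct = sum(f * m for f, m in zip(persons, mids)) / n

print(sum_fd, round(mean_shortcut, 2), round(mean_direct, 2))
```

Both routes give 5.69 (in Rs. 000), confirming that the choice of assumed mean affects only the amount of arithmetic, not the answer.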
9.2.2. Median:
Ex.4: Determine median from the following data: 25, 20, 15, 45, 18, 7, 10, 38, 12.
Sol: Arranging the data in ascending order: 7, 10, 12, 15, 18, 20, 25, 38, 45.
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{9+1}{2}\right)\text{th item}$$
= size of 5th item = 18.
Ex.5: The following data gives you the number of members in the family for 60 families. Calculate
the median.
No. of members No. of families No. of members No. of families
1 1 7 9
2 3 8 5
3 5 9 3
4 6 10 2
5 10 11 2
6 13 12 1
Sol: The given series is a discrete series. Hence, for the purpose of obtaining the median, we
have to calculate cumulative frequencies.
X f c.f
1 1 1
2 3 4
3 5 9
4 6 15
5 10 25
6 13 38 Median Class
7 9 47
8 5 52
9 3 55
10 2 57
11 2 59
12 1 60
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{60+1}{2}\right)\text{th item} = \text{size of 30.5th item} = 6$$
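The cumulative-frequency lookup used for a discrete series can be sketched in a few lines. The function name `discrete_median` is a hypothetical helper, not something defined in the lesson; it returns the first value whose cumulative frequency reaches the (N+1)/2 position.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete series via cumulative frequencies (illustrative helper)."""
    n = sum(freqs)
    position = (n + 1) / 2
    # Walk the cumulative frequencies until the median position is covered
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value

members = list(range(1, 13))                       # 1 to 12 members
families = [1, 3, 5, 6, 10, 13, 9, 5, 3, 2, 2, 1]  # 60 families in all

print(discrete_median(members, families))  # 6
```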
Ex.6: The following is the distribution of marks secured by 398 students in an examination.
Find the median.
Sol: The given series is a continuous series with inclusive type of class intervals. Hence, for the
purpose of obtaining the median, we have to convert the class intervals to exclusive type and find
the cumulative frequencies.
Class Interval Frequency Cumulative frequency
0.5 - 20.5 42 42
20.5 -30.5 38 80
30.5 -40.5 120 200 Median class
40.5 - 50.5 84 284
50.5 - 60.5 48 332
60.5 - 70.5 36 368
70.5 - 80.5 30 398
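$$\text{Median} = \text{size of } \left(\frac{N}{2}\right)\text{th item} = \text{size of 199th item}$$
Hence the median lies in the class 30.5 - 40.5.
l = 30.5, m = 80, f = 120, c = 10
$$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c = 30.5 + \frac{199 - 80}{120} \times 10 = 30.5 + 9.92 = 40.42$$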
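The interpolation within the median class can be expressed as a small function. This is a sketch under stated assumptions: `grouped_median` is a hypothetical name, and the class boundaries are copied as they appear in the table above.

```python
def grouped_median(bounds, freqs):
    """Median of a continuous series: l + (N/2 - m) / f * c (illustrative helper)."""
    n = sum(freqs)
    target = n / 2
    cum_before = 0  # m: cumulative frequency of the classes before the median class
    for (lower, upper), f in zip(bounds, freqs):
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f

# Exclusive class boundaries and frequencies as given in the table
bounds = [(0.5, 20.5), (20.5, 30.5), (30.5, 40.5), (40.5, 50.5),
          (50.5, 60.5), (60.5, 70.5), (70.5, 80.5)]
freqs = [42, 38, 120, 84, 48, 36, 30]

median = grouped_median(bounds, freqs)
print(round(median, 2))  # 40.42
```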
Sol: The given series is a discrete series. By inspection it is difficult to say which is the
modal value because, though the highest frequency is 45, the concentration appears to be greater
around 42. Hence we have to prepare the grouping and analysis tables.
Grouping Table:
(Each grouped total is written against the first size of the group it covers.)
Size of       I      II     III     IV      V      VI
collar       (f)
 12.0        10      28             66
 12.5        18             56              98
 13.0        38      80                            125
 13.5        42             87     102
 14.0        45      60                     68
 14.5        15             23                      30
 15.0         8      15
 15.5         7
Analysis Table :
Column No.            Size of collars
             12.5    13.0    13.5    14.0    14.5
   I                                   1
   II                  1       1
   III                         1       1
   IV                          1       1       1
   V           1       1       1
   VI                  1       1       1
   Total       1       3       5       4       1
The highest total in the analysis table is 5 and the item corresponding to it is 13.5.
Hence the modal size is 13.5 inches.
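The grouping and analysis tables can be reproduced mechanically. The sketch below is illustrative only: `grouping_mode` is a hypothetical helper, and the six columns are encoded as (group size, starting offset) pairs following the usual layout of a grouping table. Where two groups tie for the highest total, this sketch simply takes the first.

```python
def grouping_mode(sizes, freqs):
    """Mode by the grouping method: tally the sizes in each column's largest group."""
    # Columns I-VI as (group size, starting offset)
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    tally = {s: 0 for s in sizes}
    for size, start in columns:
        # Only complete groups are formed; leftover items are ignored
        groups = [range(i, i + size)
                  for i in range(start, len(freqs) - size + 1, size)]
        best = max(groups, key=lambda g: sum(freqs[i] for i in g))
        for i in best:
            tally[sizes[i]] += 1
    # The size with the highest tally is the mode
    return max(tally, key=tally.get), tally

sizes = [12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5]
freqs = [10, 18, 38, 42, 45, 15, 8, 7]

mode, tally = grouping_mode(sizes, freqs)
print(mode, tally[13.0], tally[13.5], tally[14.0])  # 13.5 3 5 4
```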
Ex.9: Calculate mode for the distribution of monthly rent paid by 500 families in a locality.
Monthly Rent (Rs) No. of families Monthly Rent No. of families
0-50 5 250-300 87
50-100 14 300-350 60
100-150 40 350-400 38
150-200 91 400 & above 15
200-250 150 Total 500
Sol: The highest concentration is clearly in the class interval 200 -250 which is the modal class.
L = 200, D1 = 150 - 91 = 59, D2 = 150 - 87 = 63, C = 50.
$$\text{Mode} = L + \frac{D_1}{D_1 + D_2} \times C = 200 + \frac{59}{59 + 63} \times 50 = 200 + 24.18 = 224.18$$
Modal Rent = Rs.224.18
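The mode formula for a continuous series translates directly into code. This is a minimal sketch with a hypothetical helper name, `grouped_mode`; it assumes equal class widths and a modal class that is neither the first nor the last class.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + D1 / (D1 + D2) * C for an interior modal class (illustrative helper)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    d1 = freqs[k] - freqs[k - 1]  # excess over the preceding class
    d2 = freqs[k] - freqs[k + 1]  # excess over the succeeding class
    return lower_bounds[k] + d1 / (d1 + d2) * width

lower_bounds = [0, 50, 100, 150, 200, 250, 300, 350, 400]
families = [5, 14, 40, 91, 150, 87, 60, 38, 15]

modal_rent = grouped_mode(lower_bounds, families, width=50)
print(round(modal_rent, 2))  # 224.18
```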
9.2.4. Geometric Mean:
Ex.10: Calculate the geometric mean of the following series of monthly incomes (Rs) of a batch of 10
members: 180, 250, 490, 120, 1400, 7000, 1050, 150, 360, 100.
Sol:
X Log x X Logx
180 2.2553 7000 3.8451
250 2.3979 1050 3.0212
490 2.6902 150 2.1761
120 2.0792 360 2.5563
1400 3.1461 100 2.0000
Total 26.1674
N = 10
$$GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right) = \text{Antilog}\left(\frac{26.1674}{10}\right)$$
= Antilog 2.6167 = 413.7
∴ Geometric Mean = Rs.413.70
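The log-table procedure corresponds to averaging base-10 logarithms and then taking the antilog. The sketch below is illustrative; it also compares the result with `statistics.geometric_mean` from the Python standard library (available from Python 3.8) as an independent check.

```python
from math import log10
from statistics import geometric_mean

incomes = [180, 250, 490, 120, 1400, 7000, 1050, 150, 360, 100]

# Mean of the logarithms, then antilog (10 ** mean_log)
mean_log = sum(log10(x) for x in incomes) / len(incomes)
gm = 10 ** mean_log

print(round(mean_log, 4), gm, geometric_mean(incomes))
```

Because four-figure log tables round each logarithm, the hand-computed value of 413.7 may differ slightly in the last digit from the full-precision result.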
9.2.5. Harmonic Mean:
Ex. 11: Calculate the harmonic mean of the following series of monthly expenditure of a batch of
students.
Monthly expenditure (Rs): 125, 130, 75, 10, 45, 50, 35, 40, 500, 150.
Sol:
X 125 130 75 10 45 50 35 40 500 150
1/X 0.0080 0.0077 0.0133 0.1000 0.0222 0.0200 0.0286 0.0250 0.0020 0.0067
N = 10, $\sum (1/X) = 0.2335$
$$HM = \frac{N}{\sum (1/X)} = \frac{10}{0.2335} = 42.83$$
∴ Harmonic mean of the expenditures = Rs.42.83.
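The reciprocal sum used above can be checked in a couple of lines. This is only a sketch; `statistics.harmonic_mean` from the Python standard library is used as an independent check on the hand formula.

```python
from statistics import harmonic_mean

spend = [125, 130, 75, 10, 45, 50, 35, 40, 500, 150]

# HM = N / sum(1/X)
reciprocal_sum = sum(1 / x for x in spend)
hm = len(spend) / reciprocal_sum

print(round(reciprocal_sum, 4), round(hm, 2))  # 0.2335 42.83
```

Note how the single small value (10) contributes 0.1000 of the 0.2335 total, which is why the harmonic mean sits far below most of the observations.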
Various problems on quartile deviation, mean deviation, standard deviation and coefficient
of variation are presented in the following sub-sections.
Ex. 12: The sales of a firm in different months are given below. Find the quartile deviation.
Sales (Rs. 000): 78, 86, 90, 88, 84, 88, 86, 80, 82, 84, 82, 80.
Sol: The given series is an individual series. Hence, arranging the items in ascending order of
magnitude, 78, 80, 80, 82, 82, 84, 84, 86, 86, 88, 88, 90.
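$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)\text{th item} = \text{size of } \left(\frac{12+1}{4}\right)\text{th item} = \text{size of 3.25th item}$$
$$= \text{3rd item} + \frac{1}{4}(\text{4th item} - \text{3rd item}) = 80 + \frac{1}{4}(82 - 80) = 80.5$$
$$Q_3 = \text{size of } \left(\frac{3(N+1)}{4}\right)\text{th item} = \text{size of } \left(\frac{3(12+1)}{4}\right)\text{th item} = \text{size of 9.75th item}$$
$$= \text{9th item} + \frac{3}{4}(\text{10th item} - \text{9th item}) = 86 + \frac{3}{4}(88 - 86) = 87.5$$
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{87.5 - 80.5}{2} = 3.5$$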
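The fractional-position rule used for Q1 and Q3 can be written as a small interpolation helper. This is a sketch, not the lesson's own notation: `item_at` is a hypothetical function that returns the value at a 1-based, possibly fractional, position in the ordered data.

```python
def item_at(sorted_data, position):
    """Value at a 1-based fractional position, with linear interpolation."""
    whole = int(position)
    frac = position - whole
    lower = sorted_data[whole - 1]
    if frac == 0:
        return lower
    # e.g. 3.25th item = 3rd item + 0.25 * (4th item - 3rd item)
    return lower + frac * (sorted_data[whole] - lower)

sales = sorted([78, 86, 90, 88, 84, 88, 86, 80, 82, 84, 82, 80])
n = len(sales)

q1 = item_at(sales, (n + 1) / 4)       # 3.25th item
q3 = item_at(sales, 3 * (n + 1) / 4)   # 9.75th item
qd = (q3 - q1) / 2

print(q1, q3, qd)  # 80.5 87.5 3.5
```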
Ex.13: Calculate quartile deviation for the following distribution of weights of 31 workers in a
factory
Weight(kgs) 60 62 64 66 68 70 72 74
No. of workers 1 3 5 7 10 3 1 1
Sol: The given series is a discrete series. Hence we have to find the cumulative frequencies.
X F Cumulative Frequency
60 1 1
62 3 4
64 5 9 Q1 class
66 7 16
68 10 26 Q3 class
70 3 29
72 1 30
74 1 31
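$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)\text{th item} = \text{size of } \left(\frac{31+1}{4}\right)\text{th item} = \text{size of 8th item} = 64$$
$$Q_3 = \text{size of } \left(\frac{3(N+1)}{4}\right)\text{th item} = \text{size of } \left(\frac{3(31+1)}{4}\right)\text{th item} = \text{size of 24th item} = 68$$
$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{68 - 64}{2} = 2$$
∴ Quartile deviation of the weights = 2 kgs.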
Ex.14: Evaluate quartile deviation for the following data.
Income (Rs) No. of persons Income (Rs) No. of persons
Less than 50 54 110-130 230
50-70 100 130-150 125
70-90 140 above 150 51
90-110 300 Total 1000
Sol:
Income (Rs) F c.f
Less than 50 54 54
50 - 70 100 154
70 - 90 140 294 Q1 class
90 - 110 300 594
110 - 130 230 824 Q3 class
130-150 125 949
Above 150 51 1000
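l1 = 70, m1 = 154, f1 = 140, c = 20, N = 1000.
$$Q_1 = l_1 + \frac{\frac{N}{4} - m_1}{f_1} \times c = 70 + \frac{250 - 154}{140} \times 20 = 70 + 13.71 = 83.71$$
$$Q_3 = l_3 + \frac{\frac{3N}{4} - m_3}{f_3} \times c = 110 + \frac{750 - 594}{230} \times 20 = 110 + 13.56 = 123.56$$
$$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{123.56 - 83.71}{2} = 19.925$$
Quartile deviation of the incomes = Rs.19.92.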
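The two quartile formulas differ only in the fraction of N they target, so one interpolation routine covers both. This is an illustrative sketch: `grouped_quantile` is a hypothetical helper, open-ended classes are marked with `None`, and the routine assumes the quartile does not fall in an open-ended class (as is the case here).

```python
def grouped_quantile(classes, fraction):
    """classes: (lower, upper, frequency) triples; fraction: 0.25 for Q1, 0.75 for Q3."""
    n = sum(f for _, _, f in classes)
    target = fraction * n
    cum_before = 0  # cumulative frequency before the quartile class
    for lower, upper, f in classes:
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f

incomes = [(None, 50, 54), (50, 70, 100), (70, 90, 140), (90, 110, 300),
           (110, 130, 230), (130, 150, 125), (150, None, 51)]

q1 = grouped_quantile(incomes, 0.25)
q3 = grouped_quantile(incomes, 0.75)
qd = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Open-ended classes cause no difficulty for the quartile deviation as long as the quartiles fall in closed classes, which is one reason this measure is used for such distributions.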
9.3.2. Mean Deviation:
Ex.15: The incomes of 7 persons are given below:
Income (Rs): 4400, 5800, 3000, 4000, 4800, 4600, 4200.
Calculate mean deviation from median.
Sol: In order to identify the median, we have to arrange the data in ascending order. To find the
mean deviation, we have to take the absolute deviations from the median. The calculations are
shown in the following table.
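$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of 4th item} = 4{,}400$$
Income (X)     |D| = |X - 4400|
  3,000            1,400
  4,000              400
  4,200              200
  4,400                0
  4,600              200
  4,800              400
  5,800            1,400
  Total            4,000
$$M.D. = \frac{\sum |D|}{N}, \quad D = X - M, \text{ where M is the median.}$$
N = 7, $\sum |D| = 4000$.
$$\therefore M.D. = \frac{\sum |D|}{N} = \frac{4000}{7} = 571.43$$
∴ Mean deviation of the incomes = Rs.571.43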
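The same calculation in code, as a hedged sketch: `statistics.median` from the Python standard library locates the median, and the mean deviation is the average of the absolute deviations from it.

```python
from statistics import median

incomes = [4400, 5800, 3000, 4000, 4800, 4600, 4200]

m = median(incomes)
# Mean deviation about the median: sum(|X - M|) / N
total_abs_dev = sum(abs(x - m) for x in incomes)
md = total_abs_dev / len(incomes)

print(m, total_abs_dev, round(md, 2))  # 4400 4000 571.43
```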
Ex.16: Calculate mean deviation about median from the following data relating to heights (inches)
of 100 children.
Height (inches) 60 61 62 63 64 65 66 67 68
No. of children 2 3 12 29 25 12 10 4 3
Sol: The given series is a discrete series. First, we have to obtain median by calculating cumulative
frequencies.
X f cf |D|=|X - 64| f |D|
60 2 2 4 8
61 3 5 3 9
62 12 17 2 24
63 29 46 1 29
64 25 71 0 0
65 12 83 1 12
66 10 93 2 20
67 4 97 3 12
68 3 100 4 12
100 126
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of 50.5th item} = 64$$
$$M.D. = \frac{\sum f|D|}{N} = \frac{126}{100} = 1.26$$
∴ Mean Deviation about median of the heights = 1.26 inches.
9.3.3 Standard Deviation:
Ex.17: Calculate standard deviation for the following series. 8, 9, 15, 23, 5, 11, 19, 8, 10, 12.
Sol: Short cut method.
X d= x - 11 d2
8 -3 9
9 -2 4
15 4 16
23 12 144
5 -6 36
11 0 0
19 8 64
8 -3 9
10 -1 1
12 1 1
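Total 10 284
N = 10, $\sum d = 10$, $\sum d^2 = 284$
$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{284}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{28.4 - 1} = \sqrt{27.4} = 5.23$$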
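The short cut formula gives the same standard deviation whatever assumed mean is chosen. The sketch below illustrates this with a hypothetical helper, `sd_assumed_mean`, and compares the result with `statistics.pstdev` (the population standard deviation in the Python standard library).

```python
from math import sqrt
from statistics import pstdev

def sd_assumed_mean(data, assumed_mean):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), with d = x - A (illustrative helper)."""
    n = len(data)
    d = [x - assumed_mean for x in data]
    return sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)

data = [8, 9, 15, 23, 5, 11, 19, 8, 10, 12]

sd = sd_assumed_mean(data, 11)
print(round(sd, 2))  # 5.23

# Any other assumed mean leads to the same value
for a in (0, 10, 20):
    print(a, round(sd_assumed_mean(data, a), 2))
```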
Ex.18: Calculate standard deviation of household size from the following frequency distribution
of 500 households covered in a sample survey.
Household size 1 2 3 4 5 6 7 8 9
No. of households 92 49 52 82 102 60 35 24 4
Sol: The given series is a discrete series.
Shortcut method.
X f d=X-5 fd fd2
1 92 -4 -368 1472
2 49 -3 -147 441
3 52 -2 -104 208
4 82 -1 -82 82
5 102 0 0 0
6 60 1 60 60
7 35 2 70 140
8 24 3 72 216
9 4 4 16 64
500 -483 2683
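N = 500, $\sum fd = -483$, $\sum fd^2 = 2683$
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} = \sqrt{\frac{2683}{500} - \left(\frac{-483}{500}\right)^2} = \sqrt{5.366 - 0.9332} = \sqrt{4.4328} \approx 2.105$$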
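Life time (hrs)     f      Mid value X     d' = (X - 650)/100     fd'     fd'²
 300 - 400          6         350                -3               -18      54
 400 - 500         18         450                -2               -36      72
 500 - 600         73         550                -1               -73      73
 600 - 700        165         650                 0                 0       0
 700 - 800         62         750                 1                62      62
 800 - 900         22         850                 2                44      88
 900 - 1000         4         950                 3                12      36
 Total            350                                              -9     385
N = 350, $\sum fd' = -9$, $\sum fd'^2 = 385$, C = 100
$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C = \sqrt{\frac{385}{350} - \left(\frac{-9}{350}\right)^2} \times 100 = \sqrt{1.1 - 0.00066} \times 100 \approx 104.85$$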
9.3.4. Coefficient of Variation:
Ex.20: The following data refer to the dividend (%) paid by two companies A and B over the last
7 years.
A 4 8 4 15 10 11 9
B 12 8 3 15 6 4 10
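Sol: We have to calculate the arithmetic mean and standard deviation of the dividends.
          Company A                         Company B
   X     d = X - 8     d²            Y     d = Y - 8     d²
   4        -4         16           12         4         16
   8         0          0            8         0          0
   4        -4         16            3        -5         25
  15         7         49           15         7         49
  10         2          4            6        -2          4
  11         3          9            4        -4         16
   9         1          1           10         2          4
 Total       5         95                      2        114
$$\bar{X} = A + \frac{\sum d}{N} = 8 + \frac{5}{7} = 8.71 \qquad \bar{Y} = B + \frac{\sum d}{N} = 8 + \frac{2}{7} = 8.29$$
$$\sigma_A = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{95}{7} - \left(\frac{5}{7}\right)^2} = 3.61 \qquad \sigma_B = \sqrt{\frac{114}{7} - \left(\frac{2}{7}\right)^2} = 4.03$$
$$CV = \frac{\sigma}{\bar{X}} \times 100$$
$$\text{Company A}: CV = \frac{3.61}{8.71} \times 100 = 41.45\%$$
$$\text{Company B}: CV = \frac{4.03}{8.29} \times 100 = 48.61\%$$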
We observe that the coefficient of variation for dividends of company B is more than that of
company A. Hence, we conclude that dividends of company B are having more variations. In
other words the dividends of company A are more consistent.
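The comparison of the two companies can be reproduced with the standard library. This is a sketch under stated assumptions: `coefficient_of_variation` is a hypothetical helper, and it uses `statistics.pstdev` (population standard deviation, dividing by N) to match the formula used in the solution.

```python
from statistics import mean, pstdev

def coefficient_of_variation(series):
    """C.V. = sigma / mean * 100, with sigma computed over N (illustrative helper)."""
    return pstdev(series) / mean(series) * 100

company_a = [4, 8, 4, 15, 10, 11, 9]
company_b = [12, 8, 3, 15, 6, 4, 10]

cv_a = coefficient_of_variation(company_a)
cv_b = coefficient_of_variation(company_b)

print(round(cv_a, 2), round(cv_b, 2))
print("More consistent:", "A" if cv_a < cv_b else "B")
```

The figures differ from the hand calculation only in the second decimal place, because the solution rounds the mean and the standard deviation before dividing; the conclusion is unchanged.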
GUIDELINE-10 :
CORRELATION
Structure
10.0 Objective
10.1 Introduction
10.2 Definitions
10.3 Significance
10.4 Correlation and Causation
10.5 Types of Correlation
10.6 Methods of Studying Correlation
10.7 Properties of Karl Pearson’s Correlation Co-efficient
10.8 Rank Correlation Coefficient
10.9 Concurrent Deviation Method
10.10 Summary
10.11 Model Questions
10.12 Suggested Readings
10.0 Objective:
In the economic and business environment, we encounter relationships between
different variables: demand and price, supply and price, output and inputs, cost and output,
consumption and income, etc. This lesson helps the reader to understand the relationship
between two or more variables under study. The analysis of correlation is one of the important
statistical techniques for understanding the nature, magnitude and direction of the relationship
between two variables. Thus the objective of this lesson is to provide an understanding of the
concept of correlation and the methods of computing the coefficient of correlation.
10.1 Introduction:
In the earlier lessons, we studied problems relating to one variable: averages,
dispersion, etc. However, in the real world we encounter relationships between two or
more variables. If two variables vary in such a way that movements in one are accompanied by
movements in the other, these variables are said to be correlated. For example, we know that
some relationship exists between quantity demanded and its price, agricultural production and
rainfall, age of a husband and age of a wife, increase in the number of television licenses and
number of cinemagoers. The degree as well as the direction of relationship between the variables
under consideration is measured through correlation analysis. Thus the measure of correlation
called as the correlation co-efficient summarizes in one figure the direction and degree of correlation.
10.2. Definitions:
Following are some of the important definitions of correlation given by statisticians and
mathematicians.
• Y.L. Chow defines correlation as “an analysis that attempts to determine the
degree of relationship between two or more variables”.
a) In practical life we encounter variables which show some kind of
relationship. We know that there exists a relationship between demand and price,
supply and price, income and expenditure, savings and income, etc. Correlation
analysis helps us to measure in one figure the degree of relationship
existing between the variables under study.
b) Once we know that two variables are closely related, we can estimate the value
of one variable (dependent) given the value of another variable (independent).
This is known as regression analysis, which is discussed in
the next lesson.
However, it should be noted that the coefficient of correlation is one of the most widely
used and also one of the most widely abused statistical measures. It is abused in the sense that one
sometimes overlooks the fact that correlation measures nothing but the strength of the linear
relationship and that it does not necessarily imply a cause and effect relationship. For instance, if
we compute the correlation between the production of fertilizers and the construction of
houses, we may get a high positive correlation coefficient. But there is no theoretical justification
for such a relationship, and caution needs to be taken while interpreting the correlation
coefficient. This type of correlation is called spurious correlation or nonsense correlation.
The explanation of a significant degree of correlation may be any one, or a combination,
of the following reasons:
a) The occurrence of correlation may be due to pure chance, especially in a small sample:
We may get a high degree of correlation between two variables in a sample, but such
a relationship may not exist in the universe. This is especially so in the case of small
samples. Such a correlation may arise from pure random sampling variation or from the
bias of the investigator in selecting the sample.
b) Correlation between two variables may be due to the influence of some other variable(s):
It is quite possible that a high degree of correlation between the variables is due to
the same causes affecting each variable, or to different causes affecting each variable with the
same effect. For example, a high degree of correlation between the production of wheat
and coffee may be due to the fact that both are influenced by the amount of
rainfall. But neither of the two variables, i.e., production of wheat and coffee, is the cause
of the other. To take another example, we may get a correlation coefficient of 0.95
between the salaries of teachers and the consumption of alcohol over a period of time.
This does not prove that the two are causally related, because both are influenced by
variables like growth in national income, population, etc.
c) Both the variables may be mutually influencing each other, so that neither can be designated
as the cause and the other the effect: In some instances, despite the high degree of
correlation between two variables, it may be difficult to pinpoint which is the cause
and which is the effect. This is especially likely in the case of economic variables. It
is an established theory in economics that as price increases, quantity demanded will
decrease, and vice versa. Here the cause is the change in price and the effect is the
change in quantity demanded. On the other hand, it is also possible, in a dynamic analysis,
that increased demand for a commodity due to growth of population or other reasons
may exert an upward pressure on prices. If so, the cause is the increased demand
and the effect is the price. Thus, at times it may be difficult to tell which of two
correlated variables is the cause and which is the effect.
10.5. Types of Correlation:
If both variables move in the same direction, the correlation is said to be positive
and the relationship is said to be direct, for example supply and price. If the variables
move in opposite directions, the correlation is said to be negative and the relationship is
called inverse, for example demand and price.
The distinction between simple, partial and multiple correlation is based upon the number
of variables studied. When only two variables are studied, it is a problem of simple correlation.
When three or more variables are studied, it is a problem of either partial or multiple correlation.
In partial correlation, though the number of variables is more than two, we consider the relationship
between only two variables while the effect of the other influencing variables is kept constant.
In multiple correlation, on the other hand, the relationship between three or more variables
is studied simultaneously.
The distinction between linear and non-linear correlation is based upon the nature of the
ratio of change between the variables. If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other variable, the correlation is said to be linear.
If we plot the two variables on a graph and all the points lie on a straight line, it is a case of
linear correlation. On the other hand, if the amount of change in one variable does not bear a
constant ratio to the amount of change in the other variable, the correlation is said to be
non-linear. In this case, the graph of the variables is not a straight line but a curve, so it is
also called curvilinear correlation. The following graphs are examples of linear and non-linear correlation.
Diagram -1
Different methods exist to ascertain whether two variables are correlated or not. Some
methods give only the direction of correlation, while others help us determine the
degree or magnitude as well as the direction of correlation between the variables. The various
methods of measuring correlation are:
a) Scatter diagram method
b) Graphic method
Apart from these, Karl Pearson's coefficient of correlation can also be computed by
using regression equations, which are explained in the subsequent lessons.
The scatter diagram is the simplest method of ascertaining the correlation between two variables.
Through this method, we can determine only the direction of correlation. When this method is used, the
given data are plotted on graph paper in the form of dots. For each pair of values of the given
variables, say X and Y, we put a dot and thus obtain as many dots as the number of paired
observations. The chart showing these dots is known as a scatter diagram. By looking at the
scatter of the dots, we can form an idea as to whether the variables under consideration are
related or not. The more closely the points cluster about a straight line, the higher is the
relationship between the variables. On the other hand, the more widely the points are scattered,
the lower is the relationship between the variables. Different cases of correlation are shown in
the following diagrams - 2:
In the graphical method, the individual values of the two variables are plotted on graph
paper, and we obtain two curves, one for each of the variables (X and Y). By examining the
direction and closeness of the two curves so drawn, we can ascertain whether the two variables
are related or not. If both curves move in the same direction, the correlation is said to be
positive. If the curves move in opposite directions, the correlation is said to be negative.
This method is normally used where the data are given over a period of time. As in the case of
the scatter diagram, this method does not give a numerical value describing the extent to
which the variables are related.
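Karl Pearson's coefficient of correlation is one of the most popular and widely used methods of
measuring correlation. It is a mathematical method, which gives the direction as well as the degree
of correlation between the variables under consideration. Pearson's coefficient of correlation is
denoted by the symbol 'r'. The formula for computing it is
$$r = \frac{\sum xy}{N \sigma_X \sigma_Y}$$
where $x = X - \bar{X}$, $y = Y - \bar{Y}$ and N = number of observations.
We know that
$$\sigma_X = \sqrt{\frac{1}{N}\sum (X - \bar{X})^2} = \sqrt{\frac{1}{N}\sum x^2}, \qquad \sigma_Y = \sqrt{\frac{1}{N}\sum (Y - \bar{Y})^2} = \sqrt{\frac{1}{N}\sum y^2}$$
Therefore
$$r = \frac{\sum xy}{N \sqrt{\frac{1}{N}\sum x^2} \sqrt{\frac{1}{N}\sum y^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$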
The value of the correlation coefficient always lies between -1 and +1. If r = -1, there is
perfect negative correlation; if r = +1, there is perfect positive correlation; and if r = 0, there is no
correlation between the variables. In practice, however, we usually get a value lying strictly between
-1 and +1. The following steps will help in computing the correlation coefficient using the above
formula.
• Take the deviations of the ‘X’ and ‘Y’ series from their respective means, i.e.,
compute $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
• Square the deviations of ‘X’ and ‘Y’ and obtain the respective totals, i.e., $\sum x^2$ and
$\sum y^2$.
• Multiply the deviations of the X and Y series and obtain the total, i.e., $\sum xy$.
• Substitute the values of $\sum xy$, $\sum x^2$ and $\sum y^2$ in the formula to obtain the value
of the correlation coefficient (r).
This method is easier when the mean values of the ‘X’ and ‘Y’ series are integers. If the
mean values are not integers, the computation of deviations, their squares and their products
takes much time, and in that case we can use the following formula, which is obtained by
expanding the terms in the numerator and denominator of the above formula.
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$
The advantage of this method is that the correlation coefficient can be calculated
directly from the given values of the two series ‘X’ and ‘Y’ without taking
deviations. The following steps can be followed while computing the correlation
coefficient.
• Total the values of the ‘X’ and ‘Y’ series to obtain ΣX and ΣY.
• Multiply each value of the ‘X’ series by the corresponding value of the ‘Y’ series and total
the products to obtain ΣXY.
• Square the values of the ‘X’ and ‘Y’ series and total the squares to obtain ΣX² and ΣY².
Quite often we encounter data where the numbers are large and the means are not
integers. If the given data involve large numbers, the computation of ΣXY, ΣX², ΣY², etc.
consumes more time; this can be avoided by taking deviations from an assumed mean of
each series. In this case, the formula for computing the coefficient of correlation is
$$r = \frac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - \left(\sum dx\right)^2}\,\sqrt{N\sum dy^2 - \left(\sum dy\right)^2}}$$
where
dx = deviations of the ‘X’ series from an assumed mean
dy = deviations of the ‘Y’ series from an assumed mean
Σ dx dy = sum of the products of the deviations of the ‘X’ and ‘Y’ series from their assumed means
Σ dx² = sum of the squares of the deviations of the ‘X’ series from an assumed mean
Σ dy² = sum of the squares of the deviations of the ‘Y’ series from an assumed mean
N = number of observations.
Substitute the above values in the formula and simplify to obtain the value of the coefficient
of correlation.
It should be noted that, whatever the formula we use, we get the same value of the
coefficient of correlation and the use of the formula is in accordance with convenience. This can
be illustrated with the following example.
Example - 1: Calculate Karl Pearson’s Coefficient of correlation from the following data;
X: 42 45 49 56 59 64 63
Y: 17 20 27 39 35 39 40
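   X      Y     x = X - X̄     y = Y - Ȳ      x²      y²      xy
                 (X̄ = 54)     (Ȳ = 31)
  42     17       -12           -14         144     196     168
  45     20        -9           -11          81     121      99
  49     27        -5            -4          25      16      20
  56     39         2             8           4      64      16
  59     35         5             4          25      16      20
  64     39        10             8         100      64      80
  63     40         9             9          81      81      81
 378    217                                 460     558     484
$$\bar{X} = \frac{\sum X}{N} = \frac{378}{7} = 54; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{217}{7} = 31$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{484}{\sqrt{460 \times 558}} = \frac{484}{506.64} = 0.96$$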
Solution II : By direct method
X Y X2 Y2 XY
$$r = \frac{7 \times 484}{\sqrt{7 \times 460}\,\sqrt{7 \times 558}} = \frac{484}{\sqrt{460 \times 558}} = 0.96$$
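The claim that every form of the formula gives the same coefficient can be checked numerically. The sketch below is illustrative: the three function names are assumptions introduced here, the data are those of Example 1, and the assumed means (50 and 30) are arbitrary choices made only for the demonstration.

```python
from math import sqrt

def r_from_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), deviations from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]
    y = [v - my for v in ys]
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))

def r_from_raw_sums(xs, ys):
    """Direct method: uses only sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

def r_from_assumed_means(xs, ys, ax, ay):
    """Assumed mean method: the direct formula applied to dx = X - ax, dy = Y - ay."""
    dx = [v - ax for v in xs]
    dy = [v - ay for v in ys]
    return r_from_raw_sums(dx, dy)

X = [42, 45, 49, 56, 59, 64, 63]
Y = [17, 20, 27, 39, 35, 39, 40]

r1 = r_from_deviations(X, Y)
r2 = r_from_raw_sums(X, Y)
r3 = r_from_assumed_means(X, Y, ax=50, ay=30)
print(round(r1, 2), round(r2, 2), round(r3, 2))  # 0.96 0.96 0.96
```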