
Stat

Uploaded by

Tesfahun Tegegn
Copyright
© © All Rights Reserved

5. It is relatively reliable in the sense that it does not vary too much when repeated
samples are taken from one and the same population, at least not as much as
some other kinds of statistical descriptions.

6. The mean is typical in the sense that it is the center of gravity balancing the values
on either side of it.

7. It is a calculated value and not based on position in the series.

B. Limitations of Arithmetic Mean:

1. Since the value of mean depends upon each and every item of the series, extreme
values, i.e., very small and very large items unduly affect the value of the average.
The smaller the number of observations, the greater is likely to be the impact of
extreme values.

2. In a distribution with open end classes, the value of the mean cannot be computed
without making an assumption regarding the size of the class interval of the open
end classes. If such classes contain a large proportion of the values, then the mean
may be subject to substantial error.

3. The arithmetic mean is not always a good measure of central tendency. The
mean provides a characteristic value, in the sense of indicating where most of
the values lie, only when the distribution of the variable is reasonably normal, i.e.,
bell shaped. In case of a U shaped distribution the mean is not likely to serve a
useful purpose.

C. Calculation of Arithmetic Mean - Individual Observations (Ungrouped Data):

When we are given a sequence of items, arithmetic mean can be calculated either by
direct method or by shortcut method.

i) Direct Method:

In the direct method, arithmetic mean can be obtained by adding all the items and then
dividing the total by the number of items.

Symbolically: $\bar{X} = \dfrac{\sum X}{N}$
Illustration 1 : The monthly incomes of 10 employees in an office are given below.
Income (Rs.) : 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950
Calculate the arithmetic mean of incomes.
Solution: $\sum X = 16650$, $N = 10$

$\bar{X} = \dfrac{\sum X}{N} = \dfrac{16650}{10} = 1665$. The average income is Rs. 1665.
ii) Short cut Method:
In the short cut method, one of the items, usually the middle item, is taken as the
assumed mean and then deviations are calculated from the assumed mean. The formula for
calculating arithmetic mean is

$\bar{X} = A + \dfrac{\sum d}{N}$, where $d = X - A$, the deviations taken from the assumed mean $A$.
Illustration 2: The monthly incomes of 10 employees in an office are given below.
Income (Rs.) : 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050,1950
Calculate the arithmetic mean of incomes by short cut method.
Solution: Let us take 1750 as the assumed mean.
Income X (Rs.)    d = X - 1750
1780                  30
1760                  10
1690                 -60
1750                   0
1840                  90
1920                 170
1100                -650
1810                  60
1050                -700
1950                 200
Total               -850

$A = 1750$, $N = 10$, $\sum d = -850$

$\bar{X} = A + \dfrac{\sum d}{N} = 1750 + \dfrac{-850}{10} = 1750 - 85 = 1665$
The average income is Rs. 1665.

Note that we have got same value of arithmetic mean either by direct method or by short
cut method.
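The agreement between the two methods in Illustrations 1 and 2 can be checked with a short Python sketch. The function names `mean_direct` and `mean_shortcut` are illustrative choices, not part of the text.

```python
# Incomes (Rs.) from Illustrations 1 and 2
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]


def mean_direct(values):
    """Direct method: sum of the items divided by the number of items."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short cut method: A + (sum of deviations from A) / N."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


print(mean_direct(incomes))          # 1665.0
print(mean_shortcut(incomes, 1750))  # 1665.0
```

Any assumed mean gives the same answer; a value near the middle of the data only keeps the deviations small for hand calculation.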
D. Calculation of Arithmetic Mean-Discrete Series:
When we are given a discrete series, the arithmetic mean can be calculated either by
direct method or by short cut method.
i) Direct Method:
The formula for computing arithmetic mean is

$\bar{X} = \dfrac{\sum fX}{N}$, where $f$ = frequency,
$X$ = variable and $N$ = total frequency
Illustration 3: From the following data of marks obtained by 60 students of a class, calculate the
arithmetic mean.
Marks (X)              20   30   40   50   60   70
No. of Students (f)     8   12   20   10    6    4

Solution:
X        f      fX
20       8     160
30      12     360
40      20     800
50      10     500
60       6     360
70       4     280
Total   60    2460

$N = \sum f = 60$, $\sum fX = 2460$

$\bar{X} = \dfrac{\sum fX}{N} = \dfrac{2460}{60} = 41$
ii) Short cut Method:
In the short cut method, usually we take the middle item of X as the assumed mean and
calculate the deviations from that assumed mean. In this case the arithmetic mean can be obtained
from the following formula.

$\bar{X} = A + \dfrac{\sum fd}{N}$, where $A$ is the assumed mean,

$d = X - A$ and $N$ is the total frequency.
Illustration 4 : From the following data of marks obtained by 60 students of a class, calculate
arithmetic mean.
Marks (X) 20 30 40 50 60 70
No. of Students (f) 8 12 20 10 6 4

Solution :
X        f     d = X - 40     fd
20       8        -20        -160
30      12        -10        -120
40      20          0           0
50      10         10         100
60       6         20         120
70       4         30         120
Total   60                     60

$N = 60$, $A = 40$, $\sum fd = 60$

$\bar{X} = A + \dfrac{\sum fd}{N} = 40 + \dfrac{60}{60} = 41$
Note that we could get the same value by both the methods.
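For a discrete series the same check can be sketched in Python with the data of Illustrations 3 and 4. The helper names below are hypothetical and chosen only for this example.

```python
# Marks and number of students from Illustrations 3 and 4
marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]


def discrete_mean(xs, fs):
    """Direct method: sum of f*X divided by the total frequency N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def discrete_mean_shortcut(xs, fs, assumed_mean):
    """Short cut method: A + sum of f*d divided by N, with d = X - A."""
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + total_fd / sum(fs)


print(discrete_mean(marks, students))               # 41.0
print(discrete_mean_shortcut(marks, students, 40))  # 41.0
```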
E. Calculation of Arithmetic Mean-Continuous Series:
In a continuous series, arithmetic mean can be computed by applying direct method or
short cut method or step deviation method.
i) Direct Method: The formula for obtaining arithmetic mean is given by

$\bar{X} = \dfrac{\sum fX}{N}$, where $X$ represents the middle value of each class interval,
$f$ = frequency of each class, $N$ = total frequency
Illustration 5: Calculate the arithmetic mean for the following data by direct method.

Marks 0-10 10-20 20-30 30-40 40-50 50-60


No. of Students 5 10 25 30 20 10

Solution :

Class Interval   Frequency (f)   Mid value (X)     fX
0-10                   5               5            25
10-20                 10              15           150
20-30                 25              25           625
30-40                 30              35          1050
40-50                 20              45           900
50-60                 10              55           550
Total                100                          3300

$N = 100$, $\sum fX = 3300$

$\bar{X} = \dfrac{\sum fX}{N} = \dfrac{3300}{100} = 33$
Arithmetic mean of the marks is $\bar{X} = 33$
ii) Short cut Method:
When the short cut method is used, arithmetic mean is computed by applying the following
formula.

$\bar{X} = A + \dfrac{\sum fd}{N}$, where $A$ = assumed mean,
$d = X - A$, the deviations of mid values from the assumed mean,
$N$ = total frequency
Illustration 6: Calculate the arithmetic mean for the following data by short cut method.

Marks 0-10 10-20 20-30 30-40 40-50 50-60


No. of Students 5 10 25 30 20 10

Solution :
Class Interval   Frequency (f)   Mid value (X)   d = X - 25     fd
0-10                   5               5            -20        -100
10-20                 10              15            -10        -100
20-30                 25              25              0           0
30-40                 30              35             10         300
40-50                 20              45             20         400
50-60                 10              55             30         300
Total                100                                        800

$N = 100$, $\sum fd = 800$, $A = 25$

$\bar{X} = A + \dfrac{\sum fd}{N} = 25 + \dfrac{800}{100} = 33$
Arithmetic mean of the marks is $\bar{X} = 33$
iii) Step Deviation Method:
This method can be used for the calculation of various statistical measures in case of
continuous distributions with equal intervals. The calculations will be further simplified in this
method. A very important note with regard to this method is that, it can be applied only when the
class intervals are equal for all the classes.
The formula used in this method is

$\bar{X} = A + \dfrac{\sum fd_1}{N} \times C$, where $A$ = assumed mean, $f$ = class frequency,

$d_1 = \dfrac{X - A}{C}$, $C$ is the class interval,
$X$ = middle value of the class interval
Illustration 7: Calculate the arithmetic mean for the following data by step deviation method.

Marks 0-10 10-20 20-30 30-40 40-50 50-60


No. of Students 5 10 25 30 20 10

Solution :
Class Interval   Frequency (f)   Mid value (X)   d1 = (X - 35)/10    fd1
0-10                   5               5              -3            -15
10-20                 10              15              -2            -20
20-30                 25              25              -1            -25
30-40                 30              35               0              0
40-50                 20              45               1             20
50-60                 10              55               2             20
Total                100                                            -20

$A = 35$, $C = 10$, $N = 100$, $\sum fd_1 = -20$

$\bar{X} = A + \dfrac{\sum fd_1}{N} \times C = 35 + \dfrac{-20}{100} \times 10 = 35 - 2 = 33$

∴ Arithmetic mean of the marks = 33

Note that we could get the same value of arithmetic mean either by direct method or by short cut
method or by step deviation method.

6.3.1 Median:

The median by definition refers to the middle value in a distribution. That means it is the
item which divides the total observations into two equal parts. Median is an item for which half of
the items are less and half the items are more than it. As distinct from the arithmetic mean, the
median is called a positional average. The term position refers to the place of a value in a series.
The place of the median in a series is such that an equal number of items lie on either side of it.

A. Merits of Median:

Median has certain merits when compared to arithmetic mean or mode.

1) Median is especially useful in case of open end classes, since only the position
and not the values of items must be known. The median is also recommended if
the distribution has unequal classes, since it is easier to compute than the mean.

2) Extreme values do not affect the median as strongly as they do the mean. Very
often, when extreme values are present in a set of observations, the median is a
more satisfactory measure of central tendency than the mean.

3) In markedly skewed distributions such as income distributions or price
distributions, where the arithmetic mean would be distorted by the extreme values,
the median is especially useful.

4) It is the most appropriate average in dealing with qualitative data, i.e., where
ranks are given or there are other types of items that are not counted or measured
but are scored.

5) The value of median can be determined graphically whereas the value of mean
cannot be graphically ascertained.

6) The median indicates the value of the middle item in the distribution. This is a
clear cut meaning and makes the median a measure that can be easily explained.

B. Limitations of Median:

Median has certain limitations also as presented in the following.

1) For calculating median, it is necessary to arrange the data; other averages do not need
any arrangement.

2) Since it is a positional average, its value is not determined by each and every item of the
distribution.

3) It is not capable of algebraic treatment. For example, a combined median cannot be
calculated as in the case of the combined arithmetic mean. Because of this limitation, median is much
less popular when compared to arithmetic mean.

4) The value of median is affected more by sampling fluctuations than the value of the
arithmetic mean.

5) The median, in some cases, cannot be computed exactly as the mean. When the number
of items included in a series of data is even, the median is determined approximately as the mid-
point of the two middle terms.

6) It is erratic if the number of items is small.

C. Calculation of Median - Individual Observations:

When we are given a sequence of observations, we have to arrange them either in the
ascending order or in the descending order of magnitude. If the total number of items is odd, the
middle term will be the median. If the total number of items is even, the average of the two middle
terms will be the median.

Symbolically: Median = size of the $\left(\dfrac{N+1}{2}\right)$th item.

Illustration 8: The wages of eight workers are given below:
Wages (Rs.): 1100, 1150, 1080, 1120, 1200, 1160, 1400, 1210.
Calculate the median of the wages.
Solution: Arranging the items in ascending order of magnitude, we have
1080, 1100, 1120, 1150, 1160, 1200, 1210, 1400
Since we have eight terms in total, the median is the average of the two middle terms (the 4th and the 5th):

Median = $\dfrac{1150 + 1160}{2} = 1155$

D. Calculation of Median — Discrete series:


First, we have to arrange the data in ascending order of magnitude, if they are not already so arranged. Then
we have to find the less than cumulative frequencies.
Median = size of the $\left(\dfrac{N+1}{2}\right)$th item.
In order to find the median, observe the cumulative frequency column and find that total
which is either equal to $(N+1)/2$ or next higher to that and determine the value of the variable
corresponding to it.
Illustration 9: Find the median from the following data.
Income (Rs.) 1000 1500 800 2000 2500 1800
No. of Persons 24 26 16 20 6 30
Solution : Arrangement of incomes in ascending order and calculation of cumulative frequencies
are shown in the following table.
Income No. of Persons Cumulative
f frequency
800 16 16
1000 24 40
1500 26 66
1800 30 96
2000 20 116
2500 6 122

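Median = size of the $\left(\dfrac{N+1}{2}\right)$th item = $\dfrac{122 + 1}{2}$ = 61.5th item.
The cumulative frequency next higher to 61.5 is 66, and the corresponding income is Rs. 1500.
∴ Median of the incomes = Rs. 1500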
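The cumulative-frequency lookup used in Illustration 9 can be sketched in Python as follows. The function name `discrete_median` is hypothetical, and the sketch follows the rule stated in the text (take the first cumulative frequency that equals or exceeds (N+1)/2) rather than any library definition of the median.

```python
from itertools import accumulate

# Incomes (Rs.) and number of persons from Illustration 9, in the order given
incomes = [1000, 1500, 800, 2000, 2500, 1800]
persons = [24, 26, 16, 20, 6, 30]


def discrete_median(values, freqs):
    """Value whose cumulative frequency first reaches the (N+1)/2 th item."""
    pairs = sorted(zip(values, freqs))         # arrange in ascending order
    sorted_freqs = [f for _, f in pairs]
    position = (sum(sorted_freqs) + 1) / 2     # (N + 1) / 2
    for (value, _), cum in zip(pairs, accumulate(sorted_freqs)):
        if cum >= position:
            return value


print(discrete_median(incomes, persons))  # 1500
```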
E. Calculation of Median - Continuous Series:
When we are given a continuous series, we have to calculate cumulative frequencies and
determine the particular class in which the value of the median lies. In a continuous series we have to
consider the $N/2$th item instead of the $(N+1)/2$th item. The following formula can be used to determine the exact
value of the median.

Median = $l + \dfrac{N/2 - m}{f} \times C$, where

$l$ = lower limit of the median class
$m$ = cumulative frequency of the class preceding the median class
$f$ = simple frequency of the median class, $C$ = class interval
Note that if we are given inclusive type of class intervals, we have to convert them into exclusive type by adding 0.5 to
the upper limit and subtracting 0.5 from the lower limit. Also, when the class intervals are unequal,
the frequencies need not be adjusted to make the class intervals equal and the same formula can be
applied.
Illustration 10: Calculate median for the following frequency distribution.
Marks             45-50  40-45  35-40  30-35  25-30  20-25  15-20  10-15  5-10
No. of Students     10     15     26     30     42     31     24     15     7

Solution: We have to first arrange the data in ascending order and find the cumulative frequencies.
Marks     No. of Students (f)   Cumulative frequency
5-10              7                    7
10-15            15                   22
15-20            24                   46
20-25            31                   77
25-30            42                  119    Median class
30-35            30                  149
35-40            26                  175
40-45            15                  190
45-50            10                  200

Median = size of the $\dfrac{N}{2}$th item = $\dfrac{200}{2}$ = 100th item

The median lies in the class 25-30.

$l = 25$, $m = 77$, $f = 42$, $C = 5$

Median = $25 + \dfrac{100 - 77}{42} \times 5 = 25 + 2.74 = 27.74$

Median of marks = 27.74
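The interpolation in Illustration 10 can be reproduced with a short Python sketch. The function `grouped_median` and the (lower, upper) class representation are assumptions made for this example; the classes must already be in ascending order.

```python
# Classes (ascending) and frequencies from Illustration 10
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
           (30, 35), (35, 40), (40, 45), (45, 50)]
freqs = [7, 15, 24, 31, 42, 30, 26, 15, 10]


def grouped_median(classes, freqs):
    """Median = l + (N/2 - m) / f * C for the class holding the N/2 th item."""
    half = sum(freqs) / 2
    cum = 0  # m: cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= half:
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("no classes supplied")


print(round(grouped_median(classes, freqs), 2))  # 27.74
```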

6.3.2 Mode:

The mode or the modal value is that value in a series of observations which occurs with
the highest frequency. The mode is often said to be that value which occurs most often in the
data. Mode is that value about which the items are most closely concentrated. It is the value
which has the greatest frequency density in its immediate neighbourhood. For this reason mode
is also called the most typical or fashionable value of a distribution. A set of data may have a single
mode, in which case it is said to be unimodal, or it may have two modes, which makes it bimodal. If
the data have several modes, they are called multimodal.

There are many situations in which arithmetic mean and median fail to reveal the true
characteristic of data. For example, when we talk of most common wage, most common income,
most common height, most common size of shoe or ready made garments, we have in mind
mode and not the arithmetic mean or median. The shortcomings in mean and median may be
overcome by the use of mode which refers to the value which occurs most frequently in a
distribution.

A. Merits of Mode:

The following are the merits of mode when compared to other measures of central tendency.

1. By definition, mode is the most typical or representative value of a distribution. Hence
when we talk of modal wage, modal size of shoe or modal size of family, it is this average
that we refer to. If the modal wage in a factory is Rs. 950, then more workers receive
Rs. 950 than any other wage.

2. Like median, the mode is not unduly affected by extreme values. Even if the high values
are very high and the low values are very low, we choose the most frequent value of the
data to be the modal value.

3. Its value can be determined in open-end distributions without ascertaining the class limits.

4. It can be used to describe qualitative phenomena. For example, if we want to compare
the consumer preferences for different types of products, say, soap, toothpaste etc., or
different media of advertising, we should compare the modal preferences expressed by
different groups of people.

5. The value of mode can also be determined graphically whereas the value of mean cannot
be graphically ascertained.

B. Limitations of Mode:

The important limitations of this average are:

1. The value of mode cannot always be determined. In some cases we may have a
bimodal series.

2. It is not capable of algebraic manipulations. For example, from the modes of
two sets of data we cannot calculate the overall mode of the combined data.

3. The value of mode is not based on each and every item of the series.

4. It is not a rigidly defined measure. There are several formulae for calculating the
mode, all of which usually give somewhat different answers.

5. While dealing with quantitative data, the disadvantages of the mode outweigh its
good features and hence it is used rarely.

C. Calculation of Mode - Individual Observations:

For determining the mode, count the number of times the various values repeat themselves
and the value occurring maximum number of times will be the mode.

Illustration 11 :Calculate mode from the following data of marks obtained by 10 students.

Marks: 10, 27, 24, 12, 27, 27, 20, 18, 15 and 30.

Solution: We observe that the item 27 occurs 3 times and all the remaining items only once. Since
the item 27 occurs the maximum number of times, i.e., 3, the mode of the marks is 27.

D. Calculation of Mode - Discrete Series:

In discrete series, quite often mode can be determined just by inspection, i.e., by looking
to that value of the variable around which the items are most heavily concentrated. However,
when the mode is determined just by inspection, an error of judgement is possible. There are
certain cases where the difference between the maximum frequency and the frequency preceding
it or succeeding it is very small and the items are heavily concentrated on either side. In such
cases, it is desirable to prepare a grouping table and an analysis table. These tables help us in
ascertaining the modal class.

A grouping table has 6 columns. In column 1, the maximum frequency is marked or put
in a circle. In column 2, frequencies are grouped in twos. In column 3, leave the first frequency
and then group the remaining in twos. In column 4, group the frequencies in threes. In column 5,
leave the first frequency and group the remaining in threes. In column 6, leave the first two
frequencies and then group the remaining in threes. In each of these cases, take the maximum
total and mark it in a circle or bold type.
After preparing the grouping table, prepare an analysis table. While preparing this table, put
column number on the left hand side and the various probable values of mode on the right hand
side. The values against which frequencies are the highest are marked in the grouping table and
then entered by means of a bar in the relevant box corresponding to the values they represent.
The procedure of preparing the grouping table and the analysis table shall be clear from the following
illustration.
Illustration 12: Calculate the value of mode for the following data.

Marks 10 15 20 25 30 35 40
No. of Students 8 12 36 35 28 18 9
Solution: Since it is difficult to say by inspection which of 20, 25 or 30 is the modal value, we
prepare the grouping and analysis tables.
Grouping Table
(each group total is written against the first value of its group)
Marks (X)   I (f)    II    III    IV     V     VI
10            8      20           56
15           12            48           83
20           36      71                        99
25           35            63     81
30           28      46                 55
35           18            27
40            9

Analysis Table

Column No.    20    25    30
I              1
II             1     1
III                  1     1
IV                   1     1
V              1     1
VI             1     1     1
Total          4     5     3
Corresponding to the maximum total 5, the value of the variable is 25. Hence the modal value is 25.
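The grouping and analysis tables of Illustration 12 can be imitated in a Python sketch. The function `grouping_mode` is a hypothetical helper written for this example; it assumes the series has at least five values so that every column contains a complete group, and it counts every value in a maximum group, including 15 and 35, which the analysis table above leaves out.

```python
# Marks and number of students from Illustration 12
marks = [10, 15, 20, 25, 30, 35, 40]
students = [8, 12, 36, 35, 28, 18, 9]


def grouping_mode(values, freqs):
    """Return the modal value and the analysis-table totals for each value."""
    n = len(freqs)
    votes = [0] * n
    # (group size, number of leading frequencies left out) for columns I to VI
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        groups = [range(i, i + size) for i in range(skip, n - size + 1, size)]
        totals = [sum(freqs[j] for j in g) for g in groups]
        best = max(totals)
        for g, total in zip(groups, totals):
            if total == best:
                for j in g:          # every value in the maximum group gets a bar
                    votes[j] += 1
    return values[votes.index(max(votes))], votes


mode, votes = grouping_mode(marks, students)
print(mode)   # 25
print(votes)  # [0, 1, 4, 5, 3, 1, 0]
```

The totals 4, 5 and 3 for the values 20, 25 and 30 agree with the analysis table.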
E. Calculation of Mode - Continuous Series:
When we are given a continuous series, first of all we have to identify the modal class
either by inspection or by preparing grouping and analysis tables. Then we have to apply the
following formula.

Mode = $l + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times C$, where

$l$ = lower limit of the modal class

$\Delta_1$ = difference between the frequency of the modal class and the preceding class (ignoring
signs)

$\Delta_2$ = difference between the frequency of the modal class and the succeeding class (ignoring
signs)
$C$ = class interval of the modal class
While applying the above formula for calculating mode, it is necessary to see that the
class intervals are equal. If they are unequal, they should first be made equal on the assumption
that the frequencies are equally distributed throughout the class, otherwise we will get misleading
results.
Illustration 13: Calculate mode from the following data.

Marks No. of Marks No. of
Students Students
0-10 3 50-60 15
10-20 5 60-70 12
20-30 7 70-80 6
30-40 10 80-90 2
40-50 12 90-100 8
Solution: By inspection, the modal class is 50-60.

$\Delta_1 = 15 - 12 = 3$, $\Delta_2 = 15 - 12 = 3$, $C = 10$, $l = 50$

Mode = $l + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times C = 50 + \dfrac{3}{3 + 3} \times 10 = 50 + 5 = 55$
∴ Mode of the marks = 55
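The formula applied in Illustration 13 can be sketched in Python. The function `grouped_mode` is a hypothetical name; it picks the modal class by inspection (highest frequency), assumes equal class intervals, and assumes the modal class is not equal in frequency to both of its neighbours, since the denominator would then be zero.

```python
# Classes and frequencies from Illustration 13
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]


def grouped_mode(classes, freqs):
    """Mode = l + d1 / (d1 + d2) * C, modal class chosen by inspection."""
    k = freqs.index(max(freqs))                       # position of the modal class
    lower, upper = classes[k]
    before = freqs[k - 1] if k > 0 else 0             # preceding class frequency
    after = freqs[k + 1] if k < len(freqs) - 1 else 0  # succeeding class frequency
    d1 = abs(freqs[k] - before)
    d2 = abs(freqs[k] - after)
    return lower + d1 / (d1 + d2) * (upper - lower)


print(grouped_mode(classes, freqs))  # 55.0
```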
6.3.3 Geometric Mean:
Geometric mean is defined as the Nth root of the product of N items. If there are two
items, we have to take the square root; if there are three items, the cube root; and so on.
Symbolically

$GM = \sqrt[N]{X_1 \cdot X_2 \cdots X_N}$

When the number of items is three or more, the task of multiplying the numbers and then
calculating the Nth root becomes excessively difficult. Logarithms are used to simplify the
calculations. Geometric mean is then calculated as follows.

$GM = \text{Antilog}\left(\dfrac{\sum \log X}{N}\right)$
Illustration 14: Daily income of ten families of a particular place is given below:
Daily income (Rs.): 85, 70, 15, 75, 500, 8, 45, 250, 40, 36.
Calculate the geometric mean.

Solution:
X      Log X       X      Log X
85     1.9294      8      0.9031
70     1.8451      45     1.6532
15     1.1761      250    2.3979
75     1.8751      40     1.6021
500    2.6990      36     1.5563
                   Total  17.6373

$GM = \text{Antilog}\left(\dfrac{\sum \log X}{N}\right) = \text{Antilog}\left(\dfrac{17.6373}{10}\right) = \text{Antilog}(1.7637) = 58.03$
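The logarithmic route of Illustration 14 can be sketched in Python with base-10 logarithms. The function name `geometric_mean` is chosen for this example; because the code uses unrounded logarithms, its result may differ from the four-figure table value in the second decimal place.

```python
import math

# Daily incomes (Rs.) from Illustration 14
incomes = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]


def geometric_mean(values):
    """Antilog of the mean of the base-10 logarithms of the items."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


print(round(geometric_mean(incomes), 1))  # 58.0
```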
6.3.4 Harmonic Mean
The harmonic mean is based on the reciprocals of the numbers averaged. It is defined as the
reciprocal of the arithmetic mean of the reciprocals of the individual observations. Thus harmonic
mean can be obtained as follows.

$H.M. = \dfrac{N}{\sum (1/X)}$

Illustration 15: The monthly incomes of 8 employees are given below: Monthly income (Rs.):
1250, 1190, 1340, 1160, 1090, 1150, 1210, 1400
Find the harmonic mean
Solution:

X       1/X         X       1/X
1250    0.0008      1090    0.0009
1190    0.0008      1150    0.0008
1340    0.0007      1210    0.0008
1160    0.0009      1400    0.0007
                    Total   0.0064

$H.M. = \dfrac{N}{\sum (1/X)} = \dfrac{8}{0.0064} = 1250$

6.4 Summary :
A single value that describes the characteristics of the entire mass of data is called an
average or a measure of central tendency. An average should possess certain characteristics in
order to be a good average. The most commonly used measures of central tendency are: arithmetic
mean, median, mode, geometric mean and harmonic mean. Each average can be obtained for
different types of data, viz., individual series, discrete series and continuous series. There are
certain merits and limitations for each average when compared to the other averages.
6.4 Unit end Questions :
1. Why are averages called measures of central tendency? What are the requisites
of a good average?
2. What is an arithmetic mean? Explain its merits and limitations. Describe various
methods of obtaining arithmetic mean.
3. Explain the merits and limitations of median.
4. Explain the concept of mode. How can you obtain the mode if it cannot be
identified by inspection?
6.5 Readings :
1. S.P.Gupta: Statistical Methods.
2. D.C.Sancheti & V.K.Kapoor: Statistics -Theory, Methods and Applications.
3. K.V.Rao : Research Methodology in Commerce and Management.

GUIDELINE - 7 :
MEASURES OF DISPERSION-I

Structure
7.0 Objective
7.1 Quartile Deviation
7.1.1 Coefficient of Quartile Deviation
7.2 Individual Observation
7.3 Discrete Series
7.4 Merits and Limitations of Quartile Deviation
7.5 The mean Deviation
7.6 Computation of Mean Deviation - Individual Observation
7.7 Discrete Series
7.8 Continuous Series
7.9 Merits and Limitations of Mean Deviation

7.0 Objective:
In the previous lessons, you have studied the measures of central tendency. To judge how
representative a central value is, we have to study how the given observations are spread
around it, i.e., whether the values are close to the central value or
whether they are widely scattered to the left of the central value, to the right of it, or in
both directions. Measures of dispersion are the statistical techniques used to describe this. In
this lesson, the methods of Quartile Deviation and Mean Deviation are discussed.
7.1 Quartile Deviation:

Quartile Deviation or Q.D. = $\dfrac{Q_3 - Q_1}{2}$
Quartile Deviation gives the average amount by which the two quartiles differ from the
median. In a symmetrical distribution the two quartiles $Q_1$ and $Q_3$ are equidistant from the
median, i.e., Med $- Q_1 = Q_3 -$ Med, and as such the difference can be taken as a measure of
dispersion. In such a distribution the range median ± Q.D. covers exactly 50 percent of the observations.

When quartile deviation is very small, it describes high uniformity or small variation of
the central 50% of the items, and a high quartile deviation means that the variation among the central
items is large.

Quartile deviation is an absolute measure of dispersion. The relative measure corresponding
to this measure, called the coefficient of quartile deviation, is calculated as follows.

7.1.1 Coefficient of Quartile Deviation :

Coefficient of Q.D. = $\dfrac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Coefficient of quartile deviation can be used to compare the degree of variation in different
distributions.

Computation of Quartile Deviation.

The process of computing quartile deviation is very simple; we have just to compute the
values of the upper and lower quartiles.

7.2 Individual Observation:

Find out the value of quartile deviation and its coefficient from the following data.

Roll No: 1 2 3 4 5 6 7

Marks : 20 28 40 12 30 15 50

Solution:

Marks arranged in ascending order:

12 15 20 28 30 40 50

N 1 7 1
Q1  Size of th iteam  Size of  2 nd iteam size of second
4 4
iteam is 15. Thus Q1 15

 N 1
Q3  Size of 3  th iteam
 4 

 3 8 
th iteam  6 item
th
Size of 
 4 

Size of 6th item is 40. Thus Q3 = 40.

Q3  Q1 40  15
QD    12.5
2 2

Q3  Q1 40  15 25
Coefficient of Q.D  Q  Q  40  15  55  0.455
3 1
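The positional rule used above can be sketched in Python. The function `quartile_deviation` is a hypothetical helper; it only handles the case of this example, where (N + 1) is divisible by 4 so that both quartiles fall exactly on an item, and it does no interpolation.

```python
# Marks of the seven students from section 7.2
marks = [20, 28, 40, 12, 30, 15, 50]


def quartile_deviation(values):
    """Q1, Q3, Q.D. and its coefficient by the (N+1)/4 positional rule."""
    data = sorted(values)
    n = len(data)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch needs (N + 1) to be divisible by 4")
    q1 = data[(n + 1) // 4 - 1]        # (N+1)/4 th item
    q3 = data[3 * (n + 1) // 4 - 1]    # 3(N+1)/4 th item
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return q1, q3, qd, coefficient


q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 3))  # 15 40 12.5 0.455
```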

7.3 Discrete Series :


Compute the coefficient of quartile deviation from the following data :
Marks   Frequency   C.f.
10          4         4
20          7        11
30         15        26
40          8        34
50          7        41
60          2        43

$Q_1$ = size of the $\dfrac{N+1}{4}$th item = $\dfrac{43+1}{4}$ = 11th item
Size of the 11th item is 20. Thus $Q_1 = 20$.

$Q_3$ = size of the $3\left(\dfrac{N+1}{4}\right)$th item = $\dfrac{3 \times 44}{4}$ = 33rd item

Size of the 33rd item is 40. Thus $Q_3 = 40$.

Q.D. = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = 0.333$

Calculate quartile deviation and the coefficient of quartile deviation from the following data:
Calculation of Q.D. and its Coefficient
Wages (Rs. per week)     f     C.f.
Less than 35            14      14
35-37                   62      76
38-40                   99     175
41-43                   18     193
Over 43                  7     200

Q.D. = $\dfrac{Q_3 - Q_1}{2}$

$Q_1$ = size of the $\dfrac{N}{4}$th item = $\dfrac{200}{4}$ = 50th item. $Q_1$ lies in the class 35-37.

$Q_1 = L + \dfrac{N/4 - C.f.}{f} \times i$

$L = 35$, $N/4 = 50$, $C.f. = 14$, $f = 62$, $i = 2$

$Q_1 = 35 + \dfrac{50 - 14}{62} \times 2 = 35 + 1.16 = 36.16$

$Q_3$ = size of the $\dfrac{3N}{4}$th item = $\dfrac{3 \times 200}{4}$ = 150th item

$Q_3$ lies in the class 38-40.

$Q_3 = L + \dfrac{3N/4 - C.f.}{f} \times i$

$L = 38$, $3N/4 = 150$, $C.f. = 76$, $f = 99$, $i = 2$

$Q_3 = 38 + \dfrac{150 - 76}{99} \times 2 = 38 + 1.49 = 39.49$

Q.D. = $\dfrac{39.49 - 36.16}{2} = 1.67$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{39.49 - 36.16}{39.49 + 36.16} = \dfrac{3.33}{75.65} = 0.044$
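The two interpolations above follow one pattern, which the Python sketch below captures in a single hypothetical helper, `grouped_quantile`. The table rows use the same L and i values as the worked solution (lower limit and class interval 2), and the open-end classes carry `None` because their limits are not needed as long as no quartile falls in them.

```python
# (lower limit L, class interval i, frequency f); open-end classes carry None
wage_table = [(None, None, 14), (35, 2, 62), (38, 2, 99), (41, 2, 18), (None, None, 7)]


def grouped_quantile(table, fraction):
    """L + (fraction*N - C.f.) / f * i for the class holding the target item."""
    target = fraction * sum(f for _, _, f in table)
    cum = 0  # C.f. of the classes before the current one
    for lower, width, f in table:
        if cum + f >= target:
            if lower is None:
                raise ValueError("target item falls in an open-end class")
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("no classes supplied")


q1 = grouped_quantile(wage_table, 0.25)
q3 = grouped_quantile(wage_table, 0.75)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 3))
```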

7.4 Merits and Limitations :

In certain respects it is superior to range as a measure of dispersion.

It has a special utility in measuring variation in case of open-end distributions or those in
which the data may be ranked but not measured quantitatively.

It is also useful in erratic or badly skewed distributions, where the other measures of
dispersion would be warped by extreme values. The quartile deviation is not affected by the
presence of extreme values.

Limitations:

Quartile deviation ignores 50% of the items, i.e., the first 25% and the last 25%. As the value of
quartile deviation does not depend upon every item of the series, it cannot be regarded as a
good method of measuring dispersion.

* It is not capable of mathematical manipulation.

* Its value is very much affected by sampling fluctuations.

* Quartile deviation is not itself measured from an average, but is a positional average.

Because of the above limitations quartile deviation is not often useful for statistical inference.
7.5 The mean Deviation :

The mean deviation is also known as the average deviation. It is the average difference
between the items in a distribution and the median or mean of that series. Theoretically there is an
advantage in taking the deviations from the median because the sum of deviations of items from the
median is minimum when signs are ignored. However, in practice, the arithmetic mean is more
frequently used in calculating the value of average deviation and this is the reason why it is more
commonly called mean deviation. In any case, the average used must be clearly stated in a given
problem so that any possible confusion in meaning is avoided.

7.6 Computation of mean Deviation - Individual Observations :

If $x_1, x_2, x_3, \ldots, x_N$ are N given observations then the mean deviation about an average A is given
by

M.D. = $\dfrac{1}{N} \sum |x - A| = \dfrac{1}{N} \sum |D| = \dfrac{\sum |D|}{N}$

where $|D| = |x - A|$, read as mod (x - A), is the modulus value or absolute value of the
deviation ignoring plus and minus signs.

Steps :

* Compute the median of the series.

* Take the deviations of the items from the median ignoring ± signs and denote these deviations by |D|.

* Obtain the total of these deviations, i.e., $\sum |D|$.

* Divide the total obtained in the third step by the number of observations.

If a distribution is normal, the mean ± mean deviation is the range that will include 57.5
percent of the items in the series. If it is moderately skewed, then we may expect approximately
57.5 percent of the items to fall within this range. Hence, if average deviation is small, the
distribution is highly compact or uniform, since more than half of the cases are concentrated within
a small range around the mean.

The relative measure corresponding to the mean deviation, called the coefficient of mean
deviation, is obtained by dividing mean deviation by the particular average used in computing
mean deviation. Thus, if mean deviation has been computed from the median, the coefficient of
mean deviation shall be obtained by dividing mean deviation by the median.

Coefficient of M.D. = $\dfrac{M.D.}{\text{Median}}$

If mean has been used while calculating the value of mean deviation, in such a case
coefficient of mean deviation shall be obtained by dividing mean deviation by the mean.

Problem:

Calculate the mean deviation and its coefficient of the two income groups of five and
seven members given below:

I (Rs) : 4,000 4,200 4,400 4,600 4,800

II (Rs) : 3,000 4,000 4,200 4,400 4,600 4,800 5,800

Solution:

Calculation of Mean Deviation

Group I                                Group II
Income    Deviation from               Income    Deviation from
          median 4,400 |D|                       median 4,400 |D|
4,000         400                      3,000       1,400
4,200         200                      4,000         400
4,400           0                      4,200         200
4,600         200                      4,400           0
4,800         400                      4,600         200
                                       4,800         400
                                       5,800       1,400
N = 5     $\sum |D| = 1200$            N = 7     $\sum |D| = 4000$

Mean Deviation : Group I

M.D. = $\dfrac{\sum |D|}{N}$, where |D| = deviation from median ignoring signs

Median = size of the $\dfrac{N+1}{2}$th item = $\dfrac{5+1}{2}$ = 3rd item. Size of the 3rd item is 4,400.

M.D. = $\dfrac{1200}{5} = 240$
This means that the average deviation of the individual incomes from the median income
is Rs. 240.
Mean Deviation : Group II

Median = size of the $\dfrac{N+1}{2}$th item = $\dfrac{7+1}{2}$ = 4th item. Size of the 4th item is 4,400.

$\sum |D| = 4000$, N = 7

M.D. = $\dfrac{4000}{7} = 571.43$
Note : If we were to compute the coefficient of mean deviation, we would divide the mean deviation by the
median. Thus for the first group,

Coefficient of M.D. = $\dfrac{240}{4400} \approx 0.055$

and for the second group

Coefficient of M.D. = $\dfrac{571.43}{4400} = 0.130$
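The calculation for both income groups can be sketched in Python, taking deviations from the median as in the solution. The function name `mean_deviation` is an illustrative choice; `statistics.median` is the standard-library median.

```python
from statistics import median

# Incomes (Rs.) of the two groups from the problem above
group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]


def mean_deviation(values):
    """Mean deviation about the median and its coefficient."""
    med = median(values)
    md = sum(abs(x - med) for x in values) / len(values)
    return md, md / med


md_1, coef_1 = mean_deviation(group_1)
md_2, coef_2 = mean_deviation(group_2)
print(md_1, round(coef_1, 4))            # 240.0 0.0545
print(round(md_2, 2), round(coef_2, 3))  # 571.43 0.13
```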

7.7 Discrete Series :


In discrete series the formula for calculating mean deviation is

M.D. = $\dfrac{\sum f|D|}{N}$ (by the same logic as given before)

|D| denotes deviation from median ignoring signs.

Steps :

* Calculate the median of the series.

* Take the deviations of the items from the median ignoring signs and denote them by |D|.

* Multiply these deviations by the respective frequencies and obtain the total $\sum f|D|$.

* Divide the total obtained in the third step by the number of observations.

* This gives us the value of mean deviation.

Problem:

Calculate mean deviation from the following series:

x 10 11 12 13 14

f 3 12 18 12 3

Solution:

Calculation of Mean Deviation

x      f     |D|    f|D|    C.f.
10     3      2       6       3
11    12      1      12      15
12    18      0       0      33
13    12      1      12      45
14     3      2       6      48
Total 48             36

M.D. = $\dfrac{\sum f|D|}{N}$

Median = size of the $\dfrac{N+1}{2}$th item = $\dfrac{48+1}{2}$ = 24.5th item

Size of the 24.5th item is 12, hence median = 12.

M.D. = $\dfrac{36}{48} = 0.75$
48
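The steps for a discrete series can be sketched in code as follows. The helper names `weighted_median` and `mean_deviation_discrete` are illustrative, and the median rule is a simplification: it returns the first value whose cumulative frequency reaches the (N+1)/2-th item, which matches the text's procedure when that item falls inside a single value, as it does here.

```python
def weighted_median(values, freqs):
    """Value that holds the (N+1)/2-th item of a discrete series."""
    position = (sum(freqs) + 1) / 2
    cumulative = 0
    for value, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= position:
            return value

def mean_deviation_discrete(values, freqs):
    """Sum of f|D| divided by N, with D measured from the median."""
    med = weighted_median(values, freqs)
    total = sum(f * abs(v - med) for v, f in zip(values, freqs))
    return total / sum(freqs)

x = [10, 11, 12, 13, 14]
f = [3, 12, 18, 12, 3]
md = mean_deviation_discrete(x, f)
print(weighted_median(x, f), md)
```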
7.8 Calculation of Mean Deviation - Continuous Series:
For calculating mean deviation in a continuous series the procedure remains the same as discussed above. The only difference is that here we have to obtain the mid-points of the various classes and take the deviations of these points from the median.
Problem:
Calculate the mean deviation and its coefficient from the following data:

Class    f     c.f.   M.P. (m)   |D| = |m - 43|   f|D|
0-10     5     5      5          38               190
10-20    8     13     15         28               224
20-30    12    25     25         18               216
30-40    15    40     35         8                120
40-50    20    60     45         2                40
50-60    14    74     55         12               168
60-70    12    86     65         22               264
70-80    6     92     75         32               192
         N = 92                                   Σf|D| = 1414

Median = size of the $\frac{N}{2}$th item = $\frac{92}{2}$ = 46th item.

The median lies in the class 40-50.

$$\text{Med} = L + \frac{N/2 - c.f.}{f} \times i$$

L = 40, N/2 = 46, c.f. = 40, f = 20, i = 10

$$\text{Med} = 40 + \frac{46 - 40}{20} \times 10 = 40 + 3 = 43$$

$$\text{M.D.} = \frac{\sum f|D|}{N} = \frac{1414}{92} = 15.37$$

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\text{Median}} = \frac{15.37}{43} = 0.357$$
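For a continuous series the same procedure can be sketched as below, first interpolating the median inside its class and then averaging the absolute deviations of the class mid-points. The function names and the `(lower, upper, f)` tuple layout are assumptions made for this illustration.

```python
# Each class is (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 15),
           (40, 50, 20), (50, 60, 14), (60, 70, 12), (70, 80, 6)]

def grouped_median(classes):
    """Med = L + (N/2 - c.f.) / f * i, interpolated inside the median class."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f

def grouped_mean_deviation(classes):
    """Mean deviation of class mid-points about the interpolated median."""
    med = grouped_median(classes)
    n = sum(f for _, _, f in classes)
    total = sum(f * abs((lower + upper) / 2 - med) for lower, upper, f in classes)
    return total / n, med

md, med = grouped_mean_deviation(classes)
print(med, round(md, 2), round(md / med, 3))
```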

7.9 Merits and Limitations:

Merits:

The outstanding advantage of the average deviation is its relative simplicity. It is simple to understand and easy to compute. Anyone familiar with the concept of the average can readily appreciate the meaning of the average deviation. If a situation requires a measure of dispersion that will be presented to the general public or to a group not very familiar with statistics, the average deviation is useful.

* It is based on each and every item of the data. Consequently, a change in the value of any item would change the value of the mean deviation.

* Since deviations are taken from a central value, comparisons about the formation of different distributions can easily be made.

Limitations:

* The greatest drawback of this method is that algebraic signs are ignored while taking the deviations of the items.

* This method may not give us very accurate results.

* It is not capable of further algebraic treatment.

* It is rarely used in sociological studies.

Because of these limitations its use is limited, and it is overshadowed as a measure of variation by the superior standard deviation.

GUIDELINE-8 :
MEASURES OF DISPERSION-II

Structure
8.0 Objective
8.1 The Standard Deviation
8.1.1 Calculation of Standard Deviation
8.2 Calculation of Standard Deviation in Discrete Series
8.3 Calculation of Standard Deviation in Continuous Series
8.4 Coefficient of Variation
8.0 Objective
Standard deviation is another measure of dispersion. It is an important measure because, while calculating the mean deviation from the arithmetic mean, we have to take the modulus of the deviations; otherwise the sum of the deviations from the arithmetic mean would be zero. To obviate this problem, the standard deviation is used, in which the deviations are squared. Further, the standard deviation is useful in the computation of the coefficient of variation, the coefficient of skewness, etc. This lesson provides a detailed procedure for computing the standard deviation and, from it, the coefficient of variation.
8.1 The Standard Deviation
The concept of standard deviation was introduced by Karl Pearson in 1893. It is by far the most important and widely used measure of dispersion. Its significance lies in the fact that it is free from those defects from which the earlier methods suffer and satisfies most of the properties of a good measure of dispersion. Standard deviation is also known as root mean square deviation, because it is the square root of the mean of the squared deviations from the arithmetic mean. Standard deviation is denoted by the small Greek letter σ (read as sigma).
8.1.1 Calculation of Standard Deviation : Individual Observation:
In the case of individual observations, the standard deviation may be computed by applying either of the following two methods.

Deviations taken from Actual Mean:
When deviations are taken from the actual mean, the following formula is applied:

$$\sigma = \sqrt{\frac{\sum x^2}{N}}, \qquad x = (X - \bar{X})$$

Steps:
* Calculate the actual mean of the series, i.e., $\bar{X}$.
* Take the deviations of the items from the mean, i.e., find $(X - \bar{X})$. Denote these deviations by x.
* Square these deviations and obtain the total $\sum x^2$.
* Divide $\sum x^2$ by the total number of observations, i.e., N, and extract the square root. This gives the value of the standard deviation.
Deviations taken from Assumed Mean :
When the actual mean is in fractions, say 123.674, it would be too cumbersome to take deviations from it and then obtain the squares of these deviations. In such a case either the mean may be approximated, or else the deviations may be taken from an assumed mean and the necessary adjustment made in the value of the standard deviation. The former method of approximation is less accurate, and therefore in such a case deviations are invariably taken from an assumed mean.
When deviations are taken from an assumed mean, the following formula is applied:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

Steps:
* Take the deviations of the items from an assumed mean, i.e., obtain (X - A). Denote these deviations by d. Take the total of these deviations, i.e., obtain $\sum d$.
* Square these deviations and obtain the total $\sum d^2$.
* Substitute the values of $\sum d^2$, $\sum d$ and N in the above formula.

Problem: The blood serum cholesterol levels of 10 persons are as under. Calculate the standard deviation with the help of an assumed mean.

X       d = (X - 264)    d²
240     -24              576
260     -4               16
290     +26              676
245     -19              361
255     -9               81
288     +24              576
272     +8               64
263     -1               1
277     +13              169
251     -13              169
ΣX = 2641    Σd = +1     Σd² = 2689

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

$\sum d^2 = 2689$, $\sum d = 1$, $N = 10$

$$\sigma = \sqrt{\frac{2689}{10} - \left(\frac{1}{10}\right)^2} = \sqrt{268.9 - 0.01} = 16.398$$

8.2 Calculation of Standard Deviation - Discrete Series:

1. Actual Mean Method

2. Assumed Mean Method

3. Step Deviation Method

(A) Actual Mean Method:
When this method is applied, deviations are taken from the actual mean, i.e., we find $(X - \bar{X})$ and denote these deviations by x. These deviations are then squared and multiplied by the respective frequencies. The following formula is applied:

$$\sigma = \sqrt{\frac{\sum fx^2}{N}}, \quad \text{where } x = (X - \bar{X})$$

(B) Assumed Mean Method:

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}, \quad \text{where } d = (X - A)$$

Steps:
* Take the deviations of the items from an assumed mean and denote these deviations by d.
* Multiply these deviations by the respective frequencies and obtain the total $\sum fd$.
* Obtain the squares of the deviations, i.e., calculate $d^2$.
* Multiply the squared deviations by the respective frequencies and obtain the total $\sum fd^2$.
* Substitute the values in the above formula.
Problem:
Calculate the standard deviation from the data given below:

Size of item x    f     d = (x - 6.5)    fd      fd²
3.5               3     -3               -9      27
4.5               7     -2               -14     28
5.5               22    -1               -22     22
6.5               60    0                0       0
7.5               85    +1               +85     85
8.5               32    +2               +64     128
9.5               8     +3               +24     72
                  N = 217                Σfd = 128    Σfd² = 362

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$$

$\sum fd^2 = 362$, $\sum fd = 128$, $N = 217$

$$\sigma = \sqrt{\frac{362}{217} - \left(\frac{128}{217}\right)^2} = \sqrt{1.668 - 0.348} = 1.149$$

(C) Step Deviation Method:

When this method is used we take the deviations of the mid-points from an assumed mean and divide these deviations by the width of the class interval, i.e., "i". In case the class intervals are unequal, we divide the deviations of the mid-points by the lowest common factor and use "C" instead of "i" in the formula for calculating the standard deviation. The formula for calculating the standard deviation is:

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$$

where $d = \frac{x - A}{i}$ and i = class interval. The use of the above formula simplifies the calculations.

Problem:

The annual salaries of a group of employees are given in the following table. Calculate the standard deviation of the salaries.

Salaries x    No. of persons f    d = (x - 60)/5    fd      fd²
45            3                   -3                -9      27
50            5                   -2                -10     20
55            8                   -1                -8      8
60            7                   0                 0       0
65            9                   +1                +9      9
70            7                   +2                +14     28
75            4                   +3                +12     36
80            7                   +4                +28     112
              N = 50                                Σfd = 36    Σfd² = 240

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$$

$\sum fd^2 = 240$, $N = 50$, $\sum fd = 36$, $i = 5$

$$\sigma = \sqrt{\frac{240}{50} - \left(\frac{36}{50}\right)^2} \times 5 = \sqrt{4.8 - 0.5184} \times 5$$

$$\sigma = 10.35$$
8.3 Calculation of Standard Deviation - Continuous Series:

The formula is

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$$

where $d = \frac{m - A}{i}$ and i = class interval.

Steps:

* Find the mid-points of the various classes.

* Take the deviations of these mid-points from an assumed mean and denote these deviations by d.

* Wherever possible, take a common factor and denote this column by d.

* Multiply the frequencies of each class by these deviations and obtain $\sum fd$.

* Square the deviations, multiply them by the respective frequencies of each class and obtain $\sum fd^2$.

Problem:

Find the standard deviation from the following data.

Age      f     M.P. m    d = (m - 35)/10    fd      fd²
0-10     15    5         -3                 -45     135
10-20    15    15        -2                 -30     60
20-30    23    25        -1                 -23     23
30-40    22    35        0                  0       0
40-50    25    45        +1                 +25     25
50-60    10    55        +2                 +20     40
60-70    5     65        +3                 +15     45
70-80    10    75        +4                 +40     160
         N = 125                            Σfd = 2    Σfd² = 488

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{488}{125} - \left(\frac{2}{125}\right)^2} \times 10$$

$$= \sqrt{3.904 - 0.0003} \times 10 = 1.976 \times 10 = 19.76$$

8.4 Coefficient of Variation:

This measure, developed by Karl Pearson, is the most commonly used measure of relative variation. It is used in problems where we want to compare the variability of two or more series. The series (or group) for which the coefficient of variation is greater is said to be more variable or, conversely, less consistent, less uniform, less stable or less homogeneous. On the other hand, the series for which the coefficient of variation is smaller is said to be less variable or more consistent, more uniform, more stable or more homogeneous.

The coefficient of variation is denoted by C.V. and is obtained as follows:

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$

Problem:

The following table shows the monthly expenditure of 80 students of a university on morning breakfast. Calculate the coefficient of variation.

Expenditure (Rs.)    M.P. m    f     d = (m - 60)/5    fd      fd²
78-82                80        2     +4                +8      32
73-77                75        6     +3                +18     54
68-72                70        7     +2                +14     28
63-67                65        12    +1                +12     12
58-62                60        18    0                 0       0
53-57                55        13    -1                -13     13
48-52                50        9     -2                -18     36
43-47                45        7     -3                -21     63
38-42                40        4     -4                -16     64
33-37                35        2     -5                -10     50
                               N = 80                  Σfd = -26    Σfd² = 352

$$\bar{X} = A + \frac{\sum fd}{N} \times i = 60 + \frac{-26}{80} \times 5 = 60 - 1.625 = 58.375$$

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{352}{80} - \left(\frac{-26}{80}\right)^2} \times 5$$

$$= \sqrt{4.4 - 0.106} \times 5 = 2.072 \times 5 = 10.36$$

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{10.36}{58.375} \times 100 = 17.75\%$$
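The breakfast-expenditure calculation can be reproduced with a few lines of code. This is a sketch under the same assumptions as the worked solution (assumed mean A = 60, class width i = 5); the variable names are illustrative only.

```python
from math import sqrt

# (mid-point m, frequency f) for each expenditure class
data = [(80, 2), (75, 6), (70, 7), (65, 12), (60, 18),
        (55, 13), (50, 9), (45, 7), (40, 4), (35, 2)]
A, i = 60, 5  # assumed mean and class width

n = sum(f for _, f in data)
sum_fd = sum(f * (m - A) / i for m, f in data)           # sum of f*d
sum_fd2 = sum(f * ((m - A) / i) ** 2 for m, f in data)   # sum of f*d^2

mean = A + sum_fd / n * i
sigma = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * i
cv = sigma / mean * 100
print(round(mean, 3), round(sigma, 2), round(cv, 2))
```

Note that the total of the fd column is negative here, which is why the mean falls below the assumed mean of 60.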
Problem:
From the prices of shares X and Y given below, find out which is more stable.

X      x = (X - 51)    x²      Y      y = (Y - 105)    y²
35     -16             256     108    +3               9
54     +3              9       107    +2               4
52     +1              1       105    0                0
53     +2              4       105    0                0
56     +5              25      106    +1               1
58     +7              49      107    +2               4
52     +1              1       104    -1               1
50     -1              1       103    -2               4
51     0               0       104    -1               1
49     -2              4       101    -4               16
ΣX = 510   Σx = 0    Σx² = 350    ΣY = 1050   Σy = 0    Σy² = 40

Coefficient of Variation X:

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$

$$\bar{X} = \frac{\sum X}{N} = \frac{510}{10} = 51$$

$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{350}{10}} = 5.916$$

$$\text{C.V.} = \frac{5.916}{51} \times 100 = 11.6$$

Coefficient of Variation Y:

$$\text{C.V.} = \frac{\sigma}{\bar{Y}} \times 100$$

$$\bar{Y} = \frac{\sum Y}{N} = \frac{1050}{10} = 105$$

$$\sigma = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{40}{10}} = 2$$

$$\text{C.V.} = \frac{2}{105} \times 100 = 1.905$$

Since the coefficient of variation is much less in the case of shares Y, they are more stable in value.
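The comparison of the two share series can be sketched with the standard library's population standard deviation, `statistics.pstdev`, which divides by N as the text does. One assumption is made about the data: the fifth price of Y is taken as 106, the value consistent with its printed deviation of +1 and with the total ΣY = 1050.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) * 100."""
    return pstdev(values) / mean(values) * 100

shares_x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
shares_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(shares_x)
cv_y = coefficient_of_variation(shares_y)
# The series with the smaller C.V. is the more stable one
more_stable = "Y" if cv_y < cv_x else "X"
print(round(cv_x, 1), round(cv_y, 3), more_stable)
```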

GUIDELINE - 9 :
SOLVED PROBLEMS IN MEASURES OF CENTRAL
TENDENCY, MEASURES OF DISPERSION

Structure

9.0 Objectives

9.1 Introduction

9.2 Measures of Central Tendency

9.3 Measures of Dispersion

9.4 Suggested Readings

9.0. Objectives

a) To present the solved problems in various measures of central tendency.

b) To acquaint the student with the problems in measures of dispersion.

9.1. Introduction

Arithmetic mean, median, mode, geometric mean and harmonic mean constitute the measures of central tendency. Quartile deviation, mean deviation and standard deviation constitute the measures of dispersion. Pearson's coefficient of correlation and Spearman's rank correlation are the most commonly used measures of correlation. The two regression equations, viz., the regression equation of y on x and the regression equation of x on y, are applied for the purpose of estimation in regression analysis.

9.2. Measures of Central Tendency

Various problems on calculation of arithmetic mean, median, mode, geometric mean and
harmonic mean are presented in the following subsections.

9.2.1. Arithmetic Mean

Ex.1: The weekly wages of 10 workers are given below.

Wages (Rs): 550, 610, 575, 625, 715, 815, 760, 590, 630 and 750. Find the arithmetic mean of the wages.

Sol: The given data are ungrouped and we can find the mean by the direct method.

Number of items N = 10, ΣX = 6620

$$\bar{X} = \frac{\sum X}{N} = \frac{6620}{10} = 662$$

∴ A.M. of wages is $\bar{X}$ = Rs. 662.

Ex.2: Calculate the arithmetic mean for the following data.

X    64   63   62   61   60   59

f     8   18   12    9    7    6

Sol: The given table is a discrete series. We can apply the short cut method.

Let us take the assumed mean A as 62.

X     f     d = X - 62    fd
64    8     2             16
63    18    1             18
62    12    0             0
61    9     -1            -9
60    7     -2            -14
59    6     -3            -18
      60                  -7

N = 60, Σfd = -7

$$\bar{X} = A + \frac{\sum fd}{N} = 62 + \frac{-7}{60} = 62 - 0.1167 = 61.8833$$

∴ Arithmetic mean is $\bar{X}$ = 61.8833.
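The short cut method of Ex.2 can be sketched as a small function. The name `shortcut_mean` is illustrative, and the frequency of 64 is taken as 8, as in the solution table, so that N = 60. The last line illustrates that the choice of assumed mean does not change the result.

```python
def shortcut_mean(values, freqs, assumed_mean):
    """Mean = A + (sum of f*d) / N, with d = X - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + sum_fd / n

x = [64, 63, 62, 61, 60, 59]
f = [8, 18, 12, 9, 7, 6]

mean_a62 = shortcut_mean(x, f, assumed_mean=62)
mean_a0 = shortcut_mean(x, f, assumed_mean=0)  # A = 0 reduces to the direct method
print(round(mean_a62, 4), round(mean_a0, 4))
```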

Ex.3: Following is the distribution of 400 persons according to different income groups.

Income (Rs. 000)    No. of persons    Income (Rs. 000)    No. of persons
0-2                 81                10-25               27
2-3                 103               25-50               6
3-5                 115               50-75               2
5-10                64                75-100              2

Calculate the arithmetic mean of the incomes.
Sol: The given data is a continuous series. We have to apply the short cut method only; we cannot apply the step deviation method because the class intervals are not equal.
Let us take the assumed mean as A = 7.5.

Income (Rs. 000)    f      Mid value X    d = X - 7.5    fd
0-2                 81     1              -6.5           -526.5
2-3                 103    2.5            -5.0           -515.0
3-5                 115    4              -3.5           -402.5
5-10                64     7.5            0              0
10-25               27     17.5           10             270
25-50               6      37.5           30             180
50-75               2      62.5           55             110
75-100              2      87.5           80             160
                    400                                  -724

N = 400; Σfd = -724

$$\bar{X} = A + \frac{\sum fd}{N} = 7.5 + \frac{-724}{400} = 7.5 - 1.81 = 5.69$$

∴ Average income = Rs. 5,690.
9.2.2. Median:
Ex.4: Determine the median from the following data: 25, 20, 15, 45, 18, 7, 10, 38, 12.
Sol: Arranging the data in ascending order: 7, 10, 12, 15, 18, 20, 25, 38, 45.

Median = size of the $\frac{N+1}{2}$th item = size of the $\frac{9+1}{2}$th item = size of the 5th item = 18.
Ex.5: The following data give the number of members in the family for 60 families. Calculate the median.

No. of members    No. of families    No. of members    No. of families
1                 1                  7                 9
2                 3                  8                 5
3                 5                  9                 3
4                 6                  10                2
5                 10                 11                2
6                 13                 12                1

Sol: The given series is a discrete series. Hence, for the purpose of obtaining the median, we have to calculate the cumulative frequencies.

X     f     c.f.
1     1     1
2     3     4
3     5     9
4     6     15
5     10    25
6     13    38   <- Median
7     9     47
8     5     52
9     3     55
10    2     57
11    2     59
12    1     60

Median = size of the $\frac{N+1}{2}$th item = size of the $\frac{60+1}{2}$th item = size of the 30.5th item = 6.
Ex.6: The following is the distribution of marks secured by 398 students in an examination.

Marks              0-20   21-30   31-40   41-50   51-60   61-70   71-80
No. of students    42     38      120     84      48      36      30

Find the median.
Sol: The given series is a continuous series with inclusive class intervals. Hence, for the purpose of obtaining the median, we have to convert the class intervals to the exclusive type and find the cumulative frequencies.

Class interval    Frequency    Cumulative frequency
0.5 - 20.5        42           42
20.5 - 30.5       38           80
30.5 - 40.5       120          200   <- Median class
40.5 - 50.5       84           284
50.5 - 60.5       48           332
60.5 - 70.5       36           368
70.5 - 80.5       30           398

Median = size of the $\frac{N}{2}$th item = size of the 199th item.

Hence the median lies in the class 30.5 - 40.5.
l = 30.5, m = 80, f = 120, c = 10

$$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c = 30.5 + \frac{199 - 80}{120} \times 10 = 30.5 + 9.92 = 40.42$$

∴ Median of marks = 40.42.


9.2.3. Mode:
Ex.7: Find the mode for the following series.
2/4, 13, 6, 8, 10, 12, 10, 17, 10, 21, 10, 18, 27.
Sol: We observe that the item 10 occurs the maximum number of times, i.e., 4, compared with the other items. Hence, Mode = 10.
Ex.8: Find the mode for the following data.

Size of collar (inches)    12.0   12.5   13.0   13.5   14.0   14.5   15.0   15.5
No. of persons wearing     10     18     38     42     45     15     8      7

Sol: The given series is a discrete series. By inspection it is difficult to say which is the modal value because, though the highest frequency is 45, the concentration appears to be greater around 42. Hence we have to prepare the grouping and analysis tables.

Grouping Table (each group total is shown against the first size in its group):

Size of collar    I (f)    II     III    IV     V      VI
12.0              10       28            66
12.5              18              56            98
13.0              38       80                          125
13.5              42              87     102
14.0              45       60                   68
14.5              15              23                   30
15.0              8        15
15.5              7

Analysis Table:

Column No.    Size of collar
              12.5    13.0    13.5    14.0    14.5
I                                     1
II                    1       1
III                           1       1
IV                            1       1       1
V             1       1       1
VI                    1       1       1
Total         1       3       5       4       1

The highest total in the analysis table is 5 and the item corresponding to it is 13.5.
Hence the modal size is 13.5 inches.
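The grouping and analysis tables can be generated mechanically. The sketch below is one way to do it, not the text's own procedure: `grouping_mode` is an illustrative name, the six columns are produced as groups of width 1, 2 and 3 with every possible starting offset, and the frequency at size 12.5 is taken as 18, the value used in the grouping table.

```python
def grouping_mode(sizes, freqs):
    """Return (modal size, analysis-table totals) by the grouping method."""
    n = len(freqs)
    tally = [0] * n
    for width in (1, 2, 3):              # columns I; II-III; IV-VI
        for start in range(width):       # shift the starting item
            groups = [(sum(freqs[i:i + width]), i)
                      for i in range(start, n - width + 1, width)]
            best = max(total for total, _ in groups)
            for total, i in groups:
                if total == best:
                    # every size inside the largest group earns one mark
                    for j in range(i, i + width):
                        tally[j] += 1
    return sizes[tally.index(max(tally))], tally

sizes = [12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5]
freqs = [10, 18, 38, 42, 45, 15, 8, 7]
mode, tally = grouping_mode(sizes, freqs)
print(mode, tally)
```

The `tally` list corresponds to the "Total" row of the analysis table, extended with zeros for the sizes that never fall in a maximum group.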
Ex.9: Calculate the mode for the distribution of monthly rent paid by 500 families in a locality.

Monthly Rent (Rs)    No. of families    Monthly Rent (Rs)    No. of families
0-50                 5                  250-300              87
50-100               14                 300-350              60
100-150              40                 350-400              38
150-200              91                 400 & above          15
200-250              150                Total                500

Sol: The highest concentration is clearly in the class interval 200-250, which is the modal class.
L = 200, D1 = 150 - 91 = 59, D2 = 150 - 87 = 63, C = 50.

$$\text{Mode} = L + \frac{D_1}{D_1 + D_2} \times C = 200 + \frac{59}{59 + 63} \times 50 = 200 + 24.18 = 224.18$$

Modal Rent = Rs. 224.18
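The interpolation inside the modal class can be sketched as a one-line formula in code. The function and parameter names are illustrative; D1 and D2 are the differences between the modal frequency and its two neighbours, as in the solution.

```python
def grouped_mode(lower, f_modal, f_prev, f_next, width):
    """Mode = L + D1 / (D1 + D2) * C for the modal class."""
    d1 = f_modal - f_prev   # excess over the preceding class
    d2 = f_modal - f_next   # excess over the succeeding class
    return lower + d1 / (d1 + d2) * width

modal_rent = grouped_mode(lower=200, f_modal=150, f_prev=91, f_next=87, width=50)
print(round(modal_rent, 2))
```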
9.2.4. Geometric Mean:
Ex.10: Calculate the geometric mean of the following series of monthly incomes (Rs) of a batch of 10 members: 180, 250, 490, 120, 1400, 7000, 1050, 150, 360, 100.
Sol:

X       log X      X       log X
180     2.2553     7000    3.8451
250     2.3979     1050    3.0212
490     2.6902     150     2.1761
120     2.0792     360     2.5563
1400    3.1461     100     2.0000
                   Total   26.1674

N = 10

$$\text{G.M.} = \text{Antilog}\left(\frac{\sum \log X}{N}\right) = \text{Antilog}\left(\frac{26.1674}{10}\right) = \text{Antilog } 2.6167 = 413.7$$

∴ Geometric Mean = Rs. 413.70
9.2.5. Harmonic Mean:
Ex.11: Calculate the harmonic mean of the following series of monthly expenditure of a batch of students.
Monthly expenditure (Rs): 125, 130, 75, 10, 45, 50, 35, 40, 500, 150.
Sol:

X      125      130      75       10       45       50       35       40       500      150
1/X    0.0080   0.0077   0.0133   0.1000   0.0222   0.0200   0.0286   0.0250   0.0020   0.0067

N = 10, Σ(1/X) = 0.2335

$$\text{H.M.} = \frac{N}{\sum (1/X)} = \frac{10}{0.2335} = 42.83$$

∴ Harmonic mean of the expenditures = Rs. 42.83.
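The geometric and harmonic means of Ex.10 and Ex.11 can be checked with the sketch below. The function names are illustrative. Because the code uses full-precision logarithms and reciprocals instead of four-figure tables, its results may differ from the worked answers in the last digit.

```python
from math import log10

def geometric_mean(values):
    """G.M. = antilog of the mean of the logarithms."""
    return 10 ** (sum(log10(v) for v in values) / len(values))

def harmonic_mean(values):
    """H.M. = N divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

incomes = [180, 250, 490, 120, 1400, 7000, 1050, 150, 360, 100]
expenditures = [125, 130, 75, 10, 45, 50, 35, 40, 500, 150]

gm = geometric_mean(incomes)
hm = harmonic_mean(expenditures)
print(round(gm, 1), round(hm, 2))
```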

9.3. Measures of Dispersion:

Various problems on quartile deviation, mean deviation, standard deviation and coefficient
of variation are presented in the following sub-sections.

9.3.1. Quartile Deviation:

Ex.12: The sales of a firm in different months are given below. Find the quartile deviation.

Sales (Rs. 000): 78, 86, 90, 88, 84, 88, 86, 80, 82, 84, 82, 80.

Sol: The given series is an individual series. Hence, arranging the items in ascending order of magnitude: 78, 80, 80, 82, 82, 84, 84, 86, 86, 88, 88, 90.

Q1 = size of the $\frac{N+1}{4}$th item = size of the $\frac{12+1}{4}$th item = size of the 3.25th item

= 3rd item + $\frac{1}{4}$ (4th item - 3rd item) = 80 + $\frac{1}{4}$ (82 - 80) = 80.5

Q3 = size of the $\frac{3(N+1)}{4}$th item = size of the $\frac{3(12+1)}{4}$th item = size of the 9.75th item

= 9th item + $\frac{3}{4}$ (10th item - 9th item) = 86 + $\frac{3}{4}$ (88 - 86) = 87.5

$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{87.5 - 80.5}{2} = 3.5$$

∴ Quartile deviation of sales = Rs. 3,500.
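The positional rule used for Q1 and Q3 in an individual series can be sketched as follows. The helper `item_at` is an illustrative name; it returns the size of a fractional item by interpolating between its two neighbours, exactly as done by hand above. The sketch assumes the series has at least three items so that both positions fall inside the data.

```python
def item_at(sorted_values, position):
    """Size of the position-th item (1-based), interpolating if fractional."""
    whole = int(position)
    fraction = position - whole
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

def quartile_deviation(values):
    """Return (Q1, Q3, Q.D.) using the (N+1)/4 and 3(N+1)/4 positions."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2

sales = [78, 86, 90, 88, 84, 88, 86, 80, 82, 84, 82, 80]
q1, q3, qd = quartile_deviation(sales)
print(q1, q3, qd)
```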

Ex.13: Calculate the quartile deviation for the following distribution of weights of 31 workers in a factory.

Weight (kgs)      60   62   64   66   68   70   72   74
No. of workers    1    3    5    7    10   3    1    1

Sol: The given series is a discrete series. Hence we have to find the cumulative frequencies.

X     f     Cumulative frequency
60    1     1
62    3     4
64    5     9    <- Q1
66    7     16
68    10    26   <- Q3
70    3     29
72    1     30
74    1     31

Q1 = size of the $\frac{N+1}{4}$th item = size of the $\frac{31+1}{4}$th item = size of the 8th item = 64

Q3 = size of the $\frac{3(N+1)}{4}$th item = size of the $\frac{3(31+1)}{4}$th item = size of the 24th item = 68

$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{68 - 64}{2} = 2$$

∴ Quartile deviation of the weights = 2 kgs.
Ex.14: Evaluate the quartile deviation for the following data.

Income (Rs)     No. of persons    Income (Rs)    No. of persons
Less than 50    54                110-130        230
50-70           100               130-150        125
70-90           140               Above 150      51
90-110          300               Total          1000

Sol:

Income (Rs)     f      c.f.
Less than 50    54     54
50 - 70         100    154
70 - 90         140    294    <- Q1 class
90 - 110        300    594
110 - 130       230    824    <- Q3 class
130 - 150       125    949
Above 150       51     1000

l1 = 70, m1 = 154, f1 = 140, c = 20, N = 1000.

$$Q_1 = l_1 + \frac{\frac{N}{4} - m_1}{f_1} \times c = 70 + \frac{250 - 154}{140} \times 20 = 70 + 13.71 = 83.71$$

l3 = 110, m3 = 594, f3 = 230, c = 20

$$Q_3 = l_3 + \frac{\frac{3N}{4} - m_3}{f_3} \times c = 110 + \frac{750 - 594}{230} \times 20 = 110 + 13.56 = 123.56$$

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{123.56 - 83.71}{2} = 19.925$$

Quartile deviation of the incomes = Rs. 19.925, i.e., about Rs. 19.93.
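The interpolation for Q1 and Q3 in a continuous series can be sketched with one helper. `grouped_quantile` is an illustrative name, and the open-ended classes are given `None` for the missing limit, which is safe here only because neither quartile falls in them.

```python
def grouped_quantile(classes, position):
    """l + (position - m) / f * c, where m is the c.f. before the class."""
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f

# (lower limit, upper limit, frequency); None marks an open end
classes = [(None, 50, 54), (50, 70, 100), (70, 90, 140), (90, 110, 300),
           (110, 130, 230), (130, 150, 125), (150, None, 51)]

n = sum(f for _, _, f in classes)
q1 = grouped_quantile(classes, n / 4)
q3 = grouped_quantile(classes, 3 * n / 4)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 3))
```

Because the code carries the quartiles at full precision, the quartile deviation it reports can differ in the third decimal from a figure built on quartiles already rounded to two places.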
9.3.2. Mean Deviation:
Ex.15: The incomes of 7 persons are given below:
Income (Rs): 4400, 5800, 3000, 4000, 4800, 4600, 4200.
Calculate the mean deviation from the median.
Sol: In order to identify the median, we have to arrange the data in ascending order. To find the mean deviation, we have to take the absolute deviations from the median. The calculations are shown in the following table.

Median = size of the $\frac{N+1}{2}$th item = size of the 4th item = 4,400.

Income (Rs) X    |D| = |X - 4400|
3,000            1,400
4,000            400
4,200            200
4,400            0
4,600            200
4,800            400
5,800            1,400
Total            4,000

$$\text{M.D.} = \frac{\sum |D|}{N}, \quad D = X - M, \text{ where M is the median.}$$

N = 7, Σ|D| = 4000.

$$\therefore \text{M.D.} = \frac{4000}{7} = 571.43$$

∴ Mean deviation of the incomes = Rs. 571.43
Ex.16: Calculate the mean deviation about the median from the following data relating to the heights (inches) of 100 children.

Height (inches)    60   61   62   63   64   65   66   67   68
No. of children    2    3    12   29   25   12   10   4    3

Sol: The given series is a discrete series. First, we have to obtain the median by calculating the cumulative frequencies.

X     f     c.f.    |D| = |X - 64|    f|D|
60    2     2       4                 8
61    3     5       3                 9
62    12    17      2                 24
63    29    46      1                 29
64    25    71      0                 0
65    12    83      1                 12
66    10    93      2                 20
67    4     97      3                 12
68    3     100     4                 12
      100                             126

Median = size of the $\frac{N+1}{2}$th item = size of the 50.5th item = 64.

$$\text{M.D.} = \frac{\sum f|D|}{N} = \frac{126}{100} = 1.26$$

∴ Mean deviation about the median of the heights = 1.26 inches.
9.3.3 Standard Deviation:
Ex.17: Calculate the standard deviation for the following series: 8, 9, 15, 23, 5, 11, 19, 8, 10, 12.
Sol: Short cut method.

X        d = X - 11    d²
8        -3            9
9        -2            4
15       4             16
23       12            144
5        -6            36
11       0             0
19       8             64
8        -3            9
10       -1            1
12       1             1
Total    10            284

N = 10, Σd = 10, Σd² = 284

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{284}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{28.4 - 1} = \sqrt{27.4} = 5.23$$
Ex.18: Calculate the standard deviation of household size from the following frequency distribution of 500 households covered in a sample survey.

Household size       1    2    3    4    5     6    7    8    9
No. of households    92   49   52   82   102   60   35   24   4

Sol: The given series is a discrete series.
Short cut method.

X    f      d = X - 5    fd      fd²
1    92     -4           -368    1472
2    49     -3           -147    441
3    52     -2           -104    208
4    82     -1           -82     82
5    102    0            0       0
6    60     1            60      60
7    35     2            70      140
8    24     3            72      216
9    4      4            16      64
     500                 -483    2683

N = 500, Σfd = -483, Σfd² = 2683

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{2683}{500} - \left(\frac{-483}{500}\right)^2}$$

$$= \sqrt{5.366 - 0.933} = \sqrt{4.433} = 2.1055$$

∴ Standard deviation of the household size is σ = 2.1055.
Ex.19: The following table shows the distribution of the lifetime of 350 radio tubes.

Lifetime (hrs)    No. of tubes    Lifetime (hrs)    No. of tubes
300 - 400         6               700 - 800         62
400 - 500         18              800 - 900         22
500 - 600         73              900 - 1000        4
600 - 700         165             Total             350

Calculate the standard deviation for the above data.
Sol: The given series is a continuous series. Since the class intervals are equal for all classes, we can apply the step deviation method.

Lifetime (hrs)    f      Mid value X    d = (X - 650)/100    fd     fd²
300 - 400         6      350            -3                   -18    54
400 - 500         18     450            -2                   -36    72
500 - 600         73     550            -1                   -73    73
600 - 700         165    650            0                    0      0
700 - 800         62     750            1                    62     62
800 - 900         22     850            2                    44     88
900 - 1000        4      950            3                    12     36
                  350                                        -9     385

N = 350, Σfd = -9, Σfd² = 385, C = 100

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times C = \sqrt{\frac{385}{350} - \left(\frac{-9}{350}\right)^2} \times 100$$

$$= \sqrt{1.1 - 0.0007} \times 100 = 1.048 \times 100 = 104.8$$

∴ Standard deviation of the lifetime of radio tubes = 104.8 hrs.
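The step deviation method for equal class intervals can be sketched as below. The function name is illustrative. The code keeps full precision, so it gives roughly 104.85, whereas the worked answer of 104.8 comes from rounding the square root to 1.048 before multiplying by 100.

```python
from math import sqrt

def sd_step_deviation(mid_points, freqs, assumed_mean, width):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2) * C, with d = (X - A)/C."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in mid_points]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width

mid_points = [350, 450, 550, 650, 750, 850, 950]
tubes = [6, 18, 73, 165, 62, 22, 4]
sigma = sd_step_deviation(mid_points, tubes, assumed_mean=650, width=100)
print(round(sigma, 2))
```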
9.3.4. Coefficient of Variation:
Ex.20: The following data refer to the dividend (%) paid by two companies A and B over the last 7 years.

A    4    8    4    15   10   11   9
B    12   8    3    15   6    4    10

Sol: We have to calculate the arithmetic mean and standard deviation of the dividends.

Company A                      Company B
X     d = X - 8    d²          Y     d = Y - 8    d²
4     -4           16          12    4            16
8     0            0           8     0            0
4     -4           16          3     -5           25
15    7            49          15    7            49
10    2            4           6     -2           4
11    3            9           4     -4           16
9     1            1           10    2            4
      5            95                2            114

Company A:

$$\bar{X} = A + \frac{\sum d}{N} = 8 + \frac{5}{7} = 8.71$$

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{95}{7} - \left(\frac{5}{7}\right)^2} = \sqrt{13.57 - 0.51} = 3.61$$

Company B:

$$\bar{Y} = B + \frac{\sum d}{N} = 8 + \frac{2}{7} = 8.29$$

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{114}{7} - \left(\frac{2}{7}\right)^2} = \sqrt{16.29 - 0.08} = 4.03$$

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$

Company A: C.V. = $\frac{3.61}{8.71} \times 100$ = 41.45%

Company B: C.V. = $\frac{4.03}{8.29} \times 100$ = 48.61%
We observe that the coefficient of variation for the dividends of company B is greater than that of company A. Hence, we conclude that the dividends of company B are more variable. In other words, the dividends of company A are more consistent.

9.4 Suggested Readings:

1) S.P. Gupta: Statistical Methods.

2) D.C.Sancheti & V.K.Kapoor: Statistics-Theory, Methods and Applications.

3) R.I. Levin & D.S.Rubin: Statistics for Management.

GUIDELINE-10 :
CORRELATION

Structure
10.0 Objective
10.1 Introduction
10.2 Definitions
10.3 Significance
10.4 Correlation and Causation
10.5 Types of Correlation
10.6 Methods of Studying Correlation
10.7 Properties of Karl Pearson’s Correlation Co-efficient
10.8 Rank Correlation Coefficient
10.9 Concurrent Deviation Method
10.10 Summary
10.11 Model Questions
10.12 Suggested Readings
10.0 Objective:
In the economic and business environment we encounter relationships between different variables: demand and price, supply and price, output and inputs, cost and output, consumption and income, etc. This lesson helps the reader to understand the relationship between two or more variables under study. The analysis of correlation is one of the important statistical techniques for understanding the nature, magnitude and direction of the relationship between two variables. Thus the objective of this lesson is to provide an understanding of the concept of correlation and of the methods of computing the coefficient of correlation.
10.1 Introduction:
In the earlier lessons we studied problems relating to one variable: averages, dispersion, etc. In the real world, however, we encounter relationships between two or more variables. If two variables vary in such a way that movements in one are accompanied by movements in the other, these variables are said to be correlated. For example, we know that some relationship exists between the quantity demanded of a commodity and its price, agricultural production and rainfall, the age of a husband and the age of his wife, and the increase in the number of television licences and the number of cinemagoers. The degree as well as the direction of the relationship between the variables under consideration is measured through correlation analysis. The measure of correlation, called the correlation coefficient, summarizes in one figure the direction and degree of correlation.

10.2. Definitions:

Following are some of the important definitions of correlation given by statisticians and mathematicians.

• According to Croxton and Cowden, “when the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation”.

• According to L.R. Corner, “if two or more variables or quantities vary in a particular manner so that movements in one tend to be accompanied by corresponding movements in the other, then they are said to be correlated”.

• Y.L. Chow defines correlation as “an analysis that attempts to determine the degree of relationship between two or more variables”.

• According to A.M. Tuttle, “correlation is an analysis of the co-variation between two or more variables”.

10.3. Significance of the Correlation Analysis:

The study of correlation is of immense use to economists, businessmen, industrialists, salesmen, etc. for the following reasons:

a) In practical life we encounter variables which show some kind of relationship. We know that there exists a relationship between demand and price, supply and price, income and expenditure, savings and income, etc. Correlation analysis helps us to measure in one figure the degree of relationship existing between the variables under study.

b) Once we know that two variables are closely related, we can estimate the value of one variable (dependent) given the value of another variable (independent). This, however, is known as regression analysis, which is discussed in the next lesson.

c) To an economist, correlation analysis helps in identifying the critically important variables on which others depend. It may also reveal the connections through which disturbances spread and suggest the paths through which stabilizing forces may become effective.

d) To a business executive, correlation analysis makes it possible to estimate sales, prices, costs, etc. on the basis of some other series with which these sales, prices, costs, etc. are functionally related. Thus, it helps to eliminate or reduce the guesswork involved in business decisions when the relationship between the variable to be determined and the one or more other variables on which it depends is close and reasonably invariant.

However, it should be noted that the coefficient of correlation is one of the most widely used and also one of the most widely abused statistical measures. It is abused in the sense that one sometimes overlooks the fact that correlation measures nothing but the strength of a linear relationship and that it does not necessarily imply a cause and effect relationship. For instance, if we compute the correlation between two series, the production of fertilizers and the construction of houses, we may get a high positive correlation coefficient. But there is no theoretical justification for such a relationship, and caution needs to be taken while interpreting the correlation coefficient. This type of correlation is called spurious correlation or nonsense correlation.

10.4. Correlation and Causation:

As outlined earlier, correlation analysis helps us in determining the degree of relationship


between two or more variables. It does not tell us anything about cause and effect relationship.
Though the existence of causation always implies correlation a high degree of correlation between
two or more variables does not necessarily mean a cause and effect relationship between these
132

150
variables. The explanation of a significant degree of correlation may be any one or combination
of the following reasons:

a) The occurrence of correlation may be due to pure chance, especially in a small sample:
We may get a high degree of correlation between two variables in a sample but such
type of relationship may not exist in the universe. This is especially so in case of small
samples. Such a correlation may arise due to pure random sampling variation or due to
bias of the investigator in selecting the sample.

b) Correlation between two variables may be due to the influence of some other variable(s):
It is just possible that a high degree of correlation between the variables may be due to
some causes affecting each variable or different causes affecting each variable with the
same effect. For example, a high degree of correlation between the production of wheat
and coffee may be due to the fact that both are related and influenced by the amount of
rainfall. But none of the two variables, i.e. production of wheat and coffee, is the cause
of the other. To take another example we may get a correlation coefficient of 0.95
between the salaries of teachers and the consumption of alcohol over a period of time.
This does not indicate or prove together because, both are influenced by variables like
growth in national income, population, etc.,

c) Both the variables may be mutually influencing each other so that neither can be designated
as the cause and the other the effect. In some instances, despite the high degree of
correlation between two variables, it may be difficult to pinpoint which is the cause
and which is the effect. This is especially likely to be so in case of economic variables. It
is an established theory in economics that as price increases quantity demanded will
decrease and vice versa. Here the cause is the change in price and the effect is the
change in quantity demanded. On the other hand, it is also possible, in a dynamic analysis,
that increased demand for a commodity due to growth of population or other reasons
may exercise an upward pressure on prices. If so, the cause is the increased demand
and the effect is the rise in price. Thus, at times it may become difficult to tell from two
correlated variables which is the cause and which is the effect.
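
Reason (b) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the rainfall, wheat and coffee figures are synthetic, the coefficients 2.0 and 0.5 and the noise levels are arbitrary, and `pearson_r` is a helper written for this sketch that applies the deviation formula for Karl Pearson's coefficient discussed later in this lesson.

```python
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / (sum_x2 * sum_y2) ** 0.5

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data (assumption): rainfall drives both crops,
# and neither crop appears in the other's equation.
rainfall = [random.uniform(50, 150) for _ in range(500)]
wheat = [2.0 * r + random.gauss(0, 10) for r in rainfall]
coffee = [0.5 * r + random.gauss(0, 5) for r in rainfall]

r_wheat_coffee = pearson_r(wheat, coffee)
print(round(r_wheat_coffee, 2))  # high and positive despite no direct link
```

The two series are strongly correlated only because each depends on rainfall, which is the situation described in (b).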

10.5. Types of Correlation:

The important ways of classifying correlation are:

a) Positive and negative

b) Simple, partial and multiple

c) Linear and non-linear

10.5.1. Positive and negative Correlation:

If both variables are moving in the same direction, the correlation is said to be positive
and the relationship is said to be a direct relationship, for example supply and price. If both
variables are moving in opposite directions, the correlation is said to be negative and the
relationship is called an inverse relationship, for example demand and price.

10.5.2. Simple, Partial and Multiple Correlation:

The distinction between simple, partial and multiple correlation is based upon the number
of variables studied. When only two variables are studied, it is a problem of simple correlation.
When three or more variables are studied, it is a problem of either partial or multiple correlation.
In partial correlation, though the number of variables is more than two, we study the relationship
between only two of them while the effect of the other influencing variables is kept constant.
In multiple correlation, on the other hand, the relationship among three or more variables
is studied simultaneously.
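
As an illustration of "keeping the other variable constant", the sketch below uses the standard first-order partial correlation formula, r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)), which is not derived in this lesson. The function name and the three input coefficients are hypothetical values chosen only for the example.

```python
import math

def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical simple correlation coefficients between three variables
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_correlation(r12, r13, r23)
print(round(r12_3, 4))  # 0.7217
```

Here the simple correlation of 0.8 between variables 1 and 2 falls to about 0.72 once the influence of variable 3 is removed.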

10.5.3. Linear and non-linear correlation:

The distinction between linear and non-linear correlation is based upon the nature of the
ratio of change between the variables. If the amount of change in one variable tends to bear
constant ratio to the amount of change in other variable, then the correlation is said to be linear.
If we plot a graph of the two variables and all the points lie on a straight line, it is a case of
linear correlation. On the other hand, if the amount of change in one variable does not bear a
constant ratio to the amount of change in the other variable, the correlation is said to be
non-linear. In this case, the graph of the variables is not a straight line but a curve, so it is
also called curvilinear correlation. The following graphs are examples of linear and non-linear
correlation.

Diagram -1
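
The difference between the two cases can also be seen numerically. The sketch below is an illustration with made-up values: one series changes by a constant ratio (Y = 2X + 1) and the other follows a curve (Y = X squared). The helper `pearson_r` is an assumption of this sketch and applies the deviation formula for Karl Pearson's coefficient given later in this lesson.

```python
def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / (sum_x2 * sum_y2) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # constant ratio of change: a straight line
curved = [x * x for x in xs]       # ratio of change varies: a curve

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

For the straight line the coefficient is 1, while for the symmetric curve it is 0 even though Y is completely determined by X. A coefficient designed for linear correlation can therefore miss a strong curvilinear relationship.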

10.6. Methods of Studying Correlation:

Different methods exist to ascertain whether two variables are correlated or not. Some
methods give only the direction of correlation while some methods will help us in determining the
degree or magnitude as well as the direction of correlation between the variables. The various
methods of measuring correlation are:

a) Scatter diagram method

b) Graphic method

c) Karl Pearson’s coefficient of correlation

d) Spearman’s rank correlation coefficient

e) Concurrent deviation method

Apart from these, Karl Pearson’s coefficient of correlation can also be computed by
using regression equations, which will be explained in the subsequent lessons.

10.6.1 Scatter Diagram Method:

This is the simplest method of ascertaining the correlation between two variables. Through
this method, we can determine the direction of correlation and form only a rough idea of its
strength. When this method is used, the given data are plotted on graph paper in the form of
dots. For each pair of values of the given variables, say X and Y, we put a dot and thus obtain
as many dots as the number of paired observations. The chart showing these dots is known as a
scatter diagram. By looking at the scatter of the various dots, we can form an idea as to whether
the variables under consideration are related or not. The more closely the points cluster around
a line, the higher is the relationship between the variables. On the other hand, the more widely
the points are scattered, the lower is the relationship between the variables. Different cases of
correlation are shown in Diagram - 2:

The method is easy as it is a non-mathematical method and provides an understanding of
the nature of the relationship between two variables. It also gives us an idea whether the
correlation is high or low. However, it is not possible to establish the exact degree of
correlation between the variables.
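
A rough scatter diagram can be produced without graph paper. The sketch below is a minimal text-only illustration, not a substitute for a properly drawn chart: the function `ascii_scatter` and its `width` and `height` parameters are inventions of this sketch, and it assumes the values in each series are not all equal. The data are those of Example 1 later in this lesson.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a text scatter diagram with one '*' per (X, Y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row position on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the chart
    return "\n".join("".join(line) for line in grid)

X = [42, 45, 49, 56, 59, 64, 63]
Y = [17, 20, 27, 39, 35, 39, 40]

chart = ascii_scatter(X, Y)
print(chart)
```

The dots rise from the lower left to the upper right, which suggests a positive correlation, but the chart alone does not give its exact degree.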

10.6.2 Graphic Method:

In the graphic method, the individual values of the two variables are plotted on graph
paper and we obtain two curves, one for each of the variables (X and Y). By examining the
direction and closeness of the two curves so drawn, we can ascertain whether the two variables
are related or not. If both the curves drawn on the graph are moving in the same direction, the
correlation is said to be positive.
If both the drawn curves are moving in opposite directions, correlation is said to be negative.
This method is normally used where the data are given over a period of time. As in the case of
scatter diagram, in this method also, we cannot get a numerical value describing the extent to
which the variables are related.

10.6.3 Karl Pearson’s Coefficient of Correlation:

This is one of the most popular and widely used methods of measuring correlation. It is
a mathematical method, wherein we obtain the direction as well as the degree of correlation
between the variables under consideration. Pearson’s coefficient of correlation is denoted
by the symbol ‘r’. The formula for computing Pearson’s coefficient of correlation is

$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$

where $x = (X - \bar{X})$, $y = (Y - \bar{Y})$

$\bar{X}$ = mean of the ‘X’ series

$\bar{Y}$ = mean of the ‘Y’ series

$\sigma_x$ = standard deviation of the ‘X’ series

$\sigma_y$ = standard deviation of the ‘Y’ series

N = number of observations.

We know that,

$$\sigma_x = \sqrt{\frac{1}{N}\sum (X - \bar{X})^2} = \sqrt{\frac{1}{N}\sum x^2}, \qquad \sigma_y = \sqrt{\frac{1}{N}\sum (Y - \bar{Y})^2} = \sqrt{\frac{1}{N}\sum y^2}$$

Substituting these values in the formula gives,

$$r = \frac{\sum xy}{N \sqrt{\frac{1}{N}\sum x^2}\,\sqrt{\frac{1}{N}\sum y^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

The value of the correlation coefficient always lies between -1 and +1. If r = -1, there is
perfect negative correlation; if r = +1, there is perfect positive correlation; and if r = 0, there
is no linear correlation between the variables. In practice, however, we usually get a value
strictly between -1 and +1. The following steps will help in computing the correlation
coefficient using the above formula.

• First, compute the mean values of the ‘X’ and ‘Y’ series.

• Take the deviations of the ‘X’ and ‘Y’ series from their respective means, i.e.
compute the series of values x = (X - X̄) and y = (Y - Ȳ).

• Square the deviations of ‘X’ and ‘Y’ and obtain the respective totals, i.e. Σx² and
Σy².

• Multiply the deviations of the ‘X’ and ‘Y’ series and obtain the total, i.e. Σxy.

• Substitute the values of Σxy, Σx² and Σy² in the formula to obtain the value
of the correlation coefficient (r).
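
The steps above can be sketched in code as follows. This is a minimal illustration rather than a definitive implementation: the function name `pearson_by_deviations` and the five pairs of values are hypothetical and were chosen so that both means are integers.

```python
import math

def pearson_by_deviations(X, Y):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(X)
    # Step 1: means of the two series
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Step 2: deviations from the respective means
    x = [value - mean_x for value in X]
    y = [value - mean_y for value in Y]
    # Step 3: totals of the squared deviations
    sum_x2 = sum(d * d for d in x)
    sum_y2 = sum(d * d for d in y)
    # Step 4: total of the products of the deviations
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Step 5: substitute in r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: mean of X is 3, mean of Y is 4
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r = pearson_by_deviations(X, Y)
print(round(r, 4))  # 0.7746
```

For these values Σxy = 6, Σx² = 10 and Σy² = 6, so r = 6 / √60, which is about 0.77.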

This method is easier when the mean values of the ‘X’ and ‘Y’ series are integers. If the
mean values are not integers, the computation of deviations, their squares and their products
involves much time, and in that case we can use the following formula, which is obtained by
expanding the terms in the numerator and denominator of the above formula.

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

The advantage of this method is that the correlation coefficient can be calculated
directly from the given values of the two series ‘X’ and ‘Y’ without taking
deviations. The following steps can be followed while computing the correlation
coefficient.

• Total the values of the ‘X’ and ‘Y’ series to obtain ΣX and ΣY.

• Multiply each value of the ‘X’ series with the corresponding value of the ‘Y’ series and
total the products to obtain ΣXY.

• Square the values in the ‘X’ series and total them to obtain ΣX².

• Square the values in the ‘Y’ series and total them to obtain ΣY².

• Substitute the values of ΣXY, ΣX, ΣY, ΣX², ΣY² and N (number of observations) in
the formula and simplify it to obtain the value of the coefficient of correlation.
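
A minimal sketch of the direct method follows. The function name `pearson_direct` and the data are hypothetical. No deviations are taken, only the five totals named in the steps above.

```python
import math

def pearson_direct(X, Y):
    """Karl Pearson's r from the raw totals, without taking deviations."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r = pearson_direct(X, Y)
print(round(r, 4))  # 0.7746
```

Here ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so r = (5 × 66 - 15 × 20) / (√50 × √30) = 30 / √1500, which is about 0.77.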

Quite often we encounter data where the numbers are large and the means are not
integers. If the given data involve large numbers, the computation of ΣXY, ΣX², ΣY² etc.
consumes more time, and this can be avoided by taking deviations from some assumed mean of
each series. In this case, the formula for computing the coefficient of correlation is

$$r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$$

where dx refers to the deviations of the ‘X’ series from an assumed mean

dy refers to the deviations of the ‘Y’ series from an assumed mean

Σdx = sum of the deviations of the ‘X’ series from an assumed mean

Σdy = sum of the deviations of the ‘Y’ series from an assumed mean

Σdx dy = sum of the products of the deviations of the ‘X’ and ‘Y’ series from their assumed means

Σdx² = sum of the squares of the deviations of the ‘X’ series from an assumed mean

Σdy² = sum of the squares of the deviations of the ‘Y’ series from an assumed mean

N = number of observations.

Substitute the above values in the formula and simplify to obtain the value of the coefficient
of correlation.
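
The assumed-mean formula can be sketched as below. The function `pearson_assumed_mean` and the assumed means (50 and 30, then 60 and 20) are arbitrary choices made for this illustration. The data are those of Example 1 in this lesson.

```python
import math

def pearson_assumed_mean(X, Y, assumed_x, assumed_y):
    """Karl Pearson's r using deviations from assumed means."""
    n = len(X)
    dx = [value - assumed_x for value in X]
    dy = [value - assumed_y for value in Y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
                   * math.sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

X = [42, 45, 49, 56, 59, 64, 63]
Y = [17, 20, 27, 39, 35, 39, 40]

r_a = pearson_assumed_mean(X, Y, 50, 30)
r_b = pearson_assumed_mean(X, Y, 60, 20)
print(round(r_a, 4), round(r_b, 4))
```

With assumed means of 50 and 30, Σdx = 28, Σdy = 7, Σdx dy = 512, Σdx² = 572 and Σdy² = 565, which gives r = 3388 / (√3220 × √3906). The second pair of assumed means yields the same coefficient, since the choice of assumed mean affects only the size of the intermediate figures.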

It should be noted that whichever formula we use, we get the same value of the
coefficient of correlation; the choice of formula is a matter of convenience. This can
be illustrated with the following example.

Example - 1: Calculate Karl Pearson’s coefficient of correlation from the following data:

X: 42 45 49 56 59 64 63

Y: 17 20 27 39 35 39 40

Solution I : By taking deviations from the actual means

X      Y      x = X - X̄     y = Y - Ȳ     x²      y²      xy
              (X̄ = 54)      (Ȳ = 31)

42     17     -12           -14           144     196     168

45     20     -9            -11           81      121     99

49     27     -5            -4            25      16      20

56     39     2             8             4       64      16

59     35     5             4             25      16      20

64     39     10            8             100     64      80

63     40     9             9             81      81      81

ΣX = 378     ΣY = 217     Σx² = 460     Σy² = 558     Σxy = 484

$$\bar{X} = \frac{\sum X}{N} = \frac{378}{7} = 54; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{217}{7} = 31$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{484}{\sqrt{460 \times 558}} = \frac{484}{\sqrt{256680}} = \frac{484}{506.64} = 0.96$$

Solution II : By direct method

X      Y      X²       Y²       XY

42     17     1764     289      714

45     20     2025     400      900

49     27     2401     729      1323

56     39     3136     1521     2184

59     35     3481     1225     2065

64     39     4096     1521     2496

63     40     3969     1600     2520

ΣX = 378     ΣY = 217     ΣX² = 20872     ΣY² = 7285     ΣXY = 12202

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

$$= \frac{7 \times 12202 - 378 \times 217}{\sqrt{7 \times 20872 - (378)^2}\,\sqrt{7 \times 7285 - (217)^2}}$$

$$= \frac{85414 - 82026}{\sqrt{146104 - 142884}\,\sqrt{50995 - 47089}} = \frac{3388}{\sqrt{3220}\,\sqrt{3906}}$$

$$= \frac{7 \times 484}{\sqrt{7 \times 460}\,\sqrt{7 \times 558}} = \frac{484}{\sqrt{460 \times 558}} = 0.96$$
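
The totals used in both solutions of Example 1 can be checked with a short script. This is only a verification sketch; the variable names are chosen for this illustration.

```python
import math

X = [42, 45, 49, 56, 59, 64, 63]
Y = [17, 20, 27, 39, 35, 39, 40]
n = len(X)

# Solution I: deviations from the actual means
mean_x, mean_y = sum(X) / n, sum(Y) / n
x = [value - mean_x for value in X]
y = [value - mean_y for value in Y]
totals_dev = (sum(a * a for a in x),             # sum of x squared
              sum(b * b for b in y),             # sum of y squared
              sum(a * b for a, b in zip(x, y)))  # sum of xy
r_dev = totals_dev[2] / math.sqrt(totals_dev[0] * totals_dev[1])

# Solution II: direct method on the raw values
totals_direct = (sum(X), sum(Y),
                 sum(a * a for a in X),
                 sum(b * b for b in Y),
                 sum(a * b for a, b in zip(X, Y)))
sx, sy, sx2, sy2, sxy = totals_direct
r_direct = (n * sxy - sx * sy) / (math.sqrt(n * sx2 - sx ** 2)
                                  * math.sqrt(n * sy2 - sy ** 2))

print(totals_dev, totals_direct)
print(round(r_dev, 2), round(r_direct, 2))
```

Both routes give the same coefficient, about 0.96, which agrees with the two solutions worked out above.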
