Unit 2 Measures of Central Tendency and Dispersion: Structure
Structure
2.1 Introduction
Objectives
2.2 Central Tendency and Dispersion
2.3 Measures of Central Tendency
The Mean
The Median
The Mode
Algebraic Properties of the Measures
A Comparison of the Measures
2.4 Measures of Dispersion
The Range
The Mean Deviation
The Standard Deviation
Algebraic Properties of the Measures
A Comparison of the Measures
2.5 Coefficient of Variation
2.6 Summary
2.7 Solutions and Answers
2.1 INTRODUCTION
In Unit 1, we have seen that statistical data may relate to qualitative characters as
well as quantitative. This unit highlights some common features of a frequency
distribution of a single variable (also called a univariate frequency distribution).
These features are central tendency and dispersion. We shall also discuss some
commonly used measures of these features and their properties. Most of you would
already be familiar with the measures of central tendency and dispersion. So we'll go
over these quickly and ask you to do a few exercises to help you to recapitulate.
In the examples, we'll be referring again and again to the data sets which you have
studied in Unit 1. So, you will often have to go back and look at the tables in that
unit. If you have a calculator, it would be a good idea to keep it handy while going
through this unit. Calculators are also available at your study centre.
Now we are going to list the objectives of this unit. After you have gone through the
unit, make sure that you have achieved them.
Objectives
After studying this unit, you should be able to:
compute the mean, median and mode from raw data or from a given frequency
distribution,
compute the range, standard deviation and mean deviation of the data, whether
grouped or ungrouped,
derive and use some algebraic properties of the measures of central tendency and
dispersion.
This tendency is called the central tendency of the variable (or of its frequency
distribution). In the presence of central tendency, we can take the particular value,
or a value in the particular part of the range, around which the observations cluster,
as representative or typical of the whole set. By a measure of central tendency (or a
measure of location) of a frequency distribution, we mean such a typical value. The
same is meant by the more familiar term, average. Note that any such measure must
have a unit. This unit corresponds to the unit in which the variable is measured and
recorded.
Now suppose you are told that the average blood pressure for your age-group is 120 mm.
You measure yours and find that it is 110 mm. What does this mean? Is it a cause for
worry? Or does your blood pressure fall within the limits of normal variation? So,
you see, knowing only the 'average' is not enough. While mentioning an average, we
should also give an idea of the extent to which the individual observations differ from
the average value. The variation of the observations from the average (or from one
another) is called scatter or dispersion of the data on the variable. Thus, to describe
a set of data, we have to give a measure of dispersion along with a measure of central
tendency.
In the next section, we'll describe some measures of central tendency. But before
that, a word about the notation that we'll be using in the unit.
Notation
We shall use some such letter as x, y or z to denote the variable under study, and
the letter n to denote the sum total of the frequencies (i.e., the total number of
individuals for which data are available).
In case the data are in their raw (or ungrouped) form, $x_i$ will denote the value of x
as observed for the ith individual (i = 1, 2, ..., n). Thus, $x_1$, $x_2$, $x_3$, etc., will,
respectively, denote the values of x as observed in the first individual, second
individual, third individual, etc.
In case the data are in the form of a frequency table, the number of classes will be
denoted by k and $x_i$ will denote the value of x defining the ith class or the mid-point
of the ith class interval (also called the ith class mark). We will denote the frequency
in the ith class by $f_i$. We then have
$\sum_{i=1}^{k} f_i = n.$
Before we go any further, let us recall the following properties of the summation
notation.
i) If $a_i = a$ for all i, then
$\sum_{i=1}^{m} a_i = ma.$
ii) If b is a constant, then
$\sum_{i=1}^{m} b x_i = b \sum_{i=1}^{m} x_i.$
iii) $\sum_{i=1}^{m} (x_i + y_i) = \sum_{i=1}^{m} x_i + \sum_{i=1}^{m} y_i$,
$\sum_{i=1}^{m} (x_i - y_i) = \sum_{i=1}^{m} x_i - \sum_{i=1}^{m} y_i.$
so that
$F_i = F_{i-1} + f_i$ for $i \ge 2$,
and
$F'_i = F'_{i+1} + f_i$ for $i \le k-1$.
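The two recurrences can be checked numerically. The following is a minimal Python sketch, assuming (the definitions are not shown in this section) that $F_i$ is the less-than cumulative frequency $f_1 + \dots + f_i$ and $F'_i$ is the more-than cumulative frequency $f_i + \dots + f_k$. The function names are illustrative only, and the frequencies are those of the petiole-length distribution used later in this unit.

```python
from itertools import accumulate

def less_than_cumulative(freqs):
    """F_i = f_1 + ... + f_i (assumed definition of the less-than type)."""
    return list(accumulate(freqs))

def more_than_cumulative(freqs):
    """F'_i = f_i + ... + f_k (assumed definition of the more-than type)."""
    return list(accumulate(reversed(freqs)))[::-1]

# Class frequencies of the petiole-length distribution (198 leaves).
frequencies = [2, 6, 8, 10, 24, 43, 52, 33, 15, 4, 1]
n = sum(frequencies)

F = less_than_cumulative(frequencies)
F_prime = more_than_cumulative(frequencies)

# Check the recurrences F_i = F_{i-1} + f_i and F'_i = F'_{i+1} + f_i.
for i in range(1, len(frequencies)):
    assert F[i] == F[i - 1] + frequencies[i]
    assert F_prime[i - 1] == F_prime[i] + frequencies[i - 1]

print(F)        # less-than cumulative frequencies
print(F_prime)  # more-than cumulative frequencies
```

The last less-than value and the first more-than value both equal the total frequency, which is a quick sanity check on any cumulative frequency table.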
Now here is a simple exercise to find out whether you have understood our notation
or not.
E1) a) Suppose the annual incomes of 5 individuals as reported in the I-T returns
for the year 1990-91 (in thousands of rupees) are 75, 80, 75, 105 and 83.
Compute xC,)and xi,,.
b) Show that
$F_i = n - F'_{i+1}$ (for $i \le k-1$)
and
$F'_i = n - F_{i-1}$ (for $i \ge 2$).
c) What are $F_k$ and $F'_1$?
Now we are ready to discuss the measures of central tendency in the next section.
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ ... (4)
Formula 5 will provide the exact answer in case the variable x is discrete and the
frequency table has classes defined by one distinct value of x each.
If classes of the frequency table are not defined by single distinct values of x, we still
use Formula 5, but $x_i$ now denotes the class mark of the ith class (i = 1, 2, ..., k).
Since we are considering class marks instead of individual values, Formula 5 gives
only an approximate value of $\bar{x}$. In other words, we can say that Formula 5 is subject
to grouping errors. But Formula 4 gives the value free from such errors. In the
continuous case, since the observations will involve rounding-off errors, even
Formula 4 will be approximate, despite being free of grouping errors.
We now give two examples to illustrate the use of Formulas 4 and 5.
Example 1 : The quantities of milk (in litres) produced by a dairy farm on ten
consecutive days are shown below :
218.2 199.7 207.3 185.4 213.7
184.7 179.5 194.4 224.3 203.5
Let us calculate the mean production.
Here the data concern a continuous variable, viz., milk yield per day. Here n = 10
and
$\sum x_i = 2010.7$ litres.
Hence, the mean output per day for the dairy farm is
$\bar{x} = \frac{2010.7}{10} = 201.07$ litres.
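The arithmetic of Example 1 can be reproduced with a few lines of Python. This is only a sketch of the raw-data mean (sum of the observations divided by n); the function name `mean` is an illustrative choice.

```python
# Daily milk yield (litres) for ten consecutive days, from Example 1.
milk_yield = [218.2, 199.7, 207.3, 185.4, 213.7,
              184.7, 179.5, 194.4, 224.3, 203.5]

def mean(values):
    """Arithmetic mean of raw (ungrouped) observations: sum of x_i divided by n."""
    return sum(values) / len(values)

total = sum(milk_yield)   # sum of the ten observations
x_bar = mean(milk_yield)  # mean output per day

print(round(total, 1), round(x_bar, 2))
```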
Before giving the next example, we would like to tell you about a method of
simplifying the computation of the mean.
In Sec. 2.3.4, you will see that under a linear transformation of the variable, the mean
gets transformed in the same way. Hence, if the variable is subjected to a change of
base and/or scale, that is, if
$u_i = (x_i - A)/c$,
then
$\bar{x} = A + c\bar{u}$.
Example 2 : Let us obtain the mean petiole length per leaf of pipal tree from Table 6 in
Unit 1.
We lay out the computations as in the table below :
Table 1 : Calculations for mean petiole length
Class mark $x_i$    $u_i$    $f_i$    $f_i u_i$
0.95                -5       2        -10
1.75                -4       6        -24
2.55                -3       8        -24
3.35                -2       10       -20
4.15                -1       24       -24
4.95                0        43       0
5.75                1        52       52
6.55                2        33       66
7.35                3        15       45
8.15                4        4        16
8.95                5        1        5
Total               -        198      82
Since $u = (x - 4.95)/0.8$, the mean petiole length is
$\bar{x} = 4.95 + 0.8\bar{u}$
= 4.95 + 0.331 = 5.281 cm.
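To see that the change of base and scale gives the same answer as working with the class marks directly, here is a short Python sketch. It assumes the grouped-data mean is the frequency-weighted average of the class marks; the function names are illustrative, and the data are the class marks and frequencies of Table 1.

```python
class_marks = [0.95, 1.75, 2.55, 3.35, 4.15, 4.95, 5.75, 6.55, 7.35, 8.15, 8.95]
frequencies = [2, 6, 8, 10, 24, 43, 52, 33, 15, 4, 1]

def grouped_mean(marks, freqs):
    """Frequency-weighted mean of the class marks."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(marks, freqs)) / n

def coded_mean(marks, freqs, base, scale):
    """Mean via the coded variable u = (x - base) / scale, then x_bar = base + scale * u_bar."""
    n = sum(freqs)
    u_bar = sum(f * (x - base) / scale for x, f in zip(marks, freqs)) / n
    return base + scale * u_bar

direct = grouped_mean(class_marks, frequencies)
coded = coded_mean(class_marks, frequencies, base=4.95, scale=0.8)

print(round(direct, 3), round(coded, 3))
```

Both routes agree up to floating-point rounding; the coded version simply keeps the hand arithmetic small, which is why it is convenient without a computer.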
E2) The scores obtained in English by 15 students are given below. Calculate the
mean score.
33, 41, 46, 47, 52, 52, 53, 54, 57, 61, 61, 68, 69, 70, 74
E3) The age-distribution of the Indian population according to the 1981 census is
shown below :
Age as on last birthday    Percentage
0-4                        12.59
5-9                        14.08
10-14                      12.88
15-19                      9.63
20-24                      8.62
25-29                      7.63
30-34                      6.38
35-39                      5.85
40-44                      5.14
45-49                      4.40
50-54                      3.83
55-59                      2.47
60-74                      6.49
Obtain the mean age of an Indian alive at the time of the census. (Note that
here age x on last birthday means that the age at the time of the census was less
than x+1 years but not less than x years. Hence the class intervals should be
taken to be 0-5, 5-10, etc. You can take the base A = 32.5 and the scale c = 5.)
Note : Actually, the census tabulation leaves the last interval open, that is, simply
as "60- ". But here we ask you to do the calculation based on the assumption that
the upper end point of this interval is 74. We cannot calculate the mean without
some such assumption. However, we must realise that this assumption may
introduce yet another source of error.
We now turn our attention to another measure of location, the median.
Example 3 : Consider the data on the daily milk yield of a dairy farm that were cited
in Example 1. Arranged in ascending order, the observations are (in litres),
179.5 184.7 185.4 194.4 199.7 203.5 207.3 213.7 218.2 224.3
Here n is even (= 10). Also, $x_{(5)} = 199.7$ and $x_{(6)} = 203.5$.
Hence, any value between 199.7 litres and 203.5 litres may be taken to be the median
yield of milk for the dairy farm. However, if we follow the convention, we may take,
as the unique median,
$\tilde{x} = \frac{199.7 + 203.5}{2} = 201.6$ litres.
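The convention for raw data can be sketched in Python as follows: sort the observations, take the middle one when n is odd, and the average of the two middle ones when n is even. The function name `median` is illustrative.

```python
def median(values):
    """Median of raw data: middle value (n odd) or mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Daily milk yield (litres) from Example 1.
milk_yield = [218.2, 199.7, 207.3, 185.4, 213.7,
              184.7, 179.5, 194.4, 224.3, 203.5]

milk_median = median(milk_yield)
print(round(milk_median, 1))
```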
Now suppose the data on a discrete variable is put in the form of a frequency
distribution in which each class is defined by a single value of x. In this case, the
cumulative frequency table of less than (more than) type presents an arrangement of
the original observations in ascending (descending) order. Let's see how we can use
this fact to get the median.
Example 4 :Table 4 a in Unit 1 shows that if the original data (as shown in Table 2
of Unit 1) were arranged in ascending order, then the first three values would be 1,
the 4th to the 10th would be 2, the 11th to the 21st would be 3, and so on. Here
n = 80 and we find that
$x_{(40)} = x_{(41)} = 5$,
so that $\tilde{x} = 5$.
When the variable x is continuous and the data are in grouped form, you may
visualise the frequency curve of the distribution. The median should then be taken
as the value of x that divides the area under the frequency curve into two equal
parts (or the value that has ordinate n/2 in the corresponding ogive of either type). In any
particular situation, however, you will have a frequency table with, say, at most 20-25
class intervals, and can hope to get only a rough approximation to the median. You
may then take the median as that value of x which has cumulative frequency (of either
type) n/2.
Suppose we have a cumulative frequency table of less than type. We first ascertain
which of the class-intervals contains the median. Suppose this interval has lower
boundary $x_l$ and upper boundary $x_u$. We further assume that the cumulative frequency
increases linearly from $F_l$ to $F_u$ as the variable x increases from $x_l$ to $x_u$.
Thus,
$\frac{\tilde{x} - x_l}{x_u - x_l} = \frac{(n/2) - F_l}{F_u - F_l}$,
so that
$\tilde{x} = x_l + \frac{(n/2) - F_l}{f_0} \times c$, ... (7)
where $c = x_u - x_l$ is the width of the median interval and $f_0 = F_u - F_l$ is the frequency
for the interval.
We can also make use of the cumulative frequency table of the more than type for
computing the median.
Here is an example to illustrate the use of Formula (7).
Example 5 : For the frequency distribution of petiole length per leaf of a pipal tree,
n/2 = 198/2 = 99. Therefore, Table 8 in Unit 1 indicates that the median would be in the
interval 5.35 cm - 6.15 cm. Thus,
$x_l = 5.35$, $x_u = 6.15$,
c = 0.8
and
$F_l = 93$, $F_u = 145$,
$f_0 = 52$.
Hence, the median may be taken to be
$\tilde{x} = 5.35 + \frac{99 - 93}{52} \times 0.8$
= 5.35 + 0.092
= 5.442 cm.
Note : We have assumed that the ratio in which $\tilde{x}$ divides $[x_l, x_u]$ is the
same as the ratio in which n/2 divides $[F_l, F_u]$.
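The linear interpolation within the median class can be sketched in Python. The function name is illustrative, and the class boundaries below are an assumption: they are derived from the class marks 0.95, 1.75, ..., 8.95 with a common width of 0.8 cm. With the inputs of Example 5 (lower boundary 5.35, cumulative frequency 93 below it, class frequency 52, n/2 = 99), the interpolation term is 6 x 0.8 / 52, which is about 0.092.

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped distribution by linear interpolation in the median class.

    boundaries: k + 1 class boundaries in ascending order
    freqs:      k class frequencies
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # less-than cumulative frequency at the lower boundary
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= half:
            x_l = boundaries[i]
            c = boundaries[i + 1] - x_l  # width of the median class
            return x_l + (half - cumulative) / f * c
        cumulative += f
    raise ValueError("median class not found")

# Assumed boundaries 0.55, 1.35, ..., 9.35 (width 0.8 cm around each class mark).
boundaries = [round(0.55 + 0.8 * i, 2) for i in range(12)]
frequencies = [2, 6, 8, 10, 24, 43, 52, 33, 15, 4, 1]

petiole_median = grouped_median(boundaries, frequencies)
print(round(petiole_median, 3))
```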
Fig. 2 : (a) Less-than ogive, (b) both the ogives for the data on petiole length.
You may use the same method in determining the median for the frequency
distribution of a discrete variable like the one in Table 10 in Unit 1, taking the
artificial class intervals 7.5-12.5, 12.5-17.5, etc. But note that this method can provide
us with only a rough estimate of the median. This is because in this method we only
replace a step diagram by an ogive.
We have described the methods used to compute the median of
i) raw data on a discrete variable,
ii) data in the form of an ungrouped frequency table, and
iii) data in the form of a grouped frequency table.
On the basis of our discussion, see if you can solve the following exercise.
E4) Obtain the median age of an Indian according to the population census of 1981
on the basis of the frequency table given in E3.
Example 6 : For the frequency distribution of word length for the 91 words in a poem,
as shown in Table 2, you can see that both 4 and 5 have the highest frequency (19).
As such, in this case the mode is not unique.
Table 2 : Frequency table of word length for the 91 words in a poem
Word length    Frequency
2              13
8              4
9              3
10             1
Total          91
Fig. 3 : A frequency curve with its mode.
So, to find the mode of a continuous variable, we look at its frequency table or
histogram. As a first approximation, we can take the mid-point of the modal class as
the mode $\hat{x}$. If this class has boundaries $x_l$ and $x_u$, and $\hat{x}^{(1)}$ denotes the first
approximation to $\hat{x}$, then
$\hat{x}^{(1)} = x_l + c/2$, ... (9)
where c is the width of the modal class.
But we get a better approximation if we consider the modal class as well as its two
adjacent classes (provided, of course, that the modal class is not a terminal class).
Suppose that these three classes are of the same width. Let us denote by $f_0$, $f_-$ and
$f_+$, respectively, the frequency of the modal class, that of the class immediately
preceding the modal class and that of the class following the modal class. Further,
we assume that
$\hat{x} - x_l : x_u - \hat{x} = f_0 - f_- : f_0 - f_+$.
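Solving this proportion for the mode gives a closed form for the second approximation. The following worked step is a sketch, assuming that this is the expression the unit refers to as Formula (10) and writing the mode as $\hat{x}$ (a notational assumption):

```latex
\frac{\hat{x} - x_l}{x_u - \hat{x}} = \frac{f_0 - f_-}{f_0 - f_+}
\;\Longrightarrow\;
(\hat{x} - x_l)(f_0 - f_+) = (x_u - \hat{x})(f_0 - f_-)
\;\Longrightarrow\;
\hat{x}^{(2)} = x_l + \frac{f_0 - f_-}{2 f_0 - f_- - f_+}\, c ,
\qquad c = x_u - x_l .
```

When $f_- = f_+$ the fraction reduces to 1/2, so the result coincides with the mid-point of the modal class.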
Now, when will these two approximations be equal? On equating (9) and (10) and
simplifying, we get
$\hat{x}^{(2)}$ is equal to $\hat{x}^{(1)}$ if and only if $f_- = f_+$.
In the following example, we have calculated $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$ for the data in Table 7 in
Unit 1.
Example 7 : The frequency table for petiole length per leaf for 198 leaves of a pipal
tree has classes of equal width and the class 5.35-6.15 (cm) is the modal class.
Our first approximation to the mode is, then,
$\hat{x}^{(1)} = 5.35 + 0.8/2$
= 5.75 cm.
In this case, we have
$f_0 = 52$, $f_- = 43$ and $f_+ = 33$.
Thus, the second approximation is
$\hat{x}^{(2)} = 5.35 + \frac{52 - 43}{2 \times 52 - 43 - 33} \times 0.8$
= 5.61 cm.
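The two approximations to the mode can be sketched in Python. The second function implements the value obtained by solving the proportion $\hat{x} - x_l : x_u - \hat{x} = f_0 - f_- : f_0 - f_+$, which is assumed here to be what Formula (10) states; the function and parameter names are illustrative.

```python
def mode_first_approx(x_l, c):
    """Mid-point of the modal class."""
    return x_l + c / 2

def mode_second_approx(x_l, c, f0, f_prev, f_next):
    """Mode from the modal class and its two neighbours (equal class widths assumed)."""
    denominator = 2 * f0 - f_prev - f_next
    if denominator <= 0:
        raise ValueError("the modal frequency must exceed at least one neighbouring frequency")
    return x_l + (f0 - f_prev) / denominator * c

# Example 7: modal class 5.35-6.15 cm, f0 = 52, preceding 43, following 33.
first = mode_first_approx(5.35, 0.8)
second = mode_second_approx(5.35, 0.8, f0=52, f_prev=43, f_next=33)

print(round(first, 2), round(second, 2))
```

Because the preceding class (43) is more frequent than the following one (33), the second approximation is pulled from the mid-point towards the lower boundary.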
Now, here is a remark about the relationship between the three measures that we
have discussed.
Remark 1 : There is an empirical relation connecting the mean, median and mode
of a distribution, viz., the relation
mean - mode = 3(mean - median)
or $\bar{x} - \hat{x} = 3(\bar{x} - \tilde{x})$. ... (11)
We can also use this in obtaining an approximate value of the mode of a frequency
distribution. From (11), we get the formula for this third approximation as
$\hat{x}^{(3)} = 3\tilde{x} - 2\bar{x}$.
We are now giving an exercise which will give you some practice in calculating the
mode.
E5) a) Find approximately the mode of the age distribution given in E3 by using
Formula (10).
b) Compare the mean, median and mode as obtained by you for this distribution
and state whether the empirical relation (11) is borne out by the distribution.
So far we have acquainted you with three measures of central tendency; the mean,
the median and the mode. In the next sub-section, we shall discuss some algebraic
properties of these measures.
iv) Suppose k sets of observations on x are combined, the ith set having $n_i$
observations with mean $\bar{x}_i$. Then the composite (or grand) mean of x is
$\bar{x} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$.
v) If $z_i = x_i + y_i$, then $\bar{z} = \bar{x} + \bar{y}$.
Similarly, if $z_i = x_i - y_i$, then $\bar{z} = \bar{x} - \bar{y}$.
This result can be extended to the case when two or more variables are added or
subtracted. So, if
$u = a + bx + cy + dz + \dots + lw$, then $\bar{u} = a + b\bar{x} + c\bar{y} + d\bar{z} + \dots + l\bar{w}$.
Out of these, i) and v) are easy to prove. We are sure you will be able to prove them.
Here, we will prove ii), iii) and iv).
Proof : ii) We have
$g(y) = f\left(\frac{y-a}{b}\right) \left|\frac{dx}{dy}\right|$.
(This result will be proved in Unit 10.)
Thus, the frequency density at any value of y is just $\frac{1}{|b|}$ times the frequency density at
the corresponding value of x. Consequently, if there is a value $\hat{x}$ of x with the highest
frequency density, then $a + b\hat{x}$ must be the value of y with the highest frequency
density.
Hence, $\hat{y} = a + b\hat{x}$,
whether the variables are discrete or continuous.
$\sum_{i=1}^{k} f_i (x_i - \bar{x}) = 0.$
iv) Let $x_{ij}$ be the jth observation in the ith set (i = 1, 2, ..., k and j = 1, 2, ..., $n_i$). Then
the mean of the ith set, $\bar{x}_i$, is given by
$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$.
E6) If two sets of observations are combined, show that the composite mean must
lie between the two set means.
E7) Let y be a monotone function of x, say g(x). (A monotone function is an
increasing or a decreasing function.)
a) Show that $\tilde{y} = g(\tilde{x})$.
b) Show that $\hat{y} = g(\hat{x})$ if the variables are discrete.
c) Is $\bar{y} = g(\bar{x})$?
(Hint : Try with $g(x) = x^3$.)
E8) Prove algebraic properties i) and v).
E9) The mean of a number of temperature readings on the Centigrade (Celsius)
scale is 33.2 degrees. What would be the mean if the readings were taken on
the Fahrenheit scale?
E10) There are four blocks in an urban locality, having 126, 153, 137 and 190
households. If the mean income (in rupees) for a month per household is
2012.35, 1972.45, 2734.56 and 2415.67 for the four blocks, respectively, then
what is the mean household income for the month for the locality as a whole?
So far we have seen how to compute some measures of central tendency and have
also discussed some of their algebraic properties. Now, we should be able to decide
which of these measures should be chosen for the given data. For this, we have to
know the pros and cons of using each of these measures. In the next sub-section,
we'll talk about just this.
Find the mean stature per student of the group and see if it is a representative
value. What is the median? Is that a representative value?
With this, we end our discussion of the measures of central tendency. In the next
section, we'll talk about the measures of dispersion.
Thus,
$R = x_{(n)} - x_{(1)} = x'_{(1)} - x'_{(n)}$.
When the data are in the form of an ungrouped frequency distribution, then we can
calculate R exactly. Here
$R = x_k - x_1$.
But when the data are presented in the form of a grouped frequency distribution, we
can compute R only approximately. In this case, we estimate R by
$R = x_{ku} - x_{1l}$,
where $x_{ku}$ is the upper boundary of the last (kth) class and $x_{1l}$ is the lower boundary
of the first class.
Example 8 : i) For the data on daily milk yield for a dairy farm, we have, with n = 10,
$x_{(1)} = 179.5$ litres, $x_{(10)} = 224.3$ litres.
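For raw data the range is simply the largest observation minus the smallest. A minimal Python sketch for the milk-yield data follows; the function name `data_range` is illustrative.

```python
def data_range(values):
    """Range of raw data: largest observation minus smallest observation."""
    return max(values) - min(values)

# Daily milk yield (litres) from Example 1.
milk_yield = [218.2, 199.7, 207.3, 185.4, 213.7,
              184.7, 179.5, 194.4, 224.3, 203.5]

milk_range = data_range(milk_yield)
print(round(milk_range, 1))
```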
$MD_A = \frac{1}{n} \sum_{i=1}^{k} f_i |x_i - A|$
The only difference between Formulas (14) and (15) is that in (14), $x_i$ denotes the ith
distinct value of the discrete variable and in (15), $x_i$ denotes the class-mark of the ith
class.
The following remark contains an important result.
Remark 2 : The mean deviation $MD_A$ is least when A is the median of x. We are not
going to prove this here. But if you are interested you can look up the book :
Fundamentals of Statistics, Vol. 1 by Goon, Gupta and Dasgupta. This book is
available in your study centre library.
The result stated in Remark 2 perhaps supplies a rationale for taking the median as
the origin while computing the mean deviation of a set of observations.
Now, if the number of observations, n, is odd,
say, n = 2m+1, then $\tilde{x} = x_{(m+1)}$ and
$n\,MD_{\tilde{x}} = S_2 - S_1$,
where $S_1 = \sum_{i=1}^{m} x_{(i)}$ and $S_2 = \sum_{i=m+2}^{2m+1} x_{(i)}$.
If n = 2m, then $x_{(m)} \le \tilde{x} \le x_{(m+1)}$ and
$n\,MD_{\tilde{x}} = S'_2 - S'_1$,
where $S'_1 = \sum_{i=1}^{m} x_{(i)}$ and $S'_2 = \sum_{i=m+1}^{2m} x_{(i)}$.
Thus, we have
$n\,MD_{\tilde{x}}$ = Sum of observations exceeding the median - Sum of observations
that are less than the median. ... (16)
Using this formula, we can find the mean deviation about the median without
explicitly knowing the value of the median. We now give an example to illustrate the
use of the formula.
Example 9 : Consider the data on the daily yield of milk (in litres) of a dairy farm
given in Example 1. Arranged in ascending order, the values are
179.5, 184.7, 185.4, 194.4, 199.7, 203.5, 207.3, 213.7, 218.2, 224.3.
The mean deviation about the median (i.e., about any value between 199.7 and
203.5) is given by
$10\,MD_{\tilde{x}} = \sum_{i=6}^{10} x_{(i)} - \sum_{i=1}^{5} x_{(i)}$
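Formula (16) can be checked against the direct definition of the mean deviation. The Python sketch below uses illustrative function names; it computes the mean deviation of the milk-yield data about the median both ways, and also about another value lying between the two middle observations.

```python
def mean_deviation_about(values, origin):
    """Direct definition: average absolute deviation from the given origin."""
    return sum(abs(x - origin) for x in values) / len(values)

def mean_deviation_about_median(values):
    """Formula (16): (sum of the upper half - sum of the lower half) / n.

    For odd n the middle observation is left out of both sums.
    """
    ordered = sorted(values)
    n = len(ordered)
    m = n // 2
    lower_sum = sum(ordered[:m])
    upper_sum = sum(ordered[n - m:])
    return (upper_sum - lower_sum) / n

milk_yield = [179.5, 184.7, 185.4, 194.4, 199.7,
              203.5, 207.3, 213.7, 218.2, 224.3]

md_formula = mean_deviation_about_median(milk_yield)
md_direct = mean_deviation_about(milk_yield, 201.6)  # conventional median
md_other = mean_deviation_about(milk_yield, 200.0)   # another value in [199.7, 203.5]

print(round(md_formula, 2), round(md_direct, 2), round(md_other, 2))
```

All three values agree, which illustrates why the formula does not need the median itself when n is even.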
Example 10 : Let us calculate $MD_{\tilde{x}}$ for the frequency distribution of household size
given in Table 3 in Unit 1. Here, n = 80 and $\tilde{x} = 5$.
We show the required computations in the following table.
Table 2
Now
$MD_{\tilde{x}} = \frac{1}{n} \sum |x_i - \tilde{x}| f_i$
= 117/80
= 1.46.
Next we take the case of data given in the form of a grouped frequency table.
Example 11 : Let us find $MD_{\tilde{x}}$ for the frequency distribution of petiole length given
in Table 6 of Unit 1. Here, $\tilde{x} = 5.35 + (6/52) \times 0.8 = 5.442$ cm, which lies in the class
interval 5.35-6.15.
We first form the following table.
Table 3
Class mark $x_i$
0.95
1.75
2.55
3.35
4.15
4.95
5.75
6.55
7.35
8.15
8.95
Total
Try to do these exercises now.
E12) Show that in computing $MD_{\bar{x}}$, it is necessary to consider only the positive
deviations (for which $x_i > \bar{x}$) or only the negative deviations (for which $x_i < \bar{x}$).
Indeed, if the sum of the former is P and that of the latter is Q, show that
$n\,MD_{\bar{x}} = 2P = -2Q$.
E13) For the age distribution of Indians shown in E3, obtain the mean deviation about
median.
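Since $x_i - A = (x_i - \bar{x}) + (\bar{x} - A)$,
we have $(x_i - A)^2 = (x_i - \bar{x})^2 + 2(\bar{x} - A)(x_i - \bar{x}) + (\bar{x} - A)^2$.
Thus, $s = \sqrt{\frac{1}{n} \sum x_i^2 - \bar{x}^2}$
is another expression for s.
For grouped data, we have the formula
$s = \sqrt{\frac{1}{n} \sum_{i=1}^{k} f_i x_i^2 - \bar{x}^2}$.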
Now we'll illustrate the method of finding the standard deviation of the given data
on a continuous variable. We are sure you will have no difficulty in computing the
standard deviation of the data on a discrete variable.
Example 12 : For the grouped data on petiole length (see Table 6, Unit 1), the
computations needed for determining the standard deviation (together with the
mean) may be laid out in tabular form. But, in this case, it will be convenient to
subject the variable to a change of base and scale. Let us take $u = (x - 4.95)/0.8$.
Table 4
Class mark $x_i$    $u_i = (x_i - 4.95)/0.8$    Frequency $f_i$
0.95                -5                          2
$\sum_{i=1}^{k} f_i u_i = 82$
Hence $n s_u^2 = \sum_{i=1}^{k} f_i u_i^2 - \frac{1}{n} \left( \sum_{i=1}^{k} f_i u_i \right)^2$
= 656.0404
Hence, the variance of u is
$s_u^2 = \frac{656.0404}{198} = 3.3133$
But how can we get $s^2$, the variance of x, from this? It is related to the variance of u by
$s^2 = c^2 s_u^2$.
We'll prove this in the next section.
Here c = 0.8. Therefore, we have
$s^2 = 0.64 \times 3.3133 = 2.1205$
and the standard deviation is
$s = \sqrt{2.1205} = 1.456$ cm.
From the same table, we get
$\bar{u} = \frac{82}{198} = 0.4141$,
so that
$\bar{x} = 4.95 + 0.8 \times 0.4141$
= 4.95 + 0.331 = 5.281 cm.
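The computation of Example 12 can be retraced in Python. This sketch uses the class marks and frequencies of the petiole-length distribution (the same data as in Table 1), codes the variable as u = (x - 4.95)/0.8, and applies $n s_u^2 = \sum f_i u_i^2 - (\sum f_i u_i)^2 / n$ followed by $s^2 = c^2 s_u^2$. The function name is illustrative.

```python
import math

class_marks = [0.95, 1.75, 2.55, 3.35, 4.15, 4.95, 5.75, 6.55, 7.35, 8.15, 8.95]
frequencies = [2, 6, 8, 10, 24, 43, 52, 33, 15, 4, 1]

def coded_variance(marks, freqs, base, scale):
    """Return (variance of u, variance of x) using the coding u = (x - base) / scale."""
    n = sum(freqs)
    u = [(x - base) / scale for x in marks]
    sum_fu = sum(f * ui for ui, f in zip(u, freqs))
    sum_fu2 = sum(f * ui ** 2 for ui, f in zip(u, freqs))
    var_u = (sum_fu2 - sum_fu ** 2 / n) / n
    return var_u, scale ** 2 * var_u

var_u, var_x = coded_variance(class_marks, frequencies, base=4.95, scale=0.8)
sd_x = math.sqrt(var_x)

print(round(var_u, 4), round(var_x, 4), round(sd_x, 3))
```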
Check if you can find the standard deviation of the data on a discrete variable by
solving this exercise now.
Next, we shall discuss some algebraic properties of the three measures of dispersion
discussed so far.
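so that
$R_y = y_{(n)} - y_{(1)} = b(x_{(n)} - x_{(1)}) = bR_x$.
If b < 0, then $y_{(1)} = a + bx_{(n)}$ and $y_{(n)} = a + bx_{(1)}$,
so that
$R_y = y_{(n)} - y_{(1)} = b(x_{(1)} - x_{(n)}) = -bR_x$.
Hence, in either case,
$R_y = |b| R_x$.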
ii) Mean deviation : We have already seen (in Section 2.3.4) that under a
transformation of this type,
Remark 3 : Note that a good measure of dispersion should have Property (2). For,
if the observations are all increased or decreased by a constant amount, then the
dispersion remains unchanged, but if they are all increased or decreased in a constant
proportion, then the dispersion too gets increased or decreased in the same
proportion.
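3) Suppose several (say k) sets of observations on x are combined, the ith set having
$n_i$ observations with mean $\bar{x}_i$ and standard deviation $s_i$ (i = 1, 2, ..., k). Then the
composite standard deviation, i.e., the standard deviation of the combined data, is s, where
$n s^2 = \sum_{i=1}^{k} n_i s_i^2 + \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$,
with $n = \sum_{i=1}^{k} n_i$ and $\bar{x}$ the composite mean.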
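Proof : Let $x_{ij}$ be the jth observation in the ith set (i = 1, 2, ..., k and j = 1, 2, ..., $n_i$).
Then
$\bar{x}_i = \frac{1}{n_i} \sum_j x_{ij}$ and $s_i^2 = \frac{1}{n_i} \sum_j (x_{ij} - \bar{x}_i)^2$.
Now, $s^2$ is the variance of the combined set and is, by definition, given by
$n s^2 = \sum_i \sum_j (x_{ij} - \bar{x})^2$.
But $x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})$,
so that $(x_{ij} - \bar{x})^2 = (x_{ij} - \bar{x}_i)^2 + 2(\bar{x}_i - \bar{x})(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})^2$,
and $\sum_j (x_{ij} - \bar{x})^2 = \sum_j (x_{ij} - \bar{x}_i)^2 + n_i (\bar{x}_i - \bar{x})^2$, since $\sum_j (x_{ij} - \bar{x}_i) = 0$.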
Remark 4 : This result shows that the standard deviation of the combined set may be
non-zero even when the individual sets have zero standard deviations. If the values
in the ith set are all equal to $a_i$, i = 1, 2, ..., k,
Example 13 : For the four blocks in an urban locality, the means and standard
deviations of household income (in rupees) for a certain month are given below
together with the number of households in each block:
Block                    I          II         III        IV
Number of households     126        153        137        190
Mean income (in Rs.)     2012.35    1972.45    2734.56    2415.47
S.d. of income           153.17     189.62                202.09
Let us find the mean and standard deviation for the entire locality.
We have, in the notation used in (3),
$n = \sum_{1}^{k} n_i = 606$
and $\sum_{1}^{k} n_i s_i^2 = 20828581.0$.
Also, $\bar{x} = \sum_{1}^{k} n_i \bar{x}_i / \sum_{1}^{k} n_i = 2291.94$ (rupees).
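The combination of the block figures can be sketched in Python. The sketch assumes the composite variance is obtained from $n s^2 = \sum n_i s_i^2 + \sum n_i (\bar{x}_i - \bar{x})^2$, which is what the decomposition in the proof above leads to. One input is an assumption and not a value taken from the table: the standard deviation for Block III is not legible here, and 183.47 is back-solved so that $\sum n_i s_i^2$ matches the stated total 20828581.0, with the other three values placed in Blocks I, II and IV on the same basis. The function name is illustrative.

```python
import math

def composite_mean_and_sd(sizes, means, sds):
    """Combine k sets given their sizes, means and standard deviations.

    Returns (composite mean, composite s.d., sum of n_i * s_i^2).
    """
    n = sum(sizes)
    grand_mean = sum(ni * mi for ni, mi in zip(sizes, means)) / n
    within = sum(ni * si ** 2 for ni, si in zip(sizes, sds))
    between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(sizes, means))
    return grand_mean, math.sqrt((within + between) / n), within

sizes = [126, 153, 137, 190]
means = [2012.35, 1972.45, 2734.56, 2415.47]
# ASSUMPTION: the Block III value 183.47 is back-solved from the stated total
# of n_i * s_i^2; it is not read from the table.
sds = [153.17, 189.62, 183.47, 202.09]

grand_mean, composite_sd, within = composite_mean_and_sd(sizes, means, sds)
print(round(grand_mean, 2), round(within, 1), round(composite_sd, 1))
```

Note how much larger the composite standard deviation is than any single block's: most of the combined variation comes from the differences between the block means.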
E16) If R and s be the range and the standard deviation for a set of n observations on
a variable x, show that
E17) Show that the standard deviation can be expressed in terms of the mutual
differences $x_i - x_j$ of the observations; more precisely,
E18) There are 4 sections, A, B, C and D in Class X of a school, having 48, 41, 52
and 45 students, respectively. If the mean IQs per student for the sections are
133.2,125.4,110.5 and 97.8, and the standard deviations of IQ are 3.8,4.7,5.1
and 5.9, respectively, then find the composite mean and composite standard
deviation of IQ for the class.
So far, in this section, we have discussed three measures of dispersion. Now let us
compare and contrast these.
Table 6
Range                      Mean Deviation             Standard Deviation
Easy to compute            -                          -
-                          Properly reflects the      -
                           variation in the data
-                          -                          Not unduly affected by the
                                                      presence of extremely high
                                                      or low values
Not amenable to            -                          -
algebraic treatment
E19) Fill in the gaps in Table 6.
In the light of these observations, the range appears to be the worst measure. Indeed,
the standard deviation may generally be taken to be the best measure of dispersion,
just as the mean may generally be taken to be the best measure of central tendency.
However, in industrial applications, to check whether a manufacturing process is
under control, we have to compute a measure of dispersion of some important
characteristic of the manufactured product at frequent intervals. Due to the simplicity
of calculation, the range is quite popular in such cases.
In the last two sections, we have discussed the measures of central tendency and
dispersion. In the next section, we'll see how we can compare two or more data sets.
units but with widely different averages are to be compared. For example, consider
the following situation. Suppose Firm I manufactures ball-bearings meant for
bicycles, while Firm I1 manufactures ball-bearings that are to be used in motor cars.
Naturally, the ball bearings of Firm I have to be much smaller in volume than those
of Firm II. If we want to know which of the two firms produces ball-bearings with a
lesser variation in size, then a comparison of the two standard deviations will not be
relevant. Since the ball-bearings produced by Firm I1 are larger, we would be ready
to tolerate higher deviation from the mean size (and hence a higher standard
deviation) than in the case of Firm I. A comparison of the two coefficients of variation
will be much more meaningful. For the same reason, we should use the coefficient
of variation, rather than the standard deviation, when income disparities in, say, a
group of managers are to be compared with those in a group of clerks.
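The point can be illustrated with a small Python sketch. It assumes the usual definition of the coefficient of variation, 100 times the standard deviation divided by the mean (the formula itself is not shown in this section), and the figures for the two firms are hypothetical, invented only to show the comparison.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in per cent: 100 * s.d. / mean (assumed definition)."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * sd / mean

# HYPOTHETICAL figures (diameters in mm), not taken from the unit.
firm_1_mean, firm_1_sd = 5.0, 0.2    # small ball-bearings for bicycles
firm_2_mean, firm_2_sd = 20.0, 0.5   # larger ball-bearings for motor cars

cv_1 = coefficient_of_variation(firm_1_mean, firm_1_sd)
cv_2 = coefficient_of_variation(firm_2_mean, firm_2_sd)

print(round(cv_1, 2), round(cv_2, 2))
```

In this made-up case Firm II has the larger standard deviation but the smaller coefficient of variation, so relative to the size of its product it is the more consistent of the two.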
Example 14 : The mean and standard deviation of family income (in US dollars) in
a year are given below for each of the three countries, A, B and C.
Country    A    B    C
We would like to know which of the three countries shows the highest disparity in
family income and which one shows the lowest.
We may compare the standard deviations of B and C, but A surely stands in a
different category, having a mean family income that is vastly higher than those of
B and C. The comparison should, therefore, be made in terms of the coefficients of
variation rather than the standard deviations. We have
With this, we bring this unit to a close. Here is a brief summary of our discussion.
2.6 SUMMARY
In this unit, we have discussed some features of univariate distributions. In particular,
we have seen that
1) the observations on a variable show a tendency to cluster around some point or
a small part of the range of variation. This is called the central tendency. An
average is a value which can be taken to be representative of the data. The
variation of the observations from the average is called dispersion.
2) there are different types of measures of central tendency:
the mean, the median and the mode.
We have seen how to compute these from raw data, grouped or ungrouped
frequency distributions.
We have noted that the mean, median and mode have certain algebraic
properties.
We have also discussed the relative advantages and disadvantages of these
measures.
3) there are various measures of dispersion :
the range, the mean deviation, the standard deviation.
We have noted the algebraic properties and relative merits and demerits of these
measures.
4) the coefficient of variation is used to compare dispersions of different data sets.
$\therefore F_i = n - F'_{i+1}$.
Similarly, $F'_i + F_{i-1} = n$ and hence
$F'_i = n - F_{i-1}$.
c) $F_k = F'_1 = n$.
E2) mean score = 838/15 = 55.87 (approximately).
$\therefore \bar{x} = 5\bar{u} + 32.5$
= -7.4395 + 32.5
= 25.06 years.
E5) a) mode = 7.77 years.
b) Now from E3 and E4, $\bar{x} - \hat{x} = 17.29$ years
and $\bar{x} - \tilde{x} = 4.59$ years.
So the empirical relation is not borne out in this case.
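E6) Let $\bar{x}_1$ be the mean and $n_1$ be the total frequency of the first set.
Let $\bar{x}_2$ be the mean and $n_2$ be the total frequency of the second set.
Let $\bar{x}$ be the composite mean.
Suppose $\bar{x}_1 < \bar{x}_2$. Then $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$,
so that $\bar{x} - \bar{x}_1 = \frac{n_2 (\bar{x}_2 - \bar{x}_1)}{n_1 + n_2} \ge 0$
and $\bar{x}_2 - \bar{x} = \frac{n_1 (\bar{x}_2 - \bar{x}_1)}{n_1 + n_2} \ge 0$.
$\therefore \bar{x}_1 \le \bar{x} \le \bar{x}_2$.
Similar proof if $\bar{x}_2 < \bar{x}_1$.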
E7) a) $\tilde{y} = g(\tilde{x})$, since the middlemost value of x corresponds to the middlemost
value of y.
b) If x is discrete, so is g(x) and the frequency of any value of x is also the
frequency for the corresponding value of y. Hence, the value of y with the
highest frequency corresponds to the value of x with the highest frequency,
i.e., $\hat{y} = g(\hat{x})$.
c) $\bar{y} \ne g(\bar{x})$ unless g is a linear function.
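$\therefore MD_a = \frac{1}{n} \sum |x_i - a| = \frac{1}{n} \sum |a - a| = 0$.
Similarly s = 0.
b) $y_i - \bar{y} = b(x_i - \bar{x})$, since $\bar{y} = a + b\bar{x}$.
$\therefore s_y^2 = \frac{1}{n} \sum (y_i - \bar{y})^2 = b^2 s_x^2$
$\therefore s_y = |b| s_x$, since standard deviation is the positive square root of
variance.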
where $a = \{x_{(1)} + x_{(n)}\}/2$.
Range                        Mean Deviation               Standard Deviation
Easy to compute              More difficult               More difficult
Does not reflect variation   Properly reflects variation  Properly reflects variation
properly
Greatly affected by the      Not unduly affected          Not unduly affected
presence of extreme values
Not amenable to algebraic    Not easily amenable to       Is amenable to algebraic
treatment                    algebraic treatment          treatment
E20) Yes, since the CVs are 7.07% and 9.43%, respectively.