BS Unit 3
Syllabus:
Measures of Dispersion – Range, Quartile Deviation, Mean Deviation & Standard Deviation with coefficients
Measures of Dispersion
Averages or measures of central tendency give us an idea of the concentration of observations about the central part of the distribution. In spite of their great utility in statistical analysis, they have their own limitations. If we are given only the average of a series of observations, we cannot form a complete idea of the distribution, since there may exist a number of distributions whose averages are the same but which differ widely from each other in a number of ways. The following example illustrates this point.
Series   Observations                    Total   Mean
A        15 15 15 15 15 15 15 15 15      135     15
B        11 12 13 14 15 16 17 18 19      135     15
C        6 10 13 14 15 17 19 20 21       135     15
All the three series A, B and C have the same size (n = 9) and same mean viz., 15. Thus, if we are given
that the mean of a series of 9 observations is 15, we cannot determine if we are talking of the series A, B
or C. In fact, any series of 9 items with total 135 will give mean 15. Thus, we may have a large number
of series with entirely different structures and compositions but having the same mean.
From the above illustration it is obvious that the measures of central tendency are inadequate to describe
the distribution completely. Thus the measures of central tendency must be supported and supplemented
by some other measures. One such measure is Dispersion.
Literal meaning of dispersion is “scatteredness”. We study dispersion to have an idea of the homogeneity
or heterogeneity of the distribution. In the above illustration, we say that the series A is stationary, i.e., it
is constant and shows no variability. Series B is slightly dispersed and series C is relatively more
dispersed. We say that series B is more homogeneous as compared with series C or the series C is more
heterogeneous than series B.
Definitions
“Dispersion or spread is the degree of the scatter or variation of the variables about a central value.” –
B.C. Brooks and W.F.L. Dick.
1. The measures of variation enable us to find out whether the average is representative of the data. Dispersion gives us an idea of the spread of the observations about an average value. If the dispersion is small, the data values lie close to the central (average) value, and the average may be regarded as reliable in the sense that it provides a fairly good estimate of the corresponding population average. If the dispersion is large, the data values deviate more from the central value, implying that the average is not representative of the data and hence not quite reliable.
2. The measures of variation help us to determine the causes and the nature of variation, so as to control the variation itself.
3. The relative measures of dispersion may be used to compare the variability or uniformity of two or more distributions, even if they are measured in different units.
4. The measures of variation are used for computing other statistical measures, which are used extensively in correlation, regression analysis, etc.
Measures of Dispersion
i. Range
ii. Quartile Deviation or Semi-Interquartile Range.
iii. Mean deviation
iv. Standard deviation.
v. Lorenz curve.
The first two measures, viz., Range and Quartile Deviation, are termed positional measures since they depend upon the values of the variable at particular positions of the distribution. The last measure, viz., the Lorenz curve, is a graphic method of studying variability.
RANGE
Range is the simplest of all the measures of dispersion. It is defined as the difference between the two extreme observations of the distribution. In other words, range is the difference between the largest (maximum) and the smallest (minimum) observation of the distribution. Thus,
$$\text{Range} = X_{\max} - X_{\min}$$
where $X_{\max}$ is the largest observation and $X_{\min}$ is the smallest observation of the variable values.
In the case of a grouped frequency distribution (for discrete values) or a continuous frequency distribution, range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class.
Range as defined in the above formula is an absolute measure of dispersion. A relative measure, termed the coefficient of range, is defined as:
$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$
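A minimal Python sketch of the range and the coefficient of range, applied to the three series A, B and C from the opening illustration. The function names `value_range` and `coefficient_of_range` are illustrative, not from any library. It shows that series sharing the same mean of 15 can have very different ranges.

```python
def value_range(xs):
    """Absolute measure: largest observation minus smallest observation."""
    return max(xs) - min(xs)


def coefficient_of_range(xs):
    """Relative measure: (Xmax - Xmin) / (Xmax + Xmin); undefined if Xmax + Xmin == 0."""
    x_max, x_min = max(xs), min(xs)
    return (x_max - x_min) / (x_max + x_min)


# The three series of the opening illustration (all have mean 15)
series = {
    "A": [15, 15, 15, 15, 15, 15, 15, 15, 15],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19],
    "C": [6, 10, 13, 14, 15, 17, 19, 20, 21],
}

ranges = {name: value_range(xs) for name, xs in series.items()}
coefficients = {name: coefficient_of_range(xs) for name, xs in series.items()}

for name in series:
    print(f"{name}: range = {ranges[name]}, coefficient of range = {coefficients[name]:.3f}")
```

The ranges come out as 0, 8 and 15, matching the description of A as constant, B as slightly dispersed and C as more dispersed.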
Example:
Merits and Demerits of Range
Merits
Demerits
Uses
QUARTILE DEVIATION
Quartile deviation is a measure of dispersion based on the upper quartile Q3 and the lower quartile Q1. It is obtained by dividing the inter-quartile range (Q3 − Q1) by 2 and hence is also known as the semi-inter-quartile range. Thus,
$$\text{Quartile Deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$$
The above measure is an absolute measure of dispersion. The relative measure of quartile deviation is known as the coefficient of quartile deviation and is given by:
$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
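Quartiles can be located in several slightly different ways. The Python sketch below assumes the (n + 1)/4 positional rule with linear interpolation, which is what the standard library's `statistics.quantiles(..., method="exclusive")` implements (it is the default method, Python 3.8+); other conventions can give slightly different Q1 and Q3. It is applied to series C from the opening illustration, and the helper name `quartile_deviation` is illustrative.

```python
import statistics


def quartile_deviation(xs):
    """Return (Q1, Q3, Q.D., coefficient of Q.D.) using the (n + 1)p positional rule."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    qd = (q3 - q1) / 2                    # semi-inter-quartile range
    coefficient = (q3 - q1) / (q3 + q1)   # relative measure
    return q1, q3, qd, coefficient


series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]
q1, q3, qd, coeff_qd = quartile_deviation(series_c)
print(f"Q1 = {q1}, Q3 = {q3}, Q.D. = {qd}, coefficient = {coeff_qd:.3f}")
```

For series C, Q1 is the 2.5th item (11.5) and Q3 the 7.5th item (19.5), so Q.D. = 4 and the coefficient is 8/31 ≈ 0.258.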
Example:
Merits and Demerits of Quartile Deviation
Merits
1. It is rigidly defined.
2. It is easy to understand and compute.
3. It is not affected by extreme observations and hence a suitable measure of dispersion when a
distribution is highly skewed.
4. It can be calculated even for a distribution with open ends.
Demerits
MEAN DEVIATION
An important requirement of a measure of dispersion is that it should be based on all the observations. Range and quartile deviation do not satisfy this requirement. Mean deviation is a measure of dispersion based on all the observations. It is defined as the arithmetic mean of the absolute deviations of the observations from a central value such as the mean, median or mode. Here the dispersion of each observation is measured by its deviation from the central value. This deviation is positive for an observation greater than the central value and negative for one less than it. If these deviations are added as such, the positive deviations tend to cancel the negative ones, and the resulting sum does not reflect the true magnitude of the total spread of the observations. This difficulty is overcome by taking the absolute values of the deviations.
The following are the formulae for the computation of mean deviation (M.D.) of an individual series of observations X1, X2, …, Xn:
1. $\text{M.D. from Mean} = \dfrac{\sum |X - \bar{X}|}{n}$
2. $\text{M.D. from Median} = \dfrac{\sum |X - \text{Median}|}{n}$
3. $\text{M.D. from Mode} = \dfrac{\sum |X - \text{Mode}|}{n}$
Example:
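Calculate the mean deviation from the mean and from the median for the following data of heights (in inches) of 10 persons.
Solution:
The heights, arranged in ascending order, are 60, 60, 62, 63, 63, 64, 65, 68, 69, 70.
$$\bar{X} = \frac{\sum X}{n} = \frac{644}{10} = 64.4 \text{ inches}$$
$$\text{M.D. from } \bar{X} = \frac{\sum |X - \bar{X}|}{n} = \frac{28.8}{10} = 2.88 \text{ inches}$$
$$\text{Coefficient of M.D. from } \bar{X} = \frac{\text{M.D. from } \bar{X}}{\bar{X}} = \frac{2.88}{64.4} = 0.045$$
Since n = 10 is even, the median is the mean of the 5th and 6th observations:
$$\text{Median} = \frac{63 + 64}{2} = 63.5 \text{ inches}$$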
X      |X − Median|
60     3.5
60     3.5
62     1.5
63     0.5
63     0.5
64     0.5
65     1.5
68     4.5
69     5.5
70     6.5
Total  ∑|X − Median| = 28
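$$\text{M.D. from Median} = \frac{\sum |X - \text{Median}|}{n} = \frac{28}{10} = 2.8 \text{ inches}$$
$$\text{Coefficient of M.D. from Median} = \frac{\text{M.D. from Median}}{\text{Median}} = \frac{2.8}{63.5} = 0.044$$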
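The worked example can be checked with a short Python sketch. The helper `mean_deviation` is a hypothetical name; it averages the absolute deviations from whichever central value is passed in, so the same function covers the mean, median and mode versions of the formula.

```python
import statistics


def mean_deviation(xs, centre):
    """Arithmetic mean of |X - centre| for an individual series."""
    return sum(abs(x - centre) for x in xs) / len(xs)


heights = [60, 60, 62, 63, 63, 64, 65, 68, 69, 70]  # inches, from the example

mean = statistics.mean(heights)      # 644 / 10 = 64.4
median = statistics.median(heights)  # (63 + 64) / 2 = 63.5

md_mean = mean_deviation(heights, mean)
md_median = mean_deviation(heights, median)

print(f"M.D. from mean   = {md_mean:.2f}, coefficient = {md_mean / mean:.3f}")
print(f"M.D. from median = {md_median:.2f}, coefficient = {md_median / median:.3f}")
```

The median gives a slightly smaller M.D. (2.8) than the mean (2.88); the sum of absolute deviations is always smallest when taken about the median.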
In the case of an ungrouped frequency distribution, the observations X1, X2, …, Xn occur with respective frequencies f1, f2, …, fn such that ∑f = N. The corresponding formulae for M.D. can be written as:
1. $\text{M.D. from Mean} = \dfrac{\sum f |X - \bar{X}|}{N}$
2. $\text{M.D. from Median} = \dfrac{\sum f |X - \text{Median}|}{N}$
3. $\text{M.D. from Mode} = \dfrac{\sum f |X - \text{Mode}|}{N}$
The above formulae are also applicable to a grouped frequency distribution, where X1, X2, …, Xn denote the mid-values of the first, second, …, nth classes respectively.
The above formulae for mean deviation give an absolute measure of dispersion. The formulae for the relative measure, termed the coefficient of mean deviation, are given below:
1. $\text{Coefficient of M.D. from Mean} = \dfrac{\text{M.D. from } \bar{X}}{\bar{X}}$
2. $\text{Coefficient of M.D. from Median} = \dfrac{\text{M.D. from Median}}{\text{Median}}$
3. $\text{Coefficient of M.D. from Mode} = \dfrac{\text{M.D. from Mode}}{\text{Mode}}$
Example:
Calculate the mean deviation from the mean and from the median for the following distribution of weekly wages:
(f = number of workers, X = mid-value of the class, u = (X − 475)/50)
Weekly wages   f     X     u    fu    c.f.   |X − X̅|   f|X − X̅|   |X − Md|   f|X − Md|
200 – 250      7     225   -5   -35   7      227.25    1590.75    230        1610
250 – 300      13    275   -4   -52   20     177.25    2304.25    180        2340
300 – 350      15    325   -3   -45   35     127.25    1908.75    130        1950
350 – 400      24    375   -2   -48   59     77.25     1854       80         1920
400 – 450      36    425   -1   -36   95     27.25     981        30         1080
450 – 500      50    475   0    0     145    22.75     1137.5     20         1000
500 – 550      25    525   1    25    170    72.75     1818.75    70         1750
550 – 600      10    575   2    20    180    122.75    1227.5     120        1200
600 – 650      8     625   3    24    188    172.75    1382       170        1360
650 – 700      6     675   4    24    194    222.75    1336.5     220        1320
700 – 750      4     725   5    20    198    272.75    1091       270        1080
750 – 800      2     775   6    12    200    322.75    645.5      320        640
Total          N = 200          ∑fu = -91              17277.5               17250
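$$\bar{X} = A + h \cdot \frac{\sum fu}{N} = 475 + 50 \times \frac{-91}{200} = 452.25$$
$$\text{M.D. from Mean} = \frac{\sum f |X - \bar{X}|}{N} = \frac{17277.5}{200} = 86.39$$
Since N/2 = 100, the median class is 450 – 500 (the first class whose c.f. is at least 100). With l = 450 (lower limit of the median class), f = 50 (its frequency), c = 95 (c.f. of the preceding class) and h = 50,
$$M_d = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 450 + \frac{50}{50}\left(\frac{200}{2} - 95\right) = 455$$
$$\text{M.D. from Median} = \frac{\sum f |X - M_d|}{N} = \frac{17250}{200} = 86.25$$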
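For a continuous frequency distribution the same steps can be sketched in Python, taking each class mid-value as X and locating the median by interpolation, $M_d = l + \frac{h}{f}(N/2 - c)$, where l is the lower limit of the median class, f its frequency, c the cumulative frequency before it and h the class width. The class limits and frequencies are those of the weekly-wages table; the helper names are illustrative.

```python
# Weekly-wages distribution from the example: class lower limits and frequencies
lowers = [200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750]
freqs = [7, 13, 15, 24, 36, 50, 25, 10, 8, 6, 4, 2]
h = 50  # common class width


def grouped_median(lowers, freqs, h):
    """Interpolated median l + (h / f) * (N/2 - c) for the first class whose c.f. reaches N/2."""
    half = sum(freqs) / 2
    c = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lowers, freqs):
        if c + f >= half:
            return l + (h / f) * (half - c)
        c += f
    raise ValueError("empty distribution")


def grouped_mean_deviation(mids, freqs, centre):
    """Sum of f * |X - centre| divided by N, with X the class mid-values."""
    return sum(f * abs(x - centre) for x, f in zip(mids, freqs)) / sum(freqs)


mids = [l + h // 2 for l in lowers]  # 225, 275, ..., 775
N = sum(freqs)
mean = sum(f * x for x, f in zip(mids, freqs)) / N
median = grouped_median(lowers, freqs, h)
md_mean = grouped_mean_deviation(mids, freqs, mean)
md_median = grouped_mean_deviation(mids, freqs, median)

print(f"Mean = {mean}, M.D. from mean = {md_mean:.2f}")
print(f"Median = {median}, M.D. from median = {md_median:.2f}")
```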
Example:
Calculate mean deviation from median and its coefficient from the given data:
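X : 0 1 2 3 4 5 6 7 8 9
f : 15 45 91 162 110 95 82 26 13 2
Solution:
Here N = 641 and N/2 = 320.5. The cumulative frequencies are 15, 60, 151, 313, 423, 518, 600, 626, 639, 641. The first c.f. exceeding 320.5 is 423, which corresponds to X = 4. Hence Md = 4.
$$\text{M.D. from Median} = \frac{\sum f |X - M_d|}{N} = \frac{938}{641} = 1.46$$
$$\text{Coefficient of M.D. from Median} = \frac{\text{M.D. from Median}}{\text{Median}} = \frac{1.46}{4} = 0.37 \text{ (approx.)}$$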
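A corresponding Python sketch for a discrete frequency distribution, assuming the rule used in the solution (the median is the first X whose cumulative frequency reaches N/2). The data are those of the example; the function name `discrete_median` is illustrative.

```python
xs = list(range(10))  # X = 0, 1, ..., 9
fs = [15, 45, 91, 162, 110, 95, 82, 26, 13, 2]


def discrete_median(xs, fs):
    """First X whose cumulative frequency reaches N/2."""
    half = sum(fs) / 2
    cum = 0
    for x, f in zip(xs, fs):
        cum += f
        if cum >= half:
            return x
    raise ValueError("empty distribution")


N = sum(fs)
md = discrete_median(xs, fs)
total_abs_dev = sum(f * abs(x - md) for x, f in zip(xs, fs))  # sum of f|X - Md|
md_median = total_abs_dev / N
coeff = md_median / md  # coefficient of M.D. from median

print(f"N = {N}, Md = {md}, M.D. = {md_median:.2f}, coefficient = {coeff:.2f}")
```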
Demerits
1. It is not capable of further mathematical treatment. Since mean deviation is the arithmetic mean
of absolute value of deviations, it is not very convenient to be algebraically manipulated. This
necessitates a search for a measure of dispersion which is capable of being subjected to further
mathematical treatment.
2. It is not a well-defined measure of dispersion since deviations can be taken from any measure of
central tendency.
Uses
The mean deviation is a very useful measure of dispersion when the sample is small and no elaborate analysis of the data is needed. Since the standard deviation gives more weight to extreme observations, the mean deviation is preferred in the statistical analysis of certain economic, business and social phenomena.
STANDARD DEVIATION
Standard deviation, usually denoted by σ (small sigma), a letter of the Greek alphabet, was first suggested by Karl Pearson as a measure of dispersion in 1893. It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean. The square of the standard deviation is known as the variance. Thus, if X1, X2, …, Xn is a set of n observations, its standard deviation is given by:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
where $\bar{X} = \dfrac{\sum X}{n}$ is the arithmetic mean of the n observations.
If X̅ is a whole number, the above method is appropriate. If X̅ is not a whole number, the S.D. can be computed more conveniently by using the following transformed form of the above formula:
$$\sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$
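The two formulas above are algebraically identical, which a short Python sketch can confirm on the heights data from the mean deviation example (X̅ = 64.4, not a whole number). The function names are illustrative. The standard library's `statistics.pstdev` also divides by n, so it serves as a cross-check; `statistics.stdev` divides by n − 1 instead and would give a different value.

```python
import math
import statistics

heights = [60, 60, 62, 63, 63, 64, 65, 68, 69, 70]


def sd_direct(xs):
    """sqrt( sum (X - mean)^2 / n ): deviations taken from the actual mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def sd_transformed(xs):
    """sqrt( sum X^2 / n - (sum X / n)^2 ): no deviations needed."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


direct = sd_direct(heights)
transformed = sd_transformed(heights)
library = statistics.pstdev(heights)  # population S.D., divisor n

print(f"{direct:.4f} {transformed:.4f} {library:.4f}")
```

In floating-point arithmetic the transformed form can lose precision when the values are large compared with their spread, since it subtracts two nearly equal quantities; for hand calculation it is the more convenient one.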
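In the case of a frequency distribution, the standard deviation is given by
$$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$$
where X is the value of the variable or the mid-value of the class (in the case of grouped or continuous frequency distributions), f is the corresponding frequency of the value X, N = ∑f is the total frequency, and
$$\bar{X} = \frac{\sum fX}{N}$$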
The S.D. of an ungrouped or grouped frequency distribution can also be computed using the formula:
$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$$
The variance is given by σ².
Example:
Calculate the standard deviation from the following data:
X: 10 11 12 13 14 15 16 17 18
f: 2 7 10 12 15 11 10 6 3
Solution:
X      X²     f     fX      fX²
10     100    2     20      200
11     121    7     77      847
12     144    10    120     1440
13     169    12    156     2028
14     196    15    210     2940
15     225    11    165     2475
16     256    10    160     2560
17     289    6     102     1734
18     324    3     54      972
Total         76    1064    15196
$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{15196}{76} - \left(\frac{1064}{76}\right)^2} = \sqrt{199.95 - (14)^2} = \sqrt{3.95} = 1.99$$
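The same computation for the frequency distribution above, as a Python sketch of the formula σ = √(∑fX²/N − (∑fX/N)²). Carrying full precision gives σ ≈ 1.987, which agrees with the hand-worked 1.99 once rounded.

```python
import math

xs = [10, 11, 12, 13, 14, 15, 16, 17, 18]
fs = [2, 7, 10, 12, 15, 11, 10, 6, 3]

N = sum(fs)                                      # total frequency
sum_fx = sum(f * x for x, f in zip(xs, fs))      # sum of fX
sum_fx2 = sum(f * x * x for x, f in zip(xs, fs)) # sum of fX^2

mean = sum_fx / N
variance = sum_fx2 / N - mean ** 2
sigma = math.sqrt(variance)

print(f"N = {N}, mean = {mean}, variance = {variance:.4f}, S.D. = {sigma:.4f}")
```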
The formulae to calculate the standard deviation using the short-cut method are given below:
1. $\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$, where d = X − A and A is the assumed mean, usually taken as a value lying in the middle portion of the X values.
2. $\sigma = h \times \sqrt{\dfrac{\sum fu^2}{N} - \left(\dfrac{\sum fu}{N}\right)^2}$, where $u = \dfrac{X - A}{h}$, h is the class width or a common factor of the X values, and A is the assumed mean.
Example:
$\therefore \sigma_X = 14.24$
Coefficient of Variation
The standard deviation is an absolute measure of dispersion and is expressed in the same units as the variable X. A relative measure of dispersion based on the standard deviation is known as the coefficient of standard deviation and is given by
$$\text{Coefficient of Standard Deviation} = \frac{\sigma}{\bar{X}}$$
The coefficient of standard deviation expressed as a percentage is called the coefficient of variation. This measure, introduced by Karl Pearson, is used to compare the variability (or homogeneity, stability, uniformity, consistency) of two or more sets of data. The data set with the higher coefficient of variation is said to be more dispersed, i.e., less uniform or consistent.
$$\text{Coefficient of Variation} = \frac{\sigma}{\bar{X}} \times 100$$
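As an illustration of using the coefficient of variation for comparison, the Python sketch below applies it to series B and C from the opening illustration. Both have mean 15, so the comparison is driven entirely by σ, which is computed with divisor n via `statistics.pstdev` to match the definition in this unit. The function name is illustrative.

```python
import statistics

series_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]
series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]


def coefficient_of_variation(xs):
    """(population S.D. / mean) * 100, expressed as a percentage."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100


cv_b = coefficient_of_variation(series_b)
cv_c = coefficient_of_variation(series_c)
print(f"CV of B = {cv_b:.2f}%, CV of C = {cv_c:.2f}%")
```

Series C gives the larger coefficient of variation (about 30.8% against about 17.2% for B), agreeing with the earlier remark that C is more heterogeneous than B.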
Example:
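Calculate the standard deviation and its coefficient of variation from the following data:
Measurements : 0 – 5   5 – 10   10 – 15   15 – 20   20 – 25
Frequency    : 4       1        10        3         2
Solution:
Calculation of X̅ and σ
Here A = 12.5 and h = 5, so $u = \dfrac{X - 12.5}{5}$.
Measurements   f        X       u     fu          fu²
0 – 5          4        2.5     -2    -8          16
5 – 10         1        7.5     -1    -1          1
10 – 15        10       12.5    0     0           0
15 – 20        3        17.5    1     3           3
20 – 25        2        22.5    2     4           8
Total          N = 20                 ∑fu = -2    ∑fu² = 28
$$\bar{X} = A + \frac{\sum fu}{N} \times h = 12.5 + \frac{-2}{20} \times 5 = 12$$
$$\sigma = h \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} = 5 \times \sqrt{\frac{28}{20} - \left(\frac{-2}{20}\right)^2} = 5.89$$
$$\text{Coefficient of Variation (CV)} = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.89}{12} \times 100 = 49\% \text{ (approx.)}$$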
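The step-deviation calculation in this example can be reproduced with the Python sketch below. The mid-values, A = 12.5 and h = 5 are those used in the solution; full precision gives σ ≈ 5.895 and a coefficient of variation of about 49.1%, consistent with the rounded values above.

```python
import math

mids = [2.5, 7.5, 12.5, 17.5, 22.5]  # class mid-values
fs = [4, 1, 10, 3, 2]
A, h = 12.5, 5                        # assumed mean and class width

us = [(x - A) / h for x in mids]      # step deviations u = (X - A) / h
N = sum(fs)
sum_fu = sum(f * u for u, f in zip(us, fs))
sum_fu2 = sum(f * u * u for u, f in zip(us, fs))

mean = A + h * sum_fu / N
sigma = h * math.sqrt(sum_fu2 / N - (sum_fu / N) ** 2)
cv = sigma / mean * 100

print(f"mean = {mean}, S.D. = {sigma:.3f}, CV = {cv:.1f}%")
```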
Merits
Demerits
Uses
1. Standard deviation can be used to compare the dispersions of two or more distributions when their units of measurement and arithmetic means are the same.
2. It is used to test the reliability of the mean: of two distributions, the mean of the one with the lower S.D. is said to be more reliable.
SKEWNESS
Skewness of a distribution refers to its asymmetry. Symmetry of a distribution implies that, for a given deviation from a central value, there are equal numbers of observations on either side of it. If the distribution is asymmetrical or skewed, its frequency curve has a prolonged tail either towards the left or towards the right-hand side. Thus, the skewness of a distribution is defined as its departure from symmetry.
In a symmetrical distribution, the mean, median and mode are equal, and the ordinate at the mean divides the frequency curve into two parts such that one part is the mirror image of the other. Positive skewness results if some observations of high magnitude are added to a symmetrical distribution so that the right-hand tail of the frequency curve gets elongated. In such a situation, we have Mode < Median < Mean. Similarly, negative skewness results when some observations of low magnitude are added to the distribution so that the left-hand tail of the frequency curve gets elongated, and we have Mode > Median > Mean.
Measure of Skewness
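A measure of skewness gives the extent and direction of skewness of a distribution. As in the case of dispersion, we can define absolute and relative measures of skewness. Two measures of skewness are:
Karl Pearson's measure: This measure was suggested by Karl Pearson. According to this method, the difference between X̅ and the mode (Mo) can be taken as an absolute measure of skewness in a distribution, i.e., absolute measure of skewness = X̅ − Mo.
Alternatively, when the mode is ill-defined and the distribution is moderately skewed, the above measure can be approximately expressed as 3(X̅ − Median). This follows from the empirical relation Mode ≈ 3 Median − 2 Mean, which gives Mean − Mode ≈ 3(Mean − Median).
The corresponding relative measures, known as Karl Pearson's coefficient of skewness, are:
$$Sk = \frac{\bar{X} - M_o}{\sigma} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$
$$Sk = \frac{3(\bar{X} - M_d)}{\sigma} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$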
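Karl Pearson's coefficient based on the median can be sketched in Python for the weekly-wages distribution used in the mean deviation example. The notes do not compute σ for that distribution, so the sketch derives it with the step-deviation formula (A = 475, h = 50, as in the mean calculation there); the resulting σ and Sk are this sketch's own numbers, not values quoted from the notes. The helper name `grouped_median` is illustrative.

```python
import math

# Weekly-wages distribution (class lower limits, frequencies) from the mean deviation example
lowers = [200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750]
freqs = [7, 13, 15, 24, 36, 50, 25, 10, 8, 6, 4, 2]
h, A = 50, 475  # class width and assumed mean

N = sum(freqs)
us = [(l + h / 2 - A) / h for l in lowers]  # step deviations of the mid-values
sum_fu = sum(f * u for u, f in zip(us, freqs))
sum_fu2 = sum(f * u * u for u, f in zip(us, freqs))

mean = A + h * sum_fu / N
sigma = h * math.sqrt(sum_fu2 / N - (sum_fu / N) ** 2)


def grouped_median(lowers, freqs, h):
    """Interpolated median l + (h / f) * (N/2 - c)."""
    half = sum(freqs) / 2
    c = 0  # cumulative frequency before the current class
    for l, f in zip(lowers, freqs):
        if c + f >= half:
            return l + (h / f) * (half - c)
        c += f
    raise ValueError("empty distribution")


median = grouped_median(lowers, freqs, h)
sk_pearson = 3 * (mean - median) / sigma  # Sk = 3(Mean - Median) / S.D.

print(f"Mean = {mean}, Median = {median}, S.D. = {sigma:.2f}, Sk = {sk_pearson:.3f}")
```

The coefficient comes out small and negative (about −0.073), since the mean (452.25) lies slightly below the median (455).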
Bowley's measure: This measure, suggested by Bowley, is based upon the fact that Q1 and Q3 are equidistant from the median of a symmetrical distribution, i.e., Q3 − Md = Md − Q1. Therefore, (Q3 − Md) − (Md − Q1) can be taken as an absolute measure of skewness.
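Bowley's measure can be sketched in Python with quartiles located by the (n + 1)p positional rule with linear interpolation, which is what `statistics.quantiles(..., method="exclusive")` implements. Besides the absolute measure stated above, the sketch divides it by (Q3 − Md) + (Md − Q1) = Q3 − Q1 to obtain the usual relative form, Bowley's coefficient (Q3 + Q1 − 2Md)/(Q3 − Q1); that relative form is a standard result not written out in this section. Series C from the opening illustration is used as data, and the function name is illustrative.

```python
import statistics


def bowley_skewness(xs):
    """Return (absolute, relative) Bowley measures of skewness."""
    q1, md, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    absolute = (q3 - md) - (md - q1)
    # Relative form; lies between -1 and +1, undefined when Q3 == Q1
    relative = (q3 + q1 - 2 * md) / (q3 - q1)
    return absolute, relative


series_c = [6, 10, 13, 14, 15, 17, 19, 20, 21]
absolute, relative = bowley_skewness(series_c)
print(f"absolute = {absolute}, coefficient = {relative}")
```

For series C, Q3 − Md = 4.5 exceeds Md − Q1 = 3.5, giving a small positive coefficient of 0.125.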
Example: