Dispersion 26-11-2023


 The degree to which numerical data tend to spread about an average value is called variation or dispersion.
 Measures of central tendency are statistical methods that give a single value which represents a given series. These measures are very useful in statistical analysis, but they suffer from one very serious limitation:
 They tell us nothing about the variation in the series. In this way these measures ignore a very important aspect of a series, i.e., its variation.
 Measures of central tendency may suggest that several series are similar when in fact their structures are quite different. To understand this, consider the following example of the scores of three batsmen in five matches:
Match    I: Virat Kohli    II: Yuvraj Singh    III: Mahi
I        32                10                  42
II       25                60                  38
III      43                0                   40
IV       55                30                  39
V        45                100                 41
Total    200               200                 200
Mean     X‾ = 200/5 = 40   X‾ = 200/5 = 40     X‾ = 200/5 = 40
 The three series have the same mean (40 runs), but their structures are different. If we compared the performance of the three players on the basis of the mean alone, we would judge their performance to be similar. However, this is not so. The variation in the third distribution is small; hence Mahi's performance is very consistent, i.e., he is a very reliable batsman.
 The variation in the second distribution is large; hence Yuvraj Singh's performance is not consistent, i.e., he is not a reliable batsman.
 The variation in the first distribution is neither too large nor too small.
 Measures of central tendency ignore these variations in the different distributions.
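As a quick check, here is a minimal Python sketch that reproduces the table. The dictionary layout and names are illustrative, not from the source. Every player averages the same score, yet the gap between the highest and lowest score differs sharply.

```python
from statistics import mean

# Runs scored in matches I-V, copied from the table above.
scores = {
    "Virat Kohli": [32, 25, 43, 55, 45],
    "Yuvraj Singh": [10, 60, 0, 30, 100],
    "Mahi": [42, 38, 40, 39, 41],
}

for player, runs in scores.items():
    avg = mean(runs)                  # the same central value for every player
    spread = max(runs) - min(runs)    # very different spread of scores
    print(f"{player:13} mean = {avg}  spread = {spread}")
```

Running it prints a mean of 40 for all three players, with spreads of 30, 100 and 4 runs respectively.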
 Dispersion: scatter / spread / variation.
 Why dispersion? To
 (i) check the reliability of the mean,
 (ii) compare two or more series, and
 (iii) study the variation in a given series,
statisticians have developed a technique called dispersion.
 Definition: By dispersion we mean the variability in a given series about its central value (average).
 Different Measures of Dispersion:
 These measures are divided into two categories.
Measures of Dispersion
  Absolute Measures: expressed in absolute terms (the original units of the data).
  Relative Measures: expressed in relation to each other, e.g., as a ratio or percentage [called the COEFFICIENT OF DISPERSION].
 Objectives of measuring dispersion:
 To control the variation of the data from the central value.
 To find the average distance of the items from an average.
 To know the reliability of an average. When the dispersion is small, the average is reliable.
 To control the variability itself.
 To compare two or more series on the basis of their variability.
 To obtain other statistical measures for further analysis of the data.
 Over time, statisticians have developed the following measures of dispersion:
 Algebraic measures: listed in the chart below.
 Graphic measure: the Lorenz curve.
 The absolute measures of dispersion are independent of change of origin but dependent on change of scale.
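The origin/scale property can be illustrated with a small Python sketch. It uses the range and the standard deviation. It assumes `statistics.pstdev`, which divides by n, is the appropriate form of the S.D. The helper `value_range` is hypothetical and only for illustration.

```python
import math
from statistics import pstdev

def value_range(data):
    """Range = largest value - smallest value (hypothetical helper)."""
    return max(data) - min(data)

x = [32, 25, 43, 55, 45]          # Virat Kohli's scores from the example
shifted = [v + 100 for v in x]    # change of origin: add a constant to every value
scaled = [3 * v for v in x]       # change of scale: multiply every value by 3

# Change of origin leaves the absolute measures unchanged.
print(value_range(x), value_range(shifted))
print(math.isclose(pstdev(x), pstdev(shifted)))

# Change of scale multiplies the absolute measures by the same factor.
print(value_range(scaled))
print(math.isclose(pstdev(scaled), 3 * pstdev(x)))
```

Adding 100 leaves the range at 30, while multiplying by 3 turns it into 90. The standard deviation behaves the same way.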
Measures of Dispersion
  Absolute Measures: Range; Quartile Deviation; Mean Deviation; Standard Deviation
  Relative Measures: Coefficient of Range; Coefficient of Quartile Deviation; Coefficient of Mean Deviation; Coefficient of Standard Deviation
 Absolute Measures: measures of dispersion that are expressed in the original units of the distribution are called absolute measures of dispersion.
 These measures are not suitable for comparing the variability of two or more distributions expressed in different units.
 Relative Measures: these are independent of the original units of the variable and are generally expressed as pure numbers, percentages, etc. We use these measures to compare the variability of two or more series.
A measure of dispersion is called a good measure if it possesses the following characteristics:
(i) It should be simple to calculate and easy to understand.
(ii) It should be rigidly defined.
(iii) It should be based on all the observations of the series.
(iv) It should be capable of further algebraic treatment.
(v) It should be familiar to common people.
(vi) It should not be affected by fluctuations of sampling.
(vii) It should be possible to compute it even in the case of open-end class intervals.
(viii) It should not be unduly affected by the presence of extreme items.
 Range is the difference between the maximum and the minimum value:
 Range = Xmax – Xmin
 where Xmax = L is the greatest observation and Xmin = S is the smallest observation of the variable values.
 In the case of a grouped frequency distribution (for discrete values) or a continuous frequency distribution, range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class.
 NOTE: In the case of a frequency distribution, the frequencies of the various values of the variable (or classes) are immaterial, since range depends only on the two extreme observations.
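Range can be sketched in Python for both individual observations and a continuous frequency distribution. The function names and the class-interval table below are hypothetical, made up for illustration. Note that the frequencies play no part in the grouped case.

```python
def range_of_values(values):
    """Range = Xmax - Xmin (L - S) for individual observations."""
    return max(values) - min(values)

def range_of_classes(classes):
    """Range for a continuous frequency distribution.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    Only the lowest lower limit and the highest upper limit are used;
    the frequencies are ignored.
    """
    lowest_lower = min(lower for lower, _, _ in classes)
    highest_upper = max(upper for _, upper, _ in classes)
    return highest_upper - lowest_lower

print(range_of_values([32, 25, 43, 55, 45]))   # 55 - 25 = 30

# Hypothetical class intervals 0-10, ..., 40-50 with made-up frequencies.
table = [(0, 10, 5), (10, 20, 8), (20, 30, 15), (30, 40, 16), (40, 50, 6)]
print(range_of_classes(table))                  # 50 - 0 = 50
```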
 Range is the simplest, though crude, measure of dispersion. It is rigidly defined, readily comprehensible and perhaps the easiest to compute, requiring very little calculation.
 However, it does not satisfy the properties (iii) to (vi) for an
ideal measure of dispersion.
 Limitations and drawbacks.
 (i) Range is not based on the entire set of data. It is based only on the two extreme observations, which are themselves subject to chance fluctuations. As such, range cannot be regarded as a reliable measure of variability.
 (ii) Range is very much affected by fluctuations of
sampling. Its value varies very widely from sample to
sample.
 (iii) If the smallest and the largest observations of a distribution are unaltered and all other values are replaced by a set of observations lying between these values (i.e., between Xmin and Xmax), the range of the distribution remains the same.
 Thus range does not take into account the
composition of the series or the distribution of the
observations within the extreme values.
Consequently, it is fairly unreliable as a measure of
dispersion of the values within the distribution.
 (iv) Range cannot be used if we are dealing with open-end classes.
 (v) Range is not suitable for mathematical treatment.
 (vi) Another shortcoming of the range, though less
important is that it is very sensitive to the size of the
sample. As the sample size increases, the range tends
to increase though not proportionately.
 (vii) In the words of W.I. King “Range is too indefinite
to be used as a practical measure of dispersion.”
 Uses.
 (1) In spite of the above limitations and shortcomings, range as a measure of dispersion has applications in a number of fields where the data have small variations, such as stock market fluctuations, variations in money rates and rates of exchange.
 (2) Range is used in industry for the statistical quality
control of the manufactured product by the construction
of R-chart i.e., the control chart for range.
 (3) Range is by far the most widely used measure of variability in our day-to-day life. For example, the answers to problems like ‘daily sales in a departmental store’, ‘monthly wages of workers in a factory’ or ‘the expected return of fruits from an orchard’ are usually given as probable limits, in the form of a range.
 (4) Range is also used as a very convenient measure by the meteorological department for weather forecasts, since the public is primarily interested in knowing the limits within which the temperature is likely to vary on a particular day.
 Absolute and Relative Measures of Range:
 If we want to compare the variability of two or
more distributions with the same units of
measurement, we may use absolute measure
i.e., Range = Xmax – Xmin.
 However, to compare the variability of the
distributions given in different units of
measurement, we cannot use absolute
measure but we need a relative measure
which is independent of the units of
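measurement. This relative measure, called the coefficient of range, is defined as follows:
$$\text{Coefficient of Range} = \frac{X_{max} - X_{min}}{X_{max} + X_{min}} = \frac{L - S}{L + S}$$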
 Range is a positional measure of dispersion.
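A minimal Python sketch shows why the coefficient of range is needed. The height and weight figures below are hypothetical, chosen only so that both absolute ranges come out to the same number in different units.

```python
def coefficient_of_range(values):
    """(Xmax - Xmin) / (Xmax + Xmin): a unit-free measure."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# Hypothetical data: heights in cm and weights in kg of the same four people.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 80]

# Both ranges equal 30, but in different units, so they cannot be compared directly.
print(max(heights_cm) - min(heights_cm), max(weights_kg) - min(weights_kg))

# The coefficients are pure numbers and can be compared.
print(round(coefficient_of_range(heights_cm), 3))   # 30 / 330
print(round(coefficient_of_range(weights_kg), 3))   # 30 / 130
```

The weights, with a coefficient of about 0.231 against about 0.091 for the heights, are relatively more variable.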


 It is a measure of dispersion based on the upper quartile
Q3 and the lower quartile Q1.
 Inter-quartile Range = Q3 – Q1
 Quartile deviation is obtained from inter-quartile range on
dividing by 2 and hence is also known as semi inter-
quartile range. Thus
 Quartile Deviation (Q.D.) = (Q3 – Q1)/2.
 Q.D. as defined is only an absolute measure of dispersion.
 For comparative studies of the variability of two distributions we need a relative measure, known as the Coefficient of Quartile Deviation, which is given by:
$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
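Quartile deviation and its coefficient can be computed in Python as sketched below. The 11 observations are hypothetical. The sketch assumes the convention Qk = size of the k(n + 1)/4-th item. This is what `statistics.quantiles` uses by default (`method="exclusive"`). Other textbook conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical individual series of 11 observations.
data = [20, 28, 40, 12, 30, 15, 50, 35, 18, 25, 45]

# quantiles(..., n=4) returns [Q1, Q2, Q3]; the default "exclusive" method
# places Qk at the k(n + 1)/4-th item of the sorted data, interpolating if needed.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1                     # inter-quartile range
qd = iqr / 2                      # quartile deviation (semi-inter-quartile range)
coeff_qd = (q3 - q1) / (q3 + q1)  # coefficient of quartile deviation

print(q1, q3, qd, round(coeff_qd, 3))
```

With 11 values the quartile positions fall exactly on the 3rd and 9th items of the sorted data, so no interpolation is needed here.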


 Merits.
 Quartile deviation is quite easy to understand and
calculate. It has a number of obvious advantages over
range as a measure of dispersion. For example :
 (a) As against range which was based on two
observations only, Q.D. makes use of 50% of the data and
as such is obviously a better measure than range.
 (b) Since Q.D. ignores 25% of the data from the beginning
of the distribution and another 25% of the data from the
top end, it is not affected at all by extreme observations.
 (c) Q.D. can be computed from the frequency distribution
with open end classes. In fact, Q.D. is the only measure of
dispersion which can be obtained while dealing with a
distribution having open end classes.
 Demerits.
 (i) Q.D. is not based on all the observations, since it ignores 25% of the data at the lower end and 25% of the data at the upper end of the distribution. Hence, it cannot be regarded as a reliable measure of variability.
 (ii) Q.D. is affected considerably by fluctuations of sampling.
 (iii) Q.D. is not suitable for further mathematical treatment.
 Thus quartile deviation is not a reliable measure
of variability, particularly for distributions in
which the variation is considerable.
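 Mean Deviation is also known as average deviation. Here the deviations are taken from an average, usually the Mean, Median or Mode. While taking the deviations we ignore the negative signs and treat all of them as positive. The formula is given below:
$$\text{M.D.} = \frac{\sum |X - A|}{n}$$
where A is the average (Mean, Median or Mode) from which the deviations are taken and n is the number of observations.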
 Computation of Mean Deviation.
 If X1, X2, …, Xn are n given observations, then the mean deviation (M.D.) about an average A, say, is given by:
$$\text{M.D.} = \frac{1}{n}\sum_{i=1}^{n} |X_i - A| = \frac{\sum |d|}{n}$$
 where | d | = | X – A |, read as mod (X – A), is the modulus or absolute value of the deviation d = X – A (obtained by ignoring the negative sign), Σ | d | is the sum of these absolute deviations, and A is any one of the averages Mean (M), Median (Md) and Mode (Mo).
 1. Calculate the average A of the distribution by the usual methods.
 2. Take the deviation d = X – A of each observation from the average A.
 3. Ignore the negative signs of the deviations, taking all the deviations to be positive, to obtain the absolute deviations | d | = | X – A |.
 4. Obtain the sum of the absolute deviations obtained in step 3.
 5. Divide the total obtained in step 4 by n, the number of observations.
 The result gives the value of the mean deviation about the average A.
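Steps 1 to 5 for individual observations can be sketched in Python as follows. The function name is hypothetical. Virat Kohli's scores from the batting example are used, and the mean deviation is taken both about the mean and about the median.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """M.D. = sum(|X - A|) / n for individual observations."""
    abs_devs = [abs(x - about) for x in values]   # steps 2-3: |d| = |X - A|
    return sum(abs_devs) / len(values)            # steps 4-5: total, divide by n

kohli = [32, 25, 43, 55, 45]   # scores from the batting example

md_mean = mean_deviation(kohli, mean(kohli))      # about the mean, A = 40
md_median = mean_deviation(kohli, median(kohli))  # about the median, A = 43
print(md_mean, md_median)
```

The result is 46/5 = 9.2 about the mean and 43/5 = 8.6 about the median. The sum of absolute deviations is always least when taken about the median.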
 In the case of a discrete frequency distribution or a grouped (continuous) frequency distribution, the mean deviation about an average A is given by:
$$\text{M.D.} = \frac{1}{N}\sum f\,|X - A|$$
 where X is the value of the variable, or the mid-value of the class interval (in the case of a grouped or continuous frequency distribution),
 f is the corresponding frequency,
 N = Σf is the total frequency, and
 | X – A | is the absolute value of the deviation d = (X – A) of the given values of X from the average A (Mean, Median or Mode).

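 1. Calculate the average A of the distribution, using the mid-values X of the classes for grouped data.
 2. Take the deviation d = X – A of each value of X from the average A.
 3. Ignore the negative signs to obtain the absolute deviations | d | = | X – A |.
 4. Multiply the absolute deviations | d | = | X – A | by the corresponding frequency f to get f | d |.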
 5. Take the total of products in step 4 to
obtain Σ f | d |.
 6. Divide the total in step 5 by N, the total
frequency.
 The resulting value is the mean deviation
about the average A.
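For a continuous frequency distribution, the steps above might look like this in Python. The class intervals and frequencies are hypothetical. The sketch takes the arithmetic mean as the average A, although the median or mode could be used instead.

```python
# Hypothetical continuous frequency distribution: (lower, upper, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 15), (30, 40, 16), (40, 50, 6)]

mids = [(lower + upper) / 2 for lower, upper, _ in classes]  # X = class mid-values
freqs = [f for _, _, f in classes]
N = sum(freqs)                                               # N = sum of f

# Step 1: the average A, here the arithmetic mean sum(fX) / N.
A = sum(f * x for f, x in zip(freqs, mids)) / N

# Steps 2-5: |d| = |X - A|, multiply each by f, and total.
sum_f_abs_d = sum(f * abs(x - A) for f, x in zip(freqs, mids))

# Step 6: divide the total by N.
md = sum_f_abs_d / N
print(A, round(md, 2))
```

Here A = 1350/50 = 27 and Σf|d| = 472, so the mean deviation about the mean is 472/50 = 9.44.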

 Merits.
 (i) Mean deviation is rigidly defined and is easy to understand and calculate.
 (ii) Mean deviation is based on all the observations
and is thus definitely a better measure of dispersion
than the range and quartile deviation.
 (iii) The averaging of the absolute deviations from an
average irons out the irregularities in the distribution
and thus mean deviation provides an accurate and
true measure of dispersion.
 (iv) As compared with standard deviation, it is less
affected by extreme observations.
 (v) Since mean deviation is based on the deviations about an average, it provides a better measure for comparing the formation (structure) of different distributions.
 Demerits.
 (i) The strongest objection against mean deviation is that while computing its value we take the absolute values of the deviations about an average and ignore their signs.
 (ii) The step of ignoring the signs of the deviations is
mathematically unsound and illogical. It creates artificiality and
renders mean deviation useless for further mathematical
treatment. This drawback necessitates the requirement of
another measure of variability which, in addition to being based
on all the observations is also amenable to further algebraic
manipulations.
 (iii) It is not a satisfactory measure when taken about the mode or when dealing with a fairly skewed distribution. Theoretically, mean deviation gives the best result (its least value) when it is calculated about the median. But the median is not a satisfactory measure when the distribution has great variations.
 (iv) It is rarely used in sociological studies.
 (v) It cannot be computed for distributions with open end
classes.
 (vi) Mean deviation tends to increase with the size of the sample
though not proportionately and not so rapidly as range.
 Uses. In spite of its mathematical drawbacks, mean deviation has found favor with economists and business statisticians because of its simplicity and accuracy, and because standard deviation gives greater weight to the deviations of extreme observations. Mean deviation is frequently useful in computing the distribution of personal wealth in a community or a nation, since for this both extremely rich and extremely poor people should be taken into consideration. Regarding the practical utility of mean deviation as a measure of variability, it is worth noting that in studies relating to forecasting business cycles, the National Bureau of Economic Research found mean deviation to be the most practical measure of dispersion for this purpose.
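 The relative measure of dispersion, called the coefficient of mean deviation, is given by:
$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{A}$$
where A is the average (Mean, Median or Mode) about which the mean deviation is calculated.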
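As a small sketch under the same assumptions as before, the coefficient can be computed in Python by dividing the mean deviation by the average it was taken about. The function name is hypothetical, and the scores are Virat Kohli's from the batting example. The ratio is undefined when the average is zero.

```python
from statistics import mean, median

kohli = [32, 25, 43, 55, 45]   # scores from the batting example

def coefficient_of_md(values, about):
    """Coefficient of M.D. = (M.D. about A) / A."""
    md = sum(abs(x - about) for x in values) / len(values)
    return md / about

print(round(coefficient_of_md(kohli, mean(kohli)), 3))    # 9.2 / 40
print(round(coefficient_of_md(kohli, median(kohli)), 3))  # 8.6 / 43
```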
 Standard deviation, usually denoted by the letter σ (small sigma) of the Greek alphabet, was first suggested by Karl Pearson as a measure of dispersion in 1893. It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean.
 Thus if X1, X2, …, Xn is a set of n observations, then its standard deviation is given by:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
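 Steps for computation:
 1. Compute the arithmetic mean X‾.
 2. Compute the deviation (X – X‾) of each observation from the A.M., i.e., X1 – X‾, X2 – X‾, …, Xn – X‾.
 3. Square each deviation.
 4. Find the arithmetic mean of the squared deviations.
 5. Take the positive square root of this mean; the result is σ.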
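Following the definition (divisor n), a minimal Python sketch of the standard deviation is shown below. It is applied to the three batsmen from the opening example, and the function name is hypothetical. In Python's standard library, `statistics.pstdev` uses the same divisor n. `statistics.stdev` instead divides by n − 1 (the sample standard deviation).

```python
import math

def standard_deviation(values):
    """sigma = sqrt(sum((X - mean)^2) / n)."""
    n = len(values)
    x_bar = sum(values) / n                    # arithmetic mean
    deviations = [x - x_bar for x in values]   # X - X‾ for each observation
    squares = [d * d for d in deviations]      # squared deviations
    mean_square = sum(squares) / n             # mean of the squares
    return math.sqrt(mean_square)              # positive square root

scores = {
    "Virat Kohli": [32, 25, 43, 55, 45],
    "Yuvraj Singh": [10, 60, 0, 30, 100],
    "Mahi": [42, 38, 40, 39, 41],
}
for player, runs in scores.items():
    print(f"{player:13} sigma = {standard_deviation(runs):.2f}")
```

All three players have a mean of 40, but σ is about 10.47, 36.33 and 1.41 respectively. This agrees with the earlier judgment that Mahi is the most consistent batsman.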
 Introduced by Karl Pearson in 1893.
 It is by far the most important and widely used measure of dispersion. Its significance lies in the fact that it is free from the defects from which the earlier methods suffer, and it satisfies most of the properties of a good measure of dispersion.
 S.D. is also known as the root mean square deviation from the arithmetic mean. It is denoted by the symbol σ.
 The S.D. measures the absolute dispersion or variability of a distribution. The greater the S.D., the greater the magnitude of the deviations of the values from the mean.
 A small S.D. means a high degree of uniformity of the observations as well as homogeneity of the series; a large S.D. means just the opposite.
 Thus, if we have two or more comparable series with identical or nearly identical means, it is the distribution with the smallest S.D. that has the most representative mean. Hence S.D. is extremely useful in judging the representativeness of the mean.
 Mean deviation and standard deviation are both based on each and every item of the distribution, but they differ in the following respects:
 Algebraic signs are ignored while calculating mean deviation, whereas in the calculation of S.D. the signs are taken into account.
 M.D. can be computed from the mean, the median or the mode. The S.D., on the other hand, is always computed from the A.M., because the sum of the squares of the deviations of items from the A.M. is the least.
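The least-squares property can be checked numerically with a short Python sketch. It tries every whole-number origin A between the smallest and largest of Virat Kohli's scores and keeps the one with the smallest Σ(X − A)². The helper name is hypothetical.

```python
def sum_of_squares(values, a):
    """Sum of (X - A)^2 for a chosen origin A (hypothetical helper)."""
    return sum((x - a) ** 2 for x in values)

kohli = [32, 25, 43, 55, 45]                     # mean = 40, median = 43
candidates = range(min(kohli), max(kohli) + 1)   # trial values of A: 25 .. 55

best_a = min(candidates, key=lambda a: sum_of_squares(kohli, a))
print(best_a, sum_of_squares(kohli, best_a))     # minimum found at the mean
print(sum_of_squares(kohli, 43))                 # larger when taken about the median
```

The minimum, 548, occurs at A = 40, the arithmetic mean. About the median (43) the sum rises to 593. This matches the identity Σ(X − A)² = Σ(X − X‾)² + n(X‾ − A)², here 548 + 5 × 3² = 593.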
