Measures of Dispersion (Part 1)
Dispersion
Dr. Richa Verma, DEI
Need and Meaning: Measures of Dispersion
• In the preceding lectures we have already discussed why it is necessary to tabulate and classify
statistical series and to condense them into a single figure called an average.
• The average, as we have already seen, has its own limitations, and even an ideal average can
represent a series only "as best as a single figure can".
• No doubt averages have a very great utility in statistical analysis but they fail to reveal the entire
story of a phenomenon.
• There may be a dozen series whose averages may be identical but which may differ from each
other in a hundred ways. Obviously in such cases further statistical analysis of the data is necessary
so that these differences between various series may also be studied and accounted for.
• If this is done, the statistical analysis will be more accurate and we shall be more confident of
our conclusions.
• Just as central tendency can be measured by a number in the form of an average, the
amount of variation (dispersion, spread, or scatter) among the values in the data set can also
be measured.
• The measures of central tendency indicate that most of the values in the data set
tend to concentrate (cluster) around a central value called the average, with the remaining
values scattered (spread or distributed) on either side of that value.
• But these measures do not reveal how these values are dispersed (spread or scattered) on
each side of the central value.
• The dispersion of values is indicated by the extent to which these values tend to spread over
an interval rather than cluster closely around an average.
• A small dispersion among values in the data set indicates that the data are clustered closely around the mean. The mean is therefore considered
representative of the data, i.e. the mean is a reliable average.
• Conversely, a large dispersion among values in the data set indicates that the mean is not reliable, i.e. it is not representative of the data.
• Illustration
• Suppose over the six-year period the net profits (in percentage) of two firms are as follows:
Firm 1 : 5.2, 4.5, 3.9, 4.7, 5.1, 5.4
Firm 2 : 7.8, 7.1, 5.3, 14.3, 11.0, 16.1
• The average profit works out to 4.8 per cent for Firm 1 and about 10.3 per cent for Firm 2. Judged on the averages alone, an investor would
compare only these two figures and learn nothing about how steady each firm's profits were from year to year.
• However, the difference among the values is greater in Firm 2, where profit varies from 5.3 to 16.1 per cent, while the net profit
values of Firm 1 vary only from 3.9 to 5.4 per cent.
• This shows that the values in data set 2 are spread more than those in data set 1.
• This implies that Firm 1 has a consistent performance while Firm 2 has a highly inconsistent performance.
• Thus, for investment purposes, a comparison of the average (mean) profit values alone is not sufficient.
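The comparison in this illustration can be checked numerically. The following is a minimal Python sketch, not part of the original lecture, that computes the mean and the spread (highest minus lowest value) of each profit series exactly as listed above. The function name `summarize` is illustrative only.

```python
from statistics import mean

# Net profits (per cent) over six years, as listed in the illustration.
firm_1 = [5.2, 4.5, 3.9, 4.7, 5.1, 5.4]
firm_2 = [7.8, 7.1, 5.3, 14.3, 11.0, 16.1]

def summarize(values):
    """Return (mean, lowest, highest, spread) for a list of numbers."""
    lowest, highest = min(values), max(values)
    return mean(values), lowest, highest, highest - lowest

for name, series in (("Firm 1", firm_1), ("Firm 2", firm_2)):
    avg, lowest, highest, spread = summarize(series)
    print(f"{name}: mean={avg:.2f}, lowest={lowest}, "
          f"highest={highest}, spread={spread:.1f}")
```

On the listed figures, the spread of Firm 2 (10.8 percentage points) is several times that of Firm 1 (1.5 percentage points), which is the inconsistency the illustration points to.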
When Dispersion is high
• (i) Test the reliability of an average: Measures of variation are used to test to what extent an average
represents the characteristic of a data set. If the variation is small, that is, the extent of dispersion or scatter is
small on each side of an average, then it indicates high uniformity of values in the distribution and the
average is representative of the individual values in the data set. On the other hand, if the variation is large, then it
indicates a lower degree of uniformity of values in the data set, and the average may be unreliable. No
variation indicates perfect uniformity, that is, all values in the data set are identical.
• (ii) Control the variability: Measuring variation helps to identify the nature and causes of variation. Such
information is useful in controlling the variations. According to Spurr and Bonini, ‘In matters of health,
variations in body temperature, pulse beat and blood pressure are the basic guides to diagnosis.
Prescribed treatment is designed to control their variation. In industrial production, efficient operation
requires control of quality variation, the causes of which are sought through inspection and quality control
programmes.’ In social science, the measurement of ‘inequality’ in the distribution of income and wealth
requires the measurement of variability.
SIGNIFICANCE OF MEASURING DISPERSION
(iii) Compare two or more sets of data with respect to their variability: Measures of variation
help in comparing the spread of two or more sets of data with respect to their
uniformity or consistency. For example, (a) comparing the share prices of different
companies over a period of time requires a measure of their variation, and (b) the
variation in the length of stay of patients in a hospital every month may be used to set
staffing levels, number of beds, number of doctors and other trained staff, patient
admission rates, and so on.
(iv) Facilitate the use of other statistical techniques: Measures of variation facilitate the use of
other statistical techniques such as correlation and regression analysis, hypothesis testing,
forecasting, quality control, and so on.
Essential Requisites for a Measure of Variation
The essential requisites for a good measure of variation are listed below. These requisites
help in identifying the merits and demerits of individual measures of variation.
(i) It should be rigidly defined.
(ii) It should be based on all the values (elements) in the data set.
(iii) It should be calculated easily, quickly, and accurately.
(iv) It should not be unduly affected by the fluctuations of sampling and by extreme
observations.
(v) It should be amenable to further mathematical or algebraic manipulations.
CLASSIFICATION OF MEASURES OF DISPERSION
The various measures of dispersion (variation) can be classified into two categories:
• (i) Absolute measures, and
• (ii) Relative measures
• Absolute measures are described by a number or value to represent the amount of variation or differences among values in a data set.
Such a number or value is expressed in the same unit of measurement as the set of values in the data such as rupees, inches, feet,
kilograms, or tonnes. Such measures help in comparing two or more sets of data in terms of absolute magnitude of variation, provided
the variable values are expressed in the same unit of measurement and have almost the same average value.
• Relative measures are described as the ratio of a measure of absolute variation to an average, and such a ratio is termed a coefficient of
variation. The word ‘coefficient’ means a number that is independent of any unit of measurement. While computing the relative
variation, the average used as the base should be the same one from which the absolute deviations were calculated.
Measures of dispersion
• The range is the simplest measure of dispersion and is based on the location of the largest and the
smallest values in the data.
• Thus, the range is defined as the difference between the highest and lowest observed values in a data set.
In other words, it is the length of an interval which covers the highest and lowest observed values in a data
set and thus measures the dispersion or spread within the interval in the most direct possible way.
Range (R) = Highest value of an observation – Lowest value of an observation
Range (R) = H – L
• For example, if the smallest value of an observation in the data set is 160 and largest value is 250, then the
range is 250 – 160 = 90.
• For a grouped frequency distribution, the range is the difference between the upper
class limit of the last class and the lower class limit of the first class. In this case, the range obtained may be
higher than that for the ungrouped data because the class limits extend slightly
beyond the extreme values in the data set.
Coefficient of Range
• The relative measure of range, called the coefficient of range, is obtained by applying the
following formula:
Coefficient of range = (H – L) / (H + L)
• Example: The following are the sales figures of a firm for the last 12 months.
Months           : 1  2  3  4  5  6  7  8  9  10 11 12
Sales (Rs. ’000) : 80 82 82 84 84 86 86 88 88 90 90 92
Calculate the range and coefficient of range for sales.
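A minimal Python sketch of this calculation, assuming the usual definition coefficient of range = (H – L) / (H + L). The function names are illustrative and not taken from the lecture.

```python
# Monthly sales (Rs. '000) for the 12 months in the example.
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]

def value_range(values):
    """Range R = H - L (highest minus lowest observed value)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure (H - L) / (H + L); assumes H + L is not zero."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)

print("Range:", value_range(sales))                                     # 12
print("Coefficient of range:", round(coefficient_of_range(sales), 4))   # 0.0698
```

Here H = 92 and L = 80, so the range is 12 (Rs. ’000) and the coefficient of range is 12/172, roughly 0.07.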
Applications of Range
• Fluctuation in share prices: The range is useful in the study of small variations among values in a data
set, such as variation in the prices of shares and other commodities that are very sensitive to price changes
from one period to another.
• Quality control: It is widely used in industrial quality control. Quality control is exercised by
preparing suitable control charts. These charts are based on setting an upper control limit (range)
and a lower control limit (range) within which produced items shall be accepted. The variation in the
quality beyond these ranges requires necessary correction in the production process or system.
• Weather forecasts: Meteorological departments use the range to report the difference between the
maximum and minimum temperature or rainfall for the information
of the general public.
Interquartile Range or Deviation
• The limitations or disadvantages of the range can partially be overcome by using another measure of
variation which measures the spread over the middle half of the values in the data set so as to
minimise the influence of outliers (extreme values) in the calculation of range.
• Since a large number of values in the data set lie in the central part of the frequency distribution,
it is useful to study the interquartile range (also called the midspread).
• To compute this value, the entire data set is divided into four parts each of which contains 25 per
cent of the observed values.
• The quartiles Q1, Q2, and Q3 are the values that mark the upper ends of the first three of these four parts.
The interquartile range is a measure of dispersion or spread of values in the data set between the third
quartile, Q3, and the first quartile, Q1. In other words, the interquartile range (IQR) is the range of the
middle 50 per cent of the data.
Interquartile range (IQR) = Q3 – Q1
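As a hedged illustration, the sketch below computes Q1, Q3, and the IQR for the monthly sales figures used in the range example. Conventions for locating quartiles differ between textbooks and software; this sketch assumes the default "exclusive" method of Python's `statistics.quantiles`, which places Q1 at the (n + 1)/4-th ordered value. Other conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Monthly sales (Rs. '000) from the range example.
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]

def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the default 'exclusive' quantile method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3, q3 - q1

q1, q3, iqr = interquartile_range(sales)
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")  # Q1=82.5, Q3=89.5, IQR=7.0
```

The IQR of 7 is smaller than the range of 12 for the same data because it ignores the lowest and highest quarters of the observations.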
• The median is not necessarily midway between Q1 and Q3, although this will be so for a
symmetrical distribution. The median and quartiles divide the data into equal numbers of
values but do not necessarily divide the data into equally wide intervals.
• In a symmetrical distribution, the two quartiles Q1 and Q3 are at equal distance from the
median, that is, Median – Q1 = Q3 – Median. In that case, Median ± Quartile Deviation covers
exactly 50 per cent of the observed values in the data set.
• A smaller value of quartile deviation indicates high uniformity or less variation among the
middle 50 per cent of observed values around the median. On the other hand, a high value
of quartile deviation indicates large variation among the middle 50 per cent of observed values.
SEMI-INTER QUARTILE RANGE/ QUARTILE DEVIATION
Quartile deviation (Q.D.) = (Q3 – Q1) / 2
• Where Q3 and Q1 stand for the upper and lower quartiles respectively.
• In a symmetrical series the median lies halfway on the scale from Q1
to Q3. If, therefore, the value of the quartile deviation is added to the
lower quartile or subtracted from the upper quartile, in a symmetrical
series, the resulting figure would be the value of the median.
• But generally series are not symmetrical, and in a moderately
asymmetrical series Q1 + quartile deviation or Q3 – quartile deviation
would not give the true value of the median.
• There would be a difference between this figure and the median, and the
greater the difference, the greater would be the extent of departure from
symmetry.
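The point above can be sketched in Python. Since Q.D. = (Q3 – Q1)/2, both Q1 + Q.D. and Q3 – Q.D. equal the midpoint (Q1 + Q3)/2, so the quantity of interest is the gap between that midpoint and the median. The `skewed` series below is hypothetical, made up purely for illustration, and the quartiles again assume the default method of `statistics.quantiles`.

```python
from statistics import quantiles

def quartile_midpoint_gap(values):
    """Return (Q1 + Q.D.) - median; this is 0 for a symmetrical series."""
    q1, median, q3 = quantiles(values, n=4)
    qd = (q3 - q1) / 2          # quartile deviation
    return (q1 + qd) - median   # same as (Q3 - qd) - median

# Sales figures from the range example: a symmetrical series.
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]

# Hypothetical right-skewed series (not from the lecture).
skewed = [2, 3, 4, 5, 6, 8, 12, 20, 35, 60, 100]

print(quartile_midpoint_gap(sales))   # 0.0
print(quartile_midpoint_gap(skewed))  # 11.5
```

A gap of zero is consistent with symmetry; the larger the gap, the further the series departs from it.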
Coefficient of Quartile Deviation
• Since quartile deviation is an absolute measure of variation, its value is affected by the size
and number of observed values in the data set. Thus, the Q.D. of two or more sets of data may
differ. For this reason, to compare the degree of variation in different sets of data, we compute the
relative measure corresponding to Q.D., called the coefficient of Q.D., which is calculated as follows:
Coefficient of Q.D. = (Q3 – Q1) / (Q3 + Q1)
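As a hedged illustration, the sketch below evaluates Q.D. = (Q3 – Q1)/2 and coefficient of Q.D. = (Q3 – Q1)/(Q3 + Q1) for the sales figures from the range example, first in Rs. ’000 and then in rupees. The quartile convention (default method of `statistics.quantiles`) and the function names are assumptions made for this example.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Absolute measure: Q.D. = (Q3 - Q1) / 2."""
    q1, _median, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

def coefficient_of_qd(values):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1); assumes Q3 + Q1 is not zero."""
    q1, _median, q3 = quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)

sales_thousands = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]
sales_rupees = [v * 1000 for v in sales_thousands]  # same data, different unit

print(quartile_deviation(sales_thousands))           # 3.5
print(quartile_deviation(sales_rupees))              # 3500.0
print(round(coefficient_of_qd(sales_thousands), 4))  # 0.0407
print(round(coefficient_of_qd(sales_rupees), 4))     # 0.0407
```

Changing the unit changes Q.D. by the same factor but leaves the coefficient unchanged, which is why the coefficient is the one used to compare different data sets.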
Advantages of Quartile Deviation
(i) It is not difficult to calculate but can only be used to evaluate variation
among observed values within the middle of the data set. Its value is not
affected by the extreme (highest and lowest) values in the data set.
Disadvantages of Quartile Deviation
(i) The value of Q.D. is based on the middle 50 per cent of observed values in the data
set, so it cannot be considered a good measure of variation as it is not
based on all the observations.
(iii) The Q.D. has no relationship with any particular value or an average in the data
set for measuring the variation. Its value is not affected by the distribution of the
individual values within the interval of the middle 50 per cent of observed values.