Unit 4
Objectives
After going through this unit, you will learn:
• the concept and significance of measuring variability
• the concept of absolute and relative variation
• the computation of several measures of variation, such as the range,
quartile deviation, average deviation and standard deviation and also
their coefficients
• the concept of skewness and its importance
• the computation of coefficient of skewness.
Structure
4.1 Introduction
4.2 Significance of Measuring Variation
4.3 Properties of a Good Measure of Variation
4.4 Absolute and Relative Measures of Variation
4.5 Range
4.6 Quartile Deviation
4.7 Average Deviation
4.8 Standard Deviation
4.9 Coefficient of Variation
4.10 Skewness
4.11 Relative Skewness
4.12 Summary
4.13 Key Words
4.14 Self-assessment Exercises
4.15 Further Readings
4.1 INTRODUCTION
In the previous unit, we were concerned with various measures that are used
to provide a single representative value of a given set of data. This single
value alone cannot adequately describe a set of data. Therefore, in this unit,
we shall study two more important characteristics of a distribution. First we
shall discuss the concept of variation and later the concept of skewness.
A measure of variation (or dispersion) describes the spread or scattering of
the individual values around the central value. To illustrate the concept of
variation, let us consider the data given below:
Data Collection and Analysis

Firm A              Firm B              Firm C
Daily Sales (Rs.)   Daily Sales (Rs.)   Daily Sales (Rs.)
5000                5050                4900
5000                5025                3100
5000                4950                2200
5000                4835                1800
5000                5140                13000
X̄ = 5000           X̄ = 5000           X̄ = 5000
Since the average sales of firms A, B and C are the same, we might be tempted to conclude that the sales follow a similar pattern. However, in firm A the daily sales are the same irrespective of the day, whereas there is a small amount of variation in the daily sales of firm B and a much greater amount of variation in the daily sales of firm C. Therefore, different sets of data may have the same measure of central tendency but differ greatly in terms of variation.
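As a quick check of the claim that the three firms share the same average, here is a minimal Python sketch. Python is only our choice for illustration; the unit itself uses no code. It recomputes each firm's mean from the table above using the standard-library function `statistics.mean`.

```python
from statistics import mean

# Daily sales (Rs.) of the three firms, copied from the table above.
daily_sales = {
    "A": [5000, 5000, 5000, 5000, 5000],
    "B": [5050, 5025, 4950, 4835, 5140],
    "C": [4900, 3100, 2200, 1800, 13000],
}

# Arithmetic mean of each firm's daily sales.
firm_means = {firm: mean(values) for firm, values in daily_sales.items()}

for firm, avg in firm_means.items():
    print(f"Firm {firm}: mean = {avg}")
```

All three means come out as 5000, even though the individual daily figures are very different.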
Measures of Variation and Skewness
4.4 ABSOLUTE AND RELATIVE MEASURES OF VARIATION
4.5 RANGE
The range is defined as the difference between the highest (numerically
largest) value and the lowest (numerically smallest) value in a set of data. In
symbols, this may be indicated as:
R = H - L,
where R = Range; H = Highest Value; L = Lowest Value
As an illustration, consider the daily sales data for the three firms as given
earlier.
For firm A, R = H - L = 5000 - 5000 = 0
For firm B, R = H - L = 5140 - 4835 = 305
For firm C, R = H - L = 13000 - 1800 = 11200
The range is easy to interpret. In this example, there is no variation in the daily sales of firm A, the variation is small for firm B and it is very large for firm C.
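The range calculation for the three firms can be sketched in a few lines of Python. The helper name `value_range` is our own and is used only for illustration. It applies R = H − L to each firm's daily sales.

```python
# Daily sales (Rs.) of the three firms, as in the table at the start of the unit.
daily_sales = {
    "A": [5000, 5000, 5000, 5000, 5000],
    "B": [5050, 5025, 4950, 4835, 5140],
    "C": [4900, 3100, 2200, 1800, 13000],
}

def value_range(values):
    """Range R = H - L: highest value minus lowest value."""
    return max(values) - min(values)

ranges = {firm: value_range(v) for firm, v in daily_sales.items()}
print(ranges)  # {'A': 0, 'B': 305, 'C': 11200}
```

Because only `max` and `min` are used, every value other than the two extremes has no effect on the result. This is the crudeness the next paragraph points out.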
The range is very easy to calculate and it gives us some idea about the
variability of the data. However, the range is a crude measure of variation,
since it uses only two extreme values.
The concept of range is extensively used in statistical quality control. The range is also helpful in studying variations in the prices of shares, debentures and other commodities that are very sensitive to price changes from one period to another. Meteorological departments find the range a useful indicator in weather forecasting.
For grouped data, the range may be approximated as the difference between
the upper limit of the largest class and the lower limit of the smallest class.
The relative measure corresponding to range, called the coefficient of range, is obtained by applying the following formula:
$$\text{Coefficient of range} = \frac{H - L}{H + L}$$
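Applied to the daily sales of the three firms, the formula gives a unit-free figure. The following minimal Python sketch (the function name is our own) assumes positive data so that H + L is never zero.

```python
def coefficient_of_range(values):
    """Relative measure (H - L) / (H + L); assumes positive values so H + L > 0."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)

firm_a = [5000, 5000, 5000, 5000, 5000]
firm_b = [5050, 5025, 4950, 4835, 5140]
firm_c = [4900, 3100, 2200, 1800, 13000]

print(coefficient_of_range(firm_a))            # 0.0
print(round(coefficient_of_range(firm_b), 4))  # 0.0306  (305 / 9975)
print(round(coefficient_of_range(firm_c), 4))  # 0.7568  (11200 / 14800)
```

Because the coefficient is a ratio, it can compare series measured in different units or at different scales. The absolute range cannot do this.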
Activity A
Following are the prices of shares of a company from Monday to Friday:
4.8 STANDARD DEVIATION
The standard deviation is the most widely used and most important measure of variation. In computing the average deviation, the signs of the deviations are ignored. The standard deviation avoids this by squaring the deviations, which makes them all positive. The standard deviation, also known as the root mean square deviation, is generally denoted by the lower-case Greek letter σ (read as sigma). In symbols, this can be expressed as follows.
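For ungrouped data:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
For grouped data (a frequency distribution):
$$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$$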
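Both formulas divide by N, following the unit's convention. This is the population standard deviation, not the N − 1 sample version. A minimal Python sketch of each formula follows; the function names are our own. The ungrouped formula is applied to the daily sales of firms B and C. The grouped formula is applied to the mid-points and frequencies of the frequency table that follows.

```python
from math import sqrt

def sd_ungrouped(values):
    """sigma = sqrt(sum((X - mean)^2) / N), dividing by N as in the unit."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_grouped(midpoints, freqs):
    """sigma = sqrt(sum(f * (X - mean)^2) / N) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n)

firm_b = [5050, 5025, 4950, 4835, 5140]
firm_c = [4900, 3100, 2200, 1800, 13000]
print(round(sd_ungrouped(firm_b), 2))  # 102.42
print(round(sd_ungrouped(firm_c), 2))  # 4140.05

# Mid-points X and frequencies f of the frequency table below.
midpoints = [9, 11, 13, 15, 17, 19]
freqs = [8, 12, 20, 30, 20, 10]
print(round(sd_grouped(midpoints, freqs), 2))  # 2.77
```

Python's `statistics.pstdev` follows the same divide-by-N convention and gives the same ungrouped results.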
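For a frequency distribution, the calculation by the step-deviation method (assumed mean A = 15, class width i = 2) is as follows:

Class    m.p. X   f     d = (X − 15)/2   fd     fd²
8-10     9        8     -3               -24    72
10-12    11       12    -2               -24    48
12-14    13       20    -1               -20    20
14-16    15       30    0                0      0
16-18    17       20    +1               +20    20
18-20    19       10    +2               +20    40
                  N = 100                ∑fd = −28   ∑fd² = 200

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{200}{100} - \left(\frac{-28}{100}\right)^2} \times 2 = \sqrt{2 - 0.0784} \times 2 = \sqrt{1.9216} \times 2 = 1.3862 \times 2 = 2.7724 \approx 2.77$$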
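The step-deviation calculation can be reproduced in a short Python sketch. The function name and its parameters `assumed_mean` and `width` (A and i in the formula) are our own. Besides σ, it returns the mean A + (∑fd/N) × i, which for this table is 14.44.

```python
from math import sqrt

def step_deviation_stats(midpoints, freqs, assumed_mean, width):
    """Mean and S.D. of a frequency distribution using d = (X - A) / i."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = assumed_mean + (sum_fd / n) * width
    sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width
    return sum_fd, sum_fd2, mean, sd

midpoints = [9, 11, 13, 15, 17, 19]
freqs = [8, 12, 20, 30, 20, 10]
sum_fd, sum_fd2, mean, sd = step_deviation_stats(midpoints, freqs, assumed_mean=15, width=2)
print(sum_fd, sum_fd2)                # -28.0 200.0
print(round(mean, 2), round(sd, 2))  # 14.44 2.77
```

Dividing by i keeps the deviations d small whole numbers, which is what makes the hand calculation convenient. Multiplying by i at the end restores the original units, so the result matches the direct formula.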
The standard deviation is the most commonly used measure of variability, while the other measures have rather specialised uses. In addition, it is the only measure possessing the mathematical properties needed for advanced statistical work. For example, the standard deviations of separate groups can be combined into a single combined standard deviation.
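The combined standard deviation property means the S.D. of two pooled groups can be found from each group's size, mean and S.D. alone. The sketch below uses the standard textbook formula σ₁₂ = √[(N₁(σ₁² + d₁²) + N₂(σ₂² + d₂²)) / (N₁ + N₂)], where each d is a group mean minus the combined mean. That formula is not stated in this unit, and the function names are our own. As a check, firm C's five days are split arbitrarily into the first three and last two. The combined figure should equal the S.D. of all five days computed directly.

```python
from math import sqrt

def pop_sd(values):
    """Standard deviation dividing by N, as in the unit."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """S.D. of two groups pooled together, from each group's N, mean and S.D."""
    n = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n)

firm_c = [4900, 3100, 2200, 1800, 13000]
group1, group2 = firm_c[:3], firm_c[3:]  # arbitrary split for illustration

from_summaries = combined_sd(
    len(group1), sum(group1) / len(group1), pop_sd(group1),
    len(group2), sum(group2) / len(group2), pop_sd(group2),
)
print(round(from_summaries, 2), round(pop_sd(firm_c), 2))
```

The d₁² and d₂² terms are essential. The two group means (3400 and 7400) differ from the combined mean of 5000, and leaving those terms out would understate the spread.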
Activity E
The following data show the daily sales at a petrol station. Calculate the
mean and standard deviation.
Compare the variability of the life of the two types of electric lamps using the
coefficient of variation.
4.10 SKEWNESS
The measures of central tendency and variation do not reveal all the characteristics of a given set of data. For example, two distributions may have the same mean and standard deviation but differ widely in shape. A distribution is either symmetrical or it is not. If it is not symmetrical, it is called asymmetrical or skewed. Thus skewness refers to the lack of symmetry in a distribution.
A simple way to detect the direction of skewness is to look at the tails of the distribution (Figure I). The rules are:
• The distribution is symmetrical when there are no extreme values in a particular direction, so that low and high values balance each other. In this case, mean = median = mode (see Fig. I(a)).
• If the longer tail is towards the lower values (the left-hand side), the skewness is negative. Negative skewness arises when the mean is pulled down by some extremely low values, making mean < median < mode (see Fig. I(b)).
• If the longer tail is towards the higher values (the right-hand side), the skewness is positive. Positive skewness occurs when the mean is pulled up by some unusually high values, making mean > median > mode (see Fig. I(c)).
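The rules above can be tried on the daily sales of firms B and C. The mode is left out because every value in these small series occurs only once. This is only a rough check based on comparing the mean with the median. The function name and its return labels are our own.

```python
from statistics import mean, median

firm_b = [5050, 5025, 4950, 4835, 5140]
firm_c = [4900, 3100, 2200, 1800, 13000]

def tail_direction(values):
    """Rough direction of skewness from the relative position of mean and median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"  # mean pulled up by unusually high values
    if m < med:
        return "negative"  # mean pulled down by unusually low values
    return "none indicated"

print(tail_direction(firm_c))  # positive  (mean 5000 > median 3100)
print(tail_direction(firm_b))  # negative  (mean 5000 < median 5025)
```

Firm C's single day of Rs. 13000 pulls its mean well above its median, giving the longer right-hand tail described in the third rule.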
4.11 RELATIVE SKEWNESS
To compare the skewness of two or more distributions, we use the coefficient of skewness given by Karl Pearson, defined as:
$$\text{SK} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$
If the mode cannot be determined, we use the approximate relationship Mode = 3 Median − 2 Mean. Substituting this, the above formula reduces to
$$\text{SK} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$$
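The reduction above can be checked numerically. The following minimal Python sketch uses the daily sales of firm B. The function names are our own. `statistics.pstdev` is used because it divides by N, as the unit's S.D. does. The sketch also confirms that the mode form, with Mode = 3 Median − 2 Mean substituted, gives the same value as the median form.

```python
from statistics import mean, median, pstdev

def pearson_sk_mode(mean_, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / S.D."""
    return (mean_ - mode) / sd

def pearson_sk_median(mean_, median_, sd):
    """Median form: 3 (Mean - Median) / S.D."""
    return 3 * (mean_ - median_) / sd

firm_b = [5050, 5025, 4950, 4835, 5140]
m, med, sd = mean(firm_b), median(firm_b), pstdev(firm_b)

sk = pearson_sk_median(m, med, sd)
approx_mode = 3 * med - 2 * m  # 3(5025) - 2(5000) = 5075
print(round(sk, 2))                                    # -0.73
print(round(pearson_sk_mode(m, approx_mode, sd), 2))  # -0.73
```

The negative value agrees with firm B's mean lying below its median.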
If the value of this coefficient is zero, the distribution is symmetrical. If it is positive, the distribution is positively skewed, and if it is negative, the distribution is negatively skewed. In practice, the value of this coefficient usually lies between −1 and +1.
Bowley's formula for the coefficient of skewness is more appropriate in any of the following cases:
• the distribution is open-ended;
• extreme values are present in the data;
• the data are given as positional measures such as the median and quartiles.
$$\text{SK} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
Again, if the value of this coefficient is zero, the distribution is symmetrical. A positive value indicates a positively skewed distribution, and a negative value indicates a negatively skewed distribution.
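Bowley's coefficient depends only on the quartiles and the median. The sketch below is a minimal Python version applied to the daily sales of firms B and C, and the function name is our own. The quartiles come from `statistics.quantiles` with its default "exclusive" method. This method places Q1 and Q3 at the (n + 1)/4 and 3(n + 1)/4 positions and interpolates between neighbouring values. Other quartile conventions can give slightly different results for small data sets.

```python
from statistics import median, quantiles

def bowley_sk(values):
    """Bowley's coefficient (Q3 + Q1 - 2 Median) / (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4)  # default method="exclusive"
    med = median(values)
    return (q3 + q1 - 2 * med) / (q3 - q1)

firm_b = [5050, 5025, 4950, 4835, 5140]
firm_c = [4900, 3100, 2200, 1800, 13000]

print(round(bowley_sk(firm_c), 2))  # 0.68   (Q1 = 2000, Median = 3100, Q3 = 8950)
print(round(bowley_sk(firm_b), 2))  # -0.31  (Q1 = 4892.5, Median = 5025, Q3 = 5095)
```

Because Bowley's measure uses only the middle half of the data, a single extreme value moves it less than it moves Karl Pearson's coefficient.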
To explain the concept of coefficient of skewness, let us consider the
following data.
Since the given distribution is not open-ended and the mode can be determined, it is appropriate to apply Karl Pearson's formula:
$$\text{SK} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$
Profits (Rs. thousand)   m.p. X   f     d = (X − 17)/2   fd     fd²
10-12                    11       7     -3               -21    63
12-14                    13       15    -2               -30    60
14-16                    15       18    -1               -18    18
16-18                    17       20    0                0      0
18-20                    19       25    +1               25     25
20-22                    21       10    +2               20     40
22-24                    23       5     +3               15     45
                                  N = 100                ∑fd = −9   ∑fd² = 251
$$\bar{X} = A + \frac{\sum fd}{N} \times i = 17 - \frac{9}{100} \times 2 = 17 - 0.18 = 16.82$$
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times i = 18 + \frac{5}{5 + 15} \times 2 = 18 + 0.5 = 18.5$$
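The remaining steps are the S.D., by the same step-deviation method, and then Karl Pearson's coefficient. The sketch below carries out the whole example in Python using the profit table. It assumes the class width is 2 throughout and that the modal class is neither the first nor the last class, so that both neighbouring frequencies exist. The variable names are our own.

```python
from math import sqrt

# Profits (Rs. thousand): class lower limits and frequencies from the table above.
lower_limits = [10, 12, 14, 16, 18, 20, 22]
freqs = [7, 15, 18, 20, 25, 10, 5]
width = 2           # class width i
assumed_mean = 17   # A, the mid-point of the 16-18 class

midpoints = [low + width / 2 for low in lower_limits]
n = sum(freqs)
d = [(x - assumed_mean) / width for x in midpoints]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

mean = assumed_mean + sum_fd / n * width
sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width

# Mode by interpolation within the modal class (highest frequency).
k = freqs.index(max(freqs))
assert 0 < k < len(freqs) - 1, "modal class must have neighbours on both sides"
d1 = freqs[k] - freqs[k - 1]  # 25 - 20 = 5
d2 = freqs[k] - freqs[k + 1]  # 25 - 10 = 15
mode = lower_limits[k] + d1 / (d1 + d2) * width

sk = (mean - mode) / sd
print(round(mean, 2), round(mode, 2), round(sd, 2), round(sk, 2))  # 16.82 18.5 3.16 -0.53
```

The coefficient is negative, so the distribution of profits is negatively skewed. This matches the longer tail of the table towards the lower profit classes.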
4.12 SUMMARY
In this unit, we have shown why measures of variation and skewness are important. The measures of variation considered were the range, quartile deviation, average deviation and standard deviation. The coefficient of variation was used to compare the relative variation of different sets of data. Skewness was introduced to describe the lack of symmetry in a distribution.
Class        Frequency
700-800      28
800-900      32
900-1000     40
1000-1100    30
1100-1200    25
1200-1300    15
7) Calculate the mean, standard deviation and variance for the following data:
12) You are given the following information before and after the settlement of the workers' strike: