Unit 8 Measures of Variation and Skewness: Objectives
Objectives
After going through this unit, you will learn:
Since the average sales for firms A, B and C are the same, we might conclude that
the distribution pattern of the sales is similar. However, in firm A the daily sales
are the same irrespective of the day, whereas there is a small amount of variation
in the daily sales for firm B and a much greater amount of variation in the daily
sales for firm C. Therefore, different sets of data may have the same measure of
central tendency but differ greatly in terms of variation.
Data Collection and
Analysis
8.2 SIGNIFICANCE OF MEASURING VARIATION
Measuring variation is significant for the following purposes.
i) Measuring variability determines the reliability of an average by pointing out
how far an average is representative of the entire data.
ii) Another purpose of measuring variability is to determine the nature and cause of
variation in order to control the variation itself.
iii) Measures of variation enable comparisons of two or more distributions with
regard to their variability.
iv) Measuring variability is of great importance to advanced statistical analysis. For
example, sampling or statistical inference is essentially a problem in measuring
variability.
8.5 RANGE
The range is defined as the difference between the highest (numerically largest) value
and the lowest (numerically smallest) value in a set of data. In symbols, this may be
indicated as:
R = H - L,
where R = Range; H = Highest Value; L = Lowest Value
As an illustration, consider the daily sales data for the three firms as given earlier.
For firm A, R = H - L = 5000 - 5000 = 0
For firm B, R = H - L = 5140 - 4835 = 305
For firm C, R = H - L = 13000 - 1800 = 11200
The interpretation for the value of range is very simple.
In this example, the variation is nil in case of daily sales for firm A, the variation is
small in case of firm B and variation is very large in case of firm C.
The range is very easy to calculate and it gives us some idea about the variability of
the data. However, the range is a crude measure of variation, since it uses only two
extreme values.
The concept of range is extensively used in statistical quality control. Range is
helpful in studying the variations in the prices of shares and debentures and other
commodities that are very sensitive to price changes from one period to another. For
meteorological departments, the range is a good indicator for weather forecast.
For grouped data, the range may be approximated as the difference between the
upper limit of the largest class and the lower limit of the smallest class.
The relative measure corresponding to range, called the coefficient of range, is
obtained by applying the following formula:
$$\text{Coefficient of range} = \frac{H - L}{H + L}$$
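As a minimal sketch, the range and the coefficient of range for the three firms can be checked in Python. The function names are illustrative only, and the lowest value for firm C is taken as 1800, the figure implied by the stated range of 11200.

```python
def value_range(highest, lowest):
    """Range R = H - L."""
    return highest - lowest


def coefficient_of_range(highest, lowest):
    """Relative measure of range: (H - L) / (H + L)."""
    return (highest - lowest) / (highest + lowest)


# (H, L) pairs for the daily sales of each firm, as used in the illustration.
# The value 1800 for firm C is an assumption inferred from R = 11200.
firms = {"A": (5000, 5000), "B": (5140, 4835), "C": (13000, 1800)}

for name, (h, l) in firms.items():
    print(name, value_range(h, l), round(coefficient_of_range(h, l), 4))
```

Because the coefficient is a pure ratio, it allows the spread of series measured in different units to be compared, which the range itself does not.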
Activity A
Following are the prices of shares of a company from Monday to Friday:
Day : Monday Tuesday Wednesday Thursday Friday
Price : 670 678 750 705 720
Compute the value of range and interpret the value.
Activity B
Calculate the coefficient of range from the following data:
8.6 QUARTILE DEVIATION
The quartile deviation, also known as the semi-interquartile range, is computed as
half the difference between the third quartile and the first quartile. In symbols,
this can be written as:
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
where Q1 = first quartile, and Q3 = third quartile.
The following illustration would clarify the procedure involved. For the data given
below, compute the quartile deviation.
To compute quartile deviation, we need the values of the first quartile and the third
quartile, which can be obtained from the following table:
Monthly Wages (Rs.)    No. of workers (f)    C.F.
Below 850              12                    12
850-900                16                    28
900-950                39                    67
950-1000               56                    123
1000-1050              62                    185
1050-1100              75                    260
1100-1150              30                    290
1150 and above         10                    300
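The quartiles can be located from the cumulative frequencies and interpolated within the class that contains them. The sketch below assumes the usual textbook convention that Q1 and Q3 are the values of the (N/4)th and (3N/4)th items, with linear interpolation inside the class; `grouped_quantile` is an illustrative helper, not a library function.

```python
# (lower limit, upper limit, number of workers); None marks an open end.
classes = [
    (None, 850, 12),
    (850, 900, 16),
    (900, 950, 39),
    (950, 1000, 56),
    (1000, 1050, 62),
    (1050, 1100, 75),
    (1100, 1150, 30),
    (1150, None, 10),
]


def grouped_quantile(classes, fraction):
    """Value below which `fraction` of the observations fall (linear interpolation)."""
    total = sum(f for _, _, f in classes)
    target = fraction * total  # position of the required item
    cumulative = 0             # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_quantile(classes, 0.25)  # 75th item, class 950-1000
q3 = grouped_quantile(classes, 0.75)  # 225th item, class 1050-1100
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Under these assumptions Q1 is about 957.14, Q3 about 1076.67 and the quartile deviation about Rs. 59.76. Neither open-ended class is needed for the calculation.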
The quartile deviation is superior to the range as it is based not on the two extreme
values but on the middle 50% of the observations. Another advantage of the quartile
deviation is that it is the only measure of variability which can be used for open-end
distributions.
The disadvantage of the quartile deviation is that it ignores the first and the last
25% of the observations.
Activity C
A survey of domestic consumption of electricity gave the following distribution of
the units consumed. Compute the quartile deviation and its coefficient.
Number of units    Number of consumers
Below 200          9
200-400            18
400-600            27
600-800            32
800-1000           45
1000-1200          38
1200-1400          20
1400 & above       11
8.7 AVERAGE DEVIATION
The measure of average (or mean) deviation is an improvement over the previous two
measures in that it considers all observations in the given set of data. This measure is
computed as the mean of deviations from the mean or the median. All the deviations
are treated as positive regardless of sign. In symbols, this can be represented by:
$$\text{A.D.} = \frac{\sum |X - \bar{X}|}{N} \quad \text{or} \quad \text{A.D.} = \frac{\sum |X - \text{Median}|}{N}$$
Theoretically speaking, there is an advantage in taking the deviations from median
because the sum of the absolute deviations (i.e. ignoring ± signs) from median is
minimum. In actual practice, however, arithmetic mean is more popularly used in
computation of average deviation.
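A short sketch of both versions of the average deviation, using the five share prices from Activity A as sample data. The helper name `average_deviation` is illustrative.

```python
from statistics import mean, median


def average_deviation(values, centre):
    """Mean of the absolute deviations of `values` from `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)


prices = [670, 678, 750, 705, 720]  # share prices, Monday to Friday

ad_mean = average_deviation(prices, mean(prices))      # deviations from the mean
ad_median = average_deviation(prices, median(prices))  # deviations from the median
print(round(ad_mean, 2), round(ad_median, 2))
```

For these prices the average deviation is about 24.48 from the mean and 24.4 from the median, which illustrates the point that the sum of absolute deviations is smallest when taken from the median.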
For grouped data, the formula to be used is given as:
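$$\text{A.D.} = \frac{\sum f\,|X - \bar{X}|}{N}$$
where f is the class frequency and X is the mid-point of the class.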
As an illustration, consider the following grouped data which relate to the sales of
100 companies.
The relative measure corresponding to the average deviation, called the coefficient of
average deviation, is obtained by dividing average deviation by the particular average
used in computing the average deviation. Thus, if average deviation has been
computed from median, the coefficient of average deviation shall be obtained by
dividing the average deviation by the median.
$$\text{Coefficient of A.D.} = \frac{\text{A.D.}}{\text{Median}} \quad \text{or} \quad \frac{\text{A.D.}}{\text{Mean}}$$
Although the average deviation is a good measure of variability, its use is limited. If
one desires only to measure and compare variability among several sets of data, the
average deviation may be used.
The major disadvantage of the average deviation is its lack of mathematical
properties: ignoring the signs of the deviations in its calculation makes it
algebraically inconsistent.
Activity D
Calculate the average deviation and coefficient of the average deviation from the
following data.
Sales (Rs. thousand)    No. of days
Less than 20            3
Less than 30            9
Less than 40            20
Less than 50            23
Less than 60            25
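8.8 STANDARD DEVIATION
The standard deviation, denoted by σ, is the square root of the arithmetic mean of the
squared deviations from the mean:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$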
The square of the standard deviation is called the variance. Therefore,
$$\text{Variance} = \sigma^2$$
The standard deviation and variance become larger as the variability within the data
becomes greater. More important, a standard deviation is readily comparable with other
standard deviations: the greater the standard deviation, the greater the variability.
For grouped data, the formula is
$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$$
The following formulas for standard deviation are mathematically equivalent to the
above formula and are often more convenient to use in calculations.
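$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{\sum fX^2}{N} - \bar{X}^2}$$
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i, \quad \text{where } d = \frac{X - A}{i}$$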
Remarks: If the data represent a sample of size N from a population, it can be shown
that the sum of the squared deviations should be divided by (N-1) instead of N in order
to obtain an unbiased estimate of the population variance. However, for large sample
sizes, there is very little difference between using (N-1) and N in computing the
standard deviation.
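The effect of the two divisors can be seen with Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by N-1. The five share prices from Activity A serve only as sample data here.

```python
from statistics import pstdev, stdev

prices = [670, 678, 750, 705, 720]

sd_n = pstdev(prices)         # divisor N (data treated as a whole population)
sd_n_minus_1 = stdev(prices)  # divisor N - 1 (data treated as a sample)
print(round(sd_n, 2), round(sd_n_minus_1, 2))
```

With only five observations the two results differ noticeably (roughly 28.99 against 32.42). The ratio between them is the square root of N/(N-1), which approaches 1 as N grows.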
To understand the formula for grouped data, consider the following data which relate
to the profits of 100 companies.
Profit (Rs. lakhs)    No. of companies
8-10                  8
10-12                 12
12-14                 20
14-16                 30
16-18                 20
18-20                 10
To compute standard deviation we construct the following table:
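The working for the profit data can be sketched in Python using the step-deviation formula. The assumed mean A = 15 and the variable names are choices made for this illustration; any class mid-point would serve as A. A direct calculation from the definition is included as a cross-check.

```python
import math

# (lower limit, upper limit, number of companies), profits in Rs. lakhs.
classes = [(8, 10, 8), (10, 12, 12), (12, 14, 20),
           (14, 16, 30), (16, 18, 20), (18, 20, 10)]

A = 15  # assumed mean: a convenient mid-point (illustrative choice)
i = 2   # class interval

N = sum(f for _, _, f in classes)
sum_fd = 0.0
sum_fd2 = 0.0
for lower, upper, f in classes:
    x = (lower + upper) / 2  # class mid-point
    d = (x - A) / i          # step deviation
    sum_fd += f * d
    sum_fd2 += f * d * d

mean = A + sum_fd / N * i
sigma = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * i

# Cross-check with the definitional formula sqrt(sum f (X - mean)^2 / N).
sigma_direct = math.sqrt(
    sum(f * ((lo + up) / 2 - mean) ** 2 for lo, up, f in classes) / N
)
print(N, sum_fd, sum_fd2, round(mean, 2), round(sigma, 2))
```

With these figures Σfd = -28 and Σfd² = 200, giving a mean of 14.44 and a standard deviation of about Rs. 2.77 lakhs by either route.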
The standard deviation is commonly used to measure variability, while all other
measures have rather special uses. In addition, it is the only measure possessing the
necessary mathematical properties to make it useful for advanced statistical work.
Activity E
The following data show the daily sales at a petrol station. Calculate the mean and
standard deviation.
Number of litres sold    No. of days
700-1000                 12
1000-1300                18
1300-1600                20
1600-1900
1900-2200                18
2200-2500                5
2500-2800                2
8.9 COEFFICIENT OF VARIATION
A frequently used relative measure of variation is the coefficient of variation, denoted
by C.V. This measure is simply the ratio of the standard deviation to the mean,
expressed as a percentage:
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$
The smaller the coefficient of variation of a set of data, the less variable, or more
consistent, the data are said to be.
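As a hedged illustration, the coefficient of variation can be computed for the profit distribution of 100 companies used in the standard deviation section. The helper names below are illustrative, and the divisor N is assumed for the standard deviation.

```python
import math


def grouped_mean_sd(classes):
    """Mean and standard deviation (divisor N) from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + up) / 2 for lo, up, f in classes) / n
    variance = sum(f * ((lo + up) / 2 - mean) ** 2 for lo, up, f in classes) / n
    return mean, math.sqrt(variance)


def coefficient_of_variation(mean, sd):
    """C.V. = (standard deviation / mean) x 100."""
    return sd / mean * 100


# Profits in Rs. lakhs: (lower limit, upper limit, number of companies).
profits = [(8, 10, 8), (10, 12, 12), (12, 14, 20),
           (14, 16, 30), (16, 18, 20), (18, 20, 10)]

mean, sd = grouped_mean_sd(profits)
cv = coefficient_of_variation(mean, sd)
print(round(mean, 2), round(sd, 2), round(cv, 1))
```

Here the C.V. comes to roughly 19.2 per cent. Computing the same figure for a second distribution would allow the two to be compared for consistency, whatever their units or means.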
Consider the following data which relate to the mean daily sales and standard
deviation for four regions.
As the coefficient of variation is smallest for Region 1, Region 1 is the most
consistent region.
Activity F
A factory produces two types of electric lamps, A and B. In an experiment relating to
their life, the following results were obtained.
Length of life (in hours)    Type A: No. of lamps    Type B: No. of lamps
500-700                      5                       4
700-900                      11                      30
900-1100                     26                      12
1100-1300                    10                      8
1300-1500                    8                       6
Compare the variability of the life of the two types of electric lamps using the
coefficient of variation.
8.10 SKEWNESS
The measures of central tendency and variation do not reveal all the characteristics of
a given set of data. For example, two distributions may have the same mean and
standard deviation but may differ widely in the shape of their distribution. Either the
distribution of data is symmetrical or it is not. If the distribution of data is not
symmetrical, it is called asymmetrical or skewed. Thus skewness refers to the lack of
symmetry in distribution.
A simple method of detecting the direction of skewness is to consider the tails of the
distribution (Figure I). The rules are:
Data are symmetrical when there are no extreme values in a particular direction so
that low and high values balance each other. In this case, mean = median = mode.
(see Fig I(a) ).
If the longer tail is towards the lower value or left hand side, the skewness is
negative. Negative skewness arises when the mean is decreased by some extremely
low values, thus making mean < median < mode. (see Fig I(b) ).
If the longer tail of the distribution is towards the higher values or right hand side, the
skewness is positive. Positive skewness occurs when mean is increased by some
unusually high values, thereby making mean > median > mode. (see Fig I(c) )
Figure I: (a) Symmetrical distribution; (b) Negatively skewed distribution; (c) Positively skewed distribution
8.11 RELATIVE SKEWNESS
In order to make comparisons between the skewness in two or more distributions, the
coefficient of skewness (given by Karl Pearson) can be defined as:
$$\text{SK.} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$
If the mode cannot be determined, then using the approximate relationship
Mode = 3 Median - 2 Mean, the above formula reduces to
$$\text{SK.} = \frac{3\,(\text{Mean} - \text{Median})}{\text{S.D.}}$$
If the value of this coefficient is zero, the distribution is symmetrical; if it is
positive, the distribution is positively skewed; and if it is negative, the
distribution is negatively skewed. In practice, the value of this coefficient usually
lies between ±1.
When we are given open-end distributions, data in which extreme values are present, or
only positional measures such as the median and quartiles, the following formula for
the coefficient of skewness (given by Bowley) is more appropriate:
$$\text{SK.} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
To explain the concept of coefficient of skewness, let us consider the following data.
Profits (Rs. thousand)    No. of companies
10-12                     7
12-14                     15
14-16                     18
16-18                     20
18-20                     25
20-22                     10
22-24                     5
Since the given distribution is not open-ended and the mode can be determined,
it is appropriate to apply Karl Pearson's formula as given below:
$$\text{SK.} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$
Profits (Rs. thousand)    m.p. (X)    f      d = (X - 17)/2    fd     fd²
10-12                     11          7      -3                -21    63
12-14                     13          15     -2                -30    60
14-16                     15          18     -1                -18    18
16-18                     17          20     0                 0      0
18-20                     19          25     +1                25     25
20-22                     21          10     +2                20     40
22-24                     23          5      +3                15     45
N = 100, ∑fd = -9, ∑fd² = 251
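The remaining arithmetic can be sketched as follows. The mean and standard deviation use the step-deviation totals from the table; the mode is obtained by the usual interpolation within the modal class, Mode = L + (f1 - f0) / (2 f1 - f0 - f2) × i, which is an assumption here since the unit's own working is not shown. Variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency), profits in Rs. thousand.
classes = [(10, 12, 7), (12, 14, 15), (14, 16, 18), (16, 18, 20),
           (18, 20, 25), (20, 22, 10), (22, 24, 5)]

A, i = 17, 2  # assumed mean and class interval, as in the table
N = sum(f for _, _, f in classes)
sum_fd = sum(f * ((lo + up) / 2 - A) / i for lo, up, f in classes)
sum_fd2 = sum(f * (((lo + up) / 2 - A) / i) ** 2 for lo, up, f in classes)

mean = A + sum_fd / N * i
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * i

# Mode by interpolation within the class of highest frequency (assumed method).
k = max(range(len(classes)), key=lambda j: classes[j][2])
lo, up, f1 = classes[k]
f0 = classes[k - 1][2] if k > 0 else 0                 # preceding class
f2 = classes[k + 1][2] if k + 1 < len(classes) else 0  # succeeding class
mode = lo + (f1 - f0) / (2 * f1 - f0 - f2) * (up - lo)

sk = (mean - mode) / sd
print(round(mean, 2), round(mode, 2), round(sd, 2), round(sk, 2))
```

Under these assumptions the mean is 16.82, the mode 18.5, the standard deviation about 3.16 and the coefficient of skewness about -0.53, indicating a negatively skewed distribution.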
Sales (Rs. lakhs)    No. of companies    c.f.
Below 50             8                   8
50-60                12                  20
60-70                20                  40
70-80                25                  65
80 & above           15                  80
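Since this sales distribution is open-ended at both ends, Bowley's formula is the natural choice. The sketch below assumes the quartiles and median are the values of the (N/4)th, (N/2)th and (3N/4)th items, found by linear interpolation within the class; `grouped_quantile` is an illustrative helper.

```python
# (lower limit, upper limit, number of companies); None marks an open end.
classes = [(None, 50, 8), (50, 60, 12), (60, 70, 20), (70, 80, 25), (80, None, 15)]


def grouped_quantile(classes, fraction):
    """Value below which `fraction` of the observations fall (linear interpolation)."""
    total = sum(f for _, _, f in classes)
    target = fraction * total
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_quantile(classes, 0.25)
med = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)

sk = (q3 + q1 - 2 * med) / (q3 - q1)  # Bowley's coefficient of skewness
print(q1, med, q3, round(sk, 3))
```

With these conventions Q1 = 60, Median = 70 and Q3 = 78, so the coefficient is (78 + 60 - 140) / 18, or about -0.11.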
This value of coefficient of skewness indicates that the distribution is slightly skewed
to the left and therefore there is a greater concentration of the sales at the higher
values than the lower values of the distribution.
8.12 SUMMARY
In this unit, we have shown how the concepts of measures of variation and skewness
are important. Measures of variation considered were the range, average deviation,
quartile deviation and standard deviation. The concept of coefficient of variation was
used to compare relative variations of different data. The skewness was used in
relation to lack of symmetry.
Interquartile Range considers the spread in the middle 50% (Q3 – Q1 ) of the data.
Quartile Deviation is one half the distance between first and third quartiles.
Range is the difference between the largest and the smallest value in a set of data.
Relative Variation is used to compare two or more distributions by relating the
variation of one distribution to the variation of the other.
Skewness refers to the lack of symmetry.
Standard Deviation is the root mean square deviation of a given set of data.
Variance is the square of standard deviation and is defined as the arithmetic mean of
the squared deviations from the mean.
12 You are given the following information before and after the settlement of
workers' strike.
Assuming that the increase in wage is a loss to the management, comment on the
gains and losses from the point of view of workers and that of management.
8.15 FURTHER READINGS
Clark, T.C. and E.W. Jordan, 1985. Introduction to Business and Economic Statistics,
South-Western Publishing Co.
Gupta, S.P. and M.P. Gupta, 1988. Business Statistics, Sultan Chand & Sons: New
Delhi.
Moskowitz, H. and G.P. Wright, 1985. Statistics for Management and Economics,
Charles E. Merrill Publishing Company.