Measures of Dispersion and Skewness
LECTURER: ANGELA JOHN
3.1 Introduction
3.2 Meaning and Definition of Dispersion
3.3 Significance and Properties of Measuring Variation
3.4 Measures of Dispersion
3.5 Range
3.6 Interquartile Range or Quartile Deviation
3.7 Mean Deviation
3.8 Standard Deviation
3.9 Lorenz Curve
3.10 Skewness: Meaning and Definitions
3.11 Tests of Skewness
3.12 Measures of Skewness
3.13 Moments
3.14 Kurtosis
3.15 Summary
3.16 Self-Test Questions
3.17 Suggested Readings
3.1 INTRODUCTION
In the previous chapter, we explained the measures of central tendency. It may
be noted that these measures do not indicate the extent of dispersion or variability in a
distribution. The dispersion or variability provides us one more step in increasing our
understanding of the pattern of the data. Further, a high degree of uniformity (i.e. low
degree of dispersion) is a desirable quality. If a business faces a high degree of
variability in its raw material, it may not find mass production economical.
Suppose an investor is looking for a suitable equity share for investment. While
examining the movement of share prices, he should avoid those shares that are highly
fluctuating, having sometimes very high prices and at other times going very low.
Such extreme fluctuations mean that there is a high risk in the investment in shares.
The investor should, therefore, prefer those shares where risk is not so high.
The various measures of central value give us one single figure that represents the
entire data. But the average alone cannot adequately describe a set of observations,
unless all the observations are the same. It is necessary to describe the variability or
dispersion of the observations. In two or more distributions the central value may be
the same but still there can be wide disparities in the formation of the
distribution.
It is clear from above that dispersion (also known as scatter, spread or variation)
measures the extent to which the items vary from some central value. Since measures
of dispersion give an average of the differences of the items from an average,
they are also called averages of the second order. An average is more meaningful
when it is examined in the light of dispersion. For example, if the average wage of the
workers of factory A is Rs. 3885 and that of factory B Rs. 3900, we cannot
necessarily conclude that the workers of factory B are better off because in factory B
there may be much greater dispersion in the distribution of wages. The study of
dispersion brings out this fact, as can be seen from the
following example:
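Series A   Series B   Series C
100        100        1
100        102        2
100        103        3
100        90         5
In series A, each item is perfectly represented by the
arithmetic mean and hence there is no dispersion. In series B, only one item is
perfectly represented by the arithmetic mean and the other items vary but the variation
is very small compared to series C. In series C, not a single item is represented by
the arithmetic mean and the items vary widely from one another. In series C,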
dispersion is much greater compared to series B. Similarly, we may have two groups
of labourers with the same mean salary and yet their distributions may differ widely.
The mean salary may not be so important a characteristic as the variation of the items
from the mean. To the student of social affairs the mean income is not so vitally
important as to know how this income is distributed. Are a large number receiving the
mean income or are there a few with enormous incomes and millions with incomes far
below the mean? The three figures given in Box 3.1 represent frequency distributions
with some of the characteristics. The two curves in diagram (a) represent two
distributions with the same mean $\bar{X}$, but with different dispersions. The two curves in
(b) represent two distributions with the same dispersion but with unequal means $\bar{X}_1$
and $\bar{X}_2$; (c) represents two distributions with unequal dispersion. The measures of
central tendency are, therefore, insufficient. They must be supported and supplemented
by measures of dispersion.
Dispersion measures the extent to which there are differences between individual observations and
some central value. In measuring dispersion, we are interested in the
amount of the variation or its degree but not in the direction. For example, a measure
of 6 inches below the mean has just as much dispersion as a measure of six inches
above the mean.
The literal meaning of dispersion is ‘scatteredness’. Averages, or the measures of central
tendency, give us an idea of the concentration of the observations about the central
part of the distribution. If we know the average alone, we cannot form a complete idea
about the distribution. But with the help of dispersion, we have an idea about
3.3 SIGNIFICANCE AND PROPERTIES OF MEASURING VARIATION
the mass. When dispersion is small, the average is a typical value in the sense
that it closely represents the individual value and it is reliable in the sense that
hand, when dispersion is large, the average is not so typical, and unless the
in body temperature, pulse beat and blood pressure are the basic guides to
the causes of which are sought through inspection is basic to the control of
with regard to their variability. The study of variation may also be looked
upon as a means of determining uniformity or consistency. A high degree of
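3.4 MEASURES OF DISPERSION
There are five measures of dispersion: Range, Interquartile Range or Quartile
Deviation, Mean Deviation, Standard Deviation, and Lorenz Curve. Among them, the
first four are mathematical methods and the last one is the graphical method. These
are discussed in turn below.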
3.5 RANGE
The simplest measure of dispersion is the range, which is the difference between the
maximum value and the minimum value of the data.
Example 3.1: Find the range for the following three sets of data:
Set 1: 5 15 15 5 15 5 15 15 15 15
Set 2: 8 7 15 11 12 5 13 11 15 9
Set 3: 5 5 5 5 5 5 5 5 5 5
Solution: In each of these three sets, the highest number is 15 and the lowest number
is 5. Since the range is the difference between the maximum value and the minimum
value of the data, it is 10 in each case. But the range fails to give any idea about the
dispersal or spread of the series between the highest and the lowest value.
In a frequency distribution, the range is taken as the difference between the
upper limit of the highest class and the lower limit of the lowest class.
Example 3.2: Find the range for the following frequency distribution:
Solution: Here, the upper limit of the highest class is 120 and the lower limit of the
lowest class is 20. Hence, the range is 120 - 20 = 100. Note that the range is not
influenced by the class frequencies. Symbolically, range = L -
S, where L is the largest value and S is the smallest value in a distribution. The
coefficient of range is calculated by the formula: (L - S) / (L + S). This is the relative
measure. The coefficient of the range in respect of the earlier example having three
sets of data is (15 - 5) / (15 + 5) = 0.5. The coefficient of range is more appropriate for purposes of
comparison, as the following example shows.
Example 3.3: Calculate the coefficient of range separately for the two sets of data
given below:
Set 1 8 10 20 9 15 10 13 28
Set 2 30 35 42 50 32 49 39 33
Solution: It can be seen that the range in both the sets of data is the same:
Set 1: 28 - 8 = 20
Set 2: 50 - 30 = 20
Coefficient of range in set 1 is:
(28 - 8) / (28 + 8) = 20/36 = 0.56 approx.
Coefficient of range in set 2 is:
(50 - 30) / (50 + 30) = 20/80 = 0.25
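The calculation in Example 3.3 can be checked with a short Python sketch. The function names `value_range` and `coefficient_of_range` are illustrative choices, not part of the lesson; the formulas are L - S and (L - S) / (L + S) as defined above.

```python
def value_range(data):
    """Absolute measure: largest value minus smallest value (L - S)."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure: (L - S) / (L + S)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


# Data of Example 3.3
set_1 = [8, 10, 20, 9, 15, 10, 13, 28]
set_2 = [30, 35, 42, 50, 32, 49, 39, 33]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(name, value_range(data), round(coefficient_of_range(data), 2))
```

Both sets have the same range, yet the coefficient of range differs, which is why the relative measure is preferred for comparison.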
1. It is based only on two items and does not cover all the items in a distribution.
population.
3. It fails to give any idea about the pattern of distribution. This was evident from
the range.
Despite these limitations of the range, it is mainly used in situations where one wants
to quickly have some idea of the variability of a set of data. When the sample size is
very small, the range is considered quite an adequate measure of the variability. Thus, it
is widely used in quality control where a continuous check on the variability of raw
weather forecast. The meteorological department uses the range by giving the
maximum and the minimum temperatures. This information is quite useful to the
common man, as he can know the extent of possible variation in the temperature on a
particular day.
3.6 INTERQUARTILE RANGE OR QUARTILE DEVIATION
The interquartile range or the quartile deviation is a better measure of variation in a
distribution than the range. It excludes 25 percent of the distribution at each
end and uses only the middle 50 percent of the distribution. In other words, the
interquartile range denotes the difference between the third quartile and the first
quartile: IQR = Q3 - Q1.
Many times the interquartile range is reduced to the form of the semi-interquartile range
or quartile deviation: QD = (Q3 - Q1) / 2.
When quartile deviation is small, it means that there is a small deviation in the central
50 percent items. In contrast, if the quartile deviation is high, it shows that the central
50 percent items have a large variation. In a symmetrical
distribution, the two quartiles, that is, Q3 and Q1, are equidistant from the median.
Symbolically,
M - Q1 = Q3 - M
However, this is seldom the case as most of the business and economic data are
asymmetrical. But, one can assume that approximately 50 percent of the observations
are contained in the interquartile range. It may be noted that interquartile range or the
Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)
upper and lower quartiles. As the computation of the two quartiles has already been
3.6.1 MERITS OF QUARTILE DEVIATION
3.7 MEAN DEVIATION
The mean deviation is also known as the average deviation. As the name implies, it is
the average of absolute amounts by which the individual items deviate from the mean.
Since the positive deviations from the mean are equal to the negative deviations,
their algebraic sum is always zero; while computing the mean deviation, we therefore ignore positive and negative signs.
Symbolically,
$MD = \frac{\sum |x - \bar{x}|}{n}$
where MD = mean deviation, $|x - \bar{x}|$ = deviation of an item
from the mean ignoring positive and negative signs, n = the total number of
observations.
Example 3.4:
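Solution:
Class   Mid-point (m)   f     fm    d = m - mean   f|d|
2-4     3               20    60    -2.6           52
4-6     5               40    200   -0.6           24
6-8     7               30    210   1.4            42
8-10    9               10    90    3.4            34
Total                   100   560                  152
$\bar{x} = \frac{\sum fm}{n} = \frac{560}{100} = 5.6$
$MD(\bar{x}) = \frac{\sum f|d|}{n} = \frac{152}{100} = 1.52$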
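The steps of Example 3.4 can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` and the function name `grouped_mean_deviation` are assumptions made for this illustration only.

```python
# Classes of Example 3.4 as (lower limit, upper limit, frequency)
classes = [(2, 4, 20), (4, 6, 40), (6, 8, 30), (8, 10, 10)]


def grouped_mean_deviation(classes):
    """Return (mean, mean deviation about the mean) for grouped data."""
    n = sum(f for _, _, f in classes)
    # Each class is represented by its mid-point m = (lower + upper) / 2
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    # Absolute deviations are weighted by the class frequency
    md = sum(f * abs((lo + hi) / 2 - mean) for lo, hi, f in classes) / n
    return mean, md


mean, md = grouped_mean_deviation(classes)
print(round(mean, 2), round(md, 2))
```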
easy to calculate.
2. It takes into consideration each and every item in the distribution. As a result,
a change in the value of any item will have its effect on the magnitude of mean
deviation.
3. The values of extreme items have less effect on the value of the mean
deviation.
2. At times it may fail to give accurate results. The mean deviation gives best
results when deviations are taken from the median instead of from the mean.
But in a series, which has wide variations in the items, median is not a
satisfactory measure.
algebraic signs when the deviations are taken from the mean.
3.8 STANDARD DEVIATION
The standard deviation is similar to the mean deviation in that here too the deviations
are measured from the mean. At the same time, the standard deviation is preferred to
the mean deviation or the quartile deviation or the range because it has desirable
mathematical properties.
Before defining the concept of the standard deviation, we introduce another concept
viz. variance.
Example 3.5:
X       X - μ            (X - μ)²
20      20 - 18 = 2      4
15      15 - 18 = -3     9
19      19 - 18 = 1      1
24      24 - 18 = 6      36
16      16 - 18 = -2     4
14      14 - 18 = -4     16
Total 108                70
Solution:
Mean $\mu = \frac{108}{6} = 18$
The second column shows the deviations from the mean. The third or the last column
shows the squared deviations, the sum of which is 70. The arithmetic mean of the
squared deviations is:
$\frac{\sum (x_i - \mu)^2}{N} = \frac{70}{6} = 11.67$ approx.
This mean of the squared deviations is known as the variance. It may be noted that
this variance is described by different terms that are used interchangeably: the
variance of the distribution X; the variance of X; the variance of the distribution; and
simply $\sigma^2$. It is also written as
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
The variance is expressed in squared units of the original data.
If a distribution relates to income of families then the variance is in (Rs)² and
not rupees. Similarly, if another distribution pertains to marks of students, then the
unit of variance is (marks)². To overcome this inadequacy, the square root of variance
is taken, which yields a better measure of dispersion known as the standard deviation.
Taking our earlier example of individual observations, we take the square root of the
variance: $\sigma = \sqrt{11.67} = 3.42$.
Symbolically, $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
In applied Statistics, the standard deviation is more frequently used than the variance.
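As a quick check of Example 3.5, the sketch below uses Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as the lesson does. The variable names are illustrative; the last lines apply the equivalent direct formula $\sqrt{\sum x^2/N - (\sum x/N)^2}$.

```python
import statistics

values = [20, 15, 19, 24, 16, 14]  # observations of Example 3.5

mean = statistics.mean(values)
variance = statistics.pvariance(values)   # population variance: divides by N
std_dev = statistics.pstdev(values)       # population standard deviation

# Equivalent direct formula that needs no deviations from the mean
n = len(values)
shortcut = (sum(x * x for x in values) / n - (sum(values) / n) ** 2) ** 0.5

print(mean, round(variance, 2), round(std_dev, 2), round(shortcut, 2))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p` prefix) divide by N - 1 and would therefore give slightly larger values.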
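The same result can be obtained directly from the observations, without computing deviations from the mean:
$\sigma = \sqrt{\frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2}$
We use this formula to calculate the standard deviation from the individual
observations given earlier.
Example 3.6: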
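X       X²
20      400
15      225
19      361
24      576
16      256
14      196
Total 108   2014
Solution:
$\sum x_i^2 = 2014$, $\sum x_i = 108$, N = 6
$\sigma = \sqrt{\frac{2014}{6} - \left(\frac{108}{6}\right)^2} = \sqrt{\frac{2014 - \frac{11664}{6}}{6}}$
$= \sqrt{\frac{70}{6}} = \sqrt{11.67}$
$= 3.42$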
Example 3.7:
60- 70 6
70- 80 3
80- 90 2
90-100 1
Solution: For a frequency distribution, the standard deviation is
$\sigma = \sqrt{\frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}}$
Where $m_i$ is the mid-point of the class intervals, $\mu$ is the mean of the distribution, $f_i$ is
the frequency of each class; N is the total number of frequency and K is the number of
classes. This formula requires that the mean be calculated and that deviations ($m_i - \mu$)
be obtained for each class. To avoid this inconvenience, the above formula can be
modified as:
$\sigma = C \sqrt{\frac{\sum_{i=1}^{K} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{K} f_i d_i}{N}\right)^2}$
Where C is the class interval; $f_i$ is the frequency of the ith class and $d_i$ is the deviation
of the ith item from an assumed origin, measured in units of the class interval; and N is the total number of observations.
$\sigma = 10 \sqrt{\frac{231}{55} - \left(\frac{45}{55}\right)^2}$
$= 10 \sqrt{4.2 - 0.669421}$
$= 18.8$ marks
When it becomes clear that the actual mean would turn out to be in fraction,
calculating deviations from the mean would be too cumbersome. In such cases,
an assumed mean is used and the deviations from it are calculated. While mid-
point of any class can be taken as an assumed mean, it is advisable to choose
the mid-point of that class that would make calculations least cumbersome.
Guided by this consideration, in Example 3.7 we have decided to choose 55 as
the mid-point and, accordingly, deviations have been taken from it. It will be
seen from the calculations that they are considerably simplified.
3.8.1 USES OF THE STANDARD DEVIATION
The standard deviation enables us to
determine how far individual items in a distribution deviate from its mean. In a
symmetrical, bell-shaped (normal) distribution:
(i) About 68 percent of the values in the population fall within ±1 standard
deviation from the mean.
(ii) About 95 percent of the values will fall within ±2 standard deviations from the
mean.
(iii) About 99.7 percent of the values will fall within ±3 standard deviations from
the mean.
The standard deviation is an absolute measure of dispersion, expressed in
the same units as the original data. As such, it cannot be a suitable measure while
comparing two or more distributions. For this purpose, we should use a relative
measure of dispersion. One such measure is the coefficient of
variation, which relates the standard deviation and the mean such that the standard
deviation is expressed as a percentage of mean. Thus, the specific unit in which the
standard deviation is measured is done away with and the new unit becomes percent.
Symbolically, CV (coefficient of variation) = $\frac{\sigma}{\mu} \times 100$
Example 3.8: In a small business firm, two typists are employed-typist A and typist
B. Typist A types out, on an average, 30 pages per day with a standard deviation of 6.
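Typist B, on an average, types out 45 pages with a standard deviation of 10. Which
typist shows greater consistency in his output?
Solution: Coefficient of variation for A = $\frac{\sigma}{\mu} \times 100 = \frac{6}{30} \times 100$ = 20%, and
Coefficient of variation for B = $\frac{\sigma}{\mu} \times 100 = \frac{10}{45} \times 100$ = 22.2%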
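The comparison in Example 3.8 reduces to one line of arithmetic per typist. A minimal sketch, where the helper name `coefficient_of_variation` is chosen for illustration:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(6, 30)    # typist A
cv_b = coefficient_of_variation(10, 45)   # typist B

print(round(cv_a, 1), round(cv_b, 1))
# The typist with the smaller coefficient is the more consistent one
print("More consistent:", "A" if cv_a < cv_b else "B")
```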
These calculations clearly indicate that although typist B types out more pages, there
is a greater variation in his output as compared to that of typist A. We can say this in a
different way: Though typist A's daily output is much less, he is more consistent than
typist B. The coefficient of variation is particularly useful when comparing
two groups of data having different means, as has been the case in the above example.
The variable $Z = (x - \bar{x})/s$, which measures the deviation from the mean
in units of the standard deviation, is called a standardised variable. Since both the
numerator and the denominator are in the same units, a standardised variable is
independent of units used. If deviations from the mean are given in units of the
standard deviation, they are said to be expressed in standard units or standard scores.
Through this concept of standardised variable, proper comparisons can be made
between individual observations that belong to distributions whose
compositions differ.
Example 3.9: A student has scored 68 marks in Statistics for which the average
marks were 60 and the standard deviation was 10. In the paper on Marketing, he
scored 74 marks for which the average marks were 68 and the standard deviation was
15. In which paper, Statistics or Marketing, was his relative standing higher?
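Solution: The standardised variable $Z = (x - \bar{x}) \div s$ measures the deviation of x from
the mean $\bar{x}$ in terms of standard deviation s. For Statistics, Z = (68 - 60) ÷ 10 = 0.8.
For Marketing, Z = (74 - 68) ÷ 15 = 0.4.
Since the standard score is 0.8 in Statistics as compared to 0.4 in Marketing, his
relative standing was higher in Statistics.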
Example 3.10: Convert the set of numbers 6, 7, 5, 10 and 12 into standard scores:
Solution:
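X       X²
6       36
7       49
5       25
10      100
12      144
$\sum X = 40$, $\sum X^2 = 354$
$\bar{x} = \frac{\sum X}{N} = \frac{40}{5} = 8$
$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{354}{5} - \left(\frac{40}{5}\right)^2} = \sqrt{70.8 - 64} = 2.61$
$Z = \frac{x - \bar{x}}{\sigma}$ (standard score):
(i) (6 - 8) / 2.61 = -0.77
(ii) (7 - 8) / 2.61 = -0.38
(iii) (5 - 8) / 2.61 = -1.15
(iv) (10 - 8) / 2.61 = 0.77
(v) (12 - 8) / 2.61 = 1.53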
Thus the standard scores for 6,7,5,10 and 12 are -0.77, -0.38, -1.15, 0.77 and 1.53,
respectively.
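The standard scores of Examples 3.9 and 3.10 can be reproduced with the sketch below. The function name `standard_score` is an illustrative choice; `statistics.pstdev` is used because the lesson divides by N.

```python
import statistics


def standard_score(x, mean, std_dev):
    """Deviation of x from the mean, in units of the standard deviation."""
    return (x - mean) / std_dev


# Example 3.9: relative standing in two papers
z_statistics = standard_score(68, 60, 10)
z_marketing = standard_score(74, 68, 15)

# Example 3.10: convert a set of numbers into standard scores
numbers = [6, 7, 5, 10, 12]
mean = statistics.mean(numbers)
sigma = statistics.pstdev(numbers)
scores = [round(standard_score(x, mean, sigma), 2) for x in numbers]

print(z_statistics, z_marketing)
print(scores)
```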
3.9 LORENZ CURVE
This measure of dispersion is graphical. It is known as the Lorenz curve named after
Dr. Max Lorenz. It is generally used to show the extent of concentration of income
and wealth. The steps involved in plotting the Lorenz curve are:
2. Calculate percentage for each item taking the total equal to 100.
3. Choose a suitable scale and plot the cumulative percentages of the persons and
income. Use the horizontal axis (X) to depict percentages of persons and the
vertical axis (Y) to depict percentages of income.
4. Show the line of equal distribution, which will join 0 of X-axis with 100 of Y-
axis.
5. The curve obtained in (3) above can now be compared with the straight line of
equal distribution obtained in (4) above. If the Lorenz curve is close to the line
of equal distribution, then it implies that the dispersion is much less. If, on the
contrary, the Lorenz curve is farther away from the line of equal distribution,
it implies that the dispersion is considerably higher.
The Lorenz curve is a simple graphical device to show the disparities of distribution.
Figure 3.1 shows two Lorenz curves by way of illustration. The straight line AB is a
line of equal distribution, whereas AEB shows complete inequality. Curve ACB and
curve ADB are the Lorenz curves.
[Figure 3.1: Lorenz curves]
As curve ACB is nearer to the line of equal distribution, it has more equitable
distribution of income than curve ADB. Assuming that these two curves are for the
same company, this may be interpreted in a different manner. Prior to taxation, the
curve ADB showed greater inequality in the income of its employees. After the
taxation, the company’s data resulted into ACB curve, which is closer to the line of
equal distribution. In other words, as a result of taxation, the inequality has reduced.
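The plotting steps for a Lorenz curve amount to computing cumulative percentages of persons and of income. The sketch below produces the points to be plotted; the income figures are hypothetical values invented for illustration, and `lorenz_points` is not a name from the lesson.

```python
def lorenz_points(incomes):
    """Return (cumulative % of persons, cumulative % of income) pairs."""
    ordered = sorted(incomes)           # poorest first
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]               # every Lorenz curve starts at the origin
    running = 0
    for k, income in enumerate(ordered, start=1):
        running += income
        points.append((100 * k / n, 100 * running / total))
    return points


# Hypothetical incomes of five persons (illustrative data only)
incomes = [10, 20, 30, 40, 100]
points = lorenz_points(incomes)
for x, y in points:
    print(f"{x:5.1f}% of persons hold {y:5.1f}% of income")
```

On the line of equal distribution each point would have equal coordinates; the further the second coordinate falls below the first, the greater the inequality.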
3.10 SKEWNESS: MEANING AND DEFINITIONS
It may
be repeated here that frequency distributions differ in three ways: Average value,
Variability or dispersion, and Shape. Since the first two, that is, average value and
variability or dispersion have already been discussed, here our
main spotlight will be on the shape of frequency distribution. Generally, there are two
characteristics, skewness and kurtosis, that describe the shape of a
distribution. Two distributions may have the same mean and standard deviation but
may differ widely in their overall appearance as can be seen from the following:
distributions.
symmetrical distribution the mean, median and mode are identical. The more
the mean moves away from the mode, the larger the asymmetry or skewness."
4. "A distribution is said to be 'skewed' when the mean and the median fall at
different points in the distribution, and the balance (or centre of gravity) is
shifted to one side or the other."
The above definitions show that the term 'skewness' refers to lack of symmetry, i.e.,
when a distribution is not symmetrical it is called a skewed
distribution.
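The concept of skewness will be clear from the following three diagrams showing a
symmetrical distribution, a positively skewed distribution and a negatively skewed
distribution.
1. Symmetrical Distribution. In a symmetrical distribution the values of mean, median and mode coincide. The
spread of the frequencies is the same on both sides of the centre point of the curve.
2. Asymmetrical Distribution. A distribution which is not symmetrical is called a
skewed distribution. It may be either positively skewed or negatively skewed.
3. Positively Skewed Distribution. In the positively skewed distribution the value of
the mean is maximum and that of mode least; the median lies in between the
two.
4. Negatively Skewed Distribution. In the negatively skewed distribution the value of mode is
maximum and that of mean least; the median lies in between the two. In the
positively skewed distribution the frequencies are spread out over a greater
range of values on the high-value end of the curve (the right-hand side) than
they are on the low-value end. In the negatively skewed distribution the
position is reversed, i.e. the excess tail is on the left-hand side. It should be
noted that in moderately skewed distributions the interval between the
mean and the median is approximately one-third of the interval between the
mean and the mode.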
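3.11 TESTS OF SKEWNESS
In order to ascertain whether a distribution is skewed or not the following tests may
be applied. Skewness is present if:
1. The values of mean, median and mode do not coincide.
2. When the data are plotted on a graph they do not give the normal bell-shaped
form, i.e. when cut along a vertical line through the centre the two
halves are not equal.
3. The sum of the positive deviations from the median is not equal to the sum
of the negative deviations.
4. Quartiles are not equidistant from the median.
5. Frequencies are not equally distributed at points of equal deviation from
the mode.
Conversely, skewness is absent if:
1. The values of mean, median and mode coincide.
2. The data, when plotted on a graph, give the normal bell-shaped form.
3. Sum of the positive deviations from the median is equal to the sum of the
negative deviations.
4. Quartiles are equidistant from the median.
5. Frequencies are equally distributed at points of equal deviations from the
mode.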
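3.12 MEASURES OF SKEWNESS
There are four measures of skewness, each divided into absolute and relative
measures. The relative measure is known as the coefficient of skewness and is more
frequently used than the absolute measure of skewness. Further, when a comparison
between two or more distributions is involved, it is the relative measure of skewness
which is used. The measures of skewness are: (i) Karl Pearson's measure, (ii)
Bowley’s measure, (iii) Kelly’s measure, and (iv) Moment’s measure. These
are discussed briefly below.
Karl Pearson's measure
Skewness = Mean - Mode
Coefficient of skewness = (Mean - Mode) / SD
When the mode is ill-defined, the relation Mean - Mode = 3 (Mean - Median), which holds
approximately for moderately skewed distributions, gives:
Coefficient of skewness = 3 (Mean - Median) / SD
The direction of skewness is determined by ascertaining whether the mean is greater
than the mode or less than the mode. If it is greater than the mode, then skewness is
positive. But when the mean is less than the mode, it is negative. The difference
between the mean and mode indicates the extent of departure from symmetry. It is
an absolute measure, expressed in the original
unit of measurement. The value of the coefficient of
skewness usually lies between ±1. If the mean is greater than the mode, then the coefficient of
skewness is positive; otherwise it is negative.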
Example 3.11: Given the following data, calculate the Karl Pearson's coefficient of
Solution:
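Mean ($\bar{x}$) = $\frac{\sum X}{N} = \frac{452}{10} = 45.2$
SD = $\sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{24270}{10} - \left(\frac{452}{10}\right)^2} = \sqrt{2427 - (45.2)^2} = 19.59$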
Applying the values of mean, mode and standard deviation in the above formula,
This shows that there is a positive skewness though the extent of skewness is
marginal.
Example 3.12: From the following data, calculate the measure of skewness using the
mean, median and standard deviation:
X 10 - 20 20 - 30 30 - 40 40 - 50 50-60 60 - 70 70 - 80
f 18 30 40 55 38 20 16
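Solution:
x          MVx      dx    f      fdx    fdx²   cf
10 - 20    15       -3    18     -54    162    18
20 - 30    25       -2    30     -60    120    48
30 - 40    35       -1    40     -40    40     88
40 - 50    45 = a   0     55     0      0      143
50 - 60    55       1     38     38     38     181
60 - 70    65       2     20     40     80     201
70 - 80    75       3     16     48     144    217
Total                     217    -28    584
a = Assumed mean = 45, MVx = Mid-value of the class, cf = Cumulative frequency,
dx = Deviation from assumed mean in units of the class interval, and i = 10
$\bar{x} = a + \frac{\sum f dx}{N} \times i = 45 + \frac{-28}{217} \times 10 = 43.71$
Median = $l_1 + \frac{l_2 - l_1}{f_1}(m - c)$
where m = (N + 1)/2 = 109th item, which lies in the 40 - 50 class
Median = $40 + \frac{50 - 40}{55}(109 - 88) = 40 + \frac{10}{55} \times 21$
= 43.82
SD = $\sqrt{\frac{\sum f dx^2}{\sum f} - \left(\frac{\sum f dx}{\sum f}\right)^2} \times i = \sqrt{\frac{584}{217} - \left(\frac{-28}{217}\right)^2} \times 10$ = 16.4
Skewness = 3 (Mean - Median)
= 3 (43.71 - 43.82)
= 3 x -0.11
= -0.33
Coefficient of skewness = Skewness / SD
= -0.33 / 16.4
= -0.02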
The result shows that the distribution is negatively skewed, but the extent of skewness
is extremely negligible.
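The working of Example 3.12 can be verified with the following sketch. It computes the mean and standard deviation directly from the mid-points, which is algebraically equivalent to the assumed-mean method used above, and locates the median at the (N + 1)/2-th item as the lesson does. The function names are illustrative.

```python
import math

# Classes of Example 3.12 as (lower limit, upper limit, frequency)
classes = [(10, 20, 18), (20, 30, 30), (30, 40, 40), (40, 50, 55),
           (50, 60, 38), (60, 70, 20), (70, 80, 16)]


def grouped_mean_sd(classes):
    """Mean and standard deviation of grouped data, using class mid-points."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    var = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / n
    return mean, math.sqrt(var)


def grouped_median(classes):
    """Median by interpolation: l1 + (l2 - l1) / f1 * (m - c)."""
    n = sum(f for _, _, f in classes)
    m = (n + 1) / 2                     # position convention used in the lesson
    cum = 0                             # c: cumulative frequency before the class
    for lo, hi, f in classes:
        if cum + f >= m:
            return lo + (hi - lo) / f * (m - cum)
        cum += f
    raise ValueError("position lies beyond the data")


mean, sd = grouped_mean_sd(classes)
median = grouped_median(classes)
coefficient = 3 * (mean - median) / sd

print(round(mean, 2), round(median, 2), round(sd, 2), round(coefficient, 2))
```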
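Bowley's measure
Bowley's coefficient of skewness = $\frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
Where Q3 and Q1 are upper and lower quartiles and M is the median. The value of this
skewness varies between ±1. In the case of open-ended distribution as well as where
extreme values are found in the series, this measure is particularly useful. In a
symmetrical distribution, skewness is zero. This means that Q3 and Q1 are positioned
equidistantly from Q2, that is, the median, so that Q3 - Q2 = Q2 - Q1. But
when the distribution is skewed, then Q3 - Q2 will be different from Q2 - Q1. When Q3
- Q2 exceeds Q2 - Q1, skewness is positive; when it is less, skewness is negative. The
absolute measure can be
written as:
Skewness = (Q3 - Q2) - (Q2 - Q1)
Being an absolute measure, it is not suitable for
comparing two distributions where the units of measurement are different. In view of
this, a relative measure is used:
Relative Skewness = $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)}$
$= \frac{Q_3 - Q_2 - Q_2 + Q_1}{Q_3 - Q_2 + Q_2 - Q_1}$
$= \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
$= \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$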
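Example 3.13: For a distribution, Bowley’s coefficient of skewness is - 0.56,
Q1 = 16.4 and median = 24.2. What is the coefficient of quartile deviation?
Solution:
Bowley's coefficient of skewness is: $Sk_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
$-0.56 = \frac{Q_3 + 16.4 - (2 \times 24.2)}{Q_3 - 16.4} = \frac{Q_3 + 16.4 - 48.4}{Q_3 - 16.4}$
$-0.56 (Q_3 - 16.4) = Q_3 - 32$
$-0.56 Q_3 + 9.184 = Q_3 - 32$
$-1.56 Q_3 = -41.184$
$Q_3 = \frac{41.184}{1.56} = 26.4$
Now, we have the values of both the upper and the lower quartiles.
Coefficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{26.4 - 16.4}{26.4 + 16.4} = \frac{10}{42.8}$ = 0.234 approx.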
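Example 3.14: Calculate an appropriate measure of skewness from the following
data:
Value in Rs      Frequency
Less than 50     40
50 - 100         80
100 - 150        130
150 - 200        60
200 and above    30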
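Solution: It should be noted that the series given in the question is an open-ended
series. As such, Bowley's coefficient of skewness, which is based on quartiles, would
be the most appropriate measure of skewness in this case. In order to calculate the
quartiles and the median, we have to use the cumulative frequency. The table is
given below with cumulative frequencies (n = 340):
Value in Rs      Frequency   Cumulative frequency
Less than 50     40          40
50 - 100         80          120
100 - 150        130         250
150 - 200        60          310
200 and above    30          340
$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$
Now m = (n + 1)/4 th item = 341/4 = 85.25, which lies in the 50 - 100 class
$Q_1 = 50 + \frac{100 - 50}{80}(85.25 - 40) = 78.28$
For the median, m = (n + 1)/2 th item = 341/2 = 170.5, which lies in the 100 - 150 class
$M = 100 + \frac{150 - 100}{130}(170.5 - 120) = 119.4$
$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$
m = 3(341) ÷ 4 = 255.75, which lies in the 150 - 200 class
$Q_3 = 150 + \frac{200 - 150}{60}(255.75 - 250) = 154.79$
Bowley's coefficient of skewness = $\frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{154.79 + 78.28 - 2(119.4)}{154.79 - 78.28}$
= - 0.075 approx.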
This shows that there is a negative skewness, which has a very negligible magnitude.
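A sketch of the quartile interpolation behind Example 3.14 follows. Two assumptions should be noted: the frequencies 130 and 30 for the last three classes are inferred from the figures used in the solution (f1 = 130, c = 250, n + 1 = 341), and the outer limits 0 and 250 of the open-ended classes are placeholders that the calculation never uses. The helper name `value_at_position` is illustrative.

```python
def value_at_position(classes, m):
    """Interpolate the value of the m-th item: l1 + (l2 - l1) / f1 * (m - c)."""
    cum = 0                              # c: cumulative frequency before the class
    for lower, upper, freq in classes:
        if cum + freq >= m:
            return lower + (upper - lower) / freq * (m - cum)
        cum += freq
    raise ValueError("position lies beyond the data")


# (lower, upper, frequency); 0 and 250 are placeholder limits for the
# open-ended classes and do not enter the calculation
classes = [(0, 50, 40), (50, 100, 80), (100, 150, 130),
           (150, 200, 60), (200, 250, 30)]
n = sum(f for _, _, f in classes)

q1 = value_at_position(classes, (n + 1) / 4)
median = value_at_position(classes, (n + 1) / 2)
q3 = value_at_position(classes, 3 * (n + 1) / 4)

bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(q1, 2), round(median, 2), round(q3, 2), round(bowley, 3))
```

Because only the quartiles are needed, the unknown limits of the open-ended classes do not matter, which is the reason Bowley's measure suits such series.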
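Kelly's measure
Kelly's coefficient of skewness = $\frac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}}$
Or, $\frac{D_1 + D_9 - 2M}{D_9 - D_1}$
Where P and D stand for percentile and decile respectively. In order to calculate the
coefficient of skewness by this formula, we have to ascertain the values of 10th, 50th
and 90th percentiles. This measure of skewness is seldom used. All the
same, an example is worked out below using the data of Example 3.12.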
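Class Intervals   f     cf
10 - 20           18    18
20 - 30           30    48
30 - 40           40    88
40 - 50           55    143
50 - 60           38    181
60 - 70           20    201
70 - 80           16    217
$P_{10} = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$, where m = (n + 1)/10th item = (217 + 1)/10 = 21.8th item,
which lies in the 20 - 30 class
$P_{10} = 20 + \frac{30 - 20}{30}(21.8 - 18) = 21.27$
P50 (median): where m = (n + 1)/2th item = (217 + 1)/2 = 109th item, which lies in the 40 - 50 class
$P_{50} = 40 + \frac{50 - 40}{55}(109 - 88) = 43.82$
P90: where m = 9(n + 1)/10th item = 196.2th item, which lies in the 60 - 70 class
$P_{90} = 60 + \frac{70 - 60}{20}(196.2 - 181) = 67.6$
Kelly's skewness = $\frac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}} = \frac{21.27 + 67.6 - 2(43.82)}{67.6 - 21.27}$
= (88.87 - 87.64) / 46.33
= 0.027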
This shows that the series is positively skewed though the extent of skewness is
extremely negligible. It may be recalled that if there is a perfectly symmetrical
distribution, then the skewness will be zero. One can see that the above answer
is very close to zero.
3.13 MOMENTS
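In mechanics, the term moment is used to denote the rotating effect of a force. In
Statistics, it is used to describe the characteristics of a frequency distribution. The utility of
moments lies in the sense that they indicate different aspects of a given distribution.
Thus, by using moments, we can measure the central tendency of a series, dispersion
or variability, skewness and the peakedness of the curve. The moments about the
actual arithmetic mean are denoted by $\mu$. The first four moments about mean or
central moments are as follows:
First moment $\mu_1 = \frac{1}{N} \sum (x_i - \bar{x})$
Second moment $\mu_2 = \frac{1}{N} \sum (x_i - \bar{x})^2$
Third moment $\mu_3 = \frac{1}{N} \sum (x_i - \bar{x})^3$
Fourth moment $\mu_4 = \frac{1}{N} \sum (x_i - \bar{x})^4$
For a frequency distribution, the first four moments about the mean are:
First moment $\mu_1 = \frac{1}{N} \sum f_i (x_i - \bar{x})$
Second moment $\mu_2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$
Third moment $\mu_3 = \frac{1}{N} \sum f_i (x_i - \bar{x})^3$
Fourth moment $\mu_4 = \frac{1}{N} \sum f_i (x_i - \bar{x})^4$
It may be noted that the first central moment is zero, that is, $\mu_1 = 0$.
The third central moment $\mu_3$ is used to measure skewness. The fourth central moment
$\mu_4$ is used to measure kurtosis.
Karl Pearson suggested another measure of skewness, which is based on the third and
second central moments:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$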
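Example 3.16: Find the (a) first, (b) second, (c) third and (d) fourth moments for the
set of numbers 2, 3, 4, 5 and 6.
Solution:
(a) $\bar{x} = \frac{\sum x}{N} = \frac{2 + 3 + 4 + 5 + 6}{5} = \frac{20}{5} = 4$
(b) $\overline{x^2} = \frac{\sum x^2}{N} = \frac{2^2 + 3^2 + 4^2 + 5^2 + 6^2}{5} = \frac{4 + 9 + 16 + 25 + 36}{5} = 18$
(c) $\overline{x^3} = \frac{\sum x^3}{N} = \frac{2^3 + 3^3 + 4^3 + 5^3 + 6^3}{5} = \frac{8 + 27 + 64 + 125 + 216}{5} = 88$
(d) $\overline{x^4} = \frac{\sum x^4}{N} = \frac{2^4 + 3^4 + 4^4 + 5^4 + 6^4}{5} = \frac{16 + 81 + 256 + 625 + 1296}{5} = 454.8$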
Example 3.17: Using the same set of five figures as given in Example 3.16, find the
(a) first, (b) second, (c) third and (d) fourth moments about the mean.
Solution:
$m_1 = \frac{\sum (x - \bar{x})}{N} = \frac{(2 - 4) + (3 - 4) + (4 - 4) + (5 - 4) + (6 - 4)}{5} = \frac{-2 - 1 + 0 + 1 + 2}{5} = 0$
$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = \frac{4 + 1 + 0 + 1 + 4}{5} = 2$. It may be noted that $m_2$ is the variance.
$m_3 = \frac{\sum (x - \bar{x})^3}{N} = \frac{(-2)^3 + (-1)^3 + 0^3 + 1^3 + 2^3}{5} = \frac{-8 - 1 + 0 + 1 + 8}{5} = 0$
$m_4 = \frac{\sum (x - \bar{x})^4}{N} = \frac{(-2)^4 + (-1)^4 + 0^4 + 1^4 + 2^4}{5} = \frac{16 + 1 + 0 + 1 + 16}{5} = 6.8$
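Examples 3.16 and 3.17 differ only in the point about which the powers are taken. A minimal sketch, with the function names `raw_moment` and `central_moment` chosen for illustration:

```python
def raw_moment(data, r):
    """r-th moment about the origin: sum of x**r divided by N."""
    return sum(x ** r for x in data) / len(data)


def central_moment(data, r):
    """r-th moment about the mean: sum of (x - mean)**r divided by N."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)


data = [2, 3, 4, 5, 6]

raw = [raw_moment(data, r) for r in (1, 2, 3, 4)]
central = [central_moment(data, r) for r in (1, 2, 3, 4)]

print("About the origin:", raw)
print("About the mean:  ", central)
```

The first central moment is zero and the second is the variance. The third is zero here because the five values are symmetrical about their mean.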
Example 3.18: Calculate the first four central moments from the following data:
Class interval 50-60 60-70 70-80 80-90 90-100
Frequency 5 12 20 7 6
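Solution: Taking the mid-point 75 as the assumed mean and i = 10 as the class interval:
Class interval   f     m     d = m - 75   d' = d/i   fd'    fd'²   fd'³   fd'⁴
50 - 60          5     55    -20          -2         -10    20     -40    80
60 - 70          12    65    -10          -1         -12    12     -12    12
70 - 80          20    75    0            0          0      0      0      0
80 - 90          7     85    10           1          7      7      7      7
90 - 100         6     95    20           2          12     24     48     96
Total            50                                  -3     63     3      195
Moments about the assumed mean (the r-th moment is scaled by $i^r$):
$\mu_1' = \frac{\sum fd'}{N} \times i = \frac{-3}{50} \times 10 = -0.6$
$\mu_2' = \frac{\sum fd'^2}{N} \times i^2 = \frac{63}{50} \times 100 = 126$
$\mu_3' = \frac{\sum fd'^3}{N} \times i^3 = \frac{3}{50} \times 1000 = 60$
$\mu_4' = \frac{\sum fd'^4}{N} \times i^4 = \frac{195}{50} \times 10000 = 39000$
Moments about Mean
$\mu_1 = \mu_1' - \mu_1' = -0.6 - (-0.6) = 0$
$\mu_2 = \mu_2' - \mu_1'^2 = 126 - (-0.6)^2 = 126 - 0.36 = 125.64$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 60 - 3(126)(-0.6) + 2(-0.6)^3$
$= 60 + 226.8 - 0.432 = 286.368$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$
$= 39000 - 4(60)(-0.6) + 6(126)(-0.6)^2 - 3(-0.6)^4$
$= 39000 + 144 + 272.16 - 0.3888$
$= 39415.7712$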
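The conversion from moments about an assumed mean to moments about the actual mean can be cross-checked numerically. The sketch below works from the class data of Example 3.18, taking 75 as the assumed mean and 10 as the class width, and scales the r-th step-deviation moment by the r-th power of the width. It then recomputes the central moments directly from the mid-points as an independent check. All names are illustrative.

```python
# (mid-point, frequency) for the classes of Example 3.18
classes = [(55, 5), (65, 12), (75, 20), (85, 7), (95, 6)]
assumed_mean = 75
width = 10

n = sum(f for _, f in classes)


def raw_moment(r):
    """r-th moment about the assumed mean, via step deviations d' = (m - a) / i."""
    step_sum = sum(f * ((m - assumed_mean) / width) ** r for m, f in classes)
    return step_sum / n * width ** r     # scale back by i**r


m1, m2, m3, m4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Moments about the mean, from the moments about the assumed mean
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

# Independent check: central moments computed directly from the mid-points
mean = assumed_mean + m1
direct = [sum(f * (m - mean) ** r for m, f in classes) / n for r in (2, 3, 4)]

print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
print([round(v, 4) for v in direct])
```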
3.14 KURTOSIS
Kurtosis is another measure of the shape of a frequency curve. It is a Greek word,
which means bulginess. While skewness signifies the extent of asymmetry, kurtosis
measures the degree of peakedness of a frequency distribution. Frequency
curves are classified into three types on the basis of the shape of their peaks. These are mesokurtic,
leptokurtic and platykurtic. These three types of curves are shown in figure below:
[Figure: mesokurtic, leptokurtic and platykurtic curves]
It will be seen from the figure that the mesokurtic curve is the frequency
curve of a normal distribution. A leptokurtic
curve is more peaked than the normal curve. In contrast, a platykurtic curve is a relatively
flat curve. The coefficient of kurtosis as given by Karl Pearson is $\beta_2 = \mu_4 / \mu_2^2$. In case of
a normal distribution, that is, mesokurtic curve, the value of $\beta_2$ = 3. If $\beta_2$ turns out to be
> 3, the curve is called a leptokurtic curve and is more peaked than the normal curve.
Again, when $\beta_2$ < 3, the curve is called a platykurtic curve and is less peaked than the
normal curve. The measure of kurtosis is also helpful in the selection of an
appropriate average. For example, for normal distribution, mean is most appropriate.
Example 3.19: From the data given in Example 3.18, calculate the kurtosis.
Solution: For this, we have to calculate $\beta_2$. This can be done by using the formula
$\beta_2 = \mu_4 / \mu_2^2$ with the central moments obtained in Example 3.18.
Another measure of kurtosis is based on both quartiles and percentiles and is given by
$K = \frac{Q}{P_{90} - P_{10}}$
Where K = kurtosis, Q = ½ (Q3 – Q1) is the semi-interquartile range; P90 is 90th
percentile and P10 is the 10th percentile. This is also known as the percentile
coefficient of kurtosis.
Example 3.20: From the data given below, calculate the percentile coefficient of
kurtosis.
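Class interval   f     cf
50 - 60          10    10
60 - 70          14    24
70 - 80          18    42
80 - 90          24    66
90 - 100         16    82
100 - 110        12    94
110 - 120        6     100
Total            100
Solution: It may be noted that the question involved first two columns and in order to
calculate the percentile coefficient of kurtosis, a column of cumulative frequency has been added.
$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$, where m = (n + 1)/4th item, which is = 25.25th item
$= 70 + \frac{80 - 70}{18}(25.25 - 24) = 70.69$
$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$, where m = 75.75
$= 90 + \frac{100 - 90}{16}(75.75 - 66) = 96.09$
$P_{10} = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$, where m = 10.1
$= 60 + \frac{70 - 60}{14}(10.1 - 10) = 60.07$
$P_{90} = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$, where m = 90.9
$= 100 + \frac{110 - 100}{12}(90.9 - 82) = 107.42$
$K = \frac{Q}{P_{90} - P_{10}} = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}} = \frac{\frac{1}{2}(96.09 - 70.69)}{107.42 - 60.07}$
= 0.268
It will be seen that the above distribution is very close to normal distribution as the
value of K is 0.268, which is very close to 0.263, the value for a normal distribution.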
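The percentile coefficient of kurtosis in Example 3.20 can be recomputed with the sketch below. One assumption is made: the final class 110 - 120 with frequency 6 is inferred from the stated total of 100 and the cumulative frequency of 94, and it does not affect any of the four interpolated values. The helper name `value_at_position` is illustrative.

```python
def value_at_position(classes, m):
    """Interpolate the value of the m-th item: l1 + (l2 - l1) / f1 * (m - c)."""
    cum = 0                              # c: cumulative frequency before the class
    for lower, upper, freq in classes:
        if cum + freq >= m:
            return lower + (upper - lower) / freq * (m - cum)
        cum += freq
    raise ValueError("position lies beyond the data")


# (lower, upper, frequency); the last class is inferred from the total of 100
classes = [(50, 60, 10), (60, 70, 14), (70, 80, 18), (80, 90, 24),
           (90, 100, 16), (100, 110, 12), (110, 120, 6)]
n = sum(f for _, _, f in classes)

q1 = value_at_position(classes, (n + 1) / 4)
q3 = value_at_position(classes, 3 * (n + 1) / 4)
p10 = value_at_position(classes, (n + 1) / 10)
p90 = value_at_position(classes, 9 * (n + 1) / 10)

k = 0.5 * (q3 - q1) / (p90 - p10)
print(round(q1, 2), round(q3, 2), round(p10, 2), round(p90, 2), round(k, 3))
```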
3.15 SUMMARY
The average value cannot adequately describe a set of observations, unless all the
observations are the same. It is necessary to describe the variability or
dispersion of the observations. In two or more distributions the central value
may be the same but still there can be wide disparities in the formation of
distribution. Therefore, we have to use the measures of dispersion.
Further, two distributions may have the same mean and standard deviation but may
differ widely in their overall appearance. To
distinguish between different types of distributions, we may use the measures of
skewness.
3.16 SELF-TEST QUESTIONS
2. “Variability is not an important factor because even though the outcome is more
certain, you still have an equal chance of falling either above or below the median.
Therefore, on an average, the outcome will be the same.” Do you agree with this
statement? Give reasons for your answer.
3. Why is the standard deviation the most widely used measure of dispersion? Explain.
6. What are the different measures of skewness? Which one is repeatedly used?
(i) Compute the moment coefficients of skewness and kurtosis. (ii) Is the distribution
10. The first four moments of a distribution about the value 4 are 1,4, 10 and 45. Obtain
11. Define kurtosis. If β1=1 and β2 =4 and variance = 9, find the values of β3 and β4 and
12. Calculate the first four moments about the mean from the following data. Also
No. of students 5 12 18 40 15 7 3
Course: Business Statistics Author: Anil Kumar
Course Code: MC-106 Vetter : Prof. Harbhajan Bansal
Lesson: 04
CORRELATION ANALYSIS
Structure
4.1 Introduction
4.2 What is Correlation?
4.3 Correlation Analysis
4.3.1 Scatter Diagram
4.3.2 Correlation Graph
4.3.3 Pearson’s Coefficient of Correlation
4.3.4 Spearman’s Rank Correlation
4.3.5 Concurrent Deviation Method
4.4 Limitations of Correlation Analysis
4.5 Self-Assessment Questions
4.6 Suggested Readings