Measure of Dispersion-Intro
1.1 Introduction
The measures of central tendency (i.e., averages such as the mean) indicate the general magnitude
of the data and locate only the center of a distribution. They do not establish the degree of
variability, that is, the spread or scatter of the individual items and their deviation from (or
difference with) the average.
i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have
common means, medians and modes and identical frequencies in the modal class. Yet with these
points in common they may differ widely in the scatter or in their values about the measures of
central tendencies."
ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully
representative of a mass, unless we know the manner in which the individual items scatter around
it .... a further description of a series is necessary, if we are to gauge how representative the average
is."
Example
The three groups have the same mean, i.e., 50. In fact, the medians of groups X and Y are also equal.
If one were to conclude that the students of the three groups are of equal capability, that
conclusion would be wrong. Close examination reveals that in group X every student's mark equals
the mean, in group Y the marks are very close to the mean, but in group Z the marks are widely
scattered. It is thus clear that measures of central tendency alone are not sufficient to
describe the data.
Definition of dispersion: The arithmetic mean of the deviations of the values of the individual
items from the particular measure of central tendency used. Thus "dispersion" is also known
as the "average of the second degree."
In measuring dispersion, it is imperative to know the amount of variation (absolute measure) and
the degree of variation (relative measure). In the former case we consider the range, mean
deviation, standard deviation etc. In the latter case we consider the coefficient of range, the
coefficient of mean deviation, the coefficient of variation etc.
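The difference between an absolute and a relative measure can be illustrated with a short Python sketch. It assumes the usual definition of the coefficient of variation, C.V. = (standard deviation / mean) x 100, which these notes name but do not state; the two series and the function name are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Relative dispersion: population s.d. as a percentage of the mean.

    Assumes the common definition C.V. = (s.d. / mean) * 100.
    """
    return pstdev(values) / mean(values) * 100


# Illustrative (made-up) series with the same spread but different means.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

for name, series in (("heights", heights_cm), ("weights", weights_kg)):
    print(name,
          "s.d. =", round(pstdev(series), 2),
          "C.V. =", round(coefficient_of_variation(series), 2), "%")
```

Both series have the same standard deviation (absolute measure), but the series with the smaller mean has the larger coefficient of variation (relative measure).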
1.2 Methods of Computing Dispersion
1.2.1 Range
In any statistical series, the difference between the largest and the smallest values is called the
range.
Thus Range (R) = L – S
where L is the largest value and S is the smallest value.
Coefficient of Range: The relative measure of the range. It is used in the comparative study of
dispersion.
Coefficient of Range = (L – S) / (L + S)
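As a minimal sketch, the range and its coefficient can be computed as follows. The coefficient is taken here as (L – S) / (L + S), the usual textbook definition; the data and function names are made up for illustration and are not taken from the tasks.

```python
def value_range(values):
    """Range R = L - S (largest value minus smallest value)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure of the range, assumed to be (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Illustrative (made-up) data: L = 25, S = 7.
daily_sales = [12, 7, 25, 18, 9]

print("Range:", value_range(daily_sales))                       # 25 - 7
print("Coefficient of range:", coefficient_of_range(daily_sales))  # 18 / 32
```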
TASK 1
Find the range and the co-efficient of the range of the following items:
110, 117, 129, 197, 190, 100, 100, 178, 255, 790.
Diff. in years    No. of students
0-5               449
5-10              705
10-15             507
15-20             281
20-25             109
25-30             52
30-35             16
35-40             4
1.2.3 Variance
The term variance was used to describe the square of the standard deviation. The concept of
variance is of great importance in advanced work where it is possible to split the total variance
into several parts, each attributable to one of the factors causing variation in the original series.
Variance is defined as follows:
Variance = σ² = Σ(x – x̄)² / N
where x̄ is the arithmetic mean of the series, N is the number of observations and σ is the
standard deviation.
Merits:
(1) It is rigidly defined and based on all observations.
(2) It is amenable to further algebraic treatment.
(3) It is less affected by sampling fluctuations than most other measures of dispersion.
(4) It is less erratic.
Demerits:
(1) It is difficult to understand and calculate.
(2) It gives greater weight to extreme values.
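The following Python sketch computes the variance and the standard deviation directly from the deviations about the mean. It assumes the population form (dividing by N rather than N – 1); the data and function names are made up for illustration.

```python
from math import sqrt
from statistics import pvariance


def variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Standard deviation: positive square root of the variance."""
    return sqrt(variance(values))


# Illustrative (made-up) data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Variance:", variance(data))
print("s.d.:", standard_deviation(data))
# Cross-check against the standard library's population variance.
print("pvariance:", pvariance(data))
```

Because the deviations are squared, the single value 9 contributes 16 of the total 32 squared units, which illustrates demerit (2).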
A B C D E F G H I J
10 12 16 8 25 30 14 11 13 11
TASK 5
Calculate s.d. of the marks of 100 students.
Marks No. of students
0-2 10
2-4 20
4-6 35
6-8 30
8-10 5
TASK 6
The scores of two teams A and B in 10 matches are as follows:
A 40 32 0 40 30 7 13 25 14 3
B 21 14 14 30 5 12 10 13 30 6
Find the variance for both the series. Which team is more consistent?
1.4 Percentile
The nth percentile is that value (or size) such that n% of the values of the whole data lie below it.
For example, a score that is 7% from the topmost score would be the 93rd percentile, as it is above
93% of the other scores.
Percentile Range
It is used as one of the measures of dispersion in a set of data and is defined as
Percentile range = P90 – P10
where P90 and P10 are the 90th and 10th percentiles respectively. The semi-percentile range is
(P90 – P10) / 2.
The lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th percentile.
It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in fact what
you have studied under the title "Median".
Thus, symbolically, Q1 = P25, Q2 = P50 = Median and Q3 = P75.
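A minimal Python sketch of the percentile range is given below. It relies on `statistics.quantiles` from the standard library with its default "exclusive" interpolation; other textbooks and packages use slightly different percentile conventions, so results on small data sets may differ. The helper names and the data are made up for illustration.

```python
from statistics import median, quantiles


def percentile(values, p):
    """p-th percentile for 1 <= p <= 99, using statistics.quantiles
    with its default 'exclusive' method (an assumed convention)."""
    cut_points = quantiles(values, n=100)  # 99 cut points: P1 .. P99
    return cut_points[p - 1]


def percentile_range(values):
    """Percentile range = P90 - P10."""
    return percentile(values, 90) - percentile(values, 10)


def semi_percentile_range(values):
    """Semi-percentile range = (P90 - P10) / 2."""
    return percentile_range(values) / 2


# Illustrative (made-up) scores: the whole numbers 1 to 99.
scores = list(range(1, 100))

print("P10:", percentile(scores, 10), "P90:", percentile(scores, 90))
print("Percentile range:", percentile_range(scores))
print("Semi-percentile range:", semi_percentile_range(scores))
# The 50th percentile coincides with the median (Q2).
print("P50:", percentile(scores, 50), "Median:", median(scores))
```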
ASSIGNMENT 1
1. From the set of data given below,
3, 9, 5, 2, 7
i) Find the mean and the median [2mks]
ii) Calculate the standard deviation [3mks]
iii) Calculate the geometric mean and the harmonic mean [4mks]
2. The following is the distribution of weights of 140 students of ICT class of Samburu Technical
during the last intake.
Skewness
It may happen that two distributions have the same mean and the same standard deviation. For
example, see the following diagram.
Although the two distributions have the same means and standard deviations, they are not identical.
Where do they differ?
They differ in symmetry. The left-hand distribution is symmetrical, whereas the right-hand
distribution is asymmetrical, or skewed. For a symmetrical distribution, values at equal
distances on either side of the mode have equal frequencies. Thus the mode, median and mean
all coincide.
Its curve rises slowly, reaches a maximum (peak) and falls equally slowly (Fig. 1). But for a
skewed distribution, the mean, mode and median do not coincide. Skewness is positive or negative
as per the positions of the mean and median on the right or the left of the mode.
A positively skewed distribution curve (Fig. 2) rises rapidly, reaches its maximum and falls slowly.
In other words, the tail as well as the median are on the right-hand side. A negatively skewed
distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the
tail as well as the median are on the left-hand side.
1.6.1 Measure of Skewness
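Karl Pearson's coefficient of skewness is
Sk = (Mean – Mo) / σ
where σ is the standard deviation. Pearson has suggested the use of the following formula if it is
not possible to determine the mode (Mo) of a distribution:
Sk = 3 (Mean – Median) / σ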
Note:
i) Although the coefficient of skewness generally lies within ±1, Karl Pearson's coefficient lies
within ±3.
ii) If Sk = 0, there is no skewness.
iii) If Sk is positive, the skewness is also positive.
iv) If Sk is negative, the skewness is also negative.
Unless otherwise indicated, you must use Karl Pearson's formula.
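The sign rules in the note can be checked with a short Python sketch. It assumes the median form of Karl Pearson's coefficient, Sk = 3 (Mean – Median) / s.d., with the population standard deviation; the data sets and the function name are made up for illustration.

```python
from statistics import mean, median, pstdev


def pearson_skewness(values):
    """Karl Pearson's coefficient in its median form (assumed):
    Sk = 3 * (mean - median) / s.d."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


# Illustrative (made-up) data sets.
right_tailed = [2, 3, 3, 4, 4, 4, 5, 6, 9, 12]   # long tail to the right
symmetric = [1, 2, 3, 4, 5]                       # mean equals median
left_tailed = [-x for x in right_tailed]          # mirror image

for name, data in (("right-tailed", right_tailed),
                   ("symmetric", symmetric),
                   ("left-tailed", left_tailed)):
    print(name, "Sk =", round(pearson_skewness(data), 3))
```

The right-tailed series gives a positive Sk, the symmetric series gives Sk = 0, and the mirrored series gives a negative Sk of the same size.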
1.6.2 Kurtosis
The term has its origin in a Greek word meaning "bulginess." In statistics it is the degree of
flatness or peakedness in the region of the mode of a frequency curve. It is measured relative to
the "peakedness" of the normal curve. It tells us the extent to which a distribution is more peaked
or flat-topped than the normal curve. If the curve is more peaked than the normal curve it is called
"leptokurtic." In this case items are more clustered about the mode. If the curve is more flat-topped
than the normal curve, it is "platykurtic." The normal curve itself is known as "mesokurtic."
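These notes do not give a formula for kurtosis. One common measure, assumed here, is the moment coefficient β2 = m4 / m2², where m2 and m4 are the second and fourth moments about the mean; the normal curve has β2 = 3. The sketch below uses that assumption, and the function names, the `tolerance` parameter and the data are made up for illustration.

```python
def moment_kurtosis(values):
    """Moment coefficient of kurtosis, beta2 = m4 / m2**2 (assumed measure)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


def classify_kurtosis(values, tolerance=0.05):
    """Compare beta2 with 3, the value for the normal curve.

    `tolerance` is a hypothetical margin for treating beta2 as equal to 3.
    """
    beta2 = moment_kurtosis(values)
    if beta2 > 3 + tolerance:
        return "leptokurtic"
    if beta2 < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"


# Illustrative (made-up) data sets.
flat = [1, 2, 3, 4, 5]                         # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]     # clustered about the mode

print("flat:", moment_kurtosis(flat), classify_kurtosis(flat))
print("peaked:", moment_kurtosis(peaked), classify_kurtosis(peaked))
```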
(a) Construct a cumulative frequency table. (Use upper class boundaries 15.5, 20.5 and so on.)
(b) On graph paper, draw a cumulative frequency graph, using a scale of 2 cm to represent 5
minutes on the horizontal axis and 1 cm to represent 10 students on the vertical axis.
(c) Use your graph to estimate
(i) the number of students that completed the task in less than 17.5 minutes;
(ii) the time it will take for 75% of the students to complete the task.
4. The table below shows the percentage, to the nearest whole number, scored by candidates in
an examination.
The following is the cumulative frequency table for the marks.