
Measure of Dispersion-Intro

Chapter 1 discusses measures of dispersion, emphasizing that central tendencies alone do not adequately describe data variability. It covers various methods for computing dispersion, including range, mean deviation, variance, and standard deviation, along with their merits and demerits. Additionally, the chapter introduces concepts of skewness and kurtosis, highlighting their importance in understanding data distribution.

Uploaded by tum chris
Copyright © All Rights Reserved

CHAPTER 1: MEASURES OF DISPERSION

1.1 Introduction
The measures of central tendency (i.e., averages) indicate the general magnitude of the data and
locate only the center of a distribution. They do not establish the degree of variability, that is,
the spread or scatter of the individual items and their deviations from (or differences from)
the averages.
i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have
common means, medians and modes and identical frequencies in the modal class. Yet with these
points in common they may differ widely in the scatter of their values about the measures of
central tendencies."
ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully
representative of a mass, unless we know the manner in which the individual items scatter around
it .... a further description of a series is necessary, if we are to gauge how representative the average
is."

Example

The three groups have the same mean, i.e., 50. In fact, the medians of groups X and Y are also equal.
Now if one were to say that the students from the three groups are of equal ability, it would be
a totally wrong conclusion. Close examination reveals that in group X all students have marks equal to
the mean, in group Y the marks are very close to the mean, but in group Z the marks are
widely scattered. It is thus clear that a measure of central tendency alone is not sufficient to
describe the data.
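To make the point concrete, here is a small Python sketch. The original table of marks is not reproduced here, so the marks below are hypothetical values, chosen only so that all three groups share a mean and median of 50. The range computed for each group shows how differently the marks are spread. The helper name `summarise` is illustrative.

```python
from statistics import mean, median

# Hypothetical marks for three groups of five students. These are NOT the
# original table's values; they are chosen only to share a mean of 50.
groups = {
    "X": [50, 50, 50, 50, 50],  # every mark equals the mean
    "Y": [48, 49, 50, 51, 52],  # marks close to the mean
    "Z": [10, 30, 50, 70, 90],  # marks widely scattered
}

def summarise(marks):
    """Return (mean, median, range) for one group of marks."""
    return mean(marks), median(marks), max(marks) - min(marks)

for name, marks in groups.items():
    avg, mid, spread = summarise(marks)
    print(f"group {name}: mean={avg}, median={mid}, range={spread}")
```

The mean and median are identical across the three groups, but the range runs from 0 for X to 80 for Z.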
Definition of dispersion: Dispersion is the arithmetic mean of the deviations of the values of the
individual items from the particular measure of central tendency used. Thus dispersion is also known
as the "average of the second order."
In measuring dispersion, it is important to know both the amount of variation (absolute measure) and
the degree of variation (relative measure). In the former case we consider the range, mean
deviation, standard deviation, etc. In the latter case we consider the coefficient of range, the
coefficient of mean deviation, the coefficient of variation, etc.
1.2 Methods of Computing Dispersion
1.2.1 Range
In any statistical series, the difference between the largest and the smallest values is called the
range.
Thus Range (R) = L – S
where L is the largest value and S is the smallest value.

Coefficient of Range: The relative measure of the range, used in comparative studies of
dispersion.
Coefficient of Range = (L – S) / (L + S)
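The following Python sketch computes the range and the coefficient of range. It assumes the usual definitions R = L – S and coefficient of range = (L – S) / (L + S). It also assumes the values are positive, so that L + S is never zero. The function name `range_and_coefficient` is hypothetical. The usage lines apply it to the items of Task 1.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a list of positive numbers."""
    largest, smallest = max(values), min(values)
    r = largest - smallest                   # R = L - S
    coeff = r / (largest + smallest)         # (L - S) / (L + S)
    return r, coeff

items = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]  # Task 1 data
r, coeff = range_and_coefficient(items)
print(r, round(coeff, 4))
```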

TASK 1
Find the range and the co-efficient of the range of the following items:
110, 117, 129, 197, 190, 100, 100, 178, 255, 790.

1.2.2 Mean Deviation


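The mean deviation of a set of statistical data is defined as the arithmetic mean of the absolute
(numerical) values of the deviations of the items from some average value. Mean deviation is also
known as average deviation.
The mean deviation is generally denoted by M.D.:
M.D. = Σ|x – A| / n
where A is the average used (mean, median or mode) and n is the number of items. For a frequency
distribution, M.D. = Σf|x – A| / Σf. The coefficient of mean deviation is M.D. / A.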
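Here is a minimal Python sketch of the mean deviation for raw data. By default it takes deviations from the arithmetic mean; another average, such as the median, can be passed as `center`. The function name `mean_deviation` is hypothetical. The usage lines apply it to the Task 2 data and divide by the mean to get the coefficient of mean deviation.

```python
from statistics import mean

def mean_deviation(values, center=None):
    """Mean of the absolute deviations from `center` (default: arithmetic mean)."""
    if center is None:
        center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [12, 6, 7, 3, 15, 10, 18, 5]      # Task 2 data
md = mean_deviation(data)
coefficient = md / mean(data)            # coefficient of M.D. about the mean
print(md, round(coefficient, 4))
```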
TASK 2
Find the mean deviation from the mean for the given raw data.
12, 6, 7, 3, 15, 10, 18, 5.
TASK 3
Calculate the mean deviation and the coefficient of mean deviation from the following data using
the mean.
Difference in ages between boys and girls of a class.

Diff. in years   No. of students
0-5              449
5-10             705
10-15            507
15-20            281
20-25            109
25-30            52
30-35            16
35-40            4
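For grouped data like the table above, a common textbook approach is to represent every item in a class by the class midpoint. The Python sketch below follows that assumption to compute the mean, the mean deviation about the mean, and its coefficient. The function name `grouped_mean_deviation` is hypothetical.

```python
def grouped_mean_deviation(classes, freqs):
    """Mean, M.D. about the mean, and coefficient of M.D. for grouped data.

    classes: list of (lower, upper) class limits; freqs: matching frequencies.
    Each item is assumed to sit at the midpoint of its class.
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    xbar = sum(f * m for f, m in zip(freqs, mids)) / n
    md = sum(f * abs(m - xbar) for f, m in zip(freqs, mids)) / n
    return xbar, md, md / xbar

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [449, 705, 507, 281, 109, 52, 16, 4]   # Task 3 table
xbar, md, coeff = grouped_mean_deviation(classes, freqs)
print(round(xbar, 2), round(md, 2), round(coeff, 4))
```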

1.2.3 Variance
The term variance is used to describe the square of the standard deviation. The concept of
variance is of great importance in advanced work, where it is possible to split the total variance
into several parts, each attributable to one of the factors causing variation in the original series.
Variance is defined as follows:
Variance = Σ(x – x̄)² / n
where x̄ is the arithmetic mean of the n values.

1.2.4 Standard Deviation (s. d.)


It is the square root of the arithmetic mean of the squared deviations of the various values from
their arithmetic mean. It is denoted by s.d. (when dealing with a sample) or σ (when dealing with the
population):
s.d. = √[Σ(x – x̄)² / n]
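The definitions of variance and standard deviation translate directly into Python. This sketch uses the divisor n, matching the definition given here. Python's standard library offers the same calculation as `statistics.pvariance` and `statistics.pstdev`, whereas `statistics.variance` and `statistics.stdev` divide by n – 1. The function names below are illustrative, and the data values are illustrative too.

```python
import math

def variance(values):
    """Arithmetic mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / n

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]            # illustrative values, mean 5
print(variance(data), standard_deviation(data))  # → 4.0 2.0
```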

Merits:
(1) It is rigidly defined and based on all observations.
(2) It is amenable to further algebraic treatment.
(3) It is less affected by sampling fluctuations than most other measures of dispersion.
(4) It is less erratic.

Demerits:
(1) It is difficult to understand and calculate.
(2) It gives greater weight to extreme values.

1.3 Co‐efficient Of Variation (C. V.)


To compare the variations (dispersion) of two different series, relative measures of the standard
deviation must be calculated. This is known as the co-efficient of variation or the co-efficient of s.d.
Its formula is

C.V. = (s.d. / Mean) × 100

Thus, it is defined as the ratio of the s.d. to the mean, expressed as a percentage.

It is used to compare the consistency or variability of two or more series. The higher the C.V.,
the greater the variability; the lower the C.V., the greater the consistency of the data.
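Here is a minimal Python sketch of the coefficient of variation, C.V. = (s.d. / mean) × 100. It uses the standard library's `statistics.pstdev`, which divides by n, as the s.d. The function name is hypothetical. The usage lines apply it to the two teams of Task 6. The series with the lower C.V. is the more consistent one.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (s.d. / mean) x 100, with s.d. computed using divisor n."""
    return pstdev(values) / mean(values) * 100

team_a = [40, 32, 0, 40, 30, 7, 13, 25, 14, 3]   # Task 6, team A
team_b = [21, 14, 14, 30, 5, 12, 10, 13, 30, 6]  # Task 6, team B
for name, scores in (("A", team_a), ("B", team_b)):
    print(name, round(coefficient_of_variation(scores), 1))
```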
TASK 4
Calculate the standard deviation and its co-efficient from the following data.

A B C D E F G H I J
10 12 16 8 25 30 14 11 13 11

TASK 5
Calculate s.d. of the marks of 100 students.
Marks No. of students
0-2 10
2-4 20
4-6 35
6-8 30
8-10 5
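For a frequency table like the one in Task 5, the standard deviation can be sketched in Python. As with other grouped-data calculations, this assumes every item sits at the midpoint of its class and uses the divisor N = Σf. The function name `grouped_sd` is hypothetical.

```python
import math

def grouped_sd(classes, freqs):
    """s.d. of grouped data, treating each item as its class midpoint (divisor N)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    xbar = sum(f * m for f, m in zip(freqs, mids)) / n
    var = sum(f * (m - xbar) ** 2 for f, m in zip(freqs, mids)) / n
    return math.sqrt(var)

classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
freqs = [10, 20, 35, 30, 5]      # Task 5: marks of 100 students
print(round(grouped_sd(classes, freqs), 3))
```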

TASK 6
The scores of two teams A and B in 10 matches are as follows:

A 40 32 0 40 30 7 13 25 14 3
B 21 14 14 30 5 12 10 13 30 6

Find the variance of both series. Which team is more consistent?

1.4 Percentile
The nth percentile is the value (or size) below which n% of the values of the whole data lie.
For example, a score with 7% of the scores above it is the 93rd percentile, since 93% of the
scores lie below it.

Percentile Range
It is used as one of the measures of dispersion in a set of data and is defined as
Percentile range = P90 – P10
where P90 and P10 are the 90th and 10th percentiles respectively. The semi-percentile range, i.e.,
(P90 – P10) / 2,
can also be used, but it is not commonly used.
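This Python sketch computes the percentile range for raw data using the standard library's `statistics.quantiles` (Python 3.8+) with `n=100`, which returns the 99 cut points P1 to P99. Textbooks use several interpolation conventions for percentiles. This sketch assumes the library's default `method='exclusive'`, so results may differ slightly from hand calculations done another way. The function name `percentile_range` and the marks are illustrative.

```python
from statistics import quantiles

def percentile_range(values):
    """Return (P90 - P10, (P90 - P10) / 2) for a list of raw values.

    quantiles(..., n=100) gives cut points P1 .. P99, so P10 is at index 9
    and P90 at index 89. At least two data points are required.
    """
    cuts = quantiles(values, n=100)  # default method='exclusive'
    p10, p90 = cuts[9], cuts[89]
    spread = p90 - p10
    return spread, spread / 2

marks = [35, 42, 47, 51, 55, 58, 61, 64, 70, 78, 85, 91]  # illustrative values
print(percentile_range(marks))
```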

1.5 Quartiles and Interquartile Range


If we concentrate on the two extreme values (as in the case of the range), we get no idea of the
scatter of the data within the range (i.e., between the two extreme values). If we discard the extreme
values, the limited range thus obtained may be more informative. For this reason, the concept of the
interquartile range was developed. It is the range that includes the middle 50% of the distribution:
one quarter (1/4) of the observations at the lower end and one quarter (1/4) at the upper end
are excluded.

Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th percentile.
It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in fact what
you have studied under the title "Median".
Thus, symbolically,
Interquartile range = Q3 – Q1
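The quartiles and the interquartile range Q3 – Q1 can be sketched in Python with the standard library's `statistics.quantiles` and `n=4`, which returns [Q1, Q2, Q3]. The library's default `method='exclusive'` is one of several quartile conventions, so textbook answers computed another way may differ slightly. The function name is hypothetical.

```python
from statistics import quantiles

def interquartile_range(values):
    """Return (Q1, Q2, Q3, Q3 - Q1); Q2 is the median."""
    q1, q2, q3 = quantiles(values, n=4)  # default method='exclusive'
    return q1, q2, q3, q3 - q1

print(interquartile_range([1, 2, 3, 4, 5, 6, 7]))  # → (2.0, 4.0, 6.0, 4.0)
```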
ASSIGNMENT 1
1. From the set of data given below,
3, 9, 5, 2, 7
i) Find the mean and the median [2mks]
ii) Calculate the standard deviation [3mks]
iii) Calculate the geometric mean and the harmonic mean [4mks]

2. The following is the distribution of the weights of 140 students of the ICT class at Samburu Technical
during the last intake.

Weight (in pounds) Frequency


80-89 4
90-99 23
100-109 49
110-119 38
120-129 17
130-139 6
140-149 3

i) Draw a histogram to represent the above information


ii) Estimate the median weight in pounds
iii) Calculate the interquartile range
iv) Represent the information above by a box and whisker plot.
3. Give a brief description of each of the sampling techniques listed below
i) Random sampling
ii) Stratified sampling
iii) Systematic sampling.

1.6 Skewness and Kurtosis


Voluminous raw data cannot be easily understood; hence, we calculate the measures of central
tendency and obtain a representative figure. From the measures of variability, we can know
whether most of the items of the data are close to or far from these central values. But these
averages and measures of variation are not enough to draw sufficient inferences about the
data. Another aspect of the data is its symmetry. In the chapter "Data presentation" we
have seen that a frequency curve may or may not be symmetrical about the mode. This symmetry is
studied through "skewness." One more aspect of the curve that we need to
know is whether its top is flat or peaked. This is described by what is known as "kurtosis."

Skewness
It may happen that two distributions have the same mean and standard deviation. For example,
see the following diagram.
Although the two distributions have the same mean and standard deviation, they are not identical.
Where do they differ?
They differ in symmetry. The left-hand distribution is symmetrical, whereas the
distribution on the right-hand side is asymmetrical, or skewed. For a symmetrical distribution,
values at equal distances on either side of the mode have equal frequencies. Thus the mode,
median and mean all coincide.

Its curve rises slowly, reaches a maximum (peak) and falls equally slowly (Fig. 1). But for a
skewed distribution, the mean, mode and median do not coincide. Skewness is positive or negative
according to whether the mean and median lie to the right or to the left of the mode.
A positively skewed distribution curve (Fig. 2) rises rapidly, reaches its maximum and falls slowly.
In other words, the tail, as well as the median, lies on the right-hand side. A negatively skewed
distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the
tail, as well as the median, lies on the left-hand side.
1.6.1 Measure of Skewness
Karl Pearson's coefficient of skewness is
Sk = (Mean – Mo) / s.d.
If it is not possible to determine the mode (Mo) of a distribution, Pearson suggested using the formula
Sk = 3(Mean – Median) / s.d.

Note:
i) In practice the coefficient of skewness usually lies within ±1, although Karl Pearson's coefficient
3(Mean – Median) / s.d. can lie anywhere within ±3.
ii) If Sk = 0, there is no skewness.
iii) If Sk is positive, the skewness is positive.
iv) If Sk is negative, the skewness is negative.
Unless otherwise indicated, use Karl Pearson's formula.
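Here is a minimal Python sketch of Karl Pearson's median-based coefficient, Sk = 3(Mean – Median) / s.d. It assumes the s.d. is computed with divisor n, via `statistics.pstdev`. The function name is hypothetical. A distribution whose values are all equal has zero s.d., so the sketch rejects it rather than dividing by zero.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Sk = 3 * (Mean - Median) / s.d., with s.d. computed using divisor n."""
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return 3 * (mean(values) - median(values)) / sd

print(pearson_skewness([1, 2, 3, 4, 5]))   # symmetric data: Sk = 0.0
print(pearson_skewness([1, 2, 3, 4, 20]))  # long right tail: Sk > 0
```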
1.6.2 Kurtosis
The term comes from the Greek word kurtos, meaning "bulging." In statistics, kurtosis is the degree of
flatness or peakedness in the region of the mode of a frequency curve. It is measured relative to the
peakedness of the normal curve, and tells us the extent to which a distribution is more peaked or more
flat-topped than the normal curve. If the curve is more peaked than a normal curve, it is called
leptokurtic; in this case the items are more clustered about the mode. If the curve is more flat-topped
than the normal curve, it is platykurtic. The normal curve itself is known as mesokurtic.
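These notes give no formula for kurtosis. One common measure, added here and not taken from the notes, is the moment coefficient β2 = m4 / m2², where m_r is the r-th moment about the mean. β2 equals 3 for the normal curve. The Python sketch below computes β2 and labels a curve by comparing it with 3. The function names and the tolerance `tol` are hypothetical choices for illustration.

```python
def beta2(values):
    """Moment coefficient of kurtosis: m4 / m2**2 (moments about the mean)."""
    n = len(values)
    xbar = sum(values) / n
    m2 = sum((x - xbar) ** 2 for x in values) / n
    m4 = sum((x - xbar) ** 4 for x in values) / n
    return m4 / m2 ** 2

def classify(b2, tol=0.05):
    """Compare beta2 with the normal curve's value of 3 (tol is an arbitrary choice)."""
    if b2 > 3 + tol:
        return "leptokurtic"
    if b2 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]        # evenly spread values
peaked = [0] * 8 + [-1, 1]    # values clustered at the center
print(round(beta2(flat), 2), classify(beta2(flat)))
print(round(beta2(peaked), 2), classify(beta2(peaked)))
```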

1.7 Characteristics of a good measure of Dispersion


i. It should be easy to understand and to calculate.
ii. It should be easy to interpret.
iii. It should be open to further algebraic manipulation.
ASSIGNMENT 2
1. The table shows the number of children in 50 families.

Number of children   Frequency   Cumulative frequency
1                    3           3
2                    m           22
3                    12          34
4                    p           q
5                    5           48
6                    2           50
Total                T

(a) Write down the value of T.


(b) Find the values of m, p and q.
2. The following table shows the times, to the nearest minute, taken by 100 students to complete
a mathematics task.

(a) Construct a cumulative frequency table. (Use upper class boundaries 15.5, 20.5 and so on.)
(b) On graph paper, draw a cumulative frequency graph, using a scale of 2 cm to represent 5
minutes on the horizontal axis and 1 cm to represent 10 students on the vertical axis.
(c) Use your graph to estimate
(i) the number of students that completed the task in less than 17.5 minutes;
(ii) the time it will take for 75% of the students to complete the task.
4. The table below shows the percentage, to the nearest whole number, scored by candidates in
an examination.
The following is the cumulative frequency table for the marks.

(a) Calculate the values of s and of t.


(b) Using a scale of 1 cm to represent 10 marks on the horizontal axis, and 1 cm to represent 10
candidates on the vertical axis, draw a cumulative frequency graph.
(c) Use your graph to estimate
(i) the median mark;
(ii) the lower quartile;
(iii) the pass mark, if 40% of the candidates passed.
5. The cumulative frequency graph below shows the examination scores of 80 students.

From the graph find


(a) the median value;
(b) the interquartile range;
(c) the 35th percentile.
6. The table below shows the number and weight (w) of fish delivered to a local fish market one
morning.

(a) (i) Write down the value of c.


(ii) On graph paper, draw the cumulative frequency curve for this data. Use a scale of 1 cm
to represent 0.1 kg on the horizontal axis and 1 cm to represent 10 units on the vertical axis.
Label the axes clearly.
(iii) Use the graph to show that the median weight of the fish is 0.95 kg.
(b) (i) The zoo buys all fish whose weights are above the 90th percentile.
How many fish does the zoo buy?
(ii) A pet food company buys all the fish in the lowest quartile. What is the maximum weight of
a fish bought by the company?
(c) A restaurant buys all fish whose weights are within 10% of the median weight.
(i) Calculate the minimum and maximum weights for the fish bought by the restaurant.
(ii) Use your graph to determine how many fish will be bought by the restaurant.
7. The cumulative frequency graph has been drawn from a frequency table showing the time it
takes a number of students to complete a computer game.

(a) From the graph find


(i) the median time;
(ii) the interquartile range.
The graph has been drawn from the data given in the table below.
(b) Using the graph, find the values of p and q.
(c) Calculate an estimate of the mean time taken to finish the computer game.
