Bast 503 Lect 5
INTRODUCTION
• Dispersion measures the extent to which the items
vary from some central value.
• Measures of dispersion or variation measure only the
degree but not the direction of the variation.
• The measures of dispersion are also called averages
of the second order because they are based on the
deviations of the different values from the mean or
other measures of central tendency which are called
averages of the first order.
DEFINITION
• In the words of Bowley “Dispersion is the measure of
the variation of the items”
• According to Connor “Dispersion is a measure of the
extent to which the individual items vary”
• According to Simpson and Kafka, “The measures of the
scatterness of a mass of figures in a series about an
average is called measure of variation, or dispersion”.
• According to Spiegel, “The degree to which numerical
data tend to spread about an average value is called
the variation, or dispersion of the data”.
Measures of Variability or Dispersion
Significance of Measuring Variation
• It indicates how far an average is
representative of the whole mass of data.
• It is used to determine the uniformity and
consistency.
• It is used to determine the nature and cause
of variation in order to control the variation.
• It enables a comparison to be made of two or
more series with regard to their variability.
Objective of dispersion
• The value of dispersion indicates how reliable a
measure of central tendency is.
• Usually, a small value of dispersion indicates that the
measure of central tendency is more reliable for the series,
and vice versa.
• Many powerful analytical tools in statistics such as
correlation analysis, the testing of hypothesis,
analysis of variance, the statistical quality control,
regression analysis are based on measure of
variation of one kind or another.
Characteristic of an Ideal Measure of
Dispersion
• It should be rigidly defined.
• It should be easy to understand and easy to calculate.
• It should be based on all the observations of the data.
• It should be easily subjected to further mathematical
treatment.
• It should be least affected by sampling fluctuation.
• It should not be unduly affected by extreme
values.
How is dispersion measured?
• Absolute: Measures the dispersion in the original
unit of the data.
– Variability in 2 or more distributions can be compared
provided they are given in the same unit and have the
same average.
• Relative: The measure of dispersion is free from the unit of
measurement of the data.
– It is the ratio of a measure of absolute dispersion to the
average from which the absolute deviations are measured.
– It is called the coefficient of dispersion.
Measures of Dispersion
– Range
– Inter-Quartile deviation
– Mean deviation
– Standard deviation
Range
• The simplest and crudest measure of
dispersion is the range.
• This is defined as the difference between the
largest and the smallest values in the
distribution.
• If $x_1, x_2, \ldots, x_n$ are the values of observations
in a sample, then the range (R) of the variable X is
given by:
$R(x_1, x_2, \ldots, x_n) = \max\{x_1, x_2, \ldots, x_n\} - \min\{x_1, x_2, \ldots, x_n\}$
Range
• Symbolically for ungrouped data it is represented by:
R=H–L
Where, R = Range
H = Highest value in the observation
L = Lowest value in the observation
Coefficient of Range = ?
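As a minimal sketch of R = H – L, the Python below computes the range of a small sample. It also computes a coefficient of range as (H – L)/(H + L); that formula is the commonly used relative form and is an assumption here, since the slide leaves it as a question. The function names are illustrative, and the data are the four respiration rates used later in this lecture.

```python
def value_range(values):
    """Range R = H - L (highest value minus lowest value)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range (H - L) / (H + L).

    ASSUMPTION: this is the usual textbook definition; the slide does not
    state it. It is only meaningful when H + L is not zero.
    """
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)


rates = [16, 13, 17, 22]  # respiration rates per minute (from the later problem)
r = value_range(rates)            # 22 - 13
cr = coefficient_of_range(rates)  # (22 - 13) / (22 + 13)
print(r, round(cr, 4))
```

Because the coefficient is a ratio, it carries no unit and can be used to compare series measured in different units.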
When To Use the Range
• The range is used when
– you have ordinal data or
– you are presenting your results to people with little or no
knowledge of statistics
– It is used to find the variability of data quickly.
– When the sample size is small, it is considered quite
adequate to measure the variability.
• The range is rarely used in scientific work as it is fairly
insensitive
– It depends on only two scores in the set of data, the largest (XL) and the smallest (XS)
– Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
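The point about insensitivity can be checked numerically. The sketch below takes the two data sets from the slide and compares their ranges with their standard deviations, using `statistics.pstdev` (divisor N) from the Python standard library; the variable names are illustrative.

```python
from statistics import pstdev

set_a = [1, 1, 1, 1, 9]
set_b = [1, 3, 5, 7, 9]

# Both sets share the same highest and lowest value, so the range is equal.
range_a = max(set_a) - min(set_a)
range_b = max(set_b) - min(set_b)

# The standard deviation uses every observation, so it tells the sets apart.
sd_a = pstdev(set_a)
sd_b = pstdev(set_b)

print(range_a, range_b)
print(round(sd_a, 3), round(sd_b, 3))
```

The ranges are identical while the standard deviations differ, which is why the range alone says little about how the values are distributed between the two extremes.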
Merits and Demerits of Range
• It is simplest and easiest to compute, understand and
interpret.
• It is a crude measure since it does not take into account all
individual observation.
• It is subject to wide fluctuation from sample to sample
drawn from the same population. Addition/removal of a single
extreme value at the upper/lower end of a data series can alter the
range to a great extent.
• In an open-ended distribution, it is not possible to compute the
range.
Characteristics of Range
• Simplest measure of dispersion
• It is not based on all the observations.
• Unduly affected by the extreme values and
fluctuations of sampling.
• The range may increase with the size of the set of
observations
• Gives an idea of the variability very quickly
Inter-quartile Range or Quartile Deviation
• The IQR is a measure of variability, based on dividing a data set into
quartiles.
• Quartiles divide a rank-ordered data set into four equal parts. The
values that separate parts are called the first, second, and third
quartiles; and they are denoted by Q1, Q2, and Q3, respectively.
• The interquartile range (IQR), also called the midspread or middle 50%,
or technically the H-spread, is a measure of statistical dispersion
equal to the difference between the 75th and 25th percentiles, that is, between
the upper and lower quartiles:
IQR = Q3 − Q1.
• In other words, the IQR is the first quartile subtracted from the third
quartile.
Inter-quartile Range or Quartile Deviation
Finding the IQR requires the following steps:
1. Arrange the data from lowest to highest.
2. Find the values of the 3rd quartile (Q3) and the 1st quartile (Q1)
by using the locator formula c = k(n)/4, where k is the
quartile and n is the number of values (observations).
Rule of thumb for c
a) If c is a whole number, the quartile value is the average of
the values at positions c and c + 1.
b) If c is not a whole number, the quartile value is the value at
the integer position obtained when c is rounded off.
3. Subtract Q1 from Q3.
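The three steps above can be sketched in Python as follows. The helper names and both sample data sets are hypothetical. One assumption is made explicitly: in rule (b), "rounded off" is read as rounding c up to the next whole position, which is a common reading of this locator rule but is not stated on the slide.

```python
import math


def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) using the locator c = k * n / 4."""
    n = len(sorted_values)
    c = k * n / 4
    if c.is_integer():
        c = int(c)
        # Rule (a): average of the values at positions c and c + 1 (1-based).
        return (sorted_values[c - 1] + sorted_values[c]) / 2
    # Rule (b): ASSUMPTION - "rounded off" is taken to mean rounding up.
    return sorted_values[math.ceil(c) - 1]


def interquartile_range(values):
    ordered = sorted(values)                          # step 1
    q1, q3 = quartile(ordered, 1), quartile(ordered, 3)  # step 2
    return q3 - q1                                    # step 3


even_sample = [15, 3, 9, 7, 13, 5, 11, 8]  # n = 8, c is whole for Q1 and Q3
odd_sample = [4, 1, 7, 2, 6, 3, 5]         # n = 7, c is fractional

iqr_even = interquartile_range(even_sample)
iqr_odd = interquartile_range(odd_sample)
print(iqr_even, iqr_odd)
```

Statistical packages use several different quartile conventions, so results from this locator rule may differ slightly from software output on small samples.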
Percentiles, Quartiles (Measure of Relative
Standing) and Inter-quartile Range
Semi Inter-quartile Range or Quartile Deviation
• The quartile deviation is half of the IQR. It can be found by dividing
the IQR by 2.
QD = IQR/2 = (Q3 – Q1)/ 2
• When QD is small, it means that there is a small deviation in the
centre 50% items. If it is high, it shows that central 50% items have
a large variation
• Coefficient of quartile deviation: the IQR or the QD is an absolute
measure of dispersion and can be converted into a relative measure
of dispersion using
(Q3 – Q1)/(Q3 + Q1)
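A short sketch of the two formulas above, assuming the quartiles have already been found. The quartile values Q1 = 6 and Q3 = 12 are hypothetical and the function names are illustrative.

```python
def quartile_deviation(q1, q3):
    """QD = (Q3 - Q1) / 2, an absolute measure in the unit of the data."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1), a unit-free relative measure."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 6, 12  # hypothetical quartiles
qd = quartile_deviation(q1, q3)
cqd = coefficient_of_quartile_deviation(q1, q3)
print(qd, round(cqd, 4))
```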
IQR
• Merits:
– It is superior to range as a measure of dispersion.
– It has special utility in measuring variation in the case of open-end distributions
or those in which the data may be ranked but not measured quantitatively.
– Useful in erratic or badly skewed distribution.
– The Quartile deviation is not affected by the presence of extreme values.
• Limitations:
– As the value of the quartile deviation does not depend upon every item of
the series, it cannot be regarded as a good method of measuring
dispersion.
– It is not capable of mathematical manipulation.
– Its value is very much affected by sampling fluctuation.
The mean deviation
• The mean deviation is an average of absolute deviations of
individual observations from the central value of a series.
• It uses average but ignores signs and hence appears
unmethodical.
• MD is based on all values and hence cannot be calculated for
open-ended distributions.
• MD can be calculated from the mean as well as from the median, for
ungrouped data using the direct method and for continuous
distributions using the assumed mean method and the short-cut
method.
• The average used is either the arithmetic mean or the median.
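Actual and absolute deviations from mean
$MD = \frac{\sum f_i \lvert x_i - \bar{x} \rvert}{n}$
Mean Deviation for continuous grouped data
$MD_{\bar{x}} = \frac{\sum f_i \lvert m_i - \bar{x} \rvert}{n}$, where $m_i$ is the mid-point of the i-th class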
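A minimal Python sketch of the mean deviation, taken about either the mean or the median. The function name and its parameters are illustrative. Frequencies are assumed to be whole numbers so that the median of the frequency table can be found by expanding it. For grouped data the class mid-points would be passed as the values. The sample data come from later slides in this lecture.

```python
from statistics import mean, median


def mean_deviation(values, freqs=None, about=mean):
    """Sum of f * |x - centre| divided by n, with centre = about(data)."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counts once
    n = sum(freqs)
    # Expand the frequency table so mean/median see every observation.
    expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
    centre = about(expanded)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


rates = [16, 13, 17, 22]  # ungrouped respiration rates
md_mean = mean_deviation(rates)                  # about the mean (17)
md_median = mean_deviation(rates, about=median)  # about the median (16.5)

x = [3, 5, 7, 8, 9]  # discrete series from the frequency table slide
f = [2, 3, 2, 2, 1]
md_table = mean_deviation(x, f)
print(md_mean, md_median, md_table)
```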
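Standard deviation
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$ or $\sigma = \sqrt{\frac{\sum d^2}{N}}$, where $d = x - \bar{x}$
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$, where $d = x - A$ and A is an assumed mean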
Steps in calculating the standard deviation
• For each value, find its distance to the mean
• For each value, find the square of this distance
• Find the sum of these squared values
• Divide the sum by the number of values in the data
set
• Find the square root of this
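The five steps above map directly onto code. The sketch below follows them one by one with the divisor N, as in the steps; the function name and the sample data are hypothetical.

```python
import math


def standard_deviation(values):
    n = len(values)
    mean = sum(values) / n
    distances = [x - mean for x in values]  # step 1: distance to the mean
    squares = [d * d for d in distances]    # step 2: square each distance
    total = sum(squares)                    # step 3: sum of squared values
    variance = total / n                    # step 4: divide by number of values
    return math.sqrt(variance)              # step 5: square root


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
sd = standard_deviation(sample)
print(sd)
```

Dividing by N treats the data as the whole population. When the data are a sample used to estimate a population value, the divisor n - 1 is commonly used instead.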
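Problem: Find the mean respiration rate per minute
and its standard deviation when in 4 cases the rate was
found to be: 16, 13, 17 and 22.
x       d = x - X̄    d²
16      -1            1
13      -4            16
17      0             0
22      5             25
Total   68            0      42
Mean: $\bar{X} = \frac{\sum x}{N} = \frac{68}{4} = 17$
Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \bar{X})^2}{N}} = \sqrt{\frac{\sum d^2}{N}} = \sqrt{\frac{42}{4}} \approx 3.24$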
Example: Blood serum cholesterol levels of 10 persons are as
under: 240, 260, 290, 245, 255, 288, 272, 263, 277, 251.
Calculate the standard deviation using the assumed mean method.
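One way to work this example is sketched below with the assumed mean formula σ = √(Σd²/N − (Σd/N)²), where d = x − A. The assumed mean A = 264 is a choice made for this illustration only; any convenient value gives the same result. The function name is also illustrative.

```python
import math


def sd_assumed_mean(values, assumed_mean):
    """Standard deviation from deviations d = x - A about an assumed mean A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return math.sqrt(sum_d2 / n - (sum_d / n) ** 2)


cholesterol = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]
A = 264  # ASSUMPTION: arbitrary assumed mean chosen near the centre of the data

sd_chol = sd_assumed_mean(cholesterol, A)
print(round(sd_chol, 2))

# A different assumed mean gives the same answer.
sd_other = sd_assumed_mean(cholesterol, 250)
```

With A = 264 the deviations sum to 1 and their squares to 2689, so the correction term (Σd/N)² is small. That is the practical benefit of picking A close to the true mean.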
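Standard Deviation for discrete series
• Deviation from actual mean: $\sigma = \sqrt{\frac{\sum f x^2}{N}}$, where $x = X - \bar{X}$
• Deviation from assumed mean: $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$, where $d = X - A$
• Step Deviation Method
Example frequencies (f): 2 7 11 15 10 4 1
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$
Working table columns: x, f, d = x - A, fd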
Find the standard deviation for the following
discrete data using the step deviation method
BP (mmHg)   102   106   110   114   118   122   126
Days        2     9     25    35    17    10    1
Working table columns: x, f, d = (x - A)/4, fd
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$
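The step deviation calculation for the blood pressure data can be sketched as below. The assumed mean A = 114 (the middle value) is a choice made for this illustration, and the step i = 4 is the spacing used in d = (x - A)/4. The function name is illustrative, and the result is cross-checked against the direct formula.

```python
import math


def sd_step_deviation(xs, fs, assumed_mean, step):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) * i, with d = (x - A)/i."""
    n = sum(fs)
    d = [(x - assumed_mean) / step for x in xs]
    sum_fd = sum(f * di for f, di in zip(fs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(fs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * step


bp = [102, 106, 110, 114, 118, 122, 126]  # BP in mmHg
days = [2, 9, 25, 35, 17, 10, 1]          # frequencies

# ASSUMPTION: A = 114 is picked as the assumed mean.
sd_bp = sd_step_deviation(bp, days, assumed_mean=114, step=4)

# Cross-check with the direct formula sqrt(sum f (x - mean)^2 / N).
n = sum(days)
mean_bp = sum(f * x for x, f in zip(bp, days)) / n
sd_direct = math.sqrt(sum(f * (x - mean_bp) ** 2 for x, f in zip(bp, days)) / n)
print(round(sd_bp, 3), round(sd_direct, 3))
```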
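Standard Deviation for Grouped Data
• SD is: $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$, where $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
• Simplified formula: $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$, where $d = \frac{m - A}{i}$,
m = mid-point of the class, A = assumed mean and i = class interval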
Find the standard deviation for the
following grouped data using the step deviation
method
IQ               10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of students  5       12      15      20      10      4       2
Working table columns: IQ, f, m, d = (m - A)/10, fd
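A sketch of the same method for this class-interval table. Class mid-points m are computed from the class limits, the class interval i = 10 is read from the first class, and A = 45 (the mid-point of the 40-50 class) is an assumed mean chosen for illustration. The function name is hypothetical.

```python
import math


def grouped_sd(class_limits, fs, assumed_mean):
    """Step deviation SD for equal-width classes given as (lower, upper)."""
    width = class_limits[0][1] - class_limits[0][0]     # class interval i
    mids = [(lo + hi) / 2 for lo, hi in class_limits]   # mid-points m
    n = sum(fs)
    d = [(m - assumed_mean) / width for m in mids]
    sum_fd = sum(f * di for f, di in zip(fs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(fs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
students = [5, 12, 15, 20, 10, 4, 2]

# ASSUMPTION: A = 45 is picked as the assumed mean.
sd_iq = grouped_sd(classes, students, assumed_mean=45)
print(round(sd_iq, 3))
```

With these choices Σfd = -30 and Σfd² = 152 for N = 68, which is the working a hand calculation would show in the fd and fd² columns.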
Mathematical Properties of SD
• Combined Standard Deviation
x_i     f_i   f_i x_i   f_i x_i^2   x_i - x̄   (x_i - x̄)^2   f_i (x_i - x̄)^2
3       2     6         18          -3         9              18
5       3     15        75          -1         1              3
7       2     14        98          1          1              2
8       2     16        128         2          4              8
9       1     9         81          3          9              9
Total   10    60        400         -          -              40
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6$
$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} = \sqrt{\frac{40}{10}} = 2$
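The table totals can be verified with a few lines of Python. The sketch computes the standard deviation in two ways, from the squared deviations and from the Σfx² column, and both should agree. Variable names are illustrative.

```python
import math

x = [3, 5, 7, 8, 9]
f = [2, 3, 2, 2, 1]

n = sum(f)                                           # 10
sum_fx = sum(fi * xi for xi, fi in zip(x, f))        # 60
sum_fx2 = sum(fi * xi * xi for xi, fi in zip(x, f))  # 400
mean = sum_fx / n                                    # 6.0

# Direct form: sqrt(sum f (x - mean)^2 / N)
sum_f_dev2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f))  # 40.0
sd_direct = math.sqrt(sum_f_dev2 / n)

# Equivalent form using the f x^2 column: sqrt(sum f x^2 / N - mean^2)
sd_from_squares = math.sqrt(sum_fx2 / n - mean ** 2)

print(mean, sd_direct, sd_from_squares)
```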
Variance
• The term was introduced by R. A. Fisher in 1918 to describe the square
of the SD.
• It is highly important in advanced work, where the total variation can be
split into several parts, each attributable to one of the
factors causing variation in the original series.
• Variance measures how far a data set is spread out.
• Defined as “The average of the squared differences from the
Mean”.
• For grouped data: $\sigma^2 = \left[\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2\right] \times i^2$, where $d = \frac{m - A}{i}$
and i = class interval
Problem
• Calculate the range, the variance and the standard deviation of the following sample data:
Vitamin E concentration (mmol/l) in 12 heifers showing clinical signs
of an unusual myopathy:
4.2 3.3 7 6.9 5.1 3.4 2.5 8.6 3.5 2.9 4.9 5.4
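A sketch of this problem using the Python standard library. The slides divide by N, while the problem calls the values sample data, for which the divisor n - 1 is usual. Which one is intended is an assumption, so both are computed below.

```python
import statistics

vitamin_e = [4.2, 3.3, 7, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4]

data_range = max(vitamin_e) - min(vitamin_e)  # 8.6 - 2.5

# Divisor N, matching the formulas on the slides.
var_population = statistics.pvariance(vitamin_e)
sd_population = statistics.pstdev(vitamin_e)

# Divisor n - 1, the usual choice for a sample.
var_sample = statistics.variance(vitamin_e)
sd_sample = statistics.stdev(vitamin_e)

print(round(data_range, 1))
print(round(var_population, 3), round(sd_population, 3))
print(round(var_sample, 3), round(sd_sample, 3))
```

The two variances differ only by the factor n/(n - 1), so the gap between them shrinks as the sample size grows.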
Data: 240.12, 240.13, 240.15, 240.12, 240.17, 240.15, 240.17, 240.16, 240.22, 240.21
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
Problem
Calculate the Variance, Standard Deviation, Mean Deviation from the Median, Mode, Arithmetic
Mean, Geometric Mean, Harmonic Mean, Coefficient of Variation and Standard Error
from the data on the gestation period of crossbred cows given below, for each grade
individually and for the pooled data, using the formulas for Variance, Standard Deviation and
Standard Error.