lecture_4
Dispersion
• The measure of the spread or variability
• No Variability – No Dispersion
Measures of Variation
• There are 3 values that we will look at to
measure the amount of dispersion or variation.
(The spread of the group)
1. Range
2. Standard Deviation
3. Quartile deviation
Why is it Important?
• You want to choose the best brand of
medicine for your patients. You are
interested in how long each drug takes to
cure a disease. The choices are narrowed
down to 2 different drugs. The results are
shown in the chart. Which drug would
you choose?
The chart indicates the number of days a drug takes to cure a particular disease.

         Drug A   Drug B
         10       35
         60       45
         50       30
         30       35
         40       40
         20       25
Total    210      210
Does the Average Help?
• Drug A: Avg = 210/6 = 35 days
• Drug B: Avg = 210/6 = 35 days
• Range = highest value – lowest value, e.g. 100 – 2 = 98
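To see why the average alone does not settle the drug question, the short sketch below computes the mean and the range for the two columns of the chart. The variable and function names are illustrative choices, not part of the lecture.

```python
from statistics import mean

# Days to cure, copied from the Drug A / Drug B chart.
drug_a = [10, 60, 50, 30, 40, 20]
drug_b = [35, 45, 30, 35, 40, 25]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


mean_a, mean_b = mean(drug_a), mean(drug_b)
range_a, range_b = value_range(drug_a), value_range(drug_b)

print("Drug A: mean =", mean_a, "range =", range_a)
print("Drug B: mean =", mean_b, "range =", range_b)
```

Both drugs average 35 days, but Drug A has a range of 50 days against 20 days for Drug B, so Drug B is the more consistent of the two.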
Deviation from the Mean
• A deviation from the mean, $x - \bar{x}$, is the difference
between the value of $x$ and the mean $\bar{x}$.
• We base our formulas for variance and standard
deviation on the amount by which the observations
deviate from the mean.
• The mean deviation of a set of observations
$x_1, x_2, \dots, x_N$ is the mean of the absolute deviations
from the mean and equals
$$\frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$$
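As a minimal sketch of the mean deviation formula, the code below applies it to the two drug columns from the earlier chart. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


# Days to cure, copied from the Drug A / Drug B chart.
drug_a = [10, 60, 50, 30, 40, 20]
drug_b = [35, 45, 30, 35, 40, 25]

md_a = mean_deviation(drug_a)  # absolute deviations 25, 25, 15, 5, 5, 15
md_b = mean_deviation(drug_b)  # absolute deviations 0, 10, 5, 0, 5, 10

print(md_a, md_b)
```

Drug A's observations lie on average 15 days from the mean, Drug B's only 5 days.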
Formulae for sample and population
variances
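Sample variance
• Computation formula:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$$
• Definition formula:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
Population variance
• Computation formula:
$$\sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}$$
• Definition formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$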
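The computation and definition formulae should give the same answer. The sketch below checks this on the Drug A data from the earlier chart; the function names are illustrative and the code is a check under those assumptions, not a reference implementation.

```python
def sample_variance_definition(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_computation(xs):
    """(sum of x^2 minus (sum of x)^2 / n), divided by n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance_definition(xs):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def population_variance_computation(xs):
    """(sum of x^2 minus (sum of x)^2 / N), divided by N."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n


drug_a = [10, 60, 50, 30, 40, 20]  # days to cure, from the Drug A column

s2_def = sample_variance_definition(drug_a)
s2_comp = sample_variance_computation(drug_a)
var_def = population_variance_definition(drug_a)
var_comp = population_variance_computation(drug_a)

print(s2_def, s2_comp)
print(var_def, var_comp)
```

The only difference between the sample and population versions is the divisor: $n - 1$ for a sample, $N$ for a population.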
Standard Deviation
• The standard deviation is the square root of the
variance.
$$s = \sqrt{s^2}$$
Example – Using Formula
• Find the variance of the following
dataset 6, 3, 8, 5, 3 (in hours)
$x$      $x^2$
6        36
3        9
8        64
5        25
3        9
$\sum x = 25$    $\sum x^2 = 143$
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{143 - \frac{25^2}{5}}{4} = \frac{143 - 125}{4} = \frac{18}{4} = 4.5$$
Find the standard deviation
• The standard deviation is the positive square
root of the variance.
$$s = \sqrt{4.5} \approx 2.12$$
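The worked example can be reproduced in a few lines. The sketch below follows the computation formula step by step and then cross-checks the result against Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample ($n - 1$) divisor.

```python
import math
from statistics import stdev, variance

hours = [6, 3, 8, 5, 3]  # dataset from the example, in hours

n = len(hours)
sum_x = sum(hours)                   # sum of x
sum_x2 = sum(x * x for x in hours)   # sum of x squared

# Computation formula for the sample variance.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(sum_x, sum_x2)      # the two column totals
print(s2, round(s, 2))    # variance and standard deviation

# Cross-check with the standard library.
print(variance(hours), round(stdev(hours), 2))
```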
Example: Mean, variance and standard deviation of data
• In a city there are six professional football clubs. Last season they had
25, 30, 18, 27, 28 and 22 players respectively on their full-time paid
staffs. Find the mean, variance and standard deviation of the number
of full-time paid staffs.
• Let us call the number of full-time paid staff r. It is easier to lay out the
calculation in the form of a table.
Club     $r_i$    $r_i - \bar{r}$    $(r_i - \bar{r})^2$    $r_i^2$
A        25       0                  0                      625
B        30       5                  25                     900
C        18       -7                 49                     324
D        27       2                  4                      729
E        28       3                  9                      784
F        22       -3                 9                      484
Total    150                         96                     3846

Mean $\bar{r} = 150/6 = 25$        Variance $= 96/6 = 16$
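The table can be checked with a short script. This is a sketch that treats the six clubs as the whole population (divisor $N = 6$), as the table does; `pvariance` and `pstdev` are the population versions in Python's standard `statistics` module.

```python
from statistics import mean, pstdev, pvariance

# Full-time paid staff at clubs A to F, from the example.
players = [25, 30, 18, 27, 28, 22]

n = len(players)
r_bar = mean(players)

# Definition form: mean of the squared deviations.
var_definition = sum((r - r_bar) ** 2 for r in players) / n

# Computation form: mean of the squares minus the square of the mean.
var_computation = sum(r * r for r in players) / n - r_bar ** 2

print(r_bar, var_definition, var_computation)
print(pvariance(players), pstdev(players))
```

Both forms give a variance of 16, so the standard deviation is $\sqrt{16} = 4$ players.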
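$$\frac{1}{N}\sum_{i=1}^{k} f_i (r_i - \bar{r})^2 \quad\text{or}\quad \frac{1}{N}\sum_{i=1}^{k} f_i r_i^2 - \left(\frac{\sum_{i=1}^{k} f_i r_i}{N}\right)^2$$
$$= \frac{194.53}{91} = 2.13377$$
The standard deviation is $\sqrt{2.13377} = 1.46$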
Variance and standard deviation of grouped
observations of a continuous variable
The variance of a set of N observations of a continuous variable, in
which 𝑓𝑖 observations fall in the interval whose centre is 𝑥𝑖 (𝑖 =
1, 2, ⋯ , 𝑘 ), is
$$\frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \quad\text{or}\quad \frac{1}{N}\sum_{i=1}^{k} f_i x_i^2 - \left(\frac{\sum_{i=1}^{k} f_i x_i}{N}\right)^2$$
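A minimal sketch of the grouped-data formula follows. The interval centres and frequencies are hypothetical values made up purely for illustration (they do not come from the lecture), and `grouped_variance` is an illustrative name.

```python
def grouped_variance(centres, freqs):
    """Return (mean, variance by definition form, variance by computation form)."""
    n = sum(freqs)  # N = total number of observations
    x_bar = sum(f * x for x, f in zip(centres, freqs)) / n
    by_definition = sum(f * (x - x_bar) ** 2 for x, f in zip(centres, freqs)) / n
    by_computation = sum(f * x * x for x, f in zip(centres, freqs)) / n - x_bar ** 2
    return x_bar, by_definition, by_computation


# HYPOTHETICAL data: three intervals centred at 5, 15 and 25
# containing 2, 5 and 3 observations respectively.
centres = [5, 15, 25]
freqs = [2, 5, 3]

x_bar, var_def, var_comp = grouped_variance(centres, freqs)
print(x_bar, var_def, var_comp)
```

Every observation in an interval is treated as if it sat at the interval centre, so the result is an approximation to the variance of the raw data.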
The semi-inter-quartile range
(or quartile deviation)
• The variance, the standard deviation and the mean deviation go
naturally with the mean.
• They are based on deviations from the mean, and the averaging
process is the same as that for calculating the mean.
Rank order   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Quartiles    .   .   .   Q1  .   .   .   M   .   .   .   Q3
Example
Rank order             1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Number of seats sold   54  57  60  67  72  74  75  78  83  87  88  93  98  99  100
Quartiles              .   .   .   Q1  .   .   .   M   .   .   .   Q3
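The seats example can be sketched in code. The rank positions $(n+1)/4$, $(n+1)/2$ and $3(n+1)/4$ are an assumption chosen to match the ranks 4, 8 and 12 marked in the table, and the sketch only handles the case where $n + 1$ is divisible by 4. The semi-inter-quartile range is taken as half of $Q_3 - Q_1$, the usual definition.

```python
from statistics import quantiles

# Number of seats sold, already in rank order (from the example table).
seats = [54, 57, 60, 67, 72, 74, 75, 78, 83, 87, 88, 93, 98, 99, 100]

n = len(seats)
assert (n + 1) % 4 == 0, "this sketch assumes (n + 1) is divisible by 4"

# Ranks are 1-based, Python indices are 0-based, hence the "- 1".
q1 = seats[(n + 1) // 4 - 1]        # rank 4
median = seats[(n + 1) // 2 - 1]    # rank 8
q3 = seats[3 * (n + 1) // 4 - 1]    # rank 12

semi_iqr = (q3 - q1) / 2

print(q1, median, q3, semi_iqr)

# Cross-check: the default ("exclusive") method of statistics.quantiles
# uses the same (n + 1)-based positions.
print(quantiles(seats, n=4))
```

Here $Q_1 = 67$, $M = 78$ and $Q_3 = 93$, giving a semi-inter-quartile range of $(93 - 67)/2 = 13$ seats.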
The general term for these measures is quantiles; the quartiles, deciles
and percentiles are examples.
Example: Deciles and quartiles of a grouped continuous
variable (Clarke & Cooke, 4th Ed., example 4.6.1)
The incomes of married couples over retiring age in 1973 are shown in columns 1
and 2 of the Table below. Draw a cumulative frequency curve for the data, and
from it estimate the lowest decile, the median, the lower and upper quartiles of
income. Use the curve to estimate the proportion of married couples who had a
gross weekly income between £22 and £28.
[Figure: cumulative frequency curve. Vertical axis: Frequency (0 to 2000); horizontal axis: Gross weekly income (£) (0 to 50).]
• Note that the upper limit has been conveniently set at £50
• The total frequency, in thousands, is 2059.
• One-tenth of the total frequency must lie below the first decile. Thus from the
graph we need to find the income corresponding to 205.9 on the vertical scale: it
is £14.40 as accurately as we can read it. The first decile is thus £14.40.
• From the graph, the median has to have half the total frequency, i.e. 1029.5,
below it. The income corresponding to this is £19.80.
• The quartiles correspond to cumulative frequencies of $\frac{1}{4} \times 2059 = 514.75$ and
$\frac{3}{4} \times 2059 = 1544.25$; they are therefore, from the graph, $Q_1$ = £16.20 and
$Q_3$ = £26.20.
• Finally, from the graph the cumulative frequency up to £22 is 1230 thousand,
and up to £28 is 1590 thousand, so the number of married couples having
incomes between £22 and £28 is 360 thousand. As a proportion this is
$\frac{360}{2059} = 0.175$ (i.e. $17\tfrac{1}{2}\%$) of the whole.
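The arithmetic behind the graph readings can be sketched as follows. The code only computes the cumulative-frequency targets and the final proportion; the income values themselves still have to be read off the curve, and the two cumulative frequencies used here are the readings quoted in the example.

```python
total = 2059  # total frequency, in thousands

# Cumulative frequency that must lie below each quantile.
targets = {
    "first decile": total / 10,
    "lower quartile": total / 4,
    "median": total / 2,
    "upper quartile": 3 * total / 4,
}
for name, target in targets.items():
    print(name, target)

# Cumulative frequencies read from the graph (in thousands).
cum_up_to_22 = 1230
cum_up_to_28 = 1590

between = cum_up_to_28 - cum_up_to_22   # couples with income from £22 to £28
proportion = between / total

print(between, round(proportion, 3))
```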
Coefficient of variation
• The coefficient of variation (CV) is a relative measure of variability
that indicates the size of a standard deviation in relation to its mean.
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}}$$
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} = \frac{5}{20} = 0.25$$
• The coefficient of variation is 0.25. This value tells us the relative size of the
standard deviation compared to the mean.
• Values less than one indicate that the standard deviation is smaller
than the mean (typical), while values greater than one occur when
the S.D. is greater than the mean, a phenomenon referred to as
overdispersion
• For the five-minute standard deviation in the pizza delivery example, we
know that the typical delivery occurs five minutes before or after the mean
delivery time.
• That information is very useful! It tells us the variability in our data using,
conveniently, the original measurement units. We can conceivably compare
this delivery time variability to another pizza restaurant.
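The CV calculation shown earlier can be sketched in code using the same figures, a standard deviation of 5 and a mean of 20. The function name is illustrative, and the guard against a zero mean is an added assumption, since the ratio is undefined there.

```python
def coefficient_of_variation(sd, mean_value):
    """CV = standard deviation / mean (unitless)."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean_value


cv = coefficient_of_variation(5, 20)
print(cv)

# A value below 1 means the standard deviation is smaller than the mean.
print("SD smaller than mean:", cv < 1)
```

Because the units cancel in the ratio, the same function works whether the inputs are in minutes, days or currency.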
Absolute versus Relative Measures of Variability
• In the CV ratio, both the standard deviation and the mean use the
same units, which cancels them out and produces a unitless statistic.
These values use the same unit of measurement (Malawi Kwacha), allowing you to compare the
standard deviations.
The variability in high-income household expenses is much greater than in low-income households
(MWK125,000 vs. MWK10,000). Given the vast difference in mean expenses, that’s not
surprising.
However, if you want to compare variability while accounting for the disparate means, you need to use
a relative measure of variability, such as the coefficient of variation. The table below shows that when
you account for the differences in expenses, the low-income group actually has equal variability.