MODULE 4 - Variability
https://www.investopedia.com/terms/v/variability.asp
Module Introduction
In this module, we will introduce the statistical concept of variability. We will
describe the methods that are used to measure and objectively describe the differences that
exist from one score to another within a distribution. In addition to describing distributions
of scores, variability also helps us determine which outcomes are likely and which are very
unlikely to be obtained. This aspect of variability will play an important role in inferential
statistics.
Module Content
4.1 INTRODUCTION TO VARIABILITY
The term variability has much the same meaning in statistics as it has in everyday
language; to say that things are variable means that they are not all the same. In statistics,
our goal is to measure the amount of variability for a particular set of scores, a distribution.
In simple terms, if the scores in a distribution are all the same, then there is no variability.
If there are small differences between scores, then the variability is small, and if there are
large differences between scores, then the variability is large.
The purpose for measuring variability is to obtain an objective measure of how the
scores are spread out in a distribution. In general, a good measure of variability serves two
purposes:
1. Variability describes the distribution. Specifically, it tells whether the scores are
clustered close together or are spread out over a large distance. Usually, variability
is defined in terms of distance. It tells how much distance to expect between one
score and another, or how much distance to expect between an individual score and
the mean.
2. Variability measures how well an individual score (or group of scores) represents
the entire distribution. This aspect of variability is very important for inferential
statistics, in which relatively small samples are used to answer questions about
populations.
There are three different measures of variability: the range, standard deviation, and
variance. Of these three, the standard deviation and the related measure of variance are by
far the most important.
THE RANGE
The obvious first step toward defining and measuring variability is the range,
which is the distance covered by the scores in a distribution, from the smallest score to the
largest score. The range is defined as the difference between the highest and lowest value
in the distribution. It is easy to compute; however, it is the least reliable measure of
variation because it is easily influenced by extreme values.
R=H-L
where:
R – range
H – represents the highest value
L – represents the lowest value
Example 1: The ages of 15 students in a certain class were taken as shown below.
15, 18, 17, 16, 19, 21, 18, 16, 17, 20, 21, 19, 24, 23, 18
R=H–L
R = 24-15
R=9
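The calculation in Example 1 can be sketched in a few lines of Python. The function name `value_range` and the extra age of 60 are illustrative assumptions, not part of the module; the second call only shows how a single extreme value inflates the range.

```python
# Ages of the 15 students from Example 1.
ages = [15, 18, 17, 16, 19, 21, 18, 16, 17, 20, 21, 19, 24, 23, 18]


def value_range(scores):
    """Return R = H - L, the highest score minus the lowest score."""
    return max(scores) - min(scores)


print(value_range(ages))         # 9

# Hypothetical extra student aged 60: one extreme value changes R a lot.
print(value_range(ages + [60]))  # 45
```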
R = HUCB - LLCB
where:
R – range
HUCB – represents the highest upper class boundary
LLCB – represents the lowest lower class boundary
Example 2:
Class Interval    f
21 – 23           3
24 – 26           4
27 – 29           6
30 – 32          10
33 – 35           5
36 – 38           2
R = HUCB – LLCB
R = 38.5 – 20.5
R = 18
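For grouped data, the same idea can be sketched as follows. This is a minimal illustration that assumes whole-number class limits, so each class boundary lies half a unit beyond its limit; the name `grouped_range` and the `gap` parameter are hypothetical.

```python
# Class limits from Example 2 as (lower limit, upper limit).
class_limits = [(21, 23), (24, 26), (27, 29), (30, 32), (33, 35), (36, 38)]


def grouped_range(limits, gap=1):
    """Return R = HUCB - LLCB for a list of class limits.

    Assumption: consecutive classes are `gap` units apart (1 for whole
    numbers), so each boundary sits gap / 2 beyond the class limit.
    """
    half = gap / 2
    highest_upper_boundary = max(upper for _, upper in limits) + half
    lowest_lower_boundary = min(lower for lower, _ in limits) - half
    return highest_upper_boundary - lowest_lower_boundary


print(grouped_range(class_limits))  # 18.0
```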
INTER-QUARTILE RANGE
This value is obtained by getting the difference between the third and the first
quartile:
IQR = Q3 – Q1
where:
Q3 – the third quartile
Q1 – the first quartile
$Q_i = \left(\frac{i(n+1)}{4}\right)^{\text{th}}$ value, where i = 1, 2, 3
Example 3: A random sample of 15 patients yielded the following data on the length of
stay (in days) in the hospital. 5, 6, 9, 10, 15, 10, 14, 12, 10, 13, 13, 9, 8, 10, 12. Find
quartiles.
1. Arrange the data in ascending order: 5, 6, 8, 9, 9, 10, 10, 10, 10, 12, 12, 13, 13, 14, 15
2. Find the first quartile Q1
$Q_i = \left(\frac{i(n+1)}{4}\right)^{\text{th}}$ value
$Q_1 = \left(\frac{1(15+1)}{4}\right)^{\text{th}}$ value
$Q_1 = \left(\frac{1(16)}{4}\right)^{\text{th}}$ value
Q1 = (4)th value
Q1 = 9
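The position rule used in Example 3 can be sketched in Python as shown below. The function name `quartile` is hypothetical, and the linear interpolation for positions that are not whole numbers is an assumption added for completeness; with n = 15 every quartile position is a whole number, so it is not triggered here.

```python
# Length of stay (in days) for the 15 patients in Example 3.
stays = [5, 6, 9, 10, 15, 10, 14, 12, 10, 13, 13, 9, 8, 10, 12]


def quartile(scores, i):
    """Return Q_i using the i(n + 1)/4 position rule (positions start at 1)."""
    ordered = sorted(scores)
    n = len(ordered)
    position = i * (n + 1) / 4
    whole = int(position)          # position of the lower neighbouring score
    fraction = position - whole    # how far to move toward the next score
    if whole < 1:
        return float(ordered[0])
    if whole >= n:
        return float(ordered[-1])
    # Assumption: interpolate linearly when the position is not a whole number.
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


q1, q2, q3 = (quartile(stays, i) for i in (1, 2, 3))
print(q1, q2, q3)  # 9.0 10.0 13.0
print(q3 - q1)     # inter-quartile range: 4.0
```

The first value agrees with the hand computation (the 4th ordered score is 9), and the last line applies IQR = Q3 – Q1 to the same data.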
$Q_i = \left(\frac{i(n)}{4}\right)^{\text{th}}$ value
where:
i = 1, 2, 3
n – the total number of observations
$Q_i = L + \left(\frac{\frac{i(n)}{4} - {<}cf}{f}\right)c$
where:
L - the lower boundary of the quartile class
n - total number of observations
f - frequency of the quartile class
<cf - cumulative frequency of the class previous to quartile class
c - the class width/ class interval
Example 4: The following table gives the amount of time (in minutes) spent on the
internet each evening by a group of 56 students.
First quartile (Q1):
The cumulative frequency just greater than or equal to 14 is 15. The corresponding class
12.5−15.5 is the 1st quartile class.
$Q_i = L + \left(\frac{\frac{i(n)}{4} - {<}cf}{f}\right)c$
$Q_1 = 12.5 + \left(\frac{\frac{1(56)}{4} - 3}{12}\right)3$
$Q_1 = 12.5 + \left(\frac{14 - 3}{12}\right)3$
$Q_1 = 12.5 + \left(\frac{11}{12}\right)3$
Q1 = 12.5 + 2.75
Q1 = 15.25
Third quartile (Q3):
The cumulative frequency just greater than or equal to 42 is 54. The corresponding class
18.5−21.5 is the 3rd quartile class.
$Q_i = L + \left(\frac{\frac{i(n)}{4} - {<}cf}{f}\right)c$
$Q_3 = 18.5 + \left(\frac{\frac{3(56)}{4} - 30}{24}\right)3$
$Q_3 = 18.5 + \left(\frac{42 - 30}{24}\right)3$
$Q_3 = 18.5 + \left(\frac{12}{24}\right)3$
Q3 = 18.5 + 1.5
Q3 = 20
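As a check on Example 4, the grouped-data formula can be written as a small Python function. The function and parameter names are hypothetical; the numbers passed in are the ones used in the hand computation (L, n, <cf, f, and c for each quartile class).

```python
def grouped_quartile(lower_boundary, n, cf_before, f, width, i, parts=4):
    """Apply Q_i = L + ((i*n/parts - <cf) / f) * c to one quartile class."""
    target = i * n / parts  # position of the quartile in the ordered data
    return lower_boundary + (target - cf_before) * width / f


q1 = grouped_quartile(12.5, 56, cf_before=3, f=12, width=3, i=1)
q3 = grouped_quartile(18.5, 56, cf_before=30, f=24, width=3, i=3)
print(q1, q3)   # 15.25 20.0
print(q3 - q1)  # inter-quartile range: 4.75
```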
$MAD = \frac{\sum |x - \bar{x}|}{n}$
where:
MAD – mean absolute deviation
x – individual values
x̄ – the mean of the distribution
n – sample size or population
Example 5: Consider the following values: 13, 16, 9, 6, 15, 7, 11. Determine the value of
MAD.
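$MAD = \frac{\sum |x - \bar{x}|}{n}$
1. Mean: x̄ = 77/7 = 11
2. Absolute deviations from the mean: 2, 5, 2, 5, 4, 4, 0, so Σ∣x – x̄∣ = 22
3. MAD = 22/7 = 3.14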
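A minimal Python sketch of the same calculation follows. The function name `mean_absolute_deviation` is an illustrative choice, not something defined in the module.

```python
# Values from Example 5.
values = [13, 16, 9, 6, 15, 7, 11]


def mean_absolute_deviation(scores):
    """Return MAD = sum(|x - mean|) / n for ungrouped scores."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


mad = mean_absolute_deviation(values)
print(sum(values) / len(values))  # mean: 11.0
print(f"{mad:.2f}")               # 3.14
```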
$MAD = \frac{\sum f|x - \bar{x}|}{n}$
where:
MAD – average deviation / mean absolute deviation
f – frequency of the class
x – class mark or midpoint
x̄ – the mean of the distribution
n – sample size or population
Example 6:
Classes    f          x (class mark),    fx           ∣x – x̄∣          f∣x – x̄∣
                      x = (L + U)/2                   = ∣x – 8.36∣
2 – 4      2          3                  6            5.36             10.72
5 – 7      3          6                  18           2.36             7.08
8 – 10     6          9                  54           0.64             3.84
11 – 13    2          12                 24           3.64             7.28
14 – 16    1          15                 15           6.64             6.64
Total      Σf = 14                       Σfx = 117                     Σf∣x – x̄∣ = 35.56
Mean: x̄ = Σfx / Σf = 117/14 = 8.36
$MAD = \frac{\sum f|x - \bar{x}|}{n}$
$MAD = \frac{35.56}{14}$
MAD = 2.54
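The grouped version can be sketched the same way. The names below are hypothetical, and the code keeps the unrounded mean (117/14) instead of 8.36, which is an implementation choice; the result still rounds to 2.54.

```python
# Frequency table from Example 6 as (lower limit, upper limit, frequency).
classes = [(2, 4, 2), (5, 7, 3), (8, 10, 6), (11, 13, 2), (14, 16, 1)]


def grouped_mad(table):
    """Return MAD = sum(f * |x - mean|) / n, with x the class mark."""
    n = sum(f for _, _, f in table)
    marks = [((lower + upper) / 2, f) for lower, upper, f in table]
    mean = sum(x * f for x, f in marks) / n  # sum(fx) / sum(f)
    return sum(f * abs(x - mean) for x, f in marks) / n


mad_grouped = grouped_mad(classes)
print(f"{mad_grouped:.2f}")  # 2.54
```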
Standard deviation and variance are the most commonly used measures of
variability. Both of these measures are based on the idea that each score can be described
in terms of its deviation or distance from the mean. The variance is the mean of the squared
deviations. The standard deviation is the square root of the variance and provides a
measure of the standard distance from the mean.
VARIANCE
Variance equals the mean of the squared deviations. Variance is the average
squared distance from the mean.
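Population variance:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
or
$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
where:
σ² – population variance
x – individual score
μ – mean
N – total number of scores
Sample variance:
$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
or
$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$
where:
S² – sample variance
x – individual score
x̄ – mean
n – total number of scores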
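Population variance (grouped data):
$\sigma^2 = \frac{\sum f(x - \mu)^2}{N}$
or
$\sigma^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{N}}{N}$
where:
σ² – population variance
f – frequency
x – class mark
μ – mean
N – total number of scores
Sample variance (grouped data):
$S^2 = \frac{\sum f(x - \bar{x})^2}{n - 1}$
or
$S^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}$
where:
S² – sample variance
f – frequency
x – class mark
x̄ – mean
n – total number of scores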
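The two forms of each variance formula share the same numerator. A short derivation, written here for the ungrouped population case and using only the definition μ = Σx / N, shows why the sum of squared deviations can be computed from Σx and Σx² alone. The sample and grouped versions follow in the same way, with x̄ and n (or fx in place of x) substituted.

```latex
\begin{align*}
\sum (x - \mu)^2
  &= \sum \left( x^2 - 2\mu x + \mu^2 \right) \\
  &= \sum x^2 - 2\mu \sum x + N\mu^2 \\
  &= \sum x^2 - 2\,\frac{(\sum x)^2}{N} + \frac{(\sum x)^2}{N}
     \qquad \text{since } \mu = \frac{\sum x}{N} \\
  &= \sum x^2 - \frac{(\sum x)^2}{N}
\end{align*}
```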
STANDARD DEVIATION
The standard deviation is the most commonly used and the most important measure of
variability. Standard deviation uses the mean of the distribution as a reference point and
measures variability by considering the distance between each score and the mean.
In simple terms, the standard deviation provides a measure of the standard, or average,
distance from the mean, and describes whether the scores are clustered closely around the mean
or are widely scattered.
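Population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
or
$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$
or simply
$\sigma = \sqrt{\sigma^2}$
where:
σ – population standard deviation
x – individual score
μ – mean
N – total number of scores
Sample standard deviation:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
or
$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$
or simply
$S = \sqrt{S^2}$
where:
S – sample standard deviation
x – individual score
x̄ – mean
n – total number of scores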
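Population standard deviation (grouped data):
$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}}$
or
$\sigma = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{N}}{N}}$
or simply
$\sigma = \sqrt{\sigma^2}$
where:
σ – population standard deviation
f – frequency
x – class mark
μ – mean
N – total number of scores
Sample standard deviation (grouped data):
$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$
or
$S = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$
or simply
$S = \sqrt{S^2}$
where:
S – sample standard deviation
f – frequency
x – class mark
x̄ – mean
n – total number of scores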
COEFFICIENT OF VARIATION
The coefficient of variation is a relative measure of dispersion. It may be used
to compare the degree of variability of two sets of data, each relative to the mean of
its own distribution.
Population: $CV = \frac{\sigma}{\bar{x}}(100\%)$
Sample: $CV = \frac{S}{\bar{x}}(100\%)$
where:
σ – population standard deviation
S – sample standard deviation
x̄ – mean
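To illustrate how the coefficient of variation compares two data sets, the sketch below reuses the ages from Example 1 and the hospital stays from Example 3. Treating both as samples (so S uses n – 1) and the function name `coefficient_of_variation` are assumptions made for this illustration.

```python
import statistics

ages = [15, 18, 17, 16, 19, 21, 18, 16, 17, 20, 21, 19, 24, 23, 18]
stays = [5, 6, 9, 10, 15, 10, 14, 12, 10, 13, 13, 9, 8, 10, 12]


def coefficient_of_variation(scores):
    """Return the sample CV = (S / mean) * 100, in percent."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100


cv_ages = coefficient_of_variation(ages)
cv_stays = coefficient_of_variation(stays)
print(f"{cv_ages:.1f}%")   # about 13.8%
print(f"{cv_stays:.1f}%")  # about 27.1%
```

Although the two standard deviations are close in size, the hospital stays vary about twice as much relative to their mean, which is the kind of comparison the coefficient of variation is meant to support.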
x          x – μ      (x – μ)²     x²
3          -3.44      11.83        9
13         6.56       43.03        169
11         4.56       20.79        121
15         8.56       73.27        225
5          -1.44      2.07         25
4          -2.44      5.95         16
2          -4.44      19.71        4
3          -3.44      11.83        9
2          -4.44      19.71        4
ΣX = 58    0.04       208.22       ΣX² = 582
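Population variance, using the squared deviations:
$\sigma^2 = \frac{208.22}{9}$
σ² = 23.14
Population variance, using ΣX and ΣX²:
$\sigma^2 = \frac{582 - \frac{(58)^2}{9}}{9}$
$\sigma^2 = \frac{582 - \frac{3364}{9}}{9}$
$\sigma^2 = \frac{582 - 373.78}{9}$
σ² = 23.14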
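Sample variance, using the squared deviations:
$S^2 = \frac{208.22}{9 - 1}$
$S^2 = \frac{208.22}{8}$
S² = 26.03
Sample variance, using ΣX and ΣX²:
$S^2 = \frac{582 - \frac{(58)^2}{9}}{9 - 1}$
$S^2 = \frac{582 - \frac{3364}{9}}{9 - 1}$
$S^2 = \frac{582 - 373.78}{8}$
S² = 26.03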
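Sample standard deviation, using the squared deviations:
$S = \sqrt{\frac{208.22}{9 - 1}}$
$S = \sqrt{\frac{208.22}{8}}$
$S = \sqrt{26.03}$
S = 5.10
Sample standard deviation, using ΣX and ΣX²:
$S = \sqrt{\frac{582 - \frac{(58)^2}{9}}{9 - 1}}$
$S = \sqrt{\frac{582 - \frac{3364}{9}}{8}}$
$S = \sqrt{\frac{582 - 373.78}{8}}$
$S = \sqrt{\frac{208.22}{8}}$
$S = \sqrt{26.03}$
S = 5.10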
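The hand computations for these nine scores can be reproduced with a short Python sketch. Variable names are illustrative; the calls to the standard-library `statistics` module are only a cross-check on the formulas.

```python
import math
import statistics

# The nine scores from the worked example.
scores = [3, 13, 11, 15, 5, 4, 2, 3, 2]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations, computed both ways.
ss_deviations = sum((x - mean) ** 2 for x in scores)
ss_from_sums = sum(x * x for x in scores) - sum(scores) ** 2 / n

population_variance = ss_deviations / n
sample_variance = ss_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"{ss_deviations:.2f} {ss_from_sums:.2f}")  # 208.22 208.22
print(f"{population_variance:.2f}")               # 23.14
print(f"{sample_variance:.2f}")                   # 26.03
print(f"{sample_sd:.2f}")                         # 5.10

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_sd, statistics.stdev(scores))
```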
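Population variance, using the squared deviations:
$\sigma^2 = \frac{210}{15}$
σ² = 14
Population variance, using ΣX and ΣX²:
$\sigma^2 = \frac{2025 - \frac{(165)^2}{15}}{15}$
$\sigma^2 = \frac{2025 - \frac{27225}{15}}{15}$
$\sigma^2 = \frac{2025 - 1815}{15}$
σ² = 14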
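Population standard deviation, using the squared deviations:
$\sigma = \sqrt{\frac{210}{15}}$
$\sigma = \sqrt{14}$
σ = 3.74
Population standard deviation, using ΣX and ΣX²:
$\sigma = \sqrt{\frac{2025 - \frac{(165)^2}{15}}{15}}$
$\sigma = \sqrt{\frac{2025 - \frac{27225}{15}}{15}}$
$\sigma = \sqrt{\frac{2025 - 1815}{15}}$
$\sigma = \sqrt{\frac{210}{15}}$
$\sigma = \sqrt{14}$
σ = 3.74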
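Sample standard deviation, using the squared deviations:
$S = \sqrt{\frac{210}{15 - 1}}$
$S = \sqrt{\frac{210}{14}}$
$S = \sqrt{15}$
S = 3.87
Sample standard deviation, using ΣX and ΣX²:
$S = \sqrt{\frac{2025 - \frac{(165)^2}{15}}{15 - 1}}$
$S = \sqrt{\frac{2025 - \frac{27225}{15}}{14}}$
$S = \sqrt{\frac{2025 - 1815}{14}}$
$S = \sqrt{\frac{210}{14}}$
$S = \sqrt{15}$
S = 3.87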
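Because this second set of computations starts from totals only (15 scores, ΣX = 165, ΣX² = 2025), it can be sketched as a function of those three numbers. The function name and the dictionary keys are hypothetical.

```python
import math


def spread_from_sums(n, sum_x, sum_x2):
    """Variance and standard deviation from n, sum(X), and sum(X^2)."""
    ss = sum_x2 - sum_x ** 2 / n  # sum of squared deviations
    return {
        "ss": ss,
        "population_variance": ss / n,
        "population_sd": math.sqrt(ss / n),
        "sample_variance": ss / (n - 1),
        "sample_sd": math.sqrt(ss / (n - 1)),
    }


result = spread_from_sums(15, 165, 2025)
print(result["ss"])                      # 210.0
print(result["population_variance"])     # 14.0
print(f"{result['population_sd']:.2f}")  # 3.74
print(f"{result['sample_sd']:.2f}")      # 3.87
```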
• A quartile is a type of quantile that divides the ordered data points into four
more or less equal parts, or quarters. Three cut points (the lower quartile, the
median, and the upper quartile) form the four groups. The lower or first quartile
(Q1) is the middle number between the smallest value of the data set and the median.
• A decile is usually used to assign decile ranks to a data set. A decile rank arranges
the data in order from lowest to highest and is done on a scale of one to ten where
each successive number corresponds to an increase of 10 percentage points.
• A percentile (or a centile) is a measure used in statistics indicating the value below
which a given percentage of observations in a group of observations falls. For
example, the 20th percentile is the value (or score) below which 20% of the
observations may be found.
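$Q_i = L + \left(\frac{\frac{in}{4} - {<}cf}{f}\right)c$
$D_i = L + \left(\frac{\frac{in}{10} - {<}cf}{f}\right)c$
$P_i = L + \left(\frac{\frac{in}{100} - {<}cf}{f}\right)c$
Where:
Qi, Di, or Pi is the particular quartile, decile, or percentile being computed
L is the lower class boundary of the class interval containing the (in/4)th, (in/10)th, or
(in/100)th value
f is the frequency of that class interval
<cf is the sum of all frequencies of the intervals before the class interval where
Qi, Di, or Pi is contained
n is the total frequency
c is the class size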
Classes f <cf
14 - 16 5 15
11 - 13 4 10
8 - 10 3 6
5-7 2 3
2-4 1 1
Σf = 15
Third quartile (Q3):
The cumulative frequency just greater than or equal to 11.25 is 15. It lies in the class 14-16,
and the corresponding class boundary is 13.5-16.5. The lower boundary point of 13.5-16.5 is 13.5.
L = 13.5
$Q_i = L + \left(\frac{\frac{in}{4} - {<}cf}{f}\right)c$
$Q_3 = 13.5 + \left(\frac{\frac{3(15)}{4} - 10}{5}\right)3$
$Q_3 = 13.5 + \left(\frac{11.25 - 10}{5}\right)3$
$Q_3 = 13.5 + \left(\frac{1.25}{5}\right)3$
Q3 = 13.5 + 0.75
Q3 = 14.25
Seventh decile (D7):
The cumulative frequency just greater than or equal to 10.5 is 15. It lies in the class 14-16,
and the corresponding class boundary is 13.5-16.5. The lower boundary point of 13.5-16.5 is 13.5.
L = 13.5
$D_i = L + \left(\frac{\frac{in}{10} - {<}cf}{f}\right)c$
$D_7 = 13.5 + \left(\frac{\frac{7(15)}{10} - 10}{5}\right)3$
$D_7 = 13.5 + \left(\frac{10.5 - 10}{5}\right)3$
$D_7 = 13.5 + \left(\frac{0.5}{5}\right)3$
D7 = 13.5 + 0.3
D7 = 13.8
Twentieth percentile (P20):
The cumulative frequency just greater than or equal to 3 is 3. It lies in the class 5-7,
and the corresponding class boundary is 4.5-7.5. The lower boundary point of 4.5-7.5 is 4.5.
L = 4.5
$P_i = L + \left(\frac{\frac{in}{100} - {<}cf}{f}\right)c$
$P_{20} = 4.5 + \left(\frac{\frac{20(15)}{100} - 1}{2}\right)3$
$P_{20} = 4.5 + \left(\frac{3 - 1}{2}\right)3$
$P_{20} = 4.5 + \left(\frac{2}{2}\right)3$
P20 = 4.5 + 3
P20 = 7.5
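The three computations above follow one procedure: find the position in/4, in/10, or in/100, locate the first class whose cumulative frequency reaches it, then interpolate inside that class. A minimal Python sketch of that procedure is given below. The function name `quantile_from_table` is hypothetical, and the code assumes whole-number class limits so that boundaries lie 0.5 beyond each limit.

```python
# Frequency table from the worked example, in ascending order:
# (lower limit, upper limit, frequency).
table = [(2, 4, 1), (5, 7, 2), (8, 10, 3), (11, 13, 4), (14, 16, 5)]


def quantile_from_table(table, i, parts):
    """Return the i-th of `parts` quantiles (4, 10, or 100) for grouped data."""
    n = sum(f for _, _, f in table)
    target = i * n / parts  # in/4, in/10, or in/100
    cf_before = 0           # <cf: cumulative frequency before the class
    for lower, upper, f in table:
        if cf_before + f >= target:
            lower_boundary = lower - 0.5  # assumes whole-number limits
            width = upper - lower + 1     # class size c
            return lower_boundary + (target - cf_before) * width / f
        cf_before += f
    raise ValueError("i must not be larger than parts")


q3 = quantile_from_table(table, 3, 4)
d7 = quantile_from_table(table, 7, 10)
p20 = quantile_from_table(table, 20, 100)
print(q3, round(d7, 2), p20)  # 14.25 13.8 7.5
```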
Other Assessments:
• Online Recitation – This will incorporate a video conference within online
teaching to give learning a more personal touch. During scheduled brief online
interviews, students can demonstrate their proficiency in most essential topics.
(Google Meet/FB Messenger)
• Online Activities – These are an integral part of the course. They may come as various
tasks such as group work, individual activity, research work, extended reading and
the like. This will provide opportunities for the students to transfer the concepts
they have learned in class to a more concrete situation and to equally participate in
class discussion.
Learning Reference
Gravetter, F.J. & Wallnau, L.B. (2016). Statistics for the Behavioral Sciences, 10th Edition;
Boston, MA: Cengage