Measures of Variability
What is Variability?
Variability refers to how "spread out" a group of scores is. To see what we mean by spread out,
consider the graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score
for each quiz is 7.0. Despite the equality of means, you can see that the distributions are quite
different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are
more spread out. The differences among students were much greater on Quiz 2 than on Quiz 1.
The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is. Just as in the section on central tendency where we discussed measures of the
center of a distribution of scores, in this chapter we will discuss measures of the variability of a
distribution. There are four frequently used measures of variability: the range, interquartile
range, variance, and standard deviation. In the next few paragraphs, we will look at each of
these four measures of variability in more detail.
Range
The range is the simplest measure of variability to calculate, and one you have probably
encountered many times in your life. The range is simply the highest score minus the lowest
score.
What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4?
The highest number is 10 and the lowest number is 2, so the range is 10 - 2 = 8.
Let’s take another example. Here’s a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62,
51. What is the range? The highest number is 99 and the lowest number is 23, so the range is
99 - 23 = 76.
Now consider the two quizzes shown in Figure 1. On Quiz 1, the lowest score is 5 and the
highest score is 9. Therefore, the range is 4. The range on Quiz 2 was larger: the lowest score
was 4 and the highest score was 10. Therefore the range is 6.
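The two numeric examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name value_range is our own choice for illustration, and the individual quiz scores are not listed in this part of the text, so only the two number lists are used.

```python
def value_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


first = [10, 2, 5, 6, 7, 3, 4]
second = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]

print(value_range(first))   # 10 - 2 = 8
print(value_range(second))  # 99 - 23 = 76
```

Because the range depends only on the two most extreme scores, a single unusual score can change it a great deal.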
Interquartile Range
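Quartiles are values that divide an ordered data set into four equal parts. There are three of
them: the lower quartile (the 25th percentile), the middle quartile (the median), and the upper
quartile (the 75th percentile). The interquartile range (IQR) is the upper quartile minus the
lower quartile.
As an example, consider the data 62, 18, 22, 11, 40, 41, 70.
First arrange the data in order (increasing order is the usual choice).
So the data will be 11, 18, 22, 40, 41, 62, 70.
Lower quartile = 18, the median of the three values below the middle quartile (11, 18, 22).
Middle quartile = 40, the 4th of the 7 ordered values.
Upper quartile = 62, the median of the three values above the middle quartile (41, 62, 70).
IQR = 62 - 18 = 44.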
For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6. The interquartile range is
therefore 2. For Quiz 2, which has greater spread, the 75th percentile is 9, the 25th percentile is
5, and the interquartile range is 4.
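The quartile example with seven values can be sketched in Python as follows. The function name quartiles is hypothetical, and the code assumes one particular convention: when the number of values is odd, the median is left out of both halves. Other conventions exist and can give slightly different quartiles for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (lower, middle, upper) quartiles of the data.

    Assumed convention: with an odd number of values, the median is
    excluded from both halves before each half's median is taken.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value when n is odd; otherwise split evenly.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


data = [62, 18, 22, 11, 40, 41, 70]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Under this convention the seven values give quartiles of 18, 40, and 62, so the interquartile range is 62 - 18 = 44.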
Variance
Variability can also be defined in terms of how close the scores in the distribution are to the
middle of the distribution. Using the mean as the measure of the middle of the distribution, the
variance is defined as the average squared difference of the scores from the mean. The data
from Quiz 1 are shown in Table 1. The mean score is 7.0. Therefore, the column "Deviation from
Mean" contains the score minus 7. The column "Squared Deviation" is simply the previous
column squared.
Table 1. Calculation of Variance for Quiz 1
Scores    Deviation from Mean    Squared Deviation
9          2                     4
9          2                     4
9          2                     4
8          1                     1
8          1                     1
8          1                     1
8          1                     1
7          0                     0
7          0                     0
7          0                     0
7          0                     0
7          0                     0
6         -1                     1
6         -1                     1
6         -1                     1
6         -1                     1
6         -1                     1
6         -1                     1
5         -2                     4
5         -2                     4
Means
7          0                     1.5
One thing that is important to notice is that the mean deviation from the mean is 0. This will
always be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5.
Analogous calculations with Quiz 2 show that its variance is 6.7. The formula for the variance is:
σ² = Σ(X - μ)² / N
where σ² is the variance, X is a score, μ is the mean, and N is the number of numbers. For Quiz 1,
μ = 7 and N = 20.
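The calculation in Table 1 can be reproduced with a short Python sketch. The scores are copied from the table; the function name population_variance is our own, chosen for illustration.

```python
# Quiz 1 scores from Table 1: three 9s, four 8s, five 7s, six 6s, two 5s.
quiz1 = [9] * 3 + [8] * 4 + [7] * 5 + [6] * 6 + [5] * 2


def population_variance(scores):
    """Average squared deviation from the mean (N in the denominator)."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


n = len(quiz1)
mu = sum(quiz1) / n
mean_deviation = sum(x - mu for x in quiz1) / n

print(n, mu)                       # 20 scores with mean 7.0
print(mean_deviation)              # the deviations average to 0
print(population_variance(quiz1))  # 30 / 20 = 1.5
```

The squared deviations sum to 30, and dividing by N = 20 gives the variance of 1.5 shown in the last row of the table.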
If the variance in a sample is used to estimate the variance in a population, then the previous formula underestimates the
variance and the following formula should be used:
s² = Σ(X - M)² / (N - 1)
where s² is the estimate of the variance and M is the sample mean. Note that M is the mean of a sample taken from a population
with a mean of μ. Since, in practice, the variance is usually computed in a sample, this formula is most often used. The
simulation "estimating variance" illustrates the bias in the formula with N in the denominator.
Let's take an example. Assume the scores 1, 2, 4, and 5 were sampled from a larger population. To estimate the variance in the
population you would compute as follows:
M = (1 + 2 + 4 + 5)/4 = 12/4 = 3.
s² = [(1-3)² + (2-3)² + (4-3)² + (5-3)²]/(4-1)
= (4 + 1 + 1 + 4)/3 = 10/3 = 3.333
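The same estimate can be computed in Python. In this sketch the function name sample_variance is our own; the result is compared with statistics.variance from the standard library, which also divides by N - 1.

```python
from statistics import variance


def sample_variance(scores):
    """Estimate the population variance from a sample (N - 1 denominator)."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


sample = [1, 2, 4, 5]
estimate = sample_variance(sample)

# Dividing by N instead of N - 1 gives a smaller value for the same data.
m = sum(sample) / len(sample)
divided_by_n = sum((x - m) ** 2 for x in sample) / len(sample)

print(estimate)          # 10 / 3
print(variance(sample))  # standard-library result, same denominator
print(divided_by_n)      # 10 / 4
```

For these four scores, dividing by N gives 2.5 while dividing by N - 1 gives about 3.333, which shows the direction of the correction.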
Standard Deviation
The standard deviation is simply the square root of the variance.
This makes the standard deviations of the two quiz distributions 1.225 and 2.588 (the square
roots of 1.5 and 6.7).
The standard deviation measures the spread of a data set: it indicates how far the data points
typically lie from the mean. Because it is the square root of the variance, it is expressed in the
same units as the data. The population standard deviation is denoted by the symbol σ.
Example
When a die is rolled, there are 6 possible outcomes. So the number of values is n = 6 and the
data set = {1, 2, 3, 4, 5, 6}.
To find the variance, first, we need to calculate the mean of the data set.
Mean, x̅ = (1+2+3+4+5+6)/6 = 3.5
Substituting the data values and the mean into the formula gives:
σ² = Σ(xi - x̅)²/n
σ² = ⅙ (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25)
σ² = 2.917
Now, the standard deviation is σ = √2.917 = 1.708.
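The die example can be verified with a short Python sketch. The variable names are our own, and the code treats the six faces as a complete population, so it divides by n as the example does.

```python
from math import sqrt

die = [1, 2, 3, 4, 5, 6]
n = len(die)

mean = sum(die) / n
squared_deviations = [(x - mean) ** 2 for x in die]
var = sum(squared_deviations) / n   # population variance
sd = sqrt(var)                      # standard deviation

print(mean)                # 3.5
print(squared_deviations)  # [6.25, 2.25, 0.25, 0.25, 2.25, 6.25]
print(round(var, 3))       # 2.917
print(round(sd, 3))        # 1.708
```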