Measure Variability
BMLS II-2
Statisticians use summary measures to describe the amount of variability or spread in a set of
data. The most common measures of variability are the range, the interquartile range (IQR),
variance, and standard deviation.
The Range
The range is the difference between the largest and smallest values in a set of values.
For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of numbers, the
range would be 11 - 1 or 10.
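The range calculation above can be sketched in a few lines of Python. The function name `value_range` is a hypothetical helper chosen for illustration, not something from the text.

```python
def value_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# The data set used in the example above.
data = [1, 3, 4, 5, 5, 6, 7, 11]
print(value_range(data))  # 11 - 1 = 10
```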
The Interquartile Range
Quartiles divide a rank-ordered data set into four equal parts. The values that separate the parts
are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3,
respectively.
Q1 is the "middle" value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half of the rank-ordered data set.
The interquartile range is equal to Q3 minus Q1. For example, consider the following numbers:
1, 2, 3, 4, 5, 6, 7, 8.
Q2 is the median of the entire data set - the middle value. In this example, we have an even
number of data points, so the median is equal to the average of the two middle values. Thus,
Q2 = (4 + 5)/2 or Q2 = 4.5. Q1 is the middle value in the first half of the data set. Since there is
an even number of data points in the first half of the data set, the middle value is the average
of the two middle values; that is, Q1 = (2 + 3)/2 or Q1 = 2.5. Q3 is the middle value in the
second half of the data set. Again, since the second half of the data set has an even number of
observations, the middle value is the average of the two middle values; that is, Q3 = (6 + 7)/2 or
Q3 = 6.5. The interquartile range is Q3 minus Q1, so IQR = 6.5 - 2.5 = 4.
DACUMOS, Alexander N. BMLS II-2
Notice that this process divided the data set into four parts of equal size. The first part consists
of 1 and 2; the second part, 3 and 4; the third part, 5 and 6; and the fourth part, 7 and 8.
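The quartile procedure described above (median of the lower half, median of the whole set, median of the upper half) might be sketched as follows. The helper name `quartiles` is hypothetical. The text only covers an even number of data points, so the handling of an odd count here (leaving the overall median out of both halves) is an assumption; other conventions exist, and library routines that interpolate can give different quartiles for the same data.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the overall median belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
print(q1, q2, q3, q3 - q1)  # 2.5 4.5 6.5 4.0
```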
The Variance
In a population, variance is the average squared deviation from the population mean, as
defined by the following formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the
population, and $N$ is the number of elements in the population.
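As a rough illustration of the population formula, the sketch below computes the average squared deviation directly and compares it with `statistics.pvariance` from the Python standard library. Treating the eight numbers from the range example as a complete population is an assumption made only for this example, and `population_variance` is a hypothetical helper name.

```python
from statistics import pvariance


def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one data point")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


# Assumption: these eight values are the whole population.
population = [1, 3, 4, 5, 5, 6, 7, 11]
print(population_variance(population))  # 7.6875
print(pvariance(population))            # library result, for comparison
```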
Observations from a simple random sample can be used to estimate the variance of a
population. For this purpose, sample variance is defined by a slightly different formula, and uses
a slightly different notation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample,
and $n$ is the number of elements in the sample. Using this formula, the sample variance can be
considered an unbiased estimate of the true population variance. Therefore, if you need to
estimate an unknown population variance, based on data from a simple random sample, this is
the formula to use.
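One way to see why the divisor n - 1 is used is to enumerate every possible sample from a tiny population and average the resulting sample variances. The sketch below does this under stated assumptions: a hypothetical population of four values, samples of size 2 drawn with replacement (so the draws are independent), and exact arithmetic with `Fraction`. The helper `variance_with_divisor` is illustrative only.

```python
from fractions import Fraction
from itertools import product


def variance_with_divisor(values, offset):
    """Sum of squared deviations divided by (n - offset).

    offset=0 gives the divide-by-n formula, offset=1 the sample formula.
    """
    n = len(values)
    mean = Fraction(sum(values), n)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - offset)


# Assumption: a toy population, sampled with replacement, sample size 2.
population = [1, 2, 3, 4]
samples = list(product(population, repeat=2))  # all 16 ordered samples

pop_var = variance_with_divisor(population, 0)
mean_n_minus_1 = sum(variance_with_divisor(s, 1) for s in samples) / len(samples)
mean_n = sum(variance_with_divisor(s, 0) for s in samples) / len(samples)

print(pop_var, mean_n_minus_1, mean_n)  # 5/4 5/4 5/8
```

In this toy setting the average of the n - 1 estimates equals the population variance exactly, while dividing by n underestimates it on average.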
Normal Distribution
Normal Probability Distributions
The Normal Probability Distribution is very common in the field of statistics.
When you measure things like people's height, weight, salary, opinions or votes, the graph
of the results is often close to a normal curve.
The Normal Distribution
A random variable X whose distribution has the shape of a normal curve is called a normal
random variable.
This random variable X is said to be normally distributed with mean μ and standard
deviation σ if its probability distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
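A minimal sketch of the normal density, assuming the standard form f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The function `normal_pdf` and the values μ = 100, σ = 15 are hypothetical choices for illustration; `statistics.NormalDist` from the Python standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Assumption: illustrative parameters, not taken from the text.
mu, sigma = 100.0, 15.0
for x in (85.0, 100.0, 115.0):
    # The hand-written formula and the library value should agree closely.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density is highest at x = μ and symmetric around it, so the values at 85 and 115 match.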
We can transform all the observations of any normal random variable X with mean μ and
variance σ² to a new set of observations of another normal random variable Z with mean 0 and
variance 1 using the following transformation:
$$Z = \frac{X - \mu}{\sigma}$$
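The standardizing transformation, taken here to be z = (x - μ) / σ, can be sketched as a one-line function. The name `z_score` and the parameters μ = 100, σ = 15 are assumptions for illustration.

```python
def z_score(x, mu, sigma):
    """Convert an observation x into standard units: (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Assumption: X has mean 100 and standard deviation 15.
mu, sigma = 100.0, 15.0
print([z_score(x, mu, sigma) for x in (85.0, 100.0, 130.0)])  # [-1.0, 0.0, 2.0]
```

An observation equal to the mean maps to 0, and each standard deviation above or below the mean adds or subtracts 1.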
Since all the values of X falling between x1 and x2 have corresponding Z values
between z1 and z2, it means:
The area under the X curve between X = x1 and X = x2 equals the area under the Z curve
between Z = z1 and Z = z2.
Hence, we have the following equivalent probabilities:
$$P(x_1 < X < x_2) = P(z_1 < Z < z_2)$$
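The equality of the two areas can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the mean, standard deviation, and interval endpoints are hypothetical values picked for illustration.

```python
from statistics import NormalDist

# Assumption: X is normal with mean 100 and standard deviation 15.
mu, sigma = 100.0, 15.0
x1, x2 = 85.0, 130.0

X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Standardize the endpoints.
z1 = (x1 - mu) / sigma
z2 = (x2 - mu) / sigma

# Area under each curve between the corresponding endpoints.
area_x = X.cdf(x2) - X.cdf(x1)
area_z = Z.cdf(z2) - Z.cdf(z1)
print(area_x, area_z)  # the two areas should agree up to rounding error
```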
In the above graph, we have indicated the areas between the regions as follows:
−1 ≤ Z ≤ 1 68.27%
−2 ≤ Z ≤ 2 95.45%
−3 ≤ Z ≤ 3 99.73%
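This means that 68.27% of the scores lie within 1 standard deviation of the mean.
This comes from:
$$\int_{-1}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \approx 0.6827$$
Also, 95.45% of the scores lie within 2 standard deviations of the mean.
This comes from:
$$\int_{-2}^{2} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \approx 0.9545$$
Finally, 99.73% of the scores lie within 3 standard deviations of the mean.
This comes from:
$$\int_{-3}^{3} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \approx 0.9973$$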
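The three percentages can be reproduced with the error function, using the identity P(-k ≤ Z ≤ k) = erf(k / √2) for a standard normal Z. The helper name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"-{k} <= Z <= {k}: {100 * area_within(k):.2f}%")
# -1 <= Z <= 1: 68.27%
# -2 <= Z <= 2: 95.45%
# -3 <= Z <= 3: 99.73%
```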
The z-Table
The areas under the curve bounded by the ordinates z = 0 and any positive value of z are found
in the z-Table. From this table the area under the standard normal curve between any two
ordinates can be found by using the symmetry of the curve about z = 0.
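A rough sketch of how such a table is used: `table_area` stands in for a table lookup (the area between 0 and a non-negative z), and `area_between` combines lookups using the symmetry about z = 0. Both names are hypothetical, and computing the entries with `statistics.NormalDist` instead of reading a printed table is an assumption of this example.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def table_area(z):
    """Area between 0 and z for z >= 0, like one entry of a z-table."""
    if z < 0:
        raise ValueError("the table only lists non-negative z values")
    return _STANDARD_NORMAL.cdf(z) - 0.5


def signed_area(z):
    """Area from 0 to z, negative when z lies to the left of 0 (by symmetry)."""
    return table_area(z) if z >= 0 else -table_area(-z)


def area_between(a, b):
    """Area under the standard normal curve between ordinates a <= b."""
    if a > b:
        raise ValueError("expected a <= b")
    return signed_area(b) - signed_area(a)


print(round(area_between(-1.0, 2.0), 4))  # entries for 1 and 2 are added
print(round(area_between(1.0, 2.0), 4))   # entry for 1 is subtracted from entry for 2
```

When the two ordinates lie on opposite sides of 0, the table entries are added; when they lie on the same side, the smaller entry is subtracted from the larger.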