Measures of Spread
Introduction
A measure of spread, sometimes also called a measure of dispersion, is used
to describe the variability in a sample or population. It is usually used in
conjunction with a measure of central tendency, such as the mean or median,
to provide an overall description of a set of data.
Range
The range is the difference between the highest and lowest scores in a data
set and is the simplest measure of spread. As an example, consider the
following set of 10 scores:
23 56 45 65 59 55 62 54 85 25
The maximum value is 85 and the minimum value is 23. This results in a
range of 62, which is 85 minus 23. Whilst using the range as a measure of
spread is limited, it does set the boundaries of the scores. This can be useful
if you are measuring a variable that has either a critical low or high threshold
(or both) that should not be crossed. The range will instantly inform you
whether at least one value broke these critical thresholds. In addition, the
range can be used to detect any errors when entering data. For example, if
you have recorded the age of school children in your study and your range is
7 to 123 years old you know you have made a mistake!
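The range calculation above can be sketched in a few lines of Python. The scores are the ten values from the example; the plausibility limits (`LOW_LIMIT`, `HIGH_LIMIT`) are hypothetical values chosen only to illustrate the threshold check and do not come from the text.

```python
# The ten scores from the example above.
scores = [23, 56, 45, 65, 59, 55, 62, 54, 85, 25]

lowest = min(scores)
highest = max(scores)
score_range = highest - lowest  # 85 - 23

print(f"min={lowest}, max={highest}, range={score_range}")

# Hypothetical plausibility limits (assumed for illustration only).
LOW_LIMIT, HIGH_LIMIT = 0, 100

# Because the range is defined by the two extremes, checking them is
# enough to know whether any value crossed a threshold.
threshold_crossed = lowest < LOW_LIMIT or highest > HIGH_LIMIT
print("threshold crossed:", threshold_crossed)
```

Checking only the minimum and maximum is sufficient here because every other score lies between them.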
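Quartiles and Interquartile Range
The table below lists the marks of 100 students, ordered from lowest to highest.
Order Score Order Score Order Score Order Score Order Score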
1st 35 21st 42 41st 53 61st 64 81st 74
2nd 37 22nd 42 42nd 53 62nd 64 82nd 74
3rd 37 23rd 44 43rd 54 63rd 65 83rd 74
4th 38 24th 44 44th 55 64th 66 84th 75
5th 39 25th 45 45th 55 65th 67 85th 75
6th 39 26th 45 46th 56 66th 67 86th 76
7th 39 27th 45 47th 57 67th 67 87th 77
8th 39 28th 45 48th 57 68th 67 88th 77
9th 39 29th 47 49th 58 69th 68 89th 79
10th 40 30th 48 50th 58 70th 69 90th 80
11th 40 31st 49 51st 59 71st 69 91st 81
12th 40 32nd 49 52nd 60 72nd 69 92nd 81
13th 40 33rd 49 53rd 61 73rd 70 93rd 81
14th 40 34th 49 54th 62 74th 70 94th 81
15th 40 35th 51 55th 62 75th 71 95th 81
16th 41 36th 51 56th 62 76th 71 96th 81
17th 41 37th 51 57th 63 77th 71 97th 83
18th 42 38th 51 58th 63 78th 72 98th 84
19th 42 39th 52 59th 64 79th 74 99th 84
20th 42 40th 52 60th 64 80th 74 100th 85
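The first quartile (Q1) lies between the 25th and 26th students' marks,
the second quartile (Q2) between the 50th and 51st students' marks, and
the third quartile (Q3) between the 75th and 76th students' marks. Hence:
First quartile (Q1) = (45 + 45) ÷ 2 = 45
Second quartile (Q2) = (58 + 59) ÷ 2 = 58.5
Third quartile (Q3) = (71 + 71) ÷ 2 = 71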
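As a minimal sketch, the quartile positions described above can be reproduced in Python by averaging the two marks on either side of each quartile. The helper name `midpoint_after` is hypothetical, and other quartile conventions exist that can give slightly different answers on other data sets.

```python
# Marks of the 100 students, transcribed from the table (already ordered).
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]
ordered = sorted(scores)


def midpoint_after(values, k):
    """Average of the k-th and (k+1)-th values, using 1-based positions."""
    return (values[k - 1] + values[k]) / 2


q1 = midpoint_after(ordered, 25)  # between the 25th and 26th marks
q2 = midpoint_after(ordered, 50)  # between the 50th and 51st marks (median)
q3 = midpoint_after(ordered, 75)  # between the 75th and 76th marks

print(q1, q2, q3)
```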
Quartiles are a useful measure of spread because they are much less
affected by outliers or a skewed data set than the equivalent measures of
mean and standard deviation. For this reason, quartiles are often reported
along with the median as the best choice of measure of spread and central
tendency, respectively, when dealing with skewed data and/or data with outliers. A
common way of expressing quartiles is as an interquartile range. The
interquartile range describes the difference between the third quartile (Q3)
and the first quartile (Q1), telling us about the range of the middle half of the
scores in the distribution. Hence, for our 100 students:
Interquartile range = Q3 - Q1
= 71 - 45
= 26
However, it should be noted that in journals and other publications you will
usually see the interquartile range reported as 45 to 71, rather than the
calculated range.
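The interquartile range can also be computed with Python's standard library, as a rough sketch. `statistics.quantiles` uses its own interpolation rule (the default "exclusive" method), which happens to agree with the quartiles used in this guide for these 100 marks but is not guaranteed to do so for other data.

```python
import statistics

# Marks of the 100 students, transcribed from the table.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1
print(f"Interquartile range = {q3} - {q1} = {iqr}")

# The form usually seen in publications: the two quartiles themselves.
print(f"Reported as: {q1} to {q3}")
```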
Absolute Deviation and Mean Absolute Deviation
The absolute deviation and mean absolute deviation show the amount of deviation
(variation) that occurs around the mean score. To find the total variability in
our group of data, we simply add up the deviation of each score from the
mean. The average deviation of a score can then be calculated by dividing
this total by the number of scores. How we calculate the deviation of a score
from the mean depends on our choice of statistic, whether we use absolute
deviation, variance or standard deviation.
To find the total variability in our data set, we would calculate the
deviation (the score minus the mean) for each of the 100 students' scores.
However, because some of these deviations are positive and others negative,
they cancel each other out when added up, giving us a total deviation of zero.
Since we are only interested in the deviations of the scores and not whether
they are above or below the mean score, we can ignore the minus sign and
take only the absolute value, giving us the absolute deviation. Adding up all
of these absolute deviations and dividing them by the total number of scores
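then gives us the mean absolute deviation. Therefore, for our 100
students the mean absolute deviation is 12.81:
$$\text{Mean absolute deviation} = \frac{\sum |X - \mu|}{N} = \frac{1281}{100} = 12.81$$
where X is a score, \mu is the mean of the scores (58.75) and N is the number of scores.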
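The reasoning above can be checked with a short Python sketch: the signed deviations sum to zero, while the absolute deviations do not. Variable names are illustrative only.

```python
# Marks of the 100 students, transcribed from the table.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]

n = len(scores)
mean = sum(scores) / n

# Signed deviations: positive above the mean, negative below it.
deviations = [x - mean for x in scores]
total_signed = sum(deviations)

# Absolute deviations: drop the sign so the values no longer cancel.
total_absolute = sum(abs(d) for d in deviations)
mean_absolute_deviation = total_absolute / n

print(f"mean = {mean}")
print(f"sum of signed deviations = {total_signed}")
print(f"mean absolute deviation = {mean_absolute_deviation:.2f}")
```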
Variance
Another method for calculating the deviation of a group of scores from the
mean, such as the 100 students we used earlier, is to use the variance. Unlike
the absolute deviation, which uses the absolute value of the deviation in order
to "rid itself" of the negative values, the variance achieves positive values by
squaring each of the deviations instead. Adding up these squared deviations
gives us the sum of squares, which we can then divide by the total number of
scores in our group of data (in other words, 100 because there are 100
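students) to find the variance. Therefore, for our 100 students, the
variance is 211.89:
$$\text{Variance} = \frac{\sum (X - \mu)^2}{N} = \frac{21188.75}{100} \approx 211.89$$
where X is a score, \mu is the mean of the scores (58.75) and N is the number of scores.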
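A minimal Python sketch of the same calculation follows: square each deviation, add the squares to get the sum of squares, and divide by the number of scores. It also cross-checks the result against `statistics.pvariance`, which divides by N in the same way.

```python
import statistics

# Marks of the 100 students, transcribed from the table.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]

n = len(scores)
mean = sum(scores) / n

# Squaring removes the sign of each deviation.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Divide by the number of scores, as described in the text.
variance = sum_of_squares / n

print(f"sum of squares = {sum_of_squares}")
print(f"variance = {variance:.2f}")

# Cross-check against the standard library's divide-by-N variance.
library_variance = statistics.pvariance(scores)
```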
As a measure of variability, the variance is useful. If the scores in our group of data are spread
out, the variance will be a large number. Conversely, if the scores are spread closely around the
mean, the variance will be a smaller number. However, there are two potential problems with
the variance. First, because the deviations of scores from the mean are 'squared', this gives
more weight to extreme scores. If our data contains outliers (in other words, one or a small
number of scores that are particularly far away from the mean and perhaps do not represent
our data well as a whole), this can give undue weight to these scores. Secondly, the variance is
not in the same units as the scores in our data set: variance is measured in the units squared.
This means we cannot place it on our frequency distribution and cannot directly relate its value
to the values in our data set. Therefore, the figure of 211.89, our variance, appears somewhat
arbitrary. Calculating the standard deviation rather than the variance rectifies this problem.
Nonetheless, analyzing variance is extremely important in some statistical analyses, discussed
in other statistical guides.
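To illustrate the units problem, the sketch below takes the square root of the variance, which returns the measure to the same units as the marks. It assumes the divide-by-N variance used above; the variable names are illustrative.

```python
import math

# Marks of the 100 students, transcribed from the table.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]

n = len(scores)
mean = sum(scores) / n

# Variance is in squared units (marks squared).
variance = sum((x - mean) ** 2 for x in scores) / n

# The square root brings the measure back to the original units (marks).
standard_deviation = math.sqrt(variance)

print(f"variance           = {variance:.2f} (marks squared)")
print(f"standard deviation = {standard_deviation:.2f} (marks)")
```

Unlike the variance, the resulting figure of roughly 14.6 marks can be compared directly with the scores themselves, which run from 35 to 85.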
Standard Deviation
Introduction
The standard deviation is a measure of the spread of scores within a set of
data. Usually, we are interested in the standard deviation of a population.
However, as we are often presented with data from a sample only, we can
estimate the population standard deviation from a sample standard deviation.
These two standard deviations - sample and population standard deviations -
are calculated differently. In statistics, we usually have to calculate sample
standard deviations, so that is what this article will focus
on, although the formula for a population standard deviation will also be
shown.
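The difference between the two calculations can be sketched with Python's standard library, where `statistics.stdev` divides by n - 1 (sample) and `statistics.pstdev` divides by N (population). Treating the 100 students' marks as either a sample or a whole population is an assumption made purely for illustration.

```python
import math
import statistics

# Marks of the 100 students, transcribed from the table.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]
n = len(scores)

# Sample standard deviation: sum of squares divided by n - 1.
sample_sd = statistics.stdev(scores)

# Population standard deviation: sum of squares divided by N.
population_sd = statistics.pstdev(scores)

print(f"sample SD     = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")

# The two differ only by the factor sqrt(n / (n - 1)).
ratio = sample_sd / population_sd
print(f"ratio = {ratio:.4f}, sqrt(n/(n-1)) = {math.sqrt(n / (n - 1)):.4f}")
```

With 100 scores the two values are close; the gap widens as the number of scores gets smaller.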
What type of data should you use when you calculate a standard
deviation?
The standard deviation is used in conjunction with the mean to
summarize continuous data, not categorical data. In addition, the standard
deviation, like the mean, is normally only appropriate when the continuous
data is not significantly skewed and has no outliers.
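The sample standard deviation formula is:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Where,
s = sample standard deviation, X = each score, \bar{X} = sample mean, n = number of scores in the sample.
The population standard deviation formula is:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Where,
\sigma = population standard deviation, X = each score, \mu = population mean, N = number of scores in the population.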