Variance and Standard Deviation
www.statcan.gc.ca
Unlike range and quartiles, the variance combines all the values in a data set to produce a
measure of spread. The variance (symbolized by $S^2$) and standard deviation (the square root of
the variance, symbolized by $S$) are the most commonly used measures of spread.
We know that variance is a measure of how spread out a data set is. It is calculated as the
average squared deviation of each number from the mean of a data set. For example, for the
numbers 1, 2, and 3 the mean is 2 and the variance is 0.667.
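The arithmetic for the numbers 1, 2 and 3 can be checked with a short Python sketch. It assumes the variance meant here is the population form (divide by the number of values), since that is what reproduces 0.667; the function name `population_variance` is ours, not the source's.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


numbers = [1, 2, 3]
variance = population_variance(numbers)
print(round(variance, 3))  # 0.667
```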
Calculating variance involves squaring deviations, so it does not have the same unit of
measurement as the original observations. For example, lengths measured in metres (m) have a
variance measured in metres squared (m²).
Taking the square root of the variance returns the measure to the units of the original scale;
the result is the standard deviation.
Standard deviation is the measure of spread most commonly used in statistical practice when the
mean is used as the measure of central tendency. Thus, it measures spread around the mean. Because
of its close links with the mean, standard deviation can be greatly affected if the mean gives a
poor measure of central tendency.
Standard deviation is also influenced by outliers: a single extreme value can contribute a large
share of the result. In that sense, the standard deviation is a good indicator of the
presence of outliers. It is therefore most useful as a measure of spread for
symmetrical distributions with no outliers.
Standard deviation is also useful when comparing the spread of two separate data sets that have
approximately the same mean. The data set with the smaller standard deviation has a narrower
spread of measurements around the mean and therefore usually has comparatively fewer high or
low values. An item selected at random from a data set whose standard deviation is low has a
better chance of being close to the mean than an item from a data set whose standard deviation
is higher.
Generally, the more widely spread the values are, the larger the standard deviation is. For
example, imagine that we have two separate sets of exam results from a class of
30 students: the first exam has marks ranging from 31% to 98%, the other from 82% to
93%. Given these ranges, the standard deviation would be larger for the results of the first
exam.
Standard deviation might be difficult to interpret in terms of how big it has to be in order to
consider the data widely spread. How large a standard deviation counts as "wide" depends on the
size of the mean value of the data set. When you are measuring something that is in the millions,
having measures that are "close" to the mean value does not have the same meaning as when you are
measuring the weight of two individuals. For example, two large companies whose annual revenues
differ by $10,000 are considered pretty close, while two individuals whose weights differ by
30 kilograms are considered far apart. This is why, in most situations, it is useful to assess
the size of the standard deviation relative to the mean of the data set.
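One way to make "relative to the mean" concrete is to divide the standard deviation by the mean. The sketch below does that for the two situations described in the text. The revenue and weight figures are hypothetical values chosen only to match the stated differences ($10,000 and 30 kg), and `relative_spread` is an illustrative helper name, not something defined by the source.

```python
import statistics


def relative_spread(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical figures, chosen only to mirror the example in the text.
revenues = [5_000_000, 5_010_000]  # two companies, $10,000 apart
weights = [60, 90]                 # two people, 30 kg apart

revenue_ratio = relative_spread(revenues)
weight_ratio = relative_spread(weights)

print(f"revenues: {revenue_ratio:.4f}")
print(f"weights:  {weight_ratio:.4f}")
```

Under these assumed figures the ratio for the revenues is far smaller than the ratio for the weights, even though the revenue gap is the larger raw number.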
Although standard deviation is less susceptible to extreme values than the range, standard
deviation is still more sensitive than the semi-quartile range. If the possibility of high values
(outliers) presents itself, then the standard deviation should be supplemented by the semi-
quartile range.
- Standard deviation is only used to measure spread or dispersion around the mean of a
  data set.
- Standard deviation is never negative.
- Standard deviation is sensitive to outliers. A single outlier can raise the standard
  deviation and in turn, distort the picture of spread.
- For data with approximately the same mean, the greater the spread, the greater the
  standard deviation.
- If all values of a data set are the same, the standard deviation is zero (because each
  value is equal to the mean).
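Several of these properties can be observed directly. The following is a minimal sketch using Python's `statistics.pstdev` (the divide-by-n form); the three small data sets are hypothetical and exist only to illustrate the properties.

```python
import statistics

constant = [7, 7, 7, 7]          # all values equal
typical = [50, 52, 48, 51, 49]   # hypothetical values clustered near 50
with_outlier = typical + [120]   # the same values plus one outlier

sd_constant = statistics.pstdev(constant)
sd_typical = statistics.pstdev(typical)
sd_outlier = statistics.pstdev(with_outlier)

print(sd_constant)               # zero: every value equals the mean
print(sd_typical)                # small and non-negative
print(sd_outlier)                # much larger: one outlier inflates it
```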
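When analysing normally distributed data, standard deviation can be used in conjunction with
the mean in order to calculate data intervals.
If $\bar{x}$ = mean, $S$ = standard deviation and $x$ = a value in the data set, then about 68%
of the data lie in the interval $\bar{x} - S < x < \bar{x} + S$, about 95% lie in the interval
$\bar{x} - 2S < x < \bar{x} + 2S$, and about 99.7% lie in the interval
$\bar{x} - 3S < x < \bar{x} + 3S$.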
Discrete variables
The standard deviation for a discrete variable made up of n observations is the positive square
root of the variance and is defined as:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Use this step-by-step approach to find the standard deviation for a discrete variable.
A hen lays eight eggs. Each egg was weighed and recorded as follows:
60 g, 56 g, 61 g, 68 g, 51 g, 53 g, 69 g, 54 g.
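| Weight ($x$) | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|
| 60 | 1 | 1 |
| 56 | -3 | 9 |
| 61 | 2 | 4 |
| 68 | 9 | 81 |
| 51 | -8 | 64 |
| 53 | -6 | 36 |
| 69 | 10 | 100 |
| 54 | -5 | 25 |
| Total: 472 | | 320 |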
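c. Using the information from the above table, we can see that
$$\sum (x - \bar{x})^2 = 320$$
d. In order to calculate the standard deviation, we must use the following formula:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{320}{8}} = \sqrt{40} \approx 6.32 \text{ grams}$$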
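The egg example can be retraced step by step in Python. This is a sketch that assumes the divide-by-n form of the variance (the "average squared deviation" described earlier); the variable names are ours.

```python
import math

weights = [60, 56, 61, 68, 51, 53, 69, 54]  # grams, from the example

n = len(weights)
mean = sum(weights) / n                      # 472 / 8
deviations = [x - mean for x in weights]     # the (x - mean) column
squared = [d ** 2 for d in deviations]       # the (x - mean)^2 column
sum_squared = sum(squared)
variance = sum_squared / n
std_dev = math.sqrt(variance)

print(mean, sum_squared, variance, round(std_dev, 2))  # 59.0 320.0 40.0 6.32
```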
The formulas for variance and standard deviation change slightly if observations are grouped into
a frequency table. Squared deviations are multiplied by each frequency's value, and then the
total of these results is calculated.
Thirty farmers were asked how many farm workers they hire during a typical harvest season.
Their responses were:
4, 5, 6, 5, 3, 2, 8, 0, 4, 6, 7, 8, 4, 5, 7, 9, 8, 6, 7, 5, 5, 4, 2, 1, 9, 3, 3, 4, 6, 4
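Table 2. Number of farm workers hired during a typical harvest season, as reported by thirty farmers

| Workers ($x$) | Frequency ($f$) | $xf$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ | $(x - \bar{x})^2 f$ |
|---|---|---|---|---|---|
| 0 | 1 | 0 | -5 | 25 | 25 |
| 1 | 1 | 1 | -4 | 16 | 16 |
| 2 | 2 | 4 | -3 | 9 | 18 |
| 3 | 3 | 9 | -2 | 4 | 12 |
| 4 | 6 | 24 | -1 | 1 | 6 |
| 5 | 5 | 25 | 0 | 0 | 0 |
| 6 | 4 | 24 | 1 | 1 | 4 |
| 7 | 3 | 21 | 2 | 4 | 12 |
| 8 | 3 | 24 | 3 | 9 | 27 |
| 9 | 2 | 18 | 4 | 16 | 32 |
| Total | 30 | 150 | | | 152 |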
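The frequency-table calculation for the farm workers can be sketched as follows. The code builds the frequency table from the thirty raw responses with `collections.Counter`, then weights each squared deviation by its frequency. It assumes the divide-by-n form of the variance; variable names are illustrative.

```python
import math
from collections import Counter

responses = [4, 5, 6, 5, 3, 2, 8, 0, 4, 6, 7, 8, 4, 5, 7,
             9, 8, 6, 7, 5, 5, 4, 2, 1, 9, 3, 3, 4, 6, 4]

freq = Counter(responses)                         # value -> frequency
n = sum(freq.values())                            # total of the f column
total_xf = sum(x * f for x, f in freq.items())    # total of the xf column
mean = total_xf / n

# Each squared deviation is multiplied by its frequency, then totalled.
sum_sq_dev_f = sum((x - mean) ** 2 * f for x, f in freq.items())
variance = sum_sq_dev_f / n
std_dev = math.sqrt(variance)

print(n, total_xf, mean, sum_sq_dev_f, round(std_dev, 2))  # 30 150 5.0 152.0 2.25
```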
220 students were asked the number of hours per week they spent watching television. With this
information, calculate the mean and standard deviation of hours spent watching television by the
220 students.
| Hours | Number of students |
|---|---|
| 10 to 14 | 2 |
| 15 to 19 | 12 |
| 20 to 24 | 23 |
| 25 to 29 | 60 |
| 30 to 34 | 77 |
| 35 to 39 | 38 |
| 40 to 44 | 8 |
a. First, using the number of students as the frequency, find the midpoint of time
intervals.
b. Now calculate the mean using the midpoint (x) and the frequency (f).
Note: In this example, you are using a continuous variable that has been rounded to the nearest
integer. The group of 10 to 14 is actually 9.5 to 14.499 (as the 9.5 would be rounded up to 10
and the 14.499 would be rounded down to 14). The interval has a length of 5 but the midpoint
is 12 (9.5 + 2.5 = 12).
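$$\sum xf = 2 \times 12 + 12 \times 17 + 23 \times 22 + 60 \times 27 + 77 \times 32 + 38 \times 37 + 8 \times 42 = 6{,}560$$
so the mean is $\bar{x} = 6{,}560 / 220 \approx 29.82$ hours.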
Then, calculate the numbers for the $xf$, $(x - \bar{x})$, $(x - \bar{x})^2$ and $(x - \bar{x})^2 f$ formulas.
Use the information found in the table above to find the standard deviation.
Note: During calculations, when a variable is grouped by class intervals, the midpoint of the
interval is used in place of every other value in the interval. Thus, the spread of observations
within each interval is ignored. As a result, the calculated standard deviation will generally
differ from the value the ungrouped observations would give. It should, therefore, be regarded
as an approximation.
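The grouped calculation for the television example can be sketched in Python. Each class interval is represented by its midpoint, and the number of students is used as the frequency. The divide-by-n form of the variance is assumed; the tuple layout and variable names are ours.

```python
import math

# (lower bound, upper bound, number of students), from the example
groups = [(10, 14, 2), (15, 19, 12), (20, 24, 23), (25, 29, 60),
          (30, 34, 77), (35, 39, 38), (40, 44, 8)]

midpoints = [(low + high) / 2 for low, high, _ in groups]  # 12, 17, ..., 42
freqs = [f for _, _, f in groups]

n = sum(freqs)
total_xf = sum(x * f for x, f in zip(midpoints, freqs))
mean = total_xf / n

variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
std_dev = math.sqrt(variance)

print(n, total_xf, round(mean, 2), round(std_dev, 2))  # 220 6560.0 29.82 6.03
```

Because every observation is replaced by its interval midpoint, the result is an approximation of the standard deviation of the underlying hours.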
Example 5 – Standard deviation
Assuming the frequency distribution is approximately normal, calculate the interval within which
95% of the previous example's observations would be expected to occur.
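$\bar{x} = 29.82$, $S = 6.03$
For approximately normal data, about 95% of observations lie within two standard deviations of
the mean:
$$\bar{x} - 2S = 29.82 - 2(6.03) = 17.76 \qquad \bar{x} + 2S = 29.82 + 2(6.03) = 41.88$$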
This means that there is about a 95% certainty that a student will spend between 18 hours and
42 hours per week watching television.
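The interval can be reproduced with a few lines of Python. The sketch uses the common rule of thumb that, for approximately normal data, about 95% of values fall within two standard deviations of the mean; the mean and standard deviation are the rounded figures given in the example.

```python
mean = 29.82     # hours, from the example
std_dev = 6.03   # hours, from the example

lower = mean - 2 * std_dev
upper = mean + 2 * std_dev

print(round(lower, 2), round(upper, 2))  # 17.76 41.88
print(round(lower), round(upper))        # 18 42
```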
Date Modified: 2017-10-23
Variance
The Variance is defined as the average of the squared differences from the Mean.
Example
You and your friends have just measured the heights of your dogs (in
millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Answer:
$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394$$
so the mean (average) height is 394 mm. Each dog's difference from the Mean is then
206, 76, −224, 36 and −94 mm.
To calculate the Variance, take each difference, square it, and then average the
result:
Variance
$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704$$
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
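The whole dog-height calculation can be retraced in Python. This sketch treats the five dogs as the full population, so it divides by 5 as the worked answer does; the variable names are ours.

```python
import math

heights = [600, 470, 170, 430, 300]  # mm, from the example

mean = sum(heights) / len(heights)
differences = [h - mean for h in heights]   # each dog's difference from the Mean
variance = sum(d ** 2 for d in differences) / len(heights)
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev))  # 394.0 21704.0 147
```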
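And the good thing about the Standard Deviation is that it is useful. Now we can
show which heights are within one Standard Deviation (147mm) of the Mean, that is,
between 247mm and 541mm: the 470mm, 430mm and 300mm dogs are inside this range,
while the 600mm and 170mm dogs are outside it.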
So, using the Standard Deviation we have a "standard" way of knowing what is
normal, and what is extra large or extra small.
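But if the data is a Sample (a selection taken from a bigger Population), then
the calculation changes! For a Population of N values we divide by N when
calculating the Variance (as we did above); for a Sample we divide by N − 1 instead.
All other calculations stay the same, including how we calculated the mean.
The "Population Standard Deviation":
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
where $\mu$ is the Mean and $N$ is the number of values.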
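The difference between the two calculations can be seen with Python's standard `statistics` module, which follows the usual convention: `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by N − 1 (sample). The sketch reuses the dog heights from the example and, purely for illustration, also treats them as if they were a sample.

```python
import statistics

heights = [600, 470, 170, 430, 300]  # mm, from the example

population_variance = statistics.pvariance(heights)  # divides by N
sample_variance = statistics.variance(heights)       # divides by N - 1

population_sd = statistics.pstdev(heights)
sample_sd = statistics.stdev(heights)

print(population_variance, sample_variance)
print(round(population_sd), round(sample_sd))
```

Dividing by N − 1 always gives the larger of the two values, and the gap shrinks as the number of data values grows.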
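Why square the differences? If we just add up the differences from the Mean, the
negatives cancel the positives:
$$\frac{4 + 4 - 4 - 4}{4} = 0$$
So that won't work. How about we use absolute values?
$$\frac{|4| + |4| + |-4| + |-4|}{4} = \frac{4 + 4 + 4 + 4}{4} = 4$$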
That looks good (and is the Mean Deviation), but what about this case:
$$\frac{|7| + |1| + |-6| + |-2|}{4} = \frac{7 + 1 + 6 + 2}{4} = 4$$
Oh No! It also gives a value of 4, even though the differences are more spread
out.
So let us try squaring each difference (and taking the square root at the end):
$$\sqrt{\frac{4^2 + 4^2 + 4^2 + 4^2}{4}} = \sqrt{\frac{64}{4}} = 4$$
$$\sqrt{\frac{7^2 + 1^2 + 6^2 + 2^2}{4}} = \sqrt{\frac{90}{4}} = 4.74...$$
That is nice! The Standard Deviation is bigger when the differences are more
spread out ... just what we want.
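The comparison between absolute values and squares can be checked with a short Python sketch. The two lists are the sets of differences used above; the helper names `mean_deviation` and `root_mean_square` are ours.

```python
import math


def mean_deviation(diffs):
    """Average of the absolute differences."""
    return sum(abs(d) for d in diffs) / len(diffs)


def root_mean_square(diffs):
    """Square each difference, average, then take the square root."""
    return math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))


even = [4, 4, -4, -4]     # differences all the same size
uneven = [7, 1, -6, -2]   # differences more spread out

print(mean_deviation(even), mean_deviation(uneven))                # 4.0 4.0
print(root_mean_square(even), round(root_mean_square(uneven), 2))  # 4.0 4.74
```

The absolute-value method cannot tell the two sets apart, while the squaring method gives the more spread-out set the larger value.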
In fact this method is a similar idea to distance between points, just applied in a
different way.
And it is easier to use algebra on squares and square roots than absolute
values, which makes the standard deviation easy to use in other areas of
mathematics.
Copyright © 2017 MathsIsFun.com