MODULE IN Measures of Central Tendency PDF
Introduction
The mean, median and mode are all valid measures of central tendency but, under
different conditions, some measures of central tendency become more appropriate
to use than others. In the following discussions we will look at the mean, median
and mode and learn how to calculate them and under what conditions they are most
appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although it is
most often used with continuous data. The mean is equal to the sum of all the values
in the data set divided by the number of values in the data set. So, if we have n
values in a data set and they have values x1, x2, ..., xn, then the sample mean,
usually denoted by x̄ (pronounced "x bar"), is:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
This formula is usually written in a slightly different manner using the Greek
capital letter Σ, pronounced "sigma", which means "sum of...":
\[ \bar{x} = \frac{\sum x}{n} \]
You may have noticed that the above formula refers to the sample mean. So, why
have we called it a sample mean? This is because, in statistics, samples and
populations have very different meanings and these differences are very
important, even if, in the case of the mean, they are calculated in the same way.
To acknowledge that we are calculating the population mean and not the sample
mean, we use the Greek lower case letter "mu", denoted as µ:
\[ \mu = \frac{\sum x}{n} \]
An important property of the mean is that it includes every value in your data set
as part of the calculation. In addition, the mean is the only measure of central
tendency where the sum of the deviations of each value from the mean is always
zero.
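The two properties above can be checked with a minimal sketch. The values below are hypothetical and chosen only for illustration; they do not come from this module.

```python
# Hypothetical sample values, used only to illustrate the definition.
values = [4, 8, 15, 16, 23, 42]

n = len(values)
mean = sum(values) / n  # sum of all values divided by the number of values

# Deviation of each value from the mean. These sum to zero
# (up to floating-point rounding for non-integer data).
deviations = [x - mean for x in values]
total_deviation = sum(deviations)

print(mean)             # 18.0
print(total_deviation)  # 0.0
```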
The mean has one main disadvantage: it is particularly susceptible to the influence
of outliers. These are values that are unusual compared to the rest of the data set
by being especially small or large in numerical value. For example, consider the
wages of staff at a factory below:
Staff     1     2     3     4     5     6     7     8     9     10
Salary    15k   18k   16k   14k   15k   15k   12k   17k   90k   95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data
suggests that this mean value might not be the best way to accurately reflect the
typical salary of a worker, as most workers have salaries in the $12k to $18k range.
The mean is being skewed by the two large salaries. Therefore, in this situation
we would like to have a better measure of central tendency. As we will find out
later, taking the median would be a better measure of central tendency in this
situation.
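The effect of the two large salaries can be seen in a short sketch using Python's standard statistics module. The variable names are illustrative; the salary figures are the ten values from the table above, in thousands of dollars.

```python
from statistics import mean, median

# Salaries of the ten staff, in thousands of dollars.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)      # pulled upward by 90k and 95k
median_salary = median(salaries_k)  # middle of the ordered salaries

# Mean of the eight salaries that fall in the 12k to 18k range.
typical = [s for s in salaries_k if s <= 18]
mean_typical = mean(typical)

print(mean_salary)    # 30.7
print(median_salary)  # 15.5
print(mean_typical)   # 15.25
```

The median (15.5k) sits inside the range where most salaries fall, while the mean (30.7k) does not.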
Another time when we usually prefer the median over the mean (or mode) is when
our data is skewed (i.e. the frequency distribution for our data is skewed).
Consider the normal distribution, as this is the one most frequently assessed in
statistics: when the data is perfectly normal, the mean, median and mode are
identical. Moreover, they all represent the most typical value in the data set.
However, as the data becomes skewed, the mean loses its ability to provide the best
central location for the data because the skewed values drag it away from the typical
value. The median best retains this position and is not as strongly
influenced by the skewed values.
Median
The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to
calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56. It is
the middle mark because there are 5 scores before it and 5 scores after it. This
works fine when you have an odd number of scores, but what happens when you
have an even number of scores? What if you had only 10 scores? Well, you
simply have to take the middle two scores and average them. So, if we look at
the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th scores in our data set (55 and 56) and
average them to get a median of 55.5.
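The odd and even cases can be written as one small function. This is a minimal sketch of the procedure described above, not a replacement for a library routine; the function name is illustrative.

```python
def median(scores):
    """Return the middle score, or the average of the two middle scores."""
    if not scores:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(scores)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle score exists.
        return ordered[mid]
    # Even count: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(median(odd_scores))   # 56
print(median(even_scores))  # 55.5
```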
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram
it is represented by the highest bar. You can, therefore, sometimes
consider the mode as being the most popular option. An example of a mode is
presented on the next page:
We can see above that the most common form of transport, in this particular data
set, is the bus. However, one of the problems with the mode is that it is not unique,
so it leaves us with problems when we have two or more values that share the
highest frequency, such as in the diagram on the next page:
Another problem with the mode is that it will not provide us with a very good
measure of central tendency when the most common mark is far away from the
rest of the data in the data set, as depicted in the diagram below:
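The non-uniqueness of the mode can be illustrated with a short sketch. The transport responses below are hypothetical (the original charts are not reproduced here), and the helper name is an assumption made for this example.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())  # raises ValueError on empty input
    return [value for value, count in counts.items() if count == highest]

# Hypothetical survey responses, for illustration only.
single_mode = ["bus", "car", "bus", "walk", "train", "bus", "car"]
two_modes = ["bus", "car", "bus", "car", "walk"]

print(modes(single_mode))  # ['bus']
print(modes(two_modes))    # ['bus', 'car']
```

Returning a list rather than a single value makes the tie visible instead of silently picking one of the tied values.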
When you have a normally distributed sample you can legitimately use both the
mean and the median as your measure of central tendency. In fact, in any
symmetrical distribution with a single peak the mean, median and mode are equal. However, in this
situation, the mean is widely preferred as the best measure of central tendency as
it is the measure that includes all the values in the data set for its calculation, and
any change in any of the scores will affect the value of the mean. This is not the
case with the median or mode.
However, when our data is skewed, as with the right-skewed data set in the
diagram shown on the next page, the mean is dragged in the direction of the skew
and the median is usually the better measure of central location.
If the data are expected to follow a normal distribution but tests of normality
show that they are non-normal, it is customary to use the median instead of the
mean. This is more a rule of thumb than a strict guideline, however. Sometimes,
researchers wish to report the mean of a skewed distribution if the median and
mean are not appreciably different (a subjective assessment) and if it allows
easier comparisons to previous research to be made.
Please use the following summary table to know what the best measure of central
tendency is with respect to the different types of variable.
Class Interval    f     cf    m     fm     fm²
21 - 25           3
26 - 30           5
31 - 35           11
36 - 40           9
41 - 45           12
46 - 50           6
51 - 55           4
n = 50
There are seven (7) class intervals (or groups) ranging from 21-25 up to 51-55.
Each group has a class width (or size) of five (5). Three (3) students scored in
the range of 21-25, five (5) students scored 26-30, eleven (11) scored 31-35,
nine (9) scored 36-40, … while four (4) scored 51-55. Altogether, 50 students
took the examination.
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3
26 - 30           5     8
31 - 35           11    19
36 - 40           9     28
41 - 45           12    40
46 - 50           6     46
51 - 55           4     50
n = 50
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3     23
26 - 30           5     8     28
31 - 35           11    19    33
36 - 40           9     28    38
41 - 45           12    40    43
46 - 50           6     46    48
51 - 55           4     50    53
n = 50
The column for fm is the product of the frequency of the class interval and
its midpoint (frequency x midpoint). For the class interval 21-25, fm is 69 (3 x 23
= 69), for 26-30, 140 (5 x 28), then 363 (11 x 33), …, and for the class interval 51-55,
fm is 212 (4 x 53). The sum of the fm column is 1930.
The last column (fm²) is the product of the midpoint m and fm (m x fm),
hence, for the class interval 21-25, fm² is equal to 1587 (23 x 69). For the class
interval 26-30, fm² is 3920 (28 x 140), the next group's fm² is 11979 (33 x 363), …, and
for the class interval 51-55, fm² is 11236 (53 x 212). The sum of the fm² column is 77730.
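The cf, m, fm and fm2 columns can be built step by step from the class limits and frequencies. The following is a minimal sketch; the variable names are illustrative, and the midpoint is taken as the average of the lower and upper class limits, which matches the midpoints 23, 28, ..., 53 used in this module.

```python
from itertools import accumulate

# Class limits and frequencies from the example table.
classes = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50), (51, 55)]
freqs = [3, 5, 11, 9, 12, 6, 4]

n = sum(freqs)                                   # total number of scores
cf = list(accumulate(freqs))                     # running total of f
mids = [(lo + hi) / 2 for lo, hi in classes]     # class midpoints m
fm = [f * m for f, m in zip(freqs, mids)]        # frequency x midpoint
fm2 = [m * prod for m, prod in zip(mids, fm)]    # midpoint x fm

print(n)         # 50
print(cf)        # [3, 8, 19, 28, 40, 46, 50]
print(sum(fm))   # 1930.0
print(sum(fm2))  # 77730.0
```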
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3     23    69     1587
26 - 30           5     8     28    140    3920
31 - 35           11    19    33    363    11979
36 - 40           9     28    38    342    12996
41 - 45           12    40    43    516    22188
46 - 50           6     46    48    288    13824
51 - 55           4     50    53    212    11236
n = 50, Σfm = 1930, Σfm² = 77730
\[ \text{Mean} = \frac{\sum fm}{n} = \frac{1930}{50} = 38.6 \]
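The grouped mean can be sketched as a small function that takes the class midpoints and frequencies. The function name is an assumption made for illustration.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of (frequency x midpoint) divided by n."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    return sum(f * m for f, m in zip(freqs, midpoints)) / n

midpoints = [23, 28, 33, 38, 43, 48, 53]
freqs = [3, 5, 11, 9, 12, 6, 4]

mean = grouped_mean(midpoints, freqs)
print(mean)  # 38.6
```

Because every score in a class is represented by the class midpoint, this is an approximation of the mean of the raw scores, not the exact value.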
The median is defined by the formula
\[ \text{Median} = LB + \left( \frac{\frac{n}{2} - cf_p}{f_{md}} \right) w \]
where: LB = real lower boundary of the median class (which is 0.5 less
than the lower class limit)
n = sample size
cf_p = cumulative frequency of the class that precedes (just
before) the median class
f_md = frequency of the median class
w = class width (or size)
Note: In finding the median class of the grouped data, we have to identify
the class interval that contains the middle score. In this case, the
25th score (half of the total sample size of 50) belongs
to the class interval 36-40.
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3     23    69     1587
26 - 30           5     8     28    140    3920
31 - 35           11    19    33    363    11979
36 - 40           9     28    38    342    12996    (median class)
41 - 45           12    40    43    516    22188
46 - 50           6     46    48    288    13824
51 - 55           4     50    53    212    11236
n = 50, Σfm = 1930, Σfm² = 77730
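\[ \text{Median} = LB + \left( \frac{\frac{n}{2} - cf_p}{f_{md}} \right) w \]
\[ \text{Median} = 35.5 + \left( \frac{\frac{50}{2} - 19}{9} \right) 5 = 35.5 + \left( \frac{25 - 19}{9} \right) 5 \]
\[ \text{Median} \approx 38.83 \]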
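The median calculation above can be sketched as a function that first locates the median class and then applies the formula. The function name is illustrative, and the sketch assumes whole-number class limits so that the real lower boundary is the lower limit minus 0.5, as in this module.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: LB + ((n/2 - cf_p) / f_md) * w."""
    n = sum(freqs)
    half = n / 2
    cf_prev = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        # The median class is the first class whose cumulative
        # frequency reaches n/2.
        if f > 0 and cf_prev + f >= half:
            lb = lower - 0.5  # real lower boundary (assumes integer limits)
            return lb + (half - cf_prev) / f * width
        cf_prev += f
    raise ValueError("frequency table is empty")

lower_limits = [21, 26, 31, 36, 41, 46, 51]
freqs = [3, 5, 11, 9, 12, 6, 4]

median = grouped_median(lower_limits, freqs, width=5)
print(round(median, 2))  # 38.83
```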
The mode is defined by the formula
\[ \text{Mode} = LB + \left( \frac{f_{mode} - f_B}{2 f_{mode} - f_B - f_A} \right) w \]
where: LB = real lower boundary of the modal class (which is 0.5 less
than the lower class limit)
f_mode = frequency of the modal class
f_B = frequency of the class interval preceding (just before) the
modal class
f_A = frequency of the class interval succeeding (just after) the
modal class
w = class width (or size)
Note: In finding the modal class of the grouped data, we have to identify
the class interval with the highest frequency. In this case, the modal
class is the class interval 41-45 (with the highest frequency of 12).
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3     23    69     1587
26 - 30           5     8     28    140    3920
31 - 35           11    19    33    363    11979
36 - 40           9     28    38    342    12996
41 - 45           12    40    43    516    22188    (modal class)
46 - 50           6     46    48    288    13824
51 - 55           4     50    53    212    11236
n = 50, Σfm = 1930, Σfm² = 77730
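\[ \text{Mode} = LB + \left( \frac{f_{mode} - f_B}{2 f_{mode} - f_B - f_A} \right) w \]
\[ \text{Mode} = 40.5 + \left( \frac{12 - 9}{2(12) - 9 - 6} \right) 5 = 40.5 + \left( \frac{3}{9} \right) 5 \]
\[ \text{Mode} \approx 42.17 \]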
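The mode calculation above can be sketched in the same way. The function name is illustrative. Two choices here are assumptions that the module does not state: a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, and when several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: LB + ((f_mode - f_B) / (2*f_mode - f_B - f_A)) * w."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f_mode = freqs[i]
    # Assumption: a neighbour that does not exist counts as frequency 0.
    f_before = freqs[i - 1] if i > 0 else 0
    f_after = freqs[i + 1] if i < len(freqs) - 1 else 0
    denominator = 2 * f_mode - f_before - f_after
    if denominator == 0:
        raise ValueError("modal class and both neighbours have equal frequency")
    lb = lower_limits[i] - 0.5  # real lower boundary (assumes integer limits)
    return lb + (f_mode - f_before) / denominator * width

lower_limits = [21, 26, 31, 36, 41, 46, 51]
freqs = [3, 5, 11, 9, 12, 6, 4]

mode = grouped_mode(lower_limits, freqs, width=5)
print(round(mode, 2))  # 42.17
```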
The standard deviation is a statistic that tells you how tightly all the entries
are clustered around the mean in a set of data. It provides a good indication of
volatility (unpredictability or instability). When the sample data are pretty
tightly bunched together and the bell-shaped curve is steep, the standard
deviation is small. When the sample data are spread apart and the bell curve is
relatively flat, there is a relatively large standard deviation.
For data that follow a normal distribution, about 68 percent of the values
lie within one standard deviation of the mean in either direction. About 95
percent lie within two standard deviations, and about 99.7 percent lie within
three standard deviations.
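These percentages apply to normally distributed data and can be checked with the NormalDist class from Python's standard statistics module (available from Python 3.8). The helper name below is an assumption made for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```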
The standard deviation can tell you how spread out the values in a set
are from the mean.
For example, School A has a higher mean test score than School B. Your
first reaction might be to say that the kids at School A are smarter.
But a bigger standard deviation for one school tells you that there are
relatively more kids at that school scoring toward one extreme or the other. By
asking a few follow-up questions you might find that, say, School A’s mean was
skewed up because the school district sends all of the gifted education kids to
School A. Or that School B’s scores were dragged down because students who
recently have been "mainstreamed" from special education classes have all been
sent to School B.
In this way, looking at the standard deviation can help point you in the
right direction when asking why information is the way it is.
\[ \text{s.d.}\ (\sigma) = \frac{1}{n} \sqrt{ n \sum fm^2 - \left( \sum fm \right)^2 } \]
Class Interval    f     cf    m     fm     fm²
21 - 25           3     3     23    69     1587
26 - 30           5     8     28    140    3920
31 - 35           11    19    33    363    11979
36 - 40           9     28    38    342    12996
41 - 45           12    40    43    516    22188
46 - 50           6     46    48    288    13824
51 - 55           4     50    53    212    11236
n = 50, Σfm = 1930, Σfm² = 77730
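\[ \text{s.d.}\ (\sigma) = \frac{1}{n} \sqrt{ n \sum fm^2 - \left( \sum fm \right)^2 } \]
\[ \text{s.d.}\ (\sigma) = \frac{1}{50} \sqrt{ 50 (77730) - (1930)^2 } \]
\[ \text{s.d.}\ (\sigma) = \frac{1}{50} \sqrt{ 3886500 - 3724900 } \]
\[ \text{s.d.}\ (\sigma) = \frac{1}{50} \sqrt{ 161600 } \]
\[ \text{s.d.}\ (\sigma) = \frac{1}{50} (401.9950) \]
\[ \text{s.d.}\ (\sigma) \approx 8.04 \]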
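The same computation can be sketched as a function. The function name is illustrative. Note that this formula divides by n, so it is the population form of the standard deviation rather than the n - 1 sample form.

```python
from math import sqrt

def grouped_sd(midpoints, freqs):
    """Standard deviation of grouped data: (1/n) * sqrt(n*sum(fm^2) - (sum(fm))^2)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
    return sqrt(n * sum_fm2 - sum_fm ** 2) / n

midpoints = [23, 28, 33, 38, 43, 48, 53]
freqs = [3, 5, 11, 9, 12, 6, 4]

sd = grouped_sd(midpoints, freqs)
print(round(sd, 2))  # 8.04
```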
1. Complete the frequency table and compute the mean, median, mode
and standard deviation.
Class Interval    f     cf    m     fm     fm²
86 - 96           1
97 - 107          8
108 - 118         18
119 - 129         14
130 - 140         12
141 - 151         3
152 - 162         4
2. Make a frequency table (see above) with six (6) class intervals starting with
the class interval 15 – 25 and ending with 70 – 80. Then compute the mean,
median, mode and standard deviation.
18 20 31 30 37 18 53 27 32 55
60 32 55 45 47 51 54 23 56 42
57 62 75 67 49 22 27 32 32 45
27 73 58 41 42 52 53 35 40 50
19 49 21 50 25 30 38 42 45 47
3. Construct a frequency table with seven (7) classes starting with the class
interval 31 – 35 (up to the class interval 61 – 65). Compute the mean,
median, mode and standard deviation of the grouped data.
47 41 36 43 48 33 50 52 53 47
44 46 52 54 57 48 53 53 32 45
63 43 31 47 44 40 52 45 60 52
35 53 57 48 42 60 46 58 41 36
51 47 54 37 34 54 65 63 40 48
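When a frequency table has to be built from raw scores, as in exercises 2 and 3, the tallying step can be sketched as follows. The function name and the short score list are hypothetical and used only for illustration; the sketch assumes whole-number scores and equal class widths.

```python
def tally(scores, first_lower, width, num_classes):
    """Count how many scores fall in each class interval.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    table = []
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1  # inclusive upper limit for integer scores
        f = sum(1 for s in scores if lower <= s <= upper)
        table.append((lower, upper, f))
    return table

# Hypothetical scores, not taken from the exercises.
sample_scores = [21, 24, 26, 30, 33, 35, 35, 29]

table = tally(sample_scores, first_lower=21, width=5, num_classes=3)
for lower, upper, f in table:
    print(f"{lower} - {upper}: {f}")
# 21 - 25: 2
# 26 - 30: 3
# 31 - 35: 3
```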