UNIDAD CENTRAL DEL VALLE DEL CAUCA
NURSING PROGRAM
BIOSTATISTICS
MEASURES OF CENTRAL TENDENCY
By: Ildefonso Cobo Viveros, M.Sc.
A measure of central tendency is a summary statistic that represents the center
point or typical value of a dataset. These measures indicate where most values in
a distribution fall and are also referred to as the central location of a distribution.
You can think of it as the tendency of data to cluster around a middle value. In
statistics, the three most common measures of central tendency are the mean,
median, and mode. Each of these measures calculates the location of the central
point using a different method.
Choosing the best measure of central tendency depends on the type of data you
have. In this post, I explore these measures of central tendency, show you how to
calculate them, and explain how to determine which one is best for your data.
Locating the Center of Your Data
Most articles that you’ll read about the mean, median, and mode focus on how you
calculate each one. I’m going to take a slightly different approach to start out. My
philosophy throughout my blog is to help you intuitively grasp statistics by focusing
on concepts. Consequently, I’m going to start by illustrating the central point of
several datasets graphically—so you understand the goal. Then, we’ll move on to
choosing the best measure of central tendency for your data and the calculations.
The three distributions below represent different data conditions. In each
distribution, look for the region where the most common values fall. Even though
the shapes and type of data are different, you can find that central location. That’s
the area in the distribution where the most common values are located.
As the graphs highlight, you can see where most values tend to occur. That’s the
concept. Measures of central tendency represent this idea with a value. Coming
up, you’ll learn that as the distribution and kind of data change, so does the best
measure of central tendency. Consequently, you need to know the type of data you
have, and graph it, before choosing a measure of central tendency!
Mean
The mean is the arithmetic average, and it is probably the measure of central
tendency that you are most familiar with. Calculating the mean is very simple. You just
add up all of the values and divide by the number of observations in your dataset.
The calculation of the mean incorporates all values in the data. If you change any
value, the mean changes. However, the mean doesn’t always locate the center of
the data accurately. Observe the histograms below where I display the mean in the
distributions.
When to use the mean: Symmetric distribution, Continuous data
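The calculation described above can be sketched in a few lines of Python. The values are hypothetical and chosen only for illustration; they do not come from the histograms in this section.

```python
from statistics import mean

# Hypothetical values for illustration only.
values = [4, 8, 15, 16, 23, 42]

# Add up all of the values and divide by the number of observations.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

# Because every value enters the calculation, changing one value
# changes the mean. Here the largest value is replaced by an outlier.
changed = values.copy()
changed[-1] = 420
changed_mean = sum(changed) / len(changed)

print(manual_mean, library_mean, changed_mean)
```

Notice how a single extreme value pulls the mean far away from where most of the values sit, which is why the mean does not always locate the center accurately.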
Median
The median is the middle value. It is the value that splits the dataset in half. To find
the median, order your data from smallest to largest, and then find the data point
that has an equal number of values above it and below it. The method for locating
the median varies slightly depending on whether your dataset has an even or odd
number of values. I’ll show you how to find the median for both cases. In the
examples below, I use whole numbers for simplicity, but you can have decimal
places.
In the dataset with the odd number of observations, notice how the number 12 has
six values above it and six below it. Therefore, 12 is the median of this dataset.
When there is an even number of values, you count in to the two innermost values
and then take the average. The average of 27 and 29 is 28. Consequently, 28 is
the median of this dataset.
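Here is a minimal sketch of both cases in Python. The function name find_median and the two datasets are my own assumptions for illustration; the datasets are hypothetical and were built only so that they reproduce the medians mentioned above (12 for the odd case, and 27 and 29 as the two innermost values for the even case).

```python
from statistics import median

def find_median(data):
    """Return the middle value of data (hypothetical helper)."""
    if not data:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(data)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value sits exactly in the middle.
        return ordered[mid]
    # Even count: average the two innermost values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical datasets, not the ones pictured in the original figures.
odd_data = [3, 5, 6, 8, 9, 11, 12, 14, 15, 18, 20, 21, 25]   # 13 values
even_data = [20, 22, 25, 27, 29, 31, 33, 36]                 # 8 values

print(find_median(odd_data))    # six values below 12, six above
print(find_median(even_data))   # average of 27 and 29

# The standard library agrees with the hand-written version.
assert find_median(odd_data) == median(odd_data)
assert find_median(even_data) == median(even_data)
```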
Outliers and skewed data have a smaller effect on the median. To understand why,
imagine we have the Median dataset below and find that the median is 46.
However, we discover data entry errors and need to change four values, which are
shaded in the Median Fixed dataset. We’ll make them all significantly higher so
that we now have a skewed distribution with large outliers.
As you can see, the median doesn’t change at all. It is still 46. Unlike the mean,
the median value doesn’t depend on all the values in the dataset. Consequently,
when some of the values are more extreme, the effect on the median is smaller. Of
course, with other types of changes, the median can change. When you have a
skewed distribution, the median is a better measure of central tendency than the
mean.
When to use the median: Skewed distribution,
Continuous data, Ordinal data
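To see this robustness in numbers, here is a small sketch in Python. Both lists are hypothetical stand-ins for the Median and Median Fixed datasets; I assume only what the text states, namely a median of 46 and four values changed to much higher ones.

```python
from statistics import mean, median

# Hypothetical "Median" dataset whose middle value is 46.
original = [32, 37, 41, 44, 46, 48, 53, 55, 57]

# Hypothetical "Median Fixed" dataset: the four values above the
# median are replaced with much larger ones, creating a right skew.
fixed = [32, 37, 41, 44, 46, 480, 530, 550, 570]

# The middle position is still occupied by 46, so the median is unchanged.
print(median(original), median(fixed))

# The mean uses every value, so the outliers drag it upward.
print(round(mean(original), 1), round(mean(fixed), 1))
```

In this sketch the changed values stay on the same side of the median, which is why the median does not move. If a change pushed a value from below the median to above it, the median could shift.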
Mode
The mode is the value that occurs the most frequently in your dataset. On a bar
chart, the mode is the highest bar.
If the data have multiple values that are tied for occurring the most frequently, you
have a multimodal distribution. If no value repeats, the data do not have a mode.
In the dataset below, the value 5 occurs most frequently, which makes it the mode.
These data might represent a 5-point Likert scale.
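The following Python sketch counts frequencies to find the mode. The helper find_modes and all three datasets are hypothetical. I wrote the helper to follow the convention used here, where data with no repeated value have no mode. Be aware that Python's built-in statistics.multimode behaves differently in that case and returns every value.

```python
from collections import Counter

def find_modes(data):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value repeats, so by this convention there is no mode.
        return []
    # More than one value can tie for the highest count (multimodal).
    return [value for value, count in counts.items() if count == top]

# Hypothetical responses on a 5-point Likert scale.
likert = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]
# Hypothetical data where two values tie for most frequent.
bimodal = [1, 2, 2, 3, 3]
# Hypothetical continuous measurements with no repeats.
continuous = [1.27, 3.41, 5.68, 7.02]

print(find_modes(likert))      # a single mode
print(find_modes(bimodal))     # two modes
print(find_modes(continuous))  # no mode
```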
Typically, you use the mode with categorical, ordinal, and discrete data. In fact, the
mode is the only measure of central tendency that you can use with categorical
data—such as the most preferred flavor of ice cream. However, with categorical
data, there isn’t a central value because you can’t order the groups. With ordinal
and discrete data, the mode can be a value that is not in the center. Again, the
mode represents the most common value.
In the continuous data below, no values repeat, which means there is no mode.
With continuous data, it is unlikely that two or more values will be exactly equal
because there are an infinite number of values between any two values.
When to use the mode: Categorical data, Ordinal data, Count data, Probability
Distributions
Which is Best—the Mean, Median, or Mode?
When you have a symmetrical, unimodal distribution for continuous data, the mean,
median, and mode are equal (or nearly equal in a sample). In this case, analysts tend to use the mean because it
includes all of the data in the calculations. However, if you have a skewed
distribution, the median is often the best measure of central tendency.
When you have ordinal data, the median or mode is usually the best choice. For
categorical data, you have to use the mode.
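These guidelines can be summarized as a simple lookup. The Python function below is a hypothetical helper of my own, not a standard routine. It assumes you have already identified the data type and, for continuous data, graphed it to judge whether the distribution is skewed.

```python
def suggest_measure(data_type, skewed=False):
    """Suggest a measure of central tendency (hypothetical helper).

    data_type: "categorical", "ordinal", or "continuous".
    skewed: whether a graph of the data shows a skewed distribution.
    """
    kind = data_type.lower()
    if kind == "categorical":
        # Categories cannot be ordered, so only the mode applies.
        return "mode"
    if kind == "ordinal":
        return "median or mode"
    if kind == "continuous":
        # Skewed data: prefer the median. Symmetric data: the mean.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown data type: {data_type!r}")

print(suggest_measure("continuous"))
print(suggest_measure("continuous", skewed=True))
print(suggest_measure("ordinal"))
print(suggest_measure("categorical"))
```

Treat the result as a starting point rather than a rule. The text says the median is often, not always, the best choice for skewed data.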
In cases where you are deciding between the mean and median as the better
measure of central tendency, you are also determining which types of
statistical hypothesis tests are appropriate for your data—if that is your ultimate
goal. I have written an article that discusses when to use parametric (mean) and
nonparametric (median) hypothesis tests along with the advantages and
disadvantages of each type.
BIBLIOGRAPHY
FROST, Jim, M.S. Introduction to Statistics: An Intuitive Guide for Analyzing Data and
Unlocking Discoveries. Statistics By Jim Publishing. Sine loco. 1st ed. 13 August
2020.