Statistical Analysis
Measures of central tendency help you find the middle, or the average, of a dataset. The 3 most
common measures of central tendency are the mode, median, and mean.
In addition to central tendency, the variability and distribution of your dataset are important to
understand when performing descriptive statistics.
Normal distribution
In a normal distribution, data is symmetrically distributed with no skew. Most values
cluster around a central region, with values tapering off as they go further away from the
center. The mean, mode and median are exactly the same in a normal distribution.
Example: Normal distribution. You survey a sample in your local community on the number of books they
read in the last year.
A histogram of your data shows the frequency of responses for each possible number of books.
From looking at the chart, you see that there is a normal distribution.
The mean, median and mode are all equal; the central tendency of this dataset is 8.
Skewed distributions
In skewed distributions, more values fall on one side of the center than the other, and
the mean, median and mode all differ from each other. One side has a longer, more
spread-out tail that contains fewer scores. The direction of this tail tells you the side
of the skew.
In a positively skewed distribution, there’s a cluster of lower scores and a spread out tail
on the right. In a negatively skewed distribution, there’s a cluster of higher scores and a
spread out tail on the left.
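As a rough illustration of how a tail pulls the three measures apart, the Python sketch below uses a small, made-up positively skewed dataset. The values and the name books_read are assumptions for illustration only; they are not survey data from this text. The ordering it prints holds for this example and is typical of right-skewed data, but it is not guaranteed for every skewed dataset.

```python
import statistics

# Hypothetical, positively skewed data: most values are low,
# with a few large values forming a long tail on the right.
books_read = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(books_read)      # most frequent value
median = statistics.median(books_read)  # middle of the ordered values
mean = statistics.mean(books_read)      # sum of values divided by their count

# In this example the tail drags the mean above the median,
# and the median sits above the mode.
print("mode:", mode, "median:", median, "mean:", mean)
```

Reversing the tail (a cluster of high values with a few very low ones) would be expected to pull the mean below the median instead.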
Mode
The mode is the most frequently occurring value in a dataset. To find the mode, sort your
dataset numerically or categorically and select the response that occurs most frequently.
Example: Finding the mode. In a survey, you ask 9 participants whether they identify as conservative,
moderate, or liberal.
To find the mode, sort your data by category and find which response was chosen most
frequently.
To make it easier, you can create a frequency table to count up the values for each category.
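A frequency table like the one described above can be sketched in Python with collections.Counter. The nine responses below are invented for illustration, so the resulting mode is not the survey's actual result.

```python
from collections import Counter

# Hypothetical responses from 9 participants (not the survey's real data).
responses = [
    "moderate", "liberal", "conservative", "moderate", "liberal",
    "moderate", "conservative", "moderate", "liberal",
]

# Counter builds the frequency table: category -> number of responses.
frequency_table = Counter(responses)

# most_common(1) returns a one-item list holding the (category, count)
# pair with the highest count, which is the mode.
mode_category, mode_count = frequency_table.most_common(1)[0]

for category, count in frequency_table.most_common():
    print(f"{category}: {count}")
print("Mode:", mode_category)
```

If two categories tie for the highest count, most_common(1) reports only one of them, so a tie check would be needed before calling a single category the mode.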
For continuous variables or ratio levels of measurement, the mode may not be a helpful
measure of central tendency. That’s because there are many more possible values than
there are at a nominal or ordinal level of measurement, so it’s unlikely for any value to
repeat.
Example: Ratio data with no mode. You collect data on reaction times in a computer task, and your
dataset contains values that are all different from each other.
Participant 1 2 3 4 5 6 7 8 9
Reaction time (milliseconds) 267 345 421 324 401 312 382 298 303
In this dataset, there is no mode, because each value occurs only once.
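The same check can be written as a short Python sketch using the reaction times from the table above. Treating "no value occurs more than once" as "no mode" is the convention this text follows; the variable names are only illustrative.

```python
from collections import Counter

# Reaction times in milliseconds, taken from the table above.
reaction_times = [267, 345, 421, 324, 401, 312, 382, 298, 303]

# Count how often each value occurs.
counts = Counter(reaction_times)
highest_count = max(counts.values())

# Treat the dataset as having no mode when no value repeats.
has_mode = highest_count > 1
print("Has a mode:", has_mode)
```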
Median
The median of a dataset is the value that’s exactly in the middle when it is ordered from
low to high.
Example: Finding the median. You measure the reaction times of 7 participants on a computer task and
categorize them into 3 groups: slow, medium or fast.
Participant 1 2 3 4 5 6 7
To find the median, you first order all values from low to high. Then, you find the value in the
middle of the ordered dataset—in this case, the value in the 4th position.
Median: Medium
In larger datasets, it’s easier to use simple formulas to figure out the position of the
middle value in the distribution. You use different methods to find the median of a
dataset depending on whether the total number of values is even or odd.
For an odd-numbered dataset, find the value that lies at the (n + 1) / 2 position,
where n is the number of values in the dataset.
Example: You measure the reaction times in milliseconds of 5 participants and order the dataset.
That means the median is the 3rd value in your ordered dataset.
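For an even-numbered dataset, find the two values at the n / 2 and (n / 2) + 1 positions,
where n is the number of values in the dataset.
Example: You measure the reaction times of 6 participants and order the dataset.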
That means the middle values are the 3rd value, which is 345, and the 4th value, which is 357.
To get the median, take the mean of the 2 middle values by adding them together and dividing by
2.
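The position rules for odd and even datasets can be sketched in Python as follows. The function name median_by_position is hypothetical, and so are the datasets: only the two middle values of the six-value set, 345 and 357, come from the example above.

```python
def median_by_position(values):
    """Return the median using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the median sits at position (n + 1) / 2, counting from 1.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: average the values at positions n / 2 and (n / 2) + 1,
    # counting from 1 (list indexes n // 2 - 1 and n // 2).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical reaction times in milliseconds.
odd_times = [287, 298, 345, 365, 380]
even_times = [287, 298, 345, 357, 365, 380]

print(median_by_position(odd_times))   # 3rd value of 5
print(median_by_position(even_times))  # mean of the 3rd and 4th values of 6
```

For the six-value set, the two middle values 345 and 357 sum to 702, and dividing by 2 gives a median of 351.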
Mean
The arithmetic mean of a dataset (which is different from the geometric mean) is the
sum of all values divided by the total number of values. It’s the most commonly used
measure of central tendency because all values are used in the calculation.
Participant 1 2 3 4 5
Example: Mean with an outlier. In this dataset, we swap out one value with an extreme outlier.
Participant 1 2 3 4 5
Due to the outlier, the mean becomes much higher, even though all the other numbers in the
dataset stay the same.
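To make the effect of an outlier concrete, here is a minimal Python sketch. The reaction times are invented for illustration (the original table values are not reproduced here), and arithmetic_mean is a hypothetical helper name.

```python
import statistics


def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


# Hypothetical reaction times in milliseconds (illustrative only).
times = [287, 345, 365, 298, 380]
# The same data with the last value swapped for an extreme outlier.
times_with_outlier = [287, 345, 365, 298, 3800]

print("mean:", arithmetic_mean(times))
print("mean with outlier:", arithmetic_mean(times_with_outlier))

# For comparison, the median of these two datasets.
print("median:", statistics.median(times))
print("median with outlier:", statistics.median(times_with_outlier))
```

In this made-up data, one extreme value roughly triples the mean, while the median is the same in both datasets.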
While data from a sample can help you make estimates about a population, only full
population data can give you the complete picture.
In statistics, the notation of a sample mean and a population mean and their formulas
are different. But the procedures for calculating the population and sample means are
the same.
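Sample mean formula. The sample mean is written as M or x̄ (pronounced x-bar). For calculating the
mean of a sample, use this formula:
x̄ = Σx / n
x̄: sample mean
Σx: sum of all values in the sample dataset
n: number of values in the sample dataset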
Population mean formula. The population mean is written as μ (the Greek letter mu). For calculating the mean
of a population, use this formula:
μ = ΣX / N
μ: population mean
ΣX: sum of all values in the population dataset
N: number of values in the population dataset
The mode can be used for any level of measurement, but it’s most meaningful for
nominal and ordinal levels.
The median can only be used on data that can be ordered – that is, from ordinal, interval
and ratio levels of measurement.
The mean can only be used on interval and ratio levels of measurement because it
requires equal spacing between adjacent values or scores in the scale.
To decide which measures of central tendency to use, you should also consider the
distribution of your dataset.
For normally distributed data, all three measures of central tendency will give you the
same answer so they can all be used.