Lecture3A Slides
Lecture Three
Measures of central tendency
• Graphs are useful for giving a general idea of
the shape and spread of the data; however, a
better and more objective understanding of
these features can be obtained by
summarising the data numerically using
statistics.
• Measures of central tendency are numerical
values that refer to the centre of the
distribution. They give an indication of the
centre of a group of numbers.
• The three common measures of central
tendency are: the mode, the median, and the
mean. These measures are calculated
differently for raw (ungrouped) and
frequency (grouped) data.
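As a quick illustration for raw (ungrouped) data, all three measures can be computed with Python's standard-library `statistics` module. This is a sketch for illustration only, not the lecture's own procedure; the nine values are borrowed from the median example later in this lecture. Grouped (frequency) data cannot be handled this way, because the individual values are no longer available.

```python
import statistics

# Raw (ungrouped) data: the nine values from the lecture's median example
data = [10, 20, 25, 25, 30, 35, 40, 40, 45]

mean = statistics.mean(data)        # sum of the values divided by how many there are
median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # every value tied for the highest frequency

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
```

Here `multimode` is used instead of `mode` because this data set has two modes (25 and 40), and `multimode` reports all of them.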
Mode
• The mode is the value that appears most
frequently in the data.
• For raw (ungrouped) data, the mode is the
value(s) of the variable that occur most often.
• A set of raw data can have one mode
(unimodal), two modes (bimodal), multiple
modes, or no mode at all.
• For frequency (grouped) data, the modal
class is the class with the most observations,
i.e., the class with the highest frequency.
• Advantages: represents the largest number of
subjects with the same score; is always a score
that actually occurred (the mean and median
need not be); is applicable to all scales of
measure, including nominal variables.
• Disadvantages: for grouped data, the modal
class depends on how the data is grouped; the
mode may not be representative of the data
set as a whole.
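The classification of raw data as unimodal, bimodal, multimodal or having no mode can be sketched in Python with `collections.Counter`. The function name `describe_modes` is hypothetical, and the "no mode" rule below (every distinct value occurs equally often) is an assumed convention; textbooks differ on this point.

```python
from collections import Counter

def describe_modes(values):
    """Return (modes, label) for raw data (hypothetical helper).

    Assumed convention: if every distinct value occurs equally often
    (e.g. each value appears once), the data set has no mode.
    Values must be mutually comparable so the modes can be sorted.
    """
    counts = Counter(values)              # value -> frequency
    if not counts:
        return [], "no mode"
    highest = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == highest)
    # All distinct values tied (and more than one of them): no mode
    if len(modes) == len(counts) and len(counts) > 1:
        return [], "no mode"
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([3, 5, 5, 7]))                           # ([5], 'unimodal')
print(describe_modes([10, 20, 25, 25, 30, 35, 40, 40, 45]))   # ([25, 40], 'bimodal')
print(describe_modes([1, 2, 3, 4]))                           # ([], 'no mode')
print(describe_modes(["PhD", "Masters", "Masters"]))          # (['Masters'], 'unimodal')
```

Because it only counts occurrences, the same function works for nominal data such as category labels.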
EXAMPLES
Year of Study
  Category     Frequency
  Honours          85
  Masters          74
  PhD              41
  Total           200
The mode is Honours (frequency = 85).

Sensation-seeking Scores
  Score class  Frequency
  15-19             4
  20-24            49
  25-29           108
  30-34            33
  35-39             6
  Total           200
The modal class is 25-29 (frequency = 108).
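Finding the modal category or class in a frequency table amounts to picking the row with the highest frequency. A minimal Python sketch using the two tables above (the helper name `modal_classes` is hypothetical); it returns a list so that ties, i.e. more than one modal class, are reported too.

```python
# Frequency tables from the examples above (category or class -> frequency)
year_of_study = {"Honours": 85, "Masters": 74, "PhD": 41}
sensation_seeking = {"15-19": 4, "20-24": 49, "25-29": 108, "30-34": 33, "35-39": 6}

def modal_classes(freq_table):
    """Return every category/class that has the highest frequency."""
    highest = max(freq_table.values())
    return [label for label, f in freq_table.items() if f == highest]

print(modal_classes(year_of_study))      # ['Honours']
print(modal_classes(sensation_seeking))  # ['25-29']
```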
The Median
• The median is the middle value of an ordered
set of numbers, i.e., the value that sits in the
centre position when the numbers are arranged
in order from smallest to largest. For example,
the median of the set 1, 3, 5, 7, 9 is 5, as it
sits in the central position in the data.
• For raw (ungrouped) data, the median is the
value that occupies the central position in the
ordered data set; when the number of values is
even, it is taken as the average of the two
middle values.
• For frequency (grouped) data, the median
class is the class that is estimated to contain
the value that would occupy the central
position in the data set.
• The median is the value for which at most
50% of observations are below it and at most
50% of observations are above it. The median
is therefore also known as the 50th percentile.
• Percentiles are commonly described as
measures that divide a set of data into 100
parts; they indicate the relative standing of a
data point (value or observation) in the data
set. Formally, if a set of n observations is
arranged in ascending order (smallest to
largest), then the rth percentile is the data
point (value) such that at most r% of the data
points are below it and at most (100 − r)% are
above it.
• Advantages: the median is not sensitive to
outliers (extreme values), so it is often used to
describe skewed distributions; it can be used
for ordinal discrete data and for all forms of
continuous data.
• Disadvantages: it is less stable from sample
to sample than the mean; it is harder to work
with mathematically; its value may not exist in
the data as an actual data point (e.g., when the
number of values is even).
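The ideas above can be sketched in Python: the median of raw data, a check of the percentile definition ("at most r% below, at most (100 − r)% above"), and locating the median class in a frequency table via cumulative frequencies. All function names are hypothetical, and treating the (n + 1)/2-th ordered observation as the central position for the median class is an assumed convention, not necessarily the formula used in this lecture.

```python
def median(values):
    """Median of raw data: the middle value, or the average of the two
    middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def satisfies_percentile(values, x, r):
    """True if at most r% of the values lie below x and at most (100 - r)%
    lie above x. Integer arithmetic avoids floating-point rounding."""
    n = len(values)
    below = sum(v < x for v in values)
    above = sum(v > x for v in values)
    return below * 100 <= r * n and above * 100 <= (100 - r) * n

def median_class(freq_table):
    """Class containing the (n + 1)/2-th ordered observation (assumed
    convention). Classes must be listed in ascending order."""
    n = sum(freq_table.values())
    position = (n + 1) / 2
    cumulative = 0
    for label, f in freq_table.items():
        cumulative += f          # running total of observations so far
        if cumulative >= position:
            return label
    return None                  # only reached for an empty table

data = [10, 20, 25, 25, 30, 35, 40, 40, 45]
sensation_seeking = {"15-19": 4, "20-24": 49, "25-29": 108, "30-34": 33, "35-39": 6}

print(median(data))                        # 30
print(median([1, 3, 5, 7]))                # 4.0 (not a value in the data)
print(satisfies_percentile(data, 30, 50))  # True: 4 values below, 4 above
print(satisfies_percentile(data, 35, 50))  # False: 5 of 9 values lie below 35
print(median_class(sensation_seeking))     # 25-29
```

For the sensation-seeking table, the cumulative frequencies are 4, 53 and 161, so the central position (100.5 under the assumed convention) falls in the 25-29 class.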
EXAMPLE
Consider the following data:
10 20 25 25 30 35 40 40 45
• There are nine values in the data set above
(i.e., n = 9); these values are already ordered
from smallest to largest.
• To find the position of the median, the
following formula can be used: