Chapter Three
Contents
3.1 Introduction
3.2 The Summation Notation
3.3 Properties of Measures of Central Tendency
3.4 Types of Measures of Central Tendency
3.4.1 The Arithmetic Mean (simple and weighted)
3.4.2 The Mode
3.4.3 The Median
3.5 Measures of Non-central Locations
Objectives
At the end of this chapter, students will be able to:
• Identify measures of central tendency.
• Understand the properties of the arithmetic mean.
• Summarize an aggregate of statistical data using a single measure.
• Define and calculate the mean, mode and median.
• Measure the position of data using quartiles, deciles and percentiles, and interpret them.
3.1 Introduction
Researchers are often interested in defining a value that best describes some attribute of the
population. The best way to reduce a set of data and still retain part of the information is to
summarize the set with a single value. Measures of central tendency are therefore one type of
descriptive statistic.
Chapter three: Measures of central Tendency
When we want to compare groups of numbers, it is useful to have a single value that is
a good representative of each group. This single value is called the average of the group.
- Averages are also called measures of central tendency.
- An average that is representative is called a typical average, and an average that is not
representative and has only a theoretical value is called a descriptive average.
3.2 The Summation Notation (∑)
Statistical symbols: Let a data set consist of n observations, represented by $x_1, x_2, \dots, x_n$,
where n (the last subscript) denotes the number of observations in the data and $x_i$ is the ith
observation. Then the sum of all the numbers ($x_i$'s), where i goes from 1 up to n, is written
symbolically as $\sum_{i=1}^{n} x_i$, or $\sum x_i$, or $\sum x$. That is,
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$
x - whole set of numbers
$x_i$ - specific score in a set of numbers
n - total number of observations
For instance, a data set consisting of the six measurements 2, 3, 9, 10, 8 and -2 is represented by
$x_1, x_2, \dots, x_6$, where $x_1 = 2$, $x_2 = 3$, $x_3 = 9$, $x_4 = 10$, $x_5 = 8$ and $x_6 = -2$. Their sum is
$$\sum_{i=1}^{6} x_i = x_1 + x_2 + \dots + x_6 = 2 + 3 + 9 + 10 + 8 + (-2) = 30$$
Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = nc$, where c is a constant.
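As a quick check of the notation, the six-measurement example and property 1 can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names and the constant c = 5 are assumptions chosen for the demonstration and do not come from the text.

```python
# Observations x_1, ..., x_6 from the six-measurement example.
x = [2, 3, 9, 10, 8, -2]
n = len(x)

# sum() plays the role of the sigma sign: x_1 + x_2 + ... + x_n.
total = sum(x)

# Property 1: adding a constant c to itself n times gives n * c.
c = 5  # hypothetical constant, chosen only for illustration
constant_sum = sum(c for _ in range(n))

print(total)                  # 30
print(constant_sum == n * c)  # True
```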
Example 3.1: $\sum_{i=1}^{7} x_i = 20$, $\sum_{i=1}^{7} y_i = 30$, $\sum_{i=1}^{7} x_i^2 = 420$, $\sum_{i=1}^{7} y_i^2 = 280$
Example 3.3: Calculate the arithmetic mean of the following sample of the numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50+42+48+60+58+54+50+42+50+42}{10} = \frac{496}{10} = 49.6 \approx 50$$
In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The number of
times each number occurs is called its frequency and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi        42    48    50    54    58    60
Frequency, fi     3     1     3     1     1     1
xifi            126    48   150    54    58    60
The formula for the arithmetic mean for data of this type is
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
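The frequency-table formula can be checked against the class-size data of Example 3.3. The following is a minimal Python sketch; the variable names are illustrative assumptions, and the values and frequencies are the ones tabulated in the text.

```python
# Distinct values x_i and their frequencies f_i from the table in the text.
values = [42, 48, 50, 54, 58, 60]
freqs = [3, 1, 3, 1, 1, 1]

# Numerator: sum of x_i * f_i; denominator: sum of f_i (the sample size).
sum_xf = sum(x * f for x, f in zip(values, freqs))
n = sum(freqs)

mean = sum_xf / n
print(sum_xf, n, mean)  # 496 10 49.6
```

The result agrees with the mean computed directly from the ten raw observations, since grouping equal values only reorganizes the same sum.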
Solution:
The formula to be used for the mean is as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Let us calculate these values and make a table for these values for the sake of convenience.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Mid-Point (xi)          61      63      65      67      69      71
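• The sum of the deviations of the observations from the mean is zero. That is,
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
• The sum of squares of deviations from the mean is the least. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is minimum when $A = \bar{x}$.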
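These two properties can be illustrated numerically. The sketch below reuses the ten class sizes from Example 3.3; the list of alternative centres in `candidates` is a hypothetical choice made only for the comparison, and a numerical check of this kind illustrates the properties rather than proving them.

```python
# Class sizes from Example 3.3.
data = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]
mean = sum(data) / len(data)

# Property: deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(x - mean for x in data)

def sum_of_squares(a):
    """Sum of squared deviations of the data from a chosen centre a."""
    return sum((x - a) ** 2 for x in data)

# Property: the sum of squares is smallest when a equals the mean.
candidates = [45, 49, 50, 55]  # hypothetical alternative centres
is_minimum = all(sum_of_squares(mean) < sum_of_squares(a) for a in candidates)

print(abs(deviation_sum) < 1e-9)  # True
print(is_minimum)                 # True
```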
In calculating the simple arithmetic mean, all items are assumed to be of equal importance (each
value in the data set has equal weight). When the observations have different weights, we use the
weighted average. Weights are assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the corresponding weights, then the
weighted mean ($\bar{x}_w$) is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 3.5: A student's final grades in Mathematics, Physics, Chemistry and Biology are A, B, D and C,
respectively. If the respective credits for these courses are 4, 4, 3 and 2, determine the
approximate average grade the student obtained.
Solution
We use a weighted arithmetic mean, with the weight for each course being the number of
credits for that course. The grades are converted to grade points (A = 4, B = 3, C = 2, D = 1).
xi       4    3    1    2   Total
wi       4    4    3    2    13
xiwi    16   12    3    4    35
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{13} = \frac{35}{13} \approx 2.69$$
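The same weighted mean can be computed programmatically. This is a small sketch in Python, assuming the grade-point values 4, 3, 1 and 2 and the credits shown in the table; the variable names are illustrative.

```python
# Grade points x_i for Mathematics, Physics, Chemistry and Biology,
# and the credits w_i used as weights (both taken from the table).
grade_points = [4, 3, 1, 2]
credit_hours = [4, 4, 3, 2]

weighted_sum = sum(w * x for w, x in zip(credit_hours, grade_points))
total_weight = sum(credit_hours)

weighted_mean = weighted_sum / total_weight
print(weighted_sum, total_weight)  # 35 13
print(round(weighted_mean, 2))     # 2.69
```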
Combined mean: When a set of observations is divided into k groups, and $\bar{x}_1$ is the mean of the $n_1$
observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, ..., $\bar{x}_k$ is the mean of the $n_k$
observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$
This is a special case of the weighted mean, in which the sample sizes are the weights.
Example 3.6: In the previous year, two sections took the Statistics course. At the end of the
semester, the two sections had average marks of 70 and 78, with 45 and 50 students,
respectively. Find the mean mark for all the students.
Solution:
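Treating the section sizes as weights, as the definition of the combined mean describes, the calculation can be sketched in Python as follows. The variable names are illustrative assumptions; the means and section sizes are those stated in Example 3.6.

```python
# Section means and section sizes from Example 3.6.
section_means = [70, 78]
section_sizes = [45, 50]

# Total of all marks = sum of (size * mean) over the sections.
total_marks = sum(n * m for n, m in zip(section_sizes, section_means))
total_students = sum(section_sizes)

combined_mean = total_marks / total_students
print(total_marks, total_students)  # 7050 95
print(round(combined_mean, 2))      # 74.21
```

The combined mean lies closer to 78 than a simple average of 70 and 78 would, because the second section has more students.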
Solution:
i. The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
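n = 10, so n is even. The two middle values are the 5th and 6th observations, so the median is
$$\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th}\text{ value} + \left(\frac{n}{2}+1\right)^{th}\text{ value}}{2} = \frac{5^{th}\text{ value} + 6^{th}\text{ value}}{2} = \frac{4 + 5}{2} = 4.5$$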
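The rule for raw data (middle value when n is odd, average of the two middle values when n is even) can be written as a short Python function. This is a minimal sketch; the function name `median` is an illustrative choice, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(data):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the ((n + 1) / 2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th values

# The ten ordered observations from the worked solution.
print(median([-5, 0, 1, 2, 4, 5, 6, 8, 10, 15]))  # 4.5
```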
Note: The median is easy to calculate for small samples and is not affected by an "outlier".
Median for discrete data arranged in a frequency distribution: In this case also, the median
is obtained by the above formula. After arranging the values in increasing order, find the
smallest cumulative frequency (CF) greater than or equal to the position given by the above
formula (for odd or even n); the corresponding value is the median.
Median for grouped continuous data: For continuous data, the median is obtained by the
following formula.
$$\tilde{x} = \text{Median} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$$
Where: L = the lower class boundary of the median class; w = the class width of the median
class; $f_{med}$ = the frequency of the median class; and CF = the cumulative frequency of the class
preceding the median class, that is, the sum of the frequencies of all classes below the
median class. The median class is the class that contains the (n/2)th observation, whether n
is odd or even, since the individual items lose their identity once they are grouped into
continuous classes.
Example 3.11: Calculate the median for the following frequency distribution.
Freq.          4    8   12    6    3    4    3    Total: 40
Cuml. Freq.    4   12   24   30   33   37   40
Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12 and CF = 12. Applying the formula,
we get:
$$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) \approx 13.8$$
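The grouped-median procedure (locate the median class through the cumulative frequencies, then interpolate inside it) can be sketched in Python. The class limits of Example 3.11 are not reproduced in the text, so the sketch assumes equal-width classes of width 5 starting at a lower boundary of 0.5, which is consistent with L = 10.5 for the third class; the function name and parameters are illustrative.

```python
def grouped_median(first_lower_boundary, width, freqs):
    """Median of equal-width grouped data: L + (w / f_med) * (n/2 - CF)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    target = n / 2
    cum = 0  # cumulative frequency of the classes before the current one (CF)
    for k, f in enumerate(freqs):
        if cum + f >= target:  # first class whose cumulative frequency reaches n/2
            lower = first_lower_boundary + k * width
            return lower + (width / f) * (target - cum)
        cum += f
    raise ValueError("median class not found")

# Frequencies from Example 3.11; boundaries 0.5, 5.5, 10.5, ... are an assumption.
freqs = [4, 8, 12, 6, 3, 4, 3]
result = grouped_median(0.5, 5, freqs)
print(round(result, 1))  # 13.8
```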
Merits of median
• It is less affected by extreme values.
immediately preceding the modal class; f2 = frequency of the class immediately succeeding the
modal class; and fmod = frequency of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example 3.11.
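Solution: By inspection, the mode lies in the third class, where L = 10.5, fmod = 12, f1 = 8, f2 = 6 and w = 5.
Using the formula, with $\Delta_1 = f_{mod} - f_1$ and $\Delta_2 = f_{mod} - f_2$, the mode is:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w = 10.5 + \frac{12 - 8}{(12 - 8) + (12 - 6)} \times 5 = 12.5$$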
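The grouped-mode calculation of Example 3.13 can be reproduced with a short Python function. This is a sketch of the formula as it is applied in the worked solution; the function and parameter names are illustrative assumptions.

```python
def grouped_mode(lower, width, f_mod, f_prev, f_next):
    """Mode of grouped data: L + d1 / (d1 + d2) * w.

    d1 = f_mod - f_prev (excess over the preceding class),
    d2 = f_mod - f_next (excess over the succeeding class).
    """
    d1 = f_mod - f_prev
    d2 = f_mod - f_next
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower + width * d1 / (d1 + d2)

# Values from Example 3.13: L = 10.5, w = 5, fmod = 12, f1 = 8, f2 = 6.
print(grouped_mode(10.5, 5, 12, 8, 6))  # 12.5
```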
Merits of mode
• Mode is not affected by extreme values.
• We can change the size of the observations without changing the mode.
• It can be computed for all levels of measurement, i.e. ratio, interval, ordinal or nominal.
Demerits of mode
• It does not take every value into consideration.
• It may not exist in the series, and if it exists it may not be unique.
3.4 The Relationship of the Mean, Median and Mode
Comparing the Mean, Median, and the Mode
• If the data are skewed, avoid the mean.
• If there is a large gap around the middle, avoid the median.
• A measure is a resistant measure if its value is not affected by an outlier or an extreme
data value.
• The mean is not a resistant measure of central tendency, because it is influenced by
extreme data values or outliers.
• The median is resistant to the influence of extreme data values or outliers: its value
does not respond strongly to changes in a few extreme data values, regardless of how
large the changes may be.
• The mode has an advantage over both the mean and the median when the data are
categorical, since the mean cannot be calculated for this type of data and the median
can be found only when the categories can be ordered. Also, the mode usually indicates
the location within a large distribution where the data values are concentrated. However,
the mode cannot always be determined: if all the data values in a distribution are
different, the distribution has no mode.
• For a symmetrical distribution, the mean, median and mode coincide; that is,
mean = median = mode. However, for a moderately asymmetrical (non-symmetrical)
distribution, the mean and mode lie at the two ends and the median lies between them, and
they satisfy the following important empirical relationship:
Mean – Mode = 3(Mean - Median)
Example 3.14: In a moderately asymmetrical distribution, the mean and the mode are 30 and 42
respectively. What is the median of the distribution?
Solution:
Rearranging Mean – Mode = 3(Mean – Median) gives 3 Median = 2 Mean + Mode, so
Median = (2 Mean + Mode)/3 = (2*30 + 42)/3 = 34
Hence the median of the distribution is 34.
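The rearranged empirical relationship is easy to check numerically. The sketch below is only an illustration in Python; the function name is an assumption, and the relationship itself is an approximation that the text states for moderately asymmetrical distributions.

```python
def median_from_mean_and_mode(mean, mode):
    """Solve Mean - Mode = 3 * (Mean - Median) for the median."""
    return (2 * mean + mode) / 3

mean, mode = 30, 42  # values from Example 3.14
median = median_from_mean_and_mode(mean, mode)
print(median)  # 34.0

# Substituting back confirms that both sides of the relationship agree.
print(mean - mode == 3 * (mean - median))  # True
```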
Which of the Three Measures is the Best?
At this stage, one may ask which of these three measures of central tendency is the best.
There is no simple answer to this question, because the three measures are based upon
different concepts. The arithmetic mean is the sum of the values divided by the total number of
observations in the series. The median is the value of the middle observation when the series is
arranged in order, and the mode is the value around which the observations tend to
concentrate. As such, the use of a particular measure will largely depend on the purpose of the
study and the nature of the data. For example, when we are interested in knowing the consumers’
preferences for different brands of television sets or kinds of advertising, the choice should go in
favor of mode. The use of mean and median would not be proper. However, the median can
sometimes be used in the case of qualitative data when such data can be arranged in an ascending
or descending order. Let us take another example. Suppose we invite applications for a certain
vacancy in our company. A large number of candidates apply for that post. We are now
interested in knowing which age or age group has the largest concentration of applicants. Here,
obviously the mode will be the most appropriate choice. The arithmetic mean may not be
appropriate as it may be influenced by some extreme values.
3.5 Measures of Non-central Locations
The median is the value of the middle item, which divides the data into two equal parts and is found by
arranging the data in increasing or decreasing order of magnitude, whereas quantiles are
measures that divide a given set of data into approximately equal subdivisions and are
obtained by the same procedure as the median. They are averages of position (non-central
tendency). Some of these are quartiles, deciles and percentiles.
Quartiles: values that divide the data set into four approximately equal parts, denoted by
$Q_1$, $Q_2$ and $Q_3$. The first quartile ($Q_1$) is also called the lower quartile and the third quartile
($Q_3$) is the upper quartile. The second quartile ($Q_2$) is the median.
• Quartiles for individual series:
Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith quartile ($Q_i$) is the value of the item
at the [i(n+1)/4]th position, i = 1, 2, 3.
That is, after arranging the data in ascending order, Q1, Q2 and Q3 are obtained by:
$$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}\text{ value},\quad Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}\text{ value}\quad\text{and}\quad Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}\text{ value}.$$
the less than cumulative frequency distribution and apply the formula of quartile for individual
series.
• Quartiles in continuous data: For continuous data, use the following formula:
$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right)$$
where i = 1, 2, 3, and L, w, $f_{Q_i}$ and CF are defined in the same way as for the median. That is,
$$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right),\quad Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right)\quad\text{and}\quad Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$$
The class in question is the one that includes the (i×n/4)th value. That is, the class with the smallest
cumulative frequency greater than or equal to i×n/4 is the class of the ith quartile.
Deciles: values that divide the data into ten approximately equal parts, denoted by $D_1, D_2, \dots, D_9$.
• Deciles for individual series:
Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith decile ($D_i$) is the value of the item
at the [i(n+1)/10]th position, i = 1, 2, ..., 9.
That is, after arranging the data in ascending order, D1, D2, ..., D9 are obtained by:
$$D_1 = \left(\frac{1(n+1)}{10}\right)^{th}\text{ value},\quad D_2 = \left(\frac{2(n+1)}{10}\right)^{th}\text{ value},\ \dots\ \text{and}\quad D_9 = \left(\frac{9(n+1)}{10}\right)^{th}\text{ value}.$$
• Deciles in continuous data: For continuous data, use the following formula:
$$D_i = L + \frac{w}{f_{D_i}}\left(\frac{in}{10} - CF\right),\quad i = 1, 2, \dots, 9$$
Define the symbols in the same way as for quartiles in continuous data.
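Percentiles: values that divide the data into one hundred approximately equal parts,
denoted by $P_1, P_2, \dots, P_{99}$.
• Percentiles for individual series:
Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith percentile ($P_i$) is the value of the item
at the [i(n+1)/100]th position, i = 1, 2, ..., 99.
That is, after arranging the data in ascending order, P1, P2, ..., P99 are obtained by:
$$P_1 = \left(\frac{1(n+1)}{100}\right)^{th}\text{ value},\quad P_2 = \left(\frac{2(n+1)}{100}\right)^{th}\text{ value},\ \dots\ \text{and}\quad P_{99} = \left(\frac{99(n+1)}{100}\right)^{th}\text{ value}.$$
• Percentiles in continuous data: For continuous data, use the following formula:
$$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right),\quad i = 1, 2, \dots, 99$$
Define the symbols in the same way as for quartiles or deciles in continuous data.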
Interpretations
1. 𝑄𝑖 is the value below which ( i × 25) percent of the observations in the series are found
(where i = 1, 2,3). For instance 𝑄3 means the value below which 75 percent of observations in
the given series are found.
2. 𝐷𝑖 is the value below which ( i ×10) percent of the observations in the series are found (where
i = 1, 2,...,9 ). For instance 𝐷4 is the value below which 40 percent of the values are found in the
series.
3. 𝑃𝑖 is the value below which i percent of the total observations are found (where i = 1, 2,3,...,99
). For example 60 percent of the observations in a given series are below 𝑃60 .
Example 3.15: Calculate $Q_1$, $Q_2$, $Q_3$, $D_4$, $D_9$, $P_{40}$ and $P_{90}$ for the data given in the table
below.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Solution: The data is arranged in an increasing order. So we need to construct only the
cumulative frequency table before calculating the required values.
x            10   11   12   13   14   15   16   17   18
f             2    8   25   48   65   40   20    9    2
Cum. Freq.    2   10   35   83  148  188  208  217  219
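The total number of observations is 219, which is odd, so the median is the ((n+1)/2)th value:
$$\tilde{x} = \left(\frac{n+1}{2}\right)^{th}\text{ value} = \left(\frac{219+1}{2}\right)^{th}\text{ value} = 110^{th}\text{ value} = 14$$
$$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}\text{ value} = \left(\frac{1(219+1)}{4}\right)^{th}\text{ value} = 55^{th}\text{ value} = 13$$
$$Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}\text{ value} = \left(\frac{2(219+1)}{4}\right)^{th}\text{ value} = 110^{th}\text{ value} = 14 = \tilde{x}$$
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}\text{ value} = \left(\frac{3(219+1)}{4}\right)^{th}\text{ value} = 165^{th}\text{ value} = 15$$
$$D_4 = \left(\frac{4(n+1)}{10}\right)^{th}\text{ value} = \left(\frac{4(219+1)}{10}\right)^{th}\text{ value} = 88^{th}\text{ value} = 14$$
$$D_9 = \left(\frac{9(n+1)}{10}\right)^{th}\text{ value} = \left(\frac{9(219+1)}{10}\right)^{th}\text{ value} = 198^{th}\text{ value} = 16$$
$$P_{40} = \left(\frac{40(n+1)}{100}\right)^{th}\text{ value} = \left(\frac{40(219+1)}{100}\right)^{th}\text{ value} = 88^{th}\text{ value} = 14$$
$$P_{90} = \left(\frac{90(n+1)}{100}\right)^{th}\text{ value} = \left(\frac{90(219+1)}{100}\right)^{th}\text{ value} = 198^{th}\text{ value} = 16$$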
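The lookups in Example 3.15 all follow one pattern: compute the position i(n+1)/parts, then take the value whose cumulative frequency is the smallest one greater than or equal to that position. A minimal Python sketch of this pattern is given below; the function name `quantile_value` and the clamping of positions beyond the last observation are assumptions made for the illustration.

```python
from bisect import bisect_left
from itertools import accumulate

# Frequency distribution from Example 3.15.
values = [10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [2, 8, 25, 48, 65, 40, 20, 9, 2]

cum_freqs = list(accumulate(freqs))  # 2, 10, 35, 83, 148, 188, 208, 217, 219
n = cum_freqs[-1]

def quantile_value(i, parts):
    """Value at the [i(n+1)/parts]th position (parts = 4, 10 or 100)."""
    position = i * (n + 1) / parts
    # Index of the smallest cumulative frequency >= position.
    index = bisect_left(cum_freqs, position)
    return values[min(index, len(values) - 1)]  # clamp: an assumption for large positions

print(quantile_value(1, 4), quantile_value(2, 4), quantile_value(3, 4))  # 13 14 15
print(quantile_value(4, 10), quantile_value(9, 10))                      # 14 16
print(quantile_value(40, 100), quantile_value(90, 100))                  # 14 16
```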
Example 3.16: The marks of 50 students, out of 85, are given below. Based on the data, find $Q_1$,
$D_4$ and $P_7$.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution: First find the class boundaries and the cumulative frequency distribution.
Marks            46-50      51-55      56-60      61-65      66-70      71-75      76-80
Class boundary   45.5-50.5  50.5-55.5  55.5-60.5  60.5-65.5  65.5-70.5  70.5-75.5  75.5-80.5
fi               4          8          15         5          9          5          4
Cum. frequency   4          12         27         32         41         46         50
Q1 is the (n/4)th value = 12.5th value, which lies in the class 55.5-60.5.
$$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) \approx 55.7$$
D4 is the (4n/10)th value = 20th value, which lies in the class 55.5-60.5.
$$D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) \approx 58.2$$
P7 is the (7n/100)th value = 3.5th value, which lies in the class 45.5-50.5.
$$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$$
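Because the quartile, decile and percentile formulas for continuous data differ only in the divisor (4, 10 or 100), a single function can cover all three. The following Python sketch applies it to Example 3.16; the function name and argument layout are illustrative assumptions, and equal class widths are assumed as in the example.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """Quantile of grouped data: L + (w / f) * (i*n/parts - CF)."""
    n = sum(freqs)
    target = i * n / parts
    cum = 0  # cumulative frequency before the current class (CF)
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f >= target:  # first class whose cumulative frequency reaches the target
            return lower + (width / f) * (target - cum)
        cum += f
    raise ValueError("quantile class not found")

# Lower class boundaries and frequencies from Example 3.16 (class width 5).
lower_boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(lower_boundaries, 5, freqs, 1, 4)
d4 = grouped_quantile(lower_boundaries, 5, freqs, 4, 10)
p7 = grouped_quantile(lower_boundaries, 5, freqs, 7, 100)
print(round(q1, 1), round(d4, 1), p7)  # 55.7 58.2 49.875
```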