Sta 3.2
1. Median: The median of a data set is the value that lies exactly in the middle position of the list
when the data are arranged in order from smallest to largest.
If the number of observations ($n$) is odd:
$$Me = \left(\frac{n+1}{2}\right)\text{th observation}$$
If the number of observations ($n$) is even:
$$Me = \frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}}{2}$$
Math: The scores of 5 students in an exam are given below: 5, 14, 8, 11, 10. Find the median.
Solution: First, we arrange the data from smallest to largest: 5, 8, 10, 11, 14.
Here, $n = 5$, which is an odd number.
The median can be written as
$$Me = \left(\frac{n+1}{2}\right)\text{th observation}$$
$$\Longrightarrow Me = \left(\frac{5+1}{2}\right)\text{th observation} = 3\text{rd observation} = 10$$
Hence, the median mark of the students is 10.
Math: The scores of 10 students in an exam are given below: 27, 14, 92, 5, 68, 31, 83, 10, 45, 77.
Find the median.
Solution: First, we arrange the data from smallest to largest: 5, 10, 14, 27, 31, 45, 68, 77, 83, 92.
Here, $n = 10$, which is an even number.
The median can be written as
$$Me = \frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}}{2}$$
$$\Longrightarrow Me = \frac{5\text{th observation} + 6\text{th observation}}{2} = \frac{31 + 45}{2} = 38$$
Hence, the median mark of the students is 38.
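The two positional rules above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function name `median` is chosen here for convenience.

```python
def median(values):
    """Median by position: the middle value for odd n,
    the mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # (n + 1)/2-th observation; subtract 1 for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # average of the (n/2)-th and (n/2 + 1)-th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([5, 14, 8, 11, 10]))                        # 10
print(median([27, 14, 92, 5, 68, 31, 83, 10, 45, 77]))   # 38.0
```

Both calls reproduce the worked answers, 10 and 38.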
Chapter 3: Measures of Central Tendency (Median & Mode)
Solution: Here,
Wealth Status   Frequency   Cumulative frequency ($CF_i$)
Poorest         4           4
Poorer          10          14
Middle          17          31
Richer          12          43
Richest         7           50
Total           $n = 50$
Now, $N/2 = 25$. The cumulative frequency first reaches 25 in the wealth index group “Middle”
(cumulative frequency 31, against 14 for “Poorer”). Hence, the wealth index group “Middle”
is the median.
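The same lookup can be sketched in Python: accumulate the frequencies in category order and stop at the first category whose cumulative frequency reaches $N/2$. The function name `ordinal_median` is hypothetical, and the rule "first cumulative frequency greater than or equal to $N/2$" is an assumption about how ties are handled.

```python
def ordinal_median(categories, frequencies):
    """Return the first category whose cumulative frequency reaches N/2.

    `categories` must already be listed in their natural order.
    """
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive total")
    half = total / 2
    cumulative = 0
    for category, freq in zip(categories, frequencies):
        cumulative += freq
        if cumulative >= half:
            return category


wealth = ["Poorest", "Poorer", "Middle", "Richer", "Richest"]
freq = [4, 10, 17, 12, 7]
print(ordinal_median(wealth, freq))  # Middle
```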
2. Median for grouped data:
$$Me = L_m + \frac{\frac{N}{2} - F_c}{f_m} \times c$$
Here,
$L_m$ = Lower limit of median class
$N$ = Total number of observations/frequency
$F_c$ = Cumulative frequency of pre-median class
$f_m$ = Frequency of median class
$c$ = Class interval
Steps of computing median from grouped data:
1. Prepare a less than type cumulative frequency distribution.
2. Determine $\frac{N}{2}$, where $N$ is the total frequency.
3. Locate the median class whose cumulative frequency includes the value of $\frac{N}{2}$.
4. Determine the value of $L_m$, $F_c$, $f_m$, and $c$.
Here, $N = 40$, so $\frac{N}{2} = \frac{40}{2} = 20$.
Looking at the cumulative frequency column in the table, we find that position 20 falls in the
class (40 − 50).
Now, $L_m = 40$, $N = 40$, $F_c = 15$, $f_m = 9$, and $c = 10$.
$$\therefore Me = L_m + \frac{\frac{N}{2} - F_c}{f_m} \times c = 40 + \frac{20 - 15}{9} \times 10 \approx 45.56$$
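As a quick check of the arithmetic, the grouped-median formula can be evaluated directly in Python. This is only a sketch; the function and parameter names are hypothetical stand-ins for $L_m$, $N$, $F_c$, $f_m$, and $c$.

```python
def grouped_median(lower_limit, total, cum_before, class_freq, width):
    """Me = L_m + ((N/2 - F_c) / f_m) * c for a grouped frequency distribution."""
    if class_freq <= 0:
        raise ValueError("the median class must have a positive frequency")
    return lower_limit + (total / 2 - cum_before) / class_freq * width


me = grouped_median(lower_limit=40, total=40, cum_before=15, class_freq=9, width=10)
print(round(me, 2))  # 45.56
```

The exact value is 40 + 50/9 = 45.555..., which is 45.56 when rounded to two decimals (45.55 if truncated).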
3. Merits of median:
a) It is rigidly defined.
b) Median is easy to understand and easy to calculate for a non-mathematical person.
c) Since median is a positional average, it is not affected at all by extreme observations.
4. Demerits of median:
a) For ungrouped data with an even number of observations, the median cannot be
determined exactly; it is taken as the average of the two middle values.
b) Median is relatively less stable than mean.
6. Mode: The mode of a data set is its most frequently occurring value. Data can have more than
one mode. A data set with two modes is called bimodal, one with three modes is called trimodal,
and so on.
*** Mode is the only measure of central tendency that can be computed at all levels of
measurement.
*** 3,4,0,2,2,1,2,5,2,2; the modal value is 2, and the data set is unimodal.
*** 2,2,2,0,3,4,4,4,5,6; the modal values are 2 and 4, and the data set is bimodal.
*** 2,3,4,5,1,9,6,10,7; there is no modal value.
*** 1,1,2,2,3,3,4,4; the modal values are 1, 2, 3, and 4, and the data set is multimodal.
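The four cases above can be reproduced with a short Python sketch built on `collections.Counter`. The function name `modes` is hypothetical, and the rule "no mode when no value repeats" is taken from the third example rather than from a formal definition.

```python
from collections import Counter


def modes(values):
    """Return all modal values in ascending order, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no modal value, as in the notes
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 4, 0, 2, 2, 1, 2, 5, 2, 2]))   # [2]           unimodal
print(modes([2, 2, 2, 0, 3, 4, 4, 4, 5, 6]))   # [2, 4]        bimodal
print(modes([2, 3, 4, 5, 1, 9, 6, 10, 7]))     # []            no mode
print(modes([1, 1, 2, 2, 3, 3, 4, 4]))         # [1, 2, 3, 4]  multimodal
```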
7. Merits of mode:
a) Mode is easy to calculate and understand.
b) Mode is not at all affected by extreme observations and as such is preferred to arithmetic
mean while dealing with extreme observations.
8. Demerits of mode:
a) Mode is not rigidly defined.
b) It is not based on all the observations of the series.
c) Mode is not suitable for further mathematical treatment.
Math: The data below represent the sampled frequency distribution of “Programming Languages
Known” in the CSE department. Find the mode.
Number of Programming Languages Known   Frequency
0                                       5
1                                       12
2                                       15
3                                       18
4+                                      8
Total                                   58
Solution: The table shows that students who know three programming languages occur most
often, with a frequency of 18. As a result, the modal value for “programming languages known”
is 3.
Math: The data below represent the sampled percentage distribution of “Programming Languages
Known” in the CSE department. Find the mode.
Number of Programming Languages Known   Frequency   Percentage
0                                       5           8.6%
1                                       12          20.7%
2                                       15          25.9%
3                                       18          31.0%
4+                                      8           13.8%
Total                                   58          100%
Solution: The table shows that students who know three programming languages form the largest
share, at 31.0%. As a result, the modal value for "programming languages known" is 3.
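Both versions of this problem come down to picking the category with the largest count or share. A minimal Python sketch, with variable names chosen here for illustration, rebuilds the percentage column from the frequencies and picks the modal category:

```python
# Frequencies from the table; the key "4+" is kept as a label, not a number.
freq = {"0": 5, "1": 12, "2": 15, "3": 18, "4+": 8}

total = sum(freq.values())
percent = {k: round(100 * f / total, 1) for k, f in freq.items()}

modal_by_frequency = max(freq, key=freq.get)
modal_by_percent = max(percent, key=percent.get)

print(total)               # 58
print(percent)
print(modal_by_frequency)  # 3
print(modal_by_percent)    # 3
```

Because percentages are just frequencies divided by the same total, both tables lead to the same modal category.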
Math:
Class 𝒇𝒊
10-20 5
20-30 8
30-40 12
40-50 7
50-60 9
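Here, the class with the highest frequency is (30 − 40). This is our modal class.
Now, $L_o = 30$, $\Delta_1 = 12 - 8 = 4$, $\Delta_2 = 12 - 7 = 5$, and $c = 10$.
$$\therefore \text{Mode},\ Mo = L_o + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c = 30 + \frac{4}{4+5} \times 10 \approx 34.44$$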
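The calculation can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and two choices are assumptions not stated in the notes: a tie for the highest frequency is resolved in favour of the first such class, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mo = L_o + (d1 / (d1 + d2)) * c, where d1 and d2 are the differences
    between the modal class frequency and its previous / next neighbours."""
    # index of the modal class (first one if several share the top frequency)
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])
    prev_f = frequencies[k - 1] if k > 0 else 0
    next_f = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = frequencies[k] - prev_f
    d2 = frequencies[k] - next_f
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbours match the modal frequency")
    return lower_limits[k] + d1 / (d1 + d2) * width


mo = grouped_mode([10, 20, 30, 40, 50], [5, 8, 12, 7, 9], width=10)
print(round(mo, 2))  # 34.44
```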
11. Do all data have a median, mode, and mean?
Every set of continuous data possesses a median, mode, and mean. Ordinal data, however, have
only a median and mode, whereas nominal data have only a mode.
            Mean   Median   Mode
Nominal     No     No       Yes
Ordinal     No     Yes      Yes
Interval    Yes    Yes      Yes
Ratio       Yes    Yes      Yes
13. If there is an outlier in the dataset, what is the best indicator of central tendency?
It is usually inappropriate to use the mean in such situations. We would normally choose the
median or mode, with the median usually preferred.
17. What is the most appropriate measure of central tendency when the data has outliers?
The median is usually preferred in these situations because the value of the mean can be distorted
by the outliers. However, it will depend on how influential the outliers are. If they do not
significantly distort the mean, using the mean as the measure of central tendency will usually be
preferred.
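A small Python experiment illustrates the point. It reuses the five exam scores from the first median example and appends one extreme value; the value 100 is hypothetical and chosen only to show the effect.

```python
from statistics import mean, median

scores = [5, 8, 10, 11, 14]      # exam scores from the first median example
with_outlier = scores + [100]    # 100 is a hypothetical extreme value

print(mean(scores), median(scores))                            # 9.6 10
print(round(mean(with_outlier), 2), median(with_outlier))      # 24.67 10.5
```

One extreme score moves the mean from 9.6 to about 24.67, while the median only shifts from 10 to 10.5.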
Extra:
• Table below shows the frequency distribution of marks in “Statistics” of 50 students of a
university.
Scores 0-5 5-10 10-15 15-20 20-25 25-30 30-35
Freq. 3 6 9 9 14 8 1
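The notes do not say what should be computed from this table. Assuming the task is the grouped median, the four steps from "Median for grouped data" can be sketched in Python as below. The function name is hypothetical, and the median class is taken to be the first class whose cumulative frequency reaches $N/2$.

```python
def grouped_median_from_table(lower_limits, frequencies, width):
    """Locate the median class from cumulative frequencies, then apply
    Me = L_m + ((N/2 - F_c) / f_m) * c."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive total")
    half = total / 2
    cum_before = 0  # F_c: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cum_before + f >= half:
            return lower + (half - cum_before) / f * width
        cum_before += f


lower_limits = [0, 5, 10, 15, 20, 25, 30]
frequencies = [3, 6, 9, 9, 14, 8, 1]
me = grouped_median_from_table(lower_limits, frequencies, width=5)
print(round(me, 2))  # 18.89
```

Here $N = 50$ and $N/2 = 25$; the cumulative frequencies are 3, 9, 18, 27, ..., so the median class is 15-20 with $F_c = 18$ and $f_m = 9$, giving $15 + \frac{25 - 18}{9} \times 5 \approx 18.89$.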
1. Measures of Location:
a) Quartiles
b) Deciles
c) Percentiles
2. Quartiles: The quartiles, which we denote $Q_1$, $Q_2$, and $Q_3$, divide a data set into four equal
parts when the data are arranged in order from smallest to largest.
• The first quartile (𝑄1 ) is the value that separates the bottom 25% of the data from the top
75%
• The second quartile (𝑄2 ) is the middle value/ median value and it separates the bottom
50% of the data from the top 50%
• The third quartile (𝑄3 ) is the value that separates the bottom 75% of the data from the top
25%
Math: Scores from 11 CSE students. Compute the quartiles of the data set and interpret the results,
20, 46, 27, 38, 50, 33, 36, 58, 23, 22, 60
Solution:
First, we arrange the data set in order from smallest to largest:
20, 22, 23, 27, 33, 36, 38, 46, 50, 58, 60
Now,
$$\text{Position of } Q_i = \frac{i \times n}{4}$$
$$\Longrightarrow \text{Position of } Q_1 = \frac{1 \times 11}{4} = 2.75$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore Q_1 = 3\text{rd observation} = 23$$
Interpretation: It's evident that 25% of the students achieve scores of 23 or below, while 75% of
the students attain scores above 23.
$$\text{Position of } Q_2 = \frac{i \times n}{4} = \frac{2 \times 11}{4} = 5.5$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore Q_2 = 6\text{th observation} = 36$$
Interpretation: It's evident that 50% of the students achieve scores of 36 or below, while 50% of
the students attain scores above 36.
$$\text{Position of } Q_3 = \frac{i \times n}{4} = \frac{3 \times 11}{4} = 8.25$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore Q_3 = 9\text{th observation} = 50$$
Interpretation: It's evident that 75% of the students achieve scores of 50 or below, while 25% of
the students attain scores above 50.
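The quartile rule used in this solution can be sketched in Python. The function name `quartile` is hypothetical. The worked example only meets fractional positions, so the integer-position branch is an assumption borrowed from the percentile steps in these notes (average the $J$th and $(J+1)$th observations).

```python
import math


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using position = i * n / 4."""
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2, or 3")
    ordered = sorted(values)
    n = len(ordered)
    position = i * n / 4
    if position == int(position):
        # Assumption: integer position J -> mean of J-th and (J+1)-th values
        j = int(position)
        return (ordered[j - 1] + ordered[j]) / 2
    # Fractional position -> take the next integer position (1-based)
    return ordered[math.ceil(position) - 1]


scores = [20, 46, 27, 38, 50, 33, 36, 58, 23, 22, 60]
print([quartile(scores, i) for i in (1, 2, 3)])  # [23, 36, 50]
```

For $n = 11$ the positions are 2.75, 5.5, and 8.25, which round up to the 3rd, 6th, and 9th observations. Other textbooks and software use different quartile conventions and may return slightly different values.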
3. Deciles: The deciles, which we denote $D_1, D_2, \dots, D_9$, divide a data set into ten equal parts
when the data are arranged in order from smallest to largest.
• The first decile (𝐷1 ) is the value that separates the bottom 10% of the data from the top
90%.
• The second decile (𝐷2 ) is the value that separates the bottom 20% of the data from the top
80%.
• Proceeding in this way, the ninth decile (𝐷9 ) separates the bottom 90% value from the top
10%.
Math: Scores from 11 CSE students. Compute the value of 𝐷1 , 𝐷5 , 𝐷9 of the data set and interpret
the results,
20, 46, 27, 38, 50, 33, 36, 58, 23, 22, 60
Solution:
First, we arrange the data set in order from smallest to largest:
20, 22, 23, 27, 33, 36, 38, 46, 50, 58, 60
Now,
$$\text{Position of } D_i = \frac{i \times n}{10}$$
$$\Longrightarrow \text{Position of } D_1 = \frac{1 \times 11}{10} = 1.1$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore D_1 = 2\text{nd observation} = 22$$
Interpretation: It's evident that 10% of the students achieve scores of 22 or below, while 90% of
the students attain scores above 22.
$$\text{Position of } D_5 = \frac{i \times n}{10} = \frac{5 \times 11}{10} = 5.5$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore D_5 = 6\text{th observation} = 36$$
Interpretation: It's evident that 50% of the students achieve scores of 36 or below, while 50% of
the students attain scores above 36.
$$\text{Position of } D_9 = \frac{i \times n}{10} = \frac{9 \times 11}{10} = 9.9$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore D_9 = 10\text{th observation} = 58$$
Interpretation: It's evident that 90% of the students achieve scores of 58 or below, while 10% of
the students attain scores above 58.
4. Percentiles: The percentiles, which we denote $P_1, P_2, \dots, P_{99}$, divide a data set into one
hundred equal parts when the data are arranged in order from smallest to largest.
• The first percentile (𝑃1 ) is the value that separates the bottom 1% of the data from the top
99%.
• Proceeding in this way, the ninety-ninth percentile ($P_{99}$) separates the bottom 99% of the
data from the top 1%.
Steps of getting Percentiles:
a) Arrange the data set from smallest to largest.
b) Identify the position of $P_i$; $i = 1, 2, 3, \dots, 99$, by using the formula
$$J = \frac{i \times n}{100}$$
c) If $J$ is an integer, then
$$P_i = \frac{J\text{th observation} + (J+1)\text{th observation}}{2}$$
If $J$ is not an integer, take the next integer as the position.
Math: Scores from 11 CSE students. Compute the value of 𝑃25 of the data set and interpret the
results,
20, 46, 27, 38, 50, 33, 36, 58, 23, 22, 60
Solution:
First, we arrange the data set in order from smallest to largest:
20, 22, 23, 27, 33, 36, 38, 46, 50, 58, 60
Now,
$$\text{Position of } P_i = \frac{i \times n}{100}$$
$$\Longrightarrow \text{Position of } P_{25} = \frac{25 \times 11}{100} = 2.75$$
Since the position is not an integer, we take the next integer as the position.
$$\therefore P_{25} = 3\text{rd observation} = 23$$
Interpretation: It's evident that 25% of the students achieve scores of 23 or below, while 75% of
the students attain scores above 23.
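Steps a) to c) can be sketched as a single Python function. The name `percentile` is hypothetical. The position $J = i \times n / 100$ is kept in integer arithmetic with `divmod` so that the "is $J$ an integer?" test does not depend on floating-point rounding.

```python
def percentile(values, i):
    """i-th percentile (i = 1..99) following steps a) to c) of the notes."""
    if not 1 <= i <= 99:
        raise ValueError("i must be between 1 and 99")
    ordered = sorted(values)              # step a)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    j, remainder = divmod(i * n, 100)     # step b): J = i*n/100
    if remainder == 0:
        # step c), J is an integer: mean of the J-th and (J+1)-th observations
        return (ordered[j - 1] + ordered[j]) / 2
    # step c), J is not an integer: next integer position J+1 (0-based index j)
    return ordered[j]


scores = [20, 46, 27, 38, 50, 33, 36, 58, 23, 22, 60]
print(percentile(scores, 25))                              # 23
print([percentile(scores, p) for p in (10, 50, 90)])       # [22, 36, 58]
print(percentile(scores, 75))                              # 50
```

Since $\frac{i \times n}{10} = \frac{10i \times n}{100}$ and $\frac{i \times n}{4} = \frac{25i \times n}{100}$, the same function gives the deciles as $D_i = P_{10i}$ and the quartiles as $Q_i = P_{25i}$, which matches the values 22, 36, 58 and 23, 36, 50 found earlier.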