STA 111 Topic 2 Notes
1.1 Objectives
By the end of the topic, the learner should be able to:
i) Define measure of central tendency and state the objectives of averaging
ii) Calculate and interpret various measures of central tendency – arithmetic mean,
median, mode, geometric mean, harmonic mean.
1.2 Introduction
Even after the data have been classified and tabulated, one often finds too much detail
for many of the uses that may be made of the information available. We therefore frequently
need further analysis of the tabulated data. One of the most powerful tools of analysis is to
calculate a single average value that represents the entire mass of data. An “average” is a
single value which is considered the most representative or typical value for a given
set of data. Such a value is neither the smallest nor the largest value, but a number
somewhere in the middle of the group. For this reason an average is
frequently referred to as a measure of central tendency or central value.
Definition: a measure of central tendency is a single value around
which the data are scattered.
Page 1 of 17
The average should depend upon each and every observation, so that if any
observation is dropped the average itself is altered.
iv) It should be rigidly defined.
An average should be properly defined so that it has one and only one
interpretation.
v) It should be capable of further algebraic or statistical treatment.
We should prefer an average that can be used in further statistical
computations.
vi) It should have sampling stability.
We should prefer a value which has what statisticians call “sampling
stability”, i.e. one that is least affected by the fluctuations of sampling.
vii) It should not be affected by the presence of extreme values.
Although each and every observation should influence the value of the
average, none of the observations should influence it unduly.
In this course we will look at the following important measures of central tendency,
which are generally used in various fields, e.g. business and education:
(1) Arithmetic mean, (2) Median, (3) Mode, (4) Geometric
mean, and (5) Harmonic mean.
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $\sum_{i=1}^{n} f_i$ is the total number of observations.
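c) Properties of Arithmetic Mean
With $d_i = x_i - \bar{x}$ denoting the deviation of $x_i$ from the mean,
$$\sum_{i=1}^{n} d_i = \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}$$
But by definition $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \implies \sum_{i=1}^{n} x_i = n\bar{x}$, and $\sum_{i=1}^{n} \bar{x} = n\bar{x}$.
Thus $\sum_{i=1}^{n} d_i = n\bar{x} - n\bar{x} = 0$ ∆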
Exercise - If the 𝑥𝑖 ’s occur with frequencies 𝑓1 , 𝑓2 , 𝑓3 … , 𝑓𝑛 respectively, show that the sum of the
deviations from the arithmetic mean is zero.
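And for grouped data:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i (d_i + a)}{\sum f_i} = \frac{\sum f_i d_i + \sum f_i a}{\sum f_i} = \frac{\sum f_i d_i}{\sum f_i} + a$$
Hence
$$\bar{x} = a + \frac{\sum d_i}{n} \quad \text{(ungrouped data)}$$
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i} \quad \text{(grouped data)}$$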
b) Using assumed mean method (change of origin).
Rather than directly adding these values, we first subtract 𝑎 = 280 from
each one to obtain the new values 𝑑𝑖 = 𝑥𝑖 − 280:
$d_i$: 4, 0, −3, 2, −1, 5, 1, 3, −2, −3 and $\sum d_i = 6$
By definition $\bar{x} = a + \frac{\sum d_i}{n} = 280 + \frac{6}{10} = 280.6$
c) Using coding method (change of scale).
By definition $\bar{x} = a + c\,\frac{\sum u_i}{n}$, where $u_i = \frac{d_i}{c}$.
This is ungrouped data, so we choose an appropriate value of $c$: either the
g.c.d. of the $d_i$'s or any other value (use a factor that will not result in recurring
decimals).
Let $c = 5$; then $u_i = \frac{d_i}{5}$ gives 0.8, 0, −0.6, 0.4, −0.2, 1, 0.2, 0.6, −0.4, −0.6, so $\sum u_i = 1.2$.
Hence $\bar{x} = a + c\,\frac{\sum u_i}{n} = 280 + 5 \times \frac{1.2}{10} = 280.6$
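The three routes to the mean described above (direct, change of origin, change of scale) can be checked numerically. The following is a minimal sketch, assuming the raw values are recovered from the listed deviations as x = d + 280; the variable names are illustrative and not part of the notes.

```python
# Raw data recovered from the listed deviations: x_i = d_i + 280 (an assumption
# made here because only the deviations are printed in the worked example).
deviations = [4, 0, -3, 2, -1, 5, 1, 3, -2, -3]
a = 280  # assumed mean (change of origin)
c = 5    # scale factor (change of scale)
values = [d + a for d in deviations]
n = len(values)

# Direct method: add the raw values and divide by n.
mean_direct = sum(values) / n

# Assumed mean method: x_bar = a + sum(d) / n, with d = x - a.
d = [x - a for x in values]
mean_assumed = a + sum(d) / n

# Coding method: x_bar = a + c * sum(u) / n, with u = d / c.
u = [di / c for di in d]
mean_coded = a + c * sum(u) / n

print(mean_direct, mean_assumed, mean_coded)
```

All three results agree with 280.6 up to floating-point rounding, which is the point of the two shortcuts: they change the arithmetic, not the answer.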
Example 2.2
The following is a frequency table giving the ages of members of a cultural club for
young adults.
Age        15  16  17  18  19  20
Frequency   2   5  11   9  14  13
Find the arithmetic mean of the ages of the 54 members of the club.
Solution
This data is ungrouped but has been placed in a simple frequency distribution table
indicating each age and the corresponding number of members. Hence
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{15 \times 2 + 16 \times 5 + 17 \times 11 + 18 \times 9 + 19 \times 14 + 20 \times 13}{54} = \frac{985}{54} = 18.24$$
This is equivalent to writing the formula as $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$.
Example 2.3
Calculate the arithmetic mean of the following data using the three methods
Solution:
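Use $a = 75$, $c = 5$, with $d = x - a$ and $u = d/c$.
Class   f     x    fx     d = x − a   fd     u = d/c   fu
53-57   2     55   110    -20         -40    -4        -8
58-62   12    60   720    -15         -180   -3        -36
63-67   12    65   780    -10         -120   -2        -24
68-72   25    70   1750   -5          -125   -1        -25
73-77   27    75   2025   0           0      0         0
78-82   10    80   800    5           50     1         10
83-87   9     85   765    10          90     2         18
88-92   3     90   270    15          45     3         9
Total   Σf = 100   Σfx = 7220   Σfd = -280   Σfu = -56
Direct method:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{7220}{100} = 72.2$$
Assumed mean method:
$$\bar{x} = a + \frac{\sum fd}{\sum f} = 75 + \frac{-280}{100} = 75 - 2.8 = 72.2$$
Coding method:
$$\bar{x} = a + c\,\frac{\sum fu}{\sum f} = 75 + 5\left(\frac{-56}{100}\right) = 75 - 2.8 = 72.2$$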
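The column totals in a table like this are easy to get wrong by hand, so it can help to recompute them. Below is a minimal sketch, assuming the class midpoints 55, 60, ..., 90 and the frequencies from the table; the variable names are illustrative only.

```python
# Class midpoints (x) and frequencies (f) as read from the table.
midpoints = [55, 60, 65, 70, 75, 80, 85, 90]
freqs = [2, 12, 12, 25, 27, 10, 9, 3]
a, c = 75, 5  # assumed mean and class width used for coding

total_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fd = sum(f * (x - a) for f, x in zip(freqs, midpoints))
sum_fu = sum(f * (x - a) / c for f, x in zip(freqs, midpoints))

mean_direct = sum_fx / total_f            # direct method
mean_assumed = a + sum_fd / total_f       # assumed mean method
mean_coded = a + c * sum_fu / total_f     # coding method

print(total_f, sum_fx, sum_fd, sum_fu)
print(mean_direct, mean_assumed, mean_coded)
```

The totals come out as 100, 7220, -280 and -56, and each method returns 72.2 up to floating-point rounding.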
Exercise
1. The mean of seven numbers is seven. One number is removed and the mean
increases to 10. Find the number which was removed.
2. The average weight of a group of 30 friends increases by 1 kg when the weight of
their football coach is added. If the average weight of the group after including the
weight of the football coach is 31 kg, what is the weight of the football coach?
4. The average age of a group of 10 students was 20. The average age increased by 2
years when two new students joined the group. What is the average age of the two
new students who joined the group?
$$\bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3}$$
b) The mean of marks in Statistics of 100 students of a class was 72. The mean of marks
of boys was 75, while their number was 70. Find out the mean marks of girls in the
class.
Solution
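We are given $N = 100$, $\bar{X}_{12} = 72$, mean of boys $\bar{X}_1 = 75$, number of boys $N_1 = 70$, so the number of girls is $N_2 = 100 - 70 = 30$. We
have to find the mean marks of girls, i.e. $\bar{X}_2$.
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
$$72 = \frac{70 \times 75 + 30 \bar{X}_2}{70 + 30}$$
$$7200 = 5250 + 30 \bar{X}_2 \implies \bar{X}_2 = \frac{1950}{30} = 65$$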
Hence the mean marks of girls in the class = 65.
c) The mean age of a combined group of men and women is 30 years. If the mean age
of the group of men is 32 and that of the group of women is 25, find out the
percentage of men and women in the group.
Solution
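Let $N_1$ represent the percentage of men and $N_2$ the percentage of women, so
that $N_1 + N_2 = 100$. We are given $\bar{X}_{12} = 30$, $\bar{X}_1 = 32$, $\bar{X}_2 = 25$.
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
$$30 = \frac{32 N_1 + 25 N_2}{100}$$
$$3000 = 32 N_1 + 25(100 - N_1) \implies 32 N_1 - 25 N_1 = 3000 - 2500 = 500$$
$$N_1 = \frac{500}{7} = 71.43 \quad \text{and} \quad N_2 = 100 - 71.43 = 28.57$$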
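Both combined-mean examples rearrange the same formula, once for an unknown group mean and once for an unknown group share. The sketch below illustrates this; the helper name `combined_mean` and the variable names are assumptions made for illustration.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(N_k * mean_k) / sum(N_k)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Example (b): recover the girls' mean from the class mean and the boys' mean.
n_total, mean_all = 100, 72
n_boys, mean_boys = 70, 75
n_girls = n_total - n_boys
mean_girls = (n_total * mean_all - n_boys * mean_boys) / n_girls

# Example (c): share of men (in percent) when the combined mean is 30.
mean_men, mean_women, mean_both = 32, 25, 30
pct_men = 100 * (mean_both - mean_women) / (mean_men - mean_women)
pct_women = 100 - pct_men

print(mean_girls, round(pct_men, 2), round(pct_women, 2))
```

Feeding the recovered values back into `combined_mean` reproduces the given combined means (72 and 30), which is a convenient way to check an answer of this type.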
Example 8: A shopkeeper has 50 cold drink bottles. Some of the bottles are 1-liter and
some are 2-liter bottles. The average volume of the bottles is 1200 ml. Find the
number of 2-liter bottles. (1 liter = 1000 ml)
Solution: We have two groups, one of 1-liter bottles and the other of 2-liter bottles. Let
the number of 1-liter bottles be N1 and the number of 2-liter bottles be N2. We know that N1 +
N2 = 50, as given in the question. The average of group 1 (W1) is 1000 ml, as all the
bottles in it hold the same quantity, i.e. 1000 ml. Similarly, the average of group 2 (W2) is 2000
ml. With the help of the weighted average formula we can calculate N1 and N2. The
weighted average here is 1200 ml. Putting the values in the equation:
$$1200 = \frac{1000 N_1 + 2000 N_2}{N_1 + N_2} = \frac{1000 N_1 + 2000 N_2}{50}$$
As N1 + N2 = 50, replacing N2 by 50 − N1 and solving for N1 we get N1 = 40 and N2 = 10. Thus, the
shopkeeper has 10 two-liter bottles.
$$\bar{X}_w = \frac{\sum WX}{\sum W}$$
where $\bar{X}_w$ represents the weighted arithmetic mean, $X$ = the variable, and
$W$ = the weights attached to the variable.
Example 2.6
A student's final marks in Mathematics, Physics, English and Accounting are respectively
82, 86, 90, and 70. If the respective credits received for these courses are 3, 5, 3, and 1,
determine the approximate average mark.
Solution
$$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{82(3) + 86(5) + 90(3) + 70(1)}{3 + 5 + 3 + 1} = \frac{246 + 430 + 270 + 70}{12} = \frac{1016}{12} = 84.67 \approx 85$$
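The weighted mean is a one-line computation once values and weights are paired. This is a minimal sketch with an illustrative helper name, `weighted_mean`, applied to the marks and credits of this example and, as a second check, to the bottle problem (40 bottles of 1000 ml and 10 of 2000 ml).

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

marks = [82, 86, 90, 70]
credit_weights = [3, 5, 3, 1]
average_mark = weighted_mean(marks, credit_weights)

# Bottle problem: volumes in ml weighted by the number of bottles of each size.
average_volume = weighted_mean([1000, 2000], [40, 10])

print(round(average_mark, 2), average_volume)
```

The Physics mark carries the largest weight, which is why the weighted average sits above the plain average of the four marks (82).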
1.6 Median
Ungrouped data
Order the values of a data set of size n from smallest to largest (in order of magnitude).
If n is odd, the median is the value in position (n + 1)/2; if n is even, the median is the
average of the values in positions n/2 and n/2 + 1. That is, the median is the middle value, or the arithmetic
mean of the two middle values.
Example 2.7
a) Find the median of: 1, 10, 7, 20, 5
Solution
Put the data in an array and arrange in ascending or descending order: 1, 5, 7, 10, 20
Here n = 5 is odd, so the median is the value in position (5 + 1)/2 = 3, i.e. the median is 7.
b) Find the median of the set of numbers: 21, 3, 7, 17, 19, 31, 46, 20 and 43.
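The odd/even rule for ungrouped data translates directly into code. The sketch below uses an illustrative function name, `median_ungrouped`, and applies it to the two data sets of this example.

```python
def median_ungrouped(values):
    """Median by the positional rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # n even: average the values in 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

part_a = [1, 10, 7, 20, 5]
part_b = [21, 3, 7, 17, 19, 31, 46, 20, 43]
print(median_ungrouped(part_a), median_ungrouped(part_b))
```

Sorting first is essential: picking the middle entry of the unsorted list in part (a) would give 7 only by coincidence.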
Grouped data
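The following formula is used:
$$\text{Median} = l_m + \frac{\frac{N}{2} - cf}{f_m} \times c$$
where
$l_m$ - lower limit (class boundary) of the median class
$N = \sum f$ - total number of units
$c$ (or $i$) - size (width) of the median class
$f_m$ - frequency of the median class
$cf$ - cumulative frequency of the class preceding the median class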
Example 2.9
Class 𝑓 𝑐𝑓
53-57 2 2
58-62 12 14
63-67 12 26
68-72 25 51
73-77 27 78
78-82 10 88
83-87 9 97
88-92 3 100
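Position of the median = $\frac{N}{2} = \frac{100}{2} = 50$, so the median class is 68-72, the first class whose cumulative frequency reaches 50.
$$\text{Median} = l_m + \frac{\frac{N}{2} - cf}{f_m} \times c = 67.5 + \frac{(50 - 26) \times 5}{25} = 67.5 + 4.8 = 72.3$$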
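The grouped-median procedure (find the class containing the N/2-th value, then interpolate inside it) can be sketched as follows. The function name `grouped_median` is illustrative, and the lower class boundaries 52.5, 57.5, ... are an assumption obtained by subtracting 0.5 from each stated lower class limit.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data by linear interpolation within the median class."""
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")

lower_boundaries = [52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5]
freqs = [2, 12, 12, 25, 27, 10, 9, 3]
median_value = grouped_median(lower_boundaries, freqs, width=5)
print(median_value)
```

For this table the loop stops at the class starting at 67.5 with 26 observations below it, so the interpolation step is (50 − 26)/25 × 5 = 4.8 and the median is 72.3 up to floating-point rounding.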
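Lower Quartile (Q1)
The quartiles divide the distribution into four equal parts; the lower quartile Q1 is the value below which one quarter of the observations lie.
Calculation of Lower Quartile – Grouped data
Determine the particular class in which the value of the lower quartile lies. Use N/4 to
locate the lower quartile class, because with grouped data it is the (N/4)th value which
cuts off the first of the four equal parts of the area under the curve. Apply the following formula for
determining the exact value of the lower quartile:
$$Q_1 = L + \frac{N/4 - \text{p.c.f.}}{f} \times i$$
where L is the lower limit of the quartile class, p.c.f. is the cumulative frequency of the preceding class, f is the frequency of the quartile class and i is the class width.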
Example 2.8
The profits earned by 100 companies during the 2010 – 2011 period are given below:
Profits ($) No. of companies Profits ($) No. of companies
20 – 30 4 60 – 70 15
30 – 40 8 70 – 80 10
40 – 50 18 80 – 90 8
50 – 60 30 90 – 100 7
Calculate Q1 and Q3.
Solution
Profits ($) No. of companies (f) Cumulative frequency
20 – 30 4 4
30 – 40 8 12
40 – 50 18 30
50 – 60 30 60
60 – 70 15 75
70 – 80 10 85
80 – 90 8 93
90 – 100 7 100
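Lower Quartile, $Q_1$ = size of the $\left(\frac{N}{4}\right)^{\text{th}}$ observation = $\left(\frac{100}{4}\right)^{\text{th}}$ observation = 25th observation.
Hence Q1 lies in the class 40 – 50.
L = 40, p.c.f. = 12, f = 18, i = 10.
$$Q_1 = L + \frac{N/4 - \text{p.c.f.}}{f} \times i = 40 + \frac{25 - 12}{18} \times 10 = 40 + 7.22 = 47.22$$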
Hence 25% of the companies earn an annual profit of $47.22 or less.
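Upper Quartile, $Q_3$ = size of the $\left(\frac{3N}{4}\right)^{\text{th}}$ observation = $\left(\frac{3 \times 100}{4}\right)^{\text{th}}$ observation = 75th observation.
Hence Q3 lies in the class 60 – 70.
L = 60, p.c.f. = 60, f = 15, i = 10.
$$Q_3 = L + \frac{3N/4 - \text{p.c.f.}}{f} \times i = 60 + \frac{75 - 60}{15} \times 10 = 60 + 10 = 70$$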
Hence 75% of the companies earn an annual profit of $70 or less.
These values, i.e., Q1 , median and Q3 can also be obtained from the Ogive curve.
In general, the pth percentile, $X_p$, is the value of x on the ogive corresponding to
$$y = \frac{p}{100} N$$
Note:
The median is the 50th percentile value.
The lower quartile is the 25th percentile value.
The upper quartile is the 75th percentile value.
The formula for evaluating $X_p$ is given by:
$$X_p = L_{X_p} + \frac{pN/100 - \text{p.c.f.}}{f_{X_p}} \times i$$
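Because the quartiles and the median are all special cases of the percentile formula, a single routine covers them. The following is a minimal sketch applied to the company profits table; the function name `grouped_percentile` and its parameters are illustrative assumptions.

```python
def grouped_percentile(p, lower_limits, freqs, width):
    """p-th percentile of grouped data: L + (pN/100 - pcf) / f * i."""
    target = p * sum(freqs) / 100
    pcf = 0  # cumulative frequency of the classes preceding the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and pcf + f >= target:
            return lower + (target - pcf) / f * width
        pcf += f
    raise ValueError("frequencies must contain at least one positive count")

# Profits table: classes 20-30, 30-40, ..., 90-100 with their frequencies.
lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
freqs = [4, 8, 18, 30, 15, 10, 8, 7]

q1 = grouped_percentile(25, lower_limits, freqs, width=10)
q2 = grouped_percentile(50, lower_limits, freqs, width=10)
q3 = grouped_percentile(75, lower_limits, freqs, width=10)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

With p = 25 and p = 75 the routine reproduces the quartiles worked out by hand for this table (47.22 and 70), and p = 50 gives the median of the same distribution.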
1.7 Mode
The mode is the value with the highest frequency.
For ungrouped data, e.g. 1, 2, 3, 4, 5, 5, 5, the mode is 5.
Example 2.9
Find the mean, median, mode, and range for the following list of values: 13, 18, 13, 14,
13, 16, 14, 21, 13
Solution
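The mean is the sum of the values divided by how many there are:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 135 ÷ 9 = 15
Note that the mean, in this case, isn’t a value from the original list. This is a common
result. You should not assume that your mean will be one of your original numbers.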
The median is the middle value, so first we’ll have to rewrite the list in numerical order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th
number:
13, 13, 13, 13, 14, 14, 16, 18, 21
So the median is 14.
The mode is the number that is repeated more often than any other, so 13 is the mode,
since 13 is being repeated 4 times.
The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
Mean: 15 | median: 14 | mode: 13 | range: 8
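The four summary values for this list can be confirmed with Python's standard `statistics` module. This is a small sketch; the variable names are illustrative.

```python
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle value of the sorted list
mode_value = statistics.mode(values)      # most frequently occurring value
value_range = max(values) - min(values)   # largest value minus smallest value

print(mean_value, median_value, mode_value, value_range)
```

The library functions sort and count internally, so the list can be passed in its original order.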
Grouped data
$$\text{Mode} = l_m + \left(\frac{\partial_1}{\partial_1 + \partial_2}\right) c$$
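$$= \frac{l_m(\partial_1 + \partial_2) + \partial_1 c}{\partial_1 + \partial_2}$$
$$= \frac{l_m(\partial_1 + \partial_2)}{\partial_1 + \partial_2} + \frac{\partial_1 c}{\partial_1 + \partial_2}$$
$$\text{Mode} = l_m + \frac{\partial_1 c}{\partial_1 + \partial_2}$$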
c)
Class Frequencies
58-62 12
63-67 12
68-72 25
73-77 27
78-82 10
83-87 9
88-92 3
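The grouped-mode formula can be tried on the frequencies listed in this table. The sketch below rests on assumptions that the notes do not spell out here: ∂1 is taken as the modal class frequency minus the frequency of the class before it, ∂2 as the modal class frequency minus the frequency of the class after it (the usual textbook convention), the lower class boundaries are the stated lower limits minus 0.5, and the 83-87 class is counted once. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: l_m + d1 / (d1 + d2) * c for the modal class."""
    m = freqs.index(max(freqs))  # index of the (first) class with highest frequency
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - before   # assumed meaning of the first difference
    d2 = freqs[m] - after    # assumed meaning of the second difference
    return lower_boundaries[m] + d1 / (d1 + d2) * width

# Classes 58-62, 63-67, ..., 88-92 as listed in the table.
lower_boundaries = [57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5]
freqs = [12, 12, 25, 27, 10, 9, 3]
mode_value = grouped_mode(lower_boundaries, freqs, width=5)
print(round(mode_value, 2))
```

Under these assumptions the modal class is 73-77, with differences 27 − 25 = 2 and 27 − 10 = 17, so the estimate lies close to the lower boundary of that class. This value is computed by the sketch and is not a result quoted from the notes.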
$$\log(\text{G.M.}) = \frac{\log X_1 + \log X_2 + \cdots + \log X_N}{N} = \frac{\sum \log X}{N}$$
$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log X}{N}\right)$$
Example 2.10
Compared to the previous year the overhead expenses went up by 32% in 2006; they
increased by 40% in the next year and by 50% in the following year. Calculate the
average rate of increase in the overhead expenses over the three years.
Solution
When averaging ratios and percentages, the geometric mean is more appropriate. Applying
the geometric mean here:
% Rise   Expenses at the end of the year,         log X
         taking the preceding year as 100 (X)
32       132                                      2.1206
40       140                                      2.1461
50       150                                      2.1761
                                                  Σ log X = 6.4428
$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log X}{N}\right) = \text{antilog}\left(\frac{6.4428}{3}\right) = \text{antilog}(2.1476) = 140.5$$
Average rate of increase in overhead expenses = 140.5 – 100 = 40.5%.
Example 2.11
The annual rates of growth of output of a factory in 5 years are 5.0, 7.5, 2.5, 5.0, and 10.0
respectively. What is the compound rate of growth of output per annum for the period?
Solution
When averaging ratios and percentages, the geometric mean is more appropriate. Applying
the geometric mean here:
Annual rate   Output relatives at the   log X
of growth     end of the year (X)
5.0           105.0                     2.0212
7.5           107.5                     2.0314
2.5           102.5                     2.0107
5.0           105.0                     2.0212
10.0          110.0                     2.0414
                                        Σ log X = 10.1259
$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log X}{N}\right) = \text{antilog}\left(\frac{10.1259}{5}\right) = \text{antilog}(2.0252) \approx 105.9$$
The compound rate of growth of output per annum for the period = 105.9 – 100 = 5.9%.
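The log-and-antilog route used in the two geometric mean examples above can be reproduced without tables. This is a minimal sketch using base-10 logarithms from the standard `math` module; the function name `geometric_mean` and the variable names are illustrative.

```python
import math

def geometric_mean(values):
    """Antilog (base 10) of the mean of the base-10 logarithms of the values."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

# Overhead expenses example: each year's expenses with the preceding year as 100.
gm_overheads = geometric_mean([132, 140, 150])

# Factory output example: output relatives for the five years.
gm_output = geometric_mean([105.0, 107.5, 102.5, 105.0, 110.0])

print(round(gm_overheads, 2), round(gm_output, 2))
```

Evaluated in floating point, the first result rounds to 140.5, and the second comes out slightly above the 105.9 obtained from four-figure log tables; differences of this size are expected when logarithms are rounded to four decimal places.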
Example 2.12
(a) Calculate harmonic mean of numbers 10, 20, 25, 40, 50.
(b) Calculate harmonic mean from the following frequency distribution:
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
No. of students 8 15 20 4 3
Solution
(a)
X     1/X
10    0.100
20    0.050
25    0.040
40    0.025
50    0.020
      Σ(1/X) = 0.235
$$\text{Harmonic Mean} = \frac{N}{\sum \frac{1}{X}} = \frac{5}{0.235} = 21.28$$
(b)
Marks      X     f     f(1/X)
0 – 10     5     8     1.600
10 – 20    15    15    1.000
20 – 30    25    20    0.800
30 – 40    35    4     0.114
40 – 50    45    3     0.067
                 N = Σf = 50    Σ(f/X) = 3.581
$$\text{Harmonic Mean} = \frac{N}{\sum \frac{f}{X}} = \frac{50}{3.581} = 13.96$$
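Both parts of this example divide the number of observations by a sum of reciprocals, so one routine with optional frequencies handles them. The sketch below uses an illustrative function name, `harmonic_mean`, and takes class midpoints as the X values in part (b), as in the table.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: N / sum(f / X); every frequency is 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Part (a): ungrouped numbers.
hm_a = harmonic_mean([10, 20, 25, 40, 50])

# Part (b): class midpoints with the number of students in each class.
hm_b = harmonic_mean([5, 15, 25, 35, 45], [8, 15, 20, 4, 3])

print(round(hm_a, 2), round(hm_b, 2))
```

All values must be non-zero, since the method relies on reciprocals.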