Chapter Five
Learning objectives
By the end of this unit, you will be able to:
~ Describe the concept of statistical average
~ Calculate mean, median and mode of discrete and continuous data types
~ Compute and interpret measures of dispersion/variation.
INTRODUCTION
Graphical representation is a good way to present summarized data. However, graphs provide
only an overview and cannot readily be used for further analysis. Hence, we use summary
statistics, such as averages, to analyze the data. Mass data that has been collected, classified,
tabulated and presented systematically is analyzed further to reduce it to a single
representative figure. This single figure lies in the central part of the range of all values
and represents the entire data set. Hence, it is called the measure of central tendency.
In other words, the tendency of data to cluster around a centrally located figure is
known as central tendency. A measure of central tendency, or average of the first order, describes the
concentration of a large number of values around a particular value. It is a single value that represents all
units.
The statistical average, or simply an average, is a measure of the middle value of the data set.
The objectives of statistical average are to:
➢ Condense mass data
The mass data is condensed to make it readable and usable for further analysis.
➢ Facilitate comparison
It is difficult to compare two different sets of mass data. But we can compare those two after
computing the averages of individual data sets. While comparing, the same measure of average
should be used. For example, comparing the mean salary of one group of employees with the
median salary of another leads to incorrect conclusions.
➢ Draw inferences
Averages can be used to draw inferences about unknown relationships between data sets.
Computing the averages of sample data sets also helps in estimating the average of the population.
➢ Support decision-making
In many fields, such as business, finance, insurance and other sectors, managers compute
averages and draw useful inferences or conclusions for taking effective decisions.
ARITHMETIC MEAN
Arithmetic mean is defined as the sum of all values divided by the number of values and is
represented by $\bar{X}$.
✓ For ungrouped data without frequency, the arithmetic mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where n is the number of values.
Example: Find out the arithmetic mean of 15, 17, 22, 21, 19, 26 and 20.
Solution: The arithmetic mean is given by:
$$\bar{X} = \frac{15 + 17 + 22 + 21 + 19 + 26 + 20}{7} = \frac{140}{7} = 20$$
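The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and `statistics.mean` from the standard library is used purely as a cross-check on the manual sum-divided-by-count definition.

```python
from statistics import mean

# The seven values from the example above.
values = [15, 17, 22, 21, 19, 26, 20]

# Definition: sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

# Cross-check with the standard library.
library_mean = mean(values)

print(manual_mean, library_mean)
```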
✓ For discrete data with frequency $f_i$, the arithmetic mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
Example: The data in the table below shows the number of students with respect to their age.
Calculate the arithmetic mean of the students’ age.
Example: The table below shows the distribution of students according
to height. Find the arithmetic mean of the height of students.
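A frequency-weighted mean of this kind might be sketched in Python as follows. The data below are hypothetical and are NOT the tables from these examples; the helper names `frequency_mean` and `class_midpoints` are illustrative, and the grouped case assumes the usual convention of representing each class by its midpoint.

```python
def frequency_mean(values, freqs):
    """Mean of values x_i weighted by frequencies f_i: sum(f*x) / sum(f)."""
    total_freq = sum(freqs)
    if total_freq == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_freq


def class_midpoints(class_limits):
    """Midpoint of each (lower, upper) class, used as the class representative."""
    return [(lower + upper) / 2 for lower, upper in class_limits]


# Hypothetical discrete data: ages and number of students at each age.
ages = [18, 19, 20, 21]
counts = [5, 10, 3, 2]
age_mean = frequency_mean(ages, counts)

# Hypothetical grouped data: height classes in cm and number of students.
height_classes = [(150, 155), (155, 160), (160, 165)]
height_counts = [4, 10, 6]
height_mean = frequency_mean(class_midpoints(height_classes), height_counts)

print(age_mean, height_mean)
```

For grouped data the result is an approximation, because the individual values inside each class are replaced by the class midpoint.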
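Arithmetic mean is capable of further algebraic treatment. Suppose $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ are the
means of k sets of values containing $n_1, n_2, \ldots, n_k$ values respectively. Then their combined
arithmetic mean is given by:
$$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$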
Example: If average height of 30 men is 158 cm and average height of another group of 40
men is 162 cm, find the average height of the combined group.
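Solution: Given $n_1 = 30$, $\bar{X}_1 = 158$ cm, $n_2 = 40$ and $\bar{X}_2 = 162$ cm, the combined mean is
$$\bar{X} = \frac{30 \times 158 + 40 \times 162}{30 + 40} = \frac{11220}{70} \approx 160.29 \text{ cm}$$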
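The combined mean weights each group mean by its group size. A minimal Python sketch, with `combined_mean` as an illustrative helper name, applied to the two groups of men above:

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(sizes)
    if total == 0:
        raise ValueError("group sizes must sum to a positive number")
    return sum(n * m for n, m in zip(sizes, means)) / total


# 30 men averaging 158 cm and 40 men averaging 162 cm.
combined = combined_mean([30, 40], [158, 162])
print(round(combined, 2))  # 160.29
```

Note that the result is closer to 162 than to 158, because the second group is larger.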
2. The average weight of 100 screws in box ‘A’ is 10.4 g. They are mixed with 150 screws from box
‘B’. The average weight of the mixed screws is 10.9 g. Find the average weight of the screws in
box ‘B’.
3. A clerk calculated arithmetic mean of 50 values as 39.2. However, it was found that instead
of taking two values as 25 and 32, he took them as 52 and 23. Find the corrected
arithmetic mean.
4. Find the missing frequency for the distribution below, given the mean value as 129.
MODE: the value of a variable that occurs most often in a data set, denoted by $\hat{X}$. The modal
value is particularly useful in business. For example, shoe and ready-made garment manufacturers
want to know the modal size of their customers in order to plan their operations. For discrete data, with or
without frequencies, the mode is the value with the highest frequency.
Example: The following data relate to size of shoes. Find the modal value.
6, 7, 6, 8, 9, 9, 9, 10, 8, 7, 7, 9, 10, 9, 9, 9, 8, 8, 11
Modal value is 9, which is the most repeating value in the series.
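The count behind this answer can be reproduced with `collections.Counter` from the Python standard library. The sketch below is illustrative; it collects every value tied for the highest frequency, since a data set may have more than one mode.

```python
from collections import Counter

# Shoe sizes from the example above.
sizes = [6, 7, 6, 8, 9, 9, 9, 10, 8, 7, 7, 9, 10, 9, 9, 9, 8, 8, 11]

# Frequency of each distinct size.
frequency = Counter(sizes)

# Keep every size whose frequency equals the highest frequency.
highest = max(frequency.values())
modes = sorted(size for size, count in frequency.items() if count == highest)

print(modes, highest)  # [9] 7
```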
In case of continuous series or grouped data, the mode is given by
$$\hat{X} = L + \frac{F_m - F_p}{2F_m - F_p - F_s} \times C$$
Where, L – LCB of the modal class, C – Class width, Fm – maximum frequency in the frequency
distribution, Fp – Frequency preceding Fm, Fs – Frequency succeeding Fm
Example: An apartment builder wants to know the base area that most customers wish to have
for their apartments. Find the modal base area.
Table: Customers wishing to have base area
Base Area (sq ft)    No. of Customers
600 – 800            4
800 – 1000           10
1000 – 1200          15 (fp)
1200 – 1400          25 (fm)
1400 – 1600          12 (fs)
1600 – 1800          8
Above 1800           2
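One way to work this example is sketched below in Python. It assumes the standard grouped-mode interpolation, mode = L + (Fm - Fp) / (2Fm - Fp - Fs) × C, with the symbols defined above; `grouped_mode` is an illustrative helper name. The open-ended class "Above 1800" has no defined width, which does not matter here because it is not the modal class.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Approximate mode of grouped data: L + (Fm - Fp) / (2Fm - Fp - Fs) * C."""
    # Index of the modal class (the class with the highest frequency).
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    fm = freqs[m]
    fp = freqs[m - 1] if m > 0 else 0               # frequency preceding Fm
    fs = freqs[m + 1] if m < len(freqs) - 1 else 0  # frequency succeeding Fm
    denominator = 2 * fm - fp - fs
    if denominator == 0:
        raise ValueError("mode is not defined by this formula for flat frequencies")
    return lower_bounds[m] + (fm - fp) / denominator * width


# Lower class boundaries (sq ft) and customer counts from the table.
lower_bounds = [600, 800, 1000, 1200, 1400, 1600, 1800]
customers = [4, 10, 15, 25, 12, 8, 2]

modal_area = grouped_mode(lower_bounds, 200, customers)
print(round(modal_area, 2))  # 1286.96
```

With L = 1200, Fm = 25, Fp = 15, Fs = 12 and C = 200, the modal base area comes to about 1286.96 sq ft.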
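MEDIAN: the middle value of a data set when the values are arranged in ascending order of
magnitude. The median is denoted by $\tilde{X}$.
✓ For ungrouped data, first we arrange the values in ascending or descending order and then
take the $\left(\frac{n+1}{2}\right)$th value if n is odd, or the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th values if n is even.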
✓ For continuous data, first we compute the less-than cumulative frequency, <Cf, and determine the
median class by searching for the class containing the middle value, that is, the $\left(\frac{N}{2}\right)$th value. Then we
apply the formula below to approximate the median:
$$\tilde{X} = L + \frac{\frac{N}{2} - {<}Cf}{f} \times C$$
Where L – LCB of the median class, f – Actual frequency of the median class,
C – Class width, <Cf – cumulative frequency just before the median class
Example: Find the median value of the following set of values 45, 32, 31, 46, 40, 28, 27, 37, 36,
41, 47, 50.
Solution: Arranging in ascending order, we get:
27, 28, 31, 32, 36, 37, 40, 41, 45, 46, 47, 50 and we have, n = 12
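$$\tilde{x} = \frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}}{2}$$
$$\tilde{x} = \frac{\left(\frac{12}{2}\right)\text{th value} + \left(\frac{12}{2}+1\right)\text{th value}}{2}$$
$$\tilde{x} = \frac{6\text{th value} + 7\text{th value}}{2} = \frac{37 + 40}{2} = 38.5$$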
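The procedure for ungrouped data (sort, then take the middle value or the mean of the two middle values) can be sketched in Python as below. `manual_median` is an illustrative name, and `statistics.median` is included only as a cross-check.

```python
from statistics import median


def manual_median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # ((n + 1) / 2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and (n/2 + 1)th values


# The twelve values from the example above.
data = [45, 32, 31, 46, 40, 28, 27, 37, 36, 41, 47, 50]

print(manual_median(data), median(data))  # 38.5 38.5
```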
Example: Find the median value for the data shown below.
X 12 16 10 14 17 20 15
f 4 9 3 5 4 2 10
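Solution: In this problem, we have n = 37. Arranging the values of X in ascending order and
computing the less-than cumulative frequencies, we get:
X    10 12 14 15 16 17 20
f     3  4  5 10  9  4  2
<Cf   3  7 12 22 31 35 37
Since n is odd, the median is the $\left(\frac{n+1}{2}\right)$th = 19th value. The cumulative frequency first
reaches 19 at X = 15, so the median is 15.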
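For discrete data with frequencies, the cumulative frequencies only locate the median if the values of X are first arranged in ascending order, which matters here because the table lists them out of order. A minimal Python sketch of that procedure follows; `frequency_median` is an illustrative helper name.

```python
def frequency_median(values, freqs):
    """Median of discrete data given as values with frequencies."""
    # Sort by value first, so cumulative frequencies follow ascending order of X.
    pairs = sorted(zip(values, freqs))
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")

    def kth_value(k):
        """Return the k-th smallest observation (1-based)."""
        cumulative = 0
        for x, f in pairs:
            cumulative += f
            if cumulative >= k:
                return x

    if n % 2 == 1:
        return kth_value((n + 1) // 2)
    return (kth_value(n // 2) + kth_value(n // 2 + 1)) / 2


# Data from the example above, in the order given in the table.
x = [12, 16, 10, 14, 17, 20, 15]
f = [4, 9, 3, 5, 4, 2, 10]

print(sum(f), frequency_median(x, f))
```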
Solution: As the class intervals are of the exclusive type, we organize the data as shown in the table below.
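The $\left(\frac{n}{2}\right)$th item, that is, the $\left(\frac{100}{2}\right)$th = 50th item, is found in the class [40 – 45],
which is therefore the median class.
$$\tilde{X} = L + \frac{\frac{N}{2} - {<}Cf}{f} \times C$$
$$\tilde{X} = 40 + \frac{\frac{100}{2} - 25}{40} \times 5 = 40 + \frac{25}{8} = 40 + 3.125 = 43.125$$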
Hence, 50% of the weights are less than 43.125 kg.
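The interpolation step can be written as a small Python function. This is a sketch only: `grouped_median` and its parameter names are illustrative, and the inputs are the figures used in the solution above (L = 40, N = 100, cumulative frequency before the median class = 25, f = 40, C = 5).

```python
def grouped_median(lower_bound, total, cum_before, class_freq, width):
    """Approximate median of grouped data: L + (N/2 - <Cf) / f * C."""
    if class_freq <= 0:
        raise ValueError("median class frequency must be positive")
    return lower_bound + (total / 2 - cum_before) / class_freq * width


median_weight = grouped_median(
    lower_bound=40,   # L: lower class boundary of the median class
    total=100,        # N: total number of observations
    cum_before=25,    # <Cf: cumulative frequency just before the median class
    class_freq=40,    # f: frequency of the median class
    width=5,          # C: class width
)
print(median_weight)  # 43.125
```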
SYMMETRY (SKEWNESS) OF DISTRIBUTIONS
The symmetry of a (uni-modal) distribution can be characterized and described based on the shape of its
frequency polygon and the relative position of its mean, median and mode. A distribution is said to be
symmetrical if its frequency polygon can be folded along a vertical line (ordinate) so that the two halves
of the figure coincide. In other words, for such distributions, values equidistant from the mean have equal
frequencies. An important type of symmetrical curve is the bell-shaped curve, which has a single
smooth hump in the middle and tails off gradually at either end. For these types of symmetrical
distributions, the mode, median, and mean have the same value. A distribution is said to be skewed
if it lacks symmetry, that is, if it is asymmetric. In a skewed distribution, observations tend to pile up at one or
the other end of the distribution. Thus the frequency curve of a skewed distribution may have a long tail to the
positive (right) side, in which case it is said to be a positive skew curve (and the distribution is known as a
positively skewed distribution), or to the negative (left) side, in which case it is known as a negative skew
curve (and the distribution is known as a negatively skewed distribution). We can identify the skewness of a
distribution by using its mode, mean and median.
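As a rough illustration of comparing these measures, the Python sketch below labels the direction of skew by checking whether the mean lies above or below the median. The rule used here (mean greater than median suggests a positive skew, mean less than median a negative skew) is a common rule of thumb and an assumption of this sketch, not a rule stated in the text; the function name `skew_direction` and the sample data are hypothetical.

```python
from statistics import mean, median


def skew_direction(values, tolerance=1e-9):
    """Heuristic label for skewness based on the sign of mean - median."""
    difference = mean(values) - median(values)
    if abs(difference) <= tolerance:
        return "approximately symmetric"
    # A long right tail pulls the mean above the median, and vice versa.
    return "positively skewed" if difference > 0 else "negatively skewed"


# Hypothetical data sets for illustration only.
print(skew_direction([1, 2, 3, 4, 5]))   # approximately symmetric
print(skew_direction([1, 2, 2, 3, 10]))  # positively skewed
print(skew_direction([1, 8, 9, 9, 10]))  # negatively skewed
```

This check is only indicative, since some asymmetric data sets happen to have equal mean and median.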
The following diagram depicts what the physical shape of the frequency distribution looks like in each case.