Lec 2
The individual observations of a distribution or a data set are found to have a general tendency to cluster
around a certain point, somewhere at the center of the distribution. For example, if we observe the
distribution of the heights of a group of students in a class, the heights of most of the students are close to a
certain central value. This tendency of the observations of a distribution to cluster or concentrate around
the center of the distribution is called central tendency and its numerical measures are known as the
measures of central tendency.
Different measures of central tendency are:
1. Mean
(a) Arithmetic mean (AM)
(b) Geometric mean (GM)
(c) Harmonic mean (HM)
2. Median
3. Mode
The main purpose of measuring central tendency of a distribution is to determine such a value, which can
be considered to be a representative one. An ideal measure of central tendency should, therefore, have the
following characteristics:
● It should be rigidly defined.
● It should be based on all the observations.
● It should be readily comprehensible and easy to calculate.
● It should be suitable for further algebraic treatment.
● It should be least affected by sampling fluctuations.
Arithmetic Mean :
Arithmetic mean of a set of observations is their sum divided by the number of observations.
The arithmetic mean may be of two types :
(a) Simple Arithmetic mean
(b) Weighted Arithmetic mean
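(a) Simple Arithmetic Mean : The arithmetic mean of n observations x1, x2, ..., xn is given by
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
In the case of a frequency distribution, let f1, f2, ..., fn be the frequencies of x1, x2, ..., xn respectively; the arithmetic
mean is then obtained as
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i \]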
(b) Weighted Arithmetic Mean: In practice all values of a series may not carry equal weight or
importance. For example, if we want to have an idea of the change in cost of living of a certain group of
people, the simple mean of prices of the commodities consumed by them will not do, since all the
commodities are not equally important, e.g., rice, sugar and wheat are more important than confectionery
items, coffee, tea, etc.
Let w1, w2, ..., wn be the weights attached to the items x1, x2, ..., xn respectively. The weighted arithmetic mean is
computed as
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \]
where the sums run over i = 1, 2, ..., n.
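A minimal sketch of the weighted mean, sum(wi xi) / sum(wi), follows. The prices and weights are hypothetical figures chosen only for illustration, and the function name `weighted_mean` is an assumption, not something taken from the lecture.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical prices (Tk. per unit) and importance weights, for illustration only.
prices = [40.0, 60.0, 30.0, 200.0]   # e.g. rice, sugar, wheat, coffee
weights = [5, 2, 2, 1]               # staples weighted more heavily

simple = sum(prices) / len(prices)         # every item counts equally
weighted = weighted_mean(prices, weights)  # staples dominate the average
print(simple, weighted)
```

With these figures the simple mean is pulled up by the one expensive, less important item, while the weighted mean stays closer to the prices of the staples.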
= Proved.
2. Arithmetic mean is dependent on change of origin and scale.
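Proof : Let x1, x2, ..., xn be the values of a variable x. Let us change the origin to an arbitrary value 'a' and
change the scale by dividing by 'h'. The values of the new variable are
\[ u_i = \frac{x_i - a}{h}, \qquad i = 1, 2, \ldots, n \]
Now,
\[ x_i = a + h u_i \quad\Rightarrow\quad \sum x_i = na + h \sum u_i \]
or,
\[ \bar{x} = a + h\bar{u} \]
Hence proved.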
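The effect of changing origin and scale can be checked numerically. The sketch below uses hypothetical observations and arbitrary choices of a and h; the names `x`, `u` and `recovered` are illustrative only.

```python
def mean(values):
    """Simple arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical observations; 'a' (new origin) and 'h' (scale) are arbitrary choices.
x = [50.0, 55.0, 65.0, 70.0, 90.0]
a, h = 65.0, 5.0

# Transformed variable u_i = (x_i - a) / h
u = [(xi - a) / h for xi in x]

# The mean of x is recovered from the mean of u as a + h * mean(u).
recovered = a + h * mean(u)
print(mean(x), mean(u), recovered)
```

Working with the smaller values of u and converting back at the end is what makes hand computation with large class mid-values manageable.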
3. The sum of the squares of the deviations of a set of values from their arithmetic mean is the
minimum.
Proof:
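Let \bar{x} be the arithmetic mean of a set of observations x1, x2, ..., xn with frequencies f1, f2, ..., fn
respectively, and let N = f1 + f2 + ... + fn. Now the sum of squares of deviations from an arbitrary value 'a' is
\[ \sum f_i (x_i - a)^2 = \sum f_i \{(x_i - \bar{x}) + (\bar{x} - a)\}^2 = \sum f_i (x_i - \bar{x})^2 + 2(\bar{x} - a)\sum f_i (x_i - \bar{x}) + N(\bar{x} - a)^2 \]
Since \sum f_i (x_i - \bar{x}) = 0, this gives
\[ \sum f_i (x_i - a)^2 = \sum f_i (x_i - \bar{x})^2 + N(\bar{x} - a)^2 \]
or,
\[ \sum f_i (x_i - a)^2 \ge \sum f_i (x_i - \bar{x})^2 \]
i.e., the sum of squares of the deviations is the minimum when it is taken about the arithmetic mean. Proved.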
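As a numerical illustration of this property, the sketch below compares the sum of squared deviations about the mean with the same sum about a few other values. The data and the helper name `sum_of_squares` are hypothetical.

```python
def sum_of_squares(values, freqs, a):
    """Frequency-weighted sum of squared deviations of the values from 'a'."""
    return sum(f * (x - a) ** 2 for x, f in zip(values, freqs))


# Hypothetical values and frequencies, for illustration only.
values = [2.0, 4.0, 6.0, 8.0]
freqs = [1, 3, 4, 2]

n_total = sum(freqs)
x_bar = sum(f * x for x, f in zip(values, freqs)) / n_total
at_mean = sum_of_squares(values, freqs, x_bar)

for a in (3.0, 5.0, x_bar, 6.0, 9.0):
    about_a = sum_of_squares(values, freqs, a)
    # The excess over the minimum equals N * (x_bar - a)^2.
    print(a, about_a, at_mean + n_total * (x_bar - a) ** 2)
```

Each printed pair agrees up to rounding, and no choice of a gives a smaller sum than the mean does.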
Proof:
The mean,
Example 1. The daily wages of a group of farm workers are shown in the following frequency
distribution.
Daily wages (Tk.)    Number of workers
50-55                5
55-60                10
60-65                25
65-70                35
70-75                15
75-80                7
80-85                3
Direct Method :
Daily wages (Tk.)    Number of workers fi    Mid value xi    fixi
50-55                5                       52.5            262.5
55-60                10                      57.5            575.0
60-65                25                      62.5            1562.5
65-70                35                      67.5            2362.5
70-75                15                      72.5            1087.5
75-80                7                       77.5            542.5
80-85                3                       82.5            247.5
Total                100                                     6640.0
\[ \bar{x} = \frac{\sum f_i x_i}{N} = \frac{6640.0}{100} = \text{Tk. } 66.40 \]
∴ Average daily wage is Tk. 66.40
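The direct-method table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in Example 1; the variable names are illustrative.

```python
# Class limits (Tk.) and number of workers from Example 1.
classes = [(50, 55), (55, 60), (60, 65), (65, 70), (70, 75), (75, 80), (80, 85)]
workers = [5, 10, 25, 35, 15, 7, 3]

# Mid value of each class, then the f_i * x_i column.
mid_values = [(low + high) / 2 for low, high in classes]
products = [f * x for f, x in zip(workers, mid_values)]

total_workers = sum(workers)
total_wages = sum(products)
mean_wage = total_wages / total_workers

print(total_workers, total_wages, mean_wage)
```

The totals match the last row of the table (100 and 6640.0), and the mean agrees with Tk. 66.40.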
Indirect Method:
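In the case of a frequency distribution of positive observations, when f1, f2, ..., fn are the frequencies of
x1, x2, ..., xn respectively and N = f1 + f2 + ... + fn, the geometric mean is
\[ GM = \left( x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n} \right)^{1/N} = \text{Antilog}\left[ \frac{1}{N} \sum f_i \log x_i \right] \]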
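The logarithmic form of the geometric mean can be sketched as below, assuming positive observations. The data are hypothetical, and the function name `geometric_mean_grouped` is illustrative.

```python
import math


def geometric_mean_grouped(values, freqs):
    """GM = antilog of the frequency-weighted mean of log10(x_i)."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs positive observations")
    n_total = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** mean_log  # taking the antilog


# Hypothetical values and frequencies, for illustration only.
values = [2.0, 4.0, 8.0]
freqs = [1, 2, 1]

via_logs = geometric_mean_grouped(values, freqs)
# Same quantity from the product form (x1^f1 * x2^f2 * ...)^(1/N).
via_product = math.prod(x ** f for x, f in zip(values, freqs)) ** (1 / sum(freqs))
print(via_logs, via_product)
```

Both routes give the same value up to rounding; the log form avoids very large intermediate products when N is large.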
Advantages of Geometric Mean :
● It is rigidly defined.
● It is based upon all the observations.
● It is not affected much by sampling fluctuations.
● It is suitable for further algebraic treatments.
● It is the most suitable average in measuring the rate of change.
= 10.9 (Approx.)
The average rate of change of yield of the new variety of wheat is 10.9%.
If f1, f2, ..., fn are respectively the frequencies of the non-zero observations x1, x2, ..., xn, then
\[ HM = \frac{N}{\sum_{i=1}^{n} \dfrac{f_i}{x_i}}, \qquad N = \sum_{i=1}^{n} f_i \]
Advantages of Harmonic Mean :
● It is rigidly defined.
● It is based upon all the observations.
● It is less affected by sampling fluctuations.
● It is not affected much by extreme values.
Example 3.
The frequency distribution of profit per share of 10 companies is given below:
Profit per share (Tk.)    Frequency fi    Mid-value xi    fi / xi
0-5                       1               2.5             0.4000
5-10                      2               7.5             0.2667
10-15                     4               12.5            0.3200
15-20                     2               17.5            0.1143
20-25                     1               22.5            0.0444
Total                     10                              1.1454
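The fi / xi column and the resulting harmonic mean, taken as N divided by the sum of fi / xi, can be checked with a short sketch. The variable names are illustrative.

```python
# Mid-values (Tk.) and frequencies from the table in Example 3.
mid_values = [2.5, 7.5, 12.5, 17.5, 22.5]
freqs = [1, 2, 4, 2, 1]

# The f_i / x_i column and its total.
ratios = [f / x for f, x in zip(freqs, mid_values)]
reciprocal_sum = sum(ratios)

# Harmonic mean: total frequency divided by the sum of f_i / x_i.
harmonic_mean = sum(freqs) / reciprocal_sum

print([round(r, 4) for r in ratios])
print(round(reciprocal_sum, 4), round(harmonic_mean, 2))
```

The column total reproduces 1.1454, which gives a harmonic mean of about Tk. 8.73 per share.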
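By definition, for two positive observations x1 and x2,
\[ A = \frac{x_1 + x_2}{2}, \qquad G = \sqrt{x_1 x_2} \]
and
\[ H = \frac{2}{\dfrac{1}{x_1} + \dfrac{1}{x_2}} = \frac{2 x_1 x_2}{x_1 + x_2} \]
(i) Since any square quantity is always non-negative,
\[ (\sqrt{x_1} - \sqrt{x_2})^2 \ge 0 \]
or,
\[ x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0 \]
or,
\[ x_1 + x_2 \ge 2\sqrt{x_1 x_2} \qquad ................................ (1) \]
or,
\[ \frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}, \text{ i.e., } A \ge G \qquad ................................ (2) \]
Again, multiplying both sides of (1) by \sqrt{x_1 x_2} / (x_1 + x_2), which is positive,
\[ \sqrt{x_1 x_2} \ge \frac{2 x_1 x_2}{x_1 + x_2} \]
or, H ≤ G i.e., G ≥ H ................................ (3)
From (2) and (3), A ≥ G ≥ H.
(ii)
\[ A \cdot H = \frac{x_1 + x_2}{2} \times \frac{2 x_1 x_2}{x_1 + x_2} = x_1 x_2 = G^2 \]
∴ A.H = G² Proved.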
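Both results for two positive observations, A ≥ G ≥ H and A.H = G², can be spot-checked numerically. The pair of values below is hypothetical, and the function name `two_value_means` is illustrative.

```python
import math


def two_value_means(x1, x2):
    """Return (A, G, H) for two positive observations."""
    if x1 <= 0 or x2 <= 0:
        raise ValueError("both observations must be positive")
    arithmetic = (x1 + x2) / 2
    geometric = math.sqrt(x1 * x2)
    harmonic = 2 * x1 * x2 / (x1 + x2)
    return arithmetic, geometric, harmonic


# Hypothetical pair of observations, for illustration only.
A, G, H = two_value_means(4.0, 9.0)
print(A, G, H)        # ordering A >= G >= H
print(A * H, G ** 2)  # the two products agree up to rounding
```

Equality in A ≥ G ≥ H holds only when the two observations are equal, which can be seen by calling the function with x1 = x2.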
Median:
The median of a distribution is the value of the variable which divides the distribution into two equal
parts if arranged in order of magnitude. Median is the value such that the number of observations above it
is equal to the number of observations below it. Thus median is the middlemost value of an ordered set
and as such a positional average.
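In the case of ungrouped data, when the number of observations n is odd, the ((n + 1)/2)th observation in the
ordered series will be the median. Again, when the number of observations n is even, the median will be the
arithmetic mean of the (n/2)th and the (n/2 + 1)th observations.
In the case of a frequency distribution, the median is given by
\[ Me = L_m + \frac{\dfrac{N}{2} - F_m}{f_m} \times c \]
where,
Lm = lower limit of the median class.
N = total frequency
Fm = cumulative frequency of the class preceding the median class
fm = frequency of the median class
c = width of the median class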
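For ungrouped data, the positional rule (the middle value when n is odd, the average of the two middle values when n is even) can be sketched as follows. The sample data are hypothetical, and the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics


def median_ungrouped(observations):
    """Median of raw data using the odd/even positional rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th ordered value (1-based position)
        return ordered[middle]
    # n even: mean of the (n / 2)th and (n / 2 + 1)th ordered values
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical series, for illustration only.
odd_series = [7, 3, 9, 5, 1]
even_series = [7, 3, 9, 5]
print(median_ungrouped(odd_series), statistics.median(odd_series))
print(median_ungrouped(even_series), statistics.median(even_series))
```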
Advantages of Median:
● It is rigidly defined.
● It is easily understood and easy to compute.
● It is not influenced by extreme items.
● It can be calculated for distributions with open-ended classes.
● It can be used in defining the median of attributes.
Disadvantages of Median:
● It is not based upon all the observations.
● It is not suitable for further algebraic treatment.
● It is affected much by the sampling fluctuation.
Uses of Median:
● It is used in case of both quantitative and qualitative data.
● It is used for calculating the typical value in problems concerning wages, distribution of wealth,
etc.
Quantiles:
Quantiles are also positional (location) measures of a distribution. Quantiles are those values in a
series, which divide the whole distribution into a number of equal parts when the series is arranged in
order of magnitude of observations. The following are the quantiles that are used in Statistics -
(i) Quartiles, (ii) Deciles and (iii) Percentiles.
3 quartiles: Qi (i = 1, 2, 3); divide the whole distribution into four equal parts.
9 deciles: Dj (j = 1, 2, ..., 9); divide the whole distribution into 10 equal parts.
99 percentiles: Pk (k = 1, 2, ..., 99); divide the whole distribution into 100 equal parts.
Computation of quantiles from a frequency distribution is very similar to that of the median. We
first need to identify the corresponding quantile class. The classes having cumulative frequencies equal to
or immediately higher than iN/4, jN/10 and kN/100 are respectively the ith quartile class, the jth decile
class and the kth percentile class.
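The quantiles are then obtained as
\[ Q_i = L_i + \frac{\dfrac{iN}{4} - F_i}{f_i} \times c \; ; \quad i = 1, 2, 3 \]
\[ D_j = L_j + \frac{\dfrac{jN}{10} - F_j}{f_j} \times c \; ; \quad j = 1, 2, \ldots, 9 \]
\[ P_k = L_k + \frac{\dfrac{kN}{100} - F_k}{f_k} \times c \; ; \quad k = 1, 2, \ldots, 99 \]
where, in each case, L is the lower limit of the quantile class, F is the cumulative frequency of the class
preceding it, f is the frequency of the quantile class and c is its width.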
Mode (Mo):
Mode of the distribution is that value of the variable for which the frequency is the maximum. In
other words, mode is the most frequent value of a distribution. In the case of a frequency distribution,
mode is given by
\[ Mo = L + \frac{f_0 - f_1}{2f_0 - f_1 - f_2} \times c \]
where, L = lower limit of modal class
f0 = frequency of modal class
f1 = frequency of pre-modal class
f2 = frequency of post-modal class
c = width of modal class
[The class which corresponds to the maximum frequency is the modal class]
Advantages of Mode:
● It is easy to understand and easy to calculate.
● It is not affected by extreme values.
● It can be located graphically.
Disadvantages of Mode:
● It is not rigidly defined - a distribution may have more than one mode.
● It is not based upon all the observations.
● It is not suitable for further algebraic treatment.
Uses of Mode:
● Mode is used to find the ideal size, e.g., in business forecasting, meteorological forecast on
weather condition, in the manufacture of ready-made garments, shoes, etc.
Graphical Location of Mode:
Mode can be located graphically in two ways:
a) Using frequency curve.
b) Using the histogram.
a) From the peak of the frequency curve, a perpendicular line is drawn on the X-axis; the foot of the
perpendicular line indicates the mode (shown in the following figure):
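[Figure: frequency curve, with frequency on the vertical axis and mid-values of class intervals on the horizontal axis; Mo marks the foot of the perpendicular from the peak.]
Figure: Location of mode from frequency curve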
b) Mode can be located more accurately from the histogram; the steps are the following :
i) The rectangles corresponding to the modal group, the pre-modal group and the post-modal group are
considered. A straight line is drawn connecting the top-left corner (say A) of the modal group
rectangle and the top-left corner (say D) of the post-modal group rectangle. Similarly, the top-right
corner (say B) of the modal group rectangle and the top-right corner (say C) of the
pre-modal group rectangle are connected.
ii) From the point of intersection of AD and BC, a perpendicular line is drawn on the X-axis; the foot of
the perpendicular line indicates the mode.
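[Figure: histogram, with frequency on the vertical axis and class intervals on the horizontal axis; the lines AD and BC cross inside the modal rectangle and Mo marks the foot of the perpendicular from their intersection.]
Fig. Location of mode from the histogram.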
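The histogram construction can be worked out with coordinates. In the sketch below the modal rectangle spans from `lower` to `lower + width`, and the frequencies f0, f1, f2 are those of the modal, pre-modal and post-modal classes. All numerical values are hypothetical and the function name is illustrative.

```python
def mode_from_histogram(lower, width, f0, f1, f2):
    """Foot of the perpendicular from the intersection of AD and BC.

    A = (lower, f0) and B = (lower + width, f0) are the top corners of the
    modal rectangle; D = (lower + width, f2) and C = (lower, f1) are the
    matching corners of the post-modal and pre-modal rectangles.
    """
    denominator = (f0 - f1) + (f0 - f2)
    if denominator == 0:
        raise ValueError("AD and BC do not intersect at a single point")
    # With t = (x - lower) / width, the two lines are
    #   AD: y = f0 + t * (f2 - f0)    and    CB: y = f1 + t * (f0 - f1).
    # Setting them equal gives t = (f0 - f1) / ((f0 - f1) + (f0 - f2)).
    t = (f0 - f1) / denominator
    x = lower + t * width
    y_on_ad = f0 + t * (f2 - f0)
    y_on_cb = f1 + t * (f0 - f1)
    return x, y_on_ad, y_on_cb


# Hypothetical modal class 20-30 with frequencies 8, 12, 6 around it.
x, y_ad, y_cb = mode_from_histogram(lower=20, width=10, f0=12, f1=8, f2=6)
print(x, y_ad, y_cb)
```

The x-coordinate of the intersection is lower + (f0 - f1) / (2 f0 - f1 - f2) × width, so the graphical construction and the interpolation formula for the mode locate the same point.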
Comparison among the Measures of Central Tendency
Criteria AM GM HM Me Mo
From the above comparison, it is clear that arithmetic mean is the best measure of central tendency.
Example 4.
Find the median, lower and upper quartiles, 4th decile, 70th percentile and mode for the following
distribution :
Class:        50-60    60-70    70-80    80-90    90-100    100-110    110 and above
Frequency:    5        9        13       20       19        9          5
Solution :
Class Frequency c.f.
50-60 5 5
60-70 9 14
70-80 13 27
80-90 20 47
90-100 19 66
100-110 9 75
110 and over 5 80
N = 80
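Here, N = 80 and the class width is c = 10. In each formula below, F denotes the cumulative frequency of the
class preceding the class concerned and f the frequency of that class, both carrying the same subscript as L.
Computation of Median:
N/2 = 40, so (80-90) is the median class.
\[ \therefore Me = L_m + \frac{\dfrac{N}{2} - F_m}{f_m} \times c = 80 + \frac{40 - 27}{20} \times 10 = 86.5 \]
Computation of Quartiles :
N/4 = 20, so (70-80) is the first quartile class.
\[ \therefore Q_1 = L_1 + \frac{\dfrac{N}{4} - F_1}{f_1} \times c = 70 + \frac{20 - 14}{13} \times 10 = 74.62 \text{ (app.)} \]
3N/4 = 60, so (90-100) is the third quartile class.
\[ \therefore Q_3 = L_3 + \frac{\dfrac{3N}{4} - F_3}{f_3} \times c = 90 + \frac{60 - 47}{19} \times 10 = 96.84 \]
Computation of Deciles:
4N/10 = 32, so (80-90) is the 4th decile class.
\[ \therefore D_4 = L_4 + \frac{\dfrac{4N}{10} - F_4}{f_4} \times c = 80 + \frac{32 - 27}{20} \times 10 = 82.50 \]
Computation of Percentiles:
70N/100 = 56, so (90-100) is the 70th percentile class.
\[ \therefore P_{70} = L_{70} + \frac{\dfrac{70N}{100} - F_{70}}{f_{70}} \times c = 90 + \frac{56 - 47}{19} \times 10 = 94.74 \text{ (app.)} \]
Computation of Mode:
Here (80-90) is the modal class because the maximum frequency (20) lies in that class, so that L = 80,
f0 = 20, the pre-modal frequency is 13 and the post-modal frequency is 19.
\[ \therefore Mo = L + \frac{f_0 - f_1}{2f_0 - f_1 - f_2} \times c = 80 + \frac{20 - 13}{2 \times 20 - 13 - 19} \times 10 = 88.75 \]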
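The computations of Example 4 can be reproduced with a short sketch that interpolates linearly inside the relevant class. The function names `grouped_quantile` and `grouped_mode` are illustrative, and the open class "110 and above" is represented only by its lower limit, which is an assumption that is safe here because no requested quantile falls in it.

```python
# Lower class limits, frequencies and common class width from Example 4.
lower_limits = [50, 60, 70, 80, 90, 100, 110]
freqs = [5, 9, 13, 20, 19, 9, 5]
WIDTH = 10


def grouped_quantile(numerator, denominator):
    """Value below which numerator/denominator of the N observations lie."""
    n_total = sum(freqs)
    target = numerator * n_total / denominator  # e.g. iN/4, jN/10, kN/100
    cumulative = 0  # cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            # First class whose cumulative frequency reaches the target.
            return lower + (target - cumulative) / f * WIDTH
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


def grouped_mode():
    """Mode by interpolation within the class of maximum frequency."""
    m = freqs.index(max(freqs))
    f0 = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0               # pre-modal frequency
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # post-modal frequency
    return lower_limits[m] + (f0 - f1) / (2 * f0 - f1 - f2) * WIDTH


print("Me  =", grouped_quantile(1, 2))
print("Q1  =", round(grouped_quantile(1, 4), 2))
print("Q3  =", round(grouped_quantile(3, 4), 2))
print("D4  =", grouped_quantile(4, 10))
print("P70 =", round(grouped_quantile(70, 100), 2))
print("Mo  =", grouped_mode())
```

Passing the quantile as a numerator and denominator keeps the target position (40, 20, 60, 32, 56) exact, which avoids rounding problems when it coincides with a cumulative frequency.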