
Lec 2

The document discusses central tendency and its measures, including mean, median, and mode, which represent the clustering of data around a central value. It elaborates on different types of means (arithmetic, geometric, harmonic), their properties, advantages, disadvantages, and applications. Additionally, it covers the calculation of these measures using frequency distributions and introduces quantiles as further statistical measures.

Uploaded by

wasi78045
Copyright
© © All Rights Reserved

CENTRAL TENDENCY AND ITS MEASURES

Definition of Central Tendency and Measures of Central Tendency:

The individual observations of a distribution or a data set are found to have a general tendency to cluster
around a certain point, somewhere at the center of the distribution. For example, if we observe the
distribution of the heights of a group of students in a class, the heights of most of the students are close to a
certain central value. This tendency of the observations of a distribution to cluster or concentrate around
the center of the distribution is called central tendency and its numerical measures are known as the
measures of central tendency.
Different measures of central tendency are:
1. Mean
(a) Arithmetic mean (AM)
(b) Geometric mean (GM)
(c) Harmonic mean (HM)
2. Median
3. Mode
The main purpose of measuring central tendency of a distribution is to determine such a value, which can
be considered to be a representative one. An ideal measure of central tendency should, therefore, have the
following characteristics:
● It should be rigidly defined.
● It should be based on all the observations.
● It should be readily comprehensible and easy to calculate.
● It should be suitable for further algebraic treatment.
● It should be least affected by sampling fluctuations.
Arithmetic Mean :
Arithmetic mean of a set of observations is their sum divided by the number of observations.
The arithmetic mean may be of two types :
(a) Simple Arithmetic mean
(b) Weighted Arithmetic mean

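(a) Simple Arithmetic Mean : The arithmetic mean of n observations x1, x2, ..., xn is given by

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

In the case of a frequency distribution, if f1, f2, ..., fn are the frequencies of x1, x2, ..., xn respectively, the arithmetic
mean is obtained as

$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$.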
(b) Weighted Arithmetic Mean: In practice all values of a series may not carry equal weight or
importance. For example, if we want to have an idea of the change in cost of living of a certain group of
people, the simple mean of prices of the commodities consumed by them will not do, since all the
commodities are not equally important, e.g., rice, sugar and wheat are more important than confectionery
items, coffee, tea, etc.
Let w1, w2, ..., wn be the weights attached to the items x1, x2, ..., xn respectively. The weighted arithmetic mean is
computed as

$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where the sums run over i = 1, 2, ..., n.
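
The three forms of the arithmetic mean described above can be sketched in Python as follows. This is a minimal illustration: the function names, the prices and the weights are assumptions made for the example and do not come from the lecture.

```python
def simple_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical prices (Tk.) and importance weights, for illustration only.
prices = [40.0, 60.0, 200.0]
weights = [5, 3, 2]

plain = simple_mean(prices)                 # (40 + 60 + 200) / 3
weighted = weighted_mean(prices, weights)   # (200 + 180 + 400) / 10
print(plain, weighted)
```

Because the cheaper items carry larger weights in this made-up data, the weighted mean comes out lower than the simple mean.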

Properties of Arithmetic Mean:


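1. The sum of the deviations of the values of a variable from its arithmetic mean is zero, i.e., $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

Proof :

$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$. Proved.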
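2. Arithmetic mean is dependent on change of origin and scale.
Proof : Let x1, x2, ..., xn be the values of a variable x. Let us change the origin to an arbitrary value 'a' and
change the scale by dividing by 'h'. The values of the new variable are

$u_i = \frac{x_i - a}{h}$, i.e., $x_i = a + h u_i$ ; i = 1, 2, ..., n.

Now,

$\sum_{i=1}^{n} x_i = na + h\sum_{i=1}^{n} u_i$

or,

$\bar{x} = a + h\bar{u}$

Hence proved.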
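
Properties 1 and 2 can be checked numerically. The sketch below uses illustrative values together with an arbitrary origin a and scale h, all chosen only for this example.

```python
values = [52.5, 57.5, 62.5, 67.5, 72.5]   # illustrative observations
a, h = 67.5, 5.0                           # arbitrary origin and scale

x_bar = sum(values) / len(values)

# Property 1: deviations from the arithmetic mean sum to zero.
deviation_sum = sum(x - x_bar for x in values)

# Property 2: with u_i = (x_i - a) / h, the mean satisfies x_bar = a + h * u_bar.
us = [(x - a) / h for x in values]
u_bar = sum(us) / len(us)
recovered = a + h * u_bar

print(deviation_sum, x_bar, recovered)
```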
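3. The sum of the squares of the deviations of a set of values from their arithmetic mean is the
minimum.
Proof:
Let $\bar{x}$ be the arithmetic mean of a set of observations x1, x2, ..., xn with frequencies f1, f2, ..., fn
respectively, and let $N = \sum f_i$. Now the sum of squares of deviations from an arbitrary value 'a' is

$\sum f_i (x_i - a)^2 = \sum f_i \{(x_i - \bar{x}) + (\bar{x} - a)\}^2 = \sum f_i (x_i - \bar{x})^2 + 2(\bar{x} - a)\sum f_i (x_i - \bar{x}) + N(\bar{x} - a)^2$

or,

$\sum f_i (x_i - a)^2 = \sum f_i (x_i - \bar{x})^2 + N(\bar{x} - a)^2$ ; [Since $\sum f_i (x_i - \bar{x}) = 0$]

⇒ $\sum f_i (x_i - a)^2 \ge \sum f_i (x_i - \bar{x})^2$ ; [Since $(\bar{x} - a)^2$ is a non-negative quantity]

i.e., $\sum f_i (x_i - \bar{x})^2 \le \sum f_i (x_i - a)^2$. Proved.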
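
The minimum property can be illustrated numerically. The sketch below assumes a small made-up frequency distribution and compares the sum of squared deviations about several arbitrary values a with the sum about the mean. It also prints the identity S(a) = S(x̄) + N(x̄ − a)², which is what makes the mean the minimiser.

```python
def sum_sq_dev(values, freqs, a):
    """Sum of f_i * (x_i - a)^2 about an arbitrary value a."""
    return sum(f * (x - a) ** 2 for x, f in zip(values, freqs))


values = [2.0, 4.0, 9.0]   # illustrative values
freqs = [1, 2, 1]          # illustrative frequencies
N = sum(freqs)
x_bar = sum(f * x for x, f in zip(values, freqs)) / N   # 19 / 4 = 4.75
at_mean = sum_sq_dev(values, freqs, x_bar)

for a in (0.0, 3.0, 4.0, 5.0, 10.0):
    about_a = sum_sq_dev(values, freqs, a)
    excess = N * (x_bar - a) ** 2
    # about_a and at_mean + excess agree, so about_a can never fall below at_mean.
    print(a, about_a, at_mean + excess)
```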

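4. Mean of Composite Series :

If $\bar{x}_i$ (i = 1, 2, ..., k) are the means of k component series of sizes ni (i = 1, 2, ..., k) respectively,
then the mean $\bar{x}$ of the composite series can be obtained by the formula

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{\sum_{i=1}^{k} n_i}$

Proof:

Let $x_{11}, x_{12}, \ldots, x_{1n_1}$ be the n1 members of the first series,
$x_{21}, x_{22}, \ldots, x_{2n_2}$ be the n2 members of the 2nd series,
........................
$x_{k1}, x_{k2}, \ldots, x_{kn_k}$ be the nk members of the kth series,

having means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ respectively, so that $\sum_{j=1}^{n_i} x_{ij} = n_i\bar{x}_i$ for each i.

Then n1 + n2 + ... + nk will be the size of the composite series

$(x_{11}, x_{12}, \ldots, x_{1n_1}), (x_{21}, x_{22}, \ldots, x_{2n_2}), \ldots, (x_{k1}, x_{k2}, \ldots, x_{kn_k})$.

The mean $\bar{x}$ of the composite series of size n1 + n2 + ... + nk is given by

$\bar{x} = \frac{\sum_{j} x_{1j} + \sum_{j} x_{2j} + \cdots + \sum_{j} x_{kj}}{n_1 + n_2 + \cdots + n_k} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$

Proved.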
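
The composite-mean formula can be checked against the mean of the pooled observations. The three component series below are made up for illustration, and `composite_mean` is a hypothetical helper name.

```python
def composite_mean(sizes, means):
    """Mean of the composite series: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Three illustrative component series of sizes 2, 3 and 1.
series = [[4.0, 6.0], [1.0, 2.0, 3.0], [10.0]]
sizes = [len(s) for s in series]
means = [sum(s) / len(s) for s in series]

combined = composite_mean(sizes, means)
pooled = sum(x for s in series for x in s) / sum(sizes)   # mean of all 6 values
print(combined, pooled)
```

Only the sizes and means of the component series are needed, so the composite mean can be obtained even when the individual observations are no longer available.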

Mean of first n Natural Numbers:

The first n natural numbers are 1, 2, ..., n.

The mean, $\bar{x} = \frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$

Advantages of Arithmetic mean:


● It is rigidly defined.
● It is easy to calculate.
● It is based upon all the observations.
● It is suitable for further algebraic treatment.
● It is less affected by sampling fluctuations.

Disadvantages of Arithmetic mean:


● It is affected very much by extreme values.
● It cannot be calculated if the extreme class is open.
● It is not suitable for extremely skewed distribution.
● It cannot be used if we are dealing with qualitative characteristics; such as intelligence, honesty,
beauty, etc.
● It cannot be obtained if a single observation is missing or lost.
Uses of Arithmetic Mean:
● It is widely used to calculate average age, average income, average price, average salary, average
increment, average import and average consumption, etc.
● It is used to establish the various theories and formulas of Mathematics and also used as an aid in
further statistical analysis.
● It is used in the computation of index numbers.

Example 1. The daily wages of a group of farm workers are shown in the following frequency
distribution.

Daily wages (Tk.)   Number of workers
50-55                5
55-60               10
60-65               25
65-70               35
70-75               15
75-80                7
80-85                3

Computation of arithmetic mean by direct and indirect methods

Direct Method :

Daily wages (Tk.)   Number of workers fi   Mid value xi   fixi
50-55                 5                    52.5            262.5
55-60                10                    57.5            575.0
60-65                25                    62.5           1562.5
65-70                35                    67.5           2362.5
70-75                15                    72.5           1087.5
75-80                 7                    77.5            542.5
80-85                 3                    82.5            247.5
Total               100                                   6640.0

$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{6640}{100}$ = Tk. 66.40
∴ Average daily wage is Tk. 66.40
Indirect Method:

Daily wages (Tk.)   Number of workers fi   Mid value xi   New variable ui   fiui
50-55                 5                    52.5           -3                -15
55-60                10                    57.5           -2                -20
60-65                25                    62.5           -1                -25
65-70                35                    67.5            0                  0
70-75                15                    72.5            1                 15
75-80                 7                    77.5            2                 14
80-85                 3                    82.5            3                  9
Total               100                                                     -22

New variable, $u_i = \frac{x_i - a}{h}$ ; where, a = 67.5 and h = 5

Now, $\bar{u} = \frac{\sum f_i u_i}{N} = \frac{-22}{100}$ = -0.22

$\bar{x} = a + h\bar{u}$ = 67.5 + 5(-0.22) = 66.40

∴ Average daily wage is Tk. 66.40
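
Both methods of Example 1 can be reproduced with a few lines of Python. This is a sketch only: the variable names are illustrative, and the data are the wage classes and frequencies of the example.

```python
lower = [50, 55, 60, 65, 70, 75, 80]   # lower class limits (Tk.)
freq = [5, 10, 25, 35, 15, 7, 3]       # number of workers
h = 5                                  # class length
N = sum(freq)

mids = [lo + h / 2 for lo in lower]    # mid values x_i

# Direct method: x_bar = sum(f_i * x_i) / N
direct = sum(f * x for f, x in zip(freq, mids)) / N

# Indirect method: u_i = (x_i - a) / h, then x_bar = a + h * u_bar
a = 67.5
us = [(x - a) / h for x in mids]
u_bar = sum(f * u for f, u in zip(freq, us)) / N
indirect = a + h * u_bar

print(direct, u_bar, indirect)
```

The two results agree, as property 2 of the arithmetic mean requires. The indirect method only keeps the intermediate numbers small for hand calculation.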

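Geometric Mean (GM) :

The geometric mean of a set of n positive observations is the nth root of their product. The GM
of n positive values x1, x2, ..., xn of a variable x is given by

$GM = (x_1 x_2 \cdots x_n)^{1/n}$

or, taking logarithms,

$\log(GM) = \frac{1}{n}\sum_{i=1}^{n} \log x_i$

In the case of a frequency distribution, when f1, f2, ..., fn are the frequencies of x1, x2, ..., xn respectively and $N = \sum f_i$,
then

$GM = (x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n})^{1/N}$

$= \text{Antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right)$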
Advantages of Geometric Mean :
● It is rigidly defined.
● It is based upon all the observations.
● It is not affected much by sampling fluctuations.
● It is suitable for further algebraic treatments.
● It is the most suitable average in measuring the rate of change.

Disadvantages of Geometric Mean:


● It cannot be computed when the series contains any negative or zero value.
● It is not easy to understand and to calculate for persons having very weak mathematical skills.
● It cannot be computed when the extreme classes of the frequency distribution are open.
● The value of GM may not be found in the series.

Uses of Geometric Mean :


● Geometric mean is used to find the average of ratios, rate of population growth, rate of interest,
average of percentages.
● It is used in the construction of index numbers.
Example 2.
The rates of increase of yield of a new wheat variety compared with a local variety in 10 selected
agricultural farms are given below:

Rate of increase of yield (%)   Number of farms
0-5                             1
5-10                            2
10-15                           4
15-20                           2
20-25                           1

For computation of the geometric mean, we construct the following table:

Rate of increase of yield (%)   Frequency fi   Mid-value xi   log xi    fi log xi
0-5                             1               2.5           0.39794   0.39794
5-10                            2               7.5           0.87506   1.75012
10-15                           4              12.5           1.09691   4.38764
15-20                           2              17.5           1.24304   2.48608
20-25                           1              22.5           1.35218   1.35218
Total                           Σfi = N = 10                            Σfi log xi = 10.37396

$GM = \text{Antilog}\left(\frac{\sum f_i \log x_i}{N}\right) = \text{Antilog}\left(\frac{10.37396}{10}\right) = \text{Antilog}(1.037396)$ = 10.9 (Approx.)

The average rate of increase of yield of the new variety of wheat is 10.9%.
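
The computation in Example 2 can be reproduced in Python. The sketch below uses base-10 logarithms, as in the table, and also evaluates the product form of the definition for comparison. The variable names are illustrative.

```python
import math

mids = [2.5, 7.5, 12.5, 17.5, 22.5]   # mid-values x_i (%)
freq = [1, 2, 4, 2, 1]                # number of farms f_i
N = sum(freq)

# Logarithmic form: GM = antilog( sum(f_i * log x_i) / N )
log_sum = sum(f * math.log10(x) for f, x in zip(freq, mids))
gm = 10 ** (log_sum / N)

# Product form of the definition: (x_1^f_1 * ... * x_n^f_n)^(1/N)
gm_product = math.prod(x ** f for f, x in zip(freq, mids)) ** (1 / N)

print(round(log_sum, 5), round(gm, 2), round(gm_product, 2))
```

The logarithmic form is the one used for hand calculation. In code it also avoids very large intermediate products when N is large.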

Harmonic Mean (HM):

The harmonic mean of a set of non-zero observations is the reciprocal of the arithmetic mean of the
reciprocals of the given values. The harmonic mean of n non-zero observations x1, x2, ..., xn is given by

$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

If f1, f2, ..., fn are respectively the frequencies of the non-zero observations x1, x2, ..., xn and $N = \sum f_i$, then

$HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$
Advantages of Harmonic Mean :
● It is rigidly defined.
● It is based upon all the observations.
● Sampling fluctuation is less.
● It is not affected much by extreme values.

Disadvantages of Harmonic Mean:


● It cannot be computed where there is any zero value in the series.
● It is not easily understood and difficult to compute.
● It is very complex for further algebraic treatments.
● It cannot be computed if the extreme classes of the frequency distribution are open.
Uses of Harmonic Mean:
● The harmonic mean is used when observations are made in terms of work done per hour, speeds
(kilometers covered per hour), quantity of things purchased per taka, etc.

Example 3.
The frequency distribution of profit per share of 10 companies is given below:

Profit per share (Tk.)   0-5   5-10   10-15   15-20   20-25
No. of companies         1     2      4       2       1

To calculate the harmonic mean, we construct the following table:

Profit per share (Tk.)   Frequency fi   Mid-value xi   fi / xi
0-5                       1              2.5           0.4000
5-10                      2              7.5           0.2667
10-15                     4             12.5           0.3200
15-20                     2             17.5           0.1143
20-25                     1             22.5           0.0444
Total                    10                            1.1454

$HM = \frac{N}{\sum f_i / x_i} = \frac{10}{1.1454}$ = 8.73

∴ The average profit per share is Tk. 8.73.
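
Example 3 can be verified with a short Python sketch, assuming the grouped-data form HM = N / Σ(fi / xi). The variable names are illustrative.

```python
mids = [2.5, 7.5, 12.5, 17.5, 22.5]   # mid-values x_i (Tk.)
freq = [1, 2, 4, 2, 1]                # number of companies f_i
N = sum(freq)

# Sum of f_i / x_i, the last column of the table.
reciprocal_sum = sum(f / x for f, x in zip(freq, mids))

# Harmonic mean: total frequency divided by the sum of f_i / x_i.
hm = N / reciprocal_sum

print(round(reciprocal_sum, 4), round(hm, 2))
```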

Relationship among AM, GM and HM:


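◙ For two non-zero positive observations:
i) A ≥ G ≥ H
ii) $AH = G^2$; where A = Arithmetic mean,
G = Geometric mean and H = Harmonic mean.

Proof: Let the two non-zero positive observations be x1 and x2.

By definition,

$A = \frac{x_1 + x_2}{2}$, $G = \sqrt{x_1 x_2}$ and $H = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = \frac{2x_1x_2}{x_1 + x_2}$

i) Since any square quantity is always non-negative,

$(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$

or, $x_1 + x_2 - 2\sqrt{x_1x_2} \ge 0$

or, $\frac{x_1 + x_2}{2} \ge \sqrt{x_1x_2}$ ................................ (1)

∴ A ≥ G ................................ (2)
Again from equation (1)

$x_1 + x_2 \ge 2\sqrt{x_1x_2}$

or, $\frac{1}{x_1 + x_2} \le \frac{1}{2\sqrt{x_1x_2}}$

Multiplying both sides by $2x_1x_2$ we get

$\frac{2x_1x_2}{x_1 + x_2} \le \frac{x_1x_2}{\sqrt{x_1x_2}}$

or, $\frac{2x_1x_2}{x_1 + x_2} \le \sqrt{x_1x_2}$
or, H ≤ G i.e., G ≥ H ................................ (3)

From (2) & (3) it follows that A ≥ G ≥ H. Proved.

(ii) $A \cdot H = \frac{x_1 + x_2}{2} \cdot \frac{2x_1x_2}{x_1 + x_2} = x_1x_2$

$= (\sqrt{x_1x_2})^2 = G^2$
∴ $A \cdot H = G^2$ Proved.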
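
Both relations can be checked numerically for any pair of positive observations. The sketch below uses the illustrative pair 4 and 9. `am_gm_hm` is a hypothetical helper name.

```python
import math


def am_gm_hm(x1, x2):
    """Return (A, G, H) for two positive observations."""
    a = (x1 + x2) / 2
    g = math.sqrt(x1 * x2)
    h = 2 * x1 * x2 / (x1 + x2)
    return a, g, h


A, G, H = am_gm_hm(4.0, 9.0)    # A = 6.5, G = 6.0, H = 72/13
print(A, G, H, A * H, G ** 2)

# Equality A = G = H holds only when the two observations are equal.
A_eq, G_eq, H_eq = am_gm_hm(5.0, 5.0)
print(A_eq, G_eq, H_eq)
```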
Median:
The median of a distribution is the value of the variable which divides the distribution into two equal
parts if arranged in order of magnitude. Median is the value such that the number of observations above it
is equal to the number of observations below it. Thus median is the middlemost value of an ordered set
and as such a positional average.

In case of ungrouped data, when the number of observations n is odd, the $\frac{n+1}{2}$th observation in the ordered
series will be the median. Again, when the number of observations n is even, the median will be the
arithmetic mean of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th observations in the ordered series.

For computing the median from a frequency distribution we first need to identify the median class (the class
which contains the median). The first class whose cumulative frequency is equal to or immediately higher than N/2 will be the
median class.
For a frequency distribution, the formula for computing the median is

$Me = L_m + \frac{\frac{N}{2} - F_{m-1}}{f_m} \times h$

where,
Lm = lower limit of the median class.
N = total frequency
fm = frequency of the median class
$F_{m-1}$ = cumulative frequency of the pre-median class
h = length of median class.
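
The two rules for the median can be sketched in Python as follows. The function names are illustrative. The grouped version assumes equal class lengths h and takes the median class to be the first class whose cumulative frequency reaches N/2. It is applied to the daily-wage distribution of Example 1.

```python
def median_ungrouped(values):
    """Middle value of the ordered series (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n+1)/2-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of n/2-th and (n/2+1)-th


def median_grouped(lower_limits, freqs, h):
    """Me = L_m + (N/2 - c.f. of pre-median class) / f_m * h."""
    half = sum(freqs) / 2
    cum = 0                                       # cumulative frequency so far
    for lo, f in zip(lower_limits, freqs):
        if cum + f >= half:                       # first class reaching N/2
            return lo + (half - cum) / f * h
        cum += f
    raise ValueError("empty distribution")


odd = median_ungrouped([7, 3, 5])
even = median_ungrouped([7, 3, 5, 9])
wages = median_grouped([50, 55, 60, 65, 70, 75, 80], [5, 10, 25, 35, 15, 7, 3], 5)
print(odd, even, round(wages, 2))
```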

Advantages of Median:
● It is rigidly defined.
● It is easily understood and easy to compute.
● It is not influenced by extreme items.
● It can be calculated for distributions with open-ended classes.
● It can be used in defining the median of attributes.
Disadvantages of Median:
● It is not based upon all the observations.
● It is not suitable for further algebraic treatment.
● It is affected much by the sampling fluctuation.
Uses of Median:
● It is used in case of both quantitative and qualitative data.
● It is used for calculating the typical value in problems concerning wages, distribution of wealth,
etc.

Quantiles:
Quantiles are also positional or location measures of a distribution. Quantiles are those values in a
series, which divide the whole distribution into a number of equal parts when the series is arranged in
order of magnitude of observations. The following are the quantiles that are used in Statistics -
(i) Quartiles, (ii) Deciles and (iii) Percentiles.

3 quartiles: Qi (i = 1, 2, 3); divide the whole distribution into four equal parts.
9 deciles: Dj (j = 1, 2, ..., 9); divide the whole distribution into 10 equal parts.
99 percentiles: Pk (k = 1, 2, ..., 99); divide the whole distribution into 100 equal parts.

Computation of quantiles from a frequency distribution is very similar to that of the median. We
first need to identify the corresponding quantile class. The classes having cumulative frequencies equal to
or immediately higher than iN/4, jN/10 and kN/100 are respectively the ith quartile class, the jth decile
class and the kth percentile class.

For a frequency distribution, the quantiles are computed as

$Q_i = L_i + \frac{\frac{iN}{4} - F_{i-1}}{f_i} \times h$ ; i = 1, 2, 3

$D_j = L_j + \frac{\frac{jN}{10} - F_{j-1}}{f_j} \times h$ ; j = 1, 2, ..., 9

$P_k = L_k + \frac{\frac{kN}{100} - F_{k-1}}{f_k} \times h$ ; k = 1, 2, ..., 99

Here i, j, k indicate the order of quartiles, deciles and percentiles respectively; Li, Lj, Lk are the lower limits and
fi, fj, fk the frequencies of the corresponding quantile classes; $F_{i-1}$, $F_{j-1}$, $F_{k-1}$ are
respectively the cumulative frequencies of the classes preceding the ith quartile, jth decile and kth percentile
classes; h is the corresponding class interval.

It may be mentioned that


Q2 = D5 = P50 = Me ; Q1 = P25 ; Q3 = P75 ; D6 = P60 ,etc.
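
Since quartiles, deciles and percentiles differ only in the fraction of N they cut off, a single routine can compute all of them. The sketch below uses the hypothetical name `grouped_quantile` and assumes equal class lengths. It is applied to the daily-wage distribution of Example 1 to confirm the identities stated above.

```python
def grouped_quantile(lower_limits, freqs, h, order, parts):
    """Quantile of the given order when the distribution is cut into `parts` parts.

    order/parts = i/4 gives Q_i, j/10 gives D_j and k/100 gives P_k.
    """
    target = order * sum(freqs) / parts        # iN/4, jN/10 or kN/100
    cum = 0                                    # c.f. of the preceding classes
    for lo, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:        # first class reaching the target
            return lo + (target - cum) / f * h
        cum += f
    raise ValueError("order/parts must lie between 0 and 1")


lower = [50, 55, 60, 65, 70, 75, 80]
freq = [5, 10, 25, 35, 15, 7, 3]

q1 = grouped_quantile(lower, freq, 5, 1, 4)
q2 = grouped_quantile(lower, freq, 5, 2, 4)
q3 = grouped_quantile(lower, freq, 5, 3, 4)
d5 = grouped_quantile(lower, freq, 5, 5, 10)
p25 = grouped_quantile(lower, freq, 5, 25, 100)
p50 = grouped_quantile(lower, freq, 5, 50, 100)
print(q1, q2, q3)
```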

Graphical Location of Median and Quantiles:


Median, quartiles, deciles and percentiles can be located from ogive; the necessary steps are briefly
discussed below :
i) An ogive is drawn and the position in the Y-axis is marked for different partition values (e.g., N/2
for median, N/4 for 1st quartile, 4N/10 for 4th decile, etc.)
ii) From the corresponding points in the Y-axis, a line parallel to the X-axis is drawn which
intersects the ogive at certain point.
iii) From the corresponding point of intersection mentioned above, a perpendicular line is drawn on
the X-axis; the foot of the perpendicular line is the desired partition value. The whole process is
illustrated in the figure below:
[Figure: ogive with cumulative frequency on the Y-axis (marked at 0, N/4, N/2, 3N/4 and N) and upper limits of class intervals on the X-axis; the feet of the perpendiculars give Q1, Me and Q3.]
Fig. Location of median and quantiles.
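
Reading a partition value off the ogive amounts to linear interpolation between two plotted points. The sketch below assumes the ogive is drawn as straight segments through the points (upper class limit, cumulative frequency), starting from the lower limit of the first class at height 0. The function name is illustrative and the data are those of Example 1.

```python
def read_from_ogive(start, upper_limits, cum_freqs, y):
    """X-value at which the piecewise-linear ogive reaches height y."""
    xs = [start] + list(upper_limits)
    ys = [0] + list(cum_freqs)
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y1 > y0 and y0 <= y <= y1:          # horizontal line meets this segment
            return x0 + (y - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("y must lie between 0 and the total frequency")


upper = [55, 60, 65, 70, 75, 80, 85]           # upper class limits (Tk.)
cum = [5, 15, 40, 75, 90, 97, 100]             # cumulative frequencies
N = cum[-1]

me_graph = read_from_ogive(50, upper, cum, N / 2)
q1_graph = read_from_ogive(50, upper, cum, N / 4)
print(round(me_graph, 2), q1_graph)
```

Under this assumption the graphical reading coincides with the value given by the interpolation formula for grouped data.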

Mode (Mo):
The mode of a distribution is that value of the variable for which the frequency is the maximum. In
other words, the mode is the most frequent value of a distribution. In the case of a frequency distribution,
the mode is given by

$Mo = L + \frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times h$
where, L = lower limit of modal class
f0 = frequency of modal class
f1 = frequency of pre-modal class
f2 = frequency of post-modal class
h = length of modal class
[The class which corresponds to the maximum frequency is the modal class]
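
A minimal Python sketch of the grouped-data mode, assuming the usual interpolation formula Mo = L + (f0 − f1) / (2f0 − f1 − f2) × h with h the class length. The function name is illustrative. Treating a missing neighbouring class as having frequency 0, and taking the first class when several share the maximum frequency, are assumptions of this sketch rather than rules from the lecture.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode of a frequency distribution by interpolation within the modal class."""
    m = freqs.index(max(freqs))                        # first class with maximum frequency
    f0 = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                  # pre-modal class (0 if none)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0     # post-modal class (0 if none)
    return lower_limits[m] + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * h


# Daily-wage distribution of Example 1: the modal class is 65-70.
mode_wage = grouped_mode([50, 55, 60, 65, 70, 75, 80], [5, 10, 25, 35, 15, 7, 3], 5)
print(round(mode_wage, 2))
```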
Advantages of Mode:
● It is easy to understand and easy to calculate.
● It is not affected by extreme values.
● It can be located graphically.
Disadvantages of Mode:
● It is not rigidly defined - a distribution may have more than one mode.
● It is not based upon all the observations.
● It is not suitable for further algebraic treatment.
Uses of Mode:
● Mode is used to find the ideal size, e.g., in business forecasting, meteorological forecast on
weather condition, in the manufacture of ready-made garments, shoes, etc.
Graphical Location of Mode:
Mode can be located graphically in two ways:
a) Using frequency curve.
b) Using the histogram.
a) From the peak of the frequency curve, a perpendicular line is drawn on the X-axis; the foot of the
perpendicular line indicates the mode (shown in the following figure):

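[Figure: frequency curve with frequency on the Y-axis and mid-values of class intervals on the X-axis; the perpendicular from the peak meets the X-axis at Mo.]
Figure: Location of mode from frequency curve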

b) Mode can be located more accurately from the histogram; the steps are the following :

i) The rectangles corresponding to the modal group, the pre-modal group and the post-modal group are
considered. A straight line is drawn connecting the top-left corner (say A) of the modal group
rectangle and the top-left corner (say D) of the post-modal group rectangle. Similarly the top-right
corner (say B) of the modal group rectangle and the top-right corner (say C) of the
pre-modal group rectangle are connected.

ii) From the point of intersection of AD and BC, a perpendicular line is drawn on the X-axis; the foot of
the perpendicular line indicates the mode.
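[Figure: histogram with frequency on the Y-axis and class intervals on the X-axis; lines AD and BC are drawn across the modal rectangle and the perpendicular from their intersection meets the X-axis at Mo.]
Fig. Location of mode from the histogram.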
Comparison among the Measures of Central Tendency

Criteria                      AM               GM                                    HM                       Me               Mo
Definition                    Rigidly defined  Rigidly defined                       Rigidly defined          Rigidly defined  Not rigidly defined
Data restriction              No restriction   Values must be nonzero and positive   Values must be nonzero   No restriction   No restriction
Computation                   Easy             Slightly difficult                    Slightly difficult       Easy             Easy
Based upon all observations   Yes              Yes                                   Yes                      No               No
Effect of extreme values      Much affected    Less affected                         Less affected            Not affected     Not affected
Sampling fluctuation          Little           Little                                Little                   Much             Much
Graphical location            Not possible     Not possible                          Not possible             Possible         Possible
Further algebraic treatment   Possible         Possible                              Not possible             Not possible     Not possible

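From the above comparison, the arithmetic mean satisfies most of the characteristics of an ideal measure and is therefore generally regarded as the best measure of central tendency, except when the data contain extreme values or open-ended classes.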
Example 4.
Find the median, lower and upper quartiles, 4th decile, 70th percentile and mode for the following
distribution :

Class:       50-60   60-70   70-80   80-90   90-100   100-110   110 and above
Frequency:   5       9       13      20      19       9         5

Solution :

Class           Frequency   c.f.
50-60            5           5
60-70            9          14
70-80           13          27
80-90           20          47
90-100          19          66
100-110          9          75
110 and above    5          80
                N = 80

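Here, N = 80
Computation of Median:

$\frac{N}{2} = \frac{80}{2}$ = 40th observation lies in the class (80-90)

∴ (80-90) is the median class

∴ $Me = L_m + \frac{\frac{N}{2} - F_{m-1}}{f_m} \times h$

$= 80 + \frac{40 - 27}{20} \times 10$
= 86.5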
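Computation of Quartiles :

$\frac{N}{4} = \frac{80}{4}$ = 20th observation lies in the class (70-80)

∴ (70-80) is the lower quartile (Q1) class

∴ $Q_1 = L_1 + \frac{\frac{N}{4} - F_0}{f_1} \times h$

$= 70 + \frac{20 - 14}{13} \times 10$
= 74.62 (app.)

Again, $\frac{3N}{4} = \frac{3 \times 80}{4}$ = 60th observation lies in the class (90-100)

∴ (90-100) is the upper quartile (Q3) class

∴ $Q_3 = L_3 + \frac{\frac{3N}{4} - F_2}{f_3} \times h$

$= 90 + \frac{60 - 47}{19} \times 10$
= 96.84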
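Computation of Deciles:

$\frac{4N}{10} = \frac{4 \times 80}{10}$ = 32nd observation lies in the class (80-90)

∴ (80-90) is the 4th decile (D4) class

∴ $D_4 = L_4 + \frac{\frac{4N}{10} - F_3}{f_4} \times h$

$= 80 + \frac{32 - 27}{20} \times 10$
= 82.50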
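Computation of Percentiles:

$\frac{70N}{100} = \frac{70 \times 80}{100}$ = 56th observation lies in the class (90-100)

∴ (90-100) is the 70th percentile (P70) class

∴ $P_{70} = L_{70} + \frac{\frac{70N}{100} - F_{69}}{f_{70}} \times h$

$= 90 + \frac{56 - 47}{19} \times 10$
= 94.74 (app.)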
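Computation of Mode:
Here (80-90) is the modal class because the maximum frequency (20) lies in that class

∴ $Mo = L + \frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times h$

$= 80 + \frac{20 - 13}{(20 - 13) + (20 - 19)} \times 10$
= 88.75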
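
All six results of Example 4 can be reproduced with one short script. This is a sketch with illustrative names. It takes h = 10 for every class, which is safe here only because none of the requested values falls in the open class "110 and above".

```python
lower = [50, 60, 70, 80, 90, 100, 110]   # lower class limits
freq = [5, 9, 13, 20, 19, 9, 5]
h = 10                                   # class length (open last class is never selected)
N = sum(freq)


def partition_value(order, parts):
    """Value below which order/parts of the N observations lie."""
    target = order * N / parts
    cum = 0                              # c.f. of the preceding classes
    for lo, f in zip(lower, freq):
        if cum + f >= target:            # first class whose c.f. reaches the target
            return lo + (target - cum) / f * h
        cum += f
    raise ValueError("target exceeds total frequency")


median = partition_value(1, 2)
q1 = partition_value(1, 4)
q3 = partition_value(3, 4)
d4 = partition_value(4, 10)
p70 = partition_value(70, 100)

m = freq.index(max(freq))                # modal class (80-90)
f0, f1, f2 = freq[m], freq[m - 1], freq[m + 1]
mode = lower[m] + (f0 - f1) / (2 * f0 - f1 - f2) * h

print(median, round(q1, 2), round(q3, 2), d4, round(p70, 2), mode)
```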
