Basic Statistics for Applied Science (Math 1106) Lecture Note

Chapter 3: Measures of Central Tendency


3.1 Introduction
Quantitative data, whether in a mass or arranged in the form of a frequency distribution, generally exhibit a common
characteristic: they tend to concentrate at certain values, usually somewhere in the center of the distribution.
This tendency of the values to cluster in the central part of the distribution is known as central tendency, and it
can be measured statistically.
Central tendency thus refers to the location of the center of a distribution; it tells us where the data are located
and what a typical observation in the data set looks like.
The value of a measure of central tendency, or an average, is regarded as the most representative value of the
given data, because it is determined at the point where the concentration of values is greatest, i.e. where the
frequency is highest on the distribution scale.
An average is thus a single figure which represents the general character of the data. It is the point or location
around which the individual values of the data cluster, so it is also called a "measure of location". Because it
tends to lie somewhere at the centre, within the range of all values, it is also called a measure of central tendency.
There are many types of measures of central tendency (averages), each possessing particular properties and
each being typical in some unique way.
The most frequently encountered ones are given below.
1. Computed averages
a) The arithmetic mean
b) The geometric mean
c) The harmonic mean
2. Positional averages
a) The median
b) The quantiles/ fractiles
3. The mode
In this chapter, these measures are presented in the order mentioned.

3.2 Objectives of measuring central tendency

1. To comprehend the data easily.


2. To facilitate comparison.
3. To make further statistical analysis.
Meseret Taddesse Ejeta Page 1

3.3 Important characteristics of measures of central tendency


According to Prof. Yule, the following are the desiderata (requirements) to be satisfied by an ideal average or
measure of central tendency:
1. It should be rigidly defined, i.e. the definition should be clear and unambiguous so that it leads to one and
only one interpretation by different persons. In other words, the definition should not leave anything to the
discretion of the investigator or the observer. If it is not rigidly defined, the bias introduced by the
investigator will make its value unstable and render it unrepresentative of the distribution.
2. It should be easy to understand and calculate, even for a non-mathematical person. In other words, it
should be readily comprehensible, quick and easy to compute, and should not involve heavy arithmetical
calculations.

3. It should be based on all the observations.

Thus, in the computation of an ideal average the entire set of data at our disposal should be used and there
should not be any loss of information resulting from not using the available data. Obviously, if the whole
data is not used in computing the average, it will be unrepresentative of the distribution.

4. It should be suitable for further mathematical treatment.

In other words, the average should possess some important and interesting mathematical properties so that its
use in further statistical theory is enhanced. For example, if we are given the averages and sizes (frequencies)
of a number of different groups then for an ideal average we should be in a position to compute the average
of the combined group. If an average is not amenable to further algebraic manipulation, then obviously its
use will be very much limited for further applications in statistical theory.

5. It should be affected as little as possible by fluctuations of sampling.

By this we mean that if we take independent random samples of the same size from a given population and
compute the average for each of these samples then, for an ideal average, the values so obtained from
different samples should not vary much from one another. The difference in the values of the average for
different samples is attributed to the so called fluctuations of sampling. This property is also explained by
saying that an ideal average should possess sampling stability.

6. It should not be affected much by extreme observations.

By extreme observations we mean very small or very large observations. Such observations should not unduly
affect the value of a good average.


General Rounding Rule In statistics the basic rounding rule is that when computations are done in the
calculation, rounding should not be done until the final answer is calculated. When rounding is done in the
intermediate steps, it tends to increase the difference between that answer and the exact one. But in the
textbook and solutions manual, it is not practical to show long decimals in the intermediate calculations;
hence, the values in the examples are carried out to enough places (usually three or four) to obtain the same
answer that a calculator would give after rounding on the last step.

3.4 Types of measures of central tendency


3.4.1 Mathematical measures of Central Tendency
a) Simple arithmetic mean

Definition If $X_1, X_2, \ldots, X_N$ are the values of a variable $X$, then the arithmetic mean of $X$, denoted by $\bar{X}$, is defined
as

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$

For the case of discrete grouped data, if $X$ assumes $k$ distinct values $X_1, X_2, \ldots, X_k$ with respective frequencies
$f_1, f_2, \ldots, f_k$, then $\bar{X}$ (the arithmetic mean of $X$) is

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N}, \quad \text{where } N = \sum_{i=1}^{k} f_i$$

For the case of a frequency distribution of a continuous variable grouped into class intervals, if there are $k$
classes with mid-points $X_1, X_2, \ldots, X_k$ and respective frequencies $f_1, f_2, \ldots, f_k$, then the arithmetic mean $\bar{X}$ of $X$ is

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N}, \quad \text{where } N = \sum_{i=1}^{k} f_i$$

Example 3.1 Compute the arithmetic mean for the following frequency distributions.

a) Values (xi)         20    21    22    23    24    25
   Frequencies (fi)     5     5     7     6     6     7

b) Marks (X)          10-19   20-29   30-39   40-49   50-59
   Frequencies (fi)     4       5       8       6       2

Solution a)

$$\bar{x} = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \frac{5(20) + 5(21) + 7(22) + 6(23) + 6(24) + 7(25)}{36} = \frac{816}{36} = 22.66666667 \approx 22.7$$

b)
Mid-value (xi)   14.5    24.5    34.5    44.5    54.5    Total
fi                 4       5       8       6       2       25
fixi              58     122.5    276     267     109     832.5

Therefore, $\bar{x} = \dfrac{832.5}{25} = 33.3$
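
The computation in Example 3.1 can be checked with a short Python sketch. The function name `grouped_mean` and the variable names are illustrative choices, not part of the notes; the sketch only applies the formula $\bar{x} = \sum f_i x_i / \sum f_i$, using class mid-values for the grouped marks in part b).

```python
def grouped_mean(values, freqs):
    """Arithmetic mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    total = sum(f * x for x, f in zip(values, freqs))
    n = sum(freqs)
    return total / n

# Example 3.1 a): discrete values with their frequencies
mean_a = grouped_mean([20, 21, 22, 23, 24, 25], [5, 5, 7, 6, 6, 7])

# Example 3.1 b): classes 10-19, ..., 50-59 represented by their mid-values
limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
mid_values = [(lo + hi) / 2 for lo, hi in limits]
mean_b = grouped_mean(mid_values, [4, 5, 8, 6, 2])

print(round(mean_a, 1), round(mean_b, 1))  # 22.7 33.3
```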

Rounding Rule for the Mean The mean should be rounded to one more decimal place than occurs in the raw
data. For example, if the raw data are given in whole numbers, the mean should be rounded to the nearest tenth.
If the data are given in tenths, the mean should be rounded to the nearest hundredth, and so on.

Mathematical Properties of the arithmetic mean


Arithmetic mean possesses some very interesting and important mathematical properties as given below.
1. If we subtract an arbitrary constant from each of the observations, the mean is also reduced by the same
constant.

Example 3.2 The mean of the observations 2, 4, 6 and 8 is $\frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5$. If we subtract 2 from each of
these observations, we get the new observations 0, 2, 4 and 6. The mean of the new observations is
$\frac{0 + 2 + 4 + 6}{4} = 3$, which is the mean of the original data minus the constant 2, i.e. 5 - 2 = 3.

2. If we add an arbitrary constant to each of the observations, the mean is also increased by the same constant.

3. If we divide each observation of a set by an arbitrary non-zero constant, the mean is divided by the same
constant.

Example 3.3 If we have a set of data 2, 4, 6 and 8, then their mean is 5. If we divide the given set of data by 2,
we get a new set of data 1, 2, 3 and 4. The mean of the new set of data is $\frac{1 + 2 + 3 + 4}{4} = 2.5$, which is equal to
$\frac{5}{2} = 2.5$.

4. If a wrong figure has been used when computing the mean, then the correct mean can be obtained without
repeating the whole process, using the relation

$$\text{Correct mean} = \text{wrong mean} + \frac{\text{correct value} - \text{wrong value}}{n},$$

where n is the total number of observations.

Example 3.4 The arithmetic mean of 20 observations is 20. But while calculating this, an observation 13 was
misread as 30. Compute the correct mean.

Solution

Given $n = 20$ and $\bar{x}_{\text{wrong}} = 20$,

$$\bar{x}_{\text{correct}} = \bar{x}_{\text{wrong}} + \frac{\text{correct value} - \text{wrong value}}{n} = 20 + \frac{13 - 30}{20} = 19.15 \approx 19.2$$
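
As a quick check of property 4, the following Python sketch applies the correction formula to Example 3.4 and compares it with correcting the total directly. The helper name `corrected_mean` is an illustrative assumption, not something defined in the notes.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Correct mean = wrong mean + (correct value - wrong value) / n."""
    return wrong_mean + (correct_value - wrong_value) / n

# Example 3.4: 20 observations with mean 20; the value 13 was misread as 30
fixed = corrected_mean(wrong_mean=20, n=20, wrong_value=30, correct_value=13)

# Cross-check: correct the total first, then divide by n
total_wrong = 20 * 20
check = (total_wrong - 30 + 13) / 20

print(round(fixed, 2), round(check, 2))  # 19.15 19.15
```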

5. The sum of the deviations of the observations from their arithmetic mean is always equal to zero, i.e. if
$X_1, X_2, \ldots, X_N$ denote the values of a variable $X$ and $\bar{X}$ denotes their mean, then $\sum_{i=1}^{N} (X_i - \bar{X}) = 0$.

Proof:

$$\begin{aligned}
\sum_{i=1}^{N} (X_i - \bar{X}) &= (X_1 - \bar{X}) + (X_2 - \bar{X}) + \cdots + (X_N - \bar{X}) \\
&= \sum_{i=1}^{N} X_i - N\bar{X} \\
&= N\bar{X} - N\bar{X}, \quad \text{since } \bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} \\
&= 0
\end{aligned}$$

6. If $Y$ is a linear function of $X$, then $\bar{Y}$ is the same linear function of $\bar{X}$, i.e. if $Y_i = aX_i + b$, $i = 1, 2, \ldots, N$, where $a$
and $b$ are any given constants, then $\bar{Y} = a\bar{X} + b$.

Proof By definition, the mean of $Y$ is

$$\begin{aligned}
\bar{Y} &= \frac{1}{N} \sum_{i=1}^{N} Y_i \\
&= \frac{1}{N} \sum_{i=1}^{N} (aX_i + b) \\
&= \frac{1}{N} \left( a \sum_{i=1}^{N} X_i + Nb \right) \\
&= \frac{a \sum_{i=1}^{N} X_i}{N} + \frac{Nb}{N} \\
&= a\bar{X} + b
\end{aligned}$$

7. The sum of the squares of the deviations of a given set of observations is minimum when the deviations are
taken from the arithmetic mean.
Proof Mathematically, for a given frequency distribution, the sum

$$S = \sum_i f_i (X_i - A)^2$$

is minimum when $A = \bar{X}$. Here we use the principle of maxima and minima in differential calculus: $S$ is
minimum if $\dfrac{dS}{dA} = 0$ and $\dfrac{d^2S}{dA^2} > 0$.

$$\frac{dS}{dA} = \sum_i f_i \cdot 2 (X_i - A)(-1) = -2 \sum_i f_i (X_i - A) = 0$$

$$\Rightarrow \sum_i f_i X_i - A \sum_i f_i = 0$$

$$\Rightarrow A = \frac{\sum_i f_i X_i}{\sum_i f_i} = \bar{X}$$

Again,

$$\frac{d^2S}{dA^2} = -2 \sum_i f_i (-1) = 2 \sum_i f_i = 2N > 0,$$

since the total frequency is always positive.
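
Properties 5 to 7 can also be checked numerically. The sketch below uses the data 2, 4, 6, 8 from Example 3.2; the constants `a = 3` and `b = -1` and the grid of candidate values for $A$ are arbitrary illustrative choices, and a grid search is only a numerical illustration of property 7, not a substitute for the calculus proof.

```python
def mean(xs):
    return sum(xs) / len(xs)

xs = [2, 4, 6, 8]
x_bar = mean(xs)

# Property 5: deviations from the mean sum to zero
dev_sum = sum(x - x_bar for x in xs)

# Property 6: Y = aX + b  implies  mean(Y) = a * mean(X) + b
a, b = 3, -1
y_bar = mean([a * x + b for x in xs])

# Property 7: S(A) = sum (x - A)^2 is smallest when A is the mean
def sum_sq(A):
    return sum((x - A) ** 2 for x in xs)

candidates = [x_bar + d / 10 for d in range(-20, 21)]  # 3.0, 3.1, ..., 7.0
best = min(candidates, key=sum_sq)

print(dev_sum, y_bar, best)  # 0.0 14.0 5.0
```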

b) Weighted mean
In the computation of arithmetic mean we assumed that all items are of equal importance. It may not be so.
Importance of different items can be shown by attaching suitable weights to them relative to their importance. If
w1,w2,…wn are the weights assigned to the values x1,x2,…,xn respectively, then the weighted mean is given as:


$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

(When all the weights are equal, i.e. $w_1 = w_2 = \cdots = w_n = w$, the weighted mean reduces to the simple arithmetic mean.)

Example 3.5 A student was registered for five courses with 4,4,3,2 and 3 credit hours. She obtained B, A, C, D
and A grades respectively. The grading system is of the form A= 4, B=3, C=2, D=1 and F=0. Find the GPA of
the student.

Solution Let $w_1 = 4, w_2 = 4, w_3 = 3, w_4 = 2$ and $w_5 = 3$, because the credit hours are the weights of the courses.
Then $x_1 = 3, x_2 = 4, x_3 = 2, x_4 = 1$ and $x_5 = 4$.

Therefore,

$$\text{GPA} = \bar{x}_w = \frac{\sum_{i=1}^{5} w_i x_i}{\sum_{i=1}^{5} w_i} = \frac{4(3) + 4(4) + 3(2) + 2(1) + 3(4)}{4 + 4 + 3 + 2 + 3} = \frac{48}{16} = 3.00$$
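
The GPA computation of Example 3.5 is a direct application of the weighted mean. A minimal Python sketch follows; the names `weighted_mean`, `grade_points`, `credit_hours` and `grades` are illustrative and not taken from the notes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 3.5: credit hours are the weights, grade points are the values
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
credit_hours = [4, 4, 3, 2, 3]
grades = ["B", "A", "C", "D", "A"]

gpa = weighted_mean([grade_points[g] for g in grades], credit_hours)

# With equal weights the weighted mean is the simple arithmetic mean
equal = weighted_mean([2, 4, 6, 8], [1, 1, 1, 1])

print(f"{gpa:.2f}", equal)  # 3.00 5.0
```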

c) Combined mean

Let there be two sets of observations on the variable $X$. Let $n_1$ and $\bar{x}_1$ denote the number of observations and
the mean of $X$ in the 1st set, and let $n_2$ and $\bar{x}_2$ denote the number of observations and the mean of $X$ in the 2nd set.
Then the mean of the combined set of $n_1 + n_2$ observations on $X$, denoted by $\bar{x}_{12}$, is given by

$$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

Example 3.6 In a test given to two sections of a statistics course, the average grade is 60.98. Section 1 has a
mean of 57.30 and section 2 a mean of 65.30. If there are 27 students in section 1, how many students are there in
section 2?

Solution Let $n_1$ and $n_2$ be the numbers of students in section 1 and section 2 respectively, and let $\bar{x}_1$ and $\bar{x}_2$ be the mean
marks of the students in section 1 and section 2 respectively. Again, let $\bar{x}_{12}$ be the combined
mean grade of the two sections.

We are given that $n_1 = 27$, $\bar{x}_1 = 57.30$, $\bar{x}_2 = 65.30$ and $\bar{x}_{12} = 60.98$. We are to find $n_2$.

Then, by the definition of the combined mean, we have

$$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \Rightarrow 60.98 = \frac{27(57.3) + 65.3\,n_2}{27 + n_2}$$

$$\Rightarrow 60.98(27 + n_2) = 1547.1 + 65.3\,n_2 \Rightarrow 4.32\,n_2 = 99.36 \Rightarrow n_2 = 23$$

Thus there are 23 students in section 2.

Generally, if we have $k$ different sets of data with $n_1, n_2, \ldots, n_k$ observations and $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
arithmetic means respectively, then the arithmetic mean of the combined set of observations is given by the
relation

$$\bar{x}_{12 \ldots k} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

Example 3.7 The mean marks obtained by 300 candidates in statistics are 46. The mean of the top 100 of
them was found to be 70 and the mean of the last 100 was known to be 20. What is the mean of the remaining
100 candidates?

Solution Here we are given three different sets of data with $n_1 = n_2 = n_3 = 100$, $\bar{x}_1 = 70$, $\bar{x}_3 = 20$ and $\bar{x}_{123} = 46$.
We require $\bar{x}_2$.

Using the formula for the combined mean of three sets of data, we have

$$\bar{x}_{123} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} \Rightarrow 46 = \frac{100(70) + 100\,\bar{x}_2 + 100(20)}{300} \Rightarrow \bar{x}_2 = 48.$$
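
Examples 3.6 and 3.7 both rearrange the combined-mean formula to recover an unknown. The following Python sketch shows one way to do this; the function name `combined_mean` and the algebraic rearrangements in the comments are illustrative, and the results are verified by substituting back into the formula.

```python
def combined_mean(sizes, means):
    """Combined mean: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Example 3.6: n1*x1 + n2*x2 = x12*(n1 + n2)  =>  n2 = n1*(x12 - x1)/(x2 - x12)
n1, x1, x2, x12 = 27, 57.30, 65.30, 60.98
n2 = n1 * (x12 - x1) / (x2 - x12)

# Example 3.7: total of all 300 marks minus the two known group totals
total = 300 * 46
x_mid = (total - 100 * 70 - 100 * 20) / 100

print(round(n2), x_mid)  # 23 48.0
print(round(combined_mean([27, 23], [57.30, 65.30]), 2))  # 60.98
```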

Merits and Demerits of Arithmetic mean

Merits

1. It is rigidly defined (its definition is clear and unambiguous, so that it leads to one and only
one interpretation by different persons)

2. It is easy to calculate and understand

3. It is based on all the observations

4. It is suitable for further mathematical treatment

5. Of all the averages, the arithmetic mean is affected least by fluctuations of sampling (it is a stable average)

Demerits

1. It is affected by extreme values.

2. It cannot be obtained for open end classes.

3. It cannot be used for qualitative characteristics such as intelligence, honesty, beauty, etc.

4. It cannot be determined by inspection nor can it be located graphically.


d) The Geometric mean (G.M)

Definition Let $X$ be a variable with values $X_1, X_2, \ldots, X_n$. Then the geometric mean of $X$, denoted by G.M. or
$M_g$, is defined as:

$$\text{G.M.} = (x_1 x_2 x_3 \cdots x_n)^{1/n} = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} \quad (\text{for } x_i > 0 \text{ only})$$

Example 3.8 Suppose the profits earned by the Sur Construction Company on five projects were 3, 4, 4, 6 and 5
percent, respectively. What is the geometric mean profit?

Solution

$$\text{G.M.} = \sqrt[5]{x_1 x_2 x_3 x_4 x_5} = \sqrt[5]{3 \times 4 \times 4 \times 6 \times 5} = \sqrt[5]{1440} = 4.28225$$

The geometric mean profit is 4.28225 percent. The arithmetic mean profit is 4.4 percent, found by
(3+4+4+6+5)/5. For any series of positive values, the arithmetic mean is always greater than the geometric
mean, unless the items being averaged all have the same value, in which case the two averages are equal.
The above form of the formula is used when dealing with ungrouped data. For discrete grouped data, the
formula of the geometric mean becomes:

$$\text{G.M.} = \sqrt[n]{x_1^{f_1} \, x_2^{f_2} \, x_3^{f_3} \cdots x_m^{f_m}}$$

where $f_i$ = frequency of the $i$th value,

$x_i$ = the $i$th value,

$m$ = number of distinct values,

and $n = \sum_i f_i$.

Example 3.9 Find the geometric mean for the data given in the table below.

xi    1    2    4    6
fi    2    1    2    3

Solution

$$n = \sum_i f_i = 2 + 1 + 2 + 3 = 8$$

$$\therefore \text{G.M.} = \sqrt[8]{1^2 \times 2^1 \times 4^2 \times 6^3} = \sqrt[8]{6912} = 3.02.$$


For continuous grouped data, we use the same formula by letting the class marks represent their respective
classes.
Example 3.10 Find the geometric mean for the following continuous grouped data on the percentage increase in
salary of 16 employees of a company.

% increase in salary    0-4    5-9    10-14    15-19
Number of employees      5      6       3        2

Solution The class marks are 2, 7, 12 and 17 for the 1st, 2nd, 3rd and 4th class respectively.

Therefore,

$$\begin{aligned}
\text{G.M.} &= \sqrt[16]{2^5 \times 7^6 \times 12^3 \times 17^2} \\
&= \sqrt[16]{32 \times 117649 \times 1728 \times 289} \\
&= \sqrt[16]{1{,}880{,}095{,}021{,}056} \\
&= 5.85
\end{aligned}$$

The geometric mean percentage increase in salary is 5.85 percent.
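
Products such as the one in Example 3.10 grow very quickly, so in practice the geometric mean is usually computed through logarithms, using $\log(\text{G.M.}) = \frac{1}{n} \sum f_i \log x_i$. The sketch below takes that route; it is one possible implementation under that assumption, and the function name `geometric_mean` is illustrative (the Python standard library also has `statistics.geometric_mean` for ungrouped data).

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logarithms; freqs defaults to 1 for every value."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)

gm_38 = geometric_mean([3, 4, 4, 6, 5])                   # Example 3.8
gm_39 = geometric_mean([1, 2, 4, 6], [2, 1, 2, 3])        # Example 3.9
gm_310 = geometric_mean([2, 7, 12, 17], [5, 6, 3, 2])     # Example 3.10 (class marks)

print(round(gm_38, 2), round(gm_39, 2), round(gm_310, 2))  # 4.28 3.02 5.85
```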

Uses of geometric mean

Geometric mean is especially useful in averaging ratios, percentages, and rates of increase between two periods.

G.M. is the appropriate average to be used for computing the average rate of growth of population or average
increase in the rate of profits, sales, production etc., or the rate of money.

Compound Interest formula

Let $P_0$ be the initial value of the variable (i.e. its value at the beginning), let $P_n$ be its value at the
end of period $n$, and let $r$ be the rate of growth per unit per period.

The growth in period 1 is $P_0 r$, and thus the value of the variable at the end of period 1 is

$$P_1 = P_0 + P_0 r = P_0 (1 + r)$$

The growth in the second period is $P_0 (1 + r) r$, and consequently the value of the variable at the end of the 2nd period is

$$P_2 = P_0 (1 + r) + P_0 (1 + r) r = P_0 (1 + r)(1 + r) = P_0 (1 + r)^2.$$

Proceeding similarly, the value of the variable at the end of period 3 is

$$P_3 = P_0 (1 + r)^2 + P_0 (1 + r)^2 r = P_0 (1 + r)^3$$

$$P_4 = P_0 (1 + r)^3 + P_0 (1 + r)^3 r = P_0 (1 + r)^4 \quad \text{(value at the end of period 4)}$$

$$P_n = P_0 (1 + r)^{n-1} + P_0 (1 + r)^{n-1} r = P_0 (1 + r)^n \quad \text{(value at the end of period } n\text{)}$$

$$\therefore P_n = P_0 (1 + r)^n \qquad (1) \quad \text{Compound interest formula for money}$$

where $P_n$: the value at the end of period $n$

$P_0$: the value at the beginning

$n$: the number of periods

$r$: the rate per unit per period

For given values of $n$, $P_n$ and $P_0$ we get from (1)

$$\frac{P_n}{P_0} = (1 + r)^n \Rightarrow 1 + r = \left( \frac{P_n}{P_0} \right)^{1/n} \Rightarrow r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$

Average Rate of a variable which Increases by Different Rates at Different periods

If, instead of increasing at a constant rate in each period, the variable grows at a different rate per unit per period,
say $r_1, r_2, \ldots, r_n$ for the 1st, 2nd, …, and $n$th period respectively, then as discussed above we get

$P_1$ = the value at the end of the first period $= P_0 (1 + r_1)$

$P_2$ = the value at the end of the 2nd period $= P_0 (1 + r_1)(1 + r_2)$

.
.
.
$P_n$ = the value at the end of period $n$:

$$P_n = P_0 (1 + r_1)(1 + r_2) \cdots (1 + r_n) \qquad (*)$$

If $r$ is assumed to be the constant rate of growth per unit per period, then we get

$$P_n = P_0 (1 + r)^n \qquad (**)$$

Hence, equating the values of $P_n$ in $(*)$ and $(**)$, the average rate of growth over the $n$ periods is given by:

$$(1 + r)^n = (1 + r_1)(1 + r_2) \cdots (1 + r_n)$$

$$\Rightarrow 1 + r = \left[ (1 + r_1)(1 + r_2) \cdots (1 + r_n) \right]^{1/n}$$

If $r_1, r_2, r_3, \ldots, r_n$ denote the percentage growth per unit per period for the $n$ periods respectively, then we have

$$1 + \frac{r}{100} = \left[ \left( 1 + \frac{r_1}{100} \right) \left( 1 + \frac{r_2}{100} \right) \cdots \left( 1 + \frac{r_n}{100} \right) \right]^{1/n}$$

where $r$ is the average percentage growth rate over the $n$ periods.

$$\Rightarrow 100 + r = \left[ (100 + r_1)(100 + r_2) \cdots (100 + r_n) \right]^{1/n}$$

$$\Rightarrow r = \left[ (100 + r_1)(100 + r_2) \cdots (100 + r_n) \right]^{1/n} - 100$$

Thus we see that if rates are given as percentages, then the average percentage growth rate can be obtained by
subtracting 100 from the G.M. of (100+r1), (100+r2), …, (100+rn).

Example 3.11 Find the average rate of increase in a population which increased by 20% in the first decade,
by 30% in the next and by 40% in the third.

Solution Here $r_1 = 20$, $r_2 = 30$, $r_3 = 40$ and $n = 3$. Hence, the average percentage rate of
increase in the population per decade over the entire period is

$$r = \sqrt[3]{120 \times 130 \times 140} - 100 = 129.743 - 100 \approx 29.7\%$$

Example 3.12 The population of a country was 300 million in 1951. It became 520 million in 1969. Calculate the
percentage compound rate of growth per annum.

Solution Given $P_n = 520{,}000{,}000$, $P_0 = 300{,}000{,}000$ and $n = 19$.

If $r$ is the compound rate of growth per annum, then by the formula:

$$P_n = P_0 (1 + r)^n \Rightarrow \frac{520}{300} = (1 + r)^{19}$$

$$\Rightarrow 1 + r = \left( \frac{26}{15} \right)^{1/19}$$

$$\Rightarrow r = \left( \frac{26}{15} \right)^{1/19} - 1 \approx 0.0294 = 2.94\%$$

Example 3.13 A certain store made profits of Birr 5,000, Birr 10,000 and Birr 80,000 in 1965, 1966 and 1967
respectively. Determine the average rate of growth of this store's profits.

Solution The profit in 1966 as a percentage of the profit in 1965 is $\dfrac{10{,}000}{5{,}000} \times 100 = 200\%$

The profit in 1967 as a percentage of the profit in 1966 is $\dfrac{80{,}000}{10{,}000} \times 100 = 800\%$

The average of these year-on-year percentages from 1965 to 1967 is the geometric
mean of 200 and 800,

i.e. $\sqrt{200 \times 800} = 400\%$

That is, each year's profit was on average 400% of the previous year's profit, which corresponds to an
average rate of growth of 400 - 100 = 300% per year.

Example 3.14 The price of a certain commodity increases from Birr 60 to Birr 140 in a period of 4 years. Find
the average percentage rate of growth of the price per year.
Solution Here $P_0$ = Birr 60, $P_n$ = Birr 140 and $n = 4$.

Then

$$r = \left( \frac{P_n}{P_0} \right)^{1/n} - 1 = \left( \frac{140}{60} \right)^{1/4} - 1 = 0.235930917,$$

i.e. the price grew by about 23.6% per year on average.

Merits and Demerits of Geometric Mean

Merits
1. Geometric Mean is rigidly defined
2. It is based on all observations
3. It is suitable for further mathematical treatment


Demerits
1. Because of its abstract mathematical character, the geometric mean is not easy to understand or to
calculate for a non-mathematical person.
2. If any one of the observations is zero, the geometric mean becomes zero, and if any of the
observations is negative, the geometric mean may become imaginary, regardless of the magnitude of the
other items.

e) The Harmonic Mean

Another measure of central tendency which is only occasionally used is the harmonic mean.

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of series of observations.

Let the values of the variable $X$ be $x_1, x_2, \ldots, x_n$. Then the harmonic mean of $X$, denoted by H.M., is defined as:

$$\text{H.M.} = \frac{1}{\dfrac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

To make the computation easier, we can write the formula as

$$\frac{1}{\text{H.M.}} = \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} = \frac{\sum_{i=1}^{n} \frac{1}{x_i}}{n}$$

The above formula is used when dealing with ungrouped data

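Example 3.15 Find the harmonic mean of the following numbers:

a) 2, 4 and 8 b) 2, 4, 3, 5, 6, 8

Solution a)

$$\frac{1}{\text{H.M.}} = \frac{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}}{3} = \frac{7}{24} \Rightarrow \text{H.M.} = \frac{24}{7} = 3.43$$

b)

$$\frac{1}{\text{H.M.}} = \frac{\frac{1}{2} + \frac{1}{4} + \frac{1}{3} + \frac{1}{5} + \frac{1}{6} + \frac{1}{8}}{6} = \frac{189/120}{6} = \frac{189}{720} \Rightarrow \text{H.M.} = \frac{720}{189} = 3.81$$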


For discrete grouped data, the same formula is used with a slight modification. For $k$ values $x_1, x_2, \ldots, x_k$ with
frequencies $f_1, f_2, \ldots, f_k$ respectively, the harmonic mean is given as:

$$\text{H.M.} = \frac{1}{\dfrac{\sum_{i=1}^{k} f_i \left( \frac{1}{x_i} \right)}{\sum_{i=1}^{k} f_i}} = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$

Example 3.16 Find the harmonic mean for the following discrete grouped data

Xi    3    6    5    4
fi    2    3    1    4

Solution

$$\sum_{i=1}^{4} f_i = 2 + 3 + 1 + 4 = 10$$

Therefore,

$$\text{H.M.} = \frac{\sum f_i}{\sum \frac{f_i}{X_i}} = \frac{10}{\frac{2}{3} + \frac{3}{6} + \frac{1}{5} + \frac{4}{4}} = \frac{10}{2.3667} = 4.23$$

For continuous grouped data, we apply the same formula as with the discrete grouped data by taking the class
marks as class representatives.

Example 3.17 Find the harmonic mean of the following continuous grouped data on the percentage increase in
salary of 16 employees of a company.

% increase in salary    0-4    5-9    10-14    15-19
Number of employees      5      6       3        2

Solution

Taking the class marks as the values, $\sum f_i = 16$, $X_1 = 2$, $X_2 = 7$, $X_3 = 12$, $X_4 = 17$
and $f_1 = 5$, $f_2 = 6$, $f_3 = 3$, $f_4 = 2$.

Therefore,

$$\text{H.M.} = \frac{16}{\frac{5}{2} + \frac{6}{7} + \frac{3}{12} + \frac{2}{17}} = 4.30$$
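
The harmonic mean examples can be reproduced with a few lines of Python. The function name `harmonic_mean` below is illustrative (the standard library's `statistics.harmonic_mean` covers the ungrouped case); frequencies default to 1, and for continuous classes the class marks are passed as the values. For Example 3.16 the quotient is 10/2.3667 = 4.2254, which is 4.23 to two decimal places.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: sum(f_i) / sum(f_i / x_i); freqs default to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

hm_a = harmonic_mean([2, 4, 8])                          # Example 3.15 a): 24/7
hm_b = harmonic_mean([2, 4, 3, 5, 6, 8])                 # Example 3.15 b): 720/189
hm_316 = harmonic_mean([3, 6, 5, 4], [2, 3, 1, 4])       # Example 3.16
hm_317 = harmonic_mean([2, 7, 12, 17], [5, 6, 3, 2])     # Example 3.17 (class marks)

print(f"{hm_a:.2f} {hm_b:.2f} {hm_316:.2f} {hm_317:.2f}")  # 3.43 3.81 4.23 4.30
```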


Merits and Demerits of Harmonic Mean

Merits

1. Harmonic mean is rigidly defined

2. It is based on all the observations

3. It is suitable for further mathematical treatment

4. It is not affected very much by fluctuations of sampling

5. Since the reciprocals of the values of the variable are involved, it gives greater weight to smaller
observations and as such is not much affected by one or two big observations.

6. Sometimes the variable may be in the form "x per y", e.g. km per hour, birr per kg, kg per cubic
cm, etc. In such cases, the harmonic mean is the proper average if equal units of x are
considered, while the arithmetic mean is appropriate if equal units of y are considered.

Demerits
1. It is not easy to understand and calculate.
2. Its value cannot be obtained if any one of the observations is zero.
3. It is not a representative figure of the distribution unless the phenomenon requires greater weightage
to be given to smaller items. As such, it is hardly used in business problems.
Relationship among Arithmetic Mean, Geometric Mean and Harmonic Mean

1. The arithmetic mean (A.M.), the geometric mean (G.M.) and the harmonic mean (H.M.) of a series of N
positive observations are connected by the relation:

$$\text{H.M.} \le \text{G.M.} \le \text{A.M.}$$

Example 3.18 For the numbers 2, 3, 4, 5:

$$\text{H.M.} = \frac{4}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}} = 3.116883117$$

$$\text{G.M.} = \sqrt[4]{2 \times 3 \times 4 \times 5} = 3.30975092$$

$$\text{A.M.} = \frac{2 + 3 + 4 + 5}{4} = 3.5$$

Therefore, $\text{H.M.} \le \text{G.M.} \le \text{A.M.}$

All three are equal if all the observations are identical.

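2. For two positive numbers we also have

$$(\text{G.M.})^2 = \text{A.M.} \times \text{H.M.}$$

Example 3.19 For the numbers 5 and 6 we have:

$$\text{H.M.} = \frac{2}{\frac{1}{5} + \frac{1}{6}} = \frac{60}{11}$$

$$\text{A.M.} = \frac{5 + 6}{2} = \frac{11}{2}$$

$$\text{G.M.} = \sqrt{5 \times 6} = \sqrt{30}$$

$$\left( \sqrt{30} \right)^2 = \frac{11}{2} \times \frac{60}{11} \Rightarrow 30 = 30$$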

3.4.2 Positional Measures of Central Tendency


Positional measures of central location are measures which are chosen because of their position. The most
popular positional measure is the median.

3.4.2.1 The Median


For data containing one or two extremely large or very small values, the arithmetic mean may not be
representative.

Example 3.20 Let the weights of 8 iron balls be:

138, 143, 141, 139, 152, 148, 130 and 267 kg. Here the mean is 157.25 kg, but this cannot be said to be a
representative value, because seven out of the eight given values are smaller than it.

In cases of this sort, where the data contain a few extreme values widely different from the majority of the
values, the mean should not be used.

The center point for such problems can be described using a measure of central tendency called the median.

Definition If the given values of a variable $X$ are arranged in increasing or decreasing order of magnitude, then
the middle-most value in this arrangement is called the median of $X$ (denoted by $M_d$ or $\tilde{x}$).

The median may alternatively be defined as a value of $X$ such that half of the given values of $X$ are smaller than
or equal to it and half are greater than or equal to it.


Median for ungrouped data

There are two cases:

i) When the number of values, $n$, is odd, the middle-most value, i.e. the $\left( \frac{n+1}{2} \right)^{\text{th}}$ value in the arrangement, is
the unique median of $X$:

$$\tilde{x} = M_d = \text{the } \left( \frac{n+1}{2} \right)^{\text{th}} \text{ observation.}$$

In this case, the median is an actually occurring value.

ii) When $n$ is even, there is no unique middle-most value. The median is then given by the
formula:

$$\tilde{x} = M_d = \frac{\left( \frac{n}{2} \right)^{\text{th}} \text{ value} + \left( \frac{n}{2} + 1 \right)^{\text{th}} \text{ value}}{2}$$

Example 3.21 Determine the medians of

a) 0, 5, -100, -20, 80
b) 6, 7, 9, 12, 16, 20
Solution a) Arranging the data in ascending order we have: -100, -20, 0, 5, 80

Then Md =0

b) The median is the average of 9 and 12.

$$\tilde{X} = \frac{9 + 12}{2} = \frac{21}{2} = 10.5$$

For discrete grouped data, the median is obtained by using the same formula as with the ungrouped data after
arranging the values in an increasing order.

Example 3.22 Find the median for the following data

Values (xi)         6    3    0    2    5    1    4
Frequencies (fi)    1   20    1   15    6    6   15

Solution Arranging the data in order of numerical size, we get

xi    0    1    2    3    4    5    6
fi    1    6   15   20   15    6    1

The total number of observations is 64, which is an even number. The cumulative frequencies are
1, 7, 22, 42, 57, 63 and 64, so both the 32nd and the 33rd values are equal to 3.

Therefore, the median is

$$\frac{\text{the 32nd value} + \text{the 33rd value}}{2} = \frac{3 + 3}{2} = 3$$
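
The two cases of the median rule, and its use with discrete grouped data, can be sketched in Python as follows. The helper names `median` and `expand` are illustrative; `expand` simply writes each value out as many times as its frequency, which is adequate for small tables such as Example 3.22 (the standard library's `statistics.median` gives the same results for ungrouped data).

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)-th and (n/2 + 1)-th values

def expand(values, freqs):
    """Repeat each value f_i times so the ungrouped rule can be applied."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]

print(median([0, 5, -100, -20, 80]))   # 0      (Example 3.21 a)
print(median([6, 7, 9, 12, 16, 20]))   # 10.5   (Example 3.21 b)
print(median(expand([6, 3, 0, 2, 5, 1, 4], [1, 20, 1, 15, 6, 6, 15])))  # 3.0 (Example 3.22)
```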

The median for continuous grouped data

For continuous grouped data, the exact median cannot be obtained unless the original raw data was retained.

There are two popular ways of locating the median for grouped data; the graphic method and the algebraic
interpolation method.

The graphic method


In the graphic method, the median value is found by interpolation from the ogive of the distribution. We draw a
horizontal line to the ogive at half of the total frequency on the absolute vertical scale (or at 0.5 on the relative
scale); then we drop a perpendicular line to the horizontal scale to locate the median value.

That is, the ogive of the distribution is first drawn. Then, through the point $\frac{n}{2}$ on the vertical axis, a line parallel to
the x-axis is drawn, which intersects the ogive at a point. From this point a perpendicular is dropped onto the x-axis.
The point at which it meets the x-axis is the median $M_d$.

The algebraic interpolation method


In the method of algebraic interpolation, we first find the median class, the class whose cumulative frequency
first equals or exceeds the value of $\frac{n}{2}$. Then we locate the median by use of the formula for interpolation. The
formula is developed based on the geometric fact that the median is the value of x (abscissa) corresponding to
the vertical line which divides a histogram into two parts having equal areas.

$$M_d = \tilde{x} = l + \frac{c}{f} \left( \frac{n}{2} - C \right)$$

where $l$ = the lower class boundary of the median class

$n = \sum_i f_i$ = total number of observations

$C$ = cumulative frequency of the class preceding the median class

$f$ = frequency of the median class

$c$ = class width of the median class

Note: The median class is the class which contains the $\left( \frac{n}{2} \right)^{\text{th}}$ observation. In using the formula, we use $\frac{n}{2}$
whether $n$ is odd or even, because the observations lose their individual identity once they are grouped into
continuous classes.

Example 3.23 Find the mean and median for the following data.

Number of absent days    < 5   < 10   < 15   < 20   < 25   < 30   < 35   < 40   < 45
Number of students        29    224    465    582    634    644    650    653    655

Solution:

The data should be rearranged first. The frequencies given are "less than" cumulative frequencies. To calculate the
frequency of each class interval, subtract each cumulative frequency from the one immediately following it.

Number of absent days    0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40   40-45   Total
Number of students        29    195    241     117      52      10       6       3       2     655

Using the class mid-points 2.5, 7.5, …, 42.5 as the values $x_i$,

$$\bar{x} = \frac{\sum_{i=1}^{9} f_i x_i}{\sum_{i=1}^{9} f_i} = \frac{8432.5}{655} = 12.8740458 \approx 12.87 \text{ days}$$

For the median, $\frac{n}{2} = 327.5$. The cumulative frequency first exceeds 327.5 in the class 10-15 (cumulative
frequency 465). Hence, the median class is 10-15, with $l = 10$, $c = 5$, $f = 241$ and $C = 224$.

$$M_d = \tilde{x} = l + \frac{c}{f} \left( \frac{n}{2} - C \right) = 10 + \frac{5}{241} (327.5 - 224) = 12.1473029 \approx 12.15 \text{ days}$$
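
The interpolation formula $M_d = l + \frac{c}{f} \left( \frac{n}{2} - C \right)$ can be sketched in Python as below. The function name `grouped_median` and the representation of classes as `(lower, upper)` boundary pairs are illustrative assumptions; the median class is taken to be the first class whose cumulative frequency reaches $n/2$.

```python
def grouped_median(boundaries, freqs):
    """Median of continuous grouped data by linear interpolation.

    boundaries: (lower, upper) class boundaries in increasing order.
    freqs: class frequencies in the same order.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cumulative = 0  # C: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= n / 2:          # this is the median class
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f

# Example 3.23: recover class frequencies from "less than" cumulative frequencies
cum = [29, 224, 465, 582, 634, 644, 650, 653, 655]
freqs = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
boundaries = [(lo, lo + 5) for lo in range(0, 45, 5)]

md = grouped_median(boundaries, freqs)
print(freqs)          # [29, 195, 241, 117, 52, 10, 6, 3, 2]
print(round(md, 2))   # 12.15
```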

Merits and Demerits of Median


Merits
1. It is rigidly defined.

2. It is easy to understand and easy to calculate for a non-mathematical person.


3. Since the median is a positional average, it is not affected at all by extreme observations and as such is very
useful in the case of skewed distributions.
4. The median can be computed even for a distribution with open-end classes.
5. The median can sometimes be located by simple inspection and can also be found graphically.
6. It can be used for qualitative characteristics which cannot be measured quantitatively but can still be
arranged in ascending or descending order of magnitude, e.g. to find the average intelligence, average beauty,
average honesty, etc. among a group of people.

Demerits
1. In the case of an even number of observations in ungrouped data, the median cannot be determined exactly; it is
estimated as the mean of the two middle values.
2. Median, being a positional average, is not based on each and every item of the distribution.
3. Median is not suitable for further mathematical treatment, i.e., given the sizes and the median values of
different groups, we cannot compute the median of the combined groups.
4. Median is relatively less stable than the mean, particularly for small samples, since it is affected more by
fluctuations of sampling than the arithmetic mean.


3.4.2.2 Other Positional Measures of Central Tendency


The median divides a given set of data into two equal parts. It is also possible to subdivide a set of data into
more than two equal parts. The measures obtained by such equal subdivisions of data are called quantiles or
fractiles.
In this subsection, we will discuss three types of quantiles related to the median, namely the
quartiles, deciles and percentiles.

The quartiles are measures which divide a given set of data into four equal parts. There are three quartiles,
usually denoted by Q1, Q2 and Q3. They are obtained after arranging the data in increasing order
and are known as the first, second and third quartiles respectively.

The deciles divide a given set of data into ten equal parts. There are nine deciles, usually denoted by D1, D2, ...,
D9. These measures are obtained after arranging the data in increasing order and are known as the first
decile, second decile, third decile, etc.

Generally, Di is known as the ith decile, where i runs from 1 to 9.

Similarly, the percentiles divide a given set of data into a hundred equal parts. There are 99 percentiles,
denoted by P1, P2, ..., P99 for the first, second, third, etc. percentiles respectively. Generally, Pi is used to denote
the ith percentile.

Percentile Formula
The percentile corresponding to a given value X is computed by using the following formula:

\[ \text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\% \]

Example 3.24: A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile
rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Solution
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Then substitute into the formula. Since there are six values below a score of 12, the solution is

\[ \text{Percentile} = \frac{6 + 0.5}{10} \times 100\% = 65\text{th percentile} \]

Thus, a student whose score was 12 did better than 65% of the class.


Example 3.25 Using the data in Example 3.24, find the percentile rank for a score of 6.
Solution
There are three values below 6. Thus

\[ \text{Percentile} = \frac{3 + 0.5}{10} \times 100\% = 35\text{th percentile} \]

A student who scored 6 did better than 35% of the class.
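
The percentile formula used in the last two examples translates directly into code. The following is a minimal Python sketch; the function name `percentile_rank` is an illustrative assumption.

```python
# Sketch of the percentile formula: (values below X + 0.5) / total * 100.
# The function name is an illustrative assumption.

def percentile_rank(values, x):
    """Percentile rank of x within values, as a percentage."""
    below = sum(1 for v in values if v < x)  # number of values strictly below x
    return (below + 0.5) * 100 / len(values)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # score of 12
print(percentile_rank(scores, 6))   # score of 6
```

The data need not be sorted first, because the formula only counts how many values lie below X.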

Procedure Table
Finding a Data Value Corresponding to a Given Percentile
Step 1 Arrange the data in order from lowest to highest.
Step 2 Substitute into the formula

\[ c = \frac{n \cdot p}{100} \]

where
n = total number of values
p = percentile
Step 3A If c is not a whole number, round up to the next whole number. Starting at the lowest value, count
over to the number that corresponds to the rounded-up value.
Step 3B If c is a whole number, use the value halfway between the cth and (c+1)st values when counting up
from the lowest value.
Example 3.26 Using the scores in Example 3.24, find the value corresponding to the 25th percentile.

Solution

Step 1 Arrange the data in order from lowest to highest. 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Compute

\[ c = \frac{n \cdot p}{100} = \frac{10 \cdot 25}{100} = 2.5 \]

Step 3 If c is not a whole number, round it up to the next whole number; in this case, c = 3. (If c is a whole
number, see Example 3.27.) Start at the lowest value and count over to the third value, which is 5. Hence, the
value 5 corresponds to the 25th percentile.

Example 3.27 Using the data set in Example 3.24, find the value that corresponds to the 60th percentile.

Solution

Step 1 Arrange the data in order from smallest to largest.


2, 3, 5, 6, 8, 10, 12, 15, 18, 20


Step 2 Substitute in the formula.

\[ c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6 \]

Step 3 If c is a whole number, use the value halfway between the cth and (c+1)st values when counting up from the
lowest value; in this case, the 6th and 7th values.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.
Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the
class.
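
The procedure table for finding the value at a given percentile can be sketched in Python as follows. The function name `value_at_percentile` is an illustrative assumption, and the sketch is restricted to percentiles strictly between 0 and 100, where both branches of Step 3 are well defined.

```python
import math

# Sketch of the procedure table: c = n * p / 100, then Step 3A or Step 3B.
# The function name is an illustrative assumption.

def value_at_percentile(values, p):
    """Data value corresponding to the p-th percentile (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    data = sorted(values)                  # Step 1: arrange in increasing order
    c = len(data) * p / 100                # Step 2: compute c
    if c != int(c):                        # Step 3A: c is not a whole number
        return data[math.ceil(c) - 1]      # round up, then count from the lowest value
    c = int(c)                             # Step 3B: c is a whole number
    return (data[c - 1] + data[c]) / 2     # halfway between the c-th and (c+1)-st values

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))
print(value_at_percentile(scores, 60))
```

The subtraction of 1 in the index is needed only because Python lists count from 0 while the procedure counts from 1.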

The values of Md, Q2, D5 and P50 are equal, as are those of Q3 and P75. This is not a coincidence. The relationship
always holds because Md, Q2, D5 and P50 each divide the set of numbers into two equal parts; similarly, Q3 and P75
are both the value below which 75% of the numbers lie.

In fact, the following relations always hold:

i) Md = Q2 = D5 = P50
ii) Q1 = P25
iii) Q3 = P75
iv) D1 = P10, D2 = P20, D3 = P30, ..., D9 = P90

Procedure Table
Finding Data Values Corresponding to Q1, Q2, and Q3
Step 1 Arrange the data in order from lowest to highest.
Step 2 Find the median of the data values. This is the value for Q2.
Step 3 Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4 Find the median of the data values that fall above Q2. This is the value for Q3.

Example 3.28 Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
Solution
Step 1 Arrange the data in order.
5, 6, 12, 13, 15, 18, 22, 50

Step 2 Find the median (Q2). With eight values, the median lies halfway between the 4th and 5th values, 13 and 15.

\[ M_d = \frac{13 + 15}{2} = 14 \]


Step 3 Find the median of the data values less than 14.
5, 6, 12, 13

\[ Q_1 = \frac{6 + 12}{2} = 9 \]

Step 4 Find the median of the data values greater than 14.
15, 18, 22, 50

\[ Q_3 = \frac{18 + 22}{2} = 20 \]

Hence, Q1 = 9, Q2 = 14, and Q3 = 20.
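
The quartile procedure above can be sketched in Python with the standard library's `statistics.median`. The function name `quartiles` is an illustrative assumption; the sketch follows the wording of the procedure literally, so values equal to Q2 are left out of both halves.

```python
from statistics import median

# Sketch of the quartile procedure: median of the whole set, then the
# medians of the values below and above it. The name is an assumption.

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    data = sorted(values)                     # Step 1
    q2 = median(data)                         # Step 2
    lower = [v for v in data if v < q2]       # values that fall below Q2
    upper = [v for v in data if v > q2]       # values that fall above Q2
    return median(lower), q2, median(upper)   # Steps 3 and 4

q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)
```

Other textbooks and software use slightly different quartile conventions, so results from library routines may differ from this procedure on small data sets.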

For discrete grouped data, the quantiles are obtained by using the same set of formulae as for ungrouped data,
after arranging the values in increasing order.

Example 3.29 The data given below is the distribution of 99 students according to the total number of credits
they are taking in a semester.

Credit hours          8   10   15   16   17   18   20   Total
Number of students    8   10   10   16   20   25   10      99

Find the 1st quartile, 7th decile and the 60th percentile.
Solution The data is already arranged according to numerical size.
Therefore,

\[ Q_2 = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value} = 50^{\text{th}} \text{ value} = 17 \text{ credit hours} \]

Q1 is the median of the values less than the 50th observation:

\[ Q_1 = \left(\frac{49+1}{2}\right)^{\text{th}} \text{ value} = 25^{\text{th}} \text{ value} = 15 \text{ credit hours} \]

We know that D7 = P70.

To find P70 we use the formula:

\[ c = \frac{n \cdot p}{100} = \frac{99 \cdot 70}{100} = 69.3 \]


Since c is not a whole number, we round up and take c = 70.

Start at the lowest value and count over to the 70th value, which is 18 credit hours. Hence, the value 18
corresponds to the 70th percentile.

\[ \therefore D_7 = P_{70} = 18 \text{ credit hours} \]

To find P60 we use the formula:

\[ c = \frac{n \cdot p}{100} = \frac{99 \cdot 60}{100} = 59.4 \]

Since c is not a whole number, we round up and take c = 60.

Start at the lowest value and count over to the 60th value, which is 17 credit hours. Hence, the value 17
corresponds to the 60th percentile.

\[ \therefore P_{60} = 17 \text{ credit hours} \]

Note: Including a third column of cumulative frequencies may help in locating these values.
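
As a sketch of how cumulative frequencies help in Example 3.29, the Python fragment below locates the kth ordered observation from the running totals instead of counting one by one. The helper names `kth_value` and `percentile_value` are illustrative assumptions; `bisect_left` and `accumulate` come from the Python standard library.

```python
import math
from bisect import bisect_left
from itertools import accumulate

# Sketch for Example 3.29: quantiles of discrete grouped data.
# Helper names are illustrative assumptions.

credit_hours = [8, 10, 15, 16, 17, 18, 20]
students = [8, 10, 10, 16, 20, 25, 10]
cum = list(accumulate(students))  # cumulative frequencies

def kth_value(k):
    """Value of the k-th observation (counting from 1) in the ordered data."""
    # The k-th observation lies in the first class whose cumulative frequency is >= k.
    return credit_hours[bisect_left(cum, k)]

def percentile_value(p):
    """Value at the p-th percentile, using c = n * p / 100 (0 < p < 100)."""
    c = cum[-1] * p / 100
    if c != int(c):
        return kth_value(math.ceil(c))                  # round up, then count over
    c = int(c)
    return (kth_value(c) + kth_value(c + 1)) / 2        # halfway between two values

q1 = kth_value(25)          # Q1 is the 25th value
d7 = percentile_value(70)   # D7 = P70
p60 = percentile_value(60)
print(cum, q1, d7, p60)
```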

For continuous grouped data the exact quantiles cannot be obtained unless we can restore the ungrouped data. In
such a case, we use an approximating formula, analogous to that of the median, for each of the
quantiles. These are given as:

\[ Q_i = l + \frac{c}{f}\left(\frac{i\,n}{4} - C\right), \quad i = 1, 2, 3 \]

where l = the lower class boundary of the Qi class

n = total number of observations

C = cumulative frequency of all classes lower than the Qi class

f = frequency of the Qi class

c = class width of the Qi class

\[ D_i = l + \frac{c}{f}\left(\frac{i\,n}{10} - C\right), \quad i = 1, 2, \ldots, 9 \]

where l = the lower class boundary of the Di class
n = total number of observations
C = cumulative frequency of all classes lower than the Di class


f = frequency of the Di class

c = class width of the Di class

\[ P_i = l + \frac{c}{f}\left(\frac{i\,n}{100} - C\right), \quad i = 1, 2, \ldots, 99 \]

where l = the lower class boundary of the Pi class

n = total number of observations

C = cumulative frequency of all classes lower than the Pi class

f = frequency of the Pi class

c = class width of the Pi class
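
The three formulas share one pattern: only the divisor (4, 10 or 100) changes. A minimal Python sketch of that pattern is given below, applied to the exam-score distribution of Example 3.30, which follows. The function name `grouped_quantile` and its argument layout are illustrative assumptions; class boundaries are passed explicitly so that classes of different width are handled.

```python
# Sketch of the interpolation formula l + (c / f) * (i * n / parts - C).
# The function name and argument layout are illustrative assumptions.

def grouped_quantile(boundaries, freqs, i, parts):
    """i-th quantile when the data are divided into `parts` equal parts.

    boundaries: class boundaries, one more entry than freqs
    freqs:      class frequencies
    """
    if not 0 < i < parts:
        raise ValueError("i must lie strictly between 0 and parts")
    n = sum(freqs)
    target = i * n / parts   # position of the quantile: i*n/4, i*n/10 or i*n/100
    cum_before = 0           # C: cumulative frequency of all lower classes
    for k, f in enumerate(freqs):
        if cum_before + f >= target:                   # quantile class found
            l = boundaries[k]                          # lower class boundary
            c = boundaries[k + 1] - boundaries[k]      # class width
            return l + (c / f) * (target - cum_before)
        cum_before += f
    raise ValueError("no observations")

# Exam-score distribution of Example 3.30 (class limits 46-50, ..., 81-100).
boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5, 80.5, 100.5]
freqs = [4, 8, 15, 5, 9, 5, 3, 1]

q3 = grouped_quantile(boundaries, freqs, 3, 4)
d5 = grouped_quantile(boundaries, freqs, 5, 10)
p40 = grouped_quantile(boundaries, freqs, 40, 100)
print(round(q3, 2), round(d5, 2), round(p40, 2))
```

Calling the function with i = 2 and parts = 4 gives the grouped median, since Md = Q2.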

Example 3.30 The frequency distribution of the scores of 50 students in a final examination is given in the table
below.

Scale (%)   Number of students   Cumulative frequency
46-50                4                     4
51-55                8                    12
56-60               15                    27
61-65                5                    32
66-70                9                    41
71-75                5                    46
76-80                3                    49
81-100               1                    50

Find Q3, D5 and P40.

Solution (i) To find Q3

Q3 is the score of the (3/4)(50)th = 37.5th student. This student is in the fifth class. The values needed
for the calculation of Q3 are:

l = 65.5, C = 32, f = 9, c = 5


Therefore,

\[ Q_3 = 65.5 + \frac{5}{9}(37.5 - 32) = 65.5 + 3.06 = 68.56 \]

ii) To find D5

D5 is the score of the (5/10)(50)th = 25th student. This student is in the third class. The values needed for its
calculation are

l = 55.5, C = 12, f = 15 and c = 5

Therefore,

\[ D_5 = 55.5 + \frac{5}{15}(25 - 12) = 55.5 + 4.33 = 59.83 \]

iii) To find P40

P40 is the score of the (40/100)(50)th = 20th student. The 20th student is in the third class. The values needed for its
determination are

l = 55.5, C = 12, f = 15, c = 5

Therefore,

\[ P_{40} = 55.5 + \frac{5}{15}(20 - 12) = 55.5 + 2.67 = 58.17 \]

3.4.3 The Mode


The mode is another measure of central tendency. The name 'mode' is derived from the French word 'mode',
meaning 'fashion'. The mode, denoted by Mo (or x̂), is the value in a sample or population that appears more
frequently than any other value. That is, the mode is the value of the observation which occurs with the highest
frequency. This measure is suitable for qualitative (categorical) data.
Example 3.31 A store owner may want to know the 'average' choice of color of shoes of his customers. If
'black' is the most frequently chosen color, then he may take 'black' as the modal choice of his customers.

A mode can also be obtained for a numerical set of data; but mode is especially useful in describing nominal
and ordinal data.


For ungrouped data (raw data), the mode is the value of the observation(s) with the highest frequency, if any.

Example 3.32 Find the mode(s) of each of the following sets of data

i) 3 5 5 4 6 5 4 5 5 4 7 8 5

ii) 3 8 8 7 4 7 2 9 7 8 1

iii) 1 3 5 8 3 8 5 1 1 3 8 5

Solution i) To make the detection of the mode(s) easier we group values with same magnitude together.

3 4 4 4 5 5 5 5 5 5 6 7 8

The most frequent value is 5 with frequency 6. Therefore, Mo = 5. Such distributions (sets of data) which have a
unique mode are known as uni-modal.

Sometimes, two or more values may have the same but highest frequency. In such cases the values with the
highest frequency are jointly the modes of the distribution.

ii) After grouping like values together, we get

1 2 3 4 7 7 7 8 8 8 9.

The most frequent values are 7 and 8, each with frequency 3. Therefore, Mo = 7 and 8. Sets of data with two modes are
known as bi-modal.

Generally, sets of data with two or more modes are known as multi-modal.

iii) After grouping like values together, we get

1 1 1 3 3 3 5 5 5 8 8 8

All the values appear with the same frequency. In such cases, we say the set has no mode, i.e. the mode is
non-existent.
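
The three cases of Example 3.32 can be reproduced with a short Python sketch built on `collections.Counter`. The function name `modes` is an illustrative assumption; it returns an empty list when every value is equally frequent, following the convention stated above.

```python
from collections import Counter

# Sketch of mode detection for ungrouped data. The name is an assumption.

def modes(values):
    """Sorted list of modes; empty if all values are equally frequent."""
    counts = Counter(values)          # frequency of each distinct value
    if not counts:
        return []                     # no data, no mode
    top = max(counts.values())        # highest frequency
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                     # all frequencies equal: the mode does not exist
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 5, 5, 4, 6, 5, 4, 5, 5, 4, 7, 8, 5]))   # uni-modal
print(modes([3, 8, 8, 7, 4, 7, 2, 9, 7, 8, 1]))         # bi-modal
print(modes([1, 3, 5, 8, 3, 8, 5, 1, 1, 3, 8, 5]))      # no mode
```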

Mode of grouped data

For discrete grouped data, the mode is the value(s) with the highest frequency and is obtained by inspecting the
grouped data.

Example 3.33 Find the mode for the following discrete grouped data.

xi   2   4   3   5   8   7
fi   3   7   1   4   7   6


Solution: By inspecting the data, the highest frequency is 7 and the values with that frequency are 4 and 8;
therefore, Mo = 4 and 8.
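
For a frequency table, the same inspection amounts to picking the values whose frequency equals the maximum. A brief Python sketch using the table of Example 3.33 (variable names are illustrative assumptions):

```python
# Sketch for Example 3.33: mode(s) of discrete grouped data by inspection.
x = [2, 4, 3, 5, 8, 7]   # values
f = [3, 7, 1, 4, 7, 6]   # their frequencies

highest = max(f)                                              # highest frequency
table_modes = sorted(xi for xi, fi in zip(x, f) if fi == highest)
print(highest, table_modes)
```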

The mode of a continuous frequency distribution can very often be approximated by the midpoint of the modal
class, that is, the class with the greatest frequency density (for classes of equal width, the class with the
largest frequency). Thus for Example 3.30, Mo = 58, the midpoint of the 3rd class with frequency 15. This method
of locating the mode is quite satisfactory when the frequency densities in the class immediately before the modal
class (the pre-modal class) and immediately after the modal class (the post-modal class) are approximately equal.
When this condition is not met, more satisfactory results can be obtained by algebraic interpolation with the
formula below, under the following assumptions.

i. The set of data is uni-modal.

ii. The classes have equal width.

iii. The modal class, the class in which the mode is expected to lie, corresponds to the class
having the maximum frequency.

The formula is given as

\[ M_o = \hat{x} = l + \frac{f_p - f_{p-1}}{(f_p - f_{p-1}) + (f_p - f_{p+1})} \times c \]

where l = the lower class boundary of the modal class

f_p = frequency of the modal class

f_{p-1} = frequency of the pre-modal class

f_{p+1} = frequency of the post-modal class

c = the class width of the modal class.
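
A minimal Python sketch of this interpolation formula is given below, applied to the age distribution of Example 3.34, which follows. The function name `grouped_mode` is an illustrative assumption, and the sketch relies on the three assumptions listed above (uni-modal data, equal class widths, and a modal class that is neither the first nor the last class).

```python
# Sketch of the mode interpolation formula for continuous grouped data.
# The function name is an illustrative assumption.

def grouped_mode(lower_boundaries, width, freqs):
    """Mode by interpolation within the class of highest frequency."""
    p = freqs.index(max(freqs))            # position of the modal class
    if p == 0 or p == len(freqs) - 1:
        raise ValueError("modal class needs a class before and after it")
    d_pre = freqs[p] - freqs[p - 1]        # f_p minus pre-modal frequency
    d_post = freqs[p] - freqs[p + 1]       # f_p minus post-modal frequency
    if d_pre + d_post == 0:
        raise ValueError("neighbouring classes are as frequent as the modal class")
    return lower_boundaries[p] + d_pre / (d_pre + d_post) * width

# Age classes 15-19, 20-24, ..., 45-49 have lower boundaries 14.5, 19.5, ..., 44.5.
lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [6, 19, 50, 57, 48, 27, 21]

mode_age = grouped_mode(lower_boundaries, 5, freqs)
print(mode_age)
```

The result, about 31.7 years, is the same figure the worked example reaches by hand.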

Example 3.34 The following distribution was obtained from the age distribution of 228 housewives.

Age (in years)     15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number of women        6      19      50      57      48      27      21

Find the modal age of the housewives.

Solution The modal class is the 4th class.


Then l = 29.5 years, f_p = 57, f_{p-1} = 50, f_{p+1} = 48 and c = 5 years.

Therefore,

\[ M_o = 29.5 + \frac{57 - 50}{(57 - 50) + (57 - 48)} \times 5 = 29.5 + 2.1875 \approx 31.7 \text{ years} \]

Merits and Demerits of Mode

Merits

1. It is easy to calculate and understand in the case of ungrouped data. In some cases it can be located merely by
inspection. It can also be estimated graphically from a histogram.

2. It is not at all affected by extreme observations and as such is preferred to the arithmetic mean when dealing with
extreme observations.

3. It can be conveniently obtained in the case of open-end classes, which do not pose any problem here.

4. It may be obtained for qualitative data.

Demerits
1. It is not rigidly defined i.e., it may not exist (when no two values in a set are alike or when all values are
equally frequent).
2. It may not be unique when the set of data is multi-modal.
3. It is not based on all observations.
4. It is not suitable for further mathematical treatment.
5. As compared with mean, mode is affected to a great extent by the fluctuations of sampling.
