Dr. Muhammad Riaz ■ Department of Mathematics, University of the Punjab, Lahore ■
Email: [email protected]
Chapter 2
Measures of Location (Measures of Central Tendency)
Average:
An average is a single value that represents the data. Averages are also called "measures of central tendency" or "measures of location". Since averages tend to lie in the center of the distribution, they are called measures of central tendency. They are also called measures of location because they locate the center of the distribution.
Arithmetic mean
The arithmetic mean, or simply the mean, is defined as "the value obtained by dividing the sum of the values by their number". It is denoted by $\bar{X}$.
For ungrouped data
1. $\bar{X} = \dfrac{\sum X}{n}$ (Direct Method)
2. $\bar{X} = A + \dfrac{\sum D}{n}$, where $D = X - A$ (Short-cut Method)
3. $\bar{X} = A + \dfrac{\sum U}{n} \times h$, where $U = \dfrac{X - A}{h}$ (Short-cut Method) *For equal interval data only.
For grouped data
1. $\bar{X} = \dfrac{\sum fX}{\sum f}$ (Direct Method)
2. $\bar{X} = A + \dfrac{\sum fD}{\sum f}$, where $D = X - A$ (Short-cut Method)
3. $\bar{X} = A + \dfrac{\sum fU}{\sum f} \times h$, where $U = \dfrac{X - A}{h}$ (Short-cut Method) *For equal interval data only.
Example: Find the arithmetic mean of the marks obtained by 9 students, given below:
45, 32, 37, 46, 39, 36, 41, 48, 36
First Method
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{360}{9} = 40$ marks.
Second Method
Let A = 39, D = X - A = X - 39.
X      D
32    -7
36    -3
36    -3
37    -2
39     0
41     2
45     6
46     7
48     9
$\sum X = 360$, $\sum D = 9$
$\bar{X} = A + \dfrac{\sum D}{n} = 39 + \dfrac{9}{9} = 40$ marks.
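Example 2: Find the arithmetic mean of the data: 6, 12, 18, 24, 30, 36, 42, 48, 54, 60
X      D = X - 30    U = (X - 30)/6
6      -24           -4
12     -18           -3
18     -12           -2
24     -6            -1
30      0             0
36      6             1
42      12            2
48      18            3
54      24            4
60      30            5
$\sum X = 330$, $\sum D = 30$, $\sum U = 5$
First Method
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{330}{10} = 33$
Second Method
Let A = 30, D = X - A = X - 30.
$\bar{X} = A + \dfrac{\sum D}{n} = 30 + \dfrac{30}{10} = 33$
Third Method
Let A = 30, h = 6, $U = \dfrac{X - A}{h} = \dfrac{X - 30}{6}$.
$\bar{X} = A + \dfrac{\sum U}{n} \times h = 30 + \dfrac{5}{10} \times 6 = 33$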
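The three methods of Example 2 can be checked with a short Python sketch. The function names below are illustrative only and do not come from the notes; all three methods should return the same mean for any choice of A and h.

```python
def mean_direct(values):
    """Direct method: sum of the values divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, a):
    """Short-cut method: A + (sum of D) / n, with D = X - A."""
    return a + sum(x - a for x in values) / len(values)


def mean_step_deviation(values, a, h):
    """Step-deviation method: A + (sum of U) / n * h, with U = (X - A) / h."""
    return a + sum((x - a) / h for x in values) / len(values) * h


data = [6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
print(mean_direct(data))                     # 33.0
print(mean_shortcut(data, a=30))             # 33.0
print(mean_step_deviation(data, a=30, h=6))  # 33.0
```

The assumed mean A only shifts the arithmetic; it is usually chosen near the middle of the data to keep the deviations small.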
Properties of Arithmetic mean:
1) The sum of the deviations of the values from their mean is zero, i.e. $\sum (X - \bar{X}) = 0$.
2) The sum of squares of the deviations of the values from any value 'a' is minimum if and only if $a = \bar{X}$; symbolically, $\sum (X - a)^2$ is minimum if and only if $a = \bar{X}$.
3) Combined Mean: If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, and so on up to $n_k$ values with mean $\bar{X}_k$, the mean of all the values is
$\bar{X} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3 + \dots + n_k\bar{X}_k}{n_1 + n_2 + n_3 + \dots + n_k} = \dfrac{\sum n_i\bar{X}_i}{\sum n_i}$
4) Arithmetic mean is affected by change of origin and scale.
(i) If $Y = X \pm b$ then $\bar{Y} = \bar{X} \pm b$ (ii) If $Y = aX$ then $\bar{Y} = a\bar{X}$
OR If $Y = aX \pm b$ then $\bar{Y} = a\bar{X} \pm b$
5) The mean of a constant is the constant itself, i.e. if X = a then $\bar{X} = a$.
Weighted Mean: $\bar{X}_w = \dfrac{W_1X_1 + W_2X_2 + W_3X_3 + \dots + W_kX_k}{W_1 + W_2 + W_3 + \dots + W_k} = \dfrac{\sum WX}{\sum W}$
Example: Calculate the weighted mean
Items            Expenditures (X)   Weights (W)   WX
Food             290                7.5           2175
Rent             54                 2.0           108
Clothing         98                 1.5           147
Fuel and light   75                 1.0           75
Other items      75                 0.5           37.5
Total                               12.5          2542.5
Weighted mean is $\bar{X}_w = \dfrac{\sum WX}{\sum W} = \dfrac{2542.5}{12.5} = 203.4$
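As a quick check of the weighted-mean example, the calculation can be sketched in Python. The helper name `weighted_mean` is an illustrative choice, not part of the notes.

```python
def weighted_mean(values, weights):
    """Sum of W*X divided by the sum of W."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Food, Rent, Clothing, Fuel and light, Other items
expenditures = [290, 54, 98, 75, 75]
weights = [7.5, 2.0, 1.5, 1.0, 0.5]
print(round(weighted_mean(expenditures, weights), 1))  # 203.4
```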
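Example 3: Using the data given below
X: 11, 13, 14, 18, 24, 26, 27
verify that (i) $\sum (X - \bar{X}) = 0$ (ii) if $Y = 2X + 3$ then $\bar{Y} = 2\bar{X} + 3$.
X      X - X̄    Y = 2X + 3
11     -8        25
13     -6        29
14     -5        31
18     -1        39
24      5        51
26      7        55
27      8        57
$\sum X = 133$, $\sum (X - \bar{X}) = 0$, $\sum Y = 287$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{133}{7} = 19$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{287}{7} = 41$
$\bar{Y} = 2\bar{X} + 3 = 2(19) + 3 = 41$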
Example: Combined Mean
Section   Number of boys   Mean height
A         40               62"
B         37               58"
C         43               61"
Combined Mean is $\bar{X} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \dfrac{7249}{120} = 60.4"$
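The combined mean is a weighted mean in which the group sizes act as weights. A minimal Python sketch (the name `combined_mean` is hypothetical):

```python
def combined_mean(sizes, means):
    """Mean of all values, given each group's size and mean."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


sizes = [40, 37, 43]   # number of boys in sections A, B, C
means = [62, 58, 61]   # mean heights in inches
print(round(combined_mean(sizes, means), 1))  # 60.4
```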
Geometric Mean: The geometric mean, G, of a set of n positive values $X_1, X_2, X_3, \dots, X_n$ is the nth root of the product of the values. Its formula is given by:
$G = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$ (by definition)
$G = \text{antilog}\left(\dfrac{\sum \log X}{n}\right)$ (For ungrouped data)
$G = \text{antilog}\left(\dfrac{\sum f \log X}{\sum f}\right)$ (For grouped data)
Example: The marks obtained by 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36
$G = \sqrt[9]{45 \times 32 \times \dots \times 36} = 39.68$
Alternative Method: $G = \text{antilog}\left(\dfrac{\sum \log X}{n}\right) = \text{antilog}(1.59856) = 39.68$
The geometric mean is used when we measure the average growth or depreciation in the data.
Note:- It is not possible to find the geometric mean of the data if
i- any observation in the data is zero
ii- any observation in the data is negative
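The logarithmic form of the geometric mean can be sketched in Python as follows. The function name is illustrative; base-10 logarithms are used to mirror the antilog calculation in the notes, and the function rejects zero or negative observations, for which the geometric mean cannot be found.

```python
import math


def geometric_mean(values):
    """Antilog of the mean of the base-10 logs of the values."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(round(geometric_mean(marks), 2))  # 39.68
```

Working with logs rather than the raw product also avoids overflow when there are many large values.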
Harmonic mean: The harmonic mean, H, of a set of n values $X_1, X_2, \dots, X_n$ is the reciprocal of the arithmetic mean of the reciprocals of the values.
$H = \dfrac{n}{\sum \frac{1}{X}}$ (For ungrouped data)
$H = \dfrac{\sum f}{\sum \frac{f}{X}}$ (For grouped data)
The harmonic mean is used when we measure average speed or rate in the data.
Example: Suppose a car is running at 15 km/hr for the first 30 km, 20 km/hr for the second 30 km and 25 km/hr for the third 30 km. The distance is constant but the times are variable. Therefore, the harmonic mean is the correct average.
$H = \text{Reciprocal of } \dfrac{\frac{1}{15} + \frac{1}{20} + \frac{1}{25}}{3} = 19.15$ km/hr
Note:- It is not possible to find the harmonic mean of the data if any observation in the data is zero.
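The car example can be verified in Python by comparing the harmonic mean with total distance divided by total time. This is a sketch with an illustrative function name.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / x for x in values)


speeds = [15, 20, 25]    # km/hr over three equal 30 km stretches
print(round(harmonic_mean(speeds), 2))   # 19.15

# Cross-check: average speed = total distance / total time
total_distance = 30 * len(speeds)
total_time = sum(30 / v for v in speeds)
print(round(total_distance / total_time, 2))  # 19.15
```

The arithmetic mean of the speeds (20 km/hr) would overstate the average speed, because more time is spent on the slower stretches.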
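Example: Given the following frequency distribution, find the arithmetic mean, geometric mean and harmonic mean.
Weights   65-84   85-104   105-124   125-144   145-164   165-184   185-204
f         9       10       17        10        5         4         5
Solution:
Weights    f     X (mid point)   fX
65-84      9     74.5            670.5
85-104     10    94.5            945
105-124    17    114.5           1946.5
125-144    10    134.5           1345
145-164    5     154.5           772.5
165-184    4     174.5           698
185-204    5     194.5           972.5
Total      60                    7350
From the columns f log X and f/X: $\sum f = 60$, $\sum fX = 7350$, $\sum f \log X = 124.2483$, $\sum \dfrac{f}{X} = 0.53044$
$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{7350}{60} = 122.5$
$H = \dfrac{\sum f}{\sum \frac{f}{X}} = \dfrac{60}{0.53044} = 113.11$
$G = \text{antilog}\left(\dfrac{\sum f \log X}{\sum f}\right) = \text{antilog}\left(\dfrac{124.2483}{60}\right) = \text{antilog}(2.0708) = 117.7$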
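The grouped-data formulas for the three means can be reproduced with the following Python sketch. It assumes, as the solution does, that every observation in a class sits at the class mid point; the variable names are illustrative.

```python
import math

classes = [(65, 84), (85, 104), (105, 124), (125, 144),
           (145, 164), (165, 184), (185, 204)]
freqs = [9, 10, 17, 10, 5, 4, 5]

mids = [(lo + hi) / 2 for lo, hi in classes]   # class mid points X
n = sum(freqs)                                 # sum of f

am = sum(f * x for f, x in zip(freqs, mids)) / n
gm = 10 ** (sum(f * math.log10(x) for f, x in zip(freqs, mids)) / n)
hm = n / sum(f / x for f, x in zip(freqs, mids))

print(round(am, 1))  # 122.5
print(round(gm, 1))  # 117.7
print(round(hm, 2))  # 113.11
```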
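Mode: The mode is defined as that value in the data which occurs the greatest number of times, provided such a value exists. It is denoted by $\hat{X}$. A data set may have more than one mode or no mode at all.
Mode for grouped data: $\hat{X} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
where l = lower class boundary of the modal class, $f_m$ = frequency of the modal class, $f_1$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class following the modal class, and h = width of the class interval.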
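Median: The median of a set of values arranged in ascending order of magnitude is defined as the middle value if the number of values is odd, and the mean of the two middle values if the number of values is even. It is denoted by $\tilde{X}$.
For ungrouped data: $\tilde{X}$ = the middle term
For grouped data: $\tilde{X} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - c\right)$
where l = lower class boundary of the median class, h = width of the class interval, f = frequency of the median class, c = cumulative frequency of the class preceding the median class, and $n = \sum f$.
(i) If n is odd (i.e. $\dfrac{n}{2}$ is not an integer), the median is the middle value of the array in ascending order, i.e. Median = $\left(\dfrac{n+1}{2}\right)$th term (equivalently, round $\dfrac{n}{2}$ off to the next integer).
(ii) If n is even (i.e. $\dfrac{n}{2}$ is an integer), the median is the average of the middle two terms, i.e. the average of the $\left(\dfrac{n}{2}\right)$th and $\left(\dfrac{n}{2} + 1\right)$th terms.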
Selecting suitable measure of central tendency:
For quantitative variables, the arithmetic mean is usually appropriate; if the values are of unequal importance, the weighted mean is used. For categorical (qualitative) variables, the median or mode is the appropriate measure.
The geometric mean is useful for growth or depreciation, and the harmonic mean for speeds or rates.
For an open-end distribution, it is not possible to calculate the A.M, H.M or G.M.
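QUARTILES:
For ungrouped data:
$Q_1$ = value of the $\left(\dfrac{n}{4}\right)$th item, $Q_2$ = median, $Q_3$ = value of the $\left(\dfrac{3n}{4}\right)$th item
For grouped data:
$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - c\right)$, $Q_2$ = median, $Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - c\right)$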
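Deciles:
For ungrouped data: $D_j$ = jth decile
If $\dfrac{jn}{10}$ is not an integer, then $D_j$ is the observation with ordinal number $\left\lfloor \dfrac{jn}{10} \right\rfloor + 1$, i.e. round $\dfrac{jn}{10}$ off to the next integer.
If $\dfrac{jn}{10}$ is an integer, then $D_j$ is the average of the two observations with ordinal numbers $\dfrac{jn}{10}$ and $\dfrac{jn}{10} + 1$.
For grouped data: $D_1 = l + \dfrac{h}{f}\left(\dfrac{n}{10} - c\right)$, $D_9 = l + \dfrac{h}{f}\left(\dfrac{9n}{10} - c\right)$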
Percentiles:
For ungrouped data: $P_j$ = jth percentile
If $\dfrac{jn}{100}$ is not an integer, then $P_j$ is the observation with ordinal number $\left\lfloor \dfrac{jn}{100} \right\rfloor + 1$, i.e. round $\dfrac{jn}{100}$ off to the next integer.
If $\dfrac{jn}{100}$ is an integer, then $P_j$ is the average of the two observations with ordinal numbers $\dfrac{jn}{100}$ and $\dfrac{jn}{100} + 1$.
For grouped data: $P_1 = l + \dfrac{h}{f}\left(\dfrac{n}{100} - c\right)$, $P_{99} = l + \dfrac{h}{f}\left(\dfrac{99n}{100} - c\right)$
Quantiles: Collectively, the quartiles, deciles, percentiles and other values obtained by equal subdivision of the data are called quantiles.
Example: The marks obtained by 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36
Array the data in ascending order: 32, 36, 36, 37, 39, 41, 45, 46, 48
Here n = 9, and $\dfrac{n}{2} = 4.5$ is not an integer (round off to the next integer).
Hence Median = value of the 5th term = 39.
Now $\dfrac{n}{4} = 2.25$ is not an integer (round off to the next integer), so $Q_1$ = 3rd item = 36.
Now $\dfrac{3n}{4} = 6.75$ is not an integer (round off to the next integer), so $Q_3$ = 7th item = 45.
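The rounding rule used for the median, quartiles, deciles and percentiles of ungrouped data can be written once as a single function. The sketch below follows the rule stated in these notes (other textbooks and software packages use different conventions); the function name and the (j, k) parameterisation are assumptions made for illustration, with k = 2, 4, 10 or 100.

```python
def quantile_by_rank(values, j, k):
    """j-th of the k-part quantiles (k=4 quartiles, k=10 deciles, ...).

    If j*n/k is not an integer, round it up and take that item;
    if it is an integer, average that item and the next one.
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or not 0 < j < k:
        raise ValueError("need non-empty data and 0 < j < k")
    q, r = divmod(j * n, k)      # integer part and remainder of j*n/k
    if r:                        # not an integer: item number q + 1
        return data[q]           # (1-based q + 1 is 0-based index q)
    return (data[q - 1] + data[q]) / 2


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(quantile_by_rank(marks, 1, 2))  # 39  (median)
print(quantile_by_rank(marks, 1, 4))  # 36  (Q1)
print(quantile_by_rank(marks, 3, 4))  # 45  (Q3)
```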
Example: Given the following frequency distribution
Marks   30-39   40-49   50-59   60-69   70-79   80-89   90-99
f       8       87      190     304     211     85      20
Find the Mean, Median and Mode. Also find $Q_1$, $Q_3$, $D_8$, $P_{68}$.
Marks    Class boundaries   X (mid points)   f     c.f
30-39    29.5-39.5          34.5             8     8
40-49    39.5-49.5          44.5             87    95
50-59    49.5-59.5          54.5             190   285
60-69    59.5-69.5          64.5             304   589
70-79    69.5-79.5          74.5             211   800
80-89    79.5-89.5          84.5             85    885
90-99    89.5-99.5          94.5             20    905
Total                                        905
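Mean = $\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{?}{905}$
Median = $l + \dfrac{h}{f}\left(\dfrac{n}{2} - c\right) = 59.5 + \dfrac{10}{304}(452.5 - 285) \approx 65$
Mode = $l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 59.5 + \dfrac{304 - 190}{(304 - 190) + (304 - 211)} \times 10 \approx 65$
$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - c\right) = 49.5 + \dfrac{10}{190}(226.25 - 95) \approx 56$
$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - c\right) = 69.5 + \dfrac{10}{211}(678.75 - 589) \approx 74$
$D_8 = l + \dfrac{h}{f}\left(\dfrac{8n}{10} - c\right) = 69.5 + \dfrac{10}{211}(724 - 589) \approx 76$
$P_{68} = l + \dfrac{h}{f}\left(\dfrac{68n}{100} - c\right) = ?$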
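The grouped-data formulas for the median, quartiles, deciles and mode all share one pattern: locate the class, then interpolate inside it. The Python sketch below assumes equal class widths h and takes the lower class boundaries from the table; the function names are hypothetical.

```python
def grouped_quantile(lower_bounds, h, freqs, j, k):
    """l + (h / f) * (j*n/k - c) for the class containing the j*n/k-th item."""
    n = sum(freqs)
    target = j * n / k
    c = 0                                  # cumulative frequency before the class
    for l, f in zip(lower_bounds, freqs):
        if c + f >= target:
            return l + h / f * (target - c)
        c += f
    raise ValueError("target position lies beyond the data")


def grouped_mode(lower_bounds, h, freqs):
    """l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class."""
    i = freqs.index(max(freqs))
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_bounds[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h


lower = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [8, 87, 190, 304, 211, 85, 20]

print(round(grouped_quantile(lower, 10, freqs, 1, 2), 2))   # 65.01 (median)
print(round(grouped_mode(lower, 10, freqs), 2))             # 65.01 (mode)
print(round(grouped_quantile(lower, 10, freqs, 1, 4), 2))   # 56.41 (Q1)
print(round(grouped_quantile(lower, 10, freqs, 3, 4), 2))   # 73.75 (Q3)
print(round(grouped_quantile(lower, 10, freqs, 8, 10), 2))  # 75.9  (D8)
```

Rounded to whole marks, these agree with the values 65, 65, 56, 74 and 76 in the worked solution.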
Question: Find Mean, Median, Mode, Q1 , Q3 , D4 , P74 for the following frequency distribution
Weights 65-84 85-104 105-124 125-144 145-164 165-184 185-204
f 9 10 17 10 5 4 5
Short Questions with Answers:
Q No 1: In which situation, would you consider the geometric mean as an appropriate measure as compared
to the mean?
Ans: When we are dealing with quantities that change over a period of time, we want an average rate of change, such as an average growth rate over a period of several years. In this situation, the geometric mean is the appropriate measure instead of the arithmetic mean.
Q No 2: How would you locate the median in a histogram?
Ans: Geometrically median is the value of X (abscissa) corresponding to the vertical line which divides
a histogram into two parts having equal areas.
Q No 3: Which of the averages would be suitable for each of the following? (i) Income of workers in a factory (ii) Heights of students (iii) Marks obtained by students in a test (iv) Dress and shoe sizes (v) Number of apples on trees (vi) Intensity of colour (vii) Rates of increase in incomes (viii) Average speed.
Ans: (i) Median (ii) Mean (iii) Mean (iv) Mode (v) Mode (vi) Median (vii) Geometric mean (viii) Harmonic mean
Unimodal: A distribution that has a single mode is called a unimodal distribution.
Bimodal: A distribution that has two modes is called a bimodal distribution.
Multimodal: A distribution that has more than two modes is called a multimodal distribution.
The Empirical Relation between the Mean, Median and Mode
Mode = 3 median – 2 mean OR Mean-Mode = 3(Mean – Median)
Relation between A.M, H.M and G.M
H.M ≤ G.M ≤ A.M OR A.M ≥ G.M ≥ H.M
The equality sign holds when all the values are equal.
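The ordering H.M ≤ G.M ≤ A.M can be checked numerically with Python's standard `statistics` module (`geometric_mean` is available from Python 3.8). This is only a numerical illustration on the marks data used earlier in the notes, not a proof.

```python
import math
import statistics

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
am = statistics.mean(marks)
gm = statistics.geometric_mean(marks)
hm = statistics.harmonic_mean(marks)
print(hm <= gm <= am)   # True

# With all values equal, the three means coincide (up to rounding error).
equal = [5, 5, 5, 5]
same = (math.isclose(statistics.harmonic_mean(equal), statistics.geometric_mean(equal))
        and math.isclose(statistics.geometric_mean(equal), statistics.mean(equal)))
print(same)             # True
```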
Q No 4: Define symmetrical, positively skewed and negatively skewed distribution.
Ans: In a Symmetrical distribution or normal distribution Mean = Median = Mode
Positively skewed distribution: Mean > Median > Mode
Negatively skewed distribution: Mean < Median < Mode
MEASURES OF DISPERSION, SKEWNESS AND KURTOSIS
The Range: $R = X_L - X_S$ = largest value - smallest value
The Semi-Interquartile Range or The Quartile Deviation: Q.D. = $\dfrac{Q_3 - Q_1}{2}$
The Mean Deviation from Mean: M.D. = $\dfrac{\sum |X - \bar{X}|}{n}$
The Standard Deviation
$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$ standard deviation (based on the unbiased variance estimator, divisor n - 1)
$S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$ standard deviation (biased, divisor n)
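The Variance: The variance is defined as the square of the standard deviation, i.e. the mean of the squared deviations from the mean. It is given by
$s^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ (for ungrouped data, unbiased)
$S^2 = \dfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$ (for ungrouped data, biased)
$S^2 = \dfrac{1}{n}\sum_{i=1}^{k} f_i (X_i - \bar{X})^2 = \dfrac{\sum f (X - \bar{X})^2}{n}$, where $n = \sum f$ (for grouped data)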
Properties of the Variance and Standard deviation
i) The variance of a constant is zero: Var(a) = 0.
ii) The variance is independent of origin, i.e. Var(X + a) = Var(X).
iii) Var(aX) = $a^2$ Var(X), where a is a constant.
iv) If X and Y are independent random variables,
Var(X + Y) = Var(X) + Var(Y) and Var(X - Y) = Var(X) + Var(Y).
In general, $Var(aX \pm bY + c) = a^2 Var(X) + b^2 Var(Y) \pm 2ab\,Cov(X, Y)$,
where $Cov(X, Y) = \dfrac{1}{N}\sum (X - \bar{X})(Y - \bar{Y})$.
If X and Y are independent then Cov(X, Y) = 0.
v) The variance has the minimal property. This means that the variance or the standard deviation is a minimum if and only if the deviations are taken from the mean. In other words,
$\dfrac{1}{n}\sum_{i=1}^{n} (X_i - a)^2$ is minimum if and only if $a = \bar{X}$.
Empirical Relations between Mean, Median and Mode
Mode=3Median-2Mean
Empirical Relations between Measures of Dispersion
For the normal distribution
i) Mean Deviation = 0.7979 (Standard Deviation)
ii) Semi-Interquartile Range = 0.6745 (Standard Deviation)
These relationships hold approximately for moderately skewed distributions. We, therefore, have the following empirical formulae.
i) Mean Deviation = $\dfrac{4}{5}$ (Standard Deviation)
ii) Semi-Interquartile Range = $\dfrac{2}{3}$ (Standard Deviation)
iii) Semi-Interquartile Range = $\dfrac{5}{6}$ (Mean Deviation)
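Example: The marks obtained by a sample of n = 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36
Sample Mean: $\bar{X} = \dfrac{\sum X}{n} = \dfrac{360}{9} = 40$
M.D. from mean = $\dfrac{\sum |X - \bar{X}|}{n} = \dfrac{40}{9} = 4.44$
Standard deviation: $S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{232}{9}} = 5.08$
Variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{232}{9} = 25.78$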
Example: A population of size N = 10 has observations 7, 8, 10, 13, 14, 19, 20, 25, 26 and 28.
Find the population mean, population standard deviation and population variance.
Population Mean: $\mu = \dfrac{\sum X}{N} = \dfrac{170}{10} = 17$
Population Standard deviation: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}} = \sqrt{\dfrac{534}{10}} = 7.31$
Population Variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{534}{10} = 53.4$
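The two examples above can be reproduced with a few lines of Python. The helper names and the `ddof` parameter (0 for divisor n, 1 for divisor n - 1) are illustrative choices, not notation from the notes. For the marks data the absolute deviations sum to 40, so the mean deviation is 40/9 ≈ 4.44.

```python
import math


def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by n - ddof."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(round(mean_deviation(marks), 2))        # 4.44
print(round(variance(marks), 2))              # 25.78 (divisor n)
print(round(math.sqrt(variance(marks)), 2))   # 5.08
print(variance(marks, ddof=1))                # 29.0  (divisor n - 1)

population = [7, 8, 10, 13, 14, 19, 20, 25, 26, 28]
print(round(variance(population), 2))             # 53.4
print(round(math.sqrt(variance(population)), 2))  # 7.31
```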
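Example: Array the data in ascending order: 32, 36, 36, 37, 39, 41, 45, 46, 48
Here $Q_1$ = 3rd item = 36 and $Q_3$ = 7th item = 45.
To find the trimmed mean and trimmed S.D.
We remove all values before $Q_1$ and all values after $Q_3$.
So we use 36, 37, 39, 41, 45
Trimmed Mean = $\dfrac{36 + 37 + 39 + 41 + 45}{5} = 39.6$
Trimmed S.D. = $\sqrt{\dfrac{\sum (X - \bar{X})^2}{5}} = \sqrt{\dfrac{51.2}{5}} = 3.2$
To find the Winsorized mean and Winsorized S.D.
We replace all values before $Q_1$ by $Q_1$ and all values after $Q_3$ by $Q_3$.
So we use 36, 36, 36, 37, 39, 41, 45, 45, 45
Winsorized Mean = $\dfrac{36 + 36 + 36 + 37 + 39 + 41 + 45 + 45 + 45}{9} = 40$
Winsorized S.D. = $\sqrt{\dfrac{\sum (X - \bar{X})^2}{9}} = \sqrt{\dfrac{134}{9}} = 3.86$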
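The trimming and Winsorizing steps can be sketched in Python as below. The sketch assumes that Q1 and Q3 are the 3rd and 7th ordered items (n/4 and 3n/4 rounded up, as in the notes) and that each S.D. uses the divisor n of the data actually kept; under those assumptions the trimmed S.D. works out to 3.2 and the Winsorized S.D. to 3.86.

```python
import math


def mean_and_sd(values):
    """Arithmetic mean and standard deviation with divisor n."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in values) / n)


data = sorted([45, 32, 37, 46, 39, 36, 41, 48, 36])
n = len(data)
i1 = math.ceil(n / 4)        # ordinal number of Q1 (3rd item)
i3 = math.ceil(3 * n / 4)    # ordinal number of Q3 (7th item)
q1, q3 = data[i1 - 1], data[i3 - 1]

trimmed = data[i1 - 1:i3]                          # drop items before Q1 and after Q3
winsorized = [min(max(x, q1), q3) for x in data]   # clamp items into [Q1, Q3]

print(trimmed)                                      # [36, 37, 39, 41, 45]
print([round(v, 2) for v in mean_and_sd(trimmed)])     # [39.6, 3.2]
print(winsorized)                                   # [36, 36, 36, 37, 39, 41, 45, 45, 45]
print([round(v, 2) for v in mean_and_sd(winsorized)])  # [40.0, 3.86]
```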
Short Questions with answers
Q No 1: What is meant by dispersion?
Sol: By dispersion, we mean the extent to which the values in a series are spread out from the average.
Q No 2: What does quartile deviation measure in a distribution?
Sol: In distributions where there is not complete symmetry, the quartile deviation measures the
average distance from the quartiles to the median.
Q No 3: The range is generally a poor measure of dispersion. Explain.
Sol: Consider two series: (a) 3, 5, 6, 7, 10, 12, 15, 18 (b) 3, 8, 8, 8, 9, 9, 9, 18. In both series the range is 18 - 3 = 15, but there is much more variation or dispersion in (a) than in (b). Since the range indicates no difference between the two series, it is a poor measure of dispersion in this case.
Q No 4: By adding 5 to each of the numbers in the set 3, 6, 2, 1, 5, 7, we obtain the set 8, 11, 7, 6, 10, 12. Show that the two sets have the same standard deviation but different means.
Solve: Denoting the first and second sets by X and Y respectively, $\bar{X} = \sum X / n = 24/6 = 4$ and $\bar{Y} = \sum Y / n = 54/6 = 9$. The relation between X and Y is Y = 5 + X, so $\bar{Y} = 5 + \bar{X} = 5 + 4 = 9$. Since the standard deviation is independent of the origin (5 in this case), it remains the same for the two sets.
Q No 5: The scores obtained by five students on a set of examination papers were 70, 50, 60, 70, 50.
These scores are changed by (i) adding 10 points to all scores (ii) increasing all scores by
10%. What effect will these changes have on the standard deviation?
Solve: (i) Denoting the given scores by X and changed scores by adding 10 by Y, then Y = X + 10.
By property of the variance
Var(Y) = Var (X + 10) = Var (X) or S.D. (Y) = S.D. (X).
Thus there is no change in the standard deviation of Y.
(ii) In this case the original scores X’s are increased by 10% and these scores are 77, 55, 66, 77, 55.
By calculating the standard deviation of both sets, we find that the standard deviation of the changed set is
increased by 10%.
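Both parts of Q No 5 can be checked numerically. The sketch below uses `statistics.pstdev` (divisor n) from Python's standard library; the same conclusion holds with the divisor n - 1.

```python
import math
import statistics

scores = [70, 50, 60, 70, 50]
shifted = [x + 10 for x in scores]    # (i) add 10 points to every score
scaled = [x * 1.1 for x in scores]    # (ii) increase every score by 10%

sd = statistics.pstdev(scores)
print(math.isclose(statistics.pstdev(shifted), sd))        # True: unchanged
print(math.isclose(statistics.pstdev(scaled), 1.1 * sd))   # True: 10% larger
```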
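Q No 6: Given $\bar{X} = 10$ and Var(X) = 4, find $\bar{Y}$ and Var(Y) when
(i) Y = 2X - 1 (ii) Y = $\dfrac{1}{3}(4 + 3X)$
Sol: (i) $\bar{Y} = 2\bar{X} - 1 = 2(10) - 1 = 19$, Var(Y) = Var(2X - 1) = 4 Var(X) = 4(4) = 16
(ii) $\bar{Y} = \dfrac{1}{3}(4 + 3\bar{X}) = \dfrac{4}{3} + \bar{X} = \dfrac{34}{3}$, Var(Y) = Var$\left(\dfrac{1}{3}(4 + 3X)\right)$ = Var$\left(\dfrac{4}{3} + X\right)$ = Var(X) = 4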
Q No 7: A manufacturer of T.V. tubes has two types of tubes, A and B. The tubes have respective mean lifetimes $\bar{X}_A$ = 1495 hours and $\bar{X}_B$ = 1895 hours, and standard deviations $S_A$ = 280 and $S_B$ = 310. Which tube has the greater (i) absolute dispersion (ii) relative dispersion?
Solve: (i) Since $S_B$ is greater than $S_A$, tube B has the greater absolute dispersion.
(ii) C.V. (A) = $\dfrac{S_A}{\bar{X}_A} \times 100 = \dfrac{280}{1495} \times 100 = 18.73\%$ (Co-efficient of variation)
C.V. (B) = $\dfrac{S_B}{\bar{X}_B} \times 100 = \dfrac{310}{1895} \times 100 = 16.36\%$
Since C.V. (A) is greater than C.V. (B), tube A has the greater relative dispersion.
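The comparison of relative dispersion reduces to one line per tube. A minimal Python sketch, with an illustrative function name:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(280, 1495)
cv_b = coefficient_of_variation(310, 1895)
print(round(cv_a, 2))   # 18.73
print(round(cv_b, 2))   # 16.36
print("A" if cv_a > cv_b else "B")   # A has the greater relative dispersion
```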
Moment about origin
$\mu'_r = \dfrac{\sum X_i^r}{N}$ for population data
$m'_r = \dfrac{\sum X_i^r}{n}$ for sample data
Moment about Mean: Moments are certain parameters, which are used for testing the symmetry and normality of a distribution.
$\mu_r = \dfrac{\sum (x_i - \mu)^r}{N}$ for population data
$m_r = \dfrac{\sum (x_i - \bar{x})^r}{n}$ for sample data
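The relationship between Raw Moments and Central Moments
$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$
Variance: $S^2 = m_2$ and $\sigma^2 = \mu_2$ (the variance is equal to the second moment about the mean)
Moment Ratios: $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ (for population) and $b_1 = \dfrac{m_3^2}{m_2^3}$, $b_2 = \dfrac{m_4}{m_2^2}$ (for sample).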
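Example: From the sample data 32, 36, 36, 37, 39, 41, 45, 46, 48, find the first four moments about the mean.
$\bar{X} = 40$
$m_1 = \dfrac{\sum (X - \bar{X})}{n} = 0$, $m_2 = \dfrac{\sum (X - \bar{X})^2}{n} = Var(X) = 25.78$
$m_3 = \dfrac{\sum (X - \bar{X})^3}{n} = 20.67$, $m_4 = \dfrac{\sum (X - \bar{X})^4}{n} = 1189.78$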
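Example: A population has the data, in ascending order, 3, 4, 6, 8, 11, 12, 14, 15. Find the first four moments about the mean.
$\mu = \dfrac{\sum X}{N} = \dfrac{73}{8} = 9.125$
$\mu_1 = \dfrac{\sum (X - \mu)}{N} = 0$, $\mu_2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{144.875}{8} = 18.11$
$\mu_3 = \dfrac{\sum (X - \mu)^3}{N} = \dfrac{-47.344}{8} = -5.92$, $\mu_4 = \dfrac{\sum (X - \mu)^4}{N} = \dfrac{4031.09}{8} = 503.89$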
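Moments about the mean are easy to compute directly from their definition. The Python sketch below uses the sample data 32, 36, 36, 37, 39, 41, 45, 46, 48 from the notes; `central_moment` is a hypothetical helper name. It also forms the moment ratio m4 / m2 squared.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum of (x - mean)**r divided by n."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** r for x in values) / n


sample = [32, 36, 36, 37, 39, 41, 45, 46, 48]
m1, m2, m3, m4 = (central_moment(sample, r) for r in (1, 2, 3, 4))
print(round(m1, 2), round(m2, 2), round(m3, 2), round(m4, 2))
# 0.0 25.78 20.67 1189.78

b2 = m4 / m2 ** 2      # moment coefficient of kurtosis for the sample
print(round(b2, 2))    # 1.79
```

A value of b2 below 3 points to a flatter-than-normal shape, although with only nine observations this is at best a rough indication.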
KURTOSIS
Leptokurtic Distribution:
A distribution having a relatively higher peak is called a leptokurtic distribution ($\beta_2 > 3$).
Platykurtic Distribution:
A distribution which is flat-topped is called a platykurtic distribution ($\beta_2 < 3$).
Mesokurtic Distribution (or Normal):
A distribution which is neither highly peaked nor flat-topped is called a mesokurtic distribution ($\beta_2 = 3$).
Kurtosis: shows the degree of peakedness of the distribution.
(I) Moment Co-efficient of Kurtosis: $\beta_2 = \dfrac{m_4}{m_2^2}$
(II) Percentile Co-efficient of Kurtosis: $k = \dfrac{Q.D.}{P_{90} - P_{10}}$
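FORMULAS SHEET
Arithmetic Mean
Grouped data: $\bar{X} = \dfrac{\sum fX}{\sum f}$; Ungrouped data: $\bar{X} = \dfrac{\sum X}{N}$
Short cut or deviation method, where D = X - a:
Grouped data: $\bar{X} = a + \dfrac{\sum fD}{\sum f}$; Ungrouped data: $\bar{X} = a + \dfrac{\sum D}{N}$
Step deviation or coding method, where $u = \dfrac{x - a}{h}$:
Grouped data: $\bar{X} = a + \dfrac{\sum fu}{\sum f} \times h$; Ungrouped data: $\bar{X} = a + \dfrac{\sum u}{N} \times h$
Weighted Mean
$\bar{X}_w = \dfrac{\sum WX}{\sum W}$
Geometric Mean
Grouped data: $G = \text{antilog}\left(\dfrac{\sum f \log X}{\sum f}\right)$; Ungrouped data: $G = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 X_3 \cdots X_n)^{1/n}$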
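Median
Grouped data: $\tilde{X} = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$; Ungrouped data: $\tilde{X} = \left(\dfrac{n + 1}{2}\right)$th value
Harmonic Mean
Grouped data: $H.M = \dfrac{\sum f}{\sum \frac{f}{X}}$; Ungrouped data: $H.M = \dfrac{n}{\sum \frac{1}{X}}$
Quartiles
Grouped data: $Q_k = l + \dfrac{h}{f}\left(\dfrac{kn}{4} - C\right)$; Ungrouped data: $Q_k = k\left(\dfrac{n + 1}{4}\right)$th value
Deciles
Grouped data: $D_k = l + \dfrac{h}{f}\left(\dfrac{kn}{10} - C\right)$; Ungrouped data: $D_k = k\left(\dfrac{n + 1}{10}\right)$th value
Percentiles
Grouped data: $P_k = l + \dfrac{h}{f}\left(\dfrac{kn}{100} - C\right)$; Ungrouped data: $P_k = k\left(\dfrac{n + 1}{100}\right)$th value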
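Mode
Grouped data: $\hat{X} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$; Ungrouped data: most common value
Range
$R = X_m - X_0$
Co-efficient of dispersion = $\dfrac{X_m - X_0}{X_m + X_0}$
Mean Deviation (from mean)
Grouped data: $M.D = \dfrac{\sum f|X - \bar{X}|}{\sum f}$; Ungrouped data: $M.D = \dfrac{\sum |X - \bar{X}|}{n}$
Quartile Deviation
Q.D = $\dfrac{Q_3 - Q_1}{2}$ or semi-interquartile range
Co-efficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$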
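Variance
Population Variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$; Sample Variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
Karl Pearson's coefficient of Skewness
Karl Pearson's 1st coefficient of Skewness: $S.K = \dfrac{\text{Mean} - \text{Mode}}{S.D}$
Karl Pearson's 2nd coefficient of Skewness: $S.K = \dfrac{3(\text{Mean} - \text{Median})}{S.D}$
Bowley's or Quartile Co-efficient of Skewness: $S.K = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$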