
Chapter 2 Statistics

This document discusses various measures of central tendency, including the arithmetic mean, weighted mean and geometric mean. The arithmetic mean, also called the average, is calculated by summing all values and dividing by the number of values. The weighted mean multiplies each value by a weight, sums the products and divides by the sum of the weights. The geometric mean is the nth root of the product of all values and is used to measure average growth or depreciation rates.


Dr. Muhammad Riaz, Department of Mathematics, University of the Punjab, Lahore


Email: [email protected]
Chapter 2
Measures of Location
Measures of Central Tendency
Average:
"An average is a single value which represents the data." Averages are also called measures of central tendency or measures of location. Since averages tend to lie in the centre of the distribution, they are called measures of central tendency. They are also called measures of location because they locate the centre of the distribution.
Arithmetic mean
The arithmetic mean, or simply the mean, is defined as "the value obtained by dividing the sum of the values by their number". It is denoted by $\bar X$.

For ungrouped data

1. $\bar X = \dfrac{\sum X}{n}$ (Direct method)
2. $\bar X = A + \dfrac{\sum D}{n}$, where $D = X - A$ (Short-cut method)
3. $\bar X = A + \dfrac{\sum U}{n} \times h$, where $U = \dfrac{X - A}{h}$ (Step-deviation or coding method) *For equal-interval data only.
For grouped data

1. $\bar X = \dfrac{\sum fX}{\sum f}$ (Direct method)
2. $\bar X = A + \dfrac{\sum fD}{\sum f}$, where $D = X - A$ (Short-cut method)
3. $\bar X = A + \dfrac{\sum fU}{\sum f} \times h$, where $U = \dfrac{X - A}{h}$ (Step-deviation or coding method) *For equal-interval data only.

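Example: Find the arithmetic mean of the marks obtained by 9 students, given below:
45, 32, 37, 46, 39, 36, 41, 48, 36

First method: $\bar X = \dfrac{\sum X}{n} = \dfrac{360}{9} = 40$ marks.

Second method: Let $A = 39$, so $D = X - A = X - 39$.

| X | D = X - 39 |
|---|---|
| 32 | -7 |
| 36 | -3 |
| 36 | -3 |
| 37 | -2 |
| 39 | 0 |
| 41 | 2 |
| 45 | 6 |
| 46 | 7 |
| 48 | 9 |
| $\sum X = 360$ | $\sum D = 9$ |

$\bar X = A + \dfrac{\sum D}{n} = 39 + \dfrac{9}{9} = 40$ marks.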

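Example 2: Find the arithmetic mean of the data 6, 12, 18, 24, 30, 36, 42, 48, 54, 60.

| X | D = X - 30 | U = (X - 30)/6 |
|---|---|---|
| 6 | -24 | -4 |
| 12 | -18 | -3 |
| 18 | -12 | -2 |
| 24 | -6 | -1 |
| 30 | 0 | 0 |
| 36 | 6 | 1 |
| 42 | 12 | 2 |
| 48 | 18 | 3 |
| 54 | 24 | 4 |
| 60 | 30 | 5 |
| $\sum X = 330$ | $\sum D = 30$ | $\sum U = 5$ |

First method: $\bar X = \dfrac{\sum X}{n} = \dfrac{330}{10} = 33$

Second method: Let $A = 30$, so $D = X - A = X - 30$ and $\bar X = A + \dfrac{\sum D}{n} = 30 + \dfrac{30}{10} = 33$

Third method: $U = \dfrac{X - A}{h} = \dfrac{X - 30}{6}$, so $\bar X = A + \dfrac{\sum U}{n} \times h = 30 + \dfrac{5}{10} \times 6 = 33$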
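As a quick check, the three methods can be coded directly. This is a minimal Python sketch; the function names `mean_direct`, `mean_shortcut` and `mean_step_deviation` are illustrative and not part of the notes. For any choice of A (and, for the step-deviation method, a common interval h) all three should return the same mean.

```python
def mean_direct(xs):
    """Direct method: X-bar = sum(X) / n."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, a):
    """Short-cut method: X-bar = A + sum(D) / n, with D = X - A."""
    ds = [x - a for x in xs]
    return a + sum(ds) / len(xs)


def mean_step_deviation(xs, a, h):
    """Step-deviation method: X-bar = A + (sum(U) / n) * h, with U = (X - A) / h.
    Meant for values spaced at a common interval h."""
    us = [(x - a) / h for x in xs]
    return a + sum(us) / len(xs) * h


data = [6, 12, 18, 24, 30, 36, 42, 48, 54, 60]      # Example 2
print(mean_direct(data))                            # 33.0
print(mean_shortcut(data, a=30))                    # 33.0
print(mean_step_deviation(data, a=30, h=6))         # 33.0
```

The short-cut and step-deviation forms only change the arithmetic done by hand; with a computer the direct method is usually enough.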

Properties of Arithmetic mean:

1) The sum of the deviations of the values from their mean is zero, i.e. $\sum (X - \bar X) = 0$.
2) The sum of squares of the deviations of the values from any value 'a' is minimum if and only if $a = \bar X$; symbolically, $\sum (X - a)^2$ is minimum if and only if $a = \bar X$.
3) Combined Mean: If $n_1$ values have mean $\bar X_1$, $n_2$ values have mean $\bar X_2$, and so on up to $n_k$ values with mean $\bar X_k$, the mean of all the values is
$\bar X_c = \dfrac{n_1\bar X_1 + n_2\bar X_2 + n_3\bar X_3 + \cdots + n_k\bar X_k}{n_1 + n_2 + n_3 + \cdots + n_k} = \dfrac{\sum n\bar X}{\sum n}$
4) The arithmetic mean is affected by a change of origin and scale.
(i) If $Y = X \pm b$, then $\bar Y = \bar X \pm b$. (ii) If $Y = aX$, then $\bar Y = a\bar X$.
OR if $Y = aX \pm b$, then $\bar Y = a\bar X \pm b$.
5) The mean of constant values is the constant itself, i.e. if $X = a$ for every value, then $\bar X = a$.

Weighted Mean: $\bar X_w = \dfrac{W_1X_1 + W_2X_2 + W_3X_3 + \cdots + W_kX_k}{W_1 + W_2 + W_3 + \cdots + W_k} = \dfrac{\sum WX}{\sum W}$
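Example: Calculate the weighted mean of the following expenditures.

| Items | Expenditure (X) | Weight (W) | WX |
|---|---|---|---|
| Food | 290 | 7.5 | 2175 |
| Rent | 54 | 2.0 | 108 |
| Clothing | 98 | 1.5 | 147 |
| Fuel and light | 75 | 1.0 | 75 |
| Other items | 75 | 0.5 | 37.5 |
| Total | | 12.5 | 2542.5 |

The weighted mean is $\bar X_w = \dfrac{\sum WX}{\sum W} = \dfrac{2542.5}{12.5} = 203.4$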
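The weighted-mean formula translates into a few lines of Python. This is a minimal sketch applied to the expenditure table above; `weighted_mean` is a hypothetical helper name, and the input checks are an added assumption rather than part of the notes.

```python
def weighted_mean(values, weights):
    """X-bar_w = sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Food, Rent, Clothing, Fuel and light, Other items
expenditure = [290, 54, 98, 75, 75]
weights = [7.5, 2.0, 1.5, 1.0, 0.5]
print(round(weighted_mean(expenditure, weights), 1))   # 203.4
```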

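Example 3: Using the data given below

X: 11, 13, 14, 18, 24, 26, 27

prove that (i) $\sum (X - \bar X) = 0$ (ii) if $Y = 2X + 3$, then $\bar Y = 2\bar X + 3$.

| X | X - $\bar X$ | Y = 2X + 3 |
|---|---|---|
| 11 | -8 | 25 |
| 13 | -6 | 29 |
| 14 | -5 | 31 |
| 18 | -1 | 39 |
| 24 | 5 | 51 |
| 26 | 7 | 55 |
| 27 | 8 | 57 |
| $\sum X = 133$ | $\sum (X - \bar X) = 0$ | $\sum Y = 287$ |

$\bar X = \dfrac{\sum X}{n} = \dfrac{133}{7} = 19$, $\bar Y = \dfrac{\sum Y}{n} = \dfrac{287}{7} = 41$ and $2\bar X + 3 = 2(19) + 3 = 41$.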
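Both results of Example 3 can be confirmed numerically. A minimal Python sketch using plain lists:

```python
X = [11, 13, 14, 18, 24, 26, 27]
n = len(X)
x_bar = sum(X) / n                       # 19.0

# (i) the deviations from the mean sum to zero
print(sum(x - x_bar for x in X))         # 0.0

# (ii) change of origin and scale: Y = 2X + 3 gives Y-bar = 2 X-bar + 3
Y = [2 * x + 3 for x in X]
y_bar = sum(Y) / n                       # 41.0
print(y_bar, 2 * x_bar + 3)              # 41.0 41.0
```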

Example: Combined Mean

| Section | Number of boys | Mean height |
|---|---|---|
| A | 40 | 62" |
| B | 37 | 58" |
| C | 43 | 61" |

Combined mean: $\bar X_c = \dfrac{n_1\bar X_1 + n_2\bar X_2 + n_3\bar X_3}{n_1 + n_2 + n_3} = \dfrac{7249}{120} = 60.4"$
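The combined mean is a weighted mean with the group sizes as weights. A minimal Python sketch; `combined_mean` is a hypothetical name.

```python
def combined_mean(sizes, means):
    """X-bar_c = sum(n_i * X-bar_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


boys = [40, 37, 43]        # sections A, B, C
heights = [62, 58, 61]     # mean height of each section (inches)
print(round(combined_mean(boys, heights), 1))   # 60.4
```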

Geometric Mean: The geometric mean G of a set of n positive values $X_1, X_2, X_3, \ldots, X_n$ is the nth root of the product of the values. Its formula is given by:

$G = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$ (by definition)

$G = \text{antilog}\left(\dfrac{\sum \log X}{n}\right)$ (for ungrouped data)

$G = \text{antilog}\left(\dfrac{\sum f\log X}{\sum f}\right)$ (for grouped data)
Example: The marks obtained by 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36

$G = \sqrt[9]{45 \times 32 \times \cdots \times 36} = 39.68$

Alternative method: $G = \text{antilog}\left(\dfrac{\sum \log X}{n}\right) = \text{antilog}(1.59856) = 39.68$

The geometric mean is used when we measure the average growth or depreciation in the data.
Note: It is not possible to find the geometric mean of the data if
i- any observation in the data is zero
ii- any observation in the data is negative
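The two routes to G (the nth root of the product, and the antilog of the mean logarithm) can be compared in a short Python sketch. The helper names are illustrative; `math.prod` needs Python 3.8 or later, and the standard library also offers `statistics.geometric_mean`. Rejecting zero and negative values mirrors the note above.

```python
import math


def gm_root(xs):
    """G = (X1 * X2 * ... * Xn) ** (1/n), by definition."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(xs) ** (1 / len(xs))


def gm_log(xs):
    """G = antilog(sum(log10 X) / n); avoids a huge product for long series."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive values")
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(round(gm_root(marks), 2), round(gm_log(marks), 2))   # 39.68 39.68
```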
Harmonic mean: The harmonic mean H of a set of n values $X_1, X_2, \ldots, X_n$ is the reciprocal of the arithmetic mean of the reciprocals of the values.

$H = \dfrac{n}{\sum \left(\frac{1}{X}\right)}$ (for ungrouped data), $H = \dfrac{\sum f}{\sum \left(\frac{f}{X}\right)}$ (for grouped data)
The harmonic mean is used when we measure average speeds or rates in the data.
Example: Suppose a car runs at 15 km/hr for the first 30 km, 20 km/hr for the second 30 km and 25 km/hr for the third 30 km. The distance is constant but the times vary; therefore, the harmonic mean is the correct average.

$H = \text{reciprocal of } \dfrac{\frac{1}{15} + \frac{1}{20} + \frac{1}{25}}{3} = 19.15$ km/hr

Note: It is not possible to find the harmonic mean of the data if any observation in the data is zero.
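A short Python sketch shows why the harmonic mean is the right average here: it equals total distance divided by total time when each leg covers the same distance. `harmonic_mean` below is a hand-written helper (the standard library also has `statistics.harmonic_mean`).

```python
def harmonic_mean(xs):
    """H = n / sum(1 / X); undefined if any value is zero."""
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(xs) / sum(1 / x for x in xs)


speeds = [15, 20, 25]          # km/hr over three equal 30 km legs
h = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time
total_time = sum(30 / v for v in speeds)          # hours
print(round(h, 2), round(90 / total_time, 2))     # 19.15 19.15
```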
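Example: Given the following frequency distribution, find the arithmetic, harmonic and geometric means.

| Weights | 65-84 | 85-104 | 105-124 | 125-144 | 145-164 | 165-184 | 185-204 |
|---|---|---|---|---|---|---|---|
| f | 9 | 10 | 17 | 10 | 5 | 4 | 5 |

Solution:

| Weights | f | X (mid-point) | fX |
|---|---|---|---|
| 65-84 | 9 | 74.5 | 670.5 |
| 85-104 | 10 | 94.5 | 945.0 |
| 105-124 | 17 | 114.5 | 1946.5 |
| 125-144 | 10 | 134.5 | 1345.0 |
| 145-164 | 5 | 154.5 | 772.5 |
| 165-184 | 4 | 174.5 | 698.0 |
| 185-204 | 5 | 194.5 | 972.5 |
| Total | $\sum f = 60$ | | $\sum fX = 7350$ |

Computing $f\log X$ and $f/X$ for each class and summing gives $\sum f\log X = 124.2483$ and $\sum \dfrac{f}{X} = 0.53044$.

$\bar X = \dfrac{\sum fX}{\sum f} = \dfrac{7350}{60} = 122.5$

$H = \dfrac{\sum f}{\sum (f/X)} = \dfrac{60}{0.53044} = 113.11$

$G = \text{antilog}\left(\dfrac{\sum f\log X}{\sum f}\right) = \text{antilog}\left(\dfrac{124.2483}{60}\right) = \text{antilog}(2.0708) = 117.7$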
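The three grouped-data means can be computed together from the class mid-points. A minimal Python sketch, assuming mid-points are taken as (lower limit + upper limit) / 2 as in the table. The intermediate sum of $f\log X$ may differ slightly in the last decimal places from the hand-computed 124.2483, but G still rounds to 117.7.

```python
import math

# Class limits and frequencies from the weights table
classes = [(65, 84), (85, 104), (105, 124), (125, 144), (145, 164), (165, 184), (185, 204)]
f = [9, 10, 17, 10, 5, 4, 5]
X = [(lo + hi) / 2 for lo, hi in classes]      # mid-points 74.5, 94.5, ...

n = sum(f)                                                          # 60
am = sum(fi * xi for fi, xi in zip(f, X)) / n                       # sum(fX) / sum(f)
gm = 10 ** (sum(fi * math.log10(xi) for fi, xi in zip(f, X)) / n)   # antilog(sum(f log X) / sum(f))
hm = n / sum(fi / xi for fi, xi in zip(f, X))                       # sum(f) / sum(f / X)

print(am, round(hm, 2), round(gm, 1))   # 122.5 113.11 117.7
```

The printed values also illustrate the relation H.M. ≤ G.M. ≤ A.M.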

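Mode: The mode is defined as the value in the data which occurs the greatest number of times, provided such a value exists. It is denoted by $\hat X$. A data set may have more than one mode or no mode at all.

Mode for grouped data: $\hat X = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$, where $l$ is the lower class boundary of the modal class, $f_m$ is the frequency of the modal class, $f_1$ and $f_2$ are the frequencies of the classes preceding and following it, and $h$ is the class width.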

Median: The median of a set of values arranged in ascending order of magnitude is defined as the middle value if the number of values is odd, and the mean of the two middle values if the number of values is even.

For ungrouped data: $\tilde X$ = the middle term

For grouped data: $\tilde X = l + \dfrac{h}{f}\left(\dfrac{n}{2} - c\right)$, where $l$ is the lower class boundary of the median class, $f$ its frequency, $h$ the class width and $c$ the cumulative frequency of the class preceding the median class.

(i) If n is odd ($n/2$ is not an integer), the median is the middle value of the array in ascending order, i.e. the $\left(\dfrac{n+1}{2}\right)$th term (equivalently, round $n/2$ up to the next integer).

(ii) If n is even ($n/2$ is an integer), the median is the average of the two middle terms, i.e. the average of the $\left(\dfrac{n}{2}\right)$th and $\left(\dfrac{n}{2} + 1\right)$th terms.

Selecting a suitable measure of central tendency:

For quantitative variables, the arithmetic mean is usually appropriate; if the values are of unequal importance, the weighted mean is used. For categorical (qualitative) variables, the median (for ordered categories) or the mode is the appropriate measure.
The geometric mean and harmonic mean are useful for growth or depreciation, and for speeds or rates, respectively.
For an open-ended distribution, it is not possible to calculate the A.M., H.M. or G.M.

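QUANTILES:
For ungrouped data:

$Q_1$ = value of the $\left(\dfrac{n}{4}\right)$th item, $Q_2$ = median, $Q_3$ = value of the $\left(\dfrac{3n}{4}\right)$th item

For grouped data:

$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - c\right)$, $Q_2$ = median, $Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - c\right)$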

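Deciles:

For ungrouped data: $D_j$ = jth decile

If $\dfrac{jn}{10}$ is not an integer, then $D_j$ is the observation with ordinal number $\left\lfloor \dfrac{jn}{10} \right\rfloor + 1$, i.e. round $\dfrac{jn}{10}$ up to the next integer.

If $\dfrac{jn}{10}$ is an integer, then $D_j$ is the average of the two observations with ordinal numbers $\dfrac{jn}{10}$ and $\dfrac{jn}{10} + 1$.

For grouped data: $D_1 = l + \dfrac{h}{f}\left(\dfrac{n}{10} - c\right)$, $D_9 = l + \dfrac{h}{f}\left(\dfrac{9n}{10} - c\right)$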

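Percentiles:

For ungrouped data: $P_j$ = jth percentile

If $\dfrac{jn}{100}$ is not an integer, then $P_j$ is the observation with ordinal number $\left\lfloor \dfrac{jn}{100} \right\rfloor + 1$, i.e. round $\dfrac{jn}{100}$ up to the next integer.

If $\dfrac{jn}{100}$ is an integer, then $P_j$ is the average of the two observations with ordinal numbers $\dfrac{jn}{100}$ and $\dfrac{jn}{100} + 1$.

For grouped data:

$P_1 = l + \dfrac{h}{f}\left(\dfrac{n}{100} - c\right)$, $P_{99} = l + \dfrac{h}{f}\left(\dfrac{99n}{100} - c\right)$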

Quantiles: Collectively, the quartiles, deciles, percentiles and other values obtained by equal subdivision of the data are called quantiles.
Example: The marks obtained by 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36

Arrange the data in ascending order: 32, 36, 36, 37, 39, 41, 45, 46, 48

Here n = 9 and $\dfrac{n}{2} = 4.5$ is not an integer (round up to the next integer).
Hence Median = value of the 5th term = 39.

Now $\dfrac{n}{4} = 2.25$ is not an integer (round up to the next integer), so $Q_1$ = 3rd item = 36.

Now $\dfrac{3n}{4} = 6.75$ is not an integer (round up to the next integer), so $Q_3$ = 7th item = 45.
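The round-up rule for ungrouped data can be written once for medians, quartiles, deciles and percentiles. A minimal Python sketch; `ordinal_quantile` is a hypothetical name, and the position $jn/\text{parts}$ is handled with integer arithmetic so that "is it an integer?" is answered exactly. Other conventions, such as the $(n+1)$-based positions in the formula sheet at the end of these notes, can give slightly different values.

```python
def ordinal_quantile(data, j, parts):
    """j-th quantile when the data are divided into `parts` equal parts
    (parts=2: median, 4: quartiles, 10: deciles, 100: percentiles),
    using the round-up rule stated above."""
    xs = sorted(data)
    n = len(xs)
    if not 0 < j < parts:
        raise ValueError("need 0 < j < parts")
    num = j * n                      # position jn/parts is num/parts
    k = num // parts                 # integer part of jn/parts
    if num % parts:                  # jn/parts is not an integer
        return xs[k]                 # ordinal number k + 1 (0-based index k)
    return (xs[k - 1] + xs[k]) / 2   # average of ordinals k and k + 1


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
print(ordinal_quantile(marks, 1, 2))   # median -> 39
print(ordinal_quantile(marks, 1, 4))   # Q1     -> 36
print(ordinal_quantile(marks, 3, 4))   # Q3     -> 45
```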
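Example: Given the following frequency distribution

| Marks | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99 |
|---|---|---|---|---|---|---|---|
| f | 8 | 87 | 190 | 304 | 211 | 85 | 20 |

Find the mean, median and mode. Also find $Q_1$, $Q_3$, $D_8$ and $P_{68}$.

| Marks | Class boundaries | X (mid-point) | f | c.f. |
|---|---|---|---|---|
| 30-39 | 29.5-39.5 | 34.5 | 8 | 8 |
| 40-49 | 39.5-49.5 | 44.5 | 87 | 95 |
| 50-59 | 49.5-59.5 | 54.5 | 190 | 285 |
| 60-69 | 59.5-69.5 | 64.5 | 304 | 589 |
| 70-79 | 69.5-79.5 | 74.5 | 211 | 800 |
| 80-89 | 79.5-89.5 | 84.5 | 85 | 885 |
| 90-99 | 89.5-99.5 | 94.5 | 20 | 905 |
| Total | | | 905 | |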

Mean $= \bar X = \dfrac{\sum fX}{\sum f} = \dfrac{?}{905}$

Median $= l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right) = 59.5 + \dfrac{10}{304}(452.5 - 285) \approx 65$

Mode $= l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 59.5 + \dfrac{304 - 190}{(304 - 190) + (304 - 211)} \times 10 \approx 65$

$Q_1 = l + \dfrac{h}{f}\left(\dfrac{n}{4} - c\right) = 49.5 + \dfrac{10}{190}(226.25 - 95) \approx 56$

$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3n}{4} - c\right) = 69.5 + \dfrac{10}{211}(678.75 - 589) \approx 74$

$D_8 = l + \dfrac{h}{f}\left(\dfrac{8n}{10} - c\right) = 69.5 + \dfrac{10}{211}(724 - 589) \approx 76$

$P_{68} = l + \dfrac{h}{f}\left(\dfrac{68n}{100} - c\right) = ?$
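The grouped-data formulas all interpolate inside one class, so a single helper can cover the median, quartiles, deciles and percentiles. A minimal Python sketch, assuming equal class widths h = 10 and the class boundaries from the table above; `grouped_quantile` and `grouped_mode` are hypothetical names. Treating a missing neighbouring class as frequency 0 in the mode formula is an assumption for edge cases the notes do not cover. The mean and $P_{68}$, left as exercises above, can be worked from the same data (for $P_{68}$, call `grouped_quantile(68 / 100)`).

```python
# Lower class boundaries and frequencies for the marks distribution
lower = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
f = [8, 87, 190, 304, 211, 85, 20]
h = 10                       # common class width
n = sum(f)                   # 905


def grouped_quantile(p):
    """l + (h / f) * (p*n - c) for the class containing the (p*n)th item.
    p = 1/2 gives the median, 1/4 and 3/4 the quartiles, j/10 deciles, j/100 percentiles."""
    target = p * n
    c = 0                                # cumulative frequency before the current class
    for l, fi in zip(lower, f):
        if c + fi >= target:
            return l + h / fi * (target - c)
        c += fi
    raise ValueError("p must be in (0, 1]")


def grouped_mode():
    """l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the class with the largest frequency."""
    m = f.index(max(f))
    fm = f[m]
    f1 = f[m - 1] if m > 0 else 0              # assumption: no preceding class -> 0
    f2 = f[m + 1] if m + 1 < len(f) else 0     # assumption: no following class -> 0
    return lower[m] + (fm - f1) / ((fm - f1) + (fm - f2)) * h


print(round(grouped_quantile(1 / 2)))    # median -> 65
print(round(grouped_mode()))             # mode   -> 65
print(round(grouped_quantile(1 / 4)))    # Q1     -> 56
print(round(grouped_quantile(3 / 4)))    # Q3     -> 74
print(round(grouped_quantile(8 / 10)))   # D8     -> 76
```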

Question: Find the mean, median, mode, $Q_1$, $Q_3$, $D_4$ and $P_{74}$ for the following frequency distribution:

| Weights | 65-84 | 85-104 | 105-124 | 125-144 | 145-164 | 165-184 | 185-204 |
|---|---|---|---|---|---|---|---|
| f | 9 | 10 | 17 | 10 | 5 | 4 | 5 |
Short Questions with Answers:
Q No 1: In which situation would you consider the geometric mean an appropriate measure compared to the arithmetic mean?
Ans: When we are dealing with quantities that change over a period of time and want an average rate of change, such as an average growth rate over several years, the geometric mean is the appropriate measure instead of the arithmetic mean.
Q No 2: How would you locate the median in a histogram?
Ans: Geometrically, the median is the value of X (abscissa) corresponding to the vertical line which divides the histogram into two parts of equal area.
Q No 3: Which of the averages would be suitable for each of the following? (i) Income of workers in a factory (ii) Heights of students (iii) Marks obtained by students in a test (iv) Dress and shoe sizes (v) Number of apples on trees (vi) Intensity of colour (vii) Rates of increase in incomes (viii) Average speed.
Ans: (i) Median (ii) Mean (iii) Mean (iv) Mode (v) Mode (vi) Median (vii) Geometric mean (viii) Harmonic mean
Unimodal: A distribution with a single mode is called a unimodal distribution.
Bimodal: A distribution with two modes is called a bimodal distribution.
Multimodal: A distribution with more than two modes is called a multimodal distribution.
The Empirical Relation between the Mean, Median and Mode (approximately, for moderately skewed distributions)
Mode = 3 Median - 2 Mean OR Mean - Mode = 3(Mean - Median)

Relation between A.M., G.M. and H.M. (for positive values)

H.M. ≤ G.M. ≤ A.M. OR A.M. ≥ G.M. ≥ H.M.

The equality sign holds when all the values are equal.
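The ordering H.M. ≤ G.M. ≤ A.M., and equality when all values are equal, can be checked with Python's standard `statistics` module (`fmean` and `geometric_mean` need Python 3.8 or later). A minimal sketch on the marks data used earlier; `isclose` is used for the equal-values case because the geometric mean is computed through logarithms and may differ from 7 in the last binary digit.

```python
from math import isclose
from statistics import fmean, geometric_mean, harmonic_mean

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
am, gm, hm = fmean(marks), geometric_mean(marks), harmonic_mean(marks)
print(f"H.M. = {hm:.2f}, G.M. = {gm:.2f}, A.M. = {am:.2f}")
assert hm <= gm <= am              # H.M. <= G.M. <= A.M. for positive values

same = [7, 7, 7]                   # all values equal: the three means coincide
assert isclose(fmean(same), geometric_mean(same))
assert isclose(geometric_mean(same), harmonic_mean(same))
```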


Q No 4: Define symmetrical, positively skewed and negatively skewed distributions.
Ans: In a symmetrical distribution (such as the normal distribution): Mean = Median = Mode

Positively skewed distribution: Mean > Median > Mode

Negatively skewed distribution: Mean < Median < Mode

MEASURES OF DISPERSION, SKEWNESS AND KURTOSIS

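The Range: $R = X_L - X_S$ = largest value - smallest value

The Semi-Interquartile Range or the Quartile Deviation: $Q.D. = \dfrac{Q_3 - Q_1}{2}$

The Mean Deviation from the Mean: $M.D. = \dfrac{\sum |X - \bar X|}{n}$

The Standard Deviation

$s = \sqrt{\dfrac{\sum (X - \bar X)^2}{n - 1}}$ standard deviation with divisor n - 1 (called unbiased because $s^2$ is an unbiased estimator of the population variance)

$S = \sqrt{\dfrac{\sum (X - \bar X)^2}{n}}$ standard deviation with divisor n (biased)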
The Variance: The variance is defined as the square of the standard deviation, i.e. the mean of the squared deviations from the mean. It is given by

$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$ (for ungrouped data)

$S^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2$ (for ungrouped data)

$S^2 = \dfrac{1}{n}\sum_{i=1}^{k} f_i(X_i - \bar X)^2 = \dfrac{\sum f(X - \bar X)^2}{n}$, where $n = \sum f$ (for grouped data)

Properties of the Variance and Standard Deviation

i) The variance of a constant is zero: Var(a) = 0.
ii) The variance is independent of origin, i.e. Var(X + a) = Var(X).
iii) Var(aX) = a² Var(X), where a is a constant.
iv) If X and Y are independent random variables,
Var(X + Y) = Var(X) + Var(Y) and Var(X - Y) = Var(X) + Var(Y).

In general, $\text{Var}(aX \pm bY \pm c) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) \pm 2ab\,\text{Cov}(X, Y)$,

where $\text{Cov}(X, Y) = \dfrac{1}{N}\sum (X - \bar X)(Y - \bar Y)$.

If X and Y are independent, then Cov(X, Y) = 0.

v) The variance has the minimal property: the variance (or standard deviation) is a minimum if and only if the deviations are taken from the mean. In other words, $\dfrac{1}{n}\sum_{i=1}^{n}(X_i - a)^2$ is minimum if and only if $a = \bar X$.

Empirical Relations between Mean, Median and Mode

Mode = 3 Median - 2 Mean

Empirical Relations between Measures of Dispersion

For the normal distribution:
i) Mean Deviation = 0.7979 (Standard Deviation)
ii) Semi-Interquartile Range = 0.6745 (Standard Deviation)
These relationships hold approximately for moderately skewed distributions. We therefore have the following empirical formulae:

i) Mean Deviation = $\dfrac{4}{5}$ (Standard Deviation)
ii) Semi-Interquartile Range = $\dfrac{2}{3}$ (Standard Deviation)
iii) Semi-Interquartile Range = $\dfrac{5}{6}$ (Mean Deviation)
Example: The marks obtained by a sample of n = 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36

Sample mean $\bar X = \dfrac{\sum X}{n} = \dfrac{360}{9} = 40$

M.D. from mean $= \dfrac{\sum |X - \bar X|}{n} = \dfrac{40}{9} = 4.44$

Standard deviation $S = \sqrt{\dfrac{\sum (X - \bar X)^2}{n}} = \sqrt{\dfrac{232}{9}} = 5.08$

Variance $S^2 = \dfrac{\sum (X - \bar X)^2}{n} = \dfrac{232}{9} = 25.78$
Example: A population of size N = 10 has observations 7, 8, 10, 13, 14, 19, 20, 25, 26 and 28. Find the population mean, population standard deviation and population variance.

Population mean $\mu = \dfrac{\sum X}{N} = \dfrac{170}{10} = 17$

Population standard deviation $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}} = \sqrt{\dfrac{534}{10}} = 7.31$

Population variance $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{534}{10} = 53.4$
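The mean deviation, standard deviation and variance of the two examples above can be reproduced with a small Python sketch. The `ddof` ("delta degrees of freedom") parameter is a hypothetical name borrowed from common library usage: `ddof=0` gives the divisor-n (or N) form used in these examples, and `ddof=1` gives $s^2$.

```python
import math


def mean_deviation(xs):
    """M.D. about the mean: sum(|X - X-bar|) / n."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def variance(xs, ddof=0):
    """sum((X - X-bar)^2) / (n - ddof); ddof=0 gives S^2 or sigma^2, ddof=1 gives s^2."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]     # sample, divisor n as in the example
print(round(mean_deviation(marks), 2), round(math.sqrt(variance(marks)), 2), round(variance(marks), 2))
# 4.44 5.08 25.78

population = [7, 8, 10, 13, 14, 19, 20, 25, 26, 28]
print(round(math.sqrt(variance(population)), 2), variance(population))
# 7.31 53.4
```

With `ddof=1` the marks give $s^2 = 232/8 = 29$, the unbiased version.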
Example: Arrange the data in ascending order: 32, 36, 36, 37, 39, 41, 45, 46, 48

To find the trimmed mean and trimmed S.D.:

We remove all values before $Q_1$ (the 3rd item) and all values after $Q_3$ (the 7th item).
So we use 36, 37, 39, 41, 45.

Trimmed mean $= \dfrac{36 + 37 + 39 + 41 + 45}{5} = 39.6$

Trimmed S.D. $= \sqrt{\dfrac{\sum (X - \bar X)^2}{5}} = \sqrt{\dfrac{51.2}{5}} = 3.2$

To find the Winsorized mean and Winsorized S.D.:
We replace all values before $Q_1$ by $Q_1$ and all values after $Q_3$ by $Q_3$.
So we use 36, 36, 36, 37, 39, 41, 45, 45, 45.

Winsorized mean $= \dfrac{36 + 36 + 36 + 37 + 39 + 41 + 45 + 45 + 45}{9} = 40$

Winsorized S.D. $= \sqrt{\dfrac{\sum (X - \bar X)^2}{9}} = \sqrt{\dfrac{134}{9}} = 3.86$
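Trimming drops the values outside $[Q_1, Q_3]$, while Winsorizing clamps them to the nearest quartile. A minimal Python sketch, assuming $Q_1$ and $Q_3$ are the 3rd and 7th items found earlier by the round-up rule; `pop_sd` is a hypothetical helper using divisor n, as in the example.

```python
import math


def pop_sd(xs):
    """Standard deviation with divisor n."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


data = sorted([45, 32, 37, 46, 39, 36, 41, 48, 36])
i1, i3 = 3, 7                          # ordinal numbers of Q1 and Q3 (3rd and 7th items)
q1, q3 = data[i1 - 1], data[i3 - 1]    # 36 and 45

trimmed = data[i1 - 1:i3]                            # keep items 3..7
winsorized = [min(max(x, q1), q3) for x in data]     # clamp every value into [Q1, Q3]

print(sum(trimmed) / len(trimmed), round(pop_sd(trimmed), 2))           # 39.6 3.2
print(sum(winsorized) / len(winsorized), round(pop_sd(winsorized), 2))  # 40.0 3.86
```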
Short Questions with answers
Q No 1: What is meant by dispersion?
Sol: By dispersion, we mean the extent to which the values in a series are spread out from the average.

Q No 2: What does the quartile deviation measure in a distribution?

Sol: In distributions that are not completely symmetrical, the quartile deviation measures the average distance from the quartiles to the median.

Q No 3: The range is generally a poor measure of dispersion. Explain.

Sol: Consider two series: (a) 3, 5, 6, 7, 10, 12, 15, 18 (b) 3, 8, 8, 8, 9, 9, 9, 18. In both series the range is 18 - 3 = 15, but there is much more variation or dispersion in (a) than in (b). Since the range shows no difference between the two series, it is a poor measure of dispersion in this case.

Q No 4: By adding 5 to each of the numbers in the set 3, 6, 2, 1, 5, 7, we obtain the set 8, 11, 7, 6, 10, 12. Show that the two sets have the same standard deviation but different means.
Sol: Denoting the first and second sets by X and Y respectively, $\bar X = \sum X / n = 24/6 = 4$ and $\bar Y = \sum Y / n = 54/6 = 9$. The relation between X and Y is Y = X + 5, so $\bar Y = 5 + \bar X = 5 + 4 = 9$. Since the standard deviation is independent of the origin (5 in this case), it remains the same for the two sets.

Q No 5: The scores obtained by five students on a set of examination papers were 70, 50, 60, 70, 50. These scores are changed by (i) adding 10 points to all scores (ii) increasing all scores by 10%. What effect will these changes have on the standard deviation?
Sol: (i) Denote the given scores by X and the scores changed by adding 10 by Y, so that Y = X + 10. By the properties of the variance,
Var(Y) = Var(X + 10) = Var(X), or S.D.(Y) = S.D.(X).

Thus there is no change in the standard deviation.

(ii) In this case the original scores are increased by 10%, giving 77, 55, 66, 77, 55, i.e. Y = 1.1X. Since Var(1.1X) = (1.1)² Var(X), S.D.(Y) = 1.1 S.D.(X): the standard deviation of the changed set is increased by 10%.

Q No 6: Given $\bar X = 10$ and Var(X) = 4, find $\bar Y$ and Var(Y) when (i) $Y = 2X - 1$ (ii) $Y = \dfrac{1}{3}(4 + 3X)$

Sol: (i) $\bar Y = 2\bar X - 1 = 2(10) - 1 = 19$, Var(Y) = Var(2X - 1) = 4 Var(X) = 4(4) = 16

(ii) $\bar Y = \dfrac{1}{3}(4 + 3\bar X) = \dfrac{4}{3} + \bar X = \dfrac{34}{3}$, $\text{Var}(Y) = \text{Var}\left(\dfrac{4}{3} + X\right) = \text{Var}(X) = 4$

Q No 7: A manufacturer of T.V. tubes has two types of tubes, A and B. The tubes have mean lifetimes $\bar X_A = 1495$ hours and $\bar X_B = 1895$ hours, with standard deviations $S_A = 280$ and $S_B = 310$ hours. Which tube has the greater (i) absolute dispersion (ii) relative dispersion?
Sol: (i) Since $S_B$ is greater than $S_A$, tube B has greater absolute dispersion.

(ii) Coefficient of variation: C.V.(A) $= \dfrac{S_A}{\bar X_A} \times 100 = \dfrac{280}{1495} \times 100 = 18.73\%$

C.V.(B) $= \dfrac{S_B}{\bar X_B} \times 100 = \dfrac{310}{1895} \times 100 = 16.36\%$

Since C.V.(A) is greater than C.V.(B), tube A has greater relative dispersion.
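The coefficient of variation removes the unit and the size of the mean, which is why it can rank relative dispersion even when the absolute comparison points the other way. A minimal Python sketch; `coefficient_of_variation` is a hypothetical name.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S / X-bar) * 100, in percent."""
    return sd / mean * 100


cv_a = coefficient_of_variation(280, 1495)
cv_b = coefficient_of_variation(310, 1895)
print(round(cv_a, 2), round(cv_b, 2))                       # 18.73 16.36
print("A" if cv_a > cv_b else "B", "has greater relative dispersion")
```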

Moments about the origin

$\mu'_r = \dfrac{\sum X_i^r}{N}$ for population data

$m'_r = \dfrac{\sum X_i^r}{n}$ for sample data

Moments about the mean: Moments are certain parameters which are used for testing the symmetry and normality of a distribution.

$\mu_r = \dfrac{\sum (X_i - \mu)^r}{N}$ for population data

$m_r = \dfrac{\sum (X_i - \bar X)^r}{n}$ for sample data
The relationship between raw moments and central moments
$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$
Variance: $S^2 = m_2$ and $\sigma^2 = \mu_2$ (the variance is equal to the second moment about the mean).

Moment ratios: $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ (for population) and $b_1 = \dfrac{m_3^2}{m_2^3}$, $b_2 = \dfrac{m_4}{m_2^2}$ (for sample).
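Example: From the sample data 32, 36, 36, 37, 39, 41, 45, 46, 48, find the first four moments about the mean.

$\bar X = 40$, $m_1 = \dfrac{\sum (X - \bar X)}{n} = 0$, $m_2 = \dfrac{\sum (X - \bar X)^2}{n} = \dfrac{232}{9} = \text{Var}(X) = 25.78$

$m_3 = \dfrac{\sum (X - \bar X)^3}{n} = \dfrac{186}{9} = 20.67$, $m_4 = \dfrac{\sum (X - \bar X)^4}{n} = \dfrac{10708}{9} = 1189.78$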

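Example: The population data in ascending order are 3, 4, 6, 8, 11, 12, 14, 15. Find the first four moments about the mean.

$\mu = \dfrac{\sum X}{N} = \dfrac{73}{8} = 9.125$, $\mu_1 = \dfrac{\sum (X - \mu)}{N} = 0$, $\mu_2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{144.875}{8} = 18.11$

$\mu_3 = \dfrac{\sum (X - \mu)^3}{N} = \dfrac{-47.344}{8} = -5.92$, $\mu_4 = \dfrac{\sum (X - \mu)^4}{N} = \dfrac{4031.09}{8} = 503.89$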
KURTOSIS
Leptokurtic distribution:
A distribution having a relatively high peak is called a leptokurtic distribution ($\beta_2 > 3$).
Platykurtic distribution:
A distribution which is flat-topped is called a platykurtic distribution ($\beta_2 < 3$).
Mesokurtic (normal) distribution:
A distribution which is neither highly peaked nor flat-topped is called a mesokurtic distribution ($\beta_2 = 3$).
Kurtosis: It shows the degree of peakedness of a distribution.
(I) Moment coefficient of kurtosis: $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ (for a sample, $b_2 = \dfrac{m_4}{m_2^2}$)
(II) Percentile coefficient of kurtosis: $k = \dfrac{Q.D.}{P_{90} - P_{10}}$
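The moment calculations, the raw-to-central relations and the kurtosis classification fit in one short Python sketch. The helper names are hypothetical, and the small tolerance in `kurtosis_type` is an assumption to absorb floating-point error when $\beta_2$ is exactly 3. The population example is recomputed through the raw moments to confirm the relations.

```python
def raw_moment(xs, r):
    """m'_r = sum(X^r) / n  (moment about the origin)."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """m_r = sum((X - X-bar)^r) / n  (moment about the mean)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)


def central_from_raw(xs):
    """mu_2, mu_3, mu_4 from the raw moments mu'_1..mu'_4 (relations above)."""
    r1, r2, r3, r4 = (raw_moment(xs, r) for r in (1, 2, 3, 4))
    mu2 = r2 - r1 ** 2
    mu3 = r3 - 3 * r2 * r1 + 2 * r1 ** 3
    mu4 = r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4
    return mu2, mu3, mu4


def kurtosis_type(beta2, tol=1e-9):
    if beta2 > 3 + tol:
        return "leptokurtic"
    if beta2 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


sample = [32, 36, 36, 37, 39, 41, 45, 46, 48]
m2, m3, m4 = (central_moment(sample, r) for r in (2, 3, 4))
b1 = m3 ** 2 / m2 ** 3
b2 = m4 / m2 ** 2
print(round(m2, 2), round(m3, 2), round(m4, 2))     # 25.78 20.67 1189.78
print(round(b1, 4), round(b2, 2), kurtosis_type(b2))

pop = [3, 4, 6, 8, 11, 12, 14, 15]
print([round(v, 2) for v in central_from_raw(pop)])  # [18.11, -5.92, 503.89]
```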

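FORMULAS SHEET

Arithmetic Mean
Grouped data: $\bar X = \dfrac{\sum fX}{\sum f}$; Ungrouped data: $\bar X = \dfrac{\sum X}{N}$
Short-cut or deviation method: $\bar X = a + \dfrac{\sum fD}{\sum f}$ (grouped); $\bar X = a + \dfrac{\sum D}{N}$ (ungrouped), where $D = X - a$
Step deviation or coding: $\bar X = a + \dfrac{\sum fu}{\sum f} \times h$ (grouped); $\bar X = a + \dfrac{\sum u}{N} \times h$ (ungrouped), where $u = \dfrac{X - a}{h}$

Weighted Mean
$\bar X_w = \dfrac{\sum WX}{\sum W}$ (grouped and ungrouped data)

Geometric Mean
Grouped data: $G = \text{antilog}\left(\dfrac{\sum f\log X}{\sum f}\right)$; Ungrouped data: $G = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 X_3 \cdots X_n)^{1/n}$

Median
Grouped data: $\tilde X = l + \dfrac{h}{f}\left(\dfrac{n}{2} - C\right)$; Ungrouped data: $\tilde X = \left(\dfrac{n+1}{2}\right)$th value

Harmonic Mean
Grouped data: $H.M. = \dfrac{\sum f}{\sum (f/X)}$; Ungrouped data: $H.M. = \dfrac{n}{\sum (1/X)}$

Quartiles
Grouped data: $Q_k = l + \dfrac{h}{f}\left(\dfrac{kn}{4} - C\right)$; Ungrouped data: $Q_k = \left(\dfrac{k(n+1)}{4}\right)$th value

Deciles
Grouped data: $D_k = l + \dfrac{h}{f}\left(\dfrac{kn}{10} - C\right)$; Ungrouped data: $D_k = \left(\dfrac{k(n+1)}{10}\right)$th value

Percentiles
Grouped data: $P_k = l + \dfrac{h}{f}\left(\dfrac{kn}{100} - C\right)$; Ungrouped data: $P_k = \left(\dfrac{k(n+1)}{100}\right)$th value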
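Mode
Grouped data: $\hat X = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$; Ungrouped data: the most common value

Range
$R = X_m - X_0$; Coefficient of dispersion $= \dfrac{X_m - X_0}{X_m + X_0}$

Mean Deviation
Grouped data: $M.D. = \dfrac{\sum f|X - \bar X|}{\sum f}$; Ungrouped data: $M.D. = \dfrac{\sum |X - \bar X|}{n}$ (from the mean)

$Q.D. = \dfrac{Q_3 - Q_1}{2}$ (semi-interquartile range)
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Variance
Population variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$; Sample variance: $S^2 = \dfrac{\sum (X - \bar X)^2}{n}$

Karl Pearson's coefficients of skewness
Karl Pearson's 1st coefficient of skewness: $S.K. = \dfrac{\text{Mean} - \text{Mode}}{S.D.}$
Karl Pearson's 2nd coefficient of skewness: $S.K. = \dfrac{3(\text{Mean} - \text{Median})}{S.D.}$
Bowley's or quartile coefficient of skewness: $S.K._Q = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$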