4.sheet - Lecture On Central Tendency

Uploaded by

2023-2-60-191

Measures of Central Tendency

Measure of central tendency: a single number that represents a data set and gives an idea of the middle of the data.

(A measure of central tendency shows the central value around which the data tend to cluster.)

There are three measures of central tendency:

(1) Mean (2) Median (3) Mode

Mean: The mean can be of three types.

1. Arithmetic Mean: The arithmetic mean is the sum of a set of observations (positive, negative or zero) divided by the number of observations.

A.M. = Sum of values of items / Number of items

Direct Method of Calculation of A.M.:

(I) Ungrouped data: Suppose we have a set of values $x_1, x_2, x_3, \dots, x_n$. Then we define and denote the A.M. as

(a) $$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$ are the individual values of the items.

Example 1: Find the mean of 10, 20, 30, 50 and 60.

Solution: $$\text{A.M.} = \frac{10 + 20 + 30 + 50 + 60}{5} = \frac{170}{5} = 34$$
(b) Discrete series: when the actual sizes of the items (individual measurements) and the corresponding frequencies are given,

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $x_i$ = the size of the item and $f_i$ = the corresponding frequency.

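Example 2: Calculate the arithmetic average:

| Size of item (in inches) | 2 | 3 | 6 | 7 | 10 |
|---|---|---|---|---|---|
| Frequency | 3 | 8 | 5 | 10 | 14 |

Solution:

| Size of item (in inches), $x_i$ | 2 | 3 | 6 | 7 | 10 | Total |
|---|---|---|---|---|---|---|
| Frequency, $f_i$ | 3 | 8 | 5 | 10 | 14 | $\sum f_i = 40$ |
| $f_i x_i$ | 6 | 24 | 30 | 70 | 140 | $\sum f_i x_i = 270$ |

$$\text{A.M.} = \frac{270}{40} = 6.75 \text{ inches}$$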

(II) Grouped data: i.e. when class intervals and frequencies are given,

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $x_i$ = the central value (midpoint) of the interval and $f_i$ = the frequency of the respective class.

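Example 3: Calculate the mean of the following distribution.

| Yield (in cwt) | 0-3 | 3-6 | 6-9 | 9-12 |
|---|---|---|---|---|
| Frequency | 4 | 4 | 32 | 10 |

Solution:

| Yield (in cwt) | 0-3 | 3-6 | 6-9 | 9-12 | Total |
|---|---|---|---|---|---|
| Central value, $x_i$ | 1.5 | 4.5 | 7.5 | 10.5 | |
| Frequency, $f_i$ | 4 | 4 | 32 | 10 | $\sum f_i = 50$ |
| $f_i x_i$ | 6 | 18 | 240 | 105 | $\sum f_i x_i = 369$ |

$$\text{A.M.} = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{369}{50} = 7.38 \text{ cwt}$$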
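The frequency-weighted mean used in Examples 2 and 3 can be sketched in Python as follows. This is only an illustration, not part of the original lecture; the function name `grouped_mean` is an assumption chosen for this sketch. For class intervals, the midpoint of each class stands in for $x_i$.

```python
def grouped_mean(values, freqs):
    """Frequency-weighted arithmetic mean: sum(f_i * x_i) / sum(f_i)."""
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_f


# Example 2 (discrete series): sizes in inches with their frequencies
mean_inches = grouped_mean([2, 3, 6, 7, 10], [3, 8, 5, 10, 14])

# Example 3 (class intervals): the midpoint of each class plays the role of x_i
classes = [(0, 3), (3, 6), (6, 9), (9, 12)]
midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean_cwt = grouped_mean(midpoints, [4, 4, 32, 10])

print(round(mean_inches, 2))  # 6.75  (270 / 40)
print(round(mean_cwt, 2))     # 7.38  (369 / 50)
```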

H.W. 6: The following table gives the distribution of weekly wages of workers in a factory. Calculate the arithmetic mean of the distribution.

| Weekly wages | 240-269 | 270-299 | 300-329 | 330-359 | 360-389 |
|---|---|---|---|---|---|
| No. of workers | 7 | 19 | 27 | 15 | 18 |

## Advantages of Arithmetic Mean:

(1) Easily understood, easily calculated and generally used

(2) Uses each and every item

## Disadvantages of Arithmetic Mean:

(1) Its value may be greatly distorted by extreme values

(2) Cannot be used for qualitative data

2. Geometric Mean:

Ungrouped data: The geometric mean (G) of n positive values $x_1, x_2, x_3, \dots, x_n$ is defined as the positive nth root of the product of the n numbers:

$$G = (x_1 x_2 x_3 \cdots x_n)^{1/n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$

For a frequency distribution,

$$G = \sqrt[N]{x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_n^{f_n}}, \qquad N = \sum_{i=1}^{n} f_i$$

where $x_i$ are the individual values of the items and $f_i$ = the corresponding frequency.

Using logarithms:

$$\log G = \frac{1}{n}\left(\log x_1 + \log x_2 + \dots + \log x_n\right) = \frac{1}{n}\sum \log x$$

or

$$G = \text{antilog}\left(\frac{1}{n}\sum \log x\right)$$

These expressions state that the log of the geometric mean is the arithmetic mean of the logs of the numbers.

[Practical definition: The average of the logarithmic values of a data set, converted back to a base 10 number.]

#### Because the geometric mean is based on log values and the log transformation tends to draw
extreme values toward the center of the data, the geometric mean is more "robust" than the
arithmetic mean. "Robust" here means less influenced by outliers.

Grouped data:

$$G = \sqrt[N]{x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_n^{f_n}}$$

where $x_i$ = the central value of the interval, $f_i$ = the frequency of the respective class and $N = \sum f_i$.

$$\log G = \frac{1}{N}\log\left(x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_n^{f_n}\right) = \frac{1}{N}\left[f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n\right] = \frac{1}{N}\sum f_i \log x_i$$

$$G = \text{antilog}\left(\frac{1}{N}\sum f_i \log x_i\right)$$
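The logarithmic route to the geometric mean can be sketched as below. This is a minimal illustration, assuming base-10 logarithms as in the practical definition; the function name `geometric_mean_logs` and the small frequency example are hypothetical and not taken from the lecture.

```python
import math


def geometric_mean_logs(values, freqs=None):
    """Geometric mean via G = antilog((1/N) * sum(f_i * log10(x_i)))."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every frequency is 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n_total = sum(freqs)               # N = sum of frequencies
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** mean_log              # antilog, base 10


g_plain = geometric_mean_logs([3, 9, 11])       # cube root of 297
g_freq = geometric_mean_logs([2, 4], [2, 1])    # cube root of 2 * 2 * 4

print(round(g_plain, 2))  # 6.67
print(round(g_freq, 2))   # 2.52
```

Working with logs avoids forming a very large product when there are many observations.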

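3. Harmonic Mean:

$$H.M. = \frac{1}{\dfrac{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}{n}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$$

For a frequency distribution,

$$H = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \frac{f_3}{x_3} + \dots} = \frac{N}{\sum \frac{f}{x}}$$

where $x_i$ = the individual values of the items (for ungrouped data),

$x_i$ = the central value of the interval (for grouped data),

and $f_i$ = the frequencies of the respective classes.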

Property: For a set of n positive observations $x_1, x_2, x_3, \dots, x_n$,

$$AM \ge GM \ge HM$$

For example, let 3, 9 and 11 be three given numbers. Then

$$AM = \frac{3 + 9 + 11}{3} = 7.67$$

$$GM = \sqrt[3]{3 \cdot 9 \cdot 11} = 6.67$$

$$HM = \frac{3}{\frac{1}{3} + \frac{1}{9} + \frac{1}{11}} = 5.60$$

So AM ≥ GM ≥ HM holds for these numbers.
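The same check can be run with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). This is only a numerical illustration of the inequality for the three numbers above, not a proof.

```python
import statistics

data = [3, 9, 11]

am = statistics.mean(data)             # arithmetic mean
gm = statistics.geometric_mean(data)   # geometric mean
hm = statistics.harmonic_mean(data)    # harmonic mean

print(round(am, 2), round(gm, 2), round(hm, 2))  # 7.67 6.67 5.6
print(am >= gm >= hm)                            # True
```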

 Median: The “Median” is the middle value in the list of numbers.

Calculation of median: Ungrouped data

Arrange the data in ascending or descending order of magnitude. Then

Median = size of the $\frac{n+1}{2}$th observation.

For an odd number of observations:
Ex: From the following data on the wages of 7 workers, compute the median wage.
Wages (in Taka): 1600, 1650, 1580, 1690, 1660, 1606, 1640
Solution: The wages arranged in ascending order are

1580, 1600, 1606, 1640, 1650, 1660, 1690

Median = size of the $\frac{n+1}{2}$th observation = $\frac{7+1}{2}$th observation = 4th observation.
Here the median wage = Taka 1640.

For an even number of observations (for example n = 8):
$\frac{8+1}{2}$th observation = 4.5th observation.

We take the average of the 4th and 5th observations. If these are 1640 and 1650, then

$$\text{Median} = \frac{1640 + 1650}{2} = 1645$$

Advantages: (1) Its value is not distorted by extreme values.
(2) It is suitable for qualitative studies.
Disadvantages: (1) It requires the data to be arranged in order before it can be found.

Grouped data:

$$\text{Median} = L + \frac{\frac{N}{2} - p.c.f}{f} \times i$$

where
L = lower limit of the median class
p.c.f = cumulative frequency of the class preceding the median class
f = frequency of the median class
i = class interval (width) of the median class

The median class is the first class whose cumulative frequency (c.f.) is $\ge \frac{N}{2}$.
Ex. 1500 workers are working in an industrial establishment. Their ages are classified as follows:

| Age (yrs) | No. of workers |
|---|---|
| 18-22 | 120 |
| 22-26 | 125 |
| 26-30 | 280 |
| 30-34 | 260 |
| 34-38 | 155 |
| 38-42 | 184 |
| 42-46 | 162 |
| 46-50 | 86 |
| 50-54 | 75 |
| 54-58 | 53 |

Calculate the median age.

Solution:

| Age group | Frequency ($f_i$) | Cumulative frequency (c.f.) |
|---|---|---|
| 18-22 | 120 | 120 |
| 22-26 | 125 | 245 |
| 26-30 | 280 | 525 |
| 30-34 | 260 | 785 |
| 34-38 | 155 | 940 |
| 38-42 | 184 | 1124 |
| 42-46 | 162 | 1286 |
| 46-50 | 86 | 1372 |
| 50-54 | 75 | 1447 |
| 54-58 | 53 | 1500 |

So, Median = size of the $\frac{N}{2}$th observation = size of the $\frac{1500}{2}$th observation = 750th observation.
Hence the median lies in the class 30-34.

$$\text{Median} = L + \frac{\frac{N}{2} - p.c.f}{f} \times i = 30 + \frac{750 - 525}{260} \times 4 = 30 + 3.46 = 33.46 \text{ (Answer)}$$
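The grouped-median steps above (find the median class from the cumulative frequencies, then interpolate inside it) can be sketched as follows. The function name `grouped_median` and the tuple representation of the classes are assumptions made for this illustration.

```python
def grouped_median(classes, freqs):
    """Median = L + ((N/2 - p.c.f) / f) * i for ascending (lower, upper) classes."""
    n_total = sum(freqs)
    if n_total == 0:
        raise ValueError("total frequency must be positive")
    half = n_total / 2
    pcf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and pcf + f >= half:          # first class with c.f. >= N/2
            return lower + (half - pcf) / f * (upper - lower)
        pcf += f
    raise ValueError("median class not found")


age_classes = [(18, 22), (22, 26), (26, 30), (30, 34), (34, 38),
               (38, 42), (42, 46), (46, 50), (50, 54), (54, 58)]
workers = [120, 125, 280, 260, 155, 184, 162, 86, 75, 53]

median_age = grouped_median(age_classes, workers)
print(round(median_age, 2))  # 33.46
```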
 Mode: Mode is the most frequently occurring value in a frequency distribution.
Ex. To find the mode of 11,3,5,11,7,3,11
Here, mode=11

Calculation of mode - Ungrouped data

For example, let us consider the ages (in years) of some children surveyed in a small locality:

5, 2, 2, 8, 7, 6, 5, 4, 3, 4, 5, 2, 2

In the above example, the age 2 years is recorded 4 times (the maximum number of times). So 2 years is the mode of the distribution of the children's ages.

Distribution of workers by their daily wages (in taka):

90.00, 90.00, 95.00, 100.00, 100.00, 90.00, 100.00, 110.00, 115.00, 125.00

Here 90.00 and 100.00 both occur with the greatest frequency, 3.

Therefore, Mo = 90.00 and Mo = 100.00. The distribution is bimodal.

Distribution of students according to their grade point average in an examination:

2.5, 3.0, 3.0, 3.5, 3.8, 3.0, 3.0, 2.5, 3.8, 4.0, 4.0, 4.0, 4.0, 2.5, 2.5

In this distribution Mo = 2.5, Mo = 3.0 and Mo = 4.0, since each of the values 2.5, 3.0 and 4.0 occurs 4 times. Such a distribution is known as a multimodal distribution.

Calculation of mode - Grouped data

Example: The following data represent the distribution of female workers in different garments industries according to their monthly salary (in taka).

| Class interval of salary (in taka) | No. of female workers, $f_i$ | c.f. |
|---|---|---|
| < 600 | 10 | 10 |
| 600-700 | 40 | 50 |
| 700-800 | 65 | 115 |
| 800-900 | 250 | 365 |
| 900-1000 | 175 | 540 |
| 1000-1100 | 82 | 622 |
| 1100-1200 | 50 | 672 |

(i) Find the salary, on average, of the major group of female workers.
(ii) How many female workers have a salary of less than 1000 taka?

Solution: (i) The major group consists of the 250 female workers whose salary lies in the interval 800-900. Their salary, on average, is given by the mode:

$$M_0 = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 800 + \frac{100(250 - 65)}{2 \times 250 - 65 - 175} = 871.15 \text{ tk}$$

where $l = 800$ (lower limit of the modal class), $h = 100$ (class width), $f_1 = 250$ (frequency of the modal class), $f_0 = 65$ (frequency of the preceding class) and $f_2 = 175$ (frequency of the following class).

(ii) From the c.f. column it is observed that the salary of 540 female workers is less than 1000.00 tk.
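The modal-class formula can be sketched in Python as below. The function name `grouped_mode` is an assumption for this illustration, and the open-ended first class is given a lower limit of `None` because the lecture does not state one; only the lower limit of the modal class is actually used.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for equal-width classes."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("mode is ill-defined: modal class is the first or last class")
    f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
    return lower_limits[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)


# Salary distribution from the example; None marks the open-ended "< 600" class
salary_lower = [None, 600, 700, 800, 900, 1000, 1100]
salary_freqs = [10, 40, 65, 250, 175, 82, 50]

modal_salary = grouped_mode(salary_lower, 100, salary_freqs)
print(round(modal_salary, 2))  # 871.15
```

The explicit error for a first or last modal class reflects the case where $f_0$ or $f_2$ does not exist.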

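#### An empirical relationship between mean, median and mode was presented by Karl Pearson. For a moderately asymmetrical curve, the empirical relationship is

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median})$$

or

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

$$\text{Median} = \text{Mode} + \frac{2}{3}(\text{Mean} - \text{Mode})$$

and

$$\text{Mean} = \text{Mode} + \frac{3}{2}(\text{Median} - \text{Mode})$$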
Example: The following data represent the distribution of chickens by weight 7 weeks after birth:

| Class interval of weight (in kg) | No. of chickens, $f_i$ | c.f. | Mid-value, $X_i$ | $f_i X_i$ |
|---|---|---|---|---|
| 0.6-0.8 | 25 | 25 | 0.7 | 17.5 |
| 0.8-1.0 | 22 | 47 | 0.9 | 19.8 |
| 1.0-1.2 | 18 | 65 | 1.1 | 19.8 |
| 1.2-1.4 | 12 | 77 | 1.3 | 15.6 |
| 1.4-1.6 | 12 | 89 | 1.5 | 18.0 |
| 1.6-1.8 | 11 | 100 | 1.7 | 18.7 |
| Total | 100 | | | 109.4 |

Find the weight, on average, of the maximum number of chickens.
Solution: Since the maximum frequency is in the first class, the mode is ill-defined. However,

$$\text{Mean} = \frac{1}{N}\sum f_i X_i = \frac{109.4}{100} = 1.094$$

$$Me = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 1.0 + \frac{0.2}{18}(50 - 47) = 1.03$$

The weight, on average, of the maximum number of chickens is the mode of the distribution.
The mode is given by Mo = 3 Median - 2 Mean = 3(1.03) - 2(1.094) = 0.902.
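As a check on the arithmetic, the chicken-weight example can be recomputed in a few lines of Python. This sketch hard-codes the median class (1.0-1.2, with c = 47 and f = 18) as identified in the solution above. It also shows that the final figure depends on rounding: the lecture rounds the median to 1.03 before applying the empirical formula, which gives 0.902, while the unrounded median gives about 0.912.

```python
mid_values = [0.7, 0.9, 1.1, 1.3, 1.5, 1.7]
freqs = [25, 22, 18, 12, 12, 11]

n_total = sum(freqs)
mean = sum(f * x for x, f in zip(mid_values, freqs)) / n_total

# Median class is 1.0-1.2: l = 1.0, h = 0.2, f = 18, c = 47
median = 1.0 + 0.2 / 18 * (n_total / 2 - 47)

# Pearson's empirical relationship: Mode = 3 * Median - 2 * Mean
mode_lecture = 3 * round(median, 2) - 2 * mean   # median rounded to 1.03 first
mode_unrounded = 3 * median - 2 * mean

print(round(mean, 3))            # 1.094
print(round(median, 2))          # 1.03
print(round(mode_lecture, 3))    # 0.902
print(round(mode_unrounded, 3))  # 0.912
```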

Outlier:
 A number that is very far away from the rest of the data set. I.e., a value that is much greater
or much less than most of the other numbers

 An outlier moves the average from the middle of the cluster of the rest of the data points.

# Advantage & Disadvantage of Measures of Central Tendency

Consider a set of observations as follows:
Observations: 4, 8, 5, 7, 100

$$\bar{x} = \frac{4 + 8 + 5 + 7 + 100}{5} = 24.8$$

## Disadvantage of Arithmetic Mean:
The arithmetic mean is affected by extreme values. This is its major disadvantage.
Array: 4, 5, 7, 8, 100
Me = 7
Advantage of Median:
The median is not affected by extreme values.
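The effect of the extreme value 100 can be seen by computing both measures with and without it. This is a small illustration using Python's standard `statistics` module; the variable names are chosen for this sketch.

```python
import statistics

observations = [4, 8, 5, 7, 100]
without_extreme = [x for x in observations if x != 100]

mean_all = statistics.mean(observations)        # pulled up by the value 100
median_all = statistics.median(observations)    # middle value of the sorted array

mean_trimmed = statistics.mean(without_extreme)
median_trimmed = statistics.median(without_extreme)

print(mean_all, median_all)          # 24.8 7
print(mean_trimmed, median_trimmed)  # 6 6.0
```

Dropping the extreme value moves the mean from 24.8 to 6, while the median only moves from 7 to 6.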

### X: -2, 4, 8
$G = (-2 \cdot 4 \cdot 8)^{1/3} = -4$, which is negative.
If we have an even number of negative values, e.g. X: -2, 4, -8, then
$G = (-2 \cdot 4 \cdot (-8))^{1/3} = 4$ and $\bar{x} = -2$.

But here $G > \bar{x}$, which violates the relationship AM ≥ GM. So for negative values we should not calculate the geometric mean.

X: 3, 4, 0, 10

$$G = (3 \cdot 4 \cdot 0 \cdot 10)^{1/4} = 0$$

$$H = \frac{4}{\frac{1}{3} + \frac{1}{4} + \frac{1}{0} + \frac{1}{10}} = \frac{4}{\infty} = 0$$

Disadvantage of Geometric & Harmonic Mean:
The geometric and harmonic means are not calculated if any observation in the array is zero or negative.
The geometric and harmonic means are used when the observations are given as rates and ratios.
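X: 2, 3, 3, 5, 8, 9, 15, 15
Mode = 3 & 15

| Class interval | Frequency (first distribution) | Frequency (second distribution) |
|---|---|---|
| 10-20 | 20 = $f_1$ | 8 |
| 20-30 | 18 = $f_2$ | 10 |
| 30-40 | 14 | 12 |
| 40-50 | 10 | 14 = $f_0$ |
| 50-60 | 6 | 18 = $f_1$ |

In the first distribution the modal class is the first class, so $f_0$ = ? (there is no preceding class). In the second distribution the modal class is the last class, so $f_2$ = ? (there is no following class).

*** The modal class is the class for which the frequency is maximum.

Disadvantage of Mode:
1. There may be many modes.
2. The mode may not fall at the centre of the array.
3. The mode is not well defined if the modal class is the first class or the last class.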

Example:
Find an appropriate measure of central tendency for the following observations, with justification.
Ans:
1. Here the arithmetic mean is not suitable, as there is an extreme value, 150.
2. The geometric and harmonic means are not calculated, as there are zero and negative observations.
3. Each observation occurs once, so there is no mode.
4. Therefore the median is an appropriate measure of central tendency in this case.
0+5 −10+10
Me = 2 = 2.5 Me = 2 = -5

# Measures of Location
Let us consider the array as follows:
X: 5, 7, 8, 10, 13, 15, 16, 18, 19, 20, 25, 27, 29, 31, 33, 36, 37, 38, 40, 45
N = 20

Middle of the first half of the array: $Me = \frac{13 + 15}{2} = 14$
Middle of the whole array: $Me = \frac{20 + 25}{2} = 22.5$
Middle of the second half of the array: $Me = \frac{33 + 36}{2} = 34.5$

# A measure which is located at a different place in the array is called a measure of location.
A measure of central tendency falls at the centre of the array, but a measure of location may fall at different places in the array.

# Different measures of location are:

1. Quartile ($Q_i$; i = 1, 2, 3)
(A measure which divides the array into 4 equal parts)
2. Decile ($D_i$; i = 1, 2, ..., 9)
(A measure which divides the array into 10 equal parts)
3. Percentile ($P_i$; i = 1, 2, ..., 99)
(A measure which divides the array into 100 equal parts)

$$Me = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = Q_2 = D_5 = P_{50}$$

because

$$l + \frac{h}{f}\left(\frac{2N}{4} - c\right) = l + \frac{h}{f}\left(\frac{5N}{10} - c\right) = l + \frac{h}{f}\left(\frac{50N}{100} - c\right)$$

$$Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - c\right); \quad i = 1, 2, 3$$

Here,
l = lower limit of the $Q_i$ class
h = width of the $Q_i$ class
f = frequency of the $Q_i$ class
c = c.f. of the class before the $Q_i$ class
N = total frequency
* The $Q_i$ class is the class for which c.f. $\ge \frac{iN}{4}$.

$$D_i = l + \frac{h}{f}\left(\frac{iN}{10} - c\right); \quad i = 1, 2, \dots, 9$$

* The $D_i$ class is the class for which c.f. $\ge \frac{iN}{10}$.

$$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - c\right); \quad i = 1, 2, \dots, 99$$

* The $P_i$ class is the class for which c.f. $\ge \frac{iN}{100}$.
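# Let us consider the following frequency distribution:

| Class interval | Frequency, f | c.f. |
|---|---|---|
| 10-12 | 8 | 8 |
| 12-14 | 12 | 20 |
| 14-16 | 20 | 40 |
| 16-18 | 22 | 62 |
| 18-20 | 18 | 80 |
| 20-22 | 12 | 92 |
| 22-24 | 8 | 100 |

N = 100

1. Calculate $Q_1$, $Q_2$, $Q_3$ of the distribution
2. Find the values of $D_3$ & $D_7$
3. Find the values of $P_{40}$ & $P_{80}$
4. Find $Q_1$, $D_8$ & $P_{60}$ from the graph
5. Draw the box & whisker plot of the distribution

1.
$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - c\right) = 14 + \frac{2}{20}\left(\frac{100}{4} - 20\right) = 14 + \frac{1}{10} \times 5 = 14.5$$

$$Q_2 = l + \frac{h}{f}\left(\frac{2N}{4} - c\right) = 16 + \frac{2}{22}\left(\frac{2 \times 100}{4} - 40\right) = 16 + \frac{1}{11} \times 10 = 16.91$$

$$Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - c\right) = 18 + \frac{2}{18}\left(\frac{3 \times 100}{4} - 62\right) = 18 + \frac{1}{9} \times 13 = 19.44$$

2.
$$D_3 = l + \frac{h}{f}\left(\frac{3N}{10} - c\right) = 14 + \frac{2}{20}\left(\frac{3 \times 100}{10} - 20\right) = 14 + \frac{1}{10} \times 10 = 15$$

$$D_7 = l + \frac{h}{f}\left(\frac{7N}{10} - c\right) = 18 + \frac{2}{18}\left(\frac{7 \times 100}{10} - 62\right) = 18 + \frac{1}{9} \times 8 = 18.89$$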
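Quartiles, deciles and percentiles all share one interpolation formula, so a single helper can cover them. The sketch below is an illustration only; the name `grouped_quantile` and its `(i, k)` parameters (the i-th of k equal parts) are assumptions introduced here. The two percentiles are computed the same way and are not worked in the lecture.

```python
def grouped_quantile(classes, freqs, i, k):
    """i-th k-quantile of grouped data: l + (h / f) * (i*N/k - c)."""
    n_total = sum(freqs)
    if n_total == 0:
        raise ValueError("total frequency must be positive")
    target = i * n_total / k          # position iN/k in the cumulative frequencies
    c = 0                             # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and c + f >= target: # first class with c.f. >= iN/k
            return lower + (upper - lower) / f * (target - c)
        c += f
    raise ValueError("quantile position exceeds total frequency")


classes = [(10, 12), (12, 14), (14, 16), (16, 18), (18, 20), (20, 22), (22, 24)]
freqs = [8, 12, 20, 22, 18, 12, 8]

q1, q2, q3 = (grouped_quantile(classes, freqs, i, 4) for i in (1, 2, 3))
d3 = grouped_quantile(classes, freqs, 3, 10)
d7 = grouped_quantile(classes, freqs, 7, 10)
p40 = grouped_quantile(classes, freqs, 40, 100)
p80 = grouped_quantile(classes, freqs, 80, 100)

print(round(q1, 2), round(q2, 2), round(q3, 2))  # 14.5 16.91 19.44
print(round(d3, 2), round(d7, 2))                # 15.0 18.89
print(round(p40, 2), round(p80, 2))              # 16.0 20.0
```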

Box Plot

[Figure: box and whisker plot of the distribution]

Outlier: An outlier is defined as a value that lies more than 1.5 times the interquartile range below Q1 or above Q3.

Outlier > Q3 + 1.5(Q3 - Q1)

Outlier < Q1 - 1.5(Q3 - Q1)
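The 1.5 × IQR rule can be sketched as below. The helper names `iqr_fences` and `find_outliers` are assumptions for this illustration. The quartiles 14 and 34.5 are the values found for the 20-observation array in the Measures of Location section, and the extra value 70 is a hypothetical observation added only to show a point being flagged.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Values lying strictly below the lower fence or above the upper fence."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


array = [5, 7, 8, 10, 13, 15, 16, 18, 19, 20,
         25, 27, 29, 31, 33, 36, 37, 38, 40, 45]

fences = iqr_fences(14, 34.5)
print(fences)                                   # (-16.75, 65.25)
print(find_outliers(array, 14, 34.5))           # []
print(find_outliers(array + [70], 14, 34.5))    # [70]  (70 is hypothetical)
```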

Dispersion

The word "dispersion" implies the average distance or deviation from a central value.

Let us start with two examples:

Data set 1: 3, 4, 5; mean = 4

Data set 2: 0, 2, 10; mean = 4

Both sets have the same mean, but set 2 shows much more variation.

The absolute measures are
1) The range
2) The standard deviation
3) The mean deviation
4) The quartile deviation

The relative measures are
1) The coefficient of variation
2) The coefficient of mean deviation
3) The coefficient of quartile deviation
4) The coefficient of range

*** Absolute measures of variation are expressed in the same statistical unit in which the original data are given, such as rupees, kilograms, tonnes, etc. They are used to compare distributions only when the distributions have the same unit.

*** Relative measures of variation: A measure of relative variation is the ratio of a measure
of absolute variation to an average. It is sometimes called a coefficient of variation, because
“Coefficient” means a pure number that is independent of the unit of measurement.

Range: The range is the difference between the largest and the smallest value.

Symbolically, Range = L - S, where L = largest value and S = smallest value.

• The range is widely used in statistical process control applications.

A serious defect of the range is that it is based on only two values, the largest and the smallest; it does not take all of the values into consideration.

Limitations:
1) The range is not based on each and every observation.
2) The size of the range is affected by extreme values.
3) It cannot be calculated from a frequency distribution with open-end classes.

Mean deviation: The mean deviation is the average of the absolute deviations of the individual observations from the central value of the series.

$$MD = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \dots + |x_n - \bar{x}|}{n} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

Ex. Find the mean deviation of 4, 8, 12, 24.

Average $\bar{x}$ = (4 + 8 + 12 + 24)/4 = 12

$$MD = \frac{|4 - 12| + |8 - 12| + |12 - 12| + |24 - 12|}{4} = 6$$

Limitations:
1) It ignores the sign in measuring deviations, which is bad from the mathematical point of view. Hence, it is not good for further algebraic use of the measure.
2) The amount of mean deviation increases with the increase in size of sample.
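The mean deviation calculation can be sketched as follows. The function name `mean_deviation` and its optional `center` argument are assumptions for this illustration; by default the deviations are taken from the arithmetic mean, as in the worked example.

```python
def mean_deviation(data, center=None):
    """Average absolute deviation of the observations from a central value."""
    if not data:
        raise ValueError("data must not be empty")
    if center is None:
        center = sum(data) / len(data)   # default: deviations from the mean
    return sum(abs(x - center) for x in data) / len(data)


md = mean_deviation([4, 8, 12, 24])
print(md)  # 6.0
```

Passing a median or mode as `center` gives the mean deviation about that value instead.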

Standard Deviation: (http://www.mathsisfun.com/data/standard-deviation.html)

It is a measure of how much "spread" or "variability" is present in the sample.

[NB. If all the numbers in the sample are very close to each other, the standard deviation is close to zero.]

The standard deviation is denoted by σ (population) or s (sample) and is defined as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \quad \leftarrow \text{for a population}$$

where $x_i$ = the individual values,
$\mu$ = the population mean,
n = the number of items.

Variance: $\sigma^2 = (\text{standard deviation})^2$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \leftarrow \text{for a sample}$$

For frequencies (with $N = \sum f_i$ the total frequency):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \mu)^2}{N}} \quad \leftarrow \text{for a population}$$

where $x_i$ = the mid-value of each class interval (for grouped data).

$$s = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N - 1}} \quad \leftarrow \text{for a sample}$$
Advantages:
1) It is based on all the observations.
2) As the signs of the deviations are not ignored, it is suitable for further use in statistical analysis.
Limitations:
As it is not free of units, it cannot be used to compare the dispersion of two or more distributions.

Ex. Find the standard deviation from population data on the weekly wages of ten workers working in a factory.
Wages (Tk): 320, 310, 315, 322, 326, 340, 325, 321, 320, 331

$$\mu = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{3230}{10} = 323$$

$$\therefore \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} = \sqrt{\frac{622}{10}} = 7.89$$
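The wage example can be reproduced with Python's standard `statistics` module, which provides both divisors: `pstdev` divides by n (population) and `stdev` divides by n - 1 (sample). The sample figure is shown only for contrast and is not part of the lecture's example.

```python
import statistics

wages = [320, 310, 315, 322, 326, 340, 325, 321, 320, 331]

mu = statistics.mean(wages)
sum_sq = sum((x - mu) ** 2 for x in wages)   # sum of squared deviations

sigma = statistics.pstdev(wages)  # population SD: sqrt(sum_sq / n)
s = statistics.stdev(wages)       # sample SD: sqrt(sum_sq / (n - 1))

print(mu, sum_sq)        # 323 622
print(round(sigma, 2))   # 7.89
print(round(s, 2))       # 8.31
```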

Quartile deviation:

The interquartile range is frequently reduced to the semi-interquartile range, also known as the quartile deviation, by dividing it by 2. Thus

$$QD = \frac{IQR}{2} = \frac{Q_3 - Q_1}{2}$$

This measure is more meaningful than the range because it is not based on the two extreme values.

Limitation of Quartile Deviation as a Measure of Dispersion:

The quartile deviation has serious shortcomings. First of all, it does not take into consideration the values of all items. For example, QD is not affected by the distribution of the items below $Q_1$ and above $Q_3$. Moreover, it remains a positional measure, failing to provide a measurement of the scatter of the observations relative to the typical value. In addition, it does not enter into any of the higher mathematical relationships that are basic to inferential statistics.

Relative measures of dispersion:

Coefficient of range:
The coefficient of range is a relative measure corresponding to the range and is obtained by the following formula:

$$CR = \frac{L - S}{L + S} \times 100$$

where L and S are respectively the largest and smallest observations in the data set. The coefficient of range is rarely used as a measure of dispersion because of its inherent difficulties in interpretation.

Coefficient of Mean Deviation:

Another relative measure is the coefficient of mean deviation. As the mean deviation can be computed from the mean, median, mode or from any arbitrary value, a general formula for computing the coefficient of mean deviation may be put as follows:

$$C_{MD(a)} = \frac{MD(a)}{a} \times 100$$

where "a" may be the mean, median, mode or any arbitrary value.

Coefficient of Quartile Deviation:

The coefficient of quartile deviation is computed from the first and third quartiles using the following formula:

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$

It is worth mentioning that most of the relative measures, except CV, are of little significance because of their limited practical utility.

Co-efficient of variation (CV):

The co-efficient of variation is computed as the ratio of the standard deviation of the distribution to the mean of the same distribution.

$$CV = \frac{S}{\bar{x}} \times 100$$

Uses of co-efficient of variation: CV is helpful in comparing the relative variation in several data sets that have different means and different standard deviations.

Ex.: Find the CV for the data given below:

For 150 children

| | Height | Weight |
|---|---|---|
| Mean, $\bar{x}$ | 40 inch | 10 kg |
| SD, S | 5 inch | 2 kg |

For height, $CV = \frac{S}{\bar{x}} = \frac{5}{40} = 0.125$, i.e. 12.5%

For weight, $CV = \frac{S}{\bar{x}} = \frac{2}{10} = 0.2$, i.e. 20%

So we can say that weight variability is greater than height variability.

Ex. For blood pressure:

| | Systolic | Diastolic |
|---|---|---|
| Mean, $\bar{x}$ | 130 mm Hg | 60 mm Hg |
| SD, S | 15 mm Hg | 8 mm Hg |

For systolic, $CV = \frac{S}{\bar{x}} = \frac{15}{130} = 0.115$, i.e. 11.5%

For diastolic, $CV = \frac{S}{\bar{x}} = \frac{8}{60} = 0.133$, i.e. 13.3%

Example: Suppose that we wish to obtain some insight into whether height is more variable than weight in the same population. For this purpose, we have the following data obtained from 150 children in a community.

| | Height | Weight |
|---|---|---|
| Mean | 40 inch | 10 kg |
| SD | 5 inch | 2 kg |
| CV | 12.5% | 20.0% |

Examination of the respective standard deviations does not tell us in any meaningful way which characteristic has more variability than the other, because they are measured in different units. If we now compute the coefficient of variation, the results become comparable. Because the coefficient of variation for weight is greater than that for height, we conclude that weight has more variability than height in the population.
Even if two variables in the same population are measured in the same unit, the standard deviation may fail to provide a correct picture of their relative variability. This is illustrated by the example below.

Example: Consider that the blood pressures of a group of patients were measured at two levels, systolic and diastolic, both being measured in the same unit. The results were as follows:

| | Systolic | Diastolic |
|---|---|---|
| Mean | 130 mm Hg | 60 mm Hg |
| SD | 15 mm Hg | 8 mm Hg |
| CV | 11.5% | 13.3% |

As implied by the standard deviation, systolic pressure is more variable than diastolic pressure. However, in relative terms, as measured by the CV, the diastolic pressure has the greater variability. This shows that relative variability is of more concern than absolute variation, hence the importance of the coefficient of variation.

The discussions and examples above tend to demonstrate that the coefficient of variation is a very useful measure when:
1) The data are in different units.
2) The data are in the same units but the means are far apart.
3) The data sets involve all or nearly all positive values.
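The two comparisons above can be reproduced with a short helper. The function name `coefficient_of_variation` is an assumption for this sketch; it simply applies CV = (S / mean) × 100 to the summary figures from the tables.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv_height = coefficient_of_variation(5, 40)       # inches
cv_weight = coefficient_of_variation(2, 10)       # kg
cv_systolic = coefficient_of_variation(15, 130)   # mm Hg
cv_diastolic = coefficient_of_variation(8, 60)    # mm Hg

print(round(cv_height, 1), round(cv_weight, 1))       # 12.5 20.0
print(round(cv_systolic, 1), round(cv_diastolic, 1))  # 11.5 13.3
```

Because the units cancel in the ratio, height and weight can be compared directly even though one is in inches and the other in kilograms.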

Properties of the Standard Deviation

In terms of measuring the variability or spread of data, we have seen that the standard deviation is the preferred and most used measure.

Some additional things to note about the standard deviation:

1. The standard deviation is the typical or average distance of a value from the mean.
2. If all values are the same, then the standard deviation is 0.
3. The standard deviation is heavily influenced by outliers, just like the mean (it uses the mean in its calculation).
4. The sample standard deviation is denoted by the letter s and the population standard deviation is denoted by the lower-case Greek letter sigma, σ.

If your data are more spread out (have more variability), then you will have a higher standard deviation. It is often difficult to interpret a standard deviation on its own, since its size depends on the data from which it is computed. Is a standard deviation of 12 high, or is .20 high?

Coefficient of Variation (CV)

If you know nothing about the data other than the mean, one way to interpret the relative magnitude of the standard deviation is to divide it by the mean. This is called the coefficient of variation. For example, if the mean is 80 and the standard deviation is 12, the CV = 12/80 = .15 or 15%.
If the standard deviation is .20 and the mean is .50, then the CV = .20/.50 = .4 or 40%. So, knowing nothing else about the data, the CV helps us see that even a lower standard deviation does not mean less variable data.
