
03 - Measures - of - Center - Variation

The document discusses measures of central tendency, including arithmetic mean, median, mode, geometric mean, and harmonic mean, explaining their definitions, properties, and calculations. It also covers combined means, quartiles, deciles, and percentiles, providing examples and exercises for better understanding. Additionally, it includes methods for calculating these measures using data sets and interpreting results.

Uploaded by

aatikasiddiqa
Copyright
© © All Rights Reserved

Measures of Central Tendency

• A measure of location summarizes a data set with a single quantitative value, lying within the range of the data values, that describes the location of the data set as a whole.
• Some Common Measures are:
• Arithmetic Mean / Average
• Median
• Mode
• Geometric Mean
• Harmonic Mean
Arithmetic mean / mean
• Most common measure of the center
• Obtained by dividing the sum of all the observations by the total number of observations

Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Properties of Arithmetic mean
1. The mean of a constant is equal to that constant.
2. The sum of the deviations of the observations from their mean is equal to zero, i.e.,
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
3. The sum of squared deviations of the observations from their mean is a minimum, i.e.,
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - a)^2$$
where a is any constant other than $\bar{X}$.

Properties of Arithmetic mean

X        (X - 68.5)   (X - 68.5)^2   (X - 70)   (X - 70)^2
65         -3.5          12.25          -5          25
71          2.5           6.25           1           1
67         -1.5           2.25          -3           9
75          6.5          42.25           5          25
63         -5.5          30.25          -7          49
69          0.5           0.25          -1           1
75          6.5          42.25           5          25
63         -5.5          30.25          -7          49
Total 548   0           166            -12         184

$$\bar{X} = \frac{\sum X}{n} = \frac{548}{8} = 68.5$$
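Properties 2 and 3 can be checked numerically on the eight observations in the table. The following is a minimal Python sketch, not part of the original slides; the helper name `sum_sq_dev` is an illustrative assumption.

```python
# Data from the table above: eight observations with mean 68.5.
x = [65, 71, 67, 75, 63, 69, 75, 63]
n = len(x)
mean = sum(x) / n


def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from the constant a."""
    return sum((v - a) ** 2 for v in values)


# Property 2: the deviations from the mean add up to zero.
sum_dev = sum(xi - mean for xi in x)

# Property 3: squared deviations are smallest when taken about the mean.
ss_about_mean = sum_sq_dev(x, mean)  # column total 166 in the table
ss_about_70 = sum_sq_dev(x, 70)      # column total 184 in the table

print(mean, sum_dev, ss_about_mean, ss_about_70)
```

Any other choice of the constant in place of 70 should likewise give a sum of squares larger than 166.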
Properties of Arithmetic mean

4. If X1, X2, …, Xn have mean $\bar{X}$, then the mean after multiplying each observation by a constant a is the original mean multiplied by that constant:
$$\bar{X}^{*} = \frac{\sum_{i=1}^{n} a X_i}{n} = a \times \bar{X}$$
5. If a constant a is added to each of the observations X1, X2, …, Xn having mean $\bar{X}$, then the mean increases by that constant:
$$\bar{X}^{*} = \frac{\sum_{i=1}^{n} (a + X_i)}{n} = \bar{X} + a$$
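As a quick illustration of properties 4 and 5, the sketch below rescales and shifts the same eight observations used earlier (mean 68.5). The constant a = 2 is a hypothetical value chosen only for the demonstration.

```python
x = [65, 71, 67, 75, 63, 69, 75, 63]
a = 2  # hypothetical constant, chosen only for illustration


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


scaled_mean = mean([a * xi for xi in x])   # property 4: equals a * mean(x)
shifted_mean = mean([a + xi for xi in x])  # property 5: equals mean(x) + a

print(scaled_mean, a * mean(x))
print(shifted_mean, mean(x) + a)
```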
Combined Mean

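• For k subgroups of data consisting of n1, n2, …, nk observations (with $\sum_{i=1}^{k} n_i = n$) and having respective means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, the combined mean (the mean of all n observations, i.e. the weighted mean of the k subgroup means) is given by:

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{n}$$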
Combined Mean
Example: The mean heights and the numbers of students in three sections of a statistics class are given below. Calculate the overall (combined) mean height of the students.

Section   Number of students   Mean height (inches)
A         40                   62
B         37                   58
C         43                   61

Solution:
We have n1 = 40, n2 = 37, n3 = 43 and $\bar{x}_1 = 62$, $\bar{x}_2 = 58$, $\bar{x}_3 = 61$. So the combined mean is:
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{40(62) + 37(58) + 43(61)}{120} = \frac{7249}{120} \approx 60.4$$
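The combined mean is a weighted mean of the subgroup means, which can be sketched in a few lines of Python. The function name `combined_mean` is an assumption made for this illustration.

```python
def combined_mean(sizes, means):
    """Weighted mean of subgroup means, weighted by subgroup size."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    total = sum(n_i * m_i for n_i, m_i in zip(sizes, means))
    return total / sum(sizes)


# Sections A, B and C from the example above.
heights = combined_mean([40, 37, 43], [62, 58, 61])
print(round(heights, 1))
```

Note that simply averaging the three section means would ignore the different section sizes; the weights are what make the result equal to the mean of all 120 students.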
Tasks
1. The mean weight of 10 students is 50 kg. When two students leave the class, the mean weight becomes 48 kg. Find the mean weight of the students who left the class. Answer = 58
2. There are 30 students in a class. On Thursday, 18 students took a math test and their mean mark was 80. The remaining 12 students took the test on Friday and their mean mark was 90. Find the mean mark of the entire class. Answer = 84
3. Ali took five math tests during the semester and the mean of his test scores was 85. If his mean after the first three tests was 83, what was the mean of his 4th and 5th tests? Answer = 88
Geometric mean & harmonic mean
• The Geometric Mean (G.M) of a set of n positive values x1, x2, …, xn is the positive nth root of the product of the values.
$$G.M = \left( \prod_{i=1}^{n} X_i \right)^{1/n}$$
or, equivalently, using logarithms,
$$G.M = \text{Antilog}\left( \frac{\sum_{i=1}^{n} \log X_i}{n} \right)$$
• The Harmonic Mean (H.M) of a set of n values x1, x2, …, xn is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
Example
Find the Geometric Mean and Harmonic Mean of the following data.

X          Log(X)    1/X
3          0.477     0.333
5          0.699     0.200
6          0.778     0.167
6          0.778     0.167
7          0.845     0.143
10         1.000     0.100
12         1.079     0.083
Total 49   5.6567    1.1929

$$G.M = \text{Antilog}\left( \frac{5.6567}{7} \right) = 6.43$$
$$H.M = \frac{7}{1.1929} = 5.87$$
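The two calculations in this example can be reproduced with a short Python sketch. It follows the log-based formula for the geometric mean and, as a cross-check, calls `statistics.geometric_mean` and `statistics.harmonic_mean` from the standard library (available from Python 3.8).

```python
import math
import statistics

x = [3, 5, 6, 6, 7, 10, 12]
n = len(x)

# Geometric mean via logarithms: antilog of the mean of log10(x).
gm = 10 ** (sum(math.log10(v) for v in x) / n)

# Harmonic mean: n divided by the sum of the reciprocals.
hm = n / sum(1 / v for v in x)

# Cross-check against the standard library implementations.
gm_lib = statistics.geometric_mean(x)
hm_lib = statistics.harmonic_mean(x)

print(round(gm, 2), round(hm, 2))
```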
Mode

• The most frequent value, also called the nominal average
• Can be used for qualitative as well as quantitative data
• May not be unique
• May not exist
• Computation of the mode for ungrouped or raw data:

[Figure: two dot plots. In the plot on the axis from 0 to 14 the mode is 9; in the plot on the axis from 0 to 6 there is no mode.]
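A minimal Python sketch of finding the mode of raw data is given below. The function name `modes` and all three data sets are hypothetical, invented for illustration; the rule that equally frequent values mean "no mode" is one common convention, assumed here.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # When several distinct values all occur equally often, none stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


print(modes([2, 5, 9, 9, 9, 11, 14]))      # a single mode
print(modes([0, 1, 2, 3, 4, 5, 6]))        # no mode
print(modes(["A", "B", "B", "O", "O"]))    # qualitative data, two modes
```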
Median

• A numerical measure that gives the position of a data value relative to the entire data set.
• Divides the observations into two equal parts after arranging the values in ascending order of magnitude
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.

$$\text{Median} = \text{Size of } \left( \frac{n+1}{2} \right)\text{th observation}$$
Quartiles
▪ Divide an array into four equal parts, each part containing 25% of the data values; denoted by Qj
▪ 25% of the observations are below the 1st quartile.
▪ The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile (also the median); and the 3rd quartile is the 75th percentile.

$$Q_j = \text{Size of } j\left( \frac{n+1}{4} \right)\text{th observation}$$

Where j = 1, 2, 3
Deciles
▪ Divide an array into ten equal parts, each part containing ten percent of the data values; denoted by Dj
▪ 10 percent of the total observations fall below D1 and the remaining 90% are above it.
▪ The 5th decile is equal to Q2 and the median.

$$D_j = \text{Size of } j\left( \frac{n+1}{10} \right)\text{th observation}$$

Where j = 1, 2, 3, …, 9
Percentiles
▪ Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
▪ The jth percentile, denoted as Pj, is the data value that separates the bottom j% of the data from the top (100 - j)%.

$$P_j = \text{Size of } j\left( \frac{n+1}{100} \right)\text{th observation}$$

Where j = 1, 2, 3, …, 99
Example
▪ Suppose Ali was told that, relative to the other scores on an NTS test, his score was at the 95th percentile, i.e., his percentile score is 95. How do we interpret it?

➔ This means that 95% of those who took the test had scores
less than or equal to Ali’s score, while 5% had scores higher than
Ali’s.
Exercise

• Find the Median, Q1, Q2 and Q3 of the following marks obtained by 20 students. Also show that Median = Q2, and interpret the results.

53 74 82 42 39 28 20 18 68 58 54 93 70 30 61 55 36 37 29 94

• First of all, arrange the data in ascending order of magnitude:

Sr. No.   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Marks    18 20 28 29 30 36 37 39 42 53 54 55 58 61 68 70 74 82 93 94
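Median & Quartiles
• Median = Size of ((n+1)/2)th observation
= Size of 10.5th observation
= 10th observation + 0.5 (11th observation - 10th observation)
= 53 + 0.5 (54 - 53)
= 53.5

• Q1 = Size of 1((n+1)/4)th observation
= Size of 5.25th observation
= 5th observation + 0.25 (6th observation - 5th observation)
= 30 + 0.25 (36 - 30)
= 31.5

• Q2 = Size of 2((n+1)/4)th observation
= Size of 10.5th observation
= 53.5 = Median

• Q3 = Size of 3((n+1)/4)th observation
= Size of 15.75th observation
= 15th observation + 0.75 (16th observation - 15th observation)
= 68 + 0.75 (70 - 68)
= 69.5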
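The positional rule used above (find the j(n+1)/parts-th position, then interpolate between the two neighbouring observations) can be sketched in Python as follows. The function names `position_value` and `quantile` are assumptions made for this illustration; clamping the position to the ends of the data is also an assumption, since the slides do not cover positions outside 1..n.

```python
def position_value(sorted_values, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    n = len(sorted_values)
    position = min(max(position, 1), n)  # keep the position inside the data
    k = int(position)                    # whole part of the position
    frac = position - k                  # fractional part of the position
    if k == n:
        return sorted_values[-1]
    lower, upper = sorted_values[k - 1], sorted_values[k]
    return lower + frac * (upper - lower)


def quantile(values, j, parts):
    """j-th quantile when the ordered data are cut into `parts` equal parts."""
    ordered = sorted(values)
    return position_value(ordered, j * (len(ordered) + 1) / parts)


marks = [53, 74, 82, 42, 39, 28, 20, 18, 68, 58,
         54, 93, 70, 30, 61, 55, 36, 37, 29, 94]

q1, q2, q3 = (quantile(marks, j, 4) for j in (1, 2, 3))
median = quantile(marks, 1, 2)
print(q1, q2, q3, median)
```

The same function gives deciles with `parts=10` and percentiles with `parts=100`; for example, `quantile(marks, 5, 10)` is the 5th decile.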
Measures of Variation
Variation / Dispersion / Spread
• Although the arithmetic mean is a concise way of presenting statistical data, it is inadequate for several reasons; for example, it gives no indication of its reliability.
• Two data sets may have the same average and yet be quite different with respect to the variation among the values within each data set.

Data 1   Data 2
49       0
50       50
51       100

Comparing means, both data sets look the same (each has mean 50), but they are quite different in terms of variability among the values within each data set.
Measures of Variation/ Dispersion
• In Statistics, dispersion (also called variability, scatter, or spread) denotes how stretched or squeezed a distribution is.
• Variability is the extent to which the data points in a statistical distribution or data set diverge from the average (mean) value, as well as the extent to which these data points differ from each other.
Common measures of Variation

• Variance
• Standard Deviation
• Range
• Inter Quartile Range
• Semi Inter Quartile Range
• Mean Deviation
Variance

• Variance is a measure of the spread between observations in a dataset.
• The variance measures the average squared distance of the observations from their mean.

Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Standard Deviation

• It is the positive square root of the variance

Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
Example 1
• Consider the following data on the height (cm) of 5 plants.
2, 4, 6, 8, 10
• Find the average, variance and standard deviation of the heights.

X          (X - X̄)   (X - X̄)^2
2            -4         16
4            -2          4
6             0          0
8             2          4
10            4         16
Total 30      0         40

$$\bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6$$
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{40}{5-1} = 10$$
$$S = \sqrt{10} = 3.16$$
Example 2
• Consider the following data on the yield of wheat (in kg) from 8 experimental plots.
65, 71, 67, 75, 63, 69, 75, 63
• Find the average, variance and standard deviation of the yield.

X           (X - 68.5)   (X - 68.5)^2
65            -3.5          12.25
71             2.5           6.25
67            -1.5           2.25
75             6.5          42.25
63            -5.5          30.25
69             0.5           0.25
75             6.5          42.25
63            -5.5          30.25
Total 548      0           166

$$\bar{X} = \frac{\sum X}{n} = \frac{548}{8} = 68.5$$
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{166}{8-1} = 23.71$$
$$S = \sqrt{23.71} = 4.87$$
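Both worked examples can be reproduced with a small Python sketch of the sample variance formula. The helper name `sample_variance` is an assumption; `statistics.variance` from the standard library also divides by n - 1 and is used only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


heights = [2, 4, 6, 8, 10]                  # Example 1
yields = [65, 71, 67, 75, 63, 69, 75, 63]   # Example 2

for data in (heights, yields):
    s2 = sample_variance(data)
    s = math.sqrt(s2)  # standard deviation is the positive square root
    print(round(s2, 2), round(s, 2))
    # The standard library uses the same n - 1 divisor.
    assert math.isclose(s2, statistics.variance(data))
```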
The Range & Coefficient of Range

• The Range R is defined as the difference between the largest and the smallest observations in a dataset, i.e.,

$$R = X_{max} - X_{min}$$

• The Coefficient of Dispersion, or Coefficient of Range, is defined as

$$\text{Coeff. of Range} = \frac{X_{max} - X_{min}}{X_{max} + X_{min}}$$
Example

The marks obtained by 9 students are given below:

45, 32, 37, 46, 39, 36, 41, 48, 36
Find the range and the Coefficient of Range.
The maximum observation is 48 and the minimum is 32, therefore
Range = 48 - 32 = 16 marks
Coefficient of Range = (48 - 32) / (48 + 32) = 16 / 80 = 0.2
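The same calculation as a minimal Python sketch, using the nine marks from the example (the variable names are illustrative):

```python
marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]

x_max, x_min = max(marks), min(marks)

# Range: difference between the largest and smallest observations.
data_range = x_max - x_min

# Coefficient of range: a unit-free, relative version of the range.
coeff_range = (x_max - x_min) / (x_max + x_min)

print(data_range, coeff_range)
```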
Semi Inter Quartile Range / Quartile Deviation
• The Inter Quartile Range (IQR) is a measure of dispersion, defined as
$$IQR = Q_3 - Q_1$$
• The Semi Inter Quartile Range, or Quartile Deviation (QD), is defined as
$$QD = \frac{Q_3 - Q_1}{2}$$
• The Coefficient of Quartile Deviation is defined as
$$\text{Coeff. of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
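As an illustration, the three quantities can be computed from the quartiles reported for the marks example in this document (Q1 = 36.25, Q3 = 73). This is a sketch only; the variable names are assumptions.

```python
# Quartiles taken from the marks example in this document.
q1, q3 = 36.25, 73

iqr = q3 - q1                    # inter quartile range
qd = (q3 - q1) / 2               # semi inter quartile range (quartile deviation)
coeff_qd = (q3 - q1) / (q3 + q1) # unit-free coefficient of quartile deviation

print(iqr, qd, round(coeff_qd, 3))
```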
Disadvantages of the Range
• Ignores the way in which data are distributed

[Figure: two dot plots on the same axis from 7 to 12, with the values distributed differently; both have Range = 12 - 7 = 5]

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
The Mean Deviation OR Average Deviation

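• The mean deviation (M.D.) of a set of data is defined as the arithmetic mean of the absolute deviations measured either from the mean or from the median:
$$M.D = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \quad \text{OR} \quad M.D = \frac{\sum_{i=1}^{n} |X_i - \text{median}|}{n}$$
• The Coefficient of Mean Deviation is given as
$$\text{Coeff. of M.D} = \frac{M.D}{\text{Mean}} \quad \text{OR} \quad \frac{M.D}{\text{Median}}$$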
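A short Python sketch of the mean deviation, applied to the plant heights 2, 4, 6, 8, 10 used earlier in this document. The function name `mean_deviation` is an assumption; the absolute value is what keeps the deviations from cancelling out.

```python
import statistics


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(v - center) for v in values) / len(values)


heights = [2, 4, 6, 8, 10]
mean = statistics.mean(heights)
median = statistics.median(heights)

md_mean = mean_deviation(heights, mean)      # deviations from the mean
md_median = mean_deviation(heights, median)  # deviations from the median
coeff_md = md_mean / mean                    # coefficient of mean deviation

print(md_mean, md_median, round(coeff_md, 2))
```

Here the mean and median coincide (both are 6), so the two versions of M.D agree; for skewed data they generally differ.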
Co-efficient of Variation (CV)

• The coefficient of variation is a measure of spread that describes the amount of variability relative to the mean. Because the coefficient of variation is unitless, it can be used instead of the standard deviation to compare the spread of data sets that have different units or different means.
$$\text{Coeff. of Variation (CV)} = \frac{S}{\bar{X}} \times 100$$
Example
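The following data represent the prices (in Rs.) of a certain commodity:
8, 13, 18, 23, 30
Sol:
X̄ = 18.4 Rs.
Sx = 8.56 Rs.
C.V = (8.56 / 18.4) × 100 = 46.5

The following data represent the life of a car battery (in hours):
130, 150, 180, 250, 345
Sol:
Ȳ = 211 Hrs.
Sy = 87.63 Hrs.
C.V = (87.63 / 211) × 100 = 41.5

Relative to their means, the prices (C.V = 46.5) are more variable than the battery lives (C.V = 41.5), even though the battery data have the larger standard deviation.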
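The two coefficients of variation can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is an assumption; `statistics.stdev` computes the sample standard deviation with the n - 1 divisor.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


prices = [8, 13, 18, 23, 30]             # in Rs.
battery_life = [130, 150, 180, 250, 345] # in hours

cv_prices = coefficient_of_variation(prices)
cv_battery = coefficient_of_variation(battery_life)

print(round(cv_prices, 1), round(cv_battery, 1))
```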
Example of CV
The following data represent the length (in inches) and weight (in kg) of a sample of 10 fish of the same species after using a particular type of fish feed.

Fish              1    2    3    4    5    6    7    8    9    10
Weight (kg)       1.8  1.9  2.1  2.4  2.5  2.6  2.7  2.8  3.1  3.2
Length (inches)   11   12   12   13   15   15   16   17   18   18

Which characteristic, weight or length, has relatively more variation?

          Standard deviation (S)   Mean           CV
Weight    0.472 kg                 2.51 kg        18.82
Length    2.584 inches             14.70 inches   17.58

Since the CV of weight (18.82) is larger than the CV of length (17.58), weight has relatively more variation.
Types of Distribution
Measures of Skewness

• Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. The Coefficient of Skewness is given as
$$S_k = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{n S^3}$$
or, in terms of the quartiles,
$$S_k = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$
• If Sk = 0 the distribution is symmetrical
• If Sk > 0 the distribution is positively skewed
• If Sk < 0 the distribution is negatively skewed
Measures of Kurtosis
• Describes the extent of peakedness or flatness of the distribution of the data.
• Measured by the Coefficient of Kurtosis (K), computed as

$$K = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{n S^4} - 3$$
Interpretation

• K = 0: mesokurtic
• K > 0: leptokurtic
• K < 0: platykurtic
Example
Consider the following data:
25, 27, 36, 31, 33, 35, 37
Find the Mean, Variance, Coefficient of Skewness and Coefficient of Kurtosis, and interpret the results.

Mean                 32
Standard Error       1.73
Median               33
Standard Deviation   4.58
Sample Variance      21
Kurtosis             -1.65
Skewness             -0.39
Range                12
Minimum              25
Maximum              37
Sum                  224
Count                7
How to do it…

X           (X - X̄)   (X - X̄)^2   (X - X̄)^3   (X - X̄)^4
25            -7         49         -343        2401
27            -5         25         -125         625
36             4         16           64         256
31            -1          1           -1           1
33             1          1            1           1
35             3          9           27          81
37             5         25          125         625
Total 224      0        126         -252        3990
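The column totals above can be plugged into the moment-based formulas for Sk and K given earlier. The Python sketch below does exactly that, taking S as the sample standard deviation (n - 1 divisor), which is an assumption about how the formulas are meant to be applied. Software summaries often use small-sample adjusted estimators, so values reported by a package need not coincide with the output of these plain formulas.

```python
import math

x = [25, 27, 36, 31, 33, 35, 37]
n = len(x)
mean = sum(x) / n
dev = [v - mean for v in x]

# Column totals of the table: squared, cubed and fourth-power deviations.
sum2 = sum(d ** 2 for d in dev)
sum3 = sum(d ** 3 for d in dev)
sum4 = sum(d ** 4 for d in dev)

s = math.sqrt(sum2 / (n - 1))  # sample standard deviation (assumed for S)

skewness = sum3 / (n * s ** 3)      # Sk = sum of cubes / (n * S^3)
kurtosis = sum4 / (n * s ** 4) - 3  # K = sum of 4th powers / (n * S^4) - 3

print(sum2, sum3, sum4)
print(round(skewness, 3), round(kurtosis, 3))
```

Both coefficients come out negative, which by the rules above indicates a negatively skewed, platykurtic data set.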
Five Number Summary

• For a set of data, the five number summary consists of the minimum, first quartile, median, third quartile, and maximum:
Minimum, Q1, Median, Q3, Maximum
Boxplot / Box & Whisker plot

1. The line within the box (the median) indicates the average size of the data
2. The length of the graph / box indicates the variation in the data
3. The position of the line within the box indicates the shape of the data
✓ Line at the center of the box indicates the data is symmetrical
✓ Line above the center of the box indicates the data is negatively skewed
✓ Line below the center of the box indicates the data is positively skewed
Example
Consider the following data of marks of 20 students:-

53 74 82 42 39 28 20 81 68 58
54 93 70 30 61 55 36 37 29 94
Construct Boxplot of the data and interpret it.

Minimum = 20
Q1 = 36.25
Median = 54.5
Q3 = 73
Maximum = 94
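The five values above can be reproduced with the standard library. This sketch relies on `statistics.quantiles` with its default `method="exclusive"`, which for these data gives the same results as the j(n+1)/4 positional rule; the `median_position` measure is an illustrative assumption, used only to locate the median line inside the box.

```python
import statistics

marks = [53, 74, 82, 42, 39, 28, 20, 81, 68, 58,
         54, 93, 70, 30, 61, 55, 36, 37, 29, 94]

# Quartiles with the default "exclusive" method (positions based on n + 1).
q1, median, q3 = statistics.quantiles(marks, n=4)
five_number = (min(marks), q1, median, q3, max(marks))
print(five_number)

# Where the median line sits inside the box: 0 = at Q1, 0.5 = center, 1 = at Q3.
median_position = (median - q1) / (q3 - q1)
print(round(median_position, 3))
```

The median sits almost exactly at the center of the box, very slightly below it, which suggests that the middle half of the marks is close to symmetrical.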
