Lecture 5 Notes

Uploaded by

sirajamalif

Introduction to Dispersion

What is Dispersion?
• Measures of average (such as the median and mean) represent the typical
value for a dataset. But within the dataset, the actual values usually differ from
one another and from the average value itself.

• The extent to which the central values are good representatives of the values
in the original dataset depends upon the variability or dispersion in the
original data.

• Dispersion is the spread or scatter of item values from a measure of central
tendency. Dispersion is usually measured as an average of deviations about
some central value.

• Dispersion thus is a type of average and is sometimes called a second-order
average. Datasets are said to have high dispersion or variation when they
contain values considerably higher and lower than the central value.

Example 1

• Let us consider two groups of students with scores in a particular examination
as shown in the table.

Group 1 49 50 50 51
Group 2 0 0 100 100

• The AM for each group is 50.

• It is clear from the data that the first group consists of students of
near-average intelligence, while the second group is made up of very bright
and very dull students.

• It is evident that the distributions of both groups have the same AM.

• But they differ in variation about $\bar{x}$; such variation is usually
measured by a measure of dispersion.
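
The point of Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the helper name `spread` is an assumption made for illustration, and it uses the simplest possible measure of scatter (largest value minus smallest value).

```python
from statistics import mean

# Scores from Example 1
group_1 = [49, 50, 50, 51]
group_2 = [0, 0, 100, 100]

def spread(scores):
    """Largest score minus smallest score: a crude measure of scatter."""
    return max(scores) - min(scores)

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    # Both groups share the same mean, but the scatter around it differs
    print(name, "mean:", mean(scores), "spread:", spread(scores))
```

Both groups report the same mean, while the spread differs by a factor of 50, which is exactly the information an average alone cannot convey.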
Example 2

[Figure: two charts, each titled "Size of Tutorial Groups", one for semester 1 and one for semester 2]

• In the two charts, the number of different sized tutorial groups in semester 1
and semester 2 are presented.

• In both semesters the mean and median tutorial group size is 5 students;
however, the groups in semester 2 show more dispersion (or variability in size)
than those in semester 1.

Characteristics of a good measure of dispersion

Dispersion within a dataset can be measured or described in several ways, including
the range, inter-quartile range, variance, and standard deviation.

The following are the characteristics of an ideal measure of variation or dispersion:

i. It should be easy to understand.

ii. It should be easy to calculate.

iii. It should be based upon all observations.

iv. It should be rigidly defined.

v. It should not be unduly affected by extreme values.

vi. It should be suitable for further algebraic treatment.

vii. It should be less affected by sampling fluctuation.


Purpose of measure of dispersion
Measures of dispersion are important for the following purposes:

i. To determine the reliability of an average.

ii. To compare the variability.

iii. To compare two or more series with regard to their variability.

iv. To facilitate the use of other statistical measures.

v. It is one of the most important quantities used to characterize a frequency
distribution.

Types of measure of dispersion

• Measures of dispersion or variation may be either absolute or relative.

• An absolute measure of variation is expressed in the same statistical unit in
which the original data are given, such as takas, kilograms, tonnes, etc. It
may be used to compare the variation in two distributions, provided the
variables are expressed in the same units and are of the same average size.

• On the other hand, it is often necessary to compare the dispersion in two or
more frequency distributions whose variables are expressed in different
units.

• In such a case, dispersion is calculated by dividing the absolute measure of
dispersion by a measure of central tendency, which generates a pure number
that is independent of the unit of measurement. The resultant numerical
value is a relative measure of dispersion.

Which measures of Dispersion to choose?

Absolute Measure of Dispersion:
When dealing with data, if one's objective is "only to determine" the
variation of a single set of variables/information, one can choose to use an
absolute measure of dispersion.

Relative Measure of Dispersion:
When dealing with data, if one's objective is "to determine and compare" the
variations of multiple sets of variables/information expressed in the same or
different unit(s), one can choose to use a relative measure of dispersion.
Different types of Absolute and Relative measure of dispersion

Absolute Measure of Dispersion          Relative Measure of Dispersion
1. Range                                1. Coefficient of range
2. Quartile deviation                   2. Coefficient of quartile deviation
3. Variance and Standard deviation      3. Coefficient of variation and standard deviation

Range
• The range of a set of data values is the difference between the highest and
the lowest values in the set.

• If $X_l$ and $X_S$ are the largest and the smallest values respectively in a set, then the
range "R" is defined as $R = X_l - X_S$.

• For grouped data the range is taken either as the difference between the lower
boundary of the first class and the upper boundary of the last class or as the
difference between the highest and the lowest mid-values.

Coefficient of Range

The coefficient of dispersion corresponding to the range is called the coefficient of range:

$$\text{Coefficient of Range} = \frac{X_l - X_S}{X_l + X_S}$$

where $X_l$ = largest value and $X_S$ = smallest value.
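
The two definitions above can be sketched in Python as follows. The function names and the `heights_cm` data are hypothetical, chosen only to illustrate the formulas.

```python
def value_range(values):
    """Range R = largest value - smallest value (same unit as the data)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """(largest - smallest) / (largest + smallest): a unit-free ratio."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# Hypothetical data, for illustration only
heights_cm = [150, 160, 170, 180, 190]

print(value_range(heights_cm))                      # 190 - 150
print(round(coefficient_of_range(heights_cm), 4))   # 40 / 340
```

Note that the range carries the unit of the data (cm here), while the coefficient of range is a pure number and can be compared across datasets measured in different units.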


Range

Merits

• Easy to understand and calculate.

• It is based only on extreme observations and no detailed information is required.

• It gives us a quick idea of the variability of a set of data.

Demerits

• It is not based on all observations.

• Range does not give any indication of the character of the distribution within the two
extreme observations.

• Range is subject to fluctuations from sample to sample.

• Cannot be computed in case of open-end classes.

Quartile Deviation

• Quartiles divide the observations into four equal parts, when observations are
arranged in order of magnitude.

• The median, denoted by $Q_2$, is the middle-most observation, and $Q_1$ and $Q_3$ are the
middle-most observations of the lower and upper halves respectively.

• Therefore $Q_2 - Q_1$ and $Q_3 - Q_2$ give us some measure of dispersion.

• The AM of these two measures gives us the quartile deviation, which is denoted
by QD.

$$QD = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$

Coefficient of Quartile Deviation

• The coefficient of dispersion corresponding to the quartile deviation is called the
coefficient of quartile deviation and is defined as

$$\text{Coefficient of } QD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Merits

• It is superior to range as a measure of dispersion.

• It is applicable in open-end classes.

• Easy to understand and compute.

• Not affected by extreme values.

Demerits

• It ignores 50% of items, that is, the first 25% and last 25% of observations.

• Very much affected by sampling fluctuations.

• Not suited for further algebraic treatment.

Test Yourself 1
The following data represent the annual wages (in '000 Tk) of workers in two
factories, X and Y. For the given information:
I. Determine the range and coefficient of range.
II. Determine the quartile deviation and coefficient of quartile deviation.

Table 1: Annual wages of Factory X workers (in ‘000 Tk)

91 70 74 79 86 93
60 71 76 79 87 96
112 72 127 79 87 62
68 72 77 79 90 76
69 73 77 85 48 157

Table 2: Annual wages of Factory Y workers (in ‘000 Tk)

97 78 85 92 97 105
72 79 85 92 97 107
112 79 87 92 97 72
113 80 90 96 68 75
78 82 90 97 100
Hints:
1. Find the lowest and highest value for each table.
2. Calculate the quartiles ($Q_1$, $Q_2$, $Q_3$) for each table.

I. Determine range and coefficient of range (in '000 Tk).

For table 1:
$R = 157 - 48 = 109$
Coefficient of Range $= \frac{157 - 48}{157 + 48} = 0.5317$

For table 2:
$R = 113 - 68 = 45$
Coefficient of Range $= \frac{113 - 68}{113 + 68} = 0.2486$

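II. Determine the quartile deviation and coefficient of quartile deviation.

For table 1 (n = 30, values arranged in ascending order):
First Quartile $Q_1$:
$\frac{in}{4} = \frac{1 \times 30}{4} = 7.5$, not an integer, so $Q_1$ is the 8th observation.
$Q_1 = 72$

Second Quartile $Q_2$:
$\frac{in}{4} = \frac{2 \times 30}{4} = 15$, an integer, so $Q_2$ is the mean of the 15th and 16th observations.
$Q_2 = \frac{1}{2}[77 + 79] = 78$

Third Quartile $Q_3$:
$\frac{in}{4} = \frac{3 \times 30}{4} = 22.5$, not an integer, so $Q_3$ is the 23rd observation.
$Q_3 = 87$

$QD = \frac{Q_3 - Q_1}{2} = \frac{87 - 72}{2} = 7.5$

Coefficient of $QD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{87 - 72}{87 + 72} = 0.09434$

For table 2 (n = 29, values arranged in ascending order):
First Quartile $Q_1$:
$\frac{in}{4} = \frac{1 \times 29}{4} = 7.25$, not an integer, so $Q_1$ is the 8th observation.
$Q_1 = 79$

Second Quartile $Q_2$:
$\frac{in}{4} = \frac{2 \times 29}{4} = 14.5$, not an integer, so $Q_2$ is the 15th observation.
$Q_2 = 90$

Third Quartile $Q_3$:
$\frac{in}{4} = \frac{3 \times 29}{4} = 21.75$, not an integer, so $Q_3$ is the 22nd observation.
$Q_3 = 97$

$QD = \frac{Q_3 - Q_1}{2} = \frac{97 - 79}{2} = 9$

Coefficient of $QD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{97 - 79}{97 + 79} = 0.10227$
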
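
The positional rule used in this solution (compute in/4; if it is not an integer, round up to the next position; if it is an integer, average that observation with the next one) can be sketched in Python. The function names are assumptions made for illustration. Note that statistical packages use several different quartile conventions, so library functions may return slightly different values.

```python
import math

def quartile(sorted_values, i):
    """i-th quartile (i = 1, 2, 3) by the positional rule of the worked solution."""
    n = len(sorted_values)
    if (i * n) % 4 == 0:
        # Integer position k: average the k-th and (k+1)-th observations
        k = i * n // 4
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # Non-integer position: round up and take that observation
    k = math.ceil(i * n / 4)
    return sorted_values[k - 1]

def quartile_deviation(values):
    """Return (Q1, Q2, Q3, QD, coefficient of QD) for raw data."""
    data = sorted(values)
    q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
    return q1, q2, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

factory_x = [91, 70, 74, 79, 86, 93, 60, 71, 76, 79, 87, 96, 112, 72, 127,
             79, 87, 62, 68, 72, 77, 79, 90, 76, 69, 73, 77, 85, 48, 157]
factory_y = [97, 78, 85, 92, 97, 105, 72, 79, 85, 92, 97, 107, 112, 79, 87,
             92, 97, 72, 113, 80, 90, 96, 68, 75, 78, 82, 90, 97, 100]

for name, wages in (("X", factory_x), ("Y", factory_y)):
    q1, q2, q3, qd, coeff = quartile_deviation(wages)
    print(name, q1, q2, q3, qd, round(coeff, 5))
```

Running this on the two wage tables gives a way to cross-check the hand calculation above.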
Variance and Standard Deviation

Note: For computational convenience we will use the following formulae.

Population, ungrouped data:

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right]$$

Population, grouped data:

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right]$$

Sample, ungrouped data:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

Sample, grouped data:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$

Variance

• Variance provides an average measure of the squared difference between each
observation and the arithmetic mean.

• In other words, the variance shows, on average, how close the values of a
variable are to the arithmetic mean.
Calculating Variance

• If $x_1, x_2, x_3, \ldots, x_n$ are sample values and $\bar{x}$ is the sample mean, then
the deviation of the value $x_i$ from the sample mean $\bar{x}$ is $(x_i - \bar{x})$ and the
squared deviation is $(x_i - \bar{x})^2$.

• The sum of squared deviations is $\sum_{i=1}^{n} (x_i - \bar{x})^2$. The following graph shows
the squared deviations of the values from their mean.

[Figure: squared deviations of the values from their mean]

Standard Deviation

• The variance is expressed in squared units, and is therefore not an appropriate
measure of dispersion when we wish to express the concept of dispersion in
terms of the original unit.

• The standard deviation is another measure of dispersion. The standard
deviation is the positive square root of the variance and is expressed in the
original unit of the data.

• Standard deviation of variable $X$: $SD(X) = \sqrt{Var(X)}$

Population: Ungrouped Data

• If $X_1, X_2, X_3, \ldots, X_N$ are the N values of a population of size N, then the
population variance, commonly designated as $\sigma^2$, is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \quad \text{where } \mu = \text{population mean}$$

Recommended formula:

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right]$$

• For the same population, the standard deviation (SD) of the population,
commonly designated as $\sigma$, is defined as

$$\sigma = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Test Yourself 2
A population of 10 students got the marks in the examination as given in the
table below. Find the variance and Standard Deviation of the given data.
13 15 14 16 2 8 9 23 28 12
Answer:

$x_i$    $x_i^2$
13       169
15       225
14       196
16       256
2        4
8        64
9        81
23       529
28       784
12       144

$\sum x_i = 140$, $\sum x_i^2 = 2452$

Variance, $\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right] = \frac{1}{10}\left[2452 - \frac{(140)^2}{10}\right] = 49.2$

SD = $\sigma$ = 7.01427
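
As a cross-check, the definitional formula and the recommended computational formula can be compared in Python on the marks above. This is a minimal sketch; the variable names are assumptions, and `statistics.pvariance` / `statistics.pstdev` from the standard library are used as an independent reference for the population (divide-by-N) versions.

```python
import math
from statistics import pvariance, pstdev

marks = [13, 15, 14, 16, 2, 8, 9, 23, 28, 12]
N = len(marks)

# Definitional formula: mean of squared deviations from the population mean
mu = sum(marks) / N
var_definition = sum((x - mu) ** 2 for x in marks) / N

# Recommended formula: (1/N) * [sum(x^2) - (sum x)^2 / N]
var_shortcut = (sum(x * x for x in marks) - sum(marks) ** 2 / N) / N

print(var_definition, var_shortcut, pvariance(marks))
print(round(math.sqrt(var_shortcut), 5), round(pstdev(marks), 5))
```

Both formulas give the same variance; the recommended form only needs the two running totals, which is why it is convenient for hand calculation.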
Population: Grouped Data

• In case of grouped data, if $X_1, X_2, X_3, \ldots, X_k$ are values that occur with
frequencies $f_1, f_2, f_3, \ldots, f_k$ respectively in a population of size N, then
the population variance $\sigma^2$ is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$

Recommended Formula:

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right]$$

• For the same population, the standard deviation (SD) of the population, $\sigma$, is
defined as

$$\sigma = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}}$$

Test Yourself 3
A population of 40 students got marks in the examination as given in the table
below. Find the variance and Standard Deviation of the given data.

$X_i$    15    20    25    30    35
$f_i$    6     8     15    7     4

$x_i$    $f_i$    $f_i x_i^2$    $f_i x_i$
15       6        1350           90
20       8        3200           160
25       15       9375           375
30       7        6300           210
35       4        4900           140

$\sum f_i x_i^2 = 25125$, $\sum f_i x_i = 975$

$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right] = \frac{1}{40}\left[25125 - \frac{(975)^2}{40}\right] = 33.9844$

SD = 5.8296
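
The grouped calculation can be sketched in Python directly from the frequency table. The variable names are assumptions made for illustration; the code simply accumulates the three totals that the recommended formula needs.

```python
values = [15, 20, 25, 30, 35]
freqs = [6, 8, 15, 7, 4]

N = sum(freqs)                                            # population size
sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f_i * x_i^2

# Population variance from the recommended formula (divide by N)
variance = (sum_fx2 - sum_fx ** 2 / N) / N
sd = variance ** 0.5

print(N, sum_fx, sum_fx2, round(variance, 4), round(sd, 4))
```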

Sample: Ungrouped Data

• If $X_1, X_2, X_3, \ldots, X_n$ are the values of a sample of size n, then the sample
variance $s^2$ is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}, \quad \text{where } \bar{x} = \text{sample mean}$$

Recommended Formula:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

For the same sample, the standard deviation (SD) of the sample, $s$, is defined
as

$$s = \sqrt{\text{Variance}} = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}}$$
Test Yourself 4
A sample of 10 students got the marks in the examination as given in the table
below. Find the variance and Standard Deviation of the given data.
13 15 14 16 2 8 9 23 28 12

$x_i$    $x_i^2$
13       169
15       225
14       196
16       256
2        4
8        64
9        81
23       529
28       784
12       144

$\sum x_i = 140$, $\sum x_i^2 = 2452$

$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{9}\left[2452 - \frac{(140)^2}{10}\right] = 54.66667$

SD = 7.39369
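
The same ten marks give a larger variance when treated as a sample than when treated as a population. A short Python sketch makes the relationship explicit; it relies on the standard-library functions `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1), and the variable names are assumptions.

```python
from statistics import pvariance, variance, stdev

marks = [13, 15, 14, 16, 2, 8, 9, 23, 28, 12]
n = len(marks)

sigma2 = pvariance(marks)   # population formula: divide by n
s2 = variance(marks)        # sample formula: divide by n - 1

# The two results differ only by the factor n / (n - 1)
print(round(sigma2, 5), round(s2, 5), round(sigma2 * n / (n - 1), 5))
print(round(stdev(marks), 5))
```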

Sample: Grouped Data

In case of grouped data, if $X_1, X_2, X_3, \ldots, X_k$ are values that occur with
frequencies $f_1, f_2, f_3, \ldots, f_k$ respectively in a sample of size n, then the sample
variance $s^2$ is defined as

$$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{n - 1}$$

Recommended Formula:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$

For the same sample, the standard deviation (SD) of the sample, $s$, is defined
as

$$s = \sqrt{\text{Variance}} = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{n - 1}}$$

Test Yourself 5
A sample of 40 students got marks in the examination as given in the table
below. Find the variance and Standard Deviation of the given data.

$X_i$    15    20    25    30    35
$f_i$    6     8     15    7     4

$x_i$    $f_i$    $f_i x_i^2$    $f_i x_i$
15       6        1350           90
20       8        3200           160
25       15       9375           375
30       7        6300           210
35       4        4900           140

$\sum f_i x_i^2 = 25125$, $\sum f_i x_i = 975$

$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{39}\left[25125 - \frac{(975)^2}{40}\right] = 34.8558$

SD = 5.9039
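
One way to sanity-check a grouped calculation is to expand the frequency table back into individual observations and compare against a library routine. The sketch below does this with `statistics.variance` and `statistics.stdev`, which use the n - 1 divisor; the variable names are assumptions made for illustration.

```python
from statistics import variance, stdev

values = [15, 20, 25, 30, 35]
freqs = [6, 8, 15, 7, 4]

# Expand the frequency table into the 40 individual marks
marks = [x for x, f in zip(values, freqs) for _ in range(f)]
n = len(marks)

# Recommended grouped formula with the n - 1 divisor
sum_fx = sum(f * x for x, f in zip(values, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
s2_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(round(s2_shortcut, 4), round(variance(marks), 4), round(stdev(marks), 4))
```

The grouped formula and the expanded raw data agree, since a frequency table is just a compact way of writing repeated observations.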
Population or Sample?

• As we can see, the formulas for variance and SD differ between population
and sample.

• For a population, the variance/SD is a parameter, whereas for a sample, it is a statistic.

• Unless clearly mentioned otherwise, datasets are samples taken from a larger
population.

• So the formula for populations should be used only:

o If the question directly states that the data is for the whole
population, or

o If the question indicates that the data was taken for all members of a
population (e.g. all students of a class) and then asks for the variance/SD of
that population.

• Otherwise, always use the formula for samples, especially when nothing is
mentioned about sample/population.

Why $n - 1$, not $n$?

• We see that both formulas for a sample's variance/SD (grouped/ungrouped) have
$n - 1$ as the denominator, instead of $n$.

• But for a population, it is simply $N$, the total population size.

• The reason for using $n - 1$ is complex and out of scope, but three basic points
can be mentioned:

o The average of the variances computed from all possible samples should be equal to
the variance of the population. If we divide by $n$, it is not.

o With $n - 1$, the sample's variance/SD is closer to the population's.

o The deviations in a sample are measured from the sample's own mean, so they
must sum to zero: once $n - 1$ of them are known, the last one is determined by
the others. Thus the degrees of freedom of the set are 1 less than its size, i.e. $n - 1$.
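
The first point can be illustrated on a tiny example by listing every possible sample and averaging the resulting variances. The sketch below is an illustration, not a proof: the population `[1, 2, 3]` is hypothetical, samples of size 2 are assumed to be drawn with replacement, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]     # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N

def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
# All ordered samples of size n, drawn with replacement (9 samples here)
samples = list(product(population, repeat=n))

avg_with_n = sum(sample_var(s, n) for s in samples) / len(samples)
avg_with_n_minus_1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)

print(pop_var, avg_with_n, avg_with_n_minus_1)
```

In this example, dividing by $n$ gives sample variances that average to half the population variance, while dividing by $n - 1$ gives an average that matches it exactly.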
Test Yourself 6
An advertising company is looking for a group of extras to shoot a sequence for a
movie. The ages of the first 20 candidates to be interviewed are

50 56 44 49 52 57 56 57 56 59
54 55 61 60 51 59 62 52 54 49

$x_i$    $x_i^2$
50       2500
56       3136
44       1936
49       2401
52       2704
57       3249
56       3136
57       3249
56       3136
59       3481
54       2916
55       3025
61       3721
60       3600
51       2601
59       3481
62       3844
52       2704
54       2916
49       2401

$\sum x_i = 1093$, $\sum x_i^2 = 60137$

$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{19}\left[60137 - \frac{(1093)^2}{20}\right] = 21.29211$

SD = 4.6143

The director suggested that a standard deviation of 3 years would be acceptable.
Since the sample SD of 4.6143 years is larger than 3, this group of extras will not qualify.

Coefficient of Variation (CV)

• The coefficient of variation (CV) is defined as the ratio of the standard
deviation to the mean:

Population CV: $CV = \frac{\sigma}{\mu} \times 100$ (value in %)

Sample CV: $CV = \frac{s}{\bar{x}} \times 100$ (value in %)

• It shows the extent of variability in relation to the mean of the population.

• The coefficient of variation should be computed only for data measured on a
ratio scale.

• For comparison between data sets with different units or widely different
means, one should use the coefficient of variation instead of the standard
deviation.

Comparing Coefficients

• The CV is free of units (unit-less): the SD is divided by the corresponding AM
to make it comparable across different means.

• To compare the variability of two sets of data (i.e. to determine which set is
more variable), we need to calculate the AM and SD of both sets.

• Then we can calculate the CVs for both sets.

• The data set with the larger value of CV has the larger variation, which is
expressed as a percentage.

• For example, the relative variability of data set 1 will be larger than that of data
set 2 if $CV_1 > CV_2$, and vice versa.
Test Yourself 7
In a university, students can take any number of courses per semester. For two
samples of 30 students each, the number of courses taken is given
below:

Course Number    2    3    4     5     6
Sample 1         2    5    10    12    1
Sample 2         1    6    8     13    2

For which sample of students is the relative variability of course numbers
higher?

Sample 1:

$x_i$    $f_i$    $f_i x_i^2$    $f_i x_i$
2        2        8              4
3        5        45             15
4        10       160            40
5        12       300            60
6        1        36             6

$\sum f_i x_i^2 = 549$, $\sum f_i x_i = 125$

$s_1^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{29}\left[549 - \frac{(125)^2}{30}\right] = 0.971264$

$SD_1 = 0.985527$
$AM_1 = 4.1667$
$CV_1 = \frac{SD_1}{AM_1} = 0.2365$ (i.e. 23.65%)
Sample 2:

$x_i$    $f_i$    $f_i x_i^2$    $f_i x_i$
2        1        4              2
3        6        54             18
4        8        128            32
5        13       325            65
6        2        72             12

$\sum f_i x_i^2 = 583$, $\sum f_i x_i = 129$

$s_2^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{29}\left[583 - \frac{(129)^2}{30}\right] = 0.9759$

$SD_2 = 0.9879$
$AM_2 = 4.3$
$CV_2 = \frac{SD_2}{AM_2} = 0.2297$ (i.e. 22.97%)

Here,
CV1 > CV2
So, the relative variability is higher in sample 1.
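
The comparison can be reproduced with a small Python helper that computes the sample CV from a frequency table. The function name `grouped_cv` is an assumption made for illustration; it returns the CV as a ratio (multiply by 100 for a percentage).

```python
def grouped_cv(values, freqs):
    """Sample CV (SD / mean) from a frequency table, using the n - 1 formula."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return s2 ** 0.5 / mean

courses = [2, 3, 4, 5, 6]
cv1 = grouped_cv(courses, [2, 5, 10, 12, 1])   # Sample 1
cv2 = grouped_cv(courses, [1, 6, 8, 13, 2])    # Sample 2

more_variable = "sample 1" if cv1 > cv2 else "sample 2"
print(round(cv1, 4), round(cv2, 4), more_variable)
```

Although sample 2 has the slightly larger SD, its mean is also larger, so its relative variability comes out smaller.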

Combined Standard Deviation

The combined standard deviation of two sets of data containing $n_1$ and $n_2$
observations with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$ respectively
is given by

$$\sigma_{12} = \sqrt{\frac{n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)}{n_1 + n_2}}$$

where $\sigma_{12}$ = combined standard deviation,

$d_1 = \mu_{12} - \mu_1$
$d_2 = \mu_{12} - \mu_2$
$\mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2}$

• This formula for the combined standard deviation of two sets of data can be
extended along the same lines to compute the standard deviation of more than
two sets of data.
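
The formula can be sketched in Python and checked against the standard deviation of the pooled data. The function name `combined_sd` and the two small data sets are hypothetical; the check assumes population standard deviations (divisor N), which is what `statistics.pstdev` computes.

```python
from math import isclose, sqrt
from statistics import mean, pstdev

def combined_sd(n1, mu1, sd1, n2, mu2, sd2):
    """Combined SD of two groups from their sizes, means and population SDs."""
    mu12 = (n1 * mu1 + n2 * mu2) / (n1 + n2)   # combined mean
    d1, d2 = mu12 - mu1, mu12 - mu2            # shifts of each group mean
    return sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2))

# Hypothetical data, for illustration only
set_a = [2, 4, 6]
set_b = [10, 12, 14, 16]

via_formula = combined_sd(len(set_a), mean(set_a), pstdev(set_a),
                          len(set_b), mean(set_b), pstdev(set_b))
direct = pstdev(set_a + set_b)   # SD of all seven values taken together

print(isclose(via_formula, direct))
```

The $d^2$ terms account for how far each group's mean sits from the combined mean, which is why the combined SD can be much larger than either group's own SD when the means differ.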

Test Yourself 8
From the analysis of monthly wages paid to employees in two service
organizations X and Y, the following results were obtained:

                                         Organization X    Organization Y
Number of wage-earners                   550               650
Average monthly wages                    5000              4500
Variance of the distribution of wages    900               1600

a. Which organization pays a larger amount as monthly wages?

Organization X pays: 550 * 5000 = 2750000
Organization Y pays: 650 * 4500 = 2925000
Organization Y pays the larger amount as monthly wages.

b. Determine the combined variance of all the employees taken together.

For organization X:
$\mu_1 = 5000$
Variance, $\sigma_1^2 = 900$
SD, $\sigma_1 = 30$
$n_1 = 550$
For organization Y:
$\mu_2 = 4500$
Variance, $\sigma_2^2 = 1600$
SD, $\sigma_2 = 40$
$n_2 = 650$
Now,
$\mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2} = \frac{550 \times 5000 + 650 \times 4500}{550 + 650} = 4729.16667$

$d_1 = \mu_{12} - \mu_1 = -270.83333$
$d_2 = \mu_{12} - \mu_2 = 229.16667$

Combined variance, $\sigma_{12}^2 = \frac{n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)}{n_1 + n_2} = \frac{550(900 + 73350.6944) + 650(1600 + 52517.3611)}{1200} = 63345.1389$

Test Yourself 9
For a group of 50 male workers, the mean and standard deviation of their
monthly wages are Tk. 6300 and Tk. 600 respectively. For a group of 40 female
workers, these are Tk. 5400 and Tk. 600 respectively. Find the standard deviation
of monthly wages for the combined group of workers.
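For Group 1:
$\mu_1 = 6300$
SD, $\sigma_1 = 600$
$n_1 = 50$
For Group 2:
$\mu_2 = 5400$
SD, $\sigma_2 = 600$
$n_2 = 40$
Now,
$\mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2} = \frac{50 \times 6300 + 40 \times 5400}{50 + 40} = 5900$

$d_1 = \mu_{12} - \mu_1 = -400$
$d_2 = \mu_{12} - \mu_2 = 500$

Combined standard deviation, $\sigma_{12} = \sqrt{\frac{n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)}{n_1 + n_2}} = \sqrt{\frac{50(600^2 + 400^2) + 40(600^2 + 500^2)}{90}} = \sqrt{560000} = 748.3315$
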
Application of Standard Deviation

• In stock charts, standard deviation (SD) is a measure of volatility. Chartists
can use SD to measure expected risk and determine the significance of
certain price movements.

• Bollinger Bands show the upper and lower limits of ‘normal’ price
movements based on the SD of prices.

• In many fields of science and business studies, SD is significant. From the
variance and SD we can understand the fitness of a statistical model when we
deal with the dependencies of data.

Variance

Merits

• Rigidly defined.

• Based upon all observations.

• Easy to understand.

• Less affected by sampling fluctuations.

• Suitable for further algebraic treatment.

Demerits

• Difficult to calculate.

• Affected by extreme values.

• Difficult to calculate for open-end classes.
