Lecture 5 Notes
What is Dispersion?
• Measures of average (such as the median and mean) represent the typical
value for a dataset. But within the dataset, the actual values usually differ from
one another and from the average value itself.
• The extent to which the central values are good representatives of the values in the original dataset depends on the variability, or dispersion, of the original data.
Example 1
Group 1 49 50 50 51
Group 2 0 0 100 100
• It is clear from the data that the first group consists of students of near-average intelligence, while the second group is made up of very bright and very dull students.
• Both distributions have the same arithmetic mean (AM), $\bar{x} = 50$.
• But they differ in how far their values vary from $\bar{x}$; such variation is measured by a measure of dispersion.
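The contrast in Example 1 can be checked numerically. The sketch below is a minimal illustration using Python's standard `statistics` module; the helper name `summarize` is hypothetical. It reports the mean, the range and the population standard deviation of each group.

```python
from statistics import mean, pstdev


def summarize(data):
    """Return (mean, range, population SD) for a list of numbers."""
    return mean(data), max(data) - min(data), pstdev(data)


group_1 = [49, 50, 50, 51]
group_2 = [0, 0, 100, 100]

for name, data in [("Group 1", group_1), ("Group 2", group_2)]:
    avg, rng, sd = summarize(data)
    # Same centre, very different spread
    print(f"{name}: mean = {avg}, range = {rng}, SD = {sd:.4f}")
```

Both groups print a mean of 50, while the ranges (2 and 100) and the standard deviations (about 0.71 and 50) differ sharply.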
Example 2
• In the two charts, the numbers of tutorial groups of different sizes in semester 1 and semester 2 are presented.
• In both semesters the mean and median tutorial group size is 5 students; however, the groups in semester 2 show more dispersion (greater variability in size) than those in semester 1.
Range
• The range of a set of data values is the difference between the highest and
the lowest values in the set.
• If $X_L$ and $X_S$ are the largest and the smallest values, respectively, in a set, then the range $R$ is defined as $R = X_L - X_S$.
• For grouped data, the range is taken either as the difference between the upper boundary of the last class and the lower boundary of the first class, or as the difference between the highest and the lowest mid-values.
Coefficient of Range
$$\text{Coefficient of Range} = \frac{X_L - X_S}{X_L + X_S}$$
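A minimal sketch of the range and the coefficient of range in Python. The function names are hypothetical, and `grouped_range` assumes the classes are supplied as (lower, upper) boundary pairs in ascending order; it supports both conventions mentioned above for grouped data.

```python
def data_range(values):
    """Range R = X_L - X_S: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (X_L - X_S) / (X_L + X_S)."""
    x_l, x_s = max(values), min(values)
    return (x_l - x_s) / (x_l + x_s)


def grouped_range(classes, use_mid_values=False):
    """Range for grouped data; `classes` is a list of (lower, upper) boundaries in ascending order.

    Default: upper boundary of the last class minus lower boundary of the first class.
    use_mid_values=True: highest mid-value minus lowest mid-value.
    """
    if use_mid_values:
        mids = [(lower + upper) / 2 for lower, upper in classes]
        return max(mids) - min(mids)
    return classes[-1][1] - classes[0][0]


print(data_range([0, 0, 100, 100]))                   # Group 2 of Example 1
print(round(coefficient_of_range([68, 90, 113]), 4))  # X_L = 113, X_S = 68
classes = [(10, 20), (20, 30), (30, 40)]              # hypothetical classes
print(grouped_range(classes), grouped_range(classes, use_mid_values=True))
```

With $X_L = 113$ and $X_S = 68$ (the values used in Test Yourself 1), the coefficient of range is $45/181 \approx 0.2486$.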
Quartile Deviation
• Quartiles divide the observations into four equal parts when the observations are arranged in order of magnitude.
• The median, denoted by $Q_2$, is the middlemost observation; $Q_1$ and $Q_3$ are the middlemost observations of the lower and upper halves, respectively.
• Therefore $Q_2 - Q_1$ and $Q_3 - Q_2$ each give some measure of dispersion.
• The AM of these two measures gives the quartile deviation, denoted by QD:
$$QD = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
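The quartile rule used in the worked solutions of Test Yourself 1 (position $i n/4$; average two values if the position is a whole number, otherwise round the position up) can be sketched as follows. This is one convention among several: spreadsheet and library functions often interpolate and may return slightly different quartiles. The function names are hypothetical, and the `table_2` list assumes that the 29 wages of table 2 are the last three columns of the flattened listing in Test Yourself 1, which is consistent with the quartiles worked out for that table.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the positional rule k*n/4.

    If k*n/4 is a whole number p, average the p-th and (p+1)-th ordered values;
    otherwise take the value at position ceil(k*n/4).
    """
    ordered = sorted(values)
    p, remainder = divmod(k * len(ordered), 4)
    if remainder == 0:
        # 1-based positions p and p+1 are 0-based indices p-1 and p
        return (ordered[p - 1] + ordered[p]) / 2
    # ceil(k*n/4) = p + 1 (1-based), i.e. 0-based index p
    return ordered[p]


def quartile_deviation(values):
    """QD = (Q3 - Q1) / 2."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    return (q3 - q1) / 2


def coefficient_of_qd(values):
    """Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    return (q3 - q1) / (q3 + q1)


# Assumption: the 29 wages of table 2 are the last three columns of the listing.
table_2 = [79, 86, 93, 79, 87, 96, 79, 87, 62, 79, 90, 76, 85, 48, 157,
           92, 97, 105, 92, 97, 107, 92, 97, 72, 96, 68, 75, 97, 100]

print([quartile(table_2, k) for k in (1, 2, 3)])
print(quartile_deviation(table_2), round(coefficient_of_qd(table_2), 5))
```

It prints the quartiles 79, 90 and 97, QD = 9.0 and a coefficient of QD of 0.10227, matching the hand calculation for table 2.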
Test Yourself 1
The following data represent the annual wages (in '000 Tk) in two factories, X and Y.
I. Determine the range and the coefficient of range.
II. Determine the quartile deviation and the coefficient of quartile deviation.
91 70 74 79 86 93
60 71 76 79 87 96
112 72 127 79 87 62
68 72 77 79 90 76
69 73 77 85 48 157
97 78 85 92 97 105
72 79 85 92 97 107
112 79 87 92 97 72
113 80 90 96 68 75
78 82 90 97 100
Hints:
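1. Find the lowest and highest values for each table.
2. Calculate the quartiles ($Q_1$, $Q_2$, $Q_3$) for each table. The $i$-th quartile lies at position $\frac{i n}{4}$ in the ordered data: if this position is an integer, average the value at that position and the next one; otherwise, round the position up.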
For table 1:
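$R = 113 - 68 = 45$
$$\text{Coefficient of Range} = \frac{113 - 68}{113 + 68} = 0.2486$$
$Q_1 = 72$
Second Quartile $Q_2$:
$$\frac{i n}{4} = \frac{2 \times 30}{4} = 15, \text{ an integer}$$
$$Q_2 = \frac{1}{2}[77 + 79] = 78$$
$Q_3 = 87$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{87 - 72}{2} = 7.5$$
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{87 - 72}{87 + 72} = 0.09434$$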
For table 2:
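First Quartile $Q_1$:
$$\frac{i n}{4} = \frac{1 \times 29}{4} = 7.25, \text{ not an integer}$$
$Q_1 = 79$
Second Quartile $Q_2$:
$$\frac{i n}{4} = \frac{2 \times 29}{4} = 14.5, \text{ not an integer}$$
$Q_2 = 90$
Third Quartile $Q_3$:
$$\frac{i n}{4} = \frac{3 \times 29}{4} = 21.75, \text{ not an integer}$$
$Q_3 = 97$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{97 - 79}{2} = 9$$
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{97 - 79}{97 + 79} = 0.10227$$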
Variance and Standard Deviation
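Population
• Ungrouped data:
$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right]$$
• Grouped data:
$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right]$$
Sample
• Ungrouped data:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
• Grouped data:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$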
Variance
• The variance is the average of the squared deviations of the values from the arithmetic mean; it shows how closely, on average, the values of a variable cluster around the mean.
Calculating Variance
• If $x_1, x_2, x_3, \ldots, x_n$ are sample values and $\bar{x}$ is the sample mean, then the deviation of the value $x_i$ from the sample mean is $(x_i - \bar{x})$ and the squared deviation is $(x_i - \bar{x})^2$.
Standard Deviation
• If $X_1, X_2, X_3, \ldots, X_N$ are the $N$ values of a population of size $N$, then the population variance, commonly designated as $\sigma^2$, is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \quad \text{where } \mu = \text{population mean}$$
Recommended formula:
$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right]$$
$$\sigma = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
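To see that the definition and the recommended (shortcut) formula agree, here is a minimal Python sketch with hypothetical function names and a small hypothetical data set. The shortcut needs only $\sum x_i$ and $\sum x_i^2$, which is why it is convenient by hand; in floating-point arithmetic with very large values, however, the definitional form is less prone to cancellation error.

```python
import math


def pop_variance_definition(x):
    """sigma^2 = sum((X_i - mu)^2) / N."""
    n = len(x)
    mu = sum(x) / n
    return sum((xi - mu) ** 2 for xi in x) / n


def pop_variance_shortcut(x):
    """Recommended formula: (1/N) * [sum(x_i^2) - (sum(x_i))^2 / N]."""
    n = len(x)
    return (sum(xi * xi for xi in x) - sum(x) ** 2 / n) / n


def pop_sd(x):
    """sigma = square root of the population variance."""
    return math.sqrt(pop_variance_shortcut(x))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(pop_variance_definition(data), pop_variance_shortcut(data), pop_sd(data))
```

Both functions return 4.0 for this data, so the population SD is 2.0.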
Test Yourself 2
A population of 10 students obtained the examination marks given below. Find the variance and standard deviation of the data.
13 15 14 16 2 8 9 23 28 12
Answer:
$x_i$ $x_i^2$
13 169
15 225
14 196
16 256
2 4
8 64
9 81
23 529
28 784
12 144
$$\text{Variance}, \ \sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right] = \frac{1}{10}\left[2452 - \frac{(140)^2}{10}\right] = 49.2$$
SD = $\sigma$ = 7.01427
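The answer to Test Yourself 2 can be cross-checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` divide by $N$ (the population formulas).

```python
from statistics import pvariance, pstdev

marks = [13, 15, 14, 16, 2, 8, 9, 23, 28, 12]

# Column totals used in the hand calculation
print(sum(marks), sum(m * m for m in marks))
# Population variance and SD
print(pvariance(marks), round(pstdev(marks), 5))
```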
Population: Grouped Data
• In the case of grouped data, if $X_1, X_2, X_3, \ldots, X_k$ are values that occur with frequencies $f_1, f_2, f_3, \ldots, f_k$ respectively in a population of size $N$, then the population variance $\sigma^2$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$
Recommended Formula:
$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right]$$
Test Yourself 3
A population of 40 students obtained the examination marks given in the table below. Find the variance and standard deviation of the data.
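$x_i$: 15 20 25 30 35
$f_i$: 6 8 15 7 4

$x_i$ $f_i$ $f_i x_i^2$ $f_i x_i$
15 6 1350 90
20 8 3200 160
25 15 9375 375
30 7 6300 210
35 4 4900 140
Total 40 25125 975

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{N}\right] = \frac{1}{40}\left[25125 - \frac{(975)^2}{40}\right] = 33.9844$$
SD = 5.8296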
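For Test Yourself 3, a minimal sketch (hypothetical helper name `grouped_pop_variance`) applies the grouped recommended formula and cross-checks it by expanding the frequency table into its 40 raw marks and calling `statistics.pstdev`.

```python
import math
from statistics import pstdev


def grouped_pop_variance(x, f):
    """Recommended grouped formula: (1/N) * [sum(f_i x_i^2) - (sum(f_i x_i))^2 / N]."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi * xi for xi, fi in zip(x, f))
    return (sum_fx2 - sum_fx ** 2 / n) / n


marks = [15, 20, 25, 30, 35]
freq = [6, 8, 15, 7, 4]

var = grouped_pop_variance(marks, freq)
# Cross-check: expand the frequency table into the 40 raw observations
expanded = [xi for xi, fi in zip(marks, freq) for _ in range(fi)]
print(round(var, 4), round(math.sqrt(var), 4), round(pstdev(expanded), 4))
```

Both routes give $\sigma^2 \approx 33.9844$ and $\sigma \approx 5.8296$.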
• If $X_1, X_2, X_3, \ldots, X_n$ are the values of a sample of size $n$, then the sample variance $s^2$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}, \quad \text{where } \bar{x} = \text{sample mean}$$
Recommended Formula:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
For the same sample, the standard deviation (SD) of the sample, $s$, is defined as
$$s = \sqrt{\text{Variance}} = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}}$$
Test Yourself 4
A sample of 10 students obtained the examination marks given below. Find the variance and standard deviation of the data.
13 15 14 16 2 8 9 23 28 12
$x_i$ $x_i^2$
13 169
15 225
14 196
16 256
2 4
8 64
9 81
23 529
28 784
12 144
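$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{9}\left[2452 - \frac{(140)^2}{10}\right] = 54.6667$$
SD = 7.39369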
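For Test Yourself 4, `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor. Because the population and sample versions share the same bracket $\sum x_i^2 - (\sum x_i)^2/n$, the sample variance is the population variance scaled by $n/(n-1)$, which the last line checks.

```python
from statistics import variance, stdev, pvariance

marks = [13, 15, 14, 16, 2, 8, 9, 23, 28, 12]
n = len(marks)

s2 = variance(marks)       # divides by n - 1
sigma2 = pvariance(marks)  # divides by n
print(round(s2, 4), round(stdev(marks), 5))
# Same bracket, different divisor: s^2 = sigma^2 * n / (n - 1)
print(round(sigma2 * n / (n - 1), 4))
```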
In the case of grouped data, if $X_1, X_2, X_3, \ldots, X_k$ are values that occur with frequencies $f_1, f_2, f_3, \ldots, f_k$ respectively in a sample of size $n$, then the sample variance $s^2$ is defined as
$$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{n - 1}$$
Recommended Formula:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right]$$
For the same sample, the standard deviation (SD) of the sample, $s$, is defined as
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{x})^2}{n - 1}}$$
Test Yourself 5
A sample of 40 students obtained the examination marks given in the table below. Find the variance and standard deviation of the data.
$x_i$: 15 20 25 30 35
$f_i$: 6 8 15 7 4

$x_i$ $f_i$ $f_i x_i^2$ $f_i x_i$
15 6 1350 90
20 8 3200 160
25 15 9375 375
30 7 6300 210
35 4 4900 140
Total 40 25125 975
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{39}\left[25125 - \frac{(975)^2}{40}\right] = 34.8558$$
SD = 5.9039
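The four recommended formulas differ only in whether frequencies are used and whether the bracket is divided by $N$ or by $n - 1$, so one function can cover them all. This is a sketch with a hypothetical name `variance`; defaulting to `sample=True` follows the advice in these notes to use the sample formula unless told otherwise. Applied to Test Yourself 5 it reproduces the results above.

```python
import math


def variance(x, f=None, sample=True):
    """Shortcut variance for ungrouped (f=None) or grouped data.

    sample=True divides the bracket by n - 1 (s^2); sample=False divides by N (sigma^2).
    """
    if f is None:
        f = [1] * len(x)  # ungrouped data: every value has frequency 1
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi * xi for xi, fi in zip(x, f))
    bracket = sum_fx2 - sum_fx ** 2 / n
    return bracket / (n - 1 if sample else n)


marks = [15, 20, 25, 30, 35]
freq = [6, 8, 15, 7, 4]
s2 = variance(marks, freq)
print(round(s2, 4), round(math.sqrt(s2), 4))
```

Calling it with `sample=False` gives the population results of Test Yourself 3, and omitting `f` gives the ungrouped formulas.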
Population or Sample?
• As we can see, the formulas for variance and SD differ between population and sample. Use the population formulas only:
  o if the question directly states that the data are for the whole population, or
  o if the question states that the data were taken from all members of the population (e.g. all students of a class) and then asks for the variance/SD of that population.
• Otherwise, always use the formulas for samples, especially when nothing is mentioned about sample/population.
• The reason for using $n - 1$ is complex and out of scope, but three basic points can be mentioned:
  o Because a sample is a finite set drawn from the population, once the sample mean has been computed, the value of the last observation is determined by the values of the others. Thus the degrees of freedom of the set are one less than its size, i.e. $n - 1$.
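The degrees-of-freedom argument can be illustrated in a few lines of Python: the deviations from the sample mean always sum to zero, so once the mean and $n - 1$ values are known, the last value is fixed. Exact `Fraction` arithmetic is used so that the zero sum is not blurred by rounding; the data are the marks from Test Yourself 4.

```python
from fractions import Fraction

sample = [13, 15, 14, 16, 2, 8, 9, 23, 28, 12]
x_bar = Fraction(sum(sample), len(sample))  # exact sample mean

deviations = [x - x_bar for x in sample]
print(sum(deviations))  # deviations from the mean always sum to zero

# Knowing the mean and the first n - 1 values pins down the last one
last = len(sample) * x_bar - sum(sample[:-1])
print(last == sample[-1])
```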
Test Yourself 6
An advertising company is looking for a group of extras to shoot a sequence for a movie. The ages of the first 20 candidates to be interviewed are:
50 56 44 49 52 57 56 57 56 59
54 55 61 60 51 59 62 52 54 49
$x_i$ $x_i^2$
50 2500
56 3136
44 1936
49 2401
52 2704
57 3249
56 3136
57 3249
56 3136
59 3481
54 2916
55 3025
61 3721
60 3600
51 2601
59 3481
62 3844
52 2704
54 2916
49 2401
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{19}\left[60137 - \frac{(1093)^2}{20}\right] = 21.29211$$
SD = 4.6143
The director suggested that a standard deviation of at most 3 years would be acceptable. Since SD = 4.6143 > 3, this group of extras will not qualify.
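The decision in Test Yourself 6 can be reproduced with the sample formulas from the `statistics` module. `MAX_SD` is a hypothetical constant holding the director's 3-year threshold.

```python
from statistics import variance, stdev

ages = [50, 56, 44, 49, 52, 57, 56, 57, 56, 59,
        54, 55, 61, 60, 51, 59, 62, 52, 54, 49]

MAX_SD = 3  # hypothetical name for the director's acceptable SD, in years

s2, s = variance(ages), stdev(ages)
print(round(s2, 5), round(s, 4), "qualifies" if s <= MAX_SD else "does not qualify")
```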
$$\text{Sample CV}, \ c_v = \frac{s}{\bar{x}} \times 100 \quad (\text{expressed in } \%)$$
• For comparison between data sets with different units or widely different
means, one should use the coefficient of variation instead of the standard
deviation.
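A minimal sketch of the sample CV (hypothetical function name), with a guard because the CV is undefined when the mean is zero. The two hypothetical data sets have the same SD but different means, which is exactly the case where the CV and the SD tell different stories.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample CV = s / x_bar * 100 (per cent). Undefined when the mean is 0."""
    x_bar = mean(data)
    if x_bar == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(data) / x_bar * 100


heights = [150, 160, 170, 180, 190]  # hypothetical, in cm
weights = [50, 60, 70, 80, 90]       # hypothetical, in kg

# Same SD (about 15.81) in both sets, but very different relative variability
print(round(coefficient_of_variation(heights), 2), round(coefficient_of_variation(weights), 2))
```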
Comparing Coefficients
• To compare the variability of two sets of data (i.e. to determine which set is more variable), we need to calculate the AM and SD of both sets, and from them the CV of each.
• The data set with the larger CV has the larger relative variation; the CV is expressed as a percentage.
• For example, the relative variability of data set 1 is larger than that of data set 2 if $CV_1 > CV_2$, and vice versa.
Test Yourself 7
At a university, students can take any number of courses per semester. For two samples of 30 students each, the number of courses taken per student is given below:
Number of courses 2 3 4 5 6
Sample 1 (frequency) 2 5 10 12 1
Sample 2 (frequency) 1 6 8 13 2
Sample 1:
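$x_i$ $f_i$ $f_i x_i^2$ $f_i x_i$
2 2 8 4
3 5 45 15
4 10 160 40
5 12 300 60
6 1 36 6
Total 30 549 125
$$s_1^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{29}\left[549 - \frac{(125)^2}{30}\right] = 0.971264$$
$SD_1 = 0.985527$
$AM_1 = 4.1667$
$$CV_1 = \frac{SD_1}{AM_1} = 0.2365 \ (23.65\%)$$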
Sample 2:
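$x_i$ $f_i$ $f_i x_i^2$ $f_i x_i$
2 1 4 2
3 6 54 18
4 8 128 32
5 13 325 65
6 2 72 12
Total 30 583 129
$$s_2^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{29}\left[583 - \frac{(129)^2}{30}\right] = 0.9759$$
$SD_2 = 0.9879$
$AM_2 = 4.3$
$$CV_2 = \frac{SD_2}{AM_2} = 0.2297 \ (22.97\%)$$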
Here,
CV1 > CV2
So, the relative variability is higher in sample 1.
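Test Yourself 7 can be checked with a small helper (hypothetical name `grouped_sample_stats`) that computes the AM, the sample SD via the grouped recommended formula, and the CV as a fraction.

```python
import math


def grouped_sample_stats(x, f):
    """Return (mean, sample SD, CV as a fraction) for a frequency table."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi * xi for xi, fi in zip(x, f))
    mean = sum_fx / n
    s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    return mean, s, s / mean


courses = [2, 3, 4, 5, 6]
sample_1 = [2, 5, 10, 12, 1]
sample_2 = [1, 6, 8, 13, 2]

m1, s1, cv1 = grouped_sample_stats(courses, sample_1)
m2, s2, cv2 = grouped_sample_stats(courses, sample_2)
print(f"Sample 1: AM = {m1:.4f}, SD = {s1:.4f}, CV = {cv1:.4f}")
print(f"Sample 2: AM = {m2:.4f}, SD = {s2:.4f}, CV = {cv2:.4f}")
print("More relative variability:", "sample 1" if cv1 > cv2 else "sample 2")
```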
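The combined standard deviation of two sets of data containing $n_1$ and $n_2$ observations with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$ respectively is given by
$$\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$$
where
$$d_1 = \mu_{12} - \mu_1, \quad d_2 = \mu_{12} - \mu_2, \quad \mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2}$$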
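A minimal sketch of the combined mean and SD (hypothetical function name). It expects the two groups' population SDs, as in the formula, so for Test Yourself 8 the SDs are the square roots of the given variances (30 and 40).

```python
import math


def combined_mean_sd(n1, mu1, sd1, n2, mu2, sd2):
    """Combined mean and SD of two groups using d1 = mu12 - mu1, d2 = mu12 - mu2."""
    mu12 = (n1 * mu1 + n2 * mu2) / (n1 + n2)
    d1, d2 = mu12 - mu1, mu12 - mu2
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mu12, math.sqrt(var12)


# Test Yourself 9: 50 male workers and 40 female workers
print(combined_mean_sd(50, 6300, 600, 40, 5400, 600))
# Test Yourself 8: organizations X and Y (SDs are square roots of the variances)
print(combined_mean_sd(550, 5000, 30, 650, 4500, 40))
```

For Test Yourself 9 this gives $\mu_{12} = 5900$ and $\sigma_{12} = \sqrt{560000} \approx 748.33$; for Test Yourself 8, $\sigma_{12} \approx 251.68$.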
Test Yourself 8
From the analysis of monthly wages paid to employees in two service
organizations X and Y, the following results were obtained:
Organization X Organization Y
Number of wage-earners 550 650
Average monthly wages 5000 4500
Variance of the distribution of wages 900 1600
For organization X:
$\mu_1 = 5000$
Variance, $\sigma_1^2 = 900$
SD, $\sigma_1 = 30$
$n_1 = 550$
For organization Y:
$\mu_2 = 4500$
Variance, $\sigma_2^2 = 1600$
SD, $\sigma_2 = 40$
$n_2 = 650$
Now,
$$\mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2} = \frac{550 \times 5000 + 650 \times 4500}{550 + 650} = 4729.16667$$
Test Yourself 9
For a group of 50 male workers, the mean and standard deviation of their
monthly wages are tk. 6300 and tk. 600 respectively. For a group of 40 female
workers, these are tk. 5400 and tk. 600 respectively. Find the standard deviation
of monthly wages for the combined group of workers.
For Group 1:
$\mu_1 = 6300$
SD, $\sigma_1 = 600$
$n_1 = 50$
For Group 2:
$\mu_2 = 5400$
SD, $\sigma_2 = 600$
$n_2 = 40$
Now,
$$\mu_{12} = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2} = \frac{50 \times 6300 + 40 \times 5400}{50 + 40} = 5900$$
• Bollinger Bands show the upper and lower limits of ‘normal’ price
movements based on SD of prices.
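A minimal sketch of Bollinger Bands built from a moving mean and SD. The 20-period window and the multiplier of 2 are common defaults, assumed here rather than taken from these notes; the population SD (`pstdev`) is used, though some implementations use the sample SD. The function name and the price series are hypothetical.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, k=2.0):
    """Return a list of (middle, upper, lower) tuples, one per complete window.

    middle = mean of the last `window` prices
    upper/lower = middle +/- k * (population) SD of those prices
    """
    bands = []
    for end in range(window, len(prices) + 1):
        recent = prices[end - window:end]
        middle, sd = mean(recent), pstdev(recent)
        bands.append((middle, middle + k * sd, middle - k * sd))
    return bands


prices = [10, 11, 12, 11, 10]  # hypothetical closing prices
for middle, upper, lower in bollinger_bands(prices, window=3):
    print(f"middle = {middle:.2f}, upper = {upper:.2f}, lower = {lower:.2f}")
```

A price outside the band lies more than $k$ standard deviations from its recent average, which is what the notes call a move beyond the "normal" limits.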
Variance
Merits Demerits
• Rigidly defined.