STA 101
S M Rajib Hossain
Lecture-4
Measures of Dispersion
In statistics, measures of dispersion help to interpret the variability of
data, i.e. to show how homogeneous or heterogeneous the data are. In
simple terms, they show how scattered the values of a variable are.
Types of Measures of Dispersion
There are two main types of dispersion methods in statistics which are:
✓ Absolute Measure of Dispersion
✓ Relative Measure of Dispersion
Absolute Measure of Dispersion
An absolute measure of dispersion is expressed in the same units as the
original data set. It describes variation in terms of the average deviation of
observations, such as the standard deviation or mean deviation. It
includes the range, standard deviation, quartile deviation, etc.
The types of absolute measures of dispersion are:
✓ Range
✓ Variance
✓ Standard Deviation
✓ Quartile Deviation
✓ Mean Deviation
Relative Measure of Dispersion
Relative measures of dispersion are used to compare the variability of
two or more data sets. These measures are unit-free pure numbers, so they
allow comparison across different units or scales.
Common relative dispersion methods include:
✓ Coefficient of Range
✓ Coefficient of Variation (C.V.)
✓ Coefficient of Standard Deviation
✓ Coefficient of Quartile Deviation
✓ Coefficient of Mean Deviation
Range
It is simply the difference between the maximum value and the minimum
value given in a data set.
Example: 1, 3, 5, 6, 7
Range = 7 − 1 = 6
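This example can be checked directly in Python (a minimal sketch; the built-in `max` and `min` give the extremes of the list):

```python
data = [1, 3, 5, 6, 7]
range_value = max(data) - min(data)  # maximum minus minimum
print(range_value)  # 6
```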
Variance
The term variance refers to a statistical measurement of the spread between
numbers in a data set. More specifically, variance measures how far each
number in the set is from the mean (average), and thus from every other
number in the set. Variance is often denoted by the symbol σ².
In our study, we have two types of variance:
• Population variance: Let x₁, …, x_N be the N observations in a population
and μ be the population mean. Then the population variance, denoted by
σ², is defined as
σ² = Σᵢ (xᵢ − μ)² / N, where the sum runs over i = 1, …, N.
For grouped data, σ² = Σ fᵢ(xᵢ − μ)² / Σ fᵢ.
• Population standard deviation: σ = √[Σᵢ (xᵢ − μ)² / N].
For grouped data, σ = √[Σ fᵢ(xᵢ − μ)² / Σ fᵢ].
• Sample standard deviation: for a sample x₁, …, xₙ with sample mean x̄,
s = √[Σᵢ (xᵢ − x̄)² / (n − 1)], where the sum runs over i = 1, …, n.
For grouped data, s = √[Σ fᵢ(xᵢ − x̄)² / (Σ fᵢ − 1)].
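The formulas above translate directly into code. A minimal sketch (the function names are illustrative, not a standard API):

```python
from math import sqrt

def population_variance(x):
    # sigma^2 = sum((x_i - mu)^2) / N
    mu = sum(x) / len(x)
    return sum((xi - mu) ** 2 for xi in x) / len(x)

def sample_variance(x):
    # s^2 = sum((x_i - xbar)^2) / (n - 1)
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

def grouped_sample_variance(x, f):
    # s^2 = sum(f_i * (x_i - xbar)^2) / (sum(f_i) - 1),
    # where x_i are class midpoints and f_i their frequencies
    n = sum(f)
    xbar = sum(fi * xi for fi, xi in zip(f, x)) / n
    return sum(fi * (xi - xbar) ** 2 for fi, xi in zip(f, x)) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))        # 4.0
print(sqrt(population_variance(data)))  # 2.0 (population standard deviation)
```

Note the divisor: N for a population, but n − 1 for a sample; the latter corrects the bias that comes from estimating the mean from the same data.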
Q1
Find the variance and standard deviation for the following data.
Sample mean x̄ = Σ fᵢxᵢ / Σ fᵢ = 231.295 / 57 = 4.0578
Sample variance s² = Σ fᵢ(xᵢ − x̄)² / (Σ fᵢ − 1) = 25.10035 / (57 − 1) = 0.448221
Sample standard deviation s = √0.448221 = 0.669
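The frequency table for Q1 did not carry over into these notes, but the arithmetic can still be re-checked from the stated column totals:

```python
from math import sqrt

# Only the column totals from the Q1 solution are used here:
# sum(f_i * x_i) = 231.295, sum(f_i) = 57, sum(f_i * (x_i - xbar)^2) = 25.10035
xbar = 231.295 / 57          # sample mean
s2 = 25.10035 / (57 - 1)     # sample variance, divisor sum(f_i) - 1
s = sqrt(s2)                 # sample standard deviation
print(round(xbar, 4), round(s2, 6), round(s, 3))
```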
Mean Deviation
Mean deviation is used to compute how far the values in a data set are from
the center point. Mean, median, and mode all form center points of the data
set. In other words, the mean deviation is used to calculate the average of the
absolute deviations of the data from the central point.
In case of the mean:
M.A.D(x̄) = Σ |xᵢ − x̄| / n
For grouped data, M.A.D(x̄) = Σ fᵢ|xᵢ − x̄| / Σ fᵢ
In case of the median:
M.A.D(Me) = Σ |xᵢ − Me| / n
For grouped data, M.A.D(Me) = Σ fᵢ|xᵢ − Me| / Σ fᵢ
In case of the mode:
M.A.D(Mo) = Σ |xᵢ − Mo| / n
For grouped data, M.A.D(Mo) = Σ fᵢ|xᵢ − Mo| / Σ fᵢ
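Because the same formula applies with any center (mean, median, or mode), one helper function covers all three cases. A sketch using the standard library:

```python
from statistics import mean, median

def mad(x, center):
    # M.A.D. about a chosen center: sum(|x_i - center|) / n
    return sum(abs(xi - center) for xi in x) / len(x)

def grouped_mad(x, f, center):
    # Grouped data: sum(f_i * |x_i - center|) / sum(f_i)
    return sum(fi * abs(xi - center) for xi, fi in zip(x, f)) / sum(f)

data = [2, 4, 6, 8]
print(mad(data, mean(data)))    # deviations 3, 1, 1, 3 -> 2.0
print(mad(data, median(data)))  # median is also 5 here -> 2.0
```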
Q2
Sample mean x̄ = Σ fᵢxᵢ / Σ fᵢ = 4480 / 133 = 33.68
M.A.D(x̄) = Σ fᵢ|xᵢ − x̄| / Σ fᵢ = 1082 / 133 = 8.14
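As with Q1, the table for Q2 is not reproduced here, but the stated totals let us verify the arithmetic:

```python
# Stated totals from the Q2 solution: sum(f_i * x_i) = 4480,
# sum(f_i) = 133, sum(f_i * |x_i - xbar|) = 1082
xbar = 4480 / 133       # sample mean, approximately 33.68
mad_xbar = 1082 / 133   # mean absolute deviation about the mean, approx. 8.14
print(round(xbar, 2), round(mad_xbar, 2))
```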
Coefficient of variation
The coefficient of variation is the ratio of the standard deviation to the mean.
It is usually expressed as a percentage. Mathematically,
CV = (σ / μ) × 100 for a population
CV = (s / x̄) × 100 for a sample
Q3
Find the coefficient of variation for the two plants of a factory for the given
data and interpret the results.
Two plants C and D of a factory show the following results about the
number of workers and the wages paid to them.
Solution:
To Find: Which plant has greater variability.
For this, we need to find the coefficient of variation. The plant that has a
higher coefficient of variation will have greater variability.
Using the coefficient of variation formula, CV = (s / x̄) × 100.
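The wage table for plants C and D did not survive in these notes, so the comparison below uses made-up figures purely to illustrate the method (the numbers are hypothetical, not from the original problem):

```python
def cv(mean_value, std_dev):
    # Coefficient of variation as a percentage: (s / xbar) * 100
    return std_dev / mean_value * 100

# Hypothetical mean wage and standard deviation for each plant:
plant_c = cv(mean_value=2500, std_dev=400)  # 16.0
plant_d = cv(mean_value=2500, std_dev=300)  # 12.0
more_variable = "C" if plant_c > plant_d else "D"
print(plant_c, plant_d, more_variable)
```

Whichever plant has the higher CV has the greater relative variability in wages, regardless of the wage scale.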
Box Plot
A box plot (box-and-whisker plot) summarizes a data set by the following values:
Minimum Score
The lowest score, excluding outliers (shown at the end of the left whisker).
Lower Quartile
Twenty-five percent of scores fall below the lower quartile value (also
known as the first quartile).
Median
The median marks the mid-point of the data and is shown by the line that
divides the box into two parts (sometimes known as the second quartile).
Half the scores are greater than or equal to this value, and half are less.
Upper Quartile
Seventy-five percent of the scores fall below the upper quartile value (also
known as the third quartile). Thus, 25% of data are above this value.
Maximum Score
The highest score, excluding outliers (shown at the end of the right whisker).
Whiskers
The upper and lower whiskers represent scores outside the middle 50% (i.e.,
the lower 25% of scores and the upper 25% of scores).
The Interquartile Range (IQR)
The box plot shows the middle 50% of scores (i.e., the range between the
25th and 75th percentile).
Why Are Box Plots Useful?
Box plots are useful as they show the center of a data set:
The median is the middle value of the data and is shown by the line
that divides the box into two parts. Half the scores are greater than or equal
to this value, and half are less.
Box plots are useful as they show the skewness of a data set:
The box plot shape will show if a statistical data set is normally distributed
or skewed.
When the median is in the middle of the box, and the whiskers are about the
same on both sides of the box, then the distribution is symmetric.
When the median is closer to the bottom of the box, and if the whisker is
shorter on the lower end of the box, then the distribution is positively
skewed (skewed right).
When the median is closer to the top of the box, and if the whisker is shorter
on the upper end of the box, then the distribution is negatively skewed
(skewed left).
Box plots are useful as they show outliers within a data set:
An outlier is an observation that is numerically distant from the rest of the
data.
When reviewing a box plot, an outlier is defined as a data point that is
located outside the whiskers of the box plot.
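The quartile and outlier logic above can be sketched in Python. Two assumptions to flag: the quartiles come from the standard library's default "exclusive" method (other conventions give slightly different values), and the whisker fences use the common 1.5 × IQR rule, which the text does not state explicitly:

```python
from statistics import quantiles

def iqr_and_outliers(x):
    # Quartiles Q1, Q2 (median), Q3 via statistics.quantiles (exclusive method)
    q1, q2, q3 = quantiles(x, n=4)
    iqr = q3 - q1                              # middle 50% of the scores
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # typical whisker fences
    outliers = [v for v in x if v < lo or v > hi]
    return iqr, outliers

data = [52, 55, 57, 58, 60, 61, 62, 64, 95]
iqr, outliers = iqr_and_outliers(data)
print(iqr, outliers)  # 95 lies beyond the upper fence
```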
Skewness and Kurtosis
Skewness is a measure of the asymmetry of a distribution.
Skewness is an important statistical measure that describes the
asymmetrical behavior of a frequency distribution, or more precisely,
the lack of symmetry between the left and right tails of the frequency curve. A
distribution or dataset is symmetric if it looks the same to the left and right
of the center point.
Measures of skewness
In studying the skewness of a distribution, the first thing we would like to
know is whether the distribution is positively or negatively skewed.
The second is to measure the degree of skewness. The simplest
measure of skewness is Pearson's coefficient of skewness, defined as:
Pearson's coefficient of skewness = (mean − mode) / standard deviation
Interpretation
If the value of Pearson's coefficient of skewness is zero, the distribution is
symmetric.
If the value of Pearson's coefficient of skewness is positive, the distribution
is positively skewed.
If the value of Pearson's coefficient of skewness is negative, the distribution
is negatively skewed.
Another measure of skewness, due to Bowley, is defined in terms of the
quartile values. In a symmetrical distribution, the first quartile Q₁ and the
third quartile Q₃ are equidistant from the median Q₂, so any difference
between these two distances is a reasonable basis for measuring skewness.
Thus, in terms of the three quartiles Q₁, Q₂ and Q₃, Bowley's quartile
coefficient of skewness is:
Quartile coefficient of skewness = [(Q₃ − Q₂) − (Q₂ − Q₁)] / (Q₃ − Q₁) = (Q₃ + Q₁ − 2Q₂) / (Q₃ − Q₁)
This is evidently a pure number lying between −1 and +1, and is zero for a
symmetrical distribution.
• If Q₃ − Q₂ = Q₂ − Q₁, quartile skewness = 0 and the distribution is symmetrical.
• If Q₃ − Q₂ > Q₂ − Q₁, quartile skewness > 0 and the distribution is positively skewed.
• If Q₃ − Q₂ < Q₂ − Q₁, quartile skewness < 0 and the distribution is negatively skewed.
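Bowley's measure follows the same pattern in code (again using the standard library's default quartile method, an assumption since the notes do not fix one):

```python
from statistics import quantiles

def bowley_skewness(x):
    # Quartile coefficient of skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1),
    # always a pure number between -1 and +1
    q1, q2, q3 = quantiles(x, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(bowley_skewness([1, 2, 3, 4, 5, 6, 7]))  # 0.0: symmetric
print(bowley_skewness([1, 2, 3, 4, 10]))       # positive: right-skewed
```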
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed
relative to a normal distribution. That is, data sets with high kurtosis tend to
have heavy tails, or outliers. Data sets with low kurtosis tend to have light
tails, or lack of outliers.
Types of kurtosis: Kurtosis is commonly classified as mesokurtic (tails like
the normal distribution), leptokurtic (heavier tails than the normal), and
platykurtic (lighter tails than the normal).
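A from-scratch sketch of the moment-based kurtosis, reported here as excess kurtosis (m₄/m₂² − 3, so the normal distribution scores 0; this particular convention is an assumption, since the notes do not give a formula):

```python
def excess_kurtosis(x):
    # m4 / m2^2 - 3, where m2 and m4 are the 2nd and 4th central moments.
    # 0 for normal-like tails (mesokurtic), > 0 heavy tails (leptokurtic),
    # < 0 light tails (platykurtic).
    n = len(x)
    mu = sum(x) / n
    m2 = sum((xi - mu) ** 2 for xi in x) / n
    m4 = sum((xi - mu) ** 4 for xi in x) / n
    return m4 / m2 ** 2 - 3

# An evenly spread sample is flatter than the normal curve (platykurtic):
print(excess_kurtosis([1, 2, 3, 4, 5]))  # negative, approximately -1.3
```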