MEFall2023 4
1 / 82
Descriptive Statistics
Measures of Dispersion
Imagine you are comparing two different data sets (for now, measured in the same units, e.g. kg, km, etc.). By chance, the two data sets happen to have the same mean, median, or mode.
Does that mean the two data sets are the same and share the same features?
No.
We need some extra insight into the data; as a first step, we measure their respective dispersions, or variabilities about the center, and then compare them.
Some of the most commonly used measures of dispersion are
Range, Mid-range
Inter-quartile Range (also called the fourth-spread), Semi-inter-quartile Range
Mean Deviation
Variance and Standard Deviation
Range is quite a simple measure: just the difference of the two extreme values in the data. The mid-range is just the average of the two extreme values, i.e.

mid-range = (max-value + min-value) / 2

So we will start from the inter-quartile and semi-inter-quartile range.
Inter-quartile range (aka fourth-spread): The interquartile range, denoted IQR, is a measure
of spread from the lower quartile to the upper quartile,
IQR = Q3 − Q1
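The three simple measures above (range, mid-range, IQR) can be sketched in a few lines of Python. Note that textbooks differ on the quartile convention; the median-of-halves rule used here is one common choice, and the function name is mine, not from the notes.

```python
from statistics import median

def dispersion_basics(data):
    """Range, mid-range and IQR; quartiles via the median-of-halves rule
    (one convention among several)."""
    xs = sorted(data)
    n = len(xs)
    rng = xs[-1] - xs[0]                 # max-value - min-value
    mid_range = (xs[-1] + xs[0]) / 2     # average of the two extremes
    half = n // 2
    q1 = median(xs[:half])               # median of the lower half
    q3 = median(xs[-half:])              # median of the upper half
    return rng, mid_range, q3 - q1

# Data from Example 16 in these notes
print(dispersion_basics([65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]))
# -> (78, 53.0, 42)
```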
Mean Deviation: The mean (or median) deviation (MD), also called the mean absolute deviation (MAD), is a measure of dispersion defined as the average of the absolute deviations of the data values from the data center (usually the mean or median). Mathematically, using the mean as the data center,

MD = Σ_{i=1}^{n} |x_i − x̄| / n.

Similarly, using the median x̃, the MedD is defined as

MedD = Σ_{i=1}^{n} |x_i − x̃| / n,

where x̄ = Σx / n.

For grouped data, arranged in a frequency table with k classes having midpoints x_1, x_2, ..., x_k and frequencies f_1, f_2, ..., f_k, the MD and MedD are given by

MD = Σ_{i=1}^{k} f_i |x_i − x̄| / n   and   MedD = Σ_{i=1}^{k} f_i |x_i − x̃| / n,

where x̄ = Σfx / n and n = Σf.
Example 16.
Find the MD and MedD for the following simple data.
65 55 89 56 35 14 56 55 87 45 92
Solution:
Let's denote the data by X. What we need first are the mean and the median. The mean is

x̄ = Σ_{i=1}^{n} x_i / n = (65 + 55 + ... + 92) / 11 = 649 / 11 = 59.

Since n is odd, the median is just the middle observation of the ordered data,
14 35 45 55 55 56 56 65 87 89 92
hence the median is 56. Now let's proceed to find MD and MedD.
Let's arrange the data in a table and calculate the required quantities, i.e. Σ|x_i − x̄| and Σ|x_i − x̃|, for the above formulas.

x_i   x_i − x̄   |x_i − x̄|   x_i − x̃   |x_i − x̃|
65    6          6            9          9
55    −4         4            −1         1
89    30         30           33         33
56    −3         3            0          0
35    −24        24           −21        21
14    −45        45           −42        42
56    −3         3            0          0
55    −4         4            −1         1
87    28         28           31         31
45    −14        14           −11        11
92    33         33           36         36
Σ                194                     185

Hence,

MD = Σ|x_i − x̄| / n = 194 / 11 = 17.6   and   MedD = Σ|x_i − x̃| / n = 185 / 11 = 16.8
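As a quick check of Example 16 in Python (the helper name is mine, not from the notes):

```python
from statistics import mean, median

def mean_deviation(data, center):
    """Average absolute deviation of the data from a given center."""
    return sum(abs(x - center) for x in data) / len(data)

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
md = mean_deviation(data, mean(data))      # about the mean (59)
medd = mean_deviation(data, median(data))  # about the median (56)
print(round(md, 1), round(medd, 1))        # -> 17.6 16.8, matching the table
```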
Example 17.
Find the MD and MedD for the following grouped data.
x : 14 35 45 55 56 65 87 89 92
f :  4  7 11 13 18 13  8  6  3
Solution:
Again, we first need the mean and the median to calculate the necessary columns. The mean is

x̄ = Σ_{i=1}^{k} f_i x_i / Σ_{i=1}^{k} f_i = 4870 / 83 = 58.7,

and the median (the class containing the middle ordered observation) is

x̃ = 56.
MD = Σ f_i |x_i − x̄| / Σ f_i = 1182 / 83 = 14.2

MedD = Σ f_i |x_i − x̃| / Σ f_i = 1120 / 83 = 13.5

That's it!
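The grouped-data formulas can be sketched the same way. The function below is my own helper (not from the notes); it locates the median as the first class whose cumulative frequency reaches the middle position.

```python
def grouped_md_medd(xs, fs):
    """MD and MedD for grouped data with class midpoints xs and frequencies fs."""
    n = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / n
    # median: first class whose cumulative frequency reaches (n + 1) / 2
    target, cum = (n + 1) / 2, 0
    for x, f in zip(xs, fs):
        cum += f
        if cum >= target:
            xtilde = x
            break
    md = sum(f * abs(x - xbar) for x, f in zip(xs, fs)) / n
    medd = sum(f * abs(x - xtilde) for x, f in zip(xs, fs)) / n
    return xbar, xtilde, md, medd

# Data from Example 17
xs = [14, 35, 45, 55, 56, 65, 87, 89, 92]
fs = [4, 7, 11, 13, 18, 13, 8, 6, 3]
xbar, xtilde, md, medd = grouped_md_medd(xs, fs)
print(round(xbar, 1), xtilde, round(md, 1), round(medd, 1))  # -> 58.7 56 14.2 13.5
```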
Variance and Standard Deviation: Variance is defined as the mean of the squared deviations of all the observations from the mean. The population variance is denoted by σ² and the sample variance by S² or σ̂². Mathematically, for simple data,

σ² = Σ(x_i − µ)² / N,   for population data,

S² = Σ(x_i − x̄)² / n,   for large sample data (n > 30),

or

s² = Σ(x_i − x̄)² / (n − 1),   for small sample data (n ≤ 30).

The standard deviation is just the positive square root of the variance, defined as

σ = √( Σ(x_i − µ)² / N ),   for population data,

S = √( Σ(x_i − x̄)² / n ),   for large sample data (n > 30),

or

s = √( Σ(x_i − x̄)² / (n − 1) ),   for small sample data (n ≤ 30).
Standard deviation (SD) is a widely used measure of variability or diversity in statistics and probability theory. It shows how much variation or "dispersion" exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values.

Do you remember the formula D = √((x₁ − x₂)² + (y₁ − y₂)²)?
Do you notice the similarity between SD and this formula?
SD does almost the same job as D, except that it averages the squared deviations, and the coordinates of the second point (here the mean) are the same for all pairs; i.e. for two data points

SD = √( ((x₁ − x̄)² + (x₂ − x̄)²) / 2 ).

Generalizing to n observations x₁, x₂, ..., x_n, the SD is

SD = √( ((x₁ − x̄)² + (x₂ − x̄)² + ... + (x_n − x̄)²) / n ) = √( Σ(x_i − x̄)² / n ).

SD is commonly used to measure confidence in statistical conclusions.
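The defining formulas translate directly into code. This is a sketch (the function name and its denominator switch are mine); it reuses the Example 16 data.

```python
from math import sqrt

def sd(data, denominator="n"):
    """Standard deviation from the definition; denominator 'n' for
    population/large-sample data, 'n-1' for small samples."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)   # sum of squared deviations
    return sqrt(ss / (n if denominator == "n" else n - 1))

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(round(sd(data), 1), round(sd(data, "n-1"), 1))  # -> 22.7 23.8
```

Note how the n−1 denominator gives a slightly larger value; for small samples this corrects the tendency of S to underestimate σ.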
For grouped data arranged in a frequency table with k classes with midpoints x₁, x₂, ..., x_k and frequencies f₁, f₂, ..., f_k, the variance and standard deviation are given by

S² = Σ_{i=1}^{k} f_i (x_i − x̄)² / Σ_{i=1}^{k} f_i,

and

S = √( Σ_{i=1}^{k} f_i (x_i − x̄)² / Σ_{i=1}^{k} f_i ),

where

x̄ = Σ_{i=1}^{k} f_i x_i / Σ_{i=1}^{k} f_i.

In the next slides we will solve an example problem using the data from Example 17.
Example 18.
Find the SD and variance of the data in Example 17.
Solution:
Looking at the nature of the data (i.e. observations with frequencies), we need to use the grouped-data formula for SD on the previous page, that is,

S = √( Σ_{i=1}^{k} f_i (x_i − x̄)² / Σ_{i=1}^{k} f_i ).

For this we need the mean x̄, which is 58.7 (from Example 17), so let's just calculate the required quantity for the above formula, i.e. Σ_{i=1}^{k} f_i (x_i − x̄)². The variance is then obtained by squaring the SD. We construct the following table.

x     f    x_i − x̄   f_i (x_i − x̄)²
14    4    −44.7      7983
35    7    −23.7      3923
45    11   −13.7      2057
55    13   −3.7       176
56    18   −2.7       129
65    13   6.3        520
87    8    28.3       6419
89    6    30.3       5518
92    3    33.3       3332
Σ     83              30056

We now have the required quantities for the formula; putting the values in,

S = √(30056 / 83) = 19.0.

Thus the standard deviation (SD) of the said data is 19.0.
The variance is simply the square of S, i.e. S² = 30056 / 83 ≈ 362.1.
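Example 18 can be checked in a few lines (a sketch; variable names are mine):

```python
from math import sqrt

# Data from Example 17: class midpoints and frequencies
xs = [14, 35, 45, 55, 56, 65, 87, 89, 92]
fs = [4, 7, 11, 13, 18, 13, 8, 6, 3]

n = sum(fs)                                    # 83
xbar = sum(f * x for x, f in zip(xs, fs)) / n  # 58.7
ss = sum(f * (x - xbar) ** 2 for x, f in zip(xs, fs))
S = sqrt(ss / n)
print(round(S, 1), round(S ** 2, 1))           # -> 19.0 362.1
```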
In practice, the variance and SD are calculated using the computationally friendly formulas given below.
For a sample of size n with values x_i, i = 1, 2, ..., n,

S² = (1/n) Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} x_i² / n − ( Σ_{i=1}^{n} x_i / n )².

Similarly, for grouped data distributed in k groups with midpoints x_i and frequencies f_i (i = 1, 2, ..., k), we use

S² = (1/n) Σ_{i=1}^{k} f_i (x_i − x̄)² = Σ_{i=1}^{k} f_i x_i² / n − ( Σ_{i=1}^{k} f_i x_i / n )².

The benefit of these formulas is that one does not need to calculate the column of differences x_i − x̄.
Taking the positive square root of S² gives the SD.
Box-plots
Stem-and-leaf displays and histograms convey rather general impressions about a data set, whereas a single summary such as the mean or standard deviation focuses on just one aspect of the data. A pictorial summary called a box-and-whisker plot, or simply box-plot, has been used successfully to describe several of the most prominent features of a data set. These features include (1) center, (2) spread, (3) the extent and nature of any departure from symmetry, and (4) identification of "outliers," observations that lie unusually far from the main body of the data. Because even a single outlier can drastically affect the values of x̄ and s, a box-plot is based on measures that are "resistant" to the presence of a few outliers, i.e. the median and a measure of spread called the fourth spread (inter-quartile range). A typical box-plot is a rectangular box with whiskers, as shown in the following figure.

[Figure: a typical box-plot, with the box spanning Q1 to Q3, whiskers extending to the most extreme observations within Q1 − 1.5fs and Q3 + 1.5fs, and points beyond plotted individually.]
Definition
Any observation farther than 1.5fs from the closest fourth is an outlier. An outlier is extreme if it is
more than 3fs from the nearest fourth, and it is mild otherwise.
Example 19.
The relevant summary quantities for Example 1.17 (page 39, Devore) are
x̃ = 92.17   Q1 = 45.64   Q3 = 167.79
Q3 − Q1 = fs = 122.15   1.5fs = 183.225   3fs = 366.45
Subtracting 1.5fs from Q1 gives a negative number, and none of the observations are negative, so there are no outliers on the lower end of the data. However, on the upper end, Q3 + 1.5fs = 351.015 and Q3 + 3fs = 534.24.
Thus the four largest observations (563.92, 690.11, 826.54, and 1529.35) are extreme outliers, and 352.09, 371.47, 444.68, and 460.86 are mild outliers. The box-plot for the above data can then be sketched as follows.
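The mild/extreme outlier rule from the definition above is easy to mechanize. Here classify_outlier is a hypothetical helper of mine; Q1 and Q3 are the summary values quoted in Example 19.

```python
def classify_outlier(x, q1, q3):
    """Classify an observation by the 1.5*fs / 3*fs rule."""
    fs = q3 - q1
    if x < q1 - 3 * fs or x > q3 + 3 * fs:
        return "extreme"
    if x < q1 - 1.5 * fs or x > q3 + 1.5 * fs:
        return "mild"
    return "none"

q1, q3 = 45.64, 167.79                   # summary quantities from Example 19
print(classify_outlier(563.92, q1, q3))  # -> extreme
print(classify_outlier(352.09, q1, q3))  # -> mild
print(classify_outlier(312.45, q1, q3))  # -> none
```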
The whiskers in the box-plot in the above figure extend out to the smallest observation, 9.69, on the low end and to 312.45, the largest observation that is not an outlier, on the upper end. There is some positive skewness in the middle half of the data (the median line is somewhat closer to the right edge of the box than to the left edge) and a great deal of positive skewness overall. We will learn about positive/negative skewness in the next few slides. Most importantly, box-plots can be used to compare several data sets at once; e.g. see the following figure of monthly box-plots of daily temperatures in some country.

[Figure: twelve side-by-side box-plots of daily temperatures, January through December, with outlying days marked as points.]
Properties of Σ, x̄, and S²
Mean, x̄:
1. For a sample x₁, x₂, ..., x_n, the sum of the deviations from the mean x̄ is zero:

Σ_{i=1}^{n} (x_i − x̄) = 0.

2. The sum of squared deviations of the x_i's from the mean x̄ is a minimum. In other words, for any arbitrary constant a (a ≠ x̄),

Σ_{i=1}^{n} (x_i − x̄)² ≤ Σ_{i=1}^{n} (x_i − a)².

3. If there are k samples, with individual sizes n₁, n₂, ..., n_k, their combined mean is defined as

x̄_c = (n₁x̄₁ + n₂x̄₂ + ... + n_k x̄_k) / (n₁ + n₂ + ... + n_k) = Σ_{i=1}^{k} n_i x̄_i / n.

That is a weighted mean of their individual means.
4. Suppose a linear transformation is applied to the variable x_i to create a new variable y_i = m x_i + b, where m and b are constants. Then the mean of the new variable y is derived by summing over all i values:

Σ_{i=1}^{n} y_i = m Σ_{i=1}^{n} x_i + nb;   dividing both sides by n we get   ȳ = m x̄ + b.

This is called the invariance property of the mean under linear transformation.
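Properties 1 and 4 can be checked numerically. This is a sketch with an arbitrary random sample; all names and the particular m, b are mine.

```python
import random

random.seed(1)
xs = [random.randint(0, 100) for _ in range(50)]

m, b = 3, 7                       # an arbitrary linear transformation y = m*x + b
ys = [m * x + b for x in xs]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

print(abs(sum(x - xbar for x in xs)) < 1e-9)  # property 1: deviations sum to 0
print(abs(ybar - (m * xbar + b)) < 1e-9)      # property 4: ybar = m*xbar + b
```

Both checks print True up to floating-point tolerance.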
Variance, S²:
1. The variance of a constant is zero. That is, for any constant a,

Var(a) = (a − a)² / 1 = 0,   ∵ the mean of a constant is that constant itself.

2. Variance is independent of the origin; that is, if we add or subtract a fixed amount (constant) to x_i, Var(X) remains unchanged. Mathematically,

Var(X + a) = Var(X).

3. When the variable X is multiplied/divided by some constant, the variance is multiplied/divided by the square of that constant. Mathematically,

Var(aX) = a² Var(X).

4. The variance of the sum/difference of two independent variables is equal to the SUM of their respective variances; i.e. if X and Y are any two independent variables, then

Var(X ± Y) = Var(X) + Var(Y).

Noticed something?

Var(X + Y) = Var(X − Y) = Var(X) + Var(Y).

Strange!!!!
The variance of the sum as well as the variance of the difference of the two independent variables X and Y are both equal to the SUM of their individual variances.
Why??
By property 3, Var(−Y) = (−1)² Var(Y) = Var(Y), so subtracting Y adds just as much variability as adding it. More intuitively, variance cannot be negative, as it is just an (averaged) squared distance of observations from the mean.
Distance can't be negative.
Hence variance and SD cannot be negative.
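Properties 2 and 3 can likewise be verified numerically (a sketch with an arbitrary random sample; property 4 involves expectations, so a finite-sample check would only hold approximately and is not attempted here):

```python
import random

def var(xs):
    """Population-style variance (divide by n)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(100)]
a = 5.0

print(abs(var([x + a for x in xs]) - var(xs)) < 1e-9)          # property 2
print(abs(var([a * x for x in xs]) - a * a * var(xs)) < 1e-9)  # property 3
```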
Measures of Dispersion
Coefficient of Variation (CV): Variance and SD cannot be used to compare the variability of two data sets with different measurement units. For this purpose we use the CV, a pure (unit-free) measure used to compare the variability of two or more datasets without taking into account the units in which the data are measured. It is defined as

CV = (S / x̄) × 100   for sample data,
CV = (σ / µ) × 100   for population data.

Its interpretation is simple: large values of CV indicate higher variability, and smaller values indicate smaller variability.

Example 20. Find the CV for the data in Example 17.
Solution: We found that x̄ = 58.7 and S = 19.0 (Example 18), thus the CV is

CV = (S / x̄) × 100 = (19.0 / 58.7) × 100 = 32.4.

Although this value seems large, calculating it for a single dataset does not make much sense, as the CV is used to compare the variability of two or more datasets. Let's imagine the mean and SD of another dataset "A" were found to be x̄_A = 58.7 and S_A = 29.0, so

CV_A = (S_A / x̄_A) × 100 = (29.0 / 58.7) × 100 = 49.4,

which is quite a bit larger than the value we calculated for the data of Example 17. So this second data set has larger variability.
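The CV comparison in Example 20 in code (cv is my helper name):

```python
def cv(s, xbar):
    """Coefficient of variation, in percent."""
    return s / xbar * 100

cv_main = cv(19.0, 58.7)   # data of Example 17
cv_a = cv(29.0, 58.7)      # the imagined dataset "A"
print(round(cv_main, 1), round(cv_a, 1))  # -> 32.4 49.4
```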
Moments: The mean and the variance provide information on the location and variability of a distribution. Moments are quantitative measures used to explore the shape of a distribution, e.g. its width, peak, and length of tails. They are similar to the variance; indeed, the variance itself is a moment. In practice, the first four moments are enough to get detailed information about a distribution. They are defined as

µ_r = (1/N) Σ_{i=1}^{N} (x_i − µ)^r   for population data,
m_r = (1/n) Σ_{i=1}^{n} (x_i − x̄)^r   for sample data,

where r = 1, 2, 3, 4. These are called the central moments, or moments about the mean. Similarly, for grouped data distributed in k groups with midpoints x_i and frequencies f_i (i = 1, 2, ..., k), the formulas are

µ_r = (1/N) Σ_{i=1}^{k} f_i (x_i − µ)^r   for population data,
m_r = (1/n) Σ_{i=1}^{k} f_i (x_i − x̄)^r   for sample data.
Example 21.
Find the first four central moments for the following data.
14 35 45 55 55 56 56 65 87 89 92
Solution:
First we need the mean, which is x̄ = Σx_i / n = 59. Now let's arrange the data in a table and find the required quantities.

x_i   x_i − x̄   (x_i − x̄)²   (x_i − x̄)³   (x_i − x̄)⁴
14    −45        2025          −91125        4100625
35    −24        576           −13824        331776
45    −14        196           −2744         38416
55    −4         16            −64           256
55    −4         16            −64           256
56    −3         9             −27           81
56    −3         9             −27           81
65    6          36            216           1296
87    28         784           21952         614656
89    30         900           27000         810000
92    33         1089          35937         1185921
Σ     0          5656          −22770        7083364
Hence, dividing each column total by n = 11,

m₁ = 0 / 11 = 0,   m₂ = 5656 / 11 = 514.2,   m₃ = −22770 / 11 = −2070.0,   m₄ = 7083364 / 11 = 643942.2.
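The same four moments in a few lines of Python (central_moments is my helper name, not from the notes):

```python
def central_moments(data, rmax=4):
    """First rmax sample central moments m_r = (1/n) * sum (x_i - xbar)^r."""
    n = len(data)
    xbar = sum(data) / n
    return [sum((x - xbar) ** r for x in data) / n for r in range(1, rmax + 1)]

data = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]   # Example 21 data
m1, m2, m3, m4 = central_moments(data)
print(round(m1, 1), round(m2, 1), round(m3, 1), round(m4, 1))
# -> 0.0 514.2 -2070.0 643942.2
```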
argh!!
The central moments can also be computed from the raw moments m′_r = Σ x_i^r / n (moments about zero):

m₁ = 0,
m₂ = m′₂ − (m′₁)²,
m₃ = m′₃ − 3 m′₂ m′₁ + 2 (m′₁)³,
m₄ = m′₄ − 4 m′₃ m′₁ + 6 m′₂ (m′₁)² − 3 (m′₁)⁴.

These are, to be honest, not easy to remember. However, there is a trick; the first two moments are simple. In the 3rd and 4th moments, just remember that in each term the subscripts of the raw moments, counting repeated factors by their powers, add up to the order (subscript) of the central moment.
The formulas for the population version of the moments are the same, but are expressed in Greek symbols, e.g. raw moments µ′₁, µ′₂, µ′₃, µ′₄ and central moments µ₁, µ₂, µ₃, µ₄. Solve Example 21 using this method.
Skewness: This is one of the two ratio measures for which we actually need the moments. Skewness is just the opposite of symmetry: if we fold a symmetric distribution about its center, the two parts coincide exactly. Skewness is defined so that it is a pure measure (free of the units of measurement), and it is zero for a perfectly symmetric distribution. We first give some other formulas for calculating skewness before using the one based on central moments.

Pearson coefficient of skewness:

Sk = 3(Mean − Median) / SD.

Its value usually varies within [−3, 3]. For symmetric distributions it is zero. A negative value indicates negative skewness and a positive value indicates positive skewness.

Bowley's coefficient of skewness:

Sk = (Q₁ + Q₃ − 2·Median) / (Q₃ − Q₁).

Its value varies within [−1, 1]. It is also zero for symmetric distributions and non-zero for asymmetric distributions.

A moment-based measure of skewness is defined as

Sk = µ₃ / (µ₂)^{3/2}   for population data,
Sk = m₃ / (m₂)^{3/2}   for sample data.

Its value, which usually varies between −2 and 2, is also zero for symmetric distributions.
In a positively skewed distribution mean > median > mode, in a negatively skewed distribution mean < median < mode, whereas in a symmetric distribution mean = median = mode.
[Figure: three relative-frequency histograms illustrating positive skewness (+ve Sk), negative skewness (−ve Sk), and approximately zero skewness.]
Kurtosis: Kurtosis measures the degree of peakedness or flatness of a distribution. The peakedness of a distribution usually has three levels: lepto-kurtic (highly peaked), when most of the values are near the mode of the distribution; platy-kurtic (flat-topped), when most of the values lie in the tails of the distribution; and meso-kurtic (normal), neither very peaked nor very flat.

It is usually calculated by

K = µ₄ / (µ₂)²   for population data,
K = m₄ / (m₂)²   for sample data.

For meso-kurtic distributions K is equal to 3.0, for lepto-kurtic distributions K > 3, and for platy-kurtic distributions K < 3.

[Figure: lepto-kurtic, meso-kurtic, and platy-kurtic density curves overlaid for comparison.]
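Both moment-based ratio measures can be computed for the Example 21 data (a sketch; the helper name is mine):

```python
def skewness_kurtosis(data):
    """Moment-based skewness m3 / m2^1.5 and kurtosis m4 / m2^2."""
    n = len(data)
    xbar = sum(data) / n
    m2, m3, m4 = (sum((x - xbar) ** r for x in data) / n for r in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2

data = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]   # Example 21 data
sk, k = skewness_kurtosis(data)
print(round(sk, 2), round(k, 2))  # -> -0.18 2.44
```

So for this small sample the moment-based Sk is slightly negative and K < 3 (platy-kurtic side); note that different skewness measures need not agree in sign on small samples.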
Practice exercises:
Have a look at Section 1.4 in Chapter 1 of J. L. Devore's Modern Mathematical Statistics with Applications.
Then solve questions 41-46, 48, 49 in Exercise 1.4.
Also calculate the first four raw and central moments for the above questions.
The next chapter is on probability theory, for which we will need to refresh our knowledge of elementary set theory: union, intersection, complement, Venn diagrams, etc.