SKEWNESS
and
KURTOSIS
Arvin Buncad Alonzo, DIT
Professor
Objectives
▪ Discuss skewness and kurtosis
▪ Explain the different measures of skewness
▪ Calculate values using the measures of skewness and
kurtosis
▪ Use a software application to compute or show skewness
and kurtosis
SKEWNESS
▪ refers to the asymmetry or lack of symmetry in the shape of a
frequency distribution. (Morris Hamburg)
▪ "Measures of skewness tell us the direction and the extent of
skewness. In symmetrical distribution the mean, median and
mode are identical. The more the mean moves away from the
mode, the larger the asymmetry or skewness." (Simpson &
Kalka)
▪ "When a series is not symmetrical it is said to be
asymmetrical or skewed." (Croxton & Cowden)
▪ Simply put: a lack of symmetry
SKEWNESS
▪ when a distribution is not symmetrical (or is asymmetrical) it is
called a skewed distribution
Symmetrical Distribution
-values of mean, median, mode coincide.
-spread of frequencies is the same on both sides of the
center point of the curve
Asymmetrical Distribution
-values of mean, median and mode do not coincide.
-distribution can be either positively skewed or negatively skewed
Positively Skewed Distribution
-value of the mean is the greatest and that of the mode the least; the
median lies in between the two.
Negatively Skewed Distribution
-value of the mode is the greatest and that of the mean the least; the
median lies in between the two.
Test of Skewness
▪ Skewness is present if:
▪ values of mean, median and mode do not coincide.
▪ When the data are plotted on a graph they do not give the
normal bell-shaped form i.e. when cut along a vertical line
through the center the two halves are not equal.
▪ The sum of the positive deviations from the median is not equal
to the sum of the negative deviations.
▪ Quartiles are not equidistant from the median.
▪ Frequencies are not equally distributed at points of equal
deviation from the mode.
Test of Skewness
▪ Skewness is absent if:
▪ values of mean, median and mode coincide.
▪ Data when plotted on a graph give the normal bell-shaped
form.
▪ Sum of the positive deviations from the median is equal to the
sum of the negative deviations.
▪ Quartiles are equidistant from the median.
▪ Frequencies are equally distributed at points of equal
deviations from the mode
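The checks listed on these two slides can be tried out on a small data set. The sketch below is only an illustration: the two lists are hypothetical data, the function name `symmetry_checks` is made up for this example, and the quartiles come from Python's `statistics.quantiles`, whose default method places the quartiles at the (n+1)p-th position.

```python
from statistics import mean, median, quantiles

def symmetry_checks(data):
    """Apply three of the listed tests and report which ones hold."""
    m = median(data)
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' method
    pos_dev = sum(x - m for x in data if x > m)  # sum of positive deviations
    neg_dev = sum(m - x for x in data if x < m)  # sum of negative deviations
    return {
        "mean_equals_median": abs(mean(data) - m) < 1e-9,
        "deviation_sums_equal": abs(pos_dev - neg_dev) < 1e-9,
        "quartiles_equidistant": abs((q3 - m) - (m - q1)) < 1e-9,
    }

symmetric = [2, 3, 4, 5, 6]           # hypothetical symmetric data
skewed = [1, 2, 2, 3, 3, 3, 4, 15]    # hypothetical data with a long right tail

print(symmetry_checks(symmetric))  # every check holds: skewness absent
print(symmetry_checks(skewed))     # every check fails: skewness present
```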
Measures of Skewness
▪ Karl Pearson's measure
▪ Bowley’s measure
▪ Kelly’s measure
▪ Measure based on moments
Karl Pearson's measure of Skewness
▪ Skewness = Mean - Mode
▪ Coefficient of skewness #1: $Sk_1 = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$
▪ Coefficient of skewness #2: $Sk_2 = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
▪ In practice the coefficient of skewness usually lies between -1 and +1 (Sk2 can range from -3 to +3).
▪ Example 1: Find the skewness using Pearson 1 and 2 given the
following: Mean: 72 Median: 78 Mode: 80 StDev: 20.20
▪ $Sk_1 = \dfrac{72 - 80}{20.20} = -0.39604$
▪ $Sk_2 = \dfrac{3(72 - 78)}{20.20} = -0.89109$
Pearson’s Measure of Skewness
▪ If the mean is greater than the mode, the distribution is positively skewed.
▪ If the mean is less than the mode, the distribution is negatively skewed.
▪ If the mean is greater than the median, the distribution is positively skewed.
▪ If the mean is less than the median, the distribution is negatively skewed.
▪ A skewness value of zero (0) means no skewness at all.
Source: https://www.statisticshowto.datasciencecentral.com/pearson-mode-skewness/
Example 1
Given the following scores of 10 students in an examination,
calculate Karl Pearson's coefficients of skewness.
Scores: 46, 40, 37, 38, 37, 35, 43, 36, 37, 41
Mean = 39
Mode = 37
Median = 37.5
Standard deviation = 3.4641
$Sk_1 = \dfrac{39 - 37}{3.4641} = 0.57735$
$Sk_2 = \dfrac{3(39 - 37.5)}{3.4641} = 1.299$
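The same figures can be reproduced with Python's standard `statistics` module. This is a minimal sketch; it assumes the standard deviation on the slide is the sample standard deviation (divisor n - 1), which is what yields 3.4641 for these scores.

```python
from statistics import mean, median, mode, stdev

scores = [46, 40, 37, 38, 37, 35, 43, 36, 37, 41]

avg = mean(scores)      # 39
med = median(scores)    # 37.5
mod = mode(scores)      # 37
sd = stdev(scores)      # sample SD (divisor n - 1), about 3.4641

sk1 = (avg - mod) / sd          # Pearson's first coefficient
sk2 = 3 * (avg - med) / sd      # Pearson's second coefficient
print(round(sk1, 5), round(sk2, 3))  # 0.57735 1.299
```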
Example 2
▪ Given the following table, compute the measure of
skewness using the mean, median, mode and the
standard deviation
X f
10-20 18
20-30 30
30-40 40
40-50 55
50-60 38
60-70 20
70-80 16
Solution
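One possible way to work this example is sketched below. It rests on several assumptions that the slide does not state: the mean and standard deviation are computed from class midpoints, the standard deviation uses divisor N, the median is interpolated at the (n+1)/2-th item (the position rule used elsewhere in these slides), and the mode uses the usual modal-class interpolation formula.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in the table
classes = [(10, 20, 18), (20, 30, 30), (30, 40, 40), (40, 50, 55),
           (50, 60, 38), (60, 70, 20), (70, 80, 16)]

n = sum(f for _, _, f in classes)
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Mean and standard deviation from class midpoints (assumption: divisor N)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)

def interpolate(position):
    """l1 + ((l2 - l1) / f1) * (m - c) for the class holding item m."""
    cum = 0  # c: cumulative frequency before the current class
    for lo, hi, f in classes:
        if cum + f >= position:
            return lo + (hi - lo) / f * (position - cum)
        cum += f
    raise ValueError("position beyond the last class")

median = interpolate((n + 1) / 2)

# Mode by modal-class interpolation (assumption)
i = freqs.index(max(freqs))
lo, hi, f1 = classes[i]
f0 = freqs[i - 1] if i > 0 else 0
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
mode = lo + (hi - lo) * (f1 - f0) / (2 * f1 - f0 - f2)

sk1 = (mean - mode) / sd
sk2 = 3 * (mean - median) / sd
print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f} sd={sd:.2f}")
print(f"Sk1={sk1:.4f} Sk2={sk2:.4f}")
```

Under these assumptions both coefficients come out slightly negative, so the table would be read as very mildly negatively skewed.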
Bowley’s Measure of Skewness
▪ Based on quartile values
▪ Skewness = $\dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
▪ Q3 – upper quartile
▪ Q1 – lower quartile
▪ M – median
▪ Values vary between -1 and +1.
▪ The numerator Q3 + Q1 - 2M is the absolute measure of skewness; dividing by Q3 - Q1 gives the relative coefficient.
▪ Based on the middle 50% of the observations in the data set.
Example 2
Compute the measure of skewness of the data below.

Class          Frequency
Less than 50   40
50-100         80
100-150        130
150-200        60
Above 200      30

Solution

Class          Frequency   Cum. Freq.
Less than 50   40          40
50-100         80          120
100-150        130         250
150-200        60          310
Above 200      30          340

Position of Q1 = (n+1)/4 = 85.25; it lies in the 50-100 class
Q1 = l1 + ((l2-l1)/f1)(m-c) = 50 + (50/80)(85.25 - 40) = 78.28
Position of M = (n+1)/2 = 170.5; it lies in the 100-150 class
M = l1 + ((l2-l1)/f1)(m-c) = 100 + (50/130)(170.5 - 120) = 119.42
Position of Q3 = 3((n+1)/4) = 255.75; it lies in the 150-200 class
Q3 = l1 + ((l2-l1)/f1)(m-c) = 150 + (50/60)(255.75 - 250) = 154.79

Bowley's Coefficient of Skewness
= (Q3 + Q1 - 2M) / (Q3 - Q1)
= (154.79 + 78.28 - 2(119.42)) / (154.79 - 78.28)
= -0.0754 (negatively skewed)
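The interpolation step l1 + ((l2-l1)/f1)(m-c) can be sketched in Python as follows. The helper name `value_at` is made up for this illustration, and the outer limits 0 and 250 given to the two open-ended classes are assumptions; they do not affect the result because none of the three positions falls in those classes.

```python
# (lower limit, upper limit, frequency); the bounds 0 and 250 of the two
# open-ended classes are assumptions and are not used by these quartiles.
classes = [(0, 50, 40), (50, 100, 80), (100, 150, 130),
           (150, 200, 60), (200, 250, 30)]

def value_at(position, classes):
    """l1 + ((l2 - l1) / f1) * (m - c) for the class containing item m."""
    cum = 0  # c: cumulative frequency before the current class
    for l1, l2, f1 in classes:
        if cum + f1 >= position:
            return l1 + (l2 - l1) / f1 * (position - cum)
        cum += f1
    raise ValueError("position beyond the last class")

n = sum(f for _, _, f in classes)          # 340 observations
q1 = value_at((n + 1) / 4, classes)        # lower quartile
m = value_at((n + 1) / 2, classes)         # median
q3 = value_at(3 * (n + 1) / 4, classes)    # upper quartile

bowley = (q3 + q1 - 2 * m) / (q3 - q1)
print(round(q1, 2), round(m, 2), round(q3, 2), round(bowley, 4))
```

Because the code keeps full precision for Q1, M and Q3, the last decimal of the coefficient can differ slightly from a hand calculation that uses the rounded quartiles.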
EXAMPLE 2
source: https://www.statisticshowto.datasciencecentral.com/bowley-skewness/
Survey of families on the number of pets they have

# of pets    # of families   Cum. Freq.
0            60              60
1            60              120
2            50              170
3            20              190
4            25              215
5            10              225
6 or more    5               230
Total        230

Position of Q1 = (n+1)/4 = 57.75; it lies in the 0 class, so Q1 = 0
Position of M = (n+1)/2 = 115.5; it lies in the 1 class, so M = 1
Position of Q3 = 3((n+1)/4) = 173.25; it lies in the 3 class, so Q3 = 3

Bowley's Coefficient of Skewness
= (Q3 + Q1 - 2M) / (Q3 - Q1)
= (3 + 0 - 2(1)) / (3 - 0)
= 0.33 (positively skewed)
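For discrete values such as these, no interpolation is needed: each quartile is simply the first value whose cumulative frequency reaches the required position. A minimal sketch, in which the helper name `value_at` is made up and the class "6 or more" is coded as 6 (an assumption that does not affect the quartiles):

```python
# (number of pets, number of families); "6 or more" is coded as 6 (assumption)
table = [(0, 60), (1, 60), (2, 50), (3, 20), (4, 25), (5, 10), (6, 5)]

def value_at(position, table):
    """Return the first value whose cumulative frequency reaches the position."""
    cum = 0
    for value, freq in table:
        cum += freq
        if cum >= position:
            return value
    raise ValueError("position beyond the last value")

n = sum(freq for _, freq in table)        # 230 families
q1 = value_at((n + 1) / 4, table)         # 57.75th item
m = value_at((n + 1) / 2, table)          # 115.5th item
q3 = value_at(3 * (n + 1) / 4, table)     # 173.25th item

bowley = (q3 + q1 - 2 * m) / (q3 - q1)
print(q1, m, q3, round(bowley, 2))  # 0 1 3 0.33
```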
Kelly’s Measure of Skewness
▪ Based on percentiles (90th and 10th percentile), so the lowest 10% and
the highest 10% of observations (20% in total) are excluded from the measure
▪ Sk = P90 + P10 – 2*P50, or equivalently Sk = D9 + D1 – 2*D5
▪ Coefficient of Skewness
using percentiles: $\dfrac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$
using deciles: $\dfrac{D_9 + D_1 - 2D_5}{D_9 - D_1}$
▪ Note: D1=P10, D5=P50, D9=P90
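Example 1

Ci       f     cf
10-20    18    18
20-30    30    48
30-40    40    88
40-50    55    143
50-60    38    181
60-70    20    201
70-80    16    217
Total    217

Position of P10 = 10((n+1)/100) = 21.8; the 21.8th item lies in the 20-30 class
P10 = l1 + ((l2-l1)/f1)(m-c) = 20 + (10/30)(21.8 - 18) = 21.27
Position of P50 = 50((n+1)/100) = 109; the 109th item lies in the 40-50 class
P50 = l1 + ((l2-l1)/f1)(m-c) = 40 + (10/55)(109 - 88) = 43.82
Position of P90 = 90((n+1)/100) = 196.2; the 196.2th item lies in the 60-70 class
P90 = l1 + ((l2-l1)/f1)(m-c) = 60 + (10/20)(196.2 - 181) = 67.6

Kelly's Coefficient of Skewness
= (P90 + P10 - 2*P50) / (P90 - P10)
= (67.6 + 21.27 - 2(43.82)) / (67.6 - 21.27)
= 0.0266 (slightly positively skewed)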
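The percentile positions and the interpolation formula l1 + ((l2-l1)/f1)(m-c) used in this example can be sketched in Python. The function name `percentile` is made up for this illustration, and the k-th percentile is assumed to sit at position k(n+1)/100, following the positions shown above.

```python
# (lower limit, upper limit, frequency) for each class interval
classes = [(10, 20, 18), (20, 30, 30), (30, 40, 40), (40, 50, 55),
           (50, 60, 38), (60, 70, 20), (70, 80, 16)]

def percentile(k, classes):
    """Interpolated k-th percentile, located at item k(n + 1)/100."""
    n = sum(f for _, _, f in classes)
    position = k * (n + 1) / 100
    cum = 0  # cumulative frequency before the current class
    for l1, l2, f1 in classes:
        if cum + f1 >= position:
            return l1 + (l2 - l1) / f1 * (position - cum)
        cum += f1
    raise ValueError("position beyond the last class")

p10, p50, p90 = (percentile(k, classes) for k in (10, 50, 90))

# Kelly's coefficient of skewness from the three percentiles
kelly = (p90 + p10 - 2 * p50) / (p90 - p10)
print(round(p10, 2), round(p50, 2), round(p90, 2), round(kelly, 4))
```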
Problem
A researcher monitors and keeps record of the time spent by
Faculty Members of Cagayan State University in preparing
their weekly lessons.

Time Spent (hrs)   No. of Employees
1                  5
2                  12
3                  24
4                  20
5                  30
6                  14
7 and more         15

EXAMPLE 2 (using Decile)
The solution is computed from the following table (N = 130).

Time Spent (hrs)   No. of Employees   Cum. Freq.
1                  8                  8
2                  13                 21
3                  24                 45
4                  23                 68
5                  30                 98
6                  17                 115
7 and more         15                 130
Sample (N)         130

Position of D1 = 1(N)/10 = 13; the 13th employee lies in 2 hrs, so D1 = 2
Position of D5 = 5(N)/10 or N/2 = 65; the 65th employee lies in 4 hrs, so D5 = 4
Position of D9 = 9(N)/10 = 117; the 117th employee lies in the "7 and more" class,
because the cumulative frequency at 6 hrs is only 115, so D9 = 7

Kelly's Coefficient of Skewness (decile form)
= (D9 + D1 - 2*D5) / (D9 - D1)
= (7 + 2 - 2(4)) / (7 - 2)
= 0.2 (positively skewed)
MOMENT
▪ used to indicate peculiarities of a frequency distribution
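▪ with moments we can measure the central tendency of a series, its
dispersion or variability, its skewness and the peakedness of the curve

Moments about the mean, individual items:
first moment: $\mu_1 = \dfrac{1}{N}\sum (X_i - \bar{X})$
second moment: $\mu_2 = \dfrac{1}{N}\sum (X_i - \bar{X})^2$
third moment: $\mu_3 = \dfrac{1}{N}\sum (X_i - \bar{X})^3$
fourth moment: $\mu_4 = \dfrac{1}{N}\sum (X_i - \bar{X})^4$

Moments about the mean, frequency distribution ($N = \sum f_i$):
first moment: $\mu_1 = \dfrac{1}{N}\sum f_i (X_i - \bar{X})$
second moment: $\mu_2 = \dfrac{1}{N}\sum f_i (X_i - \bar{X})^2$
third moment: $\mu_3 = \dfrac{1}{N}\sum f_i (X_i - \bar{X})^3$
fourth moment: $\mu_4 = \dfrac{1}{N}\sum f_i (X_i - \bar{X})^4$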
Example 1
▪ Find the first, second, third and fourth moments (about the origin) for the
following: 2, 3, 4, 5, 6
▪ First moment: $\bar{X} = \dfrac{\sum x}{N} = \dfrac{2+3+4+5+6}{5} = 4$
▪ Second moment: $\dfrac{\sum x^2}{N} = \dfrac{2^2+3^2+4^2+5^2+6^2}{5} = 18$
▪ Third moment: $\dfrac{\sum x^3}{N} = \dfrac{2^3+3^3+4^3+5^3+6^3}{5} = 88$
▪ Fourth moment: $\dfrac{\sum x^4}{N} = \dfrac{2^4+3^4+4^4+5^4+6^4}{5} = 454.8$
▪ Find also the first, second, third and fourth moments about
the mean
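Both sets of moments for this example can be checked with a few lines of Python. This is a small sketch using divisor N throughout; the function names `raw_moment` and `central_moment` are made up for the illustration.

```python
data = [2, 3, 4, 5, 6]
n = len(data)

def raw_moment(r):
    """r-th moment about the origin: sum(x**r) / N."""
    return sum(x ** r for x in data) / n

def central_moment(r):
    """r-th moment about the mean: sum((x - mean)**r) / N."""
    mean = raw_moment(1)
    return sum((x - mean) ** r for x in data) / n

raw = [raw_moment(r) for r in range(1, 5)]
central = [central_moment(r) for r in range(1, 5)]
print(raw)      # [4.0, 18.0, 88.0, 454.8]
print(central)  # [0.0, 2.0, 0.0, 6.8]
```

The first and third moments about the mean are zero here because the five values are symmetric around 4.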
Example 2
Kurtosis
▪ From a Greek word meaning bulginess.
▪ measures the degree of
peakedness of a frequency
distribution
▪ a measure of whether the data
are heavy-tailed (profusion of
outliers) or light-tailed (lack of
outliers) relative to a normal
distribution.
▪ Excess Kurtosis = kurtosis - 3
Kurtosis, types
▪ Mesokurtic. A distribution whose kurtosis is similar to that of the
normal distribution, i.e. whose excess kurtosis is zero.
▪ Leptokurtic. A distribution with kurtosis greater than that of
a mesokurtic distribution. The tails of such distributions are thick
and heavy. If the curve of a distribution is more peaked than a
mesokurtic curve, it is referred to as a leptokurtic curve.
▪ Platykurtic. A distribution with kurtosis less than that of a
mesokurtic distribution. The tails of such distributions are thinner. If the
curve of a distribution is less peaked than a mesokurtic curve,
it is referred to as a platykurtic curve.
Kurtosis, types
▪ A normal distribution has kurtosis exactly 3 (excess kurtosis
exactly 0). Any distribution with kurtosis ≈3 (excess ≈0) is
called mesokurtic.
▪ A distribution with kurtosis <3 (excess kurtosis <0) is
called platykurtic. Compared to a normal distribution, its tails
are shorter and thinner, and often its central peak is lower and
broader.
▪ A distribution with kurtosis >3 (excess kurtosis >0) is
called leptokurtic. Compared to a normal distribution, its tails
are longer and fatter, and often its central peak is higher and
sharper.
Kurtosis
▪ $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
▪ Where β2 = kurtosis
▪ 𝜇4 = fourth moment about the mean
▪ 𝜇2 = second moment about the mean
▪ Using the values on the table, we have:
$\beta_2 = \dfrac{3.8032}{1.16^2} = 2.8263971$
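The calculation and the three kurtosis types can be put together in a short Python sketch. The function names are made up for this illustration, and the classification compares β2 with exactly 3; in practice a value close to 3 would still be described as roughly mesokurtic.

```python
def kurtosis_beta2(mu4, mu2):
    """beta2 = fourth central moment / (second central moment squared)."""
    return mu4 / mu2 ** 2

def classify(beta2):
    """Name the kurtosis type from the excess kurtosis (beta2 - 3)."""
    excess = beta2 - 3
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"

beta2 = kurtosis_beta2(3.8032, 1.16)   # values quoted on the slide
print(round(beta2, 4), classify(beta2))  # 2.8264 platykurtic
```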
See more samples in excel file
Kurtosis
▪ Can also be based on quartiles and percentiles
▪ $K = \dfrac{Q}{P_{90} - P_{10}}$
▪ Where Q = (Q3-Q1)/2 (the semi-interquartile range), P90 = 90th percentile, P10 = 10th percentile
Application in Excel, watch on YouTube: https://youtu.be/iF4L3SZ-XPg
References
▪ Surinder Kundu (n.d.), An introduction to business statistics, a pdf file
▪ Sarang Narkhede, Understanding Descriptive Statistics, from website:
https://towardsdatascience.com/understanding-descriptive-statistics-c9c2b0641291
▪ https://365datascience.com/explainer-video/skewness-example/
▪ http://atozmath.com/example/StatsUG.aspx?he=e&q=5
▪ https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm