Measures of Shape: Skewness and Kurtosis

Copyright © 2008–2016 by Stan Brown

Summary: You’ve learned numerical measures of center, spread, and outliers, but what about
measures of shape? The histogram can give you a general idea of the shape, but two numerical
measures of shape give a more precise evaluation: skewness tells you the amount and
direction of skew (departure from horizontal symmetry), and kurtosis tells you how tall and
sharp the central peak is, relative to a standard bell curve.

Why do we care? One application is testing for normality: many statistical inferences require
that a distribution be normal or nearly normal. A normal distribution has skewness and excess
kurtosis of 0, so if your distribution is close to those values then it is probably close to normal.

Contents:

• Skewness
  o Computing
  o Example 1: College Men’s Heights
  o Interpreting
  o Inferring
  o Estimating
• Kurtosis
  o Visualizing
  o Computing
  o Inferring
• Assessing Normality
  o Alternative Methods
• Example 2: Size of Rat Litters
• References
• What’s New

Skewness
The first thing you usually notice about a distribution’s shape is whether it has one mode (peak)
or more than one. If it’s unimodal (has just one peak), like most data sets, the next thing you
notice is whether it’s symmetric or skewed to one side. If the bulk of the data is at the left and
the right tail is longer, we say that the distribution is skewed right or positively skewed; if the
peak is toward the right and the left tail is longer, we say that the distribution is skewed left or
negatively skewed.

Look at the two graphs below. They both have μ = 0.6923 and σ = 0.1685, but their shapes are
different.
[Two graphs: Beta(α=4.5, β=2), skewness = −0.5370; and its mirror image 1.3846 − Beta(α=4.5, β=2), skewness = +0.5370.]

The first one is moderately skewed left: the left tail is longer and most of the distribution is at the
right. By contrast, the second distribution is moderately skewed right: its right tail is longer and
most of the distribution is at the left.

You can get a general impression of skewness by drawing a histogram (MATH200A part 1), but
there are also some common numerical measures of skewness. Some authors favor one, some
favor another. This Web page presents one of them. In fact, these are the same formulas that
Excel uses in its “Descriptive Statistics” tool in Analysis Toolpak, and in the SKEW( ) function.

You may remember that the mean and standard deviation have the same units as the original
data, and the variance has the square of those units. However, the skewness has no units: it’s a
pure number, like a z-score.

Computing

The moment coefficient of skewness of a data set is

(1) skewness: g1 = m3 / m2^(3/2)

where

m3 = ∑(x−x̅)³ / n   and   m2 = ∑(x−x̅)² / n

x̅ is the mean and n is the sample size, as usual. m3 is called the third moment of the data set. m2
is the variance, the square of the standard deviation.

The skewness can also be computed as g1 = the average value of z³, where z is the familiar z-
score, z = (x−x̅)/σ. Of course the average value of z is always zero, but what about the average of
z³? Suppose you have a few points far to the left of the mean, and a lot of points less far to the
right of the mean. Since cubing the deviations gives the big ones even greater weight, you’ll have
negative skewness. It works just the opposite if you have big deviations to the right of the mean.

You’ll remember that you have to compute the variance and standard deviation slightly
differently, depending on whether you have data for the whole population or just a sample. The
same is true of skewness. If you have the whole population, then g1 above is the measure of
skewness. But if you have just a sample, you need the sample skewness:

(2) sample skewness: G1 = [√(n(n−1)) / (n−2)] × g1

(The formula comes from Joanes and Gill 1998.)

Excel doesn’t concern itself with whether you have a sample or a population: its measure of
skewness is always G1, the sample skewness.
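
If you want to check these formulas in software other than Excel, here is a minimal Python sketch (the function names are my own, and NumPy is assumed) implementing equations (1) and (2). The second function should agree with Excel’s SKEW( ):

    import numpy as np

    def skewness_g1(x):
        """Population (moment) skewness, equation (1): g1 = m3 / m2^(3/2)."""
        x = np.asarray(x, dtype=float)
        d = x - x.mean()
        m2 = np.mean(d**2)   # population variance
        m3 = np.mean(d**3)   # third moment about the mean
        return m3 / m2**1.5

    def sample_skewness_G1(x):
        """Sample skewness, equation (2): G1 = sqrt(n(n-1))/(n-2) * g1."""
        n = len(x)
        return np.sqrt(n * (n - 1)) / (n - 2) * skewness_g1(x)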

Example 1: College Men’s Heights

Here are grouped data for heights of 100 randomly selected male students, adapted from Spiegel
and Stephens (1999, 68).

Height (inches)   Class Mark, x   Frequency, f
59.5–62.5              61              5
62.5–65.5              64             18
65.5–68.5              67             42
68.5–71.5              70             27
71.5–74.5              73              8

A histogram shows that the data are skewed left, not symmetric.

But how highly skewed are they, compared to other data sets? To answer this question, you have
to compute the skewness.

Begin with the sample size and sample mean. (The sample size was given, but it never hurts to
check.)

n = 5+18+42+27+8 = 100

x̅ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100

x̅ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100

x̅ = 6745 ÷ 100 = 67.45
Now, with the mean in hand, you can compute the skewness. (Of course in real life you’d
probably use Excel or a statistics package, but it’s good to know where the numbers come from.)

Class Mark, x   Frequency, f     xf    (x−x̅)   (x−x̅)²f    (x−x̅)³f
     61              5           305   −6.45    208.01    −1341.68
     64             18          1152   −3.45    214.25     −739.15
     67             42          2814   −0.45      8.51       −3.83
     70             27          1890    2.55    175.57      447.70
     73              8           584    5.55    246.42     1367.63
     ∑                          6745    n/a     852.75     −269.33
x̅, m2, m3                     67.45    n/a     8.5275     −2.6933

Finally, the skewness is

g1 = m3 / m2^(3/2) = −2.6933 / 8.5275^(3/2) = −0.1082

But wait, there’s more! That would be the skewness if you had data for the whole population.
But obviously there are more than 100 male students in the world, or even in almost any school,
so what you have here is a sample, not the population. You must compute the sample skewness:

G1 = [√(100×99) / 98] × [−2.6933 / 8.5275^(3/2)] = −0.1098
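
To double-check grouped-data arithmetic like this by machine, a short Python sketch (my own code, NumPy assumed) reproduces every number above:

    import numpy as np

    x = np.array([61, 64, 67, 70, 73], dtype=float)   # class marks
    f = np.array([ 5, 18, 42, 27,  8], dtype=float)   # frequencies

    n    = f.sum()                             # 100
    mean = (x * f).sum() / n                   # 67.45
    d    = x - mean
    m2   = (d**2 * f).sum() / n                # 8.5275
    m3   = (d**3 * f).sum() / n                # -2.6933
    g1   = m3 / m2**1.5                        # -0.1082  population skewness
    G1   = np.sqrt(n*(n - 1)) / (n - 2) * g1   # -0.1098  sample skewness
    print(g1, G1)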

Interpreting

If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail
of the distribution is longer than the left. If skewness is negative, the data are negatively skewed
or skewed left, meaning that the left tail is longer.

If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite
unlikely for real-world data, so how can you interpret the skewness number? Bulmer
(1979) — a classic — suggests this rule of thumb:

• If skewness is less than −1 or greater than +1, the distribution is highly skewed.
• If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
• If skewness is between −½ and +½, the distribution is approximately symmetric.
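
Bulmer’s rule of thumb is easy to encode; here is an illustrative sketch (the function name is my own):

    def describe_skewness(skew):
        """Classify a skewness value by Bulmer's rule of thumb."""
        if abs(skew) > 1:
            return "highly skewed"
        elif abs(skew) > 0.5:
            return "moderately skewed"
        else:
            return "approximately symmetric"

    print(describe_skewness(-0.1098))   # approximately symmetric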

With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Caution: This is an interpretation of the data you actually have. When you have data for the
whole population, that’s fine. But when you have a sample, the sample skewness doesn’t
necessarily apply to the whole population. In that case the question is, from the sample skewness,
can you conclude anything about the population skewness? To answer that question, see the next
section.

Inferring

Your data set is just one sample drawn from a population. Maybe, from ordinary sample
variability, your sample is skewed even though the population is symmetric. But if the sample is
skewed too much for random chance to be the explanation, then you can conclude that there is
skewness in the population.

But what do I mean by “too much for random chance to be the explanation”? To answer that,
you need to divide the sample skewness G1 by the standard error of skewness (SES) to get the
test statistic, which measures how many standard errors separate the sample skewness from
zero:

(3) test statistic: Zg1 = G1 / SES   where   SES = √[ 6n(n−1) / ((n−2)(n+1)(n+3)) ]

This formula is adapted from page 85 of Cramer (1997). (Some authors suggest √(6/n), but for
small samples that’s a poor approximation. And anyway, we’ve all got calculators, so you may
as well do it right.)

The critical value of Zg1 is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly
the 0.05 significance level.)

• If Zg1 < −2, the population is very likely skewed negatively (though you don’t know by how much).
• If Zg1 is between −2 and +2, you can’t reach any conclusion about the skewness of the population: it might be symmetric, or it might be skewed in either direction.
• If Zg1 > +2, the population is very likely skewed positively (though you don’t know by how much).
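
As a sketch in Python (function names mine; the SES formula is the one in equation (3)):

    import numpy as np

    def ses(n):
        """Standard error of skewness, from equation (3)."""
        return np.sqrt(6.0*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))

    def z_g1(G1, n):
        """Skewness test statistic: standard errors between G1 and zero."""
        return G1 / ses(n)

    print(z_g1(-0.1098, 100))   # about -0.45: no conclusion about the population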

Don’t mix up the meanings of this test statistic and the amount of skewness. The amount of
skewness tells you how highly skewed your sample is: the bigger the number, the bigger the
skew. The test statistic tells you whether the whole population is probably skewed, but not by
how much: the bigger the number, the higher the probability.

Estimating

GraphPad suggests a confidence interval for skewness:

(4) 95% confidence interval of population skewness = G1 ± 2 SES

I’m not so sure about that. Joanes and Gill point out that sample skewness is an unbiased
estimator of population skewness for normal distributions, but not others. So I would say,
compute that confidence interval, but take it with several grains of salt — and the further the
sample skewness is from zero, the more skeptical you should be.

For the college men’s heights, recall that the sample skewness was G1 = −0.1098. The sample
size was n = 100 and therefore the standard error of skewness is

SES = √[ (600×99) / (98×101×103) ] = 0.2414

The test statistic is

Zg1 = G1/SES = −0.1098 / 0.2414 = −0.45

This is quite small, so from this sample it’s impossible to say whether the population is
symmetric or skewed. Since the sample skewness is small, a confidence interval is probably
reasonable:

G1 ± 2 SES = −0.1098 ± 2×0.2414 = −0.1098 ± 0.4828 = −0.5926 to +0.3730.

You can give a 95% confidence interval of skewness as about −0.59 to +0.37, more or less.
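
In code, equation (4) is only a couple of lines (a self-contained sketch; the SES formula from equation (3) is repeated inline):

    import numpy as np

    G1, n = -0.1098, 100
    ses = np.sqrt(6.0*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))   # 0.2414
    print(G1 - 2*ses, G1 + 2*ses)   # about -0.59 to +0.37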

Kurtosis
The other common measure of shape is called the kurtosis. As skewness involves the third
moment of the distribution, kurtosis involves the fourth moment. The outliers in a sample,
therefore, have even more effect on the kurtosis than they do on the skewness; and in a
symmetric distribution both tails increase the kurtosis, unlike skewness, where they offset each
other.

You may remember that the mean and standard deviation have the same units as the original
data, and the variance has the square of those units. However, the kurtosis, like skewness, has no
units: it’s a pure number, like a z-score.

Traditionally, kurtosis has been explained in terms of the central peak. You’ll see statements like
this one: Higher values indicate a higher, sharper peak; lower values indicate a lower, less
distinct peak. Balanda and MacGillivray (1988) also mention the tails: increasing kurtosis is
associated with the “movement of probability mass from the shoulders of a distribution into its
center and tails.”

However, Peter Westfall (2014) has been on a bit of a crusade to change this perception, and I
think he makes a good case. We might say, following Wikipedia’s article on kurtosis (accessed
15 May 2016), that “higher kurtosis means more of the variance is the result of infrequent
extreme deviations, as opposed to frequent modestly sized deviations.” In other words, it’s
the tails that mostly account for kurtosis, not the central peak.

The reference standard is a normal distribution, which has a kurtosis of 3. In token of this, often
the excess kurtosis is presented: excess kurtosis is simply kurtosis−3. For example, the
“kurtosis” reported by Excel is actually the excess kurtosis.
• A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic.
• A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.
• A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to a normal distribution, its tails are longer and fatter, and often its central peak is higher and sharper.

Note that word “often” in describing changes in the central peak due to changes in the tails.
Westfall 2014 gives several illustrations of counterexamples.

Visualizing

Kurtosis is unfortunately harder to picture than skewness, but these illustrations, suggested by
Wikipedia, should help. All three of these distributions have mean of 0, standard deviation of 1,
and skewness of 0, and all are plotted on the same horizontal and vertical scale. Look at the
progression from left to right, as kurtosis increases.

[Three graphs, left to right: Uniform(min=−√3, max=√3), kurtosis = 1.8, excess = −1.2; Normal(μ=0, σ=1), kurtosis = 3, excess = 0; Logistic(α=0, β=0.55153), kurtosis = 4.2, excess = 1.2.]

Moving from the illustrated uniform distribution to a normal distribution, you see that the
“shoulders” have transferred some of their mass to the center and the tails. In other words, the
intermediate values have become less likely and the central and extreme values have become
more likely. The kurtosis increases while the standard deviation stays the same, because more of
the variation is due to extreme values.

Moving from the normal distribution to the illustrated logistic distribution, the trend continues.
There is even less in the shoulders and even more in the tails, and the central peak is higher and
narrower.

How far can this go? What are the smallest and largest possible values of kurtosis? The
smallest possible kurtosis is 1 (excess kurtosis −2), and the largest is ∞, as shown here:
[Two graphs: a discrete distribution with two equally likely values, kurtosis = 1, excess = −2; Student’s t (df=4), kurtosis = ∞, excess = ∞.]

A discrete distribution with two equally likely outcomes, such as winning or losing on the flip of
a coin, has the lowest possible kurtosis. It has no central peak and no real tails, and you could
say that it’s “all shoulder” — it’s as platykurtic as a distribution can be. At the other extreme,
Student’s t distribution with four degrees of freedom has infinite kurtosis. A distribution can’t
be any more leptokurtic than this.

You might want to look at Westfall’s (2014) Figure 2 for three quite different distributions with
identical kurtosis.

Computing

The moment coefficient of kurtosis of a data set is computed almost the same way as the
coefficient of skewness: just change the exponent 3 to 4 in the formulas:

(5) kurtosis: a4 = m4 / m2²   and   excess kurtosis: g2 = a4 − 3

where

m4 = ∑(x−x̅)⁴ / n   and   m2 = ∑(x−x̅)² / n

Again, the excess kurtosis is generally used because the excess kurtosis of a normal distribution
is 0. x̅ is the mean and n is the sample size, as usual. m4 is called the fourth moment of the data
set. m2 is the variance, the square of the standard deviation.

The kurtosis can also be computed as a4 = the average value of z⁴, where z is the familiar z-
score, z = (x−x̅)/σ. Of course the average value of z is always zero, but the average value of z⁴ is
always ≥ 1, and is larger when you have a few big deviations on either side of the mean than
when you have a lot of small ones.

Just as with variance, standard deviation, and skewness, the above is the final computation of
kurtosis if you have data for the whole population. But if you have data for only a sample, you
have to compute the sample excess kurtosis using this formula, which comes from Joanes and
Gill:
(6) sample excess kurtosis: G2 = [(n−1) / ((n−2)(n−3))] × [(n+1) g2 + 6]

Excel doesn’t concern itself with whether you have a sample or a population: its measure of
kurtosis in the KURT( ) function is always G2, the sample excess kurtosis.
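
Here is a Python sketch of equations (5) and (6), parallel to the skewness functions earlier (names mine, NumPy assumed). The second function should agree with Excel’s KURT( ):

    import numpy as np

    def kurtosis_a4(x):
        """Population kurtosis, equation (5): a4 = m4 / m2^2."""
        x = np.asarray(x, dtype=float)
        d = x - x.mean()
        return np.mean(d**4) / np.mean(d**2)**2

    def sample_excess_kurtosis_G2(x):
        """Sample excess kurtosis, equation (6)."""
        n  = len(x)
        g2 = kurtosis_a4(x) - 3.0
        return (n - 1) / ((n - 2)*(n - 3)) * ((n + 1)*g2 + 6.0)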

Example: Let’s continue with the example of the college men’s heights, and compute the
kurtosis of the data set. n = 100, x̅ = 67.45 inches, and the variance m2 = 8.5275 in² were
computed earlier.

Class Mark, x   Frequency, f    x−x̅    (x−x̅)⁴f
     61              5         −6.45     8653.84
     64             18         −3.45     2550.05
     67             42         −0.45        1.72
     70             27          2.55     1141.63
     73              8          5.55     7590.35
     ∑                          n/a     19937.60
     m4                         n/a       199.3760

Finally, the kurtosis is

a4 = m4 / m2² = 199.3760/8.5275² = 2.7418

and the excess kurtosis is

g2 = 2.7418−3 = −0.2582

But this is a sample, not the population, so you have to compute the sample excess kurtosis:

G2 = [99/(98×97)] × [101×(−0.2582) + 6] = −0.2091

This sample is slightly platykurtic: its peak is just a bit shallower than the peak of a normal
distribution.
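
Again you can verify the arithmetic with a few lines of Python (a self-contained sketch restating the class marks and frequencies from the table above):

    import numpy as np

    x = np.array([61, 64, 67, 70, 73], dtype=float)
    f = np.array([ 5, 18, 42, 27,  8], dtype=float)
    n = f.sum()
    d = x - (x*f).sum()/n
    m2 = (d**2 * f).sum() / n      # 8.5275, as before
    m4 = (d**4 * f).sum() / n      # 199.3760
    a4 = m4 / m2**2                # 2.7418
    G2 = (n - 1)/((n - 2)*(n - 3)) * ((n + 1)*(a4 - 3) + 6)   # -0.2091
    print(a4, a4 - 3, G2)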

Inferring

Your data set is just one sample drawn from a population. How far must the excess kurtosis be
from 0, before you can say that the population also has nonzero excess kurtosis?
The question is similar to the question about skewness, and the answers are similar too. You
divide the sample excess kurtosis by the standard error of kurtosis (SEK) to get the test
statistic, which tells you how many standard errors the sample excess kurtosis is from zero:

(7) test statistic: Zg2 = G2 / SEK   where   SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

The formula is adapted from page 89 of Cramer (1997). (Some authors suggest √(24/n), but for
small samples that’s a poor approximation. And anyway, we’ve all got calculators, so you may
as well do it right.)

The critical value of Zg2 is approximately 2. (This is a two-tailed test of excess kurtosis ≠ 0 at
approximately the 0.05 significance level.)

• If Zg2 < −2, the population very likely has negative excess kurtosis (kurtosis <3, platykurtic), though you don’t know how much.
• If Zg2 is between −2 and +2, you can’t reach any conclusion about the kurtosis: excess kurtosis might be positive, negative, or zero.
• If Zg2 > +2, the population very likely has positive excess kurtosis (kurtosis >3, leptokurtic), though you don’t know how much.

For the sample college men’s heights (n=100), you found excess kurtosis of G2 = −0.2091. The
sample is platykurtic, but is this enough to let you say that the whole population is platykurtic
(has lower kurtosis than the bell curve)?

First compute the standard error of kurtosis:

SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

n = 100, and the SES was previously computed as 0.2414.

SEK = 2 × 0.2414 × √[ (100²−1) / (97×105) ] = 0.4784

The test statistic is

Zg2 = G2/SEK = −0.2091 / 0.4784 = −0.44

You can’t say whether the kurtosis of the population is the same as or different from the kurtosis
of a normal distribution.
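
A sketch of the kurtosis test in Python (function name mine; the SES formula is inlined so the snippet stands alone):

    import numpy as np

    def sek(n):
        """Standard error of kurtosis, from equation (7)."""
        ses = np.sqrt(6.0*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))
        return 2.0 * ses * np.sqrt((n*n - 1.0) / ((n - 3)*(n + 5)))

    print(-0.2091 / sek(100))   # about -0.44: no conclusion about the population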

Assessing Normality
There are many ways to assess normality, and unfortunately none of them are without problems.

One test is the D'Agostino-Pearson omnibus test (D’Agostino and Stephens, 390–391; for an
online source see Öztuna, Elhan, Tüccar). I’ve implemented the D’Agostino-Pearson test in an
Excel workbook at Normality Check and Finding Outliers in Excel. It’s called an omnibus test
because it uses the test statistics for both skewness and kurtosis to come up with a single p-value
assessing whether this data set’s shape is too different from normal. The test statistic is

(8) DP = Zg1² + Zg2² follows χ² with df=2

You can look up the p-value in a table, or use χ²cdf on a TI-83 or TI-84.

This χ² test always has 2 degrees of freedom, regardless of sample size. D’Agostino doesn’t say
why explicitly, but an author of one of the other chapters says that it was an empirical match, and
that seems reasonable to me.

χ²cdf(2, 5.991464546) = 0.95, so if the test statistic is bigger than about 6 you would reject the
hypothesis of normality at the .05 level.

Caution: The D’Agostino-Pearson test has a tendency to err on the side of rejecting normality,
particularly with small sample sizes. David Moriarty, in his StatCat utility, recommends that you
don’t use D’Agostino-Pearson for sample sizes below 20.

For college students’ heights you had test statistics Zg1 = −0.45 for skewness and Zg2 = −0.44
for kurtosis. The omnibus test statistic is

DP = Zg1² + Zg2² = (−0.45)² + (−0.44)² = 0.3961

and the p-value for χ²(df=2) > 0.3961, from a table or a statistics calculator, is 0.8203. You
cannot reject the assumption of normality. (Remember, you never accept the null hypothesis, so
you can’t say from this test that the distribution is normal.) The histogram suggests normality,
and this test gives you no reason to reject that impression.
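
If you’d rather not use a table or a TI calculator, SciPy (assumed available) gives the same p-value; chi2.sf is the upper-tail probability of the χ² distribution:

    from scipy.stats import chi2

    Zg1, Zg2 = -0.45, -0.44
    DP = Zg1**2 + Zg2**2             # 0.3961
    print(DP, chi2.sf(DP, df=2))     # p is about 0.82: don't reject normality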

Alternative Methods

There’s no One Right Way to test for normality. One of many alternatives to the D’Agostino-
Pearson test is making a normal probability plot; the accompanying workbook does this.

TI calculator owners can use Normality Check on TI-83/84 or Normality Check on TI-89.

See also: The University of Surrey has a good survey of problems with normality tests, at How
do I test the normality of a variable’s distribution? (You have to scroll down about 2/3 of the
page to get to the relevant section, headed YOU THOUGHT THIS WAS GOING TO BE QUICK
AND SIMPLE BUT…) That page recommends using the test statistics Zg1 and Zg2 individually.

Example 2: Size of Rat Litters


For a second illustration of inferences about skewness and kurtosis of a population, I’ll use an
example from Bulmer:

Frequency distribution of litter size in rats, n = 815

Litter size    1    2    3    4    5    6    7    8    9   10   11   12
Frequency      7   33   58  116  125  126  121  107   56   37   25    4

I’ll spare you the detailed calculations, but you should be able to verify them by following
equation (1) and equation (2):

n = 815, x̅ = 6.1252, m2 = 5.1721, m3 = 2.0316

skewness g1 = 0.1727 and sample skewness G1 = 0.1730

The sample is roughly symmetric but slightly skewed right, which looks about right from the
histogram. The standard error of skewness is

SES = √[ (6×815×814) / (813×816×818) ] = 0.0856

Dividing the skewness by the SES, you get the test statistic

Zg1 = 0.1730 / 0.0856 = 2.02

Since this is greater than 2, you can say that there is some positive skewness in the population.
Again, “some positive skewness” just means a figure greater than zero; it doesn’t tell us anything
more about the magnitude of the skewness.

If you go on to compute a 95% confidence interval of skewness from equation (4), you get
0.1730±2×0.0856 = 0.00 to 0.34.

What about the kurtosis? You should be able to follow equation (5) and compute a fourth
moment of m4 = 67.3948. You already have m2 = 5.1721, and therefore

kurtosis a4 = m4 / m2² = 67.3948 / 5.1721² = 2.5194

excess kurtosis g2 = 2.5194−3 = −0.4806

sample excess kurtosis G2 = [814/(813×812)] × [816×(−0.4806) + 6] = −0.4762

So the sample is moderately less peaked than a normal distribution. Again, this matches the
histogram, where you can see the higher “shoulders”.

What if anything can you say about the population? For this you need equation (7). Begin by
computing the standard error of kurtosis, using n = 815 and the previously computed SES of
0.0856:
SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

SEK = 2 × 0.0856 × √[ (815²−1) / (812×820) ] = 0.1711

and divide:

Zg2 = G2/SEK = −0.4762 / 0.1711 = −2.78

Since Zg2 is comfortably below −2, you can say that the distribution of all litter sizes is
platykurtic, less sharply peaked than the normal distribution. But be careful: you know that it is
platykurtic, but you don’t know by how much.

You already know the population is not normal, but let’s apply the D’Agostino-Pearson test
anyway:

DP = 2.02² + 2.78² = 11.8088

p-value = P( χ²(2) > 11.8088 ) = 0.0027

The test agrees with the separate tests of skewness and kurtosis: sizes of rat litters, for the entire
population of rats, are not normally distributed.
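
For completeness, here is the whole Example 2 calculation as one self-contained Python sketch (my own code; the SES and SEK formulas are written out inline, and SciPy supplies the χ² tail probability):

    import numpy as np
    from scipy.stats import chi2

    size = np.arange(1, 13, dtype=float)
    freq = np.array([7, 33, 58, 116, 125, 126, 121, 107, 56, 37, 25, 4],
                    dtype=float)

    n  = freq.sum()                        # 815
    d  = size - (size*freq).sum()/n        # deviations from the mean 6.1252
    m2 = (d**2 * freq).sum() / n           # 5.1721
    m3 = (d**3 * freq).sum() / n           # 2.0316
    m4 = (d**4 * freq).sum() / n           # 67.3948

    G1 = np.sqrt(n*(n - 1))/(n - 2) * m3/m2**1.5                    # 0.1730
    G2 = (n - 1)/((n - 2)*(n - 3)) * ((n + 1)*(m4/m2**2 - 3) + 6)   # -0.4762

    ses = np.sqrt(6.0*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))        # 0.0856
    sek = 2.0*ses*np.sqrt((n*n - 1.0) / ((n - 3)*(n + 5)))          # 0.1711

    DP = (G1/ses)**2 + (G2/sek)**2     # Zg1 = 2.02, Zg2 = -2.78, DP about 11.8
    print(chi2.sf(DP, df=2))           # about 0.003: reject normality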

References
Balanda, Kevin P., and H. L. MacGillivray. 1988.
“Kurtosis: A Critical Review”. The American Statistician 42(2), 111–119.
My thanks to Karl Ove Hufthammer for drawing this article to my attention.
Bulmer, M. G. 1979.
Principles of Statistics. Dover.
Cramer, Duncan. 1997.
Basic Statistics for Social Research. Routledge.
D’Agostino, Ralph B., and Michael A. Stephens. 1986.
Goodness-of-Fit Techniques. Dekker.
Joanes, D. N., and C. A. Gill. 1998.
“Comparing Measures of Sample Skewness and Kurtosis”. The Statistician 47(1): 183–
189.
Öztuna, Derya, Atilla Halil Elhan, and Ersöz Tüccar. 2006.
“Investigation of Four Different Normality Tests in Terms of Type 1 Error Rate and
Power under Different Distributions”. Turk J Med Sci 36(3): 171–176. Retrieved
15 May 2016 from
http://dergipark.ulakbim.gov.tr/tbtkmedical/article/download/5000030904/5000031141
(PDF)
Spiegel, Murray R., and Larry J. Stephens. 1999.
Theory and Problems of Statistics. 3d ed. McGraw-Hill.
Westfall, Peter H. 2014.
“Kurtosis as Peakedness, 1905–2014. R.I.P.” The American Statistician 68(3): 191–195.
Retrieved 15 May 2016 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321753/
