All You Need To Know About Normal Distribution - Towards Data Science
Devesh Poojari
Oct 6, 2019 · 14 min read
Have you ever been told to draw a hill in your art class? If I asked you to imagine or
draw a hill, what is the first shape that comes to your mind?
https://towardsdatascience.com/all-you-need-to-know-about-normal-distribution-3f67df0691f8 1/19
30/3/2020 All You Need To Know About Normal Distribution - Towards Data Science
source: MS Paint
Is it similar to the one I drew above? If it is, then your art basics are good, and if it
isn’t, then maybe you need to re-attend your art class ;p. I am talking about this
shape because it is the shape of the hero of this article, the “Normal Distribution”.
We see this shape so often in nature, one of the reasons being that it is similar to a
triangle, and the triangle is said to be the strongest shape in nature. In a similar vein,
the Normal Distribution can be found in so many places in statistics (almost everywhere)
and is one of the strongest ideas in statistics.
Let’s start this post with a quick game that will show that the Normal Distribution is
also in your mind.
How would you arrange the three bars in the picture below, within 5–10 seconds of
seeing them?
Assume they are buildings and you are an architect, and then arrange them.
source: MS Paint
Keep your answer in the back of your mind for now. Or just arrange it on a piece of
paper.
3. Central Limit Theorem (You will get to know when you reach here)
Let’s Start,
To give a concrete example, here is the probability distribution of a fair 6-sided die.
Probability distribution of a fair 6-sided die
Discrete = This means that if I pick any two consecutive outcomes, I can’t get an
outcome that’s in between. For example, if we consider 1 and 2 as outcomes of rolling a
six-sided die, then I can’t have an outcome in between them (e.g. I can’t have an outcome
of 1.5).
Univariate = This means that we only have one (random) variable. In this case, we only
have the outcome of the die roll. In contrast, if we have more than one variable then we
say that we have a multivariate distribution.
Finite support = This means that there is a limited number of outcomes. The support is
essentially the set of outcomes for which the probability distribution is defined. So the
support in our example is 1, 2, 3, 4, 5 and 6. And since this is not an infinite number of
values, it means that the support is finite.
Remember that the probability of a random variable, which we denote with a capital
letter, X, taking on a value, denoted with a lowercase letter, x, is written as P(X=x). So
if we use the die roll as our example random variable, we can write the probability of
the die landing on the number 3 as P(X=3) = 1/6.
A probability mass function, which we’ll call “f”, returns the probability of an outcome.
Therefore, a probability mass function is written as:
$$f(x) = P(X = x)$$
I know this is getting a little horrible and mathematical but bear with me. The equation
that we see above says that the probability mass function “f” just returns the probability
of the outcome x.
Since a probability mass function returns probabilities, it must obey the rules of
probability. Namely, the probability mass function outputs values between 0 and 1
inclusive, and the sum of the probability mass function (pmf) over all outcomes is equal
to 1. Mathematically we can write these two conditions as
$$0 \le f(x) \le 1 \quad \text{for every outcome } x, \qquad \sum_{x} f(x) = 1$$
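To make the two rules concrete, here is a minimal Python sketch of the pmf of a fair six-sided die stored as a lookup table. The names `SUPPORT`, `pmf_table` and `f` are illustrative choices, not something from the original post.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die, stored as a table.
SUPPORT = (1, 2, 3, 4, 5, 6)
pmf_table = {x: Fraction(1, 6) for x in SUPPORT}


def f(x):
    """Return P(X = x); anything outside the support has probability 0."""
    return pmf_table.get(x, Fraction(0))


# Rule 1: every value of the pmf lies between 0 and 1 inclusive.
all_between_0_and_1 = all(0 <= f(x) <= 1 for x in SUPPORT)

# Rule 2: the pmf sums to 1 over all outcomes.
total = sum(f(x) for x in SUPPORT)

print(f(3))    # 1/6
print(f(1.5))  # 0, because 1.5 is not a possible outcome
print(all_between_0_and_1, total)
```

Using `Fraction` instead of floats keeps the sum exactly equal to 1, so the second rule can be checked without rounding error.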
So we’ve seen that we can write a discrete probability distribution as a table and as a
function. We can also represent the die roll example graphically
Graphical representation of the probability distribution for outcomes of rolling a fair six-sided die
An example of a discrete probability distribution is “The Bernoulli distribution”, but
since this post sticks to the basics, I will not get into it :)
One of the best examples of a probability density function is the “Normal Distribution”.
I think it’ll be easiest to start with the Normal Distribution so that both concepts become
clear simultaneously.
2. Normal Distribution
The normal distribution is probably the most common distribution in all of probability
and statistics.
Normal Distribution
Standard Deviation (σ): A measure of how far the data points typically deviate from the mean.
1. Mean, median and mode are all equal.
2. Only the mean and the standard deviation are required to describe the entire distribution.
Approximately 68 percent of the data falls within 1 standard deviation of the mean
Approximately 95 percent of the data falls within 2 standard deviations of the mean
Approximately 99.7 percent of the data falls within 3 standard deviations of the mean
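These percentages can be checked numerically. The sketch below relies on the standard identity that, for any normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2); the helper name `prob_within` is just an illustrative choice.

```python
import math


def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

Note that the result does not depend on the particular mean or standard deviation, which is why the rule holds for every normal distribution.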
Now comes the math behind the Normal Distribution. Don’t get overwhelmed :O
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Where the parameters (i.e. the symbols after the semicolon) represent the mean, μ,
(the point where the centre of the distribution is) and the standard deviation, σ, (how
spread out the distribution is) of the population.
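Variance (σ²): It is the average squared distance from the mean. The standard deviation
is the square root of the variance.
Z-score (z): The z-score measures how many standard deviations a point is away from
the mean:
$$z = \frac{x - \mu}{\sigma}$$
Written in terms of the variance and the z-score, the final equation looks a lot simpler
and has more meaningful terms:
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{z^2}{2}\right)$$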
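As a rough illustration, the normal density can be coded in two equivalent ways: directly from the mean and standard deviation, and via the z-score and variance. The function names below are illustrative, and the example values (μ = 100, σ = 15) are arbitrary assumptions rather than data from the post.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution written with mu and sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def z_score(x, mu, sigma):
    """How many standard deviations x lies away from the mean."""
    return (x - mu) / sigma


def normal_pdf_via_z(x, mu, sigma):
    """The same density, written with the z-score and the variance."""
    z = z_score(x, mu, sigma)
    variance = sigma ** 2
    return math.exp(-(z ** 2) / 2) / math.sqrt(2 * math.pi * variance)


# Arbitrary example values: both forms should agree.
a = normal_pdf(110, mu=100, sigma=15)
b = normal_pdf_via_z(110, mu=100, sigma=15)
print(math.isclose(a, b))

# Height of the standard normal curve (mu=0, sigma=1) at its peak.
print(f"{normal_pdf(0, 0, 1):.4f}")  # 0.3989
```

The peak height 0.3989 is a density, not a probability, which is why it is not a contradiction that densities of narrow distributions can exceed 1.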
If we set the mean to be equal to zero (μ=0) and the standard deviation equal to 1
(σ=1) then the distribution we get looks like this
However, unlike probability mass functions, the output of a probability density function
is not a probability value. This is an incredibly important distinction.
To get the probability from a probability density function we need to find the area
under the curve. So from our example distribution with mean = 0 and standard
deviation = 1, we can find the probability that the outcome is between 0 and 1 by
finding the area shown in the image below
The area shaded is the probability of the outcome being between 0 and 1.
$$\int_{0}^{1} f(x)\,dx = P(0 \le X \le 1)$$
We can read this as “the integral of the probability density function between 0 and 1 (on
the left-hand side) is equal to the probability that the outcome of the random variable is
between zero and 1 (on the right-hand side)”.
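For readers who prefer code to integral signs, the area can be approximated by slicing the region into thin trapezoids and adding them up. This is a minimal sketch assuming the standard normal distribution (mean 0, standard deviation 1); the function names and the number of slices are illustrative choices.

```python
import math


def standard_normal_pdf(x):
    """Density of the normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def area_under_curve(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf over [a, b] with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


p = area_under_curve(standard_normal_pdf, 0.0, 1.0)
print(f"P(0 <= X <= 1) is about {p:.4f}")  # about 0.3413
```

So roughly 34% of the outcomes of a standard normal variable fall between 0 and 1.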
To calculate the area under the curve we use the cumulative distribution function
(CDF), which gives the area under the curve from negative infinity to a particular
value. Let me give you a brief idea of how it is done, for the data shown below
The normal distribution and its cumulative distribution are shown below.
To find the probability of a value lying within 1 standard deviation of the mean (here
the standard deviation is 10):
2. On the CDF curve, it is the difference between the y-axis values at 10 and -10, which
is 84.1 - 15.9 = 68.2%.
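The same subtraction can be reproduced with Python's standard library. This sketch assumes the distribution in the spreadsheet has mean 0 (which the symmetric limits -10 and 10 suggest) and standard deviation 10.

```python
from statistics import NormalDist

# Assumption: mean 0 and standard deviation 10, matching the limits -10 and 10.
dist = NormalDist(mu=0, sigma=10)

lower = dist.cdf(-10)  # area under the curve from minus infinity to -10
upper = dist.cdf(10)   # area under the curve from minus infinity to 10

print(f"{lower:.3f} {upper:.3f} {upper - lower:.3f}")  # 0.159 0.841 0.683
```

The small difference from 68.2% comes only from rounding: subtracting the already rounded values 84.1 and 15.9 gives 68.2, while rounding after the subtraction gives 68.3.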
The above image is a snip from an Excel file provided by Khan Academy. You can play
with it by changing different fields to understand the Normal Distribution. You can
download the Excel file from here.
Forgive me as I haven’t explicitly covered integrals and how they work because they are
out of the scope of this post. If you don’t know about them then all you need to know for
the moment is that it’s a mathematical method for finding the area under a curve,
which in this case gives us the probabilities of outcomes.
We’ve now seen another property of the Normal Distribution. Namely, that the probability
between two outcomes, let’s say ‘a’ and ‘b’, is the integral of the probability density
function between those two points (this is equivalent to finding the area under the
curve produced by the probability density function between the points ‘a’ and ‘b’).
Mathematically this is
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$
Remember that we still have to follow the rules of probability distributions, namely the
rule that says that the probabilities of all possible outcomes sum to 1. We can cover all
possible values if we set our range from ‘minus infinity’ to ‘positive infinity’. Therefore
the following has to be true for the function to be a probability density function
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
This says that the area under the curve between minus infinity and positive infinity is
equal to 1.
This may seem weird conceptually but if you understand calculus then it should make a
little more sense. I’m not going to cover calculus in this post. Instead, what I want you to
take away from this fact is that we can only talk about probabilities occurring between
two values. Or we can ask about the probability of an outcome being greater than or
less than a specific value. We can’t ask about the probability of an outcome being equal
to a specific value.
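One way to build intuition for this: take an interval around a single value and keep shrinking it. The area, and therefore the probability, shrinks towards zero. The sketch below assumes a standard normal distribution and the arbitrary point x = 1; `prob_near` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def prob_near(x, half_width):
    """P(x - half_width <= X <= x + half_width) for a standard normal X."""
    return standard_normal.cdf(x + half_width) - standard_normal.cdf(x - half_width)


# Shrink the interval around x = 1 and watch the probability vanish.
half_widths = (0.5, 0.05, 0.005, 0.0005)
probs = [prob_near(1.0, h) for h in half_widths]
for h, p in zip(half_widths, probs):
    print(f"half-width {h}: probability {p:.6f}")
```

In the limit of a zero-width interval the area is exactly zero, which is why only ranges of values carry probability for a continuous variable.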
One important question comes to everyone’s mind: why are so many variables
approximately normally distributed? What is the logic?
The answer is the “Central Limit Theorem” (CLT), perhaps the most famous theorem in
statistics; the prevalence of the Normal Distribution is a consequence of the CLT.
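The central limit theorem (CLT) is simple. It just says that with a large sample size, sample
means are approximately normally distributed, whatever the shape of the population
distribution (as long as its variance is finite).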
The CLT is a core idea in statistics that lets you use data to evaluate your ideas, even with
incomplete information. Hence it is one of the pillars of hypothesis testing, an important
tool for statistical decision making.
From the figure, we can see that this die can never land on 2 or 5.
Let’s take samples of size 4 from this distribution, that is, each sample consists of 4
random draws from the population.
Sample 1 (S1) = (1,1,3,6)
x1 = (1+1+3+6)/4 = 2.75
S2 = (3,4,3,1)
x2 = (3+4+3+1)/4 = 2.75
S3 = (1,1,6,6)
x3 = (1+1+6+6)/4 = 3.5
We could continue sampling indefinitely, but these three will suffice here. Now we will
draw a frequency plot of the sample means.
The two blocks at 2.75 are x1 and x2, and x3 is at 3.5. This is just a small example to show
how the frequency plot is built. If we had taken a large number of samples, each block
would look like a small dot.
Let me show you how the plot may look when there is a large number of samples. The
three dots are our x1, x2, x3. This distribution is called the Sampling Distribution of the
Mean. Its mean is the same as the population mean μ, and its standard deviation is
σ/sqrt(n), where n is the sample size.
standard deviation of the sample mean ∝ 1/sqrt(n)
Hence as n increases the curve becomes taller and thinner. If n=20 then the Sampling
Distribution of the sample mean is
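The sampling experiment above is easy to simulate. The sketch below is an illustration under one loudly stated assumption: the die never shows 2 or 5 and its four remaining faces (1, 3, 4, 6) are equally likely. The original figure with the actual probabilities is not available, so treat the numbers as an example only.

```python
import random
import statistics

# Assumption: faces 2 and 5 never occur and the other four faces are equally
# likely. The real probabilities from the article's figure are unknown.
FACES = (1, 3, 4, 6)


def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n and return the mean of each sample."""
    return [statistics.fmean(rng.choices(FACES, k=n)) for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable
pop_mean = statistics.fmean(FACES)
pop_sd = statistics.pstdev(FACES)

results = {}
for n in (4, 20):
    means = sample_means(n, 20_000, rng)
    observed_mean = statistics.fmean(means)
    observed_sd = statistics.pstdev(means)
    predicted_sd = pop_sd / n ** 0.5  # sigma / sqrt(n)
    results[n] = (observed_mean, observed_sd, predicted_sd)
    print(f"n={n}: mean={observed_mean:.3f} sd={observed_sd:.3f} "
          f"predicted sd={predicted_sd:.3f}")
```

The observed mean of the sample means should stay close to the population mean, while their spread should track σ/sqrt(n) and shrink as n grows from 4 to 20, which is the taller and thinner curve described above.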
Bonus
Real-life data rarely, if ever, follow a perfect normal distribution. The skewness and
kurtosis coefficients measure how different a given distribution is from a normal
distribution.
1. Skew
The skewness measures the symmetry of a distribution. The normal distribution is
symmetric and has a skewness of zero. If the distribution of a data set has a skewness
less than zero, or negative skewness, then the left tail of the distribution is longer than
the right tail; positive skewness implies that the right tail of the distribution is longer
than the left.
Negative skew is also called left skew: the longer left tail typically pulls the mean to the
left of the peak. In positive (right) skew, the mean is typically to the right of the peak.
2. Kurtosis
The kurtosis statistic measures the thickness of the tail ends of a distribution in relation
to the tails of the normal distribution. Distributions with large kurtosis exhibit tail data
exceeding the tails of the normal distribution (e.g., five or more standard deviations
from the mean). Distributions with low kurtosis exhibit tail data that is generally less
extreme than the tails of the normal distribution. The normal distribution has a kurtosis
of three, which indicates the distribution has neither fat nor thin tails. Therefore, if an
observed distribution has a kurtosis greater than three, the distribution is said to have
heavy tails when compared to the normal distribution. If the distribution has a kurtosis
of less than three, it is said to have thin tails when compared to the normal distribution.
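Both coefficients can be computed as standardized moments: skewness is the third and kurtosis the fourth. The sketch below uses the simple population formulas (no small-sample correction) on simulated data; the exponential sample is an arbitrary choice of a right-skewed, heavy-tailed distribution for comparison.

```python
import random


def standardized_moment(data, order):
    """Average of ((x - mean) / sd) ** order, using the population sd."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    central = sum((x - mean) ** order for x in data) / n
    return central / variance ** (order / 2)


def skewness(data):
    return standardized_moment(data, 3)


def kurtosis(data):
    return standardized_moment(data, 4)


rng = random.Random(1)  # fixed seed so the run is repeatable
normal_data = [rng.gauss(0, 1) for _ in range(50_000)]
skewed_data = [rng.expovariate(1.0) for _ in range(50_000)]

print(f"normal:      skew={skewness(normal_data):.2f} kurtosis={kurtosis(normal_data):.2f}")
print(f"exponential: skew={skewness(skewed_data):.2f} kurtosis={kurtosis(skewed_data):.2f}")
```

For the normal sample the skewness should come out near zero and the kurtosis near three, while the exponential sample should show clearly positive skewness and a kurtosis well above three.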
Now, coming to the answer to the quick game: is your answer like the one below? If yes,
then believe me, the Normal Distribution is in your mind, and no, I am not a mind reader.
Pheww!!!
Medium. I welcome feedback and constructive criticism and can be reached on
LinkedIn.
Thanks for your time. Keep Learning and Keep growing.
Thank You!
Happy Learning :)