
All You Need To Know About Normal Distribution - Towards Data Science

This document discusses the normal distribution, a foundational concept in statistics. It explains what a normal distribution is, how it relates to probability distributions, and provides examples of when it occurs in nature and statistics problems. The central limit theorem is also mentioned as important context for understanding normal distributions.

30/3/2020 All You Need To Know About Normal Distribution - Towards Data Science


All You Need To Know About Normal Distribution


The most normal thing in statistics.

Devesh Poojari
Oct 6, 2019 · 14 min read

Normal Hill Top

Have you ever been told to draw a hill in your art class? If I asked you to imagine or
draw a hill, what is the first shape that comes to your mind?

https://towardsdatascience.com/all-you-need-to-know-about-normal-distribution-3f67df0691f8 1/19

source: MS Paint

Is it similar to the one I made above? If it is, then your art basics are good; if it
isn’t, then maybe you need to re-attend your art class ;p. I am talking about this
shape because it is the shape of the hero of this article, the “Normal Distribution”.

We see this shape often in nature; one reason people give is that it resembles a triangle,
and the triangle is often said to be the strongest shape. In a similar vein, the Normal
Distribution can be found in many places in statistics (almost everywhere) and is one
of the strongest ideas in statistics.

Let’s start this post with a quick game that will show that the Normal Distribution is also in
your mind.

How would you arrange the three bars in the picture below within 5–10 seconds of seeing
them?

Assume they are buildings and you are an architect, then arrange them.

Note: bars/buildings 2 and 3 are of the same height.


source: MS Paint

Keep your answer in the back of your mind for now. Or just arrange it on a piece of
paper.

What we’ll learn in this post:

1. Probability Distribution (because the Normal Distribution is one of them)

2. Normal Distribution (our hero)

3. Central Limit Theorem (you will see why when you get there)

Let’s start.

1. Probability Distribution (Basics)


I wrote “basics” in parentheses because many people fear this topic or try to ignore it more
than any other. Let me clarify that I will cover only the basics here, as that is all we
need to understand the Normal Distribution.

1.1 What is Probability Distribution


Recall that a random variable is a variable whose value is the outcome of a random
event. For example, a random variable could be the outcome of the roll of a die or the
flip of a coin.

A probability distribution is a list of all of the possible outcomes of a random variable
along with their corresponding probability values.

To give a concrete example, here is the probability distribution of a fair 6-sided die.

Probability distribution of a fair 6-sided die

To be explicit, this is an example of a discrete univariate probability distribution
with finite support. That’s a bit of a mouthful, so let’s try to break that statement down
and understand it.

Discrete = This means that if I pick any two consecutive outcomes, I can’t get an
outcome that’s in between. For example, if we consider 1 and 2 as outcomes of rolling a
six-sided die, then I can’t have an outcome in between them (e.g. I can’t have an outcome
of 1.5).

Univariate = This means that we only have one (random) variable. In this case, we only
have the outcome of the die roll. In contrast, if we have more than one variable then we
say that we have a multivariate distribution.

Finite support = This means that there is a limited number of outcomes. The support is
essentially the set of outcomes for which the probability distribution is defined. So the
support in our example is 1, 2, 3, 4, 5 and 6. Since this is not an infinite number of
values, the support is finite.

1.2 Probability mass functions: Discrete probability distributions


When we use a probability function to describe a discrete probability distribution we
call it a probability mass function (commonly abbreviated as pmf).

Remember that the probability of a random variable, which we denote with a capital
letter, X, taking on a value, denoted with a lowercase letter, x, is written as P(X=x). So
if we use the dice roll as our example random variable, we can write the probability of
the die landing on the number 3 as P(X=3) = 1/6.

A probability mass function, which we’ll call “f”, returns the probability of an outcome.
Therefore, a probability mass function is written as:

$$f(x) = P(X = x)$$

I know this is getting a little horrible and mathematical but bear with me. The equation
above says that the probability mass function “f” just returns the probability
of the outcome x.

Since a probability mass function returns probabilities, it must obey the rules of
probability. Namely, the probability mass function outputs values between 0 and 1
inclusive, and the sum of the probability mass function (pmf) over all outcomes is equal
to 1. Mathematically we can write these two conditions as

$$0 \le f(x) \le 1 \quad \text{for every outcome } x, \qquad \sum_{x} f(x) = 1$$

So we’ve seen that we can write a discrete probability distribution as a table and as a
function. We can also represent the die roll example graphically


Graphical representation of the probability distribution for outcomes of rolling a fair six-sided die

Another example of a discrete probability distribution is the Bernoulli distribution, but the
word “basics” at the start will not allow me to get into it :)
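
The fair-die probability mass function from this section can be sketched in a few lines of Python. This is only an illustration: the names `support`, `pmf` and `f` are my own, and exact fractions are used so that the two rules of probability can be checked without rounding error.

```python
from fractions import Fraction

# Support of a fair six-sided die: the outcomes with non-zero probability.
support = [1, 2, 3, 4, 5, 6]

# Every face of a fair die has probability 1/6.
pmf = {x: Fraction(1, 6) for x in support}

def f(x):
    """Return P(X = x); outcomes outside the support have probability 0."""
    return pmf.get(x, Fraction(0))

# Rule 1: every probability lies between 0 and 1 inclusive.
all_valid = all(0 <= f(x) <= 1 for x in support)

# Rule 2: the probabilities over all outcomes sum to exactly 1.
total = sum(f(x) for x in support)

print(f(3))              # P(X = 3)
print(f(1.5))            # 1.5 is not a possible outcome of a die roll
print(all_valid, total)
```

Because the distribution is discrete, asking for `f(1.5)` simply gives 0: there is no outcome between 1 and 2.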

1.3 Probability density functions: Continuous probability distributions


Sometimes we are concerned with the probabilities of random variables that have
continuous outcomes. Examples include the height of an adult picked at random from a
population (a very popular example) or the amount of time that a taxi driver has to wait
before their next job. For these examples, the random variable is better described by a
continuous probability distribution.

When we use a probability function to describe a continuous probability distribution we
call it a probability density function (commonly abbreviated as pdf).

Probability density functions are slightly more complicated conceptually than
probability mass functions but don’t worry, we’ll get there. I think it’ll be easiest to start
with an example of a continuous probability distribution and then discuss the
properties from there.

One of the best examples of a continuous probability distribution is the “Normal
Distribution”, so we will start there and let both concepts become clear together.

2. Normal Distribution
The normal distribution is probably the most common distribution in all of probability
and statistics.


Left image: Normal Distribution. Right image: Normal Distribution on a hill.

2.1 What is Normal Distribution

Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous
probability distribution that is symmetric about the mean, showing that data near the
mean are more frequent in occurrence than data far from the mean. In graph form, the
normal distribution appears as a bell curve. In simple terms, a normally distributed
variable has a bell-shaped curve with its mean, median and mode all equal. (The reverse
does not always hold: some other symmetric, bell-shaped distributions share these
features.)

2.2 Mean and Standard Deviation


The Normal Distribution is a probability distribution that is completely determined by its
mean and standard deviation.

Mean (μ): The average of all points in the sample.

Standard Deviation (σ): How much the data typically deviate from the mean.

The Normal Distribution is easy to explain because:

1. The mean, median and mode are all equal.

2. Only the mean and the standard deviation are required to describe the entire distribution.

The values -3, -2, -1, 1, 2, 3 on the horizontal axis are numbers of standard deviations
from the mean.

Distribution of values in a Normal Distribution

Approximately 68 percent of the data falls within 1 standard deviation of the
mean

Approximately 95 percent of the data falls within 2 standard deviations of the mean

Approximately 99.7 percent of the data falls within 3 standard deviations of the
mean
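
These three percentages can be checked with Python’s standard library. The sketch below relies on a standard identity that is not part of the original post: for a normal variable, the probability of landing within k standard deviations of the mean equals erf(k/√2). The function name `prob_within` is my own.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The printed values are roughly 0.6827, 0.9545 and 0.9973, which round to the 68, 95 and 99.7 percent quoted above.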

Now comes the math behind Normal Distribution. Don’t get overwhelmed :O

2.3 Intuition Behind Normal Distribution


Take a deep breath and let’s dive in.

The probability density function for the normal distribution is defined as

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Probability density function of the Normal Distribution

Where the parameters (i.e. the symbols after the semicolon) represent the mean, μ,
(the point where the centre of the distribution is) and the standard deviation, σ, (how
spread out the distribution is) of the population.

Let’s simplify the above equation:

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

σ² is the variance of the distribution.

Variance (σ²): It is the average squared distance from the mean. The standard deviation is
the square root of the variance.

Also, (x-μ)/σ is the Z-score.

Z-score (z): The Z-score measures how many standard deviations away a point
is from the mean.

The final equation looks a lot simpler and uses more meaningful terms:

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{z^2}{2}\right), \qquad z = \frac{x-\mu}{\sigma}$$

If we set the mean to be equal to zero (μ=0) and the standard deviation equal to 1
(σ=1) then the distribution we get looks like this


Normal distribution with mean = 0 and standard deviation equal to 1
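
As a rough check of the density formula, here is a minimal Python sketch that evaluates it in two ways: once written with μ and σ, and once written with the variance and the Z-score. The function names `normal_pdf` and `normal_pdf_z` are my own, not from any library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density written with the mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_pdf_z(x, mu=0.0, sigma=1.0):
    """The same density written with the variance sigma**2 and the Z-score."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)

# Height of the bell curve at its centre for mu = 0 and sigma = 1.
print(round(normal_pdf(0.0), 4))

# Both forms agree for an arbitrary point and arbitrary parameters.
print(math.isclose(normal_pdf(1.3, 2.0, 0.5), normal_pdf_z(1.3, 2.0, 0.5)))

# A density value is not a probability: with a small sigma it exceeds 1.
print(round(normal_pdf(0.0, 0.0, 0.1), 4))
```

The peak of the curve with μ=0 and σ=1 is about 0.3989, which is 1/√(2π). With σ=0.1 the peak is about 3.989, a reminder that the height of a density is not itself a probability.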

The normal distribution is an example of a continuous univariate probability distribution
with infinite support. By infinite support, I mean that we can calculate values of the
probability density function for all outcomes between minus infinity and positive
infinity. In mathematics, we sometimes say that it’s supported on the whole real line.

Properties of a continuous probability distribution:


Normal Distribution
The first thing to notice is that the numbers on the vertical axis start at zero and go up.
This is a rule that a probability density function has to obey. Any output value from a
probability density function is greater than or equal to zero. In mathematical lingo, we
would say that the output is non-negative or write this mathematically as

$$f(x) \ge 0$$

Note: We will use the Normal Distribution as our running example of a continuous
probability distribution, so the properties below apply to both.

However, unlike probability mass functions, the output of a probability density function
is not a probability value. This is an incredibly important distinction.

To get the probability from a probability density function we need to find the area
under the curve. So from our example distribution with mean = 0 and standard
deviation = 1, we can find the probability that the outcome is between 0 and 1 by
finding the area shown in the image below

The area shaded is the probability of the outcome being between 0 and 1.

Mathematically we would write this as

$$\int_{0}^{1} f(x)\,dx = P(0 \le X \le 1)$$

We can read this as “the integral of the probability density function between 0 and 1 (on
the left-hand side) is equal to the probability that the outcome of the random variable is
between zero and 1 (on the right-hand side)”.
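
To make “area under the curve” concrete, the sketch below approximates the shaded area numerically. It assumes the plotted distribution is the one with mean 0 and standard deviation 1, and it uses the trapezoidal rule as one simple way of estimating an integral (my choice, not something the post prescribes). The helper names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h

# Area under the mean-0, sd-1 curve between 0 and 1.
area = integrate(normal_pdf, 0.0, 1.0)

# Reference value from the error function, for comparison.
exact = 0.5 * math.erf(1.0 / math.sqrt(2.0))

print(round(area, 4), round(exact, 4))
```

Both numbers come out at about 0.3413, so there is roughly a 34% chance that the outcome falls between 0 and 1.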

To calculate the area under the curve we use the cumulative distribution function
(CDF), which gives the area under the curve from negative infinity up to a particular
value. Let me briefly show how it is done, using the data shown below.

The normal distribution and its cumulative distribution are shown below.

Left image: Normal Probability Distribution. Right image: Cumulative Distribution Function.

Suppose we want the probability of a value lying within 1 standard deviation of the mean
(here the mean is 0 and the standard deviation is 10).

1. On the Normal Distribution curve, it is the area highlighted in yellow. Using the CDF, we
need the difference between the area hatched in red (from negative infinity to
10) and the area highlighted in green (from negative infinity to -10).

2. On the CDF curve, it is the difference between the y-axis values at 10 and -10, which is
84.1 - 15.9 = 68.2%.

Perfect, this coincides with the property of the Normal Distribution that
approximately 68 percent of the data falls within 1 standard deviation of the mean.
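
The same subtraction can be done in Python. This sketch assumes a mean of 0 and a standard deviation of 10, as in the example, and builds the normal CDF from `math.erf`, which is a standard identity rather than anything specific to the Excel file. The name `normal_cdf` is my own.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve from minus infinity up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 0.0, 10.0  # assumed values matching the example above

upper = normal_cdf(mu + sigma, mu, sigma)  # area from -infinity to 10
lower = normal_cdf(mu - sigma, mu, sigma)  # area from -infinity to -10
within_one_sd = upper - lower

print(f"{upper:.3f} - {lower:.3f} = {within_one_sd:.3f}")
```

The unrounded difference is closer to 68.3%; the 68.2% above comes from subtracting the already rounded values 84.1 and 15.9.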

The above image is a snip from an Excel file provided by Khan Academy. You can play
with it by changing the different fields to understand the Normal Distribution better. You can
download the Excel file from here.

Forgive me, as I haven’t explicitly covered integrals and how they work because they are
outside the scope of this post. If you don’t know about them, all you need to know for
the moment is that integration is a mathematical method for finding the area under a curve,
which in this case gives us the probabilities of outcomes.

We’ve now seen another property of the Normal Distribution. Namely, the probability
between two outcomes, let’s say ‘a’ and ‘b’, is the integral of the probability density
function between those two points (this is equivalent to finding the area under the
curve produced by the probability density function between the points ‘a’ and ‘b’).
Mathematically this is

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$

Remember that we still have to follow the rules of probability distributions, namely the
rule that says that the probabilities of all possible outcomes sum to 1. We can cover all
possible values if we set our range from ‘minus infinity’ to ‘positive infinity’. Therefore
the following has to be true for the function to be a probability density function

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

This says that the area under the curve between minus infinity and positive infinity is
equal to 1.

An important thing to know about continuous probability distributions (and something
that may be weird to come to terms with conceptually) is that the probability of the
random variable being equal to one specific outcome is 0. For example, if we try to get the
probability that the outcome is equal to the number 2 we would get

$$P(X = 2) = \int_{2}^{2} f(x)\,dx = 0$$

This may seem weird conceptually but if you understand calculus then it should make a
little more sense. I’m not going to cover calculus in this post. Instead, what I want you to
take away from this fact is that we can only talk meaningfully about probabilities between
two values. Or we can ask about the probability of an outcome being greater than or
less than a specific value. Asking about the probability of an outcome being exactly equal
to a specific value always gives zero.
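
One way to get a feel for this is to shrink an interval around 2 and watch the probability shrink with it. The sketch below assumes the distribution with mean 0 and standard deviation 1 and reuses the error-function form of the CDF; the helper names are my own.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve from minus infinity up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b):
    """P(a <= X <= b) for the mean-0, sd-1 normal distribution."""
    return normal_cdf(b) - normal_cdf(a)

# Probability of landing in an ever narrower window around 2.
widths = (0.5, 0.1, 0.01, 0.001)
shrinking = [prob_between(2 - eps, 2 + eps) for eps in widths]
for eps, p in zip(widths, shrinking):
    print(f"P({2 - eps} <= X <= {2 + eps}) = {p:.6f}")

# A window of zero width has zero area, so P(X = 2) is 0.
point = prob_between(2, 2)

# The whole real line has area 1.
total = prob_between(-math.inf, math.inf)

print(point, total)
```

The narrower the window, the smaller the area, and a window with no width at all has no area. The full range from minus infinity to positive infinity gives 1, as the rule above requires.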

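One important question comes to mind: why are so many variables
approximately normally distributed? What is the logic?

A large part of the answer is the “Central Limit Theorem” (CLT), one of the most famous
theorems in statistics. Many real-world quantities are sums or averages of lots of small
independent effects, and the CLT says such averages tend towards a Normal Distribution.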

3. Central Limit Theorem


It is one of the most valuable theorems in statistics. Here is what it says.

The central limit theorem (CLT) is simple. It just says that with a large sample size, sample
means are approximately normally distributed, whatever the shape of the population
distribution (as long as the draws are independent and the population has a finite variance).

The CLT is a core idea in statistics that lets you use data to evaluate your ideas, even with
incomplete information; hence it is one of the pillars of hypothesis testing, an important
tool for statistical decision making.

Let’s take a six-sided die with the probability distribution shown below, with mean μ and
standard deviation σ.


From the figure, we can see that this die can never land on 2 or 5.

Let’s take samples of size 4 from this distribution, that is, each sample consists of 4
random draws from the population.

Sample1(S1) = (1,1,3,6)

its mean is x1=(1+1+3+6)/4 = 2.75

S2 = (3,4,3,1)

x2 = (3+4+3+1)/4 = 2.75

S3 = (1,1,6,6)

x3 = (1+1+6+6)/4 = 3.5
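
The three sample means above, and the tally that a frequency plot of them is built from, can be reproduced with a short Python sketch. The variable names are my own.

```python
from collections import Counter
from statistics import mean

# The three samples of size 4 listed above.
samples = {
    "S1": (1, 1, 3, 6),
    "S2": (3, 4, 3, 1),
    "S3": (1, 1, 6, 6),
}

# Mean of each sample.
sample_means = {name: mean(values) for name, values in samples.items()}

# How often each mean occurs: this is the height of each bar in a frequency plot.
frequency = Counter(sample_means.values())

print(sample_means)
print(frequency)
```

The mean 2.75 occurs twice (S1 and S2) and 3.5 occurs once (S3).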

We could continue to sample indefinitely, but these three will suffice here. Now we will
draw a frequency plot of the sample means.

The two blocks at 2.75 are x1 and x2, and x3 is at 3.5. This is just a small example to show
how the frequency plot is built. If we had taken a large number of samples, each block
would look like a small dot.

Let me show you how the plot may look when there is a large number of samples. The
three dots are our x1, x2 and x3. This distribution is called the Sampling Distribution of the
Mean. Its mean is the same as the population mean μ, and its standard deviation is σ/sqrt(n).

Sampling distribution of the mean for n = 4

If we increase the sample size n, the standard deviation of the sample mean (written
σ_x̄ below) shrinks in proportion to 1/sqrt(n):

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Hence as n increases the curve becomes taller and thinner. If n=20 then the Sampling
Distribution of the sample mean is


Hence, as sample size → ∞ (read as “sample size tends to infinity”),

the sampling distribution of the mean → Normal Distribution
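
A small simulation can illustrate this section. The exact probabilities of the die in the figure are not given in the text, so the sketch below assumes that the faces 1, 3, 4 and 6 are equally likely and that 2 and 5 never occur; that assumption, the seed, the number of draws and all names are mine.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: faces 1, 3, 4 and 6 are equally likely (2 and 5 never occur).
faces = [1, 3, 4, 6]
mu = statistics.mean(faces)        # population mean
sigma = statistics.pstdev(faces)   # population standard deviation

def simulate_sample_means(n, draws=20_000):
    """Draw `draws` samples of size n and return the mean of each sample."""
    return [sum(random.choices(faces, k=n)) / n for _ in range(draws)]

results = {}
for n in (4, 20):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n}: mean of sample means={results[n][0]:.3f}, "
          f"sd of sample means={results[n][1]:.3f}, "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}")
```

Under these assumptions the mean of the sample means stays close to μ = 3.5 for both sample sizes, while their spread tracks σ/sqrt(n) and is visibly smaller for n=20 than for n=4. Plotting a histogram of `means` would show the taller, thinner bell described above.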

Bonus
Real-life data rarely, if ever, follow a perfect normal distribution. The skewness and
kurtosis coefficients measure how different a given distribution is from a normal
distribution.

1. Skew
The skewness measures the asymmetry of a distribution. The normal distribution is
symmetric and has a skewness of zero. If the distribution of a data set has a skewness
less than zero, or negative skewness, then the left tail of the distribution is longer than
the right tail; positive skewness implies that the right tail of the distribution is longer
than the left.

Negative skew is also called left skew: the long tail is on the left, and the mean is
typically pulled to the left of the peak. In positive (right) skew, the mean is typically to the
right of the peak.
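
Skewness can be computed by hand as the third standardized moment. The sketch below uses that common definition (other variants with small-sample corrections exist) on simulated data; the exponential distribution is just my choice of a right-tailed example, and the function name is hypothetical.

```python
import random

def skewness(data):
    """Skewness as the third standardized moment: m3 / m2**1.5."""
    n = len(data)
    centre = sum(data) / n
    m2 = sum((x - centre) ** 2 for x in data) / n
    m3 = sum((x - centre) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

random.seed(1)  # fixed seed so the run is repeatable

symmetric = [random.gauss(0, 1) for _ in range(50_000)]           # normal data
right_tailed = [random.expovariate(1.0) for _ in range(50_000)]   # long right tail
left_tailed = [-x for x in right_tailed]                          # mirror image

print(f"normal:       {skewness(symmetric):+.2f}")
print(f"right-tailed: {skewness(right_tailed):+.2f}")
print(f"left-tailed:  {skewness(left_tailed):+.2f}")
```

The normal sample gives a value near zero, the right-tailed sample a clearly positive value, and its mirror image the same value with a negative sign.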

2. Kurtosis

The kurtosis statistic measures the thickness of the tail ends of a distribution in relation
to the tails of the normal distribution. Distributions with large kurtosis exhibit tail data
exceeding the tails of the normal distribution (e.g., five or more standard deviations
from the mean). Distributions with low kurtosis exhibit tail data that is generally less
extreme than the tails of the normal distribution. The normal distribution has a kurtosis
of three, which indicates the distribution has neither fat nor thin tails. Therefore, if an
observed distribution has a kurtosis greater than three, the distribution is said to have
heavy tails when compared to the normal distribution. If the distribution has a kurtosis
of less than three, it is said to have thin tails when compared to the normal distribution.
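
The “kurtosis of three” convention corresponds to the fourth standardized moment, which can be sketched as follows. The uniform and Laplace distributions are my own choices of thin-tailed and heavy-tailed examples, and the Laplace sample is generated as the difference of two exponential draws.

```python
import random

def kurtosis(data):
    """Kurtosis as the fourth standardized moment: m4 / m2**2 (normal = 3)."""
    n = len(data)
    centre = sum(data) / n
    m2 = sum((x - centre) ** 2 for x in data) / n
    m4 = sum((x - centre) ** 4 for x in data) / n
    return m4 / m2 ** 2

random.seed(2)  # fixed seed so the run is repeatable
size = 50_000

normal = [random.gauss(0, 1) for _ in range(size)]
thin_tails = [random.uniform(-1, 1) for _ in range(size)]
# The difference of two exponential draws follows a Laplace distribution.
heavy_tails = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(size)]

print(f"normal:  {kurtosis(normal):.2f}")
print(f"uniform: {kurtosis(thin_tails):.2f}")
print(f"laplace: {kurtosis(heavy_tails):.2f}")
```

The normal sample lands close to 3, the uniform sample well below it (thin tails) and the Laplace sample well above it (heavy tails). Some libraries report “excess kurtosis”, which subtracts 3 so that the normal distribution scores 0.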

Now, coming to the answer of the quick game: is your answer the same as the one below? If
yes, then believe me, the Normal Distribution is in your mind, and no, I am not a mind reader.

Pheww!!!

That’s all for today.


Thank you for reading and staying with me for so long. I am going to be writing more
beginner-friendly posts in the future too. For more posts like this, you can follow me on
Medium. I welcome feedback and constructive criticism and can be reached on
LinkedIn.
Thanks for your time. Keep Learning and Keep growing.

Thank You!

Happy Learning :)
