Normal Distribution

The document provides an overview of normal distribution, its significance in probability and statistics, and its applications in data science and machine learning. It explains the characteristics of normal distribution, including the bell-shaped curve and the Central Limit Theorem, which states that the sum of many random variables tends to be normally distributed. Additionally, it discusses methods to check and transform data to achieve normality, as well as potential problems associated with assuming normality in various contexts.

Uploaded by

varmakdc


Normal distribution

Dhanya N.M.
Taxonomy of Probability Distributions

Discrete probability distributions

–Binomial distribution
–Multinomial distribution
–Poisson distribution
–Hypergeometric distribution

Continuous probability distributions

–Normal distribution
–Standard normal distribution
–Gamma distribution
–Exponential distribution
–Chi-square distribution
–Lognormal distribution
–Weibull distribution
Normal Distribution

– What is so special about the normal probability distribution?

– Why do so many data science and machine learning articles revolve around the normal
probability distribution?
Agenda

– What is a probability distribution?

– What does normal distribution mean?
– Which variables exhibit a normal distribution?
– How do you check the distribution of your data set in Python?
– How do you make a variable normally distributed in Python?
– Problems with normality
A Little Background First

– The normal distribution is also known as the Gaussian distribution.
– It is named after the mathematician Carl Friedrich Gauss.
– Simple predictive models are usually the most widely used models because they can
be explained and are well understood.
– The normal distribution is simple, and that simplicity makes it extremely popular.
What Does Probability Distribution Mean?
Let me explain by putting the building blocks in place first.
– Consider the predictive models we might be interested in building in our data science
projects.
– If we want to predict a variable accurately, the first task is to understand the
underlying behaviour of our target variable.
– To do that, we first determine the possible outcomes of the target variable and whether
those outcomes are discrete (distinct, countable values) or continuous (any value within a range).
– For example, if we are estimating the behaviour of a die, the first step is to know that
it can take any whole value from 1 to 6 (discrete).
– The next step is to assign probabilities to the events (values).
If a value cannot occur, it is assigned a probability of 0%.
The higher the probability, the more likely the event is to occur.
– For instance, we can repeat an experiment a large number of times and note the
values we observe for the variable.
– We can then group the values into categories (buckets).
– For each bucket, we record the number of times the variable took the value of
that bucket.
– For example, we can throw a die 10,000 times; as a die can take 6 possible
values, we create 6 buckets.
– We then record the number of occurrences of each value.
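The die experiment above can be sketched in Python. This is only an illustration: the function name roll_die_counts and the fixed seed are assumptions made here, not part of the original slides.

```python
import random
from collections import Counter

def roll_die_counts(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and count each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    # Make sure every face has a bucket, even if it never came up.
    return {face: counts.get(face, 0) for face in range(1, 7)}

counts = roll_die_counts(10_000)
for face, count in counts.items():
    # The relative frequency estimates the probability of each face (about 1/6).
    print(face, count, round(count / 10_000, 3))
```

With enough rolls, each bucket's relative frequency settles close to 1/6, which is the probability distribution of a fair die.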
Probability Distribution

– We can plot these counts as a chart, and with enough observations the outline forms a curve.

– This curve is known as the probability distribution curve; the probability distribution
of the variable describes how likely the variable is to take each value.
– Once we understand how the values are distributed, we can estimate the probabilities
of events using formulas (known as probability distribution functions).
– As a result, we can understand the variable's behaviour better.
– The shape of a probability distribution is characterised by its moments, such as the mean,
standard deviation, skewness and kurtosis.
– The probabilities of all possible values add up to 100%.
– There are many probability distributions; the most widely used one is known as the
“normal distribution”.
Let’s Now Move On To Normal Probability Distribution
– If you plot the probability distribution and it forms a symmetric bell-shaped curve in
which the mean, mode and median are equal, the variable is a candidate for a normal
distribution. These properties are necessary but not sufficient, so further checks are still needed.
– An example of a bell-shaped normal distribution curve: [figure not reproduced]
It is important to understand and estimate the probability distribution of
your target variable.
The following variables are approximately normally distributed:
– Height of a population
– Blood pressure of adult humans
– Position of a particle that experiences diffusion
– Measurement errors
– Residuals in regression
– Shoe size of a population
– Amount of time it takes employees to reach home
– A large number of educational measures
– Additionally, many other variables around us can be treated as normal with x%
confidence, where x < 100.
What Is Normal Distribution?
– A normal distribution is a distribution that is fully determined by
two parameters: its mean and its standard deviation.
– Mean: the average value of all the points in the sample.
– Standard deviation: how much the points in the data set typically
deviate from the mean.
This characteristic makes the distribution extremely simple for
statisticians to work with, and a variable that is known to be normally
distributed can therefore be forecast with well-quantified uncertainty.
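As a small sketch of the "two parameters" idea, the Python standard library can estimate both from a sample. The height values below are made-up numbers used only for illustration.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical sample of heights in centimetres (made-up numbers).
heights_cm = [162.0, 170.5, 168.2, 175.1, 181.3, 166.4, 172.8, 169.9]

sample_mean = fmean(heights_cm)
sample_sd = stdev(heights_cm)  # sample standard deviation (n - 1 denominator)

# from_samples estimates the same two parameters and wraps them in a distribution.
fitted = NormalDist.from_samples(heights_cm)
print(round(fitted.mean, 2), round(fitted.stdev, 2))
```

Once those two numbers are known, every other property of the fitted normal curve follows from them.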
Normal Distribution Is Simply … The Normal Behaviour That We Are Just So Familiar With
– What is remarkable is that when you look at the probability
distributions of many variables in nature, a large share of them
approximately follow a normal distribution.
– The normal distribution is simple to explain. The reasons are:
– The mean, mode and median of the distribution are equal.
– We only need the mean and standard deviation to describe the entire
distribution.
But how are so many variables approximately normally distributed? What is the logic behind it?
– The idea comes from a theorem: when you add up a large number of independent random
variables, each with finite variance, the distribution of their sum (or average) is
very close to normal, whatever the individual distributions look like.
– A person's height is a random variable that depends on many other random
variables, such as the amount of nutrition the person consumes, the environment
they live in, their genetics and so on. The combined effect of these
variables ends up being very close to normal.
– This is known as the Central Limit Theorem.
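The theorem can be checked with a small simulation. The sketch below assumes uniform(0, 1) draws as the underlying non-normal variables; the choice of 30 terms and 20,000 samples is arbitrary.

```python
import random
from statistics import fmean, pstdev

def sums_of_uniforms(n_terms, n_samples, seed=0):
    """Return n_samples values, each the sum of n_terms uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

sums = sums_of_uniforms(n_terms=30, n_samples=20_000)
mu, sigma = fmean(sums), pstdev(sums)

# For a normal variable, about 68% of values lie within one standard deviation.
share_within_1sd = sum(abs(s - mu) <= sigma for s in sums) / len(sums)
print(round(mu, 2), round(sigma, 2), round(share_within_1sd, 3))
```

Although each individual draw is flat (uniform), the sums cluster around 15 in a bell shape, and the share within one standard deviation comes out close to the normal value.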
This brings us to the core of the article:
– We saw in the section above that the sum of many independent random variables
is approximately normally distributed.
– If we plot the normal distribution density function, its curve has the following
characteristics:
Characteristics

– The bell-shaped curve above has a mean of 100 and a standard deviation of 1.
– The mean is the center of the curve. This is the highest point of the curve, as
values close to the mean are the most likely.
– The curve is symmetric: half of the probability lies on each side of the mean.
– The total area under the curve is the total probability of all of the values that
the variable can take.
– The total area under the curve is therefore 100%.
Characteristics
– Approximately 68.3% of all of the points lie within 1 standard deviation of the
mean (from -1 to +1 standard deviation).
– About 95.4% of all of the points lie within 2 standard deviations of the
mean.
– About 99.7% of all of the points lie within 3 standard deviations of the
mean.
– This allows us to estimate how volatile a variable is and, given a
confidence level, the range in which its value is likely to fall.
– For instance, in the bell-shaped curve above, there is a 68.3% chance
that the value of the variable will be between 99 and 101.
– Imagine the confidence you can now have when making future decisions with
that information!
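These percentages can be reproduced with Python's statistics.NormalDist. The helper share_within is a name introduced here for illustration; the mean of 100 and standard deviation of 1 are the values of the curve described above.

```python
from statistics import NormalDist

# The curve described above: mean 100, standard deviation 1.
curve = NormalDist(mu=100, sigma=1)

def share_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(k, round(100 * share_within(curve, k), 1))
```

The same three shares hold for any mean and standard deviation, because the rule depends only on the distance from the mean measured in standard deviations.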
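Normal Probability Distribution Function
– The probability density function of the normal distribution with mean μ and standard deviation σ is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
– The probability density function describes the relative likelihood of a continuous random
variable taking a value close to x. The probability of any single exact value is zero;
probabilities come from areas under the curve.

– The normal distribution is a bell-shaped curve where mean = mode = median.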
– If you plot the probability distribution curve from its probability density
function, the area under the curve over a given range gives the probability of the
target variable falling in that range.
– The probability density function is defined by a small number of parameters; for the
normal distribution these are the mean and the standard deviation of the variable.
– We can use this function to find the chance of a random variable taking a value
within a range. For instance, we could record the daily returns of a stock, group
them into appropriate buckets and then estimate the probability of the stock making
a 20–40% gain in the future.
– The larger the standard deviation, the more volatile the variable.
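A minimal sketch of these two ideas, density and area over a range, using only the standard library. The function names and the return figures (mean 8%, standard deviation 15%) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(low, high, mu, sigma):
    """Area under the normal curve between low and high."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical yearly stock returns: mean 8%, standard deviation 15%.
mu, sigma = 0.08, 0.15
print(round(normal_pdf(0.08, mu, sigma), 3))          # height of the curve at the mean
print(round(prob_between(0.20, 0.40, mu, sigma), 3))  # chance of a 20-40% gain
```

Note that the density at a point can exceed 1 when the standard deviation is small; only the area over a range is a probability.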
How Do I Find Feature Distribution In Python?
– The simplest method I follow is to load all of the features into a
data frame and then run this script:
– Use the Python pandas library:
– DataFrame.hist(bins=10)
– # Make a histogram of each numeric column of the DataFrame.

– It shows us the empirical distributions of all of the numeric variables.
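For readers without pandas at hand, the binning that a 10-bin histogram performs can be sketched with the standard library. This is a dependency-free illustration, not how pandas is implemented; histogram_counts and the simulated feature are assumptions introduced here.

```python
import random

def histogram_counts(values, bins=10):
    """Split the range of values into equal-width bins and count values per bin."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

rng = random.Random(0)
feature = [rng.gauss(50, 10) for _ in range(1_000)]  # hypothetical feature
for count in histogram_counts(feature, bins=10):
    print("#" * (count // 10), count)
```

A roughly symmetric text histogram with a single peak in the middle bins is a first visual hint that a feature may be close to normal.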

What Does It Mean For A Variable To Have Normal Distribution?

– What is even more fascinating is that once you add together a large number of independent
random variables with differing distributions, the new variable ends up with an approximately
normal distribution. This is the Central Limit Theorem.
– Normally distributed variables stay normally distributed under addition and linear scaling.
For instance, if A and B are two independent variables with normal distributions then:
– A + B is normally distributed
– a × A + b is normally distributed for any constants a ≠ 0 and b
– Note that the product A × B of two normal variables is, in general, not normally distributed.
– As a result, it is straightforward to forecast such a variable and find the probability of it
falling within a range of values, because the probability distribution function is well known.
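The behaviour of sums can be illustrated with statistics.NormalDist, which supports adding two independent normal variables. The parameters of A and B below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical independent normal variables A and B.
a = NormalDist(mu=10, sigma=3)
b = NormalDist(mu=5, sigma=4)

# Adding two NormalDist objects treats them as independent:
# the means add and the variances add.
total = a + b
print(total.mean, total.stdev)  # 15.0 and 5.0, since sqrt(3**2 + 4**2) = 5

# A linear rescaling of a normal variable is also normal.
scaled = a * 2 + 1
print(scaled.mean, scaled.stdev)  # 21.0 and 6.0
```

The standard deviations do not simply add: it is the variances (9 and 16) that add, giving a variance of 25 for the sum.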
What If The Sample Distribution Is Not Normal?

– You can often transform a feature so that its distribution is closer to normal.

– I have used a number of techniques to make a feature normally distributed:
1. Linear Transformation

– Once we gather a sample for a variable, we can compute the Z-score by linearly
transforming the sample:
– Calculate the mean μ
– Calculate the standard deviation σ
– For each value x, compute Z using: $z = \frac{x - \mu}{\sigma}$
– Note that this rescales the data to mean 0 and standard deviation 1 but does not change
the shape of the distribution.
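The three steps can be sketched as follows. The function name z_scores and the sample values are made up for illustration, and the population standard deviation is used as an assumption (a sample standard deviation would work equally well).

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are equal, so z-scores are undefined")
    return [(x - mu) / sigma for x in values]

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
print([round(z, 2) for z in z_scores(sample)])
```

For this sample the mean is 5 and the standard deviation is 2, so the value 9 maps to a z-score of 2.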
2. Using Box-Cox Transformation

– You can use the SciPy package for Python to transform data towards a normal distribution.
The input data must be strictly positive:
– scipy.stats.boxcox(x, lmbda=None, alpha=None)
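To show what the transform does to each value, here is a dependency-free sketch of the Box-Cox formula for a fixed lambda. It is not SciPy's implementation: SciPy can also estimate lambda from the data when lmbda is None, which this sketch does not attempt. The function name box_cox and the data are assumptions.

```python
import math

def box_cox(values, lmbda):
    """Apply the Box-Cox transform with a fixed lambda to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]

skewed = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]  # made-up right-skewed data
print([round(v, 3) for v in box_cox(skewed, lmbda=0)])    # log transform
print([round(v, 3) for v in box_cox(skewed, lmbda=0.5)])  # square-root-like transform
```

Lambda values below 1 compress large values more than small ones, which is why the transform can pull in a long right tail.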
3. Using Yeo-Johnson Transformation
– Additionally, the Yeo-Johnson power transformation can be used; unlike Box-Cox, it also
accepts zero and negative values. Python’s scikit-learn provides the appropriate class:
– sklearn.preprocessing.PowerTransformer(method='yeo-johnson',
standardize=True, copy=True)
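The per-value Yeo-Johnson formula can be sketched without scikit-learn. As with the Box-Cox sketch, lambda is fixed by hand here, whereas PowerTransformer estimates it from the data and can standardise the result. The function name and data are assumptions.

```python
import math

def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a single value x for a fixed lambda."""
    if x >= 0:
        if lmbda == 0:
            return math.log(x + 1)
        return ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log(-x + 1)
    return -(((-x + 1) ** (2 - lmbda)) - 1) / (2 - lmbda)

data = [-3.0, -1.0, 0.0, 1.0, 3.0, 10.0]  # made-up data with negatives
print([round(yeo_johnson(v, lmbda=0.5), 3) for v in data])
```

With lambda equal to 1 the transform leaves every value unchanged, which is a quick sanity check for an implementation.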
Min-Max Normalization

– Min-max normalization (a common form of feature scaling) performs a linear
transformation on the original data: $x' = \frac{x - \min}{\max - \min}$.
– This technique maps all of the scaled data into the range [0, 1].
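A minimal sketch of min-max scaling; the function name min_max_scale and the input values are introduced here for illustration.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal, so min-max scaling is undefined")
    return [(v - low) / (high - low) for v in values]

print(min_max_scale([10.0, 15.0, 20.0, 30.0]))  # [0.0, 0.25, 0.5, 1.0]
```

Like the Z-score, this is a linear rescaling: it changes the range of the data but not the shape of its distribution.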
Problems With Normality

– As the normal distribution is simple and well understood, it is also overused
in predictive projects.
– Assuming normality has its own flaws.
– For instance, we cannot assume that a stock price follows a normal
distribution, as the price cannot be negative.
– Therefore the stock price is often modelled with a lognormal distribution, which
ensures it is never below zero.
– Returns, on the other hand, can be negative, so returns can follow a
normal distribution.
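The link between normal returns and lognormal prices can be sketched with a small simulation. All parameters below (starting price, number of days, mean and standard deviation of the daily log return) are hypothetical, and the model is a deliberate simplification.

```python
import math
import random

def simulate_prices(start_price, n_days, mean_log_return, sd_log_return, seed=0):
    """Simulate a price path whose daily log returns are normally distributed."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_days):
        log_return = rng.gauss(mean_log_return, sd_log_return)  # can be negative
        prices.append(prices[-1] * math.exp(log_return))        # always positive
    return prices

# Hypothetical parameters, chosen only for illustration.
path = simulate_prices(start_price=100.0, n_days=250, mean_log_return=0.0, sd_log_return=0.02)
print(round(min(path), 2), round(max(path), 2))
```

Because each step multiplies the price by a positive factor, the price can fall a long way but can never cross zero, even though individual returns are negative.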
Problems With Normality

– It is not wise to assume that a variable follows a normal
distribution without any analysis.
– A variable may instead follow, for instance, a Poisson, Student's t or binomial
distribution, and falsely assuming that it follows a normal distribution can lead to
inaccurate results.
Problem

– The population distribution of SAT scores is normal with a mean of μ = 500 and
a standard deviation of σ = 100. Given this information about the population and
the known proportions for a normal distribution, we can determine the
probabilities associated with specific samples. For example, what is the
probability of randomly selecting an individual from this population who has an
SAT score greater than 700?
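One way to work this out in Python is with statistics.NormalDist, sketched below; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

z = (700 - sat.mean) / sat.stdev  # z-score of a 700 score
p_above_700 = 1 - sat.cdf(700)    # area in the upper tail beyond 700

print(z)                      # 2.0
print(round(p_above_700, 4))  # 0.0228
```

A score of 700 sits two standard deviations above the mean, so only about 2.3% of the population scores higher.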
Find probabilities
Answers

– 0.1587
– 0.9335
– 0.3085
Problem

– It is known that IQ scores form a normal distribution with μ = 100 and σ =15.
Given this information, what is the probability of randomly selecting an
individual with an IQ score less than 120?

– 1. Transform the X values into z-scores.

– 2. Use the unit normal table to look up the proportions corresponding to the z-
score values.
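The two steps can be sketched in Python, with the standard normal cumulative distribution function standing in for the unit normal table. The variable names are assumptions made for this example.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Step 1: transform the X value into a z-score.
z = (120 - iq.mean) / iq.stdev

# Step 2: look up the proportion below that z-score
# (the standard normal CDF plays the role of the unit normal table).
p_below_120 = NormalDist().cdf(z)

print(round(z, 2), round(p_below_120, 4))
```

The z-score is about 1.33, and roughly 91% of the population has an IQ score below 120.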
– The highway department conducted a study measuring driving speeds on a
local section of interstate highway. They found an average speed of μ = 58 miles
per hour with a standard deviation of σ = 10. The distribution was
approximately normal.

– Given this information, what proportion of the cars are traveling between 55
and 65 miles per hour? Using probability notation, we can express the problem
as p(55 < X < 65).
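A sketch of the same two-step approach for a range, where the answer is the difference between two cumulative proportions. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

speeds = NormalDist(mu=58, sigma=10)

z_low = (55 - speeds.mean) / speeds.stdev   # -0.3
z_high = (65 - speeds.mean) / speeds.stdev  # 0.7

# p(55 < X < 65) is the area under the curve between the two z-scores.
standard = NormalDist()
p_between = standard.cdf(z_high) - standard.cdf(z_low)
print(round(p_between, 3))
```

Roughly 38% of the cars are traveling between 55 and 65 miles per hour.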
