
Chapter 3

The document provides an introduction to the normal distribution and Poisson distribution. It discusses key properties of the normal distribution including being symmetrical, having an area of 1, and being described by the mean and standard deviation. It also discusses how the normal distribution can be used to approximate real-world data and calculate probabilities. For the Poisson distribution, it describes how it can model count data that occurs at a constant rate, and how it is characterized by the parameter lambda which represents the average number of events. It also shows examples of how to calculate probabilities using the Poisson distribution.

The normal distribution
Introduction to Statistics in Python

Maggie Matsui
Content Developer, DataCamp
What is the normal distribution?

Symmetrical
Area = 1
Curve never hits 0
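The three properties can be checked numerically. The following is a minimal sketch that assumes the standard normal distribution (mean 0, standard deviation 1, scipy's defaults); the evaluation points 1.5 and 10 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Symmetrical: the density is the same at equal distances on either side of the mean
left, right = norm.pdf(-1.5), norm.pdf(1.5)
print(left, right)

# Area = 1: numerically integrate the density over the whole real line
area, _ = quad(norm.pdf, -np.inf, np.inf)
print(area)

# Curve never hits 0: far out in the tail the density is tiny but still positive
far_tail = norm.pdf(10)
print(far_tail)
```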


Described by mean and standard deviation

Mean: 20
Standard deviation: 3

Standard normal distribution

Mean: 0
Standard deviation: 1



Areas under the normal distribution
About 68% of the area falls within 1 standard deviation of the mean
About 95% falls within 2 standard deviations
About 99.7% falls within 3 standard deviations
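These percentages can be reproduced with norm.cdf. A short sketch, assuming the standard normal distribution so that "k standard deviations" is simply the interval from -k to k:

```python
from scipy.stats import norm

# Area within k standard deviations of the mean = cdf(k) - cdf(-k)
areas = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
```

The printed values are roughly 0.6827, 0.9545 and 0.9973, which the slides round to 68%, 95% and 99.7%.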


Lots of histograms look normal
[Figure: a normal distribution curve beside a histogram of women's heights from NHANES]

Mean: 161 cm
Standard deviation: 7 cm


Approximating data with the normal distribution



What percent of women are shorter than 154 cm?
from scipy.stats import norm
norm.cdf(154, 161, 7)

0.158655

About 16% of women in the survey are shorter than 154 cm


What percent of women are taller than 154 cm?
from scipy.stats import norm
1 - norm.cdf(154, 161, 7)

0.841345





What percent of women are 154-157 cm?

norm.cdf(157, 161, 7) - norm.cdf(154, 161, 7)

0.1252



What height are 90% of women shorter than?
norm.ppf(0.9, 161, 7)

169.97086



What height are 90% of women taller than?
norm.ppf((1-0.9), 161, 7)

152.029
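
norm.ppf is the inverse of norm.cdf, and norm.sf gives the upper-tail area directly. A small sketch using the heights example (mean 161 cm, standard deviation 7 cm); the variable names are only illustrative:

```python
from scipy.stats import norm

mean, sd = 161, 7  # women's heights from the example

# ppf inverts cdf: feeding the 90th-percentile height back into cdf returns 0.9
height_90 = norm.ppf(0.9, mean, sd)
round_trip = norm.cdf(height_90, mean, sd)
print(height_90, round_trip)

# sf(x) is the area to the right of x, the same quantity as 1 - cdf(x)
taller_than_154 = norm.sf(154, mean, sd)
print(taller_than_154, 1 - norm.cdf(154, mean, sd))
```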



Generating random numbers
# Generate 10 random heights
norm.rvs(161, 7, size=10)

array([155.5758223 , 155.13133235, 160.06377097, 168.33345778,
       165.92273375, 163.32677057, 165.13280753, 146.36133538,
       149.07845021, 160.5790856 ])
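Random draws differ on every run unless a seed is fixed. As a sketch, scipy's rvs accepts a random_state argument; the seed value 42 is an arbitrary assumption, not something the slides use:

```python
from scipy.stats import norm

# Two draws with the same seed produce the same 10 heights
heights_a = norm.rvs(161, 7, size=10, random_state=42)
heights_b = norm.rvs(161, 7, size=10, random_state=42)

print(heights_a)
print((heights_a == heights_b).all())
```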


Let's practice!

The central limit theorem
Rolling the dice 5 times
import numpy as np
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])
# Roll 5 times (sample with replacement)
samp_5 = die.sample(5, replace=True)
# Print only the rolled values, without the Series index
print(samp_5.to_numpy())

[3 1 4 1 1]

np.mean(samp_5)

2.0


Rolling the dice 5 times
# Roll 5 times and take mean
samp_5 = die.sample(5, replace=True)
np.mean(samp_5)

4.4

samp_5 = die.sample(5, replace=True)
np.mean(samp_5)

3.8


Rolling the dice 5 times, 10 times
# Repeat 10 times
sample_means = []
for i in range(10):
    # Roll 5 times
    samp_5 = die.sample(5, replace=True)
    # Take the mean
    sample_means.append(np.mean(samp_5))
print(sample_means)

[3.8, 4.0, 3.8, 3.6, 3.2, 4.8, 2.6, 3.0, 2.6, 2.0]


Sampling distributions
Sampling distribution of the sample mean



100 sample means
sample_means = []
for i in range(100):
    sample_means.append(np.mean(die.sample(5, replace=True)))


1000 sample means
sample_means = []
for i in range(1000):
    sample_means.append(np.mean(die.sample(5, replace=True)))


Central limit theorem
The sampling distribution of a statistic such as the sample mean becomes closer to the normal distribution as the number of trials in each sample increases.

* Samples should be random and independent
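A rough simulation can illustrate the theorem. This sketch swaps the pandas sampling loop for NumPy's random generator for speed; the seed, the helper name sample_means_for, and the sample sizes 5, 30 and 100 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed so the run is repeatable
die = np.arange(1, 7)

def sample_means_for(n_rolls, n_samples=10_000):
    """Roll the die n_rolls times, n_samples times over, and return each sample's mean."""
    rolls = rng.choice(die, size=(n_samples, n_rolls), replace=True)
    return rolls.mean(axis=1)

for n_rolls in (5, 30, 100):
    means = sample_means_for(n_rolls)
    # The sample means centre on 3.5 and their spread shrinks like sd / sqrt(n_rolls)
    print(n_rolls, round(means.mean(), 2), round(means.std(), 3),
          round(die.std() / np.sqrt(n_rolls), 3))
```

Plotting a histogram of each set of means would show the bell shape getting tighter as n_rolls grows.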


Standard deviation and the CLT
sample_sds = []
for i in range(1000):
    sample_sds.append(np.std(die.sample(5, replace=True)))


Proportions and the CLT
sales_team = pd.Series(["Amir", "Brian", "Claire", "Damian"])
# .to_numpy() shows just the sampled names as an array
sales_team.sample(10, replace=True).to_numpy()

array(['Claire', 'Damian', 'Brian', 'Damian', 'Damian', 'Amir', 'Amir', 'Amir',
       'Amir', 'Damian'], dtype=object)

sales_team.sample(10, replace=True).to_numpy()

array(['Brian', 'Amir', 'Brian', 'Claire', 'Brian', 'Damian', 'Claire', 'Brian',
       'Claire', 'Claire'], dtype=object)


Sampling distribution of proportion
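
The slides do not show how the sample proportions are collected. One possible sketch, following the same loop pattern as the sample means: the list name sample_props matches the later slide, while the 1000 repetitions and the seeded RandomState are assumptions.

```python
import numpy as np
import pandas as pd

sales_team = pd.Series(["Amir", "Brian", "Claire", "Damian"])
rng = np.random.RandomState(0)  # arbitrary seed so the run is repeatable

sample_props = []
for i in range(1000):
    # Draw 10 names with replacement
    samp = sales_team.sample(10, replace=True, random_state=rng)
    # Proportion of the 10 draws that are "Claire"
    sample_props.append(np.mean(samp == "Claire"))

# Should land near 0.25, since Claire is 1 of 4 equally likely names
print(np.mean(sample_props))
```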



Mean of sampling distribution
# Estimate expected value of die
np.mean(sample_means)

3.48

# Estimate proportion of "Claire"s
np.mean(sample_props)

0.26

Estimate characteristics of unknown underlying distribution
More easily estimate characteristics of large populations


Let's practice!

The Poisson distribution
Poisson processes
Events appear to happen at a certain rate, but completely at random

Examples
Number of animals adopted from an animal shelter per week
Number of people arriving at a restaurant per hour
Number of earthquakes in California per year

Time unit is irrelevant, as long as you use the same unit when talking about the same situation


Poisson distribution
Probability of some # of events occurring over a fixed period of time

Examples
Probability of ≥ 5 animals adopted from an animal shelter per week

Probability of 12 people arriving at a restaurant per hour

Probability of < 20 earthquakes in California per year



Lambda (λ)
λ = average number of events per time interval
Average number of adoptions per week = 8



Lambda is the distribution's peak
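
A quick way to see the peak is to print the probabilities around λ. This sketch assumes λ = 8 from the adoptions example; the range 0 to 20 is an arbitrary window. Note that when λ is a whole number, the counts λ - 1 and λ share the highest probability.

```python
import numpy as np
from scipy.stats import poisson

lam = 8
counts = np.arange(0, 21)
probs = poisson.pmf(counts, lam)

# Probabilities rise up to the peak around lambda, then fall away
for k in (6, 7, 8, 9, 10):
    print(k, round(probs[k], 4))
```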



Probability of a single value
If the average number of adoptions per week is 8, what is P (# adoptions in a week = 5)?

from scipy.stats import poisson
poisson.pmf(5, 8)

0.09160366
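For reference, the Poisson probability of exactly k events is λ^k e^(-λ) / k!. A minimal sketch computing it by hand and comparing with poisson.pmf; the variable names are illustrative only.

```python
from math import exp, factorial

from scipy.stats import poisson

lam, k = 8, 5  # average of 8 adoptions per week, probability of exactly 5

# P(X = k) = lam^k * e^(-lam) / k!
by_hand = lam**k * exp(-lam) / factorial(k)

print(by_hand, poisson.pmf(k, lam))
```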


Probability of less than or equal to
If the average number of adoptions per week is 8, what is P (# adoptions in a week ≤ 5)?

from scipy.stats import poisson
poisson.cdf(5, 8)

0.1912361


Probability of greater than
If the average number of adoptions per week is 8, what is P (# adoptions in a week > 5)?

1 - poisson.cdf(5, 8)

0.8087639

If the average number of adoptions per week is 10, what is P (# adoptions in a week > 5)?

1 - poisson.cdf(5, 10)

0.932914
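
Because counts are discrete, "greater than 5" and "at least 5" are different questions. A sketch using poisson.sf, scipy's survival function, which returns the same quantity as 1 - cdf:

```python
from scipy.stats import poisson

lam = 8

# P(X > 5): same as 1 - poisson.cdf(5, lam)
p_gt_5 = poisson.sf(5, lam)

# P(X >= 5) = P(X > 4), so shift the cutoff down by one
p_ge_5 = poisson.sf(4, lam)

print(p_gt_5, p_ge_5)
```

The two answers differ by exactly P(X = 5), which is poisson.pmf(5, 8).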



Sampling from a Poisson distribution
from scipy.stats import poisson
poisson.rvs(8, size=10)

array([ 9, 9, 8, 7, 11, 3, 10, 6, 8, 14])



The CLT still applies!



Let's practice!

More probability distributions
Exponential distribution
Probability of time between Poisson events
Examples
Probability of > 1 day between adoptions

Probability of < 10 minutes between restaurant arrivals

Probability of 6-8 months between earthquakes

Also uses lambda (rate)

Continuous (time)



Customer service requests
On average, one customer service ticket is created every 2 minutes
λ = 0.5 customer service tickets created each minute



Lambda in exponential distribution



Expected value of exponential distribution
In terms of rate (Poisson):

λ = 0.5 requests per minute

In terms of time (exponential):

1/λ = 1 request per 2 minutes
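
In scipy, the exponential distribution is parameterised by scale = 1/λ (the expected wait), not by λ itself. A small sketch with the ticket example; the simulation size and seed are arbitrary assumptions.

```python
from scipy.stats import expon

lam = 0.5            # requests per minute (the rate, as in the Poisson view)
mean_wait = 1 / lam  # minutes between requests (the exponential view)

# Expected value of the exponential distribution with scale = 1 / lambda
expected = expon.mean(scale=mean_wait)
print(expected)

# The average of many simulated waits should land near the same value
waits = expon.rvs(scale=mean_wait, size=10_000, random_state=0)
print(waits.mean())
```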



How long until a new request is created?
from scipy.stats import expon
# scale is the expected wait, 1/λ = 1/0.5 = 2 minutes

P (wait < 1 min) =

expon.cdf(1, scale=2)

0.3934693

P (wait > 3 min) =

1 - expon.cdf(3, scale=2)

0.2231302

P (1 min < wait < 3 min) =

expon.cdf(3, scale=2) - expon.cdf(1, scale=2)

0.3834005
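The exponential CDF has a closed form, P(wait < t) = 1 - e^(-λt), which can be used to sanity-check scipy calls. A sketch assuming λ = 0.5 requests per minute, so scale = 1/λ = 2; the helper name p_wait_less_than is hypothetical.

```python
from math import exp

from scipy.stats import expon

lam = 0.5  # requests per minute

def p_wait_less_than(t):
    """Closed-form exponential CDF: probability that the wait is shorter than t minutes."""
    return 1 - exp(-lam * t)

for t in (1, 3):
    # The closed form and scipy (with scale = 1 / lambda) should agree
    print(t, p_wait_less_than(t), expon.cdf(t, scale=1 / lam))
```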


(Student's) t-distribution
Similar shape to the normal distribution


Degrees of freedom
Has a parameter, degrees of freedom (df), which affects the thickness of the tails
Lower df = thicker tails, higher standard deviation

Higher df = closer to normal distribution
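The effect of df can be seen by comparing the standard deviation and a tail probability with the standard normal. This is a sketch; the df values and the cutoff of 3 are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

# For each df: standard deviation and the probability of a value above 3
for df in (3, 10, 30, 100):
    print(df, round(t.std(df), 3), round(t.sf(3, df), 5))

# Standard normal for comparison: the t values approach these as df grows
print("normal", norm.std(), round(norm.sf(3), 5))
```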


Log-normal distribution
Variable whose logarithm is normally distributed

Examples:
Length of chess games
Adult blood pressure
Number of hospitalizations in the 2003 SARS outbreak
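The defining property can be checked by simulation: take the log of log-normal draws and the result should look normal. A sketch using scipy's lognorm, where s is the standard deviation of the log and scale is e raised to the mean of the log; mu = 1.0 and sigma = 0.5 are hypothetical values, not taken from the examples above.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5  # hypothetical mean and sd of the logarithm

# Draw log-normal samples (always positive, with a long right tail)
samples = lognorm.rvs(s=sigma, scale=np.exp(mu), size=10_000, random_state=0)

# Taking logs should recover a normal distribution with mean mu and sd sigma
logs = np.log(samples)
print(logs.mean(), logs.std())
```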


Let's practice!