11 Normal Distribution
image: Etsy
Will Monroe
July 19, 2017
with materials by Mehran Sahami and Chris Piech
Announcements: Midterm
Review session:
Tomorrow, July 20, 2:30-3:20pm
in Gates B01
Review: A grid of random variables
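One trial: $X \sim \mathrm{Ber}(p)$ (the $n = 1$ case of Bin)
One success: $X \sim \mathrm{Geo}(p)$ (the $r = 1$ case of NegBin)
Several trials: $X \sim \mathrm{Bin}(n, p)$
Several successes: $X \sim \mathrm{NegBin}(r, p)$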
(continuous!)
Review: Continuous distributions
A continuous random variable has a
value that’s a real number (not
necessarily an integer).
$$F_X(a) = \int_{x=-\infty}^{a} f_X(x)\,dx$$
Review: Probability density function
The probability density function (PDF)
of a continuous random variable
represents the relative likelihood of
various values.
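Discrete: $$E[X] = \sum_{x=-\infty}^{\infty} x \cdot p_X(x)$$
Continuous: $$E[X] = \int_{x=-\infty}^{\infty} x \cdot f_X(x)\,dx$$
Discrete: $$E[X^2] = \sum_{x=-\infty}^{\infty} x^2 \cdot p_X(x)$$
Continuous: $$E[X^2] = \int_{x=-\infty}^{\infty} x^2 \cdot f_X(x)\,dx$$
In both cases (still!):
$$\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$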
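As a rough numerical illustration of the discrete and continuous formulas side by side, the sketch below uses two examples that are not from the slides: a fair six-sided die, and a hypothetical density f(x) = 2x on [0, 1] integrated with a midpoint Riemann sum.

```python
# Discrete case: a fair six-sided die (illustrative choice, not from the slides).
pmf = {k: 1 / 6 for k in range(1, 7)}
mean_d = sum(x * p for x, p in pmf.items())        # E[X]
second_d = sum(x**2 * p for x, p in pmf.items())   # E[X^2]
var_d = second_d - mean_d**2                       # Var(X) = E[X^2] - (E[X])^2

# Continuous case: hypothetical density f(x) = 2x on [0, 1].
def f(x):
    return 2 * x

n = 100_000                 # number of slices in the Riemann sum
dx = 1 / n
mids = [(i + 0.5) * dx for i in range(n)]
mean_c = sum(x * f(x) * dx for x in mids)          # approximates the integral of x f(x)
second_c = sum(x**2 * f(x) * dx for x in mids)     # approximates the integral of x^2 f(x)
var_c = second_c - mean_c**2

print(mean_d, var_d)
print(mean_c, var_c)
```

The only change between the two halves is that the sum over a PMF becomes a sum of density times slice width; the variance identity is applied unchanged.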
Review: Uniform random variable
A uniform random variable is
equally likely to be any value in
a single real number interval.
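$X \sim \mathrm{Uni}(\alpha, \beta)$
$$f_X(x) = \begin{cases} \dfrac{1}{\beta-\alpha} & \text{if } x \in [\alpha, \beta] \\ 0 & \text{otherwise} \end{cases}$$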
Uniform: Fact sheet
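$X \sim \mathrm{Uni}(\alpha, \beta)$, where $\alpha$ is the minimum value and $\beta$ is the maximum value
PDF:
$$f_X(x) = \begin{cases} \dfrac{1}{\beta-\alpha} & \text{if } x \in [\alpha, \beta] \\ 0 & \text{otherwise} \end{cases}$$
CDF:
$$F_X(x) = \begin{cases} \dfrac{x-\alpha}{\beta-\alpha} & \text{if } x \in [\alpha, \beta] \\ 1 & \text{if } x > \beta \\ 0 & \text{otherwise} \end{cases}$$
expectation: $E[X] = \dfrac{\alpha+\beta}{2}$
variance: $\mathrm{Var}(X) = \dfrac{(\beta-\alpha)^2}{12}$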
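The uniform fact sheet translates almost line for line into code. This is a minimal sketch; the function names and the endpoints 2 and 10 are hypothetical choices made for illustration.

```python
def uniform_pdf(x, alpha, beta):
    """Density of Uni(alpha, beta): constant inside the interval, zero outside."""
    return 1 / (beta - alpha) if alpha <= x <= beta else 0.0

def uniform_cdf(x, alpha, beta):
    """P(X <= x) for X ~ Uni(alpha, beta)."""
    if x < alpha:
        return 0.0
    if x > beta:
        return 1.0
    return (x - alpha) / (beta - alpha)

alpha, beta = 2.0, 10.0                  # hypothetical endpoints
mean = (alpha + beta) / 2                # expectation from the fact sheet
variance = (beta - alpha) ** 2 / 12      # variance from the fact sheet

# P(4 <= X <= 7) is a difference of two CDF values.
prob = uniform_cdf(7, alpha, beta) - uniform_cdf(4, alpha, beta)
print(mean, variance, prob)
```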
image: Haha169
Review: Exponential random variable
An exponential random variable
is the amount of time until the
first event when events occur
as in the Poisson distribution.
$X \sim \mathrm{Exp}(\lambda)$
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
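Exponential: Fact sheet
$X \sim \mathrm{Exp}(\lambda)$, the time until the first event
PDF:
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
CDF:
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
expectation: $E[X] = \dfrac{1}{\lambda}$
variance: $\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$
image: Adrian Sampson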
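A short sketch of the exponential PDF and CDF in code. The rate of 0.5 events per unit time is a hypothetical value, and the function names are made up for this example.

```python
import math

def exp_pdf(x, lam):
    """Density of Exp(lam): lam * e^(-lam * x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """P(X <= x) for X ~ Exp(lam)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 0.5                      # hypothetical rate: 0.5 events per unit time
mean = 1 / lam                 # expected wait until the first event
variance = 1 / lam**2

# Probability that the first event arrives within 2 time units.
p_within_2 = exp_cdf(2, lam)
print(mean, variance, p_within_2)
```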
Normal random variable
A normal (= Gaussian) random variable is
a good approximation to many other
distributions. It often results from sums or
averages of independent random variables.
$X \sim N(\mu, \sigma^2)$
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Déjà vu?
[Figure: a PMF, $P(X = k)$ against $k$, alongside the normal PDF $f_X(x)$]
Personality: easygoing
What is normally distributed?
Averages of samples from a population (with sufficient sample sizes)
The Know-Nothing Distribution
“maximum entropy”
$X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is the variance ($\sigma$ = standard deviation)
PDF:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
The Standard Normal
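$Z \sim N(0, 1)$, that is, $\mu = 0$ and $\sigma^2 = 1$
$X \sim N(\mu, \sigma^2)$ can be written as $X = \sigma Z + \mu$
$$Z = \frac{X-\mu}{\sigma}$$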
De-scarifying the normal PDF
Start from the general PDF:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Set $\mu = 0$ and $\sigma = 1$:
$$f_Z(z) = \frac{1}{1 \cdot \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{z-0}{1}\right)^2}$$
Simplify:
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
Write the leading factor as a constant $C$:
$$f_Z(z) = C\, e^{-\frac{1}{2}z^2}$$
[Figure: plots labeled $-\frac{1}{2}z^2$]
Back in the general PDF, $\frac{1}{\sigma\sqrt{2\pi}}$ is the normalizing constant and the exponent contains $Z = \frac{X-\mu}{\sigma}$:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
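One way to see why the constant has to be $1/\sqrt{2\pi}$ is to integrate the bare bell shape $e^{-z^2/2}$ numerically and take the reciprocal of the area. The sketch below does this with a midpoint Riemann sum; the integration range of -10 to 10 and the slice count are arbitrary choices, made on the assumption that the tails beyond that range contribute a negligible amount.

```python
import math

def bell(z):
    """The unnormalized bell shape e^(-z^2 / 2)."""
    return math.exp(-0.5 * z * z)

lo, hi = -10.0, 10.0        # assumed wide enough that the tails are negligible
n = 200_000                 # number of slices
dz = (hi - lo) / n

# Midpoint Riemann sum for the area under the bell shape.
area = sum(bell(lo + (i + 0.5) * dz) for i in range(n)) * dz

C = 1 / area                # the constant that makes the total area equal 1
print(area, math.sqrt(2 * math.pi))
print(C, 1 / math.sqrt(2 * math.pi))
```

The area comes out very close to $\sqrt{2\pi}$, so dividing by it is exactly what turns the bell shape into a density.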
Normal: Fact sheet
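$X \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance ($\sigma$ = standard deviation)
PDF:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
CDF:
$$F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \int_{-\infty}^{x} f_X(t)\,dt$$
(no closed form)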
The Standard Normal
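$Z \sim N(0, 1)$, that is, $\mu = 0$ and $\sigma^2 = 1$
$X \sim N(\mu, \sigma^2)$ can be written as $X = \sigma Z + \mu$
$$Z = \frac{X-\mu}{\sigma}$$
$$\Phi(z) = F_Z(z) = P(Z \le z)$$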
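The standardization step can be checked with Python's standard-library `statistics.NormalDist` (available from Python 3.8). This is a sketch only: the parameters 3 and 4 and the query point 5 are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 3.0, 4.0            # arbitrary example parameters
Z = NormalDist(0, 1)            # the standard normal
X = Z * sigma + mu              # X = sigma * Z + mu; NormalDist supports this arithmetic

x = 5.0
z = (x - mu) / sigma            # standardize the query point

# P(X <= x) and Phi(z) should agree.
p_from_x = X.cdf(x)
p_from_z = Z.cdf(z)
print(X, p_from_x, p_from_z)
```

Scaling and shifting `Z` gives a `NormalDist` with mean 3 and standard deviation 4, and the two CDF calls agree, which is the relationship $F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$ in action.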
Symmetry of the normal
$$P(X \le \mu - x) = P(X \ge \mu + x)$$
$$P(Z \le -z) = P(Z \ge z)$$
$$\Phi(-z) = P(Z \ge z)$$
and don't forget the complement rule: $P(Z \ge z) = 1 - \Phi(z)$, so $\Phi(-z) = 1 - \Phi(z)$.
Φ(0.54)=P(Z≤0.54)=0.7054
With today’s technology
scipy.stats.norm(mean, std).cdf(x)
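The `scipy.stats.norm(mean, std).cdf(x)` call needs SciPy installed. As a sketch that assumes only the Python standard library (3.8 or later), `statistics.NormalDist` also takes a mean and a standard deviation and has a `cdf` method, so it can stand in for a table lookup of Φ.

```python
from statistics import NormalDist

Z = NormalDist()                 # defaults to mean 0, standard deviation 1
phi = Z.cdf                      # phi(z) plays the role of the table lookup

lookup = phi(0.54)               # compare with the table value 0.7054
left_tail = phi(-0.54)           # symmetry: should equal 1 - phi(0.54)

print(round(lookup, 4))
print(left_tail, 1 - lookup)
```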
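Practice with the Gaussian
X ~ N(3, 16), so μ = 3, σ² = 16, σ = 4. What is P(X > 0)?
$$\begin{aligned}
P(X > 0) &= P\left(\frac{X-3}{4} > \frac{0-3}{4}\right) \\
&= P\left(Z > -\tfrac{3}{4}\right) \\
&= 1 - P\left(Z \le -\tfrac{3}{4}\right) = 1 - \Phi\left(-\tfrac{3}{4}\right) \\
&= 1 - \left(1 - \Phi\left(\tfrac{3}{4}\right)\right) \\
&= \Phi\left(\tfrac{3}{4}\right) \approx 0.7734
\end{aligned}$$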
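The probability P(X > 0) for X ~ N(3, 16) can be cross-checked two ways in code: directly from the distribution of X, and through the standard normal after standardizing. A minimal sketch using the standard library, keeping in mind that `NormalDist` takes the standard deviation (4), not the variance (16):

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=4)        # sigma is the standard deviation, sqrt(16)

# Route 1: complement of the CDF of X at 0.
p_direct = 1 - X.cdf(0)

# Route 2: standardize, then use symmetry, giving Phi(3/4).
p_standardized = NormalDist().cdf(3 / 4)

print(round(p_direct, 4), round(p_standardized, 4))
```

Passing 16 as the second argument is an easy mistake to make, since the notation N(3, 16) lists the variance.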
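$X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is the variance ($\sigma$ = standard deviation)
PDF:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
CDF:
$$F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \int_{-\infty}^{x} f_X(t)\,dt$$
(no closed form)
expectation: $E[X] = \mu$
variance: $\mathrm{Var}(X) = \sigma^2$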
Carl Friedrich Gauss
(1777-1855): remarkably influential
German mathematician
[Figure: PMF $P(X = k)$ against $k$]
$$\mathrm{Bin}(n, p) \approx N(\mu, \sigma^2)$$
Something is strange...
Continuity correction
$X \sim \mathrm{Bin}(n, p)$
$Y \sim N(np,\; np(1-p))$
$P(X \ge 55) \approx P(Y > 54.5)$
X ~ Bin(100, 0.5)
np = 50
np(1 – p) = 50(1 – 0.5) = 25
X ≈ Y ~ N(50, 25)
$$P(Y > 64.5) = P\left(\frac{Y-50}{5} > \frac{64.5-50}{5}\right) = P(Z > 2.9) = 1 - \Phi(2.9) \approx 0.00187$$
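To see what the continuity correction buys, the sketch below compares the exact binomial tail with the normal approximation, with and without the half-unit shift. It assumes the underlying question is P(X ≥ 65), which is the event that the cutoff 64.5 corresponds to.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact binomial tail: P(X >= 65), summed term by term.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(65, n + 1))

# Normal approximation Y ~ N(np, np(1 - p)); NormalDist wants the standard deviation.
Y = NormalDist(n * p, sqrt(n * p * (1 - p)))
with_correction = 1 - Y.cdf(64.5)      # P(Y > 64.5)
without_correction = 1 - Y.cdf(65)     # P(Y > 65), ignoring the correction

print(exact, with_correction, without_correction)
```

In this example the corrected value lands noticeably closer to the exact tail than the uncorrected one does.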
Stanford admissions
Stanford accepts 2480 students.
Each student independently
decides to attend with p = 0.68.
What is
P(at least 1750 students attend)?
X ~ Bin(2480, 0.68) ≈ Y ~ N(1686.4, 539.65)
$$P(Y > 1749.5) = P\left(\frac{Y-1686.4}{\sqrt{539.65}} > \frac{1749.5-1686.4}{\sqrt{539.65}}\right) \approx P(Z > 2.72) = 1 - \Phi(2.72) \approx 0.0033$$
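The admissions numbers are easy to recompute. The sketch below evaluates the normal approximation for at least 1750 attendees and, as a sanity check, sums the exact binomial tail in log space. The helper `log_binom_pmf` is a hypothetical name introduced here; it uses `lgamma` because the binomial coefficients are too large to convert to floats directly.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

n, p = 2480, 0.68
mu = n * p                      # 1686.4
var = n * p * (1 - p)           # about 539.65

# Normal approximation with continuity correction: P(Y > 1749.5).
z = (1749.5 - mu) / sqrt(var)
approx = 1 - NormalDist().cdf(z)

def log_binom_pmf(k):
    """log P(X = k) for X ~ Bin(n, p), computed with log-gamma to avoid overflow."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

# Exact tail P(X >= 1750).
exact = sum(exp(log_binom_pmf(k)) for k in range(1750, n + 1))

print(round(z, 2), approx, exact)
```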
image: Victor Gane
Stanford admissions changes