Normal Distribution - Fundamentals and Using Distributions - Lesson
Fundamentals & Using Distributions
Jamie Frost
www.drfrost.org
@DrFrostMaths
The distribution across positions has a symmetrical ‘bell-curve’ shape, with the frequencies concentrated at the centre.
Normal Distribution from Natural Processes
Many real-life processes involve such repeated random processes.
[Cartoon: “I like your equations.” “Thanks. Let’s have babies.”]
[Graph: distribution of height]
Introducing the Normal Distribution
A normal distribution is a distribution for continuous variables where the probability symmetrically tails off from the mean.
For normal distributions we tend to draw the $f(x)$ axis at the mean for symmetry.
[Graph: $f(x)$ against height in cm]
Notation for Normal Distributions
! If a random variable $X$ is normally distributed, then $X \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
Reading the notation: “The random variable $X$…” “has the distribution…” “a normal distribution…” “with parameters $\mu$ and $\sigma^2$.”
Recall that standard deviation, roughly speaking, is the average distance of values from the mean. A higher value means the data is more spread out.
[Graph: three normal curves of $f(x)$ against $x$, with $\mu = 0, \sigma = 1$; $\mu = 0, \sigma = 3$; and $\mu = -2, \sigma = 3$]
The mean/centre is the same. But the data is more spread out.
Why is this peak lower? As the data is more spread out, a smaller proportion of data is concentrated at the mean.
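
The effect of the standard deviation on the peak can be checked numerically. The sketch below is an illustration only: it uses Python's standard-library `statistics.NormalDist`, and the helper name `peak_height` is a made-up name, not part of the lesson. It evaluates the density at the centre of each of the three curves listed on the slide.

```python
from statistics import NormalDist

# The three curves from the slide, as (mean, standard deviation) pairs.
curves = [(0, 1), (0, 3), (-2, 3)]


def peak_height(mu, sigma):
    """Height of the normal density curve at its centre, x = mu."""
    return NormalDist(mu, sigma).pdf(mu)


for mu, sigma in curves:
    print(f"mu={mu}, sigma={sigma}: peak height = {peak_height(mu, sigma):.4f}")
```

Changing the mean only slides the curve along the axis, so the two curves with the same standard deviation share a peak height, while the larger standard deviation gives a lower peak.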
Test Your Understanding drfrost.org/s/640b
a $X \sim N(1, 3^2)$  Answer: E
b $X \sim N(4, 2^2)$  Answer: C
c $X \sim N(4, 1^2)$  Answer: B
d $X \sim N(1, 1^2)$  Answer: A
e $X \sim N(1, 2^2)$  Answer: D
Normal Distribution Q&A
For a normal distribution to be used, the random variable has to be...
Continuous (e.g. height, weight)
With a discrete variable, all the probabilities had to add up to 1. For a continuous variable, similarly:
The total area under the probability graph is 1.
To find the probability that a height lies between two values, we would:
Find the area between these two values.
Would we ever want to find the probability of one exact height?
Since height is continuous, the probability someone is ‘exactly’ a given number of cm is infinitesimally small. This is therefore not a ‘probability’ in the normal sense.
Because of this, it makes no difference whether an endpoint is included: $P(X < a) = P(X \le a)$.
[Graph: $f(x)$ against height (cm)]
Probability Density
If $P(X = x)$ for a single value $x$ is meaningless, what does this vertical axis actually represent?
$f(x)$ represents the concentration of probability (i.e. the probability per unit value), known as probability density. See skill 862.
This is analogous to histograms, where the vertical axis is the frequency density, i.e. the frequency per unit value.
The area under a probability density graph gives us probability.
[Graphs: probability density $f(x)$ against $x$, alongside a histogram of frequency density against height (cm)]
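
To make "area under the density graph gives probability" concrete, here is a minimal sketch that approximates the area with the trapezium rule and compares it with the cumulative probabilities from Python's standard-library `statistics.NormalDist`. The height model (mean 170 cm, standard deviation 8 cm) and the function name `area_under_density` are assumptions chosen for illustration; they do not come from the lesson.

```python
from statistics import NormalDist

# Hypothetical height model, chosen only for illustration.
heights = NormalDist(mu=170, sigma=8)


def area_under_density(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (trapezium rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


approx = area_under_density(heights, 160, 180)
exact = heights.cdf(180) - heights.cdf(160)
whole_curve = area_under_density(heights, 90, 250)   # mean +/- 10 standard deviations
single_point = area_under_density(heights, 170, 170)  # a range of zero width

print(f"area between 160 and 180: {approx:.4f}")
print(f"difference of cumulative probabilities: {exact:.4f}")
print(f"area under (almost) the whole curve: {whole_curve:.4f}")
print(f"area above a single value: {single_point:.4f}")
```

The last line shows why the probability of one exact value is zero: a strip of zero width has zero area, however tall the curve is at that point.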
The ‘68-95-99.7 Rule’
! If the data is normally distributed, then 68% of data is within 1 standard deviation of the mean, i.e. between $\mu - \sigma$ and $\mu + \sigma$.
! 95% and 99.7% of the data is within 2 and 3 standard deviations of the mean, respectively.
[Graph: $f(x)$ with 68% between $\mu - \sigma$ and $\mu + \sigma$, 95% between $\mu - 2\sigma$ and $\mu + 2\sigma$, and 99.7% between $\mu - 3\sigma$ and $\mu + 3\sigma$]
We will work out how to calculate these values later, but it is worthwhile remembering these percentages because they help us make judgements mentally about proportions of data.
Only about one in 1.7 million values fall outside $\mu \pm 5\sigma$. CERN used a “5 sigma level of significance” to ensure the data suggesting existence of the Higgs Boson wasn’t by chance: this is about a 1 in 3.5 million chance (if we consider just one tail).
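
The percentages in the rule can be reproduced with a short calculation. This is a sketch using Python's standard-library `statistics.NormalDist`; the names `Z` and `within` are illustrative. Because the proportions depend only on the number of standard deviations from the mean, the standard normal (mean 0, standard deviation 1) is enough.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")

# One-tail probability of being more than 5 standard deviations from the mean.
one_tail_5_sigma = Z.cdf(-5)
print(f"beyond 5 sigma in one tail: about 1 in {1 / one_tail_5_sigma:,.0f}")
```

The rule's figures are rounded: the proportion within 2 standard deviations is closer to 95.4% than to 95%.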
Examples
[Graph: normal curve with $\mu - \sigma$ and $\mu + \sigma$ marked]
Test Your Understanding drfrost.org/s/640d
a ml and ml.
b ml and ml.
Answers: a 95%, b 68%
More Complex Ranges
Use a graph when determining more complex ranges using the 68-95-99.7 rule.
‘IQ’ is a standardised measure of intelligence, where the mean IQ is 100 and the standard deviation is 15.
Determine the probability that a randomly selected person has an IQ:
a above 130.
b between 70 and 115.
a [Graph: $f(x)$ with 95% between 70 and 130 and 2.5% above 130]
95% is within 2 standard deviations ($70 < X < 130$). So by symmetry, the remaining 5% is split either side.
$P(X > 130) = 0.025$
b [Graph: $f(x)$ with 47.5% between 70 and 100 and 34% between 100 and 115]
If 95% is between 70 and 130, then half of this, 47.5%, will be between 70 and 100. Similarly, if 68% is between 85 and 115, 34% is between 100 and 115.
$P(70 < X < 115) = 0.475 + 0.34 = 0.815$
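
The rule-of-thumb answers can be compared with exact values. The sketch below assumes the IQ model has mean 100 and standard deviation 15, which is consistent with the 70 and 130 marks sitting two standard deviations either side of the centre on the slide's graph. It uses Python's standard-library `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed IQ model: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Exact probabilities from the cumulative distribution.
p_above_130 = 1 - iq.cdf(130)
p_70_to_115 = iq.cdf(115) - iq.cdf(70)

# The same ranges estimated with the 68-95-99.7 rule.
rule_above_130 = (1 - 0.95) / 2       # half of the 5% outside 2 standard deviations
rule_70_to_115 = 0.95 / 2 + 0.68 / 2  # 47.5% below the mean plus 34% above it

print(f"P(X > 130): exact {p_above_130:.4f}, rule {rule_above_130:.4f}")
print(f"P(70 < X < 115): exact {p_70_to_115:.4f}, rule {rule_70_to_115:.4f}")
```

The two sets of answers agree closely but not exactly, because 68% and 95% are themselves rounded figures.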
Test Your Understanding drfrost.org/s/640e
a [Graph: 68% between 3.9 and 4.1, with 16% in each tail]
$P(D < 3.9) = 0.16$
b [Graph: 95% between 3.8 and 4.2, with 2.5% in each tail]
$P(D < 4.2) = 0.95 + 0.025 = 0.975$
Calculating Arbitrary Ranges
Given that , determine the probability
of:
a b
These are instructions for the Casio fx-CG50:
1 Choose DISTRIBUTION.
2 Choose Normal.
b $P(D < 165) = 0.202$
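
For readers without the calculator to hand, the same kind of range calculation can be sketched in Python with the standard-library `statistics.NormalDist`. Everything named here is an assumption for illustration: the function `normal_cd` and its lower/upper/sigma/mu argument order are made up for this sketch, the very large bound `BIG` stands in for "no limit" on one side, and the distribution $N(50, 4^2)$ is hypothetical rather than the one used in the lesson.

```python
from statistics import NormalDist

BIG = 1e99  # stands in for "no bound" on one side of the range


def normal_cd(lower, upper, sigma, mu):
    """P(lower < X < upper) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical distribution N(50, 4^2), chosen only for illustration.
p_below_45 = normal_cd(-BIG, 45, sigma=4, mu=50)
p_45_to_52 = normal_cd(45, 52, sigma=4, mu=50)
p_above_52 = normal_cd(52, BIG, sigma=4, mu=50)

print(f"P(X < 45) = {p_below_45:.4f}")
print(f"P(45 < X < 52) = {p_45_to_52:.4f}")
print(f"P(X > 52) = {p_above_52:.4f}")
```

The three ranges cover the whole number line without overlapping, so the three probabilities add up to 1, which is a quick sanity check on any calculation of this kind.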
Inverse Normal Distribution
$P(X < k) = 0.7$
[Graph: $f(x)$ with an area of 0.7 to the left of $k$]
We’ve used the normal distribution to determine the probability of a range. But what about the reverse: finding boundary values that give a specific probability?
Inverse Normal Distribution on a Calculator
Given the distribution of $X$, determine the values of $a$ and $b$ such that:
a $P(X < a) = 0.3$
b $P(X > b) = 0.1$
These are instructions for the Casio fx-CG50:
1 Choose DISTRIBUTION.
2 Choose Normal.
Let $k$ be the IQ for which 1% of the population is above.
$P(X > k) = 0.01$
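
The inverse calculation can also be sketched in Python. The example assumes the IQ model $N(100, 15^2)$, which is an assumption made for illustration, and uses `inv_cdf` from the standard-library `statistics.NormalDist`. Since `inv_cdf` takes the area to the left of the boundary, a right-tail probability is first converted to a left-tail one by subtracting it from 1.

```python
from statistics import NormalDist

# Assumed IQ model: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# P(X < a) = 0.3: the area to the left is given directly.
a = iq.inv_cdf(0.3)

# P(X > b) = 0.1: the area to the left of b is 1 - 0.1 = 0.9.
b = iq.inv_cdf(1 - 0.1)

# P(X > k) = 0.01: the area to the left of k is 0.99.
k = iq.inv_cdf(1 - 0.01)

print(f"a = {a:.1f}, b = {b:.1f}, k = {k:.1f}")
```

A quick check on any answer is to feed it back into the forward calculation: `iq.cdf(a)` should return the area that was asked for.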
Harder Examples
Given that , determine the and such that:
a
b
a [Graph: $f(x)$ with $100 - c$, $100$ and $100 + c$ marked]
b [Graph: $f(x)$ with $80$, $d$ and $100$ marked, and an area of 0.2]
Test Your Understanding drfrost.org/s/641d drfrost.org/s/641e
a $P(X < 12) = 0.8413$
b $P(X < 10 + d) = 0.7$
Example
We can use the inverse normal distribution to determine quartiles.
Given that $X$ is normally distributed, determine the upper quartile.
By definition, the upper quartile is 75% along the data.
Test Your Understanding drfrost.org/s/641c
9 The times of athletes in a race had a stated mean and standard deviation, both in seconds. It is suggested the distribution of times can be modelled using a normal distribution with these parameters. Determine the interquartile range of times according to this model.
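
Quartiles follow directly from the inverse normal: the lower quartile has 25% of the area to its left and the upper quartile has 75%. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `quartiles` and the race-time model (mean 60 seconds, standard deviation 4 seconds) are hypothetical values chosen for illustration, not the figures from the question.

```python
from statistics import NormalDist


def quartiles(mu, sigma):
    """Lower and upper quartiles of N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(0.25), dist.inv_cdf(0.75)


# Hypothetical race times: mean 60 s, standard deviation 4 s.
lower, upper = quartiles(60, 4)
iqr = upper - lower

print(f"lower quartile = {lower:.2f} s")
print(f"upper quartile = {upper:.2f} s")
print(f"interquartile range = {iqr:.2f} s")
```

By symmetry the two quartiles sit the same distance either side of the mean, so the interquartile range is twice the distance from the mean to the upper quartile.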
Conditional Probability with Normal Distributions
The IQ of a population has the distribution $N(100, 15^2)$.
A person is defined as ‘smart’ if their IQ is over 130.
Given a person is smart, determine the probability that their IQ is over 140.
Recall that $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
“Above 130 and above 140” is the same as just saying “above 140”. The more restrictive condition ‘wins’.
Thinking of ‘given that’ as ‘out of’, it’s just the area above 140 out of the area above 130.
[Graph: $f(x)$ with 100, 130 and 140 marked]
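
The "area out of an area" idea translates into a single division. This sketch assumes the IQ model $N(100, 15^2)$ with thresholds of 130 for ‘smart’ and 140 for the event of interest, read from the marks on the slide's graph. It uses Python's standard-library `statistics.NormalDist`, and the function name `prob_above_given_above` is illustrative.

```python
from statistics import NormalDist

# Assumed IQ model: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def prob_above_given_above(dist, target, condition):
    """P(X > target | X > condition), assuming target >= condition."""
    # "Above target and above condition" is just "above target",
    # so the numerator is the smaller tail area.
    return (1 - dist.cdf(target)) / (1 - dist.cdf(condition))


p_smart = 1 - iq.cdf(130)
p_over_140_given_smart = prob_above_given_above(iq, 140, 130)

print(f"P(X > 130) = {p_smart:.5f}")
print(f"P(X > 140 | X > 130) = {p_over_140_given_smart:.3f}")
```

The conditional probability is much larger than the unconditional tail area above 140, because it is measured against the small group of ‘smart’ people rather than the whole population.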
Test Your Understanding drfrost.org/s/642a
$P(H > 160 \mid H > 150) = \dfrac{P(H > 160)}{P(H > 150)}$
Determining the Median of a Restricted Group
The IQ of a population has the distribution $N(100, 15^2)$.
A person is defined as ‘smart’ if their IQ is over 130.
Determine the median IQ of ‘smart’ people.
Method 1
Of smart people, 50% will be above the smart people median ($a$), so $P(X > a \mid X > 130) = 0.5$.
Method 2
$P(X > 130) = 0.02275$
Of this 0.02275 of smart people, they will be split half and half above and below the median.
[Graph: $f(x)$ with 100, 130 and $a$ marked, and areas of 0.0114 either side of $a$ above 130]
Use inverse normal.
Reflections: We can see the calculations of the two methods are the same. Method 1 uses the conditional probability formula whereas Method 2 reasons about the graph.
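
The graph-based method can be sketched in a few lines: halve the tail area above the ‘smart’ threshold, then use the inverse normal to find the value with that much area above it. The IQ model $N(100, 15^2)$ and the threshold of 130 are assumed here (they reproduce the slide's value $P(X > 130) = 0.02275$), and the variable names are illustrative. The code uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumed IQ model: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Area above the 'smart' threshold.
p_smart = 1 - iq.cdf(130)

# The median of smart people splits that area in half.
area_above_median = p_smart / 2

# inv_cdf takes the area to the LEFT, so convert the right-tail area.
median_smart_iq = iq.inv_cdf(1 - area_above_median)

# Check with the conditional probability formula: this should be 0.5.
check = (1 - iq.cdf(median_smart_iq)) / p_smart

print(f"area above the median = {area_above_median:.4f}")
print(f"median IQ of smart people = {median_smart_iq:.1f}")
print(f"P(X > median | X > 130) = {check:.3f}")
```

The final check is the conditional probability formula from the first method, which is why both methods lead to the same calculation.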
Test Your Understanding drfrost.org/s/642b
Conditional Probabilities Using Symmetry
Draw a graph to represent the information given.
[Graph: $f(x)$ with $\mu - 10$, $\mu$ and $\mu + 10$ marked, and two areas of 0.3]
By symmetry, this tail has the same area as the other tail.
Test Your Understanding drfrost.org/s/642c
$P(W < \mu - 30 \mid W < \mu) = \dfrac{P(W < \mu - 30)}{P(W < \mu)}$
Value at a More General Position drfrost.org/s/642d
She has used her calculator for hours, but has another hours of exams to sit.
Alice only has new batteries so, after the first hours of her exams, although
her calculator is still working, she randomly selects of the batteries from her
calculator and replaces these with the new batteries.
Find the probability that her calculator will not stop working for the remainder
of her exams, giving your answer correct to significant figures.
(5 marks)
P(Athlete 1 Wins) = 0.99
How ‘surprising’ a probability is can be calculated using $I = -\log p$.
[Graph: $I$ against $p$, falling to $I = 0$ at $p = 1$]
When $p = 1$, $I = 0$, i.e. an event which is certain to happen is ‘completely unsurprising’.
At the other extreme, $I \to \infty$ as $p \to 0$, so the more unlikely the event, the more and more surprising it becomes.
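
A small sketch of the surprise formula, applied to the athlete example. The slide does not state the base of the logarithm, so base 2 (which measures surprise in bits) is an assumption here, as is the function name `surprisal`.

```python
import math


def surprisal(p, base=2):
    """Surprise I = -log(p) of an event with probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    # log(1/p) equals -log(p) and avoids printing a negative zero when p = 1.
    return math.log(1 / p, base)


favourite_wins = surprisal(0.99)  # Athlete 1 wins, as expected
favourite_loses = surprisal(0.01)  # Athlete 1 loses, an upset

print(f"certain event (p = 1): {surprisal(1):.4f}")
print(f"Athlete 1 wins (p = 0.99): {favourite_wins:.4f}")
print(f"Athlete 1 loses (p = 0.01): {favourite_loses:.4f}")
```

Choosing a different base only rescales every value by the same constant factor, so the ordering of events by surprise is unaffected.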
Entropy of a Distribution
Remarkably, if all we know about a continuous distribution is its mean $\mu$ and standard deviation $\sigma$, a normal distribution has the maximum entropy across all possible distributions.
[Graph: $f(x)$ centred on $\mu$]
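
The maximum-entropy claim can be illustrated, though not proved, by comparing a few familiar continuous distributions that share the same standard deviation. The sketch below uses the standard closed-form expressions for differential entropy (in nats); the choice of the uniform and Laplace distributions as comparisons, and the function names, are illustrative and not taken from the lesson.

```python
import math


def normal_entropy(sigma):
    """Differential entropy of a normal distribution with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def uniform_entropy(sigma):
    """Differential entropy of a uniform distribution with standard deviation sigma."""
    width = sigma * math.sqrt(12)  # a uniform of width w has variance w^2 / 12
    return math.log(width)


def laplace_entropy(sigma):
    """Differential entropy of a Laplace distribution with standard deviation sigma."""
    scale = sigma / math.sqrt(2)  # a Laplace with scale b has variance 2 * b^2
    return 1 + math.log(2 * scale)


for sigma in (1, 2.5):
    print(
        f"sigma={sigma}: normal {normal_entropy(sigma):.4f}, "
        f"Laplace {laplace_entropy(sigma):.4f}, "
        f"uniform {uniform_entropy(sigma):.4f}"
    )
```

None of these entropies depends on the mean, since shifting a distribution along the axis does not change how spread out it is.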
Bayes’ theorem (skill 636) allows us to find the probability of various ‘causes’ that lead to an observed ‘effect’, for example the classification of a fruit given observable characteristics like colour and size.
The prior belief of what different causes might be, without considering evidence observed, is known as the prior distribution.
If the causes were discrete and categorical, e.g. possible fruits an object could be, we might use a uniform distribution for the prior distribution.
But if the cause was a continuous value, we would use a normal distribution, to assume as little as possible.