Continuous Distributions
In this chapter you will learn:
• how to describe probabilities of continuous variables
• how to calculate expected statistics of continuous variables
• about an extremely important distribution called the normal distribution
• how to work backwards from probabilities to estimate information about the data.

Introductory problem
The height of many trees in a forest is measured and they have a mean of 7 m and a standard deviation of 1.5 m. Estimate the proportion of trees above 10 m tall.

In chapter 23 we saw that being able to describe random variables allowed us to make predictions about their properties. However, a major limitation was that those methods only applied to discrete variables. In reality, many variables we are interested in, such as height, weight and time, are continuous variables. In this chapter we shall extend the methods of chapter 23 to work with continuous variables. We will also meet the incredibly important ‘normal distribution’, which is used to model a large number of continuous variables in the physical world.
Mass / kg Frequency
4.9 12
5.0 16
5.1 20
5.2 14
Not all of the data labelled 5.1 kg has a mass of exactly 5.1 kg. A bag with mass 5.1358 kg or 5.0879546 kg would be counted in this category. It would be impossible to list all the different possible actual masses, and it would be impossible to measure the mass absolutely accurately. When we collect continuous data we have to put it into groups. This means that we cannot talk about the probability of a single value of a continuous random variable (crv). We can only talk about the probability of the crv being in a specified range. A useful way of representing this is by the area under a graph. This also has the property that there is no area above a single value, but we can find the area above any range.

Which is more useful: knowing that the mass of a bag of rice is 5.1 kg to 2 significant figures, or knowing that the mass is 5.0879546 kg? Is knowing the exact answer always better?

© Cambridge University Press 2012 24 Continuous distributions 769
[Figure: graph of f against x, with the values a and b marked on the x-axis.]
Total area is 1, and area is only found between 0 and 1:
(a) $1 = \int_0^1 kx^2\,dx = \left[\dfrac{kx^3}{3}\right]_0^1 = \dfrac{k}{3} \;\Leftrightarrow\; k = 3$
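As a numerical check on this working, the Python sketch below approximates the integrals with Simpson's rule. This is a swapped-in numerical method standing in for the GDC, and the helper name `simpson` is our own. It recovers k = 3 and confirms that the total area is 1. It also finds P(0.5 < X < 1) for the pdf f(x) = 3x² on 0 < x < 1 and shows that a zero-width interval has zero area.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b using Simpson's rule with n strips."""
    if n % 2:  # Simpson's rule needs an even number of strips
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# The area under x^2 on (0, 1) is 1/3, so k = 1 / (1/3) = 3 makes the total area 1.
k = 1 / simpson(lambda x: x ** 2, 0, 1)


def pdf(x):
    # Endpoints are included so the numerical rule sees the curve's true values there;
    # for a crv this does not change any probability.
    return k * x ** 2 if 0 <= x <= 1 else 0.0


total_area = simpson(pdf, 0, 1)      # should be 1
p_interval = simpson(pdf, 0.5, 1)    # P(0.5 < X < 1) = 1 - 0.5**3
p_point = simpson(pdf, 0.5, 0.5)     # zero-width interval: no area above a single value

print(round(k, 6), round(total_area, 6), round(p_interval, 6), p_point)
```

Simpson's rule is exact for polynomials up to degree 3, so for this pdf the only error is floating-point rounding.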
Exercise 24A
1. For each of these distributions, find the possible values of the
unknown parameter k:
2. (a) If $f(x) = \begin{cases} 2 - 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
(i) Find $P(0.3 < X < 0.\ldots)$. (ii) Find $P(0 < X < 0.5)$.
(b) If $f(x) = \begin{cases} \cos x & 0 < x < \frac{\pi}{2} \\ 0 & \text{otherwise} \end{cases}$
(i) Find $P\left(\frac{\pi}{4} < X \le \frac{\pi}{3}\right)$. (ii) Find $P\left(0 \le X < \frac{\pi}{6}\right)$.
(c) If $f(x) = \begin{cases} \dfrac{1}{x \ln 10} & 1 < x < 10 \\ 0 & \text{otherwise} \end{cases}$
(i) Find $P(\ldots)$. (ii) Find $P(\ldots)$.
3. (a) If $f(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
(i) Find a if $P(X \ldots a) = 0.4$. (ii) Find b if $P(X \ldots b) = 0.9$.
(b) If $f(x) = \begin{cases} \dfrac{x}{8} & 0 < x < 8 \\ 0 & \text{otherwise} \end{cases}$
(c) If $f(x) = \begin{cases} \dfrac{x}{16} & 2 < x < 6 \\ 0 & \text{otherwise} \end{cases}$
(i) Find a if $P(2 + a < X < 6) = 0.8$.
(ii) Find b if $P(b \le X < b + 1) = 0.25$.
8. If $f(x) = \begin{cases} e^{x} & k \le x < 2k \\ 0 & \text{otherwise} \end{cases}$
find $P\left(X > \dfrac{3k}{2}\right)$. [7 marks]
To find the standard deviation we must first find Var(X), which requires us to find E(X²):
$E(X^2) = \int_0^2 x^2 \times \frac{3}{4}x(2 - x)\,dx = \frac{3}{4}\int_0^2 x^3(2 - x)\,dx = 1.2$ (from GDC)
$\therefore \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 1.2 - 1^2 = 0.2$
standard deviation $= \sqrt{0.2} = 0.447$
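The same calculation can be sketched numerically. This assumes the pdf is f(x) = (3/4)x(2 − x) on 0 < x < 2, as the integrand above implies. `simpson` is a hypothetical helper implementing Simpson's rule, not a library function.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    # Assumed pdf on 0 < x < 2 (zero elsewhere); we only ever integrate over [0, 2].
    return 0.75 * x * (2 - x)


area = simpson(pdf, 0, 2)                          # total probability, should be 1
e_x = simpson(lambda x: x * pdf(x), 0, 2)          # E(X)
e_x2 = simpson(lambda x: x ** 2 * pdf(x), 0, 2)    # E(X^2)
var = e_x2 - e_x ** 2                               # Var(X) = E(X^2) - [E(X)]^2
sd = math.sqrt(var)

print(round(e_x, 4), round(e_x2, 4), round(var, 4), round(sd, 3))
```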
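If $f(x) = \begin{cases} \frac{3}{20}(4x^2 - x^3) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$, find the median and mode of X.

The probability of being below the median is $\frac{1}{2}$:
$\int_0^m \frac{3}{20}(4x^2 - x^3)\,dx = \frac{1}{2} \;\Leftrightarrow\; \frac{m^3}{5} - \frac{3m^4}{80} = \frac{1}{2}$
This is a quartic equation without any easy algebraic solution, so we solve it on the GDC.
[Figure: graph of $y = \frac{m^3}{5} - \frac{3m^4}{80}$ with the line $y = \frac{1}{2}$, crossing at m = 1.52 and m = 5.24; the line x = 2 is also marked.]
From GDC: m = 1.52 or 5.24.
However, 0 < m < 2, therefore median = 1.52.

For the mode, check for a local maximum:
$\frac{df}{dx} = \frac{6x}{5} - \frac{9x^2}{20} = \frac{3x}{20}(8 - 3x) = 0 \;\Leftrightarrow\; x = 0 \text{ or } x = \frac{8}{3}$
[Figure: graph of $y = \frac{df}{dx} = \frac{3x}{20}(8 - 3x)$, with x = 2 and x = 8/3 marked on the x-axis.]
Neither solution lies inside 0 < x < 2, and df/dx > 0 throughout this interval, so f(x) is increasing on the whole domain and its greatest values are at the upper end: the mode is x = 2.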
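The GDC step for the median can be imitated with a bisection search. This is a standard root-finding method chosen here for illustration. Restricting the search to 0 < m < 2 is what rules out the root near 5.24. The helper names `cdf`, `bisection` and `dfdx` are our own. `cdf` is the integral found above, valid for 0 ≤ m ≤ 2.

```python
def cdf(m):
    """P(X < m) for 0 <= m <= 2: the integral of (3/20)(4x^2 - x^3) from 0 to m."""
    return m ** 3 / 5 - 3 * m ** 4 / 80


def bisection(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


# Searching only inside the domain discards the spurious root near 5.24.
median = bisection(lambda m: cdf(m) - 0.5, 0, 2)


def dfdx(x):
    return 3 * x * (8 - 3 * x) / 20


# df/dx > 0 at every sampled point of 0 < x < 2, consistent with f increasing there.
increasing = all(dfdx(0.01 * i) > 0 for i in range(1, 200))

print(round(median, 2), increasing)
```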
[Figures: normal curves shaded to show P(X > 110) and P(18.5 < X < 21).]
The average height of people in a town is 170 cm with a standard deviation of 10 cm. What is the
probability that a randomly selected resident:
(a) is less than 165 cm tall?
(b) is between 180 cm and 190 cm tall?
(c) is over 176 cm tall?
(a) $P(X < 165) = 0.309$ (3SF) (from GDC)
(b) $P(180 < X < 190) = 0.136$ (3SF) (from GDC)
(c) $P(X > 176) = 0.274$ (3SF) (from GDC)
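A calculator's normal cumulative distribution function can be imitated in Python with `statistics.NormalDist` (Python 3.8+), used here as a stand-in for the GDC. Like the GDC working, the sketch assumes that the heights follow N(170, 10²).

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # assumed model: mean 170 cm, sd 10 cm

p_a = heights.cdf(165)                       # P(X < 165)
p_b = heights.cdf(190) - heights.cdf(180)    # P(180 < X < 190)
p_c = 1 - heights.cdf(176)                   # P(X > 176)

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))
```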
(a) $z = \dfrac{x - \mu}{\sigma}$
The number of standard deviations away from the mean is measured by the Z-score.
(b) $z = -1.2$
Values below the mean have a negative Z-score:
$-1.2 = \dfrac{x - 15}{2.5} \;\Rightarrow\; x - 15 = -3 \;\Rightarrow\; x = 12$
We are given that x = 6.1, so we can calculate z:
(a) $P(X \le 6.1) = P\left(Z \le \dfrac{6.1 - 6}{0.5}\right) = P(Z \le 0.2)$
(c) $P(X > 6.5) = 1 - P(X \le 6.5) = 1 - P\left(Z \le \dfrac{6.5 - 6}{0.5}\right) = 1 - P(Z \le 1) = P(Z > 1)$
You can see from the examples above that you don’t actually
have to convert probabilities into the form P(X ≤ k) every time;
simply replace the x values by the corresponding z scores.
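The equivalence between working with x-values and with z-scores can be checked numerically. This sketch assumes X ~ N(6, 0.5²), as the working for x = 6.1 and x = 6.5 implies, and μ = 15, σ = 2.5 for the "working backwards" step. The function `z_score` is a hypothetical helper, and `statistics.NormalDist` stands in for tables or a GDC.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


Z = NormalDist()           # standard normal, Z ~ N(0, 1)
X = NormalDist(6, 0.5)     # assumed: mean 6, standard deviation 0.5

z_a = z_score(6.1, 6, 0.5)   # approximately 0.2
z_c = z_score(6.5, 6, 0.5)   # 1.0

# Replacing x-values by their z-scores gives the same probabilities.
print(X.cdf(6.1), Z.cdf(z_a))            # P(X <= 6.1) and P(Z <= 0.2)
print(1 - X.cdf(6.5), 1 - Z.cdf(z_c))    # P(X > 6.5) and P(Z > 1)

# Working backwards from a z-score: x = mu + z * sigma.
x_b = 15 + (-1.2) * 2.5
print(x_b)
```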
Exercise 24C
1. Find the following probabilities:
(a) If X ~ N(20, …),
(i) P(X ≤ 32) (ii) P(X < 12)
(b) If Y ~ N(4.8, 1.44),
(i) P(Y > 5.1) (ii) P(… .4)
(c) If R ~ N(17, 2)
(i) P(16 … R … 20) (ii) P(… .4 … R … 18.…)
(d) If Q has a normal distribution with mean 12 and
standard deviation 3:
(i) P (Q > 9.4 ) (ii) P (Q < 14 )
(e) If F has a normal distribution with mean 100 and
standard deviation 25:
(i) P ( F − 100 < 15 ) (ii) P ( F − 100 > 10 )
8. If Q ~ N(4, …), find:
(a) P(… 5 < …)
(b) P(Q … Q) [6 marks]
The size of men’s feet is thought to be normally distributed with mean 22 cm and variance
25 cm². A shoe manufacturer wants only 5% of men to be unable to find shoes large enough for
them. How big should their largest shoe be?
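One way to approach this numerically is to find the 95th percentile of foot size, since only 5% of men should have feet larger than the largest shoe. The sketch below assumes that reading of the question. It uses `statistics.NormalDist.inv_cdf` as a stand-in for the GDC's inverse normal function. The variance of 25 cm² must first be converted to a standard deviation of 5 cm.

```python
from statistics import NormalDist

feet = NormalDist(mu=22, sigma=25 ** 0.5)   # variance 25 cm^2 -> standard deviation 5 cm

# Largest shoe: the foot size exceeded by only 5% of men.
largest = feet.inv_cdf(0.95)

# Equivalent route via the standard normal: x = mu + z * sigma.
largest_via_z = 22 + NormalDist().inv_cdf(0.95) * 5

print(round(largest, 1), round(largest_via_z, 1))
```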
The masses of gerbils are thought to be normally distributed. If 30% of gerbils have a mass of
more than 65 g and 20% have a mass of less than 40 g, estimate the mean and the variance of
the mass of a gerbil.
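Working backwards from two probabilities gives two equations in μ and σ, one for each Z-score: $\frac{65 - \mu}{\sigma} = z_1$ with $P(Z > z_1) = 0.3$, and $\frac{40 - \mu}{\sigma} = z_2$ with $P(Z < z_2) = 0.2$. The sketch below solves them, using `statistics.NormalDist().inv_cdf` as a stand-in for tables or a GDC.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal

z_upper = Z.inv_cdf(1 - 0.30)    # P(X > 65) = 0.3  ->  (65 - mu) / sigma = z_upper
z_lower = Z.inv_cdf(0.20)        # P(X < 40) = 0.2  ->  (40 - mu) / sigma = z_lower

# Subtracting the two equations: 65 - 40 = (z_upper - z_lower) * sigma
sigma = (65 - 40) / (z_upper - z_lower)
mu = 65 - z_upper * sigma
variance = sigma ** 2

print(round(mu, 1), round(variance))
```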
Exercise 24D
1. (a) If X ~ N(14, 49), find x if:
(i) P(X … x) = 0.8 (ii) P(X … x) = 0.46
(b) If X ~ N(36.5, 10), find x if:
(i) P(X … x) = 0.9 (ii) P(X … x) = 0.4
(c) If X ~ N(0, 12), find x if:
(i) P(X < …) (ii) P(X < 0.8)
Summary
• Because we group continuous data, the probability of a continuous random variable (crv) is discussed in terms of the probability of it being in a given range. To do this we integrate a probability density function such that the area under the curve f(x) represents the probability. The probability of the crv falling between values a and b is:
$P(a < X < b) = \int_a^b f(x)\,dx$
• For continuous random variables, the formulae for expectation and variance require integration:
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx; \quad E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx; \quad \mathrm{Var}(X) = E(X^2) - [E(X)]^2$
• The median, m, of a continuous variable satisfies $\int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$, and the mode is the value of x at the maximum value of f(x).
• One very important continuous distribution is the normal distribution: X ~ N(μ, σ²), where μ = mean and σ² = variance. Calculators can provide approximate probabilities of being in any given range.
• For X ~ N(μ, σ²), the Z-score (z) measures the number of standard deviations from the mean that a value (x) is: $z = \dfrac{x - \mu}{\sigma}$.
• Given a random variable X ~ N(μ, σ²), Z is a new random variable that takes the values equal to the Z-scores of x, such that for every x there is a corresponding z. This is the standardised value, which always has a normal distribution Z ~ N(0, 1), called the standard normal distribution.
• If you need to calculate the μ and σ² of a normal distribution, you can use the standard normal distribution to replace values of X with their Z-scores, as these follow the known distribution Z ~ N(0, 1).
The height of many trees in a forest is measured and they have a mean of 7 m and a
standard deviation of 1.5 m. Estimate the proportion of trees above 10 m tall.
If we make the reasonable assumption that heights of trees are normally distributed, this
problem is asking what is the probability of being more than 2 standard deviations above
the mean. This is $1 - \Phi(2) \approx 2.3\%$.
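The final step can be checked with `statistics.NormalDist` as a stand-in for the GDC, again assuming that the tree heights are normally distributed with mean 7 m and standard deviation 1.5 m:

```python
from statistics import NormalDist

trees = NormalDist(mu=7, sigma=1.5)   # assumed normal model for tree heights

z = (10 - 7) / 1.5                    # 10 m is 2 standard deviations above the mean
p_tall = 1 - trees.cdf(10)            # equals 1 - Phi(2)

print(z, round(100 * p_tall, 1))      # proportion as a percentage
```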
Short questions
1. If X is a continuous random variable with pdf $f(x) = \begin{cases} k \cdot 2^{x} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
(a) Find the value of k.
(b) Find the variance of X. [6 marks]
2. The test scores of a group of students are normally distributed with mean 62
and variance 144.
(a) Find the percentage of students with scores above 80.
(b) What is the lowest score achieved by the top 50% of the students? [6 marks]
3. 200 people are asked to estimate the size of an angle. 16 give an estimate that is less than 25° and 42 give an estimate that is more than 35°.
Assuming that the data follow a normal distribution, estimate the mean
and the standard deviation of the results. [6 marks]
4. If X is a continuous random variable with pdf
$f(x) = \begin{cases} ax + b & 1 \le x < 5 \\ 0 & \text{otherwise} \end{cases}$
and E(X) = 3.5, find the exact values of the constants a and b. [5 marks]
5. The adult female of a breed of dog has average height 0.7 m with variance
0.05 m². If the height follows a normal distribution, find the probability that in
six independently selected dogs of this breed exactly four are above 0.75 m tall.
[5 marks]
6. If Z ~ N(0, 1), prove that for positive k:
$P(|Z| > k) = 2 - 2\Phi(k)$ [5 marks]
Long questions
1. A continuous random variable X has the probability density function
$f(x) = \begin{cases} ax^2(5 - x) & 0 < x < 5 \\ 0 & \text{otherwise} \end{cases}$
(a) Find the value of the constant a.
(b) Evaluate the mean and the standard deviation of X.
(c) Find the probability that X > 4.
(d) Find the standard deviation of a normal distribution which has the same mean as X and the same probability that X > 4. [12 marks]
2. The continuous random variable X has probability density function f (x) where:
⎧ e − kek 0 x≤1
f ( x) = ⎨
⎩ 0 otherwise