Probability
Probability Spaces
How likely is it that at least two of these people have the same birthday? Is it
• Extremely unlikely?
• Unlikely?
• Likely?
How likely is it that two (or more) of these people were born on June 15?
Example 1.1.2 In poker, a full house (3 cards of one rank and two of another, e.g. 3 fours
and 2 queens) beats a flush (five cards of the same suit).
A player is more likely to be dealt a flush than a full house. We will be able to
precisely quantify the meaning of “more likely” here.
both equally likely. The outcome of each toss is unpredictable; so is the sequence
of H and T.
However, as the number of tosses gets large, we expect that the number of H
(heads) recorded will fluctuate around 1/2 of the total number of tosses. We say
the probability of a H is 1/2, abbreviated by
P(H) = 1/2.
Of course P(T) = 1/2 also.
Example 1.1.4 Suppose three coins are tossed. What is the probability of the outcome
“2H, 1T”?
The 8 possible outcomes are HHH, HHT, HTH, THH, HTT, THT, TTH, TTT. All of these 8 outcomes are equally likely, and three of them (HHT, HTH, THH) involve the combination “2H, 1T”. So the probability of this combination is 3/8:
P(2H, 1T) = 3/8.
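The enumeration above is small enough to check by brute force; here is a quick Python sketch (the variable names are ours, not from the notes):

```python
from itertools import product
from fractions import Fraction

# Enumerate all 2^3 = 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))
assert len(outcomes) == 8

# Count the outcomes with exactly 2 heads (and hence 1 tail).
favourable = [o for o in outcomes if o.count("H") == 2]
prob = Fraction(len(favourable), len(outcomes))
print(prob)  # 3/8
```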
Problem 1.1.5 If 4 coins are tossed, what is the probability of getting 3 heads and 1 tail?
NOTES:
• In general, an event has associated to it a probability, which is a real number
between 0 and 1.
• Events which are unlikely have low (close to 0) probability, and events
which are likely have high (close to 1) probability.
Example 1.1.6 (Probability and Addition)
(a) A fair die is cast. What is the probability that it will show either a 5 or a 6?
(b) Two fair dice are cast. What is the probability that at least one of them will show a
6?
SOLUTION:
(a) There are six possible outcomes :
1, 2, 3, 4, 5, 6,
all equally likely - each has probability 1/6. So the probability of the event “5 or
6” is 1/3: we expect this outcome in 1/3 of cases.
P(5 or 6) = 1/3.
Note: P(5) = 1/6 and P(6) = 1/6. So
P(5 or 6) = 1/3 = P(5) + P(6).
In this example the probability of one or other of the events “5” and “6” is the
sum of their individual probabilities.
(b) We need P(at least one die shows a 6). Of the 36 equally likely outcomes for
the two dice, 11 include at least one 6 (five with a 6 on the red die only, five with
a 6 on the blue die only, and one with a 6 on both). So
P(at least one 6) = 11/36.
Note that this is not P(6 on die 1) + P(6 on die 2) = 1/3: the outcome in which
both dice show a 6 would be counted twice (the two events can occur together in this
experiment).
The events in (b), “Die 1 shows 6” and “Die 2 shows 6” are not mutually exclu-
sive - it is possible for both to occur simultaneously in the experiment described
in (b), if both dice show 6.
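The point about non-mutually-exclusive events can be checked by enumerating all 36 outcomes directly (a Python sketch; the names are ours):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Part (b): at least one die shows a 6.
at_least_one_six = [o for o in outcomes if 6 in o]
p = Fraction(len(at_least_one_six), len(outcomes))
print(p)  # 11/36

# Not simply 1/6 + 1/6 = 1/3: the two events overlap on the outcome (6, 6).
assert p != Fraction(1, 3)
```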
1.2 Sample Spaces and Events
The purpose of the following definition is to establish some of the language that
is commonly used to discuss problems in probability theory.
• The sample space, S, of an experiment is the set of possible outcomes for the ex-
periment. (The sample space may be finite or infinite).
• An EVENT is a subset of the sample space, i.e. a (possibly empty) collection of pos-
sible outcomes to the experiment. We say that an event A OCCURS if the outcome
is an element of A.
NOTE: The difference between an outcome and an event in this context is that an
outcome is an individual possible result of the experiment; an event may include
many different possible outcomes.
The following examples describe how these definitions might apply in various
applications.
Define A = {1, 2} ⊆ S.
The set A is the event “the outcome is less than 3”.
Example 1.2.4
Experiment: Roll two dice (one red, one blue) and record the scores.
The Sample Space is S = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 1), . . . , (6, 6)} (listed
with the score on the red die first)
(36 = 6^2 possible outcomes, all equally likely if both dice are fair).
Example 1.2.5
Experiment: Roll two dice and observe the total score.
Example 1.2.6
Experiment: Continue tossing a coin until a tail appears, and observe the number of
tosses.
Example 1.2.7
Experiment: Measuring the hours a computer disk runs.
2. Union : A ∪ B = {a ∈ S : a ∈ A or a ∈ B or both}.
4. The complement of A in S is
Ā = {a ∈ S : a ∉ A}.
Ā, also denoted S\A, contains exactly those elements of S which are not in
A.
A ∩ Ā = ∅ and S = A ∪ Ā,
6. The power set of S, denoted P(S), is the set whose elements are all the subsets
of S. If S is the sample space of an experiment, then P(S) is the set of possible
events.
In Probability Theory, a probability P(A) is assigned to every subset A of the sam-
ple space S of an experiment (i.e. to every event). The number P(A) is a measure
of how likely the event A is to occur and ranges from 0 to 1. We insist that the
following two properties be satisfied :
1. P(S) = 1 : The outcome is certain to be an element of the sample space.
2. If A1 , A2 , A3 . . . are pairwise mutually exclusive events, then
P(A1 ∪ A2 ∪ A3 ∪ . . . ) = P(A1 ) + P(A2 ) + P(A3 ) + . . .
Example 1.2.9 Suppose a die is unfairly balanced, so that it shows a 6 with probability
1/2, the other 5 outcomes being equally likely.
Experiment: Roll this unfair die.
Sample Space : S = {1, 2, 3, 4, 5, 6}
What should the probability measure be?
Solution:
P(6) = 1/2
=⇒ P({1, 2, 3, 4, 5}) = 1/2
=⇒ P(1) = P(2) = P(3) = P(4) = P(5) = (1/5) × (1/2) = 1/10.
Note : Here |A| denotes the number of elements in A. The above definition of the
probability measure P says
• If the event A includes the outcome 6 then its probability is 1/2 (for the “6”)
plus 1/10 for each remaining outcome in A, of which there are |A| − 1.
• If the event A does not include the outcome 6 then its probability is |A|/10 -
the number of outcomes included in A is |A|, and each of these has the same
probability 1/10.
(i) P(1) = P(2) = P(3) = P(4) (i.e. outcomes 1,2,3,4 are all equally likely).
(ii) P(5) = P(6) = 2P(1) (i.e. outcomes 5 and 6 are equally likely, but each is
twice as likely as any of the other four possible outcomes).
Properties 1 and 2 of Definition 1.2.3 are the only Properties which we insist must
be satisfied. In Section 1.1.3 we will derive some consequences of these proper-
ties, and see that they are consistent with what we would expect based on exam-
ples.
1.3 Further Properties of Probability
Throughout this section, let S be a sample space (for some experiment), and let
P : P(S) −→ [0, 1]
be a probability measure.
2. Complement of A (not A)
For A ⊆ S, Ā (the complement of A) is the set of outcomes which do not
belong to A. So the event Ā occurs precisely when A does not. Since A and
Ā are mutually exclusive (disjoint) and A ∪ Ā = S, we have by properties 1
and 2
P(A) + P(Ā) = P(S) = 1.
Thus
P(Ā) = 1 − P(A) for any event A. (Property 3)
Example 1.3.1 Suppose A and B are subsets of S with P(A) = 0.2, P(B) = 0.6 and
P(A ∩ B) = 0.1. Calculate
(a) P(A ∪ B)
(b) P(Ā)
(c) P(Ā ∪ B)
SOLUTION
(a) A ∪ B is the disjoint union of A ∩ B̄, A ∩ B and Ā ∩ B (disjoint means that no
two of these three sets have any elements in common). Hence
P(A ∪ B) = P(A ∩ B̄) + P(A ∩ B) + P(Ā ∩ B), and P(A ∩ B̄) = P(A) − P(A ∩ B) (Property 4).
Thus in our example P(A ∩ B̄) = 0.2 − 0.1 = 0.1.
Similarly P(Ā ∩ B) = P(B ∩ Ā) = P(B) − P(B ∩ A) = 0.6 − 0.1 = 0.5. So
P(A ∪ B) = 0.1 + 0.1 + 0.5 = 0.7.
Also,
P(A ∩ B) = P(A) + P(B) − P(A ∪ B) (Property 6)
Example 1.3.2 Weather records over 1000 days show rain on 550 days and sunshine
on 350 days. If 700 days had either rain or sunshine, find the probability that on a day
selected at random there was
Note first that P(R ∩ S) = P(R) + P(S) − P(R ∪ S) = 550/1000 + 350/1000 − 700/1000 = 1/5.
(b) P(R ∩ S̄) = P(R) − P(R ∩ S) by Property 4.
So P(R ∩ S̄) = 550/1000 − 1/5 = 7/20.
So P(R̄ ∩ S̄) = 1 − 700/1000 = 3/10.
Problem 1.3.3 A college has 400 2nd Science students, of whom 150 study mathematics,
50 study both mathematics and chemistry, and 220 study either mathematics or chem-
istry. Find the probability that a student selected at random studies
(a) Chemistry
1.4 Conditional Probability
Example 1.4.1 Two fair dice are rolled, 1 red and 1 blue. The Sample Space is
(36 outcomes, all equally likely; here (2, 3) denotes the outcome where the red die shows 2
and the blue one shows 3).
A NSWER: Suppose C occurs : so the outcome is either (4, 6), (5, 5) or (6, 4).
In two of these cases, namely (4, 6) and (6, 4), the event D also occurs. Thus P(D|C) = 2/3.
NOTE: In the above example
P(C ∩ D) = 1/18 = (1/12) × (2/3) = P(C) × P(D|C).
This is an example of a general rule.
Definition 1.4.2 If A and B are events, then P(A|B), the conditional probability of A
given B, is defined by
P(A|B) = P(A ∩ B)/P(B).
Definition 1.4.4 (Independence) Let A and B be events. They are called independent
if
P(A ∩ B) = P(A) × P(B).
This means that the probability of one of these events occurring is unaffected by
whether the other occurs or not.
NOTE: If the events A and B are mutually exclusive then P(A|B) = 0 and P(B|A) =
0.
Problem 1.4.5 Experiment : Roll two dice, one red and one blue. Which of the following
pairs of events are independent? In each case find P(A), P(B), P(A ∩ B), P(A|B) and
P(B|A).
(a) A: Red die shows even score. B: Blue die shows even score.
(b) A: Red die scores 1. B: Total score is 8.
(c) A: Red die shows even score. B: Total score is even.
(d) A: Red die scores 4 or more. B: Total score is 6 or more.
Example 1.4.6 A factory has two machines A and B making 60% and 40% respectively
of the total production. Machine A produces 3% defective items, and B produces 5%
defective items. Find the probability that a given defective part came from A.
SOLUTION: We consider the following events:
Proof:
1. Certainly P(Ei|A) = P(A ∩ Ei)/P(A).
2. P(A ∩ Ei) = P(Ei)P(A|Ei) by Definition 1.4.2.
3. To describe P(A) :
P(A) = P(A ∩ E1) + P(A ∩ E2) + · · · + P(A ∩ En)
= P(E1)P(A|E1) + P(E2)P(A|E2) + · · · + P(En)P(A|En)
= Σ_{j=1}^{n} P(Ej)P(A|Ej).
Hence
P(Ei|A) = P(Ei)P(A|Ei) / Σ_{j=1}^{n} P(Ej)P(A|Ej).
Example 1.4.9 A test is developed to detect a certain medical condition. It is found that
the population breaks into three categories.
Category A Afflicted individuals who give a positive result 97%
of the time
Category D Individuals with a different condition who
give a positive result 10% of the time
Category H Unaffected individuals who give a positive result 5% of
the time.
The categories A, D and H represent 1%, 3% and 96% of the population respec-
tively. What is the probability that someone who is selected at random and tests
positive for the condition actually has the condition?
Let A be the event someone is afflicted with the disease and R the event that the
test result is positive. We want P(A|R).
By Bayes’s theorem
P(A|R) = P(A)P(R|A) / [P(A)P(R|A) + P(D)P(R|D) + P(H)P(R|H)]
= (0.01)(0.97) / [(0.01)(0.97) + (0.03)(0.10) + (0.96)(0.05)]
≈ 0.16
So only about 16% of those who test positive actually have the disease. This is not a good test.
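The arithmetic in Example 1.4.9 is easy to mistype; a short Python sketch (the dictionary and variable names are ours) reproduces the Bayes computation:

```python
# Prior probabilities of the three categories, and P(positive | category),
# taken from Example 1.4.9.
p_cat = {"A": 0.01, "D": 0.03, "H": 0.96}
p_pos = {"A": 0.97, "D": 0.10, "H": 0.05}

# Total probability of a positive test (law of total probability).
p_positive = sum(p_cat[c] * p_pos[c] for c in p_cat)

# Bayes' theorem: P(afflicted | positive).
p_a_given_pos = p_cat["A"] * p_pos["A"] / p_positive
print(round(p_a_given_pos, 4))  # ≈ 0.1598
```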
Problem 1.4.10 Suppose the test is carried out twice and the test case tests positive both
times. Assuming the two tests are independent, what then is the probability that the test
case has the disease?
Problem 1.4.11 A machine consists of four components linked in parallel, so that the
machine fails only if all four components fail. Assume component failures are independent
of each other. If the components have probabilities 0.1, 0.2, 0.3 and 0.4 of failing when the
machine is turned on, what is the probability that the machine will function when turned
on?
Problem 1.4.12 A student takes a multiple choice exam in which each question has 5
possible answers, one correct. For 70% of the questions, the student knows the answer,
and she guesses at random for the remaining 30%.
(a) What is the probability that on a given question the student gets the correct answer?
(b) If the student answers a question correctly, what is the probability that she knows
the answer?
1.5 Some Counting Principles
In discrete problems, estimating the probability of some event often amounts to
counting the number of possible outcomes that have some property of interest,
and expressing this as a proportion of the total number of outcomes. For example,
determining the probability that a randomly dealt poker hand is a full house
means counting the number of possible full houses and counting the number of
total poker hands. “Discrete” means roughly that the outcomes of the experiment
are somehow “spread out” instead of blended together in a continuous fashion.
So there is a close connection between problems of discrete probability theory
and problems of enumerative combinatorics (i.e. counting).
The following is a list of some basic but important counting principles.
n! = n × (n − 1) × · · · × 2 × 1.
n × (n − 1) × · · · × (n − k + 1) = n!/(n − k)!.
There are n choices for the first object, n − 1 for the second, etc., finally
n − k + 1 for the kth object. This number is called nPk, the number of k-
permutations of n distinct objects.
Example 1.5.2 In a race with 8 runners, the number of ways in which the gold,
silver and bronze medals can be awarded is
8P3 = 8!/(8 − 3)! = 8 × 7 × 6 = 336.
(n choose k) = n!/(k!(n − k)!)
Example 1.5.3 If the 8-person race ends in a three-way dead heat, the number of
possibilities for the three winners is
(8 choose 3) = 8!/(3!(8 − 3)!) = 8!/(3!5!) = 56.
Example 1.5.4 Find the probability that a randomly dealt poker hand will be
(a) A full house (3 cards of one rank and 2 of another).
(b) A (non-straight) flush (5 cards of the same suit, not of consecutive rank).
(a) To count the full houses : there are 13 choices for the rank of the 3-card set.
Within the four cards of this rank, the number of ways to choose three is (4 choose 3) = 4.
Having chosen these, 12 choices remain for the rank of the pair. Within this rank
there are (4 choose 2) = 6 ways to choose 2 cards. Thus the number of full houses is
13 × (4 choose 3) × 12 × (4 choose 2) = 13 × 4 × 12 × 6 = 3744.
Thus the probability of a full house being dealt is
3744/(52 choose 5) ≈ 0.0014.
(b) Number of Flushes : There are four choices for a suit, and having chosen a suit
the number of ways to choose 5 cards belonging to it is (13 choose 5). Thus the number of
flushes in a 52-card deck is
4 × (13 choose 5) = 5148.
From this count we need to exclude the straight flushes, of which there are 40:
there are 10 in each suit, since a straight may have any of Ace, 2, 3, . . . , 10 as the
lowest rank card (according to my recent poker researches, the sequences A,2,3,4,5
and 10,J,Q,K,A both qualify as straights). Thus the number of straight flushes is
40, and the number of non-straight flushes is 5148 − 40 = 5108. The probability
of a non-straight flush is
5108/(52 choose 5) ≈ 0.002.
So a non-straight flush is more likely than a full house.
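The counts in Example 1.5.4 can be verified with math.comb (the variable names below are ours):

```python
from math import comb

hands = comb(52, 5)                        # 2,598,960 five-card hands
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)
flushes = 4 * comb(13, 5)                  # all same-suit hands, straights included
straight_flushes = 4 * 10                  # 10 straight sequences in each suit
non_straight_flushes = flushes - straight_flushes

print(full_houses, non_straight_flushes)   # 3744 5108
print(full_houses / hands, non_straight_flushes / hands)

# A non-straight flush is more likely than a full house.
assert non_straight_flushes / hands > full_houses / hands
```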
Problem 1.5.5 Find the probability that a randomly dealt poker hand is
(a) A straight flush.
(b) A straight (5 cards of consecutive ranks, not all of the same suit).
Problem 1.5.6 A bridge hand contains 13 cards. Find the probability that a randomly
dealt bridge hand
Example 1.5.7 (The “Birthday” Problem) Twenty people are assembled in a room. Find
the probability that two of them have the same birthday. (Assume there are 365 days, all
equally likely to be a birthday.)
SOLUTION: Let (x1, . . . , x20) be the list of birthdays (so xi is the birthday of the ith
person). The number of possibilities for the list is 365^20. The number with the
xi all different is 365P20 (ordered lists of 20 from 365):
365P20 = 365!/(345)! = 365 × 364 × 363 × · · · × 347 × 346.
Thus the probability that at least two birthdays are the same is 1 − 365P20/365^20 ≈ 0.41.
Problem 1.5.8 Give a formula for the probability that among n people, two have the
same birthday. The following table indicates the (maybe surprising?) values of this prob-
ability for various values of n:
n = 23 ∼ 0.51
n = 30 ∼ 0.71
n = 37 ∼ 0.85
n = 50 ∼ 0.97
n = 100 > 0.999999
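One way to compute the probabilities in the table numerically is the following short function (the function name is ours):

```python
from math import prod

def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday
    (365 possible birthdays, all equally likely)."""
    if n > 365:
        return 1.0  # pigeonhole: a repeat is certain
    # P(all n birthdays distinct) = 365Pn / 365^n, computed factor by factor.
    p_all_different = prod((365 - i) / 365 for i in range(n))
    return 1 - p_all_different

for n in (20, 23, 30, 37, 50):
    print(n, round(p_shared_birthday(n), 2))  # matches the table values above
```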
This concludes Chapter 1, Probability Spaces. A “Probability Space” is just a
sample space equipped with a probability measure.
Chapter 2
P(X = 4) = (5 choose 4)(0.6)^4(0.4)^1 = 0.2592
- we can choose 4 of the 5 tosses for the H in (5 choose 4) ways.
P(X = 5) = (5 choose 5)(0.6)^5 = 0.07776
[Figure: bar chart of the probabilities fX(k) for k = 0, 1, 2, 3, 4, 5; vertical axis marked 0.1, 0.2, 0.3, 0.4.]
Probability Function for X
REMARKS
• X in this example is a discrete random variable. It associates a number (in this
case the number of heads) to every outcome of an experiment. Officially, a
random variable is a function X : S −→ R.
• X(S) denotes the set of possible values assumed by the random variable X
(so in this example X(S) = {0, 1, 2, 3, 4, 5} - possible numbers of heads in 5
tosses).
• The probability function of X is the function
fX : X(S) −→ [0, 1] defined by fX(k) = P(X = k).
THE BINOMIAL PROBABILITY FUNCTION
Definition 2.1.2 A Bernoulli Trial is an experiment with two possible outcomes, suc-
cess (s), occurring with probability p, and failure (f), occurring with probability q =
1 − p.
Definition 2.1.3 If a Bernoulli trial (in which success has probability p) is performed
n times, let X be the number of successes in the n trials. Then X is called a binomial
random variable with parameters n and p.
So the X of Example 2.1.1 is a binomial random variable with parameters n = 5 and
p = 0.6.
The probability function for X is given by
fX(k) = (n choose k) p^k (1 − p)^(n−k), for k = 0, 1, . . . , n.
That this is true follows from the binomial theorem applied to the expression
(p + (1 − p))^n. Recall that the binomial theorem says
(x + y)^n = Σ_{k=0}^{n} (n choose k) x^k y^(n−k).
Problem 2.1.4 If 2% of items produced by a factory are defective, find the probability
that a (randomly packed) box of 20 items contains
THE HYPERGEOMETRIC PROBABILITY FUNCTION
Example 2.1.5 A lotto competition works as follows. An entry involves selecting four
numbers from {1, 2, . . . , 20}. After all entries have been completed, four winning numbers
are selected from the set {1, 2, . . . , 20}. A “match” occurs within an entry when that entry
includes one of the four winning numbers - so the number of matches in an entry can be
0,1,2,3 or 4. If X is the number of “matches” in a random selection of four numbers,
describe the probability function of X.
A. X = 0 if the entry contains none of the winning numbers, but four of the 16
losing numbers. This can happen in (16 choose 4) ways. So
P(X = 0) = fX(0) = (16 choose 4)/(20 choose 4) ≈ 0.3756.
B. X = 1 if the entry contains one of the 4 winning numbers, and three of the 16
losing numbers. This can happen in (4 choose 1)(16 choose 3) ways. So
P(X = 1) = fX(1) = (4 choose 1)(16 choose 3)/(20 choose 4) ≈ 0.4623.
C. X = 2 if the entry contains two of the 4 winning numbers, and two of the 16
losing numbers. This can happen in (4 choose 2)(16 choose 2) ways. So
P(X = 2) = fX(2) = (4 choose 2)(16 choose 2)/(20 choose 4) ≈ 0.1486.
D. X = 3 if the entry contains three of the 4 winning numbers, and one of the
16 losing numbers. This can happen in (4 choose 3)(16 choose 1) ways. So
P(X = 3) = fX(3) = (4 choose 3)(16 choose 1)/(20 choose 4) ≈ 0.0132.
E. X = 4 if the entry contains all four of the winning numbers. This can happen
only in one way. So
P(X = 4) = fX(4) = 1/(20 choose 4) ≈ 0.0002.
NOTE: This problem can be approached in a slightly more systematic way as
follows. Entering the lotto involves selecting 4 objects from 20, of which four are
“good” and 16 are “bad”. The random variable X is the number of “good” objects
in a selection. For k = 0, 1, 2, 3, 4, the number of selections that include k “good”
objects is
(4 choose k)(16 choose 4 − k).
Thus the probability function for X is given by
P(X = k) = fX(k) = (4 choose k)(16 choose 4 − k)/(20 choose 4).
Definition 2.1.6 Suppose that a collection of n distinct objects includes m “good” ob-
jects and n − m “bad” objects. Suppose k ≤ m and let X be the number of good objects
in a random selection of k objects from the full collection. Then X is a hypergeomet-
ric random variable with parameters (n, m, k) and the probability distribution of X is
given by
P(X = r) = fX(r) = (m choose r)(n − m choose k − r)/(n choose k), for r = 0, 1, . . . , k.
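The case-by-case lotto analysis above collapses into this one formula; a quick Python sketch (the function name is ours) reproduces the numbers of Example 2.1.5:

```python
from math import comb

def hypergeom_pmf(r: int, n: int, m: int, k: int) -> float:
    """P(X = r): r good objects among k drawn from n objects, m of them good."""
    return comb(m, r) * comb(n - m, k - r) / comb(n, k)

# The lotto of Example 2.1.5: n = 20 numbers, m = 4 winning, k = 4 chosen.
for r in range(5):
    print(r, round(hypergeom_pmf(r, 20, 4, 4), 4))
```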
Problem 2.1.7 Let X be the number of clubs in a randomly dealt poker hand. Write down
a formula for the probability function of X.
Example 2.1.8 A fair die is rolled repeatedly until it shows a 6. Let X be the number of
rolls required. Describe the probability function of X.
SOLUTION: Each roll is a Bernoulli trial in which a 6 represents success (s), and
occurs with probability 1/6.
Any other score is failure : P(f) = 5/6.
Sample Space : S = {s, fs, ffs, fffs, . . . }
Random Variable X : X(s) = 1, X(fs) = 2, X(fffffs) = 6, etc.
X(S) = {1, 2, 3, . . . } = N - the possible values of X.
Probability Function :
fX(1) = P(s) = 1/6
fX(2) = P(fs) = (5/6) × (1/6)
. . .
fX(k) = P(ff . . . fs) = (5/6)^(k−1) × (1/6), for k ∈ N (k − 1 failures, then a success).
NOTE: We should expect Σ_{k=1}^{∞} fX(k) = 1; this is
1/6 + (5/6) × (1/6) + (5/6)^2 × (1/6) + . . .
- a geometric series with first term 1/6 and common ratio 5/6. Its sum is
(1/6)/(1 − 5/6) = 1.
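The geometric-series check can be replicated exactly with Python's Fraction (names ours): the partial sums a(1 − r^n)/(1 − r) approach 1.

```python
from fractions import Fraction

def geometric_pmf(k: int) -> Fraction:
    """fX(k) = (5/6)^(k-1) * (1/6): first 6 appears on roll k."""
    return Fraction(5, 6) ** (k - 1) * Fraction(1, 6)

# Sum the first 100 terms exactly; what is missing is (5/6)^100, which is tiny.
partial = sum(geometric_pmf(k) for k in range(1, 101))
print(float(partial))  # very close to 1
assert 1 - partial == Fraction(5, 6) ** 100
```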
Definition 2.1.9 Independent Bernoulli trials with probability of success p are carried
out until a success is achieved. Let X be the number of trials. Then X is the geometric
random variable with parameter p, and probability function given by
fX(k) = (1 − p)^(k−1) p, for k = 1, 2, 3, . . .
(a) X ≤ 3.
(b) 4 ≤ X ≤ 7.
(c) X > 6.
NOTE: Recall that in a geometric series with initial term a and common ratio r, the sum
of the first n terms is a(1 − r^n)/(1 − r).
THE NEGATIVE BINOMIAL PROBABILITY FUNCTION
Definition 2.1.12 If independent Bernoulli trials with probability of success p are car-
ried out, let X be the number of trials required to observe r successes. Then X is the
negative binomial random variable with parameters r and p. The probability func-
tion of X is described by
fX(k) = (k − 1 choose r − 1) p^r (1 − p)^(k−r), for k = r, r + 1, . . .
REMARKS:
1. If r = 1, this is the geometric random variable.
2. Binomial random variable with parameters n, p: number of successes in n trials.
Negative binomial random variable with parameters r, p: number of trials re-
quired to get r successes.
2.2 Expectation and Variance of a Random Variable
Example 2.2.1 Let X be the number of clubs in a poker hand; X(S) = {0, 1, 2, 3, 4, 5}.
Then X is a hypergeometric random variable with parameters (52, 13, 5). The
probability distribution of X is given by
pX(k) = (13 choose k)(39 choose 5 − k)/(52 choose 5).
So the most likely number of clubs in a poker hand is 1. Suppose that a large num-
ber (say 10,000) of poker hands is dealt. Amongst the 10,000 hands, we “expect”
the following distribution of clubs :
No. of Clubs 0 1 2 3 4 5
No. of Hands 2215 4114 2743 816 107 5
1.25 = 0×pX(0) + 1×pX(1) + 2×pX(2) + 3×pX(3) + 4×pX(4) + 5×pX(5) = Σ_{k=0}^{5} k pX(k).
1.25 is the expectation, expected value or mean of the random variable X, denoted by
E(X) (or sometimes µ or µ(X)).
E(X) is a weighted average of the possible values of X, the weight attached to a
particular value being its probability.
Note: As Example 2.2.1 shows, E(X) may not actually be a possible value of X but
it is the value assumed “on average” by X.
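The weighted average defining E(X) can be recomputed directly from the hypergeometric probabilities (a sketch; the names are ours):

```python
from math import comb

# pX(k) = C(13,k) C(39,5-k) / C(52,5): number of clubs in a five-card hand.
pmf = {k: comb(13, k) * comb(39, 5 - k) / comb(52, 5) for k in range(6)}

# E(X) is the probability-weighted average of the possible values.
expectation = sum(k * p for k, p in pmf.items())
print(expectation)  # ≈ 1.25, i.e. 5 × (13/52)
```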
NOTE: Suppose that X is a hypergeometric random variable with parameters
(n, m, k) (i.e. X is the number of good objects chosen when k objects are selected
from a collection of n in which m are good and n − m are bad). Then
E(X) = k(m/n).
JUSTIFICATION FOR THIS ASSERTION: The proportion of “good” objects in the full
collection is m/n. So on average we would expect a selection of k objects to contain
(m/n)k good ones.
Example 2.2.4 Suppose that X is the number of smokers in a random selection of five
persons from the population. Suppose also that 20% of people smoke. What is the expected
value of X?
If X is a binomial random variable with parameters n and p, then E(X) = np.
Example 2.2.6 Find the variance and standard deviation of the random variable X of
Example 2.2.1.
Var(X) = k(m/n)(1 − m/n)((n − k)/(n − 1))
Example 2.2.7 Suppose that X is the number of successes in a single Bernoulli trial with
probability of success p. What is the variance of X?
SOLUTION: We know that E(X) = p and that X has only two possible values, 0
and 1, occurring with probabilities 1 − p and p respectively. Thus
Var(X) = E(X^2) − (E(X))^2 = p − p^2 = p(1 − p).
If X is a binomial random variable with parameters n and p, then X is a sum of n
independent Bernoulli variables,
X = X1 + X2 + · · · + Xn
and
Var(X) = Var(X1) + Var(X2) + · · · + Var(Xn) = np(1 − p).
Example 2.2.8 Suppose that X is the number of university graduates in a random selec-
tion of 12 people from the population, and that university graduates account for 18% of
the population. Find the expectation, variance and standard deviation of X.
• σ = √(σ^2) ≈ 1.331.
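For Example 2.2.8 the formulas E(X) = np and Var(X) = np(1 − p) give, in a few lines of Python (names ours):

```python
# Example 2.2.8: X binomial with n = 12 people, p = 0.18 (graduates).
n, p = 12, 0.18

mean = n * p                   # E(X) = np
variance = n * p * (1 - p)     # Var(X) = np(1 - p)
sigma = variance ** 0.5        # standard deviation

print(mean, variance, round(sigma, 3))  # ≈ 2.16, ≈ 1.7712, ≈ 1.331
```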
For the geometric random variable with parameter p:
E(X) = 1/p and Var(X) = (1 − p)/p^2.
For the negative binomial random variable with parameters r and p:
E(X) = r/p and Var(X) = r(1 − p)/p^2.
Chapter 3
Continuous random variables do not have probability functions in the sense dis-
cussed in Chapter 2. To discuss probability theory for continuous random vari-
ables, we need the idea of a cumulative distribution function.
Let S be a sample space and let X : S −→ R be a random variable on S.
k        0      1      2      3      4      5
P(X = k) 0.2215 0.4114 0.2743 0.0816 0.0107 0.0005
Graph of FX :
[Figure: graph of the cdf FX, a step function with jumps at k = 0, 1, 2, 3, 4, 5, rising to 1.0.]
Notes:
Example 3.1.4 A vertically mounted wheel is spun and allowed to come to rest. Let X
be the angle (measured counterclockwise) between its initial position and final position.
Then 0 ≤ X < 2π and the cdf for X is given by:
[Figure: graph of FX, rising in a straight line from 0 at x = 0 to 1 at x = 2π.]
X is equally likely to assume any value in the interval [0, 2π), so FX (x) increases
uniformly from 0 to 1 on [0, 2π]; the graph of FX on [0, 2π] is a line.
FX(x) = 0 for x < 0,
FX(x) = x/(2π) for 0 ≤ x ≤ 2π,
FX(x) = 1 for x > 2π.
X is an example of a uniform continuous random variable.
NOTE: Define a function f : R −→ R by
f(x) = 0 for x < 0,
f(x) = 1/(2π) for 0 ≤ x < 2π,
f(x) = 0 for x ≥ 2π.
[Figure: graph of f, constant at 1/(2π) on [0, 2π) and 0 elsewhere.]
Problem 3.1.5 Let A, B be real numbers with A < B. Suppose X is a continuous random
variable which assumes a value in the interval [A, B], all values in this interval being
equally likely. Write down a formula and graph for the cdf of X (X has the continuous
uniform distribution).
i.e. for every x ∈ R, P(X ≤ x) is the area under the graph of f(t), to the left of x.
NOTES:
Definition 3.1.7 Let X be a continuous random variable with pdf f. Then the expectation
of X is defined by
E(X) = ∫_{−∞}^{∞} t f(t) dt.
Definition 3.1.9 Var(X) = E((X − E(X))^2): variance of X. This is also given by
E(X^2) − (E(X))^2 (this is true also in the discrete case).
Fact: E(X^2) = ∫_{−∞}^{∞} t^2 f(t) dt. So
Var(X) = ∫_{−∞}^{∞} t^2 f(t) dt − (E(X))^2 = ∫_{−∞}^{∞} t^2 f(t) dt − (∫_{−∞}^{∞} t f(t) dt)^2.
The standard deviation σ of X is defined by σ = √Var(X).
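These integrals can also be approximated numerically; the sketch below uses a midpoint Riemann sum for a uniform random variable on [0, 1] (the choice of interval and all names are ours, for illustration only):

```python
# Approximate E(X) and Var(X) for a continuous uniform rv on [A, B]
# with pdf f(t) = 1/(B - A), using a midpoint Riemann sum.
# Exact values: E(X) = (A + B)/2 and Var(X) = (B - A)^2 / 12.
A, B, N = 0.0, 1.0, 100_000
dt = (B - A) / N
f = 1 / (B - A)

ts = [A + (i + 0.5) * dt for i in range(N)]   # midpoints of the subintervals
e_x = sum(t * f * dt for t in ts)             # ≈ ∫ t f(t) dt
e_x2 = sum(t * t * f * dt for t in ts)        # ≈ ∫ t^2 f(t) dt
var = e_x2 - e_x ** 2

print(round(e_x, 4), round(var, 4))  # ≈ 0.5 and ≈ 0.0833
```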
Problem 3.1.10 Find the variance and standard deviation of the random variable X of
Example 3.1.4.
3.2 The Normal Distribution
Consider the function f : R −→ R defined by
f(x) = (1/√(2π)) e^(−x²/2).
This function has the following properties:
• lim f(x) = 0.
x→∞
• lim f(x) = 0.
x→−∞
Definition 3.2.1 A continuous random variable having the above f as a pdf is said to
have the standard normal distribution. Such a random variable is said to have distri-
bution N(0, 1).
Problem 3.2.2 If X has distribution N(0, 1), show that E(X) = 0 and Var(X) = 1.
Outline of Solution:
1. E(X) = ∫_{−∞}^{∞} t · (1/√(2π)) e^{−t²/2} dt.

Note that the integrand here is an odd function, hence the integral is zero (provided that the improper integral converges, which is not difficult to check). Thus E(X) = 0.

2. Var(X) = ∫_{−∞}^{∞} t² · (1/√(2π)) e^{−t²/2} dt − (E(X))² = (1/√(2π)) ∫_{−∞}^{∞} t² e^{−t²/2} dt.

Integrating by parts with u = t, v′ = t e^{−t²/2} gives

Var(X) = [ −(1/√(2π)) t e^{−t²/2} ]_{−∞}^{∞} + (1/√(2π)) ∫_{−∞}^{∞} e^{−t²/2} dt = 0 + 1 = 1.
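The two integrals in this outline can also be checked numerically; a minimal sketch (truncating the integrals to [−10, 10], where the tails are negligible; the integrator is an illustrative helper):

```python
import math

phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)  # N(0,1) pdf

def integral(g, a=-10.0, b=10.0, n=200_000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mean = integral(lambda t: t * phi(t))      # odd integrand, so ≈ 0
var = integral(lambda t: t * t * phi(t))   # ≈ 1, as the integration by parts shows
```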
Example 3.2.3 Suppose that X is a random variable having the distribution N(0, 1).
Find
(a) P(X ≤ 2.0)
Solution: Let Φ : R −→ [0, 1] be the cdf of X. (The symbol Φ is the upper case of
the Greek letter phi). For part (a), we know from the definition of a pdf that
P(X ≤ 2.0) = Φ(2.0) = (1/√(2π)) ∫_{−∞}^{2} e^{−t²/2} dt.
PROBLEM: The expression e^{−t²/2} does not have an antiderivative expressible in terms of elementary functions. So to estimate the value of this integral we need to use a numerical approximation. A table of values of 1 − Φ(x) for 0 ≤ x < 3, in increments of 0.01, is provided for this purpose.
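Where no table is to hand, Φ can be computed from the error function erf, which is provided by most maths libraries; a sketch:

```python
import math

def Phi(x):
    # standard normal cdf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

tail = 1 - Phi(2.0)  # ≈ 0.0228, matching the table entry at z = 2
```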
(a) From the table, P(X ≤ 2.0) = 1 − P(X > 2.0) ≈ 1 − 0.0228 = 0.9772 (from looking up z = 2 in the table).
(c) By symmetry, P(X < −0.5) = P(X > 0.5). This can be read from the table at z = 0.5. So

P(X < −0.5) ≈ 0.3085.
(d)
Problem 3.2.4 If X has the distribution N(0, 1), find the following :
INVERSE PROBLEM: Suppose X has distribution N(0, 1). Find that real number x for which P(X ≤ x) = 0.8 (i.e. find that x for which we expect X to be ≤ x in 80% of cases).

SOLUTION: We want that value of x for which the area under the graph to the left of x is 0.8, and the area to the right is 0.2. So look in the right-hand column of the table for 0.2000. We have 0.2005 when z = 0.84, and 0.1977 when z = 0.85. We can estimate the value x corresponding to 0.2 as follows:

x ≈ 0.84 + ((0.2005 − 0.2000)/(0.2005 − 0.1977)) (0.85 − 0.84)
  ≈ 0.8418.
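The same inverse lookup can be done without a table by bisecting on Φ, computed here via the error function; a sketch:

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    # bisection: Phi is increasing, so halve the bracket until it is tiny
    for _ in range(60):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = Phi_inv(0.8)  # ≈ 0.8416, close to the interpolated 0.8418
```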
Problem 3.2.5 Find that real number x for which P(|X| ≤ x) = 0.8, where X has the standard normal distribution.
Other Normal Distributions
Definition 3.2.6 A random variable X is said to have the distribution N(µ, σ²) if X has pdf fX given by

fX (x) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²},

where µ ∈ R and σ > 0 are fixed.
REMARK: If σ = 1, then fX (x) = f(x − µ), where f is the pdf of the N(0, 1) distribution. So the graph of fX (x) in this case is just that of f(x) moved µ units to the right.
So if X has distribution N(µ, σ²) then X has expected value µ and variance σ². For fixed µ and σ, the graph of fX (x) is a “bell-shaped” curve, in which the value of fX (x) is negligible for x more than about 3σ away from µ. The graph is symmetric about x = µ, and its “flatness” is controlled by σ, larger values of σ giving a wider, flatter curve. This makes sense, since a large value of σ means a high standard deviation. The following diagram shows the graphs for µ = 0 corresponding to σ = 1, 2 and 3.
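The flattening effect of σ can be seen from the peak height fX (µ) = 1/(σ√(2π)); a sketch:

```python
import math

def f(x, mu=0.0, sigma=1.0):
    # pdf of N(mu, sigma^2)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

# peak heights at x = mu = 0 for sigma = 1, 2, 3: the curve gets flatter as sigma grows
peaks = [f(0.0, sigma=s) for s in (1, 2, 3)]
```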
Example 3.2.7 The height (in metres) of a person selected at random is known to be
normally distributed with a mean (i.e. expected value) of 1.65 and a standard deviation
of 0.12.
(a) What is the probability that a person’s height will exceed 1.75?
(b) What is the probability that a person’s height will be between 1.60 and 1.75?
(c) Above what height can we expect to find the tallest 5% of the population?
Problem: We only have tables for N(0, 1). What can we do with N(1.65, (0.12)2 )?
Theorem 3.2.8 Suppose X has the distribution N(µ, σ²). Define Y = (X − µ)/σ. Then Y has the distribution N(0, 1).

So: we work with Y = (X − µ)/σ instead of X.
Back to the Example: Let X be the height of a randomly selected person. So X ∼ N(1.65, (0.12)²). Then Y = (X − 1.65)/0.12 has distribution N(0, 1).

(a) X > 1.75 =⇒ X − 1.65 > 0.1 =⇒ (X − 1.65)/0.12 > 0.1/0.12 =⇒ Y > 0.8333. So we need P(Y > 0.8333). From the tables we can interpolate:

P(X > 1.75) ≈ 0.2033 − (1/3)(0.2033 − 0.2005) ≈ 0.2024.
(c) From the table we can read that for the tallest 5% we have Y > 1.645. Then

(X − 1.65)/0.12 > 1.645 =⇒ X − 1.65 > 0.1974 =⇒ X > 1.85.
The tallest 5% of the population will be taller than 1.85m.
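Parts (a) and (c) can be checked with the standardisation Y = (X − µ)/σ, using the error function in place of the printed table (an assumption of this sketch):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))  # N(0,1) cdf

mu, sigma = 1.65, 0.12  # height distribution N(1.65, 0.12^2)

# (a) P(X > 1.75) = P(Y > (1.75 - mu)/sigma)
p_tall = 1 - Phi((1.75 - mu) / sigma)  # ≈ 0.2023

# (c) tallest 5%: Y > 1.645, so X > mu + 1.645 * sigma
cutoff = mu + 1.645 * sigma  # ≈ 1.85 m
```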
Problem 3.2.9 Assume that the flight time (from takeoff to landing) from Dublin Air-
port to London Heathrow is normally distributed with a mean of 50 minutes and a stan-
dard deviation of 5 minutes.
(a) What is the probability that the flight time will exceed one hour?
(b) What is the probability that the flight time will be between 45 and 55 minutes?
(c) Below what duration can we expect the fastest 10% of flights?
REMARK/EXERCISE (cf. (b) above): If X has distribution N(µ, σ²) then the probability that X will be between µ − σ and µ + σ (i.e. within one standard deviation of the mean) is approximately 0.68,

i.e. we expect 68% of observed values to be within one standard deviation of the mean. Similarly,
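This can be confirmed with Φ computed via the error function; a sketch (also computing the two-standard-deviation analogue):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))  # N(0,1) cdf

# P(mu - sigma < X < mu + sigma) standardises to P(-1 < Y < 1)
within_one = Phi(1) - Phi(-1)   # ≈ 0.6827
within_two = Phi(2) - Phi(-2)   # ≈ 0.9545
```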