Department of Statistics
Federal University of Technology, Minna
These notes were prepared for the module on Probability Theory II which forms part of the STA127S course
at FUT Minna. R tutorials as well as theoretical practicals are provided.
Recommended textbooks:
• Les Underhill & Dave Bradfield. INTROSTAT, Lecture Note on Statistics, University of Cape Town.
• Bayo Lawal. Applied Statistical Methods in Agriculture, Health and Life Sciences. Springer.
1 Set Theory
Chapter 2
One of Murphy’s Laws states that before you can do anything, you
have to do something else. Before we can do “statistics” we have to do “probability
theory”, and for that we need some “set theory”. So here we go.
Definition of sets . . .
We define a set A to be a collection of distinguishable objects or entities. The set
A is determined when we can either (a) list the objects that belong to A or (b) give a
rule by which we can decide whether or not a given object belongs to A.
Example 1A: (a) If we say, “The letters e, f, g belong to the set A”, then we write
A = {e, f, g}
(b) If we say, “The set B consists of real numbers between 1 and 10 inclusive”, then
we write
B = {x | 1 ≤ x ≤ 10}.
We read this by saying: “The set B consists of all real numbers x such that x is larger
than or equal to 1 but is less than or equal to 10.”
Because the object e belongs to the set A we write e ∈ A.
Example 2B:
(a) Express in set theory notation: the set U of numbers which have square roots
between 1 and 4.
(b) Write out in full all the elements of the set Z = {(x, y) | x ∈ {1, 2, 3, 4}, y = x²}.
(a) Because the numbers whose square roots lie between 1 and 4 are the numbers between 1 and 16, we write
U = {x | 1 ≤ x ≤ 16}.
(b) Z = {(1, 1), (2, 4), (3, 9), (4, 16)}.
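The two ways of specifying a set (listing its elements, or giving a membership rule) can be mirrored in code. The following is a minimal Python sketch and not part of the original notes; the function name in_B is an illustrative choice.

```python
# Listing the elements explicitly, as in A = {e, f, g}.
A = {"e", "f", "g"}

# A membership rule, as in B = {x | 1 <= x <= 10}. B contains infinitely
# many real numbers, so it is represented by a test, not by a list.
def in_B(x):
    return 1 <= x <= 10

# Set-builder notation: Z = {(x, y) | x in {1, 2, 3, 4}, y = x^2}.
Z = {(x, x**2) for x in {1, 2, 3, 4}}

print("e" in A)              # membership test, the analogue of e ∈ A
print(in_B(10), in_B(10.5))  # 10 belongs to B, 10.5 does not
print(sorted(Z))
```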
Example 3C: Which of the following statements are correct and which are wrong?
(a) {3, 3, 3, 3} = {3}
(b) 6 ∈ {5, 6, 7}
(c) C = {−1, 0, 1}
(d) F = {x | 4 < f < 5}
(e) {1, 2, 7} = {7, 2, 1, 7}
(f) If H = {2, 4, 6, 8}, J = {1, 2, 3, 4} and K = {2x | x ∈ H}, then K = J
(g) {1} ∈ {1, 2, 3}.
Subsets . . .
Suppose we have two sets, G and H, and that every element of G also belongs to
H. Then we say that “G is a subset of H” and we write G ⊂ H. We can also write
H ⊃ G and say “H contains G”. If at least one element of G does not belong to H, we
write G ⊄ H and say “G is not a subset of H.”
Example 4A: Let G = {1, 3, 5}, H = {1, 3, 5, 9} and J = {1, 2, 3, 4, 5}. Then
clearly G ⊂ H, H ⊄ J, J ⊃ G.
Note that the notation ⊂, ⊃ for sets is analogous to the notation ≤, ≥ for ordinary
numbers (rather than the notation <, >). The “round end” of the subset notation tells
you which of the sets is “smaller” (in the same way as the “pointed end” shows which
of two numbers is smaller).
Our definition of subset has a curious (at first sight) but logical consequence. Because
every element in G belongs to G, we can write G ⊂ G. For numbers, we can write 2 ≤ 2.
If H ⊂ G and G ⊂ H, then, obviously, H = G. For numbers, x ≤ 2 and x ≥ 2
together imply that x = 2.
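The subset relations of Example 4A can be checked with Python's built-in sets. This is only an illustrative sketch; note that Python's `<=` operator corresponds to ⊂ as used in these notes (equality allowed), whereas Python's `<` means "proper subset".

```python
G = {1, 3, 5}
H = {1, 3, 5, 9}
J = {1, 2, 3, 4, 5}

print(G <= H)   # G ⊂ H
print(H <= J)   # H is not a subset of J, because 9 is not in J
print(J >= G)   # J ⊃ G
print(G <= G)   # every set is a subset of itself
print(G < G)    # but no set is a proper subset of itself
```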
Example 5C: Let V = {v | 0 < v < 5}, W = {0, 5}, X = {1, 2, 3, 4}, Y = {2, 4},
Z = {x | 1 ≤ x ≤ 4}. Which of the following statements are true, and which are false:
(a) V = W (e) X = Z
(b) Y ⊂ X (f) Z ⊄ V
(c) W ⊃ V (g) Y ⊂ W
(d) Z ⊃ X (h) Y ∈ Z
Intersections . . .
Suppose that L = {a, b, c} and M = {b, c, d}. Then L ⊄ M and M ⊄ L. But if
we consider the set N = {b, c}, then we see that N ⊂ L and N ⊂ M , and that no other
set of which N is a subset has this property. This leads us to the idea of intersection.
The intersection of any two sets is the set that contains precisely those elements
which belong to both sets. For the sets, L, M and N above we write N = L ∩ M and
read this “N equals L intersection M ”. The intersection of two sets M and N can be
thought of as the set containing those elements which belong to both M and to N .
P ∩ Q = {x | 5 < x ≤ 10}.
The concept union contrasts with the concept intersection. The union of two sets
A and B is the set that contains the elements that belong to A or to B. Here we use
the word “or” in an inclusive sense — we do not exclude from the union those elements
that belong to both A and B.
If A = {1, 2, 3} and B = {2, 3, 4, 5} then the union of A and B is the set
C = {1, 2, 3, 4, 5}. We write
C =A∪B
and say “C equals A union B”.
Complements . . .
Our final concept from set theory is that of the complement of a set. Given the
sample space S, we define the complement of a set A to be the set of elements of S
which are not in A. The complement of A is written Ā, and is always relative to the
sample space S.
If S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6} then Ā = {2, 4, 6}.
We write
Ā = B
and say “the complement of A equals B” or, more briefly, “A complement equals B”.
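Intersection, union and complement map directly onto Python's set operators `&`, `|` and `-`. The sketch below reuses the sets from the text; the variable names are illustrative, and the complement is computed as a set difference from the sample space S because Python has no notion of an implicit universal set.

```python
# Intersection: elements belonging to both sets.
L = {"a", "b", "c"}
M = {"b", "c", "d"}
N = L & M

# Union: elements belonging to one set or the other (or both).
C = {1, 2, 3} | {2, 3, 4, 5}

# Complement: always relative to the sample space S.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {2, 4, 6}
A_complement = S - A

print(sorted(N))
print(sorted(C))
print(A_complement == B)
```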
Example 9C: If the sample space S contains the letters of the alphabet, i.e. S =
{a, b, c, . . . , x, y, z}, the set A contains the vowels, the set B contains the consonants,
the set C contains the first 10 letters of the alphabet C = {a, b, c, . . . , h, i, j} pick out
the true and false statements in the following list:
(a) A ∪ B = S (g) S ∩ B = B
(b) A ∩ B = ∅ (h) A ∪ A = S
(c) S ⊂ S (i) C ∩ A = {o, u}
(d) A ∩ C = {a, e, i} (j) (A ∪ C) = A ∩ C
(e) A ⊂ B (k) A ∩ C ⊂ C
(f) A = B (l) S = ∅
Venn diagrams . . .
A pictorial representation of sets that helps us solve many problems in set theory is
known as the Venn diagram. In the diagrams below think of all the “points” in the
rectangle as being the sample space S, and all the points inside the circles for A and
B as the sets A and B respectively. The shaded area in the diagram on the left then
represents A ∩ B, the set of points belonging to A and B. Similarly the diagram on the
right is a visual representation of A ∪ B, the set of points belonging to A or B. Recall
once again the special, inclusive meaning we give to “or”. When drawing Venn diagrams
it is helpful to associate “intersection” with “and” and “union” with “or”.
The diagram on the left below shows how to depict two mutually exclusive sets in a
Venn diagram.
Venn diagrams are usually only useful for up to three sets: the area shaded in the
diagram on the right is A ∩ B ∩ C.
[Venn diagrams: (A ∪ B) ∩ C and (A ∩ C) ∪ (B ∩ C)]
Example 11C: Draw Venn diagrams to show that the following are true:
(a) A ∪ B = A ∪ (B ∩ A)
(b) (A ∩ C) ∪ (B ∩ A) = (A ∪ C) ∩ (A ∩ B)
(c) The sets A ∩ B, A ∩ C, B ∩ A and (A ∪ C) form a family of pairwise mutually
exclusive and exhaustive sets.
Example 12C: Draw Venn diagrams to determine which of the following statements
are true.
(a) $\overline{A \cap B} = \overline{A} \cap \overline{B}$
(b) (A ∩ B) ∪ (A ∩ B) ⊂ A ∪ B
(c) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(d) (C ∩ A) ∪ (C ∩ B) = (C ∪ (A ∩ B))
(e) [(A ∪ B) ∩ C] ∪ [(A ∪ C) ∩ B] = [(A ∪ B ∪ C) ∩ (A ∪ B) ∩ C] ∪ (A ∩ B)
(f) If the sets A1 , A2 , A3 , and A4 are pairwise mutually exclusive and exhaustive, and
B is an arbitrary set, then
Solutions to examples . . .
3C (a), (b), (c) and (e) are correct; (d) should read either F = {x | 4 < x < 5} or
F = {f | 4 < f < 5}. For (f), check that the following statement is correct: if H
and J are as given, and if K = {2x | x ∈ J} then K = H. For (g), note that we
never use the ∈-notation with a set on the left hand side.
12C (b) (c) (e) and (f) are true. For (a), check that $\overline{A \cap B} = \overline{A} \cup \overline{B}$ is true.
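The identity in the solution to 12C(a), that the complement of A ∩ B equals the union of the complements of A and B, can be spot-checked numerically. The sketch below uses an arbitrarily chosen sample space and sets (assumptions, not taken from the notes); a single example illustrates the identity but does not prove it.

```python
S = set(range(10))      # hypothetical sample space {0, 1, ..., 9}
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(X):
    """Complement of X relative to the sample space S."""
    return S - X

lhs = complement(A & B)
holds_with_union = lhs == (complement(A) | complement(B))
holds_with_intersection = lhs == (complement(A) & complement(B))

print(holds_with_union)         # the correct form of the identity
print(holds_with_intersection)  # the form given in statement 12C(a)
```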
Easy exercises . . .
2.1 Let S be {1, 2, 3, 4, 5, 6}, the set of all possible outcomes when a die is thrown
and the number of dots on the uppermost face recorded. Describe in words the
following sets:
(a) {6} (d) {2, 4, 6}
(b) {1, 2, 3, 4} (e) {5, 6}
(c) {1, 3, 5} (f) {6}
∗ 2.2 If S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, and A = {0, 1, 2}, B = {3, 4, 5, 6, 7},
C = {7, 8}, and D = {2, 4, 6, 8}, which of the following statements are true?
(a) A and B are mutually exclusive
(b) B = {0, 1, 2, 8, 9}
(c) A ∪ B ∪ C ∪ D = {0, 1, 2, 3, 4, 5, 6, 7, 8}
(d) D ⊂ (B ∪ C)
(e) A ∩ B ∩ C ∩ D = {9}
(f) A ∪ (B ∩ D) = (A ∩ B) ∪ (A ∩ D).
2.3 Let S denote the set of all companies listed on the Johannesburg Stock Exchange.
Let A = {x | x is in the gold mining sector},
let B = {x | x has annual turnover exceeding R10 million},
let C = {x | x has financial year ending in June},
let D = {x | the share price of x is higher now than six months ago}.
Describe in words the following sets:
(a) A ∪ B, (e) Ā,
(b) A ∩ D, (f) C̄ ∪ D,
(c) A ∩ C ∩ D, (g) $\overline{B \cap C}$,
(d) B ∩ (C ∪ A), (h) $\overline{B \cap A} \cup (C \cap D)$.
∗ 2.4 If A, B and C are subsets of a universal set S, draw Venn diagrams to determine
which of the following statements are true.
(a) A∪A=S
(b) A∩A=∅
(c) A∪B =A∩B
(d) A∩B =A∪B
(e) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(f) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(g) A∪B∪C =S
(h) A∩B∩C =∅
(i) A∪B ⊃A∩B
(j) A ∩ (B ∪ C) ⊂ A ∪ (B ∩ C)
2.5 If S = {1, 2, 3}, list all the subsets of S.
∗ 2.6 Draw a series of Venn diagrams representing three sets, and shade in the following
(a) A∩B∩C
(b) (B ∩ A) ∪ (A ∩ C)
(c) A∪B∪C
(d) (A ∪ B ∪ C) ∩ (B ∩ C).
$A = \bigcup_{i=1}^{n} (A \cap B_i)$.
2.8 Show that if the set S has n elements, then S has 2^n subsets. [Hint: Use the
binomial theorem.]
2.9 Let A and B be two events defined on a sample space S. Depict the following
events in Venn diagrams:
(a) C = (A ∩ B) ∪ (A ∩ B)
(b) D = (A ∪ B) ∪ (A ∩ B)
(c) What can you say about events C and D?
Solutions to exercises . . .
2.3 (a) Set of companies either in the gold mining sector or with turnovers exceeding
R10 million.
(b) Set of gold mining companies whose share price is higher now than six months ago.
(c) Set of gold mining companies with a financial year ending in June whose share
price is higher now than six months ago.
(d) Set of all companies which have an annual turnover exceeding R10 million
and which are either gold mining companies or companies with financial years
ending in June (or both).
(e) Set of companies not in the gold mining sector.
(f) Set of companies which either do not have a financial year ending in June or
have a share price which is higher now than six months ago.
(g) Set of companies which either do not have an annual turnover exceeding R10
million or do not have financial year ending in June.
(h) Set of companies which either do not have an annual turnover exceeding R10
million or are not in the gold mining sector or both have a financial year
ending in June and have a share price which is higher now than six months ago.
(Notice how difficult it is to express unambiguously in words the meaning of a few
mathematical symbols.)
2.5 ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}.
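The list of subsets in solution 2.5 can be generated mechanically by taking all combinations of size 0, 1, ..., n. This is a sketch using the standard library; the helper name subsets is an illustrative choice.

```python
from itertools import combinations

def subsets(s):
    """Return every subset of s, from the empty set up to s itself."""
    items = sorted(s)
    return [set(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

all_subsets = subsets({1, 2, 3})
for sub in all_subsets:
    print(sub if sub else "∅")

# The count agrees with the 2^n rule of exercise 2.8 for n = 3.
print(len(all_subsets) == 2**3)
```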
2 Probability Theory
Chapter 3
Example 1A: The following are examples of random experiments and their sample spaces.
(a) We toss a coin. We can list the set of possible outcomes: S = {heads, tails}. We
can repeat the experiment endlessly, and we can observe the result of every trial.
(b) A phone number is chosen at random. The number is dialled, and the person who
answers is asked whether he/she is currently watching television. If the telephone is
unanswered after 45 seconds, the outcome, “no reply”, is recorded. The set of possi-
ble outcomes, the sample space, is S = {yes, no, won’t say, number engaged, no reply}.
(a) Squash is played to 9 points, with “deuce” at 8-all, in which case the player who
reached 8 first decides whether to play to 9 points or to 10 points. Thus the sample
space is S = {9-0, 9-1, 9-2, 9-3, 9-4, 9-5, 9-6, 9-7, 9-8, 10-8, 10-9}.
(b) The set of ways in which a batsman’s innings can end is given by S = {bowled,
caught, leg before wicket, run out, stumped, hit wicket, not out, retired, retired
hurt, obstruction, timed out}.
(c) It is convenient to let U = up, D = down and N = no change. Then S = { UU,
UD, UN, DU, DD, DN, NU, ND, NN}, where, for example, DU means “first share
down, second share up”.
Notice how we construct the most detailed possible sample space — the set of
outcomes {both up, one up & one down, one up & one unchanged, one down & one
unchanged, both unchanged, both down} is not acceptable because each of these could
represent several distinguishable outcomes. For example, “one up & one down” could
represent either “UD” or “DU”.
Example 3C: A random experiment consists of tossing 3 coins of values R1, R2 and
R5 and observing heads and tails. Which of the following is the correct sample space?
(a) S = {3 heads,2 heads,1 head,0 heads}.
(b) S = {3 heads,2 heads 1 tail,1 head 2 tails,3 tails}.
(c) S = {HHH,HHT,HTH,THH,HTT,THT,TTH,TTT} where, for example, HTH
means “heads on R1, tails on R2 and heads on R5”.
Example 4B: Refer to the random experiments (a) to (c) of Example 2B, and give the
subsets of S that correspond to the following events.
(a) (i) The squash game is won by 5 or more points.
(ii) The game goes to deuce.
(b) When the batsman ended his innings the bails were dislodged from the wickets.
(c) None of the shares decline.
In each case we simply list the outcomes that favour the occurrence of the event in
which we are interested. The answers are:
(a) (i) {9-0, 9-1, 9-2, 9-3, 9-4} (ii) {9-8, 10-8, 10-9}
(b) {bowled, run out, stumped, hit wicket}
(c) {UU, UN, NU, NN}.
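The most detailed sample space for the two shares, and the event "none of the shares decline", can be built by enumerating ordered pairs. A small sketch, assuming the U/D/N coding introduced above:

```python
from itertools import product

# Every ordered pair of movements: first share, then second share.
S = ["".join(pair) for pair in product("UDN", repeat=2)]

# Event: none of the shares decline, i.e. no "D" in the outcome.
no_decline = [outcome for outcome in S if "D" not in outcome]

print(S)
print(no_decline)
```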
Example 5C: A salesperson, after calling on a client, records the outcome: sale made
(S), or no sale made (N ). List the sample space of outcomes in one afternoon if
(a) two clients are visited
(b) three clients are visited.
(c) Suppose now that three outcomes are recorded: sale made (S), sales potential good
(P ), no sale ever likely to be made (N ). List the sample space if two clients are visited.
Example 6C. A party of five hikers, three males and two females, walk along a mountain
trail in single file.
(a) What is the sample space S?
(b) Find the subsets of S that correspond to the events:
• U : a female is in the lead
• V : a male is bringing up the rear
Suppose that S is the sample space for a random experiment. For all
events A ⊂ S, we define the probability of A, denoted Pr(A), to be a
real number with the following properties:
1. 0 ≤ Pr(A) ≤ 1
2. Pr(S) = 1
3. If A ∩ B = ∅, then Pr(A ∪ B) = Pr(A) + Pr(B)
Relative frequencies . . .
To try to get some insight into the concept of probability, consider a random exper-
iment on some sample space S repeated infinitely many times. Let’s start by doing
n trials of the random experiment and counting the number of times r that some event
A ⊂ S occurs during the n trials. Then we define r/n to be the relative frequency of
the event A. Obviously, 0 ≤ r/n ≤ 1. Thus relative frequencies and probabilities both
lie between zero and one.
We can think of the probability of the event A as the relative frequency of A as n,
the number of trials of the random experiment gets very large. In symbols
$$\Pr(A) = \lim_{n \to \infty} \frac{r}{n}.$$
If you toss a fair coin, then the probability of “heads” is equal to the probability of
“tails”, i.e. Pr(H) = Pr(T ) = 0.5. If you tossed the coin 10 times you might observe
6 heads, a relative frequency of 6/10 = 0.6. But if you tossed it 100 times you might
observe 53 heads, relative frequency 53/100 = 0.53. If you kept going for a few hours
more, and tossed it 1000 times you might observe 512 heads, giving a relative frequency
of 512/1000 = 0.512. As the number of trials increases, the relative frequency
tends to get closer and closer to the “true” probability.
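The coin-tossing argument can be imitated by simulation. The sketch below assumes a fair coin modelled with Python's pseudo-random generator; the seed and the function name relative_frequency are arbitrary illustrative choices, and the printed values will differ for other seeds.

```python
import random

random.seed(1)  # fixed seed so that repeated runs give the same numbers

def relative_frequency(n):
    """Toss a simulated fair coin n times and return r/n for 'heads'."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# The relative frequency is expected to settle near 0.5 as n grows.
for n in (10, 100, 1000, 100000):
    print(n, relative_frequency(n))
```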
Almost exactly a quarter of the days of the year fall into April, May or June
(91/365.25 = 0.249, allowing for leap years every fourth year). Thus we expect the
probability that an individual’s birthday falls into one of these three months to be
pretty close to 0.249. Let’s do an experiment within the class, and fill in this table.
Front row
Whole class
Do the relative frequencies get closer to the “true” probability as n gets larger?
A ∪ Ā = S.
Because A and Ā are mutually exclusive, i.e. A ∩ Ā = ∅, we can use axiom 3 (together with axiom 2, Pr(S) = 1) to state
Pr(A) + Pr(Ā) = 1,
so that
Pr(Ā) = 1 − Pr(A),
as required.
A = (A ∩ B) ∪ (A ∩ B̄).
(A ∩ B) ∩ (A ∩ B̄) = ∅.
Therefore, using axiom 3,
Pr(A) = Pr(A ∩ B) + Pr(A ∩ B̄).
A ∪ B = B ∪ (A ∩ B̄)
Because B and A ∩ B̄ are mutually exclusive, we can again apply axiom 3 and say
Pr(A ∪ B) = Pr(B) + Pr(A ∩ B̄).
A = B ∪ (A ∩ B̄)
Proof: The proof is by repeated use of axiom 3. The events (A1 ∪ A2 ∪ . . . ∪ An−1) and
An are mutually exclusive. Thus
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \Pr\left(\bigcup_{i=1}^{n-1} A_i\right) + \Pr(A_n).$$
Next, the events (A1 ∪ A2 ∪ . . . ∪ An−2) and An−1 are mutually exclusive. Thus
$$\Pr\left(\bigcup_{i=1}^{n-1} A_i\right) = \Pr\left(\bigcup_{i=1}^{n-2} A_i\right) + \Pr(A_{n-1}),$$
so that
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \Pr\left(\bigcup_{i=1}^{n-2} A_i\right) + \Pr(A_{n-1}) + \Pr(A_n).$$
Continue the process, and the result follows.
Example 7A: If Pr(A) = 0.5, Pr(B) = 0.6 and Pr(A ∩ B) = 0.3, find Pr(B̄), Pr(A ∩ B̄)
and Pr(A ∪ B).
By theorem 1, Pr(B̄) = 1 − Pr(B) = 1 − 0.6 = 0.4.
By theorem 2, Pr(A ∩ B̄) = Pr(A) − Pr(A ∩ B) = 0.5 − 0.3 = 0.2.
By theorem 3, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.5 + 0.6 − 0.3 = 0.8.
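The three calculations of Example 7A can be reproduced with exact fractions, which avoids floating-point rounding. This is a sketch; the variable names are illustrative, and the three formulas are the complement rule, the difference rule and the addition rule used in the example.

```python
from fractions import Fraction

p_A = Fraction(5, 10)        # Pr(A) = 0.5
p_B = Fraction(6, 10)        # Pr(B) = 0.6
p_A_and_B = Fraction(3, 10)  # Pr(A ∩ B) = 0.3

p_not_B = 1 - p_B                    # theorem 1: complement rule
p_A_and_not_B = p_A - p_A_and_B      # theorem 2
p_A_or_B = p_A + p_B - p_A_and_B     # theorem 3: addition rule

print(float(p_not_B), float(p_A_and_not_B), float(p_A_or_B))
```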
Example 8C: Is it possible for events in the same sample space to have probabilities
Pr(A) = 0.8, Pr(B) = 0.6 and Pr(A ∩ B) = 0.7?
Example 9C: In the sample space S, let Pr(A) = 0.7, Pr(B) = 0.5, Pr(C) = 0.1,
Pr(A ∪ B) = 0.9, and Pr((A ∪ B) ∩ C) = 0. Depict the events A, B and C on a Venn
diagram and find the probability of the events A ∩ B, A ∩ C, A ∪ B ∪ C, B ∩ C, and
(A ∩ B).
Example 10C: If we know that Pr(A ∪ B) = 0.6 and Pr(A ∩ B) = 0.2, can we find
Pr(A) and Pr(B)?
where we define the function n(A) to mean the count of the number of elementary events
contained in A. Clearly, n(S) = N .
Example 11A: Consider tossing a fair die. Then S = {1, 2, 3, 4, 5, 6} and N = 6. Let
A = {1, 3, 5} the event of getting an odd number. Find Pr(A).
The number of elementary events contained in A is n(A) = 3. So
$\Pr(A) = \frac{n(A)}{N} = \frac{3}{6} = \frac{1}{2}$,
which your intuition should tell you is correct!
Example 12A: 100 people bought tickets in a charity raffle. 60 of them bought the
tickets because they supported the charity. 75 bought tickets because they liked the
prize. No one who neither supported the charity nor liked the prize bought a ticket.
(a) What is the probability that the prize-winning ticket was bought by someone who
liked the prize?
(b) What is the probability that the prize was won by someone who did not support
the charity?
(c) What is the probability that the prize was won by someone who both supported
the charity and liked the prize?
(a) Let A and B be the events “liked the prize” and “support the charity” respectively.
To find Pr(A), we apply the three steps as follows:
1. N = 100
2. n(A) = 75
3. Therefore Pr(A) = n(A)/N = 75/100 = 0.75.
(b) 1. N = 100
2. n(B̄) = 100 − 60 = 40
3. Therefore Pr(B̄) = n(B̄)/N = 40/100 = 0.4.
A set is a collection of outcomes.
Sample space
The sample space is the set of all possible outcomes of a random experiment. A
sample space is usually denoted by the symbol S and the collection of elements
contained in S enclosed in curly brackets { }.
Sample point
A sample point is an individual outcome (element) in a sample space.
5) Drawing a card from a deck of cards. The elements in the sample space are listed below.
S = {2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦ A♦
2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥ A♥
2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣ A♣
2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠ }
An event is a subset of a sample space, i.e. a collection of sample points taken from a sample space.
Impossible event
An impossible event is an event that cannot happen (has probability zero).
Certain event
A certain event is an event that is sure to happen (has probability 1).
Simple events are events that involve only one sample point (outcome) of the sample space
1) Let E denote the event “an odd number is obtained when tossing a single die”.
Then E = {1, 3, 5}.
2) Let H denote the event “at least one head appears when tossing two coins”.
H = {hh, ht, th}.
3) Let B denote the event “obtaining a club and a heart in a single draw from a deck of
cards”. The event B is impossible. The set of outcomes of B is an empty set denoted by
B = { } = ∅.
4) Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The
event A is a certain event i.e. one of the outcomes belonging to the set describing the
event must happen. This is denoted by A = S, where S is the sample space.
Venn diagrams
A Venn diagram is a drawing, in which circular areas represent groups of items
usually sharing common properties.
The drawing consists of two or more circles, each representing a specific group or
set, contained within a square that represents the sample space. Venn diagrams are
often used as a visual display when referring to sample spaces, events and
operations involving events.
Complementary events
The complementary event Ā (sometimes written A′) of an event A is the set of all outcomes in S
that are not in A.
1) Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement
of the event A = “obtaining a 3 or less” = {1, 2, 3} is
Ā = “obtaining a 4 or more” = {4, 5, 6}.
2) Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of
the event H = “at least one head” = {hh, ht, th} is H̄ = “no heads” = {tt}.
The union of two events A and B, denoted by A ∪ B, is the set of outcomes that are
in A or in B or in both A and B i.e. the event that
“either A or B or both A and B occur”
or “at least one of A or B occurs”.
These definitions involving two events can be extended to ones involving 3 or more events
e.g. for the 3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2
or A3 occurs” and A1 ∩ A2 ∩ A3 the event “A1 and A2 and A3 occur”.
A ∪ B = {1, 2, 3, 5, 6, 7, 8, 9}, A ∩ B = {3, 7},
A B = {2, 5, 9}, A B = {1, 6, 8}.
2) Let C be the event “drawing a face card from a deck of cards” and A the event “drawing
a king or an ace from a deck of cards”.
1) Let B be the event “drawing a black card from a deck of cards” and R the event “drawing
a red card from a deck of cards”.
The events B and R have no outcomes in common i.e. B ∩ R = ∅ (empty set). Hence B
and R are mutually exclusive.
2) Let E be the event “an even number with a single throw of a die” and O the event “an
odd number with a single throw of a die” i.e. E = {2, 4, 6} and O = {1, 3, 5}.
$P(A) = \frac{N(A)}{N(S)} = \frac{m}{n}$,
where N(A) = m is the number of outcomes favourable to the event A and N(S) = n
the number of outcomes in the sample space S i.e. the total number of outcomes.
2) Two dice are rolled. Find the probability that a sum of 7 will occur.
The number of sample points in S is 36 (see example 3 under sample space).
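Under the classical definition, the probability of a sum of 7 follows from counting the favourable outcomes among the 36 equally likely ones. The following sketch enumerates them, assuming two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

# Event A: the two faces add up to 7.
A = [outcome for outcome in S if sum(outcome) == 7]

p_A = Fraction(len(A), len(S))  # classical formula N(A)/N(S)
print(A)
print(p_A)
```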
The classical definition of probability requires the assumption that all the outcomes in the
sample space are equally likely. If this assumption is not met, this formula cannot be used.
The possible temperatures (degrees Celsius) in Durban on a particular day in December are
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
In December Durban is hot so, for example, 15 degrees is less likely than 30 degrees
i.e. P (temperature = 15) = 1 ÷ 25 = 0.04 does not seem reasonable.
$P(A) = \frac{f}{n}$, where f is the number of times the event A occurs in n repetitions of the experiment.
Note: This formula differs from the classical formula in the sense that the classical
formula uses all the outcomes in the sample space as the total number of outcomes,
while the relative frequency formula uses the number of repetitions (n) of the
experiment as total number of outcomes. In the classical formula the number of
outcomes in the sample space is fixed, while the number of repetitions of an
experiment (n) can vary. It can be shown that the empirical probability is a good
approximation of the true probability when n is sufficiently large.
1) A bent coin is tossed 1000 times with heads coming up 692 times.
An estimate of P(h) is 0.692.
Mark f
less than 30 6
30 – 39 26
40 – 49 45
50 – 59 64
60 – 69 82
70 – 79 37
80 – 89 22
90 – 99 8
Total 290
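From the table (using the empirical formula) the following probabilities can be calculated:
(a) $P(\text{mark less than 40}) = \frac{26 + 6}{290} = 0.110$.
(b) $P(\text{pass}) = \frac{64 + 82 + 37 + 22 + 8}{290} = 1 - \frac{6 + 26 + 45}{290} = \frac{213}{290} = 0.73$.
(c) $P(\text{above 80}) = \frac{22 + 8}{290} = 0.103$.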
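The empirical probabilities can be recomputed directly from the frequency table. In this sketch the class labels are illustrative strings, and a pass is assumed to mean a mark of 50 or more, which is what the calculation in (b) implies.

```python
# Frequency table of marks (class label -> frequency f).
freq = {
    "<30": 6, "30-39": 26, "40-49": 45, "50-59": 64,
    "60-69": 82, "70-79": 37, "80-89": 22, "90-99": 8,
}
n = sum(freq.values())  # total number of observations

def prob(classes):
    """Empirical probability of falling in any of the given classes."""
    return sum(freq[c] for c in classes) / n

p_below_40 = prob(["<30", "30-39"])
p_pass = prob(["50-59", "60-69", "70-79", "80-89", "90-99"])
p_80_up = prob(["80-89", "90-99"])

print(n, round(p_below_40, 3), round(p_pass, 2), round(p_80_up, 3))
```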
The preference probabilities according to gender for 2 different brands of a certain product
are summarized in the table below.
The gender marginal probabilities are obtained by summing the joint probabilities over the
brands. The brand marginal probabilities are obtained by summing the joint probabilities
over the genders.
          Brand 1   Brand 2   Total
Male        0.2      0.32     0.52
Female      0.4      0.08     0.48
Total       0.6      0.4      1
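The marginal probabilities are row and column sums of the joint probabilities. A small sketch, assuming the two table columns are brand 1 and brand 2; the dictionary layout is an illustrative choice, and results are rounded because floating-point sums are not exact.

```python
# Joint probabilities indexed by (gender, brand).
joint = {
    ("Male", 1): 0.2, ("Male", 2): 0.32,
    ("Female", 1): 0.4, ("Female", 2): 0.08,
}

gender_marginal = {}
brand_marginal = {}
for (gender, brand), p in joint.items():
    # Summing over brands gives the gender marginal, and vice versa.
    gender_marginal[gender] = gender_marginal.get(gender, 0) + p
    brand_marginal[brand] = brand_marginal.get(brand, 0) + p

print({g: round(p, 2) for g, p in gender_marginal.items()})
print({b: round(p, 2) for b, p in brand_marginal.items()})
```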
This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then one of the k
experiments can be done in n1 + n2 +. . . + nk ways.
Suppose a man is standing in a room which has 2 doors to his left and 1 door to his
right. In how many ways can he leave the room?
Let “leave the room by going to the left” be experiment 1 and “leave the room by
going to the right” be experiment 2. There are n=2 ways to do experiment 1 (he can
leave by door A or door B) and there is m=1 way to do experiment 2 (he can leave by
door C). In total there are n+m = 2+1 = 3 ways to leave the room.
This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then the k
experiments together can be done in n1×n2×…×nk ways.
Example 1:
A basic meal consists of soup, a sandwich and a beverage. If a person having this
meal has 3 choices of soup, 4 choices of sandwiches and a choice of coffee or tea as
a beverage, how many such meals are possible?
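By the multiplication rule the number of meals is the product of the numbers of choices at each stage. The sketch below confirms this by listing every meal; the item names are placeholders, since the notes do not name the soups or sandwiches.

```python
from itertools import product

soups = ["soup 1", "soup 2", "soup 3"]
sandwiches = ["sandwich 1", "sandwich 2", "sandwich 3", "sandwich 4"]
beverages = ["coffee", "tea"]

# Each meal is one choice from each of the three stages.
meals = list(product(soups, sandwiches, beverages))

print(len(meals))
print(len(soups) * len(sandwiches) * len(beverages))
```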
Example 2:
A PIN to be used at an ATM can be formed by selecting 4 digits from the digits
0, 1, 2, . . . , 9 . How many choices of PIN are there if
Factorial notation
In how many ways can n (n – integer) objects be arranged in a row?
Note: 1 ! = 1, 0 ! = 1.
1) In how many ways can 7 people be placed in a queue at a bus stop?
A permutation is an arrangement of a group of items in which order matters.
The number of permutations of n objects taken r at a time is calculated from
$^{n}P_{r} = P(n, r) = \frac{n!}{(n - r)!}$.
A combination is a selection of a group of items in which order does not matter.
The number of combinations of a group of n objects taken r at a time is calculated from
$^{n}C_{r} = C(n, r) = \binom{n}{r} = \frac{n!}{(n - r)!\,r!}$.
1) Four people (A, B, C, D) serve on a board of directors. A chairman and vice-chairman are
to be chosen from these 4 people. In how many ways can this be done?
Chairman Vice-chairman
2) Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from
them as members of a committee that will investigate fraud allegations. In how many
ways can this be done?
Number of ways = 6.
In both these examples a choice of 2 people from 4 people is made. However, in example 1
the order of choice of the 2 people matters (since the one person chosen is chairman and
the other one vice-chairman). In example 2 the order does not matter. The only interest is in
who serves on the committee.
Application of formulae.
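In question 1 the permutations formula applies with n = 4, r = 2.
Number of ways = $P(4, 2) = \frac{4!}{(4 - 2)!} = 12$.
In question 2 the combinations formula applies with n = 4, r = 2.
Number of ways = $C(4, 2) = \frac{4!}{2!\,(4 - 2)!} = 6$.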
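The difference between the two board-of-directors questions can be seen by listing the choices. This sketch uses the standard library; math.perm and math.comb require Python 3.8 or later.

```python
from itertools import combinations, permutations
from math import comb, perm

people = "ABCD"

# Question 1: (chairman, vice-chairman) pairs, so order matters.
ordered = list(permutations(people, 2))

# Question 2: two committee members, so order does not matter.
unordered = list(combinations(people, 2))

print(len(ordered), perm(4, 2))
print(len(unordered), comb(4, 2))
print(unordered)
```

Each unordered pair such as (A, B) appears twice among the ordered pairs, as (A, B) and (B, A), which is why the first count is 2! times the second.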
3) Find the number of ways to take 4 people and place them in groups of 3 at a time where
order does not matter.
Since order does not matter, use the combination formula.
$C(4, 3) = \frac{4!}{3!\,(4 - 3)!} = \frac{24}{6} = 4$.
4) Find the number of ways to arrange 6 items in groups of 4 at a time where order matters.
Solution: $P(6, 4) = \frac{6!}{(6 - 4)!} = \frac{720}{2!} = 360$.
There are 360 ways to arrange 6 items taken 4 at a time when order matters.
5) Find the number of ways to take 20 objects and arrange them in groups of 5 at a time
where order does not matter.
Solution: $C(20, 5) = \frac{20!}{5!\,(20 - 5)!} = 15504$.
There are 15 504 ways to take 20 objects 5 at a time when order does not matter.
6) Determine the total number of five-card hands that can be drawn from a deck of 52 cards.
When a hand of cards is dealt, the order of the cards does not matter. Thus the
combinations formula is used.
There are 52 cards in a deck and we want to know in how many different ways we can
draw them in groups of five at a time when order does not matter. Using the
combination formula gives
C(52,5) = 2 598 960.
7) There are five women and six men in a group. From this group a committee of 4 is to be
chosen. In how many ways can the committee be formed if the committee is to have at least
3 women in it?
8) In how many ways can a phone number consisting of 5 digits be chosen from the digits
1, 2, 3, . . . , 9 if no digits are to be repeated?
9) In how many ways can the 6 winning numbers in a Lotto draw be selected?
10) In how many ways can a five-card hand consisting of three eights and two sevens be dealt?
11) How many different 5-card hands include 4 of a kind and one other card?
We have 13 different ways to choose 4 of a kind: 2's, 3's, 4's, … Queens, Kings and Aces.
Once a set of 4 of a kind has been removed from the deck, 48 cards are left.
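Several of the counting questions above reduce to products and sums of the permutation and combination formulas. The sketch below is one possible way of setting them up, not a worked solution from the notes; it assumes a standard 52-card deck with four cards of each rank, and it omits the Lotto question because the notes do not specify how many numbers are in the draw.

```python
from math import comb, perm

# 7) Committee of 4 with at least 3 women (5 women, 6 men):
#    exactly 3 women and 1 man, or exactly 4 women and no men.
committee = comb(5, 3) * comb(6, 1) + comb(5, 4) * comb(6, 0)

# 8) Five-digit phone number from digits 1..9, no repeats: order matters.
phone_numbers = perm(9, 5)

# 10) Three eights and two sevens: choose 3 of the 4 eights and
#     2 of the 4 sevens.
eights_and_sevens = comb(4, 3) * comb(4, 2)

# 11) Four of a kind plus one other card: 13 ranks, then any of the
#     48 remaining cards.
four_of_a_kind = 13 * 48

print(committee, phone_numbers, eights_and_sevens, four_of_a_kind)
```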
The possible situations that will satisfy the above requirement are:
Complementary events
For any event A defined on some sample space,
P(Ā) = 1 – P(A).
These formulae can be extended to probabilities involving more than two events
e.g. for 3 events A, B and C defined on some sample space
2.1 Exercise
EX 01 Let V = {v|0 < v < 5}, W = {0, 5}, X = {1, 2, 3, 4}, Y = {2, 4}, Z = {x|1 ≤ x ≤ 4}. Which of
the following statements are true, and which are false:
(a) V = W (e) X = Z
(b) Y ⊂ X (f) Z ⊄ V
(c) W ⊃ V (g) Y ⊂ W
(d) Z ⊃ X (h) Y ∈ Z
EX 02 A small town has three grocery stores (1, 2 and 3). Four ladies living in this town each randomly
and independently pick a store in which to shop. Give the sample space of the experiment which
consists of the selection of the stores by the ladies. Then define the events:
EX 03 : Let A, B and C be three arbitrary events. Find expressions for the events
EX 04 : Let A and B be two events defined on a sample space S. Write down an expression for each of
the following events, express their probabilities in terms of Pr(A), Pr(B) and Pr(A ∩ B), and
evaluate their probabilities if Pr(A) = 0.3, Pr(B) = 0.4 and Pr(A ∩ B) = 0.2:
3 Counting Methods
In calculating probabilities, it is essential that we be able to count the sample points in
the sample space S and in the event E. However, this sometimes becomes a tedious job, and thus compact counting
methods are necessary. A branch of algebra dealing with “permutations” and “combinations” is very useful here.
Suppose two operations A and B are carried out. If there are m different ways of carrying out
A and k different ways of carrying out B, then the combined operation of A and B may be carried
out in m × k = mk ways.
3.1 Permutations
The number of permutations (or arrangements) of n distinct objects, taken all together is
n! = n × (n − 1) × . . . × 2 × 1
0! = 1 by definition
1! = 1 × 1 = 1
2! = 2 × 1 = 2
3! = 3 × 2 × 1 = 6
4! = 4 × 3 × 2 × 1 = 24
5! = 5 × 4 × 3 × 2 × 1 = 120
10! = 10 × 9 × . . . × 2 × 1
The number of permutations of n distinct objects taken r at a time is
$^{n}P_{r} = P(n, r) = \frac{n!}{(n - r)!}$
• Example 04A: If the set A = {1, 2, 3}, list all the possible permutations. There are 3! =
3 × 2 × 1 = 6 distinct arrangements of the objects in A. They are: 123|132|213|231|312|321.
• Example 04B: Consider the three letters A, B, C. The number of possible arrangements of these
three letters is 3! = 3 × 2 × 1 = 6. These arrangements are ABC, ACB, BAC, BCA, CAB, CBA.
• Example 05: The focusing mechanism on Ron’s camera is bust, so that he can only take
pictures of people at a distance of 2 metres, so he only takes pictures of 3 people at a time. How
many different pictures (a rearrangement of the same people is considered a different picture)
are possible if 10 people are present?
Solution 05:This is the same as asking for the number of permutations of 10 objects taken 3
at a time, given by
$^{10}P_{3} = P(10, 3) = \frac{10!}{(10 - 3)!} = 10 \times 9 \times 8 = 720$.
• Example 06: Four people (A, B, C, D) serve on a board of directors. Two people are to be
chosen from them as members of a committee that will investigate fraud allegations. In how
many ways can this be done?
Solution 06: The pairs that can be chosen are: A and B, A and C, A and D, B and C, B and D, C and D.
Number of ways = 6.
• Example 07A: Suppose 19 political parties contested an election. One party wanted
the ballot papers to have the parties listed in random order. Another said it was impractical.
How many different orderings would have been possible?
Solution 07A: This is equivalent to asking: “How many permutations of 19 objects taken 19 at
a time are there?” The answer is:
$^{19}P_{19} = P(19, 19) = \frac{19!}{(19 - 19)!} = \frac{19!}{0!} = 19!$
We now suppose that we have n types of objects and r slots, and that we have at least r objects
of each type available. We can thus fill the first slot with any of the n types of objects, and there
are still n types of objects available for the second slot, and so on. Because there are at least r objects of
each type, there are still objects of each of the n types available for the final, rth slot.
Thus the number of permutations of n types of objects taken r at a time, allowing repetitions, is
\[ n \times n \times \cdots \times n \ (r \text{ factors}) = n^{r} \]
• Example 07B: How many four-digit ATM PINs can be generated from the 10 digits
from 0 to 9, (a) if repetitions are permitted? (b) if repetitions are not permitted?
Solution 07B:
(a) We have four slots to fill. Because all of the 10 digits remain available to fill every slot,
this can be done in $10^4$ = 10 000 ways. This makes sense, because there are 10 000 numbers from
0 (actually 0000) to 9999.
(b) If repetitions are not allowed, one fewer digit is available for each successive slot:
10 × 9 × 8 × 7 = 5040 ways.
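The permutation counts in Examples 05, 07A and 07B can be checked numerically. The following is a minimal Python sketch (the variable names are illustrative and not part of the notes); it relies on `math.perm`, which is available in Python 3.8 and later.

```python
import math

n, r = 10, 4  # ten digits, four slots (Example 07B)

# Repetitions allowed: every slot can hold any of the n digits.
with_repetition = n ** r

# Repetitions not allowed: n! / (n - r)!
without_repetition = math.perm(n, r)

# Example 05: pictures of 3 people out of 10, where order matters.
camera_pictures = math.perm(10, 3)

# Example 07A: orderings of 19 parties, which equals 19!
ballot_orderings = math.perm(19, 19)

print(with_repetition, without_repetition, camera_pictures)
print(ballot_orderings == math.factorial(19))
```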
3.3 Combinations
• A combination is a selection of a group of items where order does not
matter. The number of combinations of n distinct objects taken r at a time is
\[ {}^{n}C_{r} = C(n, r) = \frac{n!}{r!\,(n-r)!} \]
This formula can also be written as
\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!} \]
• Example 08: In how many ways can a 9-man work team be formed from 15 men? The problem
asks only for the number of ways of choosing 9 men out of 15:
\[ \binom{15}{9} = \frac{15!}{9!\,6!} = 5005 \]
• Example 09: How many different bridge hands can be dealt from a pack of 52 playing cards?
A bridge hand contains 13 cards — what matters is only the group of cards (even though you
might arrange them in a convenient order). Therefore, bridge hands consist of combinations of
52 objects taken 13 at a time:
\[ \binom{52}{13} = \frac{52!}{13!\,39!} = 635\,013\,559\,600. \]
At 15 minutes per bridge game, there are enough different bridge hands to keep you going for
about 20 million years continuously.
• Example 010: From 8 accountants and 5 computer programmers, in how many ways can one
select a committee of
(a) 3 accountants and 2 computer programmers?
(b) 5 people, subject to the condition that the committee contain at least 2 computer program-
mers and at least two accountants.
Solution 010:
(a) We can choose 3 accountants from 8 in $\binom{8}{3}$ ways. We can choose 2 computer programmers
from 5 in $\binom{5}{2}$ ways. We multiply the results, because for every group of 3 accountants that
we choose, we can choose one of $\binom{5}{2}$ different groups of computer programmers. Thus we can
choose a committee of 3 accountants and 2 computer programmers in
\[ \binom{8}{3} \times \binom{5}{2} = 56 \times 10 = 560 \text{ ways.} \]
(b) The total possible number of ways of forming the committee is: 840 ways (Work it out!)
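The combination counts in Examples 08, 09 and 010 can be verified with a short Python sketch using `math.comb` (Python 3.8+). The variable names are illustrative. Part (b) is computed here under the reading that a committee of 5 with at least 2 of each profession must be either 3 accountants and 2 programmers, or 2 accountants and 3 programmers.

```python
import math

# Example 08: choose 9 men out of 15.
work_teams = math.comb(15, 9)

# Example 09: bridge hands of 13 cards from 52.
bridge_hands = math.comb(52, 13)

# Example 010(a): 3 accountants from 8 AND 2 programmers from 5.
committee_a = math.comb(8, 3) * math.comb(5, 2)

# Example 010(b): (3 accountants, 2 programmers) OR (2 accountants, 3 programmers).
committee_b = (math.comb(8, 3) * math.comb(5, 2)
               + math.comb(8, 2) * math.comb(5, 3))

print(work_teams, bridge_hands, committee_a, committee_b)
```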
4 Set Theory
A Venn diagram is a drawing, in which circular areas represent groups of items usually sharing common
properties. The drawing consists of two or more circles, each representing a specific group or set,
contained within a square that represents the sample space. Venn diagrams are often used as a visual
display when referring to sample spaces, events and operations involving events.
Complementary events: The complementary event Ā (sometimes written A^c) of an event A consists of all the
outcomes in S that are not in A.
• Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement of the
event A = “obtaining a 3 or less” = {1, 2, 3} is Ā = “obtaining a 4 or more” = {4, 5, 6}.
• Consider the experiment of tossing two coins. S = {HH, HT, TH, TT}. The complement of the
event H = “at least one head” = {HH, HT, TH} is H̄ = “no heads” = {TT}.
• The union of two events A and B, denoted by A ∪ B, is the set of outcomes that are in A or in
B or in both A and B i.e. the event that “either A or B or both A and B occur” or “at least
one of A or B occurs”.
• The intersection of two events A and B, denoted by A ∩ B, is the set of outcomes that are in
both A and B i.e. the event that “both A and B occur”.
• These definitions involving two events can be extended to ones involving 3 or more events, e.g.
for the 3 events A1, A2 and A3, the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2 or A3
occurs” and A1 ∩ A2 ∩ A3 is the event “A1, A2 and A3 all occur”.
1 Let B be the event “drawing a black card from a deck of cards” and R the event “drawing a red
card from a deck of cards”.
The events B and R have no outcomes in common, i.e. B ∩ R = φ (the empty set). Hence B and R
are mutually exclusive.
2 Let E be the event “an even number with a single throw of a die” and O the event “an odd
number with a single throw of a die”, i.e. E = {2, 4, 6} and O = {1, 3, 5}.
Hence E and O have no outcomes in common, i.e. E ∩ O = φ (the empty set), and are therefore
mutually exclusive.
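The event operations above map directly onto set operations. As a minimal Python sketch for the single-die experiment (the names `S`, `A`, `E` and `O` follow the examples; the remaining variable names are illustrative):

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one throw of a die
A = {1, 2, 3}            # "obtaining a 3 or less"
E = {2, 4, 6}            # "an even number"
O = {1, 3, 5}            # "an odd number"

complement_A = S - A             # outcomes in S that are not in A
union_A_E = A | E                # outcomes in A or E or both
intersection_A_E = A & E         # outcomes in both A and E

# E and O are mutually exclusive when they share no outcomes.
mutually_exclusive = E.isdisjoint(O)

print(complement_A, union_A_E, intersection_A_E, mutually_exclusive)
```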
5 Definition of Probability
Classical definition of probability: If there are n equally likely outcomes in total, of which m
are favourable to an event A, then the probability of occurrence of the event A, denoted by P(A), is
given by
\[ P(A) = \frac{N(A)}{N(S)} = \frac{m}{n} \]
where N(A) = m is the number of outcomes favourable to the event A and N(S) = n is the number of
outcomes in the sample space S, i.e. the total number of outcomes.
Note: Since N(A) ≥ 0 and N(A) ≤ N(S), it follows that 0 ≤ P(A) ≤ 1.
QUE 08 Example 08 Suppose that 30% of Nigerians are obese (A), 4% of Nigerians suffer from diabetes
(B), and 2% are both obese and suffer from diabetes. What is the probability that a randomly
selected person is obese or suffers from diabetes?
• Solution 08 Here, P(A) = 0.3, P(B) = 0.04 and P(A and B) = P(A ∩ B) = 0.02. Then, by the addition rule,
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.3 + 0.04 − 0.02 = 0.32.
QUE 09 Example 09 What is the probability that the individual selected is male or against abortion? Let
A be the event A = {male} and B the event B = {against}. Consider a survey of 1000 people with the possibility
of interviewing 445 men, P(A) = 451/1000, and 442 people were against abortion, i.e. P(B) =
442/1000 and P(A and B) = 203/1000.
• Solution 09
Que 10 Two coins are tossed. Find the probability of getting (i) exactly two heads, (ii) at least one head.
Que 11 Two dice are rolled. Find the probability that a sum of 7 will occur.
Ans 11 Solution:
The number of sample points in S is 36 (see example 3 under sample space).
Let A = “a sum of 7 will occur”. A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
∴ P(A) = N(A)/N(S) = 6/36 = 1/6.
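The classical definition can be applied mechanically by listing the sample space and counting favourable outcomes. A minimal Python sketch for the two-dice question (variable names are illustrative; exact fractions are used to avoid rounding):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7.
event_A = [outcome for outcome in sample_space if sum(outcome) == 7]

# Classical probability: N(A) / N(S).
p_A = Fraction(len(event_A), len(sample_space))

print(len(sample_space), event_A, p_A)
```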
Que 12 Example 12A: 100 people bought tickets in a charity raffle. 60 of them bought the tickets
because they supported the charity. 75 bought tickets because they liked the prize. No one who
neither supported the charity nor liked the prize bought a ticket.
(a) What is the probability that the prize-winning ticket was bought by someone who liked the prize?
(b) What is the probability that the prize was won by someone who did not support the charity?
(c) What is the probability that the prize was won by someone who both supported the charity
and liked the prize?
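Ans 12 Solution 12: Let A and B be the events “liked the prize” and “supported the charity”
respectively.
(a) To find Pr(A), we apply the three steps as follows:
1. N = 100
2. n(A) = 75
3. Therefore Pr(A) = n(A)/N = 75/100 = 0.75.
(b) Let B̄ be the event “did not support the charity”.
1. n(B̄) = 100 − 60 = 40
2. Thus Pr(B̄) = n(B̄)/N = 40/100 = 0.4.
(c) Everyone who bought a ticket either supported the charity or liked the prize, so n(A ∪ B) = 100.
1. n(A ∩ B) = n(A) + n(B) − n(A ∪ B) = 75 + 60 − 100 = 35
2. Pr(A ∩ B) = n(A ∩ B)/N = 35/100 = 0.35.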
• Example 13: A pack of playing cards contains 52 cards, 13 belonging to each of the four suits
“Spades”, “Hearts”, “Diamonds” and “Clubs”. Within each suit the 13 cards are labelled: Ace,
2, . . . , 10, Jack, Queen, King. Let D be the event that a randomly selected card is a diamond,
K be the event that the card is a king, and B be the event that the card has one of the numbers
from 2 to 10. Find Pr(D), Pr(K), Pr(B), Pr(D ∩ K), Pr(B ∪ D), Pr(B ∩ K) and Pr(D ∪ K ∪ B).
• Example 14: The seats of a jet airliner are arranged in 55 rows (numbered 1 to 55) of 10 seats
(lettered A to K, leaving out I). In each row, seats C, D, G and H are on aisles, and A and K
are window seats. Smoking is permitted in rows 45 to 55 inclusive. If a passenger is assigned a
seat at random, what is the probability of being allocated
Example: The following diagram shows two events A and B in the sample space S. Are the events A
and B independent?
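Answer: There are 10 black dots in S and event A contains 4 of these dots. So the probability
of A is P(A) = 4/10. Similarly, event B contains 5 black dots. Hence P(B) = 5/10. The conditional
probability of A given B is
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{2}{5} \]
Since P(A|B) = 2/5 = 4/10 = P(A), knowing that B has occurred does not change the probability of A,
so the events A and B are independent.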
The intuitive feeling is that independent events have no effect upon each other. But how do we
decide whether two events A and B are independent? If the occurrence of event A has nothing to do
with the occurrence of event B, then we expect the conditional probability of B given A to be the
same as the unconditional probability of B:
Pr(B|A) = Pr(B).
The information that event A has occurred does not change the probability of B occurring. If
Pr(B|A) = Pr(B), then, using the definition of conditional probability,
\[ \frac{\Pr(A \cap B)}{\Pr(A)} = \Pr(B), \quad \text{so that} \quad \Pr(A \cap B) = \Pr(A) \times \Pr(B). \]
This leads us to the definition of independent events.
Events A and B are independent if
\[ \Pr(A \cap B) = \Pr(A) \times \Pr(B). \]
In words, the probability of the intersection of independent events is the product of their individual
probabilities.
Remark: The definition can be extended to the independence of a series of events, i.e. multiplication
of a series of probabilities: if events A1, A2, . . . , An are independent, then
\[ \Pr(A_1 \cap A_2 \cap \cdots \cap A_n) = \Pr(A_1) \times \Pr(A_2) \times \cdots \times \Pr(A_n). \]
• Example 17A: Two coins are tossed. Using the classical definition of probability,
P(both tosses heads) = 1/4, because HH is one of the four equally likely outcomes HH, HT, TH, TT.
Assuming that both coins are unbiased, P(1st coin is heads) = P(2nd coin is heads) = 1/2.
Since P(1st coin is heads) × P(2nd coin is heads) = 1/2 × 1/2 = 1/4 = P(both tosses heads), the
events “heads on the first toss” and “heads on the second toss” are independent.
• Example 17B: A coin is tossed and a single 6 sided die is rolled. Find the probability of
“heads” and rolling a 3 with the die.
Solution: P(heads) = 1/2 and P(3) = 1/6. Since the results of the coin and the die are
independent, P(heads and 3) = P(heads) × P(3) = (1/2) × (1/6) = 1/12.
• Example 18: A school survey found that 9 out of 10 students like pizza. If three students are
chosen at random with replacement, what is the probability that all three students like pizza?
P(student 1 likes pizza) = P(student 2 likes pizza) = P(student 3 likes pizza) = 9/10.
P(student 1 likes pizza and student 2 likes pizza and student 3 likes pizza)
= P(student 1 likes pizza) × P(student 2 likes pizza) × P(student 3 likes pizza)
\[ = \left(\frac{9}{10}\right)^{3} = 0.729 \]
• Example 19: The probability that person A will be alive in 20 years is 0.7 and the probability
that person B will be alive in 20 years is 0.5, while the probability that they will both be alive
in 20 years is 0.45. Are the events E1 “A is alive in 20 years” and E2 “B is alive in 20 years”
independent?
P(E1) = 0.7, P(E2) = 0.5, P(E1 ∩ E2) = 0.45
Since P(E1) × P(E2) = 0.7 × 0.5 = 0.35 ≠ P(E1 ∩ E2), the events E1 and E2 are not independent.
• Example 20: Let A be the event that a microchip is manufactured perfectly. Let B be the event
that the chip is installed correctly. If Pr(A) = 0.98 and Pr(B) = 0.93 what is the probability
that the installed chip functions perfectly?
Solution: We require Pr(A ∩ B). Because manufacture and installation may be considered
independent, we have:
Pr(A ∩ B) = Pr(A) × Pr(B) = 0.98 × 0.93 = 0.9114.
• Example 22: A four-engined plane can land safely even if three engines fail. Each engine fails,
independently of the others, with probability 0.08 during a flight. What is the probability of
making a safe landing?
Solution: Let Ai be the event that engine i fails. Then the event “safe landing” can be written
as $\overline{A_1 \cap A_2 \cap A_3 \cap A_4}$, the complement of the event “all engines fail”. Hence
\[ \Pr(\overline{A_1 \cap A_2 \cap A_3 \cap A_4}) = 1 - \Pr(A_1 \cap A_2 \cap A_3 \cap A_4) = 1 - \Pr(A_1)\Pr(A_2)\Pr(A_3)\Pr(A_4) \]
\[ = 1 - 0.08^{4} = 0.99995904. \]
Quite safe! On average, about one flight in 24 414 will crash.
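The multiplication rule for independent events in Examples 18, 19 and 22 can be reproduced with a few lines of arithmetic. This is a minimal Python sketch with illustrative variable names; the tolerance used to compare floating-point numbers is an assumption of the sketch, not part of the notes.

```python
import math

# Example 22: four engines, each failing independently with probability 0.08.
p_fail = 0.08
p_all_fail = p_fail ** 4            # all four engines fail
p_safe = 1 - p_all_fail             # complement: at least one engine works
flights_per_crash = 1 / p_all_fail  # average number of flights per crash

# Example 18: three students chosen with replacement, each likes pizza w.p. 0.9.
p_all_like_pizza = 0.9 ** 3

# Example 19: independence holds only if P(E1 and E2) equals P(E1) * P(E2).
p_e1, p_e2, p_both = 0.7, 0.5, 0.45
independent = math.isclose(p_e1 * p_e2, p_both, abs_tol=1e-12)

print(p_safe, round(flights_per_crash), p_all_like_pizza, independent)
```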
Example 23 The probability that the Naira will weaken against the dollar tomorrow is 0.53.
The probability that you will wake up late tomorrow is 0.42.
(a) What is the probability that, tomorrow, the Naira weakens against the dollar and you wake
up late?
(b) What is the probability that, tomorrow, the Naira weakens against the dollar or you wake
up late?
• Example 24: Some financial academics argue that the day-to-day movements of share prices
are statistically independent. Assume, hypothetically, that the share De Beers has a probability
of 0.55 of rising on any given trading day. What is the probability that it rises on three successive
trading days?
5.3 Exercise
• Ex. 20: There are 33 candidates for an election to a committee of three. What is the probability
that Jones, Smith and Brown are elected?
• Ex. 21: A group of eight students fill the front row at Statistics lectures daily. They decide to
keep attending lectures until they have exhausted every possible arrangement in the front row.
For how many days will they attend lectures?
• Ex. 22: A young investor is considering the purchase of a portfolio of three shares from the
“Building and construction” sector of the stock exchange. He chooses three shares at random
from the 25 shares currently listed in this sector.
(a) How many ways can shares be selected for the portfolio?
(b) What is the probability that Everite, Grinaker and L.T.A. (three shares in this sector) are selected?
(c) What is the probability that Grinaker is one of the selected shares?
• Ex. 23: A firm of speculative builders has bought three adjoining plots. The company builds
houses in seven styles. It is concerned about the visual appearance of the houses from the street.
So they ask their drafting section to sketch all possible selections of street views.
(a) How many sketches are required if (i) no repetitions of styles are allowed, and if (ii) they
allow repetitions of styles?
(b) If they choose one sketch at random from those in part (a)(ii), what is the probability that
all the houses will be of different styles?
(c) In order to determine the materials required, the quantity surveying department is con-
cerned only with the three styles which might be built (and not on which plot they are
built on). How many combinations of styles must they be prepared for if (i) no repetitions
of styles are allowed, and if (ii) repetitions are allowed?
• Ex. 24: Two new computer codes are being developed to prevent unauthorized access to
classified information. The first consists of six digits (each chosen from 0 to 9); the second
consists of three digits (from 0 to 9) followed by two letters (A to Z, excluding I and O).
(a) Which code is better at preventing unauthorized access (defined as breaking the code in
one attempt)?
(b) If both codes are implemented, the first followed by the second, what is probability of
gaining access in a single attempt?
• Ex. 25: A housewife is asked to rank five brands of washing powder (A, B, C, D, E) in order
of preference. Suppose that she actually has no preference, and her ordering is arbitrary. What
is the probability that
• Ex. 26: A and B are events such that Pr(A) = 0.6, Pr(B | A) = 0.3, and Pr(A∪ B) = 0.72.
Are A and B independent, mutually exclusive, or both?
• Ex. 27: If the probability is 0.001 that a 20-watt bulb will fail a 10-hour test, what is the
probability that a sign constructed of 1000 bulbs will burn for 10 hours
(a) with no bulb failure?
(b) with one bulb failure?
(c) with k bulb failures?
• Ex. 28: Show that if events A and B are independent, then the following pairs of events are
also independent.
(a) A and B̄
(b) Ā and B̄.
• Ex. 29: The events A,B and C are such that A and B are independent and B and C are mu-
tually exclusive. Their probabilities are Pr(A) = 0.3, Pr(B) = 0.4, and Pr(C) = 0.2. Calculate
the probabilities of the following events. (a) Both B and C occur.
(b) At least one of A and B occurs.
(c) B does not occur.
(d) All three events occur.
(e) (A ∩ B) ∪ C.
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}, \quad \text{where } \Pr(B) > 0 \qquad (2) \]
\[ \Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)}, \quad \text{where } \Pr(A) > 0 \qquad (3) \]
Solution 25
P(satisfied | male) = 0.6
P(satisfied | female) = 0.45
P(not satisfied | male) = 1 − 0.6 = 0.4
P(not satisfied | female) = 1 − 0.45 = 0.55
P(satisfied) = 0.54 and P(not satisfied) = 1 − 0.54 = 0.46
1) When calculating a conditional probability the sample space is restricted to that associated with
the event that is known to occur.
2) The probability of a person being satisfied depends on the gender of the person being interviewed.
In this case females are less satisfied than males with the news coverage.
3) In the example above, P(satisfied) and P(not satisfied) are known as marginal probabilities.
• Example 26 : At a certain university the probability of passing accounting is 0.68, the prob-
ability of passing statistics 0.65 and the probability of passing both statistics and accounting is
0.57. Calculate the probability that a student:
(a) passes statistics when it is known that he/she passed accounting.
(b) passes accounting when it is known that he/she passed statistics.
(c) passes statistics when it is known that he/she did not pass accounting.
• Solution 26: Let A denote the event “a student passes accounting” and B the event “a student
passes statistics”.
Then Ā is the event “a student did not pass accounting”,
A ∩ B the event “a student passes both statistics and accounting” and
Ā ∩ B the event “a student passes statistics, but not accounting”.
Given: P(A) = 0.68, P(B) = 0.65, P(A ∩ B) = 0.57.
(a)
\[ \Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)} = \frac{0.57}{0.68} = 0.838 \]
(b)
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{0.57}{0.65} = 0.877 \]
(c) P(B|Ā) = ?
(d) P (A|B) =?
\[ \Pr(B|A) = \frac{\Pr(A|B)\Pr(B)}{\Pr(A)} \qquad (4) \]
\[ \Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)} \qquad (5) \]
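A very useful tool for finding conditional probabilities is Bayes’ theorem, which connects Pr(B|A)
with Pr(A|B). It is named in honour of Rev. Thomas Bayes, who did pioneering work in probability
theory in the 1700s.
Bayes’ Theorem. If A and B are two events, then
\[ \Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|\bar{A})\Pr(\bar{A})} \]
Proof: From the definition of conditional probability,
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]
and, by theorem 2,
\[ \Pr(B) = \Pr(A \cap B) + \Pr(\bar{A} \cap B). \]
Substituting, we have
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(A \cap B) + \Pr(\bar{A} \cap B)}. \]
Also, we note that
\[ \Pr(A \cap B) = \Pr(B|A)\Pr(A) \quad \text{and} \quad \Pr(\bar{A} \cap B) = \Pr(B|\bar{A})\Pr(\bar{A}). \]
Therefore
\[ \Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|\bar{A})\Pr(\bar{A})}. \]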
Example 27: A television manufacturer cannot produce the full quota of tubes it requires, so it
purchases 20% of its needs from an outside supplier. The quality manager has determined that 6% of
the tubes produced in house are defective, and that 8% of the purchased tubes are defective. He finds
the tube of a randomly selected television to be defective. What is the probability that the tube was
produced by the company.
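Solution 27:
Let C be the event “the tube was produced by the company” and D the event “the tube is defective”.
We are given Pr(D|C) = 0.06, Pr(D|C̄) = 0.08 and Pr(C) = 0.8, so Pr(C̄) = 0.2. We need to determine Pr(C|D).
By Bayes’ theorem,
\[ \Pr(C|D) = \frac{\Pr(D|C)\Pr(C)}{\Pr(D|C)\Pr(C) + \Pr(D|\bar{C})\Pr(\bar{C})} = \frac{0.06 \times 0.8}{0.06 \times 0.8 + 0.08 \times 0.2} = 0.75 \]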
Example 28: When testing a person for a certain disease, the test can show either a positive result
(the person has the disease) or a negative result (the person does not have the disease). When a
person actually has the disease, the test shows positive 99% of the time. When the person actually
does not have the disease the test shows negative 95% of the time. Suppose it is known that only
0.1% of the people in the population have the disease.
(a) If a test turns out to be positive, what is the probability that the person has the disease?
(b) If the test turns out to be negative, what is the probability that the person does not have the
disease?
Solution 28:
Let A = “the person has the disease” and B = “the test returns a positive result”. Then
Ā is the event “the person does not have the disease”,
B|A is the event “the test is positive given the person has the disease”,
B|Ā is the event “the test is positive given the person does not have the disease” and
B̄|Ā is the event “the test is negative given the person does not have the disease”.
(a) We are given Pr(A) = 0.001, Pr(B|A) = 0.99 and Pr(B̄|Ā) = 0.95. Then
Pr(Ā) = 1 − Pr(A) = 0.999.
Similarly, Pr(B|Ā) = 1 − Pr(B̄|Ā) = 0.05.
Substituting into Bayes’ theorem, we obtain
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{0.001 \times 0.99}{0.001 \times 0.99 + 0.999 \times 0.05} = \frac{0.00099}{0.05094} = 0.0194 \]
The result can be interpreted as follows: the chance that a person has the disease when the
test shows positive is 194 in 10 000.
(b)
\[ \Pr(\bar{A}|\bar{B}) = \frac{\Pr(\bar{A} \cap \bar{B})}{\Pr(\bar{B})} = \frac{\Pr(\bar{A})\Pr(\bar{B}|\bar{A})}{1 - \Pr(B)} = \frac{0.999 \times 0.95}{0.94906} = 0.9999895 \]
From the above it can be seen that a negative result of the test is very reliable (it will be wrong
only 105 times in 10 million cases).
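The two-event form of Bayes’ theorem used in Examples 27 and 28 can be wrapped in a small helper. This is a minimal Python sketch; the function name `posterior` and its parameter names are hypothetical and chosen only for illustration.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return Pr(A|B) from Pr(A), Pr(B|A) and Pr(B|not A) via Bayes' theorem."""
    numerator = p_b_given_a * prior
    evidence = numerator + p_b_given_not_a * (1 - prior)  # Pr(B), total probability
    return numerator / evidence

# Example 27: Pr(C) = 0.8, Pr(D|C) = 0.06, Pr(D|not C) = 0.08.
p_inhouse_given_defective = posterior(0.8, 0.06, 0.08)

# Example 28: prevalence 0.1%, positive for 99% of the diseased,
# negative for 95% of the healthy.
prevalence, p_pos_given_disease, p_neg_given_healthy = 0.001, 0.99, 0.95

# (a) Pr(disease | positive): the false-positive rate is 1 - 0.95.
p_disease_given_pos = posterior(prevalence, p_pos_given_disease,
                                1 - p_neg_given_healthy)

# (b) Pr(healthy | negative): swap the roles of the events.
p_healthy_given_neg = posterior(1 - prevalence, p_neg_given_healthy,
                                1 - p_pos_given_disease)

print(p_inhouse_given_defective, p_disease_given_pos, p_healthy_given_neg)
```

The low value in part (a) reflects how rare the disease is: most positive results come from the much larger healthy group.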
Example 29: The probabilities of producing a defective item with three machines M1 , M2 , M3
are 0.1, 0.08 and 0.09, respectively. At any instant, only one of the machines is being operated, in
the following percentage of the daily work, respectively: 30%, 30%, 40%. An item is randomly chosen
and found to be defective. Which machine most probably produced it?
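Solution 29: Denoting the defective item by A, the total probability breaks down into:
Pr(A) = Pr(A|M1)Pr(M1) + Pr(A|M2)Pr(M2) + Pr(A|M3)Pr(M3)
= 0.1 × 0.3 + 0.08 × 0.3 + 0.09 × 0.4 = 0.03 + 0.024 + 0.036 = 0.09.
By Bayes’ theorem, Pr(M1|A) = 0.03/0.09 = 0.333, Pr(M2|A) = 0.024/0.09 = 0.267 and
Pr(M3|A) = 0.036/0.09 = 0.4. Machine M3 is therefore the most probable source of the defective item.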
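The computation for Example 29 combines the law of total probability with Bayes’ theorem for more than two events. A minimal Python sketch, using the probabilities stated in the example (the dictionary and variable names are illustrative):

```python
# Share of daily work done by each machine: Pr(Mi).
priors = {"M1": 0.30, "M2": 0.30, "M3": 0.40}

# Probability that each machine produces a defective item: Pr(A|Mi).
defect_rates = {"M1": 0.10, "M2": 0.08, "M3": 0.09}

# Joint probabilities Pr(A and Mi) = Pr(A|Mi) * Pr(Mi).
joint = {m: defect_rates[m] * priors[m] for m in priors}

# Law of total probability: Pr(A) is the sum of the joint probabilities.
p_defective = sum(joint.values())

# Bayes' theorem: Pr(Mi|A) = Pr(A and Mi) / Pr(A).
posteriors = {m: joint[m] / p_defective for m in joint}

most_likely = max(posteriors, key=posteriors.get)

print(p_defective, posteriors, most_likely)
```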
5.6 Exercise
EX1. You feel ill at night and stumble into the bathroom, grab one of three bottles in the dark and
take a pill. An hour later you feel really ghastly, and you remember that one of the bottles
contains poison and the other two aspirin. Your handy medical text says that 80% of people
who take the poison will show the same symptoms as you are showing, and that 5% of people
taking aspirin will have them.
Let B be the event “having the symptoms”, A be the event “taking the poison”, Then Ā is the
event “taking aspirin”.
What is the probability that you took the poison given that you have got the symptoms, i.e.
what is P r(A|B)?
EX2. A well is drilled as part of an oil exploration programme. The probability of the well passing
through shale is 0.4. If the well passes through shale, the probability of striking oil is 0.3. If it
does not pass through shale, the probability drops to 0.1.
(a) Given that oil was found, what is the probability that it did not pass through shale?
(b) Given that oil was not found, what is the probability that it passed through shale?
EX3. A family has two dogs (Rex and Rover) and a cat called Garfield. None of them is fond of the
postman. If they are outside, the probabilities that Rex, Rover and Garfield will attack the
postman are 30%, 40% and 15%, respectively. Only one is outside at a time, with probabili-
ties 10%, 20% and 70%, respectively. If the postman is attacked, what is the probability that
Garfield was the culprit?
Hint: Use the extended version of Bayes’ theorem (and try to prove it). Let
A1, A2, . . . , An be a set of mutually exclusive and exhaustive events in S. Let B be any other
event. Then
\[ \Pr(A_i|B) = \frac{\Pr(B|A_i)\Pr(A_i)}{\Pr(B|A_1)\Pr(A_1) + \Pr(B|A_2)\Pr(A_2) + \cdots + \Pr(B|A_n)\Pr(A_n)}. \]
EX4. Suppose that a fashion shirt comes in three sizes and five colours. The three sizes (and the per-
centage of the population who purchase each size) are: small (30%), medium (50%), and large
(20%). Market research indicates the following colour preferences: white (6%), blue (26%), green
(36management of a store expects to sell 1000 of these shirts. How many shirts of each size and
colour should they order? Assume independence.
EX5. The probability of passing Statistics without doing these exercises is 0.1 and 0.8 if they are done.
If 60% of students do these exercises, what is the probability that a student has not done the
exercises if he passes Statistics?
EX6. Which of the following pairs of events would you expect to be independent, which mutually
exclusive and which neither?
(a) studying Economics and being left-handed,
(b) owning a dog and paying vet’s bills,
(c) the prices of shares in Anglovaal and Gold Fields (both in the mining house sector of the
Johannesburg Stock Exchange) both rising today,
(d) being a member of the Canoe Club and studying for a B.A.,
(e) buying sugar-free cooldrink and buying a cream doughnut for yourself.
EX7. An X-ray test is used to detect a disease that occurs, initially without any obvious symptoms, in
3% of the population. The test has the following error rates: 7% of people who are disease free
have a positive reaction and 2% of the people who have the disease have a negative reaction. A
large number of people are screened at random using the test, and those with a positive reaction
are examined further.
– (a) What proportion of people who have the disease are correctly diagnosed?
– (b) What proportion of people with a positive reaction actually have the disease?
– (c) What proportion of people with a negative reaction actually have the disease?
– (d) What proportion of the tests conducted give the incorrect diagnosis?
6 Chapter Four : Random Variables
Random variables fall into two categories: discrete and continuous. The mathematical treatment
of these two types of random variables is very different, as you will learn from the remainder of this
chapter.
Discrete random variables take on isolated values along the real line, usually (but by no means
always) integer values. Examples of integer-valued discrete random variables are:
• the number of customers entering a store between 09h00 and 10h00
• the number of occupied tables at a restaurant
• the number of clients visited by a salesperson during a day
• the number of applicants who respond to a job advertisement
In contrast to discrete random variables, a continuous random variable can (conceptually, at least)
be measured to any degree of accuracy; i.e. between every two possible values x1 and x2 that the
random variable can assume, there is another possible value x3 , between x1 and x2 . The set of all
possible values of a continuous random variable is usually an interval of the real line. Examples of
continuous random variables are:
• the distance a car travels on one litre of petrol
• the proportion of gold in a sample of ore
• the volume of milk that actually goes into a nominally one litre carton
• the time that a customer waits in the queue at a fast food outlet
• the direction of the wind at midday.
Example 6 Which of the following are random variables? Which of the random variables are
continuous and which are discrete? Write down the set of values that each random variable can take
(a) The number of customers arriving at a supermarket during the morning.
(b) The number of letters in the Greek alphabet.
(c) The opening price of gold in New York on Monday next week.
(d) The number of seats that will be sold for a performance of a play in a theatre with a capacity
of 328.
(e) The length of time you have to wait at an autobank.
(f) The ratio between the circumference and the diameter of a circle.
(g) The last digit of a randomly selected telephone number.
The distinction between discrete and continuous random variables is critical because we develop dif-
ferent mathematical approaches for the two types of random variable.
Definition: Let X be a random variable with space R_X and probability density function f(x). The
mean µ_X of the random variable X is defined as
\[ \mu_X = \begin{cases} \sum_{x \in R_X} x\,p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x\,f(x)\,dx & \text{if } X \text{ is continuous} \end{cases} \qquad (6) \]
if the right hand side exists.
The p(x) is called the probability mass function (pmf) and f (x) is the probability density function
(pdf) for the discrete and continuous random variable respectively.
A function p(x) is called a probability mass function (frequently abbreviated to p.m.f.) if it satisfies
the conditions PMF1, PMF2 and PMF3.
PMF1: p(x) is defined for all values of x, but p(x) ≠ 0 only at a finite or “countably infinite” set of
points.
PMF2: all values of p(x) lie in the unit interval [0, 1], that is 0 ≤ p(x) ≤ 1.
PMF3: Σ p(x) = 1, where the sum is taken over all values of x for which p(x) ≠ 0.
The mean of a random variable is a composite of its values weighted by the corresponding prob-
abilities. The mean is a measure of central tendency: the value that the random variable takes “on
average.” The mean is also called the expected value of the random variable X and is denoted by
E(X). The symbol E is called the expectation operator. The expected value of a random variable
may or may not exist.
Note 1: In the case of a discrete variable, the mean or expected value of a random variable X is
denoted by E(X) = µ and is calculated by
\[ E(X) = \mu = \sum x\,p(x) \]
and the variance by
\[ \mathrm{Var}(X) = \sigma^{2} = \sum (x-\mu)^{2}\,p(x) = \sum x^{2}\,p(x) - \mu^{2}. \]
Note 2: In the case of a continuous variable taking values in the interval [a, b], the mean or expected
value of a random variable X is denoted by E(X) = µ and is calculated by
\[ E(X) = \mu = \int_{a}^{b} x\,f(x)\,dx \]
and the variance by
\[ \mathrm{Var}(X) = \sigma^{2} = \int_{a}^{b} (x-\mu)^{2}\,f(x)\,dx = \int_{a}^{b} x^{2}\,f(x)\,dx - \mu^{2}. \]
• Example 001: An unbiased die is rolled and the random variable X consists of the number
of dots appearing on the upturned face. Find the probability mass function for this random
variable. Verify the properties PMF1, PMF2 and PMF3 above.
• Example 002:
satisfies the conditions for being the probability mass function of some random variable X.
Sketch the function, p(x)
b) Find Pr[2 ≤ X ≤ 4].
c) Find Pr[X ≥ 4].
• Example 003:
• Example 004: Let X be a random variable of number of tails when a coin is tossed 3 times.
Find the expected value of the random variable X and the variance:
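Let the number of tails be x = 0, 1, 2, 3. The probability distribution table is given by
x       0     1     2     3
p(x)    1/8   3/8   3/8   1/8
Using the information in the table above,
\[ E(X) = \mu = \sum x\,p(x) = 0 \times \tfrac{1}{8} + 1 \times \tfrac{3}{8} + 2 \times \tfrac{3}{8} + 3 \times \tfrac{1}{8} = \tfrac{12}{8} = 1.5 \]
\[ \mathrm{Var}(X) = \sigma^{2} = \sum x^{2}\,p(x) - \mu^{2} = \left\{0^{2} \times \tfrac{1}{8} + 1^{2} \times \tfrac{3}{8} + 2^{2} \times \tfrac{3}{8} + 3^{2} \times \tfrac{1}{8}\right\} - \left\{\tfrac{12}{8}\right\}^{2} = 3 - 2.25 = 0.75 \]
Standard deviation:
\[ \sigma = \sqrt{0.75} = 0.866 \]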
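The calculation in Example 004 can be reproduced by enumerating the eight equally likely outcomes of three coin tosses. A minimal Python sketch (variable names are illustrative; exact fractions are used so the results match the hand calculation):

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of tossing a fair coin 3 times.
outcomes = list(product("HT", repeat=3))

# X = number of tails; count how many outcomes give each value of X.
counts = Counter(outcome.count("T") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# E(X) = sum of x * p(x); Var(X) = sum of x^2 * p(x) minus the squared mean.
mean = sum(x * p for x, p in pmf.items())
variance = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
std_dev = math.sqrt(variance)

print(pmf, mean, variance, round(std_dev, 3))
```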
(EX.) Consider a discrete random variable with probability mass function given below.
x 1 2 3 4
P(X=x) 0.1 0.3 0.4 0.2
13C The sample space, numerical values for the elementary events and their associated
probabilities are:
X      2     3     4     5     6     7     8     9     10    11    12
Pr(X)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Y      1     2     3     4     5     6     8     9     10    12    15    16    18    20    24    25    30    36
Pr(Y)  1/36  2/36  2/36  3/36  2/36  4/36  2/36  1/36  2/36  4/36  2/36  1/36  2/36  2/36  2/36  1/36  2/36  1/36
Pr[X ≥ 10] = 6/36 = 0.1667, Pr[Y ≥ 13] = 13/36.
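Tables like these can be generated by enumeration. The sketch below assumes, from the tabulated values, that X is the sum and Y the product of the faces when two dice are rolled; that interpretation is inferred rather than stated in this solution, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two dice.
rolls = list(product(range(1, 7), repeat=2))

# Number of outcomes giving each value of the sum (X) and the product (Y).
sum_counts = Counter(a + b for a, b in rolls)
product_counts = Counter(a * b for a, b in rolls)

# Tail probabilities obtained by adding up the relevant counts.
p_x_ge_10 = Fraction(sum(c for x, c in sum_counts.items() if x >= 10), len(rolls))
p_y_ge_13 = Fraction(sum(c for y, c in product_counts.items() if y >= 13), len(rolls))

print(sorted(sum_counts.items()))
print(sorted(product_counts.items()))
print(p_x_ge_10, p_y_ge_13)
```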
15C (a), (c) and (d) are probability mass functions, but (b) is not, because
p(1) = −0.2 < 0.
26C (a), (b), (f) and (g) are probability mass functions, (c), (d), (e) and (h) are prob-
ability density functions.
p(x) = 0.1 × 0.5 = 0.05 x=0
= 0.2 × 0.5 + 0.1 × 0.5 = 0.15 x=1
= 0.2 × 0.5 + 0.5 × 0.5 = 0.35 x=2
= 0.5 × 0.5 + 0.2 × 0.5 = 0.35 x=3
= 0.2 × 0.5 = 0.1 x=4
=0 otherwise
Exercises. . .
∗ 4.1 Which of the following random variables are discrete, and which are continuous?
(a) the time required to answer this question
(b) the number of words in a book chosen at random from the library
(c) the number of “heads” in 6 flips of a coin
(d) the number of goals scored in a soccer match
(e) the maximum temperature recorded at Cape Town International Airport to-
(f) the volume of air breathed in by an individual when asked to “take a deep
(g) the annual income to the nearest cent of a randomly chosen wage-earner
(h) the population of a randomly chosen town in the Free State
(i) the length of time you have to wait for a bus
f(x) = Ax³ 0 ≤ x ≤ 10
=0 otherwise
Find A. What is the probability that X lies between 2 and 5, and what is the
probability that X is less than 3? Sketch the density function.
∗ 4.7 A random variable X has probability density function
f (x) = 2x 0 ≤ x ≤ 1
= 0 otherwise
find the number a such that the probability that X < a is three times the proba-
bility that X ≥ a.
4.9 If f(x) = 3x² for 0 < x < 1, and zero elsewhere, find the number b, such that X
is equally likely to be greater than, or less than b.
∗ 4.10 The probability density function of the life in hours X of a certain kind of radio
tube is found to be
f(x) = 100/x² x > 100
=0 otherwise
Three such tubes are bought for a radio set. What is the probability that none
will have to be replaced during the first 150 hours of operation?
Further exercises. . .
4.12 A continuous random variable X has probability density function
f (x) = k a ≤ x ≤ b
= 0 otherwise
4.13 Find values for c so that the following functions may serve as probability density
f(x) = c + e^(−x) 0 ≤ x ≤ 1
=0 otherwise
f (x) = x + 2 0≤x≤c
=0 otherwise
∗ 4.14 The density function for a random variable X is given by
f(x) = (3/4)(kx − x²) 0 ≤ x ≤ 2
=0 otherwise
Solutions to exercises. . .
4.1 (b) (c) (d) (g) and (h) are discrete
(a) (e) (f) and (i) are continuous.
(j) is an unusual example of a mixed continuous and discrete random variable:
although the random variable is, at face value, continuous, it cannot be modelled
by a conventional probability density function because the probability of no rain
in a day is not zero but positive. The probability function for X needs to be
something like
\[ p(x) = \begin{cases} p & x = 0 \\ f(x) & x > 0 \\ 0 & \text{otherwise} \end{cases} \]
with the “probability density function” f (x) integrating to 1 − p.
4.7 0.6931
4.8 0.8660
4.9 0.7937
4.10 8/27
4.11 0.9850
4.12 k = 1/(b − a)
7 Probability Distributions
7.1 Discrete Probability Distributions
Chapter 5
A number of probability mass and density functions have proved themselves useful as
“models” for a large variety of practical problems in business and elsewhere. We consider
four of the most frequently encountered probability distributions in this chapter — the
Binomial, Poisson, Exponential and Normal Distributions.
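1. We have a random experiment which has a sample space with exactly two
outcomes, one of which we can label “success”, and the other “failure”: i.e. S =
{success, failure}.
e.g. A door-to-door salesperson calls on a prospective client: the client either
purchases the product (success) or does not purchase (failure).
2. The random experiment is repeated n times, n ≥ 1. The outcome on any one
repetition is not influenced by the outcome on any other repetition. We say “we
have n independent trials of the random experiment”.
e.g. The salesperson calls on n = 6 prospective clients, and the clients make their
purchasing decisions independently (there is no communication between them!).
3. The probability of success remains constant from trial to trial. We assume that
each client is equally likely to purchase the product. Let Pr(success) = p; thus
Pr(failure) = 1 − p. It is sometimes convenient to let q = 1 − p, so that Pr(failure) = q.
4. Our random variable X is the number of successes we observe in n trials. If the
conditions above are satisfied, then we say that we have a binomial process, and that
the random variable X has a binomial distribution. In the above example, X is the
number of calls that resulted in sales. Because 6 calls were made, X must assume one
of the values 0, 1, . . . , 6, and X is therefore an example of a discrete random variable.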
Binomial processes occur in many contexts. From an industrial or commercial per-
spective, one of the most important binomial processes occurs in the field of quality. The
quality of a product or service, whether it is a tomato, a nail, a personal computer, a
car, an insurance policy or the punctuality of a train, can be classified as “satisfactory”
or “defective”. In particular, the binomial probability distribution provides the basis for
deciding whether or not a consignment of goods meets the desired specifications.
In a binomial process, we have n independent trials, each trial has two
outcomes, success or failure, and Pr[success] = p for all trials. Let the
random variable X be the number of successes in n trials.
Then X has the binomial distribution, and Pr[X = x] is given by
the probability mass function
\[ p(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \dots, n \\ 0 & \text{otherwise} \end{cases} \]
Once we give values to n and p, (n ≥ 1, 0 < p < 1), a particular binomial distribution
is specified. n and p are examples of what we call the parameters of the distribution.
Once the parameters of a distribution have values, a particular distribution is specified.
We have a neat abbreviated notation which saves us writing “the random variable
X is distributed binomially with parameters n and p”. We compress all this information
into the symbols X ∼ B(n, p).
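To see the probability mass function of X ∼ B(n, p) in action, here is a minimal Python sketch. The function name `binomial_pmf` is our own choice for illustration, and the values n = 6 and p = 0.2 are simply those of the salesperson example; only the standard library is assumed.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr[X = x] for X ~ B(n, p); zero outside x = 0, 1, ..., n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The probabilities over x = 0, ..., n should add up to 1 for any valid n and p.
total = sum(binomial_pmf(x, 6, 0.2) for x in range(7))
print(round(total, 10))
```

Changing the two arguments n and p is all that is needed to specify a different member of the binomial family.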
Example 1A: A door-to-door salesperson calls on 6 clients per session. Each client
makes their purchasing decision independently of the others, with probability 0.2 of
purchasing the product. What are the probabilities that 0, 1, 2, 3, 4, 5 or 6 clients
purchase the product?
Clearly, the three conditions for the binomial process are satisfied, and X, the num-
ber of clients who purchase the product, has a binomial distribution with n = 6 and
p = 0.2 : thus X ∼ B(6, 0.2).
Instead of simply using the formula given in the box, let us compute from first
principles the probability of, say, 2 clients purchasing the product, i.e. Pr[X = 2]:
Firstly, 2 clients out of 6 can purchase the product in many different permutations.
Let A1 be the event that the first 2 clients purchase (these are the “successes” that we
count) and that clients 3 to 6 refuse to purchase (i.e. are “failures”). Then, using our
usual conventions, we can write
A1 = S ∩ S ∩ F ∩ F ∩ F ∩ F.
Let the events A2 , A3 represent other permutations of 2 successes and 4 failures, e.g.
A2 = F ∩ S ∩ S ∩ F ∩ F ∩ F
A3 = F ∩ F ∩ S ∩ S ∩ F ∩ F
Secondly, we compute Pr(A1 ). Because the clients act independently of each other,
\begin{align*}
\Pr(A_1) &= \Pr(S \cap S \cap F \cap F \cap F \cap F) \\
&= \Pr(S) \times \Pr(S) \times \Pr(F) \times \Pr(F) \times \Pr(F) \times \Pr(F) \\
&= p^2 (1-p)^4 = 0.2^2 \times 0.8^4.
\end{align*}
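Recall that the probability of the intersection of independent events is the product of
the individual probabilities, so that every arrangement of 2 successes and 4 failures has
the same probability:
\[ \Pr(A_1) = \Pr(A_2) = \cdots = \Pr(A_{15}) = 0.2^2 \times 0.8^4. \]
Thirdly, there are $\binom{6}{2} = 15$ such arrangements, and the events A1, A2, . . . , A15
are mutually exclusive: no client can simultaneously both purchase and refuse to purchase! Thus
\[ \Pr[X = 2] = \Pr(A_1) + \Pr(A_2) + \cdots + \Pr(A_{15}) = 15 \times 0.2^2 \times 0.8^4 = 0.2458. \]
Stop a while and convince yourself that the answer $15 \times 0.2^2 \times 0.8^4$ obtained from first
principles is the same as that obtained by substituting n = 6, p = 0.2 and x = 2 into
the formula for the binomial probability mass function.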
Try computing the remaining probabilities from first principles, and compare them
with the results obtained from the formula. The probabilities are given in the table
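x    p(x) = Pr[X = x]
0    $\binom{6}{0} \times 0.2^0 \times 0.8^6 = 0.2621$
1    $\binom{6}{1} \times 0.2^1 \times 0.8^5 = 0.3932$
2    $\binom{6}{2} \times 0.2^2 \times 0.8^4 = 0.2458$
3    $\binom{6}{3} \times 0.2^3 \times 0.8^3 = 0.0819$
4    $\binom{6}{4} \times 0.2^4 \times 0.8^2 = 0.0154$
5    $\binom{6}{5} \times 0.2^5 \times 0.8^1 = 0.0015$
6    $\binom{6}{6} \times 0.2^6 \times 0.8^0 = 0.0001$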
The probability that all six clients purchase the product is very small (0.0001) but
will occasionally occur (we expect it roughly once in every 10 000 times that a session of
6 calls is made!). The probability that none of the 6 clients purchase is 0.2621, so that
in approximately a quarter of sessions of 6 calls no purchases are made. The probability
that two or more purchases are made is Pr[X ≥ 2] = 0.2458 + 0.0819 + 0.0154 + 0.0015 +
0.0001 = 0.3447, so that in approximately one-third of sessions of 6 calls the salesperson
achieves two or more sales.
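The first-principles argument of Example 1A can also be checked by brute force. The sketch below lists every sequence of successes (S) and failures (F) over 6 calls, multiplies the individual probabilities for each sequence, and groups the results by the number of successes. The variable names are our own, and the approach is only an illustration of the counting argument, not a practical way to compute binomial probabilities for large n.

```python
from itertools import product

p = 0.2  # probability that a single client purchases
n = 6    # calls per session

# Add up the probability of every success/failure sequence, grouped by
# the number of successes it contains.
prob_by_count = [0.0] * (n + 1)
arrangements = [0] * (n + 1)
for outcome in product("SF", repeat=n):
    successes = outcome.count("S")
    prob_by_count[successes] += p**successes * (1 - p) ** (n - successes)
    arrangements[successes] += 1

for x in range(n + 1):
    print(x, arrangements[x], round(prob_by_count[x], 4))

print(round(sum(prob_by_count[2:]), 4))  # Pr[X >= 2]
```

The second column of the output counts the mutually exclusive arrangements for each x; for x = 2 there are 15 of them, which is where the factor 15 in Pr[X = 2] comes from.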
Example 2B: What is the probability of a contractor being awarded only one out of
five contracts? Assume that the probability of being awarded a contract is 0.5.
Let “success” = “awarded a contract”. Pr(success) = p = 1/2, so q = 1 − p = 1/2. We
have n = 5 trials. Let X be the number of successes in 5 trials. Then X ∼ B(5, 1/2), and
\[ \Pr[X = x] = p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{5-x}, \qquad x = 0, 1, \dots, 5. \]
So
\[ \Pr[X = 1] = p(1) = \binom{5}{1} \left(\tfrac{1}{2}\right)^5 = 5/32. \]
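\begin{align*}
\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} &= \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} \\
&= \binom{n}{0} p^0 q^n + \binom{n}{1} p^1 q^{n-1} + \cdots + \binom{n}{x} p^x q^{n-x} + \cdots + \binom{n}{n} p^n q^0 \\
&= (p + q)^n \quad \text{(from the binomial theorem, hence the name “binomial” distribution)} \\
&= 1^n \quad \text{(because } q = 1 - p\text{)} \\
&= 1.
\end{align*}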
[Figure: bar graphs of binomial probability mass functions, including X ∼ B(15, 0.3) and X ∼ B(15, 0.8), each on a horizontal axis from 0 to 15.]
=0 otherwise
(c) (i) $\Pr[X = 2] = p(2) = \binom{12}{2}\, 0.10^2\, 0.90^{10} = 0.2301$
(ii) $\Pr[X = 0] = p(0) = \binom{12}{0}\, 0.10^0\, 0.90^{12} = 0.2824$
(iii) Pr[X ≥ 2] = 1 − Pr[X = 0] − Pr[X = 1] = 1 − 0.2824 − 0.3766 = 0.3410.
We are given a period of time during which events occur at random.
The average rate at which events occur is λ events per time period.
It is critical that the time period referred to in the rate must be the
same as the time period during which the events are counted. Let the
random variable X be the number of events occurring during the time period.
Then X has the Poisson distribution with parameter λ, i.e. X ∼ P(λ),
and has probability mass function
\[ p(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & x = 0, 1, 2, \dots \\ 0 & \text{otherwise} \end{cases} \]
The bar graphs below show the shape of Poisson distribution for two values of λ.
[Figure: bar graphs of the Poisson probability mass function for X ∼ P(3) and X ∼ P(8), each on a horizontal axis from 0 to 15.]
Example 6A: We have a large fleet of delivery trucks. On average we have 12 break-
downs per 5-day working week. Each day we keep two trucks on standby. What is the
probability that on any day
(a) no standby trucks are needed?
(b) the number of standby trucks is inadequate?
Let the random variable X be the number of trucks that break down in a given day.
Because we are dealing with breakdowns, it is reasonable to assume that they occur at
random and that the Poisson distribution is a realistic model.
Because we are interested in breakdowns per day, we need to convert the given weekly
rate into a daily rate. 12 breakdowns per 5 days is equivalent to 12/5 = 2.4 breakdowns
per day. Thus we assume that X has the Poisson distribution with parameter λ = 2.4,
i.e. X ∼ P (2.4). Hence
\[ \Pr[X = x] = p(x) = \frac{e^{-2.4}\, 2.4^x}{x!}, \qquad x = 0, 1, 2, \dots \]
(a) $\Pr(\text{no breakdowns}) = \Pr[X = 0] = p(0) = \dfrac{e^{-2.4}\, 2.4^0}{0!} = 0.0907$
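The rate conversion and both parts of Example 6A can be sketched in Python as follows. The helper `poisson_pmf` and the variable names are our own, and we read "the number of standby trucks is inadequate" as "more than two trucks break down in the day", which is an interpretation of the question rather than something stated in the solution above.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Pr[X = x] for X ~ P(lam)."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam**x / factorial(x)

weekly_rate = 12        # breakdowns per 5-day working week
lam = weekly_rate / 5   # breakdowns per day, to match the counting period

p_none = poisson_pmf(0, lam)                                      # part (a)
p_more_than_two = 1 - sum(poisson_pmf(x, lam) for x in range(3))  # part (b)
print(round(p_none, 4), round(p_more_than_two, 4))
```

Note that the rate is rescaled before the mass function is used: the parameter must refer to the same period (one day) over which the breakdowns are counted.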
Example 7B: Bank tellers make errors in entering figures in their ledgers at the rate
of 0.75 errors per page of entries. What is the probability that in a random sample of 4
pages there will be 2 or more errors?
Because we are dealing with errors, we assume a Poisson distribution. If errors occur
at 0.75 errors per page, then the error rate per 4 pages is 3. So we choose λ = 3.
\[ \Pr[X = x] = \frac{e^{-3}\, 3^x}{x!}, \qquad x = 0, 1, 2, \dots \]
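\[ p(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & x = 0, 1, 2, \dots \\ 0 & \text{otherwise} \end{cases} \]
\[ e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots = \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} \]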
Example 9C: Beercans are randomly tossed alongside the national road, with an
average frequency 3.2 per km.
(a) What is the probability of seeing no beercans over a 5 km stretch?
(b) What is the probability of seeing at least one beercan in 200 m?
(c) Determine the values of x and y in the following statement: “40% of 1 km sections
have x or fewer beercans, while 5% have more than y.”
Discrete probability distributions are usually used to represent events that are qualitative
in nature. The following distributions are commonly used to describe discrete data: the Bernoulli,
Binomial, Poisson, Hypergeometric and Geometric distributions.
denote the probability of success as P(s)= p and the probability of failure as P(f) = 1–p = q
1. We have a random experiment which has a sample space with exactly two outcomes, one of which
we can label “success”, and the other “failure”: i.e. S = {success, failure}. e.g. A door-to-door
salesperson calls on a prospective client: the client either purchases the product (success) or
does not purchase (failure).
2. The random experiment is repeated n times, n ≥ 1. The outcome on any one repetition is not
influenced by the outcome on any other repetition. We say “we have n independent trials of the
random experiment”. e.g. The salesperson calls on n = 6 prospective clients — the clients make
their purchasing decisions independently (there is no communication between them!).
3. The probability of success remains constant from trial to trial. We assume that each client is
equally likely to purchase the product. Let Pr(success) = p; thus Pr(failure) = 1 − p. It is
sometimes convenient to let q = 1 − p, so that Pr(failure) = q.
4. Our random variable X is the number of successes we observe in n trials. If the conditions above
are satisfied, then we say that we have a binomial process, and that the random variable X
has a binomial distribution. In the above example, X is the number of calls that resulted in
sales. Because 6 calls were made, X must assume one of the values 0, 1, . . . , 6 and X is therefore
an example of a discrete random variable. Binomial processes occur in many contexts. From
an industrial or commercial perspective, one of the most important binomial processes occurs in
the field of quality.
We have n independent trials, each trial has two outcomes, success or failure, and Pr(success) = p
and q = 1 − p for all trials. The random variable X is the number of successes in n trials; n ≥ 1
must be an integer, and 0 ≤ p ≤ 1. Then X has the binomial distribution, i.e. X ∼ B(n, p), with
probability mass function
\[ p(x) = \begin{cases} \binom{n}{x} p^x q^{n-x} & x = 0, 1, \dots, n \\ 0 & \text{otherwise} \end{cases} \]
Que20: What is the probability of a contractor being awarded only one out of five contracts? Assume
that the probability of being awarded a contract is 0.5.
Que21: A certain type of pill is packed in bottles of 12 pills each. 10% of the pills are chipped in the
manufacturing process.
(a) Explain why the binomial distribution can provide a reasonable model for the random vari-
able X, the number of chipped pills found in a bottle of 12 pills. What are the appropriate
(b) What is the probability that a bottle of pills contains x chipped pills, i.e. what is Pr[X = x]?
(c) What are the probabilities of
(i) 2 chipped pills? (ii) no chipped pills? (iii) at least 2 chipped pills?
Given a population of size N, of which M are defective, a sample of size n (n ≤ N ) is drawn. Let
the random variable X be the number of defectives in the sample. Then X has the hypergeometric
distribution with parameters N,M and n, X ∼ H(N, M, n) and X has probability mass function
\[ p(x) = \begin{cases} \dbinom{M}{x} \dbinom{N-M}{n-x} \Big/ \dbinom{N}{n} & x = 0, 1, \dots, n \\ 0 & \text{otherwise} \end{cases} \]
Que26: A fisherman caught 10 lobsters, 3 of which were undersized. An inspector of the Sea Fisheries
Branch measured a random sample of 4 lobsters. What is the probability that the sample
contains no undersized lobsters?
Ans26: Here N = 10, M = 3 and n = 4. If X is the number of undersized lobsters in the sample of 4, then
\begin{align*}
P(X = 0) &= \binom{M}{x} \binom{N-M}{n-x} \Big/ \binom{N}{n} \\
&= \binom{3}{0} \binom{10-3}{4-0} \Big/ \binom{10}{4} \\
&= 0.1667
\end{align*}
Conversely, the probability that the inspector finds at least one undersized lobster is 1 − 0.1667 = 0.8333.
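The lobster calculation can be reproduced with a short Python sketch of the hypergeometric probability mass function. The function name `hypergeometric_pmf` and its argument order (N, M, n, following the notation X ∼ H(N, M, n)) are our own choices for illustration.

```python
from math import comb

def hypergeometric_pmf(x: int, N: int, M: int, n: int) -> float:
    """Pr[X = x] for X ~ H(N, M, n): x defectives in a sample of n drawn
    without replacement from N items of which M are defective."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p_no_undersized = hypergeometric_pmf(0, N=10, M=3, n=4)
print(round(p_no_undersized, 4))       # 0.1667
print(round(1 - p_no_undersized, 4))   # 0.8333
```

The same function can be applied to other sampling-without-replacement questions by changing N, M and n.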
Que 27: A team of 15 people is chosen from a class of 65 MBA students to play a social rugby match. The
class contains 25 engineers. What is the probability that the team contains (a) four engineers?
(b) at least four engineers?
Ans 27: (a) Let X be the number of engineers in the sample. Then N = 25 + 40 = 65, M = 25 and
n = 15, so that X ∼ H(65, 25, 15). Thus
\[ P(X = 4) = \binom{25}{4} \binom{40}{11} \Big/ \binom{65}{15} = 0.1410 \]
Que 27: A bowl contains 10 blue and 7 red marbles. Four (4) marbles are drawn at random from the
bowl. Calculate the probability of (a) two (b) at least 3 blue marbles drawn when sampling is
1) with replacement 2) without replacement.
a.) The number of bad cheques presented for daily payment at a bank.
d.) The number of defects per square meter on metal sheets being manufactured.
Que P22: A radioactive substance emits alpha particles at a constant rate of one particle every 2 seconds, in
the conditions stated above for applying the Poisson distribution model. What is the probability
of detecting at most 1 particle in a 10-second interval?
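Ans 22: Taking the second as the time unit, the emission rate is λ = 0.5 particles per second. Over an
interval of t = 10 seconds the mean number of particles is µ = λt = 5, and
\begin{align*}
P(X \le 1) &= \sum_{x=0}^{1} \frac{\mu^x e^{-\mu}}{x!} \\
&= \frac{5^0 e^{-5}}{0!} + \frac{5^1 e^{-5}}{1!} \\
&= e^{-5} + 5e^{-5} \\
&= 0.04.
\end{align*}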
Que P23: A secretary claims an average mistake rate of 1 per page. A sample page is selected at random
and 5 mistakes found. What is the probability of her making 5 or more mistakes if her claim of
1 mistake per page on average is correct?
Ans 23: In this case λ = 1 is claimed, and we want the probability that X, the number of mistakes on
a page, is 5 or more. If the claim is true,
P(X ≥ 5) = 1 − P(X ≤ 4)
= 1 − 0.9963 = 0.0037.
The above calculation shows that if the claim of 1 mistake per page on average is true, there is
only a 37 in 10 000 chance of getting 5 or more mistakes per page.
Que P24: At a particular restaurant 4 plates are broken, on average, each week. What is the probability that
a) 2 plates are broken next week? Ans: 0.1465
b) at most 4 plates are broken next week? Ans: 0.6288
c) more than 3 plates are broken next week? Ans: 0.5665
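The three answers to Que P24 can be checked with a cumulative Poisson sum. This is a minimal sketch; the helper name `poisson_cdf` is our own, and the rate λ = 4 per week is taken directly from the question.

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """Pr[X <= k] for X ~ P(lam), summing the mass function from 0 to k."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

lam = 4  # plates broken per week, on average
p_exactly_2 = poisson_cdf(2, lam) - poisson_cdf(1, lam)
p_at_most_4 = poisson_cdf(4, lam)
p_more_than_3 = 1 - poisson_cdf(3, lam)
print(round(p_exactly_2, 4), round(p_at_most_4, 4), round(p_more_than_3, 4))
# 0.1465 0.6288 0.5665
```

"More than 3" is handled through the complement 1 − Pr[X ≤ 3], which avoids summing an infinite tail.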
Que P25: A computer that operates continuously breaks down at random on average 1.5 times per week.
• Solution: This tells us λ = 1.5 per week, and that the random variable X, the time between
breakdowns, has density function
\[ f(x) = \begin{cases} 1.5 e^{-1.5x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]
What is the probability of no breakdowns for 2 weeks?
• Solution: This implies that X must be greater than 2 and that we want Pr[X > 2]. Because
the exponential distribution is continuous, we evaluate this probability by integration:
\begin{align*}
\Pr[X > 2] &= \int_2^{\infty} 1.5 e^{-1.5x}\, dx = \left[ -e^{-1.5x} \right]_2^{\infty} \\
&= -e^{-\infty} + e^{-3} = 0 + e^{-3} \\
&= 0.0498.
\end{align*}
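The integral above has the closed form Pr[X > t] = e^{−λt}, which is easy to evaluate directly. The sketch below also computes the same event from the Poisson side (no breakdowns in 2 weeks, with the count distributed as P(λt)); the function names are ours, and the comparison is offered as an illustration of how the two distributions describe the same process.

```python
from math import exp

def exponential_tail(t: float, lam: float) -> float:
    """Pr[X > t] for an exponential waiting time with rate lam."""
    return exp(-lam * t) if t >= 0 else 1.0

def poisson_zero(t: float, lam: float) -> float:
    """Pr[no events in an interval of length t] when the count is P(lam * t)."""
    mean = lam * t
    return exp(-mean) * mean**0 / 1  # pmf at x = 0, where 0! = 1

lam = 1.5  # breakdowns per week
print(round(exponential_tail(2, lam), 4))   # 0.0498
print(round(poisson_zero(2, lam), 4))       # 0.0498
```

Both lines give the same value because "the waiting time to the next breakdown exceeds t" and "no breakdowns occur in an interval of length t" are the same event.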
a. Values c units larger than the mean are just as probable as values c units smaller than
the mean (“symmetry”). Many continuous random variables have probability distributions
that look like this.
b. A normal probability density with mean 0 and variance 1. The probability density function
(pdf) given above is the pdf of the normal probability distribution (sometimes called the
Gaussian probability distribution). The normal distribution is the most important distribution
in statistics.
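\begin{align*}
\int_0^{\infty} f(x)\, dx &= \lambda \int_0^{\infty} e^{-\lambda x}\, dx = \left[ -e^{-\lambda x} \right]_0^{\infty} \\
&= -e^{-\infty} + e^{0} = 0 + 1 \\
&= 1,
\end{align*}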
as required for the area under the curve of a probability density function.
Example 12C: Let the random variable X be the time in hours for which a light bulb
burns from the time it is put into service. The probability density function of X is given by
\[ f(x) = \begin{cases} \dfrac{1}{1000} e^{-x/1000} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]
(a) What is the probability that the bulb burns for between 100 and 1000 hours?
(b) What is the probability that the bulb burns for more than 1000 hours?
(c) What is the probability that the bulb burns for a further 1000 hours, given that
it has already burned for 500? (Use conditional probabilities!)
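One way to approach Example 12C numerically is through the survival function Pr[X > t] = e^{−t/1000}. The sketch below is only an illustration: the names `RATE` and `survival` are our own, and part (c) is computed as the ratio of two survival probabilities, as the hint about conditional probabilities suggests.

```python
from math import exp

RATE = 1 / 1000  # per hour, from the density in Example 12C

def survival(t: float) -> float:
    """Pr[X > t] for the exponential lifetime with the rate above."""
    return exp(-RATE * t)

p_between = survival(100) - survival(1000)       # Pr[100 < X < 1000]
p_beyond_1000 = survival(1000)                   # Pr[X > 1000]
# Conditional probability: Pr[X > 1500 | X > 500] = Pr[X > 1500] / Pr[X > 500]
p_further_1000 = survival(1500) / survival(500)
print(round(p_between, 4), round(p_beyond_1000, 4), round(p_further_1000, 4))
```

The last two values coincide: for an exponential lifetime, the chance of lasting a further 1000 hours does not depend on how long the bulb has already burned.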
Example 13C: Events occur according to a Poisson process with “intensity” λ (i.e. at
rate λ per unit of time).
(a) Use the Poisson distribution to determine the probability of no events in t units of
(b) Now use the exponential distribution to determine the probability that the time
between events is greater than t.
(c) Compare the answers to (a) and (b) and explain these results.
Example 14C: Flaws occur in telephone cable at the average rate of 4.4 flaws per
km of cable. Calculate the following probabilities. (Make use of binomial, Poisson and
exponential distributions.)
(a) What is the probability of 1 flaw in 100 m of cable?
(b) What is the probability of more than 3 flaws in 250 m of cable?
(c) What is the probability that the distance between flaws exceeds 500m?
(d) In ten 200 m lengths of cable, what is the probability that 8 or more are free of
There are two parameters, µ (“mu”, the Greek letter m for Mean) and σ (“sigma”, the
Greek letter s for Standard deviation).
d. The normal distribution is not the only distribution whose probability density function (pdf)
looks bell-shaped, but it is the most important one, and many real-world random variables follow
the normal distribution at least approximately. The normal distribution, like the binomial and
Poisson, is an example of a parametric probability distribution:
it is completely described by a small number of parameters.
If X has the normal distribution with parameters µ and σ, we abbreviate this to
$X \sim N(\mu, \sigma^2)$, reading this as “the random variable X has the normal distribution with
parameters µ and $\sigma^2$”. When we use this notation, our convention is to write $\sigma^2$ for
the second parameter, not plain σ. The parameter $\sigma^2$ is known as the variance of the
distribution. As in Chapter 1, the variance is the square of the standard deviation.
Unfortunately the normal probability density function has no closed-form antiderivative, so
probabilities cannot be found by integrating it analytically. However (and this makes life very
easy), the integration can be done numerically by computer, and we are supplied with a table of
probabilities for the normal distribution.
It should come as a surprise to you that a single table is all we need. After all,
there are infinitely many combinations of µ and σ, and it seems that we ought to have
a massive book of normal tables. We are luckier than we deserve to be, and there is
a connecting link between all normal distributions which makes it possible to get away
with a single table! We will learn how to use this amazing table by means of an example.
[Figure: probability density of the standard normal distribution, Z ∼ N(0, 1).]
254 is one standard deviation (i.e. 3 units) above 251, the mean. Thus the area between
251 and 254 in $N(251, 3^2)$ is the same as that between 0 and 1 in $N(0, 1)$.
Returning to part (a) of our margarine example, we need the area between 251 and
253 of $N(251, 3^2)$. 253 is two-thirds of a standard deviation above the mean of 251,
because (253 − 251)/3 = 2/3. Thus Pr[251 < X < 253] = Pr[0 < Z < 2/3], as depicted below.
[Figure: probability density of the standard normal distribution, Z ∼ N(0, 1).]
The corresponding probabilities for two, three and four standard deviations are:
[Figure: probability density of the standard normal distribution, Z ∼ N(0, 1).]
In our margarine example, we use z = (x − µ)/σ for x = 251 and x = 253 to get
\[ \Pr[251 < X < 253] = \Pr\left[ \frac{251 - 251}{3} < Z < \frac{253 - 251}{3} \right] = \Pr[0 < Z < 0.67] \]
From the table for the standard normal distribution (Table 1) we read off this probability
as 0.2486. Thus
Pr[251 < X < 253] = 0.2486,
almost a quarter of margarine tubs contain between 251 g and 253 g of margarine.
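The standardisation step can be mirrored in a few lines of Python. This sketch computes the standard normal distribution function from the error function `math.erf` instead of a printed table; the helper names are our own. Because the table is entered at the rounded value z = 0.67 while the code can use z = 2/3 exactly, the two answers agree only to about two decimal places.

```python
from math import erf, sqrt

def standard_normal_cdf(z: float) -> float:
    """Pr[Z <= z] for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Pr[a < X < b] for X ~ N(mu, sigma^2), by standardising both limits."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

print(round(normal_between(251, 253, mu=251, sigma=3), 4))
print(round(standard_normal_cdf(0.67) - 0.5, 4))  # table lookup at z = 0.67
```

Any probability for any normal distribution reduces in this way to values of the single function Pr[Z ≤ z], which is why one table suffices.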
Part (b) of our question asked for the probability that a tub of margarine was
underweight, i.e. the probability that X < 250. The area between −∞ and 250 in
$N(251, 3^2)$ is the same as the area between −∞ and (250 − 251)/3 = −1/3 in $N(0, 1)$:
\[ \Pr[X < 250] = \Pr\left[ Z < \frac{250 - 251}{3} \right] = \Pr[Z < -1/3]. \]
[Figure: probability density of the standard normal distribution, Z ∼ N(0, 1).]
Pr[X < 250] = Pr[Z < (250 − 251)/3] = Pr[Z < −1/3] = Pr[Z > 1/3]
= 0.5 − Pr[0 < Z < 1/3] = 0.5 − 0.1293 = 0.3707.
The value 0.1293 is looked up in Table 1. Thus 37% of the tubs will contain less margarine
than stated. Notice that because the normal distribution is symmetric we only need
tables for “half” of the distribution.
Example 17B: A t-shirt manufacturer knows that the chest measurements of his
customers are normally distributed with mean 92 cm and standard deviation 5 cm. He
makes his t-shirts in four sizes: S (to fit 80–87 cm), M (to fit 87–94), L (to
fit 94–101) and XL (to fit 101–108). What proportion of customers fit into each size
[Figure: probability density of X ∼ N(92, 5^2) with the regions for sizes S, M and L marked; horizontal axis labelled 80, 87, 94, 101, 108.]
We need to find the z-values for each of the boundary points, by using the formula
z = (x − µ)/σ.
Then, from our normal tables, we find the area between each of these points and the
mean. This gives
x z = (x − 92)/5 Area between x and µ
80 −2.4 0.4918
87 −1.0 0.3413
94 0.4 0.1554
101 1.8 0.4641
108 3.2 0.4993
The proportions for each size are then found by subtraction (or addition in the case
of size M), as follows:
Size Proportion
S 0.4918 − 0.3413 = 0.1505 (15.05%)
M 0.3413 + 0.1554 = 0.4967 (49.67%)
L 0.4641 − 0.1554 = 0.3087 (30.87%)
XL 0.4993 − 0.4641 = 0.0352 ( 3.52%)
Check for yourself that 0.89% of customers don’t fit into any size t-shirt.
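The table-based calculation for Example 17B can be cross-checked in Python. The sketch below uses the error function in place of the printed normal table; `normal_cdf`, `size_ranges` and the other names are our own, and small differences in the fourth decimal place compared with the table are to be expected.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr[X <= x] for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

MU, SIGMA = 92, 5  # chest measurement mean and standard deviation, in cm
size_ranges = {"S": (80, 87), "M": (87, 94), "L": (94, 101), "XL": (101, 108)}

proportions = {
    size: normal_cdf(hi, MU, SIGMA) - normal_cdf(lo, MU, SIGMA)
    for size, (lo, hi) in size_ranges.items()
}
for size, share in proportions.items():
    print(size, round(share, 4))
print("no size fits:", round(1 - sum(proportions.values()), 4))
```

Working with the distribution function directly means each proportion is a single subtraction, so there is no need to treat size M (which straddles the mean) as a special case.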
Example 18C: The mean inside diameter of washers produced by a machine is 0.403
cm and the standard deviation is 0.005 cm.
Washers with an internal diameter less than 0.397 cm or greater than 0.406 cm are
considered defective. What percentage of the washers produced are defective, assuming
the diameters are normally distributed?
Example 19C: In a large group of men 4% are under 160 cm tall and 52% are between
160 cm and 175 cm tall. Assuming that heights of men are normally distributed, what
are the mean and standard deviation of the distribution?
$X_i \sim N(\mu_i, \sigma_i^2)$
and if it is independent of the time taken for other tasks, then the distribution of the
random variable $Y = \sum_{i=1}^{n} X_i$ is
$Y \sim N(\mu, \sigma^2)$
where $\mu = \sum_{i=1}^{n} \mu_i$, and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$.
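Sometimes we need to consider the difference of two independent normally distributed
random variables. Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are
independent. Then the difference $Z = X_1 - X_2$ is also normally distributed, with
\[ Z \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2). \]
The mean of the random variable Z is found by subtraction, but the variance is still
found by addition.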
Example 21B: You have 4 chores to perform before getting to Statistics lectures by
08h00. The time (in minutes) to perform each chore is normally distributed with mean
and standard deviation as given below:
(a) The total time taken to get to university is a normally distributed random variable
X with mean
µ = 5 + 4 + 10 + 15 = 34 minutes
and variance
$\sigma^2 = 0.5^2 + 1.0^2 + 3.5^2 + 5.0^2 = 38.5$
and therefore standard deviation σ = 6.205.
The probability that you take more than the allowed 40 minutes is
\[ \Pr[X > 40] = \Pr\left[ Z > \frac{40 - 34}{6.205} \right] = \Pr[Z > 0.97] = 0.1660 \]
You’ll be late about one day in 31, on average, but (by part (b)) more than
three minutes late only once in every 100 days.
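Part (a) of Example 21B can be reproduced with the short sketch below. The list `chores` pairs each mean with a standard deviation in the order in which they appear in the two sums of part (a); that pairing is an assumption on our part, although the totals do not depend on it. The function name `upper_tail` is also our own.

```python
from math import erf, sqrt

# (mean, standard deviation) in minutes, read off the sums in part (a)
chores = [(5, 0.5), (4, 1.0), (10, 3.5), (15, 5.0)]

total_mean = sum(mean for mean, _ in chores)
total_variance = sum(sd**2 for _, sd in chores)   # variances add, not sds
total_sd = sqrt(total_variance)

def upper_tail(x: float, mu: float, sigma: float) -> float:
    """Pr[X > x] for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 - erf((x - mu) / (sigma * sqrt(2))))

print(total_mean, total_variance, round(total_sd, 3))
print(round(upper_tail(40, total_mean, total_sd), 4))
```

The tail probability differs slightly from 0.1660 in the third decimal place because the code does not round z to 0.97 before evaluating it.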
Example 22C: Plastic caps seal the ends of the tube into which your degree certificate
is placed when you graduate. Suppose the tubes have a mean diameter of 24.0mm and
a standard deviation of 0.15mm, and that the plastic caps have a mean diameter of
23.8 mm and a standard deviation of 0.11mm. If the diameter of the cap is 0.10 mm
or more larger than that of the tube, the cap cannot be squashed into the tube, and if
the diameter of the cap is 0.45 mm or more smaller than that of the tube, it will not
seal the tube, but will just keep falling out. If a tube and a plastic cap are selected at
random, what are the probabilities of (a) the cap being too large for the tube, and (b)
the cap falling out of the tube?