
PROBABILITY I

Rasheed A. ADEYEMI (PhD)


B.Tech(FUTA), MSc(UIlorin), MSc(UCT), PhD(UKZN)

Department of Statistics
Federal University of Technology, Minna

November 26, 2022

Abstract

These notes were prepared for the module on Probability Theory II which forms part of the STA127S course
at FUT Minna. R tutorials as well as theoretical practicals are provided.

CHAPTER 1: Set, subset, intersection, union, complement, empty and uni-


versal sets, mutually exclusive sets; pairwise mutually exclusive and exhaustive
sets.

CHAPTER 2: Random experiments, sample space, events, elementary events, cer-


tain and impossible events, mutually exclusive events, probability, relative frequency,
Kolmogorov’s axioms, permutations, combinations, conditional probability, Bayes’ the-
orem, independent events, and applications

CHAPTER 3: Discrete Random variables: Discrete Probability Distributions,


Mean, variance, standard deviation of discrete random variable; Probability Distri-
butions: Binomial, Hypergeometric and Poisson, and their applications

CHAPTER 4: Continuous Random variables: The Normal Distribution, Mean,


variance, standard deviation of normal random variable, Normal Probability dis-
tributions and Applications

Recommended textbooks:
• Les Underhill & Dave Bradfield. INTROSTAT, Lecture Notes on Statistics,
University of Cape Town
• Bayo Lawal. Applied Statistical Methods in Agriculture, Health and Life
Sciences, Publisher: Springer


1 Set Theory


KEYWORDS: Set, subset, intersection, union, complement,


empty and universal sets, mutually exclusive sets; pairwise mutually
exclusive and exhaustive sets.

Why do we have to do set theory? ...

Simply because one of Murphy’s Laws states that before you can do anything, you
have to do something else. Before we can do “statistics” we have to do “probability
theory”, and for that we need some “set theory”. So here we go.

Definition of sets . . .
We define a set A to be a collection of distinguishable objects or entities. The set
A is determined when we can either (a) list the objects that belong to A or (b) give a
rule by which we can decide whether or not a given object belongs to A.

Example 1A: (a) If we say, “The letters e, f, g belong to the set A”, then we write

A = {e, f, g}

(b) If we say, “The set B consists of real numbers between 1 and 10 inclusive”, then
we write
B = {x | 1 ≤ x ≤ 10}.
We read this by saying: “The set B consists of all real numbers x such that x is larger
than or equal to 1 but is less than or equal to 10.”
Because the object e belongs to the set A we write

e∈A

and we say: “e is an element of A”. Because e does not belong to B, we write

e ∉ B

and we say: “e is not an element of B”.


Note, firstly, that if C = {1, 3, 5, a} and D = {a, 1, 5, 3} then C = D. The order
in which we list the elements of a set is irrelevant. Secondly, if E = {a, b, c, a} and
F = {a, b, c} then E = F . The set E contains only the distinguishable elements a, b
and c.


Example 2B:
(a) Express in set theory notation: the set U of numbers which have square roots
between 1 and 4.
(b) Write out in full all the elements of the set Z = {(x, y) | x ∈ {1, 2, 3, 4}, y = x2 }.
(a) Because the square roots of numbers between 1 and 16 belong to U , we write
U = {x | 1 ≤ x ≤ 16}.
(b) Z = {(1, 1), (2, 4), (3, 9), (4, 16)}.

Example 3C: Which of the following statements are correct and which are wrong?
(a) {3, 3, 3, 3} = {3}
(b) 6 ∈ {5, 6, 7}
(c) C = {−1, 0, 1}
(d) F = {x | 4 < f < 5}
(e) {1, 2, 7} = {7, 2, 1, 7}
(f) If H = {2, 4, 6, 8}, J = {1, 2, 3, 4} and K = {2x | x ∈ H}, then K = J
(g) {1} ∈ {1, 2, 3}.

Subsets . . .
Suppose we have two sets, G and H, and that every element of G also belongs to
H. Then we say that “G is a subset of H” and we write G ⊂ H. We can also write
H ⊃ G and say “H contains G”. If some element of G does not also belong to H, we
write G ⊄ H and say “G is not a subset of H.”

Example 4A: Let G = {1, 3, 5}, H = {1, 3, 5, 9} and J = {1, 2, 3, 4, 5}. Then
clearly G ⊂ H, H ⊄ J, J ⊃ G.
Note that the notation ⊂, ⊃ for sets is analogous to the notation ≤, ≥ for ordinary
numbers (rather than the notation <, >). The “round end” of the subset notation tells
you which of the sets is “smaller” (in the same way as the “pointed end” shows which
of two numbers is smaller).
Our definition of subset has a curious (at first sight) but logical consequence. Because
every element in G belongs to G, we can write G ⊂ G. For numbers, we can write 2 ≤ 2.
If H ⊂ G and G ⊂ H, then, obviously, H = G. For numbers, x ≤ 2 and x ≥ 2
together imply that x = 2.

Example 5C: Let V = {v | 0 < v < 5}, W = {0, 5}, X = {1, 2, 3, 4}, Y = {2, 4},
Z = {x | 1 ≤ x ≤ 4}. Which of the following statements are true, and which are false:
(a) V = W (e) X = Z
(b) Y ⊂ X (f) Z ⊄ V
(c) W ⊃ V (g) Y ⊂ W
(d) Z ⊃ X (h) Y ∈ Z

Intersections . . .
Suppose that L = {a, b, c} and M = {b, c, d}. Then L ⊄ M and M ⊄ L. But if
we consider the set N = {b, c}, then we see that N ⊂ L and N ⊂ M , and that no other
set of which N is a subset has this property. This leads us to the idea of intersection.
The intersection of any two sets is the set that contains precisely those elements
which belong to both sets. For the sets, L, M and N above we write N = L ∩ M and
read this “N equals L intersection M ”. The intersection of two sets M and N can be
thought of as the set containing those elements which belong to both M and to N .

Example 6A: If P = {x | 0 ≤ x ≤ 10} and Q = {x | 5 < x < 20}, find P ∩ Q. Is


5 ∈ P ∩ Q? Is 10 ∈ P ∩ Q?
Paying careful attention to the endpoints,

P ∩ Q = {x | 5 < x ≤ 10}.

No, 5 ∉ P ∩ Q, but, yes, 10 ∈ P ∩ Q.

The empty set, mutually exclusive sets . . .


What happens if L = {a, b, c} and R = {d, e, f }? If we want L ∩ R to be a set,
then we must introduce a new concept, the empty set, the set that has no members.
This is a sensible concept: consider the set of English-speaking fish, or consider the set
of real numbers whose square is negative. We reserve the symbol ∅ to denote the empty
set. We use this symbol for no other purpose. We write L ∩ R = ∅ and read this as
“the intersection of sets L and R is the empty set”.
Pairs of sets whose intersection is the empty set are said to be mutually exclusive
sets (or disjoint sets). Thus L and R are mutually exclusive.

The universal set, the sample space . . .


Another reserved symbol is the letter S. It is used for the set containing all objects
under consideration. Thus if, in a particular problem, the only objects of interest are
the colours of a traffic light, then S = {red, amber, green}. The set S is known to
mathematicians as the universal set. In statistical jargon the set S is called the sample
space.

Unions
The concept union contrasts with the concept intersection. The union of two sets
A and B is the set that contains the elements that belong to A or to B. Here we use
the word “or” in an inclusive sense — we do not exclude from the union those elements
that belong to both A and B.
If A = {1, 2, 3} and B = {2, 3, 4, 5} then the union of A and B is the set
C = {1, 2, 3, 4, 5}. We write
C =A∪B
and say “C equals A union B”.
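
A small illustration in R (the language used for the tutorials in this module): finite
sets can be stored as vectors, and union(), intersect() and setdiff() compute the
operations defined above. The particular sets below are chosen only as an example.

    # Finite sets stored as vectors
    A <- c(1, 2, 3)
    B <- c(2, 3, 4, 5)
    S <- 1:6                  # a small sample space, assumed for the illustration
    union(A, B)               # 1 2 3 4 5
    intersect(A, B)           # 2 3
    setdiff(S, A)             # complement of A relative to S: 4 5 6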

Example 7A: If P = {x | 0 ≤ x ≤ 10} and Q = {x | 5 < x < 20}, find P ∪ Q.


The union includes all the elements of both set P and set Q:
P ∪ Q = {x | 0 ≤ x < 20}.

Complements . . .
Our final concept from set theory is that of the complement of a set. Given the
sample space S, we define the complement of a set A to be the set of elements of S
which are not in A. The complement of A is written Ā, and is always relative to the
sample space S.
If S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6} then Ā = {2, 4, 6}.
We write
Ā = B
and say “the complement of A equals B” or, more briefly, “A complement equals B”.

Example 8A: If S = {x | 0 ≤ x ≤ 1} and D = {x | 0 < x < 1}, find D̄.


Because the set D excludes the endpoints of the interval from zero to one, D̄ = {0, 1}.

Example 9C: If the sample space S contains the letters of the alphabet, i.e. S =
{a, b, c, . . . , x, y, z}, the set A contains the vowels, the set B contains the consonants,
the set C contains the first 10 letters of the alphabet C = {a, b, c, . . . , h, i, j} pick out
the true and false statements in the following list:
(a) A ∪ B = S (g) S ∩ B = B
(b) A ∩ B = ∅ (h) A ∪ Ā = S
(c) S ⊂ S (i) C̄ ∩ A = {o, u}
(d) A ∩ C = {a, e, i} (j) (A ∪ C)ᶜ = Ā ∩ C̄
(e) Ā ⊂ B (k) A ∩ C ⊂ C
(f) Ā = B (l) S̄ = ∅

Venn diagrams . . .
A pictorial representation of sets that helps us solve many problems in set theory is
known as the Venn diagram. In the diagrams below think of all the “points” in the
rectangle as being the sample space S, and all the points inside the circles for A and
B as the sets A and B respectively. The shaded area in the diagram on the left then
represents A ∩ B, the set of points belonging to A and B. Similarly the diagram on the
right is a visual representation of A ∪ B, the set of points belonging to A or B. Recall
once again the special, inclusive meaning we give to “or”. When drawing Venn diagrams
it is helpful to associate “intersection” with “and” and “union” with “or”.
[Venn diagrams: left, the shaded region is A ∩ B; right, the shaded region is A ∪ B.]

The diagram on the left below shows how to depict two mutually exclusive sets in a
Venn diagram.
Venn diagrams are usually only useful for up to three sets: the area shaded in the
diagram on the right is A ∩ B ∩ C.

[Venn diagrams: left, two mutually exclusive sets A and B, so A ∩ B = ∅; right, the shaded region is A ∩ B ∩ C.]

Pairwise mutually exclusive, exhaustive sets . . .


If a family of sets A1 , A2 , . . . , An are such that any pair of them is mutually
exclusive, i.e. Ai ∩ Aj = ∅ if i ≠ j, and if A1 ∪ A2 ∪ . . . ∪ An = S, i.e. the union of the
sets “exhausts” the sample space, then the family of sets A1 , A2 , . . . , An are said to be
pairwise mutually exclusive and exhaustive. If we represent such a family of sets
on a Venn diagram, the sets must cover the sample space, and they must be disjoint.
Here are two examples:

[Two Venn diagrams: in each, the sample space S is divided into pairwise disjoint sets
A1 , A2 , . . . , An that together cover S.]
Using Venn diagrams . . .


Example 10B: Draw Venn diagrams to show that (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
In the left-hand Venn diagram, the grey-shaded shaded area is A ∪ B, and the
vertically shaded area is C. Their intersection (A ∪ B) ∩ C is shaded both grey and
vertically. In the right-hand Venn diagram, the two shaded areas are A ∩ C and B ∩ C.
Their union is the same as the area shaded both grey and vertically in the left-hand
diagram.

[Venn diagrams: left, the region shaded both grey and vertically is (A ∪ B) ∩ C; right, the shaded region is (A ∩ C) ∪ (B ∩ C).]

Example 11C: Draw Venn diagrams to show that the following are true:
(a) A ∪ B = A ∪ (B ∩ A)
(b) (A ∩ C) ∪ (B ∩ A) = (A ∪ C) ∩ (A ∩ B)
(c) The sets A ∩ B, A ∩ C, B ∩ A and (A ∪ C) form a family of pairwise mutually
exclusive and exhaustive sets.

Example 12C: Draw Venn diagrams to determine which of the following statements
are true.
(a) (A ∩ B) = A ∩ B
(b) (A ∩ B) ∪ (A ∩ B) ⊂ A ∪ B
(c) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(d) (C ∩ A) ∪ (C ∩ B) = (C ∪ (A ∩ B))
(e) [(A ∪ B) ∩ C] ∪ [(A ∪ C) ∩ B] = [(A ∪ B ∪ C) ∩ (A ∪ B) ∩ C] ∪ (A ∩ B)
(f) If the sets A1 , A2 , A3 , and A4 are pairwise mutually exclusive and exhaustive, and
B is an arbitrary set, then

B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ (A3 ∩ B) ∪ (A4 ∩ B).

Solutions to examples . . .

3C (a), (b), (c) and (e) are correct; (d) should read either F = {x | 4 < x < 5} or
F = {f | 4 < f < 5}. For (f), check that the following statement is correct: if H
and J are as given, and if K = {2x | x ∈ J} then K = H. For (g), note that we
never use the ∈-notation with a set on the left hand side.

5C Only (b) and (d) are true.

9C All are true.

11C All are true.

12C (b) (c) (e) and (f) are true. For (a), check that (A ∩ B) = A ∪ B is true.

Easy exercises . . .
2.1 Let S be {1, 2, 3, 4, 5, 6}, the set of all possible outcomes when a die is thrown
and the number of dots on the uppermost face recorded. Describe in words the
following sets:
(a) {6} (d) {2, 4, 6}
(b) {1, 2, 3, 4} (e) {5, 6}
(c) {1, 3, 5} (f) {6}
∗ 2.2 If S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, and A = {0, 1, 2}, B = {3, 4, 5, 6, 7},
C = {7, 8}, and D = {2, 4, 6, 8}, which of the following statements are true?
(a) A and B are mutually exclusive
(b) B̄ = {0, 1, 2, 8, 9}
(c) A ∪ B ∪ C ∪ D = {0, 1, 2, 3, 4, 5, 6, 7, 8}
(d) D ⊂ (B ∪ C)
(e) Ā ∩ B̄ ∩ C̄ ∩ D̄ = {9}
(f) A ∪ (B ∩ D) = (A ∩ B) ∪ (A ∩ D).
2.3 Let S denote the set of all companies listed on the Johannesburg Stock Exchange.
Let A = {x | x is in the gold mining sector},
let B = {x | x has annual turnover exceeding R10 million},
let C = {x | x has financial year ending in June},
let D = {x | the share price of x is higher now than six months ago}.
Describe in words the following sets:
(a) A ∪ B, (e) Ā,
(b) A ∩ D, (f) C̄ ∪ D,
(c) A ∩ C ∩ D, (g) (B ∩ C)ᶜ,
(d) B ∩ (C ∪ A), (h) (B ∩ A)ᶜ ∪ (C ∩ D).
∗ 2.4 If A, B and C are subsets of a universal set S, draw Venn diagrams to determine
which of the following statements are true.
(a) A ∪ Ā = S
(b) A ∩ Ā = ∅
(c) (A ∪ B)ᶜ = Ā ∩ B̄
(d) (A ∩ B)ᶜ = Ā ∪ B̄
(e) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(f) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(g) A ∪ B ∪ C = S
(h) A ∩ B ∩ C = ∅
(i) A ∪ B ⊃ A ∩ B
(j) A ∩ (B ∪ C) ⊂ A ∪ (B ∩ C)
2.5 If S = {1, 2, 3}, list all the subsets of S.
∗ 2.6 Draw a series of Venn diagrams representing three sets, and shade in the following
areas.
(a) A∩B∩C
(b) (B ∩ A) ∪ (A ∩ C)
(c) A∪B∪C
(d) (A ∪ B ∪ C) ∩ (B ∩ C).

More difficult exercises . . .


∗ 2.7 Let B1 , B2 , . . . , Bn be n disjoint subsets of S such that

∪_{i=1}^{n} Bi = S and Bi ∩ Bj = ∅ for i ≠ j.

Let A be any other subset of S.


Use a Venn diagram to show that

A = ∪_{i=1}^{n} (A ∩ Bi ).

[Notation: ∪_{i=1}^{n} Bi means B1 ∪ B2 ∪ . . . ∪ Bn .]

2.8 Show that if the set S has n elements, then S has 2ⁿ subsets. [Hint: Use the
binomial theorem.]

2.9 Let A and B be two events defined on a sample space S. Depict the following
events in Venn diagrams:
(a) C = (A ∩ B) ∪ (A ∩ B)
(b) D = (A ∪ B) ∪ (A ∩ B)
(c) What can you say about events C and D?

Solutions to exercises . . .

2.1 (a) The number six is obtained.


(b) A number less than or equal to four is obtained.
(c) An odd number is obtained.
(d) An even number is obtained.
(e) A number greater than or equal to 5 is obtained.
(f) A number other than 6 is obtained.

2.2 All are true except (d) and (f).

2.3 (a) Set of companies either in the gold mining sector or with turnovers exceeding
R10 million.
(b) Set of gold mining companies whose share price is higher now than six months
ago.
(c) Set of gold mining companies with a financial year ending in June whose share
price is higher now than six months ago.
(d) Set of all companies which have an annual turnover exceeding R10 million
and which are either gold mining companies or companies with financial years
ending in June (or both).
(e) Set of companies not in the gold mining sector.
(f) Set of companies which either do not have a financial year ending in June or
have a share price which is higher now than six months ago.
(g) Set of companies which either do not have an annual turnover exceeding R10
million or do not have financial year ending in June.

(h) Set of companies which either do not have an annual turnover exceeding R10
million or are not in the gold mining sector or both have a financial year
ending in June and have a share price which is higher now than six months
ago.
(Notice how difficult it is to express unambiguously in words the meaning of a few
mathematical symbols.)

2.4 All are true, except (g) and (h).

2.5 ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}.

2.9 (c) C and D are mutually exclusive.



2 Probability Theory


KEYWORDS: Random experiments, sample space, events,


elementary events, certain and impossible events, mutually ex-
clusive events, probability, relative frequency, Kolmogorov’s
axioms, permutations, combinations, conditional probability,
Bayes’ theorem, independent events.

New wine in old wineskins . . .


In the mathematical sciences, in contrast to most other disciplines, we prefer not
to coin new words for new concepts. We rather prefer to give new meanings to old
words. In this chapter we ask you to put aside your intuitive ideas of what constitutes
an “experiment” or an “event” and replace them with the new meanings statisticians
have given them.

Random experiments, sample spaces, trials . . .


To statisticians, a random experiment is a procedure whose outcome in a par-
ticular performance or trial cannot be predetermined. Although we cannot foretell what
the outcome of any single repetition of the experiment will be, we must be able to list
the set of all possible outcomes of the experiment. In general, random experiments must
be capable, in theory at least, of indefinite repetition. It must also be possible to observe
the outcome of each repetition of the experiment. The set of all possible outcomes of a
random experiment is called the sample space of the random experiment. We usually
use the letter S to denote the sample space. Each repetition of the procedure for the
random experiment is called a trial, and gives rise to one and only one of the possible
outcomes.

Example 1A: The following are examples of random experiments and their sample
spaces.
(a) We toss a coin. We can list the set of possible outcomes: S = {heads, tails}. We
can repeat the experiment endlessly, and we can observe the result of every trial.
(b) A phone number is chosen at random. The number is dialled, and the person who
answers is asked whether he/she is currently watching television. If the telephone is
unanswered after 45 seconds, the outcome, “no reply”, is recorded. The set of possi-
ble outcomes, the sample space, is S = {yes, no, won’t say, number engaged, no reply}.

Example 2B: Three further random experiments and their sample spaces:

(a) Squash is played to 9 points, with “deuce” at 8-all, in which case the player who
reached 8 first decides whether to play to 9 points or to 10 points. Thus the sample
space is S = {9-0, 9-1, 9-2, 9-3, 9-4, 9-5, 9-6, 9-7, 9-8, 10-8, 10-9}.
(b) The set of ways in which a batsman’s innings can end is given by S = {bowled,
caught, leg before wicket, run out, stumped, hit wicket, not out, retired, retired
hurt, obstruction, timed out}.
(c) It is convenient to let U = up, D = down and N = no change. Then S = { UU,
UD, UN, DU, DD, DN, NU, ND, NN}, where, for example, DU means “first share
down, second share up”.
Notice how we construct the most detailed possible sample space — the set of
outcomes {both up, one up & one down, one up & one unchanged, one down & one
unchanged, both unchanged, both down} is not acceptable because each of these could
represent several distinguishable outcomes. For example, “one up & one down” could
represent either “UD” or “DU”.

Example 3C: A random experiment consists of tossing 3 coins of values R1, R2 and
R5 and observing heads and tails. Which of the following is the correct sample space?
(a) S = {3 heads,2 heads,1 head,0 heads}.
(b) S = {3 heads,2 heads 1 tail,1 head 2 tails,3 tails}.
(c) S = {HHH,HHT,HTH,THH,HTT,THT,TTH,TTT} where, for example, HTH
means “heads on R1, tails on R2 and heads on R5”.

Example 4B: Refer to the random experiments (a) to (c) of Example 2B, and give the
subsets of S that correspond to the following events.
(a) (i) The squash game is won by 5 or more points.
(ii) The game goes to deuce.
(b) When the batsman ended his innings the bails were dislodged from the wickets.
(c) None of the shares decline.
In each case we simply list the outcomes that favour the occurrence of the event in
which we are interested. The answers are:
(a) (i) {9-0, 9-1, 9-2, 9-3, 9-4} (ii) {9-8, 10-8, 10-9}
(b) {bowled, run out, stumped, hit wicket}
(c) {UU, UN, NU, NN}.

Example 5C: A salesperson, after calling on a client, records the outcome: sale made
(S), or no sale made (N ). List the sample space of outcomes in one afternoon if
(a) two clients are visited
(b) three clients are visited.
(c) Suppose now that three outcomes are recorded: sale made (S), sales potential good
(P ), no sale ever likely to be made (N ). List the sample space if two clients are
visited.

Example 6C. A party of five hikers, three males and two females, walk along a moun-
tain trail in single file.
(a) What is the sample space S?
(b) Find the subset of S that correspond to the events:
• U : a female is in the lead
• V : a male is bringing up the rear

• W : females are in the second and fourth positions.


(c) Find the subsets of S that correspond to U , U ∩ W , V ∩ W , and U ∩ V .

Kolmogorov, father of probability . . .


Andrey Nikolaevich Kolmogorov was a Russian mathematician who, in 1933, pub-
lished the axioms of probability, and established the theoretical foundation for the rig-
orous mathematical study of probability theory.

KOLMOGOROV’S AXIOMS OF PROBABILITY

Suppose that S is the sample space for a random experiment. For all
events A ⊂ S, we define the probability of A, denoted Pr(A), to be a
real number with the following properties:

1. 0 ≤ Pr(A) ≤ 1 for all A ⊂ S

2. Pr(S) = 1

3. If A ∩ B = ∅ (i.e. if A and B are mutually exclusive events) then


Pr(A ∪ B) = Pr(A) + Pr(B).

A consequence of the Kolmogorov axioms is that Pr(∅) = 0. The function Pr(A)


provides a means of attaching probabilities to events in S. The first two axioms tell us
that probabilities lie between zero and one, and that these extreme probabilities occur
for the impossible and certain events, respectively. The probabilities of all other events
are graded between these two extremes — unlikely events have probabilities close to zero,
and events which are nearly certain have probabilities close to one. If an event is as likely
to occur as it is not to occur, then it has probability 0.5. Thus for an unbiased coin,
for which “heads” and “tails” are equally likely, the probability of the event “heads” is
equal to the probability of the event “tails” is equal to 0.5!
This function Pr(A) is almost certainly a new kind of function to you. The functions
you have seen before, e.g. y = 3(x² + 5), take one real number, x, and map it onto
another real number, y. If it helps you, you can think of the function y = f (x) as a
kind of mincing machine — you put a number (x) in, you get another number (y)
out. Now you must think of the function Pr(A) as a new kind of mincing machine —
you put a set (A) in, and out pops a number between zero and one (inclusive of these
end limits)!

Relative frequencies . . .
To try to get some insight into the concept of probability, consider a random exper-
iment on some sample space S repeated infinitely many times. Let’s start by doing
n trials of the random experiment and counting the number of times r that some event
A ⊂ S occurs during the n trials. Then we define r/n to be the relative frequency of
the event A. Obviously, 0 ≤ r/n ≤ 1. Thus relative frequencies and probabilities both
lie between zero and one.
We can think of the probability of the event A as the relative frequency of A as n,
the number of trials of the random experiment gets very large. In symbols

Pr(A) = lim_{n→∞} r/n.

If you toss a fair coin, then the probability of “heads” is equal to the probability of
“tails”, i.e. Pr(H) = Pr(T ) = 0.5. If you tossed the coin 10 times you might observe
6 heads, a relative frequency of 6/10 = 0.6. But if you tossed it 100 times you might
observe 53 heads, relative frequency 53/100 = 0.53. If you kept going for a few hours
more, and tossed it 1000 times you might observe 512 heads, giving a relative frequency
of 512/1000 = 0.512. As the number of trials increases, the relative frequency
tends to get closer and closer to the “true” probability.
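
This behaviour is easy to see by simulation. A minimal R sketch (the seed and the
sample sizes are arbitrary choices) tosses a fair coin n times and prints the relative
frequency of heads:

    # Relative frequency of heads in n tosses of a fair coin
    set.seed(1)                                   # arbitrary seed, for a reproducible run
    for (n in c(10, 100, 1000, 100000)) {
      tosses <- sample(c("H", "T"), size = n, replace = TRUE)
      cat(n, "tosses: relative frequency of heads =", mean(tosses == "H"), "\n")
    }

As n grows, the printed relative frequencies settle down near the true probability 0.5.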

A class experiment — birthdays in April, May and June . . .

Almost exactly a quarter of the days of the year fall into April, May or June
(91/365.25 = 0.249, allowing for leap years every fourth year). Thus we expect the
probability that an individual’s birthday falls into one of these three months to be
pretty close to 0.249. Let’s do an experiment within the class, and fill in this table.

Number of Number with birthdays Relative


students (n) in April, May, June (r) frequency (r/n)

Front row

Front three rows

Whole class

Do the relative frequencies get closer to the “true” probability as n gets larger?

Some useful theorems . . .


We consider several theorems that follow immediately from the axioms of probability.
Theorem 1. Let A ⊂ S. Then Pr(Ā) = 1 − Pr(A).
Proof: We write S as the union of two mutually exclusive events:

A ∪ Ā = S.

Because A and Ā are mutually exclusive, i.e. A ∩ Ā = ∅, we can use axiom 3 to state

Pr(A ∪ Ā) = Pr(A) + Pr(Ā).

But A ∪ Ā = S, and Pr(S) = 1, by Kolmogorov’s 2nd axiom, so Pr(A ∪ Ā) = 1. Therefore

Pr(A) + Pr(Ā) = 1

and
Pr(Ā) = 1 − Pr(A),
as required.

Theorem 2. If A ⊂ S and B ⊂ S then Pr(A) = Pr(A ∩ B) + Pr(A ∩ B̄).


Proof: Write A as the union of the two mutually exclusive sets:

A = (A ∩ B) ∪ (A ∩ B̄).

Clearly,
(A ∩ B) ∩ (A ∩ B̄) = ∅.
Therefore, using axiom 3,

Pr(A) = Pr(A ∩ B) + Pr(A ∩ B̄).

Notice that theorem 2 may also be expressed as

Pr(A ∩ B̄) = Pr(A) − Pr(A ∩ B).

Theorem 3. The Addition Rule. For any arbitrary events A and B,

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

Proof: Write A ∪ B as the union of two mutually exclusive sets:

A ∪ B = B ∪ (A ∩ B̄).

Because B and A ∩ B̄ are mutually exclusive, we can again apply axiom 3 and say

Pr(A ∪ B) = Pr(B) + Pr(A ∩ B̄).

But, by theorem 2, Pr(A ∩ B̄) = Pr(A) − Pr(A ∩ B). The result follows.
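
The addition rule can be checked numerically on any small sample space with equally
likely outcomes. A minimal R sketch, using one throw of a die and two events chosen
purely for illustration:

    # Check Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B) on one throw of a die
    S <- 1:6
    A <- c(2, 4, 6)                            # "an even number"
    B <- c(1, 2, 3)                            # "three or less"
    pr <- function(E) length(E) / length(S)    # classical probability on S
    pr(union(A, B))                            # 5/6
    pr(A) + pr(B) - pr(intersect(A, B))        # also 5/6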

Theorem 4. If B ⊂ A, then Pr(B) ≤ Pr(A).


Proof: If B ⊂ A then we can write A as the union of two mutually exclusive sets,

A = B ∪ (A ∩ B̄)

and

Pr(A) = Pr(B) + Pr(A ∩ B̄)


≥ Pr(B)

because Pr(A ∩ B̄) ≥ 0 as all probabilities are non-negative.

Theorem 5. If A1 , A2 , . . . , An are pairwise mutually exclusive, i.e. Ai ∩ Aj = ∅ for i ≠ j,


then

Pr(A1 ∪ A2 ∪ . . . ∪ An ) = Pr(A1 ) + Pr(A2 ) + . . . + Pr(An ),

or, in a more concise notation,

Pr(∪_{i=1}^{n} Ai ) = Σ_{i=1}^{n} Pr(Ai ).

Proof: The proof is by repeated use of axiom 3. The events (A1 ∪ A2 ∪ . . . ∪ An−1 ) and
An are mutually exclusive. Thus

Pr(A1 ∪ . . . ∪ An ) = Pr(A1 ∪ . . . ∪ An−1 ) + Pr(An ).

Next, the events (A1 ∪ A2 ∪ . . . ∪ An−2 ) and An−1 are mutually exclusive. Thus

Pr(A1 ∪ . . . ∪ An−1 ) = Pr(A1 ∪ . . . ∪ An−2 ) + Pr(An−1 ),

so that

Pr(A1 ∪ . . . ∪ An ) = Pr(A1 ∪ . . . ∪ An−2 ) + Pr(An−1 ) + Pr(An ).
Continue the process, and the result follows.

Example 7A: If Pr(A) = 0.5, Pr(B) = 0.6 and Pr(A ∩ B) = 0.3, find Pr(B̄), Pr(A ∩ B̄)
and Pr(A ∪ B).
By theorem 1, Pr(B̄) = 1 − Pr(B) = 1 − 0.6 = 0.4.
By theorem 2, Pr(A ∩ B̄) = Pr(A) − Pr(A ∩ B) = 0.5 − 0.3 = 0.2.
By theorem 3, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.5 + 0.6 − 0.3 = 0.8.

Example 8C: Is it possible for events in the same sample space to have probabilities
Pr(A) = 0.8, Pr(B) = 0.6 and Pr(A ∩ B) = 0.7?

Example 9C: In the sample space S, let Pr(A) = 0.7, Pr(B) = 0.5, Pr(C) = 0.1,
Pr(A ∪ B) = 0.9, and Pr((A ∪ B) ∩ C) = 0. Depict the events A, B and C on a Venn
diagram and find the probability of the events A ∩ B, A ∩ C, A ∪ B ∪ C, B ∩ C, and
(A ∩ B).

Example 10C: If we know that Pr(A ∪ B) = 0.6 and Pr(A ∩ B) = 0.2, can we find
Pr(A) and Pr(B)?

Equally probable elementary events . . .


All probability problems, in theory at least, can be solved by making use of the-
orem 5. The elementary events whose union make up the sample space S are always
mutually exclusive because if one elementary event occurs, no other elementary event
occurs. Therefore, if we knew the probabilities of all the elementary events, we would
also be able to compute the probabilities of any event in S. By theorem 5, the proba-
bility of any event is simply the sum of the probabilities of the elementary events that
make up the event.
There is a wide class of problems for which we do know the probabilities of all the
elementary events in a sample space. These are the problems for which it is reasonable to
assume that all the elementary events are equally likely. If there are N elementary
events contained in S, and each one has the same probability of occurring, then the
probability of each and every elementary event must be 1/N .
Equally probable elementary events occur in many games of chance — coins and
dice are assumed to be unbiased; when a card is drawn from a pack of 52 playing cards,
the probability of any particular card is assumed to be 1/52. Let A be some event in
this scenario. Then A must consist of the union of elementary events, each of probability
1/N . If we could determine the number of elementary events contained in A, then
we could write

Pr(A) = (number of elementary events contained in A) / (number of elementary events contained in S)
      = n(A)/n(S) = n(A)/N,

where we define the function n(A) to mean the count of the number of elementary events
contained in A. Clearly, n(S) = N .

Example 11A: Consider tossing a fair die. Then S = {1, 2, 3, 4, 5, 6} and N = 6. Let
A = {1, 3, 5} the event of getting an odd number. Find Pr(A).
The number of elementary events contained in A is n(A) = 3. So

Pr(A) = n(A)/N = 3/6 = 1/2,
which your intuition should tell you is correct!
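
Example 11A can be reproduced in R by listing the elementary events and counting.
The same counting idea is sketched below for drawing a heart from a full pack; the
pack is built here only for illustration:

    # Example 11A: odd number with a fair die
    S <- 1:6
    A <- c(1, 3, 5)
    length(A) / length(S)                            # 3/6 = 0.5

    # Same idea for a full pack of cards: probability of drawing a heart
    ranks  <- c(2:10, "J", "Q", "K", "A")
    suits  <- c("hearts", "diamonds", "clubs", "spades")
    pack   <- as.vector(outer(ranks, suits, paste))  # the 52 elementary events
    hearts <- grep("hearts", pack, value = TRUE)     # the event "a heart"
    length(hearts) / length(pack)                    # 13/52 = 0.25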

COMPUTING PROBABILITIES WHEN THE ELEMENTARY


EVENTS ARE ALL EQUIPROBABLE
If all the N elementary events contained in a sample space have proba-
bility 1/N , the following three steps enable the probability of any event
A in the sample space to be found:

1. Determine the sample space S made up out of elementary events.


Determine the number, N = n(S) of elementary events contained
in S. You might have to list and count them, or you might be
able to use one of the “counting rules” given below. If the N
elementary events are equally probable, then each one occurs
with probability 1/N .

2. Determine A, the subset of S, the event whose probability is to


be found. Count the number of elementary events contained in
A — suppose n(A) elementary events make up the event A.

3. Then Pr(A) = n(A)/N .

Example 12A: 100 people bought tickets in a charity raffle. 60 of them bought the
tickets because they supported the charity. 75 bought tickets because they liked the
prize. No one who neither supported the charity nor liked the prize bought a ticket.
(a) What is the probability that the prize-winning ticket was bought by someone who
liked the prize?
(b) What is the probability that the prize was won by someone who did not support
the charity?
(c) What is the probability that the prize was won by someone who both supported
the charity and liked the prize?

(a) Let A and B be the events “liked the prize” and “support the charity” respectively.
To find Pr(A), we apply the three steps as follows:

1. N = 100
2. n(A) = 75
3. Therefore Pr(A) = n(A)/N = 75/100 = 0.75.

(b) To find Pr(B̄), the probability that the winner did not support the charity, we only
have to go through steps 2 and 3: n(B̄) = 100 − 60 = 40, so Pr(B̄) = 40/100 = 0.40.

(c) Since everyone who bought a ticket either supported the charity or liked the prize,
n(A ∩ B) = n(A) + n(B) − n(S) = 75 + 60 − 100 = 35, so Pr(A ∩ B) = 35/100 = 0.35.

Set
A set is a collection of outcomes.

Sample space
The sample space is the set of all possible outcomes of a random experiment. A
sample space is usually denoted by the symbol S and the collection of elements
contained in S enclosed in curly brackets { }.

Sample point
A sample point is an individual outcome (element) in a sample space.

Examples

1) Tossing a single coin. S = {h, t}.

2) Tossing a die. S = {1, 2, 3, 4, 5, 6}.

3) Tossing a pair of dice


S= { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }.

4) Tossing two coins. S = {hh, ht, th, tt}.

5) Drawing a card from a deck of cards. The elements in the sample space are listed
below.

S = {2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦ A♦
2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥ A♥
2♣ 3♣ 4♣ 5 ♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣ A♣
2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠ }

Each outcome listed in the above examples is a sample point.

Event
An event is a subset of a sample space i.e. a collection of sample points taken from a sample
space.

Impossible event
An impossible event is an event that cannot happen (has probability zero).

Certain event
A certain event is an event that is sure to happen (has probability 1).

Simple events are events that involve only one sample point (outcome) of the sample space.
Examples

1) Let E denote the event “an odd number is obtained when tossing a single die”.
Then E = {1, 3, 5}.

2) Let H denote the event “at least one head appears when tossing two coins”.
H = {hh, ht, th}.

3) Let B denote the event “obtaining a club and a heart in a single draw from a deck of
cards”. The event B is impossible. The set of outcomes of B is an empty set denoted by
B = { } = .

4) Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The
event A is a certain event i.e. one of the outcomes belonging to the set describing the
event must happen. This is denoted by A = S, where S is the sample space.

Venn diagrams
• A Venn diagram is a drawing in which circular areas represent groups of items
usually sharing common properties.
• The drawing consists of two or more circles, each representing a specific group or
set, contained within a square that represents the sample space. Venn diagrams are
often used as a visual display when referring to sample spaces, events and
operations involving events.

3.2 Complements, Unions and Intersections of events
Compound events
These are events that involve more than one event. Such events can be obtained by
performing various operations involving two or more events.
Some of the operations that can be performed are described in the sections that follow.

Complementary events
The complementary event Ā (sometimes written Aᶜ) of an event A is all the outcomes in S
that are not in A.

Examples

1) Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement
of the event A = “obtaining a 3 or less” = {1, 2, 3} is
Ā = “obtaining a 4 or more” = {4, 5, 6}.

2) Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of
the event H = “at least one head” = {hh, ht, th} is H̄ = “no heads” = {tt}.

Union and intersection of events

• The union of two events A and B, denoted by A ∪ B, is the set of outcomes that are
in A or in B or in both A and B i.e. the event that
“either A or B or both A and B occur”
or “at least one of A or B occurs”.

• The intersection of two events A and B, denoted by A ∩ B, is the set of outcomes
that are in both A and B i.e. the event that
“both A and B occur”.

The Venn diagrams below show the sets A ∪ B and A ∩ B.

Ā ∩ B is the event “a sample point is in B but not in A”.

A ∩ B̄ is the event “a sample point is in A but not in B”.

These definitions involving two events can be extended to ones involving 3 or more events
e.g. for the 3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2
or A3 occurs” and A1 ∩ A2 ∩ A3 the event “A1 and A2 and A3 occur”.

Examples

1) Consider the events A = {1, 3, 6, 7, 8} and B = { 2, 3, 5, 7, 9} defined on a sample space


S = {1, 2, 3, . . . , 10}.

A  B = {1, 2, 3, 5, 6, 7, 8, 9} , A  B = { 3, 7},
A  B = {2, 5, 9}, A  B = {1, 6, 8}.

2) Let C be the event “drawing a face card from a deck of cards” and A the event “drawing
a king or an ace from a deck of cards”.

C = {J♦, Q♦, K♦, J♥, Q♥, K♥,


J♠, Q♠, K♠, J♣, Q♣, K♣}

A = {A♦, A♥, A♠, A♣, K♦, K♥, K♠, K♣}.

C  A = {J♦, Q♦, K♦, J♥, Q♥, K♥,


J♠, Q♠, K♠, J♣, Q♣, K♣,
A♦, A♥, A♠, A♣}.

C  A = { K♦, K♥, K♠, K♣}.

Mutually exclusive (disjoint) events


Two events A and B are mutually exclusive (disjoint) if they have no elements
(outcomes) in common .This also means that these events cannot occur together.

Examples

1) Let B be the event “drawing a black card from a deck of cards” and R the event “drawing
a red card from a deck of cards”.

The events B and R have no outcomes in common i.e. B ∩ R = ∅ (empty set). Hence B
and R are mutually exclusive.

2) Let E be the event “an even number with a single throw of a die” and O the event “an
odd number with a single throw of a die” i.e. E = {2, 4, 6} and O = {1, 3, 5}.

E and O have no outcomes in common i.e. E ∩ O = ∅ and are therefore mutually


exclusive.

3.3 Definitions of probability

Classical definition of probability


If a random experiment has n equally likely outcomes, of which m are favourable to
an event A, then the probability of occurrence of the event A, denoted P(A), is
given by

P(A) = N(A)/N(S) = m/n,

where N(A) = m is the number of outcomes favourable to the event A and N(S) = n
the number of outcomes in the sample space S i.e. the total number of outcomes.

Note: Since N(A) ≥ 0 and N(A) ≤ N(S), 0 ≤ P(A) ≤ 1.

Examples

1) Two coins are tossed. Find the probability of getting


(i) exactly two heads.
(ii) at least one head.

Solution:

Here, S = {hh, ht, th, tt} .


(i) Let A = getting exactly two heads = {hh}
∴ P(A) = ¼.

(ii) Let B = getting at least one head = {hh, ht, th}


∴ P(B) = ¾.

2) Two dice are rolled. Find the probability that a sum of 7 will occur.

Solution:
The number of sample points in S is 36 (see example 3 under sample space).

Let A = “a sum of 7 will occur”.

= {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}

∴ P(A) = 6/36 = 1/6.
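
The 36 outcomes can also be enumerated in R with expand.grid(), which makes the
count of favourable outcomes explicit (a sketch; the column names are arbitrary):

    # Two dice: enumerate the 36 equally likely outcomes and count sums of 7
    S <- expand.grid(die1 = 1:6, die2 = 1:6)   # 36 rows, one per outcome
    A <- subset(S, die1 + die2 == 7)           # the favourable outcomes
    nrow(A) / nrow(S)                          # 6/36 = 1/6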

The classical definition of probability requires the assumption that all the outcomes in the
sample space are equally likely. If this assumption is not met, this formula cannot be used.

Example
The possible temperatures (degrees Celsius) in Durban on a particular day in December are

15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.

In December Durban is hot so, for example, 15 degrees is less likely than 30 degrees
i.e. P (temperature = 15) = 1 ÷ 25 = 0.04 does not seem reasonable.

Relative frequency (empirical) definition of probability


If an experiment is repeated n times and an event A is observed f times, then the
estimated probability of occurrence of an event A is given by

P(A) = f/n.

Note: This formula differs from the classical formula in the sense that the classical
formula uses all the outcomes in the sample space as the total number of outcomes,
while the relative frequency formula uses the number of repetitions (n) of the
experiment as total number of outcomes. In the classical formula the number of
outcomes in the sample space is fixed, while the number of repetitions of an
experiment (n) can vary. It can be shown that the empirical probability is a good
approximation of the true probability when n is sufficiently large.

Examples

1) A bent coin is tossed 1000 times with heads coming up 692 times.
An estimate of P(h) is 692/1000 = 0.692.

2) A summary of the final marks in a certain statistics course is shown below.

Mark f
less than 30 6
30 – 39 26
40 – 49 45
50 – 59 64
60 – 69 82
70 – 79 37
80 – 89 22
90 – 99 8
Total 290

From the table (using the empirical formula) the following probabilities can be
estimated.

(a) P(mark less than 40) = (26 + 6)/290 ≈ 0.110.

(b) P(pass) = (64 + 82 + 37 + 22 + 8)/290 = 1 − (6 + 26 + 45)/290 = 213/290 ≈ 0.73.

(c) P(above 80) = (22 + 8)/290 ≈ 0.103.
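
The same empirical probabilities can be computed in R directly from the frequency
column; the vector f below simply copies the table above:

    # Frequencies from the mark table above
    f <- c(6, 26, 45, 64, 82, 37, 22, 8)   # <30, 30-39, ..., 90-99
    n <- sum(f)                            # 290 students
    sum(f[1:2]) / n                        # P(mark less than 40) = 0.110
    sum(f[4:8]) / n                        # P(pass, 50 or more)  = 0.734
    sum(f[7:8]) / n                        # P(above 80)          = 0.103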

Marginal and joint probabilities


• Probabilities involving the occurrence of single events are called marginal
probabilities.
• Probabilities involving the occurrence of two or more events are called joint
probabilities.

Example

The preference probabilities according to gender for 2 different brands of a certain product
are summarized in the table on the following page.
The gender marginal probabilities are obtained by summing the joint probabilities over the
brands. The brand marginal probabilities are obtained by summing the joint probabilities
over the genders.

                        Brand 1    Brand 2    Marginal probability
Gender    Male          0.20       0.32       0.52
          Female        0.40       0.08       0.48
Marginal probability    0.60       0.40       1.00

Joint probabilities: P(male ∩ brand 1) = 0.20, P(male ∩ brand 2) = 0.32,


P(female ∩ brand 1) = 0.40, P(female ∩ brand 2) = 0.08.

Marginal probabilities: P(male) = 0.52, P(female) = 0.48,


P(brand 1) = 0.60, P(brand 2) = 0.40.
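
In R the joint probability table can be stored as a matrix, and the marginal
probabilities are then literally row and column sums (a sketch; the object names are
arbitrary):

    # Joint probabilities for gender and brand preference
    joint <- matrix(c(0.20, 0.32,
                      0.40, 0.08),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("Male", "Female"), c("Brand 1", "Brand 2")))
    rowSums(joint)   # gender marginals: 0.52, 0.48
    colSums(joint)   # brand marginals:  0.60, 0.40
    sum(joint)       # all probabilities sum to 1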

3.4 Counting formulae


The computation of probabilities using the classical definition involves counting the number
of outcomes favourable to the event of interest (say event A) and the total number of
possible outcomes in the sample space. The following formulae can be used to count
numbers of outcomes to be used in the classical definition formula.

Addition and Multiplication formulae for counting


Addition formula – If an experiment can be performed in n ways, and another experiment
can be performed in m ways then either of the two experiments can be performed in (n+m)
ways.

This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then one of the k
experiments can be done in n1 + n2 +. . . + nk ways.

Example:
Suppose a man is standing in a room which has 2 doors to his left and 1 door to his
right. In how many ways can he leave the room?

Solution:
Let “leave the room by going to the left” be experiment 1 and “leave the room by
going to the right” be experiment 2. There are n=2 ways to do experiment 1 (he can
leave by door A or door B) and there is m=1 way to do experiment 2 (he can leave by
door C). In total there are n+m = 2+1 = 3 ways to leave the room.

Multiplication formula – If an experiment can be done in n ways and another experiment


can be done in m ways, then both of the experiments can be done in n × m ways.

This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then the k
experiments together can be done in n1×n2×…×nk ways.

Example 1:
A basic meal consists of soup, a sandwich and a beverage. If a person having this
meal has 3 choices of soup, 4 choices of sandwiches and a choice of coffee or tea as
a beverage, how many such meals are possible?

Choosing soup (experiment 1) has 3 possibilities.


Choosing a sandwich (experiment 2) has 4 possibilities.
Choosing a beverage (experiment 3) has 2 possibilities.

Number of choices of meals = 3 × 4 × 2 = 24.
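
The multiplication formula simply counts the rows of the full list of possibilities, as a
quick R check shows (the labels below are invented for illustration):

    # List every possible meal and count them
    meals <- expand.grid(soup     = paste("soup", 1:3),
                         sandwich = paste("sandwich", 1:4),
                         beverage = c("coffee", "tea"))
    nrow(meals)   # 3 * 4 * 2 = 24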

Example 2:
A PIN to be used at an ATM can be formed by selecting 4 digits from the digits
0, 1, 2, . . . , 9 . How many choices of PIN are there if

(a) digits may be repeated?


(b) digits may not be repeated?

(a) First digit – 10 choices,


second digit – 10 choices,
third digit – 10 choices,
fourth digit – 10 choices.
Number of choices = 10 × 10 × 10 × 10 = 10⁴ = 10 000.

(b) First digit – 10 choices,


second digit – 9 choices,
third digit – 8 choices,
fourth digit – 7 choices.
number of choices = 10 × 9 × 8 × 7 = 5040.
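
Both PIN counts are one-liners in R; the last line repeats the no-repetition count
using factorial notation, which is introduced in the next subsection:

    10^4                                 # digits may repeat: 10 000 PINs
    prod(10:7)                           # no repeats: 10 * 9 * 8 * 7 = 5040
    factorial(10) / factorial(10 - 4)    # the same count via the permutation formula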

Factorial notation
In how many ways can n (n – integer) objects be arranged in a row?

Let n = 2 : 1st object – 2 choices


2nd object – 1 choice.
Number of ways = 2 × 1 = 2.

Let n = 3 : 1st object – 3 choices


2nd object – 2 choices.
3rd object – 1 choice.
Number of ways = 3 × 2 × 1 = 6.

In general: the number of ways is n × (n-1) × (n-2) ×. . . × 2 × 1 = n ! (n factorial)

Using this notation 2 × 1 = 2 ! = 2


3×2×1=3!=6
4 × 3 × 2 × 1 = 4 ! = 24 etc.

Note: 1 ! = 1, 0 ! = 1.

The factorial notation is used in counting formulae.

Examples
1) In how many ways can 7 people be placed in a queue at a bus stop?

The 7 people have to be placed in the 7 positions from 1st to 7th.


No. of ways = 7 × 6 × 5 × . . . × 2 × 1 = 7 ! = 5040.

2) In how many ways can 5 books be arranged in a row?

No. of ways = 5 × 4 × 3 × 2 × 1 = 5 ! = 120.



Permutations and combinations

Permutation
• A permutation is the number of different arrangements of a group of items where
order matters.
• The number of permutations of n objects taken r at a time is calculated from

nPr = P(n, r) = n!/(n − r)!.

Combination
• A combination is the number of different selections of a group of items where order
does not matter.
• The number of combinations of a group of n objects taken r at a time is calculated
from

nCr = C(n, r) = n!/((n − r)! r!).

Examples:

1) Four people (A, B, C, D) serve on a board of directors. A chairman and vice-chairman are
to be chosen from these 4 people. In how many ways can this be done?

Chairman Vice-chairman
A B
B A
A C
C A
A D
D A
B C
C B
B D
D B
C D
D C

Number of ways = 12.

2) Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from
them as members of a committee that will investigate fraud allegations. In how many
ways can this be done?

People chosen: A and B, A and C, A and D, B and C, B and D, C and D.

Number of ways = 6.

In both these examples a choice of 2 people from 4 people is made. However, in example 1
the order of choice of the 2 people matters (since the one person chosen is chairman and
the other one vice-chairman). In example 2 the order does not matter. The only interest is in
who serves on the committee.

Application of formulae.
In question 1 the permutations formula applies with n = 4, r =2.

Number of ways = P(4, 2) = 4!/(4 − 2)! = 12.

In question 2 the combinations formula applies with n = 4, r =2.

Number of ways = C(4, 2) = 4!/(2!(4 − 2)!) = 6.

3) Find the number of ways to take 4 people and place them in groups of 3 at a time where
order does not matter.

Solution:
Since order does not matter, use the combination formula.

C(4, 3) = 4!/(3!(4 − 3)!) = 24/6 = 4.

4) Find the number of way to arrange 6 items in groups of 4 at a time where order matters.

Solution: P(6, 4) = 6!/(6 − 4)! = 720/2 = 360.

There are 360 ways to arrange 6 items taken 4 at a time when order matters.

5) Find the number of ways to take 20 objects and arrange them in groups of 5 at a time
where order does not matter.

Solution: C(20, 5) = 20!/(5!(20 − 5)!) = (20 · 19 · 18 · 17 · 16)/(1 · 2 · 3 · 4 · 5) = 15504.
There are 15 504 ways to arrange 20 objects taken 5 at a time when order does not
matter.

6) Determine the total number of five-card hands that can be drawn from a deck of 52
cards.

Solution:
When a hand of cards is dealt, the order of the cards does not matter. Thus the
combinations formula is used.

There are 52 cards in a deck and we want to know in how many different ways we can
draw them in groups of five at a time when order does not matter. Using the
combination formula gives
C(52,5) = 2 598 960.
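
R has these counting formulae built in as factorial() and choose(), so the examples
above can be checked directly:

    factorial(4) / factorial(4 - 2)   # P(4, 2) = 12, the chairman/vice-chairman count
    choose(4, 2)                      # C(4, 2) = 6, the two-person committee
    choose(20, 5)                     # 15504
    choose(52, 5)                     # 2598960 five-card hands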

7) There are five women and six men in a group. From this group a committee of 4 is to be
chosen. In how many ways can the committee be formed if the committee is to have at least
3 women in it?

Solution:

8) In how many ways can a phone number consisting of 5 digits be chosen from the digits
1, 2, 3, . . . , 9 if no digits are to be repeated?

Solution:

9) In how many ways can the 6 winning numbers in a Lotto draw be selected?

Solution:

10) In how many ways can a five-card hand consisting of three eights and two sevens be dealt?

Solution:

11) How many different 5-card hands include 4 of a kind and one other card?

Solution:

We have 13 different ways to choose 4 of a kind: 2's, 3's, 4's, … Queens, Kings and
Aces.

Once a set of 4 of a kind has been removed from the deck, 48 cards are left.

Remember OR means add.

The possible situations that will satisfy the above requirement are:

4 Aces and one other card C(4,4)×C(48,1) = 48.

or 4 Kings and one other card C(4,4)×C(48,1) = 48.

or 4 Queens and one other card C(4,4)×C(48,1) = 48.


.
.
.
or 4 twos and one other card C(4,4)×C(48,1) = 48.
Total of 48×13 = 624 ways.
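
A quick check of this count in R, using choose() as in the earlier examples:

    # 13 ranks for the four of a kind, all 4 cards of that rank, then any one of the
    # 48 remaining cards
    13 * choose(4, 4) * choose(48, 1)   # 624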

3.5 Basic probability formulae

Complementary events
For any event A defined on some sample space,

P(Ā) = 1 − P(A).

Union of two or more events

For any two events A and B defined on some sample space,

P( A  B)  P(A) + P(B) for mutually exclusive events.

P( A  B)  P(A) + P(B) – P( A  B) for events that are not mutually exclusive.

These formulae can be extended to probabilities involving more than two events,
e.g. for 3 events A, B and C defined on some sample space,

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

2.1 Exercise
EX 01 Let V = {v|0 < v < 5}, W = {0, 5}, X = {1, 2, 3, 4}, Y = {2, 4}, Z = {x|1 ≤ x ≤ 4}. Which of
the following statements are true, and which are false:

(a) V = W (e) X = Z
(b) Y ⊂ X (f) Z ⊄ V
(c) W ⊃ V (g) Y ⊂ W
(d) Z ⊃ X (h) Y ∈ Z

EX 02 A small town has three grocery stores (1, 2 and 3). Four ladies living in this town each randomly
and independently pick a store in which to shop. Give the sample space of the experiment which
consists of the selection of the stores by the ladies. Then define the events:

(a) A: all the ladies choose Store 1


(b) B: half the ladies choose Store 1 and half choose Store 2
(c) C: all the stores are chosen (by at least one lady).

EX 03 : Let A, B and C be three arbitrary events. Find expressions for the events

(a) only A occurs


(b) both A and B but not C occur
(c) all three events occur
(d) at least one occurs
(e) at least two occur
(f) exactly one occurs
(g) exactly two occur
(h) no more than two occur
(i) none occur.

EX 04 : Let A and B be two events defined on a sample space S. Write down an expression for each of
the following events, express their probabilities in terms of Pr(A), Pr(B) and Pr(A ∩ B), and
evaluate their probabilities if Pr(A) = 0.3, Pr(B) = 0.4 and Pr(A ∩ B) = 0.2:

(a) either A or B occurs


(b) both A and B occur
(c) A does not occur
(d) A occurs but B does not occur
(e) neither A nor B occurs
(f) exactly one of A or B occurs.

3 Counting Methods
In calculating probabilities, it is essential that we be able to count the sample points in the sample
space S and in the event E. However, this sometimes becomes a tedious job, and thus compact counting
methods are necessary. A branch of Algebra, called “Permutations” and “Combinations” is very useful
here.
Suppose two operations A and B are carried out, and if there are “m” different ways of carrying out
A and “k” different ways of carrying out B, then the combined operation of A and B may be carried
out in m × k = mk ways.

3.1 Permutations
The number of permutations (or arrangements) of n distinct objects, taken all together is
n! = n × (n − 1) × . . . × 2 × 1
0! = 1 by definition
1! = 1
2! = 2 × 1 = 2
3! = 3 × 2 × 1 = 6
4! = 4 × 3 × 2 × 1 = 24
5! = 5 × 4 × 3 × 2 × 1 = 120
..
.
10! = 10 × 9 × . . . × 2 × 1

Note that n! is read as n factorial.


Defn: A set containing n distinguishable objects has
n × (n − 1) × . . . × 2 × 1
different orderings of the objects belonging to the set. We say that there are n! distinct arrangements
(technically, we call each arrangement or ordering a permutation) of the n objects in the set.
A permutation
• is the number of different arrangements of a group of items where order matters.
• The number of permutations of n objects taken r at a time is calculated from

nPr = P(n, r) = n!/(n − r)!
• Example 04A: If the set A = {1, 2, 3}, list all the possible permutations. There are 3! =
3 × 2 × 1 = 6 distinct arrangements of the objects in A. They are: 123|132|213|231|312|321.
• Example 04B: Consider the three letters A, B, C, the number of possible arrangements of these
three letters will be 3! = 3×2×1 = 6. These arrangements are given by ABC,ACB,BAC,BCA,CAB,
CBA.
• Example 05: The focusing mechanism on Ron’s camera is bust, so that he can only take
pictures of people at a distance of 2 metres, so he only takes pictures of 3 people at a time. How
many different pictures (a rearrangement of the same people is considered a different picture)
are possible if 10 people are present?
Solution 05:This is the same as asking for the number of permutations of 10 objects taken 3
at a time, given by
10P3 = 10!/(10 − 3)! = 10 × 9 × 8 = 720.
• Example 06: Four people (A, B, C, D) serve on a board of directors. Two people are to be
chosen from them as members of a committee that will investigate fraud allegations. In how
many ways can this be done?
Solution 06: The pairs that can be chosen are: A and B, A and C, A and D, B and C, B and D, C and D.
Number of ways = 6.
• Example 07A: Suppose there are 19 political parties contested an election. One party wanted
the ballot papers to have the parties listed in random order. Another said it was impractical.
How many different orderings would have been possible?
Solution 07: This is equivalent to asking: “How many permutations of 19 objects taken 19 at
a time are there?” The answer is:
19P19 = 19!/(19 − 19)! = 19!/0! = 19!

3.2 Permutations, with repetitions

We now suppose that we have n types of objects and r slots, and that we have at least r objects
of each type available. We can thus fill the first slot with any of the n types of objects, there
are still n types of objects available for the second slot, and so on. Because there are at least r objects of
each type, there are still objects of each of the n types available for the final, rth slot.
Thus the number of permutations of n types of objects taken r at a time, allowing repetitions, is

n × n × n × . . . × n (r factors) = nʳ.

• Example 07B: How many four digit numbers of ATM pins can be generated from the 10 digits
from 0 to 9, a)if repetitions are permitted? b)if repetitions are not permitted?
Solution 07B:
(a) We have four slots to fill. But because all of the 10 digits remain available to fill every slot,
this can be done in 10⁴ = 10 000 ways. This makes sense, because there are 10 000 numbers from
0 (actually 0 000) to 9 999.
(b) Repetitions not allowed
10 × 9 × 8 × 7 = 5040
ways.

3.3 Combinations
Combination

• A combination is the number of different selections of a group of items where order does not
matter.

• The number of combinations of a group of n objects taken r at a time is calculated from

nCr = C(n, r) = n!/(r!(n − r)!).

The binomial coefficient symbol, read “n choose r”, is another common notation for the same quantity.

• Example 08: In how many ways can a 9 man work team be formed from 15 men? The problem
asks only for the number of ways of choosing 9 men out of 15:

C(15, 9) = 15!/(9! 6!) = 5005.

• Example 09: How many different bridge hands can be dealt from a pack of 52 playing cards?
A bridge hand contains 13 cards — what matters is only the group of cards (even though you
might arrange them in a convenient order). Therefore, bridge hands consist of combinations of
52 objects taken 13 at a time:

C(52, 13) = 52!/(13! 39!) = 635 013 559 600.

At 15 minutes per bridge game, there are enough different bridge hands to keep you going for
about 20 million years continuously.
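In R the binomial coefficients are given by the base function choose(), so the counts in Examples 08 and 09 can be verified directly:

    choose(15, 9)    # work teams of 9 chosen from 15 men: 5005
    choose(52, 13)   # possible bridge hands: 635013559600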

• Example 010: From 8 accountants and 5 computer programmers, in how many ways can one
select a committee of
(a) 3 accountants and 2 computer programmers?
(b) 5 people, subject to the condition that the committee contain at least 2 computer program-
mers and at least two accountants.
Solution 010:
(a) We can choose 3 accountants from 8 in (8 choose 3) ways. We can choose 2 computer programmers
from 5 in (5 choose 2) ways. We multiply the results, because for every group of 3 accountants that
we choose, we can choose one of (5 choose 2) different groups of computer programmers. Thus we can
choose a committee of 3 accountants and 2 computer programmers in

    (8 choose 3) × (5 choose 2) = 56 × 10 = 560 ways.

(b) The total possible number of ways of forming the committee is: 840 ways (Work it out!)
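Both parts can be checked in R with choose(). For part (b), a committee of 5 with at least 2 accountants and at least 2 programmers must consist of either 3 accountants and 2 programmers, or 2 accountants and 3 programmers:

    choose(8, 3) * choose(5, 2)                                # part (a): 560
    choose(8, 3) * choose(5, 2) + choose(8, 2) * choose(5, 3)  # part (b): 560 + 280 = 840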

4 Set Theory
4.1 VENN DIAGRAM:
A Venn diagram is a drawing, in which circular areas represent groups of items usually sharing common
properties. The drawing consists of two or more circles, each representing a specific group or set,
contained within a square that represents the sample space. Venn diagrams are often used as a visual
display when referring to sample spaces, events and operations involving events.
Complementary events: The complementary event Ā (sometimes written A^c) of an event A consists of all the
outcomes in S that are not in A.

Examples

• Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement of the
event A = “obtaining a 3 or less” = {1, 2, 3} is Ā = “obtaining a 4 or more” = {4, 5, 6}.

• Consider the experiment of tossing two coins. S = {HH, HT, TH, TT}. The complement of the
event H = “at least one head” = {HH, HT, TH} is H̄ = “no heads” = {TT}.

Union and intersection of events

• The union of two events A and B, denoted by A ∪ B, is the set of outcomes that are in A or in
B or in both A and B i.e. the event that “either A or B or both A and B occur” or “at least
one of A or B occurs”.

• The intersection of two events A and B, denoted by A ∩ B, is the set of outcomes that are in
both A and B i.e. the event that “both A and B occur”.

The Venn diagrams below show the sets A ∪ B and A ∩ B.

• Ā ∩ B is the event “a sample point is in B but not in A”.



• A ∩ B̄ is the event “a sample point is in A but not in B”.

• These definitions involving two events can be extended to ones involving 3 or more events e.g.
for the 3 events A1 , A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1 , A2 or A3
occurs” and A1 ∩ A2 ∩ A3 is the event “all of A1 , A2 and A3 occur”.

• Examples : Consider the events A = {1, 3, 6, 7, 8} and B = {2, 3, 5, 7, 9} defined on a sample


space S = {1, 2, 3, . . . , 10}.
The following events can be derived: A ∪ B = {1, 2, 3, 5, 6, 7, 8, 9}, A ∩ B = {3, 7},
Ā ∩ B = {2, 5, 9}, A ∩ B̄ = {1, 6, 8}.

4.2 Mutually Exclusive events


Two events A and B are mutually exclusive (disjoint) if they have no elements (outcomes) in common.
This also means that these events cannot occur together.
Examples

1 Let B be the event “drawing a black card from a deck of cards” and R the event “drawing a red
card from a deck of cards”.
The events B and R have no outcomes in common i.e.B ∩ R = φ(empty set). Hence B and R
are mutually exclusive.

2 Let E be the event “an even number with a single throw of a die” and O the event “an odd
number with a single throw of a die” i.e. E = (2, 4, 6) and O = (1, 3, 5).
Hence, E and O have no outcomes in common i.e. E ∩ O = φ(empty set)and are therefore
mutually exclusive

5 Definition of Probability
Classical definition of probability: If there are n equally likely outcomes in total, of which m
are favorable to an event A, then the probability of occurrence of the event A, denoted as P(A), is
given by
    P(A) = N(A) / N(S) = m / n
where N(A) = m is the number of outcomes favourable to the event A and N(S) = n the number of
outcomes in the sample space S i.e. the total number of outcomes.
Note: Since N(A) ≥ 0 and N(A) ≤ N(S), 0 ≤ P(A) ≤ 1.
Examples

5.1 COMPUTING PROBABILITIES WHEN THE ELEMENTARY EVENTS ARE


ALL EQUIPROBABLE
If all the N elementary events contained in a sample space have probability 1/N, the following three
steps enable the probability of any event A in the sample space to be found:

1. Determine the sample space S made up out of elementary events. Determine
the number, N = n(S), of elementary events contained in S. You might have
to list and count them, or you might be able to use one of the “counting
rules” given earlier. If the N elementary events are equally probable, then
each one occurs with probability 1/N.
2. Determine A, the subset of S, the event whose probability is to be found.
Count the number of elementary events contained in A — suppose n(A)
elementary events make up the event A.
3. Then Pr(A) = n(A)/N.

Examples

QUE 08 Example 08 Suppose 30% of Nigerians are obese (A), 4% of Nigerians suffer from diabetes
(B), and 2% are both obese and suffer from diabetes. What is the probability that a randomly
selected person is obese or suffers from diabetes?

• Solution 08 Here, P(A) =0.3, P(B) =0.04 and P(A and B) =P (A ∩ B) =0.02. Then,

P (AorB) = P (A ∪ B) = P (A) + P (B) − P(A and B)


= 0.3 + 0.04 − 0.02 = 0.32

QUE 09 Example 09 What is the probability that a randomly selected individual is male or against abortion? Let
A be the event {Male} and B the event {Against abortion}. Consider a survey of 1000 people in which
451 were men, so P(A) = 451/1000; 442 people were against abortion, so P(B) =
442/1000; and 203 were men who were against abortion, so P(A and B) = 203/1000.

• Solution 09

P (AorB) = P (A ∪ B) = P (A) + P (B) − P(A and B)


    = 451/1000 + 442/1000 − 203/1000
    = 690/1000 = 0.69

Que 10 Two coins are tossed. Find the probability of getting (i) exactly two heads. (ii) at least one
head.

Ans 10 Solution: Here, S = {HH, HT, T H, T T } .


(i) Let A = getting exactly two heads = {HH}
∴ P(A) = 1/4.
(ii) Let B = getting at least one head = {HH, HT, TH}
∴ P(B) = 3/4.

Que 11 Two dice are rolled. Find the probability that a sum of 7 will occur.

Ans 11 Solution:
The number of sample points in S is 36 (see example 3 under sample space).
Let A = “a sum of 7 will occur”. A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
∴ P(A) = 6/36 = 1/6.
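One way to verify this in R is to enumerate the 36 equally likely outcomes explicitly (a small sketch using base R only):

    s <- expand.grid(die1 = 1:6, die2 = 1:6)  # all 36 outcomes of rolling two dice
    mean(s$die1 + s$die2 == 7)                # proportion with sum 7: 6/36 = 0.1667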

Que 12 Example 12A: 100 people bought tickets in a charity raffle. 60 of them bought the tickets
because they supported the charity. 75 bought tickets because they liked the prize. No one who
neither supported the charity nor liked the prize bought a ticket.
(a) What is the probability that the prize-winning ticket was bought by someone who liked the
prize?
(b) What is the probability that the prize was won by someone who did not support the charity?
(c) What is the probability that the prize was won by someone who both supported the charity
and liked the prize?

Ans 12 Solution 12: (a) Let A and B be the events “liked the prize” and “support the charity”
respectively. To find Pr(A), we apply the three steps as follows:

1. N = 100
2. n(A) = 75
3. Therefore Pr(A) = n(A)/N = 75/100 = 0.75.

(b) To find Pr(B̄), the probability that the winner did not support the charity, we only have to go through steps 2 and 3.

2. n(B̄) = 100 − 60 = 40
3. Thus Pr(B̄) = n(B̄)/N = 40/100 = 0.4

(c) We now need Pr(A ∩ B):

1. n(A ∩ B) = 60 + 75 − 100 = 35
2. Pr(A ∩ B) = n(A ∩ B)/N = 35/100 = 0.35.

• Example 13: A pack of playing cards contains 52 cards, 13 belonging to each of the four suits
“Spades”, “Hearts”, “Diamonds” and “Clubs”. Within each suit the 13 cards are labelled: Ace,
2,. . . , 10, Jack, Queen, King. Let D be the event that a randomly selected card is a diamond, and
K be the event that the card is a king, and B be the event that the card has one of the numbers
from 2 to 10. Find Pr(D), Pr(K), Pr(B), Pr (D ∩ K), P r(B ∪ D), P r(B ∩ K) and Pr(D ∪ K ∪ B).

• Example 14: The seats of a jet airliner are arranged in 55 rows (numbered 1 to 55) of 10 seats
(lettered A to K, leaving out I). In each row, seats C, D, G and H are on aisles, and A and K
are window seats. Smoking is permitted in rows 45 to 55 inclusive. If a passenger is assigned a
seat at random, what is the probability of being allocated

(a) an aisle seat?


(b) a seat in the smoking section?
(c) a window seat in the non-smoking section?
(d) a window seat in row 1?

5.2 Independent events


Definition 2.3. Two events A and B of a sample space S are called independent if and only if

P (A ∩ B) = P (A) × P (B). (1)

Example: The following diagram shows two events A and B in the sample space S. Are the events A
and B independent?
Answer: There are 10 black dots in S and event A contains 4 of these dots. So the probability
of A is P(A) = 4/10. Similarly, event B contains 5 black dots, so P(B) = 5/10, and A ∩ B contains
2 dots, so P(A ∩ B) = 2/10. The conditional probability of A given B is

    P(A|B) = P(A ∩ B) / P(B) = (2/10) / (5/10) = 2/5

Since P(A|B) = 2/5 = P(A), and equivalently P(A ∩ B) = 2/10 = P(A) × P(B), the events A and B are independent.
The intuitive feeling is that independent events have no effect upon each other. But how do we
decide whether two events A and B are independent? If the occurrence of event A has nothing to do
with the occurrence of event B, then we expect the conditional probability of B given A to be the
same as the unconditional probability of B:

Pr(B|A) = Pr(B).

The information that event A has occurred does not change the probability of B occurring. If
P r(B|A) = P r(B), then, using the definition of conditional probability,
    Pr(A ∩ B) / Pr(A) = Pr(B).
or
Pr(A ∩ B) = Pr(A) × Pr(B).
This leads us to definition of independent events.
Events A and B are independent if

Pr(A ∩ B) = Pr(A) × Pr(B).

In words, the probability of the intersection of independent events is the product of their individual
probabilities.

Remark The definition can be extended to the independence of a series of events i.e. Multipli-
cation of series of probabilities: if events A1 , A2 , . . . , An are independent, then

    Pr(A1 ∩ A2 ∩ . . . ∩ An ) = Pr(A1 ) × Pr(A2 ) × . . . × Pr(An ).

• Example 17A: Two coins are tossed. Using the classical definition of probability,
P(both tosses heads) = 1/4.
Assuming that both coins are unbiased, P(1st coin is heads) = P(2nd coin is heads) = 12 .
Hence P(1st coin is heads) × P(2nd coin is heads) = 12 × 21 = 14 = P(both tosses heads), the
events “heads on the first toss” and “heads on the second toss” are independent

• Example 17B: A coin is tossed and a single 6 sided die is rolled. Find the probability of
“heads” and rolling a 3 with the die.
Solution: P(head) = 1/2 and P(3) = 1/6. Since the results of the coin and the die are
independent, P(heads and 3) = P(heads) × P(3) = (1/2) × (1/6) = 1/12.

• Example 18: A school survey found that 9 out of 10 students like pizza. If three students are
chosen at random with replacement, what is the probability that all three students like pizza?
Solution:
P(student 1 likes pizza) = 9/10 = P(student 2 likes pizza) = P(student 3 likes pizza).
P(student 1 likes pizza and student 2 likes pizza and student 3 likes pizza)
= P(student 1 likes pizza) × P(student 2 likes pizza) × P(student 3 likes pizza)

    = (9/10)^3 = 0.729

• Example 19: The probability that person A will be alive in 20 years is 0.7 and the probability
that person B will be alive in 20 years is 0.5, while the probability that they will both be alive
in 20 years is 0.45. Are the eventsE1 “A is alive in 20 years” and E2 “B is alive in 20 years”
independent?
Solution:
P (E1 ) = 0.7, P (E2 ) = 0.5, P (E1 ∩ E2 ) = 0.45
Since P(E1) × P(E2) = 0.7 × 0.5 = 0.35 ≠ P(E1 ∩ E2) = 0.45, the events E1 and E2 are not independent.

• Example 20: Let A be the event that a microchip is manufactured perfectly. Let B be the event
that the chip is installed correctly. If Pr(A) = 0.98 and Pr(B) = 0.93 what is the probability
that the installed chip functions perfectly?
Solution: We require P r(A ∩ B). Because manufacture and installation may be considered
independent, we have:

Pr(A ∩ B) = Pr(A) × Pr(B) = 0.98 × 0.93 = 0.9114

• Example 22: A four-engined plane can land safely even if three engines fail. Each engine fails,
independently of the others, with probability 0.08 during a flight. What is the probability of
making a safe landing?
Solution: Let Ai be the event that engine i fails. Then the event “safe landing” is the
complement of the event “all engines fail”, i.e. the complement of (A1 ∩ A2 ∩ A3 ∩ A4 ).

    Pr(safe landing) = 1 − Pr(A1 ∩ A2 ∩ A3 ∩ A4 ) = 1 − (Pr(A1 ) × Pr(A2 ) × Pr(A3 ) × Pr(A4 ))
    = 1 − 0.08^4 = 0.99995904.

Quite safe! On average, about one flight in 24 414 will crash.
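A quick sketch in R of the multiplication-rule calculations in Examples 18 and 22 (base R only):

    (9/10)^3     # Example 18: all three students like pizza, 0.729
    1 - 0.08^4   # Example 22: probability of a safe landing, 0.99995904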
Example 23 The probability that the Naira will weaken against the dollar tomorrow is 0.53.
The probability that you will wake up late tomorrow is 0.42.

(a) What is the probability that, tomorrow, the Naira weakens against the dollar and you wake
up late?
(b) What is the probability that, tomorrow, the Naira weakens against the dollar or you wake
up late?

• Example 24: Some financial academics argue that the day-to-day movements of share prices
are statistically independent. Assume, hypothetically, that the share De Beers has a probability
of 0.55 of rising on any given trading day. What is the probability that it rises on three successive
trading days?

5.3 Exercise
• Ex. 20: There are 33 candidates for an election to a committee of three. What is the probability
that Jones, Smith and Brown are elected?

• Ex. 21: A group of eight students fill the front row at Statistics lectures daily. They decide to
keep attending lectures until they have exhausted every possible arrangement in the front row.
For how many days will they attend lectures?

• Ex. 22: A young investor is considering the purchase of a portfolio of three shares from the
“Building and construction” sector of the stock exchange. He chooses three shares at random
from the 25 shares currently listed in this sector.

(a) How many ways can shares be selected for the portfolio?
(b) What is the probability that Everite, Grinaker and L.T.A. (three shares in this sector) are
selected?
(c) What is the probability that Grinaker is one of the selected shares?

• Ex. 23: A firm of speculative builders has bought three adjoining plots. The company builds
houses in seven styles. It is concerned about the visual appearance of the houses from the street.
So they ask their drafting section to sketch all possible selections of street views.

(a) How many sketches are required if (i) no repetitions of styles are allowed, and if (ii) they
allow repetitions of styles?
(b) If they choose one sketch at random from those in part (a)(ii), what is the probability that
all the houses will be of different styles?
(c) In order to determine the materials required, the quantity surveying department is con-
cerned only with the three styles which might be built (and not on which plot they are
built on). How many combinations of styles must they be prepared for if (i) no repetitions
of styles are allowed, and if (ii) repetitions are allowed?

• Ex. 24: Two new computer codes are being developed to prevent unauthorized access to
classified information. The first consists of six digits (each chosen from 0 to 9); the second
consists of three digits (from 0 to 9) followed by two letters (A to Z, excluding I and O).

(a) Which code is better at preventing unauthorized access (defined as breaking the code in
one attempt)?
(b) If both codes are implemented, the first followed by the second, what is probability of
gaining access in a single attempt?

• Ex. 25: A housewife is asked to rank five brands of washing powder (A, B, C, D, E) in order
of preference. Suppose that she actually has no preference, and her ordering is arbitrary. What
is the probability that

(a) brand A is ranked first?


(b) brand C is ranked first and brand D is ranked second?

• Ex. 26: A and B are events such that Pr(A) = 0.6, Pr(B | A) = 0.3, and Pr(A∪ B) = 0.72.
Are A and B independent, mutually exclusive, or both?

• Ex. 27: If the probability is 0.001 that a 20-watt bulb will fail a 10-hour test, what is the
probability that a sign constructed of 1000 bulbs will burn for 10 hours
(a) with no bulb failure?
(b) with one bulb failure?
(c) with k bulb failures?

• Ex. 28: Show that if events A and B are independent, then the following pairs of events are
also independent.
(a) A and B̄
(b) Ā and B̄.

• Ex. 29: The events A,B and C are such that A and B are independent and B and C are mu-
tually exclusive. Their probabilities are Pr(A) = 0.3, Pr(B) = 0.4, and Pr(C) = 0.2. Calculate
the probabilities of the following events.
(a) Both B and C occur.
(b) At least one of A and B occurs.
(c) B does not occur.
(d) All three events occur.
(e) (A ∩ B) ∪ C.

5.4 Conditional Probability


Conditional probability provides a method for updating or revising a probability in the light
of new information. On Monday, the weather forecaster might say the probability of rain on
Thursday is 50% (and insist on giving probabilities as percentages!). On Tuesday he might revise
this probability in the light of additional information to 70%. Later, with even more reliable information, he might
say 60%. The point is that each forecast was conditional on the information available up to that point in time. The
conditional probability of an event A occurring given that another event B has occurred is given by

    Pr(A|B) = Pr(A ∩ B) / Pr(B),   where P(B) > 0    (2)

Also

    Pr(B|A) = Pr(A ∩ B) / Pr(A),   where P(A) > 0    (3)

The conditional probability Pr(B | A) may be thought of as a reassessment of the probability of


B given the information that some other event A has occurred.
Example 25 : Five hundred (500) TV viewers consisting of 300 males and 200 females were asked
whether they were satisfied with the news coverage on a certain TV channel. Their replies are sum-
marized in the table below.

Gender Satisfied Not Satisfied Total


Male 180 120 300
Female 90 110 200
Total 270 230 500

Solution 25
    P(satisfied | male) = 180/300 = 0.6
    P(satisfied | female) = 90/200 = 0.45
    P(not satisfied | male) = 120/300 = 0.4
    P(not satisfied | female) = 110/200 = 0.55
    P(satisfied) = 270/500 = 0.54 and P(not satisfied) = 230/500 = 0.46
Note

1) When calculating a conditional probability the sample space is restricted to that associated with
the event that is known to occur.

2) The probability of a person being satisfied depends on the gender of the person being interviewed.
In this case females are less satisfied than males with the news coverage.

3) In the example above, P(satisfied) and P(not satisfied) are known as marginal probabilities.
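As an illustrative sketch in R, the conditional and marginal probabilities can be read directly from the table of counts; the matrix layout below is just one convenient way to hold the survey data, not part of the original example:

    tab <- matrix(c(180, 120, 90, 110), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Male", "Female"), c("Satisfied", "NotSatisfied")))
    tab["Male", "Satisfied"] / sum(tab["Male", ])      # P(satisfied | male)   = 0.60
    tab["Female", "Satisfied"] / sum(tab["Female", ])  # P(satisfied | female) = 0.45
    sum(tab[, "Satisfied"]) / sum(tab)                 # marginal P(satisfied) = 0.54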

• Example 26 : At a certain university the probability of passing accounting is 0.68, the prob-
ability of passing statistics 0.65 and the probability of passing both statistics and accounting is
0.57. Calculate the probability that a student:
(a) passes statistics when it is known that he/she passed accounting.
(b) passes accounting when it is known that he/she passed statistics.
(c) passes statistics when it is known that he/she did not pass accounting.

• Solution 26: Let A denote the event “a student passes accounting” and B the event “a student
passes statistics”.
Then Ā is the event “a student did not pass accounting”,
A ∩ B the event “a student passes both statistics and accounting” and
Ā ∩ B the event “a student passes statistics, but not accounting”.
Given: P(A) = 0.68, P(B) = 0.65, P (A ∩ B) = 0.57.
(a)

    Pr(B|A) = Pr(A ∩ B) / Pr(A) = 0.57 / 0.68 = 0.838

(b)

    Pr(A|B) = Pr(A ∩ B) / Pr(B) = 0.57 / 0.65 = 0.877

(c) P(B|Ā) = ?
(d) P (A|B) =?

5.5 Bayes’s Theorem


For any two events A and B there are two conditional probabilities that can be considered:

    Pr(B|A) = Pr(A ∩ B) / Pr(A)    (4)

    Pr(A|B) = Pr(A ∩ B) / Pr(B)    (5)

A very useful tool for finding conditional probabilities is Bayes’ theorem, which connects Pr(B|A)
with Pr(A|B), named in honour of Rev. Thomas Bayes, who did pioneering work in probability
theory in the 1700’s.
Bayes’ Theorem. If A and B are two events, then

    Pr(A|B) = Pr(A ∩ B) / Pr(B)
            = Pr(B|A)Pr(A) / [Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)]

Proof: Recall the definition of conditional probability

    Pr(A|B) = Pr(A ∩ B) / Pr(B)

and Theorem 2 (the total probability rule)

    Pr(B) = Pr(A ∩ B) + Pr(Ā ∩ B)

Substituting, we have

    Pr(A|B) = Pr(A ∩ B) / [Pr(A ∩ B) + Pr(Ā ∩ B)]

Also, we note that

    Pr(A ∩ B) = Pr(B|A)Pr(A)

and

    Pr(Ā ∩ B) = Pr(B|Ā)Pr(Ā)

Therefore

    Pr(A|B) = Pr(B|A)Pr(A) / [Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)].
Example 27: A television manufacturer cannot produce the full quota of tubes it requires, so it
purchases 20% of its needs from an outside supplier. The quality manager has determined that 6% of
the tubes produced in house are defective, and that 8% of the purchased tubes are defective. He finds
the tube of a randomly selected television to be defective. What is the probability that the tube was
produced by the company.
Solution 27:

• Let D be the event “tube defective”,

• C be the event “produced by the company”.

We are given Pr(D|C) = 0.06, Pr(D|C̄) = 0.08 and Pr(C) = 0.8. We need to determine Pr(C|D).
By Bayes’ Theorem

    Pr(C|D) = Pr(D|C)Pr(C) / [Pr(D|C)Pr(C) + Pr(D|C̄)Pr(C̄)]
            = (0.06 × 0.8) / (0.06 × 0.8 + 0.08 × 0.2) = 0.75
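A short R sketch of the Bayes calculation above (base R only; the variable names are ours):

    p_C  <- 0.8    # proportion of tubes produced by the company
    p_DC <- 0.06   # P(defective | company tube)
    p_DS <- 0.08   # P(defective | supplier tube)
    (p_DC * p_C) / (p_DC * p_C + p_DS * (1 - p_C))   # P(company | defective) = 0.75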
Example 28: When testing a person for a certain disease, the test can show either a positive result
(the person has the disease) or a negative result (the person does not have the disease). When a
person actually has the disease, the test shows positive 99% of the time. When the person actually
does not have the disease the test shows negative 95% of the time. Suppose it is known that only
0.1% of the people in the population have the disease.

(a) If a test turns out to be positive, what is the probability that the person has the disease?

(b) If the test turns out to be negative, what is the probability that the person does not have the
disease?

Solution 28:
Let A = the person has the disease and B = the test returns a positive result. Then
Ā is the event the person does not have the disease,
B|A is the event “the test is positive given the person has the disease”,
B|Ā is the event “the test is positive given the person does not have the disease” and
B̄|Ā is the event “the test is negative given the person does not have the disease”.

(a) We are given Pr(A) = 0.001, Pr(B|A) = 0.99 and Pr(B̄|Ā) = 0.95, then
Pr(Ā) = 1 − Pr(A) = 0.999.
Similarly, Pr(B|Ā) = 1 − Pr(B̄|Ā) = 0.05.
Substituting into the formula above, we obtain

(Numerator:) P r(A ∩ B) = P r(B|A)P r(A) = 0.001 × 0.99 = 0.00099


Denominator:

    Pr(B) = Pr(B ∩ A) + Pr(B ∩ Ā)
          = Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)
          = (0.001 × 0.99) + (0.999 × 0.05)
          = 0.00099 + 0.04995 = 0.05094

Therefore
    Pr(A|B) = Pr(A ∩ B) / Pr(B) = 0.00099 / 0.05094 = 0.0194

The result can be interpreted as the chances that a person will have the disease when the result
of the test shows positive is 194 in 10 000.

(b)

    Pr(Ā|B̄) = Pr(Ā ∩ B̄) / Pr(B̄)
             = Pr(Ā)Pr(B̄|Ā) / (1 − Pr(B)) = (0.999 × 0.95) / 0.94906 = 0.9999895

From the above it can be seen that a negative result of the test is very reliable (it will be wrong
only 105 times in 10 million cases).

Example 29: The probabilities of producing a defective item with three machines M1 , M2 , M3
are 0.1, 0.08 and 0.09, respectively. At any instant, only one of the machines is being operated, in
the following percentage of the daily work, respectively: 30%, 30%, 40%. An item is randomly chosen
and found to be defective. Which machine most probably produced it?
Solution 29: Denoting the defective item by A, the total probability breaks down into:

    P(M1)P(A|M1) = 0.3 × 0.10 = 0.030
    P(M2)P(A|M2) = 0.3 × 0.08 = 0.024
    P(M3)P(A|M3) = 0.4 × 0.09 = 0.036

Therefore the total probability is P(A) = 0.09, and using Bayes’ Theorem we obtain:
P(M1|A) = 0.33, P(M2|A) = 0.27, P(M3|A) = 0.40. The machine that most probably produced the
defective item is M3. Notice that Σk P(Mk) = 1 and Σk P(Mk|A) = 1.
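The same calculation can be carried out for all three machines at once in R; the labels M1, M2, M3 are just names used in this sketch:

    prior <- c(M1 = 0.3, M2 = 0.3, M3 = 0.4)   # P(machine k operating)
    lik   <- c(0.10, 0.08, 0.09)               # P(defective | machine k)
    posterior <- prior * lik / sum(prior * lik)
    round(posterior, 2)                        # 0.33 0.27 0.40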
Example 30: If we randomly pick two television sets in succession from a shipment of 240 tele-
vision sets of which 15 are defective, what is the probability that they will be both defective?
Answer: Let A denote the event that the first television picked was defective. Let B denote the event
that the second television picked was defective. Then A ∩ B will denote the event that both televisions
picked were defective. Using the conditional probability, we can calculate

    P(A ∩ B) = P(A)P(B|A)
             = (15/240) × (14/239)
             = 7/1912.
Comment: Here, we assume that we are sampling without replacement.
Example 31: A box of fuses contains 20 fuses, of which 5 are defective. If 3 of the fuses are selected
at random and removed from the box in succession without replacement, what is the probability that

all three fuses are defective?


Answer: Let A be the event that the first fuse selected is defective. Let B be the event that the
second fuse selected is defective. Let C be the event that the third fuse selected is defective. The
probability that all three fuses selected are defective is P r(A ∩ B ∩ C). Hence

    Pr(A ∩ B ∩ C) = P(A)P(B|A)P(C|A ∩ B)
                  = (5/20) × (4/19) × (3/18)
                  = 1/114.
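As a check in R, the product of the three conditional probabilities agrees with the count obtained from combinations:

    (5/20) * (4/19) * (3/18)       # sequential conditional probabilities: 1/114
    choose(5, 3) / choose(20, 3)   # same answer counting unordered selections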

5.6 Exercise
EX1. You feel ill at night and stumble into the bathroom, grab one of three bottles in the dark and
take a pill. An hour later you feel really ghastly, and you remember that one of the bottles
contains poison and the other two aspirin. Your handy medical text says that 80% of people
who take the poison will show the same symptoms as you are showing, and that 5% of people
taking aspirin will have them.
Let B be the event “having the symptoms”, A be the event “taking the poison”, Then Ā is the
event “taking aspirin”.
What is the probability that you took the poison given that you have got the symptoms, i.e.
what is P r(A|B)?

EX2. A well is drilled as part of an oil exploration programme. The probability of the well passing
through shale is 0.4. If the well passes through shale, the probability of striking oil is 0.3. If it
does not pass through shale, the probability drops to 0.1.

(a) Given that oil was found, what is the probability that it did not pass through shale?
(b) Given that oil was not found, what is the probability that it passed through shale?

EX3. A family has two dogs (Rex and Rover) and a cat called Garfield. None of them is fond of the
postman. If they are outside, the probabilities that Rex, Rover and Garfield will attack the
postman are 30%, 40% and 15%, respectively. Only one is outside at a time, with probabili-
ties 10%, 20% and 70%, respectively. If the postman is attacked, what is the probability that
Garfield was the culprit?

Hint (the extended Bayes’ theorem): Let
A1 , A2 , . . . , An be a set of mutually exclusive and exhaustive events in S. Let B be any other
event. Then
    Pr(Ai|B) = Pr(B|Ai)Pr(Ai) / [Pr(B|A1)Pr(A1) + Pr(B|A2)Pr(A2) + . . . + Pr(B|An)Pr(An)].

EX4. Suppose that a fashion shirt comes in three sizes and five colours. The three sizes (and the per-
centage of the population who purchase each size) are: small (30%), medium (50%), and large
(20%). Market research indicates the following colour preferences: white (6%), blue (26%), green
(36%), and so on. The management of a store expects to sell 1000 of these shirts. How many shirts of each size and
colour should they order? Assume independence.

EX5. The probability of passing Statistics without doing these exercises is 0.1 and 0.8 if they are done.
If 60% of students do these exercises, what is the probability that a student has not done the
exercises if he passes Statistics?

EX6. Which of the following pairs of events would you expect to be independent, which mutually
exclusive and which neither?
(a) studying Economics and being left-handed,
(b) owning a dog and paying vet’s bills,

(c) the prices of shares in Anglovaal and Gold Fields (both in the mining house sector of the
Johannesburg Stock Exchange) both rising today,
(d) being a member of the Canoe Club and studying for a B.A.,
(e) buying sugar-free cooldrink and buying a cream doughnut for yourself.

EX7. An X-ray test is used to detect a disease that occurs, initially without any obvious symptoms, in
3% of the population. The test has the following error rates: 7% of people who are disease free
have a positive reaction and 2% of the people who have the disease have a negative reaction. A
large number of people are screened at random using the test, and those with a positive reaction
are examined further.

– (a) What proportion of people who have the disease are correctly diagnosed?
– (b) What proportion of people with a positive reaction actually have the disease?
– (c) What proportion of people with a negative reaction actually have the disease?
– (d) What proportion of the tests conducted give the incorrect diagnosis?
6 Chapter Four : Random Variables
Random variables fall into two categories — discrete and continuous. The mathematical treatment
of these two types of random variables is very different - as you will learn from the remainder of this
section.
Discrete random variables take on isolated values along the real line, usually (but by no means
always) integer values. Examples of integer-valued discrete random variables are:
• the number of customers entering a store between 09h00 and 10h00
• the number of occupied tables at a restaurant
• the number of clients visited by a salesperson during a day
• the number of applicants who respond to a job advertisement
In contrast to discrete random variables, a continuous random variable can (conceptually, at least)
be measured to any degree of accuracy; i.e. between every two possible values x1 and x2 that the
random variable can assume, there is another possible value x3 , between x1 and x2 . The set of all
possible values of a continuous random variable is usually an interval of the real line. Examples of
continuous random variables are:
• the distance a car travels on one litre of petrol
• the proportion of gold in a sample of ore
• the volume of milk that actually goes into a nominally one litre carton
• the time that a customer waits in the queue at a fast food outlet
• the direction of the wind at midday.
Example 6 Which of the following are random variables? Which of the random variables are
continuous and which are discrete? Write down the set of values that each random variable can take
on.
(a) The number of customers arriving at a supermarket during the morning.
(b) The number of letters in the Greek alphabet.
(c) The opening price of gold in New York on Monday next week.
(d) The number of seats that will be sold for a performance of a play in a theatre with a capacity
of 328.
(e) The length of time you have to wait at an autobank.
(f) The ratio between the circumference and the diameter of a circle.
(g) The last digit of a randomly selected telephone number.
The distinction between discrete and continuous random variables is critical because we develop dif-
ferent mathematical approaches for the two types of random variable.
Definition: Let X be a random variable with space RX and probability density function f(x). The
mean µX of the random variable X is defined as

    µX = Σ_{x∈RX} x p(x)           if X is discrete
    µX = ∫_{−∞}^{∞} x f(x) dx      if X is continuous        (6)
if the right hand side exists.
The p(x) is called the probability mass function (pmf) and f (x) is the probability density function
(pdf) for the discrete and continuous random variable respectively.
A function p(x) is called a probability mass function (frequently abbreviated to p.m.f.) if it satisfies
the conditions PMF1, PMF2 and PMF3.
PMF1: p(x) is defined for all values of x, but p(x) ≠ 0 only at a finite or “countably infinite” set of
points.

PMF2: all values of p(x) lie in the unit interval [0, 1], that is 0 ≤ p(x) ≤ 1.

PMF3: Σ p(x) = 1, where the sum is taken over all values of x for which p(x) ≠ 0.

The mean of a random variable is a composite of its values weighted by the corresponding prob-
abilities. The mean is a measure of central tendency: the value that the random variable takes “on
average.” The mean is also called the expected value of the random variable X and is denoted by
E(X). The symbol E is called the expectation operator. The expected value of a random variable
may or may not exist.
Note 1: In the case of a discrete variable, the mean or expected value of a random variable X is
represented by E(X) = µ and is calculated by

    E(X) = µ = Σ x p(x)

and the variance by

    Var(X) = σ² = Σ (x − µ)² p(x) = Σ x² p(x) − µ²

Note 2: In the case of a continuous variable, the mean or expected value of a random variable
X is represented by E(X) = µ and is calculated by

    E(X) = µ = ∫_a^b x f(x) dx

and the variance by

    Var(X) = σ² = ∫_a^b (x − µ)² f(x) dx = ∫_a^b x² f(x) dx − µ²

• Example 001: An unbiased die is rolled and the random variable X consists of the number
of dots appearing on the upturned face. Find the probability mass function for this random
variable. Verify the properties PMF1, PMF2 and PMF3 above.

• Example 002:

a) Check that the function

    p(x) = x/15    x = 1, 2, 3, 4, 5
         = 0       otherwise

satisfies the conditions for being the probability mass function of some random variable X.
Sketch the function, p(x)
b) Find Pr[2 ≤ X ≤ 4].
c) Find Pr[X ≥ 4].

• Example 003:

a) For what value of k will

    p(x) = k/x!    x = 0, 1, 2, 3, 4
         = 0       otherwise
satisfy the conditions for being a probability mass function of some random variable X?
b) Find Pr[X ≤ 2].
X 0 1 2 3
p(x) 1/8 3/8 3/8 1/8

• Example 004: Let X be a random variable of number of tails when a coin is tossed 3 times.
Find the expected value of the random variable X and the variance:
Solution
Let the number of tails be 0, 1, 2 or 3; the probability distribution table is given
above.
Using the information in the table:

(a.) Find the probability of getting 2 or more tails .


    P(X ≥ 2) = P(X = 2) + P(X = 3) = 3/8 + 1/8 = 1/2.

(b.) Find the probability of getting at least one tail .


    P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) = 3/8 + 3/8 + 1/8 = 7/8.

(c.) Find the expected value and variance


Expected value: E(X) = µ = Σ x p(x) = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8
    = 0 + 3/8 + 6/8 + 3/8
    = 12/8 = 1.5.

Variance
    Var(X) = σ² = Σ x² p(x) − µ²
    σ² = {0² × 1/8 + 1² × 3/8 + 2² × 3/8 + 3² × 1/8} − (12/8)²
       = 3 − 1.5² = 0.75
Standard deviation:
    σ = √0.75 = 0.866
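A small R sketch of the calculation in Example 004 (base R only):

    x <- 0:3
    p <- c(1, 3, 3, 1) / 8
    mu     <- sum(x * p)            # expected value: 1.5
    sigma2 <- sum(x^2 * p) - mu^2   # variance: 0.75
    sqrt(sigma2)                    # standard deviation: 0.866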

(EX.) Consider a discrete random variable with probability mass function given below.

x 1 2 3 4
P(X=x) 0.1 0.3 0.4 0.2

Find the mean , variance, and standard deviation




13C The sample space, numerical values for the elementary events and their associated
probabilities are:
X           =  2     3     4     5     6     7     8     9     10    11    12
Probability =  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Y     =  1     2     3     4     5     6     8     9     10    12    15    16    18    20    24    25    30    36
Pr(Y) =  1/36  2/36  2/36  3/36  2/36  4/36  2/36  1/36  2/36  4/36  2/36  1/36  2/36  2/36  2/36  1/36  2/36  1/36

Pr[X ≥ 10] = 0.1667, Pr[Y ≥ 13] = 13/36.

14C (c) 0.875

15C (a), (c) and (d) are probability mass functions, but (b) is not, because
p(1) = −0.2 < 0.

16C (a) k = 24/65 (b) 48/65

17C The probability mass function is

    p(x) = 0.1 × 0.9^x    x = 0, 1, 2, . . .
         = 0              otherwise

25C (a) k = 3/20 (b) Pr[2 < X < 3] = 4/5

26C (a), (b), (f) and (g) are probability mass functions, (c), (d), (e) and (h) are prob-
ability density functions.

27C (b) 7/16 (c) A = 0.5412

28C −1 ≤ A ≤ 1 (otherwise probabilities are either negative or greater than one).

29C
p(x) = 0.1 × 0.5 = 0.05 x=0
= 0.2 × 0.5 + 0.1 × 0.5 = 0.15 x=1
= 0.2 × 0.5 + 0.5 × 0.5 = 0.35 x=2
= 0.5 × 0.5 + 0.2 × 0.5 = 0.35 x=3
= 0.2 × 0.5 = 0.1 x=4
=0 otherwise

Exercises. . .
∗ 4.1 Which of the following random variables are discrete, and which are continuous?
(a) the time required to answer this question
(b) the number of words in a book chosen at random from the library
(c) the number of “heads” in 6 flips of a coin
(d) the number of goals scored in a soccer match
(e) the maximum temperature recorded at Cape Town International Airport to-
day
(f) the volume of air breathed in by an individual when asked to “take a deep
breath”
(g) the annual income to the nearest cent of a randomly chosen wage-earner
(h) the population of a randomly chosen town in the Free State
(i) the length of time you have to wait for a bus


(j) the amount of rain that falls in a day.


∗ 4.2 Check which of the following functions can serve as probability mass functions or
probability density functions.
(a)
p(x) = x/6 x = 1, 2, 3
=0 otherwise
(b)
    f(x) = (3/10)(2 − x²)    −1 ≤ x ≤ 1
         = 0                 otherwise
(c)
    p(x) = x    x = 1/16, 3/16, 1/4, 1/2
(d)
    f(x) = 2x/3    −1 < x < 2
         = 0       otherwise
(e)
    f(x) = 1/4    3 < x < 7
         = 0      otherwise
(f)
    p(x) = x / (n(n + 1)/2)    x = 1, 2, . . . , n
         = 0                   otherwise
4.3 Show that the following functions are probability mass functions.
(a)
    p(x) = e^(−2) 2^x / x!    x = 0, 1, 2, . . .
         = 0                  otherwise
(b)
    p(x) = (5 choose x) (1/4)^x (3/4)^(5−x)    x = 0, 1, 2, 3, 4, 5
         = 0                                   otherwise
4.4 Show that the following functions are probability density functions.
(a)
    f(x) = 1 / (2√x)    0 < x < 1
         = 0            otherwise
(b)
    f(x) = (1/4) x e^(−x/2)    0 ≤ x < ∞
         = 0                   otherwise
∗ 4.5 What must the value of k be so that the following functions are probability density
functions?
(a)
    f(x) = kx²(1 − x)    0 < x < 1
         = 0             otherwise
(b)
    f(x) = k e^(−4x)    0 ≤ x < ∞
         = 0            otherwise


4.6 A random variable X has probability density function given by

    f(x) = Ax³    0 ≤ x ≤ 10
         = 0      otherwise

Find A. What is the probability that X lies between 2 and 5, and what is the
probability that X is less than 3? Sketch the density function.
∗ 4.7 A random variable X has probability density function

    f(x) = e^(−x)    0 ≤ x < ∞
         = 0         otherwise

Find the number t such that Pr[X < t] = 1/2.

4.8 For the probability density function

f (x) = 2x 0 ≤ x ≤ 1
= 0 otherwise

find the number a such that the probability that X < a is three times the proba-
bility that X ≥ a.

4.9 If f(x) = 3x² for 0 < x < 1, and zero elsewhere, find the number b such that X
is equally likely to be greater than, or less than b.
∗ 4.10 The probability density function of the life in hours X of a certain kind of radio
tube is found to be
    f(x) = 100/x²    x > 100
         = 0         otherwise
Three such tubes are bought for a radio set. What is the probability that none
will have to be replaced during the first 150 hours of operation?

4.11 A batch of small-calibre ammunition is accepted as satisfactory if none of a sample


of five shots fall more than 8 cm from the target at a given range. If X, the distance
from the centre of the target to an impact point, has probability density function

    f(x) = x e^(−x)    0 ≤ x < ∞
         = 0           otherwise

find the probability that a batch is accepted.

Further exercises. . .
4.12 A continuous random variable X has probability density function

f (x) = k a ≤ x ≤ b
= 0 otherwise

for arbitrary constants a and b. Find the value of k.

4.13 Find values for c so that the following functions may serve as probability density
functions:


(a)
    f(x) = c + e^(−x)    0 ≤ x ≤ 1
         = 0             otherwise
(b)
    f(x) = x + 1/2    0 ≤ x ≤ c
         = 0          otherwise
∗ 4.14 The density function for a random variable X is given by

    f(x) = (3/4)(kx − x²)    0 ≤ x ≤ 2
         = 0                 otherwise

(a) Determine the value of k.


(b) Calculate Pr[0 < X < 1].
∗ 4.15 (a) Check whether
    p(x) = (5 choose x)(4 choose 2−x) / 36    x = 0, 1, 2
         = 0                                  elsewhere
is a probability mass function.
(b) Calculate Pr[X = 0], Pr[X = 1] and Pr[X = 2].

Solutions to exercises. . .
4.1 (b) (c) (d) (g) and (h) are discrete
(a) (e) (f) and (i) are continuous.
(j) is an unusual example of a mixed continuous and discrete random variable:
although the random variable is, at face value, continuous, it cannot be modelled
by a conventional probability density function because the probability of no rain
in a day is not zero but positive. The probability function for X needs to be
something like
p(x) = p x=0
= f (x) x > 0
=0 otherwise
with the “probability density function” f (x) integrating to 1 − p.

4.2 (a) (b) (e) and (f) satisfy conditions.


For (c), p(x) is not defined for all X.
For (d), f (x) < 0 for −1 < x < 0.

4.5 (a) k = 12 (b) k = 4

4.6 A = 1/2500, Pr[2 < X < 5] = 0.0609, Pr[X < 3] = 0.0081.

4.7 0.6931

4.8 0.8660

4.9 0.7937

4.10 8/27

4.11 0.9850


4.12 k = 1/(b − a)

4.13 (a) c = e^(−1) (b) c = 1


4.14 (a) k = 2 (b) 1/2

4.15 Pr[X = 0] = 6/36, Pr[X = 1] = 20/36, Pr[X = 2] = 10/36



7 Probability Distributions
7.1 Discrete Probability Distributions

Chapter 5
PROBABILITY DISTRIBUTIONS I:
THE BINOMIAL, POISSON,
EXPONENTIAL AND NORMAL
DISTRIBUTIONS

KEYWORDS: Binomial, Poisson, exponential and normal distribu-


tions.

A number of probability mass and density functions have proved themselves useful as
“models” for a large variety of practical problems in business and elsewhere. We consider
four of the most frequently encountered probability distributions in this chapter — the
Binomial, Poisson, Exponential and Normal Distributions.

The binomial distribution . . .


The binomial distribution may be used as a probability model in situations in
which the following conditions are satisfied:

1. We have a random experiment which has a sample space with exactly two out-
comes, one of which we can label “success”, and the other “failure”: i.e. S =
{success, failure}.
e.g. A door-to-door salesperson calls on a prospective client — the client either
purchases the product (success) or does not purchase (failure).

2. The random experiment is repeated n times, n ≥ 1. The outcome on any one


repetition is not influenced by the outcome on any other repetition. We say “we
have n independent trials of the random experiment”.
e.g. The salesperson calls on n = 6 prospective clients — the clients make their
purchasing decisions independently (there is no communication between them!).

3. The probability of success remains constant from trial to trial. We assume that
each client is equally likely to purchase the product. Let Pr(success) = p; thus
Pr(failure) = 1−p. It is sometimes convenient to let q = 1−p, so that Pr(failure) =
q.

Our random variable X is the number of successes we observe in n trials. If the


conditions above are satisfied, then we say that we have a binomial process, and that



the random variable X has a binomial distribution. In the above example, X is the
number of calls that resulted in sales. Because 6 calls were made, X must assume one
of the values 0, 1, . . . , 6, and X is therefore an example of a discrete random variable.
Binomial processes occur in many contexts. From an industrial or commercial per-
spective, one of the most important binomial processes occurs in the field of quality. The
quality of a product or service, whether it is a tomato, a nail, a personal computer, a
car, an insurance policy or the punctuality of a train, can be classified as “satisfactory”
or “defective”. In particular, the binomial probability distribution provides the basis for
deciding whether or not a consignment of goods meets the desired specifications.

BINOMIAL DISTRIBUTION
In a binomial process, we have n independent trials, each trial has two
outcomes, success or failure, and Pr[success] = p for all trials. Let the
random variable X be the number of successes in n trials.
Then X has the binomial distribution, and Pr[X = x] is given by
the probability mass function

    p(x) = (n choose x) p^x (1 − p)^(n−x)    x = 0, 1, . . . , n
         = 0                                 otherwise

Once we give values to n and p, (n ≥ 1, 0 < p < 1), a particular binomial distribution
is specified. n and p are examples of what we call the parameters of the distribution.
Once the parameters of a distribution have values, a particular distribution is specified.
We have a neat abbreviated notation which saves us writing “the random variable
X is distributed binomially with parameters n and p”. We compress all this information
into the symbols X ∼ B(n, p).

Example 1A: A door-to-door salesperson calls on 6 clients per session. Each client
makes their purchasing decision independently of the others, with probability 0.2 of
purchasing the product. What are the probabilities that 0, 1, 2, 3, 4, 5 or 6 clients
purchase the product?
Clearly, the three conditions for the binomial process are satisfied, and X, the num-
ber of clients who purchase the product, has a binomial distribution with n = 6 and
p = 0.2 : thus X ∼ B(6, 0.2).
Instead of simply using the formula given in the box, let us compute from first
principles the probability of, say, 2 clients purchasing the product, i.e. Pr[X = 2]:
Firstly, 2 clients out of 6 can purchase the product in many different permutations.
Let A1 be the event that the first 2 clients purchase (these are the “successes” that we
count) and that clients 3 to 6 refuse to purchase (i.e. are “failures”). Then, using our
usual conventions, we can write

A1 = S ∩ S ∩ F ∩ F ∩ F ∩ F.
Let the events A2 , A3 represent other permutations of 2 successes and 4 failures, e.g.

A2 = F ∩ S ∩ S ∩ F ∩ F ∩ F
A3 = F ∩ F ∩ S ∩ S ∩ F ∩ F

How many permutations of 2 successes and 4 failures are there? Counting rule 6 tells us
there are (6 choose 2) = 15 such permutations, so we could write down events from A1 to A15 .


Secondly, we compute Pr(A1 ). Because the clients act independently of each other,

Pr(A1 ) = Pr(S ∩ S ∩ F ∩ F ∩ F ∩ F )
= Pr(S) × Pr(S) × Pr(F ) × Pr(F ) × Pr(F ) × Pr(F )
= p²(1 − p)⁴ = 0.2² × 0.8⁴ .

Recall that the probability of the intersection of independent events is the product of
the individual probabilities, so that

Pr(A1 ) = Pr(A2 ) = . . . = Pr(A15 ) = 0.2² × 0.8⁴

Thirdly, the events A1 , A2 , . . . , A15 are mutually exclusive — no client can simul-
taneously both purchase and refuse to purchase! Thus

    Pr[X = 2] = Pr[A1 ∪ A2 ∪ · · · ∪ A15 ]
              = Pr(A1 ) + Pr(A2 ) + · · · + Pr(A15 )
              = (6 choose 2) 0.2² 0.8⁴ = 15 × 0.2² × 0.8⁴ = 0.2458.

Stop a while and convince yourself that the answer (6 choose 2) 0.2² 0.8⁴ obtained from first
principles is the same as that obtained by substituting n = 6, p = 0.2 and x = 2 into
the formula for the binomial probability mass function.
Try computing the remaining probabilities from first principles, and compare them
with the results obtained from the formula. The probabilities are given in the table
below:

x    p(x) = Pr[X = x]
0    (6 choose 0) × 0.8^6 = 0.2621
1    (6 choose 1) × 0.2 × 0.8^5 = 0.3932
2    (6 choose 2) × 0.2² × 0.8⁴ = 0.2458
3    (6 choose 3) × 0.2³ × 0.8³ = 0.0819
4    (6 choose 4) × 0.2⁴ × 0.8² = 0.0154
5    (6 choose 5) × 0.2^5 × 0.8 = 0.0015
6    (6 choose 6) × 0.2^6 = 0.0001

The probability that all six clients purchase the product is very small (0.0001) but
will occasionally occur (we expect it roughly once in every 10 000 times that a session of
6 calls are made!). The probability that none of the 6 clients purchase is 0.2621, so that
in approximately a quarter of sessions of 6 calls no purchases are made. The probability
that two or more purchases are made is Pr[X ≥ 2] = 0.2458 + 0.0819 + 0.0154 + 0.0015 +
0.0001 = 0.3447, so that in approximately one-third of sessions of 6 calls the salesperson
achieves two or more sales.
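In R the binomial probabilities are available through dbinom(), so the table above can be reproduced directly (a quick check, not part of the original example):

    dbinom(0:6, size = 6, prob = 0.2)   # the column of probabilities in the table
    sum(dbinom(2:6, 6, 0.2))            # Pr[X >= 2] = 0.3446 (the rounded table values sum to 0.3447)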


Example 2B: What is the probability of a contractor being awarded only one out of
five contracts? Assume that the probability of being awarded a contract is 0.5.
Let “success” = “awarded a contract”. Pr(success) = p = 1/2. So q = 1 − p = 1/2. We
have n = 5 trials. Let X be the number of successes in 5 trials. Then X ∼ B(5, 1/2).

    Pr[X = x] = p(x) = (5 choose x) (1/2)^x (1/2)^(5−x)    x = 0, 1, . . . , 5

So

    Pr[X = 1] = p(1) = (5 choose 1) (1/2)^5 = 5/32.

Example 3B: Check that the binomial distribution



    p(x) = (n choose x) p^x (1 − p)^(n−x)    x = 0, 1, 2, . . . , n
         = 0                                 otherwise

is a probability mass function.


(i) It is defined everywhere and p(x) ≠ 0 on the finite set {0, 1, . . . , n}.

(ii) p(x) has no negative terms.

(iii) For convenience, let q = 1 − p.

    Σ_{x=0}^{n} (n choose x) p^x (1 − p)^(n−x) = Σ_{x=0}^{n} (n choose x) p^x q^(n−x)
    = (n choose 0) p^0 q^n + (n choose 1) p^1 q^(n−1) + ··· + (n choose x) p^x q^(n−x) + ··· + (n choose n) p^n q^0
    = (p + q)^n    (from the binomial theorem — hence the name “binomial” distribution)
    = 1^n          (because q = 1 − p)
    = 1.

Bar graphs of the binomial distribution . . .


To gain some feeling for the shape of the binomial distribution, consider the following
three bar graphs, for which n is fixed at 15, and p is varied.
[Bar graphs of p(x) against x for X ∼ B(15, 0.5), X ∼ B(15, 0.3) and X ∼ B(15, 0.8).]
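The bar graphs can be reproduced in R with dbinom() and barplot(); the plotting options below are only one possible choice:

    for (p in c(0.5, 0.3, 0.8)) {
      barplot(dbinom(0:15, size = 15, prob = p), names.arg = 0:15,
              xlab = "x", ylab = "p(x)", main = paste0("X ~ B(15, ", p, ")"))
    }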

Further examples on the binomial distribution. . .


Example 4B: A certain type of pill is packed in bottles of 12 pills each. 10% of the
pills are chipped in the manufacturing process.
(a) Explain why the binomial distribution can provide a reasonable model for the
random variable X, the number of chipped pills found in a bottle of 12 pills. What
are the appropriate parameters?
(b) What is the probability that a bottle of pills contains x chipped pills, i.e. what is
Pr[X = x]?
(c) What are the probabilities of
(i) 2 chipped pills?
(ii) no chipped pills?
(iii) at least 2 chipped pills?
(a) We check that the three conditions are satisfied.
1. The random experiment consists of examining a pill, and deciding whether it
is chipped or unchipped. Thus there are two possible outcomes, as required.
Because “chipped pills” are the things we are looking for and counting, we
will let “chipped pill” = “success” and “unchipped pill” = “failure”.
2. There are 12 pills in the bottle. We repeat the experiment 12 times, ex-
amining each pill. It seems reasonable to assume that the pills are chipped
independently of each other.
3. It also seems reasonable that the probability of a pill being chipped is the
same for each pill.
Thus the binomial distribution with parameters n = 12 and p = 0.10 may be used
to model the phenomenon of the number of chipped pills in a bottle of pills.


(b) Because X ∼ B(12, 0.10)



    p(x) = (12 choose x) 0.10^x 0.90^(12−x)    x = 0, 1, . . . , 12
         = 0                                   otherwise

(c) (i) Pr[X = 2] = p(2) = (12 choose 2) 0.10² 0.90^10 = 0.2301
(ii) Pr[X = 0] = p(0) = (12 choose 0) 0.10^0 0.90^12 = 0.2824
(iii) Pr[X ≥ 2] = 1 − Pr[X = 0] − Pr[X = 1] = 1 − 0.2824 − 0.3766 = 0.3410.

Example 5C: A TV manufacturer is supplied with a certain component by a special-


ist producer. Each incoming consignment of components is subjected to the following
quality control procedure. A random sample of 10 components is individually tested.
If there are one or more defective components among the 10 tested, the entire consign-
ment is rejected. If there are no defective components in the sample, the consignment
is accepted.
(a) What are the probabilities of a consignment being rejected if the true proportions
of defective components are
(i) 1% (ii) 10% (iii) 30%
(b) If a sample of 20 components (instead of 10) were tested, and the consignment
rejected if two or more proved defective, calculate the probabilities of rejecting a
consignment for the same proportions of defective components.
(c) Which quality control procedure do you think is the better?

The Poisson distribution. . .


Many phenomena in physics obey the Poisson probability law named in honour of
the French mathematician Simeon D. Poisson (1781–1840). The classic example is the
decomposition of radio-active nuclei. In management science, the number of demands
for service in a given period of time (e.g. on tellers in a bank, the stock pile of a factory,
the runways of an airport, the lines of a telephone exchange) often obeys (either exactly
or approximately) a Poisson distribution. This applies also to the occurrence of accidents,
errors, breakdowns and other calamities — the number that occurs within a specified
time period has a Poisson distribution under certain circumstances.
In broad terms, the condition for a “Poisson process” is that the events occur in time
“at random”. Loosely, this means that an event is equally likely to occur at any instant
in time. If a phenomenon obeys the Poisson process, then the Poisson distribution may
be used to model the number of occurrences of the event during a fixed time
period. We can also use the Poisson distribution when we count the occurrences of
an event in a fixed amount of “space”. For example, the number of faults in 100 m of
computer cable, the number of misprints on an A4 page, and the number of diamonds in
a cubic metre of ore are all Poisson processes (if the “events” occur at random in space)
and can be modelled using the Poisson distribution.
The Poisson distribution has only one parameter, namely the average rate λ at which
events are occurring per time period. Because the number of events that occur in the
interval must be an integer, the Poisson distribution is discrete. The probability mass
function is given in the box:


POISSON DISTRIBUTION
We are given a period of time during which events occur at random.
The average rate at which events occur is λ events per time period.
It is critical that the time period referred to in the rate must be the
same as the time period during which the events are counted. Let the
random variable X be the number of events occurring during the time
period.
Then X has the Poisson distribution with parameter λ, i.e. X ∼ P (λ),
and has probability mass function

    p(x) = e^(−λ) λ^x / x!    x = 0, 1, 2, . . .
         = 0                  otherwise

The bar graphs below show the shape of Poisson distribution for two values of λ.
[Bar graphs of p(x) against x for X ∼ P(3) and X ∼ P(8).]
Example 6A: We have a large fleet of delivery trucks. On average we have 12 break-
downs per 5-day working week. Each day we keep two trucks on standby. What is the
probability that on any day
(a) no standby trucks are needed?
(b) the number of standby trucks is inadequate?
Let the random variable X be the number of trucks that break down in a given day.
Because we are dealing with breakdowns, it is reasonable to assume that they occur at
random and that the Poisson distribution is a realistic model.
Because we are interested in breakdowns per day, we need to convert the given weekly
rate into a daily rate. 12 breakdowns per 5 days is equivalent to 12/5 = 2.4 breakdowns
per day. Thus we assume that X has the Poisson distribution with parameter λ = 2.4,
i.e. X ∼ P (2.4). Hence
    Pr[X = x] = p(x) = e^(−2.4) 2.4^x / x!


(a) Pr(no breakdowns) = Pr[X = 0] = p(0) = e^(−2.4) 2.4^0 / 0! = 0.0907
(b)

Pr(inadequate standby trucks) = Pr[X > 2]


= 1 − Pr[X ≤ 2]
= 1 − (p(0) + p(1) + p(2))
= 1 − (0.0907 + 0.2177 + 0.2613)
= 0.4303.
This means that 9% of days we will not use our standby trucks at all, but that
on 43% of days we will run out of standby trucks. We should investigate the financial
implications of putting more trucks on standby.
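A short R check of the truck example, using the built-in Poisson functions dpois() and ppois():

    lambda <- 12 / 5       # 2.4 breakdowns per day
    dpois(0, lambda)       # (a) no breakdowns: 0.0907
    1 - ppois(2, lambda)   # (b) more than 2 breakdowns: 0.4303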

Example 7B: Bank tellers make errors in entering figures in their ledgers at the rate
of 0.75 errors per page of entries. What is the probability that in a random sample of 4
pages there will be 2 or more errors?
Because we are dealing with errors, we assume a Poisson distribution. If errors occur
at 0.75 errors per page, then the error rate per 4 pages is 3. So we choose λ = 3.
Hence
    Pr[X = x] = e^(−3) 3^x / x!
Then:

Pr[X ≥ 2] = 1 − Pr[X < 2]


= 1 − Pr[X = 0] − Pr[X = 1]
    = 1 − e^(−3) 3^0 / 0! − e^(−3) 3^1 / 1!
= 1 − 0.0498 − 0.1494 = 0.8008 .

Example 8C: Show that the function

    p(x) = e^(−λ) λ^x / x!    x = 0, 1, 2, . . .
         = 0                  otherwise

is in fact a probability mass function.


[You need the mathematical result:

    e^λ = 1 + λ + λ²/2! + λ³/3! + ··· = Σ_{x=0}^{∞} λ^x / x!  ]

Example 9C: Beercans are randomly tossed alongside the national road, with an
average frequency 3.2 per km.
(a) What is the probability of seeing no beercans over a 5 km stretch?
(b) What is the probability of seeing at least one beercan in 200 m?
(c) Determine the values of x and y in the following statement: “40% of 1 km sections
have x or fewer beercans, while 5% have more than y.”

The Discrete Probability Distributions are usually used to represent the events that are qualitative
in nature. The following distributions are common to describe discrete data: Bernoulli, Binomial,
Poisson, Hyper-geometric and Geometric Distributions.

7.2 Bernoulli trial


Consider an experiment in which there are two complementary outcomes. One outcome is labelled
“success” (s) and the other is labelled “failure” (f). Such an experiment is called a Bernoulli trial. We
denote the probability of success as P(s) = p and the probability of failure as P(f) = 1 − p = q.

7.3 Binomial Distribution


The binomial distribution may be used as a probability model in situations in which the following
conditions are satisfied:

1. We have a random experiment which has a sample space with exactly two outcomes, one of which
we can label “success”, and the other “failure”: i.e. S = {success, failure}. e.g. A door-to-door
salesperson calls on a prospective client — the client either purchases the product (success) or
does not purchase (failure).

2. The random experiment is repeated n times, n ≥ 1. The outcome on any one repetition is not
influenced by the outcome on any other repetition. We say “we have n independent trials of the
random experiment”. e.g. The salesperson calls on n = 6 prospective clients — the clients make
their purchasing decisions independently (there is no communication between them!).

3. The probability of success remains constant from trial to trial. We assume that each client is
equally likely to purchase the product. Let Pr(success) = p; thus Pr(failure) = 1 − p. It is
sometimes convenient to let q = 1 − p, so that Pr(failure) = q.

4. Our random variable X is the number of successes we observe in n trials. If the conditions above
are satisfied, then we say that we have a binomial process, and that the random variable X
has a binomial distribution. In the above example, X is the number of calls that resulted in
sales. Because 6 calls were made, X must assume one of the values 0, 1, . . . , 6 and X is therefore
an example of a discrete random variable. Binomial processes occur in many contexts. From
an industrial or commercial perspective, one of the most important binomial processes occurs in
the field of quality control.

BINOMIAL DISTRIBUTION
We have n independent trials, each trial has two outcomes, success or failure, and Pr(success) = p
and q = 1 − p for all trials. The random variable X is the number of successes in n trials; n ≥ 1
must be an integer, and 0 ≤ p ≤ 1. Then X has the binomial distribution, i.e. X ∼ B(n, p), with
probability mass function

p(x) = \binom{n}{x} p^x q^{n−x}   x = 0, 1, . . . , n
     = 0                          otherwise

Expectation: E[X] = np    Variance: Var[X] = npq
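A minimal R sketch of the binomial probability mass function, using base R's dbinom; the values n = 6 and p = 0.3 are purely illustrative:

n <- 6; p <- 0.3
x <- 0:n
px <- dbinom(x, size = n, prob = p)   # p(0), p(1), ..., p(n)
sum(px)                               # 1
sum(x * px)                           # E[X] = np = 1.8
sum((x - n * p)^2 * px)               # Var[X] = npq = 1.26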

Que20: What is the probability of a contractor being awarded only one out of five contracts? Assume
that the probability of being awarded a contract is 0.5.

Ans20: Let “success” = “awarded a contract”. Pr(success) = p = 1/2. So q = 1 − 1/2 = 1/2. We have n = 5
trials. Let X be the number of successes in 5 trials. Then X ∼ B(5, 1/2).

p(X = x) = \binom{5}{x} (1/2)^x (1/2)^{5−x}   x = 0, 1, . . . , 5

p(X = 1) = p(1) = \binom{5}{1} (1/2)^1 (1/2)^4 = 5 × (1/2)^5 = 5/32

Que21: A certain type of pill is packed in bottles of 12 pills each. 10% of the pills are chipped in the
manufacturing process.

(a) Explain why the binomial distribution can provide a reasonable model for the random vari-
able X, the number of chipped pills found in a bottle of 12 pills. What are the appropriate
parameters?
(b) What is the probability that a bottle of pills contains x chipped pills, i.e. what is P r[X = x]?
(c) What are the probabilities of

(i) 2 chipped pills? (ii) no chipped pills? (iii) at least 2 chipped pills?

7.4 The hypergeometric distribution


The binomial and negative binomial distributions require that the probability of success remains the
same from trial to trial. In many practical situations this is unrealistic — in particular it is unrealistic
in sampling problems when the sampling is done “without replacement”.
Consider, for example, a population of N items, M of which are defective. We draw a sample of size n.
Let the random variable X be the number of defective items in n items sampled without replacement.
For the event X = x to occur, we must draw x items from the M defective items, and n − x from the
N − M non-defective items. Counting rule 3 of chapter 3 tells us that we can choose x items from M
in \binom{M}{x} ways, and that we can choose n − x items from N − M in \binom{N−M}{n−x} ways.
Thus the total number of ways in which the event X = x can occur is

\binom{M}{x} \binom{N−M}{n−x}.

HYPERGEOMETRIC DISTRIBUTION
Given a population of size N, of which M are defective, a sample of size n (n ≤ N ) is drawn. Let
the random variable X be the number of defectives in the sample. Then X has the hypergeometric
distribution with parameters N, M and n, i.e. X ∼ H(N, M, n), and X has probability mass function

p(x) = \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}   x = 0, 1, . . . , n
     = 0                                              otherwise
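R provides hypergeometric probabilities through dhyper, but note its parameterisation: m = M (defectives), n = N − M (non-defectives) and k = n (sample size). A minimal sketch with illustrative values N = 20, M = 5 and a sample of 4:

N <- 20; M <- 5; n <- 4
x <- 0:n
dhyper(x, m = M, n = N - M, k = n)              # p(0), ..., p(4)
sum(x * dhyper(x, m = M, n = N - M, k = n))     # mean of X, which equals nM/N = 1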

Que26: A fisherman caught 10 lobsters, 3 of which were undersized. An inspector of the Sea Fisheries
Branch measured a random sample of 4 lobsters. What is the probability that the sample
contains no undersized lobsters?

Ans26: Here N = 10,M = 3 and n = 4. If X is the number of undersized lobsters in the sample of 4,
then

p(X = 0) = \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}
         = \binom{3}{0} \binom{10−3}{4−0} / \binom{10}{4}
         = 0.1667

Hence the probability that the inspector finds at least one undersized lobster is 1 − 0.1667 =
0.8333.
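A one-line R check of this answer (dhyper takes the number of undersized lobsters m = 3, the number of legal-sized lobsters n = 7 and the sample size k = 4):

dhyper(0, m = 3, n = 7, k = 4)        # Pr[no undersized lobsters] = 0.1667
1 - dhyper(0, m = 3, n = 7, k = 4)    # Pr[at least one undersized lobster] = 0.8333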

Que 27: A team of 15 people is chosen from a class of 65 MBA students to play a social rugby match. The
class contains 25 engineers. What is the probability that the team contains (a) four engineers?
(b) at least four engineers?

Ans 27: (a) Let X be the number of engineers in the sample. Then N = 25 + 40 = 65, M = 25 and
n = 15, so that X ∼ H(65, 25, 15). Thus

p(X = 4) = \binom{25}{4} \binom{40}{11} / \binom{65}{15} = 0.1410

(b) The probability that X is 4 or more is

p(X ≥ 4) = 1 − [\binom{25}{0}\binom{40}{15} + \binom{25}{1}\binom{40}{14} + \binom{25}{2}\binom{40}{13} + \binom{25}{3}\binom{40}{12}] / \binom{65}{15} = 0.9176

Que 28: A bowl contains 10 blue and 7 red marbles. Four (4) marbles are drawn at random from the
bowl. Calculate the probability of (a) two (b) at least 3 blue marbles drawn when sampling is
done
1) with replacement 2) without replacement.

Ans 28: (1a.) Given are N = 17, M = 10, N − M = 7 and n = 4. With replacement, X ∼ B(4, 10/17):

P(X = 2) = \binom{4}{2} (10/17)^2 (7/17)^2 = 0.352

(1b.) P(X ≥ 3) = P(X = 3) + P(X = 4)
              = \binom{4}{3} (10/17)^3 (7/17)^1 + \binom{4}{4} (10/17)^4 (7/17)^0 = 0.455

(2a.) Without replacement, X ∼ H(17, 10, 4):

P(X = 2) = \binom{10}{2} \binom{7}{2} / \binom{17}{4} = 0.397

(2b.) P(X ≥ 3) = P(X = 3) + P(X = 4)
              = \binom{10}{3}\binom{7}{1} / \binom{17}{4} + \binom{10}{4}\binom{7}{0} / \binom{17}{4} = 0.441
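The contrast between sampling with and without replacement can be checked directly in R, with dbinom for the binomial case and dhyper for the hypergeometric case:

dbinom(2, size = 4, prob = 10/17)         # (1a) with replacement: 0.352
sum(dbinom(3:4, size = 4, prob = 10/17))  # (1b) with replacement: 0.455
dhyper(2, m = 10, n = 7, k = 4)           # (2a) without replacement: 0.397
sum(dhyper(3:4, m = 10, n = 7, k = 4))    # (2b) without replacement: 0.441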

7.5 Poisson Distribution


POISSON DISTRIBUTION
Events occur at random in time, with an average rate of λ events per time period (or space).
The random variable X is a count of the number of events occurring during a fixed interval of
time (or space). The time period (or amount of space) referred to in the rate must be the same
as the time period (or space) in which events are counted. Then X has the Poisson distribution
with parameter λ ≥ 0, i.e. X ∼ P (λ), and has probability mass function
p(x) = e^{−λ} λ^x / x!   x = 0, 1, 2, . . .
     = 0                 otherwise

Expectation: E[X] = λ    Variance: Var[X] = λ


Examples of Poisson events are:

a.) The number of bad cheques presented for daily payment at a bank.

b.) The number of road deaths per month.

c.) The number of bacteria in a given culture.

d.) The number of defects per square meter on metal sheets being manufactured.

e.) The number of mistakes per typewritten page.
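The fact that the Poisson mean and variance are both λ can be checked numerically in R; the value λ = 2.5 is illustrative and the infinite sums are truncated at x = 200:

lambda <- 2.5
x <- 0:200
px <- dpois(x, lambda)
sum(x * px)                   # E[X], approximately 2.5
sum((x - lambda)^2 * px)      # Var[X], approximately 2.5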



Que P22: A radioactive substance emits alpha particles at a constant rate of one particle every 2 seconds, in
the conditions stated above for applying the Poisson distribution model. What is the probability
of detecting at most 1 particle in a 10-second interval?
Ans P22: Taking the second as the time unit, the rate is 0.5 particles per second, so for a 10-second interval λ = 0.5 × 10 = 5. Then

P(X ≤ 1) = Σ_{x=0}^{1} 5^x e^{−5} / x!
         = 5^0 e^{−5}/0! + 5^1 e^{−5}/1!
         = e^{−5} + 5e^{−5}
         = 0.04.
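In R, with λ = 5 for the 10-second interval, the same value is obtained from

ppois(1, lambda = 5)          # Pr[X <= 1], about 0.040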

Que P23: A secretary claims an average mistake rate of 1 per page. A sample page is selected at random
and 5 mistakes found. What is the probability of her making 5 or more mistakes if her claim of
1 mistake per page on average is correct?
Ans P23: In this case λ = 1 is claimed and X is the number of mistakes per page. If the claim is true,

P(X ≥ 5) = 1 − P(X ≤ 4)
         = 1 − 0.9963 = 0.0037.
The above calculation shows that if the claim of 1 mistake per page on average is true, there is
only a 37 in 10 000 chance of getting 5 or more mistakes per page.
Que P24: At a particular restaurant 4 plates are broken, on average, each week. What is the probability
that
a) 2 plates are broken next week? Ans: 0.1465
b) at most 4 plates are broken next week? Ans: 0.6288
c) more than 3 plates are broken next week? Ans: 0.5665
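These three answers can be reproduced in R with λ = 4:

dpois(2, 4)                   # (a) Pr[X = 2]  = 0.1465
ppois(4, 4)                   # (b) Pr[X <= 4] = 0.6288
1 - ppois(3, 4)               # (c) Pr[X > 3]  = 0.5665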
Que P25: A computer that operates continuously breaks down at random on average 1.5 times per week.
• Solution: This tells us λ = 1.5 per week, and that the random variable X, the time (in weeks) between
breakdowns, has the exponential density function

f(x) = 1.5 e^{−1.5x}   x ≥ 0
     = 0               otherwise
What is the probability of no breakdowns for 2 weeks?
• Solution: This implies that X must be greater than 2 and that we want Pr[X > 2]. Because
the exponential distribution is continuous, we evaluate this probability by integration:
Pr[X > 2] = ∫_2^∞ 1.5 e^{−1.5x} dx = [−e^{−1.5x}]_2^∞
          = −e^{−∞} + e^{−3} = 0 + e^{−3}
          = 0.0498.

What is the probability of a breakdown within 3 days?


• Solution: We first make our units of time compatible: 3 days = 3/7 week. We want the
probability of a breakdown before 3/7 week:
Pr[0 ≤ X ≤ 3/7] = ∫_0^{3/7} 1.5 e^{−1.5x} dx = [−e^{−1.5x}]_0^{3/7}
                = −e^{−0.6429} + e^0 = −0.5258 + 1
                = 0.4742.
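Both exponential probabilities can be checked with R's pexp, which uses the rate parameterisation:

1 - pexp(2, rate = 1.5)       # Pr[X > 2 weeks]   = 0.0498
pexp(3/7, rate = 1.5)         # Pr[X <= 3/7 week] = 0.4742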

8 Continuous Probability Distribution


8.1 The Normal Distribution
A random variable X that follows a normal distribution is a continuous random variable. Many
continuous random variables have probability distributions with two characteristic properties:

a. values close to the mean are most probable, and values further away from the mean are decreasingly
probable (the distribution is unimodal);

b. values c units larger than the mean are just as probable as values c units smaller than the mean
(the distribution is symmetric about the mean).

The probability density function given below, which has these properties, is the pdf of the normal
probability distribution (sometimes called the Gaussian probability distribution). The normal
distribution is the most important distribution in statistics.

As a check that the exponential density f(x) = λ e^{−λx}, x ≥ 0, integrates to one:

∫_0^∞ f(x) dx = λ ∫_0^∞ e^{−λx} dx = [−e^{−λx}]_0^∞
             = −e^{−∞} + e^0 = 0 + 1
             = 1,

as required for the area under the curve of a probability density function.

Example 12C: Let the random variable X be the time in hours for which a light bulb
burns from the time it is put into service. The probability density function of X is given
by
f(x) = (1/1000) e^{−x/1000}   x ≥ 0
     = 0                      otherwise

(a) What is the probability that the bulb burns for between 100 and 1000 hours?
(b) What is the probability that the bulb burns for more than 1000 hours?
(c) What is the probability that the bulb burns for a further 1000 hours, given that
it has already burned for 500? (Use conditional probabilities!)

Example 13C: Events occur according to a Poisson process with “intensity” λ (i.e. at
rate λ per unit of time).
(a) Use the Poisson distribution to determine the probability of no events in t units of
time.
(b) Now use the exponential distribution to determine the probability that the time
between events is greater than t.
(c) Compare the answers to (a) and (b) and explain these results.

Example 14C: Flaws occur in telephone cable at the average rate of 4.4 flaws per
km of cable. Calculate the following probabilities. (Make use of binomial, Poisson and
exponential distributions.)
(a) What is the probability of 1 flaw in 100 m of cable?
(b) What is the probability of more than 3 flaws in 250 m of cable?
(c) What is the probability that the distance between flaws exceeds 500m?
(d) In ten 200 m lengths of cable, what is the probability that 8 or more are free of
flaws?

The normal distribution. . .


The normal distribution is often referred to as the Gaussian distribution, in honour of
Carl Friedrich Gauss (1777–1855), a famous German mathematician, who, for more than
a century, was credited with its discovery. The same result was published at about the
same time by the equally famous French mathematician the Marquis de Laplace (1749–
1827). But the normal distribution had actually been discovered nearly a century earlier
by Abraham de Moivre (1667–1754). In 1733 he published a mathematical pamphlet that
was not widely circulated and was quickly forgotten. A copy of de Moivre’s pamphlet
was found in 1924, and the English statistician Karl Pearson found that it contained the
formula for the normal distribution. De Moivre’s precedence in discovering the normal
distribution is documented in a paper published in 1924 by Pearson (“Historical note on
the origin of the normal curve of errors”) in the journal Biometrika, volume 16, pages
402–404, an important statistical journal which is still publishing major discoveries in
statistics.

NORMAL DISTRIBUTION

c. The normal distribution is continuous, and has probability density function

f(x) = (1/√(2πσ²)) exp(−(1/2)((x − µ)/σ)²)   −∞ < x < ∞

There are two parameters, µ (“mu”, the Greek letter m for Mean) and σ (“sigma”, the
Greek letter s for Standard deviation).

d. The normal distribution is not the only distribution whose probability density function (pdf)
looks bell-shaped, but it is the most important one, and many real-world random variables follow
the normal distribution at least approximately. The normal distribution, like the binomial and
Poisson, is an example of a parametric probability distribution: it is completely described by a
small number of parameters.

– In the case of the binomial there were two parameters, n and p.
– In the case of the Poisson there was just one parameter, λ, which is both the mean and the
variance of the distribution, i.e. E(X) = Var(X) = λ.
– In the case of the normal there are two parameters:
∗ µ = the mean of the distribution, and
∗ σ² = the variance of the distribution.
The normal distribution is the most important distribution in statistics. Part of the
reason for this is a result called the “central limit theorem”, which states that if a
random variable X is the sum of a large number of random increments, then X has the
normal distribution.
The daily turnover of a large store is the sum of the purchases of all the individual
customers. The height of a 50-year old pinetree can be thought of as the sum of each
year’s growth — which itself is a variable affected by sunshine, temperature, rainfall,
etc. So one expects the heights of 50-year old pinetrees to obey a normal distribution.
Similarly, an examination mark is the sum of the scores in a large number of questions.
Thus, by the central limit theorem, one expects daily turnover, the heights of trees and
examination marks (approximately, at least) to be normally distributed.
The normal distribution is continuous, and has probability density function

f(x) = (1/√(2πσ²)) e^{−(1/2)((x−µ)/σ)²}   −∞ < x < ∞

There are two parameters, µ (“mu”, the Greek letter m for Mean) and σ (“sigma”,
the Greek letter s for Standard deviation).
The constant µ tells us where the graph is located (it can take on any real value);
the constant σ (which is always positive) tells how spread out the distribution is. The
graphs, depicting f (x) for a few values of µ and σ, make this clear: The most striking
feature of the normal distribution is that it is bell-shaped. Notice also that the centre
of the bell is located at the value µ, and that the distribution gets flatter as σ gets
larger. The plot also illustrates the fact that the area under the curve for a probability
density function is one; to accommodate this, notice that as the curve gets “flatter”, its
maximum value has to become smaller.
[Figure (omitted): probability density functions f(x) of N(0, 1), N(0, 0.5²), N(2, 4²) and N(6, 1), plotted for x between −5 and 10.]
If X has the normal distribution with parameters µ and σ, we abbreviate this to X ∼
N(µ, σ²), reading this as “the random variable X has the normal distribution with
parameters µ and σ²”. When we use this notation, our convention is to write σ² for
the second parameter, not plain σ. The parameter σ 2 is known as the variance of the
distribution. As in Chapter 1, the variance is the square of the standard deviation.
Unfortunately it is impossible to determine probabilities by integrating the normal
probability density function analytically. However (and this makes life very easy), the integration
can be done numerically by computer, and we are supplied with a table of probabilities for the normal
distribution.
It might come as a surprise to you that a single table is all we need. After all,
there are infinitely many combinations of µ and σ, and it seems that we ought to have
a massive book of normal tables. We are luckier than we deserve to be, and there is
a connecting link between all normal distributions which makes it possible to get away
with a single table! We will learn how to use this amazing table by means of an example.

Example 15A: If the amount of margarine, X, in a 250 g tub is normally distributed


with µ = 251 and σ = 3, what is the probability that the tub will contain
(a) between 251 g and 253 g of margarine?
(b) less than 250 g of margarine?
For part (a) we want

Pr[251 ≤ X ≤ 253] = ∫_{251}^{253} (1/√(2π × 9)) e^{−(1/2)((x−251)/3)²} dx
It is impossible to evaluate this integral by finding an indefinite integral and then
substituting. So we resort to our tables. We are given tables for only one set of values of
the two parameters — this is all we need: when µ = 0 and σ = 1 we have the standard
normal distribution which has density function
f(x) = (1/√(2π)) e^{−x²/2}   −∞ < x < ∞

How do we make do with tables for only the standard normal distribution? Because
we have an easily proved result that the proportion of the density function that
lies between the mean and a specified number of standard deviations away
from the mean is always constant regardless of the numerical values of the mean
and standard deviation.
Translated into mathematical symbols, this important result can be written as
∫_µ^{µ+zσ} (1/√(2πσ²)) e^{−(1/2)((x−µ)/σ)²} dx = ∫_0^z (1/√(2π)) e^{−z²/2} dz

Use integration by substitution to prove this by putting z = (x − µ)/σ.


As an example of this, the areas depicted below are equal. The shading, in both
cases, shows the area under the curve between the mean and one standard deviation
above the mean. Both plots have the same scale on both axes — so you can count the
dots for a numerical “proof”!
[Figure (omitted): the shaded area between the mean and one standard deviation above the mean is the same for X ∼ N(251, 3²) (from 251 to 254) as for Z ∼ N(0, 1) (from 0 to 1).]

254 is one standard deviation (i.e. 3 units) above 251, the mean. Thus the area between
251 and 254 in N (251, 32 ) is the same as that between 0 and 1 in N (0, 1).
Returning to part (a) of our margarine example, we need the area between 251 and
253 of N (251, 32 ). 253 is two-thirds of a standard deviation above the mean of 251,
because (253 − 251)/3 = 2/3. Thus Pr[251 < X < 253] = Pr[0 < Z < 2/3], as depicted
below:
[Figure (omitted): the shaded area between 251 and 253 under the X ∼ N(251, 3²) density equals the shaded area between 0 and 2/3 under the Z ∼ N(0, 1) density.]
Some numerical results from the normal tables help to give a “feel” for the normal
distribution. The area from one standard deviation below the mean to one standard
deviation above the mean is 0.683 (close to 2/3rds); i.e.

Pr[µ − σ < X < µ + σ] = 0.683.

The corresponding probabilities for two, three and four standard deviations are:

Pr[µ − 2σ < X < µ + 2σ] = 0.954

Pr[µ − 3σ < X < µ + 3σ] = 0.997


Pr[µ − 4σ < X < µ + 4σ] = 0.99994


These results are true for all combinations of µ and σ! In general terms, two-thirds (68%)
of a normal distribution is within one standard deviation of its mean, 95% is within two
standard deviations, and virtually all of it is within three standard deviations.
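These standard areas are easy to reproduce with R's pnorm, which gives the cumulative area of the standard normal up to a point:

pnorm(1) - pnorm(-1)          # within one standard deviation:    0.683
pnorm(2) - pnorm(-2)          # within two standard deviations:   0.954
pnorm(3) - pnorm(-3)          # within three standard deviations: 0.997
pnorm(4) - pnorm(-4)          # within four standard deviations:  0.99994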
The general result is that the area between µ and some point x for N(µ, σ²) is the
same as the area between 0 and z = (x − µ)/σ for N(0, 1). The formula z = (x − µ)/σ tells us how
many standard deviations the point x is away from the mean µ. Once again, you can
count the dots in the plot below of the normal distribution with arbitrary parameters µ
and σ and in the standard normal distribution N(0, 1):

[Figure (omitted): the shaded area between µ and x under the X ∼ N(µ, σ²) density equals the shaded area between 0 and z = (x − µ)/σ under the Z ∼ N(0, 1) density.]

In our margarine example, we use z = (x − µ)/σ for x = 251 and x = 253 to get

Pr[251 < X < 253] = Pr[(251 − 251)/3 < Z < (253 − 251)/3] = Pr[0 < Z < 0.67]

From the table for the standard normal distribution (Table 1) we read off this probability
as 0.2486. Thus
Pr[251 < X < 253] = 0.2486,
almost a quarter of margarine tubs contain between 251 g and 253 g of margarine.
Part (b) of our question asked for the probability that a tub of margarine was
underweight, i.e. the probability that X < 250. The area between −∞ and 250 in
N(251, 3²) is the same as the area between −∞ and (250 − 251)/3 = −1/3 in N(0, 1):

Pr[X < 250] = Pr[Z < (250 − 251)/3] = Pr[Z < −1/3].
8 CONTINUOUS PROBABILITY DISTRIBUTION 75

CHAPTER 5. PROBABILITY DISTRIBUTIONS I 129

Z ∼ N (0, 1)
0.4 0.4 ........
... ...
.. ....
.. ..
... ..
.. ...
...
. ..
...
...
...
. ...
..
..
. ... ..
..
... ... ...
.... ...
...
.. ... ..
.. ...
...
...
...
.... ...
...
..
...
... ....
...
...
... ....
...
...

f (x) f (z) ...... ...


...
...
...
0.2 0.2 .... ... ..
...
.... .... ..
...
. .... ...
... ...
2
.......
...
X ∼ N (251, 3 ) ....
...
...
...
. ....
... ...
.........
.... ...
...................................... ...
.........
....... ...
....... ...
..
. . .....
......
...... .
.
.. ..... ... ...
..... . . . ......
..... ... ...
.....
..... .......
..... ...
.... ......
..... ... ...
...... ......... .
..... ....
..
..
. . . . . .
....
.
.....
..... ... . ...... ...
...
... . . . . . . . ......
.. ..... . ...
.... . . . . . . .
.. ..
..... .
.... . . . . . . . .
............
.
..
.
..
.
..... .................
..
..
...... ...
. . ..
..
...
....
..
. ...
...... .
....... . . . . . . . . . . . ...
...... .
.. ...
.. . . . . . . . . . . . .
..............
. ....... ... ...
........... . . . . . . . . . . .
... ... . ...
..... ........
............... . . . . . . . . . . . . .
.......... ... ...
................................. . . . . . . . . . . . . . . . ......
.
..
.............. .
..............

245 250 255 −2 0 2


x
Because our tables give us areas between 0 and a point z, we have to go through the
steps depicted in the diagrams below to find this probability. We make use of the facts
that the normal distribution is symmetric, and that the area from 0 to ∞ is 0.5.
Alternatively, we can write:

Pr[X < 250] = Pr[Z < (250 − 251)/3] = Pr[Z < −1/3] = Pr[Z > 1/3]
= 0.5 − Pr[0 < Z < 1/3] = 0.5 − 0.1293 = 0.3707.

The value 0.1293 is looked up in Table 1. Thus 37% of the tubs will contain less margarine
than stated. Notice that because the normal distribution is symmetric we only need
tables for “half” of the distribution.
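Both margarine probabilities can also be obtained directly from R's pnorm, which accepts the mean and standard deviation as arguments; the small differences from the table-based values above arise from rounding z to two decimal places:

pnorm(253, mean = 251, sd = 3) - pnorm(251, mean = 251, sd = 3)   # part (a): about 0.2475
pnorm(250, mean = 251, sd = 3)                                    # part (b): about 0.3694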

Example 16B: If µ = 4 and σ = 8 what is the probability that a normally distributed


random variable X lies between 2 and 18?

Pr[2 < X < 18] = Pr[(2 − 4)/8 < (X − µ)/σ < (18 − 4)/8]   (letting z = (x − µ)/σ)
              = Pr[−0.25 < Z < 1.75]
              = 0.0987 + 0.4599 = 0.5586
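An R check of this value:

pnorm(18, mean = 4, sd = 8) - pnorm(2, mean = 4, sd = 8)   # 0.5586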

Example 17B: A t-shirt manufacturer knows that the chest measurements of his
customers are normally distributed with mean 92 cm and standard deviation 5 cm. He
makes his t-shirts in four sizes — S (to fit 80–87 cm), M (to fit 87–94), L (to
fit 94–101) and XL (to fit 101–108). What proportion of customers fit into each size
t-shirt?
8 CONTINUOUS PROBABILITY DISTRIBUTION 76

130 INTROSTAT

.......................
...... .....
.... ....
...
... ...
.... ...
...
... ..
.. ..
.
..
.
..
..
.. X ∼ N (92, 5 ) 2
.. ..
.
.... ...
...
.
.. .
.. ...
.. ..
... ...
... ...
.
.... ...
...
.... ...
.. ...
. ...
.... ...
... ...
.
.. ...
.... ...
...
.
.. ...
...
. ...
..
. ...
. ...
.... ...
.... ...
..
.
. ...
.
... ...
.
... ...
.... ...
...
.
.. ...
.
... ...
.
... ...
.. ...
.
. ...
.
... ...
.... ...
...
...
. ...
...
. ...
...
. ...
...
.... ...
...
. ...
.... ...
...
.... ...
.. ...

.
....
.
.
...
.
.
..
.
S M L ...
...
....
....
...
XL
..
...
. ....
..... .....
...
. ......
.
..
.....
. ......
.......
. .
...... .........
...
...
. ...........
...
...
...
...
...
......... .............................
...............

80 87 94 101 108
x
We need to find the z-values for each of the boundary points, by using the formula
z = (x − µ)/σ.
Then, from our normal tables, we find the area between each of these points and the
mean. This gives
x z = (x − 92)/5 Area between x and µ
80 −2.4 0.4918
87 −1.0 0.3413
94 0.4 0.1554
101 1.8 0.4641
108 3.2 0.4993
The proportions for each size are then found by subtraction (or addition in the case
of size M), as follows:
Size Proportion
S 0.4918 − 0.3413 = 0.1505 (15.05%)
M 0.3413 + 0.1554 = 0.4967 (49.67%)
L 0.4641 − 0.1554 = 0.3087 (30.87%)
XL 0.4993 − 0.4641 = 0.0352 (3.52%)

Check for yourself that 0.89% of customers don’t fit into any size t-shirt.
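The whole table can be produced in one step in R by evaluating pnorm at the size boundaries and differencing:

boundaries <- c(80, 87, 94, 101, 108)
p <- pnorm(boundaries, mean = 92, sd = 5)
diff(p)               # proportions for S, M, L, XL: 0.1505 0.4967 0.3087 0.0352
1 - (p[5] - p[1])     # proportion fitting no size: about 0.0089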

Example 18C: The mean inside diameter of washers produced by a machine is 0.403
cm and the standard deviation is 0.005 cm.
Washers with an internal diameter less than 0.397 cm or greater than 0.406 cm are
considered defective. What percentage of the washers produced are defective, assuming
the diameters are normally distributed?

Example 19C: In a large group of men 4% are under 160 cm tall and 52% are between
160 cm and 175 cm tall. Assuming that heights of men are normally distributed, what
are the mean and standard deviation of the distribution?
8 CONTINUOUS PROBABILITY DISTRIBUTION 77

CHAPTER 5. PROBABILITY DISTRIBUTIONS I 131

Example 20C: A soft-drink vending machine is set to discharge an average of 215 ml


of cooldrink per cup. The amount discharged is normally distributed with standard
deviation 10 ml.
(a) If 225 ml cups are used, what proportion of cups overflow?
(b) What is the probability that a cup contains at least 200 ml of cooldrink?
(c) What size cups ought to be used if it is desirable that only 2% of cups overflow?

Sums and differences of independent normal random variables . . .


Suppose we have a number of tasks that have to be completed in sequence e.g.
when a building is constructed. Suppose the time taken for each task obeys a normal
distribution, each having a given mean and variance and is independent of the time taken
for the other tasks. Obviously the total time taken will also be a random variable.
What will its distribution be and what will its mean and variance be? Without proof, we
state that the total time taken will be normally distributed with mean total time equal
to the sum of the means for each task, and variance equal to the sum of the variances
(not the standard deviations). Mathematically, we write this as follows. If the time Xi
taken for the i th task is such that

Xi ∼ N(µi, σi²)

and if it is independent of the time taken for other tasks, then the distribution of the
random variable Y = Σ_{i=1}^{n} Xi is

Y ∼ N(µ, σ²)

where µ = Σ_{i=1}^{n} µi and σ² = Σ_{i=1}^{n} σi².
Sometimes we need to consider the difference of two independent normally dis-
tributed random variables. Suppose

X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²)

then, letting Z = X1 − X2, we state, without proof, that

Z ∼ N(µ1 − µ2, σ1² + σ2²).

The mean of the random variable Z is found by subtraction, but the variance is still
found by addition.
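A short R simulation illustrates the result for the difference of two independent normal random variables; the means and standard deviations used here are purely illustrative:

set.seed(1)
x1 <- rnorm(100000, mean = 10, sd = 2)     # X1 ~ N(10, 2^2)
x2 <- rnorm(100000, mean = 7, sd = 1.5)    # X2 ~ N(7, 1.5^2)
d <- x1 - x2
mean(d)     # close to 10 - 7 = 3
var(d)      # close to 2^2 + 1.5^2 = 6.25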

Example 21B: You have 4 chores to perform before getting to Statistics lectures by
08h00. The time (in minutes) to perform each chore is normally distributed with mean
and standard deviation as given below:

mean (µ) std. dev. (σ)


1. Shower 5 0.5
2. Get dressed 4 1.0
3. Eat breakfast 10 3.5
4. Drive to university 15 5.0
8 CONTINUOUS PROBABILITY DISTRIBUTION 78

132 INTROSTAT

(a) If you get up at 07h20, what is the probability of being late?


(b) (i) At what time should you get up so as to be 99% sure you will not be more
than 3 minutes late?
(ii) What is the probability, getting up at this time, that you will be there after
08h00?

(a) The total time taken to get to university is a normally distributed random variable
X with mean
µ = 5 + 4 + 10 + 15 = 34 minutes
and variance
σ 2 = 0.52 + 1.02 + 3.52 + 5.02 = 38.5
and therefore standard deviation σ = 6.205.
The probability that you take more than the allowed 40 minutes is

Pr[X > 40] = Pr[Z > (40 − 34)/6.205] = Pr[Z > 0.97] = 0.1660

On average, you will be late one day in six, because 1/0.1660 ≈ 6.


(b) (i) We must choose x so that Pr[X < x] = 0.99. From tables Pr[Z < z] = 0.99
implies z = 2.33. Thus, using the formula z = (x − µ)/σ ,
2.33 = (x − 34)/6.205,
which has solution x = 48.5 minutes.
48.5 minutes before 08h03 is 07h14.5.
(ii) Getting up at 07h14.5, you arrive after 08h00 if the chores take more than 45.5 minutes. The probability of this is

Pr[X > 45.5] = Pr[Z > (45.5 − 34)/6.205] = Pr[Z > 1.85] = 0.0322.

You’ll be late about one day in 31, on average, but (by part (b)) more than
three minutes late only once in every 100 days.
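The calculations in Example 21B translate directly into R, with pnorm for the probabilities and qnorm for the 99th percentile; the slight differences from the values above come from rounding z in the tables:

mu <- 5 + 4 + 10 + 15                        # mean total time: 34 minutes
sigma <- sqrt(0.5^2 + 1^2 + 3.5^2 + 5^2)     # standard deviation: 6.205 minutes
1 - pnorm(40, mu, sigma)                     # (a)    Pr[X > 40], about 0.167
mu + qnorm(0.99) * sigma                     # (b)(i) 99th percentile, about 48.4 minutes
1 - pnorm(45.5, mu, sigma)                   # (b)(ii) Pr[X > 45.5], about 0.032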

Example 22C: Plastic caps seal the ends of the tube into which your degree certificate
is placed when you graduate. Suppose the tubes have a mean diameter of 24.0mm and
a standard deviation of 0.15mm, and that the plastic caps have a mean diameter of
23.8 mm and a standard deviation of 0.11mm. If the diameter of the cap is 0.10 mm
or more larger than that of the tube, the cap cannot be squashed into the tube, and if
the diameter of the cap is 0.45 mm or more smaller than that of the tube, it will not
seal the tube, but will just keep falling out. If a tube and a plastic cap are selected at
random, what are the probabilities of (a) the cap being too large for the tube, and (b)
the cap falling out of the tube?

END
