
Probability Theory

MTH 2102
Lecture Notes

BY
Daniel Nkwata Katongole

2025
Table of Contents

1 Probability
  1.1 Introduction to Probability
    1.1.1 A Statistical (random) experiment
    1.1.2 Features or characteristics of a statistical experiment
    1.1.3 Examples of statistical experiments
    1.1.4 Sample or Event space
    1.1.5 Examples of random experiments and their sample spaces
    1.1.6 Types of sample spaces
    1.1.7 An Event
    1.1.8 Operations of events
  1.2 Mutually Exclusive Events
  1.3 Probability of an event
    1.3.1 The case of a countably finite sample space
    1.3.2 The classical definition
    1.3.3 The relative frequency definition
    1.3.4 The subjective definition
    1.3.5 The Kolmogorov definition / Axiomatic definition
  1.4 Dependent (Conditional Probability) events
  1.5 Independent events
  1.6 Total probability and Bayes' Theorem

2 RANDOM VARIABLES
  2.1 Types of quantitative random variables
  2.2 The Probability Distributions / Probability Functions
  2.3 Discrete Probability Distributions/Functions
  2.4 The cumulative mass function / Distribution function
  2.5 Properties of a discrete random variable
    2.5.1 Expectation of a random variable E(X)
  2.6 Expectation of a function
  2.7 Variance of a discrete random variable Var(X)
  2.8 Mode of a discrete random variable
  2.9 Median of a discrete random variable

3 Common Discrete Probability Distributions
  3.1 Binomial and Bernoulli probability distributions
  3.2 Verification that the Binomial and Bernoulli distributions are pmfs
  3.3 Mean and Variance of a Binomial random variable
    3.3.1 The expected value of a binomial random variable X
    3.3.2 Variance, Var(X), of a Binomial random variable X
  3.4 The Negative Binomial distribution (the waiting time or Pascal distribution)
    3.4.1 The mean and variance of X
    3.4.2 Remark
    3.4.3 Mean and variance of X ∼ b(x; 1, P)
  3.5 The mode
  3.6 Hypergeometric distribution
    3.6.1 Mean and Variance
  3.7 The Poisson distribution
    3.7.1 Properties / postulates of a Poisson process
    3.7.2 Examples of Poisson processes
    3.7.3 The mean and variance of a Poisson random variable
    3.7.4 Verifying that P(x; λ) is a pmf
    3.7.5 Mode (most likely value) of the Poisson distribution
    3.7.6 Computing probabilities
    3.7.7 Poisson approximation of the Binomial distribution

4 Continuous Probability Distributions
  4.1 Background
  4.2 The distribution function of a continuous random variable
  4.3 The expected value, variance, median and mode of a continuous random variable
  4.4 Some common continuous distributions (pdfs)
    4.4.1 Uniform (Rectangular) Distribution
    4.4.2 Normal/Gaussian distribution
    4.4.3 The Gamma function



Chapter 1

Probability
1.1 Introduction to Probability
1.1.1 A Statistical (random) experiment
A statistical experiment is a chance/random/nondeterministic/stochastic process whose particular outcome cannot be predicted with certainty, even though all its possible outcomes may be known. Only after an experiment has been carried out is its result known. Thus, we cannot know the particular outcome of an experiment/process until it is performed.

1.1.2 Features or characteristics of a statistical experiment


i It can be repeated indefinitely under essentially unchanged conditions.

ii When the experiment is performed repeatedly, the individual outcomes seem to
occur in a haphazard manner. However, when the experiment is repeated a large number of
times, a definite pattern or regularity of outcomes appears.

iii Although the particular outcome to occur cannot be stated with certainty, the set of
all possible outcomes can be described (Sample / event space).

Note: All statistical experiments have three things in common:

• The experiment can have more than one possible outcome.

• Each possible outcome can be predicted or specified in advance but the particular
one to occur may not be known with certainty.

• The outcome of the experiment depends on chance.

1.1.3 Example of statistical experiments
1 A coin toss;

• There is more than one possible outcome.


• We can specify each possible outcome in advance: either head (H) or tail (T).
• There is an element of chance in which outcome occurs, depending on the fairness
of the coin and the conditions under which it is tossed.

2 Sitting for an exam as a process

• Has more than one possible outcome.


• Each possible outcome can be specified in advance: either pass (P) or fail (F).
• Which outcome occurs depends on chance, determined by a number of factors:
teaching methods, content coverage, environment, the student's preparedness at exam
time, the student's health status, etc.

3 Rolling a dice once

1.1.4 Sample or Event space.


This is the set of all possible outcomes of a statistical experiment. In the context of a
random experiment, the sample space is the universal set. The definition of a sample
space depends on the nature or definition of a given random experiment.

1.1.5 Examples of random experiments and their sample spaces


1. toss a coin once; S={H, T}

2. toss a coin twice; S={HH,HT,TH,TT}

3. toss a coin three times; S ={HHH, HHT, HTH, THH, HTT,THT,TTH,TTT}

4. roll a dice once; S={1, 2, 3, 4, 5, 6}

5. Observe the number of goals in a soccer match; S={0, 1, 2, 3, . . . . }

Each individual element of an event space resulting from a random experiment is called
a sample outcome or a point but generally an outcome. A sample point appears at most
once in a set. When we repeat a random experiment several times, we call each particular
performance a trial.



1.1.6 Types of sample spaces
• The number of events in the event space need not be finite or countable.

• If a sample space consists of a countably finite number of sample points, it is called a
countably finite sample space. Examples 1–4 above.

• If the sample points can be listed but the number of sample points is infinite (endless
list of outcomes), then the sample space is said to be a countably infinite sample
space. Example 5 above.

• If the sample space contains an uncountably infinite number of sample points, it is
called an uncountably infinite sample space. Example: choosing a number between
0 and 1 exclusive, S = {x : 0 < x < 1, x ∈ R}.

Countably finite or countably infinite sample spaces are generally called discrete (discontinuous)
sample spaces, while uncountably infinite sample spaces are said to be continuous sample
spaces.

1.1.7 An Event
An event is the occurrence of one or a combination of two or more sample points of a sample space.
An event is a part, or subset, of the sample space; hence an event is itself a set. If it contains
only one sample point it is said to be a simple (indecomposable) event. If the event
contains two or more sample points, it is called a compound (decomposable) event because
it can be decomposed into simple events.

Example
Consider rolling a dice once; The sample space is given by:

S = {1, 2, 3, 4, 5, 6} (1.1)

The set
E = {obtaining a 4}, =⇒ E = {4} (1.2)
is a simple event. Whereas if

G = {obtaining an even number} =⇒ G = {2, 4, 6}. (1.3)

H = {obtaining a number > 3} =⇒ H = {4, 5, 6}. (1.4)

The subsets G and H are compound events.

1.1.8 Operation of events


Event sets of a random experiment can be manipulated using set operations such as
Union, intersection, etc. Details follow;



1 Intersection of events: If A and B are events, then ∩ is the intersection operation.

• (A ∩ B) denotes the intersection of A and B


• (A ∩ B) is a set of all elements common to sets A and B.
• (A ∩ B) can be a simple or compound event
• (A ∩ B) occurs if both A and B occur.
• The event (A1 ∩ A2 ∩ ........... ∩ An ) occurs if all A1 , A2 ....and An occur.
• Key words ”and” and ”all of” correspond to intersections.

2 Union of events:

• ∪ is the union operation and (A ∪ B) denotes the union of sets A and B.


• (A ∪ B) is the set of all simple elements in either A or B or both.
• In the set (A ∪ B), each element is written once.
• (A ∪ B) is also a compound event.
• Similarly, if A1, A2, ..., An are events, then the event (A1 ∪ A2 ∪ A3 ∪ ... ∪ An)
occurs if at least one of A1, A2, ..., An occurs.

3 A null event: This is an empty event, A = {} = ∅ (meaning an impossible event to occur)

4 Complementary events: Given an event A, the complementary event of A is denoted Ac
or A′ or Ā, and is the set of all elements in the event space that are not in A. Hence sets A and Ac
are said to be complementary events.

Meanings of the following symbols ought to be known;

• ∈: is a member of

• ∉: is not a member of

• ∋: contains as a member,

• ⊂: is a subset of

• ⊆: is a subset of or equal to

• ⊇: is a super-set of or equal to

• { }: curly brackets, used to enclose the elements of a set (and to group an operation)

• ∃: there exists an element



Example
Consider rolling a dice once; S = {1, 2, 3, 4, 5, 6}. If A is the event ”obtaining a number < 1”,
then A is a null event i.e A = {} or A = ∅

Remarks
The algebra of events is similar to the algebra of sets. The following are some of the rules
of events. Let S be the universal set.

1 Identity laws

• A∪∅=A
• A∪S=S
• A∩∅=∅
• A∩S=A

2 Commutative laws

• A∪B=B∪A
• A∩B=B∩A

3 Associative laws

• (A ∪ B) ∪ C = A ∪ (B ∪ C)
• (A ∩ B) ∩ C = A ∩ (B ∩ C)

4 Distributive laws

• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

5 Complementary laws

• (Ac )c = A
• Sc = ∅
• ∅c = S
• (A ∪ Ac ) = S
• A ∩ Ac =∅

6 De-Morgan’s laws

• (A ∩ B)c = Ac ∪ Bc
• (A ∪ B)c = Ac ∩ Bc
• (A1 ∩ A2 ∩ A3 ∩ ... ∩ An)^c = A1^c ∪ A2^c ∪ A3^c ∪ ... ∪ An^c = ∪_{i=1}^{n} Ai^c



• (A1 ∪ A2 ∪ A3 ∪ ... ∪ An)^c = A1^c ∩ A2^c ∩ A3^c ∩ ... ∩ An^c = ∩_{i=1}^{n} Ai^c

7 Idempotent law

• A∩A=A
• A∪A=A

1.2 Mutually Exclusive Events


Two events A and B are said to be mutually exclusive if

A∩B=∅ (1.5)

or
P(A ∩ B) = 0 (1.6)
such that
P(A ∪ B) = P(A) + P(B) (1.7)
If this is so, then A and B are also said to be disjoint events: A and B have no common
elements. Generally, for n disjoint events Ai, i = 1, 2, ..., n,

P(∪_{i=1}^{n} Ai) = P(A1) + P(A2) + ... + P(An) = Σ_{i=1}^{n} P(Ai) (1.8)

If in addition to being mutually exclusive, events A and B are such that

(A ∪ B) = S (1.9)

or
P(A ∪ B) = P(S) = 1 (1.10)
then A and B are said to be mutually exhaustive events. Note that, with this definition, mutual
exhaustiveness includes mutual exclusiveness, but mutual exclusiveness does not imply mutual exhaustiveness. Thus,

(A ∩ B) = ∅ does not imply (A ∪ B) = S (1.11)

or
P(A ∩ B) = 0 does not imply P(A ∪ B) = P(S) = 1 (1.12)

Similarly, the condition A ∪ B = S on its own (or P(A ∪ B) = P(S) = 1 on its own) does not imply
mutual exclusiveness: it is not necessarily true that A ∩ B = ∅ or P(A ∩ B) = 0.

Example
If A and B are mutually exhaustive events (so A ∩ B = ∅ and A ∪ B = S), test whether the following
pairs of events are mutually exclusive by evaluating;



(a) P(Ac ∩ B)

Solution
From the contingency table

B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc

A and B being mutually exhaustive events imply that

(A ∩ B) = ∅ (1.13)

and
(A ∪ B) = S (1.14)
or
P(A ∩ B) = 0 (1.15)
and
P(A ∪ B) = 1 (1.16)
Now
B = (A ∩ B) ∪ (Ac ∩ B) (1.17)
or
Ac = (Ac ∩ B) ∪ (Ac ∩ Bc ) (1.18)
P(B) = P(A ∩ B) + P(Ac ∩ B) (1.19)
P(Ac ∩ B) = P(B) − P(A ∩ B) (1.20)
P(Ac ∩ B) = P(B) − 0 (1.21)
P(Ac ∩ B) = P(B) ≠ 0 (1.22)

Hence Ac and B are not mutually exclusive although events A and B are mutually
exhaustive

(b) P(A ∩ Bc )
Solution
From the contingency table

B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc



Bc = (A ∩ Bc) ∪ (Ac ∩ Bc) (1.23)
or
A = (A ∩ B) ∪ (A ∩ Bc) (1.24)
P(Bc) = P((A ∩ Bc) ∪ (Ac ∩ Bc)) (1.25)
P(Bc) = P(A ∩ Bc) + P(Ac ∩ Bc) (1.26)
P(A ∩ Bc) = P(Bc) − P(Ac ∩ Bc) (1.27)
P(A ∩ Bc) = [1 − P(B)] − [1 − P(A ∪ B)] (1.28)
P(A ∩ Bc) = 1 − P(B) − 1 + P(A) + P(B) − P(A ∩ B) (1.29)
P(A ∩ Bc) = P(A) ≠ 0 (1.30)

Hence A and Bc are not mutually exclusive.

(c) P(Ac ∩ Bc ) Solution


From the contingency table

B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc

Bc = (A ∩ Bc) ∪ (Ac ∩ Bc) (1.31)
or
Ac = (Ac ∩ B) ∪ (Ac ∩ Bc) (1.32)
P(Ac) = P((Ac ∩ B) ∪ (Ac ∩ Bc)) (1.33)
P(Bc) = P(A ∩ Bc) + P(Ac ∩ Bc) (1.34)
P(Ac ∩ Bc) = P(Bc) − P(A ∩ Bc) (1.35)
P(Ac ∩ Bc) = P(Bc) − P(A only) (1.36)
P(Ac ∩ Bc) = P((A ∪ B)c) (1.37)
P(Ac ∩ Bc) = 1 − P(A ∪ B) (1.38)
P(Ac ∩ Bc) = 1 − [P(A) + P(B) − P(A ∩ B)] (1.39)
P(Ac ∩ Bc) = 1 − [P(A) + P(B)] = 1 − P(A ∪ B) = 1 − 1 = 0 (1.40)

Hence Ac and Bc are mutually exclusive.

1.3 Probability of an event


As remarked above, an experiment is a chance process, so when it is performed the occurrence
of any particular outcome is random. Hence, when a random experiment is performed, a quantifiable
likelihood or chance of an individual event A occurring is defined. This is called the probability
of the event, denoted P(A).



So every event of an experiment occurs according to the probability assigned to it.

P(A) is a number between 0 and 1 inclusive, i.e. 0 ≤ P(A) ≤ 1, that shows how likely
event A is to occur. If P(A) = 0, the event is a null (impossible) event; if P(A) is close to 0,
it is very unlikely that A occurs; if P(A) is close to 1, A is very likely to occur; and if
P(A) = 1, the event is certain to occur. Below are some of the alternative definitions/
interpretations of the probability of an event, P(A);

1.3.1 The case of countably finite sample space


According to this definition, to determine the probability of events resulting from a
random experiment with a finite sample space S = {a1, a2, ..., an}, we assume that there
exists some set of numbers wi, called weights, such that 0 ≤ wi ≤ 1 and Σ_S wi = 1.
Assigning the respective sample points ai weights wi according to their likelihood of
occurrence, the probability of an event A over the sample space S is the sum of all
weights assigned to the sample points belonging to A. Thus,

P(A) = Σ_{ai ∈ A} wi (1.41)

and the weights wi can be established using the conditions 0 ≤ wi ≤ 1 and Σ_S wi = 1.

Example 1
Suppose that a fair dice is rolled once, find the probability that the outcome is an even
number.

Solution

Finite sample space:

S = {1, 2, 3, 4, 5, 6}

Let weights wi be assigned to all outcomes such that 0 ≤ wi ≤ 1 and Σ_S wi = 1. This implies

w1 + w2 + w3 + w4 + w5 + w6 = 1

Since the die is fair, all the outcomes are equally likely, and therefore

w1 = w2 = w3 = w4 = w5 = w6 = w

This therefore implies

6w = 1, or w = 1/6



Let A be the event that an even number appears,

A = {2, 4, 6}

Hence,

P(A) = Σ_A wi = w + w + w = 3w = 3 × (1/6) = 0.5
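The weight-assignment recipe above can be checked mechanically. The following short Python sketch (not part of the original notes; the variable names are illustrative) assigns equal weights to the six faces of a fair dice and sums the weights of the sample points belonging to the event "an even number appears":

# Minimal sketch of the weights definition: assign a weight to each sample
# point, check the weights sum to 1, then sum the weights of the points in A.
sample_space = [1, 2, 3, 4, 5, 6]
weights = {outcome: 1 / 6 for outcome in sample_space}   # fair dice: equal weights

assert abs(sum(weights.values()) - 1) < 1e-12            # condition: weights sum to 1

event_A = {2, 4, 6}                                       # "an even number appears"
p_A = sum(weights[outcome] for outcome in event_A)        # P(A) = sum of weights in A
print(p_A)                                                # 0.5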
Example 2
A dice is loaded such that an odd number is three times as likely to appear on top as an
even number. What is the probability that, on a toss of the dice, a number less than 3 appears?
Solution
Finite sample space:
{1, 2, 3, 4, 5, 6}
Let w be the weight of an even number; then the weight of an odd number is 3w, such that
0 ≤ wi ≤ 1 and Σ_{i=1}^{6} wi = 1. This implies that

3w + w + 3w + w + 3w + w = 1

12w = 1

or

w = 1/12

Let A be the event that a number less than 3 appears, A = {1, 2}. Hence,

P(A) = Σ_A wi = 3w + w = 4w

or

P(A) = 4/12 = 1/3

Alternatively, if the weight of an odd number is w, the weight of an even number is (1/3)w, such that
0 ≤ wi ≤ 1 and Σ_{i=1}^{6} wi = 1. This implies that

w + (1/3)w + w + (1/3)w + w + (1/3)w = 4w = 1

w = 1/4

Let A be the event that a number less than 3 appears, A = {1, 2}. Hence,

P(A) = Σ_A wi = w + (1/3)w = (4/3)w = (4/3) × (1/4) = 1/3
Disadvantages of this definition/interpretation



This definition fails if;

• The sample space is infinite or unknown or empty

• The likelihood of occurrence of the respective outcomes are not known

1.3.2 The classical definition


Let an experiment have a finite sample space S with N equally likely outcomes. If n of the
N outcomes correspond to some event A, then the probability of A is defined by

P(A) = n(A)/n(S) = n/N,  N ≠ 0 (1.42)

Proof
Let S = {a1, a2, ..., aN}. Since the outcomes are equally likely and S is finite, we assign the same
weight w to each ai such that 0 ≤ w ≤ 1 and Σ_S w = 1:

Nw = 1 =⇒ w = 1/N (1.43)

If n of the N outcomes correspond to event A, then A = {a1, a2, ..., an}, so that

P(A) = Σ_A w = w + w + ... + w = nw = n × (1/N) = n/N (1.44)

Remark
The word fair is used to convey the fact that the possible outcomes are equally likely. This
is the most commonly used definition of probability of an event. It is a particular case of
the first definition when the outcomes are equally likely.

Limitations
This definition fails if;

• The outcomes are not equally likely

• The sample space is infinite or unknown

1.3.3 The Relative frequency definition


Let an experiment be performed under exactly the same conditions, repeated N times. If
n of these N repetitions result in some event A, then n/N is the relative frequency of event A
among the N trials. The probability of event A is then defined as

P(A) = lim_{N→∞} (n/N) (1.45)



implying that N is sufficiently large. In other words, the probability of an event is the
proportion of the times that events of the same kind will occur in a long run or among
sufficiently large numbers of outcomes.

Example

If records show that 516 out of 600 aeroplanes from Entebbe to Nairobi arrive on time,
what is the probability that any one aeroplane from Entebbe to Nairobi will not arrive on
time?

Solution

Let L be the event that a plane does not arrive on time. Then n = 600 − 516 = 84 of the
N = 600 recorded flights were late, so

P(L) ≈ n/N = 84/600 = 0.14
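The limiting ratio n/N can be illustrated by simulation. The sketch below (not part of the notes; it assumes the true late-arrival probability is the 0.14 estimated above) shows the relative frequency settling near 0.14 as the number of simulated flights N grows:

# Hedged illustration of the relative-frequency definition: simulate N flights
# with an assumed true probability of lateness and compute n/N for growing N.
import random

random.seed(1)
true_p_late = 84 / 600                      # 0.14, taken from the example above

for N in (100, 10_000, 1_000_000):
    n_late = sum(random.random() < true_p_late for _ in range(N))
    print(N, n_late / N)                    # n/N approaches 0.14 as N grows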

Limitations

• Repeating an experiment exactly under the same conditions is impossible even with
the same equipment.

• We are not told how large the number of trials N should be in order to be considered
being “sufficiently large”

1.3.4 The subjective definition


Here, an event has a probability of occurrence as assigned to it by an individual, thus
subjectivity from experience. Examples include;

• Superstitions

• Gambling games on streets

• Lotteries

1.3.5 The Kolmogorov definition/Axiomatic definition


Kolmogorov- a Russian Mathematician, identified three facts in this definition of the prob-
ability of an event. He called them “axioms of probability” and they were automatically
accepted since they needed no proof. Later his definition became the axiomatic definition
of probability and today these axioms act as a foundation for Probability theory.

1. Axiom 1:
For any event A, P(A) ≥ 0 and P(A) ≤ 1, i.e. 0 ≤ P(A) ≤ 1. This axiom indicates that
P(A) cannot be negative. The smallest value for P(A) is zero, and if P(A) = 0 then the
event A will never happen/occur.



On the other hand, if P(A) = 1, event A will surely occur. And 0 ≤ P(A) ≤ 1 implies
that the probability of event A is a proportion of times it occurs within a given
number of trials. Therefore, P(A) is a variable and always lies between 0 and 1.

2. Axiom 2:
Probability of the sample space S is P(S) = 1. This axiom states that the probability of
the whole sample space is absolutely 1 (100%). S is a sure event to occur. The reason
for this is that the sample space S contains all possible outcomes of the random
experiment. Thus, the outcome of each trial always belongs to S, i.e., the event S
always occurs and P(S) = 1. At any trial, any outcome must be part of the sample
space.

3. Axiom 3:
If A1 , A2 , ......An is a sequence of n disjoint or mutually exclusive events in the sample
space S, then the probability of occurrence of at least one of them i.e their union is
given by
P(A1 ∪ A2 ∪ ... ∪ An) = P(∪_{i=1}^{n} Ai) = P(A1) + P(A2) + ... + P(An) = Σ_{i=1}^{n} P(Ai) (1.46)

This axiom is probably the most interesting one. The basic idea is that if some
events are disjoint (i.e., there is no overlap between them), then the probability of
their union must be the sum of their probabilities. Another way to think
about this is to imagine the probability of a set as the area of that set. If several sets
are disjoint, then the total area of their union is the sum of the individual areas.

Kolmogorov’s three axioms have been used to prove various theorems in Probability
theory such as;

1. P(Ac ) = 1 − P(A), since A ∪ Ac = S, where Ac and A are complementary events in S.

Figure 1.1

This implies that


A ∪ Ac = S



and this is the complementary law

P(A ∪ Ac ) = P(S) = 1

P(A) + P(Ac) − P(A ∩ Ac) = 1 (1.47)

P(A) + P(Ac) − 0 = 1 (1.48)

since A and Ac are mutually exclusive events. Hence

P(Ac) = 1 − P(A) (1.49)

2. Since A ∪ ∅ = A, then P(∅) = 0. Given that A = A ∪ ∅ (identity law),

P(A) = P(A ∪ ∅) = P(A) + P(∅) − P(A ∩ ∅) (1.50)

P(A) = P(A) + P(∅) − 0 (1.51)


P(A) = P(A) + P(∅) (1.52)
P(A) − P(A) = P(∅) (1.53)
P(∅) = 0 (1.54)

3. If A ⊆ S,then P(A) ≤ 1.
From
P(A) ≤ P(S) = 1
then
P(A) ≤ 1 (1.55)

4. If A ⊆ B in S, then P(A) ≤ P(B). This is the monotonicity property of probability.

Figure 1.2

B = A ∪ (B − A) = A ∪ (B only) (1.56)

P(B) = P(A) + P(B − A) − P(A ∩ (B − A)) (1.57)

P(B) = P(A) + P(B only), (1.58)

since A and (B − A) are mutually exclusive. If B only = ∅, then A = B and therefore
P(B) = P(A). If B only ≠ ∅, then A ⊂ B and P(A) ≤ P(B), i.e. P(B) > P(A). Hence, for
A ⊆ B, P(A) ≤ P(B).

5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) where A and B are non-mutually exclusive events
in a sample space S. This is called an additive rule.

Figure 1.3

From the Venn diagram above;

A ∪ B = (A only ∪ (A ∩ B) ∪ B only) (1.59)

A ∪ B = (A ∪ B only) (1.60)
A ∪ B = (A only ∪ B) (1.61)
Using Eq. 1.59
P(A ∪ B) = P(A only ∪ (A ∩ B) ∪ B only) (1.62)
P(A ∪ B) = P(A only) + P((A ∩ B) + P(B only) − 0 (1.63)
since A only, (A ∩ B) and B only are mutually exclusive. But

P(A only) = P(A) − P(A ∩ B) (1.64)

and
P(B only) = P(B) − P(A ∩ B) (1.65)
So

P(A ∪ B) = P(A) − P(A ∩ B) + P(A ∩ B) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A ∩ B) (1.66)

Or
P(A ∪ B) − P(B) = P(A) − P(A ∩ B) (1.67)
Or
P(A − B) = P(A only) = P(A) − P(A ∩ B) (1.68)

And
P(B only) = P(B) − P(A ∩ B) (1.69)
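The additive rule just derived can also be checked numerically by brute-force enumeration. The Python sketch below (not from the notes; the event choices are illustrative) verifies P(A ∪ B) = P(A) + P(B) − P(A ∩ B) on the equally likely sample space of a single dice roll, using exact fractions:

# Small check of the additive rule on an equally likely sample space.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                       # even number
B = {4, 5, 6}                       # number greater than 3

def prob(event):
    # classical definition: n(event) / n(S)
    return Fraction(len(event), len(S))

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs, lhs == rhs)         # 5/6 5/6 True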



Exercise

a Repeat using Eq.1.62 and Eq. 1.61


b Prove that if A,B and C are events, then

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C) (1.70)

Note
In all the definitions above, the probability of an event, P(A), is a number.

1.4 Dependent (Conditional Probability) events.


The probability of an event A to occur given that some other event B has already occurred
is called conditional probability of A given B, denoted P(A/B). We say that the probability
of A depends on the previous occurrence of event B. Hence, A and B are dependent events
and P(A/B) is defined by

P(A/B) = P(A ∩ B) / P(B) (1.71)

This implies that

P(A ∩ B) = P(A/B) × P(B) (1.72)

Similarly, P(B/A) is the conditional probability of event B given that event A has occurred,
defined by

P(B/A) = P(A ∩ B) / P(A) (1.73)

This implies that

P(B ∩ A) = P(B/A) × P(A) (1.74)

Since (A ∩ B) = (B ∩ A) by the commutative law, then

P(A ∩ B) = P(A/B) × P(B) = P(B/A) × P(A) (1.75)

The equations Eq. 1.72 and Eq. 1.74 are called the multiplicative rules of probability. Generally,
if A1, A2, ..., An are events in some sample space S such that P(Ai) ≠ 0, P(Ai ∩ Aj) ≠ 0
and P(∩_{i=1}^{n} Ai) ≠ 0, then the multiplicative rule becomes

P(∩_{i=1}^{n} Ai) = P(A1) × P(A2/A1) × P(A3/(A1 ∩ A2)) × ... × P(An/(A1 ∩ A2 ∩ A3 ∩ ... ∩ An−1)) (1.76)

For n = 3,

P(∩_{i=1}^{3} Ai) = P(A1) × P(A2/A1) × P(A3/(A1 ∩ A2)) (1.77)

Example



1 The probability that a regularly scheduled flight departs on time is 0.83, and the probability
that it arrives on time is 0.92. The probability that it departs and arrives on
time is 0.78. Find the probability that the plane;

i Arrives on time given that it departs on time


ii Departs on time given that it arrives on time

Solution
Let D = the plane departs on time, so P(D) = 0.83; let A = the plane arrives on time, so
P(A) = 0.92; and P(D ∩ A) = P(A ∩ D) = 0.78.
i)

P(A/D) = P(A ∩ D) / P(D) = 0.78 / 0.83 = 0.94

ii)

P(D/A) = P(D ∩ A) / P(A) = 0.78 / 0.92 = 0.85
2 An urn contains 20 mangoes of which 5 are bad. If 2 mangoes are selected at random
from the urn in succession without replacement of the first mango, find the probability
that both mangoes are bad.
Solution
Let M1 = picking a bad mango the first time and M2 = picking a bad mango the second time. Then

P(M1 ∩ M2) = P(M1) × P(M2/M1) = (5/20) × (4/19) = 1/19
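The conditional-probability computations above are simple enough to check in code. The following Python sketch (not from the notes; values are taken from the flight example) reproduces Example 1:

# Sketch of the conditional-probability rule using the flight example:
# P(D) = 0.83, P(A) = 0.92, P(D ∩ A) = 0.78.
p_D = 0.83            # departs on time
p_A = 0.92            # arrives on time
p_D_and_A = 0.78      # departs and arrives on time

p_A_given_D = p_D_and_A / p_D     # P(A/D)
p_D_given_A = p_D_and_A / p_A     # P(D/A)
print(round(p_A_given_D, 2), round(p_D_given_A, 2))   # 0.94 0.85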

1.5 Independent events


Event A is said to be independent of B if the occurrence of A is not influenced by the
previous occurrence of B. Thus,
P(A/B) = P(A) (1.78)
P(B/A) = P(B) (1.79)
Conditions Eq. 1.78 and Eq. 1.79 are the necessary and sufficient conditions for events A
and B to be independent. This implies that

P(A ∩ B) = P(A) × P(B/A) = P(A) × P(B) (1.80)

and

P(B ∩ A) = P(B) × P(A/B) = P(B) × P(A) (1.81)

More generally, from Eq. 1.80, for n independent events,

P(∩_{i=1}^{n} Ai) = P(A1) × P(A2) × P(A3) × ... × P(An) = Π_{i=1}^{n} P(Ai) (1.82)



Remarks
If events A and B are independent, then;

a Ac and B are independent

b A and Bc are independent

c Ac and Bc are independent

Each of the above can be shown using the contingency table

∩ B Bc
A A∩B A ∩ Bc
Ac Ac ∩ B Ac ∩ Bc

1. (Ac ∩ B)
P(Ac ∩ B) = P(B) − P(A ∩ B) (1.83)
P(Ac ∩ B) = P(B) − P(A) × P(B) = P(B)(1 − P(A)) (1.84)
P(Ac ∩ B) = P(B) × P(Ac ) (1.85)

2. (A ∩ Bc )
P(A) = P(A ∩ B) + P(A ∩ Bc ) (1.86)
P(A ∩ Bc ) = P(A) − P(A) × P(B) (1.87)
P(A ∩ Bc ) = P(A)[1 − P(B)] = P(A) × P(Bc ) (1.88)

3. (Ac ∩ Bc)
P(Ac) = P(Ac ∩ B) + P(Ac ∩ Bc) (1.89)
P(Ac ∩ Bc) = P(Ac) − P(Ac ∩ B) (1.90)
P(Ac ∩ Bc) = P(Ac) − P(Ac) × P(B) = P(Ac)[1 − P(B)] = P(Ac) × P(Bc) (1.91)

Examples

1 Prove that P(A/B) + P(Ac /B) = 1


Solution

P(A/B) + P(Ac/B) = P(A ∩ B)/P(B) + P(Ac ∩ B)/P(B) = [P(A ∩ B) + P(Ac ∩ B)]/P(B) = P(B)/P(B) = 1
2 A box contains 20 fuses of which 5 are defective. If 3 of the fuses are selected at random
and removed from the box with succession, what is the probability that all the three
fuses are defective if removed;

a) Without replacement?
b) With replacement?



Solution
n(D) = 5, n(Dc) = 15; therefore P(D) = 5/20 and P(Dc) = 15/20.

a) Without replacement, this is a dependent case:

P(D ∩ D ∩ D) = P(D) × P(D/D) × P(D/(D ∩ D)) = (5/20) × (4/19) × (3/18) = 1/114

b) With replacement, this is an independent case:

P(D ∩ D ∩ D) = P(D) × P(D) × P(D) = (5/20) × (5/20) × (5/20) = 1/64

3 A box contains 6 black balls and 4 white balls. Two balls are selected from the box
without replacement. Find the probability that;
a) Both balls are black
b) Both balls are of the same colors
c) Both balls are of different colors
Solution
Given n(B) = 6, n(W) = 4 and n(T) = 10; therefore P(B) = 6/10 and P(W) = 4/10. Without
replacement implies a dependent case.
a) Both balls black:

P(B ∩ B) = P(B) × P(B/B) = (6/10) × (5/9) = 1/3

b) Balls of the same colour:

P((B ∩ B) ∪ (W ∩ W)) = P(B ∩ B) + P(W ∩ W) = (6/10) × (5/9) + (4/10) × (3/9) = 7/15

c) Balls of different colours:

P((B ∩ W) ∪ (W ∩ B)) = P(B ∩ W) + P(W ∩ B) = 1 − P(same colour) = 1 − 7/15 = 8/15

4 A bag contains 4 white buttons and 3 black ones and a second bag contains 3 white
and 5 black buttons. One button is picked at random from the second bag and placed
unseen into the first bag. What is the probability that a button drawn from the first bag
is white?
5 Two baskets A and B contain 8 white and 4 blue balls, and 5 white and 10 blue balls, respectively.
A ball is drawn randomly from A and transferred into B. If a ball is then drawn
randomly from B, find the probability that it will be white.



6 A small town has two fire trucks and one ambulance available for emergencies. The
probability that the fire truck is available is 0.89, and the probability that the ambulance
is available is 0.92.Find the probability that both are available.

7 A coin is tossed three times. Find the probability of getting 2 tails and 1 head if the coin
is;

(a) Fair
(b) Such that a head is twice as likely to occur as the tail.

Solution
The sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
a) If the coin is fair, then all the outcomes in the sample space are equally likely, each with
probability P = 1/8. The event with 2 tails and 1 head is A = {HTT, THT, TTH}. Then

P(A) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
b) Here, the outcomes occur independently but are not equally likely. Let w be the weight
(probability) of a tail on one toss; then the weight of a head is 2w, and the probabilities of the
outcomes in S must satisfy 0 ≤ P ≤ 1 and Σ_S P = 1.
For HHH, P(HHH) = P(H) × P(H) × P(H) = 2w × 2w × 2w = 8w^3
For HHT, P(HHT) = P(H) × P(H) × P(T) = 2w × 2w × w = 4w^3
For HTH, P(HTH) = P(H) × P(T) × P(H) = 2w × w × 2w = 4w^3
For THH, P(THH) = P(T) × P(H) × P(H) = w × 2w × 2w = 4w^3
For HTT, P(HTT) = P(H) × P(T) × P(T) = 2w × w × w = 2w^3
For THT, P(THT) = P(T) × P(H) × P(T) = w × 2w × w = 2w^3
For TTH, P(TTH) = P(T) × P(T) × P(H) = w × w × 2w = 2w^3
For TTT, P(TTT) = P(T) × P(T) × P(T) = w × w × w = w^3
But Σ_S P = 1. Therefore,

8w^3 + 4w^3 + 4w^3 + 4w^3 + 2w^3 + 2w^3 + 2w^3 + w^3 = 1

or

27w^3 = 1 =⇒ w = 1/3
Given that A = {HTT, THT, TTH}, then

P(A) = P(HTT) + P(THT) + P(TTH)

But

P(HTT) + P(THT) + P(TTH) = P(H)·P(T)·P(T) + P(T)·P(H)·P(T) + P(T)·P(T)·P(H)

Therefore

P(A) = (2/3)(1/3)(1/3) + (1/3)(2/3)(1/3) + (1/3)(1/3)(2/3) = 3 × 2/27 = 2/9
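The enumeration in part (b) can be checked by listing all eight sequences in code. The following Python sketch (not part of the notes) assigns P(H) = 2/3 and P(T) = 1/3 to each independent toss, multiplies along each sequence, and adds the sequences with exactly 2 tails and 1 head:

# Enumeration sketch for the biased-coin case; purely illustrative.
from fractions import Fraction
from itertools import product

p = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

total = Fraction(0)
for seq in product("HT", repeat=3):            # the 8 outcomes HHH ... TTT
    if seq.count("T") == 2:                    # exactly 2 tails and 1 head
        total += p[seq[0]] * p[seq[1]] * p[seq[2]]
print(total)                                    # 2/9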

1.6 Total probability and Bayes’ Theorem


Let A1, A2, ..., An be n mutually exclusive events that constitute a partition of a
sample space S (their union is S), where P(Ai) ≠ 0 for i = 1, 2, 3, ..., n.

Figure 1.4

Also let B be an event in S, so that B is split across the partition {Ai}, with P(B) ≠ 0.



Figure 1.5

B can be expressed as a union of the n Mutually exclusive events (Ai ∩ B). Thus,

B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ (A3 ∩ B) ∪ ... ∪ (An ∩ B) (1.92)

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B) + ... + P(An ∩ B) (1.93)


But
P(A1 ∩ B) = P(A1 ) × P(B/A1 ) (1.94)
P(A2 ∩ B) = P(A2 ) × P(B/A2 ) (1.95)
In general
P(An ∩ B) = P(An ) × P(B/An ) (1.96)
So,

P(B) = P(A1) × P(B/A1) + P(A2) × P(B/A2) + ... + P(An) × P(B/An) (1.97)

P(B) = Σ_{i=1}^{n} P(Ai) × P(B/Ai) (1.98)

This is called the law of total probability. For a particular event Ai, then

P(Ai/B) = P(Ai ∩ B) / P(B) = [P(Ai) × P(B/Ai)] / [Σ_{i=1}^{n} P(Ai) × P(B/Ai)] (1.99)

This is the famous Reverend Thomas Bayes' Theorem.

Example
Members of some consultancy firm rent cars from three rental agencies; 60% from agency
A1 , 30% from agency A2 , and 10% from agency A3 . If 9% of the cars from A1 need a
tune-up, 20% of the cars from A2 need a tune-up and 6% from A3 need a tune-up.

(a) What is the probability that a rental car delivered to the firm will need a tune-up?

(b) If a rental car delivered to the firm needed a tune-up what is the probability that it



came from A2

Solution
Let T represent the event that a car needs a tune-up. Given: P(A1) = 0.6, P(A2) = 0.3,
P(A3) = 0.1, P(T/A1) = 0.09, P(T/A2) = 0.2 and P(T/A3) = 0.06.
(a)
P(T) = P(A1 ∩ T) + P(A2 ∩ T) + P(A3 ∩ T)
P(T) = P(A1) × P(T/A1) + P(A2) × P(T/A2) + P(A3) × P(T/A3)

P(T) = 0.6 × 0.09 + 0.3 × 0.2 + 0.1 × 0.06 = 0.12

Hence 12% of the cars delivered to the firm need a tune-up.
(b)

P(A2/T) = P(A2 ∩ T) / P(T), where P(A2 ∩ T) = P(A2) × P(T/A2)

P(A2/T) = [P(A2) × P(T/A2)] / P(T) = (0.3 × 0.2) / 0.12 = 0.5
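The two steps above, the law of total probability followed by Bayes' theorem, are easy to script. The Python sketch below (illustrative only; the dictionary keys are just labels) reuses the values from this example:

# Law of total probability and Bayes' theorem for the rental-car example.
priors = {"A1": 0.6, "A2": 0.3, "A3": 0.1}          # P(Ai)
p_T_given = {"A1": 0.09, "A2": 0.20, "A3": 0.06}    # P(T/Ai)

p_T = sum(priors[a] * p_T_given[a] for a in priors)  # law of total probability
posterior_A2 = priors["A2"] * p_T_given["A2"] / p_T  # Bayes' theorem
print(round(p_T, 2), round(posterior_A2, 2))         # 0.12 0.5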



Exercise

1 In a production process, there are two different machines M1 and M2 . 20% and 80%
of the items are produced by M1 and M2 respectively. It has been established that 50%
of the items produced by M1 and 8% of the items produced by M2 are defective. If an
item is selected at random, find;

a) The probability that it is defective


b) The probability that the item produced by M2 is defective

2 Two boxes one with 8 black and 2 white balls and the other with 3 black and 7 white
balls are placed on a table. One box is chosen at random and a ball is picked from it.

a) What is the probability that it is a black ball picked


b) What is the probability of choosing a ball from the first box given that it is black?

3 Three professors have been nominated for the post of Dean. Their respective proba-
bilities are 0.3, 0.5 and 0.4. If professor A is elected, the probability that the faculty
will have new computers is 0.8. If B is elected, the probability is 0.1 and if C, it is 0.3.
Find the probabilities that the professors A,B and C are elected if the faculty gets new
computers.



Chapter 2

RANDOM VARIABLES
A random variable:
A variable is a characteristic being studied that can assume a prescribed set of values. It
can be Qualitative (Nominal & ordinal) or quantitative (Discrete & Continuous).

Considering a quantitative variable, a random variable is a real-valued function repre-


senting the outcomes of a random experiment. A random variable assumes different
values as a result of a random trial.

Example
Consider tossing a fair coin once, twice and three times. And in every case, count the
“number of heads” X obtained;

Random experiment — Random variable X (number of heads)

1 Toss a fair coin once: S = {T, H}; X = x: 0, 1
2 Toss a fair coin twice: S = {TT, TH, HT, HH}; X = x: 0, 1, 1, 2 =⇒ x: 0, 1, 2
3 Toss a fair coin thrice: S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}; X = x: 0, 1, 1, 1, 2, 2, 2, 3 =⇒ x: 0, 1, 2, 3

Remarks

• In the above experiments, we were not interested in the individual outcomes but
instead some variable X (the number of heads) whose values were determined by
all the outcomes of the random experiment

• All the values of the variables X are real

• The variable X assume different real values x for the respective random trials

• The prescribed set of values assumed by the variable X forms the range of the random variable.
Therefore, a variable whose value is real and determined by the sample outcomes
of the experiment is called a random variable (chance variable, stochastic variable).
In these particular experiments, the random variable X is a function that assigns
a real value/number x to each possible outcome in the sample space.

2.1 Types of quantitative Random variables
1 A discrete random variable X is one that assumes a countable number of values. It is
defined over a countably finite or countably infinite sample space. Examples:
• Number of heads in the above example
• Number of colored bulbs in a house
• Number of chairs in a library
2 A continuous random variable is defined over an uncountably infinite sample space.
It assumes values on the whole of the real line or on part of it. In other words,
it assumes values associated with points on the number line. Examples include
weights, heights, racing times, etc.

Measure weights of students W(kg)=w: 55,50.1,56,67, etc


Measure heights of students H(m) h: 1.6, 1.66, 1.5,1.47,1.56,1.66, etc

2.2 The Probability Distributions/ Probability Functions


A random variable assumes each of its possible values with a definite probability/chance-
hence also called a chance variable. We usually need to know a function f (x) = P(X = x)
or a table or both defining/describing the “probability with which a random variable
assumes a particular value x”.

Such a function is called probability distribution/Probability function. All possible values


of a random variable together with their associated probabilities constitute a probability
distribution, denoted by P(X = x) = f (x). Probability Distributions are either discrete or
continuous

2.3 Discrete Probability Distributions/Functions


If X is a discrete random variable, P(X = x) is called a probability mass function (pmf).
P(X = x) can be represented in tabular form, called the discrete probability distribution
table (dpdt). For P(X = x) to be a pmf, it must satisfy the following necessary and
sufficient conditions;

i P(X = x) ≥ 0 for all x
ii Σ_{all x} P(X = x) = 1

The condition P(a ≤ x ≤ b) = P(X = a) + ... + P(X = b) = Σ_{x=a}^{b} P(X = x) is necessary but not
sufficient.
Examples



a) Consider a couple that wants to produce only two children, one at each birth. The
possible outcomes that constitute the sample space are S = {GG, GB, BG, BB}, where
G = girl, B = boy. If X is the number of boys who could result, then X can
assume the values x = 0, 1, 1, 2 determined by the sample points. Clearly X assumes these
values with probabilities P(x = 0) = 1/4, P(x = 1) = 2/4 and P(x = 2) = 1/4
respectively.

X = x      0    1    2
P(X = x)  1/4  2/4  1/4

This is a probability distribution, and since P(X = x) ≥ 0 for all x = 0, 1, 2 and
Σ_{all x} P(X = x) = 1, this distribution is a pmf.

b) Tossing a fair coin twice (X = number of heads):

X = x      0    1    2
P(X = x)  1/4  2/4  1/4

Clearly P(X = x) ≥ 0 for all x = 0, 1, 2 and Σ_{all x} P(X = x) = 1.

c) Rolling a fair dice once (X = number that appears):

X = x      1    2    3    4    5    6
P(X = x)  1/6  1/6  1/6  1/6  1/6  1/6

Equivalently, the dpd table above can be represented by the pmf

P(X = x) = { 1/6,  x = 1, 2, 3, 4, 5, 6
           { 0,    otherwise          (2.1)

d) Toss two dice; X is the sum of the numbers that appear.

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

X = x      2     3     4     5     6     7     8     9     10    11    12
P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36



Equivalently, the dpd table above can be represented by the pmf

P(X = x) = { (x − 1)/36,   x = 2, 3, 4, 5, 6, 7
           { (13 − x)/36,  x = 8, 9, 10, 11, 12
           { 0,            otherwise          (2.2)

Exercise

e) Given

P(X = x) = { x/25,  x = 1, 2, 3, 4, 5
           { 0,     otherwise          (2.3)

i Is P(X = x) a pmf?
ii Find P(1 ≤ x ≤ 3)
iii Find P(x ≥ 2)
iv Find P(2 ≤ x ≤ 5)

f) Given a pmf

P(X = x) = { cx,  x = 1, 2, 3, 4
           { 0,   otherwise          (2.4)

determine the constant c so that P(X = x) is a pmf.

2.4 The cumulative mass function/Distribution function


If X is a discrete random variable, the cumulative mass function F(x) is defined by

F(x) = P(X ≤ x) = P(−∞ < X ≤ x) = Σ_{s ≤ x} P(X = s) (2.5)

F(x) specifies the probability that an observed value of the random variable X will not
exceed x. Such a function is called a cmf or simply a distribution function. F(x) has the
following properties;

i F(x) → 0 as x → −∞
ii F(x) ≥ 0; it is non-negative
iii F(x) → 1 as x → ∞
iv If a and b are constants such that a ≤ b, then F(a) = P(X ≤ a) ≤ P(X ≤ b) = F(b), i.e. F(a) ≤ F(b)

Examples



(a) Given

P(X = x) = { (5 − x)/10,  x = 1, 2, 3, 4
           { 0,           otherwise          (2.6)

find F(x).

Solution
For all values of x < 1,
F(x) = 0
For x = 1,
F(1) = P(X ≤ 1) = (5 − 1)/10 = 4/10
For x = 2,
F(2) = P(X ≤ 2) = P(x = 1) + P(x = 2) = 4/10 + 3/10 = 7/10
For x = 3,
F(3) = P(X ≤ 3) = P(x = 1) + P(x = 2) + P(x = 3) = 4/10 + 3/10 + 2/10 = 9/10
For x = 4,
F(4) = P(X ≤ 4) = P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) = 4/10 + 3/10 + 2/10 + 1/10 = 1

F(x) = { 0,      x < 1
       { 4/10,   1 ≤ x < 2
       { 7/10,   2 ≤ x < 3
       { 9/10,   3 ≤ x < 4
       { 1,      x ≥ 4
If graphed, F(x) gives a step graph
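The accumulation of pmf values into F(x) can also be automated. The following Python sketch (not from the notes) builds F(x) for the pmf P(X = x) = (5 − x)/10, x = 1, 2, 3, 4, using exact fractions:

# Build the cumulative mass function F(x) = P(X <= x) from the pmf.
from fractions import Fraction

pmf = {x: Fraction(5 - x, 10) for x in (1, 2, 3, 4)}
assert sum(pmf.values()) == 1                     # check it is a pmf

F, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]                             # accumulate P(X <= x)
    F[x] = running
print(F)   # F(1) = 2/5, F(2) = 7/10, F(3) = 9/10, F(4) = 1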

(b) Given

P(X = x) = { x/10,  x = 1, 2, 3, 4
           { 0,     elsewhere          (2.7)

find F(x).

Solution
For all values of x < 1,
F(x) = 0
For x = 1,
F(1) = P(X ≤ 1) = 1/10
For x = 2,
F(2) = P(X ≤ 2) = P(x = 1) + P(x = 2) = 1/10 + 2/10 = 3/10
For x = 3,
F(3) = P(X ≤ 3) = P(x = 1) + P(x = 2) + P(x = 3) = 1/10 + 2/10 + 3/10 = 6/10
For x = 4,
F(4) = P(X ≤ 4) = P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) = 1/10 + 2/10 + 3/10 + 4/10 = 1

F(x) = { 0,      x < 1
       { 1/10,   1 ≤ x < 2
       { 3/10,   2 ≤ x < 3
       { 6/10,   3 ≤ x < 4
       { 1,      x ≥ 4

2.5 Properties of a discrete random variable


2.5.1 Expectation of a random variable E(X)
If X is a discrete random variable with a pmf P(X = x), then the number

E(X) = Σ_{all x} x P(X = x) (2.8)

is called the mathematical expectation (expected value) of X.

Example
Given

P(X = x) = { x/25,  x = 1, 2, 3, 4, 5
           { 0,     elsewhere          (2.9)

find E(X).

Solution
From the definition of expectation,

E(X) = Σ_{all x} x P(X = x) = Σ_{x=1}^{5} x P(X = x)

By substitution,

E(X) = 1 × P(X = 1) + 2 × P(X = 2) + 3 × P(X = 3) + 4 × P(X = 4) + 5 × P(X = 5)

Simplifying,

E(X) = 1 × 1/25 + 2 × 2/25 + 3 × 3/25 + 4 × 4/25 + 5 × 5/25 = 55/25 = 2.2

2.6 Expectation of a function


If X is a discrete random variable with a pmf P(X = x), and g(x) is a function of x, then

E(g(x)) = Σ_{all x} g(x) P(X = x) (2.10)

Assume
g(x) = ax + c (2.11)
Then

• E(ax + c) = aE(x) + c

• when a = 0, then E(c) = c

• When c = 0, then E(ax) = aE(x)

Proofs are available

Example

(a) Given

X = x      0    1    2
P(X = x)  1/4  2/4  1/4

find E(2X + 5).

Solution

E(X) = Σ_{all x} x P(x) = 1

E(2X + 5) = 2E(X) + 5 = 2 × 1 + 5 = 7



(b) Given

P(X = x) = { x/25,  x = 1, 2, 3, 4, 5
           { 0,     elsewhere          (2.12)

find E(5X + 7).

E(X) = Σ_{all x} x P(x) = 55/25

E(5X + 7) = 5E(X) + 7 = 5 × 55/25 + 7 = 18

2.7 Variance of a discrete random variable Var(x)


If X is a discrete random variable with a pmf P(X = x), then

Var(X) = σ_x^2 = E[X − E(X)]^2 (2.13)

Var(X) = E(X^2) − 2E(X)·E(X) + (E(X))^2 = E(X^2) − (E(X))^2 (2.14)

Var(X) = Σ_{all x} x^2 P(X = x) − [Σ_{all x} x P(X = x)]^2 (2.15)

And

σ_X = √Var(X) (2.16)

is the standard deviation of X (not the standard error). If g(x) is a function of a discrete random
variable X, then

Var(g(x)) = E[g(x) − E(g(x))]^2 (2.17)

In particular, if g(x) = ax + c, then;

• when a = 0, Var(c) = 0
• Var(ax + c) = a^2 Var(x)
• Var(ax) = a^2 Var(x)

Proofs exist

Examples

(a) Given

X = x      0    1    2
P(X = x)  1/4  2/4  1/4



find Var(X).

Solution

E(X) = Σ_{all x} x P(x) = 1

Var(X) = Σ_{all x} x^2 P(X = x) − [Σ_{all x} x P(X = x)]^2

Var(X) = 0^2 × 1/4 + 1^2 × 2/4 + 2^2 × 1/4 − 1^2 = 1/2
(b) Given

P(X = x) = { x/25,  x = 1, 2, 3, 4, 5
           { 0,     elsewhere          (2.18)

find Var(X).

(c) Find Var(5X + 7) in (b) above.

2.8 Mode of a discrete random Variable


The value of X that is most likely to occur is one with the highest probability. This value
is called the mode of X. It can be found by working out all the probabilities.

Example
In

X = x      0    1    2
P(X = x)  1/4  2/4  1/4

x = 1 is the mode since P(X = 1) = 2/4 = 1/2 is the highest probability.

2.9 Median of a discrete random Variable


This is the smallest value x of the random variable for which the cmf F(x) is at least 0.5.
Hence, locate the smallest value x for which F(x) ≥ 1/2. Example: in

X = x      0    1    2
P(X = x)  1/4  2/4  1/4
F(x)      1/4  3/4   1

x = 1 is the median since F(1) = 3/4 ≥ 1/2.
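The quantities defined in Sections 2.5–2.9 can all be computed from the same probability table. The Python sketch below (illustrative only, not part of the original notes) does so for the distribution P(X = 0) = 1/4, P(X = 1) = 2/4, P(X = 2) = 1/4 used in the last few examples:

# E(X), Var(X), mode and median of a small discrete distribution.
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

E = sum(x * p for x, p in pmf.items())                 # E(X)
E2 = sum(x * x * p for x, p in pmf.items())            # E(X^2)
var = E2 - E**2                                        # Var(X) = E(X^2) - [E(X)]^2
mode = max(pmf, key=pmf.get)                           # value with highest probability

F = Fraction(0)
median = None
for x in sorted(pmf):                                  # smallest x with F(x) >= 1/2
    F += pmf[x]
    if F >= Fraction(1, 2):
        median = x
        break

print(E, var, mode, median)                            # 1 1/2 1 1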



Chapter 3

Common Discrete Probability


Distributions
3.1 Binomial and Bernoulli probability distributions
A random experiment performed once with only two possible outcomes (binary), classified
as either a success or a failure, is called a Bernoulli experiment or trial. If the experiment
involves a number of independent Bernoulli trials, it is called a Binomial experiment. Note:

• It consists of n repeated independent fixed trials. Each independent trial is a


Bernoulli trial.

• Each trial results into an outcome that may be a success or failure

• The probability of Success, P remains constant for each trial and 0 ≤ P ≤ 1 and the
probability of failure, q = (1 − P) also remains constant.

• The “number of successes” X = x among the n independent trials is a discrete


random variable taking on values x = 0, 1, 2, . . . , n. Such a random variable is
called a binomial random variable having a Binomial distribution with parameters
n (number of trials) and P probability of success.

• We usually write X ∼ b(n, P) to mean that a binomial random variable X follows a
Binomial distribution with parameters n and P, defined by

b(x; n, p) = { C(n, x) p^x q^(n−x),  x = 0, 1, 2, ..., n
            { 0,                    elsewhere          (3.1)

or

b(x; n, p) = { C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, 2, ..., n
            { 0,                          elsewhere          (3.2)

• The binomial distribution b(xi ; n, p) is a function of a discrete random variable X,
hence b(x; n, p) is a discrete probability distribution

• The term

C(n, x) p^x q^(n−x) (3.3)

is the general term in the binomial expansion of (p + q)^n or (p + (1 − p))^n. Hence,

(p + q)^n = C(n, 0) p^0 q^n + C(n, 1) p^1 q^(n−1) + ... + C(n, x) p^x q^(n−x) + ... + C(n, n) p^n q^0 = Σ_{x=0}^{n} C(n, x) p^x q^(n−x) (3.4)

Or

(q + P)^n = C(n, 0) q^n P^0 + C(n, 1) q^(n−1) P^1 + ... + C(n, x) q^(n−x) P^x + ... + C(n, n) q^0 P^n = Σ_{x=0}^{n} C(n, x) P^x q^(n−x) (3.5)

It can be shown that

(P + q)^n = (q + P)^n = Σ_{x=0}^{n} C(n, x) P^x q^(n−x) = Σ_{x=0}^{n} b(x; n, p) (3.6)

Hence

Σ_{x=0}^{n} b(x; n, p) = b(0; n, p) + b(1; n, p) + b(2; n, p) + ... + b(n; n, p) (3.7)

• Since (P + q) = 1, then

Σ_{x=0}^{n} C(n, x) p^x q^(n−x) = 1,

so summing up all possible probabilities gives Σ_x b(x; n, p) = 1, and each individual term satisfies b(x; n, p) ≥ 0.

• The values of the individual terms b(0; n, p), b(1; n, p), b(2; n, p), ..., b(n; n, p) can be got from
the binomial tables.

• We can now talk of P(X = r), P(X < r), P(X > r), P(X ≤ r), P(X ≥ r), P(a ≤ X ≤ b),
etc., whose values can be got from the binomial probability tables.

• nCx = C(n, x) = n! / [x!(n − x)!] (3.8)

• If there is only one Bernoulli trial (n = 1), then the Binomial random variable
X ∼ b(n, P) reduces to a Bernoulli random variable with parameter P,
whose pmf is defined by

b(x; 1, p) = { p^x q^(1−x),  x = 0, 1
            { 0,            elsewhere          (3.9)

or

b(x; 1, p) = { p^x (1 − p)^(1−x),  x = 0, 1
            { 0,                  elsewhere          (3.10)

• When n = 1, note that C(1, 0) = C(1, 1) = 1.

• Hence a number of Bernoulli trials performed together constitute a Binomial experiment. Examples of Bernoulli trials include:

(i) Sitting for an examination once, results into passing or failure;


(ii) A patient expects to recover or die
(iii) Picking an item from a consignment of products may result in a defective or a
good item picked
(iv) An expectant mother may deliver a baby girl or boy, normal or disabled, over-
weight or underweight, black or brown, etc

3.2 Verification that the Binomial and Bernoulli distributions are pmfs
• For the Binomial,

b(x; n, p) = { C(n, x) p^x q^(n−x),  x = 0, 1, 2, ..., n
            { 0,                    elsewhere          (3.11)
We need to verify that the distribution satisfies the necessary and sufficient conditions;

(i) P(X = x) ≥ 0 for all x
(ii) Σ_{all x} P(X = x) = 1



When x = 0, P(X = 0) = C(n, 0) p^0 q^(n−0) = q^n ≥ 0
When x = 1, P(X = 1) = C(n, 1) p^1 q^(n−1) ≥ 0
.
.
.
When x = n, P(X = n) = C(n, n) p^n q^(n−n) = p^n ≥ 0

Generally, for 0 ≤ x ≤ n, C(n, x) p^x q^(n−x) ≥ 0. Hence, for all x, P(X = x) ≥ 0.
And

Σ_{all x} P(X = x) = Σ_{x=0}^{n} C(n, x) p^x q^(n−x) = (p + q)^n = (1)^n = 1,

bearing in mind that p + q = 1. Since the two conditions are satisfied, the
Binomial distribution is indeed a pmf.
• Repeat for the Bernoulli:

b(x; 1, p) = P(X = x) = { p^x q^(1−x),  x = 0, 1
                        { 0,            elsewhere          (3.12)

3.3 Mean and Variance of a Binomial random variable


3.3.1 The Expected value of a binomial random variable X
The mean or expected value of a binomial distribution is given by

E(X) = Σ_{x=0}^{n} x P(X = x) = nP (3.13)

Proof
E(X) = Σ_{x=0}^{n} x P(X = x) = Σ_{x=0}^{n} x C(n, x) P^x q^(n−x) = 0 + Σ_{x=1}^{n} x C(n, x) P^x q^(n−x) (3.14)

E(X) = Σ_{x=1}^{n} x C(n, x) P^x q^(n−x) (3.15)

E(X) = Σ_{x=1}^{n} x × [n! / (x!(n − x)!)] P^x q^(n−x) (3.16)

E(X) = Σ_{x=1}^{n} x × [n(n − 1)! / (x(x − 1)!(n − x)!)] × P^1 × P^(x−1) q^(n−x) (3.17)

E(X) = nP Σ_{x=1}^{n} [(n − 1)! / ((x − 1)!(n − x)!)] × P^(x−1) q^(n−x) (3.18)

Let x − 1 = y, implying that x = y + 1, and also let n − 1 = m, implying that n = m + 1. Then

E(X) = nP Σ_{y=0}^{m} [m! / (y!(m − y)!)] P^y q^(m−y) (3.19)

E(X) = nP Σ_{y=0}^{m} C(m, y) P^y q^(m−y) (3.20)

E(X) = nP (P + q)^m = nP(1)^m (3.21)

E(X) = nP (3.22)

3.3.2 Variance, Var(X) of a Binomial random variable X


The variance of a binomial random variable is given by

Var(X) = nPq (3.23)

Proof

Var(X) = E[X − E(X)]^2 (3.24)

Var(X) = E(X^2) − (E(X))^2 (3.25)

Var(X) = E(X^2) − (nP)^2 (3.26)

Using the identity

E(X(X − 1)) = E(X^2) − E(X) (3.27)

E(X^2) = E(X(X − 1)) + E(X) (3.28)

Var(X) = E(X(X − 1)) + E(X) − (nP)^2 (3.29)

Var(X) = E(X(X − 1)) + nP − n^2 P^2 (3.30)

But

E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) P(X = x) (3.31)

E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) C(n, x) P^x q^(n−x) (3.32)


E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) × [n! / (x!(n − x)!)] P^x q^(n−x) (3.33)

E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) × [n(n − 1)(n − 2)! / (x(x − 1)(x − 2)!(n − x)!)] × P^2 × P^(x−2) q^(n−x) (3.34)

E(X(X − 1)) = n(n − 1)P^2 Σ_{x=2}^{n} [(n − 2)! / ((x − 2)!(n − x)!)] × P^(x−2) q^(n−x) (3.35)

E(X(X − 1)) = n(n − 1)P^2 Σ_{x=2}^{n} C(n − 2, x − 2) P^(x−2) q^(n−x) (3.36)

E(X(X − 1)) = n(n − 1)P^2 (P + q)^(n−2) (3.37)

E(X(X − 1)) = n(n − 1)P^2 (1)^(n−2) (3.38)

E(X(X − 1)) = n(n − 1)P^2 (3.39)

Or, from Eq. 3.35, let x − 2 = y, which implies x = y + 2; also let n − 2 = m, which implies n = m + 2. Then

E(X(X − 1)) = n(n − 1)P^2 Σ_{y=0}^{m} [m! / (y!(m − y)!)] P^y q^(m−y) (3.40)

E(X(X − 1)) = n(n − 1)P^2 Σ_{y=0}^{m} C(m, y) P^y q^(m−y) (3.41)

E(X(X − 1)) = n(n − 1)P^2 (P + q)^m (3.42)

E(X(X − 1)) = n(n − 1)P^2 (1)^m (3.43)

E(X(X − 1)) = n(n − 1)P^2 (3.44)
From Eq. 3.30,

Var(X) = E(X(X − 1)) + nP − n^2 P^2 (3.45)

Var(X) = n(n − 1)P^2 + nP − n^2 P^2 (3.46)

Var(X) = n^2 P^2 − nP^2 + nP − n^2 P^2 (3.47)

Var(X) = nP(1 − P) (3.48)

Var(X) = nPq (3.49)

Hence Var(X) = nPq, and √Var(X) = √(nPq) is the standard deviation. When X ∼ b(x; 1, P), the
Bernoulli distribution has E(X) = P and Var(X) = Pq.
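Before moving on to computations, the results E(X) = nP and Var(X) = nPq can be verified numerically. The following Python sketch (illustrative only; it assumes Python 3.8+ for math.comb) builds the pmf of X ∼ b(x; 10, 0.4) and checks that the probabilities sum to 1, that the mean is nP = 4 and that the variance is nPq = 2.4:

# Numerical check that the binomial pmf sums to 1 with E(X) = nP, Var(X) = nPq.
from math import comb

n, P = 10, 0.4
q = 1 - P
pmf = [comb(n, x) * P**x * q**(n - x) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * pmf[x] for x in range(n + 1))                # should be nP = 4
var = sum(x * x * pmf[x] for x in range(n + 1)) - mean**2   # should be nPq = 2.4
print(round(total, 6), round(mean, 6), round(var, 6))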

3.3.2.1 Computing binomial probabilities


Special cases to consider
(a) When exactly x = r successes result out of n independent trials, compute

P(X = r) = b(r; n, p) = C(n, r) p^r q^(n−r)

or, reading from cumulative probability tables,

P(X = r) = Σ_{x=0}^{r} b(x; n, p) − Σ_{x=0}^{r−1} b(x; n, p)

(b) When fewer than r successes result out of n independent trials: P(X < r) = P(X ≤ r − 1) =
Σ_{x=0}^{r−1} b(x; n, p) = 1 − P(X ≥ r)

(c) When at most r successes result out of n independent trials, using the cumulative
binomial distribution: P(X ≤ r) = Σ_{x=0}^{r} b(x; n, p) = 1 − P(X > r)

(d) When at least r successes result out of n independent trials: P(X ≥ r) = Σ_{x=r}^{n} b(x; n, p) =
1 − P(X < r)

(e) Between a and b inclusive, a ≤ x ≤ b, out of n independent trials:
P(a ≤ x ≤ b) = P(x ≤ b) − P(x ≤ a − 1) = Σ_{x=0}^{b} b(x; n, p) − Σ_{x=0}^{a−1} b(x; n, p)

(f) Between a and b exclusive, a < x < b, out of n independent trials:
P(a < x < b) = P(a + 1 ≤ x ≤ b − 1) = P(X ≤ b − 1) − P(X ≤ a)
= Σ_{x=0}^{b−1} b(x; n, p) − Σ_{x=0}^{a} b(x; n, p)
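The six cases above translate directly into code. The Python sketch below (not part of the notes; the helper names b and cdf are illustrative) evaluates each case for X ∼ b(x; 10, 0.4) and r = 5 using only the standard library:

# Binomial pmf/cdf and the special cases (a)-(f).
from math import comb

def b(x, n, p):                       # binomial pmf b(x; n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(r, n, p):                     # P(X <= r) = sum_{x=0}^{r} b(x; n, p)
    return sum(b(x, n, p) for x in range(r + 1))

n, p, r = 10, 0.4, 5
print(b(r, n, p))                     # (a) exactly r successes
print(cdf(r - 1, n, p))               # (b) fewer than r successes
print(cdf(r, n, p))                   # (c) at most r successes
print(1 - cdf(r - 1, n, p))           # (d) at least r successes
a_, b_ = 2, 4
print(cdf(b_, n, p) - cdf(a_ - 1, n, p))      # (e) a <= X <= b
print(cdf(b_ - 1, n, p) - cdf(a_, n, p))      # (f) a < X < b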

Example 1
The probability that a patient recovers from a rare blood disease is 0.4. If 10 people are
known to have contracted the disease, find the probability that;

(a) Exactly 5 survive?

(b) At least 5 patients recover

(c) At most 7 patients recover

(d) From 2 to 4 survive

(e) 3 < x < 6

Solutions

(a) Exactly 5 survive. Given P = 0.4, q = 1 − 0.4 = 0.6, n = 10, r = 5, compute

P(X = r) = b(r; n, p) = C(n, r) p^r q^(n−r)

P(X = 5) = b(5; 10, 0.4) = C(10, 5) (0.4)^5 (0.6)^(10−5) = 0.2007

Or, using tables (summations),

P(X = 5) = Σ_{x=0}^{5} b(x; 10, 0.4) − Σ_{x=0}^{4} b(x; 10, 0.4) = 0.8338 − 0.6331 = 0.2007

(b) At least 5 patients recover:

P(X ≥ 5) = Σ_{x=5}^{10} b(x; 10, 0.4)
P(X ≥ 5) = b(5; n, p) + b(6; n, p) + b(7; n, p) + b(8; n, p) + b(9; n, p) + b(10; n, p)
P(X ≥ 5) = 1 − P(X < 5) = 1 − Σ_{x=0}^{4} b(x; 10, 0.4)

(c) At most 7 patients recover:

P(X ≤ 7) = Σ_{x=0}^{7} b(x; 10, 0.4)

(d) From 2 to 4 survive:

P(2 ≤ x ≤ 4) = P(x ≤ 4) − P(x ≤ 2 − 1) = Σ_{x=0}^{4} b(x; 10, 0.4) − Σ_{x=0}^{1} b(x; 10, 0.4)

(e) P(3 < x < 6) = P(4 ≤ x ≤ 5) = Σ_{x=0}^{5} b(x; 10, 0.4) − Σ_{x=0}^{3} b(x; 10, 0.4)

Example 2
A multiple-choice quiz has 15 questions, each with four possible answers of which
only one is correct. Determine the probability that sheer guessing yields;

(a) Exactly 5 correct answers

(b) 5 incorrect answers;

(c) At least 8 correct answers

(d) At most 10 correct answers

(e) Between 5 and 10 correct answer

Solution

(a) Exactly 5 correct answers: P = 1/4, q = 3/4, n = 15, r = 5. Then compute

P(X = 5) = C(15, 5) (1/4)^5 (3/4)^10 = 0.165

(b) 5 incorrect answers: P = 1/4, q = 3/4, n = 15; 5 incorrect answers implies X = 10 correct
answers, so

P(X = 10) = C(15, 10) (1/4)^10 (3/4)^5 = 0.0007

(c) At least 8 correct answers: P = 1/4, q = 3/4, n = 15, r = 8;

P(X ≥ 8) = Σ_{x=8}^{15} b(x; 15, 1/4) = 0.0175

(d) At most 10 correct answers: P = 1/4, q = 3/4, n = 15, r = 10;

P(X ≤ 10) = Σ_{x=0}^{10} b(x; 15, 1/4) = 0.9999

(e) Between 5 and 10 correct answers (inclusive): P = 1/4, q = 3/4, n = 15;

P(5 ≤ x ≤ 10) = Σ_{x=0}^{10} b(x; 15, 1/4) − Σ_{x=0}^{4} b(x; 15, 1/4)

P(5 ≤ x ≤ 10) = 0.9999 − 0.6865 = 0.3134

Example 3
A newly married couple wishes to produce 5 children, what is the probability that;

(a) It will produce exactly 3 girls?

(b) All the 5 children will be boys?

(c) They will produce between 2 and 4 girls?

Solution

(a) It will produce exactly 3 girls?

P = 0.5, q = 0.5, n = 5, x = 3;

P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^{5−3} = 0.3125

(b) All the 5 children will be boys?

Here X counts boys, with P = 0.5, q = 0.5, n = 5, x = 5;

P(X = 5) = \binom{5}{5} (0.5)^5 (0.5)^{5−5} = 0.03125

(c) They will produce between 2 and 4 girls?

P(2 ≤ X ≤ 4) = \sum_{x=0}^{4} b(x; 5, 0.5) − \sum_{x=0}^{1} b(x; 5, 0.5)
P(2 ≤ X ≤ 4) = 0.9688 − 0.1875 = 0.7813

Example 4
Twelve objective questions, each with 4 alternatives of which only one is correct, were given to a candidate. Find the probability that the candidate gets at most 7 correct.
Solution
P(X ≤ 7) = \sum_{x=0}^{7} b(x; 12, 1/4) = 0.9972

Note

1. E(X) and Var(X) can also be obtained using the moments method and probability generating functions (see the section on generating functions ahead).



2. When the number of trials n is large and the probability of success P is close to 0.5, the Normal distribution is used to approximate the Binomial process, with continuity corrections made by adding or subtracting the correction term 0.5. Hence, if X is a random variable counting successes in n large trials and X ∽ b(x; n, p), then we use a Normal distribution with µ = np and δ = \sqrt{npq} = \sqrt{np(1 − p)}:

Z = \frac{x − E(X)}{\sqrt{Var(X)}} = \frac{x − np}{\sqrt{npq}} = \frac{x − np}{\sqrt{np(1 − p)}}  as n → ∞    (3.50)

i.e. as n → ∞,

b(x; n, p) ≈ f(x) = \frac{1}{\sqrt{2π np(1 − p)}} e^{−\frac{(x − np)^2}{2np(1 − p)}}    (3.51)

Example
The probability that a patient recovers from T.B. is 0.4. If 15 people contracted the disease, find the probability that from 7 to 9 people survive.
Solution
• Using the Binomial approach:
P(a ≤ X ≤ b) = \sum_{x=0}^{b} b(x; n, p) − \sum_{x=0}^{a−1} b(x; n, p)
P(7 ≤ X ≤ 9) = \sum_{x=0}^{9} b(x; 15, 0.4) − \sum_{x=0}^{6} b(x; 15, 0.4)
P(7 ≤ X ≤ 9) = 0.9662 − 0.6098 = 0.3564

• Approximating the Binomial distribution using the Normal approximation:
µ = np = 15 × 0.4 = 6,  δ = \sqrt{15 × 0.4 × 0.6} = \sqrt{3.6} ≈ 1.9
P(7 ≤ X ≤ 9) ≈ P(x_1 − 0.5 ≤ X ≤ x_2 + 0.5) = P(6.5 ≤ X ≤ 9.5)
Such that

z_1 = \frac{x_1 − np}{\sqrt{npq}} = \frac{6.5 − 6}{1.9} = 0.263    (3.52)

z_2 = \frac{x_2 − np}{\sqrt{npq}} = \frac{9.5 − 6}{1.9} = 1.842    (3.53)

P(z_1 ≤ Z ≤ z_2) = P(0.263 ≤ Z ≤ 1.842)
P(0.263 ≤ Z ≤ 1.842) = P(Z ≤ 1.842) − P(Z ≤ 0.263)
P(0.263 ≤ Z ≤ 1.842) = 0.9673 − 0.6037 = 0.3636
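As a rough numerical check of this approximation (a sketch only, assuming scipy is available; the exact answer uses binom.cdf and the approximation uses norm.cdf with the continuity correction):

```python
# Sketch: exact binomial probability vs. normal approximation with
# continuity correction, for n = 15, p = 0.4, P(7 <= X <= 9).
from math import sqrt
from scipy.stats import binom, norm

n, p = 15, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact  = binom.cdf(9, n, p) - binom.cdf(6, n, p)              # ~0.3564
approx = norm.cdf(9.5, mu, sigma) - norm.cdf(6.5, mu, sigma)  # ~0.36

print(exact, approx)
```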

3.4 The Negative Binomial distribution (The waiting time or Pascal distribution)

In some situations where the Binomial distribution applies, instead of being interested in X = x successes out of the n independent trials, we may be interested in the number of trials on which the kth success will occur. For the kth success to occur on the xth trial, it must be preceded by (k − 1) successes in the (x − 1) previous trials, the remaining (x − k) trials being failures (non-successes);

Figure 3.1

for which the probability is

P(X = x) = b(x; k, P) = \binom{x − 1}{k − 1} p^k q^{x−k};  x = k, k + 1, .....  and k > 0    (3.54)

We normally write X ∽ b(x; k, P) to mean that the random variable X follows a Negative
binomial distribution with parameters k = number of successes and P =probability of
success on each trial.

3.4.1 The mean and variance of X


If X ∽ b(x; k, P), then E(X) = k/p and Var(X) = kq/p^2.
Proof (Exercise)
Example 1
If the probability of having a male child is 0.5, find the probability that a family’s 4th child
is their second son.
Solution

Figure 3.2



This is a negative binomial process, where X is the random variable, the number of trials
at which the kth son is born.
P = 0.5, k = 2, x = 4, (x − k) = (4 − 2) = 2
P(X = 4) = \binom{4 − 1}{2 − 1} (0.5)^2 (0.5)^{4−2} = 3 × 0.0625 = 0.1875
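A quick check of this value (an illustrative sketch; math.comb is from the Python standard library, and the helper name neg_binom_pmf is ours, not a standard function):

```python
# Sketch: negative binomial (Pascal) pmf P(X = x) = C(x-1, k-1) p^k q^(x-k),
# the probability that the k-th success occurs on trial x.
from math import comb

def neg_binom_pmf(x: int, k: int, p: float) -> float:
    q = 1 - p
    return comb(x - 1, k - 1) * p**k * q**(x - k)

print(neg_binom_pmf(4, 2, 0.5))   # 0.1875: the 4th child is the 2nd son
```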
2−1
Example 2
If the probability is 0.6 that a person exposed to a certain contagious disease will catch it, find the probability that;

(a) The 2nd person will be the first to catch the disease

(b) The 5th person will be the 3rd to catch it

(c) The 9th person will be the 6th to catch it

3.4.2 Remark
In the negative binomial distribution, if k = 1 the resulting distribution is the Geometric distribution with parameter p, whose pmf is

P(X = x) = b(x; 1, P) = \binom{x − 1}{1 − 1} P^1 q^{x−1};  x = 1, 2, 3, .....,  and 0 elsewhere    (3.55)

P(X = x) = P q^{x−1};  x = 1, 2, 3, .....,  and 0 elsewhere    (3.56)

In some situations where the binomial applies, we may be interested in the number of trials on which the first success will occur. For this to happen, the xth trial must be preceded by (x − 1) failures. The probability that the first success occurs on the xth trial is the Geometric distribution defined above.

Example 2 (a)
If the probability is 0.6 that a person exposed to a certain contagious disease will catch it, find the probability that the 2nd person will be the first to catch the disease.

Solution



Figure 3.3

This is a geometric (negative binomial, k = 1) process, where X is the number of trials at which the first person catches the disease. P = 0.6, k = 1, x = 2, (x − k) = (2 − 1) = 1

P(X = x) = \binom{x − 1}{k − 1} p^k q^{x−k}

P(X = 2) = \binom{1}{0} (0.6)^1 × (0.4)^1 = 0.24
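A numerical check (sketch, assuming scipy; scipy's geom uses the same "number of trials until the first success" convention as here):

```python
# Sketch: geometric distribution, P(X = x) = p * q**(x - 1), x = 1, 2, ...
from scipy.stats import geom

p = 0.6
print(geom.pmf(2, p))   # 0.24: the 2nd person is the first to catch it
print(geom.mean(p))     # 1/p   = 1.666...
print(geom.var(p))      # q/p^2 = 1.111...
```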

3.4.3 Mean and variance of X ∽ b(x; 1, P)


If X ∽ b(x; 1, P), then E(X) = 1/p and Var(X) = q/p^2.
Proof
E(X) = \sum_{x=1}^{∞} x p q^{x−1}    (3.57)

E(X) = p \sum_{x=1}^{∞} x q^{x−1}    (3.58)

E(X) = p(1 + 2q^1 + 3q^2 + ........)    (3.59)

Recall that

1 + x^1 + x^2 + ........ = \sum_{n=0}^{∞} x^n = \frac{1}{1 − x},  |x| < 1    (3.60)

This means that

\frac{d}{dx}(1 + x^1 + x^2 + ........) = \frac{d}{dx}\left(\frac{1}{1 − x}\right) = \sum_{n=1}^{∞} n x^{n−1} = \frac{1}{(1 − x)^2}    (3.61)

Eq. 3.59 becomes

1 + 2q^1 + 3q^2 + ........ = \sum_{x=1}^{∞} x q^{x−1} = \frac{1}{(1 − q)^2}    (3.62)

Hence,

E(X) = p\left(\sum_{x=1}^{∞} x q^{x−1}\right) = p × \frac{1}{(1 − q)^2} = p × \frac{1}{p^2} = \frac{1}{p}    (3.63)
And

Var(X) = E(x^2) − (E(x))^2    (3.64)

Var(X) = E(x^2) − \left(\frac{1}{p}\right)^2    (3.65)

Using the identity

E(x(x − 1)) = E(x^2) − E(x) = E(x^2) − \frac{1}{p}    (3.66)

E(x^2) = E(x(x − 1)) + \frac{1}{p}    (3.67)

Var(X) = E(x(x − 1)) + \frac{1}{p} − \left(\frac{1}{p}\right)^2    (3.68)

Since

\frac{1}{(1 − q)^2} = \sum_{x=1}^{∞} x q^{x−1} = 1 + 2q^1 + 3q^2 + ........    (3.69)

\frac{d}{dq}\left(\frac{1}{(1 − q)^2}\right) = \frac{d}{dq}\left(\sum_{x=1}^{∞} x q^{x−1}\right) = \sum_{x=2}^{∞} x(x − 1) q^{x−2} = \frac{2}{(1 − q)^3}    (3.70)

So

E(x(x − 1)) = \sum_{x=1}^{∞} x(x − 1) p q^{x−1} = pq \sum_{x=2}^{∞} x(x − 1) q^{x−2}    (3.71)

E(x(x − 1)) = pq × \frac{2}{(1 − q)^3}    (3.72)

E(x(x − 1)) = \frac{2pq}{p^3} = \frac{2q}{p^2}    (3.73)

Var(X) = \frac{2q}{p^2} + \frac{1}{p} − \frac{1}{p^2} = \frac{q}{p^2}    (3.74)

3.5 The mode


For the geometric random variable, the tail probability is P(X > r) = q^r.
Proof
P(X = r) = pq^{r−1}
P(X ≤ r) = P(success at some trial in the first r trials)
P(X ≤ r) = 1 − P(no success in the first r trials)
P(X ≤ r) = 1 − q^r
P(X > r) = 1 − P(X ≤ r) = 1 − (1 − q^r)
P(X > r) = q^r
Note
In any geometric distribution the mode (most likely value) is 1, since P(X = x) = pq^{x−1} is largest at x = 1.

3.6 Hyper geometric distribution


Suppose we are interested in sampling n units without replacement from a population of N elements that can be divided into two groups/categories: those that possess a specific characteristic and those that do not. The groups could be: defective and non-defective, −ve and +ve, male and female, employed and unemployed, etc.

Generally, if we adopt the nomenclature success and failure to describe the two groups/categories, we denote the number of successes by k, so the number of failures is (N − k). Suppose we are interested in the probability of getting x successes in n trials (sample size) after selecting n of the N elements without replacement from the population; then there are \binom{k}{x} ways of selecting x of the k successes and \binom{N − k}{n − x} ways to select (n − x) of the (N − k) failures.

Figure 3.4
Therefore, there are \binom{k}{x} \binom{N − k}{n − x} ways to select x successes and (n − x) failures from a group of N elements, taking a sample of n at a time, of which k of the N are successes and the rest (N − k) are failures.



Since there are \binom{N}{n} ways to select a sample of n of the N equally likely elements from the population, the probability of getting x successes in a sample of n elements (trials) is

P(X = x) = \frac{\binom{k}{x}\binom{N − k}{n − x}}{\binom{N}{n}};  x = 0, 1, 2, 3, ....., n;  x ≤ k;  (n − x) ≤ (N − k),  and 0 elsewhere    (3.75)

We write X ∽ h(x : n, N, k) to mean that the random variable x follows a hyper geo-
metric distribution with parameters n, N and k. However, if the sampling is done with
replacement, then the x successes in n trials follow a binomial distribution.

3.6.1 Mean and Variance


If X ∽ h(x; n, N, k), then

E(x) = \frac{nk}{N}    (3.76)

and

Var(x) = \frac{nk(N − n)(N − k)}{N^2(N − 1)}    (3.77)
Proof (Exercise)
Example
A village LC committee has 8 members, of which 3 are female. If a sample of 5 committee members is to be taken for a capacity training workshop, find the probability that;

(a) 2 of them will be females

(b) No male is selected

Solution

(a) 2 of them will be females

This is a hypergeometric process where X is the number of female committee members selected for training, taking values x = 0, 1, 2, 3, ..... Given: N = 8, k = 3, n = 5, x = 2.



Figure 3.5

Required is P(X = 2)

P(X = 2) = \frac{\binom{3}{2}\binom{5}{3}}{\binom{8}{5}} = \frac{3 × 10}{56} = \frac{30}{56} ≈ 0.536    (3.78)

(b) No male is selected. This requires all 5 selected members to be female, i.e. x = 5. Since only k = 3 members are female, x = 5 lies outside the support of the distribution (x ≤ k), so

P(no male selected) = 0    (3.79)
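A numerical check of part (a) (sketch only, assuming scipy; note that scipy.stats.hypergeom takes the population size, the number of success states and the sample size, in that order):

```python
# Sketch: hypergeometric probability of selecting 2 females when drawing
# 5 members from a committee of 8 that contains 3 females.
from scipy.stats import hypergeom

N, k, n = 8, 3, 5                  # population, successes in population, sample size
print(hypergeom.pmf(2, N, k, n))   # ~0.5357
print(hypergeom.pmf(5, N, k, n))   # 0.0 -- five females is impossible
```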

3.7 The Poisson distribution


A random variable X is said to follow a Poisson distribution with parameter λ if and only if its pmf is given by

P(x; λ) = \frac{λ^x e^{−λ}}{x!};  x = 0, 1, 2, 3, .....;  λ > 0,  and 0 elsewhere    (3.80)

If this is so, then we write X ∽ P(x; λ) to mean that the random variable X follows a Poisson distribution with parameter λ, the average number of successes occurring in a given time interval or a specified region, and e = 2.71828.

3.7.1 Properties / postulates of a Poisson process



(1) The Poisson process applies in situations where we expect a fixed number of successes
or counts per unit of time or space or some other kind of unit.



(2) The average number of successes/occurrences per unit time is a constant, denoted by
λ, and it does not change with time.
(3) The average number of successes occurring in a given time interval or a specified
region λ is known.
(4) λ is independent of the number of events that occur in any other disjoint interval of time, region or space.
(5) The probability that a single success will occur during a very short time interval
or small region or small space is proportional to the length of time interval (size of
the region) and does not depend on the number of outcomes occurring outside this
interval (region or space).
(6) The probability that more than one success will occur in such a short time interval or small region is negligible. Thus the probability of two or more successes occurring simultaneously can be assumed to be 0.

The Poisson process depends mainly on λ, the average fixed number of successes (counts) per unit of time, specified region, space or some other kind of unit.

3.7.2 Example of Poisson processes;


(a) The number of calls an office receives per hour
(b) The number of typing errors made by a typist per page
(c) The number of customers a bank receive per day
(d) The number of car accidents at a busy junction per day
(e) The number of people who contract HIV per day in Uganda
(f) Etc

3.7.3 The Mean and Variance of a Poisson random variable


E(X) = \sum_{x=0}^{∞} x P(X = x)    (3.81)

E(X) = \sum_{x=0}^{∞} x \frac{λ^x e^{−λ}}{x!}    (3.82)

E(X) = 0 + \sum_{x=1}^{∞} x \frac{λ^x e^{−λ}}{x!}    (3.83)

E(X) = \sum_{x=1}^{∞} x \frac{λ × λ^{x−1} e^{−λ}}{x(x − 1)!}    (3.84)

E(X) = λ e^{−λ} \sum_{x=1}^{∞} \frac{λ^{x−1}}{(x − 1)!}    (3.85)

E(X) = λ e^{−λ} \left[\frac{λ^0}{0!} + \frac{λ^1}{1!} + \frac{λ^2}{2!} + \frac{λ^3}{3!} + \frac{λ^4}{4!} + .......\right]    (3.86)

E(X) = λ e^{−λ} \left[1 + \frac{λ^1}{1!} + \frac{λ^2}{2!} + \frac{λ^3}{3!} + \frac{λ^4}{4!} + .......\right]    (3.87)

E(X) = λ e^{−λ} e^{λ}    (3.88)

E(X) = λ    (3.89)
The variance,

Var(X) = E(X − E(X))^2 = E(x^2) − (E(x))^2    (3.90)

Var(X) = E(x^2) − λ^2    (3.91)

But

E(x^2) = \sum_{x=0}^{∞} x^2 P(X = x)    (3.92)

E(x^2) = \sum_{x=0}^{∞} x^2 \frac{λ^x e^{−λ}}{x!}    (3.93)

E(x^2) = \sum_{x=1}^{∞} x \frac{λ × λ^{x−1} e^{−λ}}{(x − 1)!}    (3.94)

Let y = x − 1

E(x^2) = \sum_{y=0}^{∞} (y + 1) \frac{λ × λ^{y} e^{−λ}}{y!}    (3.95)

E(x^2) = \sum_{y=0}^{∞} y \frac{λ × λ^{y} e^{−λ}}{y!} + \sum_{y=0}^{∞} \frac{λ × λ^{y} e^{−λ}}{y!}    (3.96)

E(x^2) = λ \sum_{y=0}^{∞} y \frac{λ^{y} e^{−λ}}{y!} + λ \sum_{y=0}^{∞} \frac{λ^{y} e^{−λ}}{y!}    (3.97)

E(x^2) = λ E(Y) + λ × 1    (3.98)

E(x^2) = λ E(Y) + λ    (3.99)

where Y ∽ P(λ), so E(Y) = λ, and

E(x^2) = λ^2 + λ    (3.100)

Var(X) = E(x^2) − λ^2 = (λ^2 + λ) − λ^2 = λ    (3.101)



OR

E(x(x − 1)) = E(x^2) − E(x)    (3.102)

E(x^2) = E(x(x − 1)) + E(x)    (3.103)

E(x^2) = E(x(x − 1)) + λ    (3.104)

But

E(x(x − 1)) = \sum_{x=0}^{∞} x(x − 1) P(X = x)    (3.105)

E(x(x − 1)) = \sum_{x=0}^{∞} x(x − 1) \frac{λ^x e^{−λ}}{x!}    (3.106)

E(x(x − 1)) = \sum_{x=2}^{∞} x(x − 1) \frac{λ^2 × λ^{x−2} e^{−λ}}{x(x − 1)(x − 2)!}    (3.107)

E(x(x − 1)) = λ^2 \sum_{x=2}^{∞} \frac{λ^{x−2} e^{−λ}}{(x − 2)!}    (3.108)

Let y = x − 2

E(x(x − 1)) = λ^2 \sum_{y=0}^{∞} \frac{λ^{y} e^{−λ}}{y!}    (3.109)

E(x(x − 1)) = λ^2 × 1    (3.110)

E(x(x − 1)) = λ^2    (3.111)

E(x^2) = λ^2 + λ    (3.112)

Var(X) = E(x^2) − λ^2 = (λ^2 + λ) − λ^2 = λ    (3.113)

3.7.4 Verifying that P(x; λ) is a pmf

\sum_{x=0}^{∞} \frac{λ^x e^{−λ}}{x!} = e^{−λ} \sum_{x=0}^{∞} \frac{λ^x}{x!}    (3.114)

\sum_{x=0}^{∞} \frac{λ^x e^{−λ}}{x!} = e^{−λ} \left[\frac{λ^0}{0!} + \frac{λ^1}{1!} + \frac{λ^2}{2!} + \frac{λ^3}{3!} + \frac{λ^4}{4!} + .......\right]    (3.115)

\sum_{x=0}^{∞} \frac{λ^x e^{−λ}}{x!} = e^{−λ} \left[1 + λ + \frac{λ^2}{2!} + \frac{λ^3}{3!} + \frac{λ^4}{4!} + .......\right]    (3.116)

\sum_{x=0}^{∞} \frac{λ^x e^{−λ}}{x!} = e^{−λ} e^{λ} = 1    (3.117)

This is obtained using Taylor's expansion and clearly,

P(x; λ) = \frac{λ^x e^{−λ}}{x!}    (3.118)

is a pmf.

3.7.5 Mode (most likely value) of the Poisson distribution


If λ is an integer, then there are two modes, occurring at x = λ − 1 and x = λ. In general, if λ is not an integer, the mode m is the integer such that λ − 1 < m < λ.

3.7.6 Computing probabilities


Special cases to consider

(a) When exactly x = r successes result, compute:

P(X = r) = P(r; λ) = \frac{λ^r e^{−λ}}{r!}    (3.119)

(b) Less than r successes result:

P(X < r) = P(X ≤ r − 1) = \sum_{x=0}^{r−1} \frac{λ^x e^{−λ}}{x!} = 1 − P(X ≥ r)    (3.120)

(c) At least r successes result:

P(X ≥ r) = \sum_{x=r}^{∞} P(x; λ) = 1 − P(X < r) = 1 − P(X ≤ r − 1) = 1 − \sum_{x=0}^{r−1} \frac{λ^x e^{−λ}}{x!}    (3.121)

(d) At most r successes result: using the cumulative Poisson distribution,

P(X ≤ r) = P(0; λ) + P(1; λ) + ..... + P(r; λ) = \sum_{x=0}^{r} P(x; λ)    (3.122)

P(X ≤ r) = 1 − P(X > r) = 1 − \sum_{x=r+1}^{∞} \frac{λ^x e^{−λ}}{x!}    (3.123)

(e) Between a and b inclusive, a ≤ x ≤ b, successes result:

P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a − 1)    (3.124)

(f) Between a_1 and a_2 exclusive, a_1 < x < a_2, successes result:

P(a_1 < X < a_2) = P(a_1 + 1 ≤ X ≤ a_2 − 1)    (3.125)

P(a_1 < X < a_2) = P(X ≤ a_2 − 1) − P(X ≤ a_1)    (3.126)

P(a_1 < X < a_2) = \sum_{x=0}^{a_2 − 1} \frac{λ^x e^{−λ}}{x!} − \sum_{x=0}^{a_1} \frac{λ^x e^{−λ}}{x!}    (3.127)
Example
If a bank receives on the average 5 bouncing cheques per day, find the probability that
on a given day the bank will receive;
(a) 3 bouncing cheques
(b) Less than 2 bouncing cheques
(c) One or more bouncing cheque
(d) At most 3 bouncing cheques
Solution
This is a Poisson process where the random variable X is the number of bouncing
cheques received by the bank. λ=5
(i) 3 bouncing cheques on a given day, x = 3

P(X = 3) = P(3; 5) = \frac{5^3 e^{−5}}{3!} = 0.1404    (3.128)
(ii) Less than 2 bouncing cheques

P(X < 2) = P(X ≤ 1) = \sum_{x=0}^{1} \frac{5^x e^{−5}}{x!}    (3.129)

P(X < 2) = e^{−5}(1 + 5) = 6e^{−5}    (3.130)

P(X < 2) = 1 − P(X ≥ 2) = 1 − 0.9596    (3.131)

P(X < 2) = 0.0404    (3.132)
(iii) One or more bouncing cheques

P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X ≤ 0)    (3.133)

P(X ≥ 1) = 1 − \frac{5^0 e^{−5}}{0!} = 1 − e^{−5}    (3.134)

P(X ≥ 1) = 1 − 0.0067 = 0.9933    (3.135)
(iv) At most 3 bouncing cheques

P(X ≤ r) = P(0; λ) + P(1; λ) + ..... + P(r; λ) = \sum_{x=0}^{r} P(x; λ)    (3.136)

P(X ≤ 3) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5) = \sum_{x=0}^{3} P(x; 5) = 0.2650    (3.137)
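The four answers above can be checked in a few lines (a sketch, assuming scipy is available):

```python
# Sketch: Poisson probabilities for the bouncing-cheque example, lambda = 5.
from scipy.stats import poisson

lam = 5
print(poisson.pmf(3, lam))       # (i)   ~0.1404
print(poisson.cdf(1, lam))       # (ii)  ~0.0404
print(1 - poisson.pmf(0, lam))   # (iii) ~0.9933
print(poisson.cdf(3, lam))       # (iv)  ~0.2650
```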

3.7.7 Poisson approximation of the Binomial distribution


The Poisson random variable can also be used as an approximation of the Binomial random variable when n is large (n ≥ 50) and P is small (P < 0.1), such that λ = np is of moderate size. In other words, as n → ∞, p → 0 (q → 1) and λ = np ≥ 5 (moderate magnitude). Suppose X ∽ b(x; n, p). In the limit, as n → ∞ and p → 0 with λ = np fixed, this implies that p = λ/n.

b(x; n, p) = \binom{n}{x} p^x q^{n−x};  x = 0, 1, 2, ....., n,  and 0 elsewhere    (3.138)

becomes

b(x; n, p) = \binom{n}{x} \left(\frac{λ}{n}\right)^x \left(1 − \frac{λ}{n}\right)^{n−x}    (3.139)

b(x; n, p) = \frac{n!}{(n − x)!x!} \left(\frac{λ}{n}\right)^x \left(1 − \frac{λ}{n}\right)^{n−x}    (3.140)

b(x; n, p) = \frac{n(n − 1)(n − 2)........(n − (x − 1))(n − x)!}{(n − x)!x!} \left(\frac{λ}{n}\right)^x \left(1 − \frac{λ}{n}\right)^{n−x}    (3.141)

b(x; n, p) = \frac{n(n − 1)(n − 2)........(n − (x − 1))}{x!} \frac{λ^x}{n^x} \left(1 − \frac{λ}{n}\right)^{n−x}    (3.142)

b(x; n, p) = \frac{n(n − 1)(n − 2)........(n − (x − 1))}{n.n.n.......n} × \frac{λ^x}{x!} \left(1 − \frac{λ}{n}\right)^{n−x}    (3.143)

b(x; n, p) = \left[\frac{n}{n} × \frac{n − 1}{n} × \frac{n − 2}{n} × ...... × \frac{n − (x − 1)}{n}\right] \frac{λ^x}{x!} \left(1 − \frac{λ}{n}\right)^{n−x}    (3.144)

b(x; n, p) = \left[1 × \left(1 − \frac{1}{n}\right) × \left(1 − \frac{2}{n}\right) × ...... × \left(1 − \frac{x − 1}{n}\right)\right] \frac{λ^x}{x!} \left(1 − \frac{λ}{n}\right)^{n} \left(1 − \frac{λ}{n}\right)^{−x}    (3.145)

If n → ∞, keeping x and λ constant, then;

(i) \left[1 × \left(1 − \frac{1}{n}\right) × \left(1 − \frac{2}{n}\right) × ...... × \left(1 − \frac{x − 1}{n}\right)\right] → 1

(ii) \left(1 − \frac{λ}{n}\right)^{n} → e^{−λ} by the Taylor series

(iii) \left(1 − \frac{λ}{n}\right)^{−x} → 1

So that b(x; n, p) → \frac{λ^x e^{−λ}}{x!} = P(x; λ);  x = 0, 1, 2, ....., with e = 2.718, i.e.

P(λ) = \frac{λ^x e^{−λ}}{x!};  x = 0, 1, 2, .....,  and 0 elsewhere    (3.146)

In other words, if n independent Bernoulli trials are performed, each of which has a probability P of success, then when n is large and the probability of success P is small (close to 0) so that λ = np is of moderate size (np ≥ 5) and constant, the Binomial and Poisson distributions have histograms of approximately the same shape. If these conditions hold, the number of successes X occurring is approximately a Poisson random variable with parameter λ = np, which can be used to approximate the Binomial probabilities:

\lim_{n→∞,\, p→0} \binom{n}{x} p^x q^{n−x} = \frac{λ^x e^{−λ}}{x!} = \frac{(np)^x e^{−np}}{x!}    (3.147)

Example 1
Suppose on the average, one person in every 1000, is an alcoholic, find the probability
that a random sample of 8000 people will yield fewer than 7 alcoholics.
Solution
n = 8000, P = \frac{1}{1000} = 0.001, x < 7

This is a binomial process where the random variable X is the number of alcoholics. But since n = 8000 is large and P → 0, use a Poisson approximation with λ = np = 8000 × 0.001 = 8.

P(X < 7) = P(0; 8) + P(1; 8) + P(2; 8) + ..... + P(6; 8) = \sum_{x=0}^{6} P(x; 8) = 0.3134    (3.148)

Or

P(X < 7) = 1 − P(X ≥ 7) = 1 − \sum_{x=7}^{∞} P(x; 8) = 1 − 0.6866 = 0.3134    (3.149)
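The quality of this approximation can be inspected directly (sketch, assuming scipy; the exact value uses the binomial cdf, the approximation the Poisson cdf):

```python
# Sketch: Poisson approximation to the binomial, n = 8000, p = 0.001,
# lambda = n*p = 8, probability of fewer than 7 alcoholics.
from scipy.stats import binom, poisson

n, p = 8000, 0.001
lam = n * p

exact  = binom.cdf(6, n, p)      # ~0.313  (exact binomial)
approx = poisson.cdf(6, lam)     # ~0.3134 (Poisson approximation)
print(exact, approx)
```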

Example 2
If the probability is 0.002 that any police officer attending a parade on a hot day will suffer from heat exhaustion, what is the probability that 20 of the 5000 police officers attending the parade will suffer from heat exhaustion?
Solution
This is a pure binomial process, where the R.V. X = number of police officers who suffer from the heat. Now, P = 0.002, n = 5000, x = 20

P(X = 20) = \binom{5000}{20} (0.002)^{20} (0.998)^{5000−20} = 0.00187    (3.150)

Since n = 5000 is large and P = 0.002 is small, and λ = np = 5000 × 0.002 = 10 > 5 is
of moderate size and constant, we could also use the Poisson approximation.

P(X = 20) = \frac{λ^x e^{−λ}}{x!} = \frac{(10)^{20} e^{−10}}{20!} = 0.00187    (3.151)

(g) In a situation where the Poisson applies, if successes occur at a mean rate of α per unit of time (or per unit region), then the number of successes in an interval of t units of time (or a region of size t units) is also a Poisson process: in 1 unit of time there are on average α successes, so in t units there are on average αt successes, implying λ = αt. Therefore, the number of successes X in a time interval of length t units, or a region of size t units, has the Poisson pmf

P(x; λ = αt) = \frac{(αt)^x e^{−αt}}{x!}    (3.152)
Example
A certain type of carpet has on average 2 defects per square metre. What is the probability that a 3-square-metre carpet of this type will have 3 or more defects?
Solution
This is a Poisson process where X is the number of defects per 3 m². Since the mean number of defects per 1 m² is α = 2, the mean number of defects per 3 m² is λ = αt = 2 × 3 = 6 defects. Required is P(X ≥ 3).

P(X ≥ 3) = 1 − P(X < 3)    (3.153)

P(X ≥ 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]    (3.154)

P(X ≥ 3) = 1 − \left[\frac{6^0 e^{−6}}{0!} + \frac{6^1 e^{−6}}{1!} + \frac{6^2 e^{−6}}{2!}\right]    (3.155)

P(X ≥ 3) = 1 − [0.0025 + 0.0149 + 0.0446] = 0.938    (3.156)
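A one-line check (sketch, assuming scipy): with λ = αt = 6, the survival function gives P(X ≥ 3) directly.

```python
# Sketch: defects in 3 m^2 of carpet at a rate of 2 defects/m^2, so lambda = 6.
from scipy.stats import poisson

lam = 2 * 3
print(poisson.sf(2, lam))   # P(X >= 3) = 1 - P(X <= 2) ~ 0.938
```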



Chapter 4

Continuous Probability Distributions


4.1 Background
Recall that since the domain of X is uncountably infinite, the probability with which X assumes an individual/specific value is negligible (P(X = x) = 0). X can take an infinite number of values between any two points, e.g. distances, weights. It is therefore logical for an event involving a continuous random variable to be defined as an interval of values. Hence, if X is a continuous random variable and the event A is defined in terms of an interval, say a < x < b, then P(A) = P(a < x < b) = \int_a^b f(x)dx, representing the area under f(x) between the limits x = a and x = b that define the event. The function f(x) is called a probability density function (pdf).
Remark
If X is a continuous random variable, then

(i) f(x) cannot be represented as a table, as it would be too long because the interval is dense

(ii) P(X = x) cannot define f(x), since P(X = x) = 0 ∀ x

(iii) f(x) ≥ 0

(iv) \int_{x=a}^{b} f(x)dx = 1 over the whole range a < x < b of X

(v) Since P(X = x) = 0, then P(a < x < b) = P(a ≤ x < b) = P(a < x ≤ b) = P(a ≤ x ≤ b) = \int_{x=a}^{b} f(x)dx

Conditions (iii) and (iv) are necessary and sufficient for f(x) to be a pdf.
Examples
1 Given a pdf

f(x) = ke^{−3x};  x > 0,  and 0 elsewhere

Find k.
Solution
Since \int f(x)dx = 1,

\int_{x=0}^{∞} ke^{−3x}dx = 1

k \lim_{t→∞} \int_{x=0}^{t} e^{−3x}dx = k \lim_{t→∞} \left[\frac{e^{−3x}}{−3}\right]_0^t = \frac{k}{3} = 1

This implies that k = 3 and therefore

f(x) = 3e^{−3x};  x > 0,  and 0 elsewhere

2 Given

f(x) = kx(x − 2);  0 < x < 2,  and 0 elsewhere

Find k and hence evaluate P(x < 1.5).
Solution
\int_0^2 kx(x − 2)dx = 1; solving gives k = −3/4.
Therefore

f(x) = −\frac{3}{4}x(x − 2);  0 < x < 2,  and 0 elsewhere

Hence P(x < 1.5) = P(0 < x < 1.5) = −\frac{3}{4}\int_0^{1.5} x(x − 2)dx = \frac{27}{32}
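The constant and the probability can be confirmed by numerical integration (sketch, assuming scipy is available):

```python
# Sketch: numerically confirm k = -3/4 and P(X < 1.5) = 27/32 for
# f(x) = k*x*(x - 2) on 0 < x < 2.
from scipy.integrate import quad

base, _ = quad(lambda x: x * (x - 2), 0, 2)    # integral of x(x-2) = -4/3
k = 1 / base                                   # k = -3/4
prob, _ = quad(lambda x: k * x * (x - 2), 0, 1.5)
print(k, prob)                                 # -0.75, 0.84375 (= 27/32)
```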

4.2 The distribution function of a continuous random variable

If X is a continuous random variable, its cumulative distribution function F(x) is defined by

F(x) = P(X ≤ x) = \int_{t=a}^{t=x} f(t)dt    (4.1)

t being a change of variable, and F(x) is characterized by the following properties;

(i) F(−∞) = 0

(ii) F(x) ≥ 0, it is non-negative

(iii) F(∞) = 1

(iv) If a and b are constants such that a ≤ b, then P(a ≤ x ≤ b) = F(b) − F(a)

(v) f(x) = \frac{dF(x)}{dx}, implying that F′(x) = f(x)
Examples
1 Given a pdf

f(x) = 3e^{−3x};  x > 0,  and 0 elsewhere



Find F(x) and hence compute P(0.5 ≤ x ≤ 1).
Solution
For x ≤ 0, F(x) = 0.
For x > 0,

F(x) = \int_0^x 3e^{−3t}dt = \left[−e^{−3t}\right]_0^x = 1 − e^{−3x}

Therefore

F(x) = 1 − e^{−3x};  x > 0,  and 0 elsewhere

And

P(0.5 ≤ x ≤ 1) = F(1) − F(0.5) = (1 − e^{−3×1}) − (1 − e^{−3×0.5}) = 0.173
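This pdf is the exponential distribution with rate 3, so the result can be cross-checked with scipy's expon (sketch; note that scipy parameterizes by scale = 1/rate):

```python
# Sketch: F(1) - F(0.5) for the exponential pdf 3*exp(-3x), x > 0.
from scipy.stats import expon

rate = 3
print(expon.cdf(1, scale=1/rate) - expon.cdf(0.5, scale=1/rate))  # ~0.173
```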

4.3 The expected value, Variance, Median and Mode of a continuous random variable

If X is a continuous random variable with a pdf f(x), then

(1) E(X) = \int x f(x)dx

(2) Var(X) = E[X − E(X)]^2 = E(X^2) − [E(X)]^2 = \int x^2 f(x)dx − \left[\int x f(x)dx\right]^2

(3) The mode (most likely value) is the value of x such that f′(x) = 0 and f″(x) < 0
Example

f(x) = −\frac{3}{4}x(x − 2);  0 < x < 2,  and 0 elsewhere

f′(x) = −\frac{3x}{2} + \frac{6}{4} = 0

implying that x = 1, and

f″(x) = −\frac{3}{2} < 0

so the mode is x = 1.

(4) The median of X is defined by \int_{−∞}^{m} f(x)dx = 0.5, where m is the median.
Example

f(x) = 2x;  0 < x < 1,  and 0 elsewhere

\int_{0}^{m} f(x)dx = \left[x^2\right]_0^m = 0.5

m = \frac{1}{\sqrt{2}}    (4.2)



Example

F(x) =
  0,                        x ≤ 0
  x^2/6,                    0 < x ≤ 2
  −2 + 2x − x^2/3,          2 < x ≤ 3
  1,                        x ≥ 3

Find the median and f(x).
Solution
The median m is such that F(m) = 1/2. We first establish the interval containing the median:
for 0 < x ≤ 2, F(2) = 2^2/6 = 2/3.
Clearly, since F(2) = 2/3 > 1/2, the interval 0 < x ≤ 2 contains the median, and using the corresponding function,

F(m) = 1/2
m^2/6 = 1/2
m = \sqrt{3}  (taking the positive root, since m ∈ (0, 2])

Now

f(x) = \frac{dF(x)}{dx}

which implies that F′(x) = f(x):
for x ≤ 0, f(x) = 0
for 0 < x ≤ 2, f(x) = x/3
for 2 < x ≤ 3, f(x) = 2 − \frac{2}{3}x

f(x) =
  x/3,            0 < x ≤ 2
  2 − (2/3)x,     2 < x ≤ 3
  0,              elsewhere
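A numeric cross-check of the median (sketch, assuming scipy; brentq finds the root of F(x) − 0.5 on a bracketing interval):

```python
# Sketch: solve F(m) = 0.5 numerically for the piecewise CDF above.
from scipy.optimize import brentq

def F(x):
    if x <= 0:
        return 0.0
    if x <= 2:
        return x**2 / 6
    if x <= 3:
        return -2 + 2*x - x**2 / 3
    return 1.0

m = brentq(lambda x: F(x) - 0.5, 0, 2)
print(m)   # ~1.732 = sqrt(3)
```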

4.4 Some common continuous distributions (pdfs)


4.4.1 Uniform (Rectangular) Distribution
A continuous random variable is said to follow a Uniform distribution with parameters α and β, denoted X ∽ U(x; α, β), if and only if its distribution is defined by

f(x) = \frac{1}{β − α};  α ≤ x ≤ β,  and 0 elsewhere    (4.3)



α, β are constants and α < β. Note: this distribution has the following properties;

(1) f(x) is a pdf

Proof
From f(x) = \frac{1}{β − α} for α ≤ x ≤ β, f(x) ≥ 0 ∀x since β − α > 0, and

\int_α^β \frac{1}{β − α}dx = \frac{1}{β − α}\int_α^β dx = \frac{1}{β − α}\left[x\right]_α^β = \frac{β − α}{β − α} = 1    (4.4)

Since both conditions are satisfied, f(x) is a pdf.


(2) If X ∽ U(x; α, β), then E(x) = \frac{α + β}{2} and Var(x) = \frac{(β − α)^2}{12}
Proof

E(x) = \int_α^β x \frac{1}{β − α}dx    (4.5)

E(x) = \frac{1}{β − α}\int_α^β x dx = \frac{1}{β − α} × \left[\frac{x^2}{2}\right]_α^β = \frac{1}{β − α} × \frac{β^2 − α^2}{2} = \frac{α + β}{2}    (4.6)

Var(x) = \int_α^β x^2 \frac{1}{β − α}dx − \left[\frac{1}{β − α}\int_α^β x dx\right]^2    (4.7)

Var(x) = \frac{1}{β − α} × \left[\frac{x^3}{3}\right]_α^β − \left[\frac{α + β}{2}\right]^2    (4.8)

Var(x) = \frac{1}{β − α} × \frac{β^3 − α^3}{3} − \left[\frac{α + β}{2}\right]^2    (4.9)

Simplifying yields Var(x) = \frac{(β − α)^2}{12}.
The cumulative distribution function:

F(x) = P(X ≤ x) = \int_α^x f(t)dt    (4.10)

P(X ≤ x) = \int_α^x \frac{1}{β − α}dt    (4.11)

P(X ≤ x) = \frac{1}{β − α}\int_α^x dt    (4.12)

P(X ≤ x) = \frac{1}{β − α}\left[t\right]_α^x    (4.13)

P(X ≤ x) = \frac{x − α}{β − α}    (4.14)

Therefore,

P(X ≤ x) =
  0,                     x < α
  (x − α)/(β − α),       α ≤ x ≤ β
  1,                     x > β                  (4.15)
Whose graph is

Figure 4.1: Example of Uniform (Rectangular) Distribution
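For completeness, the uniform distribution is available directly in scipy (sketch; note that scipy's uniform takes loc = α and scale = β − α, and the values α = 2, β = 5 below are hypothetical):

```python
# Sketch: U(alpha, beta) with alpha = 2, beta = 5.
from scipy.stats import uniform

alpha, beta = 2, 5
U = uniform(loc=alpha, scale=beta - alpha)
print(U.mean(), U.var())   # (alpha+beta)/2 = 3.5, (beta-alpha)^2/12 = 0.75
print(U.cdf(4))            # (4 - 2)/(5 - 2) = 0.666...
```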

4.4.2 Normal/Gaussian distribution


A continuous random variable X is said to follow a Normal distribution with parameters µ and δ² if its density function is given by

N(x_i; µ, δ^2) = \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2};  −∞ ≤ x ≤ ∞, −∞ ≤ µ ≤ ∞, δ > 0,  and 0 elsewhere    (4.16)

Or

N(x_i; µ, δ^2) = \frac{1}{\sqrt{2πδ^2}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2};  −∞ ≤ x ≤ ∞, −∞ ≤ µ ≤ ∞, δ > 0,  and 0 elsewhere    (4.17)
Note

(1) π ≈ 3.14159 and e ≈ 2.71828

(2) The normal distribution is fully specified by µ and δ². If you know these two parameters, then you know everything about the Normal distribution, and almost all natural phenomena can be approximated by the normal distribution.

(3) The normal distribution is a pdf


Proof



Given that

N(x_i; µ, δ^2) = \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2};  −∞ ≤ x ≤ ∞, −∞ ≤ µ ≤ ∞, δ > 0,  and 0 elsewhere    (4.18)

We are required to show that

• ∀x, f(x) ≥ 0

• \int_{−∞}^{∞} \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx = 1

Consider the integral

\int_{−∞}^{∞} \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx

Let y = \frac{x − µ}{δ}, so \frac{dy}{dx} = \frac{1}{δ}, or dx = δdy.

\int_{−∞}^{∞} \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx = \frac{1}{\sqrt{2π}}\int_{−∞}^{∞} \frac{1}{δ} e^{−\frac{1}{2}y^2} δdy    (4.19)

\int_{−∞}^{∞} \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx = \frac{1}{\sqrt{2π}}\int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy    (4.20)

Let

I = \int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy    (4.21)

We need to show that

I = \sqrt{2π}    (4.22)

I^2 = \int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy × \int_{−∞}^{∞} e^{−\frac{1}{2}x^2}dx = \int_{−∞}^{∞}\int_{−∞}^{∞} e^{−\frac{1}{2}(x^2 + y^2)}dxdy    (4.23)

Changing the variables of this integral from x and y to the polar co-ordinates r and θ, by letting x = r\cos θ and y = r\sin θ, then x^2 + y^2 = r^2 and dxdy = rdrdθ.

I^2 = \int_{−∞}^{∞}\int_{−∞}^{∞} e^{−\frac{1}{2}(x^2 + y^2)}dxdy = \int_0^{2π}\left(\int_0^{∞} e^{−\frac{1}{2}r^2} rdr\right)dθ    (4.24)

I^2 = \int_0^{2π}\left(\left[−e^{−\frac{1}{2}r^2}\right]_0^{∞}\right)dθ    (4.25)

I^2 = \int_0^{2π} dθ = \left[θ\right]_0^{2π} = 2π    (4.26)

Therefore

I = \sqrt{2π}    (4.27)

From Eq. 4.21,

\frac{1}{\sqrt{2π}}I = \frac{1}{\sqrt{2π}}\int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy = \frac{1}{\sqrt{2π}} × \sqrt{2π} = 1    (4.28)

Since the distribution satisfies the two conditions, N(x_i; µ, δ^2) is a pdf.

(4) The graph (curve) of f(x) is bell-shaped

Figure 4.2: Normal Distribution Curve

(5) The curve attains its maximum value at x = µ, where f(µ) = \frac{1}{δ\sqrt{2π}}

(6) The curve is symmetric about the vertical axis through the mean x = µ = \frac{\sum x}{N}

(7) The curve approaches the x-axis asymptotically

(8) The mode of the normal density occurs at x = µ

(9) The total area under the curve is 1, i.e.

\int_{−∞}^{∞} \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx = 1    (4.29)

(10) If X ∽ N(x_i; µ, δ^2), then E(x) = µ and Var(x) = δ^2

Proof

f(x) = \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2};  −∞ ≤ x ≤ ∞, −∞ ≤ µ ≤ ∞, δ > 0,  and 0 elsewhere    (4.30)

E(x) = \int_{−∞}^{∞} x f(x)dx    (4.31)

E(x) = \int_{−∞}^{∞} x \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.32)

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} x e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.33)

This can be modified to

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} ((x − µ) + µ) e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.34)

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} (x − µ) e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx + \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} µ e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.35)

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} (x − µ) e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx + µ\left(\frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx\right)    (4.36)

But

\frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx = 1    (4.37)

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} (x − µ) e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx + µ    (4.38)

Let y = x − µ, then dy = dx

E(x) = \frac{1}{δ\sqrt{2π}}\int_{−∞}^{∞} y e^{−\frac{1}{2}\left(\frac{y}{δ}\right)^2}dy + µ    (4.39)

E(x) = \frac{1}{δ\sqrt{2π}}\left[−δ^2 e^{−\frac{y^2}{2δ^2}}\right]_{−∞}^{∞} + µ = \frac{δ^2}{δ\sqrt{2π}}\left[0 − 0\right] + µ = µ    (4.40)

Therefore

E(x) = µ    (4.41)
The variance:

Var(x) = E(x^2) − [E(x)]^2    (4.42)

Var(x) = E[x − E(x)]^2    (4.43)

Var(x) = \int_{−∞}^{∞} (x − µ)^2 \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.44)

Let y = \frac{x − µ}{δ}, implying x = µ + δy; it can be shown that dx = δdy.

Var(x) = \int_{−∞}^{∞} (µ + δy − µ)^2 \frac{1}{δ\sqrt{2π}} e^{−\frac{1}{2}y^2} δdy    (4.45)

which simplifies to

Var(x) = \frac{δ^2}{\sqrt{2π}}\int_{−∞}^{∞} y^2 e^{−\frac{1}{2}y^2}dy    (4.46)

Integrating by parts: let u = y, so \frac{du}{dy} = 1, and \frac{dv}{dy} = y e^{−\frac{1}{2}y^2}, so v = −e^{−\frac{1}{2}y^2}.

Var(x) = \frac{δ^2}{\sqrt{2π}}\int_{−∞}^{∞} y^2 e^{−\frac{1}{2}y^2}dy = \frac{δ^2}{\sqrt{2π}}\left(uv\Big|_{−∞}^{∞} − \int_{−∞}^{∞} v \frac{du}{dy}dy\right)    (4.47)

Var(x) = \frac{δ^2}{\sqrt{2π}}\left\{\left[−y e^{−\frac{1}{2}y^2}\right]_{−∞}^{∞} + \int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy\right\}    (4.48)

Var(x) = \frac{δ^2}{\sqrt{2π}}\left\{[0 − 0] + \int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy\right\}    (4.49)

Var(x) = δ^2\left\{\frac{1}{\sqrt{2π}}\int_{−∞}^{∞} e^{−\frac{1}{2}y^2}dy\right\}    (4.50)

Var(x) = δ^2 × 1    (4.51)

Or

Var(x) = δ^2    (4.52)

(11) If X ∽ N(µ, δ^2), X can be transformed into a new random variable Z with mean µ = 0 and variance δ^2 = 1. Z is called the standard or normalized random variable: Z = \frac{x_i − E(x_i)}{δ_x} = \frac{x_i − µ}{δ}, with E(Z) = 0 and Var(Z) = 1, implying that Z ∽ N(0, 1).

(12) The standard normal pdf/curve is

N(z_i; 0, 1) = φ(z) = \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}z^2};  −∞ ≤ z ≤ ∞,  and 0 elsewhere    (4.53)

(13) The graph of Z ∽ N(0, 1) is still bell-shaped

Figure 4.3: Normalized random variable

(14) The cumulative distribution function of φ(z) is

Φ(z) = P(Z ≤ z) = \int_{u=−∞}^{u=z} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}u^2}du    (4.54)

Its graph is

Figure 4.4: Normalized random variable

Φ(z) = P(Z ≤ z) gives the area under the curve up to z, and the inverse Φ^{−1}(p) = invnormal(p) = z recovers z; 1 − Φ(z) gives the area under the curve above z.
(15) Most statistical tables give values of Φ(z) = P(Z ≤ z) for different values of z. In most problems that call for the assumption that a random variable X ∽ N(µ, δ^2), we usually end up requiring the evaluation of the integral

P(x_1 ≤ X ≤ x_2) = \frac{1}{δ\sqrt{2π}}\int_{x_1}^{x_2} e^{−\frac{1}{2}\left(\frac{x_i − µ}{δ}\right)^2}dx    (4.55)

and this is tedious! Therefore, it is consoling to note that for any random variable X ∽ N(µ, δ^2),

P(x_1 ≤ X ≤ x_2) = P\left(\frac{x_1 − µ}{δ} ≤ \frac{x − µ}{δ} ≤ \frac{x_2 − µ}{δ}\right)    (4.56)

P(x_1 ≤ X ≤ x_2) = P\left(\frac{x_1 − µ}{δ} ≤ Z ≤ \frac{x_2 − µ}{δ}\right)    (4.57)

P(x_1 ≤ X ≤ x_2) = P(z_1 ≤ Z ≤ z_2) = \int_{z_1}^{z_2} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}z^2}dz    (4.58)

Figure 4.5: Normalized random variable

P(x_1 ≤ X ≤ x_2) = P(Z ≤ z_2) − P(Z ≤ z_1)    (4.59)

Values are read from the standard normal tables.
Example 1 Given that X ∽ N(50, 100), evaluate;



(a) P(40 ≤ x ≤ 60)
(b) P(x ≤ 45)
(c) P(x ≥ 60)

Solution

(a) P(40 ≤ x ≤ 60)

P(x_1 ≤ X ≤ x_2) = P\left(\frac{x_1 − µ}{δ} ≤ \frac{x − µ}{δ} ≤ \frac{x_2 − µ}{δ}\right)

P(40 ≤ X ≤ 60) = P\left(\frac{40 − 50}{10} ≤ Z ≤ \frac{60 − 50}{10}\right)

P(40 ≤ X ≤ 60) = P(−1 ≤ Z ≤ 1)
P(40 ≤ X ≤ 60) = P(Z ≤ 1) − P(Z ≤ −1)
P(40 ≤ X ≤ 60) = 0.8413 − 0.1587 = 0.6826

(b) P(x ≤ 45)

P(x ≤ 45) = P\left(Z ≤ \frac{45 − 50}{10}\right)

P(x ≤ 45) = P(Z ≤ −0.5) = 1 − P(Z ≤ 0.5) = 0.3085

(c) P(x ≥ 60)

P(x ≥ 60) = P\left(Z ≥ \frac{60 − 50}{10}\right)

P(x ≥ 60) = P(Z ≥ 1) = 1 − P(Z < 1) = 1 − 0.8413 = 0.1587
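The same three probabilities can be read off in code (sketch, assuming scipy; norm.cdf takes loc = µ and scale = δ):

```python
# Sketch: X ~ N(50, 100), i.e. mu = 50, sigma = 10.
from scipy.stats import norm

mu, sigma = 50, 10
print(norm.cdf(60, mu, sigma) - norm.cdf(40, mu, sigma))  # (a) ~0.6827
print(norm.cdf(45, mu, sigma))                            # (b) ~0.3085
print(1 - norm.cdf(60, mu, sigma))                        # (c) ~0.1587
```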

(16) Normal approximation to the Binomial. When the number of trials n is large (n → ∞) and the probability of success P is close to 0.5, the Normal distribution is used to approximate the Binomial process, with µ = nP and δ = \sqrt{npq} = \sqrt{np(1 − p)}, so that

Z = \frac{x − E(x)}{\sqrt{Var(x)}} = \frac{x − np}{\sqrt{npq}} = \frac{x − np}{\sqrt{np(1 − p)}}

To approximate the Binomial random variable using the Normal distribution, we make continuity corrections by adding or subtracting the correction term 0.5, as in the mnemonics below. This introduces a small approximation error.

(a ≤ x ≤ b) e.g. (5 ≤ x ≤ 8) = (4.5 < x < 8.5)    (4.60)

(a < x < b) e.g. (5 < x < 8) = (5.5 < x < 7.5)    (4.61)

(a < x ≤ b) e.g. (5 < x ≤ 8) = (5.5 < x < 8.5)    (4.62)

(a ≤ x < b) e.g. (5 ≤ x < 8) = (4.5 < x < 7.5)    (4.63)

(x < b) e.g. (x < 8) = (x < 7.5)    (4.64)

(x > a) e.g. (x > 8) = (x > 8.5)    (4.65)

(x ≤ b) e.g. (x ≤ 8) = (x < 8.5)    (4.66)

(x ≥ a) = (a ≤ x) e.g. (8 ≤ x) = (7.5 < x) = (x > 7.5)    (4.67)

(x = a) e.g. (x = 9) = (8.5 < x < 9.5)    (4.68)

(17) Normal approximation to the Poisson distribution. When the number of trials n is large (n → ∞) and λ > 20, the Normal distribution can also be used to approximate the Poisson process, with µ = λ and δ = \sqrt{λ}, such that

Z = \frac{x − E(x)}{\sqrt{Var(x)}} = \frac{x − λ}{\sqrt{λ}}

Respect the continuity corrections above.

4.4.3 The Gamma function


A Gamma distribution is defined by

G(x; α, β) = \frac{β^α x^{α−1} e^{−βx}}{Γ(α)};  0 < x < ∞ and α, β > 0,  and 0 elsewhere    (4.69)

with parameters α and β, and is sometimes written as

G(x; α, β) = \frac{β(βx)^{α−1} e^{−βx}}{Γ(α)};  0 < x < ∞ and α, β > 0,  and 0 elsewhere    (4.70)

Since G(x; α, β) is a pdf, then

\int_0^{∞} G(x; α, β)dx = 1    (4.71)

\int_0^{∞} \frac{β^α x^{α−1} e^{−βx}}{Γ(α)}dx = 1    (4.72)

so that

Γ(α) = \int_0^{∞} β^α x^{α−1} e^{−βx}dx    (4.73)

The equation

Γ(α) = \int_0^{∞} β^α x^{α−1} e^{−βx}dx    (4.74)

when β = 1 gives

Γ(α) = \int_0^{∞} x^{α−1} e^{−x}dx = \int_0^{∞} t^{α−1} e^{−t}dt    (4.75)

For any real x > 0, the Gamma function is the improper integral

Γ(x) = \int_0^{∞} t^{x−1} e^{−t}dt    (4.76)

Or

Γ(α) = \int_0^{∞} t^{α−1} e^{−t}dt    (4.77)

for α > 0. Dividing Eq. 4.75 through by Γ(α) gives

I = \int_0^{∞} \frac{t^{α−1} e^{−t}}{Γ(α)}dt = 1    (4.78)

Hence

f(t) = \frac{t^{α−1} e^{−t}}{Γ(α)};  0 < t < ∞,  and 0 elsewhere    (4.79)

is a Gamma distribution as in Eq. 4.69 above, but in a simpler form.


Conversely, given the gamma distribution in Eq. 4.79, we can generate the gamma distribution in Eq. 4.69 by letting t = βx, so dt = βdx:

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = \int_0^{∞} \frac{(βx)^{α−1} e^{−βx}}{Γ(α)} βdx    (4.80)

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = \int_0^{∞} \frac{x^{α−1} β^{α−1} e^{−βx}}{Γ(α)} βdx    (4.81)

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = β^α \int_0^{∞} \frac{x^{α−1} e^{−βx}}{Γ(α)}dx    (4.82)

This generates the gamma distribution defined by

G(x; α, β) = \frac{β^α x^{α−1} e^{−βx}}{Γ(α)};  0 < x < ∞ and α, β > 0,  and 0 elsewhere    (4.83)

And G(x; α, β) can serve as a pdf; if X ∽ G(x; α, β) in this rate form, then E(x) = α/β and Var(x) = α/β². Also, letting t = x/β, so dt = (1/β)dx:

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = \int_0^{∞} \frac{\left(\frac{x}{β}\right)^{α−1} e^{−\frac{x}{β}}}{Γ(α)} \frac{1}{β}dx    (4.84)

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = \int_0^{∞} \frac{x^{α−1} \frac{1}{β^{α−1}} e^{−\frac{x}{β}}}{Γ(α)} \frac{1}{β}dx    (4.85)

\int_0^{∞} \frac{e^{−t} t^{α−1}}{Γ(α)}dt = \int_0^{∞} \frac{x^{α−1} e^{−\frac{x}{β}}}{Γ(α)} \frac{1}{β^α}dx    (4.86)



This generates the gamma distribution defined by

G(x; α, β) = \frac{x^{α−1} e^{−\frac{x}{β}}}{Γ(α) β^α};  0 < x < ∞ and α, β > 0,  and 0 elsewhere    (4.87)

It can serve as a pdf, and in this scale form E(x) = αβ and Var(x) = αβ².

4.4.3.1 Evaluating gamma values for x > 0
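As a small illustration of evaluating Γ(x) numerically (a sketch only; it relies on the standard-library math.gamma and the well-known facts Γ(n) = (n − 1)! for integer n, Γ(1/2) = √π, and the recurrence Γ(x + 1) = xΓ(x)):

```python
# Sketch: evaluating the Gamma function for a few values of x > 0.
from math import gamma, factorial, pi, sqrt

print(gamma(5), factorial(4))        # Gamma(5) = 4! = 24
print(gamma(0.5), sqrt(pi))          # Gamma(1/2) = sqrt(pi) ~ 1.7725
print(gamma(4.5), 3.5 * gamma(3.5))  # recurrence Gamma(x+1) = x*Gamma(x)
```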

