Discrete Probability Distributions
A number of probability mass functions have proved to be useful in a large variety of practical fields in engineering, science, artificial intelligence, business and elsewhere. We will only consider some of the most commonly used distributions, namely the Bernoulli, binomial, geometric, Poisson, negative binomial and hypergeometric distributions.
2.1 Bernoulli Distribution

Consider a simple random experiment (or trial) whose outcome can be classified as either a success or a failure. For example:

– Tossing a coin: obtaining H may be considered a success while T would represent a failure in this case.

– Results of an HIV test: testing positive for HIV is considered a success during prevalence studies.

– Testing a new light bulb: in quality assurance, if the bulb doesn’t light when tested it is considered a failure.

If we let X = 1 when the outcome is a success and X = 0 when it is a failure, then the probability mass function of X is given by
p(x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \end{cases} \tag{2.1}
Definition 2.1.1. A random variable X is said to be a Bernoulli random variable if its probability mass function is given by (2.1) for some p ∈ (0, 1). For such a random variable, E(X) = p and V(X) = p(1 − p). The proofs of these results are trivial and have been intentionally left for the students to prove.
2.2 Binomial Distribution

Suppose now that we have n independent trials of the random experiment, each of which results in either a success (with probability p) or a failure (with probability 1 − p).
Definition 2.2.1. Let X represent the number of successes that occur in the n trials. Then X is said to be a Binomial random variable with parameters (n, p), written X ∼ Bin(n, p), with probability mass function

p(x) = \binom{n}{x} p^{x} (1 - p)^{n-x}; \quad x = 0, 1, 2, \ldots, n \tag{2.2}

where \binom{n}{x} = \frac{n!}{x!(n-x)!} and x! = x(x-1)(x-2) \cdots 2 \cdot 1.
To make sense of (2.2), note that any particular sequence of n trials with x successes and n − x failures has probability p^{x} (1 − p)^{n−x} of occurring, due to the independence of the trials. However, this accounts for only one sequence of events, and there are \binom{n}{x} different ways of obtaining x successes from a total of n trials.
It should also be noted that Equation (2.2) satisfies all the conditions of a probability mass function; in particular, by the binomial theorem,

\sum_{x=0}^{n} p(x) = \sum_{x=0}^{n} \binom{n}{x} p^{x} (1-p)^{n-x} = \big(p + (1-p)\big)^{n} = 1.
Example 2.2.1. Five fair coins are flipped. If the outcomes are assumed to be independent, find the probability mass function of the number of heads obtained.

Solution 2.2.1. Let X be the number of heads (successes) that appear; then X ∼ Bin(n = 5, p = 1/2) and
P(X = 0) = \binom{5}{0} \left(\tfrac{1}{2}\right)^{0} \left(\tfrac{1}{2}\right)^{5} = \tfrac{1}{32}
P(X = 1) = \binom{5}{1} \left(\tfrac{1}{2}\right)^{1} \left(\tfrac{1}{2}\right)^{4} = \tfrac{5}{32}
P(X = 2) = \binom{5}{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{3} = \tfrac{10}{32}
P(X = 3) = \binom{5}{3} \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{2} = \tfrac{10}{32}
P(X = 4) = \binom{5}{4} \left(\tfrac{1}{2}\right)^{4} \left(\tfrac{1}{2}\right)^{1} = \tfrac{5}{32}
P(X = 5) = \binom{5}{5} \left(\tfrac{1}{2}\right)^{5} \left(\tfrac{1}{2}\right)^{0} = \tfrac{1}{32}
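As a quick sanity check, these six probabilities can be computed in R with the built-in binomial mass function:

    dbinom(0:5, size = 5, prob = 0.5)
    # 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
    # i.e. 1/32, 5/32, 10/32, 10/32, 5/32, 1/32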
Example 2.2.2. A certain type of pill is packed in bottles of 12 pills each. 10% of the pills are
chipped during the manufacturing process. Explain why the binomial distribution can provide
a reasonable model for the random variable X which represents the number of chipped pills in a bottle, and find the probability that a randomly selected bottle contains (a) no chipped pills, (b) exactly two chipped pills, and (c) at least two chipped pills.
Solution 2.2.2. The first part of the solution will be discussed in class. I’m expecting a vibrant discussion in which we justify why a Binomial distribution is suitable. The rest of the solution is just simple algebraic simplification. The only thing we note is that X ∼ Bin(12, 0.1).
(a) P(X = 0) = \binom{12}{0} (0.1)^{0} (0.9)^{12} = 0.9^{12} = 0.282

(b) P(X = 2) = \binom{12}{2} (0.1)^{2} (0.9)^{10} = 66(0.01)(0.9^{10}) = 0.230

(c) It should be noted that the words at least imply a minimum, and therefore we should compute

P(X ≥ 2) = 1 − P(X < 2) = 1 − \big(P(X = 0) + P(X = 1)\big)
= 1 − \Big(0.9^{12} + \binom{12}{1}(0.1)(0.9)^{11}\Big)
= 1 − (0.282 + 0.377)
= 0.341
In R, these three probabilities can be computed with dbinom() and pbinom():
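    dbinom(0, size = 12, prob = 0.1)      # (a) P(X = 0)  = 0.282
    dbinom(2, size = 12, prob = 0.1)      # (b) P(X = 2)  = 0.230
    1 - pbinom(1, size = 12, prob = 0.1)  # (c) P(X >= 2) = 0.341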
For the Binomial distribution it can be shown that

E(X) = np \tag{2.3}

and

V(X) = np(1 − p). \tag{2.4}

Proof. Using the identity x\binom{n}{x} = n\binom{n-1}{x-1},

E(X) = \sum_{x=0}^{n} x\,p(x) = \sum_{x=0}^{n} x \binom{n}{x} p^{x} q^{n-x}; \quad \text{where } q = 1 − p
= np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{n-x}
= np \sum_{y=0}^{m} \binom{m}{y} p^{y} q^{m-y}; \quad \text{letting } y = x − 1 \text{ and } m = n − 1.

The last sum equals (p + q)^{m} = 1. Hence

E(X) = np. \tag{2.5}
For the variance, we are not going to compute E(X²) directly but rather E[X(X − 1)], because the factor x(x − 1) combines neatly with the binomial coefficient. We have

E[X(X − 1)] = \sum_{x=0}^{n} x(x−1)\,p(x) = \sum_{x=0}^{n} x(x−1) \binom{n}{x} p^{x} q^{n−x}
= \sum_{x=1}^{n} (x−1)\, n \binom{n−1}{x−1} p^{x−1}\, p\, q^{n−x}
= np \sum_{x=1}^{n} (x−1) \binom{n−1}{x−1} p^{x−1} q^{n−x}.

Notice that the term (x − 1)\binom{n-1}{x-1} is similar to the result used above, except that in place of x and n we now have x − 1 and n − 1. Therefore

= np \sum_{x=2}^{n} (n−1) \binom{n−2}{x−2} p^{x−2}\, p\, q^{n−x}
= n(n−1)p^{2} \sum_{x=2}^{n} \binom{n−2}{x−2} p^{x−2} q^{n−x}
= n(n−1)p^{2} \sum_{y=0}^{n−2} \binom{n−2}{y} p^{y} q^{(n−2)−y}; \quad \text{letting } y = x − 2
= n(n−1)p^{2} \sum_{y=0}^{m} \binom{m}{y} p^{y} q^{m−y}; \quad \text{with } m = n − 2.

Therefore, since the last sum equals 1,

E[X(X − 1)] = n(n−1)p^{2}. \tag{2.6}
We can now compute E(X²) by making use of (2.5) and (2.6) to get

E(X²) = E[X(X − 1)] + E(X)
= n(n−1)p² + np
= n²p² − np² + np. \tag{2.7}

Hence

V(X) = E(X²) − [E(X)]²
= n²p² − np² + np − n²p²
= np − np² = np(1 − p) = npq. \tag{2.8}

2.3 Geometric distribution
Definition 2.3.1. Suppose that independent trials, each having probability p of being a success, are performed until a success occurs. If we let X be the number of trials required until the first success, then X is said to be a geometric random variable with parameter p, and its probability mass function is

P(X = x) = p\,q^{x−1}; \quad x = 1, 2, 3, \ldots, \text{ where } q = 1 − p. \tag{2.9}

Equation (2.9) follows since the first x − 1 trials must be failures, followed by a successful trial. In addition, the outcomes of the successive trials are assumed to be independent. It should be noted that there is an alternative way of looking into the formulation of the geometric distribution. That is, instead of modeling the number of trials, one might be interested in the number of failures before the first success. In such a case, the distribution is formulated as

P(X = x) = p\,q^{x}; \quad x = 0, 1, 2, \ldots

Note that x now starts from 0 instead of 1, since it is possible to obtain a success at the first try.
To show that (2.9) is a proper probability mass function, note that

\sum_{x=1}^{∞} p\,q^{x−1} = p \sum_{x=0}^{∞} q^{x}; \quad q = 1 − p
= p\,\frac{1}{1−q}; \quad \text{the sum is a geometric series, } 1 + q + q^{2} + \ldots
= p\,\frac{1}{p} = 1.
Example 2.3.1. Suppose that you interview job applicants in succession until you find a person that satisfies the job description. Assume that at each interview, the probability of finding a suitable person is p, independently of the other interviews.

(a) What is the probability that you appoint the third person you interview?

(b) What is the probability that you will need to do five or more interviews?
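A sketch of these computations in R, assuming an illustrative value p = 0.15 (the example leaves p general). Note that R's dgeom() and pgeom() are parameterised by the number of failures before the first success, not the number of trials:

    p <- 0.15                # assumed illustrative value; the example leaves p unspecified
    dgeom(2, prob = p)       # (a) P(X = 3): two failures then a success, (1-p)^2 * p
    1 - pgeom(3, prob = p)   # (b) P(X >= 5): first four trials all fail, (1-p)^4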
Cumulative distribution

Part (b) of Example 2.3.1 could be obtained directly from the cumulative distribution of the geometric random variable, which we now derive.
P(X ≤ k) = \sum_{x=1}^{k} p(x) = \sum_{x=1}^{k} p\,q^{x−1}
= p\,(1 + q + q^{2} + \ldots + q^{k−1}).

Let S = 1 + q + q^{2} + \ldots + q^{k−1}. Multiplying through by q gives

qS = q + q^{2} + q^{3} + \ldots + q^{k},

so that

S − qS = (1 + q + q^{2} + \ldots + q^{k−1}) − (q + q^{2} + q^{3} + \ldots + q^{k})
(1 − q)S = 1 − q^{k}.

Thus,

S = \frac{1 − q^{k}}{1 − q}.

Now let's go back to the cumulative distribution, P(X ≤ k). We have that

P(X ≤ k) = p \cdot \frac{1 − q^{k}}{1 − q}; \quad q = 1 − p
= 1 − q^{k}.

Hence, P(X ≥ k) = q^{k−1}. Intuitively, the probability that at least k trials are necessary to obtain a success is equal to the probability that the first k − 1 trials are all failures.
For a geometric random variable it can be shown that E(X) = 1/p and V(X) = q/p². For the expectation,

E(X) = \sum_{x=1}^{∞} x\,p\,q^{x−1} \tag{2.12}
= \sum_{x=1}^{∞} (x − 1 + 1)\,p\,q^{x−1}
= \sum_{x=1}^{∞} (x − 1)\,p\,q^{x−1} + \sum_{x=1}^{∞} p\,q^{x−1}.

Note that the last term sums up all probabilities of a geometric distribution and therefore sums up to 1. Thus, letting y = x − 1,

E(X) = \sum_{y=0}^{∞} y\,p\,q^{y} + 1
= q \sum_{y=1}^{∞} y\,p\,q^{y−1} + 1.
Notice that the sum is the definition of the expectation of a geometric distribution given in (2.12). Therefore

E(X) = q\,E(X) + 1
(1 − q)\,E(X) = 1.

Hence

E(X) = \frac{1}{1 − q} = \frac{1}{p}. \tag{2.13}
Next, we compute E(X²):

E(X²) = \sum_{x=1}^{∞} x^{2}\,p\,q^{x−1}
= \sum_{x=1}^{∞} \big((x − 1) + 1\big)^{2}\,p\,q^{x−1}
= \sum_{x=1}^{∞} \big((x − 1)^{2} + 2(x − 1) + 1\big)\,p\,q^{x−1}
= \sum_{x=1}^{∞} (x − 1)^{2}\,p\,q^{x−1} + 2\sum_{x=1}^{∞} (x − 1)\,p\,q^{x−1} + \sum_{x=1}^{∞} p\,q^{x−1}
= \sum_{y=0}^{∞} y^{2}\,p\,q^{y} + 2\sum_{y=0}^{∞} y\,p\,q^{y} + 1
= q \sum_{y=1}^{∞} y^{2}\,p\,q^{y−1} + 2q \sum_{y=1}^{∞} y\,p\,q^{y−1} + 1
= q\,E(X²) + 2q\,E(X) + 1.

Therefore

(1 − q)\,E(X²) = 2q/p + 1
E(X²) = \frac{2q + p}{p} \cdot \frac{1}{p}
= \frac{q + (q + p)}{p^{2}} = \frac{1 + q}{p^{2}}. \tag{2.14}
Hence

V(X) = E(X²) − [E(X)]²
= \frac{1 + q}{p^{2}} − \frac{1}{p^{2}} = \frac{q}{p^{2}}. \tag{2.15}
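A quick simulation sketch in R to check (2.13) and (2.15); rgeom() returns the number of failures, so we add 1 to obtain the number of trials:

    set.seed(1)
    p <- 0.3
    x <- rgeom(1e6, prob = p) + 1  # number of trials to the first success
    mean(x)                        # close to 1/p   = 3.333
    var(x)                         # close to q/p^2 = 7.778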
2.4 Poisson Distribution

Many phenomena in physics follow the Poisson probability law, named in honour of the French mathematician Siméon Poisson (1781 – 1840). The classic example is the decay of radioactive nuclei. The following are other examples where the Poisson distribution can be used:

– In management science, the number of demands for service in a given period (e.g. on a switchboard or at a service counter).

– Occurrences of accidents, errors, breakdowns and other calamities – the number that occurs within a specified time period has a Poisson distribution under certain conditions.

Broadly, the condition for a Poisson process is that the events occur in time at random. Loosely, this means that an event is equally likely to occur at any instant in time. That is, we only know the average rate of occurrences, but the exact timing of an event is random. The number of events in a fixed period then has the distribution defined below.
Definition 2.4.1. Suppose we are given a period of time during which events occur at random. Let λ be the average rate at which events occur per time period, and let the random variable X be the number of events occurring during the time period. Then X has a Poisson distribution with parameter λ, written X ∼ P(λ), with probability mass function

P(X = x) = \frac{e^{−λ} λ^{x}}{x!}; \quad x = 0, 1, 2, \ldots \tag{2.16}
Example 2.4.1. Suppose you are working as a logistics manager for Unitrans and have observed that on average, the company experiences 12 break-downs per 5-day working week. You have a policy of keeping two trucks on standby. What is the probability that on any day (a) no standby trucks are needed, and (b) the number of standby trucks is inadequate?
Solution 2.4.1. Let X be a random variable representing the number of break-downs in a given
day. It is reasonable to assume that the break-downs occur at random and that the Poisson
distribution is a realistic model. Since we are interested in break-downs per day, we also need
to convert the weekly average to a daily rate. That is, 12 break-downs per 5 days is equivalent
to 12/5 = 2.4 breakdowns per day. Hence we assume that X ∼ P (λ = 2.4), i.e.
P(X = x) = \frac{e^{−2.4}\,2.4^{x}}{x!}; \quad x = 0, 1, 2, \ldots

(a) P(X = 0) = \frac{e^{−2.4}\,2.4^{0}}{0!} = e^{−2.4} = 0.091.
(b) If the number of trucks on standby is inadequate, then it means that X > 2 and

P(X > 2) = 1 − P(X ≤ 2) = 1 − e^{−2.4}\left(1 + 2.4 + \frac{2.4^{2}}{2!}\right)
= 0.430.

This implies that in only 9% of the days the company will not use standby trucks at all, but on 43% of the days they run out of standby trucks. If I were you, I’d have a talk with management about the standby policy.
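In R, these two probabilities can be verified with the Poisson functions:

    dpois(0, lambda = 2.4)      # (a) P(X = 0) = 0.091
    1 - ppois(2, lambda = 2.4)  # (b) P(X > 2) = 0.430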
Example 2.4.2. Beer cans are randomly tossed alongside the A10 highway, with an average rate of λ cans per kilometre.
(a) What is the probability of seeing no beer cans over a 5km stretch?
(b) What is the probability of seeing at least one beer can in 200m?
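As a sketch (with λ denoting the average number of cans per kilometre, since the example leaves the rate unspecified): the count over a 5 km stretch is Poisson with mean 5λ, so the probability of seeing no cans is e^{−5λ}; the count over 200 m is Poisson with mean 0.2λ, so the probability of seeing at least one can is 1 − e^{−0.2λ}.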
For a Poisson random variable it can be shown that

E(X) = λ \tag{2.17}

and

V(X) = λ. \tag{2.18}
Proof. We are going to start with the proof of E(X) and make use of the Taylor expansion of e^{x}, given by e^{x} = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \ldots. We have
E(X) = \sum_{x=0}^{∞} x\,p(x)
= \sum_{x=0}^{∞} x\,\frac{e^{−λ} λ^{x}}{x!}
= e^{−λ} \sum_{x=1}^{∞} x\,\frac{λ \cdot λ^{x−1}}{x(x−1)!}
= λ e^{−λ} \sum_{x=1}^{∞} \frac{λ^{x−1}}{(x−1)!}
Note that we can change the variable from x to y by letting y = x − 1. If we do that, the values of y range from 0 to ∞, and

E(X) = λ e^{−λ} \sum_{y=0}^{∞} \frac{λ^{y}}{y!}
= λ e^{−λ} e^{λ}
= λ.
We now turn our efforts to showing that V(X) = λ. It should be noted that E(X²) = E[X(X − 1)] + E(X), and

E[X(X − 1)] = \sum_{x=0}^{∞} x(x−1)\,p(x)
= \sum_{x=0}^{∞} x(x−1)\,\frac{e^{−λ} λ^{x}}{x!}
We note that x! = x(x − 1)(x − 2)! and also that λ^{x} = λ^{2} λ^{x−2}. Substituting these back, we get
E[X(X − 1)] = \sum_{x=0}^{∞} x(x−1)\,\frac{e^{−λ} λ^{2} λ^{x−2}}{x(x−1)(x−2)!}
= λ^{2} \sum_{x=2}^{∞} \frac{e^{−λ} λ^{x−2}}{(x−2)!}
If we let y = x − 2, then y will range from 0 to ∞ and we can re-write the expectation as
E[X(X − 1)] = λ^{2} \sum_{y=0}^{∞} \frac{e^{−λ} λ^{y}}{y!}

Note that the sum is with respect to the Poisson distribution of Y. It sums to one and thus E[X(X − 1)] = λ^{2}.
Hence

V(X) = E[X(X − 1)] + E(X) − [E(X)]²
= λ² + λ − λ²
= λ.
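As with the geometric case, a short simulation sketch in R can confirm that the mean and the variance coincide:

    set.seed(1)
    x <- rpois(1e6, lambda = 2.4)
    mean(x)  # close to 2.4
    var(x)   # close to 2.4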
The Poisson random variable has a tremendous range of applications, and one of these is the approximation of a Binomial random variable with parameters (n, p) when n is large and p is small enough that np is of moderate size. In general, the Poisson approximation works well if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and p ≤ 0.10. To see why the approximation holds, let λ = np, so that p = λ/n. Then
P(X = x) = \binom{n}{x} p^{x} q^{n−x}; \quad q = 1 − p
= \frac{n!}{(n−x)!\,x!} \left(\frac{λ}{n}\right)^{x} \left(1 − \frac{λ}{n}\right)^{n−x}
= \frac{n(n−1)(n−2) \cdots (n−(x−1))(n−x)!}{(n−x)!} \cdot \frac{1}{n^{x}} \cdot \frac{λ^{x}}{x!} \left(1 − \frac{λ}{n}\right)^{n} \left(1 − \frac{λ}{n}\right)^{−x}
= \frac{n}{n} \cdot \frac{n−1}{n} \cdots \frac{n−(x−1)}{n} \cdot \frac{λ^{x}}{x!} \left(1 − \frac{λ}{n}\right)^{n} \left(1 − \frac{λ}{n}\right)^{−x}
= \left(1 − \frac{1}{n}\right)\left(1 − \frac{2}{n}\right) \cdots \left(1 − \frac{x−1}{n}\right) \frac{λ^{x}}{x!} \left(1 − \frac{λ}{n}\right)^{n} \left(1 − \frac{λ}{n}\right)^{−x}.
Now let's consider what happens to the above equation as n → ∞. That is,

\lim_{n→∞} P(X = x) = \lim_{n→∞} \left(1 − \frac{1}{n}\right)\left(1 − \frac{2}{n}\right) \cdots \left(1 − \frac{x−1}{n}\right) \frac{λ^{x}}{x!} \left(1 − \frac{λ}{n}\right)^{n} \left(1 − \frac{λ}{n}\right)^{−x}.
Note that λ^{x}/x! does not depend on n and can therefore be taken out of the limit. Also recall that the limit of a product of functions is just the product of their limits. Therefore, we have

= \frac{λ^{x}}{x!} \lim_{n→∞}\left(1 − \frac{1}{n}\right) \cdots \lim_{n→∞}\left(1 − \frac{x−1}{n}\right) \lim_{n→∞}\left(1 − \frac{λ}{n}\right)^{−x} \lim_{n→∞}\left(1 − \frac{λ}{n}\right)^{n}
= \frac{λ^{x}}{x!} (1 − 0) \cdots (1 − 0)\,(1 − 0)^{−x} \lim_{n→∞}\left(1 − \frac{λ}{n}\right)^{n}
= \frac{λ^{x}}{x!} \lim_{n→∞}\left(1 − \frac{λ}{n}\right)^{n}
= \frac{λ^{x}}{x!} e^{−λ}.

Hence, for large n and small p,

P(X = x) ≈ \frac{e^{−λ} λ^{x}}{x!},
which is the p.m.f. of a Poisson distribution. It should be noted that the last equation arises from the result

e^{−λ} = \lim_{n→∞} \left(1 − \frac{λ}{n}\right)^{n}.
Example 2.4.3. Suppose 5% of Christmas tree light bulbs manufactured by a company are
defective. The company’s Quality Control manager is concerned and as a result, samples 100
bulbs coming off the assembly line. Let X denote the number of defective bulbs in the sample.
What is the probability that the sample contains at most 3 defective bulbs?
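Here n = 100 and p = 0.05, so λ = np = 5 and the Poisson approximation applies. A short sketch in R comparing the exact Binomial answer with the approximation:

    pbinom(3, size = 100, prob = 0.05)  # exact:       P(X <= 3) = 0.258
    ppois(3, lambda = 5)                # approximate: P(X <= 3) = 0.265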
2.5 Negative binomial distribution

Before we present the formal definition of a negative binomial random variable, let's consider the following motivating example. Suppose that a representative of a marketing department has three tickets to give away for the next Zebras game in Gaborone. He randomly selects people at shopping malls in Gaborone until he finds three people who attended the last Zebras game, to reward them with a match ticket for the next game.
Let p be the probability that he succeeds in finding such a person. Now, let X denote the number of people he selects until he finds the r people, say r = 3, who attended the last Zebras game. What, for instance, is the probability that exactly ten people are selected, i.e. P(X = 10)?
This question implies that the representative will stop as soon as the three tickets have been given away. That is, the 10th person will be given the last ticket. As a result, nine people were selected before him, any two of whom could have received the first two tickets. The total number of ways this could be done is of course \binom{9}{2}. Since each person is assumed to be independent, the probability is given as

P(X = 10) = P(2 tickets out of 9 people) · P(10th person gets the ticket)
= \binom{9}{2} p^{2} q^{7} \cdot p = \binom{9}{2} p^{3} q^{7}.

Note that we had x = 10 and r = 3, so relating this to the expression above we get

P(X = x) = \binom{x−1}{r−1} p^{r} q^{x−r}.
This is exactly the p.m.f. of a negative binomial random variable. The underlying scenario for the negative binomial is identical to that of the binomial distribution, but with a twist. For the binomial distribution, we fixed n, the number of trials, and counted the number of successes from these n trials. For the negative binomial distribution, we instead fix r, the number of successes, and count the number of trials until we have the r-th success. Thus, the variable X is the number of trials required to obtain the r-th success.
Definition 2.5.1. Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a total of r successes is accumulated. If we let X be the number of trials required, then X is said to be a negative binomial random variable with parameters (r, p), with probability mass function

P(X = x) = \binom{x−1}{r−1} p^{r} q^{x−r}; \quad x = r, r + 1, r + 2, \ldots \tag{2.19}
To show that the above function is a proper p.m.f., we are going to utilise the following result:

(1 − a)^{−r} = \sum_{k=0}^{∞} \binom{k + r − 1}{r − 1} a^{k} \tag{2.20}
\sum_{x=r}^{∞} P(X = x) = \sum_{x=r}^{∞} \binom{x−1}{r−1} p^{r} q^{x−r}
= p^{r} \sum_{x=r}^{∞} \binom{x−1}{r−1} q^{x−r}
= p^{r} \sum_{y=0}^{∞} \binom{(y+r)−1}{r−1} q^{(y+r)−r}; \quad \text{letting } y = x − r
= p^{r} \sum_{y=0}^{∞} \binom{y+r−1}{r−1} q^{y}
= p^{r} (1 − q)^{−r} = 1.
Example 2.5.1. A medical researcher is recruiting 20 subjects for a study of the possible effects of one of the COVID-19 drugs. Suppose each person that she interviews has a 60% chance of being eligible to participate in the study. What is the probability that she will have to interview exactly 40 people?
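A sketch in R; dnbinom() is parameterised by the number of failures before the r-th success, so interviewing exactly 40 people to find r = 20 eligible subjects corresponds to 20 failures:

    dnbinom(20, size = 20, prob = 0.6)  # = choose(39, 19) * 0.6^20 * 0.4^20, about 0.028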
Consider the following two probabilities:

(a) You toss a coin 4 times. The probability that you get (exactly) 2 heads.

(b) You toss a coin until you get 2 heads. The probability that it takes (exactly) 4 tosses.

(c) Which is larger? Explain why the answer makes intuitive sense. (A sketch of the computation follows below.)
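A sketch in R (dnbinom() counts failures, so taking exactly 4 tosses to get the 2nd head means 2 tails):

    dbinom(2, size = 4, prob = 0.5)   # (a) = 0.375
    dnbinom(2, size = 2, prob = 0.5)  # (b) = choose(3, 1) * (0.5)^4 = 0.1875
    # (a) is larger: event (b) additionally requires the 2nd head to land on the 4th toss.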
For a negative binomial random variable it can be shown that

E(X) = \frac{r}{p} \tag{2.21}

and

V(X) = r\,\frac{q}{p^{2}}. \tag{2.22}
Proof. For the expectation, using the identity x\binom{x−1}{r−1} = r\binom{x}{r},

E(X) = \sum_{x=r}^{∞} x\,p(x) = \sum_{x=r}^{∞} x \binom{x−1}{r−1} p^{r} q^{x−r}
= \sum_{x=r}^{∞} r \binom{x}{r} p^{r} q^{x−r}
= \frac{r}{p} \sum_{x=r}^{∞} \binom{x}{r} p^{r+1} q^{x−r}
= \frac{r}{p} \sum_{y=r+1}^{∞} \binom{y−1}{r} p^{r+1} q^{y−(r+1)}; \quad \text{letting } y = x + 1
= \frac{r}{p} \sum_{y=s}^{∞} \binom{y−1}{s−1} p^{s} q^{y−s}; \quad \text{with } s = r + 1
= \frac{r}{p},

since the last sum is the total probability of a negative binomial distribution with parameters (s, p) and therefore equals 1.
We now need E(X²) to find the variance of X. We utilise the fact that we can write E(X²) = E[X(X + 1)] − E(X). Let's consider E[X(X + 1)]:

E[X(X + 1)] = \sum_{x=r}^{∞} (x+1)\,x\,p(x) = \sum_{x=r}^{∞} (x+1)\,x \binom{x−1}{r−1} p^{r} q^{x−r}
= \frac{r}{p} \sum_{x=r}^{∞} (x+1) \binom{x}{r} p^{r+1} q^{x−r}
= \frac{r(r+1)}{p^{2}} \sum_{x=r}^{∞} \binom{x+1}{r+1} p^{r+2} q^{x−r}; \quad \text{using } (x+1)\binom{x}{r} = (r+1)\binom{x+1}{r+1}.

Letting y = x + 2 and s = r + 2, so that q^{x−r} = q^{y−s}, we get

E[X(X + 1)] = \frac{r(r+1)}{p^{2}} \sum_{y=s}^{∞} \binom{y−1}{s−1} p^{s} q^{y−s}
= \frac{r^{2} + r}{p^{2}},

since the sum is again the total probability of a negative binomial distribution.
Therefore

E(X²) = E[X(X + 1)] − E(X) = \frac{r^{2} + r}{p^{2}} − \frac{r}{p}.
Hence

V(X) = E(X²) − [E(X)]²
= \frac{r^{2} + r}{p^{2}} − \frac{r}{p} − \frac{r^{2}}{p^{2}}
= \frac{r}{p^{2}} − \frac{r}{p} = \frac{r − rp}{p^{2}}
= r\,\frac{1 − p}{p^{2}} = r\,\frac{q}{p^{2}}.
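A simulation sketch in R to check (2.21) and (2.22); rnbinom() returns failure counts, so we add r to obtain the number of trials:

    set.seed(1)
    r <- 3; p <- 0.4
    x <- rnbinom(1e6, size = r, prob = p) + r  # trials to the r-th success
    mean(x)                                    # close to r/p     = 7.5
    var(x)                                     # close to r*q/p^2 = 11.25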
2.6 Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution that gives the likelihood that an event happens x times in n trials when you are sampling from a small population without replacement. When you sample without replacement, the probabilities change with each subsequent trial. For instance, when you draw an ace from a deck of cards, the probability of drawing another ace on the next draw decreases because the deck has fewer aces. Conversely, the binomial distribution assumes the chances remain constant over the trials, and thus assumes sampling with replacement.
Example 2.6.1. Suppose a crate contains 50 light bulbs of which 5 are defective and 45 are
not. A quality control inspector randomly samples 4 bulbs without replacement. Let X be the
number of defective bulbs selected. Find the probability mass function, p(x), of the discrete
random variable X.
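A sketch of the p.m.f. in R; dhyper() is parameterised by the number of "success" items m, the number of "failure" items N − m, and the sample size:

    dhyper(0:4, m = 5, n = 45, k = 4)  # P(X = 0), P(X = 1), ..., P(X = 4)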
Definition 2.6.1. Suppose we randomly select n items without replacement from a set of N items of which m are classified as successes and N − m as failures, and let X denote the number of successes obtained. Then the probability mass function of the discrete random variable X is called the hypergeometric distribution and is given by

P(X = x) = \frac{\binom{m}{x}\binom{N−m}{n−x}}{\binom{N}{n}} \tag{2.23}

where the support S is the collection of nonnegative integers x that satisfies the inequalities x ≤ n, x ≤ m and n − x ≤ N − m.
The next step is to show that (2.23) is a proper probability mass function. However, we need to know Vandermonde's identity before we can show that. Vandermonde's identity is given as

\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k−r} = \binom{m+n}{k}. \tag{2.24}

A detailed algebraic proof of this identity can be found in standard references; it is however not examinable. Now let's show that the hypergeometric distribution is a proper p.m.f. That is, we need to show that \sum_{x=0}^{m} p(x) = 1.
Let's consider

\sum_{x=0}^{m} p(x) = \sum_{x=0}^{m} \frac{\binom{m}{x}\binom{N−m}{n−x}}{\binom{N}{n}}
= \frac{1}{\binom{N}{n}} \sum_{x=0}^{m} \binom{m}{x}\binom{N−m}{n−x}
Notice that the sum is exactly Vandermonde's sum in (2.24). Therefore

= \frac{1}{\binom{N}{n}} \binom{m + (N − m)}{n}
= \frac{1}{\binom{N}{n}} \binom{N}{n}
= 1.
Next, we consider the expectation:

E(X) = \sum_{x=0}^{m} x\,p(x)
= \sum_{x=0}^{m} x\,\frac{\binom{m}{x}\binom{N−m}{n−x}}{\binom{N}{n}}.

Recall that we once showed that x\binom{m}{x} = m\binom{m−1}{x−1} when we were discussing the expectation of the Binomial distribution. Similarly, we have n\binom{N}{n} = N\binom{N−1}{n−1}, so that \binom{N}{n} = \frac{N}{n}\binom{N−1}{n−1} and

E(X) = \sum_{x=1}^{m} \frac{m\binom{m−1}{x−1}\binom{N−m}{n−x}}{\frac{N}{n}\binom{N−1}{n−1}}
= \frac{nm}{N} \sum_{x=1}^{m} \frac{\binom{m−1}{x−1}\binom{N−m}{n−x}}{\binom{N−1}{n−1}}.

Note that the sum represents the total of the probabilities corresponding to the hypergeometric distribution with parameters N − 1, m − 1 and n − 1, and therefore equals one. Hence

E(X) = \frac{nm}{N}.
We next consider the variance of the hypergeometric distribution, which by definition is given by V(X) = E(X²) − [E(X)]². We however don't compute E(X²) directly but use the following equation:

E(X²) = E[X(X − 1)] + E(X).

Since we have E(X), we compute E[X(X − 1)] as follows:

E[X(X − 1)] = \sum_{x=0}^{m} x(x−1)\,\frac{\binom{m}{x}\binom{N−m}{n−x}}{\binom{N}{n}}.

Once again, we have x(x−1)\binom{m}{x} = m(m−1)\binom{m−2}{x−2}, and similarly \binom{N}{n} = \frac{N(N−1)}{n(n−1)}\binom{N−2}{n−2}. Thus, we have

E[X(X − 1)] = \frac{nm}{N}\,\frac{(m−1)(n−1)}{N−1} \sum_{x=2}^{m} \frac{\binom{m−2}{x−2}\binom{N−m}{n−x}}{\binom{N−2}{n−2}}
= \frac{nm}{N}\,\frac{(m−1)(n−1)}{N−1}.

The last equality came from the fact that the sum is the total of all probabilities of a hypergeometric distribution with parameters N − 2, m − 2 and n − 2. Hence

E(X²) = \frac{nm}{N}\,\frac{(m−1)(n−1)}{N−1} + \frac{nm}{N},

so that

V(X) = \frac{nm}{N}\,\frac{(m−1)(n−1)}{N−1} + \frac{nm}{N} − \left(\frac{nm}{N}\right)^{2}.

After some algebraic simplification, this reduces to

V(X) = n\,\frac{m}{N}\left(1 − \frac{m}{N}\right)\frac{N − n}{N − 1}.
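A numerical sketch in R checking the mean and variance formulas against simulation for the crate example (N = 50, m = 5, n = 4):

    set.seed(1)
    x <- rhyper(1e6, m = 5, n = 45, k = 4)
    mean(x)  # close to n*m/N = 4*5/50 = 0.4
    var(x)   # close to n*(m/N)*(1 - m/N)*(N - n)/(N - 1) = 0.338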