
CHAPTER 2

Discrete Probability Distributions

A number of probability mass functions have proved useful in a wide variety of practical fields in engineering, science, artificial intelligence, business and elsewhere. We will consider only some of the most commonly used distributions, namely the binomial and Poisson, together with others such as the uniform, geometric and negative binomial distributions.

2.1 Bernoulli distribution

Consider a simple random experiment (or trial) whose outcome can be classified as either a

success or failure. For example,

– Tossing a coin: obtaining H may be considered a success while T would represent a failure

in this case.

– Results of an HIV test: testing positive for HIV is considered a success during prevalence

studies such as BIAS.

– Testing a new light bulb: in quality assurance, if the bulb doesn’t light when tested it is

considered a success since the interest would be in the number of defectives.

If we let X = 1 when the outcome is success and X = 0 when it is a failure, then the probability

mass function of X is given by




p(x) = p if x = 1;  p(x) = 1 − p if x = 0.   (2.1)

where p, 0 < p < 1, is the probability that the trial is a success.

Definition 2.1.1. A random variable X is said to be a Bernoulli random variable if its probability mass function is given by (2.1) for some p ∈ (0, 1).

Expectation and Variance of a Bernoulli random variable

Let X ∼ Ber(p). Then

Expectation: E(X) = p

Variance: V(X) = p(1 − p)

The proofs of these results are straightforward and are intentionally left as an exercise for the students.
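As a quick numerical sanity check (an added sketch, not part of the notes), one can simulate Bernoulli trials in R and compare the sample mean and variance with p and p(1 − p); the value p = 0.3 below is an arbitrary choice for illustration.

# Sketch: empirical check of E(X) = p and V(X) = p(1 - p) for X ~ Ber(p)
set.seed(1)                            # for reproducibility
p <- 0.3                               # arbitrary success probability (illustration only)
x <- rbinom(1e5, size = 1, prob = p)   # a Bernoulli(p) sample is Binomial(1, p)
mean(x)                                # should be close to p = 0.3
var(x)                                 # should be close to p * (1 - p) = 0.21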

2.2 Binomial Distribution

Suppose that a Bernoulli experiment is repeated independently for n times, n ≥ 1. We say, we

have n independent trials of the random experiment, each of which results in either a success

with probability p or in failure with probability q = 1 − p.

Definition 2.2.1. Let X represent the number of successes that occurred from the n trials.

Then X is said to be a Binomial random variable with parameters (n, p), written X ∼ Bin(n, p).

Its probability mass function is given by

 
p(x) = \binom{n}{x} p^x (1 − p)^{n−x};  x = 0, 1, 2, . . . , n   (2.2)

where \binom{n}{x} = n! / (x!(n − x)!) and x! = x(x − 1)(x − 2) · . . . · 2 · 1.


To make sense of (2.2), note that any particular sequence of n trials containing x successes and n − x failures has probability p^x (1 − p)^{n−x}, by the independence of the trials. However, this accounts for only one such sequence; there are \binom{n}{x} different ways of obtaining x successes from a total of n trials.

It should also be noted that Equation (2.2) satisfies all the conditions of a probability mass function. In particular, the probabilities sum to 1. By the binomial expansion, we have that

\sum_{x=0}^{n} p(x) = \sum_{x=0}^{n} \binom{n}{x} p^x (1 − p)^{n−x} = (p + (1 − p))^n = 1.

Example 2.2.1. Five fair coins are flipped. If the outcomes are assumed to be independent,

find the probability mass function of the number of heads obtained.

Solution 2.2.1. Let X be the number of heads (success) that appear, then X ∼ Bin(n = 5, p =

0.5). Hence by (2.2) we have,

P(X = 0) = \binom{5}{0} (1/2)^0 (1/2)^5 = 1/32
P(X = 1) = \binom{5}{1} (1/2)^1 (1/2)^4 = 5/32
P(X = 2) = \binom{5}{2} (1/2)^2 (1/2)^3 = 10/32
P(X = 3) = \binom{5}{3} (1/2)^3 (1/2)^2 = 10/32
P(X = 4) = \binom{5}{4} (1/2)^4 (1/2)^1 = 5/32
P(X = 5) = \binom{5}{5} (1/2)^5 (1/2)^0 = 1/32
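These six probabilities can also be obtained with a single call in R (added here as a check; the vector 0:5 simply lists the possible values of X):

# Sketch: p.m.f. of X ~ Bin(5, 0.5) evaluated at x = 0, 1, ..., 5
dbinom(0:5, size = 5, prob = 0.5)
# 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125  (i.e. 1/32, 5/32, 10/32, ...)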

Example 2.2.2. A certain type of pill is packed in bottles of 12 pills each. 10% of the pills are chipped during the manufacturing process. Explain why the binomial distribution can provide a reasonable model for the random variable X which represents the number of chipped pills in a bottle. Then find the probabilities of obtaining:


(a) 0 chipped pills;

(b) 2 chipped pills;

(c) at least 2 chipped pills;

in a bottle.

Solution 2.2.2. The first part of the solution will be discussed in class. I'm expecting a vibrant discussion in which we need to justify why a Binomial distribution is suitable. The rest of the solutions are just simple algebraic simplifications. The only thing we note is that X ∼ Bin(12, 0.1).

(a) P(X = 0) = \binom{12}{0} · 0.1^0 · 0.9^{12} = 0.9^{12} = 0.282

(b) P(X = 2) = \binom{12}{2} · 0.1^2 · 0.9^{10} = 66(0.01)(0.9^{10}) = 0.230

(c) It should be noted that the phrase "at least" specifies a minimum, and therefore we should compute

P(X ≥ 2) = 1 − P(X < 2) = 1 − [P(X = 0) + P(X = 1)]
         = 1 − [0.9^{12} + \binom{12}{1} · 0.1 · 0.9^{11}]
         = 1 − (0.282 + 0.377)
         = 0.341

In R:

(a) dbinom(0, size = 12, prob = 0.1)

(b) dbinom(2, size = 12, prob = 0.1)

(c) 1 - pbinom(1, size = 12, prob = 0.1)


Expectation and Variance of a Binomial Random Variable

Let X ∼ Bin(n, p). Then the expected value of X is given by

E(X) = np (2.3)

and

V(X) = np(1 − p) (2.4)

Proof. By definition, we have that

E(X) = \sum_{x=0}^{n} x p(x) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n−x};  where q = 1 − p
     = \sum_{x=1}^{n} n \binom{n − 1}{x − 1} p · p^{x−1} q^{n−x}.

The second equality is based on the boxed side result

x \binom{n}{x} = x · n! / (x!(n − x)!) = n(n − 1)! / ((x − 1)!(n − x)!) = n \binom{n − 1}{x − 1}.

Now let y = x − 1 and m = n − 1. Then the remaining sum is the total of the probabilities of Y ∼ Bin(m, p). That is,

E(X) = np \sum_{y=0}^{m} \binom{m}{y} p^y q^{(m+1)−(y+1)} = np \sum_{y=0}^{m} \binom{m}{y} p^y q^{m−y}.

Since these probabilities sum to 1, we have

E(X) = np   (2.5)


For the variance, we are not going to compute E(X^2) directly but rather E[X(X − 1)]. This is because we can write X^2 = X(X − 1) + X and thus E(X^2) = E[X(X − 1)] + E(X).

E[X(X − 1)] = \sum_{x=0}^{n} x(x − 1) p(x) = \sum_{x=0}^{n} x(x − 1) \binom{n}{x} p^x q^{n−x}
            = \sum_{x=1}^{n} (x − 1) n \binom{n − 1}{x − 1} p · p^{x−1} q^{n−x}
            = np \sum_{x=1}^{n} (x − 1) \binom{n − 1}{x − 1} p^{x−1} q^{n−x}

Notice that the factor (x − 1) \binom{n − 1}{x − 1} is similar to the boxed result above, except that instead of x and n we now have x − 1 and n − 1. Therefore

E[X(X − 1)] = np \sum_{x=2}^{n} (n − 1) \binom{n − 2}{x − 2} p · p^{x−2} q^{n−x}
            = n(n − 1)p^2 \sum_{x=2}^{n} \binom{n − 2}{x − 2} p^{x−2} q^{n−x}

Once again let y = x − 2 (and m = n − 2) so that we have

E[X(X − 1)] = n(n − 1)p^2 \sum_{y=0}^{n−2} \binom{n − 2}{y} p^y q^{(n−2)−y}
            = n(n − 1)p^2 \sum_{y=0}^{m} \binom{m}{y} p^y q^{m−y}

Therefore

E[X(X − 1)] = n(n − 1)p^2   (2.6)


We can now compute E(X^2) by making use of (2.5) and (2.6) to get

E(X^2) = E[X(X − 1)] + E(X)
       = n(n − 1)p^2 + np
       = n^2 p^2 − np^2 + np   (2.7)

Therefore the variance of X is given by

V(X) = E(X^2) − [E(X)]^2
     = n^2 p^2 − np^2 + np − n^2 p^2
     = np − np^2

Hence

V(X) = np(1 − p)   (2.8)
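As a small numerical illustration of (2.3) and (2.4) (an added check, not part of the proof), one can compute the mean and variance directly from the p.m.f. in R; n = 12 and p = 0.1 below match Example 2.2.2.

# Sketch: check E(X) = np and V(X) = np(1 - p) for X ~ Bin(12, 0.1)
n <- 12; p <- 0.1
x <- 0:n
px <- dbinom(x, size = n, prob = p)
sum(x * px)                      # 1.2  = n*p
sum(x^2 * px) - sum(x * px)^2    # 1.08 = n*p*(1 - p)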

2.3 Geometric distribution

Definition 2.3.1. Suppose that independent trials, each having probability p of being a success, are performed until a success occurs. If we let X be the number of trials required until the first success occurs, then X has a geometric distribution with parameter p (the probability of a success), written X ∼ Geo(p), and has a probability mass function given by

p(x) = p(1 − p)^{x−1},  x = 1, 2, . . . ;  and p(x) = 0 otherwise.   (2.9)

Equation (2.9) follows since the first x − 1 trials must be failures, followed by a successful trial. In addition, the outcomes of the successive trials are assumed to be independent. It should be noted that there is an alternative way of formulating the geometric distribution: instead of modelling the number of trials, one might be interested in the number of failures before the first success. In that case, the distribution is formulated as follows:

p(x) = p(1 − p)^x;  x = 0, 1, 2, . . .

Note that x now starts from 0 instead of 1 since it is possible to obtain a success on the first try.

To check that p(x) is a probability mass function, we note that

\sum_{x=1}^{\infty} p q^{x−1} = p \sum_{x=0}^{\infty} q^x;  q = 1 − p
                              = p · 1/(1 − q);  the sum is a geometric series, 1 + q + q^2 + . . .
                              = p · (1/p) = 1.

Example 2.3.1. Suppose that you interview job applicants in succession until you find a person

that satisfies the job description. Assume that at each interview, the probability of finding the

right person is 0.3.

(a) What is the probability that you appoint the third person you interview?

(b) What is the probability that you will need to do five or more interviews?

Solution 2.3.1. Solutions will be discussed in class!

Cumulative distribution

Part (b) of Example 2.3.1 could be obtained directly from the cumulative distribution of the

geometric random variable. By definition,


P(X ≤ k) = \sum_{x=1}^{k} p(x) = \sum_{x=1}^{k} p q^{x−1}
         = p (1 + q + q^2 + . . . + q^{k−1}).

Let S = 1 + q + q^2 + . . . + q^{k−1}. Then

qS = q + q^2 + q^3 + . . . + q^k,

so that

S − qS = (1 + q + q^2 + . . . + q^{k−1}) − (q + q^2 + q^3 + . . . + q^k)
(1 − q)S = 1 − q^k.

Thus,

S = (1 − q^k)/(1 − q).

Now let us go back to the cumulative distribution, P(X ≤ k). We have that

P(X ≤ k) = p · (1 − q^k)/(1 − q);  q = 1 − p
         = 1 − q^k.

Hence, P(X ≥ k) = q^{k−1}. Intuitively, the probability that at least k trials are necessary to obtain a success is equal to the probability that the first k − 1 trials are all failures.
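In R, the geometric distribution is parameterised by the number of failures before the first success, so arguments must be shifted by one relative to the "number of trials" definition used here. As an added sketch (assuming the interview setting of Example 2.3.1 with p = 0.3), the two requested probabilities could be checked as follows.

# Sketch: Example 2.3.1 with p = 0.3, using R's "failures before first success" convention
p <- 0.3
dgeom(2, prob = p)                        # P(X = 3): two failures, then a success = 0.3 * 0.7^2
pgeom(3, prob = p, lower.tail = FALSE)    # P(X >= 5) = P(first 4 trials all fail) = 0.7^4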


Expectation and Variance of X ∼ Geo(p)

Let X ∼ Geo(p). Then the expected value of X is given by

E(X) = 1/p (2.10)

and

V(X) = (1 − p)/p^2 (2.11)

Proof. The expected value of a geometric random variable is given by


E(X) = \sum_{x=1}^{\infty} x p q^{x−1}   (2.12)
     = \sum_{x=1}^{\infty} (x − 1 + 1) p q^{x−1}
     = \sum_{x=1}^{\infty} (x − 1) p q^{x−1} + \sum_{x=1}^{\infty} p q^{x−1}.

Note that the last term sums up all the probabilities of a geometric distribution and therefore equals 1. Substituting y = x − 1 in the first sum, we get

E(X) = \sum_{y=0}^{\infty} y p q^y + 1
     = q \sum_{y=1}^{\infty} y p q^{y−1} + 1


Notice that the remaining sum is the definition of the expectation of a geometric distribution given in (2.12). Therefore

E(X) = q E(X) + 1,  i.e.  (1 − q) E(X) = 1.

Hence

E(X) = 1/(1 − q) = 1/p   (2.13)

In order to determine V(X), we first compute E(X^2). That is,

E(X^2) = \sum_{x=1}^{\infty} x^2 p q^{x−1}
       = \sum_{x=1}^{\infty} [(x − 1) + 1]^2 p q^{x−1}
       = \sum_{x=1}^{\infty} [(x − 1)^2 + 2(x − 1) + 1] p q^{x−1}
       = \sum_{x=1}^{\infty} (x − 1)^2 p q^{x−1} + 2 \sum_{x=1}^{\infty} (x − 1) p q^{x−1} + \sum_{x=1}^{\infty} p q^{x−1}
       = \sum_{y=0}^{\infty} y^2 p q^y + 2 \sum_{y=0}^{\infty} y p q^y + 1
       = q \sum_{y=1}^{\infty} y^2 p q^{y−1} + 2q \sum_{y=1}^{\infty} y p q^{y−1} + 1
       = q E(X^2) + 2q E(X) + 1.

Hence

(1 − q) E(X^2) = 2q/p + 1,

so that

E(X^2) = (2q + p)/p · (1/p)
       = (q + (q + p))/p^2 = (1 + q)/p^2   (2.14)


Hence

V(X) = E(X^2) − [E(X)]^2
     = (1 + q)/p^2 − 1/p^2 = q/p^2   (2.15)
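As a small empirical illustration of (2.10) and (2.11) (an added sketch, with p = 0.3 chosen arbitrarily), one can simulate geometric waiting times in R; note that rgeom returns the number of failures, so 1 is added to convert to the number of trials.

# Sketch: empirical check of E(X) = 1/p and V(X) = (1 - p)/p^2 for X ~ Geo(p)
set.seed(2)
p <- 0.3
x <- rgeom(1e5, prob = p) + 1    # rgeom counts failures; + 1 gives the number of trials
mean(x)                          # should be close to 1/p = 3.33
var(x)                           # should be close to (1 - p)/p^2 = 7.78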

2.4 Poisson Distribution

Many phenomena in physics follow the Poisson probability law, named in honour of the French mathematician Siméon Poisson (1781 – 1840). The classic example is the decay of radioactive nuclei. The following are other examples where the Poisson distribution can be used:

– In management science, the number of demands for service in a given period. (e.g. on

tellers in a bank, the runways of an airport, the stock pile of a factory)

– Occurrences of accidents, errors, breakdowns and other calamities – the number that occurs

within a specified time period has a Poisson distribution under certain conditions.

Broadly, the condition for a Poisson process is that the events occur at random in time. Loosely, this means that an event is equally likely to occur at any instant in time. That is, we only know the average rate of occurrence, but the exact timing of an event is random. The number of occurrences of an event during a fixed time period is modeled by a Poisson distribution.

Definition 2.4.1. Suppose we are given a period of time during which events occur at random. Let λ be the average number of events occurring per such time period. Let the random variable X be the number of events occurring during the time period. Then X has a Poisson distribution with parameter λ, i.e. X ∼ Poi(λ), and has probability mass function

p(x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, . . . ;  and p(x) = 0 otherwise.   (2.16)


Example 2.4.1. Suppose you are working as a logistics manager for Unitrans and have observed

that on average, the company experiences 12 break-downs per 5-day working week. You have a

policy of keeping two trucks on standby. What is the probability that on any day

(a) no standby trucks are needed?

(b) the number of standby trucks is inadequate?

Solution 2.4.1. Let X be a random variable representing the number of break-downs in a given

day. It is reasonable to assume that the break-downs occur at random and that the Poisson

distribution is a realistic model. Since we are interested in break-downs per day, we also need

to convert the weekly average to a daily rate. That is, 12 break-downs per 5 days is equivalent to 12/5 = 2.4 break-downs per day. Hence we assume that X ∼ Poi(λ = 2.4), i.e.

P(X = x) = e^{−2.4} 2.4^x / x!,   x = 0, 1, 2, . . .

(a) No break-downs in a day means X = 0. Thus,

P(X = 0) = e^{−2.4} 2.4^0 / 0! = 0.091.

(b) If the number of trucks on standby is inadequate, then it means that X > 2 and

P(X > 2) = 1 − P(X ≤ 2)
         = 1 − [p(0) + p(1) + p(2)]
         = 1 − [e^{−2.4} 2.4^0/0! + e^{−2.4} 2.4^1/1! + e^{−2.4} 2.4^2/2!]
         = 0.430

This implies that on only 9% of days the company will not use any standby trucks, but on 43% of days it runs out of standby trucks. If I were you, I would have a talk with the COO of the company! But of course this is only hypothetical.
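The two answers can also be obtained with the Poisson functions in R (shown here as a check, with λ = 2.4 as derived above):

# Sketch: Example 2.4.1 probabilities for X ~ Poi(2.4)
dpois(0, lambda = 2.4)                       # (a) P(X = 0) ≈ 0.091
ppois(2, lambda = 2.4, lower.tail = FALSE)   # (b) P(X > 2) ≈ 0.430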

Example 2.4.2. Beer cans are randomly tossed alongside the A10 highway, with an average frequency of 3.2 per km.

(a) What is the probability of seeing no beer cans over a 5km stretch?

(b) What is the probability of seeing at least one beer can in 200m?
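No worked solution is given here, but the key step is rescaling the rate to the length of road considered (assuming cans appear at random so a Poisson model applies). A sketch of the corresponding R calculations:

# Sketch: rescale the Poisson rate to the stretch of road in question
dpois(0, lambda = 3.2 * 5)                          # (a) P(no cans over 5 km); rate = 16 per 5 km
ppois(0, lambda = 3.2 * 0.2, lower.tail = FALSE)    # (b) P(at least one can in 200 m); rate = 0.64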

Expectation and Variance of a Poisson Random Variable

Let X ∼ Poi(λ). Then the expected value of X is given by

E(X) = λ (2.17)

and

V(X) = λ (2.18)

Proof. We are going to start with the proof of E(X) and make use of the Taylor expansion of e^x, given by e^x = 1 + x + x^2/2! + x^3/3! + . . . We have
2! 3!


E(X) = \sum_{x=0}^{\infty} x p(x)
     = \sum_{x=0}^{\infty} x e^{−λ} λ^x / x!
     = e^{−λ} \sum_{x=1}^{\infty} x · λ · λ^{x−1} / (x(x − 1)!)
     = λ e^{−λ} \sum_{x=1}^{\infty} λ^{x−1} / (x − 1)!
x=1


Note that we can change the variable from x to y by letting y = x − 1. If we do that, the values of y will range from 0 to ∞. Then we can re-write E(X) as:

E(X) = λ e^{−λ} \sum_{y=0}^{\infty} λ^y / y!
     = λ e^{−λ} e^{λ}
     = λ.

We now turn our efforts to showing that V(X) = λ. It should be noted that E(X^2) = E[X(X − 1)] + E(X), and

E[X(X − 1)] = \sum_{x=0}^{\infty} x(x − 1) p(x)
            = \sum_{x=0}^{\infty} x(x − 1) e^{−λ} λ^x / x!

We note that x! = x(x − 1)(x − 2)! and also that λ^x = λ^2 λ^{x−2}. Substituting these back, we get

E[X(X − 1)] = \sum_{x=2}^{\infty} x(x − 1) e^{−λ} λ^2 λ^{x−2} / (x(x − 1)(x − 2)!)
            = λ^2 \sum_{x=2}^{\infty} e^{−λ} λ^{x−2} / (x − 2)!

If we let y = x − 2, then y will range from 0 to ∞ and we can re-write the expectation as

E[X(X − 1)] = λ^2 \sum_{y=0}^{\infty} e^{−λ} λ^y / y!

Note that the sum is with respect to the Poisson distribution of Y. It sums to one and thus E[X(X − 1)] = λ^2.


Hence

V(X) = E[X(X − 1)] + E(X) − [E(X)]^2
     = λ^2 + λ − λ^2
     = λ.

Approximation of a binomial distribution with a Poisson distribution

The Poisson random variable has a tremendous range of applications, and one of these is the approximation of X ∼ Bin(n, p) when n is large and p is relatively small, so that np is of moderate size. In general, the Poisson approximation works well if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and p ≤ 0.10.

Let X ∼ Bin(n, p) and λ = np. Then we have that

P(X = x) = \binom{n}{x} p^x q^{n−x};  q = 1 − p
         = [n! / ((n − x)! x!)] (λ/n)^x (1 − λ/n)^{n−x}
         = [n(n − 1)(n − 2) · . . . · (n − (x − 1)) / n^x] (λ^x / x!) (1 − λ/n)^n (1 − λ/n)^{−x}
         = [(n/n) · ((n − 1)/n) · . . . · ((n − (x − 1))/n)] (λ^x / x!) (1 − λ/n)^n (1 − λ/n)^{−x}
         = (1 − 1/n)(1 − 2/n) · . . . · (1 − (x − 1)/n) (λ^x / x!) (1 − λ/n)^n (1 − λ/n)^{−x}.

Now let us consider what happens to the above equation as n → ∞. That is,

lim_{n→∞} P(X = x) = lim_{n→∞} (1 − 1/n)(1 − 2/n) · . . . · (1 − (x − 1)/n) (λ^x / x!) (1 − λ/n)^n (1 − λ/n)^{−x}.


Note that the factor λ^x / x! does not depend on n and can therefore be taken out of the limit. Also recall that the limit of a product of functions is the product of their limits. Therefore, we have

lim_{n→∞} P(X = x) = (λ^x / x!) lim_{n→∞}(1 − 1/n) · . . . · lim_{n→∞}(1 − (x − 1)/n) · lim_{n→∞}(1 − λ/n)^{−x} · lim_{n→∞}(1 − λ/n)^n
                   = (λ^x / x!) (1 − 0) · . . . · (1 − 0) (1 − 0)^{−x} lim_{n→∞}(1 − λ/n)^n
                   = (λ^x / x!) lim_{n→∞}(1 − λ/n)^n
                   = (λ^x / x!) e^{−λ}

Therefore, as n becomes very large, we have that

P(X = x) ≈ e^{−λ} λ^x / x!,

which is the p.m.f. of a Poisson distribution. It should be noted that the last equality arises from the result

e^{−λ} = lim_{n→∞} (1 − λ/n)^n,

the proof of which is beyond the scope of this course.

Example 2.4.3. Suppose 5% of Christmas tree light bulbs manufactured by a company are

defective. The company’s Quality Control manager is concerned and as a result, samples 100

bulbs coming off the assembly line. Let X denote the number of defective bulbs in the sample.

What is the probability that the sample contains at most 3 defective bulbs?

The solution will be discussed in class
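As an added numerical sketch, the exact binomial answer to this example can be compared with its Poisson approximation (λ = np = 5) in R:

# Sketch: P(X <= 3) for X ~ Bin(100, 0.05) and its Poisson approximation with lambda = 5
pbinom(3, size = 100, prob = 0.05)   # exact binomial probability ≈ 0.258
ppois(3, lambda = 5)                 # Poisson approximation ≈ 0.265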

2.5 Negative binomial distribution

Before we present the formal definition of a negative binomial random variable, let us consider the following example. Suppose a representative from the Botswana Football Association's marketing department has three tickets to give away for the next Zebras game in Gaborone. He randomly selects people at shopping malls around Gaborone until he finds three people who attended the last Zebras game, and rewards them with a match ticket for the next game.

Let p be the probability that he succeeds in finding such a person at each selection. Now, let X denote the number of people he selects until he finds the r people, say r = 3, who attended the last Zebras game. What is the probability that he will select a total of 10 people?

This question implies that the representative will stop as soon as the three tickets have been given away. That is, the 10th person selected will be given the last ticket. As a result, nine people were selected before him, and any two of them could have received the first two tickets. The number of ways this could happen is \binom{9}{2}. Since each selection is assumed to be independent, the probability is given by

P(X = 10) = P(2 tickets given out among the first 9 people) · P(10th person gets the last ticket)
          = \binom{9}{2} p^2 q^7 · p = \binom{9}{2} p^3 q^7

Note that here x = 10 and r = 3, so relating this to the expression above we get, in general,

P(X = x) = \binom{x − 1}{r − 1} p^r q^{x−r}

This is exactly the p.m.f. of a negative binomial random variable. The underlying scenario for the negative binomial is identical to that of the binomial distribution, but with a twist. For the binomial distribution, we fix n, the number of trials, and count the number of successes from these n trials. For the negative binomial distribution, we fix r, the number of successes, and count the number of trials until we have the rth success. Thus, the variable X is the number of trials required for obtaining r successes. Below is a more formal definition.

Definition 2.5.1. Suppose that independent trials, each having probability p, 0 < p < 1, of


being a success are performed until a total of r successes is accumulated. If we let X be the

number of trials required, then

 
P(X = x) = \binom{x − 1}{r − 1} p^r q^{x−r};  x = r, r + 1, r + 2, . . .   (2.19)

To show that the above function is a proper p.m.f., we are going to utilize the following result, referred to as the sum of a negative binomial series:

(1 − a)^{−r} = \sum_{k=0}^{\infty} \binom{k + r − 1}{r − 1} a^k   (2.20)

Now let us consider

\sum_{x=r}^{\infty} P(X = x) = \sum_{x=r}^{\infty} \binom{x − 1}{r − 1} p^r q^{x−r}
                             = p^r \sum_{x=r}^{\infty} \binom{x − 1}{r − 1} q^{x−r}.

Let y = x − r so that x = y + r. Then we have that

\sum_{x=r}^{\infty} P(X = x) = p^r \sum_{y=0}^{\infty} \binom{(y + r) − 1}{r − 1} q^{(y+r)−r}
                             = p^r \sum_{y=0}^{\infty} \binom{y + r − 1}{r − 1} q^y
                             = p^r (1 − q)^{−r} = 1.

Example 2.5.1. A medical researcher is recruiting 20 subjects for a study of the possible effects of one of the COVID-19 drugs. Suppose each person that she interviews has a 60% chance of being eligible to participate in the study. What is the probability that she will have to interview exactly 40 people?
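A sketch of this calculation in R (note that dnbinom is parameterised by the number of failures before the rth success, so the argument is 40 − 20 = 20 failures):

# Sketch: P(X = 40) for X ~ NB(r = 20, p = 0.6), where X = total interviews needed
dnbinom(40 - 20, size = 20, prob = 0.6)   # ≈ 0.028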

Example 2.5.2. Calculate the following probabilities.


(a) You toss a coin 4 times. The probability that you get (exactly) 2 heads.

(b) You toss a coin until you get 2 heads. The probability that it takes (exactly) 4 tosses.

(c) Which is larger? Explain why the answer makes intuitive sense.

Expectation and variance of negative binomial distribution

Let X ∼ NB(r, p). Then

E(X) = r/p   (2.21)

and

V(X) = r q/p^2   (2.22)

Proof. By definition, the expectation of X is given by

E(X) = \sum_{x=r}^{\infty} x p(x) = \sum_{x=r}^{\infty} x \binom{x − 1}{r − 1} p^r q^{x−r}.

Recall the boxed identity from the binomial section; the analogous identity here is x \binom{x − 1}{r − 1} = r \binom{x}{r}. Hence

E(X) = \sum_{x=r}^{\infty} r \binom{x}{r} p^r q^{x−r}
     = (r/p) \sum_{x=r}^{\infty} \binom{x}{r} p^{r+1} q^{x−r}.


Let y = x + 1 so that x = y − 1, and write s = r + 1. Then

E(X) = (r/p) \sum_{y=r+1}^{\infty} \binom{y − 1}{r} p^{r+1} q^{y−(r+1)}
     = (r/p) \sum_{y=s}^{\infty} \binom{y − 1}{s − 1} p^s q^{y−s}
     = r/p,

since the last sum is the total probability of a negative binomial distribution with parameters (s, p) and therefore equals 1.

We now need E(X^2) to find the variance of X. We utilize the fact that we can write E(X^2) = E[X(X + 1)] − E(X). Let us consider E[X(X + 1)]:

E[X(X + 1)] = \sum_{x=r}^{\infty} (x + 1) x p(x) = \sum_{x=r}^{\infty} (x + 1) x \binom{x − 1}{r − 1} p^r q^{x−r}
            = (r/p) \sum_{x=r}^{\infty} (x + 1) \binom{x}{r} p^{r+1} q^{x−r}
            = (r/p) \sum_{x=r}^{\infty} (r + 1) \binom{x + 1}{r + 1} (p^{r+2}/p) q^{x−r}
            = (r(r + 1)/p^2) \sum_{x=r}^{\infty} \binom{x + 1}{r + 1} p^{r+2} q^{x−r}.

Let y = x + 2 and s = r + 2 (so that x = y − 2 and r = s − 2). Then we have

E[X(X + 1)] = (r(r + 1)/p^2) \sum_{y=s}^{\infty} \binom{y − 1}{s − 1} p^s q^{y−s}
            = (r^2 + r)/p^2,

since the sum is again the total probability of a negative binomial distribution and equals 1.

Therefore

E(X^2) = E[X(X + 1)] − E(X)
       = (r^2 + r)/p^2 − r/p.


Hence

V(X) = E(X^2) − [E(X)]^2
     = (r^2 + r)/p^2 − r/p − r^2/p^2
     = r/p^2 − r/p = (r − rp)/p^2
     = r(1 − p)/p^2 = r q/p^2

2.6 Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution that calculates the likelihood that an event happens x times in n trials when you are sampling from a small population

without replacement. When you sample without replacement, the probabilities change with each

subsequent trial. For instance, when you draw an ace from a deck of cards, the probability de-

creases for drawing another ace on the next draw because the deck has fewer aces. Conversely,

the binomial distribution assumes the chances remain constant over the trials and thus assumes

the samples are drawn with replacement.

Let’s look at an example to bring it to life!

Example 2.6.1. Suppose a crate contains 50 light bulbs of which 5 are defective and 45 are

not. A quality control inspector randomly samples 4 bulbs without replacement. Let X be the

number of defective bulbs selected. Find the probability mass function, p(x), of the discrete

random variable X.

The solution of this example will be illustrated in class
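As an added sketch, the p.m.f. in this example can also be tabulated with dhyper in R, which takes the number of "successes" in the population (m = 5 defective), the number of "failures" (45 non-defective) and the sample size (4 draws):

# Sketch: p.m.f. of X = number of defective bulbs in a sample of 4 from 5 defective + 45 good
dhyper(0:4, m = 5, n = 45, k = 4)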

Definition 2.6.1. Suppose we randomly select n items without replacement from a set of N

items of which:

– m of the items are of one type; and


– N − m of the items are of a second type.

Then the probability mass function of the discrete random variable X is called the hypergeometric distribution and takes the form:

P(X = x) = \binom{m}{x} \binom{N − m}{n − x} / \binom{N}{n}   (2.23)

where the support S is the collection of nonnegative integers x that satisfy the inequalities x ≤ n, x ≤ m and n − x ≤ N − m.

The next step is to show that (2.23) is a proper probability mass function. However, we first need Vandermonde's identity, which is given as

\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k − r} = \binom{m + n}{k}   (2.24)

A detailed algebraic proof of this identity exists; it is however not examinable. Now let us show that the hypergeometric distribution is a proper p.m.f. That is, we need to show that \sum_{x=0}^{m} p(x) = 1.
x=0
Let us consider

\sum_{x=0}^{m} p(x) = \sum_{x=0}^{m} \binom{m}{x} \binom{N − m}{n − x} / \binom{N}{n}
                    = (1/\binom{N}{n}) \sum_{x=0}^{m} \binom{m}{x} \binom{N − m}{n − x}


Notice that the sum is exactly the Vandermonde sum in (2.24). Therefore

\sum_{x=0}^{m} p(x) = (1/\binom{N}{n}) \binom{m + (N − m)}{n}
                    = (1/\binom{N}{n}) \binom{N}{n}
                    = 1.

Expectation and Variance of a Hypergeometric Distribution



By definition, E(X) is given by

E(X) = \sum_{x=0}^{m} x p(x)
     = \sum_{x=0}^{m} x \binom{m}{x} \binom{N − m}{n − x} / \binom{N}{n}.

Recall that we once showed that x \binom{m}{x} = m \binom{m − 1}{x − 1} when we were discussing the expectation of the Binomial distribution. Similarly, we have n \binom{N}{n} = N \binom{N − 1}{n − 1}, so that \binom{N}{n} = (N/n) \binom{N − 1}{n − 1} and

E(X) = \sum_{x=1}^{m} m \binom{m − 1}{x − 1} \binom{N − m}{n − x} / [(N/n) \binom{N − 1}{n − 1}]
     = (nm/N) \sum_{x=1}^{m} \binom{m − 1}{x − 1} \binom{N − m}{n − x} / \binom{N − 1}{n − 1}.

Note that the sum represents the total of the probabilities of a hypergeometric distribution. We have just shown above that these add up to 1. Hence

E(X) = nm/N


We next consider the variance of the hypergeometric distribution, which by definition is given by V(X) = E(X^2) − [E(X)]^2. We however do not compute E(X^2) directly but use the following equation:

E(X^2) = E[X(X − 1)] + E(X).

Since we already have E(X), we compute E[X(X − 1)] as follows:

E[X(X − 1)] = \sum_{x=0}^{m} x(x − 1) \binom{m}{x} \binom{N − m}{n − x} / \binom{N}{n}.

Once again, we have shown that x(x − 1) \binom{m}{x} = m(m − 1) \binom{m − 2}{x − 2}; similarly \binom{N}{n} = [N(N − 1)/(n(n − 1))] \binom{N − 2}{n − 2}. Thus, we have

E[X(X − 1)] = (nm/N) · ((m − 1)(n − 1)/(N − 1)) \sum_{x=2}^{m} \binom{m − 2}{x − 2} \binom{N − m}{n − x} / \binom{N − 2}{n − 2}
            = (nm/N) · ((m − 1)(n − 1)/(N − 1)).

The last equality comes from the fact that the sum is the total of all probabilities of a hypergeometric distribution, which is 1. We therefore have that

E(X^2) = (nm/N) · ((m − 1)(n − 1)/(N − 1)) + nm/N,

so that

V(X) = (nm/N) · ((m − 1)(n − 1)/(N − 1)) + nm/N − (nm/N)^2.

After simple algebraic simplifications (refer to class discussions) we get that

 
V(X) = n (m/N) (1 − m/N) ((N − n)/(N − 1)).
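As an added numerical sanity check (using the bulb counts from Example 2.6.1: N = 50, m = 5, n = 4), the mean and variance formulas can be compared against values computed directly from dhyper in R.

# Sketch: check E(X) = nm/N and the variance formula for the hypergeometric distribution
N <- 50; m <- 5; n <- 4
x <- 0:n
px <- dhyper(x, m = m, n = N - m, k = n)
sum(x * px)                      # 0.4   = n*m/N
sum(x^2 * px) - sum(x * px)^2    # 0.338 = n*(m/N)*(1 - m/N)*(N - n)/(N - 1)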
