
SCHOOL OF MATHEMATICAL SCIENCES

MAT 1034 INTRODUCTION TO PROBABILITY

Chapter 2:

Discrete Random Variables and Their Probability Distributions

1. Random Variables

Suppose the following table gives the frequency and relative frequency distributions of the number
of vehicles owned by all 2000 families living in a small town.

Frequency and Relative Frequency Distributions of the Number of Vehicles Owned by Families
Number of Vehicles Owned Frequency Relative Frequency
0 30 30/2000 = 0.015
1 470 470/2000 = 0.235
2 850 850/2000 = 0.425
3 490 490/2000 = 0.245
4 160 160/2000 = 0.080
N = 2000 Sum = 1.000

Suppose one family is randomly selected from this population. The process of randomly selecting a
family is called a random or chance experiment. Let x denote the number of vehicles owned by the
selected family. Then x can assume any of the five possible values (0, 1, 2, 3, and 4) listed in the first
column of the above table. The value of x depends on the outcome of a random experiment.
Consequently, x is called a random variable or a chance variable. In general, random variables are
denoted by letters such as x and y.

Definition 1.1
A random variable is a variable whose value is determined by the outcome of a random experiment.

A random variable can be discrete or continuous.

Discrete Random Variable


A discrete random variable assumes values that can be counted. In other words, the consecutive
values of a discrete random variable are separated by a certain gap.

DEFINITION 1.2
A random variable that assumes countable values is called a discrete random variable.

In the table, the number of vehicles owned by a family is an example of a discrete random variable
because the values of the random variable x are countable: 0, 1, 2, 3, and 4.


The following are some examples of discrete random variables:

1. The number of cars sold at a dealership during a given month.
2. The number of houses built in a particular project.
3. The number of fish caught on a fishing trip.
4. The number of complaints received at the office of an airline on a given day.
5. The number of customers who visit a bank during any given hour.
6. The number of heads obtained in three tosses of a coin.

Continuous Random Variable


A random variable whose values are not countable is called a continuous random variable. A
continuous random variable can assume any value over an interval or intervals.

Definition 1.3
A random variable that can assume any value contained in one or more intervals is called a continuous
random variable.

Because the number of values contained in any interval is infinite, the possible number of values that
a continuous random variable can assume is also infinite. Moreover, we cannot count these values.
Consider the life of a battery. We can measure it as precisely as we want. For instance, the life of this
battery may be 40 hours, or 40.25 hours, or 40.247 hours. Assume that the maximum life of a battery
is 200 hours. Let x denote the life of a randomly selected battery of this kind. Then, x can assume any
value in the interval 0 to 200. Consequently, x is a continuous random variable. As shown in the
diagram, every point on the line representing the interval 0 to 200 gives a possible value of x.

The following are some examples of continuous random variables:


1. The height of a person.
2. The time taken to complete an examination.
3. The length of a room
4. The weight of a fish
5. The price of a house


This chapter is limited to a discussion of discrete random variables and their probability distributions.
Continuous random variables will be discussed in Chapter 3.

2. Probability Distribution of a Discrete Random Variable

Let x be a discrete random variable. The probability distribution of x describes how the probabilities
are distributed over all the possible values of x.

DEFINITION 2.1
The probability distribution of a discrete random variable lists all the possible values that the random
variable can assume and their corresponding probabilities. We will sometimes denote P( X = x) by
P( x) or f ( x) .

THEOREM 2.1
For any discrete probability distribution, the following must be true:
1. 0 ≤ P(x) ≤ 1 for all x.
2. ∑𝑥 𝑃(𝑥) = 1, where the summation is over all values of x with nonzero probability.

Note: For a random variable X of the discrete type, the probability P( X = x) is frequently denoted
by f ( x) , and this function f ( x) is called probability mass function (p.m.f.).

Example 2.1
Each of the following tables lists certain values of x and their probabilities. Determine whether or not
each table represents a valid probability distribution.
(a)
x P (x)
0 0.08
1 0.11
2 0.39
3 0.27

(b)
x P (x)
2 0.25
3 0.34
4 0.28
5 0.13

(c)
x P (x)
7 0.70
8 0.50
9 -0.20

Example 2.2
The following table lists the probability distribution of the number of breakdowns per week for a
machine based on past data.

Breakdowns per week 0 1 2 3


Probability 0.15 0.20 0.35 0.30

Find the probability that the number of breakdowns for this machine during a given week is
(i) exactly 2
(ii) 0 to 2
(iii) more than 1
(iv) at most 1
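Each part of Example 2.2 is just a sum of entries from the table; a quick Python sketch (the variable names are ours, not from the text):

```python
# pmf of weekly breakdowns, taken from the table above
pmf = {0: 0.15, 1: 0.20, 2: 0.35, 3: 0.30}

p_exactly_2 = pmf[2]                                     # (i)  exactly 2
p_0_to_2 = pmf[0] + pmf[1] + pmf[2]                      # (ii) 0 to 2
p_more_than_1 = sum(p for x, p in pmf.items() if x > 1)  # (iii) more than 1
p_at_most_1 = sum(p for x, p in pmf.items() if x <= 1)   # (iv)  at most 1
```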

DEFINITION 2.2
The cumulative distribution function (CDF), F(x) of a discrete random variable X with probability
distribution P(X) is

F(x) = P(X ≤ x) = Σ_{A ≤ x} P(A), for −∞ < x < ∞

It is also called the distribution function, or c.d.f.

Note: For an integer-valued discrete random variable, P(X = x) = f(x) = F(x) − F(x − 1).

Example 2.3
Consider the following probability distribution of X:

X 1 2 3 4
P(x) 0.4 0.3 0.2 0.1
Find the cumulative distribution of X.
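The CDF in Example 2.3 is a running total of the pmf; a short sketch (variable names are ours):

```python
# pmf from Example 2.3
pmf = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

# F(x) = P(X <= x): accumulate the probabilities in increasing order of x
cdf, total = {}, 0.0
for x in sorted(pmf):
    total += pmf[x]
    cdf[x] = total
```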


3. The Expected Value (Mean) and Standard Deviation of a Discrete Random Variable

The mean of a discrete random variable, denoted by μ, is actually the mean of its probability
distribution. The mean of a discrete random variable x is also called its expected value and is denoted
by E[x]. The mean (or expected value) of a discrete random variable is the value that we expect to
observe per repetition, on average, if we perform an experiment a large number of times. For example,
we may expect a car salesperson to sell, on average, 2.4 cars per week. This does not mean that every
week this salesperson will sell exactly 2.4 cars. This simply means that if we observe for many weeks,
this salesperson will sell a different number of cars during different weeks; however, the average for
all these weeks will be 2.4 cars per week.

To calculate the mean of a discrete random variable x, we multiply each value of x by the
corresponding probability and sum the resulting products. This sum gives the mean (or expected value)
of a discrete random variable x.

DEFINITION 3.1
Let X be a discrete random variable with the probability function p(x). Then the expected value of X,
E[X], is defined to be
E[X] = Σ_x x · p(x).


Example 3.1
The probability distribution for a random variable Y is given in Table below. Find the mean of Y.

Table: Probability distribution for Y


y p(y)
0 1/8
1 1/4
2 3/8
3 1/4

Example 3.2
Find E[X] where X is the outcome when we roll a fair die.
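The die example can be checked directly from Definition 3.1; a sketch using exact fractions to avoid rounding:

```python
from fractions import Fraction

# fair die: faces 1..6, each with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Definition 3.1: E[X] = sum over x of x * p(x)
mean = sum(x * p for x, p in pmf.items())  # 21/6 = 7/2 = 3.5
```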

Example 3.3
The distribution for the age of mathematics lecturers is summarized below. Calculate the mean age.

x = age 32 39 45 57 62
P(X = x) 0.16 0.42 0.10 0.28 0.04


Example 3.4
An insurance policy pays an individual 100 per day for up to 3 days of hospitalization and 25 per day
for each day of hospitalization thereafter. The number of days of hospitalization, X, is a discrete
random variable with probability function

P(X = k) = (6 − k)/15 for k = 1, 2, 3, 4, 5, and P(X = k) = 0 otherwise.

Calculate the expected payment for hospitalization under this policy.
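A sketch of the Example 3.4 calculation (the helper names `payment` and `pmf` are ours): the payment is 100k for k ≤ 3 and 300 + 25(k − 3) thereafter, weighted by P(X = k):

```python
# payment: 100 per day for the first 3 days, then 25 per day thereafter
def payment(k):
    return 100 * k if k <= 3 else 300 + 25 * (k - 3)

# pmf from the problem: P(X = k) = (6 - k)/15 for k = 1..5
pmf = {k: (6 - k) / 15 for k in range(1, 6)}

# expected payment: sum of payment(k) * P(X = k)
expected_payment = sum(payment(k) * p for k, p in pmf.items())  # 3200/15 ≈ 213.33
```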


THEOREM 3.1
Let X be a discrete random variable with probability function p(x) and g(X) be a real-valued function
of X. Then the expected value of g(X) is given by
E[g(X)] = Σ_{all x} g(x) p(x).

Example 3.5
If X is the outcome when we roll a fair die, find the expected value of g(X) = 2X² + 1.

Standard Deviation of a Discrete Random Variable


The standard deviation of a discrete random variable, denoted by σ, measures the spread of its
probability distribution. A higher value for the standard deviation of a discrete random variable
indicates that x can assume values over a large range about the mean. In contrast, a smaller value for
the standard deviation indicates that most of the values that x can assume are clustered closely about
the mean.
The standard deviation can be computed from the formula in Theorem 3.2.

THEOREM 3.2
Let X be a discrete random variable with probability function p(x) and mean E[X] = μ; then the
variance of a random variable X is defined to be the expected value of (X − μ)2. That is,
Var(X) = σ² = E[(X − μ)²] = E(X²) − μ².

The standard deviation of X is the positive square root of Var(X).

Example 3.6
Calculate the standard deviation of Y as shown in Example 3.1


Example 3.7
Calculate the standard deviation of X, where X represents the outcome when a fair die is rolled.
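Using the shortcut Var(X) = E(X²) − μ² from Theorem 3.2 on the fair-die example (variable names are ours):

```python
# fair die: outcomes 1..6, each with probability 1/6
xs = range(1, 7)
mean = sum(x / 6 for x in xs)          # E[X] = 3.5
mean_sq = sum(x * x / 6 for x in xs)   # E[X^2] = 91/6
variance = mean_sq - mean ** 2         # E[X^2] - mu^2, per Theorem 3.2
std_dev = variance ** 0.5              # positive square root
```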

4. The Binomial Probability Distribution

The binomial probability distribution is one of the most widely used discrete probability distributions.
It is applied to find the probability that an outcome will occur x times in n performances of an
experiment. For example, given that the probability is 0.05 that a DVD player manufactured at a firm
is defective, we may be interested in finding the probability that in a random sample of three DVD
players manufactured at this firm, exactly one will be defective. As a second example, we may be
interested in finding the probability that a baseball player with a batting average of 0.250 will have
no hits in 10 trips to the plate.

To apply the binomial probability distribution, the random variable x must be a discrete dichotomous
random variable. In other words, the variable must be a discrete random variable, and each repetition
of the experiment must result in one of two possible outcomes. The binomial distribution is applied
to experiments that satisfy the four conditions of a binomial experiment. (These conditions are
described in Definition 4.1). Each repetition of a binomial experiment is called a trial or a Bernoulli
trial. For example, if an experiment is defined as one toss of a coin and this experiment is repeated
10 times, then each repetition (toss) is called a trial. Consequently, there are 10 total trials for this
experiment.

DEFINITION 4.1
A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials. In other words, the given
experiment is repeated n times, where n is a positive integer. All these repetitions are performed under
identical conditions.
2. Each trial results in one of two outcomes: success, S, or failure, F.


3. The probability of success on a single trial is equal to some value p and remains the same from trial
to trial. The probability of a failure is equal to q = (1 − p).
4. The trials are independent. In other words, the outcome of one trial does not affect the outcome of
another trial.

Note that one of the two outcomes of a trial is called a success and the other a failure. Notice that a
success does not mean that the corresponding outcome is considered favorable or desirable. Similarly,
a failure does not necessarily refer to an unfavorable or undesirable outcome. Success and failure are
simply the names used to denote the two possible outcomes of a trial. The outcome to which the
question refers is usually called a success; the outcome to which it does not refer is called a failure.

Example 4.1
Consider the experiment consisting of 10 tosses of a coin. Determine whether or not it is a binomial
experiment.
Solution
The experiment consisting of 10 tosses of a coin satisfies all four conditions of a binomial experiment.
1. There are a total of 10 trials (tosses), and they are all identical. All 10 tosses are performed
under identical conditions. Here, n = 10.
2. Each trial (toss) has only two possible outcomes: a head and a tail. Let a head be called a
success and a tail be called a failure.
3. The probability of obtaining a head (a success) is 1/2 and that of a tail (a failure) is 1/2 for any
toss. That is,
p = P(H) = 1/2 and q = P(T) = 1/2
The sum of these two probabilities is 1.0. Also, these probabilities remain the same for each
toss.
4. The trials (tosses) are independent. The result of any preceding toss has no bearing on the
result of any succeeding toss.

Example 4.2
Five percent of all DVD players manufactured by a large electronics company are defective. Three
DVD players are randomly selected from the production line of this company. The selected DVD
players are inspected to determine whether each of them is defective or good. Is this experiment a
binomial experiment?
Solution
1. This example consists of three identical trials. A trial represents the selection of a DVD player.
2. Each trial has only two outcomes: a DVD player is defective or a DVD player is good. Let a
defective DVD player be called a success and a good DVD player be called a failure.


3. Five percent of all DVD players are defective. So, the probability p that a DVD player is
defective is 0.05. As a result, the probability q that a DVD player is good is 0.95. These two
probabilities add up to 1.
4. Each trial (DVD player) is independent. In other words, if one DVD player is defective, it
does not affect the outcome of another DVD player being defective or good. This is because
the size of the population is very large compared to the sample size.

The Binomial Formula


The random variable x that represents the number of successes in n trials for a binomial experiment
is called a binomial random variable. The probability distribution of x in such experiments is called
the binomial probability distribution or simply the binomial distribution. Thus, the binomial
probability distribution is applied to find the probability of x successes in n trials for a binomial
experiment. The number of successes x in such an experiment is a discrete random variable. Consider
Example 4.2. Let x be the number of defective DVD players in a sample of three. Because we can
obtain any number of defective DVD players from zero to three in a sample of three, x can assume
any of the values 0, 1, 2, and 3. Since the values of x are countable, it is a discrete random variable.

Theorem 4.1
For a binomial experiment, the probability of exactly x successes in n trials is given by the binomial
formula
P(X = x) = p(x) = C(n, x) p^x q^(n−x), x = 0, 1, 2, …, n and 0 ≤ p ≤ 1

Here C(n, x) is the number of outcomes with exactly x successes among n trials, and p^x q^(n−x) is the
probability of x successes among n trials for any one particular order, where

n = total number of trials


p = probability of success
q = 1-p = probability of failure
x = number of successes in n trials
n – x = number of failures in n trials

To solve a binomial problem, we determine the values of n, x, n-x, p, and q and then substitute these
values in the binomial formula.

To find the probability of x successes in n trials for a binomial experiment, the only values needed
are those of n and p. These are called the parameters of the binomial probability distribution or simply
the binomial parameters. The value of q is obtained by subtracting the value of p from 1.0. Thus, q =
1-p.


When X is a binomial random variable, we write X ~ B(n, p).

Example 4.3
Four fair coins are flipped. If the outcomes are assumed independent, what is the probability that two
heads are obtained?
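Theorem 4.1 translates directly into code; a sketch using Python's `math.comb` (the function name `binom_pmf` is ours), applied to Example 4.3:

```python
from math import comb

# Theorem 4.1 as a function
def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 4.3: four fair coins, exactly two heads
p_two_heads = binom_pmf(2, 4, 0.5)  # 6/16 = 0.375
```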

Example 4.4
Five percent of all DVD players manufactured by a large electronics company are defective. A quality
control inspector randomly selects three DVD players from the production line. What is the
probability that exactly one of these three DVD players is defective?

Example 4.5
It is known from past data that 2% of packages mailed through Express House Delivery Service do
not arrive at their destinations within the specified time. Suppose a corporation mails 10 packages
through Express House Delivery Service on a certain day. Find the probability that at most one of
these 10 packages will not arrive at its destination within the specified time.


Example 4.6
A small commuter plane has 30 seats. The probability that any particular passenger will not show up
for a flight is 0.10, independent of other passengers. The airline sells 32 tickets for the flight. Calculate
the probability that more passengers show up for the flight than the seats available.

Binomial probability histograms

For any number of trials n:


(i) The binomial probability distribution is symmetric if p = 0.5.
(ii) The binomial probability distribution is skewed to the right if p is less than 0.5.
(iii) The binomial probability distribution is skewed to the left if p is greater than 0.5.


THEOREM 4.2
Let X be a binomial random variable based on n trials and success probability p. Then

μ = E[X] = np and σ² = Var(X) = npq.

Example 4.7
According to a report, 56% of teens volunteer time for charitable causes. Assume that this result is
true for the current population of teens. A sample of 60 teens is selected. Let x be the number of teens
in this sample who volunteer time for charitable causes. Find the mean and standard deviation of the
probability distribution of x.
Solution
This is a binomial experiment with a total of 60 trials. Each trial has two outcomes: (1) the selected
teen volunteers time for charitable causes or (2) the selected teen does not volunteer time for
charitable causes. The probabilities p and q for these two outcomes are 0.56 and 0.44, respectively.
Thus,
n = 60, p = 0.56, and q = 0.44
Using the formulas for the mean and standard deviation of the binomial distribution, we obtain
μ = np = 60(0.56) = 33.60
σ = √(npq) = √((60)(0.56)(0.44)) = 3.845

5. Negative Binomial and Geometric Probability Distribution


Let us consider an experiment where the properties are the same as those listed for a binomial
experiment, with the exception that the trials will be repeated until a fixed number of successes occur.
Therefore, instead of the probability of x successes in n trials, where n is fixed, we are now interested
in the probability that the kth success occurs on the xth trial. Experiments of this kind are called
negative binomial experiments.
As an example, consider the use of a drug that is known to be effective in 60% of the cases where it
is used. The drug will be considered a success if it is effective in bringing some degree of relief to the
patient. We are interested in finding the probability that the fifth patient to experience relief is the
seventh patient to receive the drug during a given week. Designating a success by S and a failure by
F, a possible order of achieving the desired result is SFSSSFS, which occurs with probability
(0.6)(0.4)(0.6)(0.6)(0.6)(0.4)(0.6) = (0.6)^5 (0.4)^2

We could list all possible orders by rearranging the F’s and S’s except for the last outcome, which
must be the fifth success. The total number of possible orders is equal to the number of partitions of
the first six trials into two groups with 2 failures assigned to the one group and 4 successes assigned
6
to the other group. This can be done in   = 15 mutually exclusive ways. Hence, if X represents the
 4
outcome on which the fifth success occurs, then
6
P ( X = 7 ) =   ( 0.6 ) ( 0.4 ) = 0.1866
5 2

 4


What is the Negative Binomial Random Variable?


The number X of trials required to produce k successes in a negative binomial experiment is called a
negative binomial random variable, and its probability distribution is called the negative binomial
distribution. Since its probabilities depend on the number of successes desired and the probability of
a success on a given trial, we shall denote them by b*(x; k, p). To obtain the general formula for
b*(x; k, p), consider the probability of a success on the xth trial preceded by k − 1 successes and
x − k failures in some specified order. Since the trials are independent, we can multiply all the
probabilities corresponding to each desired outcome. Each success occurs with probability p and each
failure with probability q = 1 − p. Therefore, the probability for the specified order ending in success
is
p^(k−1) q^(x−k) · p = p^k q^(x−k)
The total number of sample points in the experiment ending in a success, after the occurrence of
k – 1 successes and x – k failures in any order, is equal to the number of partitions of x – 1 trials into
two groups with k – 1 successes corresponding to one group and x – k failures corresponding to the
other group. This number is given by the term C(x − 1, k − 1), each such order being mutually exclusive
and occurring with equal probability p^k q^(x−k). We obtain the general formula by multiplying
p^k q^(x−k) by C(x − 1, k − 1).

Theorem 5.1
If repeated independent trials can result in a success with probability p and a failure with probability
q = 1-p, then the probability distribution of the random variable X, the number of the trial on which
the kth success occurs, is
 x − 1 k x −k
b * ( x; k , p ) =   p q , x = k , k + 1, k + 2,...
 k − 1

Example 5.1
In an NBA championship series, the team that wins four games out of seven is the winner. Suppose
that teams A and B face each other in the championship games and that team A has probability 0.55
of winning a game over team B.
(a) What is the probability that team A will win the series in 6 games?


(b) What is the probability that team A will win the series?

(c) If teams A and B were facing each other in a regional playoff series, which is decided by
winning three out of five games, what is the probability that team A would win the series?
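Parts (a) and (b) follow from Theorem 5.1; a sketch (the helper name `neg_binom_pmf` is ours). For (b), team A wins the series if its 4th win arrives in game 4, 5, 6, or 7:

```python
from math import comb

# Theorem 5.1 as a function
def neg_binom_pmf(x, k, p):
    """P(kth success occurs on trial x) = C(x-1, k-1) p^k q^(x-k)."""
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

# (a) team A (p = 0.55) takes its 4th win in game 6
p_win_in_6 = neg_binom_pmf(6, 4, 0.55)

# (b) team A wins the series: 4th win comes in game 4, 5, 6, or 7
p_win_series = sum(neg_binom_pmf(x, 4, 0.55) for x in range(4, 8))
```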

THEOREM 5.2
If we define the mean of the negative binomial distribution as the average number of trials required
to produce k successes, then the mean is equal to

μ = E[X] = k/p

where μ is the mean number of trials, k is the number of successes, and p is the probability of
a success on any given trial.
The variance of a negative binomial random variable X is

σ² = Var(X) = k(1 − p)/p².


Example 5.2
Determine the average and the variance for the number of times a person flips a fair coin until he/she
gets the third head.

Geometric Distribution
If we consider the special case of the negative binomial distribution where k = 1, we have a probability
distribution for the number of trials required for a single success. An example would be the tossing
of a coin until a head occurs. We might be interested in the probability that the first head occurs on
the fourth toss. The negative binomial distribution reduces to the form
b*(x; 1, p) = p q^(x−1), x = 1, 2, 3, …
Since the successive terms constitute a geometric progression, it is customary to refer to this special
case as the geometric distribution and denote its values by g(x; p).

Theorem 5.3
If repeated independent trials can result in a success with probability p and a failure with probability
q = 1- p, then the probability distribution of the random variable X, the number of trials until the first
success occurs, is
g(x; p) = p q^(x−1), x = 1, 2, 3, …

Example 5.3
In a certain manufacturing process, it is known that, on the average, 1 in every 100 items is defective.
What is the probability that the fifth item inspected is the first defective item found?
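Example 5.3 plugs p = 0.01 and x = 5 into Theorem 5.3; a sketch (the name `geom_pmf` is ours):

```python
# Theorem 5.3: probability the first success occurs on trial x
def geom_pmf(x, p):
    return p * (1 - p) ** (x - 1)

# Example 5.3: p = 1/100; first defective found on the 5th item inspected
p_fifth = geom_pmf(5, 0.01)  # 0.01 * 0.99**4
```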


Example 5.4
If the probability is 0.75 that an applicant for a driver’s license will pass the road test on any given
attempt, what is the probability that an applicant will finally pass the test on the fourth attempt?

THEOREM 5.4
If X is a random variable with a geometric distribution,
μ = E[X] = 1/p and σ² = Var(X) = (1 − p)/p²

Example 5.5
By referring to Example 5.3, determine the mean and standard deviation for the number of items
inspected until the first defective item is found.

Example 5.6
By referring to Example 5.4, determine the mean and standard deviation for the number of attempts
until the applicant passes the test.


Example 5.7
You are given a geometric random variable X with the property that P(X = 3) = 5 P(X = 6). Find E(X).
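A sketch of Example 5.7: the condition p q² = 5 p q⁵ forces q³ = 1/5, and Theorem 5.4 then gives the mean (numeric check below; variable names are ours):

```python
# P(X = 3) = 5 P(X = 6) means p*q**2 = 5*p*q**5, so q**3 = 1/5
q = 0.2 ** (1 / 3)
p = 1 - q
mean = 1 / p  # Theorem 5.4: E[X] = 1/p

# numeric check of the defining condition
geom_pmf = lambda x: p * q ** (x - 1)
```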

6. The Hypergeometric Probability Distribution


In Section 4, we learned that one of the conditions required to apply the binomial probability
distribution is that the trials are independent, so that the probabilities of the two outcomes (success
and failure) remain constant. If the trials are not independent, we cannot apply the binomial
probability distribution to find the probability of x successes in n trials. In such cases we replace the
binomial by the hypergeometric probability distribution. Such a case occurs when a sample is drawn
without replacement from a finite population.
As an example, suppose 20% of all parts manufactured at a company are defective. Four auto parts
are selected at random. What is the probability that three of these four parts are good? Note that we
are to find the probability that three of the four auto parts are good and one is defective. In this case,
the population is very large and the probability of the first, second, third, and fourth auto parts being
defective remains the same at 0.20. Similarly, the probability of any of the parts being good remains
unchanged at 0.80. Consequently, we will apply the binomial probability distribution to find the
probability of three good parts in four.
Now suppose this company shipped 25 auto parts to a dealer. Later, it finds out that 5 of those parts
were defective. By the time the company manager contacts the dealer, 4 auto parts from that shipment
have already been sold. What is the probability that 3 of those 4 parts were good parts and 1 was
defective? Here, because the 4 parts were selected without replacement from a small population, the
probability of a part being good changes from the first selection to the second selection, to the third
selection, and to the fourth selection. In this case we cannot apply the binomial probability distribution.
In such instances, we use the hypergeometric probability distribution to find the required probability.

Consider an experiment with the following properties:


✓ A random sample of size n is selected without replacement from N items.
✓ r of the N items are classified as ‘successes’ and N − r are classified as ‘failures’.
✓ Define random variable X as the number of successes


→ X follows a hypergeometric distribution.

Theorem 6.1
Let
N = total number of elements in the population
r = number of successes in the population
N – r = number of failures in the population
n = number of trials (sample size)
x = number of successes in n trials
n – x = number of failures in n trials
The probability of x successes in n trials is given by
 r  N − r 
  
x n−x 
P ( x ) =  
N
 
n
where x is an integer 0, 1, 2, . . . , n, subject to the restrictions x ≤ r and n − x ≤ N − r .

Note:
i. The total number of ways of selecting samples of size n from N items, irrespective of
whether they are successes or failures, is C(N, n)
→ these samples are assumed to be equally likely.
ii. There are C(r, x) ways of selecting x ‘successes’ from the r that are available, and for each of
these ways we can choose the n − x ‘failures’ in C(N − r, n − x) ways.
→ the total number of samples that contain x ‘successes’ is C(r, x) C(N − r, n − x).
iii. Hence, P(X = x) = C(r, x) C(N − r, n − x) / C(N, n).

1. The types of applications of the hypergeometric are very similar to those of the binomial
distribution. Both compute the probabilities for the number of observations that fall into a
particular category.
2. The difference between the binomial distribution and the hypergeometric distribution are:
→in binomial distribution, independence among trials is required. The sampling must be done
with replacement.
→In hypergeometric distribution, the trials are not independent. Sampling without
replacement is done.
3. Applications for the hypergeometric distribution are found in many areas, with heavy use in
acceptance sampling, electronic testing, and quality assurance. Obviously, in many of these
fields testing is done at the expense of the item being tested. That is, the item is destroyed and
hence cannot be replaced in the sample.


Example 6.1
Brown Manufacturing makes auto parts that are sold to auto dealers. Last week the company shipped
25 auto parts to a dealer. Later, it found out that 5 of those parts were defective. By the time the
company manager contacted the dealer, 4 auto parts from that shipment had already been sold. What
is the probability that 3 of those 4 parts were good parts and 1 was defective?
Solution

N = total number of elements (auto parts) in the population = 25


r = number of successes (good parts) in the population = 20
N – r = number of failures (defective parts) in the population = 5
n = number of trials (sample size) = 4
x = number of successes in four trials = 3
n – x = number of failures in four trials = 1

Using the hypergeometric formula, we calculate the required probability as follows


 r  N − r   20  5 
     
 x  n − x   3  1  (1140 )( 5 )
P ( x = 3) = = = = 0.4506
N  25  12, 650
   
n 4

Thus, the probability that 3 out of 4 parts sold are good and 1 is defective is 0.4506.
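The same calculation with Python's `math.comb` (the helper name `hypergeom_pmf` is ours):

```python
from math import comb

# Theorem 6.1 as a function
def hypergeom_pmf(x, N, r, n):
    """P(X = x) = C(r, x) * C(N - r, n - x) / C(N, n)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Example 6.1: N = 25 parts, r = 20 good, sample of n = 4, exactly x = 3 good
p_three_good = hypergeom_pmf(3, 25, 20, 4)  # 5700/12650
```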

Example 6.2
Dawn Corporation has 12 employees who hold managerial positions. Of them, 7 are female and 5 are
male. The company is planning to send 3 of these 12 managers to a conference. If 3 managers are
randomly selected out of 12,
(a) find the probability that all 3 of them are female
(b) find the probability that at most 1 of them is a female
Solution
Let the selection of a female be called a success and the selection of a male be called a failure
(a) N = total number of managers in the population = 12
r = number of successes (females) in the population = 7
N – r = number of failures (males) in the population = 5
n = number of selections (sample size) = 3
x = number of successes (females) in three selections = 3
n – x = number of failures (males) in three selections = 0

Using the hypergeometric formula, we calculate the required probability as follows

P(X = 3) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{\binom{7}{3}\binom{5}{0}}{\binom{12}{3}} = \frac{(35)(1)}{220} = 0.1591

21
SCHOOL OF MATHEMATICAL SCIENCES

Thus, the probability that all 3 of the managers selected are female is 0.1591.

(b) The probability that at most 1 of the 3 managers selected is a female is

P(X \leq 1) = P(X = 0) + P(X = 1) = \frac{\binom{7}{0}\binom{5}{3}}{\binom{12}{3}} + \frac{\binom{7}{1}\binom{5}{2}}{\binom{12}{3}} = 0.0455 + 0.3182 = 0.3637
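Both parts of Example 6.2 can be verified with the same hypergeometric formula. A minimal sketch (`hypergeom_pmf` is our own helper, not a library call):

```python
from math import comb

# P(X = x) for the hypergeometric distribution, in the notation of Example 6.2
def hypergeom_pmf(x, N, r, n):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# (a) all 3 selected managers are female
p_a = hypergeom_pmf(3, N=12, r=7, n=3)
# (b) at most 1 female: P(X = 0) + P(X = 1)
p_b = hypergeom_pmf(0, N=12, r=7, n=3) + hypergeom_pmf(1, N=12, r=7, n=3)
print(round(p_a, 4), round(p_b, 4))  # → 0.1591 0.3636
```

The code gives 0.3636 for part (b); the text's 0.3637 comes from rounding the two terms to four decimals before adding them.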

Example 6.3
A particular part that is used as an injection device is sold in lots of 10. The producer deems a lot
acceptable if no more than one defective is in the lot. A sampling plan involves random sampling and
testing 3 of the parts out of 10. If none of the 3 is defective, the lot is accepted. Comment on the utility
of this plan.
Solution
Let us assume that the lot is truly unacceptable (i.e. that 2 out of 10 parts are defective). The
probability that the sampling plan finds the lot acceptable is
 2  8 
  
P ( X = 0 ) =    = 0.467
0 3
10 
 
3
Thus, if the lot is truly unacceptable, with 2 defective parts, this sampling plan will allow acceptance
roughly 47% of the time. As a result, this plan should be considered faulty.
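One way to see why the plan is weak is to compute its acceptance probability as a function of the true number of defectives in the lot. The sketch below (our own framing, not from the text) tabulates P(X = 0 defectives in the sample of 3) for lots containing d defectives:

```python
from math import comb

# Acceptance probability of the Example 6.3 plan: a lot of 10 is accepted
# when none of the 3 sampled parts is defective.  With d defectives in the
# lot, P(X = 0) = C(10 - d, 3) / C(10, 3).
def p_accept(d, lot=10, sample=3):
    return comb(lot - d, sample) / comb(lot, sample)

for d in range(4):
    print(d, round(p_accept(d), 3))
# → 0 1.0
#   1 0.7
#   2 0.467
#   3 0.292
```

The d = 2 row reproduces the 0.467 acceptance probability from the text; even a lot with 3 defectives is still accepted almost 30% of the time.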

THEOREM 6.2
If X is a random variable with a hypergeometric distribution,

\mu = E[X] = \frac{nr}{N} \qquad \text{and} \qquad \sigma^2 = \mathrm{Var}(X) = \left(\frac{N-n}{N-1}\right) n\,\frac{r}{N}\left(1 - \frac{r}{N}\right)


Example 6.4
By referring to Example 6.1, determine the mean and variance for the number of good auto parts in
the sample of 4 auto parts which was sold.
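Example 6.4 is left as an exercise in the text; a sketch of the computation, applying Theorem 6.2 with the values from Example 6.1 (N = 25, r = 20 good parts, n = 4):

```python
# Theorem 6.2 for the hypergeometric distribution, with Example 6.1's values.
N, r, n = 25, 20, 4
mean = n * r / N                                          # mu = nr/N
var = ((N - n) / (N - 1)) * n * (r / N) * (1 - r / N)     # sigma^2
print(mean, round(var, 4))  # → 3.2 0.56
```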

7. The Poisson Probability Distribution

The Poisson probability distribution is another important probability distribution of a discrete random
variable that has a large number of applications. Suppose a washing machine in a laundromat breaks
down an average of three times a month. We may want to find the probability of exactly two
breakdowns during the next month. This is an example of a Poisson probability distribution problem.
Each breakdown is called an occurrence in Poisson probability distribution terminology. The Poisson
probability distribution is applied to experiments with random and independent occurrences. The
occurrences are random in the sense that they do not follow any pattern, and, hence, they are
unpredictable. Independence of occurrences means that one occurrence (or non-occurrence) of an
event does not influence the successive occurrences or non-occurrences of that event. The
occurrences are always considered with respect to an interval. In the example of the washing machine,
the interval is one month. The interval may be a time interval, a space interval, or a volume interval.
The actual number of occurrences within an interval is random and independent. If the average
number of occurrences for a given interval is known, then by using the Poisson probability
distribution, we can compute the probability of a certain number of occurrences, x, in that interval.
Note that the number of occurrences in an interval is denoted by x.

Theorem 7.1
Conditions to Apply the Poisson Probability Distribution
The following three conditions must be satisfied to apply the Poisson probability distribution.
1. x is a discrete random variable
2. The occurrences are random
3. The occurrences are independent


The following are three examples of discrete random variables for which the occurrences are random
and independent. Hence, these are examples to which the Poisson probability distribution can be
applied.
1. Consider the number of telemarketing phone calls received by a household during a given day. In
this example, the receiving of a telemarketing phone call by a household is called an occurrence, the
interval is one day (an interval of time), and the occurrences are random (that is, there is no specified
time for such a phone call to come in) and discrete. The total number of telemarketing phone calls
received by a household during a given day may be 0, 1, 2, 3, 4, and so forth. The independence of
occurrences in this example means that telemarketing phone calls are received individually and no
two (or more) of these phone calls are related.
2. Consider the number of defective items in the next 100 items manufactured on a machine. In this
case, the interval is a volume interval (100 items). The occurrences (number of defective items) are
random and discrete because there may be 0, 1, 2, …, 100 defective items in 100 items. We can
assume the occurrence of defective items to be independent of one another.
3. Consider the number of defects in a 5-foot-long iron rod. The interval, in this example, is a space
interval (5 feet). The occurrences (defects) are random because there may be any number of defects
in a 5-foot iron rod. We can assume that these defects are independent of one another.

The following examples also qualify for the application of the Poisson probability distribution.
1. The number of accidents that occur on a given highway during a 1-week period.
2. The number of customers entering a grocery store during a 1-hour interval
3. The number of television sets sold at a department store during a given week

In contrast, consider the arrival of patients at a physician’s office. These arrivals are non-random if
the patients have to make appointments to see the doctor. The arrival of commercial airplanes at an
airport is non-random because all planes are scheduled to arrive at certain times, and airport
authorities know the exact number of arrivals for any period (although this number may change
slightly because of late or early arrivals and cancellations). The Poisson probability distribution
cannot be applied to these examples.

Theorem 7.2
According to the Poisson probability distribution, the probability of x occurrences in an interval is

P(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots;\ \lambda > 0
where λ is the mean number of occurrences in that interval and the value of e is approximately
2.71828.

Note: The Poisson distribution is denoted by Poisson (λ) or P(λ).


Example 7.1
A washing machine in a laundry breaks down an average of three times per month. Find the
probability that during the next month this machine will have:
a) Exactly two breakdowns
b) At most one breakdown
c) At least one breakdown
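Example 7.1 is left unsolved in the text; a sketch using the Poisson pmf of Theorem 7.2 with λ = 3:

```python
from math import exp, factorial

# Poisson pmf: P(X = x) = lam^x * e^(-lam) / x!
def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

lam = 3  # mean number of breakdowns per month
p_exactly_2 = poisson_pmf(2, lam)                      # (a) P(X = 2)
p_at_most_1 = poisson_pmf(0, lam) + poisson_pmf(1, lam)  # (b) P(X <= 1)
p_at_least_1 = 1 - poisson_pmf(0, lam)                 # (c) P(X >= 1)
print(round(p_exactly_2, 4), round(p_at_most_1, 4), round(p_at_least_1, 4))
# → 0.224 0.1991 0.9502
```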

Example 7.2
An insurance company has noted that, on average, ten insurance claims are received during a week
(consisting of 5 working days). Using an appropriate probability distribution, determine the
probability that
(i) less than 3 insurance claims are received in a week.
(ii) at least 2 insurance claims are received in a three-day period.
(iii) between 3 to 5 (inclusive) insurance claims are received in 2 weeks.
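The key step in Example 7.2 is rescaling λ to the interval in question: 10 claims per 5-day week means λ = 2 per day, so λ = 6 for three days and λ = 20 for two weeks. A hedged sketch of the three computations (helper names are ours):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    # P(X <= x) by direct summation
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

p_i = poisson_cdf(2, 10)                          # (i)  P(X < 3), 1 week, lam = 10
p_ii = 1 - poisson_cdf(1, 6)                      # (ii) P(X >= 2), 3 days, lam = 6
p_iii = poisson_cdf(5, 20) - poisson_cdf(2, 20)   # (iii) P(3 <= X <= 5), 2 weeks, lam = 20
print(round(p_i, 4), round(p_ii, 4), p_iii)
```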


Theorem 7.3
If X is a random variable possessing a Poisson distribution with parameter λ,
then
μ = E[X] = λ and σ² = Var(X) = λ.

Example 7.3
By referring to Example 7.2, determine the mean and standard deviation for the number of insurance
claims in
(a) 1 week
(b) 3 days
(c) 2 weeks

8. Moments and Moment-Generating Functions

In this section, we concentrate on applications of moment-generating functions. The obvious purpose
of the moment-generating function is in determining moments of random variables. However, the
most important contribution is to establish distributions of functions of random variables (covered in
Chapter 5).

DEFINITION 8.1: Moments about the origin.

The kth moment of a random variable X taken about the origin is defined to be E[X^k] and is denoted
by \mu_k'. Hence,

\mu_k' = E[X^k] = \begin{cases} \displaystyle\sum_x x^k f(x), & \text{if } X \text{ is discrete} \\[1.5ex] \displaystyle\int_{-\infty}^{\infty} x^k f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}


Note:
i. the first moment about the origin, 𝜇1′ = 𝐸(𝑋).
ii. The second moment about the origin, 𝜇2′ = 𝐸(𝑋 2 ).
→ mean, 𝜇 = 𝜇1′ and variance, 𝜎 2 = 𝜇2′ − 𝜇 2

Although the moments of a random variable can be determined directly from Definition 8.1, an
alternative procedure exists. This procedure requires us to utilize a moment-generating function.

DEFINITION 8.2: Moment-generating function.

The moment-generating function of the random variable X is given by E(e^{tX}) and is denoted by
M_X(t). Hence,

M_X(t) = E(e^{tX}) = \begin{cases} \displaystyle\sum_x e^{tx} p(x), & \text{if } X \text{ is discrete, with pmf } p(x) \\[1.5ex] \displaystyle\int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous, with density } f(x) \end{cases}

The moment generating function is so-called because it generates the moments of X, as can be seen
by expanding the exponential function inside the expectation operator:

M_X(t) = E(e^{tX}) = E\left[1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \cdots\right] = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots

It is clear that information about the moments of X about the origin, i.e. E[X^k], k = 0, 1, 2, \ldots, has
been compressed into the function M_X(t).
We can also use calculus to extract the k-th moment. To see this, observe that
M_X'(t) = E(X) + tE(X^2) + \frac{t^2}{2!}E(X^3) + \cdots

M_X''(t) = E(X^2) + tE(X^3) + \frac{t^2}{2!}E(X^4) + \cdots

In general,

M_X^{(k)}(t) = E(X^k) + tE(X^{k+1}) + \frac{t^2}{2!}E(X^{k+2}) + \cdots, \qquad k = 1, 2, \ldots

Therefore

M_X'(0) = E(X), \qquad M_X''(0) = E(X^2), \qquad \text{and} \qquad M_X^{(k)}(0) = E(X^k)


THEOREM 8.1
Let X be a random variable with moment-generating function M X ( t ) , then
\left.\frac{d^k M_X(t)}{dt^k}\right|_{t=0} = E(X^k) = \mu_k'

Example 8.1
Given that X has the probability distribution f(x) = \frac{1}{8}\binom{3}{x} for x = 0, 1, 2, 3, find the moment-generating function of this random variable and use it to determine \mu_1' and \mu_2'.

Example 8.2
Let M(t) = \frac{1}{7} + \frac{3}{7}e^{t} + \frac{2}{7}e^{3t} + \frac{1}{7}e^{4t}. Find the mean and variance of X.
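Since each term (a/7)e^{xt} of this MGF contributes probability a/7 at the point x, X places mass 1/7, 3/7, 2/7, 1/7 on x = 0, 1, 3, 4. A sketch computing M'(0) and M''(0) exactly from that reading (using `fractions` to keep the answers exact):

```python
from fractions import Fraction

# Mass a/7 at x for each term (a/7)*e^{xt} of M(t)
pmf = {0: Fraction(1, 7), 1: Fraction(3, 7), 3: Fraction(2, 7), 4: Fraction(1, 7)}
mean = sum(x * p for x, p in pmf.items())          # M'(0)  = E[X]
second = sum(x ** 2 * p for x, p in pmf.items())   # M''(0) = E[X^2]
var = second - mean ** 2
print(mean, var)  # → 13/7 90/49
```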


Example 8.3
Show that the moment-generating function of the Binomial random variable with parameters n and p
is given by 𝑀𝑋 (𝑡) = (𝑝𝑒 𝑡 + 𝑞)𝑛 and hence, show that E ( X ) = np and Var ( X ) = npq , where
q = 1− p .
Solution
E(e^{tX}) = \sum_{k=0}^{n} e^{tk}\binom{n}{k}p^k q^{n-k} = \sum_{k=0}^{n}\binom{n}{k}(pe^t)^k q^{n-k}

From the binomial theorem,

(a + b)^n = \sum_{i=0}^{n}\binom{n}{i}a^{n-i}b^i = a^n + na^{n-1}b + \frac{n(n-1)}{2!}a^{n-2}b^2 + \cdots + nab^{n-1} + b^n

Thus, applying the binomial theorem with a = pe^t and b = q,

E(e^{tX}) = \sum_{k=0}^{n}\binom{n}{k}(pe^t)^k q^{n-k} = (pe^t + q)^n

Now we use it to compute the first two moments.

𝑀′ (𝑡) = 𝑛[𝑝𝑒 𝑡 + 𝑞]𝑛−1 𝑝𝑒 𝑡 ,

𝑀′′ (𝑡) = 𝑛(𝑛 − 1)[𝑝𝑒 𝑡 + 𝑞]𝑛−2 𝑝2 𝑒 2𝑡 + 𝑛[𝑝𝑒 𝑡 + 𝑞]𝑛−1 𝑝𝑒 𝑡

Setting t = 0 we have

𝐸[𝑋] = 𝑀′ (0) = 𝑛𝑝, 𝐸[𝑋 2 ] = 𝑀′′ (0) = 𝑛(𝑛 − 1)𝑝2 + 𝑛𝑝

∴ 𝑉(𝑋) = 𝐸[𝑋 2 ] − (𝐸[𝑋])2 = 𝑛(𝑛 − 1)𝑝2 + 𝑛𝑝 − 𝑛2 𝑝2 = 𝑛𝑝(1 − 𝑝)
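The closed form (pe^t + q)^n can be checked numerically against the defining expectation sum. A small sketch with illustrative parameters n = 10, p = 0.3 (our choice, not from the text):

```python
from math import comb, exp

n, p = 10, 0.3
q = 1 - p

# M_X(t) computed two ways: by the defining sum E[e^{tX}], and by the
# closed form derived above.
direct = lambda t: sum(exp(t * k) * comb(n, k) * p ** k * q ** (n - k)
                       for k in range(n + 1))
closed = lambda t: (p * exp(t) + q) ** n

for t in (-0.5, 0.0, 0.7):
    assert abs(direct(t) - closed(t)) < 1e-9
print("MGF check passed")  # → MGF check passed
```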

Example 8.4
Show that the moment-generating function of the Poisson random variable with parameter \lambda is given
by M_X(t) = e^{\lambda(e^t - 1)} and hence, show that E(X) = \lambda and Var(X) = \lambda.
Solution
M_X(t) = \sum_{x=0}^{\infty} e^{tx}\left(\frac{e^{-\lambda}\lambda^x}{x!}\right) = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!}

Note that e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots = \sum_{n=0}^{\infty}\frac{y^n}{n!}, so \sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{\lambda e^t}. Hence

M_X(t) = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t - 1)}

Find derivatives of M(t) to compute µ and σ2.


M'(t) = e^{\lambda(e^t - 1)}\cdot\lambda e^t

M''(t) = e^{\lambda(e^t - 1)}\cdot\lambda e^t + \lambda e^t\cdot e^{\lambda(e^t - 1)}\cdot\lambda e^t

E(X) = M'(0) = e^{\lambda(e^0 - 1)}\cdot\lambda e^0 = \lambda

E(X^2) = M''(0) = e^{\lambda(e^0 - 1)}\cdot\lambda e^0 + \lambda e^0\cdot e^{\lambda(e^0 - 1)}\cdot\lambda e^0 = \lambda + \lambda^2

V(X) = (\lambda + \lambda^2) - \lambda^2 = \lambda

9. Chebyshev’s Theorem

Theorem 9.1
The probability that any random variable X will assume a value within k standard deviations of the
mean is at least 1 - \frac{1}{k^2}. That is,

P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}

For k = 2, the theorem states that the random variable X has a probability of at least 1 - \frac{1}{2^2} = \frac{3}{4} of
falling within two standard deviations of the mean. That is, three-fourths or more of the observations
of any distribution lie in the interval \mu \pm 2\sigma. Similarly, the theorem says that at least eight-ninths of
the observations of any distribution fall in the interval \mu \pm 3\sigma.

Example 9.1
A random variable X has mean \mu = 8, variance \sigma^2 = 9, and an unknown probability distribution.
Find
(a) P(-4 < X < 20)
(b) P(|X - 8| \geq 6)


Solution
(a) P(-4 < X < 20) = P\big(8 - (4)(3) < X < 8 + (4)(3)\big) \geq 1 - \frac{1}{4^2} = \frac{15}{16}

(b) P(|X - 8| \geq 6) = 1 - P(|X - 8| < 6) = 1 - P(-6 < X - 8 < 6)
= 1 - P\big(8 - (2)(3) < X < 8 + (2)(3)\big) \leq 1 - \left(1 - \frac{1}{2^2}\right) = \frac{1}{4}

since Chebyshev's theorem gives P\big(8 - 2\sigma < X < 8 + 2\sigma\big) \geq 1 - \frac{1}{2^2} = \frac{3}{4}.

Chebyshev’s theorem holds for any distribution of observations, and for this reason the results are
usually weak. The value given by the theorem is a lower bound only. That is, we know that the
probability of a random variable falling within two standard deviations of the mean can be no less
3
than , but we never know how much more it might actually be. Only when the probability
4
distribution is known can we determine exact probabilities. For this reason we call the theorem a
distribution-free result. When specific distributions are assumed, as in the previous sections, the
results will be less conservative. The use of Chebyshev’s theorem is relegated to situations where the
form of the distribution is unknown.
