
Haan C T Statistical Methods in Hydrology Solution

The document discusses two discrete probability distributions: 1) The hypergeometric distribution describes the probability of obtaining a given number of successes in a sample drawn without replacement from a finite population that is divided into success/failure categories. 2) The binomial distribution describes the probability of obtaining a given number of successes in n independent yes/no trials with probability p of success on each trial. It applies to Bernoulli processes where the probability of success is the same for each trial. Both distributions are useful for problems involving card sampling, acceptance sampling, modeling rare events like floods or rainy days, and other scenarios involving categorical outcomes from a finite population or sequence of trials. Examples are provided to illustrate their applications.

Uploaded by

gizem dural

4. Some Discrete Probability Distributions and Their Applications

THUS FAR, probability distributions have been considered in general terms. This chapter is
devoted to some particular discrete distributions and their applications. The next two chapters are
devoted to selected continuous distributions. These chapters are by no means exhaustive
treatments of probability distributions. Only some of the more common distributions are
considered.

HYPERGEOMETRIC DISTRIBUTION

Drawing a random sample of size n (without replacement) from a finite population of size
N with the elements of the population divided into two groups with k elements belonging to one
group, is an example of sampling from a hypergeometric distribution. The two groups may be
defective or nondefective objects, rainy or nonrainy days, success or failure of a project, etc. For
discussion purposes we will consider that an element (or outcome) from the population is either a
success or a failure. The probability of x successes in a sample of size n selected from a population
of size N containing k successes can be determined by applying equation 2.1.

The total number of possible outcomes or ways of selecting a sample of size n from N
objects is $\binom{N}{n}$. The number of ways of selecting x successes and n-x failures from the
population containing k successes and N-k failures is $\binom{k}{x}\binom{N-k}{n-x}$. Thus the
probability is

$$f_X(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} \tag{4.1}$$

The distribution given by equation 4.1 is known as the hypergeometric distribution where
fX (x; N, n, k) is the probability of obtaining x successes in a sample of size n drawn from a
population of size N containing k successes.

The cumulative hypergeometric distribution giving the probability of x or fewer successes
is

$$F_X(x; N, n, k) = \sum_{i=0}^{x} \frac{\binom{k}{i}\binom{N-k}{n-i}}{\binom{N}{n}} \tag{4.2}$$

There are certain natural restrictions on this distribution. For example: x cannot exceed k, x cannot
exceed n, k cannot exceed N and n cannot exceed N. N, n, k, and x are all nonnegative integers.
Furthermore, the outcomes must be random and equally likely.
The mean of the hypergeometric distribution is

$$E(X) = \frac{nk}{N} \tag{4.3}$$

and the variance is

$$\mathrm{Var}(X) = \frac{nk(N-k)(N-n)}{N^2(N-1)} \tag{4.4}$$
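
The hypergeometric formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names and the values N=10, n=4, k=3 are chosen here purely for illustration.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """Equation 4.1: P(x successes) in a sample of n drawn without
    replacement from N items, k of which are successes."""
    # Outcomes that violate the natural restrictions have probability zero.
    if x < 0 or x > n or x > k or n - x > N - k:
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def hypergeom_cdf(x, N, n, k):
    """Equation 4.2: P(x or fewer successes)."""
    return sum(hypergeom_pmf(i, N, n, k) for i in range(x + 1))


def hypergeom_mean(N, n, k):
    """Equation 4.3."""
    return n * k / N


def hypergeom_var(N, n, k):
    """Equation 4.4."""
    return n * k * (N - k) * (N - n) / (N**2 * (N - 1))


if __name__ == "__main__":
    N, n, k = 10, 4, 3  # illustrative values only
    for x in range(n + 1):
        print(x, round(hypergeom_pmf(x, N, n, k), 4))
    print("mean:", hypergeom_mean(N, n, k), "variance:", hypergeom_var(N, n, k))
```

Summing x f(x) and (x - mean)^2 f(x) over all x should reproduce equations 4.3 and 4.4, which is a convenient sanity check on any implementation.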

Example 4.1. Example 2.5 is an example where the hypergeometric applies. In this example a
success is selecting a bad record and N=10, k=3, n=4.

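Solution: The solutions can be written in terms of the hypergeometric as

(a) $f_X(1; 10, 4, 3) = \dfrac{\binom{3}{1}\binom{7}{3}}{\binom{10}{4}} = \dfrac{\left(\frac{3!}{2!\,1!}\right)\left(\frac{7!}{4!\,3!}\right)}{\frac{10!}{6!\,4!}} = \dfrac{(3)(35)}{210} = 0.500$

(b) $f_X(3; 10, 4, 3) = \dfrac{\binom{3}{3}\binom{7}{1}}{\binom{10}{4}} = 0.0333$

(c) $1 - f_X(0; 10, 4, 3) = 1 - \dfrac{\binom{3}{0}\binom{7}{4}}{\binom{10}{4}} = 1 - 0.1667 = 0.8333$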

Example 4.2. Assume that during a certain September, 10 rainy days occurred. Also assume that
at this particular location the occurrence of rain on any day is independent of whether or not it
rained on any previous day. (This is often not a good assumption.)
A sample of 10 September days is selected at random.

(a) What is the probability that 4 of these days will have been rainy?

(b) What is the probability that less than 4 of these days were rainy?

Solution: Use the hypergeometric distribution with

N=30, n=10, k=10

(a) $f_X(4; 30, 10, 10) = \dfrac{\binom{10}{4}\binom{20}{6}}{\binom{30}{10}} = 0.271$

(b) $F_X(3; 30, 10, 10) = \dfrac{\binom{10}{0}\binom{20}{10} + \binom{10}{1}\binom{20}{9} + \binom{10}{2}\binom{20}{8} + \binom{10}{3}\binom{20}{7}}{\binom{30}{10}} = 0.560$

Example 4.3. Examples of the hypergeometric distribution commonly found in statistics books
include card sampling problems (What is the probability of exactly 2 aces in a 5-card hand selected
at random from a 52-card deck?) and acceptance sampling problems (What is the probability of
selecting 5 defective items from a lot of 50 items if 20 items are selected and the lot actually
contains 12 defectives?).

Solution: Card problem

$$\mathrm{Prob}(2\ \text{aces}) = f_X(2; 52, 5, 4) = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}} = 0.040$$

Acceptance Sampling Problem

$$\mathrm{Prob}(5\ \text{defectives}) = f_X(5; 50, 20, 12) = \frac{\binom{12}{5}\binom{38}{15}}{\binom{50}{20}} = 0.26$$

BERNOULLI PROCESSES

Binomial Distribution

Consider a discrete time scale. At each point on this time scale an event may either occur or
not occur. Let the probability of the event occurring be p for every point on the time scale; thus, the
occurrence of the event at any point on the time scale is independent of the history of any prior
occurrences or nonoccurrences. The probability of an occurrence at the ith point on the time scale is
p for i = 1, 2, .... A process having these properties is said to be a Bernoulli process.

An example of a Bernoulli process might be the occurrence of rainy days. The time scale
has units of days. On any particular day, rainfall may or may not occur. If the occurrence of
rainfall on any given day is independent of the past history of rainfall occurrences, the sequence of
rainy and dry days can be considered a Bernoulli process.

As an example of another Bernoulli process, consider that during any year the probability
of the maximum flow exceeding 10,000 cfs on a particular stream is p. Common terminology for a
flow exceeding a given value is an exceedance. Further consider that the peak flow in any year is
independent from year to year (a necessary condition for the process to be a Bernoulli process).
Let q = 1-p be the probability of not exceeding 10,000 cfs. We can neglect the probability of a
peak of exactly 10,000 cfs since the peak flow rates would be a continuous process. In this
example the time scale is discrete with the points being nominally 1 year in time apart. We can
now make certain probabilistic statements about the occurrence of a peak flow in excess of 10,000
cfs (an exceedance).

For example, the probability of an exceedance occurring in year 3 and not in years 1 or 2
can be evaluated from equation 2.9 as qqp since the process is independent from year to year. The
probability of (exactly) one exceedance in any 3-year period is pqq + qpq + qqp since the
exceedance could occur in either the first, second or third year. Thus the probability of (exactly)
one exceedance in three years is $3pq^2$.

In a similar manner, the probability of 2 exceedances in 5 years can be found from the
summation of the terms ppqqq, pqpqq, pqqpq, ..., qqqpp. It can be seen that each of these terms is
equivalent to $p^2q^3$ and that the number of terms is equal to the number of ways of arranging 2 items
(the p's) among 5 items (the p's and q's). Therefore the total number of terms is $\binom{5}{2}$ or 10 so that
the probability of exactly 2 exceedances in 5 years is $10p^2q^3$.

This result can be generalized so that the probability of X exceedances in n years is
$\binom{n}{x} p^x q^{n-x}$. The result is applicable to any Bernoulli process so that the probability of X
occurrences of an event in n independent trials if p is the probability of an occurrence in a single
trial is given by

$$f_X(x; n, p) = \binom{n}{x} p^x q^{n-x} \qquad x = 0, 1, 2, \ldots, n \tag{4.5}$$

This equation is known as the binomial distribution.

The binomial distribution and the Bernoulli process are not limited to a time scale. Any
process that may occur with probability p at discrete points in time or space or in individual trials
may be a Bernoulli process and follow the binomial distribution.

The cumulative binomial distribution is

$$F_X(x; n, p) = \sum_{i=0}^{x} \binom{n}{i} p^i q^{n-i} \qquad x = 0, 1, 2, \ldots, n \tag{4.6}$$

and gives the probability of X or fewer occurrences of an event in n independent trials if the
probability of an occurrence in any trial is p.

Continuing the above example, the probability of fewer than 3 exceedances in 5 years is

$$F_X(2; 5, p) = \sum_{i=0}^{2} \binom{5}{i} p^i q^{5-i} = f_X(0; 5, p) + f_X(1; 5, p) + f_X(2; 5, p)$$

The mean, variance, and coefficient of skew of the binomial distribution are

$$E(X) = np \tag{4.7}$$

$$\mathrm{Var}(X) = npq \tag{4.8}$$

$$\gamma_s = \frac{q - p}{\sqrt{npq}} \tag{4.9}$$

The distribution is symmetrical for p = q, skewed to the right for q > p and skewed to the left for q
< p.
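
As a rough illustration of equations 4.5 through 4.9, the sketch below evaluates the binomial probabilities in Python. The function names and the exceedance probability p = 0.1 are assumptions made for this example only; they do not come from the text.

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """Equation 4.5: probability of exactly x occurrences in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Equation 4.6: probability of x or fewer occurrences in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


def binom_moments(n, p):
    """Equations 4.7 to 4.9: mean, variance and coefficient of skew."""
    q = 1 - p
    return n * p, n * p * q, (q - p) / sqrt(n * p * q)


if __name__ == "__main__":
    p = 0.1  # assumed annual exceedance probability, for illustration
    q = 1 - p
    # Exactly 2 exceedances in 5 years, compared with 10 p^2 q^3.
    print(binom_pmf(2, 5, p), 10 * p**2 * q**3)
    # Fewer than 3 exceedances in 5 years.
    print(binom_cdf(2, 5, p))
    print(binom_moments(5, p))
```

With q > p, as in this case, the returned skew coefficient is positive, in line with the statement that the distribution is skewed to the right.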

Because the probability of a success on any trial is independent of past history, the origin of
the time scale of a Bernoulli process can be taken at any time point. Thus the probability of any
combination of successes or failures is the same for any sequence of n points regardless of their
location with respect to the origin.

Example 4.4. On the average, how many times will a 10-year flood occur in a 40-year period?
What is the probability that exactly this number of 10-year floods will occur in a 40-year period?
Solution: A 10-year flood has p = 1/10 = 0.1

$$E(X) = np = 40(0.1) = 4$$

$$f_X(4; 40, 0.1) = \binom{40}{4}(0.1)^4(0.9)^{36} = 0.2059$$

Comment: This problem illustrates the difficulty of explaining the concept of return period. On
the average a 10-year event occurs once every 10 years and in a 40-year period is expected to occur
4 times. Yet in about 80% (100(1-0.2059)) of all possible independent 40-year periods, the 10-year
event will not occur exactly 4 times. As a matter of fact, the probability that it will occur 3 times is
nearly identical to the probability it will occur 4 times (0.2003 vs. 0.2059). The number of
occurrences, X, is truly a random variable (with a binomial distribution).

The binomial distribution has an additive property (Gibra 1973). That is, if X has a binomial
distribution with parameters $n_1$ and p and Y, independent of X, has a binomial distribution with
parameters $n_2$ and p, then Z=X+Y has a binomial distribution with parameters $n = n_1 + n_2$ and p.

A useful property of the binomial distribution is that

$$f_X(x; n, p) = f_X(n - x; n, q) \tag{4.10}$$

The binomial distribution can be used to approximate the hypergeometric distribution if
the sample selected is small in comparison to the number of items N from which the sample is
drawn. In this case the probability of a success would be about the same for each trial and sampling
without replacement (hypergeometric) would be very similar to sampling with replacement
(binomial).

Example 4.5. Compare the hypergeometric and binomial for N=40, n=5, k=10 and X=0, 1, 2, 3, 4,
and 5.

Solution:

      Hypergeometric         Binomial
 x    f_X(x; N, n, k)        f_X(x; n, p)
      = f_X(x; 40, 5, 10)    = f_X(x; 5, 10/40)

 0    0.2166                 0.2373
 1    0.4165                 0.3955
 2    0.2777                 0.2637
 3    0.0793                 0.0879
 4    0.0096                 0.0146
 5    0.0004                 0.0010

Comment: This merely indicates that drawing a small sample without replacement from a large
population and drawing the same sample with replacement (so probabilities in each trial are
constant) are nearly equivalent.
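
The comparison in example 4.5 can be reproduced with a few lines of Python. This is only a sketch for checking the tabulated values; the helper names are introduced here and are not from the text.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """Hypergeometric probability of x successes (sampling without replacement)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """Binomial probability of x successes (sampling with replacement)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def compare(N, n, k):
    """Return rows (x, hypergeometric, binomial) using p = k/N for the binomial."""
    p = k / N
    return [(x, hypergeom_pmf(x, N, n, k), binom_pmf(x, n, p)) for x in range(n + 1)]


if __name__ == "__main__":
    for x, h, b in compare(N=40, n=5, k=10):
        print(f"{x}  {h:.4f}  {b:.4f}")
```

Increasing N and k in proportion (for instance N=400, k=100) while holding n fixed should bring the two columns closer together, which is the point made in the comment above.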

Example 4.6. The operator of a boat dock has decided to put in a new facility along a certain river.
In an economic analysis of the situation it was decided to have the facility designed to withstand
floods up to 75,000 cfs. Furthermore, it was determined that if one flood greater than this occurs in
a 5-year period, repairs can be made and still break even on its operation during the 5-year period.
If more than one flow in excess of 75,000 cfs occurs, money will be lost. If the probability of
exceeding 75,000 cfs is 0.15, what is the probability the operator will make money?

Solution: Money will be made if no floods exceeding 75,000 cfs occur during the 5-year period.
Let X be the number of floods. From the binomial distribution


$$f_X(x; n, p) = f_X(0; 5, 0.15) = \binom{5}{0}(0.15)^0(0.85)^5 = 0.4437$$

Comment: The probability that the operator will make the investment, work for 5 years, and then
just break even is quite high:

$$f_X(1; 5, 0.15) = \binom{5}{1}(0.15)^1(0.85)^4 = 0.3915$$

Thus, even though the risk or probability of losing money is low (1 - 0.3915 - 0.4437 = 0.1648), the
investment may not be an attractive one.

Whenever a decision is made based on uncertain information or relative to a system subject
to random inputs or behavior, there is a chance that the decision will result in an adverse outcome.
A bridge that may be underdesigned, a water supply reservoir that may be too small, and an
investment that may fail are examples of decisions made in the face of uncertainty. These
decisions are said to be risky decisions with risk defined as the probability of an adverse outcome.
Generally all decisions dependent on hydrologic data and hydrologic analysis are risky in this
sense. A risky decision is not necessarily a bad decision. Risk must be balanced against costs and
available alternatives. For informed decisions to be made under uncertainty, quantitative estimates
of the resulting risk are desirable. Risk and uncertainty are treated in detail in chapter 17.

Example 4.7. In order to be 90% sure that a design storm is not exceeded in a 10-year period, what
should be the return period of the design storm?

Solution: Let p be the probability of the design storm being exceeded. Based on the binomial
distribution the probability of no exceedances is given by

$$f_X(0; 10, p) = \binom{10}{0} p^0 q^{10}$$

$$0.90 = (1 - p)^{10}$$

$$p = 1 - (0.90)^{1/10} = 1 - 0.9895 = 0.0105$$

$$T = \frac{1}{p} = 95 \text{ years}$$

Comment: To be 90% sure that a design storm is not exceeded in a 10-year period a 95-year return
period storm must be used. If a 10-year return period storm is used, the chance of it being
exceeded is

$$1 - f_X(0; 10, 0.1) = 0.6513$$

In general the chance of at least one occurrence of a T-year event in T years is

$$1 - f_X\left(0; T, \frac{1}{T}\right) = 1 - \left(1 - \frac{1}{T}\right)^T$$

It can be shown that as T gets large, this expression approaches $1 - 1/e$ or 0.632. For T
= 5, 10, and 25, the probability is 0.67, 0.65, and 0.64, respectively. Thus, if the design life of a
structure and its design return period are the same, the chances are very great that the capacity of
the structure will be exceeded during its design life. The risk associated with a T-year return
period over n years is

$$\text{risk} = 1 - (1 - 1/T)^n$$
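
The risk expression and the inverse calculation of example 4.7 can be sketched in Python as follows. The function names are introduced here for illustration; the arithmetic simply follows the formulas above.

```python
from math import exp


def risk(T, n):
    """Probability of at least one exceedance of a T-year event in n years."""
    return 1.0 - (1.0 - 1.0 / T) ** n


def design_return_period(confidence, n):
    """Return period T giving `confidence` probability of no exceedance in n years.

    Solves confidence = (1 - 1/T)**n for T, as in example 4.7.
    """
    p = 1.0 - confidence ** (1.0 / n)
    return 1.0 / p


if __name__ == "__main__":
    # Design life equal to the return period.
    for T in (5, 10, 25, 1000):
        print(T, round(risk(T, T), 3))
    print("limit 1 - 1/e:", round(1 - exp(-1), 3))
    # 90% chance of no exceedance in 10 years.
    print(design_return_period(0.90, 10))
```

Carrying full precision in p gives a return period slightly above 95 years; the 95-year figure in example 4.7 reflects rounding p to 0.0105 before inverting.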

The procedure outlined in example 4.7 can be used to determine a design return period
when the allowable risk is stated. Note that the design return period must be much greater than the
life of the project to be reasonably sure that an exceedance will not occur. No matter what design
return period is selected, there is still a chance that an exceedance will occur. Some may argue that
there is an upper limit to the magnitude of natural events such as flood peaks. They would argue
that a peak of 100,000 cfs from a 1-acre watershed would be impossible. In practice the probability
that would be assigned to an event of this sort is so small that it can be neglected for most practical
purposes.

Figure 4.1 shows the design return period that must be used to be a certain percent
confident that the design will not be exceeded during the design life of the project. The parameters
on the curves are the percent chance of no exceedance during the design life. For example, to be
90% sure that a design condition will not be exceeded during a project whose design life is 100
years, the project would have to be designed on the basis of a 900-year event. Figure 4.1 is derived
from calculations like those contained in example 4.7.

Figure 4.1 can also be used to evaluate the risk or percent chance of an event in excess of
the design event during the design life. For example, if a project is designed on the basis of a
50-year event and the design life of the project is 10 years, the designer is taking a 19% chance
(100-81) that the design will be exceeded.

Example 4.8. Three successes have occurred on the first 5 trials of a Bernoulli process with p =
0.4. What is the probability of 3 successes in the next 5 trials?

Solution:

$$f_X(3; 5, 0.4) = \binom{5}{3}(0.4)^3(0.6)^2 = 0.2304$$

Comment: What has occurred prior to the trials of interest is of no concern since the Bernoulli
process is based on the assumption of independence from trial to trial.

Fig. 4.1. Design return period required as a function of design life to be a given percent confident
(curve parameter) that the design condition is not exceeded.
Geometric Distribution

The probability that the first exceedance (or success) of a Bernoulli process occurs on the Xth
trial can be found by noting that for the first exceedance to be on the Xth trial there must be X-1
preceding trials without an exceedance followed by 1 trial with an exceedance. Thus the desired
probability is $pq^{x-1}$. This is known as the geometric distribution

$$f_X(x; p) = pq^{x-1} \qquad x = 1, 2, 3, \ldots \tag{4.11}$$

The mean and variance of the geometric distribution are

$$E(X) = \frac{1}{p} \tag{4.12}$$

$$\mathrm{Var}(X) = \frac{q}{p^2} \tag{4.13}$$

Because E(X) = 1/p and p = 1/T for a T-year event, on the average a T-year event first occurs
in the Tth year, which agrees with our intuitive concept of a return period.

Example 4.9. What is the probability that a 10-year flood will occur for the first time during the
fifth year after the completion of a project? What is the probability it will be at least the fifth year
before a 10-year flood occurs?

Solution: The probability that the first exceedance is in year 5 is

$$f_X(5; 0.1) = (0.1)(0.9)^4 = 0.06561$$

The probability that it will be at least the fifth year before the first occurrence is not the same as the
probability of the first occurrence in the fifth year. The expression "at least" implies the first
occurrence might be in the fifth year or some later year. The desired probability is equal to the
probability of no occurrences in the first 4 years, which is $(0.9)^4 = 0.6561$.

Example 4.10. What is the probability that exactly 9 years will elapse between occurrences of a
10-year event?

Solution: This is the same as the probability of the first occurrence on the tenth year or

$$f_X(10; 0.1) = (0.1)(0.9)^9 = 0.0387$$
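
Examples 4.9 and 4.10 can be checked with a short Python sketch. The function names are hypothetical, introduced only for this illustration.

```python
def geom_pmf(x, p):
    """Equation 4.11: probability that the first occurrence is on trial x (x >= 1)."""
    if x < 1:
        return 0.0
    return p * (1 - p) ** (x - 1)


def prob_first_at_or_after(x, p):
    """Probability that the first occurrence is on trial x or later,
    i.e. no occurrences in the first x - 1 trials."""
    return (1 - p) ** (x - 1)


if __name__ == "__main__":
    p = 0.1  # 10-year flood
    print(geom_pmf(5, p))                # first occurrence in year 5
    print(prob_first_at_or_after(5, p))  # at least the fifth year
    print(geom_pmf(10, p))               # exactly 9 years between occurrences
    # Mean of the distribution, approximated by a long truncated sum.
    print(sum(x * geom_pmf(x, p) for x in range(1, 2001)))
```

The truncated sum in the last line should come out very close to 1/p = 10, the return period.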

Negative Binomial Distribution

The probability that the kth exceedance (success) occurs on the Xth trial (X ≥ k) of a
Bernoulli process can be found by noting that there must be k-1 exceedances in the X-1 trials
preceding the kth exceedance on the Xth trial. The probability of k-1 exceedances in X-1 trials is
given by the binomial distribution as $\binom{x-1}{k-1} p^{k-1} q^{x-k}$. The probability that the Xth
trial results in an exceedance is p so the desired probability is given by the negative binomial
distribution

$$f_X(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k} \qquad x = k, k+1, \ldots \tag{4.14}$$

The mean and variance of the negative binomial distribution are

$$E(X) = \frac{k}{p}$$

$$\mathrm{Var}(X) = \frac{kq}{p^2}$$

As might be expected since the negative binomial is based on the binomial, the additive
feature holds. Thus if X and Y are independent and are described by $f_X(x; k_1, p)$ and
$f_Y(y; k_2, p)$ respectively, then Z=X+Y follows the negative binomial $f_Z(z; k_1 + k_2, p)$.

Example 4.11. What is the probability that the fourth occurrence of a 10-year flood will be on the
fortieth year?

Solution:

$$f_X(40; 4, 0.1) = \binom{39}{3}(0.1)^4(0.9)^{36} = 0.0206$$
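
A minimal Python sketch of equation 4.14 is given below as a check on example 4.11. The function name is an assumption of this sketch, not notation from the text.

```python
from math import comb


def negbinom_pmf(x, k, p):
    """Equation 4.14: probability that the kth occurrence is on trial x."""
    if x < k:
        return 0.0
    # k - 1 occurrences in the first x - 1 trials, then an occurrence on trial x.
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)


if __name__ == "__main__":
    p = 0.1  # 10-year flood
    print(negbinom_pmf(40, 4, p))  # fourth occurrence in year 40
    # With k = 1 the expression reduces to the geometric distribution.
    print(negbinom_pmf(5, 1, p), p * (1 - p) ** 4)
    # Mean k/p, approximated by a long truncated sum.
    print(sum(x * negbinom_pmf(x, 4, p) for x in range(4, 2001)))
```

Setting k = 1 recovers the geometric probabilities, which is a quick consistency check between the two distributions.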

Summary of Bernoulli Process

In a Bernoulli process at each instant of time (or location or trial) an event may either occur
with probability p or not occur with probability q = 1-p. The probability of the event occurring is
independent of the time and independent of the past history of occurrences. The number of
occurrences in a given time interval (or distance or number of trials) follows the binomial
distribution. The probability that the first occurrence is at the Xth time is described by the
geometric distribution. The probability that the kth occurrence was at the Xth time is described by
the negative binomial distribution. It was also found that the probability distribution of the length
of time between occurrences can be found from the geometric distribution by noting that the
probability that X trials elapse between occurrences is the same as the probability that the first
occurrence is at the (X+1)st time, or $f_X(x+1; p) = pq^x$.

POISSON PROCESS

Poisson Distribution

Consider a Bernoulli process defined over an interval of time (or space) so that p is the
probability that an event may occur during the time interval. If the time interval is allowed to
become shorter and shorter so that the probability, p, of an event occurring in the interval gets
smaller and the number of trials, n, increases in such a fashion that np remains constant, then the
expected number of occurrences in any total time interval remains the same. It can be shown that
as n gets large and p gets small so that np remains a constant, λ, the binomial distribution
approaches the Poisson distribution given by

$$f_X(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad x = 0, 1, 2, \ldots;\ \lambda > 0 \tag{4.15}$$

The mean, variance, and coefficient of skew of the Poisson distribution are

$$E(X) = \lambda \tag{4.16}$$

$$\mathrm{Var}(X) = \lambda \tag{4.17}$$

$$\gamma_s = 1/\sqrt{\lambda} \tag{4.18}$$

As λ gets large, the distribution goes from a positively skewed distribution to a nearly symmetrical
distribution. The cumulative Poisson distribution is

$$F_X(x; \lambda) = \sum_{i=0}^{x} \frac{\lambda^i e^{-\lambda}}{i!} \tag{4.19}$$

Example 4.12. What is the probability that a storm with a return period of 20 years will occur once
in a 10-year period?

Solution: Using the binomial distribution the exact answer is

$$f_X(1; 10, 0.05) = \binom{10}{1}(0.05)(0.95)^9 = 0.315$$

Approximating with the Poisson

$$\lambda = np = 10 \times 0.05 = 0.5$$

$$f_X(1; 0.5) = \frac{0.5\, e^{-0.5}}{1!} = 0.303$$

Thus the solutions are not identical but are quite close to each other.

Example 4.13. What is the probability of 5 occurrences of a 2-year storm in a 10-year period?

Solution: Using the binomial

$$f_X(5; 10, 0.5) = \binom{10}{5}(0.5)^5(0.5)^5 = 0.246$$

Approximating with the Poisson

$$\lambda = np = 10 \times 0.5 = 5$$

$$f_X(5; 5) = \frac{5^5 e^{-5}}{5!} = 0.175$$

Comment: For this situation n is not large enough and p is not small enough for a good approximation.

Example 4.14. What is the probability of fewer than 5 occurrences of a 20-year storm in a
100-year period?

Solution: n is relatively large and p small so the Poisson will be used.

$$\lambda = np = 100(0.05) = 5$$

$$\mathrm{Prob}(X < 5) = \mathrm{Prob}(X \le 4) = F_X(4; 5)$$

$$F_X(4; 5) = \sum_{i=0}^{4} \frac{5^i e^{-5}}{i!} = 0.440$$
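
The binomial and Poisson calculations of examples 4.12 to 4.14 can be compared side by side with a sketch such as the following. The function names are introduced here for illustration only.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, p):
    """Binomial probability of x occurrences in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Equation 4.15."""
    return lam**x * exp(-lam) / factorial(x)


def poisson_cdf(x, lam):
    """Equation 4.19."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))


if __name__ == "__main__":
    # (x, n, p) for examples 4.12 and 4.13; the Poisson parameter is np.
    for x, n, p in [(1, 10, 0.05), (5, 10, 0.5)]:
        print(n, p, binom_pmf(x, n, p), poisson_pmf(x, n * p))
    # Example 4.14: fewer than 5 occurrences with np = 5.
    print(poisson_cdf(4, 100 * 0.05))
    # Holding np = 5 fixed while n grows shows the convergence.
    for n in (10, 100, 10000):
        print(n, binom_pmf(5, n, 5 / n), poisson_pmf(5, 5.0))
```

The last loop illustrates the limiting argument: with np held at 5, the binomial value moves toward the Poisson value as n increases and p decreases.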

The Poisson distribution possesses the additive property that the sum of two independent Poisson
random variables with parameters $\lambda_1$ and $\lambda_2$ is a Poisson random variable with parameter
$\lambda = \lambda_1 + \lambda_2$.

A Poisson process for a continuous time scale can be defined analogous to a
Bernoulli process on a discrete time scale. The Poisson process refers to the occurrence of events
along a continuous time (or location) scale. The assumptions underlying the process are:

1. The probability of an event in any short interval t to t+ Δt is λΔt (proportional to the length of
the interval) for all values of t. This property is known as stationarity.

2. The probability of more than one event in any short interval t to t+Δt is negligible in
comparison to λΔt.

3. The number of events in any interval of time is independent of the number of events in any
other non-overlapping interval of time.

The probability distribution of the number of events X in time t for a Poisson process is
given by

$$f_X(x; \lambda t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!} \qquad \lambda > 0;\ t > 0;\ x = 0, 1, 2, \ldots \tag{4.20}$$

where $f_X(x; \lambda t)$ is the probability of X events in time t. Equation 4.20 is a Poisson distribution with
parameter λt. The mean and variance of $f_X(x; \lambda t)$ are E(X) = λt and Var(X) = λt. The parameter λ is
the average rate of occurrence of the event.

Exponential Distribution
The probability distribution of the time, T, between occurrences of the event can be found
by noting that prob(T ≤ t) is equal to 1 - prob(T > t). The prob(T > t) is equal to the probability of
no occurrences in time t, which is $f_X(0; \lambda t)$ or $e^{-\lambda t}$. Thus

$$\mathrm{prob}(T \le t) = P_T(t; \lambda) = 1 - e^{-\lambda t} \tag{4.21}$$

which is a cumulative distribution known as the exponential distribution. The probability density
function is

$$p_T(t; \lambda) = \frac{dP_T(t; \lambda)}{dt} = \lambda e^{-\lambda t} \tag{4.22}$$

and is the probability distribution of the length of the time interval between occurrences of the
event. The mean and variance of the exponential distribution are $1/\lambda$ and $1/\lambda^2$, respectively.

Gamma Distribution
The probability distribution of the time to the nth occurrence can be found by noting that the
time to the nth occurrence is the sum of n independent random variables, $T_1 + T_2 + \ldots + T_n$, from the
exponential distribution. The method of derived distributions can be used with the result that the
probability density function of the time to the nth occurrence is

$$p_T(t; n, \lambda) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!} \qquad t > 0;\ \lambda > 0;\ n = 1, 2, \ldots \tag{4.23}$$

which is the gamma distribution for integer values of the parameter n. The gamma distribution has
$E(T) = n/\lambda$ and $\mathrm{Var}(T) = n/\lambda^2$.

Example 4.15. Barges arrive at a lock at an average of 4 each hour.

(a) If the arrival of barges at the lock can be considered to follow a Poisson process, what is the
probability that 6 barges will arrive in 2 hours?

(b) If the lock master has just locked through all of the barges at the lock, what is the
probability he can take a 15-minute break without another barge arriving?

(c) If the operation of the lock is such that 4 barges can be locked through at once and the lock
master insists that this always be the case, what is the probability that the first barge to
arrive after 4 previous barges have been locked through will have to wait at least 1 hour
before being locked through?

Solution:

(a) For this problem the rate constant is λ = 4 per hour. The probability of 6 arrivals in 2 hours can be
determined from the Poisson distribution with λt = 4(2) = 8:

$$f_X(x; \lambda t) = f_X(6; 8) = \frac{8^6 e^{-8}}{6!} = 0.1221$$

(b) The probability of no arrivals in 15 minutes also comes from the Poisson, with λt = 4(0.25) = 1:

$$f_X(0; 1) = \frac{1^0 e^{-1}}{0!} = 0.3679$$

Note that this is not the same as the probability that it will be 15 minutes until the next arrival. The
time scale is continuous so the probability that it will be exactly 15 minutes until the next arrival is
zero. We can only talk of probabilities associated with time intervals, not specific times.
(c) The barge must wait for the arrival of 3 additional barges. The probability that the time $T_3$ for 3
barges to arrive is greater than 1 hour is

$$\mathrm{prob}(T_3 > 1) = 1 - \mathrm{prob}(T_3 \le 1)$$

The probability that $T_3 \le 1$ for 3 arrivals comes from the gamma distribution.

$$P_T(t; n, \lambda) = \int_0^t p_T(\tau; n, \lambda)\, d\tau$$

$$P_T(1; 3, 4) = \int_0^1 \frac{4^3 t^2 e^{-4t}}{2!}\, dt = 0.762$$

The desired probability is 1 - 0.762 = 0.238.
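
The three parts of example 4.15 can be reproduced with the Python sketch below. The function names are assumptions of this sketch. For integer n the gamma integral is evaluated through the identity prob(T_n ≤ t) = prob(at least n events in time t), a standard property of the Poisson process that the text does not state explicitly.

```python
from math import exp, factorial


def poisson_pmf(x, mean):
    """Equation 4.20 with mean = lambda * t."""
    return mean**x * exp(-mean) / factorial(x)


def exponential_cdf(t, lam):
    """Equation 4.21: probability that the time between events is at most t."""
    return 1.0 - exp(-lam * t)


def erlang_pdf(t, n, lam):
    """Equation 4.23: density of the time to the nth event (integer n)."""
    return lam**n * t ** (n - 1) * exp(-lam * t) / factorial(n - 1)


def erlang_cdf(t, n, lam):
    """prob(T_n <= t) = 1 - prob(fewer than n events in time t)."""
    return 1.0 - sum(poisson_pmf(i, lam * t) for i in range(n))


if __name__ == "__main__":
    lam = 4.0  # barges per hour
    print(poisson_pmf(6, lam * 2.0))         # (a) 6 arrivals in 2 hours
    print(poisson_pmf(0, lam * 0.25))        # (b) no arrivals in 15 minutes
    print(1.0 - erlang_cdf(1.0, 3, lam))     # (c) wait of more than 1 hour
```

Part (b) can equally be written as 1 - exponential_cdf(0.25, lam), since no arrivals in 15 minutes means the time to the next arrival exceeds 15 minutes.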

Summary of Poisson Process

The Poisson process is a discrete process on a continuous time scale. Therefore, the
probability distribution of the number of events in a time T is a discrete distribution while the
probability distributions for the time between events and the time to the nth event are continuous
distributions.

For a Poisson process the probability that an event will occur in a short time interval t to
t+Δt is λΔt for all t. The probability that more than one event occurs in Δt is negligible. The
probability distribution of the number of events in a given time T is the Poisson distribution. The
exponential distribution describes the time between events and the gamma distribution the time to
the nth event.

Example 4.16. It has been proposed that an event-based rainfall simulation model can be
constructed by modeling the occurrence of rainstorms by a Poisson process and the amount of rain
in each storm by some continuous probability distribution. In this way the time between rainstorms
would follow an exponential distribution, the time for X rainstorms would follow a gamma
distribution, and the number of rainstorms in a time interval would follow a Poisson distribution.
Duckstein et al. (1975) and Fogel et al. (1974) used a modification of this approach. Part of Fogel
et al.'s results are shown as figure 4.2.

Fig. 4.2. Distribution of occurrences of warm season rainfall in which the areal mean of five gages
in New Orleans, Louisiana, exceeded 0.50 inches and at least one gage recorded more than 1.0
inch (Fogel et al. 1974).

MULTINOMIAL DISTRIBUTION

The binomial distribution can be generalized to include the probabilities of outcomes of
several types rather than the two possible outcomes of the binomial. If the probabilities associated
with each of k distinct outcomes are $p_1, p_2, \ldots, p_k$, then in n independent trials the probability of $X_1$
outcomes of type 1, $X_2$ outcomes of type 2, ..., $X_k$ outcomes of type k is given by the multinomial
distribution as

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k; n, p_1, p_2, \ldots, p_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

or

$$f_{\mathbf{X}}(\mathbf{x}; n, \mathbf{p}) = n! \prod_{i=1}^{k} \frac{p_i^{x_i}}{x_i!} \tag{4.24}$$

where X, x and p are 1 by k vectors. Some restrictions on this distribution are

$$\sum_{i=1}^{k} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{k} x_i = n$$

The mean and variance of the multinomial distribution are

$$E(X_i) = np_i \tag{4.25}$$

$$\mathrm{Var}(X_i) = np_i(1 - p_i) \tag{4.26}$$

Example 4.17. On a certain stream the probability that the maximum peak flow during a 1-year
period will be less than 5,000 cfs is 0.2 and the probability that it will be between 5,000 cfs and
10,000 cfs is 0.4. In a 20-year period, what is the probability of 4 peak flows less than 5,000 cfs and
8 peak flows between 5,000 and 10,000 cfs?

Solution: To apply the multinomial distribution we define the third event as a peak flow in excess
of 10,000 cfs. This event has probability 1 - 0.2 - 0.4 = 0.4. The event of a peak flow greater than
10,000 cfs must occur 20 - 4 - 8 = 8 times. The desired probability is

$$f_{\mathbf{X}}(4, 8, 8; 20, 0.2, 0.4, 0.4) = \frac{20!\,(0.2)^4 (0.4)^8 (0.4)^8}{4!\,8!\,8!} = 0.043$$

Comment: The expected result from 20 years of flood peak data would be

$$E(X_1) = np_1 = 20(0.2) = 4$$

$$E(X_2) = np_2 = 8$$

$$E(X_3) = np_3 = 8$$

This problem demonstrates that even though the expected results are 4, 8, 8, the probability of this
happening is very low.
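
Example 4.17 can be verified with a short Python sketch of the multinomial probability. The function name and the input checks are choices made for this illustration.

```python
from math import factorial, prod


def multinomial_pmf(xs, ps):
    """Equation 4.24: probability of xs[i] outcomes of type i in n = sum(xs)
    independent trials, where ps[i] is the probability of type i."""
    if len(xs) != len(ps):
        raise ValueError("xs and ps must have the same length")
    if abs(sum(ps) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # n! / (x_1! x_2! ... x_k!) computed exactly with integer arithmetic.
    coeff = factorial(sum(xs))
    for x in xs:
        coeff //= factorial(x)
    return coeff * prod(p**x for p, x in zip(ps, xs))


if __name__ == "__main__":
    # 4 peaks below 5,000 cfs, 8 between 5,000 and 10,000 cfs, 8 above 10,000 cfs.
    print(multinomial_pmf((4, 8, 8), (0.2, 0.4, 0.4)))
    # With two outcome types the expression reduces to the binomial.
    print(multinomial_pmf((2, 3), (0.3, 0.7)), 10 * 0.3**2 * 0.7**3)
```

The second call shows that for k = 2 the multinomial collapses to equation 4.5.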

EXERCISES

4.1 Compute the terms of the binomial distribution with n = 10 and p = 0.2. Plot in the form of a
histogram.

4.2 Compute the terms of the cumulative binomial with n = 10 and p = 0.2. Plot the terms.

4.3 If a project is designed on a 10-year return period, what is the probability of at least 1
exceedance during the 10-year life of the project?

4.4 What design return period should be used to ensure a 95% chance that the design will not be
exceeded in a 25-year period?

4.5 Construct a curve relating the design return period to the life of a project when a 90% chance
of no exceedance is used.

4.6 What design return period should be used to ensure a 50% chance of no exceedance in a
10-year period?

4.7 What design return period should be used to ensure a 75% chance of no more than 1
exceedance in 10 years?

4.8 Construct an example where the Poisson is not a good approximation for the binomial.

4.9 In a certain locality contractors A, B and C get about 50%, 25% and 25%, respectively, of all
water resources projects. Five contracts are coming up for bid. What is the probability that
contractor A will get all 5 jobs? What is the probability that A will get 2 jobs and B will get 2 jobs?

4.10 In 100 years the following number of floods were recorded at a specific location. Draw a
relative frequency histogram of the data. Fit a Poisson distribution to the data and plot the relative
frequencies according to the Poisson distribution on the histogram. Is the Poisson a good
approximation for the data?

No. of floods    No. of occurrences
      0                 52
      1                 28
      2                 12
      3                  5
      4                  2
      5                  1
      6                  0

4.11 Based on a Poisson approximation to the data of exercise 4.10, what is the probability of 5
successive years without a flood?

4.12 Based on a Poisson approximation to the data of exercise 4.10, what is the probability of
exactly five years between floods?

4.13 Compute the probability of at least 1 n-year event in a k-year period using (a) n = 100, k = 20
and (b) n = 500, k = 50.

4.14 Using the Poisson approximation to the binomial distribution show that the probability of at
least one occurrence of a T-year event in T years is 0.632.

4.15 The Bernoulli distribution is given by

$$f_X(x) = p^x (1 - p)^{1 - x}, \qquad x = 0, 1$$

What are E(X) and Var(X) for this distribution?

4.16 Use the Poisson distribution to approximate the binomial distribution of exercise 4.1. Plot
the terms of this Poisson distribution on the histogram of exercise 4.1.

4.17 Two widely separated watersheds are selected for a study on peak discharges. If the
occurrence of flood flows on the two basins can be considered as independent events, what is the
probability of experiencing a total of five 20-year events on the two watersheds in a 10-year period?

4.18 A well-known scientist has predicted that during a certain three-year period a severe drought
will occur on the plains east of the Rocky Mountains. He made this prediction based on his
observation of sunspot activity. If the probability of a drought is 0.10 in any year, what is the
probability that the scientist's prediction will come true if the occurrence of a drought is a strictly
random phenomenon unrelated to sunspot activity?

4.19 In a certain region there are 20 possible small watersheds suitable for a research project.
Unknown to the project manager, 6 of these basins have subsurface geological features that permit
large quantities of surface water to enter underground formations and leave the basin via
subsurface flow. The project manager wants to select 6 watersheds from the 20 for study.
(a) What is the probability that 1 of the basins having the above described geologic features will be
selected?
(b) What is the probability that 3 of these basins will be selected?
(c) What is the probability that at least one of the basins will be selected?
(d) What is the probability that all of these basins will be selected?

4.20 In the situation described in exercise 4.19 the project manager wants to pick 3 pairs of
watersheds for the evaluation of an evapotranspiration suppressant. One basin in each pair will be
used for a control and one will be treated with the suppressant. What is the probability that all of
the control watersheds will have the geologic problem while all of the rest will not?

4.21 It is desired to model the number of rainy days in July and August as a Bernoulli
process. Based on the data below and the assumption that the Bernoulli model is applicable:
(a) What is the probability of 10 or more rainy days in each of the months of July and August?
(b) What is the probability of 20 rainy days in the two-month period?
(c) What assumptions concerning the Bernoulli process are likely violated by this problem?
For this problem write answers in terms of summations. Do not evaluate the summations.

No. of rainy days
Year      1   2   3   4   5   6   7   8   9  10
July     10  15  17   8   9  19  17  14  20   4
August    4   9   8   3   0  10  12   2   8   6

4.22 For the binomial distribution show that

$$f_X(x; n, p) = f_X(x - 1; n - 1, p)\, f_X(1; 1, p) + f_X(x; n - 1, p)\, f_X(0; 1, p).$$

Write out a narrative description of the meaning of this equation.

4.23 Work exercise 4.21 using the Poisson distribution to approximate the binomial.

4.24 Pool the data of exercise 4.21 so that a single estimate is obtained for p of the binomial
distribution. Compute the probability of 20 rainy days in the two-month period of July-August.
Compare this probability to the one computed in part (b) of exercise 4.21. Which answer would you
prefer?

4.25 Using the data of exercise 4.21, what is the probability that the sixth wet day of August
occurs on August 29, 30 or 31?

4.26 Show that for the Poisson process the time for n occurrences follows the gamma
distribution. (Hint: Use the method of derived distributions to find the distribution of the time to 2
occurrences. Using the distribution of the time to 2 occurrences the method of derived distributions
can be used to get the time to 3 occurrences. This process can then be repeated until a pattern
emerges. Induction could also be used by showing that if the time for n-1 occurrences is given by
equation 4.20 by substituting n-1 for n then the time for n occurrences is given by equation 4.20.
Also the time for 1 occurrence is given by equation 4.19 which is the same as equation 4.20 with n
= 1.)
