0% found this document useful (0 votes)

87 views24 pages

4.1 - Discrete Models PDF

1) The document describes two experiments involving rolling dice with classical probability distributions. Experiment 3a involves rolling one fair die, while Experiment 3b involves rolling two fair dice. 2) For each experiment, the sample space, probability mass function (pmf), and probability distribution are defined. The pmf assigns a probability of 1/6 to each outcome for Experiment 3a and probabilities ranging from 1/36 to 6/36 for Experiment 3b. 3) Key concepts of discrete probability distributions such as the cumulative distribution function (cdf) are introduced.

Uploaded by

ARDUINO BOLT

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

87 views24 pages

4.1 - Discrete Models PDF

Uploaded by

ARDUINO BOLT

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

Ismor Fischer, 5/26/2016 4.

1-1

4. Classical Probability Distributions

4.1 Discrete Models

FACT: Random variables can be used to define events that involve measurement!

Experiment 3a: Roll one fair die... Discrete random variable X = “value obtained”
Sample Space: S = {1, 2, 3, 4, 5, 6} #(S) = 6
Because the die is fair, each of the six faces has an equally likely probability of
occurring, i.e., 1/6. The probability distribution for X can be defined by a so-called
probability mass function (pmf) p(x), organized in a probability table, and
displayed via a corresponding probability histogram, as shown.

Event Probability “Uniform Distribution”

x p(x) = P(X = x)
1 1/6
2 1/6 1 1 1 1 1 1
3 1/6 6 6 6 6 6 6

4 1/6
5 1/6
6 1/6
1 X

Comment on notation:
P( X 4 ) = 1/6
Event

Translation: “The probability of rolling 4 is 1/6.”

Likewise for the other probabilities P(X = 1), P(X = 2),…, P(X = 6) in this example.
A mathematically succinct way to write such probabilities is by the notation P(X = x),
where x = 1, 2, 3, 4, 5, 6. In general therefore, since this depends on the value of x,
we can also express it as a mathematical function of x (specifically, the pmf; see
above), written p(x). Thus the two notations are synonymous and interchangeable.
The previous example could just as well have been written f(4) = 1/6.
Ismor Fischer, 5/26/2016 4.1-2

Experiment 3b: Roll two distinct, fair dice. Outcome = (Die 1, Die 2)

Sample Space: S = {(1, 1), …, (6, 6)} #(S) = 62 = 36

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)

(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

Discrete random variable X = “Sum of the two dice (2, 3, 4, …, 12).”

Events: “X = 2” = {(1, 1)} #(X = 2) = 1

“X = 3” = {(1, 2), (2, 1)} #(X = 3) = 2
“X = 4” = {(1, 3), (2, 2), (3, 1)} #(X = 4) = 3
“X = 5” = {(1, 4), (2, 3), (3, 2), (4, 1)} #(X = 5) = 4
“X = 6” = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} #(X = 6) = 5
“X = 7” = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} #(X = 7) = 6
“X = 8” = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} #(X = 8) = 5
“X = 9” = {(3, 6), (4, 5), (5, 4), (6, 3)} #(X = 9) = 4
“X = 10” = {(4, 6), (5, 5), (6, 4)} #(X = 10) = 3
“X = 11” = {(5, 6), (6, 5)} #(X = 11) = 2
“X = 12” = {(6, 6)} #(X = 12) = 1

Recall that, by definition, each event “X = x” (where x = 2, 3, 4,…, 12) corresponds

to a specific subset of outcomes from the sample space (of ordered pairs, in this
case). Because we are still assuming equal likelihood of each die face appearing,
the probabilities of these events can be easily calculated by the “shortcut” formula
#( A )
P ( A) . Question for later: What if the dice are “loaded” (i.e., biased)?
#( S )
Ismor Fischer, 5/26/2016 4.1-3

Again, the probability distribution for X can be organized in a probability table,

and displayed via a probability histogram, both of which enable calculations to be
done easily:

x p(x) = P(X = x)
2 1/36
3 2/36
4 3/36
6
5 4/36 36
5 5
6 5/36 36 36
7 6/36 4 4
36 36
8 5/36 3 3
9 4/36 36 36
2 2
10 3/36 1 36 36 1
11 2/36 36 36

12 1/36
3 5 7 9 11
1

 P(X = 7 or X = 11) Note that “X = 7” and “X = 11” are disjoint!

= P(X = 7) + P(X = 11) via Formula (3) above
= 6/36 + 2/36 = 8/36

 P(5 X 8)
= P(X = 5 or X = 6 or X = 7 or X = 8)
= P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8)
= 4/36 + 5/36 + 6/36 + 5/36
= 20/36
2 3 4 5 6 7 8 9 10 11 12

 P(X < 10) = 1 P(X 10) via Formula (1) above

= 1 [P(X = 10) + P(X = 11) + P(X = 12)]
= 1 [3/36 + 2/36 + 1/36] = 1 6/36 = 30/36
Exercise: How could event E = “Roll doubles” be characterized in terms of a
random variable? (Hint: Let Y = “Difference between the two dice.”)
Ismor Fischer, 5/26/2016 4.1-4

The previous example motivates the important topic of...

Discrete Probability Distributions
In general, suppose that all of the distinct population values of a discrete random
variable X are sorted in increasing order: x1 < x2 < x3 < …, with corresponding
probabilities of occurrence p(x1), p(x2), p(x3), … Formally then, we have the
following.

Definition: p(x) is a probability mass function for the discrete random variable X if,
for all x,
p(x) 0 AND p( x) = 1.
all x

In this case, p(x) = P(X = x), the probability that the value x occurs in the population.
The cumulative distribution function (cdf) is defined as, for all x,
F(x) = P(X x) = p( xi ) = p(x1) + p(x2) + … + p(x).
all xi x

Therefore, F is piecewise constant, increasing from 0 to 1.

Furthermore, for any two population values a < b, it follows that
b
P(a X b) = p( x) = F(b) – F(a )
a
where a is the value just preceding a in the sorted population.

Total Area = 1
Exercise: Sketch the cdf F(x)
for Experiments 3a and 3b above.

1
F(x3)
p(x2)
p(x3) F(x2)
p(x1)

Population Parameters μ and σ2 (vs. Sample Statistics x and s2)

population mean = the “expected value” of the random variable X

= the “arithmetic average” of all the population values

If X is a discrete numerical random variable, then…

μ = E[X] = x p(x), where pmf p(x) = P(X = x), the probability of x.

Compare this with the relative frequency definition of sample mean given in §2.3.

Properties of Mathematical Expectation

1. For any constant c, it follows that E[cX] = c E[X].

2. For any two random variables X and Y, it follows that
 E[X + Y] = E[X] + E[Y] and, via Property 1,
 E[X − Y] = E[X] − E[Y].

Any “operator” on variables satisfying 1 and 2 is said to be linear.

population variance = the “expected value” of the squared deviation of the

random variable X from its mean (μ)

If X is a discrete numerical random variable, then…

σ 2 = E[(X )2] = (x )2 p(x).
Equivalently,*
σ 2 = E[X 2] 2
= x2 p(x) 2
,
where pmf p(x) = P(X = x), the probability of x.

Compare the first with the definition of sample variance given in §2.3.
(The second is the analogue of the alternate computational formula.) Of course,
the population standard deviation σ is defined as the square root of the variance.

*Exercise: Algebraically expand the expression (X )2, and use the properties of expectation given above.
Ismor Fischer, 5/26/2016 4.1-6

Experiment 4: Two populations, where the daily number of calories consumed is

designated by X1 and X2, respectively.

Population 1
10%
0.4
2300
40% 20% 0.3
2600 2400

0.2
2500
0.1
30%

Probability Table

x p1(x)
 Mean(X1) = µ1 = (2300)(0.1) + (2400)(0.2) +
2300 0.1
(2500)(0.3) + (2600)(0.4) = 2500 cals
2400 0.2
2500 0.3  Var(X1) = σ12 = (–200)2(0.1) + (–100)2(0.2) +
(0)2(0.3) + (+100)2(0.4) = 10000 cals2
2600 0.4

Population 2

20%

2200 0.5

50% 2400
2300 0.3

30% 0.2

Probability Table

x p2(x)
2200 0.2  Mean(X2) = µ2 = (2200)(0.2) + (2300)(0.3) + (2400)(0.5) = 2330 cals
2300 0.3
 Var(X2) = σ22 = (–130)2(0.2) + (–30)2(0.3) + (70)2(0.5) = 6100 cals2
2400 0.5
Ismor Fischer, 5/26/2016 4.1-7
Summary (Also refer back to 2.4 - Summary)
POPULATION
Discrete random variable X

Probability Table Probability Histogram

x p(x) = P(X = x)
x1 f(x1)
x2 f(x2)
. .
. .
X
. .
= E[X] = x p(x)
Parameters
1
E[(X )2] = (x )2 p(x)
2
= or
E[X2] 2
= x2 p(x) 2

SAMPLE, size n

Relative Frequency Table Density Histogram

X and S 2
freq(x) can be shown
x p(x) = to be
n
unbiased
x1 p(x1) estimators of
x2 p(x2) and 2 ,
respectively.
. . That is,
. . X E X ,
. . and
x = x p(x)
Statistics

xk p(xk) E S2 2
.
n
(x x )2 p(x) (In fact, they
n 1 are MVUE.)
1 s2 = or
n
n 1
[ x2 p(x) x 2
]
Ismor Fischer, 5/26/2016 4.1-8

~ Some Advanced Notes on General Parameter Estimation ~

Suppose that is a fixed population parameter (e.g., ), and
ˆ is a sample-based estimator (e.g., X ). Consider all the POPULATION
random samples of a given size n, and the resulting “sampling Parameter
distribution” of ˆ values. Formally define the following:

 Mean (of ˆ ) = E [ ˆ] , the expected value of ˆ .

 Bias = E [ ˆ] , the difference between the expected
value of ˆ , and the “target” parameter .
2
 Variance (of ˆ ) = E ˆ E[ ˆ] , the expected value SAMPLE
Statistic ˆ
of the squared deviation of ˆ from its mean E [ ˆ] ,

or equivalently,* = E ˆ 2 E[ ˆ] 2 .

 Mean Squared Error (MSE) = E ( ˆ )2 , the expected value of the squared

difference between estimator ˆ and the “target” parameter .

Exercise: Prove* that MSE = Variance + Bias2 .

c ˆ Vector interpretation
c = a+b
b E[ ˆ]
E[c2 ] E[a2 ] E[b 2 ]
a ˆ E[ ˆ]

Comment: A parameter estimator ˆ is defined to be unbiased if E [ ˆ] , i.e.,

Bias = 0. In this case, MSE = Variance, so that if ˆ minimizes MSE, it then follows
that it has the smallest variance of any estimator. Such a highly desirable estimator is
called MVUE (Minimum Variance Unbiased Estimator). It can be shown that the
estimators X and S 2 (of and 2 , respectively) are MVUE, but finding such an
estimator ˆ for a general parameter can be quite difficult in practice. Often, one
must settle for either not having minimum variance or having a small amount of bias.

*
using the basic properties of mathematical expectation given earlier
Ismor Fischer, 5/26/2016 4.1-9

Related (but not identical) to this is the idea that of all linear combinations
c1x1 c2 x2 cn xn of the data {x1, x2 , , xn} (such as X , with c1 c2 cn 1/ n )
which are also unbiased, the one that minimizes MSE is called BLUE (Best Linear
Unbiased Estimator). It can be shown that, in addition to being MVUE (as stated
above), X is also BLUE. To summarize,

MVUE gives: Min Variance among all unbiased estimators

≤ Min Variance among linear unbiased estimators
= Min MSE among linear unbiased estimators (since MSE = Var + Bias2),
given by BLUE (by def).

The Venn diagram below depicts these various relationships.

Minimum Minimum
Variance MSE

MVUE

Minimum variance BLUE

among all unbiased
estimators S2
Minimum variance
among linear
X unbiased estimators

Unbiased Linear

Comment: If MSE 0 as n , then ˆ is said to have mean square convergence

to . This in turn implies “convergence in probability” (via “Markov's Inequality,”
also used in proving Chebyshev’s Inequality), i.e., ˆ is a consistent estimator of .
Ismor Fischer, 5/26/2016 4.1-10

Experiment 4 - revisited: Recall the previous example, where X1 and X2 represent

the daily number of calories consumed in two populations, respectively.

Population 1 Population 2
10%
20%
2300
2200
40% 20%
2600 2400
50% 2400
2300
2500
30%
30%

x p1(x) x p2(x)
2300 0.1
2200 0.2
2400 0.2
2300 0.3
2500 0.3
2600 0.4 2400 0.5
Mean(X1) = µ1 = 2500 cals; Mean(X2) = µ2 = 2330 cals;
Var(X1) = σ12 = 10000 cals2 Var(X2) = σ22 = 6100 cals2

Case 1: First suppose that X1 and X2 are statistically independent, as shown in the joint probability
distribution given in the table below. That is, each cell probability is equal to the product of the
corresponding row and column marginal probabilities. For example, P(X1 = 2300 ∩ X2 = 2200) = .02,
but this is equal to the product of the column marginal P(X1 = 2300) = .1 with the row marginal
P(X2 = 2200) = .2. Note that the marginal distributions for X1 and X2 remain the same as above, as can
be seen from the single-underlined values for X1, and respectively, the double-underlined values for X2.

X1 = # calories for Pop 1

2300 2400 2500 2600

X2 = # calories

2200 .02 .04 .06 .08 .20

for Pop 2

2300 .03 .06 .09 .12 .30

2400 .05 .10 .15 .20 .50

.10 .20 .30 .40 1.00

Ismor Fischer, 5/26/2016 4.1-11

Now imagine that we wish to compare the two populations, by considering the
probability distribution of the calorie difference D = X1 – X2 between them. (The sum
S = X1 + X2 is similar, and left as an exercise.)

Events Sample Space Probabilities

D=d Outcomes in the form of ordered pairs (X1, X2) from joint distribution
D = –100: (2300, 2400) .05
D = 0: (2300, 2300), (2400, 2400) .13 = .03 + .10
D = +100: (2300, 2200), (2400, 2300), (2500, 2400) .23 = .02 + .06 + .15
D = +200: (2400, 2200), (2500, 2300), (2600, 2400) .33 = .04 + .09 + .20
D = +300: (2500, 2200), (2600, 2300) .18 = .06 + .12
D = +400: (2600, 2200) .08

As an example, there are two possible ways that D = 300 can occur, i.e., two possible
outcomes corresponding to the event D = 300: Either A = “X1 = 2500 and X2 = 2200”
or B = “X1 = 2600 and X2 = 2300,” that is, A ⋃ B. For its probability, recall that
P( A B) P( A) P( B) P( A B). However, events A and B are disjoint, for they
cannot both occur simultaneously, so that the last term is P(A ⋂ B) = 0. Thus,
P( A B) P( A) P( B) with P(A) = .06 and P(B) = .12 from the joint distribution.

Mean(D) = µD =
(–100)(.05) + (0)(.13) + (100)(.23) +
(200)(.33) + (300)(.18) + (400)(.08)
.33
= 170 cals
i.e., µD = µ1 – µ2 (Check this!)
.23
.18
Var(D) = σD2 =
.13
.08 (–270)2(.05) + (–170)2(.13) + (–70)2(.23)
.05
+ (30)2(.33) + (130)2(.18) + (230)2(.08)
= 16100 cals2
i.e., σD2 = σ12 + σ22 (Check this!)
Ismor Fischer, 5/26/2016 4.1-12

Case 2: Now assume that X1 and X2 are not statistically independent, as given in the
joint probability distribution table below.

X1 = # calories for Pop 1

2300 2400 2500 2600

X2 = # calories 2200 .01 .03 .07 .09 .20
for Pop 2
2300 .02 .05 .10 .13 .30

2400 .07 .12 .13 .18 .50

.10 .20 .30 .40 1.00

The events “D = d” and the corresponding sample space of outcomes remain unchanged,
but the last column of probabilities has to be recalculated, as shown. This results in a
slightly different probability histogram (Exercise) and parameter values.

Events Sample Space Probabilities

D=d Outcomes in the form of ordered pairs (X1, X2) from joint distribution
D = –100: (2300, 2400) .07
D = 0: (2300, 2300), (2400, 2400) .14 = .02 + .12
D = +100: (2300, 2200), (2400, 2300), (2500, 2400) .19 = .01 + .05 + .13
D = +200: (2400, 2200), (2500, 2300), (2600, 2400) .31 = .03 + .10 + .18
D = +300: (2500, 2200), (2600, 2300) .20 = .07 + .13
D = +400: (2600, 2200) .09

Mean(D) = µD = (–100)(.07) + (0)(.14) + (100)(.19) + (200)(.33) + (300)(.18) + (400)(.08)

= 170 cals, i.e., µD = µ1 – µ2.

Var(D) = σD2 = (–270)2(.07) + (–170)2(.14) + (–70)2(.19) + (30)2(.31) + (130)2(.20) + (230)2(.09)

= 18517 cals2

It seems that “the mean of the difference is equal to the difference in the means” still
holds, even when the two populations are dependent. But the variance of the difference
is no longer necessarily equal the sum of the variances, as with independent populations.
Ismor Fischer, 5/26/2016 4.1-13

These examples illustrate a general principle that can be rigorously proved with mathematics.

GENERAL FACT ~

Mean(X + Y) = Mean(X) + Mean(Y) and Mean(X – Y) = Mean(X) – Mean(Y)

In addition, if X and Y are independent random variables,

Var(X + Y) = Var(X) + Var(Y) and Var(X – Y) = Var(X) + Var(Y).

Comments:

 These formulas actually apply to both discrete and continuous variables (next section).
 The difference relations will play a crucial role in 6.2 - Two Samples inference.
 If X and Y are dependent, then the two bottom relations regarding the variance also
involve an additional term, Cov(X, Y), the population covariance between X and Y.
See problems 4.3/29 and 4.3/30 for details.

 The variance relation can be interpreted visually via the Pythagorean Theorem,
which illustrates an important geometric connection, expanded in the Appendix.]

Certain discrete distributions (or discrete models) occur so frequently in practice, that
their properties have been well-studied and applied in many different scenarios. For
instance, suppose it is known that a certain population consists of 45% males (and thus
55% females). If a random sample of 250 individuals is to be selected, then what is the
probability of obtaining exactly 100 males? At most 100 males? At least 100 males?
What is the “expected” number of males? This is the subject of the next topic:
Ismor Fischer, 5/26/2016 4.1-14

POPULATION = Women diagnosed

with breast cancer in Dane County,
1996-2000
Among other things, this study
estimated that the rate of “breast cancer
in situ (BCIS),” which is diagnosed
almost exclusively via mammogram, is
approximately 12-13%. That is, for any
individual randomly selected from this
population, we have a binary variable
1, with probability 0.12
BCIS
0, with probability 0.88.

In a random sample of n 100 breast

cancer diagnoses, let
X = # BCIS cases (0,1,2, ,100) .

Questions:
 How can we model the probability
distribution of X, and under what
assumptions?
 Probabilities of events, such as
P( X 0), P( X 20), P( X 20),
etc.?
Full article available online at this link.
 Mean # BCIS cases = ?
 Standard deviation of # BCIS cases = ?
Ismor Fischer, 5/26/2016 4.1-15

Binomial Distribution (Paradigm model = coin tosses)

Binary random variable: Probability:

1, Success (Heads) with P(Success) =
Y =
0, Failure (Tails) with P(Failure) = 1

Experiment: n = 5 independent coin tosses

Sample Space S = {(H H H H H), …, (T T T T T)} #(S) = 25 = 32

(H H H H H) (H H T H H) (H T H H H) (H T T H H) (T H H H H) (T H T H H) (T T H H H) (T T T H H)
(H H H H T) (H H T H T) (H T H H T) (H T T H T) (T H H H T) (T H T H T) (T T H H T) (T T T H T)
(H H H T H) (H H T T H) (H T H T H) (H T T T H) (T H H T H) (T H T T H) (T T H T H) (T T T T H)
(H H H T T) (H H T T T) (H T H T T) (H T T T T) (T H H T T) (T H T T T) (T T H T T) (T T T T T)

Random Variable: X = “# Heads in n = 5 independent tosses (0, 1, 2, 3, 4, 5)”

5
Events: “X = 0” = Exercise #(X = 0) = 0
= 1
5
“X = 1” = Exercise #(X = 1) = 1
= 5
5
“X = 2” = Exercise #(X = 2) = 2
= 10
5
“X = 3” = see above #(X = 3) = 3
= 10
5
“X = 4” = Exercise #(X = 4) = 4
= 5
5
“X = 5” = Exercise #(X = 5) = 5
= 1

n
Recall: For x = 0, 1, 2, …, n, the combinatorial symbol x – read “n-choose-x” – is
n!
defined as the value , and counts the number of ways of rearranging x objects
x! (n x)!
among n objects. See Appendix > Basic Reviews > Perms & Combos for details.

n
r is computed via the mathematical function “nCr” on most calculators.
Note:
Ismor Fischer, 5/26/2016 4.1-16

Probabilities:
First assume the coin is fair ( = 0.5 1 = 0.5), i.e., equally likely elementary
outcomes H and T on a single trial. In this case, the probability of any event A above
can thus be easily calculated via P(A) = #(A) / #(S).

1 5
x P(X = x) =
25 x Total Area = 1
0 1/32 = 0.03125
1 5/32 = 0.15625
2 10/32 = 0.312500
3 10/32 = 0.312500
4 5/32 = 0.15625
5 1/32 = 0.03125

Now consider the case where the coin is biased (e.g., = 0.7 1 = 0.3).
Calculating P(X = x) for x = 0, 1, 2, 3, 4, 5 means summing P(all its outcomes).

Example: P(X = 3) =

outcome via independence of H, T

P(H H H T T) = (0.7)(0.7)(0.7)(0.3)(0.3) = (0.7)3 (0.3)2

+ P(H H T H T) = (0.7)(0.7)(0.3)(0.7)(0.3) = (0.7)3 (0.3)2
+ P(H H T T H) = (0.7)(0.7)(0.3)(0.3)(0.7) = (0.7)3 (0.3)2
+ P(H T H H T) = (0.7)(0.3)(0.7)(0.7)(0.3) = (0.7)3 (0.3)2 via disjoint outcomes,

+ P(H T H T H) = (0.7)(0.3)(0.7)(0.3)(0.7) = (0.7)3 (0.3)2 5 3 2

=
3 (0.7) (0.3)
+ P(H T T H H) = (0.7)(0.3)(0.3)(0.7)(0.7) = (0.7)3 (0.3)2
+ P(T H H H T) = (0.3)(0.7)(0.7)(0.7)(0.3) = (0.7)3 (0.3)2
+ P(T H H T H) = (0.3)(0.7)(0.7)(0.3)(0.7) = (0.7)3 (0.3)2
+ P(T H T H H) = (0.3)(0.7)(0.3)(0.7)(0.7) = (0.7)3 (0.3)2
+ P(T T H H H) = (0.3)(0.3)(0.7)(0.7)(0.7) = (0.7)3 (0.3)2
Ismor Fischer, 5/26/2016 4.1-17

Hence, we similarly have…

5 Total Area = 1
x P(X = x) = (0.7)x (0.3)5 x
x
5
0 (0.7)0 (0.3)5 = 0.00243
0
5
1 (0.7)1 (0.3)4 = 0.02835
1
5
2 (0.7)2 (0.3)3 = 0.13230
2
5
3 (0.7)3 (0.3)2 = 0.30870
3
5
4 (0.7)4 (0.3)1 = 0.36015
4
5
5 (0.7)5 (0.3)0 = 0.16807
5

Example: Suppose that a certain medical procedure is known to have a 70%

successful recovery rate (assuming independence). In a random sample of n = 5
patients, the probability that three or fewer patients will recover is:

Method 1: P(X 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

= 0.00243 + 0.02835 + 0.13230 + 0.30870 = 0.47178

Method 2: P(X 3) = 1 [ P(X = 4) + P(X = 5) ]

= 1 [0.36015 + 0.16807 ] = 1 – 0.52822 = 0.47178

Example: The mean number of patients expected to recover is:

= E[X] = 0 (0.00243) + 1 (0.02835) + 2 (0.13230) + 3 (0.30870) + 4 (0.36015) + 5 (0.16807)

= 3.5 patients

This makes perfect sense for n = 5 patients with a = 0.7 recovery probability, i.e.,
their product. In the probability histogram above, the “balance point” fulcrum
indicates the mean value of 3.5.
Ismor Fischer, 5/26/2016 4.1-18

General formulation:

The Binomial Distribution

Let the discrete random variable X = “# Successes in n independent Bernoulli trials
(0, 1, 2, …, n),” each having constant probability P(Success) = , and hence
P(Failure) = 1 . Then the probability of obtaining any specified number of
successes x = 0, 1, 2, …, n, is given by the pmf p(x):
n x
P(X = x) = x (1 )n x
.

We say that X has a Binomial Distribution, denoted X ~ Bin(n, ).

Furthermore, the mean = n , and the standard deviation = n (1 ).

Example: Suppose that a certain spontaneous medical condition affects 1% (i.e., = 0.01)
of the population. Let X = “number of affected individuals in a random sample of n = 300.”
Then X ~ Bin(300, 0.01), i.e., the probability of obtaining any specified number x = 0, 1, 2,
…, 300 of affected individuals is:
300
P(X = x) = x (0.01)x (0.99)300 x .

The mean number of affected individuals is = n = (300)(0.01) = 3 expected cases, with a

standard deviation of = (300)(0.01)(0.99) = 1.723 cases.

Probability Table for Binomial Dist.

n x
x p(x) = (1 )n x
x
n 0 Exercise: In order to be a valid distribution,
0 (1 )n 0
0 the sum of these probabilities must = 1. Prove it.
n 1 Hint: First recall the Binomial Theorem:
1 (1 )n 1
1 How do you expand the algebraic expression
n (a b) n for any n = 0, 1, 2, 3, …? Then replace
2
2 (1 )n 2
a with , and b with 1 – . Voilà!
2
etc. etc.
n n
n (1 )n n
n
1
Ismor Fischer, 5/26/2016 4.1-19

Comments:
 The assumption of independence of the trials is absolutely critical! If not satisfied – i.e.,
if the “success” probability of one trial influences that of another – then the Binomial
Distribution model can fail miserably. (Example: X = “number of children in a particular
school infected with the flu”) The investigator must decide whether or not independence
is appropriate, which is often problematic. If violated, then the correlation structure
between the trials may have to be considered in the model.

 As in the preceding example, if the sample size n is very large, then the computation of
n
for x = 0, 1, 2, …, n, can be intensive and impractical. An approximation to the
x
Binomial Distribution exists, when n is large and is small, via the Poisson Distribution
(coming up…).

 Note that the standard deviation = n (1 ) depends on the value of . (Later…)

Ismor Fischer, 5/26/2016 4.1-20

How can we estimate the parameter , using a sample-based statistic ˆ ?

POPULATION
Binary random variable
1, Success with probability
Y =
0, Failure with probability 1

Experiment: n independent trials

SAMPLE
0/1 0/1 0/1 0/1 0/1 0/1 … 0/1
(y1, y2, y3, y4, y5, y6, …, yn)

y1 + y2 + y3 + y4 + y5 + … + yn

Let X = # Successes in n trials ~ Bin(n, )

(n X = # Failures in n trials).

Therefore, dividing by n…
X
n = proportion of Successes in n trials

ˆ = p ( = y , as well)
and hence…
q = 1 p = proportion of Failures in n trials.

Example: If, in a sample of n = 50 randomly selected individuals, X = 36 are female,

X 36
then the statistic ˆ = = = 0.72 is an estimate of the true probability that a
n 50
randomly selected individual from the population is female. The probability of
selecting a male is therefore estimated by 1 ˆ = 0.28 .
Ismor Fischer, 5/26/2016 4.1-21

 Poisson Distribution (Models rare events)

Discrete Random Variable:

X = # occurrences of a (rare) event E, in a given interval
of time or space, of size T. (0, 1, 2, 3, …)

0 T

Assume:
1. All the occurrences of E are independent in the interval.
2. The mean number of expected occurrences of E in the interval is proportional
to T, i.e., = T. This constant of proportionality is called the rate of the
resulting Poisson process.

Then…

The Poisson Distribution

The probability of obtaining any specified number x = 0, 1, 2, … of

occurrences of event E is given by the pmf p(x):
x
e
P(X = x) =
x!

where e = 2.71828… (“Euler’s constant”).

We say that X has a Poisson Distribution, denoted X ~ Poisson( ).

Furthermore, the mean is = T, and the variance is 2 = T also.

Examples: # bee-sting fatalities per year, # spontaneous cancer remissions per year,
# accidental needle-stick HIV cases per year, hemocytometer cell counts
Ismor Fischer, 5/26/2016 4.1-22

Example (see above): Again suppose that a certain spontaneous medical condition E
affects 1% (i.e., = 0.01) of the population. Let X = “number of affected individuals
in a random sample of T = 300.” As before, the mean number of expected occurrences
of E in the sample is = T = (0.01)(300) = 3 cases. Hence X ~ Poisson(3), and the
probability that any number x = 0, 1, 2, … of individuals are affected is given by:
e 3 3x
P(X = x) =
x!
which is a much easier formula to work with than the previous one. This fact is
sometimes referred to as the Poisson approximation to the Binomial Distribution,
when T (respectively, n) is large, and (respectively, ) is small. Note that in this
example, the variance is also 2 = 3, so that the standard deviation is = 3 = 1.732,
very close to the exact Binomial value.

Binomial Poisson
3
x 300 e 3x
P(X = x) = x (0.01)x (0.99)300 x
P(X = x) =
x!
0 0.04904 0.04979
1 0.14861 0.14936
2 0.22441 0.22404
3 0.22517 0.22404
4 0.16888 0.16803
5 0.10099 0.10082
6 0.05015 0.05041
7 0.02128 0.02160
8 0.00787 0.00810
9 0.00258 0.00270
10 0.00076 0.00081
etc. 0 0

Area = 1
Area = 1
Ismor Fischer, 5/26/2016 4.1-23

Why is the Poisson Distribution a good approximation to the

Binomial Distribution, for large n and small ?
Rule of Thumb: n 20 and 0.05; excellent if n 100 and 0.1.
x
n x n x e
Let pBin(x) = x (1 ) and pPoisson(x) = , where = n .
x!
We wish to show formally that, for fixed , and x = 0, 1, 2, …, we have:
lim pBin(x) = pPoisson(x).
n
0

Proof: By elementary algebra, it follows that…

n x
pBin(x) = x (1 )n x

n! x
= (1 ) n (1 ) x
Siméon Poisson
x! (n x)!
(1781 - 1840)
1 n
x x
= n (n 1) (n 2) ... (n x + 1) 1 (1 )
x! n

1 n (n 1) (n 2) ... (n x + 1) n
x x x
= n 1 (1 )
x! nx n

1 n n 1 n 2 n x+1 n
= … (n )x 1 (1 ) x
x! n n n n n

1 1 2 x 1 n
x x
= 1 1 1 … 1 1 (1 )
x! n n n n

As n ,
0,
1 x x
1(1)(1) … (1) = 1 e 1 =1
x!
x
e
= = pPoisson(x). QED
x!
Ismor Fischer, 5/26/2016 4.1-24

Classical Discrete Probability Distributions

Binomial (probability of finding x “successes” and n – x “failures” in n independent trials)
X = # successes (each with probability ) in n independent Bernoulli trials, n = 1, 2, 3, …

n x
p(x) = P(X = x) = x
(1 )n x
, x = 0, 1, 2, …, n

Negative Binomial (probability of needing x independent trials to find k successes)

X = # independent Bernoulli trials for k successes (each with probability ), k = 1, 2, 3, …

x 1 k
p(x) = P(X = x) = k 1
(1 )x k
, x = k, k + 1, k + 2, …

Geometric: X = # independent Bernoulli trials for k = 1 success

p(x) = P(X = x) = (1 )x 1
, x = 1, 2, 3, …

Hypergeometric (modification of Binomial to sampling without replacement from “small” finite populations, relative to n.)

N
X = # successes in n random trials taken from a population of size N containing d successes, n >
10
d N d
x n x
p(x) = P(X = x) = N
, x = 0, 1, 2, …, d
n

Multinomial (generalization of Binomial to k categories, rather than just two)

For i = 1, 2, 3, …, k,
Xi = # outcomes in category i (each with probability i), in n independent Bernoulli trials, n = 1, 2, 3, …
1 2 3 k 1
n!
p(x1, x2, …, xk) = P(X1 = x1, X2 = x2, …, Xk = xk) = x1 x2 xk
,
x1! x2! … xk! 1 2 k

xi = 0, 1, 2, …, n with x1 + x2 + … + xk = n

Poisson (“limiting case” of Binomial, with n and 0, such that n = , fixed)

X = # occurrences of a rare event (i.e., 0) among many (i.e., n large), with fixed mean =n
x
e
p(x) = P(X = x) = , x = 0, 1, 2, …
x!

Probability and Statistics 2 Notes
75% (4)
Probability and Statistics 2 Notes
20 pages
4.1 - Discrete Statistics
No ratings yet
4.1 - Discrete Statistics
24 pages
Random Variables
No ratings yet
Random Variables
26 pages
Chapter 3
100% (1)
Chapter 3
9 pages
Discrete Random Variable
No ratings yet
Discrete Random Variable
41 pages
P&S Unit 1
No ratings yet
P&S Unit 1
50 pages
Unit 5 Random Variables: Structure
No ratings yet
Unit 5 Random Variables: Structure
20 pages
Block 2
No ratings yet
Block 2
87 pages
Ignou Stat
No ratings yet
Ignou Stat
320 pages
TOPIC8. Random Variables and Probability Distributions
100% (1)
TOPIC8. Random Variables and Probability Distributions
8 pages
Aem Probability PDF
No ratings yet
Aem Probability PDF
10 pages
Unit7 Probability Statistics I-1
No ratings yet
Unit7 Probability Statistics I-1
49 pages
Wa0007.
No ratings yet
Wa0007.
111 pages
211probability Distribution 1
No ratings yet
211probability Distribution 1
14 pages
Lecture 4
No ratings yet
Lecture 4
39 pages
Probability Theory Lecture Notes
No ratings yet
Probability Theory Lecture Notes
68 pages
S-11 - Random Variables and Discrete Probability Distributions
No ratings yet
S-11 - Random Variables and Discrete Probability Distributions
24 pages
UPDATED Complete Notes of BCS301
No ratings yet
UPDATED Complete Notes of BCS301
93 pages
L-6 Probability Distribution
No ratings yet
L-6 Probability Distribution
58 pages
Chapter 4
80% (5)
Chapter 4
21 pages
Chapter 3
No ratings yet
Chapter 3
26 pages
Discrete Probability Distributions Ppt2
No ratings yet
Discrete Probability Distributions Ppt2
20 pages
Sta 2200 Notes
No ratings yet
Sta 2200 Notes
55 pages
RV and Their Prob Dist Complete
No ratings yet
RV and Their Prob Dist Complete
13 pages
Chapter 5 Sampling in Discrete Even Simulation
No ratings yet
Chapter 5 Sampling in Discrete Even Simulation
56 pages
Stat 153 Lecture Five
No ratings yet
Stat 153 Lecture Five
46 pages
Discrete-Random Variable
No ratings yet
Discrete-Random Variable
30 pages
Chapter 3
No ratings yet
Chapter 3
35 pages
Random Variables
No ratings yet
Random Variables
44 pages
Probability and Statistics PG 83,84,85
No ratings yet
Probability and Statistics PG 83,84,85
98 pages
Ma s1-2 Discrete-Probability-Distributions 201024
No ratings yet
Ma s1-2 Discrete-Probability-Distributions 201024
41 pages
Random Variable and Mathematical Expectation
No ratings yet
Random Variable and Mathematical Expectation
9 pages
Uperior Niversity Ahore: (D O S E)
No ratings yet
Uperior Niversity Ahore: (D O S E)
6 pages
04 ES Random Variables
No ratings yet
04 ES Random Variables
17 pages
Random Variable Slides
No ratings yet
Random Variable Slides
41 pages
Week05-06 EC With Annotations
No ratings yet
Week05-06 EC With Annotations
84 pages
E3BAIS Chapter 1
No ratings yet
E3BAIS Chapter 1
74 pages
r3MM - Chapter 1
No ratings yet
r3MM - Chapter 1
72 pages
Introduction To Probability: 2.1 Random Variable
No ratings yet
Introduction To Probability: 2.1 Random Variable
4 pages
Random Variable and Bivariate Distributions
No ratings yet
Random Variable and Bivariate Distributions
124 pages
Distribution of Probability
No ratings yet
Distribution of Probability
34 pages
Book - Applied - Mathematics - ClassXII-208-382-1-135
No ratings yet
Book - Applied - Mathematics - ClassXII-208-382-1-135
135 pages
Book - Applied - Mathematics - ClassXII-208-382
No ratings yet
Book - Applied - Mathematics - ClassXII-208-382
175 pages
Stat
No ratings yet
Stat
114 pages
Probabiliti Theory Notes
No ratings yet
Probabiliti Theory Notes
21 pages
Probability-Bayes Theorem and Discrete
No ratings yet
Probability-Bayes Theorem and Discrete
46 pages
Probability and Statistics II MAY 2023
No ratings yet
Probability and Statistics II MAY 2023
51 pages
Prob Distributions
No ratings yet
Prob Distributions
12 pages
Chapter 3 & 4
No ratings yet
Chapter 3 & 4
12 pages
Unit 4. Probability Distributions
No ratings yet
Unit 4. Probability Distributions
18 pages
Statistical Method
No ratings yet
Statistical Method
227 pages
Module 05
No ratings yet
Module 05
94 pages
STAT515 Lecture
No ratings yet
STAT515 Lecture
85 pages
A 18-Page Statistics & Data Science Cheat Sheets
No ratings yet
A 18-Page Statistics & Data Science Cheat Sheets
18 pages
Lesson - Plan in General Mathematics
No ratings yet
Lesson - Plan in General Mathematics
5 pages
Brownian Motion A Guide To Random Processes and Stochastic Calculus 3rd Edition Ren L Schilling With A Chapter On Simulation by BJRN Bttcher Download
No ratings yet
Brownian Motion A Guide To Random Processes and Stochastic Calculus 3rd Edition Ren L Schilling With A Chapter On Simulation by BJRN Bttcher Download
82 pages
Bayes' Theorem and Its Applications
0% (1)
Bayes' Theorem and Its Applications
11 pages
Chapter 5 - Probability
No ratings yet
Chapter 5 - Probability
97 pages
Maths Practical
No ratings yet
Maths Practical
11 pages
Basic Probabilty Questions PDF
No ratings yet
Basic Probabilty Questions PDF
5 pages
Week 7
No ratings yet
Week 7
9 pages
Econ 2042 - Course Outline
No ratings yet
Econ 2042 - Course Outline
4 pages
5 Joint Probability Distribution
No ratings yet
5 Joint Probability Distribution
28 pages
Six Sigma Applications
No ratings yet
Six Sigma Applications
300 pages
Lecture Two Micr-I
No ratings yet
Lecture Two Micr-I
28 pages
Distribution Theory Questionnaire
No ratings yet
Distribution Theory Questionnaire
3 pages
Chapter Three: 3. Random Variables and Probability Distributions 3.1. Concept of A Random Variable
No ratings yet
Chapter Three: 3. Random Variables and Probability Distributions 3.1. Concept of A Random Variable
6 pages
Contoh Job Hazard Analisis
No ratings yet
Contoh Job Hazard Analisis
6 pages
Monte Carlo Simulation Trading Plan
No ratings yet
Monte Carlo Simulation Trading Plan
2 pages
Histogram of The Probability Mass Function
No ratings yet
Histogram of The Probability Mass Function
10 pages
The Birthday Problem
No ratings yet
The Birthday Problem
4 pages
Mathematics: Quarter 4 - Module 17: Analyzing Data Obtained From Chance Using Experiments
No ratings yet
Mathematics: Quarter 4 - Module 17: Analyzing Data Obtained From Chance Using Experiments
17 pages
Introduction To Risk and Components of Risk: Lecture # 2 Engr. Yaseen Mahmood M.Sc. Earthquake Engineering
No ratings yet
Introduction To Risk and Components of Risk: Lecture # 2 Engr. Yaseen Mahmood M.Sc. Earthquake Engineering
26 pages
Leemis McQueston 08 Univariate Distributions Relationship
No ratings yet
Leemis McQueston 08 Univariate Distributions Relationship
9 pages
(1982) Kahneman & Tversky - Variants of Uncertainty
100% (1)
(1982) Kahneman & Tversky - Variants of Uncertainty
15 pages
CIV 442 Hydrology: Lecture 4C: Hyetograph-Probability
No ratings yet
CIV 442 Hydrology: Lecture 4C: Hyetograph-Probability
33 pages
Gr.10 Apan DLP Wk. 5 q1
No ratings yet
Gr.10 Apan DLP Wk. 5 q1
8 pages
Ethangreen18498514 2b Unit
No ratings yet
Ethangreen18498514 2b Unit
18 pages
Bayesian Network Approach To Assessing System Reliability For Improving System Design and Optimizing System Maintenance
No ratings yet
Bayesian Network Approach To Assessing System Reliability For Improving System Design and Optimizing System Maintenance
151 pages
Module 5
No ratings yet
Module 5
18 pages
Chapter 4 Functions of Random Variables
No ratings yet
Chapter 4 Functions of Random Variables
34 pages
5.2 Notes For Fair Game
No ratings yet
5.2 Notes For Fair Game
2 pages
Mathematics & Statistic QAT Question Bank
No ratings yet
Mathematics & Statistic QAT Question Bank
30 pages
Week 1-3 - Risk Models
No ratings yet
Week 1-3 - Risk Models
61 pages