
2P7: Probability & Statistics

Discrete Probability Distributions

Thierry Savin

Lent 2024

[Cover figure: a royal flush, the best possible hand in poker, which has a probability of 0.000154%.]
Introduction
Course’s contents

1. Probability Fundamentals

2. Discrete Probability Distributions

3. Continuous Random Variables

4. Manipulating and Combining Distributions

5. Decision, Estimation and Hypothesis Testing

1/19
Introduction
This lecture’s contents

Introduction

The Bernoulli Distribution

The Geometric Distribution

The Binomial Distribution

The Poisson Distribution

2/19
Introduction
Discrete Probability Distributions

In the last lecture:


▶ We have reviewed the fundamental concepts of probability
▶ We have seen how discrete random variables are defined and
introduced the probability mass function
▶ We have shown how to calculate the expectation, variance
and entropy of a discrete random variable
In this lecture, we will see some examples of discrete probability
distributions
▶ There are many: see en.wikipedia.org/wiki/List_of_probability_distributions
▶ Here, a small selection of the most important ones
▶ Many discrete random variables originate from the binary
random variable, and we'll start with this one

3/19
The Bernoulli Distribution
Definition [DB p.27]

A binary random variable X is said to have a Bernoulli distribution¹ with parameter p ∈ [0, 1] if:

$$X \sim \mathrm{Ber}(p) \iff P_X(k) = \begin{cases} p & \text{if } k = 1,\\ 1 - p & \text{if } k = 0,\\ 0 & \text{otherwise.} \end{cases}$$
The symbol “∼” means “distributed as”. The support of X, $\mathcal{X} = \{0, 1\}$, is discrete and finite.
[Figure: PMF of Ber(p) for p = 1/4, 1/2 and 2/3, with bars at k = 0 and k = 1.]
¹Named after the Swiss mathematician Jacob Bernoulli (1655-1705).
4/19
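To make the definition concrete, here is a minimal Python sketch (an editorial illustration, not from the course; the helper names bernoulli_pmf and sample_bernoulli are made up) of evaluating the PMF and drawing samples from Ber(p):

```python
import random

def bernoulli_pmf(k: int, p: float) -> float:
    """PMF of Ber(p): p at k = 1, 1 - p at k = 0, zero elsewhere."""
    if k == 1:
        return p
    if k == 0:
        return 1.0 - p
    return 0.0

def sample_bernoulli(p: float) -> int:
    """Draw one sample from Ber(p)."""
    return 1 if random.random() < p else 0

p = 2 / 3
# The PMF sums to 1 over the support {0, 1}
assert abs(bernoulli_pmf(0, p) + bernoulli_pmf(1, p) - 1.0) < 1e-12
print([sample_bernoulli(p) for _ in range(10)])  # e.g. [1, 1, 0, 1, ...]
```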
The Bernoulli Distribution
Properties of X ∼ Ber(p)

▶ Expectation E[X] = p [DB p.27]

$$E[X] = \sum_{k \in \mathcal{X}} k\, P_X(k) = 0 \times (1 - p) + 1 \times p = p$$

▶ Variance Var[X] = p(1 − p) [DB p.27]

$$\mathrm{Var}[X] = \sum_{k \in \mathcal{X}} (k - p)^2\, P_X(k) = (-p)^2 (1 - p) + (1 - p)^2\, p = p(1 - p)$$

▶ Entropy H[X] = H₂(p) = −p log₂ p − (1 − p) log₂(1 − p)

Immediate from:

$$H[X] = -\sum_{k \in \mathcal{X}} P_X(k) \log_2 P_X(k)$$

H₂(p) is known as the binary entropy function. The maximum of H₂ is at p = 1/2, where our uncertainty is complete.

[Figure: H[X] = H₂(p) plotted against p ∈ [0, 1], rising from 0 at p = 0 to 1 bit at p = 1/2 and back to 0 at p = 1.]

5/19
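As a quick numerical sanity check of these three properties, a sketch along these lines (the sample size and the value of p are arbitrary choices) compares large-sample estimates against the formulas above:

```python
import random
from math import log2

p = 0.25
samples = [1 if random.random() < p else 0 for _ in range(100_000)]

# Sample mean and variance should approach p and p(1 - p)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean ≈ {mean:.4f} (theory {p}), var ≈ {var:.4f} (theory {p * (1 - p)})")

def h2(p: float) -> float:
    """Binary entropy function H2(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# H2 is maximal (1 bit) at p = 1/2, where uncertainty is complete
print(f"H2({p}) = {h2(p):.4f} bits, H2(0.5) = {h2(0.5):.4f} bits")
```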
The Bernoulli Distribution
Examples

Bernoulli distributions occur in many scenarios:


▶ They are indicator random variables for events, e.g.:
• Will it rain tomorrow? A simple climate model may indicate
rain with X ∼ Ber(p) where different values of p would be
used for different areas.
• Will the UK economy grow above expectation?
• Will the message be received and decoded correctly?
▶ They occur naturally as answers to yes/no questions, e.g.
• Is the product defective?
• Did the defendant murder the victim?
▶ They also occur in their own right in digital communications,
where information is often encoded into binary symbols.
▶ Probability textbooks often illustrate Bernoulli distributions
using “biased coins”. These are coins that have different
probabilities of landing on “heads” or “tails”.
6/19
The Geometric Distribution
Definition

How many trials do I need to be successful?


▶ Suppose we look at a succession X₁, X₂, ... of mutually
independent Bernoulli-distributed random variables (each
being called a trial) with the same p, and we measure the
probability that the first “success” happens after k trials
▶ That is, X_k = 1, and X_j = 0 for all j ≤ k − 1:

$$\begin{aligned}
P[\text{“1st success at the }k\text{th trial”}] &= P[X_1 = 0 \cap X_2 = 0 \cap \dots \cap X_{k-1} = 0 \cap X_k = 1]\\
&= P[X_1 = 0] \times P[X_2 = 0] \times \dots \times P[X_{k-1} = 0] \times P[X_k = 1]\\
&= (1 - p) \times (1 - p) \times \dots \times (1 - p) \times p\\
&= (1 - p)^{k-1}\, p
\end{aligned}$$

▶ This is called “Geometric (2)” in the Data Book;
“Geometric (1)” is for a 1st success after k failures.
7/19
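The derivation lends itself to direct simulation; this sketch (illustrative, with an arbitrary p) repeats Ber(p) trials until the first success and compares the empirical frequencies of k against (1 − p)^(k−1) p:

```python
import random
from collections import Counter

def trials_until_success(p: float) -> int:
    """Repeat independent Ber(p) trials; return the index k of the 1st success."""
    k = 1
    while random.random() >= p:  # failure, with probability 1 - p
        k += 1
    return k

p, n_runs = 0.3, 200_000
counts = Counter(trials_until_success(p) for _ in range(n_runs))

for k in range(1, 6):
    empirical = counts[k] / n_runs
    theory = (1 - p) ** (k - 1) * p
    print(f"k = {k}: empirical {empirical:.4f}, theory {theory:.4f}")
```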
The Geometric Distribution
Definition [DB p.27]

A random variable X is said to have a geometric distribution with parameter p ∈ [0, 1] if:

$$X \sim \mathrm{Geo}(p) \iff P_X(k) = \begin{cases} p\,(1 - p)^{k-1} & \text{if } k \in \{1, 2, \dots\},\\ 0 & \text{otherwise.} \end{cases}$$
The support of X, $\mathcal{X} = \{1, 2, \dots\}$, is discrete and infinite.
[Figure: PMF of Geo(p) for p = 1/4, 1/2 and 3/4, plotted for k = 0 to 10.]

Verify that $\sum_{k \in \mathcal{X}} P_X(k) = 1$ using the geometric series $\sum_{k=1}^{\infty} a^k = \frac{a}{1 - a}$ [DB p.4]
8/19
The Geometric Distribution
Properties of X ∼ Geo(p)

▶ Expectation E[X] = 1/p [DB p.27]

With q = 1 − p:
$$E[X] = \sum_{k=1}^{\infty} k\, p\, q^{k-1} = p\, \frac{d}{dq} \sum_{k=1}^{\infty} q^k = p\, \frac{d}{dq}\!\left(\frac{q}{1 - q}\right) = \frac{1}{p}$$
▶ Variance Var[X] = (1 − p)/p² [DB p.27]

$$E[X^2] = \sum_{k=1}^{\infty} k^2\, p\, q^{k-1} = q\,p\, \frac{d^2}{dq^2} \sum_{k=1}^{\infty} q^k + \sum_{k=1}^{\infty} k\, p\, q^{k-1} \quad \text{so}$$
$$\mathrm{Var}[X] = q\,p\, \frac{d^2}{dq^2}\!\left(\frac{q}{1 - q}\right) + \frac{1}{p} - \frac{1}{p^2} = \frac{1 - p}{p^2}$$
▶ Entropy H[X] = H₂(p)/p

$$H[X] = -\sum_{k=1}^{\infty} p\, q^{k-1} \log_2\!\left(p\, q^{k-1}\right) = \log_2\!\left(\frac{q}{p}\right) \times \frac{p}{q} \sum_{k=1}^{\infty} q^k \;-\; \log_2 q \times \sum_{k=1}^{\infty} k\, p\, q^{k-1} \;=\; \frac{H_2(p)}{p}$$

It diverges at p = 0, where success is highly unexpected...

[Figure: H[X] = H₂(p)/p plotted against p ∈ [0, 1], diverging as p → 0.]
9/19
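These three results can also be checked numerically by truncating the infinite sums; in this sketch the truncation point K is an arbitrary choice beyond which the geometric tail is negligible:

```python
from math import log2

p = 0.3
q = 1 - p
K = 500  # truncation point; q**K is vanishingly small here

pmf = [p * q ** (k - 1) for k in range(1, K + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf, start=1))
second_moment = sum(k * k * pk for k, pk in enumerate(pmf, start=1))
entropy = -sum(pk * log2(pk) for pk in pmf if pk > 0)

h2 = -p * log2(p) - q * log2(q)  # binary entropy function
print(f"E[X]   = {mean:.6f}   (theory {1 / p:.6f})")
print(f"Var[X] = {second_moment - mean ** 2:.6f}   (theory {q / p ** 2:.6f})")
print(f"H[X]   = {entropy:.6f}   (theory {h2 / p:.6f})")
```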
The Geometric Distribution
Examples

Here are a few instances where geometric distributions occur:


▶ Quality control: how many items can be produced before
having a defective one;
▶ Chemistry and biology: polymer lengths distribution during
polymerisation;
▶ Business: how many attempts to make a sale will end in a
success;
▶ Computing: bounding time of randomised algorithms (while
loop repeated until success);
▶ Surveying: how many people do you have to ask before you
find a candidate.

10/19
The Binomial Distribution
Definition

How many times was I successful after n trials?


▶ A quick refresher on permutations and combinations
• Permutation: how many possible ways to put n items into
k ≤ n labelled places?
$$n \times (n - 1) \times \cdots \times (n - k + 1) = \frac{n!}{(n - k)!} = {}^{n}P_k$$
• Combination: how many possible ways to select k items from
n available? The k places can be arranged in k! ways:
$$\frac{{}^{n}P_k}{k!} = \frac{n!}{k!\,(n - k)!} = {}^{n}C_k \quad \text{[DB p.4]}$$
▶ Suppose we have n mutually independent p-Bernoulli trials
{X₁, X₂, ..., X_n}, among which k are successes:
$$P_{X_1 X_2 \dots X_n}(x_1, x_2, \dots, x_n) = p^k\, (1 - p)^{n-k}$$
The order doesn't matter:
$$P[\text{“}k\text{ successes after }n\text{ trials”}] = {}^{n}C_k\; p^k\, (1 - p)^{n-k}$$
11/19
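Python's standard library happens to expose both counts directly (math.perm and math.comb, available since Python 3.8), so the refresher can be verified in a few lines:

```python
from math import comb, factorial, perm

n, k = 5, 3
# Permutations: n! / (n - k)! ways to place k of n items in labelled places
assert perm(n, k) == factorial(n) // factorial(n - k) == 60
# Combinations: divide by the k! arrangements of the chosen places
assert comb(n, k) == perm(n, k) // factorial(k) == 10
print(perm(n, k), comb(n, k))  # 60 10
```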
The Binomial Distribution
Definition [DB p.27]

A random variable X is said to have a Binomial distribution with parameters n ∈ {1, 2, ...} and p ∈ [0, 1] if:

$$X \sim B(n, p) \iff P_X(k) = \begin{cases} {}^{n}C_k\; p^k\, (1 - p)^{n-k} & \text{if } k \in \{0, 1, 2, \dots, n\},\\ 0 & \text{otherwise.} \end{cases}$$

The support of X, $\mathcal{X} = \{0, 1, 2, \dots, n\}$, is discrete and finite.
[Figure: PMF of B(n, p) for (n, p) = (8, 1/4), (20, 1/2), (20, 3/4) and (40, 1/2), plotted for k = 0 to 30.]
Verify that $\sum_{k \in \mathcal{X}} P_X(k) = 1$ using the binomial expansion $(a + b)^n = \sum_{k=0}^{n} {}^{n}C_k\, a^k\, b^{n-k}$ [DB p.4]
12/19
The Binomial Distribution
Properties of X ∼ B(n, p)

▶ Expectation E[X] = np [DB p.27]

With q = 1 − p:
$$E[X] = \sum_{k=1}^{n} {}^{n}C_k\, k\, p^k q^{n-k} = np \sum_{k=1}^{n} {}^{n-1}C_{k-1}\, p^{k-1} q^{n-k} \quad (\text{using } {}^{n}C_k\, k = {}^{n-1}C_{k-1}\, n)$$
$$= np \sum_{k=0}^{n-1} {}^{n-1}C_k\, p^k q^{n-1-k} = np\, (p + q)^{n-1} = np$$
▶ Variance Var[X] = np(1 − p) [DB p.27]

$$E[X^2] = \sum_{k=1}^{n} {}^{n}C_k\, k^2 p^k q^{n-k} = \sum_{k=1}^{n} {}^{n}C_k\, k\, p^k q^{n-k} + n(n-1)p^2 \sum_{k=2}^{n} {}^{n-2}C_{k-2}\, p^{k-2} q^{n-k}$$

so $\mathrm{Var}[X] = np + n(n-1)p^2 (p+q)^{n-2} - n^2 p^2 = np(1-p)$
▶ Entropy H[X]

There is no general simple formula. For n ≫ 1,
$$H[X] \approx \log_2 \sqrt{2\pi e\, n p (1 - p)}$$
using Stirling's approximation $n! \approx \sqrt{2\pi n}\, (n/e)^n$ (valid for n ≫ 1). The maximum of H[X], at p = 1/2, increases with n.

[Figure: H[X] plotted against p for n = 8, 20 and 40.]

13/19
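A short sketch (parameter values arbitrary) can confirm the mean and variance, and show how close the Stirling-based entropy approximation gets for a moderately large n:

```python
from math import comb, e, log2, pi

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 40, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
entropy = -sum(pk * log2(pk) for pk in pmf if pk > 0)

print(f"E[X]   = {mean:.4f}  (theory {n * p})")
print(f"Var[X] = {var:.4f}  (theory {n * p * (1 - p)})")
print(f"H[X]   = {entropy:.4f} bits, "
      f"approx log2(sqrt(2*pi*e*n*p*(1-p))) = "
      f"{0.5 * log2(2 * pi * e * n * p * (1 - p)):.4f} bits")
```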
The Binomial Distribution
Example

▶ Suppose that an aeroplane engine will fail with probability p


(independently from engine to engine), and that the aeroplane
makes a successful flight if at least half of its engines remain
operative.
▶ For what values of p is a four-engine aeroplane preferable to a
two-engine aeroplane?
We call X_n the number of failing engines, X_n ∼ B(n, p), with n the number of engines.

$$P[\text{“4-eng airborne”}] = \sum_{k=0}^{2} P_{X_4}(k) = {}^{4}C_0\, p^0 (1-p)^4 + {}^{4}C_1\, p^1 (1-p)^3 + {}^{4}C_2\, p^2 (1-p)^2 = 1 - 4p^3 + 3p^4$$

$$P[\text{“2-eng airborne”}] = \sum_{k=0}^{1} P_{X_2}(k) = {}^{2}C_0\, p^0 (1-p)^2 + {}^{2}C_1\, p^1 (1-p) = 1 - p^2$$

So P[“4-eng airborne”] ≥ P[“2-eng airborne”] for p ≤ 1/3. For p = 10⁻⁵, a four-engine aeroplane is 10⁻⁸ % safer than a two-engine aeroplane.
14/19
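The comparison can be reproduced numerically; in this sketch the helper p_airborne is a made-up name that generalises the "at least half of its engines remain operative" rule to any engine count:

```python
from math import comb

def p_airborne(n_engines: int, p_fail: float) -> float:
    """P[flight succeeds]: at most half the engines fail, independently."""
    max_failures = n_engines // 2
    return sum(comb(n_engines, k) * p_fail ** k * (1 - p_fail) ** (n_engines - k)
               for k in range(max_failures + 1))

for p in (0.1, 1 / 3, 0.5, 1e-5):
    p4, p2 = p_airborne(4, p), p_airborne(2, p)
    print(f"p = {p:g}: 4-engine {p4:.10f}, 2-engine {p2:.10f}, "
          f"four engines preferable: {p4 >= p2}")
```

At p = 1/3 the two probabilities coincide, matching the crossover found above.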
The Poisson Distribution
Definition

How many times was I successful given a success rate λ?


▶ Suppose we define a “density” of successes λ, meaning we
have, on average, λ successes per unit interval.

▶ We further divide the unit interval into n subintervals, and


take n sufficiently large to see at most one success per
subinterval.

▶ The number of successes X in the unit interval follows a


binomial distribution with n trials, X ∼ B(n, p), and average
E[X] = np = λ ⇔ p = λ/n.
▶ The PMF is thus the limit of B(n, λ/n) as n → ∞:

$$\lim_{n\to\infty} \frac{n!}{k!\,(n-k)!} \left(\frac{\lambda}{n}\right)^{\!k} \left(1 - \frac{\lambda}{n}\right)^{\!n-k} = \lim_{n\to\infty} \underbrace{\frac{n}{n}\,\frac{n-1}{n} \cdots \frac{n-k+1}{n}}_{\to\,1} \; \frac{\lambda^k}{k!} \; \underbrace{\left(1 - \frac{\lambda}{n}\right)^{\!n}}_{\to\, e^{-\lambda}} \; \underbrace{\left(1 - \frac{\lambda}{n}\right)^{\!-k}}_{\to\,1} = \frac{\lambda^k e^{-\lambda}}{k!}$$
15/19
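The limit can be observed numerically by evaluating the B(n, λ/n) PMF at a fixed k for growing n (λ and k below are arbitrary choices):

```python
from math import comb, exp, factorial

lam, k = 4.0, 3
poisson = lam ** k * exp(-lam) / factorial(k)

# B(n, lam/n) at k approaches the Poisson PMF as n grows
for n in (10, 100, 1_000, 10_000):
    p = lam / n
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(f"n = {n:>6}: B(n, lam/n) PMF at k = {binom:.6f}")
print(f"Pois(lam) PMF at k       = {poisson:.6f}")
```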
The Poisson Distribution
Definition [DB p.27]

A random variable X is said to have a Poisson distribution² with parameter λ ∈ ℝ (λ > 0) if:

$$X \sim \mathrm{Pois}(\lambda) \iff P_X(k) = \begin{cases} \dfrac{\lambda^k e^{-\lambda}}{k!} & \text{if } k \in \{0, 1, 2, \dots\},\\ 0 & \text{otherwise.} \end{cases}$$
The support of X, $\mathcal{X} = \{0, 1, 2, \dots\}$, is discrete and infinite.
[Figure: PMF of Pois(λ) for λ = 1, 4 and 10, plotted for k = 0 to 20.]

Verify that $\sum_{k \in \mathcal{X}} P_X(k) = 1$ using the power series $e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$ [DB p.5]
²Named after the French mathematician Siméon Denis Poisson (1781-1840).
16/19
The Poisson Distribution
Properties of X ∼ Pois(λ)

▶ Expectation E[X] = λ [DB p.27]

$$E[X] = \sum_{k=0}^{\infty} k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
▶ Variance Var[X] = λ [DB p.27]

$$E[X^2] = \sum_{k=0}^{\infty} k^2\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \quad \text{so}$$
$$\mathrm{Var}[X] = \lambda^2 e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda} - \lambda^2 = \lambda$$
▶ Entropy H[X]

There is no general simple formula. For λ ≫ 1,
$$H[X] \approx \log_2 \sqrt{2\pi e\, \lambda}$$
and H[X] increases with λ.

[Figure: H[X] plotted against λ from 0 to 20.]

17/19
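Both moments can be checked numerically; in this sketch the PMF is built iteratively to avoid huge factorials, and the sums are truncated at an arbitrary K beyond which the Poisson tail is negligible:

```python
from math import exp

lam = 4.0
K = 150  # truncation point; the Poisson tail decays super-exponentially

# Build the PMF iteratively: P(k) = P(k-1) * lam / k, starting from e^(-lam)
pmf = [exp(-lam)]
for k in range(1, K + 1):
    pmf.append(pmf[-1] * lam / k)

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(f"sum = {sum(pmf):.12f}, E[X] = {mean:.6f}, Var[X] = {var:.6f}")
# Expect approximately 1, lam and lam
```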
The Poisson Distribution
Examples

Poisson-distributed events are common. Here are a few instances:


▶ Time events
• Telecommunication: telephone calls arriving in a system;
internet traffic;
• Astronomy: photons arriving at a telescope;
• Management: customers arriving at a counter;
• Finance and insurance: number of losses or claims;
• Seismology: seismic risk in a given period of time;
• Radioactivity: number of decays in a radioactive sample;
• Optics: number of photons emitted in a single laser pulse.
▶ Spatial events
• Biology: number of mutations on a strand of DNA;
• Medicine: number of bacteria in a certain amount of liquid;
• Materials: number of surface defects on a new refrigerator;
• Publishing: number of typographical errors found in a manuscript;
• Warfare: targeting of flying bombs on London in World War II.
18/19
One last thing . . .
A few remarks about the Binomial distribution

Sum of mutually independent Bernoulli trials:


▶ For n mutually independent Bernoulli trials $\{X_j \sim \mathrm{Ber}(p)\}_{j=1\dots n}$,
$$\sum_{j=1}^{n} X_j \sim B(n, p)$$
since the sum of the trials with support {0, 1} gives the
number of successes.
▶ We verify that $E\!\left[\sum_{j=1}^{n} X_j\right] = np = \sum_{j=1}^{n} E[X_j]$, but also that
$\mathrm{Var}\!\left[\sum_{j=1}^{n} X_j\right] = np(1 - p) = \sum_{j=1}^{n} \mathrm{Var}[X_j]$. In general:
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] \quad \text{if } X, Y \text{ independent}$$

Approximating the Binomial distribution:


▶ We have seen that $B(n, \lambda/n) \xrightarrow{n \to \infty} \mathrm{Pois}(\lambda)$
▶ For large n, small p, and intermediate np, B(n, p) ≈ Pois(np)
can be a convenient approximation.
▶ We will see a famous limit when n → ∞ but p is fixed.
19/19
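Both remarks can be illustrated with a quick simulation (parameter values arbitrary): summing n independent Ber(p) samples reproduces the B(n, p) PMF, and for large n with small p, Pois(np) tracks it closely:

```python
import random
from math import comb, exp, factorial

n, p = 50, 0.08  # large n, small p, intermediate np = 4
lam = n * p
runs = 100_000

# Each run sums n independent Bernoulli trials
sums = [sum(1 for _ in range(n) if random.random() < p) for _ in range(runs)]

print(" k  empirical  B(n,p)   Pois(np)")
for k in range(8):
    empirical = sums.count(k) / runs
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    pois = lam ** k * exp(-lam) / factorial(k)
    print(f"{k:>2}  {empirical:.5f}   {binom:.5f}  {pois:.5f}")
```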
You can attempt Problems 1 to 7 of Examples Paper 5
