
STAT272 Probability

Topic 2
Discrete Distributions and the Poisson Process



Random Variables
• Technical definition: a random variable is a function whose domain is the sample space and whose values lie on the real line, i.e. X is a random variable if

$$X : \Omega \to \mathbb{R}.$$

• The function must satisfy other conditions as well.



• e.g. Let X be the number of heads in four tosses of a coin. The
sample space Ω is the collection of all outcomes {O1 , O2 , O3 , O4 } ,
where Oi is one of H or T. X is then 0, 1, 2, 3 or 4, depending on
how many of the Oi ’s equal H.
• Mostly the sample space is suppressed and we just think of X as
taking on certain values with certain probabilities.
• e.g. We say that X is a Poisson random variable if, for some λ > 0,

$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

• Note that in the above we do not need to specify Ω.



Probability Distributions
• There are two main types of random variables: discrete and
continuous.
• A random variable X is said to be discrete if it can assume only
a finite (or countably infinite) number of distinct values. A set is
countably infinite if its elements can be put into 1:1
correspondence with the positive integers (which are, of course,
infinite). The rational numbers are countably infinite, while the
reals are not.
• Notation: P(X = x) = fX(x). We call fX the probability function (of X). It must satisfy the following conditions, in order that the axioms of probability are satisfied:
1. $f_X(x) \ge 0, \ \forall x$;
2. $\sum_x f_X(x) = 1$.



• Note that we use X for a random variable and x for a value of X.
You will be penalised in assignments and the exam if you do not
distinguish between upper and lower case. Thus Y will be a
random variable and y a value of Y.
• Examples:
1. The function g for which

$$g(x) = \begin{cases} 0.3 & x = 1 \\ 0.6 & x = 2 \\ 0.1 & x = 3 \\ 0 & \text{otherwise} \end{cases}$$

can be the probability function of some rv (random variable);

2. The function g for which

$$g(x) = \begin{cases} k/x & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$

does not represent a probability function, as $\sum_{x=1}^{\infty} \frac{1}{x} = \infty$.

3. Probabilities based on the zeta function, where

$$\zeta(\alpha) = \sum_{x=1}^{\infty} \frac{1}{x^\alpha}$$

converges if α > 1 and diverges if α ≤ 1. A zeta probability function is given by

$$f_X(x) = \begin{cases} k/x^\alpha & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$

where α > 1 and k = 1/ζ(α).
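As a quick numerical sketch of example 3 (assuming the arbitrary choice α = 2), we can compute k = 1/ζ(α) with scipy.special.zeta and check that the truncated sum of fX approaches 1:

```python
# Numerical check of the zeta probability function f_X(x) = k / x^alpha,
# with k = 1/zeta(alpha). alpha = 2 is an arbitrary demo value.
from scipy.special import zeta

alpha = 2.0
k = 1.0 / zeta(alpha)          # normalising constant k = 1/zeta(alpha)

# Truncated sum of f_X(x) for x = 1, ..., N; it should approach 1 as N grows.
for N in (10, 1_000, 100_000):
    total = sum(k / x**alpha for x in range(1, N + 1))
    print(f"N = {N:>6}: sum = {total:.6f}")
# The tail sum_{x > N} k/x^2 is O(1/N), so convergence to 1 is slow but visible.
```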

Discrete Distributions
The Bernoulli distribution

• Consider an experiment in which there are only two possible outcomes: ‘success’ and ‘failure’ (e.g. tossing a coin, with ‘success’ ≡ ‘head’).
• We may construct a rv X by assigning the value 0 to ‘failure’
and 1 to ‘success’. The reason for doing this will become
apparent later on.
• We call such a rv X a Bernoulli rv and such an experiment a
Bernoulli trial.
• X has the probability function (pf), letting p be the probability of ‘success’,

$$f_X(x) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise.} \end{cases}$$

p is called the “parameter” of the distribution. Some texts use π instead of p.
• We can also write the pf as

$$f_X(x) = \begin{cases} (1-p)^{1-x}\, p^x & x = 0, 1 \\ 0 & \text{otherwise.} \end{cases}$$

• $f_X(x)$ pretty obviously satisfies the conditions for a pf if 0 ≤ p ≤ 1.
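A minimal simulation sketch (with the arbitrary choices p = 0.3 and 100,000 trials): the observed frequencies of 1 and 0 should be close to p and 1 − p.

```python
# Simulate Bernoulli trials and compare the observed frequencies of 0 and 1
# with the pf f_X(0) = 1 - p, f_X(1) = p. p = 0.3 is an arbitrary demo value.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(n=1, p=p, size=100_000)    # a Bernoulli rv is Bin(1, p)

print("observed P(X=1):", x.mean())         # should be close to p = 0.3
print("observed P(X=0):", 1 - x.mean())     # should be close to 1 - p = 0.7
```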



The Binomial Distribution

• A binomial random variable is the sum of n independent Bernoulli random variables (i.e. a binomial experiment consists of n independent Bernoulli trials).
• Thus a binomial random variable is the number of 1s, i.e. the number of successes, in n Bernoulli trials.
• Let X be distributed binomially, with parameters n and p (n is
the number of trials and p is the probability of ‘success’). We
write X ∼ Bin(n, p) . What is the probability function of X?
• The event X = x is the event that there are x successes and
(n − x) failures in the n trials.
• The successes can occur in any subset of size x of the n trials. There are $\binom{n}{x}$ such different subsets (combinations), as there are x places out of n in which we can slot the successes – the remaining (n − x) places will contain the failures.
• Each such combination has the same probability (where there are x p’s and (n − x) (1 − p)’s below):

$$p \times p \times \cdots \times p \times (1-p) \times (1-p) \times \cdots \times (1-p) = p^x (1-p)^{n-x}.$$

• Thus

$$f_X(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \ldots, n \\ 0 & \text{otherwise.} \end{cases}$$

• We should probably check that this is a pf:
1. $f_X(x) \ge 0, \ \forall x$;
2. By the binomial theorem,

$$\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = \{p + (1-p)\}^n = 1.$$
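A short numerical sketch of this check (n = 10 and p = 0.4 are arbitrary demo values):

```python
# Verify numerically that the binomial pf is non-negative and sums to 1,
# mirroring the binomial-theorem check above.
from math import comb

n, p = 10, 0.4
pf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

print(all(f >= 0 for f in pf))   # condition 1: f_X(x) >= 0
print(sum(pf))                   # condition 2: sums to 1.0 (up to rounding)
```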

The Geometric Distribution

• Let X be the number of trials up to and including the first success in a sequence of independent Bernoulli trials with constant success probability p.
• We say that X is geometrically distributed with parameter p.
• What is the pf of X? The event X = x (x = 1, 2, . . .) occurs if the first (x − 1) trials result in failure and the xth is a success. The probability that this occurs is (where there are (x − 1) factors of (1 − p) below)

$$(1-p) \times \cdots \times (1-p) \times p = (1-p)^{x-1}\, p.$$

• Thus

$$f_X(x) = \begin{cases} (1-p)^{x-1}\, p & x = 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

• Checks:
1. $f_X(x) \ge 0, \ \forall x$;
2. As long as 0 < p < 1,

$$\sum_{x=1}^{\infty} p(1-p)^{x-1} = p \cdot \frac{1}{1 - (1-p)} = \frac{p}{p} = 1.$$



The formula also holds if p = 1, but not if p = 0, obviously.
• Notation: Some books use q = 1 − p.
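A sketch of the series check above, truncating the infinite sum (p = 0.25 is an arbitrary demo value; scipy.stats.geom happens to use the same “trials up to and including the first success” convention as these notes):

```python
# Check the geometric pf against the geometric-series identity, and against
# scipy's implementation. The support is truncated at x = 199 for the demo.
from scipy.stats import geom

p = 0.25
xs = range(1, 200)                              # truncate the infinite support
manual = [(1 - p) ** (x - 1) * p for x in xs]   # f_X(x) = (1-p)^{x-1} p

print(sum(manual))                          # close to 1 (missing tail is (1-p)^199)
print(all(abs(m - geom.pmf(x, p)) < 1e-12   # agrees with scipy's pmf
          for m, x in zip(manual, xs)))
```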

Negative Binomial Distribution

• Let X be the number of trials up to and including the kth success in a sequence of independent Bernoulli trials with constant success probability p.
• We say that X has the negative binomial distribution with parameters k and p. (The reason for ‘negative binomial’ will become apparent later.)
• What is the pf of X? The event X = x occurs when there are exactly (k − 1) successes in the first (x − 1) trials, followed by a success in the xth trial. Each such combination has the same probability. Hence, for x = k, k + 1, . . .,

$$f_X(x) = \binom{x-1}{k-1} p^{k-1} (1-p)^{x-1-(k-1)}\, p = \binom{x-1}{k-1} p^k (1-p)^{x-k}$$

and $f_X(x) = 0$ otherwise.
• Checks:
1. $f_X(x) \ge 0, \ \forall x$;
2. Before evaluating $\sum_{x=k}^{\infty} f_X(x)$, we need Newton’s generalized binomial theorem:

$$(a + b)^r = \sum_{i=0}^{\infty} \binom{r}{i} a^{r-i} b^i \tag{1}$$

where

$$\binom{r}{i} = \frac{r(r-1)\cdots(r-i+1)}{i!} \quad \text{for } r \text{ any real number.}$$

So we can write

$$\begin{aligned}
p^{-k} &= (1-q)^{-k} \quad \text{where } q = 1 - p \\
&= \sum_{y=0}^{\infty} \binom{-k}{y} 1^{-k-y} (-q)^y \quad [(1) \text{ with } r = -k,\ a = 1,\ b = -q] \\
&= \sum_{y=0}^{\infty} \binom{-k}{y} (-1)^y q^y \\
&= \sum_{y=0}^{\infty} \binom{-k}{y} (-1)^y (1-p)^y. \tag{2}
\end{aligned}$$



As long as 0 < p ≤ 1, substituting y = x − k,

$$\begin{aligned}
\sum_{x=k}^{\infty} f_X(x) &= \sum_{x=k}^{\infty} \binom{x-1}{k-1} p^k (1-p)^{x-k} \\
&= p^k \sum_{y=0}^{\infty} \binom{y+k-1}{k-1} (1-p)^y \\
&= p^k \sum_{y=0}^{\infty} \frac{(y+k-1)!}{(k-1)!\,y!} (1-p)^y \\
&= p^k \sum_{y=0}^{\infty} \frac{(y+k-1)\cdots k}{y!} (1-p)^y.
\end{aligned}$$

Consider the numerator (y + k − 1) · · · k. There are y + k − 1 − (k − 1) = y terms in the product. If we multiply each term by −1, then multiplying the whole product by (−1)^y preserves the identity. So

$$\begin{aligned}
\sum_{x=k}^{\infty} f_X(x) &= p^k \sum_{y=0}^{\infty} \frac{(-1)^y (-k)(-k-1)\cdots(-k-y+1)}{y!} (1-p)^y \\
&= p^k \sum_{y=0}^{\infty} \binom{-k}{y} (-1)^y (1-p)^y \\
&= p^k \cdot p^{-k} \quad \text{from (2)} \\
&= 1.
\end{aligned}$$

We have thus seen the reason for calling the distribution the
negative binomial distribution: the probabilities involve
coefficients in binomial expansions with negative index.
• Note: The negative binomial distribution is a generalisation of the geometric distribution, i.e. the geometric is the same as the negative binomial with k = 1.
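A numerical sketch of the check above (k = 3 and p = 0.4 are arbitrary demo values). Note a convention difference: scipy.stats.nbinom counts the number of failures before the kth success, so its variable is X − k in our notation.

```python
# Verify the negative binomial pf sums to ~1 over a truncated support, and
# matches scipy after shifting from "number of trials" to "number of failures".
from math import comb
from scipy.stats import nbinom

k, p = 3, 0.4
xs = range(k, 300)                               # truncated support x = k, k+1, ...
manual = [comb(x - 1, k - 1) * p**k * (1 - p)**(x - k) for x in xs]

print(sum(manual))                               # close to 1
print(all(abs(m - nbinom.pmf(x - k, k, p)) < 1e-12
          for m, x in zip(manual, xs)))          # matches scipy's pmf
```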



The Hypergeometric Distribution

• It is better to consider an example first: a box contains N balls, of which k are white and (N − k) are black. Draw n balls at random without replacement. Let X be the number of white balls drawn.
• What is the pf of X? Firstly, there are $\binom{N}{n}$ different combinations of n balls from N. The event X = x occurs when x of the n balls are white and therefore (n − x) are black. The number of combinations with x white balls is therefore $\binom{k}{x}\binom{N-k}{n-x}$, since we must choose x white balls from k and (n − x) black balls from (N − k). Hence,

$$f_X(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}.$$



• This is not totally correct – we must specify the values of x for which this holds. We must have 0 ≤ x ≤ k. We must also have 0 ≤ n − x ≤ N − k, or

$$n - N + k \le x \le n.$$

Thus

$$f_X(x) = \begin{cases} \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} & \max(0,\, n - N + k) \le x \le \min(k, n) \\[2ex] 0 & \text{otherwise} \end{cases}$$



• Checks:
1. $f_X(x) \ge 0, \ \forall x$;
2. $\sum_x f_X(x) = 1$ (not obvious arithmetically).
• Example: From a group of 20 graduate Actuaries, 9 are selected
randomly for employment. Assuming random selection, what is
the probability that the 9 selected include all the 5 best
Actuaries in the group of 20? Here

N = 20, k = 5, N − k = 15, n = 9.



• Let X be the number of best Actuaries chosen. Then

$$P(X = 5) = \frac{\binom{5}{5}\binom{15}{9-5}}{\binom{20}{9}} = 1 \times \frac{15!}{4!\,11!} \times \frac{9!\,11!}{20!} = \frac{21}{2584} \simeq 0.008127.$$
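The same value can be checked numerically, e.g. with scipy.stats.hypergeom (whose arguments are, in order, the population size, the number of ‘success’ balls, and the number of draws):

```python
# Reproduce the actuaries example: N = 20 in total, k = 5 "best", n = 9 drawn
# without replacement; P(X = 5) should be 21/2584 ~ 0.008127.
from math import comb
from scipy.stats import hypergeom

print(hypergeom.pmf(5, 20, 5, 9))                 # scipy: pmf(x, M, n, N)
print(comb(5, 5) * comb(15, 4) / comb(20, 9))     # direct formula, same value
```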



The Poisson Distribution

• “Poisson” is French for fish, but the distribution is named after the famous 19th-century French mathematician Siméon-Denis Poisson.
• The Poisson is a discrete distribution defined on the non-negative
integers 0, 1, 2, . . ., and is used as a model for counts occurring in
a fixed time period or space, e.g.
– number of accidents at the Herring Road-Waterloo Road
intersection in a week
– number of SMS messages arriving on your mobile phone in a day
– number of claims on an insurance policy in a year
– number of sharks near Bondi Beach at a particular time



• The Poisson distribution can be derived in many ways. We shall
look at two.

1. Binomial limit

Suppose X ∼ Bin(n, p). Thus, for x = 0, 1, . . . , n,

$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}.$$

Suppose that n is large and p is small, but that np is moderate, e.g. ≤ 10. (We shall look at the case where np is large later.) Let θ = np and consider g(x) = log fX(x). Then

$$g(x) = \log(n!) - \log(x!) - \log\{(n-x)!\} + x\log\frac{\theta}{n} + (n-x)\log\left(1 - \frac{\theta}{n}\right).$$



Recall: Stirling’s formula gives an approximation for log(n!):

$$\log(n!) = \left(n + \tfrac{1}{2}\right)\log n - n + \tfrac{1}{2}\log(2\pi) + \frac{a(n)}{12n}, \qquad 0 < a(n) < 1. \tag{3}$$

Thus we also have, for x fixed as n → ∞,

$$\log\{(n-x)!\} = \left(n - x + \tfrac{1}{2}\right)\log(n-x) - (n-x) + \tfrac{1}{2}\log(2\pi) + \frac{a(n-x)}{12(n-x)}. \tag{4}$$

Recall: the Taylor series expansion of log(1 + x),

$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \qquad -1 < x \le 1,$$

which gives

$$\log(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots, \qquad -1 \le x < 1.$$

Now, for fixed θ, as n → ∞,

$$\log\left(1 - \frac{\theta}{n}\right) = -\frac{\theta}{n} - \frac{\theta^2}{2n^2} - \cdots$$

and hence

$$(n-x)\log\left(1 - \frac{\theta}{n}\right) = -(n-x)\frac{\theta}{n} - (n-x)\frac{\theta^2}{2n^2} - \cdots \tag{5}$$



and for fixed x as n → ∞,

$$\log(n-x) = \log\left\{n\left(1 - \frac{x}{n}\right)\right\} = \log n + \log\left(1 - \frac{x}{n}\right) = \log n - \frac{x}{n} - \frac{x^2}{2n^2} - \cdots.$$

So we can write the first term of (4) as

$$\left(n - x + \tfrac{1}{2}\right)\log(n-x) = \left(n - x + \tfrac{1}{2}\right)\log n - \left(n - x + \tfrac{1}{2}\right)\left(\frac{x}{n} + \frac{x^2}{2n^2} + \cdots\right). \tag{6}$$



Thus

$$\begin{aligned}
g(x) ={}& \underbrace{\left(n + \tfrac{1}{2}\right)\log n - n + \tfrac{1}{2}\log(2\pi) + \frac{a(n)}{12n}}_{(3)} - \log(x!) \\
&- \underbrace{\left[\left(n - x + \tfrac{1}{2}\right)\log n - \left(n - x + \tfrac{1}{2}\right)\left(\frac{x}{n} + \frac{x^2}{2n^2} + \cdots\right) - (n-x) + \tfrac{1}{2}\log(2\pi) + \frac{a(n-x)}{12(n-x)}\right]}_{(6)} \\
&+ \underbrace{x\log\theta - x\log n - (n-x)\frac{\theta}{n} - (n-x)\frac{\theta^2}{2n^2} - \cdots}_{(5)} \\
={}& \left(n + \tfrac{1}{2} - n + x - \tfrac{1}{2} - x\right)\log n - n + x + n - x - \log(x!) - \theta + x\log\theta,
\end{aligned}$$



plus other terms which converge to 0 as n → ∞. Hence, as n → ∞,

$$g(x) \to -\log(x!) - \theta + x\log\theta$$

and so

$$f_X(x) \to \frac{e^{-\theta}\theta^x}{x!}.$$

This function is a pf, since

$$\sum_{x=0}^{\infty} \frac{e^{-\theta}\theta^x}{x!} = e^{-\theta}\sum_{x=0}^{\infty}\frac{\theta^x}{x!} = e^{-\theta}e^{\theta} = e^{0} = 1.$$



Hence we have proved that the limiting form of the binomial distribution, as n → ∞ with np = θ held constant, is

$$f_X(x) = \frac{e^{-\theta}\theta^x}{x!}, \qquad x = 0, 1, 2, \ldots.$$

This is the Poisson distribution.
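A numerical sketch of the limit (θ = 2 and x = 3 are arbitrary demo values): holding θ = np fixed and increasing n, the Bin(n, θ/n) pf approaches the Poisson(θ) pf.

```python
# Illustrate the binomial-to-Poisson limit: fix theta = n*p and let n grow.
from math import comb, exp, factorial

theta, x = 2.0, 3
poisson = exp(-theta) * theta**x / factorial(x)

for n in (10, 100, 10_000):
    p = theta / n
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"n = {n:>6}: Bin pf = {binom:.6f}  (Poisson pf = {poisson:.6f})")
```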



2. Poisson Process

(Do not get mixed up here – it is very easy to do so.) A Poisson process is not a rv and is not a distribution. Firstly, a process, or
more correctly a stochastic process, {X(t)}, is a sequence of random
variables for which each element X(t) is a rv. Note that we use {} to
denote the process, or collection of random variables making up the
process. Changes in the index t are often thought of as changes in
time. The Poisson process is a model for the number of ‘events’
which occur. If {X(t)} is a Poisson process, then the rv X(t) is the
number of events which have occurred in the time interval (0, t).



As t increases, therefore, so will X(t). Now, each X(t) is a rv and has
a probability function. An example is where X(t) is the number of
α-particles emitted in the interval (0, t) . Together, the sequence
{X(t)} is a (stochastic) process, but at some time t, X(t) represents
the number of α-particles which have been emitted in (0, t).

“Little-oh” notation:

• We say that g(δt) = o(δt) if

$$\frac{g(\delta t)}{\delta t} \to 0 \quad \text{as } \delta t \to 0.$$
• It helps to think of o(δt) as something which is much smaller in magnitude than δt (e.g. $(\delta t)^2$) when |δt| is small.



Consider the following. Suppose events occur randomly in time,
subject to the conditions that
• times between events are independent of each other,
• the probability of one event occurring in (t, t + δt) is λδt + o(δt) in the limit as δt → 0, and
• the probability of more than one event occurring in (t, t + δt) is o(δt).
Let X (t) be the number of events which have occurred in the interval
(0, t) . Then we say that {X(t)} is a Poisson process. We wish to
calculate the pf of the rv X(t).



For x = 0, 1, . . ., put

$$f_{X(t)}(x) = P(X(t) = x) = p_x(t).$$

Now, $p_x(t + \delta t)$ is the probability that x events have occurred in the interval (0, t + δt). It may be that
• x events occurred in (0, t) and none in [t, t + δt),
• or that x − 1 events occurred in (0, t) and 1 in [t, t + δt),
• or that x − 2 events occurred in (0, t) and 2 in [t, t + δt), etc.



However, the probability that 2 events occur in [t, t + δt) is o (δt).
We thus have, for x ≥ 1:

$$\begin{aligned}
p_x(t + \delta t) ={}& P(x \text{ events in } (0, t) \text{ and none in } [t, t + \delta t)) \\
&+ P(x - 1 \text{ events in } (0, t) \text{ and } 1 \text{ in } [t, t + \delta t)) + o(\delta t) \\
={}& p_x(t)\, P(\text{no events in } [t, t + \delta t)) + p_{x-1}(t)\, P(1 \text{ event in } [t, t + \delta t)) + o(\delta t) \\
={}& p_x(t)\{1 - \lambda\delta t + o(\delta t)\} + p_{x-1}(t)\{\lambda\delta t + o(\delta t)\} + o(\delta t) \\
={}& p_x(t) + \lambda\{p_{x-1}(t) - p_x(t)\}\delta t + o(\delta t).
\end{aligned}$$



Also,

$$p_0(t + \delta t) = P(0 \text{ events in } (0, t) \text{ and none in } [t, t + \delta t)) = p_0(t)\{1 - \lambda\delta t + o(\delta t)\} = p_0(t) - \lambda p_0(t)\delta t + o(\delta t).$$

Thus, for x = 0, 1, . . .,

$$\frac{p_x(t + \delta t) - p_x(t)}{\delta t} = \begin{cases} \lambda\{p_{x-1}(t) - p_x(t)\} + \dfrac{o(\delta t)}{\delta t} & x = 1, 2, \ldots \\[2ex] -\lambda p_0(t) + \dfrac{o(\delta t)}{\delta t} & x = 0 \end{cases}$$

and so

$$\frac{d}{dt}\, p_x(t) = \begin{cases} \lambda\{p_{x-1}(t) - p_x(t)\} & x = 1, 2, \ldots \\ -\lambda p_0(t) & x = 0. \end{cases}$$



The unique solution to these differential equations for which

$$\sum_{x=0}^{\infty} p_x(t) = 1, \quad \forall t$$

can be shown to be

$$p_x(t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}.$$

You can derive this sequentially, or just verify the solution by induction on x.
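A simulation sketch of this result (λ = 1.5 and t = 2 are arbitrary demo values). It uses the standard fact, not proved in these notes, that the times between events of a Poisson process with rate λ are independent Exponential(λ) rvs; the empirical distribution of X(t) should match $p_x(t) = (\lambda t)^x e^{-\lambda t}/x!$.

```python
# Simulate a Poisson process via exponential inter-event times and compare
# the empirical distribution of X(t) with the p_x(t) derived above.
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(1)
lam, t, reps = 1.5, 2.0, 100_000    # arbitrary demo rate, horizon, replications

counts = np.empty(reps, dtype=int)
for i in range(reps):
    # Accumulate exponential gaps until we pass time t; X(t) = #events in (0, t].
    total, n_events = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t:
            break
        n_events += 1
    counts[i] = n_events

theta = lam * t
for x in range(6):
    empirical = (counts == x).mean()
    exact = exp(-theta) * theta**x / factorial(x)
    print(f"x = {x}: empirical {empirical:.4f}  vs  p_x(t) = {exact:.4f}")
```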



A random variable Y with probability function

$$f_Y(y) = \begin{cases} \dfrac{e^{-\theta}\theta^y}{y!} & y = 0, 1, \ldots \\[2ex] 0 & \text{otherwise} \end{cases}$$

is said to have the Poisson distribution with parameter θ. A Poisson process {X(t)} with parameter λ therefore has the property that, for each t, the rv X(t) has the Poisson distribution with parameter λt.
The following conditions must be met for a process to be a Poisson
process:
1. the arrival rate must be constant;
2. the times between events must be independent;
3. it must not be possible for two or more events to occur at the
same time.



Consider some practical situations. Could they be modelled using
Poisson processes?
1. The number of accidents in the Macquarie University car parks;
2. The number of people arriving at a bank for service.

Comments on these situations:


• Non-constant rate depending on time of day/day of week/week of
year/weather conditions. Non-independence – a driver could
smash into an accident which has already happened. Two cars
can collide simultaneously.
• Non-independence of events – people in pairs or arriving on the
same bus. Non-constant arrival rate λ due to lunch-hour etc.



The Multinomial Distribution
• This is the natural extension of the binomial distribution –
instead of there being two mutually exclusive and exhaustive
outcomes at each trial, there are k.
• The Multinomial distribution has k parameters: n, the number
of trials, and p1 , . . . , pk−1 , the probabilities of each of the first
(k − 1) outcomes.
• Note that, given the (k − 1) probabilities p1 , . . . , pk−1, the kth probability pk is obtained from

$$p_k = 1 - \sum_{i=1}^{k-1} p_i.$$
• Definition: If $p_i > 0$ for i = 1, . . . , k and $\sum_{i=1}^{k} p_i = 1$, the random variables Y1 , . . . , Yk are distributed multinomially with parameters (n, p1 , . . . , pk−1) if

$$P(Y_1 = y_1, \ldots, Y_k = y_k) = \frac{n!}{y_1! \cdots y_k!}\, p_1^{y_1} \cdots p_k^{y_k} = \underbrace{\binom{n}{y_1\; y_2\; \cdots\; y_k}}_{\text{multinomial coefficient}} \prod_{i=1}^{k} p_i^{y_i}$$

where $\sum_{i=1}^{k} y_i = n$ and $0 \le y_i \le n, \ \forall i$.
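A sketch evaluating this pf directly and via scipy.stats.multinomial (the values n = 12, (p1, p2, p3) = (0.2, 0.3, 0.5) and (y1, y2, y3) = (2, 4, 6) are arbitrary illustrations, not an example from the notes):

```python
# Evaluate the multinomial pf P(Y1=y1, Y2=y2, Y3=y3) two ways.
from math import factorial
from scipy.stats import multinomial

n = 12
p = [0.2, 0.3, 0.5]          # note p_3 = 1 - p_1 - p_2, as required
y = [2, 4, 6]                # counts summing to n

coef = factorial(n) // (factorial(y[0]) * factorial(y[1]) * factorial(y[2]))
manual = coef * p[0]**y[0] * p[1]**y[1] * p[2]**y[2]

print(manual)
print(multinomial.pmf(y, n, p))   # same value from scipy
```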
