
MATH20712: Random Models

Lecture Notes

Dr Xiong Jin
[email protected]
ATB 1.106b
Contents

1 Introduction 3

2 A Review of Probability Theory 6


2.1 Probability Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Computation rules . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Conditional probability . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Discrete random variables . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Independence of discrete random variables . . . . . . . . . 11
2.2.3 Examples of discrete random variables . . . . . . . . . . . . 12
2.2.4 Examples of continuous random variables . . . . . . . . . . 13
2.3 Probability generating function . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Properties of probability generating functions . . . . . . . . 16

3 Simple Random Walks 22


3.1 Properties of simple random walks . . . . . . . . . . . . . . . . . . . 23
3.1.1 The probability mass function . . . . . . . . . . . . . . . . . 24
3.1.2 The joint probability mass function . . . . . . . . . . . . . . 25
3.1.3 Homogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.4 Markov property . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 First return probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 First return identity . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 (*) Generalized binomial formula . . . . . . . . . . . . . . . 29
3.2.3 First return probabilities . . . . . . . . . . . . . . . . . . . . . 31
3.2.4 The recurrence probability . . . . . . . . . . . . . . . . . . . 32
3.3 Total number of returns . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Probabilities of visiting a particular point . . . . . . . . . . . . . . . 34

3.5 The gambler’s ruin problem . . . . . . . . . . . . . . . . . . . . . . . 37

4 Branching processes 42
4.1 The mean size of the population . . . . . . . . . . . . . . . . . . . . 43
4.1.1 Conditional expectation . . . . . . . . . . . . . . . . . . . . . 43
4.1.2 The law of total expectation . . . . . . . . . . . . . . . . . . . 44
4.1.3 The mean size of the population . . . . . . . . . . . . . . . . 44
4.2 The distribution of Zn . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.1 The Random Sums Lemma . . . . . . . . . . . . . . . . . . . 45
4.2.2 The probability generating function of Zn . . . . . . . . . . 47
4.3 Probability of ultimate extinction . . . . . . . . . . . . . . . . . . . 51

5 Renewal Processes 55
5.1 The distribution of N (t ) . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.1 Convolution Lemma . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.2 The probability mass function of N (t ) . . . . . . . . . . . . . 57
5.1.3 Renewal function . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 The distribution of Poisson processes . . . . . . . . . . . . . 59
5.2.2 Properties of Poisson processes . . . . . . . . . . . . . . . . 60
5.3 Lifetimes of renewal processes . . . . . . . . . . . . . . . . . . . . . 62
5.3.1 The excess, current and total lifetimes . . . . . . . . . . . . . 62
5.3.2 The lifetimes of Poisson processes . . . . . . . . . . . . . . . 62
5.4 (*) Renewal theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Chapter 1

Introduction

In this course we will introduce three classical mathematical models that originate from real-life problems involving randomness. The course consists of
four parts:

1. A review of probability theory;

2. Simple random walks;

3. Branching processes;

4. Renewal processes.

To give an idea of the kind of problems we are going to deal with, let us start
with some examples.

Example 1.1 (A drunken man's walk). A drunken man is wandering on the street.
Being drunk, he has no sense of direction, so he is equally likely to move one
step forward as one step backward. Assume that the drunken man keeps wandering
on the street forever.

(1) How many times will he revisit the Bar?

(2) Suppose that his home is 100 steps away from the Bar. Will he eventually
reach his home?

(3) Are there any places on the street that he will not visit?

[Figure 1.1: One-dimensional simple random walk: from each location on the street (e.g. KFC, Bar, Tesco) the walker moves one step left or right, each with probability 1/2.]

Example 1.2 (The spread of a rumour). In the spread of a rumour, each person
who is told the rumour has a certain probability of believing it, and only those
who believe it pass the rumour on to a certain number of people. Suppose that
all individuals act independently and the rumour is never told to someone who
has already heard it.

(1) Will the rumour stop spreading eventually?

(2) If not, what is the probability that the rumour keeps spreading indefinitely?

[Figure 1.2: A Galton-Watson tree.]

Example 1.3 (The number of bags). A check-in desk at Manchester Airport just
opened. Each passenger in the queue has a certain number of bags to check in.
Assume that the number of bags and the time to complete the check-in for each
passenger are random, independent and follow the same laws.

(1) After 30 minutes, how many passengers and how many bags in total have
been checked in?

(2) If there are 10 passengers ahead of you, how long do you need to wait until
your turn?

[Figure 1.3: A renewal process: each passenger has a random check-in time (e.g. 2 min, 4 min, 5 min, 1 min) and a random number of bags.]

Chapter 2

A Review of Probability Theory

2.1 Probability Space


A probability space (Ω, F , P) consists of three elements:

• Ω is the sample space made of all possible outcomes;

• F is a σ-field consisting of events (subsets of Ω);

• P is the probability measure, that is, for A ∈ F , P(A) is the probability that
the event A happens.

2.1.1 Computation rules


(0) P(Ω) = 1.

(1) If A ⊂ B , then P(A) ≤ P(B ).

(2) Let A c = Ω \ A be the complement of A. Then P(A c ) = 1 − P(A).

(3) P(A ∪ B ) = P(A) + P(B ) − P(A ∩ B ).

In particular, by (0) and (1), since any A ∈ F is a subset of Ω, i.e. A ⊂ Ω, we have
P(A) ≤ P(Ω) = 1; by (0) and (2), P(∅) = 1 − P(Ω) = 1 − 1 = 0; and then by (3), if A
and B are disjoint, that is A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

[Figure 2.1: Venn diagrams illustrating A ∪ B and A ∩ B.]

2.1.2 Conditional probability


Definition 2.1 (Conditional probability). The probability of A given that event
B has occurred is the conditional probability of A given B, denoted by

P(A|B) = P(A ∩ B) / P(B).

From the definition,

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Lemma 2.2 (The law of total probability). Let {B_i}_{i=1}^∞ be a partition of the sample
space Ω, that is, the {B_i} are disjoint and Ω = ⋃_{i=1}^∞ B_i. Then for any event A it holds
that

P(A) = Σ_{i=1}^∞ P(A|B_i) P(B_i).

Proof. Since Ω = ⋃_{i=1}^∞ B_i,

A = ⋃_{i=1}^∞ (A ∩ B_i).

Since the {B_i} are disjoint, the {A ∩ B_i} are disjoint. Thus

P(A) = P(⋃_{i=1}^∞ (A ∩ B_i))
     = Σ_{i=1}^∞ P(A ∩ B_i)
     = Σ_{i=1}^∞ P(A|B_i) P(B_i). ∎

Example 2.3. Suppose that the probability that a man is color-blind is 0.005 and
the probability that a woman is color-blind is 0.0025. Assume that a population
is evenly divided by men and women. Let X be a randomly chosen person from
the population.
(1) Find out the probability that X is color-blind.

(2) Given that X is color-blind, what is the probability that X is a man?


Solution. Define the following events:
C = {X is color-blind},
M = {X is a man},
W = {X is a woman}.
Then the assumptions are equivalent to
P(M ) = 0.5,
P(W ) = 0.5,
P(C |M ) = 0.005,
P(C |W ) = 0.0025.
(1) Since Ω = M ∪ W, we have

C = (C ∩ M) ∪ (C ∩ W).

Therefore, by the law of total probability,

P(C) = P(C|M)P(M) + P(C|W)P(W)
     = 0.005 × 0.5 + 0.0025 × 0.5
     = 0.00375.

(2) By Bayes' rule,

P(M|C) = P(M ∩ C) / P(C)
       = P(C|M)P(M) / P(C)
       = (0.005 × 0.5) / 0.00375
       = 2/3.
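As a quick sanity check, both computations in the example can be reproduced in a few lines of Python (a sketch, using the figures stated in the example; the variable names are ours):

```python
# Figures from Example 2.3: half the population are men,
# P(colour-blind | man) = 0.005, P(colour-blind | woman) = 0.0025.
p_m, p_w = 0.5, 0.5
p_c_given_m, p_c_given_w = 0.005, 0.0025

# Law of total probability: P(C) = P(C|M)P(M) + P(C|W)P(W)
p_c = p_c_given_m * p_m + p_c_given_w * p_w

# Bayes' rule: P(M|C) = P(C|M)P(M) / P(C)
p_m_given_c = p_c_given_m * p_m / p_c

print(p_c, p_m_given_c)  # 0.00375 and 2/3
```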

2.1.3 Independence
Definition 2.4 (Independence). If P(A|B ) = P(A), meaning that the probability
of A remains the same no matter whether B has occurred or not (i.e. A has noth-
ing to do with B ), we say that A and B are independent. Equivalently, A and B
are independent if
P(A ∩ B ) = P(A)P(B ).

Remark 2.5. In general P(A ∩ B ) could be either greater or less than P(A)P(B ).

For example, let P(A) = 0.5 and let B = A; then

0.5 = P(A) = P(A ∩ A) > P(A)P(A) = 0.25.

Now let P(A) = P(B) = 0.5 and assume that A and B are disjoint; then

0 = P(∅) = P(A ∩ B) < P(A)P(B) = 0.25.

Example 2.6. Toss a coin twice. Let

A = {a head shown in the first toss}


B = {a tail shown in the second toss}.

Then A and B are independent. Let

B′ = {one and only one head shown in the two tosses}.

Then A and B′ are independent if and only if the coin is fair.

Proof. Since A ∩ B and A ∩ B′ are the same event, we have

P(A ∩ B′) = P(A ∩ B) = P(A)P(B).

This means that A and B′ are independent if and only if P(B) = P(B′).
Assume that P(A) = p ∈ (0, 1); then P(B) = 1 − p and

P(B′) = P({HT} ∪ {TH}) = p(1 − p) + (1 − p)p = 2p(1 − p).

Thus P(B) = P(B′) if and only if 2p(1 − p) = 1 − p, yielding p = 0.5, hence a fair
coin. ■

Definition 2.7 (Mutual independence). A sequence of events A_1, A_2, ... is said
to be mutually independent if for each integer k and each choice of distinct
integers n_1, ..., n_k,

P(A_{n_1} ∩ ... ∩ A_{n_k}) = P(A_{n_1}) ··· P(A_{n_k}).

Remark 2.8. If a sequence of events A_1, A_2, ... satisfies that for every distinct pair
i ≠ j, A_i and A_j are independent (pairwise independence), is the sequence also
mutually independent?

- No. An example is given in the next section.

2.2 Random variables


Definition 2.9 (Random variables). A random variable is a real-valued function
of outcomes, i.e.,

X : Ω ∋ ω ↦ X(ω) ∈ R.

Example 2.10. Toss a fair coin certain times.


(1) For k ≥ 1 let X_k be the number of heads shown in the kth toss, that is,

X_k = 1 if a head shows in the kth toss, and X_k = 0 if a tail shows in the kth toss.

Then X_k is a random variable taking values in {0, 1}.


(2) Let S n be the number of heads shown in the first n tosses. Then S n is a
random variable taking values in {0, 1, · · · , n}. In fact

Sn = X1 + · · · + Xn .

The example for Remark 2.8: Let A_1 = {X_1 = X_2}, A_2 = {X_2 = X_3} and A_3 = {X_3 = X_1}. Then for distinct i, j,

P(A_i ∩ A_j) = P(X_1 = X_2 = X_3) = 1/4 = P(A_i)P(A_j),

which yields that A_i and A_j are independent. However,

P(A_1 ∩ A_2 ∩ A_3) = P(X_1 = X_2 = X_3) = 1/4 ≠ 1/8 = P(A_1)P(A_2)P(A_3),

thus A_1, A_2 and A_3 are not mutually independent.
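This example is small enough to verify by brute force: the sketch below (helper names are ours) enumerates all eight equally likely outcomes of three fair tosses.

```python
from itertools import product

# All 2^3 outcomes of three fair coin tosses, each with probability 1/8.
outcomes = list(product([0, 1], repeat=3))

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = [lambda w: w[0] == w[1], lambda w: w[1] == w[2], lambda w: w[2] == w[0]]

# Pairwise independent: P(Ai and Aj) = P(Ai)P(Aj) = 1/4 for i != j ...
for i in range(3):
    for j in range(i + 1, 3):
        assert prob(lambda w: A[i](w) and A[j](w)) == prob(A[i]) * prob(A[j])

# ... but not mutually independent: P(A1 and A2 and A3) = 1/4, not 1/8.
triple = prob(lambda w: A[0](w) and A[1](w) and A[2](w))
assert triple == 0.25
assert prob(A[0]) * prob(A[1]) * prob(A[2]) == 0.125
```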

2.2.1 Discrete random variables

Definition 2.11 (Discrete random variables). If a random variable X can only
assume countably many values {x_1, x_2, ..., x_n, ...}, then X is called a
discrete random variable. In this case

{p_n = P(X = x_n), n = 1, 2, ...}

is called the probability mass function (p.m.f.) of X.

(1) The expectation (mean) of X is defined as

E(X) = Σ_{n=1}^∞ x_n · P(X = x_n) = Σ_{n=1}^∞ x_n · p_n.

(2) The variance of X is defined as (with µ = E(X))

Var(X) = E((X − µ)²) = Σ_{n=1}^∞ (x_n − µ)² · P(X = x_n) = Σ_{n=1}^∞ (x_n − µ)² · p_n.

(3) In general, for a function f : R → R,

E(f(X)) = Σ_{n=1}^∞ f(x_n) · P(X = x_n) = Σ_{n=1}^∞ f(x_n) · p_n.

2.2.2 Independence of discrete random variables


Definition 2.12 (Independence of discrete random variables). Let X, taking values
in {x_1, x_2, ...}, and Y, taking values in {y_1, y_2, ...}, be two discrete random
variables. We say that X and Y are independent if for all x_i and y_j the events
{X = x_i} and {Y = y_j} are independent, that is,

P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j).

Lemma 2.13. If X, Y are independent, then E(XY) = E(X)E(Y).

Proof.

E(XY) = Σ_i Σ_j x_i y_j · P(X = x_i, Y = y_j)
      = Σ_i Σ_j x_i y_j · P(X = x_i)P(Y = y_j)
      = (Σ_i x_i · P(X = x_i)) (Σ_j y_j · P(Y = y_j))
      = E(X)E(Y).

2.2.3 Examples of discrete random variables
(1) Bernoulli random variables with parameter p. Write X ∼ Bern(p).
- X takes values in {0, 1}, and P(X = 1) = p, P(X = 0) = 1 − p.
- E(X ) = p.
- If X is the number of heads shown in tossing a fair coin, then X ∼
Bern(1/2).

(2) Binomial random variables with parameters n and p. Write X ∼ Bin(n, p).

- X takes values in {0, 1, ..., n}, and

  P(X = k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, ..., n,

  where

  C(n, k) = n! / (k!(n − k)!)

  is the binomial coefficient.
- E(X) = np.
- If X is the number of heads shown in tossing a fair coin n times,
  then X ∼ Bin(n, 1/2).

(3) Geometric random variables with parameter p. Write X ∼ Geo(p).


- X takes values in {1, 2, · · · }, and P(X = n) = p(1−p)n−1 for n = 1, 2, · · · .
- E(X ) = 1/p.
- If X is the number of tosses of a fair coin we made until the first head
shows, then X ∼ Geo(1/2).

(4) Poisson random variables with parameter λ. Write X ∼ Poiss(λ).

- X takes values in {0, 1, 2, ...}, and

  P(X = n) = (λ^n / n!) · e^{−λ}, n = 0, 1, ...

- E(X) = λ, Var(X) = λ.

Example 2.14. Assume that the number of messages you receive in a day is a
Poisson random variable. Suppose that on average you receive 50 messages per
day. What is the probability that you won't receive any message tomorrow?

Solution. Let X be the number of messages you will receive tomorrow. By assumption,
X is a Poisson random variable with E(X) = λ = 50. Then the probability
that you won't receive any message tomorrow is

P(X = 0) = (λ^0 / 0!) · e^{−λ} = e^{−50}.
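The Poisson probabilities are easy to evaluate directly; a minimal sketch (the function name is ours, not from the notes):

```python
import math

def poisson_pmf(n, lam):
    """P(X = n) for X ~ Poiss(lam): lam^n / n! * exp(-lam)."""
    return lam**n / math.factorial(n) * math.exp(-lam)

# Example 2.14: with lambda = 50, P(X = 0) = e^{-50}.
assert poisson_pmf(0, 50) == math.exp(-50)

# The pmf sums to 1 (the tail beyond n = 400 is negligible for lambda = 50).
total = sum(poisson_pmf(n, 50) for n in range(400))
assert abs(total - 1) < 1e-9
```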

2.2.4 Examples of continuous random variables


Besides discrete random variables, we also have continuous random variables,
whose values may fill an entire interval [a, b] or the whole real line. In this case
X has a probability density function f_X(x) such that

P(X ≤ y) = ∫_{−∞}^{y} f_X(x) dx

and

E(X) = ∫_{−∞}^{∞} x · f_X(x) dx,   Var(X) = ∫_{−∞}^{∞} (x − µ)² · f_X(x) dx.

[Figure 2.2: Probability density function of a normal random variable with parameters µ and σ².]

Examples of continuous random variables:

(1) Uniform random variables on the interval [a, b]. Write X ∼ Uni[a, b].

- X takes values in [a, b] and

  f_X(x) = 1/(b − a) if x ∈ [a, b]; 0 otherwise.

- E(X) = (a + b)/2, since

  E(X) = ∫_{−∞}^{∞} x · f_X(x) dx = ∫_a^b x · 1/(b − a) dx = (1/(b − a)) · (b² − a²)/2 = (a + b)/2.

- Example: choose a point randomly and uniformly from the interval [0, 1].

(2) Exponential random variables with parameter λ. Write X ∼ Exp(λ).

- X takes values in R₊ = {x ≥ 0} and

  f_X(x) = λe^{−λx} if x ≥ 0; 0 if x < 0.

- E(X) = 1/λ, since, integrating by parts,

  E(X) = ∫_{−∞}^{∞} x · f_X(x) dx = ∫_0^∞ x · λe^{−λx} dx = −∫_0^∞ x d(e^{−λx})
       = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + 1/λ = 1/λ.

- Example: the lifetime of a computer.

(3) Normal random variables with parameters µ and σ². Write X ∼ N(µ, σ²).

- X takes values in R and

  f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}.

- E(X) = µ and Var(X) = σ².
- Example: the distribution of marks of an exam taken by a large number of
  students.
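The expectation formulas above can be sanity-checked by simulation. A Monte Carlo sketch for the exponential case (sample size and tolerances are our own choices; the variance 1/λ² of Exp(λ) is a standard fact not derived in the notes):

```python
import random

random.seed(0)
lam = 2.0

# Draw Exp(lam) samples; their sample mean should approach E(X) = 1/lam.
samples = [random.expovariate(lam) for _ in range(200_000)]
mean = sum(samples) / len(samples)
assert abs(mean - 1 / lam) < 0.01

# Second-moment check: Var(X) = 1/lam^2 for the exponential distribution.
var = sum((x - mean) ** 2 for x in samples) / len(samples)
assert abs(var - 1 / lam**2) < 0.01
```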

2.3 Probability generating function
Let X be a discrete random variable taking values in the non-negative integers. The
probability mass function of X is given by

P(X = n), n = 0, 1, ...

Definition 2.15. The probability generating function (p.g.f.) of X is defined by
the following infinite sum:

G_X(s) = Σ_{n=0}^∞ P(X = n) · s^n
       = P(X = 0) + P(X = 1) · s + ... + P(X = n) · s^n + ...

Examples:

(1) Let X ∼ Bern(p). We have that X takes values in {0, 1} and P(X = 0) = 1− p
and P(X = 1) = p. Then

G X (s) = P(X = 0) + P(X = 1) · s


= 1 − p + ps.

(2) Let X ∼ Bin(n, p). We have that X takes values in {0, ..., n} and

P(X = k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, ..., n.

Then

G_X(s) = Σ_{k=0}^n P(X = k) · s^k
       = Σ_{k=0}^n C(n, k) p^k (1 − p)^{n−k} · s^k
       = Σ_{k=0}^n C(n, k) (ps)^k (1 − p)^{n−k}
       = (1 − p + ps)^n.

(3) Let X ∼ Poiss(λ). We have that X takes values in {0, 1, ...} and

P(X = n) = (λ^n / n!) · e^{−λ}, n = 0, 1, ...

Then

G_X(s) = Σ_{n=0}^∞ P(X = n) · s^n
       = Σ_{n=0}^∞ (λ^n / n!) · e^{−λ} · s^n
       = e^{−λ} Σ_{n=0}^∞ (λs)^n / n!
       = e^{λs} · e^{−λ} = e^{λs−λ}.

(4) Let X be a discrete random variable taking values in {1, 3, 5} with probability
mass function P(X = 1) = 1/4, P(X = 3) = 1/4 and P(X = 5) = 1/2. Then

G_X(s) = P(X = 1) · s + P(X = 3) · s³ + P(X = 5) · s⁵
       = (1/4)s + (1/4)s³ + (1/2)s⁵.

2.3.1 Properties of probability generating functions


(1) Equivalent expression: Fix s and let h(X) = s^X. Then

E(h(X)) = Σ_{n=0}^∞ P(X = n) · h(n) = Σ_{n=0}^∞ P(X = n) · s^n.

Thus

G_X(s) = E(s^X).

(2) Particular values:

G_X(1) = Σ_{n=0}^∞ P(X = n) · 1^n = Σ_{n=0}^∞ P(X = n) = 1,

G_X(0) = P(X = 0).

(3) First derivative:

G′_X(s) = (Σ_{n=0}^∞ P(X = n) · s^n)′
        = Σ_{n=0}^∞ P(X = n) · n s^{n−1}
        = E(X s^{X−1})
        = E((s^X)′).

Hence

G′_X(1) = Σ_{n=0}^∞ P(X = n) · n = E(X).

Example: Let X ∼ Bin(n, p). We have G_X(s) = (1 − p + ps)^n. Thus

G′_X(s) = np(1 − p + ps)^{n−1}  ⇒  G′_X(1) = np = E(X).

(4) Second derivative:

G″_X(s) = E((s^X)″) = E(X(X − 1) s^{X−2}).

Thus

G″_X(1) = E(X(X − 1)) = E(X²) − E(X).

So

Var(X) = E(X²) − E(X)² = G″_X(1) + G′_X(1) − G′_X(1)².

(5) The generating function determines the probability mass function: If

G_{X_1}(s) = G_{X_2}(s),

that is,

Σ_{n=0}^∞ P(X_1 = n) · s^n = Σ_{n=0}^∞ P(X_2 = n) · s^n,

then matching the coefficients of s^n we get

P(X_1 = n) = P(X_2 = n), n = 0, 1, ...

(6) If X 1 and X 2 are independent, then

G X 1 +X 2 (s) = G X 1 (s)G X 2 (s).

Proof.

G_{X_1+X_2}(s) = E(s^{X_1+X_2})
             = E(s^{X_1} s^{X_2})
             = E(s^{X_1}) E(s^{X_2})   (by independence and Lemma 2.13)
             = G_{X_1}(s) G_{X_2}(s).

Example: Let X 1 , · · · , X n be a sequence of independent and identically


distributed random variables with Bernoulli distribution Bern(p). Let
S n = X 1 + · · · + X n . Then S n ∼ Bin(n, p) and

G S n (s) = G X 1 (s) · · ·G X n (s) = (1 − p + ps)n .

(7) If X has the probability generating function G X (s), then the probability
generating function of Y = a + bX is G Y (s) = s a G X (s b ).

Proof.
G Y (s) = E(s Y ) = E(s a+bX ) = s a E((s b ) X ) = s a G X (s b ).
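Properties (2)-(4) are easy to check numerically for a concrete pmf. A sketch using central finite differences (the pmf is the one from worked example (1) below; the step sizes h are arbitrary choices of ours):

```python
# pmf of X: P(X=0)=1/8, P(X=2)=1/4, P(X=3)=3/8, P(X=4)=1/4.
pmf = {0: 1/8, 2: 1/4, 3: 3/8, 4: 1/4}

def G(s):
    """G_X(s) = E(s^X) = sum of P(X=n) * s^n."""
    return sum(p * s**n for n, p in pmf.items())

def dG(s, h=1e-5):    # central-difference approximation to G'
    return (G(s + h) - G(s - h)) / (2 * h)

def d2G(s, h=1e-4):   # central-difference approximation to G''
    return (G(s + h) - 2 * G(s) + G(s - h)) / h**2

mean = sum(n * p for n, p in pmf.items())
var = sum((n - mean)**2 * p for n, p in pmf.items())

assert abs(G(1) - 1) < 1e-12                         # property (2): G_X(1) = 1
assert abs(dG(1) - mean) < 1e-6                      # property (3): G'_X(1) = E(X)
assert abs(d2G(1) + dG(1) - dG(1)**2 - var) < 1e-4   # property (4): variance formula
```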

Examples:

(1) Suppose the probability generating function of X is given by

G_X(s) = 1/8 + (1/4)s² + (3/8)s³ + (1/4)s⁴.

Write down the probability mass function of X.

Solution. Since

G_X(s) = P(X = 0) + P(X = 1) · s + ... + P(X = n) · s^n + ...
       = 1/8 + (1/4)s² + (3/8)s³ + (1/4)s⁴,

we have

P(X = 0) = 1/8, P(X = 2) = 1/4, P(X = 3) = 3/8, P(X = 4) = 1/4.

(2) Suppose that the probability generating function of X is G_X(s) = s²e^{2s−2}.
Find the probability mass function of X.

Solution. Write G_X(s) as a power series:

G_X(s) = s² e^{2s−2}
       = s² e^{−2} e^{2s}
       = s² e^{−2} Σ_{n=0}^∞ (2s)^n / n!
       = Σ_{n=0}^∞ (2^n / n!) e^{−2} · s^{n+2}.

We conclude that

P(X = n + 2) = (2^n / n!) e^{−2}, n = 0, 1, 2, ...

In general, if G_X(s) = e^{λs−λ}, then X ∼ Poiss(λ). ■

(3) Suppose that the probability generating functions of two independent
random variables X_1 and X_2 are given by

G_{X_1}(s) = 1/3 + (2/3)s²,   G_{X_2}(s) = (2/5)s² + (3/5)s⁴.

Let Y = X_1 + X_2. Find G_Y(s) and the probability mass function of Y.

Solution. Since X_1 and X_2 are independent, we have

G_Y(s) = G_{X_1+X_2}(s)
       = G_{X_1}(s) G_{X_2}(s)
       = (1/3 + (2/3)s²)((2/5)s² + (3/5)s⁴)
       = (2/15)s² + (7/15)s⁴ + (2/5)s⁶.

This gives

P(Y = 2) = 2/15, P(Y = 4) = 7/15, P(Y = 6) = 2/5.

(4) If X 1 ∼ Poiss(λ1 ), X 2 ∼ Poiss(λ2 ) and X 1 , X 2 are independent, show that


X 1 + X 2 ∼ Poiss(λ1 + λ2 ).

Solution. We already know that G X 1 (s) = e λ1 s−λ1 and G X 2 (s) = e λ2 s−λ2 .


Since X 1 and X 2 are independent, we have

G X 1 +X 2 (s) = G X 1 (s)G X 2 (s) = e (λ1 +λ2 )s−(λ1 +λ2 ) .

Therefore X 1 + X 2 ∼ Poiss(λ1 + λ2 ). ■

(5) Let G(s) = a_0 + a_1 s + a_2 s² + ... + a_n s^n be a polynomial with a_i ≥ 0, i =
0, 1, ..., n. Write down the necessary and sufficient condition under which
G is a probability generating function of some random variable X.

Solution. The condition is a_0 + a_1 + ... + a_n = 1, in which case G is the probability
generating function of the random variable X whose probability mass
function is

P(X = i) = a_i, i = 0, 1, ..., n.

(6) Suppose that a random variable X has generating function

G_X(s) = ps / (1 − qs),   0 < p < 1, q = 1 − p.

Determine the probability mass function of X and E(X).

Solution. Using 1/(1 − x) = Σ_{n=0}^∞ x^n we obtain

G_X(s) = ps / (1 − qs)
       = ps Σ_{n=0}^∞ (qs)^n
       = Σ_{n=0}^∞ p q^n s^{n+1}.

This implies that

P(X = n + 1) = p q^n, n = 0, 1, ...,

i.e. X has the geometric distribution with parameter p.
A simple computation gives

G′_X(s) = [p(1 − qs) − ps(−q)] / (1 − qs)² = p / (1 − qs)².

Hence E(X) = G′_X(1) = p/(1 − q)² = 1/p. ■
(7) Assume that P(X = n) = (2/3)(1/3)^{n−1}, n ≥ 1. (i) Find G_X(s); (ii) Write down the
generating function of Y = 2 + 5X.

Solution. (i)

G_X(s) = Σ_{n=1}^∞ P(X = n) s^n
       = Σ_{n=1}^∞ (2/3)(1/3)^{n−1} · s^n
       = (2/3)s Σ_{n=1}^∞ ((1/3)s)^{n−1}
       = (2/3)s / (1 − (1/3)s).

(ii)

G_Y(s) = s² G_X(s⁵)
       = (2/3)s⁷ / (1 − (1/3)s⁵).
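A numeric spot-check of part (ii), comparing the closed form s²G_X(s⁵) with the defining sum E(s^Y) truncated far into the tail (the sample point and truncation length are arbitrary choices of ours):

```python
def G_X(s):
    # Closed form from part (i): (2/3)s / (1 - s/3).
    return (2/3) * s / (1 - s/3)

def G_Y_series(s, terms=200):
    # E(s^Y) with Y = 2 + 5X: sum of P(X=n) * s^(2+5n) over n >= 1.
    return sum((2/3) * (1/3)**(n - 1) * s**(2 + 5*n) for n in range(1, terms))

s = 0.9
assert abs(G_Y_series(s) - s**2 * G_X(s**5)) < 1e-12
assert abs(G_X(1) - 1) < 1e-12  # sanity: a p.g.f. equals 1 at s = 1
```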

Chapter 3

Simple Random Walks

Example 3.1 (A drunken man's walk). A drunk man is wandering on the street.
Being drunk, he has no sense (or just a little sense) of direction. So he moves
one step forward with probability p, or one step backward with probability
q = 1 − p.

[Figure 3.1: A drunk man's walk: one step forward with probability p, one step backward with probability q.]

The mathematical formulation: Suppose that the drunk man walks on inte-
gers. Let S n denote the position of the drunk man at step n, n ≥ 0. Suppose that
he starts at S 0 = 0. Then at step n ≥ 1, he may move one step forward (+1) from
S n−1 with probability p, or he may move one step backward (−1) from S n−1 with
probability q, that is
S_n = S_{n−1} + 1 with probability p, and S_n = S_{n−1} − 1 with probability q.
In other words,
S n = S n−1 + Yn ,
where Yn is a random variable, independent of S n−1 , taking values in {1, −1}
with P(Yn = 1) = p and P(Yn = −1) = q.

[Figure 3.2: Simple random walk on the integers: from each integer the walk steps right with probability p and left with probability q.]

Definition 3.2 (Simple random walks). A simple random walk starting from a is
a sequence {S n , n ≥ 0} of random variables (called a stochastic process) such that
S 0 = a, and
S n = S n−1 + Yn , n = 1, 2, · · · ,
where Y1 , Y2 , · · · , Yn , · · · are independent, identically distributed random vari-
ables with
P(Yn = 1) = p, P(Yn = −1) = 1 − p = q.
If p = q = 1/2, it is called a simple symmetric random walk.

Remark 3.3. (1) Yn describes the movement at step n.

(2) S 1 = S 0 + Y1 = a + Y1 , S 2 = S 1 + Y2 = a + Y1 + Y2 , · · · , S n = a + Y1 + · · · + Yn .
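Definition 3.2 translates directly into a simulation. A minimal sketch (the function name and parameters are our own):

```python
import random

def simple_random_walk(n_steps, p, start=0, rng=random):
    """Simulate S_0, S_1, ..., S_n with i.i.d. steps Y_i in {+1, -1}."""
    path = [start]
    for _ in range(n_steps):
        y = 1 if rng.random() < p else -1  # P(Y = 1) = p, P(Y = -1) = 1 - p
        path.append(path[-1] + y)
    return path

random.seed(1)
path = simple_random_walk(10, p=0.5)
assert len(path) == 11 and path[0] == 0
assert all(abs(b - a) == 1 for a, b in zip(path, path[1:]))  # every step is +/-1
```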

Example 3.4 (Gambler’s problem). A gambler plays the following game at a


casino. The croupier tosses a coin repeatedly: each time a head appears the
casino gives the gambler one pound, and each time a tail appears the casino
takes one pound from the gambler.
The gambler starts with a pounds, i.e. S_0 = a. Denoting by S_n the total amount
that the gambler has after n tosses, there are many interesting probabilities we
can compute regarding this random walk; for example, what is the probability
that the gambler has k pounds after n tosses, i.e. P(S_n = k)?

3.1 Properties of simple random walks


There are four main properties of simple random walks:

(1) The probability mass function: For n ≥ 0 and k ∈ Z, if n + k is even then

P(S_n = k) = C(n, (n+k)/2) p^{(n+k)/2} (1 − p)^{(n−k)/2};

if n + k is odd then

P(S_n = k) = 0.

(2) The joint probability mass function: For n 1 ≤ n 2 ≤ · · · ≤ n k ,

P(S n1 = r 1 , S n2 = r 2 , · · · , S nk = r k )
= P(S n1 = r 1 )P(S n2 −n1 = r 2 − r 1 ) · · · P(S nk −nk−1 = r k − r k−1 ).

(3) Homogeneity: For n 1 ≤ n 2 , P(S n2 = r 2 |S n1 = r 1 ) = P(S n2 −n1 = r 2 − r 1 ).

(4) Markov property: For n 1 < n 2 < · · · < n k < n k+1 ,

P(S nk+1 = r k+1 |S n1 = r 1 , · · · , S nk = r k ) = P(S nk+1 = r k+1 |S nk = r k ).

3.1.1 The probability mass function


Proposition 3.5. Let I 1 , I 2 , · · · , I n be independent, identically distributed Bernoulli
random variables with P(I k = 1) = p and P(I k = 0) = q. Set

Nn = I 1 + I 2 + · · · + I n .

Then N_n has the binomial distribution Bin(n, p), i.e.,

P(N_n = k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, ..., n.

Proof. We know that X ∼ Bin(n, p) if and only if G X (s) = (q + ps)n . Therefore, to


prove Nn ∼ Bin(n, p), it is sufficient to show that G Nn (s) = (q + ps)n . In fact, by
the independence,

G Nn (s) = G I 1 +I 2 +···+I n (s) = G I 1 (s)G I 2 (s) · · ·G I n (s) = (q + ps)n .

This completes the proof. ■

The following proposition provides the probability mass function of simple


random walks.

Proposition 3.6. Let S_0 = 0. For any n ≥ 0 we have S_n = 2N_n − n, where N_n is
a random variable with the binomial distribution Bin(n, p). Consequently, if
n + k is even then

P(S_n = k) = P(2N_n − n = k)
           = P(N_n = (n + k)/2)
           = C(n, (n+k)/2) p^{(n+k)/2} (1 − p)^{(n−k)/2};

if n + k is odd then

P(S_n = k) = 0.

Proof. Recall that S_n = Σ_{i=1}^n Y_i. Write

I_i = (Y_i + 1)/2, i = 1, ..., n.

Then I_i is a Bernoulli random variable with parameter p:

P(I_i = 1) = P((Y_i + 1)/2 = 1) = P(Y_i = 1) = p,
P(I_i = 0) = P((Y_i + 1)/2 = 0) = P(Y_i = −1) = 1 − p.

Since Y_1, ..., Y_n are independent, so are I_1, ..., I_n. Thus

(S_n + n)/2 = Σ_{i=1}^n (Y_i + 1)/2 = Σ_{i=1}^n I_i = N_n ∼ Bin(n, p).

This yields S_n = 2N_n − n, which gives the conclusion. ■
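Proposition 3.6 can be verified empirically: simulate many walks and compare the observed frequency of {S_n = k} with the formula (sample size, parameters and tolerance are our own choices):

```python
import math
import random

def pmf_sn(n, k, p):
    """P(S_n = k) from Proposition 3.6 (0 when n + k is odd or |k| > n)."""
    if (n + k) % 2 != 0 or abs(k) > n:
        return 0.0
    m = (n + k) // 2
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

random.seed(2)
n, k, p, trials = 10, 2, 0.6, 100_000
hits = sum(
    sum(1 if random.random() < p else -1 for _ in range(n)) == k
    for _ in range(trials)
)
assert abs(hits / trials - pmf_sn(n, k, p)) < 0.01
```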

3.1.2 The joint probability mass function


Proposition 3.7. For 1 ≤ n 1 ≤ n 2 ≤ · · · ≤ n k ,

P(S n1 = r 1 , S n2 = r 2 , · · · , S nk = r k )
= P(S n1 = r 1 )P(S n2 −n1 = r 2 − r 1 ) · · · P(S nk −nk−1 = r k − r k−1 ).

Proof. We only give a proof for the case k = 2. We have

P(S_{n_1} = r_1, S_{n_2} = r_2) = P(S_{n_1} = r_1, S_{n_2} − S_{n_1} = r_2 − r_1)
 = P(Σ_{i=1}^{n_1} Y_i = r_1, Σ_{i=n_1+1}^{n_2} Y_i = r_2 − r_1)
 = P(Σ_{i=1}^{n_1} Y_i = r_1) P(Σ_{i=n_1+1}^{n_2} Y_i = r_2 − r_1),

where the last equality comes from independence. Since Σ_{i=n_1+1}^{n_2} Y_i and
S_{n_2−n_1} = Σ_{i=1}^{n_2−n_1} Y_i have the same probability distribution, we conclude that

P(S_{n_1} = r_1, S_{n_2} = r_2) = P(S_{n_1} = r_1) P(S_{n_2−n_1} = r_2 − r_1).

Example:

P(S_4 = 2, S_7 = −1) = P(S_4 = 2) P(S_{7−4} = −1 − 2)
 = P(S_4 = 2) P(S_3 = −3)
 = C(4, 3) p³ (1 − p) · C(3, 0) (1 − p)³.

3.1.3 Homogeneity
Observe that

P(S_{n_2} = r_2 | S_{n_1} = r_1) = P(S_{n_2} = r_2, S_{n_1} = r_1) / P(S_{n_1} = r_1)
 = P(S_{n_2−n_1} = r_2 − r_1) P(S_{n_1} = r_1) / P(S_{n_1} = r_1)
 = P(S_{n_2−n_1} = r_2 − r_1).

This means that P(S_{n_2} = r_2 | S_{n_1} = r_1) depends only on the difference of the
positions and the difference of the times. This is called spatial homogeneity and
time homogeneity.

3.1.4 Markov property
Proposition 3.8. For n 1 < n 2 < · · · < n k < n k+1 it holds that

P(S nk+1 = r k+1 |S n1 = r 1 , · · · , S nk = r k ) = P(S nk+1 = r k+1 |S nk = r k )

Proof. Let A = {S_{n_{k+1}} = r_{k+1}}, B = {S_{n_k} = r_k} and C = {S_{n_1} = r_1, ..., S_{n_{k−1}} = r_{k−1}}.
Then the left-hand side equals

P(A | B ∩ C) = P(A ∩ B ∩ C) / P(B ∩ C)
 = P(S_{n_1} = r_1, ..., S_{n_k} = r_k, S_{n_{k+1}} = r_{k+1}) / P(S_{n_1} = r_1, ..., S_{n_k} = r_k)
 = [P(S_{n_1} = r_1) ··· P(S_{n_k−n_{k−1}} = r_k − r_{k−1}) P(S_{n_{k+1}−n_k} = r_{k+1} − r_k)]
   / [P(S_{n_1} = r_1) ··· P(S_{n_k−n_{k−1}} = r_k − r_{k−1})]
 = P(S_{n_{k+1}−n_k} = r_{k+1} − r_k)
 = P(S_{n_{k+1}} = r_{k+1} | S_{n_k} = r_k).

What does this fact tell us? If we think of n_k as the current time (the present),
then A is a future event, B is a present event, and C is a past event, so the above
fact says that

P(Future | Past and Present) = P(Future | Present).

This means that once we know the current state of the process, the past is
irrelevant to its future: given the present, the past has no influence on the future.
This memoryless property is called the Markov property.

3.2 First return probabilities


We want to find the probability that the simple random walk returns to its starting
point for the first time at time n, that is, for S_0 = 0 and n ≥ 1, the probability

f_n = P(S_1 ≠ 0, ..., S_{n−1} ≠ 0, S_n = 0).

Convention: we set f_0 = 0.

3.2.1 First return identity
We will calculate f_n by using

u_n = P(S_n = 0).

From our previous study we have

u_{2n} = C(2n, n) p^n (1 − p)^n and u_{2n+1} = 0 for n = 0, 1, ...

Notice that S_0 = 0 implies u_0 = P(S_0 = 0) = 1.
The relation between f n and u n is given by the following result.
Proposition 3.9 (The first return identity). For n ≥ 1 we have

u_n = Σ_{k=0}^n f_k u_{n−k} = f_0 u_n + f_1 u_{n−1} + f_2 u_{n−2} + ... + f_n u_0.

Proof. Splitting the event A_n = {S_n = 0} according to the time at which the first
return to 0 occurs, we get

A_n = ⋃_{k=1}^n (A_n ∩ B_k),

where

B_k = {S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0}

is the event that the first return to 0 occurs at time k. Since the events A_n ∩ B_k,
k = 1, 2, ..., n, are disjoint, we have

u_n = P(A_n) = Σ_{k=1}^n P(A_n ∩ B_k) = Σ_{k=1}^n P(B_k) P(A_n | B_k).

Now P(B_k) = f_k, and by the Markov property and homogeneity,

P(A_n | B_k) = P(S_n = 0 | S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0)
 = P(S_n = 0 | S_k = 0)
 = P(S_{n−k} = 0)
 = u_{n−k}.

Therefore we obtain (the k = 0 term may be added since f_0 = 0)

u_n = Σ_{k=0}^n f_k u_{n−k}.
Let us consider the following two generating functions:

F(s) = Σ_{k=0}^∞ f_k s^k = Σ_{k=1}^∞ f_k s^k   (recall f_0 = 0)

and

U(s) = Σ_{l=0}^∞ u_l s^l = 1 + Σ_{l=1}^∞ u_l s^l.

Lemma 3.10.

F(s) = 1 − 1/U(s).

Proof. Multiplying F(s) and U(s) we get

F(s)U(s) = (Σ_{k=0}^∞ f_k s^k)(Σ_{l=0}^∞ u_l s^l)
 = Σ_{k=0}^∞ Σ_{l=0}^∞ f_k u_l s^{k+l}
 = Σ_{n=0}^∞ (Σ_{k=0}^n f_k u_{n−k}) s^n   (for each n ≥ 0, collect all f_k u_l with k + l = n)
 = 0 + Σ_{n=1}^∞ u_n s^n   (f_0 u_0 = 0; for n ≥ 1 use the first return identity)
 = U(s) − 1.

Dividing both sides by U(s) gives F(s) = 1 − 1/U(s). ■
3.2.2 (*) Generalized binomial formula


Recall the standard binomial formula:

(1 + x)^n = Σ_{k=0}^n C(n, k) x^k = Σ_{k=0}^n [n(n − 1) ··· (n − k + 1) / k!] x^k.

For any real number a define

C(a, n) = a(a − 1) ··· (a − n + 1) / n!   for n = 1, 2, ...

and

C(a, 0) = 1.

With this definition, the following expansion holds:

(1 + x)^a = Σ_{n=0}^∞ C(a, n) x^n,   −1 ≤ x < 1.

Examples:

(1) For n ≥ 0,

(−1)^n C(−1/2, n) = (−1)^n (−1/2)(−1/2 − 1) ··· (−1/2 − n + 1) / n!
 = 2^{−n} · [1 · 3 ··· (2n − 1)] / n!
 = 2^{−2n} · [1 · 2 · 3 ··· (2n − 1) · (2n)] / (n! n!)
 = 2^{−2n} C(2n, n).

This gives

(1 − x)^{−1/2} = Σ_{n=0}^∞ 2^{−2n} C(2n, n) x^n.

(2) For n ≥ 1,

(−1)^{n−1} C(1/2, n) = (−1)^{n−1} (1/2)(1/2 − 1) ··· (1/2 − n + 1) / n!
 = 2^{−n} · [1 · 1 · 3 ··· (2n − 3) · (2n − 1)] / (n! (2n − 1))
 = 2^{−2n} · [1 · 2 · 3 ··· (2n − 1) · (2n)] / (n! n! (2n − 1))
 = C(2n, n) · 2^{−2n} / (2n − 1).

This gives

1 − (1 − x)^{1/2} = Σ_{n=1}^∞ C(2n, n) (2^{−2n} / (2n − 1)) x^n.
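Expansion (2) can be spot-checked numerically at a point inside the radius of convergence (the sample point and truncation length are arbitrary choices of ours):

```python
import math

x = 0.3
# Partial sum of: 1 - sqrt(1 - x) = sum over n >= 1 of C(2n, n) 2^{-2n}/(2n-1) x^n.
series = sum(
    math.comb(2*n, n) * 4.0**-n / (2*n - 1) * x**n
    for n in range(1, 200)
)
assert abs(series - (1 - math.sqrt(1 - x))) < 1e-12
```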

3.2.3 First return probabilities
Theorem 3.11. For a simple random walk we have

U(s) = (1 − 4pqs²)^{−1/2} and F(s) = 1 − √(1 − 4pqs²).

Thus f_n = 0 if n is odd, and

f_{2n} = C(2n, n) p^n q^n / (2n − 1), n = 1, 2, ...

Proof. From our previous study we have

u_{2n} = C(2n, n) p^n q^n and u_{2n+1} = 0 for n = 0, 1, ...

It follows that

U(s) = Σ_{n=0}^∞ u_{2n} s^{2n}
 = Σ_{n=0}^∞ C(2n, n) p^n q^n s^{2n}
 = Σ_{n=0}^∞ C(2n, n) 2^{−2n} (4pqs²)^n
 = (1 − 4pqs²)^{−1/2}   (by expansion (1) of Section 3.2.2 with x = 4pqs²).

Hence

F(s) = 1 − 1/U(s)
 = 1 − √(1 − 4pqs²)
 = Σ_{n=1}^∞ C(2n, n) (2^{−2n} / (2n − 1)) (4pqs²)^n   (by expansion (2) of Section 3.2.2)
 = Σ_{n=1}^∞ C(2n, n) (p^n q^n / (2n − 1)) s^{2n}.

This gives the conclusion. ■

Example: Let p = q = 1/2. Then

f_4 = C(4, 2) (1/2)⁴ / (4 − 1) = (1/2)³ = 1/8,

while

u_4 = P(S_4 = 0) = C(4, 2)(1/2)⁴ = 3(1/2)³.
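The value f_4 = 1/8 can be confirmed by exhaustively enumerating all 2⁴ length-4 paths of the symmetric walk (a sketch, using exact rational arithmetic):

```python
from fractions import Fraction
from itertools import product

count = 0  # paths with S_1 != 0, S_2 != 0, S_3 != 0 and S_4 = 0
for steps in product([1, -1], repeat=4):
    positions, s = [], 0
    for y in steps:
        s += y
        positions.append(s)
    if all(pos != 0 for pos in positions[:3]) and positions[3] == 0:
        count += 1

f4 = count * Fraction(1, 2) ** 4  # each path has probability (1/2)^4
assert f4 == Fraction(1, 8)
```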

3.2.4 The recurrence probability


We define the recurrence probability as the probability that the simple random
walk ever returns to its starting position. The recurrence event is

A = {the s.r.w. ever returns to 0}
 = {S_k = 0 for some k ≥ 1}
 = ⋃_{n=1}^∞ {a first return to zero occurs at time n}.

So the recurrence probability is

f* = P(A) = Σ_{n=1}^∞ P(a first return to zero occurs at time n) = Σ_{n=1}^∞ f_n.

Corollary 3.12. For a simple random walk we have

f* = 1 − |p − q|.

Thus f* = 1 if p = q = 1/2, whereas f* < 1 if p ≠ q.

Proof. Recall that


∞ q
n
X
F (s) = fn s = 1 − 1 − 4pq s 2 .
n=1
∗ P∞ p
So f = n=1 f n = F (1) = 1 − 1 − 4pq. Since p + q = 1,

1 − 4pq = (p + q)2 − 4pq = (p − q)2 .

Hence f ∗ = 1 − |p − q|. ■
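A truncated sum of the first-return probabilities $f_{2n}$ can be compared numerically against $1-|p-q|$; this sketch (with the assumed value $p = 0.6$ and an assumed truncation level) is added here and is not part of the notes.

```python
import math

def f2n(p, n):
    # f_{2n} = C(2n, n) p^n q^n / (2n - 1), computed in log space to avoid overflow
    q = 1 - p
    log_binom = math.lgamma(2*n + 1) - 2*math.lgamma(n + 1)
    return math.exp(log_binom + n*math.log(p*q)) / (2*n - 1)

p = 0.6
f_star = sum(f2n(p, n) for n in range(1, 2000))
print(f_star, 1 - abs(2*p - 1))  # both close to 0.8
```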

3.3 Total number of returns

Let $R$ be the total number of returns (to the starting position), i.e., $R = k$ means that the simple random walk returns to its starting position exactly $k$ times, with the convention that $R = \infty$ means that the simple random walk returns to its starting position infinitely often.

Proposition 3.13.
1. If $p \neq q$ then $P(R = k) = (1 - f^*)(f^*)^k$ and $E(R) = \frac{f^*}{1 - f^*}$.
2. If $p = q = 1/2$ then $P(R = \infty) = 1$.

Proof. We first prove that $P(R \ge m) = (f^*)^m$ for $m \ge 1$ by induction. For $m = 1$ we have
$$P(R \ge 1) = P(\text{the s.r.w. ever returns to } 0) = f^*.$$
Assume that the claim holds for $m$. We prove it also holds for $m + 1$. For $n \ge 1$ denote $B_n = \{S_1 \neq 0, \dots, S_{n-1} \neq 0, S_n = 0\}$. Split the event $\{R \ge m+1\}$ according to the time at which a first return occurs to get
$$\begin{aligned}
P(R \ge m+1) &= \sum_{n=1}^{\infty} P(B_n \cap \{R \ge m+1\}) \\
&= \sum_{n=1}^{\infty} P(B_n \cap \{\text{at least } m \text{ returns after time } n\}) \\
&= \sum_{n=1}^{\infty} P(B_n)\,P(\text{at least } m \text{ returns after time } n \mid B_n) \\
(\text{by the Markov property}) \quad &= \sum_{n=1}^{\infty} P(B_n)\,P(\text{at least } m \text{ returns after time } n \mid S_n = 0) \\
(\text{by homogeneity}) \quad &= \sum_{n=1}^{\infty} P(B_n)\,P(R \ge m) \\
(\text{by induction}) \quad &= \sum_{n=1}^{\infty} f_n\,(f^*)^m \\
&= (f^*)^{m+1}.
\end{aligned}$$
By induction we get that $P(R \ge m) = (f^*)^m$ for $m \ge 1$. Consequently,
$$P(R = m) = P(R \ge m) - P(R \ge m+1) = (f^*)^m - (f^*)^{m+1} = (1 - f^*)(f^*)^m.$$
If $p \neq q$, then $f^* < 1$. We may calculate the p.g.f. of $R$:
$$\begin{aligned}
G_R(s) &= P(R = 0) + \sum_{m=1}^{\infty} P(R = m)s^m \\
&= 1 - f^* + \sum_{m=1}^{\infty}(1 - f^*)(f^*)^m s^m \\
&= 1 - f^* + (1 - f^*)\frac{f^* s}{1 - f^* s} \\
&= \frac{1 - f^*}{1 - f^* s}.
\end{aligned}$$
Hence
$$G_R'(s) = \frac{(1 - f^*)f^*}{(1 - f^* s)^2}$$
and
$$E(R) = G_R'(1) = \frac{f^*}{1 - f^*}.$$
If $p = q = 1/2$ then $f^* = 1$ and $P(R \ge m) = 1$ for all $m \ge 1$. Letting $m \to \infty$ we get
$$P(R = \infty) = 1. \;\blacksquare$$

In the case $P(R = \infty) = 1$ we say that the simple random walk is recurrent, because it keeps returning to its starting position. In the case $P(R < \infty) = 1$ we say that the simple random walk is transient, because the walk returns to its starting position only finitely many times.

3.4 Probabilities of visiting a particular point


Let $g_{a,b}$ be the probability that the random walk starting from $a$ ever visits $b$, that is
$$g_{a,b} = P_a(S_n = b \text{ for some } n \ge 1).$$
In the previous section we computed $g_{0,0} = f^*$. Our aim is to find an explicit expression for $g_{a,b}$ for arbitrary $a, b$. By homogeneity we have
$$g_{a,b} = P_a(S_n = b \text{ for some } n \ge 1) = P_0(S_n = b - a \text{ for some } n \ge 1) = g_{0,b-a}.$$
So we only need to compute the probability that the simple random walk starting from $0$ ever visits $k$, for $k \in \mathbb{Z}$, which we denote by
$$g_k^* = g_{0,k}.$$

Lemma 3.14. For any $a < b < c$ or $c < b < a$,
$$g_{a,c} = g_{a,b} \cdot g_{b,c}.$$

Proof. Notice that since $b$ is between $a$ and $c$, the simple random walk, starting from $a$, must visit $b$ before reaching $c$. Hence
$$\begin{aligned}
g_{a,c} &= P_a(S_n = c \text{ for some } n \ge 1) \\
&= P_a(S_m = b \text{ for some } m \ge 1,\ S_{m+k} = c \text{ for some } k \ge 1) \\
&= P_a(S_m = b \text{ for some } m \ge 1)\,P_a(S_{m+k} = c \text{ for some } k \ge 1 \mid S_m = b \text{ for some } m \ge 1) \\
&= g_{a,b} \cdot P_a(S_{m+k} = c \text{ for some } k \ge 1 \mid S_m = b \text{ for some } m \ge 1).
\end{aligned}$$
By the (strong) Markov property we have
$$P_a(S_{m+k} = c \text{ for some } k \ge 1 \mid S_m = b \text{ for some } m \ge 1) = P_b(S_k = c \text{ for some } k \ge 1) = g_{b,c}.$$
This completes the proof. ■

Theorem 3.15.

1. If $p = q = 1/2$, then $g_k^* = 1$ for all $k \in \mathbb{Z}$.

2. If $p > q$, then $g_k^* = 1$ for all $k > 0$ and $g_k^* = (\frac{q}{p})^{|k|}$ for all $k < 0$.

3. If $p < q$, then $g_k^* = 1$ for all $k < 0$ and $g_k^* = (\frac{p}{q})^{|k|}$ for all $k > 0$.

Note that we have the following unified formula:
$$g_0^* = f^* \quad\text{and}\quad g_k^* = \min\Big\{1, \Big(\frac{p}{q}\Big)^k\Big\} \quad\text{for } k \neq 0.$$

Proof. We only prove 3; the other cases are similar. If $p < q$, then the simple random walk is more likely to move to the left than to the right. Intuitively, the walk will eventually drift to $-\infty$, and the strong law of large numbers confirms this: with probability 1,
$$\frac{S_n}{n} = \frac{Y_1 + \cdots + Y_n}{n} \to \mu,$$
where $\mu = E(Y_1) = p - q < 0$. Thus we conclude that
$$S_n \sim n\mu \to -\infty.$$
This means that almost surely the simple random walk visits every negative position $k < 0$, since it cannot tend to $-\infty$ without doing so. This gives $g_k^* = 1$ for all $k < 0$.

On the other hand, for $k \ge 2$ (thus $0 < 1 < k$) we have
$$g_k^* = g_{0,k} = g_{0,1} \cdot g_{1,k} = g_1^* \cdot g_{k-1}^*.$$
This gives
$$g_k^* = (g_1^*)^k \quad\text{for } k \ge 1.$$
Now we compute $g_1^*$. Write
$$B = \{S_n = 1 \text{ for some } n \ge 1\}.$$
We can divide $B$ into two disjoint events:
$$B = (B \cap \{S_1 = 1\}) \cup (B \cap \{S_1 = -1\}).$$
Then
$$\begin{aligned}
g_1^* = P(B) &= P(B \cap \{S_1 = 1\}) + P(B \cap \{S_1 = -1\}) \\
&= P(S_1 = 1)P(B \mid S_1 = 1) + P(S_1 = -1)P(B \mid S_1 = -1) \\
&= p \times 1 + q \times g_{-1,1} \\
&= p + q \times (g_1^*)^2.
\end{aligned}$$
In other words, $g_1^*$ is a solution of the equation
$$x = p + qx^2 \iff (qx - p)(x - 1) = 0.$$
The above equation has two roots: $x_1 = 1$ and $x_2 = \frac{p}{q}$. If $p < q$ then we must have $g_1^* = \frac{p}{q}$. Otherwise, if $g_1^* = 1$, then
$$\begin{aligned}
f^* &= P(\text{the s.r.w. ever returns to } 0) \\
&\ge P(\text{the s.r.w. visits } -1 \text{ then returns to } 0) \\
&= P(S_m = -1 \text{ for some } m \ge 1,\ S_{m+n} = 0 \text{ for some } n \ge 1) \\
(\text{by the strong Markov property}) \quad &= g_{0,-1} \times g_{-1,0} \\
&= 1 \times g_1^* = 1,
\end{aligned}$$
which contradicts the fact that $f^* = 1 - |p - q| < 1$. Summarizing, we arrive at
$$g_k^* = (g_1^*)^k = \Big(\frac{p}{q}\Big)^k$$
for $k \ge 1$. ■

3.5 The gambler's ruin problem

Example 3.16. A gambler who initially has $k$ pounds ($k > 0$) bets against an infinitely rich opponent on a game in which he wins 1 pound with probability $p$ and loses 1 pound with probability $q = 1 - p$.

(1) For what values of $p$ is the gambler certain to become bankrupt?

(2) If $k = 100$, for what values of $p$ is the probability that he never becomes bankrupt at least $1/2$?

(3) If $p = 2/3$, for what values of $k$ is the probability that he never becomes bankrupt at least $15/16$?

Solution. Let $S_n$ be the amount of money the gambler has after $n$ bets. Then $\{S_n\}_{n \ge 0}$ forms a simple random walk with $S_0 = k$. We have the following interpretation:
$$\{\text{the gambler becomes bankrupt}\} = \{S_n = 0 \text{ for some } n \ge 1\}.$$
Hence by spatial homogeneity,
$$P_k(\text{the gambler becomes bankrupt}) = P_k(S_n = 0 \text{ for some } n \ge 1) = P_0(S_n = -k \text{ for some } n \ge 1) = g_{-k}^*.$$

(1) If $p = q = 1/2$ or $p < q$ (equivalently $p < 1/2$), we get $g_{-k}^* = 1$, i.e., the gambler is certain to become bankrupt. So the answer is $p \le 1/2$.

(2) If $p > 1/2$ (equivalently $p > q$), the probability that the gambler never becomes bankrupt is
$$1 - g_{-k}^* = 1 - \Big(\frac{q}{p}\Big)^k = 1 - \Big(\frac{1-p}{p}\Big)^k.$$
So
$$1 - \Big(\frac{1-p}{p}\Big)^{100} \ge \frac12 \iff \Big(\frac{1-p}{p}\Big)^{100} \le \frac12 \iff \frac{1-p}{p} \le \frac{1}{2^{1/100}} \iff p \ge \frac{2^{1/100}}{1 + 2^{1/100}}.$$

(3) If $p = 2/3$ then $q = 1/3$, so $q/p = 1/2$. Then
$$1 - g_{-k}^* = 1 - \Big(\frac{q}{p}\Big)^k = 1 - \Big(\frac12\Big)^k.$$
Hence
$$1 - \Big(\frac12\Big)^k \ge \frac{15}{16} \iff \Big(\frac12\Big)^k \le \frac{1}{16} \iff k \ge 4.$$

Now assume that the gambler plays the game against a casino which has $m$ pounds, so that the total amount of money is $l = k + m$. The gambler keeps betting until

(1) he is ruined (he loses all his money); or

(2) the casino is ruined (the gambler wins all the money from the casino).

Problem: Find the probability that the gambler ruins the casino, i.e.
$$r_k = P_k(\text{the s.r.w. hits } l \text{ before hitting } 0).$$

Theorem 3.17. For $0 \le k \le l$,

(1) If $p = q$, then
$$r_k = \frac{k}{l}.$$
(2) If $p \neq q$, then
$$r_k = \frac{1 - (\frac{q}{p})^k}{1 - (\frac{q}{p})^l}.$$

Proof. Let
$$A = \{\text{the s.r.w. hits } l \text{ before visiting } 0\}.$$
Clearly $r_0 = P_0(A) = 0$ and $r_l = P_l(A) = 1$. Let $1 \le k \le l - 1$; by considering the movement of the first step, we get
$$\begin{aligned}
r_k &= P_k(A \cap \{S_1 = k+1\}) + P_k(A \cap \{S_1 = k-1\}) \\
&= P_k(S_1 = k+1)P_k(A \mid S_1 = k+1) + P_k(S_1 = k-1)P_k(A \mid S_1 = k-1) \\
&= p\,r_{k+1} + q\,r_{k-1}.
\end{aligned}$$
This is a difference equation that we will solve. We have
$$r_k = p r_{k+1} + q r_{k-1} \iff (p+q)r_k = p r_{k+1} + q r_{k-1} \iff q(r_k - r_{k-1}) = p(r_{k+1} - r_k).$$
Write $a = q/p$; then
$$r_{k+1} - r_k = a(r_k - r_{k-1}) = \cdots = a^k(r_1 - r_0) = a^k r_1,$$
since $r_0 = 0$. This implies
$$r_{k+1} = (r_{k+1} - r_k) + (r_k - r_{k-1}) + \cdots + (r_1 - r_0) = (a^k + a^{k-1} + \cdots + 1)r_1.$$
Consequently, we have for $2 \le k \le l$
$$r_k = \frac{1 - a^k}{1 - a}\,r_1 \ \text{ if } a \neq 1 \quad\text{and}\quad r_k = k r_1 \ \text{ if } a = 1.$$
Using $r_l = 1$ we deduce that
$$r_1 = \frac{1 - a}{1 - a^l} \ \text{ if } a \neq 1 \quad\text{and}\quad r_1 = \frac{1}{l} \ \text{ if } a = 1.$$
This yields the conclusion. ■
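The closed form can be checked deterministically against the difference equation it is supposed to solve; the parameter values below ($p = 0.4$, $l = 10$) are assumptions chosen for illustration.

```python
def ruin_prob(k, l, p):
    # r_k = P_k(the walk hits l before 0): closed form from Theorem 3.17
    q = 1 - p
    if p == q:
        return k / l
    a = q / p
    return (1 - a**k) / (1 - a**l)

p, l = 0.4, 10   # assumed illustrative values
r = [ruin_prob(k, l, p) for k in range(l + 1)]
assert r[0] == 0 and abs(r[l] - 1) < 1e-12
# interior values satisfy the difference equation r_k = p r_{k+1} + q r_{k-1}
for k in range(1, l):
    assert abs(r[k] - (p*r[k + 1] + (1 - p)*r[k - 1])) < 1e-12
print("closed form satisfies the boundary-value recursion")
```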

Example 3.18. Suppose that $p = \frac{18}{37}$ and $m = 10$.

(i) What is the probability that the gambler ruins the casino if he has initially $k$ pounds?

(ii) For what values of $k$ is the probability that the gambler ruins the casino at least $\frac13(\frac{18}{19})^{10}$?

Solution. In this case we have $\frac{q}{p} = \frac{19}{18}$ and $l = m + k = 10 + k$. (i) By Theorem 3.17 we have
$$r_k = \frac{1 - (\frac{q}{p})^k}{1 - (\frac{q}{p})^l} = \frac{1 - (\frac{19}{18})^k}{1 - (\frac{19}{18})^{10+k}}.$$
(ii) We need to find the values of $k$ such that
$$r_k = \frac{1 - (\frac{19}{18})^k}{1 - (\frac{19}{18})^{10+k}} \ge \frac13\Big(\frac{18}{19}\Big)^{10}.$$
This gives
$$1 - \Big(\frac{19}{18}\Big)^k \le \frac13\Big(\frac{18}{19}\Big)^{10} - \frac13\Big(\frac{19}{18}\Big)^k \iff 1 - \frac13\Big(\frac{18}{19}\Big)^{10} \le \frac23\Big(\frac{19}{18}\Big)^k.$$
Hence
$$k \ge \frac{\log\big(\frac32 - \frac12(\frac{18}{19})^{10}\big)}{\log\frac{19}{18}}.$$

Corollary 3.19. Let $m \ge 0$, $k \ge 0$ and $l = m + k$. We have

(1) $P_k(\text{the s.r.w. hits } 0 \text{ before hitting } l) = \begin{cases} \dfrac{l-k}{l}, & \text{if } p = q, \\[6pt] \dfrac{1 - (\frac{p}{q})^{l-k}}{1 - (\frac{p}{q})^l}, & \text{if } p \neq q. \end{cases}$

(2) $P_0(\text{the s.r.w. hits } m \text{ before hitting } -k) = \begin{cases} \dfrac{k}{l}, & \text{if } p = q, \\[6pt] \dfrac{1 - (\frac{q}{p})^k}{1 - (\frac{q}{p})^l}, & \text{if } p \neq q. \end{cases}$

Proof. (1) follows from
$$P_k(\text{the s.r.w. hits } 0 \text{ before hitting } l) = 1 - P_k(\text{the s.r.w. hits } l \text{ before visiting } 0),$$
and (2) follows from
$$P_0(\text{the s.r.w. hits } m \text{ before hitting } -k) = P_k(\text{the s.r.w. hits } m + k \text{ before hitting } 0). \;\blacksquare$$

Example 3.20. Suppose that $p = q$ and $0 < m < k < l$. Find

(i) $P_{-k}(\text{the s.r.w. hits } -l \text{ before hitting } 0)$.

(ii) $P_{-k}(\text{the s.r.w. hits } -m \text{ before hitting } -l)$.

Solution. (i)
$$P_{-k}(\text{the s.r.w. hits } -l \text{ before hitting } 0) = P_{l-k}(\text{the s.r.w. hits } 0 \text{ before hitting } l) = \frac{k}{l}.$$
(ii)
$$P_{-k}(\text{the s.r.w. hits } -m \text{ before hitting } -l) = P_{l-k}(\text{the s.r.w. hits } l-m \text{ before hitting } 0) = \frac{l-k}{l-m}.$$

Chapter 4

Branching processes

Example 4.1 (Extinction of zombies). A zombie potion can turn you into a zombie for one day. During that day, you will be able to find $N$ people to bite and turn them into zombies, also for one day. Suppose that $P(N = 0) \neq 1$ and that all zombies have the same ability to find people to bite, their activities being independent of each other. You have just drunk a zombie potion by accident. How many zombies will you create at day $n$? Will your zombie descendants eventually become extinct? If not, what is the extinction probability?

Figure 4.1: Galton-Watson tree.

The probability mass function of $N$, denoted by
$$\{g_k = P(N = k) : k \ge 0\},$$
is called the offspring distribution. For $k \ge 0$, $g_k$ is the probability that a zombie generates exactly $k$ zombies. For $n \ge 0$ let $Z_n$ be the number of zombies at day $n$. Then the sequence of random variables $\{Z_n : n \ge 0\}$ is called a branching process. More precisely, we have the following definition:

Definition 4.2 (Branching processes). A Galton-Watson branching process with offspring distribution (p.m.f.) $\{g_k : k \ge 0\}$ is a stochastic process $\{Z_n : n \ge 0\}$ such that $Z_0 = 1$ and for $n \ge 1$,
$$Z_n = \begin{cases} N_1^{(n)} + N_2^{(n)} + \cdots + N_r^{(n)}, & \text{if } Z_{n-1} = r; \\ 0, & \text{if } Z_{n-1} = 0, \end{cases} \tag{4.1}$$
where $N_1^{(n)}, N_2^{(n)}, \dots$ are independent random variables with the same probability mass function $P(N_i^{(n)} = k) = g_k$ for $k = 0, 1, \dots$, and they are independent of $Z_{n-1}$.

The main questions of interest concern the long-term behaviour of the process, for example:

(1) The behaviour of the mean size of the population $E(Z_n)$ as $n \to \infty$.

(2) Extinction, i.e., whether the process stops, or equivalently $Z_n = 0$ for some $n \ge 1$.

(3) Whether $Z_n$ settles down to a limiting distribution.

4.1 The mean size of the population

4.1.1 Conditional expectation

Definition 4.3 (Conditional expectation). For a discrete random variable $Z$ taking values in $\{z_0, z_1, z_2, \dots\}$ and an event $A \subset \Omega$, the conditional expectation of $Z$ given $A$ is defined as
$$E(Z \mid A) = \sum_{n=0}^{\infty} z_n P(Z = z_n \mid A).$$

Example 4.4. If $Z$ and $A$ are independent, that is, the events $\{Z = z_n\}$ for $n \ge 0$ are independent of $A$, then $P(Z = z_n \mid A) = P(Z = z_n)$, thus $E(Z \mid A) = E(Z)$.

4.1.2 The law of total expectation

Lemma 4.5 (The law of total expectation). Let $\{B_i\}_{i=0}^{\infty}$ be a partition of the sample space $\Omega$, that is, the $B_i$ are disjoint and $\Omega = \cup_{i=0}^{\infty} B_i$. Then
$$E(Z) = \sum_{i=0}^{\infty} E(Z \mid B_i)P(B_i).$$

Proof. Recall the law of total probability: for any event $A$ it holds that
$$P(A) = \sum_{i=0}^{\infty} P(A \cap B_i) = \sum_{i=0}^{\infty} P(A \mid B_i)P(B_i).$$
Then
$$\begin{aligned}
E(Z) &= \sum_{n=0}^{\infty} z_n P(Z = z_n) \\
&= \sum_{n=0}^{\infty}\sum_{i=0}^{\infty} z_n P(Z = z_n \mid B_i)P(B_i) \\
&= \sum_{i=0}^{\infty}\Big(\sum_{n=0}^{\infty} z_n P(Z = z_n \mid B_i)\Big)P(B_i) \\
&= \sum_{i=0}^{\infty} E(Z \mid B_i)P(B_i). \;\blacksquare
\end{aligned}$$

4.1.3 The mean size of the population

We will find $E(Z_n)$, the expected size of the population at generation $n$. Let $N$ be the offspring random variable, i.e., a random variable with p.m.f. $\{g_k\}_{k \ge 0}$, so that all $N_k^{(n)}$ have the same distribution as $N$. Denote
$$m = \sum_{k=0}^{\infty} k g_k = E(N).$$

Proposition 4.6. For $n \ge 1$ we have $E(Z_n) = m^n$.

Proof. We shall prove this by induction. Since $Z_1$ has the same law as $N$, we have $E(Z_1) = m$. This verifies the statement for $n = 1$.

Now assume that the statement is true for $n - 1$, that is, $E(Z_{n-1}) = m^{n-1}$. Since $\{Z_{n-1} = r\}_{r=0}^{\infty}$ is a partition of the sample space $\Omega$, by the law of total expectation we get
$$E(Z_n) = \sum_{r=0}^{\infty} E(Z_n \mid Z_{n-1} = r)P(Z_{n-1} = r).$$
From the definition of $Z_n$, we see that $Z_n = 0$ if $Z_{n-1} = 0$, thus $E(Z_n \mid Z_{n-1} = 0) = 0$, and for $r \ge 1$,
$$E(Z_n \mid Z_{n-1} = r) = E(N_1^{(n)} + N_2^{(n)} + \cdots + N_r^{(n)}) = rm.$$
Therefore for all $r \ge 0$ we have $E(Z_n \mid Z_{n-1} = r) = rm$. This yields that
$$\begin{aligned}
E(Z_n) &= \sum_{r=1}^{\infty} rm \cdot P(Z_{n-1} = r) \\
&= m \cdot \sum_{r=0}^{\infty} r \cdot P(Z_{n-1} = r) \\
&= m E(Z_{n-1}) \\
&= m^n.
\end{aligned}$$
This verifies the statement for $n$. By induction we conclude that $E(Z_n) = m^n$ for all $n \ge 1$. ■
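The identity $E(Z_n) = m^n$ can be illustrated by computing the exact distribution of $Z_n$ for a small offspring law, composing probability generating polynomials coefficient-wise. The offspring values $g_0 = 1/3$, $g_2 = 2/3$ below are an assumed example, not from the notes.

```python
def poly_mul(a, b):
    # product of two polynomials given as coefficient lists
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_compose(outer, inner):
    # coefficients of outer(inner(s)), by Horner's scheme
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result

offspring = [1/3, 0.0, 2/3]    # assumed offspring law: g_0 = 1/3, g_2 = 2/3
m = sum(k * g for k, g in enumerate(offspring))   # mean offspring m = 4/3

pmf = offspring[:]             # p.m.f. of Z_1 (coefficients of G_1 = G)
for _ in range(2):             # two more generations: G_2 = G_1(G), then G_3
    pmf = poly_compose(pmf, offspring)

mean_Z3 = sum(k * p for k, p in enumerate(pmf))
print(mean_Z3, m**3)           # both equal (4/3)^3, illustrating E(Z_n) = m^n
```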
Remark 4.7. If $m = 1$, then $E(Z_n) = 1$ for all $n \ge 1$; this is called the critical case. If $m > 1$, then $E(Z_n) \to \infty$ as $n \to \infty$; this is called the supercritical case. If $m < 1$, then $E(Z_n) \to 0$ as $n \to \infty$; this is called the subcritical case.

4.2 The distribution of $Z_n$

4.2.1 The Random Sums Lemma

Lemma 4.8 (Random Sums Lemma). Suppose that $Y, N, N_1, N_2, \dots$ are independent random variables taking values in the non-negative integers, and suppose that $N, N_1, N_2, \dots$ have the same distribution. Define
$$Z = N_1 + N_2 + \cdots + N_Y = \begin{cases} N_1 + N_2 + \cdots + N_r, & \text{if } Y = r \ge 1; \\ 0, & \text{if } Y = 0. \end{cases}$$
Then the corresponding p.g.f.s are related by
$$G_Z(s) = G_Y(G_N(s)).$$

Proof. Recall that $G_Z(s) = E(s^Z)$. Since $\{Y = r\}_{r=0}^{\infty}$ forms a partition of $\Omega$, by the law of total expectation we have
$$\begin{aligned}
E(s^Z) &= \sum_{r=0}^{\infty} E(s^Z \mid Y = r)P(Y = r) \\
&= E(s^0 \mid Y = 0)P(Y = 0) + \sum_{r=1}^{\infty} E(s^{N_1 + N_2 + \cdots + N_r} \mid Y = r)P(Y = r) \\
&= P(Y = 0) + \sum_{r=1}^{\infty} E(s^{N_1 + N_2 + \cdots + N_r})P(Y = r),
\end{aligned}$$
where we have used the fact that $Y$ and $N_1, N_2, \dots$ are independent. Since the $N_i$ are i.i.d., we have
$$E(s^{N_1 + N_2 + \cdots + N_r}) = G_{N_1}(s)G_{N_2}(s)\cdots G_{N_r}(s) = (G_N(s))^r.$$
Write $t = G_N(s)$. We have
$$G_Z(s) = \sum_{r=0}^{\infty}(G_N(s))^r P(Y = r) = \sum_{r=0}^{\infty} t^r P(Y = r) = G_Y(t) = G_Y(G_N(s)). \;\blacksquare$$

Example 4.9. Let $N_1, N_2, \dots$ be i.i.d. Bernoulli random variables with parameter $p$ and let $Y$ be a Poisson random variable with parameter $\lambda$. Assume that $Y$ is independent of the $N_i$, $i = 1, 2, \dots$. Set $Z = N_1 + N_2 + \cdots + N_Y$. Find $G_Z(s)$. What is the distribution of $Z$?

Solution. By the Random Sums Lemma, $G_Z(s) = G_Y(G_N(s))$, where
$$G_Y(s) = e^{\lambda s - \lambda} \quad\text{and}\quad G_N(s) = 1 - p + ps.$$
Hence
$$G_Z(s) = e^{\lambda(1 - p + ps) - \lambda} = e^{\lambda ps - \lambda p}.$$
Hence $Z$ is a Poisson random variable with parameter $\lambda p$. ■
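This "Poisson thinning" identity can be checked numerically by evaluating the compound p.g.f. as a truncated series; the values of $\lambda$, $p$ and $s$ below are assumptions for illustration.

```python
import math

def pgf_compound(s, lam, p, terms=100):
    # E(s^Z) = sum_r P(Y = r) (G_N(s))^r with Y ~ Poisson(lam), N ~ Bernoulli(p)
    gN = 1 - p + p*s
    return sum(math.exp(-lam) * lam**r / math.factorial(r) * gN**r
               for r in range(terms))

lam, p, s = 2.0, 0.3, 0.7
lhs = pgf_compound(s, lam, p)
rhs = math.exp(lam*p*(s - 1))   # p.g.f. of Poisson(lam * p)
print(abs(lhs - rhs))           # essentially zero
```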

4.2.2 The probability generating function of $Z_n$

The distribution of $Z_n$ is determined by its probability generating function. For $n \ge 1$ we denote it by
$$G_n(s) = G_{Z_n}(s) = \sum_{k=0}^{\infty} P(Z_n = k)s^k.$$

Theorem 4.10. In a Galton-Watson branching process let
$$G(s) = G_N(s) = \sum_{n=0}^{\infty} g_n s^n$$
be the offspring probability generating function. Then

1. $G_1(s) = G(s)$.

2. $G_n(s) = G_{n-1}(G(s))$.

3. $G_n(s) = G(G(\cdots G(s))) = G(G_{n-1}(s))$.

Proof. 1. Since $Z_0 = 1$, $Z_1$ has the same distribution as $N$. So $G_1(s) = G_N(s) = G(s)$. 2. Looking at the definition of the Galton-Watson process we see that we are in the situation of the Random Sums Lemma with $Y = Z_{n-1}$ and $Z = Z_n$. Thus by the RSL,
$$G_n(s) = G_{Z_n}(s) = G_{Z_{n-1}}(G_N(s)) = G_{n-1}(G(s)).$$
3. Iterating 2 we get 3. ■

Example 4.11. Suppose that in a Galton-Watson branching process the offspring distribution is given by
$$P(N = 0) = g_0, \quad P(N = 2) = g_2, \quad g_0 + g_2 = 1.$$
Determine the probability mass function of $Z_2$.

Solution. First we find the probability generating function of $Z_2$. The offspring generating function is
$$G(s) = g_0 + g_2 s^2.$$
By Theorem 4.10,
$$G_2(s) = G_{Z_2}(s) = G(G(s)) = g_0 + g_2(G(s))^2 = g_0 + g_2(g_0 + g_2 s^2)^2 = g_0 + g_2 g_0^2 + 2g_0 g_2^2 s^2 + g_2^3 s^4.$$
This implies that $P(Z_2 = 0) = g_0 + g_2 g_0^2$, $P(Z_2 = 2) = 2g_0 g_2^2$ and $P(Z_2 = 4) = g_2^3$. ■
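The p.m.f. of $Z_2$ can also be obtained by enumerating the two generations directly, which gives a check on the generating-function computation; the numeric values of $g_0$, $g_2$ are assumptions.

```python
g0, g2 = 0.25, 0.75   # assumed numeric offspring probabilities, g0 + g2 = 1

# Z1 is 0 or 2; given Z1 = 2, each of the two individuals independently
# has 0 or 2 children, so Z2 is 0, 2 or 4.
pmf = {0: 0.0, 2: 0.0, 4: 0.0}
pmf[0] += g0                          # branch Z1 = 0
for n1, p1 in [(0, g0), (2, g2)]:
    for n2, p2 in [(0, g0), (2, g2)]:
        pmf[n1 + n2] += g2 * p1 * p2  # branch Z1 = 2

assert abs(pmf[0] - (g0 + g2*g0**2)) < 1e-12
assert abs(pmf[2] - 2*g0*g2**2) < 1e-12
assert abs(pmf[4] - g2**3) < 1e-12
print(pmf)
```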


Example 4.12. Suppose that the offspring distribution in a Galton-Watson branching process is given by $g_k = P(N = k) = pq^k$, $k = 0, 1, \dots$, where $p \neq q$. Prove by induction that
$$G_n(s) = p\,\frac{a_n - b_n s}{a_{n+1} - b_{n+1}s},$$
where $a_n = q^n - p^n$, $b_n = q(q^{n-1} - p^{n-1})$.

Solution. The offspring generating function is
$$G(s) = \sum_{k=0}^{\infty} pq^k s^k = \frac{p}{1 - qs}.$$
Since $a_1 = q - p$, $b_1 = 0$, $a_2 = q^2 - p^2$ and $b_2 = q(q - p)$, we have
$$p\,\frac{a_1 - b_1 s}{a_2 - b_2 s} = p\,\frac{q - p}{(q + p)(q - p) - q(q - p)s} = \frac{p}{1 - qs} = G(s).$$
Now assume that the identity holds for $n$, i.e.
$$G_n(s) = p\,\frac{a_n - b_n s}{a_{n+1} - b_{n+1}s}.$$
We will prove that it also holds for $n + 1$. In fact,
$$G_{n+1}(s) = G_n(G(s)) = p\,\frac{a_n - b_n G(s)}{a_{n+1} - b_{n+1}G(s)} = p\,\frac{a_n - b_n\frac{p}{1-qs}}{a_{n+1} - b_{n+1}\frac{p}{1-qs}} = p\,\frac{a_n - pb_n - qa_n s}{a_{n+1} - pb_{n+1} - qa_{n+1}s}.$$
Since
$$a_n - pb_n = q^n - p^n - pq(q^{n-1} - p^{n-1}) = (1 - p)q^n - (1 - q)p^n = q^{n+1} - p^{n+1} = a_{n+1}$$
and
$$qa_n = q(q^n - p^n) = b_{n+1},$$
we get that
$$G_{n+1}(s) = p\,\frac{a_{n+1} - b_{n+1}s}{a_{n+2} - b_{n+2}s}.$$
This completes the proof. ■
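The closed form can be compared numerically with direct iteration of $G$; the values $p = 0.3$, $q = 0.7$, $s = 0.5$ and $n = 6$ below are assumptions.

```python
p, q = 0.3, 0.7   # assumed values with p != q

def G(s):
    # offspring p.g.f. for g_k = p q^k
    return p / (1 - q*s)

def a(n): return q**n - p**n
def b(n): return q * (q**(n - 1) - p**(n - 1))

def G_closed(n, s):
    # claimed closed form G_n(s) = p (a_n - b_n s) / (a_{n+1} - b_{n+1} s)
    return p * (a(n) - b(n)*s) / (a(n + 1) - b(n + 1)*s)

s, n = 0.5, 6
g = s
for _ in range(n):    # G_n(s) = G applied n times to s
    g = G(g)
print(g, G_closed(n, s))  # the two values agree
```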

Example 4.13. In a certain species of rabbit, only the white ones are able to give birth. Assume that a typical white rabbit will give birth to a white rabbit with probability $\frac23$ and to a grey rabbit with probability $\frac13$, and that the parent rabbit dies after giving birth to a grey rabbit. Let $Z_n$ be the number of white rabbits in the $n$th generation descended from a typical white rabbit.

Find $G_n(s)$ for $n \ge 1$.

Solution. Denote by $N$ the number of white rabbits that a typical white rabbit will give birth to. We have the following p.m.f. of $N$:
$$P(N = 0) = P(\text{the first one is grey}) = \frac13;$$
$$P(N = k) = P(\text{the first } k \text{ are white and the final one is grey}) = \Big(\frac23\Big)^k\frac13, \quad k \ge 1.$$
So $P(N = k) = pq^k$ for $k = 0, 1, 2, \dots$, where $p = \frac13$ and $q = \frac23$. Thus
$$a_n = \Big(\frac23\Big)^n - \Big(\frac13\Big)^n \quad\text{and}\quad b_n = \frac23\Big(\Big(\frac23\Big)^{n-1} - \Big(\frac13\Big)^{n-1}\Big)$$
and
$$G_n(s) = p\,\frac{a_n - b_n s}{a_{n+1} - b_{n+1}s} = \frac13\cdot\frac{(\frac23)^n - (\frac13)^n - \frac23\big((\frac23)^{n-1} - (\frac13)^{n-1}\big)s}{(\frac23)^{n+1} - (\frac13)^{n+1} - \frac23\big((\frac23)^n - (\frac13)^n\big)s} = \frac{2^n - 1 - (2^n - 2)s}{2^{n+1} - 1 - (2^{n+1} - 2)s}.$$

We can also prove it by induction: when $n = 1$ we have
$$G_1(s) = G_N(s) = \sum_{k=0}^{\infty} pq^k s^k = \frac{p}{1 - qs} = \frac{1}{3 - 2s}.$$
Assume that for $n$ we have
$$G_n(s) = \frac{2^n - 1 - (2^n - 2)s}{2^{n+1} - 1 - (2^{n+1} - 2)s}.$$
Then for $n + 1$ we have
$$G_{n+1}(s) = G_n(G(s)) = \frac{2^n - 1 - (2^n - 2)\frac{1}{3-2s}}{2^{n+1} - 1 - (2^{n+1} - 2)\frac{1}{3-2s}} = \frac{2^{n+1} - 1 - (2^{n+1} - 2)s}{2^{n+2} - 1 - (2^{n+2} - 2)s}.$$

Example 4.14. Families in a certain society choose the number of children that they will have according to the following rule: if the first child is a girl, they have exactly one more child; if the first child is a boy, they continue to have children until the first girl, and then cease childbearing. Let $Z_n$ be the number of male members in the $n$th generation descended from a family.

Determine the offspring probability distribution and the offspring generating function.

Solution. Since we are only interested in male members of the population, here the offspring means the male children of a typical family. Denote by $N$ the number of male children of a typical family. We have the following p.m.f. of $N$:
$$P(N = 0) = P(\text{the first is a girl, and the second is also a girl}) = \Big(\frac12\Big)^2;$$
$$P(N = 1) = P(\text{the first is a girl, and the second is a boy}) + P(\text{the first is a boy, and the second is a girl}) = 2\Big(\frac12\Big)^2;$$
$$P(N = k) = P(\text{the first is a boy, then another } k-1 \text{ boys, and the final one is a girl}) = \Big(\frac12\Big)^k\frac12$$
for $k \ge 2$. Then the p.g.f. of $N$ is
$$\begin{aligned}
G(s) &= \sum_{k=0}^{\infty} P(N = k)s^k \\
&= P(N = 0) + P(N = 1)s + \sum_{k=2}^{\infty} P(N = k)s^k \\
&= \Big(\frac12\Big)^2 + 2\Big(\frac12\Big)^2 s + \sum_{k=2}^{\infty}\Big(\frac12\Big)^k\frac12 s^k \\
&= \Big(\frac12\Big)^2 + 2\Big(\frac12\Big)^2 s + \frac12\sum_{k=2}^{\infty}\Big(\frac{s}{2}\Big)^k \\
&= \Big(\frac12\Big)^2 + 2\Big(\frac12\Big)^2 s + \frac12\Big(\frac{1}{1 - \frac{s}{2}} - 1 - \frac{s}{2}\Big) \\
&= -\frac14 + \frac14 s + \frac{1}{2 - s}.
\end{aligned}$$

4.3 Probability of ultimate extinction

Ultimate extinction means that the population reaches zero at some generation, that is
$$\{\text{ultimate extinction}\} = \bigcup_{n=1}^{\infty}\{Z_n = 0\}.$$
Notice that by definition if $Z_{n-1} = 0$ then $Z_n = 0$, which implies
$$\{Z_{n-1} = 0\} \subset \{Z_n = 0\}.$$
Hence $\{Z_n = 0 : n \ge 1\}$ is an increasing sequence of events. Consequently we may define
$$e := P(\{\text{ultimate extinction}\}) = \lim_{n\to\infty} P(Z_n = 0).$$
We call $e$ the probability of ultimate extinction, and $1 - e$ the probability that the population survives indefinitely. The problem is to calculate $e$. The answer lies in the following theorem.

Theorem 4.15. Let $\{Z_n : n \ge 0\}$ be a Galton-Watson branching process with offspring p.g.f. $G(s)$, and let $e$ be the probability of ultimate extinction. Then $e$ is the smallest non-negative solution of the equation $G(s) = s$.

Proof. The proof consists of two steps.

Step 1. We first show that $e$ is a solution of the equation $G(s) = s$. Denote $e_n = P(Z_n = 0)$. We have
$$e = \lim_{n\to\infty} P(Z_n = 0) = \lim_{n\to\infty} e_n.$$
Recall that
$$G_{Z_n}(s) = P(Z_n = 0) + P(Z_n = 1)s + P(Z_n = 2)s^2 + \cdots.$$
Hence $e_n = G_{Z_n}(0)$. Using the identity $G_{Z_n}(s) = G(G_{Z_{n-1}}(s))$ we get
$$e_n = G_{Z_n}(0) = G(G_{Z_{n-1}}(0)) = G(e_{n-1}).$$
Taking the limit on both sides we obtain $e = G(e)$, i.e., $e$ is a solution of the equation $G(s) = s$.

Step 2. Now we show that if $s_0 \ge 0$ is a solution of the equation $G(s) = s$, then $s_0 \ge e$. This will imply that $e$ is the smallest non-negative solution.

First notice that
$$G(s) = \sum_{k=0}^{\infty} g_k s^k$$
is an increasing function of $s$ on $[0, \infty)$. Hence we can deduce from $s_0 \ge 0$ that
$$s_0 = G(s_0) \ge G(0) = e_1.$$
Further, from $s_0 \ge e_1$ we may deduce that
$$s_0 = G(s_0) \ge G(e_1) = G(G(0)) = e_2.$$
More generally, if $s_0 \ge e_n$, then
$$s_0 = G(s_0) \ge G(e_n) = G(G_{Z_n}(0)) = G_{Z_{n+1}}(0) = e_{n+1}.$$
By induction we have $s_0 \ge e_n$ for all $n \ge 1$, and the inequality passes to the limit: $s_0 \ge \lim_{n\to\infty} e_n = e$. This completes the proof. ■
Example 4.16. Suppose that in a Galton-Watson branching process we have $P(N = k) = pq^k$, $k = 0, 1, \dots$ with $p + q = 1$.

(i) Find the probability of ultimate extinction.

(ii) For what values of $p$ is the probability that the population survives indefinitely at least $\frac23$?

Solution. (i) First we compute the offspring generating function:
$$G(s) = \sum_{k=0}^{\infty} g_k s^k = \sum_{k=0}^{\infty} pq^k s^k = \frac{p}{1 - qs}.$$
Then we solve the equation $G(s) = s$:
$$G(s) = s \iff \frac{p}{1 - qs} = s \iff qs^2 - s + p = 0 \iff qs^2 - (q + p)s + p = 0 \iff (qs - p)(s - 1) = 0.$$
Hence the equation has two solutions, $s_1 = \frac{p}{q}$ and $s_2 = 1$. This implies:

(1) If $p < q$, then $e = \frac{p}{q}$;

(2) If $p \ge q$, then $e = 1$.

(ii) The question is equivalent to finding $p$ such that $1 - e \ge \frac23$. This can happen only when $p < q$, which implies $e = \frac{p}{q}$. We have
$$1 - e \ge \frac23 \iff 1 - \frac{p}{q} \ge \frac23 \iff \frac{p}{q} \le \frac13 \iff 3p \le q \iff 3p \le 1 - p \iff p \le \frac14.$$

Example 4.17. In an office there are three staff members. Their email system has the following rule: if one staff member is on holiday and an email is sent to him/her, then the system has a $\frac13$ chance of doing nothing and a $\frac23$ chance of forwarding the email to the other two staff members.

One day all three staff members are on holiday and an email is sent to one of them. What is the probability that the system keeps forwarding this email among the three indefinitely?

Solution. Let $Z_0 = 1$ and for $n \ge 1$ let $Z_n$ be the total number of emails with forwarding paths of length $n$. By assumption, $Z_1 = 2$ with probability $\frac23$ and $Z_1 = 0$ with probability $\frac13$. When $Z_1 = 2$, we have $Z_2 = N_1 + N_2$, where $N_1$ and $N_2$ are independent and have the same distribution as $N$: $P(N = 2) = \frac23$ and $P(N = 0) = \frac13$. Then it is clear that $\{Z_n : n \ge 0\}$ forms a Galton-Watson branching process with offspring distribution $g_0 = \frac13$ and $g_2 = \frac23$. The offspring generating function is
$$G(s) = \frac13 + \frac23 s^2.$$
The probability of ultimate extinction $e$ is the smallest non-negative solution of the equation $G(s) = s$, which is equivalent to
$$2s^2 - 3s + 1 = 0.$$
The above equation has two solutions, $s_1 = \frac12$ and $s_2 = 1$. Hence $e = \frac12$. Thus with probability $1 - e = \frac12$ the process will survive indefinitely, i.e., the system will keep forwarding the email among the three indefinitely. ■
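The extinction probability can also be found numerically by the fixed-point iteration $e_n = G(e_{n-1})$, $e_0 = 0$, used in the proof of Theorem 4.15; applied to this example it converges to $1/2$. The tolerance and iteration cap are assumptions.

```python
def extinction_probability(G, tol=1e-12, max_iter=10_000):
    # Iterate e_n = G(e_{n-1}) from e_0 = 0; the sequence increases
    # to the smallest non-negative fixed point of G on [0, 1].
    e = 0.0
    for _ in range(max_iter):
        e_next = G(e)
        if abs(e_next - e) < tol:
            return e_next
        e = e_next
    return e

G = lambda s: 1/3 + 2/3 * s**2   # offspring p.g.f. from Example 4.17
print(extinction_probability(G))  # converges to 0.5
```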

Chapter 5

Renewal Processes

A renewal process $\{N(t) : t \ge 0\}$ is a nonnegative integer-valued stochastic process that registers the successive occurrences of an event; that is, $N(t)$ represents the number of occurrences of some event in the time interval $[0, t]$. Typical such events are the failures of light bulbs, the incidences of earthquakes, or the breakdowns of computers.

Example 5.1 (Doctor Who's appearances). Doctor Who is able to regenerate after suffering illness, mortal injury or old age. The process repairs all damage and rejuvenates his body, but as a side effect it changes his physical appearance and personality. Assume that Doctor Who first appears at time 0. For $t \ge 0$ denote by $N(t)$ the number of regenerations Doctor Who has undergone by time $t$. For $k = 1, 2, \dots$ denote by $X_k$ the duration of Doctor Who's $k$th appearance. By definition,
$$N(t) = 0 \text{ for } t \in [0, X_1) \quad\text{and}\quad N(t) = \max\{k \ge 1 : X_1 + \cdots + X_k \le t\} \text{ for } t \in [X_1, \infty).$$
Then $N(t)$ is a renewal process.

The formal definition of a renewal process is the following.

Definition 5.2 (Renewal processes). A renewal process $\{N(t) : t \ge 0\}$ is a process of the form
$$N(t) = \max\{n \ge 0 : T_n \le t\},$$
where $T_0 = 0$, $T_n = X_1 + X_2 + \cdots + X_n$ for $n \ge 1$, and $X_n$, $n = 1, 2, \dots$ are independent, identically distributed non-negative random variables.

Remark 5.3. $T_n$ can be regarded as the time at which the $n$th incident occurs (the $n$th occurrence time), and the $X_n$ are the inter-incident (inter-arrival, inter-occurrence) times.

Remark 5.4. $T_n$, $X_n$ can be expressed in terms of the process as follows:
$$T_n = \inf\{t \ge 0 : N(t) = n\}, \qquad X_n = T_n - T_{n-1}.$$

5.1 The distribution of $N(t)$

5.1.1 Convolution Lemma

Given a random variable $X$, denote by $F_X(x)$ the cumulative distribution function (c.d.f.) of $X$, that is
$$F_X(x) = P(X \le x), \qquad x \in \mathbb{R}.$$
Two random variables $X$ and $Y$ have the same distribution if and only if $F_X = F_Y$. When $F_X$ is differentiable, the derivative $f_X(x) = F_X'(x)$ is called the density function of $X$.

Example 5.5. A random variable $X$ has the exponential distribution with parameter $\lambda$, written $X \sim \mathrm{Exp}(\lambda)$, if $X$ takes values in $\mathbb{R}_+ = \{x \ge 0\}$ and its density function is given by
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$$
From the definition we may deduce that $E(X) = 1/\lambda$, since
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = -\int_0^{\infty} x\,de^{-\lambda x} = -x e^{-\lambda x}\Big|_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda}\Big(-\int_0^{\infty} de^{-\lambda x}\Big) = \frac{1}{\lambda}.$$

Lemma 5.6 (Convolution Lemma). Let $X$ and $Y$ be two independent non-negative random variables. Then
$$F_{X+Y}(z) = \int_0^{\infty} F_X(z - y)\,dF_Y(y).$$

Proof. We only prove this in the case when the density functions $f_X(x) = F_X'(x)$ and $f_Y(y) = F_Y'(y)$ both exist. We have
$$\begin{aligned}
F_{X+Y}(z) &= P(X + Y \le z) \\
&= \int_0^{\infty}\int_0^{\infty} \mathbf{1}_{\{x+y\le z\}}\,f_X(x)f_Y(y)\,dx\,dy \\
&= \int_0^{\infty}\Big(\int_0^{\infty} \mathbf{1}_{\{x\le z-y\}}\,f_X(x)\,dx\Big) f_Y(y)\,dy \\
&= \int_0^{\infty} F_X(z - y)f_Y(y)\,dy \\
&= \int_0^{\infty} F_X(z - y)\,dF_Y(y). \;\blacksquare
\end{aligned}$$
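The lemma can be illustrated numerically for $X, Y \sim \mathrm{Exp}(1)$, where the convolution has the known closed form $1 - e^{-z}(1 + z)$ (the $\Gamma(2,1)$ c.d.f.); the Riemann-sum step size is an assumption.

```python
import math

def F_exp(x, lam=1.0):
    # c.d.f. of Exp(lam)
    return 1 - math.exp(-lam*x) if x >= 0 else 0.0

def conv_cdf(z, lam=1.0, steps=20000):
    # Midpoint-rule approximation of the convolution integral
    # over [0, z], since F_X(z - y) = 0 for y > z.
    h = z / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += F_exp(z - y, lam) * lam*math.exp(-lam*y) * h
    return total

z = 2.0
exact = 1 - math.exp(-z) * (1 + z)   # Gamma(2, 1) c.d.f.
print(abs(conv_cdf(z) - exact))      # small numerical error
```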

5.1.2 The probability mass function of $N(t)$

The probability distribution of $N(t)$ is completely determined by the distribution of a typical inter-arrival time $X_1$.

Let $F(x)$ be the cumulative distribution function of $X_1$ in a renewal process. For $k \ge 1$ let $F_k(x) = P(T_k \le x)$ denote the cumulative distribution function of $T_k$. The relation between $F_k$ and $F_{k+1}$ is given by:

Lemma 5.7. For $k \ge 1$,
$$F_1(x) = F(x), \qquad F_{k+1}(x) = \int_0^x F_k(x - y)\,dF(y).$$

Proof. Since $T_1 = X_1$, we have $F_1(x) = F(x)$. Also $T_{k+1} = T_k + X_{k+1}$, and $T_k$, $X_{k+1}$ are independent. By the Convolution Lemma it follows that
$$F_{k+1}(x) = \int_0^{\infty} F_k(x - y)\,dF(y).$$
Since $T_k$ is a non-negative random variable, $F_k(t) = 0$ for all $t < 0$. Therefore
$$\int_0^{\infty} F_k(x - y)\,dF(y) = \int_0^x F_k(x - y)\,dF(y).$$
This gives the conclusion. ■

Proposition 5.8.
$$P(N(t) = k) = F_k(t) - F_{k+1}(t).$$

Proof. Observe that $N(t) \ge k$ if and only if $T_k \le t$. Hence
$$P(N(t) \ge k) = P(T_k \le t) = F_k(t).$$
But $\{N(t) = k\} = \{N(t) \ge k\} \setminus \{N(t) \ge k+1\}$. Therefore
$$P(N(t) = k) = P(N(t) \ge k) - P(N(t) \ge k+1) = F_k(t) - F_{k+1}(t). \;\blacksquare$$

5.1.3 Renewal function

Definition 5.9. The expected number of occurrences $m(t) = E(N(t))$ is called the renewal function.

Recall that for $k \ge 1$, $F_k(t) = P(T_k \le t)$ is the c.d.f. of $T_k$.

Lemma 5.10.
$$m(t) = \sum_{k=1}^{\infty} F_k(t).$$

Proof. Define the indicator variables
$$I_k = \begin{cases} 1, & \text{if } T_k \le t; \\ 0, & \text{otherwise.} \end{cases}$$
Then $N(t) = \sum_{k=1}^{\infty} I_k$. This gives
$$m(t) = E(N(t)) = \sum_{k=1}^{\infty} E(I_k) = \sum_{k=1}^{\infty}\big(1 \cdot P(I_k = 1) + 0\big) = \sum_{k=1}^{\infty} P(T_k \le t) = \sum_{k=1}^{\infty} F_k(t). \;\blacksquare$$

5.2 Poisson Processes

Definition 5.11. Let $\{N(t) : t \ge 0\}$ be a renewal process with inter-arrival times $X_1, X_2, \dots$ and epoch of the $k$th arrival $T_k$. If $X_n \sim \mathrm{Exp}(\lambda)$, i.e., $X_n$ has the exponential distribution with parameter $\lambda$, then $\{N(t) : t \ge 0\}$ is a Poisson process with rate $\lambda$.

5.2.1 The distribution of Poisson processes

When $X_1, \dots, X_k$ are independent exponential random variables with common parameter $\lambda$, one can show that $T_k = X_1 + \cdots + X_k$ has the gamma distribution $\Gamma(k, \lambda)$, namely
$$P(T_k \le t) = \int_0^t \lambda\,\frac{1}{(k-1)!}(\lambda s)^{k-1}e^{-\lambda s}\,ds.$$
Theorem 5.12. Let $\{N(t) : t \ge 0\}$ be a Poisson process with rate $\lambda$. Then, for $s < t$, $N(t) - N(s) \sim \mathrm{Poiss}(\lambda(t - s))$. In particular, $N(t) \sim \mathrm{Poiss}(\lambda t)$, that is,
$$P(N(t) = k) = \frac{(\lambda t)^k}{k!}e^{-\lambda t} \quad\text{for } k = 0, 1, 2, \dots$$

Proof. We prove the theorem for the case $s = 0$. We have
$$\begin{aligned}
P(N(t) = k) &= P(T_k \le t) - P(T_{k+1} \le t) \\
&= \int_0^t \lambda\,\frac{1}{(k-1)!}(\lambda s)^{k-1}e^{-\lambda s}\,ds - \int_0^t \lambda\,\frac{1}{k!}(\lambda s)^k e^{-\lambda s}\,ds \\
&= \int_0^t \lambda\,\frac{1}{(k-1)!}(\lambda s)^{k-1}e^{-\lambda s}\,ds + \int_0^t \frac{1}{k!}(\lambda s)^k\,d(e^{-\lambda s}) \\
&= \int_0^t \lambda\,\frac{1}{(k-1)!}(\lambda s)^{k-1}e^{-\lambda s}\,ds + \frac{1}{k!}(\lambda s)^k e^{-\lambda s}\Big|_0^t - \int_0^t \frac{1}{k!}e^{-\lambda s}\,d(\lambda s)^k \\
&= \int_0^t \lambda\,\frac{1}{(k-1)!}(\lambda s)^{k-1}e^{-\lambda s}\,ds + \frac{1}{k!}(\lambda t)^k e^{-\lambda t} - \int_0^t \frac{1}{k!}e^{-\lambda s}\,k\lambda(\lambda s)^{k-1}\,ds \\
&= \frac{(\lambda t)^k}{k!}e^{-\lambda t}. \;\blacksquare
\end{aligned}$$
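Consistently with Lemma 5.10, the renewal function of a Poisson process is $m(t) = E(N(t)) = \lambda t$; this can be checked by a truncated sum over the Poisson p.m.f. just derived. The parameter values and truncation level are assumptions.

```python
import math

def poisson_pmf(k, mu):
    # P(N(t) = k) with mu = lam * t, from Theorem 5.12
    return math.exp(-mu) * mu**k / math.factorial(k)

lam, t = 1.5, 4.0
mu = lam * t
mean = sum(k * poisson_pmf(k, mu) for k in range(150))  # truncated E N(t)
print(mean, lam * t)  # both ≈ 6.0, i.e. m(t) = lam * t
```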

In fact, the following general statement holds.

Theorem 5.13. If $N(t)$ is a Poisson process of rate $\lambda$, then

(1) (Independent increments) For any $s_1 < t_1 < s_2 < t_2 < \cdots < s_n < t_n$, the increments $N(t_n) - N(s_n), N(t_{n-1}) - N(s_{n-1}), \dots, N(t_1) - N(s_1)$ are independent.

(2) For $0 \le s \le t$, $N(t) - N(s) \sim \mathrm{Poiss}(\lambda(t - s))$.
5.2.2 Properties of Poisson processes

Proposition 5.14. Let $\{N(t) : t \ge 0\}$ be a Poisson process of rate $\lambda$. We have:

(i) For $s \le t$, $E(N(s)N(t)) = \lambda s(\lambda t + 1)$.

(ii) For $s \le t$ and $m \le n$,
$$P(N(s) = m, N(t) = n) = e^{-\lambda t}\,\frac{\lambda^n s^m (t - s)^{n-m}}{m!\,(n - m)!}.$$

Proof. (i) We use the independent-increments property:
$$\begin{aligned}
E(N(s)N(t)) &= E(N(s)[N(t) - N(s) + N(s)]) \\
&= E(N(s)[N(t) - N(s)]) + E(N(s)^2) \\
&= E(N(s))E(N(t) - N(s)) + E(N(s)^2) \\
&= \lambda s \cdot \lambda(t - s) + E(N(s)^2).
\end{aligned}$$
Recall that if $X \sim \mathrm{Poiss}(\lambda)$, then $E(X) = \lambda$ and $V(X) = E(X^2) - E(X)^2 = \lambda$, which means $E(X^2) = \lambda + \lambda^2$. This yields that
$$E(N(s)N(t)) = \lambda s \cdot \lambda(t - s) + \lambda s + (\lambda s)^2 = \lambda s(\lambda t + 1).$$
(ii) Using the independent-increments property,
$$\begin{aligned}
P(N(s) = m, N(t) = n) &= P(N(s) = m,\ N(t) - N(s) = n - m) \\
&= P(N(s) = m)\,P(N(t) - N(s) = n - m) \\
&= e^{-\lambda s}\frac{(\lambda s)^m}{m!}\,e^{-\lambda(t-s)}\frac{(\lambda(t-s))^{n-m}}{(n - m)!} \\
&= e^{-\lambda t}\,\frac{\lambda^n s^m(t - s)^{n-m}}{m!\,(n - m)!}. \;\blacksquare
\end{aligned}$$
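Part (i) can be cross-checked against part (ii) by evaluating $E(N(s)N(t))$ as a truncated double sum over the joint p.m.f.; the parameter values and truncation level are assumptions.

```python
import math

def joint_pmf(m, n, s, t, lam):
    # P(N(s) = m, N(t) = n), from Proposition 5.14(ii); zero when m > n
    if m > n:
        return 0.0
    return (math.exp(-lam*t) * lam**n * s**m * (t - s)**(n - m)
            / (math.factorial(m) * math.factorial(n - m)))

lam, s, t = 1.0, 1.0, 2.0
cross = sum(m * n * joint_pmf(m, n, s, t, lam)
            for n in range(60) for m in range(n + 1))
print(cross, lam*s*(lam*t + 1))  # both ≈ 3.0
```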

Example 5.15. For $t_1 < t_2 < t_3$ and $n_1 \le n_2 \le n_3$ we have
$$P(N(t_3) = n_3, N(t_2) = n_2 \mid N(t_1) = n_1) = e^{-\lambda(t_3 - t_1)}\lambda^{n_3 - n_1}\,\frac{(t_3 - t_2)^{n_3 - n_2}(t_2 - t_1)^{n_2 - n_1}}{(n_3 - n_2)!\,(n_2 - n_1)!}.$$

Solution.
$$\begin{aligned}
&P(N(t_3) = n_3, N(t_2) = n_2 \mid N(t_1) = n_1) \\
&= P(N(t_3) - N(t_2) = n_3 - n_2,\ N(t_2) - N(t_1) = n_2 - n_1 \mid N(t_1) = n_1) \\
&= P(N(t_3) - N(t_2) = n_3 - n_2)\,P(N(t_2) - N(t_1) = n_2 - n_1) \\
&= e^{-\lambda(t_3 - t_2)}\frac{(\lambda(t_3 - t_2))^{n_3 - n_2}}{(n_3 - n_2)!}\,e^{-\lambda(t_2 - t_1)}\frac{(\lambda(t_2 - t_1))^{n_2 - n_1}}{(n_2 - n_1)!} \\
&= e^{-\lambda(t_3 - t_1)}\lambda^{n_3 - n_1}\,\frac{(t_3 - t_2)^{n_3 - n_2}(t_2 - t_1)^{n_2 - n_1}}{(n_3 - n_2)!\,(n_2 - n_1)!}.
\end{aligned}$$

Example 5.16. The number N(t) of road traffic accidents in (0, t] in a city forms a Poisson process of rate λ. The numbers of people seriously injured in the accidents may be assumed to be i.i.d. random variables, independent of each other and of the Poisson process N(t), each distributed as a random variable X taking the value n = 0, 1, 2, · · · with probability pq^n, where p + q = 1. Let Y(t) be the number of people seriously injured in road traffic accidents in (0, t].

(i) Express Y (t ) as a random sum.

(ii) Find the probability generating functions of N (t ) and X .

(iii) Find the probability generating function of Y (t ), P(Y (t ) = 0) and E(Y (t )).

Solution. (i) By definition,
$$Y(t) = X_1 + X_2 + \cdots + X_{N(t)} = \begin{cases} X_1 + X_2 + \cdots + X_r, & \text{if } N(t) = r \geq 1; \\ 0, & \text{if } N(t) = 0, \end{cases}$$
where the X_i are i.i.d. with the same distribution as X.


(ii) By direct calculation,
$$G_{N(t)}(s) = \sum_{n=0}^{\infty} e^{-\lambda t} \frac{(\lambda t)^n}{n!} s^n = e^{\lambda t s - \lambda t}, \qquad G_X(s) = \sum_{n=0}^{\infty} p q^n s^n = \frac{p}{1 - qs}.$$

(iii) By the Random Sum Lemma,
$$G_{Y(t)}(s) = G_{N(t)}(G_X(s)) = \exp\Big( \frac{\lambda t p}{1 - qs} - \lambda t \Big).$$

Hence
$$P(Y(t) = 0) = G_{Y(t)}(0) = e^{\lambda t p - \lambda t} = e^{-\lambda t q}$$
and
$$E(Y(t)) = G'_{Y(t)}(1) = G'_{N(t)}(G_X(1))\, G'_X(1) = E(N(t))\, E(X) = \lambda t \cdot \frac{pq}{(1-q)^2} = \lambda t\, \frac{q}{p}.$$
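The answers in (iii) can be checked by simulating the random sum directly. A sketch under the stated assumptions (the samplers `poisson_sample` and `injuries` are our own helpers, using Knuth's method and inverse-transform sampling respectively):

```python
import math
import random

rng = random.Random(2)
lam, p, t, trials = 2.0, 0.6, 1.0, 200_000
q = 1 - p

def poisson_sample(mean):
    # Knuth's method: count how many uniforms it takes for the running
    # product to drop below e^{-mean}.
    n, prod, limit = 0, rng.random(), math.exp(-mean)
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def injuries():
    # Inverse transform for P(X = n) = p * q**n: X = floor(log U / log q).
    return int(math.log(1.0 - rng.random()) / math.log(q))

total = zeros = 0
for _ in range(trials):
    y = sum(injuries() for _ in range(poisson_sample(lam * t)))
    total += y
    zeros += (y == 0)

print(total / trials, lam * t * q / p)         # E(Y(t)) = λt·q/p
print(zeros / trials, math.exp(-lam * t * q))  # P(Y(t)=0) = e^{-λtq}
```

Both simulated values should land close to the formulas derived above (λt·q/p ≈ 1.333 and e^{−λtq} ≈ 0.449 for these parameters).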

5.3 Lifetimes of renewal processes


5.3.1 The excess, current and total lifetimes
There are several principal quantities of interest concerning renewal processes. Given any moment (time) t, we are naturally interested in how much time is left until the next incident; for example, when is the next earthquake? By time t, we know that N(t) occurrences have already taken place and we are waiting for the (N(t) + 1)th occurrence, that is, t belongs to the random interval I_t = [T_{N(t)}, T_{N(t)+1}).

Definition 5.17.

(a) E(t) = T_{N(t)+1} − t is called the excess (residual) lifetime of I_t. E(t) is the time left until the next incident.

(b) C(t) = t − T_{N(t)} is called the current lifetime (age) of I_t. C(t) represents the time that has passed since the last incident.

(c) D(t) = T_{N(t)+1} − T_{N(t)} is called the total lifetime of I_t. Note that D(t) = C(t) + E(t).

5.3.2 The lifetimes of Poisson processes


Theorem 5.18. Suppose that the renewal process N(t) is a Poisson process of rate λ. Then

(a) E(t) ∼ Exp(λ), i.e., for x ≥ 0,
$$P(E(t) \leq x) = 1 - e^{-\lambda x}.$$

(b)
$$P(C(t) \leq x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } 0 \leq x < t; \\ 1, & \text{for } x \geq t. \end{cases}$$

(c)
$$P(E(t) > x,\, C(t) \geq y) = \begin{cases} e^{-\lambda (x+y)}, & \text{if } x \geq 0,\ 0 \leq y \leq t; \\ 0, & \text{if } y > t. \end{cases}$$

Proof. (a) Note that E(t) = T_{N(t)+1} − t > x if and only if no arrivals occur in (t, t + x]. The time interval includes t + x since if the first arrival after t occurs at t + x then E(t) = x. Hence
$$P(E(t) > x) = P(N(t+x) - N(t) = 0) = e^{-\lambda x}$$
and
$$P(E(t) \leq x) = 1 - P(E(t) > x) = 1 - e^{-\lambda x}.$$

(b) Note that C(t) = t − T_{N(t)} ≤ t. Therefore, for x > t, P(C(t) < x) = 1. If 0 ≤ x ≤ t, then C(t) ≥ x if and only if no arrivals occur in (t − x, t]. (Note that if the last arrival before t occurs at t − x then C(t) = x, and if it occurs at t then C(t) = 0.) Hence, in this case,
$$P(C(t) \geq x) = P(N(t) - N(t-x) = 0) = e^{-\lambda x}.$$
Summarizing the above facts, we get
$$P(C(t) < x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } 0 \leq x \leq t; \\ 1, & \text{for } x > t. \end{cases}$$

(c) When x ≥ 0 and 0 ≤ y ≤ t, we have E(t) > x and C(t) ≥ y if and only if no arrivals occur in (t − y, t + x]. Thus
$$P(E(t) > x,\, C(t) \geq y) = P(N(t+x) - N(t-y) = 0) = e^{-\lambda (x+y)}.$$
If y > t, since C(t) ≤ t, we have P(E(t) > x, C(t) ≥ y) = 0. ■

Remark 5.19. As the distributions of E(t) and C(t) have no atoms in [0, t), all the probabilities above do not change regardless of whether strict or non-strict inequalities are used.

Theorem 5.20.
$$E(D(t)) = \frac{1}{\lambda} \big( 2 - e^{-\lambda t} \big).$$

Proof. Since E(t) ∼ Exp(λ), we have E(E(t)) = 1/λ. On the other hand, integrating by parts,
\begin{align*}
E(C(t)) &= \int_0^t x\, dP(C(t) \leq x) \\
&= \big[ x\, P(C(t) \leq x) \big]_{x=0}^{t} - \int_0^t P(C(t) \leq x)\, dx \\
&= t - \int_0^t P(C(t) \leq x)\, dx \\
&= \int_0^t \big( 1 - P(C(t) \leq x) \big)\, dx \\
&= \int_0^t P(C(t) > x)\, dx \\
&= \int_0^t e^{-\lambda x}\, dx \\
&= \Big[ -\frac{1}{\lambda} e^{-\lambda x} \Big]_{x=0}^{t} \\
&= \frac{1}{\lambda} - \frac{1}{\lambda} e^{-\lambda t}.
\end{align*}
Then we conclude that
$$E(D(t)) = E(E(t)) + E(C(t)) = \frac{1}{\lambda} \big( 2 - e^{-\lambda t} \big).$$
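The lifetime formulas can be sanity-checked by locating, along a simulated path, the interval [T_{N(t)}, T_{N(t)+1}) containing t. A minimal sketch (parameter choices are illustrative):

```python
import math
import random

rng = random.Random(3)
lam, t, trials = 1.0, 2.0, 200_000

excess = current = 0.0
for _ in range(trials):
    # Walk the arrival times until we pass t; then t lies in [T_{N(t)}, T_{N(t)+1}).
    last, nxt = 0.0, rng.expovariate(lam)
    while nxt <= t:
        last, nxt = nxt, nxt + rng.expovariate(lam)
    excess += nxt - t      # E(t) = T_{N(t)+1} - t
    current += t - last    # C(t) = t - T_{N(t)}

print(excess / trials, 1 / lam)                                     # E(E(t)) = 1/λ
print((excess + current) / trials, (2 - math.exp(-lam * t)) / lam)  # E(D(t))
```

Note that E(D(t)) exceeds 1/λ = E(X_1): the interval containing a fixed time t is longer on average than a typical interarrival interval (the "inspection paradox").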

Example 5.21. Let T0 = 0 be the time you bought your first mobile phone. For
k ≥ 1 let Tk denote the time that you replace your kth mobile phone with a new
one. Assume that the lifetimes of your mobile phones are i.i.d. with Exp(λ), that
is, T1 − T0 , T2 − T1 , · · · , Tk+1 − Tk , · · · are i.i.d. exponential random variables with
parameter λ. For t ≥ 0 denote by N (t ) the number of mobile phones you have
replaced up to time t . Then

N (t ) = max{n ≥ 0 : Tn ≤ t }.

(Q1) What is the distribution of {N (t )}t ≥0 ?


[A]: It is a Poisson process with rate λ.

(Q2) What is the expected number of mobile phones you have replaced up to
time t ?
[A]: N (t ) ∼ Poiss(λt ), so E(N (t )) = λt .

(Q3) What is the expected remaining time you will spend on your current mobile phone?
[A]: E(t) ∼ Exp(λ), so E(E(t)) = 1/λ.

(Q4) What is the expected total lifetime of the mobile phone you are using at time t?
[A]: E(D(t)) = (1/λ)(2 − e^{−λt}).

Assume that each time you replace your current mobile phone with a new one, there is a probability p ∈ [0, 1] that you will buy insurance for the new phone. Assume that your decisions on buying insurance are independent of each other. For t ≥ 0 let Ñ(t) denote the number of mobile phones you have insured up to time t.

(Q5) What is the distribution of Ñ(t) conditioned on N(t) = k?
[A]: It is a sum of k independent Bernoulli random variables with parameter p, and therefore it is Bin(k, p).

(Q6) Find the p.m.f. of Ñ(t).
[A]: Note that Ñ(t) ≤ N(t) always holds. By using the law of total probability,
\begin{align*}
P(\tilde N(t) = m) &= \sum_{k=0}^{\infty} P(\tilde N(t) = m \mid N(t) = k)\, P(N(t) = k) \\
&= \sum_{k=m}^{\infty} P(\tilde N(t) = m \mid N(t) = k)\, P(N(t) = k) \\
&= \sum_{k=m}^{\infty} \binom{k}{m} p^m (1-p)^{k-m}\, e^{-\lambda t} \frac{(\lambda t)^k}{k!} \\
&= \sum_{k=m}^{\infty} \frac{k!}{m!\,(k-m)!}\, p^m (1-p)^{k-m}\, e^{-\lambda t} \frac{(\lambda t)^k}{k!} \\
&= \frac{1}{m!} (p \lambda t)^m e^{-\lambda t} \sum_{k=m}^{\infty} \frac{((1-p)\lambda t)^{k-m}}{(k-m)!} \\
&= \frac{1}{m!} (p \lambda t)^m e^{-\lambda t} \sum_{l=0}^{\infty} \frac{((1-p)\lambda t)^{l}}{l!} \\
&= \frac{1}{m!} (p \lambda t)^m e^{-\lambda t} e^{(1-p)\lambda t} \\
&= \frac{1}{m!} (p \lambda t)^m e^{-p \lambda t}.
\end{align*}

(Q7) What is the distribution of {Ñ(t)}_{t≥0}?
[A]: It is a Poisson process with rate pλ.
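The thinning result in (Q6) and (Q7) can be illustrated by flipping an independent p-coin at each arrival of a simulated rate-λ process and comparing the retained count with a Poiss(pλt) pmf. A sketch (names and parameter values are ours):

```python
import math
import random

rng = random.Random(4)
lam, p, t, trials = 3.0, 0.4, 1.0, 200_000

insured = []
for _ in range(trials):
    k, clock = 0, rng.expovariate(lam)
    while clock <= t:
        if rng.random() < p:   # insure this replacement with probability p
            k += 1
        clock += rng.expovariate(lam)
    insured.append(k)

print(sum(insured) / trials, p * lam * t)  # thinned mean: pλt
for m in range(4):
    emp = insured.count(m) / trials
    exact = (p * lam * t) ** m * math.exp(-p * lam * t) / math.factorial(m)
    print(f"m={m}: simulated {emp:.4f}, Poiss(p*lam*t) {exact:.4f}")
```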

5.4 (*) Renewal theory


Theorem 5.22. P(N(t) < ∞) = 1 for all t > 0 if and only if E(X_1) > 0.

Proof. By the definition of N(t), {N(t) < ∞} is the same event as {there exists n ≥ 1 such that T_n > t}.

If E(X_1) = 0 then X_n = 0 almost surely for all n, since the X_n are non-negative and have the same probability distribution as X_1. Then T_n = 0 almost surely for all n, thus there is no n ≥ 1 such that T_n > t. Hence N(t) = ∞ almost surely, that is, P(N(t) = ∞) = 1.

Conversely, suppose that E(X_1) > 0. There exists ε > 0 such that P(X_1 > ε) = P(X_n > ε) = δ > 0. (Otherwise P(X_1 ≤ ε) = 1, and hence E(X_1) ≤ ε, for every ε > 0, which is a contradiction.) Put A_n = {X_n > ε}. Define
$$A = \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} A_n.$$

A is the event that X_n > ε occurs for infinitely many n. Note that
$$A^c = \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} A_n^c.$$
Since the events A_n^c are independent, we have for each k
$$P\Big( \bigcap_{n=k}^{\infty} A_n^c \Big) = \prod_{n=k}^{\infty} P(A_n^c) = \lim_{N \to \infty} (1 - \delta)^{N} = 0.$$
Thus
$$P(A^c) \leq \sum_{k=1}^{\infty} P\Big( \bigcap_{n=k}^{\infty} A_n^c \Big) = 0.$$
This means P(A) = 1.
On the other hand, on A the sum T_n = X_1 + · · · + X_n exceeds any given t eventually, so
$$A = \{ X_n > \varepsilon \text{ for infinitely many } n \} \subset \{ \text{there exists } n \geq 1 \text{ such that } T_n > t \} = \{ N(t) < \infty \}.$$
This yields that P(N(t) < ∞) = 1. ■

Theorem 5.23. Suppose that 0 < E(X_1) = µ < ∞. Then, as t → ∞,
$$\frac{N(t)}{t} \to \frac{1}{\mu}.$$

Proof. Recall that T_n = X_1 + · · · + X_n. By the law of large numbers,
$$\frac{T_n}{n} \to \mu \quad \text{as } n \to \infty.$$
Therefore, noting that N(t) → ∞ as t → ∞, we have
$$\frac{T_{N(t)}}{N(t)} \to \mu \quad \text{as } t \to \infty.$$
On the other hand, since T_{N(t)} ≤ t < T_{N(t)+1}, we have
$$\frac{T_{N(t)}}{N(t)} \leq \frac{t}{N(t)} < \frac{T_{N(t)+1}}{N(t)+1} \Big( 1 + \frac{1}{N(t)} \Big).$$
Thus, as t → ∞,
$$\frac{t}{N(t)} \to \mu \quad \Rightarrow \quad \frac{N(t)}{t} \to \frac{1}{\mu}.$$
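The theorem is not restricted to Poisson processes. A quick numerical illustration with Uniform(0, 1) interarrival times, so that µ = 1/2 and N(t)/t should settle near 1/µ = 2 (the horizon and seed are our own choices):

```python
import random

rng = random.Random(5)
t = 100_000.0

# Interarrival times Uniform(0, 1), so mu = E(X_1) = 1/2 (not exponential).
n, clock = 0, rng.random()
while clock <= t:
    n += 1
    clock += rng.random()

print(n / t)  # should be close to 1/mu = 2
```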

Recall the renewal function m(t) = E(N(t)), and that F(x) = F_1(x) = P(X_1 ≤ x) is the cumulative distribution function of X_1.

Theorem 5.24 (Renewal Equation).
$$m(t) = F(t) + \int_0^t m(t - x)\, dF(x).$$

Proof. Recall that
$$N(t) = \max\{ n \geq 0 : T_n \leq t \} = \sum_{n=1}^{\infty} 1_{\{T_n \leq t\}},$$
where T_0 = 0, T_n = X_1 + X_2 + · · · + X_n for n ≥ 1, and X_n, n = 1, 2, · · ·, are independent, identically distributed non-negative random variables. Let T̃_0 = 0 and T̃_n = X_2 + · · · + X_{n+1} = T_{n+1} − X_1 for n ≥ 1. Then let
$$\tilde N(t) = \max\{ n \geq 0 : \tilde T_n \leq t \}.$$

By definition, Ñ(t) has the same distribution as N(t) and is independent of X_1. Now consider X_1. If X_1 > t, then N(t) = 0. If X_1 ≤ t, then N(t) ≥ 1 and
\begin{align*}
N(t) &= \max\{ n \geq 0 : T_n \leq t \} \\
&= \max\{ n \geq 1 : T_n - X_1 \leq t - X_1 \} \\
&= 1 + \max\{ n \geq 0 : T_{n+1} - X_1 \leq t - X_1 \} \\
&= 1 + \max\{ n \geq 0 : \tilde T_n \leq t - X_1 \} \\
&= 1 + \tilde N(t - X_1).
\end{align*}
This yields that
$$N(t) = 1_{\{X_1 \leq t\}} \big( 1 + \tilde N(t - X_1) \big).$$
Taking expectations on both sides, we get
\begin{align*}
E(N(t)) &= E(1_{\{X_1 \leq t\}}) + E\big( 1_{\{X_1 \leq t\}} \tilde N(t - X_1) \big) \\
&= P(X_1 \leq t) + \int_0^t E(\tilde N(t - x))\, dF(x) \\
&= F(t) + \int_0^t m(t - x)\, dF(x).
\end{align*}
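The renewal equation can be solved numerically by discretising the Riemann-Stieltjes integral. For Exp(λ) interarrivals we know m(t) = λt (the Poisson case), which gives a check on the scheme. A sketch; the right-endpoint discretisation and the grid size are our own choices, not from the notes:

```python
import math

lam, T, n = 2.0, 3.0, 3000
h = T / n
# F is the Exp(lam) c.d.f. sampled on the grid 0, h, 2h, ..., T.
F = [1 - math.exp(-lam * i * h) for i in range(n + 1)]

# Discretise m(t) = F(t) + integral of m(t-x) dF(x): on each cell [jh, (j+1)h]
# evaluate m at the right endpoint, which keeps the recursion explicit.
m = [0.0] * (n + 1)
for i in range(1, n + 1):
    acc = F[i]
    for j in range(i):
        acc += m[i - j - 1] * (F[j + 1] - F[j])
    m[i] = acc

print(m[n], lam * T)  # for Exp(lam) interarrivals m(t) = lam*t
```

The same loop works for any interarrival c.d.f. F; only the line building the grid values of F changes.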

Theorem 5.25 (Elementary Renewal Theorem). Suppose that 0 < E(X_1) = µ < ∞. Then, as t → ∞,
$$\frac{m(t)}{t} \to \frac{1}{\mu}.$$
