
1

Discrete-time Markov chains

This chapter is the foundation for all that follows. Discrete-time Markov
chains are defined and their behaviour is investigated. For better orientation
we now list the key theorems: these are Theorems 1.3.2 and 1.3.5 on hitting
times, Theorem 1.4.2 on the strong Markov property, Theorem 1.5.3
characterizing recurrence and transience, Theorem 1.7.7 on invariant
distributions and positive recurrence, Theorem 1.8.3 on convergence to
equilibrium, Theorem 1.9.3 on reversibility, and Theorem 1.10.2 on long-run
averages. Once you understand these you will understand the basic theory.
Part of that understanding will come from familiarity with examples, so a
large number are worked out in the text. Exercises at the end of each section
are an important part of the exposition.

1.1 Definition and basic properties

Let I be a countable set. Each i ∈ I is called a state and I is called the
state-space. We say that λ = (λi : i ∈ I) is a measure on I if 0 ≤ λi < ∞
for all i ∈ I. If in addition the total mass Σi∈I λi equals 1, then we call
λ a distribution. We work throughout with a probability space (Ω, F, P).
Recall that a random variable X with values in I is a function X : Ω → I.
Suppose we set

    λi = P(X = i) = P({ω : X(ω) = i}).


Then λ defines a distribution, the distribution of X. We think of X as


modelling a random state which takes the value i with probability λi . There
is a brief review of some basic facts about countable sets and probability
spaces in Chapter 6.
We say that a matrix P = (pij : i, j ∈ I) is stochastic if every row
(pij : j ∈ I) is a distribution. There is a one-to-one correspondence between
stochastic matrices P and the sort of diagrams described in the Introduction.
Here are two examples:

         ( 1 − α     α   )
P  =     (   β     1 − β )

[diagram: states 1 and 2, with transition probability α from 1 to 2 and β from 2 to 1]

         (  0      1      0  )
P  =     (  0     1/2    1/2 )
         ( 1/2     0     1/2 )

[diagram: states 1, 2 and 3, with arrows corresponding to the positive entries of P]

We shall now formalize the rules for a Markov chain by a definition in
terms of the corresponding matrix P. We say that (Xn)n≥0 is a Markov
chain with initial distribution λ and transition matrix P if
(i) X0 has distribution λ;
(ii) for n ≥ 0, conditional on Xn = i, Xn+1 has distribution (pij : j ∈ I)
and is independent of X0, . . . , Xn−1.
More explicitly, these conditions state that, for n ≥ 0 and i0, . . . , in+1 ∈ I,
(i) P(X0 = i0) = λi0;
(ii) P(Xn+1 = in+1 | X0 = i0, . . . , Xn = in) = pin in+1.
We say that (Xn)n≥0 is Markov(λ, P) for short. If (Xn)0≤n≤N is a finite
sequence of random variables satisfying (i) and (ii) for n = 0, . . . , N − 1,
then we again say (Xn)0≤n≤N is Markov(λ, P).
It is in terms of properties (i) and (ii) that most real-world examples are
seen to be Markov chains. But mathematically the following result appears
to give a more comprehensive description, and it is the key to some later
calculations.
Theorem 1.1.1. A discrete-time random process (Xn)0≤n≤N is
Markov(λ, P) if and only if for all i0, i1, . . . , iN ∈ I

    P(X0 = i0, X1 = i1, . . . , XN = iN) = λi0 pi0 i1 pi1 i2 . . . piN−1 iN .        (1.1)


Proof. Suppose (Xn)0≤n≤N is Markov(λ, P), then

    P(X0 = i0, X1 = i1, . . . , XN = iN)
        = P(X0 = i0)P(X1 = i1 | X0 = i0)
          . . . P(XN = iN | X0 = i0, . . . , XN−1 = iN−1)
        = λi0 pi0 i1 . . . piN−1 iN .

On the other hand, if (1.1) holds for N, then by summing both sides over
iN ∈ I and using Σj∈I pij = 1 we see that (1.1) holds for N − 1 and, by
induction

    P(X0 = i0, X1 = i1, . . . , Xn = in) = λi0 pi0 i1 . . . pin−1 in

for all n = 0, 1, . . . , N. In particular, P(X0 = i0) = λi0 and, for n =
0, 1, . . . , N − 1,

    P(Xn+1 = in+1 | X0 = i0, . . . , Xn = in)
        = P(X0 = i0, . . . , Xn = in, Xn+1 = in+1)/P(X0 = i0, . . . , Xn = in)
        = pin in+1 .

So (Xn)0≤n≤N is Markov(λ, P).

The next result reinforces the idea that Markov chains have no memory.
We write δi = (δij : j ∈ I) for the unit mass at i, where

    δij = 1 if i = j, and δij = 0 otherwise.

Theorem 1.1.2 (Markov property). Let (Xn )n≥0 be Markov(λ, P ).


Then, conditional on Xm = i, (Xm+n )n≥0 is Markov(δi , P ) and is indepen-
dent of the random variables X0 , . . . , Xm .

Proof. We have to show that for any event A determined by X0, . . . , Xm
we have

    P({Xm = im, . . . , Xm+n = im+n} ∩ A | Xm = i)
        = δi im pim im+1 . . . pim+n−1 im+n P(A | Xm = i)        (1.2)

then the result follows by Theorem 1.1.1. First consider the case of elementary
events

    A = {X0 = i0, . . . , Xm = im}.


In that case we have to show

    P(X0 = i0, . . . , Xm+n = im+n and i = im)/P(Xm = i)
        = δi im pim im+1 . . . pim+n−1 im+n
          × P(X0 = i0, . . . , Xm = im and i = im)/P(Xm = i)

which is true by Theorem 1.1.1. In general, any event A determined by
X0, . . . , Xm may be written as a countable disjoint union of elementary
events

    A = ⋃k≥1 Ak .

Then the desired identity (1.2) for A follows by summing up the corresponding
identities for Ak.
The remainder of this section addresses the following problem: what is
the probability that after n steps our Markov chain is in a given state? First
we shall see how the problem reduces to calculating entries in the nth power
of the transition matrix. Then we shall look at some examples where this
may be done explicitly.
We regard distributions and measures λ as row vectors whose compo-
nents are indexed by I, just as P is a matrix whose entries are indexed by
I × I. When I is finite we will often label the states 1, 2, . . . , N ; then λ
will be an N -vector and P an N × N -matrix. For these objects, matrix
multiplication is a familiar operation. We extend matrix multiplication to
the general case in the obvious way, defining a new measure λP and a new
matrix P 2 by

    (λP)j = Σi∈I λi pij ,        (P 2)ik = Σj∈I pij pjk .

We define P n similarly for any n. We agree that P 0 is the identity matrix
I, where (I)ij = δij. The context will make it clear when I refers to the
state-space and when to the identity matrix. We write pij(n) = (P n)ij for
the (i, j) entry in P n.
In the case where λi > 0 we shall write Pi (A) for the conditional prob-
ability P(A | X0 = i). By the Markov property at time m = 0, under Pi ,
(Xn )n≥0 is Markov(δi , P ). So the behaviour of (Xn )n≥0 under Pi does not
depend on λ.
Theorem 1.1.3. Let (Xn)n≥0 be Markov(λ, P). Then, for all n, m ≥ 0,
(i) P(Xn = j) = (λP n)j ;
(ii) Pi(Xn = j) = P(Xn+m = j | Xm = i) = pij(n).


Proof. (i) By Theorem 1.1.1

    P(Xn = j) = Σi0∈I . . . Σin−1∈I P(X0 = i0, . . . , Xn−1 = in−1, Xn = j)
              = Σi0∈I . . . Σin−1∈I λi0 pi0 i1 . . . pin−1 j = (λP n)j .

(ii) By the Markov property, conditional on Xm = i, (Xm+n)n≥0 is Markov
(δi, P), so we just take λ = δi in (i).
In light of this theorem we call pij(n) the n-step transition probability from i
to j. The following examples give some methods for calculating pij(n).
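Before turning to those examples, here is a minimal numerical sketch of
Theorem 1.1.3 in action (this code is not part of the original text and assumes
the numpy library): the distribution at time n is just the row vector λP n,
computed here for the three-state matrix shown earlier.

    # Sketch: P(X_n = j) = (lambda P^n)_j for a finite chain, using numpy.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])   # a stochastic matrix: rows sum to 1
    lam = np.array([1.0, 0.0, 0.0])   # initial distribution, unit mass at state 1

    n = 10
    dist_n = lam @ np.linalg.matrix_power(P, n)   # the row vector lambda P^n
    print(dist_n)         # P(X_n = j) for j = 1, 2, 3
    print(dist_n.sum())   # should be 1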

Example 1.1.4
The most general two-state chain has transition matrix of the form

         ( 1 − α     α   )
P  =     (   β     1 − β )

and is represented by the following diagram:

[diagram: states 1 and 2, with probability α of moving from 1 to 2 and β of moving from 2 to 1]

We exploit the relation P n+1 = P n P to write

    p11(n+1) = p12(n) β + p11(n) (1 − α).

We also know that p11(n) + p12(n) = P1(Xn = 1 or 2) = 1, so by eliminating
p12(n) we get a recurrence relation for p11(n):

    p11(n+1) = (1 − α − β)p11(n) + β,        p11(0) = 1.

This has a unique solution (see Section 1.11):

    p11(n) = β/(α + β) + α/(α + β) (1 − α − β)n    for α + β > 0,
    p11(n) = 1                                      for α + β = 0.
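As a quick check (an illustrative sketch assuming numpy, not part of the
original text), the closed form above can be compared with the entries of the
n-th matrix power:

    # Compare p11^(n) from the closed form with the n-th matrix power.
    import numpy as np

    alpha, beta = 0.3, 0.7          # any values with alpha + beta > 0
    P = np.array([[1 - alpha, alpha],
                  [beta, 1 - beta]])

    for n in range(6):
        exact = beta/(alpha + beta) + alpha/(alpha + beta) * (1 - alpha - beta)**n
        print(n, np.linalg.matrix_power(P, n)[0, 0], exact)   # the two columns agree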


Example 1.1.5 (Virus mutation)


Suppose a virus can exist in N different strains and in each generation
either stays the same, or with probability α mutates to another strain,
which is chosen at random. What is the probability that the strain in the
nth generation is the same as that in the 0th?
We could model this process as an N-state chain, with N × N transition
matrix P given by

    pii = 1 − α,        pij = α/(N − 1) for i ≠ j.

Then the answer we want would be found by computing p11(n). In fact, in
this example there is a much simpler approach, which relies on exploiting
the symmetry present in the mutation rules.
At any time a transition is made from the initial state to another with
probability α, and a transition from another state to the initial state with
probability α/(N − 1). Thus we have a two-state chain with diagram

[diagram: states 'initial' and 'other', with probability α from initial to other and α/(N − 1) back]

and by putting β = α/(N − 1) in Example 1.1.4 we find that the desired
probability is

    1/N + (1 − 1/N)(1 − αN/(N − 1))n .
Beware that in examples having less symmetry, this sort of lumping together
of states may not produce a Markov chain.
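For the symmetric example above, however, a short numerical sanity check
(an illustrative sketch assuming numpy; not part of the original text) confirms
that the lumped two-state formula matches the full N-state chain; the values
of N and α below are arbitrary.

    # Check the lumping argument for the virus mutation chain.
    import numpy as np

    N, alpha, n = 5, 0.2, 8
    P = np.full((N, N), alpha/(N - 1))   # mutation to each other strain
    np.fill_diagonal(P, 1 - alpha)       # probability of staying the same

    lumped = 1/N + (1 - 1/N) * (1 - alpha*N/(N - 1))**n
    print(np.linalg.matrix_power(P, n)[0, 0], lumped)   # these agree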

Example 1.1.6
Consider the three-state chain with diagram

[diagram: the three-state chain with the transition matrix given below]


and transition matrix

         (  0      1      0  )
P  =     (  0     1/2    1/2 ) .
         ( 1/2     0     1/2 )

The problem is to find a general formula for p11(n).
First we compute the eigenvalues of P by writing down its characteristic
equation

    0 = det(x − P) = x(x − 1/2)² − 1/4 = (1/4)(x − 1)(4x² + 1).
The eigenvalues are 1, i/2, −i/2 and from this we deduce that p11(n) has the
form

    p11(n) = a + b (i/2)n + c (−i/2)n

for some constants a, b and c. (The justification comes from linear algebra:
having distinct eigenvalues, P is diagonalizable, that is, for some invertible
matrix U we have

              ( 1     0       0   )
    P  =  U   ( 0    i/2      0   )  U −1
              ( 0     0     −i/2  )

and hence

              ( 1        0           0       )
    P n = U   ( 0     (i/2)n          0       )  U −1
              ( 0        0        (−i/2)n     )

which forces p11(n) to have the form claimed.) The answer we want is real
and

    (±i/2)n = (1/2)n e±inπ/2 = (1/2)n (cos(nπ/2) ± i sin(nπ/2))

so it makes sense to rewrite p11(n) in the form

    p11(n) = α + (1/2)n ( β cos(nπ/2) + γ sin(nπ/2) )

for constants α, β and γ. The first few values of p11(n) are easy to write
down, so we get equations to solve for α, β and γ:

    1 = p11(0) = α + β
    0 = p11(1) = α + (1/2)γ
    0 = p11(2) = α − (1/4)β


so α = 1/5, β = 4/5, γ = −2/5 and

    p11(n) = 1/5 + (1/2)n ( (4/5) cos(nπ/2) − (2/5) sin(nπ/2) ).

More generally, the following method may in principle be used to find a
formula for pij(n) for any M-state chain and any states i and j.
(i) Compute the eigenvalues λ1, . . . , λM of P by solving the characteristic
equation.
(ii) If the eigenvalues are distinct then pij(n) has the form

    pij(n) = a1 (λ1)n + . . . + aM (λM)n

for some constants a1, . . . , aM (depending on i and j). If an eigenvalue
λ is repeated (once, say) then the general form includes the term
(an + b)λn.
(iii) As roots of a polynomial with real coefficients, complex eigenvalues
will come in conjugate pairs and these are best written using sine
and cosine, as in the example.
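The method is also easy to carry out numerically. The following sketch
(assuming numpy; not part of the original text) diagonalizes the three-state
matrix of Example 1.1.6 and recovers p11(n), agreeing with the closed form
found above.

    # Eigenvalue method for p11^(n) of the three-state chain of Example 1.1.6.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])

    w, U = np.linalg.eig(P)            # eigenvalues 1, i/2, -i/2
    Uinv = np.linalg.inv(U)

    def p11(n):
        # P^n = U diag(w^n) U^{-1}; take the (1,1) entry and discard the
        # vanishing imaginary round-off
        return (U @ np.diag(w**n) @ Uinv)[0, 0].real

    for n in range(5):
        closed = 1/5 + (1/2)**n * (4/5*np.cos(n*np.pi/2) - 2/5*np.sin(n*np.pi/2))
        print(n, round(p11(n), 6), round(closed, 6))   # columns agree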

Exercises
1.1.1 Let B1, B2, . . . be disjoint events with ⋃n≥1 Bn = Ω. Show that if A
is another event and P(A|Bn) = p for all n then P(A) = p.
Deduce that if X and Y are discrete random variables then the following
are equivalent:
(a) X and Y are independent;
(b) the conditional distribution of X given Y = y is independent of y.
1.1.2 Suppose that (Xn )n≥0 is Markov (λ, P ). If Yn = Xkn , show that
(Yn )n≥0 is Markov (λ, P k ).
1.1.3 Let X0 be a random variable with values in a countable set I. Let
Y1 , Y2 , . . . be a sequence of independent random variables, uniformly dis-
tributed on [0, 1]. Suppose we are given a function

G : I × [0, 1] → I

and define inductively

Xn+1 = G(Xn , Yn+1 ).

Show that (Xn )n≥0 is a Markov chain and express its transition matrix P
in terms of G. Can all Markov chains be realized in this way? How would
you simulate a Markov chain using a computer?


Suppose now that Z0 , Z1 , . . . are independent, identically distributed


random variables such that Zi = 1 with probability p and Zi = 0 with
probability 1 − p. Set S0 = 0, Sn = Z1 + . . . + Zn . In each of the following
cases determine whether (Xn )n≥0 is a Markov chain:
(a) Xn = Zn , (b) Xn = Sn ,
(c) Xn = S0 + . . . + Sn , (d)Xn = (Sn , S0 + . . . + Sn ).
In the cases where (Xn )n≥0 is a Markov chain find its state-space and
transition matrix, and in the cases where it is not a Markov chain give an
example where P (Xn+1 = i|Xn = j, Xn−1 = k) is not independent of k.
1.1.4 A flea hops about at random on the vertices of a triangle, with all
jumps equally likely. Find the probability that after n hops the flea is back
where it started.
A second flea also hops about on the vertices of a triangle, but this flea is
twice as likely to jump clockwise as anticlockwise. What is the probability
that after n hops this second flea is back where it started? [Recall that
e±iπ/6 = √3/2 ± i/2.]
1.1.5 A die is ‘fixed’ so that each time it is rolled the score cannot be the
same as the preceding score, all other scores having probability 1/5. If the
first score is 6, what is the probability p that the nth score is 6? What is
the probability that the nth score is 1?
Suppose now that a new die is produced which cannot score one greater
(mod 6) than the preceding score, all other scores having equal probability.
By considering the relationship between the two dice find the value of p for
the new die.
1.1.6 An octopus is trained to choose object A from a pair of objects A, B
by being given repeated trials in which it is shown both and is rewarded
with food if it chooses A. The octopus may be in one of three states of mind:
in state 1 it cannot remember which object is rewarded and is equally likely
to choose either; in state 2 it remembers and chooses A but may forget
again; in state 3 it remembers and chooses A and never forgets. After each
trial it may change its state of mind according to the transition matrix

    State 1:  1/2   1/2    0
    State 2:  1/2   1/12   5/12
    State 3:   0     0     1.

It is in state 1 before the first trial. What is the probability that it is
in state 1 just before the (n + 1)th trial? What is the probability Pn+1(A)
that it chooses A on the (n + 1)th trial?


Someone suggests that the record of successive choices (a sequence of As


and Bs) might arise from a two-state Markov chain with constant transition
probabilities. Discuss, with reference to the value of Pn+1 (A) that you have
found, whether this is possible.

1.1.7 Let (Xn)n≥0 be a Markov chain on {1, 2, 3} with transition matrix

         ( 0     1      0  )
P  =     ( 0    2/3    1/3 ) .
         ( p   1 − p    0  )

Calculate P(Xn = 1 | X0 = 1) in each of the following cases: (a) p = 1/16,
(b) p = 1/6, (c) p = 1/12.

1.2 Class structure

It is sometimes possible to break a Markov chain into smaller pieces, each


of which is relatively easy to understand, and which together give an un-
derstanding of the whole. This is done by identifying the communicating
classes of the chain.
We say that i leads to j and write i → j if

Pi (Xn = j for some n ≥ 0) > 0.

We say i communicates with j and write i ↔ j if both i → j and j → i.

Theorem 1.2.1. For distinct states i and j the following are equivalent:
(i) i → j;
(ii) pi0 i1 pi1 i2 . . . pin−1 in > 0 for some states i0, i1, . . . , in with i0 = i and
in = j;
(iii) pij(n) > 0 for some n ≥ 0.
Proof. Observe that

    pij(n) ≤ Pi(Xn = j for some n ≥ 0) ≤ Σn≥0 pij(n)

which proves the equivalence of (i) and (iii). Also

    pij(n) = Σi1 ,..., in−1 pi i1 pi1 i2 . . . pin−1 j

so that (ii) and (iii) are equivalent.


It is clear from (ii) that i → j and j → k imply i → k. Also i → i for


any state i. So ↔ satisfies the conditions for an equivalence relation on I,
and thus partitions I into communicating classes. We say that a class C is
closed if
i ∈ C, i → j imply j ∈ C.
Thus a closed class is one from which there is no escape. A state i is
absorbing if {i} is a closed class. The smaller pieces referred to above are
these communicating classes. A chain or transition matrix P where I is a
single class is called irreducible.
As the following example makes clear, when one can draw the diagram,
the class structure of a chain is very easy to find.

Example 1.2.2
Find the communicating classes associated to the stochastic matrix

         ( 1/2   1/2    0     0     0    0 )
         (  0     0     1     0     0    0 )
P  =     ( 1/3    0     0    1/3   1/3   0 )
         (  0     0     0    1/2   1/2   0 )
         (  0     0     0     0     0    1 )
         (  0     0     0     0     1    0 ).

The solution is obvious from the diagram

[diagram: the six states with arrows corresponding to the positive entries of P]

the classes being {1, 2, 3}, {4} and {5, 6}, with only {5, 6} being closed.
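For a finite chain the reachability relation i → j, and hence the communicating
classes, can also be computed mechanically. The following sketch (assuming
numpy; not part of the original text) recovers the classes of Example 1.2.2:

    # Communicating classes from the reachability relation, for Example 1.2.2.
    import numpy as np

    P = np.array([[1/2, 1/2, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0, 0],
                  [1/3, 0, 0, 1/3, 1/3, 0],
                  [0, 0, 0, 1/2, 1/2, 0],
                  [0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 1, 0]])

    n = len(P)
    # i -> j iff j is reachable in at most n-1 steps; (I + P)^(n-1) > 0 captures this
    reach = np.linalg.matrix_power(np.eye(n) + P, n - 1) > 0
    classes = {frozenset(j for j in range(n)
                         if reach[i, j] and reach[j, i]) for i in range(n)}
    for c in classes:
        closed = all(not reach[i, j] for i in c for j in range(n) if j not in c)
        print(sorted(s + 1 for s in c), "closed" if closed else "not closed")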

Exercises
1.2.1 Identify the communicating classes of the following transition matrix:

         ( 1/2    0     0     0    1/2 )
         (  0    1/2    0    1/2    0  )
P  =     (  0     0     1     0     0  )
         (  0    1/4   1/4   1/4   1/4 )
         ( 1/2    0     0     0    1/2 ).

Which classes are closed?


1.2.2 Show that every transition matrix on a finite state-space has at least
one closed communicating class. Find an example of a transition matrix
with no closed communicating class.

1.3 Hitting times and absorption probabilities

Let (Xn )n≥0 be a Markov chain with transition matrix P . The hitting time
of a subset A of I is the random variable H A : Ω → {0, 1, 2, . . . } ∪ {∞}
given by
H A (ω) = inf{n ≥ 0 : Xn (ω) ∈ A}
where we agree that the infimum of the empty set ∅ is ∞. The probability
starting from i that (Xn)n≥0 ever hits A is then

    hiA = Pi(H A < ∞).

When A is a closed class, hiA is called the absorption probability. The mean
time taken for (Xn)n≥0 to reach A is given by

    kiA = Ei(H A) = Σn<∞ n P(H A = n) + ∞ P(H A = ∞).

We shall often write less formally

    hiA = Pi(hit A),        kiA = Ei(time to hit A).

Remarkably, these quantities can be calculated explicitly by means of
certain linear equations associated with the transition matrix P. Before we
give the general theory, here is a simple example.

Example 1.3.1
Consider the chain with the following diagram:

[diagram: states 1, 2, 3, 4 in a line; 1 and 4 are absorbing, while from 2 and 3 the chain jumps to either neighbour with probability 1/2 each]

Starting from 2, what is the probability of absorption in 4? How long does


it take until the chain is absorbed in 1 or 4?
Introduce

hi = Pi (hit 4), ki = Ei (time to hit {1, 4}).


Clearly, h1 = 0, h4 = 1 and k1 = k4 = 0. Suppose now that we start at 2,
and consider the situation after making one step. With probability 1/2 we
jump to 1 and with probability 1/2 we jump to 3. So

    h2 = (1/2)h1 + (1/2)h3 ,        k2 = 1 + (1/2)k1 + (1/2)k3 .

The 1 appears in the second formula because we count the time for the first
step. Similarly,

    h3 = (1/2)h2 + (1/2)h4 ,        k3 = 1 + (1/2)k2 + (1/2)k4 .

Hence

    h2 = (1/2)h3 = (1/2)((1/2)h2 + 1/2),
    k2 = 1 + (1/2)k3 = 1 + (1/2)(1 + (1/2)k2).

So, starting from 2, the probability of hitting 4 is 1/3 and the mean time to
absorption is 2. Note that in writing down the first equations for h2 and k2
we made implicit use of the Markov property, in assuming that the chain
begins afresh from its new position after the first jump. Here is a general
result for hitting probabilities.

Theorem 1.3.2. The vector of hitting probabilities hA = (hiA : i ∈ I) is
the minimal non-negative solution to the system of linear equations

    hiA = 1                        for i ∈ A
    hiA = Σj∈I pij hjA             for i ∉ A.        (1.3)

(Minimality means that if x = (xi : i ∈ I) is another solution with xi ≥ 0
for all i, then xi ≥ hiA for all i.)

Proof. First we show that hA satisfies (1.3). If X0 = i ∈ A, then H A = 0,
so hiA = 1. If X0 = i ∉ A, then H A ≥ 1, so by the Markov property

    Pi(H A < ∞ | X1 = j) = Pj(H A < ∞) = hjA

and

    hiA = Pi(H A < ∞) = Σj∈I Pi(H A < ∞, X1 = j)
        = Σj∈I Pi(H A < ∞ | X1 = j)Pi(X1 = j) = Σj∈I pij hjA .


Suppose now that x = (xi : i ∈ I) is any solution to (1.3). Then hiA = xi = 1
for i ∈ A. Suppose i ∉ A, then

    xi = Σj∈I pij xj = Σj∈A pij + Σj∉A pij xj .

Substitute for xj to obtain

    xi = Σj∈A pij + Σj∉A pij ( Σk∈A pjk + Σk∉A pjk xk )
       = Pi(X1 ∈ A) + Pi(X1 ∉ A, X2 ∈ A) + Σj∉A Σk∉A pij pjk xk .

By repeated substitution for x in the final term we obtain after n steps

    xi = Pi(X1 ∈ A) + . . . + Pi(X1 ∉ A, . . . , Xn−1 ∉ A, Xn ∈ A)
         + Σj1∉A . . . Σjn∉A pij1 pj1 j2 . . . pjn−1 jn xjn .

Now if x is non-negative, so is the last term on the right, and the remaining
terms sum to Pi(H A ≤ n). So xi ≥ Pi(H A ≤ n) for all n and then

    xi ≥ limn→∞ Pi(H A ≤ n) = Pi(H A < ∞) = hiA .

Example 1.3.1 (continued)
The system of linear equations (1.3) for h = h{4} are given here by

    h4 = 1,
    h2 = (1/2)h1 + (1/2)h3 ,        h3 = (1/2)h2 + (1/2)h4

so that

    h2 = (1/2)h1 + (1/2)((1/2)h2 + 1/2)

and

    h2 = 1/3 + (2/3)h1 ,        h3 = 2/3 + (1/3)h1 .

The value of h1 is not determined by the system (1.3), but the minimality
condition now makes us take h1 = 0, so we recover h2 = 1/3 as before. Of
course, the extra boundary condition h1 = 0 was obvious from the beginning


so we built it into our system of equations and did not have to worry about
minimal non-negative solutions.
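For a small chain such as this one the equations reduce to a little linear
system. The following sketch (assuming numpy; not part of the original text)
imposes the boundary values h1 = 0, h4 = 1 and k1 = k4 = 0 from Example
1.3.1 and solves for the interior states 2 and 3:

    # Hitting probabilities and mean hitting times for Example 1.3.1.
    import numpy as np

    P = np.array([[1, 0, 0, 0],
                  [1/2, 0, 1/2, 0],
                  [0, 1/2, 0, 1/2],
                  [0, 0, 0, 1]], dtype=float)

    interior = [1, 2]                     # states 2 and 3 (0-based indices)
    Q = P[np.ix_(interior, interior)]     # transitions among the interior states

    # h_i = sum_j p_ij h_j with h4 = 1, h1 = 0   =>   (I - Q) h = P[interior, state 4]
    h = np.linalg.solve(np.eye(2) - Q, P[interior, 3])
    # k_i = 1 + sum_{j interior} p_ij k_j        =>   (I - Q) k = 1
    k = np.linalg.solve(np.eye(2) - Q, np.ones(2))

    print(h)   # [1/3, 2/3]: probability of hitting 4 from states 2 and 3
    print(k)   # [2, 2]: mean time to hit {1, 4} from states 2 and 3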
In cases where the state-space is infinite it may not be possible to write
down a corresponding extra boundary condition. Then, as we shall see in
the next examples, the minimality condition is essential.
Example 1.3.3 (Gamblers’ ruin)
Consider the Markov chain with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1 for each i ≥ 1, and 0 absorbing]

where 0 < p = 1 − q < 1. The transition probabilities are

p00 = 1,
pi,i−1 = q, pi,i+1 = p for i = 1, 2, . . . .

Imagine that you enter a casino with a fortune of £i and gamble, £1 at a
time, with probability p of doubling your stake and probability q of losing
it. The resources of the casino are regarded as infinite, so there is no upper
limit to your fortune. But what is the probability that you leave broke?
Set hi = Pi (hit 0), then h is the minimal non-negative solution to

h0 = 1,
hi = phi+1 + qhi−1 , for i = 1, 2, . . . .

If p ≠ q this recurrence relation has a general solution

    hi = A + B (q/p)i .
(See Section 1.11.) If p < q, which is the case in most successful casinos,
then the restriction 0 ≤ hi ≤ 1 forces B = 0, so hi = 1 for all i. If p > q,
then since h0 = 1 we get a family of solutions

    hi = (q/p)i + A( 1 − (q/p)i ) ;

for a non-negative solution we must have A ≥ 0, so the minimal non-


negative solution is hi = (q/p)i . Finally, if p = q the recurrence relation
has a general solution
hi = A + Bi


and again the restriction 0 ≤ hi ≤ 1 forces B = 0, so hi = 1 for all i.


Thus, even if you find a fair casino, you are certain to end up broke. This
apparent paradox is called gamblers’ ruin.
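A Monte Carlo sanity check of the case p > q (an illustrative sketch assuming
numpy; not part of the original text): the fraction of simulated walks that hit
0 should be close to (q/p)i. Paths are truncated after a fixed horizon, which
is harmless here since surviving walks drift upwards.

    # Simulation estimate of the ruin probability (q/p)^i when p > q.
    import numpy as np

    rng = np.random.default_rng(0)
    p, i0, trials, horizon = 0.6, 3, 5000, 1000

    steps = rng.choice([1, -1], size=(trials, horizon), p=[p, 1 - p])
    paths = i0 + np.cumsum(steps, axis=1)
    est = (paths.min(axis=1) <= 0).mean()    # fraction of walks that reach 0

    q = 1 - p
    print(est, (q/p)**i0)                    # estimate vs (q/p)^i, about 0.30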

Example 1.3.4 (Birth-and-death chain)


Consider the Markov chain with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability pi of moving from i to i + 1 and qi of moving from i to i − 1, and 0 absorbing]

where, for i = 1, 2, . . . , we have 0 < pi = 1 − qi < 1. As in the preceding
example, 0 is an absorbing state and we wish to calculate the absorption
probability starting from i. But here we allow pi and qi to depend on i.
Such a chain may serve as a model for the size of a population, recorded
each time it changes, pi being the probability that we get a birth before
a death in a population of size i. Then hi = Pi (hit 0) is the extinction
probability starting from i.
We write down the usual system of equations

h0 = 1,
hi = pi hi+1 + qi hi−1 , for i = 1, 2, . . . .

This recurrence relation has variable coefficients so the usual technique fails.
But consider ui = hi−1 − hi , then pi ui+1 = qi ui , so

    ui+1 = (qi/pi) ui = ( qi qi−1 . . . q1 / pi pi−1 . . . p1 ) u1 = γi u1

where the final equality defines γi . Then

u1 + . . . + ui = h0 − hi

so
hi = 1 − A(γ0 + . . . + γi−1 )
where A = u1 and γ0 = 1. At this point A remains to be determined. In
the case Σi≥0 γi = ∞, the restriction 0 ≤ hi ≤ 1 forces A = 0 and hi = 1
for all i. But if Σi≥0 γi < ∞ then we can take A > 0 so long as

    1 − A(γ0 + . . . + γi−1) ≥ 0 for all i.

Thus the minimal non-negative solution occurs when A = ( Σi≥0 γi )−1 and
then

    hi = Σj≥i γj / Σj≥0 γj .

In this case, for i = 1, 2, . . . , we have hi < 1, so the population survives
with positive probability.
Here is the general result on mean hitting times. Recall that kiA =
Ei (H A ), where H A is the first time (Xn )n≥0 hits A. We use the notation
1B for the indicator function of B, so, for example, 1X1 =j is the random
variable equal to 1 if X1 = j and equal to 0 otherwise.

Theorem 1.3.5. The vector of mean hitting times kA = (kiA : i ∈ I) is
the minimal non-negative solution to the system of linear equations

    kiA = 0                             for i ∈ A
    kiA = 1 + Σj∉A pij kjA              for i ∉ A.        (1.4)

Proof. First we show that kA satisfies (1.4). If X0 = i ∈ A, then H A = 0,
so kiA = 0. If X0 = i ∉ A, then H A ≥ 1, so, by the Markov property,

    Ei(H A | X1 = j) = 1 + Ej(H A)

and

    kiA = Ei(H A) = Σj∈I Ei(H A 1{X1=j})
        = Σj∈I Ei(H A | X1 = j)Pi(X1 = j) = 1 + Σj∉A pij kjA .

Suppose now that y = (yi : i ∈ I) is any solution to (1.4). Then kiA = yi = 0
for i ∈ A. If i ∉ A, then

    yi = 1 + Σj∉A pij yj
       = 1 + Σj∉A pij ( 1 + Σk∉A pjk yk )
       = Pi(H A ≥ 1) + Pi(H A ≥ 2) + Σj∉A Σk∉A pij pjk yk .

By repeated substitution for y in the final term we obtain after n steps

    yi = Pi(H A ≥ 1) + . . . + Pi(H A ≥ n) + Σj1∉A . . . Σjn∉A pij1 pj1 j2 . . . pjn−1 jn yjn .


So, if y is non-negative,

    yi ≥ Pi(H A ≥ 1) + . . . + Pi(H A ≥ n)

and, letting n → ∞,

    yi ≥ Σn≥1 Pi(H A ≥ n) = Ei(H A) = kiA .

Exercises
1.3.1 Prove the claims (a), (b) and (c) made in example (v) of the Intro-
duction.
1.3.2 A gambler has £2 and needs to increase it to £10 in a hurry. He
can play a game with the following rules: a fair coin is tossed; if a player
bets on the right side, he wins a sum equal to his stake, and his stake is
returned; otherwise he loses his stake. The gambler decides to use a bold
strategy in which he stakes all his money if he has £5 or less, and otherwise
stakes just enough to increase his capital, if he wins, to £10.
Let X0 = 2 and let Xn be his capital after n throws. Prove that the
gambler will achieve his aim with probability 1/5.
What is the expected number of tosses until the gambler either achieves
his aim or loses his capital?
1.3.3 A simple game of ‘snakes and ladders’ is played on a board of nine
squares.

[diagram: the board, with squares numbered boustrophedon — bottom row 1 2 3, middle row 6 5 4, top row 7 8 9 — START before square 1, FINISH at square 9, and the snakes and ladders drawn on the board]


At each turn a player tosses a fair coin and advances one or two places
according to whether the coin lands heads or tails. If you land at the foot
of a ladder you climb to the top, but if you land at the head of a snake you
slide down to the tail. How many turns on average does it take to complete
the game?
What is the probability that a player who has reached the middle square
will complete the game without slipping back to square 1?
1.3.4 Let (Xn)n≥0 be a Markov chain on {0, 1, . . . } with transition probabilities
given by

    p01 = 1,        pi,i+1 + pi,i−1 = 1,        pi,i+1 = ((i + 1)/i)² pi,i−1 ,    i ≥ 1.

Show that if X0 = 0 then the probability that Xn ≥ 1 for all n ≥ 1 is 6/π².

1.4 Strong Markov property


In Section 1.1 we proved the Markov property. This says that for each time
m, conditional on Xm = i, the process after time m begins afresh from
i. Suppose, instead of conditioning on Xm = i, we simply waited for the
process to hit state i, at some random time H. What can one say about the
process after time H? What if we replaced H by a more general random
time, for example H − 1? In this section we shall identify a class of random
times at which a version of the Markov property does hold. This class will
include H but not H − 1; after all, the process after time H − 1 jumps
straight to i, so it does not simply begin afresh.
A random variable T : Ω → {0, 1, 2, . . . } ∪ {∞} is called a stopping time
if the event {T = n} depends only on X0 , X1 , . . . , Xn for n = 0, 1, 2, . . . .
Intuitively, by watching the process, you know at the time when T occurs.
If asked to stop at T , you know when to stop.
Examples 1.4.1
(a) The first passage time

Tj = inf{n ≥ 1 : Xn = j}

is a stopping time because

{Tj = n} = {X1 ≠ j, . . . , Xn−1 ≠ j, Xn = j}.

(b) The first hitting time H A of Section 1.3 is a stopping time because

{H A = n} = {X0 ∉ A, . . . , Xn−1 ∉ A, Xn ∈ A}.


(c) The last exit time

LA = sup{n ≥ 0 : Xn ∈ A}

is not in general a stopping time because the event {LA = n} depends on


whether (Xn+m )m≥1 visits A or not.

We shall show that the Markov property holds at stopping times. The
crucial point is that, if T is a stopping time and B ⊆ Ω is determined by
X0 , X1 , . . . , XT , then B ∩ {T = m} is determined by X0 , X1 , . . . , Xm , for
all m = 0, 1, 2, . . . .

Theorem 1.4.2 (Strong Markov property). Let (Xn )n≥0 be


Markov(λ, P ) and let T be a stopping time of (Xn )n≥0 . Then, conditional
on T < ∞ and XT = i, (XT +n )n≥0 is Markov(δi , P ) and independent of
X 0 , X1 , . . . , XT .

Proof. If B is an event determined by X0 , X1 , . . . , XT , then B ∩ {T = m}


is determined by X0 , X1 , . . . , Xm , so, by the Markov property at time m

P({XT = j0 , XT +1 = j1 , . . . , XT +n = jn } ∩ B ∩ {T = m} ∩ {XT = i})


= Pi (X0 = j0 , X1 = j1 , . . . , Xn = jn )P(B ∩ {T = m} ∩ {XT = i})

where we have used the condition T = m to replace m by T . Now sum over


m = 0, 1, 2, . . . and divide by P(T < ∞, XT = i) to obtain

P({XT = j0 , XT +1 = j1 , . . . , XT +n = jn } ∩ B | T < ∞, XT = i)
= Pi (X0 = j0 , X1 = j1 , . . . , Xn = jn )P(B | T < ∞, XT = i).

The following example uses the strong Markov property to get more
information on the hitting times of the chain considered in Example 1.3.3.

Example 1.4.3
Consider the Markov chain (Xn )n≥0 with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1]


where 0 < p = 1 − q < 1. We know from Example 1.3.3 the probability of
hitting 0 starting from 1. Here we obtain the complete distribution of the
time to hit 0 starting from 1 in terms of its probability generating function.
Set

    Hj = inf{n ≥ 0 : Xn = j}

and, for 0 ≤ s < 1

    φ(s) = E1(sH0) = Σn<∞ sn P1(H0 = n).

Suppose we start at 2. Apply the strong Markov property at H1 to see
that under P2, conditional on H1 < ∞, we have H0 = H1 + H̃0, where
H̃0, the time taken after H1 to get to 0, is independent of H1 and has the
(unconditioned) distribution of H1. So

    E2(sH0) = E2(sH1 | H1 < ∞) E2(sH̃0 | H1 < ∞) P2(H1 < ∞)
            = E2(sH1 1H1<∞) E2(sH̃0 | H1 < ∞)
            = E2(sH1)² = φ(s)² .

Then, by the Markov property at time 1, conditional on X1 = 2, we have
H0 = 1 + H̃0, where H̃0, the time taken after time 1 to get to 0, has the
same distribution as H0 does under P2. So

    φ(s) = E1(sH0) = p E1(sH0 | X1 = 2) + q E1(sH0 | X1 = 0)
         = p E1(s1+H̃0 | X1 = 2) + q E1(s | X1 = 0)
         = ps E2(sH0) + qs
         = ps φ(s)² + qs.

Thus φ = φ(s) satisfies

    ps φ² − φ + qs = 0        (1.5)

and

    φ = ( 1 ± √(1 − 4pqs²) ) / 2ps.

Since φ(0) ≤ 1 and φ is continuous we are forced to take the negative root
at s = 0 and stick with it for all 0 ≤ s < 1.
To recover the distribution of H0 we expand the square-root as a power
series:

    φ(s) = (1/2ps) { 1 − ( 1 + (1/2)(−4pqs²) + (1/2)(−1/2)(−4pqs²)²/2! + . . . ) }
         = qs + pq² s³ + . . .
         = s P1(H0 = 1) + s² P1(H0 = 2) + s³ P1(H0 = 3) + . . . .


The first few probabilities P1 (H0 = 1), P1 (H0 = 2), . . . are readily checked
from first principles.
On letting s ↑ 1 we have φ(s) → P1(H0 < ∞), so

    P1(H0 < ∞) = ( 1 − √(1 − 4pq) ) / 2p = 1 if p ≤ q, and = q/p if p > q.

(Remember that q = 1 − p, so

    √(1 − 4pq) = √(1 − 4p + 4p²) = |1 − 2p| = |2q − 1|.)

We can also find the mean hitting time using

    E1(H0) = lims↑1 φ′(s).

It is only worth considering the case p ≤ q, where the mean hitting time
has a chance of being finite. Differentiate (1.5) to obtain

    2ps φ φ′ + p φ² − φ′ + q = 0

so

    φ′(s) = ( p φ(s)² + q ) / ( 1 − 2ps φ(s) ) → 1/(1 − 2p) = 1/(q − p)    as s ↑ 1.

See Example 5.1.1 for a connection with branching processes.
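The conclusions of this example are easy to check by simulation. The
following sketch (assuming numpy; not part of the original text) estimates
P1(H0 < ∞) and E1(H0) for a value p < q and compares with 1 and 1/(q − p):

    # Simulation check of Example 1.4.3 in the case p < q.
    import numpy as np

    rng = np.random.default_rng(1)
    p, trials, horizon = 0.3, 10000, 10000
    q = 1 - p

    times = []
    for _ in range(trials):
        x, n = 1, 0
        while x > 0 and n < horizon:      # run the walk until it hits 0
            x += 1 if rng.random() < p else -1
            n += 1
        times.append(n if x == 0 else np.inf)

    times = np.array(times)
    print(np.isfinite(times).mean())                   # close to 1
    print(times[np.isfinite(times)].mean(), 1/(q - p)) # mean hitting time vs 1/(q-p)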


Example 1.4.4
We now consider an application of the strong Markov property to a Markov
chain (Xn )n≥0 observed only at certain times. In the first instance suppose
that J is some subset of the state-space I and that we observe the chain
only when it takes values in J . The resulting process (Ym )m≥0 may be
obtained formally by setting Ym = XTm , where

T0 = inf{n ≥ 0 : Xn ∈ J }

and, for m = 0, 1, 2, . . .

Tm+1 = inf{n > Tm : Xn ∈ J }.

Let us assume that P(Tm < ∞) = 1 for all m. For each m we can check
easily that Tm , the time of the mth visit to J , is a stopping time. So the
strong Markov property applies to show, for i0, i1, . . . , im+1 ∈ J, that

    P(Ym+1 = im+1 | Y0 = i0, . . . , Ym = im)
        = P(XTm+1 = im+1 | XT0 = i0, . . . , XTm = im)
        = Pim(XT1 = im+1) = p̄im im+1


where, for i, j ∈ J

    p̄ij = hij

and where, for j ∈ J, the vector (hij : i ∈ I) is the minimal non-negative
solution to

    hij = pij + Σk∉J pik hkj .        (1.6)

Thus (Ym)m≥0 is a Markov chain on J with transition matrix P̄.


A second example of a similar type arises if we observe the original chain
(Xn )n≥0 only when it moves. The resulting process (Zm )m≥0 is given by
Zm = XSm where S0 = 0 and for m = 0, 1, 2, . . .

Sm+1 = inf{n ≥ Sm : Xn ≠ XSm }.

Let us assume there are no absorbing states. Again the random times Sm
for m ≥ 0 are stopping times and, by the strong Markov property

    P(Zm+1 = im+1 | Z0 = i0, . . . , Zm = im)
        = P(XSm+1 = im+1 | XS0 = i0, . . . , XSm = im)
        = Pim(XS1 = im+1) = p̃im im+1

where p̃ii = 0 and, for i ≠ j

    p̃ij = pij / Σk≠i pik .

Thus (Zm)m≥0 is a Markov chain on I with transition matrix P̃.

Exercises
1.4.1 Let Y1 , Y2 , . . . be independent identically distributed random vari-
ables with
P(Y1 = 1) = P(Y1 = −1) = 1/2 and set X0 = 1, Xn = X0 + Y1 + . . . + Yn
for n ≥ 1. Define
H0 = inf{n ≥ 0 : Xn = 0} .
Find the probability generating function φ(s) = E(sH0 ).
Suppose the distribution of Y1 , Y2 , . . . is changed to P(Y1 = 2) = 1/2,
P(Y1 = −1) = 1/2. Show that φ now satisfies

sφ3 − 2φ + s = 0 .

1.4.2 Deduce carefully from Theorem 1.3.2 the claim made at (1.6).


1.5 Recurrence and transience

Let (Xn )n≥0 be a Markov chain with transition matrix P . We say that a
state i is recurrent if

Pi (Xn = i for infinitely many n) = 1.

We say that i is transient if

Pi (Xn = i for infinitely many n) = 0.

Thus a recurrent state is one to which you keep coming back and a transient
state is one which you eventually leave for ever. We shall show that every
state is either recurrent or transient.
Recall that the first passage time to state i is the random variable Ti
defined by
Ti (ω) = inf{n ≥ 1 : Xn (ω) = i}
where inf ∅ = ∞. We now define inductively the rth passage time Ti(r) to
state i by

    Ti(0)(ω) = 0,        Ti(1)(ω) = Ti(ω)

and, for r = 0, 1, 2, . . . ,

    Ti(r+1)(ω) = inf{n ≥ Ti(r)(ω) + 1 : Xn(ω) = i}.

The length of the rth excursion to i is then

    Si(r) = Ti(r) − Ti(r−1)    if Ti(r−1) < ∞,    and 0 otherwise.

The following diagram illustrates these definitions:


[diagram: a sample path of (Xn), marking the successive passage times Ti(0), Ti(1), Ti(2), Ti(3) to state i and the excursion lengths Si(1), Si(2), Si(3), Si(4)]


Our analysis of recurrence and transience will rest on finding the joint
distribution of these excursion lengths.
Lemma 1.5.1. For r = 2, 3, . . . , conditional on Ti(r−1) < ∞, Si(r) is
independent of {Xm : m ≤ Ti(r−1)} and

    P(Si(r) = n | Ti(r−1) < ∞) = Pi(Ti = n).

Proof. Apply the strong Markov property at the stopping time T = Ti(r−1).
It is automatic that XT = i on T < ∞. So, conditional on T < ∞,
(XT +n)n≥0 is Markov(δi, P) and independent of X0, X1, . . . , XT . But

    Si(r) = inf{n ≥ 1 : XT +n = i},

so Si(r) is the first passage time of (XT +n)n≥0 to state i.

Recall that the indicator function 1{X1=j} is the random variable equal
to 1 if X1 = j and 0 otherwise. Let us introduce the number of visits Vi to
i, which may be written in terms of indicator functions as

    Vi = Σn≥0 1{Xn=i}

and note that

    Ei(Vi) = Ei( Σn≥0 1{Xn=i} ) = Σn≥0 Ei(1{Xn=i}) = Σn≥0 Pi(Xn = i) = Σn≥0 pii(n).

Also, we can compute the distribution of Vi under Pi in terms of the return
probability

    fi = Pi(Ti < ∞).

Lemma 1.5.2. For r = 0, 1, 2, . . . , we have Pi(Vi > r) = fir.

Proof. Observe that if X0 = i then {Vi > r} = {Ti(r) < ∞}. When r = 0
the result is true. Suppose inductively that it is true for r, then

    Pi(Vi > r + 1) = Pi(Ti(r+1) < ∞)
                   = Pi(Ti(r) < ∞ and Si(r+1) < ∞)
                   = Pi(Si(r+1) < ∞ | Ti(r) < ∞) Pi(Ti(r) < ∞)
                   = fi · fir = fir+1

by Lemma 1.5.1, so by induction the result is true for all r.


Recall that one can compute the expectation of a non-negative integer-valued
random variable as follows:

    Σr≥0 P(V > r) = Σr≥0 Σv≥r+1 P(V = v)
                  = Σv≥1 Σ0≤r≤v−1 P(V = v) = Σv≥1 v P(V = v) = E(V ).

The next theorem is the means by which we establish recurrence or


transience for a given state. Note that it provides two criteria for this, one
in terms of the return probability, the other in terms of the n-step transition
probabilities. Both are useful.
Theorem 1.5.3. The following dichotomy holds:
(i) if Pi(Ti < ∞) = 1, then i is recurrent and Σn≥0 pii(n) = ∞;
(ii) if Pi(Ti < ∞) < 1, then i is transient and Σn≥0 pii(n) < ∞.
In particular, every state is either transient or recurrent.

Proof. If Pi(Ti < ∞) = 1, then, by Lemma 1.5.2,

    Pi(Vi = ∞) = limr→∞ Pi(Vi > r) = 1

so i is recurrent and

    Σn≥0 pii(n) = Ei(Vi) = ∞.

On the other hand, if fi = Pi(Ti < ∞) < 1, then by Lemma 1.5.2

    Σn≥0 pii(n) = Ei(Vi) = Σr≥0 Pi(Vi > r) = Σr≥0 fir = 1/(1 − fi) < ∞

so Pi(Vi = ∞) = 0 and i is transient.


From this theorem we can go on to solve completely the problem of
recurrence or transience for Markov chains with finite state-space. Some
cases of infinite state-space are dealt with in the following chapter. First
we show that recurrence and transience are class properties.
Theorem 1.5.4. Let C be a communicating class. Then either all states
in C are transient or all are recurrent.
Proof. Take any pair of states i, j ∈ C and suppose that i is transient.
There exist n, m ≥ 0 with pij(n) > 0 and pji(m) > 0, and, for all r ≥ 0

    pii(n+r+m) ≥ pij(n) pjj(r) pji(m)


so

    Σr≥0 pjj(r) ≤ ( 1/(pij(n) pji(m)) ) Σr≥0 pii(n+r+m) < ∞

by Theorem 1.5.3. Hence j is also transient by Theorem 1.5.3.

In the light of this theorem it is natural to speak of a recurrent or transient


class.

Theorem 1.5.5. Every recurrent class is closed.

Proof. Let C be a class which is not closed. Then there exist i ∈ C, j ∉ C
and m ≥ 1 with

    Pi(Xm = j) > 0.

Since we have

Pi ({Xm = j} ∩ {Xn = i for infinitely many n}) = 0

this implies that

Pi (Xn = i for infinitely many n) < 1

so i is not recurrent, and so neither is C.

Theorem 1.5.6. Every finite closed class is recurrent.

Proof. Suppose C is closed and finite and that (Xn )n≥0 starts in C. Then
for some i ∈ C we have

0 < P(Xn = i for infinitely many n)


= P(Xn = i for some n)Pi (Xn = i for infinitely many n)

by the strong Markov property. This shows that i is not transient, so C is


recurrent by Theorems 1.5.3 and 1.5.4.

It is easy to spot closed classes, so the transience or recurrence of finite


classes is easy to determine. For example, the only recurrent class in Ex-
ample 1.2.2 is {5, 6}, the others being transient. On the other hand, infinite
closed classes may be transient: see Examples 1.3.3 and 1.6.3.
We shall need the following result in Section 1.8. Remember that irre-
ducibility means that the chain can get from any state to any other, with
positive probability.


Theorem 1.5.7. Suppose P is irreducible and recurrent. Then for all
j ∈ I we have P(Tj < ∞) = 1.

Proof. By the Markov property we have

    P(Tj < ∞) = Σi∈I P(X0 = i)Pi(Tj < ∞)

so it suffices to show Pi(Tj < ∞) = 1 for all i ∈ I. Choose m with pji(m) > 0.
By Theorem 1.5.3, we have

    1 = Pj(Xn = j for infinitely many n)
      = Pj(Xn = j for some n ≥ m + 1)
      = Σk∈I Pj(Xn = j for some n ≥ m + 1 | Xm = k)Pj(Xm = k)
      = Σk∈I Pk(Tj < ∞)pjk(m)

where the final equality uses the Markov property. But Σk∈I pjk(m) = 1 so
we must have Pi(Tj < ∞) = 1.

Exercises
1.5.1 In Exercise 1.2.1, which states are recurrent and which are transient?

1.5.2 Show that, for the Markov chain (Xn )n≥0 in Exercise 1.3.4 we have

P(Xn → ∞ as n → ∞) = 1 .

Suppose, instead, the transition probabilities satisfy

    pi,i+1 = ((i + 1)/i)α pi,i−1 .

For each α ∈ (0, ∞) find the value of P(Xn → ∞ as n → ∞).

1.5.3 (First passage decomposition). Denote by Tj the first passage
time to state j and set

    fij(n) = Pi(Tj = n).

Justify the identity

    pij(n) = Σ1≤k≤n fij(k) pjj(n−k)    for n ≥ 1


and deduce that

    Pij(s) = δij + Fij(s)Pjj(s)

where

    Pij(s) = Σn≥0 pij(n) sn ,        Fij(s) = Σn≥0 fij(n) sn .

Hence show that Pi(Ti < ∞) = 1 if and only if

    Σn≥0 pii(n) = ∞

without using Theorem 1.5.3.


1.5.4 A random sequence of non-negative integers (Fn )n≥0 is obtained by
setting F0 = 0 and F1 = 1 and, once F0 , . . . , Fn are known, taking Fn+1 to
be either the sum or the difference of Fn−1 and Fn , each with probability
1/2. Is (Fn )n≥0 a Markov chain?
By considering the Markov chain Xn = (Fn−1 , Fn ), find the probability
that (Fn )n≥0 reaches 3 before first returning to 0.
Draw enough of the flow diagram for (Xn)n≥0 to establish a general
pattern. Hence, using the strong Markov property, show that the hitting
probability for (1, 1), starting from (1, 2), is (3 − √5)/2.
Deduce that (Xn )n≥0 is transient. Show that, moreover, with probability
1, Fn → ∞ as n → ∞.

1.6 Recurrence and transience of random walks


In the last section we showed that recurrence was a class property, that all
recurrent classes were closed and that all finite closed classes were recurrent.
So the only chains for which the question of recurrence remains interesting
are irreducible with infinite state-space. Here we shall study some simple
and fundamental examples of this type, making use of the following criterion
for recurrence from Theorem 1.5.3: a state i is recurrent if and only if
∞ (n)
n=0 pii = ∞.

Example 1.6.1 (Simple random walk on Z)


The simple random walk on Z has diagram

[diagram: the states of Z in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1]


where 0 < p = 1 − q < 1. Suppose we start at 0. It is clear that we cannot
return to 0 after an odd number of steps, so p00(2n+1) = 0 for all n. Any
given sequence of steps of length 2n from 0 to 0 occurs with probability
pn qn, there being n steps up and n steps down, and the number of such
sequences is the number of ways of choosing the n steps up from 2n. Thus

    p00(2n) = (2n choose n) pn qn .

Stirling's formula provides a good approximation to n! for large n: it is
known that

    n! ∼ √(2πn) (n/e)n    as n → ∞

where an ∼ bn means an/bn → 1. For a proof see W. Feller, An Introduction
to Probability Theory and its Applications, Vol I (Wiley, New York, 3rd
edition, 1968). At the end of this chapter we reproduce the argument used
by Feller to show that

    n! ∼ A√n (n/e)n    as n → ∞

for some A ∈ [1, ∞). The additional work needed to show A = √(2π) is
omitted, as this fact is unnecessary to our applications.
For the n-step transition probabilities we obtain

    p00(2n) = ( (2n)!/(n!)² ) (pq)n ∼ (4pq)n / ( A√(n/2) )    as n → ∞.

In the symmetric case p = q = 1/2, so 4pq = 1; then for some N and all
n ≥ N we have

    p00(2n) ≥ 1/(2A√n)

so

    Σn≥N p00(2n) ≥ (1/2A) Σn≥N 1/√n = ∞

which shows that the random walk is recurrent. On the other hand, if p ≠ q
then 4pq = r < 1, so by a similar argument, for some N

    Σn≥N p00(n) ≤ (1/A) Σn≥N rn < ∞

showing that the random walk is transient.
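The dichotomy in this example can also be seen numerically. The sketch
below (not part of the original text) accumulates partial sums of
p00(2n) = (2n choose n)(pq)n using the ratio between successive terms; the
sums keep growing when p = 1/2 and settle down when p ≠ 1/2.

    # Partial sums of p00^(2n) for the simple random walk on Z.
    def partial_sum(p, terms):
        q, t, s = 1 - p, 1.0, 1.0                # t holds p00^(2n), starting at n = 0
        for n in range(terms - 1):
            t *= 4*p*q * (n + 0.5)/(n + 1)       # ratio p00^(2n+2) / p00^(2n)
            s += t
        return s

    for terms in (100, 10_000, 1_000_000):
        # left column keeps growing (recurrent), right column approaches 5 (transient)
        print(terms, partial_sum(0.5, terms), partial_sum(0.6, terms))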


Example 1.6.2 (Simple symmetric random walk on Z2 )


The simple symmetric random walk on Z2 has diagram

[diagram: a vertex of Z2, with probability 1/4 of jumping to each of its four nearest neighbours]

and transition probabilities

    pij = 1/4 if |i − j| = 1, and 0 otherwise.
Suppose we start at 0. Let us call the walk Xn and write Xn+ and Xn− for
the orthogonal projections of Xn on the diagonal lines y = ±x:

[diagram: the walk Xn together with its projections Xn+ and Xn− onto the lines y = x and y = −x]

Then Xn+ and Xn− are independent simple symmetric random walks on
2−1/2 Z and Xn = 0 if and only if Xn+ = 0 = Xn−. This makes it clear that
for Xn we have

    p00(2n) = ( (2n choose n) (1/2)2n )² ∼ 2/(A² n)    as n → ∞

by Stirling's formula. Then Σn≥1 p00(n) = ∞ by comparison with Σn≥1 1/n
and the walk is recurrent.

Example 1.6.3 (Simple symmetric random walk on Z3 )


The transition probabilities of the simple symmetric random walk on Z3
are given by

    pij = 1/6 if |i − j| = 1, and 0 otherwise.
Thus the chain jumps to each of its nearest neighbours with equal probability.
Suppose we start at 0. We can only return to 0 after an even number
2n of steps. Of these 2n steps there must be i up, i down, j north, j south,
k east and k west for some i, j, k ≥ 0, with i + j + k = n. By counting the
ways in which this can be done, we obtain

    p00(2n) = Σi,j,k≥0, i+j+k=n (2n)!/(i!j!k!)² (1/6)2n
            = (2n choose n)(1/2)2n Σi,j,k≥0, i+j+k=n ( (n choose i, j, k)(1/3)n )² .

Now

    Σi,j,k≥0, i+j+k=n (n choose i, j, k)(1/3)n = 1

the left-hand side being the total probability of all the ways of placing n
balls randomly into three boxes. For the case where n = 3m, we have

    (n choose i, j, k) = n!/(i!j!k!) ≤ (n choose m, m, m)

for all i, j, k, so

    p00(2n) ≤ (2n choose n)(1/2)2n (n choose m, m, m)(1/3)n ∼ (1/(2A³)) (6/n)3/2    as n → ∞

by Stirling's formula. Hence Σm≥0 p00(6m) < ∞ by comparison with
Σn≥1 n−3/2. But p00(6m) ≥ (1/6)² p00(6m−2) and p00(6m) ≥ (1/6)⁴ p00(6m−4) for
all m so we must have

    Σn≥0 p00(n) < ∞

and the walk is transient.


Exercises
1.6.1 The rooted binary tree is an infinite graph T with one distinguished
vertex R from which comes a single edge; at every other vertex there are
three edges and there are no closed loops. The random walk on T jumps
from a vertex along each available edge with equal probability. Show that
the random walk is transient.

1.6.2 Show that the simple symmetric random walk in Z4 is transient.

1.7 Invariant distributions

Many of the long-time properties of Markov chains are connected with the
notion of an invariant distribution or measure. Remember that a measure
λ is any row vector (λi : i ∈ I) with non-negative entries. We say λ is
invariant if
λP = λ.

The terms equilibrium and stationary are also used to mean the same. The
first result explains the term stationary.

Theorem 1.7.1. Let (Xn )n≥0 be Markov(λ, P ) and suppose that λ is in-
variant for P . Then (Xm+n )n≥0 is also Markov(λ, P ).

Proof. By Theorem 1.1.3, P(Xm = i) = (λP m )i = λi for all i and, clearly,


conditional on Xm+n = i, Xm+n+1 is independent of Xm , Xm+1 , . . . , Xm+n
and has distribution (pij : j ∈ I).

The next result explains the term equilibrium.

Theorem 1.7.2. Let I be finite. Suppose for some i ∈ I that

    pij(n) → πj    as n → ∞ for all j ∈ I.

Then π = (πj : j ∈ I) is an invariant distribution.

Proof. We have

    Σj∈I πj = Σj∈I limn→∞ pij(n) = limn→∞ Σj∈I pij(n) = 1

and

    πj = limn→∞ pij(n+1) = limn→∞ Σk∈I pik(n) pkj = Σk∈I ( limn→∞ pik(n) ) pkj = Σk∈I πk pkj


where we have used finiteness of I to justify interchange of summation and


limit operations. Hence π is an invariant distribution.

Notice that for any of the random walks discussed in Section 1.6 we have
pij(n) → 0 as n → ∞ for all i, j ∈ I. The limit is certainly invariant, but it
is not a distribution!
Theorem 1.7.2 is not a very useful result but it serves to indicate a rela-
tionship between invariant distributions and n-step transition probabilities.
In Theorem 1.8.3 we shall prove a sort of converse, which is much more
useful.

Example 1.7.3
Consider the two-state Markov chain with transition matrix

         ( 1 − α     α   )
P  =     (   β     1 − β ) .

Ignore the trivial cases α = β = 0 and α = β = 1. Then, by Example 1.1.4

            ( β/(α + β)   α/(α + β) )
P n  →      ( β/(α + β)   α/(α + β) )        as n → ∞,

so, by Theorem 1.7.2, the distribution (β/(α + β), α/(α + β)) must be
invariant. There are of course easier ways to discover this.

Example 1.7.4
Consider the Markov chain (Xn)n≥0 with diagram

[diagram: the three-state chain of Example 1.1.6]

To find an invariant distribution we write down the components of the
vector equation πP = π

    π1 = (1/2)π3
    π2 = π1 + (1/2)π2
    π3 = (1/2)π2 + (1/2)π3 .


In terms of the chain, the right-hand sides give the probabilities for X1,
when X0 has distribution π, and the equations require X1 also to have
distribution π. The equations are homogeneous so one of them is redundant,
and another equation is required to fix π uniquely. That equation is

    π1 + π2 + π3 = 1

and we find that π = (1/5, 2/5, 2/5).
According to Example 1.1.6

    p11(n) → 1/5    as n → ∞

so this confirms Theorem 1.7.2. Alternatively, knowing that p11(n) had the
form

    p11(n) = a + (1/2)n ( b cos(nπ/2) + c sin(nπ/2) )

we could have used Theorem 1.7.2 and knowledge of π1 to identify a = 1/5,
instead of working out p11(2) in Example 1.1.6.
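For a finite chain the invariant distribution can also be found numerically.
The following sketch (assuming numpy; not part of the original text) solves
πP = π together with the normalization π1 + π2 + π3 = 1 for the chain above:

    # Invariant distribution of the three-state chain.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])

    n = len(P)
    # stack pi (P - I) = 0 with the normalization sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    print(pi)                                  # [0.2, 0.4, 0.4]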

In the next two results we shall show that every irreducible and recurrent
stochastic matrix P has an essentially unique positive invariant measure.
The proofs rely heavily on the probabilistic interpretation so it is worth
noting at the outset that, for a finite state-space I, the existence of an
invariant row vector is a simple piece of linear algebra: the row sums of P
are all 1, so the column vector of ones is an eigenvector with eigenvalue 1,
so P must have a row eigenvector with eigenvalue 1.
For a fixed state k, consider for each i the expected time spent in i between
visits to k:

    γik = Ek ( Σ0≤n≤Tk−1 1{Xn=i} ).

Here the sum of indicator functions serves to count the number of times n
at which Xn = i before the first passage time Tk .

Theorem 1.7.5. Let P be irreducible and recurrent. Then


(i) γkk = 1;
(ii) γ k = (γik : i ∈ I) satisfies γ k P = γ k ;
(iii) 0 < γik < ∞ for all i ∈ I.

Proof. (i) This is obvious. (ii) For n = 1, 2, . . . the event {n ≤ Tk } depends


only on X0 , X1 , . . . , Xn−1 , so, by the Markov property at n − 1

Pk (Xn−1 = i, Xn = j and n ≤ Tk ) = Pk (Xn−1 = i and n ≤ Tk )pij .


Since P is recurrent, under Pk we have Tk < ∞ and X0 = XTk = k with
probability one. Therefore

    γjk = Ek Σ1≤n≤Tk 1{Xn=j} = Ek Σn≥1 1{Xn=j and n≤Tk}
        = Σn≥1 Pk(Xn = j and n ≤ Tk)
        = Σi∈I Σn≥1 Pk(Xn−1 = i, Xn = j and n ≤ Tk)
        = Σi∈I pij Σn≥1 Pk(Xn−1 = i and n ≤ Tk)
        = Σi∈I pij Ek Σm≥0 1{Xm=i and m≤Tk−1}
        = Σi∈I pij Ek Σ0≤m≤Tk−1 1{Xm=i} = Σi∈I γik pij .

(iii) Since P is irreducible, for each state i there exist n, m ≥ 0 with
pik(n), pki(m) > 0. Then γik ≥ γkk pki(m) > 0 and γik pik(n) ≤ γkk = 1 by (i) and
(ii).
Theorem 1.7.6. Let P be irreducible and let λ be an invariant measure
for P with λk = 1. Then λ ≥ γ k . If in addition P is recurrent, then λ = γ k .
Proof. For each j ∈ I we have

    λj = Σi1∈I λi1 pi1 j = Σi1≠k λi1 pi1 j + pkj
       = Σi1,i2≠k λi2 pi2 i1 pi1 j + ( pkj + Σi1≠k pki1 pi1 j )
       ...
       = Σi1,...,in≠k λin pin in−1 . . . pi1 j
         + ( pkj + Σi1≠k pki1 pi1 j + . . . + Σi1,...,in−1≠k pkin−1 . . . pi2 i1 pi1 j ).

So for j ≠ k we obtain

    λj ≥ Pk(X1 = j and Tk ≥ 1) + Pk(X2 = j and Tk ≥ 2)
         + . . . + Pk(Xn = j and Tk ≥ n)
       → γjk    as n → ∞.


So λ ≥ γk. If P is recurrent, then γk is invariant by Theorem 1.7.5, so
µ = λ − γk is also invariant and µ ≥ 0. Since P is irreducible, given i ∈ I,
we have pik(n) > 0 for some n, and 0 = µk = Σj∈I µj pjk(n) ≥ µi pik(n), so
µi = 0.

Recall that a state i is recurrent if

Pi (Xn = i for infinitely many n) = 1

and we showed in Theorem 1.5.3 that this is equivalent to

Pi (Ti < ∞) = 1.

If in addition the expected return time

mi = Ei (Ti )

is finite, then we say i is positive recurrent. A recurrent state which fails to


have this stronger property is called null recurrent.

Theorem 1.7.7. Let P be irreducible. Then the following are equivalent:


(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) P has an invariant distribution, π say.
Moreover, when (iii) holds we have mi = 1/πi for all i.

Proof. (i) ⇒ (ii) This is obvious.


(ii) ⇒ (iii) If i is positive recurrent, it is certainly recurrent, so P is recurrent. By Theorem 1.7.5, γ^i is then invariant. But

Σ_{j∈I} γj^i = mi < ∞,

since summing the time spent in each state before returning to i gives the return time Ti, so πj = γj^i / mi defines an invariant distribution.



(iii) ⇒ (i) Take any state k. Since P is irreducible and Σ_{i∈I} πi = 1 we have πk = Σ_{i∈I} πi pik^(n) > 0 for some n. Set λi = πi/πk. Then λ is an invariant measure with λk = 1. So, by Theorem 1.7.6, λ ≥ γ^k. Hence

mk = Σ_{i∈I} γi^k ≤ Σ_{i∈I} πi/πk = 1/πk < ∞        (1.7)

and k is positive recurrent.

To complete the proof we return to the argument for (iii) ⇒ (i) armed
with the knowledge that P is recurrent, so λ = γ k and the inequality (1.7)
is in fact an equality.
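The identity mi = 1/πi is easy to check numerically for a small chain: compute π from πP = π together with Σi πi = 1, and compute mi from the first-step equations for mean hitting times (equations of the kind given by Theorem 1.3.5). The matrix below is only an illustration.

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])            # an illustrative irreducible chain
k = P.shape[0]

# Invariant distribution: solve pi P = pi together with the normalisation sum(pi) = 1.
A = np.vstack([P.T - np.eye(k), np.ones(k)])
b = np.append(np.zeros(k), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

def mean_return_time(i):
    """m_i = 1 + sum_{j != i} p_ij k_j, where k_j = E_j(T_i) solves (I - Q) k = 1."""
    others = [j for j in range(k) if j != i]
    Q = P[np.ix_(others, others)]
    hit = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return 1.0 + P[i, others] @ hit

for i in range(k):
    print(i, mean_return_time(i), 1.0 / pi[i])     # the two columns agree
```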

Example 1.7.8 (Simple symmetric random walk on Z)


The simple symmetric random walk on Z is clearly irreducible and, by
Example 1.6.1, it is also recurrent. Consider the measure

πi = 1 for all i.

Then

πi = (1/2) πi−1 + (1/2) πi+1,

so π is invariant. Now Theorem 1.7.6 forces any invariant measure to be a scalar multiple of π. Since Σ_{i∈Z} πi = ∞, there can be no invariant distribution and the walk is therefore null recurrent, by Theorem 1.7.7.

Example 1.7.9
The existence of an invariant measure does not guarantee recurrence: con-
sider, for example, the simple symmetric random walk on Z3 , which is
transient by Example 1.6.3, but has invariant measure π given by πi = 1
for all i.

Example 1.7.10
Consider the asymmetric random walk on Z with transition probabilities
pi,i−1 = q < p = pi,i+1 . In components the invariant measure equation
πP = π reads
πi = πi−1 p + πi+1 q.

This is a recurrence relation for π with general solution

πi = A + B(p/q)^i.

So, in this case, there is a two-parameter family of invariant measures –


uniqueness up to scalar multiples does not hold.

Example 1.7.11
Consider a success-run chain on Z+ , whose transition probabilities are given
by
pi,i+1 = pi , pi0 = qi = 1 − pi .

Then the components of the invariant measure equation πP = π read

π0 = Σ_{i=0}^{∞} qi πi,
πi = pi−1 πi−1  for i ≥ 1.

Suppose we choose pi converging sufficiently rapidly to 1 so that

p = Π_{i=0}^{∞} pi > 0.

Then for any invariant measure π we have

π0 = Σ_{i=0}^{∞} (1 − pi) pi−1 · · · p0 π0 = (1 − p) π0.

This equation forces either π0 = 0 or π0 = ∞, so there is no non-zero


invariant measure.

Exercises

1.7.1 Find all invariant distributions of the transition matrix in Exercise


1.2.1.

1.7.2 Gas molecules move about randomly in a box which is divided into two
halves symmetrically by a partition. A hole is made in the partition. Sup-
pose there are N molecules in the box. Show that the number of molecules
on one side of the partition just after a molecule has passed through the hole
evolves as a Markov chain. What are the transition probabilities? What is
the invariant distribution of this chain?

1.7.3 A particle moves on the eight vertices of a cube in the following


way: at each step the particle is equally likely to move to each of the three
adjacent vertices, independently of its past motion. Let i be the initial
vertex occupied by the particle, o the vertex opposite i. Calculate each of
the following quantities:

(i) the expected number of steps until the particle returns to i;


(ii) the expected number of visits to o until the first return to i;
(iii) the expected number of steps until the first visit to o.

1.7.4 Let (Xn )n≥0 be a simple random walk on Z with pi,i−1 = q < p =
pi,i+1 . Find
γi^0 = E0 Σ_{n=0}^{T0 − 1} 1{Xn = i}

and verify that

γi^0 = inf_λ λi for all i,
where the infimum is taken over all invariant measures λ with λ0 = 1.


(Compare with Theorem 1.7.6 and Example 1.7.10.)

1.7.5 Let P be a stochastic matrix on a finite set I. Show that a distribution


π is invariant for P if and only if π(I −P +A) = a, where A = (aij : i, j ∈ I)
with aij = 1 for all i and j, and a = (ai : i ∈ I) with ai = 1 for all i. Deduce
that if P is irreducible then I −P +A is invertible. Note that this enables one
to compute the invariant distribution by any standard method of inverting
a matrix .
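For illustration, the identity in this exercise reduces the computation of π to a single matrix inversion: π = a(I − P + A)^{−1}. A minimal sketch in Python, applied to an arbitrary stochastic matrix (the matrix here is randomly generated and is not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)      # a random stochastic matrix (strictly positive, hence irreducible)
n = P.shape[0]

A = np.ones((n, n))                    # a_ij = 1 for all i, j
a = np.ones(n)                         # a_i = 1 for all i
pi = a @ np.linalg.inv(np.eye(n) - P + A)   # solve pi (I - P + A) = a
print(pi, pi.sum())                    # invariant distribution, summing to 1
print(np.max(np.abs(pi @ P - pi)))     # should be ~0: pi is invariant
```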

1.8 Convergence to equilibrium

We shall investigate the limiting behaviour of the n-step transition probabilities pij^(n) as n → ∞. As we saw in Theorem 1.7.2, if the state-space is finite and if for some i the limit exists for all j, then it must be an invariant distribution. But, as the following example shows, the limit does not always exist.

Example 1.8.1
Consider the two-state chain with transition matrix

P = ( 0  1 )
    ( 1  0 ).

Then P² = I, so P^{2n} = I and P^{2n+1} = P for all n. Thus pij^(n) fails to converge for all i, j.
Let us call a state i aperiodic if pii^(n) > 0 for all sufficiently large n. We leave it as an exercise to show that i is aperiodic if and only if the set {n ≥ 0 : pii^(n) > 0} has no common divisor other than 1. This is also a consequence of Theorem 1.8.4. The behaviour of the chain in Example 1.8.1 is connected with its periodicity.

Lemma 1.8.2. Suppose P is irreducible and has an aperiodic state i. Then, for all states j and k, pjk^(n) > 0 for all sufficiently large n. In particular, all states are aperiodic.

Proof. There exist r, s ≥ 0 with pji^(r), pik^(s) > 0. Then

pjk^(r+n+s) ≥ pji^(r) pii^(n) pik^(s) > 0

for all sufficiently large n.


Here is the main result of this section. The method of proof, by coupling
two Markov chains, is ingenious.
Theorem 1.8.3 (Convergence to equilibrium). Let P be irreducible
and aperiodic, and suppose that P has an invariant distribution π. Let λ
be any distribution. Suppose that (Xn )n≥0 is Markov(λ, P ). Then

P(Xn = j) → πj as n → ∞ for all j.

In particular,
pij^(n) → πj as n → ∞ for all i, j.

Proof. We use a coupling argument. Let (Yn )n≥0 be Markov(π, P ) and


independent of (Xn )n≥0 . Fix a reference state b and set

T = inf{n ≥ 1 : Xn = Yn = b}.

Step 1. We show P(T < ∞) = 1. The process Wn = (Xn, Yn) is a Markov chain on I × I with transition probabilities

p̃(i,k)(j,l) = pij pkl

and initial distribution

µ(i,k) = λi πk.

Since P is aperiodic, for all states i, j, k, l we have

p̃(i,k)(j,l)^(n) = pij^(n) pkl^(n) > 0

for all sufficiently large n; so P̃ is irreducible. Also, P̃ has an invariant distribution given by

π̃(i,k) = πi πk

so, by Theorem 1.7.7, P̃ is positive recurrent. But T is the first passage time of Wn to (b, b), so P(T < ∞) = 1, by Theorem 1.5.7.

Step 2. Set Zn = Xn if n < T, and Zn = Yn if n ≥ T.
The diagram below illustrates the idea. We show that (Zn )n≥0 is
Markov(λ, P ).
[Diagram: sample paths of (Xn), (Yn) and (Zn) against n; Zn follows Xn until the first time T at which both chains are at b, and follows Yn thereafter.]

The strong Markov property applies to (Wn)n≥0 at time T, so (X_{T+n}, Y_{T+n})n≥0 is Markov(δ(b,b), P̃) and independent of (X0, Y0), (X1, Y1), . . . , (XT, YT). By symmetry, we can replace the process (X_{T+n}, Y_{T+n})n≥0 by (Y_{T+n}, X_{T+n})n≥0, which is also Markov(δ(b,b), P̃) and remains independent of (X0, Y0), (X1, Y1), . . . , (XT, YT). Hence W′n = (Zn, Z′n) is Markov(µ, P̃), where

Z′n = Yn if n < T, and Z′n = Xn if n ≥ T.

In particular, (Zn)n≥0 is Markov(λ, P).

Step 3. We have

P(Zn = j) = P(Xn = j and n < T ) + P(Yn = j and n ≥ T )

so

|P(Xn = j) − πj | = |P(Zn = j) − P(Yn = j)|


= |P(Xn = j and n < T ) − P(Yn = j and n < T )|
≤ P(n < T )

and P(n < T ) → 0 as n → ∞.
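The coupling in this proof is easy to simulate. The sketch below is purely illustrative (the chain, initial state and reference state b are arbitrary choices): it runs an independent copy (Yn) started from π alongside (Xn) started from a point mass, records the coupling time T at b, and checks that P(n < T) becomes small, this being the quantity that bounds |P(Xn = j) − πj| in Step 3.

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative irreducible, aperiodic chain; pi computed from pi (I - P + A) = a.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
n = P.shape[0]
pi = np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))

def coupling_time(b=0, max_steps=10_000):
    """First n >= 1 at which Xn = Yn = b, for independent copies X (from state 0) and Y (from pi)."""
    x = 0
    y = rng.choice(n, p=pi)
    for t in range(1, max_steps + 1):
        x = rng.choice(n, p=P[x])
        y = rng.choice(n, p=P[y])
        if x == b and y == b:
            return t
    return max_steps

T = np.array([coupling_time() for _ in range(5000)])
for m in [1, 2, 5, 10, 20]:
    print(m, (T > m).mean())           # estimates of P(m < T), which tend to 0
```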

To understand this proof one should see what goes wrong when P is
not aperiodic. Consider the two-state chain of Example 1.8.1 which has
(1/2, 1/2) as its unique invariant distribution. We start (Xn )n≥0 from 0
and (Yn )n≥0 with equal probability from 0 or 1. However, if Y0 = 1, then,
because of periodicity, (Xn )n≥0 and (Yn )n≥0 will never meet, and the proof
fails. We move on now to the cases that were excluded in the last theorem,
where (Xn )n≥0 is periodic or transient or null recurrent. The remainder of
this section might be omitted on a first reading.

Theorem 1.8.4. Let P be irreducible. There is an integer d ≥ 1 and a


partition
I = C0 ∪ C1 ∪ . . . ∪ Cd−1
such that (setting Cnd+r = Cr )
(i) pij^(n) > 0 only if i ∈ Cr and j ∈ C_{r+n} for some r;
(ii) pij^(nd) > 0 for all sufficiently large n, for all i, j ∈ Cr, for all r.
Proof. Fix a state k and consider S = {n ≥ 0 : pkk^(n) > 0}. Choose n1, n2 ∈ S with n1 < n2 and such that d := n2 − n1 is as small as possible. (Here and throughout we use the symbol := to mean 'defined to equal'.) Define for r = 0, . . . , d − 1

Cr = {i ∈ I : pki^(nd+r) > 0 for some n ≥ 0}.

Then C0 ∪ . . . ∪ Cd−1 = I, by irreducibility. Moreover, if pki^(nd+r) > 0 and pki^(n′d+s) > 0 for some r, s ∈ {0, 1, . . . , d − 1}, then, choosing m ≥ 0 so that pik^(m) > 0, we have pkk^(nd+r+m) > 0 and pkk^(n′d+s+m) > 0, so r = s by minimality of d. Hence we have a partition.
To prove (i) suppose pij^(n) > 0 and i ∈ Cr. Choose m so that pki^(md+r) > 0; then pkj^(md+r+n) > 0, so j ∈ C_{r+n} as required. By taking i = j = k we now see that d must divide every element of S, in particular n1.
Now for nd ≥ n1², we can write nd = qn1 + r for integers q ≥ n1 and 0 ≤ r ≤ n1 − 1. Since d divides n1 we then have r = md for some integer m and then nd = (q − m)n1 + mn2. Hence

pkk^(nd) ≥ (pkk^(n1))^{q−m} (pkk^(n2))^m > 0

and hence nd ∈ S. To prove (ii), for i, j ∈ Cr choose m1 and m2 so that pik^(m1) > 0 and pkj^(m2) > 0; then

pij^(m1+nd+m2) ≥ pik^(m1) pkk^(nd) pkj^(m2) > 0

whenever nd ≥ n1². Since m1 + m2 is then necessarily a multiple of d, we are done.

We call d the period of P. The theorem just proved shows in particular, for all i ∈ I, that d is the greatest common divisor of the set {n ≥ 0 : pii^(n) > 0}. This is sometimes useful in identifying d.
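In examples the period can also be found mechanically: determine which n-step returns are possible and take the gcd. The sketch below does this up to a finite horizon (the horizon is a heuristic cut-off, not part of the theory), using boolean reachability rather than numerical powers:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, horizon=200):
    """gcd of {1 <= n <= horizon : p_ii^(n) > 0}."""
    step = (P > 0).astype(int)
    reach = np.eye(P.shape[0], dtype=int)
    times = []
    for n in range(1, horizon + 1):
        reach = (reach @ step > 0).astype(int)   # which n-step transitions are possible
        if reach[i, i]:
            times.append(n)
    return reduce(gcd, times) if times else 0

# The two-state chain of Example 1.8.1 has period 2; adding a holding
# probability destroys the periodicity.
P_flip = np.array([[0.0, 1.0], [1.0, 0.0]])
P_lazy = np.array([[0.1, 0.9], [0.9, 0.1]])
print(period(P_flip, 0), period(P_lazy, 0))      # expect 2 and 1
```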
Finally, here is a complete description of limiting behaviour for irre-
ducible chains. This generalizes Theorem 1.8.3 in two respects since we
require neither aperiodicity nor the existence of an invariant distribution.
The argument we use for the null recurrent case was discovered recently by
B. Fristedt and L. Gray.

Theorem 1.8.5. Let P be irreducible of period d and let C0 , C1 , . . . , Cd−1


be the partition obtained in Theorem 1.8.4. Let λ be a distribution with Σ_{i∈C0} λi = 1. Suppose that (Xn)n≥0 is Markov(λ, P). Then for r = 0, 1, . . . , d − 1 and j ∈ Cr we have

P(X_{nd+r} = j) → d/mj as n → ∞

where mj is the expected return time to j. In particular, for i ∈ C0 and j ∈ Cr we have

pij^(nd+r) → d/mj as n → ∞.

Proof

Step 1. We reduce to the aperiodic case. Set ν = λP^r; then by Theorem 1.8.4 we have

Σ_{i∈Cr} νi = 1.

Set Yn = X_{nd+r}; then (Yn)n≥0 is Markov(ν, P^d) and, by Theorem 1.8.4, P^d is irreducible and aperiodic on Cr. For j ∈ Cr the expected return time of (Yn)n≥0 to j is mj/d. So if the theorem holds in the aperiodic case, then

P(Xnd+r = j) = P(Yn = j) → d/mj as n → ∞

so the theorem holds in general.

Step 2. Assume that P is aperiodic. If P is positive recurrent then 1/mj =


πj , where π is the unique invariant distribution, so the result follows from
Theorem 1.8.3. Otherwise mj = ∞ and we have to show that

P(Xn = j) → 0 as n → ∞.

If P is transient this is easy and we are left with the null recurrent
case.

Step 3. Assume that P is aperiodic and null recurrent. Then

Σ_{k=0}^{∞} Pj(Tj > k) = Ej(Tj) = ∞.

Given ε > 0 choose K so that

Σ_{k=0}^{K−1} Pj(Tj > k) ≥ 2/ε.

Then, for n ≥ K − 1,

1 ≥ Σ_{k=n−K+1}^{n} P(Xk = j and Xm ≠ j for m = k + 1, . . . , n)
  = Σ_{k=n−K+1}^{n} P(Xk = j) Pj(Tj > n − k)
  = Σ_{k=0}^{K−1} P(X_{n−k} = j) Pj(Tj > k)

so we must have P(X_{n−k} = j) ≤ ε/2 for some k ∈ {0, 1, . . . , K − 1}.


Return now to the coupling argument used in Theorem 1.8.3, only now let
(Yn )n≥0 be Markov(µ, P ), where µ is to be chosen later. Set Wn = (Xn , Yn ).
As before, aperiodicity of (Xn )n≥0 ensures irreducibility of (Wn )n≥0 . If
(Wn )n≥0 is transient then, on taking µ = λ, we obtain
 
P(Xn = j)² = P(Wn = (j, j)) → 0

as required. Assume then that (Wn )n≥0 is recurrent. Then, in the notation
of Theorem 1.8.3, we have P(T < ∞) = 1 and the coupling argument shows
that
|P(Xn = j) − P(Yn = j)| → 0 as n → ∞.

We exploit this convergence by taking µ = λP^k for k = 1, . . . , K − 1, so that P(Yn = j) = P(X_{n+k} = j). We can find N such that for n ≥ N and k = 1, . . . , K − 1,

|P(Xn = j) − P(X_{n+k} = j)| ≤ ε/2.

But for any n we can find k ∈ {0, 1, . . . , K − 1} such that P(Xn+k = j) ≤


ε/2. Hence, for n ≥ N
P(Xn = j) ≤ ε.
Since ε > 0 was arbitrary, this shows that P(Xn = j) → 0 as n → ∞, as
required.
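A numerical illustration of the theorem (not from the text): for simple random walk on a cycle of four states, d = 2 and the invariant distribution is uniform, so mj = 4 and along the even-time subsequence p_{00}^{(2n)} settles at d/mj = 1/2, while the odd-time probabilities are 0.

```python
import numpy as np

# Simple random walk on a 4-cycle: period d = 2, pi = (1/4, 1/4, 1/4, 1/4), m_j = 4.
P = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])

for n in [1, 2, 3, 4, 50, 51]:
    print(n, np.linalg.matrix_power(P, n)[0, 0])   # 1/2 for even n, 0 for odd n
```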

Exercises
1.8.1 Prove the claims (e), (f) and (g) made in example (v) of the Intro-
duction.

1.8.2 Find the invariant distributions of the transition matrices in Exercise


1.1.7, parts (a), (b) and (c), and compare them with your answers there.

1.8.3 A fair die is thrown repeatedly. Let Xn denote the sum of the first n
throws. Find
lim_{n→∞} P(Xn is a multiple of 13)

quoting carefully any general theorems that you use.

1.8.4 Each morning a student takes one of the three books he owns from
his shelf. The probability that he chooses book i is αi , where 0 < αi < 1 for
i = 1, 2, 3, and choices on successive days are independent. In the evening
he replaces the book at the left-hand end of the shelf. If pn denotes the
probability that on day n the student finds the books in the order 1,2,3,
from left to right, show that, irrespective of the initial arrangement of the
books, pn converges as n → ∞, and determine the limit.

1.8.5 (Renewal theorem). Let Y1 , Y2 , . . . be independent, identically


distributed random variables with values in {1, 2, . . . }. Suppose that the
set of integers
{n : P(Y1 = n) > 0}
has greatest common divisor 1. Set µ = E(Y1 ). Show that the following
process is a Markov chain:

Xn = inf{m ≥ n : m = Y1 + . . . + Yk for some k ≥ 0} − n.

Determine
lim_{n→∞} P(Xn = 0)

and hence show that as n → ∞

P(n = Y1 + . . . + Yk for some k ≥ 0) → 1/µ.

(Think of Y1 , Y2 , . . . as light-bulb lifetimes. A bulb is replaced when it fails.


Thus the limiting probability that a bulb is replaced at time n is 1/µ. Al-
though this appears to be a very special case of convergence to equilibrium,
one can actually recover the full result by applying the renewal theorem to
the excursion lengths Si^(1), Si^(2), . . . from state i.)

1.9 Time reversal

For Markov chains, the past and future are independent given the present.
This property is symmetrical in time and suggests looking at Markov chains
with time running backwards. On the other hand, convergence to equilib-
rium shows behaviour which is asymmetrical in time: a highly organised
state such as a point mass decays to a disorganised one, the invariant dis-
tribution. This is an example of entropy increasing. It suggests that if
we want complete time-symmetry we must begin in equilibrium. The next
result shows that a Markov chain in equilibrium, run backwards, is again a
Markov chain. The transition matrix may however be different.

Theorem 1.9.1. Let P be irreducible and have an invariant distribution


π. Suppose that (Xn)0≤n≤N is Markov(π, P) and set Yn = X_{N−n}. Then (Yn)0≤n≤N is Markov(π, P̂), where P̂ = (p̂ij) is given by

πj p̂ji = πi pij for all i, j

and P̂ is also irreducible with invariant distribution π.

Proof. First we check that P̂ is a stochastic matrix:

Σ_{i∈I} p̂ji = (1/πj) Σ_{i∈I} πi pij = 1

since π is invariant for P. Next we check that π is invariant for P̂:

Σ_{j∈I} πj p̂ji = Σ_{j∈I} πi pij = πi

since P is a stochastic matrix.


We have

P(Y0 = i0, Y1 = i1, . . . , YN = iN) = P(X0 = iN, X1 = iN−1, . . . , XN = i0)
    = π_{iN} p_{iN iN−1} · · · p_{i1 i0} = π_{i0} p̂_{i0 i1} · · · p̂_{iN−1 iN}

so, by Theorem 1.1.1, (Yn)0≤n≤N is Markov(π, P̂). Finally, since P is irreducible, for each pair of states i, j there is a chain of states i1 = i, i2, . . . , in−1, in = j with p_{i1 i2} · · · p_{in−1 in} > 0. Then

p̂_{in in−1} · · · p̂_{i2 i1} = π_{i1} p_{i1 i2} · · · p_{in−1 in} / π_{in} > 0

so P̂ is also irreducible.
The chain (Yn )0≤n≤N is called the time-reversal of (Xn )0≤n≤N .
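Once π is known, P̂ is a one-line computation from p̂ij = πj pji/πi. A small sketch, using for concreteness the three-state matrix that appears in Example 1.9.4 below:

```python
import numpy as np

P = np.array([[0.0, 2/3, 1/3],
              [1/3, 0.0, 2/3],
              [2/3, 1/3, 0.0]])
n = P.shape[0]
pi = np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))   # invariant distribution

P_hat = pi[None, :] * P.T / pi[:, None]    # entry (i, j) is pi_j p_ji / pi_i
print(P_hat)                               # here pi is uniform, so P_hat is just the transpose of P
print(P_hat.sum(axis=1))                   # rows sum to 1
print(np.max(np.abs(pi @ P_hat - pi)))     # ~0: pi is invariant for P_hat
```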
A stochastic matrix P and a measure λ are said to be in detailed balance
if
λi pij = λj pji for all i, j.
Though obvious, the following result is worth remembering because, when
a solution λ to the detailed balance equations exists, it is often easier to
find by the detailed balance equations than by the equation λ = λP .
Lemma 1.9.2. If P and λ are in detailed balance, then λ is invariant for
P.
 
Proof. We have (λP)i = Σ_{j∈I} λj pji = Σ_{j∈I} λi pij = λi.
Let (Xn )n≥0 be Markov(λ, P ), with P irreducible. We say that (Xn )n≥0
is reversible if, for all N ≥ 1, (XN −n )0≤n≤N is also Markov(λ, P ).
Theorem 1.9.3. Let P be an irreducible stochastic matrix and let λ be
a distribution. Suppose that (Xn )n≥0 is Markov(λ, P ). Then the following
are equivalent:
(a) (Xn )n≥0 is reversible;
(b) P and λ are in detailed balance.
Proof. Both (a) and (b) imply that λ is invariant for P. Then both (a) and (b) are equivalent to the statement that P̂ = P in Theorem 1.9.1.
We begin a collection of examples with a chain which is not reversible.
Example 1.9.4
Consider the Markov chain with diagram:
[Diagram: states 1, 2 and 3 arranged in a cycle, with probability 2/3 of moving one step in one direction around the cycle and probability 1/3 of moving one step in the other.]

The transition matrix is

P = (  0    2/3  1/3 )
    ( 1/3    0   2/3 )
    ( 2/3   1/3   0  )

and π = (1/3, 1/3, 1/3) is invariant. Hence P̂ = P^T, the transpose of P. But P is not symmetric, so P̂ ≠ P and this chain is not reversible. A patient observer would see the chain move clockwise in the long run: under time-reversal the clock would run backwards!
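The conclusion can be checked mechanically: compute π and test whether πi pij = πj pji for all i, j. The sketch below does this for the matrix above and, for contrast, for a birth-death chain of the kind in Example 1.9.5 below (the values of p and M, and the holding behaviour at the end-points, are illustrative assumptions):

```python
import numpy as np

def invariant(P):
    n = P.shape[0]
    return np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))

def reversible(P):
    pi = invariant(P)
    D = pi[:, None] * P                    # D_ij = pi_i p_ij
    return np.allclose(D, D.T)             # detailed balance means D is symmetric

P_cycle = np.array([[0.0, 2/3, 1/3],
                    [1/3, 0.0, 2/3],
                    [2/3, 1/3, 0.0]])

p, q, M = 0.7, 0.3, 5                      # illustrative parameters
P_bd = np.zeros((M + 1, M + 1))
for i in range(M):
    P_bd[i, i + 1] = p
    P_bd[i + 1, i] = q
P_bd[0, 0] = q                             # assumed holding at the end-points
P_bd[M, M] = p

print(reversible(P_cycle), reversible(P_bd))   # False, True
```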

Example 1.9.5

Consider the Markov chain with diagram:

[Diagram: states 0, 1, . . . , M in a line; each step moves one place to the right with probability p or one place to the left with probability q, with the obvious modification at the end-points.]

where 0 < p = 1 − q < 1. The non-zero detailed balance equations read

λi pi,i+1 = λi+1 pi+1,i for i = 0, 1, . . . , M − 1.

So a solution is given by

 
λ = ( (p/q)^i : i = 0, 1, . . . , M )

and this may be normalised to give a distribution in detailed balance with


P . Hence this chain is reversible.
If p were much larger than q, one might argue that the chain would tend
to move to the right and its time-reversal to the left. However, this ignores
the fact that we reverse the chain in equilibrium, which in this case would
be heavily concentrated near M . An observer would see the chain spending
most of its time near M and making occasional brief forays to the left,
which behaviour is symmetrical in time.

Example 1.9.6 (Random walk on a graph)

A graph G is a countable collection of states, usually called vertices, some


of which are joined by edges, for example:

[Diagram: a graph with vertices 1, 2, 3, 4 joined by edges.]
Thus a graph is a partially drawn Markov chain diagram. There is a natural


way to complete the diagram which gives rise to the random walk on G.
The valency vi of vertex i is the number of edges at i. We have to assume
that every vertex has finite valency. The random walk on G picks edges
with equal probability:

[Diagram: the same graph, with each edge leaving a vertex i assigned probability 1/vi.]
Thus the transition probabilities are given by

pij = 1/vi if (i, j) is an edge, and pij = 0 otherwise.

We assume G is connected, so that P is irreducible. It is easy to see that P is in detailed balance with v = (vi : i ∈ G). So, if the total valency σ = Σ_{i∈G} vi is finite, then π = v/σ is invariant and P is reversible.
Example 1.9.7 (Random chessboard knight)
A random knight makes each permissible move with equal probability. If it
starts in a corner, how long on average will it take to return?
This is an example of a random walk on a graph: the vertices are the
squares of the chessboard and the edges are the moves that the knight can
take:

The diagram shows a part of the graph. We know by Theorem 1.7.7 and
the preceding example that

Ec(Tc) = 1/πc = Σ_i (vi/vc),

so all we have to do is identify valencies. The four corner squares have


valency 2, and the eight squares adjacent to the corners have valency 3.
There are 20 squares of valency 4, 16 of valency 6, and the 16 central
squares have valency 8. Hence

Ec(Tc) = (8 + 24 + 80 + 96 + 128)/2 = 168.

Alternatively, if you enjoy solving sets of 64 simultaneous linear equations,


you might try finding π from πP = π, or calculating Ec (Tc ) using Theorem
1.3.5!
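The valency count is easy to automate. The short sketch below enumerates the knight's legal moves on the 8 × 8 board, sums the valencies, and recovers Ec(Tc) = 168 for a corner square:

```python
# Expected return time to a corner for the random knight: E_c(T_c) = (sum_i v_i) / v_c.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def valency(r, c):
    return sum(0 <= r + dr < 8 and 0 <= c + dc < 8 for dr, dc in moves)

total = sum(valency(r, c) for r in range(8) for c in range(8))
print(total, total / valency(0, 0))        # 336 and 168.0
```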

Exercises
1.9.1 In each of the following cases determine whether the stochastic matrix P, which you may assume is irreducible, is reversible:

(a) P = ( 1−p   p  )
        (  q   1−q );

(b) P = (  0    p   1−p )
        ( 1−p   0    p  )
        (  p   1−p   0  );

(c) I = {0, 1, . . . , N } and pij = 0 if |j − i| ≥ 2 ;

(d) I = {0, 1, 2, . . . } and p01 = 1, pi,i+1 = p, pi,i−1 = 1 − p for i ≥ 1;


(e) pij = pji for all i, j ∈ S.

1.9.2 Two particles X and Y perform independent random walks on the


graph shown in the diagram. So, for example, a particle at A jumps to B,
C or D with equal probability 1/3.

[Diagram: the graph for this exercise, whose vertices include A, B, C, D and E.]

Find the probability that X and Y ever meet at a vertex in the following
cases:
(a) X starts at A and Y starts at B;
(b) X starts at A and Y starts at E.
For I = B, D let MI denote the expected time, when both X and Y start
at I, until they are once again both at I. Show that 9MD = 16MB .

1.10 Ergodic theorem

Ergodic theorems concern the limiting behaviour of averages over time.


We shall prove a theorem which identifies for Markov chains the long-run
proportion of time spent in each state. An essential tool is the following
ergodic theorem for independent random variables which is a version of the
strong law of large numbers.

Theorem 1.10.1 (Strong law of large numbers). Let Y1 , Y2 , . . . be


a sequence of independent, identically distributed, non-negative random

variables with E(Y1) = µ. Then

P( (Y1 + . . . + Yn)/n → µ as n → ∞ ) = 1.

Proof. A proof for the case µ < ∞ may be found, for example, in Probability
with Martingales by David Williams (Cambridge University Press, 1991).
The case where µ = ∞ is a simple deduction. Fix N < ∞ and set Yn^(N) = Yn ∧ N. Then

(Y1 + . . . + Yn)/n ≥ (Y1^(N) + . . . + Yn^(N))/n → E(Y1 ∧ N) as n → ∞

with probability one. As N ↑ ∞ we have E(Y1 ∧ N) ↑ µ by monotone convergence (see Section 6.4). So we must have, with probability 1,

(Y1 + . . . + Yn)/n → ∞ as n → ∞.

We denote by Vi (n) the number of visits to i before n:

Vi(n) = Σ_{k=0}^{n−1} 1{Xk = i}.

Then Vi (n)/n is the proportion of time before n spent in state i. The


following result gives the long-run proportion of time spent by a Markov
chain in each state.

Theorem 1.10.2 (Ergodic theorem). Let P be irreducible and let λ


be any distribution. If (Xn)n≥0 is Markov(λ, P) then

P( Vi(n)/n → 1/mi as n → ∞ ) = 1

where mi = Ei(Ti) is the expected return time to state i. Moreover, in the positive recurrent case, for any bounded function f : I → R we have

P( (1/n) Σ_{k=0}^{n−1} f(Xk) → f̄ as n → ∞ ) = 1

where

f̄ = Σ_{i∈I} πi fi

and where (πi : i ∈ I) is the unique invariant distribution.

Proof. If P is transient, then, with probability 1, the total number Vi of


visits to i is finite, so

Vi(n)/n ≤ Vi/n → 0 = 1/mi.

Suppose then that P is recurrent and fix a state i. For T = Ti we have


P(T < ∞) = 1 by Theorem 1.5.7 and (XT +n )n≥0 is Markov(δi , P ) and
independent of X0 , X1 , . . . , XT by the strong Markov property. The long-
run proportion of time spent in i is the same for (XT +n )n≥0 and (Xn )n≥0 ,
so it suffices to consider the case λ = δi .
Write Si^(r) for the length of the rth excursion to i, as in Section 1.5. By Lemma 1.5.1, the non-negative random variables Si^(1), Si^(2), . . . are independent and identically distributed with Ei(Si^(r)) = mi. Now

Si^(1) + . . . + Si^(Vi(n)−1) ≤ n − 1,

the left-hand side being the time of the last visit to i before n. Also

Si^(1) + . . . + Si^(Vi(n)) ≥ n,

the left-hand side being the time of the first visit to i after n − 1. Hence

(Si^(1) + . . . + Si^(Vi(n)−1))/Vi(n) ≤ n/Vi(n) ≤ (Si^(1) + . . . + Si^(Vi(n)))/Vi(n).        (1.8)

By the strong law of large numbers

P( (Si^(1) + . . . + Si^(n))/n → mi as n → ∞ ) = 1

and, since P is recurrent,

P(Vi(n) → ∞ as n → ∞) = 1.

So, letting n → ∞ in (1.8), we get

P( n/Vi(n) → mi as n → ∞ ) = 1,

which implies

P( Vi(n)/n → 1/mi as n → ∞ ) = 1.

Assume now that (Xn)n≥0 has an invariant distribution (πi : i ∈ I). Let f : I → R be a bounded function and assume without loss of generality that |f| ≤ 1. For any J ⊆ I we have

| (1/n) Σ_{k=0}^{n−1} f(Xk) − f̄ | = | Σ_{i∈I} (Vi(n)/n − πi) fi |
    ≤ Σ_{i∈J} |Vi(n)/n − πi| + Σ_{i∉J} |Vi(n)/n − πi|
    ≤ Σ_{i∈J} |Vi(n)/n − πi| + Σ_{i∉J} (Vi(n)/n + πi)
    ≤ 2 Σ_{i∈J} |Vi(n)/n − πi| + 2 Σ_{i∉J} πi.

We proved above that

P( Vi(n)/n → πi as n → ∞ for all i ) = 1.

Given ε > 0, choose J finite so that

Σ_{i∉J} πi < ε/4

and then N = N(ω) so that, for n ≥ N(ω),

Σ_{i∈J} |Vi(n)/n − πi| < ε/4.

Then, for n ≥ N(ω), we have

| (1/n) Σ_{k=0}^{n−1} f(Xk) − f̄ | < ε,
which establishes the desired convergence.
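The theorem is easy to observe in a simulation: run one long path and compare the occupation frequencies Vi(n)/n with the invariant distribution. A sketch with an arbitrary illustrative chain:

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])            # an illustrative chain
k = P.shape[0]
pi = np.ones(k) @ np.linalg.inv(np.eye(k) - P + np.ones((k, k)))

n_steps = 100_000
visits = np.zeros(k)
x = 0
for _ in range(n_steps):
    visits[x] += 1
    x = rng.choice(k, p=P[x])

print(visits / n_steps)                    # V_i(n)/n, the observed proportions of time
print(pi)                                  # should be close to 1/m_i = pi_i
```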

We consider now the statistical problem of estimating an unknown transition matrix P on the basis of observations of the corresponding Markov chain. Consider, to begin, the case where we have N + 1 observations (Xn)0≤n≤N. The log-likelihood function is given by

l(P) = log(λ_{X0} p_{X0 X1} · · · p_{X_{N−1} X_N}) = Σ_{i,j∈I} Nij log pij

up to a constant independent of P, where Nij is the number of transitions from i to j. A standard statistical procedure is to find the maximum likelihood estimate P̂, which is the choice of P maximizing l(P). Since P must satisfy the linear constraint Σ_j pij = 1 for each i, we first try to maximize

l(P) + Σ_{i,j∈I} µi pij

and then choose (µi : i ∈ I) to fit the constraints. This is the method of Lagrange multipliers. Thus we find

p̂ij = Σ_{n=0}^{N−1} 1{Xn = i, Xn+1 = j} / Σ_{n=0}^{N−1} 1{Xn = i},

which is the proportion of jumps from i which go to j.
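In code the maximum likelihood estimate is simply a table of transition counts normalised row by row. The sketch below simulates a path from a known (and purely illustrative) matrix and recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)

P_true = np.array([[0.2, 0.5, 0.3],
                   [0.6, 0.1, 0.3],
                   [0.4, 0.4, 0.2]])       # the "unknown" chain (illustrative)
k = P_true.shape[0]

N = 50_000                                 # number of observed transitions
X = np.empty(N + 1, dtype=int)
X[0] = 0
for t in range(N):
    X[t + 1] = rng.choice(k, p=P_true[X[t]])

counts = np.zeros((k, k))
np.add.at(counts, (X[:-1], X[1:]), 1)      # N_ij = number of transitions from i to j
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P_hat, 3))                  # close to P_true
```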


We now turn to consider the consistency of this sort of estimate, that is to say whether p̂ij → pij with probability 1 as N → ∞. Since this is clearly false when i is transient, we shall slightly modify our approach. Note that to find p̂ij we simply have to maximize

Σ_{j∈I} Nij log pij

subject to Σ_j pij = 1: the other terms and constraints are irrelevant. Suppose then that instead of N + 1 observations we make enough observations to ensure the chain leaves state i a total of N times. In the transient case this may involve restarting the chain several times. Denote again by Nij the number of transitions from i to j.
To maximize the likelihood for (pij : j ∈ I) we still maximize

Σ_{j∈I} Nij log pij

subject to Σ_j pij = 1, which leads to the maximum likelihood estimate

p̂ij = Nij /N.

But Nij = Y1 + . . . + YN, where Yn = 1 if the nth transition from i is to j, and Yn = 0 otherwise. By the strong Markov property Y1, . . . , YN are independent and identically distributed random variables with mean pij. So, by the strong law of large numbers,

P( p̂ij → pij as N → ∞ ) = 1,

which shows that p̂ij is consistent.

Exercises
1.10.1 Prove the claim (d) made in example (v) of the Introduction.

1.10.2 A professor has N umbrellas. He walks to the office in the morning


and walks home in the evening. If it is raining he likes to carry an um-
brella and if it is fine he does not. Suppose that it rains on each journey
with probability p, independently of past weather. What is the long-run
proportion of journeys on which the professor gets wet?

1.10.3 Let (Xn )n≥0 be an irreducible Markov chain on I having an invariant


distribution π. For J ⊆ I let (Ym )m≥0 be the Markov chain on J obtained
by observing (Xn )n≥0 whilst in J . (See Example 1.4.4.) Show that (Ym )m≥0
is positive recurrent and find its invariant distribution.

1.10.4 An opera singer is due to perform a long series of concerts. Hav-


ing a fine artistic temperament, she is liable to pull out each night with
probability 1/2. Once this has happened she will not sing again until the
promoter convinces her of his high regard. This he does by sending flowers
every day until she returns. Flowers costing x thousand pounds, 0 ≤ x ≤ 1,

bring about a reconciliation with probability x. The promoter stands to
make £750 from each successful concert. How much should he spend on
flowers?

1.11 Appendix: recurrence relations

Recurrence relations often arise in the linear equations associated to Markov


chains. Here is an account of the simplest cases. A more specialized case
was dealt with in Example 1.3.4. In Example 1.1.4 we found a recurrence
relation of the form
xn+1 = axn + b.
We look first for a constant solution xn = x; then x = ax + b, so provided a ≠ 1 we must have x = b/(1 − a). Now yn = xn − b/(1 − a) satisfies yn+1 = ayn, so yn = a^n y0. Thus the general solution when a ≠ 1 is given by

xn = A a^n + b/(1 − a)

where A is a constant. When a = 1 the general solution is obviously

xn = x0 + nb.

In Example 1.3.3 we found a recurrence relation of the form

axn+1 + bxn + cxn−1 = 0

where a and c were both non-zero. Let us try a solution of the form xn = λ^n; then aλ² + bλ + c = 0. Denote by α and β the roots of this quadratic. Then

yn = Aα^n + Bβ^n

is a solution. If α ≠ β then we can solve the equations

x0 = A + B, x1 = Aα + Bβ

so that y0 = x0 and y1 = x1; but

a(yn+1 − xn+1) + b(yn − xn) + c(yn−1 − xn−1) = 0

for all n, so by induction yn = xn for all n. If α = β ≠ 0, then

yn = (A + nB)α^n

is a solution and we can solve

x0 = A, x1 = (A + B)α

so that y0 = x0 and y1 = x1; then, by the same argument, yn = xn for all n. The case α = β = 0 does not arise. Hence the general solution is given by

xn = Aα^n + Bβ^n if α ≠ β,
xn = (A + nB)α^n if α = β.
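A quick numerical check of the two cases, with arbitrary illustrative coefficients:

```python
# Check that the stated general solutions satisfy a*x[n+1] + b*x[n] + c*x[n-1] = 0.
def worst_residual(x, a, b, c):
    return max(abs(a * x[n + 1] + b * x[n] + c * x[n - 1]) for n in range(1, len(x) - 1))

A, B = 1.7, -0.4

# Distinct roots: lambda^2 - 5*lambda + 6 = 0 has alpha = 2, beta = 3.
x = [A * 2 ** n + B * 3 ** n for n in range(10)]
print(worst_residual(x, 1, -5, 6))         # essentially 0

# Repeated root: lambda^2 - 4*lambda + 4 = 0 has alpha = beta = 2.
x = [(A + n * B) * 2 ** n for n in range(10)]
print(worst_residual(x, 1, -4, 4))         # essentially 0
```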

1.12 Appendix: asymptotics for n!

Our analysis of recurrence and transience for random walks in Section 1.6
rested heavily on the use of the asymptotic relation

n! ∼ A √n (n/e)^n as n → ∞

for some A ∈ [1, ∞). Here is a derivation.


We make use of the power series expansions, for |t| < 1,

log(1 + t) = t − (1/2)t² + (1/3)t³ − · · ·
log(1 − t) = −t − (1/2)t² − (1/3)t³ − · · · .

By subtraction we obtain

(1/2) log((1 + t)/(1 − t)) = t + (1/3)t³ + (1/5)t⁵ + · · · .

Set An = n!/(n^{n+1/2} e^{−n}) and an = log An. Then, by a straightforward calculation,

an − an+1 = (2n + 1) · (1/2) log( (1 + (2n+1)^{−1}) / (1 − (2n+1)^{−1}) ) − 1.

By the series expansion written above we have

an − an+1 = (2n + 1) ( 1/(2n+1) + (1/3)(2n+1)^{−3} + (1/5)(2n+1)^{−5} + · · · ) − 1
          = (1/3)(2n+1)^{−2} + (1/5)(2n+1)^{−4} + · · ·
          ≤ (1/3) ( (2n+1)^{−2} + (2n+1)^{−4} + · · · )
          = (1/3) · 1/((2n+1)² − 1) = 1/(12n) − 1/(12(n+1)).

It follows that an decreases and an − 1/(12n) increases as n → ∞. Hence


an → a for some a ∈ [0, ∞) and hence An → A, as n → ∞, where A = e^a.
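Numerically An converges quickly, and its limit agrees with the value √(2π) ≈ 2.5066 given by Stirling's formula; this identification of A is a standard fact, not needed above, where only 1 ≤ A < ∞ is used.

```python
import math

# A_n = n! / (n^(n + 1/2) e^(-n)) decreases to A, and A = sqrt(2*pi) by Stirling's formula.
for n in [1, 2, 5, 10, 50, 100]:
    A_n = math.factorial(n) / (n ** (n + 0.5) * math.exp(-n))
    print(n, A_n)
print(math.sqrt(2 * math.pi))
```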
