
1

Discrete-time Markov chains

This chapter is the foundation for all that follows. Discrete-time Markov
chains are defined and their behaviour is investigated. For better orientation
we now list the key theorems: these are Theorems 1.3.2 and 1.3.5 on hitting
times, Theorem 1.4.2 on the strong Markov property, Theorem 1.5.3
characterizing recurrence and transience, Theorem 1.7.7 on invariant
distributions and positive recurrence, Theorem 1.8.3 on convergence to
equilibrium, Theorem 1.9.3 on reversibility, and Theorem 1.10.2 on long-run
averages. Once you understand these you will understand the basic theory.
Part of that understanding will come from familiarity with examples, so a
large number are worked out in the text. Exercises at the end of each section
are an important part of the exposition.

1.1 Definition and basic properties

Let I be a countable set. Each i ∈ I is called a state and I is called the
state-space. We say that λ = (λi : i ∈ I) is a measure on I if 0 ≤ λi < ∞
for all i ∈ I. If in addition the total mass Σi∈I λi equals 1, then we call
λ a distribution. We work throughout with a probability space (Ω, F, P).
Recall that a random variable X with values in I is a function X : Ω → I.
Suppose we set

    λi = P(X = i) = P({ω : X(ω) = i}).


Then λ defines a distribution, the distribution of X. We think of X as


modelling a random state which takes the value i with probability λi . There
is a brief review of some basic facts about countable sets and probability
spaces in Chapter 6.
We say that a matrix P = (pij : i, j ∈ I) is stochastic if every row
(pij : j ∈ I) is a distribution. There is a one-to-one correspondence between
stochastic matrices P and the sort of diagrams described in the Introduction.
Here are two examples:

         ( 1 − α     α   )
P  =     (   β     1 − β )

[diagram: states 1 and 2, with transition probability α from 1 to 2 and β from 2 to 1]

         (  0      1      0  )
P  =     (  0     1/2    1/2 )
         ( 1/2     0     1/2 )

[diagram: states 1, 2 and 3, with arrows corresponding to the positive entries of P]

We shall now formalize the rules for a Markov chain by a definition in
terms of the corresponding matrix P. We say that (Xn)n≥0 is a Markov
chain with initial distribution λ and transition matrix P if
(i) X0 has distribution λ;
(ii) for n ≥ 0, conditional on Xn = i, Xn+1 has distribution (pij : j ∈ I)
and is independent of X0, . . . , Xn−1.
More explicitly, these conditions state that, for n ≥ 0 and i0, . . . , in+1 ∈ I,
(i) P(X0 = i0) = λi0;
(ii) P(Xn+1 = in+1 | X0 = i0, . . . , Xn = in) = pin in+1.
We say that (Xn)n≥0 is Markov(λ, P) for short. If (Xn)0≤n≤N is a finite
sequence of random variables satisfying (i) and (ii) for n = 0, . . . , N − 1,
then we again say (Xn)0≤n≤N is Markov(λ, P).
It is in terms of properties (i) and (ii) that most real-world examples are
seen to be Markov chains. But mathematically the following result appears
to give a more comprehensive description, and it is the key to some later
calculations.
Theorem 1.1.1. A discrete-time random process (Xn)0≤n≤N is
Markov(λ, P) if and only if for all i0, i1, . . . , iN ∈ I

    P(X0 = i0, X1 = i1, . . . , XN = iN) = λi0 pi0 i1 pi1 i2 . . . piN−1 iN .        (1.1)


Proof. Suppose (Xn)0≤n≤N is Markov(λ, P), then

    P(X0 = i0, X1 = i1, . . . , XN = iN)
        = P(X0 = i0)P(X1 = i1 | X0 = i0)
          . . . P(XN = iN | X0 = i0, . . . , XN−1 = iN−1)
        = λi0 pi0 i1 . . . piN−1 iN .

On the other hand, if (1.1) holds for N, then by summing both sides over
iN ∈ I and using Σj∈I pij = 1 we see that (1.1) holds for N − 1 and, by
induction

    P(X0 = i0, X1 = i1, . . . , Xn = in) = λi0 pi0 i1 . . . pin−1 in

for all n = 0, 1, . . . , N. In particular, P(X0 = i0) = λi0 and, for n =
0, 1, . . . , N − 1,

    P(Xn+1 = in+1 | X0 = i0, . . . , Xn = in)
        = P(X0 = i0, . . . , Xn = in, Xn+1 = in+1)/P(X0 = i0, . . . , Xn = in)
        = pin in+1 .

So (Xn)0≤n≤N is Markov(λ, P).

The next result reinforces the idea that Markov chains have no memory.
We write δi = (δij : j ∈ I) for the unit mass at i, where

    δij = 1 if i = j, and δij = 0 otherwise.

Theorem 1.1.2 (Markov property). Let (Xn )n≥0 be Markov(λ, P ).


Then, conditional on Xm = i, (Xm+n )n≥0 is Markov(δi , P ) and is indepen-
dent of the random variables X0 , . . . , Xm .

Proof. We have to show that for any event A determined by X0, . . . , Xm
we have

    P({Xm = im, . . . , Xm+n = im+n} ∩ A | Xm = i)
        = δi im pim im+1 . . . pim+n−1 im+n P(A | Xm = i)        (1.2)

then the result follows by Theorem 1.1.1. First consider the case of elementary
events

    A = {X0 = i0, . . . , Xm = im}.


In that case we have to show

    P(X0 = i0, . . . , Xm+n = im+n and i = im)/P(Xm = i)
        = δi im pim im+1 . . . pim+n−1 im+n
          × P(X0 = i0, . . . , Xm = im and i = im)/P(Xm = i)

which is true by Theorem 1.1.1. In general, any event A determined by
X0, . . . , Xm may be written as a countable disjoint union of elementary
events

    A = ⋃k≥1 Ak .

Then the desired identity (1.2) for A follows by summing up the corresponding
identities for Ak.
The remainder of this section addresses the following problem: what is
the probability that after n steps our Markov chain is in a given state? First
we shall see how the problem reduces to calculating entries in the nth power
of the transition matrix. Then we shall look at some examples where this
may be done explicitly.
We regard distributions and measures λ as row vectors whose compo-
nents are indexed by I, just as P is a matrix whose entries are indexed by
I × I. When I is finite we will often label the states 1, 2, . . . , N ; then λ
will be an N -vector and P an N × N -matrix. For these objects, matrix
multiplication is a familiar operation. We extend matrix multiplication to
the general case in the obvious way, defining a new measure λP and a new
matrix P 2 by

    (λP)j = Σi∈I λi pij ,        (P 2)ik = Σj∈I pij pjk .

We define P n similarly for any n. We agree that P 0 is the identity matrix
I, where (I)ij = δij. The context will make it clear when I refers to the
state-space and when to the identity matrix. We write pij(n) = (P n)ij for
the (i, j) entry in P n.
In the case where λi > 0 we shall write Pi (A) for the conditional prob-
ability P(A | X0 = i). By the Markov property at time m = 0, under Pi ,
(Xn )n≥0 is Markov(δi , P ). So the behaviour of (Xn )n≥0 under Pi does not
depend on λ.
Theorem 1.1.3. Let (Xn)n≥0 be Markov(λ, P). Then, for all n, m ≥ 0,
(i) P(Xn = j) = (λP n)j ;
(ii) Pi(Xn = j) = P(Xn+m = j | Xm = i) = pij(n).


Proof. (i) By Theorem 1.1.1

    P(Xn = j) = Σi0∈I . . . Σin−1∈I P(X0 = i0, . . . , Xn−1 = in−1, Xn = j)
              = Σi0∈I . . . Σin−1∈I λi0 pi0 i1 . . . pin−1 j = (λP n)j .

(ii) By the Markov property, conditional on Xm = i, (Xm+n)n≥0 is Markov
(δi, P), so we just take λ = δi in (i).
In light of this theorem we call pij(n) the n-step transition probability from i
to j. The following examples give some methods for calculating pij(n).
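Before turning to those examples, here is a minimal numerical sketch of
Theorem 1.1.3 in action (this code is not part of the original text and assumes
the numpy library): the distribution at time n is just the row vector λP n,
computed here for the three-state matrix shown earlier.

    # Sketch: P(X_n = j) = (lambda P^n)_j for a finite chain, using numpy.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])   # a stochastic matrix: rows sum to 1
    lam = np.array([1.0, 0.0, 0.0])   # initial distribution, unit mass at state 1

    n = 10
    dist_n = lam @ np.linalg.matrix_power(P, n)   # the row vector lambda P^n
    print(dist_n)         # P(X_n = j) for j = 1, 2, 3
    print(dist_n.sum())   # should be 1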

Example 1.1.4
The most general two-state chain has transition matrix of the form

         ( 1 − α     α   )
P  =     (   β     1 − β )

and is represented by the following diagram:

[diagram: states 1 and 2, with probability α of moving from 1 to 2 and β of moving from 2 to 1]

We exploit the relation P n+1 = P n P to write

    p11(n+1) = p12(n) β + p11(n) (1 − α).

We also know that p11(n) + p12(n) = P1(Xn = 1 or 2) = 1, so by eliminating
p12(n) we get a recurrence relation for p11(n):

    p11(n+1) = (1 − α − β)p11(n) + β,        p11(0) = 1.

This has a unique solution (see Section 1.11):

    p11(n) = β/(α + β) + α/(α + β) (1 − α − β)n    for α + β > 0,
    p11(n) = 1                                      for α + β = 0.
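As a quick check (an illustrative sketch assuming numpy, not part of the
original text), the closed form above can be compared with the entries of the
n-th matrix power:

    # Compare p11^(n) from the closed form with the n-th matrix power.
    import numpy as np

    alpha, beta = 0.3, 0.7          # any values with alpha + beta > 0
    P = np.array([[1 - alpha, alpha],
                  [beta, 1 - beta]])

    for n in range(6):
        exact = beta/(alpha + beta) + alpha/(alpha + beta) * (1 - alpha - beta)**n
        print(n, np.linalg.matrix_power(P, n)[0, 0], exact)   # the two columns agree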


Example 1.1.5 (Virus mutation)


Suppose a virus can exist in N different strains and in each generation
either stays the same, or with probability α mutates to another strain,
which is chosen at random. What is the probability that the strain in the
nth generation is the same as that in the 0th?
We could model this process as an N-state chain, with N × N transition
matrix P given by

    pii = 1 − α,        pij = α/(N − 1) for i ≠ j.

Then the answer we want would be found by computing p11(n). In fact, in
this example there is a much simpler approach, which relies on exploiting
the symmetry present in the mutation rules.
At any time a transition is made from the initial state to another with
probability α, and a transition from another state to the initial state with
probability α/(N − 1). Thus we have a two-state chain with diagram

[diagram: states 'initial' and 'other', with probability α from initial to other and α/(N − 1) back]

and by putting β = α/(N − 1) in Example 1.1.4 we find that the desired
probability is

    1/N + (1 − 1/N)(1 − αN/(N − 1))n .
Beware that in examples having less symmetry, this sort of lumping together
of states may not produce a Markov chain.
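For the symmetric example above, however, a short numerical sanity check
(an illustrative sketch assuming numpy; not part of the original text) confirms
that the lumped two-state formula matches the full N-state chain; the values
of N and α below are arbitrary.

    # Check the lumping argument for the virus mutation chain.
    import numpy as np

    N, alpha, n = 5, 0.2, 8
    P = np.full((N, N), alpha/(N - 1))   # mutation to each other strain
    np.fill_diagonal(P, 1 - alpha)       # probability of staying the same

    lumped = 1/N + (1 - 1/N) * (1 - alpha*N/(N - 1))**n
    print(np.linalg.matrix_power(P, n)[0, 0], lumped)   # these agree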

Example 1.1.6
Consider the three-state chain with diagram

[diagram: the three-state chain with the transition matrix given below]


and transition matrix

         (  0      1      0  )
P  =     (  0     1/2    1/2 ) .
         ( 1/2     0     1/2 )

The problem is to find a general formula for p11(n).
First we compute the eigenvalues of P by writing down its characteristic
equation

    0 = det(x − P) = x(x − 1/2)² − 1/4 = (1/4)(x − 1)(4x² + 1).
The eigenvalues are 1, i/2, −i/2 and from this we deduce that p11(n) has the
form

    p11(n) = a + b (i/2)n + c (−i/2)n

for some constants a, b and c. (The justification comes from linear algebra:
having distinct eigenvalues, P is diagonalizable, that is, for some invertible
matrix U we have

              ( 1     0       0   )
    P  =  U   ( 0    i/2      0   )  U −1
              ( 0     0     −i/2  )

and hence

              ( 1        0           0       )
    P n = U   ( 0     (i/2)n          0       )  U −1
              ( 0        0        (−i/2)n     )

which forces p11(n) to have the form claimed.) The answer we want is real
and

    (±i/2)n = (1/2)n e±inπ/2 = (1/2)n (cos(nπ/2) ± i sin(nπ/2))

so it makes sense to rewrite p11(n) in the form

    p11(n) = α + (1/2)n ( β cos(nπ/2) + γ sin(nπ/2) )

for constants α, β and γ. The first few values of p11(n) are easy to write
down, so we get equations to solve for α, β and γ:

    1 = p11(0) = α + β
    0 = p11(1) = α + (1/2)γ
    0 = p11(2) = α − (1/4)β


so α = 1/5, β = 4/5, γ = −2/5 and

    p11(n) = 1/5 + (1/2)n ( (4/5) cos(nπ/2) − (2/5) sin(nπ/2) ).

More generally, the following method may in principle be used to find a
formula for pij(n) for any M-state chain and any states i and j.
(i) Compute the eigenvalues λ1, . . . , λM of P by solving the characteristic
equation.
(ii) If the eigenvalues are distinct then pij(n) has the form

    pij(n) = a1 (λ1)n + . . . + aM (λM)n

for some constants a1, . . . , aM (depending on i and j). If an eigenvalue
λ is repeated (once, say) then the general form includes the term
(an + b)λn.
(iii) As roots of a polynomial with real coefficients, complex eigenvalues
will come in conjugate pairs and these are best written using sine
and cosine, as in the example.
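The method is also easy to carry out numerically. The following sketch
(assuming numpy; not part of the original text) diagonalizes the three-state
matrix of Example 1.1.6 and recovers p11(n), agreeing with the closed form
found above.

    # Eigenvalue method for p11^(n) of the three-state chain of Example 1.1.6.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])

    w, U = np.linalg.eig(P)            # eigenvalues 1, i/2, -i/2
    Uinv = np.linalg.inv(U)

    def p11(n):
        # P^n = U diag(w^n) U^{-1}; take the (1,1) entry and discard the
        # vanishing imaginary round-off
        return (U @ np.diag(w**n) @ Uinv)[0, 0].real

    for n in range(5):
        closed = 1/5 + (1/2)**n * (4/5*np.cos(n*np.pi/2) - 2/5*np.sin(n*np.pi/2))
        print(n, round(p11(n), 6), round(closed, 6))   # columns agree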

Exercises
1.1.1 Let B1, B2, . . . be disjoint events with ⋃n≥1 Bn = Ω. Show that if A
is another event and P(A|Bn) = p for all n then P(A) = p.
Deduce that if X and Y are discrete random variables then the following
are equivalent:
(a) X and Y are independent;
(b) the conditional distribution of X given Y = y is independent of y.
1.1.2 Suppose that (Xn )n≥0 is Markov (λ, P ). If Yn = Xkn , show that
(Yn )n≥0 is Markov (λ, P k ).
1.1.3 Let X0 be a random variable with values in a countable set I. Let
Y1 , Y2 , . . . be a sequence of independent random variables, uniformly dis-
tributed on [0, 1]. Suppose we are given a function

G : I × [0, 1] → I

and define inductively

Xn+1 = G(Xn , Yn+1 ).

Show that (Xn )n≥0 is a Markov chain and express its transition matrix P
in terms of G. Can all Markov chains be realized in this way? How would
you simulate a Markov chain using a computer?


Suppose now that Z0 , Z1 , . . . are independent, identically distributed


random variables such that Zi = 1 with probability p and Zi = 0 with
probability 1 − p. Set S0 = 0, Sn = Z1 + . . . + Zn . In each of the following
cases determine whether (Xn )n≥0 is a Markov chain:
(a) Xn = Zn , (b) Xn = Sn ,
(c) Xn = S0 + . . . + Sn , (d)Xn = (Sn , S0 + . . . + Sn ).
In the cases where (Xn )n≥0 is a Markov chain find its state-space and
transition matrix, and in the cases where it is not a Markov chain give an
example where P (Xn+1 = i|Xn = j, Xn−1 = k) is not independent of k.
1.1.4 A flea hops about at random on the vertices of a triangle, with all
jumps equally likely. Find the probability that after n hops the flea is back
where it started.
A second flea also hops about on the vertices of a triangle, but this flea is
twice as likely to jump clockwise as anticlockwise. What is the probability
that after n hops this second flea is back where it started? [Recall that
e±iπ/6 = √3/2 ± i/2.]
1.1.5 A die is ‘fixed’ so that each time it is rolled the score cannot be the
same as the preceding score, all other scores having probability 1/5. If the
first score is 6, what is the probability p that the nth score is 6? What is
the probability that the nth score is 1?
Suppose now that a new die is produced which cannot score one greater
(mod 6) than the preceding score, all other scores having equal probability.
By considering the relationship between the two dice find the value of p for
the new die.
1.1.6 An octopus is trained to choose object A from a pair of objects A, B
by being given repeated trials in which it is shown both and is rewarded
with food if it chooses A. The octopus may be in one of three states of mind:
in state 1 it cannot remember which object is rewarded and is equally likely
to choose either; in state 2 it remembers and chooses A but may forget
again; in state 3 it remembers and chooses A and never forgets. After each
trial it may change its state of mind according to the transition matrix

    State 1:  1/2   1/2    0
    State 2:  1/2   1/12   5/12
    State 3:   0     0     1.

It is in state 1 before the first trial. What is the probability that it is
in state 1 just before the (n + 1)th trial? What is the probability Pn+1(A)
that it chooses A on the (n + 1)th trial?


Someone suggests that the record of successive choices (a sequence of As


and Bs) might arise from a two-state Markov chain with constant transition
probabilities. Discuss, with reference to the value of Pn+1 (A) that you have
found, whether this is possible.

1.1.7 Let (Xn)n≥0 be a Markov chain on {1, 2, 3} with transition matrix

         ( 0     1      0  )
P  =     ( 0    2/3    1/3 ) .
         ( p   1 − p    0  )

Calculate P(Xn = 1 | X0 = 1) in each of the following cases: (a) p = 1/16,
(b) p = 1/6, (c) p = 1/12.

1.2 Class structure

It is sometimes possible to break a Markov chain into smaller pieces, each


of which is relatively easy to understand, and which together give an un-
derstanding of the whole. This is done by identifying the communicating
classes of the chain.
We say that i leads to j and write i → j if

Pi (Xn = j for some n ≥ 0) > 0.

We say i communicates with j and write i ↔ j if both i → j and j → i.

Theorem 1.2.1. For distinct states i and j the following are equivalent:
(i) i → j;
(ii) pi0 i1 pi1 i2 . . . pin−1 in > 0 for some states i0, i1, . . . , in with i0 = i and
in = j;
(iii) pij(n) > 0 for some n ≥ 0.
Proof. Observe that

    pij(n) ≤ Pi(Xn = j for some n ≥ 0) ≤ Σn≥0 pij(n)

which proves the equivalence of (i) and (iii). Also

    pij(n) = Σi1 ,..., in−1 pi i1 pi1 i2 . . . pin−1 j

so that (ii) and (iii) are equivalent.


It is clear from (ii) that i → j and j → k imply i → k. Also i → i for


any state i. So ↔ satisfies the conditions for an equivalence relation on I,
and thus partitions I into communicating classes. We say that a class C is
closed if
i ∈ C, i → j imply j ∈ C.
Thus a closed class is one from which there is no escape. A state i is
absorbing if {i} is a closed class. The smaller pieces referred to above are
these communicating classes. A chain or transition matrix P where I is a
single class is called irreducible.
As the following example makes clear, when one can draw the diagram,
the class structure of a chain is very easy to find.

Example 1.2.2
Find the communicating classes associated to the stochastic matrix

         ( 1/2   1/2    0     0     0    0 )
         (  0     0     1     0     0    0 )
P  =     ( 1/3    0     0    1/3   1/3   0 )
         (  0     0     0    1/2   1/2   0 )
         (  0     0     0     0     0    1 )
         (  0     0     0     0     1    0 ).

The solution is obvious from the diagram

[diagram: the six states with arrows corresponding to the positive entries of P]

the classes being {1, 2, 3}, {4} and {5, 6}, with only {5, 6} being closed.
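For a finite chain the reachability relation i → j, and hence the communicating
classes, can also be computed mechanically. The following sketch (assuming
numpy; not part of the original text) recovers the classes of Example 1.2.2:

    # Communicating classes from the reachability relation, for Example 1.2.2.
    import numpy as np

    P = np.array([[1/2, 1/2, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0, 0],
                  [1/3, 0, 0, 1/3, 1/3, 0],
                  [0, 0, 0, 1/2, 1/2, 0],
                  [0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 1, 0]])

    n = len(P)
    # i -> j iff j is reachable in at most n-1 steps; (I + P)^(n-1) > 0 captures this
    reach = np.linalg.matrix_power(np.eye(n) + P, n - 1) > 0
    classes = {frozenset(j for j in range(n)
                         if reach[i, j] and reach[j, i]) for i in range(n)}
    for c in classes:
        closed = all(not reach[i, j] for i in c for j in range(n) if j not in c)
        print(sorted(s + 1 for s in c), "closed" if closed else "not closed")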

Exercises
1.2.1 Identify the communicating classes of the following transition matrix:

         ( 1/2    0     0     0    1/2 )
         (  0    1/2    0    1/2    0  )
P  =     (  0     0     1     0     0  )
         (  0    1/4   1/4   1/4   1/4 )
         ( 1/2    0     0     0    1/2 ).

Which classes are closed?


1.2.2 Show that every transition matrix on a finite state-space has at least
one closed communicating class. Find an example of a transition matrix
with no closed communicating class.

1.3 Hitting times and absorption probabilities

Let (Xn )n≥0 be a Markov chain with transition matrix P . The hitting time
of a subset A of I is the random variable H A : Ω → {0, 1, 2, . . . } ∪ {∞}
given by
H A (ω) = inf{n ≥ 0 : Xn (ω) ∈ A}
where we agree that the infimum of the empty set ∅ is ∞. The probability
starting from i that (Xn)n≥0 ever hits A is then

    hiA = Pi(H A < ∞).

When A is a closed class, hiA is called the absorption probability. The mean
time taken for (Xn)n≥0 to reach A is given by

    kiA = Ei(H A) = Σn<∞ n P(H A = n) + ∞ P(H A = ∞).

We shall often write less formally

    hiA = Pi(hit A),        kiA = Ei(time to hit A).

Remarkably, these quantities can be calculated explicitly by means of
certain linear equations associated with the transition matrix P. Before we
give the general theory, here is a simple example.

Example 1.3.1
Consider the chain with the following diagram:

[diagram: states 1, 2, 3, 4 in a line; 1 and 4 are absorbing, while from 2 and 3 the chain jumps to either neighbour with probability 1/2 each]

Starting from 2, what is the probability of absorption in 4? How long does


it take until the chain is absorbed in 1 or 4?
Introduce

hi = Pi (hit 4), ki = Ei (time to hit {1, 4}).


Clearly, h1 = 0, h4 = 1 and k1 = k4 = 0. Suppose now that we start at 2,
and consider the situation after making one step. With probability 1/2 we
jump to 1 and with probability 1/2 we jump to 3. So

    h2 = (1/2)h1 + (1/2)h3 ,        k2 = 1 + (1/2)k1 + (1/2)k3 .

The 1 appears in the second formula because we count the time for the first
step. Similarly,

    h3 = (1/2)h2 + (1/2)h4 ,        k3 = 1 + (1/2)k2 + (1/2)k4 .

Hence

    h2 = (1/2)h3 = (1/2)((1/2)h2 + 1/2),
    k2 = 1 + (1/2)k3 = 1 + (1/2)(1 + (1/2)k2).

So, starting from 2, the probability of hitting 4 is 1/3 and the mean time to
absorption is 2. Note that in writing down the first equations for h2 and k2
we made implicit use of the Markov property, in assuming that the chain
begins afresh from its new position after the first jump. Here is a general
result for hitting probabilities.

Theorem 1.3.2. The vector of hitting probabilities hA = (hiA : i ∈ I) is
the minimal non-negative solution to the system of linear equations

    hiA = 1                        for i ∈ A
    hiA = Σj∈I pij hjA             for i ∉ A.        (1.3)

(Minimality means that if x = (xi : i ∈ I) is another solution with xi ≥ 0
for all i, then xi ≥ hiA for all i.)

Proof. First we show that hA satisfies (1.3). If X0 = i ∈ A, then H A = 0,
so hiA = 1. If X0 = i ∉ A, then H A ≥ 1, so by the Markov property

    Pi(H A < ∞ | X1 = j) = Pj(H A < ∞) = hjA

and

    hiA = Pi(H A < ∞) = Σj∈I Pi(H A < ∞, X1 = j)
        = Σj∈I Pi(H A < ∞ | X1 = j)Pi(X1 = j) = Σj∈I pij hjA .


Suppose now that x = (xi : i ∈ I) is any solution to (1.3). Then hiA = xi = 1
for i ∈ A. Suppose i ∉ A, then

    xi = Σj∈I pij xj = Σj∈A pij + Σj∉A pij xj .

Substitute for xj to obtain

    xi = Σj∈A pij + Σj∉A pij ( Σk∈A pjk + Σk∉A pjk xk )
       = Pi(X1 ∈ A) + Pi(X1 ∉ A, X2 ∈ A) + Σj∉A Σk∉A pij pjk xk .

By repeated substitution for x in the final term we obtain after n steps

    xi = Pi(X1 ∈ A) + . . . + Pi(X1 ∉ A, . . . , Xn−1 ∉ A, Xn ∈ A)
         + Σj1∉A . . . Σjn∉A pij1 pj1 j2 . . . pjn−1 jn xjn .

Now if x is non-negative, so is the last term on the right, and the remaining
terms sum to Pi(H A ≤ n). So xi ≥ Pi(H A ≤ n) for all n and then

    xi ≥ limn→∞ Pi(H A ≤ n) = Pi(H A < ∞) = hiA .

Example 1.3.1 (continued)
The system of linear equations (1.3) for h = h{4} are given here by

    h4 = 1,
    h2 = (1/2)h1 + (1/2)h3 ,        h3 = (1/2)h2 + (1/2)h4

so that

    h2 = (1/2)h1 + (1/2)((1/2)h2 + 1/2)

and

    h2 = 1/3 + (2/3)h1 ,        h3 = 2/3 + (1/3)h1 .

The value of h1 is not determined by the system (1.3), but the minimality
condition now makes us take h1 = 0, so we recover h2 = 1/3 as before. Of
course, the extra boundary condition h1 = 0 was obvious from the beginning


so we built it into our system of equations and did not have to worry about
minimal non-negative solutions.
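For a small chain such as this one the equations reduce to a little linear
system. The following sketch (assuming numpy; not part of the original text)
imposes the boundary values h1 = 0, h4 = 1 and k1 = k4 = 0 from Example
1.3.1 and solves for the interior states 2 and 3:

    # Hitting probabilities and mean hitting times for Example 1.3.1.
    import numpy as np

    P = np.array([[1, 0, 0, 0],
                  [1/2, 0, 1/2, 0],
                  [0, 1/2, 0, 1/2],
                  [0, 0, 0, 1]], dtype=float)

    interior = [1, 2]                     # states 2 and 3 (0-based indices)
    Q = P[np.ix_(interior, interior)]     # transitions among the interior states

    # h_i = sum_j p_ij h_j with h4 = 1, h1 = 0   =>   (I - Q) h = P[interior, state 4]
    h = np.linalg.solve(np.eye(2) - Q, P[interior, 3])
    # k_i = 1 + sum_{j interior} p_ij k_j        =>   (I - Q) k = 1
    k = np.linalg.solve(np.eye(2) - Q, np.ones(2))

    print(h)   # [1/3, 2/3]: probability of hitting 4 from states 2 and 3
    print(k)   # [2, 2]: mean time to hit {1, 4} from states 2 and 3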
In cases where the state-space is infinite it may not be possible to write
down a corresponding extra boundary condition. Then, as we shall see in
the next examples, the minimality condition is essential.
Example 1.3.3 (Gamblers’ ruin)
Consider the Markov chain with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1 for each i ≥ 1, and 0 absorbing]

where 0 < p = 1 − q < 1. The transition probabilities are

p00 = 1,
pi,i−1 = q, pi,i+1 = p for i = 1, 2, . . . .

Imagine that you enter a casino with a fortune of £i and gamble, £1 at a
time, with probability p of doubling your stake and probability q of losing
it. The resources of the casino are regarded as infinite, so there is no upper
limit to your fortune. But what is the probability that you leave broke?
Set hi = Pi (hit 0), then h is the minimal non-negative solution to

h0 = 1,
hi = phi+1 + qhi−1 , for i = 1, 2, . . . .

If p ≠ q this recurrence relation has a general solution

    hi = A + B (q/p)i .
(See Section 1.11.) If p < q, which is the case in most successful casinos,
then the restriction 0 ≤ hi ≤ 1 forces B = 0, so hi = 1 for all i. If p > q,
then since h0 = 1 we get a family of solutions

    hi = (q/p)i + A( 1 − (q/p)i ) ;

for a non-negative solution we must have A ≥ 0, so the minimal non-


negative solution is hi = (q/p)i . Finally, if p = q the recurrence relation
has a general solution
hi = A + Bi


and again the restriction 0 ≤ hi ≤ 1 forces B = 0, so hi = 1 for all i.


Thus, even if you find a fair casino, you are certain to end up broke. This
apparent paradox is called gamblers’ ruin.
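A Monte Carlo sanity check of the case p > q (an illustrative sketch assuming
numpy; not part of the original text): the fraction of simulated walks that hit
0 should be close to (q/p)i. Paths are truncated after a fixed horizon, which
is harmless here since surviving walks drift upwards.

    # Simulation estimate of the ruin probability (q/p)^i when p > q.
    import numpy as np

    rng = np.random.default_rng(0)
    p, i0, trials, horizon = 0.6, 3, 5000, 1000

    steps = rng.choice([1, -1], size=(trials, horizon), p=[p, 1 - p])
    paths = i0 + np.cumsum(steps, axis=1)
    est = (paths.min(axis=1) <= 0).mean()    # fraction of walks that reach 0

    q = 1 - p
    print(est, (q/p)**i0)                    # estimate vs (q/p)^i, about 0.30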

Example 1.3.4 (Birth-and-death chain)


Consider the Markov chain with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability pi of moving from i to i + 1 and qi of moving from i to i − 1, and 0 absorbing]

where, for i = 1, 2, . . . , we have 0 < pi = 1 − qi < 1. As in the preceding
example, 0 is an absorbing state and we wish to calculate the absorption
probability starting from i. But here we allow pi and qi to depend on i.
Such a chain may serve as a model for the size of a population, recorded
each time it changes, pi being the probability that we get a birth before
a death in a population of size i. Then hi = Pi (hit 0) is the extinction
probability starting from i.
We write down the usual system of equations

h0 = 1,
hi = pi hi+1 + qi hi−1 , for i = 1, 2, . . . .

This recurrence relation has variable coefficients so the usual technique fails.
But consider ui = hi−1 − hi , then pi ui+1 = qi ui , so

    ui+1 = (qi/pi) ui = ( qi qi−1 . . . q1 / pi pi−1 . . . p1 ) u1 = γi u1

where the final equality defines γi . Then

u1 + . . . + ui = h0 − hi

so
hi = 1 − A(γ0 + . . . + γi−1 )
where A = u1 and γ0 = 1. At this point A remains to be determined. In
the case Σi≥0 γi = ∞, the restriction 0 ≤ hi ≤ 1 forces A = 0 and hi = 1
for all i. But if Σi≥0 γi < ∞ then we can take A > 0 so long as

    1 − A(γ0 + . . . + γi−1) ≥ 0 for all i.

Thus the minimal non-negative solution occurs when A = ( Σi≥0 γi )−1 and
then

    hi = Σj≥i γj / Σj≥0 γj .

In this case, for i = 1, 2, . . . , we have hi < 1, so the population survives
with positive probability.
Here is the general result on mean hitting times. Recall that kiA =
Ei (H A ), where H A is the first time (Xn )n≥0 hits A. We use the notation
1B for the indicator function of B, so, for example, 1X1 =j is the random
variable equal to 1 if X1 = j and equal to 0 otherwise.

Theorem 1.3.5. The vector of mean hitting times kA = (kiA : i ∈ I) is
the minimal non-negative solution to the system of linear equations

    kiA = 0                             for i ∈ A
    kiA = 1 + Σj∉A pij kjA              for i ∉ A.        (1.4)

Proof. First we show that kA satisfies (1.4). If X0 = i ∈ A, then H A = 0,
so kiA = 0. If X0 = i ∉ A, then H A ≥ 1, so, by the Markov property,

    Ei(H A | X1 = j) = 1 + Ej(H A)

and

    kiA = Ei(H A) = Σj∈I Ei(H A 1{X1=j})
        = Σj∈I Ei(H A | X1 = j)Pi(X1 = j) = 1 + Σj∉A pij kjA .

Suppose now that y = (yi : i ∈ I) is any solution to (1.4). Then kiA = yi = 0
for i ∈ A. If i ∉ A, then

    yi = 1 + Σj∉A pij yj
       = 1 + Σj∉A pij ( 1 + Σk∉A pjk yk )
       = Pi(H A ≥ 1) + Pi(H A ≥ 2) + Σj∉A Σk∉A pij pjk yk .

By repeated substitution for y in the final term we obtain after n steps

    yi = Pi(H A ≥ 1) + . . . + Pi(H A ≥ n) + Σj1∉A . . . Σjn∉A pij1 pj1 j2 . . . pjn−1 jn yjn .


So, if y is non-negative,

    yi ≥ Pi(H A ≥ 1) + . . . + Pi(H A ≥ n)

and, letting n → ∞,

    yi ≥ Σn≥1 Pi(H A ≥ n) = Ei(H A) = kiA .

Exercises
1.3.1 Prove the claims (a), (b) and (c) made in example (v) of the Intro-
duction.
1.3.2 A gambler has £2 and needs to increase it to £10 in a hurry. He
can play a game with the following rules: a fair coin is tossed; if a player
bets on the right side, he wins a sum equal to his stake, and his stake is
returned; otherwise he loses his stake. The gambler decides to use a bold
strategy in which he stakes all his money if he has £5 or less, and otherwise
stakes just enough to increase his capital, if he wins, to £10.
Let X0 = 2 and let Xn be his capital after n throws. Prove that the
gambler will achieve his aim with probability 1/5.
What is the expected number of tosses until the gambler either achieves
his aim or loses his capital?
1.3.3 A simple game of ‘snakes and ladders’ is played on a board of nine
squares.

[diagram: the board, with squares numbered boustrophedon — bottom row 1 2 3, middle row 6 5 4, top row 7 8 9 — START before square 1, FINISH at square 9, and the snakes and ladders drawn on the board]


At each turn a player tosses a fair coin and advances one or two places
according to whether the coin lands heads or tails. If you land at the foot
of a ladder you climb to the top, but if you land at the head of a snake you
slide down to the tail. How many turns on average does it take to complete
the game?
What is the probability that a player who has reached the middle square
will complete the game without slipping back to square 1?
1.3.4 Let (Xn)n≥0 be a Markov chain on {0, 1, . . . } with transition probabilities
given by

    p01 = 1,        pi,i+1 + pi,i−1 = 1,        pi,i+1 = ((i + 1)/i)² pi,i−1 ,    i ≥ 1.

Show that if X0 = 0 then the probability that Xn ≥ 1 for all n ≥ 1 is 6/π².

1.4 Strong Markov property


In Section 1.1 we proved the Markov property. This says that for each time
m, conditional on Xm = i, the process after time m begins afresh from
i. Suppose, instead of conditioning on Xm = i, we simply waited for the
process to hit state i, at some random time H. What can one say about the
process after time H? What if we replaced H by a more general random
time, for example H − 1? In this section we shall identify a class of random
times at which a version of the Markov property does hold. This class will
include H but not H − 1; after all, the process after time H − 1 jumps
straight to i, so it does not simply begin afresh.
A random variable T : Ω → {0, 1, 2, . . . } ∪ {∞} is called a stopping time
if the event {T = n} depends only on X0 , X1 , . . . , Xn for n = 0, 1, 2, . . . .
Intuitively, by watching the process, you know at the time when T occurs.
If asked to stop at T , you know when to stop.
Examples 1.4.1
(a) The first passage time

Tj = inf{n ≥ 1 : Xn = j}

is a stopping time because

{Tj = n} = {X1 ≠ j, . . . , Xn−1 ≠ j, Xn = j}.

(b) The first hitting time H A of Section 1.3 is a stopping time because

{H A = n} = {X0 ∉ A, . . . , Xn−1 ∉ A, Xn ∈ A}.


(c) The last exit time

LA = sup{n ≥ 0 : Xn ∈ A}

is not in general a stopping time because the event {LA = n} depends on


whether (Xn+m )m≥1 visits A or not.

We shall show that the Markov property holds at stopping times. The
crucial point is that, if T is a stopping time and B ⊆ Ω is determined by
X0 , X1 , . . . , XT , then B ∩ {T = m} is determined by X0 , X1 , . . . , Xm , for
all m = 0, 1, 2, . . . .

Theorem 1.4.2 (Strong Markov property). Let (Xn )n≥0 be


Markov(λ, P ) and let T be a stopping time of (Xn )n≥0 . Then, conditional
on T < ∞ and XT = i, (XT +n )n≥0 is Markov(δi , P ) and independent of
X 0 , X1 , . . . , XT .

Proof. If B is an event determined by X0 , X1 , . . . , XT , then B ∩ {T = m}


is determined by X0 , X1 , . . . , Xm , so, by the Markov property at time m

P({XT = j0 , XT +1 = j1 , . . . , XT +n = jn } ∩ B ∩ {T = m} ∩ {XT = i})


= Pi (X0 = j0 , X1 = j1 , . . . , Xn = jn )P(B ∩ {T = m} ∩ {XT = i})

where we have used the condition T = m to replace m by T . Now sum over


m = 0, 1, 2, . . . and divide by P(T < ∞, XT = i) to obtain

P({XT = j0 , XT +1 = j1 , . . . , XT +n = jn } ∩ B | T < ∞, XT = i)
= Pi (X0 = j0 , X1 = j1 , . . . , Xn = jn )P(B | T < ∞, XT = i).

The following example uses the strong Markov property to get more
information on the hitting times of the chain considered in Example 1.3.3.

Example 1.4.3
Consider the Markov chain (Xn )n≥0 with diagram

[diagram: states 0, 1, 2, . . . in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1]


where 0 < p = 1 − q < 1. We know from Example 1.3.3 the probability of
hitting 0 starting from 1. Here we obtain the complete distribution of the
time to hit 0 starting from 1 in terms of its probability generating function.
Set

    Hj = inf{n ≥ 0 : Xn = j}

and, for 0 ≤ s < 1

    φ(s) = E1(sH0) = Σn<∞ sn P1(H0 = n).

Suppose we start at 2. Apply the strong Markov property at H1 to see
that under P2, conditional on H1 < ∞, we have H0 = H1 + H̃0, where
H̃0, the time taken after H1 to get to 0, is independent of H1 and has the
(unconditioned) distribution of H1. So

    E2(sH0) = E2(sH1 | H1 < ∞) E2(sH̃0 | H1 < ∞) P2(H1 < ∞)
            = E2(sH1 1H1<∞) E2(sH̃0 | H1 < ∞)
            = E2(sH1)² = φ(s)² .

Then, by the Markov property at time 1, conditional on X1 = 2, we have
H0 = 1 + H̃0, where H̃0, the time taken after time 1 to get to 0, has the
same distribution as H0 does under P2. So

    φ(s) = E1(sH0) = p E1(sH0 | X1 = 2) + q E1(sH0 | X1 = 0)
         = p E1(s1+H̃0 | X1 = 2) + q E1(s | X1 = 0)
         = ps E2(sH0) + qs
         = ps φ(s)² + qs.

Thus φ = φ(s) satisfies

    ps φ² − φ + qs = 0        (1.5)

and

    φ = ( 1 ± √(1 − 4pqs²) ) / 2ps.

Since φ(0) ≤ 1 and φ is continuous we are forced to take the negative root
at s = 0 and stick with it for all 0 ≤ s < 1.
To recover the distribution of H0 we expand the square-root as a power
series:

    φ(s) = (1/2ps) { 1 − ( 1 + (1/2)(−4pqs²) + (1/2)(−1/2)(−4pqs²)²/2! + . . . ) }
         = qs + pq² s³ + . . .
         = s P1(H0 = 1) + s² P1(H0 = 2) + s³ P1(H0 = 3) + . . . .


The first few probabilities P1 (H0 = 1), P1 (H0 = 2), . . . are readily checked
from first principles.
On letting s ↑ 1 we have φ(s) → P1(H0 < ∞), so

    P1(H0 < ∞) = ( 1 − √(1 − 4pq) ) / 2p = 1 if p ≤ q, and = q/p if p > q.

(Remember that q = 1 − p, so

    √(1 − 4pq) = √(1 − 4p + 4p²) = |1 − 2p| = |2q − 1|.)

We can also find the mean hitting time using

    E1(H0) = lims↑1 φ′(s).

It is only worth considering the case p ≤ q, where the mean hitting time
has a chance of being finite. Differentiate (1.5) to obtain

    2ps φ φ′ + p φ² − φ′ + q = 0

so

    φ′(s) = ( p φ(s)² + q ) / ( 1 − 2ps φ(s) ) → 1/(1 − 2p) = 1/(q − p)    as s ↑ 1.

See Example 5.1.1 for a connection with branching processes.
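The conclusions of this example are easy to check by simulation. The
following sketch (assuming numpy; not part of the original text) estimates
P1(H0 < ∞) and E1(H0) for a value p < q and compares with 1 and 1/(q − p):

    # Simulation check of Example 1.4.3 in the case p < q.
    import numpy as np

    rng = np.random.default_rng(1)
    p, trials, horizon = 0.3, 10000, 10000
    q = 1 - p

    times = []
    for _ in range(trials):
        x, n = 1, 0
        while x > 0 and n < horizon:      # run the walk until it hits 0
            x += 1 if rng.random() < p else -1
            n += 1
        times.append(n if x == 0 else np.inf)

    times = np.array(times)
    print(np.isfinite(times).mean())                   # close to 1
    print(times[np.isfinite(times)].mean(), 1/(q - p)) # mean hitting time vs 1/(q-p)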


Example 1.4.4
We now consider an application of the strong Markov property to a Markov
chain (Xn )n≥0 observed only at certain times. In the first instance suppose
that J is some subset of the state-space I and that we observe the chain
only when it takes values in J . The resulting process (Ym )m≥0 may be
obtained formally by setting Ym = XTm , where

T0 = inf{n ≥ 0 : Xn ∈ J }

and, for m = 0, 1, 2, . . .

Tm+1 = inf{n > Tm : Xn ∈ J }.

Let us assume that P(Tm < ∞) = 1 for all m. For each m we can check
easily that Tm , the time of the mth visit to J , is a stopping time. So the
strong Markov property applies to show, for i0, i1, . . . , im+1 ∈ J, that

    P(Ym+1 = im+1 | Y0 = i0, . . . , Ym = im)
        = P(XTm+1 = im+1 | XT0 = i0, . . . , XTm = im)
        = Pim(XT1 = im+1) = p̄im im+1


where, for i, j ∈ J

    p̄ij = hij

and where, for j ∈ J, the vector (hij : i ∈ I) is the minimal non-negative
solution to

    hij = pij + Σk∉J pik hkj .        (1.6)

Thus (Ym)m≥0 is a Markov chain on J with transition matrix P̄.


A second example of a similar type arises if we observe the original chain
(Xn )n≥0 only when it moves. The resulting process (Zm )m≥0 is given by
Zm = XSm where S0 = 0 and for m = 0, 1, 2, . . .

Sm+1 = inf{n ≥ Sm : Xn ≠ XSm }.

Let us assume there are no absorbing states. Again the random times Sm
for m ≥ 0 are stopping times and, by the strong Markov property

    P(Zm+1 = im+1 | Z0 = i0, . . . , Zm = im)
        = P(XSm+1 = im+1 | XS0 = i0, . . . , XSm = im)
        = Pim(XS1 = im+1) = p̃im im+1

where p̃ii = 0 and, for i ≠ j

    p̃ij = pij / Σk≠i pik .

Thus (Zm)m≥0 is a Markov chain on I with transition matrix P̃.

Exercises
1.4.1 Let Y1 , Y2 , . . . be independent identically distributed random vari-
ables with
P(Y1 = 1) = P(Y1 = −1) = 1/2 and set X0 = 1, Xn = X0 + Y1 + . . . + Yn
for n ≥ 1. Define
H0 = inf{n ≥ 0 : Xn = 0} .
Find the probability generating function φ(s) = E(sH0 ).
Suppose the distribution of Y1 , Y2 , . . . is changed to P(Y1 = 2) = 1/2,
P(Y1 = −1) = 1/2. Show that φ now satisfies

sφ3 − 2φ + s = 0 .

1.4.2 Deduce carefully from Theorem 1.3.2 the claim made at (1.6).


1.5 Recurrence and transience

Let (Xn )n≥0 be a Markov chain with transition matrix P . We say that a
state i is recurrent if

Pi (Xn = i for infinitely many n) = 1.

We say that i is transient if

Pi (Xn = i for infinitely many n) = 0.

Thus a recurrent state is one to which you keep coming back and a transient
state is one which you eventually leave for ever. We shall show that every
state is either recurrent or transient.
Recall that the first passage time to state i is the random variable Ti
defined by
Ti (ω) = inf{n ≥ 1 : Xn (ω) = i}
where inf ∅ = ∞. We now define inductively the rth passage time Ti(r) to
state i by

    Ti(0)(ω) = 0,        Ti(1)(ω) = Ti(ω)

and, for r = 0, 1, 2, . . . ,

    Ti(r+1)(ω) = inf{n ≥ Ti(r)(ω) + 1 : Xn(ω) = i}.

The length of the rth excursion to i is then

    Si(r) = Ti(r) − Ti(r−1)    if Ti(r−1) < ∞,    and 0 otherwise.

The following diagram illustrates these definitions:


[diagram: a sample path of (Xn), marking the successive passage times Ti(0), Ti(1), Ti(2), Ti(3) to state i and the excursion lengths Si(1), Si(2), Si(3), Si(4)]


Our analysis of recurrence and transience will rest on finding the joint
distribution of these excursion lengths.
Lemma 1.5.1. For r = 2, 3, . . . , conditional on Ti(r−1) < ∞, Si(r) is
independent of {Xm : m ≤ Ti(r−1)} and

    P(Si(r) = n | Ti(r−1) < ∞) = Pi(Ti = n).

Proof. Apply the strong Markov property at the stopping time T = Ti(r−1).
It is automatic that XT = i on T < ∞. So, conditional on T < ∞,
(XT +n)n≥0 is Markov(δi, P) and independent of X0, X1, . . . , XT . But

    Si(r) = inf{n ≥ 1 : XT +n = i},

so Si(r) is the first passage time of (XT +n)n≥0 to state i.

Recall that the indicator function 1{X1=j} is the random variable equal
to 1 if X1 = j and 0 otherwise. Let us introduce the number of visits Vi to
i, which may be written in terms of indicator functions as

    Vi = Σn≥0 1{Xn=i}

and note that

    Ei(Vi) = Ei( Σn≥0 1{Xn=i} ) = Σn≥0 Ei(1{Xn=i}) = Σn≥0 Pi(Xn = i) = Σn≥0 pii(n).

Also, we can compute the distribution of Vi under Pi in terms of the return
probability

    fi = Pi(Ti < ∞).

Lemma 1.5.2. For r = 0, 1, 2, . . . , we have Pi(Vi > r) = fir.

Proof. Observe that if X0 = i then {Vi > r} = {Ti(r) < ∞}. When r = 0
the result is true. Suppose inductively that it is true for r, then

    Pi(Vi > r + 1) = Pi(Ti(r+1) < ∞)
                   = Pi(Ti(r) < ∞ and Si(r+1) < ∞)
                   = Pi(Si(r+1) < ∞ | Ti(r) < ∞) Pi(Ti(r) < ∞)
                   = fi · fir = fir+1

by Lemma 1.5.1, so by induction the result is true for all r.


Recall that one can compute the expectation of a non-negative integer-valued
random variable as follows:

    Σr≥0 P(V > r) = Σr≥0 Σv≥r+1 P(V = v)
                  = Σv≥1 Σ0≤r≤v−1 P(V = v) = Σv≥1 v P(V = v) = E(V ).

The next theorem is the means by which we establish recurrence or


transience for a given state. Note that it provides two criteria for this, one
in terms of the return probability, the other in terms of the n-step transition
probabilities. Both are useful.
Theorem 1.5.3. The following dichotomy holds:
(i) if Pi(Ti < ∞) = 1, then i is recurrent and Σn≥0 pii(n) = ∞;
(ii) if Pi(Ti < ∞) < 1, then i is transient and Σn≥0 pii(n) < ∞.
In particular, every state is either transient or recurrent.

Proof. If Pi(Ti < ∞) = 1, then, by Lemma 1.5.2,

    Pi(Vi = ∞) = limr→∞ Pi(Vi > r) = 1

so i is recurrent and

    Σn≥0 pii(n) = Ei(Vi) = ∞.

On the other hand, if fi = Pi(Ti < ∞) < 1, then by Lemma 1.5.2

    Σn≥0 pii(n) = Ei(Vi) = Σr≥0 Pi(Vi > r) = Σr≥0 fir = 1/(1 − fi) < ∞

so Pi(Vi = ∞) = 0 and i is transient.


From this theorem we can go on to solve completely the problem of
recurrence or transience for Markov chains with finite state-space. Some
cases of infinite state-space are dealt with in the following chapter. First
we show that recurrence and transience are class properties.
Theorem 1.5.4. Let C be a communicating class. Then either all states
in C are transient or all are recurrent.
Proof. Take any pair of states i, j ∈ C and suppose that i is transient.
There exist n, m ≥ 0 with pij(n) > 0 and pji(m) > 0, and, for all r ≥ 0

    pii(n+r+m) ≥ pij(n) pjj(r) pji(m)


so

    Σr≥0 pjj(r) ≤ ( 1/(pij(n) pji(m)) ) Σr≥0 pii(n+r+m) < ∞

by Theorem 1.5.3. Hence j is also transient by Theorem 1.5.3.

In the light of this theorem it is natural to speak of a recurrent or transient


class.

Theorem 1.5.5. Every recurrent class is closed.

Proof. Let C be a class which is not closed. Then there exist i ∈ C, j ∉ C
and m ≥ 1 with

    Pi(Xm = j) > 0.

Since we have

Pi ({Xm = j} ∩ {Xn = i for infinitely many n}) = 0

this implies that

Pi (Xn = i for infinitely many n) < 1

so i is not recurrent, and so neither is C.

Theorem 1.5.6. Every finite closed class is recurrent.

Proof. Suppose C is closed and finite and that (Xn )n≥0 starts in C. Then
for some i ∈ C we have

0 < P(Xn = i for infinitely many n)


= P(Xn = i for some n)Pi (Xn = i for infinitely many n)

by the strong Markov property. This shows that i is not transient, so C is


recurrent by Theorems 1.5.3 and 1.5.4.

It is easy to spot closed classes, so the transience or recurrence of finite


classes is easy to determine. For example, the only recurrent class in Ex-
ample 1.2.2 is {5, 6}, the others being transient. On the other hand, infinite
closed classes may be transient: see Examples 1.3.3 and 1.6.3.
We shall need the following result in Section 1.8. Remember that irre-
ducibility means that the chain can get from any state to any other, with
positive probability.


Theorem 1.5.7. Suppose P is irreducible and recurrent. Then for all
j ∈ I we have P(Tj < ∞) = 1.

Proof. By the Markov property we have

    P(Tj < ∞) = Σi∈I P(X0 = i)Pi(Tj < ∞)

so it suffices to show Pi(Tj < ∞) = 1 for all i ∈ I. Choose m with pji(m) > 0.
By Theorem 1.5.3, we have

    1 = Pj(Xn = j for infinitely many n)
      = Pj(Xn = j for some n ≥ m + 1)
      = Σk∈I Pj(Xn = j for some n ≥ m + 1 | Xm = k)Pj(Xm = k)
      = Σk∈I Pk(Tj < ∞)pjk(m)

where the final equality uses the Markov property. But Σk∈I pjk(m) = 1 so
we must have Pi(Tj < ∞) = 1.

Exercises
1.5.1 In Exercise 1.2.1, which states are recurrent and which are transient?

1.5.2 Show that, for the Markov chain (Xn )n≥0 in Exercise 1.3.4 we have

P(Xn → ∞ as n → ∞) = 1 .

Suppose, instead, the transition probabilities satisfy

    pi,i+1 = ((i + 1)/i)α pi,i−1 .

For each α ∈ (0, ∞) find the value of P(Xn → ∞ as n → ∞).

1.5.3 (First passage decomposition). Denote by Tj the first passage
time to state j and set

    fij(n) = Pi(Tj = n).

Justify the identity

    pij(n) = Σ1≤k≤n fij(k) pjj(n−k)    for n ≥ 1


and deduce that

    Pij(s) = δij + Fij(s)Pjj(s)

where

    Pij(s) = Σn≥0 pij(n) sn ,        Fij(s) = Σn≥0 fij(n) sn .

Hence show that Pi(Ti < ∞) = 1 if and only if

    Σn≥0 pii(n) = ∞

without using Theorem 1.5.3.


1.5.4 A random sequence of non-negative integers (Fn )n≥0 is obtained by
setting F0 = 0 and F1 = 1 and, once F0 , . . . , Fn are known, taking Fn+1 to
be either the sum or the difference of Fn−1 and Fn , each with probability
1/2. Is (Fn )n≥0 a Markov chain?
By considering the Markov chain Xn = (Fn−1 , Fn ), find the probability
that (Fn )n≥0 reaches 3 before first returning to 0.
Draw enough of the flow diagram for (Xn)n≥0 to establish a general
pattern. Hence, using the strong Markov property, show that the hitting
probability for (1, 1), starting from (1, 2), is (3 − √5)/2.
Deduce that (Xn )n≥0 is transient. Show that, moreover, with probability
1, Fn → ∞ as n → ∞.

1.6 Recurrence and transience of random walks


In the last section we showed that recurrence was a class property, that all
recurrent classes were closed and that all finite closed classes were recurrent.
So the only chains for which the question of recurrence remains interesting
are irreducible with infinite state-space. Here we shall study some simple
and fundamental examples of this type, making use of the following criterion
for recurrence from Theorem 1.5.3: a state i is recurrent if and only if
∞ (n)
n=0 pii = ∞.

Example 1.6.1 (Simple random walk on Z)


The simple random walk on Z has diagram

[diagram: the states of Z in a line, with probability p of moving from i to i + 1 and q of moving from i to i − 1]


where 0 < p = 1 − q < 1. Suppose we start at 0. It is clear that we cannot
return to 0 after an odd number of steps, so p00(2n+1) = 0 for all n. Any
given sequence of steps of length 2n from 0 to 0 occurs with probability
pn qn, there being n steps up and n steps down, and the number of such
sequences is the number of ways of choosing the n steps up from 2n. Thus

    p00(2n) = (2n choose n) pn qn .

Stirling's formula provides a good approximation to n! for large n: it is
known that

    n! ∼ √(2πn) (n/e)n    as n → ∞

where an ∼ bn means an/bn → 1. For a proof see W. Feller, An Introduction
to Probability Theory and its Applications, Vol I (Wiley, New York, 3rd
edition, 1968). At the end of this chapter we reproduce the argument used
by Feller to show that

    n! ∼ A√n (n/e)n    as n → ∞

for some A ∈ [1, ∞). The additional work needed to show A = √(2π) is
omitted, as this fact is unnecessary to our applications.
For the n-step transition probabilities we obtain

    p00(2n) = ( (2n)!/(n!)² ) (pq)n ∼ (4pq)n / ( A√(n/2) )    as n → ∞.

In the symmetric case p = q = 1/2, so 4pq = 1; then for some N and all
n ≥ N we have

    p00(2n) ≥ 1/(2A√n)

so

    Σn≥N p00(2n) ≥ (1/2A) Σn≥N 1/√n = ∞

which shows that the random walk is recurrent. On the other hand, if p ≠ q
then 4pq = r < 1, so by a similar argument, for some N

    Σn≥N p00(n) ≤ (1/A) Σn≥N rn < ∞

showing that the random walk is transient.
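The dichotomy in this example can also be seen numerically. The sketch
below (not part of the original text) accumulates partial sums of
p00(2n) = (2n choose n)(pq)n using the ratio between successive terms; the
sums keep growing when p = 1/2 and settle down when p ≠ 1/2.

    # Partial sums of p00^(2n) for the simple random walk on Z.
    def partial_sum(p, terms):
        q, t, s = 1 - p, 1.0, 1.0                # t holds p00^(2n), starting at n = 0
        for n in range(terms - 1):
            t *= 4*p*q * (n + 0.5)/(n + 1)       # ratio p00^(2n+2) / p00^(2n)
            s += t
        return s

    for terms in (100, 10_000, 1_000_000):
        # left column keeps growing (recurrent), right column approaches 5 (transient)
        print(terms, partial_sum(0.5, terms), partial_sum(0.6, terms))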


Example 1.6.2 (Simple symmetric random walk on Z2 )


The simple symmetric random walk on Z2 has diagram

[diagram: a vertex of Z2, with probability 1/4 of jumping to each of its four nearest neighbours]

and transition probabilities

    pij = 1/4 if |i − j| = 1, and 0 otherwise.
Suppose we start at 0. Let us call the walk Xn and write Xn+ and Xn− for
the orthogonal projections of Xn on the diagonal lines y = ±x:

[diagram: the walk Xn together with its projections Xn+ and Xn− onto the lines y = x and y = −x]

Then Xn+ and Xn− are independent simple symmetric random walks on
2−1/2 Z and Xn = 0 if and only if Xn+ = 0 = Xn−. This makes it clear that
for Xn we have

    p00(2n) = ( (2n choose n) (1/2)2n )² ∼ 2/(A² n)    as n → ∞

by Stirling's formula. Then Σn≥1 p00(n) = ∞ by comparison with Σn≥1 1/n
and the walk is recurrent.

Example 1.6.3 (Simple symmetric random walk on Z3 )


The transition probabilities of the simple symmetric random walk on Z3
are given by

    pij = 1/6 if |i − j| = 1, and 0 otherwise.
Thus the chain jumps to each of its nearest neighbours with equal probability.
Suppose we start at 0. We can only return to 0 after an even number
2n of steps. Of these 2n steps there must be i up, i down, j north, j south,
k east and k west for some i, j, k ≥ 0, with i + j + k = n. By counting the
ways in which this can be done, we obtain

    p00(2n) = Σi,j,k≥0, i+j+k=n (2n)!/(i!j!k!)² (1/6)2n
            = (2n choose n)(1/2)2n Σi,j,k≥0, i+j+k=n ( (n choose i, j, k)(1/3)n )² .

Now

    Σi,j,k≥0, i+j+k=n (n choose i, j, k)(1/3)n = 1

the left-hand side being the total probability of all the ways of placing n
balls randomly into three boxes. For the case where n = 3m, we have

    (n choose i, j, k) = n!/(i!j!k!) ≤ (n choose m, m, m)

for all i, j, k, so

    p00(2n) ≤ (2n choose n)(1/2)2n (n choose m, m, m)(1/3)n ∼ (1/(2A³)) (6/n)3/2    as n → ∞

by Stirling's formula. Hence Σm≥0 p00(6m) < ∞ by comparison with
Σn≥1 n−3/2. But p00(6m) ≥ (1/6)² p00(6m−2) and p00(6m) ≥ (1/6)⁴ p00(6m−4) for
all m so we must have

    Σn≥0 p00(n) < ∞

and the walk is transient.


Exercises
1.6.1 The rooted binary tree is an infinite graph T with one distinguished
vertex R from which comes a single edge; at every other vertex there are
three edges and there are no closed loops. The random walk on T jumps
from a vertex along each available edge with equal probability. Show that
the random walk is transient.

1.6.2 Show that the simple symmetric random walk in Z4 is transient.

1.7 Invariant distributions

Many of the long-time properties of Markov chains are connected with the
notion of an invariant distribution or measure. Remember that a measure
λ is any row vector (λi : i ∈ I) with non-negative entries. We say λ is
invariant if
λP = λ.

The terms equilibrium and stationary are also used to mean the same. The
first result explains the term stationary.

Theorem 1.7.1. Let (Xn )n≥0 be Markov(λ, P ) and suppose that λ is in-
variant for P . Then (Xm+n )n≥0 is also Markov(λ, P ).

Proof. By Theorem 1.1.3, P(Xm = i) = (λP m )i = λi for all i and, clearly,


conditional on Xm+n = i, Xm+n+1 is independent of Xm , Xm+1 , . . . , Xm+n
and has distribution (pij : j ∈ I).

The next result explains the term equilibrium.

Theorem 1.7.2. Let I be finite. Suppose for some i ∈ I that

    pij(n) → πj    as n → ∞ for all j ∈ I.

Then π = (πj : j ∈ I) is an invariant distribution.

Proof. We have

    Σj∈I πj = Σj∈I limn→∞ pij(n) = limn→∞ Σj∈I pij(n) = 1

and

    πj = limn→∞ pij(n+1) = limn→∞ Σk∈I pik(n) pkj = Σk∈I ( limn→∞ pik(n) ) pkj = Σk∈I πk pkj


where we have used finiteness of I to justify interchange of summation and


limit operations. Hence π is an invariant distribution.

Notice that for any of the random walks discussed in Section 1.6 we have
pij(n) → 0 as n → ∞ for all i, j ∈ I. The limit is certainly invariant, but it
is not a distribution!
Theorem 1.7.2 is not a very useful result but it serves to indicate a rela-
tionship between invariant distributions and n-step transition probabilities.
In Theorem 1.8.3 we shall prove a sort of converse, which is much more
useful.

Example 1.7.3
Consider the two-state Markov chain with transition matrix

         ( 1 − α     α   )
P  =     (   β     1 − β ) .

Ignore the trivial cases α = β = 0 and α = β = 1. Then, by Example 1.1.4

            ( β/(α + β)   α/(α + β) )
P n  →      ( β/(α + β)   α/(α + β) )        as n → ∞,

so, by Theorem 1.7.2, the distribution (β/(α + β), α/(α + β)) must be
invariant. There are of course easier ways to discover this.

Example 1.7.4
Consider the Markov chain (Xn)n≥0 with diagram

[diagram: the three-state chain of Example 1.1.6]

To find an invariant distribution we write down the components of the
vector equation πP = π

    π1 = (1/2)π3
    π2 = π1 + (1/2)π2
    π3 = (1/2)π2 + (1/2)π3 .


In terms of the chain, the right-hand sides give the probabilities for X1,
when X0 has distribution π, and the equations require X1 also to have
distribution π. The equations are homogeneous so one of them is redundant,
and another equation is required to fix π uniquely. That equation is

    π1 + π2 + π3 = 1

and we find that π = (1/5, 2/5, 2/5).
According to Example 1.1.6

    p11(n) → 1/5    as n → ∞

so this confirms Theorem 1.7.2. Alternatively, knowing that p11(n) had the
form

    p11(n) = a + (1/2)n ( b cos(nπ/2) + c sin(nπ/2) )

we could have used Theorem 1.7.2 and knowledge of π1 to identify a = 1/5,
instead of working out p11(2) in Example 1.1.6.
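For a finite chain the invariant distribution can also be found numerically.
The following sketch (assuming numpy; not part of the original text) solves
πP = π together with the normalization π1 + π2 + π3 = 1 for the chain above:

    # Invariant distribution of the three-state chain.
    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])

    n = len(P)
    # stack pi (P - I) = 0 with the normalization sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    print(pi)                                  # [0.2, 0.4, 0.4]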

In the next two results we shall show that every irreducible and recurrent
stochastic matrix P has an essentially unique positive invariant measure.
The proofs rely heavily on the probabilistic interpretation so it is worth
noting at the outset that, for a finite state-space I, the existence of an
invariant row vector is a simple piece of linear algebra: the row sums of P
are all 1, so the column vector of ones is an eigenvector with eigenvalue 1,
so P must have a row eigenvector with eigenvalue 1.
For a fixed state k, consider for each i the expected time spent in i between
visits to k:

    γik = Ek ( Σ0≤n≤Tk−1 1{Xn=i} ).

Here the sum of indicator functions serves to count the number of times n
at which Xn = i before the first passage time Tk .

Theorem 1.7.5. Let P be irreducible and recurrent. Then


(i) γkk = 1;
(ii) γ k = (γik : i ∈ I) satisfies γ k P = γ k ;
(iii) 0 < γik < ∞ for all i ∈ I.

Proof. (i) This is obvious. (ii) For n = 1, 2, . . . the event {n ≤ Tk } depends


only on X0 , X1 , . . . , Xn−1 , so, by the Markov property at n − 1

Pk (Xn−1 = i, Xn = j and n ≤ Tk ) = Pk (Xn−1 = i and n ≤ Tk )pij .


Since P is recurrent, under Pk we have Tk < ∞ and X0 = XTk = k with
probability one. Therefore

    γjk = Ek Σ1≤n≤Tk 1{Xn=j} = Ek Σn≥1 1{Xn=j and n≤Tk}
        = Σn≥1 Pk(Xn = j and n ≤ Tk)
        = Σi∈I Σn≥1 Pk(Xn−1 = i, Xn = j and n ≤ Tk)
        = Σi∈I pij Σn≥1 Pk(Xn−1 = i and n ≤ Tk)
        = Σi∈I pij Ek Σm≥0 1{Xm=i and m≤Tk−1}
        = Σi∈I pij Ek Σ0≤m≤Tk−1 1{Xm=i} = Σi∈I γik pij .

(iii) Since P is irreducible, for each state i there exist n, m ≥ 0 with
pik(n), pki(m) > 0. Then γik ≥ γkk pki(m) > 0 and γik pik(n) ≤ γkk = 1 by (i) and
(ii).
Theorem 1.7.6. Let P be irreducible and let λ be an invariant measure
for P with λk = 1. Then λ ≥ γ k . If in addition P is recurrent, then λ = γ k .
Proof. For each j ∈ I we have

    λj = Σi1∈I λi1 pi1 j = Σi1≠k λi1 pi1 j + pkj
       = Σi1,i2≠k λi2 pi2 i1 pi1 j + ( pkj + Σi1≠k pki1 pi1 j )
       ...
       = Σi1,...,in≠k λin pin in−1 . . . pi1 j
         + ( pkj + Σi1≠k pki1 pi1 j + . . . + Σi1,...,in−1≠k pkin−1 . . . pi2 i1 pi1 j ).

So for j ≠ k we obtain

    λj ≥ Pk(X1 = j and Tk ≥ 1) + Pk(X2 = j and Tk ≥ 2)
         + . . . + Pk(Xn = j and Tk ≥ n)
       → γjk    as n → ∞.


So λ ≥ γk. If P is recurrent, then γk is invariant by Theorem 1.7.5, so
µ = λ − γk is also invariant and µ ≥ 0. Since P is irreducible, given i ∈ I,
we have pik(n) > 0 for some n, and 0 = µk = Σj∈I µj pjk(n) ≥ µi pik(n), so
µi = 0.

Recall that a state i is recurrent if

Pi (Xn = i for infinitely many n) = 1

and we showed in Theorem 1.5.3 that this is equivalent to

Pi (Ti < ∞) = 1.

If in addition the expected return time

mi = Ei (Ti )

is finite, then we say i is positive recurrent. A recurrent state which fails to


have this stronger property is called null recurrent.

Theorem 1.7.7. Let P be irreducible. Then the following are equivalent:


(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) P has an invariant distribution, π say.
Moreover, when (iii) holds we have mi = 1/πi for all i.

Proof. (i) ⇒ (ii) This is obvious.


(ii) ⇒ (iii) If i is positive recurrent, it is certainly recurrent, so P is recurrent. By Theorem 1.7.5, γ^i is then invariant. But

Σ_{j∈I} γj^i = mi < ∞,

since summing the time spent in each state before returning to i gives the return time Ti, so πj = γj^i / mi defines an invariant distribution.



(iii) ⇒ (i) Take any state k. Since P is irreducible and Σ_{i∈I} πi = 1 we have πk = Σ_{i∈I} πi pik^(n) > 0 for some n. Set λi = πi/πk. Then λ is an invariant measure with λk = 1. So, by Theorem 1.7.6, λ ≥ γ^k. Hence

mk = Σ_{i∈I} γi^k ≤ Σ_{i∈I} πi/πk = 1/πk < ∞        (1.7)

and k is positive recurrent.

To complete the proof we return to the argument for (iii) ⇒ (i) armed
with the knowledge that P is recurrent, so λ = γ k and the inequality (1.7)
is in fact an equality.
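The identity mi = 1/πi is easy to check numerically for a small chain: compute π from πP = π together with Σi πi = 1, and compute mi from the first-step equations for mean hitting times (equations of the kind given by Theorem 1.3.5). The matrix below is only an illustration.

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])            # an illustrative irreducible chain
k = P.shape[0]

# Invariant distribution: solve pi P = pi together with the normalisation sum(pi) = 1.
A = np.vstack([P.T - np.eye(k), np.ones(k)])
b = np.append(np.zeros(k), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

def mean_return_time(i):
    """m_i = 1 + sum_{j != i} p_ij k_j, where k_j = E_j(T_i) solves (I - Q) k = 1."""
    others = [j for j in range(k) if j != i]
    Q = P[np.ix_(others, others)]
    hit = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return 1.0 + P[i, others] @ hit

for i in range(k):
    print(i, mean_return_time(i), 1.0 / pi[i])     # the two columns agree
```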

Example 1.7.8 (Simple symmetric random walk on Z)


The simple symmetric random walk on Z is clearly irreducible and, by
Example 1.6.1, it is also recurrent. Consider the measure

πi = 1 for all i.

Then

πi = (1/2) πi−1 + (1/2) πi+1,

so π is invariant. Now Theorem 1.7.6 forces any invariant measure to be a scalar multiple of π. Since Σ_{i∈Z} πi = ∞, there can be no invariant distribution and the walk is therefore null recurrent, by Theorem 1.7.7.

Example 1.7.9
The existence of an invariant measure does not guarantee recurrence: con-
sider, for example, the simple symmetric random walk on Z3 , which is
transient by Example 1.6.3, but has invariant measure π given by πi = 1
for all i.

Example 1.7.10
Consider the asymmetric random walk on Z with transition probabilities
pi,i−1 = q < p = pi,i+1 . In components the invariant measure equation
πP = π reads
πi = πi−1 p + πi+1 q.

This is a recurrence relation for π with general solution

πi = A + B(p/q)^i.

So, in this case, there is a two-parameter family of invariant measures –


uniqueness up to scalar multiples does not hold.

Example 1.7.11
Consider a success-run chain on Z+ , whose transition probabilities are given
by
pi,i+1 = pi , pi0 = qi = 1 − pi .

Then the components of the invariant measure equation πP = π read

π0 = Σ_{i=0}^{∞} qi πi,
πi = pi−1 πi−1  for i ≥ 1.

Suppose we choose pi converging sufficiently rapidly to 1 so that

p = Π_{i=0}^{∞} pi > 0.

Then for any invariant measure π we have

π0 = Σ_{i=0}^{∞} (1 − pi) pi−1 · · · p0 π0 = (1 − p) π0.

This equation forces either π0 = 0 or π0 = ∞, so there is no non-zero


invariant measure.

Exercises

1.7.1 Find all invariant distributions of the transition matrix in Exercise


1.2.1.

1.7.2 Gas molecules move about randomly in a box which is divided into two
halves symmetrically by a partition. A hole is made in the partition. Sup-
pose there are N molecules in the box. Show that the number of molecules
on one side of the partition just after a molecule has passed through the hole
evolves as a Markov chain. What are the transition probabilities? What is
the invariant distribution of this chain?

1.7.3 A particle moves on the eight vertices of a cube in the following


way: at each step the particle is equally likely to move to each of the three
adjacent vertices, independently of its past motion. Let i be the initial
vertex occupied by the particle, o the vertex opposite i. Calculate each of
the following quantities:

(i) the expected number of steps until the particle returns to i;


(ii) the expected number of visits to o until the first return to i;
(iii) the expected number of steps until the first visit to o.

1.7.4 Let (Xn )n≥0 be a simple random walk on Z with pi,i−1 = q < p =
pi,i+1 . Find
γi^0 = E0 Σ_{n=0}^{T0 − 1} 1{Xn = i}

and verify that

γi^0 = inf_λ λi for all i,
where the infimum is taken over all invariant measures λ with λ0 = 1.


(Compare with Theorem 1.7.6 and Example 1.7.10.)

1.7.5 Let P be a stochastic matrix on a finite set I. Show that a distribution


π is invariant for P if and only if π(I −P +A) = a, where A = (aij : i, j ∈ I)
with aij = 1 for all i and j, and a = (ai : i ∈ I) with ai = 1 for all i. Deduce
that if P is irreducible then I −P +A is invertible. Note that this enables one
to compute the invariant distribution by any standard method of inverting
a matrix .
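For illustration, the identity in this exercise reduces the computation of π to a single matrix inversion: π = a(I − P + A)^{−1}. A minimal sketch in Python, applied to an arbitrary stochastic matrix (the matrix here is randomly generated and is not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)      # a random stochastic matrix (strictly positive, hence irreducible)
n = P.shape[0]

A = np.ones((n, n))                    # a_ij = 1 for all i, j
a = np.ones(n)                         # a_i = 1 for all i
pi = a @ np.linalg.inv(np.eye(n) - P + A)   # solve pi (I - P + A) = a
print(pi, pi.sum())                    # invariant distribution, summing to 1
print(np.max(np.abs(pi @ P - pi)))     # should be ~0: pi is invariant
```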

1.8 Convergence to equilibrium

We shall investigate the limiting behaviour of the n-step transition probabilities pij^(n) as n → ∞. As we saw in Theorem 1.7.2, if the state-space is finite and if for some i the limit exists for all j, then it must be an invariant distribution. But, as the following example shows, the limit does not always exist.

Example 1.8.1
Consider the two-state chain with transition matrix

P = ( 0  1 )
    ( 1  0 ).

Then P² = I, so P^{2n} = I and P^{2n+1} = P for all n. Thus pij^(n) fails to converge for all i, j.
Let us call a state i aperiodic if pii^(n) > 0 for all sufficiently large n. We leave it as an exercise to show that i is aperiodic if and only if the set {n ≥ 0 : pii^(n) > 0} has no common divisor other than 1. This is also a consequence of Theorem 1.8.4. The behaviour of the chain in Example 1.8.1 is connected with its periodicity.

Lemma 1.8.2. Suppose P is irreducible and has an aperiodic state i. Then, for all states j and k, pjk^(n) > 0 for all sufficiently large n. In particular, all states are aperiodic.

Proof. There exist r, s ≥ 0 with pji^(r), pik^(s) > 0. Then

pjk^(r+n+s) ≥ pji^(r) pii^(n) pik^(s) > 0

for all sufficiently large n.


Here is the main result of this section. The method of proof, by coupling
two Markov chains, is ingenious.
Theorem 1.8.3 (Convergence to equilibrium). Let P be irreducible
and aperiodic, and suppose that P has an invariant distribution π. Let λ
be any distribution. Suppose that (Xn )n≥0 is Markov(λ, P ). Then

P(Xn = j) → πj as n → ∞ for all j.

In particular,
pij^(n) → πj as n → ∞ for all i, j.

Proof. We use a coupling argument. Let (Yn )n≥0 be Markov(π, P ) and


independent of (Xn )n≥0 . Fix a reference state b and set

T = inf{n ≥ 1 : Xn = Yn = b}.

Step 1. We show P(T < ∞) = 1. The process Wn = (Xn, Yn) is a Markov chain on I × I with transition probabilities

p̃(i,k)(j,l) = pij pkl

and initial distribution

µ(i,k) = λi πk.

Since P is aperiodic, for all states i, j, k, l we have

p̃(i,k)(j,l)^(n) = pij^(n) pkl^(n) > 0

for all sufficiently large n; so P̃ is irreducible. Also, P̃ has an invariant distribution given by

π̃(i,k) = πi πk

so, by Theorem 1.7.7, P̃ is positive recurrent. But T is the first passage time of Wn to (b, b), so P(T < ∞) = 1, by Theorem 1.5.7.

Step 2. Set Zn = Xn if n < T, and Zn = Yn if n ≥ T.
The diagram below illustrates the idea. We show that (Zn )n≥0 is
Markov(λ, P ).
[Diagram: sample paths of (Xn), (Yn) and (Zn) against n; Zn follows Xn until the first time T at which both chains are at b, and follows Yn thereafter.]

The strong Markov property applies to (Wn)n≥0 at time T, so (X_{T+n}, Y_{T+n})n≥0 is Markov(δ(b,b), P̃) and independent of (X0, Y0), (X1, Y1), . . . , (XT, YT). By symmetry, we can replace the process (X_{T+n}, Y_{T+n})n≥0 by (Y_{T+n}, X_{T+n})n≥0, which is also Markov(δ(b,b), P̃) and remains independent of (X0, Y0), (X1, Y1), . . . , (XT, YT). Hence W′n = (Zn, Z′n) is Markov(µ, P̃), where

Z′n = Yn if n < T, and Z′n = Xn if n ≥ T.

In particular, (Zn)n≥0 is Markov(λ, P).

Step 3. We have

P(Zn = j) = P(Xn = j and n < T ) + P(Yn = j and n ≥ T )

so

|P(Xn = j) − πj | = |P(Zn = j) − P(Yn = j)|


= |P(Xn = j and n < T ) − P(Yn = j and n < T )|
≤ P(n < T )

and P(n < T ) → 0 as n → ∞.
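The coupling in this proof is easy to simulate. The sketch below is purely illustrative (the chain, initial state and reference state b are arbitrary choices): it runs an independent copy (Yn) started from π alongside (Xn) started from a point mass, records the coupling time T at b, and checks that P(n < T) becomes small, this being the quantity that bounds |P(Xn = j) − πj| in Step 3.

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative irreducible, aperiodic chain; pi computed from pi (I - P + A) = a.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
n = P.shape[0]
pi = np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))

def coupling_time(b=0, max_steps=10_000):
    """First n >= 1 at which Xn = Yn = b, for independent copies X (from state 0) and Y (from pi)."""
    x = 0
    y = rng.choice(n, p=pi)
    for t in range(1, max_steps + 1):
        x = rng.choice(n, p=P[x])
        y = rng.choice(n, p=P[y])
        if x == b and y == b:
            return t
    return max_steps

T = np.array([coupling_time() for _ in range(5000)])
for m in [1, 2, 5, 10, 20]:
    print(m, (T > m).mean())           # estimates of P(m < T), which tend to 0
```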

To understand this proof one should see what goes wrong when P is
not aperiodic. Consider the two-state chain of Example 1.8.1 which has
(1/2, 1/2) as its unique invariant distribution. We start (Xn )n≥0 from 0
and (Yn )n≥0 with equal probability from 0 or 1. However, if Y0 = 1, then,
because of periodicity, (Xn )n≥0 and (Yn )n≥0 will never meet, and the proof
fails. We move on now to the cases that were excluded in the last theorem,
where (Xn )n≥0 is periodic or transient or null recurrent. The remainder of
this section might be omitted on a first reading.

Theorem 1.8.4. Let P be irreducible. There is an integer d ≥ 1 and a


partition
I = C0 ∪ C1 ∪ . . . ∪ Cd−1
such that (setting Cnd+r = Cr )
(i) pij^(n) > 0 only if i ∈ Cr and j ∈ C_{r+n} for some r;
(ii) pij^(nd) > 0 for all sufficiently large n, for all i, j ∈ Cr, for all r.
Proof. Fix a state k and consider S = {n ≥ 0 : pkk^(n) > 0}. Choose n1, n2 ∈ S with n1 < n2 and such that d := n2 − n1 is as small as possible. (Here and throughout we use the symbol := to mean 'defined to equal'.) Define for r = 0, . . . , d − 1

Cr = {i ∈ I : pki^(nd+r) > 0 for some n ≥ 0}.

Then C0 ∪ . . . ∪ Cd−1 = I, by irreducibility. Moreover, if pki^(nd+r) > 0 and pki^(n′d+s) > 0 for some r, s ∈ {0, 1, . . . , d − 1}, then, choosing m ≥ 0 so that pik^(m) > 0, we have pkk^(nd+r+m) > 0 and pkk^(n′d+s+m) > 0, so r = s by minimality of d. Hence we have a partition.
To prove (i) suppose pij^(n) > 0 and i ∈ Cr. Choose m so that pki^(md+r) > 0; then pkj^(md+r+n) > 0, so j ∈ C_{r+n} as required. By taking i = j = k we now see that d must divide every element of S, in particular n1.
Now for nd ≥ n1², we can write nd = qn1 + r for integers q ≥ n1 and 0 ≤ r ≤ n1 − 1. Since d divides n1 we then have r = md for some integer m and then nd = (q − m)n1 + mn2. Hence

pkk^(nd) ≥ (pkk^(n1))^{q−m} (pkk^(n2))^m > 0

and hence nd ∈ S. To prove (ii), for i, j ∈ Cr choose m1 and m2 so that pik^(m1) > 0 and pkj^(m2) > 0; then

pij^(m1+nd+m2) ≥ pik^(m1) pkk^(nd) pkj^(m2) > 0

whenever nd ≥ n1². Since m1 + m2 is then necessarily a multiple of d, we are done.

We call d the period of P. The theorem just proved shows in particular, for all i ∈ I, that d is the greatest common divisor of the set {n ≥ 0 : pii^(n) > 0}. This is sometimes useful in identifying d.
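In examples the period can also be found mechanically: determine which n-step returns are possible and take the gcd. The sketch below does this up to a finite horizon (the horizon is a heuristic cut-off, not part of the theory), using boolean reachability rather than numerical powers:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, horizon=200):
    """gcd of {1 <= n <= horizon : p_ii^(n) > 0}."""
    step = (P > 0).astype(int)
    reach = np.eye(P.shape[0], dtype=int)
    times = []
    for n in range(1, horizon + 1):
        reach = (reach @ step > 0).astype(int)   # which n-step transitions are possible
        if reach[i, i]:
            times.append(n)
    return reduce(gcd, times) if times else 0

# The two-state chain of Example 1.8.1 has period 2; adding a holding
# probability destroys the periodicity.
P_flip = np.array([[0.0, 1.0], [1.0, 0.0]])
P_lazy = np.array([[0.1, 0.9], [0.9, 0.1]])
print(period(P_flip, 0), period(P_lazy, 0))      # expect 2 and 1
```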
Finally, here is a complete description of limiting behaviour for irre-
ducible chains. This generalizes Theorem 1.8.3 in two respects since we
require neither aperiodicity nor the existence of an invariant distribution.
The argument we use for the null recurrent case was discovered recently by
B. Fristedt and L. Gray.

Theorem 1.8.5. Let P be irreducible of period d and let C0 , C1 , . . . , Cd−1


be the partition obtained in Theorem 1.8.4. Let λ be a distribution with Σ_{i∈C0} λi = 1. Suppose that (Xn)n≥0 is Markov(λ, P). Then for r = 0, 1, . . . , d − 1 and j ∈ Cr we have

P(X_{nd+r} = j) → d/mj as n → ∞

where mj is the expected return time to j. In particular, for i ∈ C0 and j ∈ Cr we have

pij^(nd+r) → d/mj as n → ∞.

Proof

Step 1. We reduce to the aperiodic case. Set ν = λP^r; then by Theorem 1.8.4 we have

Σ_{i∈Cr} νi = 1.

Set Yn = X_{nd+r}; then (Yn)n≥0 is Markov(ν, P^d) and, by Theorem 1.8.4, P^d is irreducible and aperiodic on Cr. For j ∈ Cr the expected return time of (Yn)n≥0 to j is mj/d. So if the theorem holds in the aperiodic case, then

P(Xnd+r = j) = P(Yn = j) → d/mj as n → ∞

so the theorem holds in general.

Step 2. Assume that P is aperiodic. If P is positive recurrent then 1/mj =


πj , where π is the unique invariant distribution, so the result follows from
Theorem 1.8.3. Otherwise mj = ∞ and we have to show that

P(Xn = j) → 0 as n → ∞.

If P is transient this is easy and we are left with the null recurrent
case.

Step 3. Assume that P is aperiodic and null recurrent. Then

Σ_{k=0}^{∞} Pj(Tj > k) = Ej(Tj) = ∞.

Given ε > 0 choose K so that

Σ_{k=0}^{K−1} Pj(Tj > k) ≥ 2/ε.

Then, for n ≥ K − 1,

1 ≥ Σ_{k=n−K+1}^{n} P(Xk = j and Xm ≠ j for m = k + 1, . . . , n)
  = Σ_{k=n−K+1}^{n} P(Xk = j) Pj(Tj > n − k)
  = Σ_{k=0}^{K−1} P(X_{n−k} = j) Pj(Tj > k)

so we must have P(X_{n−k} = j) ≤ ε/2 for some k ∈ {0, 1, . . . , K − 1}.


Return now to the coupling argument used in Theorem 1.8.3, only now let
(Yn )n≥0 be Markov(µ, P ), where µ is to be chosen later. Set Wn = (Xn , Yn ).
As before, aperiodicity of (Xn )n≥0 ensures irreducibility of (Wn )n≥0 . If
(Wn )n≥0 is transient then, on taking µ = λ, we obtain
 
P(Xn = j)² = P(Wn = (j, j)) → 0

as required. Assume then that (Wn )n≥0 is recurrent. Then, in the notation
of Theorem 1.8.3, we have P(T < ∞) = 1 and the coupling argument shows
that
|P(Xn = j) − P(Yn = j)| → 0 as n → ∞.

We exploit this convergence by taking µ = λP^k for k = 1, . . . , K − 1, so that P(Yn = j) = P(X_{n+k} = j). We can find N such that for n ≥ N and k = 1, . . . , K − 1,

|P(Xn = j) − P(X_{n+k} = j)| ≤ ε/2.

But for any n we can find k ∈ {0, 1, . . . , K − 1} such that P(Xn+k = j) ≤


ε/2. Hence, for n ≥ N
P(Xn = j) ≤ ε.
Since ε > 0 was arbitrary, this shows that P(Xn = j) → 0 as n → ∞, as
required.
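A numerical illustration of the theorem (not from the text): for simple random walk on a cycle of four states, d = 2 and the invariant distribution is uniform, so mj = 4 and along the even-time subsequence p_{00}^{(2n)} settles at d/mj = 1/2, while the odd-time probabilities are 0.

```python
import numpy as np

# Simple random walk on a 4-cycle: period d = 2, pi = (1/4, 1/4, 1/4, 1/4), m_j = 4.
P = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])

for n in [1, 2, 3, 4, 50, 51]:
    print(n, np.linalg.matrix_power(P, n)[0, 0])   # 1/2 for even n, 0 for odd n
```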

Exercises
1.8.1 Prove the claims (e), (f) and (g) made in example (v) of the Intro-
duction.

1.8.2 Find the invariant distributions of the transition matrices in Exercise


1.1.7, parts (a), (b) and (c), and compare them with your answers there.

1.8.3 A fair die is thrown repeatedly. Let Xn denote the sum of the first n
throws. Find
lim_{n→∞} P(Xn is a multiple of 13)

quoting carefully any general theorems that you use.

1.8.4 Each morning a student takes one of the three books he owns from
his shelf. The probability that he chooses book i is αi , where 0 < αi < 1 for
i = 1, 2, 3, and choices on successive days are independent. In the evening
he replaces the book at the left-hand end of the shelf. If pn denotes the
probability that on day n the student finds the books in the order 1,2,3,
from left to right, show that, irrespective of the initial arrangement of the
books, pn converges as n → ∞, and determine the limit.

1.8.5 (Renewal theorem). Let Y1 , Y2 , . . . be independent, identically


distributed random variables with values in {1, 2, . . . }. Suppose that the
set of integers
{n : P(Y1 = n) > 0}
has greatest common divisor 1. Set µ = E(Y1 ). Show that the following
process is a Markov chain:

Xn = inf{m ≥ n : m = Y1 + . . . + Yk for some k ≥ 0} − n.

Determine
lim_{n→∞} P(Xn = 0)

and hence show that as n → ∞

P(n = Y1 + . . . + Yk for some k ≥ 0) → 1/µ.

(Think of Y1 , Y2 , . . . as light-bulb lifetimes. A bulb is replaced when it fails.


Thus the limiting probability that a bulb is replaced at time n is 1/µ. Al-
though this appears to be a very special case of convergence to equilibrium,
one can actually recover the full result by applying the renewal theorem to
the excursion lengths Si^(1), Si^(2), . . . from state i.)

1.9 Time reversal

For Markov chains, the past and future are independent given the present.
This property is symmetrical in time and suggests looking at Markov chains
with time running backwards. On the other hand, convergence to equilib-
rium shows behaviour which is asymmetrical in time: a highly organised
state such as a point mass decays to a disorganised one, the invariant dis-
tribution. This is an example of entropy increasing. It suggests that if
we want complete time-symmetry we must begin in equilibrium. The next
result shows that a Markov chain in equilibrium, run backwards, is again a
Markov chain. The transition matrix may however be different.

Theorem 1.9.1. Let P be irreducible and have an invariant distribution


π. Suppose that (Xn)0≤n≤N is Markov(π, P) and set Yn = X_{N−n}. Then (Yn)0≤n≤N is Markov(π, P̂), where P̂ = (p̂ij) is given by

πj p̂ji = πi pij for all i, j

and P̂ is also irreducible with invariant distribution π.

Proof. First we check that P̂ is a stochastic matrix:

Σ_{i∈I} p̂ji = (1/πj) Σ_{i∈I} πi pij = 1

since π is invariant for P. Next we check that π is invariant for P̂:

Σ_{j∈I} πj p̂ji = Σ_{j∈I} πi pij = πi

since P is a stochastic matrix.


We have

P(Y0 = i0, Y1 = i1, . . . , YN = iN) = P(X0 = iN, X1 = iN−1, . . . , XN = i0)
    = π_{iN} p_{iN iN−1} · · · p_{i1 i0} = π_{i0} p̂_{i0 i1} · · · p̂_{iN−1 iN}

so, by Theorem 1.1.1, (Yn)0≤n≤N is Markov(π, P̂). Finally, since P is irreducible, for each pair of states i, j there is a chain of states i1 = i, i2, . . . , in−1, in = j with p_{i1 i2} · · · p_{in−1 in} > 0. Then

p̂_{in in−1} · · · p̂_{i2 i1} = π_{i1} p_{i1 i2} · · · p_{in−1 in} / π_{in} > 0

so P̂ is also irreducible.
The chain (Yn )0≤n≤N is called the time-reversal of (Xn )0≤n≤N .
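Once π is known, P̂ is a one-line computation from p̂ij = πj pji/πi. A small sketch, using for concreteness the three-state matrix that appears in Example 1.9.4 below:

```python
import numpy as np

P = np.array([[0.0, 2/3, 1/3],
              [1/3, 0.0, 2/3],
              [2/3, 1/3, 0.0]])
n = P.shape[0]
pi = np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))   # invariant distribution

P_hat = pi[None, :] * P.T / pi[:, None]    # entry (i, j) is pi_j p_ji / pi_i
print(P_hat)                               # here pi is uniform, so P_hat is just the transpose of P
print(P_hat.sum(axis=1))                   # rows sum to 1
print(np.max(np.abs(pi @ P_hat - pi)))     # ~0: pi is invariant for P_hat
```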
A stochastic matrix P and a measure λ are said to be in detailed balance
if
λi pij = λj pji for all i, j.
Though obvious, the following result is worth remembering because, when
a solution λ to the detailed balance equations exists, it is often easier to
find by the detailed balance equations than by the equation λ = λP .
Lemma 1.9.2. If P and λ are in detailed balance, then λ is invariant for
P.
 
Proof. We have (λP)i = Σ_{j∈I} λj pji = Σ_{j∈I} λi pij = λi.
Let (Xn )n≥0 be Markov(λ, P ), with P irreducible. We say that (Xn )n≥0
is reversible if, for all N ≥ 1, (XN −n )0≤n≤N is also Markov(λ, P ).
Theorem 1.9.3. Let P be an irreducible stochastic matrix and let λ be
a distribution. Suppose that (Xn )n≥0 is Markov(λ, P ). Then the following
are equivalent:
(a) (Xn )n≥0 is reversible;
(b) P and λ are in detailed balance.
Proof. Both (a) and (b) imply that λ is invariant for P. Then both (a) and (b) are equivalent to the statement that P̂ = P in Theorem 1.9.1.
We begin a collection of examples with a chain which is not reversible.
Example 1.9.4
Consider the Markov chain with diagram:
[Diagram: states 1, 2 and 3 arranged in a cycle, with probability 2/3 of moving one step in one direction around the cycle and probability 1/3 of moving one step in the other.]

The transition matrix is

P = (  0    2/3  1/3 )
    ( 1/3    0   2/3 )
    ( 2/3   1/3   0  )

and π = (1/3, 1/3, 1/3) is invariant. Hence P̂ = P^T, the transpose of P. But P is not symmetric, so P̂ ≠ P and this chain is not reversible. A patient observer would see the chain move clockwise in the long run: under time-reversal the clock would run backwards!
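The conclusion can be checked mechanically: compute π and test whether πi pij = πj pji for all i, j. The sketch below does this for the matrix above and, for contrast, for a birth-death chain of the kind in Example 1.9.5 below (the values of p and M, and the holding behaviour at the end-points, are illustrative assumptions):

```python
import numpy as np

def invariant(P):
    n = P.shape[0]
    return np.ones(n) @ np.linalg.inv(np.eye(n) - P + np.ones((n, n)))

def reversible(P):
    pi = invariant(P)
    D = pi[:, None] * P                    # D_ij = pi_i p_ij
    return np.allclose(D, D.T)             # detailed balance means D is symmetric

P_cycle = np.array([[0.0, 2/3, 1/3],
                    [1/3, 0.0, 2/3],
                    [2/3, 1/3, 0.0]])

p, q, M = 0.7, 0.3, 5                      # illustrative parameters
P_bd = np.zeros((M + 1, M + 1))
for i in range(M):
    P_bd[i, i + 1] = p
    P_bd[i + 1, i] = q
P_bd[0, 0] = q                             # assumed holding at the end-points
P_bd[M, M] = p

print(reversible(P_cycle), reversible(P_bd))   # False, True
```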

Example 1.9.5

Consider the Markov chain with diagram:

[Diagram: states 0, 1, . . . , M in a line; each step moves one place to the right with probability p or one place to the left with probability q, with the obvious modification at the end-points.]

where 0 < p = 1 − q < 1. The non-zero detailed balance equations read

λi pi,i+1 = λi+1 pi+1,i for i = 0, 1, . . . , M − 1.

So a solution is given by

 
λ = ( (p/q)^i : i = 0, 1, . . . , M )

and this may be normalised to give a distribution in detailed balance with


P . Hence this chain is reversible.
If p were much larger than q, one might argue that the chain would tend
to move to the right and its time-reversal to the left. However, this ignores
the fact that we reverse the chain in equilibrium, which in this case would
be heavily concentrated near M . An observer would see the chain spending
most of its time near M and making occasional brief forays to the left,
which behaviour is symmetrical in time.

Example 1.9.6 (Random walk on a graph)

A graph G is a countable collection of states, usually called vertices, some


of which are joined by edges, for example:

[Diagram: a graph with vertices 1, 2, 3, 4 joined by edges.]
Thus a graph is a partially drawn Markov chain diagram. There is a natural


way to complete the diagram which gives rise to the random walk on G.
The valency vi of vertex i is the number of edges at i. We have to assume
that every vertex has finite valency. The random walk on G picks edges
with equal probability:

[Diagram: the same graph, with each edge leaving a vertex i assigned probability 1/vi.]
Thus the transition probabilities are given by

pij = 1/vi if (i, j) is an edge, and pij = 0 otherwise.

We assume G is connected, so that P is irreducible. It is easy to see that P is in detailed balance with v = (vi : i ∈ G). So, if the total valency σ = Σ_{i∈G} vi is finite, then π = v/σ is invariant and P is reversible.
Example 1.9.7 (Random chessboard knight)
A random knight makes each permissible move with equal probability. If it
starts in a corner, how long on average will it take to return?
This is an example of a random walk on a graph: the vertices are the
squares of the chessboard and the edges are the moves that the knight can
take:

The diagram shows a part of the graph. We know by Theorem 1.7.7 and
the preceding example that

Ec(Tc) = 1/πc = Σ_i (vi/vc),

so all we have to do is identify valencies. The four corner squares have


valency 2, and the eight squares adjacent to the corners have valency 3.
There are 20 squares of valency 4, 16 of valency 6, and the 16 central
squares have valency 8. Hence

Ec(Tc) = (8 + 24 + 80 + 96 + 128)/2 = 168.

Alternatively, if you enjoy solving sets of 64 simultaneous linear equations,


you might try finding π from πP = π, or calculating Ec (Tc ) using Theorem
1.3.5!
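The valency count is easy to automate. The short sketch below enumerates the knight's legal moves on the 8 × 8 board, sums the valencies, and recovers Ec(Tc) = 168 for a corner square:

```python
# Expected return time to a corner for the random knight: E_c(T_c) = (sum_i v_i) / v_c.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def valency(r, c):
    return sum(0 <= r + dr < 8 and 0 <= c + dc < 8 for dr, dc in moves)

total = sum(valency(r, c) for r in range(8) for c in range(8))
print(total, total / valency(0, 0))        # 336 and 168.0
```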

Exercises
1.9.1 In each of the following cases determine whether the stochastic matrix P, which you may assume is irreducible, is reversible:

(a) P = ( 1−p   p  )
        (  q   1−q );

(b) P = (  0    p   1−p )
        ( 1−p   0    p  )
        (  p   1−p   0  );

(c) I = {0, 1, . . . , N } and pij = 0 if |j − i| ≥ 2 ;

(d) I = {0, 1, 2, . . . } and p01 = 1, pi,i+1 = p, pi,i−1 = 1 − p for i ≥ 1;


(e) pij = pji for all i, j ∈ S.

1.9.2 Two particles X and Y perform independent random walks on the


graph shown in the diagram. So, for example, a particle at A jumps to B,
C or D with equal probability 1/3.

[Diagram: the graph for this exercise, whose vertices include A, B, C, D and E.]

Find the probability that X and Y ever meet at a vertex in the following
cases:
(a) X starts at A and Y starts at B;
(b) X starts at A and Y starts at E.
For I = B, D let MI denote the expected time, when both X and Y start
at I, until they are once again both at I. Show that 9MD = 16MB .

1.10 Ergodic theorem

Ergodic theorems concern the limiting behaviour of averages over time.


We shall prove a theorem which identifies for Markov chains the long-run
proportion of time spent in each state. An essential tool is the following
ergodic theorem for independent random variables which is a version of the
strong law of large numbers.

Theorem 1.10.1 (Strong law of large numbers). Let Y1 , Y2 , . . . be


a sequence of independent, identically distributed, non-negative random

variables with E(Y1) = µ. Then

P( (Y1 + . . . + Yn)/n → µ as n → ∞ ) = 1.

Proof. A proof for the case µ < ∞ may be found, for example, in Probability
with Martingales by David Williams (Cambridge University Press, 1991).
The case where µ = ∞ is a simple deduction. Fix N < ∞ and set Yn^(N) = Yn ∧ N. Then

(Y1 + . . . + Yn)/n ≥ (Y1^(N) + . . . + Yn^(N))/n → E(Y1 ∧ N) as n → ∞

with probability one. As N ↑ ∞ we have E(Y1 ∧ N) ↑ µ by monotone convergence (see Section 6.4). So we must have, with probability 1,

(Y1 + . . . + Yn)/n → ∞ as n → ∞.

We denote by Vi (n) the number of visits to i before n:

Vi(n) = Σ_{k=0}^{n−1} 1{Xk = i}.

Then Vi (n)/n is the proportion of time before n spent in state i. The


following result gives the long-run proportion of time spent by a Markov
chain in each state.

Theorem 1.10.2 (Ergodic theorem). Let P be irreducible and let λ


be any distribution. If (Xn)n≥0 is Markov(λ, P) then

P( Vi(n)/n → 1/mi as n → ∞ ) = 1

where mi = Ei(Ti) is the expected return time to state i. Moreover, in the positive recurrent case, for any bounded function f : I → R we have

P( (1/n) Σ_{k=0}^{n−1} f(Xk) → f̄ as n → ∞ ) = 1

where

f̄ = Σ_{i∈I} πi fi

and where (πi : i ∈ I) is the unique invariant distribution.

Proof. If P is transient, then, with probability 1, the total number Vi of


visits to i is finite, so

Vi(n)/n ≤ Vi/n → 0 = 1/mi.

Suppose then that P is recurrent and fix a state i. For T = Ti we have


P(T < ∞) = 1 by Theorem 1.5.7 and (XT +n )n≥0 is Markov(δi , P ) and
independent of X0 , X1 , . . . , XT by the strong Markov property. The long-
run proportion of time spent in i is the same for (XT +n )n≥0 and (Xn )n≥0 ,
so it suffices to consider the case λ = δi .
Write Si^(r) for the length of the rth excursion to i, as in Section 1.5. By Lemma 1.5.1, the non-negative random variables Si^(1), Si^(2), . . . are independent and identically distributed with Ei(Si^(r)) = mi. Now

Si^(1) + . . . + Si^(Vi(n)−1) ≤ n − 1,

the left-hand side being the time of the last visit to i before n. Also

Si^(1) + . . . + Si^(Vi(n)) ≥ n,

the left-hand side being the time of the first visit to i after n − 1. Hence

(Si^(1) + . . . + Si^(Vi(n)−1))/Vi(n) ≤ n/Vi(n) ≤ (Si^(1) + . . . + Si^(Vi(n)))/Vi(n).        (1.8)

By the strong law of large numbers

P( (Si^(1) + . . . + Si^(n))/n → mi as n → ∞ ) = 1

and, since P is recurrent,

P(Vi(n) → ∞ as n → ∞) = 1.

So, letting n → ∞ in (1.8), we get

P( n/Vi(n) → mi as n → ∞ ) = 1,

which implies

P( Vi(n)/n → 1/mi as n → ∞ ) = 1.

Assume now that (Xn)n≥0 has an invariant distribution (πi : i ∈ I). Let f : I → R be a bounded function and assume without loss of generality that |f| ≤ 1. For any J ⊆ I we have

| (1/n) Σ_{k=0}^{n−1} f(Xk) − f̄ | = | Σ_{i∈I} (Vi(n)/n − πi) fi |
    ≤ Σ_{i∈J} |Vi(n)/n − πi| + Σ_{i∉J} |Vi(n)/n − πi|
    ≤ Σ_{i∈J} |Vi(n)/n − πi| + Σ_{i∉J} (Vi(n)/n + πi)
    ≤ 2 Σ_{i∈J} |Vi(n)/n − πi| + 2 Σ_{i∉J} πi.

We proved above that

P( Vi(n)/n → πi as n → ∞ for all i ) = 1.

Given ε > 0, choose J finite so that

Σ_{i∉J} πi < ε/4

and then N = N(ω) so that, for n ≥ N(ω),

Σ_{i∈J} |Vi(n)/n − πi| < ε/4.

Then, for n ≥ N(ω), we have

| (1/n) Σ_{k=0}^{n−1} f(Xk) − f̄ | < ε,
which establishes the desired convergence.
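The theorem is easy to observe in a simulation: run one long path and compare the occupation frequencies Vi(n)/n with the invariant distribution. A sketch with an arbitrary illustrative chain:

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])            # an illustrative chain
k = P.shape[0]
pi = np.ones(k) @ np.linalg.inv(np.eye(k) - P + np.ones((k, k)))

n_steps = 100_000
visits = np.zeros(k)
x = 0
for _ in range(n_steps):
    visits[x] += 1
    x = rng.choice(k, p=P[x])

print(visits / n_steps)                    # V_i(n)/n, the observed proportions of time
print(pi)                                  # should be close to 1/m_i = pi_i
```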

We consider now the statistical problem of estimating an unknown transition matrix P on the basis of observations of the corresponding Markov chain. Consider, to begin, the case where we have N + 1 observations (Xn)0≤n≤N. The log-likelihood function is given by

l(P) = log(λ_{X0} p_{X0 X1} · · · p_{X_{N−1} X_N}) = Σ_{i,j∈I} Nij log pij

up to a constant independent of P, where Nij is the number of transitions from i to j. A standard statistical procedure is to find the maximum likelihood estimate P̂, which is the choice of P maximizing l(P). Since P must satisfy the linear constraint Σ_j pij = 1 for each i, we first try to maximize

l(P) + Σ_{i,j∈I} µi pij

and then choose (µi : i ∈ I) to fit the constraints. This is the method of Lagrange multipliers. Thus we find

p̂ij = Σ_{n=0}^{N−1} 1{Xn = i, Xn+1 = j} / Σ_{n=0}^{N−1} 1{Xn = i},

which is the proportion of jumps from i which go to j.
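In code the maximum likelihood estimate is simply a table of transition counts normalised row by row. The sketch below simulates a path from a known (and purely illustrative) matrix and recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)

P_true = np.array([[0.2, 0.5, 0.3],
                   [0.6, 0.1, 0.3],
                   [0.4, 0.4, 0.2]])       # the "unknown" chain (illustrative)
k = P_true.shape[0]

N = 50_000                                 # number of observed transitions
X = np.empty(N + 1, dtype=int)
X[0] = 0
for t in range(N):
    X[t + 1] = rng.choice(k, p=P_true[X[t]])

counts = np.zeros((k, k))
np.add.at(counts, (X[:-1], X[1:]), 1)      # N_ij = number of transitions from i to j
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P_hat, 3))                  # close to P_true
```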


We now turn to consider the consistency of this sort of estimate, that is to say whether p̂ij → pij with probability 1 as N → ∞. Since this is clearly false when i is transient, we shall slightly modify our approach. Note that to find p̂ij we simply have to maximize

Σ_{j∈I} Nij log pij

subject to Σ_j pij = 1: the other terms and constraints are irrelevant. Suppose then that instead of N + 1 observations we make enough observations to ensure the chain leaves state i a total of N times. In the transient case this may involve restarting the chain several times. Denote again by Nij the number of transitions from i to j.
To maximize the likelihood for (pij : j ∈ I) we still maximize

Σ_{j∈I} Nij log pij

subject to Σ_j pij = 1, which leads to the maximum likelihood estimate

p̂ij = Nij /N.

But Nij = Y1 + . . . + YN, where Yn = 1 if the nth transition from i is to j, and Yn = 0 otherwise. By the strong Markov property Y1, . . . , YN are independent and identically distributed random variables with mean pij. So, by the strong law of large numbers,

P( p̂ij → pij as N → ∞ ) = 1,

which shows that p̂ij is consistent.

Exercises
1.10.1 Prove the claim (d) made in example (v) of the Introduction.

1.10.2 A professor has N umbrellas. He walks to the office in the morning


and walks home in the evening. If it is raining he likes to carry an um-
brella and if it is fine he does not. Suppose that it rains on each journey
with probability p, independently of past weather. What is the long-run
proportion of journeys on which the professor gets wet?

1.10.3 Let (Xn )n≥0 be an irreducible Markov chain on I having an invariant


distribution π. For J ⊆ I let (Ym )m≥0 be the Markov chain on J obtained
by observing (Xn )n≥0 whilst in J . (See Example 1.4.4.) Show that (Ym )m≥0
is positive recurrent and find its invariant distribution.

1.10.4 An opera singer is due to perform a long series of concerts. Hav-


ing a fine artistic temperament, she is liable to pull out each night with
probability 1/2. Once this has happened she will not sing again until the
promoter convinces her of his high regard. This he does by sending flowers
every day until she returns. Flowers costing x thousand pounds, 0 ≤ x ≤ 1,

bring about a reconciliation with probability x. The promoter stands to
make £750 from each successful concert. How much should he spend on
flowers?

1.11 Appendix: recurrence relations

Recurrence relations often arise in the linear equations associated to Markov


chains. Here is an account of the simplest cases. A more specialized case
was dealt with in Example 1.3.4. In Example 1.1.4 we found a recurrence
relation of the form
xn+1 = axn + b.
We look first for a constant solution xn = x; then x = ax + b, so provided a ≠ 1 we must have x = b/(1 − a). Now yn = xn − b/(1 − a) satisfies yn+1 = ayn, so yn = a^n y0. Thus the general solution when a ≠ 1 is given by

xn = A a^n + b/(1 − a)

where A is a constant. When a = 1 the general solution is obviously

xn = x0 + nb.

In Example 1.3.3 we found a recurrence relation of the form

axn+1 + bxn + cxn−1 = 0

where a and c were both non-zero. Let us try a solution of the form xn = λ^n; then aλ² + bλ + c = 0. Denote by α and β the roots of this quadratic. Then

yn = Aα^n + Bβ^n

is a solution. If α ≠ β then we can solve the equations

x0 = A + B, x1 = Aα + Bβ

so that y0 = x0 and y1 = x1; but

a(yn+1 − xn+1) + b(yn − xn) + c(yn−1 − xn−1) = 0

for all n, so by induction yn = xn for all n. If α = β ≠ 0, then

yn = (A + nB)α^n

is a solution and we can solve

x0 = A, x1 = (A + B)α

so that y0 = x0 and y1 = x1; then, by the same argument, yn = xn for all n. The case α = β = 0 does not arise. Hence the general solution is given by

xn = Aα^n + Bβ^n if α ≠ β,
xn = (A + nB)α^n if α = β.
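A quick numerical check of the two cases, with arbitrary illustrative coefficients:

```python
# Check that the stated general solutions satisfy a*x[n+1] + b*x[n] + c*x[n-1] = 0.
def worst_residual(x, a, b, c):
    return max(abs(a * x[n + 1] + b * x[n] + c * x[n - 1]) for n in range(1, len(x) - 1))

A, B = 1.7, -0.4

# Distinct roots: lambda^2 - 5*lambda + 6 = 0 has alpha = 2, beta = 3.
x = [A * 2 ** n + B * 3 ** n for n in range(10)]
print(worst_residual(x, 1, -5, 6))         # essentially 0

# Repeated root: lambda^2 - 4*lambda + 4 = 0 has alpha = beta = 2.
x = [(A + n * B) * 2 ** n for n in range(10)]
print(worst_residual(x, 1, -4, 4))         # essentially 0
```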

1.12 Appendix: asymptotics for n!

Our analysis of recurrence and transience for random walks in Section 1.6
rested heavily on the use of the asymptotic relation

n! ∼ A √n (n/e)^n as n → ∞

for some A ∈ [1, ∞). Here is a derivation.


We make use of the power series expansions, for |t| < 1,

log(1 + t) = t − (1/2)t² + (1/3)t³ − · · ·
log(1 − t) = −t − (1/2)t² − (1/3)t³ − · · · .

By subtraction we obtain

(1/2) log((1 + t)/(1 − t)) = t + (1/3)t³ + (1/5)t⁵ + · · · .

Set An = n!/(n^{n+1/2} e^{−n}) and an = log An. Then, by a straightforward calculation,

an − an+1 = (2n + 1) · (1/2) log( (1 + (2n+1)^{−1}) / (1 − (2n+1)^{−1}) ) − 1.

By the series expansion written above we have

an − an+1 = (2n + 1) ( 1/(2n+1) + (1/3)(2n+1)^{−3} + (1/5)(2n+1)^{−5} + · · · ) − 1
          = (1/3)(2n+1)^{−2} + (1/5)(2n+1)^{−4} + · · ·
          ≤ (1/3) ( (2n+1)^{−2} + (2n+1)^{−4} + · · · )
          = (1/3) · 1/((2n+1)² − 1) = 1/(12n) − 1/(12(n+1)).

It follows that an decreases and an − 1/(12n) increases as n → ∞. Hence


an → a for some a ∈ [0, ∞) and hence An → A, as n → ∞, where A = e^a.
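Numerically An converges quickly, and its limit agrees with the value √(2π) ≈ 2.5066 given by Stirling's formula; this identification of A is a standard fact, not needed above, where only 1 ≤ A < ∞ is used.

```python
import math

# A_n = n! / (n^(n + 1/2) e^(-n)) decreases to A, and A = sqrt(2*pi) by Stirling's formula.
for n in [1, 2, 5, 10, 50, 100]:
    A_n = math.factorial(n) / (n ** (n + 0.5) * math.exp(-n))
    print(n, A_n)
print(math.sqrt(2 * math.pi))
```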
