
Chapter 5

Markov chains

In this chapter we are concerned with the simplest non-trivial class of stochastic processes, con-
sisting of sequences of random variables (Xn)n∈N taking values in a discrete set S which in the
following will be called the state space. S will be a finite or countably infinite set and we shall label
its elements with an index i ∈ S and call them states. In other terms, we are going to study a
stochastic process (Xt)t∈T where the set of times T = N is discrete and the random variables are
discrete as well, since there exists a discrete set S such that for any n ∈ N the distribution of the
random variable Xn will be uniquely determined by the numbers

    λ^n_i = P(Xn = i),    i ∈ S    (5.1)

through

    µ_Xn(H) = P(Xn ∈ H) = ∑_{i∈H} λ^n_i ,    H ∈ B(R)

and, analogously, the finite dimensional distributions {µ_{n1,...,nm}}_{n1,...,nm∈N} will be uniquely deter-
mined by the probabilities

    P(Xn1 = i1 , . . . , Xnm = im ),    i1 , . . . , im ∈ S    (5.2)

in terms of

    µ_{n1,...,nm}(H) = P((Xn1 , . . . , Xnm) ∈ H) = ∑_{(i1,...,im)∈H} P((Xn1 , . . . , Xnm) = (i1 , . . . , im)),    H ∈ B(R^m)

Hence, without loss of generality, in the following we shall limit ourselves to the study of the quantities
(5.1) and (5.2) instead of the family of measures {µ_{n1,...,nm}}_{n1,...,nm∈N}.

5.1 Introduction
Among the processes with discrete time set T and discrete state space S, Markov chains play a
particular role since they admit a rather simple mathematical description on the one hand and
find rich and interesting applications on the other. We shall be essentially concerned with
the following prediction problems:

1. Given the past history of the system, i.e. given the states at times n = 0, 1, ..., m, compute the
   probability that at a future time m′ > m the system will reach the state j. This is actually
   given by the conditional probabilities

       P(Xm′ = j|X0 = i0 , . . . , Xm = im )

2. Predict the long time behaviour of the system; more precisely, determine whether the limit

       lim_{n→+∞} P(Xn = j),    j ∈ S

   exists, and interpret the result.

Let us present a couple of examples.
Example 13 (The simplest gambling example). Let us consider an infinite number of tosses of a
coin (not necessarily balanced). Let {ξn } be a sequence of i.i.d. random variables representing the
results of the tosses, i.e. ξn = +1 if the result of the n-th toss is heads and ξn = −1 if the result
of the n-th toss is tails. Their distribution is simply given by

    P(ξn = +1) = p,    P(ξn = −1) = 1 − p.

Let us assume that at each toss we win 1 euro if the result is heads and lose 1 euro if the result
is tails. If the initial cash available is given by a random variable X0 describing our fortune at time
n = 0, our total fortune after n tosses will be given by the random variable Xn with values in the
state space S = Z given by Xn := X0 + ∑_{j=1}^n ξj .
If we try to predict our future fortune given the past history, i.e. if we want to compute

    P(Xn+1 = j|X0 = i0 , . . . , Xn = in ),

this is simply given by

    P(Xn+1 = j|X0 = i0 , . . . , Xn = in ) = { p      if j = in + 1
                                            { 1 − p  if j = in − 1
                                            { 0      otherwise
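As an illustration, here is a minimal simulation sketch of this gambling chain (Python with numpy; the function name simulate_fortune and the chosen parameters are ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_fortune(x0, p, n_tosses):
    """Simulate the gambler's fortune Xn = X0 + xi_1 + ... + xi_n."""
    # Each toss is +1 (heads) with probability p, -1 (tails) otherwise.
    steps = rng.choice([+1, -1], size=n_tosses, p=[p, 1 - p])
    return x0 + np.cumsum(steps)

# One trajectory of 10 tosses starting from 5 euros, with a slightly unfair coin.
print(simulate_fortune(x0=5, p=0.45, n_tosses=10))
```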

Example 14 (The problem of surname extinction). The males of a family (with a particular sur-
name) can produce 0, 1, 2, ..., k, ... male offspring with probabilities p0 , p1 , ..., pk , ... (where
∑_{k=0}^∞ pk = 1). Assuming that the numbers of male offspring of different individuals are described by i.i.d.
random variables ξl with distribution P(ξl = k) = pk and that Xn is the discrete random variable
denoting the number of males at the n-th generation, we have

    P(Xn+1 = j|X0 = i0 , . . . , Xn = in ) = P(∑_{l=1}^{in} ξl = j)

In this case the state space is S = N. In particular, in this model it is interesting to study the
extinction probability, i.e. the probability that Xn = 0 for some n ≥ 1.
In both examples, in order to predict the value of the variable Xn+1 it is enough to look at the
state at time n and not further into the past; more precisely, ∀n ≥ 0 and for all sequences of states
i0 , . . . , in , in+1 , the following holds:

    P(Xn+1 = in+1 |X0 = i0 , . . . , Xn = in ) = P(Xn+1 = in+1 |Xn = in )    (5.3)

The identity (5.3) is called the Markov property, and any sequence of discrete random variables {Xn }n≥0
enjoying it is called a Markov chain.

5.2 Basic definitions and first examples
The conditional probabilities

P(Xn+1 = j|Xn = i), n ∈ N, i, j ∈ S (5.4)

are called transition probabilities. In the following we shall be concerned with time homogeneous
Markov chains, i.e. sequences of random variables {Xn }n∈N enjoying the Markov property (5.3)
and such that the (one-step) transition probabilities (5.4) do not depend explicitly on the time
index n ∈ N:

    P(Xn+1 = j|Xn = i) = P(X1 = j|X0 = i),    ∀n ∈ N, i, j ∈ S.    (5.5)

In this case, since we can drop the explicit dependence on the time index n, we can introduce the
shortened notation

    pij := P(X1 = j|X0 = i)
The transition probabilities actually give the instrument for determining the time evolution of the
distribution of the random variables {Xn }: indeed, if we know the probabilities λ^n_i = P(Xn = i) we
can compute the probabilities λ^{n+1}_j = P(Xn+1 = j) as:

    λ^{n+1}_j = P(Xn+1 = j) = ∑_{i∈S} P(Xn+1 = j|Xn = i)P(Xn = i) = ∑_{i∈S} pij λ^n_i .    (5.6)

In this respect, condition (5.5) can be interpreted as the time invariance of the dynamics governing
the evolution of the probabilities.

It is really convenient to construct a square matrix P = (Pij , i, j ∈ S) out of the transition
probabilities:

    Pij = pij

If #S = N then P is an N × N square matrix; if S has infinitely many elements, then P will be a
matrix with infinitely many rows and columns. In both cases the identity (5.6) connecting the probabilities
λ^n_i = P(Xn = i) with the probabilities λ^{n+1}_j = P(Xn+1 = j) can be written in terms of an easy
matrix operation. Indeed, by introducing the row vectors λ^n = (λ^n_i)_{i∈S} and λ^{n+1} = (λ^{n+1}_i)_{i∈S},
identity (5.6) can be written as

    λ^{n+1} = λ^n P    (5.7)
More generally, given the distribution λ^0 of the random variable X0 , i.e. λ^0_i = P(X0 = i), we can
compute for any n ≥ 1 the distribution λ^n of the random variable Xn as

    λ^n = λ^0 P^n ,    (5.8)

where P^n is the n-th power of P ; more explicitly,

    λ^n_j = ∑_{i∈S} λ^0_i (P^n)_ij ,    j ∈ S    (5.9)

Moreover, given the initial distribution λ^0 and the matrix P , we can compute all the finite dimen-
sional distributions as

    P(X0 = i0 , X1 = i1 , ..., Xn = in ) = λ^0_{i0} P_{i0 i1} · · · P_{i_{n−1} i_n} ,    n ∈ N, i0 , i1 , ..., in ∈ S    (5.10)

This formula can be easily proved by induction; indeed,

    P(X0 = i0 , X1 = i1 , ..., Xn = in )
      = P(Xn = in |X0 = i0 , X1 = i1 , ..., Xn−1 = in−1 ) P(X0 = i0 , X1 = i1 , ..., Xn−1 = in−1 )
      = P(Xn = in |Xn−1 = in−1 ) P(X0 = i0 , X1 = i1 , ..., Xn−1 = in−1 )
      = P_{i_{n−1} i_n} P(X0 = i0 , ..., Xn−1 = in−1 ),

and the claim follows by applying the inductive hypothesis to the last factor.
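Formulas (5.7)–(5.8) translate directly into matrix arithmetic. A minimal sketch (Python with numpy; the two-state matrix and its entries are our choice of example):

```python
import numpy as np

# A two-state stochastic matrix (entries chosen by us).
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])

lam0 = np.array([1.0, 0.0])     # initial distribution concentrated on the first state

# lambda^n = lambda^0 P^n, computed by iterating lambda^{n+1} = lambda^n P as in (5.7).
lam = lam0
for _ in range(50):
    lam = lam @ P
print(lam)                       # approaches (0.4, 0.6) for this matrix

# Equivalently, via the matrix power as in (5.8):
print(lam0 @ np.linalg.matrix_power(P, 50))
```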

Exercise 1 (n-step transition probabilities). Prove that for any n, m ∈ N and for any i, j ∈ S

    P(Xm+n = j|Xm = i) = (P^n)_ij .    (5.11)

By construction[1] the matrix P enjoys the following properties:

S1. Pij ≥ 0,    ∀i, j ∈ S;

S2. each row is normalized:

        ∑_{j∈S} Pij = 1,    ∀i ∈ S.

A matrix P = (Pij , i, j ∈ S) enjoying both properties S1 and S2 is called a stochastic matrix.

[1] Both properties S1 and S2 follow easily from the definition of Pij = P(X1 = j|X0 = i).


Stochastic matrices provide the law of the time evolution of the distributions of the random
variables Xn through (5.7) and (5.8), and allow one to compute all finite dimensional distributions
by means of formula (5.10) once the initial distribution λ^0 is given.
Exercise 2. Prove the following properties of Markov chains:

1. P(Xn+m = j|Xm = i) = P(Xn = j|X0 = i) = p^(n)_ij for all m, n ∈ N, i, j ∈ S;

2. p^(n)_ij = (P^n)_ij for all n ∈ N, i, j ∈ S;

3. for all N ≥ 1, 0 ≤ n1 < n2 < ... < nN and i1 , ..., iN ∈ S,

       P(Xn1 = i1 , Xn2 = i2 , ..., XnN = iN ) = ∑_{i0∈S} λ_{i0} p^(n1)_{i0 i1} p^(n2−n1)_{i1 i2} · · · p^(nN−nN−1)_{iN−1 iN}

The graph associated to a stochastic matrix


As explained above, the stochastic matrix describes the set of transition probabilities pij hence the
dynamics of the Markov chain. The same information can be represented by means of a diagram
whose vertices are associated with the states i ∈ S that the random variables (Xn )n∈N can attain.
If the transition probability pij is non-vanishing, then the vertices i and j are connected with a
directed arrow on which the number pij is written. Some particular examples are given below.

1. In the simple case where there are only two possible states, S = {1, 2}, the generic stochastic matrix
   has the following form

       P = [ 1−α    α  ]
           [  β   1−β  ]

   where α ∈ [0, 1], β ∈ [0, 1].
   It can be equivalently described by the following diagram:

   [Diagram: states 1 and 2 with self-loops of probability 1−α and 1−β, an arrow 1 → 2 labelled α and an arrow 2 → 1 labelled β.]

2. Let us consider the case where S = {1, 2, 3} and the stochastic matrix is:

       P = [  0    1    0  ]
           [  0   1/2  1/2 ]
           [ 1/2   0   1/2 ]

   The corresponding diagram is:

   [Diagram: arrow 1 → 2 with probability 1; self-loop at 2 with probability 1/2 and arrow 2 → 3 with probability 1/2; arrow 3 → 1 with probability 1/2 and self-loop at 3 with probability 1/2.]

   Let us consider the case where there are N possible states and the transition probabilities
   are given by the following diagram (where we have taken N = 4):

   [Diagram: states 1, 2, 3, ..., N arranged on a cycle; each state moves one step in one direction with probability p and one step in the opposite direction with probability 1−p.]

   The corresponding stochastic matrix has the following structure:

       P = [  0    p    0   ...  1−p ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  p    0   ...  1−p   0  ]

3. Random walk with absorbing boundaries

   [Diagram: states 1, 2, ..., N on a line; interior states move right with probability p and left with probability 1−p; the boundary states 1 and N are absorbing (self-loops with probability 1).]

       P = [  1    0    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   0    1  ]

4. Random walk with reflecting boundaries

   [Diagram: states 1, 2, ..., N on a line; interior states move right with probability p and left with probability 1−p; state 1 is sent to 2 and state N to N−1 with probability 1.]

       P = [  0    1    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   1    0  ]

A simple but important identity in the theory of Markov chains is the following Chapman-
Kolmogorov equation:

    P(Xn = j|Xl = i) = ∑_{k∈S} P(Xn = j|Xm = k) P(Xm = k|Xl = i),

which holds for any l ≤ m ≤ n, i, j ∈ S. By introducing the notation p^(n)_ij = P(Xn = j|X0 = i) and
using property (5.5), it can be equivalently written in the following form

    p^(n+m)_ij = ∑_{k∈S} p^(m)_ik p^(n)_kj ,

where the symbols p^(n)_ij , n ∈ N, i, j ∈ S, denote the n-step transition probabilities:

    p^(n)_ij = P(Xn = j|X0 = i).
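The identity p^(n)_ij = (P^n)_ij from Exercise 2 can be checked numerically by comparing a Monte Carlo estimate with a matrix power. A minimal sketch (Python with numpy; the chain, the states and the sample size are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
n, i, j = 4, 0, 2                 # estimate the 4-step probability from state 1 to state 3

# Monte Carlo: run the chain n steps from state i many times.
hits, trials = 0, 100_000
for _ in range(trials):
    x = i
    for _ in range(n):
        x = rng.choice(3, p=P[x])
    hits += (x == j)

print(hits / trials)                              # simulated p^(n)_ij
print(np.linalg.matrix_power(P, n)[i, j])         # exact (P^n)_ij
```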

We are now ready to give a precise definition of a Markov chain.

Definition 22. Let S be a discrete set, λ^0 a probability distribution on S, and P = (Pij , i, j ∈ S)
a stochastic matrix. A discrete time stochastic process (Ω, F, P, (Xn )n∈N ) with random variables
(Xn )n taking values in the discrete space S is called a Markov chain (λ^0 , P ) if

• P(X0 = i) = λ^0_i for all i ∈ S;

• for all n ≥ 0, conditional on Xn = i, the random variable Xn+1 has distribution (Pij , j ∈ S)
  and is independent of X0 , . . . , Xn−1 :

      P(Xn+1 = j|X0 = i0 , . . . , Xn−1 = in−1 , Xn = i) = P(Xn+1 = j|Xn = i) = Pij ,    ∀i, j, i0 , . . . , in−1 ∈ S

The following result shows how a particular form of the finite dimensional distributions allows one
to identify a Markov chain uniquely; it will be applied later.

Theorem 11. A sequence of random variables (Xn )n∈N with values in a discrete space S is a
Markov chain (λ, P ) if and only if for all n ∈ N and i0 , . . . , in ∈ S the following holds:

    P(X0 = i0 , X1 = i1 , ..., Xn = in ) = λ_{i0} P_{i0 i1} · · · P_{i_{n−1} i_n} .    (5.12)

Proof: If (Xn )n∈N is a Markov chain (λ, P ) then we have already proved that (5.12) holds (see
(5.10) and its proof by induction).
Conversely, if (5.12) holds, then P(X0 = i) = λi and we can easily prove that

    P(Xn+1 = in+1 |X0 = i0 , . . . , Xn−1 = in−1 , Xn = in )
      = P(X0 = i0 , . . . , Xn = in , Xn+1 = in+1 ) / P(X0 = i0 , . . . , Xn = in )
      = (λ_{i0} P_{i0 i1} · · · P_{i_{n−1} i_n} P_{i_n i_{n+1}}) / (λ_{i0} P_{i0 i1} · · · P_{i_{n−1} i_n})
      = P_{i_n i_{n+1}}

By using this result we can also prove

    P(Xn+1 = in+1 |Xn = in )
      = ∑_{i0,...,in−1∈S} P(Xn+1 = in+1 |X0 = i0 , . . . , Xn−1 = in−1 , Xn = in ) P(X0 = i0 , . . . , Xn−1 = in−1 |Xn = in )
      = ∑_{i0,...,in−1∈S} P_{i_n i_{n+1}} P(X0 = i0 , . . . , Xn−1 = in−1 |Xn = in )
      = P_{i_n i_{n+1}} .

An alternative formulation of the Markov property

Given a discrete state space S and an element i ∈ S, we will denote by δi the probability distribution
concentrated in i:

    δi = (δij , j ∈ S),    δij = { 1  if j = i
                                 { 0  if j ≠ i

The following theorem provides a characterization of Markov chains, reinforcing the intuitive idea
of "lack of memory".

Theorem 12. Let (Ω, F, P, {Xn }n≥0 ) be a Markov chain (λ, P ), and let m ∈ N and i ∈ S.
Conditionally upon Xm = i, the sequence of random variables {Yn }n≥0 defined by

    Yn := Xm+n ,    n ≥ 0

is a Markov chain (δi , P ) independent of X0 , X1 , ..., Xm .

The theorem can be equivalently stated in the following way:

1. Let us consider on the measurable space (Ω, F) the probability measure P(· |Xm = i) (condi-
   tional on Xm = i), and the sub-σ-algebras Fm and Gm defined as

       Fm = σ(X0 , ..., Xm ),    Gm = σ(Yn , n ≥ 0) = σ(Xm+n , n ≥ 0);

   then they are independent:

       P(A ∩ B|Xm = i) = P(A|Xm = i)P(B|Xm = i),    ∀A ∈ Fm , B ∈ Gm    (5.13)

2. The joint conditional distribution of the random variables (Xm+n , n ≥ 0) given Xm = i is
   given by

       P(Xm = i0 , Xm+1 = i1 , . . . , Xm+n−1 = in−1 , Xm+n = in |Xm = i) = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n}

   By theorem 11 this is equivalent to saying that, conditionally upon Xm = i, the random variables
   (Xm+n )n≥0 are a Markov chain (δi , P ).

Proof: [of theorem 12] By lemma 3, it is sufficient to prove (5.13) only for events A and B belonging
respectively to two π-systems P1 and P2 generating the sub-σ-algebras Fm = σ(X0 , ..., Xm ) and
Gm = σ(Yn , n ≥ 0) = σ(Xn , n ≥ m).
In particular, P1 will be the collection of sets of the following form:

    P1 = {X0 = i0 , . . . , Xm = im },    i0 , . . . , im ∈ S,

while P2 will be the collection of sets of the following form:

    P2 = {Xm = jm , Xm+1 = jm+1 , . . . , Xm+n = jm+n },    n ∈ N, jm , . . . , jm+n ∈ S.

Taking A = {X0 = i0 , . . . , Xm = im } and B = {Xm = jm , Xm+1 = jm+1 , . . . , Xm+n = jm+n },

we have to compute

    P(A ∩ B|Xm = i)
      = P(A ∩ B ∩ {Xm = i}) / P(Xm = i)
      = P(X0 = i0 , . . . , Xm = im , Xm = i, Xm = jm , Xm+1 = jm+1 , . . . , Xm+n = jm+n ) / P(Xm = i)
      = λ_{i0} P_{i0 i1} · · · P_{i_{m−1} i_m} δ_{i_m i} δ_{i j_m} P_{j_m j_{m+1}} · · · P_{j_{m+n−1} j_{m+n}} / P(Xm = i)
      = (λ_{i0} P_{i0 i1} · · · P_{i_{m−1} i_m} δ_{i_m i} / P(Xm = i)) δ_{i j_m} P_{j_m j_{m+1}} · · · P_{j_{m+n−1} j_{m+n}}
      = (P(X0 = i0 , . . . , Xm = im , Xm = i) / P(Xm = i)) δ_{i j_m} P_{j_m j_{m+1}} · · · P_{j_{m+n−1} j_{m+n}}
      = (P(A ∩ {Xm = i}) / P(Xm = i)) δ_{i j_m} P_{j_m j_{m+1}} · · · P_{j_{m+n−1} j_{m+n}}
      = P(A|Xm = i) δ_{i j_m} P_{j_m j_{m+1}} · · · P_{j_{m+n−1} j_{m+n}}

The strong Markov property

In this section we are going to extend the result presented in theorem 12 to the case where the
deterministic time m ∈ N is replaced by a "suitable" random variable. The first step is to define
what "suitable" means in this context.
Definition 23. Let (Ω, F, P, (Xn )n∈N ) be a Markov chain and let (Fn )n∈N be the natural filtration,
i.e. Fn := σ(Xk , k ≤ n).
A discrete random variable τ with values in the set N ∪ {∞} is called a stopping time if

    ∀n ∈ N    {τ = n} ∈ Fn    (5.14)

It is important to recall that the condition {τ = n} ∈ Fn means that the event {τ = n} is
determined only by the first n + 1 random variables X0 , X1 , . . . , Xn . Indeed, an equivalent way
of formulating the condition E ∈ Fn consists in the Fn-measurability of the indicator function
1E : Ω → R of the set E,

    1E (ω) = { 1  if ω ∈ E
             { 0  if ω ∉ E

By Doob's theorem (theorem 3), the map 1E is Fn-measurable if and only if there exists a Borel
measurable map g : R^{n+1} → R such that

    1E (ω) = g(X0 (ω), . . . , Xn (ω))


This means that if we want to determine whether a given point ω ∈ Ω belongs to the set {τ = n}
it is sufficient to know the values X0 (ω), . . . , Xn (ω).

Example 15. Let A ⊂ S be a set of states and let τ be the first hitting time of A, defined as:

    τ (ω) := inf{n ∈ N : Xn (ω) ∈ A},

where inf ∅ := +∞. τ is a stopping time. Indeed:

    {τ = n} = (∩_{k=0}^{n−1} {Xk ∉ A}) ∩ {Xn ∈ A},

hence {τ = n} ∈ Fn since all the events appearing on the r.h.s. belong to Fn .
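To make the definition concrete, here is a minimal sketch computing the first hitting time of a set along one simulated trajectory (Python with numpy; the chain, the target set and the horizon are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

def first_hitting_time(x0, target, max_steps=10_000):
    """Return inf{n : Xn in target} along one trajectory (None stands in for +infinity)."""
    x = x0
    for n in range(max_steps + 1):
        if x in target:
            return n          # decided by X0, ..., Xn only: a stopping time
        x = rng.choice(3, p=P[x])
    return None

print(first_hitting_time(x0=0, target={2}))
```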


Example 16. Let A ⊂ S be a set of states and let τ be the last exit time from A, defined as:

    τ (ω) := sup{n ∈ N : Xn (ω) ∈ A}.

τ is not a stopping time. Indeed:

    {τ = n} = {Xn ∈ A} ∩ (∩_{k=n+1}^∞ {Xk ∉ A}),

and the events {Xk ∉ A} for k > n do not belong to Fn .

Given a Markov chain (Xn )n≥0 , the natural filtration (Fn )n≥0 and a stopping time τ , we shall
denote by Fτ the collection of sets E ∈ F satisfying the following condition:

    ∀n ∈ N    E ∩ {τ = n} ∈ Fn .    (5.15)

The collection Fτ is a σ-algebra. Indeed, it is easy to verify that:

• Ω ∈ Fτ ;
• if E ∈ Fτ then E^c ∈ Fτ ;
• if {En }n ⊂ Fτ then ∪n En ∈ Fτ .

Intuitively, an event E belongs to Fτ if it is determined by the random variables X0 , . . . , Xτ .
For example, fixed a state i ∈ S, the event E = {ω ∈ Ω : Xτ(ω) (ω) = i} belongs to Fτ ; indeed:

    E ∩ {τ = n} = {Xτ = i} ∩ {τ = n} = {Xn = i} ∩ {τ = n} ∈ Fn .

On the other hand, the event E′ = {ω ∈ Ω : Xτ(ω)+1 (ω) = i} does not belong to Fτ , since

    E′ ∩ {τ = n} = {Xτ+1 = i} ∩ {τ = n} = {Xn+1 = i} ∩ {τ = n},

and in this case we have that {Xn+1 = i} ∉ Fn .
We are now ready to present the main result of this section.

Theorem 13 (Strong Markov property). Let (Xn )n∈N be a Markov chain (λ, P ), τ a stopping
time and i ∈ S a state. Then, conditionally upon {Xτ = i} ∩ {τ < ∞}, the random variables
(Xτ +n )n∈N are a Markov chain (δi , P ) independent of Fτ .

Proof: We have to prove that for any choice of E ∈ Fτ , n ∈ N, i0 , . . . , in ∈ S the following holds:

    P(E ∩ {Xτ = i0 , Xτ+1 = i1 , . . . , Xτ+n = in }|{Xτ = i} ∩ {τ < ∞})
      = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} P(E|{Xτ = i} ∩ {τ < ∞})    (5.16)

By direct computation we have:

    P(E ∩ {Xτ = i0 , Xτ+1 = i1 , . . . , Xτ+n = in }|{Xτ = i} ∩ {τ < ∞})
      = P(E ∩ {Xτ = i0 , Xτ+1 = i1 , . . . , Xτ+n = in } ∩ {Xτ = i} ∩ {τ < ∞}) / P({Xτ = i} ∩ {τ < ∞})    (5.17)

By decomposing the event {τ < ∞} as the disjoint union of the events {τ = m},

    {τ < ∞} = ∪m {τ = m},

the last line of (5.17) becomes

    ∑_m P(E ∩ {Xτ = i0 , Xτ+1 = i1 , . . . , Xτ+n = in } ∩ {Xτ = i} ∩ {τ = m}) / P({Xτ = i} ∩ {τ < ∞})
      = ∑_m P(E ∩ {Xm = i0 , Xm+1 = i1 , . . . , Xm+n = in } ∩ {Xm = i} ∩ {τ = m}) / P({Xτ = i} ∩ {τ < ∞})
      = ∑_m P(E ∩ {τ = m} ∩ {Xm = i0 , Xm+1 = i1 , . . . , Xm+n = in }|{Xm = i}) P(Xm = i) / P({Xτ = i} ∩ {τ < ∞})    (5.18)

Since E ∈ Fτ the event E ∩ {τ = m} belongs to Fm , and by theorem 12 we have

    P(E ∩ {τ = m} ∩ {Xm = i0 , Xm+1 = i1 , . . . , Xm+n = in }|{Xm = i})
      = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} P(E ∩ {τ = m}|{Xm = i})

Hence the last line of (5.18) becomes

    δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} ∑_m P(E ∩ {τ = m}|{Xm = i}) P(Xm = i) / P({Xτ = i} ∩ {τ < ∞})
      = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} ∑_m P(E ∩ {τ = m} ∩ {Xm = i}) / P({Xτ = i} ∩ {τ < ∞})
      = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} P(E ∩ {τ < ∞} ∩ {Xτ = i}) / P({Xτ = i} ∩ {τ < ∞})
      = δ_{i i0} P_{i0 i1} · · · P_{i_{n−1} i_n} P(E|{τ < ∞} ∩ {Xτ = i})

5.3 Recurrent and transient states

In the following sections we are going to study properties of Markov chains which are related to the
form of the stochastic matrix P or, equivalently, to the transition probabilities pij . In particular,
we will fix P and consider different initial distributions λ.
Our Markov chain will be a sequence (Xn )n∈N of S-valued discrete random variables on a prob-
ability space (Ω, F, P). For any probability distribution λ on S we will consider a corresponding
probability measure Pλ on (Ω, F) in such a way that under Pλ the sequence (Xn )n∈N will be a
Markov chain (λ, P ). Whenever λ = δi , with i ∈ S, we shall use the notation Pi instead of Pδi . In
fact Pi can be interpreted as the conditional distribution of the random variables (Xn )n∈N given
that X0 = i:

    Pi (·) ≡ P(· |X0 = i)

First return time

Given a state i ∈ S, let us consider the first return time to state i, i.e. the random variable
Ti : Ω → N ∪ {+∞} defined as:

    Ti (ω) := inf{n ≥ 1 : Xn (ω) = i},

where inf ∅ = +∞.
It is easy to prove that Ti is a stopping time.
For any pair of states i, j ∈ S let us consider the following quantities:

    f^(n)_ij := Pi (Tj = n) = P(Xn = j, Xn−1 ≠ j, Xn−2 ≠ j, . . . , X1 ≠ j|X0 = i),    n ≥ 1

    f*_ij := Pi (Tj < +∞) = ∑_{n≥1} Pi (Tj = n) = ∑_{n≥1} f^(n)_ij

Definition 24. A state i ∈ S is said to be recurrent if Pi (Ti < +∞) = 1. A state i ∈ S is said to
be transient if Pi (Ti < +∞) < 1.

Clearly recurrence and transience are mutually exclusive properties: a state i ∈ S is either transient
or recurrent.
Fixed a state i ∈ S, let us define the number of returns to the state i as:

    Ni (ω) := ∑_{n≥1} 1_{Xn=i} (ω).

Ni is a random variable with values in N ∪ {+∞}.


Theorem 14. Under the assumptions above the following holds:

    Pi (Ni = k) = (1 − f*_ii)(f*_ii)^k .

In particular:
a. if i is recurrent then Pi (Ni = +∞) = 1;
b. if i is transient then Pi (Ni < +∞) = 1.

Proof:
We shall compute the probability Pi (Ni = k) out of the elementary identity

    Pi (Ni ≥ k) = Pi (Ni = k) + Pi (Ni ≥ k + 1),

which yields

    Pi (Ni = k) = Pi (Ni ≥ k) − Pi (Ni ≥ k + 1).

Hence, the problem is reduced to the calculation of the probabilities Pi (Ni ≥ k) for any k ∈ N.
Clearly, if k = 0 then Pi (Ni ≥ 0) = 1, while if k = 1 we can use the identity Pi (Ni ≥ 1) = Pi (Ti <
∞), which gives Pi (Ni ≥ 1) = f*_ii. In order to generalize this argument to arbitrary k ∈ N, let us
define the sequence of random variables (T^(k)_i)_{k≥1} with values in N ∪ {+∞} as:

    T^(1)_i = Ti ,    the first return time to state i
    T^(2)_i = T^(1)_i + inf{n ≥ 1 : X_{T^(1)_i + n} = i},    the second return time to state i
    ...
    T^(k)_i = T^(k−1)_i + inf{n ≥ 1 : X_{T^(k−1)_i + n} = i},    the k-th return time to state i

Clearly we have

    Pi (Ni ≥ k) = Pi (T^(k)_i < +∞).
Moreover, it is rather easy to check that the random variables T^(k)_i are stopping times.
We shall now prove that

    ∀k ≥ 1    Pi (T^(k)_i < +∞) = (f*_ii)^k .    (5.19)

We shall use an inductive argument. Indeed, as remarked above, the identity (5.19) is true for
k = 1. Let us assume now that it holds true for k and prove it for k + 1.

    Pi (T^(k+1)_i < +∞) = Pi ({T^(k+1)_i < +∞} ∩ {T^(k)_i < +∞})
      = Pi (T^(k+1)_i < +∞|T^(k)_i < +∞) Pi (T^(k)_i < +∞)
      = ∑_{n≥1} Pi (T^(k+1)_i = T^(k)_i + n|T^(k)_i < +∞) Pi (T^(k)_i < +∞)
      = ∑_{n≥1} Pi (X_{T^(k)_i+1} ≠ i, . . . , X_{T^(k)_i+n−1} ≠ i, X_{T^(k)_i+n} = i|T^(k)_i < +∞) Pi (T^(k)_i < +∞)    (5.20)

By the inductive assumption Pi (T^(k)_i < +∞) = (f*_ii)^k and we are left to prove that

    ∑_{n≥1} Pi (X_{T^(k)_i+1} ≠ i, . . . , X_{T^(k)_i+n−1} ≠ i, X_{T^(k)_i+n} = i|T^(k)_i < +∞) = f*_ii .

By definition of T^(k)_i we have {T^(k)_i < +∞} = {T^(k)_i < +∞, X_{T^(k)_i} = i}. Moreover

    Pi (X_{T^(k)_i+1} ≠ i, . . . , X_{T^(k)_i+n−1} ≠ i, X_{T^(k)_i+n} = i|T^(k)_i < +∞, X_{T^(k)_i} = i)
      = P(X_{T^(k)_i+1} ≠ i, . . . , X_{T^(k)_i+n−1} ≠ i, X_{T^(k)_i+n} = i|T^(k)_i < +∞, X_{T^(k)_i} = i, X0 = i)    (5.21)

Further, T^(k)_i is a stopping time and the event {X0 = i} belongs to the σ-algebra F_{T^(k)_i}, hence by
the strong Markov property we have:

    P(X_{T^(k)_i+1} ≠ i, . . . , X_{T^(k)_i+n−1} ≠ i, X_{T^(k)_i+n} = i|T^(k)_i < +∞, X_{T^(k)_i} = i, X0 = i)
      = Pi (X1 ≠ i, . . . , Xn−1 ≠ i, Xn = i) = Pi (Ti = n)    (5.22)

hence, the last line of (5.20) reduces to

    ∑_{n≥1} Pi (Ti = n) (f*_ii)^k = Pi (Ti < +∞) (f*_ii)^k = (f*_ii)^{k+1} .

Hence by the identity

    Pi (Ni ≥ k) = Pi (Ni = k) + Pi (Ni ≥ k + 1)

we obtain

    Pi (Ni = k) = Pi (Ni ≥ k) − Pi (Ni ≥ k + 1) = (f*_ii)^k − (f*_ii)^{k+1} = (1 − f*_ii)(f*_ii)^k .

In particular, if i is a recurrent state we have

    Pi (Ni = k) = 0    ∀k ∈ N,

hence

    Pi (Ni < +∞) = ∑_{k≥0} Pi (Ni = k) = 0,    Pi (Ni = +∞) = 1 − Pi (Ni < +∞) = 1,

while if i is transient

    Pi (Ni < +∞) = ∑_{k≥0} Pi (Ni = k) = ∑_{k≥0} (1 − f*_ii)(f*_ii)^k = 1.

Let us denote by the symbol p^(n)_ij the n-step transition probability p^(n)_ij = P(Xn = j|X0 = i) = Pi (Xn = j). In fact p^(n)_ij is equal to (P^n)_ij , with P^n denoting the n-th power of the stochastic
matrix P .

Theorem 15. A state i ∈ S is recurrent if and only if ∑_n p^(n)_ii = +∞.

Proof: Let us denote by Ei the expectation with respect to the probability measure Pi . Let Ni
be the number of returns to the state i. Since Ei [1_{Xn=i}] = Pi (Xn = i) = p^(n)_ii we have:

    Ei [Ni ] = Ei [∑_{n≥1} 1_{Xn=i}] = ∑_{n≥1} p^(n)_ii .

By theorem 14, if i is recurrent then Ni = +∞ with probability Pi equal to 1, hence Ei [Ni ] = +∞.
Conversely, if i is transient,

    Ei [Ni ] = ∑_{k=0}^∞ k Pi [Ni = k] = ∑_{k=0}^∞ k(1 − f*_ii)(f*_ii)^k = (1 − f*_ii) ∑_{k=0}^∞ k(f*_ii)^k = f*_ii / (1 − f*_ii) < +∞.
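Theorem 15 can be explored numerically, since Ei [Ni ] = ∑_n (P^n)_ii. A minimal sketch (Python with numpy; the chain and the truncation level are our choices, and a finite truncation only suggests, never proves, convergence or divergence):

```python
import numpy as np

def partial_return_sums(P, i, N=2000):
    """Partial sums sum_{n=1}^N (P^n)_{ii}; they diverge iff state i is recurrent."""
    total, Q = 0.0, np.eye(len(P))
    for _ in range(N):
        Q = Q @ P
        total += Q[i, i]
    return total

# Random walk on {1,...,5} with absorbing barriers and p = 1/2.
A = np.zeros((5, 5))
A[0, 0] = A[4, 4] = 1.0
for k in range(1, 4):
    A[k, k - 1] = A[k, k + 1] = 0.5

print(partial_return_sums(A, 0))   # absorbing state: the sum grows like N (recurrent)
print(partial_return_sums(A, 2))   # interior state: the sum stabilizes (transient)
```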

5.4 Communication classes
Let us consider a Markov chain {Xn }n∈N with stochastic matrix P .

Definition 25. Let i, j ∈ S. We say that i leads to j if Pi (∪n {Xn = j}) > 0. We shall denote this
relation with the symbol i → j.

It is easy to see that i → j if and only if there exists at least one non-negative integer n ∈ N such
that p^(n)_ij > 0.

Definition 26. Let i, j ∈ S. We say that i communicates with j if i → j and j → i. We shall
denote this relation with the symbol i ∼ j.

More precisely, i ∼ j if there exist two non-negative integers m, n ∈ N such that

    p^(n)_ij > 0  and  p^(m)_ji > 0.    (5.23)

Actually the relation ∼ is an equivalence relation; indeed it enjoys the following properties:

• reflexive: i ∼ i (it is sufficient to choose m = n = 0 in (5.23));

• symmetric: if i ∼ j then j ∼ i (this comes directly from (5.23));

• transitive: if i ∼ j and j ∼ k then i ∼ k. Indeed, if i ∼ j and j ∼ k then there exist m, n ∈ N
  such that p^(n)_ij > 0 and p^(m)_jk > 0. We can prove that i → k by using the Chapman-Kolmogorov
  equation; indeed:

      p^(n+m)_ik = ∑_{l∈S} p^(n)_il p^(m)_lk ≥ p^(n)_ij p^(m)_jk > 0.

  Analogously we can prove that k → i.

Hence, the set of states S can be decomposed into the disjoint union of equivalence classes

    S = C1 ∪ C2 ∪ ... ∪ CM .
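Since i ∼ j is exactly mutual reachability in the transition graph, the equivalence classes are the strongly connected components of the directed graph with an edge i → j whenever pij > 0. A minimal sketch (Python with numpy; the function name communication_classes is ours, and we use a simple transitive-closure argument rather than an optimized algorithm):

```python
import numpy as np

def communication_classes(P):
    """Partition the states of a stochastic matrix into communication classes."""
    n = len(P)
    # reach[i, j] = 1 iff i = j or i leads to j (i -> j).
    reach = ((np.eye(n) + np.asarray(P)) > 0).astype(int)
    for _ in range(n):                          # transitive closure by repeated squaring
        reach = ((reach @ reach) > 0).astype(int)
    classes, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            cls = {j for j in range(n) if reach[i, j] and reach[j, i]}
            classes.append(sorted(cls))
            assigned |= cls
    return classes
```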
Theorem 16. Let C ⊂ S be an equivalence class. Then the states in C are either all recurrent or
all transient.

Proof: Let i ∈ C and j ∼ i. We have to show that if i is recurrent then j is recurrent and,
conversely, if i is transient then j is transient. Since by assumption i ∼ j, there exist two
positive integers n1 , n2 ∈ N such that p^(n1)_ij > 0 and p^(n2)_ji > 0. Moreover, for any n ∈ N the
following holds:

    p^(n1+n2+n)_jj ≥ p^(n2)_ji p^(n)_ii p^(n1)_ij ,    p^(n1+n2+n)_ii ≥ p^(n1)_ij p^(n)_jj p^(n2)_ji ,

hence

    ∑_{k=0}^∞ p^(k)_jj ≥ ∑_{n=0}^∞ p^(n1+n2+n)_jj ≥ p^(n2)_ji (∑_{n=0}^∞ p^(n)_ii) p^(n1)_ij ,

    ∑_{k=0}^∞ p^(k)_ii ≥ ∑_{n=0}^∞ p^(n1+n2+n)_ii ≥ p^(n1)_ij (∑_{n=0}^∞ p^(n)_jj) p^(n2)_ji ,

and we have proved that the series ∑_{k=0}^∞ p^(k)_ii = +∞ if and only if ∑_{k=0}^∞ p^(k)_jj = +∞.
In the following we shall call an equivalence class C recurrent if all its elements are recurrent
and transient if all its elements are transient.

5.4.1 Closed sets
Definition 27. A set K ⊂ S is said to be closed if for any i ∈ K and for any j ∉ K, i doesn't
lead to j:

    ∀i ∈ K, ∀j ∉ K:    p^(n)_ij = 0    ∀n ∈ N    (5.24)

Property (5.24) is equivalent to the following (which seems to be weaker):

    ∀i ∈ K, ∀j ∉ K:    p^(1)_ij = 0    (5.25)

Property (5.25) involves only the one-step transition probabilities, i.e. the elements of the stochastic
matrix P .
Clearly (5.24) implies (5.25) (indeed (5.25) is a particular case of (5.24)).
The proof that (5.25) implies (5.24) relies on an inductive argument. In the case where n = 1,
(5.24) coincides with (5.25). Let us assume that (5.24) holds for n − 1 and prove that it holds also for
n. Given i ∈ K and j ∉ K, then:

    p^(n)_ij = ∑_{l∈S} p_il p^(n−1)_lj = ∑_{l∈K} p_il p^(n−1)_lj = 0,

where the first equality relies on the Chapman-Kolmogorov equation, the second on (5.25) and the
third on the inductive assumption.

Definition 28. A state i ∈ S is called absorbing if {i} is a closed set.

In particular, if i is an absorbing state, then pij = 0 ∀j ≠ i and pii = 1.

Definition 29. Let C ⊂ S be a subset of S. The closure of C is the smallest closed set containing
C.
Example. Let us consider a Markov chain with stochastic matrix P given by

    P = [ 1/2  1/2   0    0    0    0  ]
        [  0    0    1    0    0    0  ]
        [ 1/3   0    0   1/3  1/3   0  ]
        [  0    0    0   1/2  1/2   0  ]
        [  0    0    0    0    0    1  ]
        [  0    0    0    0    1    0  ]

The associated diagram is:

[Diagram: self-loop at 1 (1/2) and arrow 1 → 2 (1/2); arrow 2 → 3 (1); arrows 3 → 1, 3 → 4 and 3 → 5 (1/3 each); self-loop at 4 (1/2) and arrow 4 → 5 (1/2); arrow 5 → 6 (1); arrow 6 → 5 (1).]

By looking at the diagram we can easily see that S is decomposed into three equivalence classes:
S = {1, 2, 3} ∪ {4} ∪ {5, 6}. In addition, the class {5, 6} is closed. Other closed sets are {4, 5, 6}
and the whole state space S = {1, 2, 3, 4, 5, 6}.
The following results allow one to identify the closed sets rather easily.

Lemma 6. Let K ⊂ S be a closed set and Ci an equivalence class. Then exactly one of the following
conditions holds:

• K ∩ Ci = ∅;
• Ci ⊂ K.

Proof: If K ∩ Ci ≠ ∅ then there exists a state i ∈ Ci such that i ∈ K. We have to prove that for
any j ∈ Ci , i.e. for any j ∈ S such that j ∼ i, we have j ∈ K. Indeed, if it were not true, i.e. if
j ∉ K, then by definition of closed set p^(n)_ij = 0 ∀n ∈ N. In this case i and j do not communicate,
in contradiction with the assumption that they belong to the same equivalence class.

Lemma 7. Let i ∈ S be a generic state. The set Ki := {j ∈ S : i → j} of the states to which
i leads is closed. It is the smallest closed set containing i (hence it coincides with the closure of
{i}).

Proof: Ki is a closed set. Indeed, if j ∈ Ki and l ∉ Ki then for any n ∈ N we have p^(n)_jl = 0: if
it were not true then there would exist an n̄ ∈ N such that p^(n̄)_jl > 0. On the other hand, since
j ∈ Ki , there exists an m ∈ N such that p^(m)_ij > 0. By applying the Chapman-Kolmogorov equation
we have

    p^(m+n̄)_il = ∑_{s∈S} p^(m)_is p^(n̄)_sl ≥ p^(m)_ij p^(n̄)_jl > 0.

This result shows that l ∈ Ki , hence we have obtained a contradiction.
In order to show that Ki coincides with the closure of {i} we have to prove that any other closed
set containing i must contain all the states j ∈ S to which i leads (i → j). Let K be a closed set
such that i ∈ K, and let j ∈ S be such that i → j. If j ∉ K then for any s ∈ K we would have
p^(n)_sj = 0 for all n ∈ N. In particular, for s = i we obtain that i doesn't lead to j, in contradiction
with the assumption j ∈ Ki .

Definition 30. A Markov chain is said to be irreducible if S and ∅ are the only closed sets.

The following proposition gives an interesting characterization of irreducible Markov chains.

Proposition 1. The following statements are equivalent:
1. S and ∅ are the only closed sets.
2. There exists a unique equivalence class.

Proof:
1. ⇒ 2. Let us assume that S and ∅ are the only closed sets and let us show that there exists a
unique equivalence class. By lemma 7 we know that for any i ∈ S the set Ki := {j ∈ S : i → j} is
closed. Since Ki ≠ ∅, by 1. we get that Ki = S ∀i ∈ S. This means that for any i ∈ S and
j ∈ S we have i → j, hence i ∼ j ∀i, j ∈ S.
2. ⇒ 1. Let us assume that there exists a unique equivalence class C and show that S and ∅ are
the only closed sets. Let K ⊂ S be a non-empty closed set and let i ∈ K. By lemma 6 we have
that the equivalence class Ci = {j ∈ S : j ∼ i} is included in K. By 2. we have Ci = S, hence
K = S.

According to Proposition 1, a Markov chain is irreducible if and only if any state i ∈ S leads
to all the states j ∈ S.
Exercise. For each of the following Markov chains, draw the diagram associated to the stochastic
matrix and determine whether the chain is irreducible.

1. Random walk on the line. Let S = Z and pij of the form

       p_{i(i+1)} = p,    p_{i(i−1)} = 1 − p,    pij = 0 if j ≠ (i + 1), (i − 1),

   where p ∈ [0, 1] is a parameter which gives the probability of moving forward in one step.

2. Random walk with absorbing barriers. S = {1, 2, . . . , N },

       P = [  1    0    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   0    1  ]

   where p ∈ (0, 1).

3. Random walk with reflecting barriers. S = {1, 2, . . . , N },

       P = [ 1−r   r    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   r   1−r ]

   where p ∈ (0, 1) and r ∈ (0, 1].

4. Cyclic random walk. S = {1, 2, . . . , N },

       P = [  0    p    0   ...  1−p ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  p    0   ...  1−p   0  ]

   with p ∈ (0, 1).

Remark 9. If a Markov chain is reducible, then there exists a closed set K (different from S and
∅). In particular, if the set of states S has a finite number of elements, then by sorting them in
such a way that the first rows (and columns) of the stochastic matrix P correspond exactly to the
states belonging to K, P assumes the form:

    P = [ Q  0 ]
        [ R  S ]    (5.26)

where Q is an M × M matrix (M being the cardinality of K) whose elements are the transition
probabilities within the closed set K, i.e. Qij = pij , i, j ∈ K. Moreover, Q is still a stochastic
matrix since for any row index i ∈ K, i = 1, ..., M , we have ∑_{j=1}^M Qij = 1. The stochastic matrix
Q contains the transition probabilities within K. In particular, if for some time n0 ∈ N we have
Xn0 ∈ K, then Xn ∈ K for any n > n0 and the evolution of the distribution of the random
variables Xn can be computed by means of the powers of the matrix Q. Indeed, if P is of the form
(5.26), then its n-th power assumes the following form:

    P^n = [ Q^n  0  ]
          [  ∗  S^n ]

5.4.2 Closed sets and recurrent states

Theorem 17. A recurrent equivalence class is closed.

Proof: Ad absurdum, let us assume that C is not closed. In this case, by the equivalence of (5.24)
and (5.25), there exist two states i ∈ C and j ∉ C such that pij > 0. Since i → j but j ∉ C, then
necessarily j does not lead to i, which means p^(n)_ji = 0 for any n ∈ N. On the other hand, since i
is a recurrent state, we have:

    1 = f*_ii = P(∃n ≥ 1: Xn = i|X0 = i)
      = P({∃n ≥ 1: Xn = i} ∩ {X1 = j}|X0 = i) + P({∃n ≥ 1: Xn = i} ∩ {X1 ≠ j}|X0 = i)
      ≤ P(∃n ≥ 2: Xn = i|X1 = j, X0 = i) P(X1 = j|X0 = i) + P(X1 ≠ j|X0 = i)
      = P(∪_{n≥2} {Xn = i}|X1 = j, X0 = i) pij + (1 − pij)
      ≤ pij ∑_{n≥2} P(Xn = i|X1 = j, X0 = i) + (1 − pij)
      = pij ∑_{n≥2} P(Xn = i|X1 = j) + (1 − pij)
      = pij ∑_{n≥2} p^(n−1)_ji + (1 − pij) = (1 − pij) < 1,

and we have obtained a contradiction.
In particular, according to this result it is impossible to reach a transient state starting from a
recurrent state, i.e. if i is recurrent and j is transient then i doesn’t lead to j.
The following theorem gives a partial inversion of theorem 17.
Theorem 18. Let C be a transient equivalence class with a finite number of states. Then C is
not closed.
We postpone the proof and give a technical lemma.
Lemma 8. Let j ∈ S be a transient state. Then for any i ∈ S

    lim_{n→∞} p^(n)_ij = 0.    (5.27)

Proof: In the case where i = j the result follows from Theorem 15. Indeed, j is transient
if and only if ∑_n p^(n)_jj < +∞, and by a standard result of calculus the convergence of the series
∑_n p^(n)_jj implies lim_{n→∞} p^(n)_jj = 0.
In the case where i ≠ j, the same argument can be applied. Indeed, by using the Markov property we
have that

    p^(n)_ij = ∑_{ν=1}^n f^(ν)_ij p^(n−ν)_jj .    (5.28)

Indeed:

    p^(n)_ij = P(Xn = j|X0 = i) = P(∪_{ν=1}^n {Xn = j, Xν = j, Xν−1 ≠ j, . . . , X1 ≠ j}|X0 = i)
      = ∑_{ν=1}^n P(Xn = j, Xν = j, Xν−1 ≠ j, . . . , X1 ≠ j|X0 = i)
      = ∑_{ν=1}^n P(Xn = j|Xν = j, Xν−1 ≠ j, . . . , X1 ≠ j, X0 = i) · P(Xν = j, Xν−1 ≠ j, . . . , X1 ≠ j|X0 = i)
      = ∑_{ν=1}^n P(Xn = j|Xν = j) · P(Xν = j, Xν−1 ≠ j, . . . , X1 ≠ j|X0 = i) = ∑_{ν=1}^n p^(n−ν)_jj f^(ν)_ij .

Hence for any M ≥ 1

    ∑_{n=1}^M p^(n)_ij = ∑_{ν=1}^M f^(ν)_ij ∑_{n=0}^{M−ν} p^(n)_jj    (5.29)
                      ≤ (∑_{ν=1}^M f^(ν)_ij) · (∑_{n=0}^M p^(n)_jj).    (5.30)

This inequality allows us to prove that the series ∑_{n=1}^∞ p^(n)_ij is convergent whenever ∑_{n=1}^∞ p^(n)_jj is
convergent. In particular, if j is transient then ∑_{n=1}^∞ p^(n)_ij < +∞ for any i ∈ S and we can
conclude that p^(n)_ij → 0 as n → ∞.

Proof: [of theorem 18] Let C be a transient equivalence class with a finite number of elements. If
C were closed, then for any i ∈ C and j ∉ C we would have p^(n)_ij = 0 ∀n ∈ N. On the other hand,
since P^n is for any n ∈ N a stochastic matrix, we have

    ∑_{k∈C} p^(n)_ik = 1.    (5.31)

But if k is a transient state, by lemma 8 lim_{n→∞} p^(n)_ik = 0 for any i ∈ C. Hence, taking the limit
for n → ∞ on both sides of (5.31) (the sum over the finite set C commutes with the limit), we obtain
0 = 1, a contradiction.
According to theorems 17 and 18 an equivalence class with a finite number of states is recurrent if
and only if it is closed.
Example 17. Let us consider the Markov chain with state space S = {0, 1, 2, 3, 4, 5} and stochastic
matrix

    P = [  1    0    0    0    0    0  ]
        [ 1/4  1/2  1/4   0    0    0  ]
        [  0   1/5  2/5  1/5   0   1/5 ]
        [  0    0    0   1/6  1/3  1/2 ]
        [  0    0    0   1/2   0   1/2 ]
        [  0    0    0   1/4   0   3/4 ]

By checking the associated diagram (which is left as an exercise) it is easy to see that S can be
partitioned into the disjoint union of three equivalence classes:

    S = {0} ∪ {1, 2} ∪ {3, 4, 5}

The classes {0} and {3, 4, 5} are closed, hence their elements are recurrent, while the class {1, 2}
is not closed, hence its elements are transient.
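As a check, the communication_classes sketch from section 5.4 reproduces this partition (Python; fractions entered as floats):

```python
import numpy as np

P = np.array([[1,    0,   0,   0,   0,   0  ],
              [1/4, 1/2, 1/4,  0,   0,   0  ],
              [0,   1/5, 2/5, 1/5,  0,  1/5 ],
              [0,    0,   0,  1/6, 1/3, 1/2 ],
              [0,    0,   0,  1/2,  0,  1/2 ],
              [0,    0,   0,  1/4,  0,  3/4 ]], dtype=float)

print(communication_classes(P))   # [[0], [1, 2], [3, 4, 5]]
```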

The most interesting and non-trivial examples are those where the state space S has an infinite
(countable) number of elements. We study in some detail the random walk on the line.

Example 18. Let us consider the random walk on the line described by a Markov chain with S = Z
and pij given by

    p_{i(i+1)} = p,    p_{i(i−1)} = 1 − p,    pij = 0 if j ≠ (i + 1), (i − 1),

where the parameter p is assumed to be strictly positive and strictly less than 1, p ∈ (0, 1). Under
these assumptions the chain is irreducible; there is a unique equivalence class, hence either all states
i ∈ Z are transient or all states are recurrent.

[Diagram: states ..., i−1, i, i+1, ... on a line; each state moves to the right neighbour with probability p and to the left neighbour with probability 1−p.]

It is sufficient to study the recurrence/transience property of one particular state to determine the
same property for the other states. Let us consider, for notational simplicity, the origin i = 0 and
study the convergence of the series ∑_n p^(n)_00. In fact, by the particular form of the stochastic matrix,
in order to come back to the origin we need to take an equal number of steps forward and backward,
hence p^(n)_00 = 0 whenever n is odd. In the case the number of steps is even, we have:

    p^(2n)_00 = (2n choose n) p^n (1 − p)^n = ((2n)! / (n!)^2) p^n (1 − p)^n .

By Stirling's formula

    n! ∼ √(2πn) (n/e)^n ,    n → ∞,

we get the following asymptotic equivalence for n → ∞:

    p^(2n)_00 ∼ (4p(1 − p))^n / √(πn) ,    n → ∞.

If p = 1/2 then 4p(1 − p) = 1 and the series ∑_n p^(2n)_00 has infinite sum. In this case the state i = 0
and all the states of the irreducible Markov chain are recurrent.
If p ∈ (0, 1), p ≠ 1/2, then 4p(1 − p) < 1 and the series ∑_n p^(2n)_00 is convergent, hence all the states
of the irreducible Markov chain are transient.
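This dichotomy can be seen numerically from partial sums of p^(2n)_00 computed with the exact binomial formula. A minimal sketch (Python; the truncation N is our choice, so the output only illustrates the divergent/convergent behaviour):

```python
def partial_sum_p00(p, N):
    """sum_{n=1}^N p00^(2n), with p00^(2n) = C(2n, n) (p(1-p))^n, computed iteratively."""
    total, term = 0.0, 1.0     # term = C(2n, n) (p(1-p))^n, starting at n = 0
    for n in range(N):
        # C(2(n+1), n+1) = C(2n, n) * (2n+1)(2n+2) / (n+1)^2
        term *= (2 * n + 1) * (2 * n + 2) / ((n + 1) ** 2) * p * (1 - p)
        total += term
    return total

for N in (10**2, 10**4, 10**6):
    print(N, partial_sum_p00(0.5, N), partial_sum_p00(0.45, N))
# The p = 1/2 column keeps growing (like sqrt(N)); the p = 0.45 column stabilizes.
```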

5.5 Invariant (or stationary) distribution

Let {Xn } be a Markov chain with stochastic matrix P . Let us recall that the symbol λ^n denotes
the row vector associated to the distribution of the random variable Xn :

    λ^n_i := P(Xn = i),    i ∈ S.

By construction, the vector λ^n satisfies the following conditions:

    λ^n_i ≥ 0 ∀i ∈ S,    ∑_{i∈S} λ^n_i = 1.

Moreover, given the distribution λ^n at time n we can compute the distribution at future times in
terms of the formula λ^{n+1} = λ^n P . More generally,

    λ^{n+m} = λ^n P^m .    (5.32)

We define a stationary or invariant distribution as a probability distribution on S which is
invariant under the time evolution described by (5.32).

Definition 31. A probability distribution λ on S is said to be invariant or stationary if

    λ = λP    (5.33)

We remark that if λ is an invariant distribution, then by iterating (5.33) we get

    λ = λP^n    ∀n ∈ N    (5.34)

[Figure 5.1: Difference between a generic distribution and an invariant one. Panel (a): a non-invariant distribution, whose components λi change in time t; panel (b): an invariant distribution, whose components stay constant.]

If the set of states has a finite number of elements, #S = N , then the invariant distribution, if
it exists, is described by a row vector λ = (λ1 , ..., λN ) satisfying the following conditions:

1. λi ≥ 0, i = 1, ..., N ;

2. ∑_{i=1}^N λi = 1;

3. λ = λP .

In other words, λ is a left eigenvector of P with eigenvalue 1. Moreover, all its components are
non-negative and it fulfils the normalization condition ∑_{i=1}^N λi = 1. In fact, the computation of
this vector can be reduced to a problem of linear algebra.
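Concretely, the left eigenvector of eigenvalue 1 can be extracted with standard linear algebra routines. A minimal sketch (Python with numpy; eigenvectors are only defined up to scale, so we normalize at the end):

```python
import numpy as np

def invariant_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a probability vector."""
    # Left eigenvectors of P are right eigenvectors of P.T.
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))     # index of the eigenvalue closest to 1
    lam = np.real(vecs[:, k])
    return lam / lam.sum()

P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(invariant_distribution(P))           # (0.4, 0.6), i.e. (beta, alpha)/(alpha + beta)
```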
Example 19. Let us consider the simple case where S = {1, 2} and the stochastic matrix has the
following form

    P = [ 1−α    α  ]
        [  β   1−β  ]

where α ∈ [0, 1], β ∈ [0, 1].
The stochastic matrix P can be equivalently described by the two-state diagram of section 5.2: self-loops
of probability 1−α and 1−β, an arrow 1 → 2 labelled α and an arrow 2 → 1 labelled β.
The invariant distribution is associated to a row vector λ = (λ1 , λ2 ), left eigenvector of P with
eigenvalue 1:

    (λ1  λ2) [ 1−α    α  ]  =  (λ1  λ2)
             [  β   1−β  ]

or, equivalently, the solution of the linear system:

    −αλ1 + βλ2 = 0
     αλ1 − βλ2 = 0

By imposing the normalization condition λ1 + λ2 = 1 we get:

• if α + β ≠ 0,

      λ1 = β/(α + β),    λ2 = α/(α + β).

  In this case we have existence and uniqueness of the invariant distribution;

• if α + β = 0 then P is the identity matrix and any probability distribution is invariant. In
  this case we have existence but not uniqueness of the invariant distribution.
Exercise. Compute the invariant distribution (if it exists) of the following Markov chains.

1. S = {1, 2, 3},

       P = [ 1/3  1/3  1/3 ]
           [ 1/4  1/2  1/4 ]
           [ 1/6  1/3  1/2 ]

2. Symmetric random walk with reflecting barriers: S = {1, 2, . . . , N },

       P = [ 1/2  1/2   0   ...   0  ]
           [ 1/2   0   1/2  ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...  1/2  1/2 ]
3. Random walk with absorbing barriers: S = {1, 2, . . . , N },

       P = [  1    0    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   0    1  ]

   where p ∈ (0, 1).


4. Cyclic random walk: S = {1, 2, . . . , N },

       P = [  0    p    0   ...  1−p ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  p    0   ...  1−p   0  ]

   where p ∈ (0, 1).


In the case where the number of states N is high, in particular when S has infinite states,
then the technique applied in the example above is no longer feasible. In the following we are
going to present a number of results and techniques that allow to determine whether and invariant
distribution exists and is unique without explicitly computing it.
The first result shows that if an invariant distribution exists, then it is concentrated on the
recurrent states. For future use, we give the following lemma.
Lemma 9. Let {a^(n)}n∈N be a sequence of functions a^(n) : S → R. We shall use the notation
a^(n)(i) ≡ a^(n)_i. If the following conditions hold:

    ∃ lim_{n→+∞} a^(n)_i ,    ∀i ∈ S;

    |a^(n)_i| ≤ bi , ∀i ∈ S, ∀n ∈ N,  with  ∑_{i∈S} bi < +∞;

then

    lim_{n→+∞} ∑_{i∈S} a^(n)_i = ∑_{i∈S} lim_{n→+∞} a^(n)_i .

Theorem 19. Let λ be an invariant distribution for the stochastic matrix P . Then λj = 0 for
any transient state j ∈ S.

Proof: Since by assumption j is transient, we have lim_{n→∞} p^(n)_ij = 0 for any i ∈ S. On the other
hand, since by assumption λ is an invariant distribution, the following holds:

    λj = ∑_{i∈S} λi p^(n)_ij ,    ∀n ∈ N.

By letting n → ∞ and taking the limit under the sum thanks to lemma 9 and the uniform bound
|λi p^(n)_ij| ≤ λi ∀n ∈ N, with ∑_i λi < ∞, we obtain:

    λj = ∑_{i∈S} λi lim_{n→∞} p^(n)_ij = 0.
Example 20. In the case of example 18, if p ≠ 1/2 all states are transient, hence an invariant
distribution cannot exist.

5.6 Positive recurrent and null recurrent states

Let us consider a Markov chain with stochastic matrix P . For any j ∈ S we have already defined
the first return time to j as the random variable Tj : Ω → N ∪ {+∞} defined as

    Tj (ω) := inf{n ≥ 1 : Xn (ω) = j}.

For any i ∈ S we shall denote by Ei the expectation with respect to the probability measure Pi =
P(· |X0 = i).
We shall define the mean return time to i as the expected value of Ti given X0 = i:

    mi := Ei [Ti ].

Clearly, if i is a transient state the random variable Ti assumes the value +∞ on a set of strictly
positive probability Pi , and mi = +∞.
On the other hand, if i is a recurrent state then Pi (Ti = +∞) = 0 and mi is given by

    mi = Ei [Ti ] = ∑_{n=1}^∞ n Pi (Ti = n) = ∑_{n=1}^∞ n f^(n)_ii .

Definition 32. A recurrent state i ∈ S is said to be:

• positive recurrent if mi < +∞;

• null recurrent if mi = +∞.

Let us also recall the definition of the number of returns to the state i, i.e. the random variable
Ni : Ω → N ∪ {+∞} defined as:

    Ni (ω) := ∑_{n≥1} 1_{Xn=i} (ω).

We shall also consider the following random variables:

1. N^n_i := ∑_{m=1}^n 1_{Xm=i} . It gives the number of visits to the state i during the first n steps
   (after leaving the initial state);

2. N^n_i / n = (1/n) ∑_{m=1}^n 1_{Xm=i} . It gives the proportion of time before n spent in state i.

In particular, since Ei [1_{Xm=j}] = Pi (Xm = j) = p^(m)_ij , we have:

    Ei [N^n_j / n] = (1/n) ∑_{m=1}^n p^(m)_ij .
Theorem 20. Let j be a recurrent state. Then for any i ∈ S:

    N^n_j / n  →  1_{Tj<+∞} / mj    almost surely,

and

    (1/n) ∑_{m=1}^n p^(m)_ij → f*_ij / mj .

Corollary 1. If C is a closed set of recurrent states which does not contain proper closed sets, then
for any i, j ∈ C:

    (1/n) ∑_{m=1}^n p^(m)_ij → 1/mj    (5.35)

Moreover, if P(X0 ∈ C) = 1 then N^n_j / n → 1/mj almost surely.

In particular, if {Xn } is an irreducible recurrent Markov chain, then Eq. (5.35) holds for any
i, j ∈ S. If j is a null recurrent state, then the mean proportion of time before n spent in state j
converges to 0 as n → ∞, while if j is a positive recurrent state the same quantity converges to
a strictly positive value. We do not give the detailed proof of this result, which relies upon the
strong law of large numbers and the strong Markov property, but limit ourselves to applying it.
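The convergence N^n_j / n → 1/mj can be observed by simulation. A minimal sketch for the two-state chain of example 19 with α = 0.3, β = 0.2 (Python with numpy; here m2 = (α + β)/α = 5/3, so the occupation fraction of state 2 should approach 1/m2 = 0.6):

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
n_steps = 200_000

x, visits = 0, 0                      # indices 0 and 1 stand for states 1 and 2
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])
    visits += (x == 1)

print(visits / n_steps)               # approx 0.6 = 1/m_2
```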

The following result shows that if an invariant distribution exists, it is concentrated on positive
recurrent states.

Theorem 21. Let λ be a stationary distribution and j a null recurrent state. Then λj = 0.

Proof: If λ is a stationary distribution, then for any m ∈ N we have λj = ∑_{i∈S} p^(m)_ij λi . By taking
the sum over m = 1, ..., n and dividing by n we get

    λj = ∑_{i∈S} λi (1/n) ∑_{m=1}^n p^(m)_ij .

Since (1/n) ∑_{m=1}^n p^(m)_ij ≤ 1, by lemma 9 we can take the limit under the sum, and by theorem 20 we
obtain λj = 0.
According to this result, if there are no positive recurrent states then an invariant distribution
cannot exist.
The following theorem shows that null recurrence and positive recurrence are class properties.

Theorem 22. If i is positive recurrent and i ∼ j then j is positive recurrent.

Proof: Since i is recurrent and i ∼ j, then j is recurrent by theorem 16. Moreover, since i ∼ j
there exist n1 , n2 ∈ N such that p^(n1)_ji > 0 and p^(n2)_ij > 0. Hence, for any m ∈ N we have:

    p^(n1+m+n2)_jj ≥ p^(n1)_ji p^(m)_ii p^(n2)_ij .

By summing over m = 1, ..., n and dividing by n we get

    (1/n) ∑_{m=1}^{n1+n+n2} p^(m)_jj − (1/n) ∑_{m=1}^{n1+n2} p^(m)_jj ≥ p^(n1)_ji p^(n2)_ij (1/n) ∑_{m=1}^n p^(m)_ii .

By letting n → ∞, the left hand side converges to 1/mj while the right hand side converges to
p^(n1)_ji p^(n2)_ij / mi , hence

    1/mj ≥ p^(n1)_ji p^(n2)_ij / mi > 0,

and we can deduce mj < +∞.
By this result, we can conclude that in an irreducible Markov chain all states are of the same type
(transient, positive recurrent, null recurrent).
Theorem 23. Let C ⊂ S be a closed finite set. Then C contains at least one positive recurrent state.

Proof: Since by assumption C is closed, we get

    ∑_{j∈C} p^(m)_ij = 1,    i ∈ C, m ∈ N.

By summing over m = 1, ..., n and dividing by n we get:

    ∑_{j∈C} (1/n) ∑_{m=1}^n p^(m)_ij = 1,    i ∈ C, n ∈ N.

If all states j ∈ C were transient or null recurrent, then we would have (1/n) ∑_{m=1}^n p^(m)_ij → 0 for all
j ∈ C, hence, C being finite,

    1 = lim_{n→∞} ∑_{j∈C} (1/n) ∑_{m=1}^n p^(m)_ij = ∑_{j∈C} lim_{n→∞} (1/n) ∑_{m=1}^n p^(m)_ij = 0,

obtaining a contradiction.
According to the previous result, all states j ∈ C of a closed equivalence class C with a finite
number of states must be positive recurrent.
Remark 10. In an irreducible Markov chain with a finite number of states, all states must be
positive recurrent.

Remark 11. If #S < ∞ then there cannot exist null recurrent states.

Null recurrent states can be found only in Markov chains with infinitely many possible
states, as the following example shows.
Example: Random walk on the line. Let us consider again the case of example 18, where
S = Z and the transition probabilities are given by

    p_{i(i+1)} = p,    p_{i(i−1)} = 1 − p,    pij = 0 if j ≠ (i + 1), (i − 1),

where the parameter p ∈ [0, 1].

[Diagram: states ..., i−1, i, i+1, ... on a line; right steps with probability p, left steps with probability 1−p.]

• If either p = 0 or p = 1 then the equivalence classes contain just one element, i.e. S is
  partitioned into the disjoint union of equivalence classes that are singletons, S = ∪_{i∈Z} {i}.
  Since the classes are not closed, all states are transient.

• If p ∈ (0, 1) then the chain is irreducible. In this case:

  – If p ≠ 1/2 then, as proved in example 18, all states are transient.

  – If p = 1/2 then all states are recurrent. In order to determine whether they are positive
    recurrent or null recurrent, we have to determine whether the limit lim_{n→+∞} (1/n) ∑_{m=1}^n p^(m)_ii
    vanishes or is strictly positive. Since the chain is irreducible, it is sufficient to study a
    particular state, for example i = 0. As shown in example 18, the transition probabilities
    {p^(n)_00}n vanish for n odd, while for n even they are equal to

        p^(2n)_00 = ((2n)! / (n!)^2) p^n (1 − p)^n ,

    which for p = 1/2 behaves, by Stirling's formula, as

        p^(2n)_00 ∼ 1/√(πn) ,    n → ∞.

    Hence lim_{n→∞} p^(n)_00 = 0. By Eq. (C.2) we get lim_{n→∞} (1/n) ∑_{m=1}^n p^(m)_00 = 0. We can infer
    that 0 and all the other states are null recurrent.

We are now ready to state the main result of this section.

Theorem 24. Let {Xn } be an irreducible positive recurrent Markov chain. Then there exists a
unique invariant distribution λ, given by

    λj = 1/mj ,    j ∈ S    (5.36)

We give a sketch of the proof in the appendix. Here we limit ourselves to showing some examples
and applications.
First of all, it is interesting to remark that Eq. (5.36) allows one to compute the value of the mean
return time to the state j for any j ∈ S, provided that the assumptions of theorem 24 are fulfilled and
we can compute explicitly the values λj . Indeed, in the case where the cardinality of S is finite, the
computation of λ can be reduced to a problem of linear algebra.

Let us consider first of all the Markov chain with two states described in example 19. In that
case, if α, β ≠ 0 the chain is irreducible and positive recurrent. The invariant distribution λ is given
by (λ1 , λ2 ) = (β/(α + β), α/(α + β)). Hence the mean return times m1 , m2 to the states 1, 2 are
equal to

    m1 = (α + β)/β ,    m2 = (α + β)/α .

Bistochastic matrices
A stochastic matrix (Pij )i,j∈S , with #S = N , is said to be bistochastic if

    ∀j ∈ S    ∑_{i∈S} Pij = 1.

In this case it is easy to check that the uniform distribution on S, given by λj = 1/N ∀j ∈ S, is an
invariant distribution, i.e.

    λj = ∑_i λi Pij ,    ∀j ∈ S.

A particular example of a Markov chain associated to a bistochastic matrix is the random walk on
a cyclic graph, where S = {1, 2, . . . , N },

    P = [  0    p    0   ...  1−p ]
        [ 1−p   0    p   ...   0  ]
        [ ...  ...  ...  ...  ... ]
        [  p    0   ...  1−p   0  ]

with p ∈ (0, 1). In this case the chain is irreducible and positive recurrent. Since the invariant
distribution is given by λj = 1/N ∀j ∈ S, the mean return time to each state is given by mj = N .

Detailed balance condition

A probability distribution λ = (λj )j∈S and a stochastic matrix P = (Pij )i,j∈S are said to be in
detailed balance if the following condition holds:

    λi Pij = λj Pji ,    ∀i, j ∈ S    (5.37)

It is easy to verify that if (5.37) holds, then λ is an invariant distribution for P , i.e.
λj = ∑_i λi Pij : indeed, summing (5.37) over i gives ∑_i λi Pij = λj ∑_i Pji = λj . Remarkably,
there are several examples where condition (5.37) holds. The following subsection describes one
of them.

Symmetric random walk on a connected graph

Let us consider a graph, i.e. a set of points called vertices joined by edges. To any vertex we shall
associate a state i ∈ S and consider its valency vi , defined as the number of edges at i. We shall
consider only connected graphs, i.e. graphs where any couple of vertices is joined by a suitable
chain of edges. We can associate to the graph a stochastic matrix P with transition probabilities
defined as

    pij = { 1/vi  if i and j are connected by an edge
          { 0     otherwise
We present here below some examples of graphs and of the diagrams associated to the corresponding
Markov chains.

[Diagrams: small connected graphs (e.g. a four-vertex cycle) together with the associated transition diagrams, where each edge at vertex i is traversed with probability 1/vi — 1/2 from a vertex of valency 2, 1/3 from a vertex of valency 3.]
If the graph is connected, then it is associated to an irreducible Markov chain. If the number of
states is finite, then all states are positive recurrent and there exists a unique invariant distribution
λ given by λi = vi / ∑_{j∈S} vj . The simplest way to show that such a distribution is stationary is to
check that λ and P are in detailed balance:

    λi Pij = λj Pji .

The same argument works in the case where S contains infinitely many states and ∑_{j∈S} vj < +∞.
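A minimal sketch checking this on a small graph (Python with numpy; the adjacency matrix of a 4-cycle with one diagonal edge is our choice of example):

```python
import numpy as np

# Adjacency matrix of a connected graph: a 4-cycle plus the diagonal 1-3.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])

v = A.sum(axis=1)            # valencies
P = A / v[:, None]           # p_ij = 1/v_i on edges, 0 otherwise
lam = v / v.sum()            # candidate invariant distribution lam_i = v_i / sum_j v_j

# Detailed balance: the matrix (lam_i P_ij) is symmetric, hence lam = lam P.
assert np.allclose(lam[:, None] * P, (lam[:, None] * P).T)
print(lam, lam @ P)          # the two vectors coincide
```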

5.6.1 The case of reducible Markov chains

If we consider a reducible Markov chain there will be more than one equivalence class. We have
to consider essentially three cases:

• There are no positive recurrent classes. In this case there cannot exist any invariant distri-
  bution.

• There is a unique positive recurrent class. In this case there exists a unique invariant distri-
  bution, concentrated on it.

• There is more than one positive recurrent class. In this case there exist infinitely many invari-
  ant distributions.

Examples

1. Random walk with absorbing barriers: S = {1, 2, . . . , N }, p ∈ (0, 1) and stochastic matrix P
   given by:

       P = [  1    0    0   ...   0  ]
           [ 1−p   0    p   ...   0  ]
           [ ...  ...  ...  ...  ... ]
           [  0    0   ...   0    1  ]

   In this case the set of states S is partitioned into three equivalence classes S = {1} ∪ {N } ∪
   {2, ..., N − 1}, where {2, ..., N − 1} is not closed (hence it is transient) and the two classes {1}
   and {N } are closed and positive recurrent. We can easily construct two invariant distributions
   λ and µ, concentrated on {1} and {N } respectively and given by λ = (1, 0, ..., 0) and µ =
   (0, ..., 0, 1). More generally, any convex combination of λ and µ of the form αλ + (1 − α)µ =
   (α, 0, ..., 0, 1 − α), α ∈ [0, 1], is an invariant distribution.

2. #S = 6 and stochastic matrix P given by

       P = [ 1/2  1/2   0    0    0    0  ]
           [  0    0    1    0    0    0  ]
           [ 1/3   0    0   1/3  1/3   0  ]
           [  0    0    0   1/2  1/2   0  ]
           [  0    0    0    0    0    1  ]
           [  0    0    0    0    1    0  ]

   There are three equivalence classes S = {1, 2, 3} ∪ {4} ∪ {5, 6}, where {5, 6} is closed (hence
   positive recurrent) while the other classes are transient. There exists a unique invariant
   distribution λ, concentrated on {5, 6}, given by λ = (0, 0, 0, 0, 1/2, 1/2).

5.7 Long time behaviour and convergence to the stationary distribution

Let us consider a Markov chain with stochastic matrix P .

Definition 33. A probability distribution λ on S is said to be asymptotic if for any choice of an
initial distribution λ^0 the distribution λ^n of the random variable Xn converges to λ:

    lim_{n→∞} λ^n_j = λj ,    ∀j ∈ S.

It can be rather easily proved that a sufficient condition for a probability distribution λ to be
asymptotic is

    lim_{n→∞} p^(n)_ij = λj ,    ∀i, j ∈ S    (5.38)

Indeed, given an arbitrary initial distribution λ^0 , we have:

    lim_{n→∞} λ^n_j = lim_{n→∞} ∑_{i∈S} λ^0_i p^(n)_ij = ∑_{i∈S} λ^0_i (lim_{n→∞} p^(n)_ij) = ∑_{i∈S} λ^0_i λj = λj ,

where we can pass the limit under the sum by dominated convergence (see lemma 9) since:

    |λ^0_i p^(n)_ij| ≤ λ^0_i ∀n ∈ N,    ∑_{i∈S} λ^0_i = 1 < ∞.

We cannot expect that in general condition (5.38) holds. Let us consider the stochastic matrix
P given by

    P = [ 0  1 ]
        [ 1  0 ]

It is easy to check that P^n = P if n is odd and P^n = I if n is even. In this case the n-step
transition probabilities p^(n)_ij do not admit a limit.
An important property related to the existence of the limit lim_{n→∞} p^(n)_ij is presented in the
following definition.
Given a state i ∈ S, let us consider the set {n ≥ 1 : p^(n)_ii > 0}. Let di be the positive integer
defined as:

    di := G.C.D.{n ≥ 1 : p^(n)_ii > 0}.

Definition 34. If di ≥ 2 then the state i is said to be periodic of period di .
If p^(n)_ii = 0 for all n ≥ 1, or if di = 1, then the state i is said to be aperiodic.

Examples:

• If pii > 0 then di = 1 and i is aperiodic.

• In the case of the random walk on the line (example 18) the state 0 is periodic of period 2.
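The period can be computed directly from the definition by inspecting the diagonal entries of the first few matrix powers. A minimal sketch (Python with numpy; truncating the set {n : p^(n)_ii > 0} at n_max is our simplification, adequate for the small examples above):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, n_max=50):
    """G.C.D. of {n >= 1 : (P^n)_{ii} > 0}, truncated at n_max."""
    ns, Q = [], np.eye(len(P))
    for n in range(1, n_max + 1):
        Q = Q @ P
        if Q[i, i] > 1e-12:
            ns.append(n)
    return reduce(gcd, ns) if ns else None   # None: no return observed at all

flip = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(flip, 0))                        # 2: the two-state flip chain is periodic
```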
The following result shows that the period is a class property.

Theorem 25. Let i, j ∈ S and i ∼ j. Then di = dj .

In particular, in an irreducible Markov chain all states have the same period. The proof of
theorem 25 is presented in the appendix.
The following result relates periodicity to the existence of the asymptotic distribution.

Theorem 26. Let {Xn } be an irreducible, positive recurrent, aperiodic Markov chain. Then for
any i, j ∈ S

    lim_{n→∞} p^(n)_ij = λj    (5.39)

where λj = 1/mj , j ∈ S, is the unique stationary distribution.
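Numerically, theorem 26 says that for such a chain every row of P^n converges to the stationary vector. A minimal sketch (Python with numpy; the two-state chain with α = 0.3, β = 0.2 again, contrasted with the periodic flip chain):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(np.linalg.matrix_power(P, 50))
# Both rows are close to (0.4, 0.6): the limit does not depend on the starting state.

flip = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.linalg.matrix_power(flip, 50), np.linalg.matrix_power(flip, 51))
# For the periodic flip chain the powers oscillate between I and P: no limit exists.
```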

5.8 Absorption probabilities and branching processes

Let {Xn } be a Markov chain, S its state space and j ∈ S an absorbing state.
Let us denote by {hi }i∈S the vector of the absorption probabilities, defined as

    hi := P(Tj < ∞|X0 = i).

Theorem 27. The vector {hi }i∈S is the minimal non-negative solution of the system:

    xj = 1
    xi = ∑_{k∈S} pik xk ,    i ≠ j    (5.40)

Proof: The proof is divided into two steps.

1. {hi } is a solution of system (5.40).
   Indeed, it is simple to see that hj = P(Tj < ∞|X0 = j) = 1, since j is a recurrent state (any
   absorbing state is trivially recurrent).
   If i ≠ j then we have:

       hi = P(Tj < ∞|X0 = i) = ∑_{k∈S} P(Tj < ∞|X1 = k, X0 = i) P(X1 = k|X0 = i)
          = ∑_{k∈S} P(Tj < ∞|X1 = k) P(X1 = k|X0 = i)
          = ∑_{k∈S} hk pik

2. Let us show now that if {xi } is a non-negative solution of (5.40) then

xi ≥ hi , i∈S. (5.41)

If i = j then (5.41) is trivially satisfied.


6 j we have:
If i =
X X X X
xi = pik xk = pik xk + pik xk = pij + pik xk
k∈S k=j k6=j k6=j
 
X X
= pij + pik1 pk1 j + pk1 k2 x2 
k1 6=j k2 6=j
X
= Pi (Tj = 1) + Pi (Tj = 2) + pik1 pk1 k2 x2
k1 ,k2 6=j

where Pi ( ) ≡ P( |X0 = i).


By iterating the same procedure n times, we get
$$x_i = \sum_{k=1}^{n} P_i(T_j = k) + \sum_{k_1,\dots,k_n\ne j} p_{ik_1}\cdots p_{k_{n-1}k_n}\, x_{k_n} \ge \sum_{k=1}^{n} P_i(T_j = k)$$
By taking the limit for $n \to \infty$ we finally obtain:
$$x_i \ge \sum_{k=1}^{\infty} P_i(T_j = k) = P(T_j < +\infty|X_0 = i) = h_i$$
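The minimality statement of Theorem 27 also suggests a simple numerical scheme: iterating the map defined by the right-hand side of (5.40) starting from x = 0 produces a monotonically increasing sequence converging to the minimal non-negative solution. Below is a minimal sketch (not part of the original notes; the gambler's-ruin-type chain is an arbitrary illustration).

```python
import numpy as np

def absorption_probabilities(P, j, n_iter=2000):
    """Minimal non-negative solution of system (5.40) for the absorbing
    state j: iterate x <- (P x, with x_j reset to 1) starting from x = 0;
    the iterates increase monotonically to the vector h."""
    x = np.zeros(P.shape[0])
    for _ in range(n_iter):
        x = P @ x
        x[j] = 1.0
    return x

# Gambler's-ruin-type chain on {0,...,4}: 0 and 4 absorbing, up-move prob 0.4.
p = 0.4
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i + 1] = p
    P[i, i - 1] = 1 - p

print(absorption_probabilities(P, j=0))  # h_i = P(T_0 < infinity | X_0 = i)
```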

5.8.1 Branching chain


Let us consider a particular type of Markov chain describing the evolution of the number of "individuals" or "particles" of a given population. In this case S will be equal to $\mathbb{N}$ and the random variable $X_n$ will describe the number of individuals present at the n-th generation. Analogously, $X_{n+1}$ will give the number of individuals present at the (n+1)-th generation, generated by the $X_n$ individuals present at the previous generation according to the following procedure:
• Each individual gives birth to ξ offspring, where ξ is a random variable with values in $\mathbb{N}$ and discrete probability density $\{w_k\}_{k\in\mathbb{N}}$:
$$P(\xi = k) := w_k, \qquad \sum_{k\in\mathbb{N}} w_k = 1. \tag{5.42}$$
• The numbers of offspring of different individuals are described by independent and identically distributed random variables $\{\xi^j\}_j$, their distribution being given by (5.42).
According to this model, the random variables $\{X_n\}$ form a stationary Markov chain with transition probabilities given by:
$$p_{ij} = P(X_{n+1} = j|X_n = i) = P(\xi^1 + \cdots + \xi^i = j)$$

In particular, for i = 1 we have:
$$p_{1j} = P(X_{n+1} = j|X_n = 1) = P(\xi^1 = j) = w_j.$$

Clearly the state i = 0 is an absorbing state, since $p_{00} = 1$ and $p_{0j} = 0$ for any $j \ge 1$.
It is interesting to investigate the extinction probabilities, namely the probability of reaching the absorbing state 0 starting at the initial time from i individuals:
$$P(T_0 < \infty|X_0 = i)$$
We shall denote these quantities by the symbols $(h_i)_{i\in\mathbb{N}}$ since they are precisely absorption probabilities of the type described in the previous section, i.e. the probability of reaching the state 0 starting initially from $X_0 = i$ individuals.
By the assumed independence of the lines of descent of different individuals, we can write the identity $h_i = (h_1)^i$. By applying Theorem 27 we have that $h_1$ is the minimal non-negative solution of the equation:
$$x = \sum_i p_{1i}\, x^i = \sum_i w_i\, x^i \tag{5.43}$$

Let us consider for instance the problem of "extinction of surnames". If we adopt the simplified model where each male individual has three children, and the probability that a child is male is equal to 1/2, then the number ξ of male offspring of each individual is described by a binomial random variable with distribution:
$$P(\xi = k) = \binom{3}{k}\frac{1}{2^3}, \qquad k = 0, \dots, 3.$$
The extinction probability $h_1$ of a surname is the minimal non-negative solution of the equation
$$x = \frac{1}{8} + \frac{3}{8}x + \frac{3}{8}x^2 + \frac{1}{8}x^3$$
which yields $h_1 = \sqrt{5} - 2$.
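This value is easy to verify numerically: the iterates of Φ starting from 0 increase monotonically to the minimal non-negative fixed point. The following is a minimal sketch (not part of the original notes).

```python
from math import comb, sqrt

# Offspring distribution: number of male children ~ Binomial(3, 1/2).
w = [comb(3, k) / 8 for k in range(4)]

def Phi(x):
    # Generating function Phi(x) = sum_k w_k x^k.
    return sum(wk * x**k for k, wk in enumerate(w))

x = 0.0
for _ in range(200):
    x = Phi(x)   # monotone convergence to the minimal fixed point

print(x, sqrt(5) - 2)   # both ~0.2360679...
```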
In the general case, Eq. (5.43) can be cast in the equivalent form
$$x = \Phi(x) \tag{5.44}$$
where $\Phi : [0,1] \to \mathbb{R}$ is the function defined by the power series $\Phi(x) = \sum_i w_i x^i$. Since $\sum_i w_i = 1 < +\infty$ the radius of convergence of the power series is clearly greater than or equal to 1, hence the map Φ is $C^\infty$ on [0, 1) and
$$\Phi'(x) = \sum_{i\ge1} i\, w_i\, x^{i-1}, \qquad \Phi''(x) = \sum_{i\ge2} i(i-1)\, w_i\, x^{i-2}.$$

We shall assume that $w_0 = P(\xi = 0) > 0$ (otherwise it is clear that the extinction probability vanishes!). This implies that $w_1 < 1$. By the explicit form of the function Φ and its derivatives we have:
1. $\Phi(0) = w_0$ and $\Phi(1) = 1$.
2. $\Phi'(x) \ge 0$ for all $x \in (0,1)$, and $\Phi'(x) > 0$ for all $x \in (0,1)$ if $w_0 \ne 1$. Moreover $\lim_{x\to1} \Phi'(x) = \sum_{i\ge1} i\, w_i = E[\xi]$. We shall denote the mean value of the random variable ξ by $\mu \equiv E[\xi]$.

3. $\Phi''(x) \ge 0$ for all $x \in (0,1)$, and $\Phi''(x) > 0$ for all $x \in (0,1)$ if there exists at least one $w_i > 0$ with $i \ge 2$.
In this setting, checking whether the extinction probability $h_1$ is equal to or strictly smaller than one is equivalent to investigating whether the equation admits, in addition to the trivial solution x = 1, another solution belonging to the interval (0, 1). Equivalently, this corresponds to investigating the existence of intersections between the two curves in $\mathbb{R}^2$ of equations y = x and y = Φ(x) respectively. We can show that if µ ≤ 1 there are no roots of the equation x = Φ(x) in (0, 1), while if µ > 1 then there exists a unique root in (0, 1); a numerical illustration is sketched after the case analysis below.
• Let us consider the case where µ < 1. In this case $\lim_{x\to1} \Phi'(x) < 1$. Since $\Phi''(x) \ge 0$ for all $x \in (0,1)$ we have that Φ′ is a monotone non-decreasing function, hence $\Phi'(x) < 1$ for all $x \in (0,1)$. If we consider the difference map $d(x) := \Phi(x) - x$ we have that $d'(x) < 0$ for all $x \in (0,1)$, hence d is strictly decreasing. It attains a positive value (equal to $w_0$) at x = 0 and reaches the value 0 at x = 1. Since it is strictly decreasing there cannot be other points in (0, 1) where d(x) = 0.
• If µ = 1 then necessarily there exists at least one $w_i > 0$ with $i \ge 2$. Indeed $\mu = w_1 + \sum_{i\ge2} i\, w_i$ and $w_1 < 1$ since we assumed $w_0 > 0$. In this case we have that $\Phi''(x) > 0$ for all $x \in (0,1)$, hence Φ′ is strictly increasing in (0, 1) and $\Phi'(x) < \lim_{x\to1}\Phi'(x) = 1$ there. By reasoning as above we can again prove that the map $d(x) := \Phi(x) - x$ reaches the value 0 only at the endpoint x = 1 of the interval [0, 1].
• In the case µ > 1, by the continuity of Φ′ we can conclude that there exists an $\varepsilon > 0$ such that $(1-\varepsilon, 1] \subset (0,1]$ and $\Phi'(x) > 1$ for all $x \in (1-\varepsilon, 1]$. In particular the derivative $d' = \Phi' - 1$ of the difference map d will be strictly positive in the interval $(1-\varepsilon, 1)$, hence $d(x) < 0$ for all $x \in (1-\varepsilon, 1)$. By the continuity of the map d and the condition $d(0) = w_0 > 0$ there must exist at least one point $x^* \in (0,1)$ where $d(x^*) = 0$. This point is also unique. Indeed, if there were another point $x^{**} \in (0,1)$ where $d(x^{**}) = 0$, with $x^{**} < x^* < 1$, then, since $d(1) = 0$ as well, by Rolle's theorem there would be two points $x_1 \in (x^{**}, x^*)$ and $x_2 \in (x^*, 1)$ where $d'(x_1) = d'(x_2) = 0$. By applying Rolle's theorem again to the function d′, there would be a point $x_3 \in (x_1, x_2)$ such that $\Phi''(x_3) = 0$. This is impossible, since $\Phi''(x) > 0$ for all $x \in (0,1)$: the condition $\mu = w_1 + \sum_{i\ge2} i\, w_i > 1$ implies that there exists at least one $w_i > 0$ with $i \ge 2$.
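The dichotomy between µ ≤ 1 and µ > 1 can also be observed numerically. The sketch below (an illustration, not part of the original notes; the Poisson offspring law is an arbitrary choice) applies the same fixed-point iteration to a subcritical and a supercritical mean.

```python
from math import exp, factorial

def extinction_probability(w, n_iter=5000):
    """Minimal non-negative solution of x = Phi(x) = sum_k w_k x^k,
    obtained by iterating Phi starting from x = 0."""
    x = 0.0
    for _ in range(n_iter):
        x = sum(wk * x**k for k, wk in enumerate(w))
    return x

def poisson_weights(mu, kmax=60):
    # Truncated Poisson(mu) offspring law: w_k = e^{-mu} mu^k / k!.
    return [exp(-mu) * mu**k / factorial(k) for k in range(kmax)]

print(extinction_probability(poisson_weights(0.8)))  # mu <= 1: prints ~1.0
print(extinction_probability(poisson_weights(1.5)))  # mu > 1:  ~0.417 < 1
```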

5.9 Birth and death chains


Birth and death chains are a particular class of Markov chains for which it is possible to compute explicitly the invariant distribution, when it exists. They are important not only in themselves but also as building blocks of more general stochastic processes with continuous rather than discrete time. Let us consider on the state space $S = \mathbb{N}$ the stochastic matrix
$$P_{ij} = \begin{cases} r_i & j = i \\ b_i & j = i+1 \\ d_i & j = i-1 \\ 0 & |i-j| > 1 \end{cases} \qquad\text{i.e.}\qquad P = \begin{pmatrix} r_0 & b_0 & 0 & 0 & \cdots \\ d_1 & r_1 & b_1 & 0 & \cdots \\ 0 & d_2 & r_2 & b_2 & \cdots \\ \vdots & \vdots & \ddots & \ddots & \ddots \end{pmatrix}$$
The coefficients $b_i$ and $d_i$, $i \in \mathbb{N}$, are called birth and death rates (starting from the state i) respectively. By the normalization condition on the rows we have $d_i + r_i + b_i = 1$ for all i. In the following

we shall assume that all the coefficients bi , di , i ∈ N are strictly positive, in such a way that the
chain is irreducible.

An invariant distribution λ on S, if it exists, must be a solution of the system
$$\lambda = \lambda P \tag{5.45}$$
$$\sum_{i\in S} \lambda_i = 1 \tag{5.46}$$
$$\lambda_i \ge 0, \quad \forall i \in S. \tag{5.47}$$
By writing out Eq. (5.45) explicitly we get
$$\sum_{i\in S} \lambda_i P_{ij} = \lambda_j$$
and by the explicit form of the stochastic matrix P we have:
$$\lambda_0 r_0 + \lambda_1 d_1 = \lambda_0$$
$$\lambda_{j-1} b_{j-1} + \lambda_j r_j + \lambda_{j+1} d_{j+1} = \lambda_j, \qquad j \ge 1$$
By using the normalization condition $d_i + r_i + b_i = 1$ we can get rid of the explicit dependence on the coefficients $r_i$ and reduce the system to:
$$\lambda_{j-1} b_{j-1} - \lambda_j b_j = \lambda_j d_j - \lambda_{j+1} d_{j+1}, \qquad j \ge 1$$
By induction over j one can easily prove the general relation
$$d_{j+1} \lambda_{j+1} = b_j \lambda_j, \qquad j \ge 0 \tag{5.48}$$
which gives
$$\lambda_j = \lambda_0 \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}, \qquad j \ge 1.$$
Clearly, if $\lambda_0 > 0$ then $\lambda_j > 0$ for any $j \in S$, hence the inequalities (5.47) are valid. The normalization condition (5.46) becomes
$$\lambda_0 \left(1 + \sum_{j\ge1} \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}\right) = 1$$
If the series $\sum_{j\ge1} \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}$ has finite sum S then the invariant distribution λ is given by
$$\lambda_j = \frac{1}{1+S} \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}$$
If the series $\sum_{j\ge1} \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}$ is not convergent then an invariant distribution cannot exist. Let us consider the simple example where $b_i = p$, $d_i = q$ for all i. In this case $\prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}} = (p/q)^j$ and the series $\sum_{j\ge1} \prod_{k=0}^{j-1} \frac{b_k}{d_{k+1}}$ reduces to the geometric series $\sum_{j\ge1} (p/q)^j$. If $p \ge q$ then there cannot exist an invariant distribution, while if $p < q$ then the unique invariant distribution is given by
$$\lambda_j = \left(1 - \frac{p}{q}\right)\left(\frac{p}{q}\right)^j.$$
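As a quick sanity check (a sketch, not part of the original notes), one can truncate the chain with $b_i = p$, $d_i = q$ to finitely many states and verify that the geometric distribution above approximately satisfies λ = λP.

```python
import numpy as np

p, q = 0.3, 0.5   # birth and death rates, p < q
N = 200           # truncation level; the tail of lambda is negligible here

P = np.zeros((N, N))
for i in range(N):
    if i + 1 < N:
        P[i, i + 1] = p          # birth: i -> i+1 with rate b_i = p
    if i >= 1:
        P[i, i - 1] = q          # death: i -> i-1 with rate d_i = q
    P[i, i] = 1.0 - P[i].sum()   # r_i makes the row sum to 1

# Candidate invariant distribution: lambda_j = (1 - p/q) (p/q)^j.
lam = (1 - p / q) * (p / q) ** np.arange(N)

print(np.max(np.abs(lam @ P - lam)))  # ~0 up to truncation/rounding error
```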

5.9.1 Ehrenfest model for diffusion
The Ehrenfest model is a Markov chain introduced by Paul and Tatiana Ehrenfest at the beginning of the 1900s. This simplified model sheds some light on the difficulties of reconciling, on the one hand, the (microscopic) reversibility of the laws of motion describing the time evolution of the molecules of a gas and, on the other hand, the (macroscopic) irreversibility of thermodynamic systems.
Let us consider a closed vessel containing 2N particles (molecules of a gas). The vessel is divided into two sections, denoted by the letters A and B, separated by a wall. The sections are connected only by a tiny hole in the wall. If at the initial time all the particles are contained in one particular section, our experience suggests that some of them will move to the other section, in such a way that after a while we will find about N particles in section A and N particles in section B.
The stochastic model proposed by P. and T. Ehrenfest consists of a Markov chain with state space S = {0, ..., 2N}, where the state i denotes the number of particles contained in section A of the vessel (hence 2N − i gives the number of particles contained in B). The stochastic dynamics governing the time evolution of this system is described by the stochastic matrix P, where
$$P_{ij} = \begin{cases} \dfrac{i}{2N} & j = i-1 \\[4pt] \dfrac{2N-i}{2N} & j = i+1 \\[4pt] 0 & \text{otherwise} \end{cases}$$

P essentially describes the procedure where at each step a particle is chosen (with uniform probability) and moved to the section different from the one it occupied before the choice.
The stochastic matrix P assumes the following form
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ 1/2N & 0 & 1 - 1/2N & 0 & \cdots \\ 0 & 2/2N & 0 & 1 - 2/2N & \cdots \\ \vdots & & \ddots & \ddots & \ddots \\ & & \cdots & 0 & 1 \quad 0 \end{pmatrix}$$

The chain is irreducible and positive recurrent, hence there exists a unique invariant distribution λ, which represents the equilibrium distribution of this system. λ coincides with the binomial distribution
$$\lambda_i = \binom{2N}{i}\left(\frac{1}{2}\right)^{2N}, \qquad i \in \{0, \dots, 2N\},$$
as one can easily check by proving that λ and P are in detailed balance.
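The detailed balance condition $\lambda_i P_{ij} = \lambda_j P_{ji}$ is easy to verify numerically; the following is a minimal sketch (not part of the original notes), which checks that the matrix with entries $\lambda_i P_{ij}$ is symmetric.

```python
import numpy as np
from math import comb

N = 5                  # 2N = 10 particles
n = 2 * N
P = np.zeros((n + 1, n + 1))
for i in range(n + 1):
    if i >= 1:
        P[i, i - 1] = i / n          # a particle moves from A to B
    if i <= n - 1:
        P[i, i + 1] = (n - i) / n    # a particle moves from B to A

# Binomial(2N, 1/2) candidate invariant distribution.
lam = np.array([comb(n, i) for i in range(n + 1)], dtype=float) / 2**n

D = lam[:, None] * P                  # D[i, j] = lambda_i * P_ij
print(np.max(np.abs(D - D.T)))        # ~0: detailed balance holds
print(np.max(np.abs(lam @ P - lam)))  # hence lambda = lambda P as well
```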
It is important to point out that in the case $2N \gg 1$ the distribution λ tends to concentrate near N. For example, if 2N = 100 then $P(j \in [40, 60]) > 0.95$. We have to keep in mind that if we want to provide a model for a gas in a vessel, N will have to be of the order of magnitude of the Avogadro number, i.e. $N \sim 10^{23}$.
Since the chain is periodic with period d = 2 we cannot apply Theorem 26. Nevertheless Corollary 1 still holds, hence for any state $j \in S$
$$\frac{N_j^n}{n} \to \lambda_j = \frac{1}{m_j}$$

with probability 1, where $m_j = E_j[T_j]$ is the mean return time to j.
This means in particular that the chain will spend most of the time in the most probable states, which for $N \gg 1$ are those close to the state i = N. On the other hand, the proportion of time spent in states j such that $|j - N| \gg 1$ will be negligible.
It is also important to point out that the chain is recurrent. This means that whatever the initial state is, with probability 1 the chain will return to it in the future. This holds also for the state i = 2N, where all the particles are contained in one particular section of the vessel. According to this model, after the initial time the particles will move to the other section, and during the history of the chain we will observe that most of the time the chain occupies states j near N. On the other hand, with probability 1 we will observe in the future that the particles come back to the original configuration. The model allows one to predict the mean return time $m_j = 1/\lambda_j$, which in the case j = 2N is equal to $m_{2N} = 2^{2N}$. This fact provides an explanation of the apparent paradox: it is true that the chain will reach in the future any state, even the most improbable, but the time we have to wait in this case is incredibly long!
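To get a feeling for these orders of magnitude, here is a minimal numerical sketch (not part of the original notes) computing the mean return times $m_j = 1/\lambda_j$ for a vessel with 2N = 20 particles.

```python
from math import comb

N = 10
n = 2 * N   # total number of particles

# Invariant (binomial) distribution and mean return times m_j = 1/lambda_j.
lam = [comb(n, j) / 2**n for j in range(n + 1)]

print(1 / lam[N])   # ~5.7 steps: the balanced state N is revisited constantly
print(1 / lam[n])   # 2**20 = 1048576 steps: all particles back in section A
```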

5.10 Entropy rate of a Markov chain


Given a discrete-time stochastic process $(\Omega, \mathcal{F}, P, (\mathcal{F}_n)_{n\in\mathbb{N}}, (X_n)_{n\in\mathbb{N}})$, where $(X_n)_{n\ge0}$ is a sequence of discrete random variables with values in a finite set $S = \{x_1, \dots, x_r\}$, we can apply the notions of entropy and conditional entropy to describe the flux of information, i.e. the way information changes (actually increases, as we shall see in a while) with time.
Given a finite number $X_0, \dots, X_n$ of random variables with joint distribution $p(i_0,\dots,i_n) := P(X_0 = x_{i_0}, \dots, X_n = x_{i_n})$, their joint entropy is given by:
$$H(X_0, \dots, X_n) = -\sum_{i_0,\dots,i_n} p(i_0,\dots,i_n)\log p(i_0,\dots,i_n) \tag{5.49}$$

It gives the average amount of information contained in events of the form $\{X_0 = x_{i_0}, \dots, X_n = x_{i_n}\}$, which describe the history of the process up to time n.
We can also consider the conditional entropy of $X_n$ given the event $\{X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}}\}$, given by
$$H(X_n|X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}}) = -\sum_{i_n} p(i_n|i_0,\dots,i_{n-1})\log p(i_n|i_0,\dots,i_{n-1}) \tag{5.50}$$

where $p(i_n|i_0,\dots,i_{n-1}) = P(X_n = x_{i_n}|X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}})$. The quantity $H(X_n|X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}})$ gives the randomness contained in $X_n$ when we have observed the occurrence of the event $\{X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}}\}$. Analogously, the conditional entropy of $X_n$ given $X_0, \dots, X_{n-1}$ is given by:
$$\begin{aligned}
H(X_n|X_0,\dots,X_{n-1}) &= \sum_{i_0,\dots,i_{n-1}} p(i_0,\dots,i_{n-1})\, H(X_n|X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}})\\
&= -\sum_{i_0,\dots,i_{n-1},i_n} p(i_0,\dots,i_{n-1},i_n)\log p(i_n|i_0,\dots,i_{n-1})
\end{aligned}$$

It is easy to verify the following identity
$$H(X_0,\dots,X_n) = H(X_0,\dots,X_{n-1}) + H(X_n|X_0,\dots,X_{n-1}), \tag{5.51}$$

which generalizes (2.7).
In particular, since $H(X_n|X_0,\dots,X_{n-1}) \ge 0$, we have that the joint entropy increases with time:
$$H(X_0,\dots,X_n) \ge H(X_0,\dots,X_{n-1}), \qquad \forall n \ge 1.$$

In this respect, a meaningful figure of merit is the entropy rate h of the process, defined as
$$h := \lim_{n\to\infty} \frac{H(X_0,\dots,X_{n-1})}{n}$$
if the limit exists.

Example 21. If $(X_n)_n$ are independent and identically distributed then $h = H(X_0)$. Indeed
$$\frac{H(X_0,\dots,X_{n-1})}{n} = \frac{n\,H(X_0)}{n} = H(X_0).$$
An interesting example where the entropy rate exists and assumes a particularly simple form can be found in the theory of Markov chains. Let us assume that $(X_n)_n$ is a stationary Markov chain with finite state space, and let us assume that the initial distribution is an invariant distribution² λ. In this case the following holds:
1. By the Markov property

$$H(X_n|X_0,\dots,X_{n-1}) = H(X_n|X_{n-1}), \qquad n \ge 1. \tag{5.52}$$

Indeed, by the Markov property, $p(i_n|i_0,\dots,i_{n-1}) = p(i_n|i_{n-1})$, hence
$$\begin{aligned}
H(X_n|X_0,\dots,X_{n-1}) &= -\sum_{i_0,\dots,i_{n-1},i_n} p(i_0,\dots,i_{n-1},i_n)\log p(i_n|i_0,\dots,i_{n-1})\\
&= -\sum_{i_{n-1},i_n}\Big(\sum_{i_0,\dots,i_{n-2}} p(i_0,\dots,i_{n-1},i_n)\Big)\log p(i_n|i_{n-1})\\
&= -\sum_{i_{n-1},i_n} p(i_{n-1},i_n)\log p(i_n|i_{n-1})\\
&= H(X_n|X_{n-1})
\end{aligned}$$

2. By taking into account that the initial distribution λ is invariant, in such a way that $P(X_n = i) = \lambda_i$ for all n, we have
$$H(X_n|X_{n-1}) = H(X_1|X_0)$$
Indeed, if P is the stochastic matrix and $P_{ij} = P(X_n = j|X_{n-1} = i)$, we have
$$H(X_n|X_{n-1}) = -\sum_{i,j} \lambda_i P_{ij} \log P_{ij} = H(X_1|X_0).$$

3. By using (5.51), (5.52) and induction over n, we get
$$H(X_0,\dots,X_n) = H(X_0) + nH(X_1|X_0) \tag{5.53}$$


² Since by assumption the state space is finite, there are positive recurrent states, hence an invariant distribution exists.

By identity (5.53), it is easy to see that the entropy rate h is given by:
$$h = \lim_{n\to\infty} \frac{H(X_0) + (n-1)H(X_1|X_0)}{n} = H(X_1|X_0) \tag{5.54}$$
Example 22. Let us consider a two-state stationary Markov chain with stochastic matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$$
where $\alpha \in [0,1]$ and $\beta \in [0,1]$. Equivalently, the chain can be described by a diagram with two states 1 and 2, self-loops with probabilities $1-\alpha$ and $1-\beta$, and transitions $1 \to 2$ with probability α and $2 \to 1$ with probability β.
In the non-trivial case where $\alpha + \beta \ne 0$ there is a unique invariant distribution λ given by:
$$\lambda_1 = \frac{\beta}{\alpha+\beta}, \qquad \lambda_2 = \frac{\alpha}{\alpha+\beta}.$$

The conditional entropy $H(X_1|X_0)$ and, by (5.54), the entropy rate h of the Markov chain are given by
$$\begin{aligned}
H(X_1|X_0) &= \frac{\beta}{\alpha+\beta}\big(-\alpha\log\alpha - (1-\alpha)\log(1-\alpha)\big) + \frac{\alpha}{\alpha+\beta}\big(-\beta\log\beta - (1-\beta)\log(1-\beta)\big)\\
&= \frac{\beta}{\alpha+\beta}\, H_B(\alpha) + \frac{\alpha}{\alpha+\beta}\, H_B(\beta)
\end{aligned}$$
where $H_B(p) = -p\log p - (1-p)\log(1-p)$ denotes the entropy of a Bernoulli random variable X with distribution $P(X = 0) = p$, $P(X = 1) = 1-p$.
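Formula (5.54) is straightforward to check numerically. The following sketch (not part of the original notes) computes $h = -\sum_{i,j}\lambda_i P_{ij}\log P_{ij}$ directly and compares it with the closed form above.

```python
import numpy as np

def entropy_rate(P, lam):
    """h = H(X_1|X_0) = -sum_{i,j} lambda_i P_ij log P_ij, cf. (5.54);
    entries with P_ij = 0 contribute nothing by convention."""
    mask = P > 0
    return -np.sum((lam[:, None] * P)[mask] * np.log(P[mask]))

def HB(p):
    # Entropy of a Bernoulli(p) random variable, in nats.
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

alpha, beta = 0.2, 0.7
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
lam = np.array([beta, alpha]) / (alpha + beta)   # invariant distribution

print(entropy_rate(P, lam))                                       # ~0.525
print(beta / (alpha + beta) * HB(alpha)
      + alpha / (alpha + beta) * HB(beta))                        # same value
```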

