Markov chains
In this chapter we are concerned with the simplest non-trivial class of stochastic processes, con-
sisting of sequences of random variables (Xn )n∈N taking values in a discrete set S which in the
following will be called state space. S will be a finite or infinite countable set and we shall label
its elements with an index i ∈ S and call them states. In other terms, we are going to study a
stochastic process (Xt )t∈T where the set of times is discrete T = N and the random variables are
discrete as well, since there exists a discrete set S such that for any n ∈ N the distribution of the
random variable Xn will be uniquely determined by the numbers
$$\lambda^n_i := P(X_n = i), \qquad i \in S, \tag{5.1}$$
through
$$\mu_{X_n}(H) = P(X_n \in H) = \sum_{i \in H} \lambda^n_i, \qquad H \in \mathcal{B}(\mathbb{R}),$$
and, analogously, the finite dimensional distributions $\{\mu_{n_1,\dots,n_m}\}_{n_1,\dots,n_m \in \mathbb{N}}$ will be uniquely determined by the probabilities
$$P((X_{n_1}, \dots, X_{n_m}) = (i_1, \dots, i_m)), \qquad i_1, \dots, i_m \in S, \tag{5.2}$$
in terms of
$$\mu_{n_1,\dots,n_m}(H) = P((X_{n_1}, \dots, X_{n_m}) \in H) = \sum_{(i_1,\dots,i_m) \in H} P((X_{n_1}, \dots, X_{n_m}) = (i_1, \dots, i_m)), \qquad H \in \mathcal{B}(\mathbb{R}^m).$$
Hence, without loss of generality, in the following we shall limit ourselves to the study of the quantities (5.1) and (5.2) instead of the full family of measures $\{\mu_{n_1,\dots,n_m}\}_{n_1,\dots,n_m \in \mathbb{N}}$.
5.1 Introduction
Among the processes with discrete time set T and discrete state space S, Markov chains play a particular role: on the one hand they admit a rather simple mathematical description, while on the other hand they find rich and interesting applications. We shall be essentially concerned with the following prediction problems:
1. Given the past history of the system, i.e. given the states at times n = 0, 1, . . . , m, compute the probability that at a future time m′ > m the system will reach the state j. This is given by the conditional probabilities
$$P(X_{m'} = j \mid X_0 = i_0, \dots, X_m = i_m).$$
2. Predict the long time behaviour of the system; more precisely, determine whether the limit
$$\lim_{n \to +\infty} P(X_n = j), \qquad j \in S,$$
exists, and interpret the result.
Let us present a couple of examples.
Example 13 (The simplest gambling example). Let us consider an infinite number of tosses of a coin (not necessarily fair). Let {ξn} be a sequence of i.i.d. random variables representing the results of the tosses, i.e. ξn = +1 if the result of the n-th toss is heads and ξn = −1 if the result of the n-th toss is tails. Their distribution is simply given by
P(ξn = +1) = p, P(ξn = −1) = 1 − p.
Let us assume that at each toss we win 1 euro if the result is heads and lose 1 euro if the result is tails. If the initial cash available is given by a random variable X0 describing our fortune at time n = 0, our total fortune after n tosses will be given by the random variable Xn with values in the state space S = Z, defined by
$$X_n := X_0 + \sum_{j=1}^{n} \xi_j.$$
If we try to predict our future fortune given the past history, i.e. if we want to compute
P(Xn+1 = j|X0 = i0 , . . . , Xn = in )
this is simply given by
$$P(X_{n+1} = j \mid X_0 = i_0, \dots, X_n = i_n) = \begin{cases} p & j = i_n + 1, \\ 1-p & j = i_n - 1, \\ 0 & \text{otherwise.} \end{cases}$$
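This chain is straightforward to simulate. The following minimal sketch (Python; not part of the original notes, with made-up values p = 0.6 and X0 = 10) generates one trajectory of the fortune process:

```python
import random

def simulate_fortune(x0, p, n_tosses, seed=0):
    """One trajectory of X_n = X_0 + xi_1 + ... + xi_n."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_tosses):
        xi = 1 if rng.random() < p else -1   # xi_n = +1 (heads) or -1 (tails)
        x += xi
        path.append(x)
    return path

# Hypothetical parameters: initial fortune of 10 euros, slightly biased coin.
print(simulate_fortune(x0=10, p=0.6, n_tosses=20))
```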
Example 14 (The problem of surname extinction). The males of a family (with a particular surname) can produce 0, 1, 2, . . . , k, . . . male offspring with probabilities p0, p1, . . . , pk, . . . (where $\sum_{k=0}^{\infty} p_k = 1$). Assuming that the numbers of male offspring of different individuals are described by i.i.d. random variables ξl with distribution P(ξl = k) = pk and that Xn is the discrete random variable denoting the number of males at the n-th generation, we have
$$P(X_{n+1} = j \mid X_0 = i_0, \dots, X_n = i_n) = P\Big(\sum_{l=1}^{i_n} \xi_l = j\Big).$$
In this case the state space is S = N. In particular in this model it is interesting to study the
extinction probability, i.e. the probability that Xn = 0 for some n ≥ 1.
In both examples, in order to predict the value of the variable Xn+1 it is enough to look at the state at time n and not further into the past; more precisely, ∀n ≥ 0 and for all sequences of states i0, . . . , in, in+1, the following holds:
$$P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \dots, X_n = i_n) = P(X_{n+1} = i_{n+1} \mid X_n = i_n). \tag{5.3}$$
The identity (5.3) is called Markov property and any sequence of discrete random variables {Xn }n≥0
enjoying it is called Markov chain.
5.2 Basic definitions and first examples
The conditional probabilities
$$P(X_{n+1} = j \mid X_n = i) \tag{5.4}$$
are called transition probabilities. In the following we shall be concerned with time homogeneous Markov chains, i.e. sequences of random variables {Xn}n∈N enjoying the Markov property (5.3) and such that the (one-step) transition probabilities (5.4) do not depend explicitly on the time index n ∈ N:
$$P(X_{n+1} = j \mid X_n = i) = P(X_1 = j \mid X_0 = i) \qquad \forall n \in \mathbb{N}. \tag{5.5}$$
In this case, since we can drop the explicit dependence on the time index n, we can introduce the shortened notation
$$p_{ij} := P(X_1 = j \mid X_0 = i).$$
The transition probabilities actually give the instrument for determining the time evolution of the distribution of the random variables {Xn}: indeed, if we know the probabilities $\lambda^n_i = P(X_n = i)$ we can compute the probabilities $\lambda^{n+1}_j = P(X_{n+1} = j)$ as
$$\lambda^{n+1}_j = P(X_{n+1} = j) = \sum_{i \in S} P(X_{n+1} = j \mid X_n = i) P(X_n = i) = \sum_{i \in S} p_{ij} \lambda^n_i. \tag{5.6}$$
In this respect, condition (5.5) can be interpreted as the time invariance of the dynamics governing the evolution of the probabilities. Introducing the row vector $\lambda^n = (\lambda^n_i)_{i \in S}$ and the matrix $P = (p_{ij})_{i,j \in S}$, equation (5.6) reads $\lambda^{n+1} = \lambda^n P$ and, by iteration,
$$\lambda^n = \lambda^0 P^n. \tag{5.8}$$
Moreover, given the initial distribution λ0 and the matrix P, we can compute all the finite dimensional distributions as
$$P(X_0 = i_0, X_1 = i_1, \dots, X_n = i_n) = \lambda^0_{i_0} P_{i_0 i_1} \cdots P_{i_{n-1} i_n}. \tag{5.10}$$
This formula can be easily proved by induction; indeed,
$$P(X_0 = i_0, X_1 = i_1, \dots, X_n = i_n) = P(X_n = i_n \mid X_0 = i_0, \dots, X_{n-1} = i_{n-1})\, P(X_0 = i_0, \dots, X_{n-1} = i_{n-1})$$
$$= P(X_n = i_n \mid X_{n-1} = i_{n-1})\, P(X_0 = i_0, \dots, X_{n-1} = i_{n-1}) = P_{i_{n-1} i_n}\, \lambda^0_{i_0} P_{i_0 i_1} \cdots P_{i_{n-2} i_{n-1}},$$
where the second equality uses the Markov property (5.3) and the last one the inductive assumption.
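Numerically, (5.8) amounts to repeated row-vector/matrix products. A small sketch (Python with numpy; the matrix entries are made up, corresponding to the two-state example below with α = 0.3, β = 0.2):

```python
import numpy as np

# Hypothetical two-state stochastic matrix (alpha = 0.3, beta = 0.2).
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
lam = np.array([1.0, 0.0])    # initial distribution lambda^0: start in state 1

for n in range(1, 6):
    lam = lam @ P             # lambda^n = lambda^{n-1} P, cf. (5.6) and (5.8)
    print(n, lam, lam.sum())  # each lambda^n is again a probability vector
```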
1. In the simple case where there are only two possible states S = {1, 2}, the generic stochastic matrix has the following form:
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix},$$
where α ∈ [0, 1], β ∈ [0, 1].
It can be equivalently described by the following transition diagram:
[Diagram: 1 → 2 with probability α, 2 → 1 with probability β; self-loops with probabilities 1−α at state 1 and 1−β at state 2.]
(Both properties S1 and S2 of a stochastic matrix follow easily from the definition of Pij = P(X1 = j | X0 = i).)
2. Let us consider the case where S = {1, 2, 3} and the stochastic matrix is
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{pmatrix}.$$
[Diagram: 1 → 2 with probability 1; 2 → 2 and 2 → 3 with probability 1/2 each; 3 → 1 and 3 → 3 with probability 1/2 each.]
Let us consider the case where there are N possible states and the transition probabilities
are given by the following diagram (where we have taken N = 4)
[Diagram: cyclic chain on N = 4 states; from each state the chain moves clockwise with probability p and counterclockwise with probability 1 − p.]
3. Random walk with absorbing boundaries. S = {1, 2, . . . , N},
$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
[Diagram: states 1 and N are absorbing (self-loop with probability 1); every interior state i moves to i + 1 with probability p and to i − 1 with probability 1 − p.]
A simple but important identity in the theory of Markov chains is the following Chapman-Kolmogorov equation:
$$P(X_n = j \mid X_l = i) = \sum_{k \in S} P(X_n = j \mid X_m = k)\, P(X_m = k \mid X_l = i),$$
which holds for any l ≤ m ≤ n and i, j ∈ S. By introducing the notation $p^n_{ij} = P(X_n = j \mid X_0 = i)$ and using property (5.5), it can be equivalently written in the following form:
$$p^{n+m}_{ij} = \sum_{k \in S} p^m_{ik}\, p^n_{kj}.$$
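In matrix form the Chapman-Kolmogorov equation is simply $P^{n+m} = P^m P^n$, which can be verified numerically on any stochastic matrix. A quick sanity check (Python/numpy, not from the notes, reusing the 3-state matrix of the examples above):

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

n, m = 3, 4
lhs = np.linalg.matrix_power(P, n + m)                         # p^(n+m)_ij
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
print(np.allclose(lhs, rhs))                                   # True
```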
We recall that a sequence (Xn)n∈N of S-valued random variables is a Markov chain with initial distribution λ0 and transition matrix P (shortly, a Markov chain (λ, P)) if:
• P(X0 = i) = λ0_i for all i ∈ S;
• for all n ≥ 0, conditional on Xn = i, the random variable Xn+1 has distribution (Pij, j ∈ S) and is independent of X0, . . . , Xn−1.
The following result shows how a particular form of the finite dimensional distributions allows one to uniquely identify a Markov chain; it will be applied later.
Theorem 11. A sequence of random variables (Xn)n∈N with values in a discrete space S is a Markov chain (λ, P) if and only if for all n ∈ N and i0, . . . , in ∈ S the following holds:
$$P(X_0 = i_0, X_1 = i_1, \dots, X_n = i_n) = \lambda_{i_0} P_{i_0 i_1} \cdots P_{i_{n-1} i_n}. \tag{5.12}$$
Proof: If (Xn )n∈N is a Markov chain (λ, P ) then we have already proved that (5.12) holds (see
(5.10) and its proof by induction).
Conversely, if (5.12) holds, then P(X0 = i) = λi and we can easily prove that
$$P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \dots, X_n = i_n) = \frac{\lambda_{i_0} P_{i_0 i_1} \cdots P_{i_n i_{n+1}}}{\lambda_{i_0} P_{i_0 i_1} \cdots P_{i_{n-1} i_n}} = P_{i_n i_{n+1}}.$$
Theorem 12. Let (Ω, F, P, {Xn}n≥0) be a Markov chain (λ, P), and let m ∈ N and i ∈ S. Conditionally upon Xm = i, the sequence of random variables {Yn}n≥0 defined by
$$Y_n := X_{m+n}, \qquad n \geq 0,$$
is a Markov chain (δi, P), independent of the random variables X0, . . . , Xm (property (5.13)). By Theorem 11 this is equivalent to saying that, conditionally upon Xm = i, the random variables (Xm+n)n≥0 are a Markov chain (δi, P).
Proof (of Theorem 12): By Lemma 3, it is sufficient to prove (5.13) only for events A and B belonging respectively to two π-systems P1 and P2 generating the sub-σ-algebras Fm = σ(X0, . . . , Xm) and Gm = σ(Yn, n ≥ 0) = σ(Xn, n ≥ m). In particular, P1 will be the collection of sets of the form
$$A = \{X_0 = i_0, \dots, X_m = i_m\}, \qquad i_0, \dots, i_m \in S,$$
and P2 the collection of sets of the form $B = \{X_m = j_m, X_{m+1} = j_{m+1}, \dots, X_{m+n} = j_{m+n}\}$, with $j_m, \dots, j_{m+n} \in S$. For such A and B,
we have to compute
\begin{align*}
P(A \cap B \mid X_m = i) &= \frac{P(A \cap B \cap \{X_m = i\})}{P(X_m = i)} \\
&= \frac{P(X_0 = i_0, \dots, X_m = i_m, X_m = i, X_m = j_m, X_{m+1} = j_{m+1}, \dots, X_{m+n} = j_{m+n})}{P(X_m = i)} \\
&= \frac{\lambda_{i_0} P_{i_0 i_1} \cdots P_{i_{m-1} i_m}\, \delta_{i_m i}\, \delta_{i j_m}\, P_{j_m j_{m+1}} \cdots P_{j_{m+n-1} j_{m+n}}}{P(X_m = i)} \\
&= \frac{P(X_0 = i_0, \dots, X_m = i_m, X_m = i)}{P(X_m = i)}\, \delta_{i j_m} P_{j_m j_{m+1}} \cdots P_{j_{m+n-1} j_{m+n}} \\
&= P(A \mid X_m = i)\, \delta_{i j_m} P_{j_m j_{m+1}} \cdots P_{j_{m+n-1} j_{m+n}}.
\end{align*}
A random variable τ with values in N ∪ {+∞} is called a stopping time (with respect to the natural filtration (Fn)n≥0) if
$$\forall n \in \mathbb{N} \qquad \{\tau = n\} \in F_n. \tag{5.14}$$
Example 15. Let A ⊂ S be a set of states and let τ be the first hitting time of A, defined as
$$\tau(\omega) := \inf\{n \geq 0 : X_n(\omega) \in A\}.$$
τ is a stopping time, since $\{\tau = n\} = \{X_0 \notin A, \dots, X_{n-1} \notin A, X_n \in A\} \in F_n$.
Given a Markov chain (Xn)n≥0, the natural filtration (Fn)n≥0 and a stopping time τ, we shall denote by Fτ the collection of sets E ∈ F satisfying the following condition:
$$\forall n \in \mathbb{N} \qquad E \cap \{\tau = n\} \in F_n. \tag{5.15}$$
For instance, the event E = {Xτ = i} belongs to Fτ, since
$$E \cap \{\tau = n\} = \{X_\tau = i\} \cap \{\tau = n\} = \{X_n = i\} \cap \{\tau = n\} \in F_n.$$
On the other hand, the event E′ = {ω ∈ Ω : Xτ(ω)+1(ω) = i} does not belong to Fτ, since in general
$$E' \cap \{\tau = n\} = \{X_{\tau+1} = i\} \cap \{\tau = n\} = \{X_{n+1} = i\} \cap \{\tau = n\} \notin F_n.$$
Theorem 13 (Strong Markov property). Let (Xn )n∈N be a Markov chain (λ, P ), τ a stopping
time and i ∈ S a state. Then, conditionally upon {Xτ = i} ∩ {τ < ∞}, the random variables
(Xτ +n )n∈N are a Markov chain (δi , P ) independent of Fτ .
Proof: We have to prove that for any choice of E ∈ Fτ, n ∈ N, i0, . . . , in ∈ S the following holds:
$$P(E \cap \{X_\tau = i_0, X_{\tau+1} = i_1, \dots, X_{\tau+n} = i_n\} \mid X_\tau = i, \tau < \infty) = P(E \mid X_\tau = i, \tau < \infty)\, \delta_{i i_0} P_{i_0 i_1} \cdots P_{i_{n-1} i_n}.$$
The idea is to decompose over the values of the stopping time, using
$$\{\tau < \infty\} = \bigcup_m \{\tau = m\},$$
and to apply the (simple) Markov property, Theorem 12, at each deterministic time m.
5.3 Recurrence and transience
In this section we will fix P and consider different initial distributions λ.
Our Markov chain will be a sequence (Xn )n∈N of S-valued discrete random variables on a prob-
ability space (Ω, F, P). For any probability distribution λ on S we will consider a corresponding
probability measure Pλ on (Ω, F) in such a way that under Pλ the sequence (Xn )n∈N will be a
Markov chain (λ, P ). Whenever λ = δi , with i ∈ S, we shall use the notation Pi instead of Pδi . In
fact Pi can be interpreted as the conditional distribution of the random variables (Xn )n∈N given
that X0 = i:
$$P_i(\,\cdot\,) \equiv P(\,\cdot \mid X_0 = i).$$
For a state i ∈ S, let Ti denote the first return time to i,
$$T_i := \inf\{n \geq 1 : X_n = i\},$$
with the convention inf ∅ = +∞.
Definition 24. A state i ∈ S is said to be recurrent if Pi(Ti < +∞) = 1. A state i ∈ S is said to be transient if Pi(Ti < +∞) < 1.
Clearly recurrence and transience are mutually exclusive properties: a state i ∈ S is either transient or recurrent.
Fixed a state i ∈ S, let us define the number of returns to the state i as
$$N_i(\omega) := \sum_{n \geq 1} \mathbb{1}_{\{X_n = i\}}(\omega).$$
In particular:
a. If i is recurrent then Pi(Ni = +∞) = 1.
b. If i is transient then Pi(Ni < +∞) = 1.
Proof: We shall compute the probability Pi(Ni = k) out of the elementary identity
$$\{N_i = k\} = \{N_i \geq k\} \setminus \{N_i \geq k+1\}, \qquad \{N_i \geq k+1\} \subset \{N_i \geq k\},$$
which yields
$$P_i(N_i = k) = P_i(N_i \geq k) - P_i(N_i \geq k+1).$$
Hence, the problem is reduced to the calculation of the probabilities Pi(Ni ≥ k) for any k ∈ N. Clearly, if k = 0 then Pi(Ni ≥ 0) = 1, while if k = 1 we can use the identity Pi(Ni ≥ 1) = Pi(Ti < ∞), which gives Pi(Ni ≥ 1) = f∗ii. In order to generalize this argument to arbitrary k ∈ N, let us define the sequence of random variables $(T_i^{(k)})_{k \geq 1}$ with values in N ∪ {+∞} as:
$$T_i^{(1)} = T_i, \quad \text{the first return time to state } i,$$
$$T_i^{(2)} = T_i^{(1)} + \inf\{n \geq 1 : X_{T_i^{(1)}+n} = i\}, \quad \text{the second return time to state } i,$$
$$\dots$$
$$T_i^{(k)} = T_i^{(k-1)} + \inf\{n \geq 1 : X_{T_i^{(k-1)}+n} = i\}, \quad \text{the } k\text{-th return time to state } i.$$
Clearly we have
$$P_i(N_i \geq k) = P_i(T_i^{(k)} < +\infty).$$
Moreover, it is rather easy to check that the random variables $T_i^{(k)}$ are stopping times.
We shall now prove that
$$\forall k \geq 1 \qquad P_i(T_i^{(k)} < +\infty) = (f^*_{ii})^k. \tag{5.19}$$
We shall use an inductive argument. Indeed, as remarked above, the identity (5.19) is true for k = 1. Let us assume now that it holds true for k and prove it for k + 1:
\begin{align*}
P_i(T_i^{(k+1)} < +\infty) &= P_i(\{T_i^{(k+1)} < +\infty\} \cap \{T_i^{(k)} < +\infty\}) \\
&= P_i(T_i^{(k+1)} < +\infty \mid T_i^{(k)} < +\infty)\, P_i(T_i^{(k)} < +\infty) \\
&= \sum_{n \geq 1} P_i(T_i^{(k+1)} = T_i^{(k)} + n \mid T_i^{(k)} < +\infty)\, P_i(T_i^{(k)} < +\infty) \\
&= \sum_{n \geq 1} P_i(X_{T_i^{(k)}+1} \neq i, \dots, X_{T_i^{(k)}+n-1} \neq i, X_{T_i^{(k)}+n} = i \mid T_i^{(k)} < +\infty)\, P_i(T_i^{(k)} < +\infty). \tag{5.20}
\end{align*}
By the inductive assumption $P_i(T_i^{(k)} < +\infty) = (f^*_{ii})^k$ and we are left to prove that
$$\sum_{n \geq 1} P_i(X_{T_i^{(k)}+1} \neq i, \dots, X_{T_i^{(k)}+n-1} \neq i, X_{T_i^{(k)}+n} = i \mid T_i^{(k)} < +\infty) = f^*_{ii}.$$
By definition of $T_i^{(k)}$ we have $\{T_i^{(k)} < +\infty\} = \{T_i^{(k)} < +\infty, X_{T_i^{(k)}} = i\}$. Moreover,
$$P_i(X_{T_i^{(k)}+1} \neq i, \dots, X_{T_i^{(k)}+n-1} \neq i, X_{T_i^{(k)}+n} = i \mid T_i^{(k)} < +\infty, X_{T_i^{(k)}} = i)$$
$$= P(X_{T_i^{(k)}+1} \neq i, \dots, X_{T_i^{(k)}+n-1} \neq i, X_{T_i^{(k)}+n} = i \mid T_i^{(k)} < +\infty, X_{T_i^{(k)}} = i, X_0 = i). \tag{5.21}$$
Further, $T_i^{(k)}$ is a stopping time and the event $\{X_0 = i\}$ belongs to the σ-algebra $F_{T_i^{(k)}}$, hence by the strong Markov property we have:
$$P(X_{T_i^{(k)}+1} \neq i, \dots, X_{T_i^{(k)}+n-1} \neq i, X_{T_i^{(k)}+n} = i \mid T_i^{(k)} < +\infty, X_{T_i^{(k)}} = i, X_0 = i) = P_i(X_1 \neq i, \dots, X_{n-1} \neq i, X_n = i) = P_i(T_i = n);
hence, the last line of (5.20) reduces to
$$\sum_{n \geq 1} P_i(T_i = n)\, (f^*_{ii})^k = P_i(T_i < +\infty)\, (f^*_{ii})^k = (f^*_{ii})^{k+1},$$
which proves (5.19).
If i is recurrent, then f∗ii = 1 and from (5.19) we obtain
$$P_i(N_i = k) = P_i(N_i \geq k) - P_i(N_i \geq k+1) = (f^*_{ii})^k - (f^*_{ii})^{k+1} = 0 \qquad \forall k \in \mathbb{N},$$
hence
$$P_i(N_i < +\infty) = \sum_{k \geq 0} P_i(N_i = k) = 0, \qquad P_i(N_i = +\infty) = 1 - P_i(N_i < +\infty) = 1.$$
While if i is transient, then f∗ii < 1 and, by summing the geometric series,
$$P_i(N_i < +\infty) = \sum_{k \geq 0} P_i(N_i = k) = \sum_{k \geq 0} (1 - f^*_{ii})(f^*_{ii})^k = 1.$$
Let us denote by $p^{(n)}_{ij}$ the n-step transition probability $p^{(n)}_{ij} = P(X_n = j \mid X_0 = i) = P_i(X_n = j)$. In fact $p^{(n)}_{ij}$ is equal to $(P^n)_{ij}$, with $P^n$ denoting the n-th power of the stochastic matrix P.
Theorem 15. A state i ∈ S is recurrent if and only if $\sum_n p^{(n)}_{ii} = +\infty$.
Proof: Let us denote by Ei the expectation with respect to the probability measure Pi, and let Ni be the number of returns to the state i. Since $E_i[\mathbb{1}_{\{X_n = i\}}] = P_i(X_n = i) = p^{(n)}_{ii}$, we have:
$$E_i[N_i] = E_i\Big[\sum_{n \geq 1} \mathbb{1}_{\{X_n = i\}}\Big] = \sum_{n \geq 1} p^{(n)}_{ii}.$$
If i is transient, then Ni is Pi-a.s. finite with the geometric distribution computed above, so $E_i[N_i] = f^*_{ii}/(1 - f^*_{ii}) < +\infty$ and the series converges; if i is recurrent, then $P_i(N_i = +\infty) = 1$, hence $E_i[N_i] = +\infty$ and the series diverges.
5.4 Communication classes
Let us consider a Markov chain {Xn }n∈N with stochastic matrix P .
Definition 25. Let i, j ∈ S. We say that i leads to j if $P_i(\cup_n \{X_n = j\}) > 0$. We shall denote this relation by the symbol i → j.
It is easy to see that i → j if and only if there exists at least one non-negative integer n ∈ N such that $p^{(n)}_{ij} > 0$.
Definition 26. Let i, j ∈ S. We say that i communicates with j if i → j and j → i. We shall denote this relation by the symbol i ∼ j.
More precisely, i ∼ j if there exist two non-negative integers m, n ∈ N such that
$$p^{(n)}_{ij} > 0 \quad \text{and} \quad p^{(m)}_{ji} > 0. \tag{5.23}$$
Actually the relation ∼ is an equivalence relation; indeed it enjoys the following properties:
• reflexive: i ∼ i (it is sufficient to choose m = n = 0 in (5.23));
• symmetric: if i ∼ j then j ∼ i (this comes directly from (5.23));
• transitive: if i ∼ j and j ∼ k then i ∼ k. Indeed, if i ∼ j and j ∼ k then there exist m, n ∈ N such that $p^{(n)}_{ij} > 0$ and $p^{(m)}_{jk} > 0$. We can prove that i → k by using the Chapman-Kolmogorov equation, indeed:
$$p^{(n+m)}_{ik} = \sum_{l \in S} p^{(n)}_{il}\, p^{(m)}_{lk} \geq p^{(n)}_{ij}\, p^{(m)}_{jk} > 0.$$
Analogously we can prove that k → i.
Hence, the set of states S can be decomposed into the disjoint union of equivalence classes
S = C1 ∪ C2 ∪ ... ∪ CM .
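Computationally, the communication classes are the strongly connected components of the directed graph with an edge i → j whenever pij > 0. A small sketch (Python/numpy, not from the notes) finds them by intersecting forward and backward reachability; the matrix is the 0-indexed version of the 6-state example of Section 5.4.1 below:

```python
import numpy as np

def reachable(P, i):
    """States j with i -> j (n = 0 is allowed, so i itself is included)."""
    seen, stack = {i}, [i]
    while stack:
        s = stack.pop()
        for t in range(P.shape[0]):
            if P[s, t] > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def communication_classes(P):
    fwd = [reachable(P, i) for i in range(P.shape[0])]
    classes, done = [], set()
    for i in range(P.shape[0]):
        if i not in done:
            cls = {j for j in fwd[i] if i in fwd[j]}   # i ~ j
            classes.append(sorted(cls))
            done |= cls
    return classes

P = np.array([[1/2, 1/2, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1/3, 0, 0, 1/3, 1/3, 0],
              [0, 0, 0, 1/2, 1/2, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 0]])
print(communication_classes(P))   # [[0, 1, 2], [3], [4, 5]]
```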
Theorem 16. Let C ⊂ S be an equivalence class. Then the states in C are either all recurrent or all transient.
Proof: Let i ∈ C and j ∼ i. We have to show that if i is recurrent then j is recurrent and, conversely, if i is transient then j is transient. Since by assumption i ∼ j, there exist two positive integers n1, n2 ∈ N such that $p^{(n_1)}_{ij} > 0$ and $p^{(n_2)}_{ji} > 0$. Moreover, for any n ∈ N the following holds:
$$p^{(n_1+n_2+n)}_{jj} \geq p^{(n_2)}_{ji}\, p^{(n)}_{ii}\, p^{(n_1)}_{ij}, \qquad p^{(n_1+n_2+n)}_{ii} \geq p^{(n_1)}_{ij}\, p^{(n)}_{jj}\, p^{(n_2)}_{ji},$$
hence
$$\sum_{k=0}^{\infty} p^{(k)}_{jj} \geq \sum_{n=0}^{\infty} p^{(n_1+n_2+n)}_{jj} \geq p^{(n_2)}_{ji} \Big( \sum_{n=0}^{\infty} p^{(n)}_{ii} \Big) p^{(n_1)}_{ij},$$
$$\sum_{k=0}^{\infty} p^{(k)}_{ii} \geq \sum_{n=0}^{\infty} p^{(n_1+n_2+n)}_{ii} \geq p^{(n_1)}_{ij} \Big( \sum_{n=0}^{\infty} p^{(n)}_{jj} \Big) p^{(n_2)}_{ji},$$
and we have proved that $\sum_{k=0}^{\infty} p^{(k)}_{ii} = +\infty$ if and only if $\sum_{k=0}^{\infty} p^{(k)}_{jj} = +\infty$.
In the following we shall call an equivalence class C recurrent if all its elements are recurrent
and transient if all its elements are transient.
5.4.1 Closed sets
Definition 27. A set K ⊂ S is said to be closed if for any i ∈ K and for any j ∉ K, i doesn't lead to j:
$$\forall i \in K \ \forall j \notin K \qquad p^{(n)}_{ij} = 0 \quad \forall n \in \mathbb{N}. \tag{5.24}$$
Property (5.24) is equivalent to the following (which seems to be weaker):
$$\forall i \in K \ \forall j \notin K \qquad p^{(1)}_{ij} = 0. \tag{5.25}$$
Property (5.25) involves only the one-step transition probabilities, i.e. the elements of the stochastic matrix P.
Clearly (5.24) implies (5.25) (indeed (5.25) is a particular case of (5.24)). The proof that (5.25) implies (5.24) relies on an inductive argument. In the case n = 1, (5.24) coincides with (5.25). Let us assume that (5.24) holds for n − 1 and prove that it holds also for n. Given i ∈ K and j ∉ K, then:
$$p^{(n)}_{ij} = \sum_{l \in S} p_{il}\, p^{(n-1)}_{lj} = \sum_{l \in K} p_{il}\, p^{(n-1)}_{lj} = 0,$$
where the first equality relies on the Chapman-Kolmogorov equation, the second on (5.25) and the third on the inductive assumption.
Definition 29. Let C ⊂ S be a subset of S. The closure of C is the smallest closed set containing C.
Example. Let us consider a Markov chain with stochastic matrix P given by
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/3 & 1/3 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$
[Diagram: transition graph of the chain, with arrows 1 → 1, 1 → 2, 2 → 3, 3 → 1, 3 → 4, 3 → 5, 4 → 4, 4 → 5, 5 → 6, 6 → 5 labelled by the corresponding matrix entries.]
By looking at the diagram we can easily see that S decomposes into three equivalence classes: S = {1, 2, 3} ∪ {4} ∪ {5, 6}. In addition, the class {5, 6} is closed. Other closed sets are {4, 5, 6} and the whole state space S = {1, 2, 3, 4, 5, 6}.
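Thanks to the equivalence of (5.24) and (5.25), closedness of a set K can be checked on the one-step matrix alone: K is closed if and only if every row i ∈ K puts all its mass inside K. A small sketch (Python/numpy, not from the notes, using the matrix above with 0-indexed states):

```python
import numpy as np

def is_closed(P, K):
    """K closed  <=>  sum_{j in K} p_ij = 1 for every i in K (property (5.25))."""
    K = list(K)
    return bool(np.isclose(P[K][:, K].sum(axis=1), 1.0).all())

P = np.array([[1/2, 1/2, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1/3, 0, 0, 1/3, 1/3, 0],
              [0, 0, 0, 1/2, 1/2, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 0]])

print(is_closed(P, [4, 5]))      # True  ({5, 6} in the text's labels)
print(is_closed(P, [3, 4, 5]))   # True  ({4, 5, 6})
print(is_closed(P, [0, 1, 2]))   # False ({1, 2, 3}: state 3 leaks to 4 and 5)
```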
The following results allow us to identify the closed sets rather easily.
Lemma 6. Let K ⊂ S be a closed set and Ci an equivalence class. Then only one of the following
conditions holds:
• K ∩ Ci = ∅
• Ci ⊂ K
Proof: If K ∩ Ci ≠ ∅ then there exists a state i ∈ Ci such that i ∈ K. We have to prove that for any j ∈ Ci, i.e. for any j ∈ S such that j ∼ i, we have j ∈ K. Indeed, if this were not true, i.e. if j ∉ K, then by definition of closed set $p^{(n)}_{ij} = 0$ for all n ∈ N. In this case i and j would not communicate, in contradiction with the assumption that they belong to the same equivalence class.
Lemma 7. For any i ∈ S, the set Ki := {j ∈ S : i → j} is closed.
Proof: Suppose j ∈ Ki and l ∉ Ki; we claim that for any n ∈ N we have $p^{(n)}_{jl} = 0$. Indeed, if this were not true there would exist an n̄ ∈ N such that $p^{(\bar n)}_{jl} > 0$. On the other hand, since j ∈ Ki, there exists an m ∈ N such that $p^{(m)}_{ij} > 0$. By applying the Chapman-Kolmogorov equation we have
$$p^{(m+\bar n)}_{il} = \sum_{s \in S} p^{(m)}_{is}\, p^{(\bar n)}_{sl} \geq p^{(m)}_{ij}\, p^{(\bar n)}_{jl} > 0,$$
i.e. i → l, in contradiction with the assumption l ∉ Ki (indeed, l ∉ Ki means precisely that $p^{(n)}_{il} = 0$ for all n ∈ N). This proves that Ki is closed.
Definition 30. A Markov chain is said to be irreducible if S and ∅ are the only possible closed
sets.
The following proposition gives an interesting characterization of irreducible Markov chains
Proposition 1. The following statements are equivalent:
1. S and ∅ are the only closed sets.
2. There exists a unique equivalence class.
Proof:
1. ⇒ 2. Let us assume that S and ∅ are the only closed sets and let us show that there exists a unique equivalence class. By Lemma 7 we know that for any i ∈ S the set Ki := {j ∈ S : i → j} is closed. Since Ki ≠ ∅, by 1. we get that Ki = S for all i ∈ S. This means that for any i ∈ S and j ∈ S we have i → j, hence i ∼ j for all i, j ∈ S.
2 ⇒ 1. Let us assume that there exists a unique equivalence class C and show that S and ∅ are
the only closed sets. Let K ⊂ S be a non-empty closed set and let i ∈ K. By lemma 6 we have
that the equivalence class Ci = {j ∈ S : j ∼ i} is included in K. By 2. we have Ci = S, hence
K = S.
According to Proposition 1, a Markov chain is irreducible if and only if every state i ∈ S leads to every state j ∈ S.
Exercise. For each of the following Markov chains, draw the diagram associated to the stochastic
matrix and determine whether the chain is irreducible.
1. Random walk on the line. Let S = Z and pij of the form
$$p_{i(i+1)} = p, \qquad p_{i(i-1)} = 1 - p, \qquad p_{ij} = 0 \ \text{if } j \neq i+1, i-1,$$
where p ∈ [0, 1] is a parameter which gives the probability of moving forward in one step.
2. Random walk with absorbing barriers . S = {1, 2, . . . , N },
$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix},$$
where p ∈ (0, 1).
3. Random walk with reflecting barriers. S = {1, 2, . . . , N },
$$P = \begin{pmatrix} 1-r & r & 0 & \cdots & 0 \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & r & 1-r \end{pmatrix},$$
where p ∈ (0, 1) and r ∈ (0, 1].
4. Cyclic random walk. S = {1, 2, . . . , N},
$$P = \begin{pmatrix} 0 & p & 0 & \cdots & 1-p \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ p & 0 & \cdots & 1-p & 0 \end{pmatrix},$$
with p ∈ (0, 1).
Remark 9. If a Markov chain is reducible, then there exists a closed set K (different from S and ∅). In particular, if the set of states S has a finite number of elements, then by sorting them in such a way that the first rows (and columns) of the stochastic matrix P correspond exactly to the states belonging to K, P assumes the form:
$$P = \begin{pmatrix} Q & 0 \\ R & S \end{pmatrix} \tag{5.26}$$
where Q is an M × M matrix (M being the cardinality of K) whose elements are the transition probabilities within the closed set K, i.e. Qij = pij, i, j ∈ K. Moreover, Q is still a stochastic matrix, since for any row index i ∈ K, i = 1, . . . , M, we have $\sum_{j=1}^{M} Q_{ij} = 1$. In particular, if for some time n0 ∈ N we have Xn0 ∈ K, then Xn ∈ K for any n > n0 and the evolution of the distribution of the random variables Xn can be computed by means of the powers of the matrix Q. Indeed, if P is of the form
(5.26), then its n-th power assumes the following form:
$$P^n = \begin{pmatrix} Q^n & 0 \\ * & S^n \end{pmatrix}.$$
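This block structure is easy to confirm numerically on any matrix of the form (5.26); a toy check (Python/numpy; the entries are made up, with K = {1, 2} closed and state 3 feeding into it):

```python
import numpy as np

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5]])

P5 = np.linalg.matrix_power(P, 5)
print(np.allclose(P5[:2, 2:], 0.0))   # True: the upper-right block stays exactly 0
```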
Theorem 17. Let C be a recurrent equivalence class. Then C is closed.
Proof: Ad absurdum, let us assume that C is not closed. In this case there exist two states i ∈ C and j ∉ C such that $p^{(n)}_{ij} > 0$ for some n. Since i → j but j ∉ C, then necessarily j ̸→ i, which means $p^{(n)}_{ji} = 0$ for any n ∈ N. Hence, starting from i, with strictly positive probability the chain visits j and never returns to i, which gives Pi(Ti = +∞) > 0. On the other hand, since i is a recurrent state, we have Pi(Ti < +∞) = 1, and we have obtained a contradiction.
In particular, according to this result it is impossible to reach a transient state starting from a
recurrent state, i.e. if i is recurrent and j is transient then i doesn’t lead to j.
The following theorem gives a partial converse of Theorem 17.
Theorem 18. Let C be a transient equivalence class with a finite number of states. Then C is
not closed.
We postpone the proof and give a technical lemma.
Lemma 8. Let j ∈ S be a transient state. Then for any i ∈ S
$$\lim_{n \to \infty} p^{(n)}_{ij} = 0. \tag{5.27}$$
Proof: In the case where i = j the result follows from Theorem 15. Indeed, j is transient if and only if $\sum_n p^{(n)}_{jj} < +\infty$, and by a standard result of calculus the convergence of the series $\sum_n p^{(n)}_{jj}$ implies $\lim_{n\to\infty} p^{(n)}_{jj} = 0$.
In the case where i ≠ j, a similar argument can be applied. Indeed, by using the Markov property we have that
$$p^{(n)}_{ij} = \sum_{\nu=1}^{n} f^{(\nu)}_{ij}\, p^{(n-\nu)}_{jj}, \tag{5.28}$$
where $f^{(\nu)}_{ij} = P(X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j \mid X_0 = i)$ is the probability of reaching j for the first time at step ν. Indeed:
\begin{align*}
p^{(n)}_{ij} &= P(X_n = j \mid X_0 = i) = P\big(\cup_{\nu=1}^{n} \{X_n = j, X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j\} \mid X_0 = i\big) \\
&= \sum_{\nu=1}^{n} P(X_n = j, X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j \mid X_0 = i) \\
&= \sum_{\nu=1}^{n} P(X_n = j \mid X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j, X_0 = i)\, P(X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j \mid X_0 = i) \\
&= \sum_{\nu=1}^{n} P(X_n = j \mid X_\nu = j)\, P(X_\nu = j, X_{\nu-1} \neq j, \dots, X_1 \neq j \mid X_0 = i) = \sum_{\nu=1}^{n} p^{(n-\nu)}_{jj} f^{(\nu)}_{ij}.
\end{align*}
Since $\sum_{\nu \geq 1} f^{(\nu)}_{ij} \leq 1$ and $\lim_{m \to \infty} p^{(m)}_{jj} = 0$, from (5.28) one deduces $\lim_{n\to\infty} p^{(n)}_{ij} = 0$ as well.
Proof (of Theorem 18): Let C be a transient equivalence class with a finite number of elements. If C were closed, then for any i ∈ C and j ∉ C we would have $p^{(n)}_{ij} = 0$ for all n ∈ N. Since $P^n$ is a stochastic matrix for any n ∈ N, we would have
$$\sum_{k \in C} p^{(n)}_{ik} = 1. \tag{5.31}$$
On the other hand, every k ∈ C is a transient state, so by Lemma 8 $\lim_{n\to\infty} p^{(n)}_{ik} = 0$ for any i ∈ C. Hence, taking the limit for n → ∞ on both sides of (5.31) (the sum is finite, since C has finitely many elements) we would obtain 0 = 1, a contradiction.
According to theorems 17 and 18 an equivalence class with a finite number of states is recurrent if
and only if it is closed.
Example 17. Let us consider the Markov chain with state space S = {0, 1, 2, 3, 4, 5} and stochastic
matrix
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1/4 & 1/2 & 1/4 & 0 & 0 & 0 \\ 0 & 1/5 & 2/5 & 1/5 & 0 & 1/5 \\ 0 & 0 & 0 & 1/6 & 1/3 & 1/2 \\ 0 & 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1/4 & 0 & 3/4 \end{pmatrix}.$$
By checking the associated diagram (which is left as an exercise) it is easy to see that S can be partitioned into the disjoint union of three equivalence classes: S = {0} ∪ {1, 2} ∪ {3, 4, 5}. The classes {0} and {3, 4, 5} are closed, hence their elements are recurrent, while the class {1, 2} is not closed, hence its elements are transient.
The most interesting and non-trivial examples are those where the state space S has a countably infinite number of elements. We study in some detail the random walk on the line.
Example 18. Let us consider the random walk on the line described by a Markov chain with S = Z and pij given by
$$p_{i(i+1)} = p, \qquad p_{i(i-1)} = 1 - p, \qquad p_{ij} = 0 \ \text{if } j \neq i \pm 1,$$
where the parameter p is assumed to be strictly positive and strictly less than 1, p ∈ (0, 1). Under this assumption the chain is irreducible: there is a unique equivalence class, hence either all states i ∈ Z are transient or all states are recurrent.
[Diagram: from each state i the chain jumps to i + 1 with probability p and to i − 1 with probability 1 − p.]
It is sufficient to study the recurrence/transience property of one particular state to determine the same property for the other states. Let us consider, for notational simplicity, the origin i = 0 and study the convergence of the series $\sum_n p^{(n)}_{00}$. By the particular form of the stochastic matrix, in order to come back to the origin we need to take an equal number of steps forward and backward, hence $p^{(n)}_{00} = 0$ whenever n is odd. In the case where the number of steps is even, we have:
$$p^{(2n)}_{00} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{(n!)^2}\, p^n (1-p)^n.$$
By Stirling's formula
$$n! \sim \sqrt{2\pi n}\, (n/e)^n, \qquad n \to \infty,$$
we get the following asymptotic equivalence for n → ∞:
$$p^{(2n)}_{00} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}.$$
If p = 1/2 then 4p(1 − p) = 1 and the series $\sum_n p^{(2n)}_{00}$ has infinite sum; in this case the state i = 0 and all the states of the irreducible Markov chain are recurrent.
If p ∈ (0, 1), p ≠ 1/2, then the series $\sum_n p^{(2n)}_{00}$ is convergent, hence all the states of the irreducible Markov chain are transient.
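This dichotomy can be observed numerically by accumulating the partial sums of $p^{(2n)}_{00}$. A sketch (plain Python, not from the notes), which uses the ratio $\binom{2n}{n}/\binom{2n-2}{n-1} = 2n(2n-1)/n^2$ to avoid huge factorials:

```python
def partial_sums(p, n_terms):
    """Running partial sums of p00^(2n) = C(2n, n) (p(1-p))^n, n = 1..n_terms."""
    total, term, sums = 0.0, 1.0, []
    for n in range(1, n_terms + 1):
        term *= (2 * n) * (2 * n - 1) / (n * n) * p * (1 - p)
        total += term
        sums.append(total)
    return sums

for p in (0.5, 0.6):
    s = partial_sums(p, 100_000)
    print(p, round(s[99], 2), round(s[9_999], 2), round(s[99_999], 2))
# p = 0.5: the sums keep growing like sqrt(n) (divergence: recurrence)
# p = 0.6: the sums approach 4 = (1 - 4p(1-p))^(-1/2) - 1 (convergence: transience)
```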
5.5 Invariant distributions
Given the distribution λn at time n, we can compute the distribution at future times in terms of the formula λn+1 = λn P. More generally,
$$\lambda^{n+m} = \lambda^n P^m. \tag{5.32}$$
A probability distribution λ on S is said to be invariant (or stationary) for P if
$$\lambda = \lambda P, \tag{5.33}$$
which implies
$$\lambda = \lambda P^n \qquad \forall n \in \mathbb{N}. \tag{5.34}$$
In other words, if the chain starts with an invariant initial distribution, the distribution of Xn does not change in time.
[Figure: sketches of the distribution (λi)i∈S at different times t.]
If the set of states has a finite number of elements, #S = N, then the invariant distribution, if it exists, is described by a row vector λ = (λ1, . . . , λN) satisfying the following conditions:
1. λi ≥ 0, i = 1, . . . , N;
2. $\sum_{i=1}^{N} \lambda_i = 1$;
3. λ = λP.
In other words, λ is a left eigenvector of P with eigenvalue 1. Moreover, all its components are non-negative and it fulfils the normalization condition $\sum_{i=1}^{N} \lambda_i = 1$. In fact, the computation of this vector can be reduced to a problem of linear algebra.
Example 19. Let us consider the simple case where S = {1, 2} and the stochastic matrix has the following form:
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix},$$
where α ∈ [0, 1], β ∈ [0, 1].
[Diagram: 1 → 2 with probability α, 2 → 1 with probability β; self-loops 1−α and 1−β.]
The invariant distribution is associated to a row vector λ = (λ1, λ2), a left eigenvector of P with eigenvalue 1:
$$(\lambda_1 \ \lambda_2) \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix} = (\lambda_1 \ \lambda_2),$$
or, equivalently, by the solution of the linear system:
$$\begin{cases} -\alpha \lambda_1 + \beta \lambda_2 = 0, \\ \alpha \lambda_1 - \beta \lambda_2 = 0. \end{cases}$$
By imposing the normalization condition λ1 + λ2 = 1 we get:
• if α + β ≠ 0, then
$$\lambda_1 = \frac{\beta}{\alpha+\beta}, \qquad \lambda_2 = \frac{\alpha}{\alpha+\beta}.$$
In this case we have existence and uniqueness of the invariant distribution (see the numerical sketch after this list).
• if α + β = 0, then P is the identity matrix and any probability distribution is invariant. In this case we have existence but not uniqueness of the invariant distribution.
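Numerically, λ can be obtained as the left eigenvector of P with eigenvalue 1, normalized to total mass 1. A minimal sketch (Python/numpy, not from the notes; hypothetical values α = 0.3, β = 0.1):

```python
import numpy as np

alpha, beta = 0.3, 0.1                  # made-up values
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

w, v = np.linalg.eig(P.T)               # left eigenvectors of P = right ones of P^T
k = np.argmin(np.abs(w - 1.0))          # pick the eigenvalue (numerically) equal to 1
lam = np.real(v[:, k])
lam /= lam.sum()                        # impose the normalization condition

print(lam)                              # [0.25, 0.75] = (beta, alpha)/(alpha+beta)
print(np.allclose(lam @ P, lam))        # True: lambda = lambda P
```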
Exercise. Compute the invariant distribution (if it exists) of the following Markov chains.
1. S = {1, 2, 3},
$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/4 & 1/2 & 1/4 \\ 1/6 & 1/3 & 1/2 \end{pmatrix}.$$
2. Symmetric random walk with reflecting barriers: S = {1, 2, . . . , N},
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & \cdots & 0 \\ 1/2 & 0 & 1/2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/2 & 1/2 \end{pmatrix}.$$
3. Random walk with absorbing barriers: S = {1, 2, . . . , N},
$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
Lemma 9 (dominated convergence for sums). If $(a^{(n)}_i)_{i \in S,\, n \in \mathbb{N}}$ are such that $\lim_{n \to +\infty} a^{(n)}_i$ exists for every i ∈ S and $|a^{(n)}_i| \leq b_i$ for all n, with $\sum_{i \in S} b_i < +\infty$, then
$$\lim_{n \to +\infty} \sum_{i \in S} a^{(n)}_i = \sum_{i \in S} \lim_{n \to +\infty} a^{(n)}_i.$$
Theorem 19. Let λ be an invariant distribution for the stochastic matrix P. Then λj = 0 for any transient state j ∈ S.
Proof: Since by assumption j is transient, we have $\lim_{n\to\infty} p^{(n)}_{ij} = 0$ for any i ∈ S. On the other hand, since by assumption λ is an invariant distribution, the following holds:
$$\lambda_j = \sum_{i \in S} \lambda_i\, p^{(n)}_{ij} \qquad \forall n \in \mathbb{N}.$$
By letting n → ∞ and taking the limit under the sum, thanks to Lemma 9 and the uniform bound $|\lambda_i p^{(n)}_{ij}| \leq \lambda_i$ for all n ∈ N, with $\sum_i \lambda_i < \infty$, we obtain:
$$\lambda_j = \sum_{i \in S} \lambda_i \lim_{n \to \infty} p^{(n)}_{ij} = 0.$$
Example 20. In the case of Example 18, if p ≠ 1/2 all states are transient, hence an invariant distribution cannot exist.
For any i ∈ S we shall denote by Ei the expectation with respect to the probability measure Pi = P(· | X0 = i). We shall define the mean return time to i as the expected value of Ti given X0 = i:
$$m_i := E_i[T_i].$$
Clearly, if i is a transient state the random variable Ti assumes the value +∞ on a set of strictly positive Pi-probability, and mi = +∞. On the other hand, if i is a recurrent state then Pi(Ti = +∞) = 0 and mi is given by
$$m_i = E_i[T_i] = \sum_{n=1}^{\infty} n\, P_i(T_i = n) = \sum_{n=1}^{\infty} n\, f^{(n)}_{ii}.$$
A recurrent state i is called positive recurrent if mi < +∞ and null recurrent if mi = +∞.
Theorem 20. Let j be a recurrent state and let $N^n_j := \sum_{m=1}^{n} \mathbb{1}_{\{X_m = j\}}$ denote the number of visits to j up to time n. Then for any i ∈ S:
$$\frac{N^n_j}{n} \xrightarrow{\ a.s.\ } \frac{\mathbb{1}_{\{T_j < +\infty\}}}{m_j}$$
and
$$\frac{1}{n} \sum_{m=1}^{n} p^{(m)}_{ij} \to \frac{f^*_{ij}}{m_j}.$$
Corollary 1. If C is a closed set of recurrent states which does not contain proper closed subsets, then for any i, j ∈ C:
$$\frac{1}{n} \sum_{m=1}^{n} p^{(m)}_{ij} \to \frac{1}{m_j}. \tag{5.35}$$
Moreover, if P(X0 ∈ C) = 1 then $\frac{N^n_j}{n} \xrightarrow{\ a.s.\ } \frac{1}{m_j}$.
In particular, if {Xn} is an irreducible recurrent Markov chain, then Eq. (5.35) holds for any i, j ∈ S. If j is a null recurrent state, then the mean proportion of time before n spent in state j converges to 0 as n → ∞, while if j is a positive recurrent state the same quantity converges to a strictly positive value. We do not give the detailed proof of this result, which relies upon the strong law of large numbers and the strong Markov property, but limit ourselves to applying it.
The following result shows that if an invariant distribution exists, it is concentrated on positive
recurrent states.
Theorem 21. Let λ be a stationary distribution and j a null recurrent state. Then λj = 0.
Proof: If λ is a stationary distribution, then for any m ∈ N we have $\lambda_j = \sum_{i \in S} p^{(m)}_{ij} \lambda_i$. By taking the sum over m = 1, . . . , n and dividing by n we get
$$\lambda_j = \sum_{i \in S} \frac{\sum_{m=1}^{n} p^{(m)}_{ij}}{n}\, \lambda_i.$$
Since $\frac{1}{n}\sum_{m=1}^{n} p^{(m)}_{ij} \leq 1$, by Lemma 9 we can take the limit under the sum, and by Theorem 20 we obtain λj = 0.
According to this result, if there are no positive recurrent states then an invariant distribution
cannot exist.
The following theorem shows that null recurrence and positive recurrence are class properties.
Theorem 22. If i is positive recurrent and i ∼ j, then j is positive recurrent.
Proof: Since i is recurrent and i ∼ j, j is recurrent by Theorem 16. Moreover, since i ∼ j there exist n1, n2 ∈ N such that $p^{(n_1)}_{ji} > 0$ and $p^{(n_2)}_{ij} > 0$. Hence, for any m ∈ N we have:
$$p^{(n_1+m+n_2)}_{jj} \geq p^{(n_1)}_{ji}\, p^{(m)}_{ii}\, p^{(n_2)}_{ij}.$$
By summing over m = 1, . . . , n and dividing by n we get
$$\frac{\sum_{m=1}^{n_1+n+n_2} p^{(m)}_{jj}}{n} - \frac{\sum_{m=1}^{n_1+n_2} p^{(m)}_{jj}}{n} \geq p^{(n_1)}_{ji}\, p^{(n_2)}_{ij}\, \frac{\sum_{m=1}^{n} p^{(m)}_{ii}}{n}.$$
By letting n → ∞, the left hand side converges to 1/mj, while the right hand side converges to $p^{(n_1)}_{ji} p^{(n_2)}_{ij}/m_i$, hence
$$\frac{1}{m_j} \geq \frac{p^{(n_1)}_{ji}\, p^{(n_2)}_{ij}}{m_i} > 0,$$
and we can deduce mj < +∞.
By this result, we can conclude that in an irreducible Markov chain all states are of the same type
(transient, positive recurrent, null recurrent).
Theorem 23. Let C ⊂ S be a closed finite set. Then C contains at least one positive recurrent state.
Proof (sketch): If all states k ∈ C were transient or null recurrent, then by Lemma 8 and Theorem 20 the Cesàro averages $\frac{1}{n}\sum_{m=1}^{n} p^{(m)}_{ik}$ would tend to 0 for every i, k ∈ C; summing over the finitely many k ∈ C and using $\sum_{k \in C} p^{(m)}_{ik} = 1$ (C is closed), we would get 0 = 1, obtaining a contradiction.
According to the previous result, all states j ∈ C of a closed equivalence class C with a finite
number of states must be positive recurrent.
Remark 10. In an irreducible Markov chain with a finite number of states, all states must be positive recurrent.
Remark 11. If #S < ∞ then there cannot exist null recurrent states.
Null recurrent states can be found only in Markov chains with infinitely many possible states, as the following example shows.
Example: Random walk on the line. Let us consider again the case of Example 18, where S = Z and the transition probabilities are given by
$$p_{i(i+1)} = p, \qquad p_{i(i-1)} = 1 - p, \qquad p_{ij} = 0 \ \text{if } j \neq i \pm 1.$$
• If either p = 0 or p = 1 then the equivalence classes contain just one element, i.e. S is
partitioned into the disjoint union of equivalence classes that are singletons S = ∪i∈Z {i}.
Since the classes are not closed, all states are transient.
• If p ∈ (0, 1) then the chain is irreducible. In this case
– If p 6= 1/2 then, as proved in example 18, all states are transient.
– If p = 1/2 then all states are recurrent. In order to determine whether they are positive recurrent or null recurrent, we have to determine whether the limit $\lim_{n\to+\infty} \frac{1}{n}\sum_{m=1}^{n} p^{(m)}_{ii}$ vanishes or is strictly positive. Since the chain is irreducible, it is sufficient to study a particular state, for example i = 0. As shown in Example 18, the transition probabilities $\{p^{(n)}_{00}\}_n$ vanish for n odd, while for n even they are equal to
$$p^{(2n)}_{00} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{(n!)^2}\, p^n (1-p)^n.$$
By Stirling's formula, for p = 1/2,
$$p^{(2n)}_{00} \sim \frac{1}{\sqrt{\pi n}}, \qquad n \to \infty.$$
Hence $\lim_{n\to\infty} p^{(n)}_{00} = 0$, and by Eq. (C.2) we get $\lim_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} p^{(m)}_{00} = 0$. We can infer that 0 and all the other states are null recurrent.
Theorem 24. Let {Xn} be an irreducible positive recurrent Markov chain. Then there exists a unique invariant distribution λ, given by
$$\lambda_j = \frac{1}{m_j}, \qquad j \in S. \tag{5.36}$$
We give a sketch of the proof in the appendix. Here we limit ourselves to showing some examples and applications.
First of all, it is interesting to remark that Eq. (5.36) allows us to compute the value of the mean return time to the state j for any j ∈ S, provided that the assumptions of Theorem 24 are fulfilled and we can compute explicitly the values λj. Indeed, in the case where the cardinality of S is finite, the computation of λ can be reduced to a problem of linear algebra.
Let us consider first of all the Markov chain with two states described in Example 19. In that case, if α, β ≠ 0 the chain is irreducible and positive recurrent. The invariant distribution λ is given by (λ1, λ2) = (β/(α + β), α/(α + β)). Hence the mean return times m1, m2 to the states 1, 2 are equal to
$$m_1 = \frac{\alpha+\beta}{\beta}, \qquad m_2 = \frac{\alpha+\beta}{\alpha}.$$
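Relation (5.36) lends itself to a direct Monte Carlo check: starting the chain in state 1 and averaging the observed return times should reproduce m1 = (α + β)/β. A rough sketch (plain Python, not from the notes; hypothetical α = 0.3, β = 0.1, so m1 = 4):

```python
import random

def mean_return_time(alpha, beta, n_runs=100_000, seed=1):
    """Empirical estimate of m_1 = E_1[T_1] for the two-state chain."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        state, steps = 1, 0
        while True:
            u = rng.random()
            state = (2 if u < alpha else 1) if state == 1 else (1 if u < beta else 2)
            steps += 1
            if state == 1:            # first return to the starting state
                break
        total += steps
    return total / n_runs

print(mean_return_time(0.3, 0.1))     # close to (0.3 + 0.1) / 0.1 = 4.0
```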
Bistochastic matrices
A stochastic matrix (Pij)i,j∈S, with #S = N, is said to be bistochastic if
$$\forall j \in S \qquad \sum_{i \in S} P_{ij} = 1.$$
In this case it is easy to check that the uniform distribution on S, given by λj = 1/N for all j ∈ S, is an invariant distribution, i.e.
$$\lambda_j = \sum_{i} \lambda_i P_{ij}, \qquad \forall j \in S.$$
A particular example of a Markov chain associated to a bistochastic matrix is the random walk on a cyclic graph, where S = {1, 2, . . . , N},
$$P = \begin{pmatrix} 0 & p & 0 & \cdots & 1-p \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ p & 0 & \cdots & 1-p & 0 \end{pmatrix},$$
with p ∈ (0, 1). In this case the chain is irreducible and positive recurrent. Since the invariant distribution is given by λj = 1/N for all j ∈ S, the mean return time to each state is given by mj = N.
A distribution λ and a stochastic matrix P are said to be in detailed balance if
$$\lambda_i P_{ij} = \lambda_j P_{ji} \qquad \forall i, j \in S. \tag{5.37}$$
It is easy to verify that if (5.37) holds, then λ is an invariant distribution for P, i.e. $\lambda_j = \sum_i \lambda_i P_{ij}$; indeed $\sum_i \lambda_i P_{ij} = \sum_i \lambda_j P_{ji} = \lambda_j$. Remarkably, there are several examples where condition (5.37) holds. The following subsection describes one of them.
Random walk on a graph. Consider an undirected graph with vertex set S, and for each vertex i let vi denote the number of its neighbours (its degree). The random walk on the graph is the Markov chain which at each step jumps from the current vertex to one of its neighbours chosen with uniform probability, i.e. Pij = 1/vi if j is a neighbour of i, and Pij = 0 otherwise.
[Diagrams: two small examples; from a vertex with two neighbours the chain moves to each of them with probability 1/2, from a vertex with three neighbours with probability 1/3.]
If the graph is connected, then it is associated to an irreducible Markov chain. If the number of states is finite, then all states are positive recurrent and there exists a unique invariant distribution λ given by $\lambda_i = \frac{v_i}{\sum_{j \in S} v_j}$. The simplest way to show that such a distribution is stationary is to check that λ and P are in detailed balance,
$$\lambda_i P_{ij} = \lambda_j P_{ji},$$
which holds since $\lambda_i P_{ij} = \frac{v_i}{\sum_k v_k} \cdot \frac{1}{v_i} = \frac{1}{\sum_k v_k}$ whenever i and j are neighbours, an expression symmetric in i and j. The same argument works in the case where S contains infinitely many states and $\sum_{j \in S} v_j < +\infty$.
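A short numerical check of this construction on a made-up graph (Python/numpy, not from the notes): build P from the adjacency matrix, set λi = vi / Σj vj, and verify both detailed balance and stationarity.

```python
import numpy as np

# Hypothetical undirected graph on 4 vertices (adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

v = A.sum(axis=1)                  # v_i = degree of vertex i
P = A / v[:, None]                 # P_ij = 1/v_i for each neighbour j of i
lam = v / v.sum()                  # lambda_i = v_i / sum_j v_j

DB = lam[:, None] * P              # DB_ij = lambda_i P_ij
print(np.allclose(DB, DB.T))       # True: detailed balance (5.37)
print(np.allclose(lam @ P, lam))   # True: lambda is invariant
```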
1. Random walk with absorbing barriers: S = {1, 2, . . . , N}, p ∈ (0, 1) and stochastic matrix P given by
$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1-p & 0 & p & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
In this case the set of states S is partitioned into three equivalence classes S = {1} ∪ {N} ∪ {2, . . . , N − 1}, where {2, . . . , N − 1} is not closed (hence it is transient) and the two classes {1} and {N} are closed and positive recurrent. We can easily construct two invariant distributions λ and µ, concentrated on {1} and {N} respectively, given by λ = (1, 0, . . . , 0) and µ = (0, . . . , 0, 1). More generally, any convex combination of λ and µ of the form αλ + (1 − α)µ = (α, 0, . . . , 0, 1 − α), α ∈ [0, 1], is an invariant distribution.
2. #S = 6 and stochastic matrix P given by
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/3 & 1/3 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$
There are three equivalence classes S = {1, 2, 3} ∪ {4} ∪ {5, 6}, where {5, 6} is closed (hence positive recurrent) while the other classes are transient. There exists a unique invariant distribution λ, concentrated on {5, 6}, given by λ = (0, 0, 0, 0, 1/2, 1/2).
5.6 Asymptotic distributions
A probability distribution λ on S is said to be asymptotic if, for any choice of the initial distribution λ0, the distributions λn of Xn satisfy
$$\lim_{n \to \infty} \lambda^n_j = \lambda_j, \qquad \forall j \in S.$$
It can be rather easily proved that a sufficient condition for a probability distribution λ to be asymptotic is
$$\lim_{n \to \infty} p^{(n)}_{ij} = \lambda_j, \qquad \forall i, j \in S. \tag{5.38}$$
Indeed, in this case
$$\lambda^n_j = \sum_{i \in S} \lambda^0_i\, p^{(n)}_{ij} \longrightarrow \sum_{i \in S} \lambda^0_i\, \lambda_j = \lambda_j,$$
where we can pass the limit under the sum by dominated convergence (see Lemma 9) since:
$$|\lambda^0_i p^{(n)}_{ij}| \leq \lambda^0_i \quad \forall n \in \mathbb{N}, \qquad \sum_{i \in S} \lambda^0_i = 1 < \infty.$$
We cannot expect that condition (5.38) holds in general. Let us consider the stochastic matrix P given by
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
It is easy to check that $P^n = P$ if n is odd and $P^n = I$ if n is even. In this case the n-step transition probabilities $p^{(n)}_{ij}$ do not admit a limit.
An important property related to the existence of the limit $\lim_{n\to\infty} p^{(n)}_{ij}$ is presented in the following definition.
Given a state i ∈ S, let us consider the set $\{n \geq 1 : p^{(n)}_{ii} > 0\}$ and let di be the positive integer defined as
$$d_i := \gcd\{n \geq 1 : p^{(n)}_{ii} > 0\}.$$
Definition 34. If di ≥ 2 then the state i is said to be periodic of period di. If $p^{(n)}_{ii} = 0$ for all n ≥ 1, or if di = 1, then the state i is said to be aperiodic.
Examples:
• If pii > 0 then di = 1 and i is aperiodic.
• In the case of the random walk on the line (Example 18) the state 0 is periodic of period 2.
The period can be shown to be a class property: if i ∼ j then di = dj.
Theorem 26. Let {Xn} be an irreducible, positive recurrent, aperiodic Markov chain. Then for any i, j ∈ S
$$\lim_{n \to \infty} p^{(n)}_{ij} = \lambda_j, \tag{5.39}$$
where λ is the unique invariant distribution of Theorem 24.
5.7 Hitting probabilities
Given a fixed target state j ∈ S, let hi := Pi(Tj < +∞), i ∈ S, denote the probability of ever reaching j starting from i.
Theorem 27. The vector {hi}i∈S is the minimal non-negative solution of the system:
$$\begin{cases} x_j = 1, \\ x_i = \sum_{k \in S} p_{ik} x_k, & i \neq j. \end{cases} \tag{5.40}$$
2. Let us now show that if {xi} is a non-negative solution of (5.40), then
$$x_i \geq h_i, \qquad i \in S. \tag{5.41}$$
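For a finite chain, the system (5.40) can be solved directly as a linear system. The sketch below (Python/numpy; one way to organize the computation, not the notes' own algorithm) fixes the target state to 1, fixes the other absorbing states to 0 (which yields the minimal solution for chains of this kind), and solves for the remaining states; it is illustrated on the absorbing random walk with hypothetical N = 5 and p = 0.4:

```python
import numpy as np

def absorption_probs(P, target):
    """h_i = P_i(reach target), computed from system (5.40)."""
    N = P.shape[0]
    absorbing = [i for i in range(N) if P[i, i] == 1.0]
    free = [i for i in range(N) if i != target and i not in absorbing]
    # For free states: x_i = sum_k p_ik x_k  =>  (I - P_ff) x_f = P_f,target
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, [target])].ravel()
    x = np.zeros(N)
    x[target] = 1.0                     # x_j = 1
    x[free] = np.linalg.solve(A, b)     # other absorbing states stay at 0
    return x

p, N = 0.4, 5
P = np.zeros((N, N)); P[0, 0] = P[-1, -1] = 1.0
for i in range(1, N - 1):
    P[i, i + 1], P[i, i - 1] = p, 1 - p
print(absorption_probs(P, N - 1))       # increases from 0 (state 1) to 1 (state N)
```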
5.8 Branching processes
Let Xn denote the number of individuals in the n-th generation of a population, where each individual produces a random number of offspring. The model is based on the assumption that the numbers of offspring of different individuals are described by independent and identically distributed random variables {ξj}j, their distribution being given by
$$P(\xi^j = k) = w_k, \qquad k \in \mathbb{N}. \tag{5.42}$$
According to this model, the random variables {Xn} are a stationary Markov chain with transition probabilities given by:
$$p_{ij} = P\Big(\sum_{l=1}^{i} \xi^l = j\Big), \qquad i \geq 1, \ j \in \mathbb{N}.$$
In particular, for i = 1 we have $p_{1j} = P(\xi = j) = w_j$. Clearly the state i = 0 is an absorbing state, since p00 = 1 and p0j = 0 for any j ≥ 1.
It is interesting to investigate the extinction probabilities, namely the probability of reaching the absorbing state 0 starting at initial time from i individuals. We shall denote these quantities by (hi)i∈N, since they are precisely absorption probabilities of the type described in the previous section, i.e. the probability of reaching the state 0 starting initially from X0 = i individuals.
By the assumed independence of the generations arising from different individuals, we can write the identity hi = (h1)^i. By applying Theorem 27 we have that h1 is the minimal non-negative solution of the equation:
$$x = \sum_{i} p_{1i}\, x^i = \sum_{i} w_i\, x^i. \tag{5.43}$$
Let us consider for instance the problem of "extinction of surnames". If we adopt the simplified model where each male individual has three children, and the probability of a male child is equal to 1/2, then the number ξ of male offspring of each individual is described by a binomial random variable with distribution:
$$P(\xi = k) = \binom{3}{k} \frac{1}{2^3}, \qquad k = 0, \dots, 3.$$
The extinction probability h1 of a surname is the minimal non-negative solution of the equation
$$x = \frac{1}{8} + \frac{3}{8}x + \frac{3}{8}x^2 + \frac{1}{8}x^3,$$
which yields $h_1 = \sqrt{5} - 2$.
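The minimal root can also be found numerically: iterating x ← Φ(x) from x = 0 produces an increasing sequence converging to the minimal non-negative fixed point. A sketch for the surname example (plain Python, not from the notes):

```python
def Phi(x, w=(1/8, 3/8, 3/8, 1/8)):
    """Generating function Phi(x) = sum_k w_k x^k of the offspring law."""
    return sum(wk * x**k for k, wk in enumerate(w))

x = 0.0
for _ in range(200):       # fixed-point iteration x <- Phi(x), started from 0
    x = Phi(x)
print(x)                   # ~0.2360679... = sqrt(5) - 2
```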
In the general case, Eq. (5.43) can be cast in the equivalent form
$$x = \Phi(x), \tag{5.44}$$
where Φ : [0, 1] → R is the function defined by the power series $\Phi(x) = \sum_i w_i x^i$. Since $\sum_i w_i = 1 < +\infty$, the radius of convergence of the power series is greater than or equal to 1, hence the map Φ is C∞ on [0, 1) and
$$\Phi'(x) = \sum_{i \geq 1} i\, w_i\, x^{i-1}, \qquad \Phi''(x) = \sum_{i \geq 2} i(i-1)\, w_i\, x^{i-2}.$$
We shall assume that w0 = P(ξ = 0) > 0 (otherwise it is clear that the extinction probability vanishes!). This implies that w1 < 1. By the explicit form of the function Φ and its derivatives we have:
1. Φ(0) = w0 and Φ(1) = 1.
2. Φ′(x) ≥ 0 for all x ∈ (0, 1), and Φ′(x) > 0 for all x ∈ (0, 1) if w0 ≠ 1. Moreover, $\lim_{x \to 1} \Phi'(x) = \sum_{i \geq 1} i\, w_i = E[\xi]$. We shall denote the mean value of the random variable ξ by µ ≡ E[ξ].
3. Φ′′(x) ≥ 0 for all x ∈ (0, 1), and Φ′′(x) > 0 for all x ∈ (0, 1) if there exists at least one wi > 0 with i ≥ 2.
In this setting, checking whether the extinction probability h1 is equal to or strictly smaller than one is equivalent to investigating whether the equation x = Φ(x) admits, in addition to the trivial solution x = 1, another solution belonging to the interval (0, 1). Equivalently, this corresponds to investigating the existence of intersections between the two curves in R² of equations y = x and y = Φ(x) respectively. We can show that if µ ≤ 1 there are no roots of the equation x = Φ(x) in (0, 1), while if µ > 1 there exists a unique root in (0, 1).
• Let us consider the case where µ < 1. In this case limx→1 Φ′(x) < 1. Since Φ′′(x) ≥ 0 for all x ∈ (0, 1), Φ′ is a monotone non-decreasing function, hence Φ′(x) < 1 for all x ∈ (0, 1). If we consider the difference map d(x) := Φ(x) − x, we have that d′(x) < 0 for all x ∈ (0, 1), hence d is strictly decreasing. It attains a positive value (equal to w0) at x = 0 and reaches the value 0 at x = 1. Since it is strictly decreasing, there cannot be other points in (0, 1) where d(x) = 0.
• If µ = 1 then necessarily there exists at least one wi > 0 with i ≥ 2. Indeed, µ = w1 + Σ_{i≥2} i wi and w1 < 1, since we assumed w0 > 0. In this case Φ′′(x) > 0 for all x ∈ (0, 1) and Φ′ is a strictly monotone increasing function in (0, 1). By reasoning as above we can again prove that the map d(x) := Φ(x) − x reaches the value 0 only at the end point of the interval [0, 1].
• In the case µ > 1, by the continuity of Φ′ we can conclude that there exists an ε > 0 such that (1 − ε, 1] ⊂ (0, 1] and Φ′(x) > 1 for all x ∈ (1 − ε, 1]. In particular the derivative d′ = Φ′ − 1 of the difference map d will be strictly positive in the interval (1 − ε, 1), hence d(x) < 0 for all x ∈ (1 − ε, 1). By the continuity of the map d and the condition d(0) = w0 > 0 there must exist at least one point x∗ ∈ (0, 1) where d(x∗) = 0. This point is also unique. Indeed, if there were another point x∗∗ ∈ (0, 1) where d(x∗∗) = 0, with x∗∗ < x∗ < 1, then by Rolle's theorem there would exist two points x1 ∈ (x∗∗, x∗) and x2 ∈ (x∗, 1) where d′(x1) = d′(x2) = 0. By applying Rolle's theorem again to the function d′, there would exist a point x3 ∈ (x1, x2) such that Φ′′(x3) = 0. This is impossible, since Φ′′(x) > 0 for all x ∈ (0, 1): the condition µ = w1 + Σ_{i≥2} i wi > 1 implies that there exists at least one wi > 0 with i ≥ 2.
5.9 Birth and death chains
Let us consider a Markov chain on S = N with transition probabilities of the form $p_{i(i+1)} = b_i$, $p_{i(i-1)} = d_i$, $p_{ii} = r_i$ (with d0 = 0) and $p_{ij} = 0$ otherwise. The coefficients bi and di, i ∈ N, are called birth and death rates (starting from the state i) respectively. By the normalization condition of the rows we have di + ri + bi = 1 for all i. In the following we shall assume that all the coefficients bi and di (for i ≥ 1) are strictly positive, in such a way that the chain is irreducible.
In the special case of constant rates, bi = p and di = q, if p ≥ q there does not exist an invariant distribution, while if p < q the unique invariant distribution is given by
$$\lambda_j = \Big(1 - \frac{p}{q}\Big)\Big(\frac{p}{q}\Big)^j, \qquad j \in \mathbb{N}.$$
5.9.1 Ehrenfest model for diffusion
The Ehrenfest model is a Markov chain introduced by Paul and Tatiana Ehrenfest at the beginning of the 20th century. This simplified model sheds some light on the difficulty of reconciling, on the one hand, the (microscopic) reversibility of the laws of motion describing the time evolution of the molecules of a gas and, on the other hand, the (macroscopic) irreversibility of thermodynamic systems.
Let us consider a closed vessel containing 2N particles (molecules of a gas). The vessel is divided into two sections, denoted by the letters A and B, separated by a wall, and the sections are connected only by a tiny hole in the wall. If at the initial time all the particles are contained in one particular section, our experience suggests that some of them will move to the other section, in such a way that after a while we will find about N particles in section A and N particles in section B.
The stochastic model proposed by P. and T. Ehrenfest consists of a Markov chain with state space S = {0, . . . , 2N}, where the state i denotes the number of particles contained in section A of the vessel (hence 2N − i gives the number of particles contained in B). The stochastic dynamics governing the time evolution of this system is described by the stochastic matrix P, where
$$P_{ij} = \begin{cases} \dfrac{i}{2N} & j = i-1, \\[1ex] \dfrac{2N-i}{2N} & j = i+1, \\[1ex] 0 & \text{otherwise.} \end{cases}$$
P essentially describes the procedure where at each step a particle is chosen (with uniform probability) and moved to the section different from the one it occupied before the choice.
The stochastic matrix P assumes the following form:
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ \frac{1}{2N} & 0 & 1-\frac{1}{2N} & 0 & \cdots \\ 0 & \frac{2}{2N} & 0 & 1-\frac{2}{2N} & \cdots \\ \vdots & & \ddots & \ddots & \ddots \\ \cdots & & 0 & 1 & 0 \end{pmatrix}.$$
The chain is irreducible and positive recurrent, hence there exists a unique invariant distribution λ, which represents the equilibrium distribution of this system. λ coincides with the following binomial distribution:
$$\lambda_i = \binom{2N}{i} \Big(\frac{1}{2}\Big)^{2N}, \qquad i \in \{0, \dots, 2N\},$$
as one can easily check by proving that λ and P are in detailed balance.
It is important to point out that in the case 2N ≫ 1 the distribution λ tends to concentrate near N. For example, if 2N = 100 then P(j ∈ [40, 60]) > 0.95. We have to keep in mind that if we want to provide a model for a gas in a vessel, N will have to be of the order of magnitude of Avogadro's number, i.e. N ∼ 10²³.
Since the chain is periodic with period d = 2 we cannot apply Theorem 26. Nevertheless, Corollary 1 still holds, hence for any state j ∈ S
$$\frac{N^n_j}{n} \to \lambda_j = \frac{1}{m_j}$$
with probability 1, where mj = Ej[Tj] is the mean return time to j.
This means in particular that the chain will spend most of the time in the most probable states, which for N ≫ 1 are those close to the state i = N. On the other hand, the proportion of time spent in states j such that |j − N| ≫ 1 will be negligible.
It is also important to point out that the chain is recurrent. This means that whatever the initial state is, with probability 1 the chain will return to it in the future. This holds also for the state i = 2N, where all the particles are contained in one particular section of the vessel. According to this model, after the initial time the particles will move to the other section, and during the history of the chain we will observe that most of the time the chain occupies states j near N. On the other hand, with probability 1 we will observe in the future that the particles come back to the original configuration. The model allows us to predict the mean return time mj = 1/λj, which in the case j = 2N is equal to m2N = 2^{2N}. This fact provides an explanation of the apparent paradox: it is true that the chain will reach in the future any state, even the most improbable, but the time we have to wait in this case is incredibly long!
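A simulation makes these statements concrete: the occupation frequencies N_j^n / n approach λj, and the chain hovers around i = N. A sketch (plain Python, not from the notes; hypothetical small system with 2N = 20 and 10^5 steps):

```python
import random
from math import comb

random.seed(0)
twoN, steps = 20, 100_000
i, counts = 0, [0] * (twoN + 1)          # start with section A empty
for _ in range(steps):
    # a uniformly chosen particle is in A with probability i / 2N
    i = i - 1 if random.random() < i / twoN else i + 1
    counts[i] += 1

for j in (0, 5, 10):
    print(j, counts[j] / steps, comb(twoN, j) / 2**twoN)  # empirical vs lambda_j
```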
5.10 Entropy of a Markov chain
Given discrete random variables X0, . . . , Xn with joint probabilities p(i0, . . . , in) = P(X0 = xi0, . . . , Xn = xin), their joint entropy is
$$H(X_0, \dots, X_n) = -\sum_{i_0, \dots, i_n} p(i_0, \dots, i_n) \log p(i_0, \dots, i_n).$$
It gives the average amount of information contained in the events of the form {X0 = xi0, . . . , Xn = xin}, which describe the history of the process up to time n.
We can also consider the conditional entropy of Xn given the event {X0 = xi0, . . . , Xn−1 = xin−1}, given by
$$H(X_n \mid X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}}) = -\sum_{i_n} p(i_n \mid i_0, \dots, i_{n-1}) \log p(i_n \mid i_0, \dots, i_{n-1}), \tag{5.50}$$
where p(in | i0, . . . , in−1) = P(Xn = xin | X0 = xi0, . . . , Xn−1 = xin−1). The quantity H(Xn | X0 = xi0, . . . , Xn−1 = xin−1) gives the randomness contained in Xn when we have observed the occurrence of the event {X0 = xi0, . . . , Xn−1 = xin−1}. Analogously, the conditional entropy of Xn given X0, . . . , Xn−1 is given by:
\begin{align*}
H(X_n \mid X_0, \dots, X_{n-1}) &= \sum_{i_0, \dots, i_{n-1}} p(i_0, \dots, i_{n-1})\, H(X_n \mid X_0 = x_{i_0}, \dots, X_{n-1} = x_{i_{n-1}}) \\
&= -\sum_{i_0, \dots, i_{n-1}, i_n} p(i_0, \dots, i_{n-1}, i_n) \log p(i_n \mid i_0, \dots, i_{n-1}),
\end{align*}
which generalizes (2.7). By the chain rule, H(X0, . . . , Xn) = H(X0, . . . , Xn−1) + H(Xn | X0, . . . , Xn−1). In particular, since H(Xn | X0, . . . , Xn−1) ≥ 0, the joint entropy increases with time:
$$H(X_0, \dots, X_n) \geq H(X_0, \dots, X_{n-1}).$$
In this respect, a meaningful figure of merit is the entropy rate h of the process, defined as
$$h := \lim_{n \to \infty} \frac{H(X_0, \dots, X_{n-1})}{n},$$
if the limit exists.
Example 21. If (Xn)n are independent and identically distributed, then h = H(X0). Indeed,
$$\frac{H(X_0, \dots, X_{n-1})}{n} = \frac{n\, H(X_0)}{n} = H(X_0).$$
An interesting example where the entropy rate exists and assumes a particularly simple form can be found in the theory of Markov chains. Let us assume that (Xn)n is a stationary Markov chain with finite state space and that the initial distribution is an invariant distribution λ. In this case the following holds:
1. By the Markov property,
$$H(X_n \mid X_0, \dots, X_{n-1}) = H(X_n \mid X_{n-1}).$$
2. By taking into account that the initial distribution λ is invariant, in such a way that P(Xn = i) = λi for all n, we have
$$H(X_n \mid X_{n-1}) = H(X_1 \mid X_0).$$
Indeed, if P is the stochastic matrix and Pij = P(Xn = j | Xn−1 = i), we have
$$H(X_n \mid X_{n-1}) = -\sum_{i,j} \lambda_i P_{ij} \log P_{ij} = H(X_1 \mid X_0).$$
By identity (5.53), it is easy to see that the entropy rate h is given by
$$h = H(X_1 \mid X_0) = -\sum_{i,j} \lambda_i P_{ij} \log P_{ij}. \tag{5.54}$$
As an example, let us consider again the two-state Markov chain of Example 19, with stochastic matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix},$$
where α ∈ [0, 1], β ∈ [0, 1].
[Diagram: 1 → 2 with probability α, 2 → 1 with probability β; self-loops 1−α and 1−β.]
In the non-trivial case where α + β ≠ 0 there is a unique invariant distribution λ given by:
$$\lambda_1 = \frac{\beta}{\alpha+\beta}, \qquad \lambda_2 = \frac{\alpha}{\alpha+\beta}.$$
The conditional entropy H(X1 | X0) and, by (5.54), the entropy rate h of the Markov chain are given by
\begin{align*}
h = H(X_1 \mid X_0) &= \frac{\beta}{\alpha+\beta}\big(-\alpha \log \alpha - (1-\alpha)\log(1-\alpha)\big) + \frac{\alpha}{\alpha+\beta}\big(-\beta \log \beta - (1-\beta)\log(1-\beta)\big) \\
&= \frac{\beta}{\alpha+\beta} H_B(\alpha) + \frac{\alpha}{\alpha+\beta} H_B(\beta),
\end{align*}
where HB(p) = −p log p − (1 − p) log(1 − p) denotes the entropy of a Bernoulli random variable X with distribution P(X = 0) = p, P(X = 1) = 1 − p.
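A short numerical cross-check of (5.54) against this closed form (Python/numpy, not from the notes; hypothetical α = 0.3, β = 0.1):

```python
import numpy as np

def entropy_rate(P, lam):
    """h = -sum_ij lambda_i P_ij log P_ij (zero entries skipped)."""
    h = 0.0
    for i in range(P.shape[0]):
        for j in range(P.shape[1]):
            if P[i, j] > 0:
                h -= lam[i] * P[i, j] * np.log(P[i, j])
    return h

def H_B(p):
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

alpha, beta = 0.3, 0.1
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
lam = np.array([beta, alpha]) / (alpha + beta)
print(entropy_rate(P, lam))        # these two numbers
print(beta / (alpha + beta) * H_B(alpha) + alpha / (alpha + beta) * H_B(beta))  # agree
```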