Stoch Bio Chapter 3
In this chapter we introduce discrete time Markov chains. For these models both time
and space are discrete. We will begin by introducing the basic model, and provide
some examples. Next, we will construct a Markov chain using only independent
uniformly distributed random variables. Such a construction will demonstrate how
to simulate a discrete time Markov chain, which will also be helpful in the continuous
time setting of later chapters. Finally, we will develop some of the basic theory of
discrete time Markov chains.
    P{X_0 = i_0, X_3 = i_3} = P{X_0 = i_0, X_1 ∈ S, X_2 ∈ S, X_3 = i_3}
        = \sum_{j_1 ∈ S} \sum_{j_2 ∈ S} P{X_0 = i_0, X_1 = j_1, X_2 = j_2, X_3 = i_3},   (3.2)
where the second equality holds as the events are mutually exclusive.
Example 3.1.1. Recall Example 1.1.3, where we let Zk be the outcome of the kth
roll of a fair die and we let
    X_n = \sum_{k=1}^{n} Z_k.
Then, assuming the rolls are independent,
    P{X_1 = 2, X_2 = 4, X_3 = 6} = P{Z_1 = 2, Z_2 = 2, Z_3 = 2} = P{Z_1 = 2} P{Z_2 = 2} P{Z_3 = 2} = (1/6)^3.
Example 3.1.2. Suppose a frog can jump between three lily pads, labeled 1, 2, and
3. We suppose that if the frog is on lily pad number 1, it will jump to lily pad number
2 with a probability of one. Similarly, if the frog is on lily pad number 3, it will jump
to lily pad number 2. However, when the frog is on lily pad number 2, it will jump
to lily pad 1 with probability 1/4, and to lily pad three with probability 3/4. We can
depict this process graphically via
(Transition diagram: 1 → 2 with probability 1; 2 → 1 with probability 1/4 and 2 → 3 with probability 3/4; 3 → 2 with probability 1.)
We let Xn denote the position of the frog after the nth jump, and assume that X0 = 1.
We then intuitively have (this will be made precise shortly)
P {X0 = 1, X1 = 2, X2 = 3} = 1 × 1 × 3/4 = 3/4,
whereas
P {X0 = 1, X1 = 3} = 0.
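As an aside, the frog chain is easy to simulate directly. The following is a minimal Python sketch (an illustration added here, not part of the original notes) that estimates P{X_1 = 2, X_2 = 3 | X_0 = 1} by simulation; the helper name and the number of trials are arbitrary choices.

```python
import random

# Jump probabilities for the frog of Example 3.1.2: from each pad,
# a list of (next_pad, probability) pairs.
jumps = {
    1: [(2, 1.0)],
    2: [(1, 0.25), (3, 0.75)],
    3: [(2, 1.0)],
}

def next_pad(current):
    """Sample the frog's next lily pad from the jump probabilities."""
    u = random.random()
    cumulative = 0.0
    for pad, prob in jumps[current]:
        cumulative += prob
        if u < cumulative:
            return pad
    return jumps[current][-1][0]

trials = 100_000
hits = 0
for _ in range(trials):
    x0 = 1                      # the chain starts on pad 1
    x1 = next_pad(x0)
    x2 = next_pad(x1)
    if (x1, x2) == (2, 3):
        hits += 1

# Should be close to 1 * 1 * 3/4 = 0.75.
print("estimate of P{X1 = 2, X2 = 3 | X0 = 1}:", hits / trials)
```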
Actually computing values like (3.2) can be challenging even when the values
(3.1) are known, and it is useful to assume the process has some added structure. A
common choice for such structure is the assumption that the process satisfies the
Markov property:
P {Xn = in | X0 = i0 , . . . , Xn−1 = in−1 } = P {Xn = in | Xn−1 = in−1 }, (3.3)
which says that the probabilities associated with future states depend only upon
the current state, and not on the full history of the process. Any process Xn , n ≥ 0,
satisfying the Markov property (3.3) is called a discrete time Markov chain. Note that
the processes described in Examples 3.1.1 and 3.1.2 are both discrete time Markov
chains.
Definition 3.1.3. The one-step transition probability of a Markov chain from state
i to state j, denoted by pij (n), is
    p_{ij}(n) := P{X_{n+1} = j | X_n = i}.
If the transition probabilities do not depend upon n, then the process is said to
be time homogeneous, or simply homogeneous, and we will use the notation pij as
opposed to pij (n).
All discrete time Markov chain models considered in these notes will be time
homogeneous, unless explicitly stated otherwise. It is a straightforward use of con-
ditional probabilities to show that any process satisfying the Markov property (3.3)
satisfies the more general condition
    P{X_0 = i_0, · · · , X_n = i_n}
        = P{X_n = i_n | X_0 = i_0, · · · , X_{n−1} = i_{n−1}} P{X_0 = i_0, · · · , X_{n−1} = i_{n−1}}
        = p_{i_{n−1} i_n} P{X_0 = i_0, · · · , X_{n−1} = i_{n−1}}   (3.5)
        ⋮
        = α_{i_0} p_{i_0 i_1} · · · p_{i_{n−1} i_n},
and the problem of computing probabilities has been converted to one of simple
multiplication. For example, returning to Example 3.1.2, we have
    P{X_0 = 1, X_1 = 2, X_2 = 3} = α_1 p_{12} p_{23} = 1 · 1 · (3/4) = 3/4.
Definition 3.1.4. The transition matrix P for a Markov chain with state space
S = {1, 2, . . . , N } and one-step transition probabilities pij is the N × N matrix
    P := \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}.
If the state space S is infinite, then P is formally defined to be the infinite matrix
with i, jth component pij .
Note that the matrix P satisfies
    0 ≤ P_{ij} ≤ 1,   1 ≤ i, j ≤ N,   (3.6)
    \sum_{j=1}^{N} P_{ij} = 1,   1 ≤ i ≤ N.   (3.7)
Any matrix satisfying the two conditions (3.6) and (3.7) is called a Markov or stochas-
tic matrix, and can be the transition matrix for a Markov chain. If P also satisfies
the condition
    \sum_{i=1}^{N} P_{ij} = 1,   1 ≤ j ≤ N,
so that the column sums are also equal to 1, then P is termed doubly stochastic.
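Conditions (3.6) and (3.7) are also easy to check numerically. The sketch below (an illustration we add here, not from the original text; the helper names are ours) tests whether a numpy array is a stochastic matrix and whether it is doubly stochastic.

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """Check conditions (3.6)-(3.7): entries in [0, 1] and row sums equal to 1."""
    P = np.asarray(P, dtype=float)
    entries_ok = np.all((P >= -tol) & (P <= 1 + tol))
    rows_ok = np.allclose(P.sum(axis=1), 1.0, atol=tol)
    return entries_ok and rows_ok

def is_doubly_stochastic(P, tol=1e-12):
    """A stochastic matrix whose column sums are also all equal to 1."""
    P = np.asarray(P, dtype=float)
    return is_stochastic(P, tol) and np.allclose(P.sum(axis=0), 1.0, atol=tol)

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])   # reflecting random walk on {0, 1, 2}
print(is_stochastic(P), is_doubly_stochastic(P))   # True False
```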
3.1.1 Examples
We list examples that will be returned to throughout these notes.
Example 3.1.5. This example, termed the deterministically monotone Markov chain,
is quite simple but will serve as a building block for more important models in the
continuous time setting.
Consider X_n with state space {1, 2, . . . }, and with transition probabilities p_{i,i+1} = 1 for every i, with all other transition probabilities equal to zero. Thus, if α is the initial distribution and α_1 = 1, then the
process simply starts at 1 and proceeds deterministically up the integers towards
positive infinity.
Example 3.1.6. Suppose that Xn are independent and identically distributed with
P {X0 = k} = ak , k = 0, 1, . . . , N,
where a_k ≥ 0 and \sum_k a_k = 1. Then X_n is (trivially) a discrete time Markov chain with transition probabilities p_{ij} = a_j for all i, j ∈ {0, 1, . . . , N}, since by independence the conditioning in (3.3) has no effect.
Example 3.1.7. We consider a simple model for a gene that can be repressed. At each time n the gene is either free (unrepressed), which we denote by state 0, or repressed, which we denote by state 1, and we let X_n be the state of the gene at time n. We assume the following:

1. If the gene is free at time n, there will be a probability of p ≥ 0 that it is repressed at time n + 1.

2. If the gene is repressed at time n, there will be a probability of q ≥ 0 that it is free at time n + 1.
In this setting Xn can be modeled as a discrete time Markov chain with finite state
space S = {0, 1}. The transition matrix is
    P = \begin{pmatrix} 1 − p & p \\ q & 1 − q \end{pmatrix},   (3.8)
where the first row/column is associated with state 0. Note that any two state discrete
time Markov chain has a transition matrix of the form (3.8).
Example 3.1.8 (Random walk with finite state space). A “random walk” is a model
used to describe the motion of an entity, the walker, on some discrete space. Taking
our state space to be {0, . . . , N }, for some N > 0, we think of the walker flipping
a coin to decide whether or not to move to the right or left during the next move.
That is, at each time-step the walker moves one step to the right with probability p
(she flipped a heads) and to the left with probability 1 − p (she flipped a tails). If
p = 1/2, the walk is termed symmetric or unbiased, whereas if p ≠ 1/2, the walk is biased. The one step transition probabilities for i ∈ {1, . . . , N − 1} are
    p_{i,i+1} = p,   p_{i,i−1} = 1 − p,
though we must still give the transition probabilities at the boundaries. One choice for
the boundary conditions would be to assume that with probability one, the walker
transitions away from the boundary during the next time step. That is, we could
have
p01 = 1, pN,N −1 = 1.
We say such a process has reflecting boundaries. Note that Example 3.1.2 was a
model of a random walk on {1, 2, 3} with reflecting boundaries. Another option
for the boundary conditions is to assume there is absorption, yielding the boundary
conditions
    p_{0,0} = 1,   p_{N,N} = 1,
in which case the chain is often called the Gambler's Ruin, which can be understood by assuming p < 1/2. Finally, we could have a partial type of reflection, for example
    p_{0,0} = 1 − p, p_{0,1} = p,   and   p_{N,N} = p, p_{N,N−1} = 1 − p.
Of course, we could also have any combination of the above conditions at the different
boundary points. We could also generalize the model to allow for the possibility of
the walker choosing to stay at a given site i ∈ {1, . . . , N − 1} during a time interval.
In the most general case, we could let qi , pi and ri be the probabilities that the walker
moves to the left, right, and stays put given that she is in state i. Assuming absorbing
boundary conditions, the transition matrix for this model is
    P = \begin{pmatrix}
    1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
    q_1 & r_1 & p_1 & 0 & \cdots & 0 & 0 \\
    0 & q_2 & r_2 & p_2 & \cdots & 0 & 0 \\
    \vdots & & \ddots & \ddots & \ddots & & \vdots \\
    0 & \cdots & 0 & 0 & q_{N−1} & r_{N−1} & p_{N−1} \\
    0 & 0 & 0 & 0 & \cdots & 0 & 1
    \end{pmatrix},
where it is understood that qi , pi , ri ≥ 0, and qi + pi + ri = 1 for all i ∈ {1, . . . , N − 1}.
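A transition matrix of this form is easy to assemble programmatically. Below is a small Python sketch (added for illustration; the helper name and the example parameters are ours) that builds the (N+1) × (N+1) matrix with absorbing boundaries from vectors of the q_i, r_i, p_i.

```python
import numpy as np

def random_walk_matrix(q, r, p):
    """Transition matrix on {0, ..., N} with absorbing boundaries.

    q[i], r[i], p[i] are the left/stay/right probabilities from the
    interior state i, for i = 1, ..., N-1 (entry 0 of each list is unused).
    """
    N = len(q)              # interior states are 1, ..., N-1
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = 1.0           # absorption at 0
    P[N, N] = 1.0           # absorption at N
    for i in range(1, N):
        assert abs(q[i] + r[i] + p[i] - 1.0) < 1e-12
        P[i, i - 1] = q[i]
        P[i, i] = r[i]
        P[i, i + 1] = p[i]
    return P

# Gambler's ruin on {0, ..., 4} with p = 0.4 and no holding (r_i = 0).
N = 4
q = [0.0] + [0.6] * (N - 1)
r = [0.0] + [0.0] * (N - 1)
p = [0.0] + [0.4] * (N - 1)
print(random_walk_matrix(q, r, p))
```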
Example 3.1.9 (Axonal transport). One method of transport used in living cells is
axonal transport in which certain (motor) proteins carry cargo such as mitochondria,
other proteins, and other cell parts, on long microtubules. These microtubules can be
thought of as the “tracks” of the transportation mechanism, with the motor protein
as the random walker. One natural, and simple, model for such transport would begin
by breaking the microtubule into N equally sized intervals, and then letting Xn be
the position of the motor protein on the state space {1, . . . , N }. We could then let
the transition probabilities satisfy
pi,i+1 = pi , pi,i−1 = qi , pi,i = ri , i ∈ {2, . . . , N − 1},
where pi + qi + ri = 1 with pi , qi , ri ≥ 0, and with boundary conditions
    p_{1,1} = q_1 + r_1,   p_{1,2} = p_1,   p_{N,N} = 1,
where we think of the end of the microtubule associated with state N as the destina-
tion of the cargo. In this case, it would be natural to expect pi > qi .
Example 3.1.10 (Random walk on the integers). This Markov chain is like that
of Example 3.1.8, except now we assume that the state space is all the integers
S = Z = {. . . , −1, 0, 1, . . . }. That is, Xn is the position of the walker at time n,
where for some 0 ≤ p ≤ 1 the transition probabilities are given by
pi,i+1 = p, pi,i−1 = 1 − p,
for all i ∈ S. This model is one of the most studied stochastic processes and will be
returned to frequently as a canonical example.
Example 3.1.11 (Random walk on Zd ). We let Zd be the d-dimensional integer
lattice:
Zd = {(x1 , . . . , xd ) : xi ∈ Z}.
Note that for each x ∈ Zd there are exactly 2d values y with |x − y| = 1 (as there are
precisely d components that can be changed by a value of ±1). We may let
    p_{xy} = \begin{cases} 1/(2d) & if |x − y| = 1 \\ 0 & else \end{cases}.
3.2 Constructing a Discrete Time Markov Chain
We turn to the problem of constructing a discrete time Markov chain with a given
initial distribution, α, and transition matrix, P . More explicitly, for the discrete set
S = {1, 2, . . . } (the finite state space is handled similarly), we assume the existence
of:
(i) An initial distribution α = {αk } giving the associated probabilities for the
random variable X0 . That is, for k ∈ S,
αk = P {X0 = k}.
(ii) Transition probabilities, p_{ij}, giving the desired probability of transitioning from state i ∈ S to state j ∈ S:
    p_{ij} = P{X_{n+1} = j | X_n = i}.

Note that we will require that α is a probability vector in that α_k ≥ 0 for each k and
    \sum_{k∈S} α_k = 1.
Similarly, we require that p_{ij} ≥ 0 and, for each i ∈ S,
    \sum_{j∈S} p_{ij} = 1,
which simply says that the chain will transition somewhere from state i (including
the possibility that the chain transitions back to state i). The problem is to now
construct a discrete time Markov chain for a given choice of α and {pij } using more
elementary building blocks: uniform random variables. Implicit in the construction
is a natural simulation method.
We let {U0 , U1 , . . . } be independent random variables that are uniformly dis-
tributed on the interval (0, 1). We will use the initial distribution to produce X0
from U0 , and then for n ≥ 1, we will use the transition matrix to produce Xn from
the pair (Xn−1 , Un ). Note, therefore, that each choice of sequence of uniform ran-
dom variables {U0 , U1 , . . . } will correspond with a unique path of the process Xn ,
n ≥ 0. We therefore have a simulation strategy: produce uniform random variables
and transform them into a path of the Markov chain.
To begin the construction, we generate X0 from U0 using the transformation
method detailed in Theorem 2.3.22. Next, we note that,
P {X1 = j | X0 = i} = pij .
It is straightforward to verify that the constructed model is the desired Markov
chain. Using that Xn+1 is simply a function of Xn and Un+1 , that is Xn+1 =
f(X_n, U_{n+1}), and that by construction X_0, . . . , X_n are independent of U_{n+1}, we have
    P{X_{n+1} = j | X_0 = i_0, . . . , X_{n−1} = i_{n−1}, X_n = i} = P{f(i, U_{n+1}) = j} = P{X_{n+1} = j | X_n = i} = p_{ij}.
The above construction provides an algorithm for the exact simulation of sample
paths of the Markov chain. In fact, the algorithm implicit in the construction above
is already one half of the well known “Gillespie Algorithm” used in the generation
of sample paths in the continuous time Markov chain setting that will be studied in
later chapters [6, 7].
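The construction above is exactly a simulation recipe: draw U_0, U_1, . . . and transform each uniform into the next state. The following Python sketch is our illustration of the idea (not code from the notes); it uses the cumulative sums of the initial distribution and of each row of P as the inverse-CDF transformation.

```python
import numpy as np

def sample_from(cdf, u):
    """Smallest index i with u <= cdf[i] (inverse-CDF / transformation method)."""
    return min(int(np.searchsorted(cdf, u)), len(cdf) - 1)

def simulate_chain(alpha, P, n_steps, rng=None):
    """Simulate X_0, ..., X_{n_steps} using one uniform random variable per step.

    alpha : initial distribution over the states 0, ..., N-1
    P     : N x N transition matrix with P[i, j] = p_{ij}
    """
    rng = np.random.default_rng() if rng is None else rng
    P = np.asarray(P, dtype=float)
    path = [sample_from(np.cumsum(alpha), rng.random())]   # X_0 from U_0
    row_cdfs = np.cumsum(P, axis=1)
    for _ in range(n_steps):
        # X_{n+1} is a function of (X_n, U_{n+1}) only.
        path.append(sample_from(row_cdfs[path[-1]], rng.random()))
    return path

# Two-state gene model (3.8) with p = 1/3 and q = 1/8, started in state 0.
P = np.array([[2/3, 1/3],
              [1/8, 7/8]])
print(simulate_chain([1.0, 0.0], P, n_steps=10))
```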
Let Pijn denote the i, jth entry of the matrix P n . We note that if the state space
is infinite, then we formally have that
    P_{ij}^2 = \sum_{k∈S} p_{ik} p_{kj},
which converges since \sum_k p_{ik} p_{kj} ≤ \sum_k p_{ik} = 1, with similar expressions for P_{ij}^n. The
following is one of the most useful results in the study of discrete time Markov chains, and is the reason much of their study reduces to linear algebra.

Proposition 3.3.2. Let X_n be a discrete time Markov chain with transition matrix P. Then, for all n ≥ 0 and all i, j ∈ S,
    P{X_n = j | X_0 = i} = P_{ij}^n.
Proof. We will show the result by induction on n. First, note that the cases n = 0
and n = 1 follow by definition. Next, assuming the result is true for a given n ≥ 1,
we have
    P{X_{n+1} = j | X_0 = i} = \sum_{k∈S} P{X_{n+1} = j, X_n = k | X_0 = i}
        = \sum_{k∈S} P{X_{n+1} = j | X_n = k} P{X_n = k | X_0 = i}
        = \sum_{k∈S} p_{kj} p_{ik}^{(n)}
        = \sum_{k∈S} P_{ik}^n P_{kj},
where the final equality is our inductive hypothesis. The last term is the i, jth entry of P^{n+1}.
We note that a slight generalization of the above computation yields
    p_{ij}^{(m+n)} = \sum_{k∈S} p_{ik}^{(m)} p_{kj}^{(n)},   (3.9)
which is known as the Chapman-Kolmogorov equation.
Example 3.3.3. Consider again Example 3.1.7 pertaining to the gene that can be
repressed. Suppose that p = 1/3 and q = 1/8 and we know that the gene is unbound
at time 0, and so
    α = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
Suppose we want to know the probability that the gene is unbound at time n = 4.
We have
    P = \begin{pmatrix} 2/3 & 1/3 \\ 1/8 & 7/8 \end{pmatrix},
and so
    P^4 ≈ \begin{pmatrix} 0.33533 & 0.66467 \\ 0.24925 & 0.75075 \end{pmatrix},
and
    α^T P^4 ≈ [0.33533, 0.66467].
Thus, the desired probability is .33533.
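The matrix power in Example 3.3.3 is quickly reproduced numerically; the following short numpy sketch (ours, not part of the original notes) computes α^T P^4 for the gene model.

```python
import numpy as np

P = np.array([[2/3, 1/3],
              [1/8, 7/8]])      # transition matrix of Example 3.3.3
alpha = np.array([1.0, 0.0])    # gene unbound (state 0) at time 0

# Distribution at time 4 is alpha^T P^4.
dist4 = alpha @ np.linalg.matrix_power(P, 4)
print(dist4)   # approximately [0.33533, 0.66467]
```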
A natural question, and the focus of Section 3.5, is the following: for large n,
what are the values P {Xn = i}, for i ∈ S. That is, after a very long time what are
the probabilities of being in different states. By Proposition 3.3.2, we see that this
question, at least in the case of a finite state space, can be understood simply through
matrix multiplication.
For example, suppose that Xn is a two-state Markov chain with transition matrix
    P = \begin{pmatrix} 2/3 & 1/3 \\ 1/8 & 7/8 \end{pmatrix}.
It is easy to check with a computer, or linear algebra, that for very large n,
    P^n ≈ \begin{pmatrix} 3/11 & 8/11 \\ 3/11 & 8/11 \end{pmatrix} := Π.
Note that the rows of Π are identical and equal to π T = [3/11, 8/11]. Therefore, if v
is a probability vector (that is, a row vector with non-negative elements that sum to
one, think of it as an initial distribution), we see that
    \lim_{n→∞} v^T P^n = v^T Π = π^T.
3.4 Classification of States
3.4.1 Reducibility
Suppose that Xn is a Markov chain with state space S = {1, 2, 3, 4} and transition
matrix
    P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 \\ 0 & 0 & 3/4 & 1/4 \end{pmatrix}.   (3.11)
Note that if the chain starts in either state 1 or 2, then it will remain in {1, 2} for all
time, whereas if the chain starts in state 3 or 4, it will remain in {3, 4} for all time.
It seems natural to study this chain by analyzing the “reduced chains,” consisting of
states S1 = {1, 2} and S2 = {3, 4}, separately.
If instead the transition matrix is
    P = \begin{pmatrix} 1/2 & 1/4 & 1/4 & 0 \\ 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 \\ 0 & 0 & 3/4 & 1/4 \end{pmatrix},   (3.12)
then it should be at least intuitively clear that even if X0 ∈ {1, 2}, the chain will
eventually move to the states {3, 4} as every time the chain enters state 1, it has a
probability of 0.25 of next transitioning to state 3. Once such a transition occurs,
the chain remains in the states {3, 4} for all time. This intuition will be shown to
be true later in the notes. For this example, if only the probabilities associated with
very large n are desired, then it seems natural to only consider the “reduced chain”
consisting of states {3, 4}.
The following definitions describe when chains can be so reduced.

Definition 3.4.1. We say that state j is accessible from state i, and write i → j, if there is an n ≥ 0 for which p_{ij}^{(n)} > 0.
For example, for the chain with transition matrix (3.11) we have the relations
1 → 2, 2 → 1, 3 → 4, and 4 → 3, together with all the relations i → i. However, for
the chain with transition matrix (3.12), we have all the relations i → i and
• 1 → 2, 1 → 3, 1 → 4,
• 2 → 1, 2 → 3, 2 → 4,
• 3 → 4,
• 4 → 3,
which can be seen from the fact that
    P^3 = \begin{pmatrix} 19/72 & 5/18 & 5/18 & 13/72 \\ 10/27 & 97/216 & 1/8 & 1/18 \\ 0 & 0 & 107/216 & 109/216 \\ 0 & 0 & 109/192 & 83/192 \end{pmatrix},
combined with the fact that the bottom left 2 × 2 sub-matrix of P^n will always consist entirely of zeros.
Definition 3.4.2. States i, j ∈ S of a Markov chain communicate with each other,
and we write i ↔ j, if i → j and j → i.
It is straightforward to verify that the relation ↔ is
1. Reflexive: i ↔ i.
2. Symmetric: i ↔ j implies j ↔ i.
3. Transitive: i ↔ j and j ↔ k implies i ↔ k.
Only the third condition need be checked, and it essentially follows from the Chapman-Kolmogorov equation (3.9): Since i → j, there is an n ≥ 0 such that p_{ij}^{(n)} > 0. Since j → k, there is an m ≥ 0 such that p_{jk}^{(m)} > 0. Therefore, by (3.9),
    p_{ik}^{(n+m)} = \sum_{\ell∈S} p_{i\ell}^{(n)} p_{\ell k}^{(m)} ≥ p_{ij}^{(n)} p_{jk}^{(m)} > 0,
and i → k.
We may now decompose the state space using the relation ↔ into disjoint equiv-
alence classes called communication classes. For example, the Markov chain with
transition matrix (3.11) has two communication classes {1, 2} and {3, 4}. Also, the
Markov chain with transition matrix (3.12) has the same communication classes:
{1, 2} and {3, 4}. For the deterministically monotone process of Example 3.1.5, each
singleton {i}, i ≥ 1, is its own communication class. For the symmetric random
walk of Example 3.1.8 with absorbing boundaries (the Gambler’s Ruin problem) the
communication classes are {0}, {N }, and {1, . . . , N − 1}, whereas for the symmetric
random walk with reflecting boundaries the only communication class is the entire
state space {0, . . . , N }. For the random walk on the integer lattice Zd described in
Example 3.1.11, the only communication class is all of Zd .
Definition 3.4.3. A Markov chain is irreducible if there is only one communication
class. That is, if i ↔ j for all i, j ∈ S. Otherwise, it is called reducible.
Consider again the Markov chains with transition matrices (3.11) and (3.12). For
both, the set of states {1, 2} is a communication class. However, it should be clear
that the behavior of the chains on {1, 2} are quite different as the chain with transition
matrix (3.12) will eventually leave those states (assuming it starts there), and never
return.
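For a finite chain, the communication classes can be found mechanically from the transition matrix: j is accessible from i precisely when j is reachable from i in the directed graph with an edge i → j whenever p_{ij} > 0. A small Python sketch of this idea (an illustration we add here; the helper names are ours, and states are labelled 0, . . . , n−1):

```python
import numpy as np

def reachable(P, i):
    """Set of states j with i -> j (including i itself, since p_ii^(0) = 1)."""
    n = P.shape[0]
    seen, stack = {i}, [i]
    while stack:
        k = stack.pop()
        for j in range(n):
            if P[k, j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def communication_classes(P):
    """Partition {0, ..., n-1} into classes of mutually accessible states."""
    n = P.shape[0]
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}   # i <-> j
        classes.append(sorted(cls))
        assigned |= cls
    return classes

# The reducible chain (3.12), states relabelled 0-3: classes should be [0, 1] and [2, 3].
P = np.array([[1/2, 1/4, 1/4, 0],
              [1/3, 2/3, 0,   0],
              [0,   0,   1/3, 2/3],
              [0,   0,   3/4, 1/4]])
print(communication_classes(P))
```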
Definition 3.4.4. A subset of the state space C ⊂ S, is said to be closed if it is
impossible to reach any state outside of C from any state in C via one-step transitions.
That is, C is closed if p_{ij} = 0 for all i ∈ C and j ∉ C. We say that the state j is
absorbing if {j} is closed.
The set {1, 2} is closed for the chain with transition matrix (3.11), whereas it is
not for that with transition matrix (3.12). However, the set {3, 4} is closed for both.
For the deterministically monotone system, the subset {n, n + 1, n + 2, . . . } is closed
for any n ≥ 0. For the Gambler’s ruin problem of random walk on {0, . . . , N } with
absorbing boundary conditions, only {0} and {N } are closed.
We point out that if C ⊂ S is closed, then the matrix with elements pij for i, j ∈ C
is a stochastic matrix because for any i ∈ C,
    \sum_{j∈C} p_{ij} = 1,   and   \sum_{j∈C^c} p_{ij} = 0.
Therefore, if we restrict our attention to any closed subset of the state space, we can
treat the resulting model as a discrete time Markov chain itself. The most interesting
subsets will be those that are both closed and irreducible: for example the subset
{3, 4} of the Markov chain with transition matrix (3.11) or (3.12), which for either
model is a two-state Markov chain with transition matrix
    P = \begin{pmatrix} 1/3 & 2/3 \\ 3/4 & 1/4 \end{pmatrix}.
3.4.2 Periodicity
Periodicity helps us understand the possible motion of a discrete time Markov chain.
As a canonical example, consider the random walker of Example 3.1.8 with state
space S = {0, 1, 2, 3, 4} and reflecting boundary conditions. Note that if this chain
starts in state i ∈ S, it can only return to state i on even times.
For another example, consider the Markov chain on {0, 1, 2} with
    P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.
Thus, the chain deterministically moves from state 0 to state 1, then to state 2, then
back to 0, etc. Here, if the chain starts in state i, it can (and will) only return to
state i at times that are multiples of 3.
On the other hand, consider the random walk on S = {0, 1, 2, 3, 4} with boundary
conditions
p0,0 = 1/2, p0,1 = 1/2, and p4,3 = 1.
In this case, if the chain starts at state 0, there is no condition similar to those above
on the times that the chain can return to state 0.
Definition 3.4.5. The period of state i is defined to be
    d(i) := gcd{n ≥ 1 : p_{ii}^{(n)} > 0},
where gcd stands for greatest common divisor. If {n ≥ 1 : p_{ii}^{(n)} > 0} = ∅,² we take d(i) = 1. If d(i) = 1, we say that i is aperiodic, and if d(i) > 1, we say that i is periodic with a period of d(i).
The proof of the following theorem can be found in either [10, Chapter 1] or [13,
Chapter 2].
Theorem 3.4.6. Let Xn be a Markov chain with state space S. If i, j ∈ S are in the
same communication class, then d(i) = d(j). That is, they have the same period.
Therefore, we may speak of the period of a communication class, and if the chain
is irreducible, we may speak of the period of the Markov chain itself. Any property
which necessarily holds for all states in a communication class is called a class property.
Periodicity is, therefore, the first class property we have seen, though recurrence and
transience, which are discussed in the next section, are also class properties.
Periodicity is often obvious when powers of the transition matrix are taken.
Example 3.4.7. Consider a random walk on {0, 1, 2, 3} with reflecting boundary
conditions. This chain is periodic with a period of two. Further, we have
    P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1 & 0 \end{pmatrix},
and for any n ≥ 1,
    P^{2n} = \begin{pmatrix} ∗ & 0 & ∗ & 0 \\ 0 & ∗ & 0 & ∗ \\ ∗ & 0 & ∗ & 0 \\ 0 & ∗ & 0 & ∗ \end{pmatrix},   and   P^{2n+1} = \begin{pmatrix} 0 & ∗ & 0 & ∗ \\ ∗ & 0 & ∗ & 0 \\ 0 & ∗ & 0 & ∗ \\ ∗ & 0 & ∗ & 0 \end{pmatrix},
where ∗ is a generic placeholder for a positive number.
Example 3.4.8. Consider the random walk on S = {0, 1, 2, 3, 4} with boundary
conditions
p0,0 = 1/2, p0,1 = 1/2, and p4,3 = 1.
The transition matrix is
    P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},
² This happens, for example, for the deterministically monotone chain of Example 3.1.5.
and
    P^8 = \begin{pmatrix}
    71/256 & 57/256 & 1/4 & 9/64 & 7/64 \\
    57/256 & 39/128 & 29/256 & 21/64 & 1/32 \\
    1/4 & 29/256 & 49/128 & 9/256 & 7/32 \\
    9/64 & 21/64 & 9/256 & 63/128 & 1/256 \\
    7/32 & 1/16 & 7/16 & 1/128 & 35/128
    \end{pmatrix},
showing that d(i) = 1 for each i ∈ S.
In the previous example, we used the basic fact that if each element of P n is
positive for some n ≥ 1, then P n+k has strictly positive elements for all k ≥ 0. This
follows because (i) each element of P is nonnegative, (ii) the rows of P sum to one,
and (iii) P n+k = P P n+k−1 .
3.4.3 Recurrence and transience

For i ∈ S we let P_i denote probabilities conditioned on the chain starting at i, that is, P_i{·} = P{· | X_0 = i}, and we let E_i be the expected value associated with the probability measure P_i. Let τ_i denote the first return time to state i,
    τ_i := min{n ≥ 1 : X_n = i},
where we take τ_i = ∞ if the chain never returns to i. We say that the state i is recurrent if
    P_i{τ_i < ∞} = 1,
and transient otherwise. We let
    R := \sum_{n=0}^{∞} 1_{\{X_n = i\}}
denote the random variable giving the number of times the chain returns to state i (where the visit at time zero is counted). Computing the expectation of R we see that
    E_i R = \sum_{n=0}^{∞} P_i{X_n = i} = \sum_{n=0}^{∞} p_{ii}^{(n)}.
Suppose that the chain is transient and let
    p := P_i{τ_i < ∞} < 1.
The random variable R is geometric with parameter 1 − p > 0. That is, for k ≥ 1,
    P_i{R = k} = p^{k−1}(1 − p).
Therefore,
    E_i R = \sum_{k=1}^{∞} k p^{k−1}(1 − p) = \frac{1}{1 − p} < ∞.   (3.13)
Note that equation (3.13) also shows that if the chain is transient, then
Pi {R = ∞} = 0
and there is, with a probability of one, a last time the chain visits the site i. Similarly,
if state i is recurrent, then Pi {R = ∞} = 1 and Ei R = ∞. Combining the above
yields the following.
Theorem 3.4.10. A state i is transient if and only if the expected number of returns
is finite, which occurs if and only if
    \sum_{n=0}^{∞} p_{ii}^{(n)} < ∞.

We say that a recurrent state i is positive recurrent if the expected first return time is finite,
    E_i[τ_i] < ∞.
Otherwise, we say that the state i is null recurrent. The different types of recurrence
will be explored further in Section 3.5, where we will show why positive recurrence is
a much stronger form of recurrence than null recurrence. In fact, in many important
ways positive recurrent chains with an infinite state space behave like finite state
space chains.
The following theorem shows that recurrence, and hence transience, is a class
property. Thus, when the chain is irreducible, we typically say the chain is recurrent.
Theorem 3.4.11. Suppose that i ↔ j. Then state i is recurrent if and only if state
j is recurrent.
Proof. The following argument is the intuition needed to understand the result (which
is also the basis of the proof): because state i is recurrent, we return to it an infinite
number of times with a probability of one. We also know that there is an n > 0 for
which p_{ij}^{(n)} > 0. Thus, every time we are in state i, which happens an infinite number
of times, there is a positive probability that we get to state j in n steps. Thus, we
will enter state j an infinite number of times. The formal proof is below.
Suppose that i is recurrent. We must show that j is recurrent. Because i ↔ j,
there are nonnegative integers n and m that satisfy p_{ij}^{(n)}, p_{ji}^{(m)} > 0. Let k be a non-
negative integer. It is an exercise in the use of conditional probabilities to show
that
    p_{jj}^{(m+n+k)} ≥ p_{ji}^{(m)} p_{ii}^{(k)} p_{ij}^{(n)},
which says that one way to get from j to j in m + n + k steps is to first go to i in m
steps, then return to i in k steps, then return to j in n steps. Therefore,
    \sum_{k=0}^{∞} p_{jj}^{(k)} ≥ \sum_{k=0}^{∞} p_{jj}^{(m+n+k)} ≥ \sum_{k=0}^{∞} p_{ji}^{(m)} p_{ii}^{(k)} p_{ij}^{(n)} = p_{ji}^{(m)} p_{ij}^{(n)} \sum_{k=0}^{∞} p_{ii}^{(k)}.
Because i is recurrent, Theorem 3.4.10 shows that the sum is infinite, and hence that
state j is recurrent.
Note that Theorems 3.4.10 and 3.4.11 together guarantee the following:
Fact: All states of an irreducible, finite state space Markov chain are recurrent.
The above fact holds by the following logic: if the states were not recurrent, they would each be transient. Hence, for each i there would be a last time, call it T_i, that a particular realization of the chain visits state i. Therefore, max_i{T_i} would be the last time the realization visits any
state, which can not be. Things are significantly less clear in the infinite state space
setting as the next few examples demonstrate.
Example 3.4.12. Consider a one dimensional random walk on the integer lattice
S = Z = {. . . , −2, −1, 0, 1, 2, . . . } where for some 0 < p < 1 we have
    p_{i,i+1} = p,   p_{i,i−1} = q,   with q := 1 − p.
This chain is irreducible and has a period of 2. We will show that it is recurrent if
p = 1/2, and transient otherwise. To do so, we will verify the result at the origin
using Theorem 3.4.10, and then use Theorem 3.4.11 to extend the result to the entire
state space.
Notice that because of the periodicity of the system, we have
    p_{00}^{(2n+1)} = 0,
Given that X0 = 0, if X2n = 0 the chain must have moved to the right n times and to
the left n times. Each such sequence of steps has a probability of p^n q^n of occurring. Because there are exactly \binom{2n}{n} such paths, we see
    p_{00}^{(2n)} = \binom{2n}{n} (pq)^n = \frac{(2n)!}{n! \, n!} (pq)^n.
Therefore,
    \sum_{n=0}^{∞} p_{00}^{(2n)} = \sum_{n=0}^{∞} \frac{(2n)!}{n! \, n!} (pq)^n.
Recall that Stirling's formula states that for m ≫ 1,
    m! ∼ m^m e^{−m} \sqrt{2πm},
where f(m) ∼ g(m) means that
    \lim_{m→∞} \frac{f(m)}{g(m)} = 1.
Applying Stirling's formula to the terms in the sum shows that, for large n,
    p_{00}^{(2n)} = \frac{(2n)!}{n! \, n!} (pq)^n ∼ \frac{(4pq)^n}{\sqrt{πn}}.
The function 4pq = 4p(1 − p) is strictly less than one for all p ∈ [0, 1] with p ≠ 1/2. However, when p = 1/2, we have that 4p(1 − p) = 1. Therefore, in the case that p = 1/2 we have
    \sum_{n=0}^{∞} p_{00}^{(2n)} > \sum_{n=N}^{∞} p_{00}^{(2n)} > \sum_{n=N}^{∞} \frac{1}{2\sqrt{πn}} = ∞,
for N sufficiently large,
and by Theorem 3.4.10, the chain is recurrent. When p ≠ 1/2, let ρ = 4pq < 1. We have, again for N sufficiently large,
    \sum_{n=0}^{∞} p_{00}^{(2n)} < N + \sum_{n=N}^{∞} \frac{2}{\sqrt{πn}} ρ^n < ∞,
and by Theorem 3.4.10, the chain is transient.
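The dichotomy in Example 3.4.12 can also be seen numerically from the exact expression p_{00}^{(2n)} = \binom{2n}{n}(pq)^n: the partial sums grow without bound when p = 1/2 and level off when p ≠ 1/2. The Python sketch below is our illustration (the helper name and checkpoints are arbitrary); it computes each term by the ratio of successive central binomial terms to avoid large integers.

```python
def partial_sum(p, n_terms):
    """Sum of C(2n, n) (p q)^n for n = 0, ..., n_terms - 1."""
    q = 1.0 - p
    total, term = 0.0, 1.0          # term for n = 0 is 1
    for n in range(n_terms):
        total += term
        # C(2(n+1), n+1) / C(2n, n) = (2n+1)(2n+2) / (n+1)^2
        term *= (2 * n + 1) * (2 * n + 2) / ((n + 1) ** 2) * (p * q)
    return total

for p in (0.5, 0.6):
    # p = 1/2: the sums keep growing (recurrence).
    # p = 0.6: the sums approach 1/sqrt(1 - 4pq) = 5 (transience).
    print(p, [round(partial_sum(p, N), 3) for N in (10, 1000, 100000)])
```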
Example 3.4.13. We consider now the symmetric random walk on the integer lattice
Zd introduced in Example 3.1.11. Recall that for this example,
    p_{ij} = \begin{cases} 1/(2d) & if |i − j| = 1 \\ 0 & else \end{cases}.
We again consider starting the walk at the origin 0 = (0, 0, . . . , 0). The chain has a period of 2, and so p_{0,0}^{(2n+1)} = 0 for all n ≥ 0. Thus, to apply Theorem 3.4.10 we only need an expression for p_{0,0}^{(2n)}. We will not give a rigorous derivation of the main results
here as the combinatorics for this example are substantially more cumbersome than
the last. Instead, we will make use of the following facts, which are intuitive:
(i) For large values of n, approximately 2n/d of these steps will be taken in each of the d dimensions.

(ii) In each of the d dimensions, the analysis of the previous example implies that the probability that the corresponding component is at zero at time 2n/d is asymptotic to 1/\sqrt{π(n/d)}.

Treating the d components as approximately independent then suggests that
    p_{0,0}^{(2n)} ≈ \left( \frac{d}{πn} \right)^{d/2},
which is summable in n if and only if d ≥ 3. Hence, by Theorem 3.4.10, the symmetric random walk on Z^d is recurrent when d ∈ {1, 2} and transient when d ≥ 3.³

Theorem 3.4.14. If C ⊂ S is a recurrent communication class, then C is closed.
Proof. Suppose C is a recurrent class that is not closed. Then, there exists i ∈ C and
j∈/ C such that pij > 0, but it is impossible to return to state i (otherwise, i ↔ j).
Therefore, the probability of starting in i and never returning is at least pij > 0, a
contradiction with the class being recurrent.
Note that the converse of the above theorem is, in general, false. For example, for
the deterministic monotone chain, each set {n, n + 1, . . . } is closed, though no state
is recurrent.
Suppose that P is a transition matrix for a Markov chain and that R1 , . . . , Rr are
the recurrent communication classes and T1 , . . . , Ts are the transient classes. Then,
³ The video game "Tron" points this out well. Imagine how the game would play in three dimensions.
after potentially reordering the indices of the state, we can write P in the following
form:
    P = \begin{pmatrix}
    P_1 &     &        &     & 0 \\
        & P_2 &        &     &   \\
        &     & \ddots &     &   \\
    0   &     &        & P_r &   \\
        &     & S      &     & Q
    \end{pmatrix},   (3.14)
where Pk is the transition matrix for the Markov chain restricted to Rk . Raising P
to powers of n ≥ 1 yields
    P^n = \begin{pmatrix}
    P_1^n &       &        &       & 0 \\
          & P_2^n &        &       &   \\
          &       & \ddots &       &   \\
    0     &       &        & P_r^n &   \\
          &       & S_n    &       & Q^n
    \end{pmatrix},
and to understand the behavior of the chain on R_k, we need only study P_k. The matrix Q is sub-stochastic in that the row sums are all less than or equal to one,
and at least one of the row sums is strictly less than one. In this case each of the
eigenvalues has an absolute value that is strictly less than one, and it can be shown
that Qn → 0, as n → ∞.
Definition 3.5.1. A probability vector π is a stationary distribution for the Markov chain with transition matrix P if
    π^T P = π^T,   (3.15)
or, written componentwise, if for each i ∈ S,
    π_i = \sum_{j∈S} π_j p_{ji}.   (3.16)
If we think of p_{ji} as giving the flow rate of
probability from state j to state i, then (3.16) can be interpreted in the following
manner: for each state i, the probability of being in state i is equal to the sum of the
probability of being in state j times the “flow rate” from state j to i.
A stationary distribution can be interpreted as a fixed point for the Markov chain
because if the initial distribution of the chain is given by π, then the distribution at
all times n ≥ 0 is also π,
π T P n = π T P P n−1 = π T P n−1 = · · · = π T ,
where we are using equation (3.10). Of course, in the theory of dynamical systems it
is well known that simply knowing a fixed point exists does not guarantee that the
system will converge to it, or that it is unique. Similar questions exist in the Markov
chain setting:

1. Does a stationary distribution exist?

2. If a stationary distribution exists, is it unique?

3. If a unique stationary distribution π exists, do the probabilities P{X_n = i} converge to π_i as n → ∞, regardless of the initial distribution?
We recall that we have already seen an example in which all of the above questions
were answered. Recall that in Section 3.3, we showed that if the two-state Markov
chain has transition matrix
    P = \begin{pmatrix} 2/3 & 1/3 \\ 1/8 & 7/8 \end{pmatrix},   (3.17)
then for very large n,
    P^n ≈ \begin{pmatrix} 3/11 & 8/11 \\ 3/11 & 8/11 \end{pmatrix} = Π.
The important point was that the rows of Π are identical and equal to π T = [3/11, 8/11],
and therefore, if v is an arbitrary probability vector,
    \lim_{n→∞} v^T P^n = v^T Π = π^T.
Example 3.5.2. Suppose that Xn is a three state Markov chain with transition
matrix
    P = \begin{pmatrix} 2/3 & 1/3 & 0 \\ 1/12 & 5/8 & 7/24 \\ 0 & 1/8 & 7/8 \end{pmatrix}.   (3.18)
Then, for large n
    P^n ≈ \begin{pmatrix} 3/43 & 12/43 & 28/43 \\ 3/43 & 12/43 & 28/43 \\ 3/43 & 12/43 & 28/43 \end{pmatrix} = Π,
where we again note that each row of Π is identical. Therefore, regardless of the
initial distribution, we have
    \lim_{n→∞} P{X_n = 1} = \frac{3}{43},   \lim_{n→∞} P{X_n = 2} = \frac{12}{43},   and   \lim_{n→∞} P{X_n = 3} = \frac{28}{43}.
We again note that it is straightforward to check that [3/43, 12/43, 28/43] is the unique left eigenvector of P with eigenvalue 1, normalized to sum to one.
Interestingly, we were able to find stationary distributions for the above transition
matrices without actually computing the left eigenvectors. Instead, we just found the
large n probabilities. Question 3 above asks when such a link between stationary
distributions and large n probabilities holds (similar to convergence to a fixed point
for a dynamical system). This question will be explored in detail in the current
section; however, we begin by making the observation that if
    π^T = \lim_{n→∞} v^T P^n
exists for some probability vector v, then π is necessarily a stationary distribution, since
    π^T P = \lim_{n→∞} v^T P^{n+1} = π^T.
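In practice the stationary distribution is found exactly as described here: compute a left eigenvector of P for the eigenvalue 1 and normalize it to sum to one. A short numpy sketch (ours, not from the notes) applied to the matrix (3.18):

```python
import numpy as np

P = np.array([[2/3,  1/3,  0],
              [1/12, 5/8,  7/24],
              [0,    1/8,  7/8]])      # transition matrix (3.18)

# Left eigenvectors of P are right eigenvectors of P^T.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # index of the eigenvalue closest to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                     # normalize to a probability vector

print(pi)              # approximately [3/43, 12/43, 28/43]
print(pi @ P - pi)     # numerically zero: pi^T P = pi^T
```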
Example 3.5.3. Consider simple random walk on {0, 1, 2} with reflecting boundaries.
In this case we have
    P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}.
It is simple to see that for n ≥ 1,
    P^{2n} = \begin{pmatrix} 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \end{pmatrix},
and,
    P^{2n+1} = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}.
It is easy to see why this happens. If the walker starts at 1, then she must be at one
after an even number of steps, etc. This chain is therefore periodic. Clearly, P n does
not converge in this example.
Example 3.5.4. Consider simple random walk on {0, 1, 2, 3} with absorbing bound-
aries. That is
    P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
For n large we have
    P^n ≈ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2/3 & 0 & 0 & 1/3 \\ 1/3 & 0 & 0 & 2/3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Again, this is believable, as you are assured that you will end up at 0 or 3 after enough
time has passed. We see the problem here is that the states {1, 2} are transient.
Example 3.5.5. Suppose that S = {1, 2, 3, 4, 5} and
    P = \begin{pmatrix} 1/3 & 2/3 & 0 & 0 & 0 \\ 3/4 & 1/4 & 0 & 0 & 0 \\ 0 & 0 & 1/8 & 1/4 & 5/8 \\ 0 & 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 1/3 & 0 & 2/3 \end{pmatrix}.
For n 1, we have
    P^n ≈ \begin{pmatrix} 9/17 & 8/17 & 0 & 0 & 0 \\ 9/17 & 8/17 & 0 & 0 & 0 \\ 0 & 0 & 8/33 & 4/33 & 7/11 \\ 0 & 0 & 8/33 & 4/33 & 7/11 \\ 0 & 0 & 8/33 & 4/33 & 7/11 \end{pmatrix}.
In this case, the Markov chain really consists of two smaller, noninteracting chains:
one on {1, 2} and another on {3, 4, 5}. Each subchain will converge to its equilibrium
distribution, but there is no way to move from one subchain to the other. Here the
problem is that the chain is reducible.
These examples actually demonstrate everything that can go wrong. The following
theorem is the main result of this section.
Theorem 3.5.6. Suppose that P is the transition matrix for a finite Markov chain
that is irreducible and aperiodic. Then, there is a unique stationary distribution π,
πT P = πT ,
for which πi > 0 for each i. Further, if v is any probability vector, then
    \lim_{n→∞} v^T P^n = π^T.
If we let π1 and π2 denote the respective left eigenvectors, and let v be an arbitrary
probability vector, then because π1 and π2 are necessarily linearly independent
v T P n → c1 π1T , as n → ∞,
Much of Theorem 3.5.6 follows from the following linear algebra fact.

Claim. Suppose that P is an N × N stochastic matrix satisfying the following three conditions:

(i) The eigenvalue 1 of P has multiplicity one.

(ii) Every other eigenvalue λ of P satisfies |λ| < 1.

(iii) The left eigenvector associated with the eigenvalue 1 has strictly positive entries.

Then
    v^T P^n → π^T,   as n → ∞,   (3.19)
for any probability vector v, where π is the unique left eigenvector normalized to sum to one, and π_i > 0 for each i.
Note that condition (iii) above is not strictly necessary as it just guarantees
convergence to a vector giving non-zero probability to each state. However, it is
included for completeness (since this will be the case for irreducible chains), and we
will consider the possibility of πi = 0 for some i (which will occur if there are transient
states) later.
It turns out the above claim follows from a straightforward use of Jordan canonical
forms. We point the interested reader to [10, Chapter 1] for full details. However, it
is probably more instructive to show the result in a slightly less general setting by
also assuming that there is a full set of distinct eigenvalues for P (though we stress
that the claim holds even without this added assumption). Thus, let λ1 , λ2 , . . . , λN ,
be the eigenvalues of P with λ1 = 1 and |λi | < 1, for i > 1. Let the corresponding
left eigenvectors be denoted by πi , where π1 is normalized to sum to one (that is, it
is a probability vector). The eigenvectors are necessarily linearly independent and so
we can write our initial distribution as
v = c1 π 1 + c2 π 2 + · · · + cN π N ,
for some choice of ci , which depend upon our choice of v. Thus, letting π (n) denote
the distribution at time n we see
    (π^{(n)})^T = v^T P^n
        = (c_1 π_1^T + c_2 π_2^T + · · · + c_N π_N^T) P^n
        = c_1 λ_1^n π_1^T + c_2 λ_2^n π_2^T + · · · + c_N λ_N^n π_N^T
        → c_1 π_1^T,
as n → ∞. Note that as both π (n) and π1 are probability vectors, we see that c1 = 1,
which agrees with our examples above. We further note the useful fact that the rate
of convergence to the stationary distribution is dictated by the size of the second
largest (in absolute value) eigenvalue.
Returning to Theorem 3.5.6, we see that the theorem will be proved if we can
verify that the transition matrix of an aperiodic, irreducible chain satisfies the three
conditions above. By the Perron-Frobenius theorem, any stochastic matrix, Q, that
has all strictly positive entries satisfies the following:
(i) the eigenvalue 1 has multiplicity one,

(ii) the left eigenvector associated with 1 can be chosen to have strictly positive entries,

(iii) every other eigenvalue λ satisfies |λ| < 1.
Therefore, the Perron-Frobenius theorem almost gives us what we want. However,
the transition matrix of an aperiodic, irreducible chain need not have strictly positive entries, see (3.18) of Example 3.5.2, and so the above can not be applied
directly.
However, suppose instead that P n has strictly positive entries for some n ≥ 1.
Then the Perron-Frobenius theorem can be applied to P n , and conditions (i), (ii),
and (iii) directly above hold for P n . However, by the spectral mapping theorem
the eigenvalues of P n are simply the nth powers of the eigenvalues of P , and the
eigenvectors of P are the eigenvectors of P n . We can now conclude that P also
satisfies the conclusions of the Perron-Frobenius theorem by the following arguments:
1. The eigenvalue 1 of P has multiplicity one, since any left eigenvector of P with eigenvalue 1 is also a left eigenvector of P^n with eigenvalue 1, and that eigenvalue is simple for P^n.

2. Every other eigenvalue λ of P satisfies |λ| < 1: λ^n is an eigenvalue of P^n, and λ^n ≠ 1 (otherwise the eigenvalue 1 of P^n would have multiplicity larger than one), so |λ|^n = |λ^n| < 1.

3. The left eigenvector of P associated with eigenvalue 1 has strictly positive components since this is the eigenvector with eigenvalue 1 for P^n.
Thus, to complete the proof of Theorem 3.5.6 it remains to verify the following claim: if P is the transition matrix of a finite state space, irreducible, aperiodic Markov chain, then there is an M ≥ 1 for which P^n has strictly positive entries for all n ≥ M.

Proof. The proof of the claim is relatively straightforward, and the following is taken
from [10, Chapter 1]. We take the following fact for granted, which follows from a
result in number theory: if the chain is aperiodic, then for each state i, there is an
(n)
M (i) for which pii > 0 for all n ≥ M (i).
Returning to the proof of the claim, we need to show that there is an M > 0 so
that if n ≥ M , then we have that P n has strictly positive entries. Let i, j ∈ S. By
the irreducibility of the chain, there is an m(i, j) for which
    p_{ij}^{(m(i,j))} > 0.
Now, simply let M be the maximum over M(i) + m(i, j), which exists since the state space is finite. Thus, p_{ij}^{(n)} > 0 for all n ≥ M and all i, j ∈ S.
We pause to reflect upon what we have shown. We have concluded that for an
irreducible, aperiodic Markov chain, if we wish to understand the large time prob-
abilities associated with the chain, then it is sufficient to calculate the unique left
eigenvector of the transition matrix with eigenvalue equal to one. Such computations
can be carried out by hand for small examples, though are usually performed with
software (such as Maple or Mathematica) for larger systems. In the next sub-section
we consider what changes when we drop the irreducible assumption. We will consider
the periodic case when we turn to infinite state space Markov chains in Section 3.5.2.
Example 3.5.7. Consider a Markov chain with state space {0, 1, 2, 3} and transition
matrix
    P = \begin{pmatrix} 0 & 1/5 & 3/5 & 1/5 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1 & 0 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 \end{pmatrix}.
Find limn→∞ P {Xn = 2}.
Solution. It is easy to verify that
    P^3 = \begin{pmatrix} 3/16 & 77/400 & 181/400 & 67/400 \\ 127/320 & 57/320 & 97/320 & 39/320 \\ 13/20 & 3/20 & 3/20 & 1/20 \\ 5/32 & 7/32 & 15/32 & 5/32 \end{pmatrix},
which has strictly positive entries, so the chain is irreducible and aperiodic and Theorem 3.5.6 applies. Solving π^T P = π^T, with π normalized to sum to one, yields π = [25/67, 12/67, 22/67, 8/67]. Therefore,
    \lim_{n→∞} P{X_n = 2} = \frac{22}{67}.
Reducible chains
We turn to the case of a reducible chain and begin with some examples.
Example 3.5.8. Consider the gambler’s ruin problem on the state space {0, 1, 2, . . . , N }.
Setting
πα = (α, 0, 0, . . . , 1 − α),
for any 0 ≤ α ≤ 1, it is straightforward to show that παT P = παT . Thus, there are
uncountably many stationary distributions for this example, though it is important
to note that they are all linear combinations of (1, 0, . . . , 0) and (0, . . . , 0, 1), which
are the stationary distributions on the recurrent classes {0} and {N }.
Example 3.5.9. Consider the Markov chain on {1, 2, 3, 4} with
    P = \begin{pmatrix} P_1 & 0 \\ 0 & P_2 \end{pmatrix},
with
    P_i = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix},   for i ∈ {1, 2}.
Then, the communication classes {1, 2} and {3, 4} are each irreducible and aperiodic, and have stationary distribution (1/2, 1/2). Also, for any 0 ≤ α ≤ 1,
    π_α = (α/2, α/2, (1 − α)/2, (1 − α)/2)
is a stationary distribution for the full chain.
The above examples essentially show what happens in this case of a reducible
Markov chain with a finite state space. All of the mass of a limiting distribution will
end up on the recurrent classes, and the form of the stationary distribution on the
recurrent classes can be found by the results in the previous section.
Consider now a general finite state space Markov chain with reducible state space,
S, that is restricted to any recurrent communication class R1 ⊂ S. If the Markov
chain is aperiodic on R1 , then by Theorem 3.5.6 a unique stationary distribution,
π (1) , exists with support only on R1 . Clearly, the previous argument works for each
recurrent communication class Rk ⊂ S. Therefore, we have the existence of a family
of stationary distributions, π (k) , which are limiting stationary distributions for the
Markov chain restricted to the different Rk . We note the following (some of which
are left as homework exercises to verify):
1. Each such π^{(k)} is a stationary distribution for the original, unrestricted Markov chain.

2. Any linear combination a_1 π^{(1)} + · · · + a_r π^{(r)}, with a_k ≥ 0 and \sum_k a_k = 1, is also a stationary distribution for the unrestricted chain.

3. Every stationary distribution of the unrestricted chain can be written as such a linear combination.
Thus, in the case that the Markov chain is reducible, the limiting probabilities will
depend on the initial condition. That is, if αk (i) is the probability that the chain
ends up in recurrent class Rk given it starts in state i, then for j ∈ Rk ,
    \lim_{n→∞} p_{ij}^{(n)} = α_k(i) π_j^{(k)},   (3.21)
where we will discuss how to calculate αk (i) in later sections. Note, however, that
αk (i) will be one if i, j are in the same recurrent class, zero if they are in different
recurrent classes, and between zero and one if i is transient and i → j. We conclude
that if v is an initial distribution for a reducible finite state space Markov chain, then the limit \lim_{n→∞} v^T P^n will always exist, though it will depend upon v.
3.5.2 Countable Markov chains
We now extend the results of the previous section to the setting of a countably infinite
state space. We note that every result stated in this section also holds for the finite
state space case, and these are the most general results. We begin with an example
demonstrating a major difference between the finite and countable state space setting.
Example 3.5.10. Consider symmetric random walk on the integers. That is, the
state space is S = Z and pi,i+1 = pi,i−1 ≡ 1/2 for all i. We know from Example 3.4.12
that this chain is recurrent, and we search for a stationary distribution π satisfying
π T = π T P , where P is the transition matrix. This yields
    π_j = \sum_k π_k p_{kj} = π_{j−1} p_{j−1,j} + π_{j+1} p_{j+1,j} = π_{j−1}(1/2) + π_{j+1}(1/2) = \frac{1}{2}(π_{j−1} + π_{j+1}),
for all j ∈ Z. These can be solved by taking πj ≡ 1. Note, however, that in this case
we can not scale the solution to get a stationary distribution, and so such a π is an
invariant measure, though not a stationary distribution.
While the Markov chain of the previous example was recurrent, and therefore one
might expect a stationary distribution to exist, it turns out the chain “is not recurrent
enough.” We recall that we define τi to be the first return time to state i,
    τ_i := min{n ≥ 1 : X_n = i},
where we take τi = ∞ if the chain never returns. We further recall that the state i is
called recurrent if Pi (τi < ∞) = 1 and transient otherwise. In the infinite state space
setting it is useful to subdivide the set of recurrent states even further.
Definition 3.5.11. The value
    μ_i := E_i τ_i = \sum_{n=1}^{∞} n P_i{τ_i = n}
is called the mean recurrence time or mean first return time for state i. We say that
a recurrent state i is positive recurrent if E_i τ_i < ∞, and null recurrent otherwise.
Note that we have μ_i = ∞ for a transient state, as in this case P_i{τ_i = ∞} > 0.
The following is stated without proof. However, for those that are interested, the
result follows directly from basic renewal theory, see [13, Chapter 3]. Theorem 3.5.12
captures the main difference between positive recurrent and other (null recurrent and
transient) chains.
Theorem 3.5.12. Consider a recurrent, irreducible, aperiodic Markov chain. Then,
for any i, j ∈ S
    \lim_{n→∞} p_{ji}^{(n)} = \frac{1}{μ_i},
where if µi = ∞ (null recurrence), we interpret the right hand side as zero.
The similar theorem for periodic chains is the following.

Theorem 3.5.13. Consider a recurrent, irreducible Markov chain with period d. Then, for any i ∈ S,
    \lim_{n→∞} p_{ii}^{(nd)} = \frac{d}{μ_i},
where if μ_i = ∞ (null recurrence), we interpret the right hand side as zero.
Recurrence has already been shown to be a class property. The following theorem
shows that positive recurrence is also a class property.
Theorem 3.5.14. Suppose that i ↔ j belong to the same class and that state i is
positive recurrent. Then state j is positive recurrent.
Proof. We will prove the result in the aperiodic case so that we may make use of
Theorem 3.5.12. We know from Theorem 3.5.12 that
    \lim_{n→∞} p_{kj}^{(n)} = \frac{1}{μ_j},
for any k in the same class as j. Because j is positive recurrent if and only if µj < ∞,
we see it is sufficient to show that
    \lim_{n→∞} p_{ij}^{(n)} > 0.
Because i ↔ j, there is an m > 0 for which p_{ij}^{(m)} > 0. Therefore,
    \lim_{n→∞} p_{ij}^{(n)} = \lim_{n→∞} p_{ij}^{(n+m)} ≥ \left( \lim_{n→∞} p_{ii}^{(n)} \right) p_{ij}^{(m)} = \frac{1}{μ_i} p_{ij}^{(m)} > 0,
since state i is positive recurrent, so that μ_i < ∞.
Example 3.5.15. Consider again the symmetric (p = 1/2) random walk on the
integer lattice. We previously showed that
    p_{00}^{(2n)} ∼ \frac{1}{\sqrt{πn}}.
Therefore, \lim_{n→∞} p_{00}^{(2n)} = 0, and by Theorem 3.5.13 we have that μ_0 = ∞, and the chain is null recurrent. Thus, when p = 1/2, the chain is periodic and null recurrent, and when p ≠ 1/2, the chain is periodic and transient.
Theorem 3.5.12 also gives a strong candidate for a limiting stationary distribution
for a positive recurrent, irreducible, aperiodic Markov chain.
Theorem 3.5.16. If a Markov chain is irreducible and recurrent, then there is an
invariant measure π, unique up to multiplicative constants, that satisfies 0 < πj < ∞
for all j ∈ S. Further, if the Markov chain is positive recurrent then
    π_i = \frac{1}{μ_i},
where μ_i is the mean recurrence time of state i, \sum_i π_i = 1, and π is a stationary distribution of the Markov chain. If the Markov chain is also aperiodic, then p_{ji}^{(n)} → π_i, as n → ∞, for all i, j ∈ S.
Proof. We will verify the result in the positive recurrent case only, and direct the reader to [13, Chapter 2.12] for the full details. We first show that \sum_{i∈S} π_i = 1. Choosing any k ∈ S, we see
    1 = \sum_{j∈S} \lim_{n→∞} p_{kj}^{(n)} = \sum_{j∈S} \frac{1}{μ_j},
where the final equality follows from Theorem 3.5.12. Next, for any k ∈ S,
    \frac{1}{μ_i} = \lim_{n→∞} p_{ki}^{(n+1)} = \lim_{n→∞} \sum_{j∈S} P{X_{n+1} = i | X_n = j} P{X_n = j | X_0 = k}
        = \sum_{j∈S} p_{ji} \lim_{n→∞} p_{kj}^{(n)}
        = \sum_{j∈S} \frac{1}{μ_j} p_{ji}.
Hence, setting π_i = 1/μ_i gives \sum_{i∈S} π_i = 1 and π^T = π^T P, so that π is a stationary distribution.
µj
A question still remains: can the invariant measure of a null recurrent chain be
normalized to give a stationary distribution? The answer, given in the following
theorem, is no.
Theorem 3.5.18. Suppose a Markov chain is irreducible and that a stationary distribution π exists:
    π^T = π^T P,   \sum_{j∈S} π_j = 1,   π_j > 0.
Then the Markov chain is positive recurrent.
Example 3.5.19 (Random walk with partially reflecting boundaries, [10]). Consider
again a random walker on S = {0, 1, 2, . . . }. Suppose that for j ∈ S the transition
probabilities are given by
pj,j+1 = p, pj,j−1 = 1 − p, if j ≥ 1,
p01 = p, p00 = 1 − p.
This Markov chain is irreducible and aperiodic. We want to determine when this
model will have a limiting stationary distribution, and, hence, when it is positive
recurrent.
A stationary distribution for this system must satisfy
    π_0 = (1 − p)π_0 + (1 − p)π_1,   and   π_j = p π_{j−1} + (1 − p)π_{j+1},   for j ≥ 1.
The general solution of the equations for j ≥ 1 is
    π_j = c_1 + c_2 \left( \frac{p}{1 − p} \right)^j  if p ≠ 1/2,   and   π_j = c_1 + c_2 j  if p = 1/2.
Plugging this into the boundary equation shows c_1 = 0 in the p ≠ 1/2 case, and that c_2 = 0 in the p = 1/2 case. Therefore,
    π_j = c_2 \left( \frac{p}{1 − p} \right)^j  if p ≠ 1/2,   and   π_j = c_1  if p = 1/2.
Because we need \sum_{j=0}^{∞} π_j = 1 for a distribution to exist, we see that if p = 1/2, no choice of c_1 could satisfy this condition.
Now consider the case p ≠ 1/2. We obviously require that c_2 > 0. If p > 1/2, then p/(1 − p) > 1 and the sum
    \sum_{j=0}^{∞} c_2 \left( \frac{p}{1 − p} \right)^j = ∞.
If instead p < 1/2, then p/(1 − p) < 1 and the choice c_2 = 1 − p/(1 − p) makes the sum equal to one.
Thus, the chain is positive recurrent when p < 1/2, which is believable. We also know
that the chain is either null recurrent or transient if p ≥ 1/2.
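The geometric stationary distribution of Example 3.5.19 can be checked numerically against a large truncated version of the chain. The following is our sketch of such a sanity check (the truncation size and the value of p are arbitrary choices, not from the notes):

```python
import numpy as np

p, n_states = 0.3, 200          # p < 1/2; truncate the state space to {0, ..., 199}

# Transition matrix of the partially reflecting walk, truncated with a
# partially reflecting right boundary so that rows still sum to one.
P = np.zeros((n_states, n_states))
P[0, 0], P[0, 1] = 1 - p, p
for j in range(1, n_states - 1):
    P[j, j - 1], P[j, j + 1] = 1 - p, p
P[-1, -2], P[-1, -1] = 1 - p, p

eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

r = p / (1 - p)
formula = (1 - r) * r ** np.arange(n_states)      # pi_j = c2 (p/(1-p))^j
print(np.max(np.abs(pi[:20] - formula[:20])))     # small: the two agree
```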
Suppose that we want to figure out when the chain of the previous example is
either null recurrent or transient. We will make use of the following non-trivial fact,
which is stated without proof. We will make use of this fact again in later sections.
Theorem 3.5.20. Let Xn be an irreducible Markov chain with state space S, and let
i ∈ S be arbitrary. Then Xn is transient if and only if there is a unique solution,
α : S → R, to the following set of equations:
    0 ≤ α_j ≤ 1,   (3.24)
    α_i = 1,   inf{α_j : j ∈ S} = 0,   (3.25)
    α_j = \sum_{k∈S} p_{jk} α_k,   j ≠ i.   (3.26)
To see where these equations come from, for j ∈ S let
    α_j = P{X_n = i for some n ≥ 0 | X_0 = j},
and assume our chain is transient. Then, α_i = 1 by construction and we should have inf{α_j : j ∈ S} = 0 by transience (though we are not going to prove this fact). Finally, for
j ≠ i, we have
    α_j = P{X_n = i for some n ≥ 0 | X_0 = j}
        = P{X_n = i for some n ≥ 1 | X_0 = j}
        = \sum_k P{X_n = i for some n ≥ 1 | X_1 = k} P{X_1 = k | X_0 = j}
        = \sum_k p_{jk} α_k.
Theorem 3.5.22. Consider an irreducible, positive recurrent Markov chain with
unique stationary distribution π. If we let
    N_i(n) = \sum_{k=0}^{n−1} 1_{\{X_k = i\}}
denote the number of visits to state i before time n, then, with probability one,
    \frac{N_i(n)}{n} → π_i,   as n → ∞.
More generally, if f : S → R is bounded, then, with probability one,
    \frac{1}{n} \sum_{k=0}^{n−1} f(X_k) → \sum_{i∈S} f(i) π_i,   as n → ∞.
The final result says that the time averages of a single realization of the Markov
chain converge (with probability one) to the “space averages” obtained by simply
taking expectations with respect to the distribution π. More explicitly, think of a
random variable X∞ having probability mass function P {X∞ = i} = πi . Then, by
definition,
    \sum_{i∈S} f(i) π_i = E f(X_∞).
Example 3.5.23. Consider the Markov chain with state space {1, 2, 3} and transition
matrix
    P = \begin{pmatrix} 1/3 & 2/3 & 0 \\ 1/4 & 1/2 & 1/4 \\ 1 & 0 & 0 \end{pmatrix}.   (3.27)
It is simple to check that the unique stationary distribution of this chain is π = [3/8, 1/2, 1/8]. Therefore, for example, \lim_{n→∞} P{X_n = 3} = 1/8. However, we can also approximate this value using Theorem 3.5.22. Figure 3.5.1 plots (1/n) \sum_{k=0}^{n−1} 1_{\{X_k = 3\}} versus n for one realization of the chain. We see it appears to converge to 1/8.
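The long-run fraction of time in state 3 is easy to regenerate. The Python sketch below is our illustration (run length and seed are arbitrary); it simulates the chain (3.27) and prints the running value of N_3(n)/n, which should settle near 1/8.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[1/3, 2/3, 0],
              [1/4, 1/2, 1/4],
              [1,   0,   0]])      # transition matrix (3.27), states relabelled 0, 1, 2

row_cdfs = np.cumsum(P, axis=1)
n, x = 2000, 0                     # run length and initial state
visits_to_2 = 0                    # state "3" of the example is index 2 here
for k in range(n):
    visits_to_2 += (x == 2)
    x = int(np.searchsorted(row_cdfs[x], rng.random()))
    if (k + 1) % 500 == 0:
        print(k + 1, visits_to_2 / (k + 1))   # running N_3(n)/n, tends to 1/8
```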
We now turn to the following three questions:

1. How many steps do we expect the chain to make before being absorbed by a recurrent class if X_0 = i is a transient state?

2. For given states i, j ∈ S of an irreducible chain, what is the expected number of steps needed to go from state i to state j?

3. If X_0 = j is a transient state, and the recurrent classes are denoted R_1, R_2, . . . , what is the probability that the chain eventually ends up in recurrent class R_k?
Figure 3.5.1: (1/n) \sum_{k=0}^{n−1} 1_{\{X_k = 3\}} versus n for one realization of the Markov chain with transition matrix (3.27). A line of height 0.125 = 1/8 has been added for reference.
We answer these questions sequentially and note that much of the treatment
presented here follows Section 1.5 in Greg Lawler’s book [10].
Question 1. We let P be the transition matrix for some finite Markov chain Xn .
We recall that after a possible reordering of the indices, we can write P as
    P = \begin{pmatrix} \tilde P & 0 \\ S & Q \end{pmatrix},   (3.28)
where P̃ is the transition matrix for only those states associated with recurrent states,
Q is the submatrix of P giving the transition probabilities from the transient states
to the transient states, and S is the submatrix of P giving the transition probabilities
from the transient states to the recurrent states. Raising powers of P in the form
(3.28) yields
    P^n = \begin{pmatrix} \tilde P^n & 0 \\ S_n & Q^n \end{pmatrix}.
For example, consider the Markov chain with state space {1, 2, 3, 4} and transition matrix given by (3.12),
    P = \begin{pmatrix} 1/2 & 1/4 & 1/4 & 0 \\ 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 \\ 0 & 0 & 3/4 & 1/4 \end{pmatrix}.
After reordering the elements of the state space as {3, 4, 1, 2} the new transition
matrix is
    \begin{pmatrix} 1/3 & 2/3 & 0 & 0 \\ 3/4 & 1/4 & 0 & 0 \\ 1/4 & 0 & 1/2 & 1/4 \\ 0 & 0 & 1/3 & 2/3 \end{pmatrix},   (3.29)
and for this example
    \tilde P = \begin{pmatrix} 1/3 & 2/3 \\ 3/4 & 1/4 \end{pmatrix},   Q = \begin{pmatrix} 1/2 & 1/4 \\ 1/3 & 2/3 \end{pmatrix},   and   S = \begin{pmatrix} 1/4 & 0 \\ 0 & 0 \end{pmatrix}.
Proposition 3.6.1. Let Q be the sub-matrix of P corresponding to the transient states, as above. Then every eigenvalue of Q has absolute value strictly less than one, and Q^n → 0 as n → ∞.

The above proposition can be proved in a number of ways using basic linear algebra
techniques. However, for our purposes it may be best to understand it in the following
probabilistic way. Because each of the states represented by Q are transient, we know
that Qn , which gives the n step transition probabilities between the transient states,
converges to zero, implying the result.
Because the eigenvalues of Q have absolute value strictly less than one, the equation (Id − Q)v = 0, where Id is the identity matrix with the same dimensions as Q, has no nonzero solutions. Thus, Id − Q is invertible and we define
    M := (Id − Q)^{−1} = Id + Q + Q^2 + · · · ,   (3.30)
where the series representation holds since
    (Id + Q + Q^2 + · · · )(Id − Q) = Id.
Now consider a transient state j. We let Rj denote the total number of visits to
j,
    R_j = \sum_{n=0}^{∞} 1_{\{X_n = j\}},
where we explicitly note that if the chain starts in state j, then we count that as one
visit. Note that Rj < ∞ with a probability of one no matter the initial condition
since j is transient.
Suppose that X_0 = i, where i is also transient. Then,
    E[R_j | X_0 = i] = \sum_{n=0}^{∞} P{X_n = j | X_0 = i} = \sum_{n=0}^{∞} p_{ij}^{(n)},
which is the i, jth entry of the matrix
    Id + P + P^2 + · · · ,
which, because both i and j are transient, is the same as the i, jth entry of
    Id + Q + Q^2 + · · · = (Id − Q)^{−1}.
Therefore, we can conclude that the expected number of visits to state j, given that the chain starts in state i, is M_{ij}, defined in (3.30).
For example, consider the Markov chain with transition matrix (3.29). For this
example, the matrix M for the transient states {1, 2} is
    M = (Id − Q)^{−1} = \begin{pmatrix} 4 & 3 \\ 4 & 6 \end{pmatrix}.
We see that starting in state 1, for example, the expected number of visits to state 2
before being absorbed to the recurrent states is equal to M12 = 3. Starting in state 2,
the expected number of visits to state 2 (including the first) is M_{22} = 6. Now suppose we want to know the expected total number of visits to the transient states given that X_0 = 2. This value is given by
    E_2 R_1 + E_2 R_2 = M_{21} + M_{22} = 10,
and we see that we simply need to sum the second row of M. We also see that the expected total number of steps needed to transition from state 2 to a recurrent state is 10.
More generally, we have shown the following.

Proposition 3.6.2. Let i and j be transient states of a finite Markov chain, and let M = (Id − Q)^{−1} be as in (3.30). Then the expected number of visits to state j, given that X_0 = i, is M_{ij}, and the expected number of steps the chain takes before entering a recurrent class, given that X_0 = i, is \sum_j M_{ij}, the ith entry of M 1, where 1 is the column vector of all ones and the sum is over the transient states.
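These computations are a one-liner with numpy. The sketch below (ours, not from the notes) recomputes M = (Id − Q)^{−1} and the expected absorption times for the chain with transition matrix (3.29).

```python
import numpy as np

# Sub-matrix Q of (3.29) for the transient states {1, 2}.
Q = np.array([[1/2, 1/4],
              [1/3, 2/3]])

M = np.linalg.inv(np.eye(2) - Q)
print(M)                 # [[4, 3], [4, 6]]
print(M.sum(axis=1))     # expected steps to absorption from states 1 and 2: [7, 10]
```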
Question 2. We now turn to the second question posed at the beginning of this
section: for given states i, j ∈ S of an irreducible chain, what is the expected number
of needed steps to go from state i to state j?
With the machinery just developed, this problem is actually quite simple now.
We begin by reordering the state space so that j is the first element. Hence, the
transition matrix can be written as
    P = \begin{pmatrix} p_{jj} & U \\ S & Q \end{pmatrix},
where Q is a substochastic matrix and the row vector U has the transition probabilities
from j to the other states. Next, simply note that the answer to the question of how
many steps are required to move from i to j would be unchanged if we made j an
absorbing state. Thus, we can consider the problem on the system with transition
matrix
    \tilde P = \begin{pmatrix} 1 & 0 \\ S & Q \end{pmatrix},
where all notation is as before. However, this is now exactly the same problem solved
above and we see the answer is the ith entry of M 1, where all notation is as before.
Example 3.6.3 (Taken from Lawler, [10]). Suppose that P is the transition matrix
for random walk on {0, 1, 2, 3, 4} with reflecting boundary:
    P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},
where the rows and columns are indexed by the states 0, 1, 2, 3, 4.
If we let i = 0, then
    Q = \begin{pmatrix} 0 & 1/2 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1 & 0 \end{pmatrix},   M = (I − Q)^{−1} = \begin{pmatrix} 2 & 2 & 2 & 1 \\ 2 & 4 & 4 & 2 \\ 2 & 4 & 6 & 3 \\ 2 & 4 & 6 & 4 \end{pmatrix}.
Thus,
    M 1 = (7, 12, 15, 16),
and, therefore, the expected number of steps needed to get from state 3 to state 0 is 15.
Example 3.6.4. Consider the Jukes-Cantor model of DNA mutation. The transition
matrix for this model is
    P = \begin{pmatrix} 1 − ρ & ρ/3 & ρ/3 & ρ/3 \\ ρ/3 & 1 − ρ & ρ/3 & ρ/3 \\ ρ/3 & ρ/3 & 1 − ρ & ρ/3 \\ ρ/3 & ρ/3 & ρ/3 & 1 − ρ \end{pmatrix}.
If at time zero the nucleotide is in state 1, how many steps do we expect to take place
before it enters states 3 or 4? Recalling that the different states are A, G, C, and T,
we note that A (adenine) and G (guanine) are purines and that C (cytosine) and T
(thymine) are pyrimidines. Thus, this question is asking for the expected time until
a given purine converts to a pyrimidine.
We make {3, 4} absorbing states, reorder the state space as {3, 4, 1, 2} and note
that the new transition matrix is
    \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ ρ/3 & ρ/3 & 1 − ρ & ρ/3 \\ ρ/3 & ρ/3 & ρ/3 & 1 − ρ \end{pmatrix},
so that
    Q = \begin{pmatrix} 1 − ρ & ρ/3 \\ ρ/3 & 1 − ρ \end{pmatrix}   and   M = (Id − Q)^{−1} = \frac{1}{8ρ} \begin{pmatrix} 9 & 3 \\ 3 & 9 \end{pmatrix}.
Therefore, the expected number of transitions needed to go from state 1 (A) to states 3 or 4 (C or T) is
    M_{11} + M_{12} = \frac{9}{8ρ} + \frac{3}{8ρ} = \frac{3}{2ρ}.
Note that this value goes to ∞ as ρ → 0, which is reasonable.
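As a quick numerical check of the 3/(2ρ) formula, the following is our sketch for one particular value of ρ (the value chosen is arbitrary):

```python
import numpy as np

rho = 0.05
# Q for the transient states {1, 2} after making {3, 4} absorbing.
Q = np.array([[1 - rho, rho / 3],
              [rho / 3, 1 - rho]])

M = np.linalg.inv(np.eye(2) - Q)
print(M[0, 0] + M[0, 1], 3 / (2 * rho))   # both equal 30.0 when rho = 0.05
```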
Question 3. We turn now to the third question laid out at the beginning of this
section: if X0 = j is a transient state, and the recurrent classes are denoted R1 , R2 , . . . ,
what is the probability that the chain eventually ends up in recurrent class Rk ? Note
that this question was asked first in and around equation (3.21).
We begin by noting that we can assume that each recurrent class consists of a
single point (just group all the states of a class together). Therefore, we denote
the recurrent classes as r1 , r2 , . . . , with pri ,ri = 1. Next, we let t1 , t2 , . . . denote the
transient states. We may now write the transition matrix as
    P = \begin{pmatrix} I & 0 \\ S & Q \end{pmatrix},
where we put the recurrent states first. For any transient state ti and recurrent class
k, we define
    α_k(t_i) := P{X_n = r_k for some n ≥ 0 | X_0 = t_i}.
For recurrent states r_k, r_i we set α_k(r_k) ≡ 1, and α_k(r_i) = 0 if i ≠ k. Then, for any transient state t_i we have
    α_k(t_i) = P{X_n = r_k for some n ≥ 0 | X_0 = t_i}
        = \sum_{j∈S} P{X_1 = j | X_0 = t_i} P{X_n = r_k for some n ≥ 0 | X_1 = j}
        = \sum_{j∈S} p_{t_i, j} α_k(j)
        = \sum_{r_j} p_{t_i, r_j} α_k(r_j) + \sum_{t_j} p_{t_i, t_j} α_k(t_j)
        = p_{t_i, r_k} + \sum_{t_j} p_{t_i, t_j} α_k(t_j),
where the first sum was over the recurrent states and the second (and remaining) sum is over the transient states. If A is the matrix whose i, kth entry is α_k(t_i), then the above can be written in matrix form:
    A = S + QA.
Again letting M = (Id − Q)^{−1}, we have
    A = (Id − Q)^{−1} S = M S.
Example 3.6.5. Consider again the Markov chain with state space {3, 4, 1, 2} and
transition matrix
    \begin{pmatrix} 1/3 & 2/3 & 0 & 0 \\ 3/4 & 1/4 & 0 & 0 \\ 1/4 & 0 & 1/2 & 1/4 \\ 0 & 0 & 1/3 & 2/3 \end{pmatrix}.
Note that for this example, we know that we must enter state 3 before state 4, so it
is a good reality check on our analysis above. We again have
    M = \begin{pmatrix} 4 & 3 \\ 4 & 6 \end{pmatrix},   and   S = \begin{pmatrix} 1/4 & 0 \\ 0 & 0 \end{pmatrix},
and so
    M S = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},
as expected.
Example 3.6.6 (Taken from Lawler, [10]). As an example, consider random walk
with absorbing boundaries. We order the states S = {0, 4, 1, 2, 3} and have
    P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1/2 & 0 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 1/2 & 0 & 1/2 & 0 \end{pmatrix},
Then,
    S = \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \\ 0 & 1/2 \end{pmatrix},   M = \begin{pmatrix} 3/2 & 1 & 1/2 \\ 1 & 2 & 1 \\ 1/2 & 1 & 3/2 \end{pmatrix},   M S = \begin{pmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}.
Thus, starting at state 1, the probability that the walk is eventually absorbed at state
0 is 3/4.
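The absorption probability matrix A = M S in Example 3.6.6 can be reproduced directly; the following numpy sketch is ours (not part of the original notes):

```python
import numpy as np

# States ordered {0, 4, 1, 2, 3}; Q and S are the blocks for the transient states {1, 2, 3}.
Q = np.array([[0,   1/2, 0],
              [1/2, 0,   1/2],
              [0,   1/2, 0]])
S = np.array([[1/2, 0],
              [0,   0],
              [0,   1/2]])

M = np.linalg.inv(np.eye(3) - Q)
print(M @ S)   # rows give absorption probabilities into states 0 and 4:
               # [[3/4, 1/4], [1/2, 1/2], [1/4, 3/4]]
```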
3.7 Exercises
1. Suppose there are three white and three black balls in two urns distributed so
that each urn contains three balls. We say the system is in state i, i = 0, 1, 2, 3,
if there are i white balls in urn one. At each stage one ball is drawn at random
from each urn and interchanged. Let Xn denote the state of the system after the
nth draw. What is the transition matrix for the Markov chain {Xn : n ≥ 0}.
2. (Success run chain.) Suppose that Jake is shooting baskets in the school
gym and is very interested in the number of baskets he is able to make in a
row. Suppose that every shot will go in with a probability of p ∈ (0, 1), and
the success or failure of each shot is independent of all other shots. Let Xn
be the number of shots he has currently made in a row after n shots (so, for
example, X0 = 0 and X1 ∈ {0, 1}, depending upon whether or not he hit the
first shot). Is it reasonable to model Xn as a Markov chain? What is the state
space? What is the transition matrix?
a given day. What is the transition matrix when we model this situation with
a Markov chain. If we assume there is a 40% chance of rain today, what is the
probability it will rain three days from now if α = 7/10 and β = 3/10.
5. Verify the condition (3.4). Hint, use an argument like equation (3.5).
7. Let Xn denote a discrete time Markov chain with state space S = {1, 2, 3, 4}
and with transition Matrix
    P = \begin{pmatrix} 1/4 & 0 & 1/5 & 11/20 \\ 0 & 0 & 0 & 1 \\ 1/6 & 1/7 & 0 & 29/42 \end{pmatrix}.
(U1 , U2 , . . . , U10 )
= (0.7943, 0.3112, 0.5285, 0.1656, 0.6020, 0.2630, 0.6541, 0.6892, 0.7482, 0.4505)
What are the communication classes? Which classes are closed? Which classes
are recurrent and which are transient?
9. Consider a finite state space Markov chain, Xn . Suppose that the recurrent
communication classes are R1 , R2 , . . . , Rm . Suppose that restricted to Rk , the
Markov chain is irreducible and aperiodic, and let π̃ (k) be the unique limiting
stationary distribution for the Markov chain restricted to Rk . Now, for each Rk ,
let π^{(k)} (note the lack of a tilde) be the vector with components equal to those of π̃^{(k)} for those states in R_k, and zero otherwise. For example, assuming there are three states in R_1,
    π^{(1)} = (π̃_1^{(1)}, π̃_2^{(1)}, π̃_3^{(1)}, 0, 0, . . . , 0).

(a) Show that any linear combination
    a_1 π^{(1)} + · · · + a_m π^{(m)},
with a_i ≥ 0 and \sum_i a_i = 1, is a stationary distribution for the unrestricted Markov chain, X_n.
(b) All stationary distributions of the Markov chain Xn can be written as such
a linear combination. (Hint: use the general form of the transition matrix
given by equation (3.14). Now, break up an arbitrary stationary distribu-
tion, π, into the different components associated with each communication
class. What can be concluded about each piece of π?)
10. Consider the Markov chain described in Problem 3 above. What is the sta-
tionary distribution for this Markov chain. Interpret this result in terms of the
probabilities of the nucleotide being the different possible values for large times.
Does this result make sense intuitively?
11. Show that the success run chain of Problem 2 above is positive recurrent. What
is the stationary distribution of this chain? Using the stationary distribution,
what is the expected number of shots Jake will hit in a row.
12. Let Xn be the number of customers in line for some service at time n. During
each time interval, we assume that there is a probability of p that a new customer
arrives. Also, with probability q, the service for the first customer is completed
and that customer leaves the queue. Assuming at most one arrival and at most
one departure can happen per time interval, the transition probabilities are
p00 = 1 − p, p01 = p.
(a) Argue why the above transition probabilities are the correct ones for this
model.
(b) For which values of p and q is the chain null recurrent, positive recurrent,
transient?
(c) For the positive recurrent case, give the limiting probability distribution
π. (Hint: note that the equation for π0 and π1 are both different than the
general nth term.)
(d) Again in the positive recurrent case, using the stationary distribution you
just calculated, what is the expected length of the queue in equilibrium?
What happens to this average length as p → q. Does this make sense?
13. This problem has you redo the computation of Example 3.27, though with a
different Markov chain. Suppose our state space is {1, 2, 3, 4} and the transition
matrix is
    P = \begin{pmatrix} 1/4 & 0 & 1/5 & 11/20 \\ 0 & 0 & 0 & 1 \\ 1/6 & 1/7 & 0 & 29/42 \end{pmatrix},
14. (Taken from Lawler, [10]) You will need software for this problem to deal with
the matrix manipulations. Suppose that we flip a fair coin repeatedly until
we flip four consecutive heads. What is the expected number of flips that are
needed? (Hint: consider a Markov chain with state space {0, 1, 2, 3, 4}.)
15. You will need software for this problem to deal with the matrix manipulations.
Consider a Markov chain Xn with state space {0, 1, 2, 3, 4, 5} and transition
matrix
    P = \begin{pmatrix} 1/2 & 0 & 0 & 0 & 1/2 & 0 \\ 0 & 3/4 & 1/4 & 0 & 0 & 0 \\ 0 & 1/8 & 7/8 & 0 & 0 & 0 \\ 1/2 & 1/4 & 1/4 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 1/3 & 1/3 \\ 0 & 0 & 1/4 & 1/4 & 0 & 1/2 \end{pmatrix}.
Here the only recurrent class is {1, 2}. Suppose that X_0 = 0 and let
    T = min{n ≥ 0 : X_n ∈ {1, 2}}
denote the first time the chain enters the recurrent class.
(a) What is ET ?
(b) What is P {XT = 1}? P {XT = 2}? (Note that this is asking for the
probabilities that when the chain enters the recurrent class, it enters into
state 1 or 2.)
16. (Taken from Lawler, [10]) You will need software for this problem to deal with
the matrix manipulations. Let Xn and Yn be independent Markov chains with
state space {0, 1, 2} and transition matrix
    P = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/2 \\ 0 & 1/2 & 1/2 \end{pmatrix}.
T = inf{n : Xn = Yn }.
A hint for all parts of this problem: consider the nine-state Markov chain Zn =
(Xn , Yn ).