ST202 Notes
Module Notes
Based on notes by Matt Ball, Ed Jackson, and Rosalia Linfield, adapted to the content covered
in 2022/23.
May 8, 2023
Contents
1 Basics of Markov Chains
  1.1 Definition and basic properties
  1.2 Class structure
  1.3 Hitting times and absorption probabilities
  1.4 The strong Markov property
2 Recurrence and Transience
3 Branching Processes
  3.1 Disease models
  3.2 The branching process
  3.3 The extinction probability
4 Invariant Distributions
  4.1 Invariant measures and distributions
  4.2 Convergence to equilibrium
  4.3 Time reversal
  4.4 The Ergodic Theorem
  4.5 Markov Chain Monte Carlo
A Summary
  A.1 Basics of Markov chains
  A.2 Recurrence and transience
  A.3 Branching processes
  A.4 Invariant distributions
Introduction
A stochastic process is a collection of random variables indexed by a set T,
{X_t : t ∈ T}.
In this module T will be discrete time, T = {0, 1, 2, . . . }, and the random variables take values in a countable state space I.
Definition 1.1. A matrix P = (p_ij : i, j ∈ I) is a stochastic matrix if
(i) 0 ⩽ p_ij ⩽ 1 for all i, j ∈ I, and
(ii) for every i ∈ I we have Σ_{j∈I} p_ij = 1.
Each row represents a distribution, since it contains non-negative entries which sum to 1.
A stochastic matrix contains the rules for a Markov chain – these are the possible states we
can move between and the probability of moving between these states. More formally:
Definition 1.2. Let λ be a distribution and P be a stochastic matrix. We say that (X_n)_{n⩾0} is
a Markov chain with initial distribution λ and stochastic matrix P if
(i) X_0 ∼ λ, and
(ii) for all n ⩾ 0 and i_0, . . . , i_{n+1} ∈ I,
P(X_{n+1} = i_{n+1} | X_0 = i_0, . . . , X_n = i_n) = p_{i_n i_{n+1}}.
We write (X_n)_{n⩾0} ∼ Markov(λ, P).
Theorem 1.1. A discrete-time stochastic process (Xn )n⩾0 is Markov(λ, P ) if and only if for
all i0 , . . . , iN ∈ I, we have
P(X0 = i0 , . . . , XN = iN )
= P(X0 = i0 )P(X1 = i1 | X0 = i0 ) · · · P(XN = iN | X0 = i0 , . . . , XN −1 = iN −1 )
= λi0 pi0 i1 · · · piN −1 iN .
and
P(X_n = i_n | X_0 = i_0, . . . , X_{n−1} = i_{n−1})
  = P(X_0 = i_0, . . . , X_n = i_n) / P(X_0 = i_0, . . . , X_{n−1} = i_{n−1})
  = (λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−1} i_n}) / (λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−2} i_{n−1}})
  = p_{i_{n−1} i_n}.
We now introduce an important distribution, called the unit mass distribution δ_i := (δ_ij : j ∈ I), defined by
δ_ij = 1 if i = j, and δ_ij = 0 otherwise.
Theorem 1.2 (Markov property). Let (Xn )n⩾0 ∼ Markov(λ, P ). Then, conditional on
the event {Xm = i}, we have (Xm+n )n⩾0 ∼ Markov(δi , P ) and is independent of the random
variables X0 , . . . , Xm .
Proof. Let A = {X_0 = i_0, . . . , X_m = i_m} and B = {X_{m+1} = i_{m+1}, . . . , X_{m+n} = i_{m+n}} be events determined by the chain up to time m and after time m respectively.
We need to show that there is conditional independence for A and B given that X_m = i. We have
P(A ∩ B | X_m = i) = P(X_0 = i_0, . . . , X_m = i_m, . . . , X_{m+n} = i_{m+n}) · 1_{{i_m = i}} / P(X_m = i)
  = λ_{i_0} p_{i_0 i_1} · · · p_{i_{m−1} i_m} p_{i_m i_{m+1}} · · · p_{i_{m+n−1} i_{m+n}} · δ_{i,i_m} / P(X_m = i)
  = ( λ_{i_0} p_{i_0 i_1} · · · p_{i_{m−1} i_m} δ_{i,i_m} / P(X_m = i) ) · p_{i_m i_{m+1}} · · · p_{i_{m+n−1} i_{m+n}}
  = P(A | X_m = i) P(B | X_m = i).
Example 1.1. Let (Xn )n⩾0 be a Markov chain with state space I = {1, . . . , 5} and the following
stochastic matrix. For clarity, we label the rows and columns according to the corresponding
states.
1 2 3 4 5
1 0 1 0 0 0
2 1/2 0 0 1/2 0
P = 3 1/5 0 4/5 0 0 .
4 0 1/3 1/3 0 1/3
5 0 0 0 0 1
Each entry of the matrix is the probability of moving from the state given by its row to the
state given by its column. We can represent this Markov chain by a diagram:
[Transition diagram for the chain on states 1–5, with arrows labelled by the probabilities in P.]
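To make the transition rules concrete, here is a short Python sketch (using numpy, purely as an illustration and not part of the original notes) that stores P from Example 1.1, checks that each row sums to 1, simulates a path, and computes n-step probabilities as powers of P.

    import numpy as np

    # Stochastic matrix from Example 1.1 (states 1-5 stored as indices 0-4).
    P = np.array([
        [0,   1,   0,   0,   0  ],
        [1/2, 0,   0,   1/2, 0  ],
        [1/5, 0,   4/5, 0,   0  ],
        [0,   1/3, 1/3, 0,   1/3],
        [0,   0,   0,   0,   1  ],
    ])
    assert np.allclose(P.sum(axis=1), 1)     # each row is a distribution

    rng = np.random.default_rng(0)

    def simulate(P, start, n_steps):
        # Simulate a path of the chain; states are 0-indexed here.
        path = [start]
        for _ in range(n_steps):
            path.append(rng.choice(len(P), p=P[path[-1]]))
        return path

    print(simulate(P, start=0, n_steps=10))
    print(np.linalg.matrix_power(P, 10)[0])  # 10-step probabilities from state 1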
Example 1.2 (Gene mutation model). Let (Xn )n⩾0 be a Markov chain representing a gene
mutating between two states I = {1, 2}, with probability α of mutating from state 1 to state 2
and probability β of mutating from state 2 to state 1. Then the stochastic matrix is given by
1 2
P = 1 1−α α .
2 β 1−β
[Two-state diagram: 1 → 2 with probability α, 2 → 1 with probability β, and self-loops with probabilities 1 − α and 1 − β.]
We now discuss the probability that a Markov chain is in a given state after n steps. Define
P_i(A) := P(A | X_0 = i), and write P_λ when the initial distribution is λ. This relies on knowing
the values of the matrix P^n, which is also a stochastic matrix. The (i, j)th entry of P^n will be
denoted p_ij^(n).
Theorem (n-step probabilities). Let (X_n)_{n⩾0} ∼ Markov(λ, P). Then
(i) P_i(X_n = j) = p_ij^(n), and
(ii) P_λ(X_n = j) = (λP^n)_j.
Proof. (i): We prove by induction on n. The base case n = 1 is easy. Suppose the condition
holds for some n. Now
P_i(X_{n+1} = j) = Σ_{k∈I} P(X_{n+1} = j, X_n = k | X_0 = i)
  = Σ_{k∈I} P(X_n = k | X_0 = i) P(X_{n+1} = j | X_n = k, X_0 = i).
By the inductive hypothesis we have P(X_n = k | X_0 = i) = p_ik^(n), and by the Markov property
we have P(X_{n+1} = j | X_n = k, X_0 = i) = p_kj. Hence
P_i(X_{n+1} = j) = Σ_{k∈I} p_ik^(n) p_kj = p_ij^(n+1)
as required.
(ii): We have
P_λ(X_n = j) = Σ_{k∈I} P(X_n = j, X_0 = k) = Σ_{k∈I} P(X_0 = k) P(X_n = j | X_0 = k).
By definition P(X_0 = k) = λ_k, and part (i) tells us that P(X_n = j | X_0 = k) = p_kj^(n). So the
sum is equal to
Σ_{k∈I} λ_k p_kj^(n) = (λP^n)_j.
For a finite state space we can often compute P^n by diagonalisation: suppose P = U D U^{−1} for some invertible matrix U, where D = diag(α_1, . . . , α_d) and α_1, . . . , α_d are the eigenvalues of P.
We then have
P^n = U D^n U^{−1} = U diag(α_1^n, . . . , α_d^n) U^{−1},
so p_ij^(n) is of the form
c_1 α_1^n + · · · + c_d α_d^n
for some constants c_1, . . . , c_d ∈ C, assuming the eigenvalues are distinct.
To work around possible complex terms, use polar form: if α_k = a + ib then α_k = re^{iθ} =
r(cos θ + i sin θ), so α_k^n = r^n(cos(nθ) + i sin(nθ)). Since p_ij^(n) must be real, if any eigenvalues
are complex we can write their contribution in terms of sin and cos with real coefficients.
Then we can use boundary conditions, from the first few values of p_ij^(n), to determine the
constants.
Example 1.4. We find p_11^(n) for the Markov chain with stochastic matrix
0 1 0
P = 0 1/2 1/2 .
1/2 0 1/2
The eigenvalues are 1, i/2, −i/2, and from this we deduce that p_11^(n) has the form
p_11^(n) = c_1 + c_2 (i/2)^n + c_3 (−i/2)^n.
Writing the complex terms in polar form, (±i/2)^n = (1/2)^n (cos(nπ/2) ± i sin(nπ/2)), and since p_11^(n) is real we can rewrite this as
p_11^(n) = a + (1/2)^n ( b cos(nπ/2) + c sin(nπ/2) )
where a, b, c ∈ R.
For our boundary conditions, we can just write down the first few values of p_11^(n):
p_11^(0) = 1 = a + b,
p_11^(1) = 0 = a + c/2,
p_11^(2) = 0 = a − b/4,
so a = 1/5, b = 4/5, c = −2/5 and
p_11^(n) = 1/5 + (1/2)^n ( (4/5) cos(nπ/2) − (2/5) sin(nπ/2) ).
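As a sanity check on the diagonalisation method, the closed form for p_11^(n) derived above can be compared against direct matrix powers; the following Python sketch (an illustration, not part of the notes) does this for the first few n.

    import numpy as np

    P = np.array([[0,   1,   0  ],
                  [0,   1/2, 1/2],
                  [1/2, 0,   1/2]])

    def p11_formula(n):
        # Closed form derived above.
        return 1/5 + 0.5**n * (4/5 * np.cos(n*np.pi/2) - 2/5 * np.sin(n*np.pi/2))

    for n in range(8):
        exact = np.linalg.matrix_power(P, n)[0, 0]
        print(n, round(exact, 6), round(p11_formula(n), 6))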
Definition 1.5. We say that i leads to j, written i → j, if P_i(X_n = j for some n ⩾ 0) > 0, and that i communicates with j, written i ↔ j, if i → j and j → i. The relation ↔ partitions I into communicating classes. A class C is closed if i ∈ C and i → j imply j ∈ C, and P (or the chain) is irreducible if I is a single communicating class.
Example 1.5. Consider the following Markov chain, where the arrows denote nonzero proba-
bility.
[Diagram of a Markov chain on states 1–10; arrows indicate which transition probabilities are nonzero.]
Definition 1.6. Let p ∈ [0, 1]. A simple random walk (SRW) on Z is a Markov chain with
state space I = Z, where for all i, j ∈ I
p_ij = p if j = i + 1, q := 1 − p if j = i − 1, and 0 otherwise.
[Diagram: the SRW on Z moves from each state i to i + 1 with probability p and to i − 1 with probability q.]
Definition 1.7. A random variable T : Ω → {0, 1, 2, . . . } ∪ {∞} is a stopping time for (X_n)_{n⩾0} if, for every n, the event {T = n} is determined by X_0, . . . , X_n. In other words, you do not need any information about future events beyond X_n to know that the event {T = n} has taken place.
Definition 1.8. Let (X_n)_{n⩾0} ∼ Markov(λ, P) and A ⊆ I. The hitting time of A, denoted H^A,
is the stopping time given by
H^A(ω) := inf{n ⩾ 0 : X_n(ω) ∈ A},
with the convention inf ∅ = ∞.
Definition 1.9. Let (X_n)_{n⩾0} ∼ Markov(λ, P) and A ⊆ I. The hitting probability of A from a
state i ∈ I is given by
h_i^A := P_i(H^A < ∞),
and the expected hitting time is k_i^A := E_i[H^A].
If A only contains one state, A = {j}, we denote h_i^A and H^A by h_i^j and H^j respectively.
If A is a closed class, we call h_i^A an absorption probability.
Example 1.6. Let A = {2} and consider h_1^2 for the Markov chain given by the stochastic matrix
      ( 0    1/2  1/2  0   )
P =   ( 1/2  1/2  0    0   )
      ( 1/3  1/3  0    1/3 )
      ( 0    0    0    1   ),
with rows and columns indexed by the states 1, 2, 3, 4.
To compute this, we split according to the state reached at the first step. So
h_1^2 = P_1(H^2 < ∞) = Σ_{j∈I} P_1(H^2 < ∞, X_1 = j)
  = Σ_{j∈I} P_1(X_1 = j) P_1(H^2 < ∞ | X_1 = j)
  = Σ_{j∈I} p_1j P_j(H^2 < ∞)
  = (1/2) h_2^2 + (1/2) h_3^2 = 1/2 + (1/2) h_3^2,    (2)
where, splitting in the same way from state 3 (and using h_4^2 = 0, since state 4 is absorbing),
h_3^2 = (1/3) h_1^2 + (1/3) h_2^2 + (1/3) h_4^2 = (1/3) h_1^2 + 1/3.    (3)
Solving the simultaneous equations (2) and (3) gives
h_1^2 = 4/5,  h_3^2 = 3/5.
Theorem 1.4 (Hitting probabilities). The vector of hitting probabilities h^A = (h_i^A : i ∈ I)
for A ⊆ I is the minimal non-negative solution to the system of linear equations
h_i^A = 1 if i ∈ A,
h_i^A = Σ_{j∈I} p_ij h_j^A if i ∉ A.
By substituting sums for x_k in the same way and repeating n times, we obtain
x_i = P_i(X_1 ∈ A) + P_i(X_1 ∉ A, X_2 ∈ A) + · · · + P_i(X_1 ∉ A, . . . , X_{n−1} ∉ A, X_n ∈ A)
      + Σ_{j_1∉A} · · · Σ_{j_n∉A} p_{i j_1} p_{j_1 j_2} · · · p_{j_{n−1} j_n} x_{j_n}.
Example 1.7 (Gamblers’ ruin). Suppose that a gambler enters a casino with wealth £i and
gambles, £1 at a time, with probability p that their stake is doubled and probability q = 1 − p
of losing it. What is the probability that the gambler will leave broke and how does this depend
on the initial wealth i?
We model this as a truncated SRW with state space I = N and p ∈ (0, 1).
[Diagram: state 0 is absorbing; from each i ⩾ 1 the chain moves to i + 1 with probability p and to i − 1 with probability q.]
The communicating classes are C_1 = {1, 2, 3, . . . } and C_2 = {0}, with C_2 closed. We want the
absorption probability h_i^0. By Theorem 1.4, h_0^0 = 1 and
h_i^0 = p h_{i+1}^0 + q h_{i−1}^0
for i > 0. This defines a difference equation. Provided that p ≠ q (we will return to the case
p = q later), solving this gives general solution
h_i^0 = α + β (q/p)^i
for constants α, β. We have an initial condition h_0^0 = 1, so α + β = 1, and hence we can rewrite
the solution as
h_i^0 = 1 + β ( (q/p)^i − 1 ).
The constant β depends on the value of p and q. Consider the following cases:
Case 1 : If q > p, we have q/p > 1, so (q/p)^i − 1 > 0. If β > 0 we would have h_i^0 > 1, and if
β < 0 we would have h_i^0 < 0 for large i, so to achieve a minimal solution in [0, 1] we must have
β = 0. Hence h_i^0 = 1 for all i ⩾ 0.
Case 2 : If q < p, then q/p < 1, so (q/p)^i − 1 < 0. A minimal solution requires β as large as
possible. But if β > 1, then since β(q/p)^i → 0 as i → ∞ and h_i^0 = 1 − β + β(q/p)^i, we would
obtain a negative solution for large i. So we must have β = 1, hence h_i^0 = (q/p)^i for all i ⩾ 0.
Case 3 : If p = q = 1/2, then the auxiliary equation of the difference equation has repeated
roots so the general solution is instead
h0i = α + iβ
for constants α, β. h00 = 1 gives α = 1 and since h0i ∈ [0, 1] for all i we need β = 0, so h0i = 1
for all i ⩾ 0. Thus, even if a gambler goes to a fair casino with equal probability of winning
and losing their stake, they are certain to end up broke.
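The case analysis above can be checked numerically. The Python sketch below (illustrative only; the truncation parameters are arbitrary) estimates the ruin probability by Monte Carlo for p > q and compares it with (q/p)^i.

    import numpy as np

    rng = np.random.default_rng(1)

    def ruin_probability_mc(i, p, n_paths=5_000, max_steps=1_000):
        # Estimate P_i(hit 0) by simulation; long surviving paths are truncated,
        # which biases the estimate slightly downwards (fine for a rough check).
        ruined = 0
        for _ in range(n_paths):
            x = i
            for _ in range(max_steps):
                if x == 0:
                    ruined += 1
                    break
                x += 1 if rng.random() < p else -1
        return ruined / n_paths

    p, q, i = 0.6, 0.4, 3
    print(ruin_probability_mc(i, p))   # Monte Carlo estimate of h_i^0
    print((q / p) ** i)                # theoretical value (q/p)^i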
Example 1.8 (Birth-death chain). Consider the Markov chain with diagram:
[Diagram: state 0 is absorbing; from each i ⩾ 1 the chain moves to i + 1 with probability p_i and to i − 1 with probability q_i.]
The absorption probabilities h_i := h_i^0 satisfy h_0 = 1 and
h_i = p_i h_{i+1} + q_i h_{i−1}
for i > 0. This difference equation does not have constant coefficients, so we use an alternative
method.
Let u_i = h_{i−1} − h_i. Since p_i + q_i = 1, the recurrence rearranges to p_i u_{i+1} = q_i u_i, so
u_{i+1} = (q_i / p_i) u_i = (q_i q_{i−1} / p_i p_{i−1}) u_{i−1},
and repeating this i times gives
u_{i+1} = γ_i u_1  where  γ_i = (q_i q_{i−1} · · · q_1) / (p_i p_{i−1} · · · p_1).
Then
h_0 − h_i = Σ_{j=1}^{i} (h_{j−1} − h_j) = Σ_{j=1}^{i} u_j = u_1 Σ_{j=1}^{i} γ_{j−1},
with the convention γ_0 = 1.
Example 1.9. Let A = {4} and consider k_1^4 for the Markov chain given by the same stochastic
matrix as in Example 1.6. Using the same splitting argument,
k_1^4 = E_1[H^4] = Σ_{j∈I} p_1j E_1[H^4 | X_1 = j] = Σ_{j∈A} p_1j + Σ_{j∉A} p_1j ( E_j[H^4] + 1 ),
since it takes one step to move into state j, so the hitting time starting from 1 is one greater
than the hitting time from j. Splitting further, we have that this is equal to
Σ_{j∈A} p_1j + Σ_{j∉A} p_1j + Σ_{j∉A} p_1j k_j^4 = 1 + Σ_{j∉A} p_1j k_j^4.
Theorem 1.5 (Expected hitting times). The vector of expected hitting times k A = (kiA :
i ∈ I) for A ⊆ I is the minimal non-negative solution to the system of linear equations
k_i^A = 0 if i ∈ A,
k_i^A = 1 + Σ_{j∉A} p_ij k_j^A if i ∉ A.
Proof. Exercise.
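For a finite chain, Theorem 1.4 (and likewise Theorem 1.5) reduces to a small linear system. The following Python sketch (an illustration, not part of the notes; the reachability step is what enforces minimality here) solves the hitting-probability system for the chain of Example 1.6.

    import numpy as np

    # Chain from Example 1.6 (states 1-4 stored as indices 0-3).
    P = np.array([[0,   1/2, 1/2, 0  ],
                  [1/2, 1/2, 0,   0  ],
                  [1/3, 1/3, 0,   1/3],
                  [0,   0,   0,   1  ]])

    def hitting_probabilities(P, A):
        # Minimal non-negative solution of h_i = 1 on A, h_i = sum_j p_ij h_j off A.
        # States that cannot reach A get h_i = 0 (this is what makes the solution
        # minimal); on the remaining states the system has a unique solution.
        n = len(P)
        A = set(A)
        can_reach = set(A)
        changed = True
        while changed:                    # reverse reachability search
            changed = False
            for i in range(n):
                if i not in can_reach and any(P[i, j] > 0 for j in can_reach):
                    can_reach.add(i)
                    changed = True
        h = np.zeros(n)
        for i in A:
            h[i] = 1.0
        unknown = sorted(can_reach - A)
        if unknown:
            M = np.eye(len(unknown)) - P[np.ix_(unknown, unknown)]
            b = P[np.ix_(unknown, sorted(A))].sum(axis=1)
            h[unknown] = np.linalg.solve(M, b)
        return h

    print(hitting_probabilities(P, A={1}))   # expect (4/5, 1, 3/5, 0) for h^{2}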
Remark. We have
Σ_{k=1}^∞ P_i(H^A ⩾ k) = E_i[H^A].
Proof. This can be deduced from the definition of expectation. Let X be a positive, discrete
random variable with values in N ∪ {∞}. Then (omitting the x = 0 case, which contributes nothing) we can write the expectation as
E[X] = Σ_{x⩾1} x P(X = x) = Σ_{x⩾1} Σ_{k=1}^{x} P(X = x) = Σ_{k⩾1} Σ_{x⩾k} P(X = x) = Σ_{k⩾1} P(X ⩾ k),
and applying this with X = H^A under P_i gives the result.
where we have used the Markov property in the second term. The events {T = m} for m ∈ N
partition the event {T < ∞}, so {T < ∞} = ∪_{m=0}^∞ {T = m}, and summing (4) over all
values of m gives the result.
The times S_m := T_m − T_{m−1} between successive visits to J are stopping times for the process
(X_{T_{m−1}+n})_{n⩾0}, which we will prove in Chapter 2.
So by the strong Markov property, the distribution of the next observed state depends only on
the current one, through a quantity which we will denote by p̄_{i_m i_{m+1}}. Hence Y_n := X_{T_n} is a
Markov chain with stochastic matrix P̄ = (p̄_ij : i, j ∈ I).
To find these entries, we split by the following move. For all j ∉ J we have p̄_ij = 0. If j ∈ J
then
p̄_ij = P_i(X_{T_0} = j) = p_ij + Σ_{k∉J} p_ik P_k(X_{T_0} = j) = p_ij + Σ_{k∉J} p_ik p̄_kj,
writing p̄_kj := P_k(X_{T_0} = j) for k ∉ J.
For the next example we introduce the notion of probability generating functions. These
will also be used in Chapter 3.
Definition 1.10. Let X : Ω → R be a random variable. The probability generating function
(PGF) of X is defined by
GX (s) := E[sX ] for |s| < 1.
We have the following properties.
(i) Uniqueness: If X, Y are random variables and GX (s) = GY (s) for all s, then X and Y
have the same distribution.
(ii) Independence: If X, Y are independent random variables then GX+Y (s) = GX (s)GY (s).
(iii) We have
lim_{s↗1} (d^k/ds^k) G_X(s) = E[X(X − 1) · · · (X − k + 1) | X < ∞].
(iv) We have
lim_{s↘0} (1/k!) (d^k/ds^k) G_X(s) = P(X = k).
Example 1.11 (Distribution of hitting times in gamblers’ ruin). Recall from the gamblers’ ruin example (Example 1.7) that the hitting probabilities of state 0 are given by
h_i^0 = 1 if q ⩾ 1/2, and h_i^0 = (q/p)^i if q < 1/2.
We are interested in finding the probability of a specific hitting time, P_1(H^0 = n), if we start
at 1. Let ϕ(s) := E_1[s^{H^0}] be the PGF of H^0. The aim is to expand this as a power series in s.
By splitting according to the next state,
ϕ(s) = E_1[s^{H^0}] = p E_2[s^{H^0 + 1}] + q E_1[s^{H^0} | X_1 = 0],
because moving to state 2 increases the hitting time by 1. Now E_2[s^{H^0+1}] = s E_2[s^{H^0}] by
factoring out s. For the second term we are given X_0 = 1 and X_1 = 0, so the hitting time is
precisely 1, so E_1[s^{H^0} | X_1 = 0] = s. Hence
E_1[s^{H^0}] = ps E_2[s^{H^0}] + qs.    (5)
Now consider E_2[s^{H^0}]. Write H^1 for the hitting time of state 1 and, on {H^1 < ∞}, let H̃^0 be
the additional time after H^1 needed to reach 0, so that H^0 = H^1 + H̃^0. By the SMP, given
{X_{H^1} = 1} and {H^1 < ∞}, H̃^0 and H^1 are independent, and by construction H̃^0 has the same
distribution as H^0 under P_1. Note also that if H^0 is finite then H^1 must be too, since the chain
must hit 1 before it hits 0. So
E_2[s^{H^0}] = E_2[s^{H^0} 1_{{H^1 < ∞}}]
  = P_2(H^1 < ∞) E_2[s^{H^0} | H^1 < ∞]
  = P_2(H^1 < ∞) E_2[s^{H^1 + H̃^0} | H^1 < ∞]
  = P_2(H^1 < ∞) E_2[s^{H^1} | H^1 < ∞] E_2[s^{H̃^0} | H^1 < ∞]
  = E_2[s^{H^1} 1_{{H^1 < ∞}}] E_1[s^{H^0}] = ϕ(s)²,
where the last line uses that H^1 under P_2 has the same distribution as H^0 under P_1. Substituting
into (5) gives ϕ(s) = ps ϕ(s)² + qs. Solving gives
ϕ(s) = ( 1 ± √(1 − 4pqs²) ) / (2ps),
so we need to choose a root. By property (iv) above and continuity of PGFs we require
ϕ(0) = lim_{s↘0} ϕ(s) = P_1(H^0 = 0) ∈ [0, 1]. Taking the positive root gives ϕ(s) → ∞ as s ↘ 0.
By l’Hôpital’s rule we see that the negative root is the correct one:
lim_{s↘0} ( 1 − √(1 − 4pqs²) ) / (2ps) = lim_{s↘0} (1/2p) · (8pqs) / ( 2√(1 − 4pqs²) ) = 0.
Expanding the square root as a power series in s then gives
P_1(H^0 = 1) = q,  P_1(H^0 = 2) = 0,  P_1(H^0 = 3) = pq²,  P_1(H^0 = 4) = 0,  . . .
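The coefficients of this power series can also be obtained numerically. The Python sketch below (illustrative only; the truncation level n_max is arbitrary) computes P_1(H^0 = n) by forward dynamic programming and reproduces q, 0, pq², . . .

    import numpy as np

    def first_passage_probs(p, n_max):
        # P_1(H^0 = n) for n = 1..n_max, by forward dynamic programming:
        # alive[k] holds the probability of being at k > 0 without having hit 0 yet.
        q = 1 - p
        probs = np.zeros(n_max + 1)
        alive = np.zeros(n_max + 3)
        alive[1] = 1.0
        for n in range(1, n_max + 1):
            probs[n] = q * alive[1]          # step from 1 down to 0 exactly at time n
            new = np.zeros_like(alive)
            for k in range(1, n_max + 2):
                new[k] = (p * alive[k - 1] if k >= 2 else 0.0) + q * alive[k + 1]
            alive = new
        return probs

    print(first_passage_probs(0.5, 6)[1:])   # expect q, 0, p q^2, 0, 2 p^2 q^3, 0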
2 Recurrence and Transience
Definition 2.1. A state i ∈ I is recurrent if
P(X_n = i for infinitely many n) = 1.
It is transient if
P(X_n = i for infinitely many n) = 0.
Definition 2.2. The first passage time of state i is the stopping time
T_i(ω) := inf{n ⩾ 1 : X_n(ω) = i},
and inductively we define the rth passage time for r = 1, 2, 3, . . . by the stopping time
T_i^(r)(ω) := inf{ n > T_i^(r−1) : X_n(ω) = i },
with T_i^(0)(ω) = 0.
The rth excursion time is defined by
S_i^(r) := T_i^(r) − T_i^(r−1) if T_i^(r−1) < ∞, and S_i^(r) := ∞ otherwise.
In other words the rth passage time is the time of the rth visit to state i, not including 0.
The excursion time is the time between consecutive visits, which is not a stopping time for the
chain (Xn )n⩾0 , but it is for the following:
Lemma 2.1. For r ⩾ 2, conditional on {T_i^(r−1) < ∞}, the time period S_i^(r) is independent of
all {X_m : m ⩽ T_i^(r−1)} and
P( S_i^(r) = n | T_i^(r−1) < ∞ ) = P_i(T_i = n).
Proof. The T_i^(r) are stopping times, so by the SMP, conditioning on {T_i^(r−1) < ∞}, the process
(X_{T_i^(r−1)+n})_{n⩾0} is Markov(δ_i, P) and independent of X_0, . . . , X_{T_i^(r−1)}. By definition
S_i^(r) = inf{ n ⩾ 1 : X_{T_i^(r−1)+n} = i },
so S_i^(r) is the first passage time to state i of the chain (X_{T_i^(r−1)+n})_{n⩾0}.
i
We now introduce the random variable V_i, which counts all visits to a state i, defined by
V_i := Σ_{n=0}^∞ 1_{{X_n = i}}.
Then
E_i[V_i] = Σ_{n=0}^∞ E_i[ 1_{{X_n = i}} ] = Σ_{n=0}^∞ P_i(X_n = i) = Σ_{n=0}^∞ p_ii^(n).
Lemma 2.2. For r = 0, 1, 2, . . . we have P_i(V_i > r) = (f_i)^r, where f_i := P_i(T_i < ∞); that is,
V_i ∼ Geometric(1 − f_i).
Proof (sketch). The events {V_i > r} and {T_i^(r) < ∞} are equal, because both say that there
are at least r visits to state i after X_0. The claim then follows by induction on r: the case r = 0
is trivial, and the inductive step uses Lemma 2.1.
Theorem 2.1. We have the following dichotomy:
(i) If P_i(T_i < ∞) = 1, then i is recurrent and Σ_{n=1}^∞ p_ii^(n) = ∞.
(ii) If P_i(T_i < ∞) < 1, then i is transient and Σ_{n=1}^∞ p_ii^(n) < ∞.
Proof. (i): If f_i = 1 then P_i(V_i > r) = 1 for every r, so P_i(V_i = ∞) = 1 and hence
P_i(X_n = i for infinitely many n) = 1,
so i is recurrent, and
Σ_{n=1}^∞ p_ii^(n) = E_i[V_i] = Σ_{k=0}^∞ P_i(V_i > k) = Σ_{k=0}^∞ (f_i)^k = Σ_{k=0}^∞ 1 = ∞.
(ii): If f_i < 1 then E_i[V_i] = Σ_{k=0}^∞ (f_i)^k = 1/(1 − f_i) < ∞, so P_i(V_i = ∞) = 0 and
P_i(X_n = i for infinitely many n) = 0,
so i is transient; moreover Σ_{n=1}^∞ p_ii^(n) ⩽ E_i[V_i] < ∞.
[Diagram of an example chain on states 3, 4, 5, 6, 7, with arrows indicating the possible transitions.]
Can we find out whether state 5 is recurrent without knowing the exact probabilities?
Theorem 2.1 allows us to determine this easily.
Theorem 2.2 (Recurrence and transience are class properties). Let C be a communicating class. Then either all states in C are transient, or all states in C are recurrent.
Proof. By Theorem 2.1, if i ∈ C is transient then Σ_{r=0}^∞ p_ii^(r) < ∞. For any j ∈ C, since i ↔ j,
there exist n, m ∈ N such that p_ij^(n) > 0 and p_ji^(m) > 0. For every r > 0 the law of total
probability gives
p_ii^(r+m+n) = Σ_{k∈I} p_ik^(n) p_ki^(r+m) = Σ_{k∈I} Σ_{ℓ∈I} p_ik^(n) p_kℓ^(r) p_ℓi^(m) ⩾ p_ij^(n) p_jj^(r) p_ji^(m)    (6)
so
p_jj^(r) ⩽ p_ii^(r+m+n) / ( p_ij^(n) p_ji^(m) ),
which implies
Σ_{r=0}^∞ p_jj^(r) ⩽ Σ_{r=0}^∞ p_ii^(r+m+n) / ( p_ij^(n) p_ji^(m) ) = ( 1 / ( p_ij^(n) p_ji^(m) ) ) Σ_{r=0}^∞ p_ii^(r+m+n) < ∞,
so j is transient. This also proves the recurrence case.
The equalities in (6) are known as the Chapman-Kolmogorov equations.
Theorem 2.3. Every recurrent class is closed.
Proof. Suppose C is not closed: then there exist i ∈ C and j ∉ C with p_ij > 0. Since j ∉ C,
j does not lead back to i. Starting from i, with positive probability the chain moves to j and
then never returns to i, so
P_i(X_n = i for infinitely many n) < 1,
so i is not recurrent.
Theorem 2.4. Every finite closed class is recurrent.
This is a partial converse to Theorem 2.3.
Proof. Let C be a finite closed class and X_0 ∈ C. Since the chain cannot leave C, there is some
state i ∈ C visited infinitely often with nonzero probability. So, by the strong Markov property,
0 < P(X_n = i for infinitely many n) = P(T_i < ∞) P_i(X_n = i for infinitely many n),
which implies
P_i(X_n = i for infinitely many n) > 0,
but this probability can only be 0 or 1, so it is equal to 1. Hence i is recurrent, so C is recurrent
by Theorem 2.2.
Theorem 2.5. Suppose that P is irreducible and recurrent. Then for every j ∈ I we have
P(Tj < ∞) = 1.
A component-independent SRW (CISRW) on Z^d is a chain X_n = (X_n^1, . . . , X_n^d) whose components move independently: X_{n+1}^j = X_n^j + Z_{n+1}^j, where P(Z_m^j = 1) = p_j, P(Z_m^j = −1) = q_j := 1 − p_j,
and the Z_m^j are independent for all j, m.
This can be viewed as ‘moving along the diagonals’.
Proposition 2.1. Let (X_n)_{n⩾0} be a CISRW on Z^d. If there exists j ∈ {1, . . . , d} such that
p_j ≠ 1/2, then (X_n)_{n⩾0} is transient (every i ∈ Z^d is transient).
Proof. Without loss of generality assume p_j > q_j. Write T_0 for the return time to the origin
and T_0^j for the return time of the jth component to 0. The chain can only be at the origin when
its jth component is at 0, so {T_0 < ∞} ⊆ {T_0^j < ∞} and hence
P_0(T_0 < ∞) ⩽ P_0(T_0^j < ∞).
By splitting on the first step of the jth component we obtain
P_0(T_0^j < ∞) = p_j P_1(T_0^j < ∞) + q_j P_{−1}(T_0^j < ∞),
and results from gamblers’ ruin in Example 1.7 can be used to show that P_1(T_0^j < ∞) < 1 and
P_{−1}(T_0^j < ∞) = 1. So
P_0(T_0^j < ∞) < p_j + q_j = 1,
which implies the vector state 0 is transient, so all states are transient by irreducibility.
For a CISRW with all components symmetric (p_j = q_j = 1/2 for all j), all states are recurrent
for d ⩽ 2 and all states are transient for d ⩾ 3. The next three examples establish this.
Example 2.1 (Symmetric SRW on Z). (Note that recurrence can alternatively be proved
using results from gamblers’ ruin.) Claim: Σ_{n=0}^∞ p_00^(n) = ∞.
Proof. We have Σ_{n=0}^∞ p_00^(n) = Σ_{n=0}^∞ p_00^(2n) because we can only return to 0 in an even number of
steps. Now
p_00^(2n) = C(2n, n) p^n q^n = ( (2n)! / (n!)² ) (1/2)^{2n},
and by Stirling’s formula
p_00^(2n) ∼ ( √(4πn) (2n/e)^{2n} / ( 2πn (n/e)^{2n} ) ) (1/2)^{2n} = 1/√(πn),
so there exists N ∈ N such that p_00^(2n) > 1/(2√(πn)) for all n ⩾ N. So
Σ_{n=0}^∞ p_00^(2n) ⩾ Σ_{n=N}^∞ 1/(2√(πn)) = ∞.
(n)
Example 2.2 (Symmetric SRW on Z2 ). Claim: ∞
P
n=0 p00 = ∞.
(2n)
Proof. Similarly this is equal to ∞
P
n=0 p00 . By component independence we have
(2n)
p00 = P(component 1 returns in 2n steps)P(component 2 returns in 2n steps)
" #2
2n 2 √ 2n
!2
2n 1 (2n)! 1 4πn(2n/e) 1 1
= = 2 4n
∼ 2n 4n
= ,
n 2 (n!) 2 2πn(n/e) 2 πn
(2n) 1
so there exists N ∈ N such that p00 > 2πn
for all n ⩾ N . So
∞ ∞
X (2n)
X 1
p00 ⩾ = ∞.
n=0 n=N
2πn
Example 2.3 (Symmetric SRW on Z³). Claim: Σ_{n=0}^∞ p_00^(n) < ∞.
Proof. Similarly this is equal to Σ_{n=0}^∞ p_00^(2n), and each component returns to 0 independently. So
p_00^(2n) = ( C(2n, n) (1/2)^{2n} )³ = ( (2n)! / (n!)² )³ (1/2^{6n}) ∼ ( √(4πn)(2n/e)^{2n} / (2πn(n/e)^{2n}) )³ (1/2^{6n}) = 1/(πn)^{3/2}.
This time, since we want to prove convergence, we bound above: there exists N ∈ N such that
p_00^(2n) < 2/(πn)^{3/2} for all n ⩾ N. So
0 ⩽ Σ_{n=0}^∞ p_00^(2n) ⩽ Σ_{n=0}^{N−1} p_00^(2n) + Σ_{n=N}^∞ 2/(πn)^{3/2} < ∞.
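The three examples above can be checked numerically: the sketch below (illustrative only) computes p_00^(2n) for the CISRW exactly via log-factorials, compares it with the Stirling approximation (πn)^{−d/2}, and shows the partial sums growing without bound only for d ⩽ 2.

    from math import lgamma, exp, log, pi

    def p00_2n(n, d):
        # Return probability at time 2n for the CISRW on Z^d, via log C(2n, n) / 2^(2n).
        log_one_dim = lgamma(2*n + 1) - 2*lgamma(n + 1) - 2*n*log(2)
        return exp(d * log_one_dim)

    for d in (1, 2, 3):
        print(d, p00_2n(50, d), (pi * 50) ** (-d / 2))        # exact vs Stirling approximation
        print(d, sum(p00_2n(n, d) for n in range(1, 5000)))   # diverging for d <= 2, settling for d = 3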
3 Branching Processes
In this chapter we study the long term properties of a transient stochastic process known as
the branching process. As motivation, we describe two models in epidemiology, the SIR and
SIS.
Consider a closed population of N individuals, each of whom is susceptible (S), infective (I) or
removed/recovered (R); write S_n, I_n, R_n for the numbers in each category at time n. We consider
the random variable X_n := (S_n, I_n). (Since the population is closed, R_n = N − S_n − I_n, so we
do not need to consider R_n separately.)
The dynamics of the process are as follows:
• If a susceptible comes into contact with an infective in a given time step, they become
infected.
• At every time step, every susceptible individual avoids each infective individual with
probability p, independently of all other interactions. So the probability that a given
susceptible avoids infection over a time step is p^{I_n}.
Example 3.1 (SIS). In this model there is no recovery/removal – any recovered infectives
become susceptible again, each infective recovering with probability γ in a given time step,
independently. Let
A_{n+1} = number of susceptibles at time n who avoided infection over the last time step,
B_{n+1} = number of infectives at time n who recovered over the last time step,
so that
S_{n+1} = A_{n+1} + B_{n+1},  where A_{n+1} ∼ Binomial(S_n, p^{I_n}) and B_{n+1} ∼ Binomial(I_n, γ), conditionally on (S_n, I_n).
What are the transition probabilities? Here we only need X_n = S_n, since I_n = N − S_n is immediately determined by S_n.
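A direct simulation of the SIS dynamics is straightforward; the following Python sketch (the parameter values are made up purely for illustration) draws A_{n+1} and B_{n+1} as the binomials described above.

    import numpy as np

    rng = np.random.default_rng(4)

    def simulate_sis(N, I0, p, gamma, n_steps):
        # S_{n+1} = A_{n+1} + B_{n+1} with A ~ Binomial(S_n, p^{I_n}), B ~ Binomial(I_n, gamma).
        S, I = N - I0, I0
        history = [(S, I)]
        for _ in range(n_steps):
            avoided   = rng.binomial(S, p ** I)   # susceptibles who avoid every infective
            recovered = rng.binomial(I, gamma)    # infectives who recover back to susceptible
            S = avoided + recovered
            I = N - S
            history.append((S, I))
        return history

    print(simulate_sis(N=100, I0=1, p=0.99, gamma=0.3, n_steps=20))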
Example 3.2 (SIR). In this model we have all three categories S, I, R and no return to
susceptibility: here R could represent either death (removal) or immunity (recovery). Here
X_n = (S_n, I_n) and we have
S_{n+1} ∼ Binomial(S_n, p^{I_n})  and  I_{n+1} = C_{n+1} + (S_n − S_{n+1}),  where C_{n+1} ∼ Binomial(I_n, 1 − γ), conditionally on (S_n, I_n),
where the first term accounts for infectives who do not recover, and the second accounts for
people who become newly infective over the last time step.
The transition probabilities are given by
since there are w−v new infectives, so the number of infectives who do not recover is x−(w−v).
The states {(k, 0) : k ∈ {0, . . . , N }} are all absorbing, because there are no infectives to infect
any susceptibles. All other states are transient and form single communicating classes.
Definition 3.1. The basic reproduction number R0 is the average number of infectives caused
by an infective individual during an early stage of an epidemic (typically in a large population).
where T ∼ Geometric(γ) is the recovery time of the initial infective and Cj is the number of
people they infect in the jth time step. Now Cj ∼ Binomial(N, 1 − p), and we can approximate
The branching process. Let X_n represent the number of individuals in a population at time n.
At each time step, each individual gives birth to a random number of offspring which make up
the next generation, so
X_{n+1} = Σ_{i=1}^{X_n} Z_i^n,
where the Z_i^n are i.i.d. copies of a common offspring distribution Z, and Z_i^n is the number of
offspring of the ith individual at time n. Thus (X_n)_{n⩾0} is a Markov chain on N with 0 an
absorbing state. Define the generating functions
G(s) := E[s^Z]  and  F_n(s) := E[s^{X_n} | X_0 = 1].
Proposition 3.1. We have
F_n(s) = G ∘ · · · ∘ G(s) (the n-fold composition) = G(F_{n−1}(s)).
Proof. We have
F_{n+1}(s) = E[s^{X_{n+1}} | X_0 = 1] = Σ_{k=0}^∞ E[ s^{X_{n+1}} 1_{{X_n = k}} | X_0 = 1 ]
  = Σ_{k=0}^∞ P(X_n = k | X_0 = 1) E[ s^{X_{n+1}} | X_n = k ]
  = Σ_{k=0}^∞ P(X_n = k | X_0 = 1) E[ s^{Σ_{j=1}^k Z_j^n} | X_n = k ].
Since the Z_j^n are i.i.d. with PGF G and independent of X_n, the inner expectation equals G(s)^k, so
F_{n+1}(s) = Σ_{k=0}^∞ P(X_n = k | X_0 = 1) G(s)^k = F_n(G(s)).    (7)
Now
F_0(s) = E[s^{X_0} | X_0 = 1] = s,
so (7) implies that
F_1(s) = F_0(G(s)) = G(s),
and repeating this n − 1 more times gives
F_n(s) = F_{n−1}(G(s)) = · · · = G ∘ · · · ∘ G(s) (n times).
Proposition 3.2. Let µ := E[Z]. Then
E[X_n | X_0 = 1] = µ^n.
Proof. By the tower property,
E[X_n | X_0 = 1] = E[ E[X_n | X_{n−1}] | X_0 = 1 ] = E[ X_{n−1} E[Z] | X_0 = 1 ] = µ E[X_{n−1} | X_0 = 1].
Iterating this n times gives E[X_n | X_0 = 1] = µ^n E[X_0 | X_0 = 1] = µ^n.
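Proposition 3.2 is easy to check by simulation. In the sketch below the offspring distribution is taken to be Poisson(µ) purely for illustration (the notes do not fix a particular distribution); the sample mean of X_10 over many runs is compared with µ^10.

    import numpy as np

    rng = np.random.default_rng(5)

    def generation_sizes(mu, n_generations, x0=1):
        # One realisation of X_0, ..., X_n with Poisson(mu) offspring.
        sizes = [x0]
        for _ in range(n_generations):
            k = sizes[-1]
            sizes.append(int(rng.poisson(mu, size=k).sum()) if k > 0 else 0)
        return sizes

    mu, n = 1.2, 10
    samples = [generation_sizes(mu, n)[-1] for _ in range(5_000)]
    print(np.mean(samples), mu ** n)    # sample mean of X_10 vs mu^10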
By the independence of different individuals’ lines of descent, we only need to consider the
extinction probability for one individual’s descendants. Now
P(X_n = 0 for some n | X_0 = 1) = P( ∪_{m=1}^∞ {X_{m+j} = 0 for all j ⩾ 0} | X_0 = 1 )
  = lim_{m→∞} P(X_{m+j} = 0 for all j ⩾ 0 | X_0 = 1)
  = lim_{m→∞} P(X_m = 0 | X_0 = 1)
  = lim_{m→∞} F_m(0)
  = lim_{m→∞} G(F_{m−1}(0)).
In view of this, write x_n := P(X_n = 0 | X_0 = 1) = F_n(0); then x_1 = G(0) and x_{n+1} = G(x_n)
for n ⩾ 1 (Proposition 3.3), and the extinction probability is lim_{n→∞} x_n.
We can now determine the extinction probability, but first we cover two side cases:
• If G(0) = P(Z = 0) = 0, then x_n = 0 for all n, so the population cannot die out and the
extinction probability is 0.
• If G(0) = P(Z = 0) = 1, then X_1 = 0 almost surely, so extinction is certain and the
extinction probability is 1.
Theorem 3.1 (Extinction probability). The extinction probability α := P(X_n = 0 for some n | X_0 = 1)
is the minimal non-negative solution of the equation α = G(α).
Proof. We have proved the cases for G(0) = 0 or 1 above; now assume G(0) ∈ (0, 1). Since
G(s) = P(Z = 0) + Σ_{k=1}^∞ s^k P(Z = k),
then
G′(s) = Σ_{k=1}^∞ k s^{k−1} P(Z = k) > 0,
so G is strictly increasing on (0, 1). Now, let x_0 = 0 (consistent with the definition); then
x_1 = G(0) > 0 = x_0, so by Proposition 3.3
x_2 = G(x_1) > G(x_0) = x_1,
and iterating repeatedly shows that the sequence {x_n}_{n⩾0} is strictly increasing. It is also
bounded above by 1, so it converges to some limit α. By Proposition 3.1 and continuity of G
we have
α = lim_{n→∞} x_n = lim_{n→∞} G(x_{n−1}) = G(α).
Finally, suppose β ⩾ 0 is any solution of β = G(β). Then x_0 = 0 ⩽ β, and if x_n ⩽ β then
x_{n+1} = G(x_n) ⩽ G(β) = β, so by induction x_n ⩽ β for all n. Hence
β ⩾ lim_{n→∞} x_n = α,
so α is the minimal non-negative solution.
Example 3.3. Suppose the offspring distribution is geometric: P(Z = k) = p(1 − p)^k for k ⩾ 0,
so that G(s) = p/(1 − s(1 − p)).
1. Find the extinction probability when X_0 = 1.
2. Find the extinction probability when X_0 = M.
Solution:
1. We have
α = G(α) ⇔ α = p / (1 − α(1 − p))
         ⇔ p = α − α²(1 − p)
         ⇔ 0 = α² − α/(1 − p) + p/(1 − p)
         ⇔ α = 1 or α = p/(1 − p),
so we choose the minimal root
α = 1 if p ⩾ 1/2, and α = p/(1 − p) if p < 1/2.
2. This is just the previous answer raised to the power of M , by independence of offspring.
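The fixed-point characterisation can be checked numerically by iterating x_{n+1} = G(x_n), as in the following Python sketch (illustrative only).

    def G(s, p):
        # PGF of the geometric offspring distribution from Example 3.3.
        return p / (1 - (1 - p) * s)

    for p in (0.3, 0.5, 0.7):
        x = 0.0
        for _ in range(10_000):
            x = G(x, p)                       # x_{n+1} = G(x_n); convergence is slow when p = 1/2
        print(p, round(x, 4), min(1.0, p / (1 - p)))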
Theorem 3.2 (Mean criterion for extinction). Suppose that G(0) > 0, and µ := E[Z] =
G′(1), with Z finite almost surely. Then
(i) if µ ⩽ 1, extinction is certain;
(ii) if µ > 1, extinction is not certain.
Remark. Since the branching process is a Markov chain, then certain extinction means P1 (T0 <
∞) = 1. Not certain extinction means P1 (T0 < ∞) < 1.
Proof. In the proof of Theorem 3.1 we showed that G is strictly increasing and positive. Since
P(Z < ∞) = 1, we have G(1) = 1. Also
G″(s) = Σ_{j=2}^∞ j(j − 1) s^{j−2} P(Z = j) ⩾ 0,
so G is convex on (0, 1). A convex, increasing curve through (1, 1) with G(0) > 0 crosses the
diagonal y = s in (0, 1) if and only if its slope at 1 exceeds 1, i.e. µ = G′(1) > 1. By Theorem 3.1
the extinction probability is the smallest crossing point, so it equals 1 when µ ⩽ 1 and is strictly
less than 1 when µ > 1.
Example 3.4. For the geometric offspring distribution of Example 3.3 we have µ = G′(1) = (1 − p)/p.
Now µ > 1 ⇔ p < 1/2, so extinction is not certain if p < 1/2 and is certain if p ⩾ 1/2, as
shown in the previous solution.
4 Invariant Distributions
In this chapter we aim to understand the long-term behaviour of Markov chains in recurrent cases.
Definition 4.1. A measure on I is a vector λ = (λ_i : i ∈ I) with λ_i ⩾ 0 for all i. A measure λ is
invariant for P if λP = λ. An invariant measure which is also a distribution (Σ_{i∈I} λ_i = 1) is
called an invariant distribution, usually denoted π.
Theorem 4.1. Let (Xn )n⩾0 ∼ Markov(π, P ) where π is an invariant distribution for P . Then
(Xm+n )n⩾0 ∼ Markov(π, P ) for all m ⩾ 0.
Proof. By the Markov property, (X_{m+n})_{n⩾0} is Markov with stochastic matrix P, and its initial
distribution is the distribution of X_m, which is πP^m = π since π is invariant.
Theorem 4.2. Let I be finite and suppose that there exists i ∈ I such that p_ij^(n) → π_j as n → ∞
for all j ∈ I. Then π := (π_j : j ∈ I) is an invariant distribution.
Proof. Since p_ij^(n) ∈ [0, 1] for all j, so is π_j. Now, using finiteness of I to exchange limits and sums,
Σ_{j∈I} π_j = lim_{n→∞} Σ_{j∈I} p_ij^(n) = 1,
and
π_j = lim_{n→∞} p_ij^(n+1) = lim_{n→∞} Σ_{k∈I} p_ik^(n) p_kj = Σ_{k∈I} π_k p_kj = (πP)_j,
hence π is invariant.
Example 4.2. Recall the gene mutation model from Example 1.2, where
P = ( 1−α  α ; β  1−β ).
We found that
p_11^(n) = β/(α+β) + (α/(α+β))(1 − α − β)^n.
Taking limits as n → ∞ we obtain an invariant distribution
π = ( β/(α+β), α/(α+β) ).
Alternatively, we can solve πP = π together with the normalisation condition:
π_1(1 − α) + π_2 β = π_1,
π_1 α + π_2(1 − β) = π_2,
π_1 + π_2 = 1,
and the answer follows. In this case, the invariant distribution is unique.
In Example 4.1 all states are recurrent but P is not irreducible, and any π = (x, x, 1 − 2x)
for x ∈ [0, 1/2] gives an invariant distribution, so uniqueness fails. The key here is irreducibility,
as we now show.
Definition 4.2. The expected time spent in state i between consecutive visits to state k is
γ_i^k := E_k[ Σ_{n=0}^{T_k − 1} 1_{{X_n = i}} ].
Theorem 4.3. Let P be irreducible and recurrent and fix k ∈ I. Then:
(i) γ_k^k = 1;
(ii) γ^k P = γ^k, i.e. γ^k is an invariant measure;
(iii) 0 < γ_i^k < ∞ for all i ∈ I.
Proof. (i) is immediate from the definition, since under P_k only the n = 0 term has X_n = k
before time T_k.
(ii): Under P_k we have X_0 = X_{T_k} = k and T_k < ∞, so for each j ∈ I,
(γ^k P)_j = Σ_{i∈I} γ_i^k p_ij = Σ_{i∈I} p_ij E_k[ Σ_{m=1}^{T_k} 1_{{X_{m−1} = i}} ]
  = Σ_{i∈I} E_k[ Σ_{m=1}^{T_k} 1_{{X_{m−1} = i, X_m = j}} ] = E_k[ Σ_{m=1}^{T_k} 1_{{X_m = j}} ] = γ_j^k,
so γ^k = γ^k P.
(iii): Since P is irreducible, for every i ∈ I there exist m, n such that p_ik^(n), p_ki^(m) > 0. Iterating (ii),
γ_i^k = Σ_{j∈I} γ_j^k p_ji^(m) ⩾ γ_k^k p_ki^(m) > 0,
and
γ_i^k = E_k[ Σ_{n=1}^{T_k} 1_{{X_n = i}} ] < ∞
since T_k < ∞.
Theorem 4.4 (Uniqueness of measure). Let P be irreducible and λ an invariant measure
for P , with λk = 1 for some k. Then λ ⩾ γ k . If P is also recurrent, then λ = γ k .
Proof. For each j ∈ I, since λ = λP and λ_k = 1,
λ_j = Σ_{i_0∈I} λ_{i_0} p_{i_0 j} = Σ_{i_0≠k} λ_{i_0} p_{i_0 j} + p_kj = Σ_{i_0, i_1≠k} λ_{i_1} p_{i_1 i_0} p_{i_0 j} + ( p_kj + Σ_{i_0≠k} p_{k i_0} p_{i_0 j} ).
Repeating this substitution, the bracketed terms are the probabilities, starting from k, of
reaching j in 1, 2, . . . steps without revisiting k in between; since the remaining term is
non-negative, letting the number of substitutions tend to infinity gives λ_j ⩾ γ_j^k, i.e. λ ⩾ γ^k.
Now suppose P is also recurrent. Then γ^k is invariant by Theorem 4.3, so
µ := λ − γ^k
is also an invariant measure, with µ ⩾ 0 and µ_k = 1 − 1 = 0. By irreducibility, for every i ∈ I
there exists n with p_ik^(n) > 0, so
0 = µ_k = Σ_{j∈I} µ_j p_jk^(n) ⩾ µ_i p_ik^(n) ⩾ 0,
which forces µ_i = 0. Hence λ = γ^k.
Definition 4.3. A recurrent state i ∈ I is positive recurrent if
m_i := E_i[T_i] < ∞.
A state which is recurrent but not positive recurrent is said to be null recurrent.
So the expected time to return to a positive recurrent state is finite.
Theorem 4.5 (Invariance equivalences). Let P be irreducible. The following are equivalent:
(i) every state is positive recurrent;
(ii) some state i ∈ I is positive recurrent;
(iii) P has an invariant distribution π.
Moreover, π_i = 1/m_i.
While not explicitly stated, Theorem 4.4 implies that π in (iii) is unique.
Proof. (i) ⇒ (ii) is obvious.
(ii) ⇒ (iii): The state i is positive recurrent, so it is also recurrent, and by Theorem 4.3, γ^i is
an invariant measure. Then
Σ_{j∈I} γ_j^i = Σ_{j∈I} E_i[ Σ_{k=0}^{T_i−1} 1_{{X_k = j}} ] = E_i[ Σ_{k=0}^{T_i−1} Σ_{j∈I} 1_{{X_k = j}} ] = E_i[ Σ_{k=0}^{T_i−1} 1 ] = E_i[T_i] = m_i < ∞,
so π := γ^i / m_i is an invariant distribution; in particular π_i = γ_i^i / m_i = 1/m_i.
from which we obtain π = (2/7, 3/7, 2/7). This implies all states are positive recurrent, hence
π is unique.
Note that the system πP = π always includes a redundant equation: the normalisation condition
Σ_i π_i = 1 is needed to determine π uniquely.
Example 4.4 (SRW on Z). Assume p ∈ (0, 1), so that we have irreducibility, and p ≠ q.
By Proposition 2.1 the SRW is transient, so there cannot be an invariant distribution: if there
were, all states would be positive recurrent and therefore recurrent, which is a contradiction.
We can confirm this by solving for an invariant measure: if λP = λ then for all i ∈ Z
λ_i = p λ_{i−1} + q λ_{i+1}  ⇒  λ_{i+1} − (1/q) λ_i + (p/q) λ_{i−1} = 0,
from which we obtain
λ_i = A + B (p/q)^i.
So λ is an invariant measure for all A, B ⩾ 0, but it cannot be normalised, so there is no
invariant distribution.
Now consider the symmetric case, p = q = 1/2. Solving λP = λ gives
λ_i = A + iB,
and for λ_i ⩾ 0 for all i ∈ Z we need B = 0; the resulting constant measure again cannot be
normalised, so there is no invariant distribution in this case either.
Example 4.5 (Success runs). Consider a game in which the probability of winning the (i + 1)th
game, having won the previous i games, is p_i, and losing any game resets the run to 0. Let X_n
be the length of the current run of wins. Then
p_ij = p_i if j = i + 1, q_i := 1 − p_i if j = 0, and 0 otherwise.
So
      ( q_0  p_0  0    0    · · · )
P =   ( q_1  0    p_1  0    · · · )
      ( q_2  0    0    p_2  · · · )
      (  ⋮    ⋮    ⋮    ⋮   ⋱    )
This is irreducible. If πP = π then
π_0 = Σ_{k=0}^∞ π_k q_k,
π_k = π_{k−1} p_{k−1} = · · · = π_0 ∏_{j=0}^{k−1} p_j.
Case 3 : If Σ_{k=1}^∞ ∏_{j=0}^{k−1} p_j < ∞ then we can set C = Σ_{k=1}^∞ ∏_{j=0}^{k−1} p_j, so that
π_0 = 1/(1 + C),  π_k = (1/(1 + C)) ∏_{j=0}^{k−1} p_j.
Recall that P is called aperiodic if, for every state i, we have p_ii^(n) > 0 for all sufficiently large n.
Theorem 4.6 (Convergence to equilibrium). Let P be irreducible and aperiodic with invariant
distribution π. Let λ be any distribution. If (X_n)_{n⩾0} ∼ Markov(λ, P) then
P(X_n = j) → π_j as n → ∞.
In particular,
p_ij^(n) → π_j as n → ∞
for all i, j ∈ I.
Proof. Non-examinable.
Example 4.7. Let Xn be the weather type on the nth day:
(
1 if it is dry,
Xn =
2 if it is wet.
Theorem 4.7. Let P be irreducible with invariant distribution π. Suppose that (Xn )0⩽n⩽N ∼
Markov(π, P ) and set Yn = XN −n . Then (Yn )0⩽n⩽N ∼ Markov(π, Pb), where Pb = (p̂ij : i, j ∈ I)
is given by
πj p̂ji = πi pij ,
and Pb is also irreducible with invariant distribution π.
Proof. By irreducibility of P , we have πi > 0 for all i ∈ I.
First, we show that P̂ is a stochastic matrix:
Σ_{i∈I} p̂_ji = (1/π_j) Σ_{i∈I} π_i p_ij,
and since πP = π the sum on the right-hand side is equal to π_j, so the above is equal to 1.
Secondly, we show that π is invariant for P̂:
(π P̂)_i = Σ_{j∈I} π_j p̂_ji = Σ_{j∈I} π_i p_ij = π_i Σ_{j∈I} p_ij = π_i.
Definition 4.6. The chain (Yn )0⩽n⩽N defined above is called the time reversal of (Xn )0⩽n⩽N .
Definition 4.7. A stochastic matrix P and measure λ are said to be in detailed balance if
λ_i p_ij = λ_j p_ji for all i, j ∈ I.
Lemma 4.2. If P and λ are in detailed balance, then λ is invariant for P, since
(λP)_i = Σ_{j∈I} λ_j p_ji = Σ_{j∈I} λ_i p_ij = λ_i.
Definition 4.8. Suppose (Xn )n⩾0 is Markov(λ, P ) where P is irreducible. We say that (Xn )n⩾0
is reversible if for every N ⩾ 1, the chain (XN −n )0⩽n⩽N is also Markov(λ, P ).
Theorem 4.8. Let (X_n)_{n⩾0} ∼ Markov(λ, P) where P is irreducible. The following are equivalent:
(i) (X_n)_{n⩾0} is reversible;
(ii) P and λ are in detailed balance.
Proof. If (X_n)_{n⩾0} is reversible then λ is invariant for P, and comparing the reversed chain with
Theorem 4.7 gives
p_ij = p̂_ij = λ_j p_ji / λ_i for all i, j ∈ I,
so P and λ are in detailed balance.
Conversely, if P and λ are in detailed balance then by Lemma 4.2 λ is invariant for P, and
by Theorem 4.7,
p̂_ij = λ_j p_ji / λ_i = λ_i p_ij / λ_i = p_ij for all i, j ∈ I.
Example 4.8 (A non-reversible chain). Let
0 2/3 1/3
P = 1/3 0 2/3 .
2/3 1/3 0
Example 4.9 (Reflected gamblers’ ruin). Consider the gamblers’ ruin problem, but the
gambler has probability p of recovering once going broke, and q of remaining broke. Assume
p ∈ (0, 1/2).
[Diagram: from each state i ⩾ 1 the chain moves to i + 1 with probability p and to i − 1 with probability q; at 0 it moves to 1 with probability p and stays at 0 with probability q.]
Example 4.10 (Random walk on a graph). Let us represent a friend network via a finite,
connected graph. We have 8 people represented by vertices 1–8. We put an edge between i and
j if they have each other in their contacts.
[Figure: the contact graph on the vertices 1–8.]
Person 1 starts a rumour by telling one person in their contacts, chosen at random with
uniform probability, that every proof in ST202 is examinable. The person who receives this
does the same to their contacts, and so on. What is the long-run probability πi that at a given
time step, person i is the latest person to receive the rumour?
Let V_i represent how many people person i has in their contacts. Then
p_ij = 1/V_i if (i, j) is an edge, and p_ij = 0 otherwise.
We can solve the detailed balance equations to find an invariant distribution π. By symmetry,
we have p_ij ≠ 0 if and only if p_ji ≠ 0, so we only need to consider cases where i and j have each
other in their contacts:
λ_i p_ij = λ_j p_ji  ⇒  λ_i (1/V_i) = λ_j (1/V_j).
A solution to this is given by λ_i = V_i. Since this measure has finite total mass, we can normalise
it to obtain
π_i = V_i / Σ_{j∈I} V_j.
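The edge set of the figure is not recoverable from the notes, so the following Python sketch uses a made-up contact graph on 8 vertices purely to illustrate that π_i ∝ V_i satisfies both πP = π and detailed balance.

    import numpy as np

    # Hypothetical contact graph on vertices 1-8 (not the notes' actual edge set).
    edges = [(1, 2), (1, 8), (2, 3), (2, 7), (3, 4), (3, 6), (4, 5), (5, 6), (6, 7), (7, 8)]
    n = 8
    A = np.zeros((n, n))
    for i, j in edges:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1

    deg = A.sum(axis=1)                   # V_i = number of contacts of person i
    P = A / deg[:, None]                  # p_ij = 1/V_i for each contact j
    pi = deg / deg.sum()                  # claimed invariant distribution V_i / sum_j V_j

    print(np.allclose(pi @ P, pi))        # pi P = pi
    print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))   # pi_i p_ij = pi_j p_ji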
Theorem 4.9 (Strong Law of Large Numbers). Let Y_1, Y_2, . . . be a sequence of i.i.d. non-negative
random variables with E[Y_i] = µ < ∞. Then
P( (1/n) Σ_{j=1}^{n} Y_j → µ as n → ∞ ) = 1.
Theorem 4.10 (Ergodic Theorem). Let P be irreducible and let
V_i(n) := Σ_{k=0}^{n−1} 1_{{X_k = i}}
be the number of visits to state i before time n. If (X_n)_{n⩾0} ∼ Markov(λ, P), then
P( V_i(n)/n → 1/m_i as n → ∞ ) = 1.
Moreover, if (X_n)_{n⩾0} is positive recurrent, then for any f : I → R such that E_π(|f(X)|) < ∞,
we have
P( (1/n) Σ_{k=0}^{n−1} f(X_k) → f̄ as n → ∞ ) = 1,
where
f̄ := E_π(f(X)) = Σ_{i∈I} π_i f(i).
Proof. Non-examinable.
Example 4.11. A student on ST202 is known to buy large quantities of milk at Rootes Grocery
Store, causing trouble for their milk supplies. Let Xn denote their attendance to lectures on
the nth day, with the course assumed to be life-long, where
(
0 if they do not attend,
Xn =
1 if they attend
and
P = ( 1/5  4/5 ; 1/3  2/3 ).
If they attend, milk is restocked at Rootes with probability 9/10. Otherwise it is restocked
with probability 3/10. In the long run, what is the probability that milk needs to be restocked
on a given day?
Define a function
f(X) = 3/10 if X = 0, and f(X) = 9/10 if X = 1,
which represents the probability that milk is restocked. By solving πP = π, we obtain an
invariant distribution
π = ( 5/17, 12/17 ).
Since P is irreducible, then the existence of an invariant distribution implies positive recurrence
by Theorem 4.5.
So we can apply the Ergodic Theorem to obtain
(1/n) Σ_{i=0}^{n−1} f(X_i) → E_π(f(X)) = (5/17) f(0) + (12/17) f(1) = 123/170 ≈ 0.72 as n → ∞.
Suppose we wish to construct a Markov chain whose long-run distribution is a given target
distribution π on Z, using a proposal distribution q = (q_z : z ∈ Z). We set a starting value
j ∈ Z and set X_0 = j. The following algorithm is known as the Metropolis-Hastings algorithm.
For each n ⩾ 1:
1. Generate z ∼ q.
2. Set Q = X_{n−1} + z.
3. Calculate
α_{X_{n−1},Q} := min( 1, (π_Q q_{−z}) / (π_{X_{n−1}} q_z) ).
4. With probability α_{X_{n−1},Q} set X_n = Q; otherwise set X_n = X_{n−1}.
The process essentially says that the next step of the chain is set to be the proposed variable
with probability αXn−1 ,Q . If the desired distribution is higher at the proposed step than at the
previous step, then the ratio will be greater than 1 so the probability of moving to the proposed
step is 1. If the desired distribution is lower at the proposed step, we are less likely to move to
it: the probability in this case is less than 1. The aim is that repeating the algorithm enough
times generates a Markov chain with convergence to π, assuming irreducibility.
The Ergodic Theorem allows us to establish properties of π, such as cdf and expectation.
(For more detail, see Problem Sheet 5, Question 3.)
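A minimal implementation, for a hypothetical target on Z (an unnormalised discrete Gaussian-like weight, chosen here only for illustration) and a symmetric ±1 proposal, might look as follows.

    import numpy as np

    rng = np.random.default_rng(3)

    def target_weight(k):
        # Unnormalised weight of a hypothetical target distribution on Z.
        return np.exp(-k**2 / 8)

    def metropolis_hastings(n_steps, x0=0):
        # Symmetric +/-1 proposal, so the q-terms in the acceptance ratio cancel.
        x = x0
        samples = []
        for _ in range(n_steps):
            proposal = x + rng.choice([-1, 1])
            alpha = min(1.0, target_weight(proposal) / target_weight(x))
            if rng.random() < alpha:
                x = proposal
            samples.append(x)
        return np.array(samples)

    samples = metropolis_hastings(100_000)
    print(samples.mean(), (samples ** 2).mean())   # ergodic averages under the target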
A Summary
A.1 Basics of Markov chains
Definition (Stochastic matrix). A matrix P = (pij : i, j ∈ I) is a stochastic matrix if
(i) 0 ⩽ p_ij ⩽ 1 ∀ i, j ∈ I;
(ii) ∀ i ∈ I, we have Σ_{j∈I} p_ij = 1.
Theorem (Markov property). Let (Xn )n⩾0 ∼ Markov(λ, P ). Then, conditional on the event
{Xm = i}, we have (Xm+n )n⩾0 ∼ Markov(δi , P ) and is independent of the random variables
X0 , . . . , X m .
Theorem (n-step probabilities). Let (X_n)_{n⩾0} ∼ Markov(λ, P) and denote by p_ij^(n) =: (P^n)_ij
the (i, j)th entry of P^n. Then
(i) P_i(X_n = j) = p_ij^(n);
(ii) P_λ(X_n = j) = (λP^n)_j.
Definition (Simple random walk). Let p ∈ [0, 1]. A simple random walk (SRW) on Z is a
Markov chain with state space I = Z, where ∀ i, j ∈ I,
p_ij = p if j = i + 1, q := 1 − p if j = i − 1, and 0 otherwise.
Definition (Hitting time). Let (X_n)_{n⩾0} ∼ Markov(λ, P) and A ⊆ I. The hitting time of A,
denoted H^A, is the stopping time given by H^A(ω) := inf{n ⩾ 0 : X_n(ω) ∈ A}.
Definition (Hitting probability, expected hitting time). Let (X_n)_{n⩾0} ∼ Markov(λ, P)
and A ⊆ I. The hitting probability of A from a state i ∈ I is given by
h_i^A := P_i(H^A < ∞),
and the expected hitting time is k_i^A := E_i[H^A].
Theorem (Hitting probabilities). The vector of hitting probabilities h^A = (h_i^A : i ∈ I) for
A ⊆ I is the minimal non-negative solution to the system of linear equations
h_i^A = 1 if i ∈ A,
h_i^A = Σ_{j∈I} p_ij h_j^A if i ∉ A.
Theorem (Expected hitting times). The vector of expected hitting times k^A = (k_i^A : i ∈ I)
for A ⊆ I is the minimal non-negative solution to the system of linear equations
k_i^A = 0 if i ∈ A,
k_i^A = 1 + Σ_{j∉A} p_ij k_j^A if i ∉ A.
Theorem (Strong Markov property). Let (Xn )n⩾0 ∼ Markov(λ, P ) and let T be a stopping
time for (Xn )n⩾0 . Then, conditional on {XT = i} and {T < ∞}, we have (XT +n )n⩾0 ∼
Markov(δi , P ) and is independent of X0 , . . . , XT .
A.2 Recurrence and transience
Definition (Recurrence, transience). A state i ∈ I is recurrent if P(X_n = i for infinitely many n) = 1.
It is transient if
P(X_n = i for infinitely many n) = 0.
Definition (Passage times, excursion times). The first passage time of state i is the
stopping time
T_i(ω) := inf{n ⩾ 1 : X_n(ω) = i},
and inductively we define the rth passage time for r = 1, 2, 3, . . . by the stopping time
T_i^(r)(ω) := inf{ n > T_i^(r−1) : X_n(ω) = i },
with T_i^(0)(ω) = 0.
The rth excursion time is defined by
S_i^(r) := T_i^(r) − T_i^(r−1) if T_i^(r−1) < ∞, and S_i^(r) := ∞ otherwise.
Lemma. For r ⩾ 2, conditional on {T_i^(r−1) < ∞}, the time period S_i^(r) is independent of all
{X_m : m ⩽ T_i^(r−1)} and
P( S_i^(r) = n | T_i^(r−1) < ∞ ) = P_i(T_i = n).
Lemma. Let
V_i := Σ_{n=0}^∞ 1_{{X_n = i}},  f_i := P_i(T_i < ∞).
For r = 0, 1, 2, . . . we have
P_i(V_i > r) = (f_i)^r,
i.e. V_i ∼ Geometric(1 − f_i).
Theorem.
(i) If P_i(T_i < ∞) = 1, then i is recurrent and Σ_{n=1}^∞ p_ii^(n) = ∞.
(ii) If P_i(T_i < ∞) < 1, then i is transient and Σ_{n=1}^∞ p_ii^(n) < ∞.
Theorem. Suppose that P is irreducible and recurrent. Then ∀ j ∈ I we have P(Tj < ∞) = 1.
Proposition. Let (Xn )n⩾0 be a CISRW on Zd . If ∃ j ∈ {1, . . . , d} such that pj ̸= 1/2, then
(Xn )n⩾0 is transient (every i ∈ Zd is transient).
For a CISRW with all components symmetric (pj = qj = 1/2 ∀ j ∈ {1, . . . , d}) then
• all states are recurrent for d ⩽ 2
• all states are transient for d ⩾ 3
A.3 Branching processes
Example 3.2 (SIR). All components included. The states {(k, 0) : k ∈ {0, . . . , N }} are all
absorbing. All other states are transient and form single communicating classes.
Definition (Basic reproduction number). The basic reproduction number R0 is the average
number of infectives caused by an infective individual during an early stage of an epidemic
(typically in a large population).
When N is large we have the approximation
R_0 ≈ N(1 − p)/γ.
The branching process. Let Xn represent the number of individuals in a population at time
n. At each time step, each individual in the population gives birth to a random number of
offspring which make up the next generation. So
X_{n+1} = Σ_{i=1}^{X_n} Z_i^n, with the Z_i^n i.i.d.,
where Z is the common offspring distribution, and Z_i^n denotes the number of offspring from the
ith individual at time n. So (X_n)_{n⩾0} is a Markov chain with state space N, and 0 an absorbing
state.
G(s) := E[s^Z],  F_n(s) := E[s^{X_n} | X_0 = 1].
Proposition. We have
F_n(s) = G ∘ · · · ∘ G(s) (n-fold composition) = G(F_{n−1}(s)).
Proposition. Let µ := E[Z]. Then E[X_n | X_0 = 1] = µ^n.
Proposition. Let
x_n = P(X_n = 0 | X_0 = 1) = F_n(0).
Then x_1 = G(0), and x_{n+1} = G(x_n) for n ⩾ 1.
Theorem (Mean criterion for extinction). Suppose that G(0) > 0, and µ := E[Z] = G′ (1),
with Z finite almost surely. Then
(i) if µ ⩽ 1, extinction is certain;
(ii) if µ > 1, extinction is not certain.
A.4 Invariant distributions
Definition (Invariant measure, invariant distribution). A measure λ = (λ_i ⩾ 0 : i ∈ I) is
invariant for P if λP = λ; an invariant measure which is also a distribution is called an invariant
distribution.
Theorem. Let (X_n)_{n⩾0} ∼ Markov(π, P) where π is an invariant distribution for P. Then
(X_{m+n})_{n⩾0} ∼ Markov(π, P) ∀ m ⩾ 0.
Theorem. Let I be finite and suppose that ∃ i ∈ I such that p_ij^(n) → π_j as n → ∞ ∀ j ∈ I.
Then π := (π_j : j ∈ I) is an invariant distribution.
Definition (Expected time in a state). The expected time spent in state i between consecutive
visits to state k is
γ_i^k := E_k[ Σ_{n=0}^{T_k − 1} 1_{{X_n = i}} ].
Theorem. Let P be irreducible and recurrent. Then
(i) γ_k^k = 1;
(ii) γ^k P = γ^k;
(iii) 0 < γ_i^k < ∞ for all i ∈ I.
Definition (Positive recurrence). A recurrent state i ∈ I is positive recurrent if
m_i := E_i[T_i] < ∞.
A state which is recurrent but not positive recurrent is said to be null recurrent.
Theorem (Invariance equivalences). Let P be irreducible. The following are equivalent:
(i) every state is positive recurrent;
(ii) some state is positive recurrent;
(iii) P has an invariant distribution π.
Moreover, π_i = 1/m_i.
Proposition. A state i ∈ I is aperiodic if and only if ∃ N ∈ N with p_ii^(n) > 0 ∀ n ⩾ N.
Theorem (Convergence to equilibrium). Let P be irreducible and aperiodic with invariant
distribution π, and let λ be any distribution. If (X_n)_{n⩾0} ∼ Markov(λ, P) then
P(X_n = j) → π_j as n → ∞.
In particular,
p_ij^(n) → π_j as n → ∞ ∀ i, j ∈ I.
Theorem. Let P be irreducible with invariant distribution π. Suppose that (Xn )0⩽n⩽N ∼
Markov(π, P ) and set Yn = XN −n . Then (Yn )0⩽n⩽N ∼ Markov(π, Pb), where Pb = (p̂ij : i, j ∈ I)
is given by
πj p̂ji = πi pij ,
and Pb is also irreducible with invariant distribution π.
Definition (Time reversal). The chain (Yn )0⩽n⩽N defined above is called the time reversal
of (Xn )0⩽n⩽N .
Theorem. Let (Xn )n⩾0 ∼ Markov(λ, P ) where P is irreducible. Then (Xn )n⩾0 is reversible if
and only if P and λ are in detailed balance.
Theorem (Ergodic Theorem). Let P be irreducible and let
V_i(n) := Σ_{k=0}^{n−1} 1_{{X_k = i}}
be the number of visits to state i before time n. If (X_n)_{n⩾0} ∼ Markov(λ, P), then
P( V_i(n)/n → 1/m_i as n → ∞ ) = 1.
Moreover, if (X_n)_{n⩾0} is positive recurrent, then for any f : I → R such that E_π(|f(X)|) < ∞,
we have
P( (1/n) Σ_{k=0}^{n−1} f(X_k) → f̄ as n → ∞ ) = 1,
where
f̄ := E_π(f(X)) = Σ_{i∈I} π_i f(i).