Markov Chains
Sourav Majumdar
September 4, 2024
1 Stochastic Processes
A sequence of random variables Xi, i ∈ J is said to be a stochastic process, where J is some indexing set. For example, if J = N, then the sequence of random variables is X1, X2, X3, . . ..
Exercises
1. Suppose X1 , X2 , . . . , Xn are iid random variables. Is Xi , i = 1, 2, . . . , n a stochastic process?
The set J could be discrete or continuous. Usually J is interpreted to be time. However, J can be other kinds of sets. The other most common choice of J is space. So Xi, i ∈ J, say J ⊆ R^2, could be random variables distributed over some spatial domain, for example, the distribution of temperature across a geography. These are called spatial processes. Another possible choice for J is a Cartesian product of multiple sets. The most commonly seen one is space-time processes, i.e., J = R^2 × [0, t). These are called spatio-temporal processes. Extending the earlier example, a spatio-temporal process could study variations in temperature across geography through time.
We refer to the values that Xi takes as the states of the stochastic process. The states could be
discrete or continuous, countable or uncountable. We shall explore some of these in the course.
2 Markov Chains
Let S be the set of states. In this course, for Markov Chains, we will assume that S is discrete.
Consider a sequence of random variables Xn , n ≥ 0, n ∈ N. We say that these random variables
satisfy the Markov Property iff,

P(Xn+1 = xn+1 | X0 = x0, X1 = x1, . . . , Xn = xn) = P(Xn+1 = xn+1 | Xn = xn)

for all n ≥ 0 and x0, . . . , xn+1 ∈ S. We further say that a Markov chain is (time-)homogeneous if P(Xn+1 = y | Xn = x) does not depend on n, i.e., P(Xn+1 = y | Xn = x) = P(X1 = y | X0 = x) for all n.
This means that the probability of going from state x to y is independent of the time at which
it is occurring.
Remark 1. Homogeneous Markov chains are also alternatively referred to as Markov Chains with
stationary transition probabilities. Note that the notion of a stationary distribution, which
shall be introduced later, is different from that of stationary transition probability. It is possible for a
Markov Chain to have stationary transition probabilities but not possess a stationary distribution.
These concepts shouldn’t be confused.
Remark 2. In all subsequent mentions in these notes, unless stated otherwise, it will be assumed
that the Markov Chains are time-indexed and that they are time-homogeneous.
For a Markov chain Xn , n ≥ 0, we call the function P(x, y) = P(X1 = y|X0 = x), x, y ∈ S, the
transition function. The transition function of a Markov chain satisfies the following properties,
1. P(x, y) ≥ 0, x, y ∈ S, follows from the fact that probabilities are non-negative.
2. Σ_{y∈S} P(x, y) = 1, ∀x ∈ S (Prove!)
Exercises
1. Show that P(x, y) = P(Xn = y|Xn−1 = x), ∀n ∈ N.
2. Show that iid Xi form a Markov Chain.
At t = 0, the state of the Markov Chain is given by the initial distribution, π0(x) = P(X0 = x). It follows that, π0(x) ≥ 0 and Σ_{x∈S} π0(x) = 1.
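Given π0 and P, sample paths of the chain can be simulated directly: X0 is drawn from π0, and thereafter each Xn+1 is drawn from the row of P corresponding to the current state Xn. A minimal sketch, assuming Python with numpy is available (the function name simulate_chain is ours, not standard notation):

    import numpy as np

    def simulate_chain(pi0, P, n_steps, seed=0):
        # Sample X_0, ..., X_{n_steps}: X_0 ~ pi0, then X_{n+1} ~ P(X_n, .)
        rng = np.random.default_rng(seed)
        states = np.arange(len(pi0))
        path = [rng.choice(states, p=pi0)]
        for _ in range(n_steps):
            path.append(rng.choice(states, p=P[path[-1]]))
        return np.array(path)

For instance, simulate_chain(np.array([1.0, 0.0]), np.array([[0.9, 0.1], [0.2, 0.8]]), 10) samples eleven steps of a two-state chain.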
Suppose S is finite with d + 1 states; then we can represent the transition probabilities in the form of a matrix, which is referred to as the transition matrix. This is given by,
         0         · · ·        d
    0    P(0, 0)   · · ·    P(0, d)
P =  .      .         .          .
     .      .         .          .
    d    P(d, 0)   · · ·    P(d, d)
Example 2.1 (Humidity in Kanpur). Let state 0 denote a non-humid day and state 1 a humid day, and suppose the weather in Kanpur evolves as a Markov chain with transition matrix,

        0     1
P = 0   0.9   0.1
    1   0.2   0.8
The way to interpret this is as follows. The first entry of 0.9 states that if today is a non-humid day, the probability that tomorrow is also non-humid is 0.9. Likewise, interpret the rest of the entries of the matrix.
A question of interest could be, in the long-term what proportion of days in Kanpur are humid
days?
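Before answering this with theory, one can probe the question empirically. A rough Monte Carlo sketch, assuming Python with numpy (0 denotes a non-humid day and 1 a humid day, as above; the trajectory length is an arbitrary choice):

    import numpy as np

    P = np.array([[0.9, 0.1],   # transitions from a non-humid day
                  [0.2, 0.8]])  # transitions from a humid day
    rng = np.random.default_rng(42)

    x, humid, n = 0, 0, 100_000
    for _ in range(n):
        x = rng.choice(2, p=P[x])  # tomorrow's state given today's
        humid += (x == 1)
    print(humid / n)  # observed long-run fraction of humid days

Such runs hover around 1/3; once stationary distributions are introduced we will be able to confirm this value exactly.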
Example 2.2 (Simple random walk). Consider Si = X1 + X2 + · · · + Xi, i ≥ 1, with S0 = 0, where P(Xj = 1) = p and P(Xj = −1) = 1 − p, and the Xj's are iid. Verify that Si is a Markov Chain. If p = 1/2, it is said to be a simple symmetric random walk.
What would be the limiting behavior of a simple random walk, when the time between each jump
is small?
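This question anticipates Brownian motion. A small numerical sketch of the scaling (our construction, not from the notes): let the time between jumps be h and scale each jump to √h; at time t = 1 the simple symmetric random walk's position then has mean near 0 and variance near 1, the moments of a N(0, 1) random variable.

    import numpy as np

    rng = np.random.default_rng(0)
    h = 1e-3                      # time between jumps
    n = int(1 / h)                # number of jumps up to time t = 1
    # 10,000 independent scaled walks: S(1) = sqrt(h) * (X_1 + ... + X_n)
    steps = rng.choice([-1.0, 1.0], size=(10_000, n))
    S1 = np.sqrt(h) * steps.sum(axis=1)
    print(S1.mean(), S1.var())    # approximately 0 and 1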
Example 2.3 (Brand Preference). Suppose there are three toothpaste brands in the market. Cus-
tomers either switch or buy the same brand again. Assume that this behavior could be modelled as
a Markov Chain,
      1     2     3
1    0.8   0.1   0.1
2    0.2   0.6   0.2
3    0.3   0.3   0.4
Would the market shares of the three brands stabilize, or would they keep switching? If they stabilize, what would the resulting proportions be?
Example 2.4 (Disability insurance). A disability insurer pays out to clients when they are disabled and collects premiums from them when they are healthy. There are three primary states of interest to the disability insurer: Healthy, Disabled, and Deceased. Suppose the movement of their clients follows a Markov chain. Can you identify some entries of the transition matrix without any additional information?
Example 2.5 (Creating a Markov Chain). Suppose the rain today depends on whether or not it
rained through the last two days. If it has rained for the last two days, then it will rain again
tomorrow with the probability 0.7; if it rained today but not yesterday, then it will rain tomorrow
with probability 0.5; if it rained yesterday but not today, then it will rain tomorrow with probability
0.4; if it has not rained in the past two days, then it will rain tomorrow with probability 0.2. Let
Xn = 1 denote the event that there is rain on the n-th day, and Xn = 0 otherwise. Note that Xn by itself is not a Markov chain, since tomorrow's rain depends on the last two days and not just on today. However, let Yn = (Xn, Xn−1). Then Yn is a Markov chain, since

P(Yn |Yn−1, Yn−2, . . . , Y1) = P(Xn, Xn−1 |Xn−1, Xn−2, . . . , X0) = P(Xn, Xn−1 |Xn−1, Xn−2) = P(Yn |Yn−1)

The chain Yn has four states:

1. Rained both yesterday and today (RR)

2. Rained today but not yesterday (RN)

3. Rained yesterday but not today (NR)

4. Didn't rain either yesterday or today (NN)
       RR    RN    NR    NN
RR    0.7    0    0.3    0
RN    0.5    0    0.5    0
NR     0    0.4    0    0.6
NN     0    0.2    0    0.8
The finite-dimensional distributions of a Markov chain are determined once π0 and P are given,

Proposition 2.1. P(X0 = x0, X1 = x1, . . . , Xn = xn) = π0(x0)P(x0, x1) · · · P(xn−1, xn)

Proof. By the multiplication rule of conditional probability,

P(X0 = x0, . . . , Xn = xn) = P(X0 = x0)P(X1 = x1 |X0 = x0) · · · P(Xn = xn |X0 = x0, . . . , Xn−1 = xn−1)   (3)

and each conditional probability reduces to the corresponding transition probability by the Markov property. For (3) to be valid we require each conditioning event to have positive probability, i.e.,

P(X0 = x0, . . . , Xk = xk) > 0, 0 ≤ k ≤ n − 1   (4)
However (4) could fail to hold for some k. Suppose we denote the smallest k for which it fails to hold by k∗. Formally this is expressed as,

k∗ = inf{k ≥ 0 : P(X0 = x0, . . . , Xk = xk) = 0}
Then from (3) it follows that,

P(X0 = x0, . . . , Xk∗ = xk∗) = P(X0 = x0, . . . , Xk∗−1 = xk∗−1)P(xk∗−1, xk∗)

and,

P(xk∗−1, xk∗) = 0

since the left side is 0 while P(X0 = x0, . . . , Xk∗−1 = xk∗−1) > 0.
We note that the claim of the theorem still holds, because if P(X0 = x0, . . . , Xk∗ = xk∗) = 0, then P(X0 = x0, . . . , Xk∗ = xk∗ , . . . , Xn = xn) = 0, and P(xk∗−1, xk∗) = 0 is a factor on the RHS, setting the RHS as well to 0.
Higher-order transitions
P(x, y) gives the probability of a Markov Chain going from state x to y in one time-step. We may
be interested in knowing the probability of going from x to y in m ≥ 1 time-steps. In other words,
if at present it is at x, what is the probability of it being in y after m time steps? The knowledge of P and π0 is sufficient to do these calculations for a Markov Chain.
Proposition 2.2. P(Xn+1 = xn+1 , . . . , Xn+m = xn+m |X0 = x0 , . . . , Xn = xn ) = P(xn , xn+1 ) . . . P(xn+m−1 , xn+m )
Proof. Left as an exercise.
Suppose we are interested in the probability that the chain is at a particular state after two time steps; this probability is called the 2-step transition probability. Let us assume that the chain is at time step n and we are interested in the probability of an event at time step n + 2, i.e. P(Xn+2 = y|Xn = x). Summing over the possible states at time n + 1 and using Proposition 2.2, we denote this probability as,

P(2)(x, y) = P(Xn+2 = y|Xn = x) = Σ_{s∈S} P(x, s)P(s, y)

We note that P(2)(x, y) is independent of n, therefore the expression that we calculated holds for any n.
In a similar fashion suppose we are interested in the m-step transition probability. We shall
denote it by P(m) (x, y) = P(Xn+m = y|Xn = x).
P(m)(x, y) = P(Xn+m = y|Xn = x)
= P(Xn+m = y, Xn+m−1 ∈ S, Xn+m−2 ∈ S, . . . , Xn+1 ∈ S|Xn = x)
= Σ_{s1∈S} Σ_{s2∈S} · · · Σ_{sm−1∈S} P(Xn+m = y, Xn+m−1 = sm−1, Xn+m−2 = sm−2, . . . , Xn+1 = s1 |Xn = x)
= Σ_{s1∈S} Σ_{s2∈S} · · · Σ_{sm−1∈S} P(x, s1)P(s1, s2) · · · P(sm−1, y)
Remark 3. Note that, in general, P(m)(x, y) ≠ (P(x, y))^m. We shall see a simple way of calculating P(m) through the Chapman-Kolmogorov Equation.
Exercises
1. Show that P(Xn+m = y|X0 = x, Xn = z) = P(m) (z, y)
Example 2.6 (Gambler’s Ruin). A gambler at a Casino is playing each round where she can win
1 unit of wealth with probability 0.4 and lose 1 unit of wealth with probability 0.6. She stops playing
once she has reached 4 units of wealth and once she reaches 0 units of wealth, she again stops playing,
as she is ruined. Let Xn denote the wealth of the gambler after n rounds, assume that Xn is a Markov
chain, then the transition probabilities are,
      0     1     2     3     4
0     1     0     0     0     0
1    0.6    0    0.4    0     0
2     0    0.6    0    0.4    0
3     0     0    0.6    0    0.4
4     0     0     0     0     1
A state x is said to be absorbing if P(x, x) = 1; once the chain reaches such a state it never leaves it. Notice that {0} and {4} are absorbing states as P(0, 0) = P(4, 4) = 1.
Exercises
1. For the gambler’s ruin problem compute the following P(X0 = 1, X1 = 2, X2 = 3, X3 = 2, X4 =
3).
2. For the gambler’s ruin problem compute P(X5 = 4|X0 = 1). What is P(X10 = 4|X5 = 1)?
Chapman-Kolmogorov Equation
Theorem 2.1. P(n+m)(x, y) = Σ_{z∈S} P(n)(x, z)P(m)(z, y)
Proof.

P(n+m)(x, y) = P(Xn+m = y|X0 = x) = Σ_{z∈S} P(Xn = z, Xn+m = y|X0 = x)
= Σ_{z∈S} P(Xn = z|X0 = x)P(Xn+m = y|Xn = z, X0 = x) = Σ_{z∈S} P(n)(x, z)P(m)(z, y)

where the last equality uses the Markov property (see the exercise above).
Taking n = m = 1, P(2)(x, y) = Σ_{z∈S} P(x, z)P(z, y) is nothing but the (x, y)-th entry of the square of the transition matrix P. Extending the logic, we get that the probability of going from x to y in m steps is the (x, y)-th entry in the m-th power of the transition matrix, P(m) = P^m.
Proposition 2.3. P(Xn = y) = Σ_{x∈S} π0(x)P(n)(x, y)

Proof.

P(Xn = y) = Σ_{x∈S} P(X0 = x, Xn = y) = Σ_{x∈S} P(X0 = x)P(Xn = y|X0 = x) = Σ_{x∈S} π0(x)P(n)(x, y)
To calculate P(3)(1, 2), i.e., the probability of starting at {1} and reaching {2} after 3 steps in the gambler's ruin problem, we first evaluate P(3) = P × P × P,

        0       1       2       3       4
0       1       0       0       0       0
1     0.744     0     0.192     0     0.064
2     0.360   0.288     0     0.192   0.160
3     0.216     0     0.288     0     0.496
4       0       0       0       0       1

Thus P(3)(1, 2) = 0.192.
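The entries above are easy to verify numerically. A short sketch, assuming Python with numpy:

    import numpy as np

    # Gambler's ruin transition matrix, states 0, 1, 2, 3, 4
    P = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.6, 0.0, 0.4, 0.0, 0.0],
                  [0.0, 0.6, 0.0, 0.4, 0.0],
                  [0.0, 0.0, 0.6, 0.0, 0.4],
                  [0.0, 0.0, 0.0, 0.0, 1.0]])

    P3 = np.linalg.matrix_power(P, 3)  # 3-step transition probabilities
    print(P3[1, 2])                    # P^(3)(1, 2) = 0.192

The same call with larger exponents answers m-step questions such as those in the exercises above.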
Hitting times
Note that the m-step transition probability gives us the probability of going from x to y in m-steps.
However, it is possible that the Markov chain already visits y once or more before the m-th step.
Suppose we are interested in understanding when does the Markov chain visit y for the first time,
for this we will study the notion of hitting times.
Let A ⊆ S. The hitting time TA of A is defined by,
TA = min{n > 0 : Xn ∈ A}

if Xn ∈ A for some n > 0, and TA = ∞ if Xn ∉ A, ∀n > 0. For a singleton A = {y} we write Ty.
We will define a new notation, Px (A) = P(A|X0 = x), for some event A, to denote that the
Markov chain starts at x, i.e. X0 = x.
To make the definition more explicit, consider Ty = 5, i.e., the event where the chain reaches (hits) the state y for the first time at the 5-th step, so,

Px(Ty = 5) = Px(X1 ≠ y, X2 ≠ y, X3 ≠ y, X4 ≠ y, X5 = y)
Hitting times play a prominent role in both theoretical and applied analysis of Markov chains. We will explore the application of this notion to the classification of states subsequently. In finance, continuous state space, continuous time generalizations of the Markov chain, called Markov processes, are used to study stock prices. The notion of hitting times plays an important role there. In particular, the question of when a stock price would hit a particular value for the first time can be studied through hitting times. The answer to this question has implications for drawdown calculations, risk management, trading, etc.
Proposition 2.4. P(n)(x, y) = Σ_{m=1}^{n} Px(Ty = m)P(n−m)(y, y)
Proof. The events {Ty = m, Xn = y}, 1 ≤ m ≤ n are disjoint. To see this observe that for some 1 ≤ k < l ≤ n,

{Ty = k, Xn = y} = {X1 ≠ y, . . . , Xk−1 ≠ y, Xk = y, Xn = y}

and

{Ty = l, Xn = y} = {X1 ≠ y, . . . , Xk ≠ y, . . . , Xl−1 ≠ y, Xl = y, Xn = y}

On the first event Xk = y, while on the second Xk ≠ y, and therefore {Ty = k, Xn = y} and {Ty = l, Xn = y} are disjoint. Therefore, {Ty = m, Xn = y}, 1 ≤ m ≤ n are disjoint.
It also follows that,

{Xn = y} = ∪_{m=1}^{n} {Ty = m, Xn = y}

Therefore,

P(n)(x, y) = Px(Xn = y) = Σ_{m=1}^{n} Px(Ty = m, Xn = y) = Σ_{m=1}^{n} Px(Ty = m)P(n−m)(y, y)

where the last equality follows from the Markov property.
Proposition 2.5. Px(Ty = n + 1) = Σ_{z≠y} P(x, z)Pz(Ty = n)
Proof. Note that when the hitting time is 1, this is equivalent to the 1-step transition probability,

Px(Ty = 1) = Px(X1 = y) = P(x, y)
and,

Px(Ty = 2) = Px(X1 ≠ y, X2 = y) = Σ_{z≠y} Px(X1 = z, X2 = y) = Σ_{z≠y} P(x, z)P(z, y)
In general,

Px(Ty = n + 1) = Px(Xn+1 = y, Xn ≠ y, . . . , X1 ≠ y)
= Σ_{z≠y} Px(Xn+1 = y, Xn ≠ y, . . . , X2 ≠ y, X1 = z)
= Σ_{z≠y} P(Xn+1 = y, Xn ≠ y, . . . , X2 ≠ y|X1 = z, X0 = x)P(x, z)
= Σ_{z≠y} Pz(Ty = n)P(x, z)

where the last step uses the Markov property and homogeneity.
For the gambler's ruin problem, P1(T3 = 1) = P(1, 3) = 0, i.e., the probability of hitting {3} in one step from {1}. Similarly,

P1(T3 = 2) = Σ_{z≠3} P(1, z)Pz(T3 = 1) = P(1, 2)P(2, 3) = 0.4 × 0.4 = 0.16

Next,

P1(T3 = 3) = Σ_{z≠3} P(1, z)Pz(T3 = 2) = P(1, 0)P0(T3 = 2) + P(1, 2)P2(T3 = 2) = 0

and so on.
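These recursions are mechanical, so they lend themselves to automation. A sketch, assuming Python with numpy (the function name hitting_time_pmf is ours), that iterates Proposition 2.5 for all starting states at once:

    import numpy as np

    def hitting_time_pmf(P, y, n_max):
        # f[n, x] = P_x(T_y = n), via P_x(T_y = n+1) = sum_{z != y} P(x, z) P_z(T_y = n)
        f = np.zeros((n_max + 1, P.shape[0]))
        f[1] = P[:, y]               # P_x(T_y = 1) = P(x, y)
        Q = P.copy()
        Q[:, y] = 0.0                # restrict the sum to z != y
        for n in range(1, n_max):
            f[n + 1] = Q @ f[n]
        return f

    P = np.array([[1.0, 0, 0, 0, 0], [0.6, 0, 0.4, 0, 0], [0, 0.6, 0, 0.4, 0],
                  [0, 0, 0.6, 0, 0.4], [0, 0, 0, 0, 1.0]])
    f = hitting_time_pmf(P, y=3, n_max=10)
    print(f[2, 1])                   # P_1(T_3 = 2) = 0.16, as computed above

    # Numerical check of Proposition 2.4: P^(n)(x, y) = sum_m P_x(T_y = m) P^(n-m)(y, y)
    n, x, y = 6, 1, 3
    lhs = np.linalg.matrix_power(P, n)[x, y]
    rhs = sum(f[m, x] * np.linalg.matrix_power(P, n - m)[y, y] for m in range(1, n + 1))
    print(lhs, rhs)                  # the two values agree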
For x, y ∈ S, define ρxy = Px(Ty < ∞), the probability that a chain starting at x eventually visits y. A state y is called recurrent if ρyy = 1 and transient if ρyy < 1.
Example 2.7 (Absorbing states are recurrent).
     1    2    3
1    1    0    0
2    0    1    0
3    0    0    1
Example 2.8 (All states are recurrent, but none are absorbing).
     1    2    3
1    0    1    0
2    0    0    1
3    1    0    0
Example 2.9 (A chain with a transient state). Consider, for instance,

      1     2     3
1    0.5   0.5    0
2     0     0     1
3     0     1     0

Notice {1} is a transient state, since the chain can leave {1} with a positive probability and once it leaves {1}, it can never reach {1} again, so ρ11 < 1. {2} and {3} are recurrent states, but not absorbing states.
Let Iy(z), z ∈ S, denote the indicator function, where Iy(z) = 1 if z = y and Iy(z) = 0 if z ≠ y.
Let N(y) be the number of times n ≥ 1 that the chain visits y. We can write it as,

N(y) = Σ_{n=1}^{∞} Iy(Xn)
where if the chain is at y at time n, Iy (Xn ) = 1 and otherwise 0. Note that Px (N (y) ≥ 1) =
Px (Ty < ∞) = ρxy .
The probability that a chain starting from x first visits y at time m and then visits y again after a further n steps is Px(Ty = m)Py(Ty = n). Therefore,

Px(N(y) ≥ 2) = Σ_{m=1}^{∞} Σ_{n=1}^{∞} Px(Ty = m)Py(Ty = n) = (Σ_{m=1}^{∞} Px(Ty = m))(Σ_{n=1}^{∞} Py(Ty = n)) = ρxy ρyy
It follows that, by induction,

Px(N(y) ≥ m) = ρxy ρyy^{m−1}   (5)
Exercises
1. Show that Px(N(y) = m) = ρxy ρyy^{m−1}(1 − ρyy)
Let Ex denote expectation for the chain started at x, i.e. Ex(Z) = E(Z|X0 = x).
Exercises
1. Show that Ex (Iy (Xn )) = P(n) (x, y)
Now proceeding,
Ex(N(y)) = Ex(Σ_{n=1}^{∞} Iy(Xn)) = Σ_{n=1}^{∞} Ex(Iy(Xn)) = Σ_{n=1}^{∞} P(n)(x, y)
We set G(x, y) = Ex(N(y)) = Σ_{n=1}^{∞} P(n)(x, y). G(x, y) denotes the expected number of times the chain started at x visits y.
Theorem 2.2. Let y be a transient state. Then,

Px(N(y) < ∞) = 1

and

G(x, y) = ρxy / (1 − ρyy)

so that, in particular, G(x, y) is finite.
Proof. Since y is a transient state, ρyy < 1. First, from (5), Px(N(y) = ∞) = lim_{m→∞} Px(N(y) ≥ m) = lim_{m→∞} ρxy ρyy^{m−1} = 0, so Px(N(y) < ∞) = 1. Next,

G(x, y) = Ex(N(y)) = Σ_{m=1}^{∞} m Px(N(y) = m) = Σ_{m=1}^{∞} m ρxy ρyy^{m−1}(1 − ρyy) = ρxy / (1 − ρyy)
For the last step you require Σ_{m=1}^{∞} m t^{m−1} = 1/(1 − t)^2, |t| < 1. Show it!
Since y is a transient state, Σ_{n=1}^{∞} P(n)(x, y) = G(x, y) < ∞. Therefore, lim_{n→∞} P(n)(x, y) = 0 for all x ∈ S.
Theorem 2.3. Let y be a recurrent state.

1. Then lim_{m→∞} Py(N(y) ≥ m) = 1 and G(y, y) = ∞.

2. lim_{m→∞} Px(N(y) ≥ m) = Px(Ty < ∞) = ρxy.

3. If ρxy = 0, then G(x, y) = 0, while if ρxy > 0, then G(x, y) = ∞.
Proof. Left as an exercise.
We say that a state x leads to y, if ρxy > 0. We will denote it as x → y.
Proposition 2.6. x → y if and only if P(n) (x, y) > 0 for some n.
Proof. Hint: Use Proposition 2.4.
Theorem 2.4. Let x be a recurrent state and suppose that x leads to y. Then ρyx = 1, ρxy = 1, and y is also recurrent.

Proof. Since x → y, by Proposition 2.6 there is a smallest positive integer n∗ such that P(n∗)(x, y) > 0. Then there are states y1, . . . , yn∗−1 with

P(x, y1)P(y1, y2) · · · P(yn∗−1, y) > 0

None of these states y1, . . . , yn∗−1 equal x or y, otherwise we would contradict the definition of n∗, since then we could go from x to y in fewer than n∗ steps.
Now assume that ρyx < 1. The probability that the chain visits y1, . . . , yn∗−1, y in the first n∗ steps and never returns to x is given by,

P(x, y1)P(y1, y2) · · · P(yn∗−1, y)(1 − ρyx)

Since ρyx < 1, this probability is non-zero, which implies that with positive probability x is never visited after time n∗. This contradicts the definition of x being a recurrent state. Therefore, ρyx = 1. By Proposition 2.6, there is an n∗∗ such that P(n∗∗)(y, x) > 0, since ρyx > 0.
P(n∗∗+n+n∗)(y, y) = Py(Xn∗∗+n+n∗ = y)
≥ Py(Xn∗∗ = x, Xn∗∗+n = x, Xn∗∗+n+n∗ = y)
(notice it is a ≥ because the event in the second line is a subset of the event in the first)
= P(n∗∗)(y, x)P(n)(x, x)P(n∗)(x, y)
We use the above in the following,
G(y, y) = Σ_{k=1}^{∞} P(k)(y, y)
≥ Σ_{k=n∗∗+n∗+1}^{∞} P(k)(y, y)
= Σ_{k=1}^{∞} P(n∗∗+n∗+k)(y, y)
≥ P(n∗∗)(y, x)P(n∗)(x, y) Σ_{k=1}^{∞} P(k)(x, x) = ∞

since Σ_{k=1}^{∞} P(k)(x, x) = G(x, x) and x is a recurrent state, so by Theorem 2.3 this sum is infinite.
Therefore y is recurrent and y → x. It follows that ρxy = 1 from the first part of the theorem.
We say that a set C of states is closed if no state inside C leads to any state outside C, i.e., ρxy = 0 for x ∈ C, y ∉ C. A closed set C is called irreducible if x leads to y for all choices of x and y in C.
Example 2.10.
     1    2    3
1    1    0    0
2    0    0    1
3    0    1    0
Consider the sets of states C = {1, 2, 3}, C1 = {1}, C2 = {2, 3}. Verify that C, C1, C2 are closed sets. Check that C1, C2 are also irreducible, but C is not irreducible.
Theorem 2.5. If C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient.

Proof. Suppose some x ∈ C is recurrent. Since C is irreducible, x leads to all states in C. Hence, by Theorem 2.4, all states in C are recurrent.

Now suppose some y ∈ C is transient, and assume that ∃x ∈ C which is recurrent. Since C is irreducible, x → y; therefore, by Theorem 2.4, y is recurrent, which leads us to a contradiction. Hence no state in C can be recurrent, i.e., all states in C are transient.
In the above paragraph, we used contradiction to show that there cannot be an irreducible closed set with a mix of transient and recurrent states. However, it is not immediately obvious from the contradiction that an irreducible closed set with all transient states can exist. To show that such a case is possible, we will have to give an example that admits this property. In the Birth and Death chain example subsequently, we will construct such a chain.
Theorem 2.6. Let C be a finite irreducible closed set of states. Then every state in C is recurrent.
Proof. Hint: Use Proposition 2.8 and Theorem 2.5.
Proposition 2.9. ρxy = P(x, y) + Σ_{z≠y} P(x, z)ρzy
Example 2.11.
       0      1      2      3      4      5
0      1      0      0      0      0      0
1     1/4    1/2    1/4     0      0      0
2      0     1/5    2/5    1/5     0     1/5
3      0      0      0     1/6    1/3    1/2
4      0      0      0     1/2     0     1/2
5      0      0      0     1/4     0     3/4
Verify the following: C = {0, 1, 2, 3, 4, 5} is a closed set. C1 = {0} and C2 = {3, 4, 5} are irreducible closed sets. ST = {1, 2} are transient states. SR = {0, 3, 4, 5} are recurrent states. Does C3 = {1, 2} form a closed set? The answer is no. Verify!

Note that if x and y belong to the same irreducible closed set of recurrent states, then ρxy = 1, from Theorems 2.4 and 2.6. For example, ρ34 = 1. On the other hand, if x ∈ SR but x ∉ C2, such as x = 0, then ρ03 = 0.
Using Proposition 2.9,

ρ10 = P(1, 0) + Σ_{y≠0} P(1, y)ρy0
Birth and Death chain

Consider a Markov chain on the states S = {0, 1, . . . , d}, where d could be finite or infinite, with transition function

P(x, x + 1) = px, P(x, x) = rx, P(x, x − 1) = qx

where px + qx + rx = 1 and q0 = 0. If d is finite, then pd = 0. Notice that at each state {x} the chain can either go to the state which is an increment by one {x + 1}, stay at the same state {x}, or move to the state which is a decrement by one {x − 1}. We saw such a structure in the gambler's ruin problem.
The gambler's ruin chain, as we defined it, is expressed in birth and death chain notation as follows: d = 4, r0 = 1, p1 = p2 = p3 = 0.4, q1 = q2 = q3 = 0.6, r4 = 1. If d is not finite, we can study an extension of the gambler's ruin problem, where the gambler will only stop playing if she has been ruined. Earlier, we said the gambler would stop playing once she has accumulated 4 units of wealth.
Remark 4. We only consider irreducible Birth and Death chains; all further references to Birth and Death chains assume that they are irreducible.
The simple random walk that we discussed earlier is also a Birth and Death chain (check how!). Birth and Death chains can be used to model queues. Queueing theory, a major topic of study in Operations Research, analyses the formation of queues in order to design optimal layouts for managing traffic. The Birth and Death chain can be thought of as a queue forming, where the queue length at each time point increases or decreases by one, or stays the same. We will study a generalization of this later, called the Birth and Death process.
Also note that we describe the transition probabilities px , qx , rx by a subscript x which denotes
that the probability of the transition to another state depends on which state the chain is currently
at, so for example it is possible in a birth and death chain for p10 and p100 to be different.
Proposition 2.10. For a, b ∈ S, where a < b, define,

u(x) = Px(Ta < Tb), a < x < b

and set u(a) = 1, u(b) = 0. Here u(x) is the probability of the event that a Birth and Death chain starting at {x} hits state {a} for the first time before it hits state {b} for the first time. Then,
u(x) = (Σ_{y=x}^{b−1} γy) / (Σ_{y=a}^{b−1} γy)

where γ0 = 1 and γy = (q1 · · · qy)/(p1 · · · py) for y ≥ 1.
Proof. Notice that we can state u(x) recursively in the following manner, for a < y < b,

u(y) = py u(y + 1) + ry u(y) + qy u(y − 1)

where in the next time-step the chain could either increase, decrease or stay at the same state.
Since ry = 1 − py − qy,

u(y + 1) − u(y) = (qy/py)(u(y) − u(y − 1))
Since a < y < b, substituting y = a + 1,

u(a + 2) − u(a + 1) = (qa+1/pa+1)(u(a + 1) − u(a))
Similarly,

u(a + 3) − u(a + 2) = (qa+2/pa+2)(u(a + 2) − u(a + 1))
Therefore,

u(y + 1) − u(y) = ((qa+1 qa+2 · · · qy)/(pa+1 pa+2 · · · py))(u(a + 1) − u(a))   (6)
We sum (6) in y from a to b − 1, where for y = a the product of ratios is empty and equals 1,

u(b) − u(a) = (1 + qa+1/pa+1 + (qa+1 qa+2)/(pa+1 pa+2) + · · · + (qa+1 qa+2 · · · qb−1)/(pa+1 pa+2 · · · pb−1))(u(a + 1) − u(a))
Therefore, since u(a) = 1 and u(b) = 0,

u(a) − u(a + 1) = 1 / (1 + qa+1/pa+1 + (qa+1 qa+2)/(pa+1 pa+2) + · · · + (qa+1 qa+2 · · · qb−1)/(pa+1 pa+2 · · · pb−1))
Substituting the above in (6),

u(y) − u(y + 1) = ((qa+1 qa+2 · · · qy)/(pa+1 pa+2 · · · py)) / (1 + qa+1/pa+1 + (qa+1 qa+2)/(pa+1 pa+2) + · · · + (qa+1 qa+2 · · · qb−1)/(pa+1 pa+2 · · · pb−1))   (7)
Summing (7) in y from x to b − 1, for a < x < b, using u(b) = 0, and then multiplying the numerator and the denominator by γa = (q1 · · · qa)/(p1 · · · pa),

u(x) = (Σ_{y=x}^{b−1} γy) / (Σ_{y=a}^{b−1} γy)

where γy = (q1 · · · qy)/(p1 · · · py).
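As a quick numeric sketch of this formula, assuming Python with numpy (the function name hit_a_before_b is ours), applied to the interior of the gambler's ruin chain of Example 2.6, where px = 0.4 and qx = 0.6:

    import numpy as np

    def hit_a_before_b(p, q, a, b, x):
        # u(x) = P_x(T_a < T_b) = sum_{y=x}^{b-1} gamma_y / sum_{y=a}^{b-1} gamma_y
        gamma = np.ones(b)                 # gamma_y for y = 0, ..., b-1; gamma_0 = 1
        for y in range(1, b):
            gamma[y] = gamma[y - 1] * q[y] / p[y]
        return gamma[x:b].sum() / gamma[a:b].sum()

    p = {y: 0.4 for y in range(1, 4)}      # p_1, p_2, p_3
    q = {y: 0.6 for y in range(1, 4)}      # q_1, q_2, q_3
    print(hit_a_before_b(p, q, a=0, b=4, x=1))  # probability of hitting {0} before {4}

The printed value, about 0.877, is the probability that the gambler starting with 1 unit is ruined before reaching 4 units.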
Exercises
1. Consider the gambler's ruin problem where px = 9/19 and qx = 10/19, and where she stops playing only if she is ruined. Given that the gambler's current wealth is 10 units, evaluate the probability that the gambler is ruined before she reaches 15 units of wealth.
Proposition 2.11. The Birth and Death chain is recurrent, i.e. all states are recurrent, iff,

Σ_{x=0}^{∞} γx = ∞
Proof. Taking a = 0 and b = n in Proposition 2.10, P1(T0 < Tn) = (Σ_{y=1}^{n−1} γy)/(Σ_{y=0}^{n−1} γy). Then, letting n → ∞ and using γ0 = 1,

P1(T0 < ∞) = lim_{n→∞} P1(T0 < Tn) = 1 − 1/(Σ_{x=0}^{∞} γx)
Say the Birth and Death chain is recurrent; then from Theorem 2.4, ρ10 = 1, i.e. P1(T0 < ∞) = 1, and therefore, Σ_{x=0}^{∞} γx = ∞.
Now suppose Σ_{x=0}^{∞} γx = ∞. Then P1(T0 < ∞) = 1, and

P0(T0 < ∞) = P(0, 0) + P(0, 1)P1(T0 < ∞) + P(0, 2)P2(T0 < ∞) + P(0, 3)P3(T0 < ∞) + . . . = r0 + p0 = 1

since P(0, y) = 0 for y ≥ 2. Therefore {0} is a recurrent state. Since the chain is irreducible and {0} is recurrent, by Theorem 2.5 all states are recurrent.
Corollary 2.1. The Birth and Death chain is transient, i.e. all states are transient, iff,
Σ_{x=0}^{∞} γx < ∞
Exercises
1. Consider a Birth and Death chain where d is not finite. Define px = (x + 2)/(2x + 2) and qx = x/(2x + 2). Is this chain recurrent or transient?
Stationary Distribution
One of the goals of Probability theory is to understand the limiting behavior of a random system.
The Central Limit Theorem (CLT) roughly states that an average of a large collection of random
variables follows the Normal distribution, irrespective of the distribution of the random variables
themselves. This is a profound result because it states that the limiting behavior of such random systems in some sense doesn't depend on the distribution of the underlying random variables. This also makes it a very useful result, since even if a random system is too difficult to analyse directly, the CLT allows us to study it in the limiting case. As many of you know, the CLT has immense applications in statistics and many other disciplines.
With similar motivations, we will seek to study the stationary distribution for a Markov chain.
Consider the humidity in Kanpur example (Example 2.1). What is the long-run proportion of days for which Kanpur is humid? As a reminder, the transition matrix is the following,
        0     1
P = 0   0.9   0.1
    1   0.2   0.8
One approach could be to study P(n), where n is large, i.e. lim_{n→∞} P(n). To actually evaluate P(n) for a large value of n means that we are multiplying the transition matrix by itself many times. This is computationally expensive, and the cost grows as the number of states in the Markov chain increases. Instead, let us consider (from Proposition 2.3),

P(Xn = y) = Σ_{x∈S} π0(x)P(n)(x, y)
Taking the limit,

lim_{n→∞} P(Xn = y) = lim_{n→∞} Σ_{x∈S} π0(x)P(n)(x, y)
= Σ_{x∈S} π0(x)π(y)   (applying lim_{n→∞} P(n)(x, y) = π(y), if it exists)
= π(y)   (since Σ_{x∈S} π0(x) = 1)
Therefore, the stationary distribution (when it exists) gives us the long term probability of a
Markov chain attaining a state. Formally the stationary distribution, π(x), is the solution of the
following set of equations,

Σ_{x∈S} π(x)P(x, y) = π(y), ∀y ∈ S   (8)
Since π is a probability distribution, we also require Σ_{x∈S} π(x) = 1. In matrix notation, (8) is written as πP = π, where π is a row vector.
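Numerically, (8) together with the normalisation is a small linear system. A sketch, assuming Python with numpy (we replace one redundant balance equation by the normalisation; for an irreducible chain the system has a unique solution):

    import numpy as np

    def stationary_distribution(P):
        # Solve pi P = pi with sum(pi) = 1, i.e. (P^T - I) pi = 0 plus normalisation
        d = P.shape[0]
        A = P.T - np.eye(d)
        A[-1, :] = 1.0               # replace the last (redundant) equation
        b = np.zeros(d)
        b[-1] = 1.0
        return np.linalg.solve(A, b)

    P = np.array([[0.9, 0.1], [0.2, 0.8]])   # the humidity in Kanpur chain
    print(stationary_distribution(P))        # approximately [2/3, 1/3]

So in the long run about one-third of days in Kanpur are humid, matching the simulation sketched earlier.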
Exercises
1. Evaluate the stationary distribution of a Markov chain with the following transition matrix,
      1     2     3
1    1/3   1/3   1/3
2    1/4   1/2   1/4
3    1/6   1/3   1/2
Remark 5. Now consider Σ_{x∈S} π(x)P(2)(x, y); using the Chapman-Kolmogorov equation this can be simplified in the following way,

Σ_{x∈S} π(x)P(2)(x, y) = Σ_{x∈S} π(x) Σ_{z∈S} P(x, z)P(z, y)   (applying the Chapman-Kolmogorov Equation)
= Σ_{z∈S} (Σ_{x∈S} π(x)P(x, z)) P(z, y)
= Σ_{z∈S} π(z)P(z, y)   (from (8))
= π(y)   (from (8))
Similarly, we can show that for any n,
Σ_{x∈S} π(x)P(n)(x, y) = π(y)   (9)
This motivates why we solve (8) to obtain the stationary distribution: by (9), if the chain is started in π it remains distributed as π at every n, and in particular as n approaches ∞ the RHS is still the stationary distribution.
Consider now the stationary distribution of a Birth and Death chain, when it exists. For the Birth and Death chain, (8) reads,

Σ_{x∈S} π(x)P(x, y) = π(y), y ∈ S

Therefore,

π(y − 1)py−1 + π(y)ry + π(y + 1)qy+1 = π(y), y ∈ S

with the convention π(−1) = 0. Also, py + qy + ry = 1.
It follows that,

π(y + 1) = (py/qy+1) π(y)
Therefore,

π(x) = ((p0 · · · px−1)/(q1 · · · qx)) π(0), x ≥ 1
Let πx = (p0 · · · px−1)/(q1 · · · qx), with π0 = 1. Then,

π(x) = πx π(0)
And from Σ_{d=0}^{∞} π(d) = 1,

π(x) = πx / (Σ_{d=0}^{∞} πd)

if Σ_{d=0}^{∞} πd < ∞. Therefore, the Birth and Death chain has a stationary distribution iff Σ_{d=0}^{∞} πd < ∞.
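As a numerical sketch of this condition (assumptions ours: Python with numpy, constant px = 0.3 for x ≥ 0 and qx = 0.5 for x ≥ 1, and truncation of the infinite sum at a large state):

    import numpy as np

    p, q = 0.3, 0.5
    x = np.arange(200)            # truncation of the infinite state space
    pi_x = (p / q) ** x           # here pi_x = (p_0 ... p_{x-1}) / (q_1 ... q_x) = (p/q)^x
    pi = pi_x / pi_x.sum()        # pi(x) = pi_x / sum_d pi_d
    print(pi[:5])                 # stationary probabilities of states 0, ..., 4

Since Σ (p/q)^x < ∞ whenever p < q, this chain has a stationary distribution, and it is geometric.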
Exercises
1. Evaluate the stationary distribution of the Markov chain with transition matrix,
      0     1     2     3
0     0     1     0     0
1    1/3    0    2/3    0
2     0    2/3    0    1/3
3     0     0     1     0
For a recurrent state y, let my = Ey[Ty] denote the expected return time to y. A recurrent state y is said to be positive recurrent if my < ∞ and null recurrent if my = ∞.

Note that a Markov chain with a finite number of states has no null recurrent states, i.e. all recurrent states are positive recurrent in such a chain.
Theorem 2.8. Let π be a stationary distribution. If x is a transient state or a null recurrent state,
then π(x) = 0.
Theorem 2.9. An irreducible positive recurrent Markov chain has a unique stationary distribution
π, given by,

π(x) = 1/mx
Theorem 2.10. An irreducible Markov chain is positive recurrent, i.e. all states are positive recur-
rent, iff it has a stationary distribution.
Example 2.12 (Example of a null recurrent chain). We have established that an irreducible Birth and Death chain is transient iff,

Σ_{x=1}^{∞} γx = Σ_{x=1}^{∞} (qx · · · q1)/(px · · · p1) < ∞   (10)

For a Birth and Death chain to have a stationary distribution it was shown that,

Σ_{x=1}^{∞} πx = Σ_{x=1}^{∞} (p0 · · · px−1)/(q1 · · · qx) < ∞   (11)

which, by Theorem 2.10, is also the condition for positive recurrence.
Thus the chain is null recurrent iff (10) and (11) both fail, i.e., it is null recurrent iff the following conditions hold simultaneously,

Σ_{x=1}^{∞} (qx · · · q1)/(px · · · p1) = ∞

and

Σ_{x=1}^{∞} (px−1 · · · p0)/(qx · · · q1) = ∞

For instance, if px = qx = 1/2 for all x ≥ 1 (with p0 = 1/2, r0 = 1/2), both sums diverge, so the chain is null recurrent.
The period dx of a state x is defined as, dx = gcd{n ≥ 1 : P(n)(x, x) > 0}, i.e., the greatest common divisor of the set of times at which the chain can return to x.

Exercises

1. Consider the Transition Matrix,

      0    1    2    3
0     0    1    0    0
1     0    0    1    0
2     0    0    0    1
3     1    0    0    0

What is d1?
Theorem 2.11. All states in an irreducible Markov chain have a common period.
We say that an irreducible chain is periodic with period d if d > 1 and aperiodic if d = 1.
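A brute-force sketch for computing the period, assuming Python with numpy (the function name and the cut-off n_max are ours; the definition takes a gcd over all return times, and the cut-off is only a practical truncation):

    import numpy as np
    from math import gcd
    from functools import reduce

    def period(P, x, n_max=50):
        # d_x = gcd of all n <= n_max with P^(n)(x, x) > 0
        return_times, Pn = [], np.eye(P.shape[0])
        for n in range(1, n_max + 1):
            Pn = Pn @ P
            if Pn[x, x] > 0:
                return_times.append(n)
        return reduce(gcd, return_times) if return_times else None

    P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # Example 2.8
    print(period(P, 0))   # prints 3: the chain returns only at times 3, 6, 9, ...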
Exercises

1.

       0     1     2     3
0     1/2   1/2    0     0
1     1/3   1/3   1/3    0
2      0    1/3   1/3   1/3
3      0     0    1/2   1/2
In the beginning of this section we wanted to understand when the following limit exists, lim_{n→∞} P(n)(x, y). It can be shown that if a Markov chain is irreducible, positive recurrent, and aperiodic, then lim_{n→∞} P(n)(x, y) = π(y) for all x, y ∈ S, where π is its stationary distribution.
Remark 6. Note that a Markov chain with transient or null recurrent states may also possess a
stationary distribution, however the limit of the transition matrix may not converge to the stationary
distribution. More importantly, the long-run interpretation of the stationary distribution also may
not hold. In the Kanpur Humidity example, we were able to interpret the stationary distribution
as the long-run probability because it satisfies the above conditions for convergence. Verify that the
Transition Matrix of the Kanpur Humidity case satisfies the properties above!
Remark 7. There is a more nuanced notion of the limit when the chain is periodic, but we shall
not pursue this here. Check Chapter 2, Hoel, Port and Stone if you are interested.
References
1. Sheldon Ross, Introduction to Probability Models, Academic Press, 2024.
2. Hoel, Port, Stone, Introduction to Stochastic Processes, Houghton Mifflin Company, 1972.
3. Rick Durrett, Essentials of Stochastic Processes, Springer, 1999.
4. Sidney Resnick, Adventures in Stochastic Processes, Springer, 1992.