
DMS625: Introduction to stochastic processes and their applications
Sourav Majumdar

Markov Chains

September 4, 2024

1 Stochastic Processes
A sequence of random variables Xi, i ∈ J, is said to be a stochastic process, where J is some indexing set. For example, if J = N, then the sequence of random variables is X1, X2, X3, . . ..

Exercises
1. Suppose X1 , X2 , . . . , Xn are iid random variables. Is Xi , i = 1, 2, . . . , n a stochastic process?

The set J could be discrete or continuous. Usually J is interpreted as time; however, J can be other kinds of sets. The next most common choice of J is space: with, say, J ⊆ R2, the Xi, i ∈ J, are random variables distributed over some spatial domain, for example the distribution of temperature across a geography. These are called spatial processes. Another possible choice for J is a Cartesian product of multiple sets. The most commonly seen one is space-time, i.e., J = R2 × [0, t), and the resulting processes are called spatio-temporal processes. Extending the earlier example, a spatio-temporal process could describe how temperature varies across a geography through time.
We refer to the values that Xi takes as the states of the stochastic process. The states could be
discrete or continuous, countable or uncountable. We shall explore some of these in the course.

2 Markov Chains
Let S be the set of states. In this course, for Markov Chains, we will assume that S is discrete.
Consider a sequence of random variables Xn , n ≥ 0, n ∈ N. We say that these random variables
satisfy the Markov Property iff,

P(Xn+1 = xn+1 |X0 = x0 , . . . , Xn = xn ) = P(Xn+1 = xn+1 |Xn = xn ) (1)


∀n and x0, . . . , xn+1 ∈ S. In words, this means that the probability of moving to a certain state depends only on the immediately preceding state and not on the entire history.
A sequence of random variables Xn having the Markov property is said to be a Markov Chain.
P(Xn+1 = y|Xn = x) is said to be the transition probability of the Markov Chain.
We say a Markov Chain is homogeneous (or time-homogeneous, if the indices are some
subset of time) if P(Xn+1 = y|Xn = x) is independent of n. This implies that,

P(X1 = y|X0 = x) = P(X2 = y|X1 = x) = . . . = P(Xn+1 = y|Xn = x) = . . .

This means that the probability of going from state x to y is independent of the time at which
it is occurring.
Remark 1. Homogeneous Markov chains are also alternatively referred to as Markov Chains with
stationary transition probabilities. Note that the notion of a stationary distribution, which
shall be introduced later, is different from that of stationary transition probabilities. It is possible for a Markov Chain to have stationary transition probabilities but not possess a stationary distribution.
These concepts shouldn’t be confused.
Remark 2. In all subsequent mentions in these notes, unless stated otherwise, it will be assumed
that the Markov Chains are time-indexed and that they are time-homogeneous.

For a Markov chain Xn , n ≥ 0, we call the function P(x, y) = P(X1 = y|X0 = x), x, y ∈ S, the
transition function. The transition function of a Markov chain satisfies the following properties,
1. P(x, y) ≥ 0, x, y ∈ S, follows from the fact that probabilities are non-negative.
2. Σy∈S P(x, y) = 1, ∀x ∈ S (Prove!)

Exercises
1. Show that P(x, y) = P(Xn = y|Xn−1 = x), ∀n ∈ N.
2. Show that iid Xi form a Markov Chain.

3. Is it necessarily true that P(x, y) = P(y, x)?

At time 0, the distribution of the state of the Markov Chain is given by the initial distribution, π0(x) = P(X0 = x). It follows that,

1. π0 (x) ≥ 0, x ∈ S, follows from the fact that probabilities are non-negative.


2. Σx∈S π0(x) = 1, the sum of probabilities over all states is 1.

Suppose S is finite with d + 1 states, labelled 0, 1, . . . , d. Then we can represent the transition probabilities in the form of a matrix, referred to as the transition matrix, whose (x, y)-th entry is P(x, y):

$$\mathbf{P} = \begin{pmatrix} P(0,0) & \cdots & P(0,d) \\ \vdots & \ddots & \vdots \\ P(d,0) & \cdots & P(d,d) \end{pmatrix}$$

Similarly the corresponding initial distribution could also be represented as a vector,

π0 = (π0 (0), . . . , π0 (d))


Example 2.1 (Humidity in Kanpur). Let the two states be {0, 1}, with 0 denoting a non-humid day
and 1 denoting a humid day. Suppose we could model Humidity in Kanpur as a Markov Chain, and
the corresponding transition matrix is,

With states ordered (0, 1),

$$\mathbf{P} = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$
The way to interpret this is as follows. The first entry of 0.9 says that if today is a non-humid day, the probability that tomorrow is also non-humid is 0.9. The remaining entries of the matrix are interpreted likewise.
A question of interest could be, in the long-term what proportion of days in Kanpur are humid
days?
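One way to get a feel for this question, before developing any theory, is to simulate the chain and record the empirical fraction of humid days. The sketch below is a minimal illustration in Python/NumPy (the helper name simulate_chain is ours, not part of any library); the stationary-distribution machinery developed later in these notes gives the exact answer.

import numpy as np

# Transition matrix of the humidity chain: state 0 = non-humid, 1 = humid.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def simulate_chain(P, x0, n_steps, rng):
    """Simulate one path of a finite-state Markov chain with transition matrix P."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path)

rng = np.random.default_rng(0)
path = simulate_chain(P, x0=0, n_steps=100_000, rng=rng)
print("empirical fraction of humid days:", path[1:].mean())   # roughly 1/3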

Example 2.2 (Simple Random Walk). Let S = Z and suppose

$$S_i = \sum_{j=1}^{i} X_j$$

where P(Xj = 1) = p, P(Xj = −1) = 1 − p, and the Xj are iid. Verify that Si is a Markov chain. If p = 1/2, it is said to be a simple symmetric random walk.
What would be the limiting behavior of a simple random walk, when the time between each jump is small?
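As a quick illustration (a sketch, not part of the formal development), one can simulate paths of the simple random walk; with p = 1/2 the paths wander with no drift, while p ≠ 1/2 produces an average drift of slope 2p − 1 per step.

import numpy as np

rng = np.random.default_rng(1)
p, n = 0.5, 1000
# Each step is +1 with probability p and -1 with probability 1 - p.
steps = rng.choice([1, -1], size=n, p=[p, 1 - p])
S = np.cumsum(steps)          # S[i-1] corresponds to S_i in the notes
print(S[-1], S.mean())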
Example 2.3 (Brand Preference). Suppose there are three toothpaste brands in the market. Customers either switch or buy the same brand again. Assume that this behavior can be modelled as a Markov chain with states ordered (1, 2, 3) and transition matrix

$$\mathbf{P} = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}$$

Would the market shares of the three brands stabilize, or would they keep fluctuating? If they stabilize, what would the resulting proportions be?
Example 2.4 (Disability insurance). A disability insurer pays benefits to a client while the client is disabled and collects premiums while the client is healthy. There are three primary states of interest to the insurer: Healthy, Disabled, and Deceased. Suppose the movement of clients between these states follows a Markov chain. Can you identify some entries of the transition matrix without any additional information?
Example 2.5 (Creating a Markov Chain). Suppose the rain today depends on whether or not it
rained through the last two days. If it has rained for the last two days, then it will rain again
tomorrow with the probability 0.7; if it rained today but not yesterday, then it will rain tomorrow
with probability 0.5; if it rained yesterday but not today, then it will rain tomorrow with probability
0.4; if it has not rained in the past two days, then it will rain tomorrow with probability 0.2. Let
Xn = 1 denote the event that it rains on the n-th day. Here

P(Xn |Xn−1 , Xn−2 , . . . , X0 ) = P(Xn |Xn−1 , Xn−2 ),

so the conditional distribution of Xn depends on the two previous days, and Xn by itself is not a Markov chain. However, we can create a Markov chain
by defining it alternatively. We define a new random variable Yn ≡ (Xn , Xn−1 ). Now see that Yn is
a Markov chain,

P(Yn |Yn−1 , Yn−2 , . . . , Y1 ) = P(Xn , Xn−1 |Xn−1 , Xn−2 , . . . , X0 ) = P(Xn , Xn−1 |Xn−1 , Xn−2 ) = P(Yn |Yn−1 )

The four states of this new Markov Chain are,


1. Rained both today and yesterday (RR)
2. Rained today but not yesterday (RN)

3. Rained yesterday but not today (NR)
4. Didn’t rain either yesterday or today (NN)

With states ordered (RR, RN, NR, NN), the transition matrix is

$$\mathbf{P} = \begin{pmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{pmatrix}$$

Proposition 2.1. P(X0 = x0 , X1 = x1 , . . . , Xn = xn ) = π0 (x0 )P(x0 , x1 ) . . . P(xn−1 , xn )


Proof. Note that by the definition of conditional probability,

P(X0 = x0 , X1 = x1 ) = P(X1 = x1 |X0 = x0 )P(X0 = x0 ) = P(x0 , x1 )π0 (x0 )

given P(X0 = x0 ) > 0. Likewise, it follows that,

P(X0 = x0 , X1 = x1 , X2 = x2 ) = P(X2 = x2 |X1 = x1 , X0 = x0 )P(X1 = x1 |X0 = x0 )P(X0 = x0 )
= P(X2 = x2 |X1 = x1 )P(X1 = x1 |X0 = x0 )P(X0 = x0 ) (Markov Property)
= P(x1 , x2 )P(x0 , x1 )π0 (x0 )

given P(X0 = x0 ) > 0 and P(X0 = x0 , X1 = x1 ) > 0.


We shall apply the principle of mathematical induction. Assume that for some m < n,

$$P(X_0 = x_0 , X_1 = x_1 , \ldots , X_{m-1} = x_{m-1} ) = \pi_0(x_0) \prod_{j=1}^{m-1} P(x_{j-1} , x_j ) \tag{2}$$

given P(X0 = x0 , X1 = x1 , . . . , Xk = xk ) > 0, ∀k = 0, 1, 2, . . . , m − 1.


It follows from (2) and the Markov property that,

$$\begin{aligned} P(X_0 = x_0 , \ldots , X_m = x_m ) &= P(X_m = x_m \mid X_{m-1} = x_{m-1} , \ldots , X_0 = x_0 )\, P(X_{m-1} = x_{m-1} , \ldots , X_0 = x_0 ) \\ &= P(x_{m-1} , x_m )\, \pi_0(x_0) \prod_{j=1}^{m-1} P(x_{j-1} , x_j ) && \text{(Markov Property and (2))} \\ &= \pi_0(x_0 ) P(x_0 , x_1 ) \cdots P(x_{m-1} , x_m ) \end{aligned}$$

given P(X0 = x0 , X1 = x1 , . . . , Xk = xk ) > 0, ∀k = 0, 1, 2, . . . , m. Therefore, by the principle of


mathematical induction,

P(X0 = x0 , X1 = x1 , . . . , Xn = xn ) = π0 (x0 )P(x0 , x1 ) . . . P(xn−1 , xn ) (3)

given,

P(X0 = x0 , X1 = x1 , . . . , Xk = xk ) > 0, ∀k = 0, 1, 2, . . . , n. (4)

However (4) could fail to hold for some k. Suppose we denote the smallest k for which it fails to
hold by k ∗ . Formally this is expressed as,

k ∗ = inf{k ≥ 0 : P(X0 = x0 , . . . , Xk = xk ) = 0}

Then, by the induction argument above (since (4) holds for all k < k∗),

P(X0 = x0 , . . . , Xk∗−1 = xk∗−1 ) = π0 (x0 )P(x0 , x1 ) . . . P(xk∗−2 , xk∗−1 ) > 0

and,

$$\begin{aligned} P(x_{k^*-1} , x_{k^*} ) &= P(X_{k^*} = x_{k^*} \mid X_{k^*-1} = x_{k^*-1} ) \\ &= P(X_{k^*} = x_{k^*} \mid X_{k^*-1} = x_{k^*-1} , X_{k^*-2} = x_{k^*-2} , \ldots , X_0 = x_0 ) && \text{(Markov Property)} \\ &= \frac{P(X_{k^*} = x_{k^*} , X_{k^*-1} = x_{k^*-1} , \ldots , X_0 = x_0 )}{P(X_{k^*-1} = x_{k^*-1} , X_{k^*-2} = x_{k^*-2} , \ldots , X_0 = x_0 )} = 0 \end{aligned}$$

since the numerator is 0 by the definition of k∗. We note that the claim of the theorem still holds in this case: if P(X0 = x0 , . . . , Xk∗ = xk∗ ) = 0, then P(X0 = x0 , . . . , Xk∗ = xk∗ , . . . , Xn = xn ) = 0, and P(xk∗−1 , xk∗ ) = 0 is a factor on the RHS, setting the RHS to 0 as well.

Higher-order transitions
P(x, y) gives the probability of a Markov Chain going from state x to y in one time-step. We may be interested in the probability of going from x to y in m ≥ 1 time-steps: if the chain is at x at present, what is the probability that it is at y after m time steps? Knowledge of P and π0 is sufficient to do these calculations for a Markov Chain.

Proposition 2.2. P(Xn+1 = xn+1 , . . . , Xn+m = xn+m |X0 = x0 , . . . , Xn = xn ) = P(xn , xn+1 ) . . . P(xn+m−1 , xn+m )
Proof. Left as an exercise.
Suppose we are interested in the probability that the chain is at a particular state after two time steps; this probability is called the 2-step transition probability. Assume that the chain is at time step n and we are interested in an event at time step n + 2, i.e.,

$$\begin{aligned} P(X_{n+2} = y \mid X_n = x) &= P(X_{n+2} = y,\, X_{n+1} \in S \mid X_n = x) && \text{(at time } n+1 \text{ the chain may be at any state)} \\ &= \sum_{s \in S} P(X_{n+2} = y,\, X_{n+1} = s \mid X_n = x) && \text{(the states are disjoint, so the union over } S \text{ decomposes into a sum)} \\ &= \sum_{s \in S} P(x, s)P(s, y) && \text{(from Proposition 2.2)} \end{aligned}$$

We will denote this probability as P(2)(x, y) = P(Xn+2 = y | Xn = x) = Σs∈S P(x, s)P(s, y). We note that P(2)(x, y) is independent of n, so the expression we calculated holds for any n.
In a similar fashion suppose we are interested in the m-step transition probability. We shall
denote it by P(m) (x, y) = P(Xn+m = y|Xn = x).

$$\begin{aligned} P^{(m)}(x, y) &= P(X_{n+m} = y \mid X_n = x) \\ &= P(X_{n+m} = y,\, X_{n+m-1} \in S,\, X_{n+m-2} \in S, \ldots , X_{n+1} \in S \mid X_n = x) \\ &= \sum_{s_1 \in S} \sum_{s_2 \in S} \cdots \sum_{s_{m-1} \in S} P(X_{n+m} = y,\, X_{n+m-1} = s_{m-1}, \ldots , X_{n+1} = s_1 \mid X_n = x) \\ &= \sum_{s_1 \in S} \sum_{s_2 \in S} \cdots \sum_{s_{m-1} \in S} P(x, s_1)P(s_1, s_2) \cdots P(s_{m-1}, y) \end{aligned}$$

Again note that the m-step transition probability is also independent of n.

Remark 3. Note that P(m) (x, y) ̸= (P(x, y))m . We shall see a simple way of calculating this through
the Chapman-Kolmogorov Equation.

Exercises
1. Show that P(Xn+m = y|X0 = x, Xn = z) = P(m) (z, y)

Example 2.6 (Gambler’s Ruin). A gambler at a casino plays a sequence of rounds; in each round she wins 1 unit of wealth with probability 0.4 and loses 1 unit of wealth with probability 0.6. She stops playing once she reaches 4 units of wealth, and she also stops once she reaches 0 units of wealth, as she is then ruined. Let Xn denote the wealth of the gambler after n rounds and assume that Xn is a Markov chain. With states ordered (0, 1, 2, 3, 4), the transition matrix is

$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.6 & 0 & 0.4 & 0 & 0 \\ 0 & 0.6 & 0 & 0.4 & 0 \\ 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Notice that {0} and {4} are absorbing states, as P(0, 0) = P(4, 4) = 1.

Exercises
1. For the gambler’s ruin problem compute the following P(X0 = 1, X1 = 2, X2 = 3, X3 = 2, X4 =
3).
2. For the gambler’s ruin problem compute P(X5 = 4|X0 = 1). What is P(X10 = 4|X5 = 1)?

Chapman-Kolmogorov Equation
Theorem 2.1. P(n+m)(x, y) = Σz P(n)(x, z)P(m)(z, y)

Proof.

$$\begin{aligned} P^{(n+m)}(x, y) &= P(X_{n+m} = y \mid X_0 = x) \\ &= \sum_{z} P(X_n = z \mid X_0 = x)\, P(X_{n+m} = y \mid X_0 = x, X_n = z) \\ &= \sum_{z} P^{(n)}(x, z)\, P(X_{n+m} = y \mid X_0 = x, X_n = z) \\ &= \sum_{z} P^{(n)}(x, z)\, P^{(m)}(z, y) \end{aligned}$$

Note a consequence of the Chapman-Kolmogorov equation: upon setting n = m = 1, we obtain

$$P^{(2)}(x, y) = \sum_{z} P(x, z)P(z, y)$$

This is nothing but the (x, y)-th entry of the square of the transition matrix P. Extending the logic, the probability of going from x to y in m steps is the (x, y)-th entry of the m-th power of the transition matrix, P^m.

Proposition 2.3. P(Xn = y) = Σx π0(x)P(n)(x, y)

Proof.

$$P(X_n = y) = \sum_{x} P(X_0 = x, X_n = y) = \sum_{x} P(X_0 = x)P(X_n = y \mid X_0 = x) = \sum_{x} \pi_0(x) P^{(n)}(x, y)$$

To calculate P(3)(1, 2), i.e., the probability of starting at {1} and reaching {2} after 3 steps in the gambler’s ruin problem, we first evaluate P(3) = P × P × P. With states ordered (0, 1, 2, 3, 4),

$$\mathbf{P}^{3} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0.744 & 0 & 0.192 & 0 & 0.064 \\ 0.360 & 0.288 & 0 & 0.192 & 0.160 \\ 0.216 & 0 & 0.288 & 0 & 0.496 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Therefore, P(3)(1, 2) = 0.192.
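Matrix powers are tedious by hand but immediate numerically. A minimal check in Python/NumPy (nothing here is specific to the notes beyond the transition matrix itself):

import numpy as np

# Gambler's ruin transition matrix on states {0, 1, 2, 3, 4}.
P = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.6, 0.0, 0.4, 0.0, 0.0],
              [0.0, 0.6, 0.0, 0.4, 0.0],
              [0.0, 0.0, 0.6, 0.0, 0.4],
              [0.0, 0.0, 0.0, 0.0, 1.0]])

P3 = np.linalg.matrix_power(P, 3)
print(P3[1, 2])   # 0.192, the 3-step probability of going from state 1 to state 2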

Hitting times
Note that the m-step transition probability gives us the probability of going from x to y in m-steps.
However, it is possible that the Markov chain already visits y once or more before the m-th step.
Suppose instead we are interested in when the Markov chain visits y for the first time; for this we will study the notion of hitting times.
Let A ⊆ S. The hitting time TA of A is defined by

TA = min{n > 0 : Xn ∈ A}

if Xn ∈ A for some n > 0, and by TA = ∞ if Xn ∉ A for all n > 0. For a single state y we write Ty for T{y}.
We will also use the notation Px(E) = P(E | X0 = x), for an event E, to denote that the Markov chain starts at x, i.e. X0 = x.
To make the definition more explicit, consider Ty = 5, i.e., the event that the chain reaches (hits) the state y for the first time at the 5-th step, so,

Px (Ty = 5) = Px (X1 ̸= y, X2 ̸= y, X3 ̸= y, X4 ̸= y, X5 = y)

Hitting times play a prominent role in both the theoretical and the applied analysis of Markov chains. We will use them subsequently in the classification of states. In finance, continuous state space, continuous time generalizations of the Markov chain, called Markov processes, are used to study stock prices, and the notion of hitting times plays an important role there. In particular, the question of when a stock price first hits a particular value can be studied through hitting times; the answer has implications for drawdown calculations, risk management, trading, and so on.
Proposition 2.4. P(n)(x, y) = Σ_{m=1}^{n} Px(Ty = m) P(n−m)(y, y)
Proof. The events {Ty = m, Xn = y}, 1 ≤ m ≤ n, are disjoint. To see this, observe that for 1 ≤ k < l ≤ n,

{Ty = k, Xn = y} = {X1 ̸= y, . . . , Xk−1 ̸= y, Xk = y, Xn = y}

and

{Ty = l, Xn = y} = {X1 ̸= y, . . . , Xk ̸= y, . . . , Xl−1 ̸= y, Xl = y, Xn = y},

and the first event requires Xk = y while the second requires Xk ̸= y, so they are disjoint. Therefore the events {Ty = m, Xn = y}, 1 ≤ m ≤ n, are pairwise disjoint.
It also follows that,

{Xn = y} = ∪_{m=1}^{n} {Ty = m, Xn = y}

$$\begin{aligned} P^{(n)}(x, y) &= P_x(X_n = y) \\ &= \sum_{m=1}^{n} P_x(T_y = m, X_n = y) && \text{(countable additivity)} \\ &= \sum_{m=1}^{n} P_x(T_y = m)\, P(X_n = y \mid X_0 = x, T_y = m) \\ &= \sum_{m=1}^{n} P_x(T_y = m)\, P(X_n = y \mid X_0 = x, X_1 \neq y, \ldots , X_{m-1} \neq y, X_m = y) \\ &= \sum_{m=1}^{n} P_x(T_y = m)\, P^{(n-m)}(y, y) && \text{(Markov property)} \end{aligned}$$

Proposition 2.5. Px(Ty = n + 1) = Σ_{z̸=y} P(x, z) Pz(Ty = n)

Proof. Note that when the hitting time is 1, this is equivalent to the 1-step transition probability,

Px (Ty = 1) = Px (X1 = y) = P(x, y)

and,

$$P_x(T_y = 2) = P_x(X_1 \neq y, X_2 = y) = \sum_{z \neq y} P_x(X_1 = z, X_2 = y) = \sum_{z \neq y} P(x, z)P(z, y)$$

and by induction it can be shown that,

$$P_x(T_y = n + 1) = \sum_{z \neq y} P(x, z)\, P_z(T_y = n)$$

Thanks to Manan Kabra for an alternate proof.

Proof.

$$\begin{aligned} P_x(T_y = n + 1) &= P_x(X_1 \neq y, \ldots , X_n \neq y, X_{n+1} = y) \\ &= \sum_{z \neq y} P_x(X_1 = z, X_2 \neq y, \ldots , X_n \neq y, X_{n+1} = y) \\ &= \sum_{z \neq y} P(X_2 \neq y, \ldots , X_n \neq y, X_{n+1} = y \mid X_1 = z, X_0 = x)\, P_x(X_1 = z) \\ &= \sum_{z \neq y} P_z(T_y = n)\, P(x, z) && \text{(Markov property and homogeneity)} \end{aligned}$$

For the gambler’s ruin problem, P1 (T3 = 1) = P(1, 3) = 0, i.e., the probability of hitting {3} in
one step from {1}. Similarly,

P1 (T3 = 2) = P(1, 0)P0 (T3 = 1) + P(1, 2)P2 (T3 = 1)


= 0.6 × 0 + 0.4 × 0.4 = 0.16

Next,

P1 (T3 = 3) = P(1, 0)P0 (T3 = 2) + P(1, 2)P2 (T3 = 2)

and so on.
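The recursion of Proposition 2.5 is easy to iterate numerically. A small sketch for the gambler's ruin chain (the helper hitting_time_probs is ours, written only for illustration):

import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.6, 0.0, 0.4, 0.0, 0.0],
              [0.0, 0.6, 0.0, 0.4, 0.0],
              [0.0, 0.0, 0.6, 0.0, 0.4],
              [0.0, 0.0, 0.0, 0.0, 1.0]])

def hitting_time_probs(P, y, n_max):
    """Return h[n][x] = P_x(T_y = n) for n = 1, ..., n_max via Proposition 2.5."""
    S = range(len(P))
    h = {1: {x: P[x, y] for x in S}}            # P_x(T_y = 1) = P(x, y)
    for n in range(1, n_max):
        h[n + 1] = {x: sum(P[x, z] * h[n][z] for z in S if z != y) for x in S}
    return h

h = hitting_time_probs(P, y=3, n_max=3)
# Prints 0.0, 0.16, 0.0: starting from 1, state 3 can only be reached in an even number of steps.
print(h[1][1], h[2][1], h[3][1])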

Transient and Recurrent States


Define ρxy = Px(Ty < ∞); ρxy is the probability that a Markov chain started at x visits y in finite time. In particular, ρyy denotes the probability that a chain starting at y returns to y. A state y is called recurrent if ρyy = 1 and transient if ρyy < 1. Thus a chain returns to a recurrent state in finite time with probability one, whereas there is a positive probability of never returning to a transient state. Recall the disability insurance example: the deceased state, which is absorbing, is also a recurrent state. We will develop tools to identify these states.

Example 2.7 (Absorbing states are recurrent). With states ordered (1, 2, 3),

$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Example 2.8 (All states are recurrent, but none are absorbing). With states ordered (1, 2, 3),

$$\mathbf{P} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$$

Example 2.9 (Transient state). With states ordered (1, 2, 3),

$$\mathbf{P} = \begin{pmatrix} 0.2 & 0.4 & 0.4 \\ 0 & 0.3 & 0.7 \\ 0 & 0.7 & 0.3 \end{pmatrix}$$

Notice that {1} is a transient state: the chain can leave {1} with positive probability, and once it leaves {1} it can never return to {1}. The states {2} and {3} are recurrent, but not absorbing.
Let Iy(z), z ∈ S, denote the indicator function of y, i.e., Iy(z) = 1 if z = y and Iy(z) = 0 if z ̸= y. Let N(y) be the number of times n ≥ 1 that the chain visits y. We can write it as

$$N(y) = \sum_{n=1}^{\infty} I_y(X_n)$$

where Iy(Xn) = 1 if the chain is at y at time n and 0 otherwise. Note that Px(N(y) ≥ 1) = Px(Ty < ∞) = ρxy.
The probability that a chain starting from x first visits y at time m and then visits y again after a further n steps is Px(Ty = m)Py(Ty = n). Therefore,

$$P_x(N(y) \geq 2) = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} P_x(T_y = m)\, P_y(T_y = n) = \left( \sum_{m=1}^{\infty} P_x(T_y = m) \right) \left( \sum_{n=1}^{\infty} P_y(T_y = n) \right) = \rho_{xy}\, \rho_{yy}$$

It follows that,

$$P_x(N(y) \geq m) = \rho_{xy}\, \rho_{yy}^{\,m-1} \tag{5}$$

Exercises
1. Show that Px(N(y) = m) = ρxy ρyy^{m−1} (1 − ρyy)

2. Show that Px(N(y) = 0) = 1 − ρxy

We write Ex for expectations taken with respect to the Markov chain started at x, i.e. conditional on X0 = x.

Exercises
1. Show that Ex (Iy (Xn )) = P(n) (x, y)

Now proceeding,

$$E_x(N(y)) = E_x\!\left( \sum_{n=1}^{\infty} I_y(X_n) \right) = \sum_{n=1}^{\infty} E_x(I_y(X_n)) = \sum_{n=1}^{\infty} P^{(n)}(x, y)$$

We set G(x, y) = Ex(N(y)) = Σ_{n=1}^{∞} P(n)(x, y); G(x, y) is the expected number of visits (at times n ≥ 1) to y by a chain started at x.
Theorem 2.2. Let y be a transient state. Then,

Px(N(y) < ∞) = 1

and

$$G(x, y) = \frac{\rho_{xy}}{1 - \rho_{yy}}$$

In particular, G(x, y) is finite.

Proof. Since y is a transient state, ρyy < 1.

$$P_x(N(y) = \infty) = \lim_{m \to \infty} P_x(N(y) \geq m) = \lim_{m \to \infty} \rho_{xy}\, \rho_{yy}^{\,m-1} = 0$$

Therefore the first part follows.

$$G(x, y) = E_x(N(y)) = \sum_{m=1}^{\infty} m\, P_x(N(y) = m) = \sum_{m=1}^{\infty} m\, \rho_{xy}\, \rho_{yy}^{\,m-1}(1 - \rho_{yy}) = \frac{\rho_{xy}}{1 - \rho_{yy}}$$

For the last step you require Σ_{m=1}^{∞} m t^{m−1} = 1/(1 − t)^2 for |t| < 1. Show it!
Since y is a transient state, G(x, y) = Σ_{n=1}^{∞} P(n)(x, y) < ∞, and therefore limn→∞ P(n)(x, y) = 0 for every x ∈ S.
Theorem 2.3. Let y be a recurrent state.
1. Then limm→∞ Py (N (y) ≥ m) = 1 and G(y, y) = ∞.
2. limm→∞ Px (N (y) ≥ m) = Px (Ty < ∞) = ρxy
3. If ρxy = 0, then G(x, y) = 0, while if ρxy > 0, then G(x, y) = ∞

Proof. Left as an exercise.
We say that a state x leads to y, if ρxy > 0. We will denote it as x → y.
Proposition 2.6. x → y if and only if P(n) (x, y) > 0 for some n.
Proof. Hint: Use Proposition 2.4.

Proposition 2.7. If x → y and y → z, then x → z


Proof. Hint: Use Proposition 2.6 and the Chapman-Kolmogorov equation.
Proposition 2.8. A Markov chain having a finite state space must have at least one recurrent state.

Proof. Hint: Prove by contradiction.


Theorem 2.4. Let x be a recurrent state and suppose that x → y. Then y is recurrent and
ρxy = ρyx = 1.
Proof. Let n∗ be the smallest time at which the first visit to y, starting from x, has positive probability. Recall from the Gambler’s ruin example that some hitting times can have zero probability. Formally,

n∗ = min{n ≥ 1 : Px(Ty = n) > 0}

This implies that P(n∗)(x, y) > 0 and that there exist states y1, . . . , yn∗−1 such that

Px(X1 = y1, . . . , Xn∗−1 = yn∗−1, Xn∗ = y) = P(x, y1) · · · P(yn∗−1, y) > 0

None of the states y1, . . . , yn∗−1 equals x or y; otherwise we would contradict the definition of n∗, since we could then go from x to y in fewer than n∗ steps.
Now assume that ρyx < 1. The probability that the chain visits y1, . . . , yn∗−1, y in the first n∗ steps and then never returns to x is

P(x, y1) · · · P(yn∗−1, y)(1 − ρyx)

If ρyx < 1 this probability is positive, so with positive probability the chain started at x never returns to x after time n∗, which contradicts x being a recurrent state. Therefore ρyx = 1. In particular ρyx > 0, so by Proposition 2.6 there is an n∗∗ such that P(n∗∗)(y, x) > 0.


$$P^{(n^{**} + n + n^{*})}(y, y) = P_y(X_{n^{**}+n+n^{*}} = y) \geq P_y(X_{n^{**}} = x,\, X_{n^{**}+n} = x,\, X_{n^{**}+n+n^{*}} = y) = P^{(n^{**})}(y, x)\, P^{(n)}(x, x)\, P^{(n^{*})}(x, y)$$

(the inequality holds because the middle event is a subset of the event on the left)

We use the above in the following,

$$G(y, y) = \sum_{k=1}^{\infty} P^{(k)}(y, y) \geq \sum_{k = n^{**} + n^{*} + 1}^{\infty} P^{(k)}(y, y) = \sum_{k=1}^{\infty} P^{(n^{**} + n^{*} + k)}(y, y) \geq P^{(n^{**})}(y, x)\, P^{(n^{*})}(x, y) \sum_{k=1}^{\infty} P^{(k)}(x, x) = \infty$$

since Σ_{k=1}^{∞} P(k)(x, x) = G(x, x), x is recurrent, and therefore by Theorem 2.3 this sum is infinite.

Therefore y is recurrent and y → x. It follows that ρxy = 1 from the first part of the theorem.

We say a set C of states is closed if no state inside C leads to any state outside C, i.e. ρxy = 0 for all x ∈ C, y ∉ C. A closed set C is called irreducible if x leads to y for all choices of x and y in C.
Example 2.10. With states ordered (1, 2, 3),

$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

Consider the sets of states C = {1, 2, 3}, C1 = {1}, C2 = {2, 3}. Verify that C, C1, C2 are closed sets. Check that C1, C2 are also irreducible, but C is not irreducible.
Theorem 2.5. If C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient.

Proof. Suppose some x ∈ C is recurrent. Since C is irreducible, x leads to every state in C, and hence, by Theorem 2.4, every state in C is recurrent.
Now suppose some y ∈ C is transient, and assume for contradiction that there exists a recurrent x ∈ C. Since C is irreducible, x → y, so by Theorem 2.4, y is recurrent, a contradiction. Hence every state in C is transient.
In the argument above we used contradiction to show that an irreducible closed set cannot contain a mix of transient and recurrent states. However, it is not immediately obvious from the contradiction alone that an irreducible closed set consisting entirely of transient states can exist. To show that such a case is possible, we have to exhibit an example with this property; we will construct such a chain later, in the Birth and Death chain section.
Theorem 2.6. Let C be a finite irreducible closed set of states. Then every state in C is recurrent.
Proof. Hint: Use Proposition 2.8 and Theorem 2.5.

Proposition 2.9. ρxy = P(x, y) + Σ_{z̸=y} P(x, z) ρzy

Proof. Hint: Use Proposition 2.4.

Example 2.11. With states ordered (0, 1, 2, 3, 4, 5),

$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} & 0 & 0 & 0 \\ 0 & \tfrac{1}{5} & \tfrac{2}{5} & \tfrac{1}{5} & 0 & \tfrac{1}{5} \\ 0 & 0 & 0 & \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \\ 0 & 0 & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 & \tfrac{1}{4} & 0 & \tfrac{3}{4} \end{pmatrix}$$

Verify the following: C = {0, 1, 2, 3, 4, 5} is a closed set. C1 = {0} and C2 = {3, 4, 5} are irreducible closed sets. ST = {1, 2} are transient states, and SR = {0, 3, 4, 5} are recurrent states. Does C3 = {1, 2} form a closed set? The answer is no. Verify!
Note that if x and y lie in the same irreducible closed set (C1 or C2), then ρxy = 1, by Theorem 2.6 and Theorem 2.4; for example, ρ34 = 1.
Suppose x ∈ SR but x ∉ C2, i.e. x = 0; then ρ03 = 0, since C1 = {0} is closed.
Using Proposition 2.9,

$$\rho_{10} = P(1, 0) + \sum_{y \neq 0} P(1, y)\, \rho_{y0}$$

which leads to

$$\rho_{10} = \tfrac{1}{4} + \tfrac{1}{2}\rho_{10} + \tfrac{1}{4}\rho_{20}$$

and similarly,

$$\rho_{20} = 0 + \tfrac{1}{5}\rho_{10} + \tfrac{2}{5}\rho_{20} + \tfrac{1}{5}\rho_{30} + \tfrac{1}{5}\rho_{50}$$

Note that ρ30 = ρ50 = 0, since C1 is unreachable from C2. Solving, we get ρ10 = 3/5 and ρ20 = 1/5.
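To spell out the algebra behind these values: from the second equation, (1 − 2/5)ρ20 = (1/5)ρ10, i.e. ρ20 = ρ10/3; substituting into the first equation,

$$\rho_{10} = \tfrac{1}{4} + \tfrac{1}{2}\rho_{10} + \tfrac{1}{12}\rho_{10} \;\Longrightarrow\; \tfrac{5}{12}\rho_{10} = \tfrac{1}{4} \;\Longrightarrow\; \rho_{10} = \tfrac{3}{5}, \quad \rho_{20} = \tfrac{1}{5}.$$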

Birth and Death chain


The birth and death chain is a class of Markov chains that arises very commonly in applications; many common Markov chains turn out to be birth and death chains in disguise. Let us first describe the transition probability of the birth and death chain.
Let S = {0, 1, . . . , d}, where d may be finite or infinite. The transition probability is given by

$$P(x, y) = \begin{cases} q_x & y = x - 1 \\ r_x & y = x \\ p_x & y = x + 1 \end{cases}$$
where px + qx + rx = 1 and q0 = 0. If d is finite, then pd = 0. Notice at each state {x} the chain
can either go to a state which is an increment by one {x + 1}, stay at the same state {x} or move to
a state that is a decrement by one {x − 1}. We saw such a structure in the gambler’s ruin problem.
The gambler's ruin chain, as we defined it, is expressed in birth and death chain notation as d = 4, r0 = 1, p1 = p2 = p3 = 0.4, q1 = q2 = q3 = 0.6, r4 = 1. If d is not finite, we can study an extension of the gambler's ruin problem in which the gambler stops playing only if she is ruined; earlier, we said the gambler would stop playing once she had accumulated 4 units of wealth.
Remark 4. We only consider irreducible Birth and Death chains, all further references to Birth
and Death chain imply that they are irreducible.

The simple random walk that we discussed earlier is also a Birth and Death chain (Check how!).
Birth and death chain can be used to model queues. Queuing theory is a major topic of study in
Operations Research that looks at the analysis of formation of queues to design optimal layouts
for managing traffic. The Birth and Death chain can be thought of as a queue forming, where the
queue length at each time point increases or decreases by one, or stays the same. We will study a
generalization of this later called the Birth and Death process.
Also note that the transition probabilities px, qx, rx carry a subscript x: the probability of moving to another state depends on the state the chain is currently at. For example, it is possible in a birth and death chain for p10 and p100 to be different.
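To make this concrete, here is a small sketch (our own helper, not from the notes) that assembles the transition matrix of a finite birth and death chain from arrays of p_x and q_x (with r_x implied); the gambler's ruin matrix from Example 2.6 is recovered as a special case.

import numpy as np

def birth_death_matrix(p, q):
    """Transition matrix of a birth-death chain on {0, ..., d}.

    p[x] = P(x, x+1), q[x] = P(x, x-1); r[x] = 1 - p[x] - q[x] is implied.
    Requires q[0] = 0 and, for finite d, p[d] = 0.
    """
    d = len(p) - 1
    P = np.zeros((d + 1, d + 1))
    for x in range(d + 1):
        if x > 0:
            P[x, x - 1] = q[x]
        if x < d:
            P[x, x + 1] = p[x]
        P[x, x] = 1.0 - p[x] - q[x]
    return P

# Gambler's ruin (Example 2.6): d = 4, p_1 = p_2 = p_3 = 0.4, q_1 = q_2 = q_3 = 0.6.
p = [0.0, 0.4, 0.4, 0.4, 0.0]
q = [0.0, 0.6, 0.6, 0.6, 0.0]
print(birth_death_matrix(p, q))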
Proposition 2.10. For a, b ∈ S with a < b, define

u(x) = Px(Ta < Tb), a < x < b,

and set u(a) = 1, u(b) = 0. Here u(x) is the probability that a Birth and Death chain starting at {x} hits state {a} for the first time before it hits state {b} for the first time. Then

$$u(x) = \frac{\sum_{y=x}^{b-1} \gamma_y}{\sum_{y=a}^{b-1} \gamma_y}$$

where γy = (q1 · · · qy)/(p1 · · · py) for y ≥ 1, with γ0 = 1.

Proof. Notice that u satisfies the following recursion: for a < y < b,

u(y) = qy u(y − 1) + ry u(y) + py u(y + 1),

since in the next time-step the chain can decrease, stay at the same state, or increase. Since ry = 1 − py − qy,

$$u(y + 1) - u(y) = \frac{q_y}{p_y}\,\big(u(y) - u(y - 1)\big)$$

Substituting y = a + 1,

$$u(a + 2) - u(a + 1) = \frac{q_{a+1}}{p_{a+1}}\,\big(u(a + 1) - u(a)\big)$$

and similarly,

$$u(a + 3) - u(a + 2) = \frac{q_{a+2}}{p_{a+2}}\,\big(u(a + 2) - u(a + 1)\big)$$

Therefore, for a ≤ y < b,

$$u(y + 1) - u(y) = \frac{q_{a+1} q_{a+2} \cdots q_y}{p_{a+1} p_{a+2} \cdots p_y}\,\big(u(a + 1) - u(a)\big) \tag{6}$$

(for y = a the product is empty and (6) is trivially true). Summing (6) over y from a to b − 1 and using u(a) = 1, u(b) = 0,

$$u(b) - u(a) = -1 = \left( 1 + \frac{q_{a+1}}{p_{a+1}} + \frac{q_{a+1} q_{a+2}}{p_{a+1} p_{a+2}} + \cdots + \frac{q_{a+1} \cdots q_{b-1}}{p_{a+1} \cdots p_{b-1}} \right) \big(u(a + 1) - u(a)\big)$$

Therefore,

$$u(a) - u(a + 1) = 1 \Big/ \left( 1 + \frac{q_{a+1}}{p_{a+1}} + \frac{q_{a+1} q_{a+2}}{p_{a+1} p_{a+2}} + \cdots + \frac{q_{a+1} \cdots q_{b-1}}{p_{a+1} \cdots p_{b-1}} \right)$$

Substituting back into (6),

$$u(y) - u(y + 1) = \frac{q_{a+1} \cdots q_y}{p_{a+1} \cdots p_y} \Big/ \left( 1 + \frac{q_{a+1}}{p_{a+1}} + \cdots + \frac{q_{a+1} \cdots q_{b-1}}{p_{a+1} \cdots p_{b-1}} \right) \tag{7}$$

Summing (7) over y from x to b − 1, a < x < b, and using u(b) = 0 gives

$$u(x) = \frac{\sum_{y=x}^{b-1} q_{a+1} \cdots q_y / (p_{a+1} \cdots p_y)}{\sum_{y=a}^{b-1} q_{a+1} \cdots q_y / (p_{a+1} \cdots p_y)} = \frac{\sum_{y=x}^{b-1} \gamma_y}{\sum_{y=a}^{b-1} \gamma_y}$$

where the last equality holds because (q_{a+1} · · · q_y)/(p_{a+1} · · · p_y) = γy/γa, and the common factor 1/γa cancels from the numerator and the denominator. Here γy = (q1 · · · qy)/(p1 · · · py), with γ0 = 1.
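As a quick numerical sanity check of this formula (a sketch, using the gambler's ruin chain, where p_x = 0.4, q_x = 0.6 and r_x = 0 for the interior states), we can compare the γ-formula against directly solving the recursion u(y) = q u(y − 1) + p u(y + 1) with the boundary conditions u(a) = 1, u(b) = 0:

import numpy as np

p, q = 0.4, 0.6           # interior p_x, q_x of the gambler's ruin chain
a, b = 0, 4

# gamma_y = (q_1 ... q_y)/(p_1 ... p_y) = (q/p)^y here, with gamma_0 = 1.
gamma = np.array([(q / p) ** y for y in range(a, b)])

def u_formula(x):
    return gamma[x - a:].sum() / gamma.sum()

# Direct solve of u(y) = q*u(y-1) + p*u(y+1), y = a+1, ..., b-1, with u(a)=1, u(b)=0.
n = b - a - 1
A = np.zeros((n, n))
rhs = np.zeros(n)
for i, y in enumerate(range(a + 1, b)):
    A[i, i] = 1.0
    if i > 0:
        A[i, i - 1] = -q
    else:
        rhs[i] += q * 1.0      # boundary term from u(a) = 1
    if i < n - 1:
        A[i, i + 1] = -p       # u(b) = 0 contributes nothing
u_direct = np.linalg.solve(A, rhs)

for i, x in enumerate(range(a + 1, b)):
    print(x, u_formula(x), u_direct[i])   # the two columns agree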

Exercises
1. Consider the gambler's ruin problem where px = 9/19 and qx = 10/19, and where she stops playing only if she is ruined. Given that the gambler's current wealth is 10 units, evaluate the probability that the gambler is ruined before she reaches 15 units of wealth.

Proposition 2.11. The Birth and Death chain is recurrent, i.e. all states are recurrent, iff

$$\sum_{x=0}^{\infty} \gamma_x = \infty$$

Proof. Note from Proposition 2.10 (with a = 0, b = n, x = 1) that

$$P_1(T_0 < T_n) = 1 - \frac{1}{\sum_{x=0}^{n-1} \gamma_x}$$

Then,

$$P_1(T_0 < \infty) = \lim_{n \to \infty} P_1(T_0 < T_n) = 1 - \frac{1}{\sum_{x=0}^{\infty} \gamma_x}$$

Suppose the Birth and Death chain is recurrent. Then from Theorem 2.4, ρ10 = 1, i.e. P1(T0 < ∞) = 1, and therefore Σ_{x=0}^{∞} γx = ∞.
Now suppose Σ_{x=0}^{∞} γx = ∞. Decomposing over the first step,

P0(T0 < ∞) = P(0, 0) + P(0, 1)P1(T0 < ∞) + P(0, 2)P2(T0 < ∞) + P(0, 3)P3(T0 < ∞) + . . .

Now note that P(0, x) = 0 for all x > 1. Therefore,

P0(T0 < ∞) = P(0, 0) + P(0, 1)P1(T0 < ∞)

Since Σ_{x=0}^{∞} γx = ∞ we have P1(T0 < ∞) = 1, and therefore

P0(T0 < ∞) = P(0, 0) + P(0, 1) = 1

Thus ρ00 = P0(T0 < ∞) = 1, so {0} is a recurrent state. Since the chain is irreducible and {0} is recurrent, by Theorem 2.5 all states are recurrent.

Corollary 2.1. The Birth and Death chain is transient, i.e. all states are transient, iff

$$\sum_{x=0}^{\infty} \gamma_x < \infty$$

Exercises
1. Consider a Birth and Death chain where d is not finite. Define px = (x + 2)/(2x + 2) and qx = x/(2x + 2). Is this chain recurrent or transient?

Stationary Distribution
One of the goals of probability theory is to understand the limiting behavior of a random system. The Central Limit Theorem (CLT) roughly states that a suitably scaled average of a large collection of random variables follows the Normal distribution, irrespective of the distribution of the random variables themselves. This is a profound result because it says that the limiting behavior of such a random system in some sense does not depend on the distribution of the individual random variables. It is also a very useful result: even if a random system is too difficult to analyse exactly, the CLT allows us to study it in the limiting case. As many of you know, the CLT has immense applications in statistics and many other disciplines.
With similar motivations, we will seek to study the stationary distribution of a Markov chain. Consider the humidity in Kanpur example: what is the long-run proportion of days on which Kanpur is humid? As a reminder, the transition matrix, with states ordered (0, 1), is

$$\mathbf{P} = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$

One approach could be to study P(n), where n is large, i.e. limn→∞ P(n). To evaluate P(n) for a large value of n we must multiply the transition matrix by itself many times; this is computationally expensive, and the cost grows with the number of states of the Markov chain. Instead let us consider the system of equations,

0.9π(0) + 0.2π(1) = π(0)


0.1π(0) + 0.8π(1) = π(1)

Since π is a probability distribution,


π(0) + π(1) = 1
The solution of this system of equations is π(0) = 2/3, π(1) = 1/3. Therefore, we claim that the long-run proportion of days for which Kanpur is humid is 1/3 ≈ 0.333, i.e. limn→∞ P(Xn = 1) = 1/3. To see why we can make this claim, check the paragraph below. Notice that solving this system of equations is more feasible than evaluating limn→∞ P(n). In this section, we will try to identify the conditions under which limn→∞ P(n)(x, y) = π(y) holds.
But why does evaluating π as a proxy for limn→∞ P(n) even make sense?
From Proposition 2.3,

$$P(X_n = y) = \sum_{x \in S} \pi_0(x) P^{(n)}(x, y)$$

Taking the limit (and assuming the limit may be exchanged with the sum),

$$\lim_{n \to \infty} P(X_n = y) = \lim_{n \to \infty} \sum_{x \in S} \pi_0(x) P^{(n)}(x, y) = \sum_{x \in S} \pi_0(x)\, \pi(y) = \pi(y)$$

where we applied limn→∞ P(n)(x, y) = π(y), if it exists, together with Σx∈S π0(x) = 1.

Therefore, the stationary distribution (when it exists) gives us the long term probability of a
Markov chain attaining a state. Formally the stationary distribution, π(x), is the solution of the

following set of equations,

$$\sum_{x \in S} \pi(x) P(x, y) = \pi(y), \quad \forall y \in S \tag{8}$$

Since π(x) is a probability distribution, Σx∈S π(x) = 1. In matrix notation this is written as πP = π, where π is the row vector (π(x), x ∈ S).
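Numerically, the system πP = π together with Σx π(x) = 1 can be solved as a linear system. A minimal sketch, using the Kanpur humidity matrix (appending the normalisation as an extra equation and solving in the least-squares sense, which here returns the exact solution):

import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
d = len(P)

# pi (P - I) = 0 transposed gives (P - I)^T pi^T = 0; add the normalisation row.
A = np.vstack([(P - np.eye(d)).T, np.ones(d)])
b = np.zeros(d + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)   # approximately [2/3, 1/3]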

Exercises
1. Evaluate the stationary distribution of a Markov chain with the following transition matrix (states ordered (1, 2, 3)):

$$\mathbf{P} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{pmatrix}$$

Remark 5. Now consider Σx∈S π(x)P(2)(x, y). Using the Chapman-Kolmogorov equation this can be simplified in the following way:

$$\begin{aligned} \sum_{x \in S} \pi(x) P^{(2)}(x, y) &= \sum_{x \in S} \pi(x) \sum_{z \in S} P(x, z) P(z, y) && \text{(Chapman-Kolmogorov Equation)} \\ &= \sum_{z \in S} \left( \sum_{x \in S} \pi(x) P(x, z) \right) P(z, y) \\ &= \sum_{z \in S} \pi(z) P(z, y) && \text{(from (8))} \\ &= \pi(y) && \text{(from (8))} \end{aligned}$$

Similarly, we can show that for any n,

$$\sum_{x \in S} \pi(x) P^{(n)}(x, y) = \pi(y) \tag{9}$$

This motivates why we solve (8) to obtain the stationary distribution: the left-hand side of (9) involves the n-step transition probabilities, and as n approaches ∞ the right-hand side is still the stationary distribution.

Stationary distribution of the Birth and Death chain


Consider the irreducible Birth and Death chain introduced earlier.
Now consider the system of equations

$$\sum_{x \in S} \pi(x) P(x, y) = \pi(y), \quad y \in S$$

Therefore,

$$\pi(0) r_0 + \pi(1) q_1 = \pi(0)$$
$$\pi(y - 1) p_{y-1} + \pi(y) r_y + \pi(y + 1) q_{y+1} = \pi(y), \quad y \geq 1$$

Also, py + qy + ry = 1. It follows that

$$\pi(y + 1) = \frac{p_y}{q_{y+1}}\, \pi(y)$$

Therefore,

$$\pi(x) = \frac{p_0 \cdots p_{x-1}}{q_1 \cdots q_x}\, \pi(0), \quad x \geq 1$$

Let πx = (p0 · · · px−1)/(q1 · · · qx) for x ≥ 1, with π0 = 1 (here πx is merely shorthand, not the initial distribution). Then π(x) = πx π(0). Since the stationary distribution must satisfy Σ_{d=0}^{∞} π(d) = 1, we get π(0) Σ_{d=0}^{∞} πd = 1, and hence

$$\pi(x) = \frac{\pi_x}{\sum_{d=0}^{\infty} \pi_d}$$

provided Σ_{d=0}^{∞} πd < ∞. Therefore, the Birth and Death chain has a stationary distribution iff Σ_{d=0}^{∞} πd < ∞.

Exercises
1. Evaluate the stationary distribution of the Markov chain with transition matrix (states ordered (0, 1, 2, 3)):

$$\mathbf{P} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ \tfrac{1}{3} & 0 & \tfrac{2}{3} & 0 \\ 0 & \tfrac{2}{3} & 0 & \tfrac{1}{3} \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Mean return time to a Transient state


If y is a transient state, then Py(Ty = ∞) > 0, i.e., there is a positive probability that the chain started at y never returns to y. Therefore, Ey[Ty] = ∞: the average time for the chain to return to a transient state is infinite.
Now suppose y is a recurrent state. We define the mean return time as

my = Ey[Ty]

i.e., my is the mean time taken by a chain started at y to return to y.
If my < ∞, we say that y is positive recurrent; if my = ∞, we say that it is null recurrent. This is a finer classification of recurrent states, identifying how quickly, on average, the chain returns to a recurrent state.
There is a connection between stationary distributions and recurrent states that we will explore now. In the remainder of these notes we state the results without proofs; interested readers may refer to Chapter 2 of Hoel, Port and Stone for the proofs.
Theorem 2.7 (Some results on positive recurrent states). 1. Let x be a positive recurrent state and suppose that x → y. Then y is positive recurrent.
2. Let C be a finite irreducible closed set of states. Then all states in C are positive recurrent.

3. A Markov chain with a finite number of states has no null recurrent states, i.e. all recurrent states in such a chain are positive recurrent.

Theorem 2.8. Let π be a stationary distribution. If x is a transient state or a null recurrent state,
then π(x) = 0.
Theorem 2.9. An irreducible positive recurrent Markov chain has a unique stationary distribution π, given by

$$\pi(x) = \frac{1}{m_x}$$
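For instance, the Kanpur humidity chain is irreducible and, being finite, positive recurrent; we found π(1) = 1/3, so by Theorem 2.9 the mean return time to the humid state is m1 = 1/π(1) = 3 days (and m0 = 3/2 days for the non-humid state).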
Theorem 2.10. An irreducible Markov chain is positive recurrent, i.e. all states are positive recur-
rent, iff it has a stationary distribution.
Example 2.12 (Example of a null recurrent chain). We have established that an irreducible Birth and Death chain is transient iff

$$\sum_{x=1}^{\infty} \gamma_x = \sum_{x=1}^{\infty} \frac{q_1 \cdots q_x}{p_1 \cdots p_x} < \infty \tag{10}$$

For a Birth and Death chain to have a stationary distribution it was shown that

$$\sum_{x=1}^{\infty} \pi_x = \sum_{x=1}^{\infty} \frac{p_0 \cdots p_{x-1}}{q_1 \cdots q_x} < \infty \tag{11}$$

and by Theorem 2.10 this is also the condition for positive recurrence. Thus the chain is null recurrent iff both (10) and (11) fail, i.e., iff the following two conditions hold simultaneously:

$$\sum_{x=1}^{\infty} \frac{q_1 \cdots q_x}{p_1 \cdots p_x} = \infty \qquad \text{and} \qquad \sum_{x=1}^{\infty} \frac{p_0 \cdots p_{x-1}}{q_1 \cdots q_x} = \infty$$
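For instance, consider the Birth and Death chain on {0, 1, 2, . . .} with p0 = 1 and px = qx = 1/2 for x ≥ 1 (a simple symmetric random walk reflected at 0). Then γx = 1 for every x ≥ 1, so the first sum diverges and the chain is recurrent; and πx = (1 · (1/2)^{x−1})/(1/2)^x = 2 for every x ≥ 1, so the second sum also diverges and there is no stationary distribution. This chain is therefore null recurrent.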

Convergence to stationary distribution


Consider a Markov chain with ρxx > 0, ∀x ∈ S. The period of a state x is defined as

dx = gcd{n ≥ 1 : P(n)(x, x) > 0}

where gcd denotes the greatest common divisor. We require the condition ρxx > 0 because if there is zero probability of the chain ever returning to a state, the notion of a period is meaningless for that state.

Exercises
1. Consider the transition matrix (states ordered (0, 1, 2, 3)):

$$\mathbf{P} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

What is d1?
Theorem 2.11. All states in an irreducible Markov chain have a common period.
We say that an irreducible chain is periodic with period d if d > 1 and aperiodic if d = 1.

Exercises
1. Consider the transition matrix (states ordered (0, 1, 2, 3)):

$$\mathbf{P} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} & 0 \\ 0 & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}$$

What is the period of this Markov chain?

At the beginning of this section we wanted to understand when the following limit exists:

$$\lim_{n \to \infty} P^{(n)}(x, y) = \pi(y)$$

Now we answer this question.


Theorem 2.12. Let Xn be an irreducible, positive recurrent, aperiodic Markov chain having stationary distribution π. Then

$$\lim_{n \to \infty} P^{(n)}(x, y) = \pi(y), \quad x, y \in S$$

Remark 6. Note that a Markov chain with transient or null recurrent states may also possess a stationary distribution; however, the powers of the transition matrix may then fail to converge to the stationary distribution, and, more importantly, the long-run interpretation of the stationary distribution may not hold. In the Kanpur humidity example, we were able to interpret the stationary distribution as a long-run probability because the chain satisfies the above conditions for convergence. Verify that the transition matrix of the Kanpur humidity case satisfies these properties!

Remark 7. There is a more nuanced notion of the limit when the chain is periodic, but we shall
not pursue this here. Check Chapter 2, Hoel, Port and Stone if you are interested.

References
1. Sheldon Ross, Introduction to Probability Models, Academic Press, 2024.
2. Hoel, Port, Stone, Introduction to Stochastic Processes, Houghton Mifflin Company, 1972.
3. Rick Durrett, Essentials of Stochastic Processes, Springer, 1999.
4. Sidney Resnick, Adventures in Stochastic Processes, Springer, 1992.
