
Chapter 2

Markov chains

2.1 Markov chains

In many systems, it is reasonable to suppose that, if we know exactly the state of the
system today, then its state tomorrow should not further depend on its state yesterday
(or on any previous state).

Definition 2.1. (Discrete Markov chain) A discrete state space stochastic process
{X_n : n = 0, 1, 2, . . .} is called a Markov Chain (MC) if
\[
P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i) = P_{ij}
\]
for all n ≥ 0 and all possible values of j, i, i_{n-1}, . . . , i_0, where P_{ij} is called the one-step transition probability.

Remark 2.1. For an MC, the future is conditionally independent of the past, given
the present.

Property 2.1. If {X_n : n = 0, 1, 2, . . .} is an MC, then
\[
P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i, X_{n-k} = i_{n-k})
\]
for any k = 1, 2, . . . , n.

Proof. Denote the state space of {X_n; n ≥ 0} by S (i.e., the set of all possible values of any X_n). Without loss of generality, we only show the result for k = 1. Note that
\begin{align*}
& P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}) \\
&= \sum_{i_{n-2} \in S, \ldots, i_0 \in S} P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, \ldots, X_0 = i_0) \\
&\qquad \times P(X_{n-2} = i_{n-2}, \ldots, X_0 = i_0 \mid X_n = i, X_{n-1} = i_{n-1}) \\
&= P_{ij} \sum_{i_{n-2} \in S, \ldots, i_0 \in S} P(X_{n-2} = i_{n-2}, \ldots, X_0 = i_0 \mid X_n = i, X_{n-1} = i_{n-1}) \\
&= P_{ij}.
\end{align*}
This completes the proof. □

An equivalent definition of an MC is as follows:

Theorem 2.1. A discrete state space stochastic process {X_n : n = 0, 1, 2, . . .} is an MC if and only if
\[
P(X_{n+m} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{n+m} = j \mid X_n = i)
\]
for all n ≥ 0, m ≥ 1, and all possible values of j, i, i_{n-1}, . . . , i_0.

Proof. The proof of the "if" part is trivial. Below, we prove the "only if" part by induction on m.
Denote the state space of {X_n; n ≥ 0} by S. It is obvious that the result holds for m = 1. Suppose the result holds for m. Then,
\begin{align*}
& P(X_{n+m+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) \\
&= \sum_{i_{n+m} \in S} P(X_{n+m+1} = j \mid X_{n+m} = i_{n+m}, X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) \\
&\qquad \times P(X_{n+m} = i_{n+m} \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) \\
&= \sum_{i_{n+m} \in S} P(X_{n+m+1} = j \mid X_{n+m} = i_{n+m}, X_n = i) \quad \text{(by Property 2.1)} \\
&\qquad \times P(X_{n+m} = i_{n+m} \mid X_n = i) \quad \text{(by induction)} \\
&= P(X_{n+m+1} = j \mid X_n = i).
\end{align*}
Hence, the desired result holds. □

Remark 2.2. Alternatively, for an MC, the future is conditionally independent of the
earlier past, given the most recent past.

For ease of notation, we commonly write
\[
P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{n+1} = j \mid X_n, X_{n-1}, \ldots, X_0).
\]

Property 2.2. If {X_n; n ≥ 0} is an MC, then {X_{2n}; n ≥ 0} is also an MC.

Proof. Left as an exercise. □

Property 2.3. If {X_n; n ≥ 0} is an MC, then {Y_n; n ≥ 0} is also an MC, where Y_n = (X_n, X_{n+1}).

Proof. Left as an exercise. □

Property 2.4. The one-step transition probabilities P_{ij} satisfy:

(i) P_{ij} ≥ 0 for i, j = 0, 1, . . . ;

(ii) \sum_{j=0}^{\infty} P_{ij} = 1 for any i.

Proof. Left as an exercise. □

Definition 2.2. (One-step Transition Matrix) The one-step transition matrix P is the matrix of one-step transition probabilities P_{ij}:
\[
P = \begin{pmatrix}
P_{00} & P_{01} & P_{02} & \cdots \\
P_{10} & P_{11} & P_{12} & \cdots \\
\vdots & \vdots & \vdots & \\
P_{i0} & P_{i1} & P_{i2} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}
\]

2.2 Examples

Example 2.1. (Forecasting the Weather) Suppose that the chance of rain tomorrow
depends on previous weather conditions only through whether or not it is raining
today and not on past weather conditions. Suppose also that if it rains today, then it
will rain tomorrow with probability α; and if it does not rain today, then it will rain
tomorrow with probability β .
If we say that the process is in state 0 when it rains and state 1 when it does not
rain, then the preceding is a two-state MC whose transition probabilities are given
by
\[
P = \begin{pmatrix} \alpha & 1-\alpha \\ \beta & 1-\beta \end{pmatrix}. \qquad \square
\]

Example 2.2. (A Communications System) Consider a communications system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages, at each of which there is a probability p that the digit entered will be unchanged when it leaves. Letting X_n denote the digit entering the nth stage, then {X_n; n ≥ 0} is a two-state MC having transition probability matrix
\[
P = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}. \qquad \square
\]

Example 2.3. On any given day Gary is either cheerful (C), so-so (S), or glum (G).
If he is cheerful today, then he will be C, S, or G tomorrow with respective proba-
bilities 0.5, 0.4, 0.1. If he is feeling so-so today, then he will be C, S, or G tomorrow
with probabilities 0.3, 0.4, 0.3. If he is glum today, then he will be C, S, or G to-
morrow with probabilities 0.2, 0.3, 0.5. Letting Xn denote Gary’s mood on the nth
day, then {Xn ; n ≥ 0} is a three-state MC (state 0 = C, state 1 = S, state 2 = G) with
transition probability matrix
\[
P = \begin{pmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}. \qquad \square
\]

Example 2.4. (Transforming a Process into an MC) Suppose that whether or not
it rains today depends on previous weather conditions through the last two days.
Specifically, suppose that if it has rained for the past two days, then it will rain
tomorrow with probability 0.7; if it rained today but not yesterday, then it will rain
tomorrow with probability 0.5; if it rained yesterday but not today, then it will rain
tomorrow with probability 0.4; if it has not rained in the past two days, then it will
rain tomorrow with probability 0.2.
If we let the state at time n depend only on whether or not it is raining at time n,
then the preceding model is not an MC (why not?). However, we can transform this
model into an MC by saying that the state at any time is determined by the weather
conditions during both that day and the previous day. In other words, we can say
that the process is in
state 0 if it rained both today and yesterday,
state 1 if it rained today but not yesterday,
state 2 if it rained yesterday but not today,
state 3 if it did not rain either yesterday or today.
The preceding would then represent a four-state MC having transition probability matrix
\[
P = \begin{pmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{pmatrix}. \qquad \square
\]

Example 2.5. (A Random Walk Model) An MC whose state space is given by the
integers i = 0, ±1, ±2, . . . is said to be a random walk if, for some number 0 < p < 1,

\[
P_{i,i} = 0, \quad P_{i,i+1} = p, \quad P_{i,i-1} = 1 - p \qquad \text{for } i = 0, \pm 1, \pm 2, \ldots
\]
Then the one-step transition matrix is
\[
P = \begin{pmatrix}
\ddots & \ddots & \ddots & \ddots & & \\
\cdots & P_{0,-1} = 1-p & P_{0,0} = 0 & P_{0,1} = p & \cdots & \\
 & \cdots & P_{1,0} = 1-p & P_{1,1} = 0 & P_{1,2} = p & \cdots \\
 & & \cdots & P_{2,1} = 1-p & P_{2,2} = 0 & P_{2,3} = p \\
 & & & \ddots & \ddots & \ddots
\end{pmatrix}. \qquad \square
\]
Example 2.6. (A Gambling Model) Consider a gambler who, at each play of the
game, either wins $1 with probability p or loses $1 with probability 1 − p. If we
suppose that our gambler quits playing either when he goes broke or he attains a
fortune of $N, then the gambler’s fortune is an MC having transition probabilities:

\[
P_{00} = 1, \qquad P_{i,i+1} = p, \quad P_{i,i-1} = 1 - p \quad \text{for } i = 1, 2, \ldots, N-1, \qquad P_{NN} = 1.
\]
Hence, the one-step transition matrix is
\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
1-p & 0 & p & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1-p & 0 & p & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1-p & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix}.
\]
States 0 and N are called absorbing states since once entered they are never left. Note that the preceding is a finite state random walk with absorbing barriers (states 0 and N). □

2.3 Chapman-Kolmogorov equations

Let P^n_{ij} be the probability that a process in state i will be in state j after n additional transitions, i.e.,
\[
P^n_{ij} = P(X_{n+m} = j \mid X_m = i) = P(X_n = j \mid X_0 = i)
\]
for any n ≥ 1 and any m ≥ 0. We call P^n_{ij} the n-step transition probability. Clearly, P^1_{ij} = P_{ij}.

Theorem 2.2. (The Chapman-Kolmogorov equations)
\[
P^{n+m}_{ij} = \sum_{k \in S} P^n_{ik} P^m_{kj}
\]
for all n, m ≥ 1 and all i, j ∈ S, where S is the state space.


Proof.
\begin{align*}
P^{n+m}_{ij} &= P(X_{n+m} = j \mid X_0 = i) \\
&= \sum_{k \in S} P(X_{n+m} = j \mid X_n = k, X_0 = i)\, P(X_n = k \mid X_0 = i) \\
&= \sum_{k \in S} P(X_{n+m} = j \mid X_n = k)\, P(X_n = k \mid X_0 = i) \\
&= \sum_{k \in S} P^n_{ik} P^m_{kj}. \qquad \square
\end{align*}
The Chapman-Kolmogorov equations provide a method for computing the n-step
transition probabilities.
Definition 2.3. (n-step Transition Matrix) The n-step transition matrix P^{(n)} is the matrix of n-step transition probabilities P^n_{ij}:
\[
P^{(n)} = \begin{pmatrix}
P^n_{00} & P^n_{01} & P^n_{02} & \cdots \\
P^n_{10} & P^n_{11} & P^n_{12} & \cdots \\
\vdots & \vdots & \vdots & \\
P^n_{i0} & P^n_{i1} & P^n_{i2} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}
\]

Theorem 2.3.
\[
P^{(n+m)} = P^{(n)} \cdot P^{(m)}.
\]
In particular, P^{(n)} = P^{(n-1)} \cdot P = P \cdots P = P^n.

Example 2.7. Re-consider Example 2.1. If α = 0.7 and β = 0.4, calculate the probability that it will rain four days from today given that it is raining today.
Solution. Our goal is to calculate P^4_{00}. Note that
\[
P^{(2)} = P^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{pmatrix}
\]
and
\[
P^{(4)} = (P^2)^2 = \begin{pmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{pmatrix}\begin{pmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{pmatrix} = \begin{pmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{pmatrix}.
\]
So, the desired probability P^4_{00} equals 0.5749. □
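The matrix computations above are easy to check numerically. The following is a minimal Python/NumPy sketch (the variable names are ours, not part of the text) that reproduces P^{(2)} and P^{(4)}:

import numpy as np

# One-step transition matrix of the weather chain (alpha = 0.7, beta = 0.4).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

P2 = np.linalg.matrix_power(P, 2)   # two-step transition matrix
P4 = np.linalg.matrix_power(P, 4)   # four-step transition matrix

print(P2)        # [[0.61 0.39], [0.52 0.48]]
print(P4)        # [[0.5749 0.4251], [0.5668 0.4332]]
print(P4[0, 0])  # 0.5749: probability of rain in four days, given rain today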
Example 2.8. Re-consider Example 2.4. Given that it rained on Monday and Tuesday, what is the probability that it will rain on Thursday?

Solution. The two-step transition probability matrix is given by
\[
P^{(2)} = P^2 = \begin{pmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{pmatrix}\begin{pmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.49 & 0.12 & 0.21 & 0.18 \\ 0.35 & 0.20 & 0.15 & 0.30 \\ 0.20 & 0.12 & 0.20 & 0.48 \\ 0.10 & 0.16 & 0.10 & 0.64 \end{pmatrix}.
\]
Since rain on Thursday is equivalent to the process being in either state 0 or state 1 on Thursday, the desired probability is given by P^2_{00} + P^2_{01} = 0.49 + 0.12 = 0.61. □
The probabilities we have considered so far are conditional probabilities. For instance, P^n_{ij} is the probability that the state at time n is j given that the initial state at time 0 is i. If we want to find the unconditional distribution of the state at time n, it is necessary to specify the probability distribution of the initial state. Let
\[
\alpha_{0,i} = P(X_0 = i) \quad \text{for } i \in S.
\]
Clearly, \sum_{i \in S} \alpha_{0,i} = 1. Then,
\[
\alpha_{n,i} = P(X_n = i) = \sum_{j \in S} P(X_n = i \mid X_0 = j)\, P(X_0 = j) = \sum_{j \in S} \alpha_{0,j} P^n_{ji}.
\]
That is, unconditional probabilities can be computed by conditioning on the initial state. It also follows that
\[
\alpha_n = \alpha_0 P^{(n)} = \alpha_{n-1} P,
\]
where α_n = (α_{n,0}, α_{n,1}, . . .) and α_0 = (α_{0,0}, α_{0,1}, . . .).
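As an illustration of the relation α_n = α_0 P^{(n)} = α_{n-1} P, the sketch below (our own illustration; the initial distribution is an assumed example, not taken from the text) propagates an initial distribution through the weather chain of Example 2.1:

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

alpha0 = np.array([0.4, 0.6])   # assumed initial distribution: P(X0 = 0) = 0.4

# alpha_n = alpha_{n-1} P, computed step by step
alpha = alpha0
for n in range(1, 5):
    alpha = alpha @ P
    print(n, alpha)

# Equivalent one-shot computation: alpha_4 = alpha_0 P^4
print(alpha0 @ np.linalg.matrix_power(P, 4))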

2.4 Classification of states

Definition 2.4. State j is said to be accessible from state i (i → j) if P^n_{ij} > 0 for some
n ≥ 0. Two states i and j that are accessible to each other (i → j and j → i) are said
to communicate (i ↔ j).

Property 2.5. The relation of communication satisfies:


1. (Reflexivity) i ↔ i, for all i ≥ 0.

2. (Symmetry) If i ↔ j, then j ↔ i.
3. (Transitivity) If i ↔ j and j ↔ k , then i ↔ k.
We can divide the state space into a number of separate classes using the concept
of communication. Two states that communicate are said to be in the same class.
Within a class, all states communicate with each other.
Definition 2.5. An MC is said to be irreducible if there is only one class, i.e. if all
states communicate with each other.
Example 2.9. Consider the MC consisting of the three states 0, 1, 2 and having transition probability matrix
\[
P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/4 & 1/4 \\ 0 & 1/3 & 2/3 \end{pmatrix}.
\]
It is easy to verify that this MC is irreducible. For example, it is possible to go from state 0 to state 2 since
\[
0 \to 1 \to 2,
\]
i.e., one way of getting from state 0 to state 2 is to go from state 0 to state 1 (with probability 1/2) and then go from state 1 to state 2 (with probability 1/4). □

Example 2.10. Consider a Markov chain consisting of the four states 0, 1, 2, 3 and having transition probability matrix
\[
P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
The classes of this Markov chain are {0, 1}, {2}, and {3}. Note that while state 0 (or 1) is accessible from state 2, the reverse is not true. Since state 3 is an absorbing state (i.e., P_{33} = 1), no other state is accessible from it. □
Let f_i denote the probability that, starting in state i, the process will ever re-enter state i. Clearly,
\[
f_i = \sum_{n=1}^{\infty} f_{ii}^{(n)},
\]
where
\[
f_{ii}^{(n)} = P(X_n = i, X_{n-1} \neq i, \ldots, X_1 \neq i \mid X_0 = i)
\]
represents the probability that, starting from i, the first visit to state i occurs at time n. Conventionally, f_{ii}^{(1)} = P_{ii}.

Property 2.6. f_{ii}^{(n)} satisfies the following recursive formula:
\[
f_{ii}^{(n)} = P^n_{ii} - \sum_{k=1}^{n-1} f_{ii}^{(k)} P^{n-k}_{ii}.
\]

Proof. Note that
\begin{align*}
P^n_{ii} &= P(X_n = i \mid X_0 = i) \\
&= P(X_n = i \mid X_{n-1} = i, X_0 = i)\, P(X_{n-1} = i \mid X_0 = i) + P(X_n = i \mid X_{n-1} \neq i, X_0 = i)\, P(X_{n-1} \neq i \mid X_0 = i) \\
&= f_{ii}^{(1)} P^{n-1}_{ii} + P(X_n = i, X_{n-1} \neq i \mid X_0 = i) \\
&= f_{ii}^{(1)} P^{n-1}_{ii} + P(X_n = i, X_{n-1} \neq i \mid X_{n-2} = i, X_0 = i)\, P(X_{n-2} = i \mid X_0 = i) \\
&\qquad + P(X_n = i, X_{n-1} \neq i \mid X_{n-2} \neq i, X_0 = i)\, P(X_{n-2} \neq i \mid X_0 = i) \\
&= f_{ii}^{(1)} P^{n-1}_{ii} + P(X_n = i, X_{n-1} \neq i \mid X_{n-2} = i)\, P^{n-2}_{ii} + P(X_n = i, X_{n-1} \neq i \mid X_{n-2} \neq i, X_0 = i)\, P(X_{n-2} \neq i \mid X_0 = i) \\
&= f_{ii}^{(1)} P^{n-1}_{ii} + f_{ii}^{(2)} P^{n-2}_{ii} + P(X_n = i, X_{n-1} \neq i, X_{n-2} \neq i \mid X_0 = i) \\
&= \cdots \\
&= f_{ii}^{(1)} P^{n-1}_{ii} + f_{ii}^{(2)} P^{n-2}_{ii} + \cdots + f_{ii}^{(n-1)} P^{1}_{ii} + f_{ii}^{(n)}.
\end{align*}
This completes the proof. □

Definition 2.6. (Recurrence vs. Transience)

• State i is said to be recurrent if, starting in state i, the process will definitely re-enter state i (i.e., f_i = 1).
• State i is said to be transient if, starting in state i, the process is not certain to re-enter state i (i.e., f_i < 1).

Remark 2.3. In other words, state i is recurrent (or transient) if the process returns
to i in finite time with probability 1 (or less than 1).

The question now is how to determine whether state i is recurrent or transient. Let M denote the number of times (periods) the MC is in state i. The following property provides one method for this purpose.

Property 2.7.
\[
E[M \mid X_0 = i] = \sum_{n=1}^{\infty} P^n_{ii}
\begin{cases}
< \infty & \text{if and only if } i \text{ is transient}, \\
= \infty & \text{if and only if } i \text{ is recurrent}.
\end{cases}
\]

Proof. Note that
\[
M = \sum_{n=0}^{\infty} I\{X_n = i\},
\]
where I is an indicator random variable: I{A} = 1 if A occurs and 0 otherwise. Then,
\begin{align*}
E[M \mid X_0 = i] &= E\Big[\sum_{n=0}^{\infty} I\{X_n = i\} \,\Big|\, X_0 = i\Big] \\
&= \sum_{n=0}^{\infty} E[I\{X_n = i\} \mid X_0 = i] \\
&= \sum_{n=0}^{\infty} P(X_n = i \mid X_0 = i) \\
&= \sum_{n=0}^{\infty} P^n_{ii}.
\end{align*}
On the other hand, when i is recurrent,
\[
E[M \mid X_0 = i] = \infty
\]
(since M > C for any C > 0). When i is transient,
\[
P(M = k \mid X_0 = i) = f_i \cdot f_i \cdots f_i \cdot (1 - f_i) = f_i^{\,k-1}(1 - f_i), \qquad k = 1, 2, \ldots.
\]
In other words, M given X_0 = i has a geometric distribution, and hence
\[
E[M \mid X_0 = i] = \frac{1}{1 - f_i} < \infty.
\]
This completes the proof. □

Property 2.8. If state i is recurrent and i ↔ j, then state j is recurrent.

Proof. There exist m ≥ 0 and n ≥ 0 such that P^n_{ij} > 0 and P^m_{ji} > 0. Then, for any s ≥ 0,
\[
P^{m+n+s}_{jj} \geq P^m_{ji} P^s_{ii} P^n_{ij}.
\]
Hence,
\[
\sum_{s=0}^{\infty} P^s_{jj} \geq \sum_{s=0}^{\infty} P^{m+n+s}_{jj} \geq P^m_{ji} P^n_{ij} \sum_{s=0}^{\infty} P^s_{ii} = \infty.
\]
This completes the proof. □

The aforementioned property implies that in an equivalence class, either all states
are recurrent or all states are transient.

Property 2.9. In a finite state MC, at least one of the states must be recurrent.

Proof. Suppose the states are 0, 1, . . ., M and suppose that they are all transient.
Then after a finite amount of time (say, after time T0 ) state 0 will never be visited,
and after a time (say, T1 ) state 1 will never be visited, and after a time (say T2 ) state 2
will never be visited, and so on. Thus after a finite time T = max{T0 , T1 , . . . , TM } no
states will be visited. But as the process must be in some state after time T we arrive
at a contradiction, which shows that at least one of the states must be recurrent. □
Property 2.10. All states of a finite irreducible MC are recurrent.

Proof. This is a direct consequence of Properties 2.8 and 2.9. □

Example 2.11. Let the MC consisting of the states 0, 1, 2, 3 have the transition probability matrix
\[
P = \begin{pmatrix} 0 & 0 & 1/2 & 1/2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.
\]
Determine which states are transient and which are recurrent.

Solution. It is a simple matter to check that all states communicate and this is a finite chain. Hence, all states must be recurrent. □
Example 2.12. Consider the MC having states 0, 1, 2, 3, 4 and
\[
P = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 \\
1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 \\
1/4 & 1/4 & 0 & 0 & 1/2
\end{pmatrix}.
\]
Determine the recurrent states.

Solution. This chain consists of the three classes {0, 1}, {2, 3} and {4}. We are going to show that the first two classes are recurrent and the third transient.
For state 4, by the Chapman-Kolmogorov equations,
\[
P^n_{44} = \sum_{k=0}^{4} P_{4k} P^{n-1}_{k4} = P_{44} P^{n-1}_{44} = \frac{1}{2} P^{n-1}_{44} = \Big(\frac{1}{2}\Big)^n
\]
(the other terms vanish since state 4 is not accessible from states 0, 1, 2, 3). Then state 4 is transient since \sum_{n=1}^{\infty} P^n_{44} < \infty.
For states 0 and 1, by the Chapman-Kolmogorov equations,
\[
P^n_{00} = P^{n-1}_{00} P_{00} + P^{n-1}_{01} P_{10} = \frac{1}{2}\big(P^{n-1}_{00} + P^{n-1}_{01}\big),
\qquad
P^n_{01} = P^{n-1}_{00} P_{01} + P^{n-1}_{01} P_{11} = \frac{1}{2}\big(P^{n-1}_{00} + P^{n-1}_{01}\big).
\]
From these equations, we know that P^n_{00} = P^n_{01}. Therefore, P^n_{00} = P^{n-1}_{00} = \cdots = P^1_{00} = 1/2. Hence, states 0 and 1 are recurrent, since \sum_{n=1}^{\infty} P^n_{00} = \infty. Similarly, we can show that states 2 and 3 are recurrent. □
Example 2.13. (A Random Walk) Consider an MC whose state space consists of the integers i = 0, ±1, ±2, . . . and whose transition probabilities are given by
\[
P_{i,i+1} = p = 1 - P_{i,i-1}, \qquad i = 0, \pm 1, \pm 2, \ldots,
\]
where 0 < p < 1.

Solution. Clearly, all states communicate, and hence they are either all recurrent or all transient. Let us consider state 0 and determine whether \sum_{n=1}^{\infty} P^n_{00} is finite or infinite.
To go from state 0 back to state 0, the number of plays must be even. So we have
\[
P^{2n-1}_{00} = 0, \qquad n = 1, 2, \ldots.
\]
The chain is back at 0 after 2n plays if and only if we won n of these and lost n of these. Each play results in a win with probability p and a loss with probability 1 − p, so the desired probability is the binomial probability
\[
P^{2n}_{00} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{n!\,n!}\,[p(1-p)]^n, \qquad n = 1, 2, 3, \ldots.
\]
Using Stirling's formula,
\[
n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi}
\]
(for positive a_n and b_n, a_n ∼ b_n if \lim_{n\to\infty} a_n/b_n = 1), we have
\[
P^{2n}_{00} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}.
\]
Hence, \sum_{n=1}^{\infty} P^n_{00} will converge if and only if
\[
\sum_{n=1}^{\infty} \frac{(4p(1-p))^n}{\sqrt{\pi n}}
\]
does (if a_n ∼ b_n, then \sum_n a_n < \infty if and only if \sum_n b_n < \infty). It converges if and only if 4p(1−p) < 1. Note that 4p(1−p) ≤ 1, where equality holds when p = 1/2. Hence, \sum_{n=1}^{\infty} P^n_{00} = \infty if and only if p = 1/2. □

Remark 2.4. When p = 1/2, the chain is recurrent and the process is called a symmetric random walk. When p ≠ 1/2, the chain is transient.
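The dichotomy between 4p(1−p) = 1 and 4p(1−p) < 1 can also be seen numerically. The sketch below (our own illustration, not part of the text) computes partial sums of P^{2n}_{00} from the exact binomial formula; for p = 0.5 the partial sums keep growing, while for p = 0.6 they quickly stabilize:

def partial_sum(p, N):
    """Sum of P^{2n}_{00} = C(2n, n) [p(1-p)]^n for n = 1, ..., N."""
    total = 0.0
    term = 1.0  # starts at P^0_{00} = 1, updated recursively to P^{2n}_{00}
    for n in range(1, N + 1):
        # C(2n, n) = C(2n-2, n-1) * (2n)(2n-1) / n^2
        term *= (2 * n) * (2 * n - 1) / (n * n) * p * (1 - p)
        total += term
    return total

for p in (0.5, 0.6):
    for N in (10, 100, 1000, 10000):
        print(p, N, round(partial_sum(p, N), 4))
# For p = 0.5 the sums grow without bound (roughly like sqrt(N));
# for p = 0.6 they converge to a finite limit.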

2.5 Limiting probabilities

This section aims to study the limiting behavior of P^n_{ij} as n → ∞. To this end, some preliminary concepts are needed.

Definition 2.7. (Periodicity) For each state i of an MC,
\[
d(i) = \text{period of state } i = \gcd\{n \in \{1, 2, \ldots\} : P^n_{ii} > 0\},
\]
where gcd stands for the greatest common divisor (the largest positive integer that divides each of the numbers without a remainder).

Remark 2.5. If P^n_{ii} = 0 for all n ∈ {1, 2, . . .}, then define d(i) = ∞.

Example 2.14. Suppose
\[
P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}.
\]
Determine d(i) for i = 0.

Solution. d(i) = 2 for all i. This is left as an exercise. □

The following result implies that periodicity is a class property.

Property 2.11. If state i has period d, and states i and j communicate, then state j also has period d (i.e., d(i) = d(j)).

Proof. Since i and j communicate, there exist positive integers s and k such that P^s_{ij} > 0 and P^k_{ji} > 0. Then P^{s+k}_{ii} \geq P^s_{ij} P^k_{ji} > 0, indicating that d(i) | (s + k). Suppose that P^n_{jj} > 0. Then P^{s+k+n}_{ii} \geq P^s_{ij} P^n_{jj} P^k_{ji} > 0, indicating that d(i) | (s + k + n). Hence, d(i) | n whenever P^n_{jj} > 0. By the definition of d(j), we must have d(i) | d(j). Similarly, we can obtain that d(j) | d(i). This completes the proof. □

Definition 2.8. An MC is aperiodic if d(i) = 1 for all states i.

Remark 2.6. If P_{ii} > 0 for all i, then the MC is aperiodic.

Define
\[
R_i = \min\{n \in \{1, 2, \ldots\} : X_n = i\},
\]
where state i is recurrent. Clearly, R_i is the time of the first return to i. Then it is easy to see that
\[
P(R_i = k \mid X_0 = i) = f_{ii}^{(k)}, \qquad k = 1, 2, \ldots,
\]
and \sum_{k=1}^{\infty} f_{ii}^{(k)} = 1 by the definition of recurrence.
Also, define
\[
m_i = E[R_i \mid X_0 = i] = \sum_{k=1}^{\infty} k f_{ii}^{(k)},
\]
which is called the "mean recurrence time of state i" (the expected time until the process returns to state i).

Definition 2.9. (Positive recurrence vs. Null recurrence)

• State i is said to be positive recurrent if m_i < ∞.
• State i is said to be null recurrent if m_i = ∞.

Remark 2.7. Note that "state i is recurrent" is equivalent to
\[
P(R_i < \infty \mid X_0 = i) = 1,
\]
while it is not equivalent to
\[
m_i = \sum_{k=1}^{\infty} k\, P(R_i = k \mid X_0 = i) < \infty
\]
(e.g., f_{ii}^{(k)} = P(R_i = k \mid X_0 = i) = 1/[k(k+1)] gives a recurrent state with m_i = ∞). □

The following result implies that both positive recurrence and null recurrence are class properties.

Property 2.12. Suppose that i ↔ j. Then,
(i) i is positive recurrent if and only if j is positive recurrent;
(ii) i is null recurrent if and only if j is null recurrent.

Proof. Omitted. □

Definition 2.10. Positive recurrent and aperiodic states are called ergodic.

In general (see Fig. 2.1): (a) the states of an MC may be divided into two sets (one of which may be empty); one set is composed of all the recurrent states, and the other contains all the transient states. (b) The recurrent states may be decomposed uniquely into closed sets; within each closed set all states inter-communicate and they are all of the same type and period, while between any two distinct closed sets no communication is possible.
From now on, we say an MC is positive recurrent/null recurrent/transient if all states in the MC are positive recurrent/null recurrent/transient.
Before giving our main theorem, we introduce the definition of a stationary distribution. Let e be the column vector of ones.

Definition 2.11. {p_i}_{i=0}^{\infty} is called a stationary distribution if it satisfies
\[
p = pP \quad \text{and} \quad pe = 1,
\]
where p = (p_0, p_1, . . .) with p_i ≥ 0 for all i.


If we let α_{0,i} = P(X_0 = i) = p_i for all states i, then
\[
\alpha_{1,i} = P(X_1 = i) = \sum_{j=0}^{\infty} P(X_1 = i \mid X_0 = j)\, P(X_0 = j) = \sum_{j=0}^{\infty} p_j P_{ji},
\]
which equals p_i (by construction). Hence, X_1 has the same distribution as X_0. By induction, we have that
\[
\{p_i\}_{i=0}^{\infty} \sim X_0 \sim X_1 \sim X_2 \sim \cdots.
\]
In other words, if an MC is started according to a stationary distribution, then the MC follows this distribution at all future points of time (i.e., this MC is stationary).

[Fig. 2.1: Decomposition of all states of an MC into transience, positive recurrence, and null recurrence.]

The question now is whether the MC can be stationary. The following theorem answers this question.
Theorem 2.4. (Basic Limit Theorem) Consider an irreducible and aperiodic MC. Exactly one of the following scenarios happens:
(i) All states are null recurrent (or transient); in this case,
\[
\pi_j = \lim_{n\to\infty} P^n_{ij} = \lim_{n\to\infty} P^n_{jj} = 0
\]
for all i and j, and no stationary distribution exists.
(ii) All states are positive recurrent; in this case,
\[
\pi_j = \lim_{n\to\infty} P^n_{ij} = \lim_{n\to\infty} P^n_{jj} = \frac{1}{m_j} > 0
\]
for all states j, and {π_j} is the unique stationary distribution of this MC.

Proof. Omitted. □
Remark 2.8. Some intuition can be given as to why these π_j's constitute a stationary distribution. Define π = (π_0, π_1, . . .). Recall the Chapman-Kolmogorov equations: P^n_{ij} = \sum_{k=0}^{\infty} P^{n-1}_{ik} P_{kj}. Letting n → ∞ on both sides yields
\[
\pi_j = \sum_{k=0}^{\infty} \pi_k P_{kj} \iff \pi = \pi P.
\]
Meanwhile, note that 1 = \sum_{k=0}^{\infty} P^n_{ik}. Letting n → ∞ on both sides yields
\[
1 = \sum_{k=0}^{\infty} \pi_k \iff \pi e = 1.
\]
In particular, for case (ii), we usually say that the MC is ergodic. □
The primary interpretation of π is as the limiting distribution: after the process has been in operation for a long duration, the probability of finding the process in state j is π_j (irrespective of the starting state i).
However, π_j can also represent the "long-run mean fraction of time" that the process is in state j (irrespective of the starting state i). The explanation is as follows.
Note that the fraction of time that the MC visits state j during the time interval from time 0 to time n − 1 is
\[
\frac{1}{n} \sum_{k=0}^{n-1} I\{X_k = j\}.
\]
Thus, the mean fraction of time the MC visits state j during the interval from 0 to n − 1, given that X_0 = i, is
\[
E\Big[\frac{1}{n}\sum_{k=0}^{n-1} I\{X_k = j\} \,\Big|\, X_0 = i\Big] = \frac{1}{n}\sum_{k=0}^{n-1} P^k_{ij}.
\]
Recall that if a_n → a, then (1/n)\sum_{k=0}^{n-1} a_k → a as n → ∞. By the Basic Limit Theorem,
\[
\lim_{k\to\infty} P^k_{ij} = \pi_j,
\]
and thus
\[
\frac{1}{n}\sum_{k=0}^{n-1} P^k_{ij} \to \pi_j \quad \text{as } n \to \infty.
\]
Thus, as n → ∞, the mean fraction of time the MC visits j from time 0 to time n − 1, given X_0 = i, is π_j.
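The "long-run mean fraction of time" interpretation can be checked by simulation. The following sketch (our own illustration, using Gary's mood chain of Example 2.3) simulates a long trajectory and compares the empirical fraction of time spent in each state with the stationary probabilities (21/62, 23/62, 18/62) obtained in Example 2.16 below:

import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of the mood chain (Example 2.3): states 0 = C, 1 = S, 2 = G.
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

n_steps = 200_000
state = 0                     # arbitrary starting state
counts = np.zeros(3)
for _ in range(n_steps):
    counts[state] += 1
    state = rng.choice(3, p=P[state])

print(counts / n_steps)       # should be close to (0.339, 0.371, 0.290)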
Property 2.13. In a finite-state MC, all recurrent states are positive recurrent.
Proof. Left as an exercise. □

Example 2.15. Re-consider Example 2.1, in which we assume that if it rains today, then it will rain tomorrow with probability α; and if it does not rain today, then it will rain tomorrow with probability β. If we say that the state is 0 when it rains and 1 when it does not rain, the limiting probabilities π_0 and π_1 are given by
\[
\begin{aligned}
\pi_0 &= \alpha\pi_0 + \beta\pi_1, \\
\pi_1 &= (1-\alpha)\pi_0 + (1-\beta)\pi_1, \\
1 &= \pi_0 + \pi_1,
\end{aligned}
\]
which yields
\[
\pi_0 = \frac{\beta}{1+\beta-\alpha}, \qquad \pi_1 = \frac{1-\alpha}{1+\beta-\alpha}.
\]
For example, if α = 0.7 and β = 0.4, then the limiting probability of rain is π_0 = 4/7 ≈ 0.571. □
Example 2.16. Re-consider Example 2.3, in which the mood of an individual is modeled as a three-state MC having transition probability matrix
\[
P = \begin{pmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}.
\]
In the long run, what proportion of time is the process in each of the three states?

Solution. The limiting probabilities π_i, i = 0, 1, 2, are obtained by solving the following set of equations:
\[
\begin{aligned}
\pi_0 &= 0.5\pi_0 + 0.3\pi_1 + 0.2\pi_2, \\
\pi_1 &= 0.4\pi_0 + 0.4\pi_1 + 0.3\pi_2, \\
\pi_2 &= 0.1\pi_0 + 0.3\pi_1 + 0.5\pi_2, \\
1 &= \pi_0 + \pi_1 + \pi_2.
\end{aligned}
\]
Solving yields
\[
\pi_0 = \frac{21}{62} \approx 33.9\%, \qquad \pi_1 = \frac{23}{62} \approx 37.1\%, \qquad \pi_2 = \frac{18}{62} \approx 29.0\%. \qquad \square
\]
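The linear system above can also be solved numerically by combining the balance equations π = πP with the normalization constraint. A minimal NumPy sketch (ours, not part of the text):

import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Solve pi (I - P) = 0 together with sum(pi) = 1:
# stack the transposed balance equations with a row of ones.
A = np.vstack([(np.eye(3) - P).T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)   # approximately [0.3387, 0.3710, 0.2903] = (21/62, 23/62, 18/62)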
Example 2.17. (A Model of Class Mobility) A problem of interest to sociologists is
to determine the proportion of society that has an upper- or lower-class occupation.
One possible mathematical model would be to assume that transitions between so-
cial classes of the successive generations in a family can be regarded as transitions
of an MC. That is, we assume that the occupation of a child depends only on his or
her parent’s occupation. Let us suppose that such a model is appropriate and that the
transition probability matrix is given by
\[
P = \begin{pmatrix} 0.45 & 0.48 & 0.07 \\ 0.05 & 0.70 & 0.25 \\ 0.01 & 0.50 & 0.49 \end{pmatrix}.
\]
That is, for instance, we suppose that the child of a middle-class worker will attain an upper-, middle-, or lower-class occupation with respective probabilities 0.05, 0.70, 0.25.
The limiting probabilities π_i thus satisfy
\[
\begin{aligned}
\pi_0 &= 0.45\pi_0 + 0.05\pi_1 + 0.01\pi_2, \\
\pi_1 &= 0.48\pi_0 + 0.70\pi_1 + 0.50\pi_2, \\
\pi_2 &= 0.07\pi_0 + 0.25\pi_1 + 0.49\pi_2, \\
1 &= \pi_0 + \pi_1 + \pi_2.
\end{aligned}
\]
Hence,
\[
\pi_0 = 0.07, \qquad \pi_1 = 0.62, \qquad \pi_2 = 0.31.
\]
In other words, a society in which social mobility between classes can be described by an MC with the transition probability matrix above has, in the long run, 7 percent of its people in upper-class jobs, 62 percent in middle-class jobs, and 31 percent in lower-class jobs. □
Example 2.18. Consider a gambler who at each play of the game has probability p of winning one unit and probability q = 1 − p of losing one unit. Assuming that successive plays of the game are independent, what is the probability that, starting with i units, the gambler's fortune will reach N before reaching 0?

Solution. If we let X_n denote the player's fortune at time n, then the process {X_n : n = 0, 1, 2, . . .} is an MC with transition probabilities
\[
P_{00} = P_{NN} = 1, \qquad P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, 2, \ldots, N-1.
\]
This MC has three classes, namely {0}, {1, 2, . . . , N − 1}, and {N}; the first and third classes are recurrent and the second is transient. Since each transient state is visited only finitely often, it follows that, after some finite amount of time, the gambler will either attain his goal of N or go broke.
Let P_i, i = 0, 1, . . . , N, denote the probability that, starting with i, the gambler's fortune will eventually reach N. By conditioning on the outcome of the initial play of the game we obtain
\[
P_i = p P_{i+1} + q P_{i-1}, \qquad i = 1, 2, \ldots, N-1,
\]
or equivalently, since p + q = 1,
\[
p P_i + q P_i = p P_{i+1} + q P_{i-1},
\]
or
\[
P_{i+1} - P_i = \frac{q}{p}\,(P_i - P_{i-1}), \qquad i = 1, 2, \ldots, N-1.
\]
Hence, since P_0 = 0, we obtain from the preceding line that
\[
\begin{aligned}
P_2 - P_1 &= \frac{q}{p}(P_1 - P_0) = \frac{q}{p} P_1, \\
P_3 - P_2 &= \frac{q}{p}(P_2 - P_1) = \Big(\frac{q}{p}\Big)^2 P_1, \\
&\;\;\vdots \\
P_i - P_{i-1} &= \frac{q}{p}(P_{i-1} - P_{i-2}) = \Big(\frac{q}{p}\Big)^{i-1} P_1, \\
&\;\;\vdots \\
P_N - P_{N-1} &= \frac{q}{p}(P_{N-1} - P_{N-2}) = \Big(\frac{q}{p}\Big)^{N-1} P_1.
\end{aligned}
\]
Adding the first i − 1 of these equations yields
\[
P_i - P_1 = P_1\Big[\frac{q}{p} + \Big(\frac{q}{p}\Big)^2 + \cdots + \Big(\frac{q}{p}\Big)^{i-1}\Big],
\]
or
\[
P_i =
\begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)}\, P_1 & \text{if } \dfrac{q}{p} \neq 1, \\[2ex]
i P_1 & \text{if } \dfrac{q}{p} = 1.
\end{cases}
\]
Now, using the fact that P_N = 1, we obtain
\[
P_1 =
\begin{cases}
\dfrac{1 - (q/p)}{1 - (q/p)^N} & \text{if } p \neq \tfrac12, \\[2ex]
\dfrac{1}{N} & \text{if } p = \tfrac12,
\end{cases}
\]
and hence
\[
P_i =
\begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \neq \tfrac12, \\[2ex]
\dfrac{i}{N} & \text{if } p = \tfrac12.
\end{cases}
\]
(Remark:) As N → ∞,
\[
P_i \to
\begin{cases}
1 - \Big(\dfrac{q}{p}\Big)^i & \text{if } p > \tfrac12, \\[1ex]
0 & \text{if } p \leq \tfrac12.
\end{cases}
\]
Thus, if p > 1/2, there is a positive probability that the gambler's fortune will increase indefinitely; while if p ≤ 1/2, the gambler will, with probability 1, go broke against an infinitely rich adversary. □
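The closed-form expression for P_i is straightforward to evaluate. Below is a small sketch (the helper function is ours, not part of the text) that also checks the boundary cases P_0 = 0 and P_N = 1:

def ruin_win_prob(i, N, p):
    """Probability that a gambler starting with i units reaches N before 0,
    when each play is won with probability p (Example 2.18)."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p                    # r = q/p
    return (1 - r**i) / (1 - r**N)

# Setting of Example 2.19: p = 0.4, N = 7, starting fortune 3.
print(ruin_win_prob(3, 7, 0.4))                              # about 0.1476
print(ruin_win_prob(0, 7, 0.4), ruin_win_prob(7, 7, 0.4))    # 0.0 and 1.0
print(ruin_win_prob(3, 7, 0.5))                              # 3/7 for the fair game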

2.6 Mean time spent in transient states

Consider the MC whose P is given by
\[
P = \begin{pmatrix} 1 & 0 & 0 \\ \alpha & \beta & \gamma \\ 0 & 0 & 1 \end{pmatrix},
\]
where α, β, γ ≥ 0 and α + β + γ = 1. Note that state 1 is transient and states 0 and 2 are both absorbing (and positive recurrent).
More generally, consider a finite-state MC with states
\[
0, 1, 2, \ldots, M-1, M, M+1, \ldots, N.
\]
For all states i and j ∈ {0, 1, . . . , M − 1}, P^n_{ij} → 0 as n → ∞, and for i ∈ {M, M + 1, . . . , N}, P_{ii} = 1. As a result, the transition matrix of this MC can be partitioned as
\[
P = \begin{pmatrix} P_{tr} & R \\ 0 & I \end{pmatrix}.
\]
Let
\[
T = \min\{n \in \{1, 2, \ldots\} : M \leq X_n \leq N\}
\]
denote the "stopping time" (or time of absorption), a random variable.
Consider now a finite state Markov chain and suppose that the states are numbered so that {0, 1, 2, . . . , M − 1} denotes the set of transient states. Let P_{tr} be the matrix of transition probabilities from transient states into transient states, i.e.,
\[
P_{tr} = \begin{pmatrix}
P_{00} & P_{01} & \cdots & P_{0,M-1} \\
\vdots & \vdots & \ddots & \vdots \\
P_{M-1,0} & P_{M-1,1} & \cdots & P_{M-1,M-1}
\end{pmatrix}.
\]

Let s_{ik} denote the expected number of visits to state k (i.e., the expected number of time periods that the MC is in state k) prior to absorption, given that it starts in state i (i.e., X_0 = i), for 0 ≤ i ≤ M − 1. Then,
\begin{align*}
s_{ik} &= E\Big[\sum_{n=0}^{T-1} I\{X_n = k\} \,\Big|\, X_0 = i\Big] \\
&= E\big[I\{X_0 = k\} \mid X_0 = i\big] + E\Big[\sum_{n=1}^{T-1} I\{X_n = k\} \,\Big|\, X_0 = i\Big] \\
&= E\big[I\{X_0 = k\} \mid X_0 = i\big] + E\Big[\sum_{n=1}^{T} I\{X_n = k\} \,\Big|\, X_0 = i\Big]
\end{align*}
(the last equality holds since X_T is an absorbing state, so I\{X_T = k\} = 0 for a transient state k).
Define the Kronecker delta function:
\[
\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\]
Using δ_{ij}, E[I{X_0 = k} | X_0 = i] = δ_{ik}. Moreover,
\begin{align*}
E\Big[\sum_{n=1}^{T} I\{X_n = k\} \,\Big|\, X_0 = i\Big]
&= \sum_{j=0}^{M-1} E\Big[\sum_{n=1}^{T} I\{X_n = k\} \,\Big|\, X_1 = j, X_0 = i\Big]\, P(X_1 = j \mid X_0 = i) \\
&= \sum_{j=0}^{M-1} P_{ij}\, E\Big[\sum_{n=1}^{T} I\{X_n = k\} \,\Big|\, X_1 = j\Big] \\
&= \sum_{j=0}^{M-1} P_{ij}\, E\Big[\sum_{n=0}^{T-1} I\{X_n = k\} \,\Big|\, X_0 = j\Big] \\
&= \sum_{j=0}^{M-1} P_{ij}\, s_{jk}.
\end{align*}
Therefore, we have obtained that
\[
s_{ik} = \delta_{ik} + \sum_{j=0}^{M-1} P_{ij} s_{jk}.
\]

Let S denote the matrix of values s_{ij}, i, j = 0, . . . , M − 1, i.e.,
\[
S = \begin{pmatrix}
s_{00} & s_{01} & \cdots & s_{0,M-1} \\
\vdots & \vdots & \ddots & \vdots \\
s_{M-1,0} & s_{M-1,1} & \cdots & s_{M-1,M-1}
\end{pmatrix}.
\]
In matrix form, the preceding relation states that
\[
S = I + P_{tr} S
\iff (I - P_{tr})\, S = I
\iff S = (I - P_{tr})^{-1}.
\]

As an application, we can use S to calculate f_{ij}, where
\[
f_{ij} = \sum_{n=1}^{\infty} f_{ij}^{(n)}
\]
with
\[
f_{ij}^{(n)} = P(X_n = j, X_{n-1} \neq j, \ldots, X_1 \neq j \mid X_0 = i).
\]
That is, f_{ij} is the probability that the MC ever visits state j starting in state i.
Note that
\begin{align*}
s_{ij} &= E[\text{time in } j \mid \text{start in } i,\ \text{ever transit to } j]\, f_{ij} + E[\text{time in } j \mid \text{start in } i,\ \text{never transit to } j]\,(1 - f_{ij}) \\
&= (\delta_{ij} + s_{jj})\, f_{ij} + \delta_{ij}(1 - f_{ij}) \\
&= \delta_{ij} + f_{ij} s_{jj}.
\end{align*}
Solving the preceding yields
\[
f_{ij} = \frac{s_{ij} - \delta_{ij}}{s_{jj}} \qquad \text{for } i, j \in \{0, 1, \ldots, M-1\}.
\]

Example 2.19. Consider the gambler's ruin problem with p = 0.4 and N = 7. Starting with 3 units, determine (a) the expected amount of time the gambler has 5 units; (b) the expected amount of time the gambler has 2 units; (c) the probability that the gambler ever has a fortune of 1.

Solution. (a) & (b) The matrix P_{tr}, which specifies P_{ij} for i, j ∈ {1, 2, 3, 4, 5, 6} (rows and columns indexed by the fortunes 1 through 6), is as follows:
\[
P_{tr} = \begin{pmatrix}
0 & 0.4 & 0 & 0 & 0 & 0 \\
0.6 & 0 & 0.4 & 0 & 0 & 0 \\
0 & 0.6 & 0 & 0.4 & 0 & 0 \\
0 & 0 & 0.6 & 0 & 0.4 & 0 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.6 & 0
\end{pmatrix}.
\]
Inverting I − P_{tr} gives
\[
S = (I - P_{tr})^{-1} = \begin{pmatrix}
1.6149 & 1.0248 & 0.6314 & 0.3691 & 0.1943 & 0.0777 \\
1.5372 & 2.5619 & 1.5784 & 0.9228 & 0.4857 & 0.1943 \\
1.4206 & 2.3677 & 2.9990 & 1.7533 & 0.9228 & 0.3691 \\
1.2458 & 2.0763 & 2.6299 & 2.9990 & 1.5784 & 0.6314 \\
0.9835 & 1.6391 & 2.0763 & 2.3677 & 2.5619 & 1.0248 \\
0.5901 & 0.9835 & 1.2458 & 1.4206 & 1.5372 & 1.6149
\end{pmatrix}.
\]
Hence, s_{3,5} = 0.9228 and s_{3,2} = 2.3677.
(c) Since s_{3,1} = 1.4206 and s_{1,1} = 1.6149,
\[
f_{3,1} = \frac{s_{3,1}}{s_{1,1}} = 0.8797. \qquad \square
\]
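The matrix inversion in Example 2.19 is a one-line computation in NumPy. The following sketch (ours, not part of the text) rebuilds P_tr for p = 0.4 and N = 7 and should reproduce the entries of S used in parts (a)-(c):

import numpy as np

p, N = 0.4, 7
M = N - 1                                 # transient states are the fortunes 1, ..., N-1

# Build P_tr over the transient states {1, ..., 6}; row/column k corresponds to fortune k + 1.
Ptr = np.zeros((M, M))
for i in range(M):
    if i + 1 < M:
        Ptr[i, i + 1] = p                 # win one unit
    if i - 1 >= 0:
        Ptr[i, i - 1] = 1 - p             # lose one unit

S = np.linalg.inv(np.eye(M) - Ptr)        # expected visits before absorption

print(S[3 - 1, 5 - 1])                    # s_{3,5}, about 0.9228
print(S[3 - 1, 2 - 1])                    # s_{3,2}, about 2.3677
print(S[3 - 1, 1 - 1] / S[1 - 1, 1 - 1])  # f_{3,1}, about 0.8797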

2.7 PageRank

PageRank was developed at Stanford University by Larry Page and Sergey Brin
in 1996 as part of a research project about a new kind of search engine. The first
paper about the project, describing PageRank and the initial prototype of the Google
search engine, was published in 1998; shortly after, Page and Brin founded Google
Inc., the company behind the Google search engine.
PageRank is an algorithm which aims to rank the pages. This algorithm is based
on the idea that “A page is ranked higher as there are more links to it”.
We use a simple illustrative example to introduce the idea. Suppose that there are five different pages, with the network relationship shown in Fig. 2.2.
[Fig. 2.2: Network of five pages, ID 1 through ID 5.]

Let X_t be the page which the person is visiting at time t. It is reasonable to assume that {X_t; t = 1, 2, . . .} is an MC with transition matrix
\[
P = \begin{pmatrix}
0 & 0 & 1/2 & 1/2 & 0 \\
1/2 & 0 & 0 & 1/2 & 0 \\
1/2 & 0 & 0 & 0 & 1/2 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
Suppose that in the beginning the person is equally likely to visit each page. Hence, the unconditional distribution of the state at time 1 is
\[
v_1 = (1/5, 1/5, 1/5, 1/5, 1/5).
\]
Next, we calculate v_t, the unconditional distribution of the state at time t, by
\[
v_t = v_{t-1} P = v_1 P^{(t-1)}.
\]
If this MC is irreducible, aperiodic and positive recurrent, then by the Basic Limit Theorem,
\[
\lim_{t\to\infty} v_t = \lim_{t\to\infty} v_1 P^{(t-1)} = v_1 \begin{pmatrix}
\pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \\
\pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \\
\pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \\
\pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5 \\
\pi_1 & \pi_2 & \pi_3 & \pi_4 & \pi_5
\end{pmatrix} = (\pi_1, \pi_2, \pi_3, \pi_4, \pi_5) = \pi.
\]
Note that π_i is the "long-run mean fraction of time" that the process spends in state i. Hence, it is reasonable to rank pages according to their corresponding values of π_i.
However, the MC may not be irreducible and aperiodic in practice. For example, our simple illustrative example is not irreducible. To overcome this difficulty, it is reasonable to assume that at each time t the person also has chance (1 − α) of visiting a page chosen uniformly at random (by typing the address into the location bar). Under this modification,
\[
v_t = \alpha v_{t-1} P + (1 - \alpha) v_1
\]
(usually, we take α = 0.85). Letting t → ∞, it follows that
\[
v = \alpha v P + (1 - \alpha) v_1,
\]
i.e., v = \lim_{t\to\infty} v_t = (1 - \alpha) v_1 (I - \alpha P)^{-1}. Finally, we use v to rank the pages. In our illustrative example, v = (0.4208, 0.0300, 0.2088, 0.2216, 0.1188), and hence the five pages are ranked as follows:
\[
\text{ID 1} > \text{ID 4} > \text{ID 3} > \text{ID 5} > \text{ID 2}.
\]
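The vector v can be computed directly from the closed form v = (1 − α)v_1(I − αP)^{−1}. A minimal NumPy sketch (ours, not part of the text) which should reproduce the ranking above:

import numpy as np

# Transition matrix of the five-page surfing chain (Fig. 2.2).
P = np.array([[0,   0,   0.5, 0.5, 0  ],
              [0.5, 0,   0,   0.5, 0  ],
              [0.5, 0,   0,   0,   0.5],
              [1,   0,   0,   0,   0  ],
              [1,   0,   0,   0,   0  ]])

alpha = 0.85
v1 = np.full(5, 1 / 5)

# Closed form: v = (1 - alpha) * v1 * (I - alpha * P)^{-1}
v = (1 - alpha) * v1 @ np.linalg.inv(np.eye(5) - alpha * P)
print(np.round(v, 4))        # approximately [0.4208 0.03 0.2088 0.2216 0.1188]

# Ranking: larger v means higher rank (page ID = index + 1).
print(np.argsort(-v) + 1)    # [1 4 3 5 2]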

One main disadvantage of PageRank is that it favors older pages. A new page,
even a very good one, will not have many links unless it is part of an existing site.
This calls for some new algorithms.
