Report
Bachelor's Degree in Mathematics
Facultat de Matemàtiques
Universitat de Barcelona
MARKOV CHAINS
Acknowledgements
During the last months, the guidance and support of Dr. David Márquez, the director of this thesis, have been extremely important. My most sincere thanks for his time and for the confidence he has shown in me. I would also like to acknowledge the support provided by my family and friends, especially in the most difficult moments. Their unconditional support has made this project possible.
Contents
1 Introduction
2 Discrete time Markov chains
3 Chapman-Kolmogorov equation
4 Classification of states
4.1 Basic concepts
4.2 Classification of states
4.3 Periodicity and cyclic classes
4.4 Example: an analysis of the random walk on Z
5 Hitting times
6 Distribution and measure
7 Ergodic theory
7.1 Ergodic theorem
7.2 Finite state space
8 Conclusions
B Examples
1 Introduction
In the field of probability theory we find the concept of a stochastic process: a collection of random variables that describes the evolution of a random system over time.
One of the simplest cases of a stochastic process is that in which each outcome depends only on the immediately preceding one. This type of process is known as a Markov chain and is the subject of this bachelor's degree final thesis.
The defining property of a Markov chain, the Markov property, is that the values it can take in the future depend only on its present value and not on past values. For this reason Markov chains are said to have no memory: the path followed up to the present does not affect the future path.
These chains take their name from the Russian mathematician Andrey Andreyevich Markov, who devoted much of his work to stochastic processes and was also an active participant in the political and social events that took place in Europe at the beginning of the 20th century.
Markov chains have many useful applications in research. Although their applications are numerous, they are particularly important in chemistry, medicine, biology and physics, perhaps the four most important fields of science. A simple example of these applications is the Ehrenfest chain, which models the exchange of gas molecules between two bodies. However, this project does not focus on the practical applications of Markov chains; instead, we carry out a theoretical study of discrete time Markov chains.
During the bachelor's degree we take some courses in probability and statistics. However, the time devoted to these topics is limited, so I found it interesting to focus this thesis on this field. After considering several options, I chose Markov chains because their lack of memory caught my attention.
Once several useful concepts have been introduced in the first chapters, in chapter 4 we focus on the behaviour of the individual states of a Markov chain. In particular, we analyze the possibility of returning to each state.
Having studied the possible return to the original state, in chapter 5 we again focus on reaching given states, but now the target is a subset of states of the Markov chain in which we may not initially be.
Finally, the last two chapters are strongly related. In them we study the long term behaviour of a Markov chain and the invariant distributions of its stochastic matrix. In chapter 6 we see some applications of these distributions, and in chapter 7 we prove the uniqueness of the invariant distribution.
2 Discrete time Markov chains
We already know that Markov chains play an important role within probability theory. In this chapter we focus on discrete time Markov chains, in particular on homogeneous chains, whose transition probabilities do not depend on time. These chains accompany us throughout the chapter, where we study an important result that allows us to identify which chains satisfy the homogeneity property. At the end of the chapter we also include some basic examples of homogeneous Markov chains.
To begin, we give some basic definitions of Markov theory which will be of great help in obtaining the results of later chapters.
We are going to study the discrete case, that is to say T = Z+ = {0, 1, 2, ...}; as
a result we have the process defined as {Xt ; t ≥ 0}. Moreover, we will focus on the
stochastic process known as Markov chain. This process has no memory because
its future behaviour is only affected by the present one.
The element p_{i,j} is called the transition probability: it is the probability that at time k the process is in state j, given that at time k − 1 it was in state i.
Now we will see, with the help of examples, that there is a strong relationship
between diagrams and the stochastic matrix described above.
Example 2.1.3.
1. We start with a general example in order to clearly see that the stochastic
matrix satisfies its own properties. Given a, b ∈ [0, 1], we have the diagram
and the stochastic matrix is
\[ \Pi = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}. \]
2. Now, we are going to see a numerical example, in this case we have the diagram
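Referring back to the general matrix of item 1, the following short Python sketch (an illustration, not part of the original text; the values a = 0.3 and b = 0.6 are arbitrary) builds the 2x2 stochastic matrix and checks the defining property that every row sums to 1.

import numpy as np

def two_state_matrix(a, b):
    """Return the 2x2 stochastic matrix of the two-state chain above."""
    return np.array([[1 - a, a],
                     [b, 1 - b]])

Pi = two_state_matrix(0.3, 0.6)
# every entry lies in [0, 1] and every row sums to 1
assert np.all(Pi >= 0) and np.all(Pi <= 1)
assert np.allclose(Pi.sum(axis=1), 1.0)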
Then, using the concepts defined so far we are going to study two important
issues for the Markov theory.
Definition 2.1.4. Given a stochastic process {Xn ; n ≥ 0} that takes values in the
state space I, we have
So the stochastic process {Xn ; n ≥ 0} is a Markov chain with initial distribution
ν = {νi ; i ∈ I} and transition matrix Π = (pi,j ; i, j ∈ I).
The last equality is known as the Markov property, which tells us that the probability of a future event depends only on the present state and not on the whole evolution of the system. This implies that Markov chains are memoryless processes.
is independent of n.
Now we prove the converse implication. Assuming (2.1), we have to show that Xn is a Markov chain and that it is homogeneous. Firstly, we analyze the initial distribution; in this way, ∀i0 ∈ I we obtain
\[ P(X_0 = i_0) = \sum_{j \in I} \nu_{i_0} p_{i_0,j} = \nu_{i_0} \sum_{j \in I} p_{i_0,j} = \nu_{i_0}. \]
Secondly, we have to check the Markov property; in this case we use conditional probability, so ∀i0, ..., in−1, i, j ∈ I we have
In order to prove the property we are studying, the value of the last expression must be equal to P(Xn+1 = j | Xn = i); we check that this is indeed the case:
\begin{align*}
P(X_{n+1} = j \mid X_n = i) &= \frac{P(X_{n+1} = j, X_n = i)}{P(X_n = i)} \\
&= \frac{\sum_{i_0,\dots,i_{n-1}\in I} P(X_0 = i_0, \dots, X_n = i, X_{n+1} = j)}{\sum_{i_0,\dots,i_{n-1}\in I} P(X_0 = i_0, \dots, X_n = i)} \\
&= \frac{\sum_{i_0,\dots,i_{n-1}\in I} \nu_{i_0} p_{i_0,i_1} \cdots p_{i_{n-1},i}\, p_{i,j}}{\sum_{i_0,\dots,i_{n-1}\in I} \nu_{i_0} p_{i_0,i_1} \cdots p_{i_{n-1},i}}
= p_{i,j}\, \frac{\sum_{i_0,\dots,i_{n-1}\in I} \nu_{i_0} p_{i_0,i_1} \cdots p_{i_{n-1},i}}{\sum_{i_0,\dots,i_{n-1}\in I} \nu_{i_0} p_{i_0,i_1} \cdots p_{i_{n-1},i}} = p_{i,j}.
\end{align*}
The result we have just obtained is very important, since it is useful for checking whether a Markov chain is homogeneous or not. We use this result to prove that a homogeneous Markov chain can be shifted in time, obtaining in this way another chain that is also homogeneous.
Proposition 2.1.7. Given {Xn ; n ≥ 0} a HMC(ν, Π), then for every m ≥ 0 the shifted process {Xm+n ; n ≥ 0} is again a HMC(η, Π), where η is the law of Xm; that is to say, ∀n ≥ 0 and ∀j0, ..., jn ∈ I
\[ P(X_m = j_0, X_{m+1} = j_1, \dots, X_{m+n} = j_n) = P(X_m = j_0)\, p_{j_0,j_1} \cdots p_{j_{n-1},j_n}. \]
2.2 Basic examples
In this section we study some basic examples of Markov chains, in particular homogeneous ones. These chains are among the most important examples for getting started with Markov theory.
1. Random walks on Z:
Given the stochastic process {Xn ; n ≥ 0}, we define X0 as the initial position, which is constant (X0 = 0), and the process
\[ X_n = X_0 + \xi_1 + \cdots + \xi_n = X_0 + \sum_{i=1}^{n} \xi_i. \]
Proof. In this case we have to prove the Markov property and the homogeneity property in order to see that Xn is a homogeneous Markov chain. Firstly, let us check the first property using conditional probability, so ∀n ≥ 0 and ∀i0, ..., in−1, i, j ∈ I we have
\begin{align*}
P(X_{n+1} = j \mid X_n = i) &= \frac{P(f(X_n, Z_{n+1}) = j, X_n = i)}{P(X_n = i)} = \frac{P(f(i, Z_{n+1}) = j, X_n = i)}{P(X_n = i)} \\
&= \frac{P(f(i, Z_{n+1}) = j)\, P(X_n = i)}{P(X_n = i)} = P(f(i, Z_{n+1}) = j) := q_{i,j}.
\end{align*}
In order to satisfy the property we are studying, the value of the last expression must be equal to P(Xn+1 = j | X0 = i0, ..., Xn−1 = in−1, Xn = i); we are going to check that this is indeed the case.
With that, we see that given a set of independent and identically distributed random variables we can build different Markov chains satisfying the properties defined at the beginning.
To conclude this example we give the transition matrix, where p and q = 1 − p are the probabilities defined above (p ∈ (0, 1)):
\[
\Pi = \begin{pmatrix}
 & \ddots & & & & & \\
\cdots & 0 & p & 0 & 0 & 0 & \cdots \\
\cdots & q & 0 & p & 0 & 0 & \cdots \\
\cdots & 0 & q & 0 & p & 0 & \cdots \\
\cdots & 0 & 0 & q & 0 & p & \cdots \\
 & & & & & \ddots &
\end{pmatrix}
\]
2. Random walk on Z with absorbing barriers:
Consider two players A and B who play heads or tails, with capitals a and b respectively. The game ends when one of the players runs out of money.
The game consists of tossing a coin. Player A wins 1 coin when the result is heads, which happens with probability p, and loses 1 coin when the result is tails, which happens with probability q = 1 − p.
Let {Xn ; n ≥ 0} be the stochastic process that describes the evolution of the capital of player A. We define X0 = a as the initial condition and the process Xn+1 = Xn + ξn+1, with Xn : Ω −→ {0, 1, ..., a + b}, where {ξi ; i ≥ 1} is a set of independent and identically distributed random variables.
To conclude this example we give its transition matrix, where p and q = 1 − p are the probabilities defined above:
\[
\Pi = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
q & 0 & p & 0 & \cdots & 0 \\
0 & q & 0 & p & \cdots & 0 \\
0 & 0 & q & 0 & \cdots & 0 \\
 & & \ddots & & \ddots & \\
0 & 0 & \cdots & q & 0 & p \\
0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
\]
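As a hedged illustration of the two examples above (not part of the original text; all parameter values are arbitrary), the following Python sketch simulates the free random walk on Z and builds the finite transition matrix with absorbing barriers at 0 and a + b.

import random
import numpy as np

def simulate_walk(p=0.5, n_steps=20, x0=0, seed=1):
    """Simulate X_n = X_0 + xi_1 + ... + xi_n for the random walk on Z."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        step = 1 if rng.random() < p else -1   # xi_i = +1 with prob p, -1 with prob q
        path.append(path[-1] + step)
    return path

def absorbing_walk_matrix(a, b, p):
    """Transition matrix on {0, 1, ..., a+b} with absorbing barriers at 0 and a+b."""
    q = 1 - p
    n = a + b + 1
    Pi = np.zeros((n, n))
    Pi[0, 0] = 1.0           # ruin of player A is absorbing
    Pi[n - 1, n - 1] = 1.0   # ruin of player B is absorbing
    for i in range(1, n - 1):
        Pi[i, i - 1] = q     # lose one coin
        Pi[i, i + 1] = p     # win one coin
    return Pi

print(simulate_walk())
Pi = absorbing_walk_matrix(a=2, b=3, p=0.5)
assert np.allclose(Pi.sum(axis=1), 1.0)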
3 Chapman-Kolmogorov equation
One of the most relevant tools when introducing Markov chains is the Chapman-Kolmogorov equation. In order to study this equation, in this chapter we focus on the simple way in which the probability of moving from a state i to another state j in m steps can be decomposed into the sum of the probabilities of the trajectories that go from state i to state j passing through any other state at an intermediate point of time.
In this chapter, we should take into account a probability space (Ω, F, P ) and a
family of discrete random variables Xn : Ω −→ I where {Xn ; n ≥ 0} is a HMC(ν, Π)
and I is a countable set called state space.
Definition 3.0.1. Given {Xn ; n ≥ 0} a time homogeneous Markov chain, we define the m-step transition probability, with m > 1, as
\[ p^{(m)}_{i,j} = P(X_m = j \mid X_0 = i) = P(X_{n+m} = j \mid X_n = i), \quad \forall i, j \in I, \]
which is the probability of going from i to j in m steps. In this case, the transition matrix is given by Π_m = Π^(m) := (p^(m)_{i,j} ; i, j ∈ I).
Observation 3.0.2. Considering the last definition, for m = 1 we have the transition probability p^(1)_{i,j} = p_{i,j} = P(Xn+1 = j | Xn = i), which goes from i to j in one step. Now, the transition matrix is Π^(1) = Π.
Moreover, we check that the m-step transition probability is well defined; hence we have to calculate p^(2)_{i,j} = P(Xn+2 = j | Xn = i). In order to have Xn+2 = j, Xn+1 must go through some state k:
\begin{align*}
p^{(2)}_{i,j} = P(X_{n+2} = j \mid X_n = i) &= \sum_{k \in I} P(X_{n+2} = j, X_{n+1} = k \mid X_n = i) \\
&= \sum_{k \in I} \frac{P(X_{n+2} = j, X_{n+1} = k, X_n = i)}{P(X_n = i)} \\
&= \sum_{k \in I} \frac{P(X_{n+2} = j, X_{n+1} = k, X_n = i)}{P(X_{n+1} = k, X_n = i)} \cdot \frac{P(X_{n+1} = k, X_n = i)}{P(X_n = i)} \\
&= \sum_{k \in I} P(X_{n+2} = j \mid X_{n+1} = k, X_n = i)\, P(X_{n+1} = k \mid X_n = i) = \sum_{k \in I} p_{i,k}\, p_{k,j},
\end{align*}
where the last result is the i, j-th entry of the transition matrix Π².
After that, doing mathematical induction on the number of steps, we get the following result.
Proposition 3.0.3. The m-step transition matrix is the m-th power of the transition matrix Π, that is to say, Π^(m) = Π^m = Π ⋯ Π (m factors), ∀m > 2.
Proof. To prove this proposition, we just need to prove the equality
\[ p^{(m)}_{i,j} = \sum_{k \in I} p^{(m-1)}_{i,k}\, p_{k,j}, \quad \forall m > 2, \]
which implies that Π^m = Π^{m−1} Π, because of the first definition. Using conditional probability we get
\begin{align*}
p^{(m)}_{i,j} = P(X_{n+m} = j \mid X_n = i) &= \sum_{k \in I} P(X_{n+m} = j, X_{n+m-1} = k \mid X_n = i) \\
&= \sum_{k \in I} \frac{P(X_{n+m} = j, X_{n+m-1} = k, X_n = i)}{P(X_{n+m-1} = k, X_n = i)} \cdot \frac{P(X_{n+m-1} = k, X_n = i)}{P(X_n = i)} \\
&= \sum_{k \in I} P(X_{n+m} = j \mid X_{n+m-1} = k, X_n = i)\, P(X_{n+m-1} = k \mid X_n = i) \\
&= \sum_{k \in I} p^{(m-1)}_{i,k}\, p_{k,j}.
\end{align*}
Now, considering the items defined and the results obtained in this introduction, we are ready to study the Chapman-Kolmogorov equation, which states that p^(m+n)_{i,j} = \sum_{k∈I} p^(m)_{i,k} p^(n)_{k,j} for all m, n ≥ 0. To prove it, for all i, j ∈ I we have
\begin{align*}
p^{(m+n)}_{i,j} = P(X_{m+n} = j \mid X_0 = i) &= \sum_{k \in I} P(X_{m+n} = j, X_m = k \mid X_0 = i) \\
&= \sum_{k \in I} P(X_{m+n} = j \mid X_m = k, X_0 = i)\, P(X_m = k \mid X_0 = i) \\
&= \sum_{k \in I} p^{(m)}_{i,k}\, p^{(n)}_{k,j}.
\end{align*}
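As a quick numerical illustration (not part of the original text; the matrix and the values m = 3, n = 5 are arbitrary), the following Python sketch checks the Chapman-Kolmogorov relation Π^(m+n) = Π^(m) Π^(n) on a small stochastic matrix.

import numpy as np

Pi = np.array([[0.7, 0.3],
               [0.6, 0.4]])   # illustrative two-state stochastic matrix

m, n = 3, 5
lhs = np.linalg.matrix_power(Pi, m + n)                              # Pi^(m+n)
rhs = np.linalg.matrix_power(Pi, m) @ np.linalg.matrix_power(Pi, n)  # Pi^(m) Pi^(n)
assert np.allclose(lhs, rhs)   # p(m+n)_{i,j} = sum_k p(m)_{i,k} p(n)_{k,j}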
Then we study the effect of considering that the initial state of the HMC(ν, Π) is generated randomly, so ∀j ∈ I we can write
\[ P(X_n = j) = \sum_{i \in I} P(X_0 = i, X_n = j) = \sum_{i \in I} P(X_0 = i)\, P(X_n = j \mid X_0 = i), \]
where P(Xn = j | X0 = i) = p^(n)_{i,j} and P(X0 = i) = ν_i. Using this notation, we can rewrite the previous equality as follows
\[ P(X_n = j) = \sum_{i \in I} \nu_i\, p^{(n)}_{i,j}. \]
With this we obtain the product between the initial distribution, viewed as a row vector, and the n-th power of the transition matrix, so we get
\[ \nu^{(n)}_j = P(X_n = j) = \sum_{i \in I} \nu_i\, p^{(n)}_{i,j}, \quad \text{where } \nu^{(n)} = \nu \Pi^n \text{ is the law of } X_n. \]
Also, using Observation 3.0.4, for all j ∈ I we can express the equality as follows
\[ \nu^{(n)}_j = P(X_n = j) = \sum_{i \in I} \nu_i\, p^{(n)}_{i,j} = \sum_{i \in I} \sum_{i_1, \dots, i_{n-1} \in I} \nu_i\, p_{i,i_1} \cdots p_{i_{n-1},j}. \]
To continue, we study how to determine the law of the random vector (X_{n_1}, ..., X_{n_m}) with 0 ≤ n_1 ≤ ⋯ ≤ n_m. For this, it is necessary to use the distribution ν and the transition matrix Π. In general, we use the compound probability to find the law of the vector, so ∀i_1, ..., i_m ∈ I we have
\begin{align*}
P(X_{n_1} = i_1, \dots, X_{n_m} = i_m) &= P(X_{n_1} = i_1)\, P(X_{n_2} = i_2 \mid X_{n_1} = i_1) \cdots P(X_{n_m} = i_m \mid X_{n_1} = i_1, \dots, X_{n_{m-1}} = i_{m-1}) \\
&= \sum_{k \in I} \nu_k\, p^{(n_1)}_{k,i_1}\, p^{(n_2 - n_1)}_{i_1,i_2} \cdots p^{(n_m - n_{m-1})}_{i_{m-1},i_m}.
\end{align*}
With this, the result obtained is the product of the initial distribution and entries of the stochastic matrix raised to given powers, so it has the same structure as the law obtained above for the homogeneous Markov chain Xn.
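The computation ν^(n) = νΠ^n is easy to carry out numerically. The following Python sketch (an illustration, not part of the original text; the matrix and the initial distribution are arbitrary) computes the law of Xn for a small homogeneous chain.

import numpy as np

Pi = np.array([[0.7, 0.3],
               [0.6, 0.4]])      # illustrative stochastic matrix
nu = np.array([0.5, 0.5])        # illustrative initial distribution

def law_of_Xn(nu, Pi, n):
    """Return nu^(n) = nu * Pi^n, the law of X_n for a HMC(nu, Pi)."""
    return nu @ np.linalg.matrix_power(Pi, n)

print(law_of_Xn(nu, Pi, 4))      # a probability vector over the state space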
4 Classification of states
In this chapter, we analyze the way in which we can classify the states that compose a Markov chain. This classification is based on the communication relationship between the different states: we study the possibility of moving from one state to another and then coming back to the initial state, or whether, given a specific state, we can never leave it.
In this section, we should take into account a probability space (Ω, F, P ) and a
family of discrete random variables Xn : Ω −→ I where I is a countable set and
{Xn ; n ≥ 0} is a HMC(ν, Π) where the transition matrix is Π = (pi,j ; i, j ∈ I).
Once given this definition, we use it to introduce the following concept, in which
we talk about the possible relationship between two states.
Definition 4.1.2. The state i communicates with j if each is accessible from the other, in other words, i is accessible from j and j is accessible from i, which we write as i ↔ j.
\[ p^{(m_2 + n_2)}_{l,i} = \sum_{k \in I} p^{(m_2)}_{l,k}\, p^{(n_2)}_{k,i} \ge p^{(m_2)}_{l,j}\, p^{(n_2)}_{j,i} > 0. \]
Therefore, we have that the property ↔ is an equivalence relation which gives a
partition of the state space, I, into equivalence classes, which are called communi-
cation classes. Moreover, we have that two states that communicate are in the same
class. The concept we have just studied helps us to define the following property.
Definition 4.1.3. A Markov chain is irreducible if there is a unique equivalence class, in other words, when every state communicates with every other state.
Now, we define the concept of a closed class, which will be useful later for clas-
sifying states.
Definition 4.1.4. A subset C ⊆ I is a closed class if i ∈ C and i → j implies that j ∈ C, that is to say, from a state of C we can never have access to a state of I \ C (hence, once in C we can never leave it).
Equivalently, once given the last definition, a class C is closed if the elements of the stochastic matrix satisfy
\[ \sum_{j \in C} p_{i,j} = 1, \quad \forall i \in C. \]
\[ T_i = \inf\{n \ge 1;\, X_n = i\}, \]
which is the first time at which the chain visits the state i.
Definition 4.2.3. The probability that a chain which starts in state i passes through j, denoted ρ_{i,j}, is defined as ρ_{i,j} = P_i(T_j < ∞). Moreover, we have the particular case of the probability that a chain which starts in state i returns to its starting point i, which is denoted by ρ_{i,i}.
Now, we are going to introduce a new concept, for which we will need to consider the indicator function
\[ \mathbf{1}_{\{X=k\}} = \begin{cases} 1 & \text{if } X = k \\ 0 & \text{if } X \ne k. \end{cases} \]
Definition 4.2.4. The number of visits that the chain makes to the state j, denoted by N(j), is defined as
\[ N(j) = \sum_{n=1}^{\infty} \mathbf{1}_{\{X_n = j\}}. \]
Now, we are going to compute the probability that a Markov chain which starts
at i visits the state j for the first time in the point of time k and, the next time it
comes back to j, it takes n instants of time, then the probability is
Using the expression of ρ_{i,j}, we have that the probability of visiting the state j at least twice is
\begin{align*}
P_i(N(j) \ge 2) &= \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} P_i(T_j = k)\, P_j(T_j = n) \\
&= \left( \sum_{k=1}^{\infty} P_i(T_j = k) \right) \left( \sum_{n=1}^{\infty} P_j(T_j = n) \right) = \rho_{i,j}\, \rho_{j,j}.
\end{align*}
we calculate the probability that a chain which starts at i visits the state j for the
first time in the point of time l, after this, the chain will visit again the state j k − 1
times.
\[ = P_j(T_j = n) \cdots P_j(T_j = m)\, P_i(T_j = l), \]
where the product contains k − 1 factors of the form P_j(T_j = \cdot).
As we have done previously, once the last equality is computed, we have the following probability
\begin{align*}
P_i(N(j) \ge k) &= \sum_{l=1}^{\infty} \sum_{m=1}^{\infty} \cdots \sum_{n=1}^{\infty} P_i(T_j = l)\, P_j(T_j = m) \cdots P_j(T_j = n) \\
&= \left( \sum_{l=1}^{\infty} P_i(T_j = l) \right) \left( \sum_{m=1}^{\infty} P_j(T_j = m) \right) \cdots \left( \sum_{n=1}^{\infty} P_j(T_j = n) \right) \\
&= \rho_{i,j}\, \rho_{j,j} \cdots \rho_{j,j} = \rho_{i,j}\, \rho_{j,j}^{\,k-1}.
\end{align*}
This result is very important, since it will help us to prove some of the theorems related to the classification of states that we will find later.
Considering the concepts previously defined and given a state i ∈ I, if the probability satisfies ρ_{i,i} = 1, then
\[ P_i(N(i) \ge k) = \rho_{i,i}^{\,k} = 1 \quad \forall k \ge 1, \quad \text{so } P_i(N(i) = \infty) = 1, \]
and hence the state i is recurrent, because we visit this state infinitely many times. Moreover, the probability ρ_{i,i} may also be less than 1, ρ_{i,i} < 1, and in this case we have that
\[ P_i(N(i) = \infty) = \lim_{k \to \infty} \rho_{i,i}^{\,k} = 0, \]
and now the state i is transient, because the number of times we visit this state is finite.
Once studied the previous equalities, we focus again on the number of times that a chain visits a state; in this context, we introduce the calculation of its expected value. Since E_i(\mathbf{1}_{\{X_n = j\}}) = P_i(X_n = j) = p^(n)_{i,j}, the expected number of visits to state j, for a chain that begins in i, is
\[ E_i(N(j)) = \sum_{n=1}^{\infty} p^{(n)}_{i,j}. \]
Proof. Firstly, we consider the case in which the probability ρ_{j,j} is less than 1, so ρ_{j,j} < 1, and using the property P_i(N(j) ≥ k) = ρ_{i,j} ρ_{j,j}^{k−1} we have
Secondly, suppose that ρ_{j,j} = 1; using the property P_i(N(j) ≥ k) = ρ_{i,j} ρ_{j,j}^{k−1} we get
Once given the prior theorem, the opposite implication is also true. This means
that given the expected value of the number of times a chain visits one state, we
are able to know if this state is recurrent or transient.
To continue, we are going to study some important properties related with re-
current and transient states, which are necessary, in practice, to classify the states
of all classes.
Proof. We begin by assuming that the property j → i does not hold, so we have j ↛ i. This means that once the chain has reached the state j it will never be able to return to the state i in which it was initially located. Hence, once we have left the state i we cannot come back again, so the state i is not recurrent but transient, which is a contradiction.
By the last proposition, we can check that given two states i, j, the state i is not recurrent if once we have left it we are not able to come back, that is, i → j but j ↛ i.
Proposition 4.2.7. Consider two states i, j; if i is recurrent and i → j, then j is also recurrent.
the last equality is due to the fact that i is a recurrent state, so now we have that
j is also recurrent.
Corollary 4.2.8. Given a communication class C, all its elements are of the same type: either all recurrent or all transient.
Corollary 4.2.9. Given a class, C, which is finite, irreducible and also closed, then
all its states are recurrent.
Proof. Firstly, the class C is finite and closed, which implies that it contains at least one recurrent state. The class C is also irreducible, which means that all its states communicate so, considering the results studied before, all of them have to be recurrent.
Now, we introduce an example in which we analyze all the concepts we have just
studied.
Example 4.2.10. Consider the Markov chain with state space I = {1, 2, 3, 4, 5}
and transition matrix
\[
\Pi = \begin{pmatrix}
\tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{4} & 0 & \tfrac{3}{4} & 0 \\
0 & 0 & \tfrac{1}{3} & 0 & \tfrac{2}{3} \\
\tfrac{1}{4} & \tfrac{1}{2} & 0 & \tfrac{1}{4} & 0 \\
\tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 & \tfrac{1}{3}
\end{pmatrix}
\]
In this case there are two communication classes, which are C1 = {2, 4} and C2 = {1, 3, 5}.
We begin studying the states of the class C1 . In this case, the state 4 is transient
because if we start in this state, we are able to move to state 1 and once there
we cannot return back to state 4. We know that if a state is transient, the rest
of the states within the same class are transient too. Hence, the whole class C1 is
transient.
Finally, we study the defining characteristics of the class C2. This class is closed: we can move from the state 4 ∈ C1 to the state 1 ∈ C2, but once we reach the class C2 we are not able to leave it. Furthermore, this class is finite and irreducible, since all its states communicate; then, based on Corollary 4.2.9, C2 is a recurrent class.
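The communication classes of a finite chain can also be found mechanically from the transition graph. The following Python sketch (an illustration, not part of the original text; states are relabelled 0..4 instead of 1..5) computes them for the matrix of this example by a reachability (transitive closure) argument.

import numpy as np

def communication_classes(Pi):
    """Group states into communication classes using reachability in the transition graph."""
    n = len(Pi)
    reach = (np.array(Pi) > 0)
    for k in range(n):                     # transitive closure, Floyd-Warshall style
        reach = reach | (reach[:, [k]] & reach[[k], :])
    reach = reach | np.eye(n, dtype=bool)  # every state reaches itself in 0 steps
    classes = []
    for i in range(n):
        cls = {j for j in range(n) if reach[i, j] and reach[j, i]}
        if cls not in classes:
            classes.append(cls)
    return classes

Pi = [[1/2, 0, 1/2, 0, 0],
      [0, 1/4, 0, 3/4, 0],
      [0, 0, 1/3, 0, 2/3],
      [1/4, 1/2, 0, 1/4, 0],
      [1/3, 0, 1/3, 0, 1/3]]
print(communication_classes(Pi))   # [{0, 2, 4}, {1, 3}], i.e. C2 and C1 above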
Once defined the concepts of recurrent state and transient state, and having
studied some of their properties, we can define new ways to classify the states of a
Markov chain.
Definition 4.2.12. An essential state is one from which every exit allows a return; in other words, a state i is essential if for all j ∈ I such that i → j it is also true that j → i.
Given the prior definition, if the state allows us to leave it without being able to return, then it is called inessential; hence the state i is inessential if there exist j ∈ I and n ≥ 1 such that p^(n)_{i,j} > 0 but p^(m)_{j,i} = 0, ∀m ≥ 1.
Hence, we have seen that a state i ∈ I is periodic if the greatest common divisor
of the number of steps to return to the starting point is greater than one.
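For a finite chain, the period of a state can be approximated numerically as the greatest common divisor of the return lengths observed up to some horizon. The following Python sketch (an illustration, not part of the original text; the horizon n_max and the example matrix are arbitrary) implements this idea.

import numpy as np
from math import gcd
from functools import reduce

def period(Pi, i, n_max=50):
    """gcd of the lengths n <= n_max with p^(n)_{i,i} > 0 (an approximation of the period of i)."""
    Pi = np.array(Pi, dtype=float)
    lengths = []
    P_n = np.eye(len(Pi))
    for n in range(1, n_max + 1):
        P_n = P_n @ Pi
        if P_n[i, i] > 0:
            lengths.append(n)
    return reduce(gcd, lengths) if lengths else 0

# a two-state chain that alternates deterministically has period 2
Pi = [[0, 1],
      [1, 0]]
print(period(Pi, 0))   # 2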
To continue, the following proposition studies the fact that if all states are in
the same class, then they have the same period.
Proposition 4.3.2. Consider two states i, j ∈ I which satisfy the following property
i ↔ j, then the period of the states are the same, d(i)=d(j).
Consequently, the period of a class can be defined as the period of its states.
Now, we consider the congruence equivalence relation, so given m, n ∈ Z we have
m ≡ n(mod d) which means that m − n is divisible by d. For the following results,
we suppose that C forms an essential states class and we fix a reference state i ∈ C
with period d.
To continue, given j ∈ C and considering r ≥ 1 such that p^(r)_{j,i} > 0, if the probabilities satisfy p^(m)_{i,j} > 0 and p^(n)_{i,j} > 0, then we have that p^(m+r)_{i,i} ≥ p^(m)_{i,j} p^(r)_{j,i} > 0 and p^(n+r)_{i,i} ≥ p^(n)_{i,j} p^(r)_{j,i} > 0. Since the period of the state i is d, it divides m + r and also n + r. Consequently, m − n is divisible by d and hence we can rewrite this fact as m ≡ n (mod d). Finally, we define s_j as the remainder when n is divided by d, for any n with p^(n)_{i,j} > 0.
To carry on, we are going to present the cyclic classes. In order to study this concept, we will need the results just obtained about the congruence relation, so, considering h ∈ {0, 1, ..., d − 1}, we define
\[ C_h = \{ j \in C;\ p^{(n)}_{i,j} > 0 \text{ for } s_j \equiv h \ (\mathrm{mod}\ d) \} = \{ j \in C;\ p^{(n)}_{i,j} > 0 \text{ for } n \equiv h \ (\mathrm{mod}\ d) \}, \]
where the last equality holds because congruence for a fixed modulus is an equivalence relation; moreover C_0 = C_d. Once given this, we can express C = \bigcup_{h=0}^{d-1} C_h, so the sets C_0, ..., C_{d−1}, which are disjoint, are called the cyclic subclasses of I.
Now, we study a result that will be helpful when classifying the states that
compose a Markov chain into cyclical subclasses. This result takes into account all
the non-null elements of the transition matrix.
To carry on, we introduce an example, of a Markov chain with finite state space,
in which we study the period and the cyclic subclasses.
Example 4.3.5. Consider the Markov chain with state space I = {1, 2, 3, 4, 5, 6, 7}
and stochastic matrix
\[
\Pi = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{3} & \tfrac{2}{3} & 0 & 0 & 0 & 0 \\
0 & \tfrac{2}{3} & \tfrac{1}{3} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{3} & \tfrac{2}{3} & 0
\end{pmatrix}
\]
In this case, the Markov chain is irreducible, since there is a unique equivalence class. Now, we study the cyclic subclasses of the chain. For this, we note that all states are essential; thus, the period of this class is the period of any of its states. Consider for example the state i = 1; then d(1) = 3. Therefore, there are three cyclic subclasses.
4.4 Example: an analysis of the random walk on Z
Now, to classify the state 0, we use Theorem 4.2.5, so it is only necessary to consider the sum \sum_{m=1}^{\infty} p^{(m)}_{0,0}, which is the expected number of visits to the state 0 for a chain that begins in 0.
Firstly, if we consider m odd we have p^(2n+1)_{0,0} = 0, so we cannot come back to the initial state 0 with an odd number of movements, because the number of steps to the right must be equal to the number of steps to the left. Then, if the number of movements is even we have that
\[ p^{(2n)}_{0,0} = P(X_{2n} = 0) = \binom{2n}{n} p^n (1-p)^n. \]
Therefore we get
\[ \sum_{m=1}^{\infty} p^{(m)}_{0,0} = \sum_{n=1}^{\infty} p^{(2n)}_{0,0} = \sum_{n=1}^{\infty} \binom{2n}{n} p^n (1-p)^n = \sum_{n=1}^{\infty} \frac{(2n)!}{n!\,n!} p^n (1-p)^n, \]
where, by Stirling's formula, \frac{(2n)!}{n!\,n!} \approx \frac{4^n}{\sqrt{\pi n}}.
Hence, we can approximate the probability p^(2n)_{0,0} ≈ (4p(1−p))^n / \sqrt{\pi n}. To continue, we have to consider two cases. The first one is when p = 1/2; then we have p^(2n)_{0,0} ≈ 1/\sqrt{\pi n}, so the sum is
\[ \sum_{m=1}^{\infty} p^{(m)}_{0,0} \approx \sum_{n=1}^{\infty} \frac{1}{\sqrt{\pi n}} = \infty; \]
in this case the state 0 is recurrent, as the last sum is infinite, and hence all states are recurrent.
The second case is when p ≠ 1/2; then p^(2n)_{0,0} ≈ (4p(1−p))^n / \sqrt{\pi n} = r^n / \sqrt{\pi n}, where 0 < r < 1, so the sum is
\[ \sum_{m=1}^{\infty} p^{(m)}_{0,0} \approx \sum_{n=1}^{\infty} \frac{r^n}{\sqrt{\pi n}} \le \sum_{n=1}^{\infty} r^n < \infty, \]
and now the state 0 is transient, as the last sum is finite, and hence all states are transient.
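The contrast between the two cases can be seen numerically. The following Python sketch (an illustration, not part of the original text; the number of terms is arbitrary) evaluates partial sums of the approximation (4p(1−p))^n / \sqrt{\pi n}.

import math

def approx_return_sum(p, n_terms=200000):
    """Partial sum of the approximation (4p(1-p))^n / sqrt(pi*n) to p^(2n)_{0,0}."""
    total = 0.0
    for n in range(1, n_terms + 1):
        total += (4 * p * (1 - p)) ** n / math.sqrt(math.pi * n)
    return total

print(approx_return_sum(0.5))   # keeps growing as n_terms increases (recurrent case)
print(approx_return_sum(0.4))   # stabilises at a finite value (transient case)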
Finally, we are going to study the cyclic subclasses. We observe that all the states are essential, since if we leave any of them we are able to come back at a future point of time.
As we said before, the chain is irreducible, so there exists a unique class, which has period 2, and we just have two cyclic subclasses: the even integers and the odd integers.
5 Hitting times
In this chapter, we should take into account a probability space (Ω, F, P ) and a
family of discrete random variables Xn : Ω −→ I where {Xn ; n ≥ 0} is a HMC(ν, Π)
and I is a countable set called state space. Let A be a subset of the state space I,
so A ⊆ I.
In this chapter, we study the probability that a Markov chain reaches a state within the subset A. In order to analyze this, it is necessary to introduce some new concepts.
The random variable H^A : Ω −→ {0, 1, 2, ...} ∪ {∞} defined by
\[ H^A = \inf\{n \ge 0;\, X_n \in A\} \]
is called the hitting time of A, which is the first time the chain hits the subset A. In addition, we use the convention that the infimum of the empty set is ∞.
To continue, we define the probability that, starting from state i ∈ I, the chain hits A as
\[ h^A_i = P_i(H^A < \infty). \]
Consider the particular case in which the chain is initially in state i with i ∈ A; then the hitting time is H^A = 0 and the probability h^A_i takes the value h^A_i = 1. We also have to consider the case in which the subset A is a closed class; then h^A_i is called the absorption probability.
Now, we study the average time a chain needs to reach the subset A. This is called the mean hitting time and is defined by
\[ \mu^A_i = E_i(H^A) = \sum_{n < \infty} n\, P_i(H^A = n) + \infty \cdot P_i(H^A = \infty). \]
In this case, we also consider the case in which the subset A is a closed class; then µ^A_i is called the absorption time.
In practice there are some cases in which we use the notation
\[ h^A_i = P_i(\text{hit } A), \qquad \mu^A_i = E_i(\text{time to hit } A), \]
because these expressions are easy to compute once we have the stochastic matrix.
To continue, we are going to study the possible values that the probability h^A_i can take depending on whether the state i is in the subset A or not. To make this analysis, we use the notion of minimal solution, which means that if x = (x_i ; i ∈ I) is a minimal solution and y = (y_i ; i ∈ I) is another solution such that y_i ≥ 0, then y_i ≥ x_i ∀i ∈ I.
Theorem 5.0.1. The vector of probabilities h^A = (h^A_i ; i ∈ I) is the minimal non-negative solution of the following system
\[ \begin{cases} h^A_i = 1 & \text{for } i \in A \\ h^A_i = \sum_{j \in I} p_{i,j}\, h^A_j & \text{for } i \notin A. \end{cases} \tag{5.1} \]
Proof. Initially, we show that the vector h^A is a solution of the system. Firstly, if X0 = i ∈ A, then H^A = 0 and in this case the probability is h^A_i = 1. Secondly, if X0 = i ∉ A, then H^A ≥ 1 and, using the Markov property, we have P_i(H^A < ∞ | X_1 = j) = P_j(H^A < ∞) = h^A_j, and hence
\[ h^A_i = P_i(H^A < \infty) = \sum_{j \in I} P_i(H^A < \infty, X_1 = j) = \sum_{j \in I} P_i(H^A < \infty \mid X_1 = j)\, P_i(X_1 = j) = \sum_{j \in I} h^A_j\, p_{i,j}. \]
In general, there may exist multiple solutions of the system. In this context, let us prove that the vector h^A is the minimal solution. Suppose l = (l_i ; i ∈ I) is another non-negative solution to (5.1), so we have h^A_i = l_i = 1 ∀i ∈ A; otherwise, for i ∉ A, we can rewrite the linear equation l_i = \sum_{j \in I} p_{i,j} l_j as follows
\[ l_i = \sum_{j \in A} p_{i,j}\, l_j + \sum_{j \notin A} p_{i,j}\, l_j. \]
Repeating this argument, which consists in substituting the expression of l_j into the final term, we have
Finally, since the solution l_{j_n} is non-negative and the previous terms P_i(X_1 ∈ A), ..., P_i(X_n ∈ A, X_{n−1} ∉ A, ..., X_1 ∉ A) sum to P_i(H^A ≤ n), we obtain l_i ≥ P_i(H^A ≤ n), which implies that
\[ l_i \ge \lim_{n \to +\infty} P_i(H^A \le n) = P_i(H^A < \infty) = h^A_i. \]
Once we have studied this result, we find a similar fact for the vector of mean hitting times µ^A in the following theorem.
Theorem 5.0.2. The vector of mean hitting times µ^A = (µ^A_i ; i ∈ I) is the minimal non-negative solution of the following system
\[ \begin{cases} \mu^A_i = 0 & \text{for } i \in A \\ \mu^A_i = 1 + \sum_{j \in I} p_{i,j}\, \mu^A_j & \text{for } i \notin A. \end{cases} \]
Proof. To begin, we show that the vector of mean hitting times µ^A is a solution of the system. Firstly, if X0 = i ∈ A, then H^A = 0, which implies that µ^A_i = 0. Secondly, if X0 = i ∉ A, then H^A ≥ 1 and, using the Markov property, we have
\[ E_i(H^A \mid X_1 = j) = 1 + E_j(H^A) = 1 + \mu^A_j, \]
and hence
\[ \mu^A_i = E_i(H^A) = \sum_{j \in I} E_i(H^A \mathbf{1}_{\{X_1 = j\}}) = \sum_{j \in I} E_i(H^A \mid X_1 = j)\, P_i(X_1 = j) = 1 + \sum_{j \notin A} p_{i,j}\, \mu^A_j. \]
Repeating this argument, which consists in substituting the expression of r_j into the final term, we have
\[ r_i = P_i(H^A \ge 1) + \cdots + P_i(H^A \ge n) + \sum_{j_1, \dots, j_n \notin A} p_{i,j_1} \cdots p_{j_{n-1},j_n}\, r_{j_n}. \]
Since the last term is non-negative,
\[ r_i \ge \lim_{n \to +\infty} \left( P_i(H^A \ge 1) + \cdots + P_i(H^A \ge n) \right) = E_i(H^A) = \mu^A_i, \]
and with this we get that the vector of mean hitting times µ^A = (µ^A_i ; i ∈ I) is the minimal non-negative solution of the system.
To continue, we introduce two examples in which we analyze the concepts previously studied. In the first one the state space is finite, but in the second one it is not; in this second case the minimality condition is essential.
Example 5.0.3. Consider the symmetric random walk on the integers 1, 2, 3, 4, with absorption at 1 and 4. We want to study the probability of absorption in 1 if we start in state 2 or in state 3. Using the result of the first theorem, that is, the system (5.1), we have to compute h_2 = P_2(hit 1) and h_3 = P_3(hit 1). Starting from state 2 and conditioning on the first step, we have
\[ h_2 = P_2(\text{hit } 1) = \tfrac{1}{2} h_1 + \tfrac{1}{2} h_3. \]
Similarly, the expression obtained for the probability h_3 is
\[ h_3 = P_3(\text{hit } 1) = \tfrac{1}{2} h_2 + \tfrac{1}{2} h_4. \]
The two equations we have just found form a system of linear equations from which it is easy to obtain the values of h_2 and h_3; we have
\[ \begin{cases} h_2 = \tfrac{2}{3} + \tfrac{1}{3} h_4 \\ h_3 = \tfrac{1}{3} + \tfrac{2}{3} h_4. \end{cases} \]
The value of h_4 is not determined by the system (5.1) because from state 4 we cannot reach the state 1. In this case, we use the minimality condition and hence we get h_4 = P_4(hit 1) = 0. Substituting this value in the previous system we obtain h_2 = 2/3 and h_3 = 1/3 respectively. If we had not used the minimality condition the result would be the same because, as the state space is finite, only the prior diagram is necessary to study the probabilities h_i.
To continue, we compute the time it takes until the chain is absorbed in state 1 or 4 if we start in state 2 or in state 3. In this case, we have to compute the mean hitting times µ_2 = E_2(time to hit {1, 4}) and µ_3 = E_3(time to hit {1, 4}) using the system of Theorem 5.0.2.
First we get µ_1 = µ_4 = 0 because we are initially in one of the states we want to reach.
Now, suppose that we start at state 2 and consider the situation after making one step, jumping as defined before. In this case the mean hitting time is
\[ \mu_2 = 1 + \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_3 = 1 + \tfrac{1}{2}\mu_3. \]
Similarly, the expression obtained for the mean hitting time µ_3 is
\[ \mu_3 = 1 + \tfrac{1}{2}\mu_2 + \tfrac{1}{2}\mu_4 = 1 + \tfrac{1}{2}\mu_2. \]
Using the two equations we have just found, starting in state 2 the mean time for the chain to be absorbed by state 1 or 4 is 2. On the other hand, if we are initially in state 3, the mean time to be absorbed by state 1 or 4 is also 2.
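Both small linear systems of this example can also be solved numerically. The following Python sketch (an illustration, not part of the original text) reproduces the values just obtained.

import numpy as np

# Example 5.0.3: symmetric walk on {1, 2, 3, 4} with absorption at 1 and 4.
# Hitting probabilities of state 1: h1 = 1 and, by minimality, h4 = 0.
h1, h4 = 1.0, 0.0
# System:  h2 = 1/2*h1 + 1/2*h3,   h3 = 1/2*h2 + 1/2*h4
A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])
b = np.array([0.5 * h1, 0.5 * h4])
h2, h3 = np.linalg.solve(A, b)
print(h2, h3)                     # 0.666..., 0.333...

# Mean hitting times of {1, 4}:  mu1 = mu4 = 0,
# mu2 = 1 + 1/2*mu3,   mu3 = 1 + 1/2*mu2
mu2, mu3 = np.linalg.solve(np.array([[1.0, -0.5], [-0.5, 1.0]]),
                           np.array([1.0, 1.0]))
print(mu2, mu3)                   # 2.0, 2.0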
Example 5.0.4. In chapter 2 we studied the example of a random walk on Z with absorbing barriers, also called the Gambler's ruin, here on the state space {0, 1, ...}. Let us consider the homogeneous Markov chain with the following diagram.
Suppose that the gambler plays heads or tails and initially has i coins; at each point of time he wins or loses one coin with probability p or q = 1 − p respectively. The game ends when the gambler loses everything. We want to study the probability that the player loses everything; note that the only absorbing state is {0}.
The transition probabilities are
\[ p_{0,0} = 1, \qquad p_{i,i-1} = 1 - p, \quad p_{i,i+1} = p \quad \text{for } i = 1, 2, \dots \]
rλ² + sλ + t = 0. There are two solutions of this quadratic equation, namely λ₊ = 1 and λ₋ = q/p, and then h_n = βλ₊ⁿ + αλ₋ⁿ is a solution.
Firstly, if p ≠ q we have that λ₊ ≠ λ₋, and we can solve the equation h_0 = α + β, which implies that β = h_0 − α = 1 − α. Therefore, the previous recurrence has the general solution
\[ h_i = 1 - \alpha + \alpha \left( \frac{q}{p} \right)^i, \quad \text{for } i \ge 1, \]
where α ∈ [0, 1] and the probability h_i takes different values depending on the values of p and q. We study two cases:
(i) If p < q, then h_i ∈ [0, 1] implies that α = 0 and hence the minimal solution is given by h_i = 1.
(ii) If p > q, since we have to find a minimal solution, the value of α has to be as large as possible; then α = 1 and the minimal solution is given by h_i = (q/p)^i.
Secondly, if p = q the general solution of the recurrence is
\[ h_i = 1 + \alpha i, \quad \text{for } i \ge 1; \]
in this case the restriction h_i ∈ [0, 1] implies that α = 0 and again we have h_i = 1 for all i.
In conclusion, for p ≤ 1/2 the player loses everything with probability 1.
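The conclusion can be checked by simulation. The following Python sketch (an illustration, not part of the original text; the number of runs and the truncation length are arbitrary, so the symmetric case only approaches 1 up to truncation error) estimates the ruin probability by Monte Carlo.

import random

def ruin_probability_estimate(i, p, n_runs=5000, max_steps=2000, seed=0):
    """Monte Carlo estimate of the probability of ever hitting 0 when starting with i coins."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_runs):
        x = i
        for _ in range(max_steps):            # truncate very long games
            x += 1 if rng.random() < p else -1
            if x == 0:
                ruined += 1
                break
    return ruined / n_runs

print(ruin_probability_estimate(3, 0.6))      # near (q/p)^3 = (0.4/0.6)^3 ~ 0.296
print(ruin_probability_estimate(3, 0.5))      # near 1 (p <= 1/2 case), up to truncation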
6 Distribution and measure
In this chapter, we study the limiting behaviour of Markov chains as time n approaches infinity. In particular, we focus on the relationship between this behaviour and invariant probability distributions. This concept tells us the fraction of time that the Markov chain spends in each state as n becomes large. Moreover, we study other uses of this distribution in Markov chains.
In this chapter, we should take into account a probability space (Ω, F, P ) and a
family of discrete random variables Xn : Ω −→ I where {Xn ; n ≥ 0} is a HMC(ν, Π)
and I is a countable set called state space.
For a Markov chain {Xn ; n ≥ 0} with transition matrix Π and invariant distri-
bution γ, we can rewrite in matrix form the condition (6.1) as follows
γΠ = γ
where γ is a row vector as we said above.
Given an invariant distribution γ, we will see that the distribution (or law) of the HMC(γ, Π), {Xn ; n ≥ 0}, is independent of n, for all n ≥ 0; therefore, all the random variables Xn have the same law.
For n = 1 we have the equality (6.1), which we have analyzed previously. Now we study the case n = 2, for which we use the Chapman-Kolmogorov equation, so we have
\[ \sum_{i \in I} \gamma_i\, p^{(2)}_{i,j} = \sum_{i \in I} \gamma_i \sum_{k \in I} p_{i,k}\, p_{k,j} = \sum_{k \in I} \left( \sum_{i \in I} \gamma_i\, p_{i,k} \right) p_{k,j} = \sum_{k \in I} \gamma_k\, p_{k,j} = \gamma_j, \]
and hence we get γ^(2)_j = P(X_2 = j) = \sum_{i∈I} γ_i p^(2)_{i,j} = γ_j, which is the same law as in the previous case, where n = 1.
After that, we use mathematical induction over the number of steps needed for the chain to reach the state j. Assuming the result is true for n − 1, we will see that it also holds for n. To continue, we want to compute the distribution of the random variable Xn. For this, we study whether the equality (6.1) holds when n steps are needed to go from state i to state j, so we have
\begin{align*}
\sum_{i \in I} \gamma_i\, p^{(n)}_{i,j} &= \sum_{i \in I} \gamma_i \sum_{i_1, \dots, i_{n-1} \in I} p_{i,i_1} \cdots p_{i_{n-1},j} = \sum_{i \in I} \gamma_i \sum_{i_{n-1} \in I} p^{(n-1)}_{i,i_{n-1}}\, p_{i_{n-1},j} \\
&= \sum_{i_{n-1} \in I} \left( \sum_{i \in I} \gamma_i\, p^{(n-1)}_{i,i_{n-1}} \right) p_{i_{n-1},j} = \sum_{i_{n-1} \in I} \gamma_{i_{n-1}}\, p_{i_{n-1},j} = \gamma_j;
\end{align*}
in this case we get γ^(n)_j = P(X_n = j) = \sum_{i∈I} γ_i p^(n)_{i,j} = γ_j, and this is the law of the random variable Xn.
Therefore, if the distribution of the initial state X0 is γ, then the equality \sum_{i∈I} γ_i p^(n)_{i,j} = γ_j implies that, for all n, P(Xn = j) = γ_j, so all the random variables have the same law. In addition, we note that this distribution is independent of n.
To continue, once the definition of invariant distribution and some of its properties have been given, we study the existence of an invariant distribution for a stochastic matrix. After this, we analyze this fact through some examples.
and on the other hand we get that wn,i ≥ 0 because w ∈ [0, 1] and also pj,i ≥ 0.
Then, both necessary conditions are satisfied in order to consider wn as a probability.
The set of probability distributions over the space I is a closed and bounded subset of [0, 1]^{|I|}. To continue, suppose that there exists a convergent subsequence whose limit is an element of the set defined above. This element has to be a probability distribution over the space I, which we denote by γ. Thus, for the subsequence {w_{n_j} ; j ≥ 1} we have that w_{n_j} converges to γ as j becomes large.
Now, we check that the probability γ is an invariant distribution. First we have
\[ w_{n_j} - w_{n_j} \Pi = \frac{1}{n_j} \sum_{m=0}^{n_j - 1} w \Pi^{(m)} - \frac{1}{n_j} \sum_{m=0}^{n_j - 1} w \Pi^{(m+1)} = \frac{1}{n_j} \left( w - w \Pi^{n_j} \right). \]
Now, to study the previous concepts, we will use two practical examples that differ in the number of invariant distributions of the stochastic matrix.
Example 6.1.3. Consider the Markov chain with state space I = {1, 2} and transition matrix
\[ \Pi = \begin{pmatrix} \tfrac{1}{4} & \tfrac{3}{4} \\ \tfrac{1}{5} & \tfrac{4}{5} \end{pmatrix}. \]
In this example, we want to check that this chain has an invariant distribution γ. To find this distribution, we have to solve the equality γΠ = γ, that is,
\[ \begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix} \begin{pmatrix} \tfrac{1}{4} & \tfrac{3}{4} \\ \tfrac{1}{5} & \tfrac{4}{5} \end{pmatrix} = \begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix}, \]
which gives the system
\[ \begin{cases} \gamma_1 = \tfrac{1}{4}\gamma_1 + \tfrac{1}{5}\gamma_2, \\ \gamma_2 = \tfrac{3}{4}\gamma_1 + \tfrac{4}{5}\gamma_2. \end{cases} \]
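This system, together with the normalisation γ_1 + γ_2 = 1, can also be solved numerically. The following Python sketch (an illustration, not part of the original text) computes the invariant distribution of this matrix.

import numpy as np

Pi = np.array([[1/4, 3/4],
               [1/5, 4/5]])

# Solve gamma * Pi = gamma together with gamma_1 + gamma_2 = 1:
# equivalently, (Pi^T - I) gamma^T = 0 with the normalisation row appended.
A = np.vstack([Pi.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
gamma, *_ = np.linalg.lstsq(A, b, rcond=None)
print(gamma)          # approximately [4/19, 15/19] = [0.2105..., 0.7894...]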
With this example, we have seen that this Markov chain with finite state space has a unique invariant distribution. To carry on, we apply the prior result to one of the examples defined in Section 2.2, the random walk with absorbing barriers, here with finite state space. In this case, we will see that there may be more than one invariant distribution; therefore, the result above only assures existence.
Example 6.1.4. In this case, the Markov chain with state space I = {0, 1, ..., M} is given by the stochastic matrix
\[
\Pi = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
q & 0 & p & 0 & \cdots & 0 \\
0 & q & 0 & p & \cdots & 0 \\
0 & 0 & q & 0 & \cdots & 0 \\
 & & \ddots & & \ddots & \\
0 & 0 & \cdots & q & 0 & p \\
0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
\]
Now we want to compute the invariant distributions of the stochastic matrix Π. For this, we have to analyze the equality γΠ = γ, where γ = (γ_0, γ_1, ..., γ_M), so we have the following system
\[ \begin{cases} \gamma_0 = \gamma_0 + \gamma_1 q, \\ \gamma_1 = \gamma_2 q, \\ \gamma_j = \gamma_{j-1}\, p + \gamma_{j+1}\, q & \text{for } j = 2, \dots, M-2, \\ \gamma_{M-1} = \gamma_{M-2}\, p, \\ \gamma_M = \gamma_{M-1}\, p + \gamma_M. \end{cases} \]
In the last example, we observe that sometimes there exists more than one in-
variant distribution, so this distribution is not always unique. Moreover, in the last
two examples, we have seen that there is at least one invariant distribution.
The following theorem, in which we can note the relationship between invariant distributions and n-step transition probabilities, shows that the invariant distribution is an equilibrium distribution, because it is the same at all instants of time.
Theorem 6.1.5. Let the state space I be finite, and suppose that for some i ∈ I it holds that
\[ p^{(n)}_{i,j} \to \gamma_j \ \text{as } n \to \infty, \ \text{for all } j \in I; \]
then γ = (γ_j ; j ∈ I) is an invariant distribution.
Proof. Firstly, we know that 0 ≤ γ_j ≤ 1 for all j ∈ I; this also holds for the probability p^(n)_{i,j}, that is, 0 ≤ p^(n)_{i,j} ≤ 1 for all n ≥ 1 and i, j ∈ I. Now, we analyze the vector γ and see that it is a probability distribution. Interchanging the summation and the limit, which is possible because the state space is finite, we have
\[ \sum_{j \in I} \gamma_j = \sum_{j \in I} \lim_{n \to +\infty} p^{(n)}_{i,j} = \lim_{n \to +\infty} \sum_{j \in I} p^{(n)}_{i,j} = \lim_{n \to +\infty} 1 = 1. \]
with this we get that the vector γ is an invariant distribution because the equality
holds.
Once given this result, we consider an example in which the state space I is not finite: the random walk on Z, which we have studied in Section 4.4. In this case the limit of the probability of going from state i to state j in n steps exists, since p^(n)_{i,j} → 0 =: γ_j as n → ∞ for all i, j ∈ I, but the vector γ is not a probability distribution, because \sum_{i∈I} γ_i ≠ 1, and so it is not an invariant distribution.
To continue, we introduce a practical example in which we apply the result we have just studied.
Example 6.1.6. Consider the Markov chain of Example 6.1.3. Our goal is to compute the transition probabilities p^(n)_{1,1} and p^(n)_{2,2}.
First we have to compute the eigenvalues of the transition matrix Π. For this, we calculate the characteristic equation, that is,
\[ \det(\Pi - \lambda \mathrm{Id}) = 0 \;\Rightarrow\; \left( \frac{1}{4} - \lambda \right)\left( \frac{4}{5} - \lambda \right) - \frac{3}{20} = 0; \]
thus, solving the equation we obtain the eigenvalues 1 and 1/20. Now, we can diagonalize the transition matrix Π, so there exists an invertible matrix A such that
\[ \Pi = A \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{20} \end{pmatrix} A^{-1} \;\Rightarrow\; \Pi^{(n)} = A \begin{pmatrix} 1 & 0 \\ 0 & \left(\tfrac{1}{20}\right)^n \end{pmatrix} A^{-1}. \]
Now, we want to compute the transition probability p^(n)_{1,1}. Using the n-th power of the stochastic matrix we have
\[ p^{(n)}_{1,1} = \alpha \cdot 1^n + \beta \left( \frac{1}{20} \right)^n = \alpha + \beta \left( \frac{1}{20} \right)^n. \]
To continue, we calculate the values of the constants α and β, so we have to solve the following system of linear equations
\[ \begin{cases} 1 = p^{(0)}_{1,1} = \alpha + \beta \\ \tfrac{1}{4} = p^{(1)}_{1,1} = \alpha + \tfrac{1}{20} \beta, \end{cases} \]
so the values of the constants are α = 4/19 and β = 15/19.
Now, we apply the last theorem to the probability p^(n)_{1,1} and we have
\[ \lim_{n \to +\infty} p^{(n)}_{1,1} = \lim_{n \to +\infty} \left( \frac{4}{19} + \frac{15}{19} \left( \frac{1}{20} \right)^n \right) = \frac{4}{19} = \gamma_1. \]
If we had computed the probability p^(n)_{2,1}, the result would have been the same. Now, we repeat the same idea for the probability p^(n)_{2,2}; then we get
\[ p^{(n)}_{2,2} = \mu \cdot 1^n + \lambda \left( \frac{1}{20} \right)^n = \mu + \lambda \left( \frac{1}{20} \right)^n, \]
and now we want to compute the constants µ and λ, so we have to solve the following system of linear equations
\[ \begin{cases} 1 = p^{(0)}_{2,2} = \mu + \lambda \\ \tfrac{4}{5} = p^{(1)}_{2,2} = \mu + \tfrac{1}{20} \lambda. \end{cases} \]
In this case, we obtain the values µ = 15/19 and λ = 4/19, and hence the limit is
\[ \lim_{n \to +\infty} p^{(n)}_{2,2} = \lim_{n \to +\infty} \left( \frac{15}{19} + \frac{4}{19} \left( \frac{1}{20} \right)^n \right) = \frac{15}{19} = \gamma_2. \]
If we had computed the probability p^(n)_{1,2}, the result would have been the same. Therefore, by Theorem 6.1.5 we get that γ = (γ_1, γ_2) = (4/19, 15/19) is an invariant distribution.
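The convergence p^(n)_{i,j} → γ_j can be observed directly by raising Π to a large power. The following Python sketch (an illustration, not part of the original text; n = 50 is arbitrary) shows that both rows of Π^n approach (4/19, 15/19).

import numpy as np

Pi = np.array([[1/4, 3/4],
               [1/5, 4/5]])

Pi_n = np.linalg.matrix_power(Pi, 50)
print(Pi_n)                      # both rows ~ (0.2105..., 0.7894...)
print(np.array([4/19, 15/19]))   # the invariant distribution gamma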
In the examples of this chapter, we can observe that an invariant distribution may not exist, may be unique, or more than one may exist. To continue, we are going to introduce some results that give conditions guaranteeing the existence and uniqueness of the invariant measure. Firstly, we recall the concepts of measure and invariant measure.
Definition 6.1.7. A measure is any row vector µ = (µ_i ; i ∈ I) with µ_i ≥ 0 for all i ∈ I. Moreover, we say a measure µ is invariant for a transition matrix Π = (p_{i,j} ; i, j ∈ I) if it satisfies
\[ \sum_{i \in I} \mu_i\, p_{i,j} = \mu_j, \quad \forall j \in I. \]
Now, in the following result, we analyze the existence of the invariant measure.
Theorem 6.1.8. Let Π = (p_{i,j} ; i, j ∈ I) be the stochastic matrix of the Markov chain {Xn ; n ≥ 0}, which is irreducible and all of whose states are recurrent. Consider
\[ \lambda^r_i = E_r \left( \sum_{n=0}^{T_r - 1} \mathbf{1}_{\{X_n = i\}} \right); \]
then
(i) 0 < λ^r_i < ∞ for all i ∈ I,
(ii) λ^r Π = λ^r, that is, λ^r is an invariant measure.
Proof. Given n ≥ 1, the event {T_r ≥ n} depends only on X_0, ..., X_{n−1}, because it means that none of the random variables X_1, ..., X_{n−1} is in state r. Using the Markov property at time n − 1 we get
\[ P_r(X_{n-1} = i, X_n = j \text{ and } T_r \ge n) = P_r(X_{n-1} = i \text{ and } T_r \ge n)\, p_{i,j}. \]
We also know that the states of the Markov chain are recurrent, so P_r(T_r < ∞) = 1, which is the same as P(X_0 = X_{T_r} = r) = 1. Now, we have
\[ \lambda^r_j = E_r \left( \sum_{n=1}^{T_r} \mathbf{1}_{\{X_n = j\}} \right) = E_r \left( \sum_{n=1}^{\infty} \mathbf{1}_{\{X_n = j \text{ and } T_r \ge n\}} \right) = \sum_{n=1}^{\infty} P_r(X_n = j \text{ and } T_r \ge n). \]
To continue, the chain, before visiting the state j at time n, has been in some state i ∈ I at time n − 1, so we have
\begin{align*}
\lambda^r_j &= \sum_{i \in I} \sum_{n=1}^{\infty} P_r(X_{n-1} = i, X_n = j \text{ and } T_r \ge n) \\
&= \sum_{i \in I} p_{i,j} \sum_{n=1}^{\infty} P_r(X_{n-1} = i \text{ and } T_r \ge n) = \sum_{i \in I} p_{i,j}\, E_r \left( \sum_{l=0}^{\infty} \mathbf{1}_{\{X_l = i \text{ and } T_r - 1 \ge l\}} \right) \\
&= \sum_{i \in I} p_{i,j}\, E_r \left( \sum_{l=0}^{T_r - 1} \mathbf{1}_{\{X_l = i\}} \right) = \sum_{i \in I} \lambda^r_i\, p_{i,j},
\end{align*}
and with this we have that the following equality is satisfied λr Π = λr , which shows
(ii).
To continue, we have to prove (i). For this purpose, we know that all the states of the Markov chain communicate among themselves. Then, for each state i ∈ I there are n, m ≥ 0 such that p^(m)_{i,r} > 0 and p^(n)_{r,i} > 0; with this we have
\[ \lambda^r_i = \sum_{k \in I} \lambda^r_k\, p^{(n)}_{k,i} \ge \lambda^r_r\, p^{(n)}_{r,i} = p^{(n)}_{r,i} > 0, \]
and, in addition, we obtain that 1 = λ^r_r = \sum_{k∈I} λ^r_k p^(m)_{k,r} ≥ λ^r_i p^(m)_{i,r}. With this we get λ^r_i ≤ 1 / p^(m)_{i,r} < ∞. Hence, the vector λ^r satisfies 0 < λ^r_i < ∞ for all i ∈ I, which shows (i).
In the previous theorem, the quantity λ^r_i = E_r\left( \sum_{n=0}^{T_r - 1} \mathbf{1}_{\{X_n = i\}} \right) is interpreted as follows: given a fixed state r, λ^r_i is the mean time spent in the state i before returning to the state r. Moreover, remember that T_r = \inf\{n \ge 1;\, X_n = r\} is the first time the chain visits the state r.
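This interpretation suggests a simple simulation. The following Python sketch (an illustration, not part of the original text; the chain of Example 6.1.3 and the number of excursions are arbitrary choices) estimates λ^r_i as the average time spent in each state during excursions that start and end at r.

import random

def estimate_lambda(Pi, r, n_excursions=20000, seed=0):
    """Monte Carlo estimate of lambda^r_i = E_r( sum_{n=0}^{T_r - 1} 1{X_n = i} )."""
    rng = random.Random(seed)
    n_states = len(Pi)
    totals = [0.0] * n_states
    for _ in range(n_excursions):
        x = r
        while True:
            totals[x] += 1                                 # time spent in x before returning to r
            x = rng.choices(range(n_states), weights=Pi[x])[0]
            if x == r:
                break
    return [t / n_excursions for t in totals]

Pi = [[1/4, 3/4],
      [1/5, 4/5]]
lam = estimate_lambda(Pi, r=0)
print(lam)   # lambda^0_0 = 1 and lambda^0_1 ~ 15/4 = 3.75; normalised, ~ (4/19, 15/19)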
To continue, we will analyze a fact which we have used in the proof of the result above.
Observation 6.1.9. Consider a Markov chain that starts in the state r; then we have that \sum_{n=0}^{T_r - 1} \mathbf{1}_{\{X_n = r\}} = 1, and this implies that
\[ \lambda^r_r = E_r \left( \sum_{n=0}^{T_r - 1} \mathbf{1}_{\{X_n = r\}} \right) = E_r(1) = 1. \]
Now, having analyzed the existence of the invariant measure, we study its uniqueness. In the following result we see that uniqueness holds up to multiplicative constants.
Theorem 6.1.10. Given a stochastic matrix Π which is irreducible and all of whose states are recurrent, the invariant measure µ for Π is unique up to multiplication by a constant.
To continue, applying the same procedure to µ_{i_0} we obtain the following equality
\begin{align*}
\mu_j &= \sum_{i_0 \ne r} \left( \sum_{i_1 \ne r} \mu_{i_1}\, p_{i_1,i_0} + \mu_r\, p_{r,i_0} \right) p_{i_0,j} + \mu_r\, p_{r,j} \\
&= \sum_{i_0, i_1 \ne r} \mu_{i_1}\, p_{i_1,i_0}\, p_{i_0,j} + \mu_r \left( \sum_{i_0 \ne r} p_{r,i_0}\, p_{i_0,j} + p_{r,j} \right),
\end{align*}
and now, once the previous equalities are established, we repeat this argument n − 1 times, and then for all n ∈ N we obtain that
\[ \mu_j = \sum_{i_0, \dots, i_n \ne r} \mu_{i_n}\, p_{i_n,i_{n-1}} \cdots p_{i_0,j} + \mu_r \left( \sum_{i_0, \dots, i_{n-1} \ne r} p_{r,i_{n-1}} \cdots p_{i_0,j} + \cdots + \sum_{i_0 \ne r} p_{r,i_0}\, p_{i_0,j} + p_{r,j} \right).
\]
In this case, for each j ∈ I, we get the relation µ_j ≥ µ_r λ^r_j, which implies that µ ≥ µ_r λ^r.
In addition, if the irreducible Markov chain has all its states recurrent, then, by Theorem 6.1.8, λ^r is invariant, and hence α = µ − µ_r λ^r is also invariant and α ≥ 0. Since Π is irreducible, given i ∈ I we can go from state i to state r in some number n of steps, thus p^(n)_{i,r} > 0. Using this we have 0 = µ_r − µ_r λ^r_r = α_r = \sum_{j∈I} α_j p^(n)_{j,r} ≥ α_i p^(n)_{i,r}, and hence α_i = 0. This implies that the vector α = (α_l ; l ∈ I) is null, because all its components are null, so we get that 0 = α = µ − µ_r λ^r, which is the same as µ = µ_r λ^r, where µ_r is a constant; therefore, the invariant measure is unique up to a constant.
With the results we have analyzed, we have studied the existence and uniqueness of the invariant measure. Now, we introduce a practical example in which we analyze the concepts studied in the last two theorems.
Example 6.1.11. Consider the example of the random walk on Z which we studied in Section 4.4. In this case, we consider the symmetric random walk, that is, p = 1/2 = q, so the stochastic matrix is
\[
\Pi = \begin{pmatrix}
 & \ddots & \ddots & \ddots & & & \\
\cdots & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & 0 & \cdots \\
\cdots & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & \cdots \\
\cdots & 0 & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots \\
 & & & \ddots & \ddots & \ddots &
\end{pmatrix}
\]
In this case, the Markov chain is irreducible and all its states are recurrent. Now, we have to check that the measure µ = (µ_i ; i ∈ I) is invariant, so it has to satisfy the equality µΠ = µ, which is equivalent to µ_i = \tfrac{1}{2}µ_{i−1} + \tfrac{1}{2}µ_{i+1} for all i ∈ I. This equality holds if µ_i = 1 for all i, and hence the measure µ is invariant.
In the last definition, the variable mi is called mean recurrence time at state i,
which is the expected return time to this state.
To continue, we analyze the relationship between recurrent states and null re-
current or positive recurrent states.
Corollary 6.2.2. Positive recurrent states and null recurrent states are both recur-
rent.
Proof. For a positive recurrent state, we have m_i < ∞, and this means that T_i cannot be ∞ with strictly positive probability. Hence, state i is recurrent. On the other hand, null recurrent states are recurrent by definition.
Now, we show that for a Markov chain with stochastic matrix Π, saying that the chain has positive recurrent states is equivalent to saying that the stochastic matrix Π has an invariant distribution. In our case, the study focuses on irreducible Markov chains.
Theorem 6.2.3. Given an irreducible Markov chain with stochastic matrix Π = (p_{i,j} ; i, j ∈ I), the following properties are equivalent:
(i) some state i ∈ I is positive recurrent;
(ii) all states are positive recurrent;
(iii) Π has an invariant distribution γ.
Moreover, when (iii) holds, we have m_r = 1/γ_r for all r ∈ I.
Proof. Firstly, we prove that (ii) implies (i). In this case, all states are positive recurrent; hence, in particular, some state i is positive recurrent, and this shows (i).
Secondly, we prove that (i) implies (iii). Given r ∈ I a positive recurrent state, then r is recurrent. We know that the chain is irreducible, hence there exists a unique equivalence class and it contains one recurrent state, so all states are recurrent. Theorem 6.1.8 tells us that λ^r is an invariant measure; using this, we construct an invariant distribution. First note that
\begin{align*}
\sum_{i \in I} \lambda^r_i = \sum_{i \in I} E_r \left( \sum_{n=0}^{T_r - 1} \mathbf{1}_{\{X_n = i\}} \right) &= E_r \left( \sum_{n=0}^{T_r - 1} \sum_{i \in I} \mathbf{1}_{\{X_n = i\}} \right) \\
&= E_r \left( \sum_{n=0}^{T_r - 1} 1 \right) = E_r(T_r) = m_r < \infty.
\end{align*}
From this equality we can define a new quantity α_j as α_j = λ^r_j / m_r, where j ∈ I. In this case we have \sum_{j∈I} α_j = 1, and hence α = (α_j ; j ∈ I) is an invariant distribution.
To continue, we prove that (iii) implies (ii). Now, we know the Markov chain is irreducible and the stochastic matrix Π has an invariant distribution γ, so \sum_{i∈I} γ_i = 1. Given a state r ∈ I, we have that γ_r = \sum_{i∈I} γ_i p^(n)_{i,r} > 0 for some n ≥ 1. To continue, we can define an invariant measure µ = (µ_i ; i ∈ I) as µ_i = γ_i / γ_r, and hence we have µ_r = 1. Besides, as we know that the Markov chain is irreducible, by Theorem 6.1.10 we obtain µ ≥ µ_r λ^r = λ^r. Therefore, we have
\[ m_r = \sum_{i \in I} \lambda^r_i \le \sum_{i \in I} \frac{\gamma_i}{\gamma_r} = \frac{1}{\gamma_r} < \infty; \]
with this we obtain that the state r is positive recurrent, which shows (ii).
Finally, we have to prove the equality m_r = 1/γ_r. For this we assume that properties (i), (ii) and (iii) hold, so all states are recurrent. By Theorem 6.1.10 we have µ = µ_r λ^r and, using this, we obtain the equality
\[ m_r = \sum_{i \in I} \lambda^r_i = \sum_{i \in I} \frac{\gamma_i}{\gamma_r} = \frac{1}{\gamma_r}, \]
Now, once we have studied the positive recurrent and null recurrent states, which are both recurrent, we analyze in an example the result we have just studied.
Example 6.2.4. Consider the example of the random walk on Z for the case p = 1/2 = q; we know that it is an irreducible Markov chain. In Example 6.1.11 we saw that there is an invariant measure µ and, by Theorem 6.1.10, any invariant measure is a scalar multiple of µ. Now, we analyze whether the symmetric random walk is null recurrent or positive recurrent. We have \sum_{i∈I} µ_i = ∞, so there can be no invariant distribution and, by Theorem 6.2.3, all states of the walk are null recurrent.
Proof. First, we show the recurrence. For this we assume that the Markov chain is transient, so, for all i, j ∈ I, we get
\[ \sum_{n=1}^{\infty} p^{(n)}_{i,j} = \sum_{n=1}^{\infty} P_i(X_n = j) = \sum_{n=1}^{\infty} E_i(\mathbf{1}_{\{X_n = j\}}) = E_i(N(j)) < \infty, \]
and hence, as the state space is finite, we have \sum_{j∈I} \sum_{n=1}^{\infty} p^(n)_{i,j} < ∞; but the previous sum is equal to
\[ \sum_{n=1}^{\infty} \sum_{j \in I} p^{(n)}_{i,j} = \sum_{n=1}^{\infty} 1 = \infty, \]
7 Ergodic theory
In the previous chapter we introduced the concept of invariant distribution and, among other things, studied its existence. In this chapter, we focus on analyzing the uniqueness of the invariant distribution, which is related to a new concept: ergodicity.
In this chapter, we should take into account a probability space (Ω, F, P ) and a
family of discrete random variables Xn : Ω −→ I where {Xn ; n ≥ 0} is a HMC(ν, Π)
and I is a countable set called state space.
Proof. Firstly, let T be a stopping time and consider an event A ⊆ Ω determined by the random variables X0, ..., XT; then, for all k ≥ 0, A ∩ {T = k} is determined by X0, ..., Xk. With this, we have the following
P ({XT = i0 , XT +1 = i1 , ..., XT +n = in } ∩ A ∩ {T = k} ∩ {XT = j})
= P ({Xk = i0 , Xk+1 = i1 , ..., Xk+n = in } ∩ A ∩ {T = k} ∩ {Xk = j}).
Now, remember the fact that A ∩ {XT = j} is determined by X0 , ..., Xk so, using
Markov property at time k and also considering the conditional probability we get
P (Xk = i0 , ..., Xk+n = in |A ∩ {T = k} ∩ {Xk = j})P (A ∩ {T = k} ∩ {Xk = j})
= P (Xk = i0 , Xk+1 = i1 , ..., Xk+n = in |Xk = j)P (A ∩ {T = k} ∩ {Xk = j})
= pj,i0 pi0 ,i1 · · · pin−1 ,in P (A ∩ {T = k} ∩ {Xk = j}).
The last equality is caused by the fact that {Xn ; n ≥ 0} is a homogeneous Markov
chain with transition probabilities pi,j . To continue, we sum over k = 0, 1, ... in the
equalities we have previously studied, and we obtain
Finally, we divide by P ({T < ∞} ∩ {XT = i}) the equality which we have just
calculated and we get
and this proves that {XT +n ; n ≥ 0} is a homogeneous Markov chain and also it is
independent of the random variables X0 , ..., XT , as we want.
Now, we use the concept analyzed above to introduce a new one: the length of the r-th walk between two passage times T_i. We define it as follows
\[ S^{(r)}_i = \begin{cases} T^{(r)}_i - T^{(r-1)}_i & \text{if } T^{(r-1)}_i < \infty \\ 0 & \text{if } T^{(r-1)}_i = \infty. \end{cases} \]
To carry on, we will study a result which will be useful to prove the ergodic theorem. Before that, we note that, for all r ≥ 1, the random variables T^(r)_i and S^(r)_i are stopping times. This is because {T^(r)_i = m} is equivalent to saying that X_m = i and that previously we have passed r − 1 times through the state i. With this, we note that T^(r)_i depends only on X_0, ..., X_m. Now, we introduce a new result which links the probability of the first passage time with the r-th walk between two passage times.
Proposition 7.1.3. Given r = 2, 3, ..., conditional on T^(r−1)_i < ∞, the r-th walk S^(r)_i is independent of the random variables {X_k ; k ≤ T^(r−1)_i} and we have
\[ P(S^{(r)}_i = m \mid T^{(r-1)}_i < \infty) = P_i(T_i = m). \]
Proof. Firstly, note that to prove this result we use the strong Markov property. In this case, we consider the stopping time T = T^(r−1)_i, on the event {T < ∞}, of the homogeneous Markov chain {Xm ; m ≥ 0}. Then XT = i and, conditional on T < ∞, {XT+m ; m ≥ 0} is a HMC(δ_i, Π) independent of X_0, ..., X_T. Given these properties, S^(r)_i can be seen as the first passage time to state i for the chain {XT+m ; m ≥ 0}, that is, S^(r)_i = inf{m ≥ 1; XT+m = i}.
Therefore, using the properties we have just analyzed, we obtain the equality P(S^(r)_i = m | T^(r−1)_i < ∞) = P_i(T_i = m).
Using the concepts studied before, we can conclude that the non-negative random variables S^(1)_i, S^(2)_i, ... are independent and identically distributed. Now, we continue considering the passage time and introduce a new result which links these random variables with Markov chains whose states are all recurrent. Remember that a recurrent state i is one which satisfies P_i(T_i < ∞) = 1.
Proposition 7.1.4. Consider an irreducible Markov chain whose states are all recurrent. Then, for every state i ∈ I and every initial distribution, P(T_i < ∞) = 1.

Proof. Firstly, conditioning on the initial position of the Markov chain and using the total probability formula, we can rewrite the probability P(T_i < ∞) as follows:

P(T_i < ∞) = Σ_{j∈I} P(X_0 = j) P_j(T_i < ∞).

Hence, it suffices to prove that P_j(T_i < ∞) = 1 for all j ∈ I. Given n ≥ 1 such that p_{i,j}^{(n)} > 0 and using Section 4.2, as all states are recurrent, we have

1 = P_i(X_m = i for some m ≥ n + 1) = Σ_{k∈I} p_{i,k}^{(n)} P_k(T_i < ∞).

We know that Σ_{k∈I} p_{i,k}^{(n)} = 1, so the prior equation forces P_k(T_i < ∞) = 1 for every k with p_{i,k}^{(n)} > 0; in particular, P_j(T_i < ∞) = 1.
Finally, substituting this value in the first equality, we obtain

P(T_i < ∞) = Σ_{j∈I} P(X_0 = j) P_j(T_i < ∞) = Σ_{j∈I} P(X_0 = j) = 1,

as we wanted to prove.
Theorem 7.1.5. Consider {Yn ; n ≥ 1} a sequence of independent and identically distributed random variables, and also non-negative, with E(Y_1) = m. Then

P( lim_{n→+∞} (Y_1 + · · · + Y_n)/n = m ) = 1.
Recall that in chapter 4 we introduced the number of visits of the chain to the state i, denoted by N(i). We now define the number of visits to the state i before the instant n as

N_n(i) = Σ_{r=0}^{n−1} 1_{{X_r = i}},

so the ratio N_n(i)/n is the proportion of time that the chain spends in the state i before the instant n. The following theorem examines the long-run proportion of time spent by a Markov chain in each state; moreover, it is related to the uniqueness of the invariant distribution.
Theorem 7.1.6. (Ergodic theorem) Consider {Xn ; n ≥ 0} an irreducible Markov chain with stochastic matrix Π and initial distribution ν. Then

P( lim_{n→+∞} N_n(i)/n = 1/m_i ) = 1,

where m_i = E_i(T_i) is the expected return time to the state i. In addition, if the Markov chain is positive recurrent, then, given a bounded function f : I → R, we have

P( lim_{n→+∞} (1/n) Σ_{r=0}^{n−1} f(X_r) = Σ_{i∈I} γ_i f(i) ) = 1,

where γ = (γ_i ; i ∈ I) is the invariant distribution of the chain.
Proof. First suppose that the Markov chain is transient. Then, for all i ∈ I, the total number of visits to the state i is finite, so lim_{n→+∞} N_n(i)/n = 0 = 1/m_i with probability 1, since in this case m_i = ∞.
Now, suppose the Markov chain is recurrent and fix a state i. Considering T = T_i, by Proposition 7.1.4 we get P(T < ∞) = 1. Now, by the strong Markov property, {X_{T+n} ; n ≥ 0} is a Markov chain with initial distribution δ_i = (δ_{i,j} ; j ∈ I) and stochastic matrix Π, which is independent of X_0 , ..., X_T . Moreover, the chains {X_{T+n}} and {X_n} spend, in the long run, the same proportion of time at the state i; for this reason, it suffices to consider the case ν = δ_i.
Previously, we defined the length of the r-th walk to the state i, denoted by S_i^{(r)}. We now use this concept to locate the visits to the state i with respect to the instant n. Firstly, we know that the moment of the last visit to the state i before the instant n is T_i^{(N_n(i)−1)} = S_i^{(1)} + · · · + S_i^{(N_n(i)−1)}, and hence we have

S_i^{(1)} + · · · + S_i^{(N_n(i)−1)} ≤ n − 1.
Secondly, we also know that the moment of the first visit to the state i after the instant n − 1 is T_i^{(N_n(i))} = S_i^{(1)} + · · · + S_i^{(N_n(i))}, and now we have

S_i^{(1)} + · · · + S_i^{(N_n(i))} ≥ n.

Finally, using the inequalities which we have just studied, we obtain the following:

(S_i^{(1)} + · · · + S_i^{(N_n(i)−1)}) / N_n(i) ≤ n / N_n(i) ≤ (S_i^{(1)} + · · · + S_i^{(N_n(i))}) / N_n(i).
Now, by Proposition 7.1.3 we get that E_i(S_i^{(r)}) = m_i and, using the strong law of large numbers (Theorem 7.1.5), we have

P( lim_{k→+∞} (S_i^{(1)} + · · · + S_i^{(k)})/k = m_i ) = 1.

In addition, we have considered that the Markov chain is recurrent; therefore, by chapter 4 we have the following equality:

P( lim_{n→+∞} N_n(i) = ∞ ) = 1.
Combining the sandwich inequalities above with these two facts, we obtain

P( lim_{n→+∞} N_n(i)/n = 1/m_i ) = 1,

which proves the first statement. Suppose now that the chain is positive recurrent and let γ = (γ_i ; i ∈ I) be its invariant distribution. We know that the long-run proportion of time spent in the state i is 1/m_i, and this must coincide with the probability γ_i that the chain is in the state i; so γ_i = 1/m_i and hence P( lim_{n→+∞} N_n(i)/n = γ_i ) = 1 for all i ∈ I. To prove the second statement, assume without loss of generality that |f| ≤ 1 and note that (1/n) Σ_{r=0}^{n−1} f(X_r) = Σ_{i∈I} (N_n(i)/n) f(i). Therefore, for any finite subset J ⊆ I,

| (1/n) Σ_{r=0}^{n−1} f(X_r) − Σ_{i∈I} γ_i f(i) | ≤ Σ_{i∈I} | N_n(i)/n − γ_i |
≤ Σ_{i∈J} | N_n(i)/n − γ_i | + Σ_{i∉J} N_n(i)/n + Σ_{i∉J} γ_i
≤ 2 Σ_{i∈J} | N_n(i)/n − γ_i | + 2 Σ_{i∉J} γ_i,

where the last step uses that Σ_{i∈I} N_n(i)/n = 1 = Σ_{i∈I} γ_i.
Now, let ε > 0. Since Σ_{i∈I} γ_i = 1, we can choose a finite subset J ⊆ I such that Σ_{i∉J} γ_i < ε/4. Secondly, since J is finite and N_n(i)/n → γ_i for every i, with probability 1 there exists N ≥ 1 such that, for all n ≥ N, Σ_{i∈J} | N_n(i)/n − γ_i | < ε/4. Then, for n ≥ N and using the chain of inequalities we have just established, we obtain

| (1/n) Σ_{r=0}^{n−1} f(X_r) − Σ_{i∈I} γ_i f(i) | < ε,

and this shows the equality we want to prove, because the required convergence is satisfied with probability 1.
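As an informal numerical illustration of the ergodic theorem (a minimal simulation sketch, external to the proof, which assumes the numpy library and an arbitrary irreducible test matrix), one can check that the occupation frequencies N_n(i)/n approach the invariant distribution:

import numpy as np

rng = np.random.default_rng(1)
# Arbitrary irreducible stochastic matrix on I = {0, 1, 2}, used only as a test case.
P = np.array([[0.0, 0.7, 0.3],
              [0.2, 0.3, 0.5],
              [0.5, 0.25, 0.25]])

n, x = 200_000, 0
counts = np.zeros(3)                     # N_n(i) for each state i
for _ in range(n):
    counts[x] += 1
    x = rng.choice(3, p=P[x])

# The invariant distribution gamma solves gamma P = gamma with sum(gamma) = 1.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.concatenate([np.zeros(3), [1.0]])
gamma, *_ = np.linalg.lstsq(A, b, rcond=None)

print("N_n(i)/n:", counts / n)           # close to gamma = (1/m_i ; i in I)
print("gamma   :", gamma)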
7.2 Finite state space
In the previous definition, we observe that the limit is independent of the state i ∈ I. To continue, ergodicity helps us to study the problem of the uniqueness of the invariant distribution; in the following result we analyze this fact.
Theorem 7.2.2. Let {Xn ; n ≥ 0} be a homogeneous Markov chain with finite state space I and stochastic matrix Π, and suppose that there exists k ≥ 1 such that min_{i,j∈I} p_{i,j}^{(k)} > 0. Then there exists a probability distribution π = (π_j ; j ∈ I) such that, for all i ∈ I,

π_j = lim_{n→+∞} p_{i,j}^{(n)},

and π is an invariant distribution of the chain.
Proof. Firstly, for all n ≥ 1, we denote by α_j^{(n)} and β_j^{(n)} the values min_{i∈I} p_{i,j}^{(n)} and max_{i∈I} p_{i,j}^{(n)} respectively. Now, using the Chapman-Kolmogorov equation and knowing that Σ_{l∈I} p_{i,l} = 1, we get

α_j^{(n+1)} = min_{i∈I} p_{i,j}^{(n+1)} = min_{i∈I} Σ_{l∈I} p_{i,l} p_{l,j}^{(n)} ≥ min_{i∈I} Σ_{l∈I} p_{i,l} ( min_{l∈I} p_{l,j}^{(n)} ) = min_{i∈I} ( Σ_{l∈I} p_{i,l} ) α_j^{(n)} = α_j^{(n)},

and with this we get that {α_j^{(n)}}_{n≥1} is an increasing sequence. Then, as 0 ≤ p_{i,j}^{(n)} ≤ 1, this sequence is bounded; therefore it has a limit, which we denote by α_j.
Repeating the same argument for the maximum, we obtain

β_j^{(n+1)} = max_{i∈I} Σ_{l∈I} p_{i,l} p_{l,j}^{(n)} ≤ max_{i∈I} Σ_{l∈I} p_{i,l} ( max_{l∈I} p_{l,j}^{(n)} ) = max_{i∈I} ( Σ_{l∈I} p_{i,l} ) β_j^{(n)} = β_j^{(n)},

and now the sequence {β_j^{(n)}}_{n≥1} is decreasing. Then, as 0 ≤ p_{i,j}^{(n)} ≤ 1, this sequence is bounded; therefore it has a limit, which we denote by β_j.
For the moment, by definition we know that α_j^{(n)} ≤ p_{i,j}^{(n)} ≤ β_j^{(n)}, so if α_j = β_j then, for all j = 1, ..., N, the equality lim_{n→+∞} p_{i,j}^{(n)} = π_j is satisfied, where π_j is equal to α_j and also to β_j. Therefore, to show the desired result, it is necessary to prove, for all j = 1, ..., N, that

lim_{n→+∞} ( β_j^{(n)} − α_j^{(n)} ) = 0,

and then we have that β_j = α_j. In the following lines, we analyze this limit.
Firstly, we define θ as θ = min_{i,j∈I} p_{i,j}^{(k)} > 0. Then we have

p_{i,j}^{(k+n)} = Σ_{l∈I} p_{i,l}^{(k)} p_{l,j}^{(n)} = Σ_{l∈I} ( p_{i,l}^{(k)} − θ p_{j,l}^{(n)} ) p_{l,j}^{(n)} + θ Σ_{l∈I} p_{j,l}^{(n)} p_{l,j}^{(n)} = Σ_{l∈I} ( p_{i,l}^{(k)} − θ p_{j,l}^{(n)} ) p_{l,j}^{(n)} + θ p_{j,j}^{(2n)}.

To continue, as p_{i,l}^{(k)} ≥ θ and we know that p_{j,l}^{(n)} ≤ 1, we have the relationship p_{i,l}^{(k)} − θ p_{j,l}^{(n)} ≥ 0. Therefore, using this, we obtain

p_{i,j}^{(k+n)} ≥ Σ_{l∈I} ( p_{i,l}^{(k)} − θ p_{j,l}^{(n)} ) ( min_{l∈I} p_{l,j}^{(n)} ) + θ p_{j,j}^{(2n)} = Σ_{l∈I} p_{i,l}^{(k)} α_j^{(n)} − θ Σ_{l∈I} p_{j,l}^{(n)} α_j^{(n)} + θ p_{j,j}^{(2n)} = (1 − θ) α_j^{(n)} + θ p_{j,j}^{(2n)},

where in these equalities we use that Σ_{l∈I} p_{i,l}^{(k)} = 1 and Σ_{l∈I} p_{j,l}^{(n)} = 1. Hence, taking the minimum over i in what we have just calculated, we have α_j^{(k+n)} ≥ (1 − θ) α_j^{(n)} + θ p_{j,j}^{(2n)}.
Using the same argument, we get β_j^{(k+n)} ≤ (1 − θ) β_j^{(n)} + θ p_{j,j}^{(2n)}.
Now, combining the two inequalities which we have just calculated, we have β_j^{(k+n)} − α_j^{(k+n)} ≤ ( β_j^{(n)} − α_j^{(n)} )(1 − θ). Then, by induction we obtain, for all r ≥ 1, the following:

0 ≤ β_j^{(rk+n)} − α_j^{(rk+n)} ≤ ( β_j^{(n)} − α_j^{(n)} )(1 − θ)^r,

and with this we get that lim_{r→+∞} ( β_j^{(n)} − α_j^{(n)} )(1 − θ)^r = 0, because θ > 0. Therefore, there exists a sequence {n_r}_{r≥1} which satisfies

lim_{r→+∞} ( β_j^{(n_r)} − α_j^{(n_r)} ) = 0.

Hence, since in the sequence {β_j^{(n)} − α_j^{(n)}} each term is less than or equal to the previous one, we get lim_{n→+∞} ( β_j^{(n)} − α_j^{(n)} ) = 0, as we want, and this implies that β_j = α_j, which is the desired equality.
Finally, we check that π is an invariant distribution. Firstly, we need to see that π is a non-degenerate probability distribution; for this, suppose that n ≥ k, so that α_j^{(n)} ≥ α_j^{(k)} ≥ min_{i,j∈I} p_{i,j}^{(k)} = θ > 0. Then we get

π_j = α_j = lim_{n→+∞} α_j^{(n)} ≥ θ > 0   for all j ∈ I.

Moreover, since I is finite, Σ_{j∈I} π_j = lim_{n→+∞} Σ_{j∈I} p_{i,j}^{(n)} = 1, so π is a probability distribution. Finally, letting n → +∞ in the Chapman-Kolmogorov identity p_{i,j}^{(n+1)} = Σ_{l∈I} p_{i,l}^{(n)} p_{l,j}, we obtain π_j = Σ_{l∈I} π_l p_{l,j}, that is, πΠ = π, so π is indeed an invariant distribution.
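The convergence used in this proof can be observed numerically; the following minimal sketch (an informal check, assuming the numpy library and an arbitrary positive test matrix, so that the hypothesis holds with k = 1) computes α_j^{(n)} and β_j^{(n)} for increasing n and shows them approaching a common limit:

import numpy as np

# Arbitrary stochastic matrix with all entries positive (hypothesis with k = 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

Pn = np.eye(3)
for n in range(1, 21):
    Pn = Pn @ P                          # Pn = P^(n)
    alpha = Pn.min(axis=0)               # alpha_j^(n) = min_i p_ij^(n), increasing in n
    beta = Pn.max(axis=0)                # beta_j^(n)  = max_i p_ij^(n), decreasing in n
    if n % 5 == 0:
        print(n, alpha.round(6), beta.round(6))
# alpha and beta squeeze together, so every row of P^(n) converges to the same vector pi.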
To continue, once studied the ergodic theorem for finite state space, we show that the converse implication also holds: if, for all i, j ∈ I, the limit lim_{n→+∞} p_{i,j}^{(n)} = π_j exists with π_j > 0, then there exists k ≥ 1 such that min_{i,j∈I} p_{i,j}^{(k)} > 0.

Proof. Firstly, given the equality lim_{n→+∞} p_{i,j}^{(n)} = π_j > 0 and since I is finite, for each j = 1, ..., N there exists n_j ≥ 1 such that p_{i,j}^{(n)} > 0 for all n ≥ n_j and all i ∈ I. Therefore, for all j = 1, ..., N and all n ≥ n_j, min_{i∈I} p_{i,j}^{(n)} > 0. Now, we define k = max(n_1, ..., n_N) and then, for all j = 1, ..., N, we get min_{i∈I} p_{i,j}^{(k)} > 0, hence min_{i,j∈I} p_{i,j}^{(k)} > 0, and this is what we wanted to prove.
The ergodicity of Markov chains on an arbitrary finite state space can be char-
acterized by the following notion.
Definition 7.2.5. A Markov chain with finite state space is regular if there exists
a power of the transition matrix, Π, with only positive entries.
Example 7.2.6. Consider the Markov chain with state space I = {1, 2, 3} and transition matrix

Π =
( 1/2  1/2   0  )
( 1/5  1/5  3/5 )
(  1    0    0  )

To continue, we will study if the Markov chain is regular or not. For this, we are going to compute the successive powers of the stochastic matrix in order to check if there exists a matrix where all of its entries are non-null. Firstly, we calculate Π^(2) and we get

Π^(2) =
( 7/20   7/20   3/10 )
( 37/50  7/50   3/25 )
( 1/2    1/2     0   )

with this we have an entry which is null, therefore, we have to calculate Π^(3), which is the following

Π^(3) =
( 109/200  49/200   21/100 )
( 259/500  199/500  21/250 )
( 7/20     7/20     3/10   )

Now, all entries of the matrix are positive, therefore, this implies that the Markov chain is regular.
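The same check can be carried out numerically; this small sketch (an informal illustration assuming the numpy library, with the matrix of the example entered in decimal form) computes successive powers of Π and stops at the first one with strictly positive entries:

import numpy as np

# Transition matrix of Example 7.2.6 in decimal form.
Pi = np.array([[0.5, 0.5, 0.0],
               [0.2, 0.2, 0.6],
               [1.0, 0.0, 0.0]])

power = Pi.copy()
for n in range(2, 6):
    power = power @ Pi                       # power = Pi^(n)
    if (power > 0).all():
        print(f"Pi^{n} has only positive entries, so the chain is regular")
        break
    print(f"Pi^{n} still contains a zero entry")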
To continue, using the concept we have just introduced, regularity, we prove that the invariant distribution π of Theorem 7.2.2 is unique.
Proof. Firstly, since the stochastic matrix is regular, there exists k ≥ 1 such that p_{i,j}^{(k)} > 0 for all i, j ∈ I, and hence, by Theorem 7.2.2, we have the equality lim_{n→+∞} p_{i,j}^{(n)} = π_j. To continue, let α = (α_k ; k ∈ I) be the column vector with all its components equal to 1; then we can define the matrix Q = (q_{k,j} ; k, j ∈ I) as Q = απ, in other words, each row of Q is the same and it is the non-degenerate probability distribution π. Once this is defined, we have that lim_{n→+∞} Π^(n) = Q.
Now, we show that the invariant distribution π is unique. For this, we suppose that there exists another invariant distribution β = (β_i ; i ∈ I); therefore, it has to satisfy the equalities βΠ = β and βΠ^(n) = β for all n ≥ 1. But, as β is an invariant distribution, βα = Σ_{i∈I} β_i = 1, which implies that

β = lim_{n→+∞} βΠ^(n) = βQ = β(απ) = (βα)π = π,

and hence β = π, and this shows that the invariant distribution π = (π_j ; j ∈ I) is unique.
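As a numerical companion to this uniqueness argument (a minimal sketch, assuming numpy and using the matrix of Example 7.2.6 as test data), one can compare the rows of a high power of Π with the normalized left eigenvector of eigenvalue 1:

import numpy as np

# Transition matrix of Example 7.2.6 (a regular chain).
Pi = np.array([[0.5, 0.5, 0.0],
               [0.2, 0.2, 0.6],
               [1.0, 0.0, 0.0]])

# Every row of Pi^n converges to the same vector pi.
Q = np.linalg.matrix_power(Pi, 60)
print("rows of Pi^60:")
print(Q.round(6))

# Independently, pi is the normalized left eigenvector associated with eigenvalue 1.
w, v = np.linalg.eig(Pi.T)
k = np.argmin(np.abs(w - 1.0))
pi = np.real(v[:, k])
pi = pi / pi.sum()
print("invariant distribution pi:", pi.round(6))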
8 Conclusions
This project has a theoretical approach, so it is difficult to draw specific conclusions or results. However, the research and analytical work carried out constitute a good framework for developing future studies that look for practical conclusions using the basics of Markov chains.
From my point of view, this thesis has been useful as a way to apply several of the probability concepts we have learnt along the degree. For example, we have used the concepts of conditional probability and independence in the study of the Chapman-Kolmogorov equation.
Furthermore, I have been able to learn the basics of Markov chains, as well as their different properties and characteristics, an aspect which was unknown to me until I carried out this project. However, I have not only learnt specific mathematical aspects: this project has been a great challenge for me and has allowed me to improve my skills related to researching information and presenting ideas in a clear and accurate manner.
Finally, I want to point out that the project has focused only on discrete time Markov chains. Nevertheless, there exists another type of chains, continuous time Markov chains, which have not been included in the scope of this thesis because I preferred to focus the analysis on a specific topic. In this regard, it could be interesting to expand the scope of this study and to carry out a similar analysis focused on continuous time chains.
A Basic probability concepts
In this section we will present the most relevant concepts of probability theory, which will be very useful for the study of Markov chains.
In this appendix we focus on probability on countable sample spaces. For this, we will study the concepts of conditional probability and independence. Moreover, we will focus on random variables, specifically discrete random variables, since they are the ones that allow us to define the concept of Markov chain.
Once given the last definition, we have the following result.
Corollary A.1.3. Assume that Ω = {w_i ; i ∈ I} where I is finite, so I = {1, ..., n}, and consider that the sample space is equiprobable. Then the probabilities {p_1, ..., p_n} take the values p_i = 1/n for all i = 1, ..., n. Moreover, the probability of any event A ⊆ Ω is given by

P(A) = Σ_{i : w_i ∈ A} p_i = Σ_{i : w_i ∈ A} 1/n = (1/#Ω) Σ_{i : w_i ∈ A} 1 = #A / #Ω.
1. P(∅) = 0, P(Ω) = 1.
2. For all A ∈ F, P(A) ∈ [0, 1].
3. Assume that A, B ∈ F are disjoint; then P(A ∪ B) = P(A) + P(B). This property can be stated more generally: if we consider A_1, A_2, ..., A_n mutually disjoint, in this case we have

P( ∪_{i=1}^{n} A_i ) = Σ_{i=1}^{n} P(A_i).

The following results are three important formulas related to the probability defined above.
2. Theorem (Total probabilities) Given the partition B_1, ..., B_n of Ω ({B_i ; 1 ≤ i ≤ n} ⊆ F) with P(B_i) > 0, ∀i = 1, ..., n. Then, for all A ∈ F (event) we have

P(A) = Σ_{i=1}^{n} P(A ∩ B_i) = Σ_{i=1}^{n} P(A|B_i) P(B_i).
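A quick numerical check of this formula (a toy illustration with made-up probabilities, not taken from the text) can be written in a few lines of Python:

# Made-up partition B1, B2, B3 with P(Bi) and conditional probabilities P(A|Bi).
P_B = [0.5, 0.3, 0.2]
P_A_given_B = [0.1, 0.4, 0.7]

# Total probability: P(A) = sum_i P(A|Bi) P(Bi).
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
print(P_A)          # 0.05 + 0.12 + 0.14 = 0.31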
After that, we study the concept of independence and some of its properties.
Definition A.2.2. Two events A, B ∈ F are independent if
P(A ∩ B) = P(A)P(B).
Once given this specific definition of independence, we can consider the general
one.
Definition A.2.3. The events A_1, ..., A_n ∈ F are independent if every finite subset {A_{i_1}, ..., A_{i_k}} ⊆ {A_1, ..., A_n} satisfies

P(A_{i_1} ∩ ... ∩ A_{i_k}) = P(A_{i_1}) · · · P(A_{i_k}).
Now we want to present a result that links the two concepts we have studied so far.
Proposition A.2.4. Let A, B ∈ F be events whose probabilities are strictly positive; then
Definition A.2.5. Consider the events A1 , A2 , A3 ∈ F such that P (A3 ) > 0, then
A1 and A2 are conditionally independent given A3 if
P (A1 ∩ A2 |A3 ) = P (A1 |A3 )P (A2 |A3 ).
Definition A.2.6. Given the events {A1 , A2 , A3 }, these form a Markov family if
P (A3 |A1 ∩ A2 ) = P (A3 |A2 ).
Observation A.2.7. If we consider A_1, A_2, A_3 as a chronological sequence where A_1 is the past, A_2 is the present and A_3 is the future, then P(A_3|A_1 ∩ A_2) = P(A_3|A_2) shows that the future only depends on the present (not on the past).
Proposition A.2.8. Let A_1, A_2, A_3 ∈ F be events. Then A_1 and A_3 are conditionally independent given A_2 if, and only if, {A_1, A_2, A_3} is a Markov family (in other words, P(A_3|A_1 ∩ A_2) = P(A_3|A_2)).
Observation A.2.9. The last proposition can be summarized with the following
P (A1 ∩ A3 |A2 ) = P (A1 |A2 )P (A3 |A2 ) ⇔ P (A3 |A1 ∩ A2 ) = P (A3 |A2 ).
Proof. Firstly, we show the right implication. Given A_1, A_2, A_3 ∈ F and using the equality P(A_1 ∩ A_3|A_2) = P(A_1|A_2)P(A_3|A_2), we have

P(A_3|A_1 ∩ A_2) = P(A_1 ∩ A_2 ∩ A_3) / P(A_1 ∩ A_2) = P(A_1 ∩ A_3|A_2)P(A_2) / P(A_1 ∩ A_2)
= P(A_1|A_2)P(A_3|A_2)P(A_2) / ( P(A_1|A_2)P(A_2) ) = P(A_3|A_2).

Now we prove the left implication. Let A_1, A_2, A_3 ∈ F and assume P(A_3|A_1 ∩ A_2) = P(A_3|A_2); then

P(A_1 ∩ A_3|A_2) = P(A_1 ∩ A_2 ∩ A_3) / P(A_2) = P(A_3|A_1 ∩ A_2)P(A_1 ∩ A_2) / P(A_2)
= P(A_3|A_2)P(A_1|A_2)P(A_2) / P(A_2) = P(A_3|A_2)P(A_1|A_2).
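This equivalence can also be checked numerically on a toy example; the following sketch (an informal illustration with a made-up joint law built so that the three events form a Markov family) computes both conditional probabilities and shows they coincide:

import itertools

# Joint law of three binary variables built as p(x1) p(x2|x1) p(x3|x2),
# so the events A_k = {x_k = 1} form a Markov family (made-up numbers).
p1 = {0: 0.6, 1: 0.4}
p2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p2[x1][x2]
p3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}   # p3[x2][x3]

joint = {(a, b, c): p1[a] * p2[a][b] * p3[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def prob(pred):
    return sum(p for x, p in joint.items() if pred(x))

P_A3_given_A1A2 = prob(lambda x: x[0] == 1 and x[1] == 1 and x[2] == 1) / \
                  prob(lambda x: x[0] == 1 and x[1] == 1)
P_A3_given_A2 = prob(lambda x: x[1] == 1 and x[2] == 1) / prob(lambda x: x[1] == 1)
print(P_A3_given_A1A2, P_A3_given_A2)   # equal: the Markov family property holds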
Observation A.3.2. The Borel σ-algebra is the σ-algebra generated by the collection of open (equivalently, closed) sets of R.
Definition A.3.3. The law (or distribution) of a random variable X is the probability P ◦ X^{-1} such that (P ◦ X^{-1})(B) = P(X^{-1}(B)) = P(X ∈ B) for every Borel set B ⊆ R.
Definition A.3.5. A random variable X is discrete if the set X(Ω) is finite or countable; in this case we can write X : Ω −→ I where I is finite or countable, and we can express the values of the discrete random variable as {a_i ; i ∈ I}.
Definition A.3.6. The law (or distribution) of a discrete random variable is called probability mass function and it is defined by

p(a_i) = P(X = a_i), for i ∈ I,   with   Σ_{i∈I} p(a_i) = Σ_{i∈I} P(X = a_i) = 1.
To conclude this section, we talk about the expected value focusing on discrete
random variables.
B Examples
Example B.0.1. Consider a coin repeatedly tossed (p is the probability of heads and q = 1 − p that of tails). Let H_n and T_n be the number of heads and tails in the first n tosses; prove that X_n = (−1)^{2T_n + H_n} is a homogeneous Markov chain.
We define, for every k = 1, ..., n, the random variables

ξ_k = 1 if the k-th toss is heads,   ξ_k = 0 if the k-th toss is tails,

which are independent, so we can express the number of heads and tails in the following manner:

H_n = Σ_{k=1}^{n} ξ_k   and   T_n = n − Σ_{k=1}^{n} ξ_k.

This implies that

X_n = (−1)^{2T_n + H_n} = (−1)^{2(n − Σ_{k=1}^{n} ξ_k) + Σ_{k=1}^{n} ξ_k} = (−1)^{2n − Σ_{k=1}^{n} ξ_k}

and X_{n+1} = (−1)^{2(n+1) − Σ_{k=1}^{n+1} ξ_k} = X_n (−1)^{2 − ξ_{n+1}} = X_n (−1)^{−ξ_{n+1}}. Now, using conditional probability and independence, we see that X_n is a homogeneous Markov chain: for all i, j ∈ I we have

P(X_{n+1} = j | X_n = i) = P(X_n (−1)^{−ξ_{n+1}} = j | X_n = i) = P({(−1)^{−ξ_{n+1}} = j/i} ∩ {X_n = i}) / P(X_n = i)
= P((−1)^{−ξ_{n+1}} = j/i) P(X_n = i) / P(X_n = i) = P((−1)^{−ξ_{n+1}} = j/i) := p_{i,j}.
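As an informal check of this example (a minimal simulation sketch assuming the numpy library, with an arbitrary value of p), one can simulate the tosses and estimate the transition probabilities of X_n directly from a trajectory:

import numpy as np

rng = np.random.default_rng(2)
p, n = 0.3, 200_000                   # arbitrary probability of heads and sample size
heads = rng.random(n) < p             # xi_k = 1 for heads, 0 for tails
H = np.cumsum(heads)                  # H_n, number of heads so far
T = np.arange(1, n + 1) - H           # T_n, number of tails so far
X = (-1) ** (2 * T + H)               # the chain X_n

# Estimate P(X_{n+1} = j | X_n = i) from the trajectory.
for i in (1, -1):
    idx = np.where(X[:-1] == i)[0]
    for j in (1, -1):
        print(f"P(X_n+1 = {j:2d} | X_n = {i:2d}) ~", round((X[idx + 1] == j).mean(), 4))
# The sign flips exactly when a head occurs, so the estimates are close to p and 1 - p.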
Example B.0.4. A fisherman and his boats. A fisherman owns 4 boats that he rents to tourists. Each rented boat can be damaged with probability p, independently of the condition of the other boats. The fisherman can repair his boats at night, but only one each night. Because of municipal rules, he cannot rent any boat if there are fewer than 2 boats available. If one day he can rent his boats, he will offer all the boats available. Study the Markov chain associated to this problem.
Given p, the probability that a boat is not damaged is q = 1 − p, so the stochastic matrix is

Π =
(  0      1       0        0          )
( p^2    2pq     q^2       0          )
( p^3   3p^2q   3pq^2     q^3         )
( p^4   4p^3q   6p^2q^2   4pq^3 + q^4 )
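A short sketch can rebuild this matrix from binomial probabilities; it is only an illustration under one reading of the rules that reproduces the matrix above (the state is taken to be the number of boats in working order, 1 to 4, and exactly one damaged boat is repaired each night whenever there is one):

from math import comb

def boat_matrix(p):
    """4x4 matrix; state k = boats in working order at the start of the day (1..4)."""
    q = 1 - p
    P = [[0.0] * 4 for _ in range(4)]
    P[0][1] = 1.0                        # with 1 boat he cannot rent; one repair -> 2 boats
    for k in (2, 3, 4):                  # k boats rented, D ~ Binomial(k, p) get damaged
        for d in range(k + 1):
            damaged_total = (4 - k) + d
            nxt = k - d + (1 if damaged_total >= 1 else 0)   # one repair at night
            P[k - 1][nxt - 1] += comb(k, d) * p ** d * q ** (k - d)
    return P

for row in boat_matrix(0.3):
    print([round(x, 4) for x in row])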
Example B.0.5. Consider the Markov chain with state space I = {1, 2} and transition matrix

Π =
( 1 − a     a   )
(   b     1 − b )

where a, b > 0 and a + b > 0. Now, we want to compute the m-step transition probability between every pair of states. For this, it is necessary to compute the m-th power of the stochastic matrix, Π^(m). Firstly, we have to calculate the eigenvalues of Π; for this, we solve the characteristic equation

det(Π − λ Id) = 0  ⇒  (1 − a − λ)(1 − b − λ) − ab = 0  ⇒  λ^2 − (2 − a − b)λ + (1 − a − b) = 0,

thus, solving the equation we obtain the eigenvalues 1 and 1 − a − b, whose associated eigenvectors are (1, 1) and (a, −b), respectively. With this, we can define the following matrices:

Q =
( 1    a )
( 1   −b )

Q^{-1} = (1/(a + b)) ·
( b    a )
( 1   −1 )

D =
( 1         0      )
( 0     1 − a − b  )

Now, we can rewrite the transition matrix Π as follows: Π = Q D Q^{-1}, and hence this implies that Π^(m) = Q D^(m) Q^{-1}, so we get

Π^(m) = (1/(a + b)) ·
( b   a )
( b   a )
+ ((1 − a − b)^m / (a + b)) ·
(  a   −a )
( −b    b )

with this we have that each matrix entry is the m-step transition probability of going from one state to another.
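The closed form just obtained can be verified numerically; the following minimal sketch (assuming numpy, with arbitrary test values of a and b) compares it with a directly computed matrix power:

import numpy as np

a, b = 0.3, 0.1                              # arbitrary test values with a, b > 0
Pi = np.array([[1 - a, a],
               [b, 1 - b]])

m = 7
closed_form = (1 / (a + b)) * np.array([[b, a], [b, a]]) \
    + ((1 - a - b) ** m / (a + b)) * np.array([[a, -a], [-b, b]])

print(np.linalg.matrix_power(Pi, m).round(6))
print(closed_form.round(6))
print(np.allclose(np.linalg.matrix_power(Pi, m), closed_form))   # True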
Example B.0.6. Guardian of the tower. A guardian watches from the four corners of a tower in the following manner: after staying 5 minutes in a corner he tosses a coin to decide whether he moves to the left (heads) or to the right (tails). This process is repeated indefinitely. Study the Markov chain associated to the process. If p = 1/2 (p is the probability of obtaining heads), compute Π^n.
Given p = 1/2, the probability of obtaining tails is q = 1 − p = 1/2, so the stochastic matrix is

Π =
(  0    1/2    0    1/2 )
( 1/2    0    1/2    0  )
(  0    1/2    0    1/2 )
( 1/2    0    1/2    0  )
In this case, to obtain the matrix Π^(n) we have to consider two cases. Firstly, if n is an even number, then

Π^(n) =
( 1/2    0    1/2    0  )
(  0    1/2    0    1/2 )
( 1/2    0    1/2    0  )
(  0    1/2    0    1/2 )

Secondly, if n is an odd number, then Π^(n) = Π, that is,

Π^(n) =
(  0    1/2    0    1/2 )
( 1/2    0    1/2    0  )
(  0    1/2    0    1/2 )
( 1/2    0    1/2    0  )
(a) Suppose that the starting corner is selected randomly. If X_n is the random variable which indicates where the guard is located after 5n minutes, compute the law of X_n.
In this case, as the starting corner is selected randomly, we have P(X_0 = i) = 1/4 for every i ∈ {1, 2, 3, 4}, so the initial distribution is the vector

ν = (1/4, 1/4, 1/4, 1/4).

Firstly, if n is an even number, the law of X_n is

ν Π^(n) = (1/4, 1/4, 1/4, 1/4) Π^(n) = (1/4, 1/4, 1/4, 1/4).
Secondly, if n is an odd number, then the law of X_n is

ν Π^(n) = (1/4, 1/4, 1/4, 1/4) Π = (1/4, 1/4, 1/4, 1/4),

so for every n the law of X_n is the uniform distribution on the four corners.
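The alternating powers and the invariance of the uniform law can be checked with a few lines of code (an informal sketch assuming numpy):

import numpy as np

Pi = 0.5 * np.array([[0, 1, 0, 1],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [1, 0, 1, 0]])

print(np.linalg.matrix_power(Pi, 2))         # the matrix obtained for even n
print(np.linalg.matrix_power(Pi, 3))         # odd powers give back Pi itself
nu = np.full(4, 0.25)
print(nu @ np.linalg.matrix_power(Pi, 5))    # the law of X_n stays uniform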
Example B.0.7. Consider the Markov chain with the state space I = {1, 2, 3, 4, 5} and transition matrix

Π =
( 1/2   1/2    0     0     0  )
( 1/3   1/3   1/3    0     0  )
(  0     0     1     0     0  )
(  0     0    1/2   1/2    0  )
( 1/2    0     0     0    1/2 )
The communicating classes are C1 = {1, 2}, C2 = {3}, C3 = {4} and C4 = {5}. The class C1 is recurrent because once we reach it we cannot leave it, and it is finite and irreducible. The state of the class C2 is recurrent, in particular it is an absorbing state, because once the chain is in state 3 it stays there forever. The state of the class C3 is transient because, if we start in state 4, we are able to move to state 3 and once there we cannot return back to state 4. The state of the class C4 is transient because, if we start in state 5, we are able to move to state 1 and once there we cannot return back to state 5.
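These classes can also be obtained mechanically; the sketch below (an informal illustration assuming numpy) computes mutual reachability from the matrix of the example and groups the states accordingly:

import numpy as np

# Transition matrix of Example B.0.7 (states 1..5 stored as indices 0..4).
P = np.array([[1/2, 1/2, 0,   0,   0  ],
              [1/3, 1/3, 1/3, 0,   0  ],
              [0,   0,   1,   0,   0  ],
              [0,   0,   1/2, 1/2, 0  ],
              [1/2, 0,   0,   0,   1/2]])

# i leads to j iff (I + P)^m has a positive (i, j) entry for m large enough.
n = len(P)
reach = (np.eye(n) + P) > 0
for _ in range(n):
    reach = (reach.astype(float) @ reach.astype(float)) > 0   # double the path length

classes = {}
for i in range(n):
    key = tuple(j for j in range(n) if reach[i, j] and reach[j, i])
    classes.setdefault(key, []).append(i + 1)
print(list(classes.values()))            # [[1, 2], [3], [4], [5]]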
Example B.0.8. Consider the Markov chain with the state space I = {1, 2, 3, 4, 5} and transition matrix

Π =
(  0     0     0     1     0  )
(  0    1/2    0    1/2    0  )
( 1/3   1/3    0     0    1/3 )
(  0     0     1     0     0  )
(  0     0    1/2    0    1/2 )
C1 = {1, 2, 3, 4, 5}.
In this case we have a unique class, C1, which contains all the states of the Markov chain. Moreover, from the transition matrix it is easy to see that all the states communicate, so the chain is irreducible; in particular, the class C1 is recurrent because we can visit every state infinitely many times.
Example B.0.9. Consider the Markov chain with the state space I = {1, 2, 3, 4} and transition matrix

Π =
(  0    1/2   1/2    0  )
(  0     0    1/2   1/2 )
(  0    1/2    0    1/2 )
(  0    1/2   1/2    0  )
In this case, we have to study whether the states of the Markov chain are essential or inessential. First, we note that from state 1 we can leave but we cannot return to it, so this state is inessential. Now, from state 2 we can go to state 3 and once there we can return to state 2 or go to state 4; if we go to state 4, from there we can return to state 2. With this we have that state 2 is essential. The same argument is valid for states 3 and 4, which are also essential.