Conditional Probability


UNIT 1 CONDITIONAL PROBABILITY

Structure

1.1 Introduction
    Objectives
1.2 Conditional Probability
1.3 Compound Probability
1.4 Bayes' Theorem
1.5 Conditional Distribution
1.6 Conditional Expectations
1.7 Summary
1.8 Solutions/Answers

1.1 INTRODUCTION

This unit introduces you to the prerequisites of probability and statistics that you studied as an undergraduate. We recall here the concepts of conditional probability, compound probability, Bayes' theorem, conditional distribution, and conditional expectation. These are fundamental to the study of probability and statistics.
The history of probability can be traced back to the beginning of mankind in the games of chance. Archaeologists have found evidence of games of chance in prehistoric digs, showing that gaming and gambling have been a major pastime for the people of Greece, Egypt, China, and India since the dawn of civilization. However, it wasn't until the 17th century that a rigorous mathematics of probability was developed by the French mathematicians Pierre de Fermat and Blaise Pascal. The basic concept of conditional probability and the famous Bayes' theorem were the pioneering work of Thomas Bayes (1707-1761). However, it was Laplace who generalized, completed and consummated the ideas provided by his predecessors in his book Théorie analytique des probabilités in 1812. It gave a comprehensive system of probability theory (the elements of probability calculus - addition, multiplication, division - were by that time firmly established).
We shall start our discussion with conditional probability in Sec. 1.2. Here, we present its concept and definition along with some examples. In Sec. 1.3, we learn the Compound Probability Law. In Sec. 1.4, we recall the Law of Total Probability along with the very widely used Bayes' theorem. In Sec. 1.5, we discuss the conditional distribution. Finally, we conclude by defining the conditional expectation of random variables, which is very important, and give some examples of it.
Objectives
After studying this unit, you should be able to:
define and compute the conditional probability of an event;
distinguish between the conditional and unconditional probability of an event;
evaluate the change in the probability of an event after the occurrence of another event;
apply Bayes' theorem in different situations;
apply the concept of conditional distribution and conditional expectation and their important properties in various problems.

1.2 CONDITIONAL PROBABILITY


Let us start with an example to understand the concept of conditional probability. A teacher gave two tests in succession to the students of a class. 75% of the students of the class passed the first test, 35% of the class passed the second test, and 15% of the class passed both tests. We may want to ask, "What percentage of the students who passed the first test also passed the second test?" Through this example we shall illustrate the concept of conditional probability.

Suppose we want to find the probability of the event that the second test will be passed by a student, given that he/she has passed the first test. Here, it is given that 35% of the students passed the second test and, therefore, the probability that a student of the class will pass the second test is 0.35. This is an unconditional probability of the event that a student passes the second test. When we are given the prior information that a student has passed the first test and we want to know the probability of the same event - the student passes the second test - under this condition, then the probability will not remain the same. Since, out of the 75% of the students who passed the first test, only 15% could pass the second test, 15/75 = 20% of those students who passed the first test also passed the second test. This gives the conditional probability of the event that a student passes the second test given that he/she has passed the first test, and the probability will be equal to 0.20.

This example can also be shown clearly by a Venn diagram as depicted in Fig.1.

Let the event A = a student who passed the first test, and the event B = a student who passed the second test; clearly, the event A ∩ B = a student who passed both the tests.

It is given that P(B) = 0.35, which is the probability of the event B without any additional condition, i.e. the unconditional probability; this means that the probability is evaluated considering the full class of students as the sample space. If we want to find the probability of the event that a student has passed the second test, given that he/she has already passed the first test, then our sample space reduces to the set of those students who have passed the first test, i.e. the event A. This probability is evaluated as the ratio of the probability of the part of B included in A (which is P(A ∩ B)) to the probability of A. This ratio comes out to be 0.20 or 20%, as evaluated above. In this case, this probability is termed the conditional probability of event B (passed the second test), given that the event A (passed the first test) has happened.

This discussion enables us to introduce the following formal definitions. In what


follows we assume that we are given a random experiment with discrete sample space
S , and all relevant events are subsets of S .

Definition 1: Let S be a sample space of an experiment. Let A and B be any two events defined on this sample space with P(B) > 0 (not allowing an event of probability zero). The conditional probability of an event A, given that the other event B has happened, is denoted by the symbol P(A | B), read as "the probability of A, given B", and is defined as follows:

P(A | B) = P(A ∩ B)/P(B).                                   (1)

Similarly, when P(A) > 0, we may define the conditional probability of B given A as:

P(B | A) = P(A ∩ B)/P(A).                                   (2)



Conditional probability P(A | B) is a set function defined on the subsets of the event B.

It can easily be verified that P(A | B) satisfies all the axioms of probability.

The probability which is not conditional is called the unconditional probability, or simply the probability. Also, there does not exist any ordinal relationship between conditional and unconditional probabilities. Depending on the size of the new sample space (here, we denote it as B) under the condition, and the size of A ∩ B, the conditional probability P(A | B) may be smaller or larger than the unconditional probability P(A).
Now, let us look at some important properties of conditional probability.

Properties of Conditional Probability

From the above definition we may easily verify the following intuitive results for any three events A, B, and C of a sample space S. Let us discuss a few properties of conditional probability.

1. P(A | A) = 1, which is the conditional probability of the reduced sample space itself. P(A | A), the probability of event A when A has happened, is clearly 1. Also, from (1), P(A | A) = P(A ∩ A)/P(A) = P(A)/P(A) = 1. This is the axiom of normedness of the probability measure.

2. P(A | B) ≥ 0. Since the numerator and the denominator in Eqn.(1) of conditional probability are both non-negative, this is the non-negativity axiom of probability.

3. P(A | B) ≤ 1. The event in the numerator of Eqn.(1), A ∩ B, is always a subset of the event in the denominator, B. Hence, the result follows from the monotone property of probability.

4. P(A | B) = 0 if the events A and B are mutually exclusive. If the events A and B are mutually exclusive, then A ∩ B is empty, so the numerator in Eqn.(1) is zero. Therefore, the result follows.

7. P(A' | B) = 1 − P(A | B), where P(B) > 0.

Proof: From set theory, we have

B = (A ∩ B) ∪ (A' ∩ B),

which is also shown in the Venn diagram in Fig.2. Since (A ∩ B) and (A' ∩ B) are two disjoint events, using the addition law of probability for disjoint events, we get:

P(B) = P(A ∩ B) + P(A' ∩ B).

Dividing both sides by P(B), we get

1 = P(A | B) + P(A' | B), and we get the result.


8. P(A ∪ B | C) = P(A | C) + P(B | C) − P(A ∩ B | C), where P(C) > 0. This is parallel to the addition law of probability.

Proof: From the distributive law of set theory for the three sets A, B, and C, we know that:

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).

Using the addition law of probability, we get:

P((A ∪ B) ∩ C) = P(A ∩ C) + P(B ∩ C) − P(A ∩ B ∩ C).

Dividing both sides by P(C), we have:

P(A ∪ B | C) = P(A | C) + P(B | C) − P(A ∩ B | C)

and, hence, we get the result.

In terms of the mathematical or classical interpretation of probability, in case all outcomes in S are considered equally likely, the conditional probability of an event A, given the event B, P(A | B), can also be defined as:

P(A | B) = n(A ∩ B)/n(B),                                   (3)

where n(B) denotes the number of outcomes favourable to event B, and, similarly, n(A ∩ B) denotes the number of outcomes favourable to the event A ∩ B.
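As a quick illustration of the counting form in Eqn.(3), here is a minimal Python sketch (the two-dice experiment and the events chosen are illustrative, not taken from the text): it enumerates an equally likely sample space and computes P(A | B) as n(A ∩ B)/n(B).

```python
from itertools import product

# Equally likely sample space: all outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

# Illustrative events: A = "the sum is 8", B = "the first die shows at least 4".
A = {s for s in S if s[0] + s[1] == 8}
B = {s for s in S if s[0] >= 4}

# Classical definition: P(A | B) = n(A ∩ B) / n(B).
p_A_given_B = len(A & B) / len(B)
p_A = len(A) / len(S)

print(p_A_given_B)  # 3/18 ≈ 0.167
print(p_A)          # 5/36 ≈ 0.139, so conditioning on B changes the probability of A
```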

Let us consider some examples to illustrate the concept of conditional probability.

Example 1: Suppose that A and B are two events in an experiment with P(A) = 1/3, P(B) = 1/4, and P(A ∩ B) = 1/10. Find each of the following:
a) P(A | B)
b) P(B | A)
c) P(A' | B)
d) P(A | B')
e) P(A' | B')

Solution: a) By definition, we have P(A | B) = P(A ∩ B)/P(B) = (1/10)/(1/4) = 4/10 = 0.4.

b) Similarly, we have P(B | A) = P(A ∩ B)/P(A) = (1/10)/(1/3) = 3/10 = 0.3.

c) From Property 7 in this section, we have:
P(A' | B) = 1 − P(A | B) = 1 − 0.4 = 0.6.

d) For any two events A and B, we know P(A) = P(A ∩ B) + P(A ∩ B'),
thus P(A ∩ B') = 1/3 − 1/10 = 7/30, and, therefore:
P(A | B') = P(A ∩ B')/P(B') = (7/30)/(3/4) = 14/45 ≈ 0.31.

e) Similarly, P(A' | B') = 1 − P(A | B') = 1 − 14/45 = 31/45 ≈ 0.69.

Example 2: In a survey, the question "Do you smoke?" was asked of 100 people. The results are shown in the following table:

              Yes (A)   No (A')   Total
Male (B)        19        41        60
Female (B')     12        28        40
Total           31        69       100

An individual is chosen from them at random. Find:

a) What is the probability that the selected individual smokes?
b) What is the probability that the selected individual is a male and smokes?
c) What is the probability that the selected individual is a male?
d) What is the probability of the selected individual being a smoker if he was found to be a male?

Solution: Define the event A = the individual smokes, and B = the individual is male.

a) We want to find the probability that the selected individual smokes, P(A) = 31/100 = 0.31.

b) Here, we want to obtain the probability that the selected individual is a male and he smokes, which is P(A ∩ B) = 19/100 = 0.19.

c) The probability that the selected individual is a male is P(B) = 60/100 = 0.60.

d) Here, we want to find the conditional probability of a selected individual smoking given that he is a male: P(A | B) = P(A ∩ B)/P(B) = 0.19/0.60 = 19/60.

In the example given above, it may be noted that the difference between parts (b) and (d) is that we evaluate P(A ∩ B) when the simultaneous occurrence of both events A and B is required, whereas we evaluate P(A | B) when the chance of occurrence of event A given the event B is required, as it is the conditional probability.

Example 3: In a card game, suppose a player wants to draw two cards of the same suit in order to win. Out of a total of 52 cards, there are 13 cards in each suit. Suppose at the first draw the player draws a diamond. Now, the player wishes to draw a second diamond to win. What is the probability of his winning?

Solution: Let the event A denote getting a diamond at the first draw, and the event B denote getting a diamond at the second draw. Clearly, we have to find the conditional probability of B, given A:

P(B | A) = P(A ∩ B)/P(A).

Here, P(A) = 13/52 = 1/4 and P(A ∩ B) = (13 × 12)/(52 × 51) = 1/17.

Thus, P(B | A) = (1/17)/(1/4) = 4/17.

We may arrive at this result by reducing the sample space under the condition and by counting the outcomes favourable to picking a diamond in the reduced space. At the time of the second draw, one diamond has already been chosen, and there are only 12 diamonds remaining in a deck of the remaining 51 cards. Thus, the total number of possible outcomes will be 51, and the outcomes favourable to picking a diamond will be 12. Thus, P(B | A) = 12/51 = 4/17.
***
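The card example can also be checked by brute-force counting over the reduced sample space. The following Python sketch is only an illustration (the suit encoding is an assumption of the sketch): it enumerates all ordered draws of two distinct cards and computes P(B | A) as the proportion of draws starting with a diamond whose second card is also a diamond.

```python
from itertools import permutations

# Deck of 52 cards encoded as (suit, rank); suit 0 plays the role of diamonds here.
deck = [(s, r) for s in range(4) for r in range(13)]
DIAMOND = 0

# All ordered draws of two distinct cards are equally likely.
draws = list(permutations(deck, 2))

A = [d for d in draws if d[0][0] == DIAMOND]            # first card is a diamond
A_and_B = [d for d in A if d[1][0] == DIAMOND]          # both cards are diamonds

print(len(A_and_B) / len(A), 12 / 51)  # both equal 4/17 ≈ 0.235
```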
You may now try the following exercises.

E1) Suppose that A and B are events in a random experiment with P(B) > 0.
Prove each of the following:
a) If B ⊂ A, then P(A | B) = 1.
b) If A ⊂ B, then P(A | B) = P(A)/P(B).
c) If A and B are disjoint, then P(A | B) = 0.

E2) Suppose that A and B are events in a random experiment, each having positive probability. Show that:
a) P(A | B) > P(A) ⇔ P(B | A) > P(B) ⇔ P(A ∩ B) > P(A)P(B).
b) P(A | B) < P(A) ⇔ P(B | A) < P(B) ⇔ P(A ∩ B) < P(A)P(B).
c) P(A | B) = P(A) ⇔ P(B | A) = P(B) ⇔ P(A ∩ B) = P(A)P(B).

E3) The probability that it is Friday and that a student is absent is 0.03. There are 6 school days in a week. What is the probability that a student is absent given that today is Friday?

E4) Suppose that a bag contains 12 coins, of which 5 are fair, 4 are biased, each with the probability of heads being 1/3, and 3 are two-headed. A coin is chosen at random from the bag and tossed.
a) Find the probability that the coin is biased.
b) Find the probability that the biased coin was selected and the coin lands showing a head.
c) Given that the coin is biased, find the conditional probability of getting a head.

In the next section, we shall discuss the concept of compound probability.

1.3 COMPOUND PROBABILITY


From the definition of conditional probability given in Eqn.(1), we can easily derive the following multiplication rule by cross multiplication:

P(A ∩ B) = P(A) P(B | A)                                    (4)

and, similarly, from Eqn.(2), we get

P(A ∩ B) = P(B) P(A | B).                                   (5)

Now we want to extend this to three events. Let A, B and C be three events. Then we write

P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B).                  (6)

The above multiplication rule for three events can easily be extended by induction to n events A_1, A_2, ..., A_n belonging to a sample space as follows:

P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ... P(A_n | A_1 ∩ A_2 ∩ ... ∩ A_{n-1}).   (7)

These relations are called the compound probability law or multiplication law. This rule is applied to find the probability of the simultaneous occurrence of two or more events using conditional probability, as illustrated in the following examples and in the sketch below.
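Here is a minimal Python sketch of the multiplication law (7) for sequential sampling without replacement; the urn composition and the target sequence are illustrative only. It multiplies the conditional probabilities step by step and checks the result against a brute-force enumeration of equally likely ordered draws.

```python
from fractions import Fraction
from itertools import permutations

# Illustrative urn: 3 red (R) and 2 green (G) balls, drawn one by one without replacement.
urn = ["R", "R", "R", "G", "G"]
target = ["R", "G", "R"]   # event: first red, then green, then red

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2).
remaining = list(urn)
p_chain = Fraction(1)
for colour in target:
    p_chain *= Fraction(remaining.count(colour), len(remaining))
    remaining.remove(colour)

# Brute force: enumerate all equally likely ordered draws of three balls.
draws = list(permutations(range(len(urn)), 3))
favourable = sum(1 for d in draws if [urn[i] for i in d] == target)

print(p_chain, Fraction(favourable, len(draws)))  # both equal (3/5)(2/4)(2/3) = 1/5
```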

Example 4: A bag contains 5 white balls and 4 black balls. Two balls are drawn from the bag randomly, one by one, without replacement. Find the probability that the first ball is black and the second is white.

Solution: Let the events A = the first ball is black, and B = the second ball is white. Clearly, we have to find P(A ∩ B).

Here, P(A) = 4/9 and P(B | A) = 5/8 (under the condition that A has happened, the reduced sample space has a total of 8 outcomes, out of which 5 are favourable to B).

Thus, using the multiplication law, we get:

P(A ∩ B) = P(A) P(B | A) = (4/9)(5/8) = 5/18.
Example 5: In a production process, three units are selected randomly without replacement from lots of 100 units for inspection for quality control. If all three selected units are found defective then the lot is rejected; otherwise it is accepted. If a lot contains 15 defective items, then find the probability that this lot will be:
a) rejected
b) accepted.

Solution: Let the event A be that the first selected unit is defective, the event B be that the second selected unit is defective, and the event C be that the third selected unit is defective.

a) The lot is rejected if all three units are found defective. Thus, we need to obtain P(A ∩ B ∩ C).

At the first draw, P(A) = 15/100, getting a defective unit from 100 units containing 15 defective units.

In the second draw, assuming event A, the lot now contains 99 units with 14 defective units. Thus P(B | A) = 14/99.

Similarly, P(C | A ∩ B) is the probability of getting a defective unit in the third draw, given that both the earlier draws gave defective units. The lot now contains 98 units, of which 13 are defective. Therefore, P(C | A ∩ B) = 13/98.

Therefore, using the law of compound probability, we get:

P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B) = (15/100)(14/99)(13/98) = 13/4620.

b) The lot will be accepted if it is not rejected. Clearly, the probability that the lot will be accepted is 1 − P(A ∩ B ∩ C), which is 1 − 13/4620 = 4607/4620.

***
You may now try some exercises.

E5) A box contains 8 balls. Three of them are red and the remaining 5 are blue. Two balls are drawn successively, at random and without replacement. Find the probability that the first draw results in red and the second draw results in blue.

E6) In a certain population, 30% of the persons smoke, and 8% have a certain type of heart disease. Moreover, 12% of the persons who smoke have the heart disease.
a. What percentage of the population smoke and have the heart disease?
b. What percentage of the population with the heart disease smoke?

E7) Consider the experiment that consists of rolling two fair dice. Let X denote the score on the first die, and Y denote the sum of the scores on both dice.
a. Find the probability that X = 3 and Y = 8.
b. Find the probability that X = 3, given that Y = 8.
c. Find the probability that Y = 8, given that X = 3.

In the next section, we shall talk about an important law, the Law of Total Probability. It also includes the celebrated Bayes' theorem.

1.4 BAYES' THEOREM


To understand Bayes' theorem, we need another result in probability which is also of independent interest. Let us first prove that.

The Law of Total Probability


Let us suppose that B_1, B_2, B_3, ..., B_n are events which form a partition of the sample space S. This means that all these events are mutually exclusive and their union is the sample space. Symbolically, B_i ∩ B_j = ∅ for i ≠ j, i, j = 1, 2, 3, ..., n, P(B_i) > 0, i = 1, 2, ..., n, and

∪_{i=1}^{n} B_i = S.

Let A be another event over the sample space. Then we can write

A = A ∩ S = A ∩ (∪_{i=1}^{n} B_i) = ∪_{i=1}^{n} (A ∩ B_i)   [using the distributive law].   (8)

Here, A ∩ B_i and A ∩ B_j are mutually exclusive for all i ≠ j, i, j = 1, 2, 3, ..., n. Therefore, using the law of addition, we get

P(A) = Σ_{i=1}^{n} P(A ∩ B_i) = Σ_{i=1}^{n} P(B_i) P(A | B_i).   (9)

Relation (9) is called the Law of Total Probability.

Now, we are ready to state Bayes' theorem.

Theorem 1 (Bayes' Theorem): Let B_1, B_2, ..., B_n be a set of events which form a partition of the sample space S. Let A be any event with P(A) > 0. Then, for i = 1, 2, ..., n,

P(B_i | A) = P(B_i) P(A | B_i) / Σ_{j=1}^{n} P(B_j) P(A | B_j).   (10)

(The theorem was given by Thomas Bayes, a British mathematician, in 1763.)

Proof: From the definition of conditional probability, for the two events A and B_i, we have

P(B_i | A) = P(A ∩ B_i)/P(A)

           = P(B_i) P(A | B_i)/P(A)                                [using Eqn.(4)]

           = P(B_i) P(A | B_i) / Σ_{j=1}^{n} P(A ∩ B_j)            [using Eqn.(9)]

           = P(B_i) P(A | B_i) / Σ_{j=1}^{n} P(B_j) P(A | B_j).    [using Eqn.(4)]
In the context of Bayes' theorem, the probability P(B_i) is called the a priori probability of B_i, because it exists prior to the happening of the event A in the experiment. The probability P(B_i | A) is termed the a posteriori probability because it is determined after the happening of the event A, i.e. posterior to the event A. The probability P(A | B_i), regarded as a function of B_i once the event A has happened, is called the 'likelihood' of B_i.
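The prior-to-posterior update in Eqn.(10) is easy to compute directly. The short Python sketch below (the function and variable names are only illustrative) takes a prior over a partition B_1, ..., B_n and the likelihoods P(A | B_i), and returns the posterior probabilities P(B_i | A); the numbers used are those of Example 6, which follows.

```python
def bayes_posterior(prior, likelihood):
    """Return P(B_i | A) for each i, given the priors P(B_i) and likelihoods P(A | B_i)."""
    joint = [p * l for p, l in zip(prior, likelihood)]   # P(B_i) P(A | B_i)
    p_a = sum(joint)                                     # P(A), by the Law of Total Probability, Eqn.(9)
    return [j / p_a for j in joint]

# Smoker / non-smoker partition with P(male | smoker) = 19/31 and P(male | non-smoker) = 41/69.
posterior = bayes_posterior(prior=[0.31, 0.69], likelihood=[19 / 31, 41 / 69])
print(posterior[0])  # 19/60 ≈ 0.3167, the probability of being a smoker given that the person is male
```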

Let us apply the above results in the following examples to understand this concept.

Example 6: Suppose that in a group of individuals 31% were smokers. It was also observed that 19/31 of the smokers and 41/69 of the non-smokers were male. An individual was chosen at random from the group. What is the probability of the selected individual being a smoker, if he was found to be a male? (Compare with the problem given in Example 2(d).)

Solution: Let the event B_1 be that the individual is a smoker, B_2 be that the individual is a non-smoker, and A be that the individual is a male.

Clearly P(B_1) = 0.31 and P(B_2) = 1 − 0.31 = 0.69, and

P(A | B_1) = 19/31,  P(A | B_2) = 41/69.

Substituting n = 2 and i = 1 in Eqn.(10) of Bayes' rule, we get

P(B_1 | A) = (0.31 × 19/31) / (0.31 × 19/31 + 0.69 × 41/69) = 0.19/0.60 = 19/60,

which is the result in Example 2(d).
***

Example 7: There are three bags. The first bag contains 6 red balls and 4 blue balls. The second bag contains 2 red balls and 8 blue balls. The third bag contains 5 red balls and 5 blue balls. A bag was selected at random from the three bags, and a ball was drawn randomly from it. The ball was found to be blue. What is the probability that the ball came from the second bag?

Solution: Let the event B_1 be selecting the first bag, B_2 be selecting the second bag, B_3 be selecting the third bag, and A be the event that the ball drawn is blue.

Thus P(B_1) = P(B_2) = P(B_3) = 1/3.

P(A | B_1) = the probability of getting a blue ball from the first bag = 4/10. Similarly, P(A | B_2) = 8/10, P(A | B_3) = 5/10.

We want to find the probability that the selected bag was the second one, given that a blue ball came in the draw, i.e. P(B_2 | A). Using Bayes' theorem, we have

P(B_2 | A) = (1/3)(8/10) / [(1/3)(4/10) + (1/3)(8/10) + (1/3)(5/10)] = 8/17.
***
You may now try the following exercises.

E8) In a die-coin experiment, a fair die is rolled and then a fair coin is tossed a number of times equal to the score on the die.
a) Find the probability that the coin shows heads in every toss.
b) Given that the coin shows heads in all tosses, find the probability that the die score was i, i = 1, 2, 3, 4, 5, 6.

E9) A plant that produces memory chips has 3 assembly lines. Line 1 produces 40% of the chips with a defective rate of 5%, line 2 produces 25% of the chips with a defective rate of 6%, and line 3 produces 35% of the chips with a defective rate of 3%. A chip is chosen at random from the plant.
a) Find the probability that the chip is defective.
b) Given that the chip is defective, find the probability that the chip was produced by line 3.

So far, we have discussed the conditional probability, compound probability, and


Bayes' theorem. Now, let us discuss conditional distribution.

1.5 CONDITIONAL DISTRIBUTION


We have learnt about random variables in the undergraduate course in detail. We may recall that a random variable is a mathematical function over the sample space of an experiment that maps its outcomes to real numbers. Unlike other mathematical variables, a random variable cannot be assigned a value independently. It only describes the possible outcomes of an experiment in terms of real numbers. Due to this, some people consider the name random variable a misnomer.

A probability distribution, more properly called a probability distribution function, assigns a probability to every interval of the real numbers, so that the probability axioms are satisfied. The probability distribution of the variable X can be uniquely described by its cumulative distribution function F(x), which is defined by F(x) = P[X ≤ x] for every x in R.

Every random variable gives rise to a probability distribution, and this distribution contains most of the important information about the variable. If X is a random variable, the corresponding probability distribution assigns to the interval (a, b] the probability P[a < X ≤ b], i.e. the probability that the variable X will take a value in the interval (a, b]. This probability can be expressed in terms of the cumulative distribution function as P[a < X ≤ b] = F(b) − F(a).
A probability distribution is called discrete if its cumulative distribution function is a step function consisting of a sequence of a countable number of jumps, which means that it corresponds to a discrete random variable: a variable which can only attain values from a certain finite, or countable, set. Here, we use the probability mass function (abbreviated p.m.f.) to represent the probability distribution. It gives the probability that a discrete random variable is exactly equal to some value, that is, the p.m.f. is f(x) = P(X = x).

A probability distribution is called continuous if its cumulative distribution function is continuous, which means that it corresponds to a random variable X for which P[X = x] = 0 for all x in R. In this case, we use a probability density function: a non-negative function f defined on the real numbers, such that

P[a < X ≤ b] = ∫_a^b f(x) dx

for all a and b, to assign the probability that the variable X will take a value in the interval (a, b].
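As a small numerical illustration of the last formula (the density used here is illustrative, and the sketch assumes the SciPy library is available), the following Python code integrates a density f over (a, b] and compares the result with the difference of the cumulative distribution function.

```python
from scipy import integrate

# Illustrative density: f(x) = 2x on (0, 1), 0 elsewhere, with c.d.f. F(x) = x^2 on (0, 1).
f = lambda x: 2 * x
F = lambda x: x ** 2

a, b = 0.25, 0.75
p_integral, _ = integrate.quad(f, a, b)   # P[a < X <= b] as the integral of the density
p_cdf = F(b) - F(a)                       # equivalently, F(b) - F(a)

print(p_integral, p_cdf)  # both equal 0.5
```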
You must have studied conditional distribution in your earlier course. Just to recapitulate, let us discuss it here again, starting with the formal definition of conditional distribution.

Given two jointly distributed random variables X and Y, that is, a two-dimensional random variable or vector (X, Y), the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to have taken a particular value.

Definition 2: Let X and Y be two discrete random variables (r.v.s) associated with the same random experiment, taking values in countable sets T_X and T_Y respectively. The function f(x, y), defined for all ordered pairs (x, y), x ∈ T_X and y ∈ T_Y, by the relation

f(x, y) = P[X = x, Y = y]

is called the joint probability mass function of X and Y.

Note: By definition,

f(x, y) ≥ 0  and  Σ_{x∈T_X} Σ_{y∈T_Y} f(x, y) = 1.

Moreover, we should clarify that [X = x, Y = y] really stands for the event [X = x] ∩ [Y = y], and that [X = x, Y = y] is a simplified and accepted way of expressing the intersection of the two events [X = x] and [Y = y].
Let us consider the following example.


Example 8: A committee of two persons is formed by selecting them at random and without replacement from a group of 10 persons, of whom 2 are mathematicians, 4 are statisticians and 4 are engineers. Let X and Y denote the number of mathematicians and statisticians, respectively, in the committee. The possible values of X are 0, 1, 2, which are also the possible values of Y. Thus, all the ordered pairs (x, y) of the values of X and Y are

(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1) and (2, 2).

The total number of ways of selecting two persons from a group of 10 persons is 10C2 = 45. Since the persons are selected at random, each of these 45 ways has the same probability, 1/45. Consider the event [X = 1, Y = 1] that a committee has one mathematician and one statistician. One mathematician can be selected from two in 2C1 = 2 ways, and one statistician can be selected from 4 statisticians in 4C1 = 4 ways. Hence, the total number of committees with 1 mathematician and 1 statistician is 2 × 4 = 8. Thus, P[X = 1, Y = 1] = 8/45.

To obtain the probability of the event [X = 0, Y = 1], observe that if X = 0, Y = 1, this means that 1 statistician is on the committee, and that no mathematician is on it. Then the other person on the committee has to be one of the 4 engineers. This engineer can be selected in 4C1 = 4 ways. Hence,

P[X = 0, Y = 1] = (4 × 4)/45 = 16/45.

Similarly, we can obtain

P[X = 0, Y = 0] = 4C2/45 = 6/45,  P[X = 0, Y = 2] = 4C2/45 = 6/45,
P[X = 1, Y = 0] = (2 × 4)/45 = 8/45,  P[X = 2, Y = 0] = 2C2/45 = 1/45.

Since the committee has only two members, it is obvious that there are no sample points corresponding to the events [X = 1, Y = 2], [X = 2, Y = 1] and [X = 2, Y = 2]. Hence, the probabilities P[X = 1, Y = 2] = P[X = 2, Y = 1] = P[X = 2, Y = 2] = 0.

We now summarise these calculations in the following table.

Table 1: P[X = x, Y = y] for x, y = 0, 1, 2.

           y = 0    y = 1    y = 2
  x = 0     6/45    16/45     6/45
  x = 1     8/45     8/45      0
  x = 2     1/45      0        0

Note: If we denote the probability P[X = x, Y = y] by f(x, y), then the function f(x, y) is defined for all pairs (x, y) of values x and y of X and Y, respectively. Moreover,

f(x, y) ≥ 0  and  Σ_{x∈T_X} Σ_{y∈T_Y} f(x, y) = 1.

We say that the function f(x, y) is the joint probability mass function of the r.v.s X, Y, or of the random vector (X, Y).

We now define the p.m.f. of the marginal distribution.

Let X and Y be r.v.s taking values x ∈ T_X and y ∈ T_Y, respectively, with joint p.m.f. f(x, y) = P[X = x, Y = y]. We define new functions g and h as follows:

g(x) = Σ_{y∈T_Y} f(x, y)                                    (11)

h(y) = Σ_{x∈T_X} f(x, y)                                    (12)

In Eqn.(11), we keep the value x of X fixed and sum f(x, y) over all values y of Y. On the other hand, in Eqn.(12), y is kept fixed and f(x, y) is summed over all values of X. We wish to interpret the function g(x), defined for all values x of X, and the function h(y), defined for all values y of Y. Notice that both g and h, being sums of non-negative numbers, are themselves non-negative. Further,

Σ_{x∈T_X} g(x) = Σ_{x∈T_X} Σ_{y∈T_Y} f(x, y) = 1.

Thus, g(x) has all the properties of a p.m.f. of a one-dimensional r.v. Similarly, you can verify that h(y) also has all the properties of a p.m.f. We call these the p.m.f.s of the marginal distributions of X and Y respectively, as you can see from the following definition.

Definition 3: The function g(x), defined for all values x ∈ T_X of the r.v. X by the relation

g(x) = Σ_{y∈T_Y} f(x, y),

is called the p.m.f. of the marginal distribution of X. Similarly, h(y), defined for all values y ∈ T_Y of the r.v. Y by the relation

h(y) = Σ_{x∈T_X} f(x, y),

is called the p.m.f. of the marginal distribution of Y.


Definition 4: As usual, assume a random experiment that has a sample space S and a probability function P on S. Suppose that X and Y are two discrete random variables for the experiment, taking values in the sets T_X, T_Y respectively. For discrete random variables (X, Y), the conditional probability mass function of Y given X = x, x ∈ T_X, can be written as P(Y = y | X = x), y ∈ T_Y. From the definition of conditional probability,

P[Y = y | X = x] = P[Y = y, X = x]/P[X = x], provided P[X = x] > 0,   (13)

and if f_{X,Y}(x, y) is the joint probability mass function of X and Y, and f_X(x) (> 0) is the marginal probability mass function of X, then the conditional probability mass function P(Y = y | X = x), for a given x ∈ T_X, can be expressed as

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x),  y ∈ T_Y.              (14)

Similarly, if X and Y are continuous random variables for an experiment, then letting f_{X,Y}(x, y) be the joint probability density function of X and Y, and f_X(x), f_Y(y) be the marginal probability density functions of X and Y respectively, the conditional probability density function of Y given X = x can be written as f_{Y|X}(y | x), and is defined by

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x),                        (15)

provided f_X(x) > 0.

Likewise, the conditional probability density function of X given Y = y can be denoted by f_{X|Y}(x | y) and can be defined by

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y),                        (16)

provided f_Y(y) > 0.

(We will use the notation f_{X,Y}(x, y) both for the joint probability density function (p.d.f.) and for the joint probability mass function (p.m.f.). In the examples and exercises, for simplicity, f(x, y) has been used to represent f_{X,Y}(x, y).)
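To make Eqns.(13) and (14) concrete, here is a minimal Python sketch (the joint table and helper names are illustrative) that computes the marginal p.m.f. of X and the conditional p.m.f. of Y given X = x from a joint p.m.f. stored as a dictionary.

```python
from fractions import Fraction

# Illustrative joint p.m.f. f(x, y) = P[X = x, Y = y] on a small support.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(2, 8),
}

def marginal_x(joint, x):
    # f_X(x) = sum over y of f(x, y), as in Eqn.(11)
    return sum(p for (a, _), p in joint.items() if a == x)

def conditional_y_given_x(joint, x):
    # f_{Y|X}(y | x) = f(x, y) / f_X(x), as in Eqn.(14)
    fx = marginal_x(joint, x)
    return {b: p / fx for (a, b), p in joint.items() if a == x}

print(marginal_x(joint, 0))              # 3/8
print(conditional_y_given_x(joint, 0))   # {0: Fraction(1, 3), 1: Fraction(2, 3)}
```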
Definition 5: For a two-dimensional discrete random variable (X, Y), the conditional cumulative distribution function of the discrete random variable Y, given X = x, is defined as

F_{Y|X}(y | x) = Σ_{v ≤ y} P[Y = v | X = x],  −∞ < y < ∞,      (17)

and the conditional cumulative distribution function of the random variable X, given Y = y, is defined as

F_{X|Y}(x | y) = Σ_{u ≤ x} P[X = u | Y = y],  −∞ < x < ∞.      (18)

Similarly, if (X, Y) is a two-dimensional continuous r.v., the conditional cumulative distribution function of the random variable Y, given X = x, is defined as

F_{Y|X}(y | x) = ∫_{−∞}^{y} f_{Y|X}(u | x) du,  −∞ < y < ∞,    (19)

and the conditional cumulative distribution function of the random variable X, given Y = y, is defined as

F_{X|Y}(x | y) = ∫_{−∞}^{x} f_{X|Y}(u | y) du,  −∞ < x < ∞.    (20)

The conditional probability mass functions and conditional probability density functions of random variables also satisfy the properties of unconditional probability distributions. They are non-negative, being ratios of non-negative and positive functions. It can easily be shown that they also sum, or integrate, to 1.

Theorem 2: For any pair of discrete random variables X, Y,

Σ_{y∈T_Y} P[Y = y | X = x] = 1.

Proof: Σ_{y∈T_Y} P[Y = y | X = x] = Σ_{y∈T_Y} f_{X,Y}(x, y)/f_X(x) = f_X(x)/f_X(x) = 1.

Theorem 3: For a pair of continuous random variables X, Y,

∫_{−∞}^{∞} f_{Y|X}(y | x) dy = 1.

Definition 6: Two discrete random variables X and Y are called independent if, and only if,

f_{X,Y}(x, y) = f_X(x) f_Y(y), for all x ∈ T_X and all y ∈ T_Y,   (21)

where f_{X,Y}(x, y) is the joint probability mass function of X and Y, and f_X(x), f_Y(y) are the marginal probability mass functions of X and Y respectively. Likewise, two continuous random variables X and Y are called independent if, and only if,

f_{X,Y}(x, y) = f_X(x) f_Y(y),  −∞ < x < ∞, −∞ < y < ∞,           (22)

where f_{X,Y}(x, y) is the joint probability density function of X and Y, and f_X(x), f_Y(y) are the marginal probability density functions of the random variables X and Y, respectively.

It may easily be verified that the following conditions, with the usual notations, are equivalent for independent random variables X and Y, both in the discrete and in the continuous case:

a. f_{X|Y}(x | y) = f_X(x)
b. f_{Y|X}(y | x) = f_Y(y)
c. f_{X,Y}(x, y) = f_X(x) f_Y(y)

for all −∞ < x < ∞, −∞ < y < ∞.

Now, look at some examples.

Example 9: The random variables X and Y have the joint probability mass function f_{X,Y}(x, y) given in the cells of the following bivariate probability table:

           y = 0    y = 1    y = 2
  x = 0     2/9      1/3     1/15
  x = 1     2/9     2/15      0
  x = 2     1/45      0       0

Find
(i) the conditional probability distribution of Y given X;
(ii) the conditional probability distribution of X given Y;
(iii) Are X and Y independent?

Solution: (i) Here, T_X = {0, 1, 2}, T_Y = {0, 1, 2}.

Clearly, the marginal probability mass function of X may be evaluated as follows:

f_X(x) = Σ_{y∈T_Y} f_{X,Y}(x, y),

which gives f_X(0) = 28/45, f_X(1) = 16/45, f_X(2) = 1/45. These are basically the row totals in the bivariate table. Similarly, the column totals give f_Y(y): f_Y(0) = 21/45, f_Y(1) = 21/45, f_Y(2) = 3/45.

The conditional probability distribution of Y, given X = x, x ∈ T_X, may be calculated as follows:

P[Y = y | X = x] = f_{X,Y}(x, y)/f_X(x).

Thus, the conditional probability distributions of Y, given X, are found as shown in the rows of the following table.

  P[Y = y | X = x]
  x        y = 0                  y = 1                  y = 2               Total
  0   (45/28)(2/9) = 5/14    (45/28)(1/3) = 15/28   (45/28)(1/15) = 3/28       1
  1   (45/16)(2/9) = 5/8     (45/16)(2/15) = 3/8    (45/16)(0) = 0             1
  2   45(1/45) = 1           45(0) = 0              45(0) = 0                  1

(ii) In a similar way, we may obtain the conditional probability distribution of X given Y.

(iii) Since f_{X,Y}(0, 0) ≠ f_X(0) f_Y(0), X and Y are not independent.
***
Example 10: Suppose that (X, Y) has joint density function

f(x, y) = x + y,  if 0 < x < 1, 0 < y < 1, and f(x, y) = 0, otherwise.

a) Find the conditional density of X given Y = y.
b) Find the conditional density of Y given X = x.
c) Are X and Y independent?
d) Compute P[0 < X < 1/2 | Y = 1/3].

Solution: The marginal probability density function of X is

f_X(x) = ∫_0^1 (x + y) dy = x + 1/2,  if 0 < x < 1, and f_X(x) = 0 otherwise.

The marginal probability density function of Y is

f_Y(y) = ∫_0^1 (x + y) dx = 1/2 + y,  if 0 < y < 1, and f_Y(y) = 0 otherwise.

a) The conditional probability density function of X given Y = y, 0 < y < 1, will be

f_{X|Y}(x | y) = (x + y)/(1/2 + y),  0 < x < 1, and f_{X|Y}(x | y) = 0 otherwise.

b) The conditional probability density function of Y, given X = x, 0 < x < 1, will be

f_{Y|X}(y | x) = (x + y)/(x + 1/2),  0 < y < 1, and f_{Y|X}(y | x) = 0 otherwise.

c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y), the random variables are not independent.

d) P[0 < X < 1/2 | Y = 1/3] is obtained by integrating the conditional probability density function of X given Y = 1/3 over [0, 1/2].

Since f_{X|Y}(x | 1/3) = (x + 1/3)/(1/3 + 1/2) = (6/5)(x + 1/3),

P[0 < X < 1/2 | Y = 1/3] = ∫_0^{1/2} (6/5)(x + 1/3) dx = (6/5)(1/8 + 1/6) = (6/5)(7/24) = 7/20.
Example 11: Suppose that the random variable (X, Y) has joint probability density function f as given below:

f(x, y) = 2(x + y),  if 0 < x < y < 1, and f(x, y) = 0, otherwise.

a) Find the conditional density of X, given Y = y.
b) Find the conditional density of Y, given X = x.
c) Are X and Y independent?

Solution: The marginal probability density function of X (see Fig.4) is

f_X(x) = ∫_x^1 2(x + y) dy = 2(x + 1/2 − x² − x²/2) = 1 + 2x − 3x²,  if 0 < x < 1, and f_X(x) = 0 otherwise.

The marginal probability density function of Y is

f_Y(y) = ∫_0^y 2(x + y) dx = 3y²,  if 0 < y < 1, and f_Y(y) = 0 otherwise.

a) The conditional probability density function of X, given Y = y, 0 < y < 1, will be

f_{X|Y}(x | y) = 2(x + y)/(3y²),  0 < x < y, and f_{X|Y}(x | y) = 0 otherwise.

b) The conditional probability density function of Y, given X = x, 0 < x < 1, will be

f_{Y|X}(y | x) = 2(x + y)/(1 + 2x − 3x²),  x < y < 1, and f_{Y|X}(y | x) = 0 otherwise.

c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 1, the random variables X and Y are not independent.
***
You may now try the exercises that follow.

E10) Two dice are thrown. Let X denote the sum of the scores on the two dice and Y denote the absolute value of their difference.
a) Find the joint probability mass function of X and Y.
b) Find the marginal probability mass function of X.
c) Find the marginal probability mass function of Y.
d) Find the conditional probability mass function of Y given X = x.
e) Find the conditional probability mass function of X given Y = y.
f) Are X and Y independent?

E11) Suppose that the random variables X and Y have a joint probability density function f as given below:
f(x, y) = ..., if 0 < x < y < 2, and f(x, y) = 0, otherwise.
a) Find the conditional density of X, given Y = y.
b) Find the conditional density of Y, given X = x.
c) Are X and Y independent?

E12) Suppose that the random variables X and Y have a joint probability density function f as given below:
f(x, y) = ..., if 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0, otherwise.
a) Find the conditional density of X, given Y = y.
b) Find the conditional density of Y, given X = x.
c) Find P[X < 1], P[X < Y], and P[X + Y < 1].

E13) Suppose that the random variables X and Y have a joint probability density function f as given below:
f(x, y) = 2, if 0 < x < y < 1, and f(x, y) = 0, otherwise.
a) Check whether or not independence of X and Y holds.
b) Find P[X < 0.2 | Y > 0.1] and P[0.1 < Y < 0.4].

The exercises in this section would have given you enough practice to compute the
density functions and distribution functions of bivariate random variables. Next, we
shall discuss measures of central tendency of the probability distribution of bivariate
random vectors.
1.6 CONDITIONAL EXPECTATIONS
We shall begin this section with the definition of the conditional expectation of a
function of one random variable, given that the other variable has taken a given value.

Definition 7: Assume a random experiment has a sample space S and a probability function P on S. Let X and Y be two discrete random variables for the experiment, taking values in the countable sets T_X and T_Y (subsets of R), respectively. Let f_{X,Y}(x, y) be the joint probability mass function of the random variables X and Y, and f_X(x), f_Y(y) be the marginal probability mass functions of X and Y, respectively. Then the conditional expectation of Y, given X = x, denoted by E(Y | X = x) or E(Y | x), is defined as

E(Y | X = x) = Σ_{y∈T_Y} y f_{X,Y}(x, y)/f_X(x),              (23)

provided the series on the right hand side of Eqn.(23) is absolutely convergent. Here, E(Y | X = x) is a function of x, since x can take any value in T_X. The conditional expectation of X, given Y = y, denoted by E(X | Y = y) or E(X | y), is defined as

E(X | Y = y) = Σ_{x∈T_X} x f_{X,Y}(x, y)/f_Y(y),              (24)

provided the right hand side of Eqn.(24) converges absolutely. Here, E(X | Y = y) is a function of y, since y can take any value in T_Y. Similarly, if X and Y are continuous random variables for the experiment, having f_{X,Y}(x, y) as the joint probability density function and f_X(x), f_Y(y) as the marginal probability density functions of X and Y, respectively, then the conditional expectation of Y, given X = x, is defined as

E(Y | X = x) = ∫_{−∞}^{∞} y f_{X,Y}(x, y)/f_X(x) dy,          (25)

provided the integral on the right hand side of Eqn.(25) converges absolutely, and the conditional expectation of X, given Y = y, is defined as

E(X | Y = y) = ∫_{−∞}^{∞} x f_{X,Y}(x, y)/f_Y(y) dx,          (26)

provided the right hand side of Eqn.(26) is absolutely convergent. Here, again, we note that E(Y | X = x) is a function of x, and E(X | Y = y) is a function of y, as both x and y can vary in R. The conditional expectation of a function of a random variable can also be defined in a similar way. For the discrete random variables X and Y, as specified above, the conditional expectation of φ(Y), a function of the random variable Y, given X = x, will be

E(φ(Y) | X = x) = Σ_{y∈T_Y} φ(y) f_{X,Y}(x, y)/f_X(x),        (27)

and for the continuous random variables X and Y, as specified above, this conditional expectation will be

E(φ(Y) | X = x) = ∫_{−∞}^{∞} φ(y) f_{X,Y}(x, y)/f_X(x) dy.    (28)

The conditional variance of X, given Y = y, can now be defined as

V(X | Y = y) = E({X − E(X | Y = y)}² | Y = y).                (29)

It can easily be shown that the above expression has the following equivalent form:

V(X | Y = y) = E(X² | Y = y) − {E(X | Y = y)}².               (30)
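The following Python sketch computes these conditional expectations directly from a joint p.m.f. table; it uses the table of Example 9, so its output anticipates Example 12 below (the helper names are illustrative).

```python
from fractions import Fraction as F

# Joint p.m.f. of Example 9: joint[(x, y)] = P[X = x, Y = y].
joint = {(0, 0): F(2, 9), (0, 1): F(1, 3), (0, 2): F(1, 15),
         (1, 0): F(2, 9), (1, 1): F(2, 15), (1, 2): F(0),
         (2, 0): F(1, 45), (2, 1): F(0), (2, 2): F(0)}

def cond_exp_Y_given_X(x, h=lambda y: y):
    # E(h(Y) | X = x) = sum_y h(y) f(x, y) / f_X(x), as in Eqns.(23) and (27)
    fx = sum(p for (a, _), p in joint.items() if a == x)
    return sum(h(b) * p for (a, b), p in joint.items() if a == x) / fx

print(cond_exp_Y_given_X(1))             # E(Y | X = 1) = 3/8

# Iterated expectation E(E(Y | X)) = sum_x E(Y | X = x) f_X(x).
f_X = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1, 2)}
print(sum(cond_exp_Y_given_X(x) * f_X[x] for x in f_X))   # 3/5, which equals E(Y)
```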
Example 12: For the problem discussed in Example 9, obtain:

(i) the expectation of Y, given X = 1;
(ii) the expectation of Y, given X = x, i.e., E(Y | X = x);
(iii) the expectation of the expectation of Y, given X, i.e., E(E(Y | X));
(iv) V(X | Y = 0).

Solution: (i) Since E(Y | X = 1) = Σ_y y P[Y = y | X = 1] = 0(5/8) + 1(3/8) + 2(0) = 3/8.

(ii) E(Y | X = x) = Σ_{y∈T_Y} y f_{X,Y}(x, y)/f_X(x), which gives E(Y | X = 0) = 3/4, E(Y | X = 1) = 3/8 and E(Y | X = 2) = 0.

(iii) E(E(Y | X)) = Σ_{x∈T_X} E(Y | X = x) f_X(x) = (3/4)(28/45) + (3/8)(16/45) + 0(1/45) = 27/45 = 3/5,

which is the same as E(Y), since E(Y) = 0(21/45) + 1(21/45) + 2(3/45) = 27/45 = 3/5.

(iv) V(X | Y = 0) = E(X² | Y = 0) − {E(X | Y = 0)}².

The conditional p.m.f. of X given Y = 0 is P[X = x | Y = 0] = f_{X,Y}(x, 0)/f_Y(0), which takes the values 10/21, 10/21 and 1/21 for x = 0, 1, 2. Thus

E(X | Y = 0) = 0(10/21) + 1(10/21) + 2(1/21) = 12/21 = 4/7,

E(X² | Y = 0) = 0(10/21) + 1(10/21) + 4(1/21) = 14/21 = 2/3, and

V(X | Y = 0) = 2/3 − (4/7)² = 2/3 − 16/49 = 50/147.

Example 13: Let the continuous random variables X and Y have the following joint probability density function:

f(x, y) = 8xy, if 0 < x < y < 1, and f(x, y) = 0, otherwise.

(i) Find the expectation of Y, given X = x, i.e., E(Y | X = x).
(ii) Find V(Y | X = x).

Solution: (i) Since

f_X(x) = ∫_x^1 8xy dy = 4x(1 − x²) for 0 < x < 1, and f_X(x) = 0 otherwise,

the conditional density of Y given X = x is f_{Y|X}(y | x) = 8xy/(4x(1 − x²)) = 2y/(1 − x²) for x < y < 1.

Therefore,

E(Y | X = x) = ∫_x^1 y (2y/(1 − x²)) dy = 2(1 − x³)/(3(1 − x²)) = 2(1 + x + x²)/(3(1 + x)).

(ii) Since

E(Y² | X = x) = ∫_x^1 y² (2y/(1 − x²)) dy = (1 − x⁴)/(2(1 − x²)) = (1 + x²)/2,

therefore,

V(Y | X = x) = E(Y² | X = x) − {E(Y | X = x)}² = (1 + x²)/2 − {2(1 + x + x²)/(3(1 + x))}².
You must have seen in part (iii) of Example 12 that E(E(Y | X)) and E(Y) both attain the same value, 3/5. Now let us try to prove this in the theorem that follows.

Theorem 4: The expectation of the conditional expectation of Y, given X, is equal to the expectation of Y, i.e. E(E(Y | X)) = E(Y).

Proof: Suppose X and Y are discrete random variables. Therefore, from Eqn.(23), we have

E(Y | X = x) = Σ_{y∈T_Y} y f_{X,Y}(x, y)/f_X(x), which is a function of x.

Thus,

E(E(Y | X)) = Σ_{x∈T_X} E(Y | X = x) f_X(x)
            = Σ_{x∈T_X} Σ_{y∈T_Y} y f_{X,Y}(x, y)
            = Σ_{y∈T_Y} y Σ_{x∈T_X} f_{X,Y}(x, y)    (changing the order of summation)
            = Σ_{y∈T_Y} y f_Y(y) = E(Y).

The result also holds for continuous random variables; the proof is similar, with the sums replaced by integrals.
***
Now let us prove another theorem.

Theorem 5: Prove that V(X) = E(V(X | Y)) + V(E(X | Y)).

Proof: We have proved that E(E(Y | X)) = E(Y). We can prove, similarly, that E(E(X | Y)) = E(X) and E(E(X² | Y)) = E(X²).

We start from the right hand side of the statement:

E(V(X | Y)) + V(E(X | Y))
= E{E(X² | Y) − (E(X | Y))²} + E{E(X | Y)}² − {E(E(X | Y))}²
= E{E(X² | Y)} − E{E(X | Y)}² + E{E(X | Y)}² − {E(X)}²
= E(X²) − {E(X)}² = V(X), the left hand side.

In the proof, we used the fact that E(E(X² | Y)) = E(X²).
***
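A quick numerical check of Theorem 5 can be done on any small joint p.m.f. The Python sketch below (the table is illustrative, not from the text) computes V(X), E(V(X | Y)) and V(E(X | Y)) and confirms that the last two add up to the first.

```python
from fractions import Fraction as F

# Illustrative joint p.m.f.; any valid table can be used for the check.
joint = {(0, 0): F(1, 6), (0, 1): F(1, 3), (1, 0): F(1, 3), (1, 1): F(1, 6)}
ys = {y for _, y in joint}
f_Y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in ys}

def cond_moment(y, k):
    # E(X^k | Y = y) = sum_x x^k f(x, y) / f_Y(y)
    return sum((a ** k) * p for (a, b), p in joint.items() if b == y) / f_Y[y]

E_X = sum(a * p for (a, _), p in joint.items())
V_X = sum((a ** 2) * p for (a, _), p in joint.items()) - E_X ** 2

E_of_condvar = sum((cond_moment(y, 2) - cond_moment(y, 1) ** 2) * f_Y[y] for y in ys)
V_of_condexp = sum((cond_moment(y, 1) ** 2) * f_Y[y] for y in ys) - E_X ** 2

print(V_X, E_of_condvar + V_of_condexp)  # both equal 1/4
```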
You may now try the following exercises based on these discussions.
E14) Suppose that (X, Y) is uniformly distributed on the square R = {(x, y) : −6 < x < 6, −6 < y < 6}. That is, the joint probability density of (X, Y) is

f_{X,Y}(x, y) = 1/144, if −6 < x < 6, −6 < y < 6, and f_{X,Y}(x, y) = 0, otherwise.

Then find
(i) the expectation of Y given X = x, i.e., E(Y | X = x);
(ii) the expectation of X given Y = y, i.e., E(X | Y = y);
(iii) V(Y | X = x);
(iv) V(X | Y = y).
E15) Suppose that a random vector (X, Y) has a joint probability density function f as given below:

f(x, y) = kxy, if 0 < y < x < 1, and f(x, y) = 0, otherwise.

(i) Find k.
(ii) Find E(Y | X = x).
(iii) Find V(Y | X = x).

Here, we close the discussion on conditional probability. We hope that you have
gained considerable knowledge about conditional probability, and conditional
distribution. Now let us summarise what we discussed in this unit.

1.7 SUMMARY
In this unit, we have covered the following points.
1. We illustrated the idea of conditional probability, and presented some examples to
elaborate its concept. The conditional probability of an event is obtained on the
basis of prior knowledge of the happening of another event. For the evaluation of
the conditional probability of an event, the sample space gets reduced to the event
whose occurrence has taken place.
2. We attempted to describe conditional probability. Conditional probability is
influenced by the happening of another event if the events are dependent,
otherwise, it is not influenced and the events are called independent.
3. We studied some basic properties of conditional probability. They were similar to
the general properties of a probability function on a sample space.
4. We stated and proved the famous Bayes' theorem. We presented simple examples
to illustrate it.
5. We have acquainted you with the concept of conditional distribution with some
examples.
6. We have defined conditional expectation of a random vector, and some important
properties. Finally, we defined conditional variance.

1.8 SOLUTIONS/ANSWERS

E1) a) If B ⊂ A, then A ∩ B = B; therefore, we have P(A | B) = P(A ∩ B)/P(B) = P(B)/P(B) = 1.

b) If A ⊂ B, then A ∩ B = A; therefore, we get P(A | B) = P(A ∩ B)/P(B) = P(A)/P(B).

c) If A and B are disjoint, then A ∩ B = ∅; therefore, we get P(A | B) = 0/P(B) = 0.

E2) a) Since P(A) > 0 and P(B) > 0,

P(A | B) > P(A) ⇔ P(A ∩ B)/P(B) > P(A) ⇔ P(A ∩ B) > P(A)P(B) ⇔ P(A ∩ B)/P(A) > P(B) ⇔ P(B | A) > P(B).

b) The proof is left for you.

c) Since P(A) > 0 and P(B) > 0,

P(A | B) = P(A) ⇔ P(A ∩ B) = P(A)P(B) ⇔ P(B | A) = P(B).

E3) Let the event A = the student is absent, and the event B = today is Friday. Since there are six school days, P(B) = 1/6, and P(A ∩ B) = 0.03. Therefore, the required probability is

P(A | B) = P(A ∩ B)/P(B) = 0.03/(1/6) = 0.18.
E4) Let the event A = the coin is biased, and the event B = the coin lands heads up. The bag contains 12 coins: 5 fair, 4 biased, each with probability of heads 1/3, and 3 two-headed.

a) P(coin is biased) = P(A) = 4/12 = 1/3.

b) P(coin is biased and it lands heads) = P(A ∩ B) = P(A) P(B | A) = (1/3)(1/3) = 1/9.

c) P(the coin lands heads, given that it is biased) = P(B | A) = 1/3.

E5) A box contains 8 balls: 3 of them are red, and the remaining 5 are blue. Two balls are drawn successively, at random, and without replacement. Let the event A be that a red ball is drawn in the first draw and the event B be that a blue ball is drawn in the second draw. The required probability is P(A ∩ B), and

P(A ∩ B) = P(A) P(B | A) = (3/8)(5/7) = 15/56.

E6) Let the event A = a person smokes, and the event B = a person has heart disease. We are given

P(A) = 0.3, P(B) = 0.08, P(B | A) = 0.12.

a) We require P(A ∩ B):

P(A ∩ B) = P(A) P(B | A) = 0.3 × 0.12 = 0.036.

Thus, the percentage of the population who smoke and have the heart disease is 3.6%.

b) We require P(A | B) here:

P(A | B) = P(A ∩ B)/P(B) = 0.036/0.08 = 0.45.

Therefore, the percentage of the persons with heart disease who smoke is 45%.


E7) The sample space is S = {(a, b) : a, b = 1, 2, ..., 6}, where a and b are the scores on the first and second die, respectively. X = 3 and Y = 8 means that the score on the first die is 3 and the score on the second die is 5, i.e. the outcome is (3, 5). The event Y = 8 may occur if the outcome is any of (2, 6), (3, 5), (4, 4), (5, 3), (6, 2). Thus, P(Y = 8) = 5/36, assuming the 36 outcomes in S are equally likely.

a) P(X = 3 and Y = 8) = P{(3, 5)} = 1/36.

b) P(X = 3 | Y = 8) = P(X = 3, Y = 8)/P(Y = 8) = (1/36)/(5/36) = 1/5.

c) P(Y = 8 | X = 3) = P(X = 3, Y = 8)/P(X = 3) = (1/36)/(6/36) = 1/6.
E8) Let the events B_i = the score on the die is i, where i = 1, 2, 3, 4, 5, 6, and the event A = all tosses of the coin show heads.

Clearly, P(B_i) = 1/6, i = 1, 2, 3, 4, 5, 6, and

P(A | B_1) = 1/2, P(A | B_2) = (1/2)(1/2) = 1/4, P(A | B_3) = (1/2)(1/2)(1/2) = 1/8, P(A | B_4) = 1/16, and similarly P(A | B_5) = 1/32, P(A | B_6) = 1/64.

a) By the Law of Total Probability,

P(A) = (1/6)(1/2 + 1/4 + 1/8 + 1/16 + 1/32 + 1/64) = (1/6)(63/64) = 63/384 = 21/128.

b) Since, by Bayes' theorem,

P(B_i | A) = P(B_i) P(A | B_i)/P(A) = (1/6)(1/2)^i/(21/128) = 2^(6−i)/63, i = 1, 2, ..., 6,

i.e. P(B_1 | A) = 32/63, P(B_2 | A) = 16/63, P(B_3 | A) = 8/63, P(B_4 | A) = 4/63, P(B_5 | A) = 2/63, P(B_6 | A) = 1/63.

E9) Let the events B_i = the chip is produced by line i, where i = 1, 2, 3, and the event A = the chip is defective. Then

P(B_1) = 0.40, P(B_2) = 0.25, P(B_3) = 0.35,
P(A | B_1) = 0.05, P(A | B_2) = 0.06, P(A | B_3) = 0.03.

a) P(A) = 0.40 × 0.05 + 0.25 × 0.06 + 0.35 × 0.03 = 0.02 + 0.015 + 0.0105 = 0.0455.

b) The probability that the chip was produced by line 3, given that the chip is defective, is

P(B_3 | A) = P(B_3) P(A | B_3)/P(A) = 0.0105/0.0455 = 3/13 ≈ 0.231.
E10) The sample space, as given below, consists of 36 equally likely outcomes:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

a., b., c. The random variable X = sum of the scores on the two faces and Y = absolute difference of the scores on the two faces. Clearly, X takes the values 2, 3, ..., 12 and Y takes the values 0, 1, 2, ..., 5. The joint p.m.f. f_{X,Y}(x, y) of X and Y and their marginal p.m.f.s may be obtained very easily from the bivariate probability table.

d) The conditional p.m.f.s of Y, given X = x, P[Y = y | X = x], may then be obtained row by row from that table.

e) Similarly, the conditional p.m.f. of X given Y can be obtained.

f) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y), say for x = 2, y = 0, X and Y are not independent.

E11) The marginal probability density function of X is

f_X(x) = ∫ f_{X,Y}(x, y) dy, integrated over x < y < 2, and f_X(x) = 0 otherwise.

The marginal probability density function of Y is

f_Y(y) = ∫ f_{X,Y}(x, y) dx, integrated over 0 < x < y, and f_Y(y) = 0 otherwise.

a) The conditional probability density function of X, given Y = y, 0 < y < 2, will be

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y), 0 < x < y, and f_{X|Y}(x | y) = 0 otherwise.

b) The conditional probability density function of Y, given X = x, 0 < x < 2, will be

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x), x < y < 2, and f_{Y|X}(y | x) = 0 otherwise.

c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 2, the random variables X and Y are not independent.

E12) The marginal probability density function of X is

f_X(x) = ∫_0^∞ f_{X,Y}(x, y) dy, 0 < x < ∞, and f_X(x) = 0 otherwise.

The marginal probability density function of Y is

f_Y(y) = ∫_0^∞ f_{X,Y}(x, y) dx, 0 < y < ∞, and f_Y(y) = 0 otherwise.

a) The conditional probability density function of X, given Y = y, will be

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y), 0 < x < ∞, and f_{X|Y}(x | y) = 0 otherwise.

b) The conditional probability density function of Y, given X = x, will be

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x), 0 < y < ∞, and f_{Y|X}(y | x) = 0 otherwise.

c) P[X < 1] = ∫_0^1 f_X(x) dx,

P[X < Y] = ∫_0^∞ ∫_x^∞ f_{X,Y}(x, y) dy dx, and

P[X + Y < 1] = ∫_0^1 ∫_0^{1−x} f_{X,Y}(x, y) dy dx (the region of integration, bounded by the line x + y = 1, is shown in Fig.6).
Fig.6
E13) The marginal probability density function of X is

f_X(x) = ∫ f_{X,Y}(x, y) dy = ∫_x^1 2 dy = 2[y]_x^1 = 2(1 − x), if 0 < x < 1, and f_X(x) = 0 otherwise.

The marginal probability density function of Y is

f_Y(y) = ∫ f_{X,Y}(x, y) dx = ∫_0^y 2 dx = 2[x]_0^y = 2y, if 0 < y < 1, and f_Y(y) = 0 otherwise.

a) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 1, the random variables X and Y are not independent.

b) P[X < 0.2 | Y > 0.1] = P[X < 0.2, Y > 0.1]/P[Y > 0.1].

P[X < 0.2, Y > 0.1] = ∫_{0.1}^{0.2} ∫_0^y 2 dx dy + ∫_{0.2}^{1} ∫_0^{0.2} 2 dx dy = 0.03 + 0.32 = 0.35,

and P[Y > 0.1] = 1 − ∫_0^{0.1} 2y dy = 1 − 0.01 = 0.99.

Thus, P[X < 0.2 | Y > 0.1] = 0.35/0.99 = 35/99 ≈ 0.354.

Also, P[0.1 < Y < 0.4] = ∫_{0.1}^{0.4} 2y dy = 0.16 − 0.01 = 0.15.

E14) Let the joint probability density function of X and Y be

f_{X,Y}(x, y) = k, if −6 < x, y < 6, and f_{X,Y}(x, y) = 0, elsewhere.

Since

∫_{−6}^{6} ∫_{−6}^{6} k dx dy = 144k = 1,

therefore k = 1/144. The marginal densities are then f_X(x) = 1/12 for −6 < x < 6 and f_Y(y) = 1/12 for −6 < y < 6.

(i) E(Y | X = x) = ∫_{−6}^{6} y f_{X,Y}(x, y)/f_X(x) dy = ∫_{−6}^{6} (y/12) dy = 0.

(ii) We may get E(X | Y = y) = 0 in a similar way.

(iii) V(Y | X = x) = E(Y² | X = x) − {E(Y | X = x)}² = ∫_{−6}^{6} (y²/12) dy − 0 = 12.

(iv) We may get V(X | Y = y) = 12 in a similar way.

E15) Since

∫_0^1 ∫_0^x kxy dy dx = ∫_0^1 (kx³/2) dx = k/8 = 1,

therefore

(i) k = 8.

(ii) The marginal density of X is f_X(x) = ∫_0^x 8xy dy = 4x³, 0 < x < 1, so f_{Y|X}(y | x) = 8xy/(4x³) = 2y/x², 0 < y < x, and

E(Y | X = x) = ∫_0^x y (2y/x²) dy = 2x/3.

(iii) E(Y² | X = x) = ∫_0^x y² (2y/x²) dy = x²/2, so

V(Y | X = x) = x²/2 − (2x/3)² = x²/2 − 4x²/9 = x²/18.
UNIT 2 THE BASICS OF MARKOV CHAIN
Structure

2.1 Introduction
    Objectives
2.2 Stochastic Process
2.3 Markov Chain
2.4 Graphical Representation
2.5 Higher Order Transition Probabilities
2.6 Methods of Calculating P^n
    Method of Spectral Decomposition
    Method of Generating Function
2.7 Summary
2.8 Solutions/Answers

2.1 INTRODUCTION
The Markov chain is named after Andrey Markov (1856 - 1922), a Russian
mathematician. It is a discrete-time stochastic process with the Markov property.
Andrey Markov produced the first results in 1906 for these processes having finite
state space. A generalization to countably infinite state spaces was given by
Kolmogorov. Further work was done by W. Doeblin, W. Feller, K. L. Chung and
others. In most of our study of probability so far, we have dealt with independent trials
processes as a sequence of identically and independently distributed random variables.
These processes are the basis of classical probability theory, and much of statistics.
We have discussed two of the principal theorems for these processes: the Law of Large Numbers, and the Central Limit Theorem. We have seen that when a sequence
of repeated chance experiments forms an independent trials process, the possible
outcomes for each experiment are the same and occur with the same probability.
Further, knowledge of the outcomes of the previous experiments does not influence
our predictions for the outcomes of the present or future experiment.

In many cases in real life, we observe a sequence of chance experiments where all of
the outcomes in the past experiments may influence our predictions for the next
experiment. The sequence of random variables associated with the sequence of such
experiments may not be identically and independently distributed. For example, this
will happen in predicting a student's grades on a sequence of exams in a course. But to
allow too much generality makes the processes mathematically difficult to handle. A.
Markov studied this type of chance process where the outcome of current experiment
(not previous experiments) can only affect the outcome of the next experiment. This
type of process is called a Markov process, a particular case of which, when state
space is discrete, is called a Markov chain.

Markovian systems appear extensively in physics. Markov chains can also be used to
model various processes in queuing theory. The Page Rank of a webpage as used by
Google is defined by a Markov chain. Markov chain methods have also become very
important for generating sequences of random numbers to accurately reflect very
complicated desired probability distributions - a process called Markov chain Monte
Carlo, or MCMC for short. Markov chains also have many applications in biological
modeling, particularly population processes, which are useful in modeling processes
that are (at least) analogous to biological populations. The Leslie matrix is one such
example, though some of its entries are not probabilities (they may be greater than 1).

We will present some discussion of the concept of stochastic processes, and the definition and understanding of a Markov chain, in Sec. 2.2 and Sec. 2.3, respectively. We will also present some examples to illustrate the behaviour of a Markov chain. Here, we will also learn about the transition probability matrix P, higher order transition probabilities, and the famous Chapman-Kolmogorov equation. In Sec. 2.4, we will represent a Markov chain graphically, and in Sec. 2.5, we shall compute higher order transition probabilities. In Sec. 2.6, we will learn two methods for the calculation of P^n, viz., spectral decomposition and generating functions.
Objectives
After studying this unit you should be able to:
explain the concept of a stochastic process, and that of a Markov chain as a special case of a stochastic process;
compute the transition probability matrix with some of its applications;
evaluate higher order transition probabilities, and the unconditional probability distribution after a number of steps in a Markov chain.

2.2 STOCHASTIC PROCESS

Let us start this section by discussing the following situations:

(i) Consider a simple experiment like a series of independent throws of a coin. Suppose that X_n denotes the total number of heads found in the first n throws. Then {X_n, n = 1, 2, 3, ...} is a family of random variables constituting a stochastic process.

(ii) Consider another simple experiment. Suppose a die is thrown a number of times, and suppose that X_n is the number of sixes in the first n throws. If we allow n to vary as n = 1, 2, ..., then we get a sequence of random variables {X_n, n = 1, 2, 3, ...}. When n varies, we have a family of random variables constituting a stochastic process.

A stochastic process (also known as a chance or random process) is defined as an indexed collection of random variables {X_n}, where the index n belongs to an index set T. In most real life situations, this set represents time, either discrete or continuous. The collection of random variables is defined on some sample space. The set of all possible values taken by these random variables is known as the state space of the stochastic process, and we will denote it by S. The state space is called discrete if it contains a finite or countably infinite number of points, and it is called continuous when it is an interval or a union of disjoint intervals.

For example, in situation (i), X_n denotes the total number of heads found in n independent throws of a coin. Thus the state space S will be a finite set of non-negative integers, 0, 1, 2, ..., n. Here, the collection of random variables {X_n} will be a stochastic process having a finite state space. In situation (ii), the state space of X_n is also discrete. We can write X_n = Y_1 + Y_2 + ... + Y_n, where Y_i is a discrete random variable denoting the outcome of the ith throw, and Y_i = 1 or 0 according as the ith throw shows a six or not. The representation X_n = Y_1 + ... + Y_n is valid in both the situations (i) and (ii). In another situation, we may consider a collection of random variables {X_n = Y_1 + Y_2 + ... + Y_n, n = 1, 2, 3, ...} where Y_i is a continuous random variable assuming values in (0, ∞). Here, the set of possible values of X_n belongs to the interval (0, ∞), and so the state space S of the stochastic process X_n is continuous.
From the examples above, it is clear that a stochastic process may be a discrete time stochastic process, when the index set T is a discrete set, often a collection of the non-negative integers 0, 1, 2, 3, ..., or it may be a continuous time stochastic process when the index set is continuous (usually a space or time interval), resulting in an uncountably infinite number of random variables. We may use alternative notation for a stochastic process, such as X(t) or X_t, where t indicates space or time.

So far, we have discussed the case of a stochastic process in which the X(t) are one-dimensional random variables. There may be processes with X(t) that are more than one-dimensional. Consider X(t) = (X_1(t), X_2(t)), in which X_1(t) represents the minimum temperature, and X_2(t) represents the maximum temperature, in a city in a time interval [0, t]; then the stochastic process is two-dimensional. Similarly, we can have a multi-dimensional stochastic process also. In general, stochastic processes can be categorized into the following four types:

(i) discrete state space and discrete time
(ii) discrete state space and continuous time
(iii) continuous state space and discrete time, and
(iv) continuous state space and continuous time.

Thus, we see that the index set, T , and the state space, S, of a stochastic process may
be discrete or continuous. Familiar examples of the stochastic processes include prices
of shares, varying every moment in a stock market, and exchange rates of our currency
fluctuating along with time. Other examples, such as a patient's ECG, blood pressure,
or temperature, constitute stochastic processes arising in medical sciences.

In the next section, we shall discuss Markov chains.

2.3 MARKOV CHAIN


A discrete time Markov chain is a stochastic process where both the index set T and the state space S are discrete, and the stochastic process satisfies the Markov property. A sequence of random variables {X_n} is said to follow the Markov property if, given the present state, that is, the value taken by the random variable X_n, the states of the future, that is, the values of the random variables X_{n+1}, X_{n+2}, ..., are independent of the states of the past, that is, the values of the random variables X_{n-1}, X_{n-2}, .... For example, if the stock price of a stock in the National Stock Exchange follows the Markov property, then the stock price at a future date will depend only on the current price that is known to us, and will not depend on its prices on past dates. The Markov chain and Markov property may be formally defined as follows.

Definition 1: A stochastic process {X_n} with the index set T = {0, 1, 2, ..., n, ...} and discrete state space S = {1, 2, ..., s} is called a Markov chain if, for any of the states i_0, i_1, i_2, ..., i_{n-1}, i, j in S, and any n in T, we have

P[X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] = P[X_{n+1} = j | X_n = i]   (1)

and in this situation, the sequence of random variables {X_n} is said to possess the Markov property. If X_n has the outcome i (i.e. X_n = i), then the Markov chain is said to be in state i at the nth trial, or at time n. In the definition above, s may be infinity.
The Markov chain will be called a finite Markov chain if the state space S is finite.

The probability P[X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] in the above definition denotes the conditional probability that the system will be in state j at time n + 1, given that the system was in the state i at time n, in the state i_{n-1} at time n - 1, ..., in the state i_1 at time 1, and in the state i_0 initially at time 0. Due to the Markov property, this probability depends only on the latest given state, i.e., on the state i at time n.

Let (i, j) denote a pair of states at two times, say, at time m and at time n, m <= n. The transition probability for making the transition from state i at time m to state j at time n,

P[X_n = j | X_m = i] = p_ij(m, n)   (2)

is called the m-to-n step transition probability. Here, we have assumed that the transition probabilities depend on both the states i, j, and on both the times m, n.

Definition 2: The unconditional probability distribution of the initial random variable X_0 of the Markov chain {X_n} is called the initial distribution of the chain. The Markov chain starts in a state chosen according to the probability distribution of X_0. Let u = (u_1, u_2, u_3, ..., u_s) be the vector having s elements corresponding to the s states 1, 2, ..., s, such that u_i = P(X_0 = i), i = 1, 2, ..., s. Thus u_i denotes the probability that the chain starts in state i at time 0.

Definition 3: A Markov chain is called time homogeneous, or a chain with stationary transition probabilities, if its transition probabilities p_ij(m, n) do not depend on the specific times m and n, but depend only on the time duration n - m, i.e., on the number of steps taken between the two times.

In this section, we shall only discuss time homogeneous chains. In this case, the m-step transition probability for a homogeneous chain may be denoted as

P[X_{n+m} = j | X_n = i] = p_ij^(m), for any n in the index set   (3)

and the one step transition probability as P[X_{n+1} = j | X_n = i] = p_ij (here, we denote p_ij^(1) = p_ij, omitting the superscript (1) for convenience).

Definition 4 (Transition Matrix): Suppose the state space S of a time homogeneous Markov chain contains s states 1, 2, 3, ..., s; then the s x s matrix of the one-step transition probabilities (p_ij) is called the P-matrix and is denoted by P. This square matrix is also called the matrix of transition probabilities, or the transition matrix. The (i, j)th element of P represents the one-step transition probability p_ij, that is, the probability that the chain will move from the state i to the state j in one step. Therefore, the sum of the elements in each row of P is one, i.e.

sum_{j=1}^{s} p_ij = 1, for all i.

A square matrix with non-negative elements and row sums equal to unity is called a stochastic matrix. When the column sums are also unity, the matrix is called doubly stochastic. Thus, a transition matrix is stochastic.

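A quick numerical check of these properties can be useful. The following is a minimal sketch (assuming NumPy is available; the 2 x 2 matrix used here is purely illustrative, not from the text) that tests whether a given matrix is stochastic, and whether it is doubly stochastic.

```python
import numpy as np

# Check that a matrix is stochastic: non-negative entries, each row summing to one.
def is_stochastic(P, tol=1e-9):
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= -tol) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

# Doubly stochastic: stochastic, and the column sums are also one.
def is_doubly_stochastic(P, tol=1e-9):
    return is_stochastic(P, tol) and np.allclose(np.asarray(P, dtype=float).sum(axis=0), 1.0, atol=tol)

P = [[0.8, 0.2],
     [0.5, 0.5]]
print(is_stochastic(P), is_doubly_stochastic(P))   # True False
```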
Remark 1: A problem can be modeled as a (homogeneous) Markov chain if it has


the following properties:
a) For any unit time period, an object in the system is in exactly one of the
defined states. At the end of the time period, the object either moves to a new
state, or stays in that same state for another unit time period.
b) The object moves from one state to the next according to the transition
probabilities, which depend only on the current state of the object, and not on
any previous history of its states. The total probability of movement out of
a state (movement from a state to the same state counts as movement) is
equal to one.
c) The transition probabilities do not change over time (the probability of going
from state A to state B in the current unit time period is the same as it will be
at any other period in the future).

Now, we state below two theorems without proof. Theorem 1 is known as the general
existence theorem. Theorem 2 states three different conditions equivalent to the
Markov property. For the proofs of these theorems, you may refer to Markov Chains
with Stationary Transition Probabilities by K. L. Chung (1967).

Theorem 1: The stochastic matrix and the initial distribution completely specify a
Markov chain.

Theorem 2: The Markov property referred to in Eqn. (1) is equivalent to any one of the following three results. Let the states i, j, i_1, i_2, i_3, ... be any states of the Markov chain {X_n}; then

1. for any n_1 < n_2 < n_3 < ... < n_k < n_{k+1},

P[X_{n_{k+1}} = j | X_{n_k} = i, X_{n_{k-1}} = i_{k-1}, ..., X_{n_2} = i_2, X_{n_1} = i_1]
= P[X_{n_{k+1}} = j | X_{n_k} = i]   (4)

2. for any n_1 < n_2 < n_3 < ... < n_k,

P[X_{n_k} = i_k, X_{n_{k-1}} = i_{k-1}, ..., X_{n_2} = i_2, X_{n_1} = i_1]
= P[X_{n_k} = i_k | X_{n_{k-1}} = i_{k-1}] ....... P[X_{n_2} = i_2 | X_{n_1} = i_1] P[X_{n_1} = i_1]   (5)

3. P[X_{n+1} = j, X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0]
= P[X_{n+1} = j | X_n = i] P[X_n = i | X_{n-1} = i_{n-1}] ....... P[X_1 = i_1 | X_0 = i_0] P[X_0 = i_0]   (6)

From Eqn. (6) it follows that the joint probability distribution of (X_0, X_1, X_2, X_3, ..., X_n) of the Markov chain {X_n} is completely determined if the initial distribution and the transition matrix of the chain are known. Let us prove this result.

Starting with the joint probability of (X_0, X_1, X_2, X_3, ..., X_n), we have

P[X_n = i_n, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0]
= P[X_n = i_n | X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] .......
  P[X_2 = i_2 | X_1 = i_1, X_0 = i_0] P[X_1 = i_1 | X_0 = i_0] P[X_0 = i_0]
  (using conditional probability and the product rule)
= P[X_n = i_n | X_{n-1} = i_{n-1}] .......
  P[X_2 = i_2 | X_1 = i_1] P[X_1 = i_1 | X_0 = i_0] P[X_0 = i_0]   (using the Markov property)
= p_{i_{n-1} i_n} p_{i_{n-2} i_{n-1}} ... p_{i_1 i_2} p_{i_0 i_1} u_{i_0}

where u_{i_0} is the initial probability and p_{i_{n-1} i_n}, ..., p_{i_0 i_1} are transition probabilities.

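The derivation above translates directly into a small computation. Below is a hedged sketch (assuming NumPy): the probability of one particular trajectory (i_0, i_1, ..., i_n) equals u[i_0] times the product of the one-step transition probabilities along the path. The numbers used are the weather-model values that appear in Example 1 below; any stochastic matrix and initial vector would work.

```python
import numpy as np

# Joint probability of a trajectory: u[i0] * p[i0,i1] * p[i1,i2] * ... * p[i_{n-1}, i_n]
def path_probability(u, P, path):
    u, P = np.asarray(u, dtype=float), np.asarray(P, dtype=float)
    prob = u[path[0]]
    for a, b in zip(path[:-1], path[1:]):
        prob *= P[a, b]
    return prob

u = [0.7, 0.2, 0.1]                       # illustrative initial distribution
P = [[0.75, 0.15, 0.10],
     [0.25, 0.45, 0.30],
     [0.15, 0.45, 0.40]]
print(path_probability(u, P, [0, 2, 1, 0]))   # P(X0=S, X1=R, X2=C, X3=S) = 0.007875
```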
Example 1 (Simple Weather Model): Let us consider three possible conditions of weather on any day, say, Sunny (S), Cloudy (C), and Rainy (R). Suppose the probability that a sunny day will follow a sunny day is 0.75, that a cloudy day will follow a sunny day is 0.15, and that a rainy day will follow a sunny day is 0.10. Similarly, the probability that a sunny day will follow a cloudy day is 0.25, that a cloudy day will follow a cloudy day is 0.45, and that a rainy day will follow a cloudy day is 0.30. The probability that a sunny day will follow a rainy day is 0.15, that a cloudy day will follow a rainy day is 0.45, and that a rainy day will follow a rainy day is 0.40. We assume that each day's weather condition depends only on the condition of the previous day. Therefore, with this information we may form a Markov chain {X_n}, where X_n represents the weather condition of the nth day. We may take the three conditions of weather S, C, and R as the states, denoted by the numbers 1, 2, 3 respectively, for the Markov chain.

From the information above, we can determine the transition probabilities as

p_SS = 0.75, p_SC = 0.15, p_SR = 0.10,
p_CS = 0.25, p_CC = 0.45, p_CR = 0.30,
p_RS = 0.15, p_RC = 0.45, p_RR = 0.40.

These are conveniently presented in the 3 x 3 square transition matrix P given below.

          S     C     R
    S   0.75  0.15  0.10
P = C   0.25  0.45  0.30
    R   0.15  0.45  0.40

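To get a feel for the model, one can simulate the chain directly. The following is a small sketch (assuming NumPy; the seed and run length are arbitrary choices): starting from a sunny day, each day's weather is drawn from the row of P corresponding to the previous day's state, which is exactly the Markov dependence on one day only.

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["S", "C", "R"]
P = np.array([[0.75, 0.15, 0.10],
              [0.25, 0.45, 0.30],
              [0.15, 0.45, 0.40]])

day = 0                          # start in state S
trajectory = [states[day]]
for _ in range(10):
    day = rng.choice(3, p=P[day])    # sample next state from the current row
    trajectory.append(states[day])
print(trajectory)
```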
Example 2: Consider that in a city, in the coming week, the probability that a healthy person will fall sick is 0.20, and that he will remain healthy is 0.80. Consider another case where, in the coming week, the probability that a sick person will become healthy is 0.65, that he will die is 0.25, and that he will remain sick is 0.10. We will form a P-matrix on the basis of the above information. In each week, a person will be in any one of three conditions - healthy, sick or dead - which are the three states for his health. If each week's health condition depends on the condition in the previous week only, then we have a Markov chain {X_n}, where X_n represents the health condition of a person in the nth week. From the above information we can determine the transition probabilities. Assume the states Healthy, Sick and Dead are denoted as 1, 2, 3 respectively (Dead is an absorbing state, so its row is (0, 0, 1)).

The transition matrix P is obtained as shown below.

          H     S     D
    H   0.80  0.20  0.00
P = S   0.65  0.10  0.25
    D   0.00  0.00  1.00

***
So far, we have only defined a Markov chain. Now, let us discuss a graphical
representation of a Markov chain.

2.4 GRAPHICAL REPRESENTATION


Markov chains may be depicted by a directed graph, or digraph, a graph having directed edges. The states of a Markov chain are represented by the vertices, or nodes, of the graph, and the single step transitions between the states by the directed arcs (edges) joining the vertices. For instance, in Example 1, the probability of the single step transition from the state rainy (R) to sunny (S) is 0.15. Then the vertices labeled R and S are joined by an arc (also called an edge) directed from vertex R to vertex S. The arc is labeled by the corresponding probability, 0.15 in this case. Likewise, the transition from S to S, with probability 0.75, is represented by a self-loop labeled 0.75 at the vertex S. No edge is drawn corresponding to a transition probability of zero. Thus, the number of edges, including self-loops, will equal the number of positive entries in the one step transition probability matrix, which is 9 in this case. Let the graph be denoted by G = (V, E), where V is the set of vertices representing the different states of the Markov chain, and E is the set of edges representing all possible non-zero transition probabilities. This digraph is called a transition graph. In a transition graph, the sum of the probabilities of all the edges emanating from each node will be one. Conversely, if in a labeled digraph all the labels of the edges are positive numbers, and the sum of all the labels of the edges emanating from each node is one, then such a graph is called a stochastic graph, and we can define a Markov chain with this digraph as its transition graph.

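The transition graph can be extracted mechanically from the P-matrix. Below is a minimal sketch (plain Python; the weather matrix of Example 1 is used for illustration) that lists one labelled, directed edge per positive entry of P, so self-loops appear for positive diagonal entries and zero probabilities produce no edge.

```python
# Build the edge list of the transition graph from a transition matrix.
def transition_graph(P, labels):
    edges = []
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                edges.append((labels[i], labels[j], p))
    return edges

P = [[0.75, 0.15, 0.10],
     [0.25, 0.45, 0.30],
     [0.15, 0.45, 0.40]]
for edge in transition_graph(P, ["S", "C", "R"]):
    print(edge)          # 9 edges, matching the 9 positive entries of P
```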
Example 3: The directed graph of the Markov chain given in Example 1 is shown in Fig. 1.

Example 4: The directed graph of the Markov chain given in Example 2 is shown in Fig. 2.
Example 5: The prime minister of a country tells a journalist, X, about his intention to run, or not to run, in the next election. The journalist transmits this information to Y, Y transmits it to Z, and so forth. We assume that there is a probability 'a' that a person will change the answer from "yes" to "no" when transmitting it to the next person, and a probability 'b' that a person will change it from "no" to "yes". We choose the messages, either "yes" or "no", passed to the fellow journalists as states. It may be expressed as a Markov chain {X_n}, where X_n denotes the message transmitted by the nth person to the next person, and X_0 denotes the intention revealed by the prime minister at the start. Here, we have denoted the two states "yes" and "no" as 1 and 2 respectively. From the above information, we can determine the transition probabilities

p_11 = P(X_n = 1 | X_{n-1} = 1) = 1 - a,   p_12 = P(X_n = 2 | X_{n-1} = 1) = a
p_21 = P(X_n = 1 | X_{n-1} = 2) = b,       p_22 = P(X_n = 2 | X_{n-1} = 2) = 1 - b

Therefore, the P-matrix will be

          1       2
    1   1 - a     a
P = 2     b     1 - b

Example 6: Each time a certain horse runs in a three-horse race, he has probability 1/2 of winning (W), 1/4 of coming in second (S), and 1/4 of coming in third (T), independent of the outcome of any previous race. We have an independent trials process, but it can also be considered as a Markov chain. Here, we choose the outcomes of the race, that is, winning, second, and third, as the three states. It may be modeled as a Markov chain {X_n}, where X_n denotes the outcome of the nth race. From the above information, we can determine the transition probabilities shown in the transition matrix that follows.

          W     S     T
    W    1/2   1/4   1/4
P = S    1/2   1/4   1/4
    T    1/2   1/4   1/4

Remark 2: In general, we see that any sequence of discrete i.i.d. (independently and identically distributed) random variables can be considered as a Markov chain. In such a case, the transition matrix has identical rows, each row being the probability distribution of the random variable X_n.

You may now try the following exercises on the basis of above discussions.

E1) Assume that a man's profession can be classified as business, agriculture, or
public servant. It is known from past data that, of the sons of businessmen, 80%
are businessmen, 10% are farmers and 10% are public servants. In the case of
sons of farmers, 60% are farmers, 20% are businessmen, and 20% are public
servants. Finally, in the case of public servants, 50% of the sons are public
servants and 25% each are in the other two categories. Assume that every man
has at least one son. Does the choice of profession by sons in the successive
generations in a family form a Markov chain? If so, write down its matrix of
transition probabilities.

E2) Draw the transition graph for the problem given in Example 6.

E3) The schooling status of a student in any year may be represented by 6 states,
namely, nursery, class one, class two, ..., class five. Let p_i denote the
probability that a student in state i in any year jumps to a higher class (state
i + 1), and q_i denote the probability that a student remains in the same class
(state i) in the next year. Assume that class 5 is the highest status and it cannot
be crossed. If X_n denotes the status of a student in the nth year of his schooling,
show that {X_n} is a Markov chain. Set up the matrix of transition probabilities.

So far, we have learnt about the Markov chain and its graphical representation. Now, in the next section, we shall continue the discussion with higher step transition probabilities.

2.5 HIGHER ORDER TRANSITION PROBABILITIES


Definition 5 (Higher Step Transition Probability Matrix): The n-step transition probability for the transition from state i to j in n steps in a homogeneous Markov chain, denoted by p_ij^(n), was defined in Eqn. (3). The matrix P^(n) = (p_ij^(n)) is called the n-step transition probability matrix.
When n = 1, we have P^(1) = (p_ij^(1)) = (p_ij) = P.

For convenience, we define P^(0) = I, where I is an identity matrix.

The unconditional probability distribution of X_n, the state of the Markov chain at step n, is defined as u_j^(n) = P[X_n = j], j = 1, 2, 3, ..., s. The unconditional probability distribution of X_n in vector form may be denoted as

u^(n) = (u_1^(n), u_2^(n), ..., u_s^(n)).

Now, we will prove some results providing a relation between P^(n) and the P-matrix. These results will be useful in computing higher order transition probabilities.

Theorem 3: The n-step transition probabilities satisfy the recurrence relation

p_ij^(n) = sum_{k=1}^{s} p_ik^(n-1) p_kj, for i, j in S,

the matrix form of which can be written as

P^(n) = P^(n-1) P

Proof: Using the Law of Total Probability and Conditional Probability discussed in Unit 1, we have

p_ij^(n) = P[X_n = j | X_0 = i]
= sum_{k=1}^{s} P[X_n = j, X_{n-1} = k | X_0 = i]   (using the Law of Total Probability)
= sum_{k=1}^{s} P[X_n = j | X_{n-1} = k, X_0 = i] P[X_{n-1} = k | X_0 = i]   (using Conditional Probability)
= sum_{k=1}^{s} P[X_n = j | X_{n-1} = k] P[X_{n-1} = k | X_0 = i]   (using the Markov Property)
= sum_{k=1}^{s} p_kj p_ik^(n-1)

The last expression is the (i, j)th element of the product of the matrices P^(n-1) and P = (p_ij). Thus, we get

P^(n) = P^(n-1) P   [since (p_ij^(n)) = P^(n)]
***
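The recurrence of Theorem 3 (and, in effect, the matrix-power result of Theorem 4 that follows) is easy to check numerically. Below is a small sketch (assuming NumPy; the 2 x 2 matrix is illustrative, not from the text): building P^(n) step by step through P^(n) = P^(n-1) P gives the same matrix as the direct matrix power.

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.5, 0.5]])

Pn = np.eye(2)                    # P^(0) = I
for _ in range(5):
    Pn = Pn @ P                   # P^(n) = P^(n-1) P
print(np.allclose(Pn, np.linalg.matrix_power(P, 5)))   # True
```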
Theorem 4: Let P be the transition matrix of a homogeneous Markov chain. The (i, j)th entry of the matrix P^n gives the probability that the Markov chain, starting in the state i initially, will be in state j after n steps, i.e.

P^(n) = P^n

Proof: Clearly, the probability that the Markov chain, starting in state i, will be in state j after n steps is p_ij^(n), which is the (i, j)th entry of the matrix P^(n). Therefore, the theorem will be proved if we prove P^(n) = P^n.
Now, let us apply the method of induction to prove this. For n = 2,
from Theorem 3 we have P^(2) = P^(1) P = P P = P^2.
Again, assuming the result for n, we can verify it for n + 1 as follows:

P^(n+1) = P^(n) P   (using Theorem 3)
        = P^n P     (using the assumption for n)
        = P^(n+1) as a matrix power.

Hence, by the method of induction, the statement is true for every positive integer n.
***
Theorem 5 (Chapman-Kolmogorov Equation): A time homogeneous Markov chain satisfies the equation

p_ij^(m+n) = sum_{k=1}^{s} p_ik^(m) p_kj^(n),   i, j = 1, 2, ..., s, for m, n = 0, 1, 2, ...

or, in matrix form, P^(m+n) = P^(m) P^(n), with P^(0) = I.

Proof: p_ij^(m+n) = P[X_{m+n} = j | X_0 = i]

= sum_{k=1}^{s} P[X_{m+n} = j, X_m = k | X_0 = i]   (using the Law of Total Probability)

= sum_{k=1}^{s} P[X_{m+n} = j | X_m = k, X_0 = i] P[X_m = k | X_0 = i]   (using Conditional Probability)

= sum_{k=1}^{s} P[X_{m+n} = j | X_m = k] P[X_m = k | X_0 = i]   (using the Markov Property)

= sum_{k=1}^{s} p_kj^(n) p_ik^(m)

Using the matrix multiplication representation, we have, for every m, n,

P^(m+n) = P^(m) P^(n), where we define P^(0) = I.
***
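A one-line numerical check of the Chapman-Kolmogorov equation, on an illustrative matrix (a sketch assuming NumPy; the matrix and the choice m = 3, n = 4 are arbitrary):

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.5, 0.5]])
m, n = 3, 4
lhs = np.linalg.matrix_power(P, m + n)                               # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)    # P^(m) P^(n)
print(np.allclose(lhs, rhs))      # True
```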

Theorem 6: Let P be the transition matrix of a time homogeneous Markov chain, and let u be the initial probability vector. Then the unconditional probability P(X_n = j) that the chain is in state j after n steps is the jth entry in the vector u^(n), given by

u^(n) = u P^n

Proof: Since

u_j^(n) = P[X_n = j]
= sum_{i=1}^{s} P[X_n = j, X_0 = i]   (using the Law of Total Probability)
= sum_{i=1}^{s} P[X_n = j | X_0 = i] P[X_0 = i]
= sum_{i=1}^{s} u_i p_ij^(n),

the result may be expressed in matrix form as

u^(n) = u P^(n) = u P^n   (using Theorem 4)

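Theorem 6 is a single matrix-vector product in code. The sketch below (assuming NumPy; the initial vector and the weather matrix of Example 1 are used for illustration) prints the unconditional distribution u^(n) = u P^n for a few values of n.

```python
import numpy as np

u = np.array([0.7, 0.2, 0.1])                 # illustrative initial distribution
P = np.array([[0.75, 0.15, 0.10],
              [0.25, 0.45, 0.30],
              [0.15, 0.45, 0.40]])
for n in (1, 2, 6):
    print(n, u @ np.linalg.matrix_power(P, n))   # u^(n) = u P^n
```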
Example 7 (Random Walk): We consider a particle which performs a random walk on the real line on the set of non-negative integers {0, 1, 2, ..., N}, as shown in Fig. 3, with N + 1 possible positions. If, at any time, the particle is at the position i (i can be 1, 2, ..., N - 1), then in the next unit of time it can move one step forward (+1) to position i + 1, or one step backward (-1) to the position i - 1, with probabilities p (0 < p < 1) and q (q = 1 - p), respectively.

Fig. 3: The Random Walk

At the end points, 0 and N, there are two typical behaviours for the particle. If the particle reaches 0, then it remains at 0 with probability 'a' or moves to 1 with probability '1 - a'. Similarly, assume the particle remains at N with probability 'b' and moves to N - 1 with probability '1 - b' whenever it reaches that position. The position 0 will be an absorbing barrier when a = 1, and it will be a reflecting barrier if a = 0. The position 0 will be called an elastic barrier, or partially reflecting barrier, if 0 < a < 1. Similarly, the position N will be absorbing when b = 1, reflecting when b = 0, and elastic/partially reflecting when 0 < b < 1.

Suppose the particle starts in a position k (0 <= k <= N) at time 0. Let X_n denote the position of the particle at time n. Then, clearly, the sequence {X_n} follows the Markov property. The N + 1 possible positions {0, 1, 2, ..., N} of the particle are the possible states of the chain.
Here, for 0 < r < N,
P[X_n = r + 1 | X_{n-1} = r] = p
P[X_n = r - 1 | X_{n-1} = r] = q

Also, when r = 0,
P[X_n = 1 | X_{n-1} = 0] = 1 - a
P[X_n = 0 | X_{n-1} = 0] = a

and, when r = N,
P[X_n = N - 1 | X_{n-1} = N] = 1 - b
P[X_n = N | X_{n-1} = N] = b

The transition matrix is found as

          0     1     2    ...   N-1    N
    0     a    1-a    0    ...    0     0
    1     q     0     p    ...    0     0
P = 2     0     q     0    ...    0     0
    ...  ...   ...   ...   ...   ...   ...
    N-1   0     0     0    ...    0     p
    N     0     0     0    ...   1-b    b

and the initial probability vector is u = (0, 0, ..., 1, 0, ..., 0), with 1 at the (k + 1)th place in the vector.
***
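The boundary behaviour is easiest to see by constructing the matrix programmatically. The sketch below (assuming NumPy; parameter values are illustrative) builds the random-walk transition matrix of Example 7 for states 0, 1, ..., N with the end-point behaviour controlled by a and b.

```python
import numpy as np

def random_walk_matrix(N, p, a, b):
    q = 1 - p
    P = np.zeros((N + 1, N + 1))
    P[0, 0], P[0, 1] = a, 1 - a            # behaviour at the barrier 0
    P[N, N], P[N, N - 1] = b, 1 - b        # behaviour at the barrier N
    for i in range(1, N):
        P[i, i + 1], P[i, i - 1] = p, q    # interior states move +1 or -1
    return P

print(random_walk_matrix(N=4, p=0.6, a=1.0, b=0.0))  # absorbing at 0, reflecting at N
```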
Example 8 (Gambler's Ruin Problem): There are two gamblers, A and B, playing against each other. Let the initial capital of A be x units, and that of B be z - x units. At each move, player A can win one unit from B, with probability p, or can lose one unit to B, with probability q (p + q = 1). In due course, after a series of independent moves, if the capital of A reduces to zero, then A is ruined and the game ends, and if his capital increases to z, then B is ruined and the game ends. This problem can be modeled as a random walk problem with absorbing barriers at the two ends. Here, the Markov chain {X_n} represents the capital of A at the nth move of the game. It has z + 1 states, ranging from 0 to z. The transition probability matrix can be obtained directly from Example 7 by putting a = 1, b = 1 and N = z. Also, the initial state is k = x with probability 1.
***
Example 9: Let the initial distribution in Example 1 of the Simple Weather Model be u = (0.7, 0.2, 0.1). Let the three states sunny, cloudy and rainy be represented by the integers 1, 2, 3 respectively.

Then, the probability that the initial day is sunny, the first day is rainy, the second day is cloudy, and the third day is sunny, is given by:

u_1 p_13 p_32 p_21 = 0.7 x 0.10 x 0.45 x 0.25 = 0.007875

Also, the probability that all four successive days starting from the initial day are sunny equals:

u_1 p_11 p_11 p_11 = 0.7 x (0.75)^3 = 0.2953 (approximately).

Example 10: Suppose that in Example 9 the P-matrix is modified as given below.

         1      2      3
    1  0.500  0.250  0.250
P = 2  0.450  0.100  0.450
    3  0.250  0.250  0.500

Let us now find the probability distribution of weather for the first day, the second day, and the third day, and also the probability distribution for the sixth day. The probability distribution of weather for the first day is the probability distribution of X_1. Now

u_1^(1) = P[X_1 = 1] = 0.7 x 0.500 + 0.2 x 0.450 + 0.1 x 0.250 = 0.465

Similarly,

u_2^(1) = P[X_1 = 2] = 0.7 x 0.250 + 0.2 x 0.100 + 0.1 x 0.250 = 0.22

and

u_3^(1) = P[X_1 = 3] = 0.7 x 0.250 + 0.2 x 0.450 + 0.1 x 0.500 = 0.315

The probabilities may be written in vector form as

u^(1) = (0.465, 0.22, 0.315)

The distribution of X_1 may also be obtained by using the formula u^(n) = u P^n:

                          0.500  0.250  0.250
u^(1) = u P = (0.7 0.2 0.1) 0.450  0.100  0.450  = (0.465, 0.22, 0.315)
                          0.250  0.250  0.500

We may get the distribution of X_2, the probability distribution of weather for the second day, as

u^(2) = u P^2 = (0.410, 0.217, 0.373)


Likewise, the distribution of X_3, the probability distribution of weather for the third day, is

u^(3) = u P^3 = (0.396, 0.217, 0.387) (approximately),

and the distribution of X_6, the probability distribution of weather for the sixth day, will be

u^(6) = u P^6

Remark 3: Here we see that the sixth day probability distribution of weather has become independent of the initial distribution. You may verify that the same distribution for the sixth day will be found if any other initial distribution is used. This happens because all rows of P^6 are identical and define a probability distribution on the set of states. You may also find powers of P higher than 6. Are they identical to P^6? If, for large n, all rows of P^n become identical and define a probability distribution, then the Markov chain is called a Regular Markov chain. We shall discuss these chains in Unit 3.

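Remark 3 can be seen at once by printing a few powers of the modified matrix of Example 10 (a sketch assuming NumPy): the rows of P^n become nearly identical as n grows, so the distribution of X_n forgets the initial distribution.

```python
import numpy as np

P = np.array([[0.500, 0.250, 0.250],
              [0.450, 0.100, 0.450],
              [0.250, 0.250, 0.500]])
print(np.linalg.matrix_power(P, 6))    # the three rows are already (nearly) identical
print(np.linalg.matrix_power(P, 20))   # even closer to the common limiting row
```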
Example 11 (Partial Sum): Let {X_n} be i.i.d. (independently and identically distributed) random variables taking only non-negative integral values. Let S_n = sum_{i=1}^{n} X_i and S_0 = 0. Then S_n = S_{n-1} + X_n. Since the distribution of S_n depends only on S_{n-1}, and not on any of S_{n-2}, S_{n-3}, ..., S_0, the sequence {S_n} is a Markov chain with state space S = {0, 1, 2, ..., j, ...}.
Again,

p_ij = P[S_n = j | S_{n-1} = i] = P[X_n = j - i] = p_{j-i} (say)

Therefore, the Markov chain {S_n} is time homogeneous, or has stationary transition probabilities p_ij, given above. Here, p_ij depends only on the difference j - i; in such a case the Markov chain is said to have stationary independent increments, and the Markov chain is called an additive process. If the sequence {X_n} is a sequence of i.i.d. Bernoulli random variables with P[X_n = 1] = p and P[X_n = 0] = q, then the P-matrix of {S_n} will be

         0  1  2  3  ...
    0    q  p  0  0  ...
P =  1   0  q  p  0  ...
    2    0  0  q  p  ...
    ...  .  .  .  .  ...
Example 12 (Ehrenfest Model): This example is a special case of a model, called the Ehrenfest model, given by P. and T. Ehrenfest in 1907. It has been used to explain the diffusion of gases. Suppose we have two urns that contain, between them, four balls. At each step, one of the four balls is chosen at random and moved from its present urn to the other urn. We choose, as states, the number of balls in the first urn. Thus, the set of states is {0, 1, 2, 3, 4}. The sequence of random variables {X_n}, denoting the number of balls in the first urn at successive steps, is a Markov chain.

p_0j = P(X_n = j | X_{n-1} = 0) = 0 when j is not 1, and p_01 = P(X_n = 1 | X_{n-1} = 0) = 1, since when the first urn is empty the chosen ball is certainly from the second urn, and it will be transferred to the first urn.

p_10 = P(X_n = 0 | X_{n-1} = 1) = 1/4, p_12 = P(X_n = 2 | X_{n-1} = 1) = 3/4, p_11 = p_13 = p_14 = 0, since, when the first urn has one ball, the chosen ball will be from the first urn with probability 1/4, and from the second urn with probability 3/4. Similarly, the other transition probabilities can be obtained.

The transition matrix is then

         0     1     2     3     4
    0    0     1     0     0     0
    1   1/4    0    3/4    0     0
P = 2    0    1/2    0    1/2    0
    3    0     0    3/4    0    1/4
    4    0     0     0     1     0

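The same matrix can be generated for any number of balls from the rule "from state i, move to i - 1 with probability i/M and to i + 1 with probability (M - i)/M". A short sketch (assuming NumPy; M = 4 reproduces the matrix above):

```python
import numpy as np

M = 4                                   # total number of balls
P = np.zeros((M + 1, M + 1))
for i in range(M + 1):
    if i > 0:
        P[i, i - 1] = i / M             # a ball from the first urn is chosen
    if i < M:
        P[i, i + 1] = (M - i) / M       # a ball from the second urn is chosen
print(P)
```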
Example 13: A Markov chain has the following initial distribution u and P-matrix:

u = (1/3, 1/3, 1/3) and

    0.5  0.5  0.0
P = 0.0  0.5  0.5
    0.5  0.0  0.5

We have

      0.25  0.50  0.25
P^2 = 0.25  0.25  0.50
      0.50  0.25  0.25

We find the following results:
u P = (1/3, 1/3, 1/3) = u, and
u P^2 = (1/3, 1/3, 1/3) = u.

We can find the same relation for all the higher powers of P. Therefore, in general, u P^n = (1/3, 1/3, 1/3) = u for every n, and thus, using Theorem 6, we have u^(n) = u for all n.
With such an initial distribution u, the Markov chain is called a Stationary Markov chain. The probability distribution u is then called the Stationary Distribution of the Markov chain. This type of Markov chain will be discussed in detail in Unit 3.
***

You may now try the following exercises.

E4) In Example 5, let a = 0 and b = 1/2. Compute P, P^2, and P^3. What will P^n be? What happens to P^n as n tends to infinity? Interpret this result.

E5) In Example 6, compute P, P^2 and P^3. What will P^n be?

E6) Compute the matrices P^2, P^3, P^4 for the Markov chain defined by the transition matrix P with rows (1, 0) and (0, 1). Do the same for the transition matrix P with rows (0, 1) and (1, 0). Interpret the results in each of these processes.

E7) Assume in Exercise E1 that every man has at least one son. Find the probability that a randomly chosen grandson of a businessman is a farmer.

So far, we have been discussing Markov chains, the related probability matrices
including initial distributions, and their interpretations. Now, in the next section we
shall discuss two important methods of calculating Pn.

2.6 METHODS OF CALCULATING P^n


In this section, we shall discuss the following two methods of evaluating Pn for a
given P .
(1) Method of Spectral Decomposition
(2) Method of Generating function.
First, let us understand the method of Spectral Decomposition.

2.6.1 Method of Spectral Decomposition

Let P be the transition matrix of finite order s x s. Suppose P has distinct eigenvalues (or latent roots, or characteristic roots, or spectral values) L_1, L_2, L_3, ..., L_s. They are the roots of the characteristic equation |P - LI| = 0, where I is the s x s identity matrix.

A non-zero column vector x is called a right eigenvector (or latent, or characteristic vector) of P, corresponding to the eigenvalue L_i, if it satisfies the vector equation (P - L_i I) x = 0. A non-zero row vector y' is called a left eigenvector (or latent, or characteristic vector) of P, corresponding to the eigenvalue L_i, if it satisfies the vector equation y'(P - L_i I) = 0. The right and left eigenvectors are not unique. For example, if x is a right eigenvector then kx is also a right eigenvector, where k (non-zero) is a scalar. A similar rule holds for the left eigenvectors.

Let x_i, y_i' be the right and left eigenvectors corresponding to L_i, (i = 1, 2, ..., s).

Let c_i = 1/(y_i' x_i) and B_i = c_i x_i y_i'. The product B_i is a matrix of order s x s, and is called the constituent matrix corresponding to L_i, (i = 1, 2, ..., s).

We have the following properties in the context of constituent matrices:

(i) B_i B_j = 0 for i not equal to j (orthogonality)
(ii) B_i^2 = B_i (idempotence)
(iii) sum_{i=1}^{s} B_i = I, where I is an s x s identity matrix
(iv) P = sum_{i=1}^{s} L_i B_i, the Spectral Decomposition.

In general, using the above properties, we have

P^n = sum_{i=1}^{s} L_i^n B_i

From this, we can get p_ij^(n) as the (i, j)th element of P^n.
***
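The decomposition P^n = sum_i L_i^n B_i can be computed numerically when the eigenvalues are distinct. The following is a hedged sketch (assuming NumPy; the 2 x 2 matrix is illustrative): the columns of the right-eigenvector matrix and the rows of its inverse play the roles of x_i and y_i', and since they are already normalized so that y_i' x_i = 1, the constituent matrices are simply the outer products x_i y_i'.

```python
import numpy as np

def pn_spectral(P, n):
    P = np.asarray(P, dtype=float)
    lam, X = np.linalg.eig(P)        # columns of X: right eigenvectors
    Y = np.linalg.inv(X)             # rows of Y: matching left eigenvectors (y_i' x_i = 1)
    Pn = np.zeros_like(P)
    for i in range(len(lam)):
        B_i = np.outer(X[:, i], Y[i, :])   # constituent matrix B_i
        Pn = Pn + (lam[i] ** n) * B_i      # add lambda_i^n B_i
    return Pn.real

P = [[0.8, 0.2], [0.5, 0.5]]
print(np.allclose(pn_spectral(P, 7), np.linalg.matrix_power(np.array(P), 7)))  # True
```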

Remark 4: (i) Since the row sum equals unity for all the rows of P, one is always an eigenvalue of P, and the corresponding right eigenvector x has all its elements equal to unity. Therefore, the constituent matrix B_1 corresponding to the eigenvalue one will have all rows identical. It is illustrated below:

B_1 = c_1 (1, 1, ..., 1)' (y_1, y_2, ..., y_s), so every row of B_1 equals c_1 (y_1, y_2, ..., y_s).

(ii) All the eigenvalues of P are less than or equal to unity in absolute value.
(iii) If the matrix P is positive and irreducible, then it has only one eigenvalue equal to unity, while if P is non-negative, irreducible and cyclic of order h, then it may have h (h >= 1) eigenvalues equal to unity in absolute value.
(iv) If unity is a non-repeated eigenvalue of P, then lim_{n -> infinity} P^n = B_1, the constituent matrix for the eigenvalue 1.
***
Example 14: We will find P^n for the transition matrix P given in Example 5,

          1       2
    1   1 - a     a
P = 2     b     1 - b

For the eigenvalues L of P, the characteristic equation is |P - LI| = 0, or

| 1 - a - L       a      |
|     b       1 - b - L  |  = 0,

and solving it, we get L_1 = 1, L_2 = 1 - a - b.

The right eigenvector x_1 corresponding to L_1 = 1 will satisfy (P - L_1 I) x_1 = 0, i.e. P x_1 = x_1. Therefore, we have to solve the system of equations

(1 - a) x_11 + a x_12 = x_11
b x_11 + (1 - b) x_12 = x_12

which gives x_11 = x_12, and therefore x_1 = (1, 1)'.

Similarly, the right eigenvector x_2 corresponding to L_2 = 1 - a - b can be obtained by solving

(1 - a) x_21 + a x_22 = (1 - a - b) x_21
b x_21 + (1 - b) x_22 = (1 - a - b) x_22.

Solving this, we get b x_21 = -a x_22, and thus x_2 = (a, -b)'.

The left eigenvector y_1' corresponding to L_1 = 1 will be obtained by solving y_1'(P - L_1 I) = 0, which reduces to

(1 - a) y_11 + b y_12 = y_11
a y_11 + (1 - b) y_12 = y_12

This gives a y_11 = b y_12, and so we get

y_1' = (b, a).

Similarly, we get the left eigenvector y_2' corresponding to L_2 = 1 - a - b as

y_2' = (1, -1),

and, therefore, c_1 = 1/(y_1' x_1) = 1/(a + b) and c_2 = 1/(y_2' x_2) = 1/(a + b).

Next, we compute the constituent matrices:

B_1 = c_1 x_1 y_1' = 1/(a + b)  [ b  a ]
                                [ b  a ]

B_2 = c_2 x_2 y_2' = 1/(a + b)  [  a  -a ]
                                [ -b   b ]

And thus,

P^(n) = P^n = sum_{i=1}^{2} L_i^n B_i = 1/(a + b)  [ b + a(1-a-b)^n   a - a(1-a-b)^n ]
                                                   [ b - b(1-a-b)^n   a + b(1-a-b)^n ]

We also get p_ij^(n), the probability of transition from state i to j in n steps, as the (i, j)th element of P^n. Therefore,

p_11^(n) = [b + a(1-a-b)^n]/(a + b),   p_12^(n) = [a - a(1-a-b)^n]/(a + b)
p_21^(n) = [b - b(1-a-b)^n]/(a + b),   p_22^(n) = [a + b(1-a-b)^n]/(a + b)
As n tends to infinity,

p_11^(n) -> b/(a + b),   p_12^(n) -> a/(a + b),
p_21^(n) -> b/(a + b),   p_22^(n) -> a/(a + b).

Let the initial distribution be u = (p, 1 - p). Then the unconditional probability distribution of X_n is

u^(n) = u P^n = 1/(a + b) (b, a) + (1 - a - b)^n/(a + b) (ap - bq, -ap + bq), where q = 1 - p,

and hence,

u_1^(n) = b/(a + b) + (ap - bq)(1 - a - b)^n/(a + b),
u_2^(n) = a/(a + b) + (-ap + bq)(1 - a - b)^n/(a + b).

Example 15: Three girls, A, B, and C, stand in a circle to play a ball throwing game. Each one can throw the ball to one of her two neighbours, each with probability 0.5. The sequence of random variables {X_n}, where X_n denotes the player with whom the ball lies after the nth throw, will form a Markov chain. The Markov chain will have the following P-matrix:

     0    0.5  0.5
P =  0.5  0    0.5
     0.5  0.5  0

It is doubly stochastic, as all the row sums and column sums are unity. Therefore, corresponding to the eigenvalue L_1 = 1, the left eigenvector y_1' and the right eigenvector x_1 both have all their elements equal to one. Thus, the constituent matrix B_1 will have all rows identical and all columns identical (Bhat, 2000, p. 109). Therefore, all the elements of B_1 will be identical, and each will be equal to s^{-1} = 1/3. Thus,

      1/3  1/3  1/3
B_1 = 1/3  1/3  1/3
      1/3  1/3  1/3
***
Let us discuss the second method of calculating P^n, which is known as the method of
generating functions.

2.6.2 Method of Generating Function

As the name suggests, in this method a function is determined which generates P^n for different values of n.
Define the generating function

P(s) = I + sP + s^2 P^2 + s^3 P^3 + ... + s^n P^n + ...,  where |s| < 1.

(Here, s is a variable of the function P(s), and not the size of the state space as before.)
Since s^n P^n -> 0 as n -> infinity, we have P(s) = (I - sP)^{-1}, the inverse of the matrix (I - sP).
Thus, we may obtain P^n by extracting the coefficient of s^n in the expansion of (I - sP)^{-1}, and p_ij^(n) as the (i, j)th component of P^n.

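The coefficient extraction can be done symbolically. Below is a hedged sketch (assuming SymPy is available; the 2 x 2 matrix with rational entries is illustrative, not from the text): each entry of (I - sP)^{-1} is expanded as a power series in s, and the coefficient of s^n is read off and compared with the direct matrix power.

```python
import sympy as sp

s = sp.symbols('s')
P = sp.Matrix([[sp.Rational(4, 5), sp.Rational(1, 5)],
               [sp.Rational(1, 2), sp.Rational(1, 2)]])
G = (sp.eye(2) - s * P).inv()          # the matrix generating function P(s)

n = 3
Pn = G.applyfunc(lambda e: sp.series(e, s, 0, n + 1).removeO().coeff(s, n))
print(Pn)
print(sp.simplify(Pn - P**n))          # zero matrix: the coefficient of s^3 is P^3
```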
Example 16: Let us find P^n, where the transition matrix P is given below:

    q  p  0  0
P = 0  q  p  0
    0  0  q  p
    0  0  0  1

where q = 1 - p, and 0 < p < 1. We have

          1-sq   -sp     0      0
I - sP =   0     1-sq   -sp     0
           0      0     1-sq   -sp
           0      0      0     1-s

and, since |I - sP| = (1 - s)(1 - sq)^3, we have

(I - sP)^{-1} = adj(I - sP) / |I - sP|

To obtain P^n, we collect the coefficients of s^n by expanding each element of (I - sP)^{-1} in powers of s. Simplifying,

       q^n   n p q^{n-1}   C(n,2) p^2 q^{n-2}   1 - q^n - n p q^{n-1} - C(n,2) p^2 q^{n-2}
P^n =   0        q^n          n p q^{n-1}        1 - q^n - n p q^{n-1}
        0         0               q^n             1 - q^n
        0         0                0                  1

where C(n, 2) = n(n - 1)/2.

You may now try the following exercises.

E8) Let the state space of a Markov chain be S = {0, 1, 2}, and let its P-matrix be

    0  1  0
P = 0  0  1
    1  0  0

Obtain P^n.

E9) Find P^n for large n for the matrix given below.

    0.5  0.3  0.2
P = 0.2  0.4  0.4
    0.1  0.5  0.4

E10) Find P^n and its limiting value for large n for the matrix given below.

     1    0    0
P =  0    1    0
     p_1  p_2  p_3

where p_1 + p_2 + p_3 = 1.
E11) (Gene Model) The simplest type of inheritance of a trait in animals is governed by a pair of genes, each of which may be of two types, say G and g. An individual may have either the combination GG, or Gg (which is genetically the same as gG), or gg. Very often, the GG and Gg types are indistinguishable in appearance, and then we say that the G gene dominates the g gene. An individual is called dominant if he or she has GG genes, recessive if he or she has gg, and hybrid if a Gg mixture is present. Consider a process of continued matings. We start with an individual of known genetic character and mate it with a hybrid. We assume that there is at least one offspring. An offspring is chosen at random and is mated with a hybrid, and this process is repeated through a number of generations. The genetic type of the chosen offspring in successive generations forms a Markov chain with states dominant GG, hybrid Gg, and recessive gg, represented by 1, 2, and 3 respectively. The transition probability matrix is

     0.5   0.5    0
P = 0.25   0.5   0.25
      0    0.5   0.5

Find P^n and its limit for large n.

Now we bring this unit to a close. But before that let's briefly recall the important
concepts that we studied in it.

2.7 SUMMARY
In this unit, we have tried to acquaint you with the basic features of a stochastic
process, and Markov chains. We are summarizing these below:
1. We introduced the idea of Stochastic process and presented their classification
according to the nature of time and state space. The Markov chain was explained
as a particular case of the Stochastic process.
2. We defined the Markov property and the Markov chain, and presented some
examples suitable to a Markov model.
3. We studied properties of transition probabilities, and the transition matrix.
4. We described how the one-step transitions in a Markov chain can be represented as
   a digraph.
5. We have acquainted you with the concept of higher order transition probabilities.
6. We have defined the initial distribution, and illustrated the method of computing the
   unconditional probability distribution of the states of a Markov chain at the nth step in
   terms of the transition matrix and the initial distribution.
7. We have described the methods of spectral decomposition and generating functions
   to compute P^n.

2.8 SOLUTIONS/ANSWERS

E1) Let the three states, business, agriculture and public servant, be denoted by 1, 2, 3, respectively. Let the random variable X_n denote the choice of profession of the sons in the nth generation. Let p_ij denote the probability that, given a person is in the ith state (profession), his son will choose the jth state (profession). Since the choice of profession of a son depends only on the profession of his father, {X_n} is a Markov chain. Therefore, we get the following P-matrix:

     0.80  0.10  0.10
P =  0.20  0.60  0.20
     0.25  0.25  0.50
E3) Assume that the states nursery, class one, class two, ..., class five, are denoted by the numbers 0, 1, 2, ..., 5. The P-matrix will be

     q_0  p_0   0    0    0    0
      0   q_1  p_1   0    0    0
P =   0    0   q_2  p_2   0    0
      0    0    0   q_3  p_3   0
      0    0    0    0   q_4  p_4
      0    0    0    0    0    1

where q_i + p_i = 1.
E4) Using Example 14, we may get the following result by putting a = 0, b = 0.5 in the expression for P^n:

       1             0
P^n =
       1 - (0.5)^n   (0.5)^n

and, as n becomes large, P^n tends to the matrix with both rows equal to (1, 0).

E6) When P has rows (1, 0) and (0, 1), then P^2 = P^3 = P^4 = P, and in this case P^n = P for all n.

When P has rows (0, 1) and (1, 0), then P^2 = P^4 = I and P^3 = P. In this case,

P^n = P when n is odd, and P^n = I when n is even.

E7) We want the probability p_12^(2). Since

      0.685  0.190  0.125
P^2 =   .      .      .
      0.363  0.450  0.188

therefore p_12^(2) = 0.19.

E8) Using the method of generating functions,

          1   -s    0
I - sP =  0    1   -s
         -s    0    1

and |I - sP| = 1 - s^3, which is non-zero for |s| < 1.

We can get the following coefficients easily, since (1 - s^3)^{-1} contains only powers of s^3:

P^{3n} = coefficient of s^{3n} in (I - sP)^{-1} = I

P^{3n+1} = coefficient of s^{3n+1} in (I - sP)^{-1} = P

P^{3n+2} = coefficient of s^{3n+2} in (I - sP)^{-1} = P^2

where n is a non-negative integer.

E9) The P-matrix is

     0.5  0.3  0.2
P =  0.2  0.4  0.4
     0.1  0.5  0.4

The eigenvalue of largest modulus of P is L_1 = 1, and the other two eigenvalues are smaller than one in absolute value.

For the eigenvalue L_1 = 1, the right eigenvector is x_1 = (1, 1, 1)', and the left eigenvector y_1' is the solution of y_1'(P - L_1 I) = 0, equivalently, the solution of

0.5 y_11 + 0.2 y_12 + 0.1 y_13 = y_11
0.3 y_11 + 0.4 y_12 + 0.5 y_13 = y_12
0.2 y_11 + 0.4 y_12 + 0.4 y_13 = y_13

This gives y_1' = (0.16, 0.28, 0.24), and thus

1/c_1 = y_1' x_1 = 0.68

B_1 = c_1 x_1 y_1' = (1/0.68) (1, 1, 1)' (0.16, 0.28, 0.24) =

      0.235  0.412  0.353
      0.235  0.412  0.353
      0.235  0.412  0.353

Since, for large n, P^n tends to B_1, we get the result.

E10) Using the method of generating functions,

|I - sP| = (1 - s)^2 (1 - p_3 s), which is non-zero for |s| < 1, and

                 (1 - s)^{-1}                               0                                   0
(I - sP)^{-1} =       0                                 (1 - s)^{-1}                            0
                 p_1 s (1 - s)^{-1} (1 - p_3 s)^{-1}    p_2 s (1 - s)^{-1} (1 - p_3 s)^{-1}    (1 - p_3 s)^{-1}

P^n = coefficient of s^n in (I - sP)^{-1}, which gives

        1                                 0                                 0
P^n =   0                                 1                                 0
        p_1 (1 - p_3^n)/(1 - p_3)      p_2 (1 - p_3^n)/(1 - p_3)         p_3^n

As n becomes large,

        1                  0                0
P^n ->  0                  1                0
        p_1/(1 - p_3)   p_2/(1 - p_3)       0
E11) Solving |P - LI| = 0, we may get the eigenvalues of P as L_1 = 1, L_2 = 0.5, L_3 = 0.

For the eigenvalue L_1 = 1, the right eigenvector is x_1 = (1, 1, 1)', while the left eigenvector y_1' is obtained by solving y_1'(P - L_1 I) = 0. This gives

y_1' = (2, 4, 2), and thus

1/c_1 = y_1' x_1 = 8

                 0.25  0.5  0.25
B_1 = c_1 x_1 y_1' = 0.25  0.5  0.25
                 0.25  0.5  0.25

For the eigenvalue 0.5, the right eigenvector is x_2 = (-1, 0, 1)', and the left eigenvector will be

y_2' = (-1, 0, 1), and thus

1/c_2 = y_2' x_2 = 2. Therefore,

                          1  0  -1
B_2 = c_2 x_2 y_2' = (0.5) 0  0   0
                         -1  0   1

Thus, we have

      3              0.25  0.5  0.25                1  0  -1
P^n = sum L_i^n B_i = 0.25  0.5  0.25  + (0.5)^{n+1} 0  0   0
      i=1            0.25  0.5  0.25               -1  0   1

As n tends to infinity, P^n tends to B_1; therefore, the matrix

0.25  0.5  0.25
0.25  0.5  0.25
0.25  0.5  0.25

is the limiting value of P^n.
UNIT 3 STATIONARY MARKOV CHAINS

Structure
3.1 Introduction
    Objectives
3.2 Classification of States
    Irreducible Chains
    First Return and First Passage Probabilities
3.3 Recurrence and Transience
3.4 Stationary Distribution
3.5 Summary
3.6 Solutions/Answers

3.1 INTRODUCTION

In Unit 2, we defined the Markov chain and its basic properties. In that unit, we limited our discussions only to finite state Markov chains. Therefore, the transition matrices were only of finite order. Here, in Unit 3, we will deal mostly with Markov chains with countably many states. Therefore, the transition matrices will, generally, be of infinite order. We will study the classification of states under various conditions. Mainly, we will gain knowledge of the limiting behaviour of the chain. Some chains stabilize after a long time; their distributions become independent of the initial distribution of the chain. Due to this property, the limiting distribution is called the stationary distribution. We will learn the criteria under which the chains achieve the limiting distribution. We shall start the discussion in Sec. 3.2 with the classification of states of Markov chains. Here, we will present the concepts of communication of states, closed sets, and irreducibility. We will study the first passage times to the states and their expectations. In Sec. 3.3, we will present the concepts of recurrence and transience of states. We will develop some machinery to identify the states of a Markov chain, and present some examples to illustrate these concepts. In Sec. 3.4, we will study the limiting behaviour of the chains. We will define stationary distributions and will study various conditions under which the chains approach the stationary distribution. In this unit, we will present various theorems without proofs.

Objectives
After studying this unit, you should be able to:
classify and categorize the states of the Markov chain into communicating classes and closed sets;
learn about the first passage time to a state and the time of first return (recurrence time) to a state;
find the mean first passage time to the states, and the mean time of first return (mean recurrence time) to the states;
recognize the recurrent and transient states;
understand the concept of Stationary Distribution and the conditions for the existence of the limiting distribution of Markov chains.

3.2 CLASSIFICATION OF STATES


In this section, we will classify the states of a discrete time Markov chain on the basis of some of its transition properties. As in the previous unit, we will denote a Markov chain by a sequence {X_n} satisfying the Markov property, whose state space S is assumed to be discrete. The index set will also be denoted by T, which is a discrete set. The transition probability matrix is P, whose (i, j)th entry p_ij, (i, j in S), denotes the probability of transition from the state i to the state j in one step, or unit time. As in Unit 2, we assume that the transition probability in zero steps is defined by P^(0) = (p_ij^(0)) = I, where I is an identity matrix.

3.2.1 Irreducible Chains

Let us first discuss few definitions.

Definition 1: A state j in S is said to be accessible from the state i if and only if there exists a non-negative integer m such that p_ij^(m) > 0. The symbol i -> j denotes this relation between states i and j.
Thus, if p_ij^(m) = 0 for all non-negative integers m, then state j is not accessible from i, and we will denote this by i -/-> j. When two states i and j are accessible from each other, then we say that the states i and j communicate with each other. In other words, two states i and j are called communicating if and only if there exist integers m, n (>= 0) such that p_ij^(m) > 0 and p_ji^(n) > 0. The symbol i <-> j denotes the relation that i and j communicate with each other.

Definition 2: Let j be a state in the state space S of the Markov chain. Then a subset C(j) of S is called the communicating class of j if all the states in C(j) communicate with j. Symbolically, given k, j in S, k belongs to C(j) if and only if j <-> k.

We present below a theorem, without proof, stating a property of the communication among the states of a Markov chain. The theorem follows from the fact that the communication relation on the state space S is reflexive, symmetric, and transitive.

Theorem 1: The communication relation on the state space S is an equivalence


relation.

Remark
(i) The relation of accessibility is neither reflexive nor symmetric. However, it is transitive. Since i -> j implies that there is an integer m such that p_ij^(m) > 0, and j -> k implies that there is an integer n such that p_jk^(n) > 0, from the Chapman-Kolmogorov equation we have

p_ik^(m+n) = sum_r p_ir^(m) p_rk^(n) >= p_ij^(m) p_jk^(n) > 0.

Therefore, we find that state k is accessible from i. Thus, accessibility is transitive.
(ii) If i, j are states in a Markov chain and they communicate with each other, then they belong to the same communicating class, that is, C(i) = C(j) if i <-> j.
(iii) If C_1 and C_2 are two communicating classes in S, then either they are equal, or they are disjoint. The state space S can be partitioned into equivalence classes by the communication relation, as it is an equivalence relation. These equivalence classes are the communicating classes.
Definition 3: A subset C of the state space S is called closed if it is a communicating class and no state outside C can be reached from any state within C. Symbolically, a subset C of the state space S will be called closed if and only if, for any states j, k in S with j in C and k not in C, p_jk = 0.
Remark
(i) Let C be a closed set. Then, for any states j, k in S such that j is in C and k is not in C, we have p_jk^(m) = 0 for all positive integers m.
(ii) All the states within a closed set C communicate with each other, but the reverse is not true; that is, if a set contains states that communicate with each other, it does not imply that the set is closed. However, if all the states of the state space communicate, then the state space will be closed.
(iii) A subset of states C is closed if for every state i in C, sum_{j in C} p_ij = 1. In this case, the matrix P can be rearranged in the following canonical form,

P = [ P_1  0 ]
    [  R   Q ]

where 0 is a zero matrix. Here, the sub-matrix P_1 = (p_ij), i, j in C, is also stochastic, and the states in C form a sub-Markov chain.
(iv) If a closed set is a singleton, then the state comprising this set is called an absorbing state, i.e., a state j will be absorbing if and only if p_jj = 1, and p_jk = 0 for k not equal to j.

(v) No proper subset of a closed set will be closed.

Definition 4 (Irreducible Chain): A Markov chain is called irreducible if there does not exist any closed set other than the state space S itself. If a chain is not irreducible, then it is called reducible.

We should note that whether or not a Markov chain is irreducible is determined by the state space S and the transition matrix (p_ij); the initial distribution is irrelevant in this matter. If all the elements of the transition matrix (p_ij) are non-zero, then the Markov chain will necessarily be irreducible. All the off-diagonal elements of the transition matrix (p_ij) of an irreducible Markov chain cannot be zero; in fact, no row can have all its off-diagonal elements zero.

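Irreducibility of a finite chain can be tested mechanically. The following is a sketch of one such test (my own illustration, not a procedure from the text, assuming NumPy): state j is accessible from i if some power P^m with m up to s - 1 has a positive (i, j) entry, and the chain is irreducible exactly when every pair of states communicates.

```python
import numpy as np

def is_irreducible(P):
    P = np.asarray(P, dtype=float)
    s = P.shape[0]
    reach = np.eye(s, dtype=bool)          # accessibility in 0 steps
    power = np.eye(s)
    for _ in range(s - 1):                 # paths of length <= s - 1 suffice
        power = power @ P
        reach |= power > 0
    return bool(np.all(reach & reach.T))   # every pair of states communicates

P = [[0.0, 1.0],
     [1.0, 0.0]]
print(is_irreducible(P))   # True: the two states communicate
```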
Let us discuss the following example to understand various Markov chains.

Example 1: Let a Markov chain with the state space S = {0, 1, 2, 3, 4, 5} have the following transition matrix:

0 1 2 3 4 5

For the set of states C_1 = {0, 1}, the states communicate with each other, since
p_01 = 1, p_00^(2) = p_01 p_10 = 1,
p_10 = 1, p_11^(2) = p_10 p_01 = 1, and
for state 0, we have p_0j = 0 for all j not in C_1,
for state 1, we have p_1j = 0 for all j not in C_1;
therefore, C_1 = {0, 1} is a closed set of the given Markov chain.
Similarly, we may show that the set C_2 = {3, 4} is a communicating set, and the states outside it are not accessible from C_2; thus, C_2 is a closed set.
Here, the state 5 is an absorbing state, since the set {5} is closed and it is a singleton.
It may be verified that the sub-matrices formed by the closed sets are stochastic, as follows.
We can verify for C_1 = {0, 1}:
for state 0 in C_1, sum_{j in C_1} p_0j = p_00 + p_01 = 1
for state 1 in C_1, sum_{j in C_1} p_1j = p_10 + p_11 = 1.
We can also verify this for the other closed sets C_2 = {3, 4} and C_3 = {5}.
The transition matrix can also be rearranged in the following canonical form,

    P_1   0    0    0
P =  0   P_2   0    0
     0    0   P_3   0
              R     Q

where P_1, P_2, P_3 are the sub-matrices of P corresponding to the three closed sets, the 0's are zero matrices, Q is the sub-matrix corresponding to the transient state, and R is the remaining sub-matrix (occupying the columns of the closed classes in the last block row).
The Markov chain is reducible, since it has three closed sets and a transient state.
***
3.2.2 First Return and First Passage Probabilities

Let i be any state of a time homogeneous Markov chain {X_n}. Define

f_ii^(n) = P[X_n = i, X_k not equal to i for k = 1, 2, ..., n - 1 | X_0 = i]   (2)

Thus, f_ii^(n) is the probability that the chain, starting in state i, returns to state i for the first time after n steps. Clearly, f_ii^(1) = p_ii, and we define f_ii^(0) = 0 for all states i in the state space S. We call f_ii^(n) the probability of first return (the corresponding time is also called the time of first recurrence) to state i, in time n.
Similarly, we may define the probability of first passage from state i to state j, i not equal to j, in time n, denoted by f_ij^(n), as

f_ij^(n) = P[X_n = j, X_k not equal to j for k = 1, 2, ..., n - 1 | X_0 = i]

Thus, f_ij^(n) is the probability that the chain, starting in state i, visits the state j for the first time after n steps. Clearly, f_ij^(1) = p_ij, and we define f_ij^(0) = 0 for all i, j in S. As defined in Unit 2, P^(0) = I, i.e.
p_jj^(0) = 1 and p_jk^(0) = 0 for k not equal to j, for all j, k in S.
We present below a theorem, without proof, which provides two equations: the first is a relationship between f_ii^(n), the probability of first return to state i in time n, and p_ii^(n), the n-step transition probability from state i to itself; the second relates the probability of first passage from state i to state j in time n, given by f_ij^(n), to the n-step transition probabilities. These relations may help in the computation of n-step transition probabilities and in proving results on the limiting behaviour of the states of a Markov chain.

Theorem 2: For any state i in S, we have

p_ii^(n) = sum_{k=1}^{n} f_ii^(k) p_ii^(n-k), n >= 1   (3)

and for any two states i and j in S, we have

p_ij^(n) = sum_{k=1}^{n} f_ij^(k) p_jj^(n-k), n >= 1   (4)

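The relation of Theorem 2 can be inverted to compute the first-passage (and first-return) probabilities from the ordinary n-step probabilities. Below is a sketch (assuming NumPy; the two-state flip matrix is an illustrative choice): since p_jj^(0) = 1, the relation gives f_ij^(n) = p_ij^(n) - sum_{k=1}^{n-1} f_ij^(k) p_jj^(n-k).

```python
import numpy as np

def first_passage_probs(P, i, j, N):
    P = np.asarray(P, dtype=float)
    powers = [np.linalg.matrix_power(P, n) for n in range(N + 1)]
    f = np.zeros(N + 1)                   # f[0] = f_ij^(0) = 0 by definition
    for n in range(1, N + 1):
        f[n] = powers[n][i, j] - sum(f[k] * powers[n - k][j, j] for k in range(1, n))
    return f

P = [[0.0, 1.0],
     [1.0, 0.0]]                          # periodic two-state chain
print(first_passage_probs(P, 0, 0, 6))    # first return to 0 happens only at step 2
```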
Definition 5: Assume that a time homogeneous Markov chain starts in state i, and define

f_ii = sum_{n=0}^{infinity} f_ii^(n)

Then f_ii is the probability of ultimate, or eventual, return to the state i, having started in this state, i.e., the probability that the chain ever returns to the state i. A state i is called a recurrent state, or persistent state, if f_ii = 1, i.e., when the return to the state i is certain. We will use both the terms recurrent and persistent for this purpose in this unit. A state i is called transient when the ultimate, or eventual, return to the state i is not certain, i.e., f_ii < 1.

Definition 6 (Mean Recurrence Time): Let i be a recurrent (persistent) state; then f_ii^(n), n = 0, 1, 2, 3, ..., is a probability distribution of the time to return to state i, and the mean of this distribution is defined as

mu_ii = sum_{n=0}^{infinity} n f_ii^(n)

mu_ii is called the mean recurrence time of the state i. A recurrent state i is called non-null recurrent (also called positive recurrent, or positive persistent) if mu_ii < infinity, i.e., if its mean recurrence time is finite, whereas it is called null recurrent if mu_ii = infinity, i.e., if its mean recurrence time is infinite.

Definition 7 (Recurrent Periodic State): A recurrent state i is called periodic if the return to the state i can occur only at the t-th, 2t-th, ... steps, where t is an integer greater than 1. In such a case, the integer t is called the period of the periodic state. Symbolically,

t = G.C.D. {m : p_ii^(m) > 0} = G.C.D. {m : f_ii^(m) > 0}   (7)

(Here, f_ii^(m) is the probability of first return to the state i in m steps, defined in Eqn. (2).)
If there does not exist such a t (> 1), then the recurrent state i is not periodic, and this state is called aperiodic, i.e., if t = 1, then the state is said to be aperiodic. Every state in a communicating class must have the same period. Thus, in an irreducible Markov chain all states are either aperiodic, or periodic with the same period. An irreducible Markov chain is said to be aperiodic if all its states are aperiodic, and the irreducible Markov chain is said to be periodic, having period t (> 1), if all its states are periodic with period t (> 1). In an irreducible chain, all the states are of the same type. In fact, we have the following important results.

Theorem 3 (Recurrence is a Class Property): Let two states i and j in the state space S satisfy i <-> j (that is, both states are in the same communicating class); then either both states are transient, or both are null persistent, or both are non-null persistent. Both are aperiodic, or both are periodic with the same period. Thus, all the states in a communicating class have the same classification: either all are transient, or all are non-null persistent, or all are null persistent; and all are aperiodic, or all are periodic with the same period.

Corollary 1: In an irreducible chain, either all the states are transient, or all are null persistent, or all are non-null persistent. If all are periodic, then all have the same period.

Definition 8 (Passage Time): Parallel to the recurrence time, we now define the passage time. First, define

f_ij = sum_{n=0}^{infinity} f_ij^(n),

the probability that the chain starting in state i will ever reach the state j, i.e., the probability of ultimate passage from state i to j. If f_ij = 1, then the ultimate passage to state j is certain, given that the chain starts in the state i. In such a case, f_ij^(n), n = 0, 1, 2, 3, ..., is the probability distribution of the first passage time to the state j, given that the chain starts from i. Then, we may define the mean of the first passage time from the state i to the state j as

mu_ij = sum_{n=0}^{infinity} n f_ij^(n)

Definition 9 (Recurrent Chain): A Markov chain is called recurrent, or persistent, if all its states are recurrent.

Transient Chain: A Markov chain is called transient if all its states are transient.
Ergodic State and Ergodic Chain: A persistent, non-null, aperiodic state of a Markov chain is called an ergodic state. If all the states in a Markov chain are ergodic, then the chain is said to be ergodic.

Let us discuss the following example.

Example 2: Let a Markov chain with state space S = {1, 2, 3, 4, 5} have the following transition matrix. We will determine the nature of the states of the chain.

         1     2     3     4     5
    1    0     1     0     0     0
    2    1     0     0     0     0
P = 3   1/4    0    1/4   1/2    0
    4    0     0     0    1/4   3/4
    5    0     0     0     1     0
On the basis of the probability of first return to the states, we will classify the states as follows. Since

f_11 = f_11^(1) + f_11^(2) + f_11^(3) + ... = 0 + 1.1 + 0 + ... = 1,

state 1 is persistent. Again,

f_22 = f_22^(1) + f_22^(2) + f_22^(3) + ... = 0 + 1.1 + 0 + ... = 1,

therefore state 2 is persistent. Similarly,

f_33 = f_33^(1) + f_33^(2) + f_33^(3) + ... = 1/4 + 0 + 0 + ... = 1/4,

therefore state 3 is transient, and

f_44 = 1/4 + (3/4).1 = 1,

therefore state 4 is persistent. Finally,

f_55 = 0 + 1.(3/4) + 1.(1/4).(3/4) + ... = 1,

therefore state 5 is persistent.

The states 1 and 2 are periodic with period 2, since, for state 1,
t = G.C.D. {m : f_11^(m) > 0} = G.C.D. {2} = 2
and for state 2,
t = G.C.D. {m : f_22^(m) > 0} = G.C.D. {2} = 2.

The Mean Recurrence Times of the persistent (recurrent) states are obtained as follows:
mu_11 = 1.f_11^(1) + 2.f_11^(2) + 3.f_11^(3) + ... = 1.0 + 2.1 + 0 = 2,
and similarly mu_22 = 2; the mean recurrence times of states 4 and 5 are also finite.

The states {4, 5} are persistent, non-null, and aperiodic. Therefore, they are ergodic. The states {1, 2} are persistent and periodic with period 2. The state 3 is transient. It may be easily verified that the given Markov chain is reducible. Its state space can be decomposed into three communicating classes C_1 = {1, 2}, C_2 = {4, 5} and C_3 = {3}. Further, C_1, C_2 are closed sets. All states in C_2 are aperiodic and positive recurrent, whereas all states in C_1 are positive recurrent and periodic, each with period 2. This verifies the results of Theorem 3, and the fact that periodicity is a class property.
***
Example 3: Let a Markov chain have the following transition matrix.

All the states communicate. Therefore, it has only one closed set, the state space S = {0, 1, 2}. The chain is irreducible.
The probability of ultimate return to state 0 is

f_00 = sum_{n} f_00^(n) = 1.

Thus, the state 0 is persistent (recurrent).

The Mean Recurrence Time for state 0 is

mu_00 = 1.f_00^(1) + 2.f_00^(2) + 3.f_00^(3) + ...,

which is finite. Thus, 0 is a non-null persistent (positive recurrent) state. Since the Markov chain is irreducible, all its states must be non-null persistent by Theorem 3. Let us verify this by actual calculation for the other states in S.

The probabilities may also be obtained using a digraph, as described in Unit 2. The digraph for the given transition matrix has been shown below, in Fig. 1. To find f_11^(1), the probability of first return to state 1 in one step, find the paths from node 1 to node 1 traveling along a single edge, and add the probability labels on the edges of these paths. There is no such path in this example, and the probability f_11^(1) is zero. To find f_11^(2), the probability of first return to state 1 in two steps, find the paths from node 1 to node 1 traveling along two distinct edges. We have two paths, 1 -> 0 -> 1 and 1 -> 2 -> 1. Multiply the probability labels on the edges of each path, and add such products over all paths to get f_11^(2). Therefore, f_11^(2) = (3/4).(1/2) + (1/4).1 = 5/8, and so on. We get the probability of ultimate return to state 1 as

f_11 = sum_{n} f_11^(n) = 1.

Similarly, we may obtain f_22 = 1.
Therefore, all the states are persistent (recurrent), as stated above.

The Mean Recurrence Time for state 1 is

mu_11 = 1.f_11^(1) + 2.f_11^(2) + 3.f_11^(3) + ...,

which is finite. Similarly, we may obtain finite mean recurrence times for the other states. Thus, all the states are non-null, since the mean recurrence times for all the states are finite, as stated above.

Again, consider the state 0. Since f_00^(m) > 0 for m = 1, 2, 4, ..., from the definition of periodic recurrent states given in Eqn. (7), the period is
t = G.C.D. {m : f_00^(m) > 0} = G.C.D. {1, 2, 4, ...} = 1.
Therefore, the state 0 is aperiodic. Since the chain is irreducible, all the states will be aperiodic.
Therefore, all the states are persistent (recurrent), aperiodic, and non-null, and thus ergodic. Thus the chain is ergodic. We have, thus, verified that periodicity, positive or null recurrence, transience, etc., are class properties.

Example 4: Consider a Markov chain on the states 0, 1, 2, 3 with the following transition matrix.

Since all the states communicate, it has only one closed set, the state space S = {0, 1, 2, 3}. The chain is irreducible.
We can use the following digraph for the given transition matrix to compute the probabilities of first return, as in the previous example.

For the state 0,

f_00^(1) = f_00^(2) = 0, f_00^(3) = 1 . 1 . (1/3) = 1/3 > 0, f_00^(4) = f_00^(5) = 0, f_00^(6) = (2/3) . (1/3) = 2/9 > 0, ...

Therefore, from the definition of periodic recurrent states given in Eqn. (7), the period is
t = G.C.D. {m : f_00^(m) > 0} = G.C.D. {3, 6, 9, ...} = 3,
and the probability of ultimate return to the state 0 is

f_00 = 1/3 + 2/9 + ... = (1/3) sum_{k=0}^{infinity} (2/3)^k = 1.

Thus, the state 0 is recurrent with period 3. Now, since the Markov chain is irreducible, all the other states have the same classification, that is, recurrent with period 3.

Therefore, the Markov chain is recurrent and periodic with period 3.


***
You may now try the following exercises.

El) Determine the classes, probability of ultimate return to the states, mean
recurrence time of the various states of the Markov chain having the following
transition matrix. Is the chain irreducible?

E2) Determine the closed set, probability of ultimate return to the states, periodicity
of states, and mean recurrence time of the states of the Markov chain having the
following transition matrix. Is the chain irreducible?

         0      1      2
    0 [  0      1      0   ]
P = 1 [ 0.75    0    0.25  ]
    2 [  0      1      0   ]

So far, we have discussed the classification of states and chains. In the next section,
we will focus on recurrence and transience in more detail.

3.3 RECURRENCE AND TRANSIENCE


We have discussed formal definitions of recurrence and transience in the previous
section. In this section, we will provide some more properties of recurrence and
transience without proof which will be used to classify the states of a Markov chain.

Let us begin with the formal definition of a generating function.

Definition 10 (Generating Function): Let a_0, a_1, a_2, a_3, ... be a sequence of real
numbers, and s be a real number. Then the function A(s) defined by

A(s) = Σ_{k=0}^∞ a_k s^k

is called a generating function of the sequence a_0, a_1, a_2, a_3, ..., provided this
power series converges in some interval -s_0 < s < s_0. If a non-negative discrete
random variable X assumes only the integral values 0, 1, 2, 3, ... and the sequence {a_k}
represents the probability distribution of X, such that a_k = P[X = k], then A(s) is
called the probability generating function of the random variable X.
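As a small concrete illustration of this definition, the sketch below (Python with SymPy;
the geometric distribution and its parameter are illustrative choices, not taken from the
text) builds a probability generating function and reads off the total probability and the
mean from it.

import sympy as sp

s = sp.symbols('s')
p = sp.Rational(1, 3)                      # an illustrative parameter, not from the text

# Generating function of P[X = k] = (1 - p) * p**k, k = 0, 1, 2, ... (geometric):
# A(s) = sum_k (1 - p) (p s)^k = (1 - p) / (1 - p s).
A = (1 - p) / (1 - p * s)

print(sp.series(A, s, 0, 5))               # coefficients recover P[X = k]
print(A.subs(s, 1))                        # A(1) = 1, total probability
print(sp.diff(A, s).subs(s, 1))            # A'(1) = E[X] = p/(1 - p) = 1/2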
Theorem 4: For a state i of a Markov chain, let P_{ii}(s) be the generating function of
the sequence {p_{ii}^{(n)}}, and F_{ii}(s) be the generating function of the sequence
{f_{ii}^{(n)}}. Then, we have

P_{ii}(s) = 1 / (1 - F_{ii}(s)),   |s| < 1                                    (10)
***
Theorem 5: For states i, j of a Markov chain, let P_{ij}(s) be the generating function of
the sequence {p_{ij}^{(n)}}, P_{jj}(s) be the generating function of the sequence
{p_{jj}^{(n)}}, and F_{ij}(s) be the generating function of the sequence {f_{ij}^{(n)}}.
Then, we have, for |s| < 1,

(i)  P_{ij}(s) = F_{ij}(s) P_{jj}(s)                                           (11)

(ii) P_{ij}(s) = F_{ij}(s) (1 - F_{jj}(s))^{-1}                                (12)
***
Let us consider the following example to understand these results.
Example 5: Consider again the Markov chain with the transition matrix P given in
Example 4. We can verify that the matrix is periodic; computing powers of P shows that

P^3 = P^6 = ... .

For the state 0 of the Markov chain, the generating function of the sequence of the
transition probabilities {p_{00}^{(k)}} is given by

P_{00}(s) = Σ_{k=0}^∞ p_{00}^{(k)} s^k = 1 + 0·s + 0·s^2 + (1/3)s^3 + ... = (3 - 2s^3) / (3(1 - s^3)),

since p_{00}^{(0)} = 1 and p_{00}^{(3k)} = 1/3 for k ≥ 1,

and the generating function of the sequence of the probabilities of first return
{f_{00}^{(k)}} (as obtained in Example 4) will be F_{00}(s), given by

F_{00}(s) = (1/3)s^3 + (2/9)s^6 + (4/27)s^9 + ... = s^3 / (3 - 2s^3),   for |s| < 1.

Therefore,

1 - F_{00}(s) = 3(1 - s^3) / (3 - 2s^3),

and thus we may verify Eqn. (10), that

P_{ii}(s) = 1 / (1 - F_{ii}(s)),   |s| < 1,

for the state i = 0. Similarly, we can verify the relations given in Eqns. (11) and (12)
for the states of the Markov chain.

Here, we may note that F_{00}(1) = Σ_{k=0}^∞ f_{00}^{(k)} = f_{00}. Thus, the probability
of ultimate return to the state 0 may be obtained from the generating function F_{00}(s)
by setting s = 1.
***
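The algebra in Example 5 can be checked symbolically. The minimal sketch below (Python
with SymPy) uses the closed forms written above for F_{00}(s) and P_{00}(s); treat those
expressions as assumptions if the printed matrix of Example 4 differs.

import sympy as sp

s = sp.symbols('s')

# Closed forms used in Example 5 (taken from the series written above).
F00 = s**3 / (3 - 2*s**3)                  # generating function of f_00^(n)
P00 = (3 - 2*s**3) / (3 * (1 - s**3))      # generating function of p_00^(n)

# Verify Eqn. (10): P_00(s) = 1 / (1 - F_00(s)).
print(sp.simplify(P00 - 1/(1 - F00)))      # prints 0

# F_00(1) gives the probability of ultimate return to state 0.
print(F00.subs(s, 1))                      # prints 1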
Now, we provide a theorem which states that a state i of the Markov chain will be
recurrent if, and only if, Σ_{n=0}^∞ p_{ii}^{(n)} = ∞. The result is immediate from
Eqn. (10), since for a recurrent state i, as s ↑ 1 we have 1 - F_{ii}(s) ↓ 0; therefore,
the left hand side P_{ii}(s) → Σ_{n=0}^∞ p_{ii}^{(n)}, while the right hand side
1 / (1 - F_{ii}(s)) tends to infinity as s ↑ 1.

Theorem 6: A state i of a Markov chain will be persistent (recurrent) if

Σ_{n=0}^∞ p_{ii}^{(n)} = ∞                                                    (13)

and state i will be transient if

Σ_{n=0}^∞ p_{ii}^{(n)} < ∞                                                    (14)
The following theorem gives some limiting results for recurrent states of a Markov
chain.

Theorem 7: If the state i is persistent (recurrent), then as n → ∞

(i)   p_{ii}^{(nt)} → t / μ_{ii}, provided state i is non-null and periodic with period t  (15)
(ii)  p_{ii}^{(n)} → 1 / μ_{ii}, provided state i is non-null and aperiodic                (16)
(iii) p_{ii}^{(n)} → 0, provided state i is null (whether periodic or aperiodic)           (17)
***
Remark

(i)  If state i is transient, then p_{ii}^{(n)} → 0 as n → ∞, since
     Σ_{n=0}^∞ p_{ii}^{(n)} < ∞. This result directly follows from the property of a
     convergent series.
(ii) If state j is transient and i is any state of the Markov chain, then
     Σ_{n=0}^∞ p_{ij}^{(n)} < ∞.
(iii) If j is a transient state, then no matter where the Markov chain starts, it makes
     only a finite number of visits to state j, and the expected number of visits to j is
     finite. The chain may enter a recurrent class in a number of steps, and once it
     enters there, it remains there forever. On the other hand, if j is a recurrent
     state, then if the chain starts at j, it is guaranteed to return to j infinitely often
     and will eventually remain forever in the closed set containing state j. If the
     chain starts at some other state i, it might not be possible for it to ever visit
     state j. If it is possible to visit the state j at least once, then it does so
     infinitely often (see Bhat, 2000).

Theorem 8: Let i and j be any states in the state space S of a Markov chain.

(i)  If the state j is persistent null, or transient, then

     lim_{n→∞} p_{ij}^{(n)} = 0                                               (18)

(ii) If the state j is persistent, non-null, and aperiodic, then

     lim_{n→∞} p_{ij}^{(n)} = f_{ij} / μ_{jj}                                 (19)

Example 6: Consider a countable state Markov chain having states 0, 1, 2, 3, ...
with transition probability matrix having (i, j)-th element (i, j = 0, 1, 2, 3, ...) given by

p_{i0} = (i + 1)/(i + 2),   p_{i,i+1} = 1/(i + 2),   and   p_{ij} = 0 for j ≠ i + 1, j ≠ 0.

Therefore, the transition probability matrix is the infinite matrix

         0     1     2     3    ...
    0 [ 1/2   1/2    0     0    ... ]
P = 1 [ 2/3    0    1/3    0    ... ]
    2 [ 3/4    0     0    1/4   ... ]
    . [  .     .     .     .        ]
For the state 0, the probabilities of first return will be

f_{00}^{(1)} = 1/2,  f_{00}^{(2)} = (1/2)(2/3) = 1/3,  f_{00}^{(3)} = (1/2)(1/3)(3/4) = 1/8, ...,
f_{00}^{(n)} = n / (n + 1)!.

Therefore, the probability of ultimate return to state 0 will be

f_{00} = Σ_{n=1}^∞ f_{00}^{(n)} = Σ_{n=1}^∞ n / (n + 1)! = 1

and, thus, the state 0 of the Markov chain is recurrent. Since all states can be reached
from any state, the Markov chain is irreducible. Again, the state 0 is aperiodic, since
the G.C.D. of the times with positive probabilities of first return to the state 0 is
one. From the class property of recurrence stated above, the Markov chain will be
recurrent and aperiodic.

The mean recurrence time for the state 0 can be obtained as

μ_{00} = Σ_{n=1}^∞ n f_{00}^{(n)} = Σ_{n=1}^∞ n^2 / (n + 1)! = e - 1.

Thus, the state 0 is positive recurrent and, hence, the Markov chain is positive
recurrent. Further, from Eqn. (16), we have, as n → ∞,

p_{00}^{(n)} → 1/μ_{00} = 1/(e - 1).
***
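These two infinite series are easy to check numerically. The short sketch below (Python;
the truncation point N is an arbitrary assumption) confirms that Σ n/(n+1)! = 1 and
Σ n²/(n+1)! = e - 1.

import math

# Verify the two series used in Example 6 by truncating them at a large N.
N = 50                                                      # arbitrary truncation point
f = [n / math.factorial(n + 1) for n in range(1, N + 1)]    # f_00^(n) = n/(n+1)!

print(sum(f))                                               # probability of ultimate return -> 1
print(sum(n * fn for n, fn in zip(range(1, N + 1), f)))     # mean recurrence time
print(math.e - 1)                                           # compare with e - 1 = 1.71828...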
States in a Finite Markov Chain

The results obtained above in this section have some essential implications for the
finite Markov chain. The state space of a finite Markov chain must contain at least
one persistent state. Otherwise, if all the states of the Markov chain were transient,
then the transition probabilities p_{ij}^{(n)} → 0 as n → ∞ for all i and j in the state
space S, which is impossible since for all i ∈ S we must have Σ_{j∈S} p_{ij}^{(n)} = 1.
Therefore, a Markov chain with a finite state space S cannot be a transient chain.
Again, a finite Markov chain cannot have any null persistent state, since the states of
the closed set containing such a null persistent state would form a stochastic
sub-matrix (say P_1) of the transition matrix P, and as n → ∞ we would have
P_1^n → 0; hence, P_1^n would not remain stochastic. This is not possible. Thus, a
finite Markov chain cannot have a null persistent state.
The following theorem is now easy to visualize.

Theorem 9: In a finite irreducible chain, all the states are non-null persistent.

Example 7: In Example 2, the state 3 is transient. We will find

lim_{n→∞} p_{34}^{(n)}   and   lim_{n→∞} p_{35}^{(n)}.

Let us find the probabilities of ultimate passage from state 3 to state 4, and to state 5,
i.e., f_{34} and f_{35}, and recall from Example 2 the mean recurrence times μ_{44} and
μ_{55}. Since state 4 is aperiodic, non-null and persistent, using Eqn. (19) we have, as
n → ∞,

p_{34}^{(n)} → f_{34}/μ_{44} = (1/2)·(3/7) = 3/14,

and similarly p_{35}^{(n)} → f_{35}/μ_{55}.

Example 8 (Unrestricted Random Walk): Let us consider a particle moving along a
straight line, and assuming integer values only. Let the random variable X_n denote
the position of the particle at time n. Then X_n satisfies the relation X_n = X_{n-1} + Z_n,
where Z_n denotes the displacement of the particle at time n. We assume that the
random variables Z_n are identically and independently distributed with
P[Z_n = 1] = p and P[Z_n = -1] = q, p + q = 1. It means that the particle either moves one
unit in the left direction with probability q, or one unit in the right direction with
probability p, at each time. Therefore, {X_n} will be a Markov chain with the state
space {0, ±1, ±2, ±3, ...}. Its transition probability matrix P can be expressed by

p_{i,i+1} = p,   p_{i,i-1} = q,   p_{ij} = 0 otherwise;

each row of this doubly infinite matrix has the form ( ... q 0 p ... ), with q and p on
either side of the diagonal.
Since every state communicates with every other state, the chain is irreducible.
From Corollary 1, the chain is either transient, or persistent null, or persistent
non-null.
Consider the state 0. It is clear that we cannot return to 0 in an odd number of steps.
Let the chain return to state 0 at time 2n; then during this period it must have moved
in the right direction n times, and in the left direction n times. Therefore, using the
binomial distribution, we have

p_{00}^{(2n)} = ^{2n}C_n p^n q^n.

Now, from Stirling's formula, a large-n approximation to n! is

n! ≈ √(2πn) (n/e)^n.

Using this approximation, we have

p_{00}^{(2n)} = ^{2n}C_n p^n q^n ≈ (4pq)^n / √(πn).

Now, Σ_{n=0}^∞ p_{00}^{(2n)} < ∞ when 4pq < 1, i.e., if p ≠ q; in that case, the state 0
is transient. Hence, the chain will be transient for p ≠ q.


In the symmetric case, p = q = 1/2, so that 4pq = 1, it follows that

Σ_{n} p_{00}^{(2n)} = ∞, since its terms behave like 1/√(πn),

and the state 0 is recurrent. Hence, the chain will be recurrent if p = q. Further, since
p_{00}^{(2n)} ≈ (4pq)^n / √(πn) = 1/√(πn) → 0 as n → ∞ and the state 0 is recurrent,
by using Theorem 7 we may conclude that the chain will be recurrent null when
p = q = 1/2.
***
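The behaviour described in Example 8 can be seen numerically by comparing the exact
return probabilities with the Stirling approximation. The sketch below is in Python; the
values of p used are illustrative assumptions.

import math

def p00(n, p):
    """Exact probability of being back at 0 after 2n steps: (2n choose n) p^n q^n."""
    q = 1 - p
    return math.comb(2 * n, n) * p**n * q**n

def stirling_approx(n, p):
    """Large-n approximation (4pq)^n / sqrt(pi n) from Stirling's formula."""
    q = 1 - p
    return (4 * p * q) ** n / math.sqrt(math.pi * n)

# Symmetric walk: terms decay like 1/sqrt(pi n), so the series diverges (recurrent).
print([round(p00(n, 0.5), 4) for n in (1, 10, 100)])
print([round(stirling_approx(n, 0.5), 4) for n in (1, 10, 100)])

# Asymmetric walk (p = 0.6, an illustrative value): terms decay geometrically,
# so the sum of p00^(2n) is finite and state 0 is transient.
print(sum(p00(n, 0.6) for n in range(1, 500)))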
You may now try the following exercises on the basis of the above discussion.

E3) If the n-step transition probabilities of a given state i of a Markov chain are
given by p_{ii}^{(n)} = 1/2 for n > N_0, then find the mean recurrence time for the
state i.

E4) Consider a countable state Markov chain having a transition probability matrix
as follows, where p, q > 0 and p + q = 1:

         0   1   2   3   ...
    0 [  p   q   0   0   ... ]
P = 1 [  p   0   q   0   ... ]
    2 [  p   0   0   q   ... ]
    . [  .   .   .   .       ]

Show that the chain is recurrent.

E5) Obtain the limiting value of p_{ij}^{(n)} as n → ∞ for i = 0, 1, 2, 3 for the Markov
chain given in E1).
E6) Obtain the limiting value of P^n as n → ∞ for the Markov chain given in Example 3.

In the next section, we shall discuss the stationary distribution.

3.4 STATIONARY DISTRIBUTION


In this section, we will study the behaviour of a Markov chain that has been running
for a long time. In many real systems, the long term behaviour of the system requires
much attention. Mostly, we need to study whether the effect of the initial state
diminishes in the long run. We may also want to study the proportions of time that the
chain will be found in the various states in the long run. Mathematically, this means
studying the limit lim_{n→∞} u^{(n)}, where u^{(n)} is the probability distribution of
the chain at time n. We would like to determine whether or not this limit exists, and,
if it does, to analyze the various related conditions and their implications. We will
state here a few basic, but important, facts regarding the limiting behaviour, mostly
without proof, but with suitable examples.

Before discussing limits of u^{(n)}, it is better to describe the notion of a stationary
distribution of a Markov chain. We will say that the Markov chain {X_n} possesses a
stationary distribution if the distribution u^{(n)} is the same for all n, that is,
u^{(n)} = u^{(0)} = u, the initial probability vector, for all n ≥ 1. Thus, the probability
that the chain is in, say, state i is the same for all time; although X_n is moving from
one state to another, it looks statistically the same at any time. Since a stationary
distribution of the chain does not depend on n, we drop the superscript and denote it
merely by π = (π_1, π_2, ...). In general, if π = (π_1, π_2, ...) is a probability mass
function giving a stationary distribution of a Markov chain {X_n} with initial
distribution u = (u_1, u_2, ..., u_i, ...), where u_i = P[X_0 = i] for each i, and with the
transition matrix P = (p_{ij}) on the state space S = {1, 2, ...}, then π = (π_1, π_2, ...)
is called a stationary distribution for the transition matrix P. Here, we will make the
study for a countable state space. We will describe the finite state space separately
when its behaviour differs from that of the countable state space.
Definition 11 (Stationary Distribution): Consider a Markov chain {X_n, n = 0, 1, 2, ...}
with transition probability matrix P = (p_{ij}) (where p_{ij} denotes the transition
probability from state i to j), state space S = {1, 2, ...}, and a sequence of non-negative
numbers {π_j} satisfying the following equations:

(i)  π_j = Σ_{i∈S} π_i p_{ij}, for all j ∈ S, or π = πP, where π = (π_1, π_2, ...)      (20)

(ii) Σ_{j∈S} π_j = 1                                                                    (21)

Then the sequence {π_j} defines a probability distribution over the states of the Markov
chain. This distribution is called the stationary distribution of the Markov chain.

Theorem 10: If the initial distribution of a Markov chain {X_n} is the same as its
stationary distribution, then all the random variables in the sequence {X_n} will have
identical distributions.

Remark: Let π_j denote the probability that the system is in state j. The condition in
Eqn. (20) is often called a balancing equation, or equilibrium equation. The
stationary distribution π on S is such that if our Markov chain starts out with the
initial distribution u = π, then we also have u^{(1)} = π, since by Theorem 7 of Unit 2
and Eqn. (20) above, we have u^{(1)} = uP = πP = π. That is, if the distribution at time 0
is π, then the distribution at time 1 is still π. In general, u^{(n)} = π for all n (for
both finite as well as countable state space). For this reason, π is called a stationary
distribution.

Let us now discuss the stationary distribution for an Irreducible Aperiodic Markov
Chain:

Here, we consider the existence of stationary distributions for irreducible aperiodic
Markov chains, and the long term behaviour of the distribution of these chains. The
following theorems describe the related conditions. These theorems are applicable for
both finite and countable state space chains.

Theorem 11 (Ergodic Theorem): An irreducible aperiodic Markov chain
{X_n, n = 0, 1, 2, 3, ...} belongs to one of two classes: (1) all the states are transient
or null recurrent, or (2) all the states are non-null (also called positive) recurrent,
that is, all the states are ergodic. In case (1), there does not exist a stationary
distribution; for all states i, j we get p_{ij}^{(n)} → 0 as n → ∞. In case (2),
p_{ij}^{(n)} → π_j as n → ∞ for all states i, j. Here, π_j will be the reciprocal of the
mean recurrence time of j, i.e.,

π_j = 1/μ_{jj} > 0 for all states j,

and {π_j} is the unique stationary distribution of the Markov chain. In this case, as
n → ∞, the distribution of the Markov chain at time n tends to the stationary
distribution, not depending on the initial distribution of the chain. In other words, if
the Markov chain {X_n, n = 0, 1, 2, 3, ...} is an irreducible, aperiodic, and non-null
Markov chain, X_0 has the distribution u^{(0)}, an arbitrary initial distribution, and
u^{(n)} is its distribution at time n (n = 0, 1, 2, 3, ...), then lim_{n→∞} u^{(n)} = π
exists.

Theorem 12: An irreducible aperiodic Markov chain {X_n, n = 0, 1, 2, 3, ...} will be
ergodic if the balancing equation

x_j = Σ_{i∈S} x_i p_{ij},   j ∈ S                                             (23)

has a solution {x_j} (x_j not all zero) satisfying Σ_{j∈S} |x_j| < ∞.

Conversely, if the chain is ergodic, then every non-negative solution {x_j} of the
balancing Eqn. (23) satisfies Σ_{j∈S} |x_j| < ∞.
Remark
(i) The limiting probability distribution given by lim_{n→∞} u^{(n)} = π is called a
steady state distribution of the Markov chain.

(ii) If the probability transition matrix P is symmetric for a Markov chain having
finite state space S = {1, 2, 3, ..., s}, then the uniform distribution [π_j = 1/s for
all j = 1, 2, 3, ..., s] is stationary. More generally, the uniform distribution is
stationary if the matrix P is doubly stochastic, that is, the column-sums of P
are also 1 (we already know the row-sums of any transition matrix P are all 1).

(iii) A finite aperiodic irreducible chain is necessarily ergodic; thus, any finite
aperiodic irreducible chain has a stationary distribution.

(iv) For an irreducible Markov chain, the existence of a stationary distribution
implies that the chain is recurrent, in fact positive recurrent, but the converse is
not true. That is, there are irreducible, recurrent Markov chains that do not have
stationary distributions. Such chains will be null recurrent.

Example 9: Find all stationary distributions for the transition matrix given below.

         1     2
P = 1 [ 0.3   0.7 ]
    2 [ 0.2   0.8 ]

The given chain is finite, irreducible, and aperiodic, since all the transition
probabilities are positive, and hence it is non-null. It must have a unique stationary
distribution.
Let π = (π_1, π_2) be the stationary distribution. From Eqn. (20), we have the balancing
equations

π_1 = 0.3π_1 + 0.2π_2
π_2 = 0.7π_1 + 0.8π_2

One equation is redundant; they both lead to the equation 0.7π_1 = 0.2π_2. From the
above, we have an infinite number of solutions. Using the second condition, from
Eqn. (21),

π_1 + π_2 = 1.                                                                (25)

We get the unique solution π_1 = 2/9, π_2 = 7/9

and, proceeding recursively, we get u^{(n)} → π = (2/9, 7/9) as n → ∞, since the given
Markov chain is ergodic. We may also verify that

π_1 = 1/μ_{11} = 2/9,

where μ_{11} = 9/2 is the mean recurrence time of state 1. Similarly, we may verify that

π_2 = 1/μ_{22} = 7/9.
***
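The stationary distribution of a finite chain can also be found numerically by solving
the balancing equations together with the normalisation condition. A minimal sketch in
Python/NumPy for the matrix of Example 9, as written above, is given below.

import numpy as np

# Transition matrix of Example 9 (rows sum to 1).
P = np.array([[0.3, 0.7],
              [0.2, 0.8]])

s = P.shape[0]
# Solve pi = pi P together with sum(pi) = 1, i.e. pi (I - P) = 0 plus normalisation.
A = np.vstack([(np.eye(s) - P).T, np.ones(s)])
b = np.zeros(s + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)                                   # approximately [2/9, 7/9]
print(np.linalg.matrix_power(P, 50)[0])     # rows of P^n also converge to pi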
Example 10: Consider the transition matrix given below, which is doubly stochastic.

        0.2  0.3  0.5
P =     0.3  0.2  0.5
        0.5  0.5  0.0

We use Remark (ii) above, about a doubly stochastic transition matrix.
Since s = 3, we will get π = (1/3, 1/3, 1/3).
Let us check that it satisfies the balancing equations:

πP = (1/3, 1/3, 1/3) P = (1/3, 1/3, 1/3) = π,

verifying that π = (1/3, 1/3, 1/3) is the stationary distribution for P.
***

Let us now discuss the criterion for transience.

Criterion for Transience

Here, we will state a condition for a countable state space Markov chain to be
transient. It may be mentioned here, again, that a finite Markov chain cannot have
all its states transient. If a finite state space Markov chain is irreducible, then it
will necessarily be recurrent. We will also present an example to find the stationary
distribution for an irreducible chain having a countable state space.

Theorem 13: An irreducible aperiodic Markov chain with a countable state space
S = {0, 1, 2, ...} and a transition matrix P = (p_{ij}) will be transient (all the states
will be transient) if, and only if, the system of equations

x_i = Σ_{j=1}^∞ p_{ij} x_j   for all states i = 1, 2, 3, ...                  (26)

has a non-zero bounded solution.
***
Example 11: Consider a Markov chain describing a random walk on the countable state
space {0, 1, 2, ...}, having an elastic barrier at state 0. The transition probabilities
may be defined as follows:

p_{i,i+1} = p,   p_{i,i-1} = q,   p + q = 1,   0 < p < 1,   (i ≥ 1)
p_{00} = q,   p_{01} = p
p_{ij} = 0 elsewhere.

As the chain is irreducible, we will study the nature of the solution of the following
equations, as given by Eqn. (26), to determine the nature of the states of the Markov
chain:

x_i = Σ_{j=1}^∞ p_{ij} x_j   for all states i = 1, 2, 3, ...

Therefore, we get the system of equations as

x_i = p x_{i+1} + q x_{i-1},   for i > 1
x_1 = p x_2,

and these may be simplified to

p(x_{i+1} - x_i) = q(x_i - x_{i-1})
p(x_2 - x_1) = q x_1.

These equations reduce recursively to

x_{i+1} - x_i = (q/p)^i x_1   (i ≥ 1),

and, thus, we get the solution as

x_i = x_1 [1 + (q/p) + (q/p)^2 + ... + (q/p)^{i-1}]   for i ≥ 1.

From the above solution, we see that x_i will be bounded if p > q. Therefore,
according to Theorem 13, the Markov chain will be transient when p > q, and
recurrent when p ≤ q.

Let us find the stationary distribution of the chain when p ≤ q. The balancing
equations to solve will be

π_0 = q π_0 + q π_1
π_j = p π_{j-1} + q π_{j+1},   j ≥ 1,

which may be written as

π_{j+1} - π_j = (p/q)(π_j - π_{j-1}),   p π_0 = q π_1.

Therefore,

π_j - π_{j-1} = (p/q)^{j-1}(π_1 - π_0),

and, thus,

π_j - π_0 = Σ_{r=0}^{j-1} (π_{r+1} - π_r) = ((p/q)^j - 1) π_0,

which gives

π_j = (p/q)^j π_0   for j ≥ 0.

Using the condition in Eqn. (21), we get

π_0 (1 + (p/q) + (p/q)^2 + (p/q)^3 + ...) = 1.                                (27)

When p = q, the infinite series in Eqn. (27) is divergent; a stationary distribution will
not exist in this case, and the chain will be null recurrent. When p < q, Eqn. (27) gives
π_0 = 1 - (p/q), and we have the stationary distribution

π_j = (1 - p/q)(p/q)^j   for j ≥ 0,

which is a geometric distribution with parameter p/q. The chain is positive recurrent
for p < q.
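A quick numerical check of this geometric stationary distribution is sketched below
(Python/NumPy); the value of p and the truncation of the countable state space are
illustrative assumptions made only for the check.

import numpy as np

p, q = 0.3, 0.7          # illustrative values with p < q
M = 60                   # truncate the countable state space at M for the check

# Build the truncated transition matrix of the walk with an elastic barrier at 0.
P = np.zeros((M, M))
P[0, 0], P[0, 1] = q, p
for i in range(1, M - 1):
    P[i, i - 1], P[i, i + 1] = q, p
P[M - 1, M - 2] = q
P[M - 1, M - 1] = p      # mass that would leave the truncated space stays at M-1

pi_geom = np.array([(1 - p/q) * (p/q)**j for j in range(M)])   # pi_j = (1-p/q)(p/q)^j
print(np.max(np.abs(pi_geom @ P - pi_geom)))    # close to 0: balance equations hold
print(pi_geom.sum())                            # close to 1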

Existence of Stationary Distribution

Till now, we have considered only irreducible aperiodic chains and discussed the
problem of the existence of stationary distributions. In general, a Markov chain may
have no stationary distribution, one stationary distribution, or infinitely many
stationary distributions. We have given the conditions for the existence of a unique
stationary distribution, along with examples. The chains presented were ergodic, finite
or countable. We have also presented a Markov chain which does not possess any
stationary distribution. Chains of this type are transient or null recurrent; however,
they must be countable (since we cannot have a finite chain that is transient or null
recurrent). As an example of a chain having infinitely many stationary distributions,
we may take the transition matrix P to be the identity matrix, in which case all
distributions on the state space will be stationary. Such chains may be finite or
countable. Example 12 illustrates the case. When the Markov chain has a finite state
space, then it will have at least one stationary distribution, whether it is reducible or
irreducible, periodic or aperiodic.

Example 12: Consider a Markov chain having the identity transition matrix

        1  0  0
P =     0  1  0
        0  0  1

Let the stationary distribution be π = (π_1, π_2, π_3). Then, the balancing equation of
the chain will be

π = πP = πI = π.

Clearly, every vector with non-negative components π = (π_1, π_2, π_3) satisfying
π_1 + π_2 + π_3 = 1 will be a stationary distribution; for example, π = (0.1, 0.3, 0.6).
Thus, for this chain there exist an infinite number of stationary distributions. Here,
we may easily observe that a countable identity transition matrix also possesses an
infinite number of stationary distributions.
***
Example 13: Consider the Markov chain having the following transition matrix.

Solving the balancing equation π = πP, subject to the condition π_1 + π_2 + π_3 = 1, we
may get the unique stationary distribution

(π_1, π_2, π_3) = (1/4, 1/2, 1/4),   but   lim_{n→∞} p_{jj}^{(n)} ≠ π_j,

since p_{jj}^{(n)} = 0 for all odd n.
The chain is irreducible, persistent, but periodic.
For this chain, the mean recurrence times may be obtained as

μ_{00} = 0·f_{00}^{(0)} + 1·f_{00}^{(1)} + 2·f_{00}^{(2)} + 3·f_{00}^{(3)} + 4·f_{00}^{(4)} + ... = 4.

Similarly, we may get μ_{11} = 2, μ_{22} = 4.
Here, we also observe that (π_1, π_2, π_3) = (1/μ_{00}, 1/μ_{11}, 1/μ_{22}).
However, the result on long run equilibrium probabilities is not applicable here, since
the chain is periodic.
***
Remark: In the example above we encountered a Markov chain that is irreducible,
persistent, but periodic, and has a unique stationary distribution whose probabilities
are the reciprocals of the mean recurrence times. We have a theorem which explains such
behaviour. It says that if a Markov chain is irreducible and non-null (positive
recurrent), then there will exist a stationary distribution. The result is based on the
Cesaro limit. This tells us that if {a_n} is a sequence such that lim_{n→∞} a_n = l,
then the sequence of partial averages (1/(n+1)) Σ_{i=0}^{n} a_i also converges to the
same limit, i.e., lim_{n→∞} (1/(n+1)) Σ_{i=0}^{n} a_i = l. The Cesaro limit may also
exist even when lim_{n→∞} a_n does not exist.

Theorem 14: An irreducible, positive recurrent Markov chain has a unique stationary
distribution π = (π_1, π_2, π_3, ...), given by

lim_{n→∞} (1/(n+1)) Σ_{m=0}^{n} p_{ij}^{(m)} = π_j = 1/μ_{jj}   for all j, whatever state i may be.
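The Cesaro-average form of Theorem 14 is easy to see numerically. The sketch below
(Python/NumPy) uses a simple periodic chain on three states, chosen only for
illustration (it is consistent with the stationary distribution (1/4, 1/2, 1/4) of
Example 13, but the printed matrix there is not reproduced here, so treat it as an
assumption): the plain powers of P keep oscillating, while the running averages settle
at π_j.

import numpy as np

# An illustrative periodic chain on {0, 1, 2}: a reflecting walk with period 2.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

N = 1000
Pn = np.eye(3)
running_sum = np.zeros((3, 3))
for _ in range(N):
    running_sum += Pn          # accumulate P^0 + P^1 + ... + P^(N-1)
    Pn = Pn @ P

print(np.round(Pn[0], 3))                # p_0j^(N) still oscillates with n (period 2)
print(np.round(running_sum / N, 3))      # Cesaro averages: each row -> (0.25, 0.5, 0.25)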

You may now try the following exercises.


E7) A Markov chain has an initial distribution u^{(0)} = (1/6, 1/2, 1/3), and the
following transition matrix.

         1     2     3
    1 [  0    0.5   0.5 ]
P = 2 [ 0.5    0    0.5 ]
    3 [ 0.5   0.5    0  ]

Find its stationary distribution. Is it unique? Verify that the limiting distribution
of the chain is stationary.

E8) A Markov chain has the following transition matrix.

         1     2     3
    1 [ 0.5   0.5    0  ]
P = 2 [  1     0     0  ]
    3 [  0     0     1  ]

Find its stationary distribution. How many distributions are possible?

E9) Consider the Ehrenfest chain, presented in Example 12 in Unit 2, with only 3
balls. Then, the transition matrix will be

         0     1     2     3
    0 [  0     1     0     0  ]
P = 1 [ 1/3    0    2/3    0  ]
    2 [  0    2/3    0    1/3 ]
    3 [  0     0     1     0  ]

(i) Test the irreducibility of the chain. (ii) Find its stationary distribution.

E10) Consider a Markov chain {X_n} with a countable state space having the
following transition probabilities:

p_{i,i+1} = p_i,   p_{i,i-1} = q_i,   p_i + q_i = 1,   p_i, q_i > 0   (i ≥ 1)
p_{00} = q_0,   p_{01} = p_0,   p_0, q_0 > 0.

When will the chain be transient, and when will it be recurrent?

Now let us summarise what we have done in this unit.

3.5 SUMMARY

We are furnishing, in the following, a summary of the discussions in this unit:

1. We introduced the concepts of communicating classes and closed sets.
2. We defined the irreducible Markov chain.
3. We obtained the distribution of the first passage time to the states, and of the
   first recurrence time of the states. We also defined the mean time of first passage
   and the mean recurrence time.
4. We acquainted you with the concepts of recurrence and transience.
5. We investigated the limiting behaviour of the Markov chain.
6. We defined the stationary distribution, and illustrated the procedures to find
   stationary distributions.
7. We investigated some situations in which the stationary distribution of a chain
   exists, and is also the equilibrium distribution.

3.6 SOLUTIONS/ANSWERS
E1) The states {0, 1, 2} form a communicating class. State 3 does not communicate
with any state. The chain is reducible.
The probabilities of ultimate return of the states are

f_{00} = Σ_{n=0}^∞ f_{00}^{(n)} = 0 + 0 + 1·(3/4) + 1·(1/4)·1 + 0 + ... = 1

f_{11} = Σ_{n=0}^∞ f_{11}^{(n)} = 0 + 0 + 0 + 1·1·(1/4) + 0 + 1·1·(3/4)·1·(1/4) + 0 + ... = 1

f_{22} = Σ_{n=0}^∞ f_{22}^{(n)} = 0 + 0 + (3/4)·1 + (1/4)·1·1 + 0 + ... = 1

f_{33} = Σ_{n=0}^∞ f_{33}^{(n)} = 0

Therefore, the states 0, 1, 2 are recurrent, and state 3 is transient.
The mean recurrence times for the recurrent states are given below.

μ_{00} = Σ_{n=0}^∞ n f_{00}^{(n)} = 0·0 + 1·0 + 2·(3/4) + 3·(1/4) + 0 = 9/4

μ_{11} = Σ_{n=0}^∞ n f_{11}^{(n)} = 0·0 + 1·0 + 2·0 + 3·(1/4) + 4·0 + 5·(3/16) + 6·0 + 7·(9/64) + ... = 9

μ_{22} = Σ_{n=0}^∞ n f_{22}^{(n)} = 0·0 + 1·0 + 2·(3/4) + 3·(1/4) + 0 = 9/4

E2) The state space is a closed set. The chain is irreducible.

The probabilities of ultimate return of the states are

f_{ii} = Σ_{n=0}^∞ f_{ii}^{(n)} = 1 for every state i.

All the states are persistent.

Again, examining the powers of P, we have

p_{ii}^{(2n)} > 0   and   p_{ii}^{(2n+1)} = 0   for all n and all i.

Therefore, all the states are periodic, with period 2.

The mean recurrence times are obtained similarly, from μ_{ii} = Σ_{n=0}^∞ n f_{ii}^{(n)}.

E3) Since p_{ii}^{(n)} = 1/2 for n > N_0, we have Σ_{n=0}^∞ p_{ii}^{(n)} = ∞, and hence
the state i will be recurrent. Again, it will be aperiodic, since p_{ii}^{(n)} = 1/2 > 0.
Further, the state i is non-null, since p_{ii}^{(n)} → 1/2 ≠ 0.
Using Theorem 7, p_{ii}^{(n)} → 1/μ_{ii} = 1/2, and we get the mean recurrence time of
state i as μ_{ii} = 2.

E4) The given Markov chain is irreducible, since all the states can be reached from
every other state of the chain.
For state 0, the probabilities of first return will be

f_{00}^{(1)} = p,  f_{00}^{(2)} = q·p,  f_{00}^{(3)} = q·q·p,  f_{00}^{(4)} = q·q·q·p, ...

Clearly, the state 0 is aperiodic, since the period of the state is one. The
probability of ultimate return to state 0 will be

f_{00} = Σ_{n=1}^∞ f_{00}^{(n)} = p + qp + q^2 p + q^3 p + ...
       = p(1 - q)^{-1} = 1

and, thus, the state 0 of the Markov chain is recurrent. From the class property
of recurrence it follows that the Markov chain will be recurrent and aperiodic.
E5) See the solution of E1). In the given problem, we have found that states 0, 1, 2
are non-null, aperiodic, and recurrent, and state 3 is transient. The mean
recurrence times for the recurrent states were found to be

μ_{00} = 9/4,   μ_{11} = 9   and   μ_{22} = 9/4.

Using Theorems 7 and 8, we have

lim_{n→∞} p_{ij}^{(n)} = 1/μ_{jj} = 4/9, 1/9, 4/9 for j = 0, 1, 2 respectively, and
lim_{n→∞} p_{i3}^{(n)} = 0.

E6) See the solution of Example 3. All the states were aperiodic, non-null persistent.
The mean recurrence times for the states 0, 1, 2 were obtained as

μ_{00} = 11/6,   μ_{11} = 11/4   and   μ_{22} = 11.

Using Theorem 7, we have

lim_{n→∞} p_{00}^{(n)} = 6/11,   lim_{n→∞} p_{11}^{(n)} = 4/11,   lim_{n→∞} p_{22}^{(n)} = 1/11.

The limits of p_{ij}^{(n)} for other i, j may be obtained using Theorem 8. According to
this theorem, when state j is non-null, aperiodic and persistent,

lim_{n→∞} p_{ij}^{(n)} = f_{ij} / μ_{jj}.

We may find the ultimate first passage probabilities f_{ij} from the transition matrix
given in the example, and obtain f_{ij} = 1 for all i, j = 0, 1, 2. Therefore,

lim_{n→∞} p_{10}^{(n)} = lim_{n→∞} p_{20}^{(n)} = 1/(11/6) = 6/11,
lim_{n→∞} p_{01}^{(n)} = lim_{n→∞} p_{21}^{(n)} = 1/(11/4) = 4/11,
lim_{n→∞} p_{02}^{(n)} = lim_{n→∞} p_{12}^{(n)} = 1/(11/1) = 1/11.

Therefore,

              6/11  4/11  1/11
lim P^n  =    6/11  4/11  1/11
n→∞           6/11  4/11  1/11
E7) The chain is irreducible, since all the states communicate. We may also
verify that the chain is aperiodic and recurrent. Solving the balancing equation
π = πP, i.e.,

(π_1, π_2, π_3) = (π_1, π_2, π_3) P,

we get

π_1 = 0.5π_2 + 0.5π_3
π_2 = 0.5π_1 + 0.5π_3.

Solving these equations along with the condition π_1 + π_2 + π_3 = 1, we get the unique
solution (π_1, π_2, π_3) = (1/3, 1/3, 1/3). This is obvious, since P is doubly
stochastic.
From Theorem 7 of Unit 2, we have u^{(n)} = u^{(0)} P^{(n)}. From Theorem 11, we have
p_{ij}^{(n)} → π_j as n → ∞; in matrix form, every row of P^{(n)} tends to
(1/3, 1/3, 1/3). Therefore, as n → ∞,

u^{(n)} = u^{(0)} P^{(n)} → (1/6, 1/2, 1/3) · [each row (1/3, 1/3, 1/3)] = (1/3, 1/3, 1/3) = π.

The limiting distribution is stationary and independent of the initial distribution.
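This convergence can be checked directly; the minimal sketch below (Python/NumPy) uses
the doubly stochastic matrix implied by the balancing equations in this solution.

import numpy as np

# Doubly stochastic transition matrix implied by the balancing equations above.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
u0 = np.array([1/6, 1/2, 1/3])           # initial distribution from E7

u = u0.copy()
for _ in range(50):                      # iterate u^(n+1) = u^(n) P
    u = u @ P

print(np.round(u, 4))                              # -> [1/3, 1/3, 1/3]
print(np.round(np.linalg.matrix_power(P, 50), 4))  # every row -> (1/3, 1/3, 1/3)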

E8) The chain is reducible. It has two closed sets, {1, 2} and {3}.

Here, we may get infinitely many solutions to the balancing equations, given below:

π_1 = 0.5π_1 + π_2
π_2 = 0.5π_1
π_3 = π_3

Removing the redundancy, we get π_2 = 0.5π_1 and the condition π_1 + π_2 + π_3 = 1.
Then, these two equations provide an infinite number of solutions

(π_1, π_2, π_3) = (2(1 - x)/3, (1 - x)/3, x),   where 0 ≤ x ≤ 1.
E9) The Markov chain is irreducible, but periodic, with period 2. However, we may
solve the general balance equations to get its stationary distribution, which is
π = (1/8, 3/8, 3/8, 1/8), and is unique. Note, however, that lim_{n→∞} p_{jj}^{(n)} ≠ π_j,
since p_{jj}^{(n)} = 0 for all odd n.
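A short check for this Ehrenfest chain is sketched below (Python/NumPy; the matrix is the
standard 3-ball Ehrenfest transition matrix as written in the exercise above): the balance
equations hold for the binomial distribution (1/8, 3/8, 3/8, 1/8), while the plain powers
of P keep oscillating because of the period 2.

import numpy as np

# 3-ball Ehrenfest chain: from state i, one of the 3 balls is picked at random and
# moved to the other urn, so p(i, i-1) = i/3 and p(i, i+1) = (3 - i)/3.
P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   2/3, 0  ],
              [0,   2/3, 0,   1/3],
              [0,   0,   1,   0  ]])

pi = np.array([1, 3, 3, 1]) / 8          # binomial(3, 1/2) stationary distribution

print(np.max(np.abs(pi @ P - pi)))       # ~0: pi solves the balance equations
print(np.round(np.linalg.matrix_power(P, 51)[0], 3))   # odd power: no convergence
print(np.round(np.linalg.matrix_power(P, 50)[0], 3))   # even power: a different limit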

E10) The chain is irreducible, as all states communicate. To determine the nature of
the states of the Markov chain, we will study the nature of the solution of the
following equations, as given by Eqn. (26):

x_i = Σ_{j=1}^∞ p_{ij} x_j   for all states i = 1, 2, 3, ...

Therefore, we get

x_i = p_i x_{i+1} + q_i x_{i-1},   for i > 1
x_1 = p_1 x_2,

and these may be simplified as

p_i(x_{i+1} - x_i) = q_i(x_i - x_{i-1}),   p_1(x_2 - x_1) = q_1 x_1,

and, hence,

(x_{i+1} - x_i)/(x_i - x_{i-1}) = q_i/p_i.

We get, recursively,

x_{i+1} - x_i = L_i x_1,   where L_i = (q_i ... q_2 q_1)/(p_i ... p_2 p_1).

Adding the above equations, we get

x_i = x_1 (1 + Σ_{k=1}^{i-1} L_k).

Therefore, the above equations will have a non-zero bounded solution if, and only if,
Σ_{k=1}^∞ L_k < ∞. Hence, the states will be transient if Σ_{k=1}^∞ L_k < ∞, and
recurrent if this infinite series is divergent.

-X-