Conditional Probability
1.1 INTRODUCTION
This unit introduces you to the pre-requisites of probability and statistics, which you
studied as undergraduates. We recall the concepts of conditional probability,
compound probability, Bayes' theorem, conditional distribution, and conditional
expectations here. These are fundamental to the study of probability and statistics.
The history of probability can be traced back to the beginnings of mankind in the
games of chance. Archaeologists have found evidence of games of chance in
prehistoric digs, showing that gaming and gambling have been a major pastime for the
peoples of Greece, Egypt, China, and India since the dawn of civilization. However, it was not until the 17th century that a rigorous mathematics of probability was developed
by French mathematicians Pierre de Fermat and Blaise Pascal. The basic concept of
conditional probability and the famous Bayes' theorem were the pioneering work of Thomas Bayes (1707-1761). However, it was Laplace who generalized, completed and consummated the ideas of his predecessors in his book Théorie analytique des probabilités in 1812. It gave a comprehensive system of probability theory (the elements of the probability calculus - addition, multiplication, division - were by that time firmly established).
We shall start our discussion with conditional probability in Sec. 1.2. Here, we present
its concept and definition along with some examples. In Sec. 1.3, we learn the
Compound Probability Law. In Sec.1.4, we recall the Law of Total Probability along
with the very widely used Bayes' theorem. In Sec. 1.5, we discuss the conditional
distribution. Finally we conclude with defining the conditional expectation of the
random variables, which is very important, and give some examples of it.
Objectives
After studying this unit, you should be able to:
define and compute the conditional probability of an event;
distinguish between the conditional and unconditional probability of an event;
evaluate the change in the probability of an event after the occurrence of another event;
apply Bayes' theorem in different situations;
apply the concepts of conditional distribution and conditional expectation, and their important properties, in various problems.
This example can also be shown clearly by a Venn diagram as depicted in Fig.1.
Let the event A = a student who passed the first test, and the event B = a student who
passed the second test; clearly, the event A ∩ B = a student who passed both the tests.
It is given that P(B) = 0.35, which is the probability of the event B without any additional condition, i.e. the unconditional probability, which means that the probability is evaluated considering the full class of students as the sample space. If we want to find the probability that a student passed the second test given that he has already passed the first test, then our sample space reduces to the set of those students who have passed the first test, i.e. the event A. This probability is evaluated as the ratio of the probability of the part of B included in A (which is P(A ∩ B)) to the probability of A. This ratio comes out to be 0.20 or 20%, as evaluated above. In this case, this probability is termed the conditional probability of the event B (passed the second test), given that the event A (passed the first test) has happened.
From the above definition we may easily verify the following intuitive results for any
three events A, B , and C of a sample space, S . Let us discuss a few properties of
conditional probability.
1. P(A | A) = 1, which is the conditional probability of the reduced sample space itself. P(A | A), the probability of the event A when A has happened, is clearly 1. Also, from Eqn. (1), P(A | A) = P(A ∩ A)/P(A) = P(A)/P(A) = 1. This is the normedness axiom of a probability measure.
2. P(A | B) ≥ 0. Since the numerator and the denominator in Eqn. (1) of conditional probability are both non-negative, this is the non-negativity axiom of probability.
3. P(A | B) ≤ 1. Since in Eqn. (1) the event involved in the numerator, A ∩ B, is always a subset of the event involved in the denominator, B, the result follows from the monotone property of probability.
4. P(A | B) = 0 if the events A and B are mutually exclusive. If the events A and B are mutually exclusive, then A ∩ B is empty, so the numerator in Eqn. (1) is zero. Therefore, the result follows.
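The four properties above can be checked numerically on any finite sample space. The following Python sketch (an illustration only; the equally likely sample space is not from the original text) defines conditional probability directly from Eqn. (1) and verifies the properties.

from fractions import Fraction

# Illustrative finite sample space: the six faces of a fair die (an assumption
# made only for this sketch; any finite sample space with P(B) > 0 would do).
S = {1, 2, 3, 4, 5, 6}
prob = {w: Fraction(1, 6) for w in S}

def P(event):
    # Probability of an event, i.e. a subset of S.
    return sum(prob[w] for w in event)

def cond(A, B):
    # Conditional probability P(A | B) = P(A ∩ B) / P(B), as in Eqn. (1).
    return P(A & B) / P(B)

A = {2, 4, 6}   # "even score"
B = {1, 2, 3}   # "score at most 3"
C = {6}         # mutually exclusive with B

assert cond(A, A) == 1        # property 1: P(A | A) = 1
assert cond(A, B) >= 0        # property 2: non-negativity
assert cond(A, B) <= 1        # property 3: bounded above by 1
assert cond(C, B) == 0        # property 4: C and B mutually exclusive
print(cond(A, B))             # prints 1/3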
[Fig. 1: Venn diagram showing the events A, B, A ∩ B and A' ∩ B]
Proof: From the distributive law of set theory for the three sets A, B , and C , we
know that:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
Using the addition law of the probability of events, we get:
where n(B) denotes the number of outcomes favorable to event B , and similarly
n(A n B) denotes the number of outcomes favorable to the event A n B .
Example 2: In a survey, the question "Do you smoke?" was asked of 100 people. The results are shown in the following table:

              Yes (A)    No (A')    Total
Male (B)        19          41        60
Female (B')     12          28        40
Total           31          69       100
a) What is the probability that the selected individual smokes?
b) What is the probability that the selected individual is a male and smokes?
In the example given above, it may be noted that the difference between parts (b) and (d) is that we evaluate P(A ∩ B) when the simultaneous occurrence of both events A and B is required, whereas we evaluate P(A | B) when the chance of occurrence of event A given event B is required, since that is a conditional probability.
Example 3: In a card game, suppose a player wants to draw two cards of the same
suit in order to win. Out of a total of 52 cards, there are 13 cards in each suit.
Suppose at first draw, the player draws a diamond. Now, the player wishes to draw a
second diamond to win. What is the probability of his winning?
Solution: Let the event A denote getting a diamond at the first draw, and the event B denote getting a diamond at the second draw. Clearly, we have to find the conditional probability of B given A,
P(B | A) = P(A ∩ B)/P(A).
Here, P(A) = 13/52 = 1/4 and P(A ∩ B) = (13 × 12)/(52 × 51) = 1/17.
Thus, P(B | A) = (1/17)/(1/4) = 4/17.
We may arrive at this result by reducing the sample space under the condition and by
getting the outcomes favorable to picking a diamond in the reduced space. At the time
of the second draw, one diamond has already been chosen, and there are only 12 diamonds among the remaining 51 cards. Thus, the total number of
possible outcomes will be 51 , and the outcomes favorable to picking a diamond will
be 12. Thus, P(B I A) = 12/51.
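Both routes to the value 4/17 = 12/51 can be confirmed by directly enumerating all ordered pairs of cards. The short Python sketch below is an illustration only and is not part of the original text.

from itertools import permutations

# Represent the deck by suit labels only: 13 cards in each of the 4 suits.
deck = ['D'] * 13 + ['H'] * 13 + ['S'] * 13 + ['C'] * 13

pairs = list(permutations(range(52), 2))           # ordered draws without replacement
a   = [p for p in pairs if deck[p[0]] == 'D']      # first card is a diamond
a_b = [p for p in a     if deck[p[1]] == 'D']      # both cards are diamonds

print(len(a_b) / len(a), 12 / 51)                  # both print 0.2352..., i.e. 4/17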
***
You may now try the following exercises.
El) Suppose that A and B are events in a random experiment with P(B) > 0
Prove each of the following:
a) If B ⊂ A, then P(A | B) = 1.
E3) The probability that it is Friday and that a student is absent is 0.03 . There are
6 school days in a week. What is the probability that a student is absent given
that today is Friday?
E4) Suppose that a bag contains 12 coins of which 5 are fair, 4 are biased, each
with the probability of heads being 1/3; and 3 are two-headed. A coin is
chosen at random from the bag and tossed.
a) Find the probability that the coin is biased.
b) Find the probability that the biased coin was selected and the coin lands
showing a head.
c) Given that the coin is biased, find the conditional probability of getting a
head.
These relations are called compound probability law or multiplication law. This rule is
applied to find the probability of the concurrent occurrence of two or more events
using conditional probability as illustrated in the following example.
Example 4: A bag contains 5 white balls, and 4 black balls. Two balls are drawn
from the bag randomly, one by one, without replacement. Find the probability that the
first ball is black, and second is white.
Solution: Let events A = first ball is black, and B = second ball is white. Clearly,
we have to find out P(A n B) .
Since P(A) = 4/9 and P(B | A) = 5/8 (under the condition that A has happened, the reduced sample space has a total of 8 outcomes, out of which 5 are favorable to B), we get P(A ∩ B) = P(A) P(B | A) = (4/9)(5/8) = 5/18.
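A quick enumeration confirms the value 5/18. The Python sketch below is illustrative only and is not part of the original text.

from itertools import permutations
from fractions import Fraction

balls = ['W'] * 5 + ['B'] * 4                     # 5 white, 4 black balls
draws = list(permutations(range(9), 2))           # ordered draws without replacement

favourable = [d for d in draws if balls[d[0]] == 'B' and balls[d[1]] == 'W']
print(Fraction(len(favourable), len(draws)))      # 20/72 = 5/18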
Solution: Let the event A be that the first selected unit is defective, the event B be that the second selected unit is defective, and the event C be that the third selected unit is defective.
a) The lot is rejected if all three units are found defective. Thus, we need to obtain P(A ∩ B ∩ C).
At the first draw, P(A) = 15/100, the probability of getting a defective unit from 100 units containing 15 defective units.
In the second draw, given the event A, the lot now contains 99 units with 14 defective units. Thus P(B | A) = 14/99.
Similarly, P(C | A ∩ B) is the probability of getting a defective unit in the third draw, given that both the earlier draws gave defective units. The lot now contains 98 units, of which 13 are defective. Therefore, P(C | A ∩ B) = 13/98.
Therefore, using the law of compound probability, we get:
P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B) = (15/100)(14/99)(13/98) = 13/4620.
b) The lot will be accepted if it is not rejected. Clearly, the probability that the lot will be accepted is 1 − P(A ∩ B ∩ C) = 1 − 13/4620 = 4607/4620.
***
You may now try some exercises.
E5) A box contains 8 balls. Three of them are red and the remaining 5 are blue.
Two balls are drawn successively, at random and without replacement. Find the
probability that the first draw results in red, and the second draw results in blue.
E6) In a certain population, 30% of the persons smoke, and 8% have a certain type
of heart disease. Moreover, 12% of the persons who smoke have the heart
disease.
a. What percentage of the population smoke and have the heart disease?
b. What percentage of the population with the heart disease smoke?
E7) Consider the experiment that consists of rolling two fair dice. Let X denote the score on the first die, and Y denote the sum of the scores on both dice.
a. Find the probability that X = 3 , and Y = 8 .
b. Find the probability that X = 3 , given that Y = 8 .
c. Find the probability that Y = 8 , given that X = 3 .
In the next section, we shall talk about an important law, which is the law of Total
Probability. It also includes the celebrated Bayes' Theorem.
Theorem 1 (Bayes' Theorem): Let B1, B2, ..., Bs be a set of events which form a partition of the sample space S, that is, ∪_{i=1}^{s} Bi = S. Let A be any event with P(A) > 0. Then,
P(Bj | A) = P(Bj) P(A | Bj) / Σ_{i=1}^{s} P(Bi) P(A | Bi),  j = 1, 2, ..., s.
(This result was given by Thomas Bayes, a British mathematician, in 1763.)
Proof: From the definition of conditional probability, for the two events A and Bj, we have
P(Bj | A) = P(A ∩ Bj)/P(A) = P(Bj) P(A | Bj)/P(A).
Using the Law of Total Probability, P(A) = Σ_{i=1}^{s} P(Bi) P(A | Bi). Substituting this expression for P(A) in the denominator gives the result.
In the context of Bayes' theorem, the probability P(Bi) is called the 'a priori' probability of Bi, because it exists prior to the happening of the event A in the experiment. The probability P(Bi | A) is termed the 'a posteriori' probability, because it is determined after the happening of the event A, i.e. posterior to the event A. Since the probability P(Bi | A) represents the likelihood of the event Bi after the event A happens, it is also called the 'likelihood' of the event Bi after the happening of the event A.
Let us apply the above results in the following examples to understand this concept.
= [0.31 × (19/31)] / [0.31 × (19/31) + 0.69 × (41/69)] = 0.19/0.60 = 19/60, which is the result in Example 2(d).
***
Example 7: There are three bags. The first bag contains 6 red balls, and 4 blue
balls. The second bag contains 2 red balls, and 8 blue balls. The third bag contains
5 red balls, and 5 blue balls. A bag was selected at random from the three bags, and
a ball was drawn randomly from it. The ball was found to be blue. What is the
probability that the ball came from second bag?
Solution: Let the event B1 be selecting the first bag, B2 be selecting the second bag, B3 be selecting the third bag, and A be that the ball drawn is blue.
Thus P(B1) = P(B2) = P(B3) = 1/3, and
P(A | B1) = the probability of getting a blue ball from the first bag = 4/10. Similarly, P(A | B2) = 8/10, P(A | B3) = 5/10.
We want to find the probability that the selected bag was the second one, given that a blue ball came in the draw, i.e. P(B2 | A). Using Bayes' theorem, we have
P(B2 | A) = [(1/3)(8/10)] / [(1/3)(4/10) + (1/3)(8/10) + (1/3)(5/10)] = 8/17.
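The same computation can be organised as a small routine implementing Bayes' theorem; the following Python sketch (an illustration only) reproduces the value 8/17 for the three-bag example.

from fractions import Fraction

def bayes(priors, likelihoods, j):
    # P(B_j | A) = P(B_j) P(A | B_j) / sum_i P(B_i) P(A | B_i)
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[j] * likelihoods[j] / total

priors      = [Fraction(1, 3)] * 3                                    # P(B1), P(B2), P(B3)
likelihoods = [Fraction(4, 10), Fraction(8, 10), Fraction(5, 10)]     # P(A | Bi), A = "blue ball"

print(bayes(priors, likelihoods, 1))                                  # 8/17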
***
You may now try the following exercises.
E8) In a die-coin experiment, a fair die is rolled and then a fair coin is tossed a
number of times, equal to the score on the die.
a) Find the probability that the coin shows head in every toss.
b) Given that the coin shows heads in all tosses, find the probability that the die
score was i , i = 1, 2, 3,4,5, 6.
E9) A plant that produces memory chips has 3 assembly lines. Line 1 produces
40% of the chips with a defective rate of 5% ,line 2 produces 25% of the
chips with a defective rate of 6% and line 3 produces 35% of the chips with a
defective rate of 3% . A chip is chosen at random from the plant.
a) Find the probability that the chip is defective.
b) Given that the chip is defective, find the probability that the chip was
produced by the Line 3 .
for all a and b , to assign the probability that the variable X will take a value in the
interval (a, b] .
You must have studied conditional distributions in your earlier courses. Just to recapitulate, let us discuss them here again. Let us first give the formal definition of a conditional distribution.
Given two jointly distributed random variables X and Y ,that is, a two-dimensional
random variable or vector (X, Y) , the conditional probability distribution of Y
given X (written " Y I X ") is the probability distribution of Y when X is known to
have taken a particular value.
Definition 2: Let X and Y be two discrete random variables (r.v.s) associated with the same random experiment, taking values in the countable sets T_X and T_Y respectively. The function f(x, y), defined for all ordered pairs (x, y), x ∈ T_X and y ∈ T_Y, by the relation
f(x, y) = P[X = x, Y = y]
is called the joint probability mass function of X and Y.
Note: By definition,
f(x, y) ≥ 0 for all x ∈ T_X, y ∈ T_Y, and
Σ_{x ∈ T_X} Σ_{y ∈ T_Y} f(x, y) = 1.
The total number of ways of selecting two persons from a group of 10 persons is 10C2 = 45. Since the persons are selected at random, each of these 45 ways has the same probability 1/45. Consider the event [X = 1, Y = 1] that a committee has one mathematician and one statistician. One mathematician can be selected from the two mathematicians in 2C1 = 2 ways, and one statistician can be selected from the 4 statisticians in 4C1 = 4 ways. Hence, the total number of committees with 1 mathematician and 1 statistician is 2 × 4 = 8. Thus, P[X = 1, Y = 1] = 8/45.
Since the committee has only two members, it is obvious that there are no sample
points corresponding to the events [X = 1, Y = 21, [ X = 2, Y = 11 and [X = 2, Y = 21
Hence, the probabilities P[X = 1, Y = 21 = P[X = 2, Y = 11 = P[X = 2, Y = 21 = 0 .
We now summarise these calculations in the following table.
We say that the function f (x, y) is the joint probability mass function of the r.v.s. X ,
Y , or random vector (X, Y) .
Let X and Y be r.v.s taking values x ∈ T_X and y ∈ T_Y, respectively, with joint p.m.f.
f(x, y) = P[X = x, Y = y].
We define new functions, g and h, as follows:
g(x) = Σ_{y ∈ T_Y} f(x, y)      (11)
h(y) = Σ_{x ∈ T_X} f(x, y)      (12)
In Eqn. (11), we keep the value x of X fixed and sum f(x, y) over all values y of Y. On the other hand, in Eqn. (12), y is kept fixed and f(x, y) is summed over all values x of X. We wish to interpret the function g(x), defined for all values x of X, and the function h(y), defined for all values y of Y. Notice that both g and h,
being sums of non-negative numbers, are themselves non-negative. Further,
Σ_{x ∈ T_X} g(x) = Σ_{x ∈ T_X} Σ_{y ∈ T_Y} f(x, y) = 1.
Thus, g(x) has all the properties of a p.m.f. of a one-dimensional r.v. Similarly, you
can verify that h(y) also has all the properties of a p.m.f. We call these the p.m.f. of
the marginal distribution of X and Y respectively, as you can see from the following
definition.
Definition 3: The function g(x), defined for all values x ∈ T_X of the r.v. X by the relation g(x) = Σ_{y ∈ T_Y} f(x, y), is called the p.m.f. of the marginal distribution of X. Similarly, h(y), defined for all values y ∈ T_Y of the r.v. Y by the relation h(y) = Σ_{x ∈ T_X} f(x, y), is called the p.m.f. of the marginal distribution of Y.
Proof: Σ_{y ∈ T_Y} P[Y = y | X = x] = Σ_{y ∈ T_Y} f_{X,Y}(x, y)/f_X(x) = f_X(x)/f_X(x) = 1.
Definition 6: Two discrete random variables, X and Y, are called independent if, and only if,
f_{X,Y}(x, y) = f_X(x) f_Y(y), for all x ∈ T_X and all y ∈ T_Y,    (21)
where f_{X,Y}(x, y) is the joint probability mass function of X and Y, and f_X(x), f_Y(y) are the marginal probability mass functions of X and Y respectively. Likewise, two continuous random variables, X and Y, are called independent if, and only if,
f_{X,Y}(x, y) = f_X(x) f_Y(y),  −∞ < x < ∞, −∞ < y < ∞,    (22)
where f_{X,Y}(x, y) is the joint probability density function of X and Y, and f_X(x), f_Y(y) are the marginal probability density functions of the random variables X and Y, respectively.
It may be easily verified that the following conditions, with the usual notations, are equivalent for independent random variables X and Y, both in the discrete and the continuous case (a numerical check is sketched after this list):
a. f_{X|Y}(x | y) = f_X(x)
b. f_{Y|X}(y | x) = f_Y(y)
c. f_{X,Y}(x, y) = f_X(x) f_Y(y)
for all −∞ < x < ∞, −∞ < y < ∞.
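For discrete random variables, the product condition (21) can be checked cell by cell from the joint probability table. A minimal Python sketch follows; the joint table used here is a made-up illustration, not one of the examples in the text.

from fractions import Fraction as F

# Hypothetical joint p.m.f. f(x, y), indexed as joint[x][y].
joint = {0: {0: F(1, 8), 1: F(3, 8)},
         1: {0: F(1, 8), 1: F(3, 8)}}

fx = {x: sum(row.values()) for x, row in joint.items()}              # marginal p.m.f. of X
fy = {y: sum(joint[x][y] for x in joint) for y in joint[0]}          # marginal p.m.f. of Y

independent = all(joint[x][y] == fx[x] * fy[y] for x in joint for y in joint[x])
print(independent)     # True for this particular illustrative table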
Example 9: The random variables X and Y have the following joint probability mass function f_{X,Y}(x, y), given in the cells of a bivariate probability table.
Find
(i) The conditional probability distribution of Y given X
(ii) The conditional probability distribution of X given Y
(iii) Are X and Y independent?
Solution: (i) Here, T_X = {0, 1, 2} and T_Y = {0, 1, 2}.
Clearly, the marginal probability mass function of X may be evaluated as
f_X(x) = Σ_{y ∈ T_Y} f_{X,Y}(x, y).
These are basically the row totals in the bivariate table (f_X(0) = 28/45, f_X(1) = 16/45, f_X(2) = 1/45). Similarly, the column totals give f_Y(y).
The conditional probability distribution of Y given X = x, P[Y = y | X = x], is:

x \ y        0                        1                         2                       Total
0     (45/28)(2/9) = 5/14     (45/28)(1/3) = 15/28     (45/28)(1/15) = 3/28       1
1     (45/16)(2/9) = 5/8      (45/16)(2/15) = 3/8      (45/16)(0) = 0             1
2     45(1/45) = 1            45(0) = 0                45(0) = 0                  1
The marginal probability density function of X is
f_X(x) = x + 1/2, if 0 < x < 1, and f_X(x) = 0 otherwise.
The marginal probability density function of Y is
f_Y(y) = 1/2 + y, if 0 < y < 1, and f_Y(y) = 0 otherwise.
a) The conditional probability density function of X given Y = y, 0 < y < 1, will be
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = (x + y)/(y + 1/2), if 0 < x < 1, and f_{X|Y}(x | y) = 0 otherwise.
b) Similarly, the conditional probability density function of Y given X = x, 0 < x < 1, will be
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = (x + y)/(x + 1/2), if 0 < y < 1, and f_{Y|X}(y | x) = 0 otherwise.
c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y), the random variables are not independent.
d) Now, P[0 < X < 1/2 | Y = 1/3] will be obtained by integrating the conditional probability density function of X given Y = 1/3 over [0, 1/2]:
f_{X|Y}(x | 1/3) = (x + 1/3)/(1/3 + 1/2) = (6/5)(x + 1/3), 0 < x < 1.
Thus, P[0 < X < 1/2 | Y = 1/3] = ∫_0^{1/2} (6/5)(x + 1/3) dx = (6/5)(1/8 + 1/6) = 7/20.
Example 11: Suppose that the random vector (X, Y) has the joint probability density function f given below:
f(x, y) = 2(x + y), if 0 < x < y < 1, and f(x, y) = 0 otherwise.
Solution: The marginal probability density function of X is
f_X(x) = ∫_x^1 2(x + y) dy = 2(x + 1/2 − (3/2)x²) = 1 + 2x − 3x², if 0 < x < 1,
and f_X(x) = 0 otherwise.
The marginal probability density function of Y (see Fig. 4) is
f_Y(y) = ∫_0^y 2(x + y) dx = 3y², if 0 < y < 1,
and f_Y(y) = 0 otherwise.
c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 1, the random variables X and Y are not independent.
***
You may now try the exercises that follow.
E10) Two dice are thrown. Let X denote the sum of the scores on the two dice, and Y denote the absolute value of their difference.
a) Find the joint probability mass function of X and Y .
b) Find the marginal probability mass function of X .
c) Find the marginal probability mass function of Y .
d) Find the conditional probability mass function of Y given X = x .
e) Find the conditional probability mass function of X given Y = y .
f) Are X and Y independent?
E l 1) Suppose that the random variables X and Y have a joint probability density
function f as given below
ifO<x<y<2
f (x, y) =
otherwise
a) Find the conditional density of X , given Y = y .
b) Find the conditional density of Y , given X = x .
c) Are X and Y independent?
E12) Suppose that the random variables X and Y have a joint probability density
function f as given below
ifO<x<w, O<y<-
f (x, Y)=
otherwise
a) Find the conditional density of X , given Y = y .
b) Find the conditional density of Y , given X = x .
c) Find P[X < 1], P[X < Y], and P[X + Y < 1].
E13) Suppose that the random variables X and Y have a joint probability density
function f as given below
f(x, y) = 2, if 0 < x < y < 1, and f(x, y) = 0 otherwise.
a) Check whether or not independence of X and Y holds.
b) Find P[X < 0.2 | Y > 0.1] and P[0.1 < Y < 0.4].
The exercises in this section would have given you enough practice to compute the
density functions and distribution functions of bivariate random variables. Next, we
shall discuss measures of central tendency of the probability distribution of bivariate
random vectors.
1.6 CONDITIONAL EXPECTATIONS
We shall begin this section with the definition of the conditional expectation of a
function of one random variable, given that the other variable has taken a given value.
E(Y | X = x) = Σ_{y ∈ T_Y} y f_{X,Y}(x, y)/f_X(x),    (23)
provided the series on the right hand side of Eqn. (23) is absolutely convergent. Here, E(Y | X = x) is a function of x, since x can take any value in T_X, and the conditional expectation of X, given Y = y, denoted by E(X | Y = y) or E(X | y), is defined as
E(X | Y = y) = Σ_{x ∈ T_X} x f_{X,Y}(x, y)/f_Y(y),    (24)
provided the right hand side of Eqn. (24) converges absolutely. Here, E(X | Y = y) is a function of y, since y can take any value in T_Y. Similarly, if X and Y are
continuous random variables for the experiment, having f_{X,Y}(x, y) as the joint probability density function and f_X(x), f_Y(y) as the marginal probability density functions of X and Y respectively, then the conditional expectation of Y, given X = x, is defined as
E(Y | X = x) = ∫_{−∞}^{∞} y f_{X,Y}(x, y)/f_X(x) dy,    (25)
provided the integral on the right hand side of Eqn. (25) converges absolutely, and the
conditional expectation of X , given Y = y , is defined as
E(X | Y = y) = ∫_{−∞}^{∞} x f_{X,Y}(x, y)/f_Y(y) dx,    (26)
provided the right hand side of Eqn. (26) is absolutely convergent. Here, again, we note that E(Y | X = x) is a function of x, and E(X | Y = y) is a function of y, as both
x and y can vary in R . The conditional expectation of a function of a random
variable can also be defined in a similar way. For the discrete random variables X and Y, as specified above, the conditional expectation of φ(Y), a function of the random variable Y, given X = x, will be
E(φ(Y) | X = x) = Σ_{y ∈ T_Y} φ(y) f_{X,Y}(x, y)/f_X(x),
and for the continuous random variables, X and Y , as specified above, this
conditional expectation will be E(φ(Y) | X = x) = ∫_{−∞}^{∞} φ(y) f_{X,Y}(x, y)/f_X(x) dy.
(ii) E(Y | X = x) = Σ_{y ∈ T_Y} y f_{X,Y}(x, y)/f_X(x)
(iii) E(E(Y | X)) = Σ_{x ∈ T_X} E(Y | X = x) f_X(x)
(iv) V(X | Y = 0) = E(X² | Y = 0) − {E(X | Y = 0)}²
Example 13: Let the continuous random variables X and Y have the following joint probability density function:
f(x, y) = 8xy, if 0 < x < y < 1, and f(x, y) = 0 otherwise.
(i) Find the expectation of Y, given X = x, i.e., E(Y | X = x).
(ii) Find V(Y | X = x).
Solution: (i) Since
(ii) Since
Therefore,
V(Y | X = x) = E(Y² | X = x) − {E(Y | X = x)}².
You must have seen in part (iii) of Example 12 that E(E(Y | X)) and E(Y) both attain the same value, 3/5. Now let us prove this in the theorem that follows.
Thus, E(E(Y | X)) = E(Y). In a similar way, the result can be proved for continuous random variables also.
***
Now let us prove another theorem.
E14) Suppose that the random variables X and Y have the joint probability density function
f_{X,Y}(x, y) = k, if −6 < x < 6, −6 < y < 6, and f_{X,Y}(x, y) = 0 otherwise.
Then find k, E(Y | X = x) and E(X | Y = y).
Here, we close the discussion on conditional probability. We hope that you have
gained considerable knowledge about conditional probability, and conditional
distribution. Now let us summarise what we discussed in this unit.
1.7 SUMMARY
In this unit, we have covered the following points.
1. We illustrated the idea of conditional probability, and presented some examples to
elaborate its concept. The conditional probability of an event is obtained on the
basis of prior knowledge of the happening of another event. For the evaluation of
the conditional probability of an event, the sample space gets reduced to the event
whose occurrence has taken place.
2. We attempted to describe conditional probability. Conditional probability is
influenced by the happening of another event if the events are dependent,
otherwise, it is not influenced and the events are called independent.
3. We studied some basic properties of conditional probability. They were similar to
the general properties of a probability function on a sample space.
4. We stated and proved the famous Bayes' theorem. We presented simple examples
to illustrate it.
5. We have acquainted you with the concept of conditional distribution with some
examples.
6. We defined the conditional expectation of a random vector and stated some of its important properties. Finally, we defined the conditional variance.
E1) a) If B ⊂ A, then A ∩ B = B. Therefore P(A | B) = P(A ∩ B)/P(B) = P(B)/P(B) = 1.
E3) Let the event A = the student is absent, and the event B = today is Friday.
Since there are six school days, P(B) = 1/6, and P(A ∩ B) = 0.03.
Therefore, the required probability is P(A | B) = P(A ∩ B)/P(B) = 0.03/(1/6) = 0.18.
E4) Let the event A = the coin is biased, and the event B = the coin lands heads up.
The bag contains 12 coins: 5 fair, 4 biased, each with probability of heads 1/3, and 3 two-headed.
a) P(coin is biased) = P(A) = 4/12 = 1/3.
b) P(coin is biased and it lands heads) = P(A ∩ B) = P(A) P(B | A) = (1/3)(1/3) = 1/9.
c) P(the coin lands heads, given that it is biased) = P(B | A) = 1/3.
E5) A box contains 8 balls: 3 of them are red, and the remaining 5 are blue. Two
balls are drawn successively, at random, and without replacement. Let the event
A be that a red ball is drawn in the first draw and event B be that the blue ball
is drawn in the second draw. The required probability is P(A ∩ B), and
P(A ∩ B) = P(A) P(B | A) = (3/8)(5/7) = 15/56.
E6) Let event A = a person smokes, and event B = a person has heart disease.
We are given
P(A) = 0.3, P(B) = 0.08
P(B | A) = 0.12.
a) We require P(A ∩ B).
P(A ∩ B) = P(A) P(B | A) = 0.3 × 0.12 = 0.036.
Thus, the percentage of the population who smoke and have the heart disease is 3.6%.
b) We require P(A | B) here: P(A | B) = P(A ∩ B)/P(B) = 0.036/0.08 = 0.45, i.e. 45%.
E8) Let the events Bi = the score on the die is i , where i = 1, 2 , 3 , 4 , 5 , 6 , and event
A = all tosses of coin show heads
Clearly, P(Bi) = 1/6, i = 1, 2, 3, 4, 5, 6, and
P(A | B1) = 1/2, P(A | B2) = (1/2)(1/2) = 1/4, P(A | B3) = (1/2)(1/2)(1/2) = 1/8,
P(A | B4) = 1/16, and similarly P(A | B5) = 1/32, P(A | B6) = 1/64.
a) By the Law of Total Probability,
P(A) = (1/6)(1/2 + 1/4 + 1/8 + 1/16 + 1/32 + 1/64) = (1/6)(63/64) = 63/384 = 21/128.
b) Since
E9) Let the event Bi = the chip is produced by line i, where i = 1, 2, 3, and the event A = the chip is defective.
P(B1) = 0.40, P(B2) = 0.25, P(B3) = 0.35,
P(A | B1) = 0.05, P(A | B2) = 0.06, P(A | B3) = 0.03.
a) By the Law of Total Probability, P(A) = 0.40 × 0.05 + 0.25 × 0.06 + 0.35 × 0.03 = 0.0455.
b) The chip was produced by Line 3, given that the chip is defective: P(B3 | A) = (0.35 × 0.03)/0.0455 = 0.0105/0.0455 ≈ 0.23.
E10) The sample space, as given below, consists of 36 equally likely outcomes.
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
fxly(x/y)=0 , otherwise
=O otherwise
c) Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 2, the random variables X and Y are not independent.
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx, and f_Y(y) = 0 otherwise.
P(X < Y) = ∫_0^∞ ∫_x^∞ f_{X,Y}(x, y) dy dx,
P(X + Y < 1) = ∫_0^1 ∫_0^{1−x} f_{X,Y}(x, y) dy dx (the region below the line x + y = 1, see Fig. 6).
[Fig. 6]
E13) a) The marginal probability density function of X is
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_x^1 2 dy = 2(1 − x), if 0 < x < 1, and f_X(x) = 0 otherwise.
The marginal probability density function of Y is
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_0^y 2 dx = 2y, if 0 < y < 1, and f_Y(y) = 0 otherwise.
Since f_{X,Y}(x, y) ≠ f_X(x) f_Y(y) for 0 < x < y < 1, the random variables X and Y are not independent.
b) P[X < 0.2 | Y > 0.1] = P[X < 0.2, Y > 0.1]/P[Y > 0.1], where
P[X < 0.2, Y > 0.1] = ∫_{0.1}^{0.2} ∫_0^y 2 dx dy + ∫_{0.2}^{1} ∫_0^{0.2} 2 dx dy.
E14) Since ∫_{−6}^{6} ∫_{−6}^{6} k dx dy = 1, therefore k = 1/144.
(i) E(Y | X = x) = ∫_{−6}^{6} y f_{X,Y}(x, y)/f_X(x) dy = 0.
(ii) We may get E(X | Y = y) = 0 in a similar way.
E15) Since the joint density must integrate to 1, therefore
(i) k = 8.
(iii) E(Y² | X = x) = ∫ y² f_{X,Y}(x, y)/f_X(x) dy.
UNIT 2 THE BASICS OF MARKOV CHAIN
Structure
2.1 Introduction
    Objectives
2.2 Stochastic Process
2.3 Markov Chain
2.4 Graphical Representation
2.5 Higher Order Transition Probabilities
2.6 Methods of Calculating Pⁿ
    Method of Spectral Decomposition
    Method of Generating Function
2.7 Summary
2.8 Solutions/Answers
2.1 INTRODUCTION
The Markov chain is named after Andrey Markov (1856 - 1922), a Russian
mathematician. It is a discrete-time stochastic process with the Markov property.
Andrey Markov produced the first results in 1906 for these processes having finite
state space. A generalization to countably infinite state spaces was given by
Kolmogorov. Further work was done by W. Doeblin, W. Feller, K. L. Chung and
others. In most of our study of probability so far, we have dealt with independent trials
processes as a sequence of identically and independently distributed random variables.
These processes are the basis of classical probability theory, and much of statistics.
We have discussed two of the principal theorems for these processes: the Law of .
Large Numbers, and the Central Limit Theorem. We have seen that when a sequence
of repeated chance experiments forms an independent trials process, the possible
outcomes for each experiment are the same and occur with the same probability.
Further, knowledge of the outcomes of the previous experiments does not influence
our predictions for the outcomes of the present or future experiment.
In many cases in real life, we observe a sequence of chance experiments where all of
the outcomes in the past experiments may influence our predictions for the next
experiment. The sequence of random variables associated with the sequence of such
experiments may not be identically and independently distributed. For example, this
will happen in predicting a student's grades on a sequence of exams in a course. But to
allow too much generality makes the processes mathematically difficult to handle. A.
Markov studied this type of chance process, where only the outcome of the current experiment (and not of the previous experiments) can affect the outcome of the next experiment. This
type of process is called a Markov process, a particular case of which, when state
space is discrete, is called a Markov chain.
Markovian systems appear extensively in physics. Markov chains can also be used to
model various processes in queuing theory. The Page Rank of a webpage as used by
Google is defined by a Markov chain. Markov chain methods have also become very
important for generating sequences of random numbers to accurately reflect very
complicated desired probability distributions - a process called Markov chain Monte
Carlo, or MCMC for short. Markov chains also have many applications in biological
modeling, particularly population processes, which are useful in modeling processes
that are (at least) analogous to biological populations. The Leslie matrix is one such
example, though some of its entries are not probabilities (they may be greater than 1).
We will present some discussion about the concept of stochastic processes, definition
and understanding of Markov chain in Sec.2.2 and Sec.2.3, respectively. We will also
present some examples to illustrate the behavior of Markov chain. Here, we will also
learn about the Transition Probability Matrix P, higher order transition probabilities, and the famous Chapman-Kolmogorov equation. In Sec. 2.4, we will represent a Markov chain graphically, and in Sec. 2.5, we shall compute higher order transition probabilities. In Sec. 2.6, we will learn two methods for the calculation of Pⁿ, viz.,
Spectral Decomposition, and Generating Function.
Objectives
After studying this unit you should be able to:
explain the concept of a stochastic process, and that of a Markov chain as a special
case of stochastic process;
compute the transition probability matrix with some of its applications;
evaluate higher order transition probabilities, and the unconditional probability distribution after a number of steps in a Markov chain;
For example, in situation (i), X_n denotes the total number of heads found in n independent throws of a coin. Thus the state space, S, will be a finite set of non-negative integers, 0, 1, 2, ..., n. Here, the collection of random variables {X_n} will be a stochastic process having a finite state space. In situation (ii), the state space of X_n is also discrete. We can write X_n = Y_1 + Y_2 + ... + Y_n, where Y_i is a discrete random variable denoting the outcome of the ith throw, and Y_i = 1 or 0 according as the ith throw shows a six or not. The representation X_n = Y_1 + ... + Y_n is valid in both the situations (i) and (ii). In yet another situation, we may consider a collection of random variables {X_n = Y_1 + Y_2 + ... + Y_n, n = 1, 2, 3, ...}, where Y_i is a continuous random variable assuming values in (0, ∞). Here, the set of possible values of X_n belongs to the interval (0, ∞), and so the state space S of the stochastic process X_n is continuous.
From the examples above, it is clear that a stochastic process may be a discrete time stochastic process, when the index set T is a discrete set, often a collection of the non-negative integers 0, 1, 2, 3, ..., or it may be a continuous time stochastic process when the index set is continuous (usually a space or time interval), resulting in an uncountably infinite number of random variables. We may use alternative notation for a stochastic process, such as X(t) or X_t, where t indicates space or time.
So far, we have discussed the case of a stochastic process in which the X(t) are one-dimensional random variables. There may be processes with X(t) that are more than one-dimensional. Consider X(t) = (X_1(t), X_2(t)), in which X_1(t) represents the minimum temperature and X_2(t) represents the maximum temperature in a city in a time interval [0, t]; then the stochastic process is two-dimensional. Similarly, we can have multi-dimensional stochastic processes also. In general, stochastic processes can be categorized into the following four types:
Thus, we see that the index set, T , and the state space, S, of a stochastic process may
be discrete or continuous. Familiar examples of the stochastic processes include prices
of shares, varying every moment in a stock market, and exchange rates of our currency
fluctuating along with time. Other examples, such as a patient's ECG, blood pressure,
or temperature, constitute stochastic processes arising in medical sciences.
Definition 1: A stochastic process {X_n} with the index set T = {0, 1, 2, ..., n, ...} and discrete state space S = {1, 2, ..., i, ..., s} is called a Markov chain if, for any of the states i_0, i_1, i_2, ..., i_{n-1}, i, j ∈ S, and any n ∈ T, we have
P[X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] = P[X_{n+1} = j | X_n = i],    (1)
and in this situation the sequence of random variables {X_n} is said to possess the Markov property. If X_n has the outcome i (i.e. X_n = i), then the Markov chain is said to be in state i at the nth trial, or at time n. In the definition above, s may be infinity.
The Markov chain will be called a Finite Markov chain if the state space S is finite.
The probability P[X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] in the above definition denotes the conditional probability that the system will be in state j at time n + 1, given that the system was in the state i at time n, in the state i_{n-1} at time n − 1, ..., in the state i_1 at time 1, and in the state i_0 initially at time 0. Due to the Markov property, this probability depends only on the latest given state, i.e., on the state i at time n.
Let (i, j) denote a pair of states at two times, say, at time m and time n, m ≤ n. The transition probability for making the transition from state i at time m to state j at time n,
P[X_n = j | X_m = i] = p_{ij}(m, n),    (2)
is called the (m, n)-step transition probability. Here, we have assumed that the transition probabilities depend on both the states i, j, and on both the times m, n.
In this section, we shall only discuss time homogeneous chains, for which p_{ij}(m, n) depends only on the difference n − m. In this case, the m-step transition probability for a homogeneous chain may be denoted as p_{ij}^{(m)} = P[X_{n+m} = j | X_n = i], for every n.
Now, we state below two theorems without proof. Theorem 1 is known as the general
existence theorem. Theorem 2 states three different conditions identical to the
Markov property. For the proofs of these theorems, you may refer to Markov Chains with Stationary Transition Probabilities by K. L. Chung (1967).
Theorem 1: The stochastic matrix and the initial distribution completely specify a
Markov chain.
Starting with the joint probability of (X_0, X_1, X_2, ..., X_n), we have
P[X_n = i_n, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0]
= P[X_n = i_n | X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0] ...... P[X_2 = i_2 | X_1 = i_1, X_0 = i_0] P[X_1 = i_1 | X_0 = i_0] P[X_0 = i_0],
using conditional probability and the product rule,
= P[X_n = i_n | X_{n-1} = i_{n-1}] ...... P[X_2 = i_2 | X_1 = i_1] P[X_1 = i_1 | X_0 = i_0] P[X_0 = i_0], using the Markov property,
= p_{i_{n-1} i_n} ... p_{i_1 i_2} p_{i_0 i_1} u_{i_0},
where u_{i_0} is the initial probability and the p_{i_k i_{k+1}} are transition probabilities.
From the information above, we can determine the transition probabilities as follows
Example 2: Consider that, in a city, in the coming week the probability that a healthy person will fall sick is 0.20, and that he will remain healthy is 0.80. Consider also that, in the coming week, the probability that a sick person will become healthy is 0.65, that he will die is 0.25, and that he will remain sick is 0.10. We will form a P-matrix on the basis of the above information. In each week, a person will be in any one of three conditions - healthy, sick or dead - which are the three states for his health. If each week's health condition depends only on the condition in the previous week, then we have a Markov chain {X_n}, where X_n represents the health condition of a person in the nth week. From the above information we can determine the transition probabilities. Assume the states Healthy, Sick and Dead are denoted by 1, 2, 3 respectively; then
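The transition matrix of Example 2 can be written down directly from the stated weekly probabilities. The Python sketch below builds it; treating the state Dead as absorbing (third row = (0, 0, 1)) is an assumption consistent with the description, since the example lists no transitions out of that state.

import numpy as np

# States: 1 = Healthy, 2 = Sick, 3 = Dead (rows/columns in that order).
P = np.array([
    [0.80, 0.20, 0.00],   # healthy -> healthy 0.80, healthy -> sick 0.20
    [0.65, 0.10, 0.25],   # sick -> healthy 0.65, sick -> sick 0.10, sick -> dead 0.25
    [0.00, 0.00, 1.00],   # dead is assumed absorbing
])

assert np.allclose(P.sum(axis=1), 1.0)   # every row of a stochastic matrix sums to 1
print(P)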
***
So far, we have only defined a Markov chain. Now, let us discuss a graphical representation of a Markov chain.
Example 3: The directed graph of the Markov chain given in the Example 1 is
shown in Fig 1.
Example 6: Each time a certain horse runs in a three-horse race, he has probability 1/2 of winning (W), 1/4 of coming in second (S), and 1/4 of coming in third (T), independently of the outcome of any previous race. We have an independent trials process, but it can also be considered as a Markov chain. Here, we choose the outcomes of the race, that is, winning, second, and third, as the three states. It may be modelled as a Markov chain {X_n}, where X_n denotes the outcome of the nth race. From the above information, we can determine the transition probabilities shown in the transition matrix that follows.
W S T
Remark 2: In general, we see that any sequence of discrete i.i.d. (identically and
independently distributed) random variables can be considered as a Markov chain. In
such a case, the transition matrix has identical rows, each row being the probability
distribution of the random variable, X, .
You may now try the following exercises on the basis of above discussions.
E2) Draw the transition graph for the problem given in Example 6.
E3) The schooling status of a student in any year may be represented by 6 states, namely, nursery, class one, class two, ..., class five. Let p_i denote the probability that a student in state i in any year jumps to a higher class (state i + 1), and q_i denote the probability that the student remains in the same class (state i) in the next year. Assume that class 5 is the highest status and it cannot be crossed. If X_n denotes the status of a student in the nth year of his schooling, show that {X_n} is a Markov chain. Set up the matrix of transition probabilities.
So far, we have learnt about the Markov chain and its graphical representation. Now, in this section, we shall continue the discussion to the higher step transition probabilities.
Theorem 3: For a time homogeneous Markov chain,
p_{ij}^{(n)} = Σ_{k=1}^{s} p_{ik}^{(n-1)} p_{kj}, for i, j ∈ S,
the matrix form of which can be written as
P^(n) = P^(n-1) P.
Proof: Using the Law of Total Probability and Conditional Probability discussed in Unit 1, we have
p_{ij}^{(n)} = P[X_n = j | X_0 = i]
= Σ_{k=1}^{s} P[X_n = j, X_{n-1} = k | X_0 = i]    (using the Law of Total Probability)
= Σ_{k=1}^{s} P[X_n = j | X_{n-1} = k, X_0 = i] P[X_{n-1} = k | X_0 = i]    (using conditional probability)
= Σ_{k=1}^{s} P[X_n = j | X_{n-1} = k] P[X_{n-1} = k | X_0 = i]    (using the Markov property)
= Σ_{k=1}^{s} p_{kj} p_{ik}^{(n-1)}.
The last expression is the (i, j)th element in the product of the matrices P^(n-1) and P = (p_{ij}). Thus, we get
P^(n) = P^(n-1) P    [since P^(1) = P].
***
Theorem 4: Let P be the transition matrix of a homogeneous Markov chain. The (i, j)th entry of the matrix Pⁿ gives the probability that the Markov chain, starting in the state i initially, will be in state j after n steps, i.e.
P^(n) = Pⁿ.
Proof: Clearly, the probability that the Markov chain, starting in state i, will be in state j after n steps is p_{ij}^{(n)}, which is the (i, j)th entry of the matrix P^(n). Therefore, the theorem will be proved if we prove P^(n) = Pⁿ.
Now, let us apply the method of induction to prove this. For n = 2, from Theorem 3 we have P^(2) = P^(1) P = P P = P².
Again, assuming the result for n, we can verify it for n + 1 as follows:
P^(n+1) = P^(n) P    (using Theorem 3)
= Pⁿ P    (using the assumption for n)
= Pⁿ⁺¹.
Hence, by the method of induction, it is proved that the statement is true for every positive integer n.
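Theorem 4 can be illustrated numerically: raising P to the nth power with ordinary matrix multiplication gives the n-step transition probabilities. The sketch below reuses the health-state matrix assumed for Example 2 purely as an illustration.

import numpy as np

P = np.array([[0.80, 0.20, 0.00],
              [0.65, 0.10, 0.25],
              [0.00, 0.00, 1.00]])

P4 = np.linalg.matrix_power(P, 4)      # P^(4): the 4-step transition probabilities
print(P4[0, 2])                        # probability of going from Healthy to Dead in 4 weeks

# consistency check: four steps equals two steps followed by two more steps
P2 = np.linalg.matrix_power(P, 2)
assert np.allclose(P4, P2 @ P2)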
***
Theorem 5 (Chapman-Kolmogorov Equation): A time homogeneous Markov chain satisfies the equation
p_{ij}^{(m+n)} = Σ_{k=1}^{s} p_{ik}^{(n)} p_{kj}^{(m)},  i, j = 1, 2, ..., s, for m, n = 0, 1, 2, ...,
or, in matrix form, P^(m+n) = P^(n) P^(m), with P^(0) = I.
Proof:
p_{ij}^{(m+n)} = P[X_{n+m} = j | X_0 = i]
= Σ_{k=1}^{s} P[X_{n+m} = j, X_n = k | X_0 = i]    (using the Law of Total Probability)
= Σ_{k=1}^{s} P[X_{n+m} = j | X_n = k, X_0 = i] P[X_n = k | X_0 = i]
= Σ_{k=1}^{s} P[X_{n+m} = j | X_n = k] P[X_n = k | X_0 = i]    (using the Markov property)
= Σ_{k=1}^{s} p_{kj}^{(m)} p_{ik}^{(n)},
which is the (i, j)th element of P^(n) P^(m).
Theorem 6: Let P be the transition matrix of a time homogeneous Markov chain, and let u be the initial probability vector. Then the unconditional probability P(X_n = j) that the chain is in state j after n steps is the jth entry in the vector u^(n) given by
u^(n) = u Pⁿ.
Proof: Since
u_j^(n) = P[X_n = j]
= Σ_{i=1}^{s} P[X_n = j, X_0 = i]    (using the Law of Total Probability)
= Σ_{i=1}^{s} P[X_0 = i] P[X_n = j | X_0 = i] = Σ_{i=1}^{s} u_i p_{ij}^{(n)},
which is the jth entry of u P^(n) = u Pⁿ (using Theorem 4).
At the end points, 0 and N ,there are two typical behaviours for the particle. If the
particle reaches 0, then it remains at 0 with probability 'a' or moves to 1 with
probability '1 - a ' . Similarly, assume the particle remains at N with probability 'b'
and moves to N - 1 with probability '1 - b' whenever it reaches that position. The
position 0 will be an absorbing barrier when a = 1, and it will be a reflecting barrier if
a = 0 . The position 0 will be called elastic barrier or partially reflective barrier if
0 < a < 1. Similarly, the position N will be absorbing when b = 1 , reflective when
b = 0 , and elastic/partially reflective when 0 < b < 1.
Suppose the particle starts at a position k (0 ≤ k ≤ N) at time 0. Let X_n denote the position of the particle at time n. Then, clearly, the sequence {X_n} has the Markov property. The N + 1 possible positions (0, 1, 2, ..., N) of the particle are the possible
states of the chain.
Here, for 0 < r < N,
P[X_n = r + 1 | X_{n-1} = r] = p,
P[X_n = r − 1 | X_{n-1} = r] = q.
Also, when r = 0,
P[X_n = 1 | X_{n-1} = 0] = 1 − a,
P[X_n = 0 | X_{n-1} = 0] = a,
and, when r = N,
P[X_n = N − 1 | X_{n-1} = N] = 1 − b,
P[X_n = N | X_{n-1} = N] = b,
and the initial probability vector is u = (0, 0, ..., 1, 0, ..., 0), with 1 at the (k + 1)th place in the vector.
***
Example 8 (Gambler's Ruin Problem): There are two gamblers, A and B, playing against each other. Let the initial capital of A be x units, and that of B be z − x units. At each move, player A can win one unit from B with probability p, or can lose one unit to B with probability q (p + q = 1). In due course, after a series of independent moves, if the capital of A reduces to zero, then A is ruined and the game ends, and if his capital increases to z, then B is ruined and the game ends. This problem can be modelled as a random walk problem with absorbing barriers at the two ends. Here, the Markov chain {X_n} represents the capital of A at the nth move of the game. It has z + 1 states, ranging from 0 to z. The transition probability matrix can be obtained directly from Example 7 by putting a = 1, b = 1 and N = z. Also, the initial state is k = x with probability 1.
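The transition matrix of the random walk with barriers, and hence of the gambler's ruin chain (a = b = 1), can be generated mechanically. The following Python sketch is an illustration; the particular values of p and N are hypothetical.

import numpy as np

def random_walk_matrix(N, p, a, b):
    """Transition matrix of the random walk on {0, 1, ..., N} described in Example 7."""
    q = 1 - p
    P = np.zeros((N + 1, N + 1))
    P[0, 0], P[0, 1] = a, 1 - a            # behaviour at the barrier 0
    P[N, N], P[N, N - 1] = b, 1 - b        # behaviour at the barrier N
    for r in range(1, N):                  # interior states: up with p, down with q
        P[r, r + 1] = p
        P[r, r - 1] = q
    return P

# Gambler's ruin: absorbing barriers, z = 5 units of total capital (illustrative values).
P = random_walk_matrix(N=5, p=0.4, a=1.0, b=1.0)
assert np.allclose(P.sum(axis=1), 1.0)
print(P)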
***
Example 9: Let the initial distribution in Example 1 of the Simple Weather Model
be u = (0.7 0.2 0.1) . Let the three states sunny, cloudy and rainy be represented by
integers 1, 2, 3 respectively.
Then, the probability that the initial day is sunny, the first day is rainy, the second day
is cloudy, and the third day is sunny, is given by:
Also, the probability that all successive four days starting from the initial day are
sunny equals:
Example 10: Suppose that in Example 9 the P-matrix is modified as given below:
        0.500  0.250  0.250
P =     0.450  0.100  0.450
        0.250  0.250  0.500
Let us now find the probability distribution of weather for the first day, the second day, and the third day, and also the probability distribution for the sixth day. The probability distribution of weather for the first day is the probability distribution of X_1. Now
u_1^(1) = P[X_1 = 1] = 0.465, u_2^(1) = P[X_1 = 2] = 0.22, and u_3^(1) = P[X_1 = 3] = 0.315, since
u^(1) = u P = (0.7  0.2  0.1) P = (0.465  0.22  0.315).
We may get the distribution of X_2, the probability distribution of weather for the second day, as u^(2) = u^(1) P,
and the distribution of X_6, the probability distribution of weather for the sixth day, will be
u^(6) = u P⁶.
Remark 3: Here we see that the sixth day probability distribution of weather has become independent of the initial distribution. You may verify that the same distribution for the sixth day will be found if any other initial distribution is used. This happens since all rows of P⁶ are identical and define a probability distribution on the set of states. You may also compute powers of P higher than 6. Are they identical to P⁶? If, for large n, all rows of Pⁿ become identical and define a probability distribution, then the Markov chain is called a Regular Markov chain. We shall discuss these chains in Unit 3.
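The computations of Example 10 and the observation in Remark 3 can be reproduced numerically. The sketch below uses the modified P-matrix as reconstructed above; the specific entries should be checked against the printed matrix.

import numpy as np

P = np.array([[0.50, 0.25, 0.25],      # modified weather matrix (as reconstructed above)
              [0.45, 0.10, 0.45],
              [0.25, 0.25, 0.50]])
u = np.array([0.7, 0.2, 0.1])          # initial distribution

print(u @ P)                                     # u^(1) = (0.465, 0.22, 0.315)
print(u @ np.linalg.matrix_power(P, 6))          # u^(6), the distribution on the sixth day

P6 = np.linalg.matrix_power(P, 6)
print(np.abs(P6 - P6[0]).max())        # small: the rows of P^6 are nearly identical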
on S_{n-1} and not on any of S_{n-2}, S_{n-3}, ..., S_0, the sequence {S_n} is a Markov chain with state space S = {0, 1, 2, ..., j, ...}.
Again,
p_{ij} = P[S_n = j | S_{n-1} = i] = P[X_n = j − i] = p_{j−i} (say).
Therefore, the Markov chain {S_n} is time homogeneous, or has stationary transition probabilities p_{ij}, given above. Here, p_{ij} depends only on the difference j − i; in such a case the Markov chain is said to have stationary independent increments, and is called an additive process. If the sequence {X_n} is a sequence of i.i.d. Bernoulli random variables with P[X_n = 1] = p and P[X_n = 0] = q, then the P-matrix of {S_n} will be
0 1 2 - - -
P=
Example 12 (Ehrenfest Model): This example is a special case of a model, called the Ehrenfest model, given by P. and T. Ehrenfest in 1907. It has been used to explain the diffusion of gases. Suppose we have two urns that contain between them four balls. At each step, one of the four balls is chosen at random and moved from its present urn to the other urn. We choose, as states, the number of balls in the first urn. Thus, the set of states is {0, 1, 2, 3, 4}. The sequence of random variables {X_n}, denoting the number of balls in the first urn at successive steps, is a Markov chain.
p_{0j} = P(X_n = j | X_{n-1} = 0) = 0 when j ≠ 1, and p_{01} = P(X_n = 1 | X_{n-1} = 0) = 1, since when the first urn is empty, the chosen ball is certainly from the second urn and it will be transferred to the first urn.
p_{10} = P(X_n = 0 | X_{n-1} = 1) = 1/4, p_{12} = P(X_n = 2 | X_{n-1} = 1) = 3/4, p_{11} = p_{13} = p_{14} = 0, since when the first urn has one ball, the chosen ball will be from the first urn with probability 1/4, and from the second urn with probability 3/4. Similarly, the other transition probabilities can be obtained.
The transition matrix is then
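The matrix is easy to generate from the rule just described: from state i, the count in the first urn decreases by one with probability i/4 and increases by one with probability (4 − i)/4. The following Python sketch (not part of the original text) constructs it for a general number of balls.

from fractions import Fraction
import numpy as np

def ehrenfest_matrix(n_balls):
    """Transition matrix of the Ehrenfest chain on the states 0, 1, ..., n_balls."""
    size = n_balls + 1
    P = np.zeros((size, size), dtype=object)
    for i in range(size):
        if i > 0:
            P[i, i - 1] = Fraction(i, n_balls)             # a ball leaves the first urn
        if i < n_balls:
            P[i, i + 1] = Fraction(n_balls - i, n_balls)   # a ball enters the first urn
    return P

print(ehrenfest_matrix(4))    # the 5 x 5 matrix of Example 12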
Example 13: A Markov chain has the following initial distribution u and P-matrix:
u = (1/3, 1/3, 1/3) and
We have
        0.25  0.5   0.25
P² =    0.25  0.25  0.5
        0.5   0.25  0.25
We find the following results:
u P = (1/3  1/3  1/3) = u, and
u P² = (1/3  1/3  1/3) = u.
We can find the same relation for all the higher powers of P. Therefore, we get, in general, u Pⁿ = (1/3  1/3  1/3) = u for every n, and thus, using Theorem 6, we have u^(n) = u for all n.
With such an initial distribution u, the Markov chain is called a Stationary Markov chain. The probability distribution u is then called the Stationary Distribution of the Markov chain. This type of Markov chain will be discussed in detail in Unit 3.
***
E6) Compute the matrices P², P³, P⁴ for the Markov chain defined by the transition matrix P = (0 1; 1 0). Do the same for the transition matrix P = (1 0; 0 1). Interpret the results.
E7) Assume in Exercise 1 that every man has at least one son. Find the probability that a randomly chosen grandson of a businessman is a farmer.
So far, we have been discussing Markov chains, the related probability matrices
including initial distributions, and their interpretations. Now, in the next section we
shall discuss two important methods of calculating Pn.
Let P be a transition matrix of finite order, s × s. Suppose P has distinct eigenvalues (or latent roots, or characteristic roots, or spectral values) λ1, λ2, λ3, ..., λs. They are the roots of the characteristic equation |P − λI| = 0, where I is the s × s identity matrix.
A non-zero column vector x is called a right eigenvector (or latent, or characteristic vector) of P corresponding to the eigenvalue λi if it satisfies the vector equation (P − λiI)x = 0. A non-zero row vector y′ is called a left eigenvector (or latent, or characteristic vector) of P corresponding to the eigenvalue λi if it satisfies the vector equation y′(P − λiI) = 0. The right and left eigenvectors are not unique. For example, if x is a right eigenvector, then kx is also a right eigenvector for any scalar k ≠ 0. A similar rule holds for the left eigenvectors.
Let xi, yi′ be the right and left eigenvectors corresponding to λi (i = 1, 2, ..., s).
(iv) P = Σ_{i=1}^{s} λi Bi, the Spectral Decomposition of P, where Bi = ci xi yi′ with 1/ci = yi′ xi.
In general, using the above properties, we have the following result:
Pⁿ = Σ_{i=1}^{s} λiⁿ Bi.
Remark 4: (i) Since the row sum equals unity for all the rows of P, one is always an eigenvalue of P, and the corresponding right eigenvector x has all its elements equal to unity. Therefore, the constituent matrix B1 corresponding to the eigenvalue one will have all rows identical, each row being proportional to the left eigenvector (y1 y2 ... ys).
(ii) All the eigenvalues of P are less than or equal to unity in absolute value.
(iii) If the matrix P is positive and irreducible, then it has only one eigenvalue equal to unity, while if P is non-negative, irreducible and cyclic of order h, then it may have h (≥ 1) eigenvalues of absolute value unity.
(iv) If unity is a non-repeated eigenvalue of P, then lim_{n→∞} Pⁿ → B1, the constituent matrix for the eigenvalue 1.
***
Example 14: We will find Pⁿ for the transition matrix P given in Example 5.
The eigenvalues of P are λ1 = 1 and λ2 = 1 − a − b. The constituent matrices are B1 = c1 x1 y1′ and B2 = c2 x2 y2′, and thus
P^(n) = Pⁿ = Σ_{i=1}^{2} λiⁿ Bi = B1 + (1 − a − b)ⁿ B2.
We also get p_{ij}^{(n)}, the probability of transition from state i to state j in n steps, as the (i, j)th element of Pⁿ. Therefore,
p_{11}^{(n)} = [b + a(1 − a − b)ⁿ]/(a + b),    p_{12}^{(n)} = [a − a(1 − a − b)ⁿ]/(a + b),
p_{21}^{(n)} = [b − b(1 − a − b)ⁿ]/(a + b),    p_{22}^{(n)} = [a + b(1 − a − b)ⁿ]/(a + b).
As n → ∞,
p_{11}^{(n)} → b/(a + b), p_{12}^{(n)} → a/(a + b), p_{21}^{(n)} → b/(a + b), p_{22}^{(n)} → a/(a + b).
Let the initial distribution be u = (p  1 − p). Then, the unconditional probability distribution of X_n is
u^(n) = u Pⁿ = [1/(a + b)] (b  a) + [(1 − a − b)ⁿ/(a + b)] (ap − bq  −ap + bq),
and hence
u_1^(n) = b/(a + b) + (ap − bq)(1 − a − b)ⁿ/(a + b),
u_2^(n) = a/(a + b) + (−ap + bq)(1 − a − b)ⁿ/(a + b), where q = 1 − p.
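The closed-form entries of Pⁿ obtained above can be cross-checked numerically for particular values of a and b. The sketch below compares the spectral-decomposition formula with a direct matrix power for the two-state chain P = ((1 − a, a), (b, 1 − b)); the numbers a = 0.3, b = 0.2, n = 7 are only an illustration.

import numpy as np

a, b, n = 0.3, 0.2, 7
P = np.array([[1 - a, a],
              [b, 1 - b]])

lam = 1 - a - b
B1 = np.array([[b, a], [b, a]]) / (a + b)      # constituent matrix for eigenvalue 1
B2 = np.array([[a, -a], [-b, b]]) / (a + b)    # constituent matrix for eigenvalue 1 - a - b

Pn_formula = B1 + lam ** n * B2                # spectral decomposition: P^n = B1 + (1-a-b)^n B2
Pn_direct  = np.linalg.matrix_power(P, n)
assert np.allclose(Pn_formula, Pn_direct)
print(Pn_direct)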
Example 15: Three girls, A, B, and C, stand in a circle to play a ball throwing game. Each one can throw the ball to one of her two neighbours, each with probability 0.5. The sequence of random variables {X_n}, where X_n denotes the player with whom the ball lies at the nth throw, will form a Markov chain. The Markov chain will have the following P-matrix:
        0    0.5  0.5
P =     0.5  0    0.5
        0.5  0.5  0
It is doubly stochastic, as all the row sums and column sums are unity. Therefore, corresponding to the eigenvalue λ1 = 1, the left eigenvector y1′ and the right eigenvector x1 both have all their elements equal to one. Thus, the constituent matrix B1 will have all rows identical and all columns identical (Bhat, 2000, p. 109). Therefore, all the elements of B1 will be identical, and each will be equal to s⁻¹ = 1/3. Thus,
        1/3  1/3  1/3
B1 =    1/3  1/3  1/3
        1/3  1/3  1/3
As the name suggests, in this method a function is determined which generates Pⁿ for different values of n.
Define the generating function
P(s) = I + sP + s²P² + s³P³ + ... + sⁿPⁿ + ..., where |s| < 1.
(Here, s is a variable of the function P(s), and not the size of the state space as before.)
Since, as n → ∞, sⁿPⁿ → 0, therefore P(s) = (I − sP)⁻¹, the inverse of the matrix (I − sP).
Thus, we may obtain Pⁿ by extracting the coefficient of sⁿ in the expansion of (I − sP)⁻¹, and p_{ij}^{(n)} as the (i, j)th component of Pⁿ.
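The coefficient extraction can also be done symbolically. The short sympy sketch below is an illustration only, using a simple two-state matrix that is not from the text.

import sympy as sp

s = sp.symbols('s')
P = sp.Matrix([[sp.Rational(1, 2), sp.Rational(1, 2)],
               [sp.Rational(1, 4), sp.Rational(3, 4)]])

G = (sp.eye(2) - s * P).inv()          # the generating function P(s) = (I - sP)^(-1)

n = 3
entry = sp.series(G[0, 1], s, 0, n + 1).removeO()   # expand the (1,2) entry up to s^n
coeff = sp.expand(entry).coeff(s, n)                # coefficient of s^n = p_{12}^{(n)}

print(coeff, (P ** n)[0, 1])           # both print 21/32, the 3-step transition probability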
Example 16: Let us find Pⁿ, where the transition matrix P is given below:
(q P 0 0)
O O where q = l - p , a n d O < p < l
P=IO 0 n PI
Simplifying,
.=[; ;;;
E8) Let the state space of a Markov chain be S = {0, 1, 2}, and its P-matrix be as given below. Obtain Pⁿ.
E10) Find Pⁿ and its limiting value for large n for the matrix given below:
        0.5   0.5   0
P =     0.25  0.5   0.25
        0     0.5   0.5
Now we bring this unit to a close. But before that let's briefly recall the important
concepts that we studied in it.
2.7 SUMMARY
In this unit, we have tried to acquaint you with the basic features of a stochastic
process, and Markov chains. We are summarizing these below:
1. We introduced the idea of Stochastic process and presented their classification
according to the nature of time and state space. The Markov chain was explained
as a particular case of the Stochastic process.
2. We defined the Markov property and the Markov chain, and presented some
examples suitable to a Markov model.
3. We studied properties of transition probabilities, and the transition matrix.
4. We described how the one-step transitions in a Markov chain can be represented as a digraph.
5. We have acquainted you with the concept of higher order transition probabilities.
6. We have defined the initial distribution, and illustrated the method of computing the unconditional probability distribution of the states of a Markov chain at the nth step in terms of the transition matrix and the initial distribution.
7. We have described the methods of spectral decomposition and generating functions to compute Pⁿ.
E1) Let the three states, business, agriculture and public servant, be denoted by 1, 2, 3, respectively. Let the random variable X_n denote the choice of profession of the sons in the nth generation. Let p_{ij} denote the probability that, given a person is in the ith state (profession), his son will choose the jth state (profession). Therefore, we get the following P-matrix.
E3) Assume that the states nursery, class one, class two, ..., class five are denoted by the numbers 0, 1, 2, ..., 5. The P-matrix will be
Using Example 14, we may get the following result by putting a = 0, b = 0.5 in the expression for Pⁿ, and as n becomes large, Pⁿ →
E6) When P = (1 0; 0 1), then P² = P³ = P⁴ = (1 0; 0 1), and in this case Pⁿ = P for all n.
When P = (0 1; 1 0), then P² = P⁴ = (1 0; 0 1) and P³ = (0 1; 1 0).
In this case Pⁿ = (0 1; 1 0) when n is odd, and Pⁿ = (1 0; 0 1) when n is even.
(1 s s2)
We can get the following coefficients easily, since (1 − s³)⁻¹ has only powers of s³:
P^(3n+1) = coefficient of s^(3n+1) in (I − sP)⁻¹.
        0.5  0.3  0.2
P =     0.2  0.4  0.4
        0.1  0.5  0.4
The eigenvalues of P are 1, 0.1 and 0.2. For the eigenvalue λ1 = 1, the right eigenvector is x1 = (1, 1, 1)′.
( ~ - s ~ ( = ( i - s ) ~ ( i - ~ ~ sfor
) +(O~ ,( < 1 ,
(1- s)-I 0
(I- sp)-' = 0 (1- s)-'
(1- s)-' (1- p3s)-' p,s (1- s)-I (1- p,s)-' (1- p3s)-'
Pⁿ = coefficient of sⁿ in (I − sP)⁻¹.
As n becomes large,
1 0
1
p*/(l-p3) ~ 2 1 ( 1 - ~ 3 )0
E10) For the eigenvalue λ1 = 1, the right eigenvector is x1 = (1, 1, 1)′, while the left eigenvector is y1′ = (2  4  2), and thus 1/c1 = y1′ x1 = 8. The constituent matrix is therefore
B1 = c1 x1 y1′, with every row equal to (0.25  0.5  0.25).
Thus, we have
Pⁿ = Σ_{i=1}^{3} λiⁿ Bi = B1 + (0.5)ⁿ B2 + (0)ⁿ B3,
and, as n becomes large, Pⁿ approaches B1, the matrix with all rows equal to (0.25  0.5  0.25).
3.1 INTRODUCTION
In Unit 2, we defined the Markov chain and its basic properties. In that unit, we limited our discussion only to finite state Markov chains. Therefore, the transition matrices were only of finite order. Here, in Unit 3, we will deal mostly with Markov chains with countably many states. Therefore, the transition matrices will, generally, be of infinite order. We will study the classification of states under various conditions. Mainly, we will gain knowledge of the limiting behaviour of the chain.
Some chains stabilize after a long time. Their distributions become independent of the
initial distribution of the chain. Due to this property, the limiting distribution is called
the stationary distribution. We will learn the criterion under which the chains achieve
the limiting distribution. We shall start the discussion in Sec. 3.2 with the
classification of states of the Markov chains. Here, we will present the concepts of
communication of states, closed sets, and irreducibility. We will study the first passage times to the states and their expectations. In Sec. 3.3, we will present the
concepts of recurrence and transience of states. We will develop some mechanism to
identify states of the Markov chain. We will present some examples to illustrate these
concepts. In Sec. 3.4, we will study the limiting behaviour of the chains. We will
define stationary distributions and will study various conditions under which the
chains will approach to the stationary distribution. In this unit, we will present various
theorems without proofs.
Objectives
After studying this unit, you should be able to:
classify and categorize the states of the Markov chain into communicating classes
and closed sets;
learn about the first passage time to a state and the time of first return (recurrence time) to a state;
find the mean first passage time to the states, and mean time of first return (mean
recurrence time) to the states;
recognize the recurrent and transient states;
understand the concept of Stationary Distribution and conditions for the existence
of the limiting distribution of Markov chains.
Definition 1: A state j in S is said to be accessible from the state i if and only if there exists a non-negative integer m such that $p_{ij}^{(m)} > 0$. The symbol $i \to j$ denotes this relation between states i and j.
Thus, if $p_{ij}^{(m)} = 0$ for all non-negative integers m, then state j is not accessible from i, and we will denote this by $i \nrightarrow j$. When two states i and j are accessible to each other, then we say that the states i and j communicate with each other. In other words, two states i and j are called communicating if and only if there exist integers $m, n\ (\ge 0)$ such that $p_{ij}^{(m)} > 0$ and $p_{ji}^{(n)} > 0$. The symbol $i \leftrightarrow j$ denotes the relation that i and j communicate with each other.
Definition 2: Let j be a state in the state space S of the Markov chain. Then a subset C(j) of S is called the communicating class of j if all the states in C(j) communicate with j. Symbolically, given $k, j \in S$, we have $k \in C(j)$ if and only if $j \leftrightarrow k$.
Remark
(i) The relation of accessibility is neither reflexive nor symmetric. However, it is transitive, since $i \to j \Rightarrow \exists$ an integer m such that $p_{ij}^{(m)} > 0$, and $j \to k \Rightarrow \exists$ an integer n such that $p_{jk}^{(n)} > 0$.
From the Chapman-Kolmogorov equation, we have $p_{ik}^{(m+n)} \ge p_{ij}^{(m)}\, p_{jk}^{(n)} > 0$, and hence $i \to k$.
We should note that whether or not a Markov chain is irreducible is determined by the state space S and the transition matrix $(p_{ij})$; the initial distribution is irrelevant in this matter. If all the elements of the transition matrix $(p_{ij})$ are non-zero, then the Markov chain will necessarily be irreducible. The off-diagonal elements of the transition matrix $(p_{ij})$ of an irreducible Markov chain cannot all be zero; in fact, no row can have all its off-diagonal elements zero.
Example 1: Consider a Markov chain with the state space S = {0, 1, 2, 3, 4, 5} and the following transition matrix:
0 1 2 3 4 5
For the set of states $C_1 = \{0, 1\}$, the states communicate with each other, since
$$p_{01} = 1,\quad p_{00}^{(2)} = p_{01}\,p_{10} = 1,$$
$$p_{10} = 1,\quad p_{11}^{(2)} = p_{10}\,p_{01} = 1,$$
and, for state 0, we have $p_{0j} = 0$ for all $j \notin C_1$; for state 1, we have $p_{1j} = 0$ for all $j \notin C_1$. Therefore, $C_1 = \{0, 1\}$ is a closed set of the given Markov chain.
Similarly, we may show that the set $C_2 = \{3, 4\}$ is a communicating set, and the states outside it are not accessible from $C_2$; thus, $C_2$ is a closed set.
Here, the state 5 is an absorbing state, since the set $\{5\}$ is closed and is a singleton.
It may be verified that the sub-matrices formed by the closed sets are stochastic, as follows.
We can verify for $C_1 = \{0, 1\}$:
for state $0 \in C_1$, $\sum_{j \in C_1} p_{0j} = p_{00} + p_{01} = 1$,
for state $1 \in C_1$, $\sum_{j \in C_1} p_{1j} = p_{10} + p_{11} = 1$.
We can also verify it for the other closed sets $C_2 = \{3, 4\}$ and $C_3 = \{5\}$.
The transition matrix can also be rearranged in the following canonical form, where $P_1$, $P_2$, $P_3$ are sub-matrices of P corresponding to the three closed sets, the 0's are zero matrices, Q is the sub-matrix corresponding to the transient state, and R is the remaining sub-matrix.
The Markov chain is reducible, since it has three closed sets and a transient set.
***
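The communicating classes and closed sets identified in Example 1 can also be found mechanically from the reachability structure of a transition matrix. The following is a minimal sketch assuming NumPy; since the matrix of Example 1 is not reproduced above, the matrix used here is a hypothetical stand-in with the same structure ({0, 1} and {3, 4} closed, {5} absorbing, state 2 transient).

    import numpy as np

    # Hypothetical 6-state matrix with the closed-set structure described in Example 1.
    P = np.array([
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.3, 0.0, 0.1, 0.3, 0.0, 0.3],
        [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ])
    n = len(P)

    # reach[i, j] is True when j is accessible from i in zero or more steps:
    # take the n-th power of (adjacency + identity) and keep the positive entries.
    A = (P > 0).astype(int) + np.eye(n, dtype=int)
    reach = np.linalg.matrix_power(A, n) > 0

    communicate = reach & reach.T                     # i <-> j
    classes = {frozenset(np.flatnonzero(communicate[i])) for i in range(n)}
    closed = [sorted(c) for c in classes
              if not any(reach[i, j] for i in c for j in range(n) if j not in c)]

    print([sorted(c) for c in classes])   # communicating classes
    print(closed)                         # closed sets: [0, 1], [3, 4], [5]
    print(len(classes) == 1)              # irreducible? False, so the chain is reducible

The same test shows that a chain is irreducible exactly when there is a single communicating class.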
3.2.2 First Return and First Passage Probabilities
Thus, $f_{ii}^{(n)}$ is the probability that the chain starting in state i returns to state i for the first time after n steps. Clearly, $f_{ii}^{(1)} = p_{ii}$, and we define $f_{ii}^{(0)} = 0$ for all states i in the state space S. We call $f_{ii}^{(n)}$ the probability of first return (also called time of first recurrence) to state i in time n.
Similarly, we may define the probability of first passage from state i to state j, $i \ne j$, in time n, denoted by $f_{ij}^{(n)}$.
Thus, $f_{ij}^{(n)}$ is the probability that the chain starting in state i visits state j for the first time after n steps. Clearly, $f_{ij}^{(1)} = p_{ij}$, and we now define $f_{ij}^{(0)} = 0$ for all i, j in S. As defined in Unit 1, $P^{(0)} = I$, i.e.
$$p_{jj}^{(0)} = 1 \quad \text{and} \quad p_{jk}^{(0)} = 0 \text{ for } k \ne j, \text{ for all } j, k \text{ in } S.$$
We present below a theorem, without proof, which provides two equations: the first is a relationship between $f_{ii}^{(n)}$, the probability of first return to state i in time n, and $p_{ii}^{(n)}$, the n-step transition probability from state i to itself; the second relates the probability of first passage from state i to state j in time n, given by $f_{ij}^{(n)}$, and the n-step transition probability from state i to state j, given by $p_{ij}^{(n)}$. These relations may help in the computation of n-step transition probabilities and in proving results on the limiting behaviour of the states of a Markov chain.
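The equations of the theorem are not reproduced above; the relations involved are the standard renewal equations, a commonly used form of which is $p_{ij}^{(n)} = \sum_{k=1}^{n} f_{ij}^{(k)}\, p_{jj}^{(n-k)}$. As a minimal sketch (assuming NumPy, and using a small hypothetical matrix rather than one from the unit), the first-return and first-passage probabilities can be recovered from the n-step transition probabilities by inverting this relation:

    import numpy as np

    # A small illustrative transition matrix (hypothetical, not from the unit).
    P = np.array([[0.0, 1.0, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 1.0, 0.0]])
    N = 12
    powers = [np.linalg.matrix_power(P, n) for n in range(N + 1)]

    def first_passage_probs(i, j):
        """f[n] = P(chain started at i visits j for the first time at step n),
        obtained from p_ij^(n) = sum_{k=1}^{n} f^(k) p_jj^(n-k)."""
        f = np.zeros(N + 1)
        for n in range(1, N + 1):
            f[n] = powers[n][i, j] - sum(f[k] * powers[n - k][j, j] for k in range(1, n))
        return f

    f00 = first_passage_probs(0, 0)
    print(f00[:6])      # first-return probabilities f_00^(1), ..., f_00^(5)
    print(f00.sum())    # partial sum approximating f_00, the probability of ultimate return

Summing $f_{ij}^{(n)}$ (or $n\,f_{ii}^{(n)}$) over n then approximates the quantities $f_{ij}$ and $\mu_{ii}$ defined next.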
Definition 5: Assume that a time homogeneous Markov chain starts in state i, and define
$$f_{ii} = \sum_{n=0}^{\infty} f_{ii}^{(n)}.$$
Then $f_{ii}$ is the probability of ultimate or eventual return to the state i, having started in this state, i.e., the probability that the chain ever returns to the state i. A state i is called a recurrent state or persistent state if $f_{ii} = 1$, i.e., when the return to the state i is certain. We will use both the terms recurrent and persistent for this purpose in this unit. A state i is called transient when the ultimate, or eventual, return to the state i is not certain, i.e., $f_{ii} < 1$.
The mean recurrence time of the state i is defined as
$$\mu_{ii} = \sum_{n=0}^{\infty} n\, f_{ii}^{(n)}.$$
A recurrent state i is called non-null recurrent (also called positive recurrent, or positive persistent) if $\mu_{ii} < \infty$, i.e., if its mean recurrence time is finite, whereas it is called null recurrent if $\mu_{ii} = \infty$, i.e., if its mean recurrence time is infinite.
Theorem 3 (Recurrence is a Class Property): Let two states i and j in the state space S satisfy $i \leftrightarrow j$ (that is, both states are in the same communicating class). Then both states are transient, both are null persistent, or both are non-null persistent together; and both are aperiodic, or both are periodic with the same period. Thus, all the states in a communicating class have the same classification: either all are transient, all are non-null persistent, or all are null persistent; and all are aperiodic, or all are periodic with the same period.
Corollary 1: In an irreducible chain, either all the states are transient, all are null persistent, or all are non-null persistent. If the states are periodic, then all have the same period.
Definition 8 (Passage Time): Parallel to the recurrence time, we now define the passage time. Firstly, define
$$f_{ij} = \sum_{n=0}^{\infty} f_{ij}^{(n)},$$
the probability that the chain starting in state i will ever reach the state j, i.e., the probability of ultimate passage from state i to j. If $f_{ij} = 1$, then the ultimate passage to state j is certain, given that the chain starts in the state i. In such a case, $f_{ij}^{(n)},\ n = 0, 1, 2, 3, \ldots$ is the probability distribution of the first passage time to the state j, given that the chain starts from i. Then, we may define the mean of the first passage time from the state i to state j as
$$\mu_{ij} = \sum_{n=0}^{\infty} n\, f_{ij}^{(n)}.$$
Definition 9 (Recurrent Chain): A Markov chain is called recurrent, or persistent, if all its states are recurrent.
Transient Chain: A Markov chain is called transient if all its states are transient.
Ergodic State and Ergodic Chain: A persistent, non-null, aperiodic state of a Markov chain is called an ergodic state. If all the states in a Markov chain are ergodic, then the chain is said to be ergodic.
Example 2: Let a Markov chain with state space S = {1, 2, 3, 4, 5} have the following transition matrix. We will determine the nature of the states of the chain.
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1/4 & 0 & 1/4 & 1/2 & 0 \\ 0 & 0 & 0 & 1/4 & 3/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
(rows and columns indexed by the states 1, 2, 3, 4, 5).
On the basis of the probability of first return to the states, we will classify the states as follows. Since
$$f_{11} = f_{11}^{(1)} + f_{11}^{(2)} + f_{11}^{(3)} + \cdots = 0 + 1 \cdot 1 + 0 + \cdots = 1,$$
therefore, state 1 is persistent. Again
$$f_{22} = f_{22}^{(1)} + f_{22}^{(2)} + f_{22}^{(3)} + \cdots = 0 + 1 \cdot 1 + 0 + \cdots = 1,$$
therefore, state 2 is persistent. Similarly,
$$f_{33} = f_{33}^{(1)} + f_{33}^{(2)} + f_{33}^{(3)} + \cdots = \tfrac{1}{4} + 0 + 0 + \cdots = \tfrac{1}{4},$$
therefore, state 3 is transient, and
The states 1 and 2 are periodic with period 2, since for state 1
$$t = \text{G.C.D.}\{m : f_{11}^{(m)} > 0\} = \text{G.C.D.}\{2\} = 2,$$
and for state 2
$$t = \text{G.C.D.}\{m : f_{22}^{(m)} > 0\} = \text{G.C.D.}\{2\} = 2.$$
The mean recurrence times of the persistent (recurrent) states are obtained as follows:
$$\mu_{11} = 1 \cdot f_{11}^{(1)} + 2 \cdot f_{11}^{(2)} + 3 \cdot f_{11}^{(3)} + \cdots = 1 \cdot 0 + 2 \cdot 1 + 0 + \cdots = 2.$$
The states {4, 5} are persistent, non-null and aperiodic; therefore, they are ergodic. The states {1, 2} are persistent and periodic with period 2. The state 3 is transient.
It may easily be verified that the given Markov chain is reducible. Its state space can be decomposed into three communicating classes $C_1 = \{1, 2\}$, $C_2 = \{4, 5\}$ and $C_3 = \{3\}$. Further, $C_1$ and $C_2$ are closed sets. All states in $C_2$ are aperiodic and positive recurrent, whereas all states in $C_1$ are positive recurrent and periodic, each with period 2. This verifies the results of Theorem 3, and the fact that periodicity is a class property.
***
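The classification obtained in Example 2 can be checked numerically with the same renewal relation. The sketch below (a minimal illustration assuming NumPy) recovers the first-return probabilities of states 1 and 3 of this chain from powers of the matrix and confirms $f_{33} = 1/4$ and $\mu_{11} = 2$.

    import numpy as np

    # Transition matrix of Example 2 (the states 1..5 are mapped to indices 0..4).
    P = np.array([[0,    1,    0,    0,    0   ],
                  [1,    0,    0,    0,    0   ],
                  [0.25, 0,    0.25, 0.5,  0   ],
                  [0,    0,    0,    0.25, 0.75],
                  [0,    0,    0,    1,    0   ]])
    N = 40
    powers = [np.linalg.matrix_power(P, n) for n in range(N + 1)]

    def first_return(i):
        # f[n] from the renewal relation p_ii^(n) = sum_k f^(k) p_ii^(n-k)
        f = np.zeros(N + 1)
        for n in range(1, N + 1):
            f[n] = powers[n][i, i] - sum(f[k] * powers[n - k][i, i] for k in range(1, n))
        return f

    f1 = first_return(0)                              # state 1 of the example
    f3 = first_return(2)                              # state 3 of the example
    print(f1.sum())                                   # 1.0  -> state 1 is persistent
    print(sum(n * f1[n] for n in range(N + 1)))       # 2.0  -> mean recurrence time
    print(f3.sum())                                   # 0.25 -> state 3 is transient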
Example 3: Let a Markov chain have the following transition matrix.
All the states communicate with each other. Therefore, the chain has only one closed set, the state space S = {0, 1, 2}, and the chain is irreducible.
The probability of ultimate return to state 0 can be computed to be 1. Thus, 0 is a non-null persistent (positive recurrent) state. Since the Markov chain is irreducible, all its states must be non-null persistent by Theorem 3. Let us verify this by actual calculation for the other states in S.
The probabilities may also be obtained using a digraph, as described in Unit 2. The digraph for the given transition matrix is shown in Fig. 1. To find $f_{11}^{(1)}$, the probability of first return to state 1 in one step, find the paths from node 1 to node 1 travelling along a single edge, and add the probability labels on the edges of these paths. There is no such path in this example, so $f_{11}^{(1)} = 0$. To find $f_{11}^{(2)}$, the probability of first return to state 1 in two steps, find the paths from node 1 to node 1 travelling along two distinct edges. We have two such paths, $1 \to 0 \to 1$ and $1 \to 2 \to 1$. Multiply the probability labels on the edges of each path, and add such products over all paths to get $f_{11}^{(2)}$. Therefore,
$$f_{11}^{(2)} = \frac{3}{4} \cdot \frac{1}{2} + \frac{1}{4} \cdot 1 = \frac{5}{8},$$
and so on. We get the probability of ultimate return to state 1 in this way.
Thus, all the states are non-null, since the mean recurrence times of all the states are finite, as stated above.
Therefore, from the definition of periodic recurrent states given in Eqn. (7), the period is
$$t = \text{G.C.D.}\{m : f_{00}^{(m)} > 0\} = \text{G.C.D.}\{1, 2, 4, \ldots\} = 1.$$
Therefore, the state 0 is aperiodic. Since the chain is irreducible, all the states will be aperiodic.
Therefore, all the states are persistent (recurrent), aperiodic and non-null, and thus ergodic. Thus, the chain is ergodic. We have thereby verified that periodicity, positive or null recurrence, transience, etc., are class properties.
Since all the states communicate with each other, there is only one closed set, the state space S = {0, 1, 2, 3}, and the chain is irreducible.
We can use the following digraph for the given transition matrix to compute the probabilities of first return, as in the previous example.
$$f_{00}^{(1)} = f_{00}^{(2)} = 0,\quad f_{00}^{(3)} = 1 \cdot 1 \cdot \tfrac{1}{3} = \tfrac{1}{3} > 0,\quad f_{00}^{(4)} = f_{00}^{(5)} = 0,\quad f_{00}^{(6)} = \tfrac{2}{3} \cdot \tfrac{1}{3} = \tfrac{2}{9} > 0,\ \ldots$$
Therefore, from the definition of periodic recurrent states given in Eqn. (7), the period is $t = \text{G.C.D.}\{m : f_{00}^{(m)} > 0\} = \text{G.C.D.}\{3, 6, 9, \ldots\} = 3$, and the probability of ultimate return to the state 0 is
Thus, the state 0 is recurrent with period 3. Now, since the Markov chain is irreducible, all the other states have the same classification, that is, recurrent with period 3.
El) Determine the classes, probability of ultimate return to the states, mean
recurrence time of the various states of the Markov chain having the following
transition matrix. Is the chain irreducible?
E2) Determine the closed set, probability of ultimate return to the states, periodicity
of states, mean recurrence time of the states of the Markov chain having the
following transition matrix. Is the chain irreducible?
0 1 2
2 0 1 0
So far we have discussed the classification of states and chains. In this section, we will focus on recurrence and transience in detail.
Definition 10 (Generating Function): Let $a_0, a_1, a_2, a_3, \ldots$ be a sequence of real numbers, and let s be a real number. Then the function A(s) defined by
$$A(s) = \sum_{k=0}^{\infty} a_k s^k$$
is called a generating function of the sequence $a_0, a_1, a_2, a_3, \ldots$, provided this power series converges in some interval $-s_0 < s < s_0$. If a non-negative discrete random variable X assumes only the integral values $0, 1, 2, 3, \ldots$ and the sequence $\{a_k\}$ represents the probability distribution of X, such that $a_k = P[X = k]$, then A(s) is called the probability generating function of the random variable X.
Theorem 4: For a state i of a Markov chain, let $P_{ii}(s)$ be the generating function of the sequence $\{p_{ii}^{(n)}\}$, and let $F_{ii}(s)$ be the generating function of the sequence $\{f_{ii}^{(n)}\}$. Then we have
$$P_{ii}(s) = \frac{1}{1 - F_{ii}(s)}, \qquad |s| < 1. \qquad (10)$$
***
Theorem 5: For states i, j of a Markov chain, let $P_{jj}(s)$ be the generating function of the sequence $\{p_{jj}^{(n)}\}$, $P_{ij}(s)$ be the generating function of the sequence $\{p_{ij}^{(n)}\}$, and $F_{ij}(s)$ be the generating function of the sequence $\{f_{ij}^{(n)}\}$. Then we have, for $|s| < 1$,
(i) $P_{ij}(s) = F_{ij}(s)\, P_{jj}(s)$  (11)
(ii) $P_{ij}(s) = F_{ij}(s)\, \big(1 - F_{jj}(s)\big)^{-1}$  (12)
***
Let us consider the following example to understand these relations.
Example 5: Consider a Markov chain on the states 0, 1, 2, 3 with the following transition matrix.
We can verify that the matrix is periodic, and that
$$P^3 = P^6 = \cdots$$
For the state 0 of the Markov chain, the generating function of the sequence of transition probabilities $\{p_{00}^{(k)}\}$ is given by
$$P_{00}(s) = \sum_{k=0}^{\infty} p_{00}^{(k)} s^k = 1 + 0 \cdot s + 0 \cdot s^2 + \tfrac{1}{3}s^3 + \cdots, \quad \text{since } p_{00}^{(0)} = 1,$$
and the generating function of the sequence of probabilities of first return $\{f_{00}^{(k)}\}$ (as obtained in Example 4) will be $F_{00}(s)$, as given below:
$$F_{00}(s) = \sum_{k=0}^{\infty} f_{00}^{(k)} s^k, \qquad \text{for } |s| < 1.$$
Therefore,
$$1 - F_{00}(s) = \frac{1 - s^3}{\cdots},$$
and thus we may verify Eqn. (10), that
$$P_{ii}(s) = \frac{1}{1 - F_{ii}(s)}, \qquad |s| < 1,$$
for the state i = 0. Similarly, we can verify the relations given in Eqns. (11) and (12) for the other states of the Markov chain.
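Eqn. (10) can also be checked numerically for any small chain by truncating the two power series. A minimal sketch assuming NumPy (the two-state matrix is an arbitrary illustration, since the matrix of Example 5 is not reproduced above):

    import numpy as np

    P = np.array([[0.2, 0.8],
                  [0.6, 0.4]])     # hypothetical chain
    N = 60                         # truncation order of the power series
    s = 0.7                        # any point with |s| < 1
    i = 0

    powers = [np.linalg.matrix_power(P, n) for n in range(N + 1)]
    p = np.array([powers[n][i, i] for n in range(N + 1)])      # p_ii^(n)

    # First-return probabilities from p_ii^(n) = sum_k f^(k) p_ii^(n-k).
    f = np.zeros(N + 1)
    for n in range(1, N + 1):
        f[n] = p[n] - sum(f[k] * p[n - k] for k in range(1, n))

    P_ii = sum(p[n] * s ** n for n in range(N + 1))            # truncated P_ii(s)
    F_ii = sum(f[n] * s ** n for n in range(N + 1))            # truncated F_ii(s)
    print(P_ii, 1.0 / (1.0 - F_ii))                            # agree up to truncation error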
For a recurrent state i, $\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$. The result is immediate from Eqn. (10), since for a recurrent state i, as $s \uparrow 1$, $1 - F_{ii}(s) \downarrow 0$; therefore the left hand side $P_{ii}(s) \to \sum_{n=0}^{\infty} p_{ii}^{(n)}$, while the right hand side $\dfrac{1}{1 - F_{ii}(s)}$ tends to infinity as $s \uparrow 1$.
The following theorem gives some limiting results for the recurrent states of a Markov chain.
(iii) If j is a transient state, then no matter where the Markov chain starts, it makes only a finite number of visits to state j, and the expected number of visits to j is finite. It may enter a recurrent class after a number of steps, and when it enters there, it remains there forever. On the other hand, if j is a recurrent state, then if the chain starts at j, it is guaranteed to return to j infinitely often and will eventually remain forever in the closed set containing state j. If the chain starts at some other state i, it might not be possible for it to ever visit state j. If it is possible to visit the state j at least once, then it does so infinitely often, and
$$\lim_{n \to \infty} p_{ij}^{(n)} = \frac{f_{ij}}{\mu_{jj}}.$$
$$p_{i0} = \frac{i+1}{i+2}, \qquad p_{i,i+1} = \frac{1}{i+2}, \qquad \text{and } p_{ij} = 0 \text{ for } j \ne i+1,\ j \ne 0.$$
Therefore, the transition probability matrix is the infinite matrix
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & \cdots \\ 2/3 & 0 & 1/3 & 0 & \cdots \\ 3/4 & 0 & 0 & 1/4 & \cdots \\ \vdots & & & & \ddots \end{pmatrix},$$
with rows and columns indexed by the states 0, 1, 2, 3, ....
For the state 0, the probabilities of first return can be read off from this matrix, and their sum is $f_{00} = 1$; thus, the state 0 of the Markov chain is recurrent. Since every state can be reached from any other state, the Markov chain is irreducible. Again, the state 0 is aperiodic, since the G.C.D. of the times with positive probability of first return to the state 0 is one. From the class property of recurrence stated above, the Markov chain will be recurrent and aperiodic.
The results obtained above in this section have some essential implications for finite Markov chains. The state space of a finite Markov chain must contain at least one persistent state. Otherwise, if all the states of the Markov chain were transient, then the transition probabilities $p_{ij}^{(n)} \to 0$ as $n \to \infty$ for all i and j in the state space S, which is impossible, since for all $i \in S$ we must have $\sum_{j \in S} p_{ij}^{(n)} = 1$. Therefore, a Markov chain with a finite state space S cannot be a transient chain. Again, a finite Markov chain cannot have any null persistent state, since the states of the closed set containing this null persistent state would form a stochastic sub-matrix (say $P_1$) of the transition matrix P, and as $n \to \infty$ we would have $P_1^n \to 0$; hence $P_1^n$ would not remain stochastic. This is not possible. Thus, a finite Markov chain cannot have a null persistent state.
The following theorem is now easy to visualize.
Theorem 9: In a finite irreducible chain, all the states are non-null persistent.
Let us find the probability of ultimate passage from state 3 to state 4, and to state 5, i.e., $f_{34}$ and $f_{35}$.
Since state 4 is aperiodic, non-null and persistent, using Eqn. (19) we have, as $n \to \infty$,
$$p_{34}^{(n)} \to \frac{f_{34}}{\mu_{44}} \qquad \text{and} \qquad p_{35}^{(n)} \to \frac{f_{35}}{\mu_{55}}.$$
$P[Z_n = 1] = p$ and $P[Z_n = -1] = q$. It means that the particle either moves one unit in the left direction with probability q, or one unit in the right direction with probability p, at each time point. Therefore, $\{X_n\}$ will be a Markov chain with the state space $\{0, \pm 1, \pm 2, \pm 3, \ldots\}$. Its transition probability matrix P is the infinite matrix with entries $p_{i,i+1} = p$, $p_{i,i-1} = q$ for every state i, and all other entries zero.
Since every state communicates with every other state, the matrix and the chain are irreducible.
From Corollary 1, the chain is either transient, or null persistent, or non-null persistent.
Consider the state 0. It is clear that we cannot return to 0 in an odd number of steps. Let the chain return to state 0 in time 2n; then, during this period, it must have moved n times in the right direction and n times in the left direction. Therefore, using the binomial distribution, we have
$$p_{00}^{(2n)} = \binom{2n}{n} p^n q^n \approx \frac{(4pq)^n}{\sqrt{\pi n}}$$
by Stirling's approximation. Now, $\sum_{n=0}^{\infty} p_{00}^{(2n)} < \infty$ when $4pq < 1$, i.e., if $p \ne q$; in that case, the state 0 is transient. When $p = q$,
$$\sum_{n=0}^{\infty} p_{00}^{(2n)} \approx \sum_{n=1}^{\infty} \frac{1}{\sqrt{\pi n}} = \infty,$$
and the state 0 is recurrent. Hence, the chain will be recurrent if $p = q$. Further, since $p_{00}^{(2n)} \approx \dfrac{1}{\sqrt{\pi n}} \to 0$ as $n \to \infty$ when $p = q = 1/2$, and the state 0 is recurrent, then by using Theorem 6, we may conclude that the chain will be null recurrent when $p = q = 1/2$.
***
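The dichotomy just derived for the simple random walk is easy to see numerically. The sketch below sums the return probabilities $p_{00}^{(2n)} = \binom{2n}{n} p^n q^n$ for a symmetric and a non-symmetric walk; it needs no external library, and uses the ratio $\binom{2n+2}{n+1}/\binom{2n}{n} = (2n+1)(2n+2)/(n+1)^2$ so that the terms stay within floating point range.

    def return_prob_terms(p, n_max):
        """Yield p_00^(2n) = C(2n, n) p^n q^n for n = 1, ..., n_max."""
        q = 1.0 - p
        term = 2.0 * p * q                      # the n = 1 term
        for n in range(1, n_max + 1):
            yield term
            term *= (2 * n + 1) * (2 * n + 2) / ((n + 1) ** 2) * p * q

    for p in (0.5, 0.6):
        print(p, sum(return_prob_terms(p, 20000)))
    # For p = 0.6 the sum has essentially converged (state 0 is transient);
    # for p = 0.5 the partial sums keep growing roughly like 2*sqrt(n/pi)
    # (state 0 is recurrent).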
You may now try the following exercises on the basis of the above discussion.
E4) Consider a countable state Markov chain having a transition probability matrix
as follows
0 1 2 3 4 -
E5) Obtain the limiting values of $p_{ij}^{(n)}$ as $n \to \infty$, for i = 0, 1, 2, 3, for the Markov chain given in E1).
E6) Obtain the limiting value of $P^n$ as $n \to \infty$ for the Markov chain given in E3).
Before discussing the limits of $u^{(n)}$, it is better to describe the notion of a stationary distribution of a Markov chain. We will say that the Markov chain $\{X_n\}$ possesses a stationary distribution if the distribution $u^{(n)}$ is the same for all n, that is, $u^{(n)} = u^{(0)} = u$, the initial probability vector, for all $n \ge 1$. Thus, the probability that the chain is in, say, state i is the same for all time; although $X_n$ is moving from one state to another, it looks statistically the same at any time. Since a stationary distribution of the chain does not depend on n, we drop the superscript and denote it merely by $\pi = (\pi_1, \pi_2, \ldots)$. In general, if $\pi = (\pi_1, \pi_2, \ldots)$ is a probability mass function giving the stationary distribution of a Markov chain $\{X_n\}$ with initial distribution $u = (u_1\ u_2\ \cdots\ u_i\ \cdots)$, where $u_i = P[X_0 = i]$ for each i, and with transition matrix $P = (p_{ij})$ on the state space $S = \{1, 2, \ldots\}$, then $\pi = (\pi_1, \pi_2, \ldots)$ is called a stationary distribution for the transition matrix P. Here, we will make the study for a countable state space. We will describe the finite state space separately when its behaviour differs from that of the countable state space.
Definition 11 (Stationary Distribution): Let a Markov chain $\{X_n,\ n = 0, 1, 2, \ldots\}$ have the transition matrix P. A probability distribution $\pi = (\pi_1, \pi_2, \ldots)$ is called a stationary distribution of the chain if $\pi P = \pi$, i.e., $\pi_j = \sum_i \pi_i p_{ij}$ for all j (Eqn. 20), together with $\sum_j \pi_j = 1$ (Eqn. 21).
Theorem 10: If the initial distribution of a Markov chain $\{X_n\}$ is the same as its stationary distribution, then all the random variables in the sequence $\{X_n\}$ will have identical distributions.
Remark: Let $\pi_j$ denote the probability that the system is in state j. The condition in Eqn. (20) is often called a balancing equation, or equilibrium equation. The stationary distribution $\pi$ on S is such that if our Markov chain starts out with the initial distribution $u = \pi$, then we also have $u^{(1)} = \pi$, since by Theorem 7 of Unit 2 and Eqn. (20) above, we have $u^{(1)} = uP = \pi P = \pi$. That is, if the distribution at time 0 is $\pi$, then the distribution at time 1 is still $\pi$. In general, $u^{(n)} = \pi$ for all n (for both finite and countable state spaces). It is for this reason that $\pi$ is called a stationary distribution.
Let us now discuss the stationary distribution for an irreducible aperiodic Markov chain.
Here we discuss the existence of stationary distributions for irreducible aperiodic Markov chains, and the long term behaviour of the distributions of these chains. The following theorems describe the related conditions. These theorems are applicable for both finite and countable state space chains.
Theorem 11 asserts that for an irreducible, aperiodic, non-null chain the limits $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ exist, and $\{\pi_j\}$ is the unique stationary distribution of the Markov chain. In this case, as $n \to \infty$, the distribution of the Markov chain at time n tends to the stationary distribution, whatever the initial distribution of the chain. In other words, if the Markov chain $\{X_n,\ n = 0, 1, 2, 3, \ldots\}$ is an irreducible, aperiodic and non-null Markov chain, $X_0$ has the distribution $u^{(0)}$, an arbitrary initial distribution, and $u^{(n)}$ is its distribution at time n $(n = 0, 1, 2, 3, \ldots)$, then $\lim_{n \to \infty} u_i^{(n)} = \pi_i$ exists for all states i.
Theorem 12: An irreducible aperiodic Markov chain $\{X_n,\ n = 0, 1, 2, 3, \ldots\}$ will be ergodic if the balancing equation
$$x_j = \sum_{i \in S} x_i\, p_{ij}, \qquad j \in S, \qquad (23)$$
has a solution $\{x_j\}$ ($x_j$ not all zero) satisfying $\sum_{j \in S} |x_j| < \infty$.
Conversely, if the chain is ergodic, then every non-negative solution $\{x_j\}$ of the balancing Eqn. (23) satisfies $\sum_{j \in S} |x_j| < \infty$.
Remark
(i) The limiting probability distribution given by $\lim_{n \to \infty} u^{(n)} = \pi$ is called a steady state distribution of the Markov chain.
(ii) If the probability transition matrix P is symmetric for a Markov chain having finite state space $S = \{1, 2, 3, \ldots, s\}$, then the uniform distribution $[\pi_j = 1/s$ for all $j = 1, 2, 3, \ldots, s]$ is stationary. More generally, the uniform distribution is stationary if the matrix P is doubly stochastic, that is, the column-sums of P are also 1 (we already know the row-sums of any transition matrix P are all 1).
(iii) A finite aperiodic irreducible chain is necessarily ergodic; thus, any finite aperiodic irreducible chain has a stationary distribution.
Example 9: Find all stationary distributions for the transition matrix given below.
$$P = \begin{pmatrix} 0.3 & 0.7 \\ 0.2 & 0.8 \end{pmatrix}$$
The given chain is finite, irreducible and aperiodic, since all the transition probabilities are positive; hence it is non-null. It must have a unique stationary distribution.
Let $\pi = (\pi_1, \pi_2)$ be the stationary distribution. From Eqn. (20), we have the balancing equations
$$\pi_1 = 0.3\pi_1 + 0.2\pi_2,$$
$$\pi_2 = 0.7\pi_1 + 0.8\pi_2.$$
One equation is redundant; both lead to the equation $0.7\pi_1 = 0.2\pi_2$. From the above, we have an infinite number of solutions. Using the second condition from Eqn. (21),
$$\pi_1 + \pi_2 = 1, \qquad (25)$$
we get the unique solution $\pi_1 = \tfrac{2}{9}$, $\pi_2 = \tfrac{7}{9}$, since the given Markov chain is ergodic. We may also verify that $\pi_1 = \dfrac{1}{\mu_{11}}$, where $\mu_{11}$ is the mean recurrence time of state 1.
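A quick numerical check of Example 9 (a minimal sketch assuming NumPy): it solves the balancing equations for the stationary distribution and recovers $\mu_{11}$ from the first-return probabilities, confirming $\pi_1 = 1/\mu_{11}$. State 1 of the example corresponds to index 0 in the arrays.

    import numpy as np

    # Transition matrix of Example 9.
    P = np.array([[0.3, 0.7],
                  [0.2, 0.8]])

    # Stationary distribution: solve pi(P - I) = 0 together with sum(pi) = 1.
    A = P.T - np.eye(2)
    A[-1, :] = 1.0
    pi = np.linalg.solve(A, np.array([0.0, 1.0]))
    print(pi)                                   # [0.2222..., 0.7777...] = (2/9, 7/9)

    # Mean recurrence time of state 1 from f^(n), where
    # p_11^(n) = sum_k f^(k) p_11^(n-k).
    N = 200
    powers = [np.linalg.matrix_power(P, n) for n in range(N + 1)]
    f = np.zeros(N + 1)
    for n in range(1, N + 1):
        f[n] = powers[n][0, 0] - sum(f[k] * powers[n - k][0, 0] for k in range(1, n))
    mu_11 = sum(n * f[n] for n in range(N + 1))
    print(mu_11, 1.0 / pi[0])                   # both are 4.5, so pi_1 = 1/mu_11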
Let us now discuss the criterion for transience.
Here, we will state a condition for a countable state space Markov chain to be transient. It may be mentioned here, again, that a finite Markov chain cannot have all its states transient; if a finite state space Markov chain is irreducible, then it will necessarily be recurrent. We will also present an example of finding the stationary distribution of an irreducible chain having a countable state space.
Theorem 13: An irreducible aperiodic Markov chain with a countable state space $S = \{0, 1, 2, \ldots\}$ and a transition matrix $P = (p_{ij})$ will be transient (all the states will be transient) if, and only if, the system of equations
$$x_i = \sum_{j=1}^{\infty} p_{ij}\, x_j, \qquad i \ge 1, \qquad (26)$$
has a bounded solution $\{x_i\}$ which is not identically zero.
Consider, for example, the Markov chain on $S = \{0, 1, 2, \ldots\}$ with $p_{00} = q$, $p_{01} = p$ and $p_{i,i+1} = p$, $p_{i,i-1} = q$ for $i \ge 1$, where $p + q = 1$. Solving Eqn. (26) for this chain gives
$$x_i = \frac{1 - (q/p)^i}{1 - (q/p)}\, x_1 \qquad (i \ge 1).$$
From the above solution, we see that $x_i$ will be bounded if $p > q$. Therefore, according to Theorem 13, the Markov chain will be transient when $p > q$, and recurrent when $p \le q$.
Let us find the stationary distribution of the chain when $p < q$. The balancing equations to solve will be
$$\pi_0 = q\pi_0 + q\pi_1, \qquad \pi_j = p\,\pi_{j-1} + q\,\pi_{j+1} \quad (j \ge 1),$$
which may be written as
$$\pi_{j+1} - \pi_j = \frac{p}{q}\,(\pi_j - \pi_{j-1}).$$
Therefore,
$$\pi_j - \pi_{j-1} = \left(\frac{p}{q}\right)^{j-1}(\pi_1 - \pi_0),$$
and, thus,
$$\pi_j - \pi_0 = \sum_{r=0}^{j-1} (\pi_{r+1} - \pi_r),$$
which gives
$$\pi_j = \left(\frac{p}{q}\right)^{j} \pi_0 \qquad \text{for } j \ge 0. \qquad (27)$$
When $p = q$, the infinite series $\sum_j \pi_j$ obtained from Eqn. (27) is divergent; a stationary distribution does not exist in this case, and the chain is null recurrent. When $p < q$, Eqn. (27) gives $\pi_0 = 1 - \dfrac{p}{q}$, and we have the stationary distribution
$$\pi_j = \left(1 - \frac{p}{q}\right)\left(\frac{p}{q}\right)^{j} \qquad \text{for } j \ge 0,$$
which is a geometric distribution with parameter $\dfrac{p}{q}$.
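The geometric form of the stationary distribution can be checked against a truncated version of the chain. A minimal sketch assuming NumPy; the truncation level M and the values of p and q are arbitrary illustrative choices with p < q.

    import numpy as np

    p, q = 0.3, 0.7
    M = 60                      # truncation level of the countable state space

    # Truncated birth-death transition matrix on the states 0..M.
    P = np.zeros((M + 1, M + 1))
    P[0, 0], P[0, 1] = q, p
    for i in range(1, M):
        P[i, i - 1], P[i, i + 1] = q, p
    P[M, M - 1], P[M, M] = q, p          # keep the last row stochastic

    # Stationary distribution of the truncated chain.
    A = P.T - np.eye(M + 1)
    A[-1, :] = 1.0
    b = np.zeros(M + 1)
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)

    # Compare with the geometric distribution (1 - p/q)(p/q)^j derived above.
    geometric = (1 - p / q) * (p / q) ** np.arange(M + 1)
    print(np.max(np.abs(pi - geometric)))   # negligible, since (p/q)^M is tiny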
Till now, we have considered only irreducible aperiodic chains and discussed the
problem of the existence of stationary distributions. In general, a Markov chain may
have no stationary distribution, one stationary distribution, or infinitely many
stationary distributions. We have given the conditions for the existence of unique
stationary distribution, along with examples. The chains presented were ergodic-finite
or countable. We have also presented a Markov chain which does not possess any
stationary distribution. The chains of this type were transient or null recurrent,
however, they must be countable (since we cannot have a finite chain as transient, or
null recurrent). As an example of the chain having infinitely many stationary
distributions, we may take a transition matrix P to be the identity matrix, in which
case all distributions on the state space will be stationary. Such chains may be finite,
or countable. Example 12 illustrates the case. When the Markov chain has finite state
space then it will have at least one stationary distribution whether it is reducible or
irreducible, periodic or aperiodic.
Example 12: Consider a Markov chain having the identity transition matrix
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Let the stationary distribution be $\pi = (\pi_1, \pi_2, \pi_3)$. Then the balancing equation of the chain will be $\pi P = \pi$, which holds for every $\pi$.
Clearly, every vector with non-negative components $\pi = (\pi_1, \pi_2, \pi_3)$ satisfying $\pi_1 + \pi_2 + \pi_3 = 1$ will be a stationary distribution; for example, $\pi = (0.1\ \ 0.3\ \ 0.6)$. Thus, for this chain there exist an infinite number of stationary distributions. Here, we may easily observe that a countable identity transition matrix also possesses an infinite number of stationary distributions.
***
Example 13: Consider the Markov chain having the following transition matrix.
Solving the balancing equation, we obtain the stationary distribution, and the mean recurrence time of state 0 is $\mu_{00} = 4$. Similarly, we may get $\mu_{11} = 2$ and $\mu_{22} = 4$.
Here, we also observe that $(\pi_1, \pi_2, \pi_3) = (1/\mu_{00},\ 1/\mu_{11},\ 1/\mu_{22})$.
However, for the long run equilibrium probabilities, Theorem 6 is not directly applicable, since the chain is periodic.
***
Remark: In the example above, we encountered a Markov chain that is irreducible, persistent but periodic, and has a unique stationary distribution whose probabilities are the reciprocals of the mean recurrence times. We have a theorem which explains such behaviour. It says that if a Markov chain is irreducible and non-null (positive), then there will exist a stationary distribution. The result is based on the Cesàro limit. This tells us that if $\{a_n\}$ is a sequence such that $\lim_{n \to \infty} a_n = l$, then the sequence of partial averages $\dfrac{1}{n+1}\sum_{i=0}^{n} a_i$ also converges to the same limit, i.e., $\lim_{n \to \infty} \dfrac{1}{n+1}\sum_{i=0}^{n} a_i = l$. This limit may exist even when $\lim_{n \to \infty} a_n$ does not exist.
Theorem 14: An irreducible, positive recurrent Markov chain has a unique stationary distribution $\pi = (\pi_1, \pi_2, \pi_3, \ldots)$, given by
$$\lim_{n \to \infty} \frac{1}{n+1}\sum_{m=0}^{n} p_{ij}^{(m)} = \pi_j = \frac{1}{\mu_{jj}} \qquad \text{for all } j, \text{ whatever the state } i \text{ may be}.$$
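Theorem 14 is easy to visualise numerically: for a periodic chain the powers $P^n$ never settle down, but their running average does, and it converges to the matrix whose rows are the stationary distribution $(1/\mu_{00},\ 1/\mu_{11},\ \ldots)$. A minimal sketch assuming NumPy, using the two-state flip chain (period 2) as the illustration:

    import numpy as np

    P = np.array([[0.0, 1.0],
                  [1.0, 0.0]])       # periodic chain: P^n alternates between I and P
    N = 1001

    avg = np.zeros_like(P)
    Pm = np.eye(2)                   # P^0
    for m in range(N + 1):
        avg += Pm
        Pm = Pm @ P
    avg /= (N + 1)

    print(np.linalg.matrix_power(P, N))   # still a permutation matrix: no limit exists
    print(avg)                            # every row is close to (0.5, 0.5) = (1/mu_00, 1/mu_11)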
E7) A Markov chain has the initial distribution $u^{(0)} = (1/6\ \ 1/2\ \ 1/3)$ and the following transition matrix.
Find its stationary distribution. Is it unique? Verify that the limiting distribution of the chain is stationary.
E9) Consider the Ehrenfest chain, presented in Example 12 in Unit 2, with only 3 balls. Then the transition matrix will be as follows.
(i) Test the irreducibility of the chain. (ii) Find its stationary distribution.
E10) Consider a Markov chain $\{X_n\}$ with a countable state space having the following transition probabilities:
$$p_{i,i+1} = p_i, \quad p_{i,i-1} = q_i, \quad p_i + q_i = 1, \quad p_i, q_i > 0 \quad (i \ge 1),$$
$$p_{00} = q_0, \quad p_{01} = p_0, \quad p_0, q_0 > 0.$$
3. We obtained the distribution of the first passage time to the states, and first
recurrence time of states. We also defined mean time of first passage and mean
recurrence time.
4. We acquainted you with the concept of recurrence and transience.
5. We investigated the limiting behaviour of the Markov chain.
6. We defined the stationary distribution, and illustrated procedures for finding stationary distributions.
7. We investigated some situations in which the stationary distribution of a chain exists, and is also the equilibrium distribution.
3.6 SOLUTIONS/ANSWERS
E1) The states {0, 1, 2} form a communicating class. State 3 does not communicate with any other state. The chain is reducible.
The probabilities of ultimate return to the states are
$$f_{00} = \sum_{n=0}^{\infty} f_{00}^{(n)} = 0 + 0 + 1 \cdot \tfrac{3}{4} + 1 \cdot \tfrac{1}{4} + 0 + \cdots = 1,$$
$$f_{11} = \sum_{n=0}^{\infty} f_{11}^{(n)} = 0 + 0 + 0 + 1 \cdot 1 \cdot \tfrac{1}{4} + 0 + 1 \cdot 1 \cdot \tfrac{3}{4} \cdot 1 \cdot \tfrac{1}{4} + 0 + \cdots = 1,$$
$$f_{22} = \sum_{n=0}^{\infty} f_{22}^{(n)} = 0 + 0 + \tfrac{3}{4} \cdot 1 + \tfrac{1}{4} \cdot 1 \cdot 1 + 0 + \cdots = 1,$$
$$f_{33} = \sum_{n=0}^{\infty} f_{33}^{(n)} = 0.$$
Therefore, the states 0, 1, 2 are recurrent, and state 3 is transient.
The mean recurrence times for the recurrent states are given below:
$$\mu_{00} = \sum_{n=0}^{\infty} n\, f_{00}^{(n)} = 0 \cdot 0 + 1 \cdot 0 + 2 \cdot \tfrac{3}{4} + 3 \cdot \tfrac{1}{4} + 0 = \tfrac{9}{4},$$
$$\mu_{11} = \sum_{n=0}^{\infty} n\, f_{11}^{(n)} = 0 \cdot 0 + 1 \cdot 0 + 2 \cdot 0 + 3 \cdot \tfrac{1}{4} + 4 \cdot 0 + 5 \cdot \tfrac{3}{16} + 6 \cdot 0 + 7 \cdot \tfrac{9}{64} + \cdots = 9,$$
$$\mu_{22} = \sum_{n=0}^{\infty} n\, f_{22}^{(n)} = 0 \cdot 0 + 1 \cdot 0 + 2 \cdot \tfrac{3}{4} + 3 \cdot \tfrac{1}{4} + 0 = \tfrac{9}{4}.$$
E2) The closed set, the probabilities of ultimate return, the periodicity of the states and their mean recurrence times may be obtained exactly as in E1), using the rows $(0\ \ 1\ \ 0)$ and $(0.75\ \ 0\ \ 0.25)$ of the transition matrix; in particular, $p_{ii}^{(n)} > 0$ for suitable n for each state i.
E3) Since $p_{ii}^{(n)} = \tfrac{1}{2}$ for $n > N_0$, therefore
$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty,$$
and the state i will be recurrent. Again, it will be aperiodic, since $p_{ii}^{(1)} = \tfrac{1}{2} > 0$. Further, the state i is non-null, since $p_{ii}^{(n)} \to \tfrac{1}{2} \ne 0$.
Using Theorem 6, $p_{ii}^{(n)} \to \dfrac{1}{\mu_{ii}} = \dfrac{1}{2}$, we get the mean recurrence time of state i as $\mu_{ii} = 2$.
E4) The given Markov chain is irreducible since all the states can be reached from
every other state of the chain.
For state 0, the probabilities of first return will be
$$f_{00}^{(1)} = p,\quad f_{00}^{(2)} = q\,p,\quad f_{00}^{(3)} = q\,q\,p,\quad f_{00}^{(4)} = q\,q\,q\,p,\ \ldots$$
Clearly, the state 0 is aperiodic, since the period of the state is one. The probability of ultimate return to state 0 will be
$$f_{00} = \sum_{n=0}^{\infty} f_{00}^{(n)} = 0 + p + qp + q^2 p + q^3 p + \cdots = p(1-q)^{-1} = 1,$$
and, thus, the state 0 of the Markov chain is recurrent. From the class property
of recurrence it follows that the Markov chain will be recurrent, and aperiodic.
E5) See the solution of E1). In the given problem, we have found that the states 0, 1, 2 are non-null, aperiodic and recurrent, and state 3 is transient. The mean recurrence times for the recurrent states were found to be
$$\mu_{00} = \tfrac{9}{4}, \qquad \mu_{11} = 9, \qquad \mu_{22} = \tfrac{9}{4}.$$
Using Theorem 6 and Remark 3, we have $p_{00}^{(n)} \to \tfrac{4}{9}$, $p_{11}^{(n)} \to \tfrac{1}{9}$, $p_{22}^{(n)} \to \tfrac{4}{9}$ and $p_{33}^{(n)} \to 0$ as $n \to \infty$.
E6) See the solution of Example 3. All the states were aperiodic, non-null persistent. The mean recurrence times for the states 0, 1, 2 were obtained as
$$\mu_{00} = \tfrac{11}{6}, \qquad \mu_{11} = \tfrac{11}{4}, \qquad \mu_{22} = 11.$$
Using Theorem 6, we have $p_{00}^{(n)} \to \tfrac{6}{11}$, $p_{11}^{(n)} \to \tfrac{4}{11}$ and $p_{22}^{(n)} \to \tfrac{1}{11}$ as $n \to \infty$.
The limits of $p_{ij}^{(n)}$ for other i, j may be obtained using Theorem 8. According to this theorem, when state j is non-null, aperiodic and persistent,
$$\lim_{n \to \infty} p_{ij}^{(n)} = \frac{f_{ij}}{\mu_{jj}}.$$
We may find the probabilities of ultimate first passage $f_{ij}$ from the transition matrix given in the example; since the chain is irreducible and recurrent, each of them equals 1, so that, for instance,
$$\lim_{n \to \infty} p_{02}^{(n)} = 1 \cdot \tfrac{1}{11} = \tfrac{1}{11}, \qquad \lim_{n \to \infty} p_{12}^{(n)} = 1 \cdot \tfrac{1}{11} = \tfrac{1}{11}.$$
Therefore, as $n \to \infty$, every row of $P^n$ approaches $\left(\tfrac{6}{11}\ \ \tfrac{4}{11}\ \ \tfrac{1}{11}\right)$.
E7) The chain is irreducible, since all the states communicate. We may also verify that the chain is aperiodic and recurrent. Solving the balancing equation $\pi = \pi P$, i.e.
$$(\pi_1, \pi_2, \pi_3) = (\pi_1, \pi_2, \pi_3)\,P,$$
we get
$$\pi_1 = 0.5\pi_2 + 0.5\pi_3,$$
$$\pi_2 = 0.5\pi_1 + 0.5\pi_3.$$
Solving these equations along with the condition $\pi_1 + \pi_2 + \pi_3 = 1$, we get the unique solution $(\pi_1, \pi_2, \pi_3) = (1/3,\ 1/3,\ 1/3)$. This is obvious, since P is doubly stochastic.
From Theorem 7 of Unit 2, we have $u^{(n)} = u^{(0)} P^{(n)}$. From Theorem 11, we have $p_{ij}^{(n)} \to \pi_j$ as $n \to \infty$. In matrix form, as $n \to \infty$, every row of $P^n$ approaches $(1/3\ \ 1/3\ \ 1/3)$, and hence $u^{(n)}$ converges to the stationary distribution.
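The transition matrix of E7 is not reproduced above; reading it off from the balancing equations suggests the doubly stochastic matrix used below (an inference, not the printed matrix). A minimal NumPy sketch then verifies the solution and the convergence of $u^{(n)}$:

    import numpy as np

    # Matrix inferred from the balancing equations of E7 (an assumption).
    P = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

    u = np.array([1/6, 1/2, 1/3])        # the given initial distribution
    for _ in range(30):
        u = u @ P                        # u^(n) = u^(n-1) P
    print(u)                             # approaches (1/3, 1/3, 1/3)

    # Doubly stochastic: the column sums are 1, so the uniform distribution is stationary.
    print(P.sum(axis=0), np.allclose((np.ones(3) / 3) @ P, np.ones(3) / 3))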
E8) Thechain
1 2 3
E10) The chain is irreducible, as all states communicate. To determine the nature of the states of the Markov chain, we will study the nature of the solutions of the following equations, as given by Eqn. (26):
$$x_i = \sum_{j=1}^{\infty} p_{ij}\, x_j \qquad \text{for all states } i = 1, 2, 3, \ldots$$
Therefore, we get
$$p_i x_{i+1} + q_i x_{i-1} = x_i \qquad (i \ge 1),$$
and, hence,
$$\frac{x_{i+1} - x_i}{x_i - x_{i-1}} = \frac{q_i}{p_i}.$$
We obtain, recursively, the differences $x_{i+1} - x_i$ in terms of the ratios $q_r/p_r$, and by Theorem 13 the boundedness of the resulting solution $\{x_i\}$ determines whether the chain is transient.