Lecture Notes-March06
Remark: These are point-form summaries of the lectures for MSCI 431. There is no guarantee of completeness or accuracy, and therefore they should not be regarded as a substitute for attending the course lectures. Lectures are based on the book Introduction to Probability Models by Sheldon M. Ross.
1. Introduction to Probability Theory:
Lecture 1:
Probability
Experiment: any process whose outcome is not known in advance.
Flip a coin
Roll a die
Sample space: set of all possible outcomes.
Example. Flipping a coin: S = {H, T }
Example. Rolling a die: S = {1, 2, ..., 6}
Example. Flipping two coins: S = {(H, H), (H, T ), (T, H), (T, T )}
Example. Rolling two dice: S = {(m, n) : 1 ≤ m, n ≤ 6}
Event: subset of the sample space.
Example. Flipping a coin: E = {H}, the event that a head appears.
Example. Rolling a die: E = {2, 4, 6}, the event that an even number appears.
Union of events E and F (E ∪ F): all outcomes that are either in E, or in F, or in both.
Example. Flipping a coin: E = {H}, F = {T}. Then, E ∪ F = {H, T}.
Intersection of events E and F (E ∩ F): all outcomes that are in both E and F.
Example. Rolling a die: E = {1, 3, 5}, F = {1, 2, 3}. Then, E ∩ F = {1, 3}.
Consider the events E1, E2, .... Then,
Union of these events, ∪_{i=1}^∞ Ei, is a new event that includes all outcomes that are in En for at least one value of n = 1, 2, ....
Intersection of these events, ∩_{i=1}^∞ Ei, is a new event that includes all outcomes that are in all En for n = 1, 2, ....
Complement of E (E^c): outcomes that are in the sample space S and are not in E.
Probability: Consider an experiment with the sample space S. For an event E ⊆ S, we assume that P(E) is defined and satisfies
(i) 0 ≤ P(E) ≤ 1,
(ii) P(S) = 1,
(iii) For a sequence of events E1, E2, ... that are mutually exclusive (En ∩ Em = ∅, n ≠ m), P(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei).
Conditional Probability: For events E and F with P(F) > 0, the conditional probability of E given F is
P(E|F) = P(E ∩ F)/P(F).
Example. Suppose cards are numbered 1 to 10 and they are placed in a hat and one
of them is drawn. We are told that the number on the drawn card is at least 5. What
is the probability that the number on the drawn card is 10?
E: event that number on the drawn card is 10
F : event that number on the drawn card is at least 5
P(E|F) = P(E ∩ F)/P(F) = (1/10)/(6/10) = 1/6.
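The calculation above can be checked by direct enumeration (a minimal sketch in Python; the event definitions mirror the example):

```python
from fractions import Fraction

# Sample space: cards numbered 1..10, each drawn with probability 1/10.
cards = range(1, 11)

F = [c for c in cards if c >= 5]            # drawn card is at least 5
E_and_F = [c for c in cards if c == 10]     # card is 10 (which implies it is >= 5)

# P(E|F) = P(E ∩ F) / P(F), with exact fractions.
p_E_given_F = Fraction(len(E_and_F), 10) / Fraction(len(F), 10)
print(p_E_given_F)  # 1/6
```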
Lecture 2:
Conditional Probability (continued ...)
Example. A family has two children. What is the probability that both are boys given
that at least one of them is a boy?
E: event that both are boys.
F : event that at least one of them is a boy.
P(E|F) = P(E ∩ F)/P(F) = (1/4)/(3/4) = 1/3.
Example. Suppose that an urn contains 7 black balls and 5 white balls. We draw two
balls from the urn without replacement. What is the probability that both balls are
black?
E: event that the second one is black.
F : event that the first one is black.
P(E ∩ F) = P(F)P(E|F) = (7/12)(6/11) = 42/132 = 7/22.
Example. Bev can take a course in computers or chemistry. If Bev takes the computer
course, then she will receive an A grade with probability 1/2. If she takes the chemistry
course, then she will receive an A grade with probability 1/3. Bev decides to base her
decision on the flip of a fair coin. What is the probability that Bev will get an A in
chemistry?
E: event that she receives an A.
F : event that she takes chemistry.
P(E ∩ F) = P(F)P(E|F) = (1/2)(1/3) = 1/6.
Example. Suppose that each of three men at a party throws his hat into the center of
the room. Each man randomly selects a hat. What is the probability that none of the
three men selects his hat?
Remark. P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) − P(E1 ∩ E2) − P(E1 ∩ E3) − P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3).
Ei : event that ith man selects his own hat.
P(E1 ∪ E2 ∪ E3): probability that at least one of them selects his own hat.
1 − P(E1 ∪ E2 ∪ E3): probability that none of them selects his own hat.
P(Ei) = 1/3, i = 1, 2, 3, since each man is equally likely to select any of the three hats.
P(E1 ∩ E2) = P(E1)P(E2|E1) = (1/3)(1/2) = 1/6, since given that the first man has selected his own hat, there remain two hats that the second man may select. In general, P(Ei ∩ Ej) = (1/3)(1/2) = 1/6 for i ≠ j.
P(E1 ∩ E2 ∩ E3) = P(E1 ∩ E2)P(E3|E1 ∩ E2) = (1/6)(1) = 1/6.
P(E3|E1 ∩ E2) = 1: given that the first two men get their own hats, the third man must also get his own hat.
P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) − P(E1 ∩ E2) − P(E1 ∩ E3) − P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3) = 1/3 + 1/3 + 1/3 − 1/6 − 1/6 − 1/6 + 1/6 = 2/3.
1 − P(E1 ∪ E2 ∪ E3) = 1 − 2/3 = 1/3.
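The answer 1/3 can be verified by enumerating all 3! equally likely hat assignments (a sketch in Python):

```python
from fractions import Fraction
from itertools import permutations

# All 3! equally likely assignments of the 3 hats to the 3 men.
outcomes = list(permutations(range(3)))

# "None selects his own hat" means the permutation has no fixed point.
none_own = [p for p in outcomes if all(p[i] != i for i in range(3))]

p_none = Fraction(len(none_own), len(outcomes))
print(p_none)  # 1/3
```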
Independent Events
Two events E and F are independent if P(E ∩ F) = P(E)P(F).
Lecture 3:
Independent Events
Two events E and F are independent if P(E ∩ F) = P(E)P(F).
The cumulative distribution function F(b) = P(X ≤ b) denotes the probability that the random variable X takes on a value less than or equal to b.
Example. Flipping two fair coins: Let Y denote the number of heads. Then Y takes the values 0, 1, 2 with P(Y = 0) = 1/4, P(Y = 1) = 1/2, P(Y = 2) = 1/4.
Lecture 4:
Discrete Random Variables:
Discrete random variables are often classified according to their probability mass functions.
The Bernoulli Random Variable
Consider a trial, or an experiment whose outcome can be classified as either a
success or failure.
Let X equal 1 if the outcome is a success and 0 if the outcome is a failure.
Let 0 ≤ p ≤ 1 denote the probability that the trial is a success.
The probability mass function of X is
P(0) = P(X = 0) = 1 − p,
P(1) = P(X = 1) = p.
X is a Bernoulli random variable with parameter p.
Example. Flipping a fair coin: consider heads as a success and tails as a failure.
P (0) = 1/2, P (1) = 1/2.
The Binomial Random Variable
Suppose that n independent trials, each of which results in a success with probability p and in a failure with probability 1 − p, are to be performed.
Let X represent the number of successes that occur in the n trials.
X is a binomial random variable with parameters (n, p).
Probability mass function of a binomial random variable having parameters (n, p):
P(i) = C(n, i) p^i (1 − p)^{n−i} = (n!/((n − i)! i!)) p^i (1 − p)^{n−i}, i = 0, 1, ..., n.
Example. Suppose that each patient in a hospital is discharged on day t with
probability p. What is the distribution of the number of discharged patients on
day t given that there are n patients in the hospital on that day?
The number of discharged patients on day t is a binomial random variable with parameters (n, p).
Continuous Random Variables:
A random variable X is continuous if there exists a probability density function f(x) such that P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
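The binomial pmf above can be sketched in Python; the numbers n = 4 and p = 0.5 below are hypothetical values chosen only for illustration:

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(i successes in n independent trials, each succeeding with probability p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Hypothetical numbers for illustration: n = 4 patients, discharge probability p = 0.5.
probs = [binomial_pmf(i, 4, 0.5) for i in range(5)]
print(probs)  # pmf over 0..4 discharges; the probabilities sum to 1
```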
Lecture 5:
Several Important Continuous Random Variables:
The Uniform Random Variable
A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function (pdf) is
f(x) = 1 for 0 < x < 1, and f(x) = 0 otherwise.
Note that ∫_{−∞}^{∞} f(x) dx = ∫_0^1 (1) dx = 1.
More generally, X is uniformly distributed over (α, β) if
f(x) = 1/(β − α) for α < x < β, and f(x) = 0 otherwise.
Example. Calculate the cumulative distribution function (cdf) of a random variable uniformly distributed over (α, β).
F(a) = P(X ∈ (−∞, a]) = ∫_{−∞}^a f(x) dx = (1/(β − α)) ∫_α^a (1) dx for α < a < β. Then,
F(a) = 0 for a ≤ α, F(a) = (a − α)/(β − α) for α < a < β, and F(a) = 1 for a ≥ β.
Example. If X is uniformly distributed over (0, 10), calculate the probability that
X < 3, X > 7, 1 < X < 6.
P(X < 3) = (1/10) ∫_0^3 (1) dx = 3/10.
P(X > 7) = (1/10) ∫_7^10 (1) dx = 3/10.
P(1 < X < 6) = (1/10) ∫_1^6 (1) dx = 5/10.
Exponential Random Variable
A random variable X is exponential with parameter λ > 0 if its pdf is f(x) = λe^{−λx} for x ≥ 0, and 0 otherwise.
Lecture 6:
Discrete Case (continued ...)
Example. Suppose that teams A and B are playing a series of games. Team A
wins each game independently with probability 2/3 and Team B wins each game
independently with probability 1/3. The winner of the series is the first team to
win 2 games. Find the expected number of games that are played.
X: number of games
P(X = 2) = P(X = 2, A wins the first 2 games) + P(X = 2, B wins the first 2 games) = (2/3)² + (1/3)² = 5/9.
P(X = 3) = P(X = 3, A wins the series) + P(X = 3, B wins the series) = C(2,1)(2/3)(1/3)(2/3) + C(2,1)(1/3)(2/3)(1/3) = 8/27 + 4/27 = 12/27.
E[X] = 2P(X = 2) + 3P(X = 3) = 2(5/9) + 3(12/27) = 66/27 = 22/9.
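The expected number of games can be recomputed exactly with fractions (a sketch mirroring the calculation above):

```python
from fractions import Fraction

pA, pB = Fraction(2, 3), Fraction(1, 3)

# P(X = 2): one team wins both of the first two games.
p2 = pA**2 + pB**2
# P(X = 3): the eventual winner takes exactly 1 of the first 2 games, then game 3.
p3 = 2 * pA * pB * pA + 2 * pB * pA * pB
expected_games = 2 * p2 + 3 * p3
print(expected_games)  # 22/9
```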
Continuous Case
Consider a continuous random variable X with probability density function f (x).
Then, the expected value of X is defined by
E[X] = ∫_{−∞}^{∞} x f(x) dx.
Example. If X is uniformly distributed over (α, β), then E[X] = ∫_α^β x/(β − α) dx = (β² − α²)/(2(β − α)) = (α + β)/2.
Example. Calculate E[X] when X is an exponential random variable with parameter λ.
E[X] = ∫_0^∞ x(λe^{−λx}) dx.
Integrating by parts (dv = λe^{−λx} dx, u = x) yields
E[X] = ∫_0^∞ x(λe^{−λx}) dx = −xe^{−λx}|_0^∞ + ∫_0^∞ e^{−λx} dx = 0 − (1/λ)e^{−λx}|_0^∞ = 1/λ.
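As a numerical sanity check of E[X] = 1/λ, a midpoint Riemann sum can approximate the integral (a sketch; the rate λ = 2 and the truncation point are arbitrary choices):

```python
import math

def exp_mean_numeric(lam, upper=60.0, n=200_000):
    """Midpoint-sum approximation of E[X] = ∫_0^upper x·λe^{-λx} dx."""
    h = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x * lam * math.exp(-lam * x) * h
    return total

print(exp_mean_numeric(2.0))  # ≈ 1/2
```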
Expectation of a Function of a Random Variable:
Suppose we are interested in a function of X, say g(X).
If X is a discrete random variable with probability mass function P (x), then for any
real-valued function g(x),
E[g(X)] = Σ_{x: P(x)>0} g(x)P(x).
Example. Suppose X has the following probability mass function P (0) = 0.2, P (1) =
0.5, P (2) = 0.3. Calculate E[X 2 ].
E[X 2 ] = (0)2 (0.2) + (1)2 (0.5) + (2)2 (0.3) = 1.7.
If X is a continuous random variable with probability density function f (x), then for
any real-valued function g(x),
E[g(X)] = ∫_{−∞}^{∞} g(x)f(x) dx.
Example. The dollar amount of damage involved in a car accident is an exponential random variable with expected value of 1000. The insurance company pays
the whole damage if it is more than 400 and 0 otherwise. What is the expected
value that the company pays per accident?
Define g(X) as
g(X) = 0 if 0 < X ≤ 400, and g(X) = X if X > 400.
Then,
E[g(X)] = ∫_0^∞ g(x)f(x) dx = ∫_0^400 (0)(1/1000)e^{−x/1000} dx + ∫_400^∞ (x)(1/1000)e^{−x/1000} dx.
Integrating by parts (dv = (1/1000)e^{−x/1000} dx, u = x) yields
E[g(X)] = −xe^{−x/1000}|_400^∞ + ∫_400^∞ e^{−x/1000} dx = 400e^{−400/1000} + 1000e^{−400/1000} = 1400e^{−0.4} ≈ 938.45.
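The closed form 1400e^{−0.4} can be checked against a numerical integration of ∫_400^∞ x(1/1000)e^{−x/1000} dx (a sketch; the truncation point 30000 is an arbitrary cutoff that makes the neglected tail negligible):

```python
import math

# Closed form from the integration by parts above.
closed_form = 1400 * math.exp(-0.4)

def expected_payment(mean=1000.0, threshold=400.0, upper=30000.0, n=200_000):
    """Midpoint-sum approximation of the truncated integral from `threshold` to `upper`."""
    h = (upper - threshold) / n
    lam = 1.0 / mean
    total = 0.0
    for k in range(n):
        x = threshold + (k + 0.5) * h
        total += x * lam * math.exp(-lam * x) * h
    return total

print(closed_form)  # ≈ 938.45
```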
Lecture 7:
Expectation of a Function of a Random Variable...
Remark. If a and b are constants, then
E[aX + b] = aE[X] + b.
Remark. Var(X) = E[(X − E[X])²]: the variance of X measures the expected squared deviation of X from its expected value.
Var(X) = E[X²] − (E[X])².
Example. Calculate V ar(X) when X is the outcome of rolling a fair die.
E[X²] = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6) = 91/6.
E[X] = 7/2. (It is obtained in Lecture 5.)
Var(X) = E[X²] − (E[X])² = 91/6 − (7/2)² = 35/12.
Joint Distribution Functions:
If X and Y are discrete random variables, the joint probability mass function of X
and Y is defined by
P (x, y) = P (X = x, Y = y).
The probability mass function of X can be obtained from P (x, y) by
PX(x) = Σ_{y: P(x,y)>0} P(x, y).
Example. Suppose X and Y are discrete random variables with the probability
mass function P (x, y),
P (1, 1) = 1/4, P (1, 2) = 1/8, P (1, 3) = 1/16, P (1, 4) = 1/16,
P (2, 1) = 1/16, P (2, 2) = 1/16, P (2, 3) = 1/4, P (2, 4) = 1/8.
What is the probability that Y = 3?
PY(3) = P(Y = 3, X = 1) + P(Y = 3, X = 2) = 1/16 + 1/4 = 5/16.
Remark. For discrete random variables X and Y , and real-valued function g(X, Y )
E[g(X, Y)] = Σ_x Σ_y g(x, y)P(x, y).
Example. Two fair dice are rolled. What is the expected value of the product of
their outcomes?
Let Z denote the product of their outcomes.
Let X denote the outcome of the first die.
Let Y denote the outcome of the second die.
Since X and Y are independent, E[Z] = E[XY] = E[X]E[Y] = (7/2)(7/2) = 49/4.
Lecture 8:
Conditional Probability: Discrete Case
Recall that for any two events E and F , the conditional probability of E given F is
defined, as long as P (F ) > 0, by
P(E|F) = P(E ∩ F)/P(F).
If X and Y are discrete random variables, the conditional probability mass function of
X given that Y = y is defined by,
P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = P(x, y)/PY(y).
Example. Suppose that P (x, y), the joint probability mass function of X and Y ,
is given by
P (1, 1) = 0.5, P (1, 2) = 0.1, P (2, 1) = 0.1, P (2, 2) = 0.3
Calculate the conditional probability mass function of X given that Y = 1.
P (Y = 1) = P (1, 1) + P (2, 1) = 0.6
P(X = 1|Y = 1) = P(X = 1, Y = 1)/P(Y = 1) = P(1, 1)/P(Y = 1) = 0.5/0.6 = 5/6.
P(X = 2|Y = 1) = P(X = 2, Y = 1)/P(Y = 1) = P(2, 1)/P(Y = 1) = 0.1/0.6 = 1/6.
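The conditional pmf computation generalizes to any joint pmf stored as a dictionary (a sketch using the numbers of this example):

```python
from fractions import Fraction

# Joint pmf from the example above, stored as {(x, y): probability}.
P = {(1, 1): Fraction(5, 10), (1, 2): Fraction(1, 10),
     (2, 1): Fraction(1, 10), (2, 2): Fraction(3, 10)}

def cond_pmf_X_given_Y(y):
    """Return {x: P(X = x | Y = y)} for all x with P(x, y) > 0."""
    p_y = sum(p for (x, yy), p in P.items() if yy == y)
    return {x: p / p_y for (x, yy), p in P.items() if yy == y}

print(cond_pmf_X_given_Y(1))  # P(X=1|Y=1) = 5/6, P(X=2|Y=1) = 1/6
```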
Example. The joint probability mass function of X and Y, P(x, y), is given by
P(1, 1) = 1/9, P(2, 1) = 1/3, P(3, 1) = 1/9,
P(1, 2) = 1/9, P(2, 2) = 0, P(3, 2) = 1/18,
P(1, 3) = 0, P(2, 3) = 1/6, P(3, 3) = 1/9.
Calculate E[X|Y = 2].
P(Y = 2) = P(1, 2) + P(2, 2) + P(3, 2) = 1/9 + 0 + 1/18 = 1/6.
E[X|Y = 2] = (1)P(X = 1, Y = 2)/P(Y = 2) + (2)P(X = 2, Y = 2)/P(Y = 2) + (3)P(X = 3, Y = 2)/P(Y = 2) = ((1)(1/9) + (2)(0) + (3)(1/18))/(1/6) = 5/3.
Example. Suppose that X and Y are independent binomial random variables with parameters (5, 0.4) and (10, 0.4), respectively. Calculate the conditional probability mass function of X given that X + Y = 8. (Note that X + Y is binomial with parameters (15, 0.4).)
P(X = k|X + Y = 8) = P(X = k, Y = 8 − k)/P(X + Y = 8)
= [C(5, k)(0.4)^k(1 − 0.4)^{5−k} · C(10, 8 − k)(0.4)^{8−k}(1 − 0.4)^{10−8+k}] / [C(15, 8)(0.4)^8(1 − 0.4)^{15−8}]
= C(5, k)C(10, 8 − k)/C(15, 8), 0 ≤ k ≤ 5.
Example. There are n components. On a rainy day, component i will function with
probability pi . On a nonrainy day, component i will function with probability qi ,
for i = 1, ..., n. It will rain tomorrow with probability α. Calculate the conditional
expected number of components that function tomorrow given it rains.
Define Xi and Y as
Xi = 1 if component i functions tomorrow, and 0 otherwise,
Y = 1 if it rains tomorrow, and 0 otherwise.
Then,
E[Σ_{i=1}^n Xi | Y = 1] = Σ_{i=1}^n E[Xi|Y = 1] = Σ_{i=1}^n pi.
The function f(x, y) is called the joint probability density function of X and Y.
The probability density function of Y can be obtained from f(x, y) by
P(Y ∈ B) = P(X ∈ (−∞, ∞), Y ∈ B) = ∫_B ∫_{−∞}^{∞} f(x, y) dx dy = ∫_B fY(y) dy,
where fY(y) = ∫_{−∞}^{∞} f(x, y) dx.
If X and Y have a joint density function f(x, y), then the conditional probability density function of X, given that Y = y, is defined for all values of y such that fY(y) > 0, by
f(x|y) = f(x, y)/fY(y).
Lecture 10:
Conditional Probability: Continuous Case ...
X and Y are jointly continuous if there exists a function f (x, y), defined for all real x
and y, having the property that for all sets A and B of real numbers
P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy.
The function f (x, y) is called the joint probability density function of X and Y .
The probability density function of Y can be obtained from f (x, y) by
P(Y ∈ B) = P(X ∈ (−∞, ∞), Y ∈ B) = ∫_B ∫_{−∞}^{∞} f(x, y) dx dy = ∫_B fY(y) dy,
where fY(y) = ∫_{−∞}^{∞} f(x, y) dx.
Example. Suppose the joint probability density of X and Y is f(x, y) = 6xy(2 − x − y) for 0 < x < 1, 0 < y < 1, and 0 otherwise. Calculate E[Y].
E[Y] = ∫_0^1 ∫_0^1 y · 6xy(2 − x − y) dx dy = ∫_0^1 (6x²y² − 2x³y² − 3x²y³)|_0^1 dy
= ∫_0^1 (6y² − 2y² − 3y³) dy = ((4/3)y³ − (3/4)y⁴)|_0^1 = 7/12.
If X and Y have a joint density function f(x, y), then the conditional probability density function of X, given that Y = y, is defined for all values of y such that fY(y) > 0, by
f(x|y) = f(x, y)/fY(y).
Example. Suppose the joint probability density of X and Y is given by
f(x, y) = 6xy(2 − x − y) for 0 < x < 1, 0 < y < 1, and 0 otherwise.
Calculate the probability density function of X given that Y = y.
f(x|y) = f(x, y)/fY(y) = 6xy(2 − x − y)/∫_0^1 6xy(2 − x − y) dx = 6xy(2 − x − y)/(y(4 − 3y)) = 6x(2 − x − y)/(4 − 3y).
0
E[X|Y = y] =
xf (x|y)dx =
(2 y)2
6x2 (2 x y)
dx =
(4 3y)
4 3y
6
4
5 4y
.
8 6y
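The conditional expectation above can be checked numerically (a sketch; midpoint Riemann sums stand in for the integrals, and the grid size is an arbitrary accuracy choice):

```python
def f(x, y):
    """Joint density f(x, y) = 6xy(2 - x - y) on the open unit square."""
    return 6 * x * y * (2 - x - y) if 0 < x < 1 and 0 < y < 1 else 0.0

def cond_mean_numeric(y, n=100_000):
    """E[X | Y = y] = (∫ x f(x, y) dx) / (∫ f(x, y) dx), via midpoint sums."""
    h = 1.0 / n
    xs = [(k + 0.5) * h for k in range(n)]
    num = sum(x * f(x, y) * h for x in xs)
    den = sum(f(x, y) * h for x in xs)
    return num / den

y = 0.5
print(cond_mean_numeric(y), (5 - 4 * y) / (8 - 6 * y))  # both ≈ 0.6
```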
Example. Sam will read either one chapter of his probability book or one chapter
of his history book. Suppose the number of misprints in a chapter of his probability book is Poisson distributed with mean 2 and the number of misprints in his
history chapter is Poisson distributed with mean 5. Assume that Sam is equally
likely to choose either book. What is the expected number of misprints that Sam
will come across?
X: the number of misprints.
Y = 1 if Sam chooses his history book, and 2 if Sam chooses his probability book.
Then,
E[X] = E[X|Y = 1]P(Y = 1) + E[X|Y = 2]P(Y = 2) = 5(1/2) + 2(1/2) = 7/2.
Example. A miner is trapped in a mine containing three doors. The first door
leads to a tunnel that takes him to safety after two hours of travel. The second
door leads to a tunnel that returns him to the mine after three hours of travel.
The third door leads to a tunnel that returns him to his mine after five hours.
Assuming that the miner is at all times equally likely to choose any one of the
doors, what is the expected length of time until the miner reaches safety?
X: the time until the miner reaches safety.
Y : the door he initially chooses.
Then,
E[X] = E[X|Y = 1]P (Y = 1)+E[X|Y = 2]P (Y = 2)+E[X|Y = 3]P (Y = 3).
E[X|Y = 1] = 2.
E[X|Y = 2] = 3 + E[X].
E[X|Y = 3] = 5 + E[X].
Therefore,
E[X] = (1/3)(2) + (1/3)(3 + E[X]) + (1/3)(5 + E[X]), which gives E[X] = 10.
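The answer E[X] = 10 can also be estimated by Monte Carlo simulation (a sketch; the trial count and seed are arbitrary):

```python
import random

def simulate_escape_time(trials=200_000, seed=1):
    """Monte Carlo estimate of the miner's expected time to reach safety."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = 0.0
        while True:
            door = rng.randrange(3)      # each door chosen with probability 1/3
            if door == 0:                # first door: safety after 2 hours
                t += 2
                break
            t += 3 if door == 1 else 5   # return tunnels: 3 or 5 hours, then choose again
        total += t
    return total / trials

print(simulate_escape_time())  # ≈ 10
```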
Lecture 11:
Computing Probabilities by Conditioning ...
Let E denote an arbitrary event and Y denote a discrete random variable. Then, the
probability of event E can be obtained by
P(E) = Σ_y P(E|Y = y)P(Y = y).
Example. Suppose that the number of accidents tomorrow, X, is Poisson distributed with mean 9 if it rains and with mean 3 if it does not rain, and that it will rain tomorrow with probability 0.6. What is the probability that there will be no accidents tomorrow?
Y = 1 if it rains tomorrow, and 0 otherwise.
Then,
P(X = 0) = P(X = 0|Y = 1)P(Y = 1) + P(X = 0|Y = 0)P(Y = 0)
= (0.6)(e^{−9} 9⁰/0!) + (1 − 0.6)(e^{−3} 3⁰/0!) = (0.6)e^{−9} + (0.4)e^{−3}.
Markov Chains
Stochastic Processes: A discrete-time stochastic process {Xn, n = 0, 1, ...} is a collection of random variables.
For each n = 0, 1, ..., Xn is a random variable.
The index n is often interpreted as time and, as a result, we refer to Xn as the
state of the process at time n.
For example,
Xn might be the total number of customers that have entered a supermarket
by time n.
Xn might be the number of customers in the supermarket at time n.
A stochastic process is a family of random variables that describes the evolution
through time of some process.
Example (Frog Example). Suppose 1000 lily pads are arranged in a circle. A frog
starts at pad number 1000. Each minute, she jumps either straight up, or one
pad clockwise, or one pad counter-clockwise, each with probability 1/3.
P(at pad # 1 after 1 step) = 1/3.
P(at pad # 1000 after 1 step) = 1/3.
P(at pad # 999 after 1 step) = 1/3.
P(at pad # 428 after 987 steps)?
Markov Chain: a discrete-time Markov chain is a discrete-time stochastic process specified by
A state space S: any non-empty finite or countable set.
In frog example, 1000 lily pads.
Transition probabilities {Pij}_{i,j∈S}: Pij is the probability that the process will next be in state j given that it is currently in state i.
Pij ≥ 0, and Σ_j Pij = 1 for all i.
In the frog example,
Pij = 1/3 if i − j = 0, i − j = 1, j − i = 1, i − j = 999, or j − i = 999, and Pij = 0 otherwise.
Example (Gambler's Ruin). Consider a gambler who on each play wins $1 with probability 0.4 and loses $1 with probability 0.6, and who quits when his fortune reaches either $0 or $5.
Xn: the gambler's fortune after n plays.
State space S = {0, 1, 2, 3, 4, 5}.
Transition probabilities: Pi,i+1 = 0.4 and Pi,i−1 = 0.6 for 0 < i < 5, and P00 = P55 = 1 (states 0 and 5 are absorbing).
Transition matrix
P =
[ 1    0    0    0    0    0
  0.6  0    0.4  0    0    0
  0    0.6  0    0.4  0    0
  0    0    0.6  0    0.4  0
  0    0    0    0.6  0    0.4
  0    0    0    0    0    1   ]
Example (Inventory Chain). Consider an (s, S) inventory control policy. That is,
when the stock on hand at the end of the day falls to s or below, we order enough
to bring it back up to S. For simplicity, we assume it happens at the beginning
of the next day.
Lecture 12:
Markov Chains ...
Example (Inventory Chain). Consider an (s, S) inventory control policy. That is, when
the stock on hand at the end of the day falls to s or below, we order enough to bring it
back up to S. For simplicity, we assume it happens at the beginning of the next day.
Suppose that s = 1 and S = 5. Also, assume that the distribution of the demand on
day n + 1 is
P (Dn+1 = 0) = 0.3, P (Dn+1 = 1) = 0.4, P (Dn+1 = 2) = 0.2, P (Dn+1 = 3) = 0.1.
Xn : the amount of stock on hand at the end of day n.
State space S = {0, 1, 2, 3, 4, 5}.
Transition probabilities,
P (Xn+1 = 0|Xn = 0): when stock on hand is zero at the end of day n, 5 units
will be ordered and therefore there will be 5 units available at the beginning
of day n + 1. Since the maximum demand on day n + 1 is 3, there will be
at least 1 unit available at the end of the day n + 1. This means that given
Xn = 0, Xn+1 is greater than zero, or
P(Xn+1 = 0|Xn = 0) = P(Dn+1 ≥ 5) = 0.
P (Xn+1 = 1|Xn = 0) = P (Dn+1 = 4) = 0
(similar to the above discussion).
P (Xn+1 = 2|Xn = 0): when stock on hand is zero at the end of day n, 5 units
will be ordered and therefore there will be 5 units available at the beginning
of day n + 1. If the demand on day n + 1 is exactly 3, there will be 2 units
available at the end of the day n + 1.
P (Xn+1 = 2|Xn = 0) = P (Dn+1 = 3) = 0.1
Similarly,
P (Xn+1 = 3|Xn = 0) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 0) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 0) = P (Dn+1 = 0) = 0.3.
Similarly, for Xn = 1:
P(Xn+1 = 0|Xn = 1) = P(Dn+1 ≥ 5) = 0.
P (Xn+1 = 1|Xn = 1) = P (Dn+1 = 4) = 0.
P (Xn+1 = 2|Xn = 1) = P (Dn+1 = 3) = 0.1
P (Xn+1 = 3|Xn = 1) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 1) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 1) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 0|Xn = 2): when stock on hand is 2 at the end of day n, 0 units
will be ordered and there will be 2 units available at the beginning of day
n + 1. Therefore,
P(Xn+1 = 0|Xn = 2) = P(Dn+1 ≥ 2) = P(Dn+1 = 2) + P(Dn+1 = 3) = 0.3.
P (Xn+1 = 1|Xn = 2) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 2|Xn = 2) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 3|Xn = 2) = 0.
P (Xn+1 = 4|Xn = 2) = 0.
P (Xn+1 = 5|Xn = 2) = 0.
Similarly, for Xn = 3:
P(Xn+1 = 0|Xn = 3) = P(Dn+1 ≥ 3) = P(Dn+1 = 3) = 0.1.
P (Xn+1 = 1|Xn = 3) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 2|Xn = 3) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 3|Xn = 3) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 4|Xn = 3) = 0.
P (Xn+1 = 5|Xn = 3) = 0.
Similarly, for Xn = 4:
P(Xn+1 = 0|Xn = 4) = P(Dn+1 ≥ 4) = 0.
P (Xn+1 = 1|Xn = 4) = P (Dn+1 = 3) = 0.1.
P (Xn+1 = 2|Xn = 4) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 3|Xn = 4) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 4|Xn = 4) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 5|Xn = 4) = 0.
Similarly, for Xn = 5:
P(Xn+1 = 0|Xn = 5) = P(Dn+1 ≥ 5) = 0.
P(Xn+1 = 1|Xn = 5) = P(Dn+1 ≥ 4) = 0.
P (Xn+1 = 2|Xn = 5) = P (Dn+1 = 3) = 0.1.
P (Xn+1 = 3|Xn = 5) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 5) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 5) = P (Dn+1 = 0) = 0.3.
Transition matrix (rows and columns ordered 0, 1, 2, 3, 4, 5)
P =
[ 0    0    0.1  0.2  0.4  0.3
  0    0    0.1  0.2  0.4  0.3
  0.3  0.4  0.3  0    0    0
  0.1  0.2  0.4  0.3  0    0
  0    0.1  0.2  0.4  0.3  0
  0    0    0.1  0.2  0.4  0.3 ]
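The transition matrix can be generated mechanically from the demand distribution and the (s, S) rule (a sketch assuming, as in the example, that demand beyond the available stock is lost):

```python
# Demand distribution for day n+1, and the (s, S) = (1, 5) policy from the example.
demand = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
s, S = 1, 5

def transition_row(x):
    """Transition probabilities out of end-of-day stock level x."""
    start = S if x <= s else x          # order up to S overnight when stock <= s
    row = [0.0] * (S + 1)
    for d, p in demand.items():
        row[max(start - d, 0)] += p     # stock cannot go below zero (lost sales)
    return row

P = [transition_row(x) for x in range(S + 1)]
for x, row in enumerate(P):
    print(x, [round(v, 2) for v in row])
```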
Example (Repair Chain). A machine has three critical parts that are subject to failure,
but can function as long as two of these parts are working. When two are broken, they
are replaced and the machine is back to working order the next day. Assume that
parts 1, 2, and 3 fail with probabilities 0.01, 0.02, and 0.04, but no two parts fail on
the same day. Formulate the system as a Markov chain.
Xn : the parts that are broken.
State space S = {0, 1, 2, 3, 12, 13, 23}.
Transition probabilities:
P (Xn+1 = 0|Xn = 0) = 1 0.01 0.02 0.04 = 0.93.
P (Xn+1 = 1|Xn = 0) = 0.01.
P (Xn+1 = 2|Xn = 0) = 0.02.
P (Xn+1 = 3|Xn = 0) = 0.04.
If we continue, we get the transition matrix (rows and columns ordered 0, 1, 2, 3, 12, 13, 23):
P =
[ 0.93  0.01  0.02  0.04  0     0     0
  0     0.94  0     0     0.02  0.04  0
  0     0     0.95  0     0.01  0     0.04
  0     0     0     0.97  0     0.01  0.02
  1     0     0     0     0     0     0
  1     0     0     0     0     0     0
  1     0     0     0     0     0     0    ]
Chapman-Kolmogorov Equations:
P^{n+m}_ij = Σ_{k=0}^∞ P^n_ik P^m_kj, for all n, m ≥ 0 and all i, j.
Lecture 13:
Multistep Transition Probabilities ...
Theorem. The m-step transition probability P(Xn+m = j|Xn = i) is given by the mth power of the transition matrix P,
P(Xn+m = j|Xn = i) = P^m_ij = (P^m)_ij.
Example. Suppose that if it rains today, then it will rain tomorrow with probability 0.7; and if it does not rain today, then it will rain tomorrow with probability
0.4. Calculate the probability that it will rain two days from today given that it
is raining today. Also, calculate the probability that it will rain four days from
today given that it is raining today.
We model the problem as a Markov chain.
State space S = {0, 1} where 0 denotes that it rains and 1 denotes that it
does not rain.
Transition matrix
P =
[ 0.7  0.3
  0.4  0.6 ]
Then,
P² =
[ 0.61  0.39
  0.52  0.48 ]
The desired probability is P^2_00 = 0.61.
To calculate the probability that it will rain four days from today given that it is raining today, we consider
P⁴ =
[ 0.5749  0.4251
  0.5668  0.4332 ]
The desired probability is P^4_00 = 0.5749.
To obtain the mth power of a matrix, you can use WWW.WOLFRAMALPHA.COM. For example, enter {{0.7, 0.3}, {0.4, 0.6}}^4 on that website to get P⁴.
What about P^8_00?
P⁸ =
[ 0.5714  0.4286
  0.5714  0.4286 ]
The desired probability is P^8_00 = 0.5714.
What about P^10_00?
P¹⁰ =
[ 0.5714  0.4286
  0.5714  0.4286 ]
Rows are identical! It says that the probability that it will rain in 10 days, or 20 days, ..., is 0.5714.
The desired probability is P^10_00 = 0.5714.
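Instead of WolframAlpha, the matrix powers can be computed with a few lines of Python (a sketch using plain lists; no external libraries assumed):

```python
def mat_mult(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, m):
    """m-th power of a square matrix by repeated multiplication (m >= 1)."""
    result = P
    for _ in range(m - 1):
        result = mat_mult(result, P)
    return result

P = [[0.7, 0.3], [0.4, 0.6]]
print(mat_pow(P, 2)[0][0])   # ≈ 0.61
print(mat_pow(P, 10)[0][0])  # ≈ 0.5714
```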
Example. Consider the inventory chain with transition matrix
P =
[ 0    0    0.1  0.2  0.4  0.3
  0    0    0.1  0.2  0.4  0.3
  0.3  0.4  0.3  0    0    0
  0.1  0.2  0.4  0.3  0    0
  0    0.1  0.2  0.4  0.3  0
  0    0    0.1  0.2  0.4  0.3 ]
What is the probability that there will be 3 units on hand at the end of day 20, given that there are 2 units on hand today? We are looking for P^20_23. Computing P^20, the rows are (approximately) identical, each equal to
(0.0909, 0.1556, 0.2310, 0.2156, 0.2012, 0.1056).
Rows are identical! It says that the probability that there will be 3 units of inventory on hand in 20 days, or 25 days, ..., is 0.2156.
The desired probability is P^20_23 = 0.2156.
Classification of States:
State j is said to be accessible from state i if P^n_ij > 0 for some n ≥ 0.
Example. Consider a Markov chain on the state space {1, 2, 3} in which, among other positive entries, P23 = 0.1 and P31 = 0.4.
2 and 3 are accessible from 1. 1 is accessible from 2 since with probability 0.1 we can go from 2 to 3, and with probability 0.4 we can go from 3 to 1. Similarly, 2 is accessible from 3.
Two states i and j that are accessible to each other are said to communicate, and we write i ↔ j. Communication partitions the state space into classes; the Markov chain is irreducible if there is only one class, that is, if all states communicate with each other.
Example. Consider a Markov chain with the following transition matrix.
P =
[ 0.2  0.8
  0    1.0 ]
(rows and columns ordered 1, 2)
The Markov chain has two classes, {1} and {2}. Therefore, it is not irreducible (i.e., it is reducible).
Example. Consider a Markov chain on the state space {1, 2, 3} whose transition matrix makes all states communicate. The Markov chain has one class, {1, 2, 3}. Therefore, the Markov chain is irreducible.
State i is said to be recurrent if starting in state i the process will ever reenter state i
with probability 1. Otherwise, state i is called transient.
State i is said to have period d if P^n_ii = 0 whenever n is not divisible by d, and d is the largest integer with this property.
For instance, starting in i, it may be possible for the process to enter state i only
at times 2, 4, 6, 8, in which case state i has period 2.
A state with period 1 is said to be aperiodic.
Lecture 14:
Multistep Transition Probabilities ...
State i is said to be recurrent if starting in state i the process will ever reenter state i
with probability 1. Otherwise, state i is called transient.
Suppose state i is recurrent. Then it is positive recurrent if, starting in i, the expected
time until the process returns to state i is finite.
Remark. Every irreducible Markov chain with a finite state space is positive recurrent.
State i is said to have period d if P^n_ii = 0 whenever n is not divisible by d, and d is the largest integer with this property.
For instance, starting in i, it may be possible for the process to enter state i only
at times 2, 4, 6, 8, in which case state i has period 2.
A state with period 1 is said to be aperiodic.
Remark. An irreducible Markov chain is aperiodic if there is a state i for which
Pii > 0.
Example. Consider a MC on the state space {0, 1, 2, 3, 4, 5} whose transition matrix contains the rows
2: [ 0.3  0    0.4  0    0.3  0   ]
3: [ 0    0.3  0    0.4  0    0.3 ]
4: [ 0    0    0.5  0    0.5  0   ]
5: [ 0.5  0    0    0    0    0.5 ]
Is this MC irreducible?
All states communicate with each other. Therefore, the MC is irreducible.
Long-run Behavior (Limiting Behavior):
Theorem. If a Markov chain is irreducible, positive recurrent, and aperiodic, then
the long-run proportion of time that the process will be in state j, j is
πj = lim_{n→∞} P^n_ij, j ≥ 0.
Example. Suppose that if it rains today, then it will rain tomorrow with probability 0.7; and if it does not rain today, then it will rain tomorrow with probability 0.4. In the long run, what fraction of the time does it rain?
We model the problem as a Markov chain.
State space S = {0, 1} where 0 denotes that it rains and 1 denotes that it
does not rain.
Transition matrix
P =
[ 0.7  0.3
  0.4  0.6 ]
Then,
P^20 =
[ 0.5714  0.4286
  0.5714  0.4286 ]
so in the long run it rains π1 = 0.5714 = 4/7 of the time.
Remark. In general, for a two-state Markov chain with transition matrix
P =
[ 1−a  a
  b    1−b ]
the limiting probabilities are π1 = b/(a + b) and π2 = a/(a + b).
Example. A rapid transit system has just started operating. In the first month
of operation, it was found that 25% of commuters are using the system while
75% are travelling by automobile. Suppose that each month 10% of transit users
go back to using their cars, while 30% of automobile users switch to the transit
system. What fraction of people will eventually use the transit system?
P =
[ 0.9  0.1
  0.3  0.7 ]
Then, π1 = 0.3/(0.3 + 0.1) = 0.75 and π2 = 0.1/(0.3 + 0.1) = 0.25.
Example. Market research suggests that in a five-year period 8% of people with cable television will get rid of it, and 26% of those without it will sign up for it. What is the long-run fraction of people with cable TV?
P =
[ 0.92  0.08
  0.26  0.74 ]
(rows and columns ordered Cable, No Cable)
Then, π_Cable = 0.26/(0.26 + 0.08) = 26/34 = 0.7647.
Lecture 15:
Long-run Behavior (Limiting Behavior) ...
Example. Consider an (s, S) inventory control policy. Assume that the distribution of
the demand on day n is
P (Dn = 0) = 0.3, P (Dn = 1) = 0.4, P (Dn = 2) = 0.2, P (Dn = 3) = 0.1.
Suppose that sales produce a profit of $12 but it costs $2 a day to keep unsold units
in the store overnight. What are the optimal values of s and S that maximize the
long-run net profit?
The objective is to maximize the long-run net profit, i.e.,
E[net profit] = E[sales] − E[holding costs].
Let I denote the inventory level at the beginning of the day. Conditioning on the
inventory level at the beginning of the day, we have
E[net profit] = Σ_k E[net profit|I = k] P(I = k).
Note that P(I = k) is the long-run probability of having k units at the beginning of the day.
Since it is impossible to sell 4 units in a day, and it costs us to have unsold inventory, we should never have more than 3 units on hand.
Based on the above discussion the inventory level at the beginning of a day is
either 3, 2, or 1. We consider them separately.
Suppose that the inventory level at the beginning of a day is 3, i.e., I = 3.
Then the sales of the day is
E [sales|I = 3] = E [sales|I = 3, Dn = 0] P (Dn = 0)+E [sales|I = 3, Dn = 1] P (Dn = 1)
+E [sales|I = 3, Dn = 2] P (Dn = 2) + E [sales|I = 3, Dn = 3] P (Dn = 3)
= [0 (12)] P (Dn = 0)+[1 (12)] P (Dn = 1)+[2 (12)] P (Dn = 2)+[3 (12)] P (Dn = 3)
= [0 (12)] (0.3) + [1 (12)] (0.4) + [2 (12)] (0.2) + [3 (12)] (0.1) = 13.2.
The holding costs of the day is
E [costs|I = 3] = E [costs|I = 3, Dn = 0] P (Dn = 0)+E [costs|I = 3, Dn = 1] P (Dn = 1)
+E [costs|I = 3, Dn = 2] P (Dn = 2) + E [costs|I = 3, Dn = 3] P (Dn = 3)
= [3 (2)] P (Dn = 0)+[2 (2)] P (Dn = 1)+[1 (2)] P (Dn = 2)+[0 (2)] P (Dn = 3)
= [3 (2)] (0.3) + [2 (2)] (0.4) + [1 (2)] (0.2) + [0 (2)] (0.1) = 3.8.
To obtain E[net profit] = Σ_{k=0}^3 E[net profit|I = k] P(I = k), we need to calculate P(I = k), which depends on the inventory control policy.
Since it is impossible to sell 4 units in a day, and it costs us to have unsold inventory, we should never have more than 3 units on hand, so we compare the profits of the (2, 3), (1, 3), (0, 3), (1, 2), and (0, 2) inventory policies.
Consider (2, 3) inventory policy. In this case we always start a day with 3 units,
therefore,
P =
[ 0.1  0.2  0.4  0.3
  0.1  0.2  0.4  0.3
  0.1  0.2  0.4  0.3
  0.1  0.2  0.4  0.3 ]
(states 0, 1, 2, 3)
Therefore, under the (2, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 0.1, π1 = 0.2, π2 = 0.4, and π3 = 0.3, respectively.
Also, under the (2, 3) inventory control policy, the inventory at the beginning of a day is always 3. Therefore,
E[net profit] = Σ_{k=0}^3 E[net profit|I = k] P(I = k) = E[net profit|I = 3] = 13.2 − 3.8 = 9.4.
Consider the (1, 3) inventory policy. Then,
P =
[ 0.1  0.2  0.4  0.3
  0.1  0.2  0.4  0.3
  0.3  0.4  0.3  0
  0.1  0.2  0.4  0.3 ]
and P^20 has (approximately) identical rows, each equal to (19/110, 30/110, 40/110, 21/110).
Therefore, under the (1, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 19/110, π1 = 30/110, π2 = 40/110, and π3 = 21/110, respectively.
Under the (1, 3) inventory control policy, the inventory at the beginning of a day is either 2 or 3. The long-run probability that the inventory level at the beginning of a day is 2 is P(I = 2) = π2 = 40/110, and P(I = 3) = π0 + π1 + π3 = 70/110.
Therefore,
E[net profit] = Σ_{k=0}^3 E[net profit|I = k] P(I = k) = (10)(40/110) + (9.4)(70/110) = 1058/110 ≈ 9.62.
Consider the (0, 3) inventory policy. Then,
P =
[ 0.1  0.2  0.4  0.3
  0.7  0.3  0    0
  0.3  0.4  0.3  0
  0.1  0.2  0.4  0.3 ]
and P^20 has (approximately) identical rows, each equal to (343/1070, 300/1070, 280/1070, 147/1070).
Therefore, under the (0, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 343/1070, π1 = 300/1070, π2 = 280/1070, and π3 = 147/1070, respectively.
Under the (0, 3) inventory control policy, the inventory at the beginning of a day is either 1, 2 or 3. Therefore, P(I = 1) = π1 = 300/1070, P(I = 2) = π2 = 280/1070, and P(I = 3) = π0 + π3 = 490/1070. Therefore,
E[net profit] = Σ_{k=0}^3 E[net profit|I = k] P(I = k) = (7.8)(300/1070) + (10)(280/1070) + (9.4)(490/1070) = 9746/1070 ≈ 9.11.
Consider the (1, 2) inventory policy. In this case we always start a day with 2 units, so every row of P is (0.3, 0.4, 0.3) over states 0, 1, 2. Therefore, under the (1, 2) inventory control policy, the long-run probabilities of having 0, 1, and 2 units at the end of the day are π0 = 0.3, π1 = 0.4, and π2 = 0.3, respectively. Then,
E[net profit] = Σ_{k=0}^2 E[net profit|I = k] P(I = k) = E[net profit|I = 2] = 10.
Consider the (0, 2) inventory policy. Then,
P =
[ 0.3  0.4  0.3
  0.7  0.3  0
  0.3  0.4  0.3 ]
Therefore, under the (0, 2) inventory control policy, the long-run probabilities of having 0, 1, and 2 units at the end of the day are π0 = 49/110, π1 = 40/110, and π2 = 21/110, respectively. Then,
E[net profit] = Σ_{k=0}^2 E[net profit|I = k] P(I = k) = (7.8)(40/110) + (10)(70/110) = 9.2.
Comparing the five policies, the (1, 2) policy attains the highest long-run net profit, so s = 1 and S = 2 are optimal.
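The whole comparison can be automated: build each policy's chain, find its stationary distribution by power iteration, and average the daily net profit (a sketch assuming, as above, lost sales and overnight replenishment; the iteration count is an arbitrary convergence budget):

```python
# Demand distribution, unit sale profit, and nightly holding cost from the example.
demand = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
PROFIT, HOLD = 12, 2

def daily_net(start):
    """Expected sales profit minus holding cost for a day that starts with `start` units."""
    return sum(p * (PROFIT * min(d, start) - HOLD * max(start - d, 0))
               for d, p in demand.items())

def transition_row(x, s, S):
    """End-of-day stock transitions under the (s, S) policy, with unmet demand lost."""
    start = S if x <= s else x              # order up to S overnight when stock <= s
    row = [0.0] * (S + 1)
    for d, p in demand.items():
        row[max(start - d, 0)] += p
    return row

def long_run_profit(s, S, iters=500):
    """Long-run expected daily net profit of the (s, S) policy."""
    n = S + 1
    P = [transition_row(x, s, S) for x in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):                  # power iteration -> stationary distribution
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(pi[x] * daily_net(S if x <= s else x) for x in range(n))

for policy in [(2, 3), (1, 3), (0, 3), (1, 2), (0, 2)]:
    print(policy, round(long_run_profit(*policy), 3))
```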