Stochastic System Notes 2011

This document provides a summary of key concepts from two lectures on stochastic models for manufacturing and service systems. The first lecture introduces the newsvendor problem and how to determine the optimal order quantity to maximize expected long-run profit. The second lecture further explains how to calculate the expected profit for a given order quantity using the demand distribution and formulate the optimization problem to find the quantity that maximizes expected profit.


Stochastic Models for Manufacturing and Service Systems

Lectures by Jim Dai, TeX by Hyunwoo Park


(These lecture notes were initially taken in Spring 2011 by Hyunwoo Park)
Spring 2011

Abstract
A note to ORIE 3510/ORIE 5510/STSCI 3510 students: these lecture notes were
taken and compiled in LaTeX by Georgia Tech Ph.D. student Hyunwoo Park in Spring
2011. Jim Dai has not edited this document. Some of the terms and examples have
not been updated yet, even though they are out of context for ORIE 3510. When in
conflict, the class notes of the current ORIE 3510 offering (Spring 2022) take
precedence over this document.

1 Lecture 1: Jan 13
1.1 Newsvendor Problems
This class of problems used to be called the "Newsboy Problem." It helps you determine
an optimal ordering policy in the face of uncertain demand when the selling price is fixed
(deterministic). Imagine you are starting a new business selling the New York Times on
campus. First of all, each day you need to decide how many copies to get from the publisher.
This is a decision variable. A variable cost is associated with your decision; say cv = $0.25.
You also need to decide at what price to sell the newspaper. The publisher may place a
constraint on the minimum price; say cp = $1.00. How about left-over copies? Most likely
the value of left-overs is zero, but it could be positive or negative depending on the situation.
Say cs = $0.00. Also, cs < cv; otherwise you could make a risk-free profit by selling left-overs
back to the publisher. Finally, you need to know the characteristics of demand. Denote the
daily demand by D, and say E(D) = 50. It can be impractical to obtain the exact distribution
of D, but you can at least try to approximate it, for example:

d           10    15    20    25    30
P(D = d)    1/4   1/8   1/8   1/4   1/4

Even though you have figured out the distribution, you do not know how much demand there
will be on any given day. Your company could try to change the distribution, for example by
lowering the price, but the uncertainty here is intrinsic, so let us assume that the price is
fixed.
Finally, the question comes down to: "How many copies should you order each evening?"
However many you decide to order, your decision will probably turn out to be wrong.
A few years ago, I visited Intel. Its manufacturing process is one of the most complex
manufacturing systems that human beings have ever constructed. The entire process takes six
weeks, and every day, at each step, there are intermediate products. You cannot manage this
kind of operation without a model. In this setting, what would be the right objective? Profit?
Maximizing the profit? More precisely, it should be maximizing the long-run average profit
per day.
Not in this lecture but in the next, we will discuss how to obtain the optimal order quantity
for each day. For now, let us explore the option of ordering a flat 25 copies every day.

Day      1                    2                     3
D        10                   25                    15
Profit   10 − 6.25 = 3.75     25 − 6.25 = 18.75     15 − 6.25 = 8.75

Although you are always making a profit in this example, you can conceive of an ordering
policy that generates a loss instead of a profit on a given day. The average profit over 100
days will be

(3.75 + 18.75 + 8.75 + · · ·) / 100.
In symbols,

P̄n = (p1 + p2 + p3 + · · · + pn) / n,

and assuming that daily demands are iid and our order is constant, the long-run average
profit will be

lim_{n→∞} P̄n = lim_{n→∞} (p1 + p2 + p3 + · · · + pn) / n.

By the law of large numbers, we know that the limit exists and equals the expectation of
p1, E[p1].
A computer simulation can generate large quantities of random numbers for you once you
know the demand distribution. You might ask whether a random number generated by a
computer is truly random. It is not, in the sense that you can replicate the sequence by using
the same random seed. However, it appears random and passes various statistical tests.
In any case, if two people simulate the situation with the same demand distribution, their
long-run average profits will be the same even though their actual realizations differ. This
is the power of the strong law of large numbers:

P( lim_{n→∞} (p1 + p2 + p3 + · · · + pn)/n = E[p1] ) = 1.

I am stressing this here because it is the underlying assumption for the following lectures.
Textbooks usually tell you to maximize the expectation, but do not tell you why.
This is why.
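The convergence of the long-run average to the one-period expectation can be checked numerically. A minimal Python sketch (the course itself uses Matlab); it assumes the demand table and prices from this lecture and the fixed order y = 25:

```python
import random

# Demand pmf from the table above, and the lecture's prices.
values = [10, 15, 20, 25, 30]
probs = [1/4, 1/8, 1/8, 1/4, 1/4]
cv, cp, cs = 0.25, 1.00, 0.00   # buying price, selling price, salvage value
y = 25                          # fixed daily order

def profit(d):
    # P = cp (D ∧ y) − cv y + cs (y − D)+
    return cp * min(d, y) - cv * y + cs * max(y - d, 0)

# One-period expected profit, computed exactly from the pmf.
expected = sum(p * profit(d) for d, p in zip(values, probs))

# Long-run average over many simulated days; the SLLN says this
# settles down at `expected`.
random.seed(0)
demands = random.choices(values, weights=probs, k=200_000)
average = sum(profit(d) for d in demands) / len(demands)
print(expected, round(average, 3))
```

Running it with a different seed changes `average` only in the decimals, exactly the "two persons, same distribution" point above.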
So, in the end, we need to maximize the expected profit E(P). The formula for the profit is

P = (D ∧ y) cp − y cv + (y − D)+ cs
  = (D ∧ y)($1.00) − y($0.25) + (y − D)+ ($0.00),

where a ∧ b = min(a, b) and a+ = max(a, 0). You will be asked to obtain the y which
maximizes the expected profit.
I will be away in the Netherlands next week, so the TAs will cover the lectures instead of me.

2 Lecture 2: Jan 18
The instructor for this week will be Mr. Shuangchi He. Students can write an email to
[email protected] to ask questions about these two lectures.

2.1 Newsvendor Problem


Suppose I am selling newspapers at the school. Recall that yesterday's newspaper is worthless
today. If I knew the exact demand for each day and ordered exactly that amount, it would
be perfect. However, that is usually not the case in the real world. Using statistics, though,
we can learn the distribution of the demand.
Let us start with defining our notation. For a specific day,

D = the demand of the period (a random variable: we know its distribution),
y = the quantity ordered,
Cv = the buying price of one item from the supplier,
Cp = the selling price of one item to the customer,
Cs = the salvage value of each leftover item
(when Cs < 0, it means we pay money to get rid of the leftover items).

Usually, Cs < Cv < Cp. The basic metrics we are interested in are:

(i) How many papers are sold? y ∧ D = min{y, D}

(ii) How many papers are leftover? (y − D)+ = max{0, y − D}

We can deduce the formula of profit from the metrics above.

profit = revenue − cost,
P(D, y) = [Cp (y ∧ D) + Cs (y − D)+] − Cv y.    (2.1)

We can also think of an alternative expression. For each paper sold, I earn Cp − Cv. On the
other hand, I lose Cv − Cs for each leftover paper at the end of the day.

profit = money I earn − money I lose,
P(D, y) = (Cp − Cv)(y ∧ D) − (Cv − Cs)(y − D)+.    (2.2)

Remark 2.1. (i) Cp − Cv is the profit margin of each sold item.

(ii) Cv − Cs is the loss of each leftover item.

Exercise 2.1. Show that (2.1) and (2.2) are equivalent.

In the long run, I am interested in the average profit I make. If the demands of all days
are independent and identically distributed (iid), then by the law of large numbers,

the long run average profit per day ≈ E[P (D, y)].

You should understand that the important part of the approximation above is that the LHS
is an average over "many" periods while the RHS is an expectation for "one" period.
Now let g(y) = E[P (D, y)], which is the expected profit when I have y papers at the
beginning of the day. In the expression E[P (D, y)], D is the random factor and we take
expectation with respect to D. We want to maximize the profit, so our optimization problem
would be formulated as

max_y g(y) = (Cp − Cv)E[y ∧ D] − (Cv − Cs)E[(y − D)+].

Remark 2.2. (i) Expected profit is relevant only when the system is managed repeatedly
over many periods.

(ii) If you manage the system for only one or a few periods, maximizing the expected profit
may not make sense.

(iii) In this setting, the optimal order quantity y ∗ that maximizes g(y) should be used for
every period.

Then, how do we compute E[y ∧ D] and E[(y − D)+ ] in order to compute g(y) given y?

Example 2.1. Assume that D follows the following distribution and y = 30, 24 for example.

d            20    25    30    35
P[D = d]     0.1   0.2   0.4   0.3
30 ∧ d       20    25    30    30
(30 − d)+    10     5     0     0
24 ∧ d       20    24    24    24
(24 − d)+     4     0     0     0

Then,

E[(30 ∧ D)] =20(0.1) + 25(0.2) + 30(0.4) + 30(0.3) = 2 + 5 + 12 + 9 = 28


E[(30 − D)+ ] =10(0.1) + 5(0.2) + 0(0.4) + 0(0.3) = 1 + 1 = 2.

Also,

E[(24 ∧ D)] =20(0.1) + 24(0.2) + 24(0.4) + 24(0.3) = 23.6


E[(24 − D)+ ] =4(0.1) + 0(0.9) = 0.4.
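These tabulations are easy to mechanize for any discrete demand distribution. A small Python sketch (the helper name is mine, not from the notes):

```python
def expectations(y, pmf):
    """E[y ∧ D] and E[(y − D)+] for a discrete demand pmf {d: P(D = d)}."""
    e_sold = sum(p * min(y, d) for d, p in pmf.items())
    e_left = sum(p * max(y - d, 0) for d, p in pmf.items())
    return e_sold, e_left

pmf = {20: 0.1, 25: 0.2, 30: 0.4, 35: 0.3}   # Example 2.1's distribution
print(expectations(30, pmf))   # → (28.0, 2.0)
print(expectations(24, pmf))   # ≈ (23.6, 0.4)
```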

Exercise 2.2. Verify that (y ∧ D) + (y − D)+ = y. Then, E[y ∧ D] + E[(y − D)+ ] = y.

Example 2.2. Let D ∼ Uniform(20, 40). What would E[25 ∧ D] and E[(25 − D)+] be? From
the uniform distribution, we have

f(x) = 1/20 if 20 ≤ x ≤ 40, and f(x) = 0 otherwise.

Then,

E[25 ∧ D] = ∫_{20}^{40} (25 ∧ x) f(x) dx = (1/20) ∫_{20}^{40} (25 ∧ x) dx
          = (1/20) ∫_{20}^{25} x dx + (1/20) ∫_{25}^{40} 25 dx
          = (1/20)(1/2)(25² − 20²) + (1/20)(25)(15) = 5.625 + 18.75 = 24.375,

E[(25 − D)+] = ∫_{20}^{40} (25 − x)+ f(x) dx = (1/20) ∫_{20}^{25} (25 − x) dx
             = (1/20) [ ∫_{20}^{25} 25 dx − ∫_{20}^{25} x dx ]
             = (1/20) [ 25 · 5 − (1/2)(25² − 20²) ] = (1/20)(125 − 112.5) = 0.625.
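These two integrals evaluate to 24.375 and 0.625. A quick numerical cross-check in Python; the midpoint Riemann sum below is an illustrative stand-in for the exact integration:

```python
# D ~ Uniform(20, 40), y = 25: approximate both expectations with a
# midpoint Riemann sum over [20, 40].
a, b, y = 20.0, 40.0, 25.0
n = 200_000
h = (b - a) / n
density = 1.0 / (b - a)   # uniform pdf value on [20, 40]

e_min = e_left = 0.0
for i in range(n):
    x = a + (i + 0.5) * h          # midpoint of the i-th cell
    e_min += min(y, x) * density * h
    e_left += max(y - x, 0.0) * density * h

print(round(e_min, 4), round(e_left, 4))  # → 24.375 0.625
```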

3 Lecture 3: Jan 20
You learned E[y ∧ D] and E[(y − D)+] in the last class. In Homework 2, you will have to
compute E[(8 − D)−], the expected shortage, which determines the shortage cost incurred if
you run out of stock. Let us be clear here about x+ and x−:

x+ = max{x, 0} = x, if x ≥ 0;  0, if x < 0,
x− = max{−x, 0} = −x, if x ≤ 0;  0, if x > 0.

For example, 7+ = 7, (−7)+ = 0, 7− = 0, (−7)− = 7. Therefore, for every real number x,

x = x+ − x−.
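A trivial Python check of these definitions and the identity:

```python
def pos(x):   # x+ = max{x, 0}
    return max(x, 0)

def neg(x):   # x- = max{-x, 0}
    return max(-x, 0)

# 7+ = 7, (-7)+ = 0, 7- = 0, (-7)- = 7, and x = x+ - x- always.
print(pos(7), pos(-7), neg(7), neg(-7))  # → 7 0 0 7
for x in (7, -7, 0, 2.5, -3.25):
    assert x == pos(x) - neg(x)
```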

3.1 Newsvendor Problem Cont’d


The tradeoff in newsvendor problems is as follows.

• If you order too much, you have unused inventory that perishes after one period.

• If you order too little, you lose sales to customers who were willing to buy.

Key variables are:

(i) Cp : selling price

(ii) Cv : buying price

(iii) Cs : salvage value

(iv) D: the demand of one period (random variable)

(v) y: order quantity

Also note that Cs < Cv < Cp .


Find the optimal order quantity y∗ to maximize the expected profit:

g(y) = E[P (D, y)] = (Cp − Cv )E[y ∧ D] − (Cv − Cs )E[(y − D)+ ]

Let us now find out the optimal order quantity y ∗ . Let F (x) be the cumulative distribu-
tion function of D.

F (x) = P[D ≤ x]

First, suppose D is a continuous random variable with pdf f(x). Then

F(x) = ∫_0^x f(t) dt,

E[y ∧ D] = ∫_0^∞ (y ∧ x) f(x) dx = ∫_0^y x f(x) dx + ∫_y^∞ y f(x) dx,

E[(y − D)+] = ∫_0^∞ (y − x)+ f(x) dx = ∫_0^y (y − x) f(x) dx,

∫_y^∞ y f(x) dx = y ∫_y^∞ f(x) dx = y ( 1 − ∫_0^y f(x) dx ).

Hence, the profit function becomes

g(y) = (Cp − Cv)E[y ∧ D] − (Cv − Cs)E[(y − D)+]
     = (Cp − Cv) [ ∫_0^y x f(x) dx + ∫_y^∞ y f(x) dx ] − (Cv − Cs) ∫_0^y (y − x) f(x) dx
     = a [ ∫_0^y x f(x) dx + y ( 1 − ∫_0^y f(x) dx ) ] − b ∫_0^y (y − x) f(x) dx
     = (a + b) ∫_0^y x f(x) dx − (a + b) y ∫_0^y f(x) dx + a y,

where a = Cp − Cv, b = Cv − Cs. Set g′(y) = 0 to find y∗, using a theorem from calculus.

Theorem 3.1 (Fundamental Theorem of Calculus). For a function

H(y) = ∫_c^y h(x) dx,

where c is a constant, we have H′(y) = h(y).

Then,

g′(y) = (a + b) y f(y) − (a + b) y f(y) − (a + b) ∫_0^y f(x) dx + a = −(a + b) F(y) + a.

Setting g′(y∗) = 0, we get

F(y∗) = a/(a + b) = (Cp − Cv)/(Cp − Cv + Cv − Cs) = (Cp − Cv)/(Cp − Cs).

To verify that g(y∗) is a maximum, take the second derivative: g″(y) = −(a + b) f(y) =
−(Cp − Cs) f(y) ≤ 0, implying that g(y) is concave. So g(y∗) must be a maximum!

[Figure: the concave curve g(y), attaining its maximum value g(y∗) at y = y∗.]

Example 3.1. Cp = 30, Cv = 10, Cs = 5, and

f(x) = 1/5 if 5 ≤ x ≤ 10, and f(x) = 0 otherwise,

F(x) = 0, if x < 5;  ∫_5^x (1/5) dy = (x − 5)/5, if 5 ≤ x ≤ 10;  1, if x > 10.

Then F(y∗) = (Cp − Cv)/(Cp − Cs) = 20/25 = 0.8, so (y∗ − 5)/5 = 0.8 and y∗ = 9.

Let us turn our attention to the discrete case: D is a discrete random variable taking
values in {d0, d1, d2, · · · } with pmf P[D = di] = pi.

d            d0    d1    d2    · · ·
P[D = d]     p0    p1    p2    · · ·

Then, F(x) = P[D ≤ x] = Σ_{i: di ≤ x} pi. For discrete D, an optimal order quantity y∗ is
the smallest y such that

F(y) ≥ (Cp − Cv)/(Cp − Cs).    (3.1)
Remark 3.1. (i) Because D can only take values in {d0, d1, · · · }, the above y∗ must be one
of the di's.

(ii) For the continuous demand case,

F(y∗) = (Cp − Cv)/(Cp − Cs),    (3.2)

and we can see that (3.2) is a special case of (3.1).

Example 3.2. Discrete demand, D, is

d 20 25 30 35
P[D = d] 0.1 0.2 0.4 0.3

Also, Cp = 1, Cv = 0.25, Cs = 0. Then, y ∗ is the smallest y such that F (y) ≥ 0.75. Since
F (20) = 0.1, F (25) = 0.3, F (30) = 0.7, F (35) = 1, y ∗ = 35.

3.1.1 Summary of the Solution to the Newsvendor Problem


(i) When D is a continuous random variable, choose y∗ such that

F(y∗) = (Cp − Cv)/(Cp − Cs).

(ii) When D is a discrete random variable, choose y∗ to be the smallest y such that

F(y) ≥ (Cp − Cv)/(Cp − Cs).
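The discrete rule is easy to mechanize. A small Python helper (the function name is mine), checked against Example 3.2's data:

```python
def optimal_discrete(pmf, cp, cv, cs):
    """Smallest y with F(y) >= (cp - cv)/(cp - cs) for a discrete demand pmf."""
    ratio = (cp - cv) / (cp - cs)
    cum = 0.0
    for d in sorted(pmf):
        cum += pmf[d]
        if cum >= ratio - 1e-12:   # small tolerance for floating point
            return d
    raise ValueError("pmf does not sum to 1")

# Example 3.2: Cp = 1, Cv = 0.25, Cs = 0 → critical ratio 0.75.
pmf = {20: 0.1, 25: 0.2, 30: 0.4, 35: 0.3}
print(optimal_discrete(pmf, 1.0, 0.25, 0.0))   # → 35
```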

Example 3.3. Discrete D

d 0 10
P[D = d] 0.5 0.5

Cp = 2, Cv = 1, Cs = 0

F(y) ≥ (2 − 1)/2 = 0.5 ⇒ y∗ = 0.

For an arbitrary 0 ≤ y ≤ 10,

E[y ∧ D] = (0.5)(0) + (0.5)y = 0.5y,
E[(y − D)+] = 0.5y,

so

g(y) = (Cp − Cv)E[y ∧ D] − (Cv − Cs)E[(y − D)+] = E[y ∧ D] − E[(y − D)+] = 0.

Every y in [0, 10] is optimal.

4 Lecture 6: Feb 1
Before beginning today's lecture, let me remind you to register for the Littlefield game this week.

4.1 Queues and Waiting Times


Think about a service system. All of you must have experienced waiting in one. One
example would be the Student Center or a restaurant; these are human systems. A more
automated service system with a queue would be a call center with automated answering
machines. We can also imagine a manufacturing system instead of a service system.
These waiting systems can be generalized as a set of buffers and servers. The number of
servers can vary from one to many to infinity. The buffer can also be either finite or infinite.
To simplify the model, assume that there is a single server and an infinite buffer. By an
infinite buffer, we mean that the waiting space is so large that it is as if no limit exists.
(The first game you will be playing is a production system.)
In this setting, what would be the performance measures of such systems?

(i) # of servers

(ii) processing times

(iii) inter-arrival times

(iv) expected time spent in the buffer (i.e. waiting time in queue excluding service time)

The general notation for such a queueing system is G/G/1. The first G means that the
inter-arrival times follow a general distribution, the second G means that the processing
times follow a general distribution, and the 1 means a single-server system. A server is
assumed to work on one customer at a time.
Note that there are two views of this system: the manager's perspective and the customers'
standpoint. The performance measures above are important from the manager's view; each
customer cares more about how long he/she has to wait.
In this queueing system, there are two inputs: inter-arrival time and processing time.
Let vi be the processing time of the i-th customer and ui be the inter-arrival time between
the (i − 1)-th customer and the i-th customer. Given

{ui : i = 1, 2, · · · },
{vi : i = 1, 2, · · · },

we can say that the dynamics of this queue are known. By dynamics, we mean that

Q(t) = # of customers in queue at time t,
Z(t) = # of customers in system at time t

can be figured out at any time t.

Example 4.1. Assume that the system is empty at t = 0. Assume that

u1 = 1, u2 = 3, u3 = 2, u4 = 3, u5 = 4
v1 = 4, v2 = 2, v3 = 1.

Let us draw the graphs of Q(t) and Z(t) against t.

[Figure: sample paths of Q(t) and Z(t) for 0 ≤ t ≤ 10.]

When we say Q(1), it is ambiguous, so let us make it clear here. At t = 1−, the first
customer has not arrived yet, so Q(1−) = 0; right after t = 1, the customer has arrived
and entered service, so Z(1+) = 1 while the queue itself is still empty. We let Q(t) = Q(t+)
so that the sample path becomes right-continuous.
You need to keep your eyes on Q and Z over time, and your manager may want to know
the average queue size over the time window [0, T]. Let T = 8.5. Then

Average Queue Size = (1/T) ∫_0^T Q(t) dt = (1/T) × (area under Q from 0 to T) = (1 + 1)/8.5 = 2/8.5.
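This calculation can be scripted. A small Python sketch (not course code): each waiting customer contributes exactly its waiting time to the area under Q, so the area equals the total waiting time of the customers served in the window.

```python
u = [1, 3, 2]   # inter-arrival times of the customers arriving before t = 8.5
v = [4, 2, 1]   # their service times (from Example 4.1)

# Arrival epochs: 1, 4, 6.
arrivals, t = [], 0
for ui in u:
    t += ui
    arrivals.append(t)

# Waiting time of each customer: service starts when the customer has
# arrived and the server is free.
waits, free_at = [], 0
for a, s in zip(arrivals, v):
    start = max(a, free_at)
    waits.append(start - a)
    free_at = start + s

# Total waiting time = area under Q(t), so the time average over [0, 8.5]:
T = 8.5
print(waits, sum(waits) / T)   # → [0, 1, 1] and 2/8.5 ≈ 0.235
```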
The last number, the average queue size, fluctuates over time. However, we can imagine
that if we observe enough customers, it may settle down. Let us generalize this idea. Let wi
be the waiting time of the i-th customer and w̄n be the average waiting time of the first n
customers:

w̄n = (w1 + w2 + · · · + wn)/n.
Assuming we look at enough customers, we can send n to ∞:

wq = lim_{n→∞} w̄n.

Example 4.2. Let w1 = 0, w2 = 1, w3 = 1, w4 = 0. Then

w̄4 = (0 + 1 + 1 + 0)/4 = 1/2 minute.
Since you can compute wi given ui and vi , you should be able to solve the questions in
homework 3.

4.1.1 Lindley Equation
In fact, you do not even have to compute the wi one by one:

wn+1 = (wn + vn − un+1 )+

If wn + vn ≤ un+1, the (n + 1)-th customer does not have to wait and goes into service as
soon as he/she arrives. This recursion is called the Lindley equation. You can verify it using
spreadsheet software with three columns: ui, vi, wi.
Intel's manufacturing process usually takes about six weeks to finish a product, and each
process is composed of many subprocesses. Since each machine is very expensive, it is
desirable to keep the machines running as much of the time as possible. So managers used
to release far more parts than a machine can process, so that the machine always has
something to work on. These decisions can be modeled and optimized using queueing theory
as well.

4.1.2 Simulation of a Simple Queue


Using Matlab, we can simply run a simulation.

>> u = unidrnd(6,1,10)
>> mean(u)
>> v = unidrnd(5,1,10)
>> mean(v)

We threw a six-sided die for u and a five-sided die for v. You can use other distributions
such as Gamma distribution which will be taught in the next class.

>> u = gamrnd(5,10,1,10000);
>> v = gamrnd(5,10,1,10000);

After creating u and v, compute w using the Lindley equation.

>> w = zeros(1,10);
>> for i=1:10
>> if i==1 w(i)=0;
>> else w(i)=max(w(i-1)+v(i-1)-u(i),0);
>> end
>> end
>>
>> sum(w)/10
>> mean(w)
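For readers without Matlab, the same Lindley recursion can be sketched in Python; the dice-roll inputs mirror the `unidrnd` calls above:

```python
import random

random.seed(1)
n = 10
u = [random.randint(1, 6) for _ in range(n)]   # six-sided die inter-arrivals
v = [random.randint(1, 5) for _ in range(n)]   # five-sided die service times

# Lindley equation: w(1) = 0, w(i) = max(w(i-1) + v(i-1) - u(i), 0).
w = [0] * n
for i in range(1, n):
    w[i] = max(w[i - 1] + v[i - 1] - u[i], 0)

print(w, sum(w) / n)   # average waiting time of the first n customers
```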

5 Lecture 7: Feb 3
5.1 G/G/1 Queue Cont’d
In the last lecture, we introduced the G/G/1 queue. For this system, we needed two sequences
of random variables:
{ui : i = 1, 2, · · · } inter-arrival times


{vi : i = 1, 2, · · · } service times

These two sequences are the input for the queueing system. Performance measures for this
system were as follows.

wi waiting time of the i-th customer


Q(t) Queue size at time t

Imagine a service queue at the Bank of America. Each day, the ui, vi may differ. How do
you deal with changing input that seems totally random? We need something invariant to
make the model useful. Although each day's actual sequences of ui, vi may be different, they
may be statistically the same. That is why we look at the distribution of the randomness.
Assume that {ui : i = 1, 2, · · · } is a sequence of iid (independent and identically dis-
tributed) random variables having distribution

Fa (x) = P(u1 ≤ x) = P(u2 ≤ x) = P(u3 ≤ x) = · · · = P(ui ≤ x) = · · · .

(The subscript a signifies “arrival”.) Later equalities in the equation above hold because of
the iid condition. iid means
(i) how long it took for the previous customers to arrive does not affect the time it takes
for the next customer to arrive, i.e., inter-arrival times are independent

(ii) how long it takes for each customer to arrive follows the exactly same distribution.
Also, assume that {vi : i = 1, 2, · · · } is a sequence of iid r.v.’s having cdf Fs (x) where the
subscript s signifies “service”.
For example, if you observe vi on different days as follows,

Monday: v1, v2, · · · = 1, 2, 3, 3, 5, 2, 2, 2, 1, · · ·
Tuesday: v1, v2, · · · = 5, 1, 4, 2, 1, 1, 2, 3, · · ·

then without some commonality in the input, our model is quite useless. That is why we
need the iid assumption: we can then assume that the different realizations all come from
the same distribution.
Now we have distributions for the r.v.'s ui, vi. Where do we start? Maybe by looking at
the expectation of each r.v.

Example 5.1. For example,

E(ui ) =mean inter-arrival time = 2min


E(vi ) =mean service time = 4min.

In this case, we are short of capacity and the queue will grow over time. How about the
arrival rate which means how many customers arrive during unit time?
1
arrival rate λ =
E(u1 )
mean processing time m =E(v1 )

Then, we can define the “traffic intensity” using the two variables defined above.

traffic intensity ρ = λm

Example 5.2. For example,

λ = 1/3 customers/min, m = 2 min/customer ⇒ ρ = 2/3.

Note that ρ is dimensionless. Another metric we will be interested in is the utilization of the
server; in this case, the proportion of the server's time that is used to serve customers equals
2/3.
Then why don't we call ρ utilization instead of traffic intensity? In fact, utilization just
happens to equal ρ when ρ ≤ 1. In the first example, we had ρ = 2 > 1, which implies that
the utilization is 100%; utilization cannot exceed 100%. We can summarize this as

utilization = ρ, if ρ ≤ 1;  100%, otherwise.

5.2 Fluctuation Effect


Let us move on to another example. Say we have the following ui, vi:

u1 = u2 = · · · = un = 3 ⇒ E[ui] = 3, ∴ λ = 1/3,
v1 = v2 = · · · = vn = 2 ⇒ E[vi] = 2, ∴ m = 2 ⇒ ρ = 2/3.

What would the values of w1, w2, · · · be? Obviously w1 = w2 = · · · = 0, because there is no
randomness here. Therefore,

w̄n = (w1 + w2 + · · · + wn)/n = 0 ⇒ lim_{n→∞} w̄n = 0 = wq.

Then, in general, is it true that wq = 0 always holds if ρ ≤ 1? The answer is NO, because
nothing has been said about the nature of the randomness in ui, vi. There is a formula,
called the Kingman approximation formula, for the average waiting time.

Theorem 5.1 (Kingman heavy-traffic approximation formula for average waiting time).
Assume ρ < 1 and ρ is close to 1. Then

wq ≈ m (ρ/(1 − ρ)) ((c²a + c²s)/2),

where

c²s = Var(v1)/[E(v1)]², the squared coefficient of variation (SCV),
c²a = Var(u1)/[E(u1)]².
Let us look at c²s first. Variance is already one way to measure the variability of a random
variable, so why don't we just use the variance of v1 instead of this new metric? The answer
is normalization. If we use the variance as it is, its numeric value depends on the time unit
we measure in. If Var(v1) = 100 sec², the value shrinks a lot if we use hours as the time
unit. The variability of a r.v. is intrinsic, and our intuition tells us that the measure of
variability should also be intrinsic, independent of the time unit. That is why we divide the
variance by the squared expectation.
Now look at the approximation formula itself. In the second term, ρ/(1 − ρ) → ∞ as
ρ ↑ 1. The third term captures the variability of the inter-arrival times and service times.
This formula also has great practical value because it involves only the first two moments
of Fa and Fs: E[u1], E[u1²], E[v1], E[v1²]. From a data-collection standpoint, it is much
easier to obtain the first few moments of a r.v. than its whole distribution.
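Since the formula needs only λ, m, and the two SCVs, it is a one-liner to evaluate. A hypothetical Python helper (not from the notes); with exponential inter-arrival and service times (c²a = c²s = 1) it happens to match the exact M/M/1 mean waiting time:

```python
def kingman_wq(lam, m, ca2, cs2):
    """Kingman's approximation: wq ≈ m · (rho/(1 − rho)) · (ca2 + cs2)/2."""
    rho = lam * m
    if not rho < 1:
        raise ValueError("formula requires rho < 1")
    return m * (rho / (1 - rho)) * (ca2 + cs2) / 2

# Mean inter-arrival time 3 min (lam = 1/3), mean service time 2 min,
# exponential on both sides, so rho = 2/3:
print(kingman_wq(1/3, 2, 1, 1))   # ≈ 4 minutes
```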
How much can we trust this formula, given that it is an approximation? In practice, if
ρ > 85%, it is quite a good approximation; that is why it is called a "heavy-traffic"
approximation. What if ρ ≪ 1? In that case, the waiting time is usually very low, so you
either don't care about it or it is not the part that needs your attention.
Are you reading the assigned book, The Goal? It is about identifying bottlenecks in a
plant. When you are the manager of a company running an expensive machine, you usually
want to run it all the time at full utilization. However, the implication of Kingman's formula
is that as utilization approaches 100%, the waiting time skyrockets. If there is any
uncertainty or random fluctuation in the input to your system, the system will suffer
greatly. In the low-ρ region, increasing ρ is not that bad; near ρ = 1, increasing utilization
a little can lead to disaster.
Ten years ago, Atlanta did not suffer that much from traffic problems. As its traffic
infrastructure capacity gets closer to the demand, it becomes more and more fragile to
uncertainty.
A lot of the strategies presented in The Goal are, in fact, ways to decrease ρ. You can
reduce the ρ of your system in various ways, for example by outsourcing some processes.
You can also strategically manage or balance the load on different parts of your system:
you may want to utilize the customer service organization 95% of the time, while the
utilization of the sales people is 10%.

5.3 Long-run Average Queue Size
Queue size is another important performance metric for managers. The long-run average
queue size per unit time can be computed by the formula

lq = lim_{T→∞} (1/T) ∫_0^T Q(s) ds.

Recall there was another metric, wq. The quantity wq averages over n customers, while lq
averages over time. Hence,

lq: time average,
wq: per-customer (headcount) average.

Now move on to another important law.

Theorem 5.2 (Little's Law). Consider any system as a black box:

INPUT ⇒ BLACK BOX ⇒ OUTPUT.

Let

w = expected time in the box per customer in steady state,
l = average number in the box.

Then l = λw, where λ = rate into the box = rate out of the box.

If the system of focus is the queue alone,

lq = λ wq.

Little's Law is much more general than the G/G/1 queue; it can be applied to any black
box with a definite boundary. The Georgia Tech campus can be one black box; the ISyE
building itself can be another. In a G/G/1 queue, we can easily get the average number in
the queue, in service, or in the whole system, depending on how we draw the box around
the queueing system:

Average size in system = λ × [Average time in system],
Average size in queue = λ × [Average waiting time in queue].

Kingman's formula already tells us a lot, and I will talk about other important performance
measures next time.

6 Lecture 8: Feb 8
Today we will connect the topics we have learned so far:

(i) Kingman approximation formula

(ii) Little’s Law

(iii) Bottlenecks in production

6.1 Kingman’s Approximation Formula

E(w) = w ≈ m (ρ/(1 − ρ)) ((c²a + c²s)/2),  ρ < 1,

where m is the mean service time and ρ = λm is the traffic intensity. When ρ < 1, ρ is the
utilization of the system.
Let vi ∼ exp with mean 2 minutes; then vi is a continuous r.v. with pdf

f(x) = (1/2) e^{−x/2} = λ e^{−λx},  x ≥ 0.

Here E(vi) = 2, Var(vi) = (1/λ)² = 4, and c²(vi) = 4/2² = 1. If c²(vi) = 0, then vi is
deterministic, because it means the variance is zero. Let us also learn about the gamma
distribution:

vi ∼ Gamma(α, β).

Here α is the shape parameter and β is the scale parameter. When α is a positive integer,
a gamma random variable can be expressed as a sum of exponential random variables: if
X ∼ Gamma(α, β), then X = X1 + X2 + · · · + Xα with the Xi iid and exponentially
distributed with mean β.

Example 6.1. Let us experiment with our service system with 100,000 customers.

>> n=100000

Then, generate u first from the gamma distribution.

>> u=gamrnd(1.0, 6, 1, n)
>> mean(u)
6.0030
>> var(u)
15.7406
>> var(u)/mean(u)^2
0.9915

In u, α = 1.0, β = 6.

>> v=gamrnd(.5, 10, 1, n)


>> mean(v)
4.9924
>> var(v)/mean(v)^2
2.0102

Since the mean of a gamma r.v. is αβ, the mean of v should be around 5. On average, there
will be an arrival every six minutes.

>> w=zeros(1,n);
>> for i=2:n
w(i) = max(w(i-1)+v(i-1)-u(i), 0);
end
>> mean(w)
37.3025

Then, some customers do not have to wait, but some do. The average waiting time is
around 37 minutes per customer. Let us see what Kingman's formula tells us.

>> rho=5*(1/6.0)
>> rho
0.8333
>> m=5
>> m*rho/(1-rho)*(1+2)/2
37.5000

It is quite accurate, as you can see: ρ = λm = 5/6. Then, how do we get the parameters
for your model? You need to observe your own system, collect data, and derive the
parameters, perhaps using some statistics. The beauty of Kingman's formula is that you
do not have to obtain the whole distribution of the randomness in your system; estimating
an entire distribution is much harder than estimating the mean and variance of random
variables.

6.2 Little’s Law


Example 6.2. Suppose u(i) = 1, 3, 2, 3, 4 and v(i) = 4, 2, 1. Plot Z(t) vs. t. If T = 9, what
is the average number of customers in the system over [0, 9)?

L = (1/T) ∫_0^T Z(s) ds = (1/9)(0 + 3 + 2 + 1 + 2 + 1 + 0) = 9/9 = 1

More than one customer is in the system at some times, but the average is 1. Now think
about the time spent in the system by each customer: S1 = 4, S2 = 3, S3 = 2, so

S̄ = (1/3)(S1 + S2 + S3) = (1/3)(4 + 3 + 2) = 3 minutes.

How about the arrival rate?

λ = 3/9 = 1/3
What does Little's Law tell us?

average number of customers in the system = arrival rate × average time in system
L = λw
1 = (1/3) × 3

It works. The beauty of Little's Law is that, as long as the measurements for L, λ, and w
are consistent, it can be applied to any black-box system. Also note that L and w denote
averages computed over a long period of time.
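The bookkeeping in this example is mechanical enough to script. A small Python check (a sketch, not course code) that recomputes L, λ, and w from the input sequences and confirms L = λw:

```python
u = [1, 3, 2, 3, 4]   # inter-arrival times
v = [4, 2, 1]         # service times of the three customers served by T = 9

# Arrival epochs (1, 4, 6) and departure epochs (5, 7, 8).
arrivals, t = [], 0
for ui in u[:len(v)]:
    t += ui
    arrivals.append(t)

departures, free_at = [], 0
for a, s in zip(arrivals, v):
    start = max(a, free_at)   # service begins when arrived and server free
    free_at = start + s
    departures.append(free_at)

T = 9
sojourns = [d - a for d, a in zip(departures, arrivals)]  # S1, S2, S3
L = sum(sojourns) / T                  # time-average number in system
w = sum(sojourns) / len(sojourns)      # average time in system
lam = len(arrivals) / T                # arrival rate

print(L, lam * w)   # Little's law: the two values agree
```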

6.3 Bottleneck in Production


Suppose a production system consists of two queues in series, one feeding the other. Let λ
be the arrival rate into the first queue and m1, m2 be the average service times at the two
stations.

Example 6.3. If λ = 1/4, m1 = 2, m2 = 3, then the utilization of queue 1 would be
ρ1 = λm1 = 50% and that of queue 2 would be ρ2 = λm2 = 75%.

Raw material → Oven → Paint

6.4 Throughput
Throughput is the rate of output flow from a system. Let µ = 1/m denote the service rate.

• If ρ ≤ 1, throughput = λ.

• If ρ > 1, throughput = µ.

Example 6.4. Consider a system with two queues linked in series, as in the previous
example.

(i) If λ = 15 units per minute, µA = 20 units per minute, and µB = 25 units per minute,
compute the throughput of the whole system.

ρA = λ/µA = 15/20 = .75,  ρB = 15/25 = .6

Since both traffic intensities are less than 1, the throughput of A is λ = 15, and that of B
is also λ = 15.

(ii) Suppose that λ = 30 units per minute while everything else remains the same.

ρA = 30/20 = 1.5,  ρB = 20/25 = .8

The throughput of A is µA = 20, so the arrival rate at station B is 20, not 30. Thus B's
traffic intensity is .8 ≤ 1. The throughput of B is 20, and it is also the throughput of the
system because B is the terminal station.
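The propagation rule in this example (each station passes on the minimum of its input rate and its own capacity) can be written as a tiny Python helper (hypothetical name, not from the notes):

```python
def tandem_throughput(lam, service_rates):
    """Flow through stations in series: each station outputs
    min(input rate, its capacity mu)."""
    rate = lam
    for mu in service_rates:
        rate = min(rate, mu)
    return rate

# Case (i):  lambda = 15, mu_A = 20, mu_B = 25 -> nothing is capped.
# Case (ii): lambda = 30                       -> station A caps flow at 20.
print(tandem_throughput(15, [20, 25]))  # → 15
print(tandem_throughput(30, [20, 25]))  # → 20
```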

7 Lecture 9: Feb 10
7.1 The Wisdom of Jonah
The most important part of this book is how to manage bottlenecks: first identifying them,
then managing them. Three quantities are mentioned in the book:

(i) Throughput: the rate at which the system generates money through sales. If you
produce something but do not sell it, it is not throughput.

(ii) Inventory: all the money that the system has invested in purchasing things it intends
to sell.

(iii) Operational Expense.

The core of the book is capacity management:
(i) Identify the system bottlenecks (constraints).
(ii) Increase effective capacity of the system bottlenecks.
(iii) Adjust product mix, if appropriate.
If you are selling vehicles, say cars and trucks, you could sell half and half, or 70:30.
The point is that if you change your product mix, the bottleneck may shift, because
each product requires different resources.
(iv) Subordinate everything else to the bottlenecks (for example, drum & ropes).
Team has to march together.
(v) If, in a previous step, a bottleneck constraint has been broken, return to step 1.
There is always a bottleneck. If you don't know where it is, you may be the one.
Then, how do we increase the effective capacity of bottlenecks?
(i) Never let the bottleneck be idle, because the throughput of the system equals the throughput
of the bottleneck. Do not overload the bottleneck with too much work in progress; let the
bottleneck “pull” work from a buffer.
(a) Never lack an operator, change lunch and break schedule, etc.
(b) Maintain a buffer inventory in front of the bottleneck so that it is never starved.
(ii) Squeeze as much output as you can from the bottleneck.
(a) Use your best workers at the bottleneck.
(b) Reduce bottleneck setup time.
(c) Split and overlap batches to increase throughput.
I will post the slides to t-square.

7.2 Discrete Time Markov Chain
We are now moving on to another part of the course. We have three more modules. After
DTMC, we will be talking about Poisson process and then continuous-time Markov chain.
So far, we have had only one decision per period, because the item was perishable: you had to
get rid of the leftovers. Now, we will be talking about non-perishable items.

Example 7.1 (Inventory Model for Non-Perishable Item). Dn is the demand in the nth
period (days or weeks). Note that inventory that is left at the end of a week can be used to
satisfy the demand in the following week. For example, {Dn } is an iid sequence.

d           0     1     2     3
P(Dn = d)  1/8   1/4   1/2   1/8

Let Cp =$100, Cv =$50, Cf =$100, h=$10. What is the optimal inventory policy to maximize
the long-run average profit per week? Do not confuse this with the newsvendor problem;
there, you had to get rid of items at the end of each period.
How do you analyze the performance of an inventory policy? Every Friday 5pm, let’s say
we decide how much to order for the following week so that the ordered items will arrive at
8am the following Monday.
How do we write down an example inventory policy? Let Xn be the inventory level at
the end of period n. Say, if Xn ≥ 2, do not order; if Xn < 2, order up to S = 4 items. This
type of policy is called an (s, S) policy. In general, we can define it as

(i) If Xn ≤ s, order S − Xn items.

(ii) Otherwise, do not order.

This is a very popular policy; virtually every company has some version of it. To
decide what values s and S should take, you need some analysis. Once you have finished this class,
you will know how to decide that.
We begin the analysis by building a table.

8 Lecture 10: Feb 15
8.1 Discrete Time Markov Chains (DTMCs)
Example 8.1. Assume iid demand

d 0 1 2 3
P(D = d) 1/8 1/4 1/2 1/8

In the real world, demand is usually not iid; there could be some seasonality. But in some
businesses, like Walmart selling items as cheaply as possible, demand can be modeled as iid.
Assume our inventory policy is (s, S) = (1, 4). Let Xn be the inventory level at the end
of week n. Note that the values Xn can take are in {0, 1, 2, 3, 4}. Does there exist an s such
that {Xn : n = 1, 2, · · · } is an iid sequence? It is a time series.
Consider the following probability. What would the value be?

P(Xn+1 = 3|Xn = 2) = 0

Xn = 2 means that nth week ends with 2 items and Xn+1 = 3 means that (n + 1)th week
ends with 3 items. How about the following probability?
P(Xn+1 = 3|Xn = 1) = P(Dn+1 = 1) = 1/4
Note that we will order according to our inventory policy over the nth weekend, so for this
transition the demand in the (n + 1)th week should be 1. A matrix is a good way to present
these probabilities in a neat form.

      0    1    2    3    4
0 [  0   1/8  1/2  1/4  1/8 ]
1 [  0   1/8  1/2  1/4  1/8 ]
2 [ 5/8  1/4  1/8   0    0  ]
3 [ 1/8  1/2  1/4  1/8   0  ]
4 [  0   1/8  1/2  1/4  1/8 ]

Let us formally define DTMC. A DTMC has the following elements.

(i) State space S, e.g. S = {0, 1, 2, 3, · · · }: You will see that S does not have to be finite.
(ii) Transition probability matrix P = (Pij ) such that Pij ≥ 0 and ∑_{j∈S} Pij = 1: This is
the matrix you just saw above. (Each row should sum to 1, but each column does not
have to.)

(iii) Initial state (distribution): This is how much inventory you are given at the starting
point. It is the information about X0 .

Definition 8.1 (Discrete Time Markov Chain). A discrete time stochastic process X =
{Xn : n = 0, 1, 2, · · · } is said to be a DTMC on state space S with transition matrix P if for
each n ≥ 1 for i0 , i1 , i2 , · · · , i, j ∈ S
P(Xn+1 = j|X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1 , Xn = i) = Pij . (8.1)
The most important part of this definition is (8.1). At this point, let us recall the
definition of conditional probability.
P(A|B) = P(A ∩ B)/P(B) = P(A, B)/P(B)
Note that comma and ∩ are interchangeable in this context. This (8.1) is called the Markov
property. In plain English, it says that once you know today’s state, tomorrow’s state has
nothing to do with past information. No matter how you reached the current state, your
tomorrow will only depend on the current state. In mathematical notation,
P(Xn+1 = j|X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1 , Xn = i) = P(Xn+1 = j|Xn = i).
(i) Past states: X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1
(ii) Current state: Xn = i
(iii) Future state: Xn+1 = j
Definition 8.2 (Markov Property). Given the current information (state), future and past
are independent.
From an information-gathering perspective, this is very appealing because you only need to
remember the current state. For an opposite example, Wikipedia keeps track of the full history
of each article, which requires tremendous effort. This is the beauty of the Markov property.
What if you think your situation depends not only on the current week but also on one week ago?
Then you can define the state space so that each state contains two weeks instead of one. I
have to stress that you are the one who decides what your model should be: what the state
space is, etc. You can add a few more assumptions to fit your situation to a Markov model.
This type of DTMC is called a time-homogeneous DTMC. It means that the transition law from
this week to next week is the same as from next week to the following week.
Confirming question: is our inventory model a DTMC? Yes, because we do not have to
know the past stock levels to decide whether to order or not.

8.2 DTMC Models


Example 8.2 (A Two State Model). S = {0, 1} and
   
P =  0 [  α   1−α ]  =  [ 3/4  1/4 ]
     1 [ 1−β   β  ]     [ 1/2  1/2 ]
What can be modeled using this type of DTMC?

(i) Suppose we are modeling the weather of a day. State 0 means hot and state 1 means
cold. Then, the probability of a hot day after a hot day is 3/4.

(ii) Another example is a machine repair process: state 0 means the machine is up
and running, and state 1 means it is under repair.

Example 8.3 (A Simple Random Walk). Suppose you toss a coin at each time n and
you go up if you get a head, down if you get a tail. Then, the state space S = Z =
{· · · , −2, −1, 0, 1, 2, · · · } and Xn is the position after nth toss of the coin.

Pi,i+1 = p,   Pi,i−1 = q,   Pij = 0 otherwise.

You can see that P(Head) = p, P(Tail) = q, p + q = 1, and p can be bigger or smaller than q,
in which case you are tossing a biased coin. Note that if Xn = i then Xn+1 is either i + 1 or
i − 1.

Theorem 8.1. Suppose there is a function

f : S × (−∞, ∞) → S,   f (i, u) ∈ S,

such that Xn+1 = f (Xn , Un+1 ) and {Ui : i = 1, 2, 3, · · · } is an iid sequence. Then, {Xn : n =
0, 1, 2, · · · } is a DTMC.

I will not prove this here, but it is a very useful theorem for checking whether something is a DTMC
or not.

Example 8.4. Let Xn+1 = Xn + Un+1 and Ui is a coin toss at time i. Suppose

P(Un+1 = 1) = p, P(Un+1 = −1) = q

then Xn is a DTMC.

Now let us see whether our inventory model fits this theorem. Let us express the inventory model
in a different way:

Xn+1 = { 4 − Dn+1 ,        if Xn ≤ 1
       { (Xn − Dn+1 )+ ,   if Xn ≥ 2

Then,

f (0, d) = f (1, d) = 4 − d,   d ∈ {0, 1, 2, 3}
f (2, d) = (2 − d)+
f (3, d) = (3 − d)+
f (4, d) = (4 − d)+ .

9 Lecture 11: Feb 17
d 0 1 2 3
P(Di = d) 1/8 1/2 1/4 1/8

Let Xn denote the number of items at the end of week n. Our inventory policy is
(s, S) = (1, 4), which means if the inventory level goes below or equal to 1, you order up to
4. Then,
P(Xn+1 = 1|Xn = 3) = P(D = 2) = 1/4.
Likewise, the transition matrix of this DTMC is

P =  0 [  0   1/8  1/4  1/2  1/8 ]
     1 [  0   1/8  1/4  1/2  1/8 ]
     2 [ 3/8  1/2  1/8   0    0  ]
     3 [ 1/8  1/4  1/2  1/8   0  ]
     4 [  0   1/8  1/4  1/2  1/8 ]
Now, let Yn denote the number of items at the beginning of week n. Is Y = {Yn : n =
0, 1, 2, · · · } a DTMC? Yes. The state space S = {2, 3, 4} and the transition matrix is

P =  2 [ 1/8   0   7/8 ]
     3 [ 1/2  1/8  3/8 ]
     4 [ 1/4  1/2  1/4 ]
For example, let us look at P(Yn+1 = 4|Yn = 2). It corresponds with the cases where demand
is 1 or 2 or 3. Then, leftover on Friday night is less than or equal to 1, so you order up to 4
by the beginning of next week. Mathematically,
Yn+1 = { 4,    if Xn ≤ 1
       { Xn ,  if Xn ≥ 2,

or, since Xn = (Yn − Dn )+ ,

Yn+1 = { 4,         if Yn − Dn ≤ 1
       { Yn − Dn ,  if Yn − Dn ≥ 2.

9.1 Sample Path


Let X = {Xn : n = 0, 1, 2, · · · } be a DTMC with state space S = {0, 1, 2} and transition
matrix
 
P =  0 [  0   1/4  3/4 ]
     1 [ 1/2   0   1/2 ]
     2 [  1    0    0  ]

To simulate the DTMC, you need in general 3 dice. Then, a sample path of the DTMC may
be as follows.
(Figure: one sample path of the DTMC, plotting the state Xn ∈ S = {0, 1, 2} against time n = 0, 1, · · · , 5.)
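The "dice" idea translates into one weighted draw per state. A minimal Python sketch (function names are mine) that rolls the current row's die at each step:

```python
import random

# Transition matrix of the example above, on S = {0, 1, 2}.
P = [[0, 1/4, 3/4],
     [1/2, 0, 1/2],
     [1, 0, 0]]

def sample_path(P, x0, n_steps, seed=0):
    """Generate a sample path by rolling the 'die' of the current row."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        x = rng.choices(range(len(P)), weights=P[x])[0]
        path.append(x)
    return path

path = sample_path(P, 0, 5)
```

Every consecutive transition along the generated path has positive probability under P; a different seed gives a different path, just as rolling the dice again would.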
Note that each run of the DTMC may produce a different sample path. Now, suppose we are computing
P(X4 = 0|X2 = 0). How would you compute it?

P(X4 = 0|X2 = 0) =P(X4 = 0, X3 = 1|X2 = 0) + P(X4 = 0, X3 = 2|X2 = 0)


=P(X4 = 0|X3 = 1, X2 = 0)P(X3 = 1|X2 = 0)
+ P(X4 = 0|X3 = 2, X2 = 0)P(X3 = 2|X2 = 0)
=P(X4 = 0|X3 = 1)P(X3 = 1|X2 = 0)
+ P(X4 = 0|X3 = 2)P(X3 = 2|X2 = 0) ∵ Markov Property
=P01 P10 + P02 P20 (in shorthand notation)
=(1/4)(1/2) + (3/4)(1) = 7/8
In a sense, when you compute transition probability among multiple periods, you virtually
go through all possible paths from the starting state to the ending state.
P(X4 = 0|X2 = 0) = ∑_{k=0}^{2} P(X4 = 0, X3 = k|X2 = 0)
                 = ∑_{k=0}^{2} P(X4 = 0|X3 = k, X2 = 0) P0k
                 = ∑_{k=0}^{2} P(X4 = 0|X3 = k) P0k    (∵ Markov property)
                 = ∑_{k=0}^{2} P0k Pk0

Likewise, we can also compute P(X4 = 1|X2 = 0). But how can we calculate such probabilities
more easily? In fact,

P(X4 = j|X2 = i) = (P²)ij , the (i, j)th entry of P².
  
           [  0   1/4  3/4 ] [  0   1/4  3/4 ]   [ 7/8   0   1/8 ]
P² = P·P = [ 1/2   0   1/2 ] [ 1/2   0   1/2 ] = [ 1/2  1/8  3/8 ]
           [  1    0    0  ] [  1    0    0  ]   [  0   1/4  3/4 ]
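The matrix product is easy to verify numerically; a quick sketch (numpy assumed):

```python
import numpy as np

P = np.array([[0, 1/4, 3/4],
              [1/2, 0, 1/2],
              [1, 0, 0]])

P2 = P @ P
# The (0, 0) entry reproduces the hand computation P(X4 = 0 | X2 = 0) = 7/8,
# and every row of P^2 still sums to 1 (P^2 is again a transition matrix).
```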

In general,
P(Xn+k = j|Xk = i) = P(n)ij = (P^n)ij ,
P(Xn = j|X0 = i) = (P^n)ij .

The following conditional probability formula can help you understand these types of
calculation.
P(A, B|C) = P(A, B, C)/P(C) = [P(A, B, C)/P(B, C)] · [P(B, C)/P(C)] = P(A|B, C) P(B|C)

To compute the distribution of Xn at a particular n, you need the initial distribution. Let
µ = (µ0 , µ1 , µ2 ) denote the initial distribution. µ is a vector whose size equals the size of the state
space. Then, it means

P(X0 = 0) = µ0 , P(X0 = 1) = µ1 , P(X0 = 2) = µ2 .

If you denote the distribution of Xn by µn ,

µn = µ · P n .

10 Lecture 12: Feb 22
10.1 Review for Midterm
Traffic intensity is mathematically defined as
ρ1 = λ m1 = λ/µ1 ,   ρ2 = λ m2 = λ/µ2
where λ is arrival rate and µ is service rate. You should be aware that traffic intensity and
utilization may be different. Depending on situation, these two may differ.
Also,
λ = 1/E(ui ),   m1 = E(vi1 ),   m2 = E(vi2 ).
We also covered Gamma distribution. It has two parameters.
X ∼ Gamma(α, β)
α is shape parameter and β is scale parameter. Erlang distribution is a special case of
Gamma distribution with integer α value. For example,
X ∼ Gamma(10, .2)
X = X1 + X2 + · · · + X10

where the Xi are iid, exponentially distributed with mean 0.2. Therefore,

E[X] = 10 E[Xi ] = 2,   Var(X) = 10 Var(Xi )
c²(X) = Var(X)/(E[X])² = 10 Var(X1 )/(10 E[X1 ])² = (1/10) · Var(X1 )/(E[X1 ])² = 1/10
For general Gamma distribution, you can generalize the result. c2 (X) = 1/α. Hence, if α is
below 1, variability grows.
Suppose we have multiple stations. Say two cases of three stations.
First case: ρ1 = 1.2 ρ2 = .8 ρ3 = .5
Second case: ρ1 = .8 ρ2 = 1.2 ρ3 = .5
Throughput for both systems should be the same. The bottleneck of the first system is station 1, while
that of the second is station 2. Thinking about the utilization of station 3, it is .5/1.2 in both cases, because the flow
feeding from station 2 to station 3 is always λ/1.2. For the first system, station 2 is not overwhelmed,
so it passes items out at the rate at which it gets fed.
Let us think about another case.
ρ1 = .9 ρ2 = .8 ρ3 = .5
In this case, the utilization of station 3 is .5/1 = .5, because no station is overwhelmed.
Throughput is the full arrival rate λ.
Can Kingman's approximation be generalized to multiple stations? Generally, yes.
There is a network version of the approximation, but I will not emphasize it because it
is more of a heuristic.

10.2 DTMC generated by a Recursive Function

Xn+1 = f (Xn , Un+1 )

where {U1 , U2 , U3 , · · · } is an iid sequence. Then, {Xn } is a Markov chain. Fortunately, all
models we have seen are Markov. In the random walk model, Un+1 is a coin toss and
f (x, u) = x + u. The (s, S) inventory model is also a good example fitting here.
For the test, I will focus more on identifying a Markov chain. You would then have to
give (1) the state space, (2) why it is a Markov chain, and (3) the transition matrix. For the state space,
you need to be specific. For example, it may be the temperature of a day: hot or cold. Finally,
you need to provide a matrix filled with transition probabilities.
Let X = {Xn : n = 0, 1, 2, · · · } be a DTMC on S = {1, 2, 3} with transition matrix
 
1 0 .8 .2
P = 2 .5 0 .5 .
3 1 0 0

You need to be able to interpret transition probabilities Pi,j by looking at this matrix. For
example,

P2,3 = .5 = P(Xn+1 = 3|Xn = 2), P3,2 = 0.

If you need to compute for multiple steps, you need to get P 2 or P 3 . Examples are
P(X4 = 3, X2 = 1|X1 = 2) = P2,1 (P²)1,3

P(X4 = 3|X2 = 1) = ∑_{i=1}^{3} P(X4 = 3, X3 = i|X2 = 1) = ∑_{i=1}^{3} P1,i Pi,3 = (P²)1,3 .

I can of course give you a bit more complicated version of this problem.
P(X10 = 1, X4 = 3, X2 = 1|X1 = 2) = P2,1 (P²)1,3 (P⁶)3,1
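Such multi-step probabilities are just products of entries of matrix powers. A small numpy sketch for the chain above (states 1, 2, 3 are mapped to indices 0, 1, 2; variable names are mine):

```python
import numpy as np

P = np.array([[0, .8, .2],
              [.5, 0, .5],
              [1, 0, 0]])   # states 1, 2, 3 -> indices 0, 1, 2

P2 = np.linalg.matrix_power(P, 2)
P6 = np.linalg.matrix_power(P, 6)

# P(X10 = 1, X4 = 3, X2 = 1 | X1 = 2) = P_{2,1} (P^2)_{1,3} (P^6)_{3,1}
prob = P[1, 0] * P2[0, 2] * P6[2, 0]
```

As a sanity check, the Chapman–Kolmogorov relation gives P⁸ = P² · P⁶.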

10.3 Steady-state Concept


Another concept you need to be aware of is steady-state. Think about two probabilities:
P(Xn = 3|X1 = 1), P(Xn = 3|X1 = 2). If n is small, the difference may be significant.
However, if n is large, such as 100 or 1000, the initial condition's effect fades away.
Studying the system for large enough n is steady-state analysis. Then, how long does the DTMC
need to reach steady-state? I do not have a general answer, but the time needed is
called the relaxation time. Regarding relaxation time, you can remember this example: how
many shuffles are needed to get a random enough deck? Seven are enough, by a result of
Persi Diaconis at Stanford.

Then, does steady-state always exist? Sometimes not. Suppose the following transition
matrix.
 
0 1
P =
1 0

In this case, the chain is alternating instead of converging to the steady-state.


Let us take another example.

wn+1 = max(wn + vn − un+1 , 0)

Here wn+1 plays the role of Xn+1 and Un+1 := vn − un+1 . We need the Ui to be iid for the
chain to be Markov. If the vi and ui are iid sequences, then {Un } is also iid. Hence, this chain is a DTMC.

11 Lecture 13: Mar 1
11.1 Stationary Distribution
Let us start with the inventory model we did last time. Assume the demand distribution as
follows.

d 0 1 2 3
P(D = d) .1 .4 .3 .2

If our inventory policy is (s, S) = (1, 3) and Xn is the inventory at the end of each week
(i.e. S = {0, 1, 2, 3}),
 
P =  0 [ .2  .3  .4  .1 ]
     1 [ .2  .3  .4  .1 ]
     2 [ .5  .4  .1   0 ]
     3 [ .2  .3  .4  .1 ]

Let us define µ = (0, 0, 0, 1) as the initial distribution of our DTMC. It means that X0 = 3
with probability 1, i.e. deterministically.
Try the following Matlab codes.

>> mu=[0 0 0 1]
>> mu*P^2
>> mu*P^10
>> mu*P^100
>> mu*P^101

As you will find, the probability distributions µP^100 and µP^101 are the same. How fast does
this distribution reach the final distribution? Even just after 10 weeks, it seems the
chain has already reached the stable stage. This is related to the relaxation time. In this case,
the relaxation time seems to be quite short.
What would you call limn→∞ µn ? We can call it the "limiting distribution", which is in
this case (.2923, .3308, .3077, .0692). In mathematical notation, π = (π0 , π1 , π2 , π3 ). What
condition should this π satisfy?
∑_{i=0}^{3} πi = 1
πP = π

The first condition is due to the fact that π is a probability distribution, but the second condition may
not be directly intuitive. Let us think of it this way: since we know µ100 ≈ µ101 , let us call that
value π. We also know that µ101 = µ100 P . Hence, π = πP .

Definition 11.1 (Stationary Distribution). Suppose π = (πi , i ∈ S) satisfies
(i) πi ≥ 0 and ∑_{i∈S} πi = 1

(ii) π = πP .
Then, π is said to be a stationary distribution of the DTMC.
There are two interpretations of stationary distribution. Suppose π0 = .29 = 29%.
• Long run fraction of time that this DTMC stays in state 0 is 0.29.

• Probability that this DTMC will be in state 0 if you run this system for long time is
0.29.
Example 11.1. Let us apply this to our transition matrix. Using π = πP , we have the
following system of linear equations.

π0 =.2π0 + .2π1 + .5π2 + .2π3


π1 =.3π0 + .3π1 + .4π2 + .3π3
π2 =.4π0 + .4π1 + .1π2 + .4π3
π3 =.1π0 + .1π1 + 0π2 + .1π3
1 = ∑_{i=0}^{3} πi

To solve this a bit more easily, you can first solve with respect to one variable, say, π3 .
Solving this, we obtain
π0 = 38/130 ≈ .2923,  π1 = 43/130 ≈ .3308,  π2 = 40/130 ≈ .3077,  π3 = 9/130 ≈ .0692.
In the test, you will not have the luxury of using Matlab. Even in that case, you could
be asked to give the stationary distribution. Hence, you should practice computing the
stationary distribution from a given transition matrix.
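When a computer is available, the system π = πP with the normalization can be solved in a few lines. A sketch, assuming numpy (the trick of replacing one redundant balance equation by the normalization is standard):

```python
import numpy as np

P = np.array([[.2, .3, .4, .1],
              [.2, .3, .4, .1],
              [.5, .4, .1, 0],
              [.2, .3, .4, .1]])

n = len(P)
# pi P = pi is (P^T - I) pi = 0; drop one redundant equation and
# replace it with the normalization sum(pi) = 1.
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1
pi = np.linalg.solve(A, b)
# pi ≈ (.2923, .3308, .3077, .0692), matching the hand computation
```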

11.2 Cost Structure for DTMC


So far, we have not considered any cost associated with inventory. Let us assume the following
cost structure.
(i) Holding cost for each item left by the end of a Friday is $100.

(ii) Variable cost (Cv ) is $1000.

(iii) Fixed cost (Cf ) is $1500.

(iv) Each item sells for $2000 (Cp ).

(v) Unfulfilled demand is lost.
If you are a manager of a company, you will be interested in something like this: what
is the long-run average profit per week? You may lose money one week and earn money another,
but you should know the profitability of your business in general.
Let C(i) be the expected profit of the following week, given that this week’s inventory
ends with i items. Let us think about the case i = 0 first. You need to order three
items, and the cost of doing so involves both the variable cost and the fixed cost. We
should also count the revenue you will earn next week. Fortunately, we do not have to
pay any holding cost, because we do not have any inventory at the end of this week.
C(0) = − Cost + Revenue
=[−3($1000) − $1500] + [3($2000)(.2) + 2($2000)(.3) + 1($2000)(.4) + 0(.1)]
= − $1300
This is not the kind of week you want. How about the case where you are left with 2 items at the end
of this week? First of all, you pay the holding cost of $100 per item. When calculating
the expected revenue, you should add the probabilities that D is 2 or 3, because even if
the demand is 3, you can only sell 2 items. Since you do not order, there is no cost
associated with ordering.
C(2) = − Cost + Revenue
= − 2(100) + 2000E[D ∧ 2]
=[−2($100)] + [($0)(.1) + ($2000)(.4) + ($4000)(.3 + .2)]
=$2600
This seems to be quite a good week. I computed C(3) = $2900 and C(1) = −$400 for you.

C(1) = −1($100) − [2($1000) + $1500] + $2000 E[D ∧ 3] = −3600 + 3200 = −400
C(3) = −3($100) + $2000 E[D ∧ 3] = −300 + 3200 = 2900
Based on this, how would you compute the long-run average profit? This is where the
stationary distribution comes into play.
Long-run avg profit = ∑_{i=0}^{3} C(i) πi
                    = C(0)π0 + C(1)π1 + C(2)π2 + C(3)π3
                    = $488.46

It means that under the (s, S) = (1, 3) policy, we are earning about 500 dollars every week
using this policy. So, probably you can now decide whether or not to keep running the
business. It does not guarantee that you earn exactly $488.46 each week. It is the long-run
average profit. In terms of finding the optimal policy that maximizes the long-run average profit,
it involves advanced knowledge of stochastic processes. In particular, this kind of problem
cannot be solved using the simplex method, because it is highly nonlinear. You will
be better off solving it numerically using Matlab or other computer aids.
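The whole computation fits in a few lines of code. A sketch (numpy assumed; the function `C` restates the cost structure above, and the stationary distribution is the one computed earlier). Recomputing the sum gives 63500/130 ≈ $488.46:

```python
import numpy as np

d = np.array([0, 1, 2, 3])
p = np.array([.1, .4, .3, .2])   # demand distribution
Cv, Cf, Cp, h = 1000, 1500, 2000, 100
s, S = 1, 3                      # the (s, S) = (1, 3) policy

def C(i):
    """Expected profit of next week, given this week ends with i items."""
    order = S - i if i <= s else 0
    ordering_cost = Cv * order + Cf if order > 0 else 0
    start = i + order                                        # Monday inventory
    revenue = Cp * float(np.sum(np.minimum(d, start) * p))   # Cp * E[D ∧ start]
    return -h * i - ordering_cost + revenue

pi = np.array([38, 43, 40, 9]) / 130   # stationary distribution
avg_profit = sum(C(i) * pi[i] for i in range(4))
```

This reproduces C(0) = −1300, C(1) = −400, C(2) = 2600, C(3) = 2900 and the long-run average of about $488.46 per week.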

12 Lecture 14: Mar 3
12.1 Interpretation of Stationary Distribution
>> P1 = [0 .1 .9; .8 0 .2; .4 .6 0]
>> P1^100
>> P1^101

Not surprisingly, P1^100 ≈ P1^101. You will see that every row is the identical vector (.3607, .2623, .3770). Do
you understand that this identical row is the stationary distribution of the DTMC? The first
element of the stationary distribution vector is about 36%. What does it mean? It means
that, for 36% of the time, the DTMC stays in state 1.
However, this is not always the case.

>> P2 = [0 .1 0 .9; .8 0 .2 0; 0 .3 0 .7; .6 0 .4 0]


>> P2^100
>> P2^101

Now we have drastically different matrices for P2^100 and P2^101.

P2 is a simple case of a random walk: from each state, you go up one state or down one state.
I will post this file on T-Square so that you can play around with this.
Let us look at a simpler case.

>> P3 = [0 1; 1 0]
>> P3^100
>> P3^101
>> (P3^100+P3^101)/2

In this simple example, the chain goes from 0 to 1 and from 1 to 0 with probability 1. Does
this chain have a stationary distribution? The chain obviously does not stabilize. Then, do
we have two stationary distributions? What do you think about taking the average of P3^100 and
P3^101?

>> P4 = [.2 0 .4 0 .4; 0 .5 0 .5 0; .3 0 .1 0 .6; 0 .1 0 .9 0; .8 0 .1 0 .1]


>> P4^100
>> P4^101

Here, we again have P4^100 ≈ P4^101; however, not all rows are identical. Recall that when every
row is identical, your initial state is irrelevant once you run the
chain for a long time. For P4, the odd rows are identical and the even rows are identical, but those
two differ. Here we have two stationary distributions. If you start at an odd state,
you will end up in the equilibrium given by the odd rows. Otherwise, you will be stuck in the even-row
equilibrium. I hope our economic social ladder is not similar to this.

12.2 Transition Diagram
Example 12.1. Suppose we have the following transition matrix.
 
1 0 .2 .8
P1 = 2 .5 0 .5
3 .6 .4 0

State space S = {1, 2, 3}. Is the following diagram equivalent to the matrix?

(Diagram: states 1, 2, 3 with arrows 1→2 (.2), 1→3 (.8), 2→1 (.5), 2→3 (.5), 3→1 (.6), 3→2 (.4).)

The diagram contains exactly the same amount of information as the transition matrix has.

Example 12.2. Now, think of P2 different from P1 . Suppose P2 can be represented as the
following diagram with state space S = {1, 2, 3, 4, 5, 6, 7}.

(Diagram: the three-state component of Example 12.1 on states {1, 2, 3}, together with a second component on states {4, 5, 6, 7}; no arrows connect the two groups.)

In this example, there are two separate groups in the picture, so the chain can be reduced into
two separate chains. Such a chain is called "reducible".

12.3 Accessibility of States and Irreducibility of Chain


Let X = {Xn : n = 0, 1, 2, · · · } be a DTMC on S with transition matrix P .

Definition 12.1. (i) State i can reach state j if there exists an n such that P(Xn =
j|X0 = i) > 0, i.e. (P^n)ij > 0. This is mathematically noted as i → j.

(ii) States i and j are said to communicate if i → j and j → i.

(iii) X is said to be irreducible if all states communicate. Otherwise, it is said to be


reducible.

Why is irreducibility important? In the previous Matlab example, we saw that if a
chain is reducible, it can have more than one stationary distribution. In the previous example,
if we add an arrow from state 4 to state 3, does that make the chain irreducible? No, because 7 → 3
but 3 ↛ 7.

Theorem 12.1. (i) If X is irreducible, there exists at most one stationary distribution.

(ii) If X has a finite state space, it has at least one stationary distribution.

Why is the stationary distribution important? As we saw in the last lecture, when we compute the
long-run average profit of a company, we need the stationary distribution.

Corollary 12.1. For a finite state, irreducible DTMC, there exists a unique stationary
distribution.

Example 12.3. Think of the following DTMC.

(Diagram: a four-state chain on {1, 2, 3, 4} with transitions between adjacent states; as computed below, it is periodic with period 2.)

In this case, we have P^100 ≠ P^101. However, according to our corollary, there should be a
unique stationary distribution for this chain, too. How do we obtain it given that P^100 ≠
P^101? How about taking the average of the two,

(P^100 + P^101)/2 ?

Even in this case, the limiting distribution limn→∞ P^n itself does not exist, because the chain oscillates
as n → ∞. How do we formally classify such cases?

Definition 12.2 (Periodicity). For a state i ∈ S,

d(i) = gcd{n : (P^n)ii > 0}

where gcd is the greatest common divisor.

For example, in the first example,

d(1) = gcd{2, 3, 4, · · · } = 1,   d(2) = 1.

In fact, if i ↔ j, then d(i) = d(j). This is called the solidarity property. Since all states in an
irreducible DTMC communicate, the periods of all states are the same.
From the third example, d(4) = gcd{2, 4, 6, · · · } = 2, so that DTMC is periodic with
period d = 2.
Theorem 12.2. (i) If the DTMC is aperiodic, then limn→∞ P^n exists. (Each row of limn→∞ P^n
is a stationary distribution.)

(ii) If the DTMC is periodic with period d ≥ 2, then

limn→∞ (P^n + P^{n+1} + P^{n+2} + · · · + P^{n+d−1})/d

exists.
Since we are getting too abstract, let us look at another example.
Example 12.4.
 
P = [ .2  .8 ]
    [ .5  .5 ]

Is this DTMC irreducible? Yes. Now note that Pii > 0 here (e.g. P11 = .2), so d(i) = 1. A
positive diagonal entry is an easy way to check that a DTMC is aperiodic. Therefore, the limiting distribution exists, each row of which
is equal to the stationary distribution. What would the stationary distribution be then?

πP = π

(π1 , π2 ) [ .2  .8 ] = (π1 , π2 )
           [ .5  .5 ]

∴ π = (5/13, 8/13)
Hence, even without the help from Matlab, we can say that we know
 
limn→∞ P^n = [ 5/13  8/13 ]
             [ 5/13  8/13 ]
Example 12.5.
 
P = [  0   .5   0   .5 ]
    [ .5   0   .5   0  ]
    [  0   .5   0   .5 ]
    [ .5   0   .5   0  ]

According to our theorems, how many distributions do we have? Just one. Can you give
the stationary distribution?

π = (25%, 25%, 25%, 25%)

It means that the long-run average fraction of time you are in each state is a quarter. Then,
is this true?
 
P^n = [ 1/4  1/4  1/4  1/4 ]
      [ 1/4  1/4  1/4  1/4 ]
      [ 1/4  1/4  1/4  1/4 ]
      [ 1/4  1/4  1/4  1/4 ]

It is NOT true. Rather, it is limn→∞ (P n + P n+1 )/2.

Example 12.6. Suppose the following DTMC and that you are asked to compute limn→∞ P n .

(Diagram: seven states. {1, 2} and {6, 7} are closed two-state classes, each with the same transition probabilities as in Example 12.4; states 3, 4, 5 are transient and feed into the two closed classes.)

 
limn→∞ P^n =
  1 [        5/13                8/13         0 0 0        0                 0          ]
  2 [        5/13                8/13         0 0 0        0                 0          ]
  3 [    (1/2)(5/13)         (1/2)(8/13)      0 0 0    (1/2)(5/13)       (1/2)(8/13)    ]
  4 [ ((.8)(.5)+.2)(5/13)  ((.8)(.5)+.2)(8/13) 0 0 0  (.8)(.5)(5/13)    (.8)(.5)(8/13)  ]
  5 [        5/13                8/13         0 0 0        0                 0          ]
  6 [         0                   0           0 0 0       5/13              8/13        ]
  7 [         0                   0           0 0 0       5/13              8/13        ]

When computing rows 1 and 2, you can just forget about all states except 1 and 2, because no
arrows leave that set. The same goes for rows 6 and 7.

13 Lecture 15: Mar 8
13.1 Recurrence
Let X be a DTMC on state space S with transition matrix P . For each state i ∈ S, let τi
denote the first n ≥ 1 such that Xn = i.

Definition 13.1. (i) State i is said to be recurrent if P(τi < ∞|X0 = i) = 1.

(ii) State i is said to be positive recurrent if E(τi |X0 = i) < ∞.

(iii) State i is said to be transient if it is not recurrent.

Example 13.1.

(Diagram: two states with P11 = .5, P12 = .5, P21 = 1.)

Given that X0 = 1, is it possible that τ1 = 1, meaning that the chain returns to state 1 at time
1? Yes.

P(τ1 = 1|X0 = 1) = 1/2
P(τ1 = 2|X0 = 1) = P(X1 ≠ 1, X2 = 1|X0 = 1) = (0.5)(1) = 0.5
P(τ1 = 3|X0 = 1) = 0
E(τ1 |X0 = 1) = 1(1/2) + 2(0.5) = 3/2 < ∞

Note that the second probability is 0.5, not 0.25, because X1 must not be equal to 1 for
τ1 = 2. Since the last expectation is finite, this chain is positive recurrent.

Example 13.2.

(Diagram: three states with P31 = 1, P11 = .5, P13 = .5, P21 = .5, P23 = .5; no arrows enter state 2.)

Is state 1 positive recurrent? Yes. State 2 is transient; how about state 3?
P(τ3 = 1|X0 = 3) =0
P(τ3 = 2|X0 = 3) =0.5
P(τ3 = 3|X0 = 3) =(0.5)2
P(τ3 = 4|X0 = 3) =(0.5)3
..
.
τ3 − 1 is basically a geometric random variable with p = 1/2, so E(τ3 |X0 = 3) = 1 + 1/p = 3 < ∞.

13.1.1 Geometric Random Variable


Let us review the geometric random variable. Let X be the number of
tosses needed to get the first head; then X is a geometric random variable. It could take millions of
tosses for you to get the first head. But what is the probability that X is finite?
P(X = 1) =p
P(X = 2) =pq
P(X = 3) =pq 2
P(X = 4) =pq 3
⋮

P(X < ∞) = ∑_{n=1}^{∞} P(X = n) = p + pq + pq² + pq³ + · · · = p(1 + q + q² + · · · ) = p/(1 − q) = 1

How about the expectation?


E(X) = 1p + 2pq + 3pq² + · · · = p(1 + 2q + 3q² + · · · ) = p (q + q² + q³ + · · · )′
     = p (q/(1 − q))′ = p/(1 − q)² = p/p² = 1/p
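Both identities can be sanity-checked by truncating the series numerically (a plain-Python sketch; terms beyond n = 200 are negligible for p = 1/2):

```python
# Truncated series for a geometric random variable with p = 1/2.
p = 0.5
q = 1 - p
total_prob = sum(p * q ** (n - 1) for n in range(1, 200))   # ≈ P(X < ∞) = 1
mean = sum(n * p * q ** (n - 1) for n in range(1, 200))     # ≈ E(X) = 1/p = 2
```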
Using the solidarity property, we can conclude the following.
(i) If i ↔ j, i is recurrent if and only if j is recurrent.
(ii) If i ↔ j, i is positive recurrent if and only if j is positive recurrent.
Example 13.3.
(Diagram: an irreducible three-state chain on states 1, 2, 3.)

In this case, we know that state 1 is positive recurrent. We also know that 1 ↔ 2 and 1 ↔ 3.
Thus, we can conclude that all states in this chain are positive recurrent.

Example 13.4. Suppose a simple random walk.


(Diagram: simple random walk on Z = {· · · , −2, −1, 0, 1, 2, · · · }; from each state i, go up to i + 1 with probability p and down to i − 1 with probability q.)

(i) p = 1/3, q = 2/3: State i is transient. Say you start from state 100; there is a positive
probability that you never return to state 100. Hence, every state is transient.
(ii) p = 2/3 > q: By the strong law of large numbers, P(Sn /n → 1/3) = 1 because

Sn = ∑_{i=1}^{n} ξi ,   Sn /n → E(ξ1 ) = (−1)q + (1)p = −1/3 + 2/3 = 1/3,

so the walk drifts off to +∞ and every state is again transient.

(iii) p = q = 1/2: State i is recurrent, but not positive recurrent.

Note from the example above that if the chain is irreducible, either every state is recurrent
or every state is transient.

Theorem 13.1. Assume X is irreducible. X is positive recurrent if and only if X has a
(unique) stationary distribution π = (πi ). Furthermore,

E(τi |X0 = i) = 1/πi .
Recall that one of the interpretations of the stationary distribution is the long-run fraction of
time the chain spends in each state.
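Theorem 13.1 can be checked against Example 13.1, where we computed E(τ1 |X0 = 1) = 3/2 directly; the two-state chain there has P11 = .5, P12 = .5, P21 = 1 (as read off the diagram). A numpy sketch:

```python
import numpy as np

P = np.array([[.5, .5],
              [1.0, 0.0]])   # the two-state chain of Example 13.1

# Solve pi P = pi together with sum(pi) = 1.
A = np.vstack([(P.T - np.eye(2))[:-1], np.ones(2)])
pi = np.linalg.solve(A, np.array([0.0, 1.0]))

mean_return_to_1 = 1 / pi[0]   # Theorem 13.1: E(tau_1 | X0 = 1) = 1 / pi_1
```

Indeed π = (2/3, 1/3) and 1/π1 = 3/2, matching the direct computation.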

Theorem 13.2. Assume that X is irreducible in a finite state space. Then, X is recurrent.
Furthermore, X is positive recurrent.

In a finite state space DTMC, there is no difference between positive recurrence and
recurrence. These two theorems are important because they give us a simple tool to check whether a
stationary distribution exists. How about the limiting distribution? Depending on the transition
matrix, there may or may not be a limiting distribution. When do we have one? When the
chain is aperiodic. If it is periodic, we use the average instead. Let us summarize.

Theorem 13.3. Let X be an irreducible finite state DTMC.

(i) X is positive recurrent.

(ii) X has a unique stationary distribution π = (πi ).

(iii) limn→∞ (P^n)ij = πj , independent of what i is, if the DTMC is aperiodic.

(iv) limn→∞ ((P^n)ij + (P^{n+1})ij + (P^{n+2})ij + · · · + (P^{n+d−1})ij )/d = πj ,
if the DTMC is periodic with period d.

Example 13.5. Suppose we have the following DTMC.

1 ↔ 2 ← 3 ↔ {4, 5, 6, 7}

Let us quickly compute the limiting distribution of the chain.


 
              1    2    3  4  5   6    7
         1 [ 2/3  1/3   0  0  0   0    0  ]
         2 [ 2/3  1/3   0  0  0   0    0  ]
         3 [  ?    ?    0  0  0   ?    ?  ]
limn→∞ P^n = 4 [  ?    ?    0  0  0   ?    ?  ]
         5 [  ?    ?    0  0  0   ?    ?  ]
         6 [  0    0    0  0  0  2/3  1/3 ]
         7 [  0    0    0  0  0  2/3  1/3 ]

(the entries marked ? remain to be computed)

For the remaining entries, let us take more time next lecture. This part is important.

14 Lecture 16: Mar 10
14.1 Absorption Probability
Let us begin with the DTMC we were dealing with last time.
 
P^100 ≈ limn→∞ P^n =
   1 [ 2/3  1/3  0  0  0   0    0  ]
   2 [ 2/3  1/3  0  0  0   0    0  ]
   3 [  ?    ?   0  0  0   ?    ?  ]
   4 [  ?    ?   0  0  0   ?    ?  ]
   5 [  ?    ?   0  0  0   ?    ?  ]
   6 [  0    0   0  0  0  2/3  1/3 ]
   7 [  0    0   0  0  0  2/3  1/3 ]

where the entries marked ? are what we will compute now.

What would limn→∞ (P^n)31 be? Before that, note that {1, 2} and {6, 7} are closed sets, meaning no
arrows go out of them. In contrast, {3, 4, 5} are transient states. It means that if the DTMC
starts from state 3, it may be “absorbed” into either {1, 2} or {6, 7}.
Let us define a new notation. Let f3,{1,2} denote the probability that, starting from state 3, the
DTMC ends up in {1, 2}. Thus, f3,{1,2} + f3,{6,7} = 1. Let us compute these numbers.

f3,{1,2} = (.25)(1) + (.5) f4,{1,2} + (.25)(0)
f4,{1,2} = (.5)(1) + (.5) f5,{1,2}
f5,{1,2} = (.5) f3,{1,2} + (.25) f4,{1,2} + (.25)(0)

We have three unknowns and three equations, so we can solve this system of linear equations.

x = .25 + .5y         x = f3,{1,2} = 5/8
y = .5 + .5z     ⇒    y = f4,{1,2} = 3/4
z = .5x + .25y        z = f5,{1,2} = 1/2

We also now know that f3,{6,7} = 1 − 5/8 = 3/8, f4,{6,7} = 1 − 3/4 = 1/4, f5,{6,7} = 1 − 1/2 = 1/2.
However, to compute limn→∞ (P^n)31 , we consider not only f3,{1,2} but also the probability that the
DTMC will be in state 1, not in state 2. Therefore, writing Pij for the (i, j)th entry of limn→∞ P^n ,

P31 = f3,{1,2} π1 = (5/8)(2/3) = 5/12    P41 = f4,{1,2} π1 = (3/4)(2/3) = 1/2     P51 = f5,{1,2} π1 = (1/2)(2/3) = 1/3
P32 = f3,{1,2} π2 = (5/8)(1/3) = 5/24    P42 = f4,{1,2} π2 = (3/4)(1/3) = 1/4     P52 = f5,{1,2} π2 = (1/2)(1/3) = 1/6
P36 = f3,{6,7} π6 = (3/8)(2/3) = 1/4     P46 = f4,{6,7} π6 = (1/4)(2/3) = 1/6     P56 = f5,{6,7} π6 = (1/2)(2/3) = 1/3
P37 = f3,{6,7} π7 = (3/8)(1/3) = 1/8     P47 = f4,{6,7} π7 = (1/4)(1/3) = 1/12    P57 = f5,{6,7} π7 = (1/2)(1/3) = 1/6

where (π1 , π2 ) = (2/3, 1/3) is the stationary distribution on {1, 2} and (π6 , π7 ) = (2/3, 1/3) is the
stationary distribution on {6, 7}.
Finally, we have the complete limiting transition matrix.

P^100 ≈ lim_{n→∞} P^n =
    1 [ 2/3   1/3   0  0  0   0     0   ]
    2 [ 2/3   1/3   0  0  0   0     0   ]
    3 [ 5/12  5/24  0  0  0  1/4   1/8  ]
    4 [ 1/2   1/4   0  0  0  1/6   1/12 ]
    5 [ 1/3   1/6   0  0  0  1/3   1/6  ]
    6 [ 0     0     0  0  0  2/3   1/3  ]
    7 [ 0     0     0  0  0  2/3   1/3  ]
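These absorption probabilities can be double-checked numerically. Below is a minimal sketch (mine, not from the lecture) using NumPy: it solves the first-step equations f = b + Qf for the transient states 3, 4, 5, where Q holds the transient-to-transient one-step probabilities.

```python
import numpy as np

# One-step probabilities among the transient states 3, 4, 5 (matrix Q) and
# the one-step probabilities of jumping directly into the closed set {1, 2}
# (vector b), read off the first-step equations in the text.
Q = np.array([[0.0,  0.5,  0.0],
              [0.0,  0.0,  0.5],
              [0.5,  0.25, 0.0]])
b = np.array([0.25, 0.5, 0.0])

# First-step analysis: f = b + Q f, i.e. (I - Q) f = b
f = np.linalg.solve(np.eye(3) - Q, b)
print(f)  # f = (5/8, 3/4, 1/2) = (0.625, 0.75, 0.5)
```

The same linear-solve pattern works for any finite chain with identified transient and closed classes.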

14.2 Computing Stationary Distribution


Example 14.1. Assume that X is an irreducible DTMC that is positive recurrent. This means X has a unique stationary distribution. The transition matrix is described by the following state diagram.

[State diagram on S = {1, 2, 3}: P(1→1) = .25, P(1→2) = .25, P(1→3) = .5, P(2→1) = .5, P(2→3) = .5, P(3→2) = 1, the transitions consistent with the balance equations below.]
How do we find the stationary distribution? As far as we learned, we can solve the
following system of linear equations.
    π = πP,    Σ_i πi = 1

This is sometimes doable, but it quickly becomes tedious as the number of states increases. We can instead use the “flow balance equations”. The idea is that, for each state, the rate into the state must equal the rate out of the state. When counting flow in and out, we ignore self-loops. Using this idea, we get the following equations.

    State 1:  rate in = .5 π2,            rate out = π1 (.5 + .25)
    State 2:  rate in = .25 π1 + 1·π3,    rate out = π2 (.5 + .5) = π2
    State 3:  rate in = .5 π1 + .5 π2,    rate out = 1·π3

Equating each pair of rate in and out, now we have three equations. These three equations
are equivalent to the equations we can get from π = πP .
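As a sanity check, the stationary distribution of this three-state chain can be computed numerically. A sketch (the transition matrix P is my reading of the state diagram, an assumption consistent with the balance equations above):

```python
import numpy as np

# Transition matrix consistent with the flow balance equations in the example
# (an assumption read off the state diagram).
P = np.array([[0.25, 0.25, 0.50],
              [0.50, 0.00, 0.50],
              [0.00, 1.00, 0.00]])

# Solve pi = pi P together with the normalization sum(pi) = 1: replace the
# last equation of (P^T - I) pi = 0 with the row enforcing the normalization.
A = P.T - np.eye(3)
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)  # pi = (4/15, 6/15, 5/15)
```

Replacing one redundant balance equation with the normalization row is the standard trick, since the balance equations alone only determine π up to a constant.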

Example 14.2 (Reflected Random Walk). Suppose X has the following state diagram.
[State diagram: states 0, 1, 2, 3, . . . ; from each state, move right with probability p and left with probability q = 1 − p, with a self-loop of probability q at state 0.]

If the probability of the feedback loop at state 0 were 1, you would be stuck there once you got in. In the business or gambler analogy, it means the game is over; somebody must forcefully pull the DTMC out of there to keep the game going. Wall Street in 2008 resembles this model: they expected to be bailed out. It is an incentive problem, but we will not cover that issue now.
Suppose p = 1/3, q = 2/3. Then the chain is irreducible, and I can boldly say that there exists a unique stationary distribution. However, solving π = πP gives us an infinite number of equations. Here the flow balance equations come into play. Let us generalize the approach we used in the previous example.
For any subset of states A ⊂ S,
    rate into A = rate out of A.
If A = {0, 1, 2}, we essentially look at the flow between state 2 and 3 because this state
diagram is linked like a simple thin chain. We have the following equations.
 
    π1 = (p/q) π0
    π2 = (p/q) π1 = (p/q)^2 π0
    π3 = (p/q) π2 = (p/q)^3 π0
    π4 = (p/q) π3 = (p/q)^4 π0
Every element of the stationary distribution boils down to the value of π0. Recall we always have one more condition: Σ_i πi = 1.

    Σ_{i=0}^∞ πi = π0 ( 1 + (p/q) + (p/q)^2 + (p/q)^3 + · · · ) = 1

    π0 = 1 / ( 1 + (p/q) + (p/q)^2 + (p/q)^3 + · · · ) = 1 − p/q = 1 − 1/2 = 1/2

    πn = (p/q)^n π0

So far we assumed p < q. What if p = q? What would π0 be? π0 = π1 = π2 = · · · = 0, so π is not a probability distribution. Since we do not have a stationary distribution, the chain is not positive recurrent. What if p > q? It gets worse: the chain cannot be positive recurrent because there is a nonzero probability of never coming back. In fact, in this case every state is transient.

I call this method “cut method”. You probably realized that it would be very useful.
Remember that positive recurrence means that the chain will eventually come back to a
state and the chain will have some types of cycles. You will see the second example often as
you study further. It is usually called “birth-death process”.
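The cut method can be checked numerically for p = 1/3, q = 2/3. A short sketch (mine, not from the lecture):

```python
# Cut method for the reflected random walk with p = 1/3, q = 2/3:
# pi_{n+1} = (p/q) pi_n across every cut, and pi_0 = 1 - p/q.
p, q = 1/3, 2/3
rho = p / q
pi = [(1 - rho) * rho**n for n in range(20)]

# Every cut between states n and n+1 balances: p * pi_n == q * pi_{n+1}.
for n in range(19):
    assert abs(p * pi[n] - q * pi[n + 1]) < 1e-12

print(pi[:4])  # [0.5, 0.25, 0.125, 0.0625]
```

This is the geometric stationary distribution of the birth-death chain; the same cut argument works for state-dependent rates as well.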

15 Lecture 17: Mar 15
15.1 A DTMC model of stock price
Suppose you invest in stocks. Let Xn denote the stock price at the end of period n. This
period can be month, quarter, or year as you want. Investing in stocks is risky. How would
you define the return?
    Rn = (Xn − Xn−1)/Xn−1,    n ≥ 1
Rn is the return in period n. We can think that X0 is the initial money you invest in. Say
X0 = 100, X1 = 110.
    R1 = (X1 − X0)/X0 = (110 − 100)/100 = 1/10 = 10%
Suppose you are working in a financial firm. You should have a model for stock prices. No
model is perfect, but each model has its own strength. One way to model the stock prices is
using iid random variables. Assume that {Rn } is a series of iid random variables. Then, Xn
can be represented as follows.
    Xn = X0 (1 + R1)(1 + R2)(1 + R3) · · · (1 + Rn) = X0 Π_{i=1}^n (1 + Ri)

Π is similar to Σ except that it denotes multiplication instead of summation. Assuming the Rn are iid, is X = {Xn : n = 0, 1, 2, · · · } a DTMC? Let us give Rn a more concrete distribution.

    P(Rn = 0.1) = p = 20%
    P(Rn = 0.05) = q = 1 − p = 80%

If you look at the formula for Xn again,

Xn = Xn−1 (1 + Rn ) = f (Xn−1 , Rn )

so you know that Xn is a DTMC because Xn can be expressed as a function of Xn−1 and
iid Rn .
This type of model is called “binomial model”. In this model, only two possibilities for
Rn exist: 0.1 or 0.05. Hence, we can rewrite Xn as follows.

    Xn = X0 (1 + .1)^{Zn} (1 + .05)^{n−Zn}

where Zn ∼ Binomial(n, 0.2).
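A quick simulation of this binomial model (a sketch; the function name and parameters are mine, not from the lecture):

```python
import random

def simulate(x0=100.0, n=12, p=0.2, seed=1):
    """Simulate X_n for the binomial model: return 10% w.p. p, else 5%."""
    random.seed(seed)
    x, z = x0, 0                 # z counts the number of 10%-return periods
    for _ in range(n):
        if random.random() < p:
            x *= 1.10
            z += 1
        else:
            x *= 1.05
    return x, z

x, z = simulate()
# The path agrees with the closed form X_n = X_0 * 1.1**Z_n * 1.05**(n - Z_n).
print(x, z, 100.0 * 1.10**z * 1.05**(12 - z))
```

The closed-form check works because, with only two possible returns, counting the "up" periods Z_n determines the whole product.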


Wall Street firms use computational implementations based on these models; practical cases are so complicated that they may not have analytic solutions. The health care domain also uses these kinds of Markov models, e.g., for gene mutation. DTMC has a huge application area. Our school has a master’s program called Quantitative Computational Finance, which models stock prices not only in a discrete manner but also in a continuous manner, using a concept called geometric Brownian motion, which involves a term like e^{B(t)}. In the binomial model, we also had terms with powers. If you look at a stock price in the newspaper, it looks continuous depending on the resolution; however, it is in fact discrete. There are two mainstreams in academia and practice regarding stock price modeling: discrete and continuous. The famous Black-Scholes formula for option pricing is a continuous model. This is a huge area and I could easily get into this subject and talk about it forever; let us keep it at this level.

16 Poisson Process and Exponential Distribution


Definition 16.1. A random variable X is said to have exponential distribution with rate λ
(with mean 1/λ) if it has c.d.f. of

F (x) = 1 − e−λx

and p.d.f. of
    f(x) = λ e^{−λx} if x ≥ 0,    and f(x) = 0 if x < 0.

Be careful that rate λ implies that the mean of the exponential distribution is 1/λ. Often,
inter-arrival times are modeled by a sequence of iid exponential random variables. What do
we know about the exponential distribution?

(i) E(X) = 1/λ

(ii) c2 (X) = 1

(iii) Var(X) = 1/λ2

(iv) Memoryless property

What is the memoryless property?

16.0.1 Memoryless Property

P(X > t + s|X > s) = P(X > t)

Let me paraphrase this concept. Look at the light bulb on the ceiling. Let X denote the lifetime of the bulb and assume that it follows the exponential distribution. If the bulb is on now, its remaining lifetime is distributed as that of a new bulb: here s is the time the bulb has been on so far and t is the additional lifetime. Given that the light has been on for s units of time, the probability that it stays on for t more units is the same as for a new light bulb. How is it so?

    P(X > t + s | X > s) = P(X > t + s)/P(X > s) = e^{−λ(t+s)}/e^{−λs} = e^{−λt} = P(X > t)
In fact, this is the defining property of the exponential distribution: if a non-negative continuous random variable has the memoryless property, it must be exponentially distributed.
Corollary 16.1.
P(X > t + S|X > S) = P(X > t)
for any positive random variable S that is independent of X.
Mathematically, we just replaced s with S. Let me explain the change in plain English.
If we are saying, “if the light is on at noon today, its remaining lifetime distribution is as if it
is new,” we are referring to s. If we are saying, “if the light is on when IBM stock price hits
100 dollars, its remaining lifetime is as if it is new,” we are referring to S which is random.
The important thing to note is that S should be independent of X. Suppose S = X/2, i.e.
S depends on X.
    P(X > t + X/2 | X > X/2) = P(X > t + X/2)/P(X > X/2) = P(X > t + X/2) = P(X > 2t)
                             ≠ P(X > t)

(Here P(X > X/2) = 1, and X > t + X/2 is equivalent to X > 2t.) The memoryless property does not hold if X and S are not independent.
Let us come back to other facts derived from the exponential distribution. Let X1 and X2 be independent exponential random variables with rates λ1 and λ2 respectively, i.e., X1 ∼ Exp(λ1), X2 ∼ Exp(λ2). Let X = min(X1, X2); then X is the time at which the first of the two bulbs fails.

    P(X > t) = P(X1 > t, X2 > t) = P(X1 > t) P(X2 > t)    (since X1 ⊥⊥ X2)
             = e^{−λ1 t} e^{−λ2 t} = e^{−(λ1+λ2)t}
Therefore, we can say that X = min(X1 , X2 ) ∼ Exp(λ1 + λ2 ).
Example 16.1. Let X1 and X2 follow exponential distribution with means 2 hours and 6
hours respectively. Then, what would be the mean of min(X1 , X2 )?
    E(min(X1, X2)) = 1/(1/2 + 1/6) = 1/(4/6) = 1.5 hours
How about the expectation of max(X1 , X2 )? We can use the fact that X1 +X2 = min(X1 , X2 )+
max(X1 , X2 ).
    E(max(X1, X2)) = E(X1 + X2) − E(min(X1, X2)) = 8 − 1.5 = 6.5 hours
We do not have an equally convenient way to compute the expectation of max(X1, X2, · · · , Xn) for more than two exponential random variables.
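A Monte Carlo check of the two expectations above (a sketch, not from the lecture; the estimates are only approximate):

```python
import random

# Estimate E[min(X1, X2)] and E[max(X1, X2)] for independent exponentials
# with means 2 and 6 hours (rates 1/2 and 1/6).
random.seed(0)
n = 200_000
tot_min = tot_max = 0.0
for _ in range(n):
    x1 = random.expovariate(1/2)   # mean 2 hours
    x2 = random.expovariate(1/6)   # mean 6 hours
    tot_min += min(x1, x2)
    tot_max += max(x1, x2)
print(tot_min / n, tot_max / n)    # ≈ 1.5 and ≈ 6.5
```

Note that `random.expovariate` takes the rate λ, not the mean, matching the convention warned about in the definition.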

Corollary 16.2.

    P(X1 < X2) = λ1/(λ1 + λ2),    P(X1 > X2) = λ2/(λ1 + λ2),    P(X1 = X2) = 0
How to compute this? One way is to guess the answer. As λ1 becomes large, X1 gets
shorter. It implies that P(X1 < X2 ) goes close to 1. Another way is to double integrate.

17 Lecture 18: Mar 17
17.1 Comparing Two Exponentials
Let X1 and X2 be two independent r.v.’s having exponential distribution with rates λ1 and
λ2 .

Proposition 17.1.

    P(X1 < X2) = λ1/(λ1 + λ2),    with joint p.d.f. f(x1, x2) = λ1 λ2 e^{−λ1 x1} e^{−λ2 x2}

How would you remember that the correct formula is λ1/(λ1 + λ2) and not λ2/(λ1 + λ2)? As λ1 increases, the expected lifetime of the first lightbulb X1 becomes shorter, hence the probability that X1 breaks down before X2 gets close to 1. We can also compute the probability using the joint pdf.
    P(X1 < X2) = ∫∫_D f(x1, x2) dx1 dx2 = ∫_0^∞ ( ∫_0^{x2} f(x1, x2) dx1 ) dx2

When computing a double integral, it is helpful to draw the region over which the integration applies.

[Figure: the region D = {(x1, x2) : 0 < x1 < x2}, the shaded area above the 45-degree line in the (x1, x2)-plane.]

Example 17.1. (i) A bank has two tellers, John and Mary. John’s processing times are iid exponential X1 with mean 6 minutes. Mary’s processing times are iid exponential X2 with mean 4 minutes. A car with three customers A, B, C shows up at 12:00 noon, when both tellers are free. What is the expected time at which the car leaves the bank?
Using intuition, we can see that it should be between 8 and 12 minutes. Suppose A and B start service first. Once one teller finishes, C occupies A’s or B’s position, depending on which teller finishes first. If C takes over A’s teller after A is done, B has already been in service while A was being served. Let Y1, Y2 denote the remaining processing times of John and Mary respectively. The expected time at which the car leaves is

E[W ] = E[min(X1 , X2 )] + E[max(Y1 , Y2 )] = E[min(X1 , X2 )] + E[max(X1 , X2 )]

because of the memoryless property of exponential distribution. Even if B has gone


through service for a while, due to memoryless property of exponential distribution,
B’s service time is as if B just started the service. Thus,

E[W ] = E[min(X1 , X2 ) + max(X1 , X2 )] = E[X1 + X2 ] = 10 minutes.

(ii) What is the probability that C finishes service before A?


First compute the probability that B finishes service before A.

    P(B before A) = (1/6)/(1/4 + 1/6) = 4/(4 + 6) = 4/10

    ∴ P(C before A) = (4/10)^2

This is because we can think of A just starting to get service when C started service.

(iii) What is the probability that C finishes last?

    P(C finishes last) = 1 − P(C not last)
                       = 1 − ( P(C finishes before A) + P(C finishes before B) )
                       = 1 − (4/10)^2 − (6/10)^2 = 12/25
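These answers can be verified by simulation. A sketch (assuming, consistent with the calculation above, that A is served by Mary and B by John; not part of the lecture):

```python
import random

random.seed(42)
n = 100_000
c_last = 0
total_leave = 0.0
for _ in range(n):
    a = random.expovariate(1/4)              # A's service time (Mary, mean 4)
    b = random.expovariate(1/6)              # B's service time (John, mean 6)
    freed_rate = 1/6 if b < a else 1/4       # C takes whichever teller frees first
    c = min(a, b) + random.expovariate(freed_rate)
    total_leave += max(a, b, c)              # the car leaves when all three are done
    if c > max(a, b):
        c_last += 1
print(total_leave / n, c_last / n)           # ≈ 10 minutes, ≈ 12/25 = 0.48
```

By memorylessness, sampling a fresh exponential for C at the freed teller is exactly the right thing to do; no residual bookkeeping is needed.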

17.2 Poisson Process


Definition 17.1 (Poisson Process). N = {N (t), t ≥ 0} is said to be a Poisson process with
rate λ if

(i) N (t) − N (s) ∼ Poisson(λ(t − s)) for any 0 ≤ s < t

(ii) N has independent increments, i.e., N (t) − N (s) is independent of N (u) for 0 ≤ u ≤ s

(iii) N (0) = 0

N models a counting process; e.g., N(t) is the number of arrivals in [0, t]. First, over any time interval, say (s, t], the number of arrivals follows a Poisson distribution. Second, the number of arrivals during a past interval does not affect the number of arrivals in a future interval. Where do we see this in the real world? When there is a very large population base, say 100 thousand or 1 million people, and each person makes an independent decision (e.g., placing a call) with small probability, you will see a Poisson process.

Example 17.2. Assume N is a Poisson process with rate λ = 2/minutes.

(i) Find the probability that there are exactly 4 arrivals in first 3 minutes.

    P(N(3) − N(0) = 4) = ((2(3 − 0))^4/4!) e^{−2(3−0)} = (6^4/4!) e^{−6} = 0.1339

(ii) What is the probability that exactly two arrivals in [0, 2] and at least 3 arrivals in
[1, 3]?

    P({N(2) = 2} ∩ {N(3) − N(1) ≥ 3})
      = P(N(1) = 0, N(2) = 2, N(3) − N(1) ≥ 3)
      + P(N(1) = 1, N(2) = 2, N(3) − N(1) ≥ 3)
      + P(N(1) = 2, N(2) = 2, N(3) − N(1) ≥ 3)
      = P(N(1) = 0, N(2) − N(1) = 2, N(3) − N(2) ≥ 1)
      + P(N(1) = 1, N(2) − N(1) = 1, N(3) − N(2) ≥ 2)
      + P(N(1) = 2, N(2) − N(1) = 0, N(3) − N(2) ≥ 3)
      = P(N(1) = 0) P(N(2) − N(1) = 2) P(N(3) − N(2) ≥ 1)
      + P(N(1) = 1) P(N(2) − N(1) = 1) P(N(3) − N(2) ≥ 2)
      + P(N(1) = 2) P(N(2) − N(1) = 0) P(N(3) − N(2) ≥ 3)

What am I doing here? Basically, I am decomposing the intervals into non-overlapping


ones. Then, I will be able to use the independent increment property. We have learned
how to compute P(N (1) = 0), P(N (2) − N (1) = 2). How about P(N (3) − N (2) ≥ 1)?

    P(N(3) − N(2) ≥ 1) = 1 − P(N(3) − N(2) < 1) = 1 − P(N(3) − N(2) = 0)
                       = 1 − (2^0/0!) e^{−2} = 1 − e^{−2}

(iii) What is the probability that there is no arrival in [0, 4]?

P(N (4) − N (0) = 0) = e−8

(iv) What is the probability that the first arrival will take at least 4 minutes? Let T1 be
the arrival time of the first customer. Is T1 a continuous or discrete random variable?
Continuous.

P(T1 > 4) = P(N (4) = 0) = e−8

Can you understand the equality above? In plain English, “the first arrival takes at
least 4 minutes” is equivalent to “there is no arrival for the first 4 minutes.” It is very
important duality. What if we change “4” minutes to t minutes?

P(T1 > t) = P(N (t) = 0) = e−2t

Surprisingly, T1 is an exponential random variable. In fact, the times between arrivals also follow the same iid exponential distribution. We will cover this topic further after Spring break.

17.2.1 Poisson Random Variable


Suppose Y ∼ Poisson(µ). Then,

    P(Y = k) = (µ^k/k!) e^{−µ},    k = 0, 1, 2, · · · .
What is the meaning of µ? For a Poisson random variable there is no notion of rate; µ = E[Y] is simply the mean.
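The probabilities in Example 17.2 can be checked numerically with the Poisson pmf. A sketch (mine, not from the lecture):

```python
import math

def poisson_pmf(k, mu):
    # P(Y = k) for Y ~ Poisson(mu)
    return mu**k * math.exp(-mu) / math.factorial(k)

# (i) exactly 4 arrivals in the first 3 minutes (mean 2*3 = 6)
print(poisson_pmf(4, 6.0))                 # ≈ 0.1339

# (ii) P(N(2) = 2 and N(3) - N(1) >= 3), via independent increments
total = 0.0
for n1 in range(3):                        # N(1) in {0, 1, 2}
    p1 = poisson_pmf(n1, 2.0)              # N(1) ~ Poisson(2)
    p2 = poisson_pmf(2 - n1, 2.0)          # N(2) - N(1) ~ Poisson(2)
    need = 1 + n1                          # arrivals still needed in (2, 3]
    p3 = 1.0 - sum(poisson_pmf(k, 2.0) for k in range(need))
    total += p1 * p2 * p3
print(total)
```

The loop mirrors the decomposition into non-overlapping increments used in part (ii) of the example.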

18 Lecture 19: Mar 29
18.1 Time-nonhomogeneous Poisson Processes
Example 18.1. Let N (t) be the number of arrivals in [0, t]. Assume that N = {N (t), t ≥ 0}
is a Poisson process with rate λ = 2/min.
(i)
    P(N(3, 7] = 5) = ((2(7 − 3))^5/5!) e^{−2(7−3)} = (8^5/5!) e^{−8}

(ii) Let T1 be the arrival time of the 1st customer.

    P(T1 > t) = P(N(0, t] = 0) = ((2t)^0/0!) e^{−2t} = e^{−2t},    t ≥ 0
What is this distribution? It is the exponential distribution: T1 is exponentially distributed with mean 0.5 minutes. Therefore,

    P(T1 ≤ t) = 1 − e^{−2t}.

(iii) Let T3 be the arrival time of the 3rd customer. Which of the following is correct?

    P(T3 > t) = P(N(0, t] ≤ 2) = P(N(0, t] = 0) + P(N(0, t] = 1) + P(N(0, t] = 2)
              = e^{−2t} + (2t/1!) e^{−2t} + ((2t)^2/2!) e^{−2t}

    P(T3 > t) = P(N(0, t] = 2)

The first equation is correct. Now we can compute the cdf of T3:

    P(T3 ≤ t) = 1 − P(T3 > t) = 1 − ( e^{−2t} + (2t/1!) e^{−2t} + ((2t)^2/2!) e^{−2t} )

Can we compute the pdf of T3? We can take the derivative of the cdf to obtain the pdf:

    f_{T3}(t) = (2(2t)^2/2!) e^{−2t}

What random variable is this? Poisson? No: Poisson is a discrete r.v., and T3 can obviously take non-integer values. It is the gamma distribution. To help you distinguish the different 2’s here, let me use λ:

    (λ(λt)^2/2!) e^{−λt} ∼ Gamma(3, λ)
α = 3 is the shape parameter and λ is the scale parameter. The gamma distribution is easiest to understand when α is an integer, but it is still defined when α is not an integer. It is then tricky to compute something like (2.3)!, and that is where Γ(α), the gamma function, comes into play. You can look up a table to get its value for a given α.
Now think about why we get the gamma distribution for T3. Each inter-arrival time is an exponential r.v., and they are iid; T3 is the sum of three of them, which is why we get the gamma distribution. By the way, the Erlang distribution, if you have heard of it, is the gamma distribution with integer α.
Why do we need non-homogeneity? Suppose you are observing a hospital’s hourly arrival
rate. Arrival rate will not be the same over time during a day. See the following figure.
[Figure: hourly arrival rate over a 24-hour day, comparing a constant (homogeneous) rate with a time-varying (non-homogeneous) rate.]

Therefore, to model this type of real-world phenomenon, we need a more sophisticated model.
Definition 18.1 (Non-homogeneous Poisson Process). N = {N (t), t ≥ 0} is said to be a
time-inhomogeneous Poisson process with rate function λ = {λ(t), t ≥ 0} if
(i) N has independent increments,
(ii)
    P(N(s, t] = k) = ( (∫_s^t λ(u) du)^k / k! ) e^{−∫_s^t λ(u) du},

where N(s, t] = N(t) − N(s) is the number of arrivals in (s, t].
Non-homogeneous Poisson process allows us to model a rush-hour phenomenon.
Example 18.2. Suppose we modeled the arrival rate of a store as follows. One month after
launching, the arrival rate settles down and it jumps up at the end of the second month due
to discount coupon.

[Figure: the rate function λ(t): ramping up as λ(t) = t during the first month, settling at 1 during the second month, and jumping to 2 afterward, consistent with the integrals computed below.]

Assume N is a time-non-homogeneous Poisson process with rate function λ = {λ(t), t ≥ 0} given in the figure above. Then the average number of arrivals in (0, .5] is ∫_0^{.5} λ(t) dt = (1/2)(.5)^2 = .125.

(i)
    P(N(.5) ≥ 2) = 1 − P(N(.5) < 2) = 1 − P(N(.5) = 0) − P(N(.5) = 1)

    P(N(.5) = 1) = (.125^1/1!) e^{−.125}

(ii)
    P(at least 1 arrival in (0, 1] and 4 arrivals in (1, 3]) = P(N(0, 1] ≥ 1) · P(N(1, 3] = 4)

by independent increments, where

    P(N(0, 1] ≥ 1) = 1 − P(N(0, 1] = 0) = 1 − e^{−1/2}

    P(N(1, 3] = 4) = (3^4/4!) e^{−3}
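A numeric sketch of these calculations, assuming the rate function λ(t) = t on [0, 1], 1 on (1, 2], and 2 afterward (my reading of the figure, an assumption):

```python
import math

def rate(t):
    # Assumed rate function: linear ramp, then 1, then 2 after the coupon.
    if t <= 1.0:
        return t
    return 1.0 if t <= 2.0 else 2.0

def mean_arrivals(s, t, steps=200_000):
    # Midpoint-rule numerical integral of the rate over (s, t].
    h = (t - s) / steps
    return h * sum(rate(s + (i + 0.5) * h) for i in range(steps))

mu = mean_arrivals(0.0, 0.5)                       # ≈ 0.125
p_at_least_2 = 1.0 - math.exp(-mu) - mu * math.exp(-mu)
print(mu, p_at_least_2)
print(mean_arrivals(1.0, 3.0))                     # ≈ 3, as used above
```

Once the integrated rate over an interval is known, every probability reduces to an ordinary Poisson pmf with that mean.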

18.2 Overview on Continuous-Time Markov Chain (CTMC)


A CTMC also has a state space, say S = {1, 2, 3}. Instead of a transition matrix, we have a roadmap matrix R here, whose entries satisfy Rij ≥ 0 and Σ_j Rij = 1.

[Figure: a sample path of a CTMC, state versus time; the chain holds in state i for an Exp(λi) amount of time before jumping.]

19 Lecture 20: Mar 31
19.1 Thinning Poisson Process
19.1.1 Merging Poisson Process
We have covered most Poisson-process topics. The remaining ones are merging and splitting of Poisson arrival processes. At Georgia Tech, there are two entrances; denote the numbers of arrivals through gates A and B by NA, NB.

NA = {NA (t), t ≥ 0}, NB = {NB (t), t ≥ 0}

Theorem 19.1. Assume NA is a Poisson process with rate λA, NB is a Poisson process with rate λB, and NA and NB are independent. Then N = {N(t), t ≥ 0} is a Poisson process with rate λ = λA + λB, where N(t) = NA(t) + NB(t).

The independence assumption may or may not be true. In the Georgia Tech case, more people through gate A may mean fewer people through gate B. Think about Apple’s products: would sales of the iPad be correlated with sales of the iPhone, or independent? We need to be careful about the independence condition when we model the real world.

19.1.2 Splitting Poisson Process


Now think about splitting a Poisson process. N = {N (t), t ≥ 0} is a Poisson process
describing the arrival process. Each customer has to make a choice which of two stores they
shop, A or B. To make the decision, they flip a biased coin with probability p of getting a
head. Let NA (t) be the number of arrivals to store A in (0, t]. Let NB (t) be the number of
arrivals to store B in (0, t]. Compose N (t) = NA (t) + NB (t).

Theorem 19.2. Suppose N is Poisson with rate λ. Then, NA is a Poisson process with rate
pλ and NB is a Poisson process with rate (1 − p)λ.

It may seem silly to choose which store to shop at by flipping a coin. From a company’s perspective, however, people are modeled as choosing by coin flips, with the bias inferred statistically. A company with two products will model people’s choices using a lot of tracking data and draw conclusions by statistical inference.
Let me give you the sketch of the proof of the theorem above. (By sketch, it means that
it won’t be on the test.)
Proof.

    NA(t) ∼ Poisson(λpt),    NB(t) ∼ Poisson(λ(1 − p)t)

Define

    NA(t) = Σ_{i=1}^{N(t)} Yi,    where Yi = 1 if the ith toss is a head and Yi = 0 if it is a tail,

and {Yi} is iid and independent of N(t). Then,


We want to show that P(NA(t) = k) = ((λpt)^k/k!) e^{−λpt}. Compute

    P(NA(t) = k) = P( Σ_{i=1}^{N(t)} Yi = k )
      = Σ_{n=k}^∞ P( Σ_{i=1}^{N(t)} Yi = k | N(t) = n ) P(N(t) = n)
      = Σ_{n=k}^∞ P( Σ_{i=1}^n Yi = k | N(t) = n ) ((λt)^n/n!) e^{−λt}
      = Σ_{n=k}^∞ P( Σ_{i=1}^n Yi = k ) ((λt)^n/n!) e^{−λt}
      = Σ_{n=k}^∞ (n choose k) p^k (1 − p)^{n−k} ((λt)^n/n!) e^{−λt}
      = Σ_{n=k}^∞ (n!/((n − k)! k!)) p^k (1 − p)^{n−k} ((λt)^n/n!) e^{−λt}
      = e^{−λt} (p^k/k!) Σ_{n=k}^∞ (1/(n − k)!) (1 − p)^{n−k} (λt)^k (λt)^{n−k}
      = e^{−λt} (p^k (λt)^k/k!) Σ_{n=k}^∞ ((1 − p)λt)^{n−k}/(n − k)!
      = e^{−λt} (p^k (λt)^k/k!) ( 1 + ((1 − p)λt)/1! + ((1 − p)λt)^2/2! + · · · )
      = e^{−λt} (p^k (λt)^k/k!) e^{(1−p)λt}    (using the Taylor expansion)
      = ((λpt)^k/k!) e^{−λpt}
The difficulty here is that, in the summation of the Yi, the Yi and N are entangled. But we have conditioning! In the conditioning step we used the following property of conditional probability, the law of total probability:

    P(A) = Σ_n P(A | Bn) P(Bn)

The Taylor expansion is

    e^x = 1 + x + x^2/2! + x^3/3! + · · · .
Although we obtained what we wanted, we still need to prove the independence of NA and NB. To do that, we need to show

    P(NA(t) = k, NB(t) = l) = ((λpt)^k e^{−λpt}/k!) · ((λ(1 − p)t)^l e^{−λ(1−p)t}/l!),

which is the product form. We will not go over the computation, but it is very similar.
For the Poisson process, independent increments are another important concept, but we will not go over that in detail. FYI, as another example: if customers do not toss coins and instead odd-numbered customers go to store A and even-numbered customers go to store B, the split processes will not be Poisson.
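A Monte Carlo sketch of the splitting (thinning) theorem (parameters are mine, not from the lecture):

```python
import random

# Arrivals at rate lam on (0, t]; each goes to store A with probability p.
# By the theorem, N_A(t) should be Poisson with mean lam * p * t.
random.seed(7)
lam, p, t, reps = 3.0, 0.4, 1.0, 100_000
total_a = 0
for _ in range(reps):
    s = 0.0
    while True:
        s += random.expovariate(lam)     # iid exponential inter-arrival times
        if s > t:
            break
        if random.random() < p:          # biased coin: customer chooses A
            total_a += 1
print(total_a / reps)                    # ≈ lam * p * t = 1.2
```

Generating the Poisson process from iid exponential inter-arrival times uses exactly the duality between arrival counts and arrival times discussed earlier.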

19.2 Examples of CTMC


Switching gears, let us go into CTMCs.
Example 19.1. A machine goes up and down over time. When it is down, it takes di amount of time to be repaired, where di is exponentially distributed with mean 1 hour and {di, i = 1, 2, 3, · · · } is an iid sequence. When it is up, it stays up for ui amount of time, exponentially distributed with mean 10 hours, where {ui, i = 1, 2, 3, · · · } is an iid sequence. Assume that up times and repair times are independent. Can the problem be modeled by a CTMC?
The answer is yes. What are the components of a CTMC? First, there has to be a state
space, in this case S = {up, down}. Second, each state has to have corresponding holding
time, in this case λup = 1/10, λdown = 1. Last, we need a roadmap matrix R from the
underlying DTMC.
 
    R = [ 0  1
          1  0 ]
Example 19.2. We have two machines each of which having the same characteristics as
before and there is only one repairperson. What is X(t)? Let X(t) be the number of
machines that are up at time t. State space S = {0, 1, 2}. Is X = {X(t), t ≥ 0} a CTMC?
What is the holding time of state 2? In the state 2, two machines are competing to fail
first. The holding time is min(X1 , X2 ) ∼ Exp(1/10 + 1/10) = Exp(1/5). How about the
holding time for state 1? At this state, the repairperson and the other operating machine
are competing to finish first. Due to the memoryless property, the other alive machine is
as if new. The holding time follows Exp(1/1 + 1/10) = Exp(11/10). At the end of state 2,
what would be the next state? This is where the roadmap matrix comes into play.
    R1,2 = 1/(1 + 1/10),    R1,0 = (1/10)/(1 + 1/10)
You can compute the entire R matrix based on similar logic.

20 Lecture 21: Apr 5
20.1 Review of CTMC
We have two machines and one repairman. Up times and down times follow the following
distributions.

Up times ∼ Exp(1/10)
Down times ∼ Exp(1)

Let X(t) be the number of machines that are up at time t. Since X(t) is a stochastic process,
I am going to model it using CTMC. For CTMC, we need three ingredients. First, state
space S = {0, 1, 2}. Second, the holding-time rates λ0, λ1, λ2. When you compute λi, i ∈ S, think about each case separately.
• When X(t) = 0, the repairman is working on one of two machines both of which are
down at the moment, so λ0 = 1.

• When X(t) = 1, the holding time follows min(up time, down time) = Exp((1/10) + 1),
so λ1 = 11/10.

• When X(t) = 2, the holding time in this case follows min(up time, up time) = Exp(0.1+
0.1), so λ2 = 1/5.
Now we know when the chain will jump, but we don’t know to which state the chain will
jump. Roadmap matrix, the last ingredient of a CTMC, tells us the probability.
 
    R = 0 [ 0                   1    0             ]
        1 [ (1/10)/(1 + 1/10)   0    1/(1 + 1/10)  ]
        2 [ 0                   1    0             ]

We have specified the input data parameters we need to model a CTMC.


Let us introduce the concept of generator matrix which is somewhat more convenient
than having holding times and roadmap matrix separately.
   
    G = 0 [ −λ0      λ0 R01   λ0 R02 ]     [ −1      1       0   ]
        1 [ λ1 R10   −λ1      λ1 R12 ]  =  [ 1/10   −11/10   1   ]
        2 [ λ2 R20   λ2 R21   −λ2    ]     [ 0       1/5    −1/5 ]

We can draw a rate diagram showing these data graphically. If you are given the following
diagram, you should be able to construct G from it.

[Rate diagram: 0 → 1 at rate 1 and 1 → 2 at rate 1 (repairs); 2 → 1 at rate 1/5 and 1 → 0 at rate 1/10 (failures).]

If you look at the diagram, maybe you can interpret the problem more intuitively.
Example 20.1. We have two operators and three phone lines. Calls arrival follows a Poisson
process with rate λ = 2 calls/minute. Each call processing time is exponentially distributed
with mean 5 minutes. Let X(t) be the number of calls in the system at time t. In this
example, we assume that calls arriving when no phone line is available are lost. What would
the state space be? S = {0, 1, 2, 3}.
[Rate diagram: arrivals 0 → 1 → 2 → 3, each at rate 2; services 1 → 0 at rate 1/5, 2 → 1 at rate 2/5, 3 → 2 at rate 2/5.]

How do we compute the roadmap matrix based on this diagram? For example, you are
computing R21 , R23 . Simply just add up all rates leaving from state 2 and divide the rate to
the destination state by the sum.
    R23 = 2/(2 + 2/5),    R21 = (2/5)/(2 + 2/5)
Now, let us add another condition. If each customer has a patience that is exponentially
distributed with mean 10 minutes. When the waiting time in the queue of a customer exceeds
his patience time, the customer abandons the system without service. Then, the only change
we need to make is the rate on the arrow from state 3 to state 2.
[Rate diagram: as above, except that the rate from state 3 to state 2 becomes 2/5 + 1/10.]

Example 20.2. John and Mary are the two operators. John’s processing times are ex-
ponentially distributed with mean 6 minutes. Mary’s processing times are exponentially
distributed with mean 4 minutes. Model this system by a CTMC. What would the state
space be? S = {0, 1J, 1M, 2, 3}. Why can’t we use S = {0, 1, 2, 3}? Let’s see and let the
question be open so far. Let us draw the diagram first assuming that S = {0, 1, 2, 3}.
[Attempted rate diagram on S = {0, 1, 2, 3}: arrivals at rate 2; 2 → 1 at rate 1/6 + 1/4; 3 → 2 at rate 1/6 + 1/4 + 1/10; the rate from 1 to 0 is unknown (“?”).]

We cannot determine the rate from state 2 to state 1 because we don’t know who is processing
the call, John or Mary. So, we cannot live with S = {0, 1, 2, 3}. For Markov chain, it
is really an important concept that we don’t have to memorize all the history up to the
present. It’s like “Just tell me the state. I will tell you how it will evolve.” Then, let’s see if
S = {0, 1J, 1M, 2, 3} works.

[Rate diagram on S = {0, 1J, 1M, 2, 3}: from 0, arrivals go to 1J or 1M with an unknown split (“?”); 1J → 0 at rate 1/6, 1M → 0 at rate 1/4; 1J → 2 and 1M → 2 at rate 2; 2 → 1J at rate 1/4, 2 → 1M at rate 1/6; 2 → 3 at rate 2; 3 → 2 at rate 1/6 + 1/4 + 1/10.]

Even in this case, we cannot determine who takes a call when the call arrives when both
of them are free. It means that we do not have enough specification to model completely.
In tests or exams, you will see more complete description. It is part of manager’s policy.
You may want John takes the call when both are free. You may toss a coin whenever such
cases happen. What would the optimal policy be in this case? It depends on your objective.
Probably, John and Mary are not paid same. You may want to reduce total labor costs or
the average waiting time of customers.
Anyway, suppose now that we direct every call to John when both are free.

[Rate diagram: as above, but now 0 → 1J at rate 2 and there is no arrow from 0 to 1M.]

We now have complete information for modeling a CTMC; these are the inputs. What outputs are we interested in? We want to know the fraction of time the chain spends in each state in the long run, and how many customers are lost. We may also plan to install one more phone line and want to evaluate the improvement from the new installation. We will cover these topics next time.
This Thursday will be test 2. Topics covered in the test are
• DTMC
• exponential distribution
• Poisson process: time homogeneous, time non-homogeneous.

21 Lecture 22: Apr 12
21.1 CTMC Example Revisited
Example 21.1. A small call center with 3 phone lines and two agents, Mary and John.
Call arrivals follow a Poisson process with rate λ = 2 calls per minute. Mary’s processing
times are iid exponential with mean 4 minutes. John’s processing times are iid exponential
with mean 6 minutes. Customer’s patience follows iid exponential with mean 10 minutes.
An incoming call to an empty system always goes to John.
The rate diagram will be as follows.

[Rate diagram on S = {0, 1J, 1M, 2, 3}: 0 → 1J at rate 2; 1J → 0 at rate 1/6, 1M → 0 at rate 1/4; 1J → 2 and 1M → 2 at rate 2; 2 → 1J at rate 1/4, 2 → 1M at rate 1/6; 2 → 3 at rate 2; 3 → 2 at rate 1/6 + 1/4 + 1/10.]

It is very tempting to model this problem as follows. It is wrong; you would get no credit for modeling it like this on the test.
2 2 2

0 1 2 3

? 1/6+1/4 1/6+1/4+1/10

This model is not detail enough to capture all situations explained in the question. You
cannot determine what number should go into the rate from state 1 to 0 because you did
not take care of who is handling the phone call if there is only one.

21.2 Simulating CTMC


Definition 21.1 (Continuous-Time Markov Chain). Let S be a discrete space. (It is called
the state space.) For each state i ∈ S, let {ui (j), j = 1, 2, · · · } be a sequence of iid r.v.’s
having exponential distribution with rate λi and {φi (j), j = 1, 2, · · · } be a sequence of iid
random vectors.

For example, S = {1, 2, 3} and φi(j) takes values among the vectors (1, 0, 0), (0, 1, 0), (0, 0, 1). Think of throwing a three-sided die to choose one of the three possible vectors for the next φi value.

[Figure: a simulated path; the chain spends ui(j) time units in state i on its jth visit before jumping.]
Think of φi (j) as the outcome of the jth toss of the ith coin. If we have the roadmap
matrix,
 
    R = [ 0    1/2  1/2
          1/4  0    3/4
          1/8  7/8  0   ],
each row i represents the ith coin. In this case, the first coin is fair but the other two are biased. Now we can generate a series of vectors φi(j).
Then, how can we generate the iid exponentially distributed random variables ui(j)? A computer can only drop a needle uniformly within an interval, say [0, 1]: we can ask it to draw a uniform random number between 0 and 1. How can we generate an exponentially distributed random variable from this basic building block? Say we are trying to generate exponential random variables with rate 4. Define
    X = −(1/4) ln(1 − U).
Then,
 
    P(X > t) = P( −(1/4) ln(1 − U) > t )
             = P( ln(1 − U) < −4t ) = P( 1 − U < e^{−4t} )
             = P( U > 1 − e^{−4t} )
             = 1 − (1 − e^{−4t})
             = e^{−4t}.
We got the exponential distribution we wanted using just a uniform random variable. You will learn how to generate other types of random variables from a uniform distribution in the simulation course; that is the main topic there. Simulation is in fact an integral part of the IE curriculum.
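The inverse-transform trick above, as a runnable sketch (a sanity check, not from the lecture):

```python
import math
import random

# Generate Exp(rate=4) variates as X = -ln(1 - U)/4 with U ~ Uniform(0, 1),
# then check that the sample mean is close to 1/4.
random.seed(0)
rate = 4.0
n = 200_000
total = 0.0
for _ in range(n):
    u = random.random()                  # U ~ Uniform(0, 1)
    total += -math.log(1.0 - u) / rate   # X ~ Exp(rate)
print(total / n)                         # ≈ 1/rate = 0.25
```

The same inverse-cdf idea generalizes: for any distribution with invertible cdf F, the variate F^{-1}(U) has cdf F.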

21.3 Markov Property

    P(X(t + s) = j | X(t0) = i0, X(t1) = i1, · · · , X(tn−1) = in−1, X(t) = i)
        = P(X(t + s) = j | X(t) = i) = P(X(s) = j | X(0) = i)

for any t0 < t1 < · · · < tn−1 < t, any i0, i1, · · · , in−1 ∈ S, and any i, j ∈ S. This is the defining property of a CTMC, where Pij(s) = P(X(s) = j | X(0) = i) is the transition function. As in the DTMC case, the Chapman-Kolmogorov equation holds.

Example 21.2. Let X = {X(t), t ≥ 0} be a CTMC on state space S = {1, 2, 3} with the
following rate diagram.
[Rate diagram: 1 → 2 at rate 2, 2 → 3 at rate 1, 2 → 1 at rate 5, 3 → 2 at rate 4.]

The generator is
 
    G = [ −2   2   0
           5  −6   1
           0   4  −4 ].

Suppose you are asked to compute P1,3 (10) meaning the probability going from state 1 to
state 3 after 10 minutes. Using Kolmogorov-Chapman equation,
P1,3 (10) = Σ_k P1,k (5) Pk,3 (5) = [P (5)^2 ]1,3 .
k

You can compute P1,3 (10) once you are given the matrix P (5). How about starting from P (1) or P (1/10)? We can still recover P (10) by raising these matrices to the 10th or 100th power. For small t we also have the first-order approximation

P (t) ≈ P ′ (0)t + P (0) = P ′ (0)t + I

What we are really talking about is the derivative of the matrix:

P (t) = e^{tG}

Exponentiating a matrix? There are two commands in Matlab relevant to exponentiating a matrix.
>> exp(A)     % entrywise exponential of each entry, e.g. [e^a1 e^a2; e^a3 e^a4]
>> expm(A)    % the matrix exponential; this is what we want


expm(A) = Σ_{n=0}^{∞} A^n /n!

When you ask the calculator (or Matlab) for expm(tG), what you get is P (t). These time-dependent probabilities P (t) are called the transient probabilities of the chain, as opposed to its steady-state behavior.
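To make the matrix exponential concrete, here is a minimal Python sketch of expm as the truncated power series Σ A^n /n!, applied to the generator of Example 21.2. The helper names are our own, and production routines (such as Matlab's expm) use more robust algorithms; this is only an illustrative sketch.

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=100):
    """Truncated power series expm(A) = sum_{n>=0} A^n / n!.
    Adequate for this small example; real expm routines use
    scaling-and-squaring for numerical robustness."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0/0! = I
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k!, built from the previous term
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Generator of Example 21.2.
G = [[-2.0, 2.0, 0.0], [5.0, -6.0, 1.0], [0.0, 4.0, -4.0]]
P1 = mat_exp(G)                   # P(1) = expm(1*G)
print([sum(row) for row in P1])   # each row of P(1) sums to 1: it is stochastic
```

The row sums confirm that expm(tG) is a genuine transition matrix, unlike the entrywise exp.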

22 Lecture 23: Apr 14
22.1 Generator Matrix
Let X be a CTMC on state space S = {1, 2, 3} with generator
 
−2 1 1
G =  2 −5 3  .
2 2 −4

Find P(X(2) = 3|X(0) = 1).


Find P(X(5) = 2, X(7) = 3|X(0) = 1).

P(X(5) = 2, X(7) = 3|X(0) = 1) =P(X(5) = 2|X(0) = 1)P(X(7) = 3|X(5) = 2)


=P(X(5) = 2|X(0) = 1)P(X(2) = 3|X(0) = 2)

Generator matrix is the derivative of transition matrix at time 0.

P (t) = (Pij (t))
P ′1,1 (0) = −λ1 = G1,1
P ′i,j (0) = Gij

Remember the Chapman-Kolmogorov equation? Differentiate it with respect to s.

P (t + s) = P (t)P (s) for any t, s ≥ 0

(d/ds) P (t + s)|s=0 = P ′ (t) = P (t)P ′ (0) = P (t)G,  t ≥ 0

With P (0) = I, the solution is P (t) = e^{tG} = expm(tG).

In reality, we can compute expm without Matlab only in a few cases. In such special
case, you can first obtain eigenvalues of G matrix using the following formula.

GV1 = a1 V1 ,  GV2 = a2 V2 ,  GV3 = a3 V3

If all eigenvalues are distinct, we can exponentiate the matrix rather easily. Let V = [V1 V2 V3 ]. Then

GV = V diag(a1 , a2 , a3 ) ⇒ G = V diag(a1 , a2 , a3 ) V^{−1}
G^2 = V diag(a1^2 , a2^2 , a3^2 ) V^{−1} ,  G^n = V diag(a1^n , a2^n , a3^n ) V^{−1}

Hence,

expm(G) = Σ_{n=0}^{∞} G^n /n! = V diag( Σ_n a1^n /n! , Σ_n a2^n /n! , Σ_n a3^n /n! ) V^{−1} = V diag( e^{a1} , e^{a2} , e^{a3} ) V^{−1} .

Let us run a Matlab experiment.


>> G=[-2 1 1; 2 -5 3; 2 2 -4]
>> expm(2*G)
>> expm(5*G)
>> exp(5*G)
When you compute e5G , you will see all rows identical. It seems like the chain reaches
steady-state. Also, if you look at the result from “exp(5*G)” command, it is just completely
wrong for our purpose.
Now we can answer the questions raised at the beginning of the class.

P(X(2) = 3 | X(0) = 1) = 0.2856,    P(X(5) = 2, X(7) = 3 | X(0) = 1) = (0.2143)(0.2858) = 0.06124694

Since you are not allowed to use Matlab on the test, you will be given the needed matrices, such as expm(G).

22.2 Stationary Distribution of CTMC


By definition, a stationary distribution of a CTMC is one such that if you start with that distribution, you never deviate from it. For example, if (a, b, c) is the stationary distribution of a CTMC X and X(0) is distributed as (a, b, c), then X(5) and X(10.2) are also distributed as (a, b, c).
How can we compute the stationary distribution without using a computer? As in the DTMC case,
we might want to use πP = π. But, in this case, P (t) can change over time. Which P (t)
should we use? In addition, it is usually hard to compute P (t) from generator matrix without
using a computer. We have to come up with the way to compute the stationary distribution
only with the generator. Since π = πP (t) should hold for all t ≥ 0, if we take derivative of
both sides,

0 = πP 0 (0) ⇒ πG = 0.

In our case,
 
(π1 , π2 , π3 ) [ −2 1 1 ; 2 −5 3 ; 2 2 −4 ] = 0.

A bit of theory is involved here.


(i) Because X is irreducible, it has at most one stationary distribution.

(ii) Because S is finite, it has at least one stationary distribution.

Therefore, we have one unique stationary distribution in our case.
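The system πG = 0 together with the normalization Σ πi = 1 can be solved mechanically. Here is a hedged sketch using exact rational arithmetic; the Gauss-Jordan helper is our own, not anything from the lecture.

```python
from fractions import Fraction

def stationary_from_generator(G):
    """Solve pi*G = 0 with sum(pi) = 1 by Gauss-Jordan elimination
    over the rationals (a hand-rolled helper, not a library routine)."""
    n = len(G)
    # One equation per column j of G (drop the redundant last one),
    # plus the normalization row of all ones.
    A = [[Fraction(G[i][j]) for i in range(n)] for j in range(n - 1)]
    A.append([Fraction(1)] * n)
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)  # pivot row
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

G = [[-2, 1, 1], [2, -5, 3], [2, 2, -4]]
pi = stationary_from_generator(G)
print(pi)  # pi = (1/2, 3/14, 2/7)
```

Using fractions rather than floats makes it easy to check that the answer satisfies πG = 0 exactly.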

Example 22.1. Consider a call center with two homogeneous agents and 3 phone lines.
Arrival process is Poisson with rate λ = 2 calls per minute. Processing times are iid expo-
nentially distributed with mean 4 minutes.

(i) What is the long-run fraction of time that there are no customers in the system? π0

(ii) What is the long-run fraction of time that both agents are busy? π2 + π3

(iii) What is the long-run fraction of time that all three lines are used? π3

X(t) is the number of calls in the system at time t. S = {0, 1, 2, 3}. We can draw the
rate diagram based on this information. In fact, having the rate diagram is equivalent to
having the generator matrix. When we solve πG = 0, it is really just solving flow balancing
equations, flow in = flow out in each state.
2π0 = (1/4)π1 ,  2π1 = (1/2)π2 ,  2π2 = (1/2)π3 ,  π0 + π1 + π2 + π3 = 1
Solving this by setting π0 = 1 and normalizing the result, we obtain
 
π = (1/169)(1, 8, 32, 128) = (1/169, 8/169, 32/169, 128/169).

Your manager may also be interested in other performance measures.

(i) The number of calls lost per minute is λπ3 = 2(128/169) which seems to be quite high.

(ii) The throughput of the system is λ(1 − π3 ).
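The cut equations for this call center can be solved in a few lines. A sketch in exact arithmetic (variable names are our own):

```python
from fractions import Fraction

# Cut equations for the call center: lambda = 2 calls/min, mean service time
# 4 minutes (mu = 1/4 per agent), 2 agents, 3 lines, so states 0, 1, 2, 3.
lam = Fraction(2)
mu = Fraction(1, 4)
death = [None, 1 * mu, 2 * mu, 2 * mu]  # downward rate out of state n

weights = [Fraction(1)]                 # unnormalized pi, with pi_0 = 1
for n in range(1, 4):
    # cut between states n-1 and n: lam * pi_{n-1} = death[n] * pi_n
    weights.append(weights[-1] * lam / death[n])

total = sum(weights)                    # = 169
pi = [w / total for w in weights]       # pi = (1, 8, 32, 128) / 169
print(pi)
print(lam * pi[3])                      # calls lost per minute: 256/169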

23 Lecture 24: Apr 19
23.1 M/M/1 Queue as a CTMC
Suppose we have an M/M/1 queue, meaning that we have Poisson arrival process with rate λ
arrivals/minute and service times are iid exponentially distributed with rate µ. To illustrate
the point, set λ = 1/2, µ = 1. Assume that the buffer size is infinite. Let X(t) be the
number of customers in the system at time t. Then, X = {X(t), t ≥ 0} is a CTMC with
state space S = {0, 1, 2, · · · }. What is the easiest way to model this as a CTMC? Draw the
rate diagram.

(Rate diagram: states 0, 1, 2, 3, · · · with arrival rate λ from n to n + 1 and service rate µ from n to n − 1.)

Is this CTMC irreducible? Yes. Does it have a stationary distribution? It depends on the relationship between λ and µ. What if λ > µ? The queue will eventually grow without bound; in that case, we don't have a stationary distribution. If λ < µ, we will have a unique stationary distribution. Even if λ = µ, we won't have a stationary distribution. We will look into this later.
How can we determine the stationary distribution? We can get one by using πG = 0,
but let us first try the cut method. If we cut states into two groups, in steady state, flow
going out from and in to one group should equate. Therefore,

π0 λ =π1 µ ⇒ π1 = ρπ0
π1 λ =π2 µ ⇒ π2 = ρπ1
π2 λ =π3 µ ⇒ π3 = ρπ2
..
.

where ρ = λ/µ = 0.5 in this case. Solving the system of equations, you will get the following
solution.

π1 =ρπ0
π2 =ρ2 π0
π3 =ρ3 π0
..
.

The problem is that we don’t know what π0 is. Let us determine π0 intuitively first. If server
utilization is ρ, it means that in the long run the server is not busy for 1 − ρ fraction of time.

Therefore, π0 = 1 − ρ. We can then get every πi from πi = ρ^i π0 . We can also solve for π0 analytically. Remember that the stationary distribution should sum to 1.
π0 + π1 + π2 + · · · = 1
π0 + ρπ0 + ρ^2 π0 + · · · = 1
π0 (1 + ρ + ρ^2 + · · · ) = 1
π0 · 1/(1 − ρ) = 1 ⇒ π0 = 1 − ρ
We expected this. More concretely,
π2 = (1 − 1/2)(1/2)^2 = 1/8 = 0.125.
What does this π2 mean? It means that 12.5% of the time the system has two customers. In general, we can conclude that the CTMC has a stationary distribution if and only if ρ < 1. Let us assume ρ < 1 for the further examples. As a manager, you will want to know more than the long-run fraction of time spent at each queue length. You may want to know the long-run average number of customers in the system.

Σ_{n=0}^{∞} n πn = 0π0 + 1π1 + 2π2 + · · ·
= ρ(1 − ρ) + 2ρ^2 (1 − ρ) + 3ρ^3 (1 − ρ) + · · ·
= ρ(1 − ρ)(1 + 2ρ + 3ρ^2 + · · · )
= ρ(1 − ρ)(1 + ρ + ρ^2 + ρ^3 + · · · )′
= ρ(1 − ρ)(1/(1 − ρ))′
= ρ(1 − ρ) · 1/(1 − ρ)^2
= ρ/(1 − ρ)
The next question you may wonder about is the average time in the system. Can we use Little's Law, L = λW ? What do we know among these three variables?

L = ρ/(1 − ρ) = λW
W = (1/λ) · ρ/(1 − ρ) = (1/λ) · (λ/µ)/(1 − ρ) = (1/µ) · 1/(1 − ρ) = m/(1 − ρ)
where m is the mean processing time. A slightly different question: what is the average waiting time in the queue? We can again use Little's Law as long as we define the boundary of our system correctly. It should be Lq = λWq . How do we compute Lq , Wq ?

Lq = 0π0 + 0π1 + 1π2 + 2π3 + · · ·
Wq = W − m = m/(1 − ρ) − m = m · ρ/(1 − ρ)

Is this Wq formula familiar? Recall Kingman's formula:

Wq = m · (ρ/(1 − ρ)) · ((ca^2 + cs^2 )/2).

In our case, since interarrival and service times are both iid exponentially distributed, ca^2 = cs^2 = 1. That is why the ca , cs terms did not appear in our formula above.
You should be able to see the connections among the topics covered during this semester. If you look at Wq = mρ/(1 − ρ), you will notice that Wq → ∞ as ρ ↑ 1. If ρ = 1/2, Wq = m, which means that the length of time you wait equals the length of time you are in service. If ρ = 0.95, Wq = 19m: you wait 19 times longer than you are actually served, which is extremely poor service. You, as a manager, should achieve both high utilization and short waiting time. Using a pool of servers, you can achieve both. We will talk about this later.
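The M/M/1 formulas above can be packaged into a few lines. A sketch (the function name is our own) that reproduces L = 1, W = 2 minutes, and Wq = m at ρ = 1/2, and the 19m wait at ρ = 0.95:

```python
def mm1_metrics(lam, mu):
    """Steady-state measures of an M/M/1 queue; requires lam < mu."""
    rho = lam / mu           # server utilization
    m = 1.0 / mu             # mean processing time
    L = rho / (1.0 - rho)    # average number in system
    W = L / lam              # Little's Law: average time in system
    Wq = W - m               # average wait in queue
    Lq = lam * Wq            # Little's Law applied to the queue alone
    return {"rho": rho, "L": L, "W": W, "Wq": Wq, "Lq": Lq}

print(mm1_metrics(0.5, 1.0))          # rho = 1/2: L = 1, W = 2, Wq = m = 1
print(mm1_metrics(0.95, 1.0)["Wq"])   # rho = 0.95: Wq = 19m
```

Sweeping ρ toward 1 makes the nonlinearity of ρ/(1 − ρ) plain.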

23.2 M/M/1/b Queue


Let us now make the queue size finite. Let me explain the notation here. The first "M" means a Poisson arrival process. In fact, the Poisson arrival assumption is not very restrictive in reality: many empirical studies validate that a lot of human arrival processes are well modeled as Poisson.
Second “M” means exponentially distributed service times. If we use “D” instead of “M”, it
means deterministic service times. Factory robots’ service times are usually deterministic.
Third “1” denotes the number of servers, in this case a single server. The last “b” means
the number of spaces in the system. If you remember “G/G/1” queue taught earlier in the
semester, “G” means a general distribution. If you think you have log-normal service times,
we did not learn how to solve the queue analytically. One thing we can do is computer
simulation. By the way, log-normal means that ln(X) ∼ N (µ, σ 2 ).
Take an example of limited size queueing system. For example, say b = 2. b is the
maximum number of customers that can wait in the queue. The rate diagram will be as
follows.
(Rate diagram: states 0, 1, 2, 3 with arrival rate λ up and service rate µ down; arrivals finding the system full in state 3 are blocked.)

We still can use the detailed balance equations. Suppose λ = 1/2, µ = 1 as in the previous
example.
π0 = 1/(1 + ρ + ρ^2 + ρ^3 ) = 1/(1 + 1/2 + (1/2)^2 + (1/2)^3 ) = 8/15
π1 = 4/15 ,  π2 = 2/15 ,  π3 = 1/15
The questions you will be interested in are:

(i) What is the average number of customers in the system?
0 · 8/15 + 1 · 4/15 + 2 · 2/15 + 3 · 1/15 = (4 + 4 + 3)/15 = 11/15

(ii) What is the average number of customers in the queue?


0 · 8/15 + 0 · 4/15 + 1 · 2/15 + 2 · 1/15 = 4/15

(iii) What is the average waiting time in queue? Again, we will use Little's Law. This formula is the one you will remember 10 years from now, like "There was something called Little's Law in the IE curriculum."

Lq = λWq ⇒ 4/15 = (1/2)Wq ⇒ Wq = 8/15
Is this correct? What is suspicious? We just used λ = 1/2, but in limited size queue
case we lose some customers. We should use effective arrival rate, which excludes
blocked customers.

Lq = λef f Wq = λ(1 − π3 )Wq

This will generate correct Wq .
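Here is a sketch of the full M/M/1/b computation with the effective arrival rate (states 0, . . . , b + 1 as in the diagram, b waiting spots plus one in service; exact fractions; the helper name is our own):

```python
from fractions import Fraction

def mm1b_wq(lam, mu, b):
    """M/M/1 queue with b waiting spots (states 0, ..., b+1 customers):
    returns (pi, effective arrival rate, Wq), in exact rationals."""
    rho = Fraction(lam) / Fraction(mu)
    n_states = b + 2
    weights = [rho ** n for n in range(n_states)]     # from the cut equations
    total = sum(weights)
    pi = [w / total for w in weights]
    lam_eff = Fraction(lam) * (1 - pi[-1])            # blocked arrivals are lost
    Lq = sum((n - 1) * pi[n] for n in range(1, n_states))  # waiting customers
    return pi, lam_eff, Lq / lam_eff                  # Little's Law: Wq = Lq/lam_eff

pi, lam_eff, wq = mm1b_wq(Fraction(1, 2), 1, 2)
print(pi)              # stationary distribution (8/15, 4/15, 2/15, 1/15)
print(lam_eff, wq)     # lam_eff = 7/15, Wq = 4/7 of a minute
```

Note how using the naive λ = 1/2 instead of λeff would have given the wrong Wq = 8/15.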

Ask questions now if you do not understand. Do not let confusion accumulate. If you feel fuzzy about something, it will likely be on the test.

24 Lecture 25: Apr 21
24.1 M/M/∞ Queue
Arrival process is Poisson with rate λ. Service times are iid exponentially distributed with
rate µ. Let X(t) be the number of customers in system at time t. X = {X(t), t ≥ 0} is a
CTMC in S = {0, 1, 2, · · · }. Google’s servers may be modeled using this model. In reality,
you will never have infinite number of servers. However, Google should have so many servers
that we can assume they have infinite number of servers. Model is just for answering a
certain question. What would be the rate diagram?

(Rate diagram: states 0, 1, 2, 3, · · · with arrival rate λ up and departure rate nµ from state n, since all n customers are in service.)

How can we compute the stationary distribution? Use balance equations and the cut
method.

λπ0 = µπ1 ⇒ π1 = (λ/µ) π0
λπ1 = 2µπ2 ⇒ π2 = (λ^2 /(2µ^2 )) π0
λπ2 = 3µπ3 ⇒ π3 = (λ^3 /(3!µ^3 )) π0
· · ·
πn = (λ^n /(n!µ^n )) π0
Since we have the additional condition Σ_{i=0}^{∞} πi = 1,

Σ_i πi = π0 ( 1 + λ/µ + λ^2 /(2µ^2 ) + λ^3 /(3!µ^3 ) + · · · ) = 1
π0 = 1/e^{λ/µ} = e^{−λ/µ}
πn = e^{−λ/µ} (λ/µ)^n /n! ,  n = 0, 1, 2, · · · .
Thinking of π = (π0 , π1 , π2 , · · · ), what is this distribution? It is clearly not exponential because it is discrete: π is a Poisson distribution. Compare this with the M/M/1 queue, whose stationary distribution is geometric.

You may be a bit fuzzy about the difference between a Poisson process and a Poisson random variable. A Poisson process tracks a series of incidents over time; in any given time interval, the number of incidents is a Poisson random variable, following a Poisson distribution.
Suppose λ = 10, µ = 1. Then, for example,
π2 = 10^2 e^{−10} /2! = 50e^{−10}

which is quite small. The long-run average number of customers in the system is

0π0 + 1π1 + 2π2 + · · · = λ/µ.
Capacity provisioning: You may want to achieve a certain level of service capacity.
Suppose you actually have 100 servers. Then, you need to compute P(X(∞) > 100). If P(X(∞) > 100) = 10%, that may not be acceptable because you will be losing 10% of your customers. How can we actually compute this number?
P(X(∞) > 100) = 1 − P(X(∞) ≤ 100) = 1 − [P(X(∞) = 0) + P(X(∞) = 1) + · · · + P(X(∞) = 100)]
You can also compute how many servers you will need to limit the loss probability to a
certain level. Say you want to limit the loss probability to 2%. Solve the following equation
to obtain c.
P(X(∞) > c) = 0.02
This is an approximate way to model a many server situation. We will go further into
the model with finite number of servers.
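A sketch of this capacity-provisioning computation, using the Poisson steady-state distribution derived above (function and variable names are our own):

```python
import math

def poisson_tail(rate, c):
    """P(X > c) for X ~ Poisson(rate); in M/M/inf steady state, rate = lam/mu."""
    pmf = math.exp(-rate)           # P(X = 0)
    cdf = pmf
    for n in range(1, c + 1):
        pmf *= rate / n             # P(X = n) from P(X = n - 1)
        cdf += pmf
    return 1.0 - cdf

lam_over_mu = 10.0
c = 0
while poisson_tail(lam_over_mu, c) > 0.02:   # smallest c with loss at most 2%
    c += 1
print(c, poisson_tail(lam_over_mu, c))       # c = 17 servers suffice here
```

The recursive pmf update avoids computing large factorials directly.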

24.2 M/M/k Queue


In this case, we have only k servers instead of an infinite number. Assume that the buffer size is infinite. Then, the rate diagram will look like the following.
(Rate diagram around state k: arrival rate λ everywhere; departure rate nµ from state n for n ≤ k, e.g. (k − 1)µ from state k − 1, and kµ from every state n ≥ k.)

Solving the following system of linear equations,


π1 = (λ/µ) π0
π2 = (λ/µ)^2 (1/2!) π0
· · ·
πk = (λ/µ)^k (1/k!) π0 ,

Let ρ = λ/(kµ). Then,

π0 = 1 / [ 1 + (λ/µ) + (λ/µ)^2 /2! + (λ/µ)^3 /3! + · · · + (λ/µ)^{k−1} /(k − 1)! + (λ/µ)^k /k! · 1/(1 − ρ) ].

The probability that a customer will wait upon arrival is

P(X(∞) ≥ k) = πk · 1/(1 − ρ) = (λ/µ)^k (1/k!) π0 · 1/(1 − ρ).

As a manager of this call center, you want to keep this reasonably low. Remember that the average waiting time in the M/M/1 queue is

E[W ] = m · ρ/(1 − ρ) = mw.

Let us call w the waiting time factor. This factor is highly nonlinear: don't try to push utilization up further when the system is already saturated. Conversely, a little more carpooling can dramatically improve traffic conditions. What is the analogue for this M/M/k case? Let us first compute the average queue size.

E[Q] = 1πk+1 + 2πk+2 + 3πk+3 + · · ·
= 1ρπk + 2ρ^2 πk + 3ρ^3 πk + · · ·
= πk · ρ/(1 − ρ)^2

Using Little’s Law, we can compute average waiting time.


1 ρ
E[W ] = πk
λ (1 − ρ)2

If you use many servers in parallel, you can achieve both high-quality and efficient service. If the world were so deterministic that arrival times and service times were both fixed, you could achieve quality service at high utilization even with a single server. For example, if a customer arrives every two minutes and each service time is slightly less than 2 minutes, there will be no waiting even with only one server. In reality, the world is full of uncertainty. In this case, having a large system makes your system more robust to uncertainty.
Sometimes, when you call a call center, you will hear an automated message saying that there are 10 customers ahead of you. This information by itself is misleading: your actual waiting time heavily depends on how many servers the call center has hired.
You can analyze the M/M/k/b queue in a similar way. The only difference is to use 1 + ρ + ρ^2 + · · · + ρ^b instead of 1 + ρ + ρ^2 + · · · .
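The M/M/k formulas above (π0, the waiting probability, and E[W] via Little's Law) can be sketched as follows; the waiting probability is the classical Erlang C quantity, the k = 1 case collapses to the M/M/1 answers computed earlier, and the second call illustrates the pooling effect (the function name is our own):

```python
import math

def mmk_metrics(lam, mu, k):
    """M/M/k steady state, following the formulas above; needs rho < 1.
    Returns (P(wait), Wq); P(wait) is the Erlang C probability."""
    a = lam / mu                        # offered load lam/mu
    rho = lam / (k * mu)                # per-server utilization
    inv_p0 = sum(a ** n / math.factorial(n) for n in range(k))
    inv_p0 += (a ** k / math.factorial(k)) / (1.0 - rho)
    p0 = 1.0 / inv_p0
    pk = (a ** k / math.factorial(k)) * p0
    p_wait = pk / (1.0 - rho)           # P(X(inf) >= k)
    Lq = pk * rho / (1.0 - rho) ** 2    # average queue length
    return p_wait, Lq / lam             # Little's Law: Wq = Lq/lam

# k = 1 collapses to M/M/1: P(wait) = rho = 0.5, Wq = 1 minute.
print(mmk_metrics(0.5, 1.0, 1))
# Pooling: the same per-server utilization with 10 servers waits far less.
print(mmk_metrics(5.0, 1.0, 10))
```

Comparing the two printed results quantifies why a pooled system delivers both high utilization and short waits.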

25 Lecture 26: Apr 26
25.1 Open Jackson Network
25.1.1 M/M/1 Queue Review
Before going into Jackson Network, let us review M/M/1 queue first. In an M/M/1 queue,
suppose the arrival rate α = 2 jobs per minute. The mean processing time m = 0.45 minutes.
The traffic intensity is computed as follows.
ρ = αm = 2(0.45) = 0.9
Let X(t) be the number of jobs in system at time t. The stationary distribution is
πn =P(X(∞) = n) = (1 − ρ)ρn , n = 0, 1, 2, 3, · · ·
π2 =(1 − ρ)ρ2 = (1 − 0.9)0.92 = 0.081 = 8.1%.

25.1.2 Tandem Queue


Now let us extend our model. Suppose two queues are connected in tandem meaning that
the queues are in series. Jobs are still arriving at rate α = 2 jobs per minute. The first
queue’s mean processing time is m1 = 0.45 minutes and the second queue’s mean processing
time is m2 = 0.40 minutes. Define X(t) = (X1 (t), X2 (t)) where Xi (t) is the number of jobs
at station i at time t. This model is much closer to the situation described in the book, “The
Goal”. If you draw the rate diagram of this CTMC, one part of the diagram will look like
the following.

(Part of the rate diagram: from state (2, 3), an arrival at rate α leads to (3, 3), a station-1 completion at rate µ1 leads to (1, 4), and a station-2 completion at rate µ2 leads to (2, 2).)

This CTMC may have a stationary distribution. How can you compute the following?
P(X(∞) = (2, 3)) =P(X1 (∞) = 2, X2 (∞) = 3)
=P(X1 (∞) = 2)P(X2 (∞) = 3)
∵ We can just assume the independence
because each queue does not seem to affect each other.
=(1 − ρ1 )ρ21 (1 − ρ2 )ρ32

where ρ1 = αm1 , ρ2 = αm2 . When determining ρ2 , you may be tempted to set it as m2 /m1 .
However, if you think about it, the output rate of the first station is α not 1/m1 because
ρ1 < 1. If ρ1 ≥ 1, we don’t have to care about the steady state of the system since the first
station will explode in the long run.
Let us summarize. Let ρi = αmi for i = 1, 2 be the traffic intensity at station i. Assume
ρ1 < 1, ρ2 < 1. Then,

P(X(∞) = (n1 , n2 )) = (1 − ρ1 )ρ1^{n1} (1 − ρ2 )ρ2^{n2} ,  n1 , n2 = 0, 1, 2, · · ·
π(n1 ,n2 ) = (1 − ρ1 )ρ1^{n1} (1 − ρ2 )ρ2^{n2} .

Suppose I want to claim that π = (π(n1 ,n2 ) , n1 , n2 = 0, 1, 2, · · · ) is indeed the stationary distribution of the tandem queue. What should I do? I just need to verify that this distribution satisfies the following balance equation.

(α + µ1 + µ2 )π(2, 3) = απ(1, 3) + µ2 π(2, 4) + µ1 π(3, 2)

Since this chain is irreducible, if one distribution satisfies the balance equations, it is the only one satisfying them, i.e., the unique stationary distribution. You may think this looks like cheating. There are two ways to find the stationary distribution: one is to solve the equation πG = 0 directly; the other is to guess a distribution first and verify that it is the one we are looking for. If possible, we take the easy way. If you can remember the M/M/1 queue's stationary distribution, you should be able to compute the stationary distribution of this tandem queue.
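The product form makes computing P(X(∞) = (2, 3)) a one-liner. A sketch with the lecture's numbers (the function name is our own):

```python
def tandem_prob(alpha, m1, m2, n1, n2):
    """Product-form stationary probability for two M/M/1 queues in tandem:
    P(X1 = n1, X2 = n2) = (1 - rho1) rho1^n1 * (1 - rho2) rho2^n2."""
    rho1, rho2 = alpha * m1, alpha * m2
    assert rho1 < 1 and rho2 < 1, "both stations must be stable"
    return (1 - rho1) * rho1 ** n1 * (1 - rho2) * rho2 ** n2

# alpha = 2 jobs/min, m1 = 0.45 min, m2 = 0.40 min, as in the lecture:
p = tandem_prob(2.0, 0.45, 0.40, 2, 3)
print(p)  # (0.1)(0.81)(0.2)(0.512) = 0.0082944
```

Summing this function over a large grid of (n1, n2) values gives approximately 1, a quick check that it really is a probability distribution.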

25.1.3 Failure Inspection


Returning to the Open Jackson Network, "Open" means that arriving customers will eventually leave the system.
Let us extend our model once again. Suppose now that jobs arrive at rate α = 1 per minute (with the feedback below, α = 2 would overload station 1), and everything else remains the same except that there is inspection at the end of the second station. If a job is found to be defective, the job will go back to the queue of station 1 and get reprocessed. The chance of not being defective is 50%. Even in this case, the stationary distribution has the same form as in the previous case.

P(X1 (∞) = 2, X2 (∞) = 3) = P(X1 (∞) = 2)P(X2 (∞) = 3) = (1 − ρ1 )ρ21 (1 − ρ2 )ρ32

The question is how to set ρ1 , ρ2 . Let us define a new set of variables. Denote the throughput
from station i by λi . It is reasonable to assume that λ1 = λ2 for a stable system. Then,
because of the feedback,

λ1 = α + 0.5λ2 ⇒ λ1 = α + 0.5λ1 ⇒ λ1 = 2α = 2.

Therefore,

ρ1 =λ1 m1 = 2(0.45) = 90%


ρ2 =λ2 m2 = 2(0.40) = 80%.

This is called the traffic equation. How can we compute the average number of jobs in the
system? Recall that the average number of jobs in M/M/1 queue is ρ/(1 − ρ). Then,
average number of jobs in the system
= average # of jobs in station 1 + average # of jobs in station 2
= ρ1 /(1 − ρ1 ) + ρ2 /(1 − ρ2 ) = 0.9/0.1 + 0.8/0.2 = 9 + 4 = 13 jobs.
Still, we did not prove that (1 − ρ1 )ρ1^{n1} (1 − ρ2 )ρ2^{n2} is indeed the stationary distribution. The only thing we need to prove is that this distribution satisfies the balance equations. At first glance, the two stations seem quite interrelated, but in steady state the two are in fact independent. This was found by Jackson in the late 1950s and published in Operations Research. In theory, the independence holds for any number of stations, and each station can have multiple servers. It is a very general result.
What is the average time in system per job? Use Little’s Law, L = λW .
L = λW ⇒ 13 = 1W ⇒ W = 13 minutes.
It could be 13 hours, days, or weeks. This is the lead time you can quote to a potential customer. How can we reduce the lead time? We can certainly hire or buy more servers or machines. Another thing we can do is lower the arrival rate; in practical terms, that means rejecting some orders, which may be painful or even impossible. Suppose instead we reduce the failure rate from 50% to 40%. Then,
λ1 = α/(1 − 0.4) = 5/3 ,  ρ1 = (5/3)(0.45) = 0.75 = 3/4 ,  ρ2 = (5/3)(0.4) = 2/3
W = ρ1 /(1 − ρ1 ) + ρ2 /(1 − ρ2 ) = (3/4)/(1/4) + (2/3)/(1/3) = 3 + 2 = 5 minutes.

With just a 10-percentage-point decrease in the failure rate, the lead time dropped by more than half. This is because of the nonlinearity of ρ/(1 − ρ). What if we had a 55% failure rate instead of 50%? The system becomes much worse: W = 30 minutes. You must now be convinced that you have to manage the bottleneck. You cannot keep loading up a bottleneck that is already full.
Before finishing up, let us try a more complicated model. There are inspections at the ends of stations 2 and 3, with failure rates 30% and 20% respectively. Then,

λ2 = λ1 + 0.2λ3
λ1 = α + 0.3λ2
λ3 = 0.7λ2 .
Now we can compute ρ1 = λ1 m1 , ρ2 = λ2 m2 , ρ3 = λ3 m3 . The stationary distribution has the same product form:

π(n1 , n2 , n3 ) = (1 − ρ1 )ρ1^{n1} (1 − ρ2 )ρ2^{n2} (1 − ρ3 )ρ3^{n3}

Remember that we have assumed infinite queue sizes throughout.
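The traffic equations are just a small linear system. A sketch solving the three-station system by substitution, assuming α = 1 job per minute as in the two-station example (exact fractions; variable names are our own):

```python
from fractions import Fraction

# Traffic equations for the three-station network, assuming alpha = 1 job/min
# and failure rates 30% at station 2 and 20% at station 3:
#   lam1 = alpha + 0.3*lam2,   lam2 = lam1 + 0.2*lam3,   lam3 = 0.7*lam2
alpha = Fraction(1)
p2, p3 = Fraction(3, 10), Fraction(1, 5)

# Substitute lam3 = (1 - p2)*lam2 into the lam2 equation:
#   lam2 = lam1 + p3*(1 - p2)*lam2  =>  lam2 = lam1 / (1 - p3*(1 - p2))
denom = 1 - p3 * (1 - p2)
# Then lam1 = alpha + p2*lam2 = alpha + (p2/denom)*lam1:
lam1 = alpha / (1 - p2 / denom)
lam2 = lam1 / denom
lam3 = (1 - p2) * lam2
print(lam1, lam2, lam3)  # 43/28 25/14 5/4
```

With the λi in hand, each ρi = λi mi follows once the mean processing times mi are specified.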

26 Lecture 27: Apr 28
26.1 Problem Solving Session
26.1.1 Test 2: Question 3
Test 2 Question 3. The first call is under service and the second call is waiting. Part (b)
asks you about the probability that 1st call leaves before 3rd call arrives. Two clocks are
competing: one is measuring the service time of 1st call and the other is measuring the
remaining time until 3rd call arrives. Therefore,
P(S1 < A3 ) = (service rate)/(service rate + arrival rate) = (1/4)/(1/4 + 2) = 1/9 .
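This competing-clocks answer can be sanity-checked by simulation; a quick Monte Carlo sketch (the seed and sample size are arbitrary choices of ours):

```python
import random

# Monte Carlo check of the competing-clocks formula: with service rate
# mu = 1/4 and arrival rate lam = 2, P(S1 < A3) = mu / (mu + lam) = 1/9.
random.seed(1)                     # fixed seed for reproducibility
mu, lam = 0.25, 2.0
n = 200_000
wins = sum(random.expovariate(mu) < random.expovariate(lam) for _ in range(n))
print(wins / n)                    # close to 1/9, about 0.111
```

The memoryless property is what lets each comparison be drawn fresh, independently of the past.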

Part (c) asks you about the probability that even the 2nd call will leave the system before
the 3rd call arrives. Once the 2nd call is in the service, because of the memoryless property,
the probability is again 1/9. Since these two events should happen together.
P(S1 < A3 )P(S2 < A3 ) = (1/9)^2 = 1/81

Part (d) asks you about the expected time until the 3rd call leaves. It is the sum of the
times until the 2nd call leaves and remaining time for the 3rd call. We have two scenarios
here. One is that the 3rd call has not arrived when the 2nd call leaves. The other is that
the 3rd call has already arrived and waiting for service.

E[time until the 3rd call leaves] = (4 + 4) + E[A3 + S3 | S1 < A3 , S2 < A3 ] P(S1 < A3 , S2 < A3 ) + E[S3 | {S1 < A3 , S2 < A3 }C ](1 − P(S1 < A3 , S2 < A3 ))
= 8 + (9/2)(1/81) + 4(80/81)
Part (e) is somewhat more difficult.

P(2nd call is the only one in the system at 12:05)


=P(3rd has not arrived at 12:05, 1st call completes service, 2nd call has not completed yet)
=P(3rd has not arrived by 12:05)P(1nd completed,2nd not completed)
= e^{−2(5)} P(M (5) = 1) = e^{−10} · ((µt)^1 /1!) e^{−µt} = e^{−10} (5/4) e^{−5/4}
The trick here is that you should recognize the service process is “also” Poisson as well as
the arrival process (at least for the first two calls). Poisson process can be characterized as
having iid exponential intervals. If we keep the server busy all the time, the service process
is also Poisson. Define M (t) be the number of service completions by time t if the server has
always been busy. This was a difficult one and you should “not” feel bad even if you could
not solve this, though I was impressed by those who could.

26.1.2 Test 2: Question 2
If you draw the state diagram of this DTMC, you will notice three groups of states within which all states communicate with each other: {1, 2}, {3, 4}, {5}. The states {3, 4} are transient, meaning that the chain stays there only temporarily: once you leave {3, 4}, there is no way to come back. If you run this chain for a long time, there is no chance the chain still stays in state 3 or 4.
 100
.2 .8 0 0 0
.5 .5 0 0 0 
 
 0 .25 0 .75 0 
 
 0 0 .5 0 .5
0 0 0 0 1
 
5/13 8/13 0 0 0
5/13 8/13 0 0 0
 
≈


 

Starting from {1,2} you will end up being there. Compute stationary dist.
 
5/13 8/13 0 0 0
5/13 8/13 0 0 0
 
=



 
0 0 0 0 1
Starting from 5 you will stick there with prob 1.
 
5/13 8/13 0 0 0
5/13 8/13 0 0 0
 
=
 0 0  
 0 0 
0 0 0 0 1
Starting from {3,4} you have virtually no chance of staying there after 100 steps.
 
5/13 8/13 0 0 0
 5/13 8/13 0 0 0 
 
= (2/5)(5/13) (2/5)(8/13) 0 0 3/5


(1/5)(5/13) (1/5)(8/13) 0 0 4/5
0 0 0 0 1
Numbers are based on the following system of linear equations.

Define f3,{1,2} to be the probability that the chain is absorbed by {1, 2} starting from state 3, and similarly f4,{1,2} . Then

f3,{1,2} = .25 + (.75)f4,{1,2}
f4,{1,2} = (.5)(0) + (.5)f3,{1,2}

since from state 3 the chain moves to state 2 with probability .25 and to state 4 with probability .75. Solving gives f3,{1,2} = 2/5 and f4,{1,2} = 1/5.
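Solving this 2 × 2 system takes one substitution; a sketch in exact arithmetic (variable names are our own):

```python
from fractions import Fraction

# Absorption into {1,2} from the transient states {3,4}:
#   f3 = 1/4 + (3/4) f4       (from 3: to state 2 w.p. 1/4, to state 4 w.p. 3/4)
#   f4 = (1/2)(0) + (1/2) f3  (from 4: absorbed by 5 w.p. 1/2, back to 3 w.p. 1/2)
# Substituting the second into the first: f3 = 1/4 + (3/4)(1/2) f3.
f3 = Fraction(1, 4) / (1 - Fraction(3, 4) * Fraction(1, 2))
f4 = Fraction(1, 2) * f3
print(f3, f4)  # 2/5 1/5
```

These are exactly the factors (2/5) and (1/5) multiplying (5/13, 8/13) in rows 3 and 4 of the matrix above.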

26.1.3 HW 12: Question 1


The key is to recognize the effective arrival rate λef f = λ(1 − π3 ). Average queue size is
0(1 − π3 ) + 1π3 . You can use the Little’s Law and the effective arrival rate to obtain the
average waiting time.
The PASTA property (Poisson Arrivals See Time Averages) is what justifies λef f = λ(1 − π3 ): the reason we can say the chance of not being blocked is 1 − π3 is that the arrival process is Poisson. A Poisson process has another characteristic: given that there is exactly one arrival in a 5-minute interval, the exact arrival time is uniformly distributed between 0 and 5 minutes.
